Index: user/ngie/bug-237403/MAINTAINERS =================================================================== --- user/ngie/bug-237403/MAINTAINERS (revision 346925) +++ user/ngie/bug-237403/MAINTAINERS (revision 346926) @@ -1,125 +1,132 @@ $FreeBSD$ Please note that the content of this file is strictly advisory. No locks listed here are valid. The only strict review requirements are granted by core. These are documented in head/LOCKS and enforced by svnadmin/conf/approvers. The source tree is a community effort. However, some folks go to the trouble of looking after particular areas of the tree. In return for their active caretaking of the code it is polite to coordinate changes with them. This is a list of people who have expressed an interest in part of the code or listed their active caretaking role so that other committers can easily find somebody who is familiar with it. The notes should specify if there is a 3rd party source tree involved or other things that should be kept in mind. However, this is not a 'big stick', it is an offer to help and a source of guidance. It does not override the communal nature of the tree. It is not a registry of 'turf' or private property. *** This list is prone to becoming stale quickly. The best way to find the recent maintainer of a sub-system is to check recent logs for that directory or sub-system. *** *** Maintainers are encouraged to visit: https://reviews.freebsd.org/herald and configure notifications for parts of the tree which they maintain. Notifications can automatically be sent when someone proposes a revision or makes a commit to the specified subtree. *** subsystem login notes ----------------------------- -atf freebsd-testing,jmmv,ngie Pre-commit review requested. ath(4) adrian Pre-commit review requested, send to freebsd-wireless@freebsd.org +contrib/atf ngie,#test Pre-commit review requested. +contrib/capsicum-test ngie,#capsicum,#test Pre-commit review requested. contrib/compiler-rt dim Pre-commit review preferred. +contrib/googletest ngie,#test Pre-commit review requested. contrib/ipfilter cy Pre-commit review requested. contrib/libc++ dim Pre-commit review preferred. contrib/libcxxrt dim Pre-commit review preferred. contrib/libunwind dim,emaste,jhb Pre-commit review preferred. contrib/llvm dim Pre-commit review preferred. contrib/llvm/tools/lldb dim,emaste Pre-commit review preferred. -contrib/netbsd-tests freebsd-testing,ngie Pre-commit review requested. -contrib/pjdfstest freebsd-testing,asomers,ngie,pjd Pre-commit review requested. +contrib/netbsd-tests ngie,#test Pre-commit review requested. +contrib/pjdfstest asomers,ngie,pjd,#test Pre-commit review requested. *env(3) secteam Due to the problematic security history of this code, please have patches reviewed by secteam. etc/mail gshapiro Pre-commit review requested. Keep in sync with -STABLE. etc/sendmail gshapiro Pre-commit review requested. Keep in sync with -STABLE. fetch des Pre-commit review requested, email only. geli pjd Pre-commit review requested (both sys/geom/eli/ and sbin/geom/class/eli/). isci(4) jimharris Pre-commit review requested. iwm(4) adrian Pre-commit review requested, send to freebsd-wireless@freebsd.org iwn(4) adrian Pre-commit review requested, send to freebsd-wireless@freebsd.org kqueue jmg Pre-commit review requested. Documentation Required. libdpv dteske Pre-commit review requested. Keep in sync with dpv(1). libfetch des Pre-commit review requested, email only. libfigpar dteske Pre-commit review requested. 
libm freebsd-numerics Send email with patches to freebsd-numerics@ libpam des Pre-commit review requested, email only. linprocfs des Pre-commit review requested, email only. lpr gad Pre-commit review requested, particularly for lpd/recvjob.c and lpd/printjob.c. nanobsd imp Pre-commit phabricator review requested. net80211 adrian Pre-commit review requested, send to freebsd-wireless@freebsd.org nfs freebsd-fs@FreeBSD.org, rmacklem is best for reviews. nvd(4) jimharris Pre-commit review requested. nvme(4) jimharris Pre-commit review requested. nvmecontrol(8) jimharris Pre-commit review requested. opencrypto jmg Pre-commit review requested. Documentation Required. openssh des Pre-commit review requested, email only. openssl benl,jkim Pre-commit review requested. otus(4) adrian Pre-commit review requested, send to freebsd-wireless@freebsd.org pci bus imp,jhb Pre-commit review requested. pmcstudy(8) rrs Pre-commit review requested. procfs des Pre-commit review requested, email only. pseudofs des Pre-commit review requested, email only. release/release.sh gjb,re Pre-commit review and regression tests requested. sctp rrs,tuexen Pre-commit review requested (changes need to be backported to github). sendmail gshapiro Pre-commit review requested. sh(1) jilles Pre-commit review requested. This also applies to kill(1), printf(1) and test(1) which are compiled in as builtins. share/mk imp, bapt, bdrewery, emaste, sjg Make is hard. -share/mk/*.test.mk freebsd-testing,ngie (same list as share/mk too) Pre-commit review requested. +share/mk/*.test.mk imp,bapt,bdrewery, Pre-commit review requested. + emaste,ngie,sjg,#test stand/forth dteske Pre-commit review requested. stand/lua kevans Pre-commit review requested -sys/compat/linuxkpi hselasky If in doubt, ask. +sys/compat/linuxkpi hselasky If in doubt, ask. + zeising, johalun pre-commit review requested via + #x11 phabricator group. + (to avoid drm graphics drivers + impact) sys/contrib/ipfilter cy Pre-commit review requested. sys/dev/e1000 erj Pre-commit phabricator review requested. sys/dev/ixgbe erj Pre-commit phabricator review requested. sys/dev/ixl erj Pre-commit phabricator review requested. sys/dev/sound/usb hselasky If in doubt, ask. sys/dev/usb hselasky If in doubt, ask. sys/dev/xen royger Pre-commit review recommended. sys/netinet/ip_carp.c glebius Pre-commit review recommended. sys/netpfil/pf kp,glebius Pre-commit review recommended. sys/x86/xen royger Pre-commit review recommended. sys/xen royger Pre-commit review recommended. -tests freebsd-testing,ngie Pre-commit review requested. +tests ngie,#test Pre-commit review requested. tools/build imp Pre-commit review requested, especially to fix bootstrap issues. top(1) eadler Pre-commit review requested. usr.sbin/bsdconfig dteske Pre-commit phabricator review requested. usr.sbin/dpv dteske Pre-commit review requested. Keep in sync with libdpv. usr.sbin/pkg pkg@ Please coordinate behavior or flag changes with pkg team. usr.sbin/sysrc dteske Pre-commit phabricator review requested. Keep in sync with bsdconfig(8) sysrc.subr. vmm(4) tychon, jhb Pre-commit review requested via #bhyve phabricator group. libvmmapi tychon, jhb Pre-commit review requested via #bhyve phabricator group. usr.sbin/bhyve* tychon, jhb Pre-commit review requested via #bhyve phabricator group. autofs(5) trasz Pre-commit review recommended. iscsi(4) trasz Pre-commit review recommended. rctl(8) trasz Pre-commit review recommended. sys/dev/ofw nwhitehorn Pre-commit review recommended. 
sys/dev/drm* imp Pre-commit review requested in phabricator. Changes need to be mirrored in github repo. sys/dev/usb/wlan adrian Pre-commit review requested, send to freebsd-wireless@freebsd.org sys/arm/allwinner manu Pre-commit review requested sys/arm64/rockchip manu Pre-commit review requested Property changes on: user/ngie/bug-237403/MAINTAINERS ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/MAINTAINERS:r346444-346925 Index: user/ngie/bug-237403/bin/date/date.1 =================================================================== --- user/ngie/bug-237403/bin/date/date.1 (revision 346925) +++ user/ngie/bug-237403/bin/date/date.1 (revision 346926) @@ -1,471 +1,473 @@ .\"- .\" Copyright (c) 1980, 1990, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" This code is derived from software contributed to Berkeley by .\" the Institute of Electrical and Electronics Engineers, Inc. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" @(#)date.1 8.3 (Berkeley) 4/28/95 .\" $FreeBSD$ .\" -.Dd March 20, 2019 +.Dd April 23, 2019 .Dt DATE 1 .Os .Sh NAME .Nm date .Nd display or set date and time .Sh SYNOPSIS .Nm -.Op Fl jRu +.Op Fl jnRu .Op Fl r Ar seconds | Ar filename .Oo .Fl v .Sm off .Op Cm + | - .Ar val Op Ar ymwdHMS .Sm on .Oc .Ar ... .Op Cm + Ns Ar output_fmt .Nm .Op Fl ju .Sm off .Op Oo Oo Oo Oo Ar cc Oc Ar yy Oc Ar mm Oc Ar dd Oc Ar HH .Ar MM Op Ar .ss .Sm on .Nm .Op Fl jRu .Fl f Ar input_fmt new_date .Op Cm + Ns Ar output_fmt .Nm .Op Fl jnu .Op Fl I Ns Op Ar FMT .Op Fl f Ar input_fmt .Op Fl r Ar ... .Op Fl v Ar ... .Op Ar new_date .Sh DESCRIPTION When invoked without arguments, the .Nm utility displays the current date and time. Otherwise, depending on the options specified, .Nm will set the date and time or print it in a user-defined way. .Pp The .Nm utility displays the date and time read from the kernel clock. When used to set the date and time, both the kernel clock and the hardware clock are updated. 
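
The DESCRIPTION above notes that date(1) reads the kernel clock and, when setting the time, updates both the kernel and hardware clocks. As a point of reference for the read side, here is a minimal C sketch (illustrative only, not part of this change) of what a bare `date` invocation boils down to; it uses the same libc calls date.c below relies on, with "%c" standing in for the utility's default "%+" format string:

/*
 * Illustrative sketch, not code from this patch: read the kernel clock
 * and print it roughly the way a bare `date` invocation does.
 * date(1)'s default format string is "%+"; "%c" is a portable stand-in.
 */
#include <stdio.h>
#include <time.h>

int
main(void)
{
	char buf[1024];
	time_t now;
	struct tm *lt;

	if (time(&now) == (time_t)-1)		/* kernel clock, seconds since the Epoch */
		return (1);
	if ((lt = localtime(&now)) == NULL)	/* apply TZ to get local broken-down time */
		return (1);
	(void)strftime(buf, sizeof(buf), "%c", lt);
	(void)printf("%s\n", buf);
	return (0);
}
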
.Pp Only the superuser may set the date, and if the system securelevel (see .Xr securelevel 7 ) is greater than 1, the time may not be changed by more than 1 second. .Pp The options are as follows: .Bl -tag -width Ds .It Fl f Use .Ar input_fmt as the format string to parse the .Ar new_date provided rather than using the default .Sm off .Oo Oo Oo Oo Oo .Ar cc Oc .Ar yy Oc .Ar mm Oc .Ar dd Oc .Ar HH .Oc Ar MM Op Ar .ss .Sm on format. Parsing is done using .Xr strptime 3 . .It Fl I Ns Op Ar FMT Use .St -iso8601 output format. .Ar FMT may be omitted, in which case the default is .Sq date . Valid .Ar FMT values are .Sq date , .Sq hours , .Sq minutes , and .Sq seconds . The date and time is formatted to the specified precision. When .Ar FMT is .Sq hours (or the more precise .Sq minutes or .Sq seconds ) , the .St -iso8601 format includes the timezone. .It Fl j Do not try to set the date. This allows you to use the .Fl f flag in addition to the .Cm + option to convert one date format to another. +.It Fl n +Obsolete flag, accepted and ignored for compatibility. .It Fl R Use RFC 2822 date and time output format. This is equivalent to using .Dq Li %a, %d %b %Y \&%T %z as .Ar output_fmt while .Ev LC_TIME is set to the .Dq C locale . .It Fl r Ar seconds Print the date and time represented by .Ar seconds , where .Ar seconds is the number of seconds since the Epoch (00:00:00 UTC, January 1, 1970; see .Xr time 3 ) , and can be specified in decimal, octal, or hex. .It Fl r Ar filename Print the date and time of the last modification of .Ar filename . .It Fl u Display or set the date in .Tn UTC (Coordinated Universal) time. .It Fl v Adjust (i.e., take the current date and display the result of the adjustment; not actually set the date) the second, minute, hour, month day, week day, month or year according to .Ar val . If .Ar val is preceded with a plus or minus sign, the date is adjusted forwards or backwards according to the remaining string, otherwise the relevant part of the date is set. The date can be adjusted as many times as required using these flags. Flags are processed in the order given. .Pp When setting values (rather than adjusting them), seconds are in the range 0-59, minutes are in the range 0-59, hours are in the range 0-23, month days are in the range 1-31, week days are in the range 0-6 (Sun-Sat), months are in the range 1-12 (Jan-Dec) and years are in the range 80-38 or 1980-2038. .Pp If .Ar val is numeric, one of either .Ar y , .Ar m , .Ar w , .Ar d , .Ar H , .Ar M or .Ar S must be used to specify which part of the date is to be adjusted. .Pp The week day or month may be specified using a name rather than a number. If a name is used with the plus (or minus) sign, the date will be put forwards (or backwards) to the next (previous) date that matches the given week day or month. This will not adjust the date, if the given week day or month is the same as the current one. .Pp When a date is adjusted to a specific value or in units greater than hours, daylight savings time considerations are ignored. Adjustments in units of hours or less honor daylight saving time. So, assuming the current date is March 26, 0:30 and that the DST adjustment means that the clock goes forward at 01:00 to 02:00, using .Fl v No +1H will adjust the date to March 26, 2:30. Likewise, if the date is October 29, 0:30 and the DST adjustment means that the clock goes back at 02:00 to 01:00, using .Fl v No +3H will be necessary to reach October 29, 2:30. 
.Pp When the date is adjusted to a specific value that does not actually exist (for example March 26, 1:30 BST 2000 in the Europe/London timezone), the date will be silently adjusted forwards in units of one hour until it reaches a valid time. When the date is adjusted to a specific value that occurs twice (for example October 29, 1:30 2000), the resulting timezone will be set so that the date matches the earlier of the two times. .Pp It is not possible to adjust a date to an invalid absolute day, so using the switches .Fl v No 31d Fl v No 12m will simply fail five months of the year. It is therefore usual to set the month before setting the day; using .Fl v No 12m Fl v No 31d always works. .Pp Adjusting the date by months is inherently ambiguous because a month is a unit of variable length depending on the current date. This kind of date adjustment is applied in the most intuitive way. First of all, .Nm tries to preserve the day of the month. If it is impossible because the target month is shorter than the present one, the last day of the target month will be the result. For example, using .Fl v No +1m on May 31 will adjust the date to June 30, while using the same option on January 30 will result in the date adjusted to the last day of February. This approach is also believed to make the most sense for shell scripting. Nevertheless, be aware that going forth and back by the same number of months may take you to a different date. .Pp Refer to the examples below for further details. .El .Pp An operand with a leading plus .Pq Sq + sign signals a user-defined format string which specifies the format in which to display the date and time. The format string may contain any of the conversion specifications described in the .Xr strftime 3 manual page, as well as any arbitrary text. A newline .Pq Ql \en character is always output after the characters specified by the format string. The format string for the default display is .Dq +%+ . .Pp If an operand does not have a leading plus sign, it is interpreted as a value for setting the system's notion of the current date and time. The canonical representation for setting the date and time is: .Pp .Bl -tag -width Ds -compact -offset indent .It Ar cc Century (either 19 or 20) prepended to the abbreviated year. .It Ar yy Year in abbreviated form (e.g., 89 for 1989, 06 for 2006). .It Ar mm Numeric month, a number from 1 to 12. .It Ar dd Day, a number from 1 to 31. .It Ar HH Hour, a number from 0 to 23. .It Ar MM Minutes, a number from 0 to 59. .It Ar ss Seconds, a number from 0 to 60 (59 plus a potential leap second). .El .Pp Everything but the minutes is optional. .Pp Time changes for Daylight Saving Time, standard time, leap seconds, and leap years are handled automatically. .Sh ENVIRONMENT The following environment variables affect the execution of .Nm : .Bl -tag -width Ds .It Ev TZ The timezone to use when displaying dates. The normal format is a pathname relative to .Pa /usr/share/zoneinfo . For example, the command .Dq TZ=America/Los_Angeles date displays the current time in California. See .Xr environ 7 for more information. .El .Sh FILES .Bl -tag -width /var/log/messages -compact .It Pa /var/log/utx.log record of date resets and time changes .It Pa /var/log/messages record of the user setting the time .El .Sh EXIT STATUS The .Nm utility exits 0 on success, 1 if unable to set the date, and 2 if able to set the local date, but unable to set it globally. 
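
One note on the -R flag documented above before the EXAMPLES: the manual states it is equivalent to the output format "%a, %d %b %Y %T %z" with LC_TIME set to the C locale, and date.c below carries exactly that string as rfc2822_format. A minimal, self-contained C sketch of that equivalence (illustrative only, not part of the patch):

/*
 * Sketch of the -R (RFC 2822) output path: force the C locale for
 * LC_TIME and format with the same string date.c uses.
 */
#include <locale.h>
#include <stdio.h>
#include <time.h>

int
main(void)
{
	char buf[64];
	time_t now;

	(void)setlocale(LC_TIME, "C");	/* RFC 2822 day/month names must not be localized */
	(void)time(&now);
	(void)strftime(buf, sizeof(buf), "%a, %d %b %Y %T %z", localtime(&now));
	(void)printf("%s\n", buf);	/* e.g. Tue, 23 Apr 2019 10:11:12 +0000 */
	return (0);
}

This is also why date.c switches LC_TIME to "C" only when format == rfc2822_format: every other output format honors the user's locale.
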
.Sh EXAMPLES The command: .Pp .Dl "date ""+DATE: %Y-%m-%d%nTIME: %H:%M:%S""" .Pp will display: .Bd -literal -offset indent DATE: 1987-11-21 TIME: 13:36:16 .Ed .Pp In the Europe/London timezone, the command: .Pp .Dl "date -v1m -v+1y" .Pp will display: .Pp .Dl "Sun Jan 4 04:15:24 GMT 1998" .Pp where it is currently .Li "Mon Aug 4 04:15:24 BST 1997" . .Pp The command: .Pp .Dl "date -v1d -v3m -v0y -v-1d" .Pp will display the last day of February in the year 2000: .Pp .Dl "Tue Feb 29 03:18:00 GMT 2000" .Pp So will the command: .Pp .Dl "date -v3m -v30d -v0y -v-1m" .Pp because there is no such date as the 30th of February. .Pp The command: .Pp .Dl "date -v1d -v+1m -v-1d -v-fri" .Pp will display the last Friday of the month: .Pp .Dl "Fri Aug 29 04:31:11 BST 1997" .Pp where it is currently .Li "Mon Aug 4 04:31:11 BST 1997" . .Pp The command: .Pp .Dl "date 8506131627" .Pp sets the date to .Dq Li "June 13, 1985, 4:27 PM" . .Pp .Dl "date ""+%Y%m%d%H%M.%S""" .Pp may be used on one machine to print out the date suitable for setting on another. .Qq ( Li "+%m%d%H%M%Y.%S" for use on .Tn Linux . ) .Pp The command: .Pp .Dl "date 1432" .Pp sets the time to .Li "2:32 PM" , without modifying the date. .Pp The command .Pp .Dl "TZ=America/Los_Angeles date -Iseconds -r 1533415339" .Pp will display .Pp .Dl "2018-08-04T13:42:19-07:00" .Pp Finally the command: .Pp .Dl "date -j -f ""%a %b %d %T %Z %Y"" ""`date`"" ""+%s""" .Pp can be used to parse the output from .Nm and express it in Epoch time. .Sh DIAGNOSTICS It is invalid to combine the .Fl I flag with either .Fl R or an output format .Dq ( + Ns ... ) operand. If this occurs, .Nm prints: .Ql multiple output formats specified and exits with an error status. .Sh SEE ALSO .Xr locale 1 , .Xr gettimeofday 2 , .Xr getutxent 3 , .Xr strftime 3 , .Xr strptime 3 .Rs .%T "TSP: The Time Synchronization Protocol for UNIX 4.3BSD" .%A R. Gusella .%A S. Zatti .Re .Sh STANDARDS The .Nm utility is expected to be compatible with .St -p1003.2 . The .Fl d , f , I , j , r , t , and .Fl v options are all extensions to the standard. .Pp The format selected by the .Fl I flag is compatible with .St -iso8601 . .Sh HISTORY A .Nm command appeared in .At v1 . .Pp The .Fl I flag was added in .Fx 12.0 . Index: user/ngie/bug-237403/bin/date/date.c =================================================================== --- user/ngie/bug-237403/bin/date/date.c (revision 346925) +++ user/ngie/bug-237403/bin/date/date.c (revision 346926) @@ -1,391 +1,393 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1985, 1987, 1988, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. 
* * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #ifndef lint static char const copyright[] = "@(#) Copyright (c) 1985, 1987, 1988, 1993\n\ The Regents of the University of California. All rights reserved.\n"; #endif /* not lint */ #if 0 #ifndef lint static char sccsid[] = "@(#)date.c 8.2 (Berkeley) 4/28/95"; #endif /* not lint */ #endif #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include "vary.h" #ifndef TM_YEAR_BASE #define TM_YEAR_BASE 1900 #endif static time_t tval; static void badformat(void); static void iso8601_usage(const char *); static void multipleformats(void); static void printdate(const char *); static void printisodate(struct tm *); static void setthetime(const char *, const char *, int); static void usage(void); static const struct iso8601_fmt { const char *refname; const char *format_string; } iso8601_fmts[] = { { "date", "%Y-%m-%d" }, { "hours", "T%H" }, { "minutes", ":%M" }, { "seconds", ":%S" }, }; static const struct iso8601_fmt *iso8601_selected; static const char *rfc2822_format = "%a, %d %b %Y %T %z"; int main(int argc, char *argv[]) { int ch, rflag; bool Iflag, jflag, Rflag; const char *format; char buf[1024]; char *fmt; char *tmp; struct vary *v; const struct vary *badv; struct tm *lt; struct stat sb; size_t i; v = NULL; fmt = NULL; (void) setlocale(LC_TIME, ""); rflag = 0; Iflag = jflag = Rflag = 0; - while ((ch = getopt(argc, argv, "f:I::jRr:uv:")) != -1) + while ((ch = getopt(argc, argv, "f:I::jnRr:uv:")) != -1) switch((char)ch) { case 'f': fmt = optarg; break; case 'I': if (Rflag) multipleformats(); Iflag = 1; if (optarg == NULL) { iso8601_selected = iso8601_fmts; break; } for (i = 0; i < nitems(iso8601_fmts); i++) if (strcmp(optarg, iso8601_fmts[i].refname) == 0) break; if (i == nitems(iso8601_fmts)) iso8601_usage(optarg); iso8601_selected = &iso8601_fmts[i]; break; case 'j': jflag = 1; /* don't set time */ + break; + case 'n': break; case 'R': /* RFC 2822 datetime format */ if (Iflag) multipleformats(); Rflag = 1; break; case 'r': /* user specified seconds */ rflag = 1; tval = strtoq(optarg, &tmp, 0); if (*tmp != 0) { if (stat(optarg, &sb) == 0) tval = sb.st_mtim.tv_sec; else usage(); } break; case 'u': /* do everything in UTC */ (void)setenv("TZ", "UTC0", 1); break; case 'v': v = vary_append(v, optarg); break; default: usage(); } argc -= optind; argv += optind; if (!rflag && time(&tval) == -1) err(1, "time"); format = "%+"; if (Rflag) format = rfc2822_format; /* allow the operands in any order */ if (*argv && **argv == '+') { if (Iflag) multipleformats(); format = *argv + 1; ++argv; } if (*argv) { setthetime(fmt, *argv, jflag); ++argv; } else if (fmt != NULL) usage(); if (*argv && **argv == '+') { if (Iflag) multipleformats(); format = *argv + 1; } 
lt = localtime(&tval); if (lt == NULL) errx(1, "invalid time"); badv = vary_apply(v, lt); if (badv) { fprintf(stderr, "%s: Cannot apply date adjustment\n", badv->arg); vary_destroy(v); usage(); } vary_destroy(v); if (Iflag) printisodate(lt); if (format == rfc2822_format) /* * When using RFC 2822 datetime format, don't honor the * locale. */ setlocale(LC_TIME, "C"); (void)strftime(buf, sizeof(buf), format, lt); printdate(buf); } static void printdate(const char *buf) { (void)printf("%s\n", buf); if (fflush(stdout)) err(1, "stdout"); exit(EXIT_SUCCESS); } static void printisodate(struct tm *lt) { const struct iso8601_fmt *it; char fmtbuf[32], buf[32], tzbuf[8]; fmtbuf[0] = 0; for (it = iso8601_fmts; it <= iso8601_selected; it++) strlcat(fmtbuf, it->format_string, sizeof(fmtbuf)); (void)strftime(buf, sizeof(buf), fmtbuf, lt); if (iso8601_selected > iso8601_fmts) { (void)strftime(tzbuf, sizeof(tzbuf), "%z", lt); memmove(&tzbuf[4], &tzbuf[3], 3); tzbuf[3] = ':'; strlcat(buf, tzbuf, sizeof(buf)); } printdate(buf); } #define ATOI2(s) ((s) += 2, ((s)[-2] - '0') * 10 + ((s)[-1] - '0')) static void setthetime(const char *fmt, const char *p, int jflag) { struct utmpx utx; struct tm *lt; struct timeval tv; const char *dot, *t; int century; lt = localtime(&tval); if (lt == NULL) errx(1, "invalid time"); lt->tm_isdst = -1; /* divine correct DST */ if (fmt != NULL) { t = strptime(p, fmt, lt); if (t == NULL) { fprintf(stderr, "Failed conversion of ``%s''" " using format ``%s''\n", p, fmt); badformat(); } else if (*t != '\0') fprintf(stderr, "Warning: Ignoring %ld extraneous" " characters in date string (%s)\n", (long) strlen(t), t); } else { for (t = p, dot = NULL; *t; ++t) { if (isdigit(*t)) continue; if (*t == '.' && dot == NULL) { dot = t; continue; } badformat(); } if (dot != NULL) { /* .ss */ dot++; /* *dot++ = '\0'; */ if (strlen(dot) != 2) badformat(); lt->tm_sec = ATOI2(dot); if (lt->tm_sec > 61) badformat(); } else lt->tm_sec = 0; century = 0; /* if p has a ".ss" field then let's pretend it's not there */ switch (strlen(p) - ((dot != NULL) ? 
3 : 0)) { case 12: /* cc */ lt->tm_year = ATOI2(p) * 100 - TM_YEAR_BASE; century = 1; /* FALLTHROUGH */ case 10: /* yy */ if (century) lt->tm_year += ATOI2(p); else { lt->tm_year = ATOI2(p); if (lt->tm_year < 69) /* hack for 2000 ;-} */ lt->tm_year += 2000 - TM_YEAR_BASE; else lt->tm_year += 1900 - TM_YEAR_BASE; } /* FALLTHROUGH */ case 8: /* mm */ lt->tm_mon = ATOI2(p); if (lt->tm_mon > 12) badformat(); --lt->tm_mon; /* time struct is 0 - 11 */ /* FALLTHROUGH */ case 6: /* dd */ lt->tm_mday = ATOI2(p); if (lt->tm_mday > 31) badformat(); /* FALLTHROUGH */ case 4: /* HH */ lt->tm_hour = ATOI2(p); if (lt->tm_hour > 23) badformat(); /* FALLTHROUGH */ case 2: /* MM */ lt->tm_min = ATOI2(p); if (lt->tm_min > 59) badformat(); break; default: badformat(); } } /* convert broken-down time to GMT clock time */ if ((tval = mktime(lt)) == -1) errx(1, "nonexistent time"); if (!jflag) { utx.ut_type = OLD_TIME; memset(utx.ut_id, 0, sizeof(utx.ut_id)); (void)gettimeofday(&utx.ut_tv, NULL); pututxline(&utx); tv.tv_sec = tval; tv.tv_usec = 0; if (settimeofday(&tv, NULL) != 0) err(1, "settimeofday (timeval)"); utx.ut_type = NEW_TIME; (void)gettimeofday(&utx.ut_tv, NULL); pututxline(&utx); if ((p = getlogin()) == NULL) p = "???"; syslog(LOG_AUTH | LOG_NOTICE, "date set by %s", p); } } static void badformat(void) { warnx("illegal time format"); usage(); } static void iso8601_usage(const char *badarg) { errx(1, "invalid argument '%s' for -I", badarg); } static void multipleformats(void) { errx(1, "multiple output formats specified"); } static void usage(void) { (void)fprintf(stderr, "%s\n%s\n%s\n", "usage: date [-jnRu] [-r seconds|file] [-v[+|-]val[ymwdHMS]]", " " "[-I[date | hours | minutes | seconds]]", " " "[-f fmt date | [[[[[cc]yy]mm]dd]HH]MM[.ss]] [+format]" ); exit(1); } Index: user/ngie/bug-237403/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/ip/tst.ipv4localsctp.ksh =================================================================== --- user/ngie/bug-237403/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/ip/tst.ipv4localsctp.ksh (revision 346925) +++ user/ngie/bug-237403/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/ip/tst.ipv4localsctp.ksh (revision 346926) @@ -1,137 +1,153 @@ #!/usr/bin/env ksh # # CDDL HEADER START # # The contents of this file are subject to the terms of the # Common Development and Distribution License (the "License"). # You may not use this file except in compliance with the License. # # You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE # or http://www.opensolaris.org/os/licensing. # See the License for the specific language governing permissions # and limitations under the License. # # When distributing Covered Code, include this CDDL HEADER in each # file and include the License file at usr/src/OPENSOLARIS.LICENSE. # If applicable, add the following below this CDDL HEADER, with the # fields enclosed by brackets "[]" replaced with your own identifying # information: Portions Copyright [yyyy] [name of copyright owner] # # CDDL HEADER END # # # Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved. # # # Test {ip,sctp}:::{send,receive} of IPv4 SCTP to local host. # # This may fail due to: # # 1. A change to the ip stack breaking expected probe behavior, # which is the reason we are testing. # 2. The lo0 interface missing or not up. # 3. An unlikely race causes the unlocked global send/receive # variables to be corrupted. 
# # This test performs a SCTP association and checks that at least the # following packet counts were traced: # # 7 x ip:::send (4 during the setup, 3 during the teardown) # 7 x sctp:::send (4 during the setup, 3 during the teardown) # 7 x ip:::receive (4 during the setup, 3 during the teardown) # 7 x sctp:::receive (4 during the setup, 3 during the teardown) # The actual count tested is 7 each way, since we are tracing both # source and destination events. # if (( $# != 1 )); then print -u2 "expected one argument: " exit 2 fi dtrace=$1 local=127.0.0.1 DIR=/var/tmp/dtest.$$ sctpport=1024 bound=5000 -while [ $sctpport -lt $bound ]; do - ncat --sctp -z $local $sctpport > /dev/null || break - sctpport=$(($sctpport + 1)) -done -if [ $sctpport -eq $bound ]; then - echo "couldn't find an available SCTP port" - exit 1 -fi mkdir $DIR cd $DIR -# ncat will exit when the association is closed. -ncat --sctp --listen $local $sctpport & - -cat > test.pl <<-EOPERL +cat > client.pl <<-EOPERL use IO::Socket; my \$s = IO::Socket::INET->new( Type => SOCK_STREAM, Proto => "sctp", LocalAddr => "$local", PeerAddr => "$local", - PeerPort => $sctpport, + PeerPort => \$ARGV[0], Timeout => 3); - die "Could not connect to host $local port $sctpport \$@" unless \$s; + die "Could not connect to host $local port \$ARGV[0] \$@" unless \$s; close \$s; - sleep(2); + sleep(\$ARGV[1]); EOPERL -$dtrace -c 'perl test.pl' -qs /dev/stdin <&- || break + sctpport=$(($sctpport + 1)) +done +if [ $sctpport -eq $bound ]; then + echo "couldn't find an available SCTP port" + exit 1 +fi + +cat > server.pl <<-EOPERL + use IO::Socket; + my \$l = IO::Socket::INET->new( + Type => SOCK_STREAM, + Proto => "sctp", + LocalAddr => "$local", + LocalPort => $sctpport, + Listen => 1, + Reuse => 1); + die "Could not listen on $local port $sctpport \$@" unless \$l; + my \$c = \$l->accept(); + close \$l; + while (<\$c>) {}; + close \$c; +EOPERL + +perl server.pl & + +$dtrace -c "perl client.pl $sctpport 2" -qs /dev/stdin <ip_saddr == "$local" && args[2]->ip_daddr == "$local" && args[4]->ipv4_protocol == IPPROTO_SCTP/ { ipsend++; } sctp:::send /args[2]->ip_saddr == "$local" && args[2]->ip_daddr == "$local"/ { sctpsend++; } ip:::receive /args[2]->ip_saddr == "$local" && args[2]->ip_daddr == "$local" && args[4]->ipv4_protocol == IPPROTO_SCTP/ { ipreceive++; } sctp:::receive /args[2]->ip_saddr == "$local" && args[2]->ip_daddr == "$local"/ { sctpreceive++; } END { printf("Minimum SCTP events seen\n\n"); - printf("ip:::send (%d) - %s\n", ipsend, ipsend >= 7 ? "yes" : "no"); - printf("ip:::receive (%d) - %s\n", ipreceive, ipreceive >= 7 ? "yes" : "no"); - printf("sctp:::send (%d) - %s\n", sctpsend, sctpsend >= 7 ? "yes" : "no"); - printf("sctp:::receive (%d) - %s\n", sctpreceive, sctpreceive >= 7 ? "yes" : "no"); + printf("ip:::send - %s\n", ipsend >= 7 ? "yes" : "no"); + printf("ip:::receive - %s\n", ipreceive >= 7 ? "yes" : "no"); + printf("sctp:::send - %s\n", sctpsend >= 7 ? "yes" : "no"); + printf("sctp:::receive - %s\n", sctpreceive >= 7 ? "yes" : "no"); } EODTRACE status=$? 
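
The rewritten test above no longer depends on ncat: it searches for a usable SCTP port and then runs its own Perl listener (server.pl) alongside the client (client.pl). For readers who prefer the C API, this is a rough sketch of one way such a port probe can be done, by attempting to bind a one-to-one style SCTP socket on the loopback address. The helper name sctp_port_is_free is hypothetical and this is not the script's actual probing code:

/*
 * Hypothetical helper, not part of the test: returns 1 if `port` can be
 * bound on 127.0.0.1 as a one-to-one SCTP socket (i.e. looks free), else 0.
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

static int
sctp_port_is_free(uint16_t port)
{
	struct sockaddr_in sin;
	int s, ok;

	s = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
	if (s == -1)
		return (0);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_len = sizeof(sin);		/* BSD sockaddr convention */
	sin.sin_port = htons(port);
	sin.sin_addr.s_addr = inet_addr("127.0.0.1");
	ok = (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == 0);
	close(s);
	return (ok);
}

Looping over a port range and stopping at the first port for which such a probe succeeds mirrors the intent of the script's while loop from $sctpport up to $bound.
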
cd / /bin/rm -rf $DIR exit $status Index: user/ngie/bug-237403/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/ip/tst.localsctpstate.ksh =================================================================== --- user/ngie/bug-237403/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/ip/tst.localsctpstate.ksh (revision 346925) +++ user/ngie/bug-237403/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/ip/tst.localsctpstate.ksh (revision 346926) @@ -1,159 +1,175 @@ #!/usr/bin/env ksh # # CDDL HEADER START # # The contents of this file are subject to the terms of the # Common Development and Distribution License (the "License"). # You may not use this file except in compliance with the License. # # You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE # or http://www.opensolaris.org/os/licensing. # See the License for the specific language governing permissions # and limitations under the License. # # When distributing Covered Code, include this CDDL HEADER in each # file and include the License file at usr/src/OPENSOLARIS.LICENSE. # If applicable, add the following below this CDDL HEADER, with the # fields enclosed by brackets "[]" replaced with your own identifying # information: Portions Copyright [yyyy] [name of copyright owner] # # CDDL HEADER END # # # Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved. # # # Test sctp:::state-change and sctp:::{send,receive} by connecting to # the local discard service. # A number of state transition events along with SCTP send and # receive events for the message should result. # # This may fail due to: # # 1. A change to the ip stack breaking expected probe behavior, # which is the reason we are testing. # 2. The lo0 interface missing or not up. # 3. An unlikely race causes the unlocked global send/receive # variables to be corrupted. # # This test performs a SCTP connection and checks that at least the # following packet counts were traced: # # 7 x ip:::send (4 during the setup, 3 during the teardown) # 7 x sctp:::send (4 during the setup, 3 during the teardown) # 7 x ip:::receive (4 during the setup, 3 during the teardown) # 7 x sctp:::receive (4 during the setup, 3 during the teardown) # # The actual count tested is 7 each way, since we are tracing both # source and destination events. # if (( $# != 1 )); then print -u2 "expected one argument: " exit 2 fi dtrace=$1 local=127.0.0.1 DIR=/var/tmp/dtest.$$ sctpport=1024 bound=5000 -while [ $sctpport -lt $bound ]; do - ncat --sctp -z $local $sctpport > /dev/null || break - sctpport=$(($sctpport + 1)) -done -if [ $sctpport -eq $bound ]; then - echo "couldn't find an available SCTP port" - exit 1 -fi mkdir $DIR cd $DIR -# ncat will exit when the association is closed. 
-ncat --sctp --listen $local $sctpport & - -cat > test.pl <<-EOPERL +cat > client.pl <<-EOPERL use IO::Socket; my \$s = IO::Socket::INET->new( Type => SOCK_STREAM, Proto => "sctp", LocalAddr => "$local", PeerAddr => "$local", - PeerPort => $sctpport, + PeerPort => \$ARGV[0], Timeout => 3); - die "Could not connect to host $local port $sctpport \$@" unless \$s; + die "Could not connect to host $local port \$ARGV[0] \$@" unless \$s; close \$s; - sleep(2); + sleep(\$ARGV[1]); EOPERL -$dtrace -c 'perl test.pl' -qs /dev/stdin <&- || break + sctpport=$(($sctpport + 1)) +done +if [ $sctpport -eq $bound ]; then + echo "couldn't find an available SCTP port" + exit 1 +fi + +cat > server.pl <<-EOPERL + use IO::Socket; + my \$l = IO::Socket::INET->new( + Type => SOCK_STREAM, + Proto => "sctp", + LocalAddr => "$local", + LocalPort => $sctpport, + Listen => 1, + Reuse => 1); + die "Could not listen on $local port $sctpport \$@" unless \$l; + my \$c = \$l->accept(); + close \$l; + while (<\$c>) {}; + close \$c; +EOPERL + +perl server.pl & + +$dtrace -c "perl client.pl $sctpport 2" -qs /dev/stdin <ip_saddr == "$local" && args[2]->ip_daddr == "$local" && args[4]->ipv4_protocol == IPPROTO_SCTP/ { ipsend++; } sctp:::send /args[2]->ip_saddr == "$local" && args[2]->ip_daddr == "$local" && (args[4]->sctp_sport == $sctpport || args[4]->sctp_dport == $sctpport)/ { sctpsend++; } ip:::receive /args[2]->ip_saddr == "$local" && args[2]->ip_daddr == "$local" && args[4]->ipv4_protocol == IPPROTO_SCTP/ { ipreceive++; } sctp:::receive /args[2]->ip_saddr == "$local" && args[2]->ip_daddr == "$local" && (args[4]->sctp_sport == $sctpport || args[4]->sctp_dport == $sctpport)/ { sctpreceive++; } sctp:::state-change { state_event[args[3]->sctps_state]++; } END { printf("Minimum SCTP events seen\n\n"); printf("ip:::send - %s\n", ipsend >= 7 ? "yes" : "no"); printf("ip:::receive - %s\n", ipreceive >= 7 ? "yes" : "no"); printf("sctp:::send - %s\n", sctpsend >= 7 ? "yes" : "no"); printf("sctp:::receive - %s\n", sctpreceive >= 7 ? "yes" : "no"); printf("sctp:::state-change to cookie-wait - %s\n", state_event[SCTP_STATE_COOKIE_WAIT] >=1 ? "yes" : "no"); printf("sctp:::state-change to cookie-echoed - %s\n", state_event[SCTP_STATE_COOKIE_ECHOED] >=1 ? "yes" : "no"); printf("sctp:::state-change to established - %s\n", state_event[SCTP_STATE_ESTABLISHED] >= 2 ? "yes" : "no"); printf("sctp:::state-change to shutdown-sent - %s\n", state_event[SCTP_STATE_SHUTDOWN_SENT] >= 1 ? "yes" : "no"); printf("sctp:::state-change to shutdown-received - %s\n", state_event[SCTP_STATE_SHUTDOWN_RECEIVED] >= 1 ? "yes" : "no"); printf("sctp:::state-change to shutdown-ack-sent - %s\n", state_event[SCTP_STATE_SHUTDOWN_ACK_SENT] >= 1 ? "yes" : "no"); } EODTRACE status=$? 
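
Both tests now drive the association with the small Perl programs embedded above; server.pl accepts a single association, drains it, and closes. As an aside for C readers, a rough analogue of that listener using the one-to-one SCTP socket API is sketched below. The function name sctp_serve_once is hypothetical and the code is illustrative, not part of this change:

/*
 * Rough C analogue of server.pl: accept one one-to-one SCTP association
 * on 127.0.0.1:port, drain it, and close, which produces the teardown
 * packets the D script counts.
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

static int
sctp_serve_once(uint16_t port)
{
	struct sockaddr_in sin;
	char buf[256];
	int l, c, on = 1;

	l = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
	if (l == -1)
		return (-1);
	(void)setsockopt(l, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_len = sizeof(sin);
	sin.sin_port = htons(port);
	sin.sin_addr.s_addr = inet_addr("127.0.0.1");
	if (bind(l, (struct sockaddr *)&sin, sizeof(sin)) != 0 || listen(l, 1) != 0) {
		close(l);
		return (-1);
	}
	c = accept(l, NULL, NULL);		/* block until the client associates */
	close(l);
	if (c == -1)
		return (-1);
	while (read(c, buf, sizeof(buf)) > 0)	/* drain, as "while (<$c>) {}" does */
		;
	close(c);				/* triggers the SHUTDOWN sequence the test expects */
	return (0);
}
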
cd / /bin/rm -rf $DIR exit $status Index: user/ngie/bug-237403/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/ip/tst.localsctpstate.ksh.out =================================================================== --- user/ngie/bug-237403/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/ip/tst.localsctpstate.ksh.out (revision 346925) +++ user/ngie/bug-237403/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/ip/tst.localsctpstate.ksh.out (revision 346926) @@ -1,12 +1,13 @@ Minimum SCTP events seen ip:::send - yes ip:::receive - yes sctp:::send - yes sctp:::receive - yes sctp:::state-change to cookie-wait - yes sctp:::state-change to cookie-echoed - yes sctp:::state-change to established - yes sctp:::state-change to shutdown-sent - yes sctp:::state-change to shutdown-received - yes sctp:::state-change to shutdown-ack-sent - yes + Index: user/ngie/bug-237403/cddl/contrib/opensolaris =================================================================== --- user/ngie/bug-237403/cddl/contrib/opensolaris (revision 346925) +++ user/ngie/bug-237403/cddl/contrib/opensolaris (revision 346926) Property changes on: user/ngie/bug-237403/cddl/contrib/opensolaris ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/cddl/contrib/opensolaris:r346444-346925 Index: user/ngie/bug-237403/cddl =================================================================== --- user/ngie/bug-237403/cddl (revision 346925) +++ user/ngie/bug-237403/cddl (revision 346926) Property changes on: user/ngie/bug-237403/cddl ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/cddl:r346444-346925 Index: user/ngie/bug-237403/etc/mtree/BSD.sendmail.dist =================================================================== --- user/ngie/bug-237403/etc/mtree/BSD.sendmail.dist (revision 346925) +++ user/ngie/bug-237403/etc/mtree/BSD.sendmail.dist (revision 346926) @@ -1,14 +1,14 @@ # $FreeBSD$ # # Please see the file src/etc/mtree/README before making changes to this file. # /set type=dir uname=root gname=wheel mode=0755 . nochange var nochange - spool nochange + spool nochange tags=package=runtime clientmqueue uname=smmsp gname=smmsp mode=0770 .. .. .. .. Index: user/ngie/bug-237403/etc/mtree/BSD.usr.dist =================================================================== --- user/ngie/bug-237403/etc/mtree/BSD.usr.dist (revision 346925) +++ user/ngie/bug-237403/etc/mtree/BSD.usr.dist (revision 346926) @@ -1,1254 +1,1254 @@ # $FreeBSD$ # # Please see the file src/etc/mtree/README before making changes to this file. # /set type=dir uname=root gname=wheel mode=0755 . bin .. include private bsdstat .. event .. gmock internal custom .. .. .. gtest internal custom .. .. .. sqlite3 .. ucl .. zstd .. .. .. lib aout .. clang 8.0.0 include sanitizer .. .. lib freebsd .. .. .. .. compat aout .. .. dtrace .. engines .. i18n .. libxo encoder .. .. .. libdata gcc .. ldscripts .. pkgconfig .. .. libexec bsdconfig 020.docsinstall include .. .. 030.packages include .. .. 040.password include .. .. 050.diskmgmt include .. .. 070.usermgmt include .. .. 080.console include .. .. 090.timezone include .. .. 110.mouse include .. .. 120.networking include .. .. 130.security include .. .. 140.startup include .. .. 150.ttys include .. .. dot include .. .. include .. includes include .. .. .. bsdinstall .. dwatch .. hyperv .. lpr ru .. .. sendmail .. sm.bin .. .. local .. obj nochange .. sbin .. share atf .. bsdconfig media .. 
networking .. packages .. password .. startup .. timezone .. usermgmt .. .. calendar de_AT.ISO_8859-15 .. de_DE.ISO8859-1 .. fr_FR.ISO8859-1 .. hr_HR.ISO8859-2 .. hu_HU.ISO8859-2 .. pt_BR.ISO8859-1 .. pt_BR.UTF-8 .. ru_RU.KOI8-R .. ru_RU.UTF-8 .. uk_UA.KOI8-U .. .. dict .. doc IPv6 .. atf .. legal .. llvm clang .. .. ncurses .. ntp drivers icons .. scripts .. .. hints .. icons .. pic .. scripts .. .. pjdfstest .. .. dtrace .. examples BSD_daemon .. FreeBSD_version .. IPv6 .. bhyve .. bootforth .. bsdconfig .. csh .. diskless .. dma .. drivers .. dwatch .. etc defaults .. .. find_interface .. hast .. hostapd .. indent .. ipfilter .. ipfw .. jails .. kld cdev module .. test .. .. dyn_sysctl .. firmware fwconsumer .. fwimage .. .. khelp .. syscall module .. test .. .. .. libusb20 .. libvgl .. mdoc .. netgraph .. pc-sysinstall .. perfmon .. pf .. ppi .. ppp .. printing .. scsi_target .. ses getencstat .. sesd .. setencstat .. setobjstat .. srcs .. .. smbfs print .. .. sunrpc dir .. msg .. sort .. .. tcsh .. uefisign .. ypldap .. .. firmware .. games fortune .. .. i18n csmapper APPLE .. AST .. BIG5 .. CNS .. CP .. EBCDIC .. GB .. GEORGIAN .. ISO-8859 .. ISO646 .. JIS .. KAZAKH .. KOI .. KS .. MISC .. TCVN .. .. esdb APPLE .. AST .. BIG5 .. CP .. DEC .. EBCDIC .. EUC .. GB .. GEORGIAN .. ISO-2022 .. ISO-8859 .. ISO646 .. KAZAKH .. KOI .. MISC .. TCVN .. UTF .. .. .. keys pkg - revoked + revoked tags=package=runtime .. - trusted + trusted tags=package=runtime .. .. .. locale af_ZA.ISO8859-1 .. af_ZA.ISO8859-15 .. af_ZA.UTF-8 .. ar_AE.UTF-8 .. ar_EG.UTF-8 .. ar_JO.UTF-8 .. ar_MA.UTF-8 .. ar_QA.UTF-8 .. ar_SA.UTF-8 .. am_ET.UTF-8 .. be_BY.CP1131 .. be_BY.CP1251 .. be_BY.ISO8859-5 .. be_BY.UTF-8 .. bg_BG.CP1251 .. bg_BG.UTF-8 .. ca_AD.ISO8859-1 .. ca_AD.ISO8859-15 .. ca_ES.ISO8859-1 .. ca_ES.ISO8859-15 .. ca_FR.ISO8859-1 .. ca_FR.ISO8859-15 .. ca_IT.ISO8859-1 .. ca_IT.ISO8859-15 .. ca_AD.UTF-8 .. ca_ES.UTF-8 .. ca_FR.UTF-8 .. ca_IT.UTF-8 .. cs_CZ.ISO8859-2 .. cs_CZ.UTF-8 .. da_DK.ISO8859-1 .. da_DK.ISO8859-15 .. da_DK.UTF-8 .. de_AT.ISO8859-1 .. de_AT.ISO8859-15 .. de_AT.UTF-8 .. de_CH.ISO8859-1 .. de_CH.ISO8859-15 .. de_CH.UTF-8 .. de_DE.ISO8859-1 .. de_DE.ISO8859-15 .. de_DE.UTF-8 .. el_GR.ISO8859-7 .. el_GR.UTF-8 .. en_AU.ISO8859-1 .. en_AU.ISO8859-15 .. en_AU.US-ASCII .. en_AU.UTF-8 .. en_CA.ISO8859-1 .. en_CA.ISO8859-15 .. en_CA.US-ASCII .. en_CA.UTF-8 .. en_GB.ISO8859-1 .. en_GB.ISO8859-15 .. en_GB.US-ASCII .. en_GB.UTF-8 .. en_HK.ISO8859-1 .. en_HK.UTF-8 .. en_IE.ISO8859-1 .. en_IE.ISO8859-15 .. en_IE.UTF-8 .. en_NZ.ISO8859-1 .. en_NZ.ISO8859-15 .. en_NZ.US-ASCII .. en_NZ.UTF-8 .. en_PH.UTF-8 .. en_SG.ISO8859-1 .. en_SG.UTF-8 .. en_US.ISO8859-1 .. en_US.ISO8859-15 .. en_US.US-ASCII .. en_US.UTF-8 .. en_ZA.ISO8859-1 .. en_ZA.ISO8859-15 .. en_ZA.US-ASCII .. en_ZA.UTF-8 .. es_AR.ISO8859-1 .. es_AR.UTF-8 .. es_CR.UTF-8 .. es_ES.ISO8859-1 .. es_ES.ISO8859-15 .. es_ES.UTF-8 .. es_MX.ISO8859-1 .. es_MX.UTF-8 .. et_EE.ISO8859-1 .. et_EE.ISO8859-15 .. et_EE.UTF-8 .. eu_ES.ISO8859-1 .. eu_ES.ISO8859-15 .. eu_ES.UTF-8 .. fi_FI.ISO8859-1 .. fi_FI.ISO8859-15 .. fi_FI.UTF-8 .. fr_BE.ISO8859-1 .. fr_BE.ISO8859-15 .. fr_BE.UTF-8 .. fr_CA.ISO8859-1 .. fr_CA.ISO8859-15 .. fr_CA.UTF-8 .. fr_CH.ISO8859-1 .. fr_CH.ISO8859-15 .. fr_CH.UTF-8 .. fr_FR.ISO8859-1 .. fr_FR.ISO8859-15 .. fr_FR.UTF-8 .. ga_IE.UTF-8 .. he_IL.UTF-8 .. hi_IN.ISCII-DEV .. hi_IN.UTF-8 .. hr_HR.ISO8859-2 .. hr_HR.UTF-8 .. hu_HU.ISO8859-2 .. hu_HU.UTF-8 .. hy_AM.ARMSCII-8 .. hy_AM.UTF-8 .. is_IS.ISO8859-1 .. is_IS.ISO8859-15 .. is_IS.UTF-8 .. 
it_CH.ISO8859-1 .. it_CH.ISO8859-15 .. it_CH.UTF-8 .. it_IT.ISO8859-1 .. it_IT.ISO8859-15 .. it_IT.UTF-8 .. ja_JP.SJIS .. ja_JP.UTF-8 .. ja_JP.eucJP .. kk_KZ.UTF-8 .. ko_KR.CP949 .. ko_KR.UTF-8 .. ko_KR.eucKR .. lt_LT.ISO8859-13 .. lt_LT.UTF-8 .. lv_LV.ISO8859-13 .. lv_LV.UTF-8 .. mn_MN.UTF-8 .. nb_NO.ISO8859-1 .. nb_NO.ISO8859-15 .. nb_NO.UTF-8 .. nl_BE.ISO8859-1 .. nl_BE.ISO8859-15 .. nl_BE.UTF-8 .. nl_NL.ISO8859-1 .. nl_NL.ISO8859-15 .. nl_NL.UTF-8 .. nn_NO.ISO8859-1 .. nn_NO.ISO8859-15 .. nn_NO.UTF-8 .. pl_PL.ISO8859-2 .. pl_PL.UTF-8 .. pt_BR.ISO8859-1 .. pt_BR.UTF-8 .. pt_PT.ISO8859-1 .. pt_PT.ISO8859-15 .. pt_PT.UTF-8 .. ro_RO.ISO8859-2 .. ro_RO.UTF-8 .. ru_RU.CP1251 .. ru_RU.CP866 .. ru_RU.ISO8859-5 .. ru_RU.KOI8-R .. ru_RU.UTF-8 .. se_FI.UTF-8 .. se_NO.UTF-8 .. sk_SK.ISO8859-2 .. sk_SK.UTF-8 .. sl_SI.ISO8859-2 .. sl_SI.UTF-8 .. sr_RS.ISO8859-5 .. sr_RS.UTF-8 .. sr_RS.ISO8859-2 .. sr_RS.UTF-8@latin .. sv_FI.ISO8859-1 .. sv_FI.ISO8859-15 .. sv_FI.UTF-8 .. sv_SE.ISO8859-1 .. sv_SE.ISO8859-15 .. sv_SE.UTF-8 .. tr_TR.ISO8859-9 .. tr_TR.UTF-8 .. uk_UA.CP1251 .. uk_UA.ISO8859-5 .. uk_UA.KOI8-U .. uk_UA.UTF-8 .. zh_CN.GB18030 .. zh_CN.GB2312 .. zh_CN.GBK .. zh_CN.eucCN .. zh_CN.UTF-8 .. zh_HK.UTF-8 .. zh_TW.Big5 .. zh_TW.UTF-8 .. .. man man1 .. man2 .. man3 .. man4 aarch64 .. amd64 .. arm .. i386 .. powerpc .. sparc64 .. .. man5 .. man6 .. man7 .. man8 amd64 .. i386 .. powerpc .. sparc64 .. .. man9 .. .. misc fonts .. .. mk .. nls C .. af_ZA.ISO8859-1 .. af_ZA.ISO8859-15 .. af_ZA.UTF-8 .. am_ET.UTF-8 .. be_BY.CP1131 .. be_BY.CP1251 .. be_BY.ISO8859-5 .. be_BY.UTF-8 .. bg_BG.CP1251 .. bg_BG.UTF-8 .. ca_ES.ISO8859-1 .. ca_ES.ISO8859-15 .. ca_ES.UTF-8 .. cs_CZ.ISO8859-2 .. cs_CZ.UTF-8 .. da_DK.ISO8859-1 .. da_DK.ISO8859-15 .. da_DK.UTF-8 .. de_AT.ISO8859-1 .. de_AT.ISO8859-15 .. de_AT.UTF-8 .. de_CH.ISO8859-1 .. de_CH.ISO8859-15 .. de_CH.UTF-8 .. de_DE.ISO8859-1 .. de_DE.ISO8859-15 .. de_DE.UTF-8 .. el_GR.ISO8859-7 .. el_GR.UTF-8 .. en_AU.ISO8859-1 .. en_AU.ISO8859-15 .. en_AU.US-ASCII .. en_AU.UTF-8 .. en_CA.ISO8859-1 .. en_CA.ISO8859-15 .. en_CA.US-ASCII .. en_CA.UTF-8 .. en_GB.ISO8859-1 .. en_GB.ISO8859-15 .. en_GB.US-ASCII .. en_GB.UTF-8 .. en_IE.UTF-8 .. en_NZ.ISO8859-1 .. en_NZ.ISO8859-15 .. en_NZ.US-ASCII .. en_NZ.UTF-8 .. en_US.ISO8859-1 .. en_US.ISO8859-15 .. en_US.UTF-8 .. es_ES.ISO8859-1 .. es_ES.ISO8859-15 .. es_ES.UTF-8 .. et_EE.ISO8859-15 .. et_EE.UTF-8 .. fi_FI.ISO8859-1 .. fi_FI.ISO8859-15 .. fi_FI.UTF-8 .. fr_BE.ISO8859-1 .. fr_BE.ISO8859-15 .. fr_BE.UTF-8 .. fr_CA.ISO8859-1 .. fr_CA.ISO8859-15 .. fr_CA.UTF-8 .. fr_CH.ISO8859-1 .. fr_CH.ISO8859-15 .. fr_CH.UTF-8 .. fr_FR.ISO8859-1 .. fr_FR.ISO8859-15 .. fr_FR.UTF-8 .. gl_ES.ISO8859-1 .. he_IL.UTF-8 .. hi_IN.ISCII-DEV .. hr_HR.ISO8859-2 .. hr_HR.UTF-8 .. hu_HU.ISO8859-2 .. hu_HU.UTF-8 .. hy_AM.ARMSCII-8 .. hy_AM.UTF-8 .. is_IS.ISO8859-1 .. is_IS.ISO8859-15 .. is_IS.UTF-8 .. it_CH.ISO8859-1 .. it_CH.ISO8859-15 .. it_CH.UTF-8 .. it_IT.ISO8859-1 .. it_IT.ISO8859-15 .. it_IT.UTF-8 .. ja_JP.SJIS .. ja_JP.UTF-8 .. ja_JP.eucJP .. kk_KZ.PT154 .. kk_KZ.UTF-8 .. ko_KR.CP949 .. ko_KR.UTF-8 .. ko_KR.eucKR .. lt_LT.ISO8859-13 .. lt_LT.UTF-8 .. lv_LV.ISO8859-13 .. lv_LV.UTF-8 .. mn_MN.UTF-8 .. nl_BE.ISO8859-1 .. nl_BE.ISO8859-15 .. nl_BE.UTF-8 .. nl_NL.ISO8859-1 .. nl_NL.ISO8859-15 .. nl_NL.UTF-8 .. no_NO.ISO8859-1 .. no_NO.ISO8859-15 .. no_NO.UTF-8 .. pl_PL.ISO8859-2 .. pl_PL.UTF-8 .. pt_BR.ISO8859-1 .. pt_BR.UTF-8 .. pt_PT.ISO8859-1 .. pt_PT.ISO8859-15 .. pt_PT.UTF-8 .. ro_RO.ISO8859-2 .. ro_RO.UTF-8 .. ru_RU.CP1251 .. ru_RU.CP866 .. 
ru_RU.ISO8859-5 .. ru_RU.KOI8-R .. ru_RU.UTF-8 .. sk_SK.ISO8859-2 .. sk_SK.UTF-8 .. sl_SI.ISO8859-2 .. sl_SI.UTF-8 .. sr_YU.ISO8859-2 .. sr_YU.ISO8859-5 .. sr_YU.UTF-8 .. sv_SE.ISO8859-1 .. sv_SE.ISO8859-15 .. sv_SE.UTF-8 .. tr_TR.ISO8859-9 .. tr_TR.UTF-8 .. uk_UA.ISO8859-5 .. uk_UA.KOI8-U .. uk_UA.UTF-8 .. zh_CN.GB18030 .. zh_CN.GB2312 .. zh_CN.GBK .. zh_CN.UTF-8 .. zh_CN.eucCN .. zh_HK.UTF-8 .. zh_TW.UTF-8 .. .. openssl man man1 .. man3 .. .. .. pc-sysinstall backend .. backend-partmanager .. backend-query .. conf license .. .. doc .. .. security .. sendmail .. skel .. snmp defs .. mibs .. .. syscons fonts .. keymaps .. scrnmaps .. .. tabset .. vi catalog .. .. vt fonts .. keymaps .. .. zoneinfo Africa .. America Argentina .. Indiana .. Kentucky .. North_Dakota .. .. Antarctica .. Arctic .. Asia .. Atlantic .. Australia .. Etc .. Europe .. Indian .. Pacific .. SystemV .. .. .. src nochange .. .. Index: user/ngie/bug-237403/etc/mtree/BSD.var.dist =================================================================== --- user/ngie/bug-237403/etc/mtree/BSD.var.dist (revision 346925) +++ user/ngie/bug-237403/etc/mtree/BSD.var.dist (revision 346926) @@ -1,114 +1,114 @@ # $FreeBSD$ # # Please see the file src/etc/mtree/README before making changes to this file. # /set type=dir uname=root gname=wheel mode=0755 . account .. at /set uname=daemon jobs tags=package=at .. spool tags=package=at .. /set uname=root .. /set mode=0750 /set gname=audit audit dist uname=auditdistd gname=audit mode=0770 .. remote uname=auditdistd gname=wheel mode=0700 .. .. authpf uname=root gname=authpf mode=0770 .. /set gname=wheel backups .. cache mode=0755 .. crash .. - cron + cron tags=package=runtime tabs mode=0700 .. .. /set mode=0755 db entropy uname=operator gname=operator mode=0700 .. freebsd-update mode=0700 .. hyperv mode=0700 .. ipf mode=0700 .. ntp uname=ntpd gname=ntpd .. pkg .. ports .. portsnap .. zfsd cases .. .. .. - empty mode=0555 flags=schg + empty mode=0555 flags=schg tags=package=runtime .. games gname=games mode=0775 .. heimdal mode=0700 .. - log + log tags=package=runtime .. - mail gname=mail mode=0775 + mail gname=mail mode=0775 tags=package=runtime .. msgs uname=daemon .. preserve .. - run + run tags=package=runtime dhclient .. ppp gname=network mode=0770 .. wpa_supplicant .. .. rwho gname=daemon mode=0775 .. spool dma uname=root gname=mail mode=0770 .. lock uname=uucp gname=dialer mode=0775 .. /set gname=daemon lpd .. mqueue .. opielocks mode=0700 .. output lpd .. .. /set gname=wheel .. - tmp mode=01777 + tmp mode=01777 tags=package=runtime vi.recover mode=01777 .. .. unbound uname=unbound gname=unbound mode=0755 tags=package=unbound conf.d uname=unbound gname=unbound mode=0755 tags=package=unbound .. .. yp .. .. 
Index: user/ngie/bug-237403/lib/libbe/Makefile =================================================================== --- user/ngie/bug-237403/lib/libbe/Makefile (revision 346925) +++ user/ngie/bug-237403/lib/libbe/Makefile (revision 346926) @@ -1,36 +1,37 @@ # $FreeBSD$ +SHLIBDIR?= /lib + .include PACKAGE= lib${LIB} LIB= be -SHLIBDIR?= /lib SHLIB_MAJOR= 1 SHLIB_MINOR= 0 SRCS= be.c be_access.c be_error.c be_info.c INCS= be.h MAN= libbe.3 WARNS?= 2 IGNORE_PRAGMA= yes LIBADD+= zfs LIBADD+= nvpair CFLAGS+= -I${SRCTOP}/cddl/contrib/opensolaris/lib/libzfs/common CFLAGS+= -I${SRCTOP}/sys/cddl/compat/opensolaris CFLAGS+= -I${SRCTOP}/cddl/compat/opensolaris/include CFLAGS+= -I${SRCTOP}/cddl/compat/opensolaris/lib/libumem CFLAGS+= -I${SRCTOP}/cddl/contrib/opensolaris/lib/libzpool/common CFLAGS+= -I${SRCTOP}/sys/cddl/contrib/opensolaris/common/zfs CFLAGS+= -I${SRCTOP}/sys/cddl/contrib/opensolaris/uts/common/fs/zfs CFLAGS+= -I${SRCTOP}/sys/cddl/contrib/opensolaris/uts/common CFLAGS+= -I${SRCTOP}/cddl/contrib/opensolaris/head CFLAGS+= -DNEED_SOLARIS_BOOLEAN HAS_TESTS= YES SUBDIR.${MK_TESTS}+= tests .include Index: user/ngie/bug-237403/lib/libbe/be.c =================================================================== --- user/ngie/bug-237403/lib/libbe/be.c (revision 346925) +++ user/ngie/bug-237403/lib/libbe/be.c (revision 346926) @@ -1,1093 +1,1097 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2017 Kyle J. Kneitinger * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include "be.h" #include "be_impl.h" struct be_destroy_data { libbe_handle_t *lbh; char *snapname; }; #if SOON static int be_create_child_noent(libbe_handle_t *lbh, const char *active, const char *child_path); static int be_create_child_cloned(libbe_handle_t *lbh, const char *active); #endif /* Arbitrary... should tune */ #define BE_SNAP_SERIAL_MAX 1024 /* * Iterator function for locating the rootfs amongst the children of the * zfs_be_root set by loader(8). data is expected to be a libbe_handle_t *. 
*/ static int be_locate_rootfs(libbe_handle_t *lbh) { struct statfs sfs; struct extmnttab entry; zfs_handle_t *zfs; /* * Check first if root is ZFS; if not, we'll bail on rootfs capture. * Unfortunately needed because zfs_path_to_zhandle will emit to * stderr if / isn't actually a ZFS filesystem, which we'd like * to avoid. */ if (statfs("/", &sfs) == 0) { statfs2mnttab(&sfs, &entry); if (strcmp(entry.mnt_fstype, MNTTYPE_ZFS) != 0) return (1); } else return (1); zfs = zfs_path_to_zhandle(lbh->lzh, "/", ZFS_TYPE_FILESYSTEM); if (zfs == NULL) return (1); strlcpy(lbh->rootfs, zfs_get_name(zfs), sizeof(lbh->rootfs)); zfs_close(zfs); return (0); } /* * Initializes the libbe context to operate in the root boot environment * dataset, for example, zroot/ROOT. */ libbe_handle_t * libbe_init(const char *root) { char altroot[MAXPATHLEN]; libbe_handle_t *lbh; char *poolname, *pos; int pnamelen; lbh = NULL; poolname = pos = NULL; if ((lbh = calloc(1, sizeof(libbe_handle_t))) == NULL) goto err; if ((lbh->lzh = libzfs_init()) == NULL) goto err; /* * Grab rootfs, we'll work backwards from there if an optional BE root * has not been passed in. */ if (be_locate_rootfs(lbh) != 0) { if (root == NULL) goto err; *lbh->rootfs = '\0'; } if (root == NULL) { /* Strip off the final slash from rootfs to get the be root */ strlcpy(lbh->root, lbh->rootfs, sizeof(lbh->root)); pos = strrchr(lbh->root, '/'); if (pos == NULL) goto err; *pos = '\0'; } else strlcpy(lbh->root, root, sizeof(lbh->root)); if ((pos = strchr(lbh->root, '/')) == NULL) goto err; pnamelen = pos - lbh->root; poolname = malloc(pnamelen + 1); if (poolname == NULL) goto err; strlcpy(poolname, lbh->root, pnamelen + 1); if ((lbh->active_phandle = zpool_open(lbh->lzh, poolname)) == NULL) goto err; free(poolname); poolname = NULL; if (zpool_get_prop(lbh->active_phandle, ZPOOL_PROP_BOOTFS, lbh->bootfs, sizeof(lbh->bootfs), NULL, true) != 0) goto err; if (zpool_get_prop(lbh->active_phandle, ZPOOL_PROP_ALTROOT, altroot, sizeof(altroot), NULL, true) == 0 && strcmp(altroot, "-") != 0) lbh->altroot_len = strlen(altroot); return (lbh); err: if (lbh != NULL) { if (lbh->active_phandle != NULL) zpool_close(lbh->active_phandle); if (lbh->lzh != NULL) libzfs_fini(lbh->lzh); free(lbh); } free(poolname); return (NULL); } /* * Free memory allocated by libbe_init() */ void libbe_close(libbe_handle_t *lbh) { if (lbh->active_phandle != NULL) zpool_close(lbh->active_phandle); libzfs_fini(lbh->lzh); free(lbh); } /* * Proxy through to libzfs for the moment. */ void be_nicenum(uint64_t num, char *buf, size_t buflen) { zfs_nicenum(num, buf, buflen); } static int be_destroy_cb(zfs_handle_t *zfs_hdl, void *data) { char path[BE_MAXPATHLEN]; struct be_destroy_data *bdd; zfs_handle_t *snap; int err; bdd = (struct be_destroy_data *)data; if (bdd->snapname == NULL) { err = zfs_iter_children(zfs_hdl, be_destroy_cb, data); if (err != 0) return (err); return (zfs_destroy(zfs_hdl, false)); } /* If we're dealing with snapshots instead, delete that one alone */ err = zfs_iter_filesystems(zfs_hdl, be_destroy_cb, data); if (err != 0) return (err); /* * This part is intentionally glossing over any potential errors, * because there's a lot less potential for errors when we're cleaning * up snapshots rather than a full deep BE. The primary error case * here being if the snapshot doesn't exist in the first place, which * the caller will likely deem insignificant as long as it doesn't * exist after the call. Thus, such a missing snapshot shouldn't jam * up the destruction. 
*/ snprintf(path, sizeof(path), "%s@%s", zfs_get_name(zfs_hdl), bdd->snapname); if (!zfs_dataset_exists(bdd->lbh->lzh, path, ZFS_TYPE_SNAPSHOT)) return (0); snap = zfs_open(bdd->lbh->lzh, path, ZFS_TYPE_SNAPSHOT); if (snap != NULL) zfs_destroy(snap, false); return (0); } /* * Destroy the boot environment or snapshot specified by the name * parameter. Options are or'd together with the possible values: * BE_DESTROY_FORCE : forces operation on mounted datasets * BE_DESTROY_ORIGIN: destroy the origin snapshot as well */ int be_destroy(libbe_handle_t *lbh, const char *name, int options) { struct be_destroy_data bdd; char origin[BE_MAXPATHLEN], path[BE_MAXPATHLEN]; zfs_handle_t *fs; char *snapdelim; int err, force, mounted; size_t rootlen; bdd.lbh = lbh; bdd.snapname = NULL; force = options & BE_DESTROY_FORCE; *origin = '\0'; be_root_concat(lbh, name, path); if ((snapdelim = strchr(path, '@')) == NULL) { if (!zfs_dataset_exists(lbh->lzh, path, ZFS_TYPE_FILESYSTEM)) return (set_error(lbh, BE_ERR_NOENT)); if (strcmp(path, lbh->rootfs) == 0 || strcmp(path, lbh->bootfs) == 0) return (set_error(lbh, BE_ERR_DESTROYACT)); fs = zfs_open(lbh->lzh, path, ZFS_TYPE_FILESYSTEM); if (fs == NULL) return (set_error(lbh, BE_ERR_ZFSOPEN)); if ((options & BE_DESTROY_ORIGIN) != 0 && zfs_prop_get(fs, ZFS_PROP_ORIGIN, origin, sizeof(origin), NULL, NULL, 0, 1) != 0) return (set_error(lbh, BE_ERR_NOORIGIN)); /* Don't destroy a mounted dataset unless force is specified */ if ((mounted = zfs_is_mounted(fs, NULL)) != 0) { if (force) { zfs_unmount(fs, NULL, 0); } else { free(bdd.snapname); return (set_error(lbh, BE_ERR_DESTROYMNT)); } } } else { if (!zfs_dataset_exists(lbh->lzh, path, ZFS_TYPE_SNAPSHOT)) return (set_error(lbh, BE_ERR_NOENT)); bdd.snapname = strdup(snapdelim + 1); if (bdd.snapname == NULL) return (set_error(lbh, BE_ERR_NOMEM)); *snapdelim = '\0'; fs = zfs_open(lbh->lzh, path, ZFS_TYPE_DATASET); if (fs == NULL) { free(bdd.snapname); return (set_error(lbh, BE_ERR_ZFSOPEN)); } } err = be_destroy_cb(fs, &bdd); zfs_close(fs); free(bdd.snapname); if (err != 0) { /* Children are still present or the mount is referenced */ if (err == EBUSY) return (set_error(lbh, BE_ERR_DESTROYMNT)); return (set_error(lbh, BE_ERR_UNKNOWN)); } if ((options & BE_DESTROY_ORIGIN) == 0) return (0); /* The origin can't possibly be shorter than the BE root */ rootlen = strlen(lbh->root); if (*origin == '\0' || strlen(origin) <= rootlen + 1) return (set_error(lbh, BE_ERR_INVORIGIN)); /* * We'll be chopping off the BE root and running this back through * be_destroy, so that we properly handle the origin snapshot whether * it be that of a deep BE or not. */ if (strncmp(origin, lbh->root, rootlen) != 0 || origin[rootlen] != '/') return (0); return (be_destroy(lbh, origin + rootlen + 1, options & ~BE_DESTROY_ORIGIN)); } static void be_setup_snapshot_name(libbe_handle_t *lbh, char *buf, size_t buflen) { time_t rawtime; int len, serial; time(&rawtime); len = strlen(buf); len += strftime(buf + len, buflen - len, "@%F-%T", localtime(&rawtime)); /* No room for serial... 
caller will do its best */ if (buflen - len < 2) return; for (serial = 0; serial < BE_SNAP_SERIAL_MAX; ++serial) { snprintf(buf + len, buflen - len, "-%d", serial); if (!zfs_dataset_exists(lbh->lzh, buf, ZFS_TYPE_SNAPSHOT)) return; } } int be_snapshot(libbe_handle_t *lbh, const char *source, const char *snap_name, bool recursive, char *result) { char buf[BE_MAXPATHLEN]; int err; be_root_concat(lbh, source, buf); if ((err = be_exists(lbh, buf)) != 0) return (set_error(lbh, err)); if (snap_name != NULL) { if (strlcat(buf, "@", sizeof(buf)) >= sizeof(buf)) return (set_error(lbh, BE_ERR_INVALIDNAME)); if (strlcat(buf, snap_name, sizeof(buf)) >= sizeof(buf)) return (set_error(lbh, BE_ERR_INVALIDNAME)); if (result != NULL) snprintf(result, BE_MAXPATHLEN, "%s@%s", source, snap_name); } else { be_setup_snapshot_name(lbh, buf, sizeof(buf)); if (result != NULL && strlcpy(result, strrchr(buf, '/') + 1, sizeof(buf)) >= sizeof(buf)) return (set_error(lbh, BE_ERR_INVALIDNAME)); } if ((err = zfs_snapshot(lbh->lzh, buf, recursive, NULL)) != 0) { switch (err) { case EZFS_INVALIDNAME: return (set_error(lbh, BE_ERR_INVALIDNAME)); default: /* * The other errors that zfs_ioc_snapshot might return * shouldn't happen if we've set things up properly, so * we'll gloss over them and call it UNKNOWN as it will * require further triage. */ if (errno == ENOTSUP) return (set_error(lbh, BE_ERR_NOPOOL)); return (set_error(lbh, BE_ERR_UNKNOWN)); } } return (BE_ERR_SUCCESS); } /* * Create the boot environment specified by the name parameter */ int be_create(libbe_handle_t *lbh, const char *name) { int err; err = be_create_from_existing(lbh, name, be_active_path(lbh)); return (set_error(lbh, err)); } static int be_deep_clone_prop(int prop, void *cb) { int err; struct libbe_dccb *dccb; zprop_source_t src; char pval[BE_MAXPATHLEN]; char source[BE_MAXPATHLEN]; char *val; dccb = cb; /* Skip some properties we don't want to touch */ if (prop == ZFS_PROP_CANMOUNT) return (ZPROP_CONT); /* Don't copy readonly properties */ if (zfs_prop_readonly(prop)) return (ZPROP_CONT); if ((err = zfs_prop_get(dccb->zhp, prop, (char *)&pval, sizeof(pval), &src, (char *)&source, sizeof(source), false))) /* Just continue if we fail to read a property */ return (ZPROP_CONT); - /* Only copy locally defined properties */ - if (src != ZPROP_SRC_LOCAL) + /* + * Only copy locally defined or received properties. This continues + * to avoid temporary/default/local properties intentionally without + * breaking received datasets. + */ + if (src != ZPROP_SRC_LOCAL && src != ZPROP_SRC_RECEIVED) return (ZPROP_CONT); /* Augment mountpoint with altroot, if needed */ val = pval; if (prop == ZFS_PROP_MOUNTPOINT) val = be_mountpoint_augmented(dccb->lbh, val); nvlist_add_string(dccb->props, zfs_prop_to_name(prop), val); return (ZPROP_CONT); } /* * Return the corresponding boot environment path for a given * dataset path, the constructed path is placed in 'result'. * * example: say our new boot environment name is 'bootenv' and * the dataset path is 'zroot/ROOT/default/data/set'. * * result should produce: 'zroot/ROOT/bootenv/data/set' */ static int be_get_path(struct libbe_deep_clone *ldc, const char *dspath, char *result, int result_size) { char *pos; char *child_dataset; /* match the root path for the boot environments */ pos = strstr(dspath, ldc->lbh->root); /* no match, different pools? 
*/ if (pos == NULL) return (BE_ERR_BADPATH); /* root path of the new boot environment */ snprintf(result, result_size, "%s/%s", ldc->lbh->root, ldc->bename); /* gets us to the parent dataset, the +1 consumes a trailing slash */ pos += strlen(ldc->lbh->root) + 1; /* skip the parent dataset */ if ((child_dataset = strchr(pos, '/')) != NULL) strlcat(result, child_dataset, result_size); return (BE_ERR_SUCCESS); } static int be_clone_cb(zfs_handle_t *ds, void *data) { int err; char be_path[BE_MAXPATHLEN]; char snap_path[BE_MAXPATHLEN]; const char *dspath; zfs_handle_t *snap_hdl; nvlist_t *props; struct libbe_deep_clone *ldc; struct libbe_dccb dccb; ldc = (struct libbe_deep_clone *)data; dspath = zfs_get_name(ds); snprintf(snap_path, sizeof(snap_path), "%s@%s", dspath, ldc->snapname); /* construct the boot environment path from the dataset we're cloning */ if (be_get_path(ldc, dspath, be_path, sizeof(be_path)) != BE_ERR_SUCCESS) return (set_error(ldc->lbh, BE_ERR_UNKNOWN)); /* the dataset to be created (i.e. the boot environment) already exists */ if (zfs_dataset_exists(ldc->lbh->lzh, be_path, ZFS_TYPE_DATASET)) return (set_error(ldc->lbh, BE_ERR_EXISTS)); /* no snapshot found for this dataset, silently skip it */ if (!zfs_dataset_exists(ldc->lbh->lzh, snap_path, ZFS_TYPE_SNAPSHOT)) return (0); if ((snap_hdl = zfs_open(ldc->lbh->lzh, snap_path, ZFS_TYPE_SNAPSHOT)) == NULL) return (set_error(ldc->lbh, BE_ERR_ZFSOPEN)); nvlist_alloc(&props, NV_UNIQUE_NAME, KM_SLEEP); nvlist_add_string(props, "canmount", "noauto"); dccb.lbh = ldc->lbh; dccb.zhp = ds; dccb.props = props; if (zprop_iter(be_deep_clone_prop, &dccb, B_FALSE, B_FALSE, ZFS_TYPE_FILESYSTEM) == ZPROP_INVAL) return (-1); if ((err = zfs_clone(snap_hdl, be_path, props)) != 0) return (set_error(ldc->lbh, BE_ERR_ZFSCLONE)); nvlist_free(props); zfs_close(snap_hdl); if (ldc->depth_limit == -1 || ldc->depth < ldc->depth_limit) { ldc->depth++; err = zfs_iter_filesystems(ds, be_clone_cb, ldc); ldc->depth--; } return (set_error(ldc->lbh, err)); } /* * Create a boot environment with a given name from a given snapshot. * Snapshots can be in the format 'zroot/ROOT/default@snapshot' or * 'default@snapshot'. In the latter case, 'default@snapshot' will be prepended * with the root path that libbe was initailized with. */ static int be_clone(libbe_handle_t *lbh, const char *bename, const char *snapshot, int depth) { int err; char snap_path[BE_MAXPATHLEN]; char *parentname, *snapname; zfs_handle_t *parent_hdl; struct libbe_deep_clone ldc; /* ensure the boot environment name is valid */ if ((err = be_validate_name(lbh, bename)) != 0) return (set_error(lbh, err)); /* * prepend the boot environment root path if we're * given a partial snapshot name. 
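 *
 * e.g. a hypothetical "default@snap" becomes "zroot/ROOT/default@snap"
 * when the library root is "zroot/ROOT"; a fully qualified snapshot
 * path is passed through unchanged.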
*/ if ((err = be_root_concat(lbh, snapshot, snap_path)) != 0) return (set_error(lbh, err)); /* ensure the snapshot exists */ if ((err = be_validate_snap(lbh, snap_path)) != 0) return (set_error(lbh, err)); /* get a copy of the snapshot path so we can disect it */ if ((parentname = strdup(snap_path)) == NULL) return (set_error(lbh, BE_ERR_UNKNOWN)); /* split dataset name from snapshot name */ snapname = strchr(parentname, '@'); if (snapname == NULL) { free(parentname); return (set_error(lbh, BE_ERR_UNKNOWN)); } *snapname = '\0'; snapname++; /* set-up the boot environment */ ldc.lbh = lbh; ldc.bename = bename; ldc.snapname = snapname; ldc.depth = 0; ldc.depth_limit = depth; /* the boot environment will be cloned from this dataset */ parent_hdl = zfs_open(lbh->lzh, parentname, ZFS_TYPE_DATASET); /* create the boot environment */ err = be_clone_cb(parent_hdl, &ldc); free(parentname); return (set_error(lbh, err)); } /* * Create a boot environment from pre-existing snapshot, specifying a depth. */ int be_create_depth(libbe_handle_t *lbh, const char *bename, const char *snap, int depth) { return (be_clone(lbh, bename, snap, depth)); } /* * Create the boot environment from pre-existing snapshot */ int be_create_from_existing_snap(libbe_handle_t *lbh, const char *bename, const char *snap) { return (be_clone(lbh, bename, snap, -1)); } /* * Create a boot environment from an existing boot environment */ int be_create_from_existing(libbe_handle_t *lbh, const char *bename, const char *old) { int err; char snap[BE_MAXPATHLEN]; if ((err = be_snapshot(lbh, old, NULL, true, snap)) != 0) return (set_error(lbh, err)); err = be_clone(lbh, bename, snap, -1); return (set_error(lbh, err)); } /* * Verifies that a snapshot has a valid name, exists, and has a mountpoint of * '/'. Returns BE_ERR_SUCCESS (0), upon success, or the relevant BE_ERR_* upon * failure. Does not set the internal library error state. */ int be_validate_snap(libbe_handle_t *lbh, const char *snap_name) { if (strlen(snap_name) >= BE_MAXPATHLEN) return (BE_ERR_PATHLEN); if (!zfs_name_valid(snap_name, ZFS_TYPE_SNAPSHOT)) return (BE_ERR_INVALIDNAME); if (!zfs_dataset_exists(lbh->lzh, snap_name, ZFS_TYPE_SNAPSHOT)) return (BE_ERR_NOENT); return (BE_ERR_SUCCESS); } /* * Idempotently appends the name argument to the root boot environment path * and copies the resulting string into the result buffer (which is assumed * to be at least BE_MAXPATHLEN characters long. Returns BE_ERR_SUCCESS upon * success, BE_ERR_PATHLEN if the resulting path is longer than BE_MAXPATHLEN, * or BE_ERR_INVALIDNAME if the name is a path that does not begin with * zfs_be_root. Does not set internal library error state. */ int be_root_concat(libbe_handle_t *lbh, const char *name, char *result) { size_t name_len, root_len; name_len = strlen(name); root_len = strlen(lbh->root); /* Act idempotently; return be name if it is already a full path */ if (strrchr(name, '/') != NULL) { if (strstr(name, lbh->root) != name) return (BE_ERR_INVALIDNAME); if (name_len >= BE_MAXPATHLEN) return (BE_ERR_PATHLEN); strlcpy(result, name, BE_MAXPATHLEN); return (BE_ERR_SUCCESS); } else if (name_len + root_len + 1 < BE_MAXPATHLEN) { snprintf(result, BE_MAXPATHLEN, "%s/%s", lbh->root, name); return (BE_ERR_SUCCESS); } return (BE_ERR_PATHLEN); } /* * Verifies the validity of a boot environment name (A-Za-z0-9-_.). Returns * BE_ERR_SUCCESS (0) if name is valid, otherwise returns BE_ERR_INVALIDNAME * or BE_ERR_PATHLEN. * Does not set internal library error state. 
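 *
 * For example, a hypothetical "11.2-p4_backup" satisfies the documented
 * character set, while a snapshot-style name such as "default@backup"
 * does not; the additional length check below rejects names whose full
 * dataset path would exceed MAXNAMELEN.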
*/ int be_validate_name(libbe_handle_t *lbh, const char *name) { /* * Impose the additional restriction that the entire dataset name must * not exceed the maximum length of a dataset, i.e. MAXNAMELEN. */ if (strlen(lbh->root) + 1 + strlen(name) > MAXNAMELEN) return (BE_ERR_PATHLEN); if (!zfs_name_valid(name, ZFS_TYPE_DATASET)) return (BE_ERR_INVALIDNAME); return (BE_ERR_SUCCESS); } /* * usage */ int be_rename(libbe_handle_t *lbh, const char *old, const char *new) { char full_old[BE_MAXPATHLEN]; char full_new[BE_MAXPATHLEN]; zfs_handle_t *zfs_hdl; int err; /* * be_validate_name is documented not to set error state, so we should * do so here. */ if ((err = be_validate_name(lbh, new)) != 0) return (set_error(lbh, err)); if ((err = be_root_concat(lbh, old, full_old)) != 0) return (set_error(lbh, err)); if ((err = be_root_concat(lbh, new, full_new)) != 0) return (set_error(lbh, err)); if (!zfs_dataset_exists(lbh->lzh, full_old, ZFS_TYPE_DATASET)) return (set_error(lbh, BE_ERR_NOENT)); if (zfs_dataset_exists(lbh->lzh, full_new, ZFS_TYPE_DATASET)) return (set_error(lbh, BE_ERR_EXISTS)); if ((zfs_hdl = zfs_open(lbh->lzh, full_old, ZFS_TYPE_FILESYSTEM)) == NULL) return (set_error(lbh, BE_ERR_ZFSOPEN)); /* recurse, nounmount, forceunmount */ struct renameflags flags = { .nounmount = 1, }; err = zfs_rename(zfs_hdl, NULL, full_new, flags); zfs_close(zfs_hdl); if (err != 0) return (set_error(lbh, BE_ERR_UNKNOWN)); return (0); } int be_export(libbe_handle_t *lbh, const char *bootenv, int fd) { char snap_name[BE_MAXPATHLEN]; char buf[BE_MAXPATHLEN]; zfs_handle_t *zfs; int err; if ((err = be_snapshot(lbh, bootenv, NULL, true, snap_name)) != 0) /* Use the error set by be_snapshot */ return (err); be_root_concat(lbh, snap_name, buf); if ((zfs = zfs_open(lbh->lzh, buf, ZFS_TYPE_DATASET)) == NULL) return (set_error(lbh, BE_ERR_ZFSOPEN)); err = zfs_send_one(zfs, NULL, fd, 0); zfs_close(zfs); return (err); } int be_import(libbe_handle_t *lbh, const char *bootenv, int fd) { char buf[BE_MAXPATHLEN]; nvlist_t *props; zfs_handle_t *zfs; recvflags_t flags = { .nomount = 1 }; int err; be_root_concat(lbh, bootenv, buf); if ((err = zfs_receive(lbh->lzh, buf, NULL, &flags, fd, NULL)) != 0) { switch (err) { case EINVAL: return (set_error(lbh, BE_ERR_NOORIGIN)); case ENOENT: return (set_error(lbh, BE_ERR_NOENT)); case EIO: return (set_error(lbh, BE_ERR_IO)); default: return (set_error(lbh, BE_ERR_UNKNOWN)); } } if ((zfs = zfs_open(lbh->lzh, buf, ZFS_TYPE_FILESYSTEM)) == NULL) return (set_error(lbh, BE_ERR_ZFSOPEN)); nvlist_alloc(&props, NV_UNIQUE_NAME, KM_SLEEP); nvlist_add_string(props, "canmount", "noauto"); nvlist_add_string(props, "mountpoint", "/"); err = zfs_prop_set_list(zfs, props); nvlist_free(props); zfs_close(zfs); if (err != 0) return (set_error(lbh, BE_ERR_UNKNOWN)); return (0); } #if SOON static int be_create_child_noent(libbe_handle_t *lbh, const char *active, const char *child_path) { nvlist_t *props; zfs_handle_t *zfs; int err; nvlist_alloc(&props, NV_UNIQUE_NAME, KM_SLEEP); nvlist_add_string(props, "canmount", "noauto"); nvlist_add_string(props, "mountpoint", child_path); /* Create */ if ((err = zfs_create(lbh->lzh, active, ZFS_TYPE_DATASET, props)) != 0) { switch (err) { case EZFS_EXISTS: return (set_error(lbh, BE_ERR_EXISTS)); case EZFS_NOENT: return (set_error(lbh, BE_ERR_NOENT)); case EZFS_BADTYPE: case EZFS_BADVERSION: return (set_error(lbh, BE_ERR_NOPOOL)); case EZFS_BADPROP: default: /* We set something up wrong, probably... 
*/ return (set_error(lbh, BE_ERR_UNKNOWN)); } } nvlist_free(props); if ((zfs = zfs_open(lbh->lzh, active, ZFS_TYPE_DATASET)) == NULL) return (set_error(lbh, BE_ERR_ZFSOPEN)); /* Set props */ if ((err = zfs_prop_set(zfs, "canmount", "noauto")) != 0) { zfs_close(zfs); /* * Similar to other cases, this shouldn't fail unless we've * done something wrong. This is a new dataset that shouldn't * have been mounted anywhere between creation and now. */ if (err == EZFS_NOMEM) return (set_error(lbh, BE_ERR_NOMEM)); return (set_error(lbh, BE_ERR_UNKNOWN)); } zfs_close(zfs); return (BE_ERR_SUCCESS); } static int be_create_child_cloned(libbe_handle_t *lbh, const char *active) { char buf[BE_MAXPATHLEN], tmp[BE_MAXPATHLEN];; zfs_handle_t *zfs; int err; /* XXX TODO ? */ /* * Establish if the existing path is a zfs dataset or just * the subdirectory of one */ strlcpy(tmp, "tmp/be_snap.XXXXX", sizeof(tmp)); if (mktemp(tmp) == NULL) return (set_error(lbh, BE_ERR_UNKNOWN)); be_root_concat(lbh, tmp, buf); printf("Here %s?\n", buf); if ((err = zfs_snapshot(lbh->lzh, buf, false, NULL)) != 0) { switch (err) { case EZFS_INVALIDNAME: return (set_error(lbh, BE_ERR_INVALIDNAME)); default: /* * The other errors that zfs_ioc_snapshot might return * shouldn't happen if we've set things up properly, so * we'll gloss over them and call it UNKNOWN as it will * require further triage. */ if (errno == ENOTSUP) return (set_error(lbh, BE_ERR_NOPOOL)); return (set_error(lbh, BE_ERR_UNKNOWN)); } } /* Clone */ if ((zfs = zfs_open(lbh->lzh, buf, ZFS_TYPE_SNAPSHOT)) == NULL) return (BE_ERR_ZFSOPEN); if ((err = zfs_clone(zfs, active, NULL)) != 0) /* XXX TODO correct error */ return (set_error(lbh, BE_ERR_UNKNOWN)); /* set props */ zfs_close(zfs); return (BE_ERR_SUCCESS); } int be_add_child(libbe_handle_t *lbh, const char *child_path, bool cp_if_exists) { struct stat sb; char active[BE_MAXPATHLEN], buf[BE_MAXPATHLEN]; nvlist_t *props; const char *s; /* Require absolute paths */ if (*child_path != '/') return (set_error(lbh, BE_ERR_BADPATH)); strlcpy(active, be_active_path(lbh), BE_MAXPATHLEN); strcpy(buf, active); /* Create non-mountable parent dataset(s) */ s = child_path; for (char *p; (p = strchr(s+1, '/')) != NULL; s = p) { size_t len = p - s; strncat(buf, s, len); nvlist_alloc(&props, NV_UNIQUE_NAME, KM_SLEEP); nvlist_add_string(props, "canmount", "off"); nvlist_add_string(props, "mountpoint", "none"); zfs_create(lbh->lzh, buf, ZFS_TYPE_DATASET, props); nvlist_free(props); } /* Path does not exist as a descendent of / yet */ if (strlcat(active, child_path, BE_MAXPATHLEN) >= BE_MAXPATHLEN) return (set_error(lbh, BE_ERR_PATHLEN)); if (stat(child_path, &sb) != 0) { /* Verify that error is ENOENT */ if (errno != ENOENT) return (set_error(lbh, BE_ERR_UNKNOWN)); return (be_create_child_noent(lbh, active, child_path)); } else if (cp_if_exists) /* Path is already a descendent of / and should be copied */ return (be_create_child_cloned(lbh, active)); return (set_error(lbh, BE_ERR_EXISTS)); } #endif /* SOON */ static int be_set_nextboot(libbe_handle_t *lbh, nvlist_t *config, uint64_t pool_guid, const char *zfsdev) { nvlist_t **child; uint64_t vdev_guid; int c, children; if (nvlist_lookup_nvlist_array(config, ZPOOL_CONFIG_CHILDREN, &child, &children) == 0) { for (c = 0; c < children; ++c) if (be_set_nextboot(lbh, child[c], pool_guid, zfsdev) != 0) return (1); return (0); } if (nvlist_lookup_uint64(config, ZPOOL_CONFIG_GUID, &vdev_guid) != 0) { return (1); } if (zpool_nextboot(lbh->lzh, pool_guid, vdev_guid, zfsdev) != 0) { 
perror("ZFS_IOC_NEXTBOOT failed"); return (1); } return (0); } /* * Deactivate old BE dataset; currently just sets canmount=noauto */ static int be_deactivate(libbe_handle_t *lbh, const char *ds) { zfs_handle_t *zfs; if ((zfs = zfs_open(lbh->lzh, ds, ZFS_TYPE_DATASET)) == NULL) return (1); if (zfs_prop_set(zfs, "canmount", "noauto") != 0) return (1); zfs_close(zfs); return (0); } int be_activate(libbe_handle_t *lbh, const char *bootenv, bool temporary) { char be_path[BE_MAXPATHLEN]; char buf[BE_MAXPATHLEN]; nvlist_t *config, *dsprops, *vdevs; char *origin; uint64_t pool_guid; zfs_handle_t *zhp; int err; be_root_concat(lbh, bootenv, be_path); /* Note: be_exists fails if mountpoint is not / */ if ((err = be_exists(lbh, be_path)) != 0) return (set_error(lbh, err)); if (temporary) { config = zpool_get_config(lbh->active_phandle, NULL); if (config == NULL) /* config should be fetchable... */ return (set_error(lbh, BE_ERR_UNKNOWN)); if (nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_GUID, &pool_guid) != 0) /* Similarly, it shouldn't be possible */ return (set_error(lbh, BE_ERR_UNKNOWN)); /* Expected format according to zfsbootcfg(8) man */ snprintf(buf, sizeof(buf), "zfs:%s:", be_path); /* We have no config tree */ if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE, &vdevs) != 0) return (set_error(lbh, BE_ERR_NOPOOL)); return (be_set_nextboot(lbh, vdevs, pool_guid, buf)); } else { if (be_deactivate(lbh, lbh->bootfs) != 0) return (-1); /* Obtain bootenv zpool */ err = zpool_set_prop(lbh->active_phandle, "bootfs", be_path); if (err) return (-1); zhp = zfs_open(lbh->lzh, be_path, ZFS_TYPE_FILESYSTEM); if (zhp == NULL) return (-1); if (be_prop_list_alloc(&dsprops) != 0) return (-1); if (be_get_dataset_props(lbh, be_path, dsprops) != 0) { nvlist_free(dsprops); return (-1); } if (nvlist_lookup_string(dsprops, "origin", &origin) == 0) err = zfs_promote(zhp); nvlist_free(dsprops); zfs_close(zhp); if (err) return (-1); } return (BE_ERR_SUCCESS); } Index: user/ngie/bug-237403/lib/libc/gen/Makefile.inc =================================================================== --- user/ngie/bug-237403/lib/libc/gen/Makefile.inc (revision 346925) +++ user/ngie/bug-237403/lib/libc/gen/Makefile.inc (revision 346926) @@ -1,543 +1,545 @@ # @(#)Makefile.inc 8.6 (Berkeley) 5/4/95 # $FreeBSD$ # machine-independent gen sources .PATH: ${LIBC_SRCTOP}/${LIBC_ARCH}/gen ${LIBC_SRCTOP}/gen CONFS= shells SRCS+= __getosreldate.c \ __pthread_mutex_init_calloc_cb_stub.c \ __xuname.c \ _once_stub.c \ _pthread_stubs.c \ _rand48.c \ _spinlock_stub.c \ _thread_init.c \ alarm.c \ arc4random.c \ arc4random-compat.c \ arc4random_uniform.c \ assert.c \ auxv.c \ basename.c \ basename_compat.c \ cap_sandboxed.c \ check_utility_compat.c \ clock.c \ clock_getcpuclockid.c \ closedir.c \ confstr.c \ crypt.c \ ctermid.c \ daemon.c \ devname.c \ dirfd.c \ dirname.c \ dirname_compat.c \ disklabel.c \ dlfcn.c \ drand48.c \ dup3.c \ elf_utils.c \ erand48.c \ err.c \ errlst.c \ errno.c \ exec.c \ exect.c \ fdevname.c \ feature_present.c \ fmtcheck.c \ fmtmsg.c \ fnmatch.c \ fpclassify.c \ frexp.c \ fstab.c \ ftok.c \ fts.c \ ftw.c \ getbootfile.c \ getbsize.c \ getcap.c \ getcwd.c \ getdomainname.c \ getentropy.c \ getgrent.c \ getgrouplist.c \ gethostname.c \ getloadavg.c \ getlogin.c \ getmntinfo.c \ getnetgrent.c \ getosreldate.c \ getpagesize.c \ getpagesizes.c \ getpeereid.c \ getprogname.c \ getpwent.c \ getttyent.c \ getusershell.c \ getutxent.c \ getvfsbyname.c \ glob.c \ initgroups.c \ isatty.c \ isinf.c \ isnan.c \ jrand48.c \ lcong48.c \ 
libc_dlopen.c \ lockf.c \ lrand48.c \ mrand48.c \ nftw.c \ nice.c \ nlist.c \ nrand48.c \ opendir.c \ pause.c \ pmadvise.c \ popen.c \ posix_spawn.c \ psignal.c \ pututxline.c \ pw_scan.c \ raise.c \ readdir.c \ readpassphrase.c \ recvmmsg.c \ rewinddir.c \ scandir.c \ seed48.c \ seekdir.c \ semctl.c \ sendmmsg.c \ setdomainname.c \ sethostname.c \ setjmperr.c \ setmode.c \ setproctitle.c \ setprogname.c \ siginterrupt.c \ siglist.c \ signal.c \ sigsetops.c \ sleep.c \ srand48.c \ statvfs.c \ stringlist.c \ strtofflags.c \ sysconf.c \ sysctl.c \ sysctlbyname.c \ sysctlnametomib.c \ syslog.c \ telldir.c \ termios.c \ time.c \ times.c \ timespec_get.c \ timezone.c \ tls.c \ ttyname.c \ ttyslot.c \ ualarm.c \ ulimit.c \ uname.c \ usleep.c \ utime.c \ utxdb.c \ valloc.c \ wait.c \ wait3.c \ waitpid.c \ waitid.c \ wordexp.c .if ${MK_SYMVER} == yes SRCS+= devname-compat11.c \ fts-compat.c \ fts-compat11.c \ ftw-compat11.c \ getmntinfo-compat11.c \ glob-compat11.c \ nftw-compat11.c \ readdir-compat11.c \ scandir-compat11.c \ unvis-compat.c .endif CFLAGS.arc4random.c= -I${SRCTOP}/sys -I${SRCTOP}/sys/crypto/chacha20 .PATH: ${SRCTOP}/contrib/libc-pwcache SRCS+= pwcache.c pwcache.h .PATH: ${SRCTOP}/contrib/libc-vis CFLAGS+= -I${SRCTOP}/contrib/libc-vis SRCS+= unvis.c vis.c MISRCS+=modf.c CANCELPOINTS_SRCS=sem.c sem_new.c .for src in ${CANCELPOINTS_SRCS} SRCS+=cancelpoints_${src} CLEANFILES+=cancelpoints_${src} cancelpoints_${src}: ${LIBC_SRCTOP}/gen/${src} .NOMETA ln -sf ${.ALLSRC} ${.TARGET} .endfor SYM_MAPS+=${LIBC_SRCTOP}/gen/Symbol.map # machine-dependent gen sources .sinclude "${LIBC_SRCTOP}/${LIBC_ARCH}/gen/Makefile.inc" MAN+= alarm.3 \ arc4random.3 \ + auxv.3 \ basename.3 \ cap_rights_get.3 \ cap_sandboxed.3 \ check_utility_compat.3 \ clock.3 \ clock_getcpuclockid.3 \ confstr.3 \ ctermid.3 \ daemon.3 \ devname.3 \ directory.3 \ dirname.3 \ dl_iterate_phdr.3 \ dladdr.3 \ dlinfo.3 \ dllockinit.3 \ dlopen.3 \ dup3.3 \ err.3 \ exec.3 \ feature_present.3 \ fmtcheck.3 \ fmtmsg.3 \ fnmatch.3 \ fpclassify.3 \ frexp.3 \ ftok.3 \ fts.3 \ ftw.3 \ getbootfile.3 \ getbsize.3 \ getcap.3 \ getcontext.3 \ getcwd.3 \ getdiskbyname.3 \ getdomainname.3 \ getentropy.3 \ getfsent.3 \ getgrent.3 \ getgrouplist.3 \ gethostname.3 \ getloadavg.3 \ getmntinfo.3 \ getnetgrent.3 \ getosreldate.3 \ getpagesize.3 \ getpagesizes.3 \ getpass.3 \ getpeereid.3 \ getprogname.3 \ getpwent.3 \ getttyent.3 \ getusershell.3 \ getutxent.3 \ getvfsbyname.3 \ glob.3 \ initgroups.3 \ isgreater.3 \ ldexp.3 \ lockf.3 \ makecontext.3 \ modf.3 \ nice.3 \ nlist.3 \ pause.3 \ popen.3 \ posix_spawn.3 \ posix_spawn_file_actions_addopen.3 \ posix_spawn_file_actions_init.3 \ posix_spawnattr_getflags.3 \ posix_spawnattr_getpgroup.3 \ posix_spawnattr_getschedparam.3 \ posix_spawnattr_getschedpolicy.3 \ posix_spawnattr_init.3 \ posix_spawnattr_getsigdefault.3 \ posix_spawnattr_getsigmask.3 \ psignal.3 \ pwcache.3 \ raise.3 \ rand48.3 \ readpassphrase.3 \ rfork_thread.3 \ scandir.3 \ sem_destroy.3 \ sem_getvalue.3 \ sem_init.3 \ sem_open.3 \ sem_post.3 \ sem_timedwait.3 \ sem_wait.3 \ setjmp.3 \ setmode.3 \ setproctitle.3 \ siginterrupt.3 \ signal.3 \ sigsetops.3 \ sleep.3 \ statvfs.3 \ stringlist.3 \ strtofflags.3 \ sysconf.3 \ sysctl.3 \ syslog.3 \ tcgetpgrp.3 \ tcgetsid.3 \ tcsendbreak.3 \ tcsetattr.3 \ tcsetpgrp.3 \ tcsetsid.3 \ time.3 \ times.3 \ timespec_get.3 \ timezone.3 \ ttyname.3 \ tzset.3 \ ualarm.3 \ ucontext.3 \ ulimit.3 \ uname.3 \ unvis.3 \ usleep.3 \ utime.3 \ valloc.3 \ vis.3 \ wordexp.3 MLINKS+=arc4random.3 arc4random_buf.3 \ 
arc4random.3 arc4random_uniform.3 +MLINKS+=auxv.3 elf_aux_info.3 MLINKS+=ctermid.3 ctermid_r.3 MLINKS+=devname.3 devname_r.3 MLINKS+=devname.3 fdevname.3 MLINKS+=devname.3 fdevname_r.3 MLINKS+=directory.3 closedir.3 \ directory.3 dirfd.3 \ directory.3 fdclosedir.3 \ directory.3 fdopendir.3 \ directory.3 opendir.3 \ directory.3 readdir.3 \ directory.3 readdir_r.3 \ directory.3 rewinddir.3 \ directory.3 seekdir.3 \ directory.3 telldir.3 MLINKS+=dlopen.3 fdlopen.3 \ dlopen.3 dlclose.3 \ dlopen.3 dlerror.3 \ dlopen.3 dlfunc.3 \ dlopen.3 dlsym.3 \ dlopen.3 dlvsym.3 MLINKS+=err.3 err_set_exit.3 \ err.3 err_set_file.3 \ err.3 errc.3 \ err.3 errx.3 \ err.3 verr.3 \ err.3 verrc.3 \ err.3 verrx.3 \ err.3 vwarn.3 \ err.3 vwarnc.3 \ err.3 vwarnx.3 \ err.3 warnc.3 \ err.3 warn.3 \ err.3 warnx.3 MLINKS+=exec.3 execl.3 \ exec.3 execle.3 \ exec.3 execlp.3 \ exec.3 exect.3 \ exec.3 execv.3 \ exec.3 execvP.3 \ exec.3 execvp.3 MLINKS+=fpclassify.3 finite.3 \ fpclassify.3 finitef.3 \ fpclassify.3 isfinite.3 \ fpclassify.3 isinf.3 \ fpclassify.3 isnan.3 \ fpclassify.3 isnormal.3 MLINKS+=frexp.3 frexpf.3 \ frexp.3 frexpl.3 MLINKS+=fts.3 fts_children.3 \ fts.3 fts_close.3 \ fts.3 fts_open.3 \ fts.3 fts_read.3 \ fts.3 fts_set.3 \ fts.3 fts_set_clientptr.3 \ fts.3 fts_get_clientptr.3 \ fts.3 fts_get_stream.3 MLINKS+=ftw.3 nftw.3 MLINKS+=getcap.3 cgetcap.3 \ getcap.3 cgetclose.3 \ getcap.3 cgetent.3 \ getcap.3 cgetfirst.3 \ getcap.3 cgetmatch.3 \ getcap.3 cgetnext.3 \ getcap.3 cgetnum.3 \ getcap.3 cgetset.3 \ getcap.3 cgetstr.3 \ getcap.3 cgetustr.3 MLINKS+=getcwd.3 getwd.3 MLINKS+=getcontext.3 getcontextx.3 MLINKS+=getcontext.3 setcontext.3 MLINKS+=getdomainname.3 setdomainname.3 MLINKS+=getfsent.3 endfsent.3 \ getfsent.3 getfsfile.3 \ getfsent.3 getfsspec.3 \ getfsent.3 getfstype.3 \ getfsent.3 setfsent.3 \ getfsent.3 setfstab.3 \ getfsent.3 getfstab.3 MLINKS+=getgrent.3 endgrent.3 \ getgrent.3 getgrgid.3 \ getgrent.3 getgrnam.3 \ getgrent.3 setgrent.3 \ getgrent.3 setgroupent.3 \ getgrent.3 getgrent_r.3 \ getgrent.3 getgrnam_r.3 \ getgrent.3 getgrgid_r.3 MLINKS+=gethostname.3 sethostname.3 MLINKS+=getnetgrent.3 endnetgrent.3 \ getnetgrent.3 getnetgrent_r.3 \ getnetgrent.3 innetgr.3 \ getnetgrent.3 setnetgrent.3 MLINKS+=getprogname.3 setprogname.3 MLINKS+=getpwent.3 endpwent.3 \ getpwent.3 getpwnam.3 \ getpwent.3 getpwuid.3 \ getpwent.3 setpassent.3 \ getpwent.3 setpwent.3 \ getpwent.3 setpwfile.3 \ getpwent.3 getpwent_r.3 \ getpwent.3 getpwnam_r.3 \ getpwent.3 getpwuid_r.3 MLINKS+=getttyent.3 endttyent.3 \ getttyent.3 getttynam.3 \ getttyent.3 isdialuptty.3 \ getttyent.3 isnettty.3 \ getttyent.3 setttyent.3 MLINKS+=getusershell.3 endusershell.3 \ getusershell.3 setusershell.3 MLINKS+=getutxent.3 endutxent.3 \ getutxent.3 getutxid.3 \ getutxent.3 getutxline.3 \ getutxent.3 getutxuser.3 \ getutxent.3 pututxline.3 \ getutxent.3 setutxdb.3 \ getutxent.3 setutxent.3 \ getutxent.3 utmpx.3 MLINKS+=glob.3 globfree.3 MLINKS+=isgreater.3 isgreaterequal.3 \ isgreater.3 isless.3 \ isgreater.3 islessequal.3 \ isgreater.3 islessgreater.3 \ isgreater.3 isunordered.3 MLINKS+=ldexp.3 ldexpf.3 \ ldexp.3 ldexpl.3 MLINKS+=makecontext.3 swapcontext.3 MLINKS+=modf.3 modff.3 \ modf.3 modfl.3 MLINKS+=popen.3 pclose.3 MLINKS+=posix_spawn.3 posix_spawnp.3 \ posix_spawn_file_actions_addopen.3 posix_spawn_file_actions_addclose.3 \ posix_spawn_file_actions_addopen.3 posix_spawn_file_actions_adddup2.3 \ posix_spawn_file_actions_init.3 posix_spawn_file_actions_destroy.3 \ posix_spawnattr_getflags.3 posix_spawnattr_setflags.3 \ 
posix_spawnattr_getpgroup.3 posix_spawnattr_setpgroup.3 \ posix_spawnattr_getschedparam.3 posix_spawnattr_setschedparam.3 \ posix_spawnattr_getschedpolicy.3 posix_spawnattr_setschedpolicy.3 \ posix_spawnattr_getsigdefault.3 posix_spawnattr_setsigdefault.3 \ posix_spawnattr_getsigmask.3 posix_spawnattr_setsigmask.3 \ posix_spawnattr_init.3 posix_spawnattr_destroy.3 MLINKS+=psignal.3 strsignal.3 \ psignal.3 sys_siglist.3 \ psignal.3 sys_signame.3 MLINKS+=pwcache.3 group_from_gid.3 \ pwcache.3 user_from_uid.3 MLINKS+=rand48.3 _rand48.3 \ rand48.3 drand48.3 \ rand48.3 erand48.3 \ rand48.3 jrand48.3 \ rand48.3 lcong48.3 \ rand48.3 lrand48.3 \ rand48.3 mrand48.3 \ rand48.3 nrand48.3 \ rand48.3 seed48.3 \ rand48.3 srand48.3 MLINKS+=recv.2 recvmmsg.2 MLINKS+=scandir.3 alphasort.3 MLINKS+=sem_open.3 sem_close.3 \ sem_open.3 sem_unlink.3 MLINKS+=sem_wait.3 sem_trywait.3 MLINKS+=sem_timedwait.3 sem_clockwait_np.3 MLINKS+=send.2 sendmmsg.2 MLINKS+=setjmp.3 _longjmp.3 \ setjmp.3 _setjmp.3 \ setjmp.3 longjmp.3 \ setjmp.3 longjmperr.3 \ setjmp.3 longjmperror.3 \ setjmp.3 siglongjmp.3 \ setjmp.3 sigsetjmp.3 MLINKS+=setmode.3 getmode.3 MLINKS+=setproctitle.3 setproctitle_fast.3 MLINKS+=sigsetops.3 sigaddset.3 \ sigsetops.3 sigdelset.3 \ sigsetops.3 sigemptyset.3 \ sigsetops.3 sigfillset.3 \ sigsetops.3 sigismember.3 MLINKS+=statvfs.3 fstatvfs.3 MLINKS+=stringlist.3 sl_add.3 \ stringlist.3 sl_find.3 \ stringlist.3 sl_free.3 \ stringlist.3 sl_init.3 MLINKS+=strtofflags.3 fflagstostr.3 MLINKS+=sysctl.3 sysctlbyname.3 \ sysctl.3 sysctlnametomib.3 MLINKS+=syslog.3 closelog.3 \ syslog.3 openlog.3 \ syslog.3 setlogmask.3 \ syslog.3 vsyslog.3 MLINKS+=tcsendbreak.3 tcdrain.3 \ tcsendbreak.3 tcflow.3 \ tcsendbreak.3 tcflush.3 MLINKS+=tcsetattr.3 cfgetispeed.3 \ tcsetattr.3 cfgetospeed.3 \ tcsetattr.3 cfmakeraw.3 \ tcsetattr.3 cfmakesane.3 \ tcsetattr.3 cfsetispeed.3 \ tcsetattr.3 cfsetospeed.3 \ tcsetattr.3 cfsetspeed.3 \ tcsetattr.3 tcgetattr.3 MLINKS+=ttyname.3 isatty.3 \ ttyname.3 ttyname_r.3 MLINKS+=tzset.3 tzsetwall.3 MLINKS+=unvis.3 strunvis.3 \ unvis.3 strunvisx.3 MLINKS+=vis.3 nvis.3 \ vis.3 snvis.3 \ vis.3 strenvisx.3 \ vis.3 strnunvis.3 \ vis.3 strnunvisx.3 \ vis.3 strnvis.3 \ vis.3 strnvisx.3 \ vis.3 strsenvisx.3 \ vis.3 strsnvis.3 \ vis.3 strsnvisx.3 \ vis.3 strsvis.3 \ vis.3 strsvisx.3 \ vis.3 strvis.3 \ vis.3 strvisx.3 \ vis.3 svis.3 MLINKS+=wordexp.3 wordfree.3 Index: user/ngie/bug-237403/lib/libc/gen/auxv.3 =================================================================== --- user/ngie/bug-237403/lib/libc/gen/auxv.3 (nonexistent) +++ user/ngie/bug-237403/lib/libc/gen/auxv.3 (revision 346926) @@ -0,0 +1,86 @@ +.\" +.\" Copyright (c) 2019 Ian Lepore +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR +.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES +.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
+.\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, +.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT +.\" NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +.\" DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +.\" THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +.\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF +.\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.Dd April 25, 2019 +.Dt ELF_AUX_INFO 3 +.Os +.Sh NAME +.Nm elf_aux_info +.Nd extract data from the elf auxiliary vector of the current process +.Sh LIBRARY +.Lb libc +.Sh SYNOPSIS +.In sys/auxv.h +.Ft int +.Fn elf_aux_info "int aux" "void *buf" "int buflen" +.Sh DESCRIPTION +The +.Fn elf_aux_info +function retrieves the auxiliary info vector requested in +.Va aux . +The information is stored into the provided buffer if it will fit. +The following values, defined in +.In sys/elf_common.h +can be requested: +.Bl -tag -width AT_OSRELDATE +.It AT_CANARY +The canary value for SSP. +.It AT_HWCAP +CPU / hardware feature flags. +.It AT_HWCAP2 +CPU / hardware feature flags. +.It AT_NCPUS +Number of CPUs. +.It AT_OSRELDATE +Kernel OSRELDATE. +.It AT_PAGESIZES +Vector of page sizes. +.It AT_PAGESZ +Page size in bytes. +.It AT_TIMEKEEP +Pointer to VDSO timehands (for library internal use). +.El +.Sh RETURN VALUES +Returns zero on success, or an error number on failure. +.Sh ERRORS +.Bl -tag -width Er +.It Bq Er EINVAL +An unknown item was requested. +.It Bq Er EINVAL +The provided buffer was not the right size for the requested item. +.It Bq Er ENOENT +The requested item is not available. +.El +.Sh HISTORY +The +.Fn elf_aux_info +function appeared in +.Fx 12.0 . +.Sh BUGS +Only a small subset of available auxiliary info vector items are +accessible with this function. +Some items require a "right-sized" buffer while others just require a +"big enough" buffer. Property changes on: user/ngie/bug-237403/lib/libc/gen/auxv.3 ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: user/ngie/bug-237403/lib/libvgl/bitmap.c =================================================================== --- user/ngie/bug-237403/lib/libvgl/bitmap.c (revision 346925) +++ user/ngie/bug-237403/lib/libvgl/bitmap.c (revision 346926) @@ -1,300 +1,320 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1991-1997 Søren Schmidt * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer, * in this position and unchanged. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the author may not be used to endorse or promote products * derived from this software without specific prior written permission. 
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include "vgl.h" #define min(x, y) (((x) < (y)) ? (x) : (y)) static byte mask[8] = {0xff, 0x7f, 0x3f, 0x1f, 0x0f, 0x07, 0x03, 0x01}; static int color2bit[16] = {0x00000000, 0x00000001, 0x00000100, 0x00000101, 0x00010000, 0x00010001, 0x00010100, 0x00010101, 0x01000000, 0x01000001, 0x01000100, 0x01000101, 0x01010000, 0x01010001, 0x01010100, 0x01010101}; static void WriteVerticalLine(VGLBitmap *dst, int x, int y, int width, byte *line) { int i, pos, last, planepos, start_offset, end_offset, offset; int len; unsigned int word = 0; byte *address; byte *VGLPlane[4]; switch (dst->Type) { case VIDBUF4: case VIDBUF4S: start_offset = (x & 0x07); end_offset = (x + width) & 0x07; i = (width + start_offset) / 8; if (end_offset) i++; VGLPlane[0] = VGLBuf; VGLPlane[1] = VGLPlane[0] + i; VGLPlane[2] = VGLPlane[1] + i; VGLPlane[3] = VGLPlane[2] + i; pos = 0; planepos = 0; last = 8 - start_offset; while (pos < width) { word = 0; while (pos < last && pos < width) word = (word<<1) | color2bit[line[pos++]&0x0f]; VGLPlane[0][planepos] = word; VGLPlane[1][planepos] = word>>8; VGLPlane[2][planepos] = word>>16; VGLPlane[3][planepos] = word>>24; planepos++; last += 8; } planepos--; if (end_offset) { word <<= (8 - end_offset); VGLPlane[0][planepos] = word; VGLPlane[1][planepos] = word>>8; VGLPlane[2][planepos] = word>>16; VGLPlane[3][planepos] = word>>24; } if (start_offset || end_offset) width+=8; width /= 8; outb(0x3ce, 0x01); outb(0x3cf, 0x00); /* set/reset enable */ outb(0x3ce, 0x08); outb(0x3cf, 0xff); /* bit mask */ for (i=0; i<4; i++) { outb(0x3c4, 0x02); outb(0x3c5, 0x01<Type == VIDBUF4) { if (end_offset) VGLPlane[i][planepos] |= dst->Bitmap[pos+planepos] & mask[end_offset]; if (start_offset) VGLPlane[i][0] |= dst->Bitmap[pos] & ~mask[start_offset]; bcopy(&VGLPlane[i][0], dst->Bitmap + pos, width); } else { /* VIDBUF4S */ if (end_offset) { offset = VGLSetSegment(pos + planepos); VGLPlane[i][planepos] |= dst->Bitmap[offset] & mask[end_offset]; } offset = VGLSetSegment(pos); if (start_offset) VGLPlane[i][0] |= dst->Bitmap[offset] & ~mask[start_offset]; for (last = width; ; ) { len = min(VGLAdpInfo.va_window_size - offset, last); bcopy(&VGLPlane[i][width - last], dst->Bitmap + offset, len); pos += len; last -= len; if (last <= 0) break; offset = VGLSetSegment(pos); } } } break; case VIDBUF8X: address = dst->Bitmap + VGLAdpInfo.va_line_width * y + x/4; for (i=0; i<4; i++) { outb(0x3c4, 0x02); outb(0x3c5, 0x01 << ((x + i)%4)); for (planepos=0, pos=i; posPixelBytes; pos = (dst->VXsize * y + x) * dst->PixelBytes; while (width > 0) { offset = VGLSetSegment(pos); i = min(VGLAdpInfo.va_window_size - offset, width); bcopy(line, dst->Bitmap + offset, i); line += i; pos += i; width -= i; } break; case MEMBUF: case VIDBUF8: case VIDBUF16: case 
VIDBUF24: case VIDBUF32: address = dst->Bitmap + (dst->VXsize * y + x) * dst->PixelBytes; bcopy(line, address, width * dst->PixelBytes); break; default: ; } } int __VGLBitmapCopy(VGLBitmap *src, int srcx, int srcy, VGLBitmap *dst, int dstx, int dsty, int width, int hight) { - int srcline, dstline, yend, yextra, ystep; - + byte *buffer, *p; + int mousemerge, srcline, dstline, yend, yextra, ystep; + + mousemerge = 0; + if (hight < 0) { + hight = -hight; + mousemerge = (dst == VGLDisplay && + VGLMouseOverlap(dstx, dsty, width, hight)); + if (mousemerge) + buffer = alloca(width*src->PixelBytes); + } if (srcx>src->VXsize || srcy>src->VYsize || dstx>dst->VXsize || dsty>dst->VYsize) return -1; if (srcx < 0) { width=width+srcx; dstx-=srcx; srcx=0; } if (srcy < 0) { hight=hight+srcy; dsty-=srcy; srcy=0; } if (dstx < 0) { width=width+dstx; srcx-=dstx; dstx=0; } if (dsty < 0) { hight=hight+dsty; srcy-=dsty; dsty=0; } if (srcx+width > src->VXsize) width=src->VXsize-srcx; if (srcy+hight > src->VYsize) hight=src->VYsize-srcy; if (dstx+width > dst->VXsize) width=dst->VXsize-dstx; if (dsty+hight > dst->VYsize) hight=dst->VYsize-dsty; if (width < 0 || hight < 0) return -1; yend = srcy + hight; yextra = 0; ystep = 1; if (src->Bitmap == dst->Bitmap && srcy < dsty) { - yend = srcy; + yend = srcy - 1; yextra = hight - 1; ystep = -1; } for (srcline = srcy + yextra, dstline = dsty + yextra; srcline != yend; srcline += ystep, dstline += ystep) { - WriteVerticalLine(dst, dstx, dstline, width, - src->Bitmap+(srcline*src->VXsize+srcx)*dst->PixelBytes); + p = src->Bitmap+(srcline*src->VXsize+srcx)*dst->PixelBytes; + if (mousemerge && VGLMouseOverlap(dstx, dstline, width, 1)) { + bcopy(p, buffer, width*src->PixelBytes); + p = buffer; + VGLMouseMerge(dstx, dstline, width, p); + } + WriteVerticalLine(dst, dstx, dstline, width, p); } return 0; } int VGLBitmapCopy(VGLBitmap *src, int srcx, int srcy, VGLBitmap *dst, int dstx, int dsty, int width, int hight) { int error; + if (hight < 0) + return -1; if (src == VGLDisplay) src = &VGLVDisplay; if (src->Type != MEMBUF) return -1; /* invalid */ if (dst == VGLDisplay) { - VGLMouseFreeze(dstx, dsty, width, hight, 0); + VGLMouseFreeze(); __VGLBitmapCopy(src, srcx, srcy, &VGLVDisplay, dstx, dsty, width, hight); + error = __VGLBitmapCopy(src, srcx, srcy, &VGLVDisplay, dstx, dsty, + width, hight); + if (error != 0) + return error; src = &VGLVDisplay; srcx = dstx; srcy = dsty; } else if (dst->Type != MEMBUF) return -1; /* invalid */ - error = __VGLBitmapCopy(src, srcx, srcy, dst, dstx, dsty, width, hight); + error = __VGLBitmapCopy(src, srcx, srcy, dst, dstx, dsty, width, -hight); if (dst == VGLDisplay) VGLMouseUnFreeze(); return error; } VGLBitmap *VGLBitmapCreate(int type, int xsize, int ysize, byte *bits) { VGLBitmap *object; if (type != MEMBUF) return NULL; if (xsize < 0 || ysize < 0) return NULL; object = (VGLBitmap *)malloc(sizeof(*object)); if (object == NULL) return NULL; object->Type = type; object->Xsize = xsize; object->Ysize = ysize; object->VXsize = xsize; object->VYsize = ysize; object->Xorigin = 0; object->Yorigin = 0; object->Bitmap = bits; object->PixelBytes = VGLDisplay->PixelBytes; return object; } void VGLBitmapDestroy(VGLBitmap *object) { if (object->Bitmap) free(object->Bitmap); free(object); } int VGLBitmapAllocateBits(VGLBitmap *object) { object->Bitmap = malloc(object->VXsize*object->VYsize*object->PixelBytes); if (object->Bitmap == NULL) return -1; return 0; } void VGLBitmapCvt(VGLBitmap *src, VGLBitmap *dst) { u_long color; int dstpos, i, pb, size, srcpb, 
srcpos; size = src->VXsize * src->VYsize; srcpb = src->PixelBytes; if (srcpb <= 0) srcpb = 1; pb = dst->PixelBytes; if (pb == srcpb) { bcopy(src->Bitmap, dst->Bitmap, size * pb); return; } if (srcpb != 1) return; /* not supported */ for (srcpos = dstpos = 0; srcpos < size; srcpos++) { color = VGLrgb332ToNative(src->Bitmap[srcpos]); for (i = 0; i < pb; i++, color >>= 8) dst->Bitmap[dstpos++] = color; } } Index: user/ngie/bug-237403/lib/libvgl/main.c =================================================================== --- user/ngie/bug-237403/lib/libvgl/main.c (revision 346925) +++ user/ngie/bug-237403/lib/libvgl/main.c (revision 346926) @@ -1,530 +1,531 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1991-1997 Søren Schmidt * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer * in this position and unchanged. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the author may not be used to endorse or promote products * derived from this software without specific prior written permission * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include "vgl.h" -/* XXX Direct Color 24bits modes unsupported */ - #define min(x, y) (((x) < (y)) ? (x) : (y)) #define max(x, y) (((x) > (y)) ? 
(x) : (y)) VGLBitmap *VGLDisplay; VGLBitmap VGLVDisplay; video_info_t VGLModeInfo; video_adapter_info_t VGLAdpInfo; byte *VGLBuf; static int VGLMode; static int VGLOldMode; static size_t VGLBufSize; static byte *VGLMem = MAP_FAILED; static int VGLSwitchPending; static int VGLAbortPending; static int VGLOnDisplay; static unsigned int VGLCurWindow; static int VGLInitDone = 0; static video_info_t VGLOldModeInfo; static vid_info_t VGLOldVInfo; +static int VGLOldVXsize; void VGLEnd() { struct vt_mode smode; int size[3]; if (!VGLInitDone) return; VGLInitDone = 0; + signal(SIGUSR1, SIG_IGN); + signal(SIGUSR2, SIG_IGN); VGLSwitchPending = 0; VGLAbortPending = 0; + VGLMouseMode(VGL_MOUSEHIDE); - signal(SIGUSR1, SIG_IGN); - if (VGLMem != MAP_FAILED) { VGLClear(VGLDisplay, 0); munmap(VGLMem, VGLAdpInfo.va_window_size); } + ioctl(0, FBIO_SETLINEWIDTH, &VGLOldVXsize); + if (VGLOldMode >= M_VESA_BASE) ioctl(0, _IO('V', VGLOldMode - M_VESA_BASE), 0); else ioctl(0, _IO('S', VGLOldMode), 0); if (VGLOldModeInfo.vi_flags & V_INFO_GRAPHICS) { size[0] = VGLOldVInfo.mv_csz; size[1] = VGLOldVInfo.mv_rsz; size[2] = VGLOldVInfo.font_size;; ioctl(0, KDRASTER, size); } if (VGLModeInfo.vi_mem_model != V_INFO_MM_DIRECT) ioctl(0, KDDISABIO, 0); ioctl(0, KDSETMODE, KD_TEXT); smode.mode = VT_AUTO; ioctl(0, VT_SETMODE, &smode); if (VGLBuf) free(VGLBuf); VGLBuf = NULL; free(VGLDisplay); VGLDisplay = NULL; VGLKeyboardEnd(); } static void VGLAbort(int arg) { sigset_t mask; VGLAbortPending = 1; signal(SIGINT, SIG_IGN); signal(SIGTERM, SIG_IGN); signal(SIGUSR2, SIG_IGN); if (arg == SIGBUS || arg == SIGSEGV) { signal(arg, SIG_DFL); sigemptyset(&mask); sigaddset(&mask, arg); sigprocmask(SIG_UNBLOCK, &mask, NULL); VGLEnd(); kill(getpid(), arg); } } static void VGLSwitch(int arg __unused) { if (!VGLOnDisplay) VGLOnDisplay = 1; else VGLOnDisplay = 0; VGLSwitchPending = 1; signal(SIGUSR1, VGLSwitch); } int VGLInit(int mode) { struct vt_mode smode; int adptype, depth; if (VGLInitDone) return -1; signal(SIGUSR1, VGLSwitch); signal(SIGINT, VGLAbort); signal(SIGTERM, VGLAbort); signal(SIGSEGV, VGLAbort); signal(SIGBUS, VGLAbort); signal(SIGUSR2, SIG_IGN); VGLOnDisplay = 1; VGLSwitchPending = 0; VGLAbortPending = 0; if (ioctl(0, CONS_GET, &VGLOldMode) || ioctl(0, CONS_CURRENT, &adptype)) return -1; if (IOCGROUP(mode) == 'V') /* XXX: this is ugly */ VGLModeInfo.vi_mode = (mode & 0x0ff) + M_VESA_BASE; else VGLModeInfo.vi_mode = mode & 0x0ff; if (ioctl(0, CONS_MODEINFO, &VGLModeInfo)) /* FBIO_MODEINFO */ return -1; /* Save info for old mode to restore font size if old mode is graphics. */ VGLOldModeInfo.vi_mode = VGLOldMode; if (ioctl(0, CONS_MODEINFO, &VGLOldModeInfo)) return -1; VGLOldVInfo.size = sizeof(VGLOldVInfo); if (ioctl(0, CONS_GETINFO, &VGLOldVInfo)) return -1; VGLDisplay = (VGLBitmap *)malloc(sizeof(VGLBitmap)); if (VGLDisplay == NULL) return -2; if (VGLModeInfo.vi_mem_model != V_INFO_MM_DIRECT && ioctl(0, KDENABIO, 0)) { free(VGLDisplay); return -3; } VGLInitDone = 1; /* * vi_mem_model specifies the memory model of the current video mode * in -CURRENT. 
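 *
 * The switch below maps it to a VGLBitmap type: 4bpp planar EGA/VGA
 * modes become VIDBUF4, 8bpp packed modes VIDBUF8, VGA mode-X VIDBUF8X,
 * and direct-color modes VIDBUF16/VIDBUF24/VIDBUF32 according to
 * vi_pixel_size; anything else makes VGLInit() fail with -4.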
*/ switch (VGLModeInfo.vi_mem_model) { case V_INFO_MM_PLANAR: /* we can handle EGA/VGA planner modes only */ if (VGLModeInfo.vi_depth != 4 || VGLModeInfo.vi_planes != 4 || (adptype != KD_EGA && adptype != KD_VGA)) { VGLEnd(); return -4; } VGLDisplay->Type = VIDBUF4; VGLDisplay->PixelBytes = 1; break; case V_INFO_MM_PACKED: /* we can do only 256 color packed modes */ if (VGLModeInfo.vi_depth != 8) { VGLEnd(); return -4; } VGLDisplay->Type = VIDBUF8; VGLDisplay->PixelBytes = 1; break; case V_INFO_MM_VGAX: VGLDisplay->Type = VIDBUF8X; VGLDisplay->PixelBytes = 1; break; case V_INFO_MM_DIRECT: VGLDisplay->PixelBytes = VGLModeInfo.vi_pixel_size; switch (VGLDisplay->PixelBytes) { case 2: VGLDisplay->Type = VIDBUF16; break; -#if notyet case 3: VGLDisplay->Type = VIDBUF24; break; -#endif case 4: VGLDisplay->Type = VIDBUF32; break; default: VGLEnd(); return -4; } break; default: VGLEnd(); return -4; } ioctl(0, VT_WAITACTIVE, 0); ioctl(0, KDSETMODE, KD_GRAPHICS); if (ioctl(0, mode, 0)) { VGLEnd(); return -5; } if (ioctl(0, CONS_ADPINFO, &VGLAdpInfo)) { /* FBIO_ADPINFO */ VGLEnd(); return -6; } /* * Calculate the shadow screen buffer size. In -CURRENT, va_buffer_size * always holds the entire frame buffer size, wheather it's in the linear * mode or windowed mode. * VGLBufSize = VGLAdpInfo.va_buffer_size; * In -STABLE, va_buffer_size holds the frame buffer size, only if * the linear frame buffer mode is supported. Otherwise the field is zero. * We shall calculate the minimal size in this case: * VGLAdpInfo.va_line_width*VGLModeInfo.vi_height*VGLModeInfo.vi_planes * or * VGLAdpInfo.va_window_size*VGLModeInfo.vi_planes; * Use whichever is larger. */ if (VGLAdpInfo.va_buffer_size != 0) VGLBufSize = VGLAdpInfo.va_buffer_size; else VGLBufSize = max(VGLAdpInfo.va_line_width*VGLModeInfo.vi_height, VGLAdpInfo.va_window_size)*VGLModeInfo.vi_planes; /* * The above is for old -CURRENT. Current -CURRENT since r203535 and/or * r248799 restricts va_buffer_size to the displayed size in VESA modes to * avoid wasting kva for mapping unused parts of the frame buffer. But all * parts were usable here. Applying the same restriction to user mappings * makes our virtualization useless and breaks our panning, but large frame * buffers are also difficult for us to manage (clearing and switching may * be too slow, and malloc() may fail). Restrict ourselves similarly to * get the same efficiency and bugs for all kernels. 
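 *
 * Concretely, for VESA modes the shadow buffer below is sized to the
 * displayed area only, va_line_width * vi_height * vi_planes, rather
 * than to the full frame buffer size reported by the adapter.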
*/ if (VGLModeInfo.vi_mode >= M_VESA_BASE) VGLBufSize = VGLAdpInfo.va_line_width*VGLModeInfo.vi_height* VGLModeInfo.vi_planes; VGLBuf = malloc(VGLBufSize); if (VGLBuf == NULL) { VGLEnd(); return -7; } #ifdef LIBVGL_DEBUG fprintf(stderr, "VGLBufSize:0x%x\n", VGLBufSize); #endif /* see if we are in the windowed buffer mode or in the linear buffer mode */ if (VGLBufSize/VGLModeInfo.vi_planes > VGLAdpInfo.va_window_size) { switch (VGLDisplay->Type) { case VIDBUF4: VGLDisplay->Type = VIDBUF4S; break; case VIDBUF8: VGLDisplay->Type = VIDBUF8S; break; case VIDBUF16: VGLDisplay->Type = VIDBUF16S; break; case VIDBUF24: VGLDisplay->Type = VIDBUF24S; break; case VIDBUF32: VGLDisplay->Type = VIDBUF32S; break; default: VGLEnd(); return -8; } } VGLMode = mode; VGLCurWindow = 0; VGLDisplay->Xsize = VGLModeInfo.vi_width; VGLDisplay->Ysize = VGLModeInfo.vi_height; depth = VGLModeInfo.vi_depth; if (depth == 15) depth = 16; + VGLOldVXsize = VGLDisplay->VXsize = VGLAdpInfo.va_line_width *8/(depth/VGLModeInfo.vi_planes); VGLDisplay->VYsize = VGLBufSize/VGLModeInfo.vi_planes/VGLAdpInfo.va_line_width; VGLDisplay->Xorigin = 0; VGLDisplay->Yorigin = 0; VGLMem = (byte*)mmap(0, VGLAdpInfo.va_window_size, PROT_READ|PROT_WRITE, MAP_FILE | MAP_SHARED, 0, 0); if (VGLMem == MAP_FAILED) { VGLEnd(); return -7; } VGLDisplay->Bitmap = VGLMem; VGLVDisplay = *VGLDisplay; VGLVDisplay.Type = MEMBUF; if (VGLModeInfo.vi_depth < 8) VGLVDisplay.Bitmap = malloc(2 * VGLBufSize); else VGLVDisplay.Bitmap = VGLBuf; VGLSavePalette(); #ifdef LIBVGL_DEBUG fprintf(stderr, "va_line_width:%d\n", VGLAdpInfo.va_line_width); fprintf(stderr, "VGLXsize:%d, Ysize:%d, VXsize:%d, VYsize:%d\n", VGLDisplay->Xsize, VGLDisplay->Ysize, VGLDisplay->VXsize, VGLDisplay->VYsize); #endif smode.mode = VT_PROCESS; smode.waitv = 0; smode.relsig = SIGUSR1; smode.acqsig = SIGUSR1; smode.frsig = SIGINT; if (ioctl(0, VT_SETMODE, &smode)) { VGLEnd(); return -9; } VGLTextSetFontFile((byte*)0); VGLClear(VGLDisplay, 0); return 0; } void VGLCheckSwitch() { if (VGLAbortPending) { VGLEnd(); exit(0); } while (VGLSwitchPending) { VGLSwitchPending = 0; if (VGLOnDisplay) { if (VGLModeInfo.vi_mem_model != V_INFO_MM_DIRECT) ioctl(0, KDENABIO, 0); ioctl(0, KDSETMODE, KD_GRAPHICS); ioctl(0, VGLMode, 0); VGLCurWindow = 0; VGLMem = (byte*)mmap(0, VGLAdpInfo.va_window_size, PROT_READ|PROT_WRITE, MAP_FILE | MAP_SHARED, 0, 0); /* XXX: what if mmap() has failed! 
*/ VGLDisplay->Type = VIDBUF8; /* XXX */ switch (VGLModeInfo.vi_mem_model) { case V_INFO_MM_PLANAR: if (VGLModeInfo.vi_depth == 4 && VGLModeInfo.vi_planes == 4) { if (VGLBufSize/VGLModeInfo.vi_planes > VGLAdpInfo.va_window_size) VGLDisplay->Type = VIDBUF4S; else VGLDisplay->Type = VIDBUF4; } else { /* shouldn't be happening */ } break; case V_INFO_MM_PACKED: if (VGLModeInfo.vi_depth == 8) { if (VGLBufSize/VGLModeInfo.vi_planes > VGLAdpInfo.va_window_size) VGLDisplay->Type = VIDBUF8S; else VGLDisplay->Type = VIDBUF8; } break; case V_INFO_MM_VGAX: VGLDisplay->Type = VIDBUF8X; break; case V_INFO_MM_DIRECT: switch (VGLModeInfo.vi_pixel_size) { case 2: if (VGLBufSize/VGLModeInfo.vi_planes > VGLAdpInfo.va_window_size) VGLDisplay->Type = VIDBUF16S; else VGLDisplay->Type = VIDBUF16; break; case 3: if (VGLBufSize/VGLModeInfo.vi_planes > VGLAdpInfo.va_window_size) VGLDisplay->Type = VIDBUF24S; else VGLDisplay->Type = VIDBUF24; break; case 4: if (VGLBufSize/VGLModeInfo.vi_planes > VGLAdpInfo.va_window_size) VGLDisplay->Type = VIDBUF32S; else VGLDisplay->Type = VIDBUF32; break; default: /* shouldn't be happening */ break; } default: /* shouldn't be happening */ break; } VGLDisplay->Bitmap = VGLMem; VGLDisplay->Xsize = VGLModeInfo.vi_width; VGLDisplay->Ysize = VGLModeInfo.vi_height; VGLSetVScreenSize(VGLDisplay, VGLDisplay->VXsize, VGLDisplay->VYsize); VGLRestoreBlank(); VGLRestoreBorder(); VGLMouseRestore(); VGLPanScreen(VGLDisplay, VGLDisplay->Xorigin, VGLDisplay->Yorigin); VGLBitmapCopy(&VGLVDisplay, 0, 0, VGLDisplay, 0, 0, VGLDisplay->VXsize, VGLDisplay->VYsize); VGLRestorePalette(); ioctl(0, VT_RELDISP, VT_ACKACQ); } else { VGLMem = MAP_FAILED; munmap(VGLDisplay->Bitmap, VGLAdpInfo.va_window_size); ioctl(0, VGLOldMode, 0); ioctl(0, KDSETMODE, KD_TEXT); if (VGLModeInfo.vi_mem_model != V_INFO_MM_DIRECT) ioctl(0, KDDISABIO, 0); ioctl(0, VT_RELDISP, VT_TRUE); VGLDisplay->Bitmap = VGLBuf; VGLDisplay->Type = MEMBUF; VGLDisplay->Xsize = VGLDisplay->VXsize; VGLDisplay->Ysize = VGLDisplay->VYsize; while (!VGLOnDisplay) pause(); } } } int VGLSetSegment(unsigned int offset) { if (offset/VGLAdpInfo.va_window_size != VGLCurWindow) { ioctl(0, CONS_SETWINORG, offset); /* FBIO_SETWINORG */ VGLCurWindow = offset/VGLAdpInfo.va_window_size; } return (offset%VGLAdpInfo.va_window_size); } int VGLSetVScreenSize(VGLBitmap *object, int VXsize, int VYsize) { int depth; if (VXsize < object->Xsize || VYsize < object->Ysize) return -1; if (object->Type == MEMBUF) return -1; if (ioctl(0, FBIO_SETLINEWIDTH, &VXsize)) return -1; ioctl(0, CONS_ADPINFO, &VGLAdpInfo); /* FBIO_ADPINFO */ depth = VGLModeInfo.vi_depth; if (depth == 15) depth = 16; object->VXsize = VGLAdpInfo.va_line_width *8/(depth/VGLModeInfo.vi_planes); object->VYsize = VGLBufSize/VGLModeInfo.vi_planes/VGLAdpInfo.va_line_width; if (VYsize < object->VYsize) object->VYsize = VYsize; #ifdef LIBVGL_DEBUG fprintf(stderr, "new size: VGLXsize:%d, Ysize:%d, VXsize:%d, VYsize:%d\n", object->Xsize, object->Ysize, object->VXsize, object->VYsize); #endif return 0; } int VGLPanScreen(VGLBitmap *object, int x, int y) { video_display_start_t origin; if (x < 0 || x + object->Xsize > object->VXsize || y < 0 || y + object->Ysize > object->VYsize) return -1; if (object->Type == MEMBUF) return 0; origin.x = x; origin.y = y; if (ioctl(0, FBIO_SETDISPSTART, &origin)) return -1; object->Xorigin = x; object->Yorigin = y; #ifdef LIBVGL_DEBUG fprintf(stderr, "new origin: (%d, %d)\n", x, y); #endif return 0; } Index: user/ngie/bug-237403/lib/libvgl/mouse.c 
=================================================================== --- user/ngie/bug-237403/lib/libvgl/mouse.c (revision 346925) +++ user/ngie/bug-237403/lib/libvgl/mouse.c (revision 346926) @@ -1,363 +1,426 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1991-1997 Søren Schmidt * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer * in this position and unchanged. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the author may not be used to endorse or promote products * derived from this software without specific prior written permission * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include "vgl.h" +static void VGLMouseAction(int dummy); + #define BORDER 0xff /* default border -- light white in rgb 3:3:2 */ #define INTERIOR 0xa0 /* default interior -- red in rgb 3:3:2 */ +#define LARGE_MOUSE_IMG_XSIZE 19 +#define LARGE_MOUSE_IMG_YSIZE 32 +#define SMALL_MOUSE_IMG_XSIZE 10 +#define SMALL_MOUSE_IMG_YSIZE 16 #define X 0xff /* any nonzero in And mask means part of cursor */ #define B BORDER #define I INTERIOR -static byte StdAndMask[MOUSE_IMG_SIZE*MOUSE_IMG_SIZE] = { - X,X,0,0,0,0,0,0,0,0,0,0,0,0,0,0, - X,X,X,0,0,0,0,0,0,0,0,0,0,0,0,0, - X,X,X,X,0,0,0,0,0,0,0,0,0,0,0,0, - X,X,X,X,X,0,0,0,0,0,0,0,0,0,0,0, - X,X,X,X,X,X,0,0,0,0,0,0,0,0,0,0, - X,X,X,X,X,X,X,0,0,0,0,0,0,0,0,0, - X,X,X,X,X,X,X,X,0,0,0,0,0,0,0,0, - X,X,X,X,X,X,X,X,X,0,0,0,0,0,0,0, - X,X,X,X,X,X,X,X,X,X,0,0,0,0,0,0, - X,X,X,X,X,X,X,X,X,X,0,0,0,0,0,0, - X,X,X,X,X,X,X,0,0,0,0,0,0,0,0,0, - X,X,X,0,X,X,X,X,0,0,0,0,0,0,0,0, - X,X,0,0,X,X,X,X,0,0,0,0,0,0,0,0, - 0,0,0,0,0,X,X,X,X,0,0,0,0,0,0,0, - 0,0,0,0,0,X,X,X,X,0,0,0,0,0,0,0, - 0,0,0,0,0,0,X,X,0,0,0,0,0,0,0,0, +static byte LargeAndMask[] = { + X,X,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, + X,X,X,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, + X,X,X,X,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, + X,X,X,X,X,0,0,0,0,0,0,0,0,0,0,0,0,0,0, + X,X,X,X,X,X,0,0,0,0,0,0,0,0,0,0,0,0,0, + X,X,X,X,X,X,X,0,0,0,0,0,0,0,0,0,0,0,0, + X,X,X,X,X,X,X,X,0,0,0,0,0,0,0,0,0,0,0, + X,X,X,X,X,X,X,X,X,0,0,0,0,0,0,0,0,0,0, + X,X,X,X,X,X,X,X,X,X,0,0,0,0,0,0,0,0,0, + X,X,X,X,X,X,X,X,X,X,X,0,0,0,0,0,0,0,0, + X,X,X,X,X,X,X,X,X,X,X,X,0,0,0,0,0,0,0, + X,X,X,X,X,X,X,X,X,X,X,X,X,0,0,0,0,0,0, + X,X,X,X,X,X,X,X,X,X,X,X,X,X,0,0,0,0,0, + X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,0,0,0,0, + X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,0,0,0, + X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,0,0, + 
X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,0, + X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X, + X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X, + X,X,X,X,X,X,X,X,X,X,X,X,0,0,0,0,0,0,0, + X,X,X,X,X,X,X,X,X,X,X,X,0,0,0,0,0,0,0, + X,X,X,X,X,X,0,X,X,X,X,X,X,0,0,0,0,0,0, + X,X,X,X,X,0,0,X,X,X,X,X,X,0,0,0,0,0,0, + X,X,X,X,0,0,0,0,X,X,X,X,X,X,0,0,0,0,0, + X,X,X,0,0,0,0,0,X,X,X,X,X,X,0,0,0,0,0, + X,X,0,0,0,0,0,0,0,X,X,X,X,X,X,0,0,0,0, + 0,0,0,0,0,0,0,0,0,X,X,X,X,X,X,0,0,0,0, + 0,0,0,0,0,0,0,0,0,0,X,X,X,X,X,X,0,0,0, + 0,0,0,0,0,0,0,0,0,0,X,X,X,X,X,X,0,0,0, + 0,0,0,0,0,0,0,0,0,0,0,X,X,X,X,X,X,0,0, + 0,0,0,0,0,0,0,0,0,0,0,X,X,X,X,X,X,0,0, + 0,0,0,0,0,0,0,0,0,0,0,0,X,X,X,X,0,0,0, }; -static byte StdOrMask[MOUSE_IMG_SIZE*MOUSE_IMG_SIZE] = { - B,B,0,0,0,0,0,0,0,0,0,0,0,0,0,0, - B,I,B,0,0,0,0,0,0,0,0,0,0,0,0,0, - B,I,I,B,0,0,0,0,0,0,0,0,0,0,0,0, - B,I,I,I,B,0,0,0,0,0,0,0,0,0,0,0, - B,I,I,I,I,B,0,0,0,0,0,0,0,0,0,0, - B,I,I,I,I,I,B,0,0,0,0,0,0,0,0,0, - B,I,I,I,I,I,I,B,0,0,0,0,0,0,0,0, - B,I,I,I,I,I,I,I,B,0,0,0,0,0,0,0, - B,I,I,I,I,I,I,I,I,B,0,0,0,0,0,0, - B,I,I,I,I,I,B,B,B,B,0,0,0,0,0,0, - B,I,I,B,I,I,B,0,0,0,0,0,0,0,0,0, - B,I,B,0,B,I,I,B,0,0,0,0,0,0,0,0, - B,B,0,0,B,I,I,B,0,0,0,0,0,0,0,0, - 0,0,0,0,0,B,I,I,B,0,0,0,0,0,0,0, - 0,0,0,0,0,B,I,I,B,0,0,0,0,0,0,0, - 0,0,0,0,0,0,B,B,0,0,0,0,0,0,0,0, +static byte LargeOrMask[] = { + B,B,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, + B,I,B,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, + B,I,I,B,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, + B,I,I,I,B,0,0,0,0,0,0,0,0,0,0,0,0,0,0, + B,I,I,I,I,B,0,0,0,0,0,0,0,0,0,0,0,0,0, + B,I,I,I,I,I,B,0,0,0,0,0,0,0,0,0,0,0,0, + B,I,I,I,I,I,I,B,0,0,0,0,0,0,0,0,0,0,0, + B,I,I,I,I,I,I,I,B,0,0,0,0,0,0,0,0,0,0, + B,I,I,I,I,I,I,I,I,B,0,0,0,0,0,0,0,0,0, + B,I,I,I,I,I,I,I,I,I,B,0,0,0,0,0,0,0,0, + B,I,I,I,I,I,I,I,I,I,I,B,0,0,0,0,0,0,0, + B,I,I,I,I,I,I,I,I,I,I,I,B,0,0,0,0,0,0, + B,I,I,I,I,I,I,I,I,I,I,I,I,B,0,0,0,0,0, + B,I,I,I,I,I,I,I,I,I,I,I,I,I,B,0,0,0,0, + B,I,I,I,I,I,I,I,I,I,I,I,I,I,I,B,0,0,0, + B,I,I,I,I,I,I,I,I,I,I,I,I,I,I,I,B,0,0, + B,I,I,I,I,I,I,I,I,I,I,I,I,I,I,I,I,B,0, + B,I,I,I,I,I,I,I,I,I,I,I,I,I,I,I,I,I,B, + B,I,I,I,I,I,I,I,I,I,I,B,B,B,B,B,B,B,B, + B,I,I,I,I,I,I,I,I,I,I,B,0,0,0,0,0,0,0, + B,I,I,I,I,I,B,I,I,I,I,B,0,0,0,0,0,0,0, + B,I,I,I,I,B,0,B,I,I,I,I,B,0,0,0,0,0,0, + B,I,I,I,B,0,0,B,I,I,I,I,B,0,0,0,0,0,0, + B,I,I,B,0,0,0,0,B,I,I,I,I,B,0,0,0,0,0, + B,I,B,0,0,0,0,0,B,I,I,I,I,B,0,0,0,0,0, + B,B,0,0,0,0,0,0,0,B,I,I,I,I,B,0,0,0,0, + 0,0,0,0,0,0,0,0,0,B,I,I,I,I,B,0,0,0,0, + 0,0,0,0,0,0,0,0,0,0,B,I,I,I,I,B,0,0,0, + 0,0,0,0,0,0,0,0,0,0,B,I,I,I,I,B,0,0,0, + 0,0,0,0,0,0,0,0,0,0,0,B,I,I,I,I,B,0,0, + 0,0,0,0,0,0,0,0,0,0,0,B,I,I,I,I,B,0,0, + 0,0,0,0,0,0,0,0,0,0,0,0,B,B,B,B,0,0,0, }; +static byte SmallAndMask[] = { + X,X,0,0,0,0,0,0,0,0, + X,X,X,0,0,0,0,0,0,0, + X,X,X,X,0,0,0,0,0,0, + X,X,X,X,X,0,0,0,0,0, + X,X,X,X,X,X,0,0,0,0, + X,X,X,X,X,X,X,0,0,0, + X,X,X,X,X,X,X,X,0,0, + X,X,X,X,X,X,X,X,X,0, + X,X,X,X,X,X,X,X,X,X, + X,X,X,X,X,X,X,X,X,X, + X,X,X,X,X,X,X,0,0,0, + X,X,X,0,X,X,X,X,0,0, + X,X,0,0,X,X,X,X,0,0, + 0,0,0,0,0,X,X,X,X,0, + 0,0,0,0,0,X,X,X,X,0, + 0,0,0,0,0,0,X,X,0,0, +}; +static byte SmallOrMask[] = { + B,B,0,0,0,0,0,0,0,0, + B,I,B,0,0,0,0,0,0,0, + B,I,I,B,0,0,0,0,0,0, + B,I,I,I,B,0,0,0,0,0, + B,I,I,I,I,B,0,0,0,0, + B,I,I,I,I,I,B,0,0,0, + B,I,I,I,I,I,I,B,0,0, + B,I,I,I,I,I,I,I,B,0, + B,I,I,I,I,I,I,I,I,B, + B,I,I,I,I,I,B,B,B,B, + B,I,I,B,I,I,B,0,0,0, + B,I,B,0,B,I,I,B,0,0, + B,B,0,0,B,I,I,B,0,0, + 0,0,0,0,0,B,I,I,B,0, + 0,0,0,0,0,B,I,I,B,0, + 0,0,0,0,0,0,B,B,0,0, +}; #undef X #undef B #undef I -static VGLBitmap VGLMouseStdAndMask = - VGLBITMAP_INITIALIZER(MEMBUF, MOUSE_IMG_SIZE, 
MOUSE_IMG_SIZE, StdAndMask); -static VGLBitmap VGLMouseStdOrMask = - VGLBITMAP_INITIALIZER(MEMBUF, MOUSE_IMG_SIZE, MOUSE_IMG_SIZE, StdOrMask); +static VGLBitmap VGLMouseLargeAndMask = + VGLBITMAP_INITIALIZER(MEMBUF, LARGE_MOUSE_IMG_XSIZE, LARGE_MOUSE_IMG_YSIZE, + LargeAndMask); +static VGLBitmap VGLMouseLargeOrMask = + VGLBITMAP_INITIALIZER(MEMBUF, LARGE_MOUSE_IMG_XSIZE, LARGE_MOUSE_IMG_YSIZE, + LargeOrMask); +static VGLBitmap VGLMouseSmallAndMask = + VGLBITMAP_INITIALIZER(MEMBUF, SMALL_MOUSE_IMG_XSIZE, SMALL_MOUSE_IMG_YSIZE, + SmallAndMask); +static VGLBitmap VGLMouseSmallOrMask = + VGLBITMAP_INITIALIZER(MEMBUF, SMALL_MOUSE_IMG_XSIZE, SMALL_MOUSE_IMG_YSIZE, + SmallOrMask); static VGLBitmap *VGLMouseAndMask, *VGLMouseOrMask; -static int VGLMouseVisible = 0; -static int VGLMouseShown = 0; +static int VGLMouseShown = VGL_MOUSEHIDE; static int VGLMouseXpos = 0; static int VGLMouseYpos = 0; static int VGLMouseButtons = 0; static volatile sig_atomic_t VGLMintpending; static volatile sig_atomic_t VGLMsuppressint; #define INTOFF() (VGLMsuppressint++) #define INTON() do { \ if (--VGLMsuppressint == 0 && VGLMintpending) \ VGLMouseAction(0); \ } while (0) -void -VGLMousePointerShow() +int +__VGLMouseMode(int mode) { - byte buf[MOUSE_IMG_SIZE*MOUSE_IMG_SIZE*4]; - VGLBitmap buffer = - VGLBITMAP_INITIALIZER(MEMBUF, MOUSE_IMG_SIZE, MOUSE_IMG_SIZE, buf); - byte crtcidx, crtcval, gdcidx, gdcval; - int pos; + int oldmode; - if (!VGLMouseVisible) { - INTOFF(); - VGLMouseVisible = 1; - if (VGLModeInfo.vi_mem_model != V_INFO_MM_DIRECT) { - crtcidx = inb(0x3c4); - crtcval = inb(0x3c5); - gdcidx = inb(0x3ce); - gdcval = inb(0x3cf); - } - buffer.PixelBytes = VGLDisplay->PixelBytes; - __VGLBitmapCopy(&VGLVDisplay, VGLMouseXpos, VGLMouseYpos, - &buffer, 0, 0, MOUSE_IMG_SIZE, MOUSE_IMG_SIZE); - for (pos = 0; pos < MOUSE_IMG_SIZE*MOUSE_IMG_SIZE; pos++) - if (VGLMouseAndMask->Bitmap[pos]) - bcopy(&VGLMouseOrMask->Bitmap[pos*VGLDisplay->PixelBytes], - &buffer.Bitmap[pos*VGLDisplay->PixelBytes], - VGLDisplay->PixelBytes); - __VGLBitmapCopy(&buffer, 0, 0, VGLDisplay, - VGLMouseXpos, VGLMouseYpos, MOUSE_IMG_SIZE, MOUSE_IMG_SIZE); - if (VGLModeInfo.vi_mem_model != V_INFO_MM_DIRECT) { - outb(0x3c4, crtcidx); - outb(0x3c5, crtcval); - outb(0x3ce, gdcidx); - outb(0x3cf, gdcval); - } - INTON(); - } -} - -void -VGLMousePointerHide() -{ - byte crtcidx, crtcval, gdcidx, gdcval; - - if (VGLMouseVisible) { - INTOFF(); - VGLMouseVisible = 0; - if (VGLModeInfo.vi_mem_model != V_INFO_MM_DIRECT) { - crtcidx = inb(0x3c4); - crtcval = inb(0x3c5); - gdcidx = inb(0x3ce); - gdcval = inb(0x3cf); - } - __VGLBitmapCopy(&VGLVDisplay, VGLMouseXpos, VGLMouseYpos, VGLDisplay, - VGLMouseXpos, VGLMouseYpos, MOUSE_IMG_SIZE, MOUSE_IMG_SIZE); - if (VGLModeInfo.vi_mem_model != V_INFO_MM_DIRECT) { - outb(0x3c4, crtcidx); - outb(0x3c5, crtcval); - outb(0x3ce, gdcidx); - outb(0x3cf, gdcval); - } - INTON(); - } -} - -void -VGLMouseMode(int mode) -{ + INTOFF(); + oldmode = VGLMouseShown; if (mode == VGL_MOUSESHOW) { if (VGLMouseShown == VGL_MOUSEHIDE) { - VGLMousePointerShow(); VGLMouseShown = VGL_MOUSESHOW; + __VGLBitmapCopy(&VGLVDisplay, VGLMouseXpos, VGLMouseYpos, + VGLDisplay, VGLMouseXpos, VGLMouseYpos, + VGLMouseAndMask->VXsize, -VGLMouseAndMask->VYsize); } } else { if (VGLMouseShown == VGL_MOUSESHOW) { - VGLMousePointerHide(); VGLMouseShown = VGL_MOUSEHIDE; + __VGLBitmapCopy(&VGLVDisplay, VGLMouseXpos, VGLMouseYpos, + VGLDisplay, VGLMouseXpos, VGLMouseYpos, + VGLMouseAndMask->VXsize, VGLMouseAndMask->VYsize); } } + INTON(); + return oldmode; } void 
+VGLMouseMode(int mode) +{ + __VGLMouseMode(mode); +} + +static void VGLMouseAction(int dummy) { struct mouse_info mouseinfo; + int mousemode; if (VGLMsuppressint) { VGLMintpending = 1; return; } again: INTOFF(); VGLMintpending = 0; mouseinfo.operation = MOUSE_GETINFO; ioctl(0, CONS_MOUSECTL, &mouseinfo); - if (VGLMouseShown == VGL_MOUSESHOW) - VGLMousePointerHide(); - VGLMouseXpos = mouseinfo.u.data.x; - VGLMouseYpos = mouseinfo.u.data.y; + if (VGLMouseXpos != mouseinfo.u.data.x || + VGLMouseYpos != mouseinfo.u.data.y) { + mousemode = __VGLMouseMode(VGL_MOUSEHIDE); + VGLMouseXpos = mouseinfo.u.data.x; + VGLMouseYpos = mouseinfo.u.data.y; + __VGLMouseMode(mousemode); + } VGLMouseButtons = mouseinfo.u.data.buttons; - if (VGLMouseShown == VGL_MOUSESHOW) - VGLMousePointerShow(); /* * Loop to handle any new (suppressed) signals. This is INTON() without * recursion. !SA_RESTART prevents recursion in signal handling. So the * maximum recursion is 2 levels. */ VGLMsuppressint = 0; if (VGLMintpending) goto again; } void VGLMouseSetImage(VGLBitmap *AndMask, VGLBitmap *OrMask) { - if (VGLMouseShown == VGL_MOUSESHOW) - VGLMousePointerHide(); + int mousemode; + mousemode = __VGLMouseMode(VGL_MOUSEHIDE); + VGLMouseAndMask = AndMask; if (VGLMouseOrMask != NULL) { free(VGLMouseOrMask->Bitmap); free(VGLMouseOrMask); } VGLMouseOrMask = VGLBitmapCreate(MEMBUF, OrMask->VXsize, OrMask->VYsize, 0); VGLBitmapAllocateBits(VGLMouseOrMask); VGLBitmapCvt(OrMask, VGLMouseOrMask); - if (VGLMouseShown == VGL_MOUSESHOW) - VGLMousePointerShow(); + __VGLMouseMode(mousemode); } void VGLMouseSetStdImage() { - VGLMouseSetImage(&VGLMouseStdAndMask, &VGLMouseStdOrMask); + if (VGLDisplay->VXsize > 800) + VGLMouseSetImage(&VGLMouseLargeAndMask, &VGLMouseLargeOrMask); + else + VGLMouseSetImage(&VGLMouseSmallAndMask, &VGLMouseSmallOrMask); } int VGLMouseInit(int mode) { struct mouse_info mouseinfo; + VGLBitmap *ormask; int andmask, border, error, i, interior; switch (VGLModeInfo.vi_mem_model) { case V_INFO_MM_PACKED: case V_INFO_MM_PLANAR: andmask = 0x0f; border = 0x0f; interior = 0x04; break; case V_INFO_MM_VGAX: andmask = 0x3f; border = 0x3f; interior = 0x24; break; default: andmask = 0xff; border = BORDER; interior = INTERIOR; break; } if (VGLModeInfo.vi_mode == M_BG640x480) border = 0; /* XXX (palette makes 0x04 look like 0x0f) */ if (getenv("VGLMOUSEBORDERCOLOR") != NULL) border = strtoul(getenv("VGLMOUSEBORDERCOLOR"), NULL, 0); if (getenv("VGLMOUSEINTERIORCOLOR") != NULL) interior = strtoul(getenv("VGLMOUSEINTERIORCOLOR"), NULL, 0); - for (i = 0; i < MOUSE_IMG_SIZE*MOUSE_IMG_SIZE; i++) - VGLMouseStdOrMask.Bitmap[i] = VGLMouseStdOrMask.Bitmap[i] == BORDER ? - border : VGLMouseStdOrMask.Bitmap[i] == INTERIOR ? interior : 0; + ormask = &VGLMouseLargeOrMask; + for (i = 0; i < ormask->VXsize * ormask->VYsize; i++) + ormask->Bitmap[i] = ormask->Bitmap[i] == BORDER ? border : + ormask->Bitmap[i] == INTERIOR ? interior : 0; + ormask = &VGLMouseSmallOrMask; + for (i = 0; i < ormask->VXsize * ormask->VYsize; i++) + ormask->Bitmap[i] = ormask->Bitmap[i] == BORDER ? border : + ormask->Bitmap[i] == INTERIOR ? 
interior : 0; VGLMouseSetStdImage(); mouseinfo.operation = MOUSE_MODE; mouseinfo.u.mode.signal = SIGUSR2; if ((error = ioctl(0, CONS_MOUSECTL, &mouseinfo))) return error; signal(SIGUSR2, VGLMouseAction); mouseinfo.operation = MOUSE_GETINFO; ioctl(0, CONS_MOUSECTL, &mouseinfo); VGLMouseXpos = mouseinfo.u.data.x; VGLMouseYpos = mouseinfo.u.data.y; VGLMouseButtons = mouseinfo.u.data.buttons; VGLMouseMode(mode); return 0; } void VGLMouseRestore(void) { struct mouse_info mouseinfo; INTOFF(); mouseinfo.operation = MOUSE_GETINFO; if (ioctl(0, CONS_MOUSECTL, &mouseinfo) == 0) { mouseinfo.operation = MOUSE_MOVEABS; mouseinfo.u.data.x = VGLMouseXpos; mouseinfo.u.data.y = VGLMouseYpos; ioctl(0, CONS_MOUSECTL, &mouseinfo); } INTON(); } int VGLMouseStatus(int *x, int *y, char *buttons) { INTOFF(); *x = VGLMouseXpos; *y = VGLMouseYpos; *buttons = VGLMouseButtons; INTON(); return VGLMouseShown; } -int -VGLMouseFreeze(int x, int y, int width, int hight, u_long color) +void +VGLMouseFreeze(void) { - INTOFF(); - if (width > 1 || hight > 1 || (color & 0xc0000000) == 0) { /* bitmap */ - if (VGLMouseShown == 1) { - int overlap; + INTOFF(); +} - if (x > VGLMouseXpos) - overlap = (VGLMouseXpos + MOUSE_IMG_SIZE) - x; - else - overlap = (x + width) - VGLMouseXpos; - if (overlap > 0) { - if (y > VGLMouseYpos) - overlap = (VGLMouseYpos + MOUSE_IMG_SIZE) - y; - else - overlap = (y + hight) - VGLMouseYpos; - if (overlap > 0) - VGLMousePointerHide(); - } - } - } - else { /* bit */ - if (VGLMouseShown && - x >= VGLMouseXpos && x < VGLMouseXpos + MOUSE_IMG_SIZE && - y >= VGLMouseYpos && y < VGLMouseYpos + MOUSE_IMG_SIZE) { - if (color & 0x80000000) { /* Set */ - if (VGLMouseAndMask->Bitmap - [(y-VGLMouseYpos)*MOUSE_IMG_SIZE+(x-VGLMouseXpos)]) { - return 1; - } - } - } - } +int +VGLMouseFreezeXY(int x, int y) +{ + INTOFF(); + if (VGLMouseShown != VGL_MOUSESHOW) + return 0; + if (x >= VGLMouseXpos && x < VGLMouseXpos + VGLMouseAndMask->VXsize && + y >= VGLMouseYpos && y < VGLMouseYpos + VGLMouseAndMask->VYsize && + VGLMouseAndMask->Bitmap[(y-VGLMouseYpos)*VGLMouseAndMask->VXsize+ + (x-VGLMouseXpos)]) + return 1; return 0; } +int +VGLMouseOverlap(int x, int y, int width, int hight) +{ + int overlap; + + if (VGLMouseShown != VGL_MOUSESHOW) + return 0; + if (x > VGLMouseXpos) + overlap = (VGLMouseXpos + VGLMouseAndMask->VXsize) - x; + else + overlap = (x + width) - VGLMouseXpos; + if (overlap <= 0) + return 0; + if (y > VGLMouseYpos) + overlap = (VGLMouseYpos + VGLMouseAndMask->VYsize) - y; + else + overlap = (y + hight) - VGLMouseYpos; + return overlap > 0; +} + void +VGLMouseMerge(int x, int y, int width, byte *line) +{ + int pos, x1, xend, xstart; + + xstart = x; + if (xstart < VGLMouseXpos) + xstart = VGLMouseXpos; + xend = x + width; + if (xend > VGLMouseXpos + VGLMouseAndMask->VXsize) + xend = VGLMouseXpos + VGLMouseAndMask->VXsize; + for (x1 = xstart; x1 < xend; x1++) { + pos = (y - VGLMouseYpos) * VGLMouseAndMask->VXsize + x1 - VGLMouseXpos; + if (VGLMouseAndMask->Bitmap[pos]) + bcopy(&VGLMouseOrMask->Bitmap[pos * VGLDisplay->PixelBytes], + &line[(x1 - x) * VGLDisplay->PixelBytes], VGLDisplay->PixelBytes); + } +} + +void VGLMouseUnFreeze() { - if (VGLMouseShown == VGL_MOUSESHOW && !VGLMouseVisible && !VGLMintpending) - VGLMousePointerShow(); - while (VGLMsuppressint) - INTON(); + INTON(); } Index: user/ngie/bug-237403/lib/libvgl/simple.c =================================================================== --- user/ngie/bug-237403/lib/libvgl/simple.c (revision 346925) +++ user/ngie/bug-237403/lib/libvgl/simple.c 
(revision 346926) @@ -1,670 +1,688 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1991-1997 Søren Schmidt * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer * in this position and unchanged. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the author may not be used to endorse or promote products * derived from this software without specific prior written permission * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include "vgl.h" static int VGLBlank; static byte VGLBorderColor; static byte VGLSavePaletteRed[256]; static byte VGLSavePaletteGreen[256]; static byte VGLSavePaletteBlue[256]; #define ABS(a) (((a)<0) ? -(a) : (a)) #define SGN(a) (((a)<0) ? -1 : 1) #define min(x, y) (((x) < (y)) ? (x) : (y)) #define max(x, y) (((x) > (y)) ? 
(x) : (y)) void VGLSetXY(VGLBitmap *object, int x, int y, u_long color) { - int offset; + int offset, soffset, undermouse; VGLCheckSwitch(); if (x>=0 && xVXsize && y>=0 && yVYsize) { - if (object == VGLDisplay) + if (object == VGLDisplay) { + undermouse = VGLMouseFreezeXY(x, y); VGLSetXY(&VGLVDisplay, x, y, color); - if (object->Type == MEMBUF || - !VGLMouseFreeze(x, y, 1, 1, 0x80000000 | color)) { + } else if (object->Type != MEMBUF) + return; /* invalid */ + else + undermouse = 0; + if (!undermouse) { offset = (y * object->VXsize + x) * object->PixelBytes; switch (object->Type) { case VIDBUF8S: case VIDBUF16S: - case VIDBUF24S: case VIDBUF32S: offset = VGLSetSegment(offset); /* FALLTHROUGH */ case MEMBUF: case VIDBUF8: case VIDBUF16: case VIDBUF24: case VIDBUF32: color = htole32(color); switch (object->PixelBytes) { case 1: memcpy(&object->Bitmap[offset], &color, 1); break; case 2: memcpy(&object->Bitmap[offset], &color, 2); break; case 3: memcpy(&object->Bitmap[offset], &color, 3); break; case 4: memcpy(&object->Bitmap[offset], &color, 4); break; } break; + case VIDBUF24S: + soffset = VGLSetSegment(offset); + color = htole32(color); + switch (VGLAdpInfo.va_window_size - soffset) { + case 1: + memcpy(&object->Bitmap[soffset], &color, 1); + soffset = VGLSetSegment(offset + 1); + memcpy(&object->Bitmap[soffset], (byte *)&color + 1, 2); + break; + case 2: + memcpy(&object->Bitmap[soffset], &color, 2); + soffset = VGLSetSegment(offset + 2); + memcpy(&object->Bitmap[soffset], (byte *)&color + 2, 1); + break; + default: + memcpy(&object->Bitmap[soffset], &color, 3); + break; + } + break; case VIDBUF8X: outb(0x3c4, 0x02); outb(0x3c5, 0x01 << (x&0x3)); object->Bitmap[(unsigned)(VGLAdpInfo.va_line_width*y)+(x/4)] = ((byte)color); break; case VIDBUF4S: offset = VGLSetSegment(y*VGLAdpInfo.va_line_width + x/8); goto set_planar; case VIDBUF4: offset = y*VGLAdpInfo.va_line_width + x/8; set_planar: outb(0x3c4, 0x02); outb(0x3c5, 0x0f); outb(0x3ce, 0x00); outb(0x3cf, (byte)color & 0x0f); /* set/reset */ outb(0x3ce, 0x01); outb(0x3cf, 0x0f); /* set/reset enable */ outb(0x3ce, 0x08); outb(0x3cf, 0x80 >> (x%8)); /* bit mask */ object->Bitmap[offset] |= (byte)color; } } - if (object->Type != MEMBUF) + if (object == VGLDisplay) VGLMouseUnFreeze(); } } -static u_long -__VGLGetXY(VGLBitmap *object, int x, int y) +u_long +VGLGetXY(VGLBitmap *object, int x, int y) { - int offset; u_long color; + int offset; + VGLCheckSwitch(); + if (x<0 || x>=object->VXsize || y<0 || y>=object->VYsize) + return 0; + if (object == VGLDisplay) + object = &VGLVDisplay; + else if (object->Type != MEMBUF) + return 0; /* invalid */ offset = (y * object->VXsize + x) * object->PixelBytes; switch (object->PixelBytes) { case 1: memcpy(&color, &object->Bitmap[offset], 1); return le32toh(color) & 0xff; case 2: memcpy(&color, &object->Bitmap[offset], 2); return le32toh(color) & 0xffff; case 3: memcpy(&color, &object->Bitmap[offset], 3); return le32toh(color) & 0xffffff; case 4: memcpy(&color, &object->Bitmap[offset], 4); return le32toh(color); } return 0; /* invalid */ } -u_long -VGLGetXY(VGLBitmap *object, int x, int y) -{ - VGLCheckSwitch(); - if (x<0 || x>=object->VXsize || y<0 || y>=object->VYsize) - return 0; - if (object == VGLDisplay) - object = &VGLVDisplay; - if (object->Type != MEMBUF) - return 0; /* invalid */ - return __VGLGetXY(object, x, y); -} - /* * Symmetric Double Step Line Algorithm by Brian Wyvill from * "Graphics Gems", Academic Press, 1990. 
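
At the call-site level the effect of the rewrite above is that a write to VGLDisplay is
mirrored into the software shadow (VGLVDisplay) and skipped on the hardware only while
the pixel sits under the mouse pointer, while VGLGetXY() always reads back from the
shadow; the separate VIDBUF24S case exists because a 3-byte pixel can straddle the bank
boundary, in which case the write is split and VGLSetSegment() is called again for the
remaining bytes.  A minimal usage sketch, assuming a direct-color mode so that
VGLrgb332ToNative() applies:

    u_long c;

    c = VGLrgb332ToNative(0xe0);        /* bright red in rgb 3:3:2           */
    VGLSetXY(VGLDisplay, 10, 20, c);
    c = VGLGetXY(VGLDisplay, 10, 20);   /* comes back from the shadow buffer */
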
*/ #define SL_SWAP(a,b) {a^=b; b^=a; a^=b;} #define SL_ABSOLUTE(i,j,k) ( (i-j)*(k = ( (i-j)<0 ? -1 : 1))) void plot(VGLBitmap * object, int x, int y, int flag, u_long color) { /* non-zero flag indicates the pixels need swapping back. */ if (flag) VGLSetXY(object, y, x, color); else VGLSetXY(object, x, y, color); } void VGLLine(VGLBitmap *object, int x1, int y1, int x2, int y2, u_long color) { int dx, dy, incr1, incr2, D, x, y, xend, c, pixels_left; int sign_x, sign_y, step, reverse, i; dx = SL_ABSOLUTE(x2, x1, sign_x); dy = SL_ABSOLUTE(y2, y1, sign_y); /* decide increment sign by the slope sign */ if (sign_x == sign_y) step = 1; else step = -1; if (dy > dx) { /* chooses axis of greatest movement (make dx) */ SL_SWAP(x1, y1); SL_SWAP(x2, y2); SL_SWAP(dx, dy); reverse = 1; } else reverse = 0; /* note error check for dx==0 should be included here */ if (x1 > x2) { /* start from the smaller coordinate */ x = x2; y = y2; /* x1 = x1; y1 = y1; */ } else { x = x1; y = y1; x1 = x2; y1 = y2; } /* Note dx=n implies 0 - n or (dx+1) pixels to be set */ /* Go round loop dx/4 times then plot last 0,1,2 or 3 pixels */ /* In fact (dx-1)/4 as 2 pixels are already plotted */ xend = (dx - 1) / 4; pixels_left = (dx - 1) % 4; /* number of pixels left over at the * end */ plot(object, x, y, reverse, color); if (pixels_left < 0) return; /* plot only one pixel for zero length * vectors */ plot(object, x1, y1, reverse, color); /* plot first two points */ incr2 = 4 * dy - 2 * dx; if (incr2 < 0) { /* slope less than 1/2 */ c = 2 * dy; incr1 = 2 * c; D = incr1 - dx; for (i = 0; i < xend; i++) { /* plotting loop */ ++x; --x1; if (D < 0) { /* pattern 1 forwards */ plot(object, x, y, reverse, color); plot(object, ++x, y, reverse, color); /* pattern 1 backwards */ plot(object, x1, y1, reverse, color); plot(object, --x1, y1, reverse, color); D += incr1; } else { if (D < c) { /* pattern 2 forwards */ plot(object, x, y, reverse, color); plot(object, ++x, y += step, reverse, color); /* pattern 2 backwards */ plot(object, x1, y1, reverse, color); plot(object, --x1, y1 -= step, reverse, color); } else { /* pattern 3 forwards */ plot(object, x, y += step, reverse, color); plot(object, ++x, y, reverse, color); /* pattern 3 backwards */ plot(object, x1, y1 -= step, reverse, color); plot(object, --x1, y1, reverse, color); } D += incr2; } } /* end for */ /* plot last pattern */ if (pixels_left) { if (D < 0) { plot(object, ++x, y, reverse, color); /* pattern 1 */ if (pixels_left > 1) plot(object, ++x, y, reverse, color); if (pixels_left > 2) plot(object, --x1, y1, reverse, color); } else { if (D < c) { plot(object, ++x, y, reverse, color); /* pattern 2 */ if (pixels_left > 1) plot(object, ++x, y += step, reverse, color); if (pixels_left > 2) plot(object, --x1, y1, reverse, color); } else { /* pattern 3 */ plot(object, ++x, y += step, reverse, color); if (pixels_left > 1) plot(object, ++x, y, reverse, color); if (pixels_left > 2) plot(object, --x1, y1 -= step, reverse, color); } } } /* end if pixels_left */ } /* end slope < 1/2 */ else { /* slope greater than 1/2 */ c = 2 * (dy - dx); incr1 = 2 * c; D = incr1 + dx; for (i = 0; i < xend; i++) { ++x; --x1; if (D > 0) { /* pattern 4 forwards */ plot(object, x, y += step, reverse, color); plot(object, ++x, y += step, reverse, color); /* pattern 4 backwards */ plot(object, x1, y1 -= step, reverse, color); plot(object, --x1, y1 -= step, reverse, color); D += incr1; } else { if (D < c) { /* pattern 2 forwards */ plot(object, x, y, reverse, color); plot(object, ++x, y += step, reverse, 
color); /* pattern 2 backwards */ plot(object, x1, y1, reverse, color); plot(object, --x1, y1 -= step, reverse, color); } else { /* pattern 3 forwards */ plot(object, x, y += step, reverse, color); plot(object, ++x, y, reverse, color); /* pattern 3 backwards */ plot(object, x1, y1 -= step, reverse, color); plot(object, --x1, y1, reverse, color); } D += incr2; } } /* end for */ /* plot last pattern */ if (pixels_left) { if (D > 0) { plot(object, ++x, y += step, reverse, color); /* pattern 4 */ if (pixels_left > 1) plot(object, ++x, y += step, reverse, color); if (pixels_left > 2) plot(object, --x1, y1 -= step, reverse, color); } else { if (D < c) { plot(object, ++x, y, reverse, color); /* pattern 2 */ if (pixels_left > 1) plot(object, ++x, y += step, reverse, color); if (pixels_left > 2) plot(object, --x1, y1, reverse, color); } else { /* pattern 3 */ plot(object, ++x, y += step, reverse, color); if (pixels_left > 1) plot(object, ++x, y, reverse, color); if (pixels_left > 2) { if (D > c) /* step 3 */ plot(object, --x1, y1 -= step, reverse, color); else /* step 2 */ plot(object, --x1, y1, reverse, color); } } } } } } void VGLBox(VGLBitmap *object, int x1, int y1, int x2, int y2, u_long color) { VGLLine(object, x1, y1, x2, y1, color); VGLLine(object, x2, y1, x2, y2, color); VGLLine(object, x2, y2, x1, y2, color); VGLLine(object, x1, y2, x1, y1, color); } void VGLFilledBox(VGLBitmap *object, int x1, int y1, int x2, int y2, u_long color) { int y; for (y=y1; y<=y2; y++) VGLLine(object, x1, y, x2, y, color); } static inline void set4pixels(VGLBitmap *object, int x, int y, int xc, int yc, u_long color) { if (x!=0) { VGLSetXY(object, xc+x, yc+y, color); VGLSetXY(object, xc-x, yc+y, color); if (y!=0) { VGLSetXY(object, xc+x, yc-y, color); VGLSetXY(object, xc-x, yc-y, color); } } else { VGLSetXY(object, xc, yc+y, color); if (y!=0) VGLSetXY(object, xc, yc-y, color); } } void VGLEllipse(VGLBitmap *object, int xc, int yc, int a, int b, u_long color) { int x = 0, y = b, asq = a*a, asq2 = a*a*2, bsq = b*b; int bsq2 = b*b*2, d = bsq-asq*b+asq/4, dx = 0, dy = asq2*b; while (dx0) { y--; dy-=asq2; d-=dy; } x++; dx+=bsq2; d+=bsq+dx; } d+=(3*(asq-bsq)/2-(dx+dy))/2; while (y>=0) { set4pixels(object, x, y, xc, yc, color); if (d<0) { x++; dx+=bsq2; d+=dx; } y--; dy-=asq2; d+=asq-dy; } } static inline void set2lines(VGLBitmap *object, int x, int y, int xc, int yc, u_long color) { if (x!=0) { VGLLine(object, xc+x, yc+y, xc-x, yc+y, color); if (y!=0) VGLLine(object, xc+x, yc-y, xc-x, yc-y, color); } else { VGLLine(object, xc, yc+y, xc, yc-y, color); } } void VGLFilledEllipse(VGLBitmap *object, int xc, int yc, int a, int b, u_long color) { int x = 0, y = b, asq = a*a, asq2 = a*a*2, bsq = b*b; int bsq2 = b*b*2, d = bsq-asq*b+asq/4, dx = 0, dy = asq2*b; while (dx0) { y--; dy-=asq2; d-=dy; } x++; dx+=bsq2; d+=bsq+dx; } d+=(3*(asq-bsq)/2-(dx+dy))/2; while (y>=0) { set2lines(object, x, y, xc, yc, color); if (d<0) { x++; dx+=bsq2; d+=dx; } y--; dy-=asq2; d+=asq-dy; } } void VGLClear(VGLBitmap *object, u_long color) { VGLBitmap src; - int offset; - int len; - int i; + int i, len, mousemode, offset; VGLCheckSwitch(); if (object == VGLDisplay) { - VGLMouseFreeze(0, 0, object->Xsize, object->Ysize, color); + VGLMouseFreeze(); VGLClear(&VGLVDisplay, color); } else if (object->Type != MEMBUF) return; /* invalid */ switch (object->Type) { case MEMBUF: case VIDBUF8: case VIDBUF8S: case VIDBUF16: case VIDBUF16S: case VIDBUF24: case VIDBUF24S: case VIDBUF32: case VIDBUF32S: src.Type = MEMBUF; src.Xsize = object->Xsize; src.VXsize = 
object->VXsize; src.Ysize = 1; src.VYsize = 1; src.Xorigin = 0; src.Yorigin = 0; src.Bitmap = alloca(object->VXsize * object->PixelBytes); src.PixelBytes = object->PixelBytes; color = htole32(color); for (i = 0; i < object->VXsize; i++) bcopy(&color, src.Bitmap + i * object->PixelBytes, object->PixelBytes); for (i = 0; i < object->VYsize; i++) - __VGLBitmapCopy(&src, 0, 0, object, 0, i, object->VXsize, 1); + __VGLBitmapCopy(&src, 0, 0, object, 0, i, object->VXsize, -1); break; case VIDBUF8X: + mousemode = __VGLMouseMode(VGL_MOUSEHIDE); /* XXX works only for Xsize % 4 = 0 */ outb(0x3c6, 0xff); outb(0x3c4, 0x02); outb(0x3c5, 0x0f); memset(object->Bitmap, (byte)color, VGLAdpInfo.va_line_width*object->VYsize); + __VGLMouseMode(mousemode); break; case VIDBUF4: case VIDBUF4S: + mousemode = __VGLMouseMode(VGL_MOUSEHIDE); /* XXX works only for Xsize % 8 = 0 */ outb(0x3c4, 0x02); outb(0x3c5, 0x0f); outb(0x3ce, 0x05); outb(0x3cf, 0x02); /* mode 2 */ outb(0x3ce, 0x01); outb(0x3cf, 0x00); /* set/reset enable */ outb(0x3ce, 0x08); outb(0x3cf, 0xff); /* bit mask */ for (offset = 0; offset < VGLAdpInfo.va_line_width*object->VYsize; ) { VGLSetSegment(offset); len = min(object->VXsize*object->VYsize - offset, VGLAdpInfo.va_window_size); memset(object->Bitmap, (byte)color, len); offset += len; } outb(0x3ce, 0x05); outb(0x3cf, 0x00); + __VGLMouseMode(mousemode); break; } if (object == VGLDisplay) VGLMouseUnFreeze(); } static inline u_long VGLrgbToNative(uint16_t r, uint16_t g, uint16_t b) { int nr, ng, nb; nr = VGLModeInfo.vi_pixel_fsizes[2]; ng = VGLModeInfo.vi_pixel_fsizes[1]; nb = VGLModeInfo.vi_pixel_fsizes[0]; return (r >> (16 - nr) << (ng + nb)) | (g >> (16 - ng) << nb) | (b >> (16 - nb) << 0); } u_long VGLrgb332ToNative(byte c) { uint16_t r, g, b; /* 3:3:2 to 16:16:16 */ r = ((c & 0xe0) >> 5) * 0xffff / 7; g = ((c & 0x1c) >> 2) * 0xffff / 7; b = ((c & 0x03) >> 0) * 0xffff / 3; return VGLrgbToNative(r, g, b); } void VGLRestorePalette() { int i; if (VGLModeInfo.vi_mem_model == V_INFO_MM_DIRECT) return; outb(0x3C6, 0xFF); inb(0x3DA); outb(0x3C8, 0x00); for (i=0; i<256; i++) { outb(0x3C9, VGLSavePaletteRed[i]); inb(0x84); outb(0x3C9, VGLSavePaletteGreen[i]); inb(0x84); outb(0x3C9, VGLSavePaletteBlue[i]); inb(0x84); } inb(0x3DA); outb(0x3C0, 0x20); } void VGLSavePalette() { int i; if (VGLModeInfo.vi_mem_model == V_INFO_MM_DIRECT) return; outb(0x3C6, 0xFF); inb(0x3DA); outb(0x3C7, 0x00); for (i=0; i<256; i++) { VGLSavePaletteRed[i] = inb(0x3C9); inb(0x84); VGLSavePaletteGreen[i] = inb(0x3C9); inb(0x84); VGLSavePaletteBlue[i] = inb(0x3C9); inb(0x84); } inb(0x3DA); outb(0x3C0, 0x20); } void VGLSetPalette(byte *red, byte *green, byte *blue) { int i; if (VGLModeInfo.vi_mem_model == V_INFO_MM_DIRECT) return; for (i=0; i<256; i++) { VGLSavePaletteRed[i] = red[i]; VGLSavePaletteGreen[i] = green[i]; VGLSavePaletteBlue[i] = blue[i]; } VGLCheckSwitch(); outb(0x3C6, 0xFF); inb(0x3DA); outb(0x3C8, 0x00); for (i=0; i<256; i++) { outb(0x3C9, VGLSavePaletteRed[i]); inb(0x84); outb(0x3C9, VGLSavePaletteGreen[i]); inb(0x84); outb(0x3C9, VGLSavePaletteBlue[i]); inb(0x84); } inb(0x3DA); outb(0x3C0, 0x20); } void VGLSetPaletteIndex(byte color, byte red, byte green, byte blue) { if (VGLModeInfo.vi_mem_model == V_INFO_MM_DIRECT) return; VGLSavePaletteRed[color] = red; VGLSavePaletteGreen[color] = green; VGLSavePaletteBlue[color] = blue; VGLCheckSwitch(); outb(0x3C6, 0xFF); inb(0x3DA); outb(0x3C8, color); outb(0x3C9, red); outb(0x3C9, green); outb(0x3C9, blue); inb(0x3DA); outb(0x3C0, 0x20); } void VGLRestoreBorder(void) { 
VGLSetBorder(VGLBorderColor); } void VGLSetBorder(byte color) { if (VGLModeInfo.vi_mem_model == V_INFO_MM_DIRECT && ioctl(0, KDENABIO, 0)) return; VGLCheckSwitch(); inb(0x3DA); outb(0x3C0,0x11); outb(0x3C0, color); inb(0x3DA); outb(0x3C0, 0x20); VGLBorderColor = color; if (VGLModeInfo.vi_mem_model == V_INFO_MM_DIRECT) ioctl(0, KDDISABIO, 0); } void VGLRestoreBlank(void) { VGLBlankDisplay(VGLBlank); } void VGLBlankDisplay(int blank) { byte val; if (VGLModeInfo.vi_mem_model == V_INFO_MM_DIRECT && ioctl(0, KDENABIO, 0)) return; VGLCheckSwitch(); outb(0x3C4, 0x01); val = inb(0x3C5); outb(0x3C4, 0x01); outb(0x3C5, ((blank) ? (val |= 0x20) : (val &= 0xDF))); VGLBlank = blank; if (VGLModeInfo.vi_mem_model == V_INFO_MM_DIRECT) ioctl(0, KDDISABIO, 0); } Index: user/ngie/bug-237403/lib/libvgl/vgl.h =================================================================== --- user/ngie/bug-237403/lib/libvgl/vgl.h (revision 346925) +++ user/ngie/bug-237403/lib/libvgl/vgl.h (revision 346926) @@ -1,162 +1,163 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1991-1997 Søren Schmidt * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer * in this position and unchanged. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the author may not be used to endorse or promote products * derived from this software without specific prior written permission * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
* * $FreeBSD$ */ #ifndef _VGL_H_ #define _VGL_H_ #include #include #include #include typedef unsigned char byte; typedef struct { byte Type; int Xsize, Ysize; int VXsize, VYsize; int Xorigin, Yorigin; byte *Bitmap; int PixelBytes; } VGLBitmap; #define VGLBITMAP_INITIALIZER(t, x, y, bits) \ { (t), (x), (y), (x), (y), 0, 0, (bits), -1 } /* * Defined Type's */ #define MEMBUF 0 #define VIDBUF4 1 #define VIDBUF8 2 #define VIDBUF8X 3 #define VIDBUF8S 4 #define VIDBUF4S 5 #define VIDBUF16 6 /* Direct Color linear buffer */ #define VIDBUF24 7 /* Direct Color linear buffer */ #define VIDBUF32 8 /* Direct Color linear buffer */ #define VIDBUF16S 9 /* Direct Color segmented buffer */ #define VIDBUF24S 10 /* Direct Color segmented buffer */ #define VIDBUF32S 11 /* Direct Color segmented buffer */ #define NOBUF 255 typedef struct VGLText { byte Width, Height; byte *BitmapArray; } VGLText; typedef struct VGLObject { int Id; int Type; int Status; int Xpos, Ypos; int Xhot, Yhot; VGLBitmap *Image; VGLBitmap *Mask; int (*CallBackFunction)(); } VGLObject; #define MOUSE_IMG_SIZE 16 #define VGL_MOUSEHIDE 0 #define VGL_MOUSESHOW 1 #define VGL_MOUSEFREEZE 0 #define VGL_MOUSEUNFREEZE 1 #define VGL_DIR_RIGHT 0 #define VGL_DIR_UP 1 #define VGL_DIR_LEFT 2 #define VGL_DIR_DOWN 3 #define VGL_RAWKEYS 1 #define VGL_CODEKEYS 2 #define VGL_XLATEKEYS 3 extern video_adapter_info_t VGLAdpInfo; extern video_info_t VGLModeInfo; extern VGLBitmap *VGLDisplay; extern VGLBitmap VGLVDisplay; extern byte *VGLBuf; /* * Prototypes */ /* bitmap.c */ int __VGLBitmapCopy(VGLBitmap *src, int srcx, int srcy, VGLBitmap *dst, int dstx, int dsty, int width, int hight); int VGLBitmapCopy(VGLBitmap *src, int srcx, int srcy, VGLBitmap *dst, int dstx, int dsty, int width, int hight); VGLBitmap *VGLBitmapCreate(int type, int xsize, int ysize, byte *bits); void VGLBitmapDestroy(VGLBitmap *object); int VGLBitmapAllocateBits(VGLBitmap *object); void VGLBitmapCvt(VGLBitmap *src, VGLBitmap *dst); /* keyboard.c */ int VGLKeyboardInit(int mode); void VGLKeyboardEnd(void); int VGLKeyboardGetCh(void); /* main.c */ void VGLEnd(void); int VGLInit(int mode); void VGLCheckSwitch(void); int VGLSetVScreenSize(VGLBitmap *object, int VXsize, int VYsize); int VGLPanScreen(VGLBitmap *object, int x, int y); int VGLSetSegment(unsigned int offset); /* mouse.c */ -void VGLMousePointerShow(void); -void VGLMousePointerHide(void); +int __VGLMouseMode(int mode); void VGLMouseMode(int mode); -void VGLMouseAction(int dummy); void VGLMouseSetImage(VGLBitmap *AndMask, VGLBitmap *OrMask); void VGLMouseSetStdImage(void); int VGLMouseInit(int mode); void VGLMouseRestore(void); int VGLMouseStatus(int *x, int *y, char *buttons); -int VGLMouseFreeze(int x, int y, int width, int hight, u_long color); +void VGLMouseFreeze(void); +int VGLMouseFreezeXY(int x, int y); +void VGLMouseMerge(int x, int y, int width, byte *line); +int VGLMouseOverlap(int x, int y, int width, int hight); void VGLMouseUnFreeze(void); /* simple.c */ void VGLSetXY(VGLBitmap *object, int x, int y, u_long color); u_long VGLGetXY(VGLBitmap *object, int x, int y); void VGLLine(VGLBitmap *object, int x1, int y1, int x2, int y2, u_long color); void VGLBox(VGLBitmap *object, int x1, int y1, int x2, int y2, u_long color); void VGLFilledBox(VGLBitmap *object, int x1, int y1, int x2, int y2, u_long color); void VGLEllipse(VGLBitmap *object, int xc, int yc, int a, int b, u_long color); void VGLFilledEllipse(VGLBitmap *object, int xc, int yc, int a, int b, u_long color); void VGLClear(VGLBitmap *object, u_long color); u_long 
VGLrgb332ToNative(byte c); void VGLRestoreBlank(void); void VGLRestoreBorder(void); void VGLRestorePalette(void); void VGLSavePalette(void); void VGLSetPalette(byte *red, byte *green, byte *blue); void VGLSetPaletteIndex(byte color, byte red, byte green, byte blue); void VGLSetBorder(byte color); void VGLBlankDisplay(int blank); /* text.c */ int VGLTextSetFontFile(char *filename); void VGLBitmapPutChar(VGLBitmap *Object, int x, int y, byte ch, u_long fgcol, u_long bgcol, int fill, int dir); void VGLBitmapString(VGLBitmap *Object, int x, int y, char *str, u_long fgcol, u_long bgcol, int fill, int dir); #endif /* !_VGL_H_ */ Index: user/ngie/bug-237403/libexec/rc/rc.initdiskless =================================================================== --- user/ngie/bug-237403/libexec/rc/rc.initdiskless (revision 346925) +++ user/ngie/bug-237403/libexec/rc/rc.initdiskless (revision 346926) @@ -1,400 +1,404 @@ #!/bin/sh # # Copyright (c) 1999 Matt Dillon # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. # # $FreeBSD$ # On entry to this script the entire system consists of a read-only root # mounted via NFS. The kernel has run BOOTP and configured an interface # (otherwise it would not have been able to mount the NFS root!) # # We use the contents of /conf to create and populate memory filesystems # that are mounted on top of this root to implement the writable # (and host-specific) parts of the root filesystem, and other volatile # filesystems. # # The hierarchy in /conf has the form /conf/T/M/ where M are directories # for which memory filesystems will be created and filled, # and T is one of the "template" directories below: # # base universal base, typically a replica of the original root; # default secondary universal base, typically overriding some # of the files in the original root; # ${ipba} where ${ipba} is the assigned broadcast IP address # bcast/${ipba} same as above # ${class} where ${class} is a list of directories supplied by # bootp/dhcp through the T134 option. 
# ${ipba} and ${class} are typically used to configure features # for group of diskless clients, or even individual features; # ${ip} where ${ip} is the machine's assigned IP address, typically # used to set host-specific features; # ip/${ip} same as above # # Template directories are scanned in the order they are listed above, # with each successive directory overriding (merged into) the previous one; # non-existing directories are ignored. The subdirectory forms exist to # help keep the top level /conf manageable in large installations. # # The existence of a directory /conf/T/M causes this script to create a # memory filesystem mounted as /M on the client. # # Some files in /conf have special meaning, namely: # # Filename Action # ---------------------------------------------------------------- # /conf/T/M/remount # The contents of the file is a mount command. E.g. if # /conf/1.2.3.4/foo/remount contains "mount -o ro /dev/ad0s3", # then /dev/ad0s3 will be mounted on /conf/1.2.3.4/foo/ # # /conf/T/M/remount_optional # If this file exists, then failure to execute the mount # command contained in /conf/T/M/remount is non-fatal. # # /conf/T/M/remount_subdir # If this file exists, then the behaviour of /conf/T/M/remount # changes as follows: # 1. /conf/T/M/remount is invoked to mount the root of the # filesystem where the configuration data exists on a # temporary mountpoint. # 2. /conf/T/M/remount_subdir is then invoked to mount a # *subdirectory* of the filesystem mounted by # /conf/T/M/remount on /conf/T/M/. # # /conf/T/M/diskless_remount # The contents of the file points to an NFS filesystem, # possibly followed by mount_nfs options. If the server name # is omitted, the script will prepend the root path used when # booting. E.g. if you booted from foo.com:/path/to/root, # an entry for /conf/base/etc/diskless_remount could be any of # foo.com:/path/to/root/etc # /etc -o ro # Because mount_nfs understands ".." in paths, it is # possible to mount from locations above the NFS root with # paths such as "/../../etc". # # /conf/T/M/md_size # The contents of the file specifies the size of the memory # filesystem to be created, in 512 byte blocks. # The default size is 10240 blocks (5MB). E.g. if # /conf/base/etc/md_size contains "30000" then a 15MB MFS # will be created. In case of multiple entries for the same # directory M, the last one in the scanning order is used. # NOTE: If you only need to create a memory filesystem but not # initialize it from a template, it is preferable to specify # it in fstab e.g. as "md /tmp mfs -s=30m,rw 0 0" # # /conf/T/SUBDIR.cpio.gz # The file is cpio'd into /SUBDIR (and a memory filesystem is # created for /SUBDIR if necessary). The presence of this file # prevents the copy from /conf/T/SUBDIR/ # # /conf/T/M/extract # This is alternative to SUBDIR.cpio.gz and remount. # Similar to remount case, a memory filesystem is created # for /M and initialized from a template but no mounting # performed. Instead, this file is run passing /M as single # argument. It is expected to extract template override to /M # using auxiliary storage found in some embedded systems # having NVRAM too small to hold mountable file system. # # /conf/T/SUBDIR.remove # The list of paths contained in the file are rm -rf'd # relative to /SUBDIR. # # /conf/diskless_remount # Similar to /conf/T/M/diskless_remount above, but allows # all of /conf to be remounted. This can be used to allow # multiple roots to share the same /conf. 
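# As a concrete illustration of the two less obvious forms above (all names
# here are only examples): a /conf/base/etc.remove file containing
#
#	resolv.conf dhclient.conf
#
# causes those files to be deleted from the freshly populated /etc, and a
# /conf/base/etc.cpio.gz archive, which then takes the place of copying
# /conf/base/etc/, can be built on the server with e.g.
#
#	cd /conf/base && find etc | cpio -o -H newc | gzip > etc.cpio.gz
#
# (the archive is unpacked from /, so member paths carry the etc/ prefix).
#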
# # # You will almost universally want to create the following files under /conf # # File Content # ---------------------------- ---------------------------------- # /conf/base/etc/md_size size of /etc filesystem # /conf/base/etc/diskless_remount "/etc" # /conf/default/etc/rc.conf generic diskless config parameters # /conf/default/etc/fstab generic diskless fstab e.g. like this # # foo:/root_part / nfs ro 0 0 # foo:/usr_part /usr nfs ro 0 0 # foo:/home_part /home nfs rw 0 0 # md /tmp mfs -s=30m,rw 0 0 # md /var mfs -s=30m,rw 0 0 # proc /proc procfs rw 0 0 # # plus, possibly, overrides for password files etc. # # NOTE! /var, /tmp, and /dev will be typically created elsewhere, e.g. # as entries in the fstab as above. # Those filesystems should not be specified in /conf. # # (end of documentation, now get to the real code) dlv=`/sbin/sysctl -n vfs.nfs.diskless_valid 2> /dev/null` # DEBUGGING # log something on stdout if verbose. o_verbose=0 # set to 1 or 2 if you want more debugging log() { [ ${o_verbose} -gt 0 ] && echo "*** $* ***" [ ${o_verbose} -gt 1 ] && read -p "=== Press enter to continue" foo } # chkerr: # # Routine to check for error # # checks error code and drops into shell on failure. # if shell exits, terminates script as well as /etc/rc. # if remount_optional exists under the mountpoint, skip this check. # chkerr() { lastitem () ( n=$(($# - 1)) ; shift $n ; echo $1 ) mountpoint="$(lastitem $2)" [ -r $mountpoint/remount_optional ] && ( echo "$2 failed: ignoring due to remount_optional" ; return ) case $1 in 0) ;; *) echo "$2 failed: dropping into /bin/sh" /bin/sh # RESUME ;; esac } # The list of filesystems to umount after the copy to_umount="" handle_remount() { # $1 = mount point local nfspt mountopts b b=$1 log handle_remount $1 [ -d $b -a -f $b/diskless_remount ] || return read nfspt mountopts < $b/diskless_remount log "nfspt ${nfspt} mountopts ${mountopts}" # prepend the nfs root if not present [ `expr "$nfspt" : '\(.\)'` = "/" ] && nfspt="${nfsroot}${nfspt}" mount_nfs $mountopts $nfspt $b chkerr $? "mount_nfs $nfspt $b" to_umount="$b ${to_umount}" } # Create a generic memory disk. # The 'auto' parameter will attempt to use tmpfs(5), falls back to md(4). # $1 is size in 512-byte sectors, $2 is the mount point. mount_md() { - /sbin/mdmfs -s $1 auto $2 + if [ ${o_verbose} -gt 0 ] ; then + /sbin/mdmfs -XL -s $1 auto $2 + else + /sbin/mdmfs -s $1 auto $2 + fi } # Create the memory filesystem if it has not already been created # create_md() { [ "x`eval echo \\$md_created_$1`" = "x" ] || return # only once if [ "x`eval echo \\$md_size_$1`" = "x" ]; then md_size=10240 else md_size=`eval echo \\$md_size_$1` fi log create_md $1 with size $md_size mount_md $md_size /$1 /bin/chmod 755 /$1 eval md_created_$1=created } # DEBUGGING # # set -v # Figure out our interface and IP. # bootp_ifc="" bootp_ipa="" bootp_ipbca="" class="" if [ ${dlv:=0} -ne 0 ] ; then iflist=`ifconfig -l` for i in ${iflist} ; do set -- `ifconfig ${i}` while [ $# -ge 1 ] ; do if [ "${bootp_ifc}" = "" -a "$1" = "inet" ] ; then bootp_ifc=${i} ; bootp_ipa=${2} ; shift fi if [ "${bootp_ipbca}" = "" -a "$1" = "broadcast" ] ; then bootp_ipbca=$2; shift fi shift done if [ "${bootp_ifc}" != "" ] ; then break fi done # Get the values passed with the T134 bootp cookie. 
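# For example, if the DHCP/BOOTP server hands out the string "lab mediaplayer"
# in option 134, kern.bootp_cookie reads back as "lab mediaplayer" and the
# template list assembled below gains /conf/lab and /conf/mediaplayer
# (directory names here are purely illustrative).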
class="`/sbin/sysctl -qn kern.bootp_cookie`" echo "Interface ${bootp_ifc} IP-Address ${bootp_ipa} Broadcast ${bootp_ipbca} ${class}" fi log Figure out our NFS root path # set -- `mount -t nfs` while [ $# -ge 1 ] ; do if [ "$2" = "on" -a "$3" = "/" ]; then nfsroot="$1" break fi shift done # The list of directories with template files templates="base default" if [ -n "${bootp_ipbca}" ]; then templates="${templates} ${bootp_ipbca} bcast/${bootp_ipbca}" fi if [ -n "${class}" ]; then templates="${templates} ${class}" fi if [ -n "${bootp_ipa}" ]; then templates="${templates} ${bootp_ipa} ip/${bootp_ipa}" fi # If /conf/diskless_remount exists, remount all of /conf. handle_remount /conf # Resolve templates in /conf/base, /conf/default, /conf/${bootp_ipbca}, # and /conf/${bootp_ipa}. For each subdirectory found within these # directories: # # - calculate memory filesystem sizes. If the subdirectory (prior to # NFS remounting) contains the file 'md_size', the contents specified # in 512 byte sectors will be used to size the memory filesystem. Otherwise # 8192 sectors (4MB) is used. # # - handle NFS remounts. If the subdirectory contains the file # diskless_remount, the contents of the file is NFS mounted over # the directory. For example /conf/base/etc/diskless_remount # might contain 'myserver:/etc'. NFS remounts allow you to avoid # having to dup your system directories in /conf. Your server must # be sure to export those filesystems -alldirs, however. # If the diskless_remount file contains a string beginning with a # '/' it is assumed that the local nfsroot should be prepended to # it before attemping to the remount. This allows the root to be # relocated without needing to change the remount files. # log "templates are ${templates}" for i in ${templates} ; do for j in /conf/$i/* ; do [ -d $j ] || continue # memory filesystem size specification subdir=${j##*/} [ -f $j/md_size ] && eval md_size_$subdir=`cat $j/md_size` # remount. Beware, the command is in the file itself! if [ -f $j/remount ]; then if [ -f $j/remount_subdir ]; then k="/conf.tmp/$i/$subdir" [ -d $k ] || continue # Mount the filesystem root where the config data is # on the temporary mount point. nfspt=`/bin/cat $j/remount` $nfspt $k chkerr $? "$nfspt $k" # Now use a nullfs mount to get the data where we # really want to see it. remount_subdir=`/bin/cat $j/remount_subdir` remount_subdir_cmd="mount -t nullfs $k/$remount_subdir" $remount_subdir_cmd $j chkerr $? "$remount_subdir_cmd $j" # XXX check order -- we must force $k to be unmounted # after j, as j depends on k. to_umount="$j $k ${to_umount}" else nfspt=`/bin/cat $j/remount` $nfspt $j chkerr $? "$nfspt $j" to_umount="$j ${to_umount}" # XXX hope it is really a mount! fi fi # NFS remount handle_remount $j done done # - Create all required MFS filesystems and populate them from # our templates. Support both a direct template and a dir.cpio.gz # archive. Support for auxiliary NVRAM. Support dir.remove files containing # a list of relative paths to remove. # # The dir.cpio.gz form is there to make the copy process more efficient, # so if the cpio archive is present, it prevents the files from dir/ # from being copied. for i in ${templates} ; do for j in /conf/$i/* ; do subdir=${j##*/} if [ -d $j -a ! 
-f $j.cpio.gz ]; then create_md $subdir cp -Rp $j/ /$subdir fi done for j in /conf/$i/*.cpio.gz ; do subdir=${j%*.cpio.gz} subdir=${subdir##*/} if [ -f $j ]; then create_md $subdir echo "Loading /$subdir from cpio archive $j" (cd / ; /rescue/tar -xpf $j) fi done for j in /conf/$i/*/extract ; do if [ -x $j ]; then subdir=${j%*/extract} subdir=${subdir##*/} create_md $subdir echo "Loading /$subdir using auxiliary command $j" $j /$subdir fi done for j in /conf/$i/*.remove ; do subdir=${j%*.remove} subdir=${subdir##*/} if [ -f $j ]; then # doubly sure it is a memory disk before rm -rf'ing create_md $subdir (cd /$subdir; rm -rf `/bin/cat $j`) fi done done # umount partitions used to fill the memory filesystems [ -n "${to_umount}" ] && umount $to_umount Index: user/ngie/bug-237403/sbin/ifconfig/ifgre.c =================================================================== --- user/ngie/bug-237403/sbin/ifconfig/ifgre.c (revision 346925) +++ user/ngie/bug-237403/sbin/ifconfig/ifgre.c (revision 346926) @@ -1,123 +1,144 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2008 Andrew Thompson. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR OR HIS RELATIVES BE LIABLE FOR ANY DIRECT, * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR * SERVICES; LOSS OF MIND, USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF * THE POSSIBILITY OF SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include "ifconfig.h" -#define GREBITS "\020\01ENABLE_CSUM\02ENABLE_SEQ" +#define GREBITS "\020\01ENABLE_CSUM\02ENABLE_SEQ\03UDPENCAP" static void gre_status(int s); static void gre_status(int s) { - uint32_t opts = 0; + uint32_t opts, port; + opts = 0; ifr.ifr_data = (caddr_t)&opts; if (ioctl(s, GREGKEY, &ifr) == 0) if (opts != 0) printf("\tgrekey: 0x%x (%u)\n", opts, opts); opts = 0; if (ioctl(s, GREGOPTS, &ifr) != 0 || opts == 0) return; + + port = 0; + ifr.ifr_data = (caddr_t)&port; + if (ioctl(s, GREGPORT, &ifr) == 0 && port != 0) + printf("\tudpport: %u\n", port); printb("\toptions", opts, GREBITS); putchar('\n'); } static void setifgrekey(const char *val, int dummy __unused, int s, const struct afswtch *afp) { uint32_t grekey = strtol(val, NULL, 0); strlcpy(ifr.ifr_name, name, sizeof (ifr.ifr_name)); ifr.ifr_data = (caddr_t)&grekey; if (ioctl(s, GRESKEY, (caddr_t)&ifr) < 0) warn("ioctl (set grekey)"); } static void +setifgreport(const char *val, int dummy __unused, int s, + const struct afswtch *afp) +{ + uint32_t udpport = strtol(val, NULL, 0); + + strlcpy(ifr.ifr_name, name, sizeof (ifr.ifr_name)); + ifr.ifr_data = (caddr_t)&udpport; + if (ioctl(s, GRESPORT, (caddr_t)&ifr) < 0) + warn("ioctl (set udpport)"); +} + +static void setifgreopts(const char *val, int d, int s, const struct afswtch *afp) { uint32_t opts; ifr.ifr_data = (caddr_t)&opts; if (ioctl(s, GREGOPTS, &ifr) == -1) { warn("ioctl(GREGOPTS)"); return; } if (d < 0) opts &= ~(-d); else opts |= d; if (ioctl(s, GRESOPTS, &ifr) == -1) { warn("ioctl(GIFSOPTS)"); return; } } static struct cmd gre_cmds[] = { DEF_CMD_ARG("grekey", setifgrekey), + DEF_CMD_ARG("udpport", setifgreport), DEF_CMD("enable_csum", GRE_ENABLE_CSUM, setifgreopts), DEF_CMD("-enable_csum",-GRE_ENABLE_CSUM,setifgreopts), DEF_CMD("enable_seq", GRE_ENABLE_SEQ, setifgreopts), DEF_CMD("-enable_seq",-GRE_ENABLE_SEQ, setifgreopts), + DEF_CMD("udpencap", GRE_UDPENCAP, setifgreopts), + DEF_CMD("-udpencap",-GRE_UDPENCAP, setifgreopts), }; static struct afswtch af_gre = { .af_name = "af_gre", .af_af = AF_UNSPEC, .af_other_status = gre_status, }; static __constructor void gre_ctor(void) { size_t i; for (i = 0; i < nitems(gre_cmds); i++) cmd_register(&gre_cmds[i]); af_register(&af_gre); } Index: user/ngie/bug-237403/sbin/ipfw/ipfw2.c =================================================================== --- user/ngie/bug-237403/sbin/ipfw/ipfw2.c (revision 346925) +++ user/ngie/bug-237403/sbin/ipfw/ipfw2.c (revision 346926) @@ -1,5583 +1,5587 @@ /*- * Copyright (c) 2002-2003 Luigi Rizzo * Copyright (c) 1996 Alex Nash, Paul Traina, Poul-Henning Kamp * Copyright (c) 1994 Ugen J.S.Antsilevich * * Idea and grammar partially left from: * Copyright (c) 1993 Daniel Boulet * * Redistribution and use in source forms, with and without modification, * are permitted provided that this entire comment appears intact. * * Redistribution in binary form may occur without any restrictions. * Obviously, it would be nice if you gave credit where credit is due * but requiring it would be too onerous. * * This software is provided ``AS IS'' without any warranties of any kind. 
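
The new ifgre.c verbs above map onto command-line usage roughly as follows (addresses,
key and port are placeholder values, and the kernel gre(4) in use must support UDP
encapsulation):

    ifconfig gre0 create
    ifconfig gre0 tunnel 198.51.100.1 198.51.100.2
    ifconfig gre0 inet 192.0.2.1 192.0.2.2
    ifconfig gre0 grekey 100 udpencap
    ifconfig gre0 udpport 4754
    ifconfig gre0        # status now also reports UDPENCAP and, when set, udpport

As with enable_csum/enable_seq, the -udpencap form clears the corresponding option bit
again.
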
* * NEW command line interface for IP firewall facility * * $FreeBSD$ */ #include #include #include #include #include #include "ipfw2.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* ctime */ #include /* _long_to_time */ #include #include #include /* offsetof */ #include #include /* only IFNAMSIZ */ #include #include /* only n_short, n_long */ #include #include #include #include #include struct cmdline_opts co; /* global options */ struct format_opts { int bcwidth; int pcwidth; int show_counters; int show_time; /* show timestamp */ uint32_t set_mask; /* enabled sets mask */ uint32_t flags; /* request flags */ uint32_t first; /* first rule to request */ uint32_t last; /* last rule to request */ uint32_t dcnt; /* number of dynamic states */ ipfw_obj_ctlv *tstate; /* table state data */ }; int resvd_set_number = RESVD_SET; int ipfw_socket = -1; #define CHECK_LENGTH(v, len) do { \ if ((v) < (len)) \ errx(EX_DATAERR, "Rule too long"); \ } while (0) /* * Check if we have enough space in cmd buffer. Note that since * first 8? u32 words are reserved by reserved header, full cmd * buffer can't be used, so we need to protect from buffer overrun * only. At the beginning, cblen is less than actual buffer size by * size of ipfw_insn_u32 instruction + 1 u32 work. This eliminates need * for checking small instructions fitting in given range. * We also (ab)use the fact that ipfw_insn is always the first field * for any custom instruction. */ #define CHECK_CMDLEN CHECK_LENGTH(cblen, F_LEN((ipfw_insn *)cmd)) #define GET_UINT_ARG(arg, min, max, tok, s_x) do { \ if (!av[0]) \ errx(EX_USAGE, "%s: missing argument", match_value(s_x, tok)); \ if (_substrcmp(*av, "tablearg") == 0) { \ arg = IP_FW_TARG; \ break; \ } \ \ { \ long _xval; \ char *end; \ \ _xval = strtol(*av, &end, 10); \ \ if (!isdigit(**av) || *end != '\0' || (_xval == 0 && errno == EINVAL)) \ errx(EX_DATAERR, "%s: invalid argument: %s", \ match_value(s_x, tok), *av); \ \ if (errno == ERANGE || _xval < min || _xval > max) \ errx(EX_DATAERR, "%s: argument is out of range (%u..%u): %s", \ match_value(s_x, tok), min, max, *av); \ \ if (_xval == IP_FW_TARG) \ errx(EX_DATAERR, "%s: illegal argument value: %s", \ match_value(s_x, tok), *av); \ arg = _xval; \ } \ } while (0) static struct _s_x f_tcpflags[] = { { "syn", TH_SYN }, { "fin", TH_FIN }, { "ack", TH_ACK }, { "psh", TH_PUSH }, { "rst", TH_RST }, { "urg", TH_URG }, { "tcp flag", 0 }, { NULL, 0 } }; static struct _s_x f_tcpopts[] = { { "mss", IP_FW_TCPOPT_MSS }, { "maxseg", IP_FW_TCPOPT_MSS }, { "window", IP_FW_TCPOPT_WINDOW }, { "sack", IP_FW_TCPOPT_SACK }, { "ts", IP_FW_TCPOPT_TS }, { "timestamp", IP_FW_TCPOPT_TS }, { "cc", IP_FW_TCPOPT_CC }, { "tcp option", 0 }, { NULL, 0 } }; /* * IP options span the range 0 to 255 so we need to remap them * (though in fact only the low 5 bits are significant). 
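 *
 * Illustrative example (not part of the committed source): an option such as
 *
 *	ipoptions ssrr,!ts
 *
 * is parsed against this table into a set mask (IP_FW_IPOPT_SSRR) and a
 * clear mask (IP_FW_IPOPT_TS) packed into the opcode's arg1, and
 * print_flags() later performs the reverse mapping when the rule is listed.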
*/ static struct _s_x f_ipopts[] = { { "ssrr", IP_FW_IPOPT_SSRR}, { "lsrr", IP_FW_IPOPT_LSRR}, { "rr", IP_FW_IPOPT_RR}, { "ts", IP_FW_IPOPT_TS}, { "ip option", 0 }, { NULL, 0 } }; static struct _s_x f_iptos[] = { { "lowdelay", IPTOS_LOWDELAY}, { "throughput", IPTOS_THROUGHPUT}, { "reliability", IPTOS_RELIABILITY}, { "mincost", IPTOS_MINCOST}, { "congestion", IPTOS_ECN_CE}, { "ecntransport", IPTOS_ECN_ECT0}, { "ip tos option", 0}, { NULL, 0 } }; struct _s_x f_ipdscp[] = { { "af11", IPTOS_DSCP_AF11 >> 2 }, /* 001010 */ { "af12", IPTOS_DSCP_AF12 >> 2 }, /* 001100 */ { "af13", IPTOS_DSCP_AF13 >> 2 }, /* 001110 */ { "af21", IPTOS_DSCP_AF21 >> 2 }, /* 010010 */ { "af22", IPTOS_DSCP_AF22 >> 2 }, /* 010100 */ { "af23", IPTOS_DSCP_AF23 >> 2 }, /* 010110 */ { "af31", IPTOS_DSCP_AF31 >> 2 }, /* 011010 */ { "af32", IPTOS_DSCP_AF32 >> 2 }, /* 011100 */ { "af33", IPTOS_DSCP_AF33 >> 2 }, /* 011110 */ { "af41", IPTOS_DSCP_AF41 >> 2 }, /* 100010 */ { "af42", IPTOS_DSCP_AF42 >> 2 }, /* 100100 */ { "af43", IPTOS_DSCP_AF43 >> 2 }, /* 100110 */ { "be", IPTOS_DSCP_CS0 >> 2 }, /* 000000 */ { "ef", IPTOS_DSCP_EF >> 2 }, /* 101110 */ { "cs0", IPTOS_DSCP_CS0 >> 2 }, /* 000000 */ { "cs1", IPTOS_DSCP_CS1 >> 2 }, /* 001000 */ { "cs2", IPTOS_DSCP_CS2 >> 2 }, /* 010000 */ { "cs3", IPTOS_DSCP_CS3 >> 2 }, /* 011000 */ { "cs4", IPTOS_DSCP_CS4 >> 2 }, /* 100000 */ { "cs5", IPTOS_DSCP_CS5 >> 2 }, /* 101000 */ { "cs6", IPTOS_DSCP_CS6 >> 2 }, /* 110000 */ { "cs7", IPTOS_DSCP_CS7 >> 2 }, /* 100000 */ { NULL, 0 } }; static struct _s_x limit_masks[] = { {"all", DYN_SRC_ADDR|DYN_SRC_PORT|DYN_DST_ADDR|DYN_DST_PORT}, {"src-addr", DYN_SRC_ADDR}, {"src-port", DYN_SRC_PORT}, {"dst-addr", DYN_DST_ADDR}, {"dst-port", DYN_DST_PORT}, {NULL, 0} }; /* * we use IPPROTO_ETHERTYPE as a fake protocol id to call the print routines * This is only used in this code. */ #define IPPROTO_ETHERTYPE 0x1000 static struct _s_x ether_types[] = { /* * Note, we cannot use "-:&/" in the names because they are field * separators in the type specifications. Also, we use s = NULL as * end-delimiter, because a type of 0 can be legal. 
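 *
 * Illustrative use (not part of the committed source): with name resolution
 * enabled, a rule option such as
 *
 *	mac-type arp,rarp
 *
 * is assumed to be resolved by strtoport() with proto == IPPROTO_ETHERTYPE,
 * so "arp" and "rarp" map to 0x0806 and 0x8035 via this table, and
 * print_port() maps the numbers back to names on output.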
*/ { "ip", 0x0800 }, { "ipv4", 0x0800 }, { "ipv6", 0x86dd }, { "arp", 0x0806 }, { "rarp", 0x8035 }, { "vlan", 0x8100 }, { "loop", 0x9000 }, { "trail", 0x1000 }, { "at", 0x809b }, { "atalk", 0x809b }, { "aarp", 0x80f3 }, { "pppoe_disc", 0x8863 }, { "pppoe_sess", 0x8864 }, { "ipx_8022", 0x00E0 }, { "ipx_8023", 0x0000 }, { "ipx_ii", 0x8137 }, { "ipx_snap", 0x8137 }, { "ipx", 0x8137 }, { "ns", 0x0600 }, { NULL, 0 } }; static struct _s_x rule_eactions[] = { { "nat64clat", TOK_NAT64CLAT }, { "nat64lsn", TOK_NAT64LSN }, { "nat64stl", TOK_NAT64STL }, { "nptv6", TOK_NPTV6 }, { "tcp-setmss", TOK_TCPSETMSS }, { NULL, 0 } /* terminator */ }; static struct _s_x rule_actions[] = { { "abort6", TOK_ABORT6 }, { "abort", TOK_ABORT }, { "accept", TOK_ACCEPT }, { "pass", TOK_ACCEPT }, { "allow", TOK_ACCEPT }, { "permit", TOK_ACCEPT }, { "count", TOK_COUNT }, { "pipe", TOK_PIPE }, { "queue", TOK_QUEUE }, { "divert", TOK_DIVERT }, { "tee", TOK_TEE }, { "netgraph", TOK_NETGRAPH }, { "ngtee", TOK_NGTEE }, { "fwd", TOK_FORWARD }, { "forward", TOK_FORWARD }, { "skipto", TOK_SKIPTO }, { "deny", TOK_DENY }, { "drop", TOK_DENY }, { "reject", TOK_REJECT }, { "reset6", TOK_RESET6 }, { "reset", TOK_RESET }, { "unreach6", TOK_UNREACH6 }, { "unreach", TOK_UNREACH }, { "check-state", TOK_CHECKSTATE }, { "//", TOK_COMMENT }, { "nat", TOK_NAT }, { "reass", TOK_REASS }, { "setfib", TOK_SETFIB }, { "setdscp", TOK_SETDSCP }, { "call", TOK_CALL }, { "return", TOK_RETURN }, { "eaction", TOK_EACTION }, { "tcp-setmss", TOK_TCPSETMSS }, { NULL, 0 } /* terminator */ }; static struct _s_x rule_action_params[] = { { "altq", TOK_ALTQ }, { "log", TOK_LOG }, { "tag", TOK_TAG }, { "untag", TOK_UNTAG }, { NULL, 0 } /* terminator */ }; /* * The 'lookup' instruction accepts one of the following arguments. * -1 is a terminator for the list. * Arguments are passed as v[1] in O_DST_LOOKUP options. 
*/ static int lookup_key[] = { TOK_DSTIP, TOK_SRCIP, TOK_DSTPORT, TOK_SRCPORT, TOK_UID, TOK_JAIL, TOK_DSCP, -1 }; static struct _s_x rule_options[] = { { "tagged", TOK_TAGGED }, { "uid", TOK_UID }, { "gid", TOK_GID }, { "jail", TOK_JAIL }, { "in", TOK_IN }, { "limit", TOK_LIMIT }, { "set-limit", TOK_SETLIMIT }, { "keep-state", TOK_KEEPSTATE }, { "record-state", TOK_RECORDSTATE }, { "bridged", TOK_LAYER2 }, { "layer2", TOK_LAYER2 }, { "out", TOK_OUT }, { "diverted", TOK_DIVERTED }, { "diverted-loopback", TOK_DIVERTEDLOOPBACK }, { "diverted-output", TOK_DIVERTEDOUTPUT }, { "xmit", TOK_XMIT }, { "recv", TOK_RECV }, { "via", TOK_VIA }, { "fragment", TOK_FRAG }, { "frag", TOK_FRAG }, { "fib", TOK_FIB }, { "ipoptions", TOK_IPOPTS }, { "ipopts", TOK_IPOPTS }, { "iplen", TOK_IPLEN }, { "ipid", TOK_IPID }, { "ipprecedence", TOK_IPPRECEDENCE }, { "dscp", TOK_DSCP }, { "iptos", TOK_IPTOS }, { "ipttl", TOK_IPTTL }, { "ipversion", TOK_IPVER }, { "ipver", TOK_IPVER }, { "estab", TOK_ESTAB }, { "established", TOK_ESTAB }, { "setup", TOK_SETUP }, { "sockarg", TOK_SOCKARG }, { "tcpdatalen", TOK_TCPDATALEN }, { "tcpflags", TOK_TCPFLAGS }, { "tcpflgs", TOK_TCPFLAGS }, { "tcpoptions", TOK_TCPOPTS }, { "tcpopts", TOK_TCPOPTS }, { "tcpseq", TOK_TCPSEQ }, { "tcpack", TOK_TCPACK }, { "tcpwin", TOK_TCPWIN }, { "icmptype", TOK_ICMPTYPES }, { "icmptypes", TOK_ICMPTYPES }, { "dst-ip", TOK_DSTIP }, { "src-ip", TOK_SRCIP }, { "dst-port", TOK_DSTPORT }, { "src-port", TOK_SRCPORT }, { "proto", TOK_PROTO }, { "MAC", TOK_MAC }, { "mac", TOK_MAC }, { "mac-type", TOK_MACTYPE }, { "verrevpath", TOK_VERREVPATH }, { "versrcreach", TOK_VERSRCREACH }, { "antispoof", TOK_ANTISPOOF }, { "ipsec", TOK_IPSEC }, { "icmp6type", TOK_ICMP6TYPES }, { "icmp6types", TOK_ICMP6TYPES }, { "ext6hdr", TOK_EXT6HDR}, { "flow-id", TOK_FLOWID}, { "ipv6", TOK_IPV6}, { "ip6", TOK_IPV6}, { "ipv4", TOK_IPV4}, { "ip4", TOK_IPV4}, { "dst-ipv6", TOK_DSTIP6}, { "dst-ip6", TOK_DSTIP6}, { "src-ipv6", TOK_SRCIP6}, { "src-ip6", TOK_SRCIP6}, { "lookup", TOK_LOOKUP}, { "flow", TOK_FLOW}, { "defer-action", TOK_SKIPACTION }, { "defer-immediate-action", TOK_SKIPACTION }, { "//", TOK_COMMENT }, { "not", TOK_NOT }, /* pseudo option */ { "!", /* escape ? */ TOK_NOT }, /* pseudo option */ { "or", TOK_OR }, /* pseudo option */ { "|", /* escape */ TOK_OR }, /* pseudo option */ { "{", TOK_STARTBRACE }, /* pseudo option */ { "(", TOK_STARTBRACE }, /* pseudo option */ { "}", TOK_ENDBRACE }, /* pseudo option */ { ")", TOK_ENDBRACE }, /* pseudo option */ { NULL, 0 } /* terminator */ }; void bprint_uint_arg(struct buf_pr *bp, const char *str, uint32_t arg); static int ipfw_get_config(struct cmdline_opts *co, struct format_opts *fo, ipfw_cfg_lheader **pcfg, size_t *psize); static int ipfw_show_config(struct cmdline_opts *co, struct format_opts *fo, ipfw_cfg_lheader *cfg, size_t sz, int ac, char **av); static void ipfw_list_tifaces(void); struct tidx; static uint16_t pack_object(struct tidx *tstate, char *name, int otype); static uint16_t pack_table(struct tidx *tstate, char *name); static char *table_search_ctlv(ipfw_obj_ctlv *ctlv, uint16_t idx); static void object_sort_ctlv(ipfw_obj_ctlv *ctlv); static char *object_search_ctlv(ipfw_obj_ctlv *ctlv, uint16_t idx, uint16_t type); /* * Simple string buffer API. * Used to simplify buffer passing between function and for * transparent overrun handling. */ /* * Allocates new buffer of given size @sz. * * Returns 0 on success. 
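 *
 * Illustrative usage sketch (not from the original source), mirroring how
 * the rest of this file drives the buffer API:
 *
 *	struct buf_pr bp;
 *
 *	if (bp_alloc(&bp, 4096) == 0) {
 *		bprintf(&bp, "rule %05u", 100U);
 *		printf("%s\n", bp.buf);
 *		bp_free(&bp);
 *	}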
*/ int bp_alloc(struct buf_pr *b, size_t size) { memset(b, 0, sizeof(struct buf_pr)); if ((b->buf = calloc(1, size)) == NULL) return (ENOMEM); b->ptr = b->buf; b->size = size; b->avail = b->size; return (0); } void bp_free(struct buf_pr *b) { free(b->buf); } /* * Flushes buffer so new writer start from beginning. */ void bp_flush(struct buf_pr *b) { b->ptr = b->buf; b->avail = b->size; b->buf[0] = '\0'; } /* * Print message specified by @format and args. * Automatically manage buffer space and transparently handle * buffer overruns. * * Returns number of bytes that should have been printed. */ int bprintf(struct buf_pr *b, char *format, ...) { va_list args; int i; va_start(args, format); i = vsnprintf(b->ptr, b->avail, format, args); va_end(args); if (i > b->avail || i < 0) { /* Overflow or print error */ b->avail = 0; } else { b->ptr += i; b->avail -= i; } b->needed += i; return (i); } /* * Special values printer for tablearg-aware opcodes. */ void bprint_uint_arg(struct buf_pr *bp, const char *str, uint32_t arg) { if (str != NULL) bprintf(bp, "%s", str); if (arg == IP_FW_TARG) bprintf(bp, "tablearg"); else bprintf(bp, "%u", arg); } /* * Helper routine to print a possibly unaligned uint64_t on * various platform. If width > 0, print the value with * the desired width, followed by a space; * otherwise, return the required width. */ int pr_u64(struct buf_pr *b, uint64_t *pd, int width) { #ifdef TCC #define U64_FMT "I64" #else #define U64_FMT "llu" #endif uint64_t u; unsigned long long d; bcopy (pd, &u, sizeof(u)); d = u; return (width > 0) ? bprintf(b, "%*" U64_FMT " ", width, d) : snprintf(NULL, 0, "%" U64_FMT, d) ; #undef U64_FMT } void * safe_calloc(size_t number, size_t size) { void *ret = calloc(number, size); if (ret == NULL) err(EX_OSERR, "calloc"); return ret; } void * safe_realloc(void *ptr, size_t size) { void *ret = realloc(ptr, size); if (ret == NULL) err(EX_OSERR, "realloc"); return ret; } /* * Compare things like interface or table names. */ int stringnum_cmp(const char *a, const char *b) { int la, lb; la = strlen(a); lb = strlen(b); if (la > lb) return (1); else if (la < lb) return (-01); return (strcmp(a, b)); } /* * conditionally runs the command. * Selected options or negative -> getsockopt */ int do_cmd(int optname, void *optval, uintptr_t optlen) { int i; if (co.test_only) return 0; if (ipfw_socket == -1) ipfw_socket = socket(AF_INET, SOCK_RAW, IPPROTO_RAW); if (ipfw_socket < 0) err(EX_UNAVAILABLE, "socket"); if (optname == IP_FW_GET || optname == IP_DUMMYNET_GET || optname == IP_FW_ADD || optname == IP_FW3 || optname == IP_FW_NAT_GET_CONFIG || optname < 0 || optname == IP_FW_NAT_GET_LOG) { if (optname < 0) optname = -optname; i = getsockopt(ipfw_socket, IPPROTO_IP, optname, optval, (socklen_t *)optlen); } else { i = setsockopt(ipfw_socket, IPPROTO_IP, optname, optval, optlen); } return i; } /* * do_set3 - pass ipfw control cmd to kernel * @optname: option name * @optval: pointer to option data * @optlen: option length * * Assumes op3 header is already embedded. * Calls setsockopt() with IP_FW3 as kernel-visible opcode. * Returns 0 on success or errno otherwise. 
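 *
 * Illustrative calling pattern (not from the original source); the one
 * requirement noted above is that the ip_fw3_opheader sits at the very
 * start of the request buffer:
 *
 *	struct {
 *		ip_fw3_opheader	opheader;	(must be first)
 *		uint32_t	payload;	(request-specific data)
 *	} req;
 *
 *	memset(&req, 0, sizeof(req));
 *	if (do_set3(SOME_IP_FW3_OPCODE, &req.opheader, sizeof(req)) != 0)
 *		warn("do_set3");
 *
 * SOME_IP_FW3_OPCODE stands in for whichever IP_FW3 sub-opcode the caller
 * needs; it is a placeholder, not a real constant.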
*/ int do_set3(int optname, ip_fw3_opheader *op3, size_t optlen) { if (co.test_only) return (0); if (ipfw_socket == -1) ipfw_socket = socket(AF_INET, SOCK_RAW, IPPROTO_RAW); if (ipfw_socket < 0) err(EX_UNAVAILABLE, "socket"); op3->opcode = optname; return (setsockopt(ipfw_socket, IPPROTO_IP, IP_FW3, op3, optlen)); } /* * do_get3 - pass ipfw control cmd to kernel * @optname: option name * @optval: pointer to option data * @optlen: pointer to option length * * Assumes op3 header is already embedded. * Calls getsockopt() with IP_FW3 as kernel-visible opcode. * Returns 0 on success or errno otherwise. */ int do_get3(int optname, ip_fw3_opheader *op3, size_t *optlen) { int error; socklen_t len; if (co.test_only) return (0); if (ipfw_socket == -1) ipfw_socket = socket(AF_INET, SOCK_RAW, IPPROTO_RAW); if (ipfw_socket < 0) err(EX_UNAVAILABLE, "socket"); op3->opcode = optname; len = *optlen; error = getsockopt(ipfw_socket, IPPROTO_IP, IP_FW3, op3, &len); *optlen = len; return (error); } /** * match_token takes a table and a string, returns the value associated * with the string (-1 in case of failure). */ int match_token(struct _s_x *table, const char *string) { struct _s_x *pt; uint i = strlen(string); for (pt = table ; i && pt->s != NULL ; pt++) if (strlen(pt->s) == i && !bcmp(string, pt->s, i)) return pt->x; return (-1); } /** * match_token_relaxed takes a table and a string, returns the value associated * with the string for the best match. * * Returns: * value from @table for matched records * -1 for non-matched records * -2 if more than one records match @string. */ int match_token_relaxed(struct _s_x *table, const char *string) { struct _s_x *pt, *m; int i, c; i = strlen(string); c = 0; for (pt = table ; i != 0 && pt->s != NULL ; pt++) { if (strncmp(pt->s, string, i) != 0) continue; m = pt; c++; } if (c == 1) return (m->x); return (c > 0 ? -2: -1); } int get_token(struct _s_x *table, const char *string, const char *errbase) { int tcmd; if ((tcmd = match_token_relaxed(table, string)) < 0) errx(EX_USAGE, "%s %s %s", (tcmd == 0) ? "invalid" : "ambiguous", errbase, string); return (tcmd); } /** * match_value takes a table and a value, returns the string associated * with the value (NULL in case of failure). */ char const * match_value(struct _s_x *p, int value) { for (; p->s != NULL; p++) if (p->x == value) return p->s; return NULL; } size_t concat_tokens(char *buf, size_t bufsize, struct _s_x *table, char *delimiter) { struct _s_x *pt; int l; size_t sz; for (sz = 0, pt = table ; pt->s != NULL; pt++) { l = snprintf(buf + sz, bufsize - sz, "%s%s", (sz == 0) ? "" : delimiter, pt->s); sz += l; bufsize += l; if (sz > bufsize) return (bufsize); } return (sz); } /* * helper function to process a set of flags and set bits in the * appropriate masks. 
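 *
 * Illustrative example (not part of the original source): parsing the
 * specification "syn,!ack" against f_tcpflags
 *
 *	uint32_t set = 0, clear = 0;
 *	char *bad, spec[] = "syn,!ack";
 *	int rc = fill_flags(f_tcpflags, spec, &bad, &set, &clear);
 *
 * leaves rc == 0, set == TH_SYN and clear == TH_ACK.  The buffer must be
 * writable because the parser splits it in place.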
*/ int fill_flags(struct _s_x *flags, char *p, char **e, uint32_t *set, uint32_t *clear) { char *q; /* points to the separator */ int val; uint32_t *which; /* mask we are working on */ while (p && *p) { if (*p == '!') { p++; which = clear; } else which = set; q = strchr(p, ','); if (q) *q++ = '\0'; val = match_token(flags, p); if (val <= 0) { if (e != NULL) *e = p; return (-1); } *which |= (uint32_t)val; p = q; } return (0); } void print_flags_buffer(char *buf, size_t sz, struct _s_x *list, uint32_t set) { char const *comma = ""; int i, l; for (i = 0; list[i].x != 0; i++) { if ((set & list[i].x) == 0) continue; set &= ~list[i].x; l = snprintf(buf, sz, "%s%s", comma, list[i].s); if (l >= sz) return; comma = ","; buf += l; sz -=l; } } /* * _substrcmp takes two strings and returns 1 if they do not match, * and 0 if they match exactly or the first string is a sub-string * of the second. A warning is printed to stderr in the case that the * first string is a sub-string of the second. * * This function will be removed in the future through the usual * deprecation process. */ int _substrcmp(const char *str1, const char* str2) { if (strncmp(str1, str2, strlen(str1)) != 0) return 1; if (strlen(str1) != strlen(str2)) warnx("DEPRECATED: '%s' matched '%s' as a sub-string", str1, str2); return 0; } /* * _substrcmp2 takes three strings and returns 1 if the first two do not match, * and 0 if they match exactly or the second string is a sub-string * of the first. A warning is printed to stderr in the case that the * first string does not match the third. * * This function exists to warn about the bizarre construction * strncmp(str, "by", 2) which is used to allow people to use a shortcut * for "bytes". The problem is that in addition to accepting "by", * "byt", "byte", and "bytes", it also excepts "by_rabid_dogs" and any * other string beginning with "by". * * This function will be removed in the future through the usual * deprecation process. */ int _substrcmp2(const char *str1, const char* str2, const char* str3) { if (strncmp(str1, str2, strlen(str2)) != 0) return 1; if (strcmp(str1, str3) != 0) warnx("DEPRECATED: '%s' matched '%s'", str1, str3); return 0; } /* * prints one port, symbolic or numeric */ static void print_port(struct buf_pr *bp, int proto, uint16_t port) { if (proto == IPPROTO_ETHERTYPE) { char const *s; if (co.do_resolv && (s = match_value(ether_types, port)) ) bprintf(bp, "%s", s); else bprintf(bp, "0x%04x", port); } else { struct servent *se = NULL; if (co.do_resolv) { struct protoent *pe = getprotobynumber(proto); se = getservbyport(htons(port), pe ? pe->p_name : NULL); } if (se) bprintf(bp, "%s", se->s_name); else bprintf(bp, "%d", port); } } static struct _s_x _port_name[] = { {"dst-port", O_IP_DSTPORT}, {"src-port", O_IP_SRCPORT}, {"ipid", O_IPID}, {"iplen", O_IPLEN}, {"ipttl", O_IPTTL}, {"mac-type", O_MAC_TYPE}, {"tcpdatalen", O_TCPDATALEN}, {"tcpwin", O_TCPWIN}, {"tagged", O_TAGGED}, {NULL, 0} }; /* * Print the values in a list 16-bit items of the types above. * XXX todo: add support for mask. 
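 *
 * For illustration (not from the original source): a dst-port opcode holding
 * the pairs {80,80} and {8000,8080} is rendered as
 *
 *	 dst-port 80,8000-8080
 *
 * i.e. the opcode name from _port_name, single ports as-is, unequal pairs as
 * ranges, and commas in between.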
*/ static void print_newports(struct buf_pr *bp, ipfw_insn_u16 *cmd, int proto, int opcode) { uint16_t *p = cmd->ports; int i; char const *sep; if (opcode != 0) { sep = match_value(_port_name, opcode); if (sep == NULL) sep = "???"; bprintf(bp, " %s", sep); } sep = " "; for (i = F_LEN((ipfw_insn *)cmd) - 1; i > 0; i--, p += 2) { bprintf(bp, "%s", sep); print_port(bp, proto, p[0]); if (p[0] != p[1]) { bprintf(bp, "-"); print_port(bp, proto, p[1]); } sep = ","; } } /* * Like strtol, but also translates service names into port numbers * for some protocols. * In particular: * proto == -1 disables the protocol check; * proto == IPPROTO_ETHERTYPE looks up an internal table * proto == matches the values there. * Returns *end == s in case the parameter is not found. */ static int strtoport(char *s, char **end, int base, int proto) { char *p, *buf; char *s1; int i; *end = s; /* default - not found */ if (*s == '\0') return 0; /* not found */ if (isdigit(*s)) return strtol(s, end, base); /* * find separator. '\\' escapes the next char. */ for (s1 = s; *s1 && (isalnum(*s1) || *s1 == '\\' || *s1 == '_' || *s1 == '.') ; s1++) if (*s1 == '\\' && s1[1] != '\0') s1++; buf = safe_calloc(s1 - s + 1, 1); /* * copy into a buffer skipping backslashes */ for (p = s, i = 0; p != s1 ; p++) if (*p != '\\') buf[i++] = *p; buf[i++] = '\0'; if (proto == IPPROTO_ETHERTYPE) { i = match_token(ether_types, buf); free(buf); if (i != -1) { /* found */ *end = s1; return i; } } else { struct protoent *pe = NULL; struct servent *se; if (proto != 0) pe = getprotobynumber(proto); setservent(1); se = getservbyname(buf, pe ? pe->p_name : NULL); free(buf); if (se != NULL) { *end = s1; return ntohs(se->s_port); } } return 0; /* not found */ } /* * Fill the body of the command with the list of port ranges. */ static int fill_newports(ipfw_insn_u16 *cmd, char *av, int proto, int cblen) { uint16_t a, b, *p = cmd->ports; int i = 0; char *s = av; while (*s) { a = strtoport(av, &s, 0, proto); if (s == av) /* empty or invalid argument */ return (0); CHECK_LENGTH(cblen, i + 2); switch (*s) { case '-': /* a range */ av = s + 1; b = strtoport(av, &s, 0, proto); /* Reject expressions like '1-abc' or '1-2-3'. */ if (s == av || (*s != ',' && *s != '\0')) return (0); p[0] = a; p[1] = b; break; case ',': /* comma separated list */ case '\0': p[0] = p[1] = a; break; default: warnx("port list: invalid separator <%c> in <%s>", *s, av); return (0); } i++; p += 2; av = s + 1; } if (i > 0) { if (i + 1 > F_LEN_MASK) errx(EX_DATAERR, "too many ports/ranges\n"); cmd->o.len |= i + 1; /* leave F_NOT and F_OR untouched */ } return (i); } /* * Fill the body of the command with the list of DiffServ codepoints. 
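 *
 * Worked example (illustrative): the 64 codepoints are split across two
 * 32-bit words at code 32, so for av = "af11,cs6"
 *
 *	af11 -> code 10 -> low  |= 1 << 10
 *	cs6  -> code 48 -> high |= 1 << (48 - 32)
 *
 * per the f_ipdscp table above and the split performed below.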
*/ static void fill_dscp(ipfw_insn *cmd, char *av, int cblen) { uint32_t *low, *high; char *s = av, *a; int code; cmd->opcode = O_DSCP; cmd->len |= F_INSN_SIZE(ipfw_insn_u32) + 1; CHECK_CMDLEN; low = (uint32_t *)(cmd + 1); high = low + 1; *low = 0; *high = 0; while (s != NULL) { a = strchr(s, ','); if (a != NULL) *a++ = '\0'; if (isalpha(*s)) { if ((code = match_token(f_ipdscp, s)) == -1) errx(EX_DATAERR, "Unknown DSCP code"); } else { code = strtoul(s, NULL, 10); if (code < 0 || code > 63) errx(EX_DATAERR, "Invalid DSCP value"); } if (code >= 32) *high |= 1 << (code - 32); else *low |= 1 << code; s = a; } } static struct _s_x icmpcodes[] = { { "net", ICMP_UNREACH_NET }, { "host", ICMP_UNREACH_HOST }, { "protocol", ICMP_UNREACH_PROTOCOL }, { "port", ICMP_UNREACH_PORT }, { "needfrag", ICMP_UNREACH_NEEDFRAG }, { "srcfail", ICMP_UNREACH_SRCFAIL }, { "net-unknown", ICMP_UNREACH_NET_UNKNOWN }, { "host-unknown", ICMP_UNREACH_HOST_UNKNOWN }, { "isolated", ICMP_UNREACH_ISOLATED }, { "net-prohib", ICMP_UNREACH_NET_PROHIB }, { "host-prohib", ICMP_UNREACH_HOST_PROHIB }, { "tosnet", ICMP_UNREACH_TOSNET }, { "toshost", ICMP_UNREACH_TOSHOST }, { "filter-prohib", ICMP_UNREACH_FILTER_PROHIB }, { "host-precedence", ICMP_UNREACH_HOST_PRECEDENCE }, { "precedence-cutoff", ICMP_UNREACH_PRECEDENCE_CUTOFF }, { NULL, 0 } }; static void fill_reject_code(u_short *codep, char *str) { int val; char *s; val = strtoul(str, &s, 0); if (s == str || *s != '\0' || val >= 0x100) val = match_token(icmpcodes, str); if (val < 0) errx(EX_DATAERR, "unknown ICMP unreachable code ``%s''", str); *codep = val; return; } static void print_reject_code(struct buf_pr *bp, uint16_t code) { char const *s; if ((s = match_value(icmpcodes, code)) != NULL) bprintf(bp, "unreach %s", s); else bprintf(bp, "unreach %u", code); } /* * Returns the number of bits set (from left) in a contiguous bitmask, * or -1 if the mask is not contiguous. * XXX this needs a proper fix. * This effectively works on masks in big-endian (network) format. * when compiled on little endian architectures. * * First bit is bit 7 of the first byte -- note, for MAC addresses, * the first bit on the wire is bit 0 of the first byte. * len is the max length in bits. */ int contigmask(uint8_t *p, int len) { int i, n; for (i=0; iarg1 & 0xff; uint8_t clear = (cmd->arg1 >> 8) & 0xff; if (list == f_tcpflags && set == TH_SYN && clear == TH_ACK) { bprintf(bp, " setup"); return; } bprintf(bp, " %s ", name); for (i=0; list[i].x != 0; i++) { if (set & list[i].x) { set &= ~list[i].x; bprintf(bp, "%s%s", comma, list[i].s); comma = ","; } if (clear & list[i].x) { clear &= ~list[i].x; bprintf(bp, "%s!%s", comma, list[i].s); comma = ","; } } } /* * Print the ip address contained in a command. 
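 *
 * Sample renderings (illustrative, not from the original source): "me" for
 * the O_IP_*_ME opcodes, "any" for an all-zero mask, "1.2.3.0/24" for an
 * address/mask pair, and "table(hosts,42)" for a lookup against a
 * hypothetical table named hosts with value 42.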
*/ static void print_ip(struct buf_pr *bp, const struct format_opts *fo, ipfw_insn_ip *cmd) { struct hostent *he = NULL; struct in_addr *ia; uint32_t len = F_LEN((ipfw_insn *)cmd); uint32_t *a = ((ipfw_insn_u32 *)cmd)->d; char *t; bprintf(bp, " "); if (cmd->o.opcode == O_IP_DST_LOOKUP && len > F_INSN_SIZE(ipfw_insn_u32)) { uint32_t d = a[1]; const char *arg = ""; if (d < sizeof(lookup_key)/sizeof(lookup_key[0])) arg = match_value(rule_options, lookup_key[d]); t = table_search_ctlv(fo->tstate, ((ipfw_insn *)cmd)->arg1); bprintf(bp, "lookup %s %s", arg, t); return; } if (cmd->o.opcode == O_IP_SRC_ME || cmd->o.opcode == O_IP_DST_ME) { bprintf(bp, "me"); return; } if (cmd->o.opcode == O_IP_SRC_LOOKUP || cmd->o.opcode == O_IP_DST_LOOKUP) { t = table_search_ctlv(fo->tstate, ((ipfw_insn *)cmd)->arg1); bprintf(bp, "table(%s", t); if (len == F_INSN_SIZE(ipfw_insn_u32)) bprintf(bp, ",%u", *a); bprintf(bp, ")"); return; } if (cmd->o.opcode == O_IP_SRC_SET || cmd->o.opcode == O_IP_DST_SET) { uint32_t x, *map = (uint32_t *)&(cmd->mask); int i, j; char comma = '{'; x = cmd->o.arg1 - 1; x = htonl( ~x ); cmd->addr.s_addr = htonl(cmd->addr.s_addr); bprintf(bp, "%s/%d", inet_ntoa(cmd->addr), contigmask((uint8_t *)&x, 32)); x = cmd->addr.s_addr = htonl(cmd->addr.s_addr); x &= 0xff; /* base */ /* * Print bits and ranges. * Locate first bit set (i), then locate first bit unset (j). * If we have 3+ consecutive bits set, then print them as a * range, otherwise only print the initial bit and rescan. */ for (i=0; i < cmd->o.arg1; i++) if (map[i/32] & (1<<(i & 31))) { for (j=i+1; j < cmd->o.arg1; j++) if (!(map[ j/32] & (1<<(j & 31)))) break; bprintf(bp, "%c%d", comma, i+x); if (j>i+2) { /* range has at least 3 elements */ bprintf(bp, "-%d", j-1+x); i = j-1; } comma = ','; } bprintf(bp, "}"); return; } /* * len == 2 indicates a single IP, whereas lists of 1 or more * addr/mask pairs have len = (2n+1). We convert len to n so we * use that to count the number of entries. */ for (len = len / 2; len > 0; len--, a += 2) { int mb = /* mask length */ (cmd->o.opcode == O_IP_SRC || cmd->o.opcode == O_IP_DST) ? 
32 : contigmask((uint8_t *)&(a[1]), 32); if (mb == 32 && co.do_resolv) he = gethostbyaddr((char *)&(a[0]), sizeof(in_addr_t), AF_INET); if (he != NULL) /* resolved to name */ bprintf(bp, "%s", he->h_name); else if (mb == 0) /* any */ bprintf(bp, "any"); else { /* numeric IP followed by some kind of mask */ ia = (struct in_addr *)&a[0]; bprintf(bp, "%s", inet_ntoa(*ia)); if (mb < 0) { ia = (struct in_addr *)&a[1]; bprintf(bp, ":%s", inet_ntoa(*ia)); } else if (mb < 32) bprintf(bp, "/%d", mb); } if (len > 1) bprintf(bp, ","); } } /* * prints a MAC address/mask pair */ static void format_mac(struct buf_pr *bp, uint8_t *addr, uint8_t *mask) { int l = contigmask(mask, 48); if (l == 0) bprintf(bp, " any"); else { bprintf(bp, " %02x:%02x:%02x:%02x:%02x:%02x", addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]); if (l == -1) bprintf(bp, "&%02x:%02x:%02x:%02x:%02x:%02x", mask[0], mask[1], mask[2], mask[3], mask[4], mask[5]); else if (l < 48) bprintf(bp, "/%d", l); } } static void print_mac(struct buf_pr *bp, ipfw_insn_mac *mac) { bprintf(bp, " MAC"); format_mac(bp, mac->addr, mac->mask); format_mac(bp, mac->addr + 6, mac->mask + 6); } static void fill_icmptypes(ipfw_insn_u32 *cmd, char *av) { uint8_t type; cmd->d[0] = 0; while (*av) { if (*av == ',') av++; type = strtoul(av, &av, 0); if (*av != ',' && *av != '\0') errx(EX_DATAERR, "invalid ICMP type"); if (type > 31) errx(EX_DATAERR, "ICMP type out of range"); cmd->d[0] |= 1 << type; } cmd->o.opcode = O_ICMPTYPE; cmd->o.len |= F_INSN_SIZE(ipfw_insn_u32); } static void print_icmptypes(struct buf_pr *bp, ipfw_insn_u32 *cmd) { int i; char sep= ' '; bprintf(bp, " icmptypes"); for (i = 0; i < 32; i++) { if ( (cmd->d[0] & (1 << (i))) == 0) continue; bprintf(bp, "%c%d", sep, i); sep = ','; } } static void print_dscp(struct buf_pr *bp, ipfw_insn_u32 *cmd) { int i = 0; uint32_t *v; char sep= ' '; const char *code; bprintf(bp, " dscp"); v = cmd->d; while (i < 64) { if (*v & (1 << i)) { if ((code = match_value(f_ipdscp, i)) != NULL) bprintf(bp, "%c%s", sep, code); else bprintf(bp, "%c%d", sep, i); sep = ','; } if ((++i % 32) == 0) v++; } } #define insntod(cmd, type) ((ipfw_insn_ ## type *)(cmd)) struct show_state { struct ip_fw_rule *rule; const ipfw_insn *eaction; uint8_t *printed; int flags; #define HAVE_PROTO 0x0001 #define HAVE_SRCIP 0x0002 #define HAVE_DSTIP 0x0004 #define HAVE_PROBE_STATE 0x0008 int proto; int or_block; }; static int init_show_state(struct show_state *state, struct ip_fw_rule *rule) { state->printed = calloc(rule->cmd_len, sizeof(uint8_t)); if (state->printed == NULL) return (ENOMEM); state->rule = rule; state->eaction = NULL; state->flags = 0; state->proto = 0; state->or_block = 0; return (0); } static void free_show_state(struct show_state *state) { free(state->printed); } static uint8_t is_printed_opcode(struct show_state *state, const ipfw_insn *cmd) { return (state->printed[cmd - state->rule->cmd]); } static void mark_printed(struct show_state *state, const ipfw_insn *cmd) { state->printed[cmd - state->rule->cmd] = 1; } static void print_limit_mask(struct buf_pr *bp, const ipfw_insn_limit *limit) { struct _s_x *p = limit_masks; char const *comma = " "; uint8_t x; for (x = limit->limit_mask; p->x != 0; p++) { if ((x & p->x) == p->x) { x &= ~p->x; bprintf(bp, "%s%s", comma, p->s); comma = ","; } } bprint_uint_arg(bp, " ", limit->conn_limit); } static int print_instruction(struct buf_pr *bp, const struct format_opts *fo, struct show_state *state, ipfw_insn *cmd) { struct protoent *pe; struct passwd *pwd; struct group *grp; const 
char *s; double d; if (is_printed_opcode(state, cmd)) return (0); if ((cmd->len & F_OR) != 0 && state->or_block == 0) bprintf(bp, " {"); if (cmd->opcode != O_IN && (cmd->len & F_NOT) != 0) bprintf(bp, " not"); switch (cmd->opcode) { case O_PROB: d = 1.0 * insntod(cmd, u32)->d[0] / 0x7fffffff; bprintf(bp, "prob %f ", d); break; case O_PROBE_STATE: /* no need to print anything here */ state->flags |= HAVE_PROBE_STATE; break; case O_IP_SRC: case O_IP_SRC_LOOKUP: case O_IP_SRC_MASK: case O_IP_SRC_ME: case O_IP_SRC_SET: if (state->flags & HAVE_SRCIP) bprintf(bp, " src-ip"); print_ip(bp, fo, insntod(cmd, ip)); break; case O_IP_DST: case O_IP_DST_LOOKUP: case O_IP_DST_MASK: case O_IP_DST_ME: case O_IP_DST_SET: if (state->flags & HAVE_DSTIP) bprintf(bp, " dst-ip"); print_ip(bp, fo, insntod(cmd, ip)); break; case O_IP6_SRC: case O_IP6_SRC_MASK: case O_IP6_SRC_ME: if (state->flags & HAVE_SRCIP) bprintf(bp, " src-ip6"); print_ip6(bp, insntod(cmd, ip6)); break; case O_IP6_DST: case O_IP6_DST_MASK: case O_IP6_DST_ME: if (state->flags & HAVE_DSTIP) bprintf(bp, " dst-ip6"); print_ip6(bp, insntod(cmd, ip6)); break; case O_FLOW6ID: print_flow6id(bp, insntod(cmd, u32)); break; case O_IP_DSTPORT: case O_IP_SRCPORT: print_newports(bp, insntod(cmd, u16), state->proto, (state->flags & (HAVE_SRCIP | HAVE_DSTIP)) == (HAVE_SRCIP | HAVE_DSTIP) ? cmd->opcode: 0); break; case O_PROTO: pe = getprotobynumber(cmd->arg1); if (state->flags & HAVE_PROTO) bprintf(bp, " proto"); if (pe != NULL) bprintf(bp, " %s", pe->p_name); else bprintf(bp, " %u", cmd->arg1); state->proto = cmd->arg1; break; case O_MACADDR2: print_mac(bp, insntod(cmd, mac)); break; case O_MAC_TYPE: print_newports(bp, insntod(cmd, u16), IPPROTO_ETHERTYPE, cmd->opcode); break; case O_FRAG: bprintf(bp, " frag"); break; case O_FIB: bprintf(bp, " fib %u", cmd->arg1); break; case O_SOCKARG: bprintf(bp, " sockarg"); break; case O_IN: bprintf(bp, cmd->len & F_NOT ? 
" out" : " in"); break; case O_DIVERTED: switch (cmd->arg1) { case 3: bprintf(bp, " diverted"); break; case 2: bprintf(bp, " diverted-output"); break; case 1: bprintf(bp, " diverted-loopback"); break; default: bprintf(bp, " diverted-?<%u>", cmd->arg1); break; } break; case O_LAYER2: bprintf(bp, " layer2"); break; case O_XMIT: case O_RECV: case O_VIA: if (cmd->opcode == O_XMIT) s = "xmit"; else if (cmd->opcode == O_RECV) s = "recv"; else /* if (cmd->opcode == O_VIA) */ s = "via"; switch (insntod(cmd, if)->name[0]) { case '\0': bprintf(bp, " %s %s", s, inet_ntoa(insntod(cmd, if)->p.ip)); break; case '\1': bprintf(bp, " %s table(%s)", s, table_search_ctlv(fo->tstate, insntod(cmd, if)->p.kidx)); break; default: bprintf(bp, " %s %s", s, insntod(cmd, if)->name); } break; case O_IP_FLOW_LOOKUP: s = table_search_ctlv(fo->tstate, cmd->arg1); bprintf(bp, " flow table(%s", s); if (F_LEN(cmd) == F_INSN_SIZE(ipfw_insn_u32)) bprintf(bp, ",%u", insntod(cmd, u32)->d[0]); bprintf(bp, ")"); break; case O_IPID: case O_IPTTL: case O_IPLEN: case O_TCPDATALEN: case O_TCPWIN: if (F_LEN(cmd) == 1) { switch (cmd->opcode) { case O_IPID: s = "ipid"; break; case O_IPTTL: s = "ipttl"; break; case O_IPLEN: s = "iplen"; break; case O_TCPDATALEN: s = "tcpdatalen"; break; case O_TCPWIN: s = "tcpwin"; break; } bprintf(bp, " %s %u", s, cmd->arg1); } else print_newports(bp, insntod(cmd, u16), 0, cmd->opcode); break; case O_IPVER: bprintf(bp, " ipver %u", cmd->arg1); break; case O_IPPRECEDENCE: bprintf(bp, " ipprecedence %u", cmd->arg1 >> 5); break; case O_DSCP: print_dscp(bp, insntod(cmd, u32)); break; case O_IPOPT: print_flags(bp, "ipoptions", cmd, f_ipopts); break; case O_IPTOS: print_flags(bp, "iptos", cmd, f_iptos); break; case O_ICMPTYPE: print_icmptypes(bp, insntod(cmd, u32)); break; case O_ESTAB: bprintf(bp, " established"); break; case O_TCPFLAGS: print_flags(bp, "tcpflags", cmd, f_tcpflags); break; case O_TCPOPTS: print_flags(bp, "tcpoptions", cmd, f_tcpopts); break; case O_TCPACK: bprintf(bp, " tcpack %d", ntohl(insntod(cmd, u32)->d[0])); break; case O_TCPSEQ: bprintf(bp, " tcpseq %d", ntohl(insntod(cmd, u32)->d[0])); break; case O_UID: pwd = getpwuid(insntod(cmd, u32)->d[0]); if (pwd != NULL) bprintf(bp, " uid %s", pwd->pw_name); else bprintf(bp, " uid %u", insntod(cmd, u32)->d[0]); break; case O_GID: grp = getgrgid(insntod(cmd, u32)->d[0]); if (grp != NULL) bprintf(bp, " gid %s", grp->gr_name); else bprintf(bp, " gid %u", insntod(cmd, u32)->d[0]); break; case O_JAIL: bprintf(bp, " jail %d", insntod(cmd, u32)->d[0]); break; case O_VERREVPATH: bprintf(bp, " verrevpath"); break; case O_VERSRCREACH: bprintf(bp, " versrcreach"); break; case O_ANTISPOOF: bprintf(bp, " antispoof"); break; case O_IPSEC: bprintf(bp, " ipsec"); break; case O_NOP: bprintf(bp, " // %s", (char *)(cmd + 1)); break; case O_KEEP_STATE: if (state->flags & HAVE_PROBE_STATE) bprintf(bp, " keep-state"); else bprintf(bp, " record-state"); bprintf(bp, " :%s", object_search_ctlv(fo->tstate, cmd->arg1, IPFW_TLV_STATE_NAME)); break; case O_LIMIT: if (state->flags & HAVE_PROBE_STATE) bprintf(bp, " limit"); else bprintf(bp, " set-limit"); print_limit_mask(bp, insntod(cmd, limit)); bprintf(bp, " :%s", object_search_ctlv(fo->tstate, cmd->arg1, IPFW_TLV_STATE_NAME)); break; case O_IP6: + if (state->flags & HAVE_PROTO) + bprintf(bp, " proto"); bprintf(bp, " ip6"); break; case O_IP4: + if (state->flags & HAVE_PROTO) + bprintf(bp, " proto"); bprintf(bp, " ip4"); break; case O_ICMP6TYPE: print_icmp6types(bp, insntod(cmd, u32)); break; case O_EXT_HDR: 
print_ext6hdr(bp, cmd); break; case O_TAGGED: if (F_LEN(cmd) == 1) bprint_uint_arg(bp, " tagged ", cmd->arg1); else print_newports(bp, insntod(cmd, u16), 0, O_TAGGED); break; case O_SKIP_ACTION: bprintf(bp, " defer-immediate-action"); break; default: bprintf(bp, " [opcode %d len %d]", cmd->opcode, cmd->len); } if (cmd->len & F_OR) { bprintf(bp, " or"); state->or_block = 1; } else if (state->or_block != 0) { bprintf(bp, " }"); state->or_block = 0; } mark_printed(state, cmd); return (1); } static ipfw_insn * print_opcode(struct buf_pr *bp, struct format_opts *fo, struct show_state *state, int opcode) { ipfw_insn *cmd; int l; for (l = state->rule->act_ofs, cmd = state->rule->cmd; l > 0; l -= F_LEN(cmd), cmd += F_LEN(cmd)) { /* We use zero opcode to print the rest of options */ if (opcode >= 0 && cmd->opcode != opcode) continue; /* * Skip O_NOP, when we printing the rest * of options, it will be handled separately. */ if (cmd->opcode == O_NOP && opcode != O_NOP) continue; if (!print_instruction(bp, fo, state, cmd)) continue; return (cmd); } return (NULL); } static void print_fwd(struct buf_pr *bp, const ipfw_insn *cmd) { char buf[INET6_ADDRSTRLEN + IF_NAMESIZE + 2]; ipfw_insn_sa6 *sa6; ipfw_insn_sa *sa; uint16_t port; if (cmd->opcode == O_FORWARD_IP) { sa = insntod(cmd, sa); port = sa->sa.sin_port; if (sa->sa.sin_addr.s_addr == INADDR_ANY) bprintf(bp, "fwd tablearg"); else bprintf(bp, "fwd %s", inet_ntoa(sa->sa.sin_addr)); } else { sa6 = insntod(cmd, sa6); port = sa6->sa.sin6_port; bprintf(bp, "fwd "); if (getnameinfo((const struct sockaddr *)&sa6->sa, sizeof(struct sockaddr_in6), buf, sizeof(buf), NULL, 0, NI_NUMERICHOST) == 0) bprintf(bp, "%s", buf); } if (port != 0) bprintf(bp, ",%u", port); } static int print_action_instruction(struct buf_pr *bp, const struct format_opts *fo, struct show_state *state, const ipfw_insn *cmd) { const char *s; if (is_printed_opcode(state, cmd)) return (0); switch (cmd->opcode) { case O_CHECK_STATE: bprintf(bp, "check-state"); if (cmd->arg1 != 0) s = object_search_ctlv(fo->tstate, cmd->arg1, IPFW_TLV_STATE_NAME); else s = NULL; bprintf(bp, " :%s", s ? s: "any"); break; case O_ACCEPT: bprintf(bp, "allow"); break; case O_COUNT: bprintf(bp, "count"); break; case O_DENY: bprintf(bp, "deny"); break; case O_REJECT: if (cmd->arg1 == ICMP_REJECT_RST) bprintf(bp, "reset"); else if (cmd->arg1 == ICMP_REJECT_ABORT) bprintf(bp, "abort"); else if (cmd->arg1 == ICMP_UNREACH_HOST) bprintf(bp, "reject"); else print_reject_code(bp, cmd->arg1); break; case O_UNREACH6: if (cmd->arg1 == ICMP6_UNREACH_RST) bprintf(bp, "reset6"); else if (cmd->arg1 == ICMP6_UNREACH_ABORT) bprintf(bp, "abort6"); else print_unreach6_code(bp, cmd->arg1); break; case O_SKIPTO: bprint_uint_arg(bp, "skipto ", cmd->arg1); break; case O_PIPE: bprint_uint_arg(bp, "pipe ", cmd->arg1); break; case O_QUEUE: bprint_uint_arg(bp, "queue ", cmd->arg1); break; case O_DIVERT: bprint_uint_arg(bp, "divert ", cmd->arg1); break; case O_TEE: bprint_uint_arg(bp, "tee ", cmd->arg1); break; case O_NETGRAPH: bprint_uint_arg(bp, "netgraph ", cmd->arg1); break; case O_NGTEE: bprint_uint_arg(bp, "ngtee ", cmd->arg1); break; case O_FORWARD_IP: case O_FORWARD_IP6: print_fwd(bp, cmd); break; case O_LOG: if (insntod(cmd, log)->max_log > 0) bprintf(bp, " log logamount %d", insntod(cmd, log)->max_log); else bprintf(bp, " log"); break; case O_ALTQ: #ifndef NO_ALTQ print_altq_cmd(bp, insntod(cmd, altq)); #endif break; case O_TAG: bprint_uint_arg(bp, cmd->len & F_NOT ? 
" untag ": " tag ", cmd->arg1); break; case O_NAT: if (cmd->arg1 != IP_FW_NAT44_GLOBAL) bprint_uint_arg(bp, "nat ", cmd->arg1); else bprintf(bp, "nat global"); break; case O_SETFIB: if (cmd->arg1 == IP_FW_TARG) bprint_uint_arg(bp, "setfib ", cmd->arg1); else bprintf(bp, "setfib %u", cmd->arg1 & 0x7FFF); break; case O_EXTERNAL_ACTION: /* * The external action can consists of two following * each other opcodes - O_EXTERNAL_ACTION and * O_EXTERNAL_INSTANCE. The first contains the ID of * name of external action. The second contains the ID * of name of external action instance. * NOTE: in case when external action has no named * instances support, the second opcode isn't needed. */ state->eaction = cmd; s = object_search_ctlv(fo->tstate, cmd->arg1, IPFW_TLV_EACTION); if (match_token(rule_eactions, s) != -1) bprintf(bp, "%s", s); else bprintf(bp, "eaction %s", s); break; case O_EXTERNAL_INSTANCE: if (state->eaction == NULL) break; /* * XXX: we need to teach ipfw(9) to rewrite opcodes * in the user buffer on rule addition. When we add * the rule, we specify zero TLV type for * O_EXTERNAL_INSTANCE object. To show correct * rule after `ipfw add` we need to search instance * name with zero type. But when we do `ipfw show` * we calculate TLV type using IPFW_TLV_EACTION_NAME() * macro. */ s = object_search_ctlv(fo->tstate, cmd->arg1, 0); if (s == NULL) s = object_search_ctlv(fo->tstate, cmd->arg1, IPFW_TLV_EACTION_NAME( state->eaction->arg1)); bprintf(bp, " %s", s); break; case O_EXTERNAL_DATA: if (state->eaction == NULL) break; /* * Currently we support data formatting only for * external data with datalen u16. For unknown data * print its size in bytes. */ if (cmd->len == F_INSN_SIZE(ipfw_insn)) bprintf(bp, " %u", cmd->arg1); else bprintf(bp, " %ubytes", cmd->len * sizeof(uint32_t)); break; case O_SETDSCP: if (cmd->arg1 == IP_FW_TARG) { bprintf(bp, "setdscp tablearg"); break; } s = match_value(f_ipdscp, cmd->arg1 & 0x3F); if (s != NULL) bprintf(bp, "setdscp %s", s); else bprintf(bp, "setdscp %u", cmd->arg1 & 0x3F); break; case O_REASS: bprintf(bp, "reass"); break; case O_CALLRETURN: if (cmd->len & F_NOT) bprintf(bp, "return"); else bprint_uint_arg(bp, "call ", cmd->arg1); break; default: bprintf(bp, "** unrecognized action %d len %d ", cmd->opcode, cmd->len); } mark_printed(state, cmd); return (1); } static ipfw_insn * print_action(struct buf_pr *bp, struct format_opts *fo, struct show_state *state, uint8_t opcode) { ipfw_insn *cmd; int l; for (l = state->rule->cmd_len - state->rule->act_ofs, cmd = ACTION_PTR(state->rule); l > 0; l -= F_LEN(cmd), cmd += F_LEN(cmd)) { if (cmd->opcode != opcode) continue; if (!print_action_instruction(bp, fo, state, cmd)) continue; return (cmd); } return (NULL); } static void print_proto(struct buf_pr *bp, struct format_opts *fo, struct show_state *state) { ipfw_insn *cmd; int l, proto, ip4, ip6; /* Count all O_PROTO, O_IP4, O_IP6 instructions. */ proto = ip4 = ip6 = 0; for (l = state->rule->act_ofs, cmd = state->rule->cmd; l > 0; l -= F_LEN(cmd), cmd += F_LEN(cmd)) { switch (cmd->opcode) { case O_PROTO: proto++; break; case O_IP4: ip4 = 1; if (cmd->len & F_OR) ip4++; break; case O_IP6: ip6 = 1; if (cmd->len & F_OR) ip6++; break; default: continue; } } if (proto == 0 && ip4 == 0 && ip6 == 0) { state->proto = IPPROTO_IP; state->flags |= HAVE_PROTO; bprintf(bp, " ip"); return; } /* To handle the case { ip4 or ip6 }, print opcode with F_OR first */ cmd = NULL; if (ip4 || ip6) cmd = print_opcode(bp, fo, state, ip4 > ip6 ? 
O_IP4: O_IP6); if (cmd != NULL && (cmd->len & F_OR)) cmd = print_opcode(bp, fo, state, ip4 > ip6 ? O_IP6: O_IP4); if (cmd == NULL || (cmd->len & F_OR)) for (l = proto; l > 0; l--) { cmd = print_opcode(bp, fo, state, O_PROTO); if (cmd == NULL || (cmd->len & F_OR) == 0) break; } /* Initialize proto, it is used by print_newports() */ state->flags |= HAVE_PROTO; if (state->proto == 0 && ip6 != 0) state->proto = IPPROTO_IPV6; } static int match_opcode(int opcode, const int opcodes[], size_t nops) { int i; for (i = 0; i < nops; i++) if (opcode == opcodes[i]) return (1); return (0); } static void print_address(struct buf_pr *bp, struct format_opts *fo, struct show_state *state, const int opcodes[], size_t nops, int portop, int flag) { ipfw_insn *cmd; int count, l, portcnt, pf; count = portcnt = 0; for (l = state->rule->act_ofs, cmd = state->rule->cmd; l > 0; l -= F_LEN(cmd), cmd += F_LEN(cmd)) { if (match_opcode(cmd->opcode, opcodes, nops)) count++; else if (cmd->opcode == portop) portcnt++; } if (count == 0) bprintf(bp, " any"); for (l = state->rule->act_ofs, cmd = state->rule->cmd; l > 0 && count > 0; l -= F_LEN(cmd), cmd += F_LEN(cmd)) { if (!match_opcode(cmd->opcode, opcodes, nops)) continue; print_instruction(bp, fo, state, cmd); if ((cmd->len & F_OR) == 0) break; count--; } /* * If several O_IP_?PORT opcodes specified, leave them to the * options section. */ if (portcnt == 1) { for (l = state->rule->act_ofs, cmd = state->rule->cmd, pf = 0; l > 0; l -= F_LEN(cmd), cmd += F_LEN(cmd)) { if (cmd->opcode != portop) { pf = (cmd->len & F_OR); continue; } /* Print opcode iff it is not in OR block. */ if (pf == 0 && (cmd->len & F_OR) == 0) print_instruction(bp, fo, state, cmd); break; } } state->flags |= flag; } static const int action_opcodes[] = { O_CHECK_STATE, O_ACCEPT, O_COUNT, O_DENY, O_REJECT, O_UNREACH6, O_SKIPTO, O_PIPE, O_QUEUE, O_DIVERT, O_TEE, O_NETGRAPH, O_NGTEE, O_FORWARD_IP, O_FORWARD_IP6, O_NAT, O_SETFIB, O_SETDSCP, O_REASS, O_CALLRETURN, /* keep the following opcodes at the end of the list */ O_EXTERNAL_ACTION, O_EXTERNAL_INSTANCE, O_EXTERNAL_DATA }; static const int modifier_opcodes[] = { O_LOG, O_ALTQ, O_TAG }; static const int src_opcodes[] = { O_IP_SRC, O_IP_SRC_LOOKUP, O_IP_SRC_MASK, O_IP_SRC_ME, O_IP_SRC_SET, O_IP6_SRC, O_IP6_SRC_MASK, O_IP6_SRC_ME }; static const int dst_opcodes[] = { O_IP_DST, O_IP_DST_LOOKUP, O_IP_DST_MASK, O_IP_DST_ME, O_IP_DST_SET, O_IP6_DST, O_IP6_DST_MASK, O_IP6_DST_ME }; static void show_static_rule(struct cmdline_opts *co, struct format_opts *fo, struct buf_pr *bp, struct ip_fw_rule *rule, struct ip_fw_bcounter *cntr) { struct show_state state; ipfw_insn *cmd; static int twidth = 0; int i; /* Print # DISABLED or skip the rule */ if ((fo->set_mask & (1 << rule->set)) == 0) { /* disabled mask */ if (!co->show_sets) return; else bprintf(bp, "# DISABLED "); } if (init_show_state(&state, rule) != 0) { warn("init_show_state() failed"); return; } bprintf(bp, "%05u ", rule->rulenum); /* Print counters if enabled */ if (fo->pcwidth > 0 || fo->bcwidth > 0) { pr_u64(bp, &cntr->pcnt, fo->pcwidth); pr_u64(bp, &cntr->bcnt, fo->bcwidth); } /* Print timestamp */ if (co->do_time == TIMESTAMP_NUMERIC) bprintf(bp, "%10u ", cntr->timestamp); else if (co->do_time == TIMESTAMP_STRING) { char timestr[30]; time_t t = (time_t)0; if (twidth == 0) { strcpy(timestr, ctime(&t)); *strchr(timestr, '\n') = '\0'; twidth = strlen(timestr); } if (cntr->timestamp > 0) { t = _long_to_time(cntr->timestamp); strcpy(timestr, ctime(&t)); *strchr(timestr, '\n') = '\0'; bprintf(bp, "%s ", 
timestr); } else { bprintf(bp, "%*s", twidth, " "); } } /* Print set number */ if (co->show_sets) bprintf(bp, "set %d ", rule->set); /* Print the optional "match probability" */ cmd = print_opcode(bp, fo, &state, O_PROB); /* Print rule action */ for (i = 0; i < nitems(action_opcodes); i++) { cmd = print_action(bp, fo, &state, action_opcodes[i]); if (cmd == NULL) continue; /* Handle special cases */ switch (cmd->opcode) { case O_CHECK_STATE: goto end; case O_EXTERNAL_ACTION: case O_EXTERNAL_INSTANCE: /* External action can have several instructions */ continue; } break; } /* Print rule modifiers */ for (i = 0; i < nitems(modifier_opcodes); i++) print_action(bp, fo, &state, modifier_opcodes[i]); /* * Print rule body */ if (co->comment_only != 0) goto end; if (rule->flags & IPFW_RULE_JUSTOPTS) { state.flags |= HAVE_PROTO | HAVE_SRCIP | HAVE_DSTIP; goto justopts; } print_proto(bp, fo, &state); /* Print source */ bprintf(bp, " from"); print_address(bp, fo, &state, src_opcodes, nitems(src_opcodes), O_IP_SRCPORT, HAVE_SRCIP); /* Print destination */ bprintf(bp, " to"); print_address(bp, fo, &state, dst_opcodes, nitems(dst_opcodes), O_IP_DSTPORT, HAVE_DSTIP); justopts: /* Print the rest of options */ while (print_opcode(bp, fo, &state, -1)) ; end: /* Print comment at the end */ cmd = print_opcode(bp, fo, &state, O_NOP); if (co->comment_only != 0 && cmd == NULL) bprintf(bp, " // ..."); bprintf(bp, "\n"); free_show_state(&state); } static void show_dyn_state(struct cmdline_opts *co, struct format_opts *fo, struct buf_pr *bp, ipfw_dyn_rule *d) { struct protoent *pe; struct in_addr a; uint16_t rulenum; char buf[INET6_ADDRSTRLEN]; if (d->expire == 0 && d->dyn_type != O_LIMIT_PARENT) return; bcopy(&d->rule, &rulenum, sizeof(rulenum)); bprintf(bp, "%05d", rulenum); if (fo->pcwidth > 0 || fo->bcwidth > 0) { bprintf(bp, " "); pr_u64(bp, &d->pcnt, fo->pcwidth); pr_u64(bp, &d->bcnt, fo->bcwidth); bprintf(bp, "(%ds)", d->expire); } switch (d->dyn_type) { case O_LIMIT_PARENT: bprintf(bp, " PARENT %d", d->count); break; case O_LIMIT: bprintf(bp, " LIMIT"); break; case O_KEEP_STATE: /* bidir, no mask */ bprintf(bp, " STATE"); break; } if ((pe = getprotobynumber(d->id.proto)) != NULL) bprintf(bp, " %s", pe->p_name); else bprintf(bp, " proto %u", d->id.proto); if (d->id.addr_type == 4) { a.s_addr = htonl(d->id.src_ip); bprintf(bp, " %s %d", inet_ntoa(a), d->id.src_port); a.s_addr = htonl(d->id.dst_ip); bprintf(bp, " <-> %s %d", inet_ntoa(a), d->id.dst_port); } else if (d->id.addr_type == 6) { bprintf(bp, " %s %d", inet_ntop(AF_INET6, &d->id.src_ip6, buf, sizeof(buf)), d->id.src_port); bprintf(bp, " <-> %s %d", inet_ntop(AF_INET6, &d->id.dst_ip6, buf, sizeof(buf)), d->id.dst_port); } else bprintf(bp, " UNKNOWN <-> UNKNOWN"); if (d->kidx != 0) bprintf(bp, " :%s", object_search_ctlv(fo->tstate, d->kidx, IPFW_TLV_STATE_NAME)); #define BOTH_SYN (TH_SYN | (TH_SYN << 8)) #define BOTH_FIN (TH_FIN | (TH_FIN << 8)) if (co->verbose) { bprintf(bp, " state 0x%08x%s", d->state, d->state ? 
" ": ","); if (d->state & IPFW_DYN_ORPHANED) bprintf(bp, "ORPHANED,"); if ((d->state & BOTH_SYN) == BOTH_SYN) bprintf(bp, "BOTH_SYN,"); else { if (d->state & TH_SYN) bprintf(bp, "F_SYN,"); if (d->state & (TH_SYN << 8)) bprintf(bp, "R_SYN,"); } if ((d->state & BOTH_FIN) == BOTH_FIN) bprintf(bp, "BOTH_FIN,"); else { if (d->state & TH_FIN) bprintf(bp, "F_FIN,"); if (d->state & (TH_FIN << 8)) bprintf(bp, "R_FIN,"); } bprintf(bp, " f_ack 0x%x, r_ack 0x%x", d->ack_fwd, d->ack_rev); } } static int do_range_cmd(int cmd, ipfw_range_tlv *rt) { ipfw_range_header rh; size_t sz; memset(&rh, 0, sizeof(rh)); memcpy(&rh.range, rt, sizeof(*rt)); rh.range.head.length = sizeof(*rt); rh.range.head.type = IPFW_TLV_RANGE; sz = sizeof(rh); if (do_get3(cmd, &rh.opheader, &sz) != 0) return (-1); /* Save number of matched objects */ rt->new_set = rh.range.new_set; return (0); } /* * This one handles all set-related commands * ipfw set { show | enable | disable } * ipfw set swap X Y * ipfw set move X to Y * ipfw set move rule X to Y */ void ipfw_sets_handler(char *av[]) { ipfw_range_tlv rt; char *msg; size_t size; uint32_t masks[2]; int i; uint16_t rulenum; uint8_t cmd; av++; memset(&rt, 0, sizeof(rt)); if (av[0] == NULL) errx(EX_USAGE, "set needs command"); if (_substrcmp(*av, "show") == 0) { struct format_opts fo; ipfw_cfg_lheader *cfg; memset(&fo, 0, sizeof(fo)); if (ipfw_get_config(&co, &fo, &cfg, &size) != 0) err(EX_OSERR, "requesting config failed"); for (i = 0, msg = "disable"; i < RESVD_SET; i++) if ((cfg->set_mask & (1<set_mask != (uint32_t)-1) ? " enable" : "enable"; for (i = 0; i < RESVD_SET; i++) if ((cfg->set_mask & (1< RESVD_SET) errx(EX_DATAERR, "invalid set number %s\n", av[0]); if (!isdigit(*(av[1])) || rt.new_set > RESVD_SET) errx(EX_DATAERR, "invalid set number %s\n", av[1]); i = do_range_cmd(IP_FW_SET_SWAP, &rt); } else if (_substrcmp(*av, "move") == 0) { av++; if (av[0] && _substrcmp(*av, "rule") == 0) { rt.flags = IPFW_RCFLAG_RANGE; /* move rules to new set */ cmd = IP_FW_XMOVE; av++; } else cmd = IP_FW_SET_MOVE; /* Move set to new one */ if (av[0] == NULL || av[1] == NULL || av[2] == NULL || av[3] != NULL || _substrcmp(av[1], "to") != 0) errx(EX_USAGE, "syntax: set move [rule] X to Y\n"); rulenum = atoi(av[0]); rt.new_set = atoi(av[2]); if (cmd == IP_FW_XMOVE) { rt.start_rule = rulenum; rt.end_rule = rulenum; } else rt.set = rulenum; rt.new_set = atoi(av[2]); if (!isdigit(*(av[0])) || (cmd == 3 && rt.set > RESVD_SET) || (cmd == 2 && rt.start_rule == IPFW_DEFAULT_RULE) ) errx(EX_DATAERR, "invalid source number %s\n", av[0]); if (!isdigit(*(av[2])) || rt.new_set > RESVD_SET) errx(EX_DATAERR, "invalid dest. set %s\n", av[1]); i = do_range_cmd(cmd, &rt); if (i < 0) err(EX_OSERR, "failed to move %s", cmd == IP_FW_SET_MOVE ? "set": "rule"); } else if (_substrcmp(*av, "disable") == 0 || _substrcmp(*av, "enable") == 0 ) { int which = _substrcmp(*av, "enable") == 0 ? 
1 : 0; av++; masks[0] = masks[1] = 0; while (av[0]) { if (isdigit(**av)) { i = atoi(*av); if (i < 0 || i > RESVD_SET) errx(EX_DATAERR, "invalid set number %d\n", i); masks[which] |= (1<dcnt++; if (fo->show_counters == 0) return; if (co->use_set) { /* skip states from another set */ bcopy((char *)&d->rule + sizeof(uint16_t), &set, sizeof(uint8_t)); if (set != co->use_set - 1) return; } width = pr_u64(NULL, &d->pcnt, 0); if (width > fo->pcwidth) fo->pcwidth = width; width = pr_u64(NULL, &d->bcnt, 0); if (width > fo->bcwidth) fo->bcwidth = width; } static int foreach_state(struct cmdline_opts *co, struct format_opts *fo, caddr_t base, size_t sz, state_cb dyn_bc, void *dyn_arg) { int ttype; state_cb *fptr; void *farg; ipfw_obj_tlv *tlv; ipfw_obj_ctlv *ctlv; fptr = NULL; ttype = 0; while (sz > 0) { ctlv = (ipfw_obj_ctlv *)base; switch (ctlv->head.type) { case IPFW_TLV_DYNSTATE_LIST: base += sizeof(*ctlv); sz -= sizeof(*ctlv); ttype = IPFW_TLV_DYN_ENT; fptr = dyn_bc; farg = dyn_arg; break; default: return (sz); } while (sz > 0) { tlv = (ipfw_obj_tlv *)base; if (tlv->type != ttype) break; fptr(co, fo, farg, tlv + 1); sz -= tlv->length; base += tlv->length; } } return (sz); } static void prepare_format_opts(struct cmdline_opts *co, struct format_opts *fo, ipfw_obj_tlv *rtlv, int rcnt, caddr_t dynbase, size_t dynsz) { int bcwidth, pcwidth, width; int n; struct ip_fw_bcounter *cntr; struct ip_fw_rule *r; bcwidth = 0; pcwidth = 0; if (fo->show_counters != 0) { for (n = 0; n < rcnt; n++, rtlv = (ipfw_obj_tlv *)((caddr_t)rtlv + rtlv->length)) { cntr = (struct ip_fw_bcounter *)(rtlv + 1); r = (struct ip_fw_rule *)((caddr_t)cntr + cntr->size); /* skip rules from another set */ if (co->use_set && r->set != co->use_set - 1) continue; /* packet counter */ width = pr_u64(NULL, &cntr->pcnt, 0); if (width > pcwidth) pcwidth = width; /* byte counter */ width = pr_u64(NULL, &cntr->bcnt, 0); if (width > bcwidth) bcwidth = width; } } fo->bcwidth = bcwidth; fo->pcwidth = pcwidth; fo->dcnt = 0; if (co->do_dynamic && dynsz > 0) foreach_state(co, fo, dynbase, dynsz, prepare_format_dyn, NULL); } static int list_static_range(struct cmdline_opts *co, struct format_opts *fo, struct buf_pr *bp, ipfw_obj_tlv *rtlv, int rcnt) { int n, seen; struct ip_fw_rule *r; struct ip_fw_bcounter *cntr; int c = 0; for (n = seen = 0; n < rcnt; n++, rtlv = (ipfw_obj_tlv *)((caddr_t)rtlv + rtlv->length)) { if ((fo->show_counters | fo->show_time) != 0) { cntr = (struct ip_fw_bcounter *)(rtlv + 1); r = (struct ip_fw_rule *)((caddr_t)cntr + cntr->size); } else { cntr = NULL; r = (struct ip_fw_rule *)(rtlv + 1); } if (r->rulenum > fo->last) break; if (co->use_set && r->set != co->use_set - 1) continue; if (r->rulenum >= fo->first && r->rulenum <= fo->last) { show_static_rule(co, fo, bp, r, cntr); printf("%s", bp->buf); c += rtlv->length; bp_flush(bp); seen++; } } return (seen); } static void list_dyn_state(struct cmdline_opts *co, struct format_opts *fo, void *_arg, void *_state) { uint16_t rulenum; uint8_t set; ipfw_dyn_rule *d; struct buf_pr *bp; d = (ipfw_dyn_rule *)_state; bp = (struct buf_pr *)_arg; bcopy(&d->rule, &rulenum, sizeof(rulenum)); if (rulenum > fo->last) return; if (co->use_set) { bcopy((char *)&d->rule + sizeof(uint16_t), &set, sizeof(uint8_t)); if (set != co->use_set - 1) return; } if (rulenum >= fo->first) { show_dyn_state(co, fo, bp, d); printf("%s\n", bp->buf); bp_flush(bp); } } static int list_dyn_range(struct cmdline_opts *co, struct format_opts *fo, struct buf_pr *bp, caddr_t base, size_t sz) { sz = foreach_state(co, 
fo, base, sz, list_dyn_state, bp); return (sz); } void ipfw_list(int ac, char *av[], int show_counters) { ipfw_cfg_lheader *cfg; struct format_opts sfo; size_t sz; int error; int lac; char **lav; uint32_t rnum; char *endptr; if (co.test_only) { fprintf(stderr, "Testing only, list disabled\n"); return; } if (co.do_pipe) { dummynet_list(ac, av, show_counters); return; } ac--; av++; memset(&sfo, 0, sizeof(sfo)); /* Determine rule range to request */ if (ac > 0) { for (lac = ac, lav = av; lac != 0; lac--) { rnum = strtoul(*lav++, &endptr, 10); if (sfo.first == 0 || rnum < sfo.first) sfo.first = rnum; if (*endptr == '-') rnum = strtoul(endptr + 1, &endptr, 10); if (sfo.last == 0 || rnum > sfo.last) sfo.last = rnum; } } /* get configuraion from kernel */ cfg = NULL; sfo.show_counters = show_counters; sfo.show_time = co.do_time; if (co.do_dynamic != 2) sfo.flags |= IPFW_CFG_GET_STATIC; if (co.do_dynamic != 0) sfo.flags |= IPFW_CFG_GET_STATES; if ((sfo.show_counters | sfo.show_time) != 0) sfo.flags |= IPFW_CFG_GET_COUNTERS; if (ipfw_get_config(&co, &sfo, &cfg, &sz) != 0) err(EX_OSERR, "retrieving config failed"); error = ipfw_show_config(&co, &sfo, cfg, sz, ac, av); free(cfg); if (error != EX_OK) exit(error); } static int ipfw_show_config(struct cmdline_opts *co, struct format_opts *fo, ipfw_cfg_lheader *cfg, size_t sz, int ac, char *av[]) { caddr_t dynbase; size_t dynsz; int rcnt; int exitval = EX_OK; int lac; char **lav; char *endptr; size_t readsz; struct buf_pr bp; ipfw_obj_ctlv *ctlv, *tstate; ipfw_obj_tlv *rbase; /* * Handle tablenames TLV first, if any */ tstate = NULL; rbase = NULL; dynbase = NULL; dynsz = 0; readsz = sizeof(*cfg); rcnt = 0; fo->set_mask = cfg->set_mask; ctlv = (ipfw_obj_ctlv *)(cfg + 1); if (ctlv->head.type == IPFW_TLV_TBLNAME_LIST) { object_sort_ctlv(ctlv); fo->tstate = ctlv; readsz += ctlv->head.length; ctlv = (ipfw_obj_ctlv *)((caddr_t)ctlv + ctlv->head.length); } if (cfg->flags & IPFW_CFG_GET_STATIC) { /* We've requested static rules */ if (ctlv->head.type == IPFW_TLV_RULE_LIST) { rbase = (ipfw_obj_tlv *)(ctlv + 1); rcnt = ctlv->count; readsz += ctlv->head.length; ctlv = (ipfw_obj_ctlv *)((caddr_t)ctlv + ctlv->head.length); } } if ((cfg->flags & IPFW_CFG_GET_STATES) && (readsz != sz)) { /* We may have some dynamic states */ dynsz = sz - readsz; /* Skip empty header */ if (dynsz != sizeof(ipfw_obj_ctlv)) dynbase = (caddr_t)ctlv; else dynsz = 0; } prepare_format_opts(co, fo, rbase, rcnt, dynbase, dynsz); bp_alloc(&bp, 4096); /* if no rule numbers were specified, list all rules */ if (ac == 0) { fo->first = 0; fo->last = IPFW_DEFAULT_RULE; if (cfg->flags & IPFW_CFG_GET_STATIC) list_static_range(co, fo, &bp, rbase, rcnt); if (co->do_dynamic && dynsz > 0) { printf("## Dynamic rules (%d %zu):\n", fo->dcnt, dynsz); list_dyn_range(co, fo, &bp, dynbase, dynsz); } bp_free(&bp); return (EX_OK); } /* display specific rules requested on command line */ for (lac = ac, lav = av; lac != 0; lac--) { /* convert command line rule # */ fo->last = fo->first = strtoul(*lav++, &endptr, 10); if (*endptr == '-') fo->last = strtoul(endptr + 1, &endptr, 10); if (*endptr) { exitval = EX_USAGE; warnx("invalid rule number: %s", *(lav - 1)); continue; } if ((cfg->flags & IPFW_CFG_GET_STATIC) == 0) continue; if (list_static_range(co, fo, &bp, rbase, rcnt) == 0) { /* give precedence to other error(s) */ if (exitval == EX_OK) exitval = EX_UNAVAILABLE; if (fo->first == fo->last) warnx("rule %u does not exist", fo->first); else warnx("no rules in range %u-%u", fo->first, fo->last); } } if 
(co->do_dynamic && dynsz > 0) { printf("## Dynamic rules:\n"); for (lac = ac, lav = av; lac != 0; lac--) { fo->last = fo->first = strtoul(*lav++, &endptr, 10); if (*endptr == '-') fo->last = strtoul(endptr+1, &endptr, 10); if (*endptr) /* already warned */ continue; list_dyn_range(co, fo, &bp, dynbase, dynsz); } } bp_free(&bp); return (exitval); } /* * Retrieves current ipfw configuration of given type * and stores its pointer to @pcfg. * * Caller is responsible for freeing @pcfg. * * Returns 0 on success. */ static int ipfw_get_config(struct cmdline_opts *co, struct format_opts *fo, ipfw_cfg_lheader **pcfg, size_t *psize) { ipfw_cfg_lheader *cfg; size_t sz; int i; if (co->test_only != 0) { fprintf(stderr, "Testing only, list disabled\n"); return (0); } /* Start with some data size */ sz = 4096; cfg = NULL; for (i = 0; i < 16; i++) { if (cfg != NULL) free(cfg); if ((cfg = calloc(1, sz)) == NULL) return (ENOMEM); cfg->flags = fo->flags; cfg->start_rule = fo->first; cfg->end_rule = fo->last; if (do_get3(IP_FW_XGET, &cfg->opheader, &sz) != 0) { if (errno != ENOMEM) { free(cfg); return (errno); } /* Buffer size is not enough. Try to increase */ sz = sz * 2; if (sz < cfg->size) sz = cfg->size; continue; } *pcfg = cfg; *psize = sz; return (0); } free(cfg); return (ENOMEM); } static int lookup_host (char *host, struct in_addr *ipaddr) { struct hostent *he; if (!inet_aton(host, ipaddr)) { if ((he = gethostbyname(host)) == NULL) return(-1); *ipaddr = *(struct in_addr *)he->h_addr_list[0]; } return(0); } struct tidx { ipfw_obj_ntlv *idx; uint32_t count; uint32_t size; uint16_t counter; uint8_t set; }; int ipfw_check_object_name(const char *name) { int c, i, l; /* * Check that name is null-terminated and contains * valid symbols only. Valid mask is: * [a-zA-Z0-9\-_\.]{1,63} */ l = strlen(name); if (l == 0 || l >= 64) return (EINVAL); for (i = 0; i < l; i++) { c = name[i]; if (isalpha(c) || isdigit(c) || c == '_' || c == '-' || c == '.') continue; return (EINVAL); } return (0); } static char *default_state_name = "default"; static int state_check_name(const char *name) { if (ipfw_check_object_name(name) != 0) return (EINVAL); if (strcmp(name, "any") == 0) return (EINVAL); return (0); } static int eaction_check_name(const char *name) { if (ipfw_check_object_name(name) != 0) return (EINVAL); /* Restrict some 'special' names */ if (match_token(rule_actions, name) != -1 && match_token(rule_action_params, name) != -1) return (EINVAL); return (0); } static uint16_t pack_object(struct tidx *tstate, char *name, int otype) { int i; ipfw_obj_ntlv *ntlv; for (i = 0; i < tstate->count; i++) { if (strcmp(tstate->idx[i].name, name) != 0) continue; if (tstate->idx[i].set != tstate->set) continue; if (tstate->idx[i].head.type != otype) continue; return (tstate->idx[i].idx); } if (tstate->count + 1 > tstate->size) { tstate->size += 4; tstate->idx = realloc(tstate->idx, tstate->size * sizeof(ipfw_obj_ntlv)); if (tstate->idx == NULL) return (0); } ntlv = &tstate->idx[i]; memset(ntlv, 0, sizeof(ipfw_obj_ntlv)); strlcpy(ntlv->name, name, sizeof(ntlv->name)); ntlv->head.type = otype; ntlv->head.length = sizeof(ipfw_obj_ntlv); ntlv->set = tstate->set; ntlv->idx = ++tstate->counter; tstate->count++; return (ntlv->idx); } static uint16_t pack_table(struct tidx *tstate, char *name) { if (table_check_name(name) != 0) return (0); return (pack_object(tstate, name, IPFW_TLV_TBL_NAME)); } void fill_table(struct _ipfw_insn *cmd, char *av, uint8_t opcode, struct tidx *tstate) { uint32_t *d = ((ipfw_insn_u32 *)cmd)->d; uint16_t uidx; 
char *p; if ((p = strchr(av + 6, ')')) == NULL) errx(EX_DATAERR, "forgotten parenthesis: '%s'", av); *p = '\0'; p = strchr(av + 6, ','); if (p) *p++ = '\0'; if ((uidx = pack_table(tstate, av + 6)) == 0) errx(EX_DATAERR, "Invalid table name: %s", av + 6); cmd->opcode = opcode; cmd->arg1 = uidx; if (p) { cmd->len |= F_INSN_SIZE(ipfw_insn_u32); d[0] = strtoul(p, NULL, 0); } else cmd->len |= F_INSN_SIZE(ipfw_insn); } /* * fills the addr and mask fields in the instruction as appropriate from av. * Update length as appropriate. * The following formats are allowed: * me returns O_IP_*_ME * 1.2.3.4 single IP address * 1.2.3.4:5.6.7.8 address:mask * 1.2.3.4/24 address/mask * 1.2.3.4/26{1,6,5,4,23} set of addresses in a subnet * We can have multiple comma-separated address/mask entries. */ static void fill_ip(ipfw_insn_ip *cmd, char *av, int cblen, struct tidx *tstate) { int len = 0; uint32_t *d = ((ipfw_insn_u32 *)cmd)->d; cmd->o.len &= ~F_LEN_MASK; /* zero len */ if (_substrcmp(av, "any") == 0) return; if (_substrcmp(av, "me") == 0) { cmd->o.len |= F_INSN_SIZE(ipfw_insn); return; } if (strncmp(av, "table(", 6) == 0) { fill_table(&cmd->o, av, O_IP_DST_LOOKUP, tstate); return; } while (av) { /* * After the address we can have '/' or ':' indicating a mask, * ',' indicating another address follows, '{' indicating a * set of addresses of unspecified size. */ char *t = NULL, *p = strpbrk(av, "/:,{"); int masklen; char md, nd = '\0'; CHECK_LENGTH(cblen, F_INSN_SIZE(ipfw_insn) + 2 + len); if (p) { md = *p; *p++ = '\0'; if ((t = strpbrk(p, ",{")) != NULL) { nd = *t; *t = '\0'; } } else md = '\0'; if (lookup_host(av, (struct in_addr *)&d[0]) != 0) errx(EX_NOHOST, "hostname ``%s'' unknown", av); switch (md) { case ':': if (!inet_aton(p, (struct in_addr *)&d[1])) errx(EX_DATAERR, "bad netmask ``%s''", p); break; case '/': masklen = atoi(p); if (masklen == 0) d[1] = htonl(0U); /* mask */ else if (masklen > 32) errx(EX_DATAERR, "bad width ``%s''", p); else d[1] = htonl(~0U << (32 - masklen)); break; case '{': /* no mask, assume /24 and put back the '{' */ d[1] = htonl(~0U << (32 - 24)); *(--p) = md; break; case ',': /* single address plus continuation */ *(--p) = md; /* FALLTHROUGH */ case 0: /* initialization value */ default: d[1] = htonl(~0U); /* force /32 */ break; } d[0] &= d[1]; /* mask base address with mask */ if (t) *t = nd; /* find next separator */ if (p) p = strpbrk(p, ",{"); if (p && *p == '{') { /* * We have a set of addresses. They are stored as follows: * arg1 is the set size (powers of 2, 2..256) * addr is the base address IN HOST FORMAT * mask.. is an array of arg1 bits (rounded up to * the next multiple of 32) with bits set * for each host in the map. */ uint32_t *map = (uint32_t *)&cmd->mask; int low, high; int i = contigmask((uint8_t *)&(d[1]), 32); if (len > 0) errx(EX_DATAERR, "address set cannot be in a list"); if (i < 24 || i > 31) errx(EX_DATAERR, "invalid set with mask %d\n", i); cmd->o.arg1 = 1<<(32-i); /* map length */ d[0] = ntohl(d[0]); /* base addr in host format */ cmd->o.opcode = O_IP_DST_SET; /* default */ cmd->o.len |= F_INSN_SIZE(ipfw_insn_u32) + (cmd->o.arg1+31)/32; for (i = 0; i < (cmd->o.arg1+31)/32 ; i++) map[i] = 0; /* clear map */ av = p + 1; low = d[0] & 0xff; high = low + cmd->o.arg1 - 1; /* * Here, i stores the previous value when we specify a range * of addresses within a mask, e.g. 45-63. i = -1 means we * have no previous value. 
*/ i = -1; /* previous value in a range */ while (isdigit(*av)) { char *s; int a = strtol(av, &s, 0); if (s == av) { /* no parameter */ if (*av != '}') errx(EX_DATAERR, "set not closed\n"); if (i != -1) errx(EX_DATAERR, "incomplete range %d-", i); break; } if (a < low || a > high) errx(EX_DATAERR, "addr %d out of range [%d-%d]\n", a, low, high); a -= low; if (i == -1) /* no previous in range */ i = a; else { /* check that range is valid */ if (i > a) errx(EX_DATAERR, "invalid range %d-%d", i+low, a+low); if (*s == '-') errx(EX_DATAERR, "double '-' in range"); } for (; i <= a; i++) map[i/32] |= 1<<(i & 31); i = -1; if (*s == '-') i = a; else if (*s == '}') break; av = s+1; } return; } av = p; if (av) /* then *av must be a ',' */ av++; /* Check this entry */ if (d[1] == 0) { /* "any", specified as x.x.x.x/0 */ /* * 'any' turns the entire list into a NOP. * 'not any' never matches, so it is removed from the * list unless it is the only item, in which case we * report an error. */ if (cmd->o.len & F_NOT) { /* "not any" never matches */ if (av == NULL && len == 0) /* only this entry */ errx(EX_DATAERR, "not any never matches"); } /* else do nothing and skip this entry */ return; } /* A single IP can be stored in an optimized format */ if (d[1] == (uint32_t)~0 && av == NULL && len == 0) { cmd->o.len |= F_INSN_SIZE(ipfw_insn_u32); return; } len += 2; /* two words... */ d += 2; } /* end while */ if (len + 1 > F_LEN_MASK) errx(EX_DATAERR, "address list too long"); cmd->o.len |= len+1; } /* n2mask sets n bits of the mask */ void n2mask(struct in6_addr *mask, int n) { static int minimask[9] = { 0x00, 0x80, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc, 0xfe, 0xff }; u_char *p; memset(mask, 0, sizeof(struct in6_addr)); p = (u_char *) mask; for (; n > 0; p++, n -= 8) { if (n >= 8) *p = 0xff; else *p = minimask[n]; } return; } static void fill_flags_cmd(ipfw_insn *cmd, enum ipfw_opcodes opcode, struct _s_x *flags, char *p) { char *e; uint32_t set = 0, clear = 0; if (fill_flags(flags, p, &e, &set, &clear) != 0) errx(EX_DATAERR, "invalid flag %s", e); cmd->opcode = opcode; cmd->len = (cmd->len & (F_NOT | F_OR)) | 1; cmd->arg1 = (set & 0xff) | ( (clear & 0xff) << 8); } void ipfw_delete(char *av[]) { ipfw_range_tlv rt; char *sep; int i, j; int exitval = EX_OK; int do_set = 0; av++; NEED1("missing rule specification"); if ( *av && _substrcmp(*av, "set") == 0) { /* Do not allow using the following syntax: * ipfw set N delete set M */ if (co.use_set) errx(EX_DATAERR, "invalid syntax"); do_set = 1; /* delete set */ av++; } /* Rule number */ while (*av && isdigit(**av)) { i = strtol(*av, &sep, 10); j = i; if (*sep== '-') j = strtol(sep + 1, NULL, 10); av++; if (co.do_nat) { exitval = do_cmd(IP_FW_NAT_DEL, &i, sizeof i); if (exitval) { exitval = EX_UNAVAILABLE; if (co.do_quiet) continue; warn("nat %u not available", i); } } else if (co.do_pipe) { exitval = ipfw_delete_pipe(co.do_pipe, i); } else { memset(&rt, 0, sizeof(rt)); if (do_set != 0) { rt.set = i & 31; rt.flags = IPFW_RCFLAG_SET; } else { rt.start_rule = i & 0xffff; rt.end_rule = j & 0xffff; if (rt.start_rule == 0 && rt.end_rule == 0) rt.flags |= IPFW_RCFLAG_ALL; else rt.flags |= IPFW_RCFLAG_RANGE; if (co.use_set != 0) { rt.set = co.use_set - 1; rt.flags |= IPFW_RCFLAG_SET; } } if (co.do_dynamic == 2) rt.flags |= IPFW_RCFLAG_DYNAMIC; i = do_range_cmd(IP_FW_XDEL, &rt); if (i != 0) { exitval = EX_UNAVAILABLE; if (co.do_quiet) continue; warn("rule %u: setsockopt(IP_FW_XDEL)", rt.start_rule); } else if (rt.new_set == 0 && do_set == 0 && co.do_dynamic != 2) { exitval = 
EX_UNAVAILABLE; if (co.do_quiet) continue; if (rt.start_rule != rt.end_rule) warnx("no rules rules in %u-%u range", rt.start_rule, rt.end_rule); else warnx("rule %u not found", rt.start_rule); } } } if (exitval != EX_OK && co.do_force == 0) exit(exitval); } /* * fill the interface structure. We do not check the name as we can * create interfaces dynamically, so checking them at insert time * makes relatively little sense. * Interface names containing '*', '?', or '[' are assumed to be shell * patterns which match interfaces. */ static void fill_iface(ipfw_insn_if *cmd, char *arg, int cblen, struct tidx *tstate) { char *p; uint16_t uidx; cmd->name[0] = '\0'; cmd->o.len |= F_INSN_SIZE(ipfw_insn_if); CHECK_CMDLEN; /* Parse the interface or address */ if (strcmp(arg, "any") == 0) cmd->o.len = 0; /* effectively ignore this command */ else if (strncmp(arg, "table(", 6) == 0) { if ((p = strchr(arg + 6, ')')) == NULL) errx(EX_DATAERR, "forgotten parenthesis: '%s'", arg); *p = '\0'; p = strchr(arg + 6, ','); if (p) *p++ = '\0'; if ((uidx = pack_table(tstate, arg + 6)) == 0) errx(EX_DATAERR, "Invalid table name: %s", arg + 6); cmd->name[0] = '\1'; /* Special value indicating table */ cmd->p.kidx = uidx; } else if (!isdigit(*arg)) { strlcpy(cmd->name, arg, sizeof(cmd->name)); cmd->p.glob = strpbrk(arg, "*?[") != NULL ? 1 : 0; } else if (!inet_aton(arg, &cmd->p.ip)) errx(EX_DATAERR, "bad ip address ``%s''", arg); } static void get_mac_addr_mask(const char *p, uint8_t *addr, uint8_t *mask) { int i; size_t l; char *ap, *ptr, *optr; struct ether_addr *mac; const char *macset = "0123456789abcdefABCDEF:"; if (strcmp(p, "any") == 0) { for (i = 0; i < ETHER_ADDR_LEN; i++) addr[i] = mask[i] = 0; return; } optr = ptr = strdup(p); if ((ap = strsep(&ptr, "&/")) != NULL && *ap != 0) { l = strlen(ap); if (strspn(ap, macset) != l || (mac = ether_aton(ap)) == NULL) errx(EX_DATAERR, "Incorrect MAC address"); bcopy(mac, addr, ETHER_ADDR_LEN); } else errx(EX_DATAERR, "Incorrect MAC address"); if (ptr != NULL) { /* we have mask? */ if (p[ptr - optr - 1] == '/') { /* mask len */ long ml = strtol(ptr, &ap, 10); if (*ap != 0 || ml > ETHER_ADDR_LEN * 8 || ml < 0) errx(EX_DATAERR, "Incorrect mask length"); for (i = 0; ml > 0 && i < ETHER_ADDR_LEN; ml -= 8, i++) mask[i] = (ml >= 8) ? 0xff: (~0) << (8 - ml); } else { /* mask */ l = strlen(ptr); if (strspn(ptr, macset) != l || (mac = ether_aton(ptr)) == NULL) errx(EX_DATAERR, "Incorrect mask"); bcopy(mac, mask, ETHER_ADDR_LEN); } } else { /* default mask: ff:ff:ff:ff:ff:ff */ for (i = 0; i < ETHER_ADDR_LEN; i++) mask[i] = 0xff; } for (i = 0; i < ETHER_ADDR_LEN; i++) addr[i] &= mask[i]; free(optr); } /* * helper function, updates the pointer to cmd with the length * of the current command, and also cleans up the first word of * the new command in case it has been clobbered before. */ static ipfw_insn * next_cmd(ipfw_insn *cmd, int *len) { *len -= F_LEN(cmd); CHECK_LENGTH(*len, 0); cmd += F_LEN(cmd); bzero(cmd, sizeof(*cmd)); return cmd; } /* * Takes arguments and copies them into a comment */ static void fill_comment(ipfw_insn *cmd, char **av, int cblen) { int i, l; char *p = (char *)(cmd + 1); cmd->opcode = O_NOP; cmd->len = (cmd->len & (F_NOT | F_OR)); /* Compute length of comment string. 
*/ for (i = 0, l = 0; av[i] != NULL; i++) l += strlen(av[i]) + 1; if (l == 0) return; if (l > 84) errx(EX_DATAERR, "comment too long (max 80 chars)"); l = 1 + (l+3)/4; cmd->len = (cmd->len & (F_NOT | F_OR)) | l; CHECK_CMDLEN; for (i = 0; av[i] != NULL; i++) { strcpy(p, av[i]); p += strlen(av[i]); *p++ = ' '; } *(--p) = '\0'; } /* * A function to fill simple commands of size 1. * Existing flags are preserved. */ static void fill_cmd(ipfw_insn *cmd, enum ipfw_opcodes opcode, int flags, uint16_t arg) { cmd->opcode = opcode; cmd->len = ((cmd->len | flags) & (F_NOT | F_OR)) | 1; cmd->arg1 = arg; } /* * Fetch and add the MAC address and type, with masks. This generates one or * two microinstructions, and returns the pointer to the last one. */ static ipfw_insn * add_mac(ipfw_insn *cmd, char *av[], int cblen) { ipfw_insn_mac *mac; if ( ( av[0] == NULL ) || ( av[1] == NULL ) ) errx(EX_DATAERR, "MAC dst src"); cmd->opcode = O_MACADDR2; cmd->len = (cmd->len & (F_NOT | F_OR)) | F_INSN_SIZE(ipfw_insn_mac); CHECK_CMDLEN; mac = (ipfw_insn_mac *)cmd; get_mac_addr_mask(av[0], mac->addr, mac->mask); /* dst */ get_mac_addr_mask(av[1], &(mac->addr[ETHER_ADDR_LEN]), &(mac->mask[ETHER_ADDR_LEN])); /* src */ return cmd; } static ipfw_insn * add_mactype(ipfw_insn *cmd, char *av, int cblen) { if (!av) errx(EX_DATAERR, "missing MAC type"); if (strcmp(av, "any") != 0) { /* we have a non-null type */ fill_newports((ipfw_insn_u16 *)cmd, av, IPPROTO_ETHERTYPE, cblen); cmd->opcode = O_MAC_TYPE; return cmd; } else return NULL; } static ipfw_insn * add_proto0(ipfw_insn *cmd, char *av, u_char *protop) { struct protoent *pe; char *ep; int proto; proto = strtol(av, &ep, 10); if (*ep != '\0' || proto <= 0) { if ((pe = getprotobyname(av)) == NULL) return NULL; proto = pe->p_proto; } fill_cmd(cmd, O_PROTO, 0, proto); *protop = proto; return cmd; } static ipfw_insn * add_proto(ipfw_insn *cmd, char *av, u_char *protop) { u_char proto = IPPROTO_IP; if (_substrcmp(av, "all") == 0 || strcmp(av, "ip") == 0) ; /* do not set O_IP4 nor O_IP6 */ else if (strcmp(av, "ip4") == 0) /* explicit "just IPv4" rule */ fill_cmd(cmd, O_IP4, 0, 0); else if (strcmp(av, "ip6") == 0) { /* explicit "just IPv6" rule */ proto = IPPROTO_IPV6; fill_cmd(cmd, O_IP6, 0, 0); } else return add_proto0(cmd, av, protop); *protop = proto; return cmd; } static ipfw_insn * add_proto_compat(ipfw_insn *cmd, char *av, u_char *protop) { u_char proto = IPPROTO_IP; if (_substrcmp(av, "all") == 0 || strcmp(av, "ip") == 0) ; /* do not set O_IP4 nor O_IP6 */ else if (strcmp(av, "ipv4") == 0 || strcmp(av, "ip4") == 0) /* explicit "just IPv4" rule */ fill_cmd(cmd, O_IP4, 0, 0); else if (strcmp(av, "ipv6") == 0 || strcmp(av, "ip6") == 0) { /* explicit "just IPv6" rule */ proto = IPPROTO_IPV6; fill_cmd(cmd, O_IP6, 0, 0); } else return add_proto0(cmd, av, protop); *protop = proto; return cmd; } static ipfw_insn * add_srcip(ipfw_insn *cmd, char *av, int cblen, struct tidx *tstate) { fill_ip((ipfw_insn_ip *)cmd, av, cblen, tstate); if (cmd->opcode == O_IP_DST_SET) /* set */ cmd->opcode = O_IP_SRC_SET; else if (cmd->opcode == O_IP_DST_LOOKUP) /* table */ cmd->opcode = O_IP_SRC_LOOKUP; else if (F_LEN(cmd) == F_INSN_SIZE(ipfw_insn)) /* me */ cmd->opcode = O_IP_SRC_ME; else if (F_LEN(cmd) == F_INSN_SIZE(ipfw_insn_u32)) /* one IP */ cmd->opcode = O_IP_SRC; else /* addr/mask */ cmd->opcode = O_IP_SRC_MASK; return cmd; } static ipfw_insn * add_dstip(ipfw_insn *cmd, char *av, int cblen, struct tidx *tstate) { fill_ip((ipfw_insn_ip *)cmd, av, cblen, tstate); if (cmd->opcode == O_IP_DST_SET) 
/* set */ ; else if (cmd->opcode == O_IP_DST_LOOKUP) /* table */ ; else if (F_LEN(cmd) == F_INSN_SIZE(ipfw_insn)) /* me */ cmd->opcode = O_IP_DST_ME; else if (F_LEN(cmd) == F_INSN_SIZE(ipfw_insn_u32)) /* one IP */ cmd->opcode = O_IP_DST; else /* addr/mask */ cmd->opcode = O_IP_DST_MASK; return cmd; } static struct _s_x f_reserved_keywords[] = { { "altq", TOK_OR }, { "//", TOK_OR }, { "diverted", TOK_OR }, { "dst-port", TOK_OR }, { "src-port", TOK_OR }, { "established", TOK_OR }, { "keep-state", TOK_OR }, { "frag", TOK_OR }, { "icmptypes", TOK_OR }, { "in", TOK_OR }, { "out", TOK_OR }, { "ip6", TOK_OR }, { "any", TOK_OR }, { "to", TOK_OR }, { "via", TOK_OR }, { "{", TOK_OR }, { NULL, 0 } /* terminator */ }; static ipfw_insn * add_ports(ipfw_insn *cmd, char *av, u_char proto, int opcode, int cblen) { if (match_token(f_reserved_keywords, av) != -1) return (NULL); if (fill_newports((ipfw_insn_u16 *)cmd, av, proto, cblen)) { /* XXX todo: check that we have a protocol with ports */ cmd->opcode = opcode; return cmd; } return NULL; } static ipfw_insn * add_src(ipfw_insn *cmd, char *av, u_char proto, int cblen, struct tidx *tstate) { struct in6_addr a; char *host, *ch, buf[INET6_ADDRSTRLEN]; ipfw_insn *ret = NULL; int len; /* Copy first address in set if needed */ if ((ch = strpbrk(av, "/,")) != NULL) { len = ch - av; strlcpy(buf, av, sizeof(buf)); if (len < sizeof(buf)) buf[len] = '\0'; host = buf; } else host = av; if (proto == IPPROTO_IPV6 || strcmp(av, "me6") == 0 || inet_pton(AF_INET6, host, &a) == 1) ret = add_srcip6(cmd, av, cblen, tstate); /* XXX: should check for IPv4, not !IPv6 */ if (ret == NULL && (proto == IPPROTO_IP || strcmp(av, "me") == 0 || inet_pton(AF_INET6, host, &a) != 1)) ret = add_srcip(cmd, av, cblen, tstate); if (ret == NULL && strcmp(av, "any") != 0) ret = cmd; return ret; } static ipfw_insn * add_dst(ipfw_insn *cmd, char *av, u_char proto, int cblen, struct tidx *tstate) { struct in6_addr a; char *host, *ch, buf[INET6_ADDRSTRLEN]; ipfw_insn *ret = NULL; int len; /* Copy first address in set if needed */ if ((ch = strpbrk(av, "/,")) != NULL) { len = ch - av; strlcpy(buf, av, sizeof(buf)); if (len < sizeof(buf)) buf[len] = '\0'; host = buf; } else host = av; if (proto == IPPROTO_IPV6 || strcmp(av, "me6") == 0 || inet_pton(AF_INET6, host, &a) == 1) ret = add_dstip6(cmd, av, cblen, tstate); /* XXX: should check for IPv4, not !IPv6 */ if (ret == NULL && (proto == IPPROTO_IP || strcmp(av, "me") == 0 || inet_pton(AF_INET6, host, &a) != 1)) ret = add_dstip(cmd, av, cblen, tstate); if (ret == NULL && strcmp(av, "any") != 0) ret = cmd; return ret; } /* * Parse arguments and assemble the microinstructions which make up a rule. * Rules are added into the 'rulebuf' and then copied in the correct order * into the actual rule. * * The syntax for a rule starts with the action, followed by * optional action parameters, and the various match patterns. * In the assembled microcode, the first opcode must be an O_PROBE_STATE * (generated if the rule includes a keep-state option), then the * various match patterns, log/altq actions, and the actual action. * */ void compile_rule(char *av[], uint32_t *rbuf, int *rbufsize, struct tidx *tstate) { /* * rules are added into the 'rulebuf' and then copied in * the correct order into the actual rule. * Some things that need to go out of order (prob, action etc.) * go into actbuf[]. 
*/ static uint32_t actbuf[255], cmdbuf[255]; int rblen, ablen, cblen; ipfw_insn *src, *dst, *cmd, *action, *prev=NULL; ipfw_insn *first_cmd; /* first match pattern */ struct ip_fw_rule *rule; /* * various flags used to record that we entered some fields. */ ipfw_insn *have_state = NULL; /* any state-related option */ int have_rstate = 0; ipfw_insn *have_log = NULL, *have_altq = NULL, *have_tag = NULL; ipfw_insn *have_skipcmd = NULL; size_t len; int i; int open_par = 0; /* open parenthesis ( */ /* proto is here because it is used to fetch ports */ u_char proto = IPPROTO_IP; /* default protocol */ double match_prob = 1; /* match probability, default is always match */ bzero(actbuf, sizeof(actbuf)); /* actions go here */ bzero(cmdbuf, sizeof(cmdbuf)); bzero(rbuf, *rbufsize); rule = (struct ip_fw_rule *)rbuf; cmd = (ipfw_insn *)cmdbuf; action = (ipfw_insn *)actbuf; rblen = *rbufsize / sizeof(uint32_t); rblen -= sizeof(struct ip_fw_rule) / sizeof(uint32_t); ablen = sizeof(actbuf) / sizeof(actbuf[0]); cblen = sizeof(cmdbuf) / sizeof(cmdbuf[0]); cblen -= F_INSN_SIZE(ipfw_insn_u32) + 1; #define CHECK_RBUFLEN(len) { CHECK_LENGTH(rblen, len); rblen -= len; } #define CHECK_ACTLEN CHECK_LENGTH(ablen, action->len) av++; /* [rule N] -- Rule number optional */ if (av[0] && isdigit(**av)) { rule->rulenum = atoi(*av); av++; } /* [set N] -- set number (0..RESVD_SET), optional */ if (av[0] && av[1] && _substrcmp(*av, "set") == 0) { int set = strtoul(av[1], NULL, 10); if (set < 0 || set > RESVD_SET) errx(EX_DATAERR, "illegal set %s", av[1]); rule->set = set; tstate->set = set; av += 2; } /* [prob D] -- match probability, optional */ if (av[0] && av[1] && _substrcmp(*av, "prob") == 0) { match_prob = strtod(av[1], NULL); if (match_prob <= 0 || match_prob > 1) errx(EX_DATAERR, "illegal match prob. 
%s", av[1]); av += 2; } /* action -- mandatory */ NEED1("missing action"); i = match_token(rule_actions, *av); av++; action->len = 1; /* default */ CHECK_ACTLEN; switch(i) { case TOK_CHECKSTATE: have_state = action; action->opcode = O_CHECK_STATE; if (*av == NULL || match_token(rule_options, *av) == TOK_COMMENT) { action->arg1 = pack_object(tstate, default_state_name, IPFW_TLV_STATE_NAME); break; } if (*av[0] == ':') { if (strcmp(*av + 1, "any") == 0) action->arg1 = 0; else if (state_check_name(*av + 1) == 0) action->arg1 = pack_object(tstate, *av + 1, IPFW_TLV_STATE_NAME); else errx(EX_DATAERR, "Invalid state name %s", *av); av++; break; } errx(EX_DATAERR, "Invalid state name %s", *av); break; case TOK_ABORT: action->opcode = O_REJECT; action->arg1 = ICMP_REJECT_ABORT; break; case TOK_ABORT6: action->opcode = O_UNREACH6; action->arg1 = ICMP6_UNREACH_ABORT; break; case TOK_ACCEPT: action->opcode = O_ACCEPT; break; case TOK_DENY: action->opcode = O_DENY; action->arg1 = 0; break; case TOK_REJECT: action->opcode = O_REJECT; action->arg1 = ICMP_UNREACH_HOST; break; case TOK_RESET: action->opcode = O_REJECT; action->arg1 = ICMP_REJECT_RST; break; case TOK_RESET6: action->opcode = O_UNREACH6; action->arg1 = ICMP6_UNREACH_RST; break; case TOK_UNREACH: action->opcode = O_REJECT; NEED1("missing reject code"); fill_reject_code(&action->arg1, *av); av++; break; case TOK_UNREACH6: action->opcode = O_UNREACH6; NEED1("missing unreach code"); fill_unreach6_code(&action->arg1, *av); av++; break; case TOK_COUNT: action->opcode = O_COUNT; break; case TOK_NAT: action->opcode = O_NAT; action->len = F_INSN_SIZE(ipfw_insn_nat); CHECK_ACTLEN; if (*av != NULL && _substrcmp(*av, "global") == 0) { action->arg1 = IP_FW_NAT44_GLOBAL; av++; break; } else goto chkarg; case TOK_QUEUE: action->opcode = O_QUEUE; goto chkarg; case TOK_PIPE: action->opcode = O_PIPE; goto chkarg; case TOK_SKIPTO: action->opcode = O_SKIPTO; goto chkarg; case TOK_NETGRAPH: action->opcode = O_NETGRAPH; goto chkarg; case TOK_NGTEE: action->opcode = O_NGTEE; goto chkarg; case TOK_DIVERT: action->opcode = O_DIVERT; goto chkarg; case TOK_TEE: action->opcode = O_TEE; goto chkarg; case TOK_CALL: action->opcode = O_CALLRETURN; chkarg: if (!av[0]) errx(EX_USAGE, "missing argument for %s", *(av - 1)); if (isdigit(**av)) { action->arg1 = strtoul(*av, NULL, 10); if (action->arg1 <= 0 || action->arg1 >= IP_FW_TABLEARG) errx(EX_DATAERR, "illegal argument for %s", *(av - 1)); } else if (_substrcmp(*av, "tablearg") == 0) { action->arg1 = IP_FW_TARG; } else if (i == TOK_DIVERT || i == TOK_TEE) { struct servent *s; setservent(1); s = getservbyname(av[0], "divert"); if (s != NULL) action->arg1 = ntohs(s->s_port); else errx(EX_DATAERR, "illegal divert/tee port"); } else errx(EX_DATAERR, "illegal argument for %s", *(av - 1)); av++; break; case TOK_FORWARD: { /* * Locate the address-port separator (':' or ','). * Could be one of the following: * hostname:port * IPv4 a.b.c.d,port * IPv4 a.b.c.d:port * IPv6 w:x:y::z,port * The ':' can only be used with hostname and IPv4 address. * XXX-BZ Should we also support [w:x:y::z]:port? */ struct sockaddr_storage result; struct addrinfo *res; char *s, *end; int family; u_short port_number; NEED1("missing forward address[:port]"); /* * locate the address-port separator (':' or ',') */ s = strchr(*av, ','); if (s == NULL) { /* Distinguish between IPv4:port and IPv6 cases. 
*/ s = strchr(*av, ':'); if (s && strchr(s+1, ':')) s = NULL; /* no port */ } port_number = 0; if (s != NULL) { /* Terminate host portion and set s to start of port. */ *(s++) = '\0'; i = strtoport(s, &end, 0 /* base */, 0 /* proto */); if (s == end) errx(EX_DATAERR, "illegal forwarding port ``%s''", s); port_number = (u_short)i; } if (_substrcmp(*av, "tablearg") == 0) { family = PF_INET; ((struct sockaddr_in*)&result)->sin_addr.s_addr = INADDR_ANY; } else { /* * Resolve the host name or address to a family and a * network representation of the address. */ if (getaddrinfo(*av, NULL, NULL, &res)) errx(EX_DATAERR, NULL); /* Just use the first host in the answer. */ family = res->ai_family; memcpy(&result, res->ai_addr, res->ai_addrlen); freeaddrinfo(res); } if (family == PF_INET) { ipfw_insn_sa *p = (ipfw_insn_sa *)action; action->opcode = O_FORWARD_IP; action->len = F_INSN_SIZE(ipfw_insn_sa); CHECK_ACTLEN; /* * In the kernel we assume AF_INET and use only * sin_port and sin_addr. Remember to set sin_len as * the routing code seems to use it too. */ p->sa.sin_len = sizeof(struct sockaddr_in); p->sa.sin_family = AF_INET; p->sa.sin_port = port_number; p->sa.sin_addr.s_addr = ((struct sockaddr_in *)&result)->sin_addr.s_addr; } else if (family == PF_INET6) { ipfw_insn_sa6 *p = (ipfw_insn_sa6 *)action; action->opcode = O_FORWARD_IP6; action->len = F_INSN_SIZE(ipfw_insn_sa6); CHECK_ACTLEN; p->sa.sin6_len = sizeof(struct sockaddr_in6); p->sa.sin6_family = AF_INET6; p->sa.sin6_port = port_number; p->sa.sin6_flowinfo = 0; p->sa.sin6_scope_id = ((struct sockaddr_in6 *)&result)->sin6_scope_id; bcopy(&((struct sockaddr_in6*)&result)->sin6_addr, &p->sa.sin6_addr, sizeof(p->sa.sin6_addr)); } else { errx(EX_DATAERR, "Invalid address family in forward action"); } av++; break; } case TOK_COMMENT: /* pretend it is a 'count' rule followed by the comment */ action->opcode = O_COUNT; av--; /* go back... 
*/ break; case TOK_SETFIB: { int numfibs; size_t intsize = sizeof(int); action->opcode = O_SETFIB; NEED1("missing fib number"); if (_substrcmp(*av, "tablearg") == 0) { action->arg1 = IP_FW_TARG; } else { action->arg1 = strtoul(*av, NULL, 10); if (sysctlbyname("net.fibs", &numfibs, &intsize, NULL, 0) == -1) errx(EX_DATAERR, "fibs not suported.\n"); if (action->arg1 >= numfibs) /* Temporary */ errx(EX_DATAERR, "fib too large.\n"); /* Add high-order bit to fib to make room for tablearg*/ action->arg1 |= 0x8000; } av++; break; } case TOK_SETDSCP: { int code; action->opcode = O_SETDSCP; NEED1("missing DSCP code"); if (_substrcmp(*av, "tablearg") == 0) { action->arg1 = IP_FW_TARG; } else { if (isalpha(*av[0])) { if ((code = match_token(f_ipdscp, *av)) == -1) errx(EX_DATAERR, "Unknown DSCP code"); action->arg1 = code; } else action->arg1 = strtoul(*av, NULL, 10); /* * Add high-order bit to DSCP to make room * for tablearg */ action->arg1 |= 0x8000; } av++; break; } case TOK_REASS: action->opcode = O_REASS; break; case TOK_RETURN: fill_cmd(action, O_CALLRETURN, F_NOT, 0); break; case TOK_TCPSETMSS: { u_long mss; uint16_t idx; idx = pack_object(tstate, "tcp-setmss", IPFW_TLV_EACTION); if (idx == 0) errx(EX_DATAERR, "pack_object failed"); fill_cmd(action, O_EXTERNAL_ACTION, 0, idx); NEED1("Missing MSS value"); action = next_cmd(action, &ablen); action->len = 1; CHECK_ACTLEN; mss = strtoul(*av, NULL, 10); if (mss == 0 || mss > UINT16_MAX) errx(EX_USAGE, "invalid MSS value %s", *av); fill_cmd(action, O_EXTERNAL_DATA, 0, (uint16_t)mss); av++; break; } default: av--; if (match_token(rule_eactions, *av) == -1) errx(EX_DATAERR, "invalid action %s\n", *av); /* * External actions support. * XXX: we support only syntax with instance name. * For known external actions (from rule_eactions list) * we can handle syntax directly. But with `eaction' * keyword we can use only `eaction ' * syntax. */ case TOK_EACTION: { uint16_t idx; NEED1("Missing eaction name"); if (eaction_check_name(*av) != 0) errx(EX_DATAERR, "Invalid eaction name %s", *av); idx = pack_object(tstate, *av, IPFW_TLV_EACTION); if (idx == 0) errx(EX_DATAERR, "pack_object failed"); fill_cmd(action, O_EXTERNAL_ACTION, 0, idx); av++; NEED1("Missing eaction instance name"); action = next_cmd(action, &ablen); action->len = 1; CHECK_ACTLEN; if (eaction_check_name(*av) != 0) errx(EX_DATAERR, "Invalid eaction instance name %s", *av); /* * External action instance object has TLV type depended * from the external action name object index. Since we * currently don't know this index, use zero as TLV type. */ idx = pack_object(tstate, *av, 0); if (idx == 0) errx(EX_DATAERR, "pack_object failed"); fill_cmd(action, O_EXTERNAL_INSTANCE, 0, idx); av++; } } action = next_cmd(action, &ablen); /* * [altq queuename] -- altq tag, optional * [log [logamount N]] -- log, optional * * If they exist, it go first in the cmdbuf, but then it is * skipped in the copy section to the end of the buffer. 
*/ while (av[0] != NULL && (i = match_token(rule_action_params, *av)) != -1) { av++; switch (i) { case TOK_LOG: { ipfw_insn_log *c = (ipfw_insn_log *)cmd; int l; if (have_log) errx(EX_DATAERR, "log cannot be specified more than once"); have_log = (ipfw_insn *)c; cmd->len = F_INSN_SIZE(ipfw_insn_log); CHECK_CMDLEN; cmd->opcode = O_LOG; if (av[0] && _substrcmp(*av, "logamount") == 0) { av++; NEED1("logamount requires argument"); l = atoi(*av); if (l < 0) errx(EX_DATAERR, "logamount must be positive"); c->max_log = l; av++; } else { len = sizeof(c->max_log); if (sysctlbyname("net.inet.ip.fw.verbose_limit", &c->max_log, &len, NULL, 0) == -1) { if (co.test_only) { c->max_log = 0; break; } errx(1, "sysctlbyname(\"%s\")", "net.inet.ip.fw.verbose_limit"); } } } break; #ifndef NO_ALTQ case TOK_ALTQ: { ipfw_insn_altq *a = (ipfw_insn_altq *)cmd; NEED1("missing altq queue name"); if (have_altq) errx(EX_DATAERR, "altq cannot be specified more than once"); have_altq = (ipfw_insn *)a; cmd->len = F_INSN_SIZE(ipfw_insn_altq); CHECK_CMDLEN; cmd->opcode = O_ALTQ; a->qid = altq_name_to_qid(*av); av++; } break; #endif case TOK_TAG: case TOK_UNTAG: { uint16_t tag; if (have_tag) errx(EX_USAGE, "tag and untag cannot be " "specified more than once"); GET_UINT_ARG(tag, IPFW_ARG_MIN, IPFW_ARG_MAX, i, rule_action_params); have_tag = cmd; fill_cmd(cmd, O_TAG, (i == TOK_TAG) ? 0: F_NOT, tag); av++; break; } default: abort(); } cmd = next_cmd(cmd, &cblen); } if (have_state) { /* must be a check-state, we are done */ if (*av != NULL && match_token(rule_options, *av) == TOK_COMMENT) { /* check-state has a comment */ av++; fill_comment(cmd, av, cblen); cmd = next_cmd(cmd, &cblen); av[0] = NULL; } goto done; } #define OR_START(target) \ if (av[0] && (*av[0] == '(' || *av[0] == '{')) { \ if (open_par) \ errx(EX_USAGE, "nested \"(\" not allowed\n"); \ prev = NULL; \ open_par = 1; \ if ( (av[0])[1] == '\0') { \ av++; \ } else \ (*av)++; \ } \ target: \ #define CLOSE_PAR \ if (open_par) { \ if (av[0] && ( \ strcmp(*av, ")") == 0 || \ strcmp(*av, "}") == 0)) { \ prev = NULL; \ open_par = 0; \ av++; \ } else \ errx(EX_USAGE, "missing \")\"\n"); \ } #define NOT_BLOCK \ if (av[0] && _substrcmp(*av, "not") == 0) { \ if (cmd->len & F_NOT) \ errx(EX_USAGE, "double \"not\" not allowed\n"); \ cmd->len |= F_NOT; \ av++; \ } #define OR_BLOCK(target) \ if (av[0] && _substrcmp(*av, "or") == 0) { \ if (prev == NULL || open_par == 0) \ errx(EX_DATAERR, "invalid OR block"); \ prev->len |= F_OR; \ av++; \ goto target; \ } \ CLOSE_PAR; first_cmd = cmd; #if 0 /* * MAC addresses, optional. * If we have this, we skip the part "proto from src to dst" * and jump straight to the option parsing. 
*/ NOT_BLOCK; NEED1("missing protocol"); if (_substrcmp(*av, "MAC") == 0 || _substrcmp(*av, "mac") == 0) { av++; /* the "MAC" keyword */ add_mac(cmd, av); /* exits in case of errors */ cmd = next_cmd(cmd); av += 2; /* dst-mac and src-mac */ NOT_BLOCK; NEED1("missing mac type"); if (add_mactype(cmd, av[0])) cmd = next_cmd(cmd); av++; /* any or mac-type */ goto read_options; } #endif /* * protocol, mandatory */ OR_START(get_proto); NOT_BLOCK; NEED1("missing protocol"); if (add_proto_compat(cmd, *av, &proto)) { av++; if (F_LEN(cmd) != 0) { prev = cmd; cmd = next_cmd(cmd, &cblen); } } else if (first_cmd != cmd) { errx(EX_DATAERR, "invalid protocol ``%s''", *av); } else { rule->flags |= IPFW_RULE_JUSTOPTS; goto read_options; } OR_BLOCK(get_proto); /* * "from", mandatory */ if ((av[0] == NULL) || _substrcmp(*av, "from") != 0) errx(EX_USAGE, "missing ``from''"); av++; /* * source IP, mandatory */ OR_START(source_ip); NOT_BLOCK; /* optional "not" */ NEED1("missing source address"); if (add_src(cmd, *av, proto, cblen, tstate)) { av++; if (F_LEN(cmd) != 0) { /* ! any */ prev = cmd; cmd = next_cmd(cmd, &cblen); } } else errx(EX_USAGE, "bad source address %s", *av); OR_BLOCK(source_ip); /* * source ports, optional */ NOT_BLOCK; /* optional "not" */ if ( av[0] != NULL ) { if (_substrcmp(*av, "any") == 0 || add_ports(cmd, *av, proto, O_IP_SRCPORT, cblen)) { av++; if (F_LEN(cmd) != 0) cmd = next_cmd(cmd, &cblen); } } /* * "to", mandatory */ if ( (av[0] == NULL) || _substrcmp(*av, "to") != 0 ) errx(EX_USAGE, "missing ``to''"); av++; /* * destination, mandatory */ OR_START(dest_ip); NOT_BLOCK; /* optional "not" */ NEED1("missing dst address"); if (add_dst(cmd, *av, proto, cblen, tstate)) { av++; if (F_LEN(cmd) != 0) { /* ! any */ prev = cmd; cmd = next_cmd(cmd, &cblen); } } else errx( EX_USAGE, "bad destination address %s", *av); OR_BLOCK(dest_ip); /* * dest. 
ports, optional */ NOT_BLOCK; /* optional "not" */ if (av[0]) { if (_substrcmp(*av, "any") == 0 || add_ports(cmd, *av, proto, O_IP_DSTPORT, cblen)) { av++; if (F_LEN(cmd) != 0) cmd = next_cmd(cmd, &cblen); } } read_options: prev = NULL; while ( av[0] != NULL ) { char *s; ipfw_insn_u32 *cmd32; /* alias for cmd */ s = *av; cmd32 = (ipfw_insn_u32 *)cmd; if (*s == '!') { /* alternate syntax for NOT */ if (cmd->len & F_NOT) errx(EX_USAGE, "double \"not\" not allowed\n"); cmd->len = F_NOT; s++; } i = match_token(rule_options, s); av++; switch(i) { case TOK_NOT: if (cmd->len & F_NOT) errx(EX_USAGE, "double \"not\" not allowed\n"); cmd->len = F_NOT; break; case TOK_OR: if (open_par == 0 || prev == NULL) errx(EX_USAGE, "invalid \"or\" block\n"); prev->len |= F_OR; break; case TOK_STARTBRACE: if (open_par) errx(EX_USAGE, "+nested \"(\" not allowed\n"); open_par = 1; break; case TOK_ENDBRACE: if (!open_par) errx(EX_USAGE, "+missing \")\"\n"); open_par = 0; prev = NULL; break; case TOK_IN: fill_cmd(cmd, O_IN, 0, 0); break; case TOK_OUT: cmd->len ^= F_NOT; /* toggle F_NOT */ fill_cmd(cmd, O_IN, 0, 0); break; case TOK_DIVERTED: fill_cmd(cmd, O_DIVERTED, 0, 3); break; case TOK_DIVERTEDLOOPBACK: fill_cmd(cmd, O_DIVERTED, 0, 1); break; case TOK_DIVERTEDOUTPUT: fill_cmd(cmd, O_DIVERTED, 0, 2); break; case TOK_FRAG: fill_cmd(cmd, O_FRAG, 0, 0); break; case TOK_LAYER2: fill_cmd(cmd, O_LAYER2, 0, 0); break; case TOK_XMIT: case TOK_RECV: case TOK_VIA: NEED1("recv, xmit, via require interface name" " or address"); fill_iface((ipfw_insn_if *)cmd, av[0], cblen, tstate); av++; if (F_LEN(cmd) == 0) /* not a valid address */ break; if (i == TOK_XMIT) cmd->opcode = O_XMIT; else if (i == TOK_RECV) cmd->opcode = O_RECV; else if (i == TOK_VIA) cmd->opcode = O_VIA; break; case TOK_ICMPTYPES: NEED1("icmptypes requires list of types"); fill_icmptypes((ipfw_insn_u32 *)cmd, *av); av++; break; case TOK_ICMP6TYPES: NEED1("icmptypes requires list of types"); fill_icmp6types((ipfw_insn_icmp6 *)cmd, *av, cblen); av++; break; case TOK_IPTTL: NEED1("ipttl requires TTL"); if (strpbrk(*av, "-,")) { if (!add_ports(cmd, *av, 0, O_IPTTL, cblen)) errx(EX_DATAERR, "invalid ipttl %s", *av); } else fill_cmd(cmd, O_IPTTL, 0, strtoul(*av, NULL, 0)); av++; break; case TOK_IPID: NEED1("ipid requires id"); if (strpbrk(*av, "-,")) { if (!add_ports(cmd, *av, 0, O_IPID, cblen)) errx(EX_DATAERR, "invalid ipid %s", *av); } else fill_cmd(cmd, O_IPID, 0, strtoul(*av, NULL, 0)); av++; break; case TOK_IPLEN: NEED1("iplen requires length"); if (strpbrk(*av, "-,")) { if (!add_ports(cmd, *av, 0, O_IPLEN, cblen)) errx(EX_DATAERR, "invalid ip len %s", *av); } else fill_cmd(cmd, O_IPLEN, 0, strtoul(*av, NULL, 0)); av++; break; case TOK_IPVER: NEED1("ipver requires version"); fill_cmd(cmd, O_IPVER, 0, strtoul(*av, NULL, 0)); av++; break; case TOK_IPPRECEDENCE: NEED1("ipprecedence requires value"); fill_cmd(cmd, O_IPPRECEDENCE, 0, (strtoul(*av, NULL, 0) & 7) << 5); av++; break; case TOK_DSCP: NEED1("missing DSCP code"); fill_dscp(cmd, *av, cblen); av++; break; case TOK_IPOPTS: NEED1("missing argument for ipoptions"); fill_flags_cmd(cmd, O_IPOPT, f_ipopts, *av); av++; break; case TOK_IPTOS: NEED1("missing argument for iptos"); fill_flags_cmd(cmd, O_IPTOS, f_iptos, *av); av++; break; case TOK_UID: NEED1("uid requires argument"); { char *end; uid_t uid; struct passwd *pwd; cmd->opcode = O_UID; uid = strtoul(*av, &end, 0); pwd = (*end == '\0') ? 
getpwuid(uid) : getpwnam(*av); if (pwd == NULL) errx(EX_DATAERR, "uid \"%s\" nonexistent", *av); cmd32->d[0] = pwd->pw_uid; cmd->len |= F_INSN_SIZE(ipfw_insn_u32); av++; } break; case TOK_GID: NEED1("gid requires argument"); { char *end; gid_t gid; struct group *grp; cmd->opcode = O_GID; gid = strtoul(*av, &end, 0); grp = (*end == '\0') ? getgrgid(gid) : getgrnam(*av); if (grp == NULL) errx(EX_DATAERR, "gid \"%s\" nonexistent", *av); cmd32->d[0] = grp->gr_gid; cmd->len |= F_INSN_SIZE(ipfw_insn_u32); av++; } break; case TOK_JAIL: NEED1("jail requires argument"); { int jid; cmd->opcode = O_JAIL; jid = jail_getid(*av); if (jid < 0) errx(EX_DATAERR, "%s", jail_errmsg); cmd32->d[0] = (uint32_t)jid; cmd->len |= F_INSN_SIZE(ipfw_insn_u32); av++; } break; case TOK_ESTAB: fill_cmd(cmd, O_ESTAB, 0, 0); break; case TOK_SETUP: fill_cmd(cmd, O_TCPFLAGS, 0, (TH_SYN) | ( (TH_ACK) & 0xff) <<8 ); break; case TOK_TCPDATALEN: NEED1("tcpdatalen requires length"); if (strpbrk(*av, "-,")) { if (!add_ports(cmd, *av, 0, O_TCPDATALEN, cblen)) errx(EX_DATAERR, "invalid tcpdata len %s", *av); } else fill_cmd(cmd, O_TCPDATALEN, 0, strtoul(*av, NULL, 0)); av++; break; case TOK_TCPOPTS: NEED1("missing argument for tcpoptions"); fill_flags_cmd(cmd, O_TCPOPTS, f_tcpopts, *av); av++; break; case TOK_TCPSEQ: case TOK_TCPACK: NEED1("tcpseq/tcpack requires argument"); cmd->len = F_INSN_SIZE(ipfw_insn_u32); cmd->opcode = (i == TOK_TCPSEQ) ? O_TCPSEQ : O_TCPACK; cmd32->d[0] = htonl(strtoul(*av, NULL, 0)); av++; break; case TOK_TCPWIN: NEED1("tcpwin requires length"); if (strpbrk(*av, "-,")) { if (!add_ports(cmd, *av, 0, O_TCPWIN, cblen)) errx(EX_DATAERR, "invalid tcpwin len %s", *av); } else fill_cmd(cmd, O_TCPWIN, 0, strtoul(*av, NULL, 0)); av++; break; case TOK_TCPFLAGS: NEED1("missing argument for tcpflags"); cmd->opcode = O_TCPFLAGS; fill_flags_cmd(cmd, O_TCPFLAGS, f_tcpflags, *av); av++; break; case TOK_KEEPSTATE: case TOK_RECORDSTATE: { uint16_t uidx; if (open_par) errx(EX_USAGE, "keep-state or record-state cannot be part " "of an or block"); if (have_state) errx(EX_USAGE, "only one of keep-state, record-state, " " limit and set-limit is allowed"); if (*av != NULL && *av[0] == ':') { if (state_check_name(*av + 1) != 0) errx(EX_DATAERR, "Invalid state name %s", *av); uidx = pack_object(tstate, *av + 1, IPFW_TLV_STATE_NAME); av++; } else uidx = pack_object(tstate, default_state_name, IPFW_TLV_STATE_NAME); have_state = cmd; have_rstate = i == TOK_RECORDSTATE; fill_cmd(cmd, O_KEEP_STATE, 0, uidx); break; } case TOK_LIMIT: case TOK_SETLIMIT: { ipfw_insn_limit *c = (ipfw_insn_limit *)cmd; int val; if (open_par) errx(EX_USAGE, "limit or set-limit cannot be part of an or block"); if (have_state) errx(EX_USAGE, "only one of keep-state, record-state, " " limit and set-limit is allowed"); have_state = cmd; have_rstate = i == TOK_SETLIMIT; cmd->len = F_INSN_SIZE(ipfw_insn_limit); CHECK_CMDLEN; cmd->opcode = O_LIMIT; c->limit_mask = c->conn_limit = 0; while ( av[0] != NULL ) { if ((val = match_token(limit_masks, *av)) <= 0) break; c->limit_mask |= val; av++; } if (c->limit_mask == 0) errx(EX_USAGE, "limit: missing limit mask"); GET_UINT_ARG(c->conn_limit, IPFW_ARG_MIN, IPFW_ARG_MAX, TOK_LIMIT, rule_options); av++; if (*av != NULL && *av[0] == ':') { if (state_check_name(*av + 1) != 0) errx(EX_DATAERR, "Invalid state name %s", *av); cmd->arg1 = pack_object(tstate, *av + 1, IPFW_TLV_STATE_NAME); av++; } else cmd->arg1 = pack_object(tstate, default_state_name, IPFW_TLV_STATE_NAME); break; } case TOK_PROTO: NEED1("missing protocol"); if 
(add_proto(cmd, *av, &proto)) { av++; } else errx(EX_DATAERR, "invalid protocol ``%s''", *av); break; case TOK_SRCIP: NEED1("missing source IP"); if (add_srcip(cmd, *av, cblen, tstate)) { av++; } break; case TOK_DSTIP: NEED1("missing destination IP"); if (add_dstip(cmd, *av, cblen, tstate)) { av++; } break; case TOK_SRCIP6: NEED1("missing source IP6"); if (add_srcip6(cmd, *av, cblen, tstate)) { av++; } break; case TOK_DSTIP6: NEED1("missing destination IP6"); if (add_dstip6(cmd, *av, cblen, tstate)) { av++; } break; case TOK_SRCPORT: NEED1("missing source port"); if (_substrcmp(*av, "any") == 0 || add_ports(cmd, *av, proto, O_IP_SRCPORT, cblen)) { av++; } else errx(EX_DATAERR, "invalid source port %s", *av); break; case TOK_DSTPORT: NEED1("missing destination port"); if (_substrcmp(*av, "any") == 0 || add_ports(cmd, *av, proto, O_IP_DSTPORT, cblen)) { av++; } else errx(EX_DATAERR, "invalid destination port %s", *av); break; case TOK_MAC: if (add_mac(cmd, av, cblen)) av += 2; break; case TOK_MACTYPE: NEED1("missing mac type"); if (!add_mactype(cmd, *av, cblen)) errx(EX_DATAERR, "invalid mac type %s", *av); av++; break; case TOK_VERREVPATH: fill_cmd(cmd, O_VERREVPATH, 0, 0); break; case TOK_VERSRCREACH: fill_cmd(cmd, O_VERSRCREACH, 0, 0); break; case TOK_ANTISPOOF: fill_cmd(cmd, O_ANTISPOOF, 0, 0); break; case TOK_IPSEC: fill_cmd(cmd, O_IPSEC, 0, 0); break; case TOK_IPV6: fill_cmd(cmd, O_IP6, 0, 0); break; case TOK_IPV4: fill_cmd(cmd, O_IP4, 0, 0); break; case TOK_EXT6HDR: fill_ext6hdr( cmd, *av ); av++; break; case TOK_FLOWID: if (proto != IPPROTO_IPV6 ) errx( EX_USAGE, "flow-id filter is active " "only for ipv6 protocol\n"); fill_flow6( (ipfw_insn_u32 *) cmd, *av, cblen); av++; break; case TOK_COMMENT: fill_comment(cmd, av, cblen); av[0]=NULL; break; case TOK_TAGGED: if (av[0] && strpbrk(*av, "-,")) { if (!add_ports(cmd, *av, 0, O_TAGGED, cblen)) errx(EX_DATAERR, "tagged: invalid tag" " list: %s", *av); } else { uint16_t tag; GET_UINT_ARG(tag, IPFW_ARG_MIN, IPFW_ARG_MAX, TOK_TAGGED, rule_options); fill_cmd(cmd, O_TAGGED, 0, tag); } av++; break; case TOK_FIB: NEED1("fib requires fib number"); fill_cmd(cmd, O_FIB, 0, strtoul(*av, NULL, 0)); av++; break; case TOK_SOCKARG: fill_cmd(cmd, O_SOCKARG, 0, 0); break; case TOK_LOOKUP: { ipfw_insn_u32 *c = (ipfw_insn_u32 *)cmd; int j; if (!av[0] || !av[1]) errx(EX_USAGE, "format: lookup argument tablenum"); cmd->opcode = O_IP_DST_LOOKUP; cmd->len |= F_INSN_SIZE(ipfw_insn) + 2; i = match_token(rule_options, *av); for (j = 0; lookup_key[j] >= 0 ; j++) { if (i == lookup_key[j]) break; } if (lookup_key[j] <= 0) errx(EX_USAGE, "format: cannot lookup on %s", *av); __PAST_END(c->d, 1) = j; // i converted to option av++; if ((j = pack_table(tstate, *av)) == 0) errx(EX_DATAERR, "Invalid table name: %s", *av); cmd->arg1 = j; av++; } break; case TOK_FLOW: NEED1("missing table name"); if (strncmp(*av, "table(", 6) != 0) errx(EX_DATAERR, "enclose table name into \"table()\""); fill_table(cmd, *av, O_IP_FLOW_LOOKUP, tstate); av++; break; case TOK_SKIPACTION: if (have_skipcmd) errx(EX_USAGE, "only one defer-action " "is allowed"); have_skipcmd = cmd; fill_cmd(cmd, O_SKIP_ACTION, 0, 0); break; default: errx(EX_USAGE, "unrecognised option [%d] %s\n", i, s); } if (F_LEN(cmd) > 0) { /* prepare to advance */ prev = cmd; cmd = next_cmd(cmd, &cblen); } } done: if (!have_state && have_skipcmd) warnx("Rule contains \"defer-immediate-action\" " "and doesn't contain any state-related options."); /* * Now copy stuff into the rule. 
* If we have a keep-state option, the first instruction * must be a PROBE_STATE (which is generated here). * If we have a LOG option, it was stored as the first command, * and now must be moved to the top of the action part. */ dst = (ipfw_insn *)rule->cmd; /* * First thing to write into the command stream is the match probability. */ if (match_prob != 1) { /* 1 means always match */ dst->opcode = O_PROB; dst->len = 2; *((int32_t *)(dst+1)) = (int32_t)(match_prob * 0x7fffffff); dst += dst->len; } /* * generate O_PROBE_STATE if necessary */ if (have_state && have_state->opcode != O_CHECK_STATE && !have_rstate) { fill_cmd(dst, O_PROBE_STATE, 0, have_state->arg1); dst = next_cmd(dst, &rblen); } /* * copy all commands but O_LOG, O_KEEP_STATE, O_LIMIT, O_ALTQ, O_TAG, * O_SKIP_ACTION */ for (src = (ipfw_insn *)cmdbuf; src != cmd; src += i) { i = F_LEN(src); CHECK_RBUFLEN(i); switch (src->opcode) { case O_LOG: case O_KEEP_STATE: case O_LIMIT: case O_ALTQ: case O_TAG: case O_SKIP_ACTION: break; default: bcopy(src, dst, i * sizeof(uint32_t)); dst += i; } } /* * put back the have_state command as last opcode */ if (have_state && have_state->opcode != O_CHECK_STATE) { i = F_LEN(have_state); CHECK_RBUFLEN(i); bcopy(have_state, dst, i * sizeof(uint32_t)); dst += i; } /* * put back the have_skipcmd command as very last opcode */ if (have_skipcmd) { i = F_LEN(have_skipcmd); CHECK_RBUFLEN(i); bcopy(have_skipcmd, dst, i * sizeof(uint32_t)); dst += i; } /* * start action section */ rule->act_ofs = dst - rule->cmd; /* put back O_LOG, O_ALTQ, O_TAG if necessary */ if (have_log) { i = F_LEN(have_log); CHECK_RBUFLEN(i); bcopy(have_log, dst, i * sizeof(uint32_t)); dst += i; } if (have_altq) { i = F_LEN(have_altq); CHECK_RBUFLEN(i); bcopy(have_altq, dst, i * sizeof(uint32_t)); dst += i; } if (have_tag) { i = F_LEN(have_tag); CHECK_RBUFLEN(i); bcopy(have_tag, dst, i * sizeof(uint32_t)); dst += i; } /* * copy all other actions */ for (src = (ipfw_insn *)actbuf; src != action; src += i) { i = F_LEN(src); CHECK_RBUFLEN(i); bcopy(src, dst, i * sizeof(uint32_t)); dst += i; } rule->cmd_len = (uint32_t *)dst - (uint32_t *)(rule->cmd); *rbufsize = (char *)dst - (char *)rule; } static int compare_ntlv(const void *_a, const void *_b) { ipfw_obj_ntlv *a, *b; a = (ipfw_obj_ntlv *)_a; b = (ipfw_obj_ntlv *)_b; if (a->set < b->set) return (-1); else if (a->set > b->set) return (1); if (a->idx < b->idx) return (-1); else if (a->idx > b->idx) return (1); if (a->head.type < b->head.type) return (-1); else if (a->head.type > b->head.type) return (1); return (0); } /* * Provide kernel with sorted list of referenced objects */ static void object_sort_ctlv(ipfw_obj_ctlv *ctlv) { qsort(ctlv + 1, ctlv->count, ctlv->objsize, compare_ntlv); } struct object_kt { uint16_t uidx; uint16_t type; }; static int compare_object_kntlv(const void *k, const void *v) { ipfw_obj_ntlv *ntlv; struct object_kt key; key = *((struct object_kt *)k); ntlv = (ipfw_obj_ntlv *)v; if (key.uidx < ntlv->idx) return (-1); else if (key.uidx > ntlv->idx) return (1); if (key.type < ntlv->head.type) return (-1); else if (key.type > ntlv->head.type) return (1); return (0); } /* * Finds object name in @ctlv by @idx and @type. * Uses the following facts: * 1) All TLVs are the same size * 2) Kernel implementation provides already sorted list. * * Returns table name or NULL. 
*/ static char * object_search_ctlv(ipfw_obj_ctlv *ctlv, uint16_t idx, uint16_t type) { ipfw_obj_ntlv *ntlv; struct object_kt key; key.uidx = idx; key.type = type; ntlv = bsearch(&key, (ctlv + 1), ctlv->count, ctlv->objsize, compare_object_kntlv); if (ntlv != NULL) return (ntlv->name); return (NULL); } static char * table_search_ctlv(ipfw_obj_ctlv *ctlv, uint16_t idx) { return (object_search_ctlv(ctlv, idx, IPFW_TLV_TBL_NAME)); } /* * Adds one or more rules to ipfw chain. * Data layout: * Request: * [ * ip_fw3_opheader * [ ipfw_obj_ctlv(IPFW_TLV_TBL_LIST) ipfw_obj_ntlv x N ] (optional *1) * [ ipfw_obj_ctlv(IPFW_TLV_RULE_LIST) [ ip_fw_rule ip_fw_insn ] x N ] (*2) (*3) * ] * Reply: * [ * ip_fw3_opheader * [ ipfw_obj_ctlv(IPFW_TLV_TBL_LIST) ipfw_obj_ntlv x N ] (optional) * [ ipfw_obj_ctlv(IPFW_TLV_RULE_LIST) [ ip_fw_rule ip_fw_insn ] x N ] * ] * * Rules in reply are modified to store their actual ruleset number. * * (*1) TLVs inside IPFW_TLV_TBL_LIST needs to be sorted ascending * according to their idx field and there has to be no duplicates. * (*2) Numbered rules inside IPFW_TLV_RULE_LIST needs to be sorted ascending. * (*3) Each ip_fw structure needs to be aligned to u64 boundary. */ void ipfw_add(char *av[]) { uint32_t rulebuf[1024]; int rbufsize, default_off, tlen, rlen; size_t sz; struct tidx ts; struct ip_fw_rule *rule; caddr_t tbuf; ip_fw3_opheader *op3; ipfw_obj_ctlv *ctlv, *tstate; rbufsize = sizeof(rulebuf); memset(rulebuf, 0, rbufsize); memset(&ts, 0, sizeof(ts)); /* Optimize case with no tables */ default_off = sizeof(ipfw_obj_ctlv) + sizeof(ip_fw3_opheader); op3 = (ip_fw3_opheader *)rulebuf; ctlv = (ipfw_obj_ctlv *)(op3 + 1); rule = (struct ip_fw_rule *)(ctlv + 1); rbufsize -= default_off; compile_rule(av, (uint32_t *)rule, &rbufsize, &ts); /* Align rule size to u64 boundary */ rlen = roundup2(rbufsize, sizeof(uint64_t)); tbuf = NULL; sz = 0; tstate = NULL; if (ts.count != 0) { /* Some tables. We have to alloc more data */ tlen = ts.count * sizeof(ipfw_obj_ntlv); sz = default_off + sizeof(ipfw_obj_ctlv) + tlen + rlen; if ((tbuf = calloc(1, sz)) == NULL) err(EX_UNAVAILABLE, "malloc() failed for IP_FW_ADD"); op3 = (ip_fw3_opheader *)tbuf; /* Tables first */ ctlv = (ipfw_obj_ctlv *)(op3 + 1); ctlv->head.type = IPFW_TLV_TBLNAME_LIST; ctlv->head.length = sizeof(ipfw_obj_ctlv) + tlen; ctlv->count = ts.count; ctlv->objsize = sizeof(ipfw_obj_ntlv); memcpy(ctlv + 1, ts.idx, tlen); object_sort_ctlv(ctlv); tstate = ctlv; /* Rule next */ ctlv = (ipfw_obj_ctlv *)((caddr_t)ctlv + ctlv->head.length); ctlv->head.type = IPFW_TLV_RULE_LIST; ctlv->head.length = sizeof(ipfw_obj_ctlv) + rlen; ctlv->count = 1; memcpy(ctlv + 1, rule, rbufsize); } else { /* Simply add header */ sz = rlen + default_off; memset(ctlv, 0, sizeof(*ctlv)); ctlv->head.type = IPFW_TLV_RULE_LIST; ctlv->head.length = sizeof(ipfw_obj_ctlv) + rlen; ctlv->count = 1; } if (do_get3(IP_FW_XADD, op3, &sz) != 0) err(EX_UNAVAILABLE, "getsockopt(%s)", "IP_FW_XADD"); if (!co.do_quiet) { struct format_opts sfo; struct buf_pr bp; memset(&sfo, 0, sizeof(sfo)); sfo.tstate = tstate; sfo.set_mask = (uint32_t)(-1); bp_alloc(&bp, 4096); show_static_rule(&co, &sfo, &bp, rule, NULL); printf("%s", bp.buf); bp_free(&bp); } if (tbuf != NULL) free(tbuf); if (ts.idx != NULL) free(ts.idx); } /* * clear the counters or the log counters. * optname has the following values: * 0 (zero both counters and logging) * 1 (zero logging only) */ void ipfw_zero(int ac, char *av[], int optname) { ipfw_range_tlv rt; char const *errstr; char const *name = optname ? 
"RESETLOG" : "ZERO"; uint32_t arg; int failed = EX_OK; optname = optname ? IP_FW_XRESETLOG : IP_FW_XZERO; av++; ac--; if (ac == 0) { /* clear all entries */ memset(&rt, 0, sizeof(rt)); rt.flags = IPFW_RCFLAG_ALL; if (do_range_cmd(optname, &rt) < 0) err(EX_UNAVAILABLE, "setsockopt(IP_FW_X%s)", name); if (!co.do_quiet) printf("%s.\n", optname == IP_FW_XZERO ? "Accounting cleared":"Logging counts reset"); return; } while (ac) { /* Rule number */ if (isdigit(**av)) { arg = strtonum(*av, 0, 0xffff, &errstr); if (errstr) errx(EX_DATAERR, "invalid rule number %s\n", *av); memset(&rt, 0, sizeof(rt)); rt.start_rule = arg; rt.end_rule = arg; rt.flags |= IPFW_RCFLAG_RANGE; if (co.use_set != 0) { rt.set = co.use_set - 1; rt.flags |= IPFW_RCFLAG_SET; } if (do_range_cmd(optname, &rt) != 0) { warn("rule %u: setsockopt(IP_FW_X%s)", arg, name); failed = EX_UNAVAILABLE; } else if (rt.new_set == 0) { printf("Entry %d not found\n", arg); failed = EX_UNAVAILABLE; } else if (!co.do_quiet) printf("Entry %d %s.\n", arg, optname == IP_FW_XZERO ? "cleared" : "logging count reset"); } else { errx(EX_USAGE, "invalid rule number ``%s''", *av); } av++; ac--; } if (failed != EX_OK) exit(failed); } void ipfw_flush(int force) { ipfw_range_tlv rt; if (!force && !co.do_quiet) { /* need to ask user */ int c; printf("Are you sure? [yn] "); fflush(stdout); do { c = toupper(getc(stdin)); while (c != '\n' && getc(stdin) != '\n') if (feof(stdin)) return; /* and do not flush */ } while (c != 'Y' && c != 'N'); printf("\n"); if (c == 'N') /* user said no */ return; } if (co.do_pipe) { dummynet_flush(); return; } /* `ipfw set N flush` - is the same that `ipfw delete set N` */ memset(&rt, 0, sizeof(rt)); if (co.use_set != 0) { rt.set = co.use_set - 1; rt.flags = IPFW_RCFLAG_SET; } else rt.flags = IPFW_RCFLAG_ALL; if (do_range_cmd(IP_FW_XDEL, &rt) != 0) err(EX_UNAVAILABLE, "setsockopt(IP_FW_XDEL)"); if (!co.do_quiet) printf("Flushed all %s.\n", co.do_pipe ? 
"pipes" : "rules"); } static struct _s_x intcmds[] = { { "talist", TOK_TALIST }, { "iflist", TOK_IFLIST }, { "olist", TOK_OLIST }, { "vlist", TOK_VLIST }, { NULL, 0 } }; static struct _s_x otypes[] = { { "EACTION", IPFW_TLV_EACTION }, { "DYNSTATE", IPFW_TLV_STATE_NAME }, { NULL, 0 } }; static const char* lookup_eaction_name(ipfw_obj_ntlv *ntlv, int cnt, uint16_t type) { const char *name; int i; name = NULL; for (i = 0; i < cnt; i++) { if (ntlv[i].head.type != IPFW_TLV_EACTION) continue; if (IPFW_TLV_EACTION_NAME(ntlv[i].idx) != type) continue; name = ntlv[i].name; break; } return (name); } static void ipfw_list_objects(int ac, char *av[]) { ipfw_obj_lheader req, *olh; ipfw_obj_ntlv *ntlv; const char *name; size_t sz; int i; memset(&req, 0, sizeof(req)); sz = sizeof(req); if (do_get3(IP_FW_DUMP_SRVOBJECTS, &req.opheader, &sz) != 0) if (errno != ENOMEM) return; sz = req.size; if ((olh = calloc(1, sz)) == NULL) return; olh->size = sz; if (do_get3(IP_FW_DUMP_SRVOBJECTS, &olh->opheader, &sz) != 0) { free(olh); return; } if (olh->count > 0) printf("Objects list:\n"); else printf("There are no objects\n"); ntlv = (ipfw_obj_ntlv *)(olh + 1); for (i = 0; i < olh->count; i++) { name = match_value(otypes, ntlv->head.type); if (name == NULL) name = lookup_eaction_name( (ipfw_obj_ntlv *)(olh + 1), olh->count, ntlv->head.type); if (name == NULL) printf(" kidx: %4d\ttype: %10d\tname: %s\n", ntlv->idx, ntlv->head.type, ntlv->name); else printf(" kidx: %4d\ttype: %10s\tname: %s\n", ntlv->idx, name, ntlv->name); ntlv++; } free(olh); } void ipfw_internal_handler(int ac, char *av[]) { int tcmd; ac--; av++; NEED1("internal cmd required"); if ((tcmd = match_token(intcmds, *av)) == -1) errx(EX_USAGE, "invalid internal sub-cmd: %s", *av); switch (tcmd) { case TOK_IFLIST: ipfw_list_tifaces(); break; case TOK_TALIST: ipfw_list_ta(ac, av); break; case TOK_OLIST: ipfw_list_objects(ac, av); break; case TOK_VLIST: ipfw_list_values(ac, av); break; } } static int ipfw_get_tracked_ifaces(ipfw_obj_lheader **polh) { ipfw_obj_lheader req, *olh; size_t sz; memset(&req, 0, sizeof(req)); sz = sizeof(req); if (do_get3(IP_FW_XIFLIST, &req.opheader, &sz) != 0) { if (errno != ENOMEM) return (errno); } sz = req.size; if ((olh = calloc(1, sz)) == NULL) return (ENOMEM); olh->size = sz; if (do_get3(IP_FW_XIFLIST, &olh->opheader, &sz) != 0) { free(olh); return (errno); } *polh = olh; return (0); } static int ifinfo_cmp(const void *a, const void *b) { ipfw_iface_info *ia, *ib; ia = (ipfw_iface_info *)a; ib = (ipfw_iface_info *)b; return (stringnum_cmp(ia->ifname, ib->ifname)); } /* * Retrieves table list from kernel, * optionally sorts it and calls requested function for each table. * Returns 0 on success. 
*/ static void ipfw_list_tifaces() { ipfw_obj_lheader *olh; ipfw_iface_info *info; int i, error; if ((error = ipfw_get_tracked_ifaces(&olh)) != 0) err(EX_OSERR, "Unable to request ipfw tracked interface list"); qsort(olh + 1, olh->count, olh->objsize, ifinfo_cmp); info = (ipfw_iface_info *)(olh + 1); for (i = 0; i < olh->count; i++) { if (info->flags & IPFW_IFFLAG_RESOLVED) printf("%s ifindex: %d refcount: %u changes: %u\n", info->ifname, info->ifindex, info->refcnt, info->gencnt); else printf("%s ifindex: unresolved refcount: %u changes: %u\n", info->ifname, info->refcnt, info->gencnt); info = (ipfw_iface_info *)((caddr_t)info + olh->objsize); } free(olh); } Index: user/ngie/bug-237403/share/man/man4/Makefile =================================================================== --- user/ngie/bug-237403/share/man/man4/Makefile (revision 346925) +++ user/ngie/bug-237403/share/man/man4/Makefile (revision 346926) @@ -1,1009 +1,1012 @@ # @(#)Makefile 8.1 (Berkeley) 6/18/93 # $FreeBSD$ .include PACKAGE=runtime-manuals MAN= aac.4 \ aacraid.4 \ acpi.4 \ ${_acpi_asus.4} \ ${_acpi_asus_wmi.4} \ ${_acpi_dock.4} \ ${_acpi_fujitsu.4} \ ${_acpi_hp.4} \ ${_acpi_ibm.4} \ ${_acpi_panasonic.4} \ ${_acpi_rapidstart.4} \ ${_acpi_sony.4} \ acpi_thermal.4 \ ${_acpi_toshiba.4} \ acpi_video.4 \ ${_acpi_wmi.4} \ ada.4 \ adm6996fc.4 \ ae.4 \ ${_aesni.4} \ age.4 \ agp.4 \ ahc.4 \ ahci.4 \ ahd.4 \ ${_aibs.4} \ aio.4 \ alc.4 \ ale.4 \ alpm.4 \ altera_atse.4 \ altera_avgen.4 \ altera_jtag_uart.4 \ altera_sdcard.4 \ altq.4 \ amdpm.4 \ ${_amdsbwd.4} \ ${_amdsmb.4} \ ${_amdsmn.4} \ ${_amdtemp.4} \ ${_bxe.4} \ amr.4 \ an.4 \ ${_aout.4} \ ${_apic.4} \ arcmsr.4 \ ${_asmc.4} \ at45d.4 \ ata.4 \ ath.4 \ ath_ahb.4 \ ath_hal.4 \ ath_pci.4 \ atkbd.4 \ atkbdc.4 \ atp.4 \ ${_atf_test_case.4} \ ${_atrtc.4} \ ${_attimer.4} \ audit.4 \ auditpipe.4 \ aue.4 \ axe.4 \ axge.4 \ bce.4 \ bcma.4 \ bfe.4 \ bge.4 \ ${_bhyve.4} \ bhnd.4 \ bhnd_chipc.4 \ bhnd_pmu.4 \ bhndb.4 \ bhndb_pci.4 \ bktr.4 \ blackhole.4 \ bnxt.4 \ bpf.4 \ bridge.4 \ bt.4 \ bwi.4 \ bwn.4 \ ${_bytgpio.4} \ ${_chvgpio.4} \ capsicum.4 \ cardbus.4 \ carp.4 \ cas.4 \ cc_cdg.4 \ cc_chd.4 \ cc_cubic.4 \ cc_dctcp.4 \ cc_hd.4 \ cc_htcp.4 \ cc_newreno.4 \ cc_vegas.4 \ ${_ccd.4} \ ccr.4 \ cd.4 \ cdce.4 \ cfi.4 \ cfumass.4 \ ch.4 \ chromebook_platform.4 \ ciss.4 \ cloudabi.4 \ cmx.4 \ ${_coretemp.4} \ ${_cpuctl.4} \ cpufreq.4 \ crypto.4 \ ctl.4 \ cue.4 \ cxgb.4 \ cxgbe.4 \ cxgbev.4 \ cy.4 \ cyapa.4 \ da.4 \ dc.4 \ dcons.4 \ dcons_crom.4 \ ddb.4 \ de.4 \ devctl.4 \ disc.4 \ divert.4 \ ${_dpms.4} \ ds1307.4 \ ds3231.4 \ ${_dtrace_provs} \ dummynet.4 \ ed.4 \ edsc.4 \ ehci.4 \ em.4 \ ena.4 \ enc.4 \ epair.4 \ esp.4 \ est.4 \ et.4 \ etherswitch.4 \ eventtimers.4 \ exca.4 \ e6060sw.4 \ fd.4 \ fdc.4 \ fdt.4 \ fdt_pinctrl.4 \ fdtbus.4 \ ffclock.4 \ filemon.4 \ firewire.4 \ full.4 \ fwe.4 \ fwip.4 \ fwohci.4 \ fxp.4 \ gbde.4 \ gdb.4 \ gem.4 \ geom.4 \ geom_fox.4 \ geom_linux_lvm.4 \ geom_map.4 \ geom_uzip.4 \ gif.4 \ gpio.4 \ gpioiic.4 \ gpioled.4 \ gre.4 \ h_ertt.4 \ hifn.4 \ hme.4 \ hpet.4 \ ${_hpt27xx.4} \ ${_hptiop.4} \ ${_hptmv.4} \ ${_hptnr.4} \ ${_hptrr.4} \ ${_hv_kvp.4} \ ${_hv_netvsc.4} \ ${_hv_storvsc.4} \ ${_hv_utils.4} \ ${_hv_vmbus.4} \ ${_hv_vss.4} \ hwpmc.4 \ iavf.4 \ ichsmb.4 \ ${_ichwd.4} \ icmp.4 \ icmp6.4 \ ida.4 \ if_ipsec.4 \ iflib.4 \ ifmib.4 \ ig4.4 \ igmp.4 \ iic.4 \ iicbb.4 \ iicbus.4 \ iicsmb.4 \ iir.4 \ ${_imcsmb.4} \ inet.4 \ inet6.4 \ intpm.4 \ intro.4 \ ${_io.4} \ ${_ioat.4} \ ip.4 \ ip6.4 \ ipfirewall.4 \ ipheth.4 \ ${_ipmi.4} \ ips.4 \ ipsec.4 \ ipw.4 \ ipwfw.4 
\ isci.4 \ isl.4 \ ismt.4 \ isp.4 \ ispfw.4 \ iwi.4 \ iwifw.4 \ iwm.4 \ iwmfw.4 \ iwn.4 \ iwnfw.4 \ ixgbe.4 \ ixl.4 \ jedec_dimm.4 \ jme.4 \ kbdmux.4 \ keyboard.4 \ kld.4 \ ksyms.4 \ ksz8995ma.4 \ ktr.4 \ kue.4 \ lagg.4 \ le.4 \ led.4 \ lge.4 \ ${_linux.4} \ liquidio.4 \ lm75.4 \ lo.4 \ lp.4 \ lpbb.4 \ lpt.4 \ mac.4 \ mac_biba.4 \ mac_bsdextended.4 \ mac_ifoff.4 \ mac_lomac.4 \ mac_mls.4 \ mac_none.4 \ mac_ntpd.4 \ mac_partition.4 \ mac_portacl.4 \ mac_seeotheruids.4 \ mac_stub.4 \ mac_test.4 \ malo.4 \ md.4 \ mdio.4 \ me.4 \ mem.4 \ meteor.4 \ mfi.4 \ miibus.4 \ mk48txx.4 \ mld.4 \ mlx.4 \ mlx4en.4 \ mlx5en.4 \ mly.4 \ mmc.4 \ mmcsd.4 \ mn.4 \ mod_cc.4 \ mos.4 \ mouse.4 \ mpr.4 \ mps.4 \ mpt.4 \ mrsas.4 \ msk.4 \ mtio.4 \ multicast.4 \ muge.4 \ mvs.4 \ mwl.4 \ mwlfw.4 \ mx25l.4 \ mxge.4 \ my.4 \ nand.4 \ nandsim.4 \ ${_ndis.4} \ net80211.4 \ netdump.4 \ netfpga10g_nf10bmac.4 \ netgraph.4 \ netintro.4 \ netmap.4 \ ${_nfe.4} \ ${_nfsmb.4} \ ng_async.4 \ ngatmbase.4 \ ng_atmllc.4 \ ng_bpf.4 \ ng_bridge.4 \ ng_bt3c.4 \ ng_btsocket.4 \ ng_car.4 \ ng_ccatm.4 \ ng_checksum.4 \ ng_cisco.4 \ ng_deflate.4 \ ng_device.4 \ nge.4 \ ng_echo.4 \ ng_eiface.4 \ ng_etf.4 \ ng_ether.4 \ ng_ether_echo.4 \ ng_frame_relay.4 \ ng_gif.4 \ ng_gif_demux.4 \ ng_h4.4 \ ng_hci.4 \ ng_hole.4 \ ng_hub.4 \ ng_iface.4 \ ng_ipfw.4 \ ng_ip_input.4 \ ng_ksocket.4 \ ng_l2cap.4 \ ng_l2tp.4 \ ng_lmi.4 \ ng_mppc.4 \ ng_nat.4 \ ng_netflow.4 \ ng_one2many.4 \ ng_patch.4 \ ng_ppp.4 \ ng_pppoe.4 \ ng_pptpgre.4 \ ng_pred1.4 \ ng_rfc1490.4 \ ng_socket.4 \ ng_source.4 \ ng_split.4 \ ng_sppp.4 \ ng_sscfu.4 \ ng_sscop.4 \ ng_tag.4 \ ng_tcpmss.4 \ ng_tee.4 \ ng_tty.4 \ ng_ubt.4 \ ng_UI.4 \ ng_uni.4 \ ng_vjc.4 \ ng_vlan.4 \ nmdm.4 \ ${_ntb.4} \ ${_ntb_hw_intel.4} \ ${_ntb_hw_plx.4} \ ${_ntb_transport.4} \ ${_nda.4} \ ${_if_ntb.4} \ null.4 \ numa.4 \ ${_nvd.4} \ ${_nvme.4} \ ${_nvram.4} \ ${_nvram2env.4} \ oce.4 \ ocs_fc.4\ ohci.4 \ orm.4 \ ow.4 \ ow_temp.4 \ owc.4 \ ${_padlock.4} \ pass.4 \ pccard.4 \ pccbb.4 \ pcf.4 \ pci.4 \ pcib.4 \ pcic.4 \ pcm.4 \ pcn.4 \ ${_pf.4} \ ${_pflog.4} \ ${_pfsync.4} \ pim.4 \ pms.4 \ polling.4 \ ppbus.4 \ ppc.4 \ ppi.4 \ procdesc.4 \ proto.4 \ psm.4 \ pst.4 \ pt.4 \ pts.4 \ pty.4 \ puc.4 \ ${_qlxge.4} \ ${_qlxgb.4} \ ${_qlxgbe.4} \ ${_qlnxe.4} \ ral.4 \ random.4 \ rc.4 \ rctl.4 \ re.4 \ rgephy.4 \ rights.4 \ rl.4 \ rndtest.4 \ route.4 \ rp.4 \ rtwn.4 \ rtwnfw.4 \ rtwn_pci.4 \ rue.4 \ sa.4 \ safe.4 \ sbp.4 \ sbp_targ.4 \ scc.4 \ sched_4bsd.4 \ sched_ule.4 \ screen.4 \ scsi.4 \ sctp.4 \ sdhci.4 \ sem.4 \ send.4 \ ses.4 \ sf.4 \ ${_sfxge.4} \ sge.4 \ siba.4 \ siftr.4 \ siis.4 \ simplebus.4 \ sio.4 \ sis.4 \ sk.4 \ ${_smartpqi.4} \ smb.4 \ smbus.4 \ smp.4 \ smsc.4 \ sn.4 \ snd_ad1816.4 \ snd_als4000.4 \ snd_atiixp.4 \ snd_cmi.4 \ snd_cs4281.4 \ snd_csa.4 \ snd_ds1.4 \ snd_emu10k1.4 \ snd_emu10kx.4 \ snd_envy24.4 \ snd_envy24ht.4 \ snd_es137x.4 \ snd_ess.4 \ snd_fm801.4 \ snd_gusc.4 \ snd_hda.4 \ snd_hdspe.4 \ snd_ich.4 \ snd_maestro3.4 \ snd_maestro.4 \ snd_mss.4 \ snd_neomagic.4 \ snd_sbc.4 \ snd_solo.4 \ snd_spicds.4 \ snd_t4dwave.4 \ snd_uaudio.4 \ snd_via8233.4 \ snd_via82c686.4 \ snd_vibes.4 \ snp.4 \ spigen.4 \ ${_spkr.4} \ splash.4 \ sppp.4 \ ste.4 \ stf.4 \ stge.4 \ sym.4 \ syncache.4 \ syncer.4 \ syscons.4 \ sysmouse.4 \ tap.4 \ targ.4 \ tcp.4 \ tdfx.4 \ terasic_mtl.4 \ termios.4 \ textdump.4 \ ti.4 \ timecounters.4 \ tl.4 \ ${_tpm.4} \ trm.4 \ tty.4 \ tun.4 \ twa.4 \ twe.4 \ tws.4 \ tx.4 \ txp.4 \ udp.4 \ udplite.4 \ ure.4 \ vale.4 \ vga.4 \ vge.4 \ viapm.4 \ ${_viawd.4} \ ${_virtio.4} \ 
${_virtio_balloon.4} \ ${_virtio_blk.4} \ ${_virtio_console.4} \ ${_virtio_random.4} \ ${_virtio_scsi.4} \ ${_vmci.4} \ vkbd.4 \ vlan.4 \ vxlan.4 \ ${_vmm.4} \ ${_vmx.4} \ vpo.4 \ vr.4 \ vt.4 \ vte.4 \ ${_vtnet.4} \ watchdog.4 \ wb.4 \ ${_wbwd.4} \ wi.4 \ witness.4 \ wlan.4 \ wlan_acl.4 \ wlan_amrr.4 \ wlan_ccmp.4 \ wlan_tkip.4 \ wlan_wep.4 \ wlan_xauth.4 \ wmt.4 \ ${_wpi.4} \ wsp.4 \ xe.4 \ ${_xen.4} \ xhci.4 \ xl.4 \ ${_xnb.4} \ xpt.4 \ zero.4 MLINKS= ae.4 if_ae.4 MLINKS+=age.4 if_age.4 MLINKS+=agp.4 agpgart.4 MLINKS+=alc.4 if_alc.4 MLINKS+=ale.4 if_ale.4 MLINKS+=altera_atse.4 atse.4 MLINKS+=altera_sdcard.4 altera_sdcardc.4 MLINKS+=altq.4 ALTQ.4 MLINKS+=ath.4 if_ath.4 MLINKS+=ath_pci.4 if_ath_pci.4 MLINKS+=an.4 if_an.4 MLINKS+=aue.4 if_aue.4 MLINKS+=axe.4 if_axe.4 MLINKS+=bce.4 if_bce.4 MLINKS+=bfe.4 if_bfe.4 MLINKS+=bge.4 if_bge.4 MLINKS+=bktr.4 brooktree.4 MLINKS+=bnxt.4 if_bnxt.4 MLINKS+=bridge.4 if_bridge.4 MLINKS+=bwi.4 if_bwi.4 MLINKS+=bwn.4 if_bwn.4 MLINKS+=${_bxe.4} ${_if_bxe.4} MLINKS+=cas.4 if_cas.4 MLINKS+=cdce.4 if_cdce.4 MLINKS+=cfi.4 cfid.4 MLINKS+=cloudabi.4 cloudabi32.4 \ cloudabi.4 cloudabi64.4 MLINKS+=crypto.4 cryptodev.4 MLINKS+=cue.4 if_cue.4 MLINKS+=cxgb.4 if_cxgb.4 MLINKS+=cxgbe.4 if_cxgbe.4 \ cxgbe.4 vcxgbe.4 \ cxgbe.4 if_vcxgbe.4 \ cxgbe.4 cxl.4 \ cxgbe.4 if_cxl.4 \ cxgbe.4 vcxl.4 \ cxgbe.4 if_vcxl.4 \ cxgbe.4 cc.4 \ cxgbe.4 if_cc.4 \ cxgbe.4 vcc.4 \ cxgbe.4 if_vcc.4 MLINKS+=cxgbev.4 if_cxgbev.4 \ cxgbev.4 cxlv.4 \ cxgbev.4 if_cxlv.4 \ cxgbev.4 ccv.4 \ cxgbev.4 if_ccv.4 MLINKS+=dc.4 if_dc.4 MLINKS+=de.4 if_de.4 MLINKS+=disc.4 if_disc.4 MLINKS+=ed.4 if_ed.4 MLINKS+=edsc.4 if_edsc.4 MLINKS+=em.4 if_em.4 MLINKS+=enc.4 if_enc.4 MLINKS+=epair.4 if_epair.4 MLINKS+=et.4 if_et.4 MLINKS+=fd.4 stderr.4 \ fd.4 stdin.4 \ fd.4 stdout.4 MLINKS+=fdt.4 FDT.4 MLINKS+=firewire.4 ieee1394.4 MLINKS+=fwe.4 if_fwe.4 MLINKS+=fwip.4 if_fwip.4 MLINKS+=fxp.4 if_fxp.4 MLINKS+=gem.4 if_gem.4 MLINKS+=geom.4 GEOM.4 MLINKS+=gif.4 if_gif.4 MLINKS+=gpio.4 gpiobus.4 MLINKS+=gre.4 if_gre.4 MLINKS+=hme.4 if_hme.4 MLINKS+=hpet.4 acpi_hpet.4 MLINKS+=${_hptrr.4} ${_rr232x.4} MLINKS+=${_attimer.4} ${_i8254.4} MLINKS+=ip.4 rawip.4 MLINKS+=ipfirewall.4 ipaccounting.4 \ ipfirewall.4 ipacct.4 \ ipfirewall.4 ipfw.4 MLINKS+=ipheth.4 if_ipheth.4 MLINKS+=ipw.4 if_ipw.4 MLINKS+=iwi.4 if_iwi.4 MLINKS+=iwm.4 if_iwm.4 MLINKS+=iwn.4 if_iwn.4 MLINKS+=ixgbe.4 ix.4 MLINKS+=ixgbe.4 if_ix.4 MLINKS+=ixgbe.4 if_ixgbe.4 MLINKS+=ixl.4 if_ixl.4 MLINKS+=iavf.4 if_iavf.4 MLINKS+=jme.4 if_jme.4 MLINKS+=kue.4 if_kue.4 MLINKS+=lagg.4 trunk.4 MLINKS+=lagg.4 if_lagg.4 MLINKS+=le.4 if_le.4 MLINKS+=lge.4 if_lge.4 MLINKS+=lo.4 loop.4 MLINKS+=lp.4 plip.4 MLINKS+=malo.4 if_malo.4 MLINKS+=md.4 vn.4 MLINKS+=mem.4 kmem.4 MLINKS+=mfi.4 mfi_linux.4 \ mfi.4 mfip.4 MLINKS+=mlx5en.4 mce.4 MLINKS+=mn.4 if_mn.4 MLINKS+=mos.4 if_mos.4 MLINKS+=msk.4 if_msk.4 MLINKS+=mwl.4 if_mwl.4 MLINKS+=mxge.4 if_mxge.4 MLINKS+=my.4 if_my.4 MLINKS+=${_ndis.4} ${_if_ndis.4} MLINKS+=netfpga10g_nf10bmac.4 if_nf10bmac.4 MLINKS+=netintro.4 net.4 \ netintro.4 networking.4 MLINKS+=${_nfe.4} ${_if_nfe.4} MLINKS+=nge.4 if_nge.4 MLINKS+=ow.4 onewire.4 MLINKS+=pccbb.4 cbb.4 MLINKS+=pcm.4 snd.4 \ pcm.4 sound.4 MLINKS+=pcn.4 if_pcn.4 MLINKS+=pms.4 pmspcv.4 MLINKS+=ral.4 if_ral.4 MLINKS+=re.4 if_re.4 MLINKS+=rl.4 if_rl.4 MLINKS+=rtwn_pci.4 if_rtwn_pci.4 MLINKS+=rue.4 if_rue.4 MLINKS+=scsi.4 CAM.4 \ scsi.4 cam.4 \ scsi.4 scbus.4 \ scsi.4 SCSI.4 MLINKS+=sf.4 if_sf.4 MLINKS+=sge.4 if_sge.4 MLINKS+=sis.4 if_sis.4 MLINKS+=sk.4 if_sk.4 MLINKS+=smp.4 SMP.4 MLINKS+=smsc.4 if_smsc.4 
MLINKS+=sn.4 if_sn.4 MLINKS+=snd_envy24.4 snd_ak452x.4 MLINKS+=snd_sbc.4 snd_sb16.4 \ snd_sbc.4 snd_sb8.4 MLINKS+=${_spkr.4} ${_speaker.4} MLINKS+=splash.4 screensaver.4 MLINKS+=ste.4 if_ste.4 MLINKS+=stf.4 if_stf.4 MLINKS+=stge.4 if_stge.4 MLINKS+=syncache.4 syncookies.4 MLINKS+=syscons.4 sc.4 MLINKS+=tap.4 if_tap.4 MLINKS+=tdfx.4 tdfx_linux.4 MLINKS+=ti.4 if_ti.4 MLINKS+=tl.4 if_tl.4 MLINKS+=tun.4 if_tun.4 MLINKS+=tx.4 if_tx.4 MLINKS+=txp.4 if_txp.4 MLINKS+=ure.4 if_ure.4 MLINKS+=vge.4 if_vge.4 MLINKS+=vlan.4 if_vlan.4 MLINKS+=vxlan.4 if_vxlan.4 MLINKS+=${_vmx.4} ${_if_vmx.4} MLINKS+=vpo.4 imm.4 MLINKS+=vr.4 if_vr.4 MLINKS+=vte.4 if_vte.4 MLINKS+=${_vtnet.4} ${_if_vtnet.4} MLINKS+=watchdog.4 SW_WATCHDOG.4 MLINKS+=wb.4 if_wb.4 MLINKS+=wi.4 if_wi.4 MLINKS+=${_wpi.4} ${_if_wpi.4} MLINKS+=xe.4 if_xe.4 MLINKS+=xl.4 if_xl.4 .if ${MACHINE_CPUARCH} == "amd64" || ${MACHINE_CPUARCH} == "i386" _acpi_asus.4= acpi_asus.4 _acpi_asus_wmi.4= acpi_asus_wmi.4 _acpi_dock.4= acpi_dock.4 _acpi_fujitsu.4=acpi_fujitsu.4 _acpi_hp.4= acpi_hp.4 _acpi_ibm.4= acpi_ibm.4 _acpi_panasonic.4=acpi_panasonic.4 _acpi_rapidstart.4=acpi_rapidstart.4 _acpi_sony.4= acpi_sony.4 _acpi_toshiba.4=acpi_toshiba.4 _acpi_wmi.4= acpi_wmi.4 _aesni.4= aesni.4 _aout.4= aout.4 _apic.4= apic.4 _atrtc.4= atrtc.4 _attimer.4= attimer.4 _aibs.4= aibs.4 _amdsbwd.4= amdsbwd.4 _amdsmb.4= amdsmb.4 _amdsmn.4= amdsmn.4 _amdtemp.4= amdtemp.4 _asmc.4= asmc.4 _bxe.4= bxe.4 _bytgpio.4= bytgpio.4 _chvgpio.4= chvgpio.4 _coretemp.4= coretemp.4 _cpuctl.4= cpuctl.4 _dpms.4= dpms.4 _hpt27xx.4= hpt27xx.4 _hptiop.4= hptiop.4 _hptmv.4= hptmv.4 _hptnr.4= hptnr.4 _hptrr.4= hptrr.4 _hv_kvp.4= hv_kvp.4 _hv_netvsc.4= hv_netvsc.4 _hv_storvsc.4= hv_storvsc.4 _hv_utils.4= hv_utils.4 _hv_vmbus.4= hv_vmbus.4 _hv_vss.4= hv_vss.4 _i8254.4= i8254.4 _ichwd.4= ichwd.4 _if_bxe.4= if_bxe.4 _if_ndis.4= if_ndis.4 _if_nfe.4= if_nfe.4 _if_urtw.4= if_urtw.4 _if_vmx.4= if_vmx.4 _if_vtnet.4= if_vtnet.4 _if_wpi.4= if_wpi.4 _imcsmb.4= imcsmb.4 _ipmi.4= ipmi.4 _io.4= io.4 _linux.4= linux.4 _nda.4= nda.4 _ndis.4= ndis.4 _nfe.4= nfe.4 _nfsmb.4= nfsmb.4 _nvd.4= nvd.4 _nvme.4= nvme.4 _nvram.4= nvram.4 _virtio.4= virtio.4 _virtio_balloon.4=virtio_balloon.4 _virtio_blk.4= virtio_blk.4 _virtio_console.4=virtio_console.4 _virtio_random.4= virtio_random.4 _virtio_scsi.4= virtio_scsi.4 _vmx.4= vmx.4 _vtnet.4= vtnet.4 _padlock.4= padlock.4 _rr232x.4= rr232x.4 _speaker.4= speaker.4 _spkr.4= spkr.4 _tpm.4= tpm.4 _urtw.4= urtw.4 _viawd.4= viawd.4 _vmci.4= vmci.4 _wbwd.4= wbwd.4 _wpi.4= wpi.4 _xen.4= xen.4 _xnb.4= xnb.4 .endif .if ${MACHINE_CPUARCH} == "amd64" _if_ntb.4= if_ntb.4 _ioat.4= ioat.4 _ntb.4= ntb.4 _ntb_hw_intel.4= ntb_hw_intel.4 _ntb_hw_plx.4= ntb_hw_plx.4 _ntb_transport.4=ntb_transport.4 _qlxge.4= qlxge.4 _qlxgb.4= qlxgb.4 _qlxgbe.4= qlxgbe.4 _qlnxe.4= qlnxe.4 _sfxge.4= sfxge.4 _smartpqi.4= smartpqi.4 MLINKS+=qlxge.4 if_qlxge.4 MLINKS+=qlxgb.4 if_qlxgb.4 MLINKS+=qlxgbe.4 if_qlxgbe.4 MLINKS+=qlnxe.4 if_qlnxe.4 MLINKS+=sfxge.4 if_sfxge.4 .if ${MK_BHYVE} != "no" _bhyve.4= bhyve.4 _vmm.4= vmm.4 .endif .endif .if ${MACHINE_CPUARCH} == "mips" _nvram2env.4= nvram2env.4 .endif .if ${MACHINE_CPUARCH} == "powerpc" _nvd.4= nvd.4 _nvme.4= nvme.4 .endif .if empty(MAN_ARCH) __arches= ${MACHINE} ${MACHINE_ARCH} ${MACHINE_CPUARCH} .elif ${MAN_ARCH} == "all" __arches= ${:!/bin/sh -c "/bin/ls -d ${.CURDIR}/man4.*"!:E} .else __arches= ${MAN_ARCH} .endif .for __arch in ${__arches:O:u} .if exists(${.CURDIR}/man4.${__arch}) SUBDIR+= man4.${__arch} .endif .endfor .if ${MK_BLUETOOTH} != "no" MAN+= ng_bluetooth.4 
.endif .if ${MK_CCD} != "no" _ccd.4= ccd.4 .endif .if ${MK_CDDL} != "no" -_dtrace_provs= dtrace_io.4 \ +_dtrace_provs= dtrace_audit.4 \ + dtrace_io.4 \ dtrace_ip.4 \ dtrace_lockstat.4 \ dtrace_proc.4 \ dtrace_sched.4 \ dtrace_sctp.4 \ dtrace_tcp.4 \ dtrace_udp.4 \ dtrace_udplite.4 + +MLINKS+= dtrace_audit.4 dtaudit.4 .endif .if ${MK_EFI} != "no" MAN+= efidev.4 MLINKS+= efidev.4 efirtc.4 .endif .if ${MK_ISCSI} != "no" MAN+= cfiscsi.4 MAN+= iscsi.4 MAN+= iscsi_initiator.4 MAN+= iser.4 .endif .if ${MK_OFED} != "no" MAN+= mlx4ib.4 MAN+= mlx5ib.4 .endif .if ${MK_MLX5TOOL} != "no" MAN+= mlx5io.4 .endif .if ${MK_TESTS} != "no" ATF= ${SRCTOP}/contrib/atf .PATH: ${ATF}/doc _atf_test_case.4= atf-test-case.4 .endif .if ${MK_PF} != "no" _pf.4= pf.4 _pflog.4= pflog.4 _pfsync.4= pfsync.4 .endif .if ${MK_USB} != "no" MAN+= \ otus.4 \ otusfw.4 \ rsu.4 \ rsufw.4 \ rtwn_usb.4 \ rum.4 \ run.4 \ runfw.4 \ u3g.4 \ uark.4 \ uart.4 \ uath.4 \ ubsa.4 \ ubsec.4 \ ubser.4 \ ubtbcmfw.4 \ uchcom.4 \ ucom.4 \ ucycom.4 \ udav.4 \ udbp.4 \ udl.4 \ uep.4 \ ufm.4 \ ufoma.4 \ uftdi.4 \ ugen.4 \ ugold.4 \ uhci.4 \ uhid.4 \ uhso.4 \ uipaq.4 \ ukbd.4 \ uled.4 \ ulpt.4 \ umass.4 \ umcs.4 \ umct.4 \ umodem.4 \ umoscom.4 \ ums.4 \ unix.4 \ upgt.4 \ uplcom.4 \ ural.4 \ urio.4 \ urndis.4 \ ${_urtw.4} \ usb.4 \ usb_quirk.4 \ usb_template.4 \ usfs.4 \ uslcom.4 \ uvisor.4 \ uvscom.4 \ zyd.4 MLINKS+=otus.4 if_otus.4 MLINKS+=rsu.4 if_rsu.4 MLINKS+=rtwn_usb.4 if_rtwn_usb.4 MLINKS+=rum.4 if_rum.4 MLINKS+=run.4 if_run.4 MLINKS+=u3g.4 u3gstub.4 MLINKS+=uath.4 if_uath.4 MLINKS+=udav.4 if_udav.4 MLINKS+=upgt.4 if_upgt.4 MLINKS+=ural.4 if_ural.4 MLINKS+=urndis.4 if_urndis.4 MLINKS+=${_urtw.4} ${_if_urtw.4} MLINKS+=zyd.4 if_zyd.4 .endif .include Index: user/ngie/bug-237403/share/man/man4/audit.4 =================================================================== --- user/ngie/bug-237403/share/man/man4/audit.4 (revision 346925) +++ user/ngie/bug-237403/share/man/man4/audit.4 (revision 346926) @@ -1,148 +1,160 @@ -.\" Copyright (c) 2006 Robert N. M. Watson +.\" Copyright (c) 2006, 2019 Robert N. M. Watson .\" All rights reserved. .\" +.\" This software was developed in part by BAE Systems, the University of +.\" Cambridge Computer Laboratory, and Memorial University under DARPA/AFRL +.\" contract FA8650-15-C-7558 ("CADETS"), as part of the DARPA Transparent +.\" Computing (TC) research program. +.\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" -.Dd May 31, 2009 +.Dd April 28, 2019 .Dt AUDIT 4 .Os .Sh NAME .Nm audit .Nd Security Event Audit .Sh SYNOPSIS .Cd "options AUDIT" .Sh DESCRIPTION Security Event Audit is a facility to provide fine-grained, configurable logging of security-relevant events, and is intended to meet the requirements of the Common Criteria (CC) Common Access Protection Profile (CAPP) evaluation. The .Fx .Nm facility implements the de facto industry standard BSM API, file formats, and command line interface, first found in the Solaris operating system. Information on the user space implementation can be found in .Xr libbsm 3 . .Pp Audit support is enabled at boot, if present in the kernel, using an .Xr rc.conf 5 flag. The audit daemon, .Xr auditd 8 , is responsible for configuring the kernel to perform .Nm , pushing configuration data from the various audit configuration files into the kernel. .Ss Audit Special Device The kernel .Nm facility provides a special device, .Pa /dev/audit , which is used by .Xr auditd 8 to monitor for .Nm events, such as requests to cycle the log, low disk space conditions, and requests to terminate auditing. This device is not intended for use by applications. .Ss Audit Pipe Special Devices Audit pipe special devices, discussed in .Xr auditpipe 4 , provide a configurable live tracking mechanism to allow applications to tee the audit trail, as well as to configure custom preselection parameters to track users and events in a fine-grained manner. +.Ss DTrace Audit Provider +The DTrace Audit Provider, +.Xr dtaudit 4 , +allows D scripts to enable capture of in-kernel audit records for kernel audit +event types, and then process their contents during audit commit or BSM +generation. .Sh SEE ALSO .Xr auditreduce 1 , .Xr praudit 1 , .Xr audit 2 , .Xr auditctl 2 , .Xr auditon 2 , .Xr getaudit 2 , .Xr getauid 2 , .Xr poll 2 , .Xr select 2 , .Xr setaudit 2 , .Xr setauid 2 , .Xr libbsm 3 , .Xr auditpipe 4 , +.Xr dtaudit 4 , .Xr audit.log 5 , .Xr audit_class 5 , .Xr audit_control 5 , .Xr audit_event 5 , .Xr audit_user 5 , .Xr audit_warn 5 , .Xr rc.conf 5 , .Xr audit 8 , .Xr auditd 8 , .Xr auditdistd 8 .Sh HISTORY The .Tn OpenBSM implementation was created by McAfee Research, the security division of McAfee Inc., under contract to Apple Computer Inc.\& in 2004. It was subsequently adopted by the TrustedBSD Project as the foundation for the OpenBSM distribution. .Pp Support for kernel .Nm first appeared in .Fx 6.2 . .Sh AUTHORS .An -nosplit This software was created by McAfee Research, the security research division of McAfee, Inc., under contract to Apple Computer Inc. Additional authors include .An Wayne Salamon , .An Robert Watson , and SPARTA Inc. .Pp The Basic Security Module (BSM) interface to audit records and audit event stream format were defined by Sun Microsystems. .Pp This manual page was written by .An Robert Watson Aq Mt rwatson@FreeBSD.org . 
.Sh BUGS The .Fx kernel does not fully validate that audit records submitted by user applications are syntactically valid BSM; as submission of records is limited to privileged processes, this is not a critical bug. .Pp Instrumentation of auditable events in the kernel is not complete, as some system calls do not generate audit records, or generate audit records with incomplete argument information. .Pp Mandatory Access Control (MAC) labels, as provided by the .Xr mac 4 facility, are not audited as part of records involving MAC decisions. .Pp Currently the .Nm syscalls are not supported for jailed processes. However, if a process has .Nm session state associated with it, audit records will still be produced and a zonename token containing the jail's ID or name will be present in the audit records. Index: user/ngie/bug-237403/share/man/man4/auditpipe.4 =================================================================== --- user/ngie/bug-237403/share/man/man4/auditpipe.4 (revision 346925) +++ user/ngie/bug-237403/share/man/man4/auditpipe.4 (revision 346926) @@ -1,256 +1,257 @@ .\" Copyright (c) 2006 Robert N. M. Watson .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" -.Dd May 30, 2018 +.Dd April 28, 2019 .Dt AUDITPIPE 4 .Os .Sh NAME .Nm auditpipe .Nd "pseudo-device for live audit event tracking" .Sh SYNOPSIS .Cd "options AUDIT" .Sh DESCRIPTION While audit trail files generated with .Xr audit 4 and maintained by .Xr auditd 8 provide a reliable long-term store for audit log information, current log files are owned by the audit daemon until terminated making them somewhat unwieldy for live monitoring applications such as host-based intrusion detection. For example, the log may be cycled and new records written to a new file without notice to applications that may be accessing the file. .Pp The audit facility provides an audit pipe facility for applications requiring direct access to live BSM audit data for the purposes of real-time monitoring. Audit pipes are available via a clonable special device, .Pa /dev/auditpipe , subject to the permissions on the device node, and provide a .Qq tee of the audit event stream. 
As the device is clonable, more than one instance of the device may be opened at a time; each device instance will provide independent access to all records. .Pp The audit pipe device provides discrete BSM audit records; if the read buffer passed by the application is too small to hold the next record in the sequence, it will be dropped. Unlike audit data written to the audit trail, the reliability of record delivery is not guaranteed. In particular, when an audit pipe queue fills, records will be dropped. Audit pipe devices are blocking by default, but support non-blocking I/O, asynchronous I/O using .Dv SIGIO , and polled operation via .Xr select 2 and .Xr poll 2 . .Pp Applications may choose to track the global audit trail, or configure local preselection parameters independent of the global audit trail parameters. .Ss Audit Pipe Queue Ioctls The following ioctls retrieve and set various audit pipe record queue properties: .Bl -tag -width ".Dv AUDITPIPE_GET_MAXAUDITDATA" .It Dv AUDITPIPE_GET_QLEN Query the current number of records available for reading on the pipe. .It Dv AUDITPIPE_GET_QLIMIT Retrieve the current maximum number of records that may be queued for reading on the pipe. .It Dv AUDITPIPE_SET_QLIMIT Set the current maximum number of records that may be queued for reading on the pipe. The new limit must fall between the queue limit minimum and queue limit maximum queryable using the following two ioctls. .It Dv AUDITPIPE_GET_QLIMIT_MIN Query the lowest possible maximum number of records that may be queued for reading on the pipe. .It Dv AUDITPIPE_GET_QLIMIT_MAX Query the highest possible maximum number of records that may be queued for reading on the pipe. .It Dv AUDITPIPE_FLUSH Flush all outstanding records on the audit pipe; useful after setting initial preselection properties to delete records queued during the configuration process which may not match the interests of the user process. .It Dv AUDITPIPE_GET_MAXAUDITDATA Query the maximum size of an audit record, which is a useful minimum size for a user space buffer intended to hold audit records read from the audit pipe. .El .Ss Audit Pipe Preselection Mode Ioctls By default, the audit pipe facility configures pipes to present records matched by the system-wide audit trail, configured by .Xr auditd 8 . However, the preselection mechanism for audit pipes can be configured using alternative criteria, including pipe-local flags and naflags settings, as well as auid-specific selection masks. This allows applications to track events not captured in the global audit trail, as well as limit records presented to those of specific interest to the application. .Pp The following ioctls configure the preselection mode on an audit pipe: .Bl -tag -width ".Dv AUDITPIPE_GET_PRESELECT_MODE" .It Dv AUDITPIPE_GET_PRESELECT_MODE Return the current preselect mode on the audit pipe. The ioctl argument should be of type .Vt int . .It Dv AUDITPIPE_SET_PRESELECT_MODE Set the current preselection mode on the audit pipe. The ioctl argument should be of type .Vt int . .El .Pp Possible preselection mode values are: .Bl -tag -width ".Dv AUDITPIPE_PRESELECT_MODE_TRAIL" .It Dv AUDITPIPE_PRESELECT_MODE_TRAIL Use the global audit trail preselection parameters to select records for the audit pipe. .It Dv AUDITPIPE_PRESELECT_MODE_LOCAL Use local audit pipe preselection; this model is similar to the global audit trail configuration model, consisting of global flags and naflags parameters, as well as a set of per-auid masks. 
These parameters are configured using further ioctls. .El .Pp After changing the audit pipe preselection mode, records selected under earlier preselection configuration may still be in the audit pipe queue. The application may flush the current record queue after changing the configuration to remove possibly undesired records. .Ss Audit Pipe Local Preselection Mode Ioctls The following ioctls configure the preselection parameters used when an audit pipe is configured for the .Dv AUDITPIPE_PRESELECT_MODE_LOCAL preselection mode. .Bl -tag -width ".Dv AUDITPIPE_GET_PRESELECT_NAFLAGS" .It Dv AUDITPIPE_GET_PRESELECT_FLAGS Retrieve the current default preselection flags for attributable events on the pipe. These flags correspond to the .Va flags field in .Xr audit_control 5 . The ioctl argument should be of type .Vt au_mask_t . .It Dv AUDITPIPE_SET_PRESELECT_FLAGS Set the current default preselection flags for attributable events on the pipe. These flags correspond to the .Va flags field in .Xr audit_control 5 . The ioctl argument should be of type .Vt au_mask_t . .It Dv AUDITPIPE_GET_PRESELECT_NAFLAGS Retrieve the current default preselection flags for non-attributable events on the pipe. These flags correspond to the .Va naflags field in .Xr audit_control 5 . The ioctl argument should be of type .Vt au_mask_t . .It Dv AUDITPIPE_SET_PRESELECT_NAFLAGS Set the current default preselection flags for non-attributable events on the pipe. These flags correspond to the .Va naflags field in .Xr audit_control 5 . The ioctl argument should be of type .Vt au_mask_t . .It Dv AUDITPIPE_GET_PRESELECT_AUID Query the current preselection masks for a specific auid on the pipe. The ioctl argument should be of type .Vt "struct auditpipe_ioctl_preselect" . The auid to query is specified via the .Va ap_auid field of type .Vt au_id_t ; the mask will be returned via .Va ap_mask of type .Vt au_mask_t . .It Dv AUDITPIPE_SET_PRESELECT_AUID Set the current preselection masks for a specific auid on the pipe. Arguments are identical to .Dv AUDITPIPE_GET_PRESELECT_AUID , except that the caller should properly initialize the .Va ap_mask field to hold the desired preselection mask. .It Dv AUDITPIPE_DELETE_PRESELECT_AUID Delete the current preselection mask for a specific auid on the pipe. Once called, events associated with the specified auid will use the default flags mask. The ioctl argument should be of type .Vt au_id_t . .It Dv AUDITPIPE_FLUSH_PRESELECT_AUID Delete all auid specific preselection specifications. .El .Sh EXAMPLES The .Xr praudit 1 utility may be directly executed on .Pa /dev/auditpipe to review the default audit trail. .Sh SEE ALSO .Xr poll 2 , .Xr select 2 , .Xr audit 4 , +.Xr dtaudit 4 , .Xr audit_control 5 , .Xr audit 8 , .Xr auditd 8 .Sh HISTORY The OpenBSM implementation was created by McAfee Research, the security division of McAfee Inc., under contract to Apple Computer Inc.\& in 2004. It was subsequently adopted by the TrustedBSD Project as the foundation for the OpenBSM distribution. .Pp Support for kernel audit first appeared in .Fx 6.2 . .Sh AUTHORS The audit pipe facility was designed and implemented by .An Robert Watson Aq Mt rwatson@FreeBSD.org . .Pp The Basic Security Module (BSM) interface to audit records and audit event stream format were defined by Sun Microsystems. .Sh BUGS See the .Xr audit 4 manual page for information on audit-related bugs and limitations. .Pp The configurable preselection mechanism mirrors the selection model present for the global audit trail. 
It might be desirable to provide a more flexible selection model. .Pp The per-pipe audit event queue is fifo, with drops occurring if either the user thread provides in sufficient for the record on the queue head, or on enqueue if there is insufficient room. It might be desirable to support partial reads of records, which would be more compatible with buffered I/O as implemented in system libraries, and to allow applications to select which records are dropped, possibly in the style of preselection. Index: user/ngie/bug-237403/share/man/man4/dtrace_audit.4 =================================================================== --- user/ngie/bug-237403/share/man/man4/dtrace_audit.4 (nonexistent) +++ user/ngie/bug-237403/share/man/man4/dtrace_audit.4 (revision 346926) @@ -0,0 +1,178 @@ +.\"- +.\" SPDX-License-Identifier: BSD-2-Clause +.\" +.\" Copyright (c) 2019 Robert N. M. Watson +.\" +.\" This software was developed by BAE Systems, the University of Cambridge +.\" Computer Laboratory, and Memorial University under DARPA/AFRL contract +.\" FA8650-15-C-7558 ("CADETS"), as part of the DARPA Transparent Computing +.\" (TC) research program. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.Dd April 28, 2019 +.Dt DTRACE_AUDIT 4 +.Os +.Sh NAME +.Nm dtrace_audit +.Nd A DTrace provider for tracing +.Xr audit 4 +events +.Sh SYNOPSIS +.Pp +.Fn audit:event:aue_*:commit "char *eventname" "struct audit_record *ar" +.Fn audit:event:aue_*:bsm "char *eventname" "struct audit_record *ar" "const void *" "size_t" +.Pp +To compile this module into the kernel, place the following in your kernel +configuration file: +.Pp +.Bd -literal -offset indent +.Cd "options DTAUDIT" +.Ed +.Pp +Alternatively, to load the module at boot time, place the following line in +.Xr loader.conf 5 : +.Bd -literal -offset indent +dtaudit_load="YES" +.Ed +.Sh DESCRIPTION +The DTrace +.Nm dtaudit +provider allows users to trace events in the kernel security auditing +subsystem, +.Xr audit 4 . +.Xr audit 4 +provides detailed logging of a configurable set of security-relevant system +calls, including key arguments (such as file paths) and return values that are +copied race-free as the system call proceeds. 
+The +.Nm dtaudit +provider allows DTrace scripts to selectively enable in-kernel audit-record +capture for system calls, and then access those records in either the +in-kernel format or BSM format (\c +.Xr audit.log 5 ) +when the system call completes. +While the in-kernel audit record data structure is subject to change as the +kernel changes over time, it is a much more friendly interface for use in D +scripts than either those available via the DTrace system-call provider or the +BSM trail itself. +.Ss Configuration +The +.Nm dtaudit +provider relies on +.Xr audit 4 +being compiled into the kernel. +.Nm dtaudit +probes become available only once there is an event-to-name mapping installed +in the kernel, normally done by +.Xr auditd 8 +during the boot process, if audit is enabled in +.Xr rc.conf 5 : +.Bd -literal -offset indent +auditd_enable="YES" +.Ed +.Pp +If +.Nm dtaudit +probes are required earlier in boot -- for example, in single-user mode -- or +without enabling +.Xr audit 4 , +they can be preloaded in the boot loader by adding this line to +.Xr loader.conf 5 . +.Bd -literal -offset indent +audit_event_load="YES" +.Ed +.Ss Probes +The +.Fn audit:event:aue_*:commit +probes fire synchronously during system-call return, giving access to two +arguments: a +.Vt char * +audit event name, and +the +.Vt struct audit_record * +in-kernel audit record. +Because the probe fires in system-call return, the user thread has not yet +regained control, and additional information from the thread and process +remains available for capture by the script. +.Pp +The +.Fn audit:event:aue_*:bsm +probes fire asynchonously from system-call return, following BSM conversion +and just prior to being written to disk, giving access to four arguments: a +.Vt char * +audit event name, the +.Vt struct audit_record * +in-kernel audit record, a +.Vt const void * +pointer to the converted BSM record, and a +.Vt size_t +for the length of the BSM record. +.Sh IMPLEMENTATION NOTES +When a set of +.Nm dtaudit +probes are registered, corresponding in-kernel audit records will be captured +and their probes will fire regardless of whether the +.Xr audit 4 +subsystem itself would have captured the record for the purposes of writing it +to the audit trail, or for delivery to a +.Xr auditpipe 4 . +In-kernel audit records allocated only because of enabled +.Xr dtaudit 4 +probes will not be unnecessarily written to the audit trail or enabled pipes. +.Sh SEE ALSO +.Xr dtrace 1 , +.Xr audit 4 , +.Xr audit.log 5 , +.Xr loader.conf 5 , +.Xr rc.conf 5 , +.Xr auditd 8 +.Sh HISTORY +The +.Nm dtaudit +provider first appeared in +.Fx 12.0 . +.Sh AUTHORS +This software and this manual page were developed by BAE Systems, the +University of Cambridge Computer Laboratory, and Memorial University under +DARPA/AFRL contract +.Pq FA8650-15-C-7558 +.Pq Do CADETS Dc , +as part of the DARPA Transparent Computing (TC) research program. +The +.Nm dtaudit +provider and this manual page were written by +.An Robert Watson Aq Mt rwatson@FreeBSD.org . +.Sh BUGS +Because +.Xr audit 4 +maintains its primary event-to-name mapping database in userspace, that +database must be loaded into the kernel before +.Nm dtaudit +probes become available. +.Pp +.Nm dtaudit +is only able to provide access to system-call audit events, not the full +scope of userspace events, such as those relating to login, password change, +and so on. 
Property changes on: user/ngie/bug-237403/share/man/man4/dtrace_audit.4 ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: user/ngie/bug-237403/share/man/man4/gre.4 =================================================================== --- user/ngie/bug-237403/share/man/man4/gre.4 (revision 346925) +++ user/ngie/bug-237403/share/man/man4/gre.4 (revision 346926) @@ -1,194 +1,232 @@ .\" $NetBSD: gre.4,v 1.28 2002/06/10 02:49:35 itojun Exp $ .\" .\" Copyright 1998 (c) The NetBSD Foundation, Inc. .\" All rights reserved. .\" .\" This code is derived from software contributed to The NetBSD Foundation .\" by Heiko W.Rupp .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS .\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED .\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR .\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS .\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR .\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF .\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS .\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN .\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) .\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE .\" POSSIBILITY OF SUCH DAMAGE. .\" .\" $FreeBSD$ .\" -.Dd June 2, 2015 +.Dd April 24, 2019 .Dt GRE 4 .Os .Sh NAME .Nm gre .Nd encapsulating network device .Sh SYNOPSIS To compile the driver into the kernel, place the following line in the kernel configuration file: .Bd -ragged -offset indent .Cd "device gre" .Ed .Pp Alternatively, to load the driver as a module at boot time, place the following line in .Xr loader.conf 5 : .Bd -literal -offset indent if_gre_load="YES" .Ed .Sh DESCRIPTION The .Nm network interface pseudo device encapsulates datagrams into IP. These encapsulated datagrams are routed to a destination host, where they are decapsulated and further routed to their final destination. The .Dq tunnel appears to the inner datagrams as one hop. .Pp .Nm interfaces are dynamically created and destroyed with the .Xr ifconfig 8 .Cm create and .Cm destroy subcommands. .Pp This driver corresponds to RFC 2784. Encapsulated datagrams are prepended an outer datagram and a GRE header. The GRE header specifies the type of the encapsulated datagram and thus allows for tunneling other protocols than IP. GRE mode is also the default tunnel mode on Cisco routers. .Nm also supports Cisco WCCP protocol, both version 1 and version 2. 
.Pp The .Nm interfaces support a number of additional parameters to the .Xr ifconfig 8 : .Bl -tag -width "enable_csum" .It Ar grekey Set the GRE key used for outgoing packets. A value of 0 disables the key option. .It Ar enable_csum Enables checksum calculation for outgoing packets. .It Ar enable_seq Enables use of sequence number field in the GRE header for outgoing packets. +.It Ar udpencap +Enables UDP-in-GRE encapsulation (see the +.Sx GRE-IN-UDP ENCAPSULATION +Section below for details). +.It Ar udpport +Set the source UDP port for outgoing packets. +A value of 0 disables the persistence of source UDP port for outgoing packets. +See the +.Sx GRE-IN-UDP ENCAPSULATION +Section below for details. .El +.Sh GRE-IN-UDP ENCAPSULATION +The +.Nm +supports GRE in UDP encapsulation as defined in RFC 8086. +A GRE in UDP tunnel offers the possibility of better performance for +load-balancing GRE traffic in transit networks. +Encapsulating GRE in UDP enables use of the UDP source port to provide +entropy to ECMP hashing. +.Pp +The GRE in UDP tunnel uses single value 4754 as UDP destination port. +The UDP source port contains a 14-bit entropy value that is generated +by the encapsulator to identify a flow for the encapsulated packet. +The +.Ar udpport +option can be used to disable this behaviour and use single source UDP +port value. +The value of +.Ar udpport +should be within the ephemeral port range, i.e., 49152 to 65535 by default. +.Pp +Note that a GRE in UDP tunnel is unidirectional; the tunnel traffic is not +expected to be returned back to the UDP source port values used to generate +entropy. +This may impact NAPT (Network Address Port Translator) middleboxes. +If such tunnels are expected to be used on a path with a middlebox, +the tunnel can be configured either to disable use of the UDP source port +for entropy or to enable middleboxes to pass packets with UDP source port +entropy. .Sh EXAMPLES .Bd -literal 192.168.1.* --- Router A -------tunnel-------- Router B --- 192.168.2.* \\ / \\ / +------ the Internet ------+ .Ed .Pp Assuming router A has the (external) IP address A and the internal address 192.168.1.1, while router B has external address B and internal address 192.168.2.1, the following commands will configure the tunnel: .Pp On router A: .Bd -literal -offset indent ifconfig greN create ifconfig greN inet 192.168.1.1 192.168.2.1 ifconfig greN inet tunnel A B route add -net 192.168.2 -netmask 255.255.255.0 192.168.2.1 .Ed .Pp On router B: .Bd -literal -offset indent ifconfig greN create ifconfig greN inet 192.168.2.1 192.168.1.1 ifconfig greN inet tunnel B A route add -net 192.168.1 -netmask 255.255.255.0 192.168.1.1 .Ed .Pp In case when internal and external IP addresses are the same, different routing tables (FIB) should be used. The default FIB will be applied to IP packets before GRE encapsulation. After encapsulation GRE interface should set different FIB number to outgoing packet. Then different FIB will be applied to such encapsulated packets. According to this FIB packet should be routed to tunnel endpoint. 
.Bd -literal Host X -- Host A (198.51.100.1) ---tunnel--- Cisco D (203.0.113.1) -- Host E \\ / \\ / +----- Host B ----- Host C -----+ (198.51.100.254) .Ed .Pp On Host A (FreeBSD): .Pp First of multiple FIBs should be configured via loader.conf: .Bd -literal -offset indent net.fibs=2 net.add_addr_allfibs=0 .Ed .Pp Then routes to the gateway and remote tunnel endpoint via this gateway should be added to the second FIB: .Bd -literal -offset indent route add -net 198.51.100.0 -netmask 255.255.255.0 -fib 1 -iface em0 route add -host 203.0.113.1 -fib 1 198.51.100.254 .Ed .Pp And GRE tunnel should be configured to change FIB for encapsulated packets: .Bd -literal -offset indent ifconfig greN create ifconfig greN inet 198.51.100.1 203.0.113.1 ifconfig greN inet tunnel 198.51.100.1 203.0.113.1 tunnelfib 1 .Ed .Sh NOTES The MTU of .Nm interfaces is set to 1476 by default, to match the value used by Cisco routers. This may not be an optimal value, depending on the link between the two tunnel endpoints. It can be adjusted via .Xr ifconfig 8 . .Pp For correct operation, the .Nm device needs a route to the decapsulating host that does not run over the tunnel, as this would be a loop. .Pp The kernel must be set to forward datagrams by setting the .Va net.inet.ip.forwarding .Xr sysctl 8 variable to non-zero. .Sh SEE ALSO .Xr gif 4 , .Xr inet 4 , .Xr ip 4 , .Xr me 4 , .Xr netintro 4 , .Xr protocols 5 , .Xr ifconfig 8 , .Xr sysctl 8 .Pp A description of GRE encapsulation can be found in RFC 2784 and RFC 2890. .Sh AUTHORS .An Andrey V. Elsukov Aq Mt ae@FreeBSD.org .An Heiko W.Rupp Aq Mt hwr@pilhuhn.de .Sh BUGS The current implementation uses the key only for outgoing packets. Incoming packets with a different key or without a key will be treated as if they would belong to this interface. .Pp The sequence number field also used only for outgoing packets. Index: user/ngie/bug-237403/share/man/man4/iflib.4 =================================================================== --- user/ngie/bug-237403/share/man/man4/iflib.4 (revision 346925) +++ user/ngie/bug-237403/share/man/man4/iflib.4 (revision 346926) @@ -1,204 +1,214 @@ .\" $FreeBSD$ .Dd September 27, 2018 .Dt IFLIB 4 .Os .Sh NAME .Nm iflib .Nd Network Interface Driver Framework .Sh SYNOPSIS .Cd "device pci" .Cd "device iflib" .Sh DESCRIPTION .Nm is a framework for network interface drivers for .Fx . It is designed to remove a large amount of the boilerplate that is often needed for modern network interface devices, allowing driver authors to focus on the specific code needed for their hardware. This allows for a shared set of .Xr sysctl 8 names, rather than each driver naming them individually. .Sh SYSCTL VARIABLES These variables must be set before loading the driver, either via .Xr loader.conf 5 or through the use of .Xr kenv 1 . They are all prefixed by .Va dev.X.Y.iflib\&. where X is the driver name, and Y is the instance number. .Bl -tag -width indent .It Va override_nrxds Override the number of RX descriptors for each queue. The value is a comma separated list of positive integers. Some drivers only use a single value, but others may use more. These numbers must be powers of two, and zero means to use the default. Individual drivers may have additional restrictions on allowable values. Defaults to all zeros. .It Va override_ntxds Override the number of TX descriptors for each queue. The value is a comma separated list of positive integers. Some drivers only use a single value, but others may use more. 
These numbers must be powers of two, and zero means to use the default. Individual drivers may have additional restrictions on allowable values. Defaults to all zeros. .It Va override_qs_enable When set, allows the number of transmit and receive queues to be different. If not set, the lower of the number of TX or RX queues will be used for both. .It Va override_nrxqs Set the number of RX queues. If zero, the number of RX queues is derived from the number of cores on the socket connected to the controller. Defaults to 0. .It Va override_ntxqs Set the number of TX queues. If zero, the number of TX queues is derived from the number of cores on the socket connected to the controller. .It Va disable_msix Disables MSI-X interrupts for the device. +.It Va core_offset +Specifies a starting core offset to assign queues to. +If the value is unspecified or 65535, cores are assigned sequentially across +controllers. +.It Va separate_txrx +Requests that RX and TX queues not be paired on the same core. +If this is zero or not set, an RX and TX queue pair will be assigned to each +core. +When set to a non-zero value, TX queues are assigned to cores following the +last RX queue. .El .Pp These .Xr sysctl 8 variables can be changed at any time: .Bl -tag -width indent .It Va tx_abdicate Controls how the transmit ring is serviced. If set to zero, when a frame is submitted to the transmission ring, the same task that is submitting it will service the ring unless there's already a task servicing the TX ring. This ensures that whenever there is a pending transmission, the transmit ring is being serviced. This results in higher transmit throughput. If set to a non-zero value, task returns immediately and the transmit ring is serviced by a different task. This returns control to the caller faster and under high receive load, may result in fewer dropped RX frames. .It Va rx_budget Sets the maximum number of frames to be received at a time. Zero (the default) indicates the default (currently 16) should be used. .El .Pp There are also some global sysctls which can change behaviour for all drivers, and may be changed at any time. .Bl -tag -width indent .It Va net.iflib.min_tx_latency If this is set to a non-zero value, iflib will avoid any attempt to combine multiple transmits, and notify the hardware as quickly as possible of new descriptors. This will lower the maximum throughput, but will also lower transmit latency. .It Va net.iflib.no_tx_batch Some NICs allow processing completed transmit descriptors in batches. Doing so usually increases the transmit throughput by reducing the number of transmit interrupts. Setting this to a non-zero value will disable the use of this feature. .El .Pp These .Xr sysctl 8 variables are read-only: .Bl -tag -width indent .It Va driver_version A string indicating the internal version of the driver. .El .Pp There are a number of queue state .Xr sysctl 8 variables as well: .Bl -tag -width indent .It Va txqZ The following are repeated for each transmit queue, where Z is the transmit queue instance number: .Bl -tag -width indent .It Va r_abdications Number of consumer abdications in the MP ring for this queue. An abdication occurs on every ring submission when tx_abdicate is true. .It Va r_restarts Number of consumer restarts in the MP ring for this queue. A restart occurs when an attempt to drain a non-empty ring fails, and the ring is already in the STALLED state. .It Va r_stalls Number of consumer stalls in the MP ring for this queue. 
A stall occurs when an attempt to drain a non-empty ring fails. .It Va r_starts Number of normal consumer starts in the MP ring for this queue. A start occurs when the MP ring transitions from IDLE to BUSY. .It Va r_drops Number of drops in the MP ring for this queue. A drop occurs when there is an attempt to add an entry to an MP ring with no available space. .It Va r_enqueues Number of entries which have been enqueued to the MP ring for this queue. .It Va ring_state MP (soft) ring state. This privides a snapshot of the current MP ring state, including the producer head and tail indexes, the consumer index, and the state. The state is one of "IDLE", "BUSY", "STALLED", or "ABDICATED". .It Va txq_cleaned The number of transmit descriptors which have been reclaimed. Total cleaned. .It Va txq_processed The number of transmit descriptors which have been processed, but may not yet have been reclaimed. .It Va txq_in_use Descriptors which have been added to the transmit queue, but have not yet been cleaned. This value will include both untransmitted descriptors as well as descriptors which have been processed. .It Va txq_cidx_processed The transmit queue consumer index of the next descriptor to process. .It Va txq_cidx The transmit queue consumer index of the oldest descriptor to reclaim. .It Va txq_pidx The transmit queue producer index where the next descriptor to transmit will be inserted. .It Va no_tx_dma_setup Number of times DMA mapping a transmit mbuf failed for reasons other than .Er EFBIG . .It Va txd_encap_efbig Number of times DMA mapping a transmit mbuf failed due to requiring too many segments. .It Va tx_map_failed Number of times DMA mapping a transmit mbuf failed for any reason (sum of no_tx_dma_setup and txd_encap_efbig) .It Va no_desc_avail Number of times a descriptor couldn't be added to the transmit ring because the transmit ring was full. .It Va mbuf_defrag_failed Number of times both .Xr m_collapse 9 and .Xr m_defrag 9 failed after an .Er EFBIG error result from DMA mapping a transmit mbuf. .It Va m_pullups Number of times .Xr m_pullup 9 was called attempting to parse a header. .It Va mbuf_defrag Number of times .Xr m_defrag 9 was called. .El .It Va rxqZ The following are repeated for each receive queue, where Z is the receive queue instance number: .Bl -tag -width indent .It Va rxq_fl0.credits Credits currently available in the receive ring. .It Va rxq_fl0.cidx Current receive ring consumer index. .It Va rxq_fl0.pidx Current receive ring producer index. .El .El .Pp Additional OIDs useful for driver and iflib development are exposed when the INVARIANTS and/or WITNESS options are enabled in the kernel. .Sh SEE ALSO .Xr iflib 9 .Sh HISTORY This framework was introduced in .Fx 11.0 . Index: user/ngie/bug-237403/stand/common/disk.c =================================================================== --- user/ngie/bug-237403/stand/common/disk.c (revision 346925) +++ user/ngie/bug-237403/stand/common/disk.c (revision 346926) @@ -1,438 +1,450 @@ /*- * Copyright (c) 1998 Michael Smith * Copyright (c) 2012 Andrey V. Elsukov * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. 
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include "disk.h" #ifdef DISK_DEBUG # define DPRINTF(fmt, args...) printf("%s: " fmt "\n" , __func__ , ## args) #else # define DPRINTF(fmt, args...) #endif struct open_disk { struct ptable *table; uint64_t mediasize; uint64_t entrysize; u_int sectorsize; }; struct print_args { struct disk_devdesc *dev; const char *prefix; int verbose; }; /* Convert size to a human-readable number. */ static char * display_size(uint64_t size, u_int sectorsize) { static char buf[80]; char unit; size = size * sectorsize / 1024; unit = 'K'; if (size >= 10485760000LL) { size /= 1073741824; unit = 'T'; } else if (size >= 10240000) { size /= 1048576; unit = 'G'; } else if (size >= 10000) { size /= 1024; unit = 'M'; } sprintf(buf, "%4ld%cB", (long)size, unit); return (buf); } int ptblread(void *d, void *buf, size_t blocks, uint64_t offset) { struct disk_devdesc *dev; struct open_disk *od; dev = (struct disk_devdesc *)d; od = (struct open_disk *)dev->dd.d_opendata; /* * The strategy function assumes the offset is in units of 512 byte * sectors. For larger sector sizes, we need to adjust the offset to * match the actual sector size. */ offset *= (od->sectorsize / 512); /* * As the GPT backup partition is located at the end of the disk, * to avoid reading past disk end, flag bcache not to use RA. */ return (dev->dd.d_dev->dv_strategy(dev, F_READ | F_NORA, offset, blocks * od->sectorsize, (char *)buf, NULL)); } static int ptable_print(void *arg, const char *pname, const struct ptable_entry *part) { struct disk_devdesc dev; struct print_args *pa, bsd; struct open_disk *od; struct ptable *table; char line[80]; int res; u_int sectsize; uint64_t partsize; pa = (struct print_args *)arg; od = (struct open_disk *)pa->dev->dd.d_opendata; sectsize = od->sectorsize; partsize = part->end - part->start + 1; sprintf(line, " %s%s: %s\t%s\n", pa->prefix, pname, parttype2str(part->type), pa->verbose ? 
display_size(partsize, sectsize) : ""); if (pager_output(line)) return 1; res = 0; if (part->type == PART_FREEBSD) { /* Open slice with BSD label */ dev.dd.d_dev = pa->dev->dd.d_dev; dev.dd.d_unit = pa->dev->dd.d_unit; dev.d_slice = part->index; dev.d_partition = D_PARTNONE; if (disk_open(&dev, partsize, sectsize) == 0) { table = ptable_open(&dev, partsize, sectsize, ptblread); if (table != NULL) { sprintf(line, " %s%s", pa->prefix, pname); bsd.dev = pa->dev; bsd.prefix = line; bsd.verbose = pa->verbose; res = ptable_iterate(table, &bsd, ptable_print); ptable_close(table); } disk_close(&dev); } } return (res); } int disk_print(struct disk_devdesc *dev, char *prefix, int verbose) { struct open_disk *od; struct print_args pa; /* Disk should be opened */ od = (struct open_disk *)dev->dd.d_opendata; pa.dev = dev; pa.prefix = prefix; pa.verbose = verbose; return (ptable_iterate(od->table, &pa, ptable_print)); } int disk_read(struct disk_devdesc *dev, void *buf, uint64_t offset, u_int blocks) { struct open_disk *od; int ret; od = (struct open_disk *)dev->dd.d_opendata; ret = dev->dd.d_dev->dv_strategy(dev, F_READ, dev->d_offset + offset, blocks * od->sectorsize, buf, NULL); return (ret); } int disk_write(struct disk_devdesc *dev, void *buf, uint64_t offset, u_int blocks) { struct open_disk *od; int ret; od = (struct open_disk *)dev->dd.d_opendata; ret = dev->dd.d_dev->dv_strategy(dev, F_WRITE, dev->d_offset + offset, blocks * od->sectorsize, buf, NULL); return (ret); } int disk_ioctl(struct disk_devdesc *dev, u_long cmd, void *data) { struct open_disk *od = dev->dd.d_opendata; if (od == NULL) return (ENOTTY); switch (cmd) { case DIOCGSECTORSIZE: *(u_int *)data = od->sectorsize; break; case DIOCGMEDIASIZE: if (dev->d_offset == 0) *(uint64_t *)data = od->mediasize; else *(uint64_t *)data = od->entrysize * od->sectorsize; break; default: return (ENOTTY); } return (0); } int disk_open(struct disk_devdesc *dev, uint64_t mediasize, u_int sectorsize) { struct disk_devdesc partdev; struct open_disk *od; struct ptable *table; struct ptable_entry part; int rc, slice, partition; rc = 0; od = (struct open_disk *)malloc(sizeof(struct open_disk)); if (od == NULL) { DPRINTF("no memory"); return (ENOMEM); } dev->dd.d_opendata = od; od->entrysize = 0; od->mediasize = mediasize; od->sectorsize = sectorsize; /* * While we are reading disk metadata, make sure we do it relative * to the start of the disk */ memcpy(&partdev, dev, sizeof(partdev)); partdev.d_offset = 0; partdev.d_slice = D_SLICENONE; partdev.d_partition = D_PARTNONE; dev->d_offset = 0; table = NULL; slice = dev->d_slice; partition = dev->d_partition; DPRINTF("%s unit %d, slice %d, partition %d => %p", disk_fmtdev(dev), dev->dd.d_unit, dev->d_slice, dev->d_partition, od); /* Determine disk layout. 
*/ od->table = ptable_open(&partdev, mediasize / sectorsize, sectorsize, ptblread); if (od->table == NULL) { DPRINTF("Can't read partition table"); rc = ENXIO; goto out; } if (ptable_getsize(od->table, &mediasize) != 0) { rc = ENXIO; goto out; } od->mediasize = mediasize; if (ptable_gettype(od->table) == PTABLE_BSD && partition >= 0) { /* It doesn't matter what value has d_slice */ rc = ptable_getpart(od->table, &part, partition); if (rc == 0) { dev->d_offset = part.start; od->entrysize = part.end - part.start + 1; } } else if (ptable_gettype(od->table) == PTABLE_ISO9660) { dev->d_offset = 0; od->entrysize = mediasize; } else if (slice >= 0) { /* Try to get information about partition */ if (slice == 0) rc = ptable_getbestpart(od->table, &part); else rc = ptable_getpart(od->table, &part, slice); if (rc != 0) /* Partition doesn't exist */ goto out; dev->d_offset = part.start; od->entrysize = part.end - part.start + 1; slice = part.index; if (ptable_gettype(od->table) == PTABLE_GPT) { partition = 255; goto out; /* Nothing more to do */ } else if (partition == 255) { /* * When we try to open GPT partition, but partition * table isn't GPT, reset d_partition value to -1 * and try to autodetect appropriate value. */ partition = -1; } /* * If d_partition < 0 and we are looking at a BSD slice, * then try to read BSD label, otherwise return the * whole MBR slice. */ if (partition == -1 && part.type != PART_FREEBSD) goto out; /* Try to read BSD label */ table = ptable_open(dev, part.end - part.start + 1, od->sectorsize, ptblread); if (table == NULL) { DPRINTF("Can't read BSD label"); rc = ENXIO; goto out; } /* * If slice contains BSD label and d_partition < 0, then * assume the 'a' partition. Otherwise just return the * whole MBR slice, because it can contain ZFS. */ if (partition < 0) { if (ptable_gettype(table) != PTABLE_BSD) goto out; partition = 0; } rc = ptable_getpart(table, &part, partition); if (rc != 0) goto out; dev->d_offset += part.start; od->entrysize = part.end - part.start + 1; } out: if (table != NULL) ptable_close(table); if (rc != 0) { if (od->table != NULL) ptable_close(od->table); free(od); DPRINTF("%s could not open", disk_fmtdev(dev)); } else { /* Save the slice and partition number to the dev */ dev->d_slice = slice; dev->d_partition = partition; DPRINTF("%s offset %lld => %p", disk_fmtdev(dev), (long long)dev->d_offset, od); } return (rc); } int disk_close(struct disk_devdesc *dev) { struct open_disk *od; od = (struct open_disk *)dev->dd.d_opendata; DPRINTF("%s closed => %p", disk_fmtdev(dev), od); ptable_close(od->table); free(od); return (0); } char* disk_fmtdev(struct disk_devdesc *dev) { static char buf[128]; char *cp; cp = buf + sprintf(buf, "%s%d", dev->dd.d_dev->dv_name, dev->dd.d_unit); if (dev->d_slice > D_SLICENONE) { #ifdef LOADER_GPT_SUPPORT if (dev->d_partition == D_PARTISGPT) { sprintf(cp, "p%d:", dev->d_slice); return (buf); } else #endif #ifdef LOADER_MBR_SUPPORT cp += sprintf(cp, "s%d", dev->d_slice); #endif } if (dev->d_partition > D_PARTNONE) cp += sprintf(cp, "%c", dev->d_partition + 'a'); strcat(cp, ":"); return (buf); } int disk_parsedev(struct disk_devdesc *dev, const char *devspec, const char **path) { int unit, slice, partition; const char *np; char *cp; np = devspec; unit = -1; - slice = D_SLICEWILD; - partition = D_PARTWILD; + /* + * If there is path/file info after the device info, then any missing + * slice or partition info should be considered a request to search for + * an appropriate partition. 
Otherwise we want to open the raw device + itself and not try to fill in missing info by searching. + */ + if ((cp = strchr(np, ':')) != NULL && cp[1] != '\0') { + slice = D_SLICEWILD; + partition = D_PARTWILD; + } else { + slice = D_SLICENONE; + partition = D_PARTNONE; + } + if (*np != '\0' && *np != ':') { unit = strtol(np, &cp, 10); if (cp == np) return (EUNIT); #ifdef LOADER_GPT_SUPPORT if (*cp == 'p') { np = cp + 1; slice = strtol(np, &cp, 10); if (np == cp) return (ESLICE); /* we don't support nested partitions on GPT */ if (*cp != '\0' && *cp != ':') return (EINVAL); partition = 255; } else #endif #ifdef LOADER_MBR_SUPPORT if (*cp == 's') { np = cp + 1; slice = strtol(np, &cp, 10); if (np == cp) return (ESLICE); } #endif if (*cp != '\0' && *cp != ':') { partition = *cp - 'a'; if (partition < 0) return (EPART); cp++; } } else return (EINVAL); if (*cp != '\0' && *cp != ':') return (EINVAL); dev->dd.d_unit = unit; dev->d_slice = slice; dev->d_partition = partition; if (path != NULL) *path = (*cp == '\0') ? cp: cp + 1; return (0); } Index: user/ngie/bug-237403/stand/common/help.common =================================================================== --- user/ngie/bug-237403/stand/common/help.common (revision 346925) +++ user/ngie/bug-237403/stand/common/help.common (revision 346926) @@ -1,407 +1,421 @@ ################################################################################ # Thelp DDisplay command help help [topic [subtopic]] help index The help command displays help on commands and their usage. In command help, a term enclosed with <...> indicates a value as described by the term. A term enclosed with [...] is optional, and may not be required by all forms of the command. Some commands may not be available. Use the '?' command to list most available commands. ################################################################################ # T? DList available commands ? Lists all available commands. ################################################################################ # Tautoboot DBoot after a delay autoboot [<delay> [<prompt>]] Displays <prompt> or a default prompt, and counts down <delay> seconds before attempting to boot. If <delay> is not specified, the default value is 10. ################################################################################ # Tboot DBoot immediately boot [<kernelname>] [-<arg> ...] Boot the system. If arguments are specified, they are added to the arguments for the kernel. If <kernelname> is specified, and a kernel has not already been loaded, it will be booted instead of the default kernel. ################################################################################ # Tbcachestat DGet disk block cache stats bcachestat Displays statistics about disk cache usage. For debugging only. ################################################################################ # Techo DEcho arguments echo [-n] [<message>] Emits <message>, with no trailing newline if -n is specified. This is most useful in conjunction with scripts and the '@' line prefix. Variables are substituted by prefixing them with $, e.g. echo Current device is $currdev will print the current device. ################################################################################ # Tload DLoad a kernel or module load [-t <type>] <filename> Loads the module contained in <filename> into memory. If no other modules are loaded, <filename> must be a kernel or the command will fail. If -t is specified, the module is loaded as raw data of <type>, for later use by the kernel or other modules. <type> may be any string.
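	For illustration only, a typical sequence at the loader prompt might
	look like this (the kernel path is the conventional default location,
	and the type string below is just an example, since any string is
	accepted):

		load /boot/kernel/kernel
		load -t example_type /boot/example_image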
################################################################################ # Tls DList files ls [-l] [] Displays a listing of files in the directory , or the root directory of the current device if is not specified. The -l argument displays file sizes as well; the process of obtaining file sizes on some media may be very slow. ################################################################################ # Tlsdev DList devices lsdev [-v] List all of the devices from which it may be possible to load modules. If -v is specified, print more details. ################################################################################ # Tlsmod DList modules lsmod [-v] List loaded modules. If [-v] is specified, print more details. ################################################################################ +# Tmap-vdisk DMap virtual disk + + map-vdisk filename + + Map file as virtual disk. + +################################################################################ # Tmore DPage files more [ ...] Show contents of text files. When displaying the contents of more, than one file, if the user elects to quit displaying a file, the remaining files will not be shown. ################################################################################ # Tpnpscan DScan for PnP devices pnpscan [-v] Scan for Plug-and-Play devices. This command is normally automatically run as part of the boot process, in order to dynamically load modules required for system operation. If the -v argument is specified, details on the devices found will be printed. ################################################################################ # Tset DSet a variable set set = The set command is used to set variables. ################################################################################ # Tset Sautoboot_delay DSet the default autoboot delay set autoboot_delay= Sets the default delay for the autoboot command to seconds. Set value to -1 if you don't want to allow user to interrupt autoboot process and escape to the loader prompt. ################################################################################ # Tset Sbootfile DSet the default boot file set set bootfile=[;...] Sets the default set of kernel boot filename(s). It may be overridden by setting the bootfile variable to a semicolon-separated list of filenames, each of which will be searched for in the module_path directories. The default bootfile set is "kernel". ################################################################################ # Tset Sboot_askname DPrompt for root device set boot_askname Instructs the kernel to prompt the user for the name of the root device when the kernel is booted. ################################################################################ # Tset Sboot_cdrom DMount root file system from CD-ROM set boot_cdrom Instructs the kernel to try to mount the root file system from CD-ROM. ################################################################################ # Tset Sboot_ddb DDrop to the kernel debugger (DDB) set boot_ddb Instructs the kernel to start in the DDB debugger, rather than proceeding to initialize when booted. ################################################################################ # Tset Sboot_dfltroot DUse default root file system set boot_dfltroot Instructs the kernel to mount the statically compiled-in root file system. 
################################################################################ # Tset Sboot_gdb DSelect gdb-remote mode for the kernel debugger set boot_gdb Selects gdb-remote mode for the kernel debugger by default. ################################################################################ # Tset Sboot_multicons DUse multiple consoles set boot_multicons Enables multiple console support in the kernel early on boot. In a running system, console configuration can be manipulated by the conscontrol(8) utility. ################################################################################ # Tset Sboot_mute DMute the console set boot_mute All console output is suppressed when console is muted. In a running system, the state of console muting can be manipulated by the conscontrol(8) utility. ################################################################################ # Tset Sboot_pause DPause after each line during device probing set boot_pause During the device probe, pause after each line is printed. ################################################################################ # Tset Sboot_serial DUse serial console set boot_serial Force the use of a serial console even when an internal console is present. ################################################################################ # Tset Sboot_single DStart system in single-user mode set boot_single Prevents the kernel from initiating a multi-user startup; instead, a single-user mode will be entered when the kernel has finished device probes. ################################################################################ # Tset Sboot_verbose DVerbose boot messages set boot_verbose Setting this variable causes extra debugging information to be printed by the kernel during the boot phase. ################################################################################ # Tset Sconsole DSet the current console set console[=] Sets the current console. If is omitted, a list of valid consoles will be displayed. ################################################################################ # Tset Scurrdev DSet the current device set currdev= Selects the default device. See lsdev for available devices. ################################################################################ # Tset Sinit_path DSet the list of init candidates set init_path=[:...] Sets the list of binaries which the kernel will try to run as initial process. ################################################################################ # Tset Smodule_path DSet the module search path set module_path=[;...] Sets the list of directories which will be searched in for modules named in a load command or implicitly required by a dependency. The default module_path is "/boot/modules" with the kernel directory prepended. ################################################################################ # Tset Sprompt DSet the command prompt set prompt= The command prompt is displayed when the loader is waiting for input. Variable substitution is performed on the prompt. The default prompt can be set with: set prompt=\${interpret} ################################################################################ # Tset Srootdev DSet the root filesystem set rootdev= By default the value of $currdev is used to set the root filesystem when the kernel is booted. This can be overridden by setting $rootdev explicitly. 
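	As an illustration only (the device name below is hypothetical; use
	lsdev to see the devices actually present on a given system):

		set rootdev=disk0s1a:
		boot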
################################################################################ # Tset Stunables DSet kernel tunable values Various kernel tunable parameters can be overridden by specifying new values in the environment. set kern.ipc.nmbclusters= Set the number of mbuf clusters to be allocated. The value cannot be set below the default determined when the kernel was compiled. set kern.ipc.nsfbufs= NSFBUFS Set the number of sendfile buffers to be allocated. This overrides the value determined when the kernel was compiled. set vm.kmem_size= VM_KMEM_SIZE Sets the size of kernel memory (bytes). This overrides the value determined when the kernel was compiled. set machdep.disable_mtrrs=1 Disable the use of i686 MTRRs (i386 only) set net.inet.tcp.tcbhashsize= TCBHASHSIZE Overrides the compile-time set value of TCBHASHSIZE or the preset default of 512. Must be a power of 2. hw.syscons.sc_no_suspend_vtswitch= Disable VT switching on suspend. value is 0 (default) or non-zero to enable. set hw.physmem= MAXMEM (i386 only) Limits the amount of physical memory space available to the system to bytes. may have a k, M or G suffix to indicate kilobytes, megabytes and gigabytes respectively. Note that the current i386 architecture limits this value to 4GB. On systems where memory cannot be accurately probed, this option provides a hint as to the actual size of system memory (which will be tested before use). set hw.{acpi,pci}.host_start_mem= Sets the lowest address that the pci code will assign when it doesn't have other information about the address to assign (like from a pci bridge). This is only useful in older systems without a pci bridge. Also, it only impacts devices that the BIOS doesn't assign to, typically CardBus bridges. The default is 0x80000000, but some systems need values like 0xf0000000, 0xfc000000 or 0xfe000000 may be suitable for older systems (the older the system, the higher the number typically should be). set hw.pci.enable_io_modes= Enable PCI resources which are left off by some BIOSes or are not enabled correctly by the device driver. value is 1 (default), but this may cause problems with some peripherals. Set to 0 to disable. ################################################################################ # Tshow DShow the values of variables show [] Displays the value of , or all variables if not specified. Multiple paths can be separated with a semicolon. ################################################################################ # Tinclude DRead commands from a script file include [ ...] The entire contents of are read into memory before executing commands, so it is safe to source a file from removable media. ################################################################################ # Tread DRead input from the terminal read [-t ] [-p ] [] The read command reads a line of input from the terminal. If the -t argument is specified, it will return nothing if no input has been received after seconds. (Any keypress will cancel the timeout). If -p is specified, is printed before reading input. No newline is emitted after the prompt. If a variable name is supplied, the variable is set to the value read, less any terminating newline. ################################################################################ # Tunload DRemove all modules from memory unload This command removes any kernel and all loaded modules from memory. 
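	For example, to discard everything currently loaded and boot a
	different kernel instead (the path shown is illustrative):

		unload
		load /boot/kernel.old/kernel
		boot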
+ +################################################################################ +# Tunmap-vdisk DUnmap virtual disk + + unmap-vdisk diskname + + Delete virtual disk mapping. ################################################################################ # Tunset DUnset a variable unset If allowed, the named variable's value is discarded and the variable is removed. ################################################################################ Index: user/ngie/bug-237403/stand/common/vdisk.c =================================================================== --- user/ngie/bug-237403/stand/common/vdisk.c (nonexistent) +++ user/ngie/bug-237403/stand/common/vdisk.c (revision 346926) @@ -0,0 +1,417 @@ +/*- + * Copyright 2019 Toomas Soome + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include +__FBSDID("$FreeBSD$"); + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static int vdisk_init(void); +static int vdisk_strategy(void *, int, daddr_t, size_t, char *, size_t *); +static int vdisk_open(struct open_file *, ...); +static int vdisk_close(struct open_file *); +static int vdisk_ioctl(struct open_file *, u_long, void *); +static int vdisk_print(int); + +struct devsw vdisk_dev = { + .dv_name = "vdisk", + .dv_type = DEVT_DISK, + .dv_init = vdisk_init, + .dv_strategy = vdisk_strategy, + .dv_open = vdisk_open, + .dv_close = vdisk_close, + .dv_ioctl = vdisk_ioctl, + .dv_print = vdisk_print, + .dv_cleanup = NULL +}; + +typedef STAILQ_HEAD(vdisk_info_list, vdisk_info) vdisk_info_list_t; + +typedef struct vdisk_info +{ + STAILQ_ENTRY(vdisk_info) vdisk_link; /* link in device list */ + char *vdisk_path; + int vdisk_unit; + int vdisk_fd; + uint64_t vdisk_size; /* size in bytes */ + uint32_t vdisk_sectorsz; + uint32_t vdisk_open; /* reference counter */ +} vdisk_info_t; + +static vdisk_info_list_t vdisk_list; /* list of mapped vdisks. 
*/ + +static vdisk_info_t * +vdisk_get_info(struct devdesc *dev) +{ + vdisk_info_t *vd; + + STAILQ_FOREACH(vd, &vdisk_list, vdisk_link) { + if (vd->vdisk_unit == dev->d_unit) + return (vd); + } + return (vd); +} + +COMMAND_SET(map_vdisk, "map-vdisk", "map file as virtual disk", command_mapvd); + +static int +command_mapvd(int argc, char *argv[]) +{ + vdisk_info_t *vd, *p; + struct stat sb; + + if (argc != 2) { + printf("usage: %s filename\n", argv[0]); + return (CMD_ERROR); + } + + STAILQ_FOREACH(vd, &vdisk_list, vdisk_link) { + if (strcmp(vd->vdisk_path, argv[1]) == 0) { + printf("%s: file %s is already mapped as %s%d\n", + argv[0], argv[1], vdisk_dev.dv_name, + vd->vdisk_unit); + return (CMD_ERROR); + } + } + + if (stat(argv[1], &sb) < 0) { + /* + * ENOSYS is really ENOENT because we did try to walk + * through devsw list to try to open this file. + */ + if (errno == ENOSYS) + errno = ENOENT; + + printf("%s: stat failed: %s\n", argv[0], strerror(errno)); + return (CMD_ERROR); + } + + /* + * Avoid mapping small files. + */ + if (sb.st_size < 1024 * 1024) { + printf("%s: file %s is too small.\n", argv[0], argv[1]); + return (CMD_ERROR); + } + + vd = calloc(1, sizeof (*vd)); + if (vd == NULL) { + printf("%s: out of memory\n", argv[0]); + return (CMD_ERROR); + } + vd->vdisk_path = strdup(argv[1]); + if (vd->vdisk_path == NULL) { + free (vd); + printf("%s: out of memory\n", argv[0]); + return (CMD_ERROR); + } + vd->vdisk_fd = open(vd->vdisk_path, O_RDONLY); + if (vd->vdisk_fd < 0) { + printf("%s: open failed: %s\n", argv[0], strerror(errno)); + free(vd->vdisk_path); + free(vd); + return (CMD_ERROR); + } + + vd->vdisk_size = sb.st_size; + vd->vdisk_sectorsz = DEV_BSIZE; + STAILQ_FOREACH(p, &vdisk_list, vdisk_link) { + vdisk_info_t *n; + if (p->vdisk_unit == vd->vdisk_unit) { + vd->vdisk_unit++; + continue; + } + n = STAILQ_NEXT(p, vdisk_link); + if (p->vdisk_unit < vd->vdisk_unit) { + if (n == NULL) { + /* p is last elem */ + STAILQ_INSERT_TAIL(&vdisk_list, vd, vdisk_link); + break; + } + if (n->vdisk_unit > vd->vdisk_unit) { + /* p < vd < n */ + STAILQ_INSERT_AFTER(&vdisk_list, p, vd, + vdisk_link); + break; + } + /* else n < vd or n == vd */ + vd->vdisk_unit++; + continue; + } + /* p > vd only if p is the first element */ + STAILQ_INSERT_HEAD(&vdisk_list, vd, vdisk_link); + break; + } + + /* if the list was empty or contiguous */ + if (p == NULL) + STAILQ_INSERT_TAIL(&vdisk_list, vd, vdisk_link); + + printf("%s: file %s is mapped as %s%d\n", argv[0], vd->vdisk_path, + vdisk_dev.dv_name, vd->vdisk_unit); + return (CMD_OK); +} + +COMMAND_SET(unmap_vdisk, "unmap-vdisk", "unmap virtual disk", command_unmapvd); + +/* + * unmap-vdisk vdiskX + */ +static int +command_unmapvd(int argc, char *argv[]) +{ + size_t len; + vdisk_info_t *vd; + long unit; + char *end; + + if (argc != 2) { + printf("usage: %s %sN\n", argv[0], vdisk_dev.dv_name); + return (CMD_ERROR); + } + + len = strlen(vdisk_dev.dv_name); + if (strncmp(vdisk_dev.dv_name, argv[1], len) != 0) { + printf("%s: unknown device %s\n", argv[0], argv[1]); + return (CMD_ERROR); + } + errno = 0; + unit = strtol(argv[1] + len, &end, 10); + if (errno != 0 || (*end != '\0' && strcmp(end, ":") != 0)) { + printf("%s: unknown device %s\n", argv[0], argv[1]); + return (CMD_ERROR); + } + + STAILQ_FOREACH(vd, &vdisk_list, vdisk_link) { + if (vd->vdisk_unit == unit) + break; + } + + if (vd == NULL) { + printf("%s: unknown device %s\n", argv[0], argv[1]); + return (CMD_ERROR); + } + + if (vd->vdisk_open != 0) { + printf("%s: %s is in use, unable to unmap.\n", 
+ argv[0], argv[1]); + return (CMD_ERROR); + } + + STAILQ_REMOVE(&vdisk_list, vd, vdisk_info, vdisk_link); + close(vd->vdisk_fd); + free(vd->vdisk_path); + free(vd); + printf("%s (%s) unmapped\n", argv[1], vd->vdisk_path); + + return (CMD_OK); +} + +static int +vdisk_init(void) +{ + STAILQ_INIT(&vdisk_list); + return (0); +} + +static int +vdisk_strategy(void *devdata, int rw, daddr_t blk, size_t size, + char *buf, size_t *rsize) +{ + struct disk_devdesc *dev; + vdisk_info_t *vd; + ssize_t rv; + + dev = devdata; + if (dev == NULL) + return (EINVAL); + vd = vdisk_get_info((struct devdesc *)dev); + if (vd == NULL) + return (EINVAL); + + if (size == 0 || (size % 512) != 0) + return (EIO); + + if (dev->dd.d_dev->dv_type == DEVT_DISK) { + daddr_t offset; + + offset = dev->d_offset * vd->vdisk_sectorsz; + offset /= 512; + blk += offset; + } + if (lseek(vd->vdisk_fd, blk << 9, SEEK_SET) == -1) + return (EIO); + + errno = 0; + switch (rw & F_MASK) { + case F_READ: + rv = read(vd->vdisk_fd, buf, size); + break; + case F_WRITE: + rv = write(vd->vdisk_fd, buf, size); + break; + default: + return (ENOSYS); + } + + if (errno == 0 && rsize != NULL) { + *rsize = rv; + } + return (errno); +} + +static int +vdisk_open(struct open_file *f, ...) +{ + va_list args; + struct disk_devdesc *dev; + vdisk_info_t *vd; + int rc = 0; + + va_start(args, f); + dev = va_arg(args, struct disk_devdesc *); + va_end(args); + if (dev == NULL) + return (EINVAL); + vd = vdisk_get_info((struct devdesc *)dev); + if (vd == NULL) + return (EINVAL); + + if (dev->dd.d_dev->dv_type == DEVT_DISK) { + rc = disk_open(dev, vd->vdisk_size, vd->vdisk_sectorsz); + } + if (rc == 0) + vd->vdisk_open++; + return (rc); +} + +static int +vdisk_close(struct open_file *f) +{ + struct disk_devdesc *dev; + vdisk_info_t *vd; + + dev = (struct disk_devdesc *)(f->f_devdata); + if (dev == NULL) + return (EINVAL); + vd = vdisk_get_info((struct devdesc *)dev); + if (vd == NULL) + return (EINVAL); + + vd->vdisk_open--; + if (dev->dd.d_dev->dv_type == DEVT_DISK) + return (disk_close(dev)); + return (0); +} + +static int +vdisk_ioctl(struct open_file *f, u_long cmd, void *data) +{ + struct disk_devdesc *dev; + vdisk_info_t *vd; + int rc; + + dev = (struct disk_devdesc *)(f->f_devdata); + if (dev == NULL) + return (EINVAL); + vd = vdisk_get_info((struct devdesc *)dev); + if (vd == NULL) + return (EINVAL); + + if (dev->dd.d_dev->dv_type == DEVT_DISK) { + rc = disk_ioctl(dev, cmd, data); + if (rc != ENOTTY) + return (rc); + } + + switch (cmd) { + case DIOCGSECTORSIZE: + *(u_int *)data = vd->vdisk_sectorsz; + break; + case DIOCGMEDIASIZE: + *(uint64_t *)data = vd->vdisk_size; + break; + default: + return (ENOTTY); + } + return (0); +} + +static int +vdisk_print(int verbose) +{ + int ret = 0; + vdisk_info_t *vd; + char line[80]; + + if (STAILQ_EMPTY(&vdisk_list)) + return (ret); + + printf("%s devices:", vdisk_dev.dv_name); + if ((ret = pager_output("\n")) != 0) + return (ret); + + STAILQ_FOREACH(vd, &vdisk_list, vdisk_link) { + struct disk_devdesc vd_dev; + + if (verbose) { + printf(" %s", vd->vdisk_path); + if ((ret = pager_output("\n")) != 0) + break; + } + snprintf(line, sizeof(line), + " %s%d", vdisk_dev.dv_name, vd->vdisk_unit); + printf("%s: %" PRIu64 " X %u blocks", line, + vd->vdisk_size / vd->vdisk_sectorsz, + vd->vdisk_sectorsz); + if ((ret = pager_output("\n")) != 0) + break; + + vd_dev.dd.d_dev = &vdisk_dev; + vd_dev.dd.d_unit = vd->vdisk_unit; + vd_dev.d_slice = -1; + vd_dev.d_partition = -1; + + ret = disk_open(&vd_dev, vd->vdisk_size, 
vd->vdisk_sectorsz); + if (ret == 0) { + ret = disk_print(&vd_dev, line, verbose); + disk_close(&vd_dev); + if (ret != 0) + break; + } else { + ret = 0; + } + } + + return (ret); +} Property changes on: user/ngie/bug-237403/stand/common/vdisk.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: user/ngie/bug-237403/stand/efi/include/efilib.h =================================================================== --- user/ngie/bug-237403/stand/efi/include/efilib.h (revision 346925) +++ user/ngie/bug-237403/stand/efi/include/efilib.h (revision 346926) @@ -1,144 +1,145 @@ /*- * Copyright (c) 2000 Doug Rabson * Copyright (c) 2006 Marcel Moolenaar * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $FreeBSD$ */ #ifndef _LOADER_EFILIB_H #define _LOADER_EFILIB_H #include #include #include extern EFI_HANDLE IH; extern EFI_SYSTEM_TABLE *ST; extern EFI_BOOT_SERVICES *BS; extern EFI_RUNTIME_SERVICES *RS; extern struct devsw efipart_fddev; extern struct devsw efipart_cddev; extern struct devsw efipart_hddev; extern struct devsw efinet_dev; extern struct netif_driver efinetif; /* EFI block device data, included here to help efi_zfs_probe() */ typedef STAILQ_HEAD(pdinfo_list, pdinfo) pdinfo_list_t; typedef struct pdinfo { STAILQ_ENTRY(pdinfo) pd_link; /* link in device list */ pdinfo_list_t pd_part; /* list of partitions */ EFI_HANDLE pd_handle; EFI_HANDLE pd_alias; EFI_DEVICE_PATH *pd_devpath; EFI_BLOCK_IO *pd_blkio; uint32_t pd_unit; /* unit number */ uint32_t pd_open; /* reference counter */ void *pd_bcache; /* buffer cache data */ struct pdinfo *pd_parent; /* Linked items (eg partitions) */ struct devsw *pd_devsw; /* Back pointer to devsw */ } pdinfo_t; pdinfo_list_t *efiblk_get_pdinfo_list(struct devsw *dev); pdinfo_t *efiblk_get_pdinfo(struct devdesc *dev); pdinfo_t *efiblk_get_pdinfo_by_handle(EFI_HANDLE h); pdinfo_t *efiblk_get_pdinfo_by_device_path(EFI_DEVICE_PATH *path); void *efi_get_table(EFI_GUID *tbl); int efi_getdev(void **vdev, const char *devspec, const char **path); char *efi_fmtdev(void *vdev); int efi_setcurrdev(struct env_var *ev, int flags, const void *value); int efi_register_handles(struct devsw *, EFI_HANDLE *, EFI_HANDLE *, int); EFI_HANDLE efi_find_handle(struct devsw *, int); int efi_handle_lookup(EFI_HANDLE, struct devsw **, int *, uint64_t *); int efi_handle_update_dev(EFI_HANDLE, struct devsw *, int, uint64_t); EFI_DEVICE_PATH *efi_lookup_image_devpath(EFI_HANDLE); EFI_DEVICE_PATH *efi_lookup_devpath(EFI_HANDLE); EFI_HANDLE efi_devpath_handle(EFI_DEVICE_PATH *); EFI_DEVICE_PATH *efi_devpath_last_node(EFI_DEVICE_PATH *); EFI_DEVICE_PATH *efi_devpath_trim(EFI_DEVICE_PATH *); bool efi_devpath_match(EFI_DEVICE_PATH *, EFI_DEVICE_PATH *); bool efi_devpath_match_node(EFI_DEVICE_PATH *, EFI_DEVICE_PATH *); bool efi_devpath_is_prefix(EFI_DEVICE_PATH *, EFI_DEVICE_PATH *); CHAR16 *efi_devpath_name(EFI_DEVICE_PATH *); void efi_free_devpath_name(CHAR16 *); EFI_DEVICE_PATH *efi_devpath_to_media_path(EFI_DEVICE_PATH *); UINTN efi_devpath_length(EFI_DEVICE_PATH *); EFI_DEVICE_PATH *efi_name_to_devpath(const char *path); EFI_DEVICE_PATH *efi_name_to_devpath16(CHAR16 *path); void efi_devpath_free(EFI_DEVICE_PATH *dp); int efi_status_to_errno(EFI_STATUS); EFI_STATUS errno_to_efi_status(int errno); void efi_time_init(void); void efi_time_fini(void); EFI_STATUS efi_main(EFI_HANDLE Ximage, EFI_SYSTEM_TABLE* Xsystab); EFI_STATUS main(int argc, CHAR16 *argv[]); void efi_exit(EFI_STATUS status) __dead2; void delay(int usecs); /* EFI environment initialization. */ void efi_init_environment(void); /* EFI Memory type strings. */ const char *efi_memory_type(EFI_MEMORY_TYPE); /* CHAR16 utility functions. */ int wcscmp(CHAR16 *, CHAR16 *); void cpy8to16(const char *, CHAR16 *, size_t); void cpy16to8(const CHAR16 *, char *, size_t); /* * Routines for interacting with EFI's env vars in a more unix-like * way than the standard APIs. In addition, convenience routines for * the loader setting / getting FreeBSD specific variables. 
*/ EFI_STATUS efi_delenv(EFI_GUID *guid, const char *varname); +EFI_STATUS efi_freebsd_delenv(const char *varname); EFI_STATUS efi_freebsd_getenv(const char *v, void *data, __size_t *len); EFI_STATUS efi_getenv(EFI_GUID *g, const char *v, void *data, __size_t *len); EFI_STATUS efi_global_getenv(const char *v, void *data, __size_t *len); EFI_STATUS efi_setenv(EFI_GUID *guid, const char *varname, UINT32 attr, void *data, __size_t len); EFI_STATUS efi_setenv_freebsd_wcs(const char *varname, CHAR16 *valstr); /* guids and names */ bool efi_guid_to_str(const EFI_GUID *, char **); bool efi_str_to_guid(const char *, EFI_GUID *); bool efi_name_to_guid(const char *, EFI_GUID *); bool efi_guid_to_name(EFI_GUID *, char **); /* efipart.c */ int efipart_inithandles(void); #endif /* _LOADER_EFILIB_H */ Index: user/ngie/bug-237403/stand/efi/libefi/efienv.c =================================================================== --- user/ngie/bug-237403/stand/efi/libefi/efienv.c (revision 346925) +++ user/ngie/bug-237403/stand/efi/libefi/efienv.c (revision 346926) @@ -1,123 +1,129 @@ /*- * Copyright (c) 2018 Netflix, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include static EFI_GUID FreeBSDBootVarGUID = FREEBSD_BOOT_VAR_GUID; static EFI_GUID GlobalBootVarGUID = EFI_GLOBAL_VARIABLE; EFI_STATUS efi_getenv(EFI_GUID *g, const char *v, void *data, size_t *len) { size_t ul; CHAR16 *uv; UINT32 attr; UINTN dl; EFI_STATUS rv; uv = NULL; if (utf8_to_ucs2(v, &uv, &ul) != 0) return (EFI_OUT_OF_RESOURCES); dl = *len; rv = RS->GetVariable(uv, g, &attr, &dl, data); if (rv == EFI_SUCCESS || rv == EFI_BUFFER_TOO_SMALL) *len = dl; free(uv); return (rv); } EFI_STATUS efi_global_getenv(const char *v, void *data, size_t *len) { return (efi_getenv(&GlobalBootVarGUID, v, data, len)); } EFI_STATUS efi_freebsd_getenv(const char *v, void *data, size_t *len) { return (efi_getenv(&FreeBSDBootVarGUID, v, data, len)); } /* * efi_setenv -- Sets an env variable. 
*/ EFI_STATUS efi_setenv(EFI_GUID *guid, const char *varname, UINT32 attr, void *data, __size_t len) { EFI_STATUS rv; CHAR16 *uv; size_t ul; uv = NULL; if (utf8_to_ucs2(varname, &uv, &ul) != 0) return (EFI_OUT_OF_RESOURCES); rv = RS->SetVariable(uv, guid, attr, len, data); free(uv); return (rv); } EFI_STATUS efi_setenv_freebsd_wcs(const char *varname, CHAR16 *valstr) { CHAR16 *var = NULL; size_t len; EFI_STATUS rv; if (utf8_to_ucs2(varname, &var, &len) != 0) return (EFI_OUT_OF_RESOURCES); rv = RS->SetVariable(var, &FreeBSDBootVarGUID, EFI_VARIABLE_BOOTSERVICE_ACCESS | EFI_VARIABLE_RUNTIME_ACCESS, (ucs2len(valstr) + 1) * sizeof(efi_char), valstr); free(var); return (rv); } /* * efi_delenv -- deletes the specified env variable */ EFI_STATUS efi_delenv(EFI_GUID *guid, const char *name) { CHAR16 *var; size_t len; EFI_STATUS rv; var = NULL; if (utf8_to_ucs2(name, &var, &len) != 0) return (EFI_OUT_OF_RESOURCES); rv = RS->SetVariable(var, guid, 0, 0, NULL); free(var); - return rv; + return (rv); +} + +EFI_STATUS +efi_freebsd_delenv(const char *name) +{ + return (efi_delenv(&FreeBSDBootVarGUID, name)); } Index: user/ngie/bug-237403/stand/efi/loader/autoload.c =================================================================== --- user/ngie/bug-237403/stand/efi/loader/autoload.c (revision 346925) +++ user/ngie/bug-237403/stand/efi/loader/autoload.c (revision 346926) @@ -1,56 +1,57 @@ /*- * Copyright (c) 2010 Rui Paulo * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #if defined(LOADER_FDT_SUPPORT) #include #include #endif #include "loader_efi.h" int efi_autoload(void) { #if defined(LOADER_FDT_SUPPORT) /* * Setup the FDT early so that we're not loading files during bi_load. * Any such loading is inherently broken since bi_load uses the space * just after all currently loaded files for the data that will be * passed to the kernel and newly loaded files will be positioned in * that same space. * * We're glossing over errors here because LOADER_FDT_SUPPORT does not * imply that we're on a platform where FDT is a requirement. If we * fix this, then the error handling here should be fixed accordingly. 
*/ - fdt_setup_fdtp(); + if (fdt_is_setup() == 0) + fdt_setup_fdtp(); #endif return (0); } Index: user/ngie/bug-237403/stand/efi/loader/conf.c =================================================================== --- user/ngie/bug-237403/stand/efi/loader/conf.c (revision 346925) +++ user/ngie/bug-237403/stand/efi/loader/conf.c (revision 346926) @@ -1,81 +1,84 @@ /*- * Copyright (c) 2006 Marcel Moolenaar * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include +extern struct devsw vdisk_dev; + struct devsw *devsw[] = { &efipart_fddev, &efipart_cddev, &efipart_hddev, &efinet_dev, + &vdisk_dev, #ifdef EFI_ZFS_BOOT &zfs_dev, #endif NULL }; struct fs_ops *file_system[] = { #ifdef EFI_ZFS_BOOT &zfs_fsops, #endif &dosfs_fsops, &ufs_fsops, &cd9660_fsops, &tftp_fsops, &nfs_fsops, &gzipfs_fsops, &bzipfs_fsops, NULL }; struct netif_driver *netif_drivers[] = { &efinetif, NULL }; extern struct console efi_console; #if defined(__amd64__) || defined(__i386__) extern struct console comconsole; extern struct console nullconsole; extern struct console spinconsole; #endif struct console *consoles[] = { &efi_console, #if defined(__amd64__) || defined(__i386__) &comconsole, &nullconsole, &spinconsole, #endif NULL }; Index: user/ngie/bug-237403/stand/efi/loader/main.c =================================================================== --- user/ngie/bug-237403/stand/efi/loader/main.c (revision 346925) +++ user/ngie/bug-237403/stand/efi/loader/main.c (revision 346926) @@ -1,1420 +1,1555 @@ /*- * Copyright (c) 2008-2010 Rui Paulo * Copyright (c) 2006 Marcel Moolenaar * All rights reserved. * - * Copyright (c) 2018 Netflix, Inc. + * Copyright (c) 2016-2019 Netflix, Inc. written by M. Warner Losh * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. 
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "efizfs.h" #include "loader_efi.h" struct arch_switch archsw; /* MI/MD interface boundary */ EFI_GUID acpi = ACPI_TABLE_GUID; EFI_GUID acpi20 = ACPI_20_TABLE_GUID; EFI_GUID devid = DEVICE_PATH_PROTOCOL; EFI_GUID imgid = LOADED_IMAGE_PROTOCOL; EFI_GUID mps = MPS_TABLE_GUID; EFI_GUID netid = EFI_SIMPLE_NETWORK_PROTOCOL; EFI_GUID smbios = SMBIOS_TABLE_GUID; EFI_GUID smbios3 = SMBIOS3_TABLE_GUID; EFI_GUID dxe = DXE_SERVICES_TABLE_GUID; EFI_GUID hoblist = HOB_LIST_TABLE_GUID; EFI_GUID lzmadecomp = LZMA_DECOMPRESSION_GUID; EFI_GUID mpcore = ARM_MP_CORE_INFO_TABLE_GUID; EFI_GUID esrt = ESRT_TABLE_GUID; EFI_GUID memtype = MEMORY_TYPE_INFORMATION_TABLE_GUID; EFI_GUID debugimg = DEBUG_IMAGE_INFO_TABLE_GUID; EFI_GUID fdtdtb = FDT_TABLE_GUID; EFI_GUID inputid = SIMPLE_TEXT_INPUT_PROTOCOL; /* * Number of seconds to wait for a keystroke before exiting with failure * in the event no currdev is found. -2 means always break, -1 means * never break, 0 means poll once and then reboot, > 0 means wait for * that many seconds. "fail_timeout" can be set in the environment as * well. */ static int fail_timeout = 5; /* * Current boot variable */ UINT16 boot_current; /* * Image that we booted from. */ EFI_LOADED_IMAGE *boot_img; static bool has_keyboard(void) { EFI_STATUS status; EFI_DEVICE_PATH *path; EFI_HANDLE *hin, *hin_end, *walker; UINTN sz; bool retval = false; /* * Find all the handles that support the SIMPLE_TEXT_INPUT_PROTOCOL and * do the typical dance to get the right sized buffer. */ sz = 0; hin = NULL; status = BS->LocateHandle(ByProtocol, &inputid, 0, &sz, 0); if (status == EFI_BUFFER_TOO_SMALL) { hin = (EFI_HANDLE *)malloc(sz); status = BS->LocateHandle(ByProtocol, &inputid, 0, &sz, hin); if (EFI_ERROR(status)) free(hin); } if (EFI_ERROR(status)) return retval; /* * Look at each of the handles. If it supports the device path protocol, * use it to get the device path for this handle. Then see if that * device path matches either the USB device path for keyboards or the * legacy device path for keyboards. */ hin_end = &hin[sz / sizeof(*hin)]; for (walker = hin; walker < hin_end; walker++) { status = BS->HandleProtocol(*walker, &devid, (VOID **)&path); if (EFI_ERROR(status)) continue; while (!IsDevicePathEnd(path)) { /* * Check for the ACPI keyboard node. All PNP3xx nodes * are keyboards of different flavors. Note: It is * unclear of there's always a keyboard node when * there's a keyboard controller, or if there's only one * when a keyboard is detected at boot. 
*/ if (DevicePathType(path) == ACPI_DEVICE_PATH && (DevicePathSubType(path) == ACPI_DP || DevicePathSubType(path) == ACPI_EXTENDED_DP)) { ACPI_HID_DEVICE_PATH *acpi; acpi = (ACPI_HID_DEVICE_PATH *)(void *)path; if ((EISA_ID_TO_NUM(acpi->HID) & 0xff00) == 0x300 && (acpi->HID & 0xffff) == PNP_EISA_ID_CONST) { retval = true; goto out; } /* * Check for USB keyboard node, if present. Unlike a * PS/2 keyboard, these definitely only appear when * connected to the system. */ } else if (DevicePathType(path) == MESSAGING_DEVICE_PATH && DevicePathSubType(path) == MSG_USB_CLASS_DP) { USB_CLASS_DEVICE_PATH *usb; usb = (USB_CLASS_DEVICE_PATH *)(void *)path; if (usb->DeviceClass == 3 && /* HID */ usb->DeviceSubClass == 1 && /* Boot devices */ usb->DeviceProtocol == 1) { /* Boot keyboards */ retval = true; goto out; } } path = NextDevicePathNode(path); } } out: free(hin); return retval; } static void set_currdev(const char *devname) { env_setenv("currdev", EV_VOLATILE, devname, efi_setcurrdev, env_nounset); env_setenv("loaddev", EV_VOLATILE, devname, env_noset, env_nounset); } static void set_currdev_devdesc(struct devdesc *currdev) { const char *devname; devname = efi_fmtdev(currdev); printf("Setting currdev to %s\n", devname); set_currdev(devname); } static void set_currdev_devsw(struct devsw *dev, int unit) { struct devdesc currdev; currdev.d_dev = dev; currdev.d_unit = unit; set_currdev_devdesc(&currdev); } static void set_currdev_pdinfo(pdinfo_t *dp) { /* * Disks are special: they have partitions. if the parent * pointer is non-null, we're a partition not a full disk * and we need to adjust currdev appropriately. */ if (dp->pd_devsw->dv_type == DEVT_DISK) { struct disk_devdesc currdev; currdev.dd.d_dev = dp->pd_devsw; if (dp->pd_parent == NULL) { currdev.dd.d_unit = dp->pd_unit; currdev.d_slice = D_SLICENONE; currdev.d_partition = D_PARTNONE; } else { currdev.dd.d_unit = dp->pd_parent->pd_unit; currdev.d_slice = dp->pd_unit; currdev.d_partition = D_PARTISGPT; /* XXX Assumes GPT */ } set_currdev_devdesc((struct devdesc *)&currdev); } else { set_currdev_devsw(dp->pd_devsw, dp->pd_unit); } } static bool sanity_check_currdev(void) { struct stat st; return (stat("/boot/defaults/loader.conf", &st) == 0 || stat("/boot/kernel/kernel", &st) == 0); } #ifdef EFI_ZFS_BOOT static bool probe_zfs_currdev(uint64_t guid) { char *devname; struct zfs_devdesc currdev; currdev.dd.d_dev = &zfs_dev; currdev.dd.d_unit = 0; currdev.pool_guid = guid; currdev.root_guid = 0; set_currdev_devdesc((struct devdesc *)&currdev); devname = efi_fmtdev(&currdev); init_zfs_bootenv(devname); return (sanity_check_currdev()); } #endif static bool try_as_currdev(pdinfo_t *hd, pdinfo_t *pp) { uint64_t guid; #ifdef EFI_ZFS_BOOT /* * If there's a zpool on this device, try it as a ZFS * filesystem, which has somewhat different setup than all * other types of fs due to imperfect loader integration. * This all stems from ZFS being both a device (zpool) and * a filesystem, plus the boot env feature. */ if (efizfs_get_guid_by_handle(pp->pd_handle, &guid)) return (probe_zfs_currdev(guid)); #endif /* * All other filesystems just need the pdinfo * initialized in the standard way. */ set_currdev_pdinfo(pp); return (sanity_check_currdev()); } /* * Sometimes we get filenames that are all upper case * and/or have backslashes in them. Filter all this out * if it looks like we need to do so. 
*/ static void fix_dosisms(char *p) { while (*p) { if (isupper(*p)) *p = tolower(*p); else if (*p == '\\') *p = '/'; p++; } } #define SIZE(dp, edp) (size_t)((intptr_t)(void *)edp - (intptr_t)(void *)dp) enum { BOOT_INFO_OK = 0, BAD_CHOICE = 1, NOT_SPECIFIC = 2 }; static int match_boot_info(char *boot_info, size_t bisz) { uint32_t attr; uint16_t fplen; size_t len; char *walker, *ep; EFI_DEVICE_PATH *dp, *edp, *first_dp, *last_dp; pdinfo_t *pp; CHAR16 *descr; char *kernel = NULL; FILEPATH_DEVICE_PATH *fp; struct stat st; CHAR16 *text; /* * FreeBSD encodes it's boot loading path into the boot loader * BootXXXX variable. We look for the last one in the path * and use that to load the kernel. However, if we only fine * one DEVICE_PATH, then there's nothing specific and we should * fall back. * * In an ideal world, we'd look at the image handle we were * passed, match up with the loader we are and then return the * next one in the path. This would be most flexible and cover * many chain booting scenarios where you need to use this * boot loader to get to the next boot loader. However, that * doesn't work. We rarely have the path to the image booted * (just the device) so we can't count on that. So, we do the * enxt best thing, we look through the device path(s) passed * in the BootXXXX varaible. If there's only one, we return * NOT_SPECIFIC. Otherwise, we look at the last one and try to * load that. If we can, we return BOOT_INFO_OK. Otherwise we * return BAD_CHOICE for the caller to sort out. */ if (bisz < sizeof(attr) + sizeof(fplen) + sizeof(CHAR16)) return NOT_SPECIFIC; walker = boot_info; ep = walker + bisz; memcpy(&attr, walker, sizeof(attr)); walker += sizeof(attr); memcpy(&fplen, walker, sizeof(fplen)); walker += sizeof(fplen); descr = (CHAR16 *)(intptr_t)walker; len = ucs2len(descr); walker += (len + 1) * sizeof(CHAR16); last_dp = first_dp = dp = (EFI_DEVICE_PATH *)walker; edp = (EFI_DEVICE_PATH *)(walker + fplen); if ((char *)edp > ep) return NOT_SPECIFIC; while (dp < edp && SIZE(dp, edp) > sizeof(EFI_DEVICE_PATH)) { text = efi_devpath_name(dp); if (text != NULL) { printf(" BootInfo Path: %S\n", text); efi_free_devpath_name(text); } last_dp = dp; dp = (EFI_DEVICE_PATH *)((char *)dp + efi_devpath_length(dp)); } /* * If there's only one item in the list, then nothing was * specified. Or if the last path doesn't have a media * path in it. Those show up as various VenHw() nodes * which are basically opaque to us. Don't count those * as something specifc. */ if (last_dp == first_dp) { printf("Ignoring Boot%04x: Only one DP found\n", boot_current); return NOT_SPECIFIC; } if (efi_devpath_to_media_path(last_dp) == NULL) { printf("Ignoring Boot%04x: No Media Path\n", boot_current); return NOT_SPECIFIC; } /* * OK. At this point we either have a good path or a bad one. * Let's check. */ pp = efiblk_get_pdinfo_by_device_path(last_dp); if (pp == NULL) { printf("Ignoring Boot%04x: Device Path not found\n", boot_current); return BAD_CHOICE; } set_currdev_pdinfo(pp); if (!sanity_check_currdev()) { printf("Ignoring Boot%04x: sanity check failed\n", boot_current); return BAD_CHOICE; } /* * OK. We've found a device that matches, next we need to check the last * component of the path. If it's a file, then we set the default kernel * to that. Otherwise, just use this as the default root. * * Reminder: we're running very early, before we've parsed the defaults * file, so we may need to have a hack override. 
*/ dp = efi_devpath_last_node(last_dp); if (DevicePathType(dp) != MEDIA_DEVICE_PATH || DevicePathSubType(dp) != MEDIA_FILEPATH_DP) { printf("Using Boot%04x for root partition\n", boot_current); return (BOOT_INFO_OK); /* use currdir, default kernel */ } fp = (FILEPATH_DEVICE_PATH *)dp; ucs2_to_utf8(fp->PathName, &kernel); if (kernel == NULL) { printf("Not using Boot%04x: can't decode kernel\n", boot_current); return (BAD_CHOICE); } if (*kernel == '\\' || isupper(*kernel)) fix_dosisms(kernel); if (stat(kernel, &st) != 0) { free(kernel); printf("Not using Boot%04x: can't find %s\n", boot_current, kernel); return (BAD_CHOICE); } setenv("kernel", kernel, 1); free(kernel); text = efi_devpath_name(last_dp); if (text) { printf("Using Boot%04x %S + %s\n", boot_current, text, kernel); efi_free_devpath_name(text); } return (BOOT_INFO_OK); } /* * Look at the passed-in boot_info, if any. If we find it then we need * to see if we can find ourselves in the boot chain. If we can, and * there's another specified thing to boot next, assume that the file * is loaded from / and use that for the root filesystem. If can't * find the specified thing, we must fail the boot. If we're last on * the list, then we fallback to looking for the first available / * candidate (ZFS, if there's a bootable zpool, otherwise a UFS * partition that has either /boot/defaults/loader.conf on it or * /boot/kernel/kernel (the default kernel) that we can use. * * We always fail if we can't find the right thing. However, as * a concession to buggy UEFI implementations, like u-boot, if * we have determined that the host is violating the UEFI boot * manager protocol, we'll signal the rest of the program that * a drop to the OK boot loader prompt is possible. */ static int find_currdev(bool do_bootmgr, bool is_last, char *boot_info, size_t boot_info_sz) { pdinfo_t *dp, *pp; EFI_DEVICE_PATH *devpath, *copy; EFI_HANDLE h; CHAR16 *text; struct devsw *dev; int unit; uint64_t extra; int rv; char *rootdev; /* * First choice: if rootdev is already set, use that, even if * it's wrong. */ rootdev = getenv("rootdev"); if (rootdev != NULL) { - printf("Setting currdev to configured rootdev %s\n", rootdev); + printf(" Setting currdev to configured rootdev %s\n", + rootdev); set_currdev(rootdev); return (0); } /* - * Second choice: If we can find out image boot_info, and there's + * Second choice: If uefi_rootdev is set, translate that UEFI device + * path to the loader's internal name and use that. + */ + do { + rootdev = getenv("uefi_rootdev"); + if (rootdev == NULL) + break; + devpath = efi_name_to_devpath(rootdev); + if (devpath == NULL) + break; + dp = efiblk_get_pdinfo_by_device_path(devpath); + efi_devpath_free(devpath); + if (dp == NULL) + break; + printf(" Setting currdev to UEFI path %s\n", + rootdev); + set_currdev_pdinfo(dp); + return (0); + } while (0); + + /* + * Third choice: If we can find out image boot_info, and there's * a follow-on boot image in that boot_info, use that. In this * case root will be the partition specified in that image and * we'll load the kernel specified by the file path. Should there * not be a filepath, we use the default. This filepath overrides * loader.conf. */ if (do_bootmgr) { rv = match_boot_info(boot_info, boot_info_sz); switch (rv) { case BOOT_INFO_OK: /* We found it */ return (0); case BAD_CHOICE: /* specified file not found -> error */ /* XXX do we want to have an escape hatch for last in boot order? 
*/ return (ENOENT); } /* Nothing specified, try normal match */ } #ifdef EFI_ZFS_BOOT /* * Did efi_zfs_probe() detect the boot pool? If so, use the zpool * it found, if it's sane. ZFS is the only thing that looks for * disks and pools to boot. This may change in the future, however, * if we allow specifying which pool to boot from via UEFI variables * rather than the bootenv stuff that FreeBSD uses today. */ if (pool_guid != 0) { printf("Trying ZFS pool\n"); if (probe_zfs_currdev(pool_guid)) return (0); } #endif /* EFI_ZFS_BOOT */ /* * Try to find the block device by its handle based on the * image we're booting. If we can't find a sane partition, * search all the other partitions of the disk. We do not * search other disks because it's a violation of the UEFI * boot protocol to do so. We fail and let UEFI go on to * the next candidate. */ dp = efiblk_get_pdinfo_by_handle(boot_img->DeviceHandle); if (dp != NULL) { text = efi_devpath_name(dp->pd_devpath); if (text != NULL) { printf("Trying ESP: %S\n", text); efi_free_devpath_name(text); } set_currdev_pdinfo(dp); if (sanity_check_currdev()) return (0); if (dp->pd_parent != NULL) { pdinfo_t *espdp = dp; dp = dp->pd_parent; STAILQ_FOREACH(pp, &dp->pd_part, pd_link) { /* Already tried the ESP */ if (espdp == pp) continue; /* * Roll up the ZFS special case * for those partitions that have * zpools on them. */ text = efi_devpath_name(pp->pd_devpath); if (text != NULL) { printf("Trying: %S\n", text); efi_free_devpath_name(text); } if (try_as_currdev(dp, pp)) return (0); } } } /* * Try the device handle from our loaded image first. If that * fails, use the device path from the loaded image and see if * any of the nodes in that path match one of the enumerated * handles. Currently, this handle list is only for netboot. */ if (efi_handle_lookup(boot_img->DeviceHandle, &dev, &unit, &extra) == 0) { set_currdev_devsw(dev, unit); if (sanity_check_currdev()) return (0); } copy = NULL; devpath = efi_lookup_image_devpath(IH); while (devpath != NULL) { h = efi_devpath_handle(devpath); if (h == NULL) break; free(copy); copy = NULL; if (efi_handle_lookup(h, &dev, &unit, &extra) == 0) { set_currdev_devsw(dev, unit); if (sanity_check_currdev()) return (0); } devpath = efi_lookup_devpath(h); if (devpath != NULL) { copy = efi_devpath_trim(devpath); devpath = copy; } } free(copy); return (ENOENT); } static bool interactive_interrupt(const char *msg) { time_t now, then, last; last = 0; now = then = getsecs(); printf("%s\n", msg); if (fail_timeout == -2) /* Always break to OK */ return (true); if (fail_timeout == -1) /* Never break to OK */ return (false); do { if (last != now) { printf("press any key to interrupt reboot in %d seconds\r", fail_timeout - (int)(now - then)); last = now; } /* XXX no pause or timeout wait for char */ if (ischar()) return (true); now = getsecs(); } while (now - then < fail_timeout); return (false); } static int parse_args(int argc, CHAR16 *argv[]) { int i, j, howto; bool vargood; char var[128]; /* * Parse the args to set the console settings, etc * boot1.efi passes these in, if it can read /boot.config or /boot/config * or iPXE may be setup to pass these in. Or the optional argument in the * boot environment was used to pass these arguments in (in which case * neither /boot.config nor /boot/config are consulted). * * Loop through the args, and for each one that contains an '=' that is * not the first character, add it to the environment. This allows * loader and kernel env vars to be passed on the command line. 
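/*
 * Illustrative sketch (the helper name is made up): how the fail_timeout
 * value consumed by interactive_interrupt() above is interpreted.  Inside
 * the countdown window the user still has to press a key before the
 * loader drops to the OK prompt; otherwise control returns to UEFI.
 */
static bool
fail_timeout_allows_prompt(int timeout, time_t waited)
{

	if (timeout == -2)		/* always break to the OK prompt */
		return (true);
	if (timeout == -1)		/* never break; let UEFI try the next entry */
		return (false);
	return (waited < timeout);	/* still within the countdown window */
}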
Convert * args from UCS-2 to ASCII (16 to 8 bit) as they are copied (though this * method is flawed for non-ASCII characters). */ howto = 0; for (i = 1; i < argc; i++) { cpy16to8(argv[i], var, sizeof(var)); howto |= boot_parse_arg(var); } return (howto); } static void setenv_int(const char *key, int val) { char buf[20]; snprintf(buf, sizeof(buf), "%d", val); setenv(key, buf, 1); } /* * Parse ConOut (the list of consoles active) and see if we can find a * serial port and/or a video port. It would be nice to also walk the * ACPI name space to map the UID for the serial port to a port. The * latter is especially hard. */ static int parse_uefi_con_out(void) { int how, rv; int vid_seen = 0, com_seen = 0, seen = 0; size_t sz; char buf[4096], *ep; EFI_DEVICE_PATH *node; ACPI_HID_DEVICE_PATH *acpi; UART_DEVICE_PATH *uart; bool pci_pending; how = 0; sz = sizeof(buf); rv = efi_global_getenv("ConOut", buf, &sz); if (rv != EFI_SUCCESS) goto out; ep = buf + sz; node = (EFI_DEVICE_PATH *)buf; while ((char *)node < ep) { pci_pending = false; if (DevicePathType(node) == ACPI_DEVICE_PATH && DevicePathSubType(node) == ACPI_DP) { /* Check for Serial node */ acpi = (void *)node; if (EISA_ID_TO_NUM(acpi->HID) == 0x501) { setenv_int("efi_8250_uid", acpi->UID); com_seen = ++seen; } } else if (DevicePathType(node) == MESSAGING_DEVICE_PATH && DevicePathSubType(node) == MSG_UART_DP) { uart = (void *)node; setenv_int("efi_com_speed", uart->BaudRate); } else if (DevicePathType(node) == ACPI_DEVICE_PATH && DevicePathSubType(node) == ACPI_ADR_DP) { /* Check for AcpiAdr() Node for video */ vid_seen = ++seen; } else if (DevicePathType(node) == HARDWARE_DEVICE_PATH && DevicePathSubType(node) == HW_PCI_DP) { /* * Note, vmware fusion has a funky console device * PciRoot(0x0)/Pci(0xf,0x0) * which we can only detect at the end since we also * have to cope with: * PciRoot(0x0)/Pci(0x1f,0x0)/Serial(0x1) * so only match it if it's last. */ pci_pending = true; } node = NextDevicePathNode(node); /* Skip the end node */ } if (pci_pending && vid_seen == 0) vid_seen = ++seen; /* * Truth table for RB_MULTIPLE | RB_SERIAL * Value Result * 0 Use only video console * RB_SERIAL Use only serial console * RB_MULTIPLE Use both video and serial console * (but video is primary so gets rc messages) * both Use both video and serial console * (but serial is primary so gets rc messages) * * Try to honor this as best we can. If only one of serial / video * found, then use that. Otherwise, use the first one we found. * This also implies if we found nothing, default to video. 
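/*
 * Background note for the ConOut walk above: ACPI device-path nodes carry
 * an EISA-encoded _HID and EISA_ID_TO_NUM() strips the vendor letters, so
 * the 0x501 test matches PNP0501, the generic 16550-compatible COM port.
 * That is why such a node is counted as the serial console, while video
 * outputs are recognized by their AcpiAdr() (ACPI_ADR_DP) nodes.
 */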
*/ how = 0; if (vid_seen && com_seen) { how |= RB_MULTIPLE; if (com_seen < vid_seen) how |= RB_SERIAL; } else if (com_seen) how |= RB_SERIAL; out: return (how); } +void +parse_loader_efi_config(EFI_HANDLE h, const char *env_fn) +{ + pdinfo_t *dp; + struct stat st; + int fd = -1; + char *env = NULL; + + dp = efiblk_get_pdinfo_by_handle(h); + if (dp == NULL) + return; + set_currdev_pdinfo(dp); + if (stat(env_fn, &st) != 0) + return; + fd = open(env_fn, O_RDONLY); + if (fd == -1) + return; + env = malloc(st.st_size + 1); + if (env == NULL) + goto out; + if (read(fd, env, st.st_size) != st.st_size) + goto out; + env[st.st_size] = '\0'; + boot_parse_cmdline(env); +out: + free(env); + close(fd); +} + +static void +read_loader_env(const char *name, char *def_fn, bool once) +{ + UINTN len; + char *fn, *freeme = NULL; + + len = 0; + fn = def_fn; + if (efi_freebsd_getenv(name, NULL, &len) == EFI_BUFFER_TOO_SMALL) { + freeme = fn = malloc(len + 1); + if (fn != NULL) { + if (efi_freebsd_getenv(name, fn, &len) != EFI_SUCCESS) { + free(fn); + fn = NULL; + printf( + "Can't fetch FreeBSD::%s we know is there\n", name); + } else { + /* + * if tagged as 'once' delete the env variable so we + * only use it once. + */ + if (once) + efi_freebsd_delenv(name); + /* + * We malloced 1 more than len above, then redid the call. + * so now we have room at the end of the string to NUL terminate + * it here, even if the typical idium would have '- 1' here to + * not overflow. len should be the same on return both times. + */ + fn[len] = '\0'; + } + } else { + printf( + "Can't allocate %d bytes to fetch FreeBSD::%s env var\n", + len, name); + } + } + if (fn) { + printf(" Reading loader env vars from %s\n", fn); + parse_loader_efi_config(boot_img->DeviceHandle, fn); + } +} + + + EFI_STATUS main(int argc, CHAR16 *argv[]) { EFI_GUID *guid; int howto, i, uhowto; UINTN k; bool has_kbd, is_last; char *s; EFI_DEVICE_PATH *imgpath; CHAR16 *text; EFI_STATUS rv; size_t sz, bosz = 0, bisz = 0; UINT16 boot_order[100]; char boot_info[4096]; char buf[32]; bool uefi_boot_mgr; archsw.arch_autoload = efi_autoload; archsw.arch_getdev = efi_getdev; archsw.arch_copyin = efi_copyin; archsw.arch_copyout = efi_copyout; archsw.arch_readin = efi_readin; archsw.arch_zfs_probe = efi_zfs_probe; /* Get our loaded image protocol interface structure. */ BS->HandleProtocol(IH, &imgid, (VOID**)&boot_img); /* * Chicken-and-egg problem; we want to have console output early, but * some console attributes may depend on reading from eg. the boot * device, which we can't do yet. We can use printf() etc. once this is * done. So, we set it to the efi console, then call console init. This * gets us printf early, but also primes the pump for all future console * changes to take effect, regardless of where they come from. */ setenv("console", "efi", 1); cons_probe(); /* Init the time source */ efi_time_init(); - has_kbd = has_keyboard(); - /* * Initialise the block cache. Set the upper limit. */ bcache_init(32768, 512); + /* + * Scan the BLOCK IO MEDIA handles then + * march through the device switch probing for things. + */ + i = efipart_inithandles(); + if (i != 0 && i != ENOENT) { + printf("efipart_inithandles failed with ERRNO %d, expect " + "failures\n", i); + } + + for (i = 0; devsw[i] != NULL; i++) + if (devsw[i]->dv_init != NULL) + (devsw[i]->dv_init)(); + + /* + * Detect console settings two different ways: one via the command + * args (eg -h) or via the UEFI ConOut variable. 
+ */ + has_kbd = has_keyboard(); howto = parse_args(argc, argv); if (!has_kbd && (howto & RB_PROBE)) howto |= RB_SERIAL | RB_MULTIPLE; howto &= ~RB_PROBE; uhowto = parse_uefi_con_out(); /* + * Scan the BLOCK IO MEDIA handles then + * march through the device switch probing for things. + */ + i = efipart_inithandles(); + if (i != 0 && i != ENOENT) { + printf("efipart_inithandles failed with ERRNO %d, expect " + "failures\n", i); + } + + for (i = 0; devsw[i] != NULL; i++) + if (devsw[i]->dv_init != NULL) + (devsw[i]->dv_init)(); + + /* + * Read additional environment variables from the boot device's + * "LoaderEnv" file. Any boot loader environment variable may be set + * there, which are subtly different than loader.conf variables. Only + * the 'simple' ones may be set so things like foo_load="YES" won't work + * for two reasons. First, the parser is simplistic and doesn't grok + * quotes. Second, because the variables that cause an action to happen + * are parsed by the lua, 4th or whatever code that's not yet + * loaded. This is relative to the root directory when loader.efi is + * loaded off the UFS root drive (when chain booted), or from the ESP + * when directly loaded by the BIOS. + * + * We also read in NextLoaderEnv if it was specified. This allows next boot + * functionality to be implemented and to override anything in LoaderEnv. + */ + read_loader_env("LoaderEnv", "/efi/freebsd/loader.env", false); + read_loader_env("NextLoaderEnv", NULL, true); + + /* * We now have two notions of console. howto should be viewed as * overrides. If console is already set, don't set it again. */ #define VIDEO_ONLY 0 #define SERIAL_ONLY RB_SERIAL #define VID_SER_BOTH RB_MULTIPLE #define SER_VID_BOTH (RB_SERIAL | RB_MULTIPLE) #define CON_MASK (RB_SERIAL | RB_MULTIPLE) if (strcmp(getenv("console"), "efi") == 0) { if ((howto & CON_MASK) == 0) { /* No override, uhowto is controlling and efi cons is perfect */ howto = howto | (uhowto & CON_MASK); } else if ((howto & CON_MASK) == (uhowto & CON_MASK)) { /* override matches what UEFI told us, efi console is perfect */ } else if ((uhowto & (CON_MASK)) != 0) { /* * We detected a serial console on ConOut. All possible * overrides include serial. We can't really override what efi * gives us, so we use it knowing it's the best choice. */ /* Do nothing */ } else { /* * We detected some kind of serial in the override, but ConOut * has no serial, so we have to sort out which case it really is. */ switch (howto & CON_MASK) { case SERIAL_ONLY: setenv("console", "comconsole", 1); break; case VID_SER_BOTH: setenv("console", "efi comconsole", 1); break; case SER_VID_BOTH: setenv("console", "comconsole efi", 1); break; /* case VIDEO_ONLY can't happen -- it's the first if above */ } } } /* * howto is set now how we want to export the flags to the kernel, so * set the env based on it. */ boot_howto_to_env(howto); if (efi_copy_init()) { printf("failed to allocate staging area\n"); return (EFI_BUFFER_TOO_SMALL); } if ((s = getenv("fail_timeout")) != NULL) fail_timeout = strtol(s, NULL, 10); - /* - * Scan the BLOCK IO MEDIA handles then - * march through the device switch probing for things. 
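/*
 * Illustrative example (file contents are hypothetical): by default
 * read_loader_env() looks for /efi/freebsd/loader.env on the boot device,
 * and the FreeBSD::LoaderEnv / FreeBSD::NextLoaderEnv UEFI variables can
 * point it at a different file.  The file is a flat list of simple
 * name=value assignments for boot_parse_cmdline(), for example:
 *
 *	console=comconsole
 *	fail_timeout=10
 *	uefi_ignore_boot_mgr=YES
 *	rootdev=disk0p2:
 *
 * Quoted values and foo_load="YES" style knobs belong in loader.conf
 * instead, for the reasons given above.
 */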
- */ - i = efipart_inithandles(); - if (i != 0 && i != ENOENT) { - printf("efipart_inithandles failed with ERRNO %d, expect " - "failures\n", i); - } - - for (i = 0; devsw[i] != NULL; i++) - if (devsw[i]->dv_init != NULL) - (devsw[i]->dv_init)(); - printf("%s\n", bootprog_info); printf(" Command line arguments:"); for (i = 0; i < argc; i++) printf(" %S", argv[i]); printf("\n"); printf(" EFI version: %d.%02d\n", ST->Hdr.Revision >> 16, ST->Hdr.Revision & 0xffff); printf(" EFI Firmware: %S (rev %d.%02d)\n", ST->FirmwareVendor, ST->FirmwareRevision >> 16, ST->FirmwareRevision & 0xffff); printf(" Console: %s (%#x)\n", getenv("console"), howto); - - /* Determine the devpath of our image so we can prefer it. */ text = efi_devpath_name(boot_img->FilePath); if (text != NULL) { printf(" Load Path: %S\n", text); efi_setenv_freebsd_wcs("LoaderPath", text); efi_free_devpath_name(text); } rv = BS->HandleProtocol(boot_img->DeviceHandle, &devid, (void **)&imgpath); if (rv == EFI_SUCCESS) { text = efi_devpath_name(imgpath); if (text != NULL) { printf(" Load Device: %S\n", text); efi_setenv_freebsd_wcs("LoaderDev", text); efi_free_devpath_name(text); } } - uefi_boot_mgr = true; - boot_current = 0; - sz = sizeof(boot_current); - rv = efi_global_getenv("BootCurrent", &boot_current, &sz); - if (rv == EFI_SUCCESS) - printf(" BootCurrent: %04x\n", boot_current); - else { - boot_current = 0xffff; + if (getenv("uefi_ignore_boot_mgr") != NULL) { + printf(" Ignoring UEFI boot manager\n"); uefi_boot_mgr = false; - } + } else { + uefi_boot_mgr = true; + boot_current = 0; + sz = sizeof(boot_current); + rv = efi_global_getenv("BootCurrent", &boot_current, &sz); + if (rv == EFI_SUCCESS) + printf(" BootCurrent: %04x\n", boot_current); + else { + boot_current = 0xffff; + uefi_boot_mgr = false; + } - sz = sizeof(boot_order); - rv = efi_global_getenv("BootOrder", &boot_order, &sz); - if (rv == EFI_SUCCESS) { - printf(" BootOrder:"); - for (i = 0; i < sz / sizeof(boot_order[0]); i++) - printf(" %04x%s", boot_order[i], - boot_order[i] == boot_current ? "[*]" : ""); - printf("\n"); - is_last = boot_order[(sz / sizeof(boot_order[0])) - 1] == boot_current; - bosz = sz; - } else if (uefi_boot_mgr) { - /* - * u-boot doesn't set BootOrder, but otherwise participates in the - * boot manager protocol. So we fake it here and don't consider it - * a failure. - */ - bosz = sizeof(boot_order[0]); - boot_order[0] = boot_current; - is_last = true; + sz = sizeof(boot_order); + rv = efi_global_getenv("BootOrder", &boot_order, &sz); + if (rv == EFI_SUCCESS) { + printf(" BootOrder:"); + for (i = 0; i < sz / sizeof(boot_order[0]); i++) + printf(" %04x%s", boot_order[i], + boot_order[i] == boot_current ? "[*]" : ""); + printf("\n"); + is_last = boot_order[(sz / sizeof(boot_order[0])) - 1] == boot_current; + bosz = sz; + } else if (uefi_boot_mgr) { + /* + * u-boot doesn't set BootOrder, but otherwise participates in the + * boot manager protocol. So we fake it here and don't consider it + * a failure. + */ + bosz = sizeof(boot_order[0]); + boot_order[0] = boot_current; + is_last = true; + } } /* * Next, find the boot info structure the UEFI boot manager is * supposed to setup. We need this so we can walk through it to * find where we are in the booting process and what to try to * boot next. */ if (uefi_boot_mgr) { snprintf(buf, sizeof(buf), "Boot%04X", boot_current); sz = sizeof(boot_info); rv = efi_global_getenv(buf, &boot_info, &sz); if (rv == EFI_SUCCESS) bisz = sz; else uefi_boot_mgr = false; } /* * Disable the watchdog timer. 
By default the boot manager sets * the timer to 5 minutes before invoking a boot option. If we * want to return to the boot manager, we have to disable the * watchdog timer and since we're an interactive program, we don't * want to wait until the user types "quit". The timer may have * fired by then. We don't care if this fails. It does not prevent * normal functioning in any way... */ BS->SetWatchdogTimer(0, 0, 0, NULL); /* * Initialize the trusted/forbidden certificates from UEFI. * They will be later used to verify the manifest(s), * which should contain hashes of verified files. * This needs to be initialized before any configuration files * are loaded. */ #ifdef EFI_SECUREBOOT ve_efi_init(); #endif /* * Try and find a good currdev based on the image that was booted. * It might be desirable here to have a short pause to allow falling * through to the boot loader instead of returning instantly to follow * the boot protocol and also allow an escape hatch for users wishing * to try something different. */ if (find_currdev(uefi_boot_mgr, is_last, boot_info, bisz) != 0) - if (!interactive_interrupt("Failed to find bootable partition")) + if (uefi_boot_mgr && + !interactive_interrupt("Failed to find bootable partition")) return (EFI_NOT_FOUND); efi_init_environment(); #if !defined(__arm__) for (k = 0; k < ST->NumberOfTableEntries; k++) { guid = &ST->ConfigurationTable[k].VendorGuid; if (!memcmp(guid, &smbios, sizeof(EFI_GUID))) { char buf[40]; snprintf(buf, sizeof(buf), "%p", ST->ConfigurationTable[k].VendorTable); setenv("hint.smbios.0.mem", buf, 1); smbios_detect(ST->ConfigurationTable[k].VendorTable); break; } } #endif interact(); /* doesn't return */ return (EFI_SUCCESS); /* keep compiler happy */ } COMMAND_SET(poweroff, "poweroff", "power off the system", command_poweroff); static int command_poweroff(int argc __unused, char *argv[] __unused) { int i; for (i = 0; devsw[i] != NULL; ++i) if (devsw[i]->dv_cleanup != NULL) (devsw[i]->dv_cleanup)(); RS->ResetSystem(EfiResetShutdown, EFI_SUCCESS, 0, NULL); /* NOTREACHED */ return (CMD_ERROR); } COMMAND_SET(reboot, "reboot", "reboot the system", command_reboot); static int command_reboot(int argc, char *argv[]) { int i; for (i = 0; devsw[i] != NULL; ++i) if (devsw[i]->dv_cleanup != NULL) (devsw[i]->dv_cleanup)(); RS->ResetSystem(EfiResetCold, EFI_SUCCESS, 0, NULL); /* NOTREACHED */ return (CMD_ERROR); } COMMAND_SET(quit, "quit", "exit the loader", command_quit); static int command_quit(int argc, char *argv[]) { exit(0); return (CMD_OK); } COMMAND_SET(memmap, "memmap", "print memory map", command_memmap); static int command_memmap(int argc __unused, char *argv[] __unused) { UINTN sz; EFI_MEMORY_DESCRIPTOR *map, *p; UINTN key, dsz; UINT32 dver; EFI_STATUS status; int i, ndesc; char line[80]; sz = 0; status = BS->GetMemoryMap(&sz, 0, &key, &dsz, &dver); if (status != EFI_BUFFER_TOO_SMALL) { printf("Can't determine memory map size\n"); return (CMD_ERROR); } map = malloc(sz); status = BS->GetMemoryMap(&sz, map, &key, &dsz, &dver); if (EFI_ERROR(status)) { printf("Can't read memory map\n"); return (CMD_ERROR); } ndesc = sz / dsz; snprintf(line, sizeof(line), "%23s %12s %12s %8s %4s\n", "Type", "Physical", "Virtual", "#Pages", "Attr"); pager_open(); if (pager_output(line)) { pager_close(); return (CMD_OK); } for (i = 0, p = map; i < ndesc; i++, p = NextMemoryDescriptor(p, dsz)) { snprintf(line, sizeof(line), "%23s %012jx %012jx %08jx ", efi_memory_type(p->Type), (uintmax_t)p->PhysicalStart, (uintmax_t)p->VirtualStart, (uintmax_t)p->NumberOfPages); if 
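/*
 * Illustrative sketch of the UEFI "ask for the size first" idiom used by
 * command_memmap() above (and by command_lsefi() below with
 * LocateHandle()): the first call is made with a zero-length buffer so
 * the firmware answers EFI_BUFFER_TOO_SMALL and reports the required
 * size, then the real call is issued.  The helper name is made up.
 */
static EFI_MEMORY_DESCRIPTOR *
fetch_memory_map(UINTN *szp, UINTN *keyp, UINTN *dszp, UINT32 *dverp)
{
	EFI_MEMORY_DESCRIPTOR *map;
	EFI_STATUS status;

	*szp = 0;
	status = BS->GetMemoryMap(szp, NULL, keyp, dszp, dverp);
	if (status != EFI_BUFFER_TOO_SMALL)
		return (NULL);
	map = malloc(*szp);	/* the map may grow; a real caller might retry */
	if (map == NULL)
		return (NULL);
	status = BS->GetMemoryMap(szp, map, keyp, dszp, dverp);
	if (EFI_ERROR(status)) {
		free(map);
		return (NULL);
	}
	return (map);
}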
(pager_output(line)) break; if (p->Attribute & EFI_MEMORY_UC) printf("UC "); if (p->Attribute & EFI_MEMORY_WC) printf("WC "); if (p->Attribute & EFI_MEMORY_WT) printf("WT "); if (p->Attribute & EFI_MEMORY_WB) printf("WB "); if (p->Attribute & EFI_MEMORY_UCE) printf("UCE "); if (p->Attribute & EFI_MEMORY_WP) printf("WP "); if (p->Attribute & EFI_MEMORY_RP) printf("RP "); if (p->Attribute & EFI_MEMORY_XP) printf("XP "); if (p->Attribute & EFI_MEMORY_NV) printf("NV "); if (p->Attribute & EFI_MEMORY_MORE_RELIABLE) printf("MR "); if (p->Attribute & EFI_MEMORY_RO) printf("RO "); if (pager_output("\n")) break; } pager_close(); return (CMD_OK); } COMMAND_SET(configuration, "configuration", "print configuration tables", command_configuration); static int command_configuration(int argc, char *argv[]) { UINTN i; char *name; printf("NumberOfTableEntries=%lu\n", (unsigned long)ST->NumberOfTableEntries); for (i = 0; i < ST->NumberOfTableEntries; i++) { EFI_GUID *guid; printf(" "); guid = &ST->ConfigurationTable[i].VendorGuid; if (efi_guid_to_name(guid, &name) == true) { printf(name); free(name); } else { printf("Error while translating UUID to name"); } printf(" at %p\n", ST->ConfigurationTable[i].VendorTable); } return (CMD_OK); } COMMAND_SET(mode, "mode", "change or display EFI text modes", command_mode); static int command_mode(int argc, char *argv[]) { UINTN cols, rows; unsigned int mode; int i; char *cp; char rowenv[8]; EFI_STATUS status; SIMPLE_TEXT_OUTPUT_INTERFACE *conout; extern void HO(void); conout = ST->ConOut; if (argc > 1) { mode = strtol(argv[1], &cp, 0); if (cp[0] != '\0') { printf("Invalid mode\n"); return (CMD_ERROR); } status = conout->QueryMode(conout, mode, &cols, &rows); if (EFI_ERROR(status)) { printf("invalid mode %d\n", mode); return (CMD_ERROR); } status = conout->SetMode(conout, mode); if (EFI_ERROR(status)) { printf("couldn't set mode %d\n", mode); return (CMD_ERROR); } sprintf(rowenv, "%u", (unsigned)rows); setenv("LINES", rowenv, 1); HO(); /* set cursor */ return (CMD_OK); } printf("Current mode: %d\n", conout->Mode->Mode); for (i = 0; i <= conout->Mode->MaxMode; i++) { status = conout->QueryMode(conout, i, &cols, &rows); if (EFI_ERROR(status)) continue; printf("Mode %d: %u columns, %u rows\n", i, (unsigned)cols, (unsigned)rows); } if (i != 0) printf("Select a mode with the command \"mode \"\n"); return (CMD_OK); } COMMAND_SET(lsefi, "lsefi", "list EFI handles", command_lsefi); static int command_lsefi(int argc __unused, char *argv[] __unused) { char *name; EFI_HANDLE *buffer = NULL; EFI_HANDLE handle; UINTN bufsz = 0, i, j; EFI_STATUS status; int ret = 0; status = BS->LocateHandle(AllHandles, NULL, NULL, &bufsz, buffer); if (status != EFI_BUFFER_TOO_SMALL) { snprintf(command_errbuf, sizeof (command_errbuf), "unexpected error: %lld", (long long)status); return (CMD_ERROR); } if ((buffer = malloc(bufsz)) == NULL) { sprintf(command_errbuf, "out of memory"); return (CMD_ERROR); } status = BS->LocateHandle(AllHandles, NULL, NULL, &bufsz, buffer); if (EFI_ERROR(status)) { free(buffer); snprintf(command_errbuf, sizeof (command_errbuf), "LocateHandle() error: %lld", (long long)status); return (CMD_ERROR); } pager_open(); for (i = 0; i < (bufsz / sizeof (EFI_HANDLE)); i++) { UINTN nproto = 0; EFI_GUID **protocols = NULL; handle = buffer[i]; printf("Handle %p", handle); if (pager_output("\n")) break; /* device path */ status = BS->ProtocolsPerHandle(handle, &protocols, &nproto); if (EFI_ERROR(status)) { snprintf(command_errbuf, sizeof (command_errbuf), "ProtocolsPerHandle() error: 
%lld", (long long)status); continue; } for (j = 0; j < nproto; j++) { if (efi_guid_to_name(protocols[j], &name) == true) { printf(" %s", name); free(name); } else { printf("Error while translating UUID to name"); } if ((ret = pager_output("\n")) != 0) break; } BS->FreePool(protocols); if (ret != 0) break; } pager_close(); free(buffer); return (CMD_OK); } #ifdef LOADER_FDT_SUPPORT extern int command_fdt_internal(int argc, char *argv[]); /* * Since proper fdt command handling function is defined in fdt_loader_cmd.c, * and declaring it as extern is in contradiction with COMMAND_SET() macro * (which uses static pointer), we're defining wrapper function, which * calls the proper fdt handling routine. */ static int command_fdt(int argc, char *argv[]) { return (command_fdt_internal(argc, argv)); } COMMAND_SET(fdt, "fdt", "flattened device tree handling", command_fdt); #endif /* * Chain load another efi loader. */ static int command_chain(int argc, char *argv[]) { EFI_GUID LoadedImageGUID = LOADED_IMAGE_PROTOCOL; EFI_HANDLE loaderhandle; EFI_LOADED_IMAGE *loaded_image; EFI_STATUS status; struct stat st; struct devdesc *dev; char *name, *path; void *buf; int fd; if (argc < 2) { command_errmsg = "wrong number of arguments"; return (CMD_ERROR); } name = argv[1]; if ((fd = open(name, O_RDONLY)) < 0) { command_errmsg = "no such file"; return (CMD_ERROR); } if (fstat(fd, &st) < -1) { command_errmsg = "stat failed"; close(fd); return (CMD_ERROR); } status = BS->AllocatePool(EfiLoaderCode, (UINTN)st.st_size, &buf); if (status != EFI_SUCCESS) { command_errmsg = "failed to allocate buffer"; close(fd); return (CMD_ERROR); } if (read(fd, buf, st.st_size) != st.st_size) { command_errmsg = "error while reading the file"; (void)BS->FreePool(buf); close(fd); return (CMD_ERROR); } close(fd); status = BS->LoadImage(FALSE, IH, NULL, buf, st.st_size, &loaderhandle); (void)BS->FreePool(buf); if (status != EFI_SUCCESS) { command_errmsg = "LoadImage failed"; return (CMD_ERROR); } status = BS->HandleProtocol(loaderhandle, &LoadedImageGUID, (void **)&loaded_image); if (argc > 2) { int i, len = 0; CHAR16 *argp; for (i = 2; i < argc; i++) len += strlen(argv[i]) + 1; len *= sizeof (*argp); loaded_image->LoadOptions = argp = malloc (len); loaded_image->LoadOptionsSize = len; for (i = 2; i < argc; i++) { char *ptr = argv[i]; while (*ptr) *(argp++) = *(ptr++); *(argp++) = ' '; } *(--argv) = 0; } if (efi_getdev((void **)&dev, name, (const char **)&path) == 0) { #ifdef EFI_ZFS_BOOT struct zfs_devdesc *z_dev; #endif struct disk_devdesc *d_dev; pdinfo_t *hd, *pd; switch (dev->d_dev->dv_type) { #ifdef EFI_ZFS_BOOT case DEVT_ZFS: z_dev = (struct zfs_devdesc *)dev; loaded_image->DeviceHandle = efizfs_get_handle_by_guid(z_dev->pool_guid); break; #endif case DEVT_NET: loaded_image->DeviceHandle = efi_find_handle(dev->d_dev, dev->d_unit); break; default: hd = efiblk_get_pdinfo(dev); if (STAILQ_EMPTY(&hd->pd_part)) { loaded_image->DeviceHandle = hd->pd_handle; break; } d_dev = (struct disk_devdesc *)dev; STAILQ_FOREACH(pd, &hd->pd_part, pd_link) { /* * d_partition should be 255 */ if (pd->pd_unit == (uint32_t)d_dev->d_slice) { loaded_image->DeviceHandle = pd->pd_handle; break; } } break; } } dev_cleanup(); status = BS->StartImage(loaderhandle, NULL, NULL); if (status != EFI_SUCCESS) { command_errmsg = "StartImage failed"; free(loaded_image->LoadOptions); loaded_image->LoadOptions = NULL; status = BS->UnloadImage(loaded_image); return (CMD_ERROR); } return (CMD_ERROR); /* not reached */ } COMMAND_SET(chain, "chain", "chain load file", 
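/*
 * Usage note (the paths are examples only): from the loader prompt the
 * chain command loads another EFI binary from the current device and
 * transfers control to it; any extra words are passed through the image's
 * LoadOptions:
 *
 *	OK chain /efi/boot/bootx64.efi
 *	OK chain /efi/other/loader.efi -v
 */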
command_chain); Index: user/ngie/bug-237403/stand/fdt/fdt_loader_cmd.c =================================================================== --- user/ngie/bug-237403/stand/fdt/fdt_loader_cmd.c (revision 346925) +++ user/ngie/bug-237403/stand/fdt/fdt_loader_cmd.c (revision 346926) @@ -1,1851 +1,1861 @@ /*- * Copyright (c) 2009-2010 The FreeBSD Foundation * All rights reserved. * * This software was developed by Semihalf under sponsorship from * the FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include "bootstrap.h" #include "fdt_platform.h" #ifdef DEBUG #define debugf(fmt, args...) do { printf("%s(): ", __func__); \ printf(fmt,##args); } while (0) #else #define debugf(fmt, args...) #endif #define FDT_CWD_LEN 256 #define FDT_MAX_DEPTH 12 #define FDT_PROP_SEP " = " #define COPYOUT(s,d,l) archsw.arch_copyout(s, d, l) #define COPYIN(s,d,l) archsw.arch_copyin(s, d, l) #define FDT_STATIC_DTB_SYMBOL "fdt_static_dtb" #define CMD_REQUIRES_BLOB 0x01 /* Location of FDT yet to be loaded. */ /* This may be in read-only memory, so can't be manipulated directly. */ static struct fdt_header *fdt_to_load = NULL; /* Location of FDT on heap. */ /* This is the copy we actually manipulate. 
*/ static struct fdt_header *fdtp = NULL; /* Size of FDT blob */ static size_t fdtp_size = 0; static int fdt_load_dtb(vm_offset_t va); static void fdt_print_overlay_load_error(int err, const char *filename); static int fdt_check_overlay_compatible(void *base_fdt, void *overlay_fdt); static int fdt_cmd_nyi(int argc, char *argv[]); static int fdt_load_dtb_overlays_string(const char * filenames); static int fdt_cmd_addr(int argc, char *argv[]); static int fdt_cmd_mkprop(int argc, char *argv[]); static int fdt_cmd_cd(int argc, char *argv[]); static int fdt_cmd_hdr(int argc, char *argv[]); static int fdt_cmd_ls(int argc, char *argv[]); static int fdt_cmd_prop(int argc, char *argv[]); static int fdt_cmd_pwd(int argc, char *argv[]); static int fdt_cmd_rm(int argc, char *argv[]); static int fdt_cmd_mknode(int argc, char *argv[]); static int fdt_cmd_mres(int argc, char *argv[]); typedef int cmdf_t(int, char *[]); struct cmdtab { const char *name; cmdf_t *handler; int flags; }; static const struct cmdtab commands[] = { { "addr", &fdt_cmd_addr, 0 }, { "alias", &fdt_cmd_nyi, 0 }, { "cd", &fdt_cmd_cd, CMD_REQUIRES_BLOB }, { "header", &fdt_cmd_hdr, CMD_REQUIRES_BLOB }, { "ls", &fdt_cmd_ls, CMD_REQUIRES_BLOB }, { "mknode", &fdt_cmd_mknode, CMD_REQUIRES_BLOB }, { "mkprop", &fdt_cmd_mkprop, CMD_REQUIRES_BLOB }, { "mres", &fdt_cmd_mres, CMD_REQUIRES_BLOB }, { "prop", &fdt_cmd_prop, CMD_REQUIRES_BLOB }, { "pwd", &fdt_cmd_pwd, CMD_REQUIRES_BLOB }, { "rm", &fdt_cmd_rm, CMD_REQUIRES_BLOB }, { NULL, NULL } }; static char cwd[FDT_CWD_LEN] = "/"; static vm_offset_t fdt_find_static_dtb() { Elf_Ehdr *ehdr; Elf_Shdr *shdr; Elf_Sym sym; vm_offset_t strtab, symtab, fdt_start; uint64_t offs; struct preloaded_file *kfp; struct file_metadata *md; char *strp; int i, sym_count; debugf("fdt_find_static_dtb()\n"); sym_count = symtab = strtab = 0; strp = NULL; offs = __elfN(relocation_offset); kfp = file_findfile(NULL, NULL); if (kfp == NULL) return (0); /* Locate the dynamic symbols and strtab. */ md = file_findmetadata(kfp, MODINFOMD_ELFHDR); if (md == NULL) return (0); ehdr = (Elf_Ehdr *)md->md_data; md = file_findmetadata(kfp, MODINFOMD_SHDR); if (md == NULL) return (0); shdr = (Elf_Shdr *)md->md_data; for (i = 0; i < ehdr->e_shnum; ++i) { if (shdr[i].sh_type == SHT_DYNSYM && symtab == 0) { symtab = shdr[i].sh_addr + offs; sym_count = shdr[i].sh_size / sizeof(Elf_Sym); } else if (shdr[i].sh_type == SHT_STRTAB && strtab == 0) { strtab = shdr[i].sh_addr + offs; } } /* * The most efficient way to find a symbol would be to calculate a * hash, find proper bucket and chain, and thus find a symbol. * However, that would involve code duplication (e.g. for hash * function). So we're using simpler and a bit slower way: we're * iterating through symbols, searching for the one which name is * 'equal' to 'fdt_static_dtb'. To speed up the process a little bit, * we are eliminating symbols type of which is not STT_NOTYPE, or(and) * those which binding attribute is not STB_GLOBAL. 
*/ fdt_start = 0; while (sym_count > 0 && fdt_start == 0) { COPYOUT(symtab, &sym, sizeof(sym)); symtab += sizeof(sym); --sym_count; if (ELF_ST_BIND(sym.st_info) != STB_GLOBAL || ELF_ST_TYPE(sym.st_info) != STT_NOTYPE) continue; strp = strdupout(strtab + sym.st_name); if (strcmp(strp, FDT_STATIC_DTB_SYMBOL) == 0) fdt_start = (vm_offset_t)sym.st_value + offs; free(strp); } return (fdt_start); } static int fdt_load_dtb(vm_offset_t va) { struct fdt_header header; int err; debugf("fdt_load_dtb(0x%08jx)\n", (uintmax_t)va); COPYOUT(va, &header, sizeof(header)); err = fdt_check_header(&header); if (err < 0) { if (err == -FDT_ERR_BADVERSION) { snprintf(command_errbuf, sizeof(command_errbuf), "incompatible blob version: %d, should be: %d", fdt_version(fdtp), FDT_LAST_SUPPORTED_VERSION); } else { snprintf(command_errbuf, sizeof(command_errbuf), "error validating blob: %s", fdt_strerror(err)); } return (1); } /* * Release previous blob */ if (fdtp) free(fdtp); fdtp_size = fdt_totalsize(&header); fdtp = malloc(fdtp_size); if (fdtp == NULL) { command_errmsg = "can't allocate memory for device tree copy"; return (1); } COPYOUT(va, fdtp, fdtp_size); debugf("DTB blob found at 0x%jx, size: 0x%jx\n", (uintmax_t)va, (uintmax_t)fdtp_size); return (0); } int fdt_load_dtb_addr(struct fdt_header *header) { int err; debugf("fdt_load_dtb_addr(%p)\n", header); fdtp_size = fdt_totalsize(header); err = fdt_check_header(header); if (err < 0) { snprintf(command_errbuf, sizeof(command_errbuf), "error validating blob: %s", fdt_strerror(err)); return (err); } free(fdtp); if ((fdtp = malloc(fdtp_size)) == NULL) { command_errmsg = "can't allocate memory for device tree copy"; return (1); } bcopy(header, fdtp, fdtp_size); return (0); } int fdt_load_dtb_file(const char * filename) { struct preloaded_file *bfp, *oldbfp; int err; debugf("fdt_load_dtb_file(%s)\n", filename); oldbfp = file_findfile(NULL, "dtb"); /* Attempt to load and validate a new dtb from a file. */ if ((bfp = file_loadraw(filename, "dtb", 1)) == NULL) { snprintf(command_errbuf, sizeof(command_errbuf), "failed to load file '%s'", filename); return (1); } if ((err = fdt_load_dtb(bfp->f_addr)) != 0) { file_discard(bfp); return (err); } /* A new dtb was validated, discard any previous file. */ if (oldbfp) file_discard(oldbfp); return (0); } static int fdt_load_dtb_overlay(const char * filename) { struct preloaded_file *bfp; struct fdt_header header; int err; debugf("fdt_load_dtb_overlay(%s)\n", filename); /* Attempt to load and validate a new dtb from a file. FDT_ERR_NOTFOUND * is normally a libfdt error code, but libfdt would actually return * -FDT_ERR_NOTFOUND. We re-purpose the error code here to convey a * similar meaning: the file itself was not found, which can still be * considered an error dealing with FDT pieces. 
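/*
 * Usage note (the path is an example, and this reflects common loader
 * usage rather than anything added here): fdt_load_dtb_file() picks up
 * blobs preloaded as type "dtb", e.g. from the loader prompt:
 *
 *	OK load -t dtb /boot/dtb/myboard.dtb
 *
 * fdt_setup_fdtp() will then prefer that file over a blob found in memory
 * or compiled into the kernel.
 */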
*/ if ((bfp = file_loadraw(filename, "dtbo", 1)) == NULL) return (FDT_ERR_NOTFOUND); COPYOUT(bfp->f_addr, &header, sizeof(header)); err = fdt_check_header(&header); if (err < 0) { file_discard(bfp); return (err); } return (0); } static void fdt_print_overlay_load_error(int err, const char *filename) { switch (err) { case FDT_ERR_NOTFOUND: printf("%s: failed to load file\n", filename); break; case -FDT_ERR_BADVERSION: printf("%s: incompatible blob version: %d, should be: %d\n", filename, fdt_version(fdtp), FDT_LAST_SUPPORTED_VERSION); break; default: /* libfdt errs are negative */ if (err < 0) printf("%s: error validating blob: %s\n", filename, fdt_strerror(err)); else printf("%s: unknown load error\n", filename); break; } } static int fdt_load_dtb_overlays_string(const char * filenames) { char *names; char *name, *name_ext; char *comaptr; int err, namesz; debugf("fdt_load_dtb_overlays_string(%s)\n", filenames); names = strdup(filenames); if (names == NULL) return (1); name = names; do { comaptr = strchr(name, ','); if (comaptr) *comaptr = '\0'; err = fdt_load_dtb_overlay(name); if (err == FDT_ERR_NOTFOUND) { /* Allocate enough to append ".dtbo" */ namesz = strlen(name) + 6; name_ext = malloc(namesz); if (name_ext == NULL) { fdt_print_overlay_load_error(err, name); name = comaptr + 1; continue; } snprintf(name_ext, namesz, "%s.dtbo", name); err = fdt_load_dtb_overlay(name_ext); free(name_ext); } /* Catch error with either initial load or fallback load */ if (err != 0) fdt_print_overlay_load_error(err, name); name = comaptr + 1; } while(comaptr); free(names); return (0); } /* * fdt_check_overlay_compatible - check that the overlay_fdt is compatible with * base_fdt before we attempt to apply it. It will need to re-calculate offsets * in the base every time, rather than trying to cache them earlier in the * process, because the overlay application process can/will invalidate a lot of * offsets. */ static int fdt_check_overlay_compatible(void *base_fdt, void *overlay_fdt) { const char *compat; int compat_len, ocompat_len; int oroot_offset, root_offset; int slidx, sllen; oroot_offset = fdt_path_offset(overlay_fdt, "/"); if (oroot_offset < 0) return (oroot_offset); /* * If /compatible in the overlay does not exist or if it is empty, then * we're automatically compatible. We do this for the sake of rapid * overlay development for overlays that aren't intended to be deployed. * The user assumes the risk of using an overlay without /compatible. */ if (fdt_get_property(overlay_fdt, oroot_offset, "compatible", &ocompat_len) == NULL || ocompat_len == 0) return (0); root_offset = fdt_path_offset(base_fdt, "/"); if (root_offset < 0) return (root_offset); /* * However, an empty or missing /compatible on the base is an error, * because allowing this offers no advantages. */ if (fdt_get_property(base_fdt, root_offset, "compatible", &compat_len) == NULL) return (compat_len); else if(compat_len == 0) return (1); slidx = 0; compat = fdt_stringlist_get(overlay_fdt, oroot_offset, "compatible", slidx, &sllen); while (compat != NULL) { if (fdt_stringlist_search(base_fdt, root_offset, "compatible", compat) >= 0) return (0); ++slidx; compat = fdt_stringlist_get(overlay_fdt, oroot_offset, "compatible", slidx, &sllen); }; /* We've exhausted the overlay's /compatible property... 
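/*
 * Usage note (overlay names are made up): overlays arrive as a
 * comma-separated list, either from the pre-loader environment or from
 * the fdt_overlays variable, e.g. in loader.conf:
 *
 *	fdt_overlays="myboard-i2c0,myboard-spi0.dtbo"
 *
 * An entry that is not found as given gets ".dtbo" appended and is tried
 * again, as implemented above.
 */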
no match */ return (1); } void fdt_apply_overlays() { struct preloaded_file *fp; size_t max_overlay_size, next_fdtp_size; size_t current_fdtp_size; void *current_fdtp; void *next_fdtp; void *overlay; int rv; if ((fdtp == NULL) || (fdtp_size == 0)) return; max_overlay_size = 0; for (fp = file_findfile(NULL, "dtbo"); fp != NULL; fp = fp->f_next) { if (max_overlay_size < fp->f_size) max_overlay_size = fp->f_size; } /* Nothing to apply */ if (max_overlay_size == 0) return; overlay = malloc(max_overlay_size); if (overlay == NULL) { printf("failed to allocate memory for DTB blob with overlays\n"); return; } current_fdtp = fdtp; current_fdtp_size = fdtp_size; for (fp = file_findfile(NULL, "dtbo"); fp != NULL; fp = fp->f_next) { COPYOUT(fp->f_addr, overlay, fp->f_size); /* Check compatible first to avoid unnecessary allocation */ rv = fdt_check_overlay_compatible(current_fdtp, overlay); if (rv != 0) { printf("DTB overlay '%s' not compatible\n", fp->f_name); continue; } printf("applying DTB overlay '%s'\n", fp->f_name); next_fdtp_size = current_fdtp_size + fp->f_size; next_fdtp = malloc(next_fdtp_size); if (next_fdtp == NULL) { /* * Output warning, then move on to applying other * overlays in case this one is simply too large. */ printf("failed to allocate memory for overlay base\n"); continue; } rv = fdt_open_into(current_fdtp, next_fdtp, next_fdtp_size); if (rv != 0) { free(next_fdtp); printf("failed to open base dtb into overlay base\n"); continue; } /* Both overlay and next_fdtp may be modified in place */ rv = fdt_overlay_apply(next_fdtp, overlay); if (rv == 0) { /* Rotate next -> current */ if (current_fdtp != fdtp) free(current_fdtp); current_fdtp = next_fdtp; current_fdtp_size = next_fdtp_size; } else { /* * Assume here that the base we tried to apply on is * either trashed or in an inconsistent state. Trying to * load it might work, but it's better to discard it and * play it safe. */ free(next_fdtp); printf("failed to apply overlay: %s\n", fdt_strerror(rv)); } } /* We could have failed to apply all overlays; then we do nothing */ if (current_fdtp != fdtp) { free(fdtp); fdtp = current_fdtp; fdtp_size = current_fdtp_size; } free(overlay); } int +fdt_is_setup(void) +{ + + if (fdtp != NULL) + return (1); + + return (0); +} + +int fdt_setup_fdtp() { struct preloaded_file *bfp; vm_offset_t va; debugf("fdt_setup_fdtp()\n"); /* If we already loaded a file, use it. */ if ((bfp = file_findfile(NULL, "dtb")) != NULL) { if (fdt_load_dtb(bfp->f_addr) == 0) { printf("Using DTB from loaded file '%s'.\n", bfp->f_name); fdt_platform_load_overlays(); return (0); } } /* If we were given the address of a valid blob in memory, use it. */ if (fdt_to_load != NULL) { if (fdt_load_dtb_addr(fdt_to_load) == 0) { printf("Using DTB from memory address %p.\n", fdt_to_load); fdt_platform_load_overlays(); return (0); } } if (fdt_platform_load_dtb() == 0) { fdt_platform_load_overlays(); return (0); } /* If there is a dtb compiled into the kernel, use it. 
*/ if ((va = fdt_find_static_dtb()) != 0) { if (fdt_load_dtb(va) == 0) { printf("Using DTB compiled into kernel.\n"); return (0); } } command_errmsg = "No device tree blob found!\n"; return (1); } #define fdt_strtovect(str, cellbuf, lim, cellsize) _fdt_strtovect((str), \ (cellbuf), (lim), (cellsize), 0); /* Force using base 16 */ #define fdt_strtovectx(str, cellbuf, lim, cellsize) _fdt_strtovect((str), \ (cellbuf), (lim), (cellsize), 16); static int _fdt_strtovect(const char *str, void *cellbuf, int lim, unsigned char cellsize, uint8_t base) { const char *buf = str; const char *end = str + strlen(str) - 2; uint32_t *u32buf = NULL; uint8_t *u8buf = NULL; int cnt = 0; if (cellsize == sizeof(uint32_t)) u32buf = (uint32_t *)cellbuf; else u8buf = (uint8_t *)cellbuf; if (lim == 0) return (0); while (buf < end) { /* Skip white whitespace(s)/separators */ while (!isxdigit(*buf) && buf < end) buf++; if (u32buf != NULL) u32buf[cnt] = cpu_to_fdt32((uint32_t)strtol(buf, NULL, base)); else u8buf[cnt] = (uint8_t)strtol(buf, NULL, base); if (cnt + 1 <= lim - 1) cnt++; else break; buf++; /* Find another number */ while ((isxdigit(*buf) || *buf == 'x') && buf < end) buf++; } return (cnt); } void fdt_fixup_ethernet(const char *str, char *ethstr, int len) { uint8_t tmp_addr[6]; /* Convert macaddr string into a vector of uints */ fdt_strtovectx(str, &tmp_addr, 6, sizeof(uint8_t)); /* Set actual property to a value from vect */ fdt_setprop(fdtp, fdt_path_offset(fdtp, ethstr), "local-mac-address", &tmp_addr, 6 * sizeof(uint8_t)); } void fdt_fixup_cpubusfreqs(unsigned long cpufreq, unsigned long busfreq) { int lo, o = 0, o2, maxo = 0, depth; const uint32_t zero = 0; /* We want to modify every subnode of /cpus */ o = fdt_path_offset(fdtp, "/cpus"); if (o < 0) return; /* maxo should contain offset of node next to /cpus */ depth = 0; maxo = o; while (depth != -1) maxo = fdt_next_node(fdtp, maxo, &depth); /* Find CPU frequency properties */ o = fdt_node_offset_by_prop_value(fdtp, o, "clock-frequency", &zero, sizeof(uint32_t)); o2 = fdt_node_offset_by_prop_value(fdtp, o, "bus-frequency", &zero, sizeof(uint32_t)); lo = MIN(o, o2); while (o != -FDT_ERR_NOTFOUND && o2 != -FDT_ERR_NOTFOUND) { o = fdt_node_offset_by_prop_value(fdtp, lo, "clock-frequency", &zero, sizeof(uint32_t)); o2 = fdt_node_offset_by_prop_value(fdtp, lo, "bus-frequency", &zero, sizeof(uint32_t)); /* We're only interested in /cpus subnode(s) */ if (lo > maxo) break; fdt_setprop_inplace_cell(fdtp, lo, "clock-frequency", (uint32_t)cpufreq); fdt_setprop_inplace_cell(fdtp, lo, "bus-frequency", (uint32_t)busfreq); lo = MIN(o, o2); } } #ifdef notyet static int fdt_reg_valid(uint32_t *reg, int len, int addr_cells, int size_cells) { int cells_in_tuple, i, tuples, tuple_size; uint32_t cur_start, cur_size; cells_in_tuple = (addr_cells + size_cells); tuple_size = cells_in_tuple * sizeof(uint32_t); tuples = len / tuple_size; if (tuples == 0) return (EINVAL); for (i = 0; i < tuples; i++) { if (addr_cells == 2) cur_start = fdt64_to_cpu(reg[i * cells_in_tuple]); else cur_start = fdt32_to_cpu(reg[i * cells_in_tuple]); if (size_cells == 2) cur_size = fdt64_to_cpu(reg[i * cells_in_tuple + 2]); else cur_size = fdt32_to_cpu(reg[i * cells_in_tuple + 1]); if (cur_size == 0) return (EINVAL); debugf(" reg#%d (start: 0x%0x size: 0x%0x) valid!\n", i, cur_start, cur_size); } return (0); } #endif void fdt_fixup_memory(struct fdt_mem_region *region, size_t num) { struct fdt_mem_region *curmr; uint32_t addr_cells, size_cells; uint32_t *addr_cellsp, *size_cellsp; int err, i, len, 
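/*
 * Illustrative example (address and node path are made up) of the MAC
 * fixup below: fdt_strtovectx() converts the textual address into six
 * bytes which then become the node's local-mac-address property:
 *
 *	fdt_fixup_ethernet("00:11:22:33:44:55", "/soc/ethernet@1c30000", 0);
 *	=> local-mac-address = [00 11 22 33 44 55];
 */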
memory, root; size_t realmrno; uint8_t *buf, *sb; uint64_t rstart, rsize; int reserved; root = fdt_path_offset(fdtp, "/"); if (root < 0) { sprintf(command_errbuf, "Could not find root node !"); return; } memory = fdt_path_offset(fdtp, "/memory"); if (memory <= 0) { /* Create proper '/memory' node. */ memory = fdt_add_subnode(fdtp, root, "memory"); if (memory <= 0) { snprintf(command_errbuf, sizeof(command_errbuf), "Could not fixup '/memory' " "node, error code : %d!\n", memory); return; } err = fdt_setprop(fdtp, memory, "device_type", "memory", sizeof("memory")); if (err < 0) return; } addr_cellsp = (uint32_t *)fdt_getprop(fdtp, root, "#address-cells", NULL); size_cellsp = (uint32_t *)fdt_getprop(fdtp, root, "#size-cells", NULL); if (addr_cellsp == NULL || size_cellsp == NULL) { snprintf(command_errbuf, sizeof(command_errbuf), "Could not fixup '/memory' node : " "%s %s property not found in root node!\n", (!addr_cellsp) ? "#address-cells" : "", (!size_cellsp) ? "#size-cells" : ""); return; } addr_cells = fdt32_to_cpu(*addr_cellsp); size_cells = fdt32_to_cpu(*size_cellsp); /* * Convert memreserve data to memreserve property * Check if property already exists */ reserved = fdt_num_mem_rsv(fdtp); if (reserved && (fdt_getprop(fdtp, root, "memreserve", NULL) == NULL)) { len = (addr_cells + size_cells) * reserved * sizeof(uint32_t); sb = buf = (uint8_t *)malloc(len); if (!buf) return; bzero(buf, len); for (i = 0; i < reserved; i++) { if (fdt_get_mem_rsv(fdtp, i, &rstart, &rsize)) break; if (rsize) { /* Ensure endianness, and put cells into a buffer */ if (addr_cells == 2) *(uint64_t *)buf = cpu_to_fdt64(rstart); else *(uint32_t *)buf = cpu_to_fdt32(rstart); buf += sizeof(uint32_t) * addr_cells; if (size_cells == 2) *(uint64_t *)buf = cpu_to_fdt64(rsize); else *(uint32_t *)buf = cpu_to_fdt32(rsize); buf += sizeof(uint32_t) * size_cells; } } /* Set property */ if ((err = fdt_setprop(fdtp, root, "memreserve", sb, len)) < 0) printf("Could not fixup 'memreserve' property.\n"); free(sb); } /* Count valid memory regions entries in sysinfo. 
*/ realmrno = num; for (i = 0; i < num; i++) if (region[i].start == 0 && region[i].size == 0) realmrno--; if (realmrno == 0) { sprintf(command_errbuf, "Could not fixup '/memory' node : " "sysinfo doesn't contain valid memory regions info!\n"); return; } len = (addr_cells + size_cells) * realmrno * sizeof(uint32_t); sb = buf = (uint8_t *)malloc(len); if (!buf) return; bzero(buf, len); for (i = 0; i < num; i++) { curmr = ®ion[i]; if (curmr->size != 0) { /* Ensure endianness, and put cells into a buffer */ if (addr_cells == 2) *(uint64_t *)buf = cpu_to_fdt64(curmr->start); else *(uint32_t *)buf = cpu_to_fdt32(curmr->start); buf += sizeof(uint32_t) * addr_cells; if (size_cells == 2) *(uint64_t *)buf = cpu_to_fdt64(curmr->size); else *(uint32_t *)buf = cpu_to_fdt32(curmr->size); buf += sizeof(uint32_t) * size_cells; } } /* Set property */ if ((err = fdt_setprop(fdtp, memory, "reg", sb, len)) < 0) sprintf(command_errbuf, "Could not fixup '/memory' node.\n"); free(sb); } void fdt_fixup_stdout(const char *str) { char *ptr; int len, no, sero; const struct fdt_property *prop; char *tmp[10]; ptr = (char *)str + strlen(str) - 1; while (ptr > str && isdigit(*(str - 1))) str--; if (ptr == str) return; no = fdt_path_offset(fdtp, "/chosen"); if (no < 0) return; prop = fdt_get_property(fdtp, no, "stdout", &len); /* If /chosen/stdout does not extist, create it */ if (prop == NULL || (prop != NULL && len == 0)) { bzero(tmp, 10 * sizeof(char)); strcpy((char *)&tmp, "serial"); if (strlen(ptr) > 3) /* Serial number too long */ return; strncpy((char *)tmp + 6, ptr, 3); sero = fdt_path_offset(fdtp, (const char *)tmp); if (sero < 0) /* * If serial device we're trying to assign * stdout to doesn't exist in DT -- return. */ return; fdt_setprop(fdtp, no, "stdout", &tmp, strlen((char *)&tmp) + 1); fdt_setprop(fdtp, no, "stdin", &tmp, strlen((char *)&tmp) + 1); } } void fdt_load_dtb_overlays(const char *extras) { const char *s; /* Any extra overlays supplied by pre-loader environment */ if (extras != NULL && *extras != '\0') { printf("Loading DTB overlays: '%s'\n", extras); fdt_load_dtb_overlays_string(extras); } /* Any overlays supplied by loader environment */ s = getenv("fdt_overlays"); if (s != NULL && *s != '\0') { printf("Loading DTB overlays: '%s'\n", s); fdt_load_dtb_overlays_string(s); } } /* * Locate the blob, fix it up and return its location. */ static int fdt_fixup(void) { int chosen; debugf("fdt_fixup()\n"); if (fdtp == NULL && fdt_setup_fdtp() != 0) return (0); /* Create /chosen node (if not exists) */ if ((chosen = fdt_subnode_offset(fdtp, 0, "chosen")) == -FDT_ERR_NOTFOUND) chosen = fdt_add_subnode(fdtp, 0, "chosen"); /* Value assigned to fixup-applied does not matter. */ if (fdt_getprop(fdtp, chosen, "fixup-applied", NULL)) return (1); fdt_platform_fixups(); /* * Re-fetch the /chosen subnode; our fixups may apply overlays or add * nodes/properties that invalidate the offset we grabbed or created * above, so we can no longer trust it. 
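/*
 * Illustrative result of the /memory fixup above for a board with
 * #address-cells = <2> and #size-cells = <2> and a single 2 GiB region
 * starting at 0x40000000 (numbers are made up):
 *
 *	memory {
 *		device_type = "memory";
 *		reg = <0x0 0x40000000 0x0 0x80000000>;
 *	};
 */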
*/ chosen = fdt_subnode_offset(fdtp, 0, "chosen"); fdt_setprop(fdtp, chosen, "fixup-applied", NULL, 0); return (1); } /* * Copy DTB blob to specified location and return size */ int fdt_copy(vm_offset_t va) { int err; debugf("fdt_copy va 0x%08x\n", va); if (fdtp == NULL) { err = fdt_setup_fdtp(); if (err) { printf("No valid device tree blob found!\n"); return (0); } } if (fdt_fixup() == 0) return (0); COPYIN(fdtp, va, fdtp_size); return (fdtp_size); } int command_fdt_internal(int argc, char *argv[]) { cmdf_t *cmdh; int flags; int i, err; if (argc < 2) { command_errmsg = "usage is 'fdt []"; return (CMD_ERROR); } /* * Validate fdt . */ i = 0; cmdh = NULL; while (!(commands[i].name == NULL)) { if (strcmp(argv[1], commands[i].name) == 0) { /* found it */ cmdh = commands[i].handler; flags = commands[i].flags; break; } i++; } if (cmdh == NULL) { command_errmsg = "unknown command"; return (CMD_ERROR); } if (flags & CMD_REQUIRES_BLOB) { /* * Check if uboot env vars were parsed already. If not, do it now. */ if (fdt_fixup() == 0) return (CMD_ERROR); } /* * Call command handler. */ err = (*cmdh)(argc, argv); return (err); } static int fdt_cmd_addr(int argc, char *argv[]) { struct preloaded_file *fp; struct fdt_header *hdr; const char *addr; char *cp; fdt_to_load = NULL; if (argc > 2) addr = argv[2]; else { sprintf(command_errbuf, "no address specified"); return (CMD_ERROR); } hdr = (struct fdt_header *)strtoul(addr, &cp, 16); if (cp == addr) { snprintf(command_errbuf, sizeof(command_errbuf), "Invalid address: %s", addr); return (CMD_ERROR); } while ((fp = file_findfile(NULL, "dtb")) != NULL) { file_discard(fp); } fdt_to_load = hdr; return (CMD_OK); } static int fdt_cmd_cd(int argc, char *argv[]) { char *path; char tmp[FDT_CWD_LEN]; int len, o; path = (argc > 2) ? 
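/*
 * Usage note (paths and the address are examples): the subcommands in the
 * commands[] table are reached through the loader's "fdt" command, e.g.
 *
 *	OK fdt header
 *	OK fdt addr 0x90000000
 *	OK fdt ls /soc
 *	OK fdt cd /chosen
 *	OK fdt prop stdout
 *
 * Subcommands flagged CMD_REQUIRES_BLOB cause the blob to be located and
 * fixed up first via fdt_fixup(), as done above.
 */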
argv[2] : "/"; if (path[0] == '/') { len = strlen(path); if (len >= FDT_CWD_LEN) goto fail; } else { /* Handle path specification relative to cwd */ len = strlen(cwd) + strlen(path) + 1; if (len >= FDT_CWD_LEN) goto fail; strcpy(tmp, cwd); strcat(tmp, "/"); strcat(tmp, path); path = tmp; } o = fdt_path_offset(fdtp, path); if (o < 0) { snprintf(command_errbuf, sizeof(command_errbuf), "could not find node: '%s'", path); return (CMD_ERROR); } strcpy(cwd, path); return (CMD_OK); fail: snprintf(command_errbuf, sizeof(command_errbuf), "path too long: %d, max allowed: %d", len, FDT_CWD_LEN - 1); return (CMD_ERROR); } static int fdt_cmd_hdr(int argc __unused, char *argv[] __unused) { char line[80]; int ver; if (fdtp == NULL) { command_errmsg = "no device tree blob pointer?!"; return (CMD_ERROR); } ver = fdt_version(fdtp); pager_open(); sprintf(line, "\nFlattened device tree header (%p):\n", fdtp); if (pager_output(line)) goto out; sprintf(line, " magic = 0x%08x\n", fdt_magic(fdtp)); if (pager_output(line)) goto out; sprintf(line, " size = %d\n", fdt_totalsize(fdtp)); if (pager_output(line)) goto out; sprintf(line, " off_dt_struct = 0x%08x\n", fdt_off_dt_struct(fdtp)); if (pager_output(line)) goto out; sprintf(line, " off_dt_strings = 0x%08x\n", fdt_off_dt_strings(fdtp)); if (pager_output(line)) goto out; sprintf(line, " off_mem_rsvmap = 0x%08x\n", fdt_off_mem_rsvmap(fdtp)); if (pager_output(line)) goto out; sprintf(line, " version = %d\n", ver); if (pager_output(line)) goto out; sprintf(line, " last compatible version = %d\n", fdt_last_comp_version(fdtp)); if (pager_output(line)) goto out; if (ver >= 2) { sprintf(line, " boot_cpuid = %d\n", fdt_boot_cpuid_phys(fdtp)); if (pager_output(line)) goto out; } if (ver >= 3) { sprintf(line, " size_dt_strings = %d\n", fdt_size_dt_strings(fdtp)); if (pager_output(line)) goto out; } if (ver >= 17) { sprintf(line, " size_dt_struct = %d\n", fdt_size_dt_struct(fdtp)); if (pager_output(line)) goto out; } out: pager_close(); return (CMD_OK); } static int fdt_cmd_ls(int argc, char *argv[]) { const char *prevname[FDT_MAX_DEPTH] = { NULL }; const char *name; char *path; int i, o, depth; path = (argc > 2) ? argv[2] : NULL; if (path == NULL) path = cwd; o = fdt_path_offset(fdtp, path); if (o < 0) { snprintf(command_errbuf, sizeof(command_errbuf), "could not find node: '%s'", path); return (CMD_ERROR); } for (depth = 0; (o >= 0) && (depth >= 0); o = fdt_next_node(fdtp, o, &depth)) { name = fdt_get_name(fdtp, o, NULL); if (depth > FDT_MAX_DEPTH) { printf("max depth exceeded: %d\n", depth); continue; } prevname[depth] = name; /* Skip root (i = 1) when printing devices */ for (i = 1; i <= depth; i++) { if (prevname[i] == NULL) break; if (strcmp(cwd, "/") == 0) printf("/"); printf("%s", prevname[i]); } printf("\n"); } return (CMD_OK); } static __inline int isprint(int c) { return (c >= ' ' && c <= 0x7e); } static int fdt_isprint(const void *data, int len, int *count) { const char *d; char ch; int yesno, i; if (len == 0) return (0); d = (const char *)data; if (d[len - 1] != '\0') return (0); *count = 0; yesno = 1; for (i = 0; i < len; i++) { ch = *(d + i); if (isprint(ch) || (ch == '\0' && i > 0)) { /* Count strings */ if (ch == '\0') (*count)++; continue; } yesno = 0; break; } return (yesno); } static int fdt_data_str(const void *data, int len, int count, char **buf) { char *b, *tmp; const char *d; int buf_len, i, l; /* * Calculate the length for the string and allocate memory. * * Note that 'len' already includes at least one terminator. 
*/ buf_len = len; if (count > 1) { /* * Each token had already a terminator buried in 'len', but we * only need one eventually, don't count space for these. */ buf_len -= count - 1; /* Each consecutive token requires a ", " separator. */ buf_len += count * 2; } /* Add some space for surrounding double quotes. */ buf_len += count * 2; /* Note that string being put in 'tmp' may be as big as 'buf_len'. */ b = (char *)malloc(buf_len); tmp = (char *)malloc(buf_len); if (b == NULL) goto error; if (tmp == NULL) { free(b); goto error; } b[0] = '\0'; /* * Now that we have space, format the string. */ i = 0; do { d = (const char *)data + i; l = strlen(d) + 1; sprintf(tmp, "\"%s\"%s", d, (i + l) < len ? ", " : ""); strcat(b, tmp); i += l; } while (i < len); *buf = b; free(tmp); return (0); error: return (1); } static int fdt_data_cell(const void *data, int len, char **buf) { char *b, *tmp; const uint32_t *c; int count, i, l; /* Number of cells */ count = len / 4; /* * Calculate the length for the string and allocate memory. */ /* Each byte translates to 2 output characters */ l = len * 2; if (count > 1) { /* Each consecutive cell requires a " " separator. */ l += (count - 1) * 1; } /* Each cell will have a "0x" prefix */ l += count * 2; /* Space for surrounding <> and terminator */ l += 3; b = (char *)malloc(l); tmp = (char *)malloc(l); if (b == NULL) goto error; if (tmp == NULL) { free(b); goto error; } b[0] = '\0'; strcat(b, "<"); for (i = 0; i < len; i += 4) { c = (const uint32_t *)((const uint8_t *)data + i); sprintf(tmp, "0x%08x%s", fdt32_to_cpu(*c), i < (len - 4) ? " " : ""); strcat(b, tmp); } strcat(b, ">"); *buf = b; free(tmp); return (0); error: return (1); } static int fdt_data_bytes(const void *data, int len, char **buf) { char *b, *tmp; const char *d; int i, l; /* * Calculate the length for the string and allocate memory. */ /* Each byte translates to 2 output characters */ l = len * 2; if (len > 1) /* Each consecutive byte requires a " " separator. */ l += (len - 1) * 1; /* Each byte will have a "0x" prefix */ l += len * 2; /* Space for surrounding [] and terminator. */ l += 3; b = (char *)malloc(l); tmp = (char *)malloc(l); if (b == NULL) goto error; if (tmp == NULL) { free(b); goto error; } b[0] = '\0'; strcat(b, "["); for (i = 0, d = data; i < len; i++) { sprintf(tmp, "0x%02x%s", d[i], i < len - 1 ? 
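/*
 * Illustrative output of the property formatters in this block (values
 * are made up): fdt_data_str() quotes printable strings, fdt_data_cell()
 * prints 32-bit cells, and fdt_data_bytes() falls back to raw bytes, so
 * "fdt prop" renders properties roughly like:
 *
 *	compatible = "vendor,soc", "simple-bus"
 *	reg = <0x01c30000 0x00000104>
 *	mac = [0x00 0x11 0x22 0x33 0x44 0x55]
 */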
" " : ""); strcat(b, tmp); } strcat(b, "]"); *buf = b; free(tmp); return (0); error: return (1); } static int fdt_data_fmt(const void *data, int len, char **buf) { int count; if (len == 0) { *buf = NULL; return (1); } if (fdt_isprint(data, len, &count)) return (fdt_data_str(data, len, count, buf)); else if ((len % 4) == 0) return (fdt_data_cell(data, len, buf)); else return (fdt_data_bytes(data, len, buf)); } static int fdt_prop(int offset) { char *line, *buf; const struct fdt_property *prop; const char *name; const void *data; int len, rv; line = NULL; prop = fdt_offset_ptr(fdtp, offset, sizeof(*prop)); if (prop == NULL) return (1); name = fdt_string(fdtp, fdt32_to_cpu(prop->nameoff)); len = fdt32_to_cpu(prop->len); rv = 0; buf = NULL; if (len == 0) { /* Property without value */ line = (char *)malloc(strlen(name) + 2); if (line == NULL) { rv = 2; goto out2; } sprintf(line, "%s\n", name); goto out1; } /* * Process property with value */ data = prop->data; if (fdt_data_fmt(data, len, &buf) != 0) { rv = 3; goto out2; } line = (char *)malloc(strlen(name) + strlen(FDT_PROP_SEP) + strlen(buf) + 2); if (line == NULL) { sprintf(command_errbuf, "could not allocate space for string"); rv = 4; goto out2; } sprintf(line, "%s" FDT_PROP_SEP "%s\n", name, buf); out1: pager_open(); pager_output(line); pager_close(); out2: if (buf) free(buf); if (line) free(line); return (rv); } static int fdt_modprop(int nodeoff, char *propname, void *value, char mode) { uint32_t cells[100]; const char *buf; int len, rv; const struct fdt_property *p; p = fdt_get_property(fdtp, nodeoff, propname, NULL); if (p != NULL) { if (mode == 1) { /* Adding inexistant value in mode 1 is forbidden */ sprintf(command_errbuf, "property already exists!"); return (CMD_ERROR); } } else if (mode == 0) { sprintf(command_errbuf, "property does not exist!"); return (CMD_ERROR); } rv = 0; buf = value; switch (*buf) { case '&': /* phandles */ break; case '<': /* Data cells */ len = fdt_strtovect(buf, (void *)&cells, 100, sizeof(uint32_t)); rv = fdt_setprop(fdtp, nodeoff, propname, &cells, len * sizeof(uint32_t)); break; case '[': /* Data bytes */ len = fdt_strtovect(buf, (void *)&cells, 100, sizeof(uint8_t)); rv = fdt_setprop(fdtp, nodeoff, propname, &cells, len * sizeof(uint8_t)); break; case '"': default: /* Default -- string */ rv = fdt_setprop_string(fdtp, nodeoff, propname, value); break; } if (rv != 0) { if (rv == -FDT_ERR_NOSPACE) sprintf(command_errbuf, "Device tree blob is too small!\n"); else sprintf(command_errbuf, "Could not add/modify property!\n"); } return (rv); } /* Merge strings from argv into a single string */ static int fdt_merge_strings(int argc, char *argv[], int start, char **buffer) { char *buf; int i, idx, sz; *buffer = NULL; sz = 0; for (i = start; i < argc; i++) sz += strlen(argv[i]); /* Additional bytes for whitespaces between args */ sz += argc - start; buf = (char *)malloc(sizeof(char) * sz); if (buf == NULL) { sprintf(command_errbuf, "could not allocate space " "for string"); return (1); } bzero(buf, sizeof(char) * sz); idx = 0; for (i = start, idx = 0; i < argc; i++) { strcpy(buf + idx, argv[i]); idx += strlen(argv[i]); buf[idx] = ' '; idx++; } buf[sz - 1] = '\0'; *buffer = buf; return (0); } /* Extract offset and name of node/property from a given path */ static int fdt_extract_nameloc(char **pathp, char **namep, int *nodeoff) { int o; char *path = *pathp, *name = NULL, *subpath = NULL; subpath = strrchr(path, '/'); if (subpath == NULL) { o = fdt_path_offset(fdtp, cwd); name = path; path = (char *)&cwd; } else 
{ *subpath = '\0'; if (strlen(path) == 0) path = cwd; name = subpath + 1; o = fdt_path_offset(fdtp, path); } if (strlen(name) == 0) { sprintf(command_errbuf, "name not specified"); return (1); } if (o < 0) { snprintf(command_errbuf, sizeof(command_errbuf), "could not find node: '%s'", path); return (1); } *namep = name; *nodeoff = o; *pathp = path; return (0); } static int fdt_cmd_prop(int argc, char *argv[]) { char *path, *propname, *value; int o, next, depth, rv; uint32_t tag; path = (argc > 2) ? argv[2] : NULL; value = NULL; if (argc > 3) { /* Merge property value strings into one */ if (fdt_merge_strings(argc, argv, 3, &value) != 0) return (CMD_ERROR); } else value = NULL; if (path == NULL) path = cwd; rv = CMD_OK; if (value) { /* If value is specified -- try to modify prop. */ if (fdt_extract_nameloc(&path, &propname, &o) != 0) return (CMD_ERROR); rv = fdt_modprop(o, propname, value, 0); if (rv) return (CMD_ERROR); return (CMD_OK); } /* User wants to display properties */ o = fdt_path_offset(fdtp, path); if (o < 0) { snprintf(command_errbuf, sizeof(command_errbuf), "could not find node: '%s'", path); rv = CMD_ERROR; goto out; } depth = 0; while (depth >= 0) { tag = fdt_next_tag(fdtp, o, &next); switch (tag) { case FDT_NOP: break; case FDT_PROP: if (depth > 1) /* Don't process properties of nested nodes */ break; if (fdt_prop(o) != 0) { sprintf(command_errbuf, "could not process " "property"); rv = CMD_ERROR; goto out; } break; case FDT_BEGIN_NODE: depth++; if (depth > FDT_MAX_DEPTH) { printf("warning: nesting too deep: %d\n", depth); goto out; } break; case FDT_END_NODE: depth--; if (depth == 0) /* * This is the end of our starting node, force * the loop finish. */ depth--; break; } o = next; } out: return (rv); } static int fdt_cmd_mkprop(int argc, char *argv[]) { int o; char *path, *propname, *value; path = (argc > 2) ? argv[2] : NULL; value = NULL; if (argc > 3) { /* Merge property value strings into one */ if (fdt_merge_strings(argc, argv, 3, &value) != 0) return (CMD_ERROR); } else value = NULL; if (fdt_extract_nameloc(&path, &propname, &o) != 0) return (CMD_ERROR); if (fdt_modprop(o, propname, value, 1)) return (CMD_ERROR); return (CMD_OK); } static int fdt_cmd_rm(int argc, char *argv[]) { int o, rv; char *path = NULL, *propname; if (argc > 2) path = argv[2]; else { sprintf(command_errbuf, "no node/property name specified"); return (CMD_ERROR); } o = fdt_path_offset(fdtp, path); if (o < 0) { /* If node not found -- try to find & delete property */ if (fdt_extract_nameloc(&path, &propname, &o) != 0) return (CMD_ERROR); if ((rv = fdt_delprop(fdtp, o, propname)) != 0) { snprintf(command_errbuf, sizeof(command_errbuf), "could not delete %s\n", (rv == -FDT_ERR_NOTFOUND) ? 
"(property/node does not exist)" : ""); return (CMD_ERROR); } else return (CMD_OK); } /* If node exists -- remove node */ rv = fdt_del_node(fdtp, o); if (rv) { sprintf(command_errbuf, "could not delete node"); return (CMD_ERROR); } return (CMD_OK); } static int fdt_cmd_mknode(int argc, char *argv[]) { int o, rv; char *path = NULL, *nodename = NULL; if (argc > 2) path = argv[2]; else { sprintf(command_errbuf, "no node name specified"); return (CMD_ERROR); } if (fdt_extract_nameloc(&path, &nodename, &o) != 0) return (CMD_ERROR); rv = fdt_add_subnode(fdtp, o, nodename); if (rv < 0) { if (rv == -FDT_ERR_NOSPACE) sprintf(command_errbuf, "Device tree blob is too small!\n"); else sprintf(command_errbuf, "Could not add node!\n"); return (CMD_ERROR); } return (CMD_OK); } static int fdt_cmd_pwd(int argc, char *argv[]) { char line[FDT_CWD_LEN]; pager_open(); sprintf(line, "%s\n", cwd); pager_output(line); pager_close(); return (CMD_OK); } static int fdt_cmd_mres(int argc, char *argv[]) { uint64_t start, size; int i, total; char line[80]; pager_open(); total = fdt_num_mem_rsv(fdtp); if (total > 0) { if (pager_output("Reserved memory regions:\n")) goto out; for (i = 0; i < total; i++) { fdt_get_mem_rsv(fdtp, i, &start, &size); sprintf(line, "reg#%d: (start: 0x%jx, size: 0x%jx)\n", i, start, size); if (pager_output(line)) goto out; } } else pager_output("No reserved memory regions\n"); out: pager_close(); return (CMD_OK); } static int fdt_cmd_nyi(int argc, char *argv[]) { printf("command not yet implemented\n"); return (CMD_ERROR); } Index: user/ngie/bug-237403/stand/fdt/fdt_platform.h =================================================================== --- user/ngie/bug-237403/stand/fdt/fdt_platform.h (revision 346925) +++ user/ngie/bug-237403/stand/fdt/fdt_platform.h (revision 346926) @@ -1,57 +1,58 @@ /*- * Copyright (c) 2014 Andrew Turner * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $FreeBSD$ */ #ifndef FDT_PLATFORM_H #define FDT_PLATFORM_H struct fdt_header; struct fdt_mem_region { unsigned long start; unsigned long size; }; #define TMP_MAX_ETH 8 int fdt_copy(vm_offset_t); void fdt_fixup_cpubusfreqs(unsigned long, unsigned long); void fdt_fixup_ethernet(const char *, char *, int); void fdt_fixup_memory(struct fdt_mem_region *, size_t); void fdt_fixup_stdout(const char *); void fdt_apply_overlays(void); int fdt_load_dtb_addr(struct fdt_header *); int fdt_load_dtb_file(const char *); void fdt_load_dtb_overlays(const char *); int fdt_setup_fdtp(void); +int fdt_is_setup(void); /* The platform library needs to implement these functions */ int fdt_platform_load_dtb(void); void fdt_platform_load_overlays(void); void fdt_platform_fixups(void); #endif /* FDT_PLATFORM_H */ Index: user/ngie/bug-237403/stand/i386/loader/conf.c =================================================================== --- user/ngie/bug-237403/stand/i386/loader/conf.c (revision 346925) +++ user/ngie/bug-237403/stand/i386/loader/conf.c (revision 346926) @@ -1,168 +1,170 @@ /*- * Copyright (c) 1998 Michael Smith * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include "libi386/libi386.h" #if defined(LOADER_ZFS_SUPPORT) #include "libzfs.h" #endif /* * We could use linker sets for some or all of these, but * then we would have to control what ended up linked into * the bootstrap. So it's easier to conditionalise things * here. * * XXX rename these arrays to be consistent and less namespace-hostile * * XXX as libi386 and biosboot merge, some of these can become linker sets. 
*/ #if defined(LOADER_FIREWIRE_SUPPORT) extern struct devsw fwohci; #endif +extern struct devsw vdisk_dev; /* Exported for libstand */ struct devsw *devsw[] = { &biosfd, &bioscd, &bioshd, #if defined(LOADER_NFS_SUPPORT) || defined(LOADER_TFTP_SUPPORT) &pxedisk, #endif #if defined(LOADER_FIREWIRE_SUPPORT) &fwohci, #endif + &vdisk_dev, #if defined(LOADER_ZFS_SUPPORT) &zfs_dev, #endif NULL }; struct fs_ops *file_system[] = { #if defined(LOADER_ZFS_SUPPORT) &zfs_fsops, #endif #if defined(LOADER_UFS_SUPPORT) &ufs_fsops, #endif #if defined(LOADER_EXT2FS_SUPPORT) &ext2fs_fsops, #endif #if defined(LOADER_MSDOS_SUPPORT) &dosfs_fsops, #endif #if defined(LOADER_CD9660_SUPPORT) &cd9660_fsops, #endif #if defined(LOADER_NANDFS_SUPPORT) &nandfs_fsops, #endif #ifdef LOADER_NFS_SUPPORT &nfs_fsops, #endif #ifdef LOADER_TFTP_SUPPORT &tftp_fsops, #endif #ifdef LOADER_GZIP_SUPPORT &gzipfs_fsops, #endif #ifdef LOADER_BZIP2_SUPPORT &bzipfs_fsops, #endif #ifdef LOADER_SPLIT_SUPPORT &splitfs_fsops, #endif NULL }; /* Exported for i386 only */ /* * Sort formats so that those that can detect based on arguments * rather than reading the file go first. */ extern struct file_format i386_elf; extern struct file_format i386_elf_obj; extern struct file_format amd64_elf; extern struct file_format amd64_elf_obj; extern struct file_format multiboot; extern struct file_format multiboot_obj; struct file_format *file_formats[] = { &multiboot, &multiboot_obj, #ifdef LOADER_PREFER_AMD64 &amd64_elf, &amd64_elf_obj, #endif &i386_elf, &i386_elf_obj, #ifndef LOADER_PREFER_AMD64 &amd64_elf, &amd64_elf_obj, #endif NULL }; /* * Consoles * * We don't prototype these in libi386.h because they require * data structures from bootstrap.h as well. */ extern struct console vidconsole; extern struct console comconsole; #if defined(LOADER_FIREWIRE_SUPPORT) extern struct console dconsole; #endif extern struct console nullconsole; extern struct console spinconsole; struct console *consoles[] = { &vidconsole, &comconsole, #if defined(LOADER_FIREWIRE_SUPPORT) &dconsole, #endif &nullconsole, &spinconsole, NULL }; extern struct pnphandler isapnphandler; extern struct pnphandler biospnphandler; extern struct pnphandler biospcihandler; struct pnphandler *pnphandlers[] = { &biospnphandler, /* should go first, as it may set isapnp_readport */ &isapnphandler, &biospcihandler, NULL }; Index: user/ngie/bug-237403/stand/loader.mk =================================================================== --- user/ngie/bug-237403/stand/loader.mk (revision 346925) +++ user/ngie/bug-237403/stand/loader.mk (revision 346926) @@ -1,178 +1,178 @@ # $FreeBSD$ .PATH: ${LDRSRC} ${BOOTSRC}/libsa CFLAGS+=-I${LDRSRC} SRCS+= boot.c commands.c console.c devopen.c interp.c SRCS+= interp_backslash.c interp_parse.c ls.c misc.c SRCS+= module.c .if ${MACHINE} == "i386" || ${MACHINE_CPUARCH} == "amd64" SRCS+= load_elf32.c load_elf32_obj.c reloc_elf32.c SRCS+= load_elf64.c load_elf64_obj.c reloc_elf64.c .elif ${MACHINE_CPUARCH} == "aarch64" SRCS+= load_elf64.c reloc_elf64.c .elif ${MACHINE_CPUARCH} == "arm" SRCS+= load_elf32.c reloc_elf32.c .elif ${MACHINE_CPUARCH} == "powerpc" SRCS+= load_elf32.c reloc_elf32.c SRCS+= load_elf64.c reloc_elf64.c SRCS+= metadata.c .elif ${MACHINE_CPUARCH} == "sparc64" SRCS+= load_elf64.c reloc_elf64.c SRCS+= metadata.c .elif ${MACHINE_ARCH:Mmips64*} != "" SRCS+= load_elf64.c reloc_elf64.c SRCS+= metadata.c .elif ${MACHINE} == "mips" SRCS+= load_elf32.c reloc_elf32.c SRCS+= metadata.c .endif .if ${LOADER_DISK_SUPPORT:Uyes} == "yes" -SRCS+= disk.c part.c 
+SRCS+= disk.c part.c vdisk.c .endif .if ${LOADER_NET_SUPPORT:Uno} == "yes" SRCS+= dev_net.c .endif .if defined(HAVE_BCACHE) SRCS+= bcache.c .endif .if defined(MD_IMAGE_SIZE) CFLAGS+= -DMD_IMAGE_SIZE=${MD_IMAGE_SIZE} SRCS+= md.c .else CLEANFILES+= md.o .endif # Machine-independant ISA PnP .if defined(HAVE_ISABUS) SRCS+= isapnp.c .endif .if defined(HAVE_PNP) SRCS+= pnp.c .endif .if ${LOADER_INTERP} == "lua" SRCS+= interp_lua.c .include "${BOOTSRC}/lua.mk" LDR_INTERP= ${LIBLUA} LDR_INTERP32= ${LIBLUA32} .elif ${LOADER_INTERP} == "4th" SRCS+= interp_forth.c .include "${BOOTSRC}/ficl.mk" LDR_INTERP= ${LIBFICL} LDR_INTERP32= ${LIBFICL32} .elif ${LOADER_INTERP} == "simp" SRCS+= interp_simple.c .else .error Unknown interpreter ${LOADER_INTERP} .endif .if ${MK_LOADER_VERIEXEC} != "no" CFLAGS+= -DLOADER_VERIEXEC -I${SRCTOP}/lib/libsecureboot/h .endif .if ${MK_LOADER_VERIEXEC_PASS_MANIFEST} != "no" CFLAGS+= -DLOADER_VERIEXEC_PASS_MANIFEST -I${SRCTOP}/lib/libsecureboot/h .endif .if defined(BOOT_PROMPT_123) CFLAGS+= -DBOOT_PROMPT_123 .endif .if defined(LOADER_INSTALL_SUPPORT) SRCS+= install.c .endif # Filesystem support .if ${LOADER_CD9660_SUPPORT:Uno} == "yes" CFLAGS+= -DLOADER_CD9660_SUPPORT .endif .if ${LOADER_EXT2FS_SUPPORT:Uno} == "yes" CFLAGS+= -DLOADER_EXT2FS_SUPPORT .endif .if ${LOADER_MSDOS_SUPPORT:Uno} == "yes" CFLAGS+= -DLOADER_MSDOS_SUPPORT .endif .if ${LOADER_NANDFS_SUPPORT:U${MK_NAND}} == "yes" CFLAGS+= -DLOADER_NANDFS_SUPPORT .endif .if ${LOADER_UFS_SUPPORT:Uyes} == "yes" CFLAGS+= -DLOADER_UFS_SUPPORT .endif # Compression .if ${LOADER_GZIP_SUPPORT:Uno} == "yes" CFLAGS+= -DLOADER_GZIP_SUPPORT .endif .if ${LOADER_BZIP2_SUPPORT:Uno} == "yes" CFLAGS+= -DLOADER_BZIP2_SUPPORT .endif # Network related things .if ${LOADER_NET_SUPPORT:Uno} == "yes" CFLAGS+= -DLOADER_NET_SUPPORT .endif .if ${LOADER_NFS_SUPPORT:Uno} == "yes" CFLAGS+= -DLOADER_NFS_SUPPORT .endif .if ${LOADER_TFTP_SUPPORT:Uno} == "yes" CFLAGS+= -DLOADER_TFTP_SUPPORT .endif # Partition support .if ${LOADER_GPT_SUPPORT:Uyes} == "yes" CFLAGS+= -DLOADER_GPT_SUPPORT .endif .if ${LOADER_MBR_SUPPORT:Uyes} == "yes" CFLAGS+= -DLOADER_MBR_SUPPORT .endif .if ${HAVE_ZFS:Uno} == "yes" CFLAGS+= -DLOADER_ZFS_SUPPORT CFLAGS+= -I${ZFSSRC} CFLAGS+= -I${SYSDIR}/cddl/boot/zfs SRCS+= zfs_cmd.c .endif LIBFICL= ${BOOTOBJ}/ficl/libficl.a .if ${MACHINE} == "i386" LIBFICL32= ${LIBFICL} .else LIBFICL32= ${BOOTOBJ}/ficl32/libficl.a .endif LIBLUA= ${BOOTOBJ}/liblua/liblua.a .if ${MACHINE} == "i386" LIBLUA32= ${LIBLUA} .else LIBLUA32= ${BOOTOBJ}/liblua32/liblua.a .endif CLEANFILES+= vers.c VERSION_FILE?= ${.CURDIR}/version .if ${MK_REPRODUCIBLE_BUILD} != no REPRO_FLAG= -r .endif vers.c: ${LDRSRC}/newvers.sh ${VERSION_FILE} sh ${LDRSRC}/newvers.sh ${REPRO_FLAG} ${VERSION_FILE} \ ${NEWVERSWHAT} .if ${MK_LOADER_VERBOSE} != "no" CFLAGS+= -DELF_VERBOSE .endif .if !empty(HELP_FILES) HELP_FILES+= ${LDRSRC}/help.common CLEANFILES+= loader.help FILES+= loader.help loader.help: ${HELP_FILES} cat ${HELP_FILES} | awk -f ${LDRSRC}/merge_help.awk > ${.TARGET} .endif Index: user/ngie/bug-237403/sys/amd64/include/vmm.h =================================================================== --- user/ngie/bug-237403/sys/amd64/include/vmm.h (revision 346925) +++ user/ngie/bug-237403/sys/amd64/include/vmm.h (revision 346926) @@ -1,701 +1,702 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2011 NetApp, Inc. * All rights reserved. 
* * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _VMM_H_ #define _VMM_H_ #include #include #ifdef _KERNEL SDT_PROVIDER_DECLARE(vmm); #endif enum vm_suspend_how { VM_SUSPEND_NONE, VM_SUSPEND_RESET, VM_SUSPEND_POWEROFF, VM_SUSPEND_HALT, VM_SUSPEND_TRIPLEFAULT, VM_SUSPEND_LAST }; /* * Identifiers for architecturally defined registers. */ enum vm_reg_name { VM_REG_GUEST_RAX, VM_REG_GUEST_RBX, VM_REG_GUEST_RCX, VM_REG_GUEST_RDX, VM_REG_GUEST_RSI, VM_REG_GUEST_RDI, VM_REG_GUEST_RBP, VM_REG_GUEST_R8, VM_REG_GUEST_R9, VM_REG_GUEST_R10, VM_REG_GUEST_R11, VM_REG_GUEST_R12, VM_REG_GUEST_R13, VM_REG_GUEST_R14, VM_REG_GUEST_R15, VM_REG_GUEST_CR0, VM_REG_GUEST_CR3, VM_REG_GUEST_CR4, VM_REG_GUEST_DR7, VM_REG_GUEST_RSP, VM_REG_GUEST_RIP, VM_REG_GUEST_RFLAGS, VM_REG_GUEST_ES, VM_REG_GUEST_CS, VM_REG_GUEST_SS, VM_REG_GUEST_DS, VM_REG_GUEST_FS, VM_REG_GUEST_GS, VM_REG_GUEST_LDTR, VM_REG_GUEST_TR, VM_REG_GUEST_IDTR, VM_REG_GUEST_GDTR, VM_REG_GUEST_EFER, VM_REG_GUEST_CR2, VM_REG_GUEST_PDPTE0, VM_REG_GUEST_PDPTE1, VM_REG_GUEST_PDPTE2, VM_REG_GUEST_PDPTE3, VM_REG_GUEST_INTR_SHADOW, VM_REG_GUEST_DR0, VM_REG_GUEST_DR1, VM_REG_GUEST_DR2, VM_REG_GUEST_DR3, VM_REG_GUEST_DR6, VM_REG_LAST }; enum x2apic_state { X2APIC_DISABLED, X2APIC_ENABLED, X2APIC_STATE_LAST }; #define VM_INTINFO_VECTOR(info) ((info) & 0xff) #define VM_INTINFO_DEL_ERRCODE 0x800 #define VM_INTINFO_RSVD 0x7ffff000 #define VM_INTINFO_VALID 0x80000000 #define VM_INTINFO_TYPE 0x700 #define VM_INTINFO_HWINTR (0 << 8) #define VM_INTINFO_NMI (2 << 8) #define VM_INTINFO_HWEXCEPTION (3 << 8) #define VM_INTINFO_SWINTR (4 << 8) #ifdef _KERNEL #define VM_MAX_NAMELEN 32 struct vm; struct vm_exception; struct seg_desc; struct vm_exit; struct vm_run; struct vhpet; struct vioapic; struct vlapic; struct vmspace; struct vm_object; struct vm_guest_paging; struct pmap; struct vm_eventinfo { void *rptr; /* rendezvous cookie */ int *sptr; /* suspend cookie */ int *iptr; /* reqidle cookie */ }; typedef int (*vmm_init_func_t)(int ipinum); typedef int (*vmm_cleanup_func_t)(void); typedef void (*vmm_resume_func_t)(void); typedef void * (*vmi_init_func_t)(struct vm *vm, struct pmap *pmap); typedef int (*vmi_run_func_t)(void *vmi, int vcpu, register_t rip, struct pmap *pmap, struct vm_eventinfo *info); typedef void (*vmi_cleanup_func_t)(void *vmi); typedef int 
(*vmi_get_register_t)(void *vmi, int vcpu, int num, uint64_t *retval); typedef int (*vmi_set_register_t)(void *vmi, int vcpu, int num, uint64_t val); typedef int (*vmi_get_desc_t)(void *vmi, int vcpu, int num, struct seg_desc *desc); typedef int (*vmi_set_desc_t)(void *vmi, int vcpu, int num, struct seg_desc *desc); typedef int (*vmi_get_cap_t)(void *vmi, int vcpu, int num, int *retval); typedef int (*vmi_set_cap_t)(void *vmi, int vcpu, int num, int val); typedef struct vmspace * (*vmi_vmspace_alloc)(vm_offset_t min, vm_offset_t max); typedef void (*vmi_vmspace_free)(struct vmspace *vmspace); typedef struct vlapic * (*vmi_vlapic_init)(void *vmi, int vcpu); typedef void (*vmi_vlapic_cleanup)(void *vmi, struct vlapic *vlapic); struct vmm_ops { vmm_init_func_t init; /* module wide initialization */ vmm_cleanup_func_t cleanup; vmm_resume_func_t resume; vmi_init_func_t vminit; /* vm-specific initialization */ vmi_run_func_t vmrun; vmi_cleanup_func_t vmcleanup; vmi_get_register_t vmgetreg; vmi_set_register_t vmsetreg; vmi_get_desc_t vmgetdesc; vmi_set_desc_t vmsetdesc; vmi_get_cap_t vmgetcap; vmi_set_cap_t vmsetcap; vmi_vmspace_alloc vmspace_alloc; vmi_vmspace_free vmspace_free; vmi_vlapic_init vlapic_init; vmi_vlapic_cleanup vlapic_cleanup; }; extern struct vmm_ops vmm_ops_intel; extern struct vmm_ops vmm_ops_amd; int vm_create(const char *name, struct vm **retvm); void vm_destroy(struct vm *vm); int vm_reinit(struct vm *vm); const char *vm_name(struct vm *vm); +uint16_t vm_get_maxcpus(struct vm *vm); void vm_get_topology(struct vm *vm, uint16_t *sockets, uint16_t *cores, uint16_t *threads, uint16_t *maxcpus); int vm_set_topology(struct vm *vm, uint16_t sockets, uint16_t cores, uint16_t threads, uint16_t maxcpus); /* * APIs that modify the guest memory map require all vcpus to be frozen. */ int vm_mmap_memseg(struct vm *vm, vm_paddr_t gpa, int segid, vm_ooffset_t off, size_t len, int prot, int flags); int vm_alloc_memseg(struct vm *vm, int ident, size_t len, bool sysmem); void vm_free_memseg(struct vm *vm, int ident); int vm_map_mmio(struct vm *vm, vm_paddr_t gpa, size_t len, vm_paddr_t hpa); int vm_unmap_mmio(struct vm *vm, vm_paddr_t gpa, size_t len); int vm_assign_pptdev(struct vm *vm, int bus, int slot, int func); int vm_unassign_pptdev(struct vm *vm, int bus, int slot, int func); /* * APIs that inspect the guest memory map require only a *single* vcpu to * be frozen. This acts like a read lock on the guest memory map since any * modification requires *all* vcpus to be frozen. 
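 *
 * For example, vm_mmap_getnext(), vm_get_memseg() and vm_gpa_hold(),
 * declared below, fall into this category.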
*/ int vm_mmap_getnext(struct vm *vm, vm_paddr_t *gpa, int *segid, vm_ooffset_t *segoff, size_t *len, int *prot, int *flags); int vm_get_memseg(struct vm *vm, int ident, size_t *len, bool *sysmem, struct vm_object **objptr); vm_paddr_t vmm_sysmem_maxaddr(struct vm *vm); void *vm_gpa_hold(struct vm *, int vcpuid, vm_paddr_t gpa, size_t len, int prot, void **cookie); void vm_gpa_release(void *cookie); bool vm_mem_allocated(struct vm *vm, int vcpuid, vm_paddr_t gpa); int vm_get_register(struct vm *vm, int vcpu, int reg, uint64_t *retval); int vm_set_register(struct vm *vm, int vcpu, int reg, uint64_t val); int vm_get_seg_desc(struct vm *vm, int vcpu, int reg, struct seg_desc *ret_desc); int vm_set_seg_desc(struct vm *vm, int vcpu, int reg, struct seg_desc *desc); int vm_run(struct vm *vm, struct vm_run *vmrun); int vm_suspend(struct vm *vm, enum vm_suspend_how how); int vm_inject_nmi(struct vm *vm, int vcpu); int vm_nmi_pending(struct vm *vm, int vcpuid); void vm_nmi_clear(struct vm *vm, int vcpuid); int vm_inject_extint(struct vm *vm, int vcpu); int vm_extint_pending(struct vm *vm, int vcpuid); void vm_extint_clear(struct vm *vm, int vcpuid); struct vlapic *vm_lapic(struct vm *vm, int cpu); struct vioapic *vm_ioapic(struct vm *vm); struct vhpet *vm_hpet(struct vm *vm); int vm_get_capability(struct vm *vm, int vcpu, int type, int *val); int vm_set_capability(struct vm *vm, int vcpu, int type, int val); int vm_get_x2apic_state(struct vm *vm, int vcpu, enum x2apic_state *state); int vm_set_x2apic_state(struct vm *vm, int vcpu, enum x2apic_state state); int vm_apicid2vcpuid(struct vm *vm, int apicid); int vm_activate_cpu(struct vm *vm, int vcpu); int vm_suspend_cpu(struct vm *vm, int vcpu); int vm_resume_cpu(struct vm *vm, int vcpu); struct vm_exit *vm_exitinfo(struct vm *vm, int vcpuid); void vm_exit_suspended(struct vm *vm, int vcpuid, uint64_t rip); void vm_exit_debug(struct vm *vm, int vcpuid, uint64_t rip); void vm_exit_rendezvous(struct vm *vm, int vcpuid, uint64_t rip); void vm_exit_astpending(struct vm *vm, int vcpuid, uint64_t rip); void vm_exit_reqidle(struct vm *vm, int vcpuid, uint64_t rip); #ifdef _SYS__CPUSET_H_ /* * Rendezvous all vcpus specified in 'dest' and execute 'func(arg)'. * The rendezvous 'func(arg)' is not allowed to do anything that will * cause the thread to be put to sleep. * * If the rendezvous is being initiated from a vcpu context then the * 'vcpuid' must refer to that vcpu, otherwise it should be set to -1. * * The caller cannot hold any locks when initiating the rendezvous. * * The implementation of this API may cause vcpus other than those specified * by 'dest' to be stalled. The caller should not rely on any vcpus making * forward progress when the rendezvous is in progress. */ typedef void (*vm_rendezvous_func_t)(struct vm *vm, int vcpuid, void *arg); void vm_smp_rendezvous(struct vm *vm, int vcpuid, cpuset_t dest, vm_rendezvous_func_t func, void *arg); cpuset_t vm_active_cpus(struct vm *vm); cpuset_t vm_debug_cpus(struct vm *vm); cpuset_t vm_suspended_cpus(struct vm *vm); #endif /* _SYS__CPUSET_H_ */ static __inline int vcpu_rendezvous_pending(struct vm_eventinfo *info) { return (*((uintptr_t *)(info->rptr)) != 0); } static __inline int vcpu_suspended(struct vm_eventinfo *info) { return (*info->sptr); } static __inline int vcpu_reqidle(struct vm_eventinfo *info) { return (*info->iptr); } int vcpu_debugged(struct vm *vm, int vcpuid); /* * Return 1 if device indicated by bus/slot/func is supposed to be a * pci passthrough device. * * Return 0 otherwise. 
*/ int vmm_is_pptdev(int bus, int slot, int func); void *vm_iommu_domain(struct vm *vm); enum vcpu_state { VCPU_IDLE, VCPU_FROZEN, VCPU_RUNNING, VCPU_SLEEPING, }; int vcpu_set_state(struct vm *vm, int vcpu, enum vcpu_state state, bool from_idle); enum vcpu_state vcpu_get_state(struct vm *vm, int vcpu, int *hostcpu); static int __inline vcpu_is_running(struct vm *vm, int vcpu, int *hostcpu) { return (vcpu_get_state(vm, vcpu, hostcpu) == VCPU_RUNNING); } #ifdef _SYS_PROC_H_ static int __inline vcpu_should_yield(struct vm *vm, int vcpu) { if (curthread->td_flags & (TDF_ASTPENDING | TDF_NEEDRESCHED)) return (1); else if (curthread->td_owepreempt) return (1); else return (0); } #endif void *vcpu_stats(struct vm *vm, int vcpu); void vcpu_notify_event(struct vm *vm, int vcpuid, bool lapic_intr); struct vmspace *vm_get_vmspace(struct vm *vm); struct vatpic *vm_atpic(struct vm *vm); struct vatpit *vm_atpit(struct vm *vm); struct vpmtmr *vm_pmtmr(struct vm *vm); struct vrtc *vm_rtc(struct vm *vm); /* * Inject exception 'vector' into the guest vcpu. This function returns 0 on * success and non-zero on failure. * * Wrapper functions like 'vm_inject_gp()' should be preferred to calling * this function directly because they enforce the trap-like or fault-like * behavior of an exception. * * This function should only be called in the context of the thread that is * executing this vcpu. */ int vm_inject_exception(struct vm *vm, int vcpuid, int vector, int err_valid, uint32_t errcode, int restart_instruction); /* * This function is called after a VM-exit that occurred during exception or * interrupt delivery through the IDT. The format of 'intinfo' is described * in Figure 15-1, "EXITINTINFO for All Intercepts", APM, Vol 2. * * If a VM-exit handler completes the event delivery successfully then it * should call vm_exit_intinfo() to extinguish the pending event. For e.g., * if the task switch emulation is triggered via a task gate then it should * call this function with 'intinfo=0' to indicate that the external event * is not pending anymore. * * Return value is 0 on success and non-zero on failure. */ int vm_exit_intinfo(struct vm *vm, int vcpuid, uint64_t intinfo); /* * This function is called before every VM-entry to retrieve a pending * event that should be injected into the guest. This function combines * nested events into a double or triple fault. * * Returns 0 if there are no events that need to be injected into the guest * and non-zero otherwise. */ int vm_entry_intinfo(struct vm *vm, int vcpuid, uint64_t *info); int vm_get_intinfo(struct vm *vm, int vcpuid, uint64_t *info1, uint64_t *info2); enum vm_reg_name vm_segment_name(int seg_encoding); struct vm_copyinfo { uint64_t gpa; size_t len; void *hva; void *cookie; }; /* * Set up 'copyinfo[]' to copy to/from guest linear address space starting * at 'gla' and 'len' bytes long. The 'prot' should be set to PROT_READ for * a copyin or PROT_WRITE for a copyout. * * retval is_fault Interpretation * 0 0 Success * 0 1 An exception was injected into the guest * EFAULT N/A Unrecoverable error * * The 'copyinfo[]' can be passed to 'vm_copyin()' or 'vm_copyout()' only if * the return value is 0. The 'copyinfo[]' resources should be freed by calling * 'vm_copy_teardown()' after the copy is done. 
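 *
 * A minimal usage sketch (illustrative only; 'paging', 'gla', 'len', 'buf'
 * and 'copyinfo[]' are assumed to be provided by the caller):
 *
 *	int fault;
 *	if (vm_copy_setup(vm, vcpuid, &paging, gla, len, PROT_READ,
 *	    copyinfo, nitems(copyinfo), &fault) == 0 && !fault) {
 *		vm_copyin(vm, vcpuid, copyinfo, buf, len);
 *		vm_copy_teardown(vm, vcpuid, copyinfo, nitems(copyinfo));
 *	}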
*/ int vm_copy_setup(struct vm *vm, int vcpuid, struct vm_guest_paging *paging, uint64_t gla, size_t len, int prot, struct vm_copyinfo *copyinfo, int num_copyinfo, int *is_fault); void vm_copy_teardown(struct vm *vm, int vcpuid, struct vm_copyinfo *copyinfo, int num_copyinfo); void vm_copyin(struct vm *vm, int vcpuid, struct vm_copyinfo *copyinfo, void *kaddr, size_t len); void vm_copyout(struct vm *vm, int vcpuid, const void *kaddr, struct vm_copyinfo *copyinfo, size_t len); int vcpu_trace_exceptions(struct vm *vm, int vcpuid); #endif /* KERNEL */ #define VM_MAXCPU 16 /* maximum virtual cpus */ /* * Identifiers for optional vmm capabilities */ enum vm_cap_type { VM_CAP_HALT_EXIT, VM_CAP_MTRAP_EXIT, VM_CAP_PAUSE_EXIT, VM_CAP_UNRESTRICTED_GUEST, VM_CAP_ENABLE_INVPCID, VM_CAP_MAX }; enum vm_intr_trigger { EDGE_TRIGGER, LEVEL_TRIGGER }; /* * The 'access' field has the format specified in Table 21-2 of the Intel * Architecture Manual vol 3b. * * XXX The contents of the 'access' field are architecturally defined except * bit 16 - Segment Unusable. */ struct seg_desc { uint64_t base; uint32_t limit; uint32_t access; }; #define SEG_DESC_TYPE(access) ((access) & 0x001f) #define SEG_DESC_DPL(access) (((access) >> 5) & 0x3) #define SEG_DESC_PRESENT(access) (((access) & 0x0080) ? 1 : 0) #define SEG_DESC_DEF32(access) (((access) & 0x4000) ? 1 : 0) #define SEG_DESC_GRANULARITY(access) (((access) & 0x8000) ? 1 : 0) #define SEG_DESC_UNUSABLE(access) (((access) & 0x10000) ? 1 : 0) enum vm_cpu_mode { CPU_MODE_REAL, CPU_MODE_PROTECTED, CPU_MODE_COMPATIBILITY, /* IA-32E mode (CS.L = 0) */ CPU_MODE_64BIT, /* IA-32E mode (CS.L = 1) */ }; enum vm_paging_mode { PAGING_MODE_FLAT, PAGING_MODE_32, PAGING_MODE_PAE, PAGING_MODE_64, }; struct vm_guest_paging { uint64_t cr3; int cpl; enum vm_cpu_mode cpu_mode; enum vm_paging_mode paging_mode; }; /* * The data structures 'vie' and 'vie_op' are meant to be opaque to the * consumers of instruction decoding. The only reason why their contents * need to be exposed is because they are part of the 'vm_exit' structure. */ struct vie_op { uint8_t op_byte; /* actual opcode byte */ uint8_t op_type; /* type of operation (e.g. 
MOV) */ uint16_t op_flags; }; #define VIE_INST_SIZE 15 struct vie { uint8_t inst[VIE_INST_SIZE]; /* instruction bytes */ uint8_t num_valid; /* size of the instruction */ uint8_t num_processed; uint8_t addrsize:4, opsize:4; /* address and operand sizes */ uint8_t rex_w:1, /* REX prefix */ rex_r:1, rex_x:1, rex_b:1, rex_present:1, repz_present:1, /* REP/REPE/REPZ prefix */ repnz_present:1, /* REPNE/REPNZ prefix */ opsize_override:1, /* Operand size override */ addrsize_override:1, /* Address size override */ segment_override:1; /* Segment override */ uint8_t mod:2, /* ModRM byte */ reg:4, rm:4; uint8_t ss:2, /* SIB byte */ index:4, base:4; uint8_t disp_bytes; uint8_t imm_bytes; uint8_t scale; int base_register; /* VM_REG_GUEST_xyz */ int index_register; /* VM_REG_GUEST_xyz */ int segment_register; /* VM_REG_GUEST_xyz */ int64_t displacement; /* optional addr displacement */ int64_t immediate; /* optional immediate operand */ uint8_t decoded; /* set to 1 if successfully decoded */ struct vie_op op; /* opcode description */ }; enum vm_exitcode { VM_EXITCODE_INOUT, VM_EXITCODE_VMX, VM_EXITCODE_BOGUS, VM_EXITCODE_RDMSR, VM_EXITCODE_WRMSR, VM_EXITCODE_HLT, VM_EXITCODE_MTRAP, VM_EXITCODE_PAUSE, VM_EXITCODE_PAGING, VM_EXITCODE_INST_EMUL, VM_EXITCODE_SPINUP_AP, VM_EXITCODE_DEPRECATED1, /* used to be SPINDOWN_CPU */ VM_EXITCODE_RENDEZVOUS, VM_EXITCODE_IOAPIC_EOI, VM_EXITCODE_SUSPENDED, VM_EXITCODE_INOUT_STR, VM_EXITCODE_TASK_SWITCH, VM_EXITCODE_MONITOR, VM_EXITCODE_MWAIT, VM_EXITCODE_SVM, VM_EXITCODE_REQIDLE, VM_EXITCODE_DEBUG, VM_EXITCODE_VMINSN, VM_EXITCODE_MAX }; struct vm_inout { uint16_t bytes:3; /* 1 or 2 or 4 */ uint16_t in:1; uint16_t string:1; uint16_t rep:1; uint16_t port; uint32_t eax; /* valid for out */ }; struct vm_inout_str { struct vm_inout inout; /* must be the first element */ struct vm_guest_paging paging; uint64_t rflags; uint64_t cr0; uint64_t index; uint64_t count; /* rep=1 (%rcx), rep=0 (1) */ int addrsize; enum vm_reg_name seg_name; struct seg_desc seg_desc; }; enum task_switch_reason { TSR_CALL, TSR_IRET, TSR_JMP, TSR_IDT_GATE, /* task gate in IDT */ }; struct vm_task_switch { uint16_t tsssel; /* new TSS selector */ int ext; /* task switch due to external event */ uint32_t errcode; int errcode_valid; /* push 'errcode' on the new stack */ enum task_switch_reason reason; struct vm_guest_paging paging; }; struct vm_exit { enum vm_exitcode exitcode; int inst_length; /* 0 means unknown */ uint64_t rip; union { struct vm_inout inout; struct vm_inout_str inout_str; struct { uint64_t gpa; int fault_type; } paging; struct { uint64_t gpa; uint64_t gla; uint64_t cs_base; int cs_d; /* CS.D */ struct vm_guest_paging paging; struct vie vie; } inst_emul; /* * VMX specific payload. Used when there is no "better" * exitcode to represent the VM-exit. */ struct { int status; /* vmx inst status */ /* * 'exit_reason' and 'exit_qualification' are valid * only if 'status' is zero. */ uint32_t exit_reason; uint64_t exit_qualification; /* * 'inst_error' and 'inst_type' are valid * only if 'status' is non-zero. */ int inst_type; int inst_error; } vmx; /* * SVM specific payload. 
*/ struct { uint64_t exitcode; uint64_t exitinfo1; uint64_t exitinfo2; } svm; struct { uint32_t code; /* ecx value */ uint64_t wval; } msr; struct { int vcpu; uint64_t rip; } spinup_ap; struct { uint64_t rflags; uint64_t intr_status; } hlt; struct { int vector; } ioapic_eoi; struct { enum vm_suspend_how how; } suspended; struct vm_task_switch task_switch; } u; }; /* APIs to inject faults into the guest */ void vm_inject_fault(void *vm, int vcpuid, int vector, int errcode_valid, int errcode); static __inline void vm_inject_ud(void *vm, int vcpuid) { vm_inject_fault(vm, vcpuid, IDT_UD, 0, 0); } static __inline void vm_inject_gp(void *vm, int vcpuid) { vm_inject_fault(vm, vcpuid, IDT_GP, 1, 0); } static __inline void vm_inject_ac(void *vm, int vcpuid, int errcode) { vm_inject_fault(vm, vcpuid, IDT_AC, 1, errcode); } static __inline void vm_inject_ss(void *vm, int vcpuid, int errcode) { vm_inject_fault(vm, vcpuid, IDT_SS, 1, errcode); } void vm_inject_pf(void *vm, int vcpuid, int error_code, uint64_t cr2); int vm_restart_instruction(void *vm, int vcpuid); #endif /* _VMM_H_ */ Index: user/ngie/bug-237403/sys/amd64/vmm/amd/svm.c =================================================================== --- user/ngie/bug-237403/sys/amd64/vmm/amd/svm.c (revision 346925) +++ user/ngie/bug-237403/sys/amd64/vmm/amd/svm.c (revision 346926) @@ -1,2300 +1,2302 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2013, Anish Gupta (akgupt3@gmail.com) * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "vmm_lapic.h" #include "vmm_stat.h" #include "vmm_ktr.h" #include "vmm_ioport.h" #include "vatpic.h" #include "vlapic.h" #include "vlapic_priv.h" #include "x86.h" #include "vmcb.h" #include "svm.h" #include "svm_softc.h" #include "svm_msr.h" #include "npt.h" SYSCTL_DECL(_hw_vmm); SYSCTL_NODE(_hw_vmm, OID_AUTO, svm, CTLFLAG_RW, NULL, NULL); /* * SVM CPUID function 0x8000_000A, edx bit decoding. 
*/ #define AMD_CPUID_SVM_NP BIT(0) /* Nested paging or RVI */ #define AMD_CPUID_SVM_LBR BIT(1) /* Last branch virtualization */ #define AMD_CPUID_SVM_SVML BIT(2) /* SVM lock */ #define AMD_CPUID_SVM_NRIP_SAVE BIT(3) /* Next RIP is saved */ #define AMD_CPUID_SVM_TSC_RATE BIT(4) /* TSC rate control. */ #define AMD_CPUID_SVM_VMCB_CLEAN BIT(5) /* VMCB state caching */ #define AMD_CPUID_SVM_FLUSH_BY_ASID BIT(6) /* Flush by ASID */ #define AMD_CPUID_SVM_DECODE_ASSIST BIT(7) /* Decode assist */ #define AMD_CPUID_SVM_PAUSE_INC BIT(10) /* Pause intercept filter. */ #define AMD_CPUID_SVM_PAUSE_FTH BIT(12) /* Pause filter threshold */ #define AMD_CPUID_SVM_AVIC BIT(13) /* AVIC present */ #define VMCB_CACHE_DEFAULT (VMCB_CACHE_ASID | \ VMCB_CACHE_IOPM | \ VMCB_CACHE_I | \ VMCB_CACHE_TPR | \ VMCB_CACHE_CR2 | \ VMCB_CACHE_CR | \ VMCB_CACHE_DR | \ VMCB_CACHE_DT | \ VMCB_CACHE_SEG | \ VMCB_CACHE_NP) static uint32_t vmcb_clean = VMCB_CACHE_DEFAULT; SYSCTL_INT(_hw_vmm_svm, OID_AUTO, vmcb_clean, CTLFLAG_RDTUN, &vmcb_clean, 0, NULL); static MALLOC_DEFINE(M_SVM, "svm", "svm"); static MALLOC_DEFINE(M_SVM_VLAPIC, "svm-vlapic", "svm-vlapic"); /* Per-CPU context area. */ extern struct pcpu __pcpu[]; static uint32_t svm_feature = ~0U; /* AMD SVM features. */ SYSCTL_UINT(_hw_vmm_svm, OID_AUTO, features, CTLFLAG_RDTUN, &svm_feature, 0, "SVM features advertised by CPUID.8000000AH:EDX"); static int disable_npf_assist; SYSCTL_INT(_hw_vmm_svm, OID_AUTO, disable_npf_assist, CTLFLAG_RWTUN, &disable_npf_assist, 0, NULL); /* Maximum ASIDs supported by the processor */ static uint32_t nasid; SYSCTL_UINT(_hw_vmm_svm, OID_AUTO, num_asids, CTLFLAG_RDTUN, &nasid, 0, "Number of ASIDs supported by this processor"); /* Current ASID generation for each host cpu */ static struct asid asid[MAXCPU]; /* * SVM host state saved area of size 4KB for each core. */ static uint8_t hsave[MAXCPU][PAGE_SIZE] __aligned(PAGE_SIZE); static VMM_STAT_AMD(VCPU_EXITINTINFO, "VM exits during event delivery"); static VMM_STAT_AMD(VCPU_INTINFO_INJECTED, "Events pending at VM entry"); static VMM_STAT_AMD(VMEXIT_VINTR, "VM exits due to interrupt window"); static int svm_setreg(void *arg, int vcpu, int ident, uint64_t val); static __inline int flush_by_asid(void) { return (svm_feature & AMD_CPUID_SVM_FLUSH_BY_ASID); } static __inline int decode_assist(void) { return (svm_feature & AMD_CPUID_SVM_DECODE_ASSIST); } static void svm_disable(void *arg __unused) { uint64_t efer; efer = rdmsr(MSR_EFER); efer &= ~EFER_SVM; wrmsr(MSR_EFER, efer); } /* * Disable SVM on all CPUs. */ static int svm_cleanup(void) { smp_rendezvous(NULL, svm_disable, NULL, NULL); return (0); } /* * Verify that all the features required by bhyve are available. */ static int check_svm_features(void) { u_int regs[4]; /* CPUID Fn8000_000A is for SVM */ do_cpuid(0x8000000A, regs); svm_feature &= regs[3]; /* * The number of ASIDs can be configured to be less than what is * supported by the hardware but not more. 
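 *
 * CPUID Fn8000_000A reports the hardware limit in %ebx, which is what
 * regs[1] holds in the check below.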
*/ if (nasid == 0 || nasid > regs[1]) nasid = regs[1]; KASSERT(nasid > 1, ("Insufficient ASIDs for guests: %#x", nasid)); /* bhyve requires the Nested Paging feature */ if (!(svm_feature & AMD_CPUID_SVM_NP)) { printf("SVM: Nested Paging feature not available.\n"); return (ENXIO); } /* bhyve requires the NRIP Save feature */ if (!(svm_feature & AMD_CPUID_SVM_NRIP_SAVE)) { printf("SVM: NRIP Save feature not available.\n"); return (ENXIO); } return (0); } static void svm_enable(void *arg __unused) { uint64_t efer; efer = rdmsr(MSR_EFER); efer |= EFER_SVM; wrmsr(MSR_EFER, efer); wrmsr(MSR_VM_HSAVE_PA, vtophys(hsave[curcpu])); } /* * Return 1 if SVM is enabled on this processor and 0 otherwise. */ static int svm_available(void) { uint64_t msr; /* Section 15.4 Enabling SVM from APM2. */ if ((amd_feature2 & AMDID2_SVM) == 0) { printf("SVM: not available.\n"); return (0); } msr = rdmsr(MSR_VM_CR); if ((msr & VM_CR_SVMDIS) != 0) { printf("SVM: disabled by BIOS.\n"); return (0); } return (1); } static int svm_init(int ipinum) { int error, cpu; if (!svm_available()) return (ENXIO); error = check_svm_features(); if (error) return (error); vmcb_clean &= VMCB_CACHE_DEFAULT; for (cpu = 0; cpu < MAXCPU; cpu++) { /* * Initialize the host ASIDs to their "highest" valid values. * * The next ASID allocation will rollover both 'gen' and 'num' * and start off the sequence at {1,1}. */ asid[cpu].gen = ~0UL; asid[cpu].num = nasid - 1; } svm_msr_init(); svm_npt_init(ipinum); /* Enable SVM on all CPUs */ smp_rendezvous(NULL, svm_enable, NULL, NULL); return (0); } static void svm_restore(void) { svm_enable(NULL); } /* Pentium compatible MSRs */ #define MSR_PENTIUM_START 0 #define MSR_PENTIUM_END 0x1FFF /* AMD 6th generation and Intel compatible MSRs */ #define MSR_AMD6TH_START 0xC0000000UL #define MSR_AMD6TH_END 0xC0001FFFUL /* AMD 7th and 8th generation compatible MSRs */ #define MSR_AMD7TH_START 0xC0010000UL #define MSR_AMD7TH_END 0xC0011FFFUL /* * Get the index and bit position for a MSR in permission bitmap. * Two bits are used for each MSR: lower bit for read and higher bit for write. */ static int svm_msr_index(uint64_t msr, int *index, int *bit) { uint32_t base, off; *index = -1; *bit = (msr % 4) * 2; base = 0; if (msr >= MSR_PENTIUM_START && msr <= MSR_PENTIUM_END) { *index = msr / 4; return (0); } base += (MSR_PENTIUM_END - MSR_PENTIUM_START + 1); if (msr >= MSR_AMD6TH_START && msr <= MSR_AMD6TH_END) { off = (msr - MSR_AMD6TH_START); *index = (off + base) / 4; return (0); } base += (MSR_AMD6TH_END - MSR_AMD6TH_START + 1); if (msr >= MSR_AMD7TH_START && msr <= MSR_AMD7TH_END) { off = (msr - MSR_AMD7TH_START); *index = (off + base) / 4; return (0); } return (EINVAL); } /* * Allow vcpu to read or write the 'msr' without trapping into the hypervisor. 
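 *
 * Worked example (illustrative): for MSR_EFER (0xC0000080), svm_msr_index()
 * above computes index = (0x80 + 0x2000) / 4 = 0x820 and bit = 0, so
 * svm_msr_rd_ok() clears bit 0 of perm_bitmap[0x820] (reads allowed) while
 * leaving bit 1 set (writes still intercepted).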
*/ static void svm_msr_perm(uint8_t *perm_bitmap, uint64_t msr, bool read, bool write) { int index, bit, error; error = svm_msr_index(msr, &index, &bit); KASSERT(error == 0, ("%s: invalid msr %#lx", __func__, msr)); KASSERT(index >= 0 && index < SVM_MSR_BITMAP_SIZE, ("%s: invalid index %d for msr %#lx", __func__, index, msr)); KASSERT(bit >= 0 && bit <= 6, ("%s: invalid bit position %d " "msr %#lx", __func__, bit, msr)); if (read) perm_bitmap[index] &= ~(1UL << bit); if (write) perm_bitmap[index] &= ~(2UL << bit); } static void svm_msr_rw_ok(uint8_t *perm_bitmap, uint64_t msr) { svm_msr_perm(perm_bitmap, msr, true, true); } static void svm_msr_rd_ok(uint8_t *perm_bitmap, uint64_t msr) { svm_msr_perm(perm_bitmap, msr, true, false); } static __inline int svm_get_intercept(struct svm_softc *sc, int vcpu, int idx, uint32_t bitmask) { struct vmcb_ctrl *ctrl; KASSERT(idx >=0 && idx < 5, ("invalid intercept index %d", idx)); ctrl = svm_get_vmcb_ctrl(sc, vcpu); return (ctrl->intercept[idx] & bitmask ? 1 : 0); } static __inline void svm_set_intercept(struct svm_softc *sc, int vcpu, int idx, uint32_t bitmask, int enabled) { struct vmcb_ctrl *ctrl; uint32_t oldval; KASSERT(idx >=0 && idx < 5, ("invalid intercept index %d", idx)); ctrl = svm_get_vmcb_ctrl(sc, vcpu); oldval = ctrl->intercept[idx]; if (enabled) ctrl->intercept[idx] |= bitmask; else ctrl->intercept[idx] &= ~bitmask; if (ctrl->intercept[idx] != oldval) { svm_set_dirty(sc, vcpu, VMCB_CACHE_I); VCPU_CTR3(sc->vm, vcpu, "intercept[%d] modified " "from %#x to %#x", idx, oldval, ctrl->intercept[idx]); } } static __inline void svm_disable_intercept(struct svm_softc *sc, int vcpu, int off, uint32_t bitmask) { svm_set_intercept(sc, vcpu, off, bitmask, 0); } static __inline void svm_enable_intercept(struct svm_softc *sc, int vcpu, int off, uint32_t bitmask) { svm_set_intercept(sc, vcpu, off, bitmask, 1); } static void vmcb_init(struct svm_softc *sc, int vcpu, uint64_t iopm_base_pa, uint64_t msrpm_base_pa, uint64_t np_pml4) { struct vmcb_ctrl *ctrl; struct vmcb_state *state; uint32_t mask; int n; ctrl = svm_get_vmcb_ctrl(sc, vcpu); state = svm_get_vmcb_state(sc, vcpu); ctrl->iopm_base_pa = iopm_base_pa; ctrl->msrpm_base_pa = msrpm_base_pa; /* Enable nested paging */ ctrl->np_enable = 1; ctrl->n_cr3 = np_pml4; /* * Intercept accesses to the control registers that are not shadowed * in the VMCB - i.e. all except cr0, cr2, cr3, cr4 and cr8. */ for (n = 0; n < 16; n++) { mask = (BIT(n) << 16) | BIT(n); if (n == 0 || n == 2 || n == 3 || n == 4 || n == 8) svm_disable_intercept(sc, vcpu, VMCB_CR_INTCPT, mask); else svm_enable_intercept(sc, vcpu, VMCB_CR_INTCPT, mask); } /* * Intercept everything when tracing guest exceptions otherwise * just intercept machine check exception. */ if (vcpu_trace_exceptions(sc->vm, vcpu)) { for (n = 0; n < 32; n++) { /* * Skip unimplemented vectors in the exception bitmap. */ if (n == 2 || n == 9) { continue; } svm_enable_intercept(sc, vcpu, VMCB_EXC_INTCPT, BIT(n)); } } else { svm_enable_intercept(sc, vcpu, VMCB_EXC_INTCPT, BIT(IDT_MC)); } /* Intercept various events (for e.g. 
I/O, MSR and CPUID accesses) */ svm_enable_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_IO); svm_enable_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_MSR); svm_enable_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_CPUID); svm_enable_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_INTR); svm_enable_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_INIT); svm_enable_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_NMI); svm_enable_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_SMI); svm_enable_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_SHUTDOWN); svm_enable_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_FERR_FREEZE); svm_enable_intercept(sc, vcpu, VMCB_CTRL2_INTCPT, VMCB_INTCPT_MONITOR); svm_enable_intercept(sc, vcpu, VMCB_CTRL2_INTCPT, VMCB_INTCPT_MWAIT); /* * From section "Canonicalization and Consistency Checks" in APMv2 * the VMRUN intercept bit must be set to pass the consistency check. */ svm_enable_intercept(sc, vcpu, VMCB_CTRL2_INTCPT, VMCB_INTCPT_VMRUN); /* * The ASID will be set to a non-zero value just before VMRUN. */ ctrl->asid = 0; /* * Section 15.21.1, Interrupt Masking in EFLAGS * Section 15.21.2, Virtualizing APIC.TPR * * This must be set for %rflag and %cr8 isolation of guest and host. */ ctrl->v_intr_masking = 1; /* Enable Last Branch Record aka LBR for debugging */ ctrl->lbr_virt_en = 1; state->dbgctl = BIT(0); /* EFER_SVM must always be set when the guest is executing */ state->efer = EFER_SVM; /* Set up the PAT to power-on state */ state->g_pat = PAT_VALUE(0, PAT_WRITE_BACK) | PAT_VALUE(1, PAT_WRITE_THROUGH) | PAT_VALUE(2, PAT_UNCACHED) | PAT_VALUE(3, PAT_UNCACHEABLE) | PAT_VALUE(4, PAT_WRITE_BACK) | PAT_VALUE(5, PAT_WRITE_THROUGH) | PAT_VALUE(6, PAT_UNCACHED) | PAT_VALUE(7, PAT_UNCACHEABLE); /* Set up DR6/7 to power-on state */ state->dr6 = DBREG_DR6_RESERVED1; state->dr7 = DBREG_DR7_RESERVED1; } /* * Initialize a virtual machine. */ static void * svm_vminit(struct vm *vm, pmap_t pmap) { struct svm_softc *svm_sc; struct svm_vcpu *vcpu; vm_paddr_t msrpm_pa, iopm_pa, pml4_pa; int i; + uint16_t maxcpus; svm_sc = malloc(sizeof (*svm_sc), M_SVM, M_WAITOK | M_ZERO); if (((uintptr_t)svm_sc & PAGE_MASK) != 0) panic("malloc of svm_softc not aligned on page boundary"); svm_sc->msr_bitmap = contigmalloc(SVM_MSR_BITMAP_SIZE, M_SVM, M_WAITOK, 0, ~(vm_paddr_t)0, PAGE_SIZE, 0); if (svm_sc->msr_bitmap == NULL) panic("contigmalloc of SVM MSR bitmap failed"); svm_sc->iopm_bitmap = contigmalloc(SVM_IO_BITMAP_SIZE, M_SVM, M_WAITOK, 0, ~(vm_paddr_t)0, PAGE_SIZE, 0); if (svm_sc->iopm_bitmap == NULL) panic("contigmalloc of SVM IO bitmap failed"); svm_sc->vm = vm; svm_sc->nptp = (vm_offset_t)vtophys(pmap->pm_pml4); /* * Intercept read and write accesses to all MSRs. */ memset(svm_sc->msr_bitmap, 0xFF, SVM_MSR_BITMAP_SIZE); /* * Access to the following MSRs is redirected to the VMCB when the * guest is executing. Therefore it is safe to allow the guest to * read/write these MSRs directly without hypervisor involvement. 
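 *
 * (The svm_msr_rw_ok()/svm_msr_rd_ok() calls below simply clear the
 * corresponding read/write intercept bits in the permission bitmap.)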
*/ svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_GSBASE); svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_FSBASE); svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_KGSBASE); svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_STAR); svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_LSTAR); svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_CSTAR); svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_SF_MASK); svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_SYSENTER_CS_MSR); svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_SYSENTER_ESP_MSR); svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_SYSENTER_EIP_MSR); svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_PAT); svm_msr_rd_ok(svm_sc->msr_bitmap, MSR_TSC); /* * Intercept writes to make sure that the EFER_SVM bit is not cleared. */ svm_msr_rd_ok(svm_sc->msr_bitmap, MSR_EFER); /* Intercept access to all I/O ports. */ memset(svm_sc->iopm_bitmap, 0xFF, SVM_IO_BITMAP_SIZE); iopm_pa = vtophys(svm_sc->iopm_bitmap); msrpm_pa = vtophys(svm_sc->msr_bitmap); pml4_pa = svm_sc->nptp; - for (i = 0; i < VM_MAXCPU; i++) { + maxcpus = vm_get_maxcpus(svm_sc->vm); + for (i = 0; i < maxcpus; i++) { vcpu = svm_get_vcpu(svm_sc, i); vcpu->nextrip = ~0; vcpu->lastcpu = NOCPU; vcpu->vmcb_pa = vtophys(&vcpu->vmcb); vmcb_init(svm_sc, i, iopm_pa, msrpm_pa, pml4_pa); svm_msr_guest_init(svm_sc, i); } return (svm_sc); } /* * Collateral for a generic SVM VM-exit. */ static void vm_exit_svm(struct vm_exit *vme, uint64_t code, uint64_t info1, uint64_t info2) { vme->exitcode = VM_EXITCODE_SVM; vme->u.svm.exitcode = code; vme->u.svm.exitinfo1 = info1; vme->u.svm.exitinfo2 = info2; } static int svm_cpl(struct vmcb_state *state) { /* * From APMv2: * "Retrieve the CPL from the CPL field in the VMCB, not * from any segment DPL" */ return (state->cpl); } static enum vm_cpu_mode svm_vcpu_mode(struct vmcb *vmcb) { struct vmcb_segment seg; struct vmcb_state *state; int error; state = &vmcb->state; if (state->efer & EFER_LMA) { error = vmcb_seg(vmcb, VM_REG_GUEST_CS, &seg); KASSERT(error == 0, ("%s: vmcb_seg(cs) error %d", __func__, error)); /* * Section 4.8.1 for APM2, check if Code Segment has * Long attribute set in descriptor. */ if (seg.attrib & VMCB_CS_ATTRIB_L) return (CPU_MODE_64BIT); else return (CPU_MODE_COMPATIBILITY); } else if (state->cr0 & CR0_PE) { return (CPU_MODE_PROTECTED); } else { return (CPU_MODE_REAL); } } static enum vm_paging_mode svm_paging_mode(uint64_t cr0, uint64_t cr4, uint64_t efer) { if ((cr0 & CR0_PG) == 0) return (PAGING_MODE_FLAT); if ((cr4 & CR4_PAE) == 0) return (PAGING_MODE_32); if (efer & EFER_LME) return (PAGING_MODE_64); else return (PAGING_MODE_PAE); } /* * ins/outs utility routines */ static uint64_t svm_inout_str_index(struct svm_regctx *regs, int in) { uint64_t val; val = in ? regs->sctx_rdi : regs->sctx_rsi; return (val); } static uint64_t svm_inout_str_count(struct svm_regctx *regs, int rep) { uint64_t val; val = rep ? 
regs->sctx_rcx : 1; return (val); } static void svm_inout_str_seginfo(struct svm_softc *svm_sc, int vcpu, int64_t info1, int in, struct vm_inout_str *vis) { int error, s; if (in) { vis->seg_name = VM_REG_GUEST_ES; } else { /* The segment field has standard encoding */ s = (info1 >> 10) & 0x7; vis->seg_name = vm_segment_name(s); } error = vmcb_getdesc(svm_sc, vcpu, vis->seg_name, &vis->seg_desc); KASSERT(error == 0, ("%s: svm_getdesc error %d", __func__, error)); } static int svm_inout_str_addrsize(uint64_t info1) { uint32_t size; size = (info1 >> 7) & 0x7; switch (size) { case 1: return (2); /* 16 bit */ case 2: return (4); /* 32 bit */ case 4: return (8); /* 64 bit */ default: panic("%s: invalid size encoding %d", __func__, size); } } static void svm_paging_info(struct vmcb *vmcb, struct vm_guest_paging *paging) { struct vmcb_state *state; state = &vmcb->state; paging->cr3 = state->cr3; paging->cpl = svm_cpl(state); paging->cpu_mode = svm_vcpu_mode(vmcb); paging->paging_mode = svm_paging_mode(state->cr0, state->cr4, state->efer); } #define UNHANDLED 0 /* * Handle guest I/O intercept. */ static int svm_handle_io(struct svm_softc *svm_sc, int vcpu, struct vm_exit *vmexit) { struct vmcb_ctrl *ctrl; struct vmcb_state *state; struct svm_regctx *regs; struct vm_inout_str *vis; uint64_t info1; int inout_string; state = svm_get_vmcb_state(svm_sc, vcpu); ctrl = svm_get_vmcb_ctrl(svm_sc, vcpu); regs = svm_get_guest_regctx(svm_sc, vcpu); info1 = ctrl->exitinfo1; inout_string = info1 & BIT(2) ? 1 : 0; /* * The effective segment number in EXITINFO1[12:10] is populated * only if the processor has the DecodeAssist capability. * * XXX this is not specified explicitly in APMv2 but can be verified * empirically. */ if (inout_string && !decode_assist()) return (UNHANDLED); vmexit->exitcode = VM_EXITCODE_INOUT; vmexit->u.inout.in = (info1 & BIT(0)) ? 1 : 0; vmexit->u.inout.string = inout_string; vmexit->u.inout.rep = (info1 & BIT(3)) ? 
1 : 0; vmexit->u.inout.bytes = (info1 >> 4) & 0x7; vmexit->u.inout.port = (uint16_t)(info1 >> 16); vmexit->u.inout.eax = (uint32_t)(state->rax); if (inout_string) { vmexit->exitcode = VM_EXITCODE_INOUT_STR; vis = &vmexit->u.inout_str; svm_paging_info(svm_get_vmcb(svm_sc, vcpu), &vis->paging); vis->rflags = state->rflags; vis->cr0 = state->cr0; vis->index = svm_inout_str_index(regs, vmexit->u.inout.in); vis->count = svm_inout_str_count(regs, vmexit->u.inout.rep); vis->addrsize = svm_inout_str_addrsize(info1); svm_inout_str_seginfo(svm_sc, vcpu, info1, vmexit->u.inout.in, vis); } return (UNHANDLED); } static int npf_fault_type(uint64_t exitinfo1) { if (exitinfo1 & VMCB_NPF_INFO1_W) return (VM_PROT_WRITE); else if (exitinfo1 & VMCB_NPF_INFO1_ID) return (VM_PROT_EXECUTE); else return (VM_PROT_READ); } static bool svm_npf_emul_fault(uint64_t exitinfo1) { if (exitinfo1 & VMCB_NPF_INFO1_ID) { return (false); } if (exitinfo1 & VMCB_NPF_INFO1_GPT) { return (false); } if ((exitinfo1 & VMCB_NPF_INFO1_GPA) == 0) { return (false); } return (true); } static void svm_handle_inst_emul(struct vmcb *vmcb, uint64_t gpa, struct vm_exit *vmexit) { struct vm_guest_paging *paging; struct vmcb_segment seg; struct vmcb_ctrl *ctrl; char *inst_bytes; int error, inst_len; ctrl = &vmcb->ctrl; paging = &vmexit->u.inst_emul.paging; vmexit->exitcode = VM_EXITCODE_INST_EMUL; vmexit->u.inst_emul.gpa = gpa; vmexit->u.inst_emul.gla = VIE_INVALID_GLA; svm_paging_info(vmcb, paging); error = vmcb_seg(vmcb, VM_REG_GUEST_CS, &seg); KASSERT(error == 0, ("%s: vmcb_seg(CS) error %d", __func__, error)); switch(paging->cpu_mode) { case CPU_MODE_REAL: vmexit->u.inst_emul.cs_base = seg.base; vmexit->u.inst_emul.cs_d = 0; break; case CPU_MODE_PROTECTED: case CPU_MODE_COMPATIBILITY: vmexit->u.inst_emul.cs_base = seg.base; /* * Section 4.8.1 of APM2, Default Operand Size or D bit. */ vmexit->u.inst_emul.cs_d = (seg.attrib & VMCB_CS_ATTRIB_D) ? 1 : 0; break; default: vmexit->u.inst_emul.cs_base = 0; vmexit->u.inst_emul.cs_d = 0; break; } /* * Copy the instruction bytes into 'vie' if available. */ if (decode_assist() && !disable_npf_assist) { inst_len = ctrl->inst_len; inst_bytes = ctrl->inst_bytes; } else { inst_len = 0; inst_bytes = NULL; } vie_init(&vmexit->u.inst_emul.vie, inst_bytes, inst_len); } #ifdef KTR static const char * intrtype_to_str(int intr_type) { switch (intr_type) { case VMCB_EVENTINJ_TYPE_INTR: return ("hwintr"); case VMCB_EVENTINJ_TYPE_NMI: return ("nmi"); case VMCB_EVENTINJ_TYPE_INTn: return ("swintr"); case VMCB_EVENTINJ_TYPE_EXCEPTION: return ("exception"); default: panic("%s: unknown intr_type %d", __func__, intr_type); } } #endif /* * Inject an event to vcpu as described in section 15.20, "Event injection". 
*/ static void svm_eventinject(struct svm_softc *sc, int vcpu, int intr_type, int vector, uint32_t error, bool ec_valid) { struct vmcb_ctrl *ctrl; ctrl = svm_get_vmcb_ctrl(sc, vcpu); KASSERT((ctrl->eventinj & VMCB_EVENTINJ_VALID) == 0, ("%s: event already pending %#lx", __func__, ctrl->eventinj)); KASSERT(vector >=0 && vector <= 255, ("%s: invalid vector %d", __func__, vector)); switch (intr_type) { case VMCB_EVENTINJ_TYPE_INTR: case VMCB_EVENTINJ_TYPE_NMI: case VMCB_EVENTINJ_TYPE_INTn: break; case VMCB_EVENTINJ_TYPE_EXCEPTION: if (vector >= 0 && vector <= 31 && vector != 2) break; /* FALLTHROUGH */ default: panic("%s: invalid intr_type/vector: %d/%d", __func__, intr_type, vector); } ctrl->eventinj = vector | (intr_type << 8) | VMCB_EVENTINJ_VALID; if (ec_valid) { ctrl->eventinj |= VMCB_EVENTINJ_EC_VALID; ctrl->eventinj |= (uint64_t)error << 32; VCPU_CTR3(sc->vm, vcpu, "Injecting %s at vector %d errcode %#x", intrtype_to_str(intr_type), vector, error); } else { VCPU_CTR2(sc->vm, vcpu, "Injecting %s at vector %d", intrtype_to_str(intr_type), vector); } } static void svm_update_virqinfo(struct svm_softc *sc, int vcpu) { struct vm *vm; struct vlapic *vlapic; struct vmcb_ctrl *ctrl; vm = sc->vm; vlapic = vm_lapic(vm, vcpu); ctrl = svm_get_vmcb_ctrl(sc, vcpu); /* Update %cr8 in the emulated vlapic */ vlapic_set_cr8(vlapic, ctrl->v_tpr); /* Virtual interrupt injection is not used. */ KASSERT(ctrl->v_intr_vector == 0, ("%s: invalid " "v_intr_vector %d", __func__, ctrl->v_intr_vector)); } static void svm_save_intinfo(struct svm_softc *svm_sc, int vcpu) { struct vmcb_ctrl *ctrl; uint64_t intinfo; ctrl = svm_get_vmcb_ctrl(svm_sc, vcpu); intinfo = ctrl->exitintinfo; if (!VMCB_EXITINTINFO_VALID(intinfo)) return; /* * From APMv2, Section "Intercepts during IDT interrupt delivery" * * If a #VMEXIT happened during event delivery then record the event * that was being delivered. 
*/ VCPU_CTR2(svm_sc->vm, vcpu, "SVM:Pending INTINFO(0x%lx), vector=%d.\n", intinfo, VMCB_EXITINTINFO_VECTOR(intinfo)); vmm_stat_incr(svm_sc->vm, vcpu, VCPU_EXITINTINFO, 1); vm_exit_intinfo(svm_sc->vm, vcpu, intinfo); } #ifdef INVARIANTS static __inline int vintr_intercept_enabled(struct svm_softc *sc, int vcpu) { return (svm_get_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_VINTR)); } #endif static __inline void enable_intr_window_exiting(struct svm_softc *sc, int vcpu) { struct vmcb_ctrl *ctrl; ctrl = svm_get_vmcb_ctrl(sc, vcpu); if (ctrl->v_irq && ctrl->v_intr_vector == 0) { KASSERT(ctrl->v_ign_tpr, ("%s: invalid v_ign_tpr", __func__)); KASSERT(vintr_intercept_enabled(sc, vcpu), ("%s: vintr intercept should be enabled", __func__)); return; } VCPU_CTR0(sc->vm, vcpu, "Enable intr window exiting"); ctrl->v_irq = 1; ctrl->v_ign_tpr = 1; ctrl->v_intr_vector = 0; svm_set_dirty(sc, vcpu, VMCB_CACHE_TPR); svm_enable_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_VINTR); } static __inline void disable_intr_window_exiting(struct svm_softc *sc, int vcpu) { struct vmcb_ctrl *ctrl; ctrl = svm_get_vmcb_ctrl(sc, vcpu); if (!ctrl->v_irq && ctrl->v_intr_vector == 0) { KASSERT(!vintr_intercept_enabled(sc, vcpu), ("%s: vintr intercept should be disabled", __func__)); return; } VCPU_CTR0(sc->vm, vcpu, "Disable intr window exiting"); ctrl->v_irq = 0; ctrl->v_intr_vector = 0; svm_set_dirty(sc, vcpu, VMCB_CACHE_TPR); svm_disable_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_VINTR); } static int svm_modify_intr_shadow(struct svm_softc *sc, int vcpu, uint64_t val) { struct vmcb_ctrl *ctrl; int oldval, newval; ctrl = svm_get_vmcb_ctrl(sc, vcpu); oldval = ctrl->intr_shadow; newval = val ? 1 : 0; if (newval != oldval) { ctrl->intr_shadow = newval; VCPU_CTR1(sc->vm, vcpu, "Setting intr_shadow to %d", newval); } return (0); } static int svm_get_intr_shadow(struct svm_softc *sc, int vcpu, uint64_t *val) { struct vmcb_ctrl *ctrl; ctrl = svm_get_vmcb_ctrl(sc, vcpu); *val = ctrl->intr_shadow; return (0); } /* * Once an NMI is injected it blocks delivery of further NMIs until the handler * executes an IRET. The IRET intercept is enabled when an NMI is injected to * to track when the vcpu is done handling the NMI. */ static int nmi_blocked(struct svm_softc *sc, int vcpu) { int blocked; blocked = svm_get_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_IRET); return (blocked); } static void enable_nmi_blocking(struct svm_softc *sc, int vcpu) { KASSERT(!nmi_blocked(sc, vcpu), ("vNMI already blocked")); VCPU_CTR0(sc->vm, vcpu, "vNMI blocking enabled"); svm_enable_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_IRET); } static void clear_nmi_blocking(struct svm_softc *sc, int vcpu) { int error; KASSERT(nmi_blocked(sc, vcpu), ("vNMI already unblocked")); VCPU_CTR0(sc->vm, vcpu, "vNMI blocking cleared"); /* * When the IRET intercept is cleared the vcpu will attempt to execute * the "iret" when it runs next. However, it is possible to inject * another NMI into the vcpu before the "iret" has actually executed. * * For e.g. if the "iret" encounters a #NPF when accessing the stack * it will trap back into the hypervisor. If an NMI is pending for * the vcpu it will be injected into the guest. * * XXX this needs to be fixed */ svm_disable_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_IRET); /* * Set 'intr_shadow' to prevent an NMI from being injected on the * immediate VMRUN. 
*/ error = svm_modify_intr_shadow(sc, vcpu, 1); KASSERT(!error, ("%s: error %d setting intr_shadow", __func__, error)); } #define EFER_MBZ_BITS 0xFFFFFFFFFFFF0200UL static int svm_write_efer(struct svm_softc *sc, int vcpu, uint64_t newval, bool *retu) { struct vm_exit *vme; struct vmcb_state *state; uint64_t changed, lma, oldval; int error; state = svm_get_vmcb_state(sc, vcpu); oldval = state->efer; VCPU_CTR2(sc->vm, vcpu, "wrmsr(efer) %#lx/%#lx", oldval, newval); newval &= ~0xFE; /* clear the Read-As-Zero (RAZ) bits */ changed = oldval ^ newval; if (newval & EFER_MBZ_BITS) goto gpf; /* APMv2 Table 14-5 "Long-Mode Consistency Checks" */ if (changed & EFER_LME) { if (state->cr0 & CR0_PG) goto gpf; } /* EFER.LMA = EFER.LME & CR0.PG */ if ((newval & EFER_LME) != 0 && (state->cr0 & CR0_PG) != 0) lma = EFER_LMA; else lma = 0; if ((newval & EFER_LMA) != lma) goto gpf; if (newval & EFER_NXE) { if (!vm_cpuid_capability(sc->vm, vcpu, VCC_NO_EXECUTE)) goto gpf; } /* * XXX bhyve does not enforce segment limits in 64-bit mode. Until * this is fixed flag guest attempt to set EFER_LMSLE as an error. */ if (newval & EFER_LMSLE) { vme = vm_exitinfo(sc->vm, vcpu); vm_exit_svm(vme, VMCB_EXIT_MSR, 1, 0); *retu = true; return (0); } if (newval & EFER_FFXSR) { if (!vm_cpuid_capability(sc->vm, vcpu, VCC_FFXSR)) goto gpf; } if (newval & EFER_TCE) { if (!vm_cpuid_capability(sc->vm, vcpu, VCC_TCE)) goto gpf; } error = svm_setreg(sc, vcpu, VM_REG_GUEST_EFER, newval); KASSERT(error == 0, ("%s: error %d updating efer", __func__, error)); return (0); gpf: vm_inject_gp(sc->vm, vcpu); return (0); } static int emulate_wrmsr(struct svm_softc *sc, int vcpu, u_int num, uint64_t val, bool *retu) { int error; if (lapic_msr(num)) error = lapic_wrmsr(sc->vm, vcpu, num, val, retu); else if (num == MSR_EFER) error = svm_write_efer(sc, vcpu, val, retu); else error = svm_wrmsr(sc, vcpu, num, val, retu); return (error); } static int emulate_rdmsr(struct svm_softc *sc, int vcpu, u_int num, bool *retu) { struct vmcb_state *state; struct svm_regctx *ctx; uint64_t result; int error; if (lapic_msr(num)) error = lapic_rdmsr(sc->vm, vcpu, num, &result, retu); else error = svm_rdmsr(sc, vcpu, num, &result, retu); if (error == 0) { state = svm_get_vmcb_state(sc, vcpu); ctx = svm_get_guest_regctx(sc, vcpu); state->rax = result & 0xffffffff; ctx->sctx_rdx = result >> 32; } return (error); } #ifdef KTR static const char * exit_reason_to_str(uint64_t reason) { static char reasonbuf[32]; switch (reason) { case VMCB_EXIT_INVALID: return ("invalvmcb"); case VMCB_EXIT_SHUTDOWN: return ("shutdown"); case VMCB_EXIT_NPF: return ("nptfault"); case VMCB_EXIT_PAUSE: return ("pause"); case VMCB_EXIT_HLT: return ("hlt"); case VMCB_EXIT_CPUID: return ("cpuid"); case VMCB_EXIT_IO: return ("inout"); case VMCB_EXIT_MC: return ("mchk"); case VMCB_EXIT_INTR: return ("extintr"); case VMCB_EXIT_NMI: return ("nmi"); case VMCB_EXIT_VINTR: return ("vintr"); case VMCB_EXIT_MSR: return ("msr"); case VMCB_EXIT_IRET: return ("iret"); case VMCB_EXIT_MONITOR: return ("monitor"); case VMCB_EXIT_MWAIT: return ("mwait"); default: snprintf(reasonbuf, sizeof(reasonbuf), "%#lx", reason); return (reasonbuf); } } #endif /* KTR */ /* * From section "State Saved on Exit" in APMv2: nRIP is saved for all #VMEXITs * that are due to instruction intercepts as well as MSR and IOIO intercepts * and exceptions caused by INT3, INTO and BOUND instructions. * * Return 1 if the nRIP is valid and 0 otherwise. */ static int nrip_valid(uint64_t exitcode) { switch (exitcode) { case 0x00 ... 
0x0F: /* read of CR0 through CR15 */ case 0x10 ... 0x1F: /* write of CR0 through CR15 */ case 0x20 ... 0x2F: /* read of DR0 through DR15 */ case 0x30 ... 0x3F: /* write of DR0 through DR15 */ case 0x43: /* INT3 */ case 0x44: /* INTO */ case 0x45: /* BOUND */ case 0x65 ... 0x7C: /* VMEXIT_CR0_SEL_WRITE ... VMEXIT_MSR */ case 0x80 ... 0x8D: /* VMEXIT_VMRUN ... VMEXIT_XSETBV */ return (1); default: return (0); } } static int svm_vmexit(struct svm_softc *svm_sc, int vcpu, struct vm_exit *vmexit) { struct vmcb *vmcb; struct vmcb_state *state; struct vmcb_ctrl *ctrl; struct svm_regctx *ctx; uint64_t code, info1, info2, val; uint32_t eax, ecx, edx; int error, errcode_valid, handled, idtvec, reflect; bool retu; ctx = svm_get_guest_regctx(svm_sc, vcpu); vmcb = svm_get_vmcb(svm_sc, vcpu); state = &vmcb->state; ctrl = &vmcb->ctrl; handled = 0; code = ctrl->exitcode; info1 = ctrl->exitinfo1; info2 = ctrl->exitinfo2; vmexit->exitcode = VM_EXITCODE_BOGUS; vmexit->rip = state->rip; vmexit->inst_length = nrip_valid(code) ? ctrl->nrip - state->rip : 0; vmm_stat_incr(svm_sc->vm, vcpu, VMEXIT_COUNT, 1); /* * #VMEXIT(INVALID) needs to be handled early because the VMCB is * in an inconsistent state and can trigger assertions that would * never happen otherwise. */ if (code == VMCB_EXIT_INVALID) { vm_exit_svm(vmexit, code, info1, info2); return (0); } KASSERT((ctrl->eventinj & VMCB_EVENTINJ_VALID) == 0, ("%s: event " "injection valid bit is set %#lx", __func__, ctrl->eventinj)); KASSERT(vmexit->inst_length >= 0 && vmexit->inst_length <= 15, ("invalid inst_length %d: code (%#lx), info1 (%#lx), info2 (%#lx)", vmexit->inst_length, code, info1, info2)); svm_update_virqinfo(svm_sc, vcpu); svm_save_intinfo(svm_sc, vcpu); switch (code) { case VMCB_EXIT_IRET: /* * Restart execution at "iret" but with the intercept cleared. */ vmexit->inst_length = 0; clear_nmi_blocking(svm_sc, vcpu); handled = 1; break; case VMCB_EXIT_VINTR: /* interrupt window exiting */ vmm_stat_incr(svm_sc->vm, vcpu, VMEXIT_VINTR, 1); handled = 1; break; case VMCB_EXIT_INTR: /* external interrupt */ vmm_stat_incr(svm_sc->vm, vcpu, VMEXIT_EXTINT, 1); handled = 1; break; case VMCB_EXIT_NMI: /* external NMI */ handled = 1; break; case 0x40 ... 0x5F: vmm_stat_incr(svm_sc->vm, vcpu, VMEXIT_EXCEPTION, 1); reflect = 1; idtvec = code - 0x40; switch (idtvec) { case IDT_MC: /* * Call the machine check handler by hand. Also don't * reflect the machine check back into the guest. */ reflect = 0; VCPU_CTR0(svm_sc->vm, vcpu, "Vectoring to MCE handler"); __asm __volatile("int $18"); break; case IDT_PF: error = svm_setreg(svm_sc, vcpu, VM_REG_GUEST_CR2, info2); KASSERT(error == 0, ("%s: error %d updating cr2", __func__, error)); /* fallthru */ case IDT_NP: case IDT_SS: case IDT_GP: case IDT_AC: case IDT_TS: errcode_valid = 1; break; case IDT_DF: errcode_valid = 1; info1 = 0; break; case IDT_BP: case IDT_OF: case IDT_BR: /* * The 'nrip' field is populated for INT3, INTO and * BOUND exceptions and this also implies that * 'inst_length' is non-zero. * * Reset 'inst_length' to zero so the guest %rip at * event injection is identical to what it was when * the exception originally happened. 
*/ VCPU_CTR2(svm_sc->vm, vcpu, "Reset inst_length from %d " "to zero before injecting exception %d", vmexit->inst_length, idtvec); vmexit->inst_length = 0; /* fallthru */ default: errcode_valid = 0; info1 = 0; break; } KASSERT(vmexit->inst_length == 0, ("invalid inst_length (%d) " "when reflecting exception %d into guest", vmexit->inst_length, idtvec)); if (reflect) { /* Reflect the exception back into the guest */ VCPU_CTR2(svm_sc->vm, vcpu, "Reflecting exception " "%d/%#x into the guest", idtvec, (int)info1); error = vm_inject_exception(svm_sc->vm, vcpu, idtvec, errcode_valid, info1, 0); KASSERT(error == 0, ("%s: vm_inject_exception error %d", __func__, error)); } handled = 1; break; case VMCB_EXIT_MSR: /* MSR access. */ eax = state->rax; ecx = ctx->sctx_rcx; edx = ctx->sctx_rdx; retu = false; if (info1) { vmm_stat_incr(svm_sc->vm, vcpu, VMEXIT_WRMSR, 1); val = (uint64_t)edx << 32 | eax; VCPU_CTR2(svm_sc->vm, vcpu, "wrmsr %#x val %#lx", ecx, val); if (emulate_wrmsr(svm_sc, vcpu, ecx, val, &retu)) { vmexit->exitcode = VM_EXITCODE_WRMSR; vmexit->u.msr.code = ecx; vmexit->u.msr.wval = val; } else if (!retu) { handled = 1; } else { KASSERT(vmexit->exitcode != VM_EXITCODE_BOGUS, ("emulate_wrmsr retu with bogus exitcode")); } } else { VCPU_CTR1(svm_sc->vm, vcpu, "rdmsr %#x", ecx); vmm_stat_incr(svm_sc->vm, vcpu, VMEXIT_RDMSR, 1); if (emulate_rdmsr(svm_sc, vcpu, ecx, &retu)) { vmexit->exitcode = VM_EXITCODE_RDMSR; vmexit->u.msr.code = ecx; } else if (!retu) { handled = 1; } else { KASSERT(vmexit->exitcode != VM_EXITCODE_BOGUS, ("emulate_rdmsr retu with bogus exitcode")); } } break; case VMCB_EXIT_IO: handled = svm_handle_io(svm_sc, vcpu, vmexit); vmm_stat_incr(svm_sc->vm, vcpu, VMEXIT_INOUT, 1); break; case VMCB_EXIT_CPUID: vmm_stat_incr(svm_sc->vm, vcpu, VMEXIT_CPUID, 1); handled = x86_emulate_cpuid(svm_sc->vm, vcpu, (uint32_t *)&state->rax, (uint32_t *)&ctx->sctx_rbx, (uint32_t *)&ctx->sctx_rcx, (uint32_t *)&ctx->sctx_rdx); break; case VMCB_EXIT_HLT: vmm_stat_incr(svm_sc->vm, vcpu, VMEXIT_HLT, 1); vmexit->exitcode = VM_EXITCODE_HLT; vmexit->u.hlt.rflags = state->rflags; break; case VMCB_EXIT_PAUSE: vmexit->exitcode = VM_EXITCODE_PAUSE; vmm_stat_incr(svm_sc->vm, vcpu, VMEXIT_PAUSE, 1); break; case VMCB_EXIT_NPF: /* EXITINFO2 contains the faulting guest physical address */ if (info1 & VMCB_NPF_INFO1_RSV) { VCPU_CTR2(svm_sc->vm, vcpu, "nested page fault with " "reserved bits set: info1(%#lx) info2(%#lx)", info1, info2); } else if (vm_mem_allocated(svm_sc->vm, vcpu, info2)) { vmexit->exitcode = VM_EXITCODE_PAGING; vmexit->u.paging.gpa = info2; vmexit->u.paging.fault_type = npf_fault_type(info1); vmm_stat_incr(svm_sc->vm, vcpu, VMEXIT_NESTED_FAULT, 1); VCPU_CTR3(svm_sc->vm, vcpu, "nested page fault " "on gpa %#lx/%#lx at rip %#lx", info2, info1, state->rip); } else if (svm_npf_emul_fault(info1)) { svm_handle_inst_emul(vmcb, info2, vmexit); vmm_stat_incr(svm_sc->vm, vcpu, VMEXIT_INST_EMUL, 1); VCPU_CTR3(svm_sc->vm, vcpu, "inst_emul fault " "for gpa %#lx/%#lx at rip %#lx", info2, info1, state->rip); } break; case VMCB_EXIT_MONITOR: vmexit->exitcode = VM_EXITCODE_MONITOR; break; case VMCB_EXIT_MWAIT: vmexit->exitcode = VM_EXITCODE_MWAIT; break; default: vmm_stat_incr(svm_sc->vm, vcpu, VMEXIT_UNKNOWN, 1); break; } VCPU_CTR4(svm_sc->vm, vcpu, "%s %s vmexit at %#lx/%d", handled ? 
"handled" : "unhandled", exit_reason_to_str(code), vmexit->rip, vmexit->inst_length); if (handled) { vmexit->rip += vmexit->inst_length; vmexit->inst_length = 0; state->rip = vmexit->rip; } else { if (vmexit->exitcode == VM_EXITCODE_BOGUS) { /* * If this VM exit was not claimed by anybody then * treat it as a generic SVM exit. */ vm_exit_svm(vmexit, code, info1, info2); } else { /* * The exitcode and collateral have been populated. * The VM exit will be processed further in userland. */ } } return (handled); } static void svm_inj_intinfo(struct svm_softc *svm_sc, int vcpu) { uint64_t intinfo; if (!vm_entry_intinfo(svm_sc->vm, vcpu, &intinfo)) return; KASSERT(VMCB_EXITINTINFO_VALID(intinfo), ("%s: entry intinfo is not " "valid: %#lx", __func__, intinfo)); svm_eventinject(svm_sc, vcpu, VMCB_EXITINTINFO_TYPE(intinfo), VMCB_EXITINTINFO_VECTOR(intinfo), VMCB_EXITINTINFO_EC(intinfo), VMCB_EXITINTINFO_EC_VALID(intinfo)); vmm_stat_incr(svm_sc->vm, vcpu, VCPU_INTINFO_INJECTED, 1); VCPU_CTR1(svm_sc->vm, vcpu, "Injected entry intinfo: %#lx", intinfo); } /* * Inject event to virtual cpu. */ static void svm_inj_interrupts(struct svm_softc *sc, int vcpu, struct vlapic *vlapic) { struct vmcb_ctrl *ctrl; struct vmcb_state *state; struct svm_vcpu *vcpustate; uint8_t v_tpr; int vector, need_intr_window; int extint_pending; state = svm_get_vmcb_state(sc, vcpu); ctrl = svm_get_vmcb_ctrl(sc, vcpu); vcpustate = svm_get_vcpu(sc, vcpu); need_intr_window = 0; if (vcpustate->nextrip != state->rip) { ctrl->intr_shadow = 0; VCPU_CTR2(sc->vm, vcpu, "Guest interrupt blocking " "cleared due to rip change: %#lx/%#lx", vcpustate->nextrip, state->rip); } /* * Inject pending events or exceptions for this vcpu. * * An event might be pending because the previous #VMEXIT happened * during event delivery (i.e. ctrl->exitintinfo). * * An event might also be pending because an exception was injected * by the hypervisor (e.g. #PF during instruction emulation). */ svm_inj_intinfo(sc, vcpu); /* NMI event has priority over interrupts. */ if (vm_nmi_pending(sc->vm, vcpu)) { if (nmi_blocked(sc, vcpu)) { /* * Can't inject another NMI if the guest has not * yet executed an "iret" after the last NMI. */ VCPU_CTR0(sc->vm, vcpu, "Cannot inject NMI due " "to NMI-blocking"); } else if (ctrl->intr_shadow) { /* * Can't inject an NMI if the vcpu is in an intr_shadow. */ VCPU_CTR0(sc->vm, vcpu, "Cannot inject NMI due to " "interrupt shadow"); need_intr_window = 1; goto done; } else if (ctrl->eventinj & VMCB_EVENTINJ_VALID) { /* * If there is already an exception/interrupt pending * then defer the NMI until after that. */ VCPU_CTR1(sc->vm, vcpu, "Cannot inject NMI due to " "eventinj %#lx", ctrl->eventinj); /* * Use self-IPI to trigger a VM-exit as soon as * possible after the event injection is completed. * * This works only if the external interrupt exiting * is at a lower priority than the event injection. * * Although not explicitly specified in APMv2 the * relative priorities were verified empirically. */ ipi_cpu(curcpu, IPI_AST); /* XXX vmm_ipinum? 
*/ } else { vm_nmi_clear(sc->vm, vcpu); /* Inject NMI, vector number is not used */ svm_eventinject(sc, vcpu, VMCB_EVENTINJ_TYPE_NMI, IDT_NMI, 0, false); /* virtual NMI blocking is now in effect */ enable_nmi_blocking(sc, vcpu); VCPU_CTR0(sc->vm, vcpu, "Injecting vNMI"); } } extint_pending = vm_extint_pending(sc->vm, vcpu); if (!extint_pending) { if (!vlapic_pending_intr(vlapic, &vector)) goto done; KASSERT(vector >= 16 && vector <= 255, ("invalid vector %d from local APIC", vector)); } else { /* Ask the legacy pic for a vector to inject */ vatpic_pending_intr(sc->vm, &vector); KASSERT(vector >= 0 && vector <= 255, ("invalid vector %d from INTR", vector)); } /* * If the guest has disabled interrupts or is in an interrupt shadow * then we cannot inject the pending interrupt. */ if ((state->rflags & PSL_I) == 0) { VCPU_CTR2(sc->vm, vcpu, "Cannot inject vector %d due to " "rflags %#lx", vector, state->rflags); need_intr_window = 1; goto done; } if (ctrl->intr_shadow) { VCPU_CTR1(sc->vm, vcpu, "Cannot inject vector %d due to " "interrupt shadow", vector); need_intr_window = 1; goto done; } if (ctrl->eventinj & VMCB_EVENTINJ_VALID) { VCPU_CTR2(sc->vm, vcpu, "Cannot inject vector %d due to " "eventinj %#lx", vector, ctrl->eventinj); need_intr_window = 1; goto done; } svm_eventinject(sc, vcpu, VMCB_EVENTINJ_TYPE_INTR, vector, 0, false); if (!extint_pending) { vlapic_intr_accepted(vlapic, vector); } else { vm_extint_clear(sc->vm, vcpu); vatpic_intr_accepted(sc->vm, vector); } /* * Force a VM-exit as soon as the vcpu is ready to accept another * interrupt. This is done because the PIC might have another vector * that it wants to inject. Also, if the APIC has a pending interrupt * that was preempted by the ExtInt then it allows us to inject the * APIC vector as soon as possible. */ need_intr_window = 1; done: /* * The guest can modify the TPR by writing to %CR8. In guest mode * the processor reflects this write to V_TPR without hypervisor * intervention. * * The guest can also modify the TPR by writing to it via the memory * mapped APIC page. In this case, the write will be emulated by the * hypervisor. For this reason V_TPR must be updated before every * VMRUN. */ v_tpr = vlapic_get_cr8(vlapic); KASSERT(v_tpr <= 15, ("invalid v_tpr %#x", v_tpr)); if (ctrl->v_tpr != v_tpr) { VCPU_CTR2(sc->vm, vcpu, "VMCB V_TPR changed from %#x to %#x", ctrl->v_tpr, v_tpr); ctrl->v_tpr = v_tpr; svm_set_dirty(sc, vcpu, VMCB_CACHE_TPR); } if (need_intr_window) { /* * We use V_IRQ in conjunction with the VINTR intercept to * trap into the hypervisor as soon as a virtual interrupt * can be delivered. * * Since injected events are not subject to intercept checks * we need to ensure that the V_IRQ is not actually going to * be delivered on VM entry. The KASSERT below enforces this. */ KASSERT((ctrl->eventinj & VMCB_EVENTINJ_VALID) != 0 || (state->rflags & PSL_I) == 0 || ctrl->intr_shadow, ("Bogus intr_window_exiting: eventinj (%#lx), " "intr_shadow (%u), rflags (%#lx)", ctrl->eventinj, ctrl->intr_shadow, state->rflags)); enable_intr_window_exiting(sc, vcpu); } else { disable_intr_window_exiting(sc, vcpu); } } static __inline void restore_host_tss(void) { struct system_segment_descriptor *tss_sd; /* * The TSS descriptor was in use prior to launching the guest so it * has been marked busy. * * 'ltr' requires the descriptor to be marked available so change the * type to "64-bit available TSS". 
*/ tss_sd = PCPU_GET(tss); tss_sd->sd_type = SDT_SYSTSS; ltr(GSEL(GPROC0_SEL, SEL_KPL)); } static void check_asid(struct svm_softc *sc, int vcpuid, pmap_t pmap, u_int thiscpu) { struct svm_vcpu *vcpustate; struct vmcb_ctrl *ctrl; long eptgen; bool alloc_asid; KASSERT(CPU_ISSET(thiscpu, &pmap->pm_active), ("%s: nested pmap not " "active on cpu %u", __func__, thiscpu)); vcpustate = svm_get_vcpu(sc, vcpuid); ctrl = svm_get_vmcb_ctrl(sc, vcpuid); /* * The TLB entries associated with the vcpu's ASID are not valid * if either of the following conditions is true: * * 1. The vcpu's ASID generation is different than the host cpu's * ASID generation. This happens when the vcpu migrates to a new * host cpu. It can also happen when the number of vcpus executing * on a host cpu is greater than the number of ASIDs available. * * 2. The pmap generation number is different than the value cached in * the 'vcpustate'. This happens when the host invalidates pages * belonging to the guest. * * asidgen eptgen Action * mismatch mismatch * 0 0 (a) * 0 1 (b1) or (b2) * 1 0 (c) * 1 1 (d) * * (a) There is no mismatch in eptgen or ASID generation and therefore * no further action is needed. * * (b1) If the cpu supports FlushByAsid then the vcpu's ASID is * retained and the TLB entries associated with this ASID * are flushed by VMRUN. * * (b2) If the cpu does not support FlushByAsid then a new ASID is * allocated. * * (c) A new ASID is allocated. * * (d) A new ASID is allocated. */ alloc_asid = false; eptgen = pmap->pm_eptgen; ctrl->tlb_ctrl = VMCB_TLB_FLUSH_NOTHING; if (vcpustate->asid.gen != asid[thiscpu].gen) { alloc_asid = true; /* (c) and (d) */ } else if (vcpustate->eptgen != eptgen) { if (flush_by_asid()) ctrl->tlb_ctrl = VMCB_TLB_FLUSH_GUEST; /* (b1) */ else alloc_asid = true; /* (b2) */ } else { /* * This is the common case (a). */ KASSERT(!alloc_asid, ("ASID allocation not necessary")); KASSERT(ctrl->tlb_ctrl == VMCB_TLB_FLUSH_NOTHING, ("Invalid VMCB tlb_ctrl: %#x", ctrl->tlb_ctrl)); } if (alloc_asid) { if (++asid[thiscpu].num >= nasid) { asid[thiscpu].num = 1; if (++asid[thiscpu].gen == 0) asid[thiscpu].gen = 1; /* * If this cpu does not support "flush-by-asid" * then flush the entire TLB on a generation * bump. Subsequent ASID allocation in this * generation can be done without a TLB flush. */ if (!flush_by_asid()) ctrl->tlb_ctrl = VMCB_TLB_FLUSH_ALL; } vcpustate->asid.gen = asid[thiscpu].gen; vcpustate->asid.num = asid[thiscpu].num; ctrl->asid = vcpustate->asid.num; svm_set_dirty(sc, vcpuid, VMCB_CACHE_ASID); /* * If this cpu supports "flush-by-asid" then the TLB * was not flushed after the generation bump. The TLB * is flushed selectively after every new ASID allocation. */ if (flush_by_asid()) ctrl->tlb_ctrl = VMCB_TLB_FLUSH_GUEST; } vcpustate->eptgen = eptgen; KASSERT(ctrl->asid != 0, ("Guest ASID must be non-zero")); KASSERT(ctrl->asid == vcpustate->asid.num, ("ASID mismatch: %u/%u", ctrl->asid, vcpustate->asid.num)); } static __inline void disable_gintr(void) { __asm __volatile("clgi"); } static __inline void enable_gintr(void) { __asm __volatile("stgi"); } static __inline void svm_dr_enter_guest(struct svm_regctx *gctx) { /* Save host control debug registers. */ gctx->host_dr7 = rdr7(); gctx->host_debugctl = rdmsr(MSR_DEBUGCTLMSR); /* * Disable debugging in DR7 and DEBUGCTL to avoid triggering * exceptions in the host based on the guest DRx values. The * guest DR6, DR7, and DEBUGCTL are saved/restored in the * VMCB. */ load_dr7(0); wrmsr(MSR_DEBUGCTLMSR, 0); /* Save host debug registers. 
*/ gctx->host_dr0 = rdr0(); gctx->host_dr1 = rdr1(); gctx->host_dr2 = rdr2(); gctx->host_dr3 = rdr3(); gctx->host_dr6 = rdr6(); /* Restore guest debug registers. */ load_dr0(gctx->sctx_dr0); load_dr1(gctx->sctx_dr1); load_dr2(gctx->sctx_dr2); load_dr3(gctx->sctx_dr3); } static __inline void svm_dr_leave_guest(struct svm_regctx *gctx) { /* Save guest debug registers. */ gctx->sctx_dr0 = rdr0(); gctx->sctx_dr1 = rdr1(); gctx->sctx_dr2 = rdr2(); gctx->sctx_dr3 = rdr3(); /* * Restore host debug registers. Restore DR7 and DEBUGCTL * last. */ load_dr0(gctx->host_dr0); load_dr1(gctx->host_dr1); load_dr2(gctx->host_dr2); load_dr3(gctx->host_dr3); load_dr6(gctx->host_dr6); wrmsr(MSR_DEBUGCTLMSR, gctx->host_debugctl); load_dr7(gctx->host_dr7); } /* * Start vcpu with specified RIP. */ static int svm_vmrun(void *arg, int vcpu, register_t rip, pmap_t pmap, struct vm_eventinfo *evinfo) { struct svm_regctx *gctx; struct svm_softc *svm_sc; struct svm_vcpu *vcpustate; struct vmcb_state *state; struct vmcb_ctrl *ctrl; struct vm_exit *vmexit; struct vlapic *vlapic; struct vm *vm; uint64_t vmcb_pa; int handled; uint16_t ldt_sel; svm_sc = arg; vm = svm_sc->vm; vcpustate = svm_get_vcpu(svm_sc, vcpu); state = svm_get_vmcb_state(svm_sc, vcpu); ctrl = svm_get_vmcb_ctrl(svm_sc, vcpu); vmexit = vm_exitinfo(vm, vcpu); vlapic = vm_lapic(vm, vcpu); gctx = svm_get_guest_regctx(svm_sc, vcpu); vmcb_pa = svm_sc->vcpu[vcpu].vmcb_pa; if (vcpustate->lastcpu != curcpu) { /* * Force new ASID allocation by invalidating the generation. */ vcpustate->asid.gen = 0; /* * Invalidate the VMCB state cache by marking all fields dirty. */ svm_set_dirty(svm_sc, vcpu, 0xffffffff); /* * XXX * Setting 'vcpustate->lastcpu' here is bit premature because * we may return from this function without actually executing * the VMRUN instruction. This could happen if a rendezvous * or an AST is pending on the first time through the loop. * * This works for now but any new side-effects of vcpu * migration should take this case into account. */ vcpustate->lastcpu = curcpu; vmm_stat_incr(vm, vcpu, VCPU_MIGRATIONS, 1); } svm_msr_guest_enter(svm_sc, vcpu); /* Update Guest RIP */ state->rip = rip; do { /* * Disable global interrupts to guarantee atomicity during * loading of guest state. This includes not only the state * loaded by the "vmrun" instruction but also software state * maintained by the hypervisor: suspended and rendezvous * state, NPT generation number, vlapic interrupts etc. */ disable_gintr(); if (vcpu_suspended(evinfo)) { enable_gintr(); vm_exit_suspended(vm, vcpu, state->rip); break; } if (vcpu_rendezvous_pending(evinfo)) { enable_gintr(); vm_exit_rendezvous(vm, vcpu, state->rip); break; } if (vcpu_reqidle(evinfo)) { enable_gintr(); vm_exit_reqidle(vm, vcpu, state->rip); break; } /* We are asked to give the cpu by scheduler. */ if (vcpu_should_yield(vm, vcpu)) { enable_gintr(); vm_exit_astpending(vm, vcpu, state->rip); break; } if (vcpu_debugged(vm, vcpu)) { enable_gintr(); vm_exit_debug(vm, vcpu, state->rip); break; } /* * #VMEXIT resumes the host with the guest LDTR, so * save the current LDT selector so it can be restored * after an exit. The userspace hypervisor probably * doesn't use a LDT, but save and restore it to be * safe. */ ldt_sel = sldt(); svm_inj_interrupts(svm_sc, vcpu, vlapic); /* Activate the nested pmap on 'curcpu' */ CPU_SET_ATOMIC_ACQ(curcpu, &pmap->pm_active); /* * Check the pmap generation and the ASID generation to * ensure that the vcpu does not use stale TLB mappings. 
check_asid(svm_sc, vcpu, pmap, curcpu); ctrl->vmcb_clean = vmcb_clean & ~vcpustate->dirty; vcpustate->dirty = 0; VCPU_CTR1(vm, vcpu, "vmcb clean %#x", ctrl->vmcb_clean); /* Launch Virtual Machine. */ VCPU_CTR1(vm, vcpu, "Resume execution at %#lx", state->rip); svm_dr_enter_guest(gctx); svm_launch(vmcb_pa, gctx, &__pcpu[curcpu]); svm_dr_leave_guest(gctx); CPU_CLR_ATOMIC(curcpu, &pmap->pm_active); /* * The host GDTR and IDTR is saved by VMRUN and restored * automatically on #VMEXIT. However, the host TSS needs * to be restored explicitly. */ restore_host_tss(); /* Restore host LDTR. */ lldt(ldt_sel); /* #VMEXIT disables interrupts so re-enable them here. */ enable_gintr(); /* Update 'nextrip' */ vcpustate->nextrip = state->rip; /* Handle #VMEXIT and if required return to user space. */ handled = svm_vmexit(svm_sc, vcpu, vmexit); } while (handled); svm_msr_guest_exit(svm_sc, vcpu); return (0); } static void svm_vmcleanup(void *arg) { struct svm_softc *sc = arg; contigfree(sc->iopm_bitmap, SVM_IO_BITMAP_SIZE, M_SVM); contigfree(sc->msr_bitmap, SVM_MSR_BITMAP_SIZE, M_SVM); free(sc, M_SVM); } static register_t * swctx_regptr(struct svm_regctx *regctx, int reg) { switch (reg) { case VM_REG_GUEST_RBX: return (&regctx->sctx_rbx); case VM_REG_GUEST_RCX: return (&regctx->sctx_rcx); case VM_REG_GUEST_RDX: return (&regctx->sctx_rdx); case VM_REG_GUEST_RDI: return (&regctx->sctx_rdi); case VM_REG_GUEST_RSI: return (&regctx->sctx_rsi); case VM_REG_GUEST_RBP: return (&regctx->sctx_rbp); case VM_REG_GUEST_R8: return (&regctx->sctx_r8); case VM_REG_GUEST_R9: return (&regctx->sctx_r9); case VM_REG_GUEST_R10: return (&regctx->sctx_r10); case VM_REG_GUEST_R11: return (&regctx->sctx_r11); case VM_REG_GUEST_R12: return (&regctx->sctx_r12); case VM_REG_GUEST_R13: return (&regctx->sctx_r13); case VM_REG_GUEST_R14: return (&regctx->sctx_r14); case VM_REG_GUEST_R15: return (&regctx->sctx_r15); case VM_REG_GUEST_DR0: return (&regctx->sctx_dr0); case VM_REG_GUEST_DR1: return (&regctx->sctx_dr1); case VM_REG_GUEST_DR2: return (&regctx->sctx_dr2); case VM_REG_GUEST_DR3: return (&regctx->sctx_dr3); default: return (NULL); } } static int svm_getreg(void *arg, int vcpu, int ident, uint64_t *val) { struct svm_softc *svm_sc; register_t *reg; svm_sc = arg; if (ident == VM_REG_GUEST_INTR_SHADOW) { return (svm_get_intr_shadow(svm_sc, vcpu, val)); } if (vmcb_read(svm_sc, vcpu, ident, val) == 0) { return (0); } reg = swctx_regptr(svm_get_guest_regctx(svm_sc, vcpu), ident); if (reg != NULL) { *val = *reg; return (0); } VCPU_CTR1(svm_sc->vm, vcpu, "svm_getreg: unknown register %#x", ident); return (EINVAL); } static int svm_setreg(void *arg, int vcpu, int ident, uint64_t val) { struct svm_softc *svm_sc; register_t *reg; svm_sc = arg; if (ident == VM_REG_GUEST_INTR_SHADOW) { return (svm_modify_intr_shadow(svm_sc, vcpu, val)); } if (vmcb_write(svm_sc, vcpu, ident, val) == 0) { return (0); } reg = swctx_regptr(svm_get_guest_regctx(svm_sc, vcpu), ident); if (reg != NULL) { *reg = val; return (0); } /* * XXX deal with CR3 and invalidate TLB entries tagged with the * vcpu's ASID. This needs to be treated differently depending on * whether 'running' is true/false.
*/ VCPU_CTR1(svm_sc->vm, vcpu, "svm_setreg: unknown register %#x", ident); return (EINVAL); } static int svm_setcap(void *arg, int vcpu, int type, int val) { struct svm_softc *sc; int error; sc = arg; error = 0; switch (type) { case VM_CAP_HALT_EXIT: svm_set_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_HLT, val); break; case VM_CAP_PAUSE_EXIT: svm_set_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_PAUSE, val); break; case VM_CAP_UNRESTRICTED_GUEST: /* Unrestricted guest execution cannot be disabled in SVM */ if (val == 0) error = EINVAL; break; default: error = ENOENT; break; } return (error); } static int svm_getcap(void *arg, int vcpu, int type, int *retval) { struct svm_softc *sc; int error; sc = arg; error = 0; switch (type) { case VM_CAP_HALT_EXIT: *retval = svm_get_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_HLT); break; case VM_CAP_PAUSE_EXIT: *retval = svm_get_intercept(sc, vcpu, VMCB_CTRL1_INTCPT, VMCB_INTCPT_PAUSE); break; case VM_CAP_UNRESTRICTED_GUEST: *retval = 1; /* unrestricted guest is always enabled */ break; default: error = ENOENT; break; } return (error); } static struct vlapic * svm_vlapic_init(void *arg, int vcpuid) { struct svm_softc *svm_sc; struct vlapic *vlapic; svm_sc = arg; vlapic = malloc(sizeof(struct vlapic), M_SVM_VLAPIC, M_WAITOK | M_ZERO); vlapic->vm = svm_sc->vm; vlapic->vcpuid = vcpuid; vlapic->apic_page = (struct LAPIC *)&svm_sc->apic_page[vcpuid]; vlapic_init(vlapic); return (vlapic); } static void svm_vlapic_cleanup(void *arg, struct vlapic *vlapic) { vlapic_cleanup(vlapic); free(vlapic, M_SVM_VLAPIC); } struct vmm_ops vmm_ops_amd = { svm_init, svm_cleanup, svm_restore, svm_vminit, svm_vmrun, svm_vmcleanup, svm_getreg, svm_setreg, vmcb_getdesc, vmcb_setdesc, svm_getcap, svm_setcap, svm_npt_alloc, svm_npt_free, svm_vlapic_init, svm_vlapic_cleanup }; Index: user/ngie/bug-237403/sys/amd64/vmm/intel/vmx.c =================================================================== --- user/ngie/bug-237403/sys/amd64/vmm/intel/vmx.c (revision 346925) +++ user/ngie/bug-237403/sys/amd64/vmm/intel/vmx.c (revision 346926) @@ -1,3805 +1,3809 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2011 NetApp, Inc. * All rights reserved. * Copyright (c) 2018 Joyent, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $FreeBSD$ */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "vmm_lapic.h" #include "vmm_host.h" #include "vmm_ioport.h" #include "vmm_ktr.h" #include "vmm_stat.h" #include "vatpic.h" #include "vlapic.h" #include "vlapic_priv.h" #include "ept.h" #include "vmx_cpufunc.h" #include "vmx.h" #include "vmx_msr.h" #include "x86.h" #include "vmx_controls.h" #define PINBASED_CTLS_ONE_SETTING \ (PINBASED_EXTINT_EXITING | \ PINBASED_NMI_EXITING | \ PINBASED_VIRTUAL_NMI) #define PINBASED_CTLS_ZERO_SETTING 0 #define PROCBASED_CTLS_WINDOW_SETTING \ (PROCBASED_INT_WINDOW_EXITING | \ PROCBASED_NMI_WINDOW_EXITING) #define PROCBASED_CTLS_ONE_SETTING \ (PROCBASED_SECONDARY_CONTROLS | \ PROCBASED_MWAIT_EXITING | \ PROCBASED_MONITOR_EXITING | \ PROCBASED_IO_EXITING | \ PROCBASED_MSR_BITMAPS | \ PROCBASED_CTLS_WINDOW_SETTING | \ PROCBASED_CR8_LOAD_EXITING | \ PROCBASED_CR8_STORE_EXITING) #define PROCBASED_CTLS_ZERO_SETTING \ (PROCBASED_CR3_LOAD_EXITING | \ PROCBASED_CR3_STORE_EXITING | \ PROCBASED_IO_BITMAPS) #define PROCBASED_CTLS2_ONE_SETTING PROCBASED2_ENABLE_EPT #define PROCBASED_CTLS2_ZERO_SETTING 0 #define VM_EXIT_CTLS_ONE_SETTING \ (VM_EXIT_SAVE_DEBUG_CONTROLS | \ VM_EXIT_HOST_LMA | \ VM_EXIT_SAVE_EFER | \ VM_EXIT_LOAD_EFER | \ VM_EXIT_ACKNOWLEDGE_INTERRUPT) #define VM_EXIT_CTLS_ZERO_SETTING 0 #define VM_ENTRY_CTLS_ONE_SETTING \ (VM_ENTRY_LOAD_DEBUG_CONTROLS | \ VM_ENTRY_LOAD_EFER) #define VM_ENTRY_CTLS_ZERO_SETTING \ (VM_ENTRY_INTO_SMM | \ VM_ENTRY_DEACTIVATE_DUAL_MONITOR) #define HANDLED 1 #define UNHANDLED 0 static MALLOC_DEFINE(M_VMX, "vmx", "vmx"); static MALLOC_DEFINE(M_VLAPIC, "vlapic", "vlapic"); SYSCTL_DECL(_hw_vmm); SYSCTL_NODE(_hw_vmm, OID_AUTO, vmx, CTLFLAG_RW, NULL, NULL); int vmxon_enabled[MAXCPU]; static char vmxon_region[MAXCPU][PAGE_SIZE] __aligned(PAGE_SIZE); static uint32_t pinbased_ctls, procbased_ctls, procbased_ctls2; static uint32_t exit_ctls, entry_ctls; static uint64_t cr0_ones_mask, cr0_zeros_mask; SYSCTL_ULONG(_hw_vmm_vmx, OID_AUTO, cr0_ones_mask, CTLFLAG_RD, &cr0_ones_mask, 0, NULL); SYSCTL_ULONG(_hw_vmm_vmx, OID_AUTO, cr0_zeros_mask, CTLFLAG_RD, &cr0_zeros_mask, 0, NULL); static uint64_t cr4_ones_mask, cr4_zeros_mask; SYSCTL_ULONG(_hw_vmm_vmx, OID_AUTO, cr4_ones_mask, CTLFLAG_RD, &cr4_ones_mask, 0, NULL); SYSCTL_ULONG(_hw_vmm_vmx, OID_AUTO, cr4_zeros_mask, CTLFLAG_RD, &cr4_zeros_mask, 0, NULL); static int vmx_initialized; SYSCTL_INT(_hw_vmm_vmx, OID_AUTO, initialized, CTLFLAG_RD, &vmx_initialized, 0, "Intel VMX initialized"); /* * Optional capabilities */ static SYSCTL_NODE(_hw_vmm_vmx, OID_AUTO, cap, CTLFLAG_RW, NULL, NULL); static int cap_halt_exit; SYSCTL_INT(_hw_vmm_vmx_cap, OID_AUTO, halt_exit, CTLFLAG_RD, &cap_halt_exit, 0, "HLT triggers a VM-exit"); static int cap_pause_exit; SYSCTL_INT(_hw_vmm_vmx_cap, OID_AUTO, pause_exit, CTLFLAG_RD, &cap_pause_exit, 0, "PAUSE triggers a VM-exit"); static int cap_unrestricted_guest; SYSCTL_INT(_hw_vmm_vmx_cap, OID_AUTO, unrestricted_guest, CTLFLAG_RD, &cap_unrestricted_guest, 0, "Unrestricted guests"); static int cap_monitor_trap; SYSCTL_INT(_hw_vmm_vmx_cap, OID_AUTO, monitor_trap, CTLFLAG_RD, &cap_monitor_trap, 0, "Monitor trap flag"); static int cap_invpcid; SYSCTL_INT(_hw_vmm_vmx_cap, OID_AUTO, invpcid, CTLFLAG_RD, &cap_invpcid, 0, "Guests are allowed to use INVPCID"); static int virtual_interrupt_delivery; SYSCTL_INT(_hw_vmm_vmx_cap, OID_AUTO, 
virtual_interrupt_delivery, CTLFLAG_RD, &virtual_interrupt_delivery, 0, "APICv virtual interrupt delivery support"); static int posted_interrupts; SYSCTL_INT(_hw_vmm_vmx_cap, OID_AUTO, posted_interrupts, CTLFLAG_RD, &posted_interrupts, 0, "APICv posted interrupt support"); static int pirvec = -1; SYSCTL_INT(_hw_vmm_vmx, OID_AUTO, posted_interrupt_vector, CTLFLAG_RD, &pirvec, 0, "APICv posted interrupt vector"); static struct unrhdr *vpid_unr; static u_int vpid_alloc_failed; SYSCTL_UINT(_hw_vmm_vmx, OID_AUTO, vpid_alloc_failed, CTLFLAG_RD, &vpid_alloc_failed, 0, NULL); static int guest_l1d_flush; SYSCTL_INT(_hw_vmm_vmx, OID_AUTO, l1d_flush, CTLFLAG_RD, &guest_l1d_flush, 0, NULL); static int guest_l1d_flush_sw; SYSCTL_INT(_hw_vmm_vmx, OID_AUTO, l1d_flush_sw, CTLFLAG_RD, &guest_l1d_flush_sw, 0, NULL); static struct msr_entry msr_load_list[1] __aligned(16); /* * The definitions of SDT probes for VMX. */ SDT_PROBE_DEFINE3(vmm, vmx, exit, entry, "struct vmx *", "int", "struct vm_exit *"); SDT_PROBE_DEFINE4(vmm, vmx, exit, taskswitch, "struct vmx *", "int", "struct vm_exit *", "struct vm_task_switch *"); SDT_PROBE_DEFINE4(vmm, vmx, exit, craccess, "struct vmx *", "int", "struct vm_exit *", "uint64_t"); SDT_PROBE_DEFINE4(vmm, vmx, exit, rdmsr, "struct vmx *", "int", "struct vm_exit *", "uint32_t"); SDT_PROBE_DEFINE5(vmm, vmx, exit, wrmsr, "struct vmx *", "int", "struct vm_exit *", "uint32_t", "uint64_t"); SDT_PROBE_DEFINE3(vmm, vmx, exit, halt, "struct vmx *", "int", "struct vm_exit *"); SDT_PROBE_DEFINE3(vmm, vmx, exit, mtrap, "struct vmx *", "int", "struct vm_exit *"); SDT_PROBE_DEFINE3(vmm, vmx, exit, pause, "struct vmx *", "int", "struct vm_exit *"); SDT_PROBE_DEFINE3(vmm, vmx, exit, intrwindow, "struct vmx *", "int", "struct vm_exit *"); SDT_PROBE_DEFINE4(vmm, vmx, exit, interrupt, "struct vmx *", "int", "struct vm_exit *", "uint32_t"); SDT_PROBE_DEFINE3(vmm, vmx, exit, nmiwindow, "struct vmx *", "int", "struct vm_exit *"); SDT_PROBE_DEFINE3(vmm, vmx, exit, inout, "struct vmx *", "int", "struct vm_exit *"); SDT_PROBE_DEFINE3(vmm, vmx, exit, cpuid, "struct vmx *", "int", "struct vm_exit *"); SDT_PROBE_DEFINE5(vmm, vmx, exit, exception, "struct vmx *", "int", "struct vm_exit *", "uint32_t", "int"); SDT_PROBE_DEFINE5(vmm, vmx, exit, nestedfault, "struct vmx *", "int", "struct vm_exit *", "uint64_t", "uint64_t"); SDT_PROBE_DEFINE4(vmm, vmx, exit, mmiofault, "struct vmx *", "int", "struct vm_exit *", "uint64_t"); SDT_PROBE_DEFINE3(vmm, vmx, exit, eoi, "struct vmx *", "int", "struct vm_exit *"); SDT_PROBE_DEFINE3(vmm, vmx, exit, apicaccess, "struct vmx *", "int", "struct vm_exit *"); SDT_PROBE_DEFINE4(vmm, vmx, exit, apicwrite, "struct vmx *", "int", "struct vm_exit *", "struct vlapic *"); SDT_PROBE_DEFINE3(vmm, vmx, exit, xsetbv, "struct vmx *", "int", "struct vm_exit *"); SDT_PROBE_DEFINE3(vmm, vmx, exit, monitor, "struct vmx *", "int", "struct vm_exit *"); SDT_PROBE_DEFINE3(vmm, vmx, exit, mwait, "struct vmx *", "int", "struct vm_exit *"); SDT_PROBE_DEFINE3(vmm, vmx, exit, vminsn, "struct vmx *", "int", "struct vm_exit *"); SDT_PROBE_DEFINE4(vmm, vmx, exit, unknown, "struct vmx *", "int", "struct vm_exit *", "uint32_t"); SDT_PROBE_DEFINE4(vmm, vmx, exit, return, "struct vmx *", "int", "struct vm_exit *", "int"); /* * Use the last page below 4GB as the APIC access address. This address is * occupied by the boot firmware so it is guaranteed that it will not conflict * with a page in system memory. 
*/ #define APIC_ACCESS_ADDRESS 0xFFFFF000 static int vmx_getdesc(void *arg, int vcpu, int reg, struct seg_desc *desc); static int vmx_getreg(void *arg, int vcpu, int reg, uint64_t *retval); static int vmxctx_setreg(struct vmxctx *vmxctx, int reg, uint64_t val); static void vmx_inject_pir(struct vlapic *vlapic); #ifdef KTR static const char * exit_reason_to_str(int reason) { static char reasonbuf[32]; switch (reason) { case EXIT_REASON_EXCEPTION: return "exception"; case EXIT_REASON_EXT_INTR: return "extint"; case EXIT_REASON_TRIPLE_FAULT: return "triplefault"; case EXIT_REASON_INIT: return "init"; case EXIT_REASON_SIPI: return "sipi"; case EXIT_REASON_IO_SMI: return "iosmi"; case EXIT_REASON_SMI: return "smi"; case EXIT_REASON_INTR_WINDOW: return "intrwindow"; case EXIT_REASON_NMI_WINDOW: return "nmiwindow"; case EXIT_REASON_TASK_SWITCH: return "taskswitch"; case EXIT_REASON_CPUID: return "cpuid"; case EXIT_REASON_GETSEC: return "getsec"; case EXIT_REASON_HLT: return "hlt"; case EXIT_REASON_INVD: return "invd"; case EXIT_REASON_INVLPG: return "invlpg"; case EXIT_REASON_RDPMC: return "rdpmc"; case EXIT_REASON_RDTSC: return "rdtsc"; case EXIT_REASON_RSM: return "rsm"; case EXIT_REASON_VMCALL: return "vmcall"; case EXIT_REASON_VMCLEAR: return "vmclear"; case EXIT_REASON_VMLAUNCH: return "vmlaunch"; case EXIT_REASON_VMPTRLD: return "vmptrld"; case EXIT_REASON_VMPTRST: return "vmptrst"; case EXIT_REASON_VMREAD: return "vmread"; case EXIT_REASON_VMRESUME: return "vmresume"; case EXIT_REASON_VMWRITE: return "vmwrite"; case EXIT_REASON_VMXOFF: return "vmxoff"; case EXIT_REASON_VMXON: return "vmxon"; case EXIT_REASON_CR_ACCESS: return "craccess"; case EXIT_REASON_DR_ACCESS: return "draccess"; case EXIT_REASON_INOUT: return "inout"; case EXIT_REASON_RDMSR: return "rdmsr"; case EXIT_REASON_WRMSR: return "wrmsr"; case EXIT_REASON_INVAL_VMCS: return "invalvmcs"; case EXIT_REASON_INVAL_MSR: return "invalmsr"; case EXIT_REASON_MWAIT: return "mwait"; case EXIT_REASON_MTF: return "mtf"; case EXIT_REASON_MONITOR: return "monitor"; case EXIT_REASON_PAUSE: return "pause"; case EXIT_REASON_MCE_DURING_ENTRY: return "mce-during-entry"; case EXIT_REASON_TPR: return "tpr"; case EXIT_REASON_APIC_ACCESS: return "apic-access"; case EXIT_REASON_GDTR_IDTR: return "gdtridtr"; case EXIT_REASON_LDTR_TR: return "ldtrtr"; case EXIT_REASON_EPT_FAULT: return "eptfault"; case EXIT_REASON_EPT_MISCONFIG: return "eptmisconfig"; case EXIT_REASON_INVEPT: return "invept"; case EXIT_REASON_RDTSCP: return "rdtscp"; case EXIT_REASON_VMX_PREEMPT: return "vmxpreempt"; case EXIT_REASON_INVVPID: return "invvpid"; case EXIT_REASON_WBINVD: return "wbinvd"; case EXIT_REASON_XSETBV: return "xsetbv"; case EXIT_REASON_APIC_WRITE: return "apic-write"; default: snprintf(reasonbuf, sizeof(reasonbuf), "%d", reason); return (reasonbuf); } } #endif /* KTR */ static int vmx_allow_x2apic_msrs(struct vmx *vmx) { int i, error; error = 0; /* * Allow readonly access to the following x2APIC MSRs from the guest. 
*/ error += guest_msr_ro(vmx, MSR_APIC_ID); error += guest_msr_ro(vmx, MSR_APIC_VERSION); error += guest_msr_ro(vmx, MSR_APIC_LDR); error += guest_msr_ro(vmx, MSR_APIC_SVR); for (i = 0; i < 8; i++) error += guest_msr_ro(vmx, MSR_APIC_ISR0 + i); for (i = 0; i < 8; i++) error += guest_msr_ro(vmx, MSR_APIC_TMR0 + i); for (i = 0; i < 8; i++) error += guest_msr_ro(vmx, MSR_APIC_IRR0 + i); error += guest_msr_ro(vmx, MSR_APIC_ESR); error += guest_msr_ro(vmx, MSR_APIC_LVT_TIMER); error += guest_msr_ro(vmx, MSR_APIC_LVT_THERMAL); error += guest_msr_ro(vmx, MSR_APIC_LVT_PCINT); error += guest_msr_ro(vmx, MSR_APIC_LVT_LINT0); error += guest_msr_ro(vmx, MSR_APIC_LVT_LINT1); error += guest_msr_ro(vmx, MSR_APIC_LVT_ERROR); error += guest_msr_ro(vmx, MSR_APIC_ICR_TIMER); error += guest_msr_ro(vmx, MSR_APIC_DCR_TIMER); error += guest_msr_ro(vmx, MSR_APIC_ICR); /* * Allow TPR, EOI and SELF_IPI MSRs to be read and written by the guest. * * These registers get special treatment described in the section * "Virtualizing MSR-Based APIC Accesses". */ error += guest_msr_rw(vmx, MSR_APIC_TPR); error += guest_msr_rw(vmx, MSR_APIC_EOI); error += guest_msr_rw(vmx, MSR_APIC_SELF_IPI); return (error); } u_long vmx_fix_cr0(u_long cr0) { return ((cr0 | cr0_ones_mask) & ~cr0_zeros_mask); } u_long vmx_fix_cr4(u_long cr4) { return ((cr4 | cr4_ones_mask) & ~cr4_zeros_mask); } static void vpid_free(int vpid) { if (vpid < 0 || vpid > 0xffff) panic("vpid_free: invalid vpid %d", vpid); /* * VPIDs [0,VM_MAXCPU] are special and are not allocated from * the unit number allocator. */ if (vpid > VM_MAXCPU) free_unr(vpid_unr, vpid); } static void vpid_alloc(uint16_t *vpid, int num) { int i, x; if (num <= 0 || num > VM_MAXCPU) panic("invalid number of vpids requested: %d", num); /* * If the "enable vpid" execution control is not enabled then the * VPID is required to be 0 for all vcpus. */ if ((procbased_ctls2 & PROCBASED2_ENABLE_VPID) == 0) { for (i = 0; i < num; i++) vpid[i] = 0; return; } /* * Allocate a unique VPID for each vcpu from the unit number allocator. */ for (i = 0; i < num; i++) { x = alloc_unr(vpid_unr); if (x == -1) break; else vpid[i] = x; } if (i < num) { atomic_add_int(&vpid_alloc_failed, 1); /* * If the unit number allocator does not have enough unique * VPIDs then we need to allocate from the [1,VM_MAXCPU] range. * * These VPIDs are not be unique across VMs but this does not * affect correctness because the combined mappings are also * tagged with the EP4TA which is unique for each VM. * * It is still sub-optimal because the invvpid will invalidate * combined mappings for a particular VPID across all EP4TAs. */ while (i-- > 0) vpid_free(vpid[i]); for (i = 0; i < num; i++) vpid[i] = i + 1; } } static void vpid_init(void) { /* * VPID 0 is required when the "enable VPID" execution control is * disabled. * * VPIDs [1,VM_MAXCPU] are used as the "overflow namespace" when the * unit number allocator does not have sufficient unique VPIDs to * satisfy the allocation. * * The remaining VPIDs are managed by the unit number allocator. */ vpid_unr = new_unrhdr(VM_MAXCPU + 1, 0xffff, NULL); } static void vmx_disable(void *arg __unused) { struct invvpid_desc invvpid_desc = { 0 }; struct invept_desc invept_desc = { 0 }; if (vmxon_enabled[curcpu]) { /* * See sections 25.3.3.3 and 25.3.3.4 in Intel Vol 3b. * * VMXON or VMXOFF are not required to invalidate any TLB * caching structures. This prevents potential retention of * cached information in the TLB between distinct VMX episodes. 
*/ invvpid(INVVPID_TYPE_ALL_CONTEXTS, invvpid_desc); invept(INVEPT_TYPE_ALL_CONTEXTS, invept_desc); vmxoff(); } load_cr4(rcr4() & ~CR4_VMXE); } static int vmx_cleanup(void) { if (pirvec >= 0) lapic_ipi_free(pirvec); if (vpid_unr != NULL) { delete_unrhdr(vpid_unr); vpid_unr = NULL; } if (nmi_flush_l1d_sw == 1) nmi_flush_l1d_sw = 0; smp_rendezvous(NULL, vmx_disable, NULL, NULL); return (0); } static void vmx_enable(void *arg __unused) { int error; uint64_t feature_control; feature_control = rdmsr(MSR_IA32_FEATURE_CONTROL); if ((feature_control & IA32_FEATURE_CONTROL_LOCK) == 0 || (feature_control & IA32_FEATURE_CONTROL_VMX_EN) == 0) { wrmsr(MSR_IA32_FEATURE_CONTROL, feature_control | IA32_FEATURE_CONTROL_VMX_EN | IA32_FEATURE_CONTROL_LOCK); } load_cr4(rcr4() | CR4_VMXE); *(uint32_t *)vmxon_region[curcpu] = vmx_revision(); error = vmxon(vmxon_region[curcpu]); if (error == 0) vmxon_enabled[curcpu] = 1; } static void vmx_restore(void) { if (vmxon_enabled[curcpu]) vmxon(vmxon_region[curcpu]); } static int vmx_init(int ipinum) { int error, use_tpr_shadow; uint64_t basic, fixed0, fixed1, feature_control; uint32_t tmp, procbased2_vid_bits; /* CPUID.1:ECX[bit 5] must be 1 for processor to support VMX */ if (!(cpu_feature2 & CPUID2_VMX)) { printf("vmx_init: processor does not support VMX operation\n"); return (ENXIO); } /* * Verify that MSR_IA32_FEATURE_CONTROL lock and VMXON enable bits * are set (bits 0 and 2 respectively). */ feature_control = rdmsr(MSR_IA32_FEATURE_CONTROL); if ((feature_control & IA32_FEATURE_CONTROL_LOCK) == 1 && (feature_control & IA32_FEATURE_CONTROL_VMX_EN) == 0) { printf("vmx_init: VMX operation disabled by BIOS\n"); return (ENXIO); } /* * Verify capabilities MSR_VMX_BASIC: * - bit 54 indicates support for INS/OUTS decoding */ basic = rdmsr(MSR_VMX_BASIC); if ((basic & (1UL << 54)) == 0) { printf("vmx_init: processor does not support desired basic " "capabilities\n"); return (EINVAL); } /* Check support for primary processor-based VM-execution controls */ error = vmx_set_ctlreg(MSR_VMX_PROCBASED_CTLS, MSR_VMX_TRUE_PROCBASED_CTLS, PROCBASED_CTLS_ONE_SETTING, PROCBASED_CTLS_ZERO_SETTING, &procbased_ctls); if (error) { printf("vmx_init: processor does not support desired primary " "processor-based controls\n"); return (error); } /* Clear the processor-based ctl bits that are set on demand */ procbased_ctls &= ~PROCBASED_CTLS_WINDOW_SETTING; /* Check support for secondary processor-based VM-execution controls */ error = vmx_set_ctlreg(MSR_VMX_PROCBASED_CTLS2, MSR_VMX_PROCBASED_CTLS2, PROCBASED_CTLS2_ONE_SETTING, PROCBASED_CTLS2_ZERO_SETTING, &procbased_ctls2); if (error) { printf("vmx_init: processor does not support desired secondary " "processor-based controls\n"); return (error); } /* Check support for VPID */ error = vmx_set_ctlreg(MSR_VMX_PROCBASED_CTLS2, MSR_VMX_PROCBASED_CTLS2, PROCBASED2_ENABLE_VPID, 0, &tmp); if (error == 0) procbased_ctls2 |= PROCBASED2_ENABLE_VPID; /* Check support for pin-based VM-execution controls */ error = vmx_set_ctlreg(MSR_VMX_PINBASED_CTLS, MSR_VMX_TRUE_PINBASED_CTLS, PINBASED_CTLS_ONE_SETTING, PINBASED_CTLS_ZERO_SETTING, &pinbased_ctls); if (error) { printf("vmx_init: processor does not support desired " "pin-based controls\n"); return (error); } /* Check support for VM-exit controls */ error = vmx_set_ctlreg(MSR_VMX_EXIT_CTLS, MSR_VMX_TRUE_EXIT_CTLS, VM_EXIT_CTLS_ONE_SETTING, VM_EXIT_CTLS_ZERO_SETTING, &exit_ctls); if (error) { printf("vmx_init: processor does not support desired " "exit controls\n"); return (error); } /* Check support for 
VM-entry controls */ error = vmx_set_ctlreg(MSR_VMX_ENTRY_CTLS, MSR_VMX_TRUE_ENTRY_CTLS, VM_ENTRY_CTLS_ONE_SETTING, VM_ENTRY_CTLS_ZERO_SETTING, &entry_ctls); if (error) { printf("vmx_init: processor does not support desired " "entry controls\n"); return (error); } /* * Check support for optional features by testing them * as individual bits */ cap_halt_exit = (vmx_set_ctlreg(MSR_VMX_PROCBASED_CTLS, MSR_VMX_TRUE_PROCBASED_CTLS, PROCBASED_HLT_EXITING, 0, &tmp) == 0); cap_monitor_trap = (vmx_set_ctlreg(MSR_VMX_PROCBASED_CTLS, MSR_VMX_PROCBASED_CTLS, PROCBASED_MTF, 0, &tmp) == 0); cap_pause_exit = (vmx_set_ctlreg(MSR_VMX_PROCBASED_CTLS, MSR_VMX_TRUE_PROCBASED_CTLS, PROCBASED_PAUSE_EXITING, 0, &tmp) == 0); cap_unrestricted_guest = (vmx_set_ctlreg(MSR_VMX_PROCBASED_CTLS2, MSR_VMX_PROCBASED_CTLS2, PROCBASED2_UNRESTRICTED_GUEST, 0, &tmp) == 0); cap_invpcid = (vmx_set_ctlreg(MSR_VMX_PROCBASED_CTLS2, MSR_VMX_PROCBASED_CTLS2, PROCBASED2_ENABLE_INVPCID, 0, &tmp) == 0); /* * Check support for virtual interrupt delivery. */ procbased2_vid_bits = (PROCBASED2_VIRTUALIZE_APIC_ACCESSES | PROCBASED2_VIRTUALIZE_X2APIC_MODE | PROCBASED2_APIC_REGISTER_VIRTUALIZATION | PROCBASED2_VIRTUAL_INTERRUPT_DELIVERY); use_tpr_shadow = (vmx_set_ctlreg(MSR_VMX_PROCBASED_CTLS, MSR_VMX_TRUE_PROCBASED_CTLS, PROCBASED_USE_TPR_SHADOW, 0, &tmp) == 0); error = vmx_set_ctlreg(MSR_VMX_PROCBASED_CTLS2, MSR_VMX_PROCBASED_CTLS2, procbased2_vid_bits, 0, &tmp); if (error == 0 && use_tpr_shadow) { virtual_interrupt_delivery = 1; TUNABLE_INT_FETCH("hw.vmm.vmx.use_apic_vid", &virtual_interrupt_delivery); } if (virtual_interrupt_delivery) { procbased_ctls |= PROCBASED_USE_TPR_SHADOW; procbased_ctls2 |= procbased2_vid_bits; procbased_ctls2 &= ~PROCBASED2_VIRTUALIZE_X2APIC_MODE; /* * No need to emulate accesses to %CR8 if virtual * interrupt delivery is enabled. */ procbased_ctls &= ~PROCBASED_CR8_LOAD_EXITING; procbased_ctls &= ~PROCBASED_CR8_STORE_EXITING; /* * Check for Posted Interrupts only if Virtual Interrupt * Delivery is enabled. */ error = vmx_set_ctlreg(MSR_VMX_PINBASED_CTLS, MSR_VMX_TRUE_PINBASED_CTLS, PINBASED_POSTED_INTERRUPT, 0, &tmp); if (error == 0) { pirvec = lapic_ipi_alloc(pti ? &IDTVEC(justreturn1_pti) : &IDTVEC(justreturn)); if (pirvec < 0) { if (bootverbose) { printf("vmx_init: unable to allocate " "posted interrupt vector\n"); } } else { posted_interrupts = 1; TUNABLE_INT_FETCH("hw.vmm.vmx.use_apic_pir", &posted_interrupts); } } } if (posted_interrupts) pinbased_ctls |= PINBASED_POSTED_INTERRUPT; /* Initialize EPT */ error = ept_init(ipinum); if (error) { printf("vmx_init: ept initialization failed (%d)\n", error); return (error); } guest_l1d_flush = (cpu_ia32_arch_caps & IA32_ARCH_CAP_SKIP_L1DFL_VMENTRY) == 0; TUNABLE_INT_FETCH("hw.vmm.l1d_flush", &guest_l1d_flush); /* * L1D cache flush is enabled. Use IA32_FLUSH_CMD MSR when * available. Otherwise fall back to the software flush * method which loads enough data from the kernel text to * flush existing L1D content, both on VMX entry and on NMI * return. 
*/ if (guest_l1d_flush) { if ((cpu_stdext_feature3 & CPUID_STDEXT3_L1D_FLUSH) == 0) { guest_l1d_flush_sw = 1; TUNABLE_INT_FETCH("hw.vmm.l1d_flush_sw", &guest_l1d_flush_sw); } if (guest_l1d_flush_sw) { if (nmi_flush_l1d_sw <= 1) nmi_flush_l1d_sw = 1; } else { msr_load_list[0].index = MSR_IA32_FLUSH_CMD; msr_load_list[0].val = IA32_FLUSH_CMD_L1D; } } /* * Stash the cr0 and cr4 bits that must be fixed to 0 or 1 */ fixed0 = rdmsr(MSR_VMX_CR0_FIXED0); fixed1 = rdmsr(MSR_VMX_CR0_FIXED1); cr0_ones_mask = fixed0 & fixed1; cr0_zeros_mask = ~fixed0 & ~fixed1; /* * CR0_PE and CR0_PG can be set to zero in VMX non-root operation * if unrestricted guest execution is allowed. */ if (cap_unrestricted_guest) cr0_ones_mask &= ~(CR0_PG | CR0_PE); /* * Do not allow the guest to set CR0_NW or CR0_CD. */ cr0_zeros_mask |= (CR0_NW | CR0_CD); fixed0 = rdmsr(MSR_VMX_CR4_FIXED0); fixed1 = rdmsr(MSR_VMX_CR4_FIXED1); cr4_ones_mask = fixed0 & fixed1; cr4_zeros_mask = ~fixed0 & ~fixed1; vpid_init(); vmx_msr_init(); /* enable VMX operation */ smp_rendezvous(NULL, vmx_enable, NULL, NULL); vmx_initialized = 1; return (0); } static void vmx_trigger_hostintr(int vector) { uintptr_t func; struct gate_descriptor *gd; gd = &idt[vector]; KASSERT(vector >= 32 && vector <= 255, ("vmx_trigger_hostintr: " "invalid vector %d", vector)); KASSERT(gd->gd_p == 1, ("gate descriptor for vector %d not present", vector)); KASSERT(gd->gd_type == SDT_SYSIGT, ("gate descriptor for vector %d " "has invalid type %d", vector, gd->gd_type)); KASSERT(gd->gd_dpl == SEL_KPL, ("gate descriptor for vector %d " "has invalid dpl %d", vector, gd->gd_dpl)); KASSERT(gd->gd_selector == GSEL(GCODE_SEL, SEL_KPL), ("gate descriptor " "for vector %d has invalid selector %d", vector, gd->gd_selector)); KASSERT(gd->gd_ist == 0, ("gate descriptor for vector %d has invalid " "IST %d", vector, gd->gd_ist)); func = ((long)gd->gd_hioffset << 16 | gd->gd_looffset); vmx_call_isr(func); } static int vmx_setup_cr_shadow(int which, struct vmcs *vmcs, uint32_t initial) { int error, mask_ident, shadow_ident; uint64_t mask_value; if (which != 0 && which != 4) panic("vmx_setup_cr_shadow: unknown cr%d", which); if (which == 0) { mask_ident = VMCS_CR0_MASK; mask_value = cr0_ones_mask | cr0_zeros_mask; shadow_ident = VMCS_CR0_SHADOW; } else { mask_ident = VMCS_CR4_MASK; mask_value = cr4_ones_mask | cr4_zeros_mask; shadow_ident = VMCS_CR4_SHADOW; } error = vmcs_setreg(vmcs, 0, VMCS_IDENT(mask_ident), mask_value); if (error) return (error); error = vmcs_setreg(vmcs, 0, VMCS_IDENT(shadow_ident), initial); if (error) return (error); return (0); } #define vmx_setup_cr0_shadow(vmcs,init) vmx_setup_cr_shadow(0, (vmcs), (init)) #define vmx_setup_cr4_shadow(vmcs,init) vmx_setup_cr_shadow(4, (vmcs), (init)) static void * vmx_vminit(struct vm *vm, pmap_t pmap) { uint16_t vpid[VM_MAXCPU]; int i, error; struct vmx *vmx; struct vmcs *vmcs; uint32_t exc_bitmap; + uint16_t maxcpus; vmx = malloc(sizeof(struct vmx), M_VMX, M_WAITOK | M_ZERO); if ((uintptr_t)vmx & PAGE_MASK) { panic("malloc of struct vmx not aligned on %d byte boundary", PAGE_SIZE); } vmx->vm = vm; vmx->eptp = eptp(vtophys((vm_offset_t)pmap->pm_pml4)); /* * Clean up EPTP-tagged guest physical and combined mappings * * VMX transitions are not required to invalidate any guest physical * mappings. So, it may be possible for stale guest physical mappings * to be present in the processor TLBs. * * Combined mappings for this EP4TA are also invalidated for all VPIDs. 
*/ ept_invalidate_mappings(vmx->eptp); msr_bitmap_initialize(vmx->msr_bitmap); /* * It is safe to allow direct access to MSR_GSBASE and MSR_FSBASE. * The guest FSBASE and GSBASE are saved and restored during * vm-exit and vm-entry respectively. The host FSBASE and GSBASE are * always restored from the vmcs host state area on vm-exit. * * The SYSENTER_CS/ESP/EIP MSRs are identical to FS/GSBASE in * how they are saved/restored so can be directly accessed by the * guest. * * MSR_EFER is saved and restored in the guest VMCS area on a * VM exit and entry respectively. It is also restored from the * host VMCS area on a VM exit. * * The TSC MSR is exposed read-only. Writes are disallowed as * that will impact the host TSC. If the guest does a write * the "use TSC offsetting" execution control is enabled and the * difference between the host TSC and the guest TSC is written * into the TSC offset in the VMCS. */ if (guest_msr_rw(vmx, MSR_GSBASE) || guest_msr_rw(vmx, MSR_FSBASE) || guest_msr_rw(vmx, MSR_SYSENTER_CS_MSR) || guest_msr_rw(vmx, MSR_SYSENTER_ESP_MSR) || guest_msr_rw(vmx, MSR_SYSENTER_EIP_MSR) || guest_msr_rw(vmx, MSR_EFER) || guest_msr_ro(vmx, MSR_TSC)) panic("vmx_vminit: error setting guest msr access"); vpid_alloc(vpid, VM_MAXCPU); if (virtual_interrupt_delivery) { error = vm_map_mmio(vm, DEFAULT_APIC_BASE, PAGE_SIZE, APIC_ACCESS_ADDRESS); /* XXX this should really return an error to the caller */ KASSERT(error == 0, ("vm_map_mmio(apicbase) error %d", error)); } - for (i = 0; i < VM_MAXCPU; i++) { + maxcpus = vm_get_maxcpus(vm); + for (i = 0; i < maxcpus; i++) { vmcs = &vmx->vmcs[i]; vmcs->identifier = vmx_revision(); error = vmclear(vmcs); if (error != 0) { panic("vmx_vminit: vmclear error %d on vcpu %d\n", error, i); } vmx_msr_guest_init(vmx, i); error = vmcs_init(vmcs); KASSERT(error == 0, ("vmcs_init error %d", error)); VMPTRLD(vmcs); error = 0; error += vmwrite(VMCS_HOST_RSP, (u_long)&vmx->ctx[i]); error += vmwrite(VMCS_EPTP, vmx->eptp); error += vmwrite(VMCS_PIN_BASED_CTLS, pinbased_ctls); error += vmwrite(VMCS_PRI_PROC_BASED_CTLS, procbased_ctls); error += vmwrite(VMCS_SEC_PROC_BASED_CTLS, procbased_ctls2); error += vmwrite(VMCS_EXIT_CTLS, exit_ctls); error += vmwrite(VMCS_ENTRY_CTLS, entry_ctls); error += vmwrite(VMCS_MSR_BITMAP, vtophys(vmx->msr_bitmap)); error += vmwrite(VMCS_VPID, vpid[i]); if (guest_l1d_flush && !guest_l1d_flush_sw) { vmcs_write(VMCS_ENTRY_MSR_LOAD, pmap_kextract( (vm_offset_t)&msr_load_list[0])); vmcs_write(VMCS_ENTRY_MSR_LOAD_COUNT, nitems(msr_load_list)); vmcs_write(VMCS_EXIT_MSR_STORE, 0); vmcs_write(VMCS_EXIT_MSR_STORE_COUNT, 0); } /* exception bitmap */ if (vcpu_trace_exceptions(vm, i)) exc_bitmap = 0xffffffff; else exc_bitmap = 1 << IDT_MC; error += vmwrite(VMCS_EXCEPTION_BITMAP, exc_bitmap); vmx->ctx[i].guest_dr6 = DBREG_DR6_RESERVED1; error += vmwrite(VMCS_GUEST_DR7, DBREG_DR7_RESERVED1); if (virtual_interrupt_delivery) { error += vmwrite(VMCS_APIC_ACCESS, APIC_ACCESS_ADDRESS); error += vmwrite(VMCS_VIRTUAL_APIC, vtophys(&vmx->apic_page[i])); error += vmwrite(VMCS_EOI_EXIT0, 0); error += vmwrite(VMCS_EOI_EXIT1, 0); error += vmwrite(VMCS_EOI_EXIT2, 0); error += vmwrite(VMCS_EOI_EXIT3, 0); } if (posted_interrupts) { error += vmwrite(VMCS_PIR_VECTOR, pirvec); error += vmwrite(VMCS_PIR_DESC, vtophys(&vmx->pir_desc[i])); } VMCLEAR(vmcs); KASSERT(error == 0, ("vmx_vminit: error customizing the vmcs")); vmx->cap[i].set = 0; vmx->cap[i].proc_ctls = procbased_ctls; vmx->cap[i].proc_ctls2 = procbased_ctls2; vmx->state[i].nextrip = ~0; vmx->state[i].lastcpu = 
NOCPU; vmx->state[i].vpid = vpid[i]; /* * Set up the CR0/4 shadows, and init the read shadow * to the power-on register value from the Intel Sys Arch. * CR0 - 0x60000010 * CR4 - 0 */ error = vmx_setup_cr0_shadow(vmcs, 0x60000010); if (error != 0) panic("vmx_setup_cr0_shadow %d", error); error = vmx_setup_cr4_shadow(vmcs, 0); if (error != 0) panic("vmx_setup_cr4_shadow %d", error); vmx->ctx[i].pmap = pmap; } return (vmx); } static int vmx_handle_cpuid(struct vm *vm, int vcpu, struct vmxctx *vmxctx) { int handled, func; func = vmxctx->guest_rax; handled = x86_emulate_cpuid(vm, vcpu, (uint32_t*)(&vmxctx->guest_rax), (uint32_t*)(&vmxctx->guest_rbx), (uint32_t*)(&vmxctx->guest_rcx), (uint32_t*)(&vmxctx->guest_rdx)); return (handled); } static __inline void vmx_run_trace(struct vmx *vmx, int vcpu) { #ifdef KTR VCPU_CTR1(vmx->vm, vcpu, "Resume execution at %#lx", vmcs_guest_rip()); #endif } static __inline void vmx_exit_trace(struct vmx *vmx, int vcpu, uint64_t rip, uint32_t exit_reason, int handled) { #ifdef KTR VCPU_CTR3(vmx->vm, vcpu, "%s %s vmexit at 0x%0lx", handled ? "handled" : "unhandled", exit_reason_to_str(exit_reason), rip); #endif } static __inline void vmx_astpending_trace(struct vmx *vmx, int vcpu, uint64_t rip) { #ifdef KTR VCPU_CTR1(vmx->vm, vcpu, "astpending vmexit at 0x%0lx", rip); #endif } static VMM_STAT_INTEL(VCPU_INVVPID_SAVED, "Number of vpid invalidations saved"); static VMM_STAT_INTEL(VCPU_INVVPID_DONE, "Number of vpid invalidations done"); /* * Invalidate guest mappings identified by its vpid from the TLB. */ static __inline void vmx_invvpid(struct vmx *vmx, int vcpu, pmap_t pmap, int running) { struct vmxstate *vmxstate; struct invvpid_desc invvpid_desc; vmxstate = &vmx->state[vcpu]; if (vmxstate->vpid == 0) return; if (!running) { /* * Set the 'lastcpu' to an invalid host cpu. * * This will invalidate TLB entries tagged with the vcpu's * vpid the next time it runs via vmx_set_pcpu_defaults(). */ vmxstate->lastcpu = NOCPU; return; } KASSERT(curthread->td_critnest > 0, ("%s: vcpu %d running outside " "critical section", __func__, vcpu)); /* * Invalidate all mappings tagged with 'vpid' * * We do this because this vcpu was executing on a different host * cpu when it last ran. We do not track whether it invalidated * mappings associated with its 'vpid' during that run. So we must * assume that the mappings associated with 'vpid' on 'curcpu' are * stale and invalidate them. * * Note that we incur this penalty only when the scheduler chooses to * move the thread associated with this vcpu between host cpus. * * Note also that this will invalidate mappings tagged with 'vpid' * for "all" EP4TAs. */ if (pmap->pm_eptgen == vmx->eptgen[curcpu]) { invvpid_desc._res1 = 0; invvpid_desc._res2 = 0; invvpid_desc.vpid = vmxstate->vpid; invvpid_desc.linear_addr = 0; invvpid(INVVPID_TYPE_SINGLE_CONTEXT, invvpid_desc); vmm_stat_incr(vmx->vm, vcpu, VCPU_INVVPID_DONE, 1); } else { /* * The invvpid can be skipped if an invept is going to * be performed before entering the guest. The invept * will invalidate combined mappings tagged with * 'vmx->eptp' for all vpids. 
*/ vmm_stat_incr(vmx->vm, vcpu, VCPU_INVVPID_SAVED, 1); } } static void vmx_set_pcpu_defaults(struct vmx *vmx, int vcpu, pmap_t pmap) { struct vmxstate *vmxstate; vmxstate = &vmx->state[vcpu]; if (vmxstate->lastcpu == curcpu) return; vmxstate->lastcpu = curcpu; vmm_stat_incr(vmx->vm, vcpu, VCPU_MIGRATIONS, 1); vmcs_write(VMCS_HOST_TR_BASE, vmm_get_host_trbase()); vmcs_write(VMCS_HOST_GDTR_BASE, vmm_get_host_gdtrbase()); vmcs_write(VMCS_HOST_GS_BASE, vmm_get_host_gsbase()); vmx_invvpid(vmx, vcpu, pmap, 1); } /* * We depend on 'procbased_ctls' to have the Interrupt Window Exiting bit set. */ CTASSERT((PROCBASED_CTLS_ONE_SETTING & PROCBASED_INT_WINDOW_EXITING) != 0); static void __inline vmx_set_int_window_exiting(struct vmx *vmx, int vcpu) { if ((vmx->cap[vcpu].proc_ctls & PROCBASED_INT_WINDOW_EXITING) == 0) { vmx->cap[vcpu].proc_ctls |= PROCBASED_INT_WINDOW_EXITING; vmcs_write(VMCS_PRI_PROC_BASED_CTLS, vmx->cap[vcpu].proc_ctls); VCPU_CTR0(vmx->vm, vcpu, "Enabling interrupt window exiting"); } } static void __inline vmx_clear_int_window_exiting(struct vmx *vmx, int vcpu) { KASSERT((vmx->cap[vcpu].proc_ctls & PROCBASED_INT_WINDOW_EXITING) != 0, ("intr_window_exiting not set: %#x", vmx->cap[vcpu].proc_ctls)); vmx->cap[vcpu].proc_ctls &= ~PROCBASED_INT_WINDOW_EXITING; vmcs_write(VMCS_PRI_PROC_BASED_CTLS, vmx->cap[vcpu].proc_ctls); VCPU_CTR0(vmx->vm, vcpu, "Disabling interrupt window exiting"); } static void __inline vmx_set_nmi_window_exiting(struct vmx *vmx, int vcpu) { if ((vmx->cap[vcpu].proc_ctls & PROCBASED_NMI_WINDOW_EXITING) == 0) { vmx->cap[vcpu].proc_ctls |= PROCBASED_NMI_WINDOW_EXITING; vmcs_write(VMCS_PRI_PROC_BASED_CTLS, vmx->cap[vcpu].proc_ctls); VCPU_CTR0(vmx->vm, vcpu, "Enabling NMI window exiting"); } } static void __inline vmx_clear_nmi_window_exiting(struct vmx *vmx, int vcpu) { KASSERT((vmx->cap[vcpu].proc_ctls & PROCBASED_NMI_WINDOW_EXITING) != 0, ("nmi_window_exiting not set %#x", vmx->cap[vcpu].proc_ctls)); vmx->cap[vcpu].proc_ctls &= ~PROCBASED_NMI_WINDOW_EXITING; vmcs_write(VMCS_PRI_PROC_BASED_CTLS, vmx->cap[vcpu].proc_ctls); VCPU_CTR0(vmx->vm, vcpu, "Disabling NMI window exiting"); } int vmx_set_tsc_offset(struct vmx *vmx, int vcpu, uint64_t offset) { int error; if ((vmx->cap[vcpu].proc_ctls & PROCBASED_TSC_OFFSET) == 0) { vmx->cap[vcpu].proc_ctls |= PROCBASED_TSC_OFFSET; vmcs_write(VMCS_PRI_PROC_BASED_CTLS, vmx->cap[vcpu].proc_ctls); VCPU_CTR0(vmx->vm, vcpu, "Enabling TSC offsetting"); } error = vmwrite(VMCS_TSC_OFFSET, offset); return (error); } #define NMI_BLOCKING (VMCS_INTERRUPTIBILITY_NMI_BLOCKING | \ VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING) #define HWINTR_BLOCKING (VMCS_INTERRUPTIBILITY_STI_BLOCKING | \ VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING) static void vmx_inject_nmi(struct vmx *vmx, int vcpu) { uint32_t gi, info; gi = vmcs_read(VMCS_GUEST_INTERRUPTIBILITY); KASSERT((gi & NMI_BLOCKING) == 0, ("vmx_inject_nmi: invalid guest " "interruptibility-state %#x", gi)); info = vmcs_read(VMCS_ENTRY_INTR_INFO); KASSERT((info & VMCS_INTR_VALID) == 0, ("vmx_inject_nmi: invalid " "VM-entry interruption information %#x", info)); /* * Inject the virtual NMI. The vector must be the NMI IDT entry * or the VMCS entry check will fail. 
*/ info = IDT_NMI | VMCS_INTR_T_NMI | VMCS_INTR_VALID; vmcs_write(VMCS_ENTRY_INTR_INFO, info); VCPU_CTR0(vmx->vm, vcpu, "Injecting vNMI"); /* Clear the request */ vm_nmi_clear(vmx->vm, vcpu); } static void vmx_inject_interrupts(struct vmx *vmx, int vcpu, struct vlapic *vlapic, uint64_t guestrip) { int vector, need_nmi_exiting, extint_pending; uint64_t rflags, entryinfo; uint32_t gi, info; if (vmx->state[vcpu].nextrip != guestrip) { gi = vmcs_read(VMCS_GUEST_INTERRUPTIBILITY); if (gi & HWINTR_BLOCKING) { VCPU_CTR2(vmx->vm, vcpu, "Guest interrupt blocking " "cleared due to rip change: %#lx/%#lx", vmx->state[vcpu].nextrip, guestrip); gi &= ~HWINTR_BLOCKING; vmcs_write(VMCS_GUEST_INTERRUPTIBILITY, gi); } } if (vm_entry_intinfo(vmx->vm, vcpu, &entryinfo)) { KASSERT((entryinfo & VMCS_INTR_VALID) != 0, ("%s: entry " "intinfo is not valid: %#lx", __func__, entryinfo)); info = vmcs_read(VMCS_ENTRY_INTR_INFO); KASSERT((info & VMCS_INTR_VALID) == 0, ("%s: cannot inject " "pending exception: %#lx/%#x", __func__, entryinfo, info)); info = entryinfo; vector = info & 0xff; if (vector == IDT_BP || vector == IDT_OF) { /* * VT-x requires #BP and #OF to be injected as software * exceptions. */ info &= ~VMCS_INTR_T_MASK; info |= VMCS_INTR_T_SWEXCEPTION; } if (info & VMCS_INTR_DEL_ERRCODE) vmcs_write(VMCS_ENTRY_EXCEPTION_ERROR, entryinfo >> 32); vmcs_write(VMCS_ENTRY_INTR_INFO, info); } if (vm_nmi_pending(vmx->vm, vcpu)) { /* * If there are no conditions blocking NMI injection then * inject it directly here otherwise enable "NMI window * exiting" to inject it as soon as we can. * * We also check for STI_BLOCKING because some implementations * don't allow NMI injection in this case. If we are running * on a processor that doesn't have this restriction it will * immediately exit and the NMI will be injected in the * "NMI window exiting" handler. */ need_nmi_exiting = 1; gi = vmcs_read(VMCS_GUEST_INTERRUPTIBILITY); if ((gi & (HWINTR_BLOCKING | NMI_BLOCKING)) == 0) { info = vmcs_read(VMCS_ENTRY_INTR_INFO); if ((info & VMCS_INTR_VALID) == 0) { vmx_inject_nmi(vmx, vcpu); need_nmi_exiting = 0; } else { VCPU_CTR1(vmx->vm, vcpu, "Cannot inject NMI " "due to VM-entry intr info %#x", info); } } else { VCPU_CTR1(vmx->vm, vcpu, "Cannot inject NMI due to " "Guest Interruptibility-state %#x", gi); } if (need_nmi_exiting) vmx_set_nmi_window_exiting(vmx, vcpu); } extint_pending = vm_extint_pending(vmx->vm, vcpu); if (!extint_pending && virtual_interrupt_delivery) { vmx_inject_pir(vlapic); return; } /* * If interrupt-window exiting is already in effect then don't bother * checking for pending interrupts. This is just an optimization and * not needed for correctness. */ if ((vmx->cap[vcpu].proc_ctls & PROCBASED_INT_WINDOW_EXITING) != 0) { VCPU_CTR0(vmx->vm, vcpu, "Skip interrupt injection due to " "pending int_window_exiting"); return; } if (!extint_pending) { /* Ask the local apic for a vector to inject */ if (!vlapic_pending_intr(vlapic, &vector)) return; /* * From the Intel SDM, Volume 3, Section "Maskable * Hardware Interrupts": * - maskable interrupt vectors [16,255] can be delivered * through the local APIC. */ KASSERT(vector >= 16 && vector <= 255, ("invalid vector %d from local APIC", vector)); } else { /* Ask the legacy pic for a vector to inject */ vatpic_pending_intr(vmx->vm, &vector); /* * From the Intel SDM, Volume 3, Section "Maskable * Hardware Interrupts": * - maskable interrupt vectors [0,255] can be delivered * through the INTR pin. 
*/ KASSERT(vector >= 0 && vector <= 255, ("invalid vector %d from INTR", vector)); } /* Check RFLAGS.IF and the interruptibility state of the guest */ rflags = vmcs_read(VMCS_GUEST_RFLAGS); if ((rflags & PSL_I) == 0) { VCPU_CTR2(vmx->vm, vcpu, "Cannot inject vector %d due to " "rflags %#lx", vector, rflags); goto cantinject; } gi = vmcs_read(VMCS_GUEST_INTERRUPTIBILITY); if (gi & HWINTR_BLOCKING) { VCPU_CTR2(vmx->vm, vcpu, "Cannot inject vector %d due to " "Guest Interruptibility-state %#x", vector, gi); goto cantinject; } info = vmcs_read(VMCS_ENTRY_INTR_INFO); if (info & VMCS_INTR_VALID) { /* * This is expected and could happen for multiple reasons: * - A vectoring VM-entry was aborted due to astpending * - A VM-exit happened during event injection. * - An exception was injected above. * - An NMI was injected above or after "NMI window exiting" */ VCPU_CTR2(vmx->vm, vcpu, "Cannot inject vector %d due to " "VM-entry intr info %#x", vector, info); goto cantinject; } /* Inject the interrupt */ info = VMCS_INTR_T_HWINTR | VMCS_INTR_VALID; info |= vector; vmcs_write(VMCS_ENTRY_INTR_INFO, info); if (!extint_pending) { /* Update the Local APIC ISR */ vlapic_intr_accepted(vlapic, vector); } else { vm_extint_clear(vmx->vm, vcpu); vatpic_intr_accepted(vmx->vm, vector); /* * After we accepted the current ExtINT the PIC may * have posted another one. If that is the case, set * the Interrupt Window Exiting execution control so * we can inject that one too. * * Also, interrupt window exiting allows us to inject any * pending APIC vector that was preempted by the ExtINT * as soon as possible. This applies both for the software * emulated vlapic and the hardware assisted virtual APIC. */ vmx_set_int_window_exiting(vmx, vcpu); } VCPU_CTR1(vmx->vm, vcpu, "Injecting hwintr at vector %d", vector); return; cantinject: /* * Set the Interrupt Window Exiting execution control so we can inject * the interrupt as soon as blocking condition goes away. */ vmx_set_int_window_exiting(vmx, vcpu); } /* * If the Virtual NMIs execution control is '1' then the logical processor * tracks virtual-NMI blocking in the Guest Interruptibility-state field of * the VMCS. An IRET instruction in VMX non-root operation will remove any * virtual-NMI blocking. * * This unblocking occurs even if the IRET causes a fault. In this case the * hypervisor needs to restore virtual-NMI blocking before resuming the guest. */ static void vmx_restore_nmi_blocking(struct vmx *vmx, int vcpuid) { uint32_t gi; VCPU_CTR0(vmx->vm, vcpuid, "Restore Virtual-NMI blocking"); gi = vmcs_read(VMCS_GUEST_INTERRUPTIBILITY); gi |= VMCS_INTERRUPTIBILITY_NMI_BLOCKING; vmcs_write(VMCS_GUEST_INTERRUPTIBILITY, gi); } static void vmx_clear_nmi_blocking(struct vmx *vmx, int vcpuid) { uint32_t gi; VCPU_CTR0(vmx->vm, vcpuid, "Clear Virtual-NMI blocking"); gi = vmcs_read(VMCS_GUEST_INTERRUPTIBILITY); gi &= ~VMCS_INTERRUPTIBILITY_NMI_BLOCKING; vmcs_write(VMCS_GUEST_INTERRUPTIBILITY, gi); } static void vmx_assert_nmi_blocking(struct vmx *vmx, int vcpuid) { uint32_t gi; gi = vmcs_read(VMCS_GUEST_INTERRUPTIBILITY); KASSERT(gi & VMCS_INTERRUPTIBILITY_NMI_BLOCKING, ("NMI blocking is not in effect %#x", gi)); } static int vmx_emulate_xsetbv(struct vmx *vmx, int vcpu, struct vm_exit *vmexit) { struct vmxctx *vmxctx; uint64_t xcrval; const struct xsave_limits *limits; vmxctx = &vmx->ctx[vcpu]; limits = vmm_get_xsave_limits(); /* * Note that the processor raises a GP# fault on its own if * xsetbv is executed for CPL != 0, so we do not have to * emulate that fault here. 
*/ /* Only xcr0 is supported. */ if (vmxctx->guest_rcx != 0) { vm_inject_gp(vmx->vm, vcpu); return (HANDLED); } /* We only handle xcr0 if both the host and guest have XSAVE enabled. */ if (!limits->xsave_enabled || !(vmcs_read(VMCS_GUEST_CR4) & CR4_XSAVE)) { vm_inject_ud(vmx->vm, vcpu); return (HANDLED); } xcrval = vmxctx->guest_rdx << 32 | (vmxctx->guest_rax & 0xffffffff); if ((xcrval & ~limits->xcr0_allowed) != 0) { vm_inject_gp(vmx->vm, vcpu); return (HANDLED); } if (!(xcrval & XFEATURE_ENABLED_X87)) { vm_inject_gp(vmx->vm, vcpu); return (HANDLED); } /* AVX (YMM_Hi128) requires SSE. */ if (xcrval & XFEATURE_ENABLED_AVX && (xcrval & XFEATURE_AVX) != XFEATURE_AVX) { vm_inject_gp(vmx->vm, vcpu); return (HANDLED); } /* * AVX512 requires base AVX (YMM_Hi128) as well as OpMask, * ZMM_Hi256, and Hi16_ZMM. */ if (xcrval & XFEATURE_AVX512 && (xcrval & (XFEATURE_AVX512 | XFEATURE_AVX)) != (XFEATURE_AVX512 | XFEATURE_AVX)) { vm_inject_gp(vmx->vm, vcpu); return (HANDLED); } /* * Intel MPX requires both bound register state flags to be * set. */ if (((xcrval & XFEATURE_ENABLED_BNDREGS) != 0) != ((xcrval & XFEATURE_ENABLED_BNDCSR) != 0)) { vm_inject_gp(vmx->vm, vcpu); return (HANDLED); } /* * This runs "inside" vmrun() with the guest's FPU state, so * modifying xcr0 directly modifies the guest's xcr0, not the * host's. */ load_xcr(0, xcrval); return (HANDLED); } static uint64_t vmx_get_guest_reg(struct vmx *vmx, int vcpu, int ident) { const struct vmxctx *vmxctx; vmxctx = &vmx->ctx[vcpu]; switch (ident) { case 0: return (vmxctx->guest_rax); case 1: return (vmxctx->guest_rcx); case 2: return (vmxctx->guest_rdx); case 3: return (vmxctx->guest_rbx); case 4: return (vmcs_read(VMCS_GUEST_RSP)); case 5: return (vmxctx->guest_rbp); case 6: return (vmxctx->guest_rsi); case 7: return (vmxctx->guest_rdi); case 8: return (vmxctx->guest_r8); case 9: return (vmxctx->guest_r9); case 10: return (vmxctx->guest_r10); case 11: return (vmxctx->guest_r11); case 12: return (vmxctx->guest_r12); case 13: return (vmxctx->guest_r13); case 14: return (vmxctx->guest_r14); case 15: return (vmxctx->guest_r15); default: panic("invalid vmx register %d", ident); } } static void vmx_set_guest_reg(struct vmx *vmx, int vcpu, int ident, uint64_t regval) { struct vmxctx *vmxctx; vmxctx = &vmx->ctx[vcpu]; switch (ident) { case 0: vmxctx->guest_rax = regval; break; case 1: vmxctx->guest_rcx = regval; break; case 2: vmxctx->guest_rdx = regval; break; case 3: vmxctx->guest_rbx = regval; break; case 4: vmcs_write(VMCS_GUEST_RSP, regval); break; case 5: vmxctx->guest_rbp = regval; break; case 6: vmxctx->guest_rsi = regval; break; case 7: vmxctx->guest_rdi = regval; break; case 8: vmxctx->guest_r8 = regval; break; case 9: vmxctx->guest_r9 = regval; break; case 10: vmxctx->guest_r10 = regval; break; case 11: vmxctx->guest_r11 = regval; break; case 12: vmxctx->guest_r12 = regval; break; case 13: vmxctx->guest_r13 = regval; break; case 14: vmxctx->guest_r14 = regval; break; case 15: vmxctx->guest_r15 = regval; break; default: panic("invalid vmx register %d", ident); } } static int vmx_emulate_cr0_access(struct vmx *vmx, int vcpu, uint64_t exitqual) { uint64_t crval, regval; /* We only handle mov to %cr0 at this time */ if ((exitqual & 0xf0) != 0x00) return (UNHANDLED); regval = vmx_get_guest_reg(vmx, vcpu, (exitqual >> 8) & 0xf); vmcs_write(VMCS_CR0_SHADOW, regval); crval = regval | cr0_ones_mask; crval &= ~cr0_zeros_mask; vmcs_write(VMCS_GUEST_CR0, crval); if (regval & CR0_PG) { uint64_t efer, entry_ctls; /* * If CR0.PG is 1 and EFER.LME is 1 
then EFER.LMA and
		 * the "IA-32e mode guest" bit in VM-entry control must be
		 * equal.
		 */
		efer = vmcs_read(VMCS_GUEST_IA32_EFER);
		if (efer & EFER_LME) {
			efer |= EFER_LMA;
			vmcs_write(VMCS_GUEST_IA32_EFER, efer);
			entry_ctls = vmcs_read(VMCS_ENTRY_CTLS);
			entry_ctls |= VM_ENTRY_GUEST_LMA;
			vmcs_write(VMCS_ENTRY_CTLS, entry_ctls);
		}
	}

	return (HANDLED);
}

static int
vmx_emulate_cr4_access(struct vmx *vmx, int vcpu, uint64_t exitqual)
{
	uint64_t crval, regval;

	/* We only handle mov to %cr4 at this time */
	if ((exitqual & 0xf0) != 0x00)
		return (UNHANDLED);

	regval = vmx_get_guest_reg(vmx, vcpu, (exitqual >> 8) & 0xf);

	vmcs_write(VMCS_CR4_SHADOW, regval);

	crval = regval | cr4_ones_mask;
	crval &= ~cr4_zeros_mask;
	vmcs_write(VMCS_GUEST_CR4, crval);

	return (HANDLED);
}

static int
vmx_emulate_cr8_access(struct vmx *vmx, int vcpu, uint64_t exitqual)
{
	struct vlapic *vlapic;
	uint64_t cr8;
	int regnum;

	/* We only handle mov %cr8 to/from a register at this time. */
	if ((exitqual & 0xe0) != 0x00) {
		return (UNHANDLED);
	}

	vlapic = vm_lapic(vmx->vm, vcpu);
	regnum = (exitqual >> 8) & 0xf;
	if (exitqual & 0x10) {
		cr8 = vlapic_get_cr8(vlapic);
		vmx_set_guest_reg(vmx, vcpu, regnum, cr8);
	} else {
		cr8 = vmx_get_guest_reg(vmx, vcpu, regnum);
		vlapic_set_cr8(vlapic, cr8);
	}

	return (HANDLED);
}

/*
 * From section "Guest Register State" in the Intel SDM: CPL = SS.DPL
 */
static int
vmx_cpl(void)
{
	uint32_t ssar;

	ssar = vmcs_read(VMCS_GUEST_SS_ACCESS_RIGHTS);
	return ((ssar >> 5) & 0x3);
}

static enum vm_cpu_mode
vmx_cpu_mode(void)
{
	uint32_t csar;

	if (vmcs_read(VMCS_GUEST_IA32_EFER) & EFER_LMA) {
		csar = vmcs_read(VMCS_GUEST_CS_ACCESS_RIGHTS);
		if (csar & 0x2000)
			return (CPU_MODE_64BIT);	/* CS.L = 1 */
		else
			return (CPU_MODE_COMPATIBILITY);
	} else if (vmcs_read(VMCS_GUEST_CR0) & CR0_PE) {
		return (CPU_MODE_PROTECTED);
	} else {
		return (CPU_MODE_REAL);
	}
}

static enum vm_paging_mode
vmx_paging_mode(void)
{
	if (!(vmcs_read(VMCS_GUEST_CR0) & CR0_PG))
		return (PAGING_MODE_FLAT);
	if (!(vmcs_read(VMCS_GUEST_CR4) & CR4_PAE))
		return (PAGING_MODE_32);
	if (vmcs_read(VMCS_GUEST_IA32_EFER) & EFER_LME)
		return (PAGING_MODE_64);
	else
		return (PAGING_MODE_PAE);
}

static uint64_t
inout_str_index(struct vmx *vmx, int vcpuid, int in)
{
	uint64_t val;
	int error;
	enum vm_reg_name reg;

	reg = in ? VM_REG_GUEST_RDI : VM_REG_GUEST_RSI;
	error = vmx_getreg(vmx, vcpuid, reg, &val);
	KASSERT(error == 0, ("%s: vmx_getreg error %d", __func__, error));
	return (val);
}

static uint64_t
inout_str_count(struct vmx *vmx, int vcpuid, int rep)
{
	uint64_t val;
	int error;

	if (rep) {
		error = vmx_getreg(vmx, vcpuid, VM_REG_GUEST_RCX, &val);
		KASSERT(!error, ("%s: vmx_getreg error %d", __func__, error));
	} else {
		val = 1;
	}
	return (val);
}

static int
inout_str_addrsize(uint32_t inst_info)
{
	uint32_t size;

	size = (inst_info >> 7) & 0x7;
	switch (size) {
	case 0:
		return (2);	/* 16 bit */
	case 1:
		return (4);	/* 32 bit */
	case 2:
		return (8);	/* 64 bit */
	default:
		panic("%s: invalid size encoding %d", __func__, size);
	}
}

static void
inout_str_seginfo(struct vmx *vmx, int vcpuid, uint32_t inst_info, int in,
    struct vm_inout_str *vis)
{
	int error, s;

	if (in) {
		vis->seg_name = VM_REG_GUEST_ES;
	} else {
		s = (inst_info >> 15) & 0x7;
		vis->seg_name = vm_segment_name(s);
	}

	error = vmx_getdesc(vmx, vcpuid, vis->seg_name, &vis->seg_desc);
	KASSERT(error == 0, ("%s: vmx_getdesc error %d", __func__, error));
}

static void
vmx_paging_info(struct vm_guest_paging *paging)
{
	paging->cr3 = vmcs_guest_cr3();
	paging->cpl = vmx_cpl();
	paging->cpu_mode = vmx_cpu_mode();
	paging->paging_mode = vmx_paging_mode();
}

static void
vmexit_inst_emul(struct vm_exit *vmexit, uint64_t gpa, uint64_t gla)
{
	struct vm_guest_paging *paging;
	uint32_t csar;

	paging = &vmexit->u.inst_emul.paging;

	vmexit->exitcode = VM_EXITCODE_INST_EMUL;
	vmexit->inst_length = 0;
	vmexit->u.inst_emul.gpa = gpa;
	vmexit->u.inst_emul.gla = gla;
	vmx_paging_info(paging);
	switch (paging->cpu_mode) {
	case CPU_MODE_REAL:
		vmexit->u.inst_emul.cs_base = vmcs_read(VMCS_GUEST_CS_BASE);
		vmexit->u.inst_emul.cs_d = 0;
		break;
	case CPU_MODE_PROTECTED:
	case CPU_MODE_COMPATIBILITY:
		vmexit->u.inst_emul.cs_base = vmcs_read(VMCS_GUEST_CS_BASE);
		csar = vmcs_read(VMCS_GUEST_CS_ACCESS_RIGHTS);
		vmexit->u.inst_emul.cs_d = SEG_DESC_DEF32(csar);
		break;
	default:
		vmexit->u.inst_emul.cs_base = 0;
		vmexit->u.inst_emul.cs_d = 0;
		break;
	}
	vie_init(&vmexit->u.inst_emul.vie, NULL, 0);
}

static int
ept_fault_type(uint64_t ept_qual)
{
	int fault_type;

	if (ept_qual & EPT_VIOLATION_DATA_WRITE)
		fault_type = VM_PROT_WRITE;
	else if (ept_qual & EPT_VIOLATION_INST_FETCH)
		fault_type = VM_PROT_EXECUTE;
	else
		fault_type = VM_PROT_READ;

	return (fault_type);
}

static boolean_t
ept_emulation_fault(uint64_t ept_qual)
{
	int read, write;

	/* EPT fault on an instruction fetch doesn't make sense here */
	if (ept_qual & EPT_VIOLATION_INST_FETCH)
		return (FALSE);

	/* EPT fault must be a read fault or a write fault */
	read = ept_qual & EPT_VIOLATION_DATA_READ ? 1 : 0;
	write = ept_qual & EPT_VIOLATION_DATA_WRITE ? 1 : 0;
	if ((read | write) == 0)
		return (FALSE);

	/*
	 * The EPT violation must have been caused by accessing a
	 * guest-physical address that is a translation of a guest-linear
	 * address.
	 */
	if ((ept_qual & EPT_VIOLATION_GLA_VALID) == 0 ||
	    (ept_qual & EPT_VIOLATION_XLAT_VALID) == 0) {
		return (FALSE);
	}

	return (TRUE);
}

static __inline int
apic_access_virtualization(struct vmx *vmx, int vcpuid)
{
	uint32_t proc_ctls2;

	proc_ctls2 = vmx->cap[vcpuid].proc_ctls2;
	return ((proc_ctls2 & PROCBASED2_VIRTUALIZE_APIC_ACCESSES) ? 1 : 0);
}

static __inline int
x2apic_virtualization(struct vmx *vmx, int vcpuid)
{
	uint32_t proc_ctls2;

	proc_ctls2 = vmx->cap[vcpuid].proc_ctls2;
	return ((proc_ctls2 & PROCBASED2_VIRTUALIZE_X2APIC_MODE) ?
1 : 0); } static int vmx_handle_apic_write(struct vmx *vmx, int vcpuid, struct vlapic *vlapic, uint64_t qual) { int error, handled, offset; uint32_t *apic_regs, vector; bool retu; handled = HANDLED; offset = APIC_WRITE_OFFSET(qual); if (!apic_access_virtualization(vmx, vcpuid)) { /* * In general there should not be any APIC write VM-exits * unless APIC-access virtualization is enabled. * * However self-IPI virtualization can legitimately trigger * an APIC-write VM-exit so treat it specially. */ if (x2apic_virtualization(vmx, vcpuid) && offset == APIC_OFFSET_SELF_IPI) { apic_regs = (uint32_t *)(vlapic->apic_page); vector = apic_regs[APIC_OFFSET_SELF_IPI / 4]; vlapic_self_ipi_handler(vlapic, vector); return (HANDLED); } else return (UNHANDLED); } switch (offset) { case APIC_OFFSET_ID: vlapic_id_write_handler(vlapic); break; case APIC_OFFSET_LDR: vlapic_ldr_write_handler(vlapic); break; case APIC_OFFSET_DFR: vlapic_dfr_write_handler(vlapic); break; case APIC_OFFSET_SVR: vlapic_svr_write_handler(vlapic); break; case APIC_OFFSET_ESR: vlapic_esr_write_handler(vlapic); break; case APIC_OFFSET_ICR_LOW: retu = false; error = vlapic_icrlo_write_handler(vlapic, &retu); if (error != 0 || retu) handled = UNHANDLED; break; case APIC_OFFSET_CMCI_LVT: case APIC_OFFSET_TIMER_LVT ... APIC_OFFSET_ERROR_LVT: vlapic_lvt_write_handler(vlapic, offset); break; case APIC_OFFSET_TIMER_ICR: vlapic_icrtmr_write_handler(vlapic); break; case APIC_OFFSET_TIMER_DCR: vlapic_dcr_write_handler(vlapic); break; default: handled = UNHANDLED; break; } return (handled); } static bool apic_access_fault(struct vmx *vmx, int vcpuid, uint64_t gpa) { if (apic_access_virtualization(vmx, vcpuid) && (gpa >= DEFAULT_APIC_BASE && gpa < DEFAULT_APIC_BASE + PAGE_SIZE)) return (true); else return (false); } static int vmx_handle_apic_access(struct vmx *vmx, int vcpuid, struct vm_exit *vmexit) { uint64_t qual; int access_type, offset, allowed; if (!apic_access_virtualization(vmx, vcpuid)) return (UNHANDLED); qual = vmexit->u.vmx.exit_qualification; access_type = APIC_ACCESS_TYPE(qual); offset = APIC_ACCESS_OFFSET(qual); allowed = 0; if (access_type == 0) { /* * Read data access to the following registers is expected. */ switch (offset) { case APIC_OFFSET_APR: case APIC_OFFSET_PPR: case APIC_OFFSET_RRR: case APIC_OFFSET_CMCI_LVT: case APIC_OFFSET_TIMER_CCR: allowed = 1; break; default: break; } } else if (access_type == 1) { /* * Write data access to the following registers is expected. */ switch (offset) { case APIC_OFFSET_VER: case APIC_OFFSET_APR: case APIC_OFFSET_PPR: case APIC_OFFSET_RRR: case APIC_OFFSET_ISR0 ... APIC_OFFSET_ISR7: case APIC_OFFSET_TMR0 ... APIC_OFFSET_TMR7: case APIC_OFFSET_IRR0 ... APIC_OFFSET_IRR7: case APIC_OFFSET_CMCI_LVT: case APIC_OFFSET_TIMER_CCR: allowed = 1; break; default: break; } } if (allowed) { vmexit_inst_emul(vmexit, DEFAULT_APIC_BASE + offset, VIE_INVALID_GLA); } /* * Regardless of whether the APIC-access is allowed this handler * always returns UNHANDLED: * - if the access is allowed then it is handled by emulating the * instruction that caused the VM-exit (outside the critical section) * - if the access is not allowed then it will be converted to an * exitcode of VM_EXITCODE_VMX and will be dealt with in userland. 
*/ return (UNHANDLED); } static enum task_switch_reason vmx_task_switch_reason(uint64_t qual) { int reason; reason = (qual >> 30) & 0x3; switch (reason) { case 0: return (TSR_CALL); case 1: return (TSR_IRET); case 2: return (TSR_JMP); case 3: return (TSR_IDT_GATE); default: panic("%s: invalid reason %d", __func__, reason); } } static int emulate_wrmsr(struct vmx *vmx, int vcpuid, u_int num, uint64_t val, bool *retu) { int error; if (lapic_msr(num)) error = lapic_wrmsr(vmx->vm, vcpuid, num, val, retu); else error = vmx_wrmsr(vmx, vcpuid, num, val, retu); return (error); } static int emulate_rdmsr(struct vmx *vmx, int vcpuid, u_int num, bool *retu) { struct vmxctx *vmxctx; uint64_t result; uint32_t eax, edx; int error; if (lapic_msr(num)) error = lapic_rdmsr(vmx->vm, vcpuid, num, &result, retu); else error = vmx_rdmsr(vmx, vcpuid, num, &result, retu); if (error == 0) { eax = result; vmxctx = &vmx->ctx[vcpuid]; error = vmxctx_setreg(vmxctx, VM_REG_GUEST_RAX, eax); KASSERT(error == 0, ("vmxctx_setreg(rax) error %d", error)); edx = result >> 32; error = vmxctx_setreg(vmxctx, VM_REG_GUEST_RDX, edx); KASSERT(error == 0, ("vmxctx_setreg(rdx) error %d", error)); } return (error); } static int vmx_exit_process(struct vmx *vmx, int vcpu, struct vm_exit *vmexit) { int error, errcode, errcode_valid, handled, in; struct vmxctx *vmxctx; struct vlapic *vlapic; struct vm_inout_str *vis; struct vm_task_switch *ts; uint32_t eax, ecx, edx, idtvec_info, idtvec_err, intr_info, inst_info; uint32_t intr_type, intr_vec, reason; uint64_t exitintinfo, qual, gpa; bool retu; CTASSERT((PINBASED_CTLS_ONE_SETTING & PINBASED_VIRTUAL_NMI) != 0); CTASSERT((PINBASED_CTLS_ONE_SETTING & PINBASED_NMI_EXITING) != 0); handled = UNHANDLED; vmxctx = &vmx->ctx[vcpu]; qual = vmexit->u.vmx.exit_qualification; reason = vmexit->u.vmx.exit_reason; vmexit->exitcode = VM_EXITCODE_BOGUS; vmm_stat_incr(vmx->vm, vcpu, VMEXIT_COUNT, 1); SDT_PROBE3(vmm, vmx, exit, entry, vmx, vcpu, vmexit); /* * VM-entry failures during or after loading guest state. * * These VM-exits are uncommon but must be handled specially * as most VM-exit fields are not populated as usual. */ if (__predict_false(reason == EXIT_REASON_MCE_DURING_ENTRY)) { VCPU_CTR0(vmx->vm, vcpu, "Handling MCE during VM-entry"); __asm __volatile("int $18"); return (1); } /* * VM exits that can be triggered during event delivery need to * be handled specially by re-injecting the event if the IDT * vectoring information field's valid bit is set. * * See "Information for VM Exits During Event Delivery" in Intel SDM * for details. */ idtvec_info = vmcs_idt_vectoring_info(); if (idtvec_info & VMCS_IDT_VEC_VALID) { idtvec_info &= ~(1 << 12); /* clear undefined bit */ exitintinfo = idtvec_info; if (idtvec_info & VMCS_IDT_VEC_ERRCODE_VALID) { idtvec_err = vmcs_idt_vectoring_err(); exitintinfo |= (uint64_t)idtvec_err << 32; } error = vm_exit_intinfo(vmx->vm, vcpu, exitintinfo); KASSERT(error == 0, ("%s: vm_set_intinfo error %d", __func__, error)); /* * If 'virtual NMIs' are being used and the VM-exit * happened while injecting an NMI during the previous * VM-entry, then clear "blocking by NMI" in the * Guest Interruptibility-State so the NMI can be * reinjected on the subsequent VM-entry. * * However, if the NMI was being delivered through a task * gate, then the new task must start execution with NMIs * blocked so don't clear NMI blocking in this case. 
*/ intr_type = idtvec_info & VMCS_INTR_T_MASK; if (intr_type == VMCS_INTR_T_NMI) { if (reason != EXIT_REASON_TASK_SWITCH) vmx_clear_nmi_blocking(vmx, vcpu); else vmx_assert_nmi_blocking(vmx, vcpu); } /* * Update VM-entry instruction length if the event being * delivered was a software interrupt or software exception. */ if (intr_type == VMCS_INTR_T_SWINTR || intr_type == VMCS_INTR_T_PRIV_SWEXCEPTION || intr_type == VMCS_INTR_T_SWEXCEPTION) { vmcs_write(VMCS_ENTRY_INST_LENGTH, vmexit->inst_length); } } switch (reason) { case EXIT_REASON_TASK_SWITCH: ts = &vmexit->u.task_switch; ts->tsssel = qual & 0xffff; ts->reason = vmx_task_switch_reason(qual); ts->ext = 0; ts->errcode_valid = 0; vmx_paging_info(&ts->paging); /* * If the task switch was due to a CALL, JMP, IRET, software * interrupt (INT n) or software exception (INT3, INTO), * then the saved %rip references the instruction that caused * the task switch. The instruction length field in the VMCS * is valid in this case. * * In all other cases (e.g., NMI, hardware exception) the * saved %rip is one that would have been saved in the old TSS * had the task switch completed normally so the instruction * length field is not needed in this case and is explicitly * set to 0. */ if (ts->reason == TSR_IDT_GATE) { KASSERT(idtvec_info & VMCS_IDT_VEC_VALID, ("invalid idtvec_info %#x for IDT task switch", idtvec_info)); intr_type = idtvec_info & VMCS_INTR_T_MASK; if (intr_type != VMCS_INTR_T_SWINTR && intr_type != VMCS_INTR_T_SWEXCEPTION && intr_type != VMCS_INTR_T_PRIV_SWEXCEPTION) { /* Task switch triggered by external event */ ts->ext = 1; vmexit->inst_length = 0; if (idtvec_info & VMCS_IDT_VEC_ERRCODE_VALID) { ts->errcode_valid = 1; ts->errcode = vmcs_idt_vectoring_err(); } } } vmexit->exitcode = VM_EXITCODE_TASK_SWITCH; SDT_PROBE4(vmm, vmx, exit, taskswitch, vmx, vcpu, vmexit, ts); VCPU_CTR4(vmx->vm, vcpu, "task switch reason %d, tss 0x%04x, " "%s errcode 0x%016lx", ts->reason, ts->tsssel, ts->ext ? 
"external" : "internal", ((uint64_t)ts->errcode << 32) | ts->errcode_valid); break; case EXIT_REASON_CR_ACCESS: vmm_stat_incr(vmx->vm, vcpu, VMEXIT_CR_ACCESS, 1); SDT_PROBE4(vmm, vmx, exit, craccess, vmx, vcpu, vmexit, qual); switch (qual & 0xf) { case 0: handled = vmx_emulate_cr0_access(vmx, vcpu, qual); break; case 4: handled = vmx_emulate_cr4_access(vmx, vcpu, qual); break; case 8: handled = vmx_emulate_cr8_access(vmx, vcpu, qual); break; } break; case EXIT_REASON_RDMSR: vmm_stat_incr(vmx->vm, vcpu, VMEXIT_RDMSR, 1); retu = false; ecx = vmxctx->guest_rcx; VCPU_CTR1(vmx->vm, vcpu, "rdmsr 0x%08x", ecx); SDT_PROBE4(vmm, vmx, exit, rdmsr, vmx, vcpu, vmexit, ecx); error = emulate_rdmsr(vmx, vcpu, ecx, &retu); if (error) { vmexit->exitcode = VM_EXITCODE_RDMSR; vmexit->u.msr.code = ecx; } else if (!retu) { handled = HANDLED; } else { /* Return to userspace with a valid exitcode */ KASSERT(vmexit->exitcode != VM_EXITCODE_BOGUS, ("emulate_rdmsr retu with bogus exitcode")); } break; case EXIT_REASON_WRMSR: vmm_stat_incr(vmx->vm, vcpu, VMEXIT_WRMSR, 1); retu = false; eax = vmxctx->guest_rax; ecx = vmxctx->guest_rcx; edx = vmxctx->guest_rdx; VCPU_CTR2(vmx->vm, vcpu, "wrmsr 0x%08x value 0x%016lx", ecx, (uint64_t)edx << 32 | eax); SDT_PROBE5(vmm, vmx, exit, wrmsr, vmx, vmexit, vcpu, ecx, (uint64_t)edx << 32 | eax); error = emulate_wrmsr(vmx, vcpu, ecx, (uint64_t)edx << 32 | eax, &retu); if (error) { vmexit->exitcode = VM_EXITCODE_WRMSR; vmexit->u.msr.code = ecx; vmexit->u.msr.wval = (uint64_t)edx << 32 | eax; } else if (!retu) { handled = HANDLED; } else { /* Return to userspace with a valid exitcode */ KASSERT(vmexit->exitcode != VM_EXITCODE_BOGUS, ("emulate_wrmsr retu with bogus exitcode")); } break; case EXIT_REASON_HLT: vmm_stat_incr(vmx->vm, vcpu, VMEXIT_HLT, 1); SDT_PROBE3(vmm, vmx, exit, halt, vmx, vcpu, vmexit); vmexit->exitcode = VM_EXITCODE_HLT; vmexit->u.hlt.rflags = vmcs_read(VMCS_GUEST_RFLAGS); if (virtual_interrupt_delivery) vmexit->u.hlt.intr_status = vmcs_read(VMCS_GUEST_INTR_STATUS); else vmexit->u.hlt.intr_status = 0; break; case EXIT_REASON_MTF: vmm_stat_incr(vmx->vm, vcpu, VMEXIT_MTRAP, 1); SDT_PROBE3(vmm, vmx, exit, mtrap, vmx, vcpu, vmexit); vmexit->exitcode = VM_EXITCODE_MTRAP; vmexit->inst_length = 0; break; case EXIT_REASON_PAUSE: vmm_stat_incr(vmx->vm, vcpu, VMEXIT_PAUSE, 1); SDT_PROBE3(vmm, vmx, exit, pause, vmx, vcpu, vmexit); vmexit->exitcode = VM_EXITCODE_PAUSE; break; case EXIT_REASON_INTR_WINDOW: vmm_stat_incr(vmx->vm, vcpu, VMEXIT_INTR_WINDOW, 1); SDT_PROBE3(vmm, vmx, exit, intrwindow, vmx, vcpu, vmexit); vmx_clear_int_window_exiting(vmx, vcpu); return (1); case EXIT_REASON_EXT_INTR: /* * External interrupts serve only to cause VM exits and allow * the host interrupt handler to run. * * If this external interrupt triggers a virtual interrupt * to a VM, then that state will be recorded by the * host interrupt handler in the VM's softc. We will inject * this virtual interrupt during the subsequent VM enter. */ intr_info = vmcs_read(VMCS_EXIT_INTR_INFO); SDT_PROBE4(vmm, vmx, exit, interrupt, vmx, vcpu, vmexit, intr_info); /* * XXX: Ignore this exit if VMCS_INTR_VALID is not set. * This appears to be a bug in VMware Fusion? */ if (!(intr_info & VMCS_INTR_VALID)) return (1); KASSERT((intr_info & VMCS_INTR_VALID) != 0 && (intr_info & VMCS_INTR_T_MASK) == VMCS_INTR_T_HWINTR, ("VM exit interruption info invalid: %#x", intr_info)); vmx_trigger_hostintr(intr_info & 0xff); /* * This is special. 
We want to treat this as an 'handled' * VM-exit but not increment the instruction pointer. */ vmm_stat_incr(vmx->vm, vcpu, VMEXIT_EXTINT, 1); return (1); case EXIT_REASON_NMI_WINDOW: SDT_PROBE3(vmm, vmx, exit, nmiwindow, vmx, vcpu, vmexit); /* Exit to allow the pending virtual NMI to be injected */ if (vm_nmi_pending(vmx->vm, vcpu)) vmx_inject_nmi(vmx, vcpu); vmx_clear_nmi_window_exiting(vmx, vcpu); vmm_stat_incr(vmx->vm, vcpu, VMEXIT_NMI_WINDOW, 1); return (1); case EXIT_REASON_INOUT: vmm_stat_incr(vmx->vm, vcpu, VMEXIT_INOUT, 1); vmexit->exitcode = VM_EXITCODE_INOUT; vmexit->u.inout.bytes = (qual & 0x7) + 1; vmexit->u.inout.in = in = (qual & 0x8) ? 1 : 0; vmexit->u.inout.string = (qual & 0x10) ? 1 : 0; vmexit->u.inout.rep = (qual & 0x20) ? 1 : 0; vmexit->u.inout.port = (uint16_t)(qual >> 16); vmexit->u.inout.eax = (uint32_t)(vmxctx->guest_rax); if (vmexit->u.inout.string) { inst_info = vmcs_read(VMCS_EXIT_INSTRUCTION_INFO); vmexit->exitcode = VM_EXITCODE_INOUT_STR; vis = &vmexit->u.inout_str; vmx_paging_info(&vis->paging); vis->rflags = vmcs_read(VMCS_GUEST_RFLAGS); vis->cr0 = vmcs_read(VMCS_GUEST_CR0); vis->index = inout_str_index(vmx, vcpu, in); vis->count = inout_str_count(vmx, vcpu, vis->inout.rep); vis->addrsize = inout_str_addrsize(inst_info); inout_str_seginfo(vmx, vcpu, inst_info, in, vis); } SDT_PROBE3(vmm, vmx, exit, inout, vmx, vcpu, vmexit); break; case EXIT_REASON_CPUID: vmm_stat_incr(vmx->vm, vcpu, VMEXIT_CPUID, 1); SDT_PROBE3(vmm, vmx, exit, cpuid, vmx, vcpu, vmexit); handled = vmx_handle_cpuid(vmx->vm, vcpu, vmxctx); break; case EXIT_REASON_EXCEPTION: vmm_stat_incr(vmx->vm, vcpu, VMEXIT_EXCEPTION, 1); intr_info = vmcs_read(VMCS_EXIT_INTR_INFO); KASSERT((intr_info & VMCS_INTR_VALID) != 0, ("VM exit interruption info invalid: %#x", intr_info)); intr_vec = intr_info & 0xff; intr_type = intr_info & VMCS_INTR_T_MASK; /* * If Virtual NMIs control is 1 and the VM-exit is due to a * fault encountered during the execution of IRET then we must * restore the state of "virtual-NMI blocking" before resuming * the guest. * * See "Resuming Guest Software after Handling an Exception". * See "Information for VM Exits Due to Vectored Events". */ if ((idtvec_info & VMCS_IDT_VEC_VALID) == 0 && (intr_vec != IDT_DF) && (intr_info & EXIT_QUAL_NMIUDTI) != 0) vmx_restore_nmi_blocking(vmx, vcpu); /* * The NMI has already been handled in vmx_exit_handle_nmi(). */ if (intr_type == VMCS_INTR_T_NMI) return (1); /* * Call the machine check handler by hand. Also don't reflect * the machine check back into the guest. */ if (intr_vec == IDT_MC) { VCPU_CTR0(vmx->vm, vcpu, "Vectoring to MCE handler"); __asm __volatile("int $18"); return (1); } if (intr_vec == IDT_PF) { error = vmxctx_setreg(vmxctx, VM_REG_GUEST_CR2, qual); KASSERT(error == 0, ("%s: vmxctx_setreg(cr2) error %d", __func__, error)); } /* * Software exceptions exhibit trap-like behavior. This in * turn requires populating the VM-entry instruction length * so that the %rip in the trap frame is past the INT3/INTO * instruction. 
*/ if (intr_type == VMCS_INTR_T_SWEXCEPTION) vmcs_write(VMCS_ENTRY_INST_LENGTH, vmexit->inst_length); /* Reflect all other exceptions back into the guest */ errcode_valid = errcode = 0; if (intr_info & VMCS_INTR_DEL_ERRCODE) { errcode_valid = 1; errcode = vmcs_read(VMCS_EXIT_INTR_ERRCODE); } VCPU_CTR2(vmx->vm, vcpu, "Reflecting exception %d/%#x into " "the guest", intr_vec, errcode); SDT_PROBE5(vmm, vmx, exit, exception, vmx, vcpu, vmexit, intr_vec, errcode); error = vm_inject_exception(vmx->vm, vcpu, intr_vec, errcode_valid, errcode, 0); KASSERT(error == 0, ("%s: vm_inject_exception error %d", __func__, error)); return (1); case EXIT_REASON_EPT_FAULT: /* * If 'gpa' lies within the address space allocated to * memory then this must be a nested page fault otherwise * this must be an instruction that accesses MMIO space. */ gpa = vmcs_gpa(); if (vm_mem_allocated(vmx->vm, vcpu, gpa) || apic_access_fault(vmx, vcpu, gpa)) { vmexit->exitcode = VM_EXITCODE_PAGING; vmexit->inst_length = 0; vmexit->u.paging.gpa = gpa; vmexit->u.paging.fault_type = ept_fault_type(qual); vmm_stat_incr(vmx->vm, vcpu, VMEXIT_NESTED_FAULT, 1); SDT_PROBE5(vmm, vmx, exit, nestedfault, vmx, vcpu, vmexit, gpa, qual); } else if (ept_emulation_fault(qual)) { vmexit_inst_emul(vmexit, gpa, vmcs_gla()); vmm_stat_incr(vmx->vm, vcpu, VMEXIT_INST_EMUL, 1); SDT_PROBE4(vmm, vmx, exit, mmiofault, vmx, vcpu, vmexit, gpa); } /* * If Virtual NMIs control is 1 and the VM-exit is due to an * EPT fault during the execution of IRET then we must restore * the state of "virtual-NMI blocking" before resuming. * * See description of "NMI unblocking due to IRET" in * "Exit Qualification for EPT Violations". */ if ((idtvec_info & VMCS_IDT_VEC_VALID) == 0 && (qual & EXIT_QUAL_NMIUDTI) != 0) vmx_restore_nmi_blocking(vmx, vcpu); break; case EXIT_REASON_VIRTUALIZED_EOI: vmexit->exitcode = VM_EXITCODE_IOAPIC_EOI; vmexit->u.ioapic_eoi.vector = qual & 0xFF; SDT_PROBE3(vmm, vmx, exit, eoi, vmx, vcpu, vmexit); vmexit->inst_length = 0; /* trap-like */ break; case EXIT_REASON_APIC_ACCESS: SDT_PROBE3(vmm, vmx, exit, apicaccess, vmx, vcpu, vmexit); handled = vmx_handle_apic_access(vmx, vcpu, vmexit); break; case EXIT_REASON_APIC_WRITE: /* * APIC-write VM exit is trap-like so the %rip is already * pointing to the next instruction. */ vmexit->inst_length = 0; vlapic = vm_lapic(vmx->vm, vcpu); SDT_PROBE4(vmm, vmx, exit, apicwrite, vmx, vcpu, vmexit, vlapic); handled = vmx_handle_apic_write(vmx, vcpu, vlapic, qual); break; case EXIT_REASON_XSETBV: SDT_PROBE3(vmm, vmx, exit, xsetbv, vmx, vcpu, vmexit); handled = vmx_emulate_xsetbv(vmx, vcpu, vmexit); break; case EXIT_REASON_MONITOR: SDT_PROBE3(vmm, vmx, exit, monitor, vmx, vcpu, vmexit); vmexit->exitcode = VM_EXITCODE_MONITOR; break; case EXIT_REASON_MWAIT: SDT_PROBE3(vmm, vmx, exit, mwait, vmx, vcpu, vmexit); vmexit->exitcode = VM_EXITCODE_MWAIT; break; case EXIT_REASON_VMCALL: case EXIT_REASON_VMCLEAR: case EXIT_REASON_VMLAUNCH: case EXIT_REASON_VMPTRLD: case EXIT_REASON_VMPTRST: case EXIT_REASON_VMREAD: case EXIT_REASON_VMRESUME: case EXIT_REASON_VMWRITE: case EXIT_REASON_VMXOFF: case EXIT_REASON_VMXON: SDT_PROBE3(vmm, vmx, exit, vminsn, vmx, vcpu, vmexit); vmexit->exitcode = VM_EXITCODE_VMINSN; break; default: SDT_PROBE4(vmm, vmx, exit, unknown, vmx, vcpu, vmexit, reason); vmm_stat_incr(vmx->vm, vcpu, VMEXIT_UNKNOWN, 1); break; } if (handled) { /* * It is possible that control is returned to userland * even though we were able to handle the VM exit in the * kernel. 
* * In such a case we want to make sure that the userland * restarts guest execution at the instruction *after* * the one we just processed. Therefore we update the * guest rip in the VMCS and in 'vmexit'. */ vmexit->rip += vmexit->inst_length; vmexit->inst_length = 0; vmcs_write(VMCS_GUEST_RIP, vmexit->rip); } else { if (vmexit->exitcode == VM_EXITCODE_BOGUS) { /* * If this VM exit was not claimed by anybody then * treat it as a generic VMX exit. */ vmexit->exitcode = VM_EXITCODE_VMX; vmexit->u.vmx.status = VM_SUCCESS; vmexit->u.vmx.inst_type = 0; vmexit->u.vmx.inst_error = 0; } else { /* * The exitcode and collateral have been populated. * The VM exit will be processed further in userland. */ } } SDT_PROBE4(vmm, vmx, exit, return, vmx, vcpu, vmexit, handled); return (handled); } static __inline void vmx_exit_inst_error(struct vmxctx *vmxctx, int rc, struct vm_exit *vmexit) { KASSERT(vmxctx->inst_fail_status != VM_SUCCESS, ("vmx_exit_inst_error: invalid inst_fail_status %d", vmxctx->inst_fail_status)); vmexit->inst_length = 0; vmexit->exitcode = VM_EXITCODE_VMX; vmexit->u.vmx.status = vmxctx->inst_fail_status; vmexit->u.vmx.inst_error = vmcs_instruction_error(); vmexit->u.vmx.exit_reason = ~0; vmexit->u.vmx.exit_qualification = ~0; switch (rc) { case VMX_VMRESUME_ERROR: case VMX_VMLAUNCH_ERROR: case VMX_INVEPT_ERROR: vmexit->u.vmx.inst_type = rc; break; default: panic("vm_exit_inst_error: vmx_enter_guest returned %d", rc); } } /* * If the NMI-exiting VM execution control is set to '1' then an NMI in * non-root operation causes a VM-exit. NMI blocking is in effect so it is * sufficient to simply vector to the NMI handler via a software interrupt. * However, this must be done before maskable interrupts are enabled * otherwise the "iret" issued by an interrupt handler will incorrectly * clear NMI blocking. */ static __inline void vmx_exit_handle_nmi(struct vmx *vmx, int vcpuid, struct vm_exit *vmexit) { uint32_t intr_info; KASSERT((read_rflags() & PSL_I) == 0, ("interrupts enabled")); if (vmexit->u.vmx.exit_reason != EXIT_REASON_EXCEPTION) return; intr_info = vmcs_read(VMCS_EXIT_INTR_INFO); KASSERT((intr_info & VMCS_INTR_VALID) != 0, ("VM exit interruption info invalid: %#x", intr_info)); if ((intr_info & VMCS_INTR_T_MASK) == VMCS_INTR_T_NMI) { KASSERT((intr_info & 0xff) == IDT_NMI, ("VM exit due " "to NMI has invalid vector: %#x", intr_info)); VCPU_CTR0(vmx->vm, vcpuid, "Vectoring to NMI handler"); __asm __volatile("int $2"); } } static __inline void vmx_dr_enter_guest(struct vmxctx *vmxctx) { register_t rflags; /* Save host control debug registers. */ vmxctx->host_dr7 = rdr7(); vmxctx->host_debugctl = rdmsr(MSR_DEBUGCTLMSR); /* * Disable debugging in DR7 and DEBUGCTL to avoid triggering * exceptions in the host based on the guest DRx values. The * guest DR7 and DEBUGCTL are saved/restored in the VMCS. */ load_dr7(0); wrmsr(MSR_DEBUGCTLMSR, 0); /* * Disable single stepping the kernel to avoid corrupting the * guest DR6. A debugger might still be able to corrupt the * guest DR6 by setting a breakpoint after this point and then * single stepping. */ rflags = read_rflags(); vmxctx->host_tf = rflags & PSL_T; write_rflags(rflags & ~PSL_T); /* Save host debug registers. */ vmxctx->host_dr0 = rdr0(); vmxctx->host_dr1 = rdr1(); vmxctx->host_dr2 = rdr2(); vmxctx->host_dr3 = rdr3(); vmxctx->host_dr6 = rdr6(); /* Restore guest debug registers. 
*/ load_dr0(vmxctx->guest_dr0); load_dr1(vmxctx->guest_dr1); load_dr2(vmxctx->guest_dr2); load_dr3(vmxctx->guest_dr3); load_dr6(vmxctx->guest_dr6); } static __inline void vmx_dr_leave_guest(struct vmxctx *vmxctx) { /* Save guest debug registers. */ vmxctx->guest_dr0 = rdr0(); vmxctx->guest_dr1 = rdr1(); vmxctx->guest_dr2 = rdr2(); vmxctx->guest_dr3 = rdr3(); vmxctx->guest_dr6 = rdr6(); /* * Restore host debug registers. Restore DR7, DEBUGCTL, and * PSL_T last. */ load_dr0(vmxctx->host_dr0); load_dr1(vmxctx->host_dr1); load_dr2(vmxctx->host_dr2); load_dr3(vmxctx->host_dr3); load_dr6(vmxctx->host_dr6); wrmsr(MSR_DEBUGCTLMSR, vmxctx->host_debugctl); load_dr7(vmxctx->host_dr7); write_rflags(read_rflags() | vmxctx->host_tf); } static int vmx_run(void *arg, int vcpu, register_t rip, pmap_t pmap, struct vm_eventinfo *evinfo) { int rc, handled, launched; struct vmx *vmx; struct vm *vm; struct vmxctx *vmxctx; struct vmcs *vmcs; struct vm_exit *vmexit; struct vlapic *vlapic; uint32_t exit_reason; struct region_descriptor gdtr, idtr; uint16_t ldt_sel; vmx = arg; vm = vmx->vm; vmcs = &vmx->vmcs[vcpu]; vmxctx = &vmx->ctx[vcpu]; vlapic = vm_lapic(vm, vcpu); vmexit = vm_exitinfo(vm, vcpu); launched = 0; KASSERT(vmxctx->pmap == pmap, ("pmap %p different than ctx pmap %p", pmap, vmxctx->pmap)); vmx_msr_guest_enter(vmx, vcpu); VMPTRLD(vmcs); /* * XXX * We do this every time because we may setup the virtual machine * from a different process than the one that actually runs it. * * If the life of a virtual machine was spent entirely in the context * of a single process we could do this once in vmx_vminit(). */ vmcs_write(VMCS_HOST_CR3, rcr3()); vmcs_write(VMCS_GUEST_RIP, rip); vmx_set_pcpu_defaults(vmx, vcpu, pmap); do { KASSERT(vmcs_guest_rip() == rip, ("%s: vmcs guest rip mismatch " "%#lx/%#lx", __func__, vmcs_guest_rip(), rip)); handled = UNHANDLED; /* * Interrupts are disabled from this point on until the * guest starts executing. This is done for the following * reasons: * * If an AST is asserted on this thread after the check below, * then the IPI_AST notification will not be lost, because it * will cause a VM exit due to external interrupt as soon as * the guest state is loaded. * * A posted interrupt after 'vmx_inject_interrupts()' will * not be "lost" because it will be held pending in the host * APIC because interrupts are disabled. The pending interrupt * will be recognized as soon as the guest state is loaded. * * The same reasoning applies to the IPI generated by * pmap_invalidate_ept(). */ disable_intr(); vmx_inject_interrupts(vmx, vcpu, vlapic, rip); /* * Check for vcpu suspension after injecting events because * vmx_inject_interrupts() can suspend the vcpu due to a * triple fault. */ if (vcpu_suspended(evinfo)) { enable_intr(); vm_exit_suspended(vmx->vm, vcpu, rip); break; } if (vcpu_rendezvous_pending(evinfo)) { enable_intr(); vm_exit_rendezvous(vmx->vm, vcpu, rip); break; } if (vcpu_reqidle(evinfo)) { enable_intr(); vm_exit_reqidle(vmx->vm, vcpu, rip); break; } if (vcpu_should_yield(vm, vcpu)) { enable_intr(); vm_exit_astpending(vmx->vm, vcpu, rip); vmx_astpending_trace(vmx, vcpu, rip); handled = HANDLED; break; } if (vcpu_debugged(vm, vcpu)) { enable_intr(); vm_exit_debug(vmx->vm, vcpu, rip); break; } /* * VM exits restore the base address but not the * limits of GDTR and IDTR. The VMCS only stores the * base address, so VM exits set the limits to 0xffff. * Save and restore the full GDTR and IDTR to restore * the limits. 
* * The VMCS does not save the LDTR at all, and VM * exits clear LDTR as if a NULL selector were loaded. * The userspace hypervisor probably doesn't use a * LDT, but save and restore it to be safe. */ sgdt(&gdtr); sidt(&idtr); ldt_sel = sldt(); vmx_run_trace(vmx, vcpu); vmx_dr_enter_guest(vmxctx); rc = vmx_enter_guest(vmxctx, vmx, launched); vmx_dr_leave_guest(vmxctx); bare_lgdt(&gdtr); lidt(&idtr); lldt(ldt_sel); /* Collect some information for VM exit processing */ vmexit->rip = rip = vmcs_guest_rip(); vmexit->inst_length = vmexit_instruction_length(); vmexit->u.vmx.exit_reason = exit_reason = vmcs_exit_reason(); vmexit->u.vmx.exit_qualification = vmcs_exit_qualification(); /* Update 'nextrip' */ vmx->state[vcpu].nextrip = rip; if (rc == VMX_GUEST_VMEXIT) { vmx_exit_handle_nmi(vmx, vcpu, vmexit); enable_intr(); handled = vmx_exit_process(vmx, vcpu, vmexit); } else { enable_intr(); vmx_exit_inst_error(vmxctx, rc, vmexit); } launched = 1; vmx_exit_trace(vmx, vcpu, rip, exit_reason, handled); rip = vmexit->rip; } while (handled); /* * If a VM exit has been handled then the exitcode must be BOGUS * If a VM exit is not handled then the exitcode must not be BOGUS */ if ((handled && vmexit->exitcode != VM_EXITCODE_BOGUS) || (!handled && vmexit->exitcode == VM_EXITCODE_BOGUS)) { panic("Mismatch between handled (%d) and exitcode (%d)", handled, vmexit->exitcode); } if (!handled) vmm_stat_incr(vm, vcpu, VMEXIT_USERSPACE, 1); VCPU_CTR1(vm, vcpu, "returning from vmx_run: exitcode %d", vmexit->exitcode); VMCLEAR(vmcs); vmx_msr_guest_exit(vmx, vcpu); return (0); } static void vmx_vmcleanup(void *arg) { int i; struct vmx *vmx = arg; + uint16_t maxcpus; if (apic_access_virtualization(vmx, 0)) vm_unmap_mmio(vmx->vm, DEFAULT_APIC_BASE, PAGE_SIZE); - for (i = 0; i < VM_MAXCPU; i++) + maxcpus = vm_get_maxcpus(vmx->vm); + for (i = 0; i < maxcpus; i++) vpid_free(vmx->state[i].vpid); free(vmx, M_VMX); return; } static register_t * vmxctx_regptr(struct vmxctx *vmxctx, int reg) { switch (reg) { case VM_REG_GUEST_RAX: return (&vmxctx->guest_rax); case VM_REG_GUEST_RBX: return (&vmxctx->guest_rbx); case VM_REG_GUEST_RCX: return (&vmxctx->guest_rcx); case VM_REG_GUEST_RDX: return (&vmxctx->guest_rdx); case VM_REG_GUEST_RSI: return (&vmxctx->guest_rsi); case VM_REG_GUEST_RDI: return (&vmxctx->guest_rdi); case VM_REG_GUEST_RBP: return (&vmxctx->guest_rbp); case VM_REG_GUEST_R8: return (&vmxctx->guest_r8); case VM_REG_GUEST_R9: return (&vmxctx->guest_r9); case VM_REG_GUEST_R10: return (&vmxctx->guest_r10); case VM_REG_GUEST_R11: return (&vmxctx->guest_r11); case VM_REG_GUEST_R12: return (&vmxctx->guest_r12); case VM_REG_GUEST_R13: return (&vmxctx->guest_r13); case VM_REG_GUEST_R14: return (&vmxctx->guest_r14); case VM_REG_GUEST_R15: return (&vmxctx->guest_r15); case VM_REG_GUEST_CR2: return (&vmxctx->guest_cr2); case VM_REG_GUEST_DR0: return (&vmxctx->guest_dr0); case VM_REG_GUEST_DR1: return (&vmxctx->guest_dr1); case VM_REG_GUEST_DR2: return (&vmxctx->guest_dr2); case VM_REG_GUEST_DR3: return (&vmxctx->guest_dr3); case VM_REG_GUEST_DR6: return (&vmxctx->guest_dr6); default: break; } return (NULL); } static int vmxctx_getreg(struct vmxctx *vmxctx, int reg, uint64_t *retval) { register_t *regp; if ((regp = vmxctx_regptr(vmxctx, reg)) != NULL) { *retval = *regp; return (0); } else return (EINVAL); } static int vmxctx_setreg(struct vmxctx *vmxctx, int reg, uint64_t val) { register_t *regp; if ((regp = vmxctx_regptr(vmxctx, reg)) != NULL) { *regp = val; return (0); } else return (EINVAL); } static int 
vmx_get_intr_shadow(struct vmx *vmx, int vcpu, int running, uint64_t *retval) { uint64_t gi; int error; error = vmcs_getreg(&vmx->vmcs[vcpu], running, VMCS_IDENT(VMCS_GUEST_INTERRUPTIBILITY), &gi); *retval = (gi & HWINTR_BLOCKING) ? 1 : 0; return (error); } static int vmx_modify_intr_shadow(struct vmx *vmx, int vcpu, int running, uint64_t val) { struct vmcs *vmcs; uint64_t gi; int error, ident; /* * Forcing the vcpu into an interrupt shadow is not supported. */ if (val) { error = EINVAL; goto done; } vmcs = &vmx->vmcs[vcpu]; ident = VMCS_IDENT(VMCS_GUEST_INTERRUPTIBILITY); error = vmcs_getreg(vmcs, running, ident, &gi); if (error == 0) { gi &= ~HWINTR_BLOCKING; error = vmcs_setreg(vmcs, running, ident, gi); } done: VCPU_CTR2(vmx->vm, vcpu, "Setting intr_shadow to %#lx %s", val, error ? "failed" : "succeeded"); return (error); } static int vmx_shadow_reg(int reg) { int shreg; shreg = -1; switch (reg) { case VM_REG_GUEST_CR0: shreg = VMCS_CR0_SHADOW; break; case VM_REG_GUEST_CR4: shreg = VMCS_CR4_SHADOW; break; default: break; } return (shreg); } static int vmx_getreg(void *arg, int vcpu, int reg, uint64_t *retval) { int running, hostcpu; struct vmx *vmx = arg; running = vcpu_is_running(vmx->vm, vcpu, &hostcpu); if (running && hostcpu != curcpu) panic("vmx_getreg: %s%d is running", vm_name(vmx->vm), vcpu); if (reg == VM_REG_GUEST_INTR_SHADOW) return (vmx_get_intr_shadow(vmx, vcpu, running, retval)); if (vmxctx_getreg(&vmx->ctx[vcpu], reg, retval) == 0) return (0); return (vmcs_getreg(&vmx->vmcs[vcpu], running, reg, retval)); } static int vmx_setreg(void *arg, int vcpu, int reg, uint64_t val) { int error, hostcpu, running, shadow; uint64_t ctls; pmap_t pmap; struct vmx *vmx = arg; running = vcpu_is_running(vmx->vm, vcpu, &hostcpu); if (running && hostcpu != curcpu) panic("vmx_setreg: %s%d is running", vm_name(vmx->vm), vcpu); if (reg == VM_REG_GUEST_INTR_SHADOW) return (vmx_modify_intr_shadow(vmx, vcpu, running, val)); if (vmxctx_setreg(&vmx->ctx[vcpu], reg, val) == 0) return (0); error = vmcs_setreg(&vmx->vmcs[vcpu], running, reg, val); if (error == 0) { /* * If the "load EFER" VM-entry control is 1 then the * value of EFER.LMA must be identical to "IA-32e mode guest" * bit in the VM-entry control. */ if ((entry_ctls & VM_ENTRY_LOAD_EFER) != 0 && (reg == VM_REG_GUEST_EFER)) { vmcs_getreg(&vmx->vmcs[vcpu], running, VMCS_IDENT(VMCS_ENTRY_CTLS), &ctls); if (val & EFER_LMA) ctls |= VM_ENTRY_GUEST_LMA; else ctls &= ~VM_ENTRY_GUEST_LMA; vmcs_setreg(&vmx->vmcs[vcpu], running, VMCS_IDENT(VMCS_ENTRY_CTLS), ctls); } shadow = vmx_shadow_reg(reg); if (shadow > 0) { /* * Store the unmodified value in the shadow */ error = vmcs_setreg(&vmx->vmcs[vcpu], running, VMCS_IDENT(shadow), val); } if (reg == VM_REG_GUEST_CR3) { /* * Invalidate the guest vcpu's TLB mappings to emulate * the behavior of updating %cr3. * * XXX the processor retains global mappings when %cr3 * is updated but vmx_invvpid() does not. 
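* The over-invalidation should only cost performance, not correctness: guest TLB entries for global pages are simply flushed more often than real hardware would flush them on a %cr3 write.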
*/ pmap = vmx->ctx[vcpu].pmap; vmx_invvpid(vmx, vcpu, pmap, running); } } return (error); } static int vmx_getdesc(void *arg, int vcpu, int reg, struct seg_desc *desc) { int hostcpu, running; struct vmx *vmx = arg; running = vcpu_is_running(vmx->vm, vcpu, &hostcpu); if (running && hostcpu != curcpu) panic("vmx_getdesc: %s%d is running", vm_name(vmx->vm), vcpu); return (vmcs_getdesc(&vmx->vmcs[vcpu], running, reg, desc)); } static int vmx_setdesc(void *arg, int vcpu, int reg, struct seg_desc *desc) { int hostcpu, running; struct vmx *vmx = arg; running = vcpu_is_running(vmx->vm, vcpu, &hostcpu); if (running && hostcpu != curcpu) panic("vmx_setdesc: %s%d is running", vm_name(vmx->vm), vcpu); return (vmcs_setdesc(&vmx->vmcs[vcpu], running, reg, desc)); } static int vmx_getcap(void *arg, int vcpu, int type, int *retval) { struct vmx *vmx = arg; int vcap; int ret; ret = ENOENT; vcap = vmx->cap[vcpu].set; switch (type) { case VM_CAP_HALT_EXIT: if (cap_halt_exit) ret = 0; break; case VM_CAP_PAUSE_EXIT: if (cap_pause_exit) ret = 0; break; case VM_CAP_MTRAP_EXIT: if (cap_monitor_trap) ret = 0; break; case VM_CAP_UNRESTRICTED_GUEST: if (cap_unrestricted_guest) ret = 0; break; case VM_CAP_ENABLE_INVPCID: if (cap_invpcid) ret = 0; break; default: break; } if (ret == 0) *retval = (vcap & (1 << type)) ? 1 : 0; return (ret); } static int vmx_setcap(void *arg, int vcpu, int type, int val) { struct vmx *vmx = arg; struct vmcs *vmcs = &vmx->vmcs[vcpu]; uint32_t baseval; uint32_t *pptr; int error; int flag; int reg; int retval; retval = ENOENT; pptr = NULL; switch (type) { case VM_CAP_HALT_EXIT: if (cap_halt_exit) { retval = 0; pptr = &vmx->cap[vcpu].proc_ctls; baseval = *pptr; flag = PROCBASED_HLT_EXITING; reg = VMCS_PRI_PROC_BASED_CTLS; } break; case VM_CAP_MTRAP_EXIT: if (cap_monitor_trap) { retval = 0; pptr = &vmx->cap[vcpu].proc_ctls; baseval = *pptr; flag = PROCBASED_MTF; reg = VMCS_PRI_PROC_BASED_CTLS; } break; case VM_CAP_PAUSE_EXIT: if (cap_pause_exit) { retval = 0; pptr = &vmx->cap[vcpu].proc_ctls; baseval = *pptr; flag = PROCBASED_PAUSE_EXITING; reg = VMCS_PRI_PROC_BASED_CTLS; } break; case VM_CAP_UNRESTRICTED_GUEST: if (cap_unrestricted_guest) { retval = 0; pptr = &vmx->cap[vcpu].proc_ctls2; baseval = *pptr; flag = PROCBASED2_UNRESTRICTED_GUEST; reg = VMCS_SEC_PROC_BASED_CTLS; } break; case VM_CAP_ENABLE_INVPCID: if (cap_invpcid) { retval = 0; pptr = &vmx->cap[vcpu].proc_ctls2; baseval = *pptr; flag = PROCBASED2_ENABLE_INVPCID; reg = VMCS_SEC_PROC_BASED_CTLS; } break; default: break; } if (retval == 0) { if (val) { baseval |= flag; } else { baseval &= ~flag; } VMPTRLD(vmcs); error = vmwrite(reg, baseval); VMCLEAR(vmcs); if (error) { retval = error; } else { /* * Update optional stored flags, and record * setting */ if (pptr != NULL) { *pptr = baseval; } if (val) { vmx->cap[vcpu].set |= (1 << type); } else { vmx->cap[vcpu].set &= ~(1 << type); } } } return (retval); } struct vlapic_vtx { struct vlapic vlapic; struct pir_desc *pir_desc; struct vmx *vmx; u_int pending_prio; }; #define VPR_PRIO_BIT(vpr) (1 << ((vpr) >> 4)) #define VMX_CTR_PIR(vm, vcpuid, pir_desc, notify, vector, level, msg) \ do { \ VCPU_CTR2(vm, vcpuid, msg " assert %s-triggered vector %d", \ level ? 
"level" : "edge", vector); \ VCPU_CTR1(vm, vcpuid, msg " pir0 0x%016lx", pir_desc->pir[0]); \ VCPU_CTR1(vm, vcpuid, msg " pir1 0x%016lx", pir_desc->pir[1]); \ VCPU_CTR1(vm, vcpuid, msg " pir2 0x%016lx", pir_desc->pir[2]); \ VCPU_CTR1(vm, vcpuid, msg " pir3 0x%016lx", pir_desc->pir[3]); \ VCPU_CTR1(vm, vcpuid, msg " notify: %s", notify ? "yes" : "no");\ } while (0) /* * vlapic->ops handlers that utilize the APICv hardware assist described in * Chapter 29 of the Intel SDM. */ static int vmx_set_intr_ready(struct vlapic *vlapic, int vector, bool level) { struct vlapic_vtx *vlapic_vtx; struct pir_desc *pir_desc; uint64_t mask; int idx, notify = 0; vlapic_vtx = (struct vlapic_vtx *)vlapic; pir_desc = vlapic_vtx->pir_desc; /* * Keep track of interrupt requests in the PIR descriptor. This is * because the virtual APIC page pointed to by the VMCS cannot be * modified if the vcpu is running. */ idx = vector / 64; mask = 1UL << (vector % 64); atomic_set_long(&pir_desc->pir[idx], mask); /* * A notification is required whenever the 'pending' bit makes a * transition from 0->1. * * Even if the 'pending' bit is already asserted, notification about * the incoming interrupt may still be necessary. For example, if a * vCPU is HLTed with a high PPR, a low priority interrupt would cause * the 0->1 'pending' transition with a notification, but the vCPU * would ignore the interrupt for the time being. The same vCPU would * need to then be notified if a high-priority interrupt arrived which * satisfied the PPR. * * The priorities of interrupts injected while 'pending' is asserted * are tracked in a custom bitfield 'pending_prio'. Should the * to-be-injected interrupt exceed the priorities already present, the * notification is sent. The priorities recorded in 'pending_prio' are * cleared whenever the 'pending' bit makes another 0->1 transition. */ if (atomic_cmpset_long(&pir_desc->pending, 0, 1) != 0) { notify = 1; vlapic_vtx->pending_prio = 0; } else { const u_int old_prio = vlapic_vtx->pending_prio; const u_int prio_bit = VPR_PRIO_BIT(vector & APIC_TPR_INT); if ((old_prio & prio_bit) == 0 && prio_bit > old_prio) { atomic_set_int(&vlapic_vtx->pending_prio, prio_bit); notify = 1; } } VMX_CTR_PIR(vlapic->vm, vlapic->vcpuid, pir_desc, notify, vector, level, "vmx_set_intr_ready"); return (notify); } static int vmx_pending_intr(struct vlapic *vlapic, int *vecptr) { struct vlapic_vtx *vlapic_vtx; struct pir_desc *pir_desc; struct LAPIC *lapic; uint64_t pending, pirval; uint32_t ppr, vpr; int i; /* * This function is only expected to be called from the 'HLT' exit * handler which does not care about the vector that is pending. */ KASSERT(vecptr == NULL, ("vmx_pending_intr: vecptr must be NULL")); vlapic_vtx = (struct vlapic_vtx *)vlapic; pir_desc = vlapic_vtx->pir_desc; pending = atomic_load_acq_long(&pir_desc->pending); if (!pending) { /* * While a virtual interrupt may have already been * processed the actual delivery maybe pending the * interruptibility of the guest. Recognize a pending * interrupt by reevaluating virtual interrupts * following Section 29.2.1 in the Intel SDM Volume 3. 
*/ struct vm_exit *vmexit; uint8_t rvi, ppr; vmexit = vm_exitinfo(vlapic->vm, vlapic->vcpuid); KASSERT(vmexit->exitcode == VM_EXITCODE_HLT, ("vmx_pending_intr: exitcode not 'HLT'")); rvi = vmexit->u.hlt.intr_status & APIC_TPR_INT; lapic = vlapic->apic_page; ppr = lapic->ppr & APIC_TPR_INT; if (rvi > ppr) { return (1); } return (0); } /* * If there is an interrupt pending then it will be recognized only * if its priority is greater than the processor priority. * * Special case: if the processor priority is zero then any pending * interrupt will be recognized. */ lapic = vlapic->apic_page; ppr = lapic->ppr & APIC_TPR_INT; if (ppr == 0) return (1); VCPU_CTR1(vlapic->vm, vlapic->vcpuid, "HLT with non-zero PPR %d", lapic->ppr); vpr = 0; for (i = 3; i >= 0; i--) { pirval = pir_desc->pir[i]; if (pirval != 0) { vpr = (i * 64 + flsl(pirval) - 1) & APIC_TPR_INT; break; } } /* * If the highest-priority pending interrupt falls short of the * processor priority of this vCPU, ensure that 'pending_prio' does not * have any stale bits which would preclude a higher-priority interrupt * from incurring a notification later. */ if (vpr <= ppr) { const u_int prio_bit = VPR_PRIO_BIT(vpr); const u_int old = vlapic_vtx->pending_prio; if (old > prio_bit && (old & prio_bit) == 0) { vlapic_vtx->pending_prio = prio_bit; } return (0); } return (1); } static void vmx_intr_accepted(struct vlapic *vlapic, int vector) { panic("vmx_intr_accepted: not expected to be called"); } static void vmx_set_tmr(struct vlapic *vlapic, int vector, bool level) { struct vlapic_vtx *vlapic_vtx; struct vmx *vmx; struct vmcs *vmcs; uint64_t mask, val; KASSERT(vector >= 0 && vector <= 255, ("invalid vector %d", vector)); KASSERT(!vcpu_is_running(vlapic->vm, vlapic->vcpuid, NULL), ("vmx_set_tmr: vcpu cannot be running")); vlapic_vtx = (struct vlapic_vtx *)vlapic; vmx = vlapic_vtx->vmx; vmcs = &vmx->vmcs[vlapic->vcpuid]; mask = 1UL << (vector % 64); VMPTRLD(vmcs); val = vmcs_read(VMCS_EOI_EXIT(vector)); if (level) val |= mask; else val &= ~mask; vmcs_write(VMCS_EOI_EXIT(vector), val); VMCLEAR(vmcs); } static void vmx_enable_x2apic_mode(struct vlapic *vlapic) { struct vmx *vmx; struct vmcs *vmcs; uint32_t proc_ctls2; int vcpuid, error; vcpuid = vlapic->vcpuid; vmx = ((struct vlapic_vtx *)vlapic)->vmx; vmcs = &vmx->vmcs[vcpuid]; proc_ctls2 = vmx->cap[vcpuid].proc_ctls2; KASSERT((proc_ctls2 & PROCBASED2_VIRTUALIZE_APIC_ACCESSES) != 0, ("%s: invalid proc_ctls2 %#x", __func__, proc_ctls2)); proc_ctls2 &= ~PROCBASED2_VIRTUALIZE_APIC_ACCESSES; proc_ctls2 |= PROCBASED2_VIRTUALIZE_X2APIC_MODE; vmx->cap[vcpuid].proc_ctls2 = proc_ctls2; VMPTRLD(vmcs); vmcs_write(VMCS_SEC_PROC_BASED_CTLS, proc_ctls2); VMCLEAR(vmcs); if (vlapic->vcpuid == 0) { /* * The nested page table mappings are shared by all vcpus * so unmap the APIC access page just once. */ error = vm_unmap_mmio(vmx->vm, DEFAULT_APIC_BASE, PAGE_SIZE); KASSERT(error == 0, ("%s: vm_unmap_mmio error %d", __func__, error)); /* * The MSR bitmap is shared by all vcpus so modify it only * once in the context of vcpu 0. */ error = vmx_allow_x2apic_msrs(vmx); KASSERT(error == 0, ("%s: vmx_allow_x2apic_msrs error %d", __func__, error)); } } static void vmx_post_intr(struct vlapic *vlapic, int hostcpu) { ipi_cpu(hostcpu, pirvec); } /* * Transfer the pending interrupts in the PIR descriptor to the IRR * in the virtual APIC page. 
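* Each 64-bit pir[] word is atomically read-and-cleared and OR'ed into the pair of 32-bit IRR words covering the same vector range; pir[1], for instance, holds vectors 64-127 and lands in irr2/irr3.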
*/ static void vmx_inject_pir(struct vlapic *vlapic) { struct vlapic_vtx *vlapic_vtx; struct pir_desc *pir_desc; struct LAPIC *lapic; uint64_t val, pirval; int rvi, pirbase = -1; uint16_t intr_status_old, intr_status_new; vlapic_vtx = (struct vlapic_vtx *)vlapic; pir_desc = vlapic_vtx->pir_desc; if (atomic_cmpset_long(&pir_desc->pending, 1, 0) == 0) { VCPU_CTR0(vlapic->vm, vlapic->vcpuid, "vmx_inject_pir: " "no posted interrupt pending"); return; } pirval = 0; pirbase = -1; lapic = vlapic->apic_page; val = atomic_readandclear_long(&pir_desc->pir[0]); if (val != 0) { lapic->irr0 |= val; lapic->irr1 |= val >> 32; pirbase = 0; pirval = val; } val = atomic_readandclear_long(&pir_desc->pir[1]); if (val != 0) { lapic->irr2 |= val; lapic->irr3 |= val >> 32; pirbase = 64; pirval = val; } val = atomic_readandclear_long(&pir_desc->pir[2]); if (val != 0) { lapic->irr4 |= val; lapic->irr5 |= val >> 32; pirbase = 128; pirval = val; } val = atomic_readandclear_long(&pir_desc->pir[3]); if (val != 0) { lapic->irr6 |= val; lapic->irr7 |= val >> 32; pirbase = 192; pirval = val; } VLAPIC_CTR_IRR(vlapic, "vmx_inject_pir"); /* * Update RVI so the processor can evaluate pending virtual * interrupts on VM-entry. * * It is possible for pirval to be 0 here, even though the * pending bit has been set. The scenario is: * CPU-Y is sending a posted interrupt to CPU-X, which * is running a guest and processing posted interrupts in h/w. * CPU-X will eventually exit and the state seen in s/w is * the pending bit set, but no PIR bits set. * * CPU-X CPU-Y * (vm running) (host running) * rx posted interrupt * CLEAR pending bit * SET PIR bit * READ/CLEAR PIR bits * SET pending bit * (vm exit) * pending bit set, PIR 0 */ if (pirval != 0) { rvi = pirbase + flsl(pirval) - 1; intr_status_old = vmcs_read(VMCS_GUEST_INTR_STATUS); intr_status_new = (intr_status_old & 0xFF00) | rvi; if (intr_status_new > intr_status_old) { vmcs_write(VMCS_GUEST_INTR_STATUS, intr_status_new); VCPU_CTR2(vlapic->vm, vlapic->vcpuid, "vmx_inject_pir: " "guest_intr_status changed from 0x%04x to 0x%04x", intr_status_old, intr_status_new); } } } static struct vlapic * vmx_vlapic_init(void *arg, int vcpuid) { struct vmx *vmx; struct vlapic *vlapic; struct vlapic_vtx *vlapic_vtx; vmx = arg; vlapic = malloc(sizeof(struct vlapic_vtx), M_VLAPIC, M_WAITOK | M_ZERO); vlapic->vm = vmx->vm; vlapic->vcpuid = vcpuid; vlapic->apic_page = (struct LAPIC *)&vmx->apic_page[vcpuid]; vlapic_vtx = (struct vlapic_vtx *)vlapic; vlapic_vtx->pir_desc = &vmx->pir_desc[vcpuid]; vlapic_vtx->vmx = vmx; if (virtual_interrupt_delivery) { vlapic->ops.set_intr_ready = vmx_set_intr_ready; vlapic->ops.pending_intr = vmx_pending_intr; vlapic->ops.intr_accepted = vmx_intr_accepted; vlapic->ops.set_tmr = vmx_set_tmr; vlapic->ops.enable_x2apic_mode = vmx_enable_x2apic_mode; } if (posted_interrupts) vlapic->ops.post_intr = vmx_post_intr; vlapic_init(vlapic); return (vlapic); } static void vmx_vlapic_cleanup(void *arg, struct vlapic *vlapic) { vlapic_cleanup(vlapic); free(vlapic, M_VLAPIC); } struct vmm_ops vmm_ops_intel = { vmx_init, vmx_cleanup, vmx_restore, vmx_vminit, vmx_run, vmx_vmcleanup, vmx_getreg, vmx_setreg, vmx_getdesc, vmx_setdesc, vmx_getcap, vmx_setcap, ept_vmspace_alloc, ept_vmspace_free, vmx_vlapic_init, vmx_vlapic_cleanup, }; Index: user/ngie/bug-237403/sys/amd64/vmm/io/vlapic.c =================================================================== --- user/ngie/bug-237403/sys/amd64/vmm/io/vlapic.c (revision 346925) +++ user/ngie/bug-237403/sys/amd64/vmm/io/vlapic.c (revision 
346926) @@ -1,1656 +1,1659 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2011 NetApp, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include "vmm_lapic.h" #include "vmm_ktr.h" #include "vmm_stat.h" #include "vlapic.h" #include "vlapic_priv.h" #include "vioapic.h" #define PRIO(x) ((x) >> 4) #define VLAPIC_VERSION (16) #define x2apic(vlapic) (((vlapic)->msr_apicbase & APICBASE_X2APIC) ? 1 : 0) /* * The 'vlapic->timer_mtx' is used to provide mutual exclusion between the * vlapic_callout_handler() and vcpu accesses to: * - timer_freq_bt, timer_period_bt, timer_fire_bt * - timer LVT register */ #define VLAPIC_TIMER_LOCK(vlapic) mtx_lock_spin(&((vlapic)->timer_mtx)) #define VLAPIC_TIMER_UNLOCK(vlapic) mtx_unlock_spin(&((vlapic)->timer_mtx)) #define VLAPIC_TIMER_LOCKED(vlapic) mtx_owned(&((vlapic)->timer_mtx)) /* * APIC timer frequency: * - arbitrary but chosen to be in the ballpark of contemporary hardware. * - power-of-two to avoid loss of precision when converted to a bintime. 
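*   (a bintime fraction counts units of 2^-64 seconds, so a 2^27 Hz frequency converts to an exact per-tick fraction of 2^37, whereas a non-power-of-two choice such as 100 MHz would be truncated and the error would grow when multiplied by large ICR counts)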
*/ #define VLAPIC_BUS_FREQ (128 * 1024 * 1024) static __inline uint32_t vlapic_get_id(struct vlapic *vlapic) { if (x2apic(vlapic)) return (vlapic->vcpuid); else return (vlapic->vcpuid << 24); } static uint32_t x2apic_ldr(struct vlapic *vlapic) { int apicid; uint32_t ldr; apicid = vlapic_get_id(vlapic); ldr = 1 << (apicid & 0xf); ldr |= (apicid & 0xffff0) << 12; return (ldr); } void vlapic_dfr_write_handler(struct vlapic *vlapic) { struct LAPIC *lapic; lapic = vlapic->apic_page; if (x2apic(vlapic)) { VM_CTR1(vlapic->vm, "ignoring write to DFR in x2apic mode: %#x", lapic->dfr); lapic->dfr = 0; return; } lapic->dfr &= APIC_DFR_MODEL_MASK; lapic->dfr |= APIC_DFR_RESERVED; if ((lapic->dfr & APIC_DFR_MODEL_MASK) == APIC_DFR_MODEL_FLAT) VLAPIC_CTR0(vlapic, "vlapic DFR in Flat Model"); else if ((lapic->dfr & APIC_DFR_MODEL_MASK) == APIC_DFR_MODEL_CLUSTER) VLAPIC_CTR0(vlapic, "vlapic DFR in Cluster Model"); else VLAPIC_CTR1(vlapic, "DFR in Unknown Model %#x", lapic->dfr); } void vlapic_ldr_write_handler(struct vlapic *vlapic) { struct LAPIC *lapic; lapic = vlapic->apic_page; /* LDR is read-only in x2apic mode */ if (x2apic(vlapic)) { VLAPIC_CTR1(vlapic, "ignoring write to LDR in x2apic mode: %#x", lapic->ldr); lapic->ldr = x2apic_ldr(vlapic); } else { lapic->ldr &= ~APIC_LDR_RESERVED; VLAPIC_CTR1(vlapic, "vlapic LDR set to %#x", lapic->ldr); } } void vlapic_id_write_handler(struct vlapic *vlapic) { struct LAPIC *lapic; /* * We don't allow the ID register to be modified so reset it back to * its default value. */ lapic = vlapic->apic_page; lapic->id = vlapic_get_id(vlapic); } static int vlapic_timer_divisor(uint32_t dcr) { switch (dcr & 0xB) { case APIC_TDCR_1: return (1); case APIC_TDCR_2: return (2); case APIC_TDCR_4: return (4); case APIC_TDCR_8: return (8); case APIC_TDCR_16: return (16); case APIC_TDCR_32: return (32); case APIC_TDCR_64: return (64); case APIC_TDCR_128: return (128); default: panic("vlapic_timer_divisor: invalid dcr 0x%08x", dcr); } } #if 0 static inline void vlapic_dump_lvt(uint32_t offset, uint32_t *lvt) { printf("Offset %x: lvt %08x (V:%02x DS:%x M:%x)\n", offset, *lvt, *lvt & APIC_LVTT_VECTOR, *lvt & APIC_LVTT_DS, *lvt & APIC_LVTT_M); } #endif static uint32_t vlapic_get_ccr(struct vlapic *vlapic) { struct bintime bt_now, bt_rem; struct LAPIC *lapic; uint32_t ccr; ccr = 0; lapic = vlapic->apic_page; VLAPIC_TIMER_LOCK(vlapic); if (callout_active(&vlapic->callout)) { /* * If the timer is scheduled to expire in the future then * compute the value of 'ccr' based on the remaining time. */ binuptime(&bt_now); if (bintime_cmp(&vlapic->timer_fire_bt, &bt_now, >)) { bt_rem = vlapic->timer_fire_bt; bintime_sub(&bt_rem, &bt_now); ccr += bt_rem.sec * BT2FREQ(&vlapic->timer_freq_bt); ccr += bt_rem.frac / vlapic->timer_freq_bt.frac; } } KASSERT(ccr <= lapic->icr_timer, ("vlapic_get_ccr: invalid ccr %#x, " "icr_timer is %#x", ccr, lapic->icr_timer)); VLAPIC_CTR2(vlapic, "vlapic ccr_timer = %#x, icr_timer = %#x", ccr, lapic->icr_timer); VLAPIC_TIMER_UNLOCK(vlapic); return (ccr); } void vlapic_dcr_write_handler(struct vlapic *vlapic) { struct LAPIC *lapic; int divisor; lapic = vlapic->apic_page; VLAPIC_TIMER_LOCK(vlapic); divisor = vlapic_timer_divisor(lapic->dcr_timer); VLAPIC_CTR2(vlapic, "vlapic dcr_timer=%#x, divisor=%d", lapic->dcr_timer, divisor); /* * Update the timer frequency and the timer period. * * XXX changes to the frequency divider will not take effect until * the timer is reloaded. 
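* For example, with the divider set to 8 the effective frequency is 2^27 / 8 = 16,777,216 Hz, so an ICR value of 1,000,000 corresponds to a period of roughly 59.6 ms.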
*/ FREQ2BT(VLAPIC_BUS_FREQ / divisor, &vlapic->timer_freq_bt); vlapic->timer_period_bt = vlapic->timer_freq_bt; bintime_mul(&vlapic->timer_period_bt, lapic->icr_timer); VLAPIC_TIMER_UNLOCK(vlapic); } void vlapic_esr_write_handler(struct vlapic *vlapic) { struct LAPIC *lapic; lapic = vlapic->apic_page; lapic->esr = vlapic->esr_pending; vlapic->esr_pending = 0; } int vlapic_set_intr_ready(struct vlapic *vlapic, int vector, bool level) { struct LAPIC *lapic; uint32_t *irrptr, *tmrptr, mask; int idx; KASSERT(vector >= 0 && vector < 256, ("invalid vector %d", vector)); lapic = vlapic->apic_page; if (!(lapic->svr & APIC_SVR_ENABLE)) { VLAPIC_CTR1(vlapic, "vlapic is software disabled, ignoring " "interrupt %d", vector); return (0); } if (vector < 16) { vlapic_set_error(vlapic, APIC_ESR_RECEIVE_ILLEGAL_VECTOR); VLAPIC_CTR1(vlapic, "vlapic ignoring interrupt to vector %d", vector); return (1); } if (vlapic->ops.set_intr_ready) return ((*vlapic->ops.set_intr_ready)(vlapic, vector, level)); idx = (vector / 32) * 4; mask = 1 << (vector % 32); irrptr = &lapic->irr0; atomic_set_int(&irrptr[idx], mask); /* * Verify that the trigger-mode of the interrupt matches with * the vlapic TMR registers. */ tmrptr = &lapic->tmr0; if ((tmrptr[idx] & mask) != (level ? mask : 0)) { VLAPIC_CTR3(vlapic, "vlapic TMR[%d] is 0x%08x but " "interrupt is %s-triggered", idx / 4, tmrptr[idx], level ? "level" : "edge"); } VLAPIC_CTR_IRR(vlapic, "vlapic_set_intr_ready"); return (1); } static __inline uint32_t * vlapic_get_lvtptr(struct vlapic *vlapic, uint32_t offset) { struct LAPIC *lapic = vlapic->apic_page; int i; switch (offset) { case APIC_OFFSET_CMCI_LVT: return (&lapic->lvt_cmci); case APIC_OFFSET_TIMER_LVT ... APIC_OFFSET_ERROR_LVT: i = (offset - APIC_OFFSET_TIMER_LVT) >> 2; return ((&lapic->lvt_timer) + i);; default: panic("vlapic_get_lvt: invalid LVT\n"); } } static __inline int lvt_off_to_idx(uint32_t offset) { int index; switch (offset) { case APIC_OFFSET_CMCI_LVT: index = APIC_LVT_CMCI; break; case APIC_OFFSET_TIMER_LVT: index = APIC_LVT_TIMER; break; case APIC_OFFSET_THERM_LVT: index = APIC_LVT_THERMAL; break; case APIC_OFFSET_PERF_LVT: index = APIC_LVT_PMC; break; case APIC_OFFSET_LINT0_LVT: index = APIC_LVT_LINT0; break; case APIC_OFFSET_LINT1_LVT: index = APIC_LVT_LINT1; break; case APIC_OFFSET_ERROR_LVT: index = APIC_LVT_ERROR; break; default: index = -1; break; } KASSERT(index >= 0 && index <= VLAPIC_MAXLVT_INDEX, ("lvt_off_to_idx: " "invalid lvt index %d for offset %#x", index, offset)); return (index); } static __inline uint32_t vlapic_get_lvt(struct vlapic *vlapic, uint32_t offset) { int idx; uint32_t val; idx = lvt_off_to_idx(offset); val = atomic_load_acq_32(&vlapic->lvt_last[idx]); return (val); } void vlapic_lvt_write_handler(struct vlapic *vlapic, uint32_t offset) { uint32_t *lvtptr, mask, val; struct LAPIC *lapic; int idx; lapic = vlapic->apic_page; lvtptr = vlapic_get_lvtptr(vlapic, offset); val = *lvtptr; idx = lvt_off_to_idx(offset); if (!(lapic->svr & APIC_SVR_ENABLE)) val |= APIC_LVT_M; mask = APIC_LVT_M | APIC_LVT_DS | APIC_LVT_VECTOR; switch (offset) { case APIC_OFFSET_TIMER_LVT: mask |= APIC_LVTT_TM; break; case APIC_OFFSET_ERROR_LVT: break; case APIC_OFFSET_LINT0_LVT: case APIC_OFFSET_LINT1_LVT: mask |= APIC_LVT_TM | APIC_LVT_RIRR | APIC_LVT_IIPP; /* FALLTHROUGH */ default: mask |= APIC_LVT_DM; break; } val &= mask; *lvtptr = val; atomic_store_rel_32(&vlapic->lvt_last[idx], val); } static void vlapic_mask_lvts(struct vlapic *vlapic) { struct LAPIC *lapic = vlapic->apic_page; lapic->lvt_cmci |= 
APIC_LVT_M; vlapic_lvt_write_handler(vlapic, APIC_OFFSET_CMCI_LVT); lapic->lvt_timer |= APIC_LVT_M; vlapic_lvt_write_handler(vlapic, APIC_OFFSET_TIMER_LVT); lapic->lvt_thermal |= APIC_LVT_M; vlapic_lvt_write_handler(vlapic, APIC_OFFSET_THERM_LVT); lapic->lvt_pcint |= APIC_LVT_M; vlapic_lvt_write_handler(vlapic, APIC_OFFSET_PERF_LVT); lapic->lvt_lint0 |= APIC_LVT_M; vlapic_lvt_write_handler(vlapic, APIC_OFFSET_LINT0_LVT); lapic->lvt_lint1 |= APIC_LVT_M; vlapic_lvt_write_handler(vlapic, APIC_OFFSET_LINT1_LVT); lapic->lvt_error |= APIC_LVT_M; vlapic_lvt_write_handler(vlapic, APIC_OFFSET_ERROR_LVT); } static int vlapic_fire_lvt(struct vlapic *vlapic, uint32_t lvt) { uint32_t vec, mode; if (lvt & APIC_LVT_M) return (0); vec = lvt & APIC_LVT_VECTOR; mode = lvt & APIC_LVT_DM; switch (mode) { case APIC_LVT_DM_FIXED: if (vec < 16) { vlapic_set_error(vlapic, APIC_ESR_SEND_ILLEGAL_VECTOR); return (0); } if (vlapic_set_intr_ready(vlapic, vec, false)) vcpu_notify_event(vlapic->vm, vlapic->vcpuid, true); break; case APIC_LVT_DM_NMI: vm_inject_nmi(vlapic->vm, vlapic->vcpuid); break; case APIC_LVT_DM_EXTINT: vm_inject_extint(vlapic->vm, vlapic->vcpuid); break; default: // Other modes ignored return (0); } return (1); } #if 1 static void dump_isrvec_stk(struct vlapic *vlapic) { int i; uint32_t *isrptr; isrptr = &vlapic->apic_page->isr0; for (i = 0; i < 8; i++) printf("ISR%d 0x%08x\n", i, isrptr[i * 4]); for (i = 0; i <= vlapic->isrvec_stk_top; i++) printf("isrvec_stk[%d] = %d\n", i, vlapic->isrvec_stk[i]); } #endif /* * Algorithm adopted from section "Interrupt, Task and Processor Priority" * in Intel Architecture Manual Vol 3a. */ static void vlapic_update_ppr(struct vlapic *vlapic) { int isrvec, tpr, ppr; /* * Note that the value on the stack at index 0 is always 0. * * This is a placeholder for the value of ISRV when none of the * bits is set in the ISRx registers. */ isrvec = vlapic->isrvec_stk[vlapic->isrvec_stk_top]; tpr = vlapic->apic_page->tpr; #if 1 { int i, lastprio, curprio, vector, idx; uint32_t *isrptr; if (vlapic->isrvec_stk_top == 0 && isrvec != 0) panic("isrvec_stk is corrupted: %d", isrvec); /* * Make sure that the priority of the nested interrupts is * always increasing. */ lastprio = -1; for (i = 1; i <= vlapic->isrvec_stk_top; i++) { curprio = PRIO(vlapic->isrvec_stk[i]); if (curprio <= lastprio) { dump_isrvec_stk(vlapic); panic("isrvec_stk does not satisfy invariant"); } lastprio = curprio; } /* * Make sure that each bit set in the ISRx registers has a * corresponding entry on the isrvec stack. 
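* For example, with vectors 0x30 and 0x61 in service the stack must read isrvec_stk[1] = 0x30 and isrvec_stk[2] = 0x61 (index 0 is always the 0 placeholder), so walking the ISR bits from the lowest vector up must match the stack entries one for one.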
*/ i = 1; isrptr = &vlapic->apic_page->isr0; for (vector = 0; vector < 256; vector++) { idx = (vector / 32) * 4; if (isrptr[idx] & (1 << (vector % 32))) { if (i > vlapic->isrvec_stk_top || vlapic->isrvec_stk[i] != vector) { dump_isrvec_stk(vlapic); panic("ISR and isrvec_stk out of sync"); } i++; } } } #endif if (PRIO(tpr) >= PRIO(isrvec)) ppr = tpr; else ppr = isrvec & 0xf0; vlapic->apic_page->ppr = ppr; VLAPIC_CTR1(vlapic, "vlapic_update_ppr 0x%02x", ppr); } static VMM_STAT(VLAPIC_GRATUITOUS_EOI, "EOI without any in-service interrupt"); static void vlapic_process_eoi(struct vlapic *vlapic) { struct LAPIC *lapic = vlapic->apic_page; uint32_t *isrptr, *tmrptr; int i, idx, bitpos, vector; isrptr = &lapic->isr0; tmrptr = &lapic->tmr0; for (i = 7; i >= 0; i--) { idx = i * 4; bitpos = fls(isrptr[idx]); if (bitpos-- != 0) { if (vlapic->isrvec_stk_top <= 0) { panic("invalid vlapic isrvec_stk_top %d", vlapic->isrvec_stk_top); } isrptr[idx] &= ~(1 << bitpos); vector = i * 32 + bitpos; VCPU_CTR1(vlapic->vm, vlapic->vcpuid, "EOI vector %d", vector); VLAPIC_CTR_ISR(vlapic, "vlapic_process_eoi"); vlapic->isrvec_stk_top--; vlapic_update_ppr(vlapic); if ((tmrptr[idx] & (1 << bitpos)) != 0) { vioapic_process_eoi(vlapic->vm, vlapic->vcpuid, vector); } return; } } VCPU_CTR0(vlapic->vm, vlapic->vcpuid, "Gratuitous EOI"); vmm_stat_incr(vlapic->vm, vlapic->vcpuid, VLAPIC_GRATUITOUS_EOI, 1); } static __inline int vlapic_get_lvt_field(uint32_t lvt, uint32_t mask) { return (lvt & mask); } static __inline int vlapic_periodic_timer(struct vlapic *vlapic) { uint32_t lvt; lvt = vlapic_get_lvt(vlapic, APIC_OFFSET_TIMER_LVT); return (vlapic_get_lvt_field(lvt, APIC_LVTT_TM_PERIODIC)); } static VMM_STAT(VLAPIC_INTR_ERROR, "error interrupts generated by vlapic"); void vlapic_set_error(struct vlapic *vlapic, uint32_t mask) { uint32_t lvt; vlapic->esr_pending |= mask; if (vlapic->esr_firing) return; vlapic->esr_firing = 1; // The error LVT always uses the fixed delivery mode. lvt = vlapic_get_lvt(vlapic, APIC_OFFSET_ERROR_LVT); if (vlapic_fire_lvt(vlapic, lvt | APIC_LVT_DM_FIXED)) { vmm_stat_incr(vlapic->vm, vlapic->vcpuid, VLAPIC_INTR_ERROR, 1); } vlapic->esr_firing = 0; } static VMM_STAT(VLAPIC_INTR_TIMER, "timer interrupts generated by vlapic"); static void vlapic_fire_timer(struct vlapic *vlapic) { uint32_t lvt; KASSERT(VLAPIC_TIMER_LOCKED(vlapic), ("vlapic_fire_timer not locked")); // The timer LVT always uses the fixed delivery mode. lvt = vlapic_get_lvt(vlapic, APIC_OFFSET_TIMER_LVT); if (vlapic_fire_lvt(vlapic, lvt | APIC_LVT_DM_FIXED)) { VLAPIC_CTR0(vlapic, "vlapic timer fired"); vmm_stat_incr(vlapic->vm, vlapic->vcpuid, VLAPIC_INTR_TIMER, 1); } } static VMM_STAT(VLAPIC_INTR_CMC, "corrected machine check interrupts generated by vlapic"); void vlapic_fire_cmci(struct vlapic *vlapic) { uint32_t lvt; lvt = vlapic_get_lvt(vlapic, APIC_OFFSET_CMCI_LVT); if (vlapic_fire_lvt(vlapic, lvt)) { vmm_stat_incr(vlapic->vm, vlapic->vcpuid, VLAPIC_INTR_CMC, 1); } } static VMM_STAT_ARRAY(LVTS_TRIGGERRED, VLAPIC_MAXLVT_INDEX + 1, "lvts triggered"); int vlapic_trigger_lvt(struct vlapic *vlapic, int vector) { uint32_t lvt; if (vlapic_enabled(vlapic) == false) { /* * When the local APIC is global/hardware disabled, * LINT[1:0] pins are configured as INTR and NMI pins, * respectively. 
*/ switch (vector) { case APIC_LVT_LINT0: vm_inject_extint(vlapic->vm, vlapic->vcpuid); break; case APIC_LVT_LINT1: vm_inject_nmi(vlapic->vm, vlapic->vcpuid); break; default: break; } return (0); } switch (vector) { case APIC_LVT_LINT0: lvt = vlapic_get_lvt(vlapic, APIC_OFFSET_LINT0_LVT); break; case APIC_LVT_LINT1: lvt = vlapic_get_lvt(vlapic, APIC_OFFSET_LINT1_LVT); break; case APIC_LVT_TIMER: lvt = vlapic_get_lvt(vlapic, APIC_OFFSET_TIMER_LVT); lvt |= APIC_LVT_DM_FIXED; break; case APIC_LVT_ERROR: lvt = vlapic_get_lvt(vlapic, APIC_OFFSET_ERROR_LVT); lvt |= APIC_LVT_DM_FIXED; break; case APIC_LVT_PMC: lvt = vlapic_get_lvt(vlapic, APIC_OFFSET_PERF_LVT); break; case APIC_LVT_THERMAL: lvt = vlapic_get_lvt(vlapic, APIC_OFFSET_THERM_LVT); break; case APIC_LVT_CMCI: lvt = vlapic_get_lvt(vlapic, APIC_OFFSET_CMCI_LVT); break; default: return (EINVAL); } if (vlapic_fire_lvt(vlapic, lvt)) { vmm_stat_array_incr(vlapic->vm, vlapic->vcpuid, LVTS_TRIGGERRED, vector, 1); } return (0); } static void vlapic_callout_handler(void *arg) { struct vlapic *vlapic; struct bintime bt, btnow; sbintime_t rem_sbt; vlapic = arg; VLAPIC_TIMER_LOCK(vlapic); if (callout_pending(&vlapic->callout)) /* callout was reset */ goto done; if (!callout_active(&vlapic->callout)) /* callout was stopped */ goto done; callout_deactivate(&vlapic->callout); vlapic_fire_timer(vlapic); if (vlapic_periodic_timer(vlapic)) { binuptime(&btnow); KASSERT(bintime_cmp(&btnow, &vlapic->timer_fire_bt, >=), ("vlapic callout at %#lx.%#lx, expected at %#lx.#%lx", btnow.sec, btnow.frac, vlapic->timer_fire_bt.sec, vlapic->timer_fire_bt.frac)); /* * Compute the delta between when the timer was supposed to * fire and the present time. */ bt = btnow; bintime_sub(&bt, &vlapic->timer_fire_bt); rem_sbt = bttosbt(vlapic->timer_period_bt); if (bintime_cmp(&bt, &vlapic->timer_period_bt, <)) { /* * Adjust the time until the next countdown downward * to account for the lost time. */ rem_sbt -= bttosbt(bt); } else { /* * If the delta is greater than the timer period then * just reset our time base instead of trying to catch * up. */ vlapic->timer_fire_bt = btnow; VLAPIC_CTR2(vlapic, "vlapic timer lagging by %lu " "usecs, period is %lu usecs - resetting time base", bttosbt(bt) / SBT_1US, bttosbt(vlapic->timer_period_bt) / SBT_1US); } bintime_add(&vlapic->timer_fire_bt, &vlapic->timer_period_bt); callout_reset_sbt(&vlapic->callout, rem_sbt, 0, vlapic_callout_handler, vlapic, 0); } done: VLAPIC_TIMER_UNLOCK(vlapic); } void vlapic_icrtmr_write_handler(struct vlapic *vlapic) { struct LAPIC *lapic; sbintime_t sbt; uint32_t icr_timer; VLAPIC_TIMER_LOCK(vlapic); lapic = vlapic->apic_page; icr_timer = lapic->icr_timer; vlapic->timer_period_bt = vlapic->timer_freq_bt; bintime_mul(&vlapic->timer_period_bt, icr_timer); if (icr_timer != 0) { binuptime(&vlapic->timer_fire_bt); bintime_add(&vlapic->timer_fire_bt, &vlapic->timer_period_bt); sbt = bttosbt(vlapic->timer_period_bt); callout_reset_sbt(&vlapic->callout, sbt, 0, vlapic_callout_handler, vlapic, 0); } else callout_stop(&vlapic->callout); VLAPIC_TIMER_UNLOCK(vlapic); } /* * This function populates 'dmask' with the set of vcpus that match the * addressing specified by the (dest, phys, lowprio) tuple. * * 'x2apic_dest' specifies whether 'dest' is interpreted as x2APIC (32-bit) * or xAPIC (8-bit) destination field. 
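* For example, in the xAPIC cluster model a logical 'dest' of 0x35 decodes to cluster 3 with member mask 0x5 and matches any vcpu whose LDR names cluster 3 and has one of bits 0x5 set; the same destination in x2APIC form would be encoded as 0x00030005.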
*/ static void vlapic_calcdest(struct vm *vm, cpuset_t *dmask, uint32_t dest, bool phys, bool lowprio, bool x2apic_dest) { struct vlapic *vlapic; uint32_t dfr, ldr, ldest, cluster; uint32_t mda_flat_ldest, mda_cluster_ldest, mda_ldest, mda_cluster_id; cpuset_t amask; int vcpuid; if ((x2apic_dest && dest == 0xffffffff) || (!x2apic_dest && dest == 0xff)) { /* * Broadcast in both logical and physical modes. */ *dmask = vm_active_cpus(vm); return; } if (phys) { /* * Physical mode: destination is APIC ID. */ CPU_ZERO(dmask); vcpuid = vm_apicid2vcpuid(vm, dest); - if (vcpuid < VM_MAXCPU) + if (vcpuid < vm_get_maxcpus(vm)) CPU_SET(vcpuid, dmask); } else { /* * In the "Flat Model" the MDA is interpreted as an 8-bit wide * bitmask. This model is only available in the xAPIC mode. */ mda_flat_ldest = dest & 0xff; /* * In the "Cluster Model" the MDA is used to identify a * specific cluster and a set of APICs in that cluster. */ if (x2apic_dest) { mda_cluster_id = dest >> 16; mda_cluster_ldest = dest & 0xffff; } else { mda_cluster_id = (dest >> 4) & 0xf; mda_cluster_ldest = dest & 0xf; } /* * Logical mode: match each APIC that has a bit set * in its LDR that matches a bit in the ldest. */ CPU_ZERO(dmask); amask = vm_active_cpus(vm); while ((vcpuid = CPU_FFS(&amask)) != 0) { vcpuid--; CPU_CLR(vcpuid, &amask); vlapic = vm_lapic(vm, vcpuid); dfr = vlapic->apic_page->dfr; ldr = vlapic->apic_page->ldr; if ((dfr & APIC_DFR_MODEL_MASK) == APIC_DFR_MODEL_FLAT) { ldest = ldr >> 24; mda_ldest = mda_flat_ldest; } else if ((dfr & APIC_DFR_MODEL_MASK) == APIC_DFR_MODEL_CLUSTER) { if (x2apic(vlapic)) { cluster = ldr >> 16; ldest = ldr & 0xffff; } else { cluster = ldr >> 28; ldest = (ldr >> 24) & 0xf; } if (cluster != mda_cluster_id) continue; mda_ldest = mda_cluster_ldest; } else { /* * Guest has configured a bad logical * model for this vcpu - skip it. 
*/ VLAPIC_CTR1(vlapic, "vlapic has bad logical " "model %x - cannot deliver interrupt", dfr); continue; } if ((mda_ldest & ldest) != 0) { CPU_SET(vcpuid, dmask); if (lowprio) break; } } } } static VMM_STAT_ARRAY(IPIS_SENT, VM_MAXCPU, "ipis sent to vcpu"); static void vlapic_set_tpr(struct vlapic *vlapic, uint8_t val) { struct LAPIC *lapic = vlapic->apic_page; if (lapic->tpr != val) { VCPU_CTR2(vlapic->vm, vlapic->vcpuid, "vlapic TPR changed " "from %#x to %#x", lapic->tpr, val); lapic->tpr = val; vlapic_update_ppr(vlapic); } } static uint8_t vlapic_get_tpr(struct vlapic *vlapic) { struct LAPIC *lapic = vlapic->apic_page; return (lapic->tpr); } void vlapic_set_cr8(struct vlapic *vlapic, uint64_t val) { uint8_t tpr; if (val & ~0xf) { vm_inject_gp(vlapic->vm, vlapic->vcpuid); return; } tpr = val << 4; vlapic_set_tpr(vlapic, tpr); } uint64_t vlapic_get_cr8(struct vlapic *vlapic) { uint8_t tpr; tpr = vlapic_get_tpr(vlapic); return (tpr >> 4); } int vlapic_icrlo_write_handler(struct vlapic *vlapic, bool *retu) { int i; bool phys; cpuset_t dmask; uint64_t icrval; uint32_t dest, vec, mode; struct vlapic *vlapic2; struct vm_exit *vmexit; struct LAPIC *lapic; + uint16_t maxcpus; lapic = vlapic->apic_page; lapic->icr_lo &= ~APIC_DELSTAT_PEND; icrval = ((uint64_t)lapic->icr_hi << 32) | lapic->icr_lo; if (x2apic(vlapic)) dest = icrval >> 32; else dest = icrval >> (32 + 24); vec = icrval & APIC_VECTOR_MASK; mode = icrval & APIC_DELMODE_MASK; if (mode == APIC_DELMODE_FIXED && vec < 16) { vlapic_set_error(vlapic, APIC_ESR_SEND_ILLEGAL_VECTOR); VLAPIC_CTR1(vlapic, "Ignoring invalid IPI %d", vec); return (0); } VLAPIC_CTR2(vlapic, "icrlo 0x%016lx triggered ipi %d", icrval, vec); if (mode == APIC_DELMODE_FIXED || mode == APIC_DELMODE_NMI) { switch (icrval & APIC_DEST_MASK) { case APIC_DEST_DESTFLD: phys = ((icrval & APIC_DESTMODE_LOG) == 0); vlapic_calcdest(vlapic->vm, &dmask, dest, phys, false, x2apic(vlapic)); break; case APIC_DEST_SELF: CPU_SETOF(vlapic->vcpuid, &dmask); break; case APIC_DEST_ALLISELF: dmask = vm_active_cpus(vlapic->vm); break; case APIC_DEST_ALLESELF: dmask = vm_active_cpus(vlapic->vm); CPU_CLR(vlapic->vcpuid, &dmask); break; default: CPU_ZERO(&dmask); /* satisfy gcc */ break; } while ((i = CPU_FFS(&dmask)) != 0) { i--; CPU_CLR(i, &dmask); if (mode == APIC_DELMODE_FIXED) { lapic_intr_edge(vlapic->vm, i, vec); vmm_stat_array_incr(vlapic->vm, vlapic->vcpuid, IPIS_SENT, i, 1); VLAPIC_CTR2(vlapic, "vlapic sending ipi %d " "to vcpuid %d", vec, i); } else { vm_inject_nmi(vlapic->vm, i); VLAPIC_CTR1(vlapic, "vlapic sending ipi nmi " "to vcpuid %d", i); } } return (0); /* handled completely in the kernel */ } + maxcpus = vm_get_maxcpus(vlapic->vm); if (mode == APIC_DELMODE_INIT) { if ((icrval & APIC_LEVEL_MASK) == APIC_LEVEL_DEASSERT) return (0); - if (vlapic->vcpuid == 0 && dest != 0 && dest < VM_MAXCPU) { + if (vlapic->vcpuid == 0 && dest != 0 && dest < maxcpus) { vlapic2 = vm_lapic(vlapic->vm, dest); /* move from INIT to waiting-for-SIPI state */ if (vlapic2->boot_state == BS_INIT) { vlapic2->boot_state = BS_SIPI; } return (0); } } if (mode == APIC_DELMODE_STARTUP) { - if (vlapic->vcpuid == 0 && dest != 0 && dest < VM_MAXCPU) { + if (vlapic->vcpuid == 0 && dest != 0 && dest < maxcpus) { vlapic2 = vm_lapic(vlapic->vm, dest); /* * Ignore SIPIs in any state other than wait-for-SIPI */ if (vlapic2->boot_state != BS_SIPI) return (0); vlapic2->boot_state = BS_RUNNING; *retu = true; vmexit = vm_exitinfo(vlapic->vm, vlapic->vcpuid); vmexit->exitcode = VM_EXITCODE_SPINUP_AP; vmexit->u.spinup_ap.vcpu = 
dest; vmexit->u.spinup_ap.rip = vec << PAGE_SHIFT; return (0); } } /* * This will cause a return to userland. */ return (1); } void vlapic_self_ipi_handler(struct vlapic *vlapic, uint64_t val) { int vec; KASSERT(x2apic(vlapic), ("SELF_IPI does not exist in xAPIC mode")); vec = val & 0xff; lapic_intr_edge(vlapic->vm, vlapic->vcpuid, vec); vmm_stat_array_incr(vlapic->vm, vlapic->vcpuid, IPIS_SENT, vlapic->vcpuid, 1); VLAPIC_CTR1(vlapic, "vlapic self-ipi %d", vec); } int vlapic_pending_intr(struct vlapic *vlapic, int *vecptr) { struct LAPIC *lapic = vlapic->apic_page; int idx, i, bitpos, vector; uint32_t *irrptr, val; if (vlapic->ops.pending_intr) return ((*vlapic->ops.pending_intr)(vlapic, vecptr)); irrptr = &lapic->irr0; for (i = 7; i >= 0; i--) { idx = i * 4; val = atomic_load_acq_int(&irrptr[idx]); bitpos = fls(val); if (bitpos != 0) { vector = i * 32 + (bitpos - 1); if (PRIO(vector) > PRIO(lapic->ppr)) { VLAPIC_CTR1(vlapic, "pending intr %d", vector); if (vecptr != NULL) *vecptr = vector; return (1); } else break; } } return (0); } void vlapic_intr_accepted(struct vlapic *vlapic, int vector) { struct LAPIC *lapic = vlapic->apic_page; uint32_t *irrptr, *isrptr; int idx, stk_top; if (vlapic->ops.intr_accepted) return ((*vlapic->ops.intr_accepted)(vlapic, vector)); /* * clear the ready bit for vector being accepted in irr * and set the vector as in service in isr. */ idx = (vector / 32) * 4; irrptr = &lapic->irr0; atomic_clear_int(&irrptr[idx], 1 << (vector % 32)); VLAPIC_CTR_IRR(vlapic, "vlapic_intr_accepted"); isrptr = &lapic->isr0; isrptr[idx] |= 1 << (vector % 32); VLAPIC_CTR_ISR(vlapic, "vlapic_intr_accepted"); /* * Update the PPR */ vlapic->isrvec_stk_top++; stk_top = vlapic->isrvec_stk_top; if (stk_top >= ISRVEC_STK_SIZE) panic("isrvec_stk_top overflow %d", stk_top); vlapic->isrvec_stk[stk_top] = vector; vlapic_update_ppr(vlapic); } void vlapic_svr_write_handler(struct vlapic *vlapic) { struct LAPIC *lapic; uint32_t old, new, changed; lapic = vlapic->apic_page; new = lapic->svr; old = vlapic->svr_last; vlapic->svr_last = new; changed = old ^ new; if ((changed & APIC_SVR_ENABLE) != 0) { if ((new & APIC_SVR_ENABLE) == 0) { /* * The apic is now disabled so stop the apic timer * and mask all the LVT entries. */ VLAPIC_CTR0(vlapic, "vlapic is software-disabled"); VLAPIC_TIMER_LOCK(vlapic); callout_stop(&vlapic->callout); VLAPIC_TIMER_UNLOCK(vlapic); vlapic_mask_lvts(vlapic); } else { /* * The apic is now enabled so restart the apic timer * if it is configured in periodic mode. 
*/ VLAPIC_CTR0(vlapic, "vlapic is software-enabled"); if (vlapic_periodic_timer(vlapic)) vlapic_icrtmr_write_handler(vlapic); } } } int vlapic_read(struct vlapic *vlapic, int mmio_access, uint64_t offset, uint64_t *data, bool *retu) { struct LAPIC *lapic = vlapic->apic_page; uint32_t *reg; int i; /* Ignore MMIO accesses in x2APIC mode */ if (x2apic(vlapic) && mmio_access) { VLAPIC_CTR1(vlapic, "MMIO read from offset %#lx in x2APIC mode", offset); *data = 0; goto done; } if (!x2apic(vlapic) && !mmio_access) { /* * XXX Generate GP fault for MSR accesses in xAPIC mode */ VLAPIC_CTR1(vlapic, "x2APIC MSR read from offset %#lx in " "xAPIC mode", offset); *data = 0; goto done; } if (offset > sizeof(*lapic)) { *data = 0; goto done; } offset &= ~3; switch(offset) { case APIC_OFFSET_ID: *data = lapic->id; break; case APIC_OFFSET_VER: *data = lapic->version; break; case APIC_OFFSET_TPR: *data = vlapic_get_tpr(vlapic); break; case APIC_OFFSET_APR: *data = lapic->apr; break; case APIC_OFFSET_PPR: *data = lapic->ppr; break; case APIC_OFFSET_EOI: *data = lapic->eoi; break; case APIC_OFFSET_LDR: *data = lapic->ldr; break; case APIC_OFFSET_DFR: *data = lapic->dfr; break; case APIC_OFFSET_SVR: *data = lapic->svr; break; case APIC_OFFSET_ISR0 ... APIC_OFFSET_ISR7: i = (offset - APIC_OFFSET_ISR0) >> 2; reg = &lapic->isr0; *data = *(reg + i); break; case APIC_OFFSET_TMR0 ... APIC_OFFSET_TMR7: i = (offset - APIC_OFFSET_TMR0) >> 2; reg = &lapic->tmr0; *data = *(reg + i); break; case APIC_OFFSET_IRR0 ... APIC_OFFSET_IRR7: i = (offset - APIC_OFFSET_IRR0) >> 2; reg = &lapic->irr0; *data = atomic_load_acq_int(reg + i); break; case APIC_OFFSET_ESR: *data = lapic->esr; break; case APIC_OFFSET_ICR_LOW: *data = lapic->icr_lo; if (x2apic(vlapic)) *data |= (uint64_t)lapic->icr_hi << 32; break; case APIC_OFFSET_ICR_HI: *data = lapic->icr_hi; break; case APIC_OFFSET_CMCI_LVT: case APIC_OFFSET_TIMER_LVT ... 
APIC_OFFSET_ERROR_LVT: *data = vlapic_get_lvt(vlapic, offset); #ifdef INVARIANTS reg = vlapic_get_lvtptr(vlapic, offset); KASSERT(*data == *reg, ("inconsistent lvt value at " "offset %#lx: %#lx/%#x", offset, *data, *reg)); #endif break; case APIC_OFFSET_TIMER_ICR: *data = lapic->icr_timer; break; case APIC_OFFSET_TIMER_CCR: *data = vlapic_get_ccr(vlapic); break; case APIC_OFFSET_TIMER_DCR: *data = lapic->dcr_timer; break; case APIC_OFFSET_SELF_IPI: /* * XXX generate a GP fault if vlapic is in x2apic mode */ *data = 0; break; case APIC_OFFSET_RRR: default: *data = 0; break; } done: VLAPIC_CTR2(vlapic, "vlapic read offset %#x, data %#lx", offset, *data); return 0; } int vlapic_write(struct vlapic *vlapic, int mmio_access, uint64_t offset, uint64_t data, bool *retu) { struct LAPIC *lapic = vlapic->apic_page; uint32_t *regptr; int retval; KASSERT((offset & 0xf) == 0 && offset < PAGE_SIZE, ("vlapic_write: invalid offset %#lx", offset)); VLAPIC_CTR2(vlapic, "vlapic write offset %#lx, data %#lx", offset, data); if (offset > sizeof(*lapic)) return (0); /* Ignore MMIO accesses in x2APIC mode */ if (x2apic(vlapic) && mmio_access) { VLAPIC_CTR2(vlapic, "MMIO write of %#lx to offset %#lx " "in x2APIC mode", data, offset); return (0); } /* * XXX Generate GP fault for MSR accesses in xAPIC mode */ if (!x2apic(vlapic) && !mmio_access) { VLAPIC_CTR2(vlapic, "x2APIC MSR write of %#lx to offset %#lx " "in xAPIC mode", data, offset); return (0); } retval = 0; switch(offset) { case APIC_OFFSET_ID: lapic->id = data; vlapic_id_write_handler(vlapic); break; case APIC_OFFSET_TPR: vlapic_set_tpr(vlapic, data & 0xff); break; case APIC_OFFSET_EOI: vlapic_process_eoi(vlapic); break; case APIC_OFFSET_LDR: lapic->ldr = data; vlapic_ldr_write_handler(vlapic); break; case APIC_OFFSET_DFR: lapic->dfr = data; vlapic_dfr_write_handler(vlapic); break; case APIC_OFFSET_SVR: lapic->svr = data; vlapic_svr_write_handler(vlapic); break; case APIC_OFFSET_ICR_LOW: lapic->icr_lo = data; if (x2apic(vlapic)) lapic->icr_hi = data >> 32; retval = vlapic_icrlo_write_handler(vlapic, retu); break; case APIC_OFFSET_ICR_HI: lapic->icr_hi = data; break; case APIC_OFFSET_CMCI_LVT: case APIC_OFFSET_TIMER_LVT ... APIC_OFFSET_ERROR_LVT: regptr = vlapic_get_lvtptr(vlapic, offset); *regptr = data; vlapic_lvt_write_handler(vlapic, offset); break; case APIC_OFFSET_TIMER_ICR: lapic->icr_timer = data; vlapic_icrtmr_write_handler(vlapic); break; case APIC_OFFSET_TIMER_DCR: lapic->dcr_timer = data; vlapic_dcr_write_handler(vlapic); break; case APIC_OFFSET_ESR: vlapic_esr_write_handler(vlapic); break; case APIC_OFFSET_SELF_IPI: if (x2apic(vlapic)) vlapic_self_ipi_handler(vlapic, data); break; case APIC_OFFSET_VER: case APIC_OFFSET_APR: case APIC_OFFSET_PPR: case APIC_OFFSET_RRR: case APIC_OFFSET_ISR0 ... APIC_OFFSET_ISR7: case APIC_OFFSET_TMR0 ... APIC_OFFSET_TMR7: case APIC_OFFSET_IRR0 ... APIC_OFFSET_IRR7: case APIC_OFFSET_TIMER_CCR: default: // Read only. 
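// Writes to these read-only offsets are dropped silently; no fault is injected back into the guest.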
break; } return (retval); } static void vlapic_reset(struct vlapic *vlapic) { struct LAPIC *lapic; lapic = vlapic->apic_page; bzero(lapic, sizeof(struct LAPIC)); lapic->id = vlapic_get_id(vlapic); lapic->version = VLAPIC_VERSION; lapic->version |= (VLAPIC_MAXLVT_INDEX << MAXLVTSHIFT); lapic->dfr = 0xffffffff; lapic->svr = APIC_SVR_VECTOR; vlapic_mask_lvts(vlapic); vlapic_reset_tmr(vlapic); lapic->dcr_timer = 0; vlapic_dcr_write_handler(vlapic); if (vlapic->vcpuid == 0) vlapic->boot_state = BS_RUNNING; /* BSP */ else vlapic->boot_state = BS_INIT; /* AP */ vlapic->svr_last = lapic->svr; } void vlapic_init(struct vlapic *vlapic) { KASSERT(vlapic->vm != NULL, ("vlapic_init: vm is not initialized")); - KASSERT(vlapic->vcpuid >= 0 && vlapic->vcpuid < VM_MAXCPU, + KASSERT(vlapic->vcpuid >= 0 && + vlapic->vcpuid < vm_get_maxcpus(vlapic->vm), ("vlapic_init: vcpuid is not initialized")); KASSERT(vlapic->apic_page != NULL, ("vlapic_init: apic_page is not " "initialized")); /* * If the vlapic is configured in x2apic mode then it will be * accessed in the critical section via the MSR emulation code. * * Therefore the timer mutex must be a spinlock because blockable * mutexes cannot be acquired in a critical section. */ mtx_init(&vlapic->timer_mtx, "vlapic timer mtx", NULL, MTX_SPIN); callout_init(&vlapic->callout, 1); vlapic->msr_apicbase = DEFAULT_APIC_BASE | APICBASE_ENABLED; if (vlapic->vcpuid == 0) vlapic->msr_apicbase |= APICBASE_BSP; vlapic_reset(vlapic); } void vlapic_cleanup(struct vlapic *vlapic) { callout_drain(&vlapic->callout); } uint64_t vlapic_get_apicbase(struct vlapic *vlapic) { return (vlapic->msr_apicbase); } int vlapic_set_apicbase(struct vlapic *vlapic, uint64_t new) { if (vlapic->msr_apicbase != new) { VLAPIC_CTR2(vlapic, "Changing APIC_BASE MSR from %#lx to %#lx " "not supported", vlapic->msr_apicbase, new); return (-1); } return (0); } void vlapic_set_x2apic_state(struct vm *vm, int vcpuid, enum x2apic_state state) { struct vlapic *vlapic; struct LAPIC *lapic; vlapic = vm_lapic(vm, vcpuid); if (state == X2APIC_DISABLED) vlapic->msr_apicbase &= ~APICBASE_X2APIC; else vlapic->msr_apicbase |= APICBASE_X2APIC; /* * Reset the local APIC registers whose values are mode-dependent. * * XXX this works because the APIC mode can be changed only at vcpu * initialization time. */ lapic = vlapic->apic_page; lapic->id = vlapic_get_id(vlapic); if (x2apic(vlapic)) { lapic->ldr = x2apic_ldr(vlapic); lapic->dfr = 0; } else { lapic->ldr = 0; lapic->dfr = 0xffffffff; } if (state == X2APIC_ENABLED) { if (vlapic->ops.enable_x2apic_mode) (*vlapic->ops.enable_x2apic_mode)(vlapic); } } void vlapic_deliver_intr(struct vm *vm, bool level, uint32_t dest, bool phys, int delmode, int vec) { bool lowprio; int vcpuid; cpuset_t dmask; if (delmode != IOART_DELFIXED && delmode != IOART_DELLOPRI && delmode != IOART_DELEXINT) { VM_CTR1(vm, "vlapic intr invalid delmode %#x", delmode); return; } lowprio = (delmode == IOART_DELLOPRI); /* * We don't provide any virtual interrupt redirection hardware so * all interrupts originating from the ioapic or MSI specify the * 'dest' in the legacy xAPIC format. */ vlapic_calcdest(vm, &dmask, dest, phys, lowprio, false); while ((vcpuid = CPU_FFS(&dmask)) != 0) { vcpuid--; CPU_CLR(vcpuid, &dmask); if (delmode == IOART_DELEXINT) { vm_inject_extint(vm, vcpuid); } else { lapic_set_intr(vm, vcpuid, vec, level); } } } void vlapic_post_intr(struct vlapic *vlapic, int hostcpu, int ipinum) { /* * Post an interrupt to the vcpu currently running on 'hostcpu'. 
* * This is done by leveraging features like Posted Interrupts (Intel) * Doorbell MSR (AMD AVIC) that avoid a VM exit. * * If neither of these features are available then fallback to * sending an IPI to 'hostcpu'. */ if (vlapic->ops.post_intr) (*vlapic->ops.post_intr)(vlapic, hostcpu); else ipi_cpu(hostcpu, ipinum); } bool vlapic_enabled(struct vlapic *vlapic) { struct LAPIC *lapic = vlapic->apic_page; if ((vlapic->msr_apicbase & APICBASE_ENABLED) != 0 && (lapic->svr & APIC_SVR_ENABLE) != 0) return (true); else return (false); } static void vlapic_set_tmr(struct vlapic *vlapic, int vector, bool level) { struct LAPIC *lapic; uint32_t *tmrptr, mask; int idx; lapic = vlapic->apic_page; tmrptr = &lapic->tmr0; idx = (vector / 32) * 4; mask = 1 << (vector % 32); if (level) tmrptr[idx] |= mask; else tmrptr[idx] &= ~mask; if (vlapic->ops.set_tmr != NULL) (*vlapic->ops.set_tmr)(vlapic, vector, level); } void vlapic_reset_tmr(struct vlapic *vlapic) { int vector; VLAPIC_CTR0(vlapic, "vlapic resetting all vectors to edge-triggered"); for (vector = 0; vector <= 255; vector++) vlapic_set_tmr(vlapic, vector, false); } void vlapic_set_tmr_level(struct vlapic *vlapic, uint32_t dest, bool phys, int delmode, int vector) { cpuset_t dmask; bool lowprio; KASSERT(vector >= 0 && vector <= 255, ("invalid vector %d", vector)); /* * A level trigger is valid only for fixed and lowprio delivery modes. */ if (delmode != APIC_DELMODE_FIXED && delmode != APIC_DELMODE_LOWPRIO) { VLAPIC_CTR1(vlapic, "Ignoring level trigger-mode for " "delivery-mode %d", delmode); return; } lowprio = (delmode == APIC_DELMODE_LOWPRIO); vlapic_calcdest(vlapic->vm, &dmask, dest, phys, lowprio, false); if (!CPU_ISSET(vlapic->vcpuid, &dmask)) return; VLAPIC_CTR1(vlapic, "vector %d set to level-triggered", vector); vlapic_set_tmr(vlapic, vector, true); } Index: user/ngie/bug-237403/sys/amd64/vmm/vmm.c =================================================================== --- user/ngie/bug-237403/sys/amd64/vmm/vmm.c (revision 346925) +++ user/ngie/bug-237403/sys/amd64/vmm/vmm.c (revision 346926) @@ -1,2711 +1,2718 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2011 NetApp, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $FreeBSD$ */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "vmm_ioport.h" #include "vmm_ktr.h" #include "vmm_host.h" #include "vmm_mem.h" #include "vmm_util.h" #include "vatpic.h" #include "vatpit.h" #include "vhpet.h" #include "vioapic.h" #include "vlapic.h" #include "vpmtmr.h" #include "vrtc.h" #include "vmm_stat.h" #include "vmm_lapic.h" #include "io/ppt.h" #include "io/iommu.h" struct vlapic; /* * Initialization: * (a) allocated when vcpu is created * (i) initialized when vcpu is created and when it is reinitialized * (o) initialized the first time the vcpu is created * (x) initialized before use */ struct vcpu { struct mtx mtx; /* (o) protects 'state' and 'hostcpu' */ enum vcpu_state state; /* (o) vcpu state */ int hostcpu; /* (o) vcpu's host cpu */ int reqidle; /* (i) request vcpu to idle */ struct vlapic *vlapic; /* (i) APIC device model */ enum x2apic_state x2apic_state; /* (i) APIC mode */ uint64_t exitintinfo; /* (i) events pending at VM exit */ int nmi_pending; /* (i) NMI pending */ int extint_pending; /* (i) INTR pending */ int exception_pending; /* (i) exception pending */ int exc_vector; /* (x) exception collateral */ int exc_errcode_valid; uint32_t exc_errcode; struct savefpu *guestfpu; /* (a,i) guest fpu state */ uint64_t guest_xcr0; /* (i) guest %xcr0 register */ void *stats; /* (a,i) statistics */ struct vm_exit exitinfo; /* (x) exit reason and collateral */ uint64_t nextrip; /* (x) next instruction to execute */ }; #define vcpu_lock_initialized(v) mtx_initialized(&((v)->mtx)) #define vcpu_lock_init(v) mtx_init(&((v)->mtx), "vcpu lock", 0, MTX_SPIN) #define vcpu_lock(v) mtx_lock_spin(&((v)->mtx)) #define vcpu_unlock(v) mtx_unlock_spin(&((v)->mtx)) #define vcpu_assert_locked(v) mtx_assert(&((v)->mtx), MA_OWNED) struct mem_seg { size_t len; bool sysmem; struct vm_object *object; }; #define VM_MAX_MEMSEGS 3 struct mem_map { vm_paddr_t gpa; size_t len; vm_ooffset_t segoff; int segid; int prot; int flags; }; #define VM_MAX_MEMMAPS 4 /* * Initialization: * (o) initialized the first time the VM is created * (i) initialized when VM is created and when it is reinitialized * (x) initialized before use */ struct vm { void *cookie; /* (i) cpu-specific data */ void *iommu; /* (x) iommu-specific data */ struct vhpet *vhpet; /* (i) virtual HPET */ struct vioapic *vioapic; /* (i) virtual ioapic */ struct vatpic *vatpic; /* (i) virtual atpic */ struct vatpit *vatpit; /* (i) virtual atpit */ struct vpmtmr *vpmtmr; /* (i) virtual ACPI PM timer */ struct vrtc *vrtc; /* (o) virtual RTC */ volatile cpuset_t active_cpus; /* (i) active vcpus */ volatile cpuset_t debug_cpus; /* (i) vcpus stopped for debug */ int suspend; /* (i) stop VM execution */ volatile cpuset_t suspended_cpus; /* (i) suspended vcpus */ volatile cpuset_t halted_cpus; /* (x) cpus in a hard halt */ cpuset_t rendezvous_req_cpus; /* (x) rendezvous requested */ cpuset_t rendezvous_done_cpus; /* (x) rendezvous finished */ void *rendezvous_arg; /* (x) rendezvous func/arg */ vm_rendezvous_func_t rendezvous_func; struct mtx rendezvous_mtx; /* (o) rendezvous lock */ struct mem_map mem_maps[VM_MAX_MEMMAPS]; /* (i) guest address space */ struct mem_seg mem_segs[VM_MAX_MEMSEGS]; /* (o) guest memory regions */ struct vmspace *vmspace; /* (o) guest's address space */ char 
name[VM_MAX_NAMELEN]; /* (o) virtual machine name */ struct vcpu vcpu[VM_MAXCPU]; /* (i) guest vcpus */ /* The following describe the vm cpu topology */ uint16_t sockets; /* (o) num of sockets */ uint16_t cores; /* (o) num of cores/socket */ uint16_t threads; /* (o) num of threads/core */ uint16_t maxcpus; /* (o) max pluggable cpus */ }; static int vmm_initialized; static struct vmm_ops *ops; #define VMM_INIT(num) (ops != NULL ? (*ops->init)(num) : 0) #define VMM_CLEANUP() (ops != NULL ? (*ops->cleanup)() : 0) #define VMM_RESUME() (ops != NULL ? (*ops->resume)() : 0) #define VMINIT(vm, pmap) (ops != NULL ? (*ops->vminit)(vm, pmap): NULL) #define VMRUN(vmi, vcpu, rip, pmap, evinfo) \ (ops != NULL ? (*ops->vmrun)(vmi, vcpu, rip, pmap, evinfo) : ENXIO) #define VMCLEANUP(vmi) (ops != NULL ? (*ops->vmcleanup)(vmi) : NULL) #define VMSPACE_ALLOC(min, max) \ (ops != NULL ? (*ops->vmspace_alloc)(min, max) : NULL) #define VMSPACE_FREE(vmspace) \ (ops != NULL ? (*ops->vmspace_free)(vmspace) : ENXIO) #define VMGETREG(vmi, vcpu, num, retval) \ (ops != NULL ? (*ops->vmgetreg)(vmi, vcpu, num, retval) : ENXIO) #define VMSETREG(vmi, vcpu, num, val) \ (ops != NULL ? (*ops->vmsetreg)(vmi, vcpu, num, val) : ENXIO) #define VMGETDESC(vmi, vcpu, num, desc) \ (ops != NULL ? (*ops->vmgetdesc)(vmi, vcpu, num, desc) : ENXIO) #define VMSETDESC(vmi, vcpu, num, desc) \ (ops != NULL ? (*ops->vmsetdesc)(vmi, vcpu, num, desc) : ENXIO) #define VMGETCAP(vmi, vcpu, num, retval) \ (ops != NULL ? (*ops->vmgetcap)(vmi, vcpu, num, retval) : ENXIO) #define VMSETCAP(vmi, vcpu, num, val) \ (ops != NULL ? (*ops->vmsetcap)(vmi, vcpu, num, val) : ENXIO) #define VLAPIC_INIT(vmi, vcpu) \ (ops != NULL ? (*ops->vlapic_init)(vmi, vcpu) : NULL) #define VLAPIC_CLEANUP(vmi, vlapic) \ (ops != NULL ? (*ops->vlapic_cleanup)(vmi, vlapic) : NULL) #define fpu_start_emulating() load_cr0(rcr0() | CR0_TS) #define fpu_stop_emulating() clts() SDT_PROVIDER_DEFINE(vmm); static MALLOC_DEFINE(M_VM, "vm", "vm"); /* statistics */ static VMM_STAT(VCPU_TOTAL_RUNTIME, "vcpu total runtime"); SYSCTL_NODE(_hw, OID_AUTO, vmm, CTLFLAG_RW, NULL, NULL); /* * Halt the guest if all vcpus are executing a HLT instruction with * interrupts disabled. 
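/*
 * Illustrative, self-contained sketch (the demo_* names are hypothetical and
 * not part of the vmm code): every VMGETREG()-style macro above dispatches
 * through the backend 'ops' table, which module initialization points at the
 * Intel or AMD implementation, and degrades to ENXIO, NULL or 0 when no
 * backend is present.
 */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct demo_vmm_ops {
	int	(*vmgetreg)(void *vmi, int vcpu, int num, uint64_t *retval);
};

static struct demo_vmm_ops *demo_ops;	/* stays NULL unless a backend probes */

static int
demo_vmgetreg(void *vmi, int vcpu, int num, uint64_t *retval)
{
	/* Same guard as VMGETREG(): without a backend the request cannot work. */
	return (demo_ops != NULL ?
	    (*demo_ops->vmgetreg)(vmi, vcpu, num, retval) : ENXIO);
}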
*/ static int halt_detection_enabled = 1; SYSCTL_INT(_hw_vmm, OID_AUTO, halt_detection, CTLFLAG_RDTUN, &halt_detection_enabled, 0, "Halt VM if all vcpus execute HLT with interrupts disabled"); static int vmm_ipinum; SYSCTL_INT(_hw_vmm, OID_AUTO, ipinum, CTLFLAG_RD, &vmm_ipinum, 0, "IPI vector used for vcpu notifications"); static int trace_guest_exceptions; SYSCTL_INT(_hw_vmm, OID_AUTO, trace_guest_exceptions, CTLFLAG_RDTUN, &trace_guest_exceptions, 0, "Trap into hypervisor on all guest exceptions and reflect them back"); static void vm_free_memmap(struct vm *vm, int ident); static bool sysmem_mapping(struct vm *vm, struct mem_map *mm); static void vcpu_notify_event_locked(struct vcpu *vcpu, bool lapic_intr); #ifdef KTR static const char * vcpu_state2str(enum vcpu_state state) { switch (state) { case VCPU_IDLE: return ("idle"); case VCPU_FROZEN: return ("frozen"); case VCPU_RUNNING: return ("running"); case VCPU_SLEEPING: return ("sleeping"); default: return ("unknown"); } } #endif static void vcpu_cleanup(struct vm *vm, int i, bool destroy) { struct vcpu *vcpu = &vm->vcpu[i]; VLAPIC_CLEANUP(vm->cookie, vcpu->vlapic); if (destroy) { vmm_stat_free(vcpu->stats); fpu_save_area_free(vcpu->guestfpu); } } static void vcpu_init(struct vm *vm, int vcpu_id, bool create) { struct vcpu *vcpu; - KASSERT(vcpu_id >= 0 && vcpu_id < VM_MAXCPU, + KASSERT(vcpu_id >= 0 && vcpu_id < vm->maxcpus, ("vcpu_init: invalid vcpu %d", vcpu_id)); vcpu = &vm->vcpu[vcpu_id]; if (create) { KASSERT(!vcpu_lock_initialized(vcpu), ("vcpu %d already " "initialized", vcpu_id)); vcpu_lock_init(vcpu); vcpu->state = VCPU_IDLE; vcpu->hostcpu = NOCPU; vcpu->guestfpu = fpu_save_area_alloc(); vcpu->stats = vmm_stat_alloc(); } vcpu->vlapic = VLAPIC_INIT(vm->cookie, vcpu_id); vm_set_x2apic_state(vm, vcpu_id, X2APIC_DISABLED); vcpu->reqidle = 0; vcpu->exitintinfo = 0; vcpu->nmi_pending = 0; vcpu->extint_pending = 0; vcpu->exception_pending = 0; vcpu->guest_xcr0 = XFEATURE_ENABLED_X87; fpu_save_area_reset(vcpu->guestfpu); vmm_stat_init(vcpu->stats); } int vcpu_trace_exceptions(struct vm *vm, int vcpuid) { return (trace_guest_exceptions); } struct vm_exit * vm_exitinfo(struct vm *vm, int cpuid) { struct vcpu *vcpu; - if (cpuid < 0 || cpuid >= VM_MAXCPU) + if (cpuid < 0 || cpuid >= vm->maxcpus) panic("vm_exitinfo: invalid cpuid %d", cpuid); vcpu = &vm->vcpu[cpuid]; return (&vcpu->exitinfo); } static void vmm_resume(void) { VMM_RESUME(); } static int vmm_init(void) { int error; vmm_host_state_init(); vmm_ipinum = lapic_ipi_alloc(pti ? 
&IDTVEC(justreturn1_pti) : &IDTVEC(justreturn)); if (vmm_ipinum < 0) vmm_ipinum = IPI_AST; error = vmm_mem_init(); if (error) return (error); if (vmm_is_intel()) ops = &vmm_ops_intel; else if (vmm_is_amd()) ops = &vmm_ops_amd; else return (ENXIO); vmm_resume_p = vmm_resume; return (VMM_INIT(vmm_ipinum)); } static int vmm_handler(module_t mod, int what, void *arg) { int error; switch (what) { case MOD_LOAD: vmmdev_init(); error = vmm_init(); if (error == 0) vmm_initialized = 1; break; case MOD_UNLOAD: error = vmmdev_cleanup(); if (error == 0) { vmm_resume_p = NULL; iommu_cleanup(); if (vmm_ipinum != IPI_AST) lapic_ipi_free(vmm_ipinum); error = VMM_CLEANUP(); /* * Something bad happened - prevent new * VMs from being created */ if (error) vmm_initialized = 0; } break; default: error = 0; break; } return (error); } static moduledata_t vmm_kmod = { "vmm", vmm_handler, NULL }; /* * vmm initialization has the following dependencies: * * - VT-x initialization requires smp_rendezvous() and therefore must happen * after SMP is fully functional (after SI_SUB_SMP). */ DECLARE_MODULE(vmm, vmm_kmod, SI_SUB_SMP + 1, SI_ORDER_ANY); MODULE_VERSION(vmm, 1); static void vm_init(struct vm *vm, bool create) { int i; vm->cookie = VMINIT(vm, vmspace_pmap(vm->vmspace)); vm->iommu = NULL; vm->vioapic = vioapic_init(vm); vm->vhpet = vhpet_init(vm); vm->vatpic = vatpic_init(vm); vm->vatpit = vatpit_init(vm); vm->vpmtmr = vpmtmr_init(vm); if (create) vm->vrtc = vrtc_init(vm); CPU_ZERO(&vm->active_cpus); CPU_ZERO(&vm->debug_cpus); vm->suspend = 0; CPU_ZERO(&vm->suspended_cpus); - for (i = 0; i < VM_MAXCPU; i++) + for (i = 0; i < vm->maxcpus; i++) vcpu_init(vm, i, create); } /* * The default CPU topology is a single thread per package. */ u_int cores_per_package = 1; u_int threads_per_core = 1; int vm_create(const char *name, struct vm **retvm) { struct vm *vm; struct vmspace *vmspace; /* * If vmm.ko could not be successfully initialized then don't attempt * to create the virtual machine. */ if (!vmm_initialized) return (ENXIO); if (name == NULL || strlen(name) >= VM_MAX_NAMELEN) return (EINVAL); vmspace = VMSPACE_ALLOC(0, VM_MAXUSER_ADDRESS); if (vmspace == NULL) return (ENOMEM); vm = malloc(sizeof(struct vm), M_VM, M_WAITOK | M_ZERO); strcpy(vm->name, name); vm->vmspace = vmspace; mtx_init(&vm->rendezvous_mtx, "vm rendezvous lock", 0, MTX_DEF); vm->sockets = 1; vm->cores = cores_per_package; /* XXX backwards compatibility */ vm->threads = threads_per_core; /* XXX backwards compatibility */ - vm->maxcpus = 0; /* XXX not implemented */ + vm->maxcpus = VM_MAXCPU; /* XXX temp to keep code working */ vm_init(vm, true); *retvm = vm; return (0); } void vm_get_topology(struct vm *vm, uint16_t *sockets, uint16_t *cores, uint16_t *threads, uint16_t *maxcpus) { *sockets = vm->sockets; *cores = vm->cores; *threads = vm->threads; *maxcpus = vm->maxcpus; } +uint16_t +vm_get_maxcpus(struct vm *vm) +{ + return (vm->maxcpus); +} + int vm_set_topology(struct vm *vm, uint16_t sockets, uint16_t cores, uint16_t threads, uint16_t maxcpus) { if (maxcpus != 0) return (EINVAL); /* XXX remove when supported */ - if ((sockets * cores * threads) > VM_MAXCPU) + if ((sockets * cores * threads) > vm->maxcpus) return (EINVAL); /* XXX need to check sockets * cores * threads == vCPU, how? 
*/ vm->sockets = sockets; vm->cores = cores; vm->threads = threads; - vm->maxcpus = maxcpus; + vm->maxcpus = VM_MAXCPU; /* XXX temp to keep code working */ return(0); } static void vm_cleanup(struct vm *vm, bool destroy) { struct mem_map *mm; int i; ppt_unassign_all(vm); if (vm->iommu != NULL) iommu_destroy_domain(vm->iommu); if (destroy) vrtc_cleanup(vm->vrtc); else vrtc_reset(vm->vrtc); vpmtmr_cleanup(vm->vpmtmr); vatpit_cleanup(vm->vatpit); vhpet_cleanup(vm->vhpet); vatpic_cleanup(vm->vatpic); vioapic_cleanup(vm->vioapic); - for (i = 0; i < VM_MAXCPU; i++) + for (i = 0; i < vm->maxcpus; i++) vcpu_cleanup(vm, i, destroy); VMCLEANUP(vm->cookie); /* * System memory is removed from the guest address space only when * the VM is destroyed. This is because the mapping remains the same * across VM reset. * * Device memory can be relocated by the guest (e.g. using PCI BARs) * so those mappings are removed on a VM reset. */ for (i = 0; i < VM_MAX_MEMMAPS; i++) { mm = &vm->mem_maps[i]; if (destroy || !sysmem_mapping(vm, mm)) vm_free_memmap(vm, i); } if (destroy) { for (i = 0; i < VM_MAX_MEMSEGS; i++) vm_free_memseg(vm, i); VMSPACE_FREE(vm->vmspace); vm->vmspace = NULL; } } void vm_destroy(struct vm *vm) { vm_cleanup(vm, true); free(vm, M_VM); } int vm_reinit(struct vm *vm) { int error; /* * A virtual machine can be reset only if all vcpus are suspended. */ if (CPU_CMP(&vm->suspended_cpus, &vm->active_cpus) == 0) { vm_cleanup(vm, false); vm_init(vm, false); error = 0; } else { error = EBUSY; } return (error); } const char * vm_name(struct vm *vm) { return (vm->name); } int vm_map_mmio(struct vm *vm, vm_paddr_t gpa, size_t len, vm_paddr_t hpa) { vm_object_t obj; if ((obj = vmm_mmio_alloc(vm->vmspace, gpa, len, hpa)) == NULL) return (ENOMEM); else return (0); } int vm_unmap_mmio(struct vm *vm, vm_paddr_t gpa, size_t len) { vmm_mmio_free(vm->vmspace, gpa, len); return (0); } /* * Return 'true' if 'gpa' is allocated in the guest address space. * * This function is called in the context of a running vcpu which acts as * an implicit lock on 'vm->mem_maps[]'. 
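/*
 * Illustrative sketch (hypothetical demo_* names): vm_mem_allocated() below
 * answers "is this guest-physical address backed?" with a linear scan of the
 * small fixed-size mem_maps[] array.  The same range test in isolation:
 */
#include <stdbool.h>
#include <stdint.h>

struct demo_map {
	uint64_t	gpa;	/* guest-physical base of the mapping */
	uint64_t	len;	/* 0 marks an unused slot */
};

static bool
demo_gpa_mapped(const struct demo_map *maps, int nmaps, uint64_t gpa)
{
	int i;

	for (i = 0; i < nmaps; i++) {
		if (maps[i].len != 0 && gpa >= maps[i].gpa &&
		    gpa < maps[i].gpa + maps[i].len)
			return (true);
	}
	return (false);
}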
*/ bool vm_mem_allocated(struct vm *vm, int vcpuid, vm_paddr_t gpa) { struct mem_map *mm; int i; #ifdef INVARIANTS int hostcpu, state; state = vcpu_get_state(vm, vcpuid, &hostcpu); KASSERT(state == VCPU_RUNNING && hostcpu == curcpu, ("%s: invalid vcpu state %d/%d", __func__, state, hostcpu)); #endif for (i = 0; i < VM_MAX_MEMMAPS; i++) { mm = &vm->mem_maps[i]; if (mm->len != 0 && gpa >= mm->gpa && gpa < mm->gpa + mm->len) return (true); /* 'gpa' is sysmem or devmem */ } if (ppt_is_mmio(vm, gpa)) return (true); /* 'gpa' is pci passthru mmio */ return (false); } int vm_alloc_memseg(struct vm *vm, int ident, size_t len, bool sysmem) { struct mem_seg *seg; vm_object_t obj; if (ident < 0 || ident >= VM_MAX_MEMSEGS) return (EINVAL); if (len == 0 || (len & PAGE_MASK)) return (EINVAL); seg = &vm->mem_segs[ident]; if (seg->object != NULL) { if (seg->len == len && seg->sysmem == sysmem) return (EEXIST); else return (EINVAL); } obj = vm_object_allocate(OBJT_DEFAULT, len >> PAGE_SHIFT); if (obj == NULL) return (ENOMEM); seg->len = len; seg->object = obj; seg->sysmem = sysmem; return (0); } int vm_get_memseg(struct vm *vm, int ident, size_t *len, bool *sysmem, vm_object_t *objptr) { struct mem_seg *seg; if (ident < 0 || ident >= VM_MAX_MEMSEGS) return (EINVAL); seg = &vm->mem_segs[ident]; if (len) *len = seg->len; if (sysmem) *sysmem = seg->sysmem; if (objptr) *objptr = seg->object; return (0); } void vm_free_memseg(struct vm *vm, int ident) { struct mem_seg *seg; KASSERT(ident >= 0 && ident < VM_MAX_MEMSEGS, ("%s: invalid memseg ident %d", __func__, ident)); seg = &vm->mem_segs[ident]; if (seg->object != NULL) { vm_object_deallocate(seg->object); bzero(seg, sizeof(struct mem_seg)); } } int vm_mmap_memseg(struct vm *vm, vm_paddr_t gpa, int segid, vm_ooffset_t first, size_t len, int prot, int flags) { struct mem_seg *seg; struct mem_map *m, *map; vm_ooffset_t last; int i, error; if (prot == 0 || (prot & ~(VM_PROT_ALL)) != 0) return (EINVAL); if (flags & ~VM_MEMMAP_F_WIRED) return (EINVAL); if (segid < 0 || segid >= VM_MAX_MEMSEGS) return (EINVAL); seg = &vm->mem_segs[segid]; if (seg->object == NULL) return (EINVAL); last = first + len; if (first < 0 || first >= last || last > seg->len) return (EINVAL); if ((gpa | first | last) & PAGE_MASK) return (EINVAL); map = NULL; for (i = 0; i < VM_MAX_MEMMAPS; i++) { m = &vm->mem_maps[i]; if (m->len == 0) { map = m; break; } } if (map == NULL) return (ENOSPC); error = vm_map_find(&vm->vmspace->vm_map, seg->object, first, &gpa, len, 0, VMFS_NO_SPACE, prot, prot, 0); if (error != KERN_SUCCESS) return (EFAULT); vm_object_reference(seg->object); if (flags & VM_MEMMAP_F_WIRED) { error = vm_map_wire(&vm->vmspace->vm_map, gpa, gpa + len, VM_MAP_WIRE_USER | VM_MAP_WIRE_NOHOLES); if (error != KERN_SUCCESS) { vm_map_remove(&vm->vmspace->vm_map, gpa, gpa + len); return (EFAULT); } } map->gpa = gpa; map->len = len; map->segoff = first; map->segid = segid; map->prot = prot; map->flags = flags; return (0); } int vm_mmap_getnext(struct vm *vm, vm_paddr_t *gpa, int *segid, vm_ooffset_t *segoff, size_t *len, int *prot, int *flags) { struct mem_map *mm, *mmnext; int i; mmnext = NULL; for (i = 0; i < VM_MAX_MEMMAPS; i++) { mm = &vm->mem_maps[i]; if (mm->len == 0 || mm->gpa < *gpa) continue; if (mmnext == NULL || mm->gpa < mmnext->gpa) mmnext = mm; } if (mmnext != NULL) { *gpa = mmnext->gpa; if (segid) *segid = mmnext->segid; if (segoff) *segoff = mmnext->segoff; if (len) *len = mmnext->len; if (prot) *prot = mmnext->prot; if (flags) *flags = mmnext->flags; return (0); } else { 
return (ENOENT); } } static void vm_free_memmap(struct vm *vm, int ident) { struct mem_map *mm; int error; mm = &vm->mem_maps[ident]; if (mm->len) { error = vm_map_remove(&vm->vmspace->vm_map, mm->gpa, mm->gpa + mm->len); KASSERT(error == KERN_SUCCESS, ("%s: vm_map_remove error %d", __func__, error)); bzero(mm, sizeof(struct mem_map)); } } static __inline bool sysmem_mapping(struct vm *vm, struct mem_map *mm) { if (mm->len != 0 && vm->mem_segs[mm->segid].sysmem) return (true); else return (false); } vm_paddr_t vmm_sysmem_maxaddr(struct vm *vm) { struct mem_map *mm; vm_paddr_t maxaddr; int i; maxaddr = 0; for (i = 0; i < VM_MAX_MEMMAPS; i++) { mm = &vm->mem_maps[i]; if (sysmem_mapping(vm, mm)) { if (maxaddr < mm->gpa + mm->len) maxaddr = mm->gpa + mm->len; } } return (maxaddr); } static void vm_iommu_modify(struct vm *vm, boolean_t map) { int i, sz; vm_paddr_t gpa, hpa; struct mem_map *mm; void *vp, *cookie, *host_domain; sz = PAGE_SIZE; host_domain = iommu_host_domain(); for (i = 0; i < VM_MAX_MEMMAPS; i++) { mm = &vm->mem_maps[i]; if (!sysmem_mapping(vm, mm)) continue; if (map) { KASSERT((mm->flags & VM_MEMMAP_F_IOMMU) == 0, ("iommu map found invalid memmap %#lx/%#lx/%#x", mm->gpa, mm->len, mm->flags)); if ((mm->flags & VM_MEMMAP_F_WIRED) == 0) continue; mm->flags |= VM_MEMMAP_F_IOMMU; } else { if ((mm->flags & VM_MEMMAP_F_IOMMU) == 0) continue; mm->flags &= ~VM_MEMMAP_F_IOMMU; KASSERT((mm->flags & VM_MEMMAP_F_WIRED) != 0, ("iommu unmap found invalid memmap %#lx/%#lx/%#x", mm->gpa, mm->len, mm->flags)); } gpa = mm->gpa; while (gpa < mm->gpa + mm->len) { vp = vm_gpa_hold(vm, -1, gpa, PAGE_SIZE, VM_PROT_WRITE, &cookie); KASSERT(vp != NULL, ("vm(%s) could not map gpa %#lx", vm_name(vm), gpa)); vm_gpa_release(cookie); hpa = DMAP_TO_PHYS((uintptr_t)vp); if (map) { iommu_create_mapping(vm->iommu, gpa, hpa, sz); iommu_remove_mapping(host_domain, hpa, sz); } else { iommu_remove_mapping(vm->iommu, gpa, sz); iommu_create_mapping(host_domain, hpa, hpa, sz); } gpa += PAGE_SIZE; } } /* * Invalidate the cached translations associated with the domain * from which pages were removed. */ if (map) iommu_invalidate_tlb(host_domain); else iommu_invalidate_tlb(vm->iommu); } #define vm_iommu_unmap(vm) vm_iommu_modify((vm), FALSE) #define vm_iommu_map(vm) vm_iommu_modify((vm), TRUE) int vm_unassign_pptdev(struct vm *vm, int bus, int slot, int func) { int error; error = ppt_unassign_device(vm, bus, slot, func); if (error) return (error); if (ppt_assigned_devices(vm) == 0) vm_iommu_unmap(vm); return (0); } int vm_assign_pptdev(struct vm *vm, int bus, int slot, int func) { int error; vm_paddr_t maxaddr; /* Set up the IOMMU to do the 'gpa' to 'hpa' translation */ if (ppt_assigned_devices(vm) == 0) { KASSERT(vm->iommu == NULL, ("vm_assign_pptdev: iommu must be NULL")); maxaddr = vmm_sysmem_maxaddr(vm); vm->iommu = iommu_create_domain(maxaddr); if (vm->iommu == NULL) return (ENXIO); vm_iommu_map(vm); } error = ppt_assign_device(vm, bus, slot, func); return (error); } void * vm_gpa_hold(struct vm *vm, int vcpuid, vm_paddr_t gpa, size_t len, int reqprot, void **cookie) { int i, count, pageoff; struct mem_map *mm; vm_page_t m; #ifdef INVARIANTS /* * All vcpus are frozen by ioctls that modify the memory map * (e.g. VM_MMAP_MEMSEG). Therefore 'vm->memmap[]' stability is * guaranteed if at least one vcpu is in the VCPU_FROZEN state. 
*/ int state; - KASSERT(vcpuid >= -1 && vcpuid < VM_MAXCPU, ("%s: invalid vcpuid %d", + KASSERT(vcpuid >= -1 && vcpuid < vm->maxcpus, ("%s: invalid vcpuid %d", __func__, vcpuid)); - for (i = 0; i < VM_MAXCPU; i++) { + for (i = 0; i < vm->maxcpus; i++) { if (vcpuid != -1 && vcpuid != i) continue; state = vcpu_get_state(vm, i, NULL); KASSERT(state == VCPU_FROZEN, ("%s: invalid vcpu state %d", __func__, state)); } #endif pageoff = gpa & PAGE_MASK; if (len > PAGE_SIZE - pageoff) panic("vm_gpa_hold: invalid gpa/len: 0x%016lx/%lu", gpa, len); count = 0; for (i = 0; i < VM_MAX_MEMMAPS; i++) { mm = &vm->mem_maps[i]; if (sysmem_mapping(vm, mm) && gpa >= mm->gpa && gpa < mm->gpa + mm->len) { count = vm_fault_quick_hold_pages(&vm->vmspace->vm_map, trunc_page(gpa), PAGE_SIZE, reqprot, &m, 1); break; } } if (count == 1) { *cookie = m; return ((void *)(PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m)) + pageoff)); } else { *cookie = NULL; return (NULL); } } void vm_gpa_release(void *cookie) { vm_page_t m = cookie; vm_page_lock(m); vm_page_unhold(m); vm_page_unlock(m); } int vm_get_register(struct vm *vm, int vcpu, int reg, uint64_t *retval) { - if (vcpu < 0 || vcpu >= VM_MAXCPU) + if (vcpu < 0 || vcpu >= vm->maxcpus) return (EINVAL); if (reg >= VM_REG_LAST) return (EINVAL); return (VMGETREG(vm->cookie, vcpu, reg, retval)); } int vm_set_register(struct vm *vm, int vcpuid, int reg, uint64_t val) { struct vcpu *vcpu; int error; - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) return (EINVAL); if (reg >= VM_REG_LAST) return (EINVAL); error = VMSETREG(vm->cookie, vcpuid, reg, val); if (error || reg != VM_REG_GUEST_RIP) return (error); /* Set 'nextrip' to match the value of %rip */ VCPU_CTR1(vm, vcpuid, "Setting nextrip to %#lx", val); vcpu = &vm->vcpu[vcpuid]; vcpu->nextrip = val; return (0); } static boolean_t is_descriptor_table(int reg) { switch (reg) { case VM_REG_GUEST_IDTR: case VM_REG_GUEST_GDTR: return (TRUE); default: return (FALSE); } } static boolean_t is_segment_register(int reg) { switch (reg) { case VM_REG_GUEST_ES: case VM_REG_GUEST_CS: case VM_REG_GUEST_SS: case VM_REG_GUEST_DS: case VM_REG_GUEST_FS: case VM_REG_GUEST_GS: case VM_REG_GUEST_TR: case VM_REG_GUEST_LDTR: return (TRUE); default: return (FALSE); } } int vm_get_seg_desc(struct vm *vm, int vcpu, int reg, struct seg_desc *desc) { - if (vcpu < 0 || vcpu >= VM_MAXCPU) + if (vcpu < 0 || vcpu >= vm->maxcpus) return (EINVAL); if (!is_segment_register(reg) && !is_descriptor_table(reg)) return (EINVAL); return (VMGETDESC(vm->cookie, vcpu, reg, desc)); } int vm_set_seg_desc(struct vm *vm, int vcpu, int reg, struct seg_desc *desc) { - if (vcpu < 0 || vcpu >= VM_MAXCPU) + if (vcpu < 0 || vcpu >= vm->maxcpus) return (EINVAL); if (!is_segment_register(reg) && !is_descriptor_table(reg)) return (EINVAL); return (VMSETDESC(vm->cookie, vcpu, reg, desc)); } static void restore_guest_fpustate(struct vcpu *vcpu) { /* flush host state to the pcb */ fpuexit(curthread); /* restore guest FPU state */ fpu_stop_emulating(); fpurestore(vcpu->guestfpu); /* restore guest XCR0 if XSAVE is enabled in the host */ if (rcr4() & CR4_XSAVE) load_xcr(0, vcpu->guest_xcr0); /* * The FPU is now "dirty" with the guest's state so turn on emulation * to trap any access to the FPU by the host. 
*/ fpu_start_emulating(); } static void save_guest_fpustate(struct vcpu *vcpu) { if ((rcr0() & CR0_TS) == 0) panic("fpu emulation not enabled in host!"); /* save guest XCR0 and restore host XCR0 */ if (rcr4() & CR4_XSAVE) { vcpu->guest_xcr0 = rxcr(0); load_xcr(0, vmm_get_host_xcr0()); } /* save guest FPU state */ fpu_stop_emulating(); fpusave(vcpu->guestfpu); fpu_start_emulating(); } static VMM_STAT(VCPU_IDLE_TICKS, "number of ticks vcpu was idle"); static int vcpu_set_state_locked(struct vm *vm, int vcpuid, enum vcpu_state newstate, bool from_idle) { struct vcpu *vcpu; int error; vcpu = &vm->vcpu[vcpuid]; vcpu_assert_locked(vcpu); /* * State transitions from the vmmdev_ioctl() must always begin from * the VCPU_IDLE state. This guarantees that there is only a single * ioctl() operating on a vcpu at any point. */ if (from_idle) { while (vcpu->state != VCPU_IDLE) { vcpu->reqidle = 1; vcpu_notify_event_locked(vcpu, false); VCPU_CTR1(vm, vcpuid, "vcpu state change from %s to " "idle requested", vcpu_state2str(vcpu->state)); msleep_spin(&vcpu->state, &vcpu->mtx, "vmstat", hz); } } else { KASSERT(vcpu->state != VCPU_IDLE, ("invalid transition from " "vcpu idle state")); } if (vcpu->state == VCPU_RUNNING) { KASSERT(vcpu->hostcpu == curcpu, ("curcpu %d and hostcpu %d " "mismatch for running vcpu", curcpu, vcpu->hostcpu)); } else { KASSERT(vcpu->hostcpu == NOCPU, ("Invalid hostcpu %d for a " "vcpu that is not running", vcpu->hostcpu)); } /* * The following state transitions are allowed: * IDLE -> FROZEN -> IDLE * FROZEN -> RUNNING -> FROZEN * FROZEN -> SLEEPING -> FROZEN */ switch (vcpu->state) { case VCPU_IDLE: case VCPU_RUNNING: case VCPU_SLEEPING: error = (newstate != VCPU_FROZEN); break; case VCPU_FROZEN: error = (newstate == VCPU_FROZEN); break; default: error = 1; break; } if (error) return (EBUSY); VCPU_CTR2(vm, vcpuid, "vcpu state changed from %s to %s", vcpu_state2str(vcpu->state), vcpu_state2str(newstate)); vcpu->state = newstate; if (newstate == VCPU_RUNNING) vcpu->hostcpu = curcpu; else vcpu->hostcpu = NOCPU; if (newstate == VCPU_IDLE) wakeup(&vcpu->state); return (0); } static void vcpu_require_state(struct vm *vm, int vcpuid, enum vcpu_state newstate) { int error; if ((error = vcpu_set_state(vm, vcpuid, newstate, false)) != 0) panic("Error %d setting state to %d\n", error, newstate); } static void vcpu_require_state_locked(struct vm *vm, int vcpuid, enum vcpu_state newstate) { int error; if ((error = vcpu_set_state_locked(vm, vcpuid, newstate, false)) != 0) panic("Error %d setting state to %d", error, newstate); } static void vm_set_rendezvous_func(struct vm *vm, vm_rendezvous_func_t func) { KASSERT(mtx_owned(&vm->rendezvous_mtx), ("rendezvous_mtx not locked")); /* * Update 'rendezvous_func' and execute a write memory barrier to * ensure that it is visible across all host cpus. This is not needed * for correctness but it does ensure that all the vcpus will notice * that the rendezvous is requested immediately. 
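/*
 * Illustrative sketch (hypothetical demo_* names): the switch in
 * vcpu_set_state_locked() above enforces the transition diagram
 * IDLE <-> FROZEN and FROZEN <-> RUNNING/SLEEPING; any other hop makes the
 * caller fail with EBUSY.  The same rule as a standalone predicate:
 */
#include <stdbool.h>

enum demo_vcpu_state { DEMO_IDLE, DEMO_FROZEN, DEMO_RUNNING, DEMO_SLEEPING };

static bool
demo_transition_ok(enum demo_vcpu_state from, enum demo_vcpu_state to)
{
	switch (from) {
	case DEMO_IDLE:
	case DEMO_RUNNING:
	case DEMO_SLEEPING:
		return (to == DEMO_FROZEN);	/* may only be frozen next */
	case DEMO_FROZEN:
		return (to != DEMO_FROZEN);	/* may thaw to any other state */
	default:
		return (false);
	}
}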
*/ vm->rendezvous_func = func; wmb(); } #define RENDEZVOUS_CTR0(vm, vcpuid, fmt) \ do { \ if (vcpuid >= 0) \ VCPU_CTR0(vm, vcpuid, fmt); \ else \ VM_CTR0(vm, fmt); \ } while (0) static void vm_handle_rendezvous(struct vm *vm, int vcpuid) { - KASSERT(vcpuid == -1 || (vcpuid >= 0 && vcpuid < VM_MAXCPU), + KASSERT(vcpuid == -1 || (vcpuid >= 0 && vcpuid < vm->maxcpus), ("vm_handle_rendezvous: invalid vcpuid %d", vcpuid)); mtx_lock(&vm->rendezvous_mtx); while (vm->rendezvous_func != NULL) { /* 'rendezvous_req_cpus' must be a subset of 'active_cpus' */ CPU_AND(&vm->rendezvous_req_cpus, &vm->active_cpus); if (vcpuid != -1 && CPU_ISSET(vcpuid, &vm->rendezvous_req_cpus) && !CPU_ISSET(vcpuid, &vm->rendezvous_done_cpus)) { VCPU_CTR0(vm, vcpuid, "Calling rendezvous func"); (*vm->rendezvous_func)(vm, vcpuid, vm->rendezvous_arg); CPU_SET(vcpuid, &vm->rendezvous_done_cpus); } if (CPU_CMP(&vm->rendezvous_req_cpus, &vm->rendezvous_done_cpus) == 0) { VCPU_CTR0(vm, vcpuid, "Rendezvous completed"); vm_set_rendezvous_func(vm, NULL); wakeup(&vm->rendezvous_func); break; } RENDEZVOUS_CTR0(vm, vcpuid, "Wait for rendezvous completion"); mtx_sleep(&vm->rendezvous_func, &vm->rendezvous_mtx, 0, "vmrndv", 0); } mtx_unlock(&vm->rendezvous_mtx); } /* * Emulate a guest 'hlt' by sleeping until the vcpu is ready to run. */ static int vm_handle_hlt(struct vm *vm, int vcpuid, bool intr_disabled, bool *retu) { struct vcpu *vcpu; const char *wmesg; int t, vcpu_halted, vm_halted; KASSERT(!CPU_ISSET(vcpuid, &vm->halted_cpus), ("vcpu already halted")); vcpu = &vm->vcpu[vcpuid]; vcpu_halted = 0; vm_halted = 0; vcpu_lock(vcpu); while (1) { /* * Do a final check for pending NMI or interrupts before * really putting this thread to sleep. Also check for * software events that would cause this vcpu to wakeup. * * These interrupts/events could have happened after the * vcpu returned from VMRUN() and before it acquired the * vcpu lock above. */ if (vm->rendezvous_func != NULL || vm->suspend || vcpu->reqidle) break; if (vm_nmi_pending(vm, vcpuid)) break; if (!intr_disabled) { if (vm_extint_pending(vm, vcpuid) || vlapic_pending_intr(vcpu->vlapic, NULL)) { break; } } /* Don't go to sleep if the vcpu thread needs to yield */ if (vcpu_should_yield(vm, vcpuid)) break; if (vcpu_debugged(vm, vcpuid)) break; /* * Some Linux guests implement "halt" by having all vcpus * execute HLT with interrupts disabled. 'halted_cpus' keeps * track of the vcpus that have entered this state. When all * vcpus enter the halted state the virtual machine is halted. */ if (intr_disabled) { wmesg = "vmhalt"; VCPU_CTR0(vm, vcpuid, "Halted"); if (!vcpu_halted && halt_detection_enabled) { vcpu_halted = 1; CPU_SET_ATOMIC(vcpuid, &vm->halted_cpus); } if (CPU_CMP(&vm->halted_cpus, &vm->active_cpus) == 0) { vm_halted = 1; break; } } else { wmesg = "vmidle"; } t = ticks; vcpu_require_state_locked(vm, vcpuid, VCPU_SLEEPING); /* * XXX msleep_spin() cannot be interrupted by signals so * wake up periodically to check pending signals. 
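/*
 * Illustrative sketch (plain bitmasks standing in for cpuset_t): the halt
 * detection in vm_handle_hlt() above adds each vcpu that executes HLT with
 * interrupts disabled to 'halted_cpus' and, subject to the
 * hw.vmm.halt_detection tunable, suspends the whole VM once that set matches
 * 'active_cpus' (the CPU_CMP() == 0 check).
 */
#include <stdbool.h>
#include <stdint.h>

static bool
demo_vm_fully_halted(uint64_t active_vcpus, uint64_t halted_vcpus)
{
	return (halted_vcpus == active_vcpus);
}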
*/ msleep_spin(vcpu, &vcpu->mtx, wmesg, hz); vcpu_require_state_locked(vm, vcpuid, VCPU_FROZEN); vmm_stat_incr(vm, vcpuid, VCPU_IDLE_TICKS, ticks - t); } if (vcpu_halted) CPU_CLR_ATOMIC(vcpuid, &vm->halted_cpus); vcpu_unlock(vcpu); if (vm_halted) vm_suspend(vm, VM_SUSPEND_HALT); return (0); } static int vm_handle_paging(struct vm *vm, int vcpuid, bool *retu) { int rv, ftype; struct vm_map *map; struct vcpu *vcpu; struct vm_exit *vme; vcpu = &vm->vcpu[vcpuid]; vme = &vcpu->exitinfo; KASSERT(vme->inst_length == 0, ("%s: invalid inst_length %d", __func__, vme->inst_length)); ftype = vme->u.paging.fault_type; KASSERT(ftype == VM_PROT_READ || ftype == VM_PROT_WRITE || ftype == VM_PROT_EXECUTE, ("vm_handle_paging: invalid fault_type %d", ftype)); if (ftype == VM_PROT_READ || ftype == VM_PROT_WRITE) { rv = pmap_emulate_accessed_dirty(vmspace_pmap(vm->vmspace), vme->u.paging.gpa, ftype); if (rv == 0) { VCPU_CTR2(vm, vcpuid, "%s bit emulation for gpa %#lx", ftype == VM_PROT_READ ? "accessed" : "dirty", vme->u.paging.gpa); goto done; } } map = &vm->vmspace->vm_map; rv = vm_fault(map, vme->u.paging.gpa, ftype, VM_FAULT_NORMAL); VCPU_CTR3(vm, vcpuid, "vm_handle_paging rv = %d, gpa = %#lx, " "ftype = %d", rv, vme->u.paging.gpa, ftype); if (rv != KERN_SUCCESS) return (EFAULT); done: return (0); } static int vm_handle_inst_emul(struct vm *vm, int vcpuid, bool *retu) { struct vie *vie; struct vcpu *vcpu; struct vm_exit *vme; uint64_t gla, gpa, cs_base; struct vm_guest_paging *paging; mem_region_read_t mread; mem_region_write_t mwrite; enum vm_cpu_mode cpu_mode; int cs_d, error, fault; vcpu = &vm->vcpu[vcpuid]; vme = &vcpu->exitinfo; KASSERT(vme->inst_length == 0, ("%s: invalid inst_length %d", __func__, vme->inst_length)); gla = vme->u.inst_emul.gla; gpa = vme->u.inst_emul.gpa; cs_base = vme->u.inst_emul.cs_base; cs_d = vme->u.inst_emul.cs_d; vie = &vme->u.inst_emul.vie; paging = &vme->u.inst_emul.paging; cpu_mode = paging->cpu_mode; VCPU_CTR1(vm, vcpuid, "inst_emul fault accessing gpa %#lx", gpa); /* Fetch, decode and emulate the faulting instruction */ if (vie->num_valid == 0) { error = vmm_fetch_instruction(vm, vcpuid, paging, vme->rip + cs_base, VIE_INST_SIZE, vie, &fault); } else { /* * The instruction bytes have already been copied into 'vie' */ error = fault = 0; } if (error || fault) return (error); if (vmm_decode_instruction(vm, vcpuid, gla, cpu_mode, cs_d, vie) != 0) { VCPU_CTR1(vm, vcpuid, "Error decoding instruction at %#lx", vme->rip + cs_base); *retu = true; /* dump instruction bytes in userspace */ return (0); } /* * Update 'nextrip' based on the length of the emulated instruction. 
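/*
 * Illustrative sketch (hypothetical demo_* names): after decoding, the
 * vm_handle_inst_emul() code below picks an in-kernel device model purely by
 * guest-physical range (local APIC page, I/O APIC, HPET) and otherwise
 * bounces the exit to userspace.  The same dispatch, table-driven:
 */
#include <stddef.h>
#include <stdint.h>

typedef int (*demo_mmio_handler_t)(uint64_t gpa, uint64_t *val);

struct demo_mmio_range {
	uint64_t		base;
	uint64_t		size;
	demo_mmio_handler_t	read;
	demo_mmio_handler_t	write;
};

/* Returns the range covering 'gpa', or NULL to emulate in userspace. */
static const struct demo_mmio_range *
demo_mmio_lookup(const struct demo_mmio_range *ranges, size_t n, uint64_t gpa)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (gpa >= ranges[i].base &&
		    gpa < ranges[i].base + ranges[i].size)
			return (&ranges[i]);
	}
	return (NULL);
}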
*/ vme->inst_length = vie->num_processed; vcpu->nextrip += vie->num_processed; VCPU_CTR1(vm, vcpuid, "nextrip updated to %#lx after instruction " "decoding", vcpu->nextrip); /* return to userland unless this is an in-kernel emulated device */ if (gpa >= DEFAULT_APIC_BASE && gpa < DEFAULT_APIC_BASE + PAGE_SIZE) { mread = lapic_mmio_read; mwrite = lapic_mmio_write; } else if (gpa >= VIOAPIC_BASE && gpa < VIOAPIC_BASE + VIOAPIC_SIZE) { mread = vioapic_mmio_read; mwrite = vioapic_mmio_write; } else if (gpa >= VHPET_BASE && gpa < VHPET_BASE + VHPET_SIZE) { mread = vhpet_mmio_read; mwrite = vhpet_mmio_write; } else { *retu = true; return (0); } error = vmm_emulate_instruction(vm, vcpuid, gpa, vie, paging, mread, mwrite, retu); return (error); } static int vm_handle_suspend(struct vm *vm, int vcpuid, bool *retu) { int i, done; struct vcpu *vcpu; done = 0; vcpu = &vm->vcpu[vcpuid]; CPU_SET_ATOMIC(vcpuid, &vm->suspended_cpus); /* * Wait until all 'active_cpus' have suspended themselves. * * Since a VM may be suspended at any time including when one or * more vcpus are doing a rendezvous we need to call the rendezvous * handler while we are waiting to prevent a deadlock. */ vcpu_lock(vcpu); while (1) { if (CPU_CMP(&vm->suspended_cpus, &vm->active_cpus) == 0) { VCPU_CTR0(vm, vcpuid, "All vcpus suspended"); break; } if (vm->rendezvous_func == NULL) { VCPU_CTR0(vm, vcpuid, "Sleeping during suspend"); vcpu_require_state_locked(vm, vcpuid, VCPU_SLEEPING); msleep_spin(vcpu, &vcpu->mtx, "vmsusp", hz); vcpu_require_state_locked(vm, vcpuid, VCPU_FROZEN); } else { VCPU_CTR0(vm, vcpuid, "Rendezvous during suspend"); vcpu_unlock(vcpu); vm_handle_rendezvous(vm, vcpuid); vcpu_lock(vcpu); } } vcpu_unlock(vcpu); /* * Wakeup the other sleeping vcpus and return to userspace. */ - for (i = 0; i < VM_MAXCPU; i++) { + for (i = 0; i < vm->maxcpus; i++) { if (CPU_ISSET(i, &vm->suspended_cpus)) { vcpu_notify_event(vm, i, false); } } *retu = true; return (0); } static int vm_handle_reqidle(struct vm *vm, int vcpuid, bool *retu) { struct vcpu *vcpu = &vm->vcpu[vcpuid]; vcpu_lock(vcpu); KASSERT(vcpu->reqidle, ("invalid vcpu reqidle %d", vcpu->reqidle)); vcpu->reqidle = 0; vcpu_unlock(vcpu); *retu = true; return (0); } int vm_suspend(struct vm *vm, enum vm_suspend_how how) { int i; if (how <= VM_SUSPEND_NONE || how >= VM_SUSPEND_LAST) return (EINVAL); if (atomic_cmpset_int(&vm->suspend, 0, how) == 0) { VM_CTR2(vm, "virtual machine already suspended %d/%d", vm->suspend, how); return (EALREADY); } VM_CTR1(vm, "virtual machine successfully suspended %d", how); /* * Notify all active vcpus that they are now suspended. 
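/*
 * Illustrative sketch (C11 atomics standing in for atomic_cmpset_int()):
 * vm_suspend() above publishes the suspend reason exactly once; whichever
 * caller wins the compare-and-swap proceeds and every later caller is told
 * the machine is already suspending.
 */
#include <errno.h>
#include <stdatomic.h>

static int
demo_suspend_once(atomic_int *suspend, int how)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(suspend, &expected, how))
		return (EALREADY);	/* somebody else suspended first */
	return (0);
}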
*/ - for (i = 0; i < VM_MAXCPU; i++) { + for (i = 0; i < vm->maxcpus; i++) { if (CPU_ISSET(i, &vm->active_cpus)) vcpu_notify_event(vm, i, false); } return (0); } void vm_exit_suspended(struct vm *vm, int vcpuid, uint64_t rip) { struct vm_exit *vmexit; KASSERT(vm->suspend > VM_SUSPEND_NONE && vm->suspend < VM_SUSPEND_LAST, ("vm_exit_suspended: invalid suspend type %d", vm->suspend)); vmexit = vm_exitinfo(vm, vcpuid); vmexit->rip = rip; vmexit->inst_length = 0; vmexit->exitcode = VM_EXITCODE_SUSPENDED; vmexit->u.suspended.how = vm->suspend; } void vm_exit_debug(struct vm *vm, int vcpuid, uint64_t rip) { struct vm_exit *vmexit; vmexit = vm_exitinfo(vm, vcpuid); vmexit->rip = rip; vmexit->inst_length = 0; vmexit->exitcode = VM_EXITCODE_DEBUG; } void vm_exit_rendezvous(struct vm *vm, int vcpuid, uint64_t rip) { struct vm_exit *vmexit; KASSERT(vm->rendezvous_func != NULL, ("rendezvous not in progress")); vmexit = vm_exitinfo(vm, vcpuid); vmexit->rip = rip; vmexit->inst_length = 0; vmexit->exitcode = VM_EXITCODE_RENDEZVOUS; vmm_stat_incr(vm, vcpuid, VMEXIT_RENDEZVOUS, 1); } void vm_exit_reqidle(struct vm *vm, int vcpuid, uint64_t rip) { struct vm_exit *vmexit; vmexit = vm_exitinfo(vm, vcpuid); vmexit->rip = rip; vmexit->inst_length = 0; vmexit->exitcode = VM_EXITCODE_REQIDLE; vmm_stat_incr(vm, vcpuid, VMEXIT_REQIDLE, 1); } void vm_exit_astpending(struct vm *vm, int vcpuid, uint64_t rip) { struct vm_exit *vmexit; vmexit = vm_exitinfo(vm, vcpuid); vmexit->rip = rip; vmexit->inst_length = 0; vmexit->exitcode = VM_EXITCODE_BOGUS; vmm_stat_incr(vm, vcpuid, VMEXIT_ASTPENDING, 1); } int vm_run(struct vm *vm, struct vm_run *vmrun) { struct vm_eventinfo evinfo; int error, vcpuid; struct vcpu *vcpu; struct pcb *pcb; uint64_t tscval; struct vm_exit *vme; bool retu, intr_disabled; pmap_t pmap; vcpuid = vmrun->cpuid; - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) return (EINVAL); if (!CPU_ISSET(vcpuid, &vm->active_cpus)) return (EINVAL); if (CPU_ISSET(vcpuid, &vm->suspended_cpus)) return (EINVAL); pmap = vmspace_pmap(vm->vmspace); vcpu = &vm->vcpu[vcpuid]; vme = &vcpu->exitinfo; evinfo.rptr = &vm->rendezvous_func; evinfo.sptr = &vm->suspend; evinfo.iptr = &vcpu->reqidle; restart: critical_enter(); KASSERT(!CPU_ISSET(curcpu, &pmap->pm_active), ("vm_run: absurd pm_active")); tscval = rdtsc(); pcb = PCPU_GET(curpcb); set_pcb_flags(pcb, PCB_FULL_IRET); restore_guest_fpustate(vcpu); vcpu_require_state(vm, vcpuid, VCPU_RUNNING); error = VMRUN(vm->cookie, vcpuid, vcpu->nextrip, pmap, &evinfo); vcpu_require_state(vm, vcpuid, VCPU_FROZEN); save_guest_fpustate(vcpu); vmm_stat_incr(vm, vcpuid, VCPU_TOTAL_RUNTIME, rdtsc() - tscval); critical_exit(); if (error == 0) { retu = false; vcpu->nextrip = vme->rip + vme->inst_length; switch (vme->exitcode) { case VM_EXITCODE_REQIDLE: error = vm_handle_reqidle(vm, vcpuid, &retu); break; case VM_EXITCODE_SUSPENDED: error = vm_handle_suspend(vm, vcpuid, &retu); break; case VM_EXITCODE_IOAPIC_EOI: vioapic_process_eoi(vm, vcpuid, vme->u.ioapic_eoi.vector); break; case VM_EXITCODE_RENDEZVOUS: vm_handle_rendezvous(vm, vcpuid); error = 0; break; case VM_EXITCODE_HLT: intr_disabled = ((vme->u.hlt.rflags & PSL_I) == 0); error = vm_handle_hlt(vm, vcpuid, intr_disabled, &retu); break; case VM_EXITCODE_PAGING: error = vm_handle_paging(vm, vcpuid, &retu); break; case VM_EXITCODE_INST_EMUL: error = vm_handle_inst_emul(vm, vcpuid, &retu); break; case VM_EXITCODE_INOUT: case VM_EXITCODE_INOUT_STR: error = vm_handle_inout(vm, vcpuid, vme, &retu); break; case 
VM_EXITCODE_MONITOR: case VM_EXITCODE_MWAIT: case VM_EXITCODE_VMINSN: vm_inject_ud(vm, vcpuid); break; default: retu = true; /* handled in userland */ break; } } if (error == 0 && retu == false) goto restart; VCPU_CTR2(vm, vcpuid, "retu %d/%d", error, vme->exitcode); /* copy the exit information */ bcopy(vme, &vmrun->vm_exit, sizeof(struct vm_exit)); return (error); } int vm_restart_instruction(void *arg, int vcpuid) { struct vm *vm; struct vcpu *vcpu; enum vcpu_state state; uint64_t rip; int error; vm = arg; - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) return (EINVAL); vcpu = &vm->vcpu[vcpuid]; state = vcpu_get_state(vm, vcpuid, NULL); if (state == VCPU_RUNNING) { /* * When a vcpu is "running" the next instruction is determined * by adding 'rip' and 'inst_length' in the vcpu's 'exitinfo'. * Thus setting 'inst_length' to zero will cause the current * instruction to be restarted. */ vcpu->exitinfo.inst_length = 0; VCPU_CTR1(vm, vcpuid, "restarting instruction at %#lx by " "setting inst_length to zero", vcpu->exitinfo.rip); } else if (state == VCPU_FROZEN) { /* * When a vcpu is "frozen" it is outside the critical section * around VMRUN() and 'nextrip' points to the next instruction. * Thus instruction restart is achieved by setting 'nextrip' * to the vcpu's %rip. */ error = vm_get_register(vm, vcpuid, VM_REG_GUEST_RIP, &rip); KASSERT(!error, ("%s: error %d getting rip", __func__, error)); VCPU_CTR2(vm, vcpuid, "restarting instruction by updating " "nextrip from %#lx to %#lx", vcpu->nextrip, rip); vcpu->nextrip = rip; } else { panic("%s: invalid state %d", __func__, state); } return (0); } int vm_exit_intinfo(struct vm *vm, int vcpuid, uint64_t info) { struct vcpu *vcpu; int type, vector; - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) return (EINVAL); vcpu = &vm->vcpu[vcpuid]; if (info & VM_INTINFO_VALID) { type = info & VM_INTINFO_TYPE; vector = info & 0xff; if (type == VM_INTINFO_NMI && vector != IDT_NMI) return (EINVAL); if (type == VM_INTINFO_HWEXCEPTION && vector >= 32) return (EINVAL); if (info & VM_INTINFO_RSVD) return (EINVAL); } else { info = 0; } VCPU_CTR2(vm, vcpuid, "%s: info1(%#lx)", __func__, info); vcpu->exitintinfo = info; return (0); } enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGEFAULT }; #define IDT_VE 20 /* Virtualization Exception (Intel specific) */ static enum exc_class exception_class(uint64_t info) { int type, vector; KASSERT(info & VM_INTINFO_VALID, ("intinfo must be valid: %#lx", info)); type = info & VM_INTINFO_TYPE; vector = info & 0xff; /* Table 6-4, "Interrupt and Exception Classes", Intel SDM, Vol 3 */ switch (type) { case VM_INTINFO_HWINTR: case VM_INTINFO_SWINTR: case VM_INTINFO_NMI: return (EXC_BENIGN); default: /* * Hardware exception. * * SVM and VT-x use identical type values to represent NMI, * hardware interrupt and software interrupt. * * SVM uses type '3' for all exceptions. VT-x uses type '3' * for exceptions except #BP and #OF. #BP and #OF use a type * value of '5' or '6'. Therefore we don't check for explicit * values of 'type' to classify 'intinfo' into a hardware * exception. 
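/*
 * Illustrative sketch: an event travels as a single 64-bit 'intinfo' word,
 * vector in the low byte, error code in the upper 32 bits, plus type, valid
 * and deliver-errcode bits, as consumed by vm_exit_intinfo() above and
 * produced by vcpu_exception_intinfo() below.  The DEMO_* bit positions are
 * chosen for this sketch only; the kernel's VM_INTINFO_* constants follow
 * the hardware event-injection layout.
 */
#include <stdint.h>

#define	DEMO_INTINFO_VECTOR(v)		((uint64_t)((v) & 0xff))
#define	DEMO_INTINFO_HWEXCEPTION	(3ULL << 8)
#define	DEMO_INTINFO_DEL_ERRCODE	(1ULL << 11)
#define	DEMO_INTINFO_VALID		(1ULL << 31)

static uint64_t
demo_exception_intinfo(int vector, int errcode_valid, uint32_t errcode)
{
	uint64_t info;

	info = DEMO_INTINFO_VECTOR(vector) | DEMO_INTINFO_HWEXCEPTION |
	    DEMO_INTINFO_VALID;
	if (errcode_valid)
		info |= DEMO_INTINFO_DEL_ERRCODE | (uint64_t)errcode << 32;
	return (info);
}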
*/ break; } switch (vector) { case IDT_PF: case IDT_VE: return (EXC_PAGEFAULT); case IDT_DE: case IDT_TS: case IDT_NP: case IDT_SS: case IDT_GP: return (EXC_CONTRIBUTORY); default: return (EXC_BENIGN); } } static int nested_fault(struct vm *vm, int vcpuid, uint64_t info1, uint64_t info2, uint64_t *retinfo) { enum exc_class exc1, exc2; int type1, vector1; KASSERT(info1 & VM_INTINFO_VALID, ("info1 %#lx is not valid", info1)); KASSERT(info2 & VM_INTINFO_VALID, ("info2 %#lx is not valid", info2)); /* * If an exception occurs while attempting to call the double-fault * handler the processor enters shutdown mode (aka triple fault). */ type1 = info1 & VM_INTINFO_TYPE; vector1 = info1 & 0xff; if (type1 == VM_INTINFO_HWEXCEPTION && vector1 == IDT_DF) { VCPU_CTR2(vm, vcpuid, "triple fault: info1(%#lx), info2(%#lx)", info1, info2); vm_suspend(vm, VM_SUSPEND_TRIPLEFAULT); *retinfo = 0; return (0); } /* * Table 6-5 "Conditions for Generating a Double Fault", Intel SDM, Vol3 */ exc1 = exception_class(info1); exc2 = exception_class(info2); if ((exc1 == EXC_CONTRIBUTORY && exc2 == EXC_CONTRIBUTORY) || (exc1 == EXC_PAGEFAULT && exc2 != EXC_BENIGN)) { /* Convert nested fault into a double fault. */ *retinfo = IDT_DF; *retinfo |= VM_INTINFO_VALID | VM_INTINFO_HWEXCEPTION; *retinfo |= VM_INTINFO_DEL_ERRCODE; } else { /* Handle exceptions serially */ *retinfo = info2; } return (1); } static uint64_t vcpu_exception_intinfo(struct vcpu *vcpu) { uint64_t info = 0; if (vcpu->exception_pending) { info = vcpu->exc_vector & 0xff; info |= VM_INTINFO_VALID | VM_INTINFO_HWEXCEPTION; if (vcpu->exc_errcode_valid) { info |= VM_INTINFO_DEL_ERRCODE; info |= (uint64_t)vcpu->exc_errcode << 32; } } return (info); } int vm_entry_intinfo(struct vm *vm, int vcpuid, uint64_t *retinfo) { struct vcpu *vcpu; uint64_t info1, info2; int valid; - KASSERT(vcpuid >= 0 && vcpuid < VM_MAXCPU, ("invalid vcpu %d", vcpuid)); + KASSERT(vcpuid >= 0 && + vcpuid < vm->maxcpus, ("invalid vcpu %d", vcpuid)); vcpu = &vm->vcpu[vcpuid]; info1 = vcpu->exitintinfo; vcpu->exitintinfo = 0; info2 = 0; if (vcpu->exception_pending) { info2 = vcpu_exception_intinfo(vcpu); vcpu->exception_pending = 0; VCPU_CTR2(vm, vcpuid, "Exception %d delivered: %#lx", vcpu->exc_vector, info2); } if ((info1 & VM_INTINFO_VALID) && (info2 & VM_INTINFO_VALID)) { valid = nested_fault(vm, vcpuid, info1, info2, retinfo); } else if (info1 & VM_INTINFO_VALID) { *retinfo = info1; valid = 1; } else if (info2 & VM_INTINFO_VALID) { *retinfo = info2; valid = 1; } else { valid = 0; } if (valid) { VCPU_CTR4(vm, vcpuid, "%s: info1(%#lx), info2(%#lx), " "retinfo(%#lx)", __func__, info1, info2, *retinfo); } return (valid); } int vm_get_intinfo(struct vm *vm, int vcpuid, uint64_t *info1, uint64_t *info2) { struct vcpu *vcpu; - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) return (EINVAL); vcpu = &vm->vcpu[vcpuid]; *info1 = vcpu->exitintinfo; *info2 = vcpu_exception_intinfo(vcpu); return (0); } int vm_inject_exception(struct vm *vm, int vcpuid, int vector, int errcode_valid, uint32_t errcode, int restart_instruction) { struct vcpu *vcpu; uint64_t regval; int error; - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) return (EINVAL); if (vector < 0 || vector >= 32) return (EINVAL); /* * A double fault exception should never be injected directly into * the guest. It is a derived exception that results from specific * combinations of nested faults. 
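/*
 * Illustrative sketch (hypothetical demo_* names): nested_fault() above
 * applies the rule from Table 6-5 of the Intel SDM, Vol 3: two contributory
 * exceptions back to back, or a page fault followed by anything that is not
 * benign, collapse into a double fault; everything else is delivered
 * serially.
 */
#include <stdbool.h>

enum demo_exc_class { DEMO_BENIGN, DEMO_CONTRIBUTORY, DEMO_PAGEFAULT };

static bool
demo_promote_to_double_fault(enum demo_exc_class first,
    enum demo_exc_class second)
{
	return ((first == DEMO_CONTRIBUTORY && second == DEMO_CONTRIBUTORY) ||
	    (first == DEMO_PAGEFAULT && second != DEMO_BENIGN));
}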
*/ if (vector == IDT_DF) return (EINVAL); vcpu = &vm->vcpu[vcpuid]; if (vcpu->exception_pending) { VCPU_CTR2(vm, vcpuid, "Unable to inject exception %d due to " "pending exception %d", vector, vcpu->exc_vector); return (EBUSY); } if (errcode_valid) { /* * Exceptions don't deliver an error code in real mode. */ error = vm_get_register(vm, vcpuid, VM_REG_GUEST_CR0, ®val); KASSERT(!error, ("%s: error %d getting CR0", __func__, error)); if (!(regval & CR0_PE)) errcode_valid = 0; } /* * From section 26.6.1 "Interruptibility State" in Intel SDM: * * Event blocking by "STI" or "MOV SS" is cleared after guest executes * one instruction or incurs an exception. */ error = vm_set_register(vm, vcpuid, VM_REG_GUEST_INTR_SHADOW, 0); KASSERT(error == 0, ("%s: error %d clearing interrupt shadow", __func__, error)); if (restart_instruction) vm_restart_instruction(vm, vcpuid); vcpu->exception_pending = 1; vcpu->exc_vector = vector; vcpu->exc_errcode = errcode; vcpu->exc_errcode_valid = errcode_valid; VCPU_CTR1(vm, vcpuid, "Exception %d pending", vector); return (0); } void vm_inject_fault(void *vmarg, int vcpuid, int vector, int errcode_valid, int errcode) { struct vm *vm; int error, restart_instruction; vm = vmarg; restart_instruction = 1; error = vm_inject_exception(vm, vcpuid, vector, errcode_valid, errcode, restart_instruction); KASSERT(error == 0, ("vm_inject_exception error %d", error)); } void vm_inject_pf(void *vmarg, int vcpuid, int error_code, uint64_t cr2) { struct vm *vm; int error; vm = vmarg; VCPU_CTR2(vm, vcpuid, "Injecting page fault: error_code %#x, cr2 %#lx", error_code, cr2); error = vm_set_register(vm, vcpuid, VM_REG_GUEST_CR2, cr2); KASSERT(error == 0, ("vm_set_register(cr2) error %d", error)); vm_inject_fault(vm, vcpuid, IDT_PF, 1, error_code); } static VMM_STAT(VCPU_NMI_COUNT, "number of NMIs delivered to vcpu"); int vm_inject_nmi(struct vm *vm, int vcpuid) { struct vcpu *vcpu; - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) return (EINVAL); vcpu = &vm->vcpu[vcpuid]; vcpu->nmi_pending = 1; vcpu_notify_event(vm, vcpuid, false); return (0); } int vm_nmi_pending(struct vm *vm, int vcpuid) { struct vcpu *vcpu; - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) panic("vm_nmi_pending: invalid vcpuid %d", vcpuid); vcpu = &vm->vcpu[vcpuid]; return (vcpu->nmi_pending); } void vm_nmi_clear(struct vm *vm, int vcpuid) { struct vcpu *vcpu; - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) panic("vm_nmi_pending: invalid vcpuid %d", vcpuid); vcpu = &vm->vcpu[vcpuid]; if (vcpu->nmi_pending == 0) panic("vm_nmi_clear: inconsistent nmi_pending state"); vcpu->nmi_pending = 0; vmm_stat_incr(vm, vcpuid, VCPU_NMI_COUNT, 1); } static VMM_STAT(VCPU_EXTINT_COUNT, "number of ExtINTs delivered to vcpu"); int vm_inject_extint(struct vm *vm, int vcpuid) { struct vcpu *vcpu; - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) return (EINVAL); vcpu = &vm->vcpu[vcpuid]; vcpu->extint_pending = 1; vcpu_notify_event(vm, vcpuid, false); return (0); } int vm_extint_pending(struct vm *vm, int vcpuid) { struct vcpu *vcpu; - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) panic("vm_extint_pending: invalid vcpuid %d", vcpuid); vcpu = &vm->vcpu[vcpuid]; return (vcpu->extint_pending); } void vm_extint_clear(struct vm *vm, int vcpuid) { struct vcpu *vcpu; - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) panic("vm_extint_pending: 
invalid vcpuid %d", vcpuid); vcpu = &vm->vcpu[vcpuid]; if (vcpu->extint_pending == 0) panic("vm_extint_clear: inconsistent extint_pending state"); vcpu->extint_pending = 0; vmm_stat_incr(vm, vcpuid, VCPU_EXTINT_COUNT, 1); } int vm_get_capability(struct vm *vm, int vcpu, int type, int *retval) { - if (vcpu < 0 || vcpu >= VM_MAXCPU) + if (vcpu < 0 || vcpu >= vm->maxcpus) return (EINVAL); if (type < 0 || type >= VM_CAP_MAX) return (EINVAL); return (VMGETCAP(vm->cookie, vcpu, type, retval)); } int vm_set_capability(struct vm *vm, int vcpu, int type, int val) { - if (vcpu < 0 || vcpu >= VM_MAXCPU) + if (vcpu < 0 || vcpu >= vm->maxcpus) return (EINVAL); if (type < 0 || type >= VM_CAP_MAX) return (EINVAL); return (VMSETCAP(vm->cookie, vcpu, type, val)); } struct vlapic * vm_lapic(struct vm *vm, int cpu) { return (vm->vcpu[cpu].vlapic); } struct vioapic * vm_ioapic(struct vm *vm) { return (vm->vioapic); } struct vhpet * vm_hpet(struct vm *vm) { return (vm->vhpet); } boolean_t vmm_is_pptdev(int bus, int slot, int func) { int found, i, n; int b, s, f; char *val, *cp, *cp2; /* * XXX * The length of an environment variable is limited to 128 bytes which * puts an upper limit on the number of passthru devices that may be * specified using a single environment variable. * * Work around this by scanning multiple environment variable * names instead of a single one - yuck! */ const char *names[] = { "pptdevs", "pptdevs2", "pptdevs3", NULL }; /* set pptdevs="1/2/3 4/5/6 7/8/9 10/11/12" */ found = 0; for (i = 0; names[i] != NULL && !found; i++) { cp = val = kern_getenv(names[i]); while (cp != NULL && *cp != '\0') { if ((cp2 = strchr(cp, ' ')) != NULL) *cp2 = '\0'; n = sscanf(cp, "%d/%d/%d", &b, &s, &f); if (n == 3 && bus == b && slot == s && func == f) { found = 1; break; } if (cp2 != NULL) *cp2++ = ' '; cp = cp2; } freeenv(val); } return (found); } void * vm_iommu_domain(struct vm *vm) { return (vm->iommu); } int vcpu_set_state(struct vm *vm, int vcpuid, enum vcpu_state newstate, bool from_idle) { int error; struct vcpu *vcpu; - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) panic("vm_set_run_state: invalid vcpuid %d", vcpuid); vcpu = &vm->vcpu[vcpuid]; vcpu_lock(vcpu); error = vcpu_set_state_locked(vm, vcpuid, newstate, from_idle); vcpu_unlock(vcpu); return (error); } enum vcpu_state vcpu_get_state(struct vm *vm, int vcpuid, int *hostcpu) { struct vcpu *vcpu; enum vcpu_state state; - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) panic("vm_get_run_state: invalid vcpuid %d", vcpuid); vcpu = &vm->vcpu[vcpuid]; vcpu_lock(vcpu); state = vcpu->state; if (hostcpu != NULL) *hostcpu = vcpu->hostcpu; vcpu_unlock(vcpu); return (state); } int vm_activate_cpu(struct vm *vm, int vcpuid) { - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) return (EINVAL); if (CPU_ISSET(vcpuid, &vm->active_cpus)) return (EBUSY); VCPU_CTR0(vm, vcpuid, "activated"); CPU_SET_ATOMIC(vcpuid, &vm->active_cpus); return (0); } int vm_suspend_cpu(struct vm *vm, int vcpuid) { int i; - if (vcpuid < -1 || vcpuid >= VM_MAXCPU) + if (vcpuid < -1 || vcpuid >= vm->maxcpus) return (EINVAL); if (vcpuid == -1) { vm->debug_cpus = vm->active_cpus; - for (i = 0; i < VM_MAXCPU; i++) { + for (i = 0; i < vm->maxcpus; i++) { if (CPU_ISSET(i, &vm->active_cpus)) vcpu_notify_event(vm, i, false); } } else { if (!CPU_ISSET(vcpuid, &vm->active_cpus)) return (EINVAL); CPU_SET_ATOMIC(vcpuid, &vm->debug_cpus); vcpu_notify_event(vm, vcpuid, false); } return (0); } int 
vm_resume_cpu(struct vm *vm, int vcpuid) { - if (vcpuid < -1 || vcpuid >= VM_MAXCPU) + if (vcpuid < -1 || vcpuid >= vm->maxcpus) return (EINVAL); if (vcpuid == -1) { CPU_ZERO(&vm->debug_cpus); } else { if (!CPU_ISSET(vcpuid, &vm->debug_cpus)) return (EINVAL); CPU_CLR_ATOMIC(vcpuid, &vm->debug_cpus); } return (0); } int vcpu_debugged(struct vm *vm, int vcpuid) { return (CPU_ISSET(vcpuid, &vm->debug_cpus)); } cpuset_t vm_active_cpus(struct vm *vm) { return (vm->active_cpus); } cpuset_t vm_debug_cpus(struct vm *vm) { return (vm->debug_cpus); } cpuset_t vm_suspended_cpus(struct vm *vm) { return (vm->suspended_cpus); } void * vcpu_stats(struct vm *vm, int vcpuid) { return (vm->vcpu[vcpuid].stats); } int vm_get_x2apic_state(struct vm *vm, int vcpuid, enum x2apic_state *state) { - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) return (EINVAL); *state = vm->vcpu[vcpuid].x2apic_state; return (0); } int vm_set_x2apic_state(struct vm *vm, int vcpuid, enum x2apic_state state) { - if (vcpuid < 0 || vcpuid >= VM_MAXCPU) + if (vcpuid < 0 || vcpuid >= vm->maxcpus) return (EINVAL); if (state >= X2APIC_STATE_LAST) return (EINVAL); vm->vcpu[vcpuid].x2apic_state = state; vlapic_set_x2apic_state(vm, vcpuid, state); return (0); } /* * This function is called to ensure that a vcpu "sees" a pending event * as soon as possible: * - If the vcpu thread is sleeping then it is woken up. * - If the vcpu is running on a different host_cpu then an IPI will be directed * to the host_cpu to cause the vcpu to trap into the hypervisor. */ static void vcpu_notify_event_locked(struct vcpu *vcpu, bool lapic_intr) { int hostcpu; hostcpu = vcpu->hostcpu; if (vcpu->state == VCPU_RUNNING) { KASSERT(hostcpu != NOCPU, ("vcpu running on invalid hostcpu")); if (hostcpu != curcpu) { if (lapic_intr) { vlapic_post_intr(vcpu->vlapic, hostcpu, vmm_ipinum); } else { ipi_cpu(hostcpu, vmm_ipinum); } } else { /* * If the 'vcpu' is running on 'curcpu' then it must * be sending a notification to itself (e.g. SELF_IPI). * The pending event will be picked up when the vcpu * transitions back to guest context. */ } } else { KASSERT(hostcpu == NOCPU, ("vcpu state %d not consistent " "with hostcpu %d", vcpu->state, hostcpu)); if (vcpu->state == VCPU_SLEEPING) wakeup_one(vcpu); } } void vcpu_notify_event(struct vm *vm, int vcpuid, bool lapic_intr) { struct vcpu *vcpu = &vm->vcpu[vcpuid]; vcpu_lock(vcpu); vcpu_notify_event_locked(vcpu, lapic_intr); vcpu_unlock(vcpu); } struct vmspace * vm_get_vmspace(struct vm *vm) { return (vm->vmspace); } int vm_apicid2vcpuid(struct vm *vm, int apicid) { /* * XXX apic id is assumed to be numerically identical to vcpu id */ return (apicid); } void vm_smp_rendezvous(struct vm *vm, int vcpuid, cpuset_t dest, vm_rendezvous_func_t func, void *arg) { int i; /* * Enforce that this function is called without any locks */ WITNESS_WARN(WARN_PANIC, NULL, "vm_smp_rendezvous"); - KASSERT(vcpuid == -1 || (vcpuid >= 0 && vcpuid < VM_MAXCPU), + KASSERT(vcpuid == -1 || (vcpuid >= 0 && vcpuid < vm->maxcpus), ("vm_smp_rendezvous: invalid vcpuid %d", vcpuid)); restart: mtx_lock(&vm->rendezvous_mtx); if (vm->rendezvous_func != NULL) { /* * If a rendezvous is already in progress then we need to * call the rendezvous handler in case this 'vcpuid' is one * of the targets of the rendezvous. 
*/ RENDEZVOUS_CTR0(vm, vcpuid, "Rendezvous already in progress"); mtx_unlock(&vm->rendezvous_mtx); vm_handle_rendezvous(vm, vcpuid); goto restart; } KASSERT(vm->rendezvous_func == NULL, ("vm_smp_rendezvous: previous " "rendezvous is still in progress")); RENDEZVOUS_CTR0(vm, vcpuid, "Initiating rendezvous"); vm->rendezvous_req_cpus = dest; CPU_ZERO(&vm->rendezvous_done_cpus); vm->rendezvous_arg = arg; vm_set_rendezvous_func(vm, func); mtx_unlock(&vm->rendezvous_mtx); /* * Wake up any sleeping vcpus and trigger a VM-exit in any running * vcpus so they handle the rendezvous as soon as possible. */ - for (i = 0; i < VM_MAXCPU; i++) { + for (i = 0; i < vm->maxcpus; i++) { if (CPU_ISSET(i, &dest)) vcpu_notify_event(vm, i, false); } vm_handle_rendezvous(vm, vcpuid); } struct vatpic * vm_atpic(struct vm *vm) { return (vm->vatpic); } struct vatpit * vm_atpit(struct vm *vm) { return (vm->vatpit); } struct vpmtmr * vm_pmtmr(struct vm *vm) { return (vm->vpmtmr); } struct vrtc * vm_rtc(struct vm *vm) { return (vm->vrtc); } enum vm_reg_name vm_segment_name(int seg) { static enum vm_reg_name seg_names[] = { VM_REG_GUEST_ES, VM_REG_GUEST_CS, VM_REG_GUEST_SS, VM_REG_GUEST_DS, VM_REG_GUEST_FS, VM_REG_GUEST_GS }; KASSERT(seg >= 0 && seg < nitems(seg_names), ("%s: invalid segment encoding %d", __func__, seg)); return (seg_names[seg]); } void vm_copy_teardown(struct vm *vm, int vcpuid, struct vm_copyinfo *copyinfo, int num_copyinfo) { int idx; for (idx = 0; idx < num_copyinfo; idx++) { if (copyinfo[idx].cookie != NULL) vm_gpa_release(copyinfo[idx].cookie); } bzero(copyinfo, num_copyinfo * sizeof(struct vm_copyinfo)); } int vm_copy_setup(struct vm *vm, int vcpuid, struct vm_guest_paging *paging, uint64_t gla, size_t len, int prot, struct vm_copyinfo *copyinfo, int num_copyinfo, int *fault) { int error, idx, nused; size_t n, off, remaining; void *hva, *cookie; uint64_t gpa; bzero(copyinfo, sizeof(struct vm_copyinfo) * num_copyinfo); nused = 0; remaining = len; while (remaining > 0) { KASSERT(nused < num_copyinfo, ("insufficient vm_copyinfo")); error = vm_gla2gpa(vm, vcpuid, paging, gla, prot, &gpa, fault); if (error || *fault) return (error); off = gpa & PAGE_MASK; n = min(remaining, PAGE_SIZE - off); copyinfo[nused].gpa = gpa; copyinfo[nused].len = n; remaining -= n; gla += n; nused++; } for (idx = 0; idx < nused; idx++) { hva = vm_gpa_hold(vm, vcpuid, copyinfo[idx].gpa, copyinfo[idx].len, prot, &cookie); if (hva == NULL) break; copyinfo[idx].hva = hva; copyinfo[idx].cookie = cookie; } if (idx != nused) { vm_copy_teardown(vm, vcpuid, copyinfo, num_copyinfo); return (EFAULT); } else { *fault = 0; return (0); } } void vm_copyin(struct vm *vm, int vcpuid, struct vm_copyinfo *copyinfo, void *kaddr, size_t len) { char *dst; int idx; dst = kaddr; idx = 0; while (len > 0) { bcopy(copyinfo[idx].hva, dst, copyinfo[idx].len); len -= copyinfo[idx].len; dst += copyinfo[idx].len; idx++; } } void vm_copyout(struct vm *vm, int vcpuid, const void *kaddr, struct vm_copyinfo *copyinfo, size_t len) { const char *src; int idx; src = kaddr; idx = 0; while (len > 0) { bcopy(src, copyinfo[idx].hva, copyinfo[idx].len); len -= copyinfo[idx].len; src += copyinfo[idx].len; idx++; } } /* * Return the amount of in-use and wired memory for the VM. 
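/*
 * Illustrative sketch (hypothetical demo_* names): vm_copy_setup() above
 * splits a guest linear range into chunks that never cross a page boundary,
 * so every chunk can be translated and held with one vm_gpa_hold() call.
 * The same splitting in isolation; 'lens' is assumed large enough, which the
 * kernel asserts with "insufficient vm_copyinfo".
 */
#include <stddef.h>
#include <stdint.h>

#define	DEMO_PAGE_SIZE	4096UL
#define	DEMO_PAGE_MASK	(DEMO_PAGE_SIZE - 1)

static size_t
demo_split_by_page(uint64_t gla, size_t len, size_t *lens)
{
	size_t n, nchunks, off, remaining;

	nchunks = 0;
	remaining = len;
	while (remaining > 0) {
		off = gla & DEMO_PAGE_MASK;	/* page offset of this chunk */
		n = remaining < DEMO_PAGE_SIZE - off ?
		    remaining : DEMO_PAGE_SIZE - off;
		lens[nchunks++] = n;
		gla += n;
		remaining -= n;
	}
	return (nchunks);
}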
Since * these are global stats, only return the values with for vCPU 0 */ VMM_STAT_DECLARE(VMM_MEM_RESIDENT); VMM_STAT_DECLARE(VMM_MEM_WIRED); static void vm_get_rescnt(struct vm *vm, int vcpu, struct vmm_stat_type *stat) { if (vcpu == 0) { vmm_stat_set(vm, vcpu, VMM_MEM_RESIDENT, PAGE_SIZE * vmspace_resident_count(vm->vmspace)); } } static void vm_get_wiredcnt(struct vm *vm, int vcpu, struct vmm_stat_type *stat) { if (vcpu == 0) { vmm_stat_set(vm, vcpu, VMM_MEM_WIRED, PAGE_SIZE * pmap_wired_count(vmspace_pmap(vm->vmspace))); } } VMM_STAT_FUNC(VMM_MEM_RESIDENT, "Resident memory", vm_get_rescnt); VMM_STAT_FUNC(VMM_MEM_WIRED, "Wired memory", vm_get_wiredcnt); Index: user/ngie/bug-237403/sys/amd64/vmm/vmm_dev.c =================================================================== --- user/ngie/bug-237403/sys/amd64/vmm/vmm_dev.c (revision 346925) +++ user/ngie/bug-237403/sys/amd64/vmm/vmm_dev.c (revision 346926) @@ -1,1131 +1,1142 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2011 NetApp, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $FreeBSD$ */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "vmm_lapic.h" #include "vmm_stat.h" #include "vmm_mem.h" #include "io/ppt.h" #include "io/vatpic.h" #include "io/vioapic.h" #include "io/vhpet.h" #include "io/vrtc.h" struct devmem_softc { int segid; char *name; struct cdev *cdev; struct vmmdev_softc *sc; SLIST_ENTRY(devmem_softc) link; }; struct vmmdev_softc { struct vm *vm; /* vm instance cookie */ struct cdev *cdev; SLIST_ENTRY(vmmdev_softc) link; SLIST_HEAD(, devmem_softc) devmem; int flags; }; #define VSC_LINKED 0x01 static SLIST_HEAD(, vmmdev_softc) head; static unsigned pr_allow_flag; static struct mtx vmmdev_mtx; static MALLOC_DEFINE(M_VMMDEV, "vmmdev", "vmmdev"); SYSCTL_DECL(_hw_vmm); static int vmm_priv_check(struct ucred *ucred); static int devmem_create_cdev(const char *vmname, int id, char *devmem); static void devmem_destroy(void *arg); static int vmm_priv_check(struct ucred *ucred) { if (jailed(ucred) && !(ucred->cr_prison->pr_allow & pr_allow_flag)) return (EPERM); return (0); } static int vcpu_lock_one(struct vmmdev_softc *sc, int vcpu) { int error; - if (vcpu < 0 || vcpu >= VM_MAXCPU) + if (vcpu < 0 || vcpu >= vm_get_maxcpus(sc->vm)) return (EINVAL); error = vcpu_set_state(sc->vm, vcpu, VCPU_FROZEN, true); return (error); } static void vcpu_unlock_one(struct vmmdev_softc *sc, int vcpu) { enum vcpu_state state; state = vcpu_get_state(sc->vm, vcpu, NULL); if (state != VCPU_FROZEN) { panic("vcpu %s(%d) has invalid state %d", vm_name(sc->vm), vcpu, state); } vcpu_set_state(sc->vm, vcpu, VCPU_IDLE, false); } static int vcpu_lock_all(struct vmmdev_softc *sc) { int error, vcpu; + uint16_t maxcpus; - for (vcpu = 0; vcpu < VM_MAXCPU; vcpu++) { + maxcpus = vm_get_maxcpus(sc->vm); + for (vcpu = 0; vcpu < maxcpus; vcpu++) { error = vcpu_lock_one(sc, vcpu); if (error) break; } if (error) { while (--vcpu >= 0) vcpu_unlock_one(sc, vcpu); } return (error); } static void vcpu_unlock_all(struct vmmdev_softc *sc) { int vcpu; + uint16_t maxcpus; - for (vcpu = 0; vcpu < VM_MAXCPU; vcpu++) + maxcpus = vm_get_maxcpus(sc->vm); + for (vcpu = 0; vcpu < maxcpus; vcpu++) vcpu_unlock_one(sc, vcpu); } static struct vmmdev_softc * vmmdev_lookup(const char *name) { struct vmmdev_softc *sc; #ifdef notyet /* XXX kernel is not compiled with invariants */ mtx_assert(&vmmdev_mtx, MA_OWNED); #endif SLIST_FOREACH(sc, &head, link) { if (strcmp(name, vm_name(sc->vm)) == 0) break; } return (sc); } static struct vmmdev_softc * vmmdev_lookup2(struct cdev *cdev) { return (cdev->si_drv1); } static int vmmdev_rw(struct cdev *cdev, struct uio *uio, int flags) { int error, off, c, prot; vm_paddr_t gpa, maxaddr; void *hpa, *cookie; struct vmmdev_softc *sc; + uint16_t lastcpu; error = vmm_priv_check(curthread->td_ucred); if (error) return (error); sc = vmmdev_lookup2(cdev); if (sc == NULL) return (ENXIO); /* * Get a read lock on the guest memory map by freezing any vcpu. */ - error = vcpu_lock_one(sc, VM_MAXCPU - 1); + lastcpu = vm_get_maxcpus(sc->vm) - 1; + error = vcpu_lock_one(sc, lastcpu); if (error) return (error); prot = (uio->uio_rw == UIO_WRITE ? VM_PROT_WRITE : VM_PROT_READ); maxaddr = vmm_sysmem_maxaddr(sc->vm); while (uio->uio_resid > 0 && error == 0) { gpa = uio->uio_offset; off = gpa & PAGE_MASK; c = min(uio->uio_resid, PAGE_SIZE - off); /* * The VM has a hole in its physical memory map. 
If we want to * use 'dd' to inspect memory beyond the hole we need to * provide bogus data for memory that lies in the hole. * * Since this device does not support lseek(2), dd(1) will * read(2) blocks of data to simulate the lseek(2). */ - hpa = vm_gpa_hold(sc->vm, VM_MAXCPU - 1, gpa, c, prot, &cookie); + hpa = vm_gpa_hold(sc->vm, lastcpu, gpa, c, + prot, &cookie); if (hpa == NULL) { if (uio->uio_rw == UIO_READ && gpa < maxaddr) error = uiomove(__DECONST(void *, zero_region), c, uio); else error = EFAULT; } else { error = uiomove(hpa, c, uio); vm_gpa_release(cookie); } } - vcpu_unlock_one(sc, VM_MAXCPU - 1); + vcpu_unlock_one(sc, lastcpu); return (error); } CTASSERT(sizeof(((struct vm_memseg *)0)->name) >= SPECNAMELEN + 1); static int get_memseg(struct vmmdev_softc *sc, struct vm_memseg *mseg) { struct devmem_softc *dsc; int error; bool sysmem; error = vm_get_memseg(sc->vm, mseg->segid, &mseg->len, &sysmem, NULL); if (error || mseg->len == 0) return (error); if (!sysmem) { SLIST_FOREACH(dsc, &sc->devmem, link) { if (dsc->segid == mseg->segid) break; } KASSERT(dsc != NULL, ("%s: devmem segment %d not found", __func__, mseg->segid)); error = copystr(dsc->name, mseg->name, SPECNAMELEN + 1, NULL); } else { bzero(mseg->name, sizeof(mseg->name)); } return (error); } static int alloc_memseg(struct vmmdev_softc *sc, struct vm_memseg *mseg) { char *name; int error; bool sysmem; error = 0; name = NULL; sysmem = true; if (VM_MEMSEG_NAME(mseg)) { sysmem = false; name = malloc(SPECNAMELEN + 1, M_VMMDEV, M_WAITOK); error = copystr(mseg->name, name, SPECNAMELEN + 1, 0); if (error) goto done; } error = vm_alloc_memseg(sc->vm, mseg->segid, mseg->len, sysmem); if (error) goto done; if (VM_MEMSEG_NAME(mseg)) { error = devmem_create_cdev(vm_name(sc->vm), mseg->segid, name); if (error) vm_free_memseg(sc->vm, mseg->segid); else name = NULL; /* freed when 'cdev' is destroyed */ } done: free(name, M_VMMDEV); return (error); } static int vm_get_register_set(struct vm *vm, int vcpu, unsigned int count, int *regnum, uint64_t *regval) { int error, i; error = 0; for (i = 0; i < count; i++) { error = vm_get_register(vm, vcpu, regnum[i], ®val[i]); if (error) break; } return (error); } static int vm_set_register_set(struct vm *vm, int vcpu, unsigned int count, int *regnum, uint64_t *regval) { int error, i; error = 0; for (i = 0; i < count; i++) { error = vm_set_register(vm, vcpu, regnum[i], regval[i]); if (error) break; } return (error); } static int vmmdev_ioctl(struct cdev *cdev, u_long cmd, caddr_t data, int fflag, struct thread *td) { int error, vcpu, state_changed, size; cpuset_t *cpuset; struct vmmdev_softc *sc; struct vm_register *vmreg; struct vm_seg_desc *vmsegdesc; struct vm_register_set *vmregset; struct vm_run *vmrun; struct vm_exception *vmexc; struct vm_lapic_irq *vmirq; struct vm_lapic_msi *vmmsi; struct vm_ioapic_irq *ioapic_irq; struct vm_isa_irq *isa_irq; struct vm_isa_irq_trigger *isa_irq_trigger; struct vm_capability *vmcap; struct vm_pptdev *pptdev; struct vm_pptdev_mmio *pptmmio; struct vm_pptdev_msi *pptmsi; struct vm_pptdev_msix *pptmsix; struct vm_nmi *vmnmi; struct vm_stats *vmstats; struct vm_stat_desc *statdesc; struct vm_x2apic *x2apic; struct vm_gpa_pte *gpapte; struct vm_suspend *vmsuspend; struct vm_gla2gpa *gg; struct vm_activate_cpu *vac; struct vm_cpuset *vm_cpuset; struct vm_intinfo *vmii; struct vm_rtc_time *rtctime; struct vm_rtc_data *rtcdata; struct vm_memmap *mm; struct vm_cpu_topology *topology; uint64_t *regvals; int *regnums; error = vmm_priv_check(curthread->td_ucred); if 
(error) return (error); sc = vmmdev_lookup2(cdev); if (sc == NULL) return (ENXIO); vcpu = -1; state_changed = 0; /* * Some VMM ioctls can operate only on vcpus that are not running. */ switch (cmd) { case VM_RUN: case VM_GET_REGISTER: case VM_SET_REGISTER: case VM_GET_SEGMENT_DESCRIPTOR: case VM_SET_SEGMENT_DESCRIPTOR: case VM_GET_REGISTER_SET: case VM_SET_REGISTER_SET: case VM_INJECT_EXCEPTION: case VM_GET_CAPABILITY: case VM_SET_CAPABILITY: case VM_PPTDEV_MSI: case VM_PPTDEV_MSIX: case VM_SET_X2APIC_STATE: case VM_GLA2GPA: case VM_GLA2GPA_NOFAULT: case VM_ACTIVATE_CPU: case VM_SET_INTINFO: case VM_GET_INTINFO: case VM_RESTART_INSTRUCTION: /* * XXX fragile, handle with care * Assumes that the first field of the ioctl data is the vcpu. */ vcpu = *(int *)data; error = vcpu_lock_one(sc, vcpu); if (error) goto done; state_changed = 1; break; case VM_MAP_PPTDEV_MMIO: case VM_BIND_PPTDEV: case VM_UNBIND_PPTDEV: case VM_ALLOC_MEMSEG: case VM_MMAP_MEMSEG: case VM_REINIT: /* * ioctls that operate on the entire virtual machine must * prevent all vcpus from running. */ error = vcpu_lock_all(sc); if (error) goto done; state_changed = 2; break; case VM_GET_MEMSEG: case VM_MMAP_GETNEXT: /* * Lock a vcpu to make sure that the memory map cannot be * modified while it is being inspected. */ - vcpu = VM_MAXCPU - 1; + vcpu = vm_get_maxcpus(sc->vm) - 1; error = vcpu_lock_one(sc, vcpu); if (error) goto done; state_changed = 1; break; default: break; } switch(cmd) { case VM_RUN: vmrun = (struct vm_run *)data; error = vm_run(sc->vm, vmrun); break; case VM_SUSPEND: vmsuspend = (struct vm_suspend *)data; error = vm_suspend(sc->vm, vmsuspend->how); break; case VM_REINIT: error = vm_reinit(sc->vm); break; case VM_STAT_DESC: { statdesc = (struct vm_stat_desc *)data; error = vmm_stat_desc_copy(statdesc->index, statdesc->desc, sizeof(statdesc->desc)); break; } case VM_STATS: { CTASSERT(MAX_VM_STATS >= MAX_VMM_STAT_ELEMS); vmstats = (struct vm_stats *)data; getmicrotime(&vmstats->tv); error = vmm_stat_copy(sc->vm, vmstats->cpuid, &vmstats->num_entries, vmstats->statbuf); break; } case VM_PPTDEV_MSI: pptmsi = (struct vm_pptdev_msi *)data; error = ppt_setup_msi(sc->vm, pptmsi->vcpu, pptmsi->bus, pptmsi->slot, pptmsi->func, pptmsi->addr, pptmsi->msg, pptmsi->numvec); break; case VM_PPTDEV_MSIX: pptmsix = (struct vm_pptdev_msix *)data; error = ppt_setup_msix(sc->vm, pptmsix->vcpu, pptmsix->bus, pptmsix->slot, pptmsix->func, pptmsix->idx, pptmsix->addr, pptmsix->msg, pptmsix->vector_control); break; case VM_MAP_PPTDEV_MMIO: pptmmio = (struct vm_pptdev_mmio *)data; error = ppt_map_mmio(sc->vm, pptmmio->bus, pptmmio->slot, pptmmio->func, pptmmio->gpa, pptmmio->len, pptmmio->hpa); break; case VM_BIND_PPTDEV: pptdev = (struct vm_pptdev *)data; error = vm_assign_pptdev(sc->vm, pptdev->bus, pptdev->slot, pptdev->func); break; case VM_UNBIND_PPTDEV: pptdev = (struct vm_pptdev *)data; error = vm_unassign_pptdev(sc->vm, pptdev->bus, pptdev->slot, pptdev->func); break; case VM_INJECT_EXCEPTION: vmexc = (struct vm_exception *)data; error = vm_inject_exception(sc->vm, vmexc->cpuid, vmexc->vector, vmexc->error_code_valid, vmexc->error_code, vmexc->restart_instruction); break; case VM_INJECT_NMI: vmnmi = (struct vm_nmi *)data; error = vm_inject_nmi(sc->vm, vmnmi->cpuid); break; case VM_LAPIC_IRQ: vmirq = (struct vm_lapic_irq *)data; error = lapic_intr_edge(sc->vm, vmirq->cpuid, vmirq->vector); break; case VM_LAPIC_LOCAL_IRQ: vmirq = (struct vm_lapic_irq *)data; error = lapic_set_local_intr(sc->vm, vmirq->cpuid, vmirq->vector); break; 
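/*
 * [Editor's illustration, not part of revision 346926.]  The per-vcpu
 * ioctl cases above rely on the locking scheme described at the top of
 * vmmdev_ioctl(): the vcpu id is read from the first field of the ioctl
 * argument and frozen with vcpu_lock_one().  After this change the bound
 * on valid vcpu ids comes from the per-VM topology rather than the
 * compile-time VM_MAXCPU, mirroring the new body of vcpu_lock_one():
 *
 *	uint16_t maxcpus;
 *
 *	maxcpus = vm_get_maxcpus(sc->vm);
 *	if (vcpu < 0 || vcpu >= maxcpus)
 *		return (EINVAL);
 *	error = vcpu_set_state(sc->vm, vcpu, VCPU_FROZEN, true);
 *
 * Ioctls that only inspect the guest memory map freeze the last vcpu,
 * which is now vm_get_maxcpus(sc->vm) - 1 rather than VM_MAXCPU - 1.
 */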
case VM_LAPIC_MSI: vmmsi = (struct vm_lapic_msi *)data; error = lapic_intr_msi(sc->vm, vmmsi->addr, vmmsi->msg); break; case VM_IOAPIC_ASSERT_IRQ: ioapic_irq = (struct vm_ioapic_irq *)data; error = vioapic_assert_irq(sc->vm, ioapic_irq->irq); break; case VM_IOAPIC_DEASSERT_IRQ: ioapic_irq = (struct vm_ioapic_irq *)data; error = vioapic_deassert_irq(sc->vm, ioapic_irq->irq); break; case VM_IOAPIC_PULSE_IRQ: ioapic_irq = (struct vm_ioapic_irq *)data; error = vioapic_pulse_irq(sc->vm, ioapic_irq->irq); break; case VM_IOAPIC_PINCOUNT: *(int *)data = vioapic_pincount(sc->vm); break; case VM_ISA_ASSERT_IRQ: isa_irq = (struct vm_isa_irq *)data; error = vatpic_assert_irq(sc->vm, isa_irq->atpic_irq); if (error == 0 && isa_irq->ioapic_irq != -1) error = vioapic_assert_irq(sc->vm, isa_irq->ioapic_irq); break; case VM_ISA_DEASSERT_IRQ: isa_irq = (struct vm_isa_irq *)data; error = vatpic_deassert_irq(sc->vm, isa_irq->atpic_irq); if (error == 0 && isa_irq->ioapic_irq != -1) error = vioapic_deassert_irq(sc->vm, isa_irq->ioapic_irq); break; case VM_ISA_PULSE_IRQ: isa_irq = (struct vm_isa_irq *)data; error = vatpic_pulse_irq(sc->vm, isa_irq->atpic_irq); if (error == 0 && isa_irq->ioapic_irq != -1) error = vioapic_pulse_irq(sc->vm, isa_irq->ioapic_irq); break; case VM_ISA_SET_IRQ_TRIGGER: isa_irq_trigger = (struct vm_isa_irq_trigger *)data; error = vatpic_set_irq_trigger(sc->vm, isa_irq_trigger->atpic_irq, isa_irq_trigger->trigger); break; case VM_MMAP_GETNEXT: mm = (struct vm_memmap *)data; error = vm_mmap_getnext(sc->vm, &mm->gpa, &mm->segid, &mm->segoff, &mm->len, &mm->prot, &mm->flags); break; case VM_MMAP_MEMSEG: mm = (struct vm_memmap *)data; error = vm_mmap_memseg(sc->vm, mm->gpa, mm->segid, mm->segoff, mm->len, mm->prot, mm->flags); break; case VM_ALLOC_MEMSEG: error = alloc_memseg(sc, (struct vm_memseg *)data); break; case VM_GET_MEMSEG: error = get_memseg(sc, (struct vm_memseg *)data); break; case VM_GET_REGISTER: vmreg = (struct vm_register *)data; error = vm_get_register(sc->vm, vmreg->cpuid, vmreg->regnum, &vmreg->regval); break; case VM_SET_REGISTER: vmreg = (struct vm_register *)data; error = vm_set_register(sc->vm, vmreg->cpuid, vmreg->regnum, vmreg->regval); break; case VM_SET_SEGMENT_DESCRIPTOR: vmsegdesc = (struct vm_seg_desc *)data; error = vm_set_seg_desc(sc->vm, vmsegdesc->cpuid, vmsegdesc->regnum, &vmsegdesc->desc); break; case VM_GET_SEGMENT_DESCRIPTOR: vmsegdesc = (struct vm_seg_desc *)data; error = vm_get_seg_desc(sc->vm, vmsegdesc->cpuid, vmsegdesc->regnum, &vmsegdesc->desc); break; case VM_GET_REGISTER_SET: vmregset = (struct vm_register_set *)data; if (vmregset->count > VM_REG_LAST) { error = EINVAL; break; } regvals = malloc(sizeof(regvals[0]) * vmregset->count, M_VMMDEV, M_WAITOK); regnums = malloc(sizeof(regnums[0]) * vmregset->count, M_VMMDEV, M_WAITOK); error = copyin(vmregset->regnums, regnums, sizeof(regnums[0]) * vmregset->count); if (error == 0) error = vm_get_register_set(sc->vm, vmregset->cpuid, vmregset->count, regnums, regvals); if (error == 0) error = copyout(regvals, vmregset->regvals, sizeof(regvals[0]) * vmregset->count); free(regvals, M_VMMDEV); free(regnums, M_VMMDEV); break; case VM_SET_REGISTER_SET: vmregset = (struct vm_register_set *)data; if (vmregset->count > VM_REG_LAST) { error = EINVAL; break; } regvals = malloc(sizeof(regvals[0]) * vmregset->count, M_VMMDEV, M_WAITOK); regnums = malloc(sizeof(regnums[0]) * vmregset->count, M_VMMDEV, M_WAITOK); error = copyin(vmregset->regnums, regnums, sizeof(regnums[0]) * vmregset->count); if (error == 0) error = 
copyin(vmregset->regvals, regvals, sizeof(regvals[0]) * vmregset->count); if (error == 0) error = vm_set_register_set(sc->vm, vmregset->cpuid, vmregset->count, regnums, regvals); free(regvals, M_VMMDEV); free(regnums, M_VMMDEV); break; case VM_GET_CAPABILITY: vmcap = (struct vm_capability *)data; error = vm_get_capability(sc->vm, vmcap->cpuid, vmcap->captype, &vmcap->capval); break; case VM_SET_CAPABILITY: vmcap = (struct vm_capability *)data; error = vm_set_capability(sc->vm, vmcap->cpuid, vmcap->captype, vmcap->capval); break; case VM_SET_X2APIC_STATE: x2apic = (struct vm_x2apic *)data; error = vm_set_x2apic_state(sc->vm, x2apic->cpuid, x2apic->state); break; case VM_GET_X2APIC_STATE: x2apic = (struct vm_x2apic *)data; error = vm_get_x2apic_state(sc->vm, x2apic->cpuid, &x2apic->state); break; case VM_GET_GPA_PMAP: gpapte = (struct vm_gpa_pte *)data; pmap_get_mapping(vmspace_pmap(vm_get_vmspace(sc->vm)), gpapte->gpa, gpapte->pte, &gpapte->ptenum); error = 0; break; case VM_GET_HPET_CAPABILITIES: error = vhpet_getcap((struct vm_hpet_cap *)data); break; case VM_GLA2GPA: { CTASSERT(PROT_READ == VM_PROT_READ); CTASSERT(PROT_WRITE == VM_PROT_WRITE); CTASSERT(PROT_EXEC == VM_PROT_EXECUTE); gg = (struct vm_gla2gpa *)data; error = vm_gla2gpa(sc->vm, gg->vcpuid, &gg->paging, gg->gla, gg->prot, &gg->gpa, &gg->fault); KASSERT(error == 0 || error == EFAULT, ("%s: vm_gla2gpa unknown error %d", __func__, error)); break; } case VM_GLA2GPA_NOFAULT: gg = (struct vm_gla2gpa *)data; error = vm_gla2gpa_nofault(sc->vm, gg->vcpuid, &gg->paging, gg->gla, gg->prot, &gg->gpa, &gg->fault); KASSERT(error == 0 || error == EFAULT, ("%s: vm_gla2gpa unknown error %d", __func__, error)); break; case VM_ACTIVATE_CPU: vac = (struct vm_activate_cpu *)data; error = vm_activate_cpu(sc->vm, vac->vcpuid); break; case VM_GET_CPUS: error = 0; vm_cpuset = (struct vm_cpuset *)data; size = vm_cpuset->cpusetsize; if (size < sizeof(cpuset_t) || size > CPU_MAXSIZE / NBBY) { error = ERANGE; break; } cpuset = malloc(size, M_TEMP, M_WAITOK | M_ZERO); if (vm_cpuset->which == VM_ACTIVE_CPUS) *cpuset = vm_active_cpus(sc->vm); else if (vm_cpuset->which == VM_SUSPENDED_CPUS) *cpuset = vm_suspended_cpus(sc->vm); else if (vm_cpuset->which == VM_DEBUG_CPUS) *cpuset = vm_debug_cpus(sc->vm); else error = EINVAL; if (error == 0) error = copyout(cpuset, vm_cpuset->cpus, size); free(cpuset, M_TEMP); break; case VM_SUSPEND_CPU: vac = (struct vm_activate_cpu *)data; error = vm_suspend_cpu(sc->vm, vac->vcpuid); break; case VM_RESUME_CPU: vac = (struct vm_activate_cpu *)data; error = vm_resume_cpu(sc->vm, vac->vcpuid); break; case VM_SET_INTINFO: vmii = (struct vm_intinfo *)data; error = vm_exit_intinfo(sc->vm, vmii->vcpuid, vmii->info1); break; case VM_GET_INTINFO: vmii = (struct vm_intinfo *)data; error = vm_get_intinfo(sc->vm, vmii->vcpuid, &vmii->info1, &vmii->info2); break; case VM_RTC_WRITE: rtcdata = (struct vm_rtc_data *)data; error = vrtc_nvram_write(sc->vm, rtcdata->offset, rtcdata->value); break; case VM_RTC_READ: rtcdata = (struct vm_rtc_data *)data; error = vrtc_nvram_read(sc->vm, rtcdata->offset, &rtcdata->value); break; case VM_RTC_SETTIME: rtctime = (struct vm_rtc_time *)data; error = vrtc_set_time(sc->vm, rtctime->secs); break; case VM_RTC_GETTIME: error = 0; rtctime = (struct vm_rtc_time *)data; rtctime->secs = vrtc_get_time(sc->vm); break; case VM_RESTART_INSTRUCTION: error = vm_restart_instruction(sc->vm, vcpu); break; case VM_SET_TOPOLOGY: topology = (struct vm_cpu_topology *)data; error = vm_set_topology(sc->vm, topology->sockets, 
topology->cores, topology->threads, topology->maxcpus); break; case VM_GET_TOPOLOGY: topology = (struct vm_cpu_topology *)data; vm_get_topology(sc->vm, &topology->sockets, &topology->cores, &topology->threads, &topology->maxcpus); error = 0; break; default: error = ENOTTY; break; } if (state_changed == 1) vcpu_unlock_one(sc, vcpu); else if (state_changed == 2) vcpu_unlock_all(sc); done: /* Make sure that no handler returns a bogus value like ERESTART */ KASSERT(error >= 0, ("vmmdev_ioctl: invalid error return %d", error)); return (error); } static int vmmdev_mmap_single(struct cdev *cdev, vm_ooffset_t *offset, vm_size_t mapsize, struct vm_object **objp, int nprot) { struct vmmdev_softc *sc; vm_paddr_t gpa; size_t len; vm_ooffset_t segoff, first, last; int error, found, segid; + uint16_t lastcpu; bool sysmem; error = vmm_priv_check(curthread->td_ucred); if (error) return (error); first = *offset; last = first + mapsize; if ((nprot & PROT_EXEC) || first < 0 || first >= last) return (EINVAL); sc = vmmdev_lookup2(cdev); if (sc == NULL) { /* virtual machine is in the process of being created */ return (EINVAL); } /* * Get a read lock on the guest memory map by freezing any vcpu. */ - error = vcpu_lock_one(sc, VM_MAXCPU - 1); + lastcpu = vm_get_maxcpus(sc->vm) - 1; + error = vcpu_lock_one(sc, lastcpu); if (error) return (error); gpa = 0; found = 0; while (!found) { error = vm_mmap_getnext(sc->vm, &gpa, &segid, &segoff, &len, NULL, NULL); if (error) break; if (first >= gpa && last <= gpa + len) found = 1; else gpa += len; } if (found) { error = vm_get_memseg(sc->vm, segid, &len, &sysmem, objp); KASSERT(error == 0 && *objp != NULL, ("%s: invalid memory segment %d", __func__, segid)); if (sysmem) { vm_object_reference(*objp); *offset = segoff + (first - gpa); } else { error = EINVAL; } } - vcpu_unlock_one(sc, VM_MAXCPU - 1); + vcpu_unlock_one(sc, lastcpu); return (error); } static void vmmdev_destroy(void *arg) { struct vmmdev_softc *sc = arg; struct devmem_softc *dsc; int error; error = vcpu_lock_all(sc); KASSERT(error == 0, ("%s: error %d freezing vcpus", __func__, error)); while ((dsc = SLIST_FIRST(&sc->devmem)) != NULL) { KASSERT(dsc->cdev == NULL, ("%s: devmem not free", __func__)); SLIST_REMOVE_HEAD(&sc->devmem, link); free(dsc->name, M_VMMDEV); free(dsc, M_VMMDEV); } if (sc->cdev != NULL) destroy_dev(sc->cdev); if (sc->vm != NULL) vm_destroy(sc->vm); if ((sc->flags & VSC_LINKED) != 0) { mtx_lock(&vmmdev_mtx); SLIST_REMOVE(&head, sc, vmmdev_softc, link); mtx_unlock(&vmmdev_mtx); } free(sc, M_VMMDEV); } static int sysctl_vmm_destroy(SYSCTL_HANDLER_ARGS) { int error; char buf[VM_MAX_NAMELEN]; struct devmem_softc *dsc; struct vmmdev_softc *sc; struct cdev *cdev; error = vmm_priv_check(req->td->td_ucred); if (error) return (error); strlcpy(buf, "beavis", sizeof(buf)); error = sysctl_handle_string(oidp, buf, sizeof(buf), req); if (error != 0 || req->newptr == NULL) return (error); mtx_lock(&vmmdev_mtx); sc = vmmdev_lookup(buf); if (sc == NULL || sc->cdev == NULL) { mtx_unlock(&vmmdev_mtx); return (EINVAL); } /* * The 'cdev' will be destroyed asynchronously when 'si_threadcount' * goes down to 0 so we should not do it again in the callback. * * Setting 'sc->cdev' to NULL is also used to indicate that the VM * is scheduled for destruction. */ cdev = sc->cdev; sc->cdev = NULL; mtx_unlock(&vmmdev_mtx); /* * Schedule all cdevs to be destroyed: * * - any new operations on the 'cdev' will return an error (ENXIO). 
* * - when the 'si_threadcount' dwindles down to zero the 'cdev' will * be destroyed and the callback will be invoked in a taskqueue * context. * * - the 'devmem' cdevs are destroyed before the virtual machine 'cdev' */ SLIST_FOREACH(dsc, &sc->devmem, link) { KASSERT(dsc->cdev != NULL, ("devmem cdev already destroyed")); destroy_dev_sched_cb(dsc->cdev, devmem_destroy, dsc); } destroy_dev_sched_cb(cdev, vmmdev_destroy, sc); return (0); } SYSCTL_PROC(_hw_vmm, OID_AUTO, destroy, CTLTYPE_STRING | CTLFLAG_RW | CTLFLAG_PRISON, NULL, 0, sysctl_vmm_destroy, "A", NULL); static struct cdevsw vmmdevsw = { .d_name = "vmmdev", .d_version = D_VERSION, .d_ioctl = vmmdev_ioctl, .d_mmap_single = vmmdev_mmap_single, .d_read = vmmdev_rw, .d_write = vmmdev_rw, }; static int sysctl_vmm_create(SYSCTL_HANDLER_ARGS) { int error; struct vm *vm; struct cdev *cdev; struct vmmdev_softc *sc, *sc2; char buf[VM_MAX_NAMELEN]; error = vmm_priv_check(req->td->td_ucred); if (error) return (error); strlcpy(buf, "beavis", sizeof(buf)); error = sysctl_handle_string(oidp, buf, sizeof(buf), req); if (error != 0 || req->newptr == NULL) return (error); mtx_lock(&vmmdev_mtx); sc = vmmdev_lookup(buf); mtx_unlock(&vmmdev_mtx); if (sc != NULL) return (EEXIST); error = vm_create(buf, &vm); if (error != 0) return (error); sc = malloc(sizeof(struct vmmdev_softc), M_VMMDEV, M_WAITOK | M_ZERO); sc->vm = vm; SLIST_INIT(&sc->devmem); /* * Lookup the name again just in case somebody sneaked in when we * dropped the lock. */ mtx_lock(&vmmdev_mtx); sc2 = vmmdev_lookup(buf); if (sc2 == NULL) { SLIST_INSERT_HEAD(&head, sc, link); sc->flags |= VSC_LINKED; } mtx_unlock(&vmmdev_mtx); if (sc2 != NULL) { vmmdev_destroy(sc); return (EEXIST); } error = make_dev_p(MAKEDEV_CHECKNAME, &cdev, &vmmdevsw, NULL, UID_ROOT, GID_WHEEL, 0600, "vmm/%s", buf); if (error != 0) { vmmdev_destroy(sc); return (error); } mtx_lock(&vmmdev_mtx); sc->cdev = cdev; sc->cdev->si_drv1 = sc; mtx_unlock(&vmmdev_mtx); return (0); } SYSCTL_PROC(_hw_vmm, OID_AUTO, create, CTLTYPE_STRING | CTLFLAG_RW | CTLFLAG_PRISON, NULL, 0, sysctl_vmm_create, "A", NULL); void vmmdev_init(void) { mtx_init(&vmmdev_mtx, "vmm device mutex", NULL, MTX_DEF); pr_allow_flag = prison_add_allow(NULL, "vmm", NULL, "Allow use of vmm in a jail."); } int vmmdev_cleanup(void) { int error; if (SLIST_EMPTY(&head)) error = 0; else error = EBUSY; return (error); } static int devmem_mmap_single(struct cdev *cdev, vm_ooffset_t *offset, vm_size_t len, struct vm_object **objp, int nprot) { struct devmem_softc *dsc; vm_ooffset_t first, last; size_t seglen; int error; + uint16_t lastcpu; bool sysmem; dsc = cdev->si_drv1; if (dsc == NULL) { /* 'cdev' has been created but is not ready for use */ return (ENXIO); } first = *offset; last = *offset + len; if ((nprot & PROT_EXEC) || first < 0 || first >= last) return (EINVAL); - error = vcpu_lock_one(dsc->sc, VM_MAXCPU - 1); + lastcpu = vm_get_maxcpus(dsc->sc->vm) - 1; + error = vcpu_lock_one(dsc->sc, lastcpu); if (error) return (error); error = vm_get_memseg(dsc->sc->vm, dsc->segid, &seglen, &sysmem, objp); KASSERT(error == 0 && !sysmem && *objp != NULL, ("%s: invalid devmem segment %d", __func__, dsc->segid)); - vcpu_unlock_one(dsc->sc, VM_MAXCPU - 1); + vcpu_unlock_one(dsc->sc, lastcpu); if (seglen >= last) { vm_object_reference(*objp); return (0); } else { return (EINVAL); } } static struct cdevsw devmemsw = { .d_name = "devmem", .d_version = D_VERSION, .d_mmap_single = devmem_mmap_single, }; static int devmem_create_cdev(const char *vmname, int segid, char *devname) { struct 
devmem_softc *dsc; struct vmmdev_softc *sc; struct cdev *cdev; int error; error = make_dev_p(MAKEDEV_CHECKNAME, &cdev, &devmemsw, NULL, UID_ROOT, GID_WHEEL, 0600, "vmm.io/%s.%s", vmname, devname); if (error) return (error); dsc = malloc(sizeof(struct devmem_softc), M_VMMDEV, M_WAITOK | M_ZERO); mtx_lock(&vmmdev_mtx); sc = vmmdev_lookup(vmname); KASSERT(sc != NULL, ("%s: vm %s softc not found", __func__, vmname)); if (sc->cdev == NULL) { /* virtual machine is being created or destroyed */ mtx_unlock(&vmmdev_mtx); free(dsc, M_VMMDEV); destroy_dev_sched_cb(cdev, NULL, 0); return (ENODEV); } dsc->segid = segid; dsc->name = devname; dsc->cdev = cdev; dsc->sc = sc; SLIST_INSERT_HEAD(&sc->devmem, dsc, link); mtx_unlock(&vmmdev_mtx); /* The 'cdev' is ready for use after 'si_drv1' is initialized */ cdev->si_drv1 = dsc; return (0); } static void devmem_destroy(void *arg) { struct devmem_softc *dsc = arg; KASSERT(dsc->cdev, ("%s: devmem cdev already destroyed", __func__)); dsc->cdev = NULL; dsc->sc = NULL; } Index: user/ngie/bug-237403/sys/amd64/vmm/vmm_lapic.c =================================================================== --- user/ngie/bug-237403/sys/amd64/vmm/vmm_lapic.c (revision 346925) +++ user/ngie/bug-237403/sys/amd64/vmm/vmm_lapic.c (revision 346926) @@ -1,249 +1,249 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2011 NetApp, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include "vmm_ktr.h" #include "vmm_lapic.h" #include "vlapic.h" /* * Some MSI message definitions */ #define MSI_X86_ADDR_MASK 0xfff00000 #define MSI_X86_ADDR_BASE 0xfee00000 #define MSI_X86_ADDR_RH 0x00000008 /* Redirection Hint */ #define MSI_X86_ADDR_LOG 0x00000004 /* Destination Mode */ int lapic_set_intr(struct vm *vm, int cpu, int vector, bool level) { struct vlapic *vlapic; - if (cpu < 0 || cpu >= VM_MAXCPU) + if (cpu < 0 || cpu >= vm_get_maxcpus(vm)) return (EINVAL); /* * According to section "Maskable Hardware Interrupts" in Intel SDM * vectors 16 through 255 can be delivered through the local APIC. 
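 *
 * [Editor's illustration, not part of revision 346926: a caller injecting
 * an edge-triggered interrupt therefore passes a vector in that range and
 * a vcpu id below vm_get_maxcpus(vm), for example
 *
 *	error = lapic_set_intr(vm, 0, 48, false);
 *
 * which targets vcpu 0 with vector 48; level == false requests edge
 * triggering.]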
*/ if (vector < 16 || vector > 255) return (EINVAL); vlapic = vm_lapic(vm, cpu); if (vlapic_set_intr_ready(vlapic, vector, level)) vcpu_notify_event(vm, cpu, true); return (0); } int lapic_set_local_intr(struct vm *vm, int cpu, int vector) { struct vlapic *vlapic; cpuset_t dmask; int error; - if (cpu < -1 || cpu >= VM_MAXCPU) + if (cpu < -1 || cpu >= vm_get_maxcpus(vm)) return (EINVAL); if (cpu == -1) dmask = vm_active_cpus(vm); else CPU_SETOF(cpu, &dmask); error = 0; while ((cpu = CPU_FFS(&dmask)) != 0) { cpu--; CPU_CLR(cpu, &dmask); vlapic = vm_lapic(vm, cpu); error = vlapic_trigger_lvt(vlapic, vector); if (error) break; } return (error); } int lapic_intr_msi(struct vm *vm, uint64_t addr, uint64_t msg) { int delmode, vec; uint32_t dest; bool phys; VM_CTR2(vm, "lapic MSI addr: %#lx msg: %#lx", addr, msg); if ((addr & MSI_X86_ADDR_MASK) != MSI_X86_ADDR_BASE) { VM_CTR1(vm, "lapic MSI invalid addr %#lx", addr); return (-1); } /* * Extract the x86-specific fields from the MSI addr/msg * params according to the Intel Arch spec, Vol3 Ch 10. * * The PCI specification does not support level triggered * MSI/MSI-X so ignore trigger level in 'msg'. * * The 'dest' is interpreted as a logical APIC ID if both * the Redirection Hint and Destination Mode are '1' and * physical otherwise. */ dest = (addr >> 12) & 0xff; phys = ((addr & (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG)) != (MSI_X86_ADDR_RH | MSI_X86_ADDR_LOG)); delmode = msg & APIC_DELMODE_MASK; vec = msg & 0xff; VM_CTR3(vm, "lapic MSI %s dest %#x, vec %d", phys ? "physical" : "logical", dest, vec); vlapic_deliver_intr(vm, LAPIC_TRIG_EDGE, dest, phys, delmode, vec); return (0); } static boolean_t x2apic_msr(u_int msr) { if (msr >= 0x800 && msr <= 0xBFF) return (TRUE); else return (FALSE); } static u_int x2apic_msr_to_regoff(u_int msr) { return ((msr - 0x800) << 4); } boolean_t lapic_msr(u_int msr) { if (x2apic_msr(msr) || (msr == MSR_APICBASE)) return (TRUE); else return (FALSE); } int lapic_rdmsr(struct vm *vm, int cpu, u_int msr, uint64_t *rval, bool *retu) { int error; u_int offset; struct vlapic *vlapic; vlapic = vm_lapic(vm, cpu); if (msr == MSR_APICBASE) { *rval = vlapic_get_apicbase(vlapic); error = 0; } else { offset = x2apic_msr_to_regoff(msr); error = vlapic_read(vlapic, 0, offset, rval, retu); } return (error); } int lapic_wrmsr(struct vm *vm, int cpu, u_int msr, uint64_t val, bool *retu) { int error; u_int offset; struct vlapic *vlapic; vlapic = vm_lapic(vm, cpu); if (msr == MSR_APICBASE) { error = vlapic_set_apicbase(vlapic, val); } else { offset = x2apic_msr_to_regoff(msr); error = vlapic_write(vlapic, 0, offset, val, retu); } return (error); } int lapic_mmio_write(void *vm, int cpu, uint64_t gpa, uint64_t wval, int size, void *arg) { int error; uint64_t off; struct vlapic *vlapic; off = gpa - DEFAULT_APIC_BASE; /* * Memory mapped local apic accesses must be 4 bytes wide and * aligned on a 16-byte boundary. */ if (size != 4 || off & 0xf) return (EINVAL); vlapic = vm_lapic(vm, cpu); error = vlapic_write(vlapic, 1, off, wval, arg); return (error); } int lapic_mmio_read(void *vm, int cpu, uint64_t gpa, uint64_t *rval, int size, void *arg) { int error; uint64_t off; struct vlapic *vlapic; off = gpa - DEFAULT_APIC_BASE; /* * Memory mapped local apic accesses should be aligned on a * 16-byte boundary. They are also suggested to be 4 bytes * wide, alas not all OSes follow suggestions. 
*/ off &= ~3; if (off & 0xf) return (EINVAL); vlapic = vm_lapic(vm, cpu); error = vlapic_read(vlapic, 1, off, rval, arg); return (error); } Index: user/ngie/bug-237403/sys/amd64/vmm/vmm_stat.c =================================================================== --- user/ngie/bug-237403/sys/amd64/vmm/vmm_stat.c (revision 346925) +++ user/ngie/bug-237403/sys/amd64/vmm/vmm_stat.c (revision 346926) @@ -1,172 +1,172 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2011 NetApp, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include "vmm_util.h" #include "vmm_stat.h" /* * 'vst_num_elems' is the total number of addressable statistic elements * 'vst_num_types' is the number of unique statistic types * * It is always true that 'vst_num_elems' is greater than or equal to * 'vst_num_types'. This is because a stat type may represent more than * one element (for e.g. VMM_STAT_ARRAY). 
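 *
 * [Editor's illustration, not part of revision 346926: a scalar stat
 * declared with VMM_STAT() contributes one element and one type, while
 * an array stat such as
 *
 *	VMM_STAT_ARRAY(IPIS_SENT, VM_MAXCPU, "ipis sent to vcpu");
 *
 * contributes VM_MAXCPU elements but only one type, which is why
 * vst_num_elems is always greater than or equal to vst_num_types.]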
*/ static int vst_num_elems, vst_num_types; static struct vmm_stat_type *vsttab[MAX_VMM_STAT_ELEMS]; static MALLOC_DEFINE(M_VMM_STAT, "vmm stat", "vmm stat"); #define vst_size ((size_t)vst_num_elems * sizeof(uint64_t)) void vmm_stat_register(void *arg) { struct vmm_stat_type *vst = arg; /* We require all stats to identify themselves with a description */ if (vst->desc == NULL) return; if (vst->scope == VMM_STAT_SCOPE_INTEL && !vmm_is_intel()) return; if (vst->scope == VMM_STAT_SCOPE_AMD && !vmm_is_amd()) return; if (vst_num_elems + vst->nelems >= MAX_VMM_STAT_ELEMS) { printf("Cannot accommodate vmm stat type \"%s\"!\n", vst->desc); return; } vst->index = vst_num_elems; vst_num_elems += vst->nelems; vsttab[vst_num_types++] = vst; } int vmm_stat_copy(struct vm *vm, int vcpu, int *num_stats, uint64_t *buf) { struct vmm_stat_type *vst; uint64_t *stats; int i; - if (vcpu < 0 || vcpu >= VM_MAXCPU) + if (vcpu < 0 || vcpu >= vm_get_maxcpus(vm)) return (EINVAL); /* Let stats functions update their counters */ for (i = 0; i < vst_num_types; i++) { vst = vsttab[i]; if (vst->func != NULL) (*vst->func)(vm, vcpu, vst); } /* Copy over the stats */ stats = vcpu_stats(vm, vcpu); for (i = 0; i < vst_num_elems; i++) buf[i] = stats[i]; *num_stats = vst_num_elems; return (0); } void * vmm_stat_alloc(void) { return (malloc(vst_size, M_VMM_STAT, M_WAITOK)); } void vmm_stat_init(void *vp) { bzero(vp, vst_size); } void vmm_stat_free(void *vp) { free(vp, M_VMM_STAT); } int vmm_stat_desc_copy(int index, char *buf, int bufsize) { int i; struct vmm_stat_type *vst; for (i = 0; i < vst_num_types; i++) { vst = vsttab[i]; if (index >= vst->index && index < vst->index + vst->nelems) { if (vst->nelems > 1) { snprintf(buf, bufsize, "%s[%d]", vst->desc, index - vst->index); } else { strlcpy(buf, vst->desc, bufsize); } return (0); /* found it */ } } return (EINVAL); } /* global statistics */ VMM_STAT(VCPU_MIGRATIONS, "vcpu migration across host cpus"); VMM_STAT(VMEXIT_COUNT, "total number of vm exits"); VMM_STAT(VMEXIT_EXTINT, "vm exits due to external interrupt"); VMM_STAT(VMEXIT_HLT, "number of times hlt was intercepted"); VMM_STAT(VMEXIT_CR_ACCESS, "number of times %cr access was intercepted"); VMM_STAT(VMEXIT_RDMSR, "number of times rdmsr was intercepted"); VMM_STAT(VMEXIT_WRMSR, "number of times wrmsr was intercepted"); VMM_STAT(VMEXIT_MTRAP, "number of monitor trap exits"); VMM_STAT(VMEXIT_PAUSE, "number of times pause was intercepted"); VMM_STAT(VMEXIT_INTR_WINDOW, "vm exits due to interrupt window opening"); VMM_STAT(VMEXIT_NMI_WINDOW, "vm exits due to nmi window opening"); VMM_STAT(VMEXIT_INOUT, "number of times in/out was intercepted"); VMM_STAT(VMEXIT_CPUID, "number of times cpuid was intercepted"); VMM_STAT(VMEXIT_NESTED_FAULT, "vm exits due to nested page fault"); VMM_STAT(VMEXIT_INST_EMUL, "vm exits for instruction emulation"); VMM_STAT(VMEXIT_UNKNOWN, "number of vm exits for unknown reason"); VMM_STAT(VMEXIT_ASTPENDING, "number of times astpending at exit"); VMM_STAT(VMEXIT_REQIDLE, "number of times idle requested at exit"); VMM_STAT(VMEXIT_USERSPACE, "number of vm exits handled in userspace"); VMM_STAT(VMEXIT_RENDEZVOUS, "number of times rendezvous pending at exit"); VMM_STAT(VMEXIT_EXCEPTION, "number of vm exits due to exceptions"); Index: user/ngie/bug-237403/sys/arm/allwinner/a10/a10_padconf.c =================================================================== --- user/ngie/bug-237403/sys/arm/allwinner/a10/a10_padconf.c (revision 346925) +++ user/ngie/bug-237403/sys/arm/allwinner/a10/a10_padconf.c (revision 
346926) @@ -1,231 +1,231 @@ /*- * Copyright (c) 2016 Emmanuel Vadot * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #ifdef SOC_ALLWINNER_A10 const static struct allwinner_pins a10_pins[] = { {"PA0", 0, 0, {"gpio_in", "gpio_out", "emac", "spi1", "uart2", NULL, NULL, NULL}}, {"PA1", 0, 1, {"gpio_in", "gpio_out", "emac", "spi1", "uart2", NULL, NULL, NULL}}, {"PA2", 0, 2, {"gpio_in", "gpio_out", "emac", "spi1", "uart2", NULL, NULL, NULL}}, {"PA3", 0, 3, {"gpio_in", "gpio_out", "emac", "spi1", "uart2", NULL, NULL, NULL}}, {"PA4", 0, 4, {"gpio_in", "gpio_out", "emac", "spi1", NULL, NULL, NULL, NULL}}, {"PA5", 0, 5, {"gpio_in", "gpio_out", "emac", "spi3", NULL, NULL, NULL, NULL}}, {"PA6", 0, 6, {"gpio_in", "gpio_out", "emac", "spi3", NULL, NULL, NULL, NULL}}, {"PA7", 0, 7, {"gpio_in", "gpio_out", "emac", "spi3", NULL, NULL, NULL, NULL}}, {"PA8", 0, 8, {"gpio_in", "gpio_out", "emac", "spi3", NULL, NULL, NULL, NULL}}, {"PA9", 0, 9, {"gpio_in", "gpio_out", "emac", "spi3", NULL, NULL, NULL, NULL}}, {"PA10", 0, 10, {"gpio_in", "gpio_out", "emac", NULL, "uart1", NULL, NULL, NULL}}, {"PA11", 0, 11, {"gpio_in", "gpio_out", "emac", NULL, "uart1", NULL, NULL, NULL}}, {"PA12", 0, 12, {"gpio_in", "gpio_out", "emac", "uart6", "uart1", NULL, NULL, NULL}}, {"PA13", 0, 13, {"gpio_in", "gpio_out", "emac", "uart6", "uart1", NULL, NULL, NULL}}, {"PA14", 0, 14, {"gpio_in", "gpio_out", "emac", "uart7", "uart1", NULL, NULL, NULL}}, {"PA15", 0, 15, {"gpio_in", "gpio_out", "emac", "uart7", "uart1", NULL, NULL, NULL}}, {"PA16", 0, 16, {"gpio_in", "gpio_out", NULL, "can", "uart1", NULL, NULL, NULL}}, {"PA17", 0, 17, {"gpio_in", "gpio_out", NULL, "can", "uart1", NULL, NULL, NULL}}, {"PB0", 1, 0, {"gpio_in", "gpio_out", "i2c0", NULL, NULL, NULL, NULL, NULL}}, {"PB1", 1, 1, {"gpio_in", "gpio_out", "i2c0", NULL, NULL, NULL, NULL, NULL}}, {"PB2", 1, 2, {"gpio_in", "gpio_out", "pwm", NULL, NULL, NULL, NULL, NULL}}, {"PB3", 1, 3, {"gpio_in", "gpio_out", "ir0", NULL, NULL, NULL, NULL, NULL}}, {"PB4", 1, 4, {"gpio_in", "gpio_out", "ir0", NULL, NULL, NULL, NULL, NULL}}, {"PB5", 1, 5, {"gpio_in", "gpio_out", "i2s", "ac97", NULL, NULL, NULL, NULL}}, {"PB6", 1, 6, {"gpio_in", "gpio_out", "i2s", "ac97", NULL, NULL, NULL, NULL}}, {"PB7", 1, 7, 
{"gpio_in", "gpio_out", "i2s", "ac97", NULL, NULL, NULL, NULL}}, {"PB8", 1, 8, {"gpio_in", "gpio_out", "i2s", "ac97", NULL, NULL, NULL, NULL}}, {"PB9", 1, 9, {"gpio_in", "gpio_out", "i2s", NULL, NULL, NULL, NULL, NULL}}, {"PB10", 1, 10, {"gpio_in", "gpio_out", "i2s", NULL, NULL, NULL, NULL, NULL}}, {"PB11", 1, 11, {"gpio_in", "gpio_out", "i2s", NULL, NULL, NULL, NULL, NULL}}, {"PB12", 1, 12, {"gpio_in", "gpio_out", "i2s", "ac97", NULL, NULL, NULL, NULL}}, {"PB13", 1, 13, {"gpio_in", "gpio_out", "spi2", NULL, NULL, NULL, NULL, NULL}}, {"PB14", 1, 14, {"gpio_in", "gpio_out", "spi2", "jtag", NULL, NULL, NULL, NULL}}, {"PB15", 1, 15, {"gpio_in", "gpio_out", "spi2", "jtag", NULL, NULL, NULL, NULL}}, {"PB16", 1, 16, {"gpio_in", "gpio_out", "spi2", "jtag", NULL, NULL, NULL, NULL}}, {"PB17", 1, 17, {"gpio_in", "gpio_out", "spi2", "jtag", NULL, NULL, NULL, NULL}}, {"PB18", 1, 18, {"gpio_in", "gpio_out", "i2c1", NULL, NULL, NULL, NULL, NULL}}, {"PB19", 1, 19, {"gpio_in", "gpio_out", "i2c1", NULL, NULL, NULL, NULL, NULL}}, - {"PB20", 1, 20, {"gpio_in", "gpio_out", "i2c1", NULL, NULL, NULL, NULL, NULL}}, - {"PB21", 1, 21, {"gpio_in", "gpio_out", "i2c1", NULL, NULL, NULL, NULL, NULL}}, + {"PB20", 1, 20, {"gpio_in", "gpio_out", "i2c2", NULL, NULL, NULL, NULL, NULL}}, + {"PB21", 1, 21, {"gpio_in", "gpio_out", "i2c2", NULL, NULL, NULL, NULL, NULL}}, {"PB22", 1, 22, {"gpio_in", "gpio_out", "uart0", "ir1", NULL, NULL, NULL, NULL}}, {"PB23", 1, 23, {"gpio_in", "gpio_out", "uart0", "ir1", NULL, NULL, NULL, NULL}}, {"PC0", 2, 0, {"gpio_in", "gpio_out", "nand", "spi0", NULL, NULL, NULL, NULL}}, {"PC1", 2, 1, {"gpio_in", "gpio_out", "nand", "spi0", NULL, NULL, NULL, NULL}}, {"PC2", 2, 2, {"gpio_in", "gpio_out", "nand", "spi0", NULL, NULL, NULL, NULL}}, {"PC3", 2, 3, {"gpio_in", "gpio_out", "nand", NULL, NULL, NULL, NULL, NULL}}, {"PC4", 2, 4, {"gpio_in", "gpio_out", "nand", NULL, NULL, NULL, NULL, NULL}}, {"PC5", 2, 5, {"gpio_in", "gpio_out", "nand", NULL, NULL, NULL, NULL, NULL}}, {"PC6", 2, 6, {"gpio_in", "gpio_out", "nand", "mmc2", NULL, NULL, NULL, NULL}}, {"PC7", 2, 7, {"gpio_in", "gpio_out", "nand", "mmc2", NULL, NULL, NULL, NULL}}, {"PC8", 2, 8, {"gpio_in", "gpio_out", "nand", "mmc2", NULL, NULL, NULL, NULL}}, {"PC9", 2, 9, {"gpio_in", "gpio_out", "nand", "mmc2", NULL, NULL, NULL, NULL}}, {"PC10", 2, 10, {"gpio_in", "gpio_out", "nand", "mmc2", NULL, NULL, NULL, NULL}}, {"PC11", 2, 11, {"gpio_in", "gpio_out", "nand", "mmc2", NULL, NULL, NULL, NULL}}, {"PC12", 2, 12, {"gpio_in", "gpio_out", "nand", NULL, NULL, NULL, NULL, NULL}}, {"PC13", 2, 13, {"gpio_in", "gpio_out", "nand", NULL, NULL, NULL, NULL, NULL}}, {"PC14", 2, 14, {"gpio_in", "gpio_out", "nand", NULL, NULL, NULL, NULL, NULL}}, {"PC15", 2, 15, {"gpio_in", "gpio_out", "nand", NULL, NULL, NULL, NULL, NULL}}, {"PC16", 2, 16, {"gpio_in", "gpio_out", "nand", NULL, NULL, NULL, NULL, NULL}}, {"PC17", 2, 17, {"gpio_in", "gpio_out", "nand", NULL, NULL, NULL, NULL, NULL}}, {"PC18", 2, 18, {"gpio_in", "gpio_out", "nand", NULL, NULL, NULL, NULL, NULL}}, {"PC19", 2, 19, {"gpio_in", "gpio_out", "nand", "spi2", NULL, NULL, NULL, NULL}}, {"PC20", 2, 20, {"gpio_in", "gpio_out", "nand", "spi2", NULL, NULL, NULL, NULL}}, {"PC21", 2, 21, {"gpio_in", "gpio_out", "nand", "spi2", NULL, NULL, NULL, NULL}}, {"PC22", 2, 22, {"gpio_in", "gpio_out", "nand", "spi2", NULL, NULL, NULL, NULL}}, {"PC23", 2, 23, {"gpio_in", "gpio_out", "spi0", NULL, NULL, NULL, NULL, NULL}}, {"PC24", 2, 24, {"gpio_in", "gpio_out", "nand", NULL, NULL, NULL, NULL, NULL}}, {"PD0", 3, 0, {"gpio_in", 
"gpio_out", "lcd0", "lvds0", NULL, NULL, NULL, NULL}}, {"PD1", 3, 1, {"gpio_in", "gpio_out", "lcd0", "lvds0", NULL, NULL, NULL, NULL}}, {"PD2", 3, 2, {"gpio_in", "gpio_out", "lcd0", "lvds0", NULL, NULL, NULL, NULL}}, {"PD3", 3, 3, {"gpio_in", "gpio_out", "lcd0", "lvds0", NULL, NULL, NULL, NULL}}, {"PD4", 3, 4, {"gpio_in", "gpio_out", "lcd0", "lvds0", NULL, NULL, NULL, NULL}}, {"PD5", 3, 5, {"gpio_in", "gpio_out", "lcd0", "lvds0", NULL, NULL, NULL, NULL}}, {"PD6", 3, 6, {"gpio_in", "gpio_out", "lcd0", "lvds0", NULL, NULL, NULL, NULL}}, {"PD7", 3, 7, {"gpio_in", "gpio_out", "lcd0", "lvds0", NULL, NULL, NULL, NULL}}, {"PD8", 3, 8, {"gpio_in", "gpio_out", "lcd0", "lvds0", NULL, NULL, NULL, NULL}}, {"PD9", 3, 9, {"gpio_in", "gpio_out", "lcd0", "lvds0", NULL, NULL, NULL, NULL}}, {"PD10", 3, 10, {"gpio_in", "gpio_out", "lcd0", "lvds1", NULL, NULL, NULL, NULL}}, {"PD11", 3, 11, {"gpio_in", "gpio_out", "lcd0", "lvds1", NULL, NULL, NULL, NULL}}, {"PD12", 3, 12, {"gpio_in", "gpio_out", "lcd0", "lvds1", NULL, NULL, NULL, NULL}}, {"PD13", 3, 13, {"gpio_in", "gpio_out", "lcd0", "lvds1", NULL, NULL, NULL, NULL}}, {"PD14", 3, 14, {"gpio_in", "gpio_out", "lcd0", "lvds1", NULL, NULL, NULL, NULL}}, {"PD15", 3, 15, {"gpio_in", "gpio_out", "lcd0", "lvds1", NULL, NULL, NULL, NULL}}, {"PD16", 3, 16, {"gpio_in", "gpio_out", "lcd0", "lvds1", NULL, NULL, NULL, NULL}}, {"PD17", 3, 17, {"gpio_in", "gpio_out", "lcd0", "lvds1", NULL, NULL, NULL, NULL}}, {"PD18", 3, 18, {"gpio_in", "gpio_out", "lcd0", "lvds1", NULL, NULL, NULL, NULL}}, {"PD19", 3, 19, {"gpio_in", "gpio_out", "lcd0", "lvds1", NULL, NULL, NULL, NULL}}, {"PD20", 3, 20, {"gpio_in", "gpio_out", "lcd0", "csi1", NULL, NULL, NULL, NULL}}, {"PD21", 3, 21, {"gpio_in", "gpio_out", "lcd0", "sim", NULL, NULL, NULL, NULL}}, {"PD22", 3, 22, {"gpio_in", "gpio_out", "lcd0", "sim", NULL, NULL, NULL, NULL}}, {"PD23", 3, 23, {"gpio_in", "gpio_out", "lcd0", "sim", NULL, NULL, NULL, NULL}}, {"PD24", 3, 24, {"gpio_in", "gpio_out", "lcd0", "sim", NULL, NULL, NULL, NULL}}, {"PD25", 3, 25, {"gpio_in", "gpio_out", "lcd0", "sim", NULL, NULL, NULL, NULL}}, {"PD26", 3, 26, {"gpio_in", "gpio_out", "lcd0", "sim", NULL, NULL, NULL, NULL}}, {"PD27", 3, 27, {"gpio_in", "gpio_out", "lcd0", "sim", NULL, NULL, NULL, NULL}}, {"PE0", 4, 0, {"gpio_in", "gpio_out", "ts0", "csi0", NULL, NULL, NULL, NULL}}, {"PE1", 4, 1, {"gpio_in", "gpio_out", "ts0", "csi0", NULL, NULL, NULL, NULL}}, {"PE2", 4, 2, {"gpio_in", "gpio_out", "ts0", "csi0", NULL, NULL, NULL, NULL}}, {"PE3", 4, 3, {"gpio_in", "gpio_out", "ts0", "csi0", NULL, NULL, NULL, NULL}}, {"PE4", 4, 4, {"gpio_in", "gpio_out", "ts0", "csi0", NULL, NULL, NULL, NULL}}, {"PE5", 4, 5, {"gpio_in", "gpio_out", "ts0", "csi0", "sim", NULL, NULL, NULL}}, {"PE6", 4, 6, {"gpio_in", "gpio_out", "ts0", "csi0", NULL, NULL, NULL, NULL}}, {"PE7", 4, 7, {"gpio_in", "gpio_out", "ts0", "csi0", NULL, NULL, NULL, NULL}}, {"PE8", 4, 8, {"gpio_in", "gpio_out", "ts0", "csi0", NULL, NULL, NULL, NULL}}, {"PE9", 4, 9, {"gpio_in", "gpio_out", "ts0", "csi0", NULL, NULL, NULL, NULL}}, {"PE10", 4, 10, {"gpio_in", "gpio_out", "ts0", "csi0", NULL, NULL, NULL, NULL}}, {"PE11", 4, 11, {"gpio_in", "gpio_out", "ts0", "csi0", NULL, NULL, NULL, NULL}}, {"PF0", 5, 0, {"gpio_in", "gpio_out", "mmc0", NULL, "jtag", NULL, NULL, NULL}}, {"PF1", 5, 1, {"gpio_in", "gpio_out", "mmc0", NULL, "jtag", NULL, NULL, NULL}}, {"PF2", 5, 2, {"gpio_in", "gpio_out", "mmc0", NULL, "uart0", NULL, NULL, NULL}}, {"PF3", 5, 3, {"gpio_in", "gpio_out", "mmc0", NULL, "jtag", NULL, NULL, NULL}}, {"PF4", 5, 4, 
{"gpio_in", "gpio_out", "mmc0", NULL, "jtag", NULL, NULL, NULL}}, {"PF5", 5, 5, {"gpio_in", "gpio_out", "mmc0", NULL, "jtag", NULL, NULL, NULL}}, {"PG0", 6, 0, {"gpio_in", "gpio_out", "ts1", "csi1", "mmc1", NULL, NULL, NULL}}, {"PG1", 6, 1, {"gpio_in", "gpio_out", "ts1", "csi1", "mmc1", NULL, NULL, NULL}}, {"PG2", 6, 2, {"gpio_in", "gpio_out", "ts1", "csi1", "mmc1", NULL, NULL, NULL}}, {"PG3", 6, 3, {"gpio_in", "gpio_out", "ts1", "csi1", "mmc1", NULL, NULL, NULL}}, {"PG4", 6, 4, {"gpio_in", "gpio_out", "ts1", "csi1", "mmc1", "csi0", NULL, NULL}}, {"PG5", 6, 5, {"gpio_in", "gpio_out", "ts1", "csi1", "mmc1", "csi0", NULL, NULL}}, {"PG6", 6, 6, {"gpio_in", "gpio_out", "ts1", "csi1", "uart3", "csi0", NULL, NULL}}, {"PG7", 6, 7, {"gpio_in", "gpio_out", "ts1", "csi1", "uart3", "csi0", NULL, NULL}}, {"PG8", 6, 8, {"gpio_in", "gpio_out", "ts1", "csi1", "uart3", "csi0", NULL, NULL}}, {"PG9", 6, 9, {"gpio_in", "gpio_out", "ts1", "csi1", "uart3", "csi0", NULL, NULL}}, {"PG10", 6, 10, {"gpio_in", "gpio_out", "ts1", "csi1", "uart4", "csi0", NULL, NULL}}, {"PG11", 6, 11, {"gpio_in", "gpio_out", "ts1", "csi1", "uart4", "csi0", NULL, NULL}}, {"PH0", 7, 0, {"gpio_in", "gpio_out", "lcd1", "pata", "uart3", NULL, "eint0", "csi1"}, 6, 0}, {"PH1", 7, 1, {"gpio_in", "gpio_out", "lcd1", "pata", "uart3", NULL, "eint1", "csi1"}, 6, 1}, {"PH2", 7, 2, {"gpio_in", "gpio_out", "lcd1", "pata", "uart3", NULL, "eint2", "csi1"}, 6, 2}, {"PH3", 7, 3, {"gpio_in", "gpio_out", "lcd1", "pata", "uart3", NULL, "eint3", "csi1"}, 6, 3}, {"PH4", 7, 4, {"gpio_in", "gpio_out", "lcd1", "pata", "uart4", NULL, "eint4", "csi1"}, 6, 4}, {"PH5", 7, 5, {"gpio_in", "gpio_out", "lcd1", "pata", "uart4", NULL, "eint5", "csi1"}, 6, 5}, {"PH6", 7, 6, {"gpio_in", "gpio_out", "lcd1", "pata", "uart5", "ms", "eint6", "csi1"}, 6, 6}, {"PH7", 7, 7, {"gpio_in", "gpio_out", "lcd1", "pata", "uart5", "ms", "eint7", "csi1"}, 6, 7}, {"PH8", 7, 8, {"gpio_in", "gpio_out", "lcd1", "pata", "keypad", "ms", "eint8", "csi1"}, 6, 8}, {"PH9", 7, 9, {"gpio_in", "gpio_out", "lcd1", "pata", "keypad", "ms", "eint9", "csi1"}, 6, 9}, {"PH10", 7, 10, {"gpio_in", "gpio_out", "lcd1", "pata", "keypad", "ms", "eint10", "csi1"}, 6, 10}, {"PH11", 7, 11, {"gpio_in", "gpio_out", "lcd1", "pata", "keypad", "ms", "eint11", "csi1"}, 6, 11}, {"PH12", 7, 12, {"gpio_in", "gpio_out", "lcd1", "pata", "ps2", NULL, "eint12", "csi1"}, 6, 12}, {"PH13", 7, 13, {"gpio_in", "gpio_out", "lcd1", "pata", "ps2", "sim", "eint13", "csi1"}, 6, 13}, {"PH14", 7, 14, {"gpio_in", "gpio_out", "lcd1", "pata", "keypad", "sim", "eint14", "csi1"}, 6, 14}, {"PH15", 7, 15, {"gpio_in", "gpio_out", "lcd1", "pata", "keypad", "sim", "eint15", "csi1"}, 6, 15}, {"PH16", 7, 16, {"gpio_in", "gpio_out", "lcd1", "pata", "keypad", NULL, "eint16", "csi1"}, 6, 16}, {"PH17", 7, 17, {"gpio_in", "gpio_out", "lcd1", "pata", "keypad", "sim", "eint17", "csi1"}, 6, 17}, {"PH18", 7, 18, {"gpio_in", "gpio_out", "lcd1", "pata", "keypad", "sim", "eint18", "csi1"}, 6, 18}, {"PH19", 7, 19, {"gpio_in", "gpio_out", "lcd1", "pata", "keypad", "sim", "eint19", "csi1"}, 6, 19}, {"PH20", 7, 20, {"gpio_in", "gpio_out", "lcd1", "pata", "can", NULL, "eint20", "csi1"}, 6, 20}, {"PH21", 7, 21, {"gpio_in", "gpio_out", "lcd1", "pata", "can", NULL, "eint21", "csi1"}, 6, 21}, {"PH22", 7, 22, {"gpio_in", "gpio_out", "lcd1", "pata", "keypad", "mmc1", NULL, "csi1"}}, {"PH23", 7, 23, {"gpio_in", "gpio_out", "lcd1", "pata", "keypad", "mmc1", NULL, "csi1"}}, {"PH24", 7, 24, {"gpio_in", "gpio_out", "lcd1", "pata", "keypad", "mmc1", NULL, "csi1"}}, {"PH25", 7, 25, 
{"gpio_in", "gpio_out", "lcd1", "pata", "keypad", "mmc1", NULL, "csi1"}}, {"PH26", 7, 26, {"gpio_in", "gpio_out", "lcd1", "pata", "keypad", "mmc1", NULL, "csi1"}}, {"PH27", 7, 27, {"gpio_in", "gpio_out", "lcd1", "pata", "keypad", "mmc1", NULL, "csi1"}}, {"PI0", 8, 0, {"gpio_in", "gpio_out", NULL, NULL, NULL, NULL, NULL, NULL}}, {"PI1", 8, 1, {"gpio_in", "gpio_out", NULL, NULL, NULL, NULL, NULL, NULL}}, {"PI2", 8, 2, {"gpio_in", "gpio_out", NULL, NULL, NULL, NULL, NULL, NULL}}, {"PI3", 8, 3, {"gpio_in", "gpio_out", "pwm", NULL, NULL, NULL, NULL, NULL}}, {"PI4", 8, 4, {"gpio_in", "gpio_out", "mmc3", NULL, NULL, NULL, NULL, NULL}}, {"PI5", 8, 5, {"gpio_in", "gpio_out", "mmc3", NULL, NULL, NULL, NULL, NULL}}, {"PI6", 8, 6, {"gpio_in", "gpio_out", "mmc3", NULL, NULL, NULL, NULL, NULL}}, {"PI7", 8, 7, {"gpio_in", "gpio_out", "mmc3", NULL, NULL, NULL, NULL, NULL}}, {"PI8", 8, 8, {"gpio_in", "gpio_out", "mmc3", NULL, NULL, NULL, NULL, NULL}}, {"PI9", 8, 9, {"gpio_in", "gpio_out", "mmc3", NULL, NULL, NULL, NULL, NULL}}, {"PI10", 8, 10, {"gpio_in", "gpio_out", "spi0", "uart5", NULL, NULL, "eint22", NULL}, 6, 22}, {"PI11", 8, 11, {"gpio_in", "gpio_out", "spi0", "uart5", NULL, NULL, "eint23", NULL}, 6, 23}, {"PI12", 8, 12, {"gpio_in", "gpio_out", "spi0", "uart6", NULL, NULL, "eint24", NULL}, 6, 24}, {"PI13", 8, 13, {"gpio_in", "gpio_out", "spi0", "uart6", NULL, NULL, "eint25", NULL}, 6, 25}, {"PI14", 8, 14, {"gpio_in", "gpio_out", "spi0", "ps2", "timer4", NULL, "eint26", NULL}, 6, 26}, {"PI15", 8, 15, {"gpio_in", "gpio_out", "spi1", "ps2", "timer5", NULL, "eint27", NULL}, 6, 27}, {"PI16", 8, 16, {"gpio_in", "gpio_out", "spi1", "uart2", NULL, NULL, "eint28", NULL}, 6, 28}, {"PI17", 8, 17, {"gpio_in", "gpio_out", "spi1", "uart2", NULL, NULL, "eint29", NULL}, 6, 29}, {"PI18", 8, 18, {"gpio_in", "gpio_out", "spi1", "uart2", NULL, NULL, "eint30", NULL}, 6, 30}, {"PI19", 8, 19, {"gpio_in", "gpio_out", "spi1", "uart2", NULL, NULL, "eint31", NULL}, 6, 31}, {"PI20", 8, 20, {"gpio_in", "gpio_out", "ps2", "uart7", "hdmi", NULL, NULL, NULL}}, {"PI21", 8, 21, {"gpio_in", "gpio_out", "ps2", "uart7", "hdmi", NULL, NULL, NULL}}, }; const struct allwinner_padconf a10_padconf = { .npins = sizeof(a10_pins) / sizeof(struct allwinner_pins), .pins = a10_pins, }; #endif /* SOC_ALLWINNER_A10 */ Index: user/ngie/bug-237403/sys/arm/allwinner/aw_rsb.c =================================================================== --- user/ngie/bug-237403/sys/arm/allwinner/aw_rsb.c (revision 346925) +++ user/ngie/bug-237403/sys/arm/allwinner/aw_rsb.c (revision 346926) @@ -1,498 +1,500 @@ /*- * Copyright (c) 2016 Jared McNeill * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
* IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ /* * Allwinner RSB (Reduced Serial Bus) and P2WI (Push-Pull Two Wire Interface) */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include "iicbus_if.h" #define RSB_CTRL 0x00 #define START_TRANS (1 << 7) #define GLOBAL_INT_ENB (1 << 1) #define SOFT_RESET (1 << 0) #define RSB_CCR 0x04 #define RSB_INTE 0x08 #define RSB_INTS 0x0c #define INT_TRANS_ERR_ID(x) (((x) >> 8) & 0xf) #define INT_LOAD_BSY (1 << 2) #define INT_TRANS_ERR (1 << 1) #define INT_TRANS_OVER (1 << 0) #define INT_MASK (INT_LOAD_BSY|INT_TRANS_ERR|INT_TRANS_OVER) #define RSB_DADDR0 0x10 #define RSB_DADDR1 0x14 #define RSB_DLEN 0x18 #define DLEN_READ (1 << 4) #define RSB_DATA0 0x1c #define RSB_DATA1 0x20 #define RSB_CMD 0x2c #define CMD_SRTA 0xe8 #define CMD_RD8 0x8b #define CMD_RD16 0x9c #define CMD_RD32 0xa6 #define CMD_WR8 0x4e #define CMD_WR16 0x59 #define CMD_WR32 0x63 #define RSB_DAR 0x30 #define DAR_RTA (0xff << 16) #define DAR_RTA_SHIFT 16 #define DAR_DA (0xffff << 0) #define DAR_DA_SHIFT 0 #define RSB_MAXLEN 8 #define RSB_RESET_RETRY 100 #define RSB_I2C_TIMEOUT hz #define RSB_ADDR_PMIC_PRIMARY 0x3a3 #define RSB_ADDR_PMIC_SECONDARY 0x745 #define RSB_ADDR_PERIPH_IC 0xe89 #define A31_P2WI 1 #define A23_RSB 2 static struct ofw_compat_data compat_data[] = { { "allwinner,sun6i-a31-p2wi", A31_P2WI }, { "allwinner,sun8i-a23-rsb", A23_RSB }, { NULL, 0 } }; static struct resource_spec rsb_spec[] = { { SYS_RES_MEMORY, 0, RF_ACTIVE }, { -1, 0 } }; /* * Device address to Run-time address mappings. * * Run-time address (RTA) is an 8-bit value used to address the device during * a read or write transaction. The following are valid RTAs: * 0x17 0x2d 0x3a 0x4e 0x59 0x63 0x74 0x8b 0x9c 0xa6 0xb1 0xc5 0xd2 0xe8 0xff * * Allwinner uses RTA 0x2d for the primary PMIC, 0x3a for the secondary PMIC, * and 0x4e for the peripheral IC (where applicable). */ static const struct { uint16_t addr; uint8_t rta; } rsb_rtamap[] = { { .addr = RSB_ADDR_PMIC_PRIMARY, .rta = 0x2d }, { .addr = RSB_ADDR_PMIC_SECONDARY, .rta = 0x3a }, { .addr = RSB_ADDR_PERIPH_IC, .rta = 0x4e }, { .addr = 0, .rta = 0 } }; struct rsb_softc { struct resource *res; struct mtx mtx; clk_t clk; hwreset_t rst; device_t iicbus; int busy; uint32_t status; uint16_t cur_addr; int type; struct iic_msg *msg; }; #define RSB_LOCK(sc) mtx_lock(&(sc)->mtx) #define RSB_UNLOCK(sc) mtx_unlock(&(sc)->mtx) #define RSB_ASSERT_LOCKED(sc) mtx_assert(&(sc)->mtx, MA_OWNED) #define RSB_READ(sc, reg) bus_read_4((sc)->res, (reg)) #define RSB_WRITE(sc, reg, val) bus_write_4((sc)->res, (reg), (val)) static phandle_t rsb_get_node(device_t bus, device_t dev) { return (ofw_bus_get_node(bus)); } static int rsb_reset(device_t dev, u_char speed, u_char addr, u_char *oldaddr) { struct rsb_softc *sc; int retry; sc = device_get_softc(dev); RSB_LOCK(sc); /* Write soft-reset bit and wait for it to self-clear. 
*/ RSB_WRITE(sc, RSB_CTRL, SOFT_RESET); for (retry = RSB_RESET_RETRY; retry > 0; retry--) if ((RSB_READ(sc, RSB_CTRL) & SOFT_RESET) == 0) break; RSB_UNLOCK(sc); if (retry == 0) { device_printf(dev, "soft reset timeout\n"); return (ETIMEDOUT); } return (IIC_ENOADDR); } static uint32_t rsb_encode(const uint8_t *buf, u_int len, u_int off) { uint32_t val; u_int n; val = 0; for (n = off; n < MIN(len, 4 + off); n++) val |= ((uint32_t)buf[n] << ((n - off) * NBBY)); return val; } static void rsb_decode(const uint32_t val, uint8_t *buf, u_int len, u_int off) { u_int n; for (n = off; n < MIN(len, 4 + off); n++) buf[n] = (val >> ((n - off) * NBBY)) & 0xff; } static int rsb_start(device_t dev) { struct rsb_softc *sc; int error, retry; sc = device_get_softc(dev); RSB_ASSERT_LOCKED(sc); /* Start the transfer */ RSB_WRITE(sc, RSB_CTRL, GLOBAL_INT_ENB | START_TRANS); /* Wait for transfer to complete */ error = ETIMEDOUT; for (retry = RSB_I2C_TIMEOUT; retry > 0; retry--) { sc->status |= RSB_READ(sc, RSB_INTS); if ((sc->status & INT_TRANS_OVER) != 0) { error = 0; break; } DELAY((1000 * hz) / RSB_I2C_TIMEOUT); } if (error == 0 && (sc->status & INT_TRANS_OVER) == 0) { device_printf(dev, "transfer error, status 0x%08x\n", sc->status); error = EIO; } return (error); } static int rsb_set_rta(device_t dev, uint16_t addr) { struct rsb_softc *sc; uint8_t rta; int i; sc = device_get_softc(dev); RSB_ASSERT_LOCKED(sc); /* Lookup run-time address for given device address */ for (rta = 0, i = 0; rsb_rtamap[i].rta != 0; i++) if (rsb_rtamap[i].addr == addr) { rta = rsb_rtamap[i].rta; break; } if (rta == 0) { device_printf(dev, "RTA not known for address %#x\n", addr); return (ENXIO); } /* Set run-time address */ RSB_WRITE(sc, RSB_INTS, RSB_READ(sc, RSB_INTS)); RSB_WRITE(sc, RSB_DAR, (addr << DAR_DA_SHIFT) | (rta << DAR_RTA_SHIFT)); RSB_WRITE(sc, RSB_CMD, CMD_SRTA); return (rsb_start(dev)); } static int rsb_transfer(device_t dev, struct iic_msg *msgs, uint32_t nmsgs) { struct rsb_softc *sc; uint32_t daddr[2], data[2], dlen; uint16_t device_addr; uint8_t cmd; int error; sc = device_get_softc(dev); /* * P2WI and RSB are not really I2C or SMBus controllers, so there are * some restrictions imposed by the driver. * * Transfers must contain exactly two messages. The first is always * a write, containing a single data byte offset. Data will either * be read from or written to the corresponding data byte in the * second message. The slave address in both messages must be the * same. */ if (nmsgs != 2 || (msgs[0].flags & IIC_M_RD) == IIC_M_RD || (msgs[0].slave >> 1) != (msgs[1].slave >> 1) || msgs[0].len != 1 || msgs[1].len > RSB_MAXLEN) return (EINVAL); /* The RSB controller can read or write 1, 2, or 4 bytes at a time. 
*/ if (sc->type == A23_RSB) { if ((msgs[1].flags & IIC_M_RD) != 0) { switch (msgs[1].len) { case 1: cmd = CMD_RD8; break; case 2: cmd = CMD_RD16; break; case 4: cmd = CMD_RD32; break; default: return (EINVAL); } } else { switch (msgs[1].len) { case 1: cmd = CMD_WR8; break; case 2: cmd = CMD_WR16; break; case 4: cmd = CMD_WR32; break; default: return (EINVAL); } } } RSB_LOCK(sc); while (sc->busy) mtx_sleep(sc, &sc->mtx, 0, "i2cbuswait", 0); sc->busy = 1; sc->status = 0; /* Select current run-time address if necessary */ if (sc->type == A23_RSB) { device_addr = msgs[0].slave >> 1; if (sc->cur_addr != device_addr) { error = rsb_set_rta(dev, device_addr); if (error != 0) goto done; sc->cur_addr = device_addr; sc->status = 0; } } /* Clear interrupt status */ RSB_WRITE(sc, RSB_INTS, RSB_READ(sc, RSB_INTS)); /* Program data access address registers */ daddr[0] = rsb_encode(msgs[0].buf, msgs[0].len, 0); RSB_WRITE(sc, RSB_DADDR0, daddr[0]); /* Write data */ if ((msgs[1].flags & IIC_M_RD) == 0) { data[0] = rsb_encode(msgs[1].buf, msgs[1].len, 0); RSB_WRITE(sc, RSB_DATA0, data[0]); } /* Set command type for RSB */ if (sc->type == A23_RSB) RSB_WRITE(sc, RSB_CMD, cmd); /* Program data length register and transfer direction */ dlen = msgs[0].len - 1; if ((msgs[1].flags & IIC_M_RD) == IIC_M_RD) dlen |= DLEN_READ; RSB_WRITE(sc, RSB_DLEN, dlen); /* Start transfer */ error = rsb_start(dev); if (error != 0) goto done; /* Read data */ if ((msgs[1].flags & IIC_M_RD) == IIC_M_RD) { data[0] = RSB_READ(sc, RSB_DATA0); rsb_decode(data[0], msgs[1].buf, msgs[1].len, 0); } done: sc->msg = NULL; sc->busy = 0; wakeup(sc); RSB_UNLOCK(sc); return (error); } static int rsb_probe(device_t dev) { if (!ofw_bus_status_okay(dev)) return (ENXIO); switch (ofw_bus_search_compatible(dev, compat_data)->ocd_data) { case A23_RSB: device_set_desc(dev, "Allwinner RSB"); break; case A31_P2WI: device_set_desc(dev, "Allwinner P2WI"); break; default: return (ENXIO); } return (BUS_PROBE_DEFAULT); } static int rsb_attach(device_t dev) { struct rsb_softc *sc; int error; sc = device_get_softc(dev); mtx_init(&sc->mtx, device_get_nameunit(dev), "rsb", MTX_DEF); sc->type = ofw_bus_search_compatible(dev, compat_data)->ocd_data; if (clk_get_by_ofw_index(dev, 0, 0, &sc->clk) == 0) { error = clk_enable(sc->clk); if (error != 0) { device_printf(dev, "cannot enable clock\n"); goto fail; } } if (hwreset_get_by_ofw_idx(dev, 0, 0, &sc->rst) == 0) { error = hwreset_deassert(sc->rst); if (error != 0) { device_printf(dev, "cannot de-assert reset\n"); goto fail; } } if (bus_alloc_resources(dev, rsb_spec, &sc->res) != 0) { device_printf(dev, "cannot allocate resources for device\n"); error = ENXIO; goto fail; } sc->iicbus = device_add_child(dev, "iicbus", -1); if (sc->iicbus == NULL) { device_printf(dev, "cannot add iicbus child device\n"); error = ENXIO; goto fail; } bus_generic_attach(dev); return (0); fail: bus_release_resources(dev, rsb_spec, &sc->res); if (sc->rst != NULL) hwreset_release(sc->rst); if (sc->clk != NULL) clk_release(sc->clk); mtx_destroy(&sc->mtx); return (error); } static device_method_t rsb_methods[] = { /* Device interface */ DEVMETHOD(device_probe, rsb_probe), DEVMETHOD(device_attach, rsb_attach), /* Bus interface */ DEVMETHOD(bus_setup_intr, bus_generic_setup_intr), DEVMETHOD(bus_teardown_intr, bus_generic_teardown_intr), DEVMETHOD(bus_alloc_resource, bus_generic_alloc_resource), DEVMETHOD(bus_release_resource, bus_generic_release_resource), DEVMETHOD(bus_activate_resource, bus_generic_activate_resource), 
DEVMETHOD(bus_deactivate_resource, bus_generic_deactivate_resource), DEVMETHOD(bus_adjust_resource, bus_generic_adjust_resource), DEVMETHOD(bus_set_resource, bus_generic_rl_set_resource), DEVMETHOD(bus_get_resource, bus_generic_rl_get_resource), /* OFW methods */ DEVMETHOD(ofw_bus_get_node, rsb_get_node), /* iicbus interface */ DEVMETHOD(iicbus_callback, iicbus_null_callback), DEVMETHOD(iicbus_reset, rsb_reset), DEVMETHOD(iicbus_transfer, rsb_transfer), DEVMETHOD_END }; static driver_t rsb_driver = { "iichb", rsb_methods, sizeof(struct rsb_softc), }; static devclass_t rsb_devclass; EARLY_DRIVER_MODULE(iicbus, rsb, iicbus_driver, iicbus_devclass, 0, 0, BUS_PASS_RESOURCE + BUS_PASS_ORDER_MIDDLE); EARLY_DRIVER_MODULE(rsb, simplebus, rsb_driver, rsb_devclass, 0, 0, BUS_PASS_RESOURCE + BUS_PASS_ORDER_MIDDLE); MODULE_VERSION(rsb, 1); +MODULE_DEPEND(rsb, iicbus, 1, 1, 1); +SIMPLEBUS_PNP_INFO(compat_data); Index: user/ngie/bug-237403/sys/arm/allwinner/aw_rtc.c =================================================================== --- user/ngie/bug-237403/sys/arm/allwinner/aw_rtc.c (revision 346925) +++ user/ngie/bug-237403/sys/arm/allwinner/aw_rtc.c (revision 346926) @@ -1,364 +1,367 @@ /*- * Copyright (c) 2019 Emmanuel Vadot * Copyright (c) 2016 Vladimir Belian * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "clock_if.h" #define LOSC_CTRL_REG 0x00 #define A10_RTC_DATE_REG 0x04 #define A10_RTC_TIME_REG 0x08 #define A31_LOSC_AUTO_SWT_STA 0x04 #define A31_RTC_DATE_REG 0x10 #define A31_RTC_TIME_REG 0x14 #define TIME_MASK 0x001f3f3f #define LOSC_OSC_SRC (1 << 0) #define LOSC_GSM (1 << 3) #define LOSC_AUTO_SW_EN (1 << 14) #define LOSC_MAGIC 0x16aa0000 #define LOSC_BUSY_MASK 0x00000380 #define IS_SUN7I (sc->conf->is_a20 == true) #define YEAR_MIN (IS_SUN7I ? 1970 : 2010) #define YEAR_MAX (IS_SUN7I ? 2100 : 2073) #define YEAR_OFFSET (IS_SUN7I ? 1900 : 2010) #define YEAR_MASK (IS_SUN7I ? 0xff : 0x3f) #define LEAP_BIT (IS_SUN7I ? 
24 : 22) #define GET_SEC_VALUE(x) ((x) & 0x0000003f) #define GET_MIN_VALUE(x) (((x) & 0x00003f00) >> 8) #define GET_HOUR_VALUE(x) (((x) & 0x001f0000) >> 16) #define GET_DAY_VALUE(x) ((x) & 0x0000001f) #define GET_MON_VALUE(x) (((x) & 0x00000f00) >> 8) #define GET_YEAR_VALUE(x) (((x) >> 16) & YEAR_MASK) #define SET_DAY_VALUE(x) GET_DAY_VALUE(x) #define SET_MON_VALUE(x) (((x) & 0x0000000f) << 8) #define SET_YEAR_VALUE(x) (((x) & YEAR_MASK) << 16) #define SET_LEAP_VALUE(x) (((x) & 0x00000001) << LEAP_BIT) #define SET_SEC_VALUE(x) GET_SEC_VALUE(x) #define SET_MIN_VALUE(x) (((x) & 0x0000003f) << 8) #define SET_HOUR_VALUE(x) (((x) & 0x0000001f) << 16) #define HALF_OF_SEC_NS 500000000 #define RTC_RES_US 1000000 #define RTC_TIMEOUT 70 #define RTC_READ(sc, reg) bus_read_4((sc)->res, (reg)) #define RTC_WRITE(sc, reg, val) bus_write_4((sc)->res, (reg), (val)) #define IS_LEAP_YEAR(y) (((y) % 400) == 0 || (((y) % 100) != 0 && ((y) % 4) == 0)) struct aw_rtc_conf { uint64_t iosc_freq; bus_size_t rtc_date; bus_size_t rtc_time; bus_size_t rtc_losc_sta; bool is_a20; }; struct aw_rtc_conf a10_conf = { .rtc_date = A10_RTC_DATE_REG, .rtc_time = A10_RTC_TIME_REG, .rtc_losc_sta = LOSC_CTRL_REG, }; struct aw_rtc_conf a20_conf = { .rtc_date = A10_RTC_DATE_REG, .rtc_time = A10_RTC_TIME_REG, .rtc_losc_sta = LOSC_CTRL_REG, .is_a20 = true, }; struct aw_rtc_conf a31_conf = { .iosc_freq = 650000, /* between 600 and 700 Khz */ .rtc_date = A31_RTC_DATE_REG, .rtc_time = A31_RTC_TIME_REG, .rtc_losc_sta = A31_LOSC_AUTO_SWT_STA, }; struct aw_rtc_conf h3_conf = { .iosc_freq = 16000000, .rtc_date = A31_RTC_DATE_REG, .rtc_time = A31_RTC_TIME_REG, .rtc_losc_sta = A31_LOSC_AUTO_SWT_STA, }; static struct ofw_compat_data compat_data[] = { { "allwinner,sun4i-a10-rtc", (uintptr_t) &a10_conf }, { "allwinner,sun7i-a20-rtc", (uintptr_t) &a20_conf }, { "allwinner,sun6i-a31-rtc", (uintptr_t) &a31_conf }, { "allwinner,sun8i-h3-rtc", (uintptr_t) &h3_conf }, + { "allwinner,sun50i-h5-rtc", (uintptr_t) &h3_conf }, { NULL, 0 } }; struct aw_rtc_softc { struct resource *res; struct aw_rtc_conf *conf; int type; }; static struct clk_fixed_def aw_rtc_osc32k = { .clkdef.id = 0, .freq = 32768, }; static struct clk_fixed_def aw_rtc_iosc = { .clkdef.id = 2, }; static void aw_rtc_install_clocks(struct aw_rtc_softc *sc, device_t dev); static int aw_rtc_probe(device_t dev); static int aw_rtc_attach(device_t dev); static int aw_rtc_detach(device_t dev); static int aw_rtc_gettime(device_t dev, struct timespec *ts); static int aw_rtc_settime(device_t dev, struct timespec *ts); static device_method_t aw_rtc_methods[] = { DEVMETHOD(device_probe, aw_rtc_probe), DEVMETHOD(device_attach, aw_rtc_attach), DEVMETHOD(device_detach, aw_rtc_detach), DEVMETHOD(clock_gettime, aw_rtc_gettime), DEVMETHOD(clock_settime, aw_rtc_settime), DEVMETHOD_END }; static driver_t aw_rtc_driver = { "rtc", aw_rtc_methods, sizeof(struct aw_rtc_softc), }; static devclass_t aw_rtc_devclass; EARLY_DRIVER_MODULE(aw_rtc, simplebus, aw_rtc_driver, aw_rtc_devclass, 0, 0, BUS_PASS_BUS + BUS_PASS_ORDER_MIDDLE); +MODULE_VERSION(aw_rtc, 1); +SIMPLEBUS_PNP_INFO(compat_data); static int aw_rtc_probe(device_t dev) { if (!ofw_bus_status_okay(dev)) return (ENXIO); if (!ofw_bus_search_compatible(dev, compat_data)->ocd_data) return (ENXIO); device_set_desc(dev, "Allwinner RTC"); return (BUS_PROBE_DEFAULT); } static int aw_rtc_attach(device_t dev) { struct aw_rtc_softc *sc = device_get_softc(dev); uint32_t val; int rid = 0; sc->res = bus_alloc_resource_any(dev, SYS_RES_MEMORY, &rid, RF_ACTIVE); if (!sc->res) 
{ device_printf(dev, "could not allocate resources\n"); return (ENXIO); } sc->conf = (struct aw_rtc_conf *)ofw_bus_search_compatible(dev, compat_data)->ocd_data; val = RTC_READ(sc, LOSC_CTRL_REG); val |= LOSC_AUTO_SW_EN; val |= LOSC_MAGIC | LOSC_GSM | LOSC_OSC_SRC; RTC_WRITE(sc, LOSC_CTRL_REG, val); DELAY(100); if (bootverbose) { val = RTC_READ(sc, sc->conf->rtc_losc_sta); if ((val & LOSC_OSC_SRC) == 0) device_printf(dev, "Using internal oscillator\n"); else device_printf(dev, "Using external oscillator\n"); } aw_rtc_install_clocks(sc, dev); clock_register(dev, RTC_RES_US); return (0); } static int aw_rtc_detach(device_t dev) { /* can't support detach, since there's no clock_unregister function */ return (EBUSY); } static void aw_rtc_install_clocks(struct aw_rtc_softc *sc, device_t dev) { struct clkdom *clkdom; const char **clknames; phandle_t node; int nclocks; node = ofw_bus_get_node(dev); nclocks = ofw_bus_string_list_to_array(node, "clock-output-names", &clknames); /* No clocks to export */ if (nclocks <= 0) return; if (nclocks != 3) { device_printf(dev, "Having only %d clocks instead of 3, aborting\n", nclocks); return; } clkdom = clkdom_create(dev); aw_rtc_osc32k.clkdef.name = clknames[0]; if (clknode_fixed_register(clkdom, &aw_rtc_osc32k) != 0) device_printf(dev, "Cannot register osc32k clock\n"); aw_rtc_iosc.clkdef.name = clknames[2]; aw_rtc_iosc.freq = sc->conf->iosc_freq; if (clknode_fixed_register(clkdom, &aw_rtc_iosc) != 0) device_printf(dev, "Cannot register iosc clock\n"); clkdom_finit(clkdom); if (bootverbose) clkdom_dump(clkdom); } static int aw_rtc_gettime(device_t dev, struct timespec *ts) { struct aw_rtc_softc *sc = device_get_softc(dev); struct clocktime ct; uint32_t rdate, rtime; rdate = RTC_READ(sc, sc->conf->rtc_date); rtime = RTC_READ(sc, sc->conf->rtc_time); if ((rtime & TIME_MASK) == 0) rdate = RTC_READ(sc, sc->conf->rtc_date); ct.sec = GET_SEC_VALUE(rtime); ct.min = GET_MIN_VALUE(rtime); ct.hour = GET_HOUR_VALUE(rtime); ct.day = GET_DAY_VALUE(rdate); ct.mon = GET_MON_VALUE(rdate); ct.year = GET_YEAR_VALUE(rdate) + YEAR_OFFSET; ct.dow = -1; /* RTC resolution is 1 sec */ ct.nsec = 0; return (clock_ct_to_ts(&ct, ts)); } static int aw_rtc_settime(device_t dev, struct timespec *ts) { struct aw_rtc_softc *sc = device_get_softc(dev); struct clocktime ct; uint32_t clk, rdate, rtime; /* RTC resolution is 1 sec */ if (ts->tv_nsec >= HALF_OF_SEC_NS) ts->tv_sec++; ts->tv_nsec = 0; clock_ts_to_ct(ts, &ct); if ((ct.year < YEAR_MIN) || (ct.year > YEAR_MAX)) { device_printf(dev, "could not set time, year out of range\n"); return (EINVAL); } for (clk = 0; RTC_READ(sc, LOSC_CTRL_REG) & LOSC_BUSY_MASK; clk++) { if (clk > RTC_TIMEOUT) { device_printf(dev, "could not set time, RTC busy\n"); return (EINVAL); } DELAY(1); } /* reset time register to avoid unexpected date increment */ RTC_WRITE(sc, sc->conf->rtc_time, 0); rdate = SET_DAY_VALUE(ct.day) | SET_MON_VALUE(ct.mon) | SET_YEAR_VALUE(ct.year - YEAR_OFFSET) | SET_LEAP_VALUE(IS_LEAP_YEAR(ct.year)); rtime = SET_SEC_VALUE(ct.sec) | SET_MIN_VALUE(ct.min) | SET_HOUR_VALUE(ct.hour); for (clk = 0; RTC_READ(sc, LOSC_CTRL_REG) & LOSC_BUSY_MASK; clk++) { if (clk > RTC_TIMEOUT) { device_printf(dev, "could not set date, RTC busy\n"); return (EINVAL); } DELAY(1); } RTC_WRITE(sc, sc->conf->rtc_date, rdate); for (clk = 0; RTC_READ(sc, LOSC_CTRL_REG) & LOSC_BUSY_MASK; clk++) { if (clk > RTC_TIMEOUT) { device_printf(dev, "could not set time, RTC busy\n"); return (EINVAL); } DELAY(1); } RTC_WRITE(sc, sc->conf->rtc_time, rtime); DELAY(RTC_TIMEOUT); 
return (0); } Index: user/ngie/bug-237403/sys/arm/allwinner/aw_sid.c =================================================================== --- user/ngie/bug-237403/sys/arm/allwinner/aw_sid.c (revision 346925) +++ user/ngie/bug-237403/sys/arm/allwinner/aw_sid.c (revision 346926) @@ -1,416 +1,417 @@ /*- * Copyright (c) 2016 Jared McNeill * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ /* * Allwinner secure ID controller */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "nvmem_if.h" /* * Starting at least from sun8iw6 (A83T), the EFUSE starts at 0x200. * There are 3 registers in the low area to read/write protected EFUSE.
*/ #define SID_PRCTL 0x40 #define SID_PRCTL_OFFSET_MASK 0xff #define SID_PRCTL_OFFSET(n) (((n) & SID_PRCTL_OFFSET_MASK) << 16) #define SID_PRCTL_LOCK (0xac << 8) #define SID_PRCTL_READ (0x01 << 1) #define SID_PRCTL_WRITE (0x01 << 0) #define SID_PRKEY 0x50 #define SID_RDKEY 0x60 #define EFUSE_OFFSET 0x200 #define EFUSE_NAME_SIZE 32 #define EFUSE_DESC_SIZE 64 struct aw_sid_efuse { char name[EFUSE_NAME_SIZE]; char desc[EFUSE_DESC_SIZE]; bus_size_t base; bus_size_t offset; uint32_t size; enum aw_sid_fuse_id id; bool public; }; static struct aw_sid_efuse a10_efuses[] = { { .name = "rootkey", .desc = "Root Key or ChipID", .offset = 0x0, .size = 16, .id = AW_SID_FUSE_ROOTKEY, .public = true, }, }; static struct aw_sid_efuse a64_efuses[] = { { .name = "rootkey", .desc = "Root Key or ChipID", .base = EFUSE_OFFSET, .offset = 0x00, .size = 16, .id = AW_SID_FUSE_ROOTKEY, .public = true, }, { .name = "ths-calib", .desc = "Thermal Sensor Calibration Data", .base = EFUSE_OFFSET, .offset = 0x34, .size = 6, .id = AW_SID_FUSE_THSSENSOR, .public = true, }, }; static struct aw_sid_efuse a83t_efuses[] = { { .name = "rootkey", .desc = "Root Key or ChipID", .base = EFUSE_OFFSET, .offset = 0x00, .size = 16, .id = AW_SID_FUSE_ROOTKEY, .public = true, }, { .name = "ths-calib", .desc = "Thermal Sensor Calibration Data", .base = EFUSE_OFFSET, .offset = 0x34, .size = 8, .id = AW_SID_FUSE_THSSENSOR, .public = true, }, }; static struct aw_sid_efuse h3_efuses[] = { { .name = "rootkey", .desc = "Root Key or ChipID", .base = EFUSE_OFFSET, .offset = 0x00, .size = 16, .id = AW_SID_FUSE_ROOTKEY, .public = true, }, { .name = "ths-calib", .desc = "Thermal Sensor Calibration Data", .base = EFUSE_OFFSET, .offset = 0x34, .size = 2, .id = AW_SID_FUSE_THSSENSOR, .public = false, }, }; static struct aw_sid_efuse h5_efuses[] = { { .name = "rootkey", .desc = "Root Key or ChipID", .base = EFUSE_OFFSET, .offset = 0x00, .size = 16, .id = AW_SID_FUSE_ROOTKEY, .public = true, }, { .name = "ths-calib", .desc = "Thermal Sensor Calibration Data", .base = EFUSE_OFFSET, .offset = 0x34, .size = 4, .id = AW_SID_FUSE_THSSENSOR, .public = true, }, }; struct aw_sid_conf { struct aw_sid_efuse *efuses; size_t nfuses; }; static const struct aw_sid_conf a10_conf = { .efuses = a10_efuses, .nfuses = nitems(a10_efuses), }; static const struct aw_sid_conf a20_conf = { .efuses = a10_efuses, .nfuses = nitems(a10_efuses), }; static const struct aw_sid_conf a64_conf = { .efuses = a64_efuses, .nfuses = nitems(a64_efuses), }; static const struct aw_sid_conf a83t_conf = { .efuses = a83t_efuses, .nfuses = nitems(a83t_efuses), }; static const struct aw_sid_conf h3_conf = { .efuses = h3_efuses, .nfuses = nitems(h3_efuses), }; static const struct aw_sid_conf h5_conf = { .efuses = h5_efuses, .nfuses = nitems(h5_efuses), }; static struct ofw_compat_data compat_data[] = { { "allwinner,sun4i-a10-sid", (uintptr_t)&a10_conf}, { "allwinner,sun7i-a20-sid", (uintptr_t)&a20_conf}, { "allwinner,sun50i-a64-sid", (uintptr_t)&a64_conf}, { "allwinner,sun8i-a83t-sid", (uintptr_t)&a83t_conf}, { "allwinner,sun8i-h3-sid", (uintptr_t)&h3_conf}, { "allwinner,sun50i-h5-sid", (uintptr_t)&h5_conf}, { NULL, 0 } }; struct aw_sid_softc { device_t sid_dev; struct resource *res; struct aw_sid_conf *sid_conf; struct mtx prctl_mtx; }; static struct aw_sid_softc *aw_sid_sc; static struct resource_spec aw_sid_spec[] = { { SYS_RES_MEMORY, 0, RF_ACTIVE }, { -1, 0 } }; #define RD1(sc, reg) bus_read_1((sc)->res, (reg)) #define RD4(sc, reg) bus_read_4((sc)->res, (reg)) #define WR4(sc, reg, val) 
bus_write_4((sc)->res, (reg), (val)) static int aw_sid_sysctl(SYSCTL_HANDLER_ARGS); static int aw_sid_probe(device_t dev) { if (!ofw_bus_status_okay(dev)) return (ENXIO); if (ofw_bus_search_compatible(dev, compat_data)->ocd_data == 0) return (ENXIO); device_set_desc(dev, "Allwinner Secure ID Controller"); return (BUS_PROBE_DEFAULT); } static int aw_sid_attach(device_t dev) { struct aw_sid_softc *sc; phandle_t node; int i; node = ofw_bus_get_node(dev); sc = device_get_softc(dev); sc->sid_dev = dev; if (bus_alloc_resources(dev, aw_sid_spec, &sc->res) != 0) { device_printf(dev, "cannot allocate resources for device\n"); return (ENXIO); } mtx_init(&sc->prctl_mtx, device_get_nameunit(dev), NULL, MTX_DEF); sc->sid_conf = (struct aw_sid_conf *)ofw_bus_search_compatible(dev, compat_data)->ocd_data; aw_sid_sc = sc; /* Register ourself so device can resolve who we are */ OF_device_register_xref(OF_xref_from_node(node), dev); for (i = 0; i < sc->sid_conf->nfuses ;i++) {\ SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev), SYSCTL_CHILDREN(device_get_sysctl_tree(dev)), OID_AUTO, sc->sid_conf->efuses[i].name, CTLTYPE_STRING | CTLFLAG_RD, dev, sc->sid_conf->efuses[i].id, aw_sid_sysctl, "A", sc->sid_conf->efuses[i].desc); } return (0); } int aw_sid_get_fuse(enum aw_sid_fuse_id id, uint8_t *out, uint32_t *size) { struct aw_sid_softc *sc; uint32_t val; int i, j; sc = aw_sid_sc; if (sc == NULL) return (ENXIO); for (i = 0; i < sc->sid_conf->nfuses; i++) if (id == sc->sid_conf->efuses[i].id) break; if (i == sc->sid_conf->nfuses) return (ENOENT); if (*size != sc->sid_conf->efuses[i].size) { *size = sc->sid_conf->efuses[i].size; return (ENOMEM); } if (out == NULL) return (ENOMEM); if (sc->sid_conf->efuses[i].public == false) mtx_lock(&sc->prctl_mtx); for (j = 0; j < sc->sid_conf->efuses[i].size; j += 4) { if (sc->sid_conf->efuses[i].public == false) { val = SID_PRCTL_OFFSET(sc->sid_conf->efuses[i].offset + j) | SID_PRCTL_LOCK | SID_PRCTL_READ; WR4(sc, SID_PRCTL, val); /* Read bit will be cleared once read has concluded */ while (RD4(sc, SID_PRCTL) & SID_PRCTL_READ) continue; val = RD4(sc, SID_RDKEY); } else val = RD4(sc, sc->sid_conf->efuses[i].base + sc->sid_conf->efuses[i].offset + j); out[j] = val & 0xFF; if (j + 1 < *size) out[j + 1] = (val & 0xFF00) >> 8; if (j + 2 < *size) out[j + 2] = (val & 0xFF0000) >> 16; if (j + 3 < *size) out[j + 3] = (val & 0xFF000000) >> 24; } if (sc->sid_conf->efuses[i].public == false) mtx_unlock(&sc->prctl_mtx); return (0); } static int aw_sid_read(device_t dev, uint32_t offset, uint32_t size, uint8_t *buffer) { struct aw_sid_softc *sc; enum aw_sid_fuse_id fuse_id = 0; int i; sc = device_get_softc(dev); for (i = 0; i < sc->sid_conf->nfuses; i++) if (offset == (sc->sid_conf->efuses[i].base + sc->sid_conf->efuses[i].offset)) { fuse_id = sc->sid_conf->efuses[i].id; break; } if (fuse_id == 0) return (ENOENT); return (aw_sid_get_fuse(fuse_id, buffer, &size)); } static int aw_sid_sysctl(SYSCTL_HANDLER_ARGS) { struct aw_sid_softc *sc; device_t dev = arg1; enum aw_sid_fuse_id fuse = arg2; uint8_t data[32]; char out[128]; uint32_t size; int ret, i; sc = device_get_softc(dev); /* Get the size of the efuse data */ size = 0; aw_sid_get_fuse(fuse, NULL, &size); /* We now have the real size */ ret = aw_sid_get_fuse(fuse, data, &size); if (ret != 0) { device_printf(dev, "Cannot get fuse id %d: %d\n", fuse, ret); return (ENOENT); } for (i = 0; i < size; i++) snprintf(out + (i * 2), sizeof(out) - (i * 2), "%.2x", data[i]); return sysctl_handle_string(oidp, out, sizeof(out), req); } static 
device_method_t aw_sid_methods[] = { /* Device interface */ DEVMETHOD(device_probe, aw_sid_probe), DEVMETHOD(device_attach, aw_sid_attach), /* NVMEM interface */ DEVMETHOD(nvmem_read, aw_sid_read), DEVMETHOD_END }; static driver_t aw_sid_driver = { "aw_sid", aw_sid_methods, sizeof(struct aw_sid_softc), }; static devclass_t aw_sid_devclass; EARLY_DRIVER_MODULE(aw_sid, simplebus, aw_sid_driver, aw_sid_devclass, 0, 0, BUS_PASS_RESOURCE + BUS_PASS_ORDER_FIRST); MODULE_VERSION(aw_sid, 1); +SIMPLEBUS_PNP_INFO(compat_data); Index: user/ngie/bug-237403/sys/arm/allwinner/aw_syscon.c =================================================================== --- user/ngie/bug-237403/sys/arm/allwinner/aw_syscon.c (revision 346925) +++ user/ngie/bug-237403/sys/arm/allwinner/aw_syscon.c (revision 346926) @@ -1,85 +1,86 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2018 Kyle Evans * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
*/ /* * Allwinner syscon driver */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include static struct ofw_compat_data compat_data[] = { {"allwinner,sun50i-a64-system-controller", 1}, {"allwinner,sun50i-a64-system-control", 1}, {"allwinner,sun8i-a83t-system-controller", 1}, {"allwinner,sun8i-h3-system-controller", 1}, {"allwinner,sun8i-h3-system-control", 1}, + {"allwinner,sun50i-h5-system-control", 1}, {NULL, 0} }; static int aw_syscon_probe(device_t dev) { if (!ofw_bus_status_okay(dev)) return (ENXIO); if (ofw_bus_search_compatible(dev, compat_data)->ocd_data == 0) return (ENXIO); device_set_desc(dev, "Allwinner syscon"); return (BUS_PROBE_DEFAULT); } static device_method_t aw_syscon_methods[] = { DEVMETHOD(device_probe, aw_syscon_probe), DEVMETHOD_END }; DEFINE_CLASS_1(aw_syscon, aw_syscon_driver, aw_syscon_methods, sizeof(struct syscon_generic_softc), syscon_generic_driver); static devclass_t aw_syscon_devclass; /* aw_syscon needs to attach prior to if_awg */ EARLY_DRIVER_MODULE(aw_syscon, simplebus, aw_syscon_driver, aw_syscon_devclass, 0, 0, BUS_PASS_SUPPORTDEV + BUS_PASS_ORDER_MIDDLE); MODULE_VERSION(aw_syscon, 1); Index: user/ngie/bug-237403/sys/arm/allwinner/aw_thermal.c =================================================================== --- user/ngie/bug-237403/sys/arm/allwinner/aw_thermal.c (revision 346925) +++ user/ngie/bug-237403/sys/arm/allwinner/aw_thermal.c (revision 346926) @@ -1,730 +1,732 @@ /*- * Copyright (c) 2016 Jared McNeill * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $FreeBSD$ */ /* * Allwinner thermal sensor controller */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "cpufreq_if.h" #include "nvmem_if.h" #define THS_CTRL0 0x00 #define THS_CTRL1 0x04 #define ADC_CALI_EN (1 << 17) #define THS_CTRL2 0x40 #define SENSOR_ACQ1_SHIFT 16 #define SENSOR2_EN (1 << 2) #define SENSOR1_EN (1 << 1) #define SENSOR0_EN (1 << 0) #define THS_INTC 0x44 #define THS_THERMAL_PER_SHIFT 12 #define THS_INTS 0x48 #define THS2_DATA_IRQ_STS (1 << 10) #define THS1_DATA_IRQ_STS (1 << 9) #define THS0_DATA_IRQ_STS (1 << 8) #define SHUT_INT2_STS (1 << 6) #define SHUT_INT1_STS (1 << 5) #define SHUT_INT0_STS (1 << 4) #define ALARM_INT2_STS (1 << 2) #define ALARM_INT1_STS (1 << 1) #define ALARM_INT0_STS (1 << 0) #define THS_ALARM0_CTRL 0x50 #define ALARM_T_HOT_MASK 0xfff #define ALARM_T_HOT_SHIFT 16 #define ALARM_T_HYST_MASK 0xfff #define ALARM_T_HYST_SHIFT 0 #define THS_SHUTDOWN0_CTRL 0x60 #define SHUT_T_HOT_MASK 0xfff #define SHUT_T_HOT_SHIFT 16 #define THS_FILTER 0x70 #define THS_CALIB0 0x74 #define THS_CALIB1 0x78 #define THS_DATA0 0x80 #define THS_DATA1 0x84 #define THS_DATA2 0x88 #define DATA_MASK 0xfff #define A83T_CLK_RATE 24000000 #define A83T_ADC_ACQUIRE_TIME 23 /* 24Mhz/(23 + 1) = 1us */ #define A83T_THERMAL_PER 1 /* 4096 * (1 + 1) / 24Mhz = 341 us */ #define A83T_FILTER 0x5 /* Filter enabled, avg of 4 */ #define A83T_TEMP_BASE 2719000 #define A83T_TEMP_MUL 1000 #define A83T_TEMP_DIV 14186 #define A64_CLK_RATE 4000000 #define A64_ADC_ACQUIRE_TIME 400 /* 4Mhz/(400 + 1) = 100 us */ #define A64_THERMAL_PER 24 /* 4096 * (24 + 1) / 4Mhz = 25.6 ms */ #define A64_FILTER 0x6 /* Filter enabled, avg of 8 */ #define A64_TEMP_BASE 2170000 #define A64_TEMP_MUL 1000 #define A64_TEMP_DIV 8560 #define H3_CLK_RATE 4000000 #define H3_ADC_ACQUIRE_TIME 0x3f #define H3_THERMAL_PER 401 #define H3_FILTER 0x6 /* Filter enabled, avg of 8 */ #define H3_TEMP_BASE 217 #define H3_TEMP_MUL 1000 #define H3_TEMP_DIV 8253 #define H3_TEMP_MINUS 1794000 #define H3_INIT_ALARM 90 /* degC */ #define H3_INIT_SHUT 105 /* degC */ #define H5_CLK_RATE 24000000 #define H5_ADC_ACQUIRE_TIME 479 /* 24Mhz/479 = 20us */ #define H5_THERMAL_PER 58 /* 4096 * (58 + 1) / 24Mhz = 10ms */ #define H5_FILTER 0x6 /* Filter enabled, avg of 8 */ #define H5_TEMP_BASE 233832448 #define H5_TEMP_MUL 124885 #define H5_TEMP_DIV 20 #define H5_TEMP_BASE_CPU 271581184 #define H5_TEMP_MUL_CPU 152253 #define H5_TEMP_BASE_GPU 289406976 #define H5_TEMP_MUL_GPU 166724 #define H5_INIT_CPU_ALARM 80 /* degC */ #define H5_INIT_CPU_SHUT 96 /* degC */ #define H5_INIT_GPU_ALARM 84 /* degC */ #define H5_INIT_GPU_SHUT 100 /* degC */ #define TEMP_C_TO_K 273 #define SENSOR_ENABLE_ALL (SENSOR0_EN|SENSOR1_EN|SENSOR2_EN) #define SHUT_INT_ALL (SHUT_INT0_STS|SHUT_INT1_STS|SHUT_INT2_STS) #define ALARM_INT_ALL (ALARM_INT0_STS) #define MAX_SENSORS 3 #define MAX_CF_LEVELS 64 #define THROTTLE_ENABLE_DEFAULT 1 /* Enable thermal throttling */ static int aw_thermal_throttle_enable = THROTTLE_ENABLE_DEFAULT; TUNABLE_INT("hw.aw_thermal.throttle_enable", &aw_thermal_throttle_enable); struct aw_thermal_sensor { const char *name; const char *desc; int init_alarm; int init_shut; }; struct aw_thermal_config { struct aw_thermal_sensor sensors[MAX_SENSORS]; int nsensors; uint64_t clk_rate; uint32_t adc_acquire_time; int adc_cali_en; uint32_t filter; uint32_t thermal_per; int (*to_temp)(uint32_t, int); uint32_t (*to_reg)(int, int); 
int temp_base; int temp_mul; int temp_div; int calib0, calib1; uint32_t calib0_mask, calib1_mask; }; static int a83t_to_temp(uint32_t val, int sensor) { return ((A83T_TEMP_BASE - (val * A83T_TEMP_MUL)) / A83T_TEMP_DIV); } static const struct aw_thermal_config a83t_config = { .nsensors = 3, .sensors = { [0] = { .name = "cluster0", .desc = "CPU cluster 0 temperature", }, [1] = { .name = "cluster1", .desc = "CPU cluster 1 temperature", }, [2] = { .name = "gpu", .desc = "GPU temperature", }, }, .clk_rate = A83T_CLK_RATE, .adc_acquire_time = A83T_ADC_ACQUIRE_TIME, .adc_cali_en = 1, .filter = A83T_FILTER, .thermal_per = A83T_THERMAL_PER, .to_temp = a83t_to_temp, .calib0_mask = 0xffffffff, .calib1_mask = 0xffff, }; static int a64_to_temp(uint32_t val, int sensor) { return ((A64_TEMP_BASE - (val * A64_TEMP_MUL)) / A64_TEMP_DIV); } static const struct aw_thermal_config a64_config = { .nsensors = 3, .sensors = { [0] = { .name = "cpu", .desc = "CPU temperature", }, [1] = { .name = "gpu1", .desc = "GPU temperature 1", }, [2] = { .name = "gpu2", .desc = "GPU temperature 2", }, }, .clk_rate = A64_CLK_RATE, .adc_acquire_time = A64_ADC_ACQUIRE_TIME, .adc_cali_en = 1, .filter = A64_FILTER, .thermal_per = A64_THERMAL_PER, .to_temp = a64_to_temp, .calib0_mask = 0xffffffff, .calib1_mask = 0xffff, }; static int h3_to_temp(uint32_t val, int sensor) { return (H3_TEMP_BASE - ((val * H3_TEMP_MUL) / H3_TEMP_DIV)); } static uint32_t h3_to_reg(int val, int sensor) { return ((H3_TEMP_MINUS - (val * H3_TEMP_DIV)) / H3_TEMP_MUL); } static const struct aw_thermal_config h3_config = { .nsensors = 1, .sensors = { [0] = { .name = "cpu", .desc = "CPU temperature", .init_alarm = H3_INIT_ALARM, .init_shut = H3_INIT_SHUT, }, }, .clk_rate = H3_CLK_RATE, .adc_acquire_time = H3_ADC_ACQUIRE_TIME, .adc_cali_en = 1, .filter = H3_FILTER, .thermal_per = H3_THERMAL_PER, .to_temp = h3_to_temp, .to_reg = h3_to_reg, .calib0_mask = 0xffff, }; static int h5_to_temp(uint32_t val, int sensor) { int tmp; /* Temp is lower than 70 degrees */ if (val > 0x500) { tmp = H5_TEMP_BASE - (val * H5_TEMP_MUL); tmp >>= H5_TEMP_DIV; return (tmp); } if (sensor == 0) tmp = H5_TEMP_BASE_CPU - (val * H5_TEMP_MUL_CPU); else if (sensor == 1) tmp = H5_TEMP_BASE_GPU - (val * H5_TEMP_MUL_GPU); else { printf("Unknown sensor %d\n", sensor); return (val); } tmp >>= H5_TEMP_DIV; return (tmp); } static uint32_t h5_to_reg(int val, int sensor) { int tmp; if (val < 70) { tmp = H5_TEMP_BASE - (val << H5_TEMP_DIV); tmp /= H5_TEMP_MUL; } else { if (sensor == 0) { tmp = H5_TEMP_BASE_CPU - (val << H5_TEMP_DIV); tmp /= H5_TEMP_MUL_CPU; } else if (sensor == 1) { tmp = H5_TEMP_BASE_GPU - (val << H5_TEMP_DIV); tmp /= H5_TEMP_MUL_GPU; } else { printf("Unknown sensor %d\n", sensor); return (val); } } return ((uint32_t)tmp); } static const struct aw_thermal_config h5_config = { .nsensors = 2, .sensors = { [0] = { .name = "cpu", .desc = "CPU temperature", .init_alarm = H5_INIT_CPU_ALARM, .init_shut = H5_INIT_CPU_SHUT, }, [1] = { .name = "gpu", .desc = "GPU temperature", .init_alarm = H5_INIT_GPU_ALARM, .init_shut = H5_INIT_GPU_SHUT, }, }, .clk_rate = H5_CLK_RATE, .adc_acquire_time = H5_ADC_ACQUIRE_TIME, .filter = H5_FILTER, .thermal_per = H5_THERMAL_PER, .to_temp = h5_to_temp, .to_reg = h5_to_reg, .calib0_mask = 0xffffffff, }; static struct ofw_compat_data compat_data[] = { { "allwinner,sun8i-a83t-ths", (uintptr_t)&a83t_config }, { "allwinner,sun8i-h3-ths", (uintptr_t)&h3_config }, { "allwinner,sun50i-a64-ths", (uintptr_t)&a64_config }, { "allwinner,sun50i-h5-ths", (uintptr_t)&h5_config 
}, { NULL, (uintptr_t)NULL } }; #define THS_CONF(d) \ (void *)ofw_bus_search_compatible((d), compat_data)->ocd_data struct aw_thermal_softc { device_t dev; struct resource *res[2]; struct aw_thermal_config *conf; struct task cf_task; int throttle; int min_freq; struct cf_level levels[MAX_CF_LEVELS]; eventhandler_tag cf_pre_tag; clk_t clk_apb; clk_t clk_ths; }; static struct resource_spec aw_thermal_spec[] = { { SYS_RES_MEMORY, 0, RF_ACTIVE }, { SYS_RES_IRQ, 0, RF_ACTIVE }, { -1, 0 } }; #define RD4(sc, reg) bus_read_4((sc)->res[0], (reg)) #define WR4(sc, reg, val) bus_write_4((sc)->res[0], (reg), (val)) static int aw_thermal_init(struct aw_thermal_softc *sc) { phandle_t node; uint32_t calib[2]; int error; node = ofw_bus_get_node(sc->dev); if (nvmem_get_cell_len(node, "ths-calib") > sizeof(calib)) { device_printf(sc->dev, "ths-calib nvmem cell is too large\n"); return (ENXIO); } error = nvmem_read_cell_by_name(node, "ths-calib", (void *)&calib, nvmem_get_cell_len(node, "ths-calib")); /* Read calibration settings from EFUSE */ if (error != 0) { device_printf(sc->dev, "Cannot read THS efuse\n"); return (error); } calib[0] &= sc->conf->calib0_mask; calib[1] &= sc->conf->calib1_mask; /* Write calibration settings to thermal controller */ if (calib[0] != 0) WR4(sc, THS_CALIB0, calib[0]); if (calib[1] != 0) WR4(sc, THS_CALIB1, calib[1]); /* Configure ADC acquire time (CLK_IN/(N+1)) and enable sensors */ WR4(sc, THS_CTRL1, ADC_CALI_EN); WR4(sc, THS_CTRL0, sc->conf->adc_acquire_time); WR4(sc, THS_CTRL2, sc->conf->adc_acquire_time << SENSOR_ACQ1_SHIFT); /* Set thermal period */ WR4(sc, THS_INTC, sc->conf->thermal_per << THS_THERMAL_PER_SHIFT); /* Enable average filter */ WR4(sc, THS_FILTER, sc->conf->filter); /* Enable interrupts */ WR4(sc, THS_INTS, RD4(sc, THS_INTS)); WR4(sc, THS_INTC, RD4(sc, THS_INTC) | SHUT_INT_ALL | ALARM_INT_ALL); /* Enable sensors */ WR4(sc, THS_CTRL2, RD4(sc, THS_CTRL2) | SENSOR_ENABLE_ALL); return (0); } static int aw_thermal_gettemp(struct aw_thermal_softc *sc, int sensor) { uint32_t val; val = RD4(sc, THS_DATA0 + (sensor * 4)); return (sc->conf->to_temp(val, sensor)); } static int aw_thermal_getshut(struct aw_thermal_softc *sc, int sensor) { uint32_t val; val = RD4(sc, THS_SHUTDOWN0_CTRL + (sensor * 4)); val = (val >> SHUT_T_HOT_SHIFT) & SHUT_T_HOT_MASK; return (sc->conf->to_temp(val, sensor)); } static void aw_thermal_setshut(struct aw_thermal_softc *sc, int sensor, int temp) { uint32_t val; val = RD4(sc, THS_SHUTDOWN0_CTRL + (sensor * 4)); val &= ~(SHUT_T_HOT_MASK << SHUT_T_HOT_SHIFT); val |= (sc->conf->to_reg(temp, sensor) << SHUT_T_HOT_SHIFT); WR4(sc, THS_SHUTDOWN0_CTRL + (sensor * 4), val); } static int aw_thermal_gethyst(struct aw_thermal_softc *sc, int sensor) { uint32_t val; val = RD4(sc, THS_ALARM0_CTRL + (sensor * 4)); val = (val >> ALARM_T_HYST_SHIFT) & ALARM_T_HYST_MASK; return (sc->conf->to_temp(val, sensor)); } static int aw_thermal_getalarm(struct aw_thermal_softc *sc, int sensor) { uint32_t val; val = RD4(sc, THS_ALARM0_CTRL + (sensor * 4)); val = (val >> ALARM_T_HOT_SHIFT) & ALARM_T_HOT_MASK; return (sc->conf->to_temp(val, sensor)); } static void aw_thermal_setalarm(struct aw_thermal_softc *sc, int sensor, int temp) { uint32_t val; val = RD4(sc, THS_ALARM0_CTRL + (sensor * 4)); val &= ~(ALARM_T_HOT_MASK << ALARM_T_HOT_SHIFT); val |= (sc->conf->to_reg(temp, sensor) << ALARM_T_HOT_SHIFT); WR4(sc, THS_ALARM0_CTRL + (sensor * 4), val); } static int aw_thermal_sysctl(SYSCTL_HANDLER_ARGS) { struct aw_thermal_softc *sc; int sensor, val; sc = arg1; sensor = arg2; 
val = aw_thermal_gettemp(sc, sensor) + TEMP_C_TO_K; return sysctl_handle_opaque(oidp, &val, sizeof(val), req); } static void aw_thermal_throttle(struct aw_thermal_softc *sc, int enable) { device_t cf_dev; int count, error; if (enable == sc->throttle) return; if (enable != 0) { /* Set the lowest available frequency */ cf_dev = devclass_get_device(devclass_find("cpufreq"), 0); if (cf_dev == NULL) return; count = MAX_CF_LEVELS; error = CPUFREQ_LEVELS(cf_dev, sc->levels, &count); if (error != 0 || count == 0) return; sc->min_freq = sc->levels[count - 1].total_set.freq; error = CPUFREQ_SET(cf_dev, &sc->levels[count - 1], CPUFREQ_PRIO_USER); if (error != 0) return; } sc->throttle = enable; } static void aw_thermal_cf_task(void *arg, int pending) { struct aw_thermal_softc *sc; sc = arg; aw_thermal_throttle(sc, 1); } static void aw_thermal_cf_pre_change(void *arg, const struct cf_level *level, int *status) { struct aw_thermal_softc *sc; int temp_cur, temp_alarm; sc = arg; if (aw_thermal_throttle_enable == 0 || sc->throttle == 0 || level->total_set.freq == sc->min_freq) return; temp_cur = aw_thermal_gettemp(sc, 0); temp_alarm = aw_thermal_getalarm(sc, 0); if (temp_cur < temp_alarm) aw_thermal_throttle(sc, 0); else *status = ENXIO; } static void aw_thermal_intr(void *arg) { struct aw_thermal_softc *sc; device_t dev; uint32_t ints; dev = arg; sc = device_get_softc(dev); ints = RD4(sc, THS_INTS); WR4(sc, THS_INTS, ints); if ((ints & SHUT_INT_ALL) != 0) { device_printf(dev, "WARNING - current temperature exceeds safe limits\n"); shutdown_nice(RB_POWEROFF); } if ((ints & ALARM_INT_ALL) != 0) taskqueue_enqueue(taskqueue_thread, &sc->cf_task); } static int aw_thermal_probe(device_t dev) { if (!ofw_bus_status_okay(dev)) return (ENXIO); if (THS_CONF(dev) == NULL) return (ENXIO); device_set_desc(dev, "Allwinner Thermal Sensor Controller"); return (BUS_PROBE_DEFAULT); } static int aw_thermal_attach(device_t dev) { struct aw_thermal_softc *sc; hwreset_t rst; int i, error; void *ih; sc = device_get_softc(dev); sc->dev = dev; rst = NULL; ih = NULL; sc->conf = THS_CONF(dev); TASK_INIT(&sc->cf_task, 0, aw_thermal_cf_task, sc); if (bus_alloc_resources(dev, aw_thermal_spec, sc->res) != 0) { device_printf(dev, "cannot allocate resources for device\n"); return (ENXIO); } if (clk_get_by_ofw_name(dev, 0, "apb", &sc->clk_apb) == 0) { error = clk_enable(sc->clk_apb); if (error != 0) { device_printf(dev, "cannot enable apb clock\n"); goto fail; } } if (clk_get_by_ofw_name(dev, 0, "ths", &sc->clk_ths) == 0) { error = clk_set_freq(sc->clk_ths, sc->conf->clk_rate, 0); if (error != 0) { device_printf(dev, "cannot set ths clock rate\n"); goto fail; } error = clk_enable(sc->clk_ths); if (error != 0) { device_printf(dev, "cannot enable ths clock\n"); goto fail; } } if (hwreset_get_by_ofw_idx(dev, 0, 0, &rst) == 0) { error = hwreset_deassert(rst); if (error != 0) { device_printf(dev, "cannot de-assert reset\n"); goto fail; } } error = bus_setup_intr(dev, sc->res[1], INTR_TYPE_MISC | INTR_MPSAFE, NULL, aw_thermal_intr, dev, &ih); if (error != 0) { device_printf(dev, "cannot setup interrupt handler\n"); goto fail; } for (i = 0; i < sc->conf->nsensors; i++) { if (sc->conf->sensors[i].init_alarm > 0) aw_thermal_setalarm(sc, i, sc->conf->sensors[i].init_alarm); if (sc->conf->sensors[i].init_shut > 0) aw_thermal_setshut(sc, i, sc->conf->sensors[i].init_shut); } if (aw_thermal_init(sc) != 0) goto fail; for (i = 0; i < sc->conf->nsensors; i++) SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev), SYSCTL_CHILDREN(device_get_sysctl_tree(dev)), 
OID_AUTO, sc->conf->sensors[i].name, CTLTYPE_INT | CTLFLAG_RD, sc, i, aw_thermal_sysctl, "IK0", sc->conf->sensors[i].desc); if (bootverbose) for (i = 0; i < sc->conf->nsensors; i++) { device_printf(dev, "%s: alarm %dC hyst %dC shut %dC\n", sc->conf->sensors[i].name, aw_thermal_getalarm(sc, i), aw_thermal_gethyst(sc, i), aw_thermal_getshut(sc, i)); } sc->cf_pre_tag = EVENTHANDLER_REGISTER(cpufreq_pre_change, aw_thermal_cf_pre_change, sc, EVENTHANDLER_PRI_FIRST); return (0); fail: if (ih != NULL) bus_teardown_intr(dev, sc->res[1], ih); if (rst != NULL) hwreset_release(rst); if (sc->clk_apb != NULL) clk_release(sc->clk_apb); if (sc->clk_ths != NULL) clk_release(sc->clk_ths); bus_release_resources(dev, aw_thermal_spec, sc->res); return (ENXIO); } static device_method_t aw_thermal_methods[] = { /* Device interface */ DEVMETHOD(device_probe, aw_thermal_probe), DEVMETHOD(device_attach, aw_thermal_attach), DEVMETHOD_END }; static driver_t aw_thermal_driver = { "aw_thermal", aw_thermal_methods, sizeof(struct aw_thermal_softc), }; static devclass_t aw_thermal_devclass; DRIVER_MODULE(aw_thermal, simplebus, aw_thermal_driver, aw_thermal_devclass, 0, 0); MODULE_VERSION(aw_thermal, 1); +MODULE_DEPEND(aw_thermal, aw_sid, 1, 1, 1); +SIMPLEBUS_PNP_INFO(compat_data); Index: user/ngie/bug-237403/sys/arm/allwinner/axp81x.c =================================================================== --- user/ngie/bug-237403/sys/arm/allwinner/axp81x.c (revision 346925) +++ user/ngie/bug-237403/sys/arm/allwinner/axp81x.c (revision 346926) @@ -1,1621 +1,1622 @@ /*- * Copyright (c) 2018 Emmanuel Vadot * Copyright (c) 2016 Jared McNeill * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $FreeBSD$ */ /* * X-Powers AXP803/813/818 PMU for Allwinner SoCs */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "gpio_if.h" #include "iicbus_if.h" #include "regdev_if.h" MALLOC_DEFINE(M_AXP8XX_REG, "AXP8xx regulator", "AXP8xx power regulator"); #define AXP_POWERSRC 0x00 #define AXP_POWERSRC_ACIN (1 << 7) #define AXP_POWERSRC_VBUS (1 << 5) #define AXP_POWERSRC_VBAT (1 << 3) #define AXP_POWERSRC_CHARING (1 << 2) /* Charging Direction */ #define AXP_POWERSRC_SHORTED (1 << 1) #define AXP_POWERSRC_STARTUP (1 << 0) #define AXP_POWERMODE 0x01 #define AXP_POWERMODE_BAT_CHARGING (1 << 6) #define AXP_POWERMODE_BAT_PRESENT (1 << 5) #define AXP_POWERMODE_BAT_VALID (1 << 4) #define AXP_ICTYPE 0x03 #define AXP_POWERCTL1 0x10 #define AXP_POWERCTL1_DCDC7 (1 << 6) /* AXP813/818 only */ #define AXP_POWERCTL1_DCDC6 (1 << 5) #define AXP_POWERCTL1_DCDC5 (1 << 4) #define AXP_POWERCTL1_DCDC4 (1 << 3) #define AXP_POWERCTL1_DCDC3 (1 << 2) #define AXP_POWERCTL1_DCDC2 (1 << 1) #define AXP_POWERCTL1_DCDC1 (1 << 0) #define AXP_POWERCTL2 0x12 #define AXP_POWERCTL2_DC1SW (1 << 7) /* AXP803 only */ #define AXP_POWERCTL2_DLDO4 (1 << 6) #define AXP_POWERCTL2_DLDO3 (1 << 5) #define AXP_POWERCTL2_DLDO2 (1 << 4) #define AXP_POWERCTL2_DLDO1 (1 << 3) #define AXP_POWERCTL2_ELDO3 (1 << 2) #define AXP_POWERCTL2_ELDO2 (1 << 1) #define AXP_POWERCTL2_ELDO1 (1 << 0) #define AXP_POWERCTL3 0x13 #define AXP_POWERCTL3_ALDO3 (1 << 7) #define AXP_POWERCTL3_ALDO2 (1 << 6) #define AXP_POWERCTL3_ALDO1 (1 << 5) #define AXP_POWERCTL3_FLDO3 (1 << 4) /* AXP813/818 only */ #define AXP_POWERCTL3_FLDO2 (1 << 3) #define AXP_POWERCTL3_FLDO1 (1 << 2) #define AXP_VOLTCTL_DLDO1 0x15 #define AXP_VOLTCTL_DLDO2 0x16 #define AXP_VOLTCTL_DLDO3 0x17 #define AXP_VOLTCTL_DLDO4 0x18 #define AXP_VOLTCTL_ELDO1 0x19 #define AXP_VOLTCTL_ELDO2 0x1A #define AXP_VOLTCTL_ELDO3 0x1B #define AXP_VOLTCTL_FLDO1 0x1C #define AXP_VOLTCTL_FLDO2 0x1D #define AXP_VOLTCTL_DCDC1 0x20 #define AXP_VOLTCTL_DCDC2 0x21 #define AXP_VOLTCTL_DCDC3 0x22 #define AXP_VOLTCTL_DCDC4 0x23 #define AXP_VOLTCTL_DCDC5 0x24 #define AXP_VOLTCTL_DCDC6 0x25 #define AXP_VOLTCTL_DCDC7 0x26 #define AXP_VOLTCTL_ALDO1 0x28 #define AXP_VOLTCTL_ALDO2 0x29 #define AXP_VOLTCTL_ALDO3 0x2A #define AXP_VOLTCTL_STATUS (1 << 7) #define AXP_VOLTCTL_MASK 0x7f #define AXP_POWERBAT 0x32 #define AXP_POWERBAT_SHUTDOWN (1 << 7) #define AXP_CHARGERCTL1 0x33 #define AXP_CHARGERCTL1_MIN 0 #define AXP_CHARGERCTL1_MAX 13 #define AXP_CHARGERCTL1_CMASK 0xf #define AXP_IRQEN1 0x40 #define AXP_IRQEN1_ACIN_HI (1 << 6) #define AXP_IRQEN1_ACIN_LO (1 << 5) #define AXP_IRQEN1_VBUS_HI (1 << 3) #define AXP_IRQEN1_VBUS_LO (1 << 2) #define AXP_IRQEN2 0x41 #define AXP_IRQEN2_BAT_IN (1 << 7) #define AXP_IRQEN2_BAT_NO (1 << 6) #define AXP_IRQEN2_BATCHGC (1 << 3) #define AXP_IRQEN2_BATCHGD (1 << 2) #define AXP_IRQEN3 0x42 #define AXP_IRQEN4 0x43 #define AXP_IRQEN4_BATLVL_LO1 (1 << 1) #define AXP_IRQEN4_BATLVL_LO0 (1 << 0) #define AXP_IRQEN5 0x44 #define AXP_IRQEN5_POKSIRQ (1 << 4) #define AXP_IRQEN5_POKLIRQ (1 << 3) #define AXP_IRQEN6 0x45 #define AXP_IRQSTAT1 0x48 #define AXP_IRQSTAT1_ACIN_HI (1 << 6) #define AXP_IRQSTAT1_ACIN_LO (1 << 5) #define AXP_IRQSTAT1_VBUS_HI (1 << 3) #define AXP_IRQSTAT1_VBUS_LO (1 << 2) #define AXP_IRQSTAT2 0x49 #define AXP_IRQSTAT2_BAT_IN (1 << 7) #define AXP_IRQSTAT2_BAT_NO (1 << 6) #define AXP_IRQSTAT2_BATCHGC (1 << 3) #define AXP_IRQSTAT2_BATCHGD (1 << 2) #define 
AXP_IRQSTAT3 0x4a #define AXP_IRQSTAT4 0x4b #define AXP_IRQSTAT4_BATLVL_LO1 (1 << 1) #define AXP_IRQSTAT4_BATLVL_LO0 (1 << 0) #define AXP_IRQSTAT5 0x4c #define AXP_IRQSTAT5_POKSIRQ (1 << 4) #define AXP_IRQEN5_POKLIRQ (1 << 3) #define AXP_IRQSTAT6 0x4d #define AXP_BATSENSE_HI 0x78 #define AXP_BATSENSE_LO 0x79 #define AXP_BATCHG_HI 0x7a #define AXP_BATCHG_LO 0x7b #define AXP_BATDISCHG_HI 0x7c #define AXP_BATDISCHG_LO 0x7d #define AXP_GPIO0_CTRL 0x90 #define AXP_GPIO0LDO_CTRL 0x91 #define AXP_GPIO1_CTRL 0x92 #define AXP_GPIO1LDO_CTRL 0x93 #define AXP_GPIO_FUNC (0x7 << 0) #define AXP_GPIO_FUNC_SHIFT 0 #define AXP_GPIO_FUNC_DRVLO 0 #define AXP_GPIO_FUNC_DRVHI 1 #define AXP_GPIO_FUNC_INPUT 2 #define AXP_GPIO_FUNC_LDO_ON 3 #define AXP_GPIO_FUNC_LDO_OFF 4 #define AXP_GPIO_SIGBIT 0x94 #define AXP_GPIO_PD 0x97 #define AXP_FUEL_GAUGECTL 0xb8 #define AXP_FUEL_GAUGECTL_EN (1 << 7) #define AXP_BAT_CAP 0xb9 #define AXP_BAT_CAP_VALID (1 << 7) #define AXP_BAT_CAP_PERCENT 0x7f #define AXP_BAT_MAX_CAP_HI 0xe0 #define AXP_BAT_MAX_CAP_VALID (1 << 7) #define AXP_BAT_MAX_CAP_LO 0xe1 #define AXP_BAT_COULOMB_HI 0xe2 #define AXP_BAT_COULOMB_VALID (1 << 7) #define AXP_BAT_COULOMB_LO 0xe3 #define AXP_BAT_CAP_WARN 0xe6 #define AXP_BAT_CAP_WARN_LV1 0xf0 /* Bits 4, 5, 6, 7 */ #define AXP_BAP_CAP_WARN_LV1BASE 5 /* 5-20%, 1% per step */ #define AXP_BAT_CAP_WARN_LV2 0xf /* Bits 0, 1, 2, 3 */ /* Sensor conversion macros */ #define AXP_SENSOR_BAT_H(hi) ((hi) << 4) #define AXP_SENSOR_BAT_L(lo) ((lo) & 0xf) #define AXP_SENSOR_COULOMB(hi, lo) (((hi & ~(1 << 7)) << 8) | (lo)) static const struct { const char *name; uint8_t ctrl_reg; } axp8xx_pins[] = { { "GPIO0", AXP_GPIO0_CTRL }, { "GPIO1", AXP_GPIO1_CTRL }, }; enum AXP8XX_TYPE { AXP803 = 1, AXP813, }; static struct ofw_compat_data compat_data[] = { { "x-powers,axp803", AXP803 }, { "x-powers,axp813", AXP813 }, { "x-powers,axp818", AXP813 }, { NULL, 0 } }; static struct resource_spec axp8xx_spec[] = { { SYS_RES_IRQ, 0, RF_ACTIVE }, { -1, 0 } }; struct axp8xx_regdef { intptr_t id; char *name; char *supply_name; uint8_t enable_reg; uint8_t enable_mask; uint8_t enable_value; uint8_t disable_value; uint8_t voltage_reg; int voltage_min; int voltage_max; int voltage_step1; int voltage_nstep1; int voltage_step2; int voltage_nstep2; }; enum axp8xx_reg_id { AXP8XX_REG_ID_DCDC1 = 100, AXP8XX_REG_ID_DCDC2, AXP8XX_REG_ID_DCDC3, AXP8XX_REG_ID_DCDC4, AXP8XX_REG_ID_DCDC5, AXP8XX_REG_ID_DCDC6, AXP813_REG_ID_DCDC7, AXP803_REG_ID_DC1SW, AXP8XX_REG_ID_DLDO1, AXP8XX_REG_ID_DLDO2, AXP8XX_REG_ID_DLDO3, AXP8XX_REG_ID_DLDO4, AXP8XX_REG_ID_ELDO1, AXP8XX_REG_ID_ELDO2, AXP8XX_REG_ID_ELDO3, AXP8XX_REG_ID_ALDO1, AXP8XX_REG_ID_ALDO2, AXP8XX_REG_ID_ALDO3, AXP8XX_REG_ID_FLDO1, AXP8XX_REG_ID_FLDO2, AXP813_REG_ID_FLDO3, AXP8XX_REG_ID_GPIO0_LDO, AXP8XX_REG_ID_GPIO1_LDO, }; static struct axp8xx_regdef axp803_regdefs[] = { { .id = AXP803_REG_ID_DC1SW, .name = "dc1sw", .enable_reg = AXP_POWERCTL2, .enable_mask = (uint8_t) AXP_POWERCTL2_DC1SW, .enable_value = AXP_POWERCTL2_DC1SW, }, }; static struct axp8xx_regdef axp813_regdefs[] = { { .id = AXP813_REG_ID_DCDC7, .name = "dcdc7", .enable_reg = AXP_POWERCTL1, .enable_mask = (uint8_t) AXP_POWERCTL1_DCDC7, .enable_value = AXP_POWERCTL1_DCDC7, .voltage_reg = AXP_VOLTCTL_DCDC7, .voltage_min = 600, .voltage_max = 1520, .voltage_step1 = 10, .voltage_nstep1 = 50, .voltage_step2 = 20, .voltage_nstep2 = 21, }, }; static struct axp8xx_regdef axp8xx_common_regdefs[] = { { .id = AXP8XX_REG_ID_DCDC1, .name = "dcdc1", .enable_reg = AXP_POWERCTL1, .enable_mask = (uint8_t) 
AXP_POWERCTL1_DCDC1, .enable_value = AXP_POWERCTL1_DCDC1, .voltage_reg = AXP_VOLTCTL_DCDC1, .voltage_min = 1600, .voltage_max = 3400, .voltage_step1 = 100, .voltage_nstep1 = 18, }, { .id = AXP8XX_REG_ID_DCDC2, .name = "dcdc2", .enable_reg = AXP_POWERCTL1, .enable_mask = (uint8_t) AXP_POWERCTL1_DCDC2, .enable_value = AXP_POWERCTL1_DCDC2, .voltage_reg = AXP_VOLTCTL_DCDC2, .voltage_min = 500, .voltage_max = 1300, .voltage_step1 = 10, .voltage_nstep1 = 70, .voltage_step2 = 20, .voltage_nstep2 = 5, }, { .id = AXP8XX_REG_ID_DCDC3, .name = "dcdc3", .enable_reg = AXP_POWERCTL1, .enable_mask = (uint8_t) AXP_POWERCTL1_DCDC3, .enable_value = AXP_POWERCTL1_DCDC3, .voltage_reg = AXP_VOLTCTL_DCDC3, .voltage_min = 500, .voltage_max = 1300, .voltage_step1 = 10, .voltage_nstep1 = 70, .voltage_step2 = 20, .voltage_nstep2 = 5, }, { .id = AXP8XX_REG_ID_DCDC4, .name = "dcdc4", .enable_reg = AXP_POWERCTL1, .enable_mask = (uint8_t) AXP_POWERCTL1_DCDC4, .enable_value = AXP_POWERCTL1_DCDC4, .voltage_reg = AXP_VOLTCTL_DCDC4, .voltage_min = 500, .voltage_max = 1300, .voltage_step1 = 10, .voltage_nstep1 = 70, .voltage_step2 = 20, .voltage_nstep2 = 5, }, { .id = AXP8XX_REG_ID_DCDC5, .name = "dcdc5", .enable_reg = AXP_POWERCTL1, .enable_mask = (uint8_t) AXP_POWERCTL1_DCDC5, .enable_value = AXP_POWERCTL1_DCDC5, .voltage_reg = AXP_VOLTCTL_DCDC5, .voltage_min = 800, .voltage_max = 1840, .voltage_step1 = 10, .voltage_nstep1 = 42, .voltage_step2 = 20, .voltage_nstep2 = 36, }, { .id = AXP8XX_REG_ID_DCDC6, .name = "dcdc6", .enable_reg = AXP_POWERCTL1, .enable_mask = (uint8_t) AXP_POWERCTL1_DCDC6, .enable_value = AXP_POWERCTL1_DCDC6, .voltage_reg = AXP_VOLTCTL_DCDC6, .voltage_min = 600, .voltage_max = 1520, .voltage_step1 = 10, .voltage_nstep1 = 50, .voltage_step2 = 20, .voltage_nstep2 = 21, }, { .id = AXP8XX_REG_ID_DLDO1, .name = "dldo1", .enable_reg = AXP_POWERCTL2, .enable_mask = (uint8_t) AXP_POWERCTL2_DLDO1, .enable_value = AXP_POWERCTL2_DLDO1, .voltage_reg = AXP_VOLTCTL_DLDO1, .voltage_min = 700, .voltage_max = 3300, .voltage_step1 = 100, .voltage_nstep1 = 26, }, { .id = AXP8XX_REG_ID_DLDO2, .name = "dldo2", .enable_reg = AXP_POWERCTL2, .enable_mask = (uint8_t) AXP_POWERCTL2_DLDO2, .enable_value = AXP_POWERCTL2_DLDO2, .voltage_reg = AXP_VOLTCTL_DLDO2, .voltage_min = 700, .voltage_max = 4200, .voltage_step1 = 100, .voltage_nstep1 = 27, .voltage_step2 = 200, .voltage_nstep2 = 4, }, { .id = AXP8XX_REG_ID_DLDO3, .name = "dldo3", .enable_reg = AXP_POWERCTL2, .enable_mask = (uint8_t) AXP_POWERCTL2_DLDO3, .enable_value = AXP_POWERCTL2_DLDO3, .voltage_reg = AXP_VOLTCTL_DLDO3, .voltage_min = 700, .voltage_max = 3300, .voltage_step1 = 100, .voltage_nstep1 = 26, }, { .id = AXP8XX_REG_ID_DLDO4, .name = "dldo4", .enable_reg = AXP_POWERCTL2, .enable_mask = (uint8_t) AXP_POWERCTL2_DLDO4, .enable_value = AXP_POWERCTL2_DLDO4, .voltage_reg = AXP_VOLTCTL_DLDO4, .voltage_min = 700, .voltage_max = 3300, .voltage_step1 = 100, .voltage_nstep1 = 26, }, { .id = AXP8XX_REG_ID_ALDO1, .name = "aldo1", .enable_reg = AXP_POWERCTL3, .enable_mask = (uint8_t) AXP_POWERCTL3_ALDO1, .enable_value = AXP_POWERCTL3_ALDO1, .voltage_min = 700, .voltage_max = 3300, .voltage_step1 = 100, .voltage_nstep1 = 26, }, { .id = AXP8XX_REG_ID_ALDO2, .name = "aldo2", .enable_reg = AXP_POWERCTL3, .enable_mask = (uint8_t) AXP_POWERCTL3_ALDO2, .enable_value = AXP_POWERCTL3_ALDO2, .voltage_min = 700, .voltage_max = 3300, .voltage_step1 = 100, .voltage_nstep1 = 26, }, { .id = AXP8XX_REG_ID_ALDO3, .name = "aldo3", .enable_reg = AXP_POWERCTL3, .enable_mask = (uint8_t) 
AXP_POWERCTL3_ALDO3, .enable_value = AXP_POWERCTL3_ALDO3, .voltage_min = 700, .voltage_max = 3300, .voltage_step1 = 100, .voltage_nstep1 = 26, }, { .id = AXP8XX_REG_ID_ELDO1, .name = "eldo1", .enable_reg = AXP_POWERCTL2, .enable_mask = (uint8_t) AXP_POWERCTL2_ELDO1, .enable_value = AXP_POWERCTL2_ELDO1, .voltage_min = 700, .voltage_max = 1900, .voltage_step1 = 50, .voltage_nstep1 = 24, }, { .id = AXP8XX_REG_ID_ELDO2, .name = "eldo2", .enable_reg = AXP_POWERCTL2, .enable_mask = (uint8_t) AXP_POWERCTL2_ELDO2, .enable_value = AXP_POWERCTL2_ELDO2, .voltage_min = 700, .voltage_max = 1900, .voltage_step1 = 50, .voltage_nstep1 = 24, }, { .id = AXP8XX_REG_ID_ELDO3, .name = "eldo3", .enable_reg = AXP_POWERCTL2, .enable_mask = (uint8_t) AXP_POWERCTL2_ELDO3, .enable_value = AXP_POWERCTL2_ELDO3, .voltage_min = 700, .voltage_max = 1900, .voltage_step1 = 50, .voltage_nstep1 = 24, }, { .id = AXP8XX_REG_ID_FLDO1, .name = "fldo1", .enable_reg = AXP_POWERCTL3, .enable_mask = (uint8_t) AXP_POWERCTL3_FLDO1, .enable_value = AXP_POWERCTL3_FLDO1, .voltage_min = 700, .voltage_max = 1450, .voltage_step1 = 50, .voltage_nstep1 = 15, }, { .id = AXP8XX_REG_ID_FLDO2, .name = "fldo2", .enable_reg = AXP_POWERCTL3, .enable_mask = (uint8_t) AXP_POWERCTL3_FLDO2, .enable_value = AXP_POWERCTL3_FLDO2, .voltage_min = 700, .voltage_max = 1450, .voltage_step1 = 50, .voltage_nstep1 = 15, }, { .id = AXP8XX_REG_ID_GPIO0_LDO, .name = "ldo-io0", .enable_reg = AXP_GPIO0_CTRL, .enable_mask = (uint8_t) AXP_GPIO_FUNC, .enable_value = AXP_GPIO_FUNC_LDO_ON, .disable_value = AXP_GPIO_FUNC_LDO_OFF, .voltage_reg = AXP_GPIO0LDO_CTRL, .voltage_min = 700, .voltage_max = 3300, .voltage_step1 = 100, .voltage_nstep1 = 26, }, { .id = AXP8XX_REG_ID_GPIO1_LDO, .name = "ldo-io1", .enable_reg = AXP_GPIO1_CTRL, .enable_mask = (uint8_t) AXP_GPIO_FUNC, .enable_value = AXP_GPIO_FUNC_LDO_ON, .disable_value = AXP_GPIO_FUNC_LDO_OFF, .voltage_reg = AXP_GPIO1LDO_CTRL, .voltage_min = 700, .voltage_max = 3300, .voltage_step1 = 100, .voltage_nstep1 = 26, }, }; enum axp8xx_sensor { AXP_SENSOR_ACIN_PRESENT, AXP_SENSOR_VBUS_PRESENT, AXP_SENSOR_BATT_PRESENT, AXP_SENSOR_BATT_CHARGING, AXP_SENSOR_BATT_CHARGE_STATE, AXP_SENSOR_BATT_VOLTAGE, AXP_SENSOR_BATT_CHARGE_CURRENT, AXP_SENSOR_BATT_DISCHARGE_CURRENT, AXP_SENSOR_BATT_CAPACITY_PERCENT, AXP_SENSOR_BATT_MAXIMUM_CAPACITY, AXP_SENSOR_BATT_CURRENT_CAPACITY, }; enum battery_capacity_state { BATT_CAPACITY_NORMAL = 1, /* normal cap in battery */ BATT_CAPACITY_WARNING, /* warning cap in battery */ BATT_CAPACITY_CRITICAL, /* critical cap in battery */ BATT_CAPACITY_HIGH, /* high cap in battery */ BATT_CAPACITY_MAX, /* maximum cap in battery */ BATT_CAPACITY_LOW /* low cap in battery */ }; struct axp8xx_sensors { int id; const char *name; const char *desc; const char *format; }; static const struct axp8xx_sensors axp8xx_common_sensors[] = { { .id = AXP_SENSOR_ACIN_PRESENT, .name = "acin", .format = "I", .desc = "ACIN Present", }, { .id = AXP_SENSOR_VBUS_PRESENT, .name = "vbus", .format = "I", .desc = "VBUS Present", }, { .id = AXP_SENSOR_BATT_PRESENT, .name = "bat", .format = "I", .desc = "Battery Present", }, { .id = AXP_SENSOR_BATT_CHARGING, .name = "batcharging", .format = "I", .desc = "Battery Charging", }, { .id = AXP_SENSOR_BATT_CHARGE_STATE, .name = "batchargestate", .format = "I", .desc = "Battery Charge State", }, { .id = AXP_SENSOR_BATT_VOLTAGE, .name = "batvolt", .format = "I", .desc = "Battery Voltage", }, { .id = AXP_SENSOR_BATT_CHARGE_CURRENT, .name = "batchargecurrent", .format = "I", .desc = "Average Battery 
Charging Current", }, { .id = AXP_SENSOR_BATT_DISCHARGE_CURRENT, .name = "batdischargecurrent", .format = "I", .desc = "Average Battery Discharging Current", }, { .id = AXP_SENSOR_BATT_CAPACITY_PERCENT, .name = "batcapacitypercent", .format = "I", .desc = "Battery Capacity Percentage", }, { .id = AXP_SENSOR_BATT_MAXIMUM_CAPACITY, .name = "batmaxcapacity", .format = "I", .desc = "Battery Maximum Capacity", }, { .id = AXP_SENSOR_BATT_CURRENT_CAPACITY, .name = "batcurrentcapacity", .format = "I", .desc = "Battery Current Capacity", }, }; struct axp8xx_config { const char *name; int batsense_step; /* uV */ int charge_step; /* uA */ int discharge_step; /* uA */ int maxcap_step; /* uAh */ int coulomb_step; /* uAh */ }; static struct axp8xx_config axp803_config = { .name = "AXP803", .batsense_step = 1100, .charge_step = 1000, .discharge_step = 1000, .maxcap_step = 1456, .coulomb_step = 1456, }; struct axp8xx_softc; struct axp8xx_reg_sc { struct regnode *regnode; device_t base_dev; struct axp8xx_regdef *def; phandle_t xref; struct regnode_std_param *param; }; struct axp8xx_softc { struct resource *res; uint16_t addr; void *ih; device_t gpiodev; struct mtx mtx; int busy; int type; /* Configs */ const struct axp8xx_config *config; /* Sensors */ const struct axp8xx_sensors *sensors; int nsensors; /* Regulators */ struct axp8xx_reg_sc **regs; int nregs; /* Warning, shutdown thresholds */ int warn_thres; int shut_thres; }; #define AXP_LOCK(sc) mtx_lock(&(sc)->mtx) #define AXP_UNLOCK(sc) mtx_unlock(&(sc)->mtx) static int axp8xx_read(device_t dev, uint8_t reg, uint8_t *data, uint8_t size) { struct axp8xx_softc *sc; struct iic_msg msg[2]; sc = device_get_softc(dev); msg[0].slave = sc->addr; msg[0].flags = IIC_M_WR; msg[0].len = 1; msg[0].buf = ® msg[1].slave = sc->addr; msg[1].flags = IIC_M_RD; msg[1].len = size; msg[1].buf = data; return (iicbus_transfer(dev, msg, 2)); } static int axp8xx_write(device_t dev, uint8_t reg, uint8_t val) { struct axp8xx_softc *sc; struct iic_msg msg[2]; sc = device_get_softc(dev); msg[0].slave = sc->addr; msg[0].flags = IIC_M_WR; msg[0].len = 1; msg[0].buf = ® msg[1].slave = sc->addr; msg[1].flags = IIC_M_WR; msg[1].len = 1; msg[1].buf = &val; return (iicbus_transfer(dev, msg, 2)); } static int axp8xx_regnode_init(struct regnode *regnode) { return (0); } static int axp8xx_regnode_enable(struct regnode *regnode, bool enable, int *udelay) { struct axp8xx_reg_sc *sc; uint8_t val; sc = regnode_get_softc(regnode); if (bootverbose) device_printf(sc->base_dev, "%sable %s (%s)\n", enable ? 
"En" : "Dis", regnode_get_name(regnode), sc->def->name); axp8xx_read(sc->base_dev, sc->def->enable_reg, &val, 1); val &= ~sc->def->enable_mask; if (enable) val |= sc->def->enable_value; else { if (sc->def->disable_value) val |= sc->def->disable_value; else val &= ~sc->def->enable_value; } axp8xx_write(sc->base_dev, sc->def->enable_reg, val); *udelay = 0; return (0); } static void axp8xx_regnode_reg_to_voltage(struct axp8xx_reg_sc *sc, uint8_t val, int *uv) { if (val < sc->def->voltage_nstep1) *uv = sc->def->voltage_min + val * sc->def->voltage_step1; else *uv = sc->def->voltage_min + (sc->def->voltage_nstep1 * sc->def->voltage_step1) + ((val - sc->def->voltage_nstep1) * sc->def->voltage_step2); *uv *= 1000; } static int axp8xx_regnode_voltage_to_reg(struct axp8xx_reg_sc *sc, int min_uvolt, int max_uvolt, uint8_t *val) { uint8_t nval; int nstep, uvolt; nval = 0; uvolt = sc->def->voltage_min * 1000; for (nstep = 0; nstep < sc->def->voltage_nstep1 && uvolt < min_uvolt; nstep++) { ++nval; uvolt += (sc->def->voltage_step1 * 1000); } for (nstep = 0; nstep < sc->def->voltage_nstep2 && uvolt < min_uvolt; nstep++) { ++nval; uvolt += (sc->def->voltage_step2 * 1000); } if (uvolt > max_uvolt) return (EINVAL); *val = nval; return (0); } static int axp8xx_regnode_set_voltage(struct regnode *regnode, int min_uvolt, int max_uvolt, int *udelay) { struct axp8xx_reg_sc *sc; uint8_t val; sc = regnode_get_softc(regnode); if (bootverbose) device_printf(sc->base_dev, "Setting %s (%s) to %d<->%d\n", regnode_get_name(regnode), sc->def->name, min_uvolt, max_uvolt); if (sc->def->voltage_step1 == 0) return (ENXIO); if (axp8xx_regnode_voltage_to_reg(sc, min_uvolt, max_uvolt, &val) != 0) return (ERANGE); axp8xx_write(sc->base_dev, sc->def->voltage_reg, val); *udelay = 0; return (0); } static int axp8xx_regnode_get_voltage(struct regnode *regnode, int *uvolt) { struct axp8xx_reg_sc *sc; uint8_t val; sc = regnode_get_softc(regnode); if (!sc->def->voltage_step1 || !sc->def->voltage_step2) return (ENXIO); axp8xx_read(sc->base_dev, sc->def->voltage_reg, &val, 1); axp8xx_regnode_reg_to_voltage(sc, val & AXP_VOLTCTL_MASK, uvolt); return (0); } static regnode_method_t axp8xx_regnode_methods[] = { /* Regulator interface */ REGNODEMETHOD(regnode_init, axp8xx_regnode_init), REGNODEMETHOD(regnode_enable, axp8xx_regnode_enable), REGNODEMETHOD(regnode_set_voltage, axp8xx_regnode_set_voltage), REGNODEMETHOD(regnode_get_voltage, axp8xx_regnode_get_voltage), REGNODEMETHOD_END }; DEFINE_CLASS_1(axp8xx_regnode, axp8xx_regnode_class, axp8xx_regnode_methods, sizeof(struct axp8xx_reg_sc), regnode_class); static void axp8xx_shutdown(void *devp, int howto) { device_t dev; if ((howto & RB_POWEROFF) == 0) return; dev = devp; if (bootverbose) device_printf(dev, "Shutdown Axp8xx\n"); axp8xx_write(dev, AXP_POWERBAT, AXP_POWERBAT_SHUTDOWN); } static int axp8xx_sysctl_chargecurrent(SYSCTL_HANDLER_ARGS) { device_t dev = arg1; uint8_t data; int val, error; error = axp8xx_read(dev, AXP_CHARGERCTL1, &data, 1); if (error != 0) return (error); if (bootverbose) device_printf(dev, "Raw CHARGECTL1 val: 0x%0x\n", data); val = (data & AXP_CHARGERCTL1_CMASK); error = sysctl_handle_int(oidp, &val, 0, req); if (error || !req->newptr) /* error || read request */ return (error); if ((val < AXP_CHARGERCTL1_MIN) || (val > AXP_CHARGERCTL1_MAX)) return (EINVAL); val |= (data & (AXP_CHARGERCTL1_CMASK << 4)); axp8xx_write(dev, AXP_CHARGERCTL1, val); return (0); } static int axp8xx_sysctl(SYSCTL_HANDLER_ARGS) { struct axp8xx_softc *sc; device_t dev = arg1; enum 
axp8xx_sensor sensor = arg2; const struct axp8xx_config *c; uint8_t data; int val, i, found, batt_val; uint8_t lo, hi; sc = device_get_softc(dev); c = sc->config; for (found = 0, i = 0; i < sc->nsensors; i++) { if (sc->sensors[i].id == sensor) { found = 1; break; } } if (found == 0) return (ENOENT); switch (sensor) { case AXP_SENSOR_ACIN_PRESENT: if (axp8xx_read(dev, AXP_POWERSRC, &data, 1) == 0) val = !!(data & AXP_POWERSRC_ACIN); break; case AXP_SENSOR_VBUS_PRESENT: if (axp8xx_read(dev, AXP_POWERSRC, &data, 1) == 0) val = !!(data & AXP_POWERSRC_VBUS); break; case AXP_SENSOR_BATT_PRESENT: if (axp8xx_read(dev, AXP_POWERMODE, &data, 1) == 0) { if (data & AXP_POWERMODE_BAT_VALID) val = !!(data & AXP_POWERMODE_BAT_PRESENT); } break; case AXP_SENSOR_BATT_CHARGING: if (axp8xx_read(dev, AXP_POWERMODE, &data, 1) == 0) val = !!(data & AXP_POWERMODE_BAT_CHARGING); break; case AXP_SENSOR_BATT_CHARGE_STATE: if (axp8xx_read(dev, AXP_BAT_CAP, &data, 1) == 0 && (data & AXP_BAT_CAP_VALID) != 0) { batt_val = (data & AXP_BAT_CAP_PERCENT); if (batt_val <= sc->shut_thres) val = BATT_CAPACITY_CRITICAL; else if (batt_val <= sc->warn_thres) val = BATT_CAPACITY_WARNING; else val = BATT_CAPACITY_NORMAL; } break; case AXP_SENSOR_BATT_CAPACITY_PERCENT: if (axp8xx_read(dev, AXP_BAT_CAP, &data, 1) == 0 && (data & AXP_BAT_CAP_VALID) != 0) val = (data & AXP_BAT_CAP_PERCENT); break; case AXP_SENSOR_BATT_VOLTAGE: if (axp8xx_read(dev, AXP_BATSENSE_HI, &hi, 1) == 0 && axp8xx_read(dev, AXP_BATSENSE_LO, &lo, 1) == 0) { val = (AXP_SENSOR_BAT_H(hi) | AXP_SENSOR_BAT_L(lo)); val *= c->batsense_step; } break; case AXP_SENSOR_BATT_CHARGE_CURRENT: if (axp8xx_read(dev, AXP_POWERSRC, &data, 1) == 0 && (data & AXP_POWERSRC_CHARING) != 0 && axp8xx_read(dev, AXP_BATCHG_HI, &hi, 1) == 0 && axp8xx_read(dev, AXP_BATCHG_LO, &lo, 1) == 0) { val = (AXP_SENSOR_BAT_H(hi) | AXP_SENSOR_BAT_L(lo)); val *= c->charge_step; } break; case AXP_SENSOR_BATT_DISCHARGE_CURRENT: if (axp8xx_read(dev, AXP_POWERSRC, &data, 1) == 0 && (data & AXP_POWERSRC_CHARING) == 0 && axp8xx_read(dev, AXP_BATDISCHG_HI, &hi, 1) == 0 && axp8xx_read(dev, AXP_BATDISCHG_LO, &lo, 1) == 0) { val = (AXP_SENSOR_BAT_H(hi) | AXP_SENSOR_BAT_L(lo)); val *= c->discharge_step; } break; case AXP_SENSOR_BATT_MAXIMUM_CAPACITY: if (axp8xx_read(dev, AXP_BAT_MAX_CAP_HI, &hi, 1) == 0 && axp8xx_read(dev, AXP_BAT_MAX_CAP_LO, &lo, 1) == 0) { val = AXP_SENSOR_COULOMB(hi, lo); val *= c->maxcap_step; } break; case AXP_SENSOR_BATT_CURRENT_CAPACITY: if (axp8xx_read(dev, AXP_BAT_COULOMB_HI, &hi, 1) == 0 && axp8xx_read(dev, AXP_BAT_COULOMB_LO, &lo, 1) == 0) { val = AXP_SENSOR_COULOMB(hi, lo); val *= c->coulomb_step; } break; } return sysctl_handle_opaque(oidp, &val, sizeof(val), req); } static void axp8xx_intr(void *arg) { device_t dev; uint8_t val; int error; dev = arg; error = axp8xx_read(dev, AXP_IRQSTAT1, &val, 1); if (error != 0) return; if (val) { if (bootverbose) device_printf(dev, "AXP_IRQSTAT1 val: %x\n", val); if (val & AXP_IRQSTAT1_ACIN_HI) devctl_notify("PMU", "AC", "plugged", NULL); if (val & AXP_IRQSTAT1_ACIN_LO) devctl_notify("PMU", "AC", "unplugged", NULL); if (val & AXP_IRQSTAT1_VBUS_HI) devctl_notify("PMU", "USB", "plugged", NULL); if (val & AXP_IRQSTAT1_VBUS_LO) devctl_notify("PMU", "USB", "unplugged", NULL); /* Acknowledge */ axp8xx_write(dev, AXP_IRQSTAT1, val); } error = axp8xx_read(dev, AXP_IRQSTAT2, &val, 1); if (error != 0) return; if (val) { if (bootverbose) device_printf(dev, "AXP_IRQSTAT2 val: %x\n", val); if (val & AXP_IRQSTAT2_BATCHGD) devctl_notify("PMU", "Battery", 
"charged", NULL); if (val & AXP_IRQSTAT2_BATCHGC) devctl_notify("PMU", "Battery", "charging", NULL); if (val & AXP_IRQSTAT2_BAT_NO) devctl_notify("PMU", "Battery", "absent", NULL); if (val & AXP_IRQSTAT2_BAT_IN) devctl_notify("PMU", "Battery", "plugged", NULL); /* Acknowledge */ axp8xx_write(dev, AXP_IRQSTAT2, val); } error = axp8xx_read(dev, AXP_IRQSTAT3, &val, 1); if (error != 0) return; if (val) { /* Acknowledge */ axp8xx_write(dev, AXP_IRQSTAT3, val); } error = axp8xx_read(dev, AXP_IRQSTAT4, &val, 1); if (error != 0) return; if (val) { if (bootverbose) device_printf(dev, "AXP_IRQSTAT4 val: %x\n", val); if (val & AXP_IRQSTAT4_BATLVL_LO0) devctl_notify("PMU", "Battery", "shutdown threshold", NULL); if (val & AXP_IRQSTAT4_BATLVL_LO1) devctl_notify("PMU", "Battery", "warning threshold", NULL); /* Acknowledge */ axp8xx_write(dev, AXP_IRQSTAT4, val); } error = axp8xx_read(dev, AXP_IRQSTAT5, &val, 1); if (error != 0) return; if (val != 0) { if ((val & AXP_IRQSTAT5_POKSIRQ) != 0) { if (bootverbose) device_printf(dev, "Power button pressed\n"); shutdown_nice(RB_POWEROFF); } /* Acknowledge */ axp8xx_write(dev, AXP_IRQSTAT5, val); } error = axp8xx_read(dev, AXP_IRQSTAT6, &val, 1); if (error != 0) return; if (val) { /* Acknowledge */ axp8xx_write(dev, AXP_IRQSTAT6, val); } } static device_t axp8xx_gpio_get_bus(device_t dev) { struct axp8xx_softc *sc; sc = device_get_softc(dev); return (sc->gpiodev); } static int axp8xx_gpio_pin_max(device_t dev, int *maxpin) { *maxpin = nitems(axp8xx_pins) - 1; return (0); } static int axp8xx_gpio_pin_getname(device_t dev, uint32_t pin, char *name) { if (pin >= nitems(axp8xx_pins)) return (EINVAL); snprintf(name, GPIOMAXNAME, "%s", axp8xx_pins[pin].name); return (0); } static int axp8xx_gpio_pin_getcaps(device_t dev, uint32_t pin, uint32_t *caps) { if (pin >= nitems(axp8xx_pins)) return (EINVAL); *caps = GPIO_PIN_INPUT | GPIO_PIN_OUTPUT; return (0); } static int axp8xx_gpio_pin_getflags(device_t dev, uint32_t pin, uint32_t *flags) { struct axp8xx_softc *sc; uint8_t data, func; int error; if (pin >= nitems(axp8xx_pins)) return (EINVAL); sc = device_get_softc(dev); AXP_LOCK(sc); error = axp8xx_read(dev, axp8xx_pins[pin].ctrl_reg, &data, 1); if (error == 0) { func = (data & AXP_GPIO_FUNC) >> AXP_GPIO_FUNC_SHIFT; if (func == AXP_GPIO_FUNC_INPUT) *flags = GPIO_PIN_INPUT; else if (func == AXP_GPIO_FUNC_DRVLO || func == AXP_GPIO_FUNC_DRVHI) *flags = GPIO_PIN_OUTPUT; else *flags = 0; } AXP_UNLOCK(sc); return (error); } static int axp8xx_gpio_pin_setflags(device_t dev, uint32_t pin, uint32_t flags) { struct axp8xx_softc *sc; uint8_t data; int error; if (pin >= nitems(axp8xx_pins)) return (EINVAL); sc = device_get_softc(dev); AXP_LOCK(sc); error = axp8xx_read(dev, axp8xx_pins[pin].ctrl_reg, &data, 1); if (error == 0) { data &= ~AXP_GPIO_FUNC; if ((flags & (GPIO_PIN_INPUT|GPIO_PIN_OUTPUT)) != 0) { if ((flags & GPIO_PIN_OUTPUT) == 0) data |= AXP_GPIO_FUNC_INPUT; } error = axp8xx_write(dev, axp8xx_pins[pin].ctrl_reg, data); } AXP_UNLOCK(sc); return (error); } static int axp8xx_gpio_pin_get(device_t dev, uint32_t pin, unsigned int *val) { struct axp8xx_softc *sc; uint8_t data, func; int error; if (pin >= nitems(axp8xx_pins)) return (EINVAL); sc = device_get_softc(dev); AXP_LOCK(sc); error = axp8xx_read(dev, axp8xx_pins[pin].ctrl_reg, &data, 1); if (error == 0) { func = (data & AXP_GPIO_FUNC) >> AXP_GPIO_FUNC_SHIFT; switch (func) { case AXP_GPIO_FUNC_DRVLO: *val = 0; break; case AXP_GPIO_FUNC_DRVHI: *val = 1; break; case AXP_GPIO_FUNC_INPUT: error = axp8xx_read(dev, 
AXP_GPIO_SIGBIT, &data, 1); if (error == 0) *val = (data & (1 << pin)) ? 1 : 0; break; default: error = EIO; break; } } AXP_UNLOCK(sc); return (error); } static int axp8xx_gpio_pin_set(device_t dev, uint32_t pin, unsigned int val) { struct axp8xx_softc *sc; uint8_t data, func; int error; if (pin >= nitems(axp8xx_pins)) return (EINVAL); sc = device_get_softc(dev); AXP_LOCK(sc); error = axp8xx_read(dev, axp8xx_pins[pin].ctrl_reg, &data, 1); if (error == 0) { func = (data & AXP_GPIO_FUNC) >> AXP_GPIO_FUNC_SHIFT; switch (func) { case AXP_GPIO_FUNC_DRVLO: case AXP_GPIO_FUNC_DRVHI: data &= ~AXP_GPIO_FUNC; data |= (val << AXP_GPIO_FUNC_SHIFT); break; default: error = EIO; break; } } if (error == 0) error = axp8xx_write(dev, axp8xx_pins[pin].ctrl_reg, data); AXP_UNLOCK(sc); return (error); } static int axp8xx_gpio_pin_toggle(device_t dev, uint32_t pin) { struct axp8xx_softc *sc; uint8_t data, func; int error; if (pin >= nitems(axp8xx_pins)) return (EINVAL); sc = device_get_softc(dev); AXP_LOCK(sc); error = axp8xx_read(dev, axp8xx_pins[pin].ctrl_reg, &data, 1); if (error == 0) { func = (data & AXP_GPIO_FUNC) >> AXP_GPIO_FUNC_SHIFT; switch (func) { case AXP_GPIO_FUNC_DRVLO: data &= ~AXP_GPIO_FUNC; data |= (AXP_GPIO_FUNC_DRVHI << AXP_GPIO_FUNC_SHIFT); break; case AXP_GPIO_FUNC_DRVHI: data &= ~AXP_GPIO_FUNC; data |= (AXP_GPIO_FUNC_DRVLO << AXP_GPIO_FUNC_SHIFT); break; default: error = EIO; break; } } if (error == 0) error = axp8xx_write(dev, axp8xx_pins[pin].ctrl_reg, data); AXP_UNLOCK(sc); return (error); } static int axp8xx_gpio_map_gpios(device_t bus, phandle_t dev, phandle_t gparent, int gcells, pcell_t *gpios, uint32_t *pin, uint32_t *flags) { if (gpios[0] >= nitems(axp8xx_pins)) return (EINVAL); *pin = gpios[0]; *flags = gpios[1]; return (0); } static phandle_t axp8xx_get_node(device_t dev, device_t bus) { return (ofw_bus_get_node(dev)); } static struct axp8xx_reg_sc * axp8xx_reg_attach(device_t dev, phandle_t node, struct axp8xx_regdef *def) { struct axp8xx_reg_sc *reg_sc; struct regnode_init_def initdef; struct regnode *regnode; memset(&initdef, 0, sizeof(initdef)); if (regulator_parse_ofw_stdparam(dev, node, &initdef) != 0) return (NULL); if (initdef.std_param.min_uvolt == 0) initdef.std_param.min_uvolt = def->voltage_min * 1000; if (initdef.std_param.max_uvolt == 0) initdef.std_param.max_uvolt = def->voltage_max * 1000; initdef.id = def->id; initdef.ofw_node = node; regnode = regnode_create(dev, &axp8xx_regnode_class, &initdef); if (regnode == NULL) { device_printf(dev, "cannot create regulator\n"); return (NULL); } reg_sc = regnode_get_softc(regnode); reg_sc->regnode = regnode; reg_sc->base_dev = dev; reg_sc->def = def; reg_sc->xref = OF_xref_from_node(node); reg_sc->param = regnode_get_stdparam(regnode); regnode_register(regnode); return (reg_sc); } static int axp8xx_regdev_map(device_t dev, phandle_t xref, int ncells, pcell_t *cells, intptr_t *num) { struct axp8xx_softc *sc; int i; sc = device_get_softc(dev); for (i = 0; i < sc->nregs; i++) { if (sc->regs[i] == NULL) continue; if (sc->regs[i]->xref == xref) { *num = sc->regs[i]->def->id; return (0); } } return (ENXIO); } static int axp8xx_probe(device_t dev) { if (!ofw_bus_status_okay(dev)) return (ENXIO); switch (ofw_bus_search_compatible(dev, compat_data)->ocd_data) { case AXP803: device_set_desc(dev, "X-Powers AXP803 Power Management Unit"); break; case AXP813: device_set_desc(dev, "X-Powers AXP813 Power Management Unit"); break; default: return (ENXIO); } return (BUS_PROBE_DEFAULT); } static int axp8xx_attach(device_t dev) { struct 
axp8xx_softc *sc; struct axp8xx_reg_sc *reg; uint8_t chip_id, val; phandle_t rnode, child; int error, i; sc = device_get_softc(dev); sc->addr = iicbus_get_addr(dev); mtx_init(&sc->mtx, device_get_nameunit(dev), NULL, MTX_DEF); error = bus_alloc_resources(dev, axp8xx_spec, &sc->res); if (error != 0) { device_printf(dev, "cannot allocate resources for device\n"); return (error); } if (bootverbose) { axp8xx_read(dev, AXP_ICTYPE, &chip_id, 1); device_printf(dev, "chip ID 0x%02x\n", chip_id); } sc->nregs = nitems(axp8xx_common_regdefs); sc->type = ofw_bus_search_compatible(dev, compat_data)->ocd_data; switch (sc->type) { case AXP803: sc->nregs += nitems(axp803_regdefs); break; case AXP813: sc->nregs += nitems(axp813_regdefs); break; } sc->config = &axp803_config; sc->sensors = axp8xx_common_sensors; sc->nsensors = nitems(axp8xx_common_sensors); sc->regs = malloc(sizeof(struct axp8xx_reg_sc *) * sc->nregs, M_AXP8XX_REG, M_WAITOK | M_ZERO); /* Attach known regulators that exist in the DT */ rnode = ofw_bus_find_child(ofw_bus_get_node(dev), "regulators"); if (rnode > 0) { for (i = 0; i < sc->nregs; i++) { char *regname; struct axp8xx_regdef *regdef; if (i <= nitems(axp8xx_common_regdefs)) { regname = axp8xx_common_regdefs[i].name; regdef = &axp8xx_common_regdefs[i]; } else { int off; off = i - nitems(axp8xx_common_regdefs); switch (sc->type) { case AXP803: regname = axp803_regdefs[off].name; regdef = &axp803_regdefs[off]; break; case AXP813: regname = axp813_regdefs[off].name; regdef = &axp813_regdefs[off]; break; } } child = ofw_bus_find_child(rnode, regname); if (child == 0) continue; reg = axp8xx_reg_attach(dev, child, regdef); if (reg == NULL) { device_printf(dev, "cannot attach regulator %s\n", regname); continue; } sc->regs[i] = reg; } } /* Add sensors */ for (i = 0; i < sc->nsensors; i++) { SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev), SYSCTL_CHILDREN(device_get_sysctl_tree(dev)), OID_AUTO, sc->sensors[i].name, CTLTYPE_INT | CTLFLAG_RD, dev, sc->sensors[i].id, axp8xx_sysctl, sc->sensors[i].format, sc->sensors[i].desc); } SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev), SYSCTL_CHILDREN(device_get_sysctl_tree(dev)), OID_AUTO, "batchargecurrentstep", CTLTYPE_INT | CTLFLAG_RW, dev, 0, axp8xx_sysctl_chargecurrent, "I", "Battery Charging Current Step, " "0: 200mA, 1: 400mA, 2: 600mA, 3: 800mA, " "4: 1000mA, 5: 1200mA, 6: 1400mA, 7: 1600mA, " "8: 1800mA, 9: 2000mA, 10: 2200mA, 11: 2400mA, " "12: 2600mA, 13: 2800mA"); /* Get thresholds */ if (axp8xx_read(dev, AXP_BAT_CAP_WARN, &val, 1) == 0) { sc->warn_thres = (val & AXP_BAT_CAP_WARN_LV1) >> 4; sc->warn_thres += AXP_BAP_CAP_WARN_LV1BASE; sc->shut_thres = (val & AXP_BAT_CAP_WARN_LV2); if (bootverbose) { device_printf(dev, "Raw reg val: 0x%02x\n", val); device_printf(dev, "Warning threshold: 0x%02x\n", sc->warn_thres); device_printf(dev, "Shutdown threshold: 0x%02x\n", sc->shut_thres); } } /* Enable interrupts */ axp8xx_write(dev, AXP_IRQEN1, AXP_IRQEN1_VBUS_LO | AXP_IRQEN1_VBUS_HI | AXP_IRQEN1_ACIN_LO | AXP_IRQEN1_ACIN_HI); axp8xx_write(dev, AXP_IRQEN2, AXP_IRQEN2_BATCHGD | AXP_IRQEN2_BATCHGC | AXP_IRQEN2_BAT_NO | AXP_IRQEN2_BAT_IN); axp8xx_write(dev, AXP_IRQEN3, 0); axp8xx_write(dev, AXP_IRQEN4, AXP_IRQEN4_BATLVL_LO0 | AXP_IRQEN4_BATLVL_LO1); axp8xx_write(dev, AXP_IRQEN5, AXP_IRQEN5_POKSIRQ | AXP_IRQEN5_POKLIRQ); axp8xx_write(dev, AXP_IRQEN6, 0); /* Install interrupt handler */ error = bus_setup_intr(dev, sc->res, INTR_TYPE_MISC | INTR_MPSAFE, NULL, axp8xx_intr, dev, &sc->ih); if (error != 0) { device_printf(dev, "cannot setup interrupt handler\n"); 
return (error); } EVENTHANDLER_REGISTER(shutdown_final, axp8xx_shutdown, dev, SHUTDOWN_PRI_LAST); sc->gpiodev = gpiobus_attach_bus(dev); return (0); } static device_method_t axp8xx_methods[] = { /* Device interface */ DEVMETHOD(device_probe, axp8xx_probe), DEVMETHOD(device_attach, axp8xx_attach), /* GPIO interface */ DEVMETHOD(gpio_get_bus, axp8xx_gpio_get_bus), DEVMETHOD(gpio_pin_max, axp8xx_gpio_pin_max), DEVMETHOD(gpio_pin_getname, axp8xx_gpio_pin_getname), DEVMETHOD(gpio_pin_getcaps, axp8xx_gpio_pin_getcaps), DEVMETHOD(gpio_pin_getflags, axp8xx_gpio_pin_getflags), DEVMETHOD(gpio_pin_setflags, axp8xx_gpio_pin_setflags), DEVMETHOD(gpio_pin_get, axp8xx_gpio_pin_get), DEVMETHOD(gpio_pin_set, axp8xx_gpio_pin_set), DEVMETHOD(gpio_pin_toggle, axp8xx_gpio_pin_toggle), DEVMETHOD(gpio_map_gpios, axp8xx_gpio_map_gpios), /* Regdev interface */ DEVMETHOD(regdev_map, axp8xx_regdev_map), /* OFW bus interface */ DEVMETHOD(ofw_bus_get_node, axp8xx_get_node), DEVMETHOD_END }; static driver_t axp8xx_driver = { "axp8xx_pmu", axp8xx_methods, sizeof(struct axp8xx_softc), }; static devclass_t axp8xx_devclass; extern devclass_t ofwgpiobus_devclass, gpioc_devclass; extern driver_t ofw_gpiobus_driver, gpioc_driver; EARLY_DRIVER_MODULE(axp8xx, iicbus, axp8xx_driver, axp8xx_devclass, 0, 0, BUS_PASS_INTERRUPT + BUS_PASS_ORDER_LAST); EARLY_DRIVER_MODULE(ofw_gpiobus, axp8xx_pmu, ofw_gpiobus_driver, ofwgpiobus_devclass, 0, 0, BUS_PASS_INTERRUPT + BUS_PASS_ORDER_LAST); DRIVER_MODULE(gpioc, axp8xx_pmu, gpioc_driver, gpioc_devclass, 0, 0); MODULE_VERSION(axp8xx, 1); MODULE_DEPEND(axp8xx, iicbus, 1, 1, 1); +SIMPLEBUS_PNP_INFO(compat_data); Index: user/ngie/bug-237403/sys/arm/allwinner/clkng/ccu_de2.c =================================================================== --- user/ngie/bug-237403/sys/arm/allwinner/clkng/ccu_de2.c (revision 346925) +++ user/ngie/bug-237403/sys/arm/allwinner/clkng/ccu_de2.c (revision 346926) @@ -1,167 +1,166 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2018 Emmanuel Vadot * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $FreeBSD$ */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include "opt_soc.h" #include #include #include #include #include #include /* Non exported clocks */ #define CLK_MIXER0_DIV 3 #define CLK_MIXER1_DIV 4 #define CLK_WB_DIV 5 static struct aw_ccung_reset de2_ccu_resets[] = { CCU_RESET(RST_MIXER0, 0x08, 0) CCU_RESET(RST_MIXER1, 0x08, 1) CCU_RESET(RST_WB, 0x08, 2) }; static struct aw_ccung_gate de2_ccu_gates[] = { CCU_GATE(CLK_BUS_MIXER0, "mixer0", "mixer0-div", 0x00, 0) CCU_GATE(CLK_BUS_MIXER1, "mixer1", "mixer1-div", 0x00, 1) CCU_GATE(CLK_BUS_WB, "wb", "wb-div", 0x00, 2) CCU_GATE(CLK_MIXER0, "bus-mixer0", "bus-de", 0x04, 0) CCU_GATE(CLK_MIXER1, "bus-mixer1", "bus-de", 0x04, 1) CCU_GATE(CLK_WB, "bus-wb", "bus-de", 0x04, 2) }; static const char *div_parents[] = {"de"}; NM_CLK(mixer0_div_clk, CLK_MIXER0_DIV, /* id */ "mixer0-div", div_parents, /* names, parents */ 0x0C, /* offset */ 0, 0, 1, AW_CLK_FACTOR_FIXED, /* N factor (fake)*/ 0, 4, 0, 0, /* M flags */ 0, 0, /* mux */ 0, /* gate */ AW_CLK_SCALE_CHANGE); /* flags */ NM_CLK(mixer1_div_clk, CLK_MIXER1_DIV, /* id */ "mixer1-div", div_parents, /* names, parents */ 0x0C, /* offset */ 0, 0, 1, AW_CLK_FACTOR_FIXED, /* N factor (fake)*/ 4, 4, 0, 0, /* M flags */ 0, 0, /* mux */ 0, /* gate */ AW_CLK_SCALE_CHANGE); /* flags */ NM_CLK(wb_div_clk, CLK_WB_DIV, /* id */ "wb-div", div_parents, /* names, parents */ 0x0C, /* offset */ 0, 0, 1, AW_CLK_FACTOR_FIXED, /* N factor (fake)*/ 8, 4, 0, 0, /* M flags */ 0, 0, /* mux */ 0, /* gate */ AW_CLK_SCALE_CHANGE); /* flags */ static struct aw_ccung_clk de2_ccu_clks[] = { { .type = AW_CLK_NM, .clk.nm = &mixer0_div_clk}, { .type = AW_CLK_NM, .clk.nm = &mixer1_div_clk}, { .type = AW_CLK_NM, .clk.nm = &wb_div_clk}, }; static struct ofw_compat_data compat_data[] = { {"allwinner,sun50i-a64-de2-clk", 1}, - {"allwinner,sun50i-h5-de2-clk", 1}, {NULL, 0} }; static int ccu_de2_probe(device_t dev) { if (!ofw_bus_status_okay(dev)) return (ENXIO); if (ofw_bus_search_compatible(dev, compat_data)->ocd_data == 0) return (ENXIO); device_set_desc(dev, "Allwinner DE2 Clock Control Unit"); return (BUS_PROBE_DEFAULT); } static int ccu_de2_attach(device_t dev) { struct aw_ccung_softc *sc; sc = device_get_softc(dev); sc->resets = de2_ccu_resets; sc->nresets = nitems(de2_ccu_resets); sc->gates = de2_ccu_gates; sc->ngates = nitems(de2_ccu_gates); sc->clks = de2_ccu_clks; sc->nclks = nitems(de2_ccu_clks); return (aw_ccung_attach(dev)); } static device_method_t ccu_de2_methods[] = { /* Device interface */ DEVMETHOD(device_probe, ccu_de2_probe), DEVMETHOD(device_attach, ccu_de2_attach), DEVMETHOD_END }; static devclass_t ccu_de2ng_devclass; DEFINE_CLASS_1(ccu_de2, ccu_de2_driver, ccu_de2_methods, sizeof(struct aw_ccung_softc), aw_ccung_driver); EARLY_DRIVER_MODULE(ccu_de2, simplebus, ccu_de2_driver, ccu_de2ng_devclass, 0, 0, BUS_PASS_BUS + BUS_PASS_ORDER_LAST); Index: user/ngie/bug-237403/sys/arm/allwinner/if_awg.c =================================================================== --- user/ngie/bug-237403/sys/arm/allwinner/if_awg.c (revision 346925) +++ user/ngie/bug-237403/sys/arm/allwinner/if_awg.c (revision 346926) @@ -1,1972 +1,1973 @@ /*- * Copyright (c) 2016 Jared McNeill * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. 
Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ /* * Allwinner Gigabit Ethernet MAC (EMAC) controller */ #include "opt_device_polling.h" #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "syscon_if.h" #include "miibus_if.h" #include "gpio_if.h" #define RD4(sc, reg) bus_read_4((sc)->res[_RES_EMAC], (reg)) #define WR4(sc, reg, val) bus_write_4((sc)->res[_RES_EMAC], (reg), (val)) #define AWG_LOCK(sc) mtx_lock(&(sc)->mtx) #define AWG_UNLOCK(sc) mtx_unlock(&(sc)->mtx); #define AWG_ASSERT_LOCKED(sc) mtx_assert(&(sc)->mtx, MA_OWNED) #define AWG_ASSERT_UNLOCKED(sc) mtx_assert(&(sc)->mtx, MA_NOTOWNED) #define DESC_ALIGN 4 #define TX_DESC_COUNT 1024 #define TX_DESC_SIZE (sizeof(struct emac_desc) * TX_DESC_COUNT) #define RX_DESC_COUNT 256 #define RX_DESC_SIZE (sizeof(struct emac_desc) * RX_DESC_COUNT) #define DESC_OFF(n) ((n) * sizeof(struct emac_desc)) #define TX_NEXT(n) (((n) + 1) & (TX_DESC_COUNT - 1)) #define TX_SKIP(n, o) (((n) + (o)) & (TX_DESC_COUNT - 1)) #define RX_NEXT(n) (((n) + 1) & (RX_DESC_COUNT - 1)) #define TX_MAX_SEGS 20 #define SOFT_RST_RETRY 1000 #define MII_BUSY_RETRY 1000 #define MDIO_FREQ 2500000 #define BURST_LEN_DEFAULT 8 #define RX_TX_PRI_DEFAULT 0 #define PAUSE_TIME_DEFAULT 0x400 #define TX_INTERVAL_DEFAULT 64 #define RX_BATCH_DEFAULT 64 /* syscon EMAC clock register */ #define EMAC_CLK_REG 0x30 #define EMAC_CLK_EPHY_ADDR (0x1f << 20) /* H3 */ #define EMAC_CLK_EPHY_ADDR_SHIFT 20 #define EMAC_CLK_EPHY_LED_POL (1 << 17) /* H3 */ #define EMAC_CLK_EPHY_SHUTDOWN (1 << 16) /* H3 */ #define EMAC_CLK_EPHY_SELECT (1 << 15) /* H3 */ #define EMAC_CLK_RMII_EN (1 << 13) #define EMAC_CLK_ETXDC (0x7 << 10) #define EMAC_CLK_ETXDC_SHIFT 10 #define EMAC_CLK_ERXDC (0x1f << 5) #define EMAC_CLK_ERXDC_SHIFT 5 #define EMAC_CLK_PIT (0x1 << 2) #define EMAC_CLK_PIT_MII (0 << 2) #define EMAC_CLK_PIT_RGMII (1 << 2) #define EMAC_CLK_SRC (0x3 << 0) #define EMAC_CLK_SRC_MII (0 << 0) #define EMAC_CLK_SRC_EXT_RGMII (1 << 0) #define EMAC_CLK_SRC_RGMII (2 << 0) /* Burst length of RX and TX DMA transfers */ static int awg_burst_len = BURST_LEN_DEFAULT; TUNABLE_INT("hw.awg.burst_len", &awg_burst_len); /* RX / TX DMA priority. If 1, RX DMA has priority over TX DMA. 
*/ static int awg_rx_tx_pri = RX_TX_PRI_DEFAULT; TUNABLE_INT("hw.awg.rx_tx_pri", &awg_rx_tx_pri); /* Pause time field in the transmitted control frame */ static int awg_pause_time = PAUSE_TIME_DEFAULT; TUNABLE_INT("hw.awg.pause_time", &awg_pause_time); /* Request a TX interrupt every descriptors */ static int awg_tx_interval = TX_INTERVAL_DEFAULT; TUNABLE_INT("hw.awg.tx_interval", &awg_tx_interval); /* Maximum number of mbufs to send to if_input */ static int awg_rx_batch = RX_BATCH_DEFAULT; TUNABLE_INT("hw.awg.rx_batch", &awg_rx_batch); enum awg_type { EMAC_A83T = 1, EMAC_H3, EMAC_A64, }; static struct ofw_compat_data compat_data[] = { { "allwinner,sun8i-a83t-emac", EMAC_A83T }, { "allwinner,sun8i-h3-emac", EMAC_H3 }, { "allwinner,sun50i-a64-emac", EMAC_A64 }, { NULL, 0 } }; struct awg_bufmap { bus_dmamap_t map; struct mbuf *mbuf; }; struct awg_txring { bus_dma_tag_t desc_tag; bus_dmamap_t desc_map; struct emac_desc *desc_ring; bus_addr_t desc_ring_paddr; bus_dma_tag_t buf_tag; struct awg_bufmap buf_map[TX_DESC_COUNT]; u_int cur, next, queued; u_int segs; }; struct awg_rxring { bus_dma_tag_t desc_tag; bus_dmamap_t desc_map; struct emac_desc *desc_ring; bus_addr_t desc_ring_paddr; bus_dma_tag_t buf_tag; struct awg_bufmap buf_map[RX_DESC_COUNT]; bus_dmamap_t buf_spare_map; u_int cur; }; enum { _RES_EMAC, _RES_IRQ, _RES_SYSCON, _RES_NITEMS }; struct awg_softc { struct resource *res[_RES_NITEMS]; struct mtx mtx; if_t ifp; device_t dev; device_t miibus; struct callout stat_ch; struct task link_task; void *ih; u_int mdc_div_ratio_m; int link; int if_flags; enum awg_type type; struct syscon *syscon; struct awg_txring tx; struct awg_rxring rx; }; static struct resource_spec awg_spec[] = { { SYS_RES_MEMORY, 0, RF_ACTIVE }, { SYS_RES_IRQ, 0, RF_ACTIVE }, { SYS_RES_MEMORY, 1, RF_ACTIVE | RF_OPTIONAL }, { -1, 0 } }; static void awg_txeof(struct awg_softc *sc); static int awg_parse_delay(device_t dev, uint32_t *tx_delay, uint32_t *rx_delay); static uint32_t syscon_read_emac_clk_reg(device_t dev); static void syscon_write_emac_clk_reg(device_t dev, uint32_t val); static phandle_t awg_get_phy_node(device_t dev); static bool awg_has_internal_phy(device_t dev); static int awg_miibus_readreg(device_t dev, int phy, int reg) { struct awg_softc *sc; int retry, val; sc = device_get_softc(dev); val = 0; WR4(sc, EMAC_MII_CMD, (sc->mdc_div_ratio_m << MDC_DIV_RATIO_M_SHIFT) | (phy << PHY_ADDR_SHIFT) | (reg << PHY_REG_ADDR_SHIFT) | MII_BUSY); for (retry = MII_BUSY_RETRY; retry > 0; retry--) { if ((RD4(sc, EMAC_MII_CMD) & MII_BUSY) == 0) { val = RD4(sc, EMAC_MII_DATA); break; } DELAY(10); } if (retry == 0) device_printf(dev, "phy read timeout, phy=%d reg=%d\n", phy, reg); return (val); } static int awg_miibus_writereg(device_t dev, int phy, int reg, int val) { struct awg_softc *sc; int retry; sc = device_get_softc(dev); WR4(sc, EMAC_MII_DATA, val); WR4(sc, EMAC_MII_CMD, (sc->mdc_div_ratio_m << MDC_DIV_RATIO_M_SHIFT) | (phy << PHY_ADDR_SHIFT) | (reg << PHY_REG_ADDR_SHIFT) | MII_WR | MII_BUSY); for (retry = MII_BUSY_RETRY; retry > 0; retry--) { if ((RD4(sc, EMAC_MII_CMD) & MII_BUSY) == 0) break; DELAY(10); } if (retry == 0) device_printf(dev, "phy write timeout, phy=%d reg=%d\n", phy, reg); return (0); } static void awg_update_link_locked(struct awg_softc *sc) { struct mii_data *mii; uint32_t val; AWG_ASSERT_LOCKED(sc); if ((if_getdrvflags(sc->ifp) & IFF_DRV_RUNNING) == 0) return; mii = device_get_softc(sc->miibus); if ((mii->mii_media_status & (IFM_ACTIVE | IFM_AVALID)) == (IFM_ACTIVE | IFM_AVALID)) { switch 
(IFM_SUBTYPE(mii->mii_media_active)) { case IFM_1000_T: case IFM_1000_SX: case IFM_100_TX: case IFM_10_T: sc->link = 1; break; default: sc->link = 0; break; } } else sc->link = 0; if (sc->link == 0) return; val = RD4(sc, EMAC_BASIC_CTL_0); val &= ~(BASIC_CTL_SPEED | BASIC_CTL_DUPLEX); if (IFM_SUBTYPE(mii->mii_media_active) == IFM_1000_T || IFM_SUBTYPE(mii->mii_media_active) == IFM_1000_SX) val |= BASIC_CTL_SPEED_1000 << BASIC_CTL_SPEED_SHIFT; else if (IFM_SUBTYPE(mii->mii_media_active) == IFM_100_TX) val |= BASIC_CTL_SPEED_100 << BASIC_CTL_SPEED_SHIFT; else val |= BASIC_CTL_SPEED_10 << BASIC_CTL_SPEED_SHIFT; if ((IFM_OPTIONS(mii->mii_media_active) & IFM_FDX) != 0) val |= BASIC_CTL_DUPLEX; WR4(sc, EMAC_BASIC_CTL_0, val); val = RD4(sc, EMAC_RX_CTL_0); val &= ~RX_FLOW_CTL_EN; if ((IFM_OPTIONS(mii->mii_media_active) & IFM_ETH_RXPAUSE) != 0) val |= RX_FLOW_CTL_EN; WR4(sc, EMAC_RX_CTL_0, val); val = RD4(sc, EMAC_TX_FLOW_CTL); val &= ~(PAUSE_TIME|TX_FLOW_CTL_EN); if ((IFM_OPTIONS(mii->mii_media_active) & IFM_ETH_TXPAUSE) != 0) val |= TX_FLOW_CTL_EN; if ((IFM_OPTIONS(mii->mii_media_active) & IFM_FDX) != 0) val |= awg_pause_time << PAUSE_TIME_SHIFT; WR4(sc, EMAC_TX_FLOW_CTL, val); } static void awg_link_task(void *arg, int pending) { struct awg_softc *sc; sc = arg; AWG_LOCK(sc); awg_update_link_locked(sc); AWG_UNLOCK(sc); } static void awg_miibus_statchg(device_t dev) { struct awg_softc *sc; sc = device_get_softc(dev); taskqueue_enqueue(taskqueue_swi, &sc->link_task); } static void awg_media_status(if_t ifp, struct ifmediareq *ifmr) { struct awg_softc *sc; struct mii_data *mii; sc = if_getsoftc(ifp); mii = device_get_softc(sc->miibus); AWG_LOCK(sc); mii_pollstat(mii); ifmr->ifm_active = mii->mii_media_active; ifmr->ifm_status = mii->mii_media_status; AWG_UNLOCK(sc); } static int awg_media_change(if_t ifp) { struct awg_softc *sc; struct mii_data *mii; int error; sc = if_getsoftc(ifp); mii = device_get_softc(sc->miibus); AWG_LOCK(sc); error = mii_mediachg(mii); AWG_UNLOCK(sc); return (error); } static int awg_encap(struct awg_softc *sc, struct mbuf **mp) { bus_dmamap_t map; bus_dma_segment_t segs[TX_MAX_SEGS]; int error, nsegs, cur, first, last, i; u_int csum_flags; uint32_t flags, status; struct mbuf *m; cur = first = sc->tx.cur; map = sc->tx.buf_map[first].map; m = *mp; error = bus_dmamap_load_mbuf_sg(sc->tx.buf_tag, map, m, segs, &nsegs, BUS_DMA_NOWAIT); if (error == EFBIG) { m = m_collapse(m, M_NOWAIT, TX_MAX_SEGS); if (m == NULL) { device_printf(sc->dev, "awg_encap: m_collapse failed\n"); m_freem(*mp); *mp = NULL; return (ENOMEM); } *mp = m; error = bus_dmamap_load_mbuf_sg(sc->tx.buf_tag, map, m, segs, &nsegs, BUS_DMA_NOWAIT); if (error != 0) { m_freem(*mp); *mp = NULL; } } if (error != 0) { device_printf(sc->dev, "awg_encap: bus_dmamap_load_mbuf_sg failed\n"); return (error); } if (nsegs == 0) { m_freem(*mp); *mp = NULL; return (EIO); } if (sc->tx.queued + nsegs > TX_DESC_COUNT) { bus_dmamap_unload(sc->tx.buf_tag, map); return (ENOBUFS); } bus_dmamap_sync(sc->tx.buf_tag, map, BUS_DMASYNC_PREWRITE); flags = TX_FIR_DESC; status = 0; if ((m->m_pkthdr.csum_flags & CSUM_IP) != 0) { if ((m->m_pkthdr.csum_flags & (CSUM_TCP|CSUM_UDP)) != 0) csum_flags = TX_CHECKSUM_CTL_FULL; else csum_flags = TX_CHECKSUM_CTL_IP; flags |= (csum_flags << TX_CHECKSUM_CTL_SHIFT); } for (i = 0; i < nsegs; i++) { sc->tx.segs++; if (i == nsegs - 1) { flags |= TX_LAST_DESC; /* * Can only request TX completion * interrupt on last descriptor. 
*/ if (sc->tx.segs >= awg_tx_interval) { sc->tx.segs = 0; flags |= TX_INT_CTL; } } sc->tx.desc_ring[cur].addr = htole32((uint32_t)segs[i].ds_addr); sc->tx.desc_ring[cur].size = htole32(flags | segs[i].ds_len); sc->tx.desc_ring[cur].status = htole32(status); flags &= ~TX_FIR_DESC; /* * Setting of the valid bit in the first descriptor is * deferred until the whole chain is fully set up. */ status = TX_DESC_CTL; ++sc->tx.queued; cur = TX_NEXT(cur); } sc->tx.cur = cur; /* Store mapping and mbuf in the last segment */ last = TX_SKIP(cur, TX_DESC_COUNT - 1); sc->tx.buf_map[first].map = sc->tx.buf_map[last].map; sc->tx.buf_map[last].map = map; sc->tx.buf_map[last].mbuf = m; /* * The whole mbuf chain has been DMA mapped, * fix the first descriptor. */ sc->tx.desc_ring[first].status = htole32(TX_DESC_CTL); return (0); } static void awg_clean_txbuf(struct awg_softc *sc, int index) { struct awg_bufmap *bmap; --sc->tx.queued; bmap = &sc->tx.buf_map[index]; if (bmap->mbuf != NULL) { bus_dmamap_sync(sc->tx.buf_tag, bmap->map, BUS_DMASYNC_POSTWRITE); bus_dmamap_unload(sc->tx.buf_tag, bmap->map); m_freem(bmap->mbuf); bmap->mbuf = NULL; } } static void awg_setup_rxdesc(struct awg_softc *sc, int index, bus_addr_t paddr) { uint32_t status, size; status = RX_DESC_CTL; size = MCLBYTES - 1; sc->rx.desc_ring[index].addr = htole32((uint32_t)paddr); sc->rx.desc_ring[index].size = htole32(size); sc->rx.desc_ring[index].status = htole32(status); } static void awg_reuse_rxdesc(struct awg_softc *sc, int index) { sc->rx.desc_ring[index].status = htole32(RX_DESC_CTL); } static int awg_newbuf_rx(struct awg_softc *sc, int index) { struct mbuf *m; bus_dma_segment_t seg; bus_dmamap_t map; int nsegs; m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR); if (m == NULL) return (ENOBUFS); m->m_pkthdr.len = m->m_len = m->m_ext.ext_size; m_adj(m, ETHER_ALIGN); if (bus_dmamap_load_mbuf_sg(sc->rx.buf_tag, sc->rx.buf_spare_map, m, &seg, &nsegs, BUS_DMA_NOWAIT) != 0) { m_freem(m); return (ENOBUFS); } if (sc->rx.buf_map[index].mbuf != NULL) { bus_dmamap_sync(sc->rx.buf_tag, sc->rx.buf_map[index].map, BUS_DMASYNC_POSTREAD); bus_dmamap_unload(sc->rx.buf_tag, sc->rx.buf_map[index].map); } map = sc->rx.buf_map[index].map; sc->rx.buf_map[index].map = sc->rx.buf_spare_map; sc->rx.buf_spare_map = map; bus_dmamap_sync(sc->rx.buf_tag, sc->rx.buf_map[index].map, BUS_DMASYNC_PREREAD); sc->rx.buf_map[index].mbuf = m; awg_setup_rxdesc(sc, index, seg.ds_addr); return (0); } static void awg_start_locked(struct awg_softc *sc) { struct mbuf *m; uint32_t val; if_t ifp; int cnt, err; AWG_ASSERT_LOCKED(sc); if (!sc->link) return; ifp = sc->ifp; if ((if_getdrvflags(ifp) & (IFF_DRV_RUNNING|IFF_DRV_OACTIVE)) != IFF_DRV_RUNNING) return; for (cnt = 0; ; cnt++) { m = if_dequeue(ifp); if (m == NULL) break; err = awg_encap(sc, &m); if (err != 0) { if (err == ENOBUFS) if_setdrvflagbits(ifp, IFF_DRV_OACTIVE, 0); if (m != NULL) if_sendq_prepend(ifp, m); break; } if_bpfmtap(ifp, m); } if (cnt != 0) { bus_dmamap_sync(sc->tx.desc_tag, sc->tx.desc_map, BUS_DMASYNC_PREREAD|BUS_DMASYNC_PREWRITE); /* Start and run TX DMA */ val = RD4(sc, EMAC_TX_CTL_1); WR4(sc, EMAC_TX_CTL_1, val | TX_DMA_START); } } static void awg_start(if_t ifp) { struct awg_softc *sc; sc = if_getsoftc(ifp); AWG_LOCK(sc); awg_start_locked(sc); AWG_UNLOCK(sc); } static void awg_tick(void *softc) { struct awg_softc *sc; struct mii_data *mii; if_t ifp; int link; sc = softc; ifp = sc->ifp; mii = device_get_softc(sc->miibus); AWG_ASSERT_LOCKED(sc); if ((if_getdrvflags(ifp) & IFF_DRV_RUNNING) == 0) return; link = 
sc->link; mii_tick(mii); if (sc->link && !link) awg_start_locked(sc); callout_reset(&sc->stat_ch, hz, awg_tick, sc); } /* Bit Reversal - http://aggregate.org/MAGIC/#Bit%20Reversal */ static uint32_t bitrev32(uint32_t x) { x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1)); x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2)); x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4)); x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8)); return (x >> 16) | (x << 16); } static void awg_setup_rxfilter(struct awg_softc *sc) { uint32_t val, crc, hashreg, hashbit, hash[2], machi, maclo; int mc_count, mcnt, i; uint8_t *eaddr, *mta; if_t ifp; AWG_ASSERT_LOCKED(sc); ifp = sc->ifp; val = 0; hash[0] = hash[1] = 0; mc_count = if_multiaddr_count(ifp, -1); if (if_getflags(ifp) & IFF_PROMISC) val |= DIS_ADDR_FILTER; else if (if_getflags(ifp) & IFF_ALLMULTI) { val |= RX_ALL_MULTICAST; hash[0] = hash[1] = ~0; } else if (mc_count > 0) { val |= HASH_MULTICAST; mta = malloc(sizeof(unsigned char) * ETHER_ADDR_LEN * mc_count, M_DEVBUF, M_NOWAIT); if (mta == NULL) { if_printf(ifp, "failed to allocate temporary multicast list\n"); return; } if_multiaddr_array(ifp, mta, &mcnt, mc_count); for (i = 0; i < mcnt; i++) { crc = ether_crc32_le(mta + (i * ETHER_ADDR_LEN), ETHER_ADDR_LEN) & 0x7f; crc = bitrev32(~crc) >> 26; hashreg = (crc >> 5); hashbit = (crc & 0x1f); hash[hashreg] |= (1 << hashbit); } free(mta, M_DEVBUF); } /* Write our unicast address */ eaddr = IF_LLADDR(ifp); machi = (eaddr[5] << 8) | eaddr[4]; maclo = (eaddr[3] << 24) | (eaddr[2] << 16) | (eaddr[1] << 8) | (eaddr[0] << 0); WR4(sc, EMAC_ADDR_HIGH(0), machi); WR4(sc, EMAC_ADDR_LOW(0), maclo); /* Multicast hash filters */ WR4(sc, EMAC_RX_HASH_0, hash[1]); WR4(sc, EMAC_RX_HASH_1, hash[0]); /* RX frame filter config */ WR4(sc, EMAC_RX_FRM_FLT, val); } static void awg_enable_intr(struct awg_softc *sc) { /* Enable interrupts */ WR4(sc, EMAC_INT_EN, RX_INT_EN | TX_INT_EN | TX_BUF_UA_INT_EN); } static void awg_disable_intr(struct awg_softc *sc) { /* Disable interrupts */ WR4(sc, EMAC_INT_EN, 0); } static void awg_init_locked(struct awg_softc *sc) { struct mii_data *mii; uint32_t val; if_t ifp; mii = device_get_softc(sc->miibus); ifp = sc->ifp; AWG_ASSERT_LOCKED(sc); if (if_getdrvflags(ifp) & IFF_DRV_RUNNING) return; awg_setup_rxfilter(sc); /* Configure DMA burst length and priorities */ val = awg_burst_len << BASIC_CTL_BURST_LEN_SHIFT; if (awg_rx_tx_pri) val |= BASIC_CTL_RX_TX_PRI; WR4(sc, EMAC_BASIC_CTL_1, val); /* Enable interrupts */ #ifdef DEVICE_POLLING if ((if_getcapenable(ifp) & IFCAP_POLLING) == 0) awg_enable_intr(sc); else awg_disable_intr(sc); #else awg_enable_intr(sc); #endif /* Enable transmit DMA */ val = RD4(sc, EMAC_TX_CTL_1); WR4(sc, EMAC_TX_CTL_1, val | TX_DMA_EN | TX_MD | TX_NEXT_FRAME); /* Enable receive DMA */ val = RD4(sc, EMAC_RX_CTL_1); WR4(sc, EMAC_RX_CTL_1, val | RX_DMA_EN | RX_MD); /* Enable transmitter */ val = RD4(sc, EMAC_TX_CTL_0); WR4(sc, EMAC_TX_CTL_0, val | TX_EN); /* Enable receiver */ val = RD4(sc, EMAC_RX_CTL_0); WR4(sc, EMAC_RX_CTL_0, val | RX_EN | CHECK_CRC); if_setdrvflagbits(ifp, IFF_DRV_RUNNING, IFF_DRV_OACTIVE); mii_mediachg(mii); callout_reset(&sc->stat_ch, hz, awg_tick, sc); } static void awg_init(void *softc) { struct awg_softc *sc; sc = softc; AWG_LOCK(sc); awg_init_locked(sc); AWG_UNLOCK(sc); } static void awg_stop(struct awg_softc *sc) { if_t ifp; uint32_t val; int i; AWG_ASSERT_LOCKED(sc); ifp = sc->ifp; callout_stop(&sc->stat_ch); /* Stop transmit DMA and flush data in the TX FIFO */ val = RD4(sc, 
EMAC_TX_CTL_1); val &= ~TX_DMA_EN; val |= FLUSH_TX_FIFO; WR4(sc, EMAC_TX_CTL_1, val); /* Disable transmitter */ val = RD4(sc, EMAC_TX_CTL_0); WR4(sc, EMAC_TX_CTL_0, val & ~TX_EN); /* Disable receiver */ val = RD4(sc, EMAC_RX_CTL_0); WR4(sc, EMAC_RX_CTL_0, val & ~RX_EN); /* Disable interrupts */ awg_disable_intr(sc); /* Disable transmit DMA */ val = RD4(sc, EMAC_TX_CTL_1); WR4(sc, EMAC_TX_CTL_1, val & ~TX_DMA_EN); /* Disable receive DMA */ val = RD4(sc, EMAC_RX_CTL_1); WR4(sc, EMAC_RX_CTL_1, val & ~RX_DMA_EN); sc->link = 0; /* Finish handling transmitted buffers */ awg_txeof(sc); /* Release any untransmitted buffers. */ for (i = sc->tx.next; sc->tx.queued > 0; i = TX_NEXT(i)) { val = le32toh(sc->tx.desc_ring[i].status); if ((val & TX_DESC_CTL) != 0) break; awg_clean_txbuf(sc, i); } sc->tx.next = i; for (; sc->tx.queued > 0; i = TX_NEXT(i)) { sc->tx.desc_ring[i].status = 0; awg_clean_txbuf(sc, i); } sc->tx.cur = sc->tx.next; bus_dmamap_sync(sc->tx.desc_tag, sc->tx.desc_map, BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE); /* Setup RX buffers for reuse */ bus_dmamap_sync(sc->rx.desc_tag, sc->rx.desc_map, BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); for (i = sc->rx.cur; ; i = RX_NEXT(i)) { val = le32toh(sc->rx.desc_ring[i].status); if ((val & RX_DESC_CTL) != 0) break; awg_reuse_rxdesc(sc, i); } sc->rx.cur = i; bus_dmamap_sync(sc->rx.desc_tag, sc->rx.desc_map, BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE); if_setdrvflagbits(ifp, 0, IFF_DRV_RUNNING | IFF_DRV_OACTIVE); } static int awg_rxintr(struct awg_softc *sc) { if_t ifp; struct mbuf *m, *mh, *mt; int error, index, len, cnt, npkt; uint32_t status; ifp = sc->ifp; mh = mt = NULL; cnt = 0; npkt = 0; bus_dmamap_sync(sc->rx.desc_tag, sc->rx.desc_map, BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); for (index = sc->rx.cur; ; index = RX_NEXT(index)) { status = le32toh(sc->rx.desc_ring[index].status); if ((status & RX_DESC_CTL) != 0) break; len = (status & RX_FRM_LEN) >> RX_FRM_LEN_SHIFT; if (len == 0) { if ((status & (RX_NO_ENOUGH_BUF_ERR | RX_OVERFLOW_ERR)) != 0) if_inc_counter(ifp, IFCOUNTER_IERRORS, 1); awg_reuse_rxdesc(sc, index); continue; } m = sc->rx.buf_map[index].mbuf; error = awg_newbuf_rx(sc, index); if (error != 0) { if_inc_counter(ifp, IFCOUNTER_IQDROPS, 1); awg_reuse_rxdesc(sc, index); continue; } m->m_pkthdr.rcvif = ifp; m->m_pkthdr.len = len; m->m_len = len; if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1); if ((if_getcapenable(ifp) & IFCAP_RXCSUM) != 0 && (status & RX_FRM_TYPE) != 0) { m->m_pkthdr.csum_flags = CSUM_IP_CHECKED; if ((status & RX_HEADER_ERR) == 0) m->m_pkthdr.csum_flags |= CSUM_IP_VALID; if ((status & RX_PAYLOAD_ERR) == 0) { m->m_pkthdr.csum_flags |= CSUM_DATA_VALID | CSUM_PSEUDO_HDR; m->m_pkthdr.csum_data = 0xffff; } } m->m_nextpkt = NULL; if (mh == NULL) mh = m; else mt->m_nextpkt = m; mt = m; ++cnt; ++npkt; if (cnt == awg_rx_batch) { AWG_UNLOCK(sc); if_input(ifp, mh); AWG_LOCK(sc); mh = mt = NULL; cnt = 0; } } if (index != sc->rx.cur) { bus_dmamap_sync(sc->rx.desc_tag, sc->rx.desc_map, BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE); } if (mh != NULL) { AWG_UNLOCK(sc); if_input(ifp, mh); AWG_LOCK(sc); } sc->rx.cur = index; return (npkt); } static void awg_txeof(struct awg_softc *sc) { struct emac_desc *desc; uint32_t status, size; if_t ifp; int i, prog; AWG_ASSERT_LOCKED(sc); bus_dmamap_sync(sc->tx.desc_tag, sc->tx.desc_map, BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); ifp = sc->ifp; prog = 0; for (i = sc->tx.next; sc->tx.queued > 0; i = TX_NEXT(i)) { desc = &sc->tx.desc_ring[i]; status = le32toh(desc->status); if ((status & 
TX_DESC_CTL) != 0) break; size = le32toh(desc->size); if (size & TX_LAST_DESC) { if ((status & (TX_HEADER_ERR | TX_PAYLOAD_ERR)) != 0) if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); else if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1); } prog++; awg_clean_txbuf(sc, i); } if (prog > 0) { sc->tx.next = i; if_setdrvflagbits(ifp, 0, IFF_DRV_OACTIVE); } } static void awg_intr(void *arg) { struct awg_softc *sc; uint32_t val; sc = arg; AWG_LOCK(sc); val = RD4(sc, EMAC_INT_STA); WR4(sc, EMAC_INT_STA, val); if (val & RX_INT) awg_rxintr(sc); if (val & TX_INT) awg_txeof(sc); if (val & (TX_INT | TX_BUF_UA_INT)) { if (!if_sendq_empty(sc->ifp)) awg_start_locked(sc); } AWG_UNLOCK(sc); } #ifdef DEVICE_POLLING static int awg_poll(if_t ifp, enum poll_cmd cmd, int count) { struct awg_softc *sc; uint32_t val; int rx_npkts; sc = if_getsoftc(ifp); rx_npkts = 0; AWG_LOCK(sc); if ((if_getdrvflags(ifp) & IFF_DRV_RUNNING) == 0) { AWG_UNLOCK(sc); return (0); } rx_npkts = awg_rxintr(sc); awg_txeof(sc); if (!if_sendq_empty(ifp)) awg_start_locked(sc); if (cmd == POLL_AND_CHECK_STATUS) { val = RD4(sc, EMAC_INT_STA); if (val != 0) WR4(sc, EMAC_INT_STA, val); } AWG_UNLOCK(sc); return (rx_npkts); } #endif static int awg_ioctl(if_t ifp, u_long cmd, caddr_t data) { struct awg_softc *sc; struct mii_data *mii; struct ifreq *ifr; int flags, mask, error; sc = if_getsoftc(ifp); mii = device_get_softc(sc->miibus); ifr = (struct ifreq *)data; error = 0; switch (cmd) { case SIOCSIFFLAGS: AWG_LOCK(sc); if (if_getflags(ifp) & IFF_UP) { if (if_getdrvflags(ifp) & IFF_DRV_RUNNING) { flags = if_getflags(ifp) ^ sc->if_flags; if ((flags & (IFF_PROMISC|IFF_ALLMULTI)) != 0) awg_setup_rxfilter(sc); } else awg_init_locked(sc); } else { if (if_getdrvflags(ifp) & IFF_DRV_RUNNING) awg_stop(sc); } sc->if_flags = if_getflags(ifp); AWG_UNLOCK(sc); break; case SIOCADDMULTI: case SIOCDELMULTI: if (if_getdrvflags(ifp) & IFF_DRV_RUNNING) { AWG_LOCK(sc); awg_setup_rxfilter(sc); AWG_UNLOCK(sc); } break; case SIOCSIFMEDIA: case SIOCGIFMEDIA: error = ifmedia_ioctl(ifp, ifr, &mii->mii_media, cmd); break; case SIOCSIFCAP: mask = ifr->ifr_reqcap ^ if_getcapenable(ifp); #ifdef DEVICE_POLLING if (mask & IFCAP_POLLING) { if ((ifr->ifr_reqcap & IFCAP_POLLING) != 0) { error = ether_poll_register(awg_poll, ifp); if (error != 0) break; AWG_LOCK(sc); awg_disable_intr(sc); if_setcapenablebit(ifp, IFCAP_POLLING, 0); AWG_UNLOCK(sc); } else { error = ether_poll_deregister(ifp); AWG_LOCK(sc); awg_enable_intr(sc); if_setcapenablebit(ifp, 0, IFCAP_POLLING); AWG_UNLOCK(sc); } } #endif if (mask & IFCAP_VLAN_MTU) if_togglecapenable(ifp, IFCAP_VLAN_MTU); if (mask & IFCAP_RXCSUM) if_togglecapenable(ifp, IFCAP_RXCSUM); if (mask & IFCAP_TXCSUM) if_togglecapenable(ifp, IFCAP_TXCSUM); if ((if_getcapenable(ifp) & IFCAP_TXCSUM) != 0) if_sethwassistbits(ifp, CSUM_IP | CSUM_UDP | CSUM_TCP, 0); else if_sethwassistbits(ifp, 0, CSUM_IP | CSUM_UDP | CSUM_TCP); break; default: error = ether_ioctl(ifp, cmd, data); break; } return (error); } static uint32_t syscon_read_emac_clk_reg(device_t dev) { struct awg_softc *sc; sc = device_get_softc(dev); if (sc->syscon != NULL) return (SYSCON_READ_4(sc->syscon, EMAC_CLK_REG)); else if (sc->res[_RES_SYSCON] != NULL) return (bus_read_4(sc->res[_RES_SYSCON], 0)); return (0); } static void syscon_write_emac_clk_reg(device_t dev, uint32_t val) { struct awg_softc *sc; sc = device_get_softc(dev); if (sc->syscon != NULL) SYSCON_WRITE_4(sc->syscon, EMAC_CLK_REG, val); else if (sc->res[_RES_SYSCON] != NULL) bus_write_4(sc->res[_RES_SYSCON], 0, val); } static phandle_t 
awg_get_phy_node(device_t dev) { phandle_t node; pcell_t phy_handle; node = ofw_bus_get_node(dev); if (OF_getencprop(node, "phy-handle", (void *)&phy_handle, sizeof(phy_handle)) <= 0) return (0); return (OF_node_from_xref(phy_handle)); } static bool awg_has_internal_phy(device_t dev) { phandle_t node, phy_node; node = ofw_bus_get_node(dev); /* Legacy binding */ if (OF_hasprop(node, "allwinner,use-internal-phy")) return (true); phy_node = awg_get_phy_node(dev); return (phy_node != 0 && ofw_bus_node_is_compatible(OF_parent(phy_node), "allwinner,sun8i-h3-mdio-internal") != 0); } static int awg_parse_delay(device_t dev, uint32_t *tx_delay, uint32_t *rx_delay) { phandle_t node; uint32_t delay; if (tx_delay == NULL || rx_delay == NULL) return (EINVAL); *tx_delay = *rx_delay = 0; node = ofw_bus_get_node(dev); if (OF_getencprop(node, "tx-delay", &delay, sizeof(delay)) >= 0) *tx_delay = delay; else if (OF_getencprop(node, "allwinner,tx-delay-ps", &delay, sizeof(delay)) >= 0) { if ((delay % 100) != 0) { device_printf(dev, "tx-delay-ps is not a multiple of 100\n"); return (EDOM); } *tx_delay = delay / 100; } if (*tx_delay > 7) { device_printf(dev, "tx-delay out of range\n"); return (ERANGE); } if (OF_getencprop(node, "rx-delay", &delay, sizeof(delay)) >= 0) *rx_delay = delay; else if (OF_getencprop(node, "allwinner,rx-delay-ps", &delay, sizeof(delay)) >= 0) { if ((delay % 100) != 0) { device_printf(dev, "rx-delay-ps is not within documented domain\n"); return (EDOM); } *rx_delay = delay / 100; } if (*rx_delay > 31) { device_printf(dev, "rx-delay out of range\n"); return (ERANGE); } return (0); } static int awg_setup_phy(device_t dev) { struct awg_softc *sc; clk_t clk_tx, clk_tx_parent; const char *tx_parent_name; char *phy_type; phandle_t node; uint32_t reg, tx_delay, rx_delay; int error; bool use_syscon; sc = device_get_softc(dev); node = ofw_bus_get_node(dev); use_syscon = false; if (OF_getprop_alloc(node, "phy-mode", (void **)&phy_type) == 0) return (0); if (sc->syscon != NULL || sc->res[_RES_SYSCON] != NULL) use_syscon = true; if (bootverbose) device_printf(dev, "PHY type: %s, conf mode: %s\n", phy_type, use_syscon ? "reg" : "clk"); if (use_syscon) { /* * Abstract away writing to syscon for devices like the pine64. * For the pine64, we get dtb from U-Boot and it still uses the * legacy setup of specifying syscon register in emac node * rather than as its own node and using an xref in emac. * These abstractions can go away once U-Boot dts is up-to-date. */ reg = syscon_read_emac_clk_reg(dev); reg &= ~(EMAC_CLK_PIT | EMAC_CLK_SRC | EMAC_CLK_RMII_EN); if (strncmp(phy_type, "rgmii", 5) == 0) reg |= EMAC_CLK_PIT_RGMII | EMAC_CLK_SRC_RGMII; else if (strcmp(phy_type, "rmii") == 0) reg |= EMAC_CLK_RMII_EN; else reg |= EMAC_CLK_PIT_MII | EMAC_CLK_SRC_MII; /* * Fail attach if we fail to parse either of the delay * parameters. If we don't have the proper delay to write to * syscon, then awg likely won't function properly anyways. * Lack of delay is not an error! */ error = awg_parse_delay(dev, &tx_delay, &rx_delay); if (error != 0) goto fail; /* Default to 0 and we'll increase it if we need to. 
*/ reg &= ~(EMAC_CLK_ETXDC | EMAC_CLK_ERXDC); if (tx_delay > 0) reg |= (tx_delay << EMAC_CLK_ETXDC_SHIFT); if (rx_delay > 0) reg |= (rx_delay << EMAC_CLK_ERXDC_SHIFT); if (sc->type == EMAC_H3) { if (awg_has_internal_phy(dev)) { reg |= EMAC_CLK_EPHY_SELECT; reg &= ~EMAC_CLK_EPHY_SHUTDOWN; if (OF_hasprop(node, "allwinner,leds-active-low")) reg |= EMAC_CLK_EPHY_LED_POL; else reg &= ~EMAC_CLK_EPHY_LED_POL; /* Set internal PHY addr to 1 */ reg &= ~EMAC_CLK_EPHY_ADDR; reg |= (1 << EMAC_CLK_EPHY_ADDR_SHIFT); } else { reg &= ~EMAC_CLK_EPHY_SELECT; } } if (bootverbose) device_printf(dev, "EMAC clock: 0x%08x\n", reg); syscon_write_emac_clk_reg(dev, reg); } else { if (strncmp(phy_type, "rgmii", 5) == 0) tx_parent_name = "emac_int_tx"; else tx_parent_name = "mii_phy_tx"; /* Get the TX clock */ error = clk_get_by_ofw_name(dev, 0, "tx", &clk_tx); if (error != 0) { device_printf(dev, "cannot get tx clock\n"); goto fail; } /* Find the desired parent clock based on phy-mode property */ error = clk_get_by_name(dev, tx_parent_name, &clk_tx_parent); if (error != 0) { device_printf(dev, "cannot get clock '%s'\n", tx_parent_name); goto fail; } /* Set TX clock parent */ error = clk_set_parent_by_clk(clk_tx, clk_tx_parent); if (error != 0) { device_printf(dev, "cannot set tx clock parent\n"); goto fail; } /* Enable TX clock */ error = clk_enable(clk_tx); if (error != 0) { device_printf(dev, "cannot enable tx clock\n"); goto fail; } } error = 0; fail: OF_prop_free(phy_type); return (error); } static int awg_setup_extres(device_t dev) { struct awg_softc *sc; phandle_t node, phy_node; hwreset_t rst_ahb, rst_ephy; clk_t clk_ahb, clk_ephy; regulator_t reg; uint64_t freq; int error, div; sc = device_get_softc(dev); rst_ahb = rst_ephy = NULL; clk_ahb = clk_ephy = NULL; reg = NULL; node = ofw_bus_get_node(dev); phy_node = awg_get_phy_node(dev); if (phy_node == 0 && OF_hasprop(node, "phy-handle")) { error = ENXIO; device_printf(dev, "cannot get phy handle\n"); goto fail; } /* Get AHB clock and reset resources */ error = hwreset_get_by_ofw_name(dev, 0, "stmmaceth", &rst_ahb); if (error != 0) error = hwreset_get_by_ofw_name(dev, 0, "ahb", &rst_ahb); if (error != 0) { device_printf(dev, "cannot get ahb reset\n"); goto fail; } if (hwreset_get_by_ofw_name(dev, 0, "ephy", &rst_ephy) != 0) if (phy_node == 0 || hwreset_get_by_ofw_idx(dev, phy_node, 0, &rst_ephy) != 0) rst_ephy = NULL; error = clk_get_by_ofw_name(dev, 0, "stmmaceth", &clk_ahb); if (error != 0) error = clk_get_by_ofw_name(dev, 0, "ahb", &clk_ahb); if (error != 0) { device_printf(dev, "cannot get ahb clock\n"); goto fail; } if (clk_get_by_ofw_name(dev, 0, "ephy", &clk_ephy) != 0) if (phy_node == 0 || clk_get_by_ofw_index(dev, phy_node, 0, &clk_ephy) != 0) clk_ephy = NULL; if (OF_hasprop(node, "syscon") && syscon_get_by_ofw_property(dev, node, "syscon", &sc->syscon) != 0) { device_printf(dev, "cannot get syscon driver handle\n"); goto fail; } /* Configure PHY for MII or RGMII mode */ if (awg_setup_phy(dev) != 0) goto fail; /* Enable clocks */ error = clk_enable(clk_ahb); if (error != 0) { device_printf(dev, "cannot enable ahb clock\n"); goto fail; } if (clk_ephy != NULL) { error = clk_enable(clk_ephy); if (error != 0) { device_printf(dev, "cannot enable ephy clock\n"); goto fail; } } /* De-assert reset */ error = hwreset_deassert(rst_ahb); if (error != 0) { device_printf(dev, "cannot de-assert ahb reset\n"); goto fail; } if (rst_ephy != NULL) { /* * The ephy reset is left de-asserted by U-Boot. 
Assert it * here to make sure that we're in a known good state going * into the PHY reset. */ hwreset_assert(rst_ephy); error = hwreset_deassert(rst_ephy); if (error != 0) { device_printf(dev, "cannot de-assert ephy reset\n"); goto fail; } } /* Enable PHY regulator if applicable */ if (regulator_get_by_ofw_property(dev, 0, "phy-supply", ®) == 0) { error = regulator_enable(reg); if (error != 0) { device_printf(dev, "cannot enable PHY regulator\n"); goto fail; } } /* Determine MDC clock divide ratio based on AHB clock */ error = clk_get_freq(clk_ahb, &freq); if (error != 0) { device_printf(dev, "cannot get AHB clock frequency\n"); goto fail; } div = freq / MDIO_FREQ; if (div <= 16) sc->mdc_div_ratio_m = MDC_DIV_RATIO_M_16; else if (div <= 32) sc->mdc_div_ratio_m = MDC_DIV_RATIO_M_32; else if (div <= 64) sc->mdc_div_ratio_m = MDC_DIV_RATIO_M_64; else if (div <= 128) sc->mdc_div_ratio_m = MDC_DIV_RATIO_M_128; else { device_printf(dev, "cannot determine MDC clock divide ratio\n"); error = ENXIO; goto fail; } if (bootverbose) device_printf(dev, "AHB frequency %ju Hz, MDC div: 0x%x\n", (uintmax_t)freq, sc->mdc_div_ratio_m); return (0); fail: if (reg != NULL) regulator_release(reg); if (clk_ephy != NULL) clk_release(clk_ephy); if (clk_ahb != NULL) clk_release(clk_ahb); if (rst_ephy != NULL) hwreset_release(rst_ephy); if (rst_ahb != NULL) hwreset_release(rst_ahb); return (error); } static void awg_get_eaddr(device_t dev, uint8_t *eaddr) { struct awg_softc *sc; uint32_t maclo, machi, rnd; u_char rootkey[16]; uint32_t rootkey_size; sc = device_get_softc(dev); machi = RD4(sc, EMAC_ADDR_HIGH(0)) & 0xffff; maclo = RD4(sc, EMAC_ADDR_LOW(0)); rootkey_size = sizeof(rootkey); if (maclo == 0xffffffff && machi == 0xffff) { /* MAC address in hardware is invalid, create one */ if (aw_sid_get_fuse(AW_SID_FUSE_ROOTKEY, rootkey, &rootkey_size) == 0 && (rootkey[3] | rootkey[12] | rootkey[13] | rootkey[14] | rootkey[15]) != 0) { /* MAC address is derived from the root key in SID */ maclo = (rootkey[13] << 24) | (rootkey[12] << 16) | (rootkey[3] << 8) | 0x02; machi = (rootkey[15] << 8) | rootkey[14]; } else { /* Create one */ rnd = arc4random(); maclo = 0x00f2 | (rnd & 0xffff0000); machi = rnd & 0xffff; } } eaddr[0] = maclo & 0xff; eaddr[1] = (maclo >> 8) & 0xff; eaddr[2] = (maclo >> 16) & 0xff; eaddr[3] = (maclo >> 24) & 0xff; eaddr[4] = machi & 0xff; eaddr[5] = (machi >> 8) & 0xff; } #ifdef AWG_DEBUG static void awg_dump_regs(device_t dev) { static const struct { const char *name; u_int reg; } regs[] = { { "BASIC_CTL_0", EMAC_BASIC_CTL_0 }, { "BASIC_CTL_1", EMAC_BASIC_CTL_1 }, { "INT_STA", EMAC_INT_STA }, { "INT_EN", EMAC_INT_EN }, { "TX_CTL_0", EMAC_TX_CTL_0 }, { "TX_CTL_1", EMAC_TX_CTL_1 }, { "TX_FLOW_CTL", EMAC_TX_FLOW_CTL }, { "TX_DMA_LIST", EMAC_TX_DMA_LIST }, { "RX_CTL_0", EMAC_RX_CTL_0 }, { "RX_CTL_1", EMAC_RX_CTL_1 }, { "RX_DMA_LIST", EMAC_RX_DMA_LIST }, { "RX_FRM_FLT", EMAC_RX_FRM_FLT }, { "RX_HASH_0", EMAC_RX_HASH_0 }, { "RX_HASH_1", EMAC_RX_HASH_1 }, { "MII_CMD", EMAC_MII_CMD }, { "ADDR_HIGH0", EMAC_ADDR_HIGH(0) }, { "ADDR_LOW0", EMAC_ADDR_LOW(0) }, { "TX_DMA_STA", EMAC_TX_DMA_STA }, { "TX_DMA_CUR_DESC", EMAC_TX_DMA_CUR_DESC }, { "TX_DMA_CUR_BUF", EMAC_TX_DMA_CUR_BUF }, { "RX_DMA_STA", EMAC_RX_DMA_STA }, { "RX_DMA_CUR_DESC", EMAC_RX_DMA_CUR_DESC }, { "RX_DMA_CUR_BUF", EMAC_RX_DMA_CUR_BUF }, { "RGMII_STA", EMAC_RGMII_STA }, }; struct awg_softc *sc; unsigned int n; sc = device_get_softc(dev); for (n = 0; n < nitems(regs); n++) device_printf(dev, " %-20s %08x\n", regs[n].name, RD4(sc, regs[n].reg)); } 
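/*
 * Worked example for the MDC divider selection in awg_setup_extres() above
 * (illustrative only, and assuming MDIO_FREQ is the usual 2.5 MHz MDC
 * target): a 300 MHz AHB clock gives div = 300000000 / 2500000 = 120, which
 * falls into the "<= 128" bucket, so MDC_DIV_RATIO_M_128 is programmed.
 */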
#endif #define GPIO_ACTIVE_LOW 1 static int awg_phy_reset(device_t dev) { pcell_t gpio_prop[4], delay_prop[3]; phandle_t node, gpio_node; device_t gpio; uint32_t pin, flags; uint32_t pin_value; node = ofw_bus_get_node(dev); if (OF_getencprop(node, "allwinner,reset-gpio", gpio_prop, sizeof(gpio_prop)) <= 0) return (0); if (OF_getencprop(node, "allwinner,reset-delays-us", delay_prop, sizeof(delay_prop)) <= 0) return (ENXIO); gpio_node = OF_node_from_xref(gpio_prop[0]); if ((gpio = OF_device_from_xref(gpio_prop[0])) == NULL) return (ENXIO); if (GPIO_MAP_GPIOS(gpio, node, gpio_node, nitems(gpio_prop) - 1, gpio_prop + 1, &pin, &flags) != 0) return (ENXIO); pin_value = GPIO_PIN_LOW; if (OF_hasprop(node, "allwinner,reset-active-low")) pin_value = GPIO_PIN_HIGH; if (flags & GPIO_ACTIVE_LOW) pin_value = !pin_value; GPIO_PIN_SETFLAGS(gpio, pin, GPIO_PIN_OUTPUT); GPIO_PIN_SET(gpio, pin, pin_value); DELAY(delay_prop[0]); GPIO_PIN_SET(gpio, pin, !pin_value); DELAY(delay_prop[1]); GPIO_PIN_SET(gpio, pin, pin_value); DELAY(delay_prop[2]); return (0); } static int awg_reset(device_t dev) { struct awg_softc *sc; int retry; sc = device_get_softc(dev); /* Reset PHY if necessary */ if (awg_phy_reset(dev) != 0) { device_printf(dev, "failed to reset PHY\n"); return (ENXIO); } /* Soft reset all registers and logic */ WR4(sc, EMAC_BASIC_CTL_1, BASIC_CTL_SOFT_RST); /* Wait for soft reset bit to self-clear */ for (retry = SOFT_RST_RETRY; retry > 0; retry--) { if ((RD4(sc, EMAC_BASIC_CTL_1) & BASIC_CTL_SOFT_RST) == 0) break; DELAY(10); } if (retry == 0) { device_printf(dev, "soft reset timed out\n"); #ifdef AWG_DEBUG awg_dump_regs(dev); #endif return (ETIMEDOUT); } return (0); } static void awg_dmamap_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error) { if (error != 0) return; *(bus_addr_t *)arg = segs[0].ds_addr; } static int awg_setup_dma(device_t dev) { struct awg_softc *sc; int error, i; sc = device_get_softc(dev); /* Setup TX ring */ error = bus_dma_tag_create( bus_get_dma_tag(dev), /* Parent tag */ DESC_ALIGN, 0, /* alignment, boundary */ BUS_SPACE_MAXADDR_32BIT, /* lowaddr */ BUS_SPACE_MAXADDR, /* highaddr */ NULL, NULL, /* filter, filterarg */ TX_DESC_SIZE, 1, /* maxsize, nsegs */ TX_DESC_SIZE, /* maxsegsize */ 0, /* flags */ NULL, NULL, /* lockfunc, lockarg */ &sc->tx.desc_tag); if (error != 0) { device_printf(dev, "cannot create TX descriptor ring tag\n"); return (error); } error = bus_dmamem_alloc(sc->tx.desc_tag, (void **)&sc->tx.desc_ring, BUS_DMA_COHERENT | BUS_DMA_WAITOK | BUS_DMA_ZERO, &sc->tx.desc_map); if (error != 0) { device_printf(dev, "cannot allocate TX descriptor ring\n"); return (error); } error = bus_dmamap_load(sc->tx.desc_tag, sc->tx.desc_map, sc->tx.desc_ring, TX_DESC_SIZE, awg_dmamap_cb, &sc->tx.desc_ring_paddr, 0); if (error != 0) { device_printf(dev, "cannot load TX descriptor ring\n"); return (error); } for (i = 0; i < TX_DESC_COUNT; i++) sc->tx.desc_ring[i].next = htole32(sc->tx.desc_ring_paddr + DESC_OFF(TX_NEXT(i))); error = bus_dma_tag_create( bus_get_dma_tag(dev), /* Parent tag */ 1, 0, /* alignment, boundary */ BUS_SPACE_MAXADDR_32BIT, /* lowaddr */ BUS_SPACE_MAXADDR, /* highaddr */ NULL, NULL, /* filter, filterarg */ MCLBYTES, TX_MAX_SEGS, /* maxsize, nsegs */ MCLBYTES, /* maxsegsize */ 0, /* flags */ NULL, NULL, /* lockfunc, lockarg */ &sc->tx.buf_tag); if (error != 0) { device_printf(dev, "cannot create TX buffer tag\n"); return (error); } sc->tx.queued = 0; for (i = 0; i < TX_DESC_COUNT; i++) { error = bus_dmamap_create(sc->tx.buf_tag, 0, &sc->tx.buf_map[i].map); 
if (error != 0) { device_printf(dev, "cannot create TX buffer map\n"); return (error); } } /* Setup RX ring */ error = bus_dma_tag_create( bus_get_dma_tag(dev), /* Parent tag */ DESC_ALIGN, 0, /* alignment, boundary */ BUS_SPACE_MAXADDR_32BIT, /* lowaddr */ BUS_SPACE_MAXADDR, /* highaddr */ NULL, NULL, /* filter, filterarg */ RX_DESC_SIZE, 1, /* maxsize, nsegs */ RX_DESC_SIZE, /* maxsegsize */ 0, /* flags */ NULL, NULL, /* lockfunc, lockarg */ &sc->rx.desc_tag); if (error != 0) { device_printf(dev, "cannot create RX descriptor ring tag\n"); return (error); } error = bus_dmamem_alloc(sc->rx.desc_tag, (void **)&sc->rx.desc_ring, BUS_DMA_COHERENT | BUS_DMA_WAITOK | BUS_DMA_ZERO, &sc->rx.desc_map); if (error != 0) { device_printf(dev, "cannot allocate RX descriptor ring\n"); return (error); } error = bus_dmamap_load(sc->rx.desc_tag, sc->rx.desc_map, sc->rx.desc_ring, RX_DESC_SIZE, awg_dmamap_cb, &sc->rx.desc_ring_paddr, 0); if (error != 0) { device_printf(dev, "cannot load RX descriptor ring\n"); return (error); } error = bus_dma_tag_create( bus_get_dma_tag(dev), /* Parent tag */ 1, 0, /* alignment, boundary */ BUS_SPACE_MAXADDR_32BIT, /* lowaddr */ BUS_SPACE_MAXADDR, /* highaddr */ NULL, NULL, /* filter, filterarg */ MCLBYTES, 1, /* maxsize, nsegs */ MCLBYTES, /* maxsegsize */ 0, /* flags */ NULL, NULL, /* lockfunc, lockarg */ &sc->rx.buf_tag); if (error != 0) { device_printf(dev, "cannot create RX buffer tag\n"); return (error); } error = bus_dmamap_create(sc->rx.buf_tag, 0, &sc->rx.buf_spare_map); if (error != 0) { device_printf(dev, "cannot create RX buffer spare map\n"); return (error); } for (i = 0; i < RX_DESC_COUNT; i++) { sc->rx.desc_ring[i].next = htole32(sc->rx.desc_ring_paddr + DESC_OFF(RX_NEXT(i))); error = bus_dmamap_create(sc->rx.buf_tag, 0, &sc->rx.buf_map[i].map); if (error != 0) { device_printf(dev, "cannot create RX buffer map\n"); return (error); } sc->rx.buf_map[i].mbuf = NULL; error = awg_newbuf_rx(sc, i); if (error != 0) { device_printf(dev, "cannot create RX buffer\n"); return (error); } } bus_dmamap_sync(sc->rx.desc_tag, sc->rx.desc_map, BUS_DMASYNC_PREWRITE); /* Write transmit and receive descriptor base address registers */ WR4(sc, EMAC_TX_DMA_LIST, sc->tx.desc_ring_paddr); WR4(sc, EMAC_RX_DMA_LIST, sc->rx.desc_ring_paddr); return (0); } static int awg_probe(device_t dev) { if (!ofw_bus_status_okay(dev)) return (ENXIO); if (ofw_bus_search_compatible(dev, compat_data)->ocd_data == 0) return (ENXIO); device_set_desc(dev, "Allwinner Gigabit Ethernet"); return (BUS_PROBE_DEFAULT); } static int awg_attach(device_t dev) { uint8_t eaddr[ETHER_ADDR_LEN]; struct awg_softc *sc; int error; sc = device_get_softc(dev); sc->dev = dev; sc->type = ofw_bus_search_compatible(dev, compat_data)->ocd_data; if (bus_alloc_resources(dev, awg_spec, sc->res) != 0) { device_printf(dev, "cannot allocate resources for device\n"); return (ENXIO); } mtx_init(&sc->mtx, device_get_nameunit(dev), MTX_NETWORK_LOCK, MTX_DEF); callout_init_mtx(&sc->stat_ch, &sc->mtx, 0); TASK_INIT(&sc->link_task, 0, awg_link_task, sc); /* Setup clocks and regulators */ error = awg_setup_extres(dev); if (error != 0) return (error); /* Read MAC address before resetting the chip */ awg_get_eaddr(dev, eaddr); /* Soft reset EMAC core */ error = awg_reset(dev); if (error != 0) return (error); /* Setup DMA descriptors */ error = awg_setup_dma(dev); if (error != 0) return (error); /* Install interrupt handler */ error = bus_setup_intr(dev, sc->res[_RES_IRQ], INTR_TYPE_NET | INTR_MPSAFE, NULL, awg_intr, sc, &sc->ih); if (error 
!= 0) { device_printf(dev, "cannot setup interrupt handler\n"); return (error); } /* Setup ethernet interface */ sc->ifp = if_alloc(IFT_ETHER); if_setsoftc(sc->ifp, sc); if_initname(sc->ifp, device_get_name(dev), device_get_unit(dev)); if_setflags(sc->ifp, IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST); if_setstartfn(sc->ifp, awg_start); if_setioctlfn(sc->ifp, awg_ioctl); if_setinitfn(sc->ifp, awg_init); if_setsendqlen(sc->ifp, TX_DESC_COUNT - 1); if_setsendqready(sc->ifp); if_sethwassist(sc->ifp, CSUM_IP | CSUM_UDP | CSUM_TCP); if_setcapabilities(sc->ifp, IFCAP_VLAN_MTU | IFCAP_HWCSUM); if_setcapenable(sc->ifp, if_getcapabilities(sc->ifp)); #ifdef DEVICE_POLLING if_setcapabilitiesbit(sc->ifp, IFCAP_POLLING, 0); #endif /* Attach MII driver */ error = mii_attach(dev, &sc->miibus, sc->ifp, awg_media_change, awg_media_status, BMSR_DEFCAPMASK, MII_PHY_ANY, MII_OFFSET_ANY, MIIF_DOPAUSE); if (error != 0) { device_printf(dev, "cannot attach PHY\n"); return (error); } /* Attach ethernet interface */ ether_ifattach(sc->ifp, eaddr); return (0); } static device_method_t awg_methods[] = { /* Device interface */ DEVMETHOD(device_probe, awg_probe), DEVMETHOD(device_attach, awg_attach), /* MII interface */ DEVMETHOD(miibus_readreg, awg_miibus_readreg), DEVMETHOD(miibus_writereg, awg_miibus_writereg), DEVMETHOD(miibus_statchg, awg_miibus_statchg), DEVMETHOD_END }; static driver_t awg_driver = { "awg", awg_methods, sizeof(struct awg_softc), }; static devclass_t awg_devclass; DRIVER_MODULE(awg, simplebus, awg_driver, awg_devclass, 0, 0); DRIVER_MODULE(miibus, awg, miibus_driver, miibus_devclass, 0, 0); - MODULE_DEPEND(awg, ether, 1, 1, 1); MODULE_DEPEND(awg, miibus, 1, 1, 1); +MODULE_DEPEND(awg, aw_sid, 1, 1, 1); +SIMPLEBUS_PNP_INFO(compat_data); Index: user/ngie/bug-237403/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c =================================================================== --- user/ngie/bug-237403/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c (revision 346925) +++ user/ngie/bug-237403/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c (revision 346926) @@ -1,1364 +1,1366 @@ /* * CDDL HEADER START * * The contents of this file are subject to the terms of the * Common Development and Distribution License (the "License"). * You may not use this file except in compliance with the License. * * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE * or http://www.opensolaris.org/os/licensing. * See the License for the specific language governing permissions * and limitations under the License. * * When distributing Covered Code, include this CDDL HEADER in each * file and include the License file at usr/src/OPENSOLARIS.LICENSE. * If applicable, add the following below this CDDL HEADER, with the * fields enclosed by brackets "[]" replaced with your own identifying * information: Portions Copyright [yyyy] [name of copyright owner] * * CDDL HEADER END */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2011, 2017 by Delphix. All rights reserved. * Copyright (c) 2013 Steven Hartland. All rights reserved. * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved. * Copyright (c) 2014 Integros [integros.com] * Copyright 2016 Nexenta Systems, Inc. All rights reserved. 
*/ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #if defined(__FreeBSD__) && defined(_KERNEL) #include #include #endif /* * ZFS Write Throttle * ------------------ * * ZFS must limit the rate of incoming writes to the rate at which it is able * to sync data modifications to the backend storage. Throttling by too much * creates an artificial limit; throttling by too little can only be sustained * for short periods and would lead to highly lumpy performance. On a per-pool * basis, ZFS tracks the amount of modified (dirty) data. As operations change * data, the amount of dirty data increases; as ZFS syncs out data, the amount * of dirty data decreases. When the amount of dirty data exceeds a * predetermined threshold further modifications are blocked until the amount * of dirty data decreases (as data is synced out). * * The limit on dirty data is tunable, and should be adjusted according to * both the IO capacity and available memory of the system. The larger the * window, the more ZFS is able to aggregate and amortize metadata (and data) * changes. However, memory is a limited resource, and allowing for more dirty * data comes at the cost of keeping other useful data in memory (for example * ZFS data cached by the ARC). * * Implementation * * As buffers are modified dsl_pool_willuse_space() increments both the per- * txg (dp_dirty_pertxg[]) and poolwide (dp_dirty_total) accounting of * dirty space used; dsl_pool_dirty_space() decrements those values as data * is synced out from dsl_pool_sync(). While only the poolwide value is * relevant, the per-txg value is useful for debugging. The tunable * zfs_dirty_data_max determines the dirty space limit. Once that value is * exceeded, new writes are halted until space frees up. * * The zfs_dirty_data_sync tunable dictates the threshold at which we * ensure that there is a txg syncing (see the comment in txg.c for a full * description of transaction group stages). * * The IO scheduler uses both the dirty space limit and current amount of * dirty data as inputs. Those values affect the number of concurrent IOs ZFS * issues. See the comment in vdev_queue.c for details of the IO scheduler. * * The delay is also calculated based on the amount of dirty data. See the * comment above dmu_tx_delay() for details. */ /* * zfs_dirty_data_max will be set to zfs_dirty_data_max_percent% of all memory, * capped at zfs_dirty_data_max_max. It can also be overridden in /etc/system. */ uint64_t zfs_dirty_data_max; uint64_t zfs_dirty_data_max_max = 4ULL * 1024 * 1024 * 1024; int zfs_dirty_data_max_percent = 10; /* * If there is at least this much dirty data, push out a txg. */ uint64_t zfs_dirty_data_sync = 64 * 1024 * 1024; /* * Once there is this amount of dirty data, the dmu_tx_delay() will kick in * and delay each transaction. * This value should be >= zfs_vdev_async_write_active_max_dirty_percent. */ int zfs_delay_min_dirty_percent = 60; /* * This controls how quickly the delay approaches infinity. * Larger values cause it to delay more for a given amount of dirty data. * Therefore larger values will cause there to be less dirty data for a * given throughput. * * For the smoothest delay, this value should be about 1 billion divided * by the maximum number of operations per second. This will smoothly * handle between 10x and 1/10th this number. 
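*
* As a worked example, the default initializer below targets roughly 2000
* operations per second: 1,000,000,000 / 2000 = 500,000.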
* * Note: zfs_delay_scale * zfs_dirty_data_max must be < 2^64, due to the * multiply in dmu_tx_delay(). */ uint64_t zfs_delay_scale = 1000 * 1000 * 1000 / 2000; /* * This determines the number of threads used by the dp_sync_taskq. */ int zfs_sync_taskq_batch_pct = 75; /* * These tunables determine the behavior of how zil_itxg_clean() is * called via zil_clean() in the context of spa_sync(). When an itxg * list needs to be cleaned, TQ_NOSLEEP will be used when dispatching. * If the dispatch fails, the call to zil_itxg_clean() will occur * synchronously in the context of spa_sync(), which can negatively * impact the performance of spa_sync() (e.g. in the case of the itxg * list having a large number of itxs that needs to be cleaned). * * Thus, these tunables can be used to manipulate the behavior of the * taskq used by zil_clean(); they determine the number of taskq entries * that are pre-populated when the taskq is first created (via the * "zfs_zil_clean_taskq_minalloc" tunable) and the maximum number of * taskq entries that are cached after an on-demand allocation (via the * "zfs_zil_clean_taskq_maxalloc"). * * The idea being, we want to try reasonably hard to ensure there will * already be a taskq entry pre-allocated by the time that it is needed * by zil_clean(). This way, we can avoid the possibility of an * on-demand allocation of a new taskq entry from failing, which would * result in zil_itxg_clean() being called synchronously from zil_clean() * (which can adversely affect performance of spa_sync()). * * Additionally, the number of threads used by the taskq can be * configured via the "zfs_zil_clean_taskq_nthr_pct" tunable. */ int zfs_zil_clean_taskq_nthr_pct = 100; int zfs_zil_clean_taskq_minalloc = 1024; int zfs_zil_clean_taskq_maxalloc = 1024 * 1024; #if defined(__FreeBSD__) && defined(_KERNEL) extern int zfs_vdev_async_write_active_max_dirty_percent; SYSCTL_DECL(_vfs_zfs); SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, dirty_data_max, CTLFLAG_RWTUN, &zfs_dirty_data_max, 0, "The maximum amount of dirty data in bytes after which new writes are " "halted until space becomes available"); SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, dirty_data_max_max, CTLFLAG_RDTUN, &zfs_dirty_data_max_max, 0, "The absolute cap on dirty_data_max when auto calculating"); static int sysctl_zfs_dirty_data_max_percent(SYSCTL_HANDLER_ARGS); SYSCTL_PROC(_vfs_zfs, OID_AUTO, dirty_data_max_percent, CTLTYPE_INT | CTLFLAG_MPSAFE | CTLFLAG_RWTUN, 0, sizeof(int), sysctl_zfs_dirty_data_max_percent, "I", "The percent of physical memory used to auto calculate dirty_data_max"); SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, dirty_data_sync, CTLFLAG_RWTUN, &zfs_dirty_data_sync, 0, "Force a txg if the number of dirty buffer bytes exceed this value"); static int sysctl_zfs_delay_min_dirty_percent(SYSCTL_HANDLER_ARGS); /* No zfs_delay_min_dirty_percent tunable due to limit requirements */ SYSCTL_PROC(_vfs_zfs, OID_AUTO, delay_min_dirty_percent, CTLTYPE_INT | CTLFLAG_MPSAFE | CTLFLAG_RW, 0, sizeof(int), sysctl_zfs_delay_min_dirty_percent, "I", "The limit of outstanding dirty data before transactions are delayed"); static int sysctl_zfs_delay_scale(SYSCTL_HANDLER_ARGS); /* No zfs_delay_scale tunable due to limit requirements */ SYSCTL_PROC(_vfs_zfs, OID_AUTO, delay_scale, CTLTYPE_U64 | CTLFLAG_MPSAFE | CTLFLAG_RW, 0, sizeof(uint64_t), sysctl_zfs_delay_scale, "QU", "Controls how quickly the delay approaches infinity"); static int sysctl_zfs_dirty_data_max_percent(SYSCTL_HANDLER_ARGS) { int val, err; val = zfs_dirty_data_max_percent; err = sysctl_handle_int(oidp, 
&val, 0, req); if (err != 0 || req->newptr == NULL) return (err); if (val < 0 || val > 100) return (EINVAL); zfs_dirty_data_max_percent = val; return (0); } static int sysctl_zfs_delay_min_dirty_percent(SYSCTL_HANDLER_ARGS) { int val, err; val = zfs_delay_min_dirty_percent; err = sysctl_handle_int(oidp, &val, 0, req); if (err != 0 || req->newptr == NULL) return (err); if (val < zfs_vdev_async_write_active_max_dirty_percent) return (EINVAL); zfs_delay_min_dirty_percent = val; return (0); } static int sysctl_zfs_delay_scale(SYSCTL_HANDLER_ARGS) { uint64_t val; int err; val = zfs_delay_scale; err = sysctl_handle_64(oidp, &val, 0, req); if (err != 0 || req->newptr == NULL) return (err); if (val > UINT64_MAX / zfs_dirty_data_max) return (EINVAL); zfs_delay_scale = val; return (0); } #endif int dsl_pool_open_special_dir(dsl_pool_t *dp, const char *name, dsl_dir_t **ddp) { uint64_t obj; int err; err = zap_lookup(dp->dp_meta_objset, dsl_dir_phys(dp->dp_root_dir)->dd_child_dir_zapobj, name, sizeof (obj), 1, &obj); if (err) return (err); return (dsl_dir_hold_obj(dp, obj, name, dp, ddp)); } static dsl_pool_t * dsl_pool_open_impl(spa_t *spa, uint64_t txg) { dsl_pool_t *dp; blkptr_t *bp = spa_get_rootblkptr(spa); dp = kmem_zalloc(sizeof (dsl_pool_t), KM_SLEEP); dp->dp_spa = spa; dp->dp_meta_rootbp = *bp; rrw_init(&dp->dp_config_rwlock, B_TRUE); txg_init(dp, txg); txg_list_create(&dp->dp_dirty_datasets, spa, offsetof(dsl_dataset_t, ds_dirty_link)); txg_list_create(&dp->dp_dirty_zilogs, spa, offsetof(zilog_t, zl_dirty_link)); txg_list_create(&dp->dp_dirty_dirs, spa, offsetof(dsl_dir_t, dd_dirty_link)); txg_list_create(&dp->dp_sync_tasks, spa, offsetof(dsl_sync_task_t, dst_node)); txg_list_create(&dp->dp_early_sync_tasks, spa, offsetof(dsl_sync_task_t, dst_node)); dp->dp_sync_taskq = taskq_create("dp_sync_taskq", zfs_sync_taskq_batch_pct, minclsyspri, 1, INT_MAX, TASKQ_THREADS_CPU_PCT); dp->dp_zil_clean_taskq = taskq_create("dp_zil_clean_taskq", zfs_zil_clean_taskq_nthr_pct, minclsyspri, zfs_zil_clean_taskq_minalloc, zfs_zil_clean_taskq_maxalloc, TASKQ_PREPOPULATE | TASKQ_THREADS_CPU_PCT); mutex_init(&dp->dp_lock, NULL, MUTEX_DEFAULT, NULL); cv_init(&dp->dp_spaceavail_cv, NULL, CV_DEFAULT, NULL); dp->dp_vnrele_taskq = taskq_create("zfs_vn_rele_taskq", 1, minclsyspri, 1, 4, 0); return (dp); } int dsl_pool_init(spa_t *spa, uint64_t txg, dsl_pool_t **dpp) { int err; dsl_pool_t *dp = dsl_pool_open_impl(spa, txg); err = dmu_objset_open_impl(spa, NULL, &dp->dp_meta_rootbp, &dp->dp_meta_objset); if (err != 0) dsl_pool_close(dp); else *dpp = dp; return (err); } int dsl_pool_open(dsl_pool_t *dp) { int err; dsl_dir_t *dd; dsl_dataset_t *ds; uint64_t obj; rrw_enter(&dp->dp_config_rwlock, RW_WRITER, FTAG); err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_ROOT_DATASET, sizeof (uint64_t), 1, &dp->dp_root_dir_obj); if (err) goto out; err = dsl_dir_hold_obj(dp, dp->dp_root_dir_obj, NULL, dp, &dp->dp_root_dir); if (err) goto out; err = dsl_pool_open_special_dir(dp, MOS_DIR_NAME, &dp->dp_mos_dir); if (err) goto out; if (spa_version(dp->dp_spa) >= SPA_VERSION_ORIGIN) { err = dsl_pool_open_special_dir(dp, ORIGIN_DIR_NAME, &dd); if (err) goto out; err = dsl_dataset_hold_obj(dp, dsl_dir_phys(dd)->dd_head_dataset_obj, FTAG, &ds); if (err == 0) { err = dsl_dataset_hold_obj(dp, dsl_dataset_phys(ds)->ds_prev_snap_obj, dp, &dp->dp_origin_snap); dsl_dataset_rele(ds, FTAG); } dsl_dir_rele(dd, dp); if (err) goto out; } if (spa_version(dp->dp_spa) >= SPA_VERSION_DEADLISTS) { err = dsl_pool_open_special_dir(dp, 
FREE_DIR_NAME, &dp->dp_free_dir); if (err) goto out; err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_FREE_BPOBJ, sizeof (uint64_t), 1, &obj); if (err) goto out; VERIFY0(bpobj_open(&dp->dp_free_bpobj, dp->dp_meta_objset, obj)); } if (spa_feature_is_active(dp->dp_spa, SPA_FEATURE_OBSOLETE_COUNTS)) { err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_OBSOLETE_BPOBJ, sizeof (uint64_t), 1, &obj); if (err == 0) { VERIFY0(bpobj_open(&dp->dp_obsolete_bpobj, dp->dp_meta_objset, obj)); } else if (err == ENOENT) { /* * We might not have created the remap bpobj yet. */ err = 0; } else { goto out; } } /* * Note: errors ignored, because the these special dirs, used for * space accounting, are only created on demand. */ (void) dsl_pool_open_special_dir(dp, LEAK_DIR_NAME, &dp->dp_leak_dir); if (spa_feature_is_active(dp->dp_spa, SPA_FEATURE_ASYNC_DESTROY)) { err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_BPTREE_OBJ, sizeof (uint64_t), 1, &dp->dp_bptree_obj); if (err != 0) goto out; } if (spa_feature_is_active(dp->dp_spa, SPA_FEATURE_EMPTY_BPOBJ)) { err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_EMPTY_BPOBJ, sizeof (uint64_t), 1, &dp->dp_empty_bpobj); if (err != 0) goto out; } err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_TMP_USERREFS, sizeof (uint64_t), 1, &dp->dp_tmp_userrefs_obj); if (err == ENOENT) err = 0; if (err) goto out; err = dsl_scan_init(dp, dp->dp_tx.tx_open_txg); out: rrw_exit(&dp->dp_config_rwlock, FTAG); return (err); } void dsl_pool_close(dsl_pool_t *dp) { /* * Drop our references from dsl_pool_open(). * * Since we held the origin_snap from "syncing" context (which * includes pool-opening context), it actually only got a "ref" * and not a hold, so just drop that here. */ if (dp->dp_origin_snap != NULL) dsl_dataset_rele(dp->dp_origin_snap, dp); if (dp->dp_mos_dir != NULL) dsl_dir_rele(dp->dp_mos_dir, dp); if (dp->dp_free_dir != NULL) dsl_dir_rele(dp->dp_free_dir, dp); if (dp->dp_leak_dir != NULL) dsl_dir_rele(dp->dp_leak_dir, dp); if (dp->dp_root_dir != NULL) dsl_dir_rele(dp->dp_root_dir, dp); bpobj_close(&dp->dp_free_bpobj); bpobj_close(&dp->dp_obsolete_bpobj); /* undo the dmu_objset_open_impl(mos) from dsl_pool_open() */ if (dp->dp_meta_objset != NULL) dmu_objset_evict(dp->dp_meta_objset); txg_list_destroy(&dp->dp_dirty_datasets); txg_list_destroy(&dp->dp_dirty_zilogs); txg_list_destroy(&dp->dp_sync_tasks); txg_list_destroy(&dp->dp_early_sync_tasks); txg_list_destroy(&dp->dp_dirty_dirs); taskq_destroy(dp->dp_zil_clean_taskq); taskq_destroy(dp->dp_sync_taskq); /* * We can't set retry to TRUE since we're explicitly specifying * a spa to flush. This is good enough; any missed buffers for * this spa won't cause trouble, and they'll eventually fall * out of the ARC just like any other unused buffer. */ arc_flush(dp->dp_spa, FALSE); txg_fini(dp); dsl_scan_fini(dp); dmu_buf_user_evict_wait(); rrw_destroy(&dp->dp_config_rwlock); mutex_destroy(&dp->dp_lock); taskq_destroy(dp->dp_vnrele_taskq); - if (dp->dp_blkstats != NULL) + if (dp->dp_blkstats != NULL) { + mutex_destroy(&dp->dp_blkstats->zab_lock); kmem_free(dp->dp_blkstats, sizeof (zfs_all_blkstats_t)); + } kmem_free(dp, sizeof (dsl_pool_t)); } void dsl_pool_create_obsolete_bpobj(dsl_pool_t *dp, dmu_tx_t *tx) { uint64_t obj; /* * Currently, we only create the obsolete_bpobj where there are * indirect vdevs with referenced mappings. 
*/ ASSERT(spa_feature_is_active(dp->dp_spa, SPA_FEATURE_DEVICE_REMOVAL)); /* create and open the obsolete_bpobj */ obj = bpobj_alloc(dp->dp_meta_objset, SPA_OLD_MAXBLOCKSIZE, tx); VERIFY0(bpobj_open(&dp->dp_obsolete_bpobj, dp->dp_meta_objset, obj)); VERIFY0(zap_add(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_OBSOLETE_BPOBJ, sizeof (uint64_t), 1, &obj, tx)); spa_feature_incr(dp->dp_spa, SPA_FEATURE_OBSOLETE_COUNTS, tx); } void dsl_pool_destroy_obsolete_bpobj(dsl_pool_t *dp, dmu_tx_t *tx) { spa_feature_decr(dp->dp_spa, SPA_FEATURE_OBSOLETE_COUNTS, tx); VERIFY0(zap_remove(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_OBSOLETE_BPOBJ, tx)); bpobj_free(dp->dp_meta_objset, dp->dp_obsolete_bpobj.bpo_object, tx); bpobj_close(&dp->dp_obsolete_bpobj); } dsl_pool_t * dsl_pool_create(spa_t *spa, nvlist_t *zplprops, uint64_t txg) { int err; dsl_pool_t *dp = dsl_pool_open_impl(spa, txg); dmu_tx_t *tx = dmu_tx_create_assigned(dp, txg); dsl_dataset_t *ds; uint64_t obj; rrw_enter(&dp->dp_config_rwlock, RW_WRITER, FTAG); /* create and open the MOS (meta-objset) */ dp->dp_meta_objset = dmu_objset_create_impl(spa, NULL, &dp->dp_meta_rootbp, DMU_OST_META, tx); /* create the pool directory */ err = zap_create_claim(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_OT_OBJECT_DIRECTORY, DMU_OT_NONE, 0, tx); ASSERT0(err); /* Initialize scan structures */ VERIFY0(dsl_scan_init(dp, txg)); /* create and open the root dir */ dp->dp_root_dir_obj = dsl_dir_create_sync(dp, NULL, NULL, tx); VERIFY0(dsl_dir_hold_obj(dp, dp->dp_root_dir_obj, NULL, dp, &dp->dp_root_dir)); /* create and open the meta-objset dir */ (void) dsl_dir_create_sync(dp, dp->dp_root_dir, MOS_DIR_NAME, tx); VERIFY0(dsl_pool_open_special_dir(dp, MOS_DIR_NAME, &dp->dp_mos_dir)); if (spa_version(spa) >= SPA_VERSION_DEADLISTS) { /* create and open the free dir */ (void) dsl_dir_create_sync(dp, dp->dp_root_dir, FREE_DIR_NAME, tx); VERIFY0(dsl_pool_open_special_dir(dp, FREE_DIR_NAME, &dp->dp_free_dir)); /* create and open the free_bplist */ obj = bpobj_alloc(dp->dp_meta_objset, SPA_OLD_MAXBLOCKSIZE, tx); VERIFY(zap_add(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_FREE_BPOBJ, sizeof (uint64_t), 1, &obj, tx) == 0); VERIFY0(bpobj_open(&dp->dp_free_bpobj, dp->dp_meta_objset, obj)); } if (spa_version(spa) >= SPA_VERSION_DSL_SCRUB) dsl_pool_create_origin(dp, tx); /* create the root dataset */ obj = dsl_dataset_create_sync_dd(dp->dp_root_dir, NULL, 0, tx); /* create the root objset */ VERIFY0(dsl_dataset_hold_obj(dp, obj, FTAG, &ds)); #ifdef _KERNEL { objset_t *os; rrw_enter(&ds->ds_bp_rwlock, RW_READER, FTAG); os = dmu_objset_create_impl(dp->dp_spa, ds, dsl_dataset_get_blkptr(ds), DMU_OST_ZFS, tx); rrw_exit(&ds->ds_bp_rwlock, FTAG); zfs_create_fs(os, kcred, zplprops, tx); } #endif dsl_dataset_rele(ds, FTAG); dmu_tx_commit(tx); rrw_exit(&dp->dp_config_rwlock, FTAG); return (dp); } /* * Account for the meta-objset space in its placeholder dsl_dir. 
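* The deltas recorded here (under dp_lock) are folded into the $MOS dsl_dir
* from dsl_pool_sync(), since the MOS itself cannot be dirtied while it is
* being synced.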
*/ void dsl_pool_mos_diduse_space(dsl_pool_t *dp, int64_t used, int64_t comp, int64_t uncomp) { ASSERT3U(comp, ==, uncomp); /* it's all metadata */ mutex_enter(&dp->dp_lock); dp->dp_mos_used_delta += used; dp->dp_mos_compressed_delta += comp; dp->dp_mos_uncompressed_delta += uncomp; mutex_exit(&dp->dp_lock); } static void dsl_pool_sync_mos(dsl_pool_t *dp, dmu_tx_t *tx) { zio_t *zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED); dmu_objset_sync(dp->dp_meta_objset, zio, tx); VERIFY0(zio_wait(zio)); dprintf_bp(&dp->dp_meta_rootbp, "meta objset rootbp is %s", ""); spa_set_rootblkptr(dp->dp_spa, &dp->dp_meta_rootbp); } static void dsl_pool_dirty_delta(dsl_pool_t *dp, int64_t delta) { ASSERT(MUTEX_HELD(&dp->dp_lock)); if (delta < 0) ASSERT3U(-delta, <=, dp->dp_dirty_total); dp->dp_dirty_total += delta; /* * Note: we signal even when increasing dp_dirty_total. * This ensures forward progress -- each thread wakes the next waiter. */ if (dp->dp_dirty_total < zfs_dirty_data_max) cv_signal(&dp->dp_spaceavail_cv); } static boolean_t dsl_early_sync_task_verify(dsl_pool_t *dp, uint64_t txg) { spa_t *spa = dp->dp_spa; vdev_t *rvd = spa->spa_root_vdev; for (uint64_t c = 0; c < rvd->vdev_children; c++) { vdev_t *vd = rvd->vdev_child[c]; txg_list_t *tl = &vd->vdev_ms_list; metaslab_t *ms; for (ms = txg_list_head(tl, TXG_CLEAN(txg)); ms; ms = txg_list_next(tl, ms, TXG_CLEAN(txg))) { VERIFY(range_tree_is_empty(ms->ms_freeing)); VERIFY(range_tree_is_empty(ms->ms_checkpointing)); } } return (B_TRUE); } void dsl_pool_sync(dsl_pool_t *dp, uint64_t txg) { zio_t *zio; dmu_tx_t *tx; dsl_dir_t *dd; dsl_dataset_t *ds; objset_t *mos = dp->dp_meta_objset; list_t synced_datasets; list_create(&synced_datasets, sizeof (dsl_dataset_t), offsetof(dsl_dataset_t, ds_synced_link)); tx = dmu_tx_create_assigned(dp, txg); /* * Run all early sync tasks before writing out any dirty blocks. * For more info on early sync tasks see block comment in * dsl_early_sync_task(). */ if (!txg_list_empty(&dp->dp_early_sync_tasks, txg)) { dsl_sync_task_t *dst; ASSERT3U(spa_sync_pass(dp->dp_spa), ==, 1); while ((dst = txg_list_remove(&dp->dp_early_sync_tasks, txg)) != NULL) { ASSERT(dsl_early_sync_task_verify(dp, txg)); dsl_sync_task_sync(dst, tx); } ASSERT(dsl_early_sync_task_verify(dp, txg)); } /* * Write out all dirty blocks of dirty datasets. */ zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED); while ((ds = txg_list_remove(&dp->dp_dirty_datasets, txg)) != NULL) { /* * We must not sync any non-MOS datasets twice, because * we may have taken a snapshot of them. However, we * may sync newly-created datasets on pass 2. */ ASSERT(!list_link_active(&ds->ds_synced_link)); list_insert_tail(&synced_datasets, ds); dsl_dataset_sync(ds, zio, tx); } VERIFY0(zio_wait(zio)); /* * We have written all of the accounted dirty data, so our * dp_space_towrite should now be zero. However, some seldom-used * code paths do not adhere to this (e.g. dbuf_undirty(), also * rounding error in dbuf_write_physdone). * Shore up the accounting of any dirtied space now. */ dsl_pool_undirty_space(dp, dp->dp_dirty_pertxg[txg & TXG_MASK], txg); /* * Update the long range free counter after * we're done syncing user data */ mutex_enter(&dp->dp_lock); ASSERT(spa_sync_pass(dp->dp_spa) == 1 || dp->dp_long_free_dirty_pertxg[txg & TXG_MASK] == 0); dp->dp_long_free_dirty_pertxg[txg & TXG_MASK] = 0; mutex_exit(&dp->dp_lock); /* * After the data blocks have been written (ensured by the zio_wait() * above), update the user/group space accounting. 
This happens * in tasks dispatched to dp_sync_taskq, so wait for them before * continuing. */ for (ds = list_head(&synced_datasets); ds != NULL; ds = list_next(&synced_datasets, ds)) { dmu_objset_do_userquota_updates(ds->ds_objset, tx); } taskq_wait(dp->dp_sync_taskq); /* * Sync the datasets again to push out the changes due to * userspace updates. This must be done before we process the * sync tasks, so that any snapshots will have the correct * user accounting information (and we won't get confused * about which blocks are part of the snapshot). */ zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED); while ((ds = txg_list_remove(&dp->dp_dirty_datasets, txg)) != NULL) { ASSERT(list_link_active(&ds->ds_synced_link)); dmu_buf_rele(ds->ds_dbuf, ds); dsl_dataset_sync(ds, zio, tx); } VERIFY0(zio_wait(zio)); /* * Now that the datasets have been completely synced, we can * clean up our in-memory structures accumulated while syncing: * * - move dead blocks from the pending deadlist to the on-disk deadlist * - release hold from dsl_dataset_dirty() */ while ((ds = list_remove_head(&synced_datasets)) != NULL) { dsl_dataset_sync_done(ds, tx); } while ((dd = txg_list_remove(&dp->dp_dirty_dirs, txg)) != NULL) { dsl_dir_sync(dd, tx); } /* * The MOS's space is accounted for in the pool/$MOS * (dp_mos_dir). We can't modify the mos while we're syncing * it, so we remember the deltas and apply them here. */ if (dp->dp_mos_used_delta != 0 || dp->dp_mos_compressed_delta != 0 || dp->dp_mos_uncompressed_delta != 0) { dsl_dir_diduse_space(dp->dp_mos_dir, DD_USED_HEAD, dp->dp_mos_used_delta, dp->dp_mos_compressed_delta, dp->dp_mos_uncompressed_delta, tx); dp->dp_mos_used_delta = 0; dp->dp_mos_compressed_delta = 0; dp->dp_mos_uncompressed_delta = 0; } if (!multilist_is_empty(mos->os_dirty_dnodes[txg & TXG_MASK])) { dsl_pool_sync_mos(dp, tx); } /* * If we modify a dataset in the same txg that we want to destroy it, * its dsl_dir's dd_dbuf will be dirty, and thus have a hold on it. * dsl_dir_destroy_check() will fail if there are unexpected holds. * Therefore, we want to sync the MOS (thus syncing the dd_dbuf * and clearing the hold on it) before we process the sync_tasks. * The MOS data dirtied by the sync_tasks will be synced on the next * pass. */ if (!txg_list_empty(&dp->dp_sync_tasks, txg)) { dsl_sync_task_t *dst; /* * No more sync tasks should have been added while we * were syncing. */ ASSERT3U(spa_sync_pass(dp->dp_spa), ==, 1); while ((dst = txg_list_remove(&dp->dp_sync_tasks, txg)) != NULL) dsl_sync_task_sync(dst, tx); } dmu_tx_commit(tx); DTRACE_PROBE2(dsl_pool_sync__done, dsl_pool_t *dp, dp, uint64_t, txg); } void dsl_pool_sync_done(dsl_pool_t *dp, uint64_t txg) { zilog_t *zilog; while (zilog = txg_list_head(&dp->dp_dirty_zilogs, txg)) { dsl_dataset_t *ds = dmu_objset_ds(zilog->zl_os); /* * We don't remove the zilog from the dp_dirty_zilogs * list until after we've cleaned it. This ensures that * callers of zilog_is_dirty() receive an accurate * answer when they are racing with the spa sync thread. */ zil_clean(zilog, txg); (void) txg_list_remove_this(&dp->dp_dirty_zilogs, zilog, txg); ASSERT(!dmu_objset_is_dirty(zilog->zl_os, txg)); dmu_buf_rele(ds->ds_dbuf, zilog); } ASSERT(!dmu_objset_is_dirty(dp->dp_meta_objset, txg)); } /* * TRUE if the current thread is the tx_sync_thread or if we * are being called from SPA context during pool initialization. 
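* It also returns TRUE for the dp_sync_taskq worker threads (note the
* taskq_member() check below).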
*/ int dsl_pool_sync_context(dsl_pool_t *dp) { return (curthread == dp->dp_tx.tx_sync_thread || spa_is_initializing(dp->dp_spa) || taskq_member(dp->dp_sync_taskq, curthread)); } /* * This function returns the amount of allocatable space in the pool * minus whatever space is currently reserved by ZFS for specific * purposes. Specifically: * * 1] Any reserved SLOP space * 2] Any space used by the checkpoint * 3] Any space used for deferred frees * * The latter 2 are especially important because they are needed to * rectify the SPA's and DMU's different understanding of how much space * is used. Now the DMU is aware of that extra space tracked by the SPA * without having to maintain a separate special dir (e.g similar to * $MOS, $FREEING, and $LEAKED). * * Note: By deferred frees here, we mean the frees that were deferred * in spa_sync() after sync pass 1 (spa_deferred_bpobj), and not the * segments placed in ms_defer trees during metaslab_sync_done(). */ uint64_t dsl_pool_adjustedsize(dsl_pool_t *dp, zfs_space_check_t slop_policy) { spa_t *spa = dp->dp_spa; uint64_t space, resv, adjustedsize; uint64_t spa_deferred_frees = spa->spa_deferred_bpobj.bpo_phys->bpo_bytes; space = spa_get_dspace(spa) - spa_get_checkpoint_space(spa) - spa_deferred_frees; resv = spa_get_slop_space(spa); switch (slop_policy) { case ZFS_SPACE_CHECK_NORMAL: break; case ZFS_SPACE_CHECK_RESERVED: resv >>= 1; break; case ZFS_SPACE_CHECK_EXTRA_RESERVED: resv >>= 2; break; case ZFS_SPACE_CHECK_NONE: resv = 0; break; default: panic("invalid slop policy value: %d", slop_policy); break; } adjustedsize = (space >= resv) ? (space - resv) : 0; return (adjustedsize); } uint64_t dsl_pool_unreserved_space(dsl_pool_t *dp, zfs_space_check_t slop_policy) { uint64_t poolsize = dsl_pool_adjustedsize(dp, slop_policy); uint64_t deferred = metaslab_class_get_deferred(spa_normal_class(dp->dp_spa)); uint64_t quota = (poolsize >= deferred) ? (poolsize - deferred) : 0; return (quota); } boolean_t dsl_pool_need_dirty_delay(dsl_pool_t *dp) { uint64_t delay_min_bytes = zfs_dirty_data_max * zfs_delay_min_dirty_percent / 100; boolean_t rv; mutex_enter(&dp->dp_lock); if (dp->dp_dirty_total > zfs_dirty_data_sync) txg_kick(dp); rv = (dp->dp_dirty_total > delay_min_bytes); mutex_exit(&dp->dp_lock); return (rv); } void dsl_pool_dirty_space(dsl_pool_t *dp, int64_t space, dmu_tx_t *tx) { if (space > 0) { mutex_enter(&dp->dp_lock); dp->dp_dirty_pertxg[tx->tx_txg & TXG_MASK] += space; dsl_pool_dirty_delta(dp, space); mutex_exit(&dp->dp_lock); } } void dsl_pool_undirty_space(dsl_pool_t *dp, int64_t space, uint64_t txg) { ASSERT3S(space, >=, 0); if (space == 0) return; mutex_enter(&dp->dp_lock); if (dp->dp_dirty_pertxg[txg & TXG_MASK] < space) { /* XXX writing something we didn't dirty? 
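* Seldom-used paths (e.g. dbuf_undirty(), or rounding in
* dbuf_write_physdone(), as noted in dsl_pool_sync() above) can undirty more
* than was accounted for this txg, so clamp to the per-txg figure.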
*/ space = dp->dp_dirty_pertxg[txg & TXG_MASK]; } ASSERT3U(dp->dp_dirty_pertxg[txg & TXG_MASK], >=, space); dp->dp_dirty_pertxg[txg & TXG_MASK] -= space; ASSERT3U(dp->dp_dirty_total, >=, space); dsl_pool_dirty_delta(dp, -space); mutex_exit(&dp->dp_lock); } /* ARGSUSED */ static int upgrade_clones_cb(dsl_pool_t *dp, dsl_dataset_t *hds, void *arg) { dmu_tx_t *tx = arg; dsl_dataset_t *ds, *prev = NULL; int err; err = dsl_dataset_hold_obj(dp, hds->ds_object, FTAG, &ds); if (err) return (err); while (dsl_dataset_phys(ds)->ds_prev_snap_obj != 0) { err = dsl_dataset_hold_obj(dp, dsl_dataset_phys(ds)->ds_prev_snap_obj, FTAG, &prev); if (err) { dsl_dataset_rele(ds, FTAG); return (err); } if (dsl_dataset_phys(prev)->ds_next_snap_obj != ds->ds_object) break; dsl_dataset_rele(ds, FTAG); ds = prev; prev = NULL; } if (prev == NULL) { prev = dp->dp_origin_snap; /* * The $ORIGIN can't have any data, or the accounting * will be wrong. */ rrw_enter(&ds->ds_bp_rwlock, RW_READER, FTAG); ASSERT0(dsl_dataset_phys(prev)->ds_bp.blk_birth); rrw_exit(&ds->ds_bp_rwlock, FTAG); /* The origin doesn't get attached to itself */ if (ds->ds_object == prev->ds_object) { dsl_dataset_rele(ds, FTAG); return (0); } dmu_buf_will_dirty(ds->ds_dbuf, tx); dsl_dataset_phys(ds)->ds_prev_snap_obj = prev->ds_object; dsl_dataset_phys(ds)->ds_prev_snap_txg = dsl_dataset_phys(prev)->ds_creation_txg; dmu_buf_will_dirty(ds->ds_dir->dd_dbuf, tx); dsl_dir_phys(ds->ds_dir)->dd_origin_obj = prev->ds_object; dmu_buf_will_dirty(prev->ds_dbuf, tx); dsl_dataset_phys(prev)->ds_num_children++; if (dsl_dataset_phys(ds)->ds_next_snap_obj == 0) { ASSERT(ds->ds_prev == NULL); VERIFY0(dsl_dataset_hold_obj(dp, dsl_dataset_phys(ds)->ds_prev_snap_obj, ds, &ds->ds_prev)); } } ASSERT3U(dsl_dir_phys(ds->ds_dir)->dd_origin_obj, ==, prev->ds_object); ASSERT3U(dsl_dataset_phys(ds)->ds_prev_snap_obj, ==, prev->ds_object); if (dsl_dataset_phys(prev)->ds_next_clones_obj == 0) { dmu_buf_will_dirty(prev->ds_dbuf, tx); dsl_dataset_phys(prev)->ds_next_clones_obj = zap_create(dp->dp_meta_objset, DMU_OT_NEXT_CLONES, DMU_OT_NONE, 0, tx); } VERIFY0(zap_add_int(dp->dp_meta_objset, dsl_dataset_phys(prev)->ds_next_clones_obj, ds->ds_object, tx)); dsl_dataset_rele(ds, FTAG); if (prev != dp->dp_origin_snap) dsl_dataset_rele(prev, FTAG); return (0); } void dsl_pool_upgrade_clones(dsl_pool_t *dp, dmu_tx_t *tx) { ASSERT(dmu_tx_is_syncing(tx)); ASSERT(dp->dp_origin_snap != NULL); VERIFY0(dmu_objset_find_dp(dp, dp->dp_root_dir_obj, upgrade_clones_cb, tx, DS_FIND_CHILDREN | DS_FIND_SERIALIZE)); } /* ARGSUSED */ static int upgrade_dir_clones_cb(dsl_pool_t *dp, dsl_dataset_t *ds, void *arg) { dmu_tx_t *tx = arg; objset_t *mos = dp->dp_meta_objset; if (dsl_dir_phys(ds->ds_dir)->dd_origin_obj != 0) { dsl_dataset_t *origin; VERIFY0(dsl_dataset_hold_obj(dp, dsl_dir_phys(ds->ds_dir)->dd_origin_obj, FTAG, &origin)); if (dsl_dir_phys(origin->ds_dir)->dd_clones == 0) { dmu_buf_will_dirty(origin->ds_dir->dd_dbuf, tx); dsl_dir_phys(origin->ds_dir)->dd_clones = zap_create(mos, DMU_OT_DSL_CLONES, DMU_OT_NONE, 0, tx); } VERIFY0(zap_add_int(dp->dp_meta_objset, dsl_dir_phys(origin->ds_dir)->dd_clones, ds->ds_object, tx)); dsl_dataset_rele(origin, FTAG); } return (0); } void dsl_pool_upgrade_dir_clones(dsl_pool_t *dp, dmu_tx_t *tx) { ASSERT(dmu_tx_is_syncing(tx)); uint64_t obj; (void) dsl_dir_create_sync(dp, dp->dp_root_dir, FREE_DIR_NAME, tx); VERIFY0(dsl_pool_open_special_dir(dp, FREE_DIR_NAME, &dp->dp_free_dir)); /* * We can't use bpobj_alloc(), because spa_version() still * returns the old version, 
and we need a new-version bpobj with * subobj support. So call dmu_object_alloc() directly. */ obj = dmu_object_alloc(dp->dp_meta_objset, DMU_OT_BPOBJ, SPA_OLD_MAXBLOCKSIZE, DMU_OT_BPOBJ_HDR, sizeof (bpobj_phys_t), tx); VERIFY0(zap_add(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_FREE_BPOBJ, sizeof (uint64_t), 1, &obj, tx)); VERIFY0(bpobj_open(&dp->dp_free_bpobj, dp->dp_meta_objset, obj)); VERIFY0(dmu_objset_find_dp(dp, dp->dp_root_dir_obj, upgrade_dir_clones_cb, tx, DS_FIND_CHILDREN | DS_FIND_SERIALIZE)); } void dsl_pool_create_origin(dsl_pool_t *dp, dmu_tx_t *tx) { uint64_t dsobj; dsl_dataset_t *ds; ASSERT(dmu_tx_is_syncing(tx)); ASSERT(dp->dp_origin_snap == NULL); ASSERT(rrw_held(&dp->dp_config_rwlock, RW_WRITER)); /* create the origin dir, ds, & snap-ds */ dsobj = dsl_dataset_create_sync(dp->dp_root_dir, ORIGIN_DIR_NAME, NULL, 0, kcred, tx); VERIFY0(dsl_dataset_hold_obj(dp, dsobj, FTAG, &ds)); dsl_dataset_snapshot_sync_impl(ds, ORIGIN_DIR_NAME, tx); VERIFY0(dsl_dataset_hold_obj(dp, dsl_dataset_phys(ds)->ds_prev_snap_obj, dp, &dp->dp_origin_snap)); dsl_dataset_rele(ds, FTAG); } taskq_t * dsl_pool_vnrele_taskq(dsl_pool_t *dp) { return (dp->dp_vnrele_taskq); } /* * Walk through the pool-wide zap object of temporary snapshot user holds * and release them. */ void dsl_pool_clean_tmp_userrefs(dsl_pool_t *dp) { zap_attribute_t za; zap_cursor_t zc; objset_t *mos = dp->dp_meta_objset; uint64_t zapobj = dp->dp_tmp_userrefs_obj; nvlist_t *holds; if (zapobj == 0) return; ASSERT(spa_version(dp->dp_spa) >= SPA_VERSION_USERREFS); holds = fnvlist_alloc(); for (zap_cursor_init(&zc, mos, zapobj); zap_cursor_retrieve(&zc, &za) == 0; zap_cursor_advance(&zc)) { char *htag; nvlist_t *tags; htag = strchr(za.za_name, '-'); *htag = '\0'; ++htag; if (nvlist_lookup_nvlist(holds, za.za_name, &tags) != 0) { tags = fnvlist_alloc(); fnvlist_add_boolean(tags, htag); fnvlist_add_nvlist(holds, za.za_name, tags); fnvlist_free(tags); } else { fnvlist_add_boolean(tags, htag); } } dsl_dataset_user_release_tmp(dp, holds); fnvlist_free(holds); zap_cursor_fini(&zc); } /* * Create the pool-wide zap object for storing temporary snapshot holds. */ void dsl_pool_user_hold_create_obj(dsl_pool_t *dp, dmu_tx_t *tx) { objset_t *mos = dp->dp_meta_objset; ASSERT(dp->dp_tmp_userrefs_obj == 0); ASSERT(dmu_tx_is_syncing(tx)); dp->dp_tmp_userrefs_obj = zap_create_link(mos, DMU_OT_USERREFS, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_TMP_USERREFS, tx); } static int dsl_pool_user_hold_rele_impl(dsl_pool_t *dp, uint64_t dsobj, const char *tag, uint64_t now, dmu_tx_t *tx, boolean_t holding) { objset_t *mos = dp->dp_meta_objset; uint64_t zapobj = dp->dp_tmp_userrefs_obj; char *name; int error; ASSERT(spa_version(dp->dp_spa) >= SPA_VERSION_USERREFS); ASSERT(dmu_tx_is_syncing(tx)); /* * If the pool was created prior to SPA_VERSION_USERREFS, the * zap object for temporary holds might not exist yet. */ if (zapobj == 0) { if (holding) { dsl_pool_user_hold_create_obj(dp, tx); zapobj = dp->dp_tmp_userrefs_obj; } else { return (SET_ERROR(ENOENT)); } } name = kmem_asprintf("%llx-%s", (u_longlong_t)dsobj, tag); if (holding) error = zap_add(mos, zapobj, name, 8, 1, &now, tx); else error = zap_remove(mos, zapobj, name, tx); strfree(name); return (error); } /* * Add a temporary hold for the given dataset object and tag. 
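* The hold is recorded in the pool-wide temporary-holds ZAP (see
* dsl_pool_user_hold_rele_impl() above) under a key of the form
* "<dsobj-in-hex>-<tag>", with the requested timestamp as its value.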
*/ int dsl_pool_user_hold(dsl_pool_t *dp, uint64_t dsobj, const char *tag, uint64_t now, dmu_tx_t *tx) { return (dsl_pool_user_hold_rele_impl(dp, dsobj, tag, now, tx, B_TRUE)); } /* * Release a temporary hold for the given dataset object and tag. */ int dsl_pool_user_release(dsl_pool_t *dp, uint64_t dsobj, const char *tag, dmu_tx_t *tx) { return (dsl_pool_user_hold_rele_impl(dp, dsobj, tag, 0, tx, B_FALSE)); } /* * DSL Pool Configuration Lock * * The dp_config_rwlock protects against changes to DSL state (e.g. dataset * creation / destruction / rename / property setting). It must be held for * read to hold a dataset or dsl_dir. I.e. you must call * dsl_pool_config_enter() or dsl_pool_hold() before calling * dsl_{dataset,dir}_hold{_obj}. In most circumstances, the dp_config_rwlock * must be held continuously until all datasets and dsl_dirs are released. * * The only exception to this rule is that if a "long hold" is placed on * a dataset, then the dp_config_rwlock may be dropped while the dataset * is still held. The long hold will prevent the dataset from being * destroyed -- the destroy will fail with EBUSY. A long hold can be * obtained by calling dsl_dataset_long_hold(), or by "owning" a dataset * (by calling dsl_{dataset,objset}_{try}own{_obj}). * * Legitimate long-holders (including owners) should be long-running, cancelable * tasks that should cause "zfs destroy" to fail. This includes DMU * consumers (i.e. a ZPL filesystem being mounted or ZVOL being open), * "zfs send", and "zfs diff". There are several other long-holders whose * uses are suboptimal (e.g. "zfs promote", and zil_suspend()). * * The usual formula for long-holding would be: * dsl_pool_hold() * dsl_dataset_hold() * ... perform checks ... * dsl_dataset_long_hold() * dsl_pool_rele() * ... perform long-running task ... * dsl_dataset_long_rele() * dsl_dataset_rele() * * Note that when the long hold is released, the dataset is still held but * the pool is not held. The dataset may change arbitrarily during this time * (e.g. it could be destroyed). Therefore you shouldn't do anything to the * dataset except release it. * * User-initiated operations (e.g. ioctls, zfs_ioc_*()) are either read-only * or modifying operations. * * Modifying operations should generally use dsl_sync_task(). The synctask * infrastructure enforces proper locking strategy with respect to the * dp_config_rwlock. See the comment above dsl_sync_task() for details. * * Read-only operations will manually hold the pool, then the dataset, obtain * information from the dataset, then release the pool and dataset. * dmu_objset_{hold,rele}() are convenience routines that also do the pool * hold/rele. */ int dsl_pool_hold(const char *name, void *tag, dsl_pool_t **dp) { spa_t *spa; int error; error = spa_open(name, &spa, tag); if (error == 0) { *dp = spa_get_dsl(spa); dsl_pool_config_enter(*dp, tag); } return (error); } void dsl_pool_rele(dsl_pool_t *dp, void *tag) { dsl_pool_config_exit(dp, tag); spa_close(dp->dp_spa, tag); } void dsl_pool_config_enter(dsl_pool_t *dp, void *tag) { /* * We use a "reentrant" reader-writer lock, but not reentrantly. * * The rrwlock can (with the track_all flag) track all reading threads, * which is very useful for debugging which code path failed to release * the lock, and for verifying that the *current* thread does hold * the lock. * * (Unlike a rwlock, which knows that N threads hold it for * read, but not *which* threads, so rw_held(RW_READER) returns TRUE * if any thread holds it for read, even if this thread doesn't). 
ASSERT(!rrw_held(&dp->dp_config_rwlock, RW_READER)); rrw_enter(&dp->dp_config_rwlock, RW_READER, tag); } void dsl_pool_config_enter_prio(dsl_pool_t *dp, void *tag) { ASSERT(!rrw_held(&dp->dp_config_rwlock, RW_READER)); rrw_enter_read_prio(&dp->dp_config_rwlock, tag); } void dsl_pool_config_exit(dsl_pool_t *dp, void *tag) { rrw_exit(&dp->dp_config_rwlock, tag); } boolean_t dsl_pool_config_held(dsl_pool_t *dp) { return (RRW_LOCK_HELD(&dp->dp_config_rwlock)); } boolean_t dsl_pool_config_held_writer(dsl_pool_t *dp) { return (RRW_WRITE_HELD(&dp->dp_config_rwlock)); } Index: user/ngie/bug-237403/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/range_tree.h =================================================================== --- user/ngie/bug-237403/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/range_tree.h (revision 346925) +++ user/ngie/bug-237403/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/range_tree.h (revision 346926) @@ -1,130 +1,123 @@ /* * CDDL HEADER START * * The contents of this file are subject to the terms of the * Common Development and Distribution License (the "License"). * You may not use this file except in compliance with the License. * * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE * or http://www.opensolaris.org/os/licensing. * See the License for the specific language governing permissions * and limitations under the License. * * When distributing Covered Code, include this CDDL HEADER in each * file and include the License file at usr/src/OPENSOLARIS.LICENSE. * If applicable, add the following below this CDDL HEADER, with the * fields enclosed by brackets "[]" replaced with your own identifying * information: Portions Copyright [yyyy] [name of copyright owner] * * CDDL HEADER END */ /* * Copyright 2009 Sun Microsystems, Inc. All rights reserved. * Use is subject to license terms. */ /* * Copyright (c) 2013, 2017 by Delphix. All rights reserved. */ #ifndef _SYS_RANGE_TREE_H #define _SYS_RANGE_TREE_H #include #include #ifdef __cplusplus extern "C" { #endif #define RANGE_TREE_HISTOGRAM_SIZE 64 typedef struct range_tree_ops range_tree_ops_t; /* * Note: the range_tree may not be accessed concurrently; consumers * must provide external locking if required. */ typedef struct range_tree { avl_tree_t rt_root; /* offset-ordered segment AVL tree */ uint64_t rt_space; /* sum of all segments in the map */ range_tree_ops_t *rt_ops; void *rt_arg; /* rt_avl_compare should only be set if rt_arg is an AVL tree */ uint64_t rt_gap; /* allowable inter-segment gap */ int (*rt_avl_compare)(const void *, const void *); /* * The rt_histogram maintains a histogram of ranges.
Each bucket, * rt_histogram[i], contains the number of ranges whose size is: * 2^i <= size of range in bytes < 2^(i+1) */ uint64_t rt_histogram[RANGE_TREE_HISTOGRAM_SIZE]; } range_tree_t; typedef struct range_seg { avl_node_t rs_node; /* AVL node */ avl_node_t rs_pp_node; /* AVL picker-private node */ uint64_t rs_start; /* starting offset of this segment */ uint64_t rs_end; /* ending offset (non-inclusive) */ uint64_t rs_fill; /* actual fill if gap mode is on */ } range_seg_t; struct range_tree_ops { void (*rtop_create)(range_tree_t *rt, void *arg); void (*rtop_destroy)(range_tree_t *rt, void *arg); void (*rtop_add)(range_tree_t *rt, range_seg_t *rs, void *arg); void (*rtop_remove)(range_tree_t *rt, range_seg_t *rs, void *arg); void (*rtop_vacate)(range_tree_t *rt, void *arg); }; typedef void range_tree_func_t(void *arg, uint64_t start, uint64_t size); void range_tree_init(void); void range_tree_fini(void); range_tree_t *range_tree_create_impl(range_tree_ops_t *ops, void *arg, int (*avl_compare)(const void*, const void*), uint64_t gap); - range_tree_t *range_tree_create(range_tree_ops_t *ops, void *arg); +range_tree_t *range_tree_create(range_tree_ops_t *ops, void *arg); void range_tree_destroy(range_tree_t *rt); boolean_t range_tree_contains(range_tree_t *rt, uint64_t start, uint64_t size); range_seg_t *range_tree_find(range_tree_t *rt, uint64_t start, uint64_t size); void range_tree_resize_segment(range_tree_t *rt, range_seg_t *rs, uint64_t newstart, uint64_t newsize); uint64_t range_tree_space(range_tree_t *rt); boolean_t range_tree_is_empty(range_tree_t *rt); void range_tree_verify(range_tree_t *rt, uint64_t start, uint64_t size); void range_tree_swap(range_tree_t **rtsrc, range_tree_t **rtdst); void range_tree_stat_verify(range_tree_t *rt); uint64_t range_tree_min(range_tree_t *rt); uint64_t range_tree_max(range_tree_t *rt); uint64_t range_tree_span(range_tree_t *rt); void range_tree_add(void *arg, uint64_t start, uint64_t size); void range_tree_remove(void *arg, uint64_t start, uint64_t size); void range_tree_remove_fill(range_tree_t *rt, uint64_t start, uint64_t size); void range_tree_adjust_fill(range_tree_t *rt, range_seg_t *rs, int64_t delta); void range_tree_clear(range_tree_t *rt, uint64_t start, uint64_t size); void range_tree_vacate(range_tree_t *rt, range_tree_func_t *func, void *arg); void range_tree_walk(range_tree_t *rt, range_tree_func_t *func, void *arg); range_seg_t *range_tree_first(range_tree_t *rt); - -void rt_avl_create(range_tree_t *rt, void *arg); -void rt_avl_destroy(range_tree_t *rt, void *arg); -void rt_avl_add(range_tree_t *rt, range_seg_t *rs, void *arg); -void rt_avl_remove(range_tree_t *rt, range_seg_t *rs, void *arg); -void rt_avl_vacate(range_tree_t *rt, void *arg); -extern struct range_tree_ops rt_avl_ops; void rt_avl_create(range_tree_t *rt, void *arg); void rt_avl_destroy(range_tree_t *rt, void *arg); void rt_avl_add(range_tree_t *rt, range_seg_t *rs, void *arg); void rt_avl_remove(range_tree_t *rt, range_seg_t *rs, void *arg); void rt_avl_vacate(range_tree_t *rt, void *arg); extern struct range_tree_ops rt_avl_ops; #ifdef __cplusplus } #endif #endif /* _SYS_RANGE_TREE_H */ Index: user/ngie/bug-237403/sys/cddl/contrib/opensolaris =================================================================== --- user/ngie/bug-237403/sys/cddl/contrib/opensolaris (revision 346925) +++ user/ngie/bug-237403/sys/cddl/contrib/opensolaris (revision 346926) Property changes on: user/ngie/bug-237403/sys/cddl/contrib/opensolaris 
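[Editorial aside, not part of the diff: a minimal usage sketch of the range_tree interface declared in the header above. The offsets and sizes are made-up values, and the sketch assumes the caller has arranged the external locking the header note requires; only the range_tree_* calls come from the header.]

#include <sys/range_tree.h>

/* Hypothetical example of creating, populating, and tearing down a range_tree. */
static uint64_t
example_range_tree_usage(void)
{
	range_tree_t *rt;
	uint64_t space;

	/* No ops callbacks and no per-tree argument are needed for this sketch. */
	rt = range_tree_create(NULL, NULL);

	range_tree_add(rt, 0x1000, 0x2000);	/* segment [0x1000, 0x3000) */
	range_tree_add(rt, 0x8000, 0x1000);	/* segment [0x8000, 0x9000) */
	/*
	 * Per the header comment, the first segment (size 0x2000 = 2^13) is
	 * counted in rt_histogram[13], since 2^13 <= 0x2000 < 2^14.
	 */
	range_tree_remove(rt, 0x1000, 0x1000);	/* trim the first segment to [0x2000, 0x3000) */

	space = range_tree_space(rt);		/* 0x1000 + 0x1000 = 0x2000 bytes tracked */

	range_tree_vacate(rt, NULL, NULL);	/* drop all remaining segments */
	range_tree_destroy(rt);
	return (space);
}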
___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/sys/cddl/contrib/opensolaris:r346444-346925 Index: user/ngie/bug-237403/sys/compat/freebsd32/freebsd32_systrace_args.c =================================================================== --- user/ngie/bug-237403/sys/compat/freebsd32/freebsd32_systrace_args.c (revision 346925) +++ user/ngie/bug-237403/sys/compat/freebsd32/freebsd32_systrace_args.c (revision 346926) @@ -1,10816 +1,10816 @@ /* * System call argument to DTrace register array converstion. * * DO NOT EDIT-- this file is automatically generated. * $FreeBSD$ * This file is part of the DTrace syscall provider. */ static void systrace_args(int sysnum, void *params, uint64_t *uarg, int *n_args) { int64_t *iarg = (int64_t *) uarg; switch (sysnum) { #if !defined(PAD64_REQUIRED) && !defined(__amd64__) #define PAD64_REQUIRED #endif /* nosys */ case 0: { *n_args = 0; break; } /* sys_exit */ case 1: { struct sys_exit_args *p = params; iarg[0] = p->rval; /* int */ *n_args = 1; break; } /* fork */ case 2: { *n_args = 0; break; } /* read */ case 3: { struct read_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->buf; /* void * */ uarg[2] = p->nbyte; /* size_t */ *n_args = 3; break; } /* write */ case 4: { struct write_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->buf; /* const void * */ uarg[2] = p->nbyte; /* size_t */ *n_args = 3; break; } /* open */ case 5: { struct open_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->flags; /* int */ iarg[2] = p->mode; /* mode_t */ *n_args = 3; break; } /* close */ case 6: { struct close_args *p = params; iarg[0] = p->fd; /* int */ *n_args = 1; break; } /* freebsd32_wait4 */ case 7: { struct freebsd32_wait4_args *p = params; iarg[0] = p->pid; /* int */ uarg[1] = (intptr_t) p->status; /* int * */ iarg[2] = p->options; /* int */ uarg[3] = (intptr_t) p->rusage; /* struct rusage32 * */ *n_args = 4; break; } /* link */ case 9: { struct link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = (intptr_t) p->link; /* const char * */ *n_args = 2; break; } /* unlink */ case 10: { struct unlink_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* chdir */ case 12: { struct chdir_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* fchdir */ case 13: { struct fchdir_args *p = params; iarg[0] = p->fd; /* int */ *n_args = 1; break; } /* chmod */ case 15: { struct chmod_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->mode; /* mode_t */ *n_args = 2; break; } /* chown */ case 16: { struct chown_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->uid; /* int */ iarg[2] = p->gid; /* int */ *n_args = 3; break; } /* break */ case 17: { struct break_args *p = params; uarg[0] = (intptr_t) p->nsize; /* char * */ *n_args = 1; break; } /* getpid */ case 20: { *n_args = 0; break; } /* mount */ case 21: { struct mount_args *p = params; uarg[0] = (intptr_t) p->type; /* const char * */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->flags; /* int */ uarg[3] = (intptr_t) p->data; /* void * */ *n_args = 4; break; } /* unmount */ case 22: { struct unmount_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->flags; /* int */ *n_args = 2; break; } /* setuid */ case 23: { struct setuid_args *p = params; uarg[0] = p->uid; /* uid_t */ *n_args = 1; 
break; } /* getuid */ case 24: { *n_args = 0; break; } /* geteuid */ case 25: { *n_args = 0; break; } /* ptrace */ case 26: { struct ptrace_args *p = params; iarg[0] = p->req; /* int */ iarg[1] = p->pid; /* pid_t */ uarg[2] = (intptr_t) p->addr; /* caddr_t */ iarg[3] = p->data; /* int */ *n_args = 4; break; } /* freebsd32_recvmsg */ case 27: { struct freebsd32_recvmsg_args *p = params; iarg[0] = p->s; /* int */ uarg[1] = (intptr_t) p->msg; /* struct msghdr32 * */ iarg[2] = p->flags; /* int */ *n_args = 3; break; } /* freebsd32_sendmsg */ case 28: { struct freebsd32_sendmsg_args *p = params; iarg[0] = p->s; /* int */ uarg[1] = (intptr_t) p->msg; /* struct msghdr32 * */ iarg[2] = p->flags; /* int */ *n_args = 3; break; } /* freebsd32_recvfrom */ case 29: { struct freebsd32_recvfrom_args *p = params; iarg[0] = p->s; /* int */ uarg[1] = (intptr_t) p->buf; /* void * */ uarg[2] = p->len; /* uint32_t */ iarg[3] = p->flags; /* int */ uarg[4] = (intptr_t) p->from; /* struct sockaddr * */ uarg[5] = p->fromlenaddr; /* uint32_t */ *n_args = 6; break; } /* accept */ case 30: { struct accept_args *p = params; iarg[0] = p->s; /* int */ uarg[1] = (intptr_t) p->name; /* struct sockaddr * */ uarg[2] = (intptr_t) p->anamelen; /* int * */ *n_args = 3; break; } /* getpeername */ case 31: { struct getpeername_args *p = params; iarg[0] = p->fdes; /* int */ uarg[1] = (intptr_t) p->asa; /* struct sockaddr * */ uarg[2] = (intptr_t) p->alen; /* int * */ *n_args = 3; break; } /* getsockname */ case 32: { struct getsockname_args *p = params; iarg[0] = p->fdes; /* int */ uarg[1] = (intptr_t) p->asa; /* struct sockaddr * */ uarg[2] = (intptr_t) p->alen; /* int * */ *n_args = 3; break; } /* access */ case 33: { struct access_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->amode; /* int */ *n_args = 2; break; } /* chflags */ case 34: { struct chflags_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = p->flags; /* u_long */ *n_args = 2; break; } /* fchflags */ case 35: { struct fchflags_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = p->flags; /* u_long */ *n_args = 2; break; } /* sync */ case 36: { *n_args = 0; break; } /* kill */ case 37: { struct kill_args *p = params; iarg[0] = p->pid; /* int */ iarg[1] = p->signum; /* int */ *n_args = 2; break; } /* getppid */ case 39: { *n_args = 0; break; } /* dup */ case 41: { struct dup_args *p = params; uarg[0] = p->fd; /* u_int */ *n_args = 1; break; } /* getegid */ case 43: { *n_args = 0; break; } /* profil */ case 44: { struct profil_args *p = params; uarg[0] = (intptr_t) p->samples; /* char * */ uarg[1] = p->size; /* size_t */ uarg[2] = p->offset; /* size_t */ uarg[3] = p->scale; /* u_int */ *n_args = 4; break; } /* ktrace */ case 45: { struct ktrace_args *p = params; uarg[0] = (intptr_t) p->fname; /* const char * */ iarg[1] = p->ops; /* int */ iarg[2] = p->facs; /* int */ iarg[3] = p->pid; /* int */ *n_args = 4; break; } /* getgid */ case 47: { *n_args = 0; break; } /* getlogin */ case 49: { struct getlogin_args *p = params; uarg[0] = (intptr_t) p->namebuf; /* char * */ uarg[1] = p->namelen; /* u_int */ *n_args = 2; break; } /* setlogin */ case 50: { struct setlogin_args *p = params; uarg[0] = (intptr_t) p->namebuf; /* const char * */ *n_args = 1; break; } /* acct */ case 51: { struct acct_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* freebsd32_sigaltstack */ case 53: { struct freebsd32_sigaltstack_args *p = params; uarg[0] = (intptr_t) p->ss; /* struct 
sigaltstack32 * */ uarg[1] = (intptr_t) p->oss; /* struct sigaltstack32 * */ *n_args = 2; break; } /* freebsd32_ioctl */ case 54: { struct freebsd32_ioctl_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = p->com; /* uint32_t */ uarg[2] = (intptr_t) p->data; /* struct md_ioctl32 * */ *n_args = 3; break; } /* reboot */ case 55: { struct reboot_args *p = params; iarg[0] = p->opt; /* int */ *n_args = 1; break; } /* revoke */ case 56: { struct revoke_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* symlink */ case 57: { struct symlink_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = (intptr_t) p->link; /* const char * */ *n_args = 2; break; } /* readlink */ case 58: { struct readlink_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = (intptr_t) p->buf; /* char * */ uarg[2] = p->count; /* size_t */ *n_args = 3; break; } /* freebsd32_execve */ case 59: { struct freebsd32_execve_args *p = params; uarg[0] = (intptr_t) p->fname; /* const char * */ uarg[1] = (intptr_t) p->argv; /* uint32_t * */ uarg[2] = (intptr_t) p->envv; /* uint32_t * */ *n_args = 3; break; } /* umask */ case 60: { struct umask_args *p = params; iarg[0] = p->newmask; /* mode_t */ *n_args = 1; break; } /* chroot */ case 61: { struct chroot_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* msync */ case 65: { struct msync_args *p = params; uarg[0] = (intptr_t) p->addr; /* void * */ uarg[1] = p->len; /* size_t */ iarg[2] = p->flags; /* int */ *n_args = 3; break; } /* vfork */ case 66: { *n_args = 0; break; } /* sbrk */ case 69: { struct sbrk_args *p = params; iarg[0] = p->incr; /* int */ *n_args = 1; break; } /* sstk */ case 70: { struct sstk_args *p = params; iarg[0] = p->incr; /* int */ *n_args = 1; break; } /* munmap */ case 73: { struct munmap_args *p = params; uarg[0] = (intptr_t) p->addr; /* void * */ uarg[1] = p->len; /* size_t */ *n_args = 2; break; } /* freebsd32_mprotect */ case 74: { struct freebsd32_mprotect_args *p = params; uarg[0] = (intptr_t) p->addr; /* void * */ uarg[1] = p->len; /* size_t */ iarg[2] = p->prot; /* int */ *n_args = 3; break; } /* madvise */ case 75: { struct madvise_args *p = params; uarg[0] = (intptr_t) p->addr; /* void * */ uarg[1] = p->len; /* size_t */ iarg[2] = p->behav; /* int */ *n_args = 3; break; } /* mincore */ case 78: { struct mincore_args *p = params; uarg[0] = (intptr_t) p->addr; /* const void * */ uarg[1] = p->len; /* size_t */ uarg[2] = (intptr_t) p->vec; /* char * */ *n_args = 3; break; } /* getgroups */ case 79: { struct getgroups_args *p = params; uarg[0] = p->gidsetsize; /* u_int */ uarg[1] = (intptr_t) p->gidset; /* gid_t * */ *n_args = 2; break; } /* setgroups */ case 80: { struct setgroups_args *p = params; uarg[0] = p->gidsetsize; /* u_int */ uarg[1] = (intptr_t) p->gidset; /* gid_t * */ *n_args = 2; break; } /* getpgrp */ case 81: { *n_args = 0; break; } /* setpgid */ case 82: { struct setpgid_args *p = params; iarg[0] = p->pid; /* int */ iarg[1] = p->pgid; /* int */ *n_args = 2; break; } /* freebsd32_setitimer */ case 83: { struct freebsd32_setitimer_args *p = params; uarg[0] = p->which; /* u_int */ uarg[1] = (intptr_t) p->itv; /* struct itimerval32 * */ uarg[2] = (intptr_t) p->oitv; /* struct itimerval32 * */ *n_args = 3; break; } /* swapon */ case 85: { struct swapon_args *p = params; uarg[0] = (intptr_t) p->name; /* const char * */ *n_args = 1; break; } /* freebsd32_getitimer */ case 86: { struct freebsd32_getitimer_args *p = 
params; uarg[0] = p->which; /* u_int */ uarg[1] = (intptr_t) p->itv; /* struct itimerval32 * */ *n_args = 2; break; } /* getdtablesize */ case 89: { *n_args = 0; break; } /* dup2 */ case 90: { struct dup2_args *p = params; uarg[0] = p->from; /* u_int */ uarg[1] = p->to; /* u_int */ *n_args = 2; break; } /* freebsd32_fcntl */ case 92: { struct freebsd32_fcntl_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->cmd; /* int */ iarg[2] = p->arg; /* int */ *n_args = 3; break; } /* freebsd32_select */ case 93: { struct freebsd32_select_args *p = params; iarg[0] = p->nd; /* int */ uarg[1] = (intptr_t) p->in; /* fd_set * */ uarg[2] = (intptr_t) p->ou; /* fd_set * */ uarg[3] = (intptr_t) p->ex; /* fd_set * */ uarg[4] = (intptr_t) p->tv; /* struct timeval32 * */ *n_args = 5; break; } /* fsync */ case 95: { struct fsync_args *p = params; iarg[0] = p->fd; /* int */ *n_args = 1; break; } /* setpriority */ case 96: { struct setpriority_args *p = params; iarg[0] = p->which; /* int */ iarg[1] = p->who; /* int */ iarg[2] = p->prio; /* int */ *n_args = 3; break; } /* socket */ case 97: { struct socket_args *p = params; iarg[0] = p->domain; /* int */ iarg[1] = p->type; /* int */ iarg[2] = p->protocol; /* int */ *n_args = 3; break; } /* connect */ case 98: { struct connect_args *p = params; iarg[0] = p->s; /* int */ uarg[1] = (intptr_t) p->name; /* const struct sockaddr * */ iarg[2] = p->namelen; /* int */ *n_args = 3; break; } /* getpriority */ case 100: { struct getpriority_args *p = params; iarg[0] = p->which; /* int */ iarg[1] = p->who; /* int */ *n_args = 2; break; } /* bind */ case 104: { struct bind_args *p = params; iarg[0] = p->s; /* int */ uarg[1] = (intptr_t) p->name; /* const struct sockaddr * */ iarg[2] = p->namelen; /* int */ *n_args = 3; break; } /* setsockopt */ case 105: { struct setsockopt_args *p = params; iarg[0] = p->s; /* int */ iarg[1] = p->level; /* int */ iarg[2] = p->name; /* int */ uarg[3] = (intptr_t) p->val; /* const void * */ iarg[4] = p->valsize; /* int */ *n_args = 5; break; } /* listen */ case 106: { struct listen_args *p = params; iarg[0] = p->s; /* int */ iarg[1] = p->backlog; /* int */ *n_args = 2; break; } /* freebsd32_gettimeofday */ case 116: { struct freebsd32_gettimeofday_args *p = params; uarg[0] = (intptr_t) p->tp; /* struct timeval32 * */ uarg[1] = (intptr_t) p->tzp; /* struct timezone * */ *n_args = 2; break; } /* freebsd32_getrusage */ case 117: { struct freebsd32_getrusage_args *p = params; iarg[0] = p->who; /* int */ uarg[1] = (intptr_t) p->rusage; /* struct rusage32 * */ *n_args = 2; break; } /* getsockopt */ case 118: { struct getsockopt_args *p = params; iarg[0] = p->s; /* int */ iarg[1] = p->level; /* int */ iarg[2] = p->name; /* int */ uarg[3] = (intptr_t) p->val; /* void * */ uarg[4] = (intptr_t) p->avalsize; /* int * */ *n_args = 5; break; } /* freebsd32_readv */ case 120: { struct freebsd32_readv_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->iovp; /* struct iovec32 * */ uarg[2] = p->iovcnt; /* u_int */ *n_args = 3; break; } /* freebsd32_writev */ case 121: { struct freebsd32_writev_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->iovp; /* struct iovec32 * */ uarg[2] = p->iovcnt; /* u_int */ *n_args = 3; break; } /* freebsd32_settimeofday */ case 122: { struct freebsd32_settimeofday_args *p = params; uarg[0] = (intptr_t) p->tv; /* struct timeval32 * */ uarg[1] = (intptr_t) p->tzp; /* struct timezone * */ *n_args = 2; break; } /* fchown */ case 123: { struct fchown_args *p = params; iarg[0] = p->fd; /* int 
*/ iarg[1] = p->uid; /* int */ iarg[2] = p->gid; /* int */ *n_args = 3; break; } /* fchmod */ case 124: { struct fchmod_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->mode; /* mode_t */ *n_args = 2; break; } /* setreuid */ case 126: { struct setreuid_args *p = params; iarg[0] = p->ruid; /* int */ iarg[1] = p->euid; /* int */ *n_args = 2; break; } /* setregid */ case 127: { struct setregid_args *p = params; iarg[0] = p->rgid; /* int */ iarg[1] = p->egid; /* int */ *n_args = 2; break; } /* rename */ case 128: { struct rename_args *p = params; uarg[0] = (intptr_t) p->from; /* const char * */ uarg[1] = (intptr_t) p->to; /* const char * */ *n_args = 2; break; } /* flock */ case 131: { struct flock_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->how; /* int */ *n_args = 2; break; } /* mkfifo */ case 132: { struct mkfifo_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->mode; /* mode_t */ *n_args = 2; break; } /* sendto */ case 133: { struct sendto_args *p = params; iarg[0] = p->s; /* int */ uarg[1] = (intptr_t) p->buf; /* const void * */ uarg[2] = p->len; /* size_t */ iarg[3] = p->flags; /* int */ uarg[4] = (intptr_t) p->to; /* const struct sockaddr * */ iarg[5] = p->tolen; /* int */ *n_args = 6; break; } /* shutdown */ case 134: { struct shutdown_args *p = params; iarg[0] = p->s; /* int */ iarg[1] = p->how; /* int */ *n_args = 2; break; } /* socketpair */ case 135: { struct socketpair_args *p = params; iarg[0] = p->domain; /* int */ iarg[1] = p->type; /* int */ iarg[2] = p->protocol; /* int */ uarg[3] = (intptr_t) p->rsv; /* int * */ *n_args = 4; break; } /* mkdir */ case 136: { struct mkdir_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->mode; /* mode_t */ *n_args = 2; break; } /* rmdir */ case 137: { struct rmdir_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* freebsd32_utimes */ case 138: { struct freebsd32_utimes_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = (intptr_t) p->tptr; /* struct timeval32 * */ *n_args = 2; break; } /* freebsd32_adjtime */ case 140: { struct freebsd32_adjtime_args *p = params; uarg[0] = (intptr_t) p->delta; /* struct timeval32 * */ uarg[1] = (intptr_t) p->olddelta; /* struct timeval32 * */ *n_args = 2; break; } /* setsid */ case 147: { *n_args = 0; break; } /* quotactl */ case 148: { struct quotactl_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->cmd; /* int */ iarg[2] = p->uid; /* int */ uarg[3] = (intptr_t) p->arg; /* void * */ *n_args = 4; break; } /* getfh */ case 161: { struct getfh_args *p = params; uarg[0] = (intptr_t) p->fname; /* const char * */ uarg[1] = (intptr_t) p->fhp; /* struct fhandle * */ *n_args = 2; break; } /* freebsd32_sysarch */ case 165: { struct freebsd32_sysarch_args *p = params; iarg[0] = p->op; /* int */ uarg[1] = (intptr_t) p->parms; /* char * */ *n_args = 2; break; } /* rtprio */ case 166: { struct rtprio_args *p = params; iarg[0] = p->function; /* int */ iarg[1] = p->pid; /* pid_t */ uarg[2] = (intptr_t) p->rtp; /* struct rtprio * */ *n_args = 3; break; } /* freebsd32_semsys */ case 169: { struct freebsd32_semsys_args *p = params; iarg[0] = p->which; /* int */ iarg[1] = p->a2; /* int */ iarg[2] = p->a3; /* int */ iarg[3] = p->a4; /* int */ iarg[4] = p->a5; /* int */ *n_args = 5; break; } /* freebsd32_msgsys */ case 170: { struct freebsd32_msgsys_args *p = params; iarg[0] = p->which; /* int */ iarg[1] = p->a2; /* int */ iarg[2] = p->a3; /* int */ 
iarg[3] = p->a4; /* int */ iarg[4] = p->a5; /* int */ iarg[5] = p->a6; /* int */ *n_args = 6; break; } /* freebsd32_shmsys */ case 171: { struct freebsd32_shmsys_args *p = params; uarg[0] = p->which; /* uint32_t */ uarg[1] = p->a2; /* uint32_t */ uarg[2] = p->a3; /* uint32_t */ uarg[3] = p->a4; /* uint32_t */ *n_args = 4; break; } /* ntp_adjtime */ case 176: { struct ntp_adjtime_args *p = params; uarg[0] = (intptr_t) p->tp; /* struct timex * */ *n_args = 1; break; } /* setgid */ case 181: { struct setgid_args *p = params; iarg[0] = p->gid; /* gid_t */ *n_args = 1; break; } /* setegid */ case 182: { struct setegid_args *p = params; iarg[0] = p->egid; /* gid_t */ *n_args = 1; break; } /* seteuid */ case 183: { struct seteuid_args *p = params; uarg[0] = p->euid; /* uid_t */ *n_args = 1; break; } /* pathconf */ case 191: { struct pathconf_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->name; /* int */ *n_args = 2; break; } /* fpathconf */ case 192: { struct fpathconf_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->name; /* int */ *n_args = 2; break; } /* getrlimit */ case 194: { struct __getrlimit_args *p = params; uarg[0] = p->which; /* u_int */ uarg[1] = (intptr_t) p->rlp; /* struct rlimit * */ *n_args = 2; break; } /* setrlimit */ case 195: { struct __setrlimit_args *p = params; uarg[0] = p->which; /* u_int */ uarg[1] = (intptr_t) p->rlp; /* struct rlimit * */ *n_args = 2; break; } /* nosys */ case 198: { *n_args = 0; break; } /* freebsd32___sysctl */ case 202: { struct freebsd32___sysctl_args *p = params; uarg[0] = (intptr_t) p->name; /* int * */ uarg[1] = p->namelen; /* u_int */ uarg[2] = (intptr_t) p->old; /* void * */ uarg[3] = (intptr_t) p->oldlenp; /* uint32_t * */ uarg[4] = (intptr_t) p->new; /* const void * */ uarg[5] = p->newlen; /* uint32_t */ *n_args = 6; break; } /* mlock */ case 203: { struct mlock_args *p = params; uarg[0] = (intptr_t) p->addr; /* const void * */ uarg[1] = p->len; /* size_t */ *n_args = 2; break; } /* munlock */ case 204: { struct munlock_args *p = params; uarg[0] = (intptr_t) p->addr; /* const void * */ uarg[1] = p->len; /* size_t */ *n_args = 2; break; } /* undelete */ case 205: { struct undelete_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* freebsd32_futimes */ case 206: { struct freebsd32_futimes_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->tptr; /* struct timeval32 * */ *n_args = 2; break; } /* getpgid */ case 207: { struct getpgid_args *p = params; iarg[0] = p->pid; /* pid_t */ *n_args = 1; break; } /* poll */ case 209: { struct poll_args *p = params; uarg[0] = (intptr_t) p->fds; /* struct pollfd * */ uarg[1] = p->nfds; /* u_int */ iarg[2] = p->timeout; /* int */ *n_args = 3; break; } /* lkmnosys */ case 210: { *n_args = 0; break; } /* lkmnosys */ case 211: { *n_args = 0; break; } /* lkmnosys */ case 212: { *n_args = 0; break; } /* lkmnosys */ case 213: { *n_args = 0; break; } /* lkmnosys */ case 214: { *n_args = 0; break; } /* lkmnosys */ case 215: { *n_args = 0; break; } /* lkmnosys */ case 216: { *n_args = 0; break; } /* lkmnosys */ case 217: { *n_args = 0; break; } /* lkmnosys */ case 218: { *n_args = 0; break; } /* lkmnosys */ case 219: { *n_args = 0; break; } /* semget */ case 221: { struct semget_args *p = params; iarg[0] = p->key; /* key_t */ iarg[1] = p->nsems; /* int */ iarg[2] = p->semflg; /* int */ *n_args = 3; break; } /* semop */ case 222: { struct semop_args *p = params; iarg[0] = p->semid; /* int */ uarg[1] = (intptr_t) p->sops; 
/* struct sembuf * */ uarg[2] = p->nsops; /* u_int */ *n_args = 3; break; } /* msgget */ case 225: { struct msgget_args *p = params; iarg[0] = p->key; /* key_t */ iarg[1] = p->msgflg; /* int */ *n_args = 2; break; } /* freebsd32_msgsnd */ case 226: { struct freebsd32_msgsnd_args *p = params; iarg[0] = p->msqid; /* int */ uarg[1] = (intptr_t) p->msgp; /* void * */ uarg[2] = p->msgsz; /* size_t */ iarg[3] = p->msgflg; /* int */ *n_args = 4; break; } /* freebsd32_msgrcv */ case 227: { struct freebsd32_msgrcv_args *p = params; iarg[0] = p->msqid; /* int */ uarg[1] = (intptr_t) p->msgp; /* void * */ uarg[2] = p->msgsz; /* size_t */ iarg[3] = p->msgtyp; /* long */ iarg[4] = p->msgflg; /* int */ *n_args = 5; break; } /* shmat */ case 228: { struct shmat_args *p = params; iarg[0] = p->shmid; /* int */ uarg[1] = (intptr_t) p->shmaddr; /* void * */ iarg[2] = p->shmflg; /* int */ *n_args = 3; break; } /* shmdt */ case 230: { struct shmdt_args *p = params; uarg[0] = (intptr_t) p->shmaddr; /* void * */ *n_args = 1; break; } /* shmget */ case 231: { struct shmget_args *p = params; iarg[0] = p->key; /* key_t */ iarg[1] = p->size; /* int */ iarg[2] = p->shmflg; /* int */ *n_args = 3; break; } /* freebsd32_clock_gettime */ case 232: { struct freebsd32_clock_gettime_args *p = params; iarg[0] = p->clock_id; /* clockid_t */ uarg[1] = (intptr_t) p->tp; /* struct timespec32 * */ *n_args = 2; break; } /* freebsd32_clock_settime */ case 233: { struct freebsd32_clock_settime_args *p = params; iarg[0] = p->clock_id; /* clockid_t */ uarg[1] = (intptr_t) p->tp; /* const struct timespec32 * */ *n_args = 2; break; } /* freebsd32_clock_getres */ case 234: { struct freebsd32_clock_getres_args *p = params; iarg[0] = p->clock_id; /* clockid_t */ uarg[1] = (intptr_t) p->tp; /* struct timespec32 * */ *n_args = 2; break; } /* freebsd32_ktimer_create */ case 235: { struct freebsd32_ktimer_create_args *p = params; iarg[0] = p->clock_id; /* clockid_t */ uarg[1] = (intptr_t) p->evp; /* struct sigevent32 * */ uarg[2] = (intptr_t) p->timerid; /* int * */ *n_args = 3; break; } /* ktimer_delete */ case 236: { struct ktimer_delete_args *p = params; iarg[0] = p->timerid; /* int */ *n_args = 1; break; } /* freebsd32_ktimer_settime */ case 237: { struct freebsd32_ktimer_settime_args *p = params; iarg[0] = p->timerid; /* int */ iarg[1] = p->flags; /* int */ uarg[2] = (intptr_t) p->value; /* const struct itimerspec32 * */ uarg[3] = (intptr_t) p->ovalue; /* struct itimerspec32 * */ *n_args = 4; break; } /* freebsd32_ktimer_gettime */ case 238: { struct freebsd32_ktimer_gettime_args *p = params; iarg[0] = p->timerid; /* int */ uarg[1] = (intptr_t) p->value; /* struct itimerspec32 * */ *n_args = 2; break; } /* ktimer_getoverrun */ case 239: { struct ktimer_getoverrun_args *p = params; iarg[0] = p->timerid; /* int */ *n_args = 1; break; } /* freebsd32_nanosleep */ case 240: { struct freebsd32_nanosleep_args *p = params; uarg[0] = (intptr_t) p->rqtp; /* const struct timespec32 * */ uarg[1] = (intptr_t) p->rmtp; /* struct timespec32 * */ *n_args = 2; break; } /* ffclock_getcounter */ case 241: { struct ffclock_getcounter_args *p = params; uarg[0] = (intptr_t) p->ffcount; /* ffcounter * */ *n_args = 1; break; } /* ffclock_setestimate */ case 242: { struct ffclock_setestimate_args *p = params; uarg[0] = (intptr_t) p->cest; /* struct ffclock_estimate * */ *n_args = 1; break; } /* ffclock_getestimate */ case 243: { struct ffclock_getestimate_args *p = params; uarg[0] = (intptr_t) p->cest; /* struct ffclock_estimate * */ *n_args = 1; break; } /* 
freebsd32_clock_nanosleep */ case 244: { struct freebsd32_clock_nanosleep_args *p = params; iarg[0] = p->clock_id; /* clockid_t */ iarg[1] = p->flags; /* int */ uarg[2] = (intptr_t) p->rqtp; /* const struct timespec32 * */ uarg[3] = (intptr_t) p->rmtp; /* struct timespec32 * */ *n_args = 4; break; } /* freebsd32_clock_getcpuclockid2 */ case 247: { struct freebsd32_clock_getcpuclockid2_args *p = params; uarg[0] = p->id1; /* uint32_t */ uarg[1] = p->id2; /* uint32_t */ iarg[2] = p->which; /* int */ uarg[3] = (intptr_t) p->clock_id; /* clockid_t * */ *n_args = 4; break; } /* minherit */ case 250: { struct minherit_args *p = params; uarg[0] = (intptr_t) p->addr; /* void * */ uarg[1] = p->len; /* size_t */ iarg[2] = p->inherit; /* int */ *n_args = 3; break; } /* rfork */ case 251: { struct rfork_args *p = params; iarg[0] = p->flags; /* int */ *n_args = 1; break; } /* issetugid */ case 253: { *n_args = 0; break; } /* lchown */ case 254: { struct lchown_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->uid; /* int */ iarg[2] = p->gid; /* int */ *n_args = 3; break; } /* freebsd32_aio_read */ case 255: { struct freebsd32_aio_read_args *p = params; uarg[0] = (intptr_t) p->aiocbp; /* struct aiocb32 * */ *n_args = 1; break; } /* freebsd32_aio_write */ case 256: { struct freebsd32_aio_write_args *p = params; uarg[0] = (intptr_t) p->aiocbp; /* struct aiocb32 * */ *n_args = 1; break; } /* freebsd32_lio_listio */ case 257: { struct freebsd32_lio_listio_args *p = params; iarg[0] = p->mode; /* int */ uarg[1] = (intptr_t) p->acb_list; /* struct aiocb32 *const * */ iarg[2] = p->nent; /* int */ uarg[3] = (intptr_t) p->sig; /* struct sigevent32 * */ *n_args = 4; break; } /* lchmod */ case 274: { struct lchmod_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->mode; /* mode_t */ *n_args = 2; break; } /* freebsd32_lutimes */ case 276: { struct freebsd32_lutimes_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = (intptr_t) p->tptr; /* struct timeval32 * */ *n_args = 2; break; } /* freebsd32_preadv */ case 289: { struct freebsd32_preadv_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->iovp; /* struct iovec32 * */ uarg[2] = p->iovcnt; /* u_int */ uarg[3] = p->offset1; /* uint32_t */ uarg[4] = p->offset2; /* uint32_t */ *n_args = 5; break; } /* freebsd32_pwritev */ case 290: { struct freebsd32_pwritev_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->iovp; /* struct iovec32 * */ uarg[2] = p->iovcnt; /* u_int */ uarg[3] = p->offset1; /* uint32_t */ uarg[4] = p->offset2; /* uint32_t */ *n_args = 5; break; } /* fhopen */ case 298: { struct fhopen_args *p = params; uarg[0] = (intptr_t) p->u_fhp; /* const struct fhandle * */ iarg[1] = p->flags; /* int */ *n_args = 2; break; } /* modnext */ case 300: { struct modnext_args *p = params; iarg[0] = p->modid; /* int */ *n_args = 1; break; } /* freebsd32_modstat */ case 301: { struct freebsd32_modstat_args *p = params; iarg[0] = p->modid; /* int */ uarg[1] = (intptr_t) p->stat; /* struct module_stat32 * */ *n_args = 2; break; } /* modfnext */ case 302: { struct modfnext_args *p = params; iarg[0] = p->modid; /* int */ *n_args = 1; break; } /* modfind */ case 303: { struct modfind_args *p = params; uarg[0] = (intptr_t) p->name; /* const char * */ *n_args = 1; break; } /* kldload */ case 304: { struct kldload_args *p = params; uarg[0] = (intptr_t) p->file; /* const char * */ *n_args = 1; break; } /* kldunload */ case 305: { struct kldunload_args *p = params; 
iarg[0] = p->fileid; /* int */ *n_args = 1; break; } /* kldfind */ case 306: { struct kldfind_args *p = params; uarg[0] = (intptr_t) p->file; /* const char * */ *n_args = 1; break; } /* kldnext */ case 307: { struct kldnext_args *p = params; iarg[0] = p->fileid; /* int */ *n_args = 1; break; } /* freebsd32_kldstat */ case 308: { struct freebsd32_kldstat_args *p = params; iarg[0] = p->fileid; /* int */ uarg[1] = (intptr_t) p->stat; /* struct kld32_file_stat * */ *n_args = 2; break; } /* kldfirstmod */ case 309: { struct kldfirstmod_args *p = params; iarg[0] = p->fileid; /* int */ *n_args = 1; break; } /* getsid */ case 310: { struct getsid_args *p = params; iarg[0] = p->pid; /* pid_t */ *n_args = 1; break; } /* setresuid */ case 311: { struct setresuid_args *p = params; uarg[0] = p->ruid; /* uid_t */ uarg[1] = p->euid; /* uid_t */ uarg[2] = p->suid; /* uid_t */ *n_args = 3; break; } /* setresgid */ case 312: { struct setresgid_args *p = params; iarg[0] = p->rgid; /* gid_t */ iarg[1] = p->egid; /* gid_t */ iarg[2] = p->sgid; /* gid_t */ *n_args = 3; break; } /* freebsd32_aio_return */ case 314: { struct freebsd32_aio_return_args *p = params; uarg[0] = (intptr_t) p->aiocbp; /* struct aiocb32 * */ *n_args = 1; break; } /* freebsd32_aio_suspend */ case 315: { struct freebsd32_aio_suspend_args *p = params; uarg[0] = (intptr_t) p->aiocbp; /* struct aiocb32 *const * */ iarg[1] = p->nent; /* int */ uarg[2] = (intptr_t) p->timeout; /* const struct timespec32 * */ *n_args = 3; break; } /* aio_cancel */ case 316: { struct aio_cancel_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->aiocbp; /* struct aiocb * */ *n_args = 2; break; } /* freebsd32_aio_error */ case 317: { struct freebsd32_aio_error_args *p = params; uarg[0] = (intptr_t) p->aiocbp; /* struct aiocb32 * */ *n_args = 1; break; } /* yield */ case 321: { *n_args = 0; break; } /* mlockall */ case 324: { struct mlockall_args *p = params; iarg[0] = p->how; /* int */ *n_args = 1; break; } /* munlockall */ case 325: { *n_args = 0; break; } /* __getcwd */ case 326: { struct __getcwd_args *p = params; uarg[0] = (intptr_t) p->buf; /* char * */ uarg[1] = p->buflen; /* size_t */ *n_args = 2; break; } /* sched_setparam */ case 327: { struct sched_setparam_args *p = params; iarg[0] = p->pid; /* pid_t */ uarg[1] = (intptr_t) p->param; /* const struct sched_param * */ *n_args = 2; break; } /* sched_getparam */ case 328: { struct sched_getparam_args *p = params; iarg[0] = p->pid; /* pid_t */ uarg[1] = (intptr_t) p->param; /* struct sched_param * */ *n_args = 2; break; } /* sched_setscheduler */ case 329: { struct sched_setscheduler_args *p = params; iarg[0] = p->pid; /* pid_t */ iarg[1] = p->policy; /* int */ uarg[2] = (intptr_t) p->param; /* const struct sched_param * */ *n_args = 3; break; } /* sched_getscheduler */ case 330: { struct sched_getscheduler_args *p = params; iarg[0] = p->pid; /* pid_t */ *n_args = 1; break; } /* sched_yield */ case 331: { *n_args = 0; break; } /* sched_get_priority_max */ case 332: { struct sched_get_priority_max_args *p = params; iarg[0] = p->policy; /* int */ *n_args = 1; break; } /* sched_get_priority_min */ case 333: { struct sched_get_priority_min_args *p = params; iarg[0] = p->policy; /* int */ *n_args = 1; break; } /* freebsd32_sched_rr_get_interval */ case 334: { struct freebsd32_sched_rr_get_interval_args *p = params; iarg[0] = p->pid; /* pid_t */ uarg[1] = (intptr_t) p->interval; /* struct timespec32 * */ *n_args = 2; break; } /* utrace */ case 335: { struct utrace_args *p = params; uarg[0] = 
(intptr_t) p->addr; /* const void * */ uarg[1] = p->len; /* size_t */ *n_args = 2; break; } /* kldsym */ case 337: { struct kldsym_args *p = params; iarg[0] = p->fileid; /* int */ iarg[1] = p->cmd; /* int */ uarg[2] = (intptr_t) p->data; /* void * */ *n_args = 3; break; } /* freebsd32_jail */ case 338: { struct freebsd32_jail_args *p = params; uarg[0] = (intptr_t) p->jail; /* struct jail32 * */ *n_args = 1; break; } /* sigprocmask */ case 340: { struct sigprocmask_args *p = params; iarg[0] = p->how; /* int */ uarg[1] = (intptr_t) p->set; /* const sigset_t * */ uarg[2] = (intptr_t) p->oset; /* sigset_t * */ *n_args = 3; break; } /* sigsuspend */ case 341: { struct sigsuspend_args *p = params; uarg[0] = (intptr_t) p->sigmask; /* const sigset_t * */ *n_args = 1; break; } /* sigpending */ case 343: { struct sigpending_args *p = params; uarg[0] = (intptr_t) p->set; /* sigset_t * */ *n_args = 1; break; } /* freebsd32_sigtimedwait */ case 345: { struct freebsd32_sigtimedwait_args *p = params; uarg[0] = (intptr_t) p->set; /* const sigset_t * */ uarg[1] = (intptr_t) p->info; /* siginfo_t * */ uarg[2] = (intptr_t) p->timeout; /* const struct timespec * */ *n_args = 3; break; } /* freebsd32_sigwaitinfo */ case 346: { struct freebsd32_sigwaitinfo_args *p = params; uarg[0] = (intptr_t) p->set; /* const sigset_t * */ uarg[1] = (intptr_t) p->info; /* siginfo_t * */ *n_args = 2; break; } /* __acl_get_file */ case 347: { struct __acl_get_file_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* __acl_set_file */ case 348: { struct __acl_set_file_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* __acl_get_fd */ case 349: { struct __acl_get_fd_args *p = params; iarg[0] = p->filedes; /* int */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* __acl_set_fd */ case 350: { struct __acl_set_fd_args *p = params; iarg[0] = p->filedes; /* int */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* __acl_delete_file */ case 351: { struct __acl_delete_file_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->type; /* acl_type_t */ *n_args = 2; break; } /* __acl_delete_fd */ case 352: { struct __acl_delete_fd_args *p = params; iarg[0] = p->filedes; /* int */ iarg[1] = p->type; /* acl_type_t */ *n_args = 2; break; } /* __acl_aclcheck_file */ case 353: { struct __acl_aclcheck_file_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* __acl_aclcheck_fd */ case 354: { struct __acl_aclcheck_fd_args *p = params; iarg[0] = p->filedes; /* int */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* extattrctl */ case 355: { struct extattrctl_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->cmd; /* int */ uarg[2] = (intptr_t) p->filename; /* const char * */ iarg[3] = p->attrnamespace; /* int */ uarg[4] = (intptr_t) p->attrname; /* const char * */ *n_args = 5; break; } /* extattr_set_file */ case 356: { struct extattr_set_file_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->attrnamespace; /* int */ 
uarg[2] = (intptr_t) p->attrname; /* const char * */ uarg[3] = (intptr_t) p->data; /* void * */ uarg[4] = p->nbytes; /* size_t */ *n_args = 5; break; } /* extattr_get_file */ case 357: { struct extattr_get_file_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ uarg[3] = (intptr_t) p->data; /* void * */ uarg[4] = p->nbytes; /* size_t */ *n_args = 5; break; } /* extattr_delete_file */ case 358: { struct extattr_delete_file_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ *n_args = 3; break; } /* freebsd32_aio_waitcomplete */ case 359: { struct freebsd32_aio_waitcomplete_args *p = params; uarg[0] = (intptr_t) p->aiocbp; /* struct aiocb32 ** */ uarg[1] = (intptr_t) p->timeout; /* struct timespec32 * */ *n_args = 2; break; } /* getresuid */ case 360: { struct getresuid_args *p = params; uarg[0] = (intptr_t) p->ruid; /* uid_t * */ uarg[1] = (intptr_t) p->euid; /* uid_t * */ uarg[2] = (intptr_t) p->suid; /* uid_t * */ *n_args = 3; break; } /* getresgid */ case 361: { struct getresgid_args *p = params; uarg[0] = (intptr_t) p->rgid; /* gid_t * */ uarg[1] = (intptr_t) p->egid; /* gid_t * */ uarg[2] = (intptr_t) p->sgid; /* gid_t * */ *n_args = 3; break; } /* kqueue */ case 362: { *n_args = 0; break; } /* extattr_set_fd */ case 371: { struct extattr_set_fd_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ uarg[3] = (intptr_t) p->data; /* void * */ uarg[4] = p->nbytes; /* size_t */ *n_args = 5; break; } /* extattr_get_fd */ case 372: { struct extattr_get_fd_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ uarg[3] = (intptr_t) p->data; /* void * */ uarg[4] = p->nbytes; /* size_t */ *n_args = 5; break; } /* extattr_delete_fd */ case 373: { struct extattr_delete_fd_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ *n_args = 3; break; } /* __setugid */ case 374: { struct __setugid_args *p = params; iarg[0] = p->flag; /* int */ *n_args = 1; break; } /* eaccess */ case 376: { struct eaccess_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->amode; /* int */ *n_args = 2; break; } /* freebsd32_nmount */ case 378: { struct freebsd32_nmount_args *p = params; uarg[0] = (intptr_t) p->iovp; /* struct iovec32 * */ uarg[1] = p->iovcnt; /* unsigned int */ iarg[2] = p->flags; /* int */ *n_args = 3; break; } /* kenv */ case 390: { struct kenv_args *p = params; iarg[0] = p->what; /* int */ uarg[1] = (intptr_t) p->name; /* const char * */ uarg[2] = (intptr_t) p->value; /* char * */ iarg[3] = p->len; /* int */ *n_args = 4; break; } /* lchflags */ case 391: { struct lchflags_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = p->flags; /* u_long */ *n_args = 2; break; } /* uuidgen */ case 392: { struct uuidgen_args *p = params; uarg[0] = (intptr_t) p->store; /* struct uuid * */ iarg[1] = p->count; /* int */ *n_args = 2; break; } /* freebsd32_sendfile */ case 393: { struct freebsd32_sendfile_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->s; /* int */ uarg[2] = p->offset1; /* uint32_t */ uarg[3] = p->offset2; /* uint32_t */ uarg[4] = p->nbytes; /* size_t */ uarg[5] = (intptr_t) p->hdtr; /* 
struct sf_hdtr32 * */ uarg[6] = (intptr_t) p->sbytes; /* off_t * */ iarg[7] = p->flags; /* int */ *n_args = 8; break; } /* ksem_close */ case 400: { struct ksem_close_args *p = params; iarg[0] = p->id; /* semid_t */ *n_args = 1; break; } /* ksem_post */ case 401: { struct ksem_post_args *p = params; iarg[0] = p->id; /* semid_t */ *n_args = 1; break; } /* ksem_wait */ case 402: { struct ksem_wait_args *p = params; iarg[0] = p->id; /* semid_t */ *n_args = 1; break; } /* ksem_trywait */ case 403: { struct ksem_trywait_args *p = params; iarg[0] = p->id; /* semid_t */ *n_args = 1; break; } /* freebsd32_ksem_init */ case 404: { struct freebsd32_ksem_init_args *p = params; uarg[0] = (intptr_t) p->idp; /* semid_t * */ uarg[1] = p->value; /* unsigned int */ *n_args = 2; break; } /* freebsd32_ksem_open */ case 405: { struct freebsd32_ksem_open_args *p = params; uarg[0] = (intptr_t) p->idp; /* semid_t * */ uarg[1] = (intptr_t) p->name; /* const char * */ iarg[2] = p->oflag; /* int */ iarg[3] = p->mode; /* mode_t */ uarg[4] = p->value; /* unsigned int */ *n_args = 5; break; } /* ksem_unlink */ case 406: { struct ksem_unlink_args *p = params; uarg[0] = (intptr_t) p->name; /* const char * */ *n_args = 1; break; } /* ksem_getvalue */ case 407: { struct ksem_getvalue_args *p = params; iarg[0] = p->id; /* semid_t */ uarg[1] = (intptr_t) p->val; /* int * */ *n_args = 2; break; } /* ksem_destroy */ case 408: { struct ksem_destroy_args *p = params; iarg[0] = p->id; /* semid_t */ *n_args = 1; break; } /* extattr_set_link */ case 412: { struct extattr_set_link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ uarg[3] = (intptr_t) p->data; /* void * */ uarg[4] = p->nbytes; /* size_t */ *n_args = 5; break; } /* extattr_get_link */ case 413: { struct extattr_get_link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ uarg[3] = (intptr_t) p->data; /* void * */ uarg[4] = p->nbytes; /* size_t */ *n_args = 5; break; } /* extattr_delete_link */ case 414: { struct extattr_delete_link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ *n_args = 3; break; } /* freebsd32_sigaction */ case 416: { struct freebsd32_sigaction_args *p = params; iarg[0] = p->sig; /* int */ uarg[1] = (intptr_t) p->act; /* struct sigaction32 * */ uarg[2] = (intptr_t) p->oact; /* struct sigaction32 * */ *n_args = 3; break; } /* freebsd32_sigreturn */ case 417: { struct freebsd32_sigreturn_args *p = params; uarg[0] = (intptr_t) p->sigcntxp; /* const struct freebsd32_ucontext * */ *n_args = 1; break; } /* freebsd32_getcontext */ case 421: { struct freebsd32_getcontext_args *p = params; uarg[0] = (intptr_t) p->ucp; /* struct freebsd32_ucontext * */ *n_args = 1; break; } /* freebsd32_setcontext */ case 422: { struct freebsd32_setcontext_args *p = params; uarg[0] = (intptr_t) p->ucp; /* const struct freebsd32_ucontext * */ *n_args = 1; break; } /* freebsd32_swapcontext */ case 423: { struct freebsd32_swapcontext_args *p = params; uarg[0] = (intptr_t) p->oucp; /* struct freebsd32_ucontext * */ uarg[1] = (intptr_t) p->ucp; /* const struct freebsd32_ucontext * */ *n_args = 2; break; } /* __acl_get_link */ case 425: { struct __acl_get_link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->type; /* acl_type_t */ 
uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* __acl_set_link */ case 426: { struct __acl_set_link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* __acl_delete_link */ case 427: { struct __acl_delete_link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->type; /* acl_type_t */ *n_args = 2; break; } /* __acl_aclcheck_link */ case 428: { struct __acl_aclcheck_link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* sigwait */ case 429: { struct sigwait_args *p = params; uarg[0] = (intptr_t) p->set; /* const sigset_t * */ uarg[1] = (intptr_t) p->sig; /* int * */ *n_args = 2; break; } /* thr_exit */ case 431: { struct thr_exit_args *p = params; uarg[0] = (intptr_t) p->state; /* long * */ *n_args = 1; break; } /* thr_self */ case 432: { struct thr_self_args *p = params; uarg[0] = (intptr_t) p->id; /* long * */ *n_args = 1; break; } /* thr_kill */ case 433: { struct thr_kill_args *p = params; iarg[0] = p->id; /* long */ iarg[1] = p->sig; /* int */ *n_args = 2; break; } /* jail_attach */ case 436: { struct jail_attach_args *p = params; iarg[0] = p->jid; /* int */ *n_args = 1; break; } /* extattr_list_fd */ case 437: { struct extattr_list_fd_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->data; /* void * */ uarg[3] = p->nbytes; /* size_t */ *n_args = 4; break; } /* extattr_list_file */ case 438: { struct extattr_list_file_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->data; /* void * */ uarg[3] = p->nbytes; /* size_t */ *n_args = 4; break; } /* extattr_list_link */ case 439: { struct extattr_list_link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->data; /* void * */ uarg[3] = p->nbytes; /* size_t */ *n_args = 4; break; } /* freebsd32_ksem_timedwait */ case 441: { struct freebsd32_ksem_timedwait_args *p = params; iarg[0] = p->id; /* semid_t */ uarg[1] = (intptr_t) p->abstime; /* const struct timespec32 * */ *n_args = 2; break; } /* freebsd32_thr_suspend */ case 442: { struct freebsd32_thr_suspend_args *p = params; uarg[0] = (intptr_t) p->timeout; /* const struct timespec32 * */ *n_args = 1; break; } /* thr_wake */ case 443: { struct thr_wake_args *p = params; iarg[0] = p->id; /* long */ *n_args = 1; break; } /* kldunloadf */ case 444: { struct kldunloadf_args *p = params; iarg[0] = p->fileid; /* int */ iarg[1] = p->flags; /* int */ *n_args = 2; break; } /* audit */ case 445: { struct audit_args *p = params; uarg[0] = (intptr_t) p->record; /* const void * */ uarg[1] = p->length; /* u_int */ *n_args = 2; break; } /* auditon */ case 446: { struct auditon_args *p = params; iarg[0] = p->cmd; /* int */ uarg[1] = (intptr_t) p->data; /* void * */ uarg[2] = p->length; /* u_int */ *n_args = 3; break; } /* getauid */ case 447: { struct getauid_args *p = params; uarg[0] = (intptr_t) p->auid; /* uid_t * */ *n_args = 1; break; } /* setauid */ case 448: { struct setauid_args *p = params; uarg[0] = (intptr_t) p->auid; /* uid_t * */ *n_args = 1; break; } /* getaudit */ case 449: { struct getaudit_args *p = params; uarg[0] = (intptr_t) p->auditinfo; /* struct auditinfo * */ *n_args = 1; 
break; } /* setaudit */ case 450: { struct setaudit_args *p = params; uarg[0] = (intptr_t) p->auditinfo; /* struct auditinfo * */ *n_args = 1; break; } /* getaudit_addr */ case 451: { struct getaudit_addr_args *p = params; uarg[0] = (intptr_t) p->auditinfo_addr; /* struct auditinfo_addr * */ uarg[1] = p->length; /* u_int */ *n_args = 2; break; } /* setaudit_addr */ case 452: { struct setaudit_addr_args *p = params; uarg[0] = (intptr_t) p->auditinfo_addr; /* struct auditinfo_addr * */ uarg[1] = p->length; /* u_int */ *n_args = 2; break; } /* auditctl */ case 453: { struct auditctl_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* freebsd32__umtx_op */ case 454: { struct freebsd32__umtx_op_args *p = params; uarg[0] = (intptr_t) p->obj; /* void * */ iarg[1] = p->op; /* int */ uarg[2] = p->val; /* u_long */ uarg[3] = (intptr_t) p->uaddr; /* void * */ uarg[4] = (intptr_t) p->uaddr2; /* void * */ *n_args = 5; break; } /* freebsd32_thr_new */ case 455: { struct freebsd32_thr_new_args *p = params; uarg[0] = (intptr_t) p->param; /* struct thr_param32 * */ iarg[1] = p->param_size; /* int */ *n_args = 2; break; } /* freebsd32_sigqueue */ case 456: { struct freebsd32_sigqueue_args *p = params; iarg[0] = p->pid; /* pid_t */ iarg[1] = p->signum; /* int */ iarg[2] = p->value; /* int */ *n_args = 3; break; } /* freebsd32_kmq_open */ case 457: { struct freebsd32_kmq_open_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->flags; /* int */ iarg[2] = p->mode; /* mode_t */ uarg[3] = (intptr_t) p->attr; /* const struct mq_attr32 * */ *n_args = 4; break; } /* freebsd32_kmq_setattr */ case 458: { struct freebsd32_kmq_setattr_args *p = params; iarg[0] = p->mqd; /* int */ uarg[1] = (intptr_t) p->attr; /* const struct mq_attr32 * */ uarg[2] = (intptr_t) p->oattr; /* struct mq_attr32 * */ *n_args = 3; break; } /* freebsd32_kmq_timedreceive */ case 459: { struct freebsd32_kmq_timedreceive_args *p = params; iarg[0] = p->mqd; /* int */ uarg[1] = (intptr_t) p->msg_ptr; /* char * */ uarg[2] = p->msg_len; /* size_t */ uarg[3] = (intptr_t) p->msg_prio; /* unsigned * */ uarg[4] = (intptr_t) p->abs_timeout; /* const struct timespec32 * */ *n_args = 5; break; } /* freebsd32_kmq_timedsend */ case 460: { struct freebsd32_kmq_timedsend_args *p = params; iarg[0] = p->mqd; /* int */ uarg[1] = (intptr_t) p->msg_ptr; /* const char * */ uarg[2] = p->msg_len; /* size_t */ uarg[3] = p->msg_prio; /* unsigned */ uarg[4] = (intptr_t) p->abs_timeout; /* const struct timespec32 * */ *n_args = 5; break; } /* freebsd32_kmq_notify */ case 461: { struct freebsd32_kmq_notify_args *p = params; iarg[0] = p->mqd; /* int */ uarg[1] = (intptr_t) p->sigev; /* const struct sigevent32 * */ *n_args = 2; break; } /* kmq_unlink */ case 462: { struct kmq_unlink_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* abort2 */ case 463: { struct abort2_args *p = params; uarg[0] = (intptr_t) p->why; /* const char * */ iarg[1] = p->nargs; /* int */ uarg[2] = (intptr_t) p->args; /* void ** */ *n_args = 3; break; } /* thr_set_name */ case 464: { struct thr_set_name_args *p = params; iarg[0] = p->id; /* long */ uarg[1] = (intptr_t) p->name; /* const char * */ *n_args = 2; break; } /* freebsd32_aio_fsync */ case 465: { struct freebsd32_aio_fsync_args *p = params; iarg[0] = p->op; /* int */ uarg[1] = (intptr_t) p->aiocbp; /* struct aiocb32 * */ *n_args = 2; break; } /* rtprio_thread */ case 466: { struct rtprio_thread_args *p = params; iarg[0] = p->function; /* 
int */ iarg[1] = p->lwpid; /* lwpid_t */ uarg[2] = (intptr_t) p->rtp; /* struct rtprio * */ *n_args = 3; break; } /* sctp_peeloff */ case 471: { struct sctp_peeloff_args *p = params; iarg[0] = p->sd; /* int */ uarg[1] = p->name; /* uint32_t */ *n_args = 2; break; } /* sctp_generic_sendmsg */ case 472: { struct sctp_generic_sendmsg_args *p = params; iarg[0] = p->sd; /* int */ uarg[1] = (intptr_t) p->msg; /* void * */ iarg[2] = p->mlen; /* int */ uarg[3] = (intptr_t) p->to; /* struct sockaddr * */ iarg[4] = p->tolen; /* __socklen_t */ uarg[5] = (intptr_t) p->sinfo; /* struct sctp_sndrcvinfo * */ iarg[6] = p->flags; /* int */ *n_args = 7; break; } /* sctp_generic_sendmsg_iov */ case 473: { struct sctp_generic_sendmsg_iov_args *p = params; iarg[0] = p->sd; /* int */ uarg[1] = (intptr_t) p->iov; /* struct iovec * */ iarg[2] = p->iovlen; /* int */ uarg[3] = (intptr_t) p->to; /* struct sockaddr * */ iarg[4] = p->tolen; /* __socklen_t */ uarg[5] = (intptr_t) p->sinfo; /* struct sctp_sndrcvinfo * */ iarg[6] = p->flags; /* int */ *n_args = 7; break; } /* sctp_generic_recvmsg */ case 474: { struct sctp_generic_recvmsg_args *p = params; iarg[0] = p->sd; /* int */ uarg[1] = (intptr_t) p->iov; /* struct iovec * */ iarg[2] = p->iovlen; /* int */ uarg[3] = (intptr_t) p->from; /* struct sockaddr * */ uarg[4] = (intptr_t) p->fromlenaddr; /* __socklen_t * */ uarg[5] = (intptr_t) p->sinfo; /* struct sctp_sndrcvinfo * */ uarg[6] = (intptr_t) p->msg_flags; /* int * */ *n_args = 7; break; } #ifdef PAD64_REQUIRED /* freebsd32_pread */ case 475: { struct freebsd32_pread_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->buf; /* void * */ uarg[2] = p->nbyte; /* size_t */ iarg[3] = p->pad; /* int */ uarg[4] = p->offset1; /* uint32_t */ uarg[5] = p->offset2; /* uint32_t */ *n_args = 6; break; } /* freebsd32_pwrite */ case 476: { struct freebsd32_pwrite_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->buf; /* const void * */ uarg[2] = p->nbyte; /* size_t */ iarg[3] = p->pad; /* int */ uarg[4] = p->offset1; /* uint32_t */ uarg[5] = p->offset2; /* uint32_t */ *n_args = 6; break; } /* freebsd32_mmap */ case 477: { struct freebsd32_mmap_args *p = params; uarg[0] = (intptr_t) p->addr; /* void * */ uarg[1] = p->len; /* size_t */ iarg[2] = p->prot; /* int */ iarg[3] = p->flags; /* int */ iarg[4] = p->fd; /* int */ iarg[5] = p->pad; /* int */ uarg[6] = p->pos1; /* uint32_t */ uarg[7] = p->pos2; /* uint32_t */ *n_args = 8; break; } /* freebsd32_lseek */ case 478: { struct freebsd32_lseek_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->pad; /* int */ uarg[2] = p->offset1; /* uint32_t */ uarg[3] = p->offset2; /* uint32_t */ iarg[4] = p->whence; /* int */ *n_args = 5; break; } /* freebsd32_truncate */ case 479: { struct freebsd32_truncate_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->pad; /* int */ uarg[2] = p->length1; /* uint32_t */ uarg[3] = p->length2; /* uint32_t */ *n_args = 4; break; } /* freebsd32_ftruncate */ case 480: { struct freebsd32_ftruncate_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->pad; /* int */ uarg[2] = p->length1; /* uint32_t */ uarg[3] = p->length2; /* uint32_t */ *n_args = 4; break; } #else /* freebsd32_pread */ case 475: { struct freebsd32_pread_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->buf; /* void * */ uarg[2] = p->nbyte; /* size_t */ uarg[3] = p->offset1; /* uint32_t */ uarg[4] = p->offset2; /* uint32_t */ *n_args = 5; break; } /* freebsd32_pwrite */ case 476: { struct 
freebsd32_pwrite_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->buf; /* const void * */ uarg[2] = p->nbyte; /* size_t */ uarg[3] = p->offset1; /* uint32_t */ uarg[4] = p->offset2; /* uint32_t */ *n_args = 5; break; } /* freebsd32_mmap */ case 477: { struct freebsd32_mmap_args *p = params; uarg[0] = (intptr_t) p->addr; /* void * */ uarg[1] = p->len; /* size_t */ iarg[2] = p->prot; /* int */ iarg[3] = p->flags; /* int */ iarg[4] = p->fd; /* int */ uarg[5] = p->pos1; /* uint32_t */ uarg[6] = p->pos2; /* uint32_t */ *n_args = 7; break; } /* freebsd32_lseek */ case 478: { struct freebsd32_lseek_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = p->offset1; /* uint32_t */ uarg[2] = p->offset2; /* uint32_t */ iarg[3] = p->whence; /* int */ *n_args = 4; break; } /* freebsd32_truncate */ case 479: { struct freebsd32_truncate_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = p->length1; /* uint32_t */ uarg[2] = p->length2; /* uint32_t */ *n_args = 3; break; } /* freebsd32_ftruncate */ case 480: { struct freebsd32_ftruncate_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = p->length1; /* uint32_t */ uarg[2] = p->length2; /* uint32_t */ *n_args = 3; break; } #endif /* thr_kill2 */ case 481: { struct thr_kill2_args *p = params; iarg[0] = p->pid; /* pid_t */ iarg[1] = p->id; /* long */ iarg[2] = p->sig; /* int */ *n_args = 3; break; } /* shm_open */ case 482: { struct shm_open_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->flags; /* int */ iarg[2] = p->mode; /* mode_t */ *n_args = 3; break; } /* shm_unlink */ case 483: { struct shm_unlink_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* cpuset */ case 484: { struct cpuset_args *p = params; uarg[0] = (intptr_t) p->setid; /* cpusetid_t * */ *n_args = 1; break; } #ifdef PAD64_REQUIRED /* freebsd32_cpuset_setid */ case 485: { struct freebsd32_cpuset_setid_args *p = params; iarg[0] = p->which; /* cpuwhich_t */ iarg[1] = p->pad; /* int */ uarg[2] = p->id1; /* uint32_t */ uarg[3] = p->id2; /* uint32_t */ iarg[4] = p->setid; /* cpusetid_t */ *n_args = 5; break; } #else /* freebsd32_cpuset_setid */ case 485: { struct freebsd32_cpuset_setid_args *p = params; iarg[0] = p->which; /* cpuwhich_t */ uarg[1] = p->id1; /* uint32_t */ uarg[2] = p->id2; /* uint32_t */ iarg[3] = p->setid; /* cpusetid_t */ *n_args = 4; break; } #endif /* freebsd32_cpuset_getid */ case 486: { struct freebsd32_cpuset_getid_args *p = params; iarg[0] = p->level; /* cpulevel_t */ iarg[1] = p->which; /* cpuwhich_t */ uarg[2] = p->id1; /* uint32_t */ uarg[3] = p->id2; /* uint32_t */ uarg[4] = (intptr_t) p->setid; /* cpusetid_t * */ *n_args = 5; break; } /* freebsd32_cpuset_getaffinity */ case 487: { struct freebsd32_cpuset_getaffinity_args *p = params; iarg[0] = p->level; /* cpulevel_t */ iarg[1] = p->which; /* cpuwhich_t */ uarg[2] = p->id1; /* uint32_t */ uarg[3] = p->id2; /* uint32_t */ uarg[4] = p->cpusetsize; /* size_t */ uarg[5] = (intptr_t) p->mask; /* cpuset_t * */ *n_args = 6; break; } /* freebsd32_cpuset_setaffinity */ case 488: { struct freebsd32_cpuset_setaffinity_args *p = params; iarg[0] = p->level; /* cpulevel_t */ iarg[1] = p->which; /* cpuwhich_t */ uarg[2] = p->id1; /* uint32_t */ uarg[3] = p->id2; /* uint32_t */ uarg[4] = p->cpusetsize; /* size_t */ uarg[5] = (intptr_t) p->mask; /* const cpuset_t * */ *n_args = 6; break; } /* faccessat */ case 489: { struct faccessat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* 
const char * */ iarg[2] = p->amode; /* int */ iarg[3] = p->flag; /* int */ *n_args = 4; break; } /* fchmodat */ case 490: { struct fchmodat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->mode; /* mode_t */ iarg[3] = p->flag; /* int */ *n_args = 4; break; } /* fchownat */ case 491: { struct fchownat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ uarg[2] = p->uid; /* uid_t */ iarg[3] = p->gid; /* gid_t */ iarg[4] = p->flag; /* int */ *n_args = 5; break; } /* freebsd32_fexecve */ case 492: { struct freebsd32_fexecve_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->argv; /* uint32_t * */ uarg[2] = (intptr_t) p->envv; /* uint32_t * */ *n_args = 3; break; } /* freebsd32_futimesat */ case 494: { struct freebsd32_futimesat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ uarg[2] = (intptr_t) p->times; /* struct timeval * */ *n_args = 3; break; } /* linkat */ case 495: { struct linkat_args *p = params; iarg[0] = p->fd1; /* int */ uarg[1] = (intptr_t) p->path1; /* const char * */ iarg[2] = p->fd2; /* int */ uarg[3] = (intptr_t) p->path2; /* const char * */ iarg[4] = p->flag; /* int */ *n_args = 5; break; } /* mkdirat */ case 496: { struct mkdirat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->mode; /* mode_t */ *n_args = 3; break; } /* mkfifoat */ case 497: { struct mkfifoat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->mode; /* mode_t */ *n_args = 3; break; } /* openat */ case 499: { struct openat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->flag; /* int */ iarg[3] = p->mode; /* mode_t */ *n_args = 4; break; } /* readlinkat */ case 500: { struct readlinkat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ uarg[2] = (intptr_t) p->buf; /* char * */ uarg[3] = p->bufsize; /* size_t */ *n_args = 4; break; } /* renameat */ case 501: { struct renameat_args *p = params; iarg[0] = p->oldfd; /* int */ uarg[1] = (intptr_t) p->old; /* const char * */ iarg[2] = p->newfd; /* int */ uarg[3] = (intptr_t) p->new; /* const char * */ *n_args = 4; break; } /* symlinkat */ case 502: { struct symlinkat_args *p = params; uarg[0] = (intptr_t) p->path1; /* const char * */ iarg[1] = p->fd; /* int */ uarg[2] = (intptr_t) p->path2; /* const char * */ *n_args = 3; break; } /* unlinkat */ case 503: { struct unlinkat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->flag; /* int */ *n_args = 3; break; } /* posix_openpt */ case 504: { struct posix_openpt_args *p = params; iarg[0] = p->flags; /* int */ *n_args = 1; break; } /* freebsd32_jail_get */ case 506: { struct freebsd32_jail_get_args *p = params; uarg[0] = (intptr_t) p->iovp; /* struct iovec32 * */ uarg[1] = p->iovcnt; /* unsigned int */ iarg[2] = p->flags; /* int */ *n_args = 3; break; } /* freebsd32_jail_set */ case 507: { struct freebsd32_jail_set_args *p = params; uarg[0] = (intptr_t) p->iovp; /* struct iovec32 * */ uarg[1] = p->iovcnt; /* unsigned int */ iarg[2] = p->flags; /* int */ *n_args = 3; break; } /* jail_remove */ case 508: { struct jail_remove_args *p = params; iarg[0] = p->jid; /* int */ *n_args = 1; break; } /* closefrom */ case 509: { struct closefrom_args *p = params; iarg[0] = p->lowfd; /* int */ *n_args = 
1; break; } /* freebsd32_semctl */ case 510: { struct freebsd32_semctl_args *p = params; iarg[0] = p->semid; /* int */ iarg[1] = p->semnum; /* int */ iarg[2] = p->cmd; /* int */ uarg[3] = (intptr_t) p->arg; /* union semun32 * */ *n_args = 4; break; } /* freebsd32_msgctl */ case 511: { struct freebsd32_msgctl_args *p = params; iarg[0] = p->msqid; /* int */ iarg[1] = p->cmd; /* int */ uarg[2] = (intptr_t) p->buf; /* struct msqid_ds32 * */ *n_args = 3; break; } /* freebsd32_shmctl */ case 512: { struct freebsd32_shmctl_args *p = params; iarg[0] = p->shmid; /* int */ iarg[1] = p->cmd; /* int */ uarg[2] = (intptr_t) p->buf; /* struct shmid_ds32 * */ *n_args = 3; break; } /* lpathconf */ case 513: { struct lpathconf_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->name; /* int */ *n_args = 2; break; } /* __cap_rights_get */ case 515: { struct __cap_rights_get_args *p = params; iarg[0] = p->version; /* int */ iarg[1] = p->fd; /* int */ uarg[2] = (intptr_t) p->rightsp; /* cap_rights_t * */ *n_args = 3; break; } /* cap_enter */ case 516: { *n_args = 0; break; } /* cap_getmode */ case 517: { struct cap_getmode_args *p = params; uarg[0] = (intptr_t) p->modep; /* u_int * */ *n_args = 1; break; } /* pdfork */ case 518: { struct pdfork_args *p = params; uarg[0] = (intptr_t) p->fdp; /* int * */ iarg[1] = p->flags; /* int */ *n_args = 2; break; } /* pdkill */ case 519: { struct pdkill_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->signum; /* int */ *n_args = 2; break; } /* pdgetpid */ case 520: { struct pdgetpid_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->pidp; /* pid_t * */ *n_args = 2; break; } /* freebsd32_pselect */ case 522: { struct freebsd32_pselect_args *p = params; iarg[0] = p->nd; /* int */ uarg[1] = (intptr_t) p->in; /* fd_set * */ uarg[2] = (intptr_t) p->ou; /* fd_set * */ uarg[3] = (intptr_t) p->ex; /* fd_set * */ uarg[4] = (intptr_t) p->ts; /* const struct timespec32 * */ uarg[5] = (intptr_t) p->sm; /* const sigset_t * */ *n_args = 6; break; } /* getloginclass */ case 523: { struct getloginclass_args *p = params; uarg[0] = (intptr_t) p->namebuf; /* char * */ uarg[1] = p->namelen; /* size_t */ *n_args = 2; break; } /* setloginclass */ case 524: { struct setloginclass_args *p = params; uarg[0] = (intptr_t) p->namebuf; /* const char * */ *n_args = 1; break; } /* rctl_get_racct */ case 525: { struct rctl_get_racct_args *p = params; uarg[0] = (intptr_t) p->inbufp; /* const void * */ uarg[1] = p->inbuflen; /* size_t */ uarg[2] = (intptr_t) p->outbufp; /* void * */ uarg[3] = p->outbuflen; /* size_t */ *n_args = 4; break; } /* rctl_get_rules */ case 526: { struct rctl_get_rules_args *p = params; uarg[0] = (intptr_t) p->inbufp; /* const void * */ uarg[1] = p->inbuflen; /* size_t */ uarg[2] = (intptr_t) p->outbufp; /* void * */ uarg[3] = p->outbuflen; /* size_t */ *n_args = 4; break; } /* rctl_get_limits */ case 527: { struct rctl_get_limits_args *p = params; uarg[0] = (intptr_t) p->inbufp; /* const void * */ uarg[1] = p->inbuflen; /* size_t */ uarg[2] = (intptr_t) p->outbufp; /* void * */ uarg[3] = p->outbuflen; /* size_t */ *n_args = 4; break; } /* rctl_add_rule */ case 528: { struct rctl_add_rule_args *p = params; uarg[0] = (intptr_t) p->inbufp; /* const void * */ uarg[1] = p->inbuflen; /* size_t */ uarg[2] = (intptr_t) p->outbufp; /* void * */ uarg[3] = p->outbuflen; /* size_t */ *n_args = 4; break; } /* rctl_remove_rule */ case 529: { struct rctl_remove_rule_args *p = params; uarg[0] = (intptr_t) p->inbufp; /* const void * */ 
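	/*
	 * As in the other cases handled by this switch, the generated code
	 * unpacks the syscall's argument structure into iarg[] (signed
	 * integer arguments) and uarg[] (unsigned values and pointers, the
	 * latter cast through intptr_t), and records the argument count in
	 * *n_args.
	 */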
uarg[1] = p->inbuflen; /* size_t */ uarg[2] = (intptr_t) p->outbufp; /* void * */ uarg[3] = p->outbuflen; /* size_t */ *n_args = 4; break; } #ifdef PAD64_REQUIRED /* freebsd32_posix_fallocate */ case 530: { struct freebsd32_posix_fallocate_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->pad; /* int */ uarg[2] = p->offset1; /* uint32_t */ uarg[3] = p->offset2; /* uint32_t */ uarg[4] = p->len1; /* uint32_t */ uarg[5] = p->len2; /* uint32_t */ *n_args = 6; break; } /* freebsd32_posix_fadvise */ case 531: { struct freebsd32_posix_fadvise_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->pad; /* int */ uarg[2] = p->offset1; /* uint32_t */ uarg[3] = p->offset2; /* uint32_t */ uarg[4] = p->len1; /* uint32_t */ uarg[5] = p->len2; /* uint32_t */ iarg[6] = p->advice; /* int */ *n_args = 7; break; } /* freebsd32_wait6 */ case 532: { struct freebsd32_wait6_args *p = params; iarg[0] = p->idtype; /* int */ iarg[1] = p->pad; /* int */ uarg[2] = p->id1; /* uint32_t */ uarg[3] = p->id2; /* uint32_t */ uarg[4] = (intptr_t) p->status; /* int * */ iarg[5] = p->options; /* int */ uarg[6] = (intptr_t) p->wrusage; /* struct wrusage32 * */ uarg[7] = (intptr_t) p->info; /* siginfo_t * */ *n_args = 8; break; } #else /* freebsd32_posix_fallocate */ case 530: { struct freebsd32_posix_fallocate_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = p->offset1; /* uint32_t */ uarg[2] = p->offset2; /* uint32_t */ uarg[3] = p->len1; /* uint32_t */ uarg[4] = p->len2; /* uint32_t */ *n_args = 5; break; } /* freebsd32_posix_fadvise */ case 531: { struct freebsd32_posix_fadvise_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = p->offset1; /* uint32_t */ uarg[2] = p->offset2; /* uint32_t */ uarg[3] = p->len1; /* uint32_t */ uarg[4] = p->len2; /* uint32_t */ iarg[5] = p->advice; /* int */ *n_args = 6; break; } /* freebsd32_wait6 */ case 532: { struct freebsd32_wait6_args *p = params; iarg[0] = p->idtype; /* int */ uarg[1] = p->id1; /* uint32_t */ uarg[2] = p->id2; /* uint32_t */ uarg[3] = (intptr_t) p->status; /* int * */ iarg[4] = p->options; /* int */ uarg[5] = (intptr_t) p->wrusage; /* struct wrusage32 * */ uarg[6] = (intptr_t) p->info; /* siginfo_t * */ *n_args = 7; break; } #endif /* cap_rights_limit */ case 533: { struct cap_rights_limit_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->rightsp; /* cap_rights_t * */ *n_args = 2; break; } /* freebsd32_cap_ioctls_limit */ case 534: { struct freebsd32_cap_ioctls_limit_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->cmds; /* const uint32_t * */ uarg[2] = p->ncmds; /* size_t */ *n_args = 3; break; } /* freebsd32_cap_ioctls_get */ case 535: { struct freebsd32_cap_ioctls_get_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->cmds; /* uint32_t * */ uarg[2] = p->maxcmds; /* size_t */ *n_args = 3; break; } /* cap_fcntls_limit */ case 536: { struct cap_fcntls_limit_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = p->fcntlrights; /* uint32_t */ *n_args = 2; break; } /* cap_fcntls_get */ case 537: { struct cap_fcntls_get_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->fcntlrightsp; /* uint32_t * */ *n_args = 2; break; } /* bindat */ case 538: { struct bindat_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->s; /* int */ uarg[2] = (intptr_t) p->name; /* const struct sockaddr * */ iarg[3] = p->namelen; /* int */ *n_args = 4; break; } /* connectat */ case 539: { struct connectat_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->s; /* int */ uarg[2] = (intptr_t) 
p->name; /* const struct sockaddr * */ iarg[3] = p->namelen; /* int */ *n_args = 4; break; } /* chflagsat */ case 540: { struct chflagsat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ uarg[2] = p->flags; /* u_long */ iarg[3] = p->atflag; /* int */ *n_args = 4; break; } /* accept4 */ case 541: { struct accept4_args *p = params; iarg[0] = p->s; /* int */ uarg[1] = (intptr_t) p->name; /* struct sockaddr * */ uarg[2] = (intptr_t) p->anamelen; /* __socklen_t * */ iarg[3] = p->flags; /* int */ *n_args = 4; break; } /* pipe2 */ case 542: { struct pipe2_args *p = params; uarg[0] = (intptr_t) p->fildes; /* int * */ iarg[1] = p->flags; /* int */ *n_args = 2; break; } /* freebsd32_aio_mlock */ case 543: { struct freebsd32_aio_mlock_args *p = params; uarg[0] = (intptr_t) p->aiocbp; /* struct aiocb32 * */ *n_args = 1; break; } #ifdef PAD64_REQUIRED /* freebsd32_procctl */ case 544: { struct freebsd32_procctl_args *p = params; iarg[0] = p->idtype; /* int */ iarg[1] = p->pad; /* int */ uarg[2] = p->id1; /* uint32_t */ uarg[3] = p->id2; /* uint32_t */ iarg[4] = p->com; /* int */ uarg[5] = (intptr_t) p->data; /* void * */ *n_args = 6; break; } #else /* freebsd32_procctl */ case 544: { struct freebsd32_procctl_args *p = params; iarg[0] = p->idtype; /* int */ uarg[1] = p->id1; /* uint32_t */ uarg[2] = p->id2; /* uint32_t */ iarg[3] = p->com; /* int */ uarg[4] = (intptr_t) p->data; /* void * */ *n_args = 5; break; } #endif /* freebsd32_ppoll */ case 545: { struct freebsd32_ppoll_args *p = params; uarg[0] = (intptr_t) p->fds; /* struct pollfd * */ uarg[1] = p->nfds; /* u_int */ uarg[2] = (intptr_t) p->ts; /* const struct timespec32 * */ uarg[3] = (intptr_t) p->set; /* const sigset_t * */ *n_args = 4; break; } /* freebsd32_futimens */ case 546: { struct freebsd32_futimens_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->times; /* struct timespec * */ *n_args = 2; break; } /* freebsd32_utimensat */ case 547: { struct freebsd32_utimensat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ uarg[2] = (intptr_t) p->times; /* struct timespec * */ iarg[3] = p->flag; /* int */ *n_args = 4; break; } /* fdatasync */ case 550: { struct fdatasync_args *p = params; iarg[0] = p->fd; /* int */ *n_args = 1; break; } /* freebsd32_fstat */ case 551: { struct freebsd32_fstat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->ub; /* struct stat32 * */ *n_args = 2; break; } /* freebsd32_fstatat */ case 552: { struct freebsd32_fstatat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ uarg[2] = (intptr_t) p->buf; /* struct stat32 * */ iarg[3] = p->flag; /* int */ *n_args = 4; break; } /* freebsd32_fhstat */ case 553: { struct freebsd32_fhstat_args *p = params; uarg[0] = (intptr_t) p->u_fhp; /* const struct fhandle * */ uarg[1] = (intptr_t) p->sb; /* struct stat32 * */ *n_args = 2; break; } /* getdirentries */ case 554: { struct getdirentries_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->buf; /* char * */ uarg[2] = p->count; /* size_t */ uarg[3] = (intptr_t) p->basep; /* off_t * */ *n_args = 4; break; } /* statfs */ case 555: { struct statfs_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = (intptr_t) p->buf; /* struct statfs32 * */ *n_args = 2; break; } /* fstatfs */ case 556: { struct fstatfs_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->buf; /* struct statfs32 * */ *n_args = 2; break; } /* 
getfsstat */ case 557: { struct getfsstat_args *p = params; uarg[0] = (intptr_t) p->buf; /* struct statfs32 * */ iarg[1] = p->bufsize; /* long */ iarg[2] = p->mode; /* int */ *n_args = 3; break; } /* fhstatfs */ case 558: { struct fhstatfs_args *p = params; uarg[0] = (intptr_t) p->u_fhp; /* const struct fhandle * */ uarg[1] = (intptr_t) p->buf; /* struct statfs32 * */ *n_args = 2; break; } #ifdef PAD64_REQUIRED /* freebsd32_mknodat */ case 559: { struct freebsd32_mknodat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->mode; /* mode_t */ iarg[3] = p->pad; /* int */ uarg[4] = p->dev1; /* uint32_t */ uarg[5] = p->dev2; /* uint32_t */ *n_args = 6; break; } #else /* freebsd32_mknodat */ case 559: { struct freebsd32_mknodat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->mode; /* mode_t */ uarg[3] = p->dev1; /* uint32_t */ uarg[4] = p->dev2; /* uint32_t */ *n_args = 5; break; } #endif /* freebsd32_kevent */ case 560: { struct freebsd32_kevent_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->changelist; /* const struct kevent32 * */ iarg[2] = p->nchanges; /* int */ uarg[3] = (intptr_t) p->eventlist; /* struct kevent32 * */ iarg[4] = p->nevents; /* int */ uarg[5] = (intptr_t) p->timeout; /* const struct timespec32 * */ *n_args = 6; break; } /* freebsd32_cpuset_getdomain */ case 561: { struct freebsd32_cpuset_getdomain_args *p = params; iarg[0] = p->level; /* cpulevel_t */ iarg[1] = p->which; /* cpuwhich_t */ uarg[2] = p->id1; /* uint32_t */ uarg[3] = p->id2; /* uint32_t */ uarg[4] = p->domainsetsize; /* size_t */ uarg[5] = (intptr_t) p->mask; /* domainset_t * */ uarg[6] = (intptr_t) p->policy; /* int * */ *n_args = 7; break; } /* freebsd32_cpuset_setdomain */ case 562: { struct freebsd32_cpuset_setdomain_args *p = params; iarg[0] = p->level; /* cpulevel_t */ iarg[1] = p->which; /* cpuwhich_t */ uarg[2] = p->id1; /* uint32_t */ uarg[3] = p->id2; /* uint32_t */ uarg[4] = p->domainsetsize; /* size_t */ uarg[5] = (intptr_t) p->mask; /* domainset_t * */ iarg[6] = p->policy; /* int */ *n_args = 7; break; } /* getrandom */ case 563: { struct getrandom_args *p = params; uarg[0] = (intptr_t) p->buf; /* void * */ uarg[1] = p->buflen; /* size_t */ uarg[2] = p->flags; /* unsigned int */ *n_args = 3; break; } /* getfhat */ case 564: { struct getfhat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* char * */ uarg[2] = (intptr_t) p->fhp; /* struct fhandle * */ iarg[3] = p->flags; /* int */ *n_args = 4; break; } /* fhlink */ case 565: { struct fhlink_args *p = params; uarg[0] = (intptr_t) p->fhp; /* struct fhandle * */ uarg[1] = (intptr_t) p->to; /* const char * */ *n_args = 2; break; } /* fhlinkat */ case 566: { struct fhlinkat_args *p = params; uarg[0] = (intptr_t) p->fhp; /* struct fhandle * */ iarg[1] = p->tofd; /* int */ uarg[2] = (intptr_t) p->to; /* const char * */ *n_args = 3; break; } /* fhreadlink */ case 567: { struct fhreadlink_args *p = params; uarg[0] = (intptr_t) p->fhp; /* struct fhandle * */ uarg[1] = (intptr_t) p->buf; /* char * */ uarg[2] = p->bufsize; /* size_t */ *n_args = 3; break; } /* funlinkat */ case 568: { struct funlinkat_args *p = params; iarg[0] = p->dfd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->fd; /* int */ iarg[3] = p->flag; /* int */ *n_args = 4; break; } default: *n_args = 0; break; }; } static void systrace_entry_setargdesc(int sysnum, int ndx, char *desc, size_t descsz) { const char 
*p = NULL; switch (sysnum) { #if !defined(PAD64_REQUIRED) && !defined(__amd64__) #define PAD64_REQUIRED #endif /* nosys */ case 0: break; /* sys_exit */ case 1: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* fork */ case 2: break; /* read */ case 3: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland void *"; break; case 2: p = "size_t"; break; default: break; }; break; /* write */ case 4: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const void *"; break; case 2: p = "size_t"; break; default: break; }; break; /* open */ case 5: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "mode_t"; break; default: break; }; break; /* close */ case 6: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* freebsd32_wait4 */ case 7: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland int *"; break; case 2: p = "int"; break; case 3: p = "userland struct rusage32 *"; break; default: break; }; break; /* link */ case 9: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland const char *"; break; default: break; }; break; /* unlink */ case 10: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* chdir */ case 12: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* fchdir */ case 13: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* chmod */ case 15: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "mode_t"; break; default: break; }; break; /* chown */ case 16: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "int"; break; default: break; }; break; /* break */ case 17: switch(ndx) { case 0: p = "userland char *"; break; default: break; }; break; /* getpid */ case 20: break; /* mount */ case 21: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland const char *"; break; case 2: p = "int"; break; case 3: p = "userland void *"; break; default: break; }; break; /* unmount */ case 22: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; default: break; }; break; /* setuid */ case 23: switch(ndx) { case 0: p = "uid_t"; break; default: break; }; break; /* getuid */ case 24: break; /* geteuid */ case 25: break; /* ptrace */ case 26: switch(ndx) { case 0: p = "int"; break; case 1: p = "pid_t"; break; case 2: p = "caddr_t"; break; case 3: p = "int"; break; default: break; }; break; /* freebsd32_recvmsg */ case 27: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct msghdr32 *"; break; case 2: p = "int"; break; default: break; }; break; /* freebsd32_sendmsg */ case 28: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct msghdr32 *"; break; case 2: p = "int"; break; default: break; }; break; /* freebsd32_recvfrom */ case 29: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland void *"; break; case 2: p = "uint32_t"; break; case 3: p = "int"; break; case 4: p = "userland struct sockaddr *"; break; case 5: p = "uint32_t"; break; default: break; }; break; /* accept */ case 30: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct sockaddr *"; break; case 2: p = "userland int *"; break; default: break; }; break; /* getpeername */ case 31: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct sockaddr *"; break; case 2: p = "userland int *"; break; default: break; }; break; /* getsockname */ case 32: 
switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct sockaddr *"; break; case 2: p = "userland int *"; break; default: break; }; break; /* access */ case 33: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; default: break; }; break; /* chflags */ case 34: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "u_long"; break; default: break; }; break; /* fchflags */ case 35: switch(ndx) { case 0: p = "int"; break; case 1: p = "u_long"; break; default: break; }; break; /* sync */ case 36: break; /* kill */ case 37: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* getppid */ case 39: break; /* dup */ case 41: switch(ndx) { case 0: p = "u_int"; break; default: break; }; break; /* getegid */ case 43: break; /* profil */ case 44: switch(ndx) { case 0: p = "userland char *"; break; case 1: p = "size_t"; break; case 2: p = "size_t"; break; case 3: p = "u_int"; break; default: break; }; break; /* ktrace */ case 45: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "int"; break; default: break; }; break; /* getgid */ case 47: break; /* getlogin */ case 49: switch(ndx) { case 0: p = "userland char *"; break; case 1: p = "u_int"; break; default: break; }; break; /* setlogin */ case 50: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* acct */ case 51: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* freebsd32_sigaltstack */ case 53: switch(ndx) { case 0: p = "userland struct sigaltstack32 *"; break; case 1: p = "userland struct sigaltstack32 *"; break; default: break; }; break; /* freebsd32_ioctl */ case 54: switch(ndx) { case 0: p = "int"; break; case 1: p = "uint32_t"; break; case 2: p = "userland struct md_ioctl32 *"; break; default: break; }; break; /* reboot */ case 55: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* revoke */ case 56: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* symlink */ case 57: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland const char *"; break; default: break; }; break; /* readlink */ case 58: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland char *"; break; case 2: p = "size_t"; break; default: break; }; break; /* freebsd32_execve */ case 59: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland uint32_t *"; break; case 2: p = "userland uint32_t *"; break; default: break; }; break; /* umask */ case 60: switch(ndx) { case 0: p = "mode_t"; break; default: break; }; break; /* chroot */ case 61: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* msync */ case 65: switch(ndx) { case 0: p = "userland void *"; break; case 1: p = "size_t"; break; case 2: p = "int"; break; default: break; }; break; /* vfork */ case 66: break; /* sbrk */ case 69: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* sstk */ case 70: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* munmap */ case 73: switch(ndx) { case 0: p = "userland void *"; break; case 1: p = "size_t"; break; default: break; }; break; /* freebsd32_mprotect */ case 74: switch(ndx) { case 0: p = "userland void *"; break; case 1: p = "size_t"; break; case 2: p = "int"; break; default: break; }; break; /* madvise */ case 75: switch(ndx) { case 0: p = "userland 
void *"; break; case 1: p = "size_t"; break; case 2: p = "int"; break; default: break; }; break; /* mincore */ case 78: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; case 2: p = "userland char *"; break; default: break; }; break; /* getgroups */ case 79: switch(ndx) { case 0: p = "u_int"; break; case 1: p = "userland gid_t *"; break; default: break; }; break; /* setgroups */ case 80: switch(ndx) { case 0: p = "u_int"; break; case 1: p = "userland gid_t *"; break; default: break; }; break; /* getpgrp */ case 81: break; /* setpgid */ case 82: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* freebsd32_setitimer */ case 83: switch(ndx) { case 0: p = "u_int"; break; case 1: p = "userland struct itimerval32 *"; break; case 2: p = "userland struct itimerval32 *"; break; default: break; }; break; /* swapon */ case 85: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* freebsd32_getitimer */ case 86: switch(ndx) { case 0: p = "u_int"; break; case 1: p = "userland struct itimerval32 *"; break; default: break; }; break; /* getdtablesize */ case 89: break; /* dup2 */ case 90: switch(ndx) { case 0: p = "u_int"; break; case 1: p = "u_int"; break; default: break; }; break; /* freebsd32_fcntl */ case 92: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; default: break; }; break; /* freebsd32_select */ case 93: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland fd_set *"; break; case 2: p = "userland fd_set *"; break; case 3: p = "userland fd_set *"; break; case 4: p = "userland struct timeval32 *"; break; default: break; }; break; /* fsync */ case 95: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* setpriority */ case 96: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; default: break; }; break; /* socket */ case 97: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; default: break; }; break; /* connect */ case 98: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const struct sockaddr *"; break; case 2: p = "int"; break; default: break; }; break; /* getpriority */ case 100: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* bind */ case 104: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const struct sockaddr *"; break; case 2: p = "int"; break; default: break; }; break; /* setsockopt */ case 105: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "userland const void *"; break; case 4: p = "int"; break; default: break; }; break; /* listen */ case 106: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* freebsd32_gettimeofday */ case 116: switch(ndx) { case 0: p = "userland struct timeval32 *"; break; case 1: p = "userland struct timezone *"; break; default: break; }; break; /* freebsd32_getrusage */ case 117: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct rusage32 *"; break; default: break; }; break; /* getsockopt */ case 118: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "userland void *"; break; case 4: p = "userland int *"; break; default: break; }; break; /* freebsd32_readv */ case 120: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct iovec32 *"; break; case 2: 
p = "u_int"; break; default: break; }; break; /* freebsd32_writev */ case 121: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct iovec32 *"; break; case 2: p = "u_int"; break; default: break; }; break; /* freebsd32_settimeofday */ case 122: switch(ndx) { case 0: p = "userland struct timeval32 *"; break; case 1: p = "userland struct timezone *"; break; default: break; }; break; /* fchown */ case 123: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; default: break; }; break; /* fchmod */ case 124: switch(ndx) { case 0: p = "int"; break; case 1: p = "mode_t"; break; default: break; }; break; /* setreuid */ case 126: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* setregid */ case 127: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* rename */ case 128: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland const char *"; break; default: break; }; break; /* flock */ case 131: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* mkfifo */ case 132: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "mode_t"; break; default: break; }; break; /* sendto */ case 133: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const void *"; break; case 2: p = "size_t"; break; case 3: p = "int"; break; case 4: p = "userland const struct sockaddr *"; break; case 5: p = "int"; break; default: break; }; break; /* shutdown */ case 134: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* socketpair */ case 135: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "userland int *"; break; default: break; }; break; /* mkdir */ case 136: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "mode_t"; break; default: break; }; break; /* rmdir */ case 137: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* freebsd32_utimes */ case 138: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland struct timeval32 *"; break; default: break; }; break; /* freebsd32_adjtime */ case 140: switch(ndx) { case 0: p = "userland struct timeval32 *"; break; case 1: p = "userland struct timeval32 *"; break; default: break; }; break; /* setsid */ case 147: break; /* quotactl */ case 148: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "userland void *"; break; default: break; }; break; /* getfh */ case 161: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland struct fhandle *"; break; default: break; }; break; /* freebsd32_sysarch */ case 165: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland char *"; break; default: break; }; break; /* rtprio */ case 166: switch(ndx) { case 0: p = "int"; break; case 1: p = "pid_t"; break; case 2: p = "userland struct rtprio *"; break; default: break; }; break; /* freebsd32_semsys */ case 169: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "int"; break; case 4: p = "int"; break; default: break; }; break; /* freebsd32_msgsys */ case 170: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "int"; break; case 4: p = "int"; break; case 5: p = "int"; break; default: break; }; 
break; /* freebsd32_shmsys */ case 171: switch(ndx) { case 0: p = "uint32_t"; break; case 1: p = "uint32_t"; break; case 2: p = "uint32_t"; break; case 3: p = "uint32_t"; break; default: break; }; break; /* ntp_adjtime */ case 176: switch(ndx) { case 0: p = "userland struct timex *"; break; default: break; }; break; /* setgid */ case 181: switch(ndx) { case 0: p = "gid_t"; break; default: break; }; break; /* setegid */ case 182: switch(ndx) { case 0: p = "gid_t"; break; default: break; }; break; /* seteuid */ case 183: switch(ndx) { case 0: p = "uid_t"; break; default: break; }; break; /* pathconf */ case 191: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; default: break; }; break; /* fpathconf */ case 192: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* getrlimit */ case 194: switch(ndx) { case 0: p = "u_int"; break; case 1: p = "userland struct rlimit *"; break; default: break; }; break; /* setrlimit */ case 195: switch(ndx) { case 0: p = "u_int"; break; case 1: p = "userland struct rlimit *"; break; default: break; }; break; /* nosys */ case 198: break; /* freebsd32___sysctl */ case 202: switch(ndx) { case 0: p = "userland int *"; break; case 1: p = "u_int"; break; case 2: p = "userland void *"; break; case 3: p = "userland uint32_t *"; break; case 4: p = "userland const void *"; break; case 5: p = "uint32_t"; break; default: break; }; break; /* mlock */ case 203: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; default: break; }; break; /* munlock */ case 204: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; default: break; }; break; /* undelete */ case 205: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* freebsd32_futimes */ case 206: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct timeval32 *"; break; default: break; }; break; /* getpgid */ case 207: switch(ndx) { case 0: p = "pid_t"; break; default: break; }; break; /* poll */ case 209: switch(ndx) { case 0: p = "userland struct pollfd *"; break; case 1: p = "u_int"; break; case 2: p = "int"; break; default: break; }; break; /* lkmnosys */ case 210: break; /* lkmnosys */ case 211: break; /* lkmnosys */ case 212: break; /* lkmnosys */ case 213: break; /* lkmnosys */ case 214: break; /* lkmnosys */ case 215: break; /* lkmnosys */ case 216: break; /* lkmnosys */ case 217: break; /* lkmnosys */ case 218: break; /* lkmnosys */ case 219: break; /* semget */ case 221: switch(ndx) { case 0: p = "key_t"; break; case 1: p = "int"; break; case 2: p = "int"; break; default: break; }; break; /* semop */ case 222: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct sembuf *"; break; case 2: p = "u_int"; break; default: break; }; break; /* msgget */ case 225: switch(ndx) { case 0: p = "key_t"; break; case 1: p = "int"; break; default: break; }; break; /* freebsd32_msgsnd */ case 226: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland void *"; break; case 2: p = "size_t"; break; case 3: p = "int"; break; default: break; }; break; /* freebsd32_msgrcv */ case 227: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland void *"; break; case 2: p = "size_t"; break; case 3: p = "long"; break; case 4: p = "int"; break; default: break; }; break; /* shmat */ case 228: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland void *"; break; case 2: p = "int"; break; default: break; }; break; /* shmdt */ 
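	/*
	 * The cases below continue the same pattern: for syscall "sysnum",
	 * argument index "ndx" selects a type-name string.  The "userland"
	 * prefix marks pointer arguments that refer to the 32-bit process's
	 * address space rather than to kernel memory.
	 */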
case 230: switch(ndx) { case 0: p = "userland void *"; break; default: break; }; break; /* shmget */ case 231: switch(ndx) { case 0: p = "key_t"; break; case 1: p = "int"; break; case 2: p = "int"; break; default: break; }; break; /* freebsd32_clock_gettime */ case 232: switch(ndx) { case 0: p = "clockid_t"; break; case 1: p = "userland struct timespec32 *"; break; default: break; }; break; /* freebsd32_clock_settime */ case 233: switch(ndx) { case 0: p = "clockid_t"; break; case 1: p = "userland const struct timespec32 *"; break; default: break; }; break; /* freebsd32_clock_getres */ case 234: switch(ndx) { case 0: p = "clockid_t"; break; case 1: p = "userland struct timespec32 *"; break; default: break; }; break; /* freebsd32_ktimer_create */ case 235: switch(ndx) { case 0: p = "clockid_t"; break; case 1: p = "userland struct sigevent32 *"; break; case 2: p = "userland int *"; break; default: break; }; break; /* ktimer_delete */ case 236: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* freebsd32_ktimer_settime */ case 237: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland const struct itimerspec32 *"; break; case 3: p = "userland struct itimerspec32 *"; break; default: break; }; break; /* freebsd32_ktimer_gettime */ case 238: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct itimerspec32 *"; break; default: break; }; break; /* ktimer_getoverrun */ case 239: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* freebsd32_nanosleep */ case 240: switch(ndx) { case 0: p = "userland const struct timespec32 *"; break; case 1: p = "userland struct timespec32 *"; break; default: break; }; break; /* ffclock_getcounter */ case 241: switch(ndx) { case 0: p = "userland ffcounter *"; break; default: break; }; break; /* ffclock_setestimate */ case 242: switch(ndx) { case 0: p = "userland struct ffclock_estimate *"; break; default: break; }; break; /* ffclock_getestimate */ case 243: switch(ndx) { case 0: p = "userland struct ffclock_estimate *"; break; default: break; }; break; /* freebsd32_clock_nanosleep */ case 244: switch(ndx) { case 0: p = "clockid_t"; break; case 1: p = "int"; break; case 2: p = "userland const struct timespec32 *"; break; case 3: p = "userland struct timespec32 *"; break; default: break; }; break; /* freebsd32_clock_getcpuclockid2 */ case 247: switch(ndx) { case 0: p = "uint32_t"; break; case 1: p = "uint32_t"; break; case 2: p = "int"; break; case 3: p = "userland clockid_t *"; break; default: break; }; break; /* minherit */ case 250: switch(ndx) { case 0: p = "userland void *"; break; case 1: p = "size_t"; break; case 2: p = "int"; break; default: break; }; break; /* rfork */ case 251: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* issetugid */ case 253: break; /* lchown */ case 254: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "int"; break; default: break; }; break; /* freebsd32_aio_read */ case 255: switch(ndx) { case 0: p = "userland struct aiocb32 *"; break; default: break; }; break; /* freebsd32_aio_write */ case 256: switch(ndx) { case 0: p = "userland struct aiocb32 *"; break; default: break; }; break; /* freebsd32_lio_listio */ case 257: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct aiocb32 *const *"; break; case 2: p = "int"; break; case 3: p = "userland struct sigevent32 *"; break; default: break; }; break; /* lchmod */ case 274: switch(ndx) { case 0: p = "userland const char *"; 
break; case 1: p = "mode_t"; break; default: break; }; break; /* freebsd32_lutimes */ case 276: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland struct timeval32 *"; break; default: break; }; break; /* freebsd32_preadv */ case 289: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct iovec32 *"; break; case 2: p = "u_int"; break; case 3: p = "uint32_t"; break; case 4: p = "uint32_t"; break; default: break; }; break; /* freebsd32_pwritev */ case 290: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct iovec32 *"; break; case 2: p = "u_int"; break; case 3: p = "uint32_t"; break; case 4: p = "uint32_t"; break; default: break; }; break; /* fhopen */ case 298: switch(ndx) { case 0: p = "userland const struct fhandle *"; break; case 1: p = "int"; break; default: break; }; break; /* modnext */ case 300: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* freebsd32_modstat */ case 301: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct module_stat32 *"; break; default: break; }; break; /* modfnext */ case 302: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* modfind */ case 303: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* kldload */ case 304: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* kldunload */ case 305: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* kldfind */ case 306: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* kldnext */ case 307: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* freebsd32_kldstat */ case 308: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct kld32_file_stat *"; break; default: break; }; break; /* kldfirstmod */ case 309: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* getsid */ case 310: switch(ndx) { case 0: p = "pid_t"; break; default: break; }; break; /* setresuid */ case 311: switch(ndx) { case 0: p = "uid_t"; break; case 1: p = "uid_t"; break; case 2: p = "uid_t"; break; default: break; }; break; /* setresgid */ case 312: switch(ndx) { case 0: p = "gid_t"; break; case 1: p = "gid_t"; break; case 2: p = "gid_t"; break; default: break; }; break; /* freebsd32_aio_return */ case 314: switch(ndx) { case 0: p = "userland struct aiocb32 *"; break; default: break; }; break; /* freebsd32_aio_suspend */ case 315: switch(ndx) { case 0: p = "userland struct aiocb32 *const *"; break; case 1: p = "int"; break; case 2: p = "userland const struct timespec32 *"; break; default: break; }; break; /* aio_cancel */ case 316: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct aiocb *"; break; default: break; }; break; /* freebsd32_aio_error */ case 317: switch(ndx) { case 0: p = "userland struct aiocb32 *"; break; default: break; }; break; /* yield */ case 321: break; /* mlockall */ case 324: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* munlockall */ case 325: break; /* __getcwd */ case 326: switch(ndx) { case 0: p = "userland char *"; break; case 1: p = "size_t"; break; default: break; }; break; /* sched_setparam */ case 327: switch(ndx) { case 0: p = "pid_t"; break; case 1: p = "userland const struct sched_param *"; break; default: break; }; break; /* sched_getparam */ case 328: switch(ndx) { case 0: p = "pid_t"; break; case 1: p = "userland struct sched_param *"; break; default: break; }; break; /* sched_setscheduler */ 
case 329: switch(ndx) { case 0: p = "pid_t"; break; case 1: p = "int"; break; case 2: p = "userland const struct sched_param *"; break; default: break; }; break; /* sched_getscheduler */ case 330: switch(ndx) { case 0: p = "pid_t"; break; default: break; }; break; /* sched_yield */ case 331: break; /* sched_get_priority_max */ case 332: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* sched_get_priority_min */ case 333: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* freebsd32_sched_rr_get_interval */ case 334: switch(ndx) { case 0: p = "pid_t"; break; case 1: p = "userland struct timespec32 *"; break; default: break; }; break; /* utrace */ case 335: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; default: break; }; break; /* kldsym */ case 337: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland void *"; break; default: break; }; break; /* freebsd32_jail */ case 338: switch(ndx) { case 0: p = "userland struct jail32 *"; break; default: break; }; break; /* sigprocmask */ case 340: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const sigset_t *"; break; case 2: p = "userland sigset_t *"; break; default: break; }; break; /* sigsuspend */ case 341: switch(ndx) { case 0: p = "userland const sigset_t *"; break; default: break; }; break; /* sigpending */ case 343: switch(ndx) { case 0: p = "userland sigset_t *"; break; default: break; }; break; /* freebsd32_sigtimedwait */ case 345: switch(ndx) { case 0: p = "userland const sigset_t *"; break; case 1: p = "userland siginfo_t *"; break; case 2: p = "userland const struct timespec *"; break; default: break; }; break; /* freebsd32_sigwaitinfo */ case 346: switch(ndx) { case 0: p = "userland const sigset_t *"; break; case 1: p = "userland siginfo_t *"; break; default: break; }; break; /* __acl_get_file */ case 347: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* __acl_set_file */ case 348: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* __acl_get_fd */ case 349: switch(ndx) { case 0: p = "int"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* __acl_set_fd */ case 350: switch(ndx) { case 0: p = "int"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* __acl_delete_file */ case 351: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "acl_type_t"; break; default: break; }; break; /* __acl_delete_fd */ case 352: switch(ndx) { case 0: p = "int"; break; case 1: p = "acl_type_t"; break; default: break; }; break; /* __acl_aclcheck_file */ case 353: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* __acl_aclcheck_fd */ case 354: switch(ndx) { case 0: p = "int"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* extattrctl */ case 355: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; case 3: p = "int"; break; case 4: p = "userland const char *"; break; default: break; }; break; /* extattr_set_file */ case 356: 
switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; case 3: p = "userland void *"; break; case 4: p = "size_t"; break; default: break; }; break; /* extattr_get_file */ case 357: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; case 3: p = "userland void *"; break; case 4: p = "size_t"; break; default: break; }; break; /* extattr_delete_file */ case 358: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; default: break; }; break; /* freebsd32_aio_waitcomplete */ case 359: switch(ndx) { case 0: p = "userland struct aiocb32 **"; break; case 1: p = "userland struct timespec32 *"; break; default: break; }; break; /* getresuid */ case 360: switch(ndx) { case 0: p = "userland uid_t *"; break; case 1: p = "userland uid_t *"; break; case 2: p = "userland uid_t *"; break; default: break; }; break; /* getresgid */ case 361: switch(ndx) { case 0: p = "userland gid_t *"; break; case 1: p = "userland gid_t *"; break; case 2: p = "userland gid_t *"; break; default: break; }; break; /* kqueue */ case 362: break; /* extattr_set_fd */ case 371: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; case 3: p = "userland void *"; break; case 4: p = "size_t"; break; default: break; }; break; /* extattr_get_fd */ case 372: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; case 3: p = "userland void *"; break; case 4: p = "size_t"; break; default: break; }; break; /* extattr_delete_fd */ case 373: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; default: break; }; break; /* __setugid */ case 374: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* eaccess */ case 376: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; default: break; }; break; /* freebsd32_nmount */ case 378: switch(ndx) { case 0: p = "userland struct iovec32 *"; break; case 1: p = "unsigned int"; break; case 2: p = "int"; break; default: break; }; break; /* kenv */ case 390: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "userland char *"; break; case 3: p = "int"; break; default: break; }; break; /* lchflags */ case 391: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "u_long"; break; default: break; }; break; /* uuidgen */ case 392: switch(ndx) { case 0: p = "userland struct uuid *"; break; case 1: p = "int"; break; default: break; }; break; /* freebsd32_sendfile */ case 393: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "uint32_t"; break; case 3: p = "uint32_t"; break; case 4: p = "size_t"; break; case 5: p = "userland struct sf_hdtr32 *"; break; case 6: p = "userland off_t *"; break; case 7: p = "int"; break; default: break; }; break; /* ksem_close */ case 400: switch(ndx) { case 0: p = "semid_t"; break; default: break; }; break; /* ksem_post */ case 401: switch(ndx) { case 0: p = "semid_t"; break; default: break; }; break; /* ksem_wait */ case 402: switch(ndx) { case 0: p = "semid_t"; break; default: break; }; break; /* ksem_trywait */ case 403: switch(ndx) { case 0: p = "semid_t"; break; default: break; }; break; /* freebsd32_ksem_init */ case 404: switch(ndx) { case 0: p = "userland 
semid_t *"; break; case 1: p = "unsigned int"; break; default: break; }; break; /* freebsd32_ksem_open */ case 405: switch(ndx) { case 0: p = "userland semid_t *"; break; case 1: p = "userland const char *"; break; case 2: p = "int"; break; case 3: p = "mode_t"; break; case 4: p = "unsigned int"; break; default: break; }; break; /* ksem_unlink */ case 406: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* ksem_getvalue */ case 407: switch(ndx) { case 0: p = "semid_t"; break; case 1: p = "userland int *"; break; default: break; }; break; /* ksem_destroy */ case 408: switch(ndx) { case 0: p = "semid_t"; break; default: break; }; break; /* extattr_set_link */ case 412: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; case 3: p = "userland void *"; break; case 4: p = "size_t"; break; default: break; }; break; /* extattr_get_link */ case 413: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; case 3: p = "userland void *"; break; case 4: p = "size_t"; break; default: break; }; break; /* extattr_delete_link */ case 414: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; default: break; }; break; /* freebsd32_sigaction */ case 416: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct sigaction32 *"; break; case 2: p = "userland struct sigaction32 *"; break; default: break; }; break; /* freebsd32_sigreturn */ case 417: switch(ndx) { case 0: p = "userland const struct freebsd32_ucontext *"; break; default: break; }; break; /* freebsd32_getcontext */ case 421: switch(ndx) { case 0: p = "userland struct freebsd32_ucontext *"; break; default: break; }; break; /* freebsd32_setcontext */ case 422: switch(ndx) { case 0: p = "userland const struct freebsd32_ucontext *"; break; default: break; }; break; /* freebsd32_swapcontext */ case 423: switch(ndx) { case 0: p = "userland struct freebsd32_ucontext *"; break; case 1: p = "userland const struct freebsd32_ucontext *"; break; default: break; }; break; /* __acl_get_link */ case 425: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* __acl_set_link */ case 426: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* __acl_delete_link */ case 427: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "acl_type_t"; break; default: break; }; break; /* __acl_aclcheck_link */ case 428: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* sigwait */ case 429: switch(ndx) { case 0: p = "userland const sigset_t *"; break; case 1: p = "userland int *"; break; default: break; }; break; /* thr_exit */ case 431: switch(ndx) { case 0: p = "userland long *"; break; default: break; }; break; /* thr_self */ case 432: switch(ndx) { case 0: p = "userland long *"; break; default: break; }; break; /* thr_kill */ case 433: switch(ndx) { case 0: p = "long"; break; case 1: p = "int"; break; default: break; }; break; /* jail_attach */ case 436: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* extattr_list_fd */ case 437: switch(ndx) { case 
0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland void *"; break; case 3: p = "size_t"; break; default: break; }; break; /* extattr_list_file */ case 438: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland void *"; break; case 3: p = "size_t"; break; default: break; }; break; /* extattr_list_link */ case 439: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland void *"; break; case 3: p = "size_t"; break; default: break; }; break; /* freebsd32_ksem_timedwait */ case 441: switch(ndx) { case 0: p = "semid_t"; break; case 1: p = "userland const struct timespec32 *"; break; default: break; }; break; /* freebsd32_thr_suspend */ case 442: switch(ndx) { case 0: p = "userland const struct timespec32 *"; break; default: break; }; break; /* thr_wake */ case 443: switch(ndx) { case 0: p = "long"; break; default: break; }; break; /* kldunloadf */ case 444: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* audit */ case 445: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "u_int"; break; default: break; }; break; /* auditon */ case 446: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland void *"; break; case 2: p = "u_int"; break; default: break; }; break; /* getauid */ case 447: switch(ndx) { case 0: p = "userland uid_t *"; break; default: break; }; break; /* setauid */ case 448: switch(ndx) { case 0: p = "userland uid_t *"; break; default: break; }; break; /* getaudit */ case 449: switch(ndx) { case 0: p = "userland struct auditinfo *"; break; default: break; }; break; /* setaudit */ case 450: switch(ndx) { case 0: p = "userland struct auditinfo *"; break; default: break; }; break; /* getaudit_addr */ case 451: switch(ndx) { case 0: p = "userland struct auditinfo_addr *"; break; case 1: p = "u_int"; break; default: break; }; break; /* setaudit_addr */ case 452: switch(ndx) { case 0: p = "userland struct auditinfo_addr *"; break; case 1: p = "u_int"; break; default: break; }; break; /* auditctl */ case 453: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* freebsd32__umtx_op */ case 454: switch(ndx) { case 0: p = "userland void *"; break; case 1: p = "int"; break; case 2: p = "u_long"; break; case 3: p = "userland void *"; break; case 4: p = "userland void *"; break; default: break; }; break; /* freebsd32_thr_new */ case 455: switch(ndx) { case 0: p = "userland struct thr_param32 *"; break; case 1: p = "int"; break; default: break; }; break; /* freebsd32_sigqueue */ case 456: switch(ndx) { case 0: p = "pid_t"; break; case 1: p = "int"; break; case 2: p = "int"; break; default: break; }; break; /* freebsd32_kmq_open */ case 457: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "mode_t"; break; case 3: p = "userland const struct mq_attr32 *"; break; default: break; }; break; /* freebsd32_kmq_setattr */ case 458: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const struct mq_attr32 *"; break; case 2: p = "userland struct mq_attr32 *"; break; default: break; }; break; /* freebsd32_kmq_timedreceive */ case 459: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland char *"; break; case 2: p = "size_t"; break; case 3: p = "userland unsigned *"; break; case 4: p = "userland const struct timespec32 *"; break; default: break; }; break; /* freebsd32_kmq_timedsend */ case 460: switch(ndx) { case 0: p = 
"int"; break; case 1: p = "userland const char *"; break; case 2: p = "size_t"; break; case 3: p = "unsigned"; break; case 4: p = "userland const struct timespec32 *"; break; default: break; }; break; /* freebsd32_kmq_notify */ case 461: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const struct sigevent32 *"; break; default: break; }; break; /* kmq_unlink */ case 462: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* abort2 */ case 463: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland void **"; break; default: break; }; break; /* thr_set_name */ case 464: switch(ndx) { case 0: p = "long"; break; case 1: p = "userland const char *"; break; default: break; }; break; /* freebsd32_aio_fsync */ case 465: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct aiocb32 *"; break; default: break; }; break; /* rtprio_thread */ case 466: switch(ndx) { case 0: p = "int"; break; case 1: p = "lwpid_t"; break; case 2: p = "userland struct rtprio *"; break; default: break; }; break; /* sctp_peeloff */ case 471: switch(ndx) { case 0: p = "int"; break; case 1: p = "uint32_t"; break; default: break; }; break; /* sctp_generic_sendmsg */ case 472: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland void *"; break; case 2: p = "int"; break; case 3: p = "userland struct sockaddr *"; break; case 4: p = "__socklen_t"; break; case 5: p = "userland struct sctp_sndrcvinfo *"; break; case 6: p = "int"; break; default: break; }; break; /* sctp_generic_sendmsg_iov */ case 473: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct iovec *"; break; case 2: p = "int"; break; case 3: p = "userland struct sockaddr *"; break; case 4: p = "__socklen_t"; break; case 5: p = "userland struct sctp_sndrcvinfo *"; break; case 6: p = "int"; break; default: break; }; break; /* sctp_generic_recvmsg */ case 474: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct iovec *"; break; case 2: p = "int"; break; case 3: p = "userland struct sockaddr *"; break; case 4: p = "userland __socklen_t *"; break; case 5: p = "userland struct sctp_sndrcvinfo *"; break; case 6: p = "userland int *"; break; default: break; }; break; #ifdef PAD64_REQUIRED /* freebsd32_pread */ case 475: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland void *"; break; case 2: p = "size_t"; break; case 3: p = "int"; break; case 4: p = "uint32_t"; break; case 5: p = "uint32_t"; break; default: break; }; break; /* freebsd32_pwrite */ case 476: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const void *"; break; case 2: p = "size_t"; break; case 3: p = "int"; break; case 4: p = "uint32_t"; break; case 5: p = "uint32_t"; break; default: break; }; break; /* freebsd32_mmap */ case 477: switch(ndx) { case 0: p = "userland void *"; break; case 1: p = "size_t"; break; case 2: p = "int"; break; case 3: p = "int"; break; case 4: p = "int"; break; case 5: p = "int"; break; case 6: p = "uint32_t"; break; case 7: p = "uint32_t"; break; default: break; }; break; /* freebsd32_lseek */ case 478: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "uint32_t"; break; case 3: p = "uint32_t"; break; case 4: p = "int"; break; default: break; }; break; /* freebsd32_truncate */ case 479: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "uint32_t"; break; case 3: p = "uint32_t"; break; default: break; }; break; /* freebsd32_ftruncate */ 
case 480: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "uint32_t"; break; case 3: p = "uint32_t"; break; default: break; }; break; #else /* freebsd32_pread */ case 475: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland void *"; break; case 2: p = "size_t"; break; case 3: p = "uint32_t"; break; case 4: p = "uint32_t"; break; default: break; }; break; /* freebsd32_pwrite */ case 476: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const void *"; break; case 2: p = "size_t"; break; case 3: p = "uint32_t"; break; case 4: p = "uint32_t"; break; default: break; }; break; /* freebsd32_mmap */ case 477: switch(ndx) { case 0: p = "userland void *"; break; case 1: p = "size_t"; break; case 2: p = "int"; break; case 3: p = "int"; break; case 4: p = "int"; break; case 5: p = "uint32_t"; break; case 6: p = "uint32_t"; break; default: break; }; break; /* freebsd32_lseek */ case 478: switch(ndx) { case 0: p = "int"; break; case 1: p = "uint32_t"; break; case 2: p = "uint32_t"; break; case 3: p = "int"; break; default: break; }; break; /* freebsd32_truncate */ case 479: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "uint32_t"; break; case 2: p = "uint32_t"; break; default: break; }; break; /* freebsd32_ftruncate */ case 480: switch(ndx) { case 0: p = "int"; break; case 1: p = "uint32_t"; break; case 2: p = "uint32_t"; break; default: break; }; break; #endif /* thr_kill2 */ case 481: switch(ndx) { case 0: p = "pid_t"; break; case 1: p = "long"; break; case 2: p = "int"; break; default: break; }; break; /* shm_open */ case 482: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "mode_t"; break; default: break; }; break; /* shm_unlink */ case 483: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* cpuset */ case 484: switch(ndx) { case 0: p = "userland cpusetid_t *"; break; default: break; }; break; #ifdef PAD64_REQUIRED /* freebsd32_cpuset_setid */ case 485: switch(ndx) { case 0: p = "cpuwhich_t"; break; case 1: p = "int"; break; case 2: p = "uint32_t"; break; case 3: p = "uint32_t"; break; case 4: p = "cpusetid_t"; break; default: break; }; break; #else /* freebsd32_cpuset_setid */ case 485: switch(ndx) { case 0: p = "cpuwhich_t"; break; case 1: p = "uint32_t"; break; case 2: p = "uint32_t"; break; case 3: p = "cpusetid_t"; break; default: break; }; break; #endif /* freebsd32_cpuset_getid */ case 486: switch(ndx) { case 0: p = "cpulevel_t"; break; case 1: p = "cpuwhich_t"; break; case 2: p = "uint32_t"; break; case 3: p = "uint32_t"; break; case 4: p = "userland cpusetid_t *"; break; default: break; }; break; /* freebsd32_cpuset_getaffinity */ case 487: switch(ndx) { case 0: p = "cpulevel_t"; break; case 1: p = "cpuwhich_t"; break; case 2: p = "uint32_t"; break; case 3: p = "uint32_t"; break; case 4: p = "size_t"; break; case 5: p = "userland cpuset_t *"; break; default: break; }; break; /* freebsd32_cpuset_setaffinity */ case 488: switch(ndx) { case 0: p = "cpulevel_t"; break; case 1: p = "cpuwhich_t"; break; case 2: p = "uint32_t"; break; case 3: p = "uint32_t"; break; case 4: p = "size_t"; break; case 5: p = "userland const cpuset_t *"; break; default: break; }; break; /* faccessat */ case 489: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "int"; break; case 3: p = "int"; break; default: break; }; break; /* fchmodat */ case 490: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland 
const char *"; break; case 2: p = "mode_t"; break; case 3: p = "int"; break; default: break; }; break; /* fchownat */ case 491: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "uid_t"; break; case 3: p = "gid_t"; break; case 4: p = "int"; break; default: break; }; break; /* freebsd32_fexecve */ case 492: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland uint32_t *"; break; case 2: p = "userland uint32_t *"; break; default: break; }; break; /* freebsd32_futimesat */ case 494: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "userland struct timeval *"; break; default: break; }; break; /* linkat */ case 495: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "int"; break; case 3: p = "userland const char *"; break; case 4: p = "int"; break; default: break; }; break; /* mkdirat */ case 496: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "mode_t"; break; default: break; }; break; /* mkfifoat */ case 497: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "mode_t"; break; default: break; }; break; /* openat */ case 499: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "int"; break; case 3: p = "mode_t"; break; default: break; }; break; /* readlinkat */ case 500: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "userland char *"; break; case 3: p = "size_t"; break; default: break; }; break; /* renameat */ case 501: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "int"; break; case 3: p = "userland const char *"; break; default: break; }; break; /* symlinkat */ case 502: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; default: break; }; break; /* unlinkat */ case 503: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "int"; break; default: break; }; break; /* posix_openpt */ case 504: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* freebsd32_jail_get */ case 506: switch(ndx) { case 0: p = "userland struct iovec32 *"; break; case 1: p = "unsigned int"; break; case 2: p = "int"; break; default: break; }; break; /* freebsd32_jail_set */ case 507: switch(ndx) { case 0: p = "userland struct iovec32 *"; break; case 1: p = "unsigned int"; break; case 2: p = "int"; break; default: break; }; break; /* jail_remove */ case 508: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* closefrom */ case 509: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* freebsd32_semctl */ case 510: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "userland union semun32 *"; break; default: break; }; break; /* freebsd32_msgctl */ case 511: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland struct msqid_ds32 *"; break; default: break; }; break; /* freebsd32_shmctl */ case 512: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland struct shmid_ds32 *"; break; default: break; }; break; /* lpathconf */ case 513: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; default: break; }; break; /* __cap_rights_get */ case 
515: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland cap_rights_t *"; break; default: break; }; break; /* cap_enter */ case 516: break; /* cap_getmode */ case 517: switch(ndx) { case 0: p = "userland u_int *"; break; default: break; }; break; /* pdfork */ case 518: switch(ndx) { case 0: p = "userland int *"; break; case 1: p = "int"; break; default: break; }; break; /* pdkill */ case 519: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* pdgetpid */ case 520: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland pid_t *"; break; default: break; }; break; /* freebsd32_pselect */ case 522: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland fd_set *"; break; case 2: p = "userland fd_set *"; break; case 3: p = "userland fd_set *"; break; case 4: p = "userland const struct timespec32 *"; break; case 5: p = "userland const sigset_t *"; break; default: break; }; break; /* getloginclass */ case 523: switch(ndx) { case 0: p = "userland char *"; break; case 1: p = "size_t"; break; default: break; }; break; /* setloginclass */ case 524: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* rctl_get_racct */ case 525: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; case 2: p = "userland void *"; break; case 3: p = "size_t"; break; default: break; }; break; /* rctl_get_rules */ case 526: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; case 2: p = "userland void *"; break; case 3: p = "size_t"; break; default: break; }; break; /* rctl_get_limits */ case 527: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; case 2: p = "userland void *"; break; case 3: p = "size_t"; break; default: break; }; break; /* rctl_add_rule */ case 528: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; case 2: p = "userland void *"; break; case 3: p = "size_t"; break; default: break; }; break; /* rctl_remove_rule */ case 529: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; case 2: p = "userland void *"; break; case 3: p = "size_t"; break; default: break; }; break; #ifdef PAD64_REQUIRED /* freebsd32_posix_fallocate */ case 530: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "uint32_t"; break; case 3: p = "uint32_t"; break; case 4: p = "uint32_t"; break; case 5: p = "uint32_t"; break; default: break; }; break; /* freebsd32_posix_fadvise */ case 531: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "uint32_t"; break; case 3: p = "uint32_t"; break; case 4: p = "uint32_t"; break; case 5: p = "uint32_t"; break; case 6: p = "int"; break; default: break; }; break; /* freebsd32_wait6 */ case 532: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "uint32_t"; break; case 3: p = "uint32_t"; break; case 4: p = "userland int *"; break; case 5: p = "int"; break; case 6: p = "userland struct wrusage32 *"; break; case 7: p = "userland siginfo_t *"; break; default: break; }; break; #else /* freebsd32_posix_fallocate */ case 530: switch(ndx) { case 0: p = "int"; break; case 1: p = "uint32_t"; break; case 2: p = "uint32_t"; break; case 3: p = "uint32_t"; break; case 4: p = "uint32_t"; break; default: break; }; break; /* freebsd32_posix_fadvise */ case 531: switch(ndx) { case 0: p = "int"; break; case 1: p = "uint32_t"; break; case 2: p = 
"uint32_t"; break; case 3: p = "uint32_t"; break; case 4: p = "uint32_t"; break; case 5: p = "int"; break; default: break; }; break; /* freebsd32_wait6 */ case 532: switch(ndx) { case 0: p = "int"; break; case 1: p = "uint32_t"; break; case 2: p = "uint32_t"; break; case 3: p = "userland int *"; break; case 4: p = "int"; break; case 5: p = "userland struct wrusage32 *"; break; case 6: p = "userland siginfo_t *"; break; default: break; }; break; #endif /* cap_rights_limit */ case 533: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland cap_rights_t *"; break; default: break; }; break; /* freebsd32_cap_ioctls_limit */ case 534: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const uint32_t *"; break; case 2: p = "size_t"; break; default: break; }; break; /* freebsd32_cap_ioctls_get */ case 535: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland uint32_t *"; break; case 2: p = "size_t"; break; default: break; }; break; /* cap_fcntls_limit */ case 536: switch(ndx) { case 0: p = "int"; break; case 1: p = "uint32_t"; break; default: break; }; break; /* cap_fcntls_get */ case 537: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland uint32_t *"; break; default: break; }; break; /* bindat */ case 538: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland const struct sockaddr *"; break; case 3: p = "int"; break; default: break; }; break; /* connectat */ case 539: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland const struct sockaddr *"; break; case 3: p = "int"; break; default: break; }; break; /* chflagsat */ case 540: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "u_long"; break; case 3: p = "int"; break; default: break; }; break; /* accept4 */ case 541: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct sockaddr *"; break; case 2: p = "userland __socklen_t *"; break; case 3: p = "int"; break; default: break; }; break; /* pipe2 */ case 542: switch(ndx) { case 0: p = "userland int *"; break; case 1: p = "int"; break; default: break; }; break; /* freebsd32_aio_mlock */ case 543: switch(ndx) { case 0: p = "userland struct aiocb32 *"; break; default: break; }; break; #ifdef PAD64_REQUIRED /* freebsd32_procctl */ case 544: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "uint32_t"; break; case 3: p = "uint32_t"; break; case 4: p = "int"; break; case 5: p = "userland void *"; break; default: break; }; break; #else /* freebsd32_procctl */ case 544: switch(ndx) { case 0: p = "int"; break; case 1: p = "uint32_t"; break; case 2: p = "uint32_t"; break; case 3: p = "int"; break; case 4: p = "userland void *"; break; default: break; }; break; #endif /* freebsd32_ppoll */ case 545: switch(ndx) { case 0: p = "userland struct pollfd *"; break; case 1: p = "u_int"; break; case 2: p = "userland const struct timespec32 *"; break; case 3: p = "userland const sigset_t *"; break; default: break; }; break; /* freebsd32_futimens */ case 546: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct timespec *"; break; default: break; }; break; /* freebsd32_utimensat */ case 547: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "userland struct timespec *"; break; case 3: p = "int"; break; default: break; }; break; /* fdatasync */ case 550: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* freebsd32_fstat */ case 551: switch(ndx) { 
case 0: p = "int"; break; case 1: p = "userland struct stat32 *"; break; default: break; }; break; /* freebsd32_fstatat */ case 552: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "userland struct stat32 *"; break; case 3: p = "int"; break; default: break; }; break; /* freebsd32_fhstat */ case 553: switch(ndx) { case 0: p = "userland const struct fhandle *"; break; case 1: p = "userland struct stat32 *"; break; default: break; }; break; /* getdirentries */ case 554: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland char *"; break; case 2: p = "size_t"; break; case 3: p = "userland off_t *"; break; default: break; }; break; /* statfs */ case 555: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland struct statfs32 *"; break; default: break; }; break; /* fstatfs */ case 556: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct statfs32 *"; break; default: break; }; break; /* getfsstat */ case 557: switch(ndx) { case 0: p = "userland struct statfs32 *"; break; case 1: p = "long"; break; case 2: p = "int"; break; default: break; }; break; /* fhstatfs */ case 558: switch(ndx) { case 0: p = "userland const struct fhandle *"; break; case 1: p = "userland struct statfs32 *"; break; default: break; }; break; #ifdef PAD64_REQUIRED /* freebsd32_mknodat */ case 559: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "mode_t"; break; case 3: p = "int"; break; case 4: p = "uint32_t"; break; case 5: p = "uint32_t"; break; default: break; }; break; #else /* freebsd32_mknodat */ case 559: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "mode_t"; break; case 3: p = "uint32_t"; break; case 4: p = "uint32_t"; break; default: break; }; break; #endif /* freebsd32_kevent */ case 560: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const struct kevent32 *"; break; case 2: p = "int"; break; case 3: p = "userland struct kevent32 *"; break; case 4: p = "int"; break; case 5: p = "userland const struct timespec32 *"; break; default: break; }; break; /* freebsd32_cpuset_getdomain */ case 561: switch(ndx) { case 0: p = "cpulevel_t"; break; case 1: p = "cpuwhich_t"; break; case 2: p = "uint32_t"; break; case 3: p = "uint32_t"; break; case 4: p = "size_t"; break; case 5: p = "userland domainset_t *"; break; case 6: p = "userland int *"; break; default: break; }; break; /* freebsd32_cpuset_setdomain */ case 562: switch(ndx) { case 0: p = "cpulevel_t"; break; case 1: p = "cpuwhich_t"; break; case 2: p = "uint32_t"; break; case 3: p = "uint32_t"; break; case 4: p = "size_t"; break; case 5: p = "userland domainset_t *"; break; case 6: p = "int"; break; default: break; }; break; /* getrandom */ case 563: switch(ndx) { case 0: p = "userland void *"; break; case 1: p = "size_t"; break; case 2: p = "unsigned int"; break; default: break; }; break; /* getfhat */ case 564: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland char *"; break; case 2: p = "userland struct fhandle *"; break; case 3: p = "int"; break; default: break; }; break; /* fhlink */ case 565: switch(ndx) { case 0: p = "userland struct fhandle *"; break; case 1: p = "userland const char *"; break; default: break; }; break; /* fhlinkat */ case 566: switch(ndx) { case 0: p = "userland struct fhandle *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; default: break; }; break; /* fhreadlink */ case 567: switch(ndx) { case 
0: p = "userland struct fhandle *"; break; case 1: p = "userland char *"; break; case 2: p = "size_t"; break; default: break; }; break; /* funlinkat */ case 568: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "int"; break; case 3: p = "int"; break; default: break; }; break; default: break; }; if (p != NULL) strlcpy(desc, p, descsz); } static void systrace_return_setargdesc(int sysnum, int ndx, char *desc, size_t descsz) { const char *p = NULL; switch (sysnum) { #if !defined(PAD64_REQUIRED) && !defined(__amd64__) #define PAD64_REQUIRED #endif /* nosys */ case 0: /* sys_exit */ case 1: if (ndx == 0 || ndx == 1) p = "void"; break; /* fork */ case 2: /* read */ case 3: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* write */ case 4: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* open */ case 5: if (ndx == 0 || ndx == 1) p = "int"; break; /* close */ case 6: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_wait4 */ case 7: if (ndx == 0 || ndx == 1) p = "int"; break; /* link */ case 9: if (ndx == 0 || ndx == 1) p = "int"; break; /* unlink */ case 10: if (ndx == 0 || ndx == 1) p = "int"; break; /* chdir */ case 12: if (ndx == 0 || ndx == 1) p = "int"; break; /* fchdir */ case 13: if (ndx == 0 || ndx == 1) p = "int"; break; /* chmod */ case 15: if (ndx == 0 || ndx == 1) p = "int"; break; /* chown */ case 16: if (ndx == 0 || ndx == 1) p = "int"; break; /* break */ case 17: if (ndx == 0 || ndx == 1) p = "void *"; break; /* getpid */ case 20: /* mount */ case 21: if (ndx == 0 || ndx == 1) p = "int"; break; /* unmount */ case 22: if (ndx == 0 || ndx == 1) p = "int"; break; /* setuid */ case 23: if (ndx == 0 || ndx == 1) p = "int"; break; /* getuid */ case 24: /* geteuid */ case 25: /* ptrace */ case 26: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_recvmsg */ case 27: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_sendmsg */ case 28: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_recvfrom */ case 29: if (ndx == 0 || ndx == 1) p = "int"; break; /* accept */ case 30: if (ndx == 0 || ndx == 1) p = "int"; break; /* getpeername */ case 31: if (ndx == 0 || ndx == 1) p = "int"; break; /* getsockname */ case 32: if (ndx == 0 || ndx == 1) p = "int"; break; /* access */ case 33: if (ndx == 0 || ndx == 1) p = "int"; break; /* chflags */ case 34: if (ndx == 0 || ndx == 1) p = "int"; break; /* fchflags */ case 35: if (ndx == 0 || ndx == 1) p = "int"; break; /* sync */ case 36: /* kill */ case 37: if (ndx == 0 || ndx == 1) p = "int"; break; /* getppid */ case 39: /* dup */ case 41: if (ndx == 0 || ndx == 1) p = "int"; break; /* getegid */ case 43: /* profil */ case 44: if (ndx == 0 || ndx == 1) p = "int"; break; /* ktrace */ case 45: if (ndx == 0 || ndx == 1) p = "int"; break; /* getgid */ case 47: /* getlogin */ case 49: if (ndx == 0 || ndx == 1) p = "int"; break; /* setlogin */ case 50: if (ndx == 0 || ndx == 1) p = "int"; break; /* acct */ case 51: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_sigaltstack */ case 53: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_ioctl */ case 54: if (ndx == 0 || ndx == 1) p = "int"; break; /* reboot */ case 55: if (ndx == 0 || ndx == 1) p = "int"; break; /* revoke */ case 56: if (ndx == 0 || ndx == 1) p = "int"; break; /* symlink */ case 57: if (ndx == 0 || ndx == 1) p = "int"; break; /* readlink */ case 58: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* freebsd32_execve */ case 59: if (ndx == 0 || ndx == 1) p = "int"; break; /* umask */ case 60: if (ndx == 0 
|| ndx == 1) p = "int"; break; /* chroot */ case 61: if (ndx == 0 || ndx == 1) p = "int"; break; /* msync */ case 65: if (ndx == 0 || ndx == 1) p = "int"; break; /* vfork */ case 66: /* sbrk */ case 69: if (ndx == 0 || ndx == 1) p = "int"; break; /* sstk */ case 70: if (ndx == 0 || ndx == 1) p = "int"; break; /* munmap */ case 73: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_mprotect */ case 74: if (ndx == 0 || ndx == 1) p = "int"; break; /* madvise */ case 75: if (ndx == 0 || ndx == 1) p = "int"; break; /* mincore */ case 78: if (ndx == 0 || ndx == 1) p = "int"; break; /* getgroups */ case 79: if (ndx == 0 || ndx == 1) p = "int"; break; /* setgroups */ case 80: if (ndx == 0 || ndx == 1) p = "int"; break; /* getpgrp */ case 81: /* setpgid */ case 82: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_setitimer */ case 83: if (ndx == 0 || ndx == 1) p = "int"; break; /* swapon */ case 85: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_getitimer */ case 86: if (ndx == 0 || ndx == 1) p = "int"; break; /* getdtablesize */ case 89: /* dup2 */ case 90: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_fcntl */ case 92: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_select */ case 93: if (ndx == 0 || ndx == 1) p = "int"; break; /* fsync */ case 95: if (ndx == 0 || ndx == 1) p = "int"; break; /* setpriority */ case 96: if (ndx == 0 || ndx == 1) p = "int"; break; /* socket */ case 97: if (ndx == 0 || ndx == 1) p = "int"; break; /* connect */ case 98: if (ndx == 0 || ndx == 1) p = "int"; break; /* getpriority */ case 100: if (ndx == 0 || ndx == 1) p = "int"; break; /* bind */ case 104: if (ndx == 0 || ndx == 1) p = "int"; break; /* setsockopt */ case 105: if (ndx == 0 || ndx == 1) p = "int"; break; /* listen */ case 106: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_gettimeofday */ case 116: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_getrusage */ case 117: if (ndx == 0 || ndx == 1) p = "int"; break; /* getsockopt */ case 118: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_readv */ case 120: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_writev */ case 121: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_settimeofday */ case 122: if (ndx == 0 || ndx == 1) p = "int"; break; /* fchown */ case 123: if (ndx == 0 || ndx == 1) p = "int"; break; /* fchmod */ case 124: if (ndx == 0 || ndx == 1) p = "int"; break; /* setreuid */ case 126: if (ndx == 0 || ndx == 1) p = "int"; break; /* setregid */ case 127: if (ndx == 0 || ndx == 1) p = "int"; break; /* rename */ case 128: if (ndx == 0 || ndx == 1) p = "int"; break; /* flock */ case 131: if (ndx == 0 || ndx == 1) p = "int"; break; /* mkfifo */ case 132: if (ndx == 0 || ndx == 1) p = "int"; break; /* sendto */ case 133: if (ndx == 0 || ndx == 1) p = "int"; break; /* shutdown */ case 134: if (ndx == 0 || ndx == 1) p = "int"; break; /* socketpair */ case 135: if (ndx == 0 || ndx == 1) p = "int"; break; /* mkdir */ case 136: if (ndx == 0 || ndx == 1) p = "int"; break; /* rmdir */ case 137: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_utimes */ case 138: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_adjtime */ case 140: if (ndx == 0 || ndx == 1) p = "int"; break; /* setsid */ case 147: /* quotactl */ case 148: if (ndx == 0 || ndx == 1) p = "int"; break; /* getfh */ case 161: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_sysarch */ case 165: if (ndx == 0 || ndx == 1) p = "int"; break; /* rtprio */ case 166: if (ndx == 0 || ndx == 1) p = 
"int"; break; /* freebsd32_semsys */ case 169: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_msgsys */ case 170: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_shmsys */ case 171: if (ndx == 0 || ndx == 1) p = "int"; break; /* ntp_adjtime */ case 176: if (ndx == 0 || ndx == 1) p = "int"; break; /* setgid */ case 181: if (ndx == 0 || ndx == 1) p = "int"; break; /* setegid */ case 182: if (ndx == 0 || ndx == 1) p = "int"; break; /* seteuid */ case 183: if (ndx == 0 || ndx == 1) p = "int"; break; /* pathconf */ case 191: if (ndx == 0 || ndx == 1) p = "int"; break; /* fpathconf */ case 192: if (ndx == 0 || ndx == 1) p = "int"; break; /* getrlimit */ case 194: if (ndx == 0 || ndx == 1) p = "int"; break; /* setrlimit */ case 195: if (ndx == 0 || ndx == 1) p = "int"; break; /* nosys */ case 198: /* freebsd32___sysctl */ case 202: if (ndx == 0 || ndx == 1) p = "int"; break; /* mlock */ case 203: if (ndx == 0 || ndx == 1) p = "int"; break; /* munlock */ case 204: if (ndx == 0 || ndx == 1) p = "int"; break; /* undelete */ case 205: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_futimes */ case 206: if (ndx == 0 || ndx == 1) p = "int"; break; /* getpgid */ case 207: if (ndx == 0 || ndx == 1) p = "int"; break; /* poll */ case 209: if (ndx == 0 || ndx == 1) p = "int"; break; /* lkmnosys */ case 210: /* lkmnosys */ case 211: /* lkmnosys */ case 212: /* lkmnosys */ case 213: /* lkmnosys */ case 214: /* lkmnosys */ case 215: /* lkmnosys */ case 216: /* lkmnosys */ case 217: /* lkmnosys */ case 218: /* lkmnosys */ case 219: /* semget */ case 221: if (ndx == 0 || ndx == 1) p = "int"; break; /* semop */ case 222: if (ndx == 0 || ndx == 1) p = "int"; break; /* msgget */ case 225: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_msgsnd */ case 226: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_msgrcv */ case 227: if (ndx == 0 || ndx == 1) p = "int"; break; /* shmat */ case 228: if (ndx == 0 || ndx == 1) p = "void *"; break; /* shmdt */ case 230: if (ndx == 0 || ndx == 1) p = "int"; break; /* shmget */ case 231: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_clock_gettime */ case 232: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_clock_settime */ case 233: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_clock_getres */ case 234: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_ktimer_create */ case 235: if (ndx == 0 || ndx == 1) p = "int"; break; /* ktimer_delete */ case 236: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_ktimer_settime */ case 237: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_ktimer_gettime */ case 238: if (ndx == 0 || ndx == 1) p = "int"; break; /* ktimer_getoverrun */ case 239: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_nanosleep */ case 240: if (ndx == 0 || ndx == 1) p = "int"; break; /* ffclock_getcounter */ case 241: if (ndx == 0 || ndx == 1) p = "int"; break; /* ffclock_setestimate */ case 242: if (ndx == 0 || ndx == 1) p = "int"; break; /* ffclock_getestimate */ case 243: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_clock_nanosleep */ case 244: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_clock_getcpuclockid2 */ case 247: if (ndx == 0 || ndx == 1) p = "int"; break; /* minherit */ case 250: if (ndx == 0 || ndx == 1) p = "int"; break; /* rfork */ case 251: if (ndx == 0 || ndx == 1) p = "int"; break; /* issetugid */ case 253: /* lchown */ case 254: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_aio_read */ case 255: if (ndx == 0 || 
ndx == 1) p = "int"; break; /* freebsd32_aio_write */ case 256: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_lio_listio */ case 257: if (ndx == 0 || ndx == 1) p = "int"; break; /* lchmod */ case 274: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_lutimes */ case 276: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_preadv */ case 289: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* freebsd32_pwritev */ case 290: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* fhopen */ case 298: if (ndx == 0 || ndx == 1) p = "int"; break; /* modnext */ case 300: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_modstat */ case 301: if (ndx == 0 || ndx == 1) p = "int"; break; /* modfnext */ case 302: if (ndx == 0 || ndx == 1) p = "int"; break; /* modfind */ case 303: if (ndx == 0 || ndx == 1) p = "int"; break; /* kldload */ case 304: if (ndx == 0 || ndx == 1) p = "int"; break; /* kldunload */ case 305: if (ndx == 0 || ndx == 1) p = "int"; break; /* kldfind */ case 306: if (ndx == 0 || ndx == 1) p = "int"; break; /* kldnext */ case 307: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_kldstat */ case 308: if (ndx == 0 || ndx == 1) p = "int"; break; /* kldfirstmod */ case 309: if (ndx == 0 || ndx == 1) p = "int"; break; /* getsid */ case 310: if (ndx == 0 || ndx == 1) p = "int"; break; /* setresuid */ case 311: if (ndx == 0 || ndx == 1) p = "int"; break; /* setresgid */ case 312: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_aio_return */ case 314: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_aio_suspend */ case 315: if (ndx == 0 || ndx == 1) p = "int"; break; /* aio_cancel */ case 316: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_aio_error */ case 317: if (ndx == 0 || ndx == 1) p = "int"; break; /* yield */ case 321: /* mlockall */ case 324: if (ndx == 0 || ndx == 1) p = "int"; break; /* munlockall */ case 325: /* __getcwd */ case 326: if (ndx == 0 || ndx == 1) p = "int"; break; /* sched_setparam */ case 327: if (ndx == 0 || ndx == 1) p = "int"; break; /* sched_getparam */ case 328: if (ndx == 0 || ndx == 1) p = "int"; break; /* sched_setscheduler */ case 329: if (ndx == 0 || ndx == 1) p = "int"; break; /* sched_getscheduler */ case 330: if (ndx == 0 || ndx == 1) p = "int"; break; /* sched_yield */ case 331: /* sched_get_priority_max */ case 332: if (ndx == 0 || ndx == 1) p = "int"; break; /* sched_get_priority_min */ case 333: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_sched_rr_get_interval */ case 334: if (ndx == 0 || ndx == 1) p = "int"; break; /* utrace */ case 335: if (ndx == 0 || ndx == 1) p = "int"; break; /* kldsym */ case 337: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_jail */ case 338: if (ndx == 0 || ndx == 1) p = "int"; break; /* sigprocmask */ case 340: if (ndx == 0 || ndx == 1) p = "int"; break; /* sigsuspend */ case 341: if (ndx == 0 || ndx == 1) p = "int"; break; /* sigpending */ case 343: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_sigtimedwait */ case 345: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_sigwaitinfo */ case 346: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_get_file */ case 347: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_set_file */ case 348: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_get_fd */ case 349: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_set_fd */ case 350: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_delete_file */ case 351: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_delete_fd 
*/ case 352: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_aclcheck_file */ case 353: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_aclcheck_fd */ case 354: if (ndx == 0 || ndx == 1) p = "int"; break; /* extattrctl */ case 355: if (ndx == 0 || ndx == 1) p = "int"; break; /* extattr_set_file */ case 356: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* extattr_get_file */ case 357: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* extattr_delete_file */ case 358: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_aio_waitcomplete */ case 359: if (ndx == 0 || ndx == 1) p = "int"; break; /* getresuid */ case 360: if (ndx == 0 || ndx == 1) p = "int"; break; /* getresgid */ case 361: if (ndx == 0 || ndx == 1) p = "int"; break; /* kqueue */ case 362: /* extattr_set_fd */ case 371: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* extattr_get_fd */ case 372: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* extattr_delete_fd */ case 373: if (ndx == 0 || ndx == 1) p = "int"; break; /* __setugid */ case 374: if (ndx == 0 || ndx == 1) p = "int"; break; /* eaccess */ case 376: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_nmount */ case 378: if (ndx == 0 || ndx == 1) p = "int"; break; /* kenv */ case 390: if (ndx == 0 || ndx == 1) p = "int"; break; /* lchflags */ case 391: if (ndx == 0 || ndx == 1) p = "int"; break; /* uuidgen */ case 392: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_sendfile */ case 393: if (ndx == 0 || ndx == 1) p = "int"; break; /* ksem_close */ case 400: if (ndx == 0 || ndx == 1) p = "int"; break; /* ksem_post */ case 401: if (ndx == 0 || ndx == 1) p = "int"; break; /* ksem_wait */ case 402: if (ndx == 0 || ndx == 1) p = "int"; break; /* ksem_trywait */ case 403: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_ksem_init */ case 404: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_ksem_open */ case 405: if (ndx == 0 || ndx == 1) p = "int"; break; /* ksem_unlink */ case 406: if (ndx == 0 || ndx == 1) p = "int"; break; /* ksem_getvalue */ case 407: if (ndx == 0 || ndx == 1) p = "int"; break; /* ksem_destroy */ case 408: if (ndx == 0 || ndx == 1) p = "int"; break; /* extattr_set_link */ case 412: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* extattr_get_link */ case 413: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* extattr_delete_link */ case 414: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_sigaction */ case 416: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_sigreturn */ case 417: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_getcontext */ case 421: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_setcontext */ case 422: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_swapcontext */ case 423: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_get_link */ case 425: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_set_link */ case 426: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_delete_link */ case 427: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_aclcheck_link */ case 428: if (ndx == 0 || ndx == 1) p = "int"; break; /* sigwait */ case 429: if (ndx == 0 || ndx == 1) p = "int"; break; /* thr_exit */ case 431: if (ndx == 0 || ndx == 1) p = "void"; break; /* thr_self */ case 432: if (ndx == 0 || ndx == 1) p = "int"; break; /* thr_kill */ case 433: if (ndx == 0 || ndx == 1) p = "int"; break; /* jail_attach */ case 436: if (ndx == 0 || ndx == 1) p = "int"; break; /* extattr_list_fd */ case 437: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; 
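/*
 * Descriptive note (added for clarity; not part of the generated
 * systrace_args.c): each case in this switch maps a freebsd32 syscall
 * number to the C type of its return value, which
 * systrace_return_setargdesc() copies into "desc" with strlcpy() for
 * consumption by the DTrace systrace provider.  The substantive change
 * in this revision is case 500 (readlinkat), whose return type is
 * corrected from "int" to "ssize_t" further below.
 */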
/* extattr_list_file */ case 438: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* extattr_list_link */ case 439: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* freebsd32_ksem_timedwait */ case 441: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_thr_suspend */ case 442: if (ndx == 0 || ndx == 1) p = "int"; break; /* thr_wake */ case 443: if (ndx == 0 || ndx == 1) p = "int"; break; /* kldunloadf */ case 444: if (ndx == 0 || ndx == 1) p = "int"; break; /* audit */ case 445: if (ndx == 0 || ndx == 1) p = "int"; break; /* auditon */ case 446: if (ndx == 0 || ndx == 1) p = "int"; break; /* getauid */ case 447: if (ndx == 0 || ndx == 1) p = "int"; break; /* setauid */ case 448: if (ndx == 0 || ndx == 1) p = "int"; break; /* getaudit */ case 449: if (ndx == 0 || ndx == 1) p = "int"; break; /* setaudit */ case 450: if (ndx == 0 || ndx == 1) p = "int"; break; /* getaudit_addr */ case 451: if (ndx == 0 || ndx == 1) p = "int"; break; /* setaudit_addr */ case 452: if (ndx == 0 || ndx == 1) p = "int"; break; /* auditctl */ case 453: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32__umtx_op */ case 454: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_thr_new */ case 455: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_sigqueue */ case 456: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_kmq_open */ case 457: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_kmq_setattr */ case 458: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_kmq_timedreceive */ case 459: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_kmq_timedsend */ case 460: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_kmq_notify */ case 461: if (ndx == 0 || ndx == 1) p = "int"; break; /* kmq_unlink */ case 462: if (ndx == 0 || ndx == 1) p = "int"; break; /* abort2 */ case 463: if (ndx == 0 || ndx == 1) p = "int"; break; /* thr_set_name */ case 464: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_aio_fsync */ case 465: if (ndx == 0 || ndx == 1) p = "int"; break; /* rtprio_thread */ case 466: if (ndx == 0 || ndx == 1) p = "int"; break; /* sctp_peeloff */ case 471: if (ndx == 0 || ndx == 1) p = "int"; break; /* sctp_generic_sendmsg */ case 472: if (ndx == 0 || ndx == 1) p = "int"; break; /* sctp_generic_sendmsg_iov */ case 473: if (ndx == 0 || ndx == 1) p = "int"; break; /* sctp_generic_recvmsg */ case 474: if (ndx == 0 || ndx == 1) p = "int"; break; #ifdef PAD64_REQUIRED /* freebsd32_pread */ case 475: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* freebsd32_pwrite */ case 476: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* freebsd32_mmap */ case 477: if (ndx == 0 || ndx == 1) p = "void *"; break; /* freebsd32_lseek */ case 478: if (ndx == 0 || ndx == 1) p = "off_t"; break; /* freebsd32_truncate */ case 479: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_ftruncate */ case 480: if (ndx == 0 || ndx == 1) p = "int"; break; #else /* freebsd32_pread */ case 475: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* freebsd32_pwrite */ case 476: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* freebsd32_mmap */ case 477: if (ndx == 0 || ndx == 1) p = "void *"; break; /* freebsd32_lseek */ case 478: if (ndx == 0 || ndx == 1) p = "off_t"; break; /* freebsd32_truncate */ case 479: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_ftruncate */ case 480: if (ndx == 0 || ndx == 1) p = "int"; break; #endif /* thr_kill2 */ case 481: if (ndx == 0 || ndx == 1) p = "int"; break; /* shm_open */ case 482: if (ndx == 0 || ndx == 1) p = "int"; 
break; /* shm_unlink */ case 483: if (ndx == 0 || ndx == 1) p = "int"; break; /* cpuset */ case 484: if (ndx == 0 || ndx == 1) p = "int"; break; #ifdef PAD64_REQUIRED /* freebsd32_cpuset_setid */ case 485: if (ndx == 0 || ndx == 1) p = "int"; break; #else /* freebsd32_cpuset_setid */ case 485: if (ndx == 0 || ndx == 1) p = "int"; break; #endif /* freebsd32_cpuset_getid */ case 486: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_cpuset_getaffinity */ case 487: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_cpuset_setaffinity */ case 488: if (ndx == 0 || ndx == 1) p = "int"; break; /* faccessat */ case 489: if (ndx == 0 || ndx == 1) p = "int"; break; /* fchmodat */ case 490: if (ndx == 0 || ndx == 1) p = "int"; break; /* fchownat */ case 491: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_fexecve */ case 492: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_futimesat */ case 494: if (ndx == 0 || ndx == 1) p = "int"; break; /* linkat */ case 495: if (ndx == 0 || ndx == 1) p = "int"; break; /* mkdirat */ case 496: if (ndx == 0 || ndx == 1) p = "int"; break; /* mkfifoat */ case 497: if (ndx == 0 || ndx == 1) p = "int"; break; /* openat */ case 499: if (ndx == 0 || ndx == 1) p = "int"; break; /* readlinkat */ case 500: if (ndx == 0 || ndx == 1) - p = "int"; + p = "ssize_t"; break; /* renameat */ case 501: if (ndx == 0 || ndx == 1) p = "int"; break; /* symlinkat */ case 502: if (ndx == 0 || ndx == 1) p = "int"; break; /* unlinkat */ case 503: if (ndx == 0 || ndx == 1) p = "int"; break; /* posix_openpt */ case 504: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_jail_get */ case 506: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_jail_set */ case 507: if (ndx == 0 || ndx == 1) p = "int"; break; /* jail_remove */ case 508: if (ndx == 0 || ndx == 1) p = "int"; break; /* closefrom */ case 509: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_semctl */ case 510: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_msgctl */ case 511: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_shmctl */ case 512: if (ndx == 0 || ndx == 1) p = "int"; break; /* lpathconf */ case 513: if (ndx == 0 || ndx == 1) p = "int"; break; /* __cap_rights_get */ case 515: if (ndx == 0 || ndx == 1) p = "int"; break; /* cap_enter */ case 516: /* cap_getmode */ case 517: if (ndx == 0 || ndx == 1) p = "int"; break; /* pdfork */ case 518: if (ndx == 0 || ndx == 1) p = "int"; break; /* pdkill */ case 519: if (ndx == 0 || ndx == 1) p = "int"; break; /* pdgetpid */ case 520: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_pselect */ case 522: if (ndx == 0 || ndx == 1) p = "int"; break; /* getloginclass */ case 523: if (ndx == 0 || ndx == 1) p = "int"; break; /* setloginclass */ case 524: if (ndx == 0 || ndx == 1) p = "int"; break; /* rctl_get_racct */ case 525: if (ndx == 0 || ndx == 1) p = "int"; break; /* rctl_get_rules */ case 526: if (ndx == 0 || ndx == 1) p = "int"; break; /* rctl_get_limits */ case 527: if (ndx == 0 || ndx == 1) p = "int"; break; /* rctl_add_rule */ case 528: if (ndx == 0 || ndx == 1) p = "int"; break; /* rctl_remove_rule */ case 529: if (ndx == 0 || ndx == 1) p = "int"; break; #ifdef PAD64_REQUIRED /* freebsd32_posix_fallocate */ case 530: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_posix_fadvise */ case 531: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_wait6 */ case 532: if (ndx == 0 || ndx == 1) p = "int"; break; #else /* freebsd32_posix_fallocate */ case 530: if (ndx == 0 || ndx == 1) p = "int"; 
break; /* freebsd32_posix_fadvise */ case 531: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_wait6 */ case 532: if (ndx == 0 || ndx == 1) p = "int"; break; #endif /* cap_rights_limit */ case 533: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_cap_ioctls_limit */ case 534: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_cap_ioctls_get */ case 535: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* cap_fcntls_limit */ case 536: if (ndx == 0 || ndx == 1) p = "int"; break; /* cap_fcntls_get */ case 537: if (ndx == 0 || ndx == 1) p = "int"; break; /* bindat */ case 538: if (ndx == 0 || ndx == 1) p = "int"; break; /* connectat */ case 539: if (ndx == 0 || ndx == 1) p = "int"; break; /* chflagsat */ case 540: if (ndx == 0 || ndx == 1) p = "int"; break; /* accept4 */ case 541: if (ndx == 0 || ndx == 1) p = "int"; break; /* pipe2 */ case 542: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_aio_mlock */ case 543: if (ndx == 0 || ndx == 1) p = "int"; break; #ifdef PAD64_REQUIRED /* freebsd32_procctl */ case 544: if (ndx == 0 || ndx == 1) p = "int"; break; #else /* freebsd32_procctl */ case 544: if (ndx == 0 || ndx == 1) p = "int"; break; #endif /* freebsd32_ppoll */ case 545: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_futimens */ case 546: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_utimensat */ case 547: if (ndx == 0 || ndx == 1) p = "int"; break; /* fdatasync */ case 550: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_fstat */ case 551: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_fstatat */ case 552: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_fhstat */ case 553: if (ndx == 0 || ndx == 1) p = "int"; break; /* getdirentries */ case 554: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* statfs */ case 555: if (ndx == 0 || ndx == 1) p = "int"; break; /* fstatfs */ case 556: if (ndx == 0 || ndx == 1) p = "int"; break; /* getfsstat */ case 557: if (ndx == 0 || ndx == 1) p = "int"; break; /* fhstatfs */ case 558: if (ndx == 0 || ndx == 1) p = "int"; break; #ifdef PAD64_REQUIRED /* freebsd32_mknodat */ case 559: if (ndx == 0 || ndx == 1) p = "int"; break; #else /* freebsd32_mknodat */ case 559: if (ndx == 0 || ndx == 1) p = "int"; break; #endif /* freebsd32_kevent */ case 560: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_cpuset_getdomain */ case 561: if (ndx == 0 || ndx == 1) p = "int"; break; /* freebsd32_cpuset_setdomain */ case 562: if (ndx == 0 || ndx == 1) p = "int"; break; /* getrandom */ case 563: if (ndx == 0 || ndx == 1) p = "int"; break; /* getfhat */ case 564: if (ndx == 0 || ndx == 1) p = "int"; break; /* fhlink */ case 565: if (ndx == 0 || ndx == 1) p = "int"; break; /* fhlinkat */ case 566: if (ndx == 0 || ndx == 1) p = "int"; break; /* fhreadlink */ case 567: if (ndx == 0 || ndx == 1) p = "int"; break; /* funlinkat */ case 568: if (ndx == 0 || ndx == 1) p = "int"; break; default: break; }; if (p != NULL) strlcpy(desc, p, descsz); } Index: user/ngie/bug-237403/sys/compat/linuxkpi/common/include/linux/device.h =================================================================== --- user/ngie/bug-237403/sys/compat/linuxkpi/common/include/linux/device.h (revision 346925) +++ user/ngie/bug-237403/sys/compat/linuxkpi/common/include/linux/device.h (revision 346926) @@ -1,553 +1,553 @@ /*- * Copyright (c) 2010 Isilon Systems, Inc. * Copyright (c) 2010 iX Systems, Inc. * Copyright (c) 2010 Panasas, Inc. * Copyright (c) 2013-2016 Mellanox Technologies, Ltd. * All rights reserved. 
* * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _LINUX_DEVICE_H_ #define _LINUX_DEVICE_H_ #include #include #include #include #include #include #include #include #include #include #include #include #include struct device; struct fwnode_handle; struct class { const char *name; struct module *owner; struct kobject kobj; devclass_t bsdclass; const struct dev_pm_ops *pm; void (*class_release)(struct class *class); void (*dev_release)(struct device *dev); char * (*devnode)(struct device *dev, umode_t *mode); }; struct dev_pm_ops { int (*suspend)(struct device *dev); int (*suspend_late)(struct device *dev); int (*resume)(struct device *dev); int (*resume_early)(struct device *dev); int (*freeze)(struct device *dev); int (*freeze_late)(struct device *dev); int (*thaw)(struct device *dev); int (*thaw_early)(struct device *dev); int (*poweroff)(struct device *dev); int (*poweroff_late)(struct device *dev); int (*restore)(struct device *dev); int (*restore_early)(struct device *dev); int (*runtime_suspend)(struct device *dev); int (*runtime_resume)(struct device *dev); int (*runtime_idle)(struct device *dev); }; struct device_driver { const char *name; const struct dev_pm_ops *pm; }; struct device_type { const char *name; }; struct device { struct device *parent; struct list_head irqents; device_t bsddev; /* * The following flag is used to determine if the LinuxKPI is * responsible for detaching the BSD device or not. If the * LinuxKPI got the BSD device using devclass_get_device(), it * must not try to detach or delete it, because it's already * done somewhere else. 
*/ bool bsddev_attached_here; struct device_driver *driver; struct device_type *type; dev_t devt; struct class *class; void (*release)(struct device *dev); struct kobject kobj; - uint64_t *dma_mask; + void *dma_priv; void *driver_data; unsigned int irq; #define LINUX_IRQ_INVALID 65535 unsigned int msix; unsigned int msix_max; const struct attribute_group **groups; struct fwnode_handle *fwnode; spinlock_t devres_lock; struct list_head devres_head; }; extern struct device linux_root_device; extern struct kobject linux_class_root; extern const struct kobj_type linux_dev_ktype; extern const struct kobj_type linux_class_ktype; struct class_attribute { struct attribute attr; ssize_t (*show)(struct class *, struct class_attribute *, char *); ssize_t (*store)(struct class *, struct class_attribute *, const char *, size_t); const void *(*namespace)(struct class *, const struct class_attribute *); }; #define CLASS_ATTR(_name, _mode, _show, _store) \ struct class_attribute class_attr_##_name = \ { { #_name, NULL, _mode }, _show, _store } struct device_attribute { struct attribute attr; ssize_t (*show)(struct device *, struct device_attribute *, char *); ssize_t (*store)(struct device *, struct device_attribute *, const char *, size_t); }; #define DEVICE_ATTR(_name, _mode, _show, _store) \ struct device_attribute dev_attr_##_name = \ __ATTR(_name, _mode, _show, _store) #define DEVICE_ATTR_RO(_name) \ struct device_attribute dev_attr_##_name = __ATTR_RO(_name) #define DEVICE_ATTR_WO(_name) \ struct device_attribute dev_attr_##_name = __ATTR_WO(_name) #define DEVICE_ATTR_RW(_name) \ struct device_attribute dev_attr_##_name = __ATTR_RW(_name) /* Simple class attribute that is just a static string */ struct class_attribute_string { struct class_attribute attr; char *str; }; static inline ssize_t show_class_attr_string(struct class *class, struct class_attribute *attr, char *buf) { struct class_attribute_string *cs; cs = container_of(attr, struct class_attribute_string, attr); return snprintf(buf, PAGE_SIZE, "%s\n", cs->str); } /* Currently read-only only */ #define _CLASS_ATTR_STRING(_name, _mode, _str) \ { __ATTR(_name, _mode, show_class_attr_string, NULL), _str } #define CLASS_ATTR_STRING(_name, _mode, _str) \ struct class_attribute_string class_attr_##_name = \ _CLASS_ATTR_STRING(_name, _mode, _str) #define dev_err(dev, fmt, ...) device_printf((dev)->bsddev, fmt, ##__VA_ARGS__) #define dev_warn(dev, fmt, ...) device_printf((dev)->bsddev, fmt, ##__VA_ARGS__) #define dev_info(dev, fmt, ...) device_printf((dev)->bsddev, fmt, ##__VA_ARGS__) #define dev_notice(dev, fmt, ...) device_printf((dev)->bsddev, fmt, ##__VA_ARGS__) #define dev_dbg(dev, fmt, ...) do { } while (0) #define dev_printk(lvl, dev, fmt, ...) \ device_printf((dev)->bsddev, fmt, ##__VA_ARGS__) #define dev_err_once(dev, ...) do { \ static bool __dev_err_once; \ if (!__dev_err_once) { \ __dev_err_once = 1; \ dev_err(dev, __VA_ARGS__); \ } \ } while (0) #define dev_err_ratelimited(dev, ...) do { \ static linux_ratelimit_t __ratelimited; \ if (linux_ratelimited(&__ratelimited)) \ dev_err(dev, __VA_ARGS__); \ } while (0) #define dev_warn_ratelimited(dev, ...) 
do { \ static linux_ratelimit_t __ratelimited; \ if (linux_ratelimited(&__ratelimited)) \ dev_warn(dev, __VA_ARGS__); \ } while (0) static inline void * dev_get_drvdata(const struct device *dev) { return dev->driver_data; } static inline void dev_set_drvdata(struct device *dev, void *data) { dev->driver_data = data; } static inline struct device * get_device(struct device *dev) { if (dev) kobject_get(&dev->kobj); return (dev); } static inline char * dev_name(const struct device *dev) { return kobject_name(&dev->kobj); } #define dev_set_name(_dev, _fmt, ...) \ kobject_set_name(&(_dev)->kobj, (_fmt), ##__VA_ARGS__) static inline void put_device(struct device *dev) { if (dev) kobject_put(&dev->kobj); } static inline int class_register(struct class *class) { class->bsdclass = devclass_create(class->name); kobject_init(&class->kobj, &linux_class_ktype); kobject_set_name(&class->kobj, class->name); kobject_add(&class->kobj, &linux_class_root, class->name); return (0); } static inline void class_unregister(struct class *class) { kobject_put(&class->kobj); } static inline struct device *kobj_to_dev(struct kobject *kobj) { return container_of(kobj, struct device, kobj); } /* * Devices are registered and created for exporting to sysfs. Create * implies register and register assumes the device fields have been * setup appropriately before being called. */ static inline void device_initialize(struct device *dev) { device_t bsddev = NULL; int unit = -1; if (dev->devt) { unit = MINOR(dev->devt); bsddev = devclass_get_device(dev->class->bsdclass, unit); dev->bsddev_attached_here = false; } else if (dev->parent == NULL) { bsddev = devclass_get_device(dev->class->bsdclass, 0); dev->bsddev_attached_here = false; } else { dev->bsddev_attached_here = true; } if (bsddev == NULL && dev->parent != NULL) { bsddev = device_add_child(dev->parent->bsddev, dev->class->kobj.name, unit); } if (bsddev != NULL) device_set_softc(bsddev, dev); dev->bsddev = bsddev; MPASS(dev->bsddev != NULL); kobject_init(&dev->kobj, &linux_dev_ktype); spin_lock_init(&dev->devres_lock); INIT_LIST_HEAD(&dev->devres_head); } static inline int device_add(struct device *dev) { if (dev->bsddev != NULL) { if (dev->devt == 0) dev->devt = makedev(0, device_get_unit(dev->bsddev)); } kobject_add(&dev->kobj, &dev->class->kobj, dev_name(dev)); return (0); } static inline void device_create_release(struct device *dev) { kfree(dev); } static inline struct device * device_create_groups_vargs(struct class *class, struct device *parent, dev_t devt, void *drvdata, const struct attribute_group **groups, const char *fmt, va_list args) { struct device *dev = NULL; int retval = -ENODEV; if (class == NULL || IS_ERR(class)) goto error; dev = kzalloc(sizeof(*dev), GFP_KERNEL); if (!dev) { retval = -ENOMEM; goto error; } dev->devt = devt; dev->class = class; dev->parent = parent; dev->groups = groups; dev->release = device_create_release; /* device_initialize() needs the class and parent to be set */ device_initialize(dev); dev_set_drvdata(dev, drvdata); retval = kobject_set_name_vargs(&dev->kobj, fmt, args); if (retval) goto error; retval = device_add(dev); if (retval) goto error; return dev; error: put_device(dev); return ERR_PTR(retval); } static inline struct device * device_create_with_groups(struct class *class, struct device *parent, dev_t devt, void *drvdata, const struct attribute_group **groups, const char *fmt, ...) 
{ va_list vargs; struct device *dev; va_start(vargs, fmt); dev = device_create_groups_vargs(class, parent, devt, drvdata, groups, fmt, vargs); va_end(vargs); return dev; } static inline bool device_is_registered(struct device *dev) { return (dev->bsddev != NULL); } static inline int device_register(struct device *dev) { device_t bsddev = NULL; int unit = -1; if (device_is_registered(dev)) goto done; if (dev->devt) { unit = MINOR(dev->devt); bsddev = devclass_get_device(dev->class->bsdclass, unit); dev->bsddev_attached_here = false; } else if (dev->parent == NULL) { bsddev = devclass_get_device(dev->class->bsdclass, 0); dev->bsddev_attached_here = false; } else { dev->bsddev_attached_here = true; } if (bsddev == NULL && dev->parent != NULL) { bsddev = device_add_child(dev->parent->bsddev, dev->class->kobj.name, unit); } if (bsddev != NULL) { if (dev->devt == 0) dev->devt = makedev(0, device_get_unit(bsddev)); device_set_softc(bsddev, dev); } dev->bsddev = bsddev; done: kobject_init(&dev->kobj, &linux_dev_ktype); kobject_add(&dev->kobj, &dev->class->kobj, dev_name(dev)); return (0); } static inline void device_unregister(struct device *dev) { device_t bsddev; bsddev = dev->bsddev; dev->bsddev = NULL; if (bsddev != NULL && dev->bsddev_attached_here) { mtx_lock(&Giant); device_delete_child(device_get_parent(bsddev), bsddev); mtx_unlock(&Giant); } put_device(dev); } static inline void device_del(struct device *dev) { device_t bsddev; bsddev = dev->bsddev; dev->bsddev = NULL; if (bsddev != NULL && dev->bsddev_attached_here) { mtx_lock(&Giant); device_delete_child(device_get_parent(bsddev), bsddev); mtx_unlock(&Giant); } } struct device *device_create(struct class *class, struct device *parent, dev_t devt, void *drvdata, const char *fmt, ...); static inline void device_destroy(struct class *class, dev_t devt) { device_t bsddev; int unit; unit = MINOR(devt); bsddev = devclass_get_device(class->bsdclass, unit); if (bsddev != NULL) device_unregister(device_get_softc(bsddev)); } #define dev_pm_set_driver_flags(dev, flags) do { \ } while (0) static inline void linux_class_kfree(struct class *class) { kfree(class); } static inline struct class * class_create(struct module *owner, const char *name) { struct class *class; int error; class = kzalloc(sizeof(*class), M_WAITOK); class->owner = owner; class->name = name; class->class_release = linux_class_kfree; error = class_register(class); if (error) { kfree(class); return (NULL); } return (class); } static inline void class_destroy(struct class *class) { if (class == NULL) return; class_unregister(class); } static inline int device_create_file(struct device *dev, const struct device_attribute *attr) { if (dev) return sysfs_create_file(&dev->kobj, &attr->attr); return -EINVAL; } static inline void device_remove_file(struct device *dev, const struct device_attribute *attr) { if (dev) sysfs_remove_file(&dev->kobj, &attr->attr); } static inline int class_create_file(struct class *class, const struct class_attribute *attr) { if (class) return sysfs_create_file(&class->kobj, &attr->attr); return -EINVAL; } static inline void class_remove_file(struct class *class, const struct class_attribute *attr) { if (class) sysfs_remove_file(&class->kobj, &attr->attr); } static inline int dev_to_node(struct device *dev) { return -1; } char *kvasprintf(gfp_t, const char *, va_list); char *kasprintf(gfp_t, const char *, ...); #endif /* _LINUX_DEVICE_H_ */ Index: user/ngie/bug-237403/sys/compat/linuxkpi/common/include/linux/dma-mapping.h 
=================================================================== --- user/ngie/bug-237403/sys/compat/linuxkpi/common/include/linux/dma-mapping.h (revision 346925) +++ user/ngie/bug-237403/sys/compat/linuxkpi/common/include/linux/dma-mapping.h (revision 346926) @@ -1,301 +1,294 @@ /*- * Copyright (c) 2010 Isilon Systems, Inc. * Copyright (c) 2010 iX Systems, Inc. * Copyright (c) 2010 Panasas, Inc. * Copyright (c) 2013, 2014 Mellanox Technologies, Ltd. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _LINUX_DMA_MAPPING_H_ #define _LINUX_DMA_MAPPING_H_ #include #include #include #include #include #include #include #include #include #include #include #include #include enum dma_data_direction { DMA_BIDIRECTIONAL = 0, DMA_TO_DEVICE = 1, DMA_FROM_DEVICE = 2, DMA_NONE = 3, }; struct dma_map_ops { void* (*alloc_coherent)(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp); void (*free_coherent)(struct device *dev, size_t size, void *vaddr, dma_addr_t dma_handle); dma_addr_t (*map_page)(struct device *dev, struct page *page, unsigned long offset, size_t size, enum dma_data_direction dir, struct dma_attrs *attrs); void (*unmap_page)(struct device *dev, dma_addr_t dma_handle, size_t size, enum dma_data_direction dir, struct dma_attrs *attrs); int (*map_sg)(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir, struct dma_attrs *attrs); void (*unmap_sg)(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir, struct dma_attrs *attrs); void (*sync_single_for_cpu)(struct device *dev, dma_addr_t dma_handle, size_t size, enum dma_data_direction dir); void (*sync_single_for_device)(struct device *dev, dma_addr_t dma_handle, size_t size, enum dma_data_direction dir); void (*sync_single_range_for_cpu)(struct device *dev, dma_addr_t dma_handle, unsigned long offset, size_t size, enum dma_data_direction dir); void (*sync_single_range_for_device)(struct device *dev, dma_addr_t dma_handle, unsigned long offset, size_t size, enum dma_data_direction dir); void (*sync_sg_for_cpu)(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir); void (*sync_sg_for_device)(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir); int (*mapping_error)(struct device *dev, dma_addr_t 
dma_addr); int (*dma_supported)(struct device *dev, u64 mask); int is_phys; }; #define DMA_BIT_MASK(n) ((2ULL << ((n) - 1)) - 1ULL) +int linux_dma_tag_init(struct device *dev, u64 mask); +void *linux_dma_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_handle, gfp_t flag); +dma_addr_t linux_dma_map_phys(struct device *dev, vm_paddr_t phys, size_t len); +void linux_dma_unmap(struct device *dev, dma_addr_t dma_addr, size_t size); +int linux_dma_map_sg_attrs(struct device *dev, struct scatterlist *sgl, + int nents, enum dma_data_direction dir, struct dma_attrs *attrs); +void linux_dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg, + int nents, enum dma_data_direction dir, struct dma_attrs *attrs); + static inline int dma_supported(struct device *dev, u64 mask) { /* XXX busdma takes care of this elsewhere. */ return (1); } static inline int dma_set_mask(struct device *dev, u64 dma_mask) { - if (!dev->dma_mask || !dma_supported(dev, dma_mask)) + if (!dev->dma_priv || !dma_supported(dev, dma_mask)) return -EIO; - *dev->dma_mask = dma_mask; - return (0); + return (linux_dma_tag_init(dev, dma_mask)); } static inline int dma_set_coherent_mask(struct device *dev, u64 mask) { if (!dma_supported(dev, mask)) return -EIO; /* XXX Currently we don't support a separate coherent mask. */ return 0; } static inline int dma_set_mask_and_coherent(struct device *dev, u64 mask) { int r; r = dma_set_mask(dev, mask); if (r == 0) dma_set_coherent_mask(dev, mask); return (r); } static inline void * dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t flag) { - vm_paddr_t high; - size_t align; - void *mem; - - if (dev != NULL && dev->dma_mask) - high = *dev->dma_mask; - else if (flag & GFP_DMA32) - high = BUS_SPACE_MAXADDR_32BIT; - else - high = BUS_SPACE_MAXADDR; - align = PAGE_SIZE << get_order(size); - mem = (void *)kmem_alloc_contig(size, flag, 0, high, align, 0, - VM_MEMATTR_DEFAULT); - if (mem) - *dma_handle = vtophys(mem); - else - *dma_handle = 0; - return (mem); + return (linux_dma_alloc_coherent(dev, size, dma_handle, flag)); } static inline void * dma_zalloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t flag) { return (dma_alloc_coherent(dev, size, dma_handle, flag | __GFP_ZERO)); } static inline void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, - dma_addr_t dma_handle) + dma_addr_t dma_addr) { + linux_dma_unmap(dev, dma_addr, size); kmem_free((vm_offset_t)cpu_addr, size); } -/* XXX This only works with no iommu. 
*/ static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr, size_t size, enum dma_data_direction dir, struct dma_attrs *attrs) { - return vtophys(ptr); + return (linux_dma_map_phys(dev, vtophys(ptr), size)); } static inline void -dma_unmap_single_attrs(struct device *dev, dma_addr_t addr, size_t size, +dma_unmap_single_attrs(struct device *dev, dma_addr_t dma_addr, size_t size, enum dma_data_direction dir, struct dma_attrs *attrs) { + + linux_dma_unmap(dev, dma_addr, size); } static inline dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page, size_t offset, size_t size, enum dma_data_direction dir, unsigned long attrs) { - return (VM_PAGE_TO_PHYS(page) + offset); + return (linux_dma_map_phys(dev, VM_PAGE_TO_PHYS(page) + offset, size)); } static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sgl, int nents, enum dma_data_direction dir, struct dma_attrs *attrs) { - struct scatterlist *sg; - int i; - for_each_sg(sgl, sg, nents, i) - sg_dma_address(sg) = sg_phys(sg); - - return (nents); + return (linux_dma_map_sg_attrs(dev, sgl, nents, dir, attrs)); } static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir, struct dma_attrs *attrs) { + + linux_dma_unmap_sg_attrs(dev, sg, nents, dir, attrs); } static inline dma_addr_t dma_map_page(struct device *dev, struct page *page, unsigned long offset, size_t size, enum dma_data_direction direction) { - return VM_PAGE_TO_PHYS(page) + offset; + return (linux_dma_map_phys(dev, VM_PAGE_TO_PHYS(page) + offset, size)); } static inline void dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size, enum dma_data_direction direction) { + + linux_dma_unmap(dev, dma_address, size); } static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, size_t size, enum dma_data_direction direction) { } static inline void dma_sync_single(struct device *dev, dma_addr_t addr, size_t size, enum dma_data_direction dir) { dma_sync_single_for_cpu(dev, addr, size, dir); } static inline void dma_sync_single_for_device(struct device *dev, dma_addr_t dma_handle, size_t size, enum dma_data_direction direction) { } static inline void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems, enum dma_data_direction direction) { } static inline void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nelems, enum dma_data_direction direction) { } static inline void dma_sync_single_range_for_cpu(struct device *dev, dma_addr_t dma_handle, unsigned long offset, size_t size, int direction) { } static inline void dma_sync_single_range_for_device(struct device *dev, dma_addr_t dma_handle, unsigned long offset, size_t size, int direction) { } static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) { - return (0); + return (dma_addr == 0); } static inline unsigned int dma_set_max_seg_size(struct device *dev, unsigned int size) { return (0); } #define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, NULL) #define dma_unmap_single(d, a, s, r) dma_unmap_single_attrs(d, a, s, r, NULL) #define dma_map_sg(d, s, n, r) dma_map_sg_attrs(d, s, n, r, NULL) #define dma_unmap_sg(d, s, n, r) dma_unmap_sg_attrs(d, s, n, r, NULL) #define DEFINE_DMA_UNMAP_ADDR(name) dma_addr_t name #define DEFINE_DMA_UNMAP_LEN(name) __u32 name #define dma_unmap_addr(p, name) ((p)->name) #define dma_unmap_addr_set(p, name, v) (((p)->name) = (v)) #define dma_unmap_len(p, name) ((p)->name) #define 
dma_unmap_len_set(p, name, v) (((p)->name) = (v)) extern int uma_align_cache; #define dma_get_cache_alignment() uma_align_cache #endif /* _LINUX_DMA_MAPPING_H_ */ Index: user/ngie/bug-237403/sys/compat/linuxkpi/common/include/linux/dmapool.h =================================================================== --- user/ngie/bug-237403/sys/compat/linuxkpi/common/include/linux/dmapool.h (revision 346925) +++ user/ngie/bug-237403/sys/compat/linuxkpi/common/include/linux/dmapool.h (revision 346926) @@ -1,94 +1,95 @@ /*- * Copyright (c) 2010 Isilon Systems, Inc. * Copyright (c) 2010 iX Systems, Inc. * Copyright (c) 2010 Panasas, Inc. * Copyright (c) 2013, 2014 Mellanox Technologies, Ltd. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _LINUX_DMAPOOL_H_ #define _LINUX_DMAPOOL_H_ #include #include #include #include #include +struct dma_pool *linux_dma_pool_create(char *name, struct device *dev, + size_t size, size_t align, size_t boundary); +void linux_dma_pool_destroy(struct dma_pool *pool); +void *linux_dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags, + dma_addr_t *handle); +void linux_dma_pool_free(struct dma_pool *pool, void *vaddr, + dma_addr_t dma_addr); + struct dma_pool { + struct pci_dev *pool_pdev; uma_zone_t pool_zone; + struct mtx pool_dma_lock; + bus_dma_tag_t pool_dmat; + size_t pool_entry_size; + struct mtx pool_ptree_lock; + struct pctrie pool_ptree; }; static inline struct dma_pool * dma_pool_create(char *name, struct device *dev, size_t size, size_t align, size_t boundary) { - struct dma_pool *pool; - pool = kmalloc(sizeof(*pool), GFP_KERNEL); - align--; - /* - * XXX Eventually this could use a separate allocf to honor boundary - * and physical address requirements of the device. 
- */ - pool->pool_zone = uma_zcreate(name, size, NULL, NULL, NULL, NULL, - align, UMA_ZONE_OFFPAGE|UMA_ZONE_HASH); - - return (pool); + return (linux_dma_pool_create(name, dev, size, align, boundary)); } static inline void dma_pool_destroy(struct dma_pool *pool) { - uma_zdestroy(pool->pool_zone); - kfree(pool); + + linux_dma_pool_destroy(pool); } static inline void * dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags, dma_addr_t *handle) { - void *vaddr; - vaddr = uma_zalloc(pool->pool_zone, mem_flags); - if (vaddr) - *handle = vtophys(vaddr); - return (vaddr); + return (linux_dma_pool_alloc(pool, mem_flags, handle)); } static inline void * dma_pool_zalloc(struct dma_pool *pool, gfp_t mem_flags, dma_addr_t *handle) { return (dma_pool_alloc(pool, mem_flags | __GFP_ZERO, handle)); } static inline void -dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t addr) +dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t dma_addr) { - uma_zfree(pool->pool_zone, vaddr); + + linux_dma_pool_free(pool, vaddr, dma_addr); } #endif /* _LINUX_DMAPOOL_H_ */ Index: user/ngie/bug-237403/sys/compat/linuxkpi/common/include/linux/pci.h =================================================================== --- user/ngie/bug-237403/sys/compat/linuxkpi/common/include/linux/pci.h (revision 346925) +++ user/ngie/bug-237403/sys/compat/linuxkpi/common/include/linux/pci.h (revision 346926) @@ -1,926 +1,925 @@ /*- * Copyright (c) 2010 Isilon Systems, Inc. * Copyright (c) 2010 iX Systems, Inc. * Copyright (c) 2010 Panasas, Inc. * Copyright (c) 2013-2016 Mellanox Technologies, Ltd. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
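The dma-mapping.h and dmapool.h hunks above replace the old vtophys()-based mappings and the raw UMA pool with calls into the new linux_dma_*() and linux_dma_pool_*() helpers: dma_set_mask() now initializes a per-device DMA tag via linux_dma_tag_init(), and dma_mapping_error() treats a returned address of 0 as a failed mapping. A minimal usage sketch, illustrative only (the device, buffer and mydrv_* name are hypothetical):

/* Sketch only -- assumes <linux/dma-mapping.h>. */
static int
mydrv_dma_example(struct device *dev, void *buf, size_t len)
{
	dma_addr_t addr;

	/* Setting the mask now sets up the device's DMA tag. */
	if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)) != 0)
		return (-EIO);

	/* dma_map_single() now goes through linux_dma_map_phys()... */
	addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	/* ...so a failed mapping is reported as address 0. */
	if (dma_mapping_error(dev, addr))
		return (-ENOMEM);

	/* ... hardware would consume "addr" here ... */

	dma_unmap_single(dev, addr, len, DMA_TO_DEVICE);
	return (0);
}

The dma_pool_create()/dma_pool_alloc()/dma_pool_free() wrappers follow the same pattern, forwarding to their linux_dma_pool_*() counterparts instead of allocating from a private UMA zone.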
* * $FreeBSD$ */ #ifndef _LINUX_PCI_H_ #define _LINUX_PCI_H_ #define CONFIG_PCI_MSI #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include struct pci_device_id { uint32_t vendor; uint32_t device; uint32_t subvendor; uint32_t subdevice; uint32_t class; uint32_t class_mask; uintptr_t driver_data; }; #define MODULE_DEVICE_TABLE(bus, table) #define PCI_BASE_CLASS_DISPLAY 0x03 #define PCI_CLASS_DISPLAY_VGA 0x0300 #define PCI_CLASS_DISPLAY_OTHER 0x0380 #define PCI_BASE_CLASS_BRIDGE 0x06 #define PCI_CLASS_BRIDGE_ISA 0x0601 #define PCI_ANY_ID -1U #define PCI_VENDOR_ID_APPLE 0x106b #define PCI_VENDOR_ID_ASUSTEK 0x1043 #define PCI_VENDOR_ID_ATI 0x1002 #define PCI_VENDOR_ID_DELL 0x1028 #define PCI_VENDOR_ID_HP 0x103c #define PCI_VENDOR_ID_IBM 0x1014 #define PCI_VENDOR_ID_INTEL 0x8086 #define PCI_VENDOR_ID_MELLANOX 0x15b3 #define PCI_VENDOR_ID_REDHAT_QUMRANET 0x1af4 #define PCI_VENDOR_ID_SERVERWORKS 0x1166 #define PCI_VENDOR_ID_SONY 0x104d #define PCI_VENDOR_ID_TOPSPIN 0x1867 #define PCI_VENDOR_ID_VIA 0x1106 #define PCI_SUBVENDOR_ID_REDHAT_QUMRANET 0x1af4 #define PCI_DEVICE_ID_ATI_RADEON_QY 0x5159 #define PCI_DEVICE_ID_MELLANOX_TAVOR 0x5a44 #define PCI_DEVICE_ID_MELLANOX_TAVOR_BRIDGE 0x5a46 #define PCI_DEVICE_ID_MELLANOX_ARBEL_COMPAT 0x6278 #define PCI_DEVICE_ID_MELLANOX_ARBEL 0x6282 #define PCI_DEVICE_ID_MELLANOX_SINAI_OLD 0x5e8c #define PCI_DEVICE_ID_MELLANOX_SINAI 0x6274 #define PCI_SUBDEVICE_ID_QEMU 0x1100 #define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07)) #define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f) #define PCI_FUNC(devfn) ((devfn) & 0x07) #define PCI_BUS_NUM(devfn) (((devfn) >> 8) & 0xff) #define PCI_VDEVICE(_vendor, _device) \ .vendor = PCI_VENDOR_ID_##_vendor, .device = (_device), \ .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID #define PCI_DEVICE(_vendor, _device) \ .vendor = (_vendor), .device = (_device), \ .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID #define to_pci_dev(n) container_of(n, struct pci_dev, dev) #define PCI_VENDOR_ID PCIR_DEVVENDOR #define PCI_COMMAND PCIR_COMMAND #define PCI_EXP_DEVCTL PCIER_DEVICE_CTL /* Device Control */ #define PCI_EXP_LNKCTL PCIER_LINK_CTL /* Link Control */ #define PCI_EXP_FLAGS_TYPE PCIEM_FLAGS_TYPE /* Device/Port type */ #define PCI_EXP_DEVCAP PCIER_DEVICE_CAP /* Device capabilities */ #define PCI_EXP_DEVSTA PCIER_DEVICE_STA /* Device Status */ #define PCI_EXP_LNKCAP PCIER_LINK_CAP /* Link Capabilities */ #define PCI_EXP_LNKSTA PCIER_LINK_STA /* Link Status */ #define PCI_EXP_SLTCAP PCIER_SLOT_CAP /* Slot Capabilities */ #define PCI_EXP_SLTCTL PCIER_SLOT_CTL /* Slot Control */ #define PCI_EXP_SLTSTA PCIER_SLOT_STA /* Slot Status */ #define PCI_EXP_RTCTL PCIER_ROOT_CTL /* Root Control */ #define PCI_EXP_RTCAP PCIER_ROOT_CAP /* Root Capabilities */ #define PCI_EXP_RTSTA PCIER_ROOT_STA /* Root Status */ #define PCI_EXP_DEVCAP2 PCIER_DEVICE_CAP2 /* Device Capabilities 2 */ #define PCI_EXP_DEVCTL2 PCIER_DEVICE_CTL2 /* Device Control 2 */ #define PCI_EXP_LNKCAP2 PCIER_LINK_CAP2 /* Link Capabilities 2 */ #define PCI_EXP_LNKCTL2 PCIER_LINK_CTL2 /* Link Control 2 */ #define PCI_EXP_LNKSTA2 PCIER_LINK_STA2 /* Link Status 2 */ #define PCI_EXP_FLAGS PCIER_FLAGS /* Capabilities register */ #define PCI_EXP_FLAGS_VERS PCIEM_FLAGS_VERSION /* Capability version */ #define PCI_EXP_TYPE_ROOT_PORT PCIEM_TYPE_ROOT_PORT /* Root Port */ #define PCI_EXP_TYPE_ENDPOINT PCIEM_TYPE_ENDPOINT /* Express Endpoint */ #define PCI_EXP_TYPE_LEG_END 
PCIEM_TYPE_LEGACY_ENDPOINT /* Legacy Endpoint */ #define PCI_EXP_TYPE_DOWNSTREAM PCIEM_TYPE_DOWNSTREAM_PORT /* Downstream Port */ #define PCI_EXP_FLAGS_SLOT PCIEM_FLAGS_SLOT /* Slot implemented */ #define PCI_EXP_TYPE_RC_EC PCIEM_TYPE_ROOT_EC /* Root Complex Event Collector */ #define PCI_EXP_LNKCAP_SLS_2_5GB 0x01 /* Supported Link Speed 2.5GT/s */ #define PCI_EXP_LNKCAP_SLS_5_0GB 0x02 /* Supported Link Speed 5.0GT/s */ #define PCI_EXP_LNKCAP_SLS_8_0GB 0x04 /* Supported Link Speed 8.0GT/s */ #define PCI_EXP_LNKCAP_SLS_16_0GB 0x08 /* Supported Link Speed 16.0GT/s */ #define PCI_EXP_LNKCAP_MLW 0x03f0 /* Maximum Link Width */ #define PCI_EXP_LNKCAP2_SLS_2_5GB 0x02 /* Supported Link Speed 2.5GT/s */ #define PCI_EXP_LNKCAP2_SLS_5_0GB 0x04 /* Supported Link Speed 5.0GT/s */ #define PCI_EXP_LNKCAP2_SLS_8_0GB 0x08 /* Supported Link Speed 8.0GT/s */ #define PCI_EXP_LNKCAP2_SLS_16_0GB 0x10 /* Supported Link Speed 16.0GT/s */ #define PCI_EXP_LNKCTL_HAWD PCIEM_LINK_CTL_HAWD #define PCI_EXP_LNKCAP_CLKPM 0x00040000 #define PCI_EXP_DEVSTA_TRPND 0x0020 #define IORESOURCE_MEM (1 << SYS_RES_MEMORY) #define IORESOURCE_IO (1 << SYS_RES_IOPORT) #define IORESOURCE_IRQ (1 << SYS_RES_IRQ) enum pci_bus_speed { PCI_SPEED_UNKNOWN = -1, PCIE_SPEED_2_5GT, PCIE_SPEED_5_0GT, PCIE_SPEED_8_0GT, PCIE_SPEED_16_0GT, }; enum pcie_link_width { PCIE_LNK_WIDTH_RESRV = 0x00, PCIE_LNK_X1 = 0x01, PCIE_LNK_X2 = 0x02, PCIE_LNK_X4 = 0x04, PCIE_LNK_X8 = 0x08, PCIE_LNK_X12 = 0x0c, PCIE_LNK_X16 = 0x10, PCIE_LNK_X32 = 0x20, PCIE_LNK_WIDTH_UNKNOWN = 0xff, }; typedef int pci_power_t; #define PCI_D0 PCI_POWERSTATE_D0 #define PCI_D1 PCI_POWERSTATE_D1 #define PCI_D2 PCI_POWERSTATE_D2 #define PCI_D3hot PCI_POWERSTATE_D3 #define PCI_D3cold 4 #define PCI_POWER_ERROR PCI_POWERSTATE_UNKNOWN struct pci_dev; struct pci_driver { struct list_head links; char *name; const struct pci_device_id *id_table; int (*probe)(struct pci_dev *dev, const struct pci_device_id *id); void (*remove)(struct pci_dev *dev); int (*suspend) (struct pci_dev *dev, pm_message_t state); /* Device suspended */ int (*resume) (struct pci_dev *dev); /* Device woken up */ void (*shutdown) (struct pci_dev *dev); /* Device shutdown */ driver_t bsddriver; devclass_t bsdclass; struct device_driver driver; const struct pci_error_handlers *err_handler; bool isdrm; }; struct pci_bus { struct pci_dev *self; int number; }; extern struct list_head pci_drivers; extern struct list_head pci_devices; extern spinlock_t pci_lock; #define __devexit_p(x) x struct pci_dev { struct device dev; struct list_head links; struct pci_driver *pdrv; struct pci_bus *bus; - uint64_t dma_mask; uint16_t device; uint16_t vendor; uint16_t subsystem_vendor; uint16_t subsystem_device; unsigned int irq; unsigned int devfn; uint32_t class; uint8_t revision; }; static inline struct resource_list_entry * linux_pci_get_rle(struct pci_dev *pdev, int type, int rid) { struct pci_devinfo *dinfo; struct resource_list *rl; dinfo = device_get_ivars(pdev->dev.bsddev); rl = &dinfo->resources; return resource_list_find(rl, type, rid); } static inline struct resource_list_entry * linux_pci_get_bar(struct pci_dev *pdev, int bar) { struct resource_list_entry *rle; bar = PCIR_BAR(bar); if ((rle = linux_pci_get_rle(pdev, SYS_RES_MEMORY, bar)) == NULL) rle = linux_pci_get_rle(pdev, SYS_RES_IOPORT, bar); return (rle); } static inline struct device * linux_pci_find_irq_dev(unsigned int irq) { struct pci_dev *pdev; struct device *found; found = NULL; spin_lock(&pci_lock); list_for_each_entry(pdev, &pci_devices, links) { if (irq == 
pdev->dev.irq || (irq >= pdev->dev.msix && irq < pdev->dev.msix_max)) { found = &pdev->dev; break; } } spin_unlock(&pci_lock); return (found); } static inline unsigned long pci_resource_start(struct pci_dev *pdev, int bar) { struct resource_list_entry *rle; if ((rle = linux_pci_get_bar(pdev, bar)) == NULL) return (0); return rle->start; } static inline unsigned long pci_resource_len(struct pci_dev *pdev, int bar) { struct resource_list_entry *rle; if ((rle = linux_pci_get_bar(pdev, bar)) == NULL) return (0); return rle->count; } static inline int pci_resource_type(struct pci_dev *pdev, int bar) { struct pci_map *pm; pm = pci_find_bar(pdev->dev.bsddev, PCIR_BAR(bar)); if (!pm) return (-1); if (PCI_BAR_IO(pm->pm_value)) return (SYS_RES_IOPORT); else return (SYS_RES_MEMORY); } /* * All drivers just seem to want to inspect the type not flags. */ static inline int pci_resource_flags(struct pci_dev *pdev, int bar) { int type; type = pci_resource_type(pdev, bar); if (type < 0) return (0); return (1 << type); } static inline const char * pci_name(struct pci_dev *d) { return device_get_desc(d->dev.bsddev); } static inline void * pci_get_drvdata(struct pci_dev *pdev) { return dev_get_drvdata(&pdev->dev); } static inline void pci_set_drvdata(struct pci_dev *pdev, void *data) { dev_set_drvdata(&pdev->dev, data); } static inline int pci_enable_device(struct pci_dev *pdev) { pci_enable_io(pdev->dev.bsddev, SYS_RES_IOPORT); pci_enable_io(pdev->dev.bsddev, SYS_RES_MEMORY); return (0); } static inline void pci_disable_device(struct pci_dev *pdev) { pci_disable_io(pdev->dev.bsddev, SYS_RES_IOPORT); pci_disable_io(pdev->dev.bsddev, SYS_RES_MEMORY); pci_disable_busmaster(pdev->dev.bsddev); } static inline int pci_set_master(struct pci_dev *pdev) { pci_enable_busmaster(pdev->dev.bsddev); return (0); } static inline int pci_set_power_state(struct pci_dev *pdev, int state) { pci_set_powerstate(pdev->dev.bsddev, state); return (0); } static inline int pci_clear_master(struct pci_dev *pdev) { pci_disable_busmaster(pdev->dev.bsddev); return (0); } static inline int pci_request_region(struct pci_dev *pdev, int bar, const char *res_name) { int rid; int type; type = pci_resource_type(pdev, bar); if (type < 0) return (-ENODEV); rid = PCIR_BAR(bar); if (bus_alloc_resource_any(pdev->dev.bsddev, type, &rid, RF_ACTIVE) == NULL) return (-EINVAL); return (0); } static inline void pci_release_region(struct pci_dev *pdev, int bar) { struct resource_list_entry *rle; if ((rle = linux_pci_get_bar(pdev, bar)) == NULL) return; bus_release_resource(pdev->dev.bsddev, rle->type, rle->rid, rle->res); } static inline void pci_release_regions(struct pci_dev *pdev) { int i; for (i = 0; i <= PCIR_MAX_BAR_0; i++) pci_release_region(pdev, i); } static inline int pci_request_regions(struct pci_dev *pdev, const char *res_name) { int error; int i; for (i = 0; i <= PCIR_MAX_BAR_0; i++) { error = pci_request_region(pdev, i, res_name); if (error && error != -ENODEV) { pci_release_regions(pdev); return (error); } } return (0); } static inline void pci_disable_msix(struct pci_dev *pdev) { pci_release_msi(pdev->dev.bsddev); /* * The MSIX IRQ numbers associated with this PCI device are no * longer valid and might be re-assigned. 
Make sure * linux_pci_find_irq_dev() does no longer see them by * resetting their references to zero: */ pdev->dev.msix = 0; pdev->dev.msix_max = 0; } static inline bus_addr_t pci_bus_address(struct pci_dev *pdev, int bar) { return (pci_resource_start(pdev, bar)); } #define PCI_CAP_ID_EXP PCIY_EXPRESS #define PCI_CAP_ID_PCIX PCIY_PCIX #define PCI_CAP_ID_AGP PCIY_AGP #define PCI_CAP_ID_PM PCIY_PMG #define PCI_EXP_DEVCTL PCIER_DEVICE_CTL #define PCI_EXP_DEVCTL_PAYLOAD PCIEM_CTL_MAX_PAYLOAD #define PCI_EXP_DEVCTL_READRQ PCIEM_CTL_MAX_READ_REQUEST #define PCI_EXP_LNKCTL PCIER_LINK_CTL #define PCI_EXP_LNKSTA PCIER_LINK_STA static inline int pci_find_capability(struct pci_dev *pdev, int capid) { int reg; if (pci_find_cap(pdev->dev.bsddev, capid, ®)) return (0); return (reg); } static inline int pci_pcie_cap(struct pci_dev *dev) { return pci_find_capability(dev, PCI_CAP_ID_EXP); } static inline int pci_read_config_byte(struct pci_dev *pdev, int where, u8 *val) { *val = (u8)pci_read_config(pdev->dev.bsddev, where, 1); return (0); } static inline int pci_read_config_word(struct pci_dev *pdev, int where, u16 *val) { *val = (u16)pci_read_config(pdev->dev.bsddev, where, 2); return (0); } static inline int pci_read_config_dword(struct pci_dev *pdev, int where, u32 *val) { *val = (u32)pci_read_config(pdev->dev.bsddev, where, 4); return (0); } static inline int pci_write_config_byte(struct pci_dev *pdev, int where, u8 val) { pci_write_config(pdev->dev.bsddev, where, val, 1); return (0); } static inline int pci_write_config_word(struct pci_dev *pdev, int where, u16 val) { pci_write_config(pdev->dev.bsddev, where, val, 2); return (0); } static inline int pci_write_config_dword(struct pci_dev *pdev, int where, u32 val) { pci_write_config(pdev->dev.bsddev, where, val, 4); return (0); } int linux_pci_register_driver(struct pci_driver *pdrv); int linux_pci_register_drm_driver(struct pci_driver *pdrv); void linux_pci_unregister_driver(struct pci_driver *pdrv); #define pci_register_driver(pdrv) linux_pci_register_driver(pdrv) #define pci_unregister_driver(pdrv) linux_pci_unregister_driver(pdrv) struct msix_entry { int entry; int vector; }; /* * Enable msix, positive errors indicate actual number of available * vectors. Negative errors are failures. * * NB: define added to prevent this definition of pci_enable_msix from * clashing with the native FreeBSD version. */ #define pci_enable_msix(...) \ linux_pci_enable_msix(__VA_ARGS__) static inline int pci_enable_msix(struct pci_dev *pdev, struct msix_entry *entries, int nreq) { struct resource_list_entry *rle; int error; int avail; int i; avail = pci_msix_count(pdev->dev.bsddev); if (avail < nreq) { if (avail == 0) return -EINVAL; return avail; } avail = nreq; if ((error = -pci_alloc_msix(pdev->dev.bsddev, &avail)) != 0) return error; /* * Handle case where "pci_alloc_msix()" may allocate less * interrupts than available and return with no error: */ if (avail < nreq) { pci_release_msi(pdev->dev.bsddev); return avail; } rle = linux_pci_get_rle(pdev, SYS_RES_IRQ, 1); pdev->dev.msix = rle->start; pdev->dev.msix_max = rle->start + avail; for (i = 0; i < nreq; i++) entries[i].vector = pdev->dev.msix + i; return (0); } #define pci_enable_msix_range(...) 
\ linux_pci_enable_msix_range(__VA_ARGS__) static inline int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec, int maxvec) { int nvec = maxvec; int rc; if (maxvec < minvec) return (-ERANGE); do { rc = pci_enable_msix(dev, entries, nvec); if (rc < 0) { return (rc); } else if (rc > 0) { if (rc < minvec) return (-ENOSPC); nvec = rc; } } while (rc); return (nvec); } static inline int pci_channel_offline(struct pci_dev *pdev) { return (pci_get_vendor(pdev->dev.bsddev) == 0xffff); } static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn) { return -ENODEV; } static inline void pci_disable_sriov(struct pci_dev *dev) { } #define DEFINE_PCI_DEVICE_TABLE(_table) \ const struct pci_device_id _table[] __devinitdata /* XXX This should not be necessary. */ #define pcix_set_mmrbc(d, v) 0 #define pcix_get_max_mmrbc(d) 0 #define pcie_set_readrq(d, v) 0 #define PCI_DMA_BIDIRECTIONAL 0 #define PCI_DMA_TODEVICE 1 #define PCI_DMA_FROMDEVICE 2 #define PCI_DMA_NONE 3 #define pci_pool dma_pool #define pci_pool_destroy(...) dma_pool_destroy(__VA_ARGS__) #define pci_pool_alloc(...) dma_pool_alloc(__VA_ARGS__) #define pci_pool_free(...) dma_pool_free(__VA_ARGS__) #define pci_pool_create(_name, _pdev, _size, _align, _alloc) \ dma_pool_create(_name, &(_pdev)->dev, _size, _align, _alloc) #define pci_free_consistent(_hwdev, _size, _vaddr, _dma_handle) \ dma_free_coherent((_hwdev) == NULL ? NULL : &(_hwdev)->dev, \ _size, _vaddr, _dma_handle) #define pci_map_sg(_hwdev, _sg, _nents, _dir) \ dma_map_sg((_hwdev) == NULL ? NULL : &(_hwdev->dev), \ _sg, _nents, (enum dma_data_direction)_dir) #define pci_map_single(_hwdev, _ptr, _size, _dir) \ dma_map_single((_hwdev) == NULL ? NULL : &(_hwdev->dev), \ (_ptr), (_size), (enum dma_data_direction)_dir) #define pci_unmap_single(_hwdev, _addr, _size, _dir) \ dma_unmap_single((_hwdev) == NULL ? NULL : &(_hwdev)->dev, \ _addr, _size, (enum dma_data_direction)_dir) #define pci_unmap_sg(_hwdev, _sg, _nents, _dir) \ dma_unmap_sg((_hwdev) == NULL ? NULL : &(_hwdev)->dev, \ _sg, _nents, (enum dma_data_direction)_dir) #define pci_map_page(_hwdev, _page, _offset, _size, _dir) \ dma_map_page((_hwdev) == NULL ? NULL : &(_hwdev)->dev, _page,\ _offset, _size, (enum dma_data_direction)_dir) #define pci_unmap_page(_hwdev, _dma_address, _size, _dir) \ dma_unmap_page((_hwdev) == NULL ? 
NULL : &(_hwdev)->dev, \ _dma_address, _size, (enum dma_data_direction)_dir) #define pci_set_dma_mask(_pdev, mask) dma_set_mask(&(_pdev)->dev, (mask)) #define pci_dma_mapping_error(_pdev, _dma_addr) \ dma_mapping_error(&(_pdev)->dev, _dma_addr) #define pci_set_consistent_dma_mask(_pdev, _mask) \ dma_set_coherent_mask(&(_pdev)->dev, (_mask)) #define DECLARE_PCI_UNMAP_ADDR(x) DEFINE_DMA_UNMAP_ADDR(x); #define DECLARE_PCI_UNMAP_LEN(x) DEFINE_DMA_UNMAP_LEN(x); #define pci_unmap_addr dma_unmap_addr #define pci_unmap_addr_set dma_unmap_addr_set #define pci_unmap_len dma_unmap_len #define pci_unmap_len_set dma_unmap_len_set typedef unsigned int __bitwise pci_channel_state_t; typedef unsigned int __bitwise pci_ers_result_t; enum pci_channel_state { pci_channel_io_normal = 1, pci_channel_io_frozen = 2, pci_channel_io_perm_failure = 3, }; enum pci_ers_result { PCI_ERS_RESULT_NONE = 1, PCI_ERS_RESULT_CAN_RECOVER = 2, PCI_ERS_RESULT_NEED_RESET = 3, PCI_ERS_RESULT_DISCONNECT = 4, PCI_ERS_RESULT_RECOVERED = 5, }; /* PCI bus error event callbacks */ struct pci_error_handlers { pci_ers_result_t (*error_detected)(struct pci_dev *dev, enum pci_channel_state error); pci_ers_result_t (*mmio_enabled)(struct pci_dev *dev); pci_ers_result_t (*link_reset)(struct pci_dev *dev); pci_ers_result_t (*slot_reset)(struct pci_dev *dev); void (*resume)(struct pci_dev *dev); }; /* FreeBSD does not support SRIOV - yet */ static inline struct pci_dev *pci_physfn(struct pci_dev *dev) { return dev; } static inline bool pci_is_pcie(struct pci_dev *dev) { return !!pci_pcie_cap(dev); } static inline u16 pcie_flags_reg(struct pci_dev *dev) { int pos; u16 reg16; pos = pci_find_capability(dev, PCI_CAP_ID_EXP); if (!pos) return 0; pci_read_config_word(dev, pos + PCI_EXP_FLAGS, ®16); return reg16; } static inline int pci_pcie_type(struct pci_dev *dev) { return (pcie_flags_reg(dev) & PCI_EXP_FLAGS_TYPE) >> 4; } static inline int pcie_cap_version(struct pci_dev *dev) { return pcie_flags_reg(dev) & PCI_EXP_FLAGS_VERS; } static inline bool pcie_cap_has_lnkctl(struct pci_dev *dev) { int type = pci_pcie_type(dev); return pcie_cap_version(dev) > 1 || type == PCI_EXP_TYPE_ROOT_PORT || type == PCI_EXP_TYPE_ENDPOINT || type == PCI_EXP_TYPE_LEG_END; } static inline bool pcie_cap_has_devctl(const struct pci_dev *dev) { return true; } static inline bool pcie_cap_has_sltctl(struct pci_dev *dev) { int type = pci_pcie_type(dev); return pcie_cap_version(dev) > 1 || type == PCI_EXP_TYPE_ROOT_PORT || (type == PCI_EXP_TYPE_DOWNSTREAM && pcie_flags_reg(dev) & PCI_EXP_FLAGS_SLOT); } static inline bool pcie_cap_has_rtctl(struct pci_dev *dev) { int type = pci_pcie_type(dev); return pcie_cap_version(dev) > 1 || type == PCI_EXP_TYPE_ROOT_PORT || type == PCI_EXP_TYPE_RC_EC; } static bool pcie_capability_reg_implemented(struct pci_dev *dev, int pos) { if (!pci_is_pcie(dev)) return false; switch (pos) { case PCI_EXP_FLAGS_TYPE: return true; case PCI_EXP_DEVCAP: case PCI_EXP_DEVCTL: case PCI_EXP_DEVSTA: return pcie_cap_has_devctl(dev); case PCI_EXP_LNKCAP: case PCI_EXP_LNKCTL: case PCI_EXP_LNKSTA: return pcie_cap_has_lnkctl(dev); case PCI_EXP_SLTCAP: case PCI_EXP_SLTCTL: case PCI_EXP_SLTSTA: return pcie_cap_has_sltctl(dev); case PCI_EXP_RTCTL: case PCI_EXP_RTCAP: case PCI_EXP_RTSTA: return pcie_cap_has_rtctl(dev); case PCI_EXP_DEVCAP2: case PCI_EXP_DEVCTL2: case PCI_EXP_LNKCAP2: case PCI_EXP_LNKCTL2: case PCI_EXP_LNKSTA2: return pcie_cap_version(dev) > 1; default: return false; } } static inline int pcie_capability_read_dword(struct pci_dev *dev, int pos, u32 
*dst) { if (pos & 3) return -EINVAL; if (!pcie_capability_reg_implemented(dev, pos)) return -EINVAL; return pci_read_config_dword(dev, pci_pcie_cap(dev) + pos, dst); } static inline int pcie_capability_read_word(struct pci_dev *dev, int pos, u16 *dst) { if (pos & 3) return -EINVAL; if (!pcie_capability_reg_implemented(dev, pos)) return -EINVAL; return pci_read_config_word(dev, pci_pcie_cap(dev) + pos, dst); } static inline int pcie_capability_write_word(struct pci_dev *dev, int pos, u16 val) { if (pos & 1) return -EINVAL; if (!pcie_capability_reg_implemented(dev, pos)) return 0; return pci_write_config_word(dev, pci_pcie_cap(dev) + pos, val); } static inline int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed, enum pcie_link_width *width) { *speed = PCI_SPEED_UNKNOWN; *width = PCIE_LNK_WIDTH_UNKNOWN; return (0); } static inline int pci_num_vf(struct pci_dev *dev) { return (0); } static inline enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev) { device_t root; uint32_t lnkcap, lnkcap2; int error, pos; root = device_get_parent(dev->dev.bsddev); if (root == NULL) return (PCI_SPEED_UNKNOWN); root = device_get_parent(root); if (root == NULL) return (PCI_SPEED_UNKNOWN); root = device_get_parent(root); if (root == NULL) return (PCI_SPEED_UNKNOWN); if (pci_get_vendor(root) == PCI_VENDOR_ID_VIA || pci_get_vendor(root) == PCI_VENDOR_ID_SERVERWORKS) return (PCI_SPEED_UNKNOWN); if ((error = pci_find_cap(root, PCIY_EXPRESS, &pos)) != 0) return (PCI_SPEED_UNKNOWN); lnkcap2 = pci_read_config(root, pos + PCIER_LINK_CAP2, 4); if (lnkcap2) { /* PCIe r3.0-compliant */ if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_2_5GB) return (PCIE_SPEED_2_5GT); if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_5_0GB) return (PCIE_SPEED_5_0GT); if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_8_0GB) return (PCIE_SPEED_8_0GT); if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_16_0GB) return (PCIE_SPEED_16_0GT); } else { /* pre-r3.0 */ lnkcap = pci_read_config(root, pos + PCIER_LINK_CAP, 4); if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB) return (PCIE_SPEED_2_5GT); if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB) return (PCIE_SPEED_5_0GT); if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB) return (PCIE_SPEED_8_0GT); if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB) return (PCIE_SPEED_16_0GT); } return (PCI_SPEED_UNKNOWN); } static inline enum pcie_link_width pcie_get_width_cap(struct pci_dev *dev) { uint32_t lnkcap; pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap); if (lnkcap) return ((lnkcap & PCI_EXP_LNKCAP_MLW) >> 4); return (PCIE_LNK_WIDTH_UNKNOWN); } #endif /* _LINUX_PCI_H_ */ Index: user/ngie/bug-237403/sys/compat/linuxkpi/common/include/linux/scatterlist.h =================================================================== --- user/ngie/bug-237403/sys/compat/linuxkpi/common/include/linux/scatterlist.h (revision 346925) +++ user/ngie/bug-237403/sys/compat/linuxkpi/common/include/linux/scatterlist.h (revision 346926) @@ -1,457 +1,458 @@ /*- * Copyright (c) 2010 Isilon Systems, Inc. * Copyright (c) 2010 iX Systems, Inc. * Copyright (c) 2010 Panasas, Inc. * Copyright (c) 2013-2017 Mellanox Technologies, Ltd. * Copyright (c) 2015 Matthew Dillon * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. 
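With the pci.h hunk above, struct pci_dev loses its dma_mask member; pci_set_dma_mask() and pci_set_consistent_dma_mask() simply forward to dma_set_mask() and dma_set_coherent_mask() on &pdev->dev, so the per-device DMA tag replaces the removed field. A hypothetical probe-time helper, illustrative only:

/* Sketch only -- assumes <linux/pci.h>. */
static int
mydrv_pci_setup_dma(struct pci_dev *pdev)
{
	/* Prefer 64-bit DMA, fall back to 32-bit. */
	if (pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) == 0)
		return (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64)));
	if (pci_set_dma_mask(pdev, DMA_BIT_MASK(32)) != 0)
		return (-EIO);
	return (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)));
}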
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _LINUX_SCATTERLIST_H_ #define _LINUX_SCATTERLIST_H_ #include #include #include struct scatterlist { unsigned long page_link; #define SG_PAGE_LINK_CHAIN 0x1UL #define SG_PAGE_LINK_LAST 0x2UL #define SG_PAGE_LINK_MASK 0x3UL unsigned int offset; unsigned int length; - dma_addr_t address; + dma_addr_t dma_address; + unsigned int dma_length; }; CTASSERT((sizeof(struct scatterlist) & SG_PAGE_LINK_MASK) == 0); struct sg_table { struct scatterlist *sgl; unsigned int nents; unsigned int orig_nents; }; struct sg_page_iter { struct scatterlist *sg; unsigned int sg_pgoffset; unsigned int maxents; struct { unsigned int nents; int pg_advance; } internal; }; #define SCATTERLIST_MAX_SEGMENT (-1U & ~(PAGE_SIZE - 1)) #define SG_MAX_SINGLE_ALLOC (PAGE_SIZE / sizeof(struct scatterlist)) #define SG_MAGIC 0x87654321UL #define SG_CHAIN SG_PAGE_LINK_CHAIN #define SG_END SG_PAGE_LINK_LAST #define sg_is_chain(sg) ((sg)->page_link & SG_PAGE_LINK_CHAIN) #define sg_is_last(sg) ((sg)->page_link & SG_PAGE_LINK_LAST) #define sg_chain_ptr(sg) \ ((struct scatterlist *) ((sg)->page_link & ~SG_PAGE_LINK_MASK)) -#define sg_dma_address(sg) (sg)->address -#define sg_dma_len(sg) (sg)->length +#define sg_dma_address(sg) (sg)->dma_address +#define sg_dma_len(sg) (sg)->dma_length #define for_each_sg_page(sgl, iter, nents, pgoffset) \ for (_sg_iter_init(sgl, iter, nents, pgoffset); \ (iter)->sg; _sg_iter_next(iter)) #define for_each_sg(sglist, sg, sgmax, iter) \ for (iter = 0, sg = (sglist); iter < (sgmax); iter++, sg = sg_next(sg)) typedef struct scatterlist *(sg_alloc_fn) (unsigned int, gfp_t); typedef void (sg_free_fn) (struct scatterlist *, unsigned int); static inline void sg_assign_page(struct scatterlist *sg, struct page *page) { unsigned long page_link = sg->page_link & SG_PAGE_LINK_MASK; sg->page_link = page_link | (unsigned long)page; } static inline void sg_set_page(struct scatterlist *sg, struct page *page, unsigned int len, unsigned int offset) { sg_assign_page(sg, page); sg->offset = offset; sg->length = len; } static inline struct page * sg_page(struct scatterlist *sg) { return ((struct page *)((sg)->page_link & ~SG_PAGE_LINK_MASK)); } static inline void sg_set_buf(struct scatterlist *sg, const void *buf, unsigned int buflen) { sg_set_page(sg, virt_to_page(buf), buflen, ((uintptr_t)buf) & (PAGE_SIZE - 1)); } static inline struct scatterlist * sg_next(struct scatterlist *sg) { if (sg_is_last(sg)) return (NULL); sg++; if (sg_is_chain(sg)) sg = sg_chain_ptr(sg); return (sg); } static inline vm_paddr_t sg_phys(struct scatterlist *sg) { return (VM_PAGE_TO_PHYS(sg_page(sg)) + 
sg->offset); } static inline void * sg_virt(struct scatterlist *sg) { return ((void *)((unsigned long)page_address(sg_page(sg)) + sg->offset)); } static inline void sg_chain(struct scatterlist *prv, unsigned int prv_nents, struct scatterlist *sgl) { struct scatterlist *sg = &prv[prv_nents - 1]; sg->offset = 0; sg->length = 0; sg->page_link = ((unsigned long)sgl | SG_PAGE_LINK_CHAIN) & ~SG_PAGE_LINK_LAST; } static inline void sg_mark_end(struct scatterlist *sg) { sg->page_link |= SG_PAGE_LINK_LAST; sg->page_link &= ~SG_PAGE_LINK_CHAIN; } static inline void sg_init_table(struct scatterlist *sg, unsigned int nents) { bzero(sg, sizeof(*sg) * nents); sg_mark_end(&sg[nents - 1]); } static struct scatterlist * sg_kmalloc(unsigned int nents, gfp_t gfp_mask) { if (nents == SG_MAX_SINGLE_ALLOC) { return ((void *)__get_free_page(gfp_mask)); } else return (kmalloc(nents * sizeof(struct scatterlist), gfp_mask)); } static inline void sg_kfree(struct scatterlist *sg, unsigned int nents) { if (nents == SG_MAX_SINGLE_ALLOC) { free_page((unsigned long)sg); } else kfree(sg); } static inline void __sg_free_table(struct sg_table *table, unsigned int max_ents, bool skip_first_chunk, sg_free_fn * free_fn) { struct scatterlist *sgl, *next; if (unlikely(!table->sgl)) return; sgl = table->sgl; while (table->orig_nents) { unsigned int alloc_size = table->orig_nents; unsigned int sg_size; if (alloc_size > max_ents) { next = sg_chain_ptr(&sgl[max_ents - 1]); alloc_size = max_ents; sg_size = alloc_size - 1; } else { sg_size = alloc_size; next = NULL; } table->orig_nents -= sg_size; if (skip_first_chunk) skip_first_chunk = 0; else free_fn(sgl, alloc_size); sgl = next; } table->sgl = NULL; } static inline void sg_free_table(struct sg_table *table) { __sg_free_table(table, SG_MAX_SINGLE_ALLOC, 0, sg_kfree); } static inline int __sg_alloc_table(struct sg_table *table, unsigned int nents, unsigned int max_ents, struct scatterlist *first_chunk, gfp_t gfp_mask, sg_alloc_fn *alloc_fn) { struct scatterlist *sg, *prv; unsigned int left; memset(table, 0, sizeof(*table)); if (nents == 0) return (-EINVAL); left = nents; prv = NULL; do { unsigned int sg_size; unsigned int alloc_size = left; if (alloc_size > max_ents) { alloc_size = max_ents; sg_size = alloc_size - 1; } else sg_size = alloc_size; left -= sg_size; if (first_chunk) { sg = first_chunk; first_chunk = NULL; } else { sg = alloc_fn(alloc_size, gfp_mask); } if (unlikely(!sg)) { if (prv) table->nents = ++table->orig_nents; return (-ENOMEM); } sg_init_table(sg, alloc_size); table->nents = table->orig_nents += sg_size; if (prv) sg_chain(prv, max_ents, sg); else table->sgl = sg; if (!left) sg_mark_end(&sg[sg_size - 1]); prv = sg; } while (left); return (0); } static inline int sg_alloc_table(struct sg_table *table, unsigned int nents, gfp_t gfp_mask) { int ret; ret = __sg_alloc_table(table, nents, SG_MAX_SINGLE_ALLOC, NULL, gfp_mask, sg_kmalloc); if (unlikely(ret)) __sg_free_table(table, SG_MAX_SINGLE_ALLOC, 0, sg_kfree); return (ret); } static inline int __sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages, unsigned int count, unsigned long off, unsigned long size, unsigned int max_segment, gfp_t gfp_mask) { unsigned int i, segs, cur, len; int rc; struct scatterlist *s; if (__predict_false(!max_segment || offset_in_page(max_segment))) return (-EINVAL); len = 0; for (segs = i = 1; i < count; ++i) { len += PAGE_SIZE; if (len >= max_segment || page_to_pfn(pages[i]) != page_to_pfn(pages[i - 1]) + 1) { ++segs; len = 0; } } if (__predict_false((rc = 
sg_alloc_table(sgt, segs, gfp_mask)))) return (rc); cur = 0; for_each_sg(sgt->sgl, s, sgt->orig_nents, i) { unsigned long seg_size; unsigned int j; len = 0; for (j = cur + 1; j < count; ++j) { len += PAGE_SIZE; if (len >= max_segment || page_to_pfn(pages[j]) != page_to_pfn(pages[j - 1]) + 1) break; } seg_size = ((j - cur) << PAGE_SHIFT) - off; sg_set_page(s, pages[cur], min(size, seg_size), off); size -= seg_size; off = 0; cur = j; } return (0); } static inline int sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages, unsigned int count, unsigned long off, unsigned long size, gfp_t gfp_mask) { return (__sg_alloc_table_from_pages(sgt, pages, count, off, size, SCATTERLIST_MAX_SEGMENT, gfp_mask)); } static inline int sg_nents(struct scatterlist *sg) { int nents; for (nents = 0; sg; sg = sg_next(sg)) nents++; return (nents); } static inline void __sg_page_iter_start(struct sg_page_iter *piter, struct scatterlist *sglist, unsigned int nents, unsigned long pgoffset) { piter->internal.pg_advance = 0; piter->internal.nents = nents; piter->sg = sglist; piter->sg_pgoffset = pgoffset; } static inline void _sg_iter_next(struct sg_page_iter *iter) { struct scatterlist *sg; unsigned int pgcount; sg = iter->sg; pgcount = (sg->offset + sg->length + PAGE_SIZE - 1) >> PAGE_SHIFT; ++iter->sg_pgoffset; while (iter->sg_pgoffset >= pgcount) { iter->sg_pgoffset -= pgcount; sg = sg_next(sg); --iter->maxents; if (sg == NULL || iter->maxents == 0) break; pgcount = (sg->offset + sg->length + PAGE_SIZE - 1) >> PAGE_SHIFT; } iter->sg = sg; } static inline int sg_page_count(struct scatterlist *sg) { return (PAGE_ALIGN(sg->offset + sg->length) >> PAGE_SHIFT); } static inline bool __sg_page_iter_next(struct sg_page_iter *piter) { if (piter->internal.nents == 0) return (0); if (piter->sg == NULL) return (0); piter->sg_pgoffset += piter->internal.pg_advance; piter->internal.pg_advance = 1; while (piter->sg_pgoffset >= sg_page_count(piter->sg)) { piter->sg_pgoffset -= sg_page_count(piter->sg); piter->sg = sg_next(piter->sg); if (--piter->internal.nents == 0) return (0); if (piter->sg == NULL) return (0); } return (1); } static inline void _sg_iter_init(struct scatterlist *sgl, struct sg_page_iter *iter, unsigned int nents, unsigned long pgoffset) { if (nents) { iter->sg = sgl; iter->sg_pgoffset = pgoffset - 1; iter->maxents = nents; _sg_iter_next(iter); } else { iter->sg = NULL; iter->sg_pgoffset = 0; iter->maxents = 0; } } static inline dma_addr_t sg_page_iter_dma_address(struct sg_page_iter *spi) { - return (spi->sg->address + (spi->sg_pgoffset << PAGE_SHIFT)); + return (spi->sg->dma_address + (spi->sg_pgoffset << PAGE_SHIFT)); } static inline struct page * sg_page_iter_page(struct sg_page_iter *piter) { return (nth_page(sg_page(piter->sg), piter->sg_pgoffset)); } #endif /* _LINUX_SCATTERLIST_H_ */ Index: user/ngie/bug-237403/sys/compat/linuxkpi/common/src/linux_compat.c =================================================================== --- user/ngie/bug-237403/sys/compat/linuxkpi/common/src/linux_compat.c (revision 346925) +++ user/ngie/bug-237403/sys/compat/linuxkpi/common/src/linux_compat.c (revision 346926) @@ -1,2446 +1,2446 @@ /*- * Copyright (c) 2010 Isilon Systems, Inc. * Copyright (c) 2010 iX Systems, Inc. * Copyright (c) 2010 Panasas, Inc. * Copyright (c) 2013-2018 Mellanox Technologies, Ltd. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. 
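The scatterlist.h hunk above gives struct scatterlist dedicated dma_address and dma_length fields, so sg_dma_len() no longer aliases the CPU-side length and the bus address filled in by dma_map_sg() (now backed by linux_dma_map_sg_attrs()) is kept apart from the page/offset pair. An illustrative walk over the mapped segments; program_hw_segment() is a hypothetical stand-in for whatever the driver programs into hardware:

/* Sketch only -- assumes <linux/dma-mapping.h> and <linux/scatterlist.h>. */
extern void program_hw_segment(dma_addr_t addr, unsigned int len);	/* hypothetical */

static int
mydrv_map_sgtable(struct device *dev, struct sg_table *sgt)
{
	struct scatterlist *sg;
	int i, nmapped;

	nmapped = dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
	if (nmapped == 0)
		return (-ENOMEM);
	for_each_sg(sgt->sgl, sg, nmapped, i) {
		/* Bus address/length now live in sg->dma_address/dma_length. */
		program_hw_segment(sg_dma_address(sg), sg_dma_len(sg));
	}
	/* ... later, once the I/O has completed: */
	dma_unmap_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
	return (0);
}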
Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include "opt_stack.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #if defined(__i386__) || defined(__amd64__) #include #endif #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #if defined(__i386__) || defined(__amd64__) #include #endif SYSCTL_NODE(_compat, OID_AUTO, linuxkpi, CTLFLAG_RW, 0, "LinuxKPI parameters"); MALLOC_DEFINE(M_KMALLOC, "linux", "Linux kmalloc compat"); #include /* Undo Linux compat changes. */ #undef RB_ROOT #undef file #undef cdev #define RB_ROOT(head) (head)->rbh_root static void linux_cdev_deref(struct linux_cdev *ldev); static struct vm_area_struct *linux_cdev_handle_find(void *handle); struct kobject linux_class_root; struct device linux_root_device; struct class linux_class_misc; struct list_head pci_drivers; struct list_head pci_devices; spinlock_t pci_lock; unsigned long linux_timer_hz_mask; int panic_cmp(struct rb_node *one, struct rb_node *two) { panic("no cmp"); } RB_GENERATE(linux_root, rb_node, __entry, panic_cmp); int kobject_set_name_vargs(struct kobject *kobj, const char *fmt, va_list args) { va_list tmp_va; int len; char *old; char *name; char dummy; old = kobj->name; if (old && fmt == NULL) return (0); /* compute length of string */ va_copy(tmp_va, args); len = vsnprintf(&dummy, 0, fmt, tmp_va); va_end(tmp_va); /* account for zero termination */ len++; /* check for error */ if (len < 1) return (-EINVAL); /* allocate memory for string */ name = kzalloc(len, GFP_KERNEL); if (name == NULL) return (-ENOMEM); vsnprintf(name, len, fmt, args); kobj->name = name; /* free old string */ kfree(old); /* filter new string */ for (; *name != '\0'; name++) if (*name == '/') *name = '!'; return (0); } int kobject_set_name(struct kobject *kobj, const char *fmt, ...) 
{ va_list args; int error; va_start(args, fmt); error = kobject_set_name_vargs(kobj, fmt, args); va_end(args); return (error); } static int kobject_add_complete(struct kobject *kobj, struct kobject *parent) { const struct kobj_type *t; int error; kobj->parent = parent; error = sysfs_create_dir(kobj); if (error == 0 && kobj->ktype && kobj->ktype->default_attrs) { struct attribute **attr; t = kobj->ktype; for (attr = t->default_attrs; *attr != NULL; attr++) { error = sysfs_create_file(kobj, *attr); if (error) break; } if (error) sysfs_remove_dir(kobj); } return (error); } int kobject_add(struct kobject *kobj, struct kobject *parent, const char *fmt, ...) { va_list args; int error; va_start(args, fmt); error = kobject_set_name_vargs(kobj, fmt, args); va_end(args); if (error) return (error); return kobject_add_complete(kobj, parent); } void linux_kobject_release(struct kref *kref) { struct kobject *kobj; char *name; kobj = container_of(kref, struct kobject, kref); sysfs_remove_dir(kobj); name = kobj->name; if (kobj->ktype && kobj->ktype->release) kobj->ktype->release(kobj); kfree(name); } static void linux_kobject_kfree(struct kobject *kobj) { kfree(kobj); } static void linux_kobject_kfree_name(struct kobject *kobj) { if (kobj) { kfree(kobj->name); } } const struct kobj_type linux_kfree_type = { .release = linux_kobject_kfree }; static void linux_device_release(struct device *dev) { pr_debug("linux_device_release: %s\n", dev_name(dev)); kfree(dev); } static ssize_t linux_class_show(struct kobject *kobj, struct attribute *attr, char *buf) { struct class_attribute *dattr; ssize_t error; dattr = container_of(attr, struct class_attribute, attr); error = -EIO; if (dattr->show) error = dattr->show(container_of(kobj, struct class, kobj), dattr, buf); return (error); } static ssize_t linux_class_store(struct kobject *kobj, struct attribute *attr, const char *buf, size_t count) { struct class_attribute *dattr; ssize_t error; dattr = container_of(attr, struct class_attribute, attr); error = -EIO; if (dattr->store) error = dattr->store(container_of(kobj, struct class, kobj), dattr, buf, count); return (error); } static void linux_class_release(struct kobject *kobj) { struct class *class; class = container_of(kobj, struct class, kobj); if (class->class_release) class->class_release(class); } static const struct sysfs_ops linux_class_sysfs = { .show = linux_class_show, .store = linux_class_store, }; const struct kobj_type linux_class_ktype = { .release = linux_class_release, .sysfs_ops = &linux_class_sysfs }; static void linux_dev_release(struct kobject *kobj) { struct device *dev; dev = container_of(kobj, struct device, kobj); /* This is the precedence defined by linux. 
*/ if (dev->release) dev->release(dev); else if (dev->class && dev->class->dev_release) dev->class->dev_release(dev); } static ssize_t linux_dev_show(struct kobject *kobj, struct attribute *attr, char *buf) { struct device_attribute *dattr; ssize_t error; dattr = container_of(attr, struct device_attribute, attr); error = -EIO; if (dattr->show) error = dattr->show(container_of(kobj, struct device, kobj), dattr, buf); return (error); } static ssize_t linux_dev_store(struct kobject *kobj, struct attribute *attr, const char *buf, size_t count) { struct device_attribute *dattr; ssize_t error; dattr = container_of(attr, struct device_attribute, attr); error = -EIO; if (dattr->store) error = dattr->store(container_of(kobj, struct device, kobj), dattr, buf, count); return (error); } static const struct sysfs_ops linux_dev_sysfs = { .show = linux_dev_show, .store = linux_dev_store, }; const struct kobj_type linux_dev_ktype = { .release = linux_dev_release, .sysfs_ops = &linux_dev_sysfs }; struct device * device_create(struct class *class, struct device *parent, dev_t devt, void *drvdata, const char *fmt, ...) { struct device *dev; va_list args; dev = kzalloc(sizeof(*dev), M_WAITOK); dev->parent = parent; dev->class = class; dev->devt = devt; dev->driver_data = drvdata; dev->release = linux_device_release; va_start(args, fmt); kobject_set_name_vargs(&dev->kobj, fmt, args); va_end(args); device_register(dev); return (dev); } int kobject_init_and_add(struct kobject *kobj, const struct kobj_type *ktype, struct kobject *parent, const char *fmt, ...) { va_list args; int error; kobject_init(kobj, ktype); kobj->ktype = ktype; kobj->parent = parent; kobj->name = NULL; va_start(args, fmt); error = kobject_set_name_vargs(kobj, fmt, args); va_end(args); if (error) return (error); return kobject_add_complete(kobj, parent); } static void linux_kq_lock(void *arg) { spinlock_t *s = arg; spin_lock(s); } static void linux_kq_unlock(void *arg) { spinlock_t *s = arg; spin_unlock(s); } static void linux_kq_lock_owned(void *arg) { #ifdef INVARIANTS spinlock_t *s = arg; mtx_assert(&s->m, MA_OWNED); #endif } static void linux_kq_lock_unowned(void *arg) { #ifdef INVARIANTS spinlock_t *s = arg; mtx_assert(&s->m, MA_NOTOWNED); #endif } static void linux_file_kqfilter_poll(struct linux_file *, int); struct linux_file * linux_file_alloc(void) { struct linux_file *filp; filp = kzalloc(sizeof(*filp), GFP_KERNEL); /* set initial refcount */ filp->f_count = 1; /* setup fields needed by kqueue support */ spin_lock_init(&filp->f_kqlock); knlist_init(&filp->f_selinfo.si_note, &filp->f_kqlock, linux_kq_lock, linux_kq_unlock, linux_kq_lock_owned, linux_kq_lock_unowned); return (filp); } void linux_file_free(struct linux_file *filp) { if (filp->_file == NULL) { if (filp->f_shmem != NULL) vm_object_deallocate(filp->f_shmem); kfree(filp); } else { /* * The close method of the character device or file * will free the linux_file structure: */ _fdrop(filp->_file, curthread); } } static int linux_cdev_pager_fault(vm_object_t vm_obj, vm_ooffset_t offset, int prot, vm_page_t *mres) { struct vm_area_struct *vmap; vmap = linux_cdev_handle_find(vm_obj->handle); MPASS(vmap != NULL); MPASS(vmap->vm_private_data == vm_obj->handle); if (likely(vmap->vm_ops != NULL && offset < vmap->vm_len)) { vm_paddr_t paddr = IDX_TO_OFF(vmap->vm_pfn) + offset; vm_page_t page; if (((*mres)->flags & PG_FICTITIOUS) != 0) { /* * If the passed in result page is a fake * page, update it with the new physical * address. 
*/ page = *mres; vm_page_updatefake(page, paddr, vm_obj->memattr); } else { /* * Replace the passed in "mres" page with our * own fake page and free up the all of the * original pages. */ VM_OBJECT_WUNLOCK(vm_obj); page = vm_page_getfake(paddr, vm_obj->memattr); VM_OBJECT_WLOCK(vm_obj); vm_page_replace_checked(page, vm_obj, (*mres)->pindex, *mres); vm_page_lock(*mres); vm_page_free(*mres); vm_page_unlock(*mres); *mres = page; } page->valid = VM_PAGE_BITS_ALL; return (VM_PAGER_OK); } return (VM_PAGER_FAIL); } static int linux_cdev_pager_populate(vm_object_t vm_obj, vm_pindex_t pidx, int fault_type, vm_prot_t max_prot, vm_pindex_t *first, vm_pindex_t *last) { struct vm_area_struct *vmap; int err; linux_set_current(curthread); /* get VM area structure */ vmap = linux_cdev_handle_find(vm_obj->handle); MPASS(vmap != NULL); MPASS(vmap->vm_private_data == vm_obj->handle); VM_OBJECT_WUNLOCK(vm_obj); down_write(&vmap->vm_mm->mmap_sem); if (unlikely(vmap->vm_ops == NULL)) { err = VM_FAULT_SIGBUS; } else { struct vm_fault vmf; /* fill out VM fault structure */ vmf.virtual_address = (void *)(uintptr_t)IDX_TO_OFF(pidx); vmf.flags = (fault_type & VM_PROT_WRITE) ? FAULT_FLAG_WRITE : 0; vmf.pgoff = 0; vmf.page = NULL; vmf.vma = vmap; vmap->vm_pfn_count = 0; vmap->vm_pfn_pcount = &vmap->vm_pfn_count; vmap->vm_obj = vm_obj; err = vmap->vm_ops->fault(vmap, &vmf); while (vmap->vm_pfn_count == 0 && err == VM_FAULT_NOPAGE) { kern_yield(PRI_USER); err = vmap->vm_ops->fault(vmap, &vmf); } } /* translate return code */ switch (err) { case VM_FAULT_OOM: err = VM_PAGER_AGAIN; break; case VM_FAULT_SIGBUS: err = VM_PAGER_BAD; break; case VM_FAULT_NOPAGE: /* * By contract the fault handler will return having * busied all the pages itself. If pidx is already * found in the object, it will simply xbusy the first * page and return with vm_pfn_count set to 1. */ *first = vmap->vm_pfn_first; *last = *first + vmap->vm_pfn_count - 1; err = VM_PAGER_OK; break; default: err = VM_PAGER_ERROR; break; } up_write(&vmap->vm_mm->mmap_sem); VM_OBJECT_WLOCK(vm_obj); return (err); } static struct rwlock linux_vma_lock; static TAILQ_HEAD(, vm_area_struct) linux_vma_head = TAILQ_HEAD_INITIALIZER(linux_vma_head); static void linux_cdev_handle_free(struct vm_area_struct *vmap) { /* Drop reference on vm_file */ if (vmap->vm_file != NULL) fput(vmap->vm_file); /* Drop reference on mm_struct */ mmput(vmap->vm_mm); kfree(vmap); } static void linux_cdev_handle_remove(struct vm_area_struct *vmap) { rw_wlock(&linux_vma_lock); TAILQ_REMOVE(&linux_vma_head, vmap, vm_entry); rw_wunlock(&linux_vma_lock); } static struct vm_area_struct * linux_cdev_handle_find(void *handle) { struct vm_area_struct *vmap; rw_rlock(&linux_vma_lock); TAILQ_FOREACH(vmap, &linux_vma_head, vm_entry) { if (vmap->vm_private_data == handle) break; } rw_runlock(&linux_vma_lock); return (vmap); } static int linux_cdev_pager_ctor(void *handle, vm_ooffset_t size, vm_prot_t prot, vm_ooffset_t foff, struct ucred *cred, u_short *color) { MPASS(linux_cdev_handle_find(handle) != NULL); *color = 0; return (0); } static void linux_cdev_pager_dtor(void *handle) { const struct vm_operations_struct *vm_ops; struct vm_area_struct *vmap; vmap = linux_cdev_handle_find(handle); MPASS(vmap != NULL); /* * Remove handle before calling close operation to prevent * other threads from reusing the handle pointer. 
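 * The vm_ops->close() callback below then runs with the mmap_sem held
 * for writing, mirroring how linux_file_mmap_single() invokes the
 * driver's mmap method on the open side.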
*/ linux_cdev_handle_remove(vmap); down_write(&vmap->vm_mm->mmap_sem); vm_ops = vmap->vm_ops; if (likely(vm_ops != NULL)) vm_ops->close(vmap); up_write(&vmap->vm_mm->mmap_sem); linux_cdev_handle_free(vmap); } static struct cdev_pager_ops linux_cdev_pager_ops[2] = { { /* OBJT_MGTDEVICE */ .cdev_pg_populate = linux_cdev_pager_populate, .cdev_pg_ctor = linux_cdev_pager_ctor, .cdev_pg_dtor = linux_cdev_pager_dtor }, { /* OBJT_DEVICE */ .cdev_pg_fault = linux_cdev_pager_fault, .cdev_pg_ctor = linux_cdev_pager_ctor, .cdev_pg_dtor = linux_cdev_pager_dtor }, }; int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address, unsigned long size) { vm_object_t obj; vm_page_t m; obj = vma->vm_obj; if (obj == NULL || (obj->flags & OBJ_UNMANAGED) != 0) return (-ENOTSUP); VM_OBJECT_RLOCK(obj); for (m = vm_page_find_least(obj, OFF_TO_IDX(address)); m != NULL && m->pindex < OFF_TO_IDX(address + size); m = TAILQ_NEXT(m, listq)) pmap_remove_all(m); VM_OBJECT_RUNLOCK(obj); return (0); } static struct file_operations dummy_ldev_ops = { /* XXXKIB */ }; static struct linux_cdev dummy_ldev = { .ops = &dummy_ldev_ops, }; #define LDEV_SI_DTR 0x0001 #define LDEV_SI_REF 0x0002 static void linux_get_fop(struct linux_file *filp, const struct file_operations **fop, struct linux_cdev **dev) { struct linux_cdev *ldev; u_int siref; ldev = filp->f_cdev; *fop = filp->f_op; if (ldev != NULL) { for (siref = ldev->siref;;) { if ((siref & LDEV_SI_DTR) != 0) { ldev = &dummy_ldev; siref = ldev->siref; *fop = ldev->ops; MPASS((ldev->siref & LDEV_SI_DTR) == 0); } else if (atomic_fcmpset_int(&ldev->siref, &siref, siref + LDEV_SI_REF)) { break; } } } *dev = ldev; } static void linux_drop_fop(struct linux_cdev *ldev) { if (ldev == NULL) return; MPASS((ldev->siref & ~LDEV_SI_DTR) != 0); atomic_subtract_int(&ldev->siref, LDEV_SI_REF); } #define OPW(fp,td,code) ({ \ struct file *__fpop; \ __typeof(code) __retval; \ \ __fpop = (td)->td_fpop; \ (td)->td_fpop = (fp); \ __retval = (code); \ (td)->td_fpop = __fpop; \ __retval; \ }) static int linux_dev_fdopen(struct cdev *dev, int fflags, struct thread *td, struct file *file) { struct linux_cdev *ldev; struct linux_file *filp; const struct file_operations *fop; int error; ldev = dev->si_drv1; filp = linux_file_alloc(); filp->f_dentry = &filp->f_dentry_store; filp->f_op = ldev->ops; filp->f_mode = file->f_flag; filp->f_flags = file->f_flag; filp->f_vnode = file->f_vnode; filp->_file = file; refcount_acquire(&ldev->refs); filp->f_cdev = ldev; linux_set_current(td); linux_get_fop(filp, &fop, &ldev); if (fop->open != NULL) { error = -fop->open(file->f_vnode, filp); if (error != 0) { linux_drop_fop(ldev); linux_cdev_deref(filp->f_cdev); kfree(filp); return (error); } } /* hold on to the vnode - used for fstat() */ vhold(filp->f_vnode); /* release the file from devfs */ finit(file, filp->f_mode, DTYPE_DEV, filp, &linuxfileops); linux_drop_fop(ldev); return (ENXIO); } #define LINUX_IOCTL_MIN_PTR 0x10000UL #define LINUX_IOCTL_MAX_PTR (LINUX_IOCTL_MIN_PTR + IOCPARM_MAX) static inline int linux_remap_address(void **uaddr, size_t len) { uintptr_t uaddr_val = (uintptr_t)(*uaddr); if (unlikely(uaddr_val >= LINUX_IOCTL_MIN_PTR && uaddr_val < LINUX_IOCTL_MAX_PTR)) { struct task_struct *pts = current; if (pts == NULL) { *uaddr = NULL; return (1); } /* compute data offset */ uaddr_val -= LINUX_IOCTL_MIN_PTR; /* check that length is within bounds */ if ((len > IOCPARM_MAX) || (uaddr_val + len) > pts->bsd_ioctl_len) { *uaddr = NULL; return (1); } /* re-add kernel buffer address */ uaddr_val += 
(uintptr_t)pts->bsd_ioctl_data; /* update address location */ *uaddr = (void *)uaddr_val; return (1); } return (0); } int linux_copyin(const void *uaddr, void *kaddr, size_t len) { if (linux_remap_address(__DECONST(void **, &uaddr), len)) { if (uaddr == NULL) return (-EFAULT); memcpy(kaddr, uaddr, len); return (0); } return (-copyin(uaddr, kaddr, len)); } int linux_copyout(const void *kaddr, void *uaddr, size_t len) { if (linux_remap_address(&uaddr, len)) { if (uaddr == NULL) return (-EFAULT); memcpy(uaddr, kaddr, len); return (0); } return (-copyout(kaddr, uaddr, len)); } size_t linux_clear_user(void *_uaddr, size_t _len) { uint8_t *uaddr = _uaddr; size_t len = _len; /* make sure uaddr is aligned before going into the fast loop */ while (((uintptr_t)uaddr & 7) != 0 && len > 7) { if (subyte(uaddr, 0)) return (_len); uaddr++; len--; } /* zero 8 bytes at a time */ while (len > 7) { #ifdef __LP64__ if (suword64(uaddr, 0)) return (_len); #else if (suword32(uaddr, 0)) return (_len); if (suword32(uaddr + 4, 0)) return (_len); #endif uaddr += 8; len -= 8; } /* zero fill end, if any */ while (len > 0) { if (subyte(uaddr, 0)) return (_len); uaddr++; len--; } return (0); } int linux_access_ok(int rw, const void *uaddr, size_t len) { uintptr_t saddr; uintptr_t eaddr; /* get start and end address */ saddr = (uintptr_t)uaddr; eaddr = (uintptr_t)uaddr + len; /* verify addresses are valid for userspace */ return ((saddr == eaddr) || (eaddr > saddr && eaddr <= VM_MAXUSER_ADDRESS)); } /* * This function should return either EINTR or ERESTART depending on * the signal type sent to this thread: */ static int linux_get_error(struct task_struct *task, int error) { /* check for signal type interrupt code */ if (error == EINTR || error == ERESTARTSYS || error == ERESTART) { error = -linux_schedule_get_interrupt_value(task); if (error == 0) error = EINTR; } return (error); } static int linux_file_ioctl_sub(struct file *fp, struct linux_file *filp, const struct file_operations *fop, u_long cmd, caddr_t data, struct thread *td) { struct task_struct *task = current; unsigned size; int error; size = IOCPARM_LEN(cmd); /* refer to logic in sys_ioctl() */ if (size > 0) { /* * Setup hint for linux_copyin() and linux_copyout(). * * Background: Linux code expects a user-space address * while FreeBSD supplies a kernel-space address. */ task->bsd_ioctl_data = data; task->bsd_ioctl_len = size; data = (void *)LINUX_IOCTL_MIN_PTR; } else { /* fetch user-space pointer */ data = *(void **)data; } #if defined(__amd64__) if (td->td_proc->p_elf_machine == EM_386) { /* try the compat IOCTL handler first */ if (fop->compat_ioctl != NULL) { error = -OPW(fp, td, fop->compat_ioctl(filp, cmd, (u_long)data)); } else { error = ENOTTY; } /* fallback to the regular IOCTL handler, if any */ if (error == ENOTTY && fop->unlocked_ioctl != NULL) { error = -OPW(fp, td, fop->unlocked_ioctl(filp, cmd, (u_long)data)); } } else #endif { if (fop->unlocked_ioctl != NULL) { error = -OPW(fp, td, fop->unlocked_ioctl(filp, cmd, (u_long)data)); } else { error = ENOTTY; } } if (size > 0) { task->bsd_ioctl_data = NULL; task->bsd_ioctl_len = 0; } if (error == EWOULDBLOCK) { /* update kqfilter status, if any */ linux_file_kqfilter_poll(filp, LINUX_KQ_FLAG_HAS_READ | LINUX_KQ_FLAG_HAS_WRITE); } else { error = linux_get_error(task, error); } return (error); } #define LINUX_POLL_TABLE_NORMAL ((poll_table *)1) /* * This function atomically updates the poll wakeup state and returns * the previous state at the time of update. 
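 * The pstate argument is a translation table indexed by the current
 * state: the loop retries atomic_cmpxchg() until the pstate[old]
 * transition has been applied, and the state observed at that point is
 * returned so that a caller can act exactly once on transitions such
 * as LINUX_FWQ_STATE_QUEUED -> LINUX_FWQ_STATE_READY.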
*/ static uint8_t linux_poll_wakeup_state(atomic_t *v, const uint8_t *pstate) { int c, old; c = v->counter; while ((old = atomic_cmpxchg(v, c, pstate[c])) != c) c = old; return (c); } static int linux_poll_wakeup_callback(wait_queue_t *wq, unsigned int wq_state, int flags, void *key) { static const uint8_t state[LINUX_FWQ_STATE_MAX] = { [LINUX_FWQ_STATE_INIT] = LINUX_FWQ_STATE_INIT, /* NOP */ [LINUX_FWQ_STATE_NOT_READY] = LINUX_FWQ_STATE_NOT_READY, /* NOP */ [LINUX_FWQ_STATE_QUEUED] = LINUX_FWQ_STATE_READY, [LINUX_FWQ_STATE_READY] = LINUX_FWQ_STATE_READY, /* NOP */ }; struct linux_file *filp = container_of(wq, struct linux_file, f_wait_queue.wq); switch (linux_poll_wakeup_state(&filp->f_wait_queue.state, state)) { case LINUX_FWQ_STATE_QUEUED: linux_poll_wakeup(filp); return (1); default: return (0); } } void linux_poll_wait(struct linux_file *filp, wait_queue_head_t *wqh, poll_table *p) { static const uint8_t state[LINUX_FWQ_STATE_MAX] = { [LINUX_FWQ_STATE_INIT] = LINUX_FWQ_STATE_NOT_READY, [LINUX_FWQ_STATE_NOT_READY] = LINUX_FWQ_STATE_NOT_READY, /* NOP */ [LINUX_FWQ_STATE_QUEUED] = LINUX_FWQ_STATE_QUEUED, /* NOP */ [LINUX_FWQ_STATE_READY] = LINUX_FWQ_STATE_QUEUED, }; /* check if we are called inside the select system call */ if (p == LINUX_POLL_TABLE_NORMAL) selrecord(curthread, &filp->f_selinfo); switch (linux_poll_wakeup_state(&filp->f_wait_queue.state, state)) { case LINUX_FWQ_STATE_INIT: /* NOTE: file handles can only belong to one wait-queue */ filp->f_wait_queue.wqh = wqh; filp->f_wait_queue.wq.func = &linux_poll_wakeup_callback; add_wait_queue(wqh, &filp->f_wait_queue.wq); atomic_set(&filp->f_wait_queue.state, LINUX_FWQ_STATE_QUEUED); break; default: break; } } static void linux_poll_wait_dequeue(struct linux_file *filp) { static const uint8_t state[LINUX_FWQ_STATE_MAX] = { [LINUX_FWQ_STATE_INIT] = LINUX_FWQ_STATE_INIT, /* NOP */ [LINUX_FWQ_STATE_NOT_READY] = LINUX_FWQ_STATE_INIT, [LINUX_FWQ_STATE_QUEUED] = LINUX_FWQ_STATE_INIT, [LINUX_FWQ_STATE_READY] = LINUX_FWQ_STATE_INIT, }; seldrain(&filp->f_selinfo); switch (linux_poll_wakeup_state(&filp->f_wait_queue.state, state)) { case LINUX_FWQ_STATE_NOT_READY: case LINUX_FWQ_STATE_QUEUED: case LINUX_FWQ_STATE_READY: remove_wait_queue(filp->f_wait_queue.wqh, &filp->f_wait_queue.wq); break; default: break; } } void linux_poll_wakeup(struct linux_file *filp) { /* this function should be NULL-safe */ if (filp == NULL) return; selwakeup(&filp->f_selinfo); spin_lock(&filp->f_kqlock); filp->f_kqflags |= LINUX_KQ_FLAG_NEED_READ | LINUX_KQ_FLAG_NEED_WRITE; /* make sure the "knote" gets woken up */ KNOTE_LOCKED(&filp->f_selinfo.si_note, 1); spin_unlock(&filp->f_kqlock); } static void linux_file_kqfilter_detach(struct knote *kn) { struct linux_file *filp = kn->kn_hook; spin_lock(&filp->f_kqlock); knlist_remove(&filp->f_selinfo.si_note, kn, 1); spin_unlock(&filp->f_kqlock); } static int linux_file_kqfilter_read_event(struct knote *kn, long hint) { struct linux_file *filp = kn->kn_hook; mtx_assert(&filp->f_kqlock.m, MA_OWNED); return ((filp->f_kqflags & LINUX_KQ_FLAG_NEED_READ) ? 1 : 0); } static int linux_file_kqfilter_write_event(struct knote *kn, long hint) { struct linux_file *filp = kn->kn_hook; mtx_assert(&filp->f_kqlock.m, MA_OWNED); return ((filp->f_kqflags & LINUX_KQ_FLAG_NEED_WRITE) ? 
1 : 0); } static struct filterops linux_dev_kqfiltops_read = { .f_isfd = 1, .f_detach = linux_file_kqfilter_detach, .f_event = linux_file_kqfilter_read_event, }; static struct filterops linux_dev_kqfiltops_write = { .f_isfd = 1, .f_detach = linux_file_kqfilter_detach, .f_event = linux_file_kqfilter_write_event, }; static void linux_file_kqfilter_poll(struct linux_file *filp, int kqflags) { struct thread *td; const struct file_operations *fop; struct linux_cdev *ldev; int temp; if ((filp->f_kqflags & kqflags) == 0) return; td = curthread; linux_get_fop(filp, &fop, &ldev); /* get the latest polling state */ temp = OPW(filp->_file, td, fop->poll(filp, NULL)); linux_drop_fop(ldev); spin_lock(&filp->f_kqlock); /* clear kqflags */ filp->f_kqflags &= ~(LINUX_KQ_FLAG_NEED_READ | LINUX_KQ_FLAG_NEED_WRITE); /* update kqflags */ if ((temp & (POLLIN | POLLOUT)) != 0) { if ((temp & POLLIN) != 0) filp->f_kqflags |= LINUX_KQ_FLAG_NEED_READ; if ((temp & POLLOUT) != 0) filp->f_kqflags |= LINUX_KQ_FLAG_NEED_WRITE; /* make sure the "knote" gets woken up */ KNOTE_LOCKED(&filp->f_selinfo.si_note, 0); } spin_unlock(&filp->f_kqlock); } static int linux_file_kqfilter(struct file *file, struct knote *kn) { struct linux_file *filp; struct thread *td; int error; td = curthread; filp = (struct linux_file *)file->f_data; filp->f_flags = file->f_flag; if (filp->f_op->poll == NULL) return (EINVAL); spin_lock(&filp->f_kqlock); switch (kn->kn_filter) { case EVFILT_READ: filp->f_kqflags |= LINUX_KQ_FLAG_HAS_READ; kn->kn_fop = &linux_dev_kqfiltops_read; kn->kn_hook = filp; knlist_add(&filp->f_selinfo.si_note, kn, 1); error = 0; break; case EVFILT_WRITE: filp->f_kqflags |= LINUX_KQ_FLAG_HAS_WRITE; kn->kn_fop = &linux_dev_kqfiltops_write; kn->kn_hook = filp; knlist_add(&filp->f_selinfo.si_note, kn, 1); error = 0; break; default: error = EINVAL; break; } spin_unlock(&filp->f_kqlock); if (error == 0) { linux_set_current(td); /* update kqfilter status, if any */ linux_file_kqfilter_poll(filp, LINUX_KQ_FLAG_HAS_READ | LINUX_KQ_FLAG_HAS_WRITE); } return (error); } static int linux_file_mmap_single(struct file *fp, const struct file_operations *fop, vm_ooffset_t *offset, vm_size_t size, struct vm_object **object, int nprot, struct thread *td) { struct task_struct *task; struct vm_area_struct *vmap; struct mm_struct *mm; struct linux_file *filp; vm_memattr_t attr; int error; filp = (struct linux_file *)fp->f_data; filp->f_flags = fp->f_flag; if (fop->mmap == NULL) return (EOPNOTSUPP); linux_set_current(td); /* * The same VM object might be shared by multiple processes * and the mm_struct is usually freed when a process exits. * * The atomic reference below makes sure the mm_struct is * available as long as the vmap is in the linux_vma_head. 
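 * The reference taken here is dropped again by the mmput() call in
 * linux_cdev_handle_free() when the mapping handle is destroyed.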
*/ task = current; mm = task->mm; if (atomic_inc_not_zero(&mm->mm_users) == 0) return (EINVAL); vmap = kzalloc(sizeof(*vmap), GFP_KERNEL); vmap->vm_start = 0; vmap->vm_end = size; vmap->vm_pgoff = *offset / PAGE_SIZE; vmap->vm_pfn = 0; vmap->vm_flags = vmap->vm_page_prot = (nprot & VM_PROT_ALL); vmap->vm_ops = NULL; vmap->vm_file = get_file(filp); vmap->vm_mm = mm; if (unlikely(down_write_killable(&vmap->vm_mm->mmap_sem))) { error = linux_get_error(task, EINTR); } else { error = -OPW(fp, td, fop->mmap(filp, vmap)); error = linux_get_error(task, error); up_write(&vmap->vm_mm->mmap_sem); } if (error != 0) { linux_cdev_handle_free(vmap); return (error); } attr = pgprot2cachemode(vmap->vm_page_prot); if (vmap->vm_ops != NULL) { struct vm_area_struct *ptr; void *vm_private_data; bool vm_no_fault; if (vmap->vm_ops->open == NULL || vmap->vm_ops->close == NULL || vmap->vm_private_data == NULL) { /* free allocated VM area struct */ linux_cdev_handle_free(vmap); return (EINVAL); } vm_private_data = vmap->vm_private_data; rw_wlock(&linux_vma_lock); TAILQ_FOREACH(ptr, &linux_vma_head, vm_entry) { if (ptr->vm_private_data == vm_private_data) break; } /* check if there is an existing VM area struct */ if (ptr != NULL) { /* check if the VM area structure is invalid */ if (ptr->vm_ops == NULL || ptr->vm_ops->open == NULL || ptr->vm_ops->close == NULL) { error = ESTALE; vm_no_fault = 1; } else { error = EEXIST; vm_no_fault = (ptr->vm_ops->fault == NULL); } } else { /* insert VM area structure into list */ TAILQ_INSERT_TAIL(&linux_vma_head, vmap, vm_entry); error = 0; vm_no_fault = (vmap->vm_ops->fault == NULL); } rw_wunlock(&linux_vma_lock); if (error != 0) { /* free allocated VM area struct */ linux_cdev_handle_free(vmap); /* check for stale VM area struct */ if (error != EEXIST) return (error); } /* check if there is no fault handler */ if (vm_no_fault) { *object = cdev_pager_allocate(vm_private_data, OBJT_DEVICE, &linux_cdev_pager_ops[1], size, nprot, *offset, td->td_ucred); } else { *object = cdev_pager_allocate(vm_private_data, OBJT_MGTDEVICE, &linux_cdev_pager_ops[0], size, nprot, *offset, td->td_ucred); } /* check if allocating the VM object failed */ if (*object == NULL) { if (error == 0) { /* remove VM area struct from list */ linux_cdev_handle_remove(vmap); /* free allocated VM area struct */ linux_cdev_handle_free(vmap); } return (EINVAL); } } else { struct sglist *sg; sg = sglist_alloc(1, M_WAITOK); sglist_append_phys(sg, (vm_paddr_t)vmap->vm_pfn << PAGE_SHIFT, vmap->vm_len); *object = vm_pager_allocate(OBJT_SG, sg, vmap->vm_len, nprot, 0, td->td_ucred); linux_cdev_handle_free(vmap); if (*object == NULL) { sglist_free(sg); return (EINVAL); } } if (attr != VM_MEMATTR_DEFAULT) { VM_OBJECT_WLOCK(*object); vm_object_set_memattr(*object, attr); VM_OBJECT_WUNLOCK(*object); } *offset = 0; return (0); } struct cdevsw linuxcdevsw = { .d_version = D_VERSION, .d_fdopen = linux_dev_fdopen, .d_name = "lkpidev", }; static int linux_file_read(struct file *file, struct uio *uio, struct ucred *active_cred, int flags, struct thread *td) { struct linux_file *filp; const struct file_operations *fop; struct linux_cdev *ldev; ssize_t bytes; int error; error = 0; filp = (struct linux_file *)file->f_data; filp->f_flags = file->f_flag; /* XXX no support for I/O vectors currently */ if (uio->uio_iovcnt != 1) return (EOPNOTSUPP); if (uio->uio_resid > DEVFS_IOSIZE_MAX) return (EINVAL); linux_set_current(td); linux_get_fop(filp, &fop, &ldev); if (fop->read != NULL) { bytes = OPW(file, td, fop->read(filp, 
uio->uio_iov->iov_base, uio->uio_iov->iov_len, &uio->uio_offset)); if (bytes >= 0) { uio->uio_iov->iov_base = ((uint8_t *)uio->uio_iov->iov_base) + bytes; uio->uio_iov->iov_len -= bytes; uio->uio_resid -= bytes; } else { error = linux_get_error(current, -bytes); } } else error = ENXIO; /* update kqfilter status, if any */ linux_file_kqfilter_poll(filp, LINUX_KQ_FLAG_HAS_READ); linux_drop_fop(ldev); return (error); } static int linux_file_write(struct file *file, struct uio *uio, struct ucred *active_cred, int flags, struct thread *td) { struct linux_file *filp; const struct file_operations *fop; struct linux_cdev *ldev; ssize_t bytes; int error; filp = (struct linux_file *)file->f_data; filp->f_flags = file->f_flag; /* XXX no support for I/O vectors currently */ if (uio->uio_iovcnt != 1) return (EOPNOTSUPP); if (uio->uio_resid > DEVFS_IOSIZE_MAX) return (EINVAL); linux_set_current(td); linux_get_fop(filp, &fop, &ldev); if (fop->write != NULL) { bytes = OPW(file, td, fop->write(filp, uio->uio_iov->iov_base, uio->uio_iov->iov_len, &uio->uio_offset)); if (bytes >= 0) { uio->uio_iov->iov_base = ((uint8_t *)uio->uio_iov->iov_base) + bytes; uio->uio_iov->iov_len -= bytes; uio->uio_resid -= bytes; error = 0; } else { error = linux_get_error(current, -bytes); } } else error = ENXIO; /* update kqfilter status, if any */ linux_file_kqfilter_poll(filp, LINUX_KQ_FLAG_HAS_WRITE); linux_drop_fop(ldev); return (error); } static int linux_file_poll(struct file *file, int events, struct ucred *active_cred, struct thread *td) { struct linux_file *filp; const struct file_operations *fop; struct linux_cdev *ldev; int revents; filp = (struct linux_file *)file->f_data; filp->f_flags = file->f_flag; linux_set_current(td); linux_get_fop(filp, &fop, &ldev); if (fop->poll != NULL) { revents = OPW(file, td, fop->poll(filp, LINUX_POLL_TABLE_NORMAL)) & events; } else { revents = 0; } linux_drop_fop(ldev); return (revents); } static int linux_file_close(struct file *file, struct thread *td) { struct linux_file *filp; const struct file_operations *fop; struct linux_cdev *ldev; int error; filp = (struct linux_file *)file->f_data; KASSERT(file_count(filp) == 0, ("File refcount(%d) is not zero", file_count(filp))); error = 0; filp->f_flags = file->f_flag; linux_set_current(td); linux_poll_wait_dequeue(filp); linux_get_fop(filp, &fop, &ldev); if (fop->release != NULL) error = -OPW(file, td, fop->release(filp->f_vnode, filp)); funsetown(&filp->f_sigio); if (filp->f_vnode != NULL) vdrop(filp->f_vnode); linux_drop_fop(ldev); if (filp->f_cdev != NULL) linux_cdev_deref(filp->f_cdev); kfree(filp); return (error); } static int linux_file_ioctl(struct file *fp, u_long cmd, void *data, struct ucred *cred, struct thread *td) { struct linux_file *filp; const struct file_operations *fop; struct linux_cdev *ldev; int error; error = 0; filp = (struct linux_file *)fp->f_data; filp->f_flags = fp->f_flag; linux_get_fop(filp, &fop, &ldev); linux_set_current(td); switch (cmd) { case FIONBIO: break; case FIOASYNC: if (fop->fasync == NULL) break; error = -OPW(fp, td, fop->fasync(0, filp, fp->f_flag & FASYNC)); break; case FIOSETOWN: error = fsetown(*(int *)data, &filp->f_sigio); if (error == 0) { if (fop->fasync == NULL) break; error = -OPW(fp, td, fop->fasync(0, filp, fp->f_flag & FASYNC)); } break; case FIOGETOWN: *(int *)data = fgetown(&filp->f_sigio); break; default: error = linux_file_ioctl_sub(fp, filp, fop, cmd, data, td); break; } linux_drop_fop(ldev); return (error); } static int linux_file_mmap_sub(struct thread *td, vm_size_t 
objsize, vm_prot_t prot, vm_prot_t *maxprotp, int *flagsp, struct file *fp, vm_ooffset_t *foff, const struct file_operations *fop, vm_object_t *objp) { /* * Character devices do not provide private mappings * of any kind: */ if ((*maxprotp & VM_PROT_WRITE) == 0 && (prot & VM_PROT_WRITE) != 0) return (EACCES); if ((*flagsp & (MAP_PRIVATE | MAP_COPY)) != 0) return (EINVAL); return (linux_file_mmap_single(fp, fop, foff, objsize, objp, (int)prot, td)); } static int linux_file_mmap(struct file *fp, vm_map_t map, vm_offset_t *addr, vm_size_t size, vm_prot_t prot, vm_prot_t cap_maxprot, int flags, vm_ooffset_t foff, struct thread *td) { struct linux_file *filp; const struct file_operations *fop; struct linux_cdev *ldev; struct mount *mp; struct vnode *vp; vm_object_t object; vm_prot_t maxprot; int error; filp = (struct linux_file *)fp->f_data; vp = filp->f_vnode; if (vp == NULL) return (EOPNOTSUPP); /* * Ensure that file and memory protections are * compatible. */ mp = vp->v_mount; if (mp != NULL && (mp->mnt_flag & MNT_NOEXEC) != 0) { maxprot = VM_PROT_NONE; if ((prot & VM_PROT_EXECUTE) != 0) return (EACCES); } else maxprot = VM_PROT_EXECUTE; if ((fp->f_flag & FREAD) != 0) maxprot |= VM_PROT_READ; else if ((prot & VM_PROT_READ) != 0) return (EACCES); /* * If we are sharing potential changes via MAP_SHARED and we * are trying to get write permission although we opened it * without asking for it, bail out. * * Note that most character devices always share mappings. * * Rely on linux_file_mmap_sub() to fail invalid MAP_PRIVATE * requests rather than doing it here. */ if ((flags & MAP_SHARED) != 0) { if ((fp->f_flag & FWRITE) != 0) maxprot |= VM_PROT_WRITE; else if ((prot & VM_PROT_WRITE) != 0) return (EACCES); } maxprot &= cap_maxprot; linux_get_fop(filp, &fop, &ldev); error = linux_file_mmap_sub(td, size, prot, &maxprot, &flags, fp, &foff, fop, &object); if (error != 0) goto out; error = vm_mmap_object(map, addr, size, prot, maxprot, flags, object, foff, FALSE, td); if (error != 0) vm_object_deallocate(object); out: linux_drop_fop(ldev); return (error); } static int linux_file_stat(struct file *fp, struct stat *sb, struct ucred *active_cred, struct thread *td) { struct linux_file *filp; struct vnode *vp; int error; filp = (struct linux_file *)fp->f_data; if (filp->f_vnode == NULL) return (EOPNOTSUPP); vp = filp->f_vnode; vn_lock(vp, LK_SHARED | LK_RETRY); error = vn_stat(vp, sb, td->td_ucred, NOCRED, td); VOP_UNLOCK(vp, 0); return (error); } static int linux_file_fill_kinfo(struct file *fp, struct kinfo_file *kif, struct filedesc *fdp) { struct linux_file *filp; struct vnode *vp; int error; filp = fp->f_data; vp = filp->f_vnode; if (vp == NULL) { error = 0; kif->kf_type = KF_TYPE_DEV; } else { vref(vp); FILEDESC_SUNLOCK(fdp); error = vn_fill_kinfo_vnode(vp, kif); vrele(vp); kif->kf_type = KF_TYPE_VNODE; FILEDESC_SLOCK(fdp); } return (error); } unsigned int linux_iminor(struct inode *inode) { struct linux_cdev *ldev; if (inode == NULL || inode->v_rdev == NULL || inode->v_rdev->si_devsw != &linuxcdevsw) return (-1U); ldev = inode->v_rdev->si_drv1; if (ldev == NULL) return (-1U); return (minor(ldev->dev)); } struct fileops linuxfileops = { .fo_read = linux_file_read, .fo_write = linux_file_write, .fo_truncate = invfo_truncate, .fo_kqfilter = linux_file_kqfilter, .fo_stat = linux_file_stat, .fo_fill_kinfo = linux_file_fill_kinfo, .fo_poll = linux_file_poll, .fo_close = linux_file_close, .fo_ioctl = linux_file_ioctl, .fo_mmap = linux_file_mmap, .fo_chmod = invfo_chmod, .fo_chown = invfo_chown, 
.fo_sendfile = invfo_sendfile, .fo_flags = DFLAG_PASSABLE, }; /* * Hash of vmmap addresses. This is infrequently accessed and does not * need to be particularly large. This is done because we must store the * caller's idea of the map size to properly unmap. */ struct vmmap { LIST_ENTRY(vmmap) vm_next; void *vm_addr; unsigned long vm_size; }; struct vmmaphd { struct vmmap *lh_first; }; #define VMMAP_HASH_SIZE 64 #define VMMAP_HASH_MASK (VMMAP_HASH_SIZE - 1) #define VM_HASH(addr) ((uintptr_t)(addr) >> PAGE_SHIFT) & VMMAP_HASH_MASK static struct vmmaphd vmmaphead[VMMAP_HASH_SIZE]; static struct mtx vmmaplock; static void vmmap_add(void *addr, unsigned long size) { struct vmmap *vmmap; vmmap = kmalloc(sizeof(*vmmap), GFP_KERNEL); mtx_lock(&vmmaplock); vmmap->vm_size = size; vmmap->vm_addr = addr; LIST_INSERT_HEAD(&vmmaphead[VM_HASH(addr)], vmmap, vm_next); mtx_unlock(&vmmaplock); } static struct vmmap * vmmap_remove(void *addr) { struct vmmap *vmmap; mtx_lock(&vmmaplock); LIST_FOREACH(vmmap, &vmmaphead[VM_HASH(addr)], vm_next) if (vmmap->vm_addr == addr) break; if (vmmap) LIST_REMOVE(vmmap, vm_next); mtx_unlock(&vmmaplock); return (vmmap); } #if defined(__i386__) || defined(__amd64__) || defined(__powerpc__) || defined(__aarch64__) void * _ioremap_attr(vm_paddr_t phys_addr, unsigned long size, int attr) { void *addr; addr = pmap_mapdev_attr(phys_addr, size, attr); if (addr == NULL) return (NULL); vmmap_add(addr, size); return (addr); } #endif void iounmap(void *addr) { struct vmmap *vmmap; vmmap = vmmap_remove(addr); if (vmmap == NULL) return; #if defined(__i386__) || defined(__amd64__) || defined(__powerpc__) || defined(__aarch64__) pmap_unmapdev((vm_offset_t)addr, vmmap->vm_size); #endif kfree(vmmap); } void * vmap(struct page **pages, unsigned int count, unsigned long flags, int prot) { vm_offset_t off; size_t size; size = count * PAGE_SIZE; off = kva_alloc(size); if (off == 0) return (NULL); vmmap_add((void *)off, size); pmap_qenter(off, pages, count); return ((void *)off); } void vunmap(void *addr) { struct vmmap *vmmap; vmmap = vmmap_remove(addr); if (vmmap == NULL) return; pmap_qremove((vm_offset_t)addr, vmmap->vm_size / PAGE_SIZE); kva_free((vm_offset_t)addr, vmmap->vm_size); kfree(vmmap); } char * kvasprintf(gfp_t gfp, const char *fmt, va_list ap) { unsigned int len; char *p; va_list aq; va_copy(aq, ap); len = vsnprintf(NULL, 0, fmt, aq); va_end(aq); p = kmalloc(len + 1, gfp); if (p != NULL) vsnprintf(p, len + 1, fmt, ap); return (p); } char * kasprintf(gfp_t gfp, const char *fmt, ...) 
{ va_list ap; char *p; va_start(ap, fmt); p = kvasprintf(gfp, fmt, ap); va_end(ap); return (p); } static void linux_timer_callback_wrapper(void *context) { struct timer_list *timer; linux_set_current(curthread); timer = context; timer->function(timer->data); } void mod_timer(struct timer_list *timer, int expires) { timer->expires = expires; callout_reset(&timer->callout, linux_timer_jiffies_until(expires), &linux_timer_callback_wrapper, timer); } void add_timer(struct timer_list *timer) { callout_reset(&timer->callout, linux_timer_jiffies_until(timer->expires), &linux_timer_callback_wrapper, timer); } void add_timer_on(struct timer_list *timer, int cpu) { callout_reset_on(&timer->callout, linux_timer_jiffies_until(timer->expires), &linux_timer_callback_wrapper, timer, cpu); } static void linux_timer_init(void *arg) { /* * Compute an internal HZ value which can divide 2**32 to * avoid timer rounding problems when the tick value wraps * around 2**32: */ linux_timer_hz_mask = 1; while (linux_timer_hz_mask < (unsigned long)hz) linux_timer_hz_mask *= 2; linux_timer_hz_mask--; } SYSINIT(linux_timer, SI_SUB_DRIVERS, SI_ORDER_FIRST, linux_timer_init, NULL); void linux_complete_common(struct completion *c, int all) { int wakeup_swapper; sleepq_lock(c); if (all) { c->done = UINT_MAX; wakeup_swapper = sleepq_broadcast(c, SLEEPQ_SLEEP, 0, 0); } else { if (c->done != UINT_MAX) c->done++; wakeup_swapper = sleepq_signal(c, SLEEPQ_SLEEP, 0, 0); } sleepq_release(c); if (wakeup_swapper) kick_proc0(); } /* * Indefinite wait for done != 0 with or without signals. */ int linux_wait_for_common(struct completion *c, int flags) { struct task_struct *task; int error; if (SCHEDULER_STOPPED()) return (0); task = current; if (flags != 0) flags = SLEEPQ_INTERRUPTIBLE | SLEEPQ_SLEEP; else flags = SLEEPQ_SLEEP; error = 0; for (;;) { sleepq_lock(c); if (c->done) break; sleepq_add(c, NULL, "completion", flags, 0); if (flags & SLEEPQ_INTERRUPTIBLE) { DROP_GIANT(); error = -sleepq_wait_sig(c, 0); PICKUP_GIANT(); if (error != 0) { linux_schedule_save_interrupt_value(task, error); error = -ERESTARTSYS; goto intr; } } else { DROP_GIANT(); sleepq_wait(c, 0); PICKUP_GIANT(); } } if (c->done != UINT_MAX) c->done--; sleepq_release(c); intr: return (error); } /* * Time limited wait for done != 0 with or without signals. 
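 * Returns the number of jiffies remaining when the completion was
 * signalled, 0 if the timeout expired first, or -ERESTARTSYS if the
 * sleep was interrupted by a signal.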
*/ int linux_wait_for_timeout_common(struct completion *c, int timeout, int flags) { struct task_struct *task; int end = jiffies + timeout; int error; if (SCHEDULER_STOPPED()) return (0); task = current; if (flags != 0) flags = SLEEPQ_INTERRUPTIBLE | SLEEPQ_SLEEP; else flags = SLEEPQ_SLEEP; for (;;) { sleepq_lock(c); if (c->done) break; sleepq_add(c, NULL, "completion", flags, 0); sleepq_set_timeout(c, linux_timer_jiffies_until(end)); DROP_GIANT(); if (flags & SLEEPQ_INTERRUPTIBLE) error = -sleepq_timedwait_sig(c, 0); else error = -sleepq_timedwait(c, 0); PICKUP_GIANT(); if (error != 0) { /* check for timeout */ if (error == -EWOULDBLOCK) { error = 0; /* timeout */ } else { /* signal happened */ linux_schedule_save_interrupt_value(task, error); error = -ERESTARTSYS; } goto done; } } if (c->done != UINT_MAX) c->done--; sleepq_release(c); /* return how many jiffies are left */ error = linux_timer_jiffies_until(end); done: return (error); } int linux_try_wait_for_completion(struct completion *c) { int isdone; sleepq_lock(c); isdone = (c->done != 0); if (c->done != 0 && c->done != UINT_MAX) c->done--; sleepq_release(c); return (isdone); } int linux_completion_done(struct completion *c) { int isdone; sleepq_lock(c); isdone = (c->done != 0); sleepq_release(c); return (isdone); } static void linux_cdev_deref(struct linux_cdev *ldev) { if (refcount_release(&ldev->refs)) kfree(ldev); } static void linux_cdev_release(struct kobject *kobj) { struct linux_cdev *cdev; struct kobject *parent; cdev = container_of(kobj, struct linux_cdev, kobj); parent = kobj->parent; linux_destroy_dev(cdev); linux_cdev_deref(cdev); kobject_put(parent); } static void linux_cdev_static_release(struct kobject *kobj) { struct linux_cdev *cdev; struct kobject *parent; cdev = container_of(kobj, struct linux_cdev, kobj); parent = kobj->parent; linux_destroy_dev(cdev); kobject_put(parent); } void linux_destroy_dev(struct linux_cdev *ldev) { if (ldev->cdev == NULL) return; MPASS((ldev->siref & LDEV_SI_DTR) == 0); atomic_set_int(&ldev->siref, LDEV_SI_DTR); while ((atomic_load_int(&ldev->siref) & ~LDEV_SI_DTR) != 0) pause("ldevdtr", hz / 4); destroy_dev(ldev->cdev); ldev->cdev = NULL; } const struct kobj_type linux_cdev_ktype = { .release = linux_cdev_release, }; const struct kobj_type linux_cdev_static_ktype = { .release = linux_cdev_static_release, }; static void linux_handle_ifnet_link_event(void *arg, struct ifnet *ifp, int linkstate) { struct notifier_block *nb; nb = arg; if (linkstate == LINK_STATE_UP) nb->notifier_call(nb, NETDEV_UP, ifp); else nb->notifier_call(nb, NETDEV_DOWN, ifp); } static void linux_handle_ifnet_arrival_event(void *arg, struct ifnet *ifp) { struct notifier_block *nb; nb = arg; nb->notifier_call(nb, NETDEV_REGISTER, ifp); } static void linux_handle_ifnet_departure_event(void *arg, struct ifnet *ifp) { struct notifier_block *nb; nb = arg; nb->notifier_call(nb, NETDEV_UNREGISTER, ifp); } static void linux_handle_iflladdr_event(void *arg, struct ifnet *ifp) { struct notifier_block *nb; nb = arg; nb->notifier_call(nb, NETDEV_CHANGEADDR, ifp); } static void linux_handle_ifaddr_event(void *arg, struct ifnet *ifp) { struct notifier_block *nb; nb = arg; nb->notifier_call(nb, NETDEV_CHANGEIFADDR, ifp); } int register_netdevice_notifier(struct notifier_block *nb) { nb->tags[NETDEV_UP] = EVENTHANDLER_REGISTER( ifnet_link_event, linux_handle_ifnet_link_event, nb, 0); nb->tags[NETDEV_REGISTER] = EVENTHANDLER_REGISTER( ifnet_arrival_event, linux_handle_ifnet_arrival_event, nb, 0); nb->tags[NETDEV_UNREGISTER] = 
EVENTHANDLER_REGISTER( ifnet_departure_event, linux_handle_ifnet_departure_event, nb, 0); nb->tags[NETDEV_CHANGEADDR] = EVENTHANDLER_REGISTER( iflladdr_event, linux_handle_iflladdr_event, nb, 0); return (0); } int register_inetaddr_notifier(struct notifier_block *nb) { nb->tags[NETDEV_CHANGEIFADDR] = EVENTHANDLER_REGISTER( ifaddr_event, linux_handle_ifaddr_event, nb, 0); return (0); } int unregister_netdevice_notifier(struct notifier_block *nb) { EVENTHANDLER_DEREGISTER(ifnet_link_event, nb->tags[NETDEV_UP]); EVENTHANDLER_DEREGISTER(ifnet_arrival_event, nb->tags[NETDEV_REGISTER]); EVENTHANDLER_DEREGISTER(ifnet_departure_event, nb->tags[NETDEV_UNREGISTER]); EVENTHANDLER_DEREGISTER(iflladdr_event, nb->tags[NETDEV_CHANGEADDR]); return (0); } int unregister_inetaddr_notifier(struct notifier_block *nb) { EVENTHANDLER_DEREGISTER(ifaddr_event, nb->tags[NETDEV_CHANGEIFADDR]); return (0); } struct list_sort_thunk { int (*cmp)(void *, struct list_head *, struct list_head *); void *priv; }; static inline int linux_le_cmp(void *priv, const void *d1, const void *d2) { struct list_head *le1, *le2; struct list_sort_thunk *thunk; thunk = priv; le1 = *(__DECONST(struct list_head **, d1)); le2 = *(__DECONST(struct list_head **, d2)); return ((thunk->cmp)(thunk->priv, le1, le2)); } void list_sort(void *priv, struct list_head *head, int (*cmp)(void *priv, struct list_head *a, struct list_head *b)) { struct list_sort_thunk thunk; struct list_head **ar, *le; size_t count, i; count = 0; list_for_each(le, head) count++; ar = malloc(sizeof(struct list_head *) * count, M_KMALLOC, M_WAITOK); i = 0; list_for_each(le, head) ar[i++] = le; thunk.cmp = cmp; thunk.priv = priv; qsort_r(ar, count, sizeof(struct list_head *), &thunk, linux_le_cmp); INIT_LIST_HEAD(head); for (i = 0; i < count; i++) list_add_tail(ar[i], head); free(ar, M_KMALLOC); } void linux_irq_handler(void *ent) { struct irq_ent *irqe; linux_set_current(curthread); irqe = ent; irqe->handler(irqe->irq, irqe->arg); } #if defined(__i386__) || defined(__amd64__) int linux_wbinvd_on_all_cpus(void) { pmap_invalidate_cache(); return (0); } #endif int linux_on_each_cpu(void callback(void *), void *data) { smp_rendezvous(smp_no_rendezvous_barrier, callback, smp_no_rendezvous_barrier, data); return (0); } int linux_in_atomic(void) { return ((curthread->td_pflags & TDP_NOFAULTING) != 0); } struct linux_cdev * linux_find_cdev(const char *name, unsigned major, unsigned minor) { dev_t dev = MKDEV(major, minor); struct cdev *cdev; dev_lock(); LIST_FOREACH(cdev, &linuxcdevsw.d_devs, si_list) { struct linux_cdev *ldev = cdev->si_drv1; if (ldev->dev == dev && strcmp(kobject_name(&ldev->kobj), name) == 0) { break; } } dev_unlock(); return (cdev != NULL ? 
cdev->si_drv1 : NULL); } int __register_chrdev(unsigned int major, unsigned int baseminor, unsigned int count, const char *name, const struct file_operations *fops) { struct linux_cdev *cdev; int ret = 0; int i; for (i = baseminor; i < baseminor + count; i++) { cdev = cdev_alloc(); - cdev_init(cdev, fops); + cdev->ops = fops; kobject_set_name(&cdev->kobj, name); ret = cdev_add(cdev, makedev(major, i), 1); if (ret != 0) break; } return (ret); } int __register_chrdev_p(unsigned int major, unsigned int baseminor, unsigned int count, const char *name, const struct file_operations *fops, uid_t uid, gid_t gid, int mode) { struct linux_cdev *cdev; int ret = 0; int i; for (i = baseminor; i < baseminor + count; i++) { cdev = cdev_alloc(); - cdev_init(cdev, fops); + cdev->ops = fops; kobject_set_name(&cdev->kobj, name); ret = cdev_add_ext(cdev, makedev(major, i), uid, gid, mode); if (ret != 0) break; } return (ret); } void __unregister_chrdev(unsigned int major, unsigned int baseminor, unsigned int count, const char *name) { struct linux_cdev *cdevp; int i; for (i = baseminor; i < baseminor + count; i++) { cdevp = linux_find_cdev(name, major, i); if (cdevp != NULL) cdev_del(cdevp); } } void linux_dump_stack(void) { #ifdef STACK struct stack st; stack_zero(&st); stack_save(&st); stack_print(&st); #endif } #if defined(__i386__) || defined(__amd64__) bool linux_cpu_has_clflush; #endif static void linux_compat_init(void *arg) { struct sysctl_oid *rootoid; int i; #if defined(__i386__) || defined(__amd64__) linux_cpu_has_clflush = (cpu_feature & CPUID_CLFSH); #endif rw_init(&linux_vma_lock, "lkpi-vma-lock"); rootoid = SYSCTL_ADD_ROOT_NODE(NULL, OID_AUTO, "sys", CTLFLAG_RD|CTLFLAG_MPSAFE, NULL, "sys"); kobject_init(&linux_class_root, &linux_class_ktype); kobject_set_name(&linux_class_root, "class"); linux_class_root.oidp = SYSCTL_ADD_NODE(NULL, SYSCTL_CHILDREN(rootoid), OID_AUTO, "class", CTLFLAG_RD|CTLFLAG_MPSAFE, NULL, "class"); kobject_init(&linux_root_device.kobj, &linux_dev_ktype); kobject_set_name(&linux_root_device.kobj, "device"); linux_root_device.kobj.oidp = SYSCTL_ADD_NODE(NULL, SYSCTL_CHILDREN(rootoid), OID_AUTO, "device", CTLFLAG_RD, NULL, "device"); linux_root_device.bsddev = root_bus; linux_class_misc.name = "misc"; class_register(&linux_class_misc); INIT_LIST_HEAD(&pci_drivers); INIT_LIST_HEAD(&pci_devices); spin_lock_init(&pci_lock); mtx_init(&vmmaplock, "IO Map lock", NULL, MTX_DEF); for (i = 0; i < VMMAP_HASH_SIZE; i++) LIST_INIT(&vmmaphead[i]); } SYSINIT(linux_compat, SI_SUB_DRIVERS, SI_ORDER_SECOND, linux_compat_init, NULL); static void linux_compat_uninit(void *arg) { linux_kobject_kfree_name(&linux_class_root); linux_kobject_kfree_name(&linux_root_device.kobj); linux_kobject_kfree_name(&linux_class_misc.kobj); mtx_destroy(&vmmaplock); spin_lock_destroy(&pci_lock); rw_destroy(&linux_vma_lock); } SYSUNINIT(linux_compat, SI_SUB_DRIVERS, SI_ORDER_SECOND, linux_compat_uninit, NULL); /* * NOTE: Linux frequently uses "unsigned long" for pointer to integer * conversion and vice versa, where in FreeBSD "uintptr_t" would be * used. 
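 * For example, a Linux timer callback receives its argument through the
 * "unsigned long" data member of struct timer_list and casts it back to
 * a pointer, which is only safe when both types have the same width.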
Assert these types have the same size, else some parts of the * LinuxKPI may not work like expected: */ CTASSERT(sizeof(unsigned long) == sizeof(uintptr_t)); Index: user/ngie/bug-237403/sys/compat/linuxkpi/common/src/linux_pci.c =================================================================== --- user/ngie/bug-237403/sys/compat/linuxkpi/common/src/linux_pci.c (revision 346925) +++ user/ngie/bug-237403/sys/compat/linuxkpi/common/src/linux_pci.c (revision 346926) @@ -1,332 +1,828 @@ /*- * Copyright (c) 2015-2016 Mellanox Technologies, Ltd. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include static device_probe_t linux_pci_probe; static device_attach_t linux_pci_attach; static device_detach_t linux_pci_detach; static device_suspend_t linux_pci_suspend; static device_resume_t linux_pci_resume; static device_shutdown_t linux_pci_shutdown; static device_method_t pci_methods[] = { DEVMETHOD(device_probe, linux_pci_probe), DEVMETHOD(device_attach, linux_pci_attach), DEVMETHOD(device_detach, linux_pci_detach), DEVMETHOD(device_suspend, linux_pci_suspend), DEVMETHOD(device_resume, linux_pci_resume), DEVMETHOD(device_shutdown, linux_pci_shutdown), DEVMETHOD_END }; +struct linux_dma_priv { + uint64_t dma_mask; + struct mtx dma_lock; + bus_dma_tag_t dmat; + struct mtx ptree_lock; + struct pctrie ptree; +}; + +static int +linux_pdev_dma_init(struct pci_dev *pdev) +{ + struct linux_dma_priv *priv; + + priv = malloc(sizeof(*priv), M_DEVBUF, M_WAITOK | M_ZERO); + pdev->dev.dma_priv = priv; + + mtx_init(&priv->dma_lock, "linux_dma", NULL, MTX_DEF); + + mtx_init(&priv->ptree_lock, "linux_dma_ptree", NULL, MTX_DEF); + pctrie_init(&priv->ptree); + + return (0); +} + +static int +linux_pdev_dma_uninit(struct pci_dev *pdev) +{ + struct linux_dma_priv *priv; + + priv = pdev->dev.dma_priv; + if (priv->dmat) + bus_dma_tag_destroy(priv->dmat); + mtx_destroy(&priv->dma_lock); + mtx_destroy(&priv->ptree_lock); + free(priv, M_DEVBUF); + pdev->dev.dma_priv = NULL; + return (0); +} + +int +linux_dma_tag_init(struct device *dev, u64 dma_mask) +{ + struct linux_dma_priv *priv; + int error; + + priv = dev->dma_priv; + + if (priv->dmat) { + if (priv->dma_mask == dma_mask) + return (0); + + bus_dma_tag_destroy(priv->dmat); + } + + priv->dma_mask = dma_mask; + + error = bus_dma_tag_create(bus_get_dma_tag(dev->bsddev), + 1, 0, /* alignment, boundary */ + dma_mask, /* lowaddr */ + BUS_SPACE_MAXADDR, /* highaddr */ + NULL, NULL, /* filtfunc, filtfuncarg */ + BUS_SPACE_MAXSIZE, /* maxsize */ + 1, /* nsegments */ + BUS_SPACE_MAXSIZE, /* maxsegsz */ + 0, /* flags */ + NULL, NULL, /* lockfunc, lockfuncarg */ + &priv->dmat); + return (-error); +} + static struct pci_driver * linux_pci_find(device_t dev, const struct pci_device_id **idp) { const struct pci_device_id *id; struct pci_driver *pdrv; uint16_t vendor; uint16_t device; uint16_t subvendor; uint16_t subdevice; vendor = pci_get_vendor(dev); device = pci_get_device(dev); subvendor = pci_get_subvendor(dev); subdevice = pci_get_subdevice(dev); spin_lock(&pci_lock); list_for_each_entry(pdrv, &pci_drivers, links) { for (id = pdrv->id_table; id->vendor != 0; id++) { if (vendor == id->vendor && (PCI_ANY_ID == id->device || device == id->device) && (PCI_ANY_ID == id->subvendor || subvendor == id->subvendor) && (PCI_ANY_ID == id->subdevice || subdevice == id->subdevice)) { *idp = id; spin_unlock(&pci_lock); return (pdrv); } } } spin_unlock(&pci_lock); return (NULL); } static int linux_pci_probe(device_t dev) { const struct pci_device_id *id; struct pci_driver *pdrv; if ((pdrv = linux_pci_find(dev, &id)) == NULL) return (ENXIO); if (device_get_driver(dev) != &pdrv->bsddriver) return (ENXIO); device_set_desc(dev, pdrv->name); return (0); } static int linux_pci_attach(device_t dev) { struct resource_list_entry *rle; struct pci_bus *pbus; struct pci_dev *pdev; struct pci_devinfo *dinfo; struct 
pci_driver *pdrv; const struct pci_device_id *id; device_t parent; devclass_t devclass; int error; linux_set_current(curthread); pdrv = linux_pci_find(dev, &id); pdev = device_get_softc(dev); parent = device_get_parent(dev); devclass = device_get_devclass(parent); if (pdrv->isdrm) { dinfo = device_get_ivars(parent); device_set_ivars(dev, dinfo); } else { dinfo = device_get_ivars(dev); } pdev->dev.parent = &linux_root_device; pdev->dev.bsddev = dev; INIT_LIST_HEAD(&pdev->dev.irqents); pdev->devfn = PCI_DEVFN(pci_get_slot(dev), pci_get_function(dev)); pdev->device = dinfo->cfg.device; pdev->vendor = dinfo->cfg.vendor; pdev->subsystem_vendor = dinfo->cfg.subvendor; pdev->subsystem_device = dinfo->cfg.subdevice; pdev->class = pci_get_class(dev); pdev->revision = pci_get_revid(dev); - pdev->dev.dma_mask = &pdev->dma_mask; pdev->pdrv = pdrv; kobject_init(&pdev->dev.kobj, &linux_dev_ktype); kobject_set_name(&pdev->dev.kobj, device_get_nameunit(dev)); kobject_add(&pdev->dev.kobj, &linux_root_device.kobj, kobject_name(&pdev->dev.kobj)); rle = linux_pci_get_rle(pdev, SYS_RES_IRQ, 0); if (rle != NULL) pdev->dev.irq = rle->start; else pdev->dev.irq = LINUX_IRQ_INVALID; pdev->irq = pdev->dev.irq; + error = linux_pdev_dma_init(pdev); + if (error) + goto out; if (pdev->bus == NULL) { pbus = malloc(sizeof(*pbus), M_DEVBUF, M_WAITOK | M_ZERO); pbus->self = pdev; pbus->number = pci_get_bus(dev); pdev->bus = pbus; } spin_lock(&pci_lock); list_add(&pdev->links, &pci_devices); spin_unlock(&pci_lock); error = pdrv->probe(pdev, id); +out: if (error) { spin_lock(&pci_lock); list_del(&pdev->links); spin_unlock(&pci_lock); put_device(&pdev->dev); error = -error; } return (error); } static int linux_pci_detach(device_t dev) { struct pci_dev *pdev; linux_set_current(curthread); pdev = device_get_softc(dev); pdev->pdrv->remove(pdev); + linux_pdev_dma_uninit(pdev); spin_lock(&pci_lock); list_del(&pdev->links); spin_unlock(&pci_lock); device_set_desc(dev, NULL); put_device(&pdev->dev); return (0); } static int linux_pci_suspend(device_t dev) { const struct dev_pm_ops *pmops; struct pm_message pm = { }; struct pci_dev *pdev; int error; error = 0; linux_set_current(curthread); pdev = device_get_softc(dev); pmops = pdev->pdrv->driver.pm; if (pdev->pdrv->suspend != NULL) error = -pdev->pdrv->suspend(pdev, pm); else if (pmops != NULL && pmops->suspend != NULL) { error = -pmops->suspend(&pdev->dev); if (error == 0 && pmops->suspend_late != NULL) error = -pmops->suspend_late(&pdev->dev); } return (error); } static int linux_pci_resume(device_t dev) { const struct dev_pm_ops *pmops; struct pci_dev *pdev; int error; error = 0; linux_set_current(curthread); pdev = device_get_softc(dev); pmops = pdev->pdrv->driver.pm; if (pdev->pdrv->resume != NULL) error = -pdev->pdrv->resume(pdev); else if (pmops != NULL && pmops->resume != NULL) { if (pmops->resume_early != NULL) error = -pmops->resume_early(&pdev->dev); if (error == 0 && pmops->resume != NULL) error = -pmops->resume(&pdev->dev); } return (error); } static int linux_pci_shutdown(device_t dev) { struct pci_dev *pdev; linux_set_current(curthread); pdev = device_get_softc(dev); if (pdev->pdrv->shutdown != NULL) pdev->pdrv->shutdown(pdev); return (0); } static int _linux_pci_register_driver(struct pci_driver *pdrv, devclass_t dc) { int error; linux_set_current(curthread); spin_lock(&pci_lock); list_add(&pdrv->links, &pci_drivers); spin_unlock(&pci_lock); pdrv->bsddriver.name = pdrv->name; pdrv->bsddriver.methods = pci_methods; pdrv->bsddriver.size = sizeof(struct pci_dev); 
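	/*
	 * The Linux pci_driver is now wrapped in a native newbus driver;
	 * devclass_add_driver() below attaches it to the "pci" (or, for DRM,
	 * "vgapci") devclass.  A consumer normally only sees the Linux-style
	 * entry point, roughly as sketched here with illustrative names:
	 *
	 *	static struct pci_driver foo_driver = {
	 *		.name = "foo",
	 *		.id_table = foo_pci_ids,
	 *		.probe = foo_probe,
	 *		.remove = foo_remove,
	 *	};
	 *
	 *	pci_register_driver(&foo_driver);
	 */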
mtx_lock(&Giant); error = devclass_add_driver(dc, &pdrv->bsddriver, BUS_PASS_DEFAULT, &pdrv->bsdclass); mtx_unlock(&Giant); return (-error); } int linux_pci_register_driver(struct pci_driver *pdrv) { devclass_t dc; dc = devclass_find("pci"); if (dc == NULL) return (-ENXIO); pdrv->isdrm = false; return (_linux_pci_register_driver(pdrv, dc)); } int linux_pci_register_drm_driver(struct pci_driver *pdrv) { devclass_t dc; dc = devclass_create("vgapci"); if (dc == NULL) return (-ENXIO); pdrv->isdrm = true; pdrv->name = "drmn"; return (_linux_pci_register_driver(pdrv, dc)); } void linux_pci_unregister_driver(struct pci_driver *pdrv) { devclass_t bus; bus = devclass_find("pci"); spin_lock(&pci_lock); list_del(&pdrv->links); spin_unlock(&pci_lock); mtx_lock(&Giant); if (bus != NULL) devclass_delete_driver(bus, &pdrv->bsddriver); mtx_unlock(&Giant); +} + +CTASSERT(sizeof(dma_addr_t) <= sizeof(uint64_t)); + +struct linux_dma_obj { + void *vaddr; + uint64_t dma_addr; + bus_dmamap_t dmamap; +}; + +static uma_zone_t linux_dma_trie_zone; +static uma_zone_t linux_dma_obj_zone; + +static void +linux_dma_init(void *arg) +{ + + linux_dma_trie_zone = uma_zcreate("linux_dma_pctrie", + pctrie_node_size(), NULL, NULL, pctrie_zone_init, NULL, + UMA_ALIGN_PTR, 0); + linux_dma_obj_zone = uma_zcreate("linux_dma_object", + sizeof(struct linux_dma_obj), NULL, NULL, NULL, NULL, + UMA_ALIGN_PTR, 0); + +} +SYSINIT(linux_dma, SI_SUB_DRIVERS, SI_ORDER_THIRD, linux_dma_init, NULL); + +static void +linux_dma_uninit(void *arg) +{ + + uma_zdestroy(linux_dma_obj_zone); + uma_zdestroy(linux_dma_trie_zone); +} +SYSUNINIT(linux_dma, SI_SUB_DRIVERS, SI_ORDER_THIRD, linux_dma_uninit, NULL); + +static void * +linux_dma_trie_alloc(struct pctrie *ptree) +{ + + return (uma_zalloc(linux_dma_trie_zone, 0)); +} + +static void +linux_dma_trie_free(struct pctrie *ptree, void *node) +{ + + uma_zfree(linux_dma_trie_zone, node); +} + + +PCTRIE_DEFINE(LINUX_DMA, linux_dma_obj, dma_addr, linux_dma_trie_alloc, + linux_dma_trie_free); + +void * +linux_dma_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_handle, gfp_t flag) +{ + struct linux_dma_priv *priv; + vm_paddr_t high; + size_t align; + void *mem; + + if (dev == NULL || dev->dma_priv == NULL) { + *dma_handle = 0; + return (NULL); + } + priv = dev->dma_priv; + if (priv->dma_mask) + high = priv->dma_mask; + else if (flag & GFP_DMA32) + high = BUS_SPACE_MAXADDR_32BIT; + else + high = BUS_SPACE_MAXADDR; + align = PAGE_SIZE << get_order(size); + mem = (void *)kmem_alloc_contig(size, flag, 0, high, align, 0, + VM_MEMATTR_DEFAULT); + if (mem) + *dma_handle = linux_dma_map_phys(dev, vtophys(mem), size); + else + *dma_handle = 0; + return (mem); +} + +dma_addr_t +linux_dma_map_phys(struct device *dev, vm_paddr_t phys, size_t len) +{ + struct linux_dma_priv *priv; + struct linux_dma_obj *obj; + int error, nseg; + bus_dma_segment_t seg; + + priv = dev->dma_priv; + + obj = uma_zalloc(linux_dma_obj_zone, 0); + + if (bus_dmamap_create(priv->dmat, 0, &obj->dmamap) != 0) { + uma_zfree(linux_dma_obj_zone, obj); + return (0); + } + + nseg = -1; + mtx_lock(&priv->dma_lock); + if (_bus_dmamap_load_phys(priv->dmat, obj->dmamap, phys, len, + BUS_DMA_NOWAIT, &seg, &nseg) != 0) { + bus_dmamap_destroy(priv->dmat, obj->dmamap); + mtx_unlock(&priv->dma_lock); + uma_zfree(linux_dma_obj_zone, obj); + return (0); + } + mtx_unlock(&priv->dma_lock); + + KASSERT(++nseg == 1, ("More than one segment (nseg=%d)", nseg)); + obj->dma_addr = seg.ds_addr; + + mtx_lock(&priv->ptree_lock); + error = 
LINUX_DMA_PCTRIE_INSERT(&priv->ptree, obj); + mtx_unlock(&priv->ptree_lock); + if (error != 0) { + mtx_lock(&priv->dma_lock); + bus_dmamap_unload(priv->dmat, obj->dmamap); + bus_dmamap_destroy(priv->dmat, obj->dmamap); + mtx_unlock(&priv->dma_lock); + uma_zfree(linux_dma_obj_zone, obj); + return (0); + } + + return (obj->dma_addr); +} + +void +linux_dma_unmap(struct device *dev, dma_addr_t dma_addr, size_t len) +{ + struct linux_dma_priv *priv; + struct linux_dma_obj *obj; + + priv = dev->dma_priv; + + mtx_lock(&priv->ptree_lock); + obj = LINUX_DMA_PCTRIE_LOOKUP(&priv->ptree, dma_addr); + if (obj == NULL) { + mtx_unlock(&priv->ptree_lock); + return; + } + LINUX_DMA_PCTRIE_REMOVE(&priv->ptree, dma_addr); + mtx_unlock(&priv->ptree_lock); + + mtx_lock(&priv->dma_lock); + bus_dmamap_unload(priv->dmat, obj->dmamap); + bus_dmamap_destroy(priv->dmat, obj->dmamap); + mtx_unlock(&priv->dma_lock); + + uma_zfree(linux_dma_obj_zone, obj); +} + +int +linux_dma_map_sg_attrs(struct device *dev, struct scatterlist *sgl, int nents, + enum dma_data_direction dir, struct dma_attrs *attrs) +{ + struct linux_dma_priv *priv; + struct linux_dma_obj *obj; + struct scatterlist *dma_sg, *sg; + int dma_nents, error, nseg; + size_t seg_len; + vm_paddr_t seg_phys, prev_phys_end; + bus_dma_segment_t seg; + + priv = dev->dma_priv; + + obj = uma_zalloc(linux_dma_obj_zone, 0); + + if (bus_dmamap_create(priv->dmat, 0, &obj->dmamap) != 0) { + uma_zfree(linux_dma_obj_zone, obj); + return (0); + } + + sg = sgl; + dma_sg = sg; + dma_nents = 0; + while (nents > 0) { + seg_phys = sg_phys(sg); + seg_len = sg->length; + while (--nents > 0) { + prev_phys_end = sg_phys(sg) + sg->length; + sg = sg_next(sg); + if (prev_phys_end != sg_phys(sg)) + break; + seg_len += sg->length; + } + + nseg = -1; + mtx_lock(&priv->dma_lock); + if (_bus_dmamap_load_phys(priv->dmat, obj->dmamap, + seg_phys, seg_len, BUS_DMA_NOWAIT, + &seg, &nseg) != 0) { + bus_dmamap_unload(priv->dmat, obj->dmamap); + bus_dmamap_destroy(priv->dmat, obj->dmamap); + mtx_unlock(&priv->dma_lock); + uma_zfree(linux_dma_obj_zone, obj); + return (0); + } + mtx_unlock(&priv->dma_lock); + KASSERT(++nseg == 1, ("More than one segment (nseg=%d)", nseg)); + + sg_dma_address(dma_sg) = seg.ds_addr; + sg_dma_len(dma_sg) = seg.ds_len; + + dma_sg = sg_next(dma_sg); + dma_nents++; + } + + obj->dma_addr = sg_dma_address(sgl); + + mtx_lock(&priv->ptree_lock); + error = LINUX_DMA_PCTRIE_INSERT(&priv->ptree, obj); + mtx_unlock(&priv->ptree_lock); + if (error != 0) { + mtx_lock(&priv->dma_lock); + bus_dmamap_unload(priv->dmat, obj->dmamap); + bus_dmamap_destroy(priv->dmat, obj->dmamap); + mtx_unlock(&priv->dma_lock); + uma_zfree(linux_dma_obj_zone, obj); + return (0); + } + + return (dma_nents); +} + +void +linux_dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sgl, + int nents, enum dma_data_direction dir, struct dma_attrs *attrs) +{ + struct linux_dma_priv *priv; + struct linux_dma_obj *obj; + + priv = dev->dma_priv; + + mtx_lock(&priv->ptree_lock); + obj = LINUX_DMA_PCTRIE_LOOKUP(&priv->ptree, sg_dma_address(sgl)); + if (obj == NULL) { + mtx_unlock(&priv->ptree_lock); + return; + } + LINUX_DMA_PCTRIE_REMOVE(&priv->ptree, sg_dma_address(sgl)); + mtx_unlock(&priv->ptree_lock); + + mtx_lock(&priv->dma_lock); + bus_dmamap_unload(priv->dmat, obj->dmamap); + bus_dmamap_destroy(priv->dmat, obj->dmamap); + mtx_unlock(&priv->dma_lock); + + uma_zfree(linux_dma_obj_zone, obj); +} + +static inline int +dma_pool_obj_ctor(void *mem, int size, void *arg, int flags) +{ + struct linux_dma_obj *obj 
= mem; + struct dma_pool *pool = arg; + int error, nseg; + bus_dma_segment_t seg; + + nseg = -1; + mtx_lock(&pool->pool_dma_lock); + error = _bus_dmamap_load_phys(pool->pool_dmat, obj->dmamap, + vtophys(obj->vaddr), pool->pool_entry_size, BUS_DMA_NOWAIT, + &seg, &nseg); + mtx_unlock(&pool->pool_dma_lock); + if (error != 0) { + return (error); + } + KASSERT(++nseg == 1, ("More than one segment (nseg=%d)", nseg)); + obj->dma_addr = seg.ds_addr; + + return (0); +} + +static void +dma_pool_obj_dtor(void *mem, int size, void *arg) +{ + struct linux_dma_obj *obj = mem; + struct dma_pool *pool = arg; + + mtx_lock(&pool->pool_dma_lock); + bus_dmamap_unload(pool->pool_dmat, obj->dmamap); + mtx_unlock(&pool->pool_dma_lock); +} + +static int +dma_pool_obj_import(void *arg, void **store, int count, int domain __unused, + int flags) +{ + struct dma_pool *pool = arg; + struct linux_dma_priv *priv; + struct linux_dma_obj *obj; + int error, i; + + priv = pool->pool_pdev->dev.dma_priv; + for (i = 0; i < count; i++) { + obj = uma_zalloc(linux_dma_obj_zone, flags); + if (obj == NULL) + break; + + error = bus_dmamem_alloc(pool->pool_dmat, &obj->vaddr, + BUS_DMA_NOWAIT, &obj->dmamap); + if (error != 0) { + uma_zfree(linux_dma_obj_zone, obj); + break; + } + + store[i] = obj; + } + + return (i); +} + +static void +dma_pool_obj_release(void *arg, void **store, int count) +{ + struct dma_pool *pool = arg; + struct linux_dma_priv *priv; + struct linux_dma_obj *obj; + int i; + + priv = pool->pool_pdev->dev.dma_priv; + for (i = 0; i < count; i++) { + obj = store[i]; + bus_dmamem_free(pool->pool_dmat, obj->vaddr, obj->dmamap); + uma_zfree(linux_dma_obj_zone, obj); + } +} + +struct dma_pool * +linux_dma_pool_create(char *name, struct device *dev, size_t size, + size_t align, size_t boundary) +{ + struct linux_dma_priv *priv; + struct dma_pool *pool; + + priv = dev->dma_priv; + + pool = kzalloc(sizeof(*pool), GFP_KERNEL); + pool->pool_pdev = to_pci_dev(dev); + pool->pool_entry_size = size; + + if (bus_dma_tag_create(bus_get_dma_tag(dev->bsddev), + align, boundary, /* alignment, boundary */ + priv->dma_mask, /* lowaddr */ + BUS_SPACE_MAXADDR, /* highaddr */ + NULL, NULL, /* filtfunc, filtfuncarg */ + size, /* maxsize */ + 1, /* nsegments */ + size, /* maxsegsz */ + 0, /* flags */ + NULL, NULL, /* lockfunc, lockfuncarg */ + &pool->pool_dmat)) { + kfree(pool); + return (NULL); + } + + pool->pool_zone = uma_zcache_create(name, -1, dma_pool_obj_ctor, + dma_pool_obj_dtor, NULL, NULL, dma_pool_obj_import, + dma_pool_obj_release, pool, 0); + + mtx_init(&pool->pool_dma_lock, "linux_dma_pool", NULL, MTX_DEF); + + mtx_init(&pool->pool_ptree_lock, "linux_dma_pool_ptree", NULL, + MTX_DEF); + pctrie_init(&pool->pool_ptree); + + return (pool); +} + +void +linux_dma_pool_destroy(struct dma_pool *pool) +{ + + uma_zdestroy(pool->pool_zone); + bus_dma_tag_destroy(pool->pool_dmat); + mtx_destroy(&pool->pool_ptree_lock); + mtx_destroy(&pool->pool_dma_lock); + kfree(pool); +} + +void * +linux_dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags, + dma_addr_t *handle) +{ + struct linux_dma_obj *obj; + + obj = uma_zalloc_arg(pool->pool_zone, pool, mem_flags); + if (obj == NULL) + return (NULL); + + mtx_lock(&pool->pool_ptree_lock); + if (LINUX_DMA_PCTRIE_INSERT(&pool->pool_ptree, obj) != 0) { + mtx_unlock(&pool->pool_ptree_lock); + uma_zfree_arg(pool->pool_zone, obj, pool); + return (NULL); + } + mtx_unlock(&pool->pool_ptree_lock); + + *handle = obj->dma_addr; + return (obj->vaddr); +} + +void +linux_dma_pool_free(struct dma_pool *pool, void
*vaddr, dma_addr_t dma_addr) +{ + struct linux_dma_obj *obj; + + mtx_lock(&pool->pool_ptree_lock); + obj = LINUX_DMA_PCTRIE_LOOKUP(&pool->pool_ptree, dma_addr); + if (obj == NULL) { + mtx_unlock(&pool->pool_ptree_lock); + return; + } + LINUX_DMA_PCTRIE_REMOVE(&pool->pool_ptree, dma_addr); + mtx_unlock(&pool->pool_ptree_lock); + + uma_zfree_arg(pool->pool_zone, obj, pool); } Index: user/ngie/bug-237403/sys/conf/files.powerpc =================================================================== --- user/ngie/bug-237403/sys/conf/files.powerpc (revision 346925) +++ user/ngie/bug-237403/sys/conf/files.powerpc (revision 346926) @@ -1,278 +1,278 @@ # This file tells config what files go into building a kernel, # files marked standard are always included. # # $FreeBSD$ # # The long compile-with and dependency lines are required because of # limitations in config: backslash-newline doesn't work in strings, and # dependency lines other than the first are silently ignored. # # font.h optional sc \ compile-with "uudecode < /usr/share/syscons/fonts/${SC_DFLT_FONT}-8x16.fnt && file2c 'u_char dflt_font_16[16*256] = {' '};' < ${SC_DFLT_FONT}-8x16 > font.h && uudecode < /usr/share/syscons/fonts/${SC_DFLT_FONT}-8x14.fnt && file2c 'u_char dflt_font_14[14*256] = {' '};' < ${SC_DFLT_FONT}-8x14 >> font.h && uudecode < /usr/share/syscons/fonts/${SC_DFLT_FONT}-8x8.fnt && file2c 'u_char dflt_font_8[8*256] = {' '};' < ${SC_DFLT_FONT}-8x8 >> font.h" \ no-obj no-implicit-rule before-depend \ clean "font.h ${SC_DFLT_FONT}-8x14 ${SC_DFLT_FONT}-8x16 ${SC_DFLT_FONT}-8x8" # # There is only an asm version on ppc64. cddl/compat/opensolaris/kern/opensolaris_atomic.c optional zfs powerpc | dtrace powerpc | zfs powerpcspe | dtrace powerpcspe compile-with "${ZFS_C}" cddl/contrib/opensolaris/common/atomic/powerpc64/opensolaris_atomic.S optional zfs powerpc64 | dtrace powerpc64 compile-with "${ZFS_S}" cddl/dev/dtrace/powerpc/dtrace_asm.S optional dtrace compile-with "${DTRACE_S}" cddl/dev/dtrace/powerpc/dtrace_subr.c optional dtrace compile-with "${DTRACE_C}" cddl/dev/fbt/powerpc/fbt_isa.c optional dtrace_fbt | dtraceall compile-with "${FBT_C}" crypto/blowfish/bf_enc.c optional crypto | ipsec | ipsec_support crypto/des/des_enc.c optional crypto | ipsec | ipsec_support | netsmb dev/bm/if_bm.c optional bm powermac dev/adb/adb_bus.c optional adb dev/adb/adb_kbd.c optional adb dev/adb/adb_mouse.c optional adb dev/adb/adb_hb_if.m optional adb dev/adb/adb_if.m optional adb dev/adb/adb_buttons.c optional adb dev/agp/agp_apple.c optional agp powermac dev/fb/fb.c optional sc dev/hwpmc/hwpmc_e500.c optional hwpmc dev/hwpmc/hwpmc_mpc7xxx.c optional hwpmc dev/hwpmc/hwpmc_powerpc.c optional hwpmc dev/hwpmc/hwpmc_ppc970.c optional hwpmc dev/iicbus/ad7417.c optional ad7417 powermac dev/iicbus/adm1030.c optional powermac windtunnel | adm1030 powermac dev/iicbus/adt746x.c optional adt746x powermac dev/iicbus/ds1631.c optional ds1631 powermac dev/iicbus/ds1775.c optional ds1775 powermac dev/iicbus/max6690.c optional max6690 powermac dev/iicbus/ofw_iicbus.c optional iicbus aim dev/ipmi/ipmi.c optional ipmi dev/ipmi/ipmi_opal.c optional powernv ipmi dev/nand/nfc_fsl.c optional nand mpc85xx dev/nand/nfc_rb.c optional nand mpc85xx # Most ofw stuff below is brought in by conf/files for options FDT, but # we always want it, even on non-FDT platforms. 
dev/fdt/simplebus.c standard dev/ofw/openfirm.c standard dev/ofw/openfirmio.c standard dev/ofw/ofw_bus_if.m standard dev/ofw/ofw_cpu.c standard dev/ofw/ofw_if.m standard dev/ofw/ofw_bus_subr.c standard dev/ofw/ofw_console.c optional aim dev/ofw/ofw_disk.c optional ofwd aim dev/ofw/ofwbus.c standard dev/ofw/ofwpci.c optional pci dev/ofw/ofw_standard.c optional aim powerpc dev/ofw/ofw_subr.c standard dev/powermac_nvram/powermac_nvram.c optional powermac_nvram powermac dev/quicc/quicc_bfe_fdt.c optional quicc mpc85xx dev/random/darn.c optional powerpc64 random dev/scc/scc_bfe_macio.c optional scc powermac dev/sdhci/fsl_sdhci.c optional mpc85xx sdhci dev/sec/sec.c optional sec mpc85xx dev/sound/macio/aoa.c optional snd_davbus | snd_ai2s powermac dev/sound/macio/davbus.c optional snd_davbus powermac dev/sound/macio/i2s.c optional snd_ai2s powermac dev/sound/macio/onyx.c optional snd_ai2s iicbus powermac dev/sound/macio/snapper.c optional snd_ai2s iicbus powermac dev/sound/macio/tumbler.c optional snd_ai2s iicbus powermac dev/syscons/scgfbrndr.c optional sc dev/tsec/if_tsec.c optional tsec dev/tsec/if_tsec_fdt.c optional tsec dev/uart/uart_cpu_powerpc.c optional uart dev/usb/controller/ehci_fsl.c optional ehci mpc85xx dev/vt/hw/ofwfb/ofwfb.c optional vt aim kern/kern_clocksource.c standard kern/subr_dummy_vdso_tc.c standard kern/syscalls.c optional ktr kern/subr_sfbuf.c standard libkern/ashldi3.c optional powerpc | powerpcspe libkern/ashrdi3.c optional powerpc | powerpcspe libkern/bcmp.c standard libkern/bcopy.c standard libkern/cmpdi2.c optional powerpc | powerpcspe libkern/divdi3.c optional powerpc | powerpcspe libkern/ffs.c standard libkern/ffsl.c standard libkern/ffsll.c standard libkern/fls.c standard libkern/flsl.c standard libkern/flsll.c standard libkern/lshrdi3.c optional powerpc | powerpcspe libkern/memcmp.c standard libkern/memset.c standard libkern/moddi3.c optional powerpc | powerpcspe libkern/qdivrem.c optional powerpc | powerpcspe libkern/ucmpdi2.c optional powerpc | powerpcspe libkern/udivdi3.c optional powerpc | powerpcspe libkern/umoddi3.c optional powerpc | powerpcspe powerpc/aim/locore.S optional aim no-obj powerpc/aim/aim_machdep.c optional aim powerpc/aim/mmu_oea.c optional aim powerpc powerpc/aim/mmu_oea64.c optional aim powerpc/aim/moea64_if.m optional aim powerpc/aim/moea64_native.c optional aim powerpc/aim/mp_cpudep.c optional aim powerpc/aim/slb.c optional aim powerpc64 powerpc/booke/locore.S optional booke no-obj powerpc/booke/booke_machdep.c optional booke powerpc/booke/machdep_e500.c optional booke_e500 powerpc/booke/mp_cpudep.c optional booke smp powerpc/booke/platform_bare.c optional booke powerpc/booke/pmap.c optional booke powerpc/booke/spe.c optional powerpcspe powerpc/cpufreq/dfs.c optional cpufreq powerpc/cpufreq/mpc85xx_jog.c optional cpufreq mpc85xx powerpc/cpufreq/pcr.c optional cpufreq aim powerpc/cpufreq/pmcr.c optional cpufreq aim powerpc64 powerpc/cpufreq/pmufreq.c optional cpufreq aim pmu powerpc/fpu/fpu_add.c optional fpu_emu | powerpcspe powerpc/fpu/fpu_compare.c optional fpu_emu | powerpcspe powerpc/fpu/fpu_div.c optional fpu_emu | powerpcspe powerpc/fpu/fpu_emu.c optional fpu_emu powerpc/fpu/fpu_explode.c optional fpu_emu | powerpcspe powerpc/fpu/fpu_implode.c optional fpu_emu | powerpcspe powerpc/fpu/fpu_mul.c optional fpu_emu | powerpcspe powerpc/fpu/fpu_sqrt.c optional fpu_emu powerpc/fpu/fpu_subr.c optional fpu_emu | powerpcspe powerpc/mambo/mambocall.S optional mambo powerpc/mambo/mambo.c optional mambo powerpc/mambo/mambo_console.c optional 
mambo powerpc/mambo/mambo_disk.c optional mambo powerpc/mikrotik/platform_rb.c optional mikrotik powerpc/mikrotik/rb_led.c optional mikrotik powerpc/mpc85xx/atpic.c optional mpc85xx isa powerpc/mpc85xx/ds1553_bus_fdt.c optional ds1553 powerpc/mpc85xx/ds1553_core.c optional ds1553 powerpc/mpc85xx/fsl_diu.c optional mpc85xx diu powerpc/mpc85xx/fsl_espi.c optional mpc85xx spibus powerpc/mpc85xx/fsl_sata.c optional mpc85xx ata powerpc/mpc85xx/i2c.c optional iicbus powerpc/mpc85xx/isa.c optional mpc85xx isa powerpc/mpc85xx/lbc.c optional mpc85xx powerpc/mpc85xx/mpc85xx.c optional mpc85xx powerpc/mpc85xx/mpc85xx_cache.c optional mpc85xx powerpc/mpc85xx/mpc85xx_gpio.c optional mpc85xx gpio powerpc/mpc85xx/platform_mpc85xx.c optional mpc85xx powerpc/mpc85xx/pci_mpc85xx.c optional pci mpc85xx powerpc/mpc85xx/pci_mpc85xx_pcib.c optional pci mpc85xx powerpc/mpc85xx/qoriq_gpio.c optional mpc85xx gpio powerpc/ofw/ofw_machdep.c standard powerpc/ofw/ofw_pcibus.c optional pci powerpc/ofw/ofw_pcib_pci.c optional pci powerpc/ofw/ofw_real.c optional aim powerpc/ofw/ofw_syscons.c optional sc aim powerpc/ofw/ofwcall32.S optional aim powerpc powerpc/ofw/ofwcall64.S optional aim powerpc64 powerpc/ofw/openpic_ofw.c standard powerpc/ofw/rtas.c optional aim powerpc/ofw/ofw_initrd.c optional md_root_mem powerpc64 powerpc/powermac/ata_kauai.c optional powermac ata | powermac atamacio powerpc/powermac/ata_macio.c optional powermac ata | powermac atamacio powerpc/powermac/ata_dbdma.c optional powermac ata | powermac atamacio powerpc/powermac/atibl.c optional powermac atibl powerpc/powermac/cuda.c optional powermac cuda powerpc/powermac/cpcht.c optional powermac pci powerpc/powermac/dbdma.c optional powermac pci powerpc/powermac/fcu.c optional powermac fcu powerpc/powermac/grackle.c optional powermac pci powerpc/powermac/hrowpic.c optional powermac pci powerpc/powermac/kiic.c optional powermac kiic powerpc/powermac/macgpio.c optional powermac pci powerpc/powermac/macio.c optional powermac pci powerpc/powermac/nvbl.c optional powermac nvbl powerpc/powermac/platform_powermac.c optional powermac powerpc/powermac/powermac_thermal.c optional powermac powerpc/powermac/pswitch.c optional powermac pswitch powerpc/powermac/pmu.c optional powermac pmu powerpc/powermac/smu.c optional powermac smu powerpc/powermac/smusat.c optional powermac smu powerpc/powermac/uninorth.c optional powermac powerpc/powermac/uninorthpci.c optional powermac pci powerpc/powermac/vcoregpio.c optional powermac powerpc/powernv/opal.c optional powernv powerpc/powernv/opal_async.c optional powernv powerpc/powernv/opal_console.c optional powernv powerpc/powernv/opal_dev.c optional powernv -powerpc/powernv/opal_flash.c optional powernv +powerpc/powernv/opal_flash.c optional powernv opalflash powerpc/powernv/opal_hmi.c optional powernv powerpc/powernv/opal_i2c.c optional iicbus fdt powernv powerpc/powernv/opal_i2cm.c optional iicbus fdt powernv powerpc/powernv/opal_pci.c optional powernv pci powerpc/powernv/opal_sensor.c optional powernv powerpc/powernv/opalcall.S optional powernv powerpc/powernv/platform_powernv.c optional powernv powerpc/powernv/powernv_centaur.c optional powernv powerpc/powernv/powernv_xscom.c optional powernv powerpc/powernv/xive.c optional powernv powerpc/powerpc/altivec.c optional powerpc | powerpc64 powerpc/powerpc/autoconf.c standard powerpc/powerpc/bus_machdep.c standard powerpc/powerpc/busdma_machdep.c standard powerpc/powerpc/clock.c standard powerpc/powerpc/copyinout.c standard powerpc/powerpc/copystr.c standard 
powerpc/powerpc/cpu.c standard powerpc/powerpc/cpu_subr64.S optional powerpc64 powerpc/powerpc/db_disasm.c optional ddb powerpc/powerpc/db_hwwatch.c optional ddb powerpc/powerpc/db_interface.c optional ddb powerpc/powerpc/db_trace.c optional ddb powerpc/powerpc/dump_machdep.c standard powerpc/powerpc/elf32_machdep.c optional powerpc | powerpcspe | compat_freebsd32 powerpc/powerpc/elf64_machdep.c optional powerpc64 powerpc/powerpc/exec_machdep.c standard powerpc/powerpc/fpu.c standard powerpc/powerpc/gdb_machdep.c optional gdb powerpc/powerpc/in_cksum.c optional inet | inet6 powerpc/powerpc/interrupt.c standard powerpc/powerpc/intr_machdep.c standard powerpc/powerpc/iommu_if.m standard powerpc/powerpc/machdep.c standard powerpc/powerpc/mem.c optional mem powerpc/powerpc/mmu_if.m standard powerpc/powerpc/mp_machdep.c optional smp powerpc/powerpc/nexus.c standard powerpc/powerpc/openpic.c standard powerpc/powerpc/pic_if.m standard powerpc/powerpc/pmap_dispatch.c standard powerpc/powerpc/platform.c standard powerpc/powerpc/platform_if.m standard powerpc/powerpc/ptrace_machdep.c standard powerpc/powerpc/sc_machdep.c optional sc powerpc/powerpc/setjmp.S standard powerpc/powerpc/sigcode32.S optional powerpc | powerpcspe | compat_freebsd32 powerpc/powerpc/sigcode64.S optional powerpc64 powerpc/powerpc/swtch32.S optional powerpc | powerpcspe powerpc/powerpc/swtch64.S optional powerpc64 powerpc/powerpc/stack_machdep.c optional ddb | stack powerpc/powerpc/syncicache.c standard powerpc/powerpc/sys_machdep.c standard powerpc/powerpc/trap.c standard powerpc/powerpc/uio_machdep.c standard powerpc/powerpc/uma_machdep.c standard powerpc/powerpc/vm_machdep.c standard powerpc/ps3/ehci_ps3.c optional ps3 ehci powerpc/ps3/ohci_ps3.c optional ps3 ohci powerpc/ps3/if_glc.c optional ps3 glc powerpc/ps3/mmu_ps3.c optional ps3 powerpc/ps3/platform_ps3.c optional ps3 powerpc/ps3/ps3bus.c optional ps3 powerpc/ps3/ps3cdrom.c optional ps3 scbus powerpc/ps3/ps3disk.c optional ps3 powerpc/ps3/ps3pic.c optional ps3 powerpc/ps3/ps3_syscons.c optional ps3 vt powerpc/ps3/ps3-hvcall.S optional ps3 powerpc/pseries/phyp-hvcall.S optional pseries powerpc64 powerpc/pseries/mmu_phyp.c optional pseries powerpc64 powerpc/pseries/phyp_console.c optional pseries powerpc64 uart powerpc/pseries/phyp_llan.c optional llan powerpc/pseries/phyp_vscsi.c optional pseries powerpc64 scbus powerpc/pseries/platform_chrp.c optional pseries powerpc/pseries/plpar_iommu.c optional pseries powerpc64 powerpc/pseries/plpar_pcibus.c optional pseries powerpc64 pci powerpc/pseries/rtas_dev.c optional pseries powerpc/pseries/rtas_pci.c optional pseries pci powerpc/pseries/vdevice.c optional pseries powerpc64 powerpc/pseries/xics.c optional pseries powerpc64 powerpc/psim/iobus.c optional psim powerpc/psim/ata_iobus.c optional ata psim powerpc/psim/openpic_iobus.c optional psim powerpc/psim/uart_iobus.c optional uart psim Index: user/ngie/bug-237403/sys/contrib/ipfilter/netinet/ip_fil_freebsd.c =================================================================== --- user/ngie/bug-237403/sys/contrib/ipfilter/netinet/ip_fil_freebsd.c (revision 346925) +++ user/ngie/bug-237403/sys/contrib/ipfilter/netinet/ip_fil_freebsd.c (revision 346926) @@ -1,1447 +1,1447 @@ /* $FreeBSD$ */ /* * Copyright (C) 2012 by Darren Reed. * * See the IPFILTER.LICENCE file for details on licencing. 
*/ #if !defined(lint) static const char sccsid[] = "@(#)ip_fil.c 2.41 6/5/96 (C) 1993-2000 Darren Reed"; static const char rcsid[] = "@(#)$Id$"; #endif #if defined(KERNEL) || defined(_KERNEL) # undef KERNEL # undef _KERNEL # define KERNEL 1 # define _KERNEL 1 #endif #if defined(__FreeBSD_version) && (__FreeBSD_version >= 400000) && \ !defined(KLD_MODULE) && !defined(IPFILTER_LKM) # include "opt_inet6.h" #endif #if defined(__FreeBSD_version) && (__FreeBSD_version >= 440000) && \ !defined(KLD_MODULE) && !defined(IPFILTER_LKM) # include "opt_random_ip_id.h" #endif #include #include #include #include #include # include # include #include #include # include #if defined(__FreeBSD_version) && (__FreeBSD_version >= 800000) #include #endif # include # include # include #include # include # include #include # include # include #include #include #include #include #include #include #include #include #include #include #include #include #include "netinet/ip_compat.h" #ifdef USE_INET6 # include #endif #include "netinet/ip_fil.h" #include "netinet/ip_nat.h" #include "netinet/ip_frag.h" #include "netinet/ip_state.h" #include "netinet/ip_proxy.h" #include "netinet/ip_auth.h" #include "netinet/ip_sync.h" #include "netinet/ip_lookup.h" #include "netinet/ip_dstlist.h" #ifdef IPFILTER_SCAN #include "netinet/ip_scan.h" #endif #include "netinet/ip_pool.h" # include #include #ifdef CSUM_DATA_VALID #include #endif extern int ip_optcopy __P((struct ip *, struct ip *)); # ifdef IPFILTER_M_IPFILTER MALLOC_DEFINE(M_IPFILTER, "ipfilter", "IP Filter packet filter data structures"); # endif static int ipf_send_ip __P((fr_info_t *, mb_t *)); static void ipf_timer_func __P((void *arg)); VNET_DEFINE(ipf_main_softc_t, ipfmain) = { .ipf_running = -2, }; #define V_ipfmain VNET(ipfmain) # include # include static eventhandler_tag ipf_arrivetag, ipf_departtag; #if 0 /* * Disable the "cloner" event handler; we are getting interface * events before the firewall is fully initiallized and also no vnet * information thus leading to uninitialised memory accesses. * In addition it is unclear why we need it in first place. * If it turns out to be needed, well need a dedicated event handler * for it to deal with the ifc and the correct vnet. */ static eventhandler_tag ipf_clonetag; #endif static void ipf_ifevent(void *arg, struct ifnet *ifp); static void ipf_ifevent(arg, ifp) void *arg; struct ifnet *ifp; { CURVNET_SET(ifp->if_vnet); if (V_ipfmain.ipf_running > 0) ipf_sync(&V_ipfmain, NULL); CURVNET_RESTORE(); } static pfil_return_t ipf_check_wrapper(struct mbuf **mp, struct ifnet *ifp, int flags, void *ruleset __unused, struct inpcb *inp) { struct ip *ip = mtod(*mp, struct ip *); pfil_return_t rv; CURVNET_SET(ifp->if_vnet); rv = ipf_check(&V_ipfmain, ip, ip->ip_hl << 2, ifp, !!(flags & PFIL_OUT), mp); CURVNET_RESTORE(); return (rv == 0 ? PFIL_PASS : PFIL_DROPPED); } #ifdef USE_INET6 static pfil_return_t ipf_check_wrapper6(struct mbuf **mp, struct ifnet *ifp, int flags, void *ruleset __unused, struct inpcb *inp) { pfil_return_t rv; CURVNET_SET(ifp->if_vnet); rv = ipf_check(&V_ipfmain, mtod(*mp, struct ip *), sizeof(struct ip6_hdr), ifp, !!(flags & PFIL_OUT), mp); CURVNET_RESTORE(); return (rv == 0 ? 
PFIL_PASS : PFIL_DROPPED); } # endif #if defined(IPFILTER_LKM) int ipf_identify(s) char *s; { if (strcmp(s, "ipl") == 0) return 1; return 0; } #endif /* IPFILTER_LKM */ static void ipf_timer_func(arg) void *arg; { ipf_main_softc_t *softc = arg; SPL_INT(s); SPL_NET(s); READ_ENTER(&softc->ipf_global); if (softc->ipf_running > 0) ipf_slowtimer(softc); if (softc->ipf_running == -1 || softc->ipf_running == 1) { #if 0 softc->ipf_slow_ch = timeout(ipf_timer_func, softc, hz/2); #endif callout_init(&softc->ipf_slow_ch, 1); callout_reset(&softc->ipf_slow_ch, (hz / IPF_HZ_DIVIDE) * IPF_HZ_MULT, ipf_timer_func, softc); } RWLOCK_EXIT(&softc->ipf_global); SPL_X(s); } int ipfattach(softc) ipf_main_softc_t *softc; { #ifdef USE_SPL int s; #endif SPL_NET(s); if (softc->ipf_running > 0) { SPL_X(s); return EBUSY; } if (ipf_init_all(softc) < 0) { SPL_X(s); return EIO; } bzero((char *)V_ipfmain.ipf_selwait, sizeof(V_ipfmain.ipf_selwait)); softc->ipf_running = 1; if (softc->ipf_control_forwarding & 1) V_ipforwarding = 1; SPL_X(s); #if 0 softc->ipf_slow_ch = timeout(ipf_timer_func, softc, (hz / IPF_HZ_DIVIDE) * IPF_HZ_MULT); #endif callout_init(&softc->ipf_slow_ch, 1); callout_reset(&softc->ipf_slow_ch, (hz / IPF_HZ_DIVIDE) * IPF_HZ_MULT, ipf_timer_func, softc); return 0; } /* * Disable the filter by removing the hooks from the IP input/output * stream. */ int ipfdetach(softc) ipf_main_softc_t *softc; { #ifdef USE_SPL int s; #endif if (softc->ipf_control_forwarding & 2) V_ipforwarding = 0; SPL_NET(s); #if 0 if (softc->ipf_slow_ch.callout != NULL) untimeout(ipf_timer_func, softc, softc->ipf_slow_ch); bzero(&softc->ipf_slow, sizeof(softc->ipf_slow)); #endif callout_drain(&softc->ipf_slow_ch); ipf_fini_all(softc); softc->ipf_running = -2; SPL_X(s); return 0; } /* * Filter ioctl interface. */ int ipfioctl(dev, cmd, data, mode, p) struct thread *p; # define p_cred td_ucred # define p_uid td_ucred->cr_ruid struct cdev *dev; ioctlcmd_t cmd; caddr_t data; int mode; { int error = 0, unit = 0; SPL_INT(s); CURVNET_SET(TD_TO_VNET(p)); #if (BSD >= 199306) if (securelevel_ge(p->p_cred, 3) && (mode & FWRITE)) { V_ipfmain.ipf_interror = 130001; CURVNET_RESTORE(); return EPERM; } #endif unit = GET_MINOR(dev); if ((IPL_LOGMAX < unit) || (unit < 0)) { V_ipfmain.ipf_interror = 130002; CURVNET_RESTORE(); return ENXIO; } if (V_ipfmain.ipf_running <= 0) { if (unit != IPL_LOGIPF && cmd != SIOCIPFINTERROR) { V_ipfmain.ipf_interror = 130003; CURVNET_RESTORE(); return EIO; } if (cmd != SIOCIPFGETNEXT && cmd != SIOCIPFGET && cmd != SIOCIPFSET && cmd != SIOCFRENB && cmd != SIOCGETFS && cmd != SIOCGETFF && cmd != SIOCIPFINTERROR) { V_ipfmain.ipf_interror = 130004; CURVNET_RESTORE(); return EIO; } } SPL_NET(s); error = ipf_ioctlswitch(&V_ipfmain, unit, data, cmd, mode, p->p_uid, p); CURVNET_RESTORE(); if (error != -1) { SPL_X(s); return error; } SPL_X(s); return error; } /* * ipf_send_reset - this could conceivably be a call to tcp_respond(), but that * requires a large amount of setting up and isn't any more efficient. */ int ipf_send_reset(fin) fr_info_t *fin; { struct tcphdr *tcp, *tcp2; int tlen = 0, hlen; struct mbuf *m; #ifdef USE_INET6 ip6_t *ip6; #endif ip_t *ip; tcp = fin->fin_dp; if (tcp->th_flags & TH_RST) return -1; /* feedback loop */ if (ipf_checkl4sum(fin) == -1) return -1; tlen = fin->fin_dlen - (TCP_OFF(tcp) << 2) + ((tcp->th_flags & TH_SYN) ? 1 : 0) + ((tcp->th_flags & TH_FIN) ? 1 : 0); #ifdef USE_INET6 hlen = (fin->fin_v == 6) ? 
sizeof(ip6_t) : sizeof(ip_t); #else hlen = sizeof(ip_t); #endif #ifdef MGETHDR MGETHDR(m, M_NOWAIT, MT_HEADER); #else MGET(m, M_NOWAIT, MT_HEADER); #endif if (m == NULL) return -1; if (sizeof(*tcp2) + hlen > MLEN) { if (!(MCLGET(m, M_NOWAIT))) { FREE_MB_T(m); return -1; } } m->m_len = sizeof(*tcp2) + hlen; #if (BSD >= 199103) m->m_data += max_linkhdr; m->m_pkthdr.len = m->m_len; m->m_pkthdr.rcvif = (struct ifnet *)0; #endif ip = mtod(m, struct ip *); bzero((char *)ip, hlen); #ifdef USE_INET6 ip6 = (ip6_t *)ip; #endif tcp2 = (struct tcphdr *)((char *)ip + hlen); tcp2->th_sport = tcp->th_dport; tcp2->th_dport = tcp->th_sport; if (tcp->th_flags & TH_ACK) { tcp2->th_seq = tcp->th_ack; tcp2->th_flags = TH_RST; tcp2->th_ack = 0; } else { tcp2->th_seq = 0; tcp2->th_ack = ntohl(tcp->th_seq); tcp2->th_ack += tlen; tcp2->th_ack = htonl(tcp2->th_ack); tcp2->th_flags = TH_RST|TH_ACK; } TCP_X2_A(tcp2, 0); TCP_OFF_A(tcp2, sizeof(*tcp2) >> 2); tcp2->th_win = tcp->th_win; tcp2->th_sum = 0; tcp2->th_urp = 0; #ifdef USE_INET6 if (fin->fin_v == 6) { ip6->ip6_flow = ((ip6_t *)fin->fin_ip)->ip6_flow; ip6->ip6_plen = htons(sizeof(struct tcphdr)); ip6->ip6_nxt = IPPROTO_TCP; ip6->ip6_hlim = 0; ip6->ip6_src = fin->fin_dst6.in6; ip6->ip6_dst = fin->fin_src6.in6; tcp2->th_sum = in6_cksum(m, IPPROTO_TCP, sizeof(*ip6), sizeof(*tcp2)); return ipf_send_ip(fin, m); } #endif ip->ip_p = IPPROTO_TCP; ip->ip_len = htons(sizeof(struct tcphdr)); ip->ip_src.s_addr = fin->fin_daddr; ip->ip_dst.s_addr = fin->fin_saddr; tcp2->th_sum = in_cksum(m, hlen + sizeof(*tcp2)); ip->ip_len = htons(hlen + sizeof(*tcp2)); return ipf_send_ip(fin, m); } /* * ip_len must be in network byte order when called. */ static int ipf_send_ip(fin, m) fr_info_t *fin; mb_t *m; { fr_info_t fnew; ip_t *ip, *oip; int hlen; ip = mtod(m, ip_t *); bzero((char *)&fnew, sizeof(fnew)); fnew.fin_main_soft = fin->fin_main_soft; IP_V_A(ip, fin->fin_v); switch (fin->fin_v) { case 4 : oip = fin->fin_ip; hlen = sizeof(*oip); fnew.fin_v = 4; fnew.fin_p = ip->ip_p; fnew.fin_plen = ntohs(ip->ip_len); IP_HL_A(ip, sizeof(*oip) >> 2); ip->ip_tos = oip->ip_tos; ip->ip_id = fin->fin_ip->ip_id; ip->ip_off = htons(V_path_mtu_discovery ? 
IP_DF : 0); ip->ip_ttl = V_ip_defttl; ip->ip_sum = 0; break; #ifdef USE_INET6 case 6 : { ip6_t *ip6 = (ip6_t *)ip; ip6->ip6_vfc = 0x60; ip6->ip6_hlim = IPDEFTTL; hlen = sizeof(*ip6); fnew.fin_p = ip6->ip6_nxt; fnew.fin_v = 6; fnew.fin_plen = ntohs(ip6->ip6_plen) + hlen; break; } #endif default : return EINVAL; } #ifdef IPSEC m->m_pkthdr.rcvif = NULL; #endif fnew.fin_ifp = fin->fin_ifp; fnew.fin_flx = FI_NOCKSUM; fnew.fin_m = m; fnew.fin_ip = ip; fnew.fin_mp = &m; fnew.fin_hlen = hlen; fnew.fin_dp = (char *)ip + hlen; (void) ipf_makefrip(hlen, ip, &fnew); return ipf_fastroute(m, &m, &fnew, NULL); } int ipf_send_icmp_err(type, fin, dst) int type; fr_info_t *fin; int dst; { int err, hlen, xtra, iclen, ohlen, avail, code; struct in_addr dst4; struct icmp *icmp; struct mbuf *m; i6addr_t dst6; void *ifp; #ifdef USE_INET6 ip6_t *ip6; #endif ip_t *ip, *ip2; if ((type < 0) || (type >= ICMP_MAXTYPE)) return -1; code = fin->fin_icode; #ifdef USE_INET6 /* See NetBSD ip_fil_netbsd.c r1.4: */ if ((code < 0) || (code >= sizeof(icmptoicmp6unreach)/sizeof(int))) return -1; #endif if (ipf_checkl4sum(fin) == -1) return -1; #ifdef MGETHDR MGETHDR(m, M_NOWAIT, MT_HEADER); #else MGET(m, M_NOWAIT, MT_HEADER); #endif if (m == NULL) return -1; avail = MHLEN; xtra = 0; hlen = 0; ohlen = 0; dst4.s_addr = 0; ifp = fin->fin_ifp; if (fin->fin_v == 4) { if ((fin->fin_p == IPPROTO_ICMP) && !(fin->fin_flx & FI_SHORT)) switch (ntohs(fin->fin_data[0]) >> 8) { case ICMP_ECHO : case ICMP_TSTAMP : case ICMP_IREQ : case ICMP_MASKREQ : break; default : FREE_MB_T(m); return 0; } if (dst == 0) { if (ipf_ifpaddr(&V_ipfmain, 4, FRI_NORMAL, ifp, &dst6, NULL) == -1) { FREE_MB_T(m); return -1; } dst4 = dst6.in4; } else dst4.s_addr = fin->fin_daddr; hlen = sizeof(ip_t); ohlen = fin->fin_hlen; iclen = hlen + offsetof(struct icmp, icmp_ip) + ohlen; if (fin->fin_hlen < fin->fin_plen) xtra = MIN(fin->fin_dlen, 8); else xtra = 0; } #ifdef USE_INET6 else if (fin->fin_v == 6) { hlen = sizeof(ip6_t); ohlen = sizeof(ip6_t); iclen = hlen + offsetof(struct icmp, icmp_ip) + ohlen; type = icmptoicmp6types[type]; if (type == ICMP6_DST_UNREACH) code = icmptoicmp6unreach[code]; if (iclen + max_linkhdr + fin->fin_plen > avail) { if (!(MCLGET(m, M_NOWAIT))) { FREE_MB_T(m); return -1; } avail = MCLBYTES; } xtra = MIN(fin->fin_plen, avail - iclen - max_linkhdr); xtra = MIN(xtra, IPV6_MMTU - iclen); if (dst == 0) { if (ipf_ifpaddr(&V_ipfmain, 6, FRI_NORMAL, ifp, &dst6, NULL) == -1) { FREE_MB_T(m); return -1; } } else dst6 = fin->fin_dst6; } #endif else { FREE_MB_T(m); return -1; } avail -= (max_linkhdr + iclen); if (avail < 0) { FREE_MB_T(m); return -1; } if (xtra > avail) xtra = avail; iclen += xtra; m->m_data += max_linkhdr; m->m_pkthdr.rcvif = (struct ifnet *)0; m->m_pkthdr.len = iclen; m->m_len = iclen; ip = mtod(m, ip_t *); icmp = (struct icmp *)((char *)ip + hlen); ip2 = (ip_t *)&icmp->icmp_ip; icmp->icmp_type = type; icmp->icmp_code = fin->fin_icode; icmp->icmp_cksum = 0; #ifdef icmp_nextmtu if (type == ICMP_UNREACH && fin->fin_icode == ICMP_UNREACH_NEEDFRAG) { if (fin->fin_mtu != 0) { icmp->icmp_nextmtu = htons(fin->fin_mtu); } else if (ifp != NULL) { icmp->icmp_nextmtu = htons(GETIFMTU_4(ifp)); } else { /* make up a number... 
*/ icmp->icmp_nextmtu = htons(fin->fin_plen - 20); } } #endif bcopy((char *)fin->fin_ip, (char *)ip2, ohlen); #ifdef USE_INET6 ip6 = (ip6_t *)ip; if (fin->fin_v == 6) { ip6->ip6_flow = ((ip6_t *)fin->fin_ip)->ip6_flow; ip6->ip6_plen = htons(iclen - hlen); ip6->ip6_nxt = IPPROTO_ICMPV6; ip6->ip6_hlim = 0; ip6->ip6_src = dst6.in6; ip6->ip6_dst = fin->fin_src6.in6; if (xtra > 0) bcopy((char *)fin->fin_ip + ohlen, (char *)&icmp->icmp_ip + ohlen, xtra); icmp->icmp_cksum = in6_cksum(m, IPPROTO_ICMPV6, sizeof(*ip6), iclen - hlen); } else #endif { ip->ip_p = IPPROTO_ICMP; ip->ip_src.s_addr = dst4.s_addr; ip->ip_dst.s_addr = fin->fin_saddr; if (xtra > 0) bcopy((char *)fin->fin_ip + ohlen, (char *)&icmp->icmp_ip + ohlen, xtra); icmp->icmp_cksum = ipf_cksum((u_short *)icmp, sizeof(*icmp) + 8); ip->ip_len = htons(iclen); ip->ip_p = IPPROTO_ICMP; } err = ipf_send_ip(fin, m); return err; } /* * m0 - pointer to mbuf where the IP packet starts * mpp - pointer to the mbuf pointer that is the start of the mbuf chain */ int ipf_fastroute(m0, mpp, fin, fdp) mb_t *m0, **mpp; fr_info_t *fin; frdest_t *fdp; { register struct ip *ip, *mhip; register struct mbuf *m = *mpp; int len, off, error = 0, hlen, code; struct ifnet *ifp, *sifp; struct sockaddr_in dst; struct nhop4_extended nh4; int has_nhop = 0; u_long fibnum = 0; u_short ip_off; frdest_t node; frentry_t *fr; #ifdef M_WRITABLE /* * HOT FIX/KLUDGE: * * If the mbuf we're about to send is not writable (because of * a cluster reference, for example) we'll need to make a copy * of it since this routine modifies the contents. * * If you have non-crappy network hardware that can transmit data * from the mbuf, rather than making a copy, this is gonna be a * problem. */ if (M_WRITABLE(m) == 0) { m0 = m_dup(m, M_NOWAIT); if (m0 != NULL) { FREE_MB_T(m); m = m0; *mpp = m; } else { error = ENOBUFS; FREE_MB_T(m); goto done; } } #endif #ifdef USE_INET6 if (fin->fin_v == 6) { /* * currently "to " and "to :ip#" are not supported * for IPv6 */ return ip6_output(m, NULL, NULL, 0, NULL, NULL, NULL); } #endif hlen = fin->fin_hlen; ip = mtod(m0, struct ip *); ifp = NULL; /* * Route packet. */ bzero(&dst, sizeof (dst)); dst.sin_family = AF_INET; dst.sin_addr = ip->ip_dst; dst.sin_len = sizeof(dst); fr = fin->fin_fr; if ((fr != NULL) && !(fr->fr_flags & FR_KEEPSTATE) && (fdp != NULL) && (fdp->fd_type == FRD_DSTLIST)) { if (ipf_dstlist_select_node(fin, fdp->fd_ptr, NULL, &node) == 0) fdp = &node; } if (fdp != NULL) ifp = fdp->fd_ptr; else ifp = fin->fin_ifp; if ((ifp == NULL) && ((fr == NULL) || !(fr->fr_flags & FR_FASTROUTE))) { error = -2; goto bad; } if ((fdp != NULL) && (fdp->fd_ip.s_addr != 0)) dst.sin_addr = fdp->fd_ip; fibnum = M_GETFIB(m0); if (fib4_lookup_nh_ext(fibnum, dst.sin_addr, NHR_REF, 0, &nh4) != 0) { if (in_localaddr(ip->ip_dst)) error = EHOSTUNREACH; else error = ENETUNREACH; goto bad; } has_nhop = 1; if (ifp == NULL) ifp = nh4.nh_ifp; if (nh4.nh_flags & NHF_GATEWAY) dst.sin_addr = nh4.nh_addr; /* * For input packets which are being "fastrouted", they won't * go back through output filtering and miss their chance to get * NAT'd and counted. Duplicated packets aren't considered to be * part of the normal packet stream, so do not NAT them or pass * them through stateful checking, etc. 
*/ if ((fdp != &fr->fr_dif) && (fin->fin_out == 0)) { sifp = fin->fin_ifp; fin->fin_ifp = ifp; fin->fin_out = 1; (void) ipf_acctpkt(fin, NULL); fin->fin_fr = NULL; if (!fr || !(fr->fr_flags & FR_RETMASK)) { u_32_t pass; (void) ipf_state_check(fin, &pass); } switch (ipf_nat_checkout(fin, NULL)) { case 0 : break; case 1 : ip->ip_sum = 0; break; case -1 : error = -1; goto bad; break; } fin->fin_ifp = sifp; fin->fin_out = 0; } else ip->ip_sum = 0; /* * If small enough for interface, can just send directly. */ if (ntohs(ip->ip_len) <= ifp->if_mtu) { if (!ip->ip_sum) ip->ip_sum = in_cksum(m, hlen); error = (*ifp->if_output)(ifp, m, (struct sockaddr *)&dst, NULL ); goto done; } /* * Too large for interface; fragment if possible. * Must be able to put at least 8 bytes per fragment. */ ip_off = ntohs(ip->ip_off); if (ip_off & IP_DF) { error = EMSGSIZE; goto bad; } len = (ifp->if_mtu - hlen) &~ 7; if (len < 8) { error = EMSGSIZE; goto bad; } { int mhlen, firstlen = len; struct mbuf **mnext = &m->m_act; /* * Loop through length of segment after first fragment, * make new header and copy data of each part and link onto chain. */ m0 = m; mhlen = sizeof (struct ip); for (off = hlen + len; off < ntohs(ip->ip_len); off += len) { #ifdef MGETHDR MGETHDR(m, M_NOWAIT, MT_HEADER); #else MGET(m, M_NOWAIT, MT_HEADER); #endif if (m == NULL) { m = m0; error = ENOBUFS; goto bad; } m->m_data += max_linkhdr; mhip = mtod(m, struct ip *); bcopy((char *)ip, (char *)mhip, sizeof(*ip)); if (hlen > sizeof (struct ip)) { mhlen = ip_optcopy(ip, mhip) + sizeof (struct ip); IP_HL_A(mhip, mhlen >> 2); } m->m_len = mhlen; mhip->ip_off = ((off - hlen) >> 3) + ip_off; if (off + len >= ntohs(ip->ip_len)) len = ntohs(ip->ip_len) - off; else mhip->ip_off |= IP_MF; mhip->ip_len = htons((u_short)(len + mhlen)); *mnext = m; m->m_next = m_copym(m0, off, len, M_NOWAIT); if (m->m_next == 0) { error = ENOBUFS; /* ??? */ goto sendorfree; } m->m_pkthdr.len = mhlen + len; m->m_pkthdr.rcvif = NULL; mhip->ip_off = htons((u_short)mhip->ip_off); mhip->ip_sum = 0; mhip->ip_sum = in_cksum(m, mhlen); mnext = &m->m_act; } /* * Update first fragment by trimming what's been copied out * and updating header, then send each fragment (in order). 
*/ m_adj(m0, hlen + firstlen - ip->ip_len); ip->ip_len = htons((u_short)(hlen + firstlen)); ip->ip_off = htons((u_short)IP_MF); ip->ip_sum = 0; ip->ip_sum = in_cksum(m0, hlen); sendorfree: for (m = m0; m; m = m0) { m0 = m->m_act; m->m_act = 0; if (error == 0) error = (*ifp->if_output)(ifp, m, (struct sockaddr *)&dst, NULL ); else FREE_MB_T(m); } } done: if (!error) V_ipfmain.ipf_frouteok[0]++; else V_ipfmain.ipf_frouteok[1]++; if (has_nhop) fib4_free_nh_ext(fibnum, &nh4); return 0; bad: if (error == EMSGSIZE) { sifp = fin->fin_ifp; code = fin->fin_icode; fin->fin_icode = ICMP_UNREACH_NEEDFRAG; fin->fin_ifp = ifp; (void) ipf_send_icmp_err(ICMP_UNREACH, fin, 1); fin->fin_ifp = sifp; fin->fin_icode = code; } FREE_MB_T(m); goto done; } int ipf_verifysrc(fin) fr_info_t *fin; { struct nhop4_basic nh4; if (fib4_lookup_nh_basic(0, fin->fin_src, 0, 0, &nh4) != 0) return (0); return (fin->fin_ifp == nh4.nh_ifp); } /* * return the first IP Address associated with an interface */ int ipf_ifpaddr(softc, v, atype, ifptr, inp, inpmask) ipf_main_softc_t *softc; int v, atype; void *ifptr; i6addr_t *inp, *inpmask; { #ifdef USE_INET6 struct in6_addr *inp6 = NULL; #endif struct sockaddr *sock, *mask; struct sockaddr_in *sin; struct ifaddr *ifa; struct ifnet *ifp; if ((ifptr == NULL) || (ifptr == (void *)-1)) return -1; sin = NULL; ifp = ifptr; if (v == 4) inp->in4.s_addr = 0; #ifdef USE_INET6 else if (v == 6) bzero((char *)inp, sizeof(*inp)); #endif ifa = CK_STAILQ_FIRST(&ifp->if_addrhead); sock = ifa->ifa_addr; while (sock != NULL && ifa != NULL) { sin = (struct sockaddr_in *)sock; if ((v == 4) && (sin->sin_family == AF_INET)) break; #ifdef USE_INET6 if ((v == 6) && (sin->sin_family == AF_INET6)) { inp6 = &((struct sockaddr_in6 *)sin)->sin6_addr; if (!IN6_IS_ADDR_LINKLOCAL(inp6) && !IN6_IS_ADDR_LOOPBACK(inp6)) break; } #endif ifa = CK_STAILQ_NEXT(ifa, ifa_link); if (ifa != NULL) sock = ifa->ifa_addr; } if (ifa == NULL || sin == NULL) return -1; mask = ifa->ifa_netmask; if (atype == FRI_BROADCAST) sock = ifa->ifa_broadaddr; else if (atype == FRI_PEERADDR) sock = ifa->ifa_dstaddr; if (sock == NULL) return -1; #ifdef USE_INET6 if (v == 6) { return ipf_ifpfillv6addr(atype, (struct sockaddr_in6 *)sock, (struct sockaddr_in6 *)mask, inp, inpmask); } #endif return ipf_ifpfillv4addr(atype, (struct sockaddr_in *)sock, (struct sockaddr_in *)mask, &inp->in4, &inpmask->in4); } u_32_t ipf_newisn(fin) fr_info_t *fin; { u_32_t newiss; newiss = arc4random(); return newiss; } INLINE int ipf_checkv4sum(fin) fr_info_t *fin; { #ifdef CSUM_DATA_VALID int manual = 0; u_short sum; ip_t *ip; mb_t *m; if ((fin->fin_flx & FI_NOCKSUM) != 0) return 0; if ((fin->fin_flx & FI_SHORT) != 0) return 1; if (fin->fin_cksum != FI_CK_NEEDED) return (fin->fin_cksum > FI_CK_NEEDED) ? 
0 : -1; m = fin->fin_m; if (m == NULL) { manual = 1; goto skipauto; } ip = fin->fin_ip; if ((m->m_pkthdr.csum_flags & (CSUM_IP_CHECKED|CSUM_IP_VALID)) == CSUM_IP_CHECKED) { fin->fin_cksum = FI_CK_BAD; fin->fin_flx |= FI_BAD; DT2(ipf_fi_bad_checkv4sum_csum_ip_checked, fr_info_t *, fin, u_int, m->m_pkthdr.csum_flags & (CSUM_IP_CHECKED|CSUM_IP_VALID)); return -1; } if (m->m_pkthdr.csum_flags & CSUM_DATA_VALID) { /* Depending on the driver, UDP may have zero checksum */ if (fin->fin_p == IPPROTO_UDP && (fin->fin_flx & (FI_FRAG|FI_SHORT|FI_BAD)) == 0) { udphdr_t *udp = fin->fin_dp; if (udp->uh_sum == 0) { /* * we're good no matter what the hardware * checksum flags and csum_data say (handling * of csum_data for zero UDP checksum is not * consistent across all drivers) */ fin->fin_cksum = 1; return 0; } } if (m->m_pkthdr.csum_flags & CSUM_PSEUDO_HDR) sum = m->m_pkthdr.csum_data; else sum = in_pseudo(ip->ip_src.s_addr, ip->ip_dst.s_addr, htonl(m->m_pkthdr.csum_data + fin->fin_dlen + fin->fin_p)); sum ^= 0xffff; if (sum != 0) { fin->fin_cksum = FI_CK_BAD; fin->fin_flx |= FI_BAD; DT2(ipf_fi_bad_checkv4sum_sum, fr_info_t *, fin, u_int, sum); } else { fin->fin_cksum = FI_CK_SUMOK; return 0; } } else { if (m->m_pkthdr.csum_flags == CSUM_DELAY_DATA) { fin->fin_cksum = FI_CK_L4FULL; return 0; } else if (m->m_pkthdr.csum_flags == CSUM_TCP || m->m_pkthdr.csum_flags == CSUM_UDP) { fin->fin_cksum = FI_CK_L4PART; return 0; } else if (m->m_pkthdr.csum_flags == CSUM_IP) { fin->fin_cksum = FI_CK_L4PART; return 0; } else { manual = 1; } } skipauto: if (manual != 0) { if (ipf_checkl4sum(fin) == -1) { fin->fin_flx |= FI_BAD; DT2(ipf_fi_bad_checkv4sum_manual, fr_info_t *, fin, u_int, manual); return -1; } } #else if (ipf_checkl4sum(fin) == -1) { fin->fin_flx |= FI_BAD; DT2(ipf_fi_bad_checkv4sum_checkl4sum, fr_info_t *, fin, u_int, -1); return -1; } #endif return 0; } #ifdef USE_INET6 INLINE int ipf_checkv6sum(fin) fr_info_t *fin; { if ((fin->fin_flx & FI_NOCKSUM) != 0) { DT(ipf_checkv6sum_fi_nocksum); return 0; } if ((fin->fin_flx & FI_SHORT) != 0) { DT(ipf_checkv6sum_fi_short); return 1; } if (fin->fin_cksum != FI_CK_NEEDED) { DT(ipf_checkv6sum_fi_ck_needed); return (fin->fin_cksum > FI_CK_NEEDED) ? 0 : -1; } if (ipf_checkl4sum(fin) == -1) { fin->fin_flx |= FI_BAD; DT2(ipf_fi_bad_checkv6sum_checkl4sum, fr_info_t *, fin, u_int, -1); return -1; } return 0; } #endif /* USE_INET6 */ size_t mbufchainlen(m0) struct mbuf *m0; - { +{ size_t len; if ((m0->m_flags & M_PKTHDR) != 0) { len = m0->m_pkthdr.len; } else { struct mbuf *m; for (m = m0, len = 0; m != NULL; m = m->m_next) len += m->m_len; } return len; } /* ------------------------------------------------------------------------ */ /* Function: ipf_pullup */ /* Returns: NULL == pullup failed, else pointer to protocol header */ /* Parameters: xmin(I)- pointer to buffer where data packet starts */ /* fin(I) - pointer to packet information */ /* len(I) - number of bytes to pullup */ /* */ /* Attempt to move at least len bytes (from the start of the buffer) into a */ /* single buffer for ease of access. Operating system native functions are */ /* used to manage buffers - if necessary. If the entire packet ends up in */ /* a single buffer, set the FI_COALESCE flag even though ipf_coalesce() has */ /* not been called. Both fin_ip and fin_dp are updated before exiting _IF_ */ /* and ONLY if the pullup succeeds. */ /* */ /* We assume that 'xmin' is a pointer to a buffer that is part of the chain */ /* of buffers that starts at *fin->fin_mp. 
*/ /* ------------------------------------------------------------------------ */ void * ipf_pullup(xmin, fin, len) mb_t *xmin; fr_info_t *fin; int len; { int dpoff, ipoff; mb_t *m = xmin; char *ip; if (m == NULL) return NULL; ip = (char *)fin->fin_ip; if ((fin->fin_flx & FI_COALESCE) != 0) return ip; ipoff = fin->fin_ipoff; if (fin->fin_dp != NULL) dpoff = (char *)fin->fin_dp - (char *)ip; else dpoff = 0; if (M_LEN(m) < len) { mb_t *n = *fin->fin_mp; /* * Assume that M_PKTHDR is set and just work with what is left * rather than check.. * Should not make any real difference, anyway. */ if (m != n) { /* * Record the mbuf that points to the mbuf that we're * about to go to work on so that we can update the * m_next appropriately later. */ for (; n->m_next != m; n = n->m_next) ; } else { n = NULL; } #ifdef MHLEN if (len > MHLEN) #else if (len > MLEN) #endif { #ifdef HAVE_M_PULLDOWN if (m_pulldown(m, 0, len, NULL) == NULL) m = NULL; #else FREE_MB_T(*fin->fin_mp); m = NULL; n = NULL; #endif } else { m = m_pullup(m, len); } if (n != NULL) n->m_next = m; if (m == NULL) { /* * When n is non-NULL, it indicates that m pointed to * a sub-chain (tail) of the mbuf and that the head * of this chain has not yet been free'd. */ if (n != NULL) { FREE_MB_T(*fin->fin_mp); } *fin->fin_mp = NULL; fin->fin_m = NULL; return NULL; } if (n == NULL) *fin->fin_mp = m; while (M_LEN(m) == 0) { m = m->m_next; } fin->fin_m = m; ip = MTOD(m, char *) + ipoff; fin->fin_ip = (ip_t *)ip; if (fin->fin_dp != NULL) fin->fin_dp = (char *)fin->fin_ip + dpoff; if (fin->fin_fraghdr != NULL) fin->fin_fraghdr = (char *)ip + ((char *)fin->fin_fraghdr - (char *)fin->fin_ip); } if (len == fin->fin_plen) fin->fin_flx |= FI_COALESCE; return ip; } int ipf_inject(fin, m) fr_info_t *fin; mb_t *m; { int error = 0; if (fin->fin_out == 0) { netisr_dispatch(NETISR_IP, m); } else { fin->fin_ip->ip_len = ntohs(fin->fin_ip->ip_len); fin->fin_ip->ip_off = ntohs(fin->fin_ip->ip_off); error = ip_output(m, NULL, NULL, IP_FORWARDING, NULL, NULL); } return error; } VNET_DEFINE_STATIC(pfil_hook_t, ipf_inet_hook); VNET_DEFINE_STATIC(pfil_hook_t, ipf_inet6_hook); #define V_ipf_inet_hook VNET(ipf_inet_hook) #define V_ipf_inet6_hook VNET(ipf_inet6_hook) int ipf_pfil_unhook(void) { pfil_remove_hook(V_ipf_inet_hook); #ifdef USE_INET6 pfil_remove_hook(V_ipf_inet6_hook); #endif return (0); } int ipf_pfil_hook(void) { struct pfil_hook_args pha; struct pfil_link_args pla; int error, error6; pha.pa_version = PFIL_VERSION; pha.pa_flags = PFIL_IN | PFIL_OUT; pha.pa_modname = "ipfilter"; pha.pa_rulname = "default-ip4"; pha.pa_func = ipf_check_wrapper; pha.pa_ruleset = NULL; pha.pa_type = PFIL_TYPE_IP4; V_ipf_inet_hook = pfil_add_hook(&pha); #ifdef USE_INET6 pha.pa_rulname = "default-ip6"; pha.pa_func = ipf_check_wrapper6; pha.pa_type = PFIL_TYPE_IP6; V_ipf_inet6_hook = pfil_add_hook(&pha); #endif pla.pa_version = PFIL_VERSION; pla.pa_flags = PFIL_IN | PFIL_OUT | PFIL_HEADPTR | PFIL_HOOKPTR; pla.pa_head = V_inet_pfil_head; pla.pa_hook = V_ipf_inet_hook; error = pfil_link(&pla); error6 = 0; #ifdef USE_INET6 pla.pa_head = V_inet6_pfil_head; pla.pa_hook = V_ipf_inet6_hook; error6 = pfil_link(&pla); #endif if (error || error6) error = ENODEV; else error = 0; return (error); } void ipf_event_reg(void) { ipf_arrivetag = EVENTHANDLER_REGISTER(ifnet_arrival_event, \ ipf_ifevent, NULL, \ EVENTHANDLER_PRI_ANY); ipf_departtag = EVENTHANDLER_REGISTER(ifnet_departure_event, \ ipf_ifevent, NULL, \ EVENTHANDLER_PRI_ANY); #if 0 ipf_clonetag = EVENTHANDLER_REGISTER(if_clone_event, 
ipf_ifevent, \ NULL, EVENTHANDLER_PRI_ANY); #endif } void ipf_event_dereg(void) { if (ipf_arrivetag != NULL) { EVENTHANDLER_DEREGISTER(ifnet_arrival_event, ipf_arrivetag); } if (ipf_departtag != NULL) { EVENTHANDLER_DEREGISTER(ifnet_departure_event, ipf_departtag); } #if 0 if (ipf_clonetag != NULL) { EVENTHANDLER_DEREGISTER(if_clone_event, ipf_clonetag); } #endif } u_32_t ipf_random() { return arc4random(); } u_int ipf_pcksum(fin, hlen, sum) fr_info_t *fin; int hlen; u_int sum; { struct mbuf *m; u_int sum2; int off; m = fin->fin_m; off = (char *)fin->fin_dp - (char *)fin->fin_ip; m->m_data += hlen; m->m_len -= hlen; sum2 = in_cksum(fin->fin_m, fin->fin_plen - off); m->m_len += hlen; m->m_data -= hlen; /* * Both sum and sum2 are partial sums, so combine them together. */ sum += ~sum2 & 0xffff; while (sum > 0xffff) sum = (sum & 0xffff) + (sum >> 16); sum2 = ~sum & 0xffff; return sum2; } Index: user/ngie/bug-237403/sys/contrib/ipfilter =================================================================== --- user/ngie/bug-237403/sys/contrib/ipfilter (revision 346925) +++ user/ngie/bug-237403/sys/contrib/ipfilter (revision 346926) Property changes on: user/ngie/bug-237403/sys/contrib/ipfilter ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/sys/contrib/ipfilter:r346444-346925 Index: user/ngie/bug-237403/sys/dev/acpi_support/acpi_ibm.c =================================================================== --- user/ngie/bug-237403/sys/dev/acpi_support/acpi_ibm.c (revision 346925) +++ user/ngie/bug-237403/sys/dev/acpi_support/acpi_ibm.c (revision 346926) @@ -1,1380 +1,1412 @@ /*- * Copyright (c) 2004 Takanori Watanabe * Copyright (c) 2005 Markus Brueffer * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); /* * Driver for extra ACPI-controlled gadgets found on IBM ThinkPad laptops. * Inspired by the ibm-acpi and tpb projects which implement these features * on Linux. 
* * acpi-ibm: * tpb: */ #include "opt_acpi.h" #include #include #include #include #include #include #include #include "acpi_if.h" #include #include #include #include #include #include #include #define _COMPONENT ACPI_OEM ACPI_MODULE_NAME("IBM") /* Internal methods */ #define ACPI_IBM_METHOD_EVENTS 1 #define ACPI_IBM_METHOD_EVENTMASK 2 #define ACPI_IBM_METHOD_HOTKEY 3 #define ACPI_IBM_METHOD_BRIGHTNESS 4 #define ACPI_IBM_METHOD_VOLUME 5 #define ACPI_IBM_METHOD_MUTE 6 #define ACPI_IBM_METHOD_THINKLIGHT 7 #define ACPI_IBM_METHOD_BLUETOOTH 8 #define ACPI_IBM_METHOD_WLAN 9 #define ACPI_IBM_METHOD_FANSPEED 10 #define ACPI_IBM_METHOD_FANLEVEL 11 #define ACPI_IBM_METHOD_FANSTATUS 12 #define ACPI_IBM_METHOD_THERMAL 13 #define ACPI_IBM_METHOD_HANDLEREVENTS 14 #define ACPI_IBM_METHOD_MIC_LED 15 /* Hotkeys/Buttons */ #define IBM_RTC_HOTKEY1 0x64 #define IBM_RTC_MASK_HOME (1 << 0) #define IBM_RTC_MASK_SEARCH (1 << 1) #define IBM_RTC_MASK_MAIL (1 << 2) #define IBM_RTC_MASK_WLAN (1 << 5) #define IBM_RTC_HOTKEY2 0x65 #define IBM_RTC_MASK_THINKPAD (1 << 3) #define IBM_RTC_MASK_ZOOM (1 << 5) #define IBM_RTC_MASK_VIDEO (1 << 6) #define IBM_RTC_MASK_HIBERNATE (1 << 7) #define IBM_RTC_THINKLIGHT 0x66 #define IBM_RTC_MASK_THINKLIGHT (1 << 4) #define IBM_RTC_SCREENEXPAND 0x67 #define IBM_RTC_MASK_SCREENEXPAND (1 << 5) #define IBM_RTC_BRIGHTNESS 0x6c #define IBM_RTC_MASK_BRIGHTNESS (1 << 5) #define IBM_RTC_VOLUME 0x6e #define IBM_RTC_MASK_VOLUME (1 << 7) /* Embedded Controller registers */ #define IBM_EC_BRIGHTNESS 0x31 #define IBM_EC_MASK_BRI 0x7 #define IBM_EC_VOLUME 0x30 #define IBM_EC_MASK_VOL 0xf #define IBM_EC_MASK_MUTE (1 << 6) #define IBM_EC_FANSTATUS 0x2F #define IBM_EC_MASK_FANLEVEL 0x3f #define IBM_EC_MASK_FANDISENGAGED (1 << 6) #define IBM_EC_MASK_FANSTATUS (1 << 7) #define IBM_EC_FANSPEED 0x84 /* CMOS Commands */ #define IBM_CMOS_VOLUME_DOWN 0 #define IBM_CMOS_VOLUME_UP 1 #define IBM_CMOS_VOLUME_MUTE 2 #define IBM_CMOS_BRIGHTNESS_UP 4 #define IBM_CMOS_BRIGHTNESS_DOWN 5 /* ACPI methods */ #define IBM_NAME_KEYLIGHT "KBLT" #define IBM_NAME_WLAN_BT_GET "GBDC" #define IBM_NAME_WLAN_BT_SET "SBDC" #define IBM_NAME_MASK_BT (1 << 1) #define IBM_NAME_MASK_WLAN (1 << 2) #define IBM_NAME_THERMAL_GET "TMP7" #define IBM_NAME_THERMAL_UPDT "UPDT" #define IBM_NAME_EVENTS_STATUS_GET "DHKC" #define IBM_NAME_EVENTS_MASK_GET "DHKN" #define IBM_NAME_EVENTS_STATUS_SET "MHKC" #define IBM_NAME_EVENTS_MASK_SET "MHKM" #define IBM_NAME_EVENTS_GET "MHKP" #define IBM_NAME_EVENTS_AVAILMASK "MHKA" /* Event Code */ #define IBM_EVENT_LCD_BACKLIGHT 0x03 #define IBM_EVENT_SUSPEND_TO_RAM 0x04 #define IBM_EVENT_BLUETOOTH 0x05 #define IBM_EVENT_SCREEN_EXPAND 0x07 #define IBM_EVENT_SUSPEND_TO_DISK 0x0c #define IBM_EVENT_BRIGHTNESS_UP 0x10 #define IBM_EVENT_BRIGHTNESS_DOWN 0x11 #define IBM_EVENT_THINKLIGHT 0x12 #define IBM_EVENT_ZOOM 0x14 #define IBM_EVENT_VOLUME_UP 0x15 #define IBM_EVENT_VOLUME_DOWN 0x16 #define IBM_EVENT_MUTE 0x17 #define IBM_EVENT_ACCESS_IBM_BUTTON 0x18 #define ABS(x) (((x) < 0)? 
-(x) : (x)) struct acpi_ibm_softc { device_t dev; ACPI_HANDLE handle; /* Embedded controller */ device_t ec_dev; ACPI_HANDLE ec_handle; /* CMOS */ ACPI_HANDLE cmos_handle; /* Fan status */ ACPI_HANDLE fan_handle; int fan_levels; /* Keylight commands and states */ ACPI_HANDLE light_handle; int light_cmd_on; int light_cmd_off; int light_val; int light_get_supported; int light_set_supported; /* led(4) interface */ struct cdev *led_dev; int led_busy; int led_state; /* Mic led handle */ ACPI_HANDLE mic_led_handle; int mic_led_state; int wlan_bt_flags; int thermal_updt_supported; unsigned int events_availmask; unsigned int events_initialmask; int events_mask_supported; int events_enable; unsigned int handler_events; struct sysctl_ctx_list *sysctl_ctx; struct sysctl_oid *sysctl_tree; }; static struct { char *name; int method; char *description; int flag_rdonly; } acpi_ibm_sysctls[] = { { .name = "events", .method = ACPI_IBM_METHOD_EVENTS, .description = "ACPI events enable", }, { .name = "eventmask", .method = ACPI_IBM_METHOD_EVENTMASK, .description = "ACPI eventmask", }, { .name = "hotkey", .method = ACPI_IBM_METHOD_HOTKEY, .description = "Key Status", .flag_rdonly = 1 }, { .name = "lcd_brightness", .method = ACPI_IBM_METHOD_BRIGHTNESS, .description = "LCD Brightness", }, { .name = "volume", .method = ACPI_IBM_METHOD_VOLUME, .description = "Volume", }, { .name = "mute", .method = ACPI_IBM_METHOD_MUTE, .description = "Mute", }, { .name = "thinklight", .method = ACPI_IBM_METHOD_THINKLIGHT, .description = "Thinklight enable", }, { .name = "bluetooth", .method = ACPI_IBM_METHOD_BLUETOOTH, .description = "Bluetooth enable", }, { .name = "wlan", .method = ACPI_IBM_METHOD_WLAN, .description = "WLAN enable", .flag_rdonly = 1 }, { .name = "fan_speed", .method = ACPI_IBM_METHOD_FANSPEED, .description = "Fan speed", .flag_rdonly = 1 }, { .name = "fan_level", .method = ACPI_IBM_METHOD_FANLEVEL, .description = "Fan level", }, { .name = "fan", .method = ACPI_IBM_METHOD_FANSTATUS, .description = "Fan enable", }, { .name = "mic_led", .method = ACPI_IBM_METHOD_MIC_LED, .description = "Mic led", }, { NULL, 0, NULL, 0 } }; /* * Per-model default list of event mask. 
*/ #define ACPI_IBM_HKEY_RFKILL_MASK (1 << 4) #define ACPI_IBM_HKEY_DSWITCH_MASK (1 << 6) #define ACPI_IBM_HKEY_BRIGHTNESS_UP_MASK (1 << 15) #define ACPI_IBM_HKEY_BRIGHTNESS_DOWN_MASK (1 << 16) #define ACPI_IBM_HKEY_SEARCH_MASK (1 << 18) #define ACPI_IBM_HKEY_MICMUTE_MASK (1 << 26) #define ACPI_IBM_HKEY_SETTINGS_MASK (1 << 28) #define ACPI_IBM_HKEY_VIEWOPEN_MASK (1 << 30) #define ACPI_IBM_HKEY_VIEWALL_MASK (1 << 31) struct acpi_ibm_models { const char *maker; const char *product; uint32_t eventmask; } acpi_ibm_models[] = { { "LENOVO", "20BSCTO1WW", ACPI_IBM_HKEY_RFKILL_MASK | ACPI_IBM_HKEY_DSWITCH_MASK | ACPI_IBM_HKEY_BRIGHTNESS_UP_MASK | ACPI_IBM_HKEY_BRIGHTNESS_DOWN_MASK | ACPI_IBM_HKEY_SEARCH_MASK | ACPI_IBM_HKEY_MICMUTE_MASK | ACPI_IBM_HKEY_SETTINGS_MASK | ACPI_IBM_HKEY_VIEWOPEN_MASK | ACPI_IBM_HKEY_VIEWALL_MASK } }; ACPI_SERIAL_DECL(ibm, "ACPI IBM extras"); static int acpi_ibm_probe(device_t dev); static int acpi_ibm_attach(device_t dev); static int acpi_ibm_detach(device_t dev); static int acpi_ibm_resume(device_t dev); static void ibm_led(void *softc, int onoff); static void ibm_led_task(struct acpi_ibm_softc *sc, int pending __unused); static int acpi_ibm_sysctl(SYSCTL_HANDLER_ARGS); static int acpi_ibm_sysctl_init(struct acpi_ibm_softc *sc, int method); static int acpi_ibm_sysctl_get(struct acpi_ibm_softc *sc, int method); static int acpi_ibm_sysctl_set(struct acpi_ibm_softc *sc, int method, int val); static int acpi_ibm_eventmask_set(struct acpi_ibm_softc *sc, int val); static int acpi_ibm_thermal_sysctl(SYSCTL_HANDLER_ARGS); static int acpi_ibm_handlerevents_sysctl(SYSCTL_HANDLER_ARGS); static void acpi_ibm_notify(ACPI_HANDLE h, UINT32 notify, void *context); static int acpi_ibm_brightness_set(struct acpi_ibm_softc *sc, int arg); static int acpi_ibm_bluetooth_set(struct acpi_ibm_softc *sc, int arg); static int acpi_ibm_thinklight_set(struct acpi_ibm_softc *sc, int arg); static int acpi_ibm_volume_set(struct acpi_ibm_softc *sc, int arg); static int acpi_ibm_mute_set(struct acpi_ibm_softc *sc, int arg); static device_method_t acpi_ibm_methods[] = { /* Device interface */ DEVMETHOD(device_probe, acpi_ibm_probe), DEVMETHOD(device_attach, acpi_ibm_attach), DEVMETHOD(device_detach, acpi_ibm_detach), DEVMETHOD(device_resume, acpi_ibm_resume), DEVMETHOD_END }; static driver_t acpi_ibm_driver = { "acpi_ibm", acpi_ibm_methods, sizeof(struct acpi_ibm_softc), }; static devclass_t acpi_ibm_devclass; DRIVER_MODULE(acpi_ibm, acpi, acpi_ibm_driver, acpi_ibm_devclass, 0, 0); MODULE_DEPEND(acpi_ibm, acpi, 1, 1, 1); -static char *ibm_ids[] = {"IBM0068", "LEN0068", NULL}; +static char *ibm_ids[] = {"IBM0068", "LEN0068", "LEN0268", NULL}; static void ibm_led(void *softc, int onoff) { struct acpi_ibm_softc* sc = (struct acpi_ibm_softc*) softc; ACPI_FUNCTION_TRACE((char *)(uintptr_t)__func__); if (sc->led_busy) return; sc->led_busy = 1; sc->led_state = onoff; AcpiOsExecute(OSL_NOTIFY_HANDLER, (void *)ibm_led_task, sc); } static void ibm_led_task(struct acpi_ibm_softc *sc, int pending __unused) { ACPI_FUNCTION_TRACE((char *)(uintptr_t)__func__); ACPI_SERIAL_BEGIN(ibm); acpi_ibm_sysctl_set(sc, ACPI_IBM_METHOD_THINKLIGHT, sc->led_state); ACPI_SERIAL_END(ibm); sc->led_busy = 0; } static int acpi_ibm_mic_led_set (struct acpi_ibm_softc *sc, int arg) { ACPI_OBJECT_LIST input; ACPI_OBJECT params[1]; ACPI_STATUS status; if (arg < 0 || arg > 1) return (EINVAL); if (sc->mic_led_handle) { params[0].Type = ACPI_TYPE_INTEGER; params[0].Integer.Value = 0; /* mic led: 0 off, 2 on */ if (arg == 1) 
params[0].Integer.Value = 2; input.Pointer = params; input.Count = 1; status = AcpiEvaluateObject (sc->handle, "MMTS", &input, NULL); if (ACPI_SUCCESS(status)) sc->mic_led_state = arg; return(status); } return (0); } static int acpi_ibm_probe(device_t dev) { int rv; if (acpi_disabled("ibm") || device_get_unit(dev) != 0) return (ENXIO); rv = ACPI_ID_PROBE(device_get_parent(dev), dev, ibm_ids, NULL); if (rv <= 0) device_set_desc(dev, "IBM ThinkPad ACPI Extras"); return (rv); } static int acpi_ibm_attach(device_t dev) { int i; + int hkey; struct acpi_ibm_softc *sc; char *maker, *product; - devclass_t ec_devclass; + ACPI_OBJECT_LIST input; + ACPI_OBJECT params[1]; + ACPI_OBJECT out_obj; + ACPI_BUFFER result; + devclass_t ec_devclass; ACPI_FUNCTION_TRACE((char *)(uintptr_t) __func__); sc = device_get_softc(dev); sc->dev = dev; sc->handle = acpi_get_handle(dev); /* Look for the first embedded controller */ if (!(ec_devclass = devclass_find ("acpi_ec"))) { if (bootverbose) device_printf(dev, "Couldn't find acpi_ec devclass\n"); return (EINVAL); } if (!(sc->ec_dev = devclass_get_device(ec_devclass, 0))) { if (bootverbose) device_printf(dev, "Couldn't find acpi_ec device\n"); return (EINVAL); } sc->ec_handle = acpi_get_handle(sc->ec_dev); /* Get the sysctl tree */ sc->sysctl_ctx = device_get_sysctl_ctx(dev); sc->sysctl_tree = device_get_sysctl_tree(dev); /* Look for event mask and hook up the nodes */ sc->events_mask_supported = ACPI_SUCCESS(acpi_GetInteger(sc->handle, IBM_NAME_EVENTS_MASK_GET, &sc->events_initialmask)); if (sc->events_mask_supported) { SYSCTL_ADD_UINT(sc->sysctl_ctx, SYSCTL_CHILDREN(sc->sysctl_tree), OID_AUTO, "initialmask", CTLFLAG_RD, &sc->events_initialmask, 0, "Initial eventmask"); - /* The availmask is the bitmask of supported events */ - if (ACPI_FAILURE(acpi_GetInteger(sc->handle, - IBM_NAME_EVENTS_AVAILMASK, &sc->events_availmask))) + if (ACPI_SUCCESS (acpi_GetInteger(sc->handle, "MHKV", &hkey))) { + device_printf(dev, "Firmware version is 0x%X\n", hkey); + switch(hkey >> 8) + { + case 1: + /* The availmask is the bitmask of supported events */ + if (ACPI_FAILURE(acpi_GetInteger(sc->handle, + IBM_NAME_EVENTS_AVAILMASK, &sc->events_availmask))) + sc->events_availmask = 0xffffffff; + break; + + case 2: + result.Length = sizeof(out_obj); + result.Pointer = &out_obj; + params[0].Type = ACPI_TYPE_INTEGER; + params[0].Integer.Value = 1; + input.Pointer = params; + input.Count = 1; + + sc->events_availmask = 0xffffffff; + + if (ACPI_SUCCESS(AcpiEvaluateObject (sc->handle, + IBM_NAME_EVENTS_AVAILMASK, &input, &result))) + sc->events_availmask = out_obj.Integer.Value; + break; + default: + device_printf(dev, "Unknown firmware version 0x%x\n", hkey); + break; + } + } else sc->events_availmask = 0xffffffff; SYSCTL_ADD_UINT(sc->sysctl_ctx, - SYSCTL_CHILDREN(sc->sysctl_tree), OID_AUTO, - "availmask", CTLFLAG_RD, - &sc->events_availmask, 0, "Mask of supported events"); + SYSCTL_CHILDREN(sc->sysctl_tree), OID_AUTO, + "availmask", CTLFLAG_RD, + &sc->events_availmask, 0, "Mask of supported events"); } /* Hook up proc nodes */ for (int i = 0; acpi_ibm_sysctls[i].name != NULL; i++) { if (!acpi_ibm_sysctl_init(sc, acpi_ibm_sysctls[i].method)) continue; if (acpi_ibm_sysctls[i].flag_rdonly != 0) { SYSCTL_ADD_PROC(sc->sysctl_ctx, SYSCTL_CHILDREN(sc->sysctl_tree), OID_AUTO, acpi_ibm_sysctls[i].name, CTLTYPE_INT | CTLFLAG_RD, sc, i, acpi_ibm_sysctl, "I", acpi_ibm_sysctls[i].description); } else { SYSCTL_ADD_PROC(sc->sysctl_ctx, SYSCTL_CHILDREN(sc->sysctl_tree), OID_AUTO, acpi_ibm_sysctls[i].name, 
CTLTYPE_INT | CTLFLAG_RW, sc, i, acpi_ibm_sysctl, "I", acpi_ibm_sysctls[i].description); } } /* Hook up thermal node */ if (acpi_ibm_sysctl_init(sc, ACPI_IBM_METHOD_THERMAL)) { SYSCTL_ADD_PROC(sc->sysctl_ctx, SYSCTL_CHILDREN(sc->sysctl_tree), OID_AUTO, "thermal", CTLTYPE_INT | CTLFLAG_RD, sc, 0, acpi_ibm_thermal_sysctl, "I", "Thermal zones"); } /* Hook up handlerevents node */ if (acpi_ibm_sysctl_init(sc, ACPI_IBM_METHOD_HANDLEREVENTS)) { SYSCTL_ADD_PROC(sc->sysctl_ctx, SYSCTL_CHILDREN(sc->sysctl_tree), OID_AUTO, "handlerevents", CTLTYPE_STRING | CTLFLAG_RW, sc, 0, acpi_ibm_handlerevents_sysctl, "I", "devd(8) events handled by acpi_ibm"); } /* Handle notifies */ AcpiInstallNotifyHandler(sc->handle, ACPI_DEVICE_NOTIFY, acpi_ibm_notify, dev); /* Hook up light to led(4) */ if (sc->light_set_supported) sc->led_dev = led_create_state(ibm_led, sc, "thinklight", (sc->light_val ? 1 : 0)); /* Enable per-model events. */ maker = kern_getenv("smbios.system.maker"); product = kern_getenv("smbios.system.product"); if (maker == NULL || product == NULL) goto nosmbios; for (i = 0; i < nitems(acpi_ibm_models); i++) { if (strcmp(maker, acpi_ibm_models[i].maker) == 0 && strcmp(product, acpi_ibm_models[i].product) == 0) { ACPI_SERIAL_BEGIN(ibm); acpi_ibm_sysctl_set(sc, ACPI_IBM_METHOD_EVENTMASK, acpi_ibm_models[i].eventmask); ACPI_SERIAL_END(ibm); } } nosmbios: freeenv(maker); freeenv(product); /* Enable events by default. */ ACPI_SERIAL_BEGIN(ibm); acpi_ibm_sysctl_set(sc, ACPI_IBM_METHOD_EVENTS, 1); ACPI_SERIAL_END(ibm); return (0); } static int acpi_ibm_detach(device_t dev) { ACPI_FUNCTION_TRACE((char *)(uintptr_t) __func__); struct acpi_ibm_softc *sc = device_get_softc(dev); /* Disable events and restore eventmask */ ACPI_SERIAL_BEGIN(ibm); acpi_ibm_sysctl_set(sc, ACPI_IBM_METHOD_EVENTS, 0); acpi_ibm_sysctl_set(sc, ACPI_IBM_METHOD_EVENTMASK, sc->events_initialmask); ACPI_SERIAL_END(ibm); AcpiRemoveNotifyHandler(sc->handle, ACPI_DEVICE_NOTIFY, acpi_ibm_notify); if (sc->led_dev != NULL) led_destroy(sc->led_dev); return (0); } static int acpi_ibm_resume(device_t dev) { struct acpi_ibm_softc *sc = device_get_softc(dev); ACPI_FUNCTION_TRACE((char *)(uintptr_t) __func__); ACPI_SERIAL_BEGIN(ibm); for (int i = 0; acpi_ibm_sysctls[i].name != NULL; i++) { int val; val = acpi_ibm_sysctl_get(sc, i); if (acpi_ibm_sysctls[i].flag_rdonly != 0) continue; acpi_ibm_sysctl_set(sc, i, val); } ACPI_SERIAL_END(ibm); /* The mic led does not turn back on when sysctl_set is called in the above loop */ acpi_ibm_mic_led_set(sc, sc->mic_led_state); return (0); } static int acpi_ibm_eventmask_set(struct acpi_ibm_softc *sc, int val) { ACPI_OBJECT arg[2]; ACPI_OBJECT_LIST args; ACPI_STATUS status; ACPI_FUNCTION_TRACE((char *)(uintptr_t)__func__); ACPI_SERIAL_ASSERT(ibm); args.Count = 2; args.Pointer = arg; arg[0].Type = ACPI_TYPE_INTEGER; arg[1].Type = ACPI_TYPE_INTEGER; for (int i = 0; i < 32; ++i) { arg[0].Integer.Value = i+1; arg[1].Integer.Value = (((1 << i) & val) != 0); status = AcpiEvaluateObject(sc->handle, IBM_NAME_EVENTS_MASK_SET, &args, NULL); if (ACPI_FAILURE(status)) return (status); } return (0); } static int acpi_ibm_sysctl(SYSCTL_HANDLER_ARGS) { struct acpi_ibm_softc *sc; int arg; int error = 0; int function; int method; ACPI_FUNCTION_TRACE((char *)(uintptr_t)__func__); sc = (struct acpi_ibm_softc *)oidp->oid_arg1; function = oidp->oid_arg2; method = acpi_ibm_sysctls[function].method; ACPI_SERIAL_BEGIN(ibm); arg = acpi_ibm_sysctl_get(sc, method); error = sysctl_handle_int(oidp, &arg, 0, req); /* Sanity check */ if (error 
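acpi_ibm_eventmask_set() above programs the firmware one event at a time: each of the 32 mask bits becomes an (event number, enable) pair, with event numbers counted from 1. The sketch below reproduces just that expansion in plain C; the helper name and the sample mask are made up.

#include <stdint.h>
#include <stdio.h>

/* Expand a 32-bit event mask the same way acpi_ibm_eventmask_set() does. */
static void
expand_eventmask(uint32_t mask)
{
	for (int i = 0; i < 32; i++)
		printf("event %2d -> %s\n", i + 1,
		    ((mask >> i) & 1) ? "enable" : "disable");
}

int
main(void)
{
	expand_eventmask(0x00000005);	/* example mask: events 1 and 3 */
	return (0);
}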
!= 0 || req->newptr == NULL) goto out; /* Update */ error = acpi_ibm_sysctl_set(sc, method, arg); out: ACPI_SERIAL_END(ibm); return (error); } static int acpi_ibm_sysctl_get(struct acpi_ibm_softc *sc, int method) { UINT64 val_ec; int val = 0, key; ACPI_FUNCTION_TRACE((char *)(uintptr_t)__func__); ACPI_SERIAL_ASSERT(ibm); switch (method) { case ACPI_IBM_METHOD_EVENTS: acpi_GetInteger(sc->handle, IBM_NAME_EVENTS_STATUS_GET, &val); break; case ACPI_IBM_METHOD_EVENTMASK: if (sc->events_mask_supported) acpi_GetInteger(sc->handle, IBM_NAME_EVENTS_MASK_GET, &val); break; case ACPI_IBM_METHOD_HOTKEY: /* * Construct the hotkey as a bitmask as illustrated below. * Note that whenever a key was pressed, the respecting bit * toggles and nothing else changes. * +--+--+-+-+-+-+-+-+-+-+-+-+ * |11|10|9|8|7|6|5|4|3|2|1|0| * +--+--+-+-+-+-+-+-+-+-+-+-+ * | | | | | | | | | | | | * | | | | | | | | | | | +- Home Button * | | | | | | | | | | +--- Search Button * | | | | | | | | | +----- Mail Button * | | | | | | | | +------- Thinkpad Button * | | | | | | | +--------- Zoom (Fn + Space) * | | | | | | +----------- WLAN Button * | | | | | +------------- Video Button * | | | | +--------------- Hibernate Button * | | | +----------------- Thinklight Button * | | +------------------- Screen expand (Fn + F8) * | +--------------------- Brightness * +------------------------ Volume/Mute */ key = rtcin(IBM_RTC_HOTKEY1); val = (IBM_RTC_MASK_HOME | IBM_RTC_MASK_SEARCH | IBM_RTC_MASK_MAIL | IBM_RTC_MASK_WLAN) & key; key = rtcin(IBM_RTC_HOTKEY2); val |= (IBM_RTC_MASK_THINKPAD | IBM_RTC_MASK_VIDEO | IBM_RTC_MASK_HIBERNATE) & key; val |= (IBM_RTC_MASK_ZOOM & key) >> 1; key = rtcin(IBM_RTC_THINKLIGHT); val |= (IBM_RTC_MASK_THINKLIGHT & key) << 4; key = rtcin(IBM_RTC_SCREENEXPAND); val |= (IBM_RTC_MASK_THINKLIGHT & key) << 4; key = rtcin(IBM_RTC_BRIGHTNESS); val |= (IBM_RTC_MASK_BRIGHTNESS & key) << 5; key = rtcin(IBM_RTC_VOLUME); val |= (IBM_RTC_MASK_VOLUME & key) << 4; break; case ACPI_IBM_METHOD_BRIGHTNESS: ACPI_EC_READ(sc->ec_dev, IBM_EC_BRIGHTNESS, &val_ec, 1); val = val_ec & IBM_EC_MASK_BRI; break; case ACPI_IBM_METHOD_VOLUME: ACPI_EC_READ(sc->ec_dev, IBM_EC_VOLUME, &val_ec, 1); val = val_ec & IBM_EC_MASK_VOL; break; case ACPI_IBM_METHOD_MUTE: ACPI_EC_READ(sc->ec_dev, IBM_EC_VOLUME, &val_ec, 1); val = ((val_ec & IBM_EC_MASK_MUTE) == IBM_EC_MASK_MUTE); break; case ACPI_IBM_METHOD_THINKLIGHT: if (sc->light_get_supported) acpi_GetInteger(sc->ec_handle, IBM_NAME_KEYLIGHT, &val); else val = sc->light_val; break; case ACPI_IBM_METHOD_BLUETOOTH: acpi_GetInteger(sc->handle, IBM_NAME_WLAN_BT_GET, &val); sc->wlan_bt_flags = val; val = ((val & IBM_NAME_MASK_BT) != 0); break; case ACPI_IBM_METHOD_WLAN: acpi_GetInteger(sc->handle, IBM_NAME_WLAN_BT_GET, &val); sc->wlan_bt_flags = val; val = ((val & IBM_NAME_MASK_WLAN) != 0); break; case ACPI_IBM_METHOD_FANSPEED: if (sc->fan_handle) { if(ACPI_FAILURE(acpi_GetInteger(sc->fan_handle, NULL, &val))) val = -1; } else { ACPI_EC_READ(sc->ec_dev, IBM_EC_FANSPEED, &val_ec, 2); val = val_ec; } break; case ACPI_IBM_METHOD_FANLEVEL: /* * The IBM_EC_FANSTATUS register works as follows: * Bit 0-5 indicate the level at which the fan operates. Only * values between 0 and 7 have an effect. 
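The comment above documents the synthetic hotkey word assembled from several RTC/CMOS bytes, and it stresses that each bit toggles on a key press rather than reporting current state. The standalone sketch below shows how a consumer would use that property; the HK_* macros and the sample values are illustrative, not driver constants.

#include <stdint.h>
#include <stdio.h>

/* Bit positions follow the diagram in the driver comment above. */
#define	HK_HOME		(1u << 0)
#define	HK_THINKPAD	(1u << 3)
#define	HK_BRIGHTNESS	(1u << 10)

int
main(void)
{
	/* Made-up samples: compare against the previous reading, since
	 * a key press only toggles its bit. */
	uint32_t prev = 0x000, cur = 0x408;
	uint32_t changed = prev ^ cur;

	if (changed & HK_THINKPAD)
		printf("ThinkPad button toggled\n");
	if (changed & HK_BRIGHTNESS)
		printf("Brightness key toggled\n");
	return (0);
}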
Everything * above 7 is treated the same as level 7 * Bit 6 overrides the fan speed limit if set to 1 * Bit 7 indicates at which mode the fan operates: * manual (0) or automatic (1) */ if (!sc->fan_handle) { ACPI_EC_READ(sc->ec_dev, IBM_EC_FANSTATUS, &val_ec, 1); val = val_ec & IBM_EC_MASK_FANLEVEL; } break; case ACPI_IBM_METHOD_FANSTATUS: if (!sc->fan_handle) { ACPI_EC_READ(sc->ec_dev, IBM_EC_FANSTATUS, &val_ec, 1); val = (val_ec & IBM_EC_MASK_FANSTATUS) == IBM_EC_MASK_FANSTATUS; } else val = -1; break; case ACPI_IBM_METHOD_MIC_LED: if (sc->mic_led_handle) return sc->mic_led_state; else val = -1; break; } return (val); } static int acpi_ibm_sysctl_set(struct acpi_ibm_softc *sc, int method, int arg) { int val; UINT64 val_ec; ACPI_STATUS status; ACPI_FUNCTION_TRACE((char *)(uintptr_t)__func__); ACPI_SERIAL_ASSERT(ibm); switch (method) { case ACPI_IBM_METHOD_EVENTS: if (arg < 0 || arg > 1) return (EINVAL); status = acpi_SetInteger(sc->handle, IBM_NAME_EVENTS_STATUS_SET, arg); if (ACPI_FAILURE(status)) return (status); if (sc->events_mask_supported) return acpi_ibm_eventmask_set(sc, sc->events_availmask); break; case ACPI_IBM_METHOD_EVENTMASK: if (sc->events_mask_supported) return acpi_ibm_eventmask_set(sc, arg); break; case ACPI_IBM_METHOD_BRIGHTNESS: return acpi_ibm_brightness_set(sc, arg); break; case ACPI_IBM_METHOD_VOLUME: return acpi_ibm_volume_set(sc, arg); break; case ACPI_IBM_METHOD_MUTE: return acpi_ibm_mute_set(sc, arg); break; case ACPI_IBM_METHOD_MIC_LED: return acpi_ibm_mic_led_set (sc, arg); break; case ACPI_IBM_METHOD_THINKLIGHT: return acpi_ibm_thinklight_set(sc, arg); break; case ACPI_IBM_METHOD_BLUETOOTH: return acpi_ibm_bluetooth_set(sc, arg); break; case ACPI_IBM_METHOD_FANLEVEL: if (arg < 0 || arg > 7) return (EINVAL); if (!sc->fan_handle) { /* Read the current fanstatus */ ACPI_EC_READ(sc->ec_dev, IBM_EC_FANSTATUS, &val_ec, 1); val = val_ec & (~IBM_EC_MASK_FANLEVEL); return ACPI_EC_WRITE(sc->ec_dev, IBM_EC_FANSTATUS, val | arg, 1); } break; case ACPI_IBM_METHOD_FANSTATUS: if (arg < 0 || arg > 1) return (EINVAL); if (!sc->fan_handle) { /* Read the current fanstatus */ ACPI_EC_READ(sc->ec_dev, IBM_EC_FANSTATUS, &val_ec, 1); return ACPI_EC_WRITE(sc->ec_dev, IBM_EC_FANSTATUS, (arg == 1) ? 
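The fan status description above (bits 0-5 level, bit 6 override, bit 7 manual/automatic) is what the FANLEVEL setter relies on when it read-modify-writes the EC register. A minimal sketch of that update follows, with stand-in mask values rather than the driver's IBM_EC_MASK_* constants.

#include <stdint.h>
#include <stdio.h>

#define	FAN_LEVEL_MASK	0x3f	/* bits 0-5: level, 0-7 meaningful */
#define	FAN_OVERRIDE	0x40	/* bit 6: speed-limit override */
#define	FAN_AUTO	0x80	/* bit 7: 0 = manual, 1 = automatic */

/* Same read-modify-write shape as the ACPI_IBM_METHOD_FANLEVEL case. */
static uint8_t
fan_set_level(uint8_t reg, uint8_t level)
{
	return ((reg & (uint8_t)~FAN_LEVEL_MASK) | (level & FAN_LEVEL_MASK));
}

int
main(void)
{
	uint8_t reg = FAN_AUTO | 0x02;	/* example: automatic mode, level 2 */

	reg = fan_set_level(reg, 7);
	printf("mode=%s level=%u\n",
	    (reg & FAN_AUTO) ? "auto" : "manual", reg & FAN_LEVEL_MASK);
	return (0);
}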
(val_ec | IBM_EC_MASK_FANSTATUS) : (val_ec & (~IBM_EC_MASK_FANSTATUS)), 1); } break; } return (0); } static int acpi_ibm_sysctl_init(struct acpi_ibm_softc *sc, int method) { int dummy; ACPI_OBJECT_TYPE cmos_t; ACPI_HANDLE ledb_handle; switch (method) { case ACPI_IBM_METHOD_EVENTS: return (TRUE); case ACPI_IBM_METHOD_EVENTMASK: return (sc->events_mask_supported); case ACPI_IBM_METHOD_HOTKEY: case ACPI_IBM_METHOD_BRIGHTNESS: case ACPI_IBM_METHOD_VOLUME: case ACPI_IBM_METHOD_MUTE: /* EC is required here, which was already checked before */ return (TRUE); case ACPI_IBM_METHOD_MIC_LED: if (ACPI_SUCCESS(AcpiGetHandle(sc->handle, "MMTS", &sc->mic_led_handle))) { /* Turn off mic led by default */ acpi_ibm_mic_led_set (sc, 0); return(TRUE); } else sc->mic_led_handle = NULL; return (FALSE); case ACPI_IBM_METHOD_THINKLIGHT: sc->cmos_handle = NULL; sc->light_get_supported = ACPI_SUCCESS(acpi_GetInteger( sc->ec_handle, IBM_NAME_KEYLIGHT, &sc->light_val)); if ((ACPI_SUCCESS(AcpiGetHandle(sc->handle, "\\UCMS", &sc->light_handle)) || ACPI_SUCCESS(AcpiGetHandle(sc->handle, "\\CMOS", &sc->light_handle)) || ACPI_SUCCESS(AcpiGetHandle(sc->handle, "\\CMS", &sc->light_handle))) && ACPI_SUCCESS(AcpiGetType(sc->light_handle, &cmos_t)) && cmos_t == ACPI_TYPE_METHOD) { sc->light_cmd_on = 0x0c; sc->light_cmd_off = 0x0d; sc->cmos_handle = sc->light_handle; } else if (ACPI_SUCCESS(AcpiGetHandle(sc->handle, "\\LGHT", &sc->light_handle))) { sc->light_cmd_on = 1; sc->light_cmd_off = 0; } else sc->light_handle = NULL; sc->light_set_supported = (sc->light_handle && ACPI_FAILURE(AcpiGetHandle(sc->ec_handle, "LEDB", &ledb_handle))); if (sc->light_get_supported) return (TRUE); if (sc->light_set_supported) { sc->light_val = 0; return (TRUE); } return (FALSE); case ACPI_IBM_METHOD_BLUETOOTH: case ACPI_IBM_METHOD_WLAN: if (ACPI_SUCCESS(acpi_GetInteger(sc->handle, IBM_NAME_WLAN_BT_GET, &dummy))) return (TRUE); return (FALSE); case ACPI_IBM_METHOD_FANSPEED: /* * Some models report the fan speed in levels from 0-7 * Newer models report it contiguously */ sc->fan_levels = (ACPI_SUCCESS(AcpiGetHandle(sc->handle, "GFAN", &sc->fan_handle)) || ACPI_SUCCESS(AcpiGetHandle(sc->handle, "\\FSPD", &sc->fan_handle))); return (TRUE); case ACPI_IBM_METHOD_FANLEVEL: case ACPI_IBM_METHOD_FANSTATUS: /* * Fan status is only supported on those models, * which report fan RPM contiguously, not in levels */ if (sc->fan_levels) return (FALSE); return (TRUE); case ACPI_IBM_METHOD_THERMAL: if (ACPI_SUCCESS(acpi_GetInteger(sc->ec_handle, IBM_NAME_THERMAL_GET, &dummy))) { sc->thermal_updt_supported = ACPI_SUCCESS(acpi_GetInteger(sc->ec_handle, IBM_NAME_THERMAL_UPDT, &dummy)); return (TRUE); } return (FALSE); case ACPI_IBM_METHOD_HANDLEREVENTS: return (TRUE); } return (FALSE); } static int acpi_ibm_thermal_sysctl(SYSCTL_HANDLER_ARGS) { struct acpi_ibm_softc *sc; int error = 0; char temp_cmd[] = "TMP0"; int temp[8]; ACPI_FUNCTION_TRACE((char *)(uintptr_t)__func__); sc = (struct acpi_ibm_softc *)oidp->oid_arg1; ACPI_SERIAL_BEGIN(ibm); for (int i = 0; i < 8; ++i) { temp_cmd[3] = '0' + i; /* * The TMPx methods seem to return +/- 128 or 0 * when the respecting sensor is not available */ if (ACPI_FAILURE(acpi_GetInteger(sc->ec_handle, temp_cmd, &temp[i])) || ABS(temp[i]) == 128 || temp[i] == 0) temp[i] = -1; else if (sc->thermal_updt_supported) /* Temperature is reported in tenth of Kelvin */ temp[i] = (temp[i] - 2731 + 5) / 10; } error = sysctl_handle_opaque(oidp, &temp, 8*sizeof(int), req); ACPI_SERIAL_END(ibm); return (error); } static int 
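acpi_ibm_thermal_sysctl() above treats readings of 0 or +/-128 as "sensor absent" and, on firmware with the thermal update method, converts tenths of Kelvin to whole degrees Celsius. The conversion is easy to get wrong, so here is a runnable version with two sample values; the helper name is made up.

#include <stdio.h>

/* 0 degC == 273.15 K; the firmware reports tenths of Kelvin and the
 * "+ 5" rounds to the nearest degree, as in the driver above. */
static int
decikelvin_to_celsius(int dk)
{
	return ((dk - 2731 + 5) / 10);
}

int
main(void)
{
	printf("3131 dK -> %d C\n", decikelvin_to_celsius(3131)); /* ~40 C */
	printf("2731 dK -> %d C\n", decikelvin_to_celsius(2731)); /*   0 C */
	return (0);
}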
acpi_ibm_handlerevents_sysctl(SYSCTL_HANDLER_ARGS) { struct acpi_ibm_softc *sc; int error = 0; struct sbuf sb; char *cp, *ep; int l, val; unsigned int handler_events; char temp[128]; ACPI_FUNCTION_TRACE((char *)(uintptr_t)__func__); sc = (struct acpi_ibm_softc *)oidp->oid_arg1; if (sbuf_new(&sb, NULL, 128, SBUF_AUTOEXTEND) == NULL) return (ENOMEM); ACPI_SERIAL_BEGIN(ibm); /* Get old values if this is a get request. */ if (req->newptr == NULL) { for (int i = 0; i < 8 * sizeof(sc->handler_events); i++) if (sc->handler_events & (1 << i)) sbuf_printf(&sb, "0x%02x ", i + 1); if (sbuf_len(&sb) == 0) sbuf_printf(&sb, "NONE"); } sbuf_trim(&sb); sbuf_finish(&sb); strlcpy(temp, sbuf_data(&sb), sizeof(temp)); sbuf_delete(&sb); error = sysctl_handle_string(oidp, temp, sizeof(temp), req); /* Check for error or no change */ if (error != 0 || req->newptr == NULL) goto out; /* If the user is setting a string, parse it. */ handler_events = 0; cp = temp; while (*cp) { if (isspace(*cp)) { cp++; continue; } ep = cp; while (*ep && !isspace(*ep)) ep++; l = ep - cp; if (l == 0) break; if (strncmp(cp, "NONE", 4) == 0) { cp = ep; continue; } if (l >= 3 && cp[0] == '0' && (cp[1] == 'X' || cp[1] == 'x')) val = strtoul(cp, &ep, 16); else val = strtoul(cp, &ep, 10); if (val == 0 || ep == cp || val >= 8 * sizeof(handler_events)) { cp[l] = '\0'; device_printf(sc->dev, "invalid event code: %s\n", cp); error = EINVAL; goto out; } handler_events |= 1 << (val - 1); cp = ep; } sc->handler_events = handler_events; out: ACPI_SERIAL_END(ibm); return (error); } static int acpi_ibm_brightness_set(struct acpi_ibm_softc *sc, int arg) { int val, step; UINT64 val_ec; ACPI_OBJECT Arg; ACPI_OBJECT_LIST Args; ACPI_STATUS status; ACPI_FUNCTION_TRACE((char *)(uintptr_t)__func__); ACPI_SERIAL_ASSERT(ibm); if (arg < 0 || arg > 7) return (EINVAL); /* Read the current brightness */ status = ACPI_EC_READ(sc->ec_dev, IBM_EC_BRIGHTNESS, &val_ec, 1); if (ACPI_FAILURE(status)) return (status); if (sc->cmos_handle) { val = val_ec & IBM_EC_MASK_BRI; Args.Count = 1; Args.Pointer = &Arg; Arg.Type = ACPI_TYPE_INTEGER; Arg.Integer.Value = (arg > val) ? IBM_CMOS_BRIGHTNESS_UP : IBM_CMOS_BRIGHTNESS_DOWN; step = (arg > val) ? 1 : -1; for (int i = val; i != arg; i += step) { status = AcpiEvaluateObject(sc->cmos_handle, NULL, &Args, NULL); if (ACPI_FAILURE(status)) { /* Record the last value */ if (i != val) { ACPI_EC_WRITE(sc->ec_dev, IBM_EC_BRIGHTNESS, i - step, 1); } return (status); } } } return ACPI_EC_WRITE(sc->ec_dev, IBM_EC_BRIGHTNESS, arg, 1); } static int acpi_ibm_bluetooth_set(struct acpi_ibm_softc *sc, int arg) { int val; ACPI_FUNCTION_TRACE((char *)(uintptr_t)__func__); ACPI_SERIAL_ASSERT(ibm); if (arg < 0 || arg > 1) return (EINVAL); val = (arg == 1) ? sc->wlan_bt_flags | IBM_NAME_MASK_BT : sc->wlan_bt_flags & (~IBM_NAME_MASK_BT); return acpi_SetInteger(sc->handle, IBM_NAME_WLAN_BT_SET, val); } static int acpi_ibm_thinklight_set(struct acpi_ibm_softc *sc, int arg) { ACPI_OBJECT Arg; ACPI_OBJECT_LIST Args; ACPI_STATUS status; ACPI_FUNCTION_TRACE((char *)(uintptr_t)__func__); ACPI_SERIAL_ASSERT(ibm); if (arg < 0 || arg > 1) return (EINVAL); if (sc->light_set_supported) { Args.Count = 1; Args.Pointer = &Arg; Arg.Type = ACPI_TYPE_INTEGER; Arg.Integer.Value = arg ? 
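The handlerevents handler above accepts a space-separated list of event codes (hex or decimal, or the literal NONE) and folds code N into bit N-1 of a mask. Below is a userland approximation of that parser for experimentation; it uses strtol with base 0 instead of the driver's explicit hex/decimal split, and the bounds check is simplified.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

static unsigned int
parse_handler_events(const char *s)
{
	unsigned int mask = 0;
	char *ep;

	while (*s != '\0') {
		if (isspace((unsigned char)*s)) {
			s++;
			continue;
		}
		if (strncmp(s, "NONE", 4) == 0) {
			s += 4;
			continue;
		}
		long val = strtol(s, &ep, 0);	/* base 0 accepts 0x... too */
		if (ep == s || val <= 0 || val > 32)
			break;			/* invalid token: stop */
		mask |= 1u << (val - 1);	/* code N -> bit N-1 */
		s = ep;
	}
	return (mask);
}

int
main(void)
{
	printf("0x%08x\n", parse_handler_events("0x04 0x1b 3"));
	return (0);
}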
sc->light_cmd_on : sc->light_cmd_off; status = AcpiEvaluateObject(sc->light_handle, NULL, &Args, NULL); if (ACPI_SUCCESS(status)) sc->light_val = arg; return (status); } return (0); } static int acpi_ibm_volume_set(struct acpi_ibm_softc *sc, int arg) { int val, step; UINT64 val_ec; ACPI_OBJECT Arg; ACPI_OBJECT_LIST Args; ACPI_STATUS status; ACPI_FUNCTION_TRACE((char *)(uintptr_t)__func__); ACPI_SERIAL_ASSERT(ibm); if (arg < 0 || arg > 14) return (EINVAL); /* Read the current volume */ status = ACPI_EC_READ(sc->ec_dev, IBM_EC_VOLUME, &val_ec, 1); if (ACPI_FAILURE(status)) return (status); if (sc->cmos_handle) { val = val_ec & IBM_EC_MASK_VOL; Args.Count = 1; Args.Pointer = &Arg; Arg.Type = ACPI_TYPE_INTEGER; Arg.Integer.Value = (arg > val) ? IBM_CMOS_VOLUME_UP : IBM_CMOS_VOLUME_DOWN; step = (arg > val) ? 1 : -1; for (int i = val; i != arg; i += step) { status = AcpiEvaluateObject(sc->cmos_handle, NULL, &Args, NULL); if (ACPI_FAILURE(status)) { /* Record the last value */ if (i != val) { val_ec = i - step + (val_ec & (~IBM_EC_MASK_VOL)); ACPI_EC_WRITE(sc->ec_dev, IBM_EC_VOLUME, val_ec, 1); } return (status); } } } val_ec = arg + (val_ec & (~IBM_EC_MASK_VOL)); return ACPI_EC_WRITE(sc->ec_dev, IBM_EC_VOLUME, val_ec, 1); } static int acpi_ibm_mute_set(struct acpi_ibm_softc *sc, int arg) { UINT64 val_ec; ACPI_OBJECT Arg; ACPI_OBJECT_LIST Args; ACPI_STATUS status; ACPI_FUNCTION_TRACE((char *)(uintptr_t)__func__); ACPI_SERIAL_ASSERT(ibm); if (arg < 0 || arg > 1) return (EINVAL); status = ACPI_EC_READ(sc->ec_dev, IBM_EC_VOLUME, &val_ec, 1); if (ACPI_FAILURE(status)) return (status); if (sc->cmos_handle) { Args.Count = 1; Args.Pointer = &Arg; Arg.Type = ACPI_TYPE_INTEGER; Arg.Integer.Value = IBM_CMOS_VOLUME_MUTE; status = AcpiEvaluateObject(sc->cmos_handle, NULL, &Args, NULL); if (ACPI_FAILURE(status)) return (status); } val_ec = (arg == 1) ? val_ec | IBM_EC_MASK_MUTE : val_ec & (~IBM_EC_MASK_MUTE); return ACPI_EC_WRITE(sc->ec_dev, IBM_EC_VOLUME, val_ec, 1); } static void acpi_ibm_eventhandler(struct acpi_ibm_softc *sc, int arg) { int val; UINT64 val_ec; ACPI_STATUS status; ACPI_SERIAL_BEGIN(ibm); switch (arg) { case IBM_EVENT_SUSPEND_TO_RAM: power_pm_suspend(POWER_SLEEP_STATE_SUSPEND); break; case IBM_EVENT_BLUETOOTH: acpi_ibm_bluetooth_set(sc, (sc->wlan_bt_flags == 0)); break; case IBM_EVENT_BRIGHTNESS_UP: case IBM_EVENT_BRIGHTNESS_DOWN: /* Read the current brightness */ status = ACPI_EC_READ(sc->ec_dev, IBM_EC_BRIGHTNESS, &val_ec, 1); if (ACPI_FAILURE(status)) return; val = val_ec & IBM_EC_MASK_BRI; val = (arg == IBM_EVENT_BRIGHTNESS_UP) ? val + 1 : val - 1; acpi_ibm_brightness_set(sc, val); break; case IBM_EVENT_THINKLIGHT: acpi_ibm_thinklight_set(sc, (sc->light_val == 0)); break; case IBM_EVENT_VOLUME_UP: case IBM_EVENT_VOLUME_DOWN: /* Read the current volume */ status = ACPI_EC_READ(sc->ec_dev, IBM_EC_VOLUME, &val_ec, 1); if (ACPI_FAILURE(status)) return; val = val_ec & IBM_EC_MASK_VOL; val = (arg == IBM_EVENT_VOLUME_UP) ? 
val + 1 : val - 1; acpi_ibm_volume_set(sc, val); break; case IBM_EVENT_MUTE: /* Read the current value */ status = ACPI_EC_READ(sc->ec_dev, IBM_EC_VOLUME, &val_ec, 1); if (ACPI_FAILURE(status)) return; val = ((val_ec & IBM_EC_MASK_MUTE) == IBM_EC_MASK_MUTE); acpi_ibm_mute_set(sc, (val == 0)); break; default: break; } ACPI_SERIAL_END(ibm); } static void acpi_ibm_notify(ACPI_HANDLE h, UINT32 notify, void *context) { int event, arg, type; device_t dev = context; struct acpi_ibm_softc *sc = device_get_softc(dev); ACPI_FUNCTION_TRACE_U32((char *)(uintptr_t)__func__, notify); if (notify != 0x80) device_printf(dev, "Unknown notify\n"); for (;;) { acpi_GetInteger(acpi_get_handle(dev), IBM_NAME_EVENTS_GET, &event); if (event == 0) break; type = (event >> 12) & 0xf; arg = event & 0xfff; switch (type) { case 1: if (!(sc->events_availmask & (1 << (arg - 1)))) { device_printf(dev, "Unknown key %d\n", arg); break; } /* Execute event handler */ if (sc->handler_events & (1 << (arg - 1))) acpi_ibm_eventhandler(sc, (arg & 0xff)); /* Notify devd(8) */ acpi_UserNotify("IBM", h, (arg & 0xff)); break; default: break; } } } Index: user/ngie/bug-237403/sys/dev/altera/atse/if_atse.c =================================================================== --- user/ngie/bug-237403/sys/dev/altera/atse/if_atse.c (revision 346925) +++ user/ngie/bug-237403/sys/dev/altera/atse/if_atse.c (revision 346926) @@ -1,1603 +1,1608 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2012, 2013 Bjoern A. Zeeb * Copyright (c) 2014 Robert N. M. Watson * Copyright (c) 2016-2017 Ruslan Bukin * All rights reserved. * * This software was developed by SRI International and the University of * Cambridge Computer Laboratory under DARPA/AFRL contract (FA8750-11-C-0249) * ("MRC2"), as part of the DARPA MRC research programme. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ /* * Altera Triple-Speed Ethernet MegaCore, Function User Guide * UG-01008-3.0, Software Version: 12.0, June 2012. * Available at the time of writing at: * http://www.altera.com/literature/ug/ug_ethernet.pdf * * We are using an Marvell E1111 (Alaska) PHY on the DE4. See mii/e1000phy.c. */ /* * XXX-BZ NOTES: * - ifOutBroadcastPkts are only counted if both ether dst and src are all-1s; * seems an IP core bug, they count ether broadcasts as multicast. 
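acpi_ibm_notify() above drains the events-get method and splits each returned value into a 4-bit type and a 12-bit argument, where type 1 means a hotkey press. A tiny decoder sketch follows; the sample event value is invented.

#include <stdint.h>
#include <stdio.h>

static void
decode_event(uint32_t event)
{
	uint32_t type = (event >> 12) & 0xf;	/* high nibble: event type */
	uint32_t arg = event & 0xfff;		/* low 12 bits: argument */

	if (type == 1)
		printf("hotkey %u pressed\n", arg);
	else
		printf("type %u, arg %u (ignored)\n", type, arg);
}

int
main(void)
{
	decode_event(0x1004);	/* example: type 1, hotkey 4 */
	return (0);
}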
Is this * still the case? * - figure out why the TX FIFO fill status and intr did not work as expected. * - test 100Mbit/s and 10Mbit/s * - blacklist the one special factory programmed ethernet address (for now * hardcoded, later from loader?) * - resolve all XXX, left as reminders to shake out details later * - Jumbo frame support */ #include __FBSDID("$FreeBSD$"); #include "opt_device_polling.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define RX_QUEUE_SIZE 4096 #define TX_QUEUE_SIZE 4096 #define NUM_RX_MBUF 512 #define BUFRING_SIZE 8192 #include /* XXX once we'd do parallel attach, we need a global lock for this. */ #define ATSE_ETHERNET_OPTION_BITS_UNDEF 0 #define ATSE_ETHERNET_OPTION_BITS_READ 1 static int atse_ethernet_option_bits_flag = ATSE_ETHERNET_OPTION_BITS_UNDEF; static uint8_t atse_ethernet_option_bits[ALTERA_ETHERNET_OPTION_BITS_LEN]; /* * Softc and critical resource locking. */ #define ATSE_LOCK(_sc) mtx_lock(&(_sc)->atse_mtx) #define ATSE_UNLOCK(_sc) mtx_unlock(&(_sc)->atse_mtx) #define ATSE_LOCK_ASSERT(_sc) mtx_assert(&(_sc)->atse_mtx, MA_OWNED) #define ATSE_DEBUG #undef ATSE_DEBUG #ifdef ATSE_DEBUG #define DPRINTF(format, ...) printf(format, __VA_ARGS__) #else #define DPRINTF(format, ...) #endif /* * Register space access macros. */ static inline void csr_write_4(struct atse_softc *sc, uint32_t reg, uint32_t val4, const char *f, const int l) { val4 = htole32(val4); DPRINTF("[%s:%d] CSR W %s 0x%08x (0x%08x) = 0x%08x\n", f, l, "atse_mem_res", reg, reg * 4, val4); bus_write_4(sc->atse_mem_res, reg * 4, val4); } static inline uint32_t csr_read_4(struct atse_softc *sc, uint32_t reg, const char *f, const int l) { uint32_t val4; val4 = le32toh(bus_read_4(sc->atse_mem_res, reg * 4)); DPRINTF("[%s:%d] CSR R %s 0x%08x (0x%08x) = 0x%08x\n", f, l, "atse_mem_res", reg, reg * 4, val4); return (val4); } /* * See page 5-2 that it's all dword offsets and the MS 16 bits need to be zero * on write and ignored on read. 
*/ static inline void pxx_write_2(struct atse_softc *sc, bus_addr_t bmcr, uint32_t reg, uint16_t val, const char *f, const int l, const char *s) { uint32_t val4; val4 = htole32(val & 0x0000ffff); DPRINTF("[%s:%d] %s W %s 0x%08x (0x%08jx) = 0x%08x\n", f, l, s, "atse_mem_res", reg, (bmcr + reg) * 4, val4); bus_write_4(sc->atse_mem_res, (bmcr + reg) * 4, val4); } static inline uint16_t pxx_read_2(struct atse_softc *sc, bus_addr_t bmcr, uint32_t reg, const char *f, const int l, const char *s) { uint32_t val4; uint16_t val; val4 = bus_read_4(sc->atse_mem_res, (bmcr + reg) * 4); val = le32toh(val4) & 0x0000ffff; DPRINTF("[%s:%d] %s R %s 0x%08x (0x%08jx) = 0x%04x\n", f, l, s, "atse_mem_res", reg, (bmcr + reg) * 4, val); return (val); } #define CSR_WRITE_4(sc, reg, val) \ csr_write_4((sc), (reg), (val), __func__, __LINE__) #define CSR_READ_4(sc, reg) \ csr_read_4((sc), (reg), __func__, __LINE__) #define PCS_WRITE_2(sc, reg, val) \ pxx_write_2((sc), sc->atse_bmcr0, (reg), (val), __func__, __LINE__, \ "PCS") #define PCS_READ_2(sc, reg) \ pxx_read_2((sc), sc->atse_bmcr0, (reg), __func__, __LINE__, "PCS") #define PHY_WRITE_2(sc, reg, val) \ pxx_write_2((sc), sc->atse_bmcr1, (reg), (val), __func__, __LINE__, \ "PHY") #define PHY_READ_2(sc, reg) \ pxx_read_2((sc), sc->atse_bmcr1, (reg), __func__, __LINE__, "PHY") static void atse_tick(void *); static int atse_detach(device_t); devclass_t atse_devclass; static int atse_rx_enqueue(struct atse_softc *sc, uint32_t n) { struct mbuf *m; int i; for (i = 0; i < n; i++) { m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR); if (m == NULL) { device_printf(sc->dev, "%s: Can't alloc rx mbuf\n", __func__); return (-1); } m->m_pkthdr.len = m->m_len = m->m_ext.ext_size; xdma_enqueue_mbuf(sc->xchan_rx, &m, 0, 4, 4, XDMA_DEV_TO_MEM); } return (0); } static int atse_xdma_tx_intr(void *arg, xdma_transfer_status_t *status) { xdma_transfer_status_t st; struct atse_softc *sc; struct ifnet *ifp; struct mbuf *m; int err; sc = arg; ATSE_LOCK(sc); ifp = sc->atse_ifp; for (;;) { err = xdma_dequeue_mbuf(sc->xchan_tx, &m, &st); if (err != 0) { break; } if (st.error != 0) { if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); } m_freem(m); sc->txcount--; } ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; ATSE_UNLOCK(sc); return (0); } static int atse_xdma_rx_intr(void *arg, xdma_transfer_status_t *status) { xdma_transfer_status_t st; struct atse_softc *sc; struct ifnet *ifp; struct mbuf *m; int err; uint32_t cnt_processed; sc = arg; ATSE_LOCK(sc); ifp = sc->atse_ifp; cnt_processed = 0; for (;;) { err = xdma_dequeue_mbuf(sc->xchan_rx, &m, &st); if (err != 0) { break; } cnt_processed++; if (st.error != 0) { if_inc_counter(ifp, IFCOUNTER_IERRORS, 1); m_freem(m); continue; } m->m_pkthdr.len = m->m_len = st.transferred; m->m_pkthdr.rcvif = ifp; m_adj(m, ETHER_ALIGN); ATSE_UNLOCK(sc); (*ifp->if_input)(ifp, m); ATSE_LOCK(sc); } atse_rx_enqueue(sc, cnt_processed); ATSE_UNLOCK(sc); return (0); } static int atse_transmit_locked(struct ifnet *ifp) { struct atse_softc *sc; struct mbuf *m; struct buf_ring *br; int error; int enq; sc = ifp->if_softc; br = sc->br; enq = 0; while ((m = drbr_peek(ifp, br)) != NULL) { error = xdma_enqueue_mbuf(sc->xchan_tx, &m, 0, 4, 4, XDMA_MEM_TO_DEV); if (error != 0) { /* No space in request queue available yet. */ drbr_putback(ifp, br, m); break; } drbr_advance(ifp, br); sc->txcount++; enq++; /* If anyone is interested give them a copy. 
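atse_transmit_locked() above follows the usual drbr discipline: peek at the head packet, try to enqueue it to the DMA channel, put it back and stop when the request queue is full, and advance only after a successful enqueue, submitting the batch at the end. The standalone sketch below mimics that control flow with a plain array standing in for both the buf_ring and the DMA queue; all names and numbers are invented.

#include <stdio.h>

#define	QLEN	4

static int dma_slots = 2;			/* pretend DMA queue space */

static int
dma_enqueue(int pkt)
{
	if (dma_slots == 0)
		return (-1);			/* no space yet */
	dma_slots--;
	printf("queued packet %d\n", pkt);
	return (0);
}

int
main(void)
{
	int ring[QLEN] = { 10, 11, 12, 13 };
	int head = 0, enq = 0;

	while (head < QLEN) {
		int pkt = ring[head];		/* drbr_peek() */
		if (dma_enqueue(pkt) != 0)
			break;			/* drbr_putback(): retry later */
		head++;				/* drbr_advance() */
		enq++;
	}
	if (enq > 0)
		printf("submit %d request(s) to hardware\n", enq);
	return (0);
}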
*/ ETHER_BPF_MTAP(ifp, m); } if (enq > 0) xdma_queue_submit(sc->xchan_tx); return (0); } static int atse_transmit(struct ifnet *ifp, struct mbuf *m) { struct atse_softc *sc; struct buf_ring *br; int error; sc = ifp->if_softc; br = sc->br; ATSE_LOCK(sc); mtx_lock(&sc->br_mtx); if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) != IFF_DRV_RUNNING) { error = drbr_enqueue(ifp, sc->br, m); mtx_unlock(&sc->br_mtx); ATSE_UNLOCK(sc); return (error); } if ((sc->atse_flags & ATSE_FLAGS_LINK) == 0) { error = drbr_enqueue(ifp, sc->br, m); mtx_unlock(&sc->br_mtx); ATSE_UNLOCK(sc); return (error); } error = drbr_enqueue(ifp, br, m); if (error) { mtx_unlock(&sc->br_mtx); ATSE_UNLOCK(sc); return (error); } error = atse_transmit_locked(ifp); mtx_unlock(&sc->br_mtx); ATSE_UNLOCK(sc); return (error); } static void atse_qflush(struct ifnet *ifp) { struct atse_softc *sc; sc = ifp->if_softc; printf("%s\n", __func__); } static int atse_stop_locked(struct atse_softc *sc) { uint32_t mask, val4; struct ifnet *ifp; int i; ATSE_LOCK_ASSERT(sc); callout_stop(&sc->atse_tick); ifp = sc->atse_ifp; ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE); /* Disable MAC transmit and receive datapath. */ mask = BASE_CFG_COMMAND_CONFIG_TX_ENA|BASE_CFG_COMMAND_CONFIG_RX_ENA; val4 = CSR_READ_4(sc, BASE_CFG_COMMAND_CONFIG); val4 &= ~mask; CSR_WRITE_4(sc, BASE_CFG_COMMAND_CONFIG, val4); /* Wait for bits to be cleared; i=100 is excessive. */ for (i = 0; i < 100; i++) { val4 = CSR_READ_4(sc, BASE_CFG_COMMAND_CONFIG); if ((val4 & mask) == 0) { break; } DELAY(10); } if ((val4 & mask) != 0) { device_printf(sc->atse_dev, "Disabling MAC TX/RX timed out.\n"); /* Punt. */ } sc->atse_flags &= ~ATSE_FLAGS_LINK; return (0); } static uint8_t atse_mchash(struct atse_softc *sc __unused, const uint8_t *addr) { uint8_t x, y; int i, j; x = 0; for (i = 0; i < ETHER_ADDR_LEN; i++) { y = addr[i] & 0x01; for (j = 1; j < 8; j++) y ^= (addr[i] >> j) & 0x01; x |= (y << i); } return (x); } static int atse_rxfilter_locked(struct atse_softc *sc) { struct ifmultiaddr *ifma; struct ifnet *ifp; uint32_t val4; int i; /* XXX-BZ can we find out if we have the MHASH synthesized? */ val4 = CSR_READ_4(sc, BASE_CFG_COMMAND_CONFIG); /* For simplicity always hash full 48 bits of addresses. */ if ((val4 & BASE_CFG_COMMAND_CONFIG_MHASH_SEL) != 0) val4 &= ~BASE_CFG_COMMAND_CONFIG_MHASH_SEL; ifp = sc->atse_ifp; if (ifp->if_flags & IFF_PROMISC) { val4 |= BASE_CFG_COMMAND_CONFIG_PROMIS_EN; } else { val4 &= ~BASE_CFG_COMMAND_CONFIG_PROMIS_EN; } CSR_WRITE_4(sc, BASE_CFG_COMMAND_CONFIG, val4); if (ifp->if_flags & IFF_ALLMULTI) { /* Accept all multicast addresses. */ for (i = 0; i <= MHASH_LEN; i++) CSR_WRITE_4(sc, MHASH_START + i, 0x1); } else { /* * Can hold MHASH_LEN entries. * XXX-BZ bitstring.h would be more general. */ uint64_t h; h = 0; /* * Re-build and re-program hash table. First build the * bit-field "yes" or "no" for each slot per address, then * do all the programming afterwards. */ if_maddr_rlock(ifp); CK_STAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { if (ifma->ifma_addr->sa_family != AF_LINK) { continue; } h |= (1 << atse_mchash(sc, LLADDR((struct sockaddr_dl *)ifma->ifma_addr))); } if_maddr_runlock(ifp); for (i = 0; i <= MHASH_LEN; i++) { CSR_WRITE_4(sc, MHASH_START + i, (h & (1 << i)) ? 
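atse_rxfilter_locked() above programs a 64-slot multicast hash where, per atse_mchash(), bit i of the slot index is simply the parity of address byte i. The sketch below is a standalone copy of that hash so the mapping can be checked outside the kernel.

#include <stdint.h>
#include <stdio.h>

static uint8_t
atse_style_mchash(const uint8_t addr[6])
{
	uint8_t x = 0;

	for (int i = 0; i < 6; i++) {
		uint8_t y = addr[i] & 0x01;

		for (int j = 1; j < 8; j++)
			y ^= (addr[i] >> j) & 0x01;	/* parity of byte i */
		x |= (uint8_t)(y << i);			/* becomes bit i */
	}
	return (x);
}

int
main(void)
{
	/* 01:00:5e:00:00:01 is the usual all-hosts IPv4 multicast MAC. */
	const uint8_t grp[6] = { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x01 };

	printf("hash slot %u\n", atse_style_mchash(grp));
	return (0);
}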
0x01 : 0x00); } } return (0); } static int atse_ethernet_option_bits_read_fdt(device_t dev) { struct resource *res; device_t fdev; int i, rid; if (atse_ethernet_option_bits_flag & ATSE_ETHERNET_OPTION_BITS_READ) { return (0); } fdev = device_find_child(device_get_parent(dev), "cfi", 0); if (fdev == NULL) { return (ENOENT); } rid = 0; res = bus_alloc_resource_any(fdev, SYS_RES_MEMORY, &rid, RF_ACTIVE | RF_SHAREABLE); if (res == NULL) { return (ENXIO); } for (i = 0; i < ALTERA_ETHERNET_OPTION_BITS_LEN; i++) { atse_ethernet_option_bits[i] = bus_read_1(res, ALTERA_ETHERNET_OPTION_BITS_OFF + i); } bus_release_resource(fdev, SYS_RES_MEMORY, rid, res); atse_ethernet_option_bits_flag |= ATSE_ETHERNET_OPTION_BITS_READ; return (0); } static int atse_ethernet_option_bits_read(device_t dev) { int error; error = atse_ethernet_option_bits_read_fdt(dev); if (error == 0) return (0); device_printf(dev, "Cannot read Ethernet addresses from flash.\n"); return (error); } static int atse_get_eth_address(struct atse_softc *sc) { unsigned long hostid; uint32_t val4; int unit; /* * Make sure to only ever do this once. Otherwise a reset would * possibly change our ethernet address, which is not good at all. */ if (sc->atse_eth_addr[0] != 0x00 || sc->atse_eth_addr[1] != 0x00 || sc->atse_eth_addr[2] != 0x00) { return (0); } if ((atse_ethernet_option_bits_flag & ATSE_ETHERNET_OPTION_BITS_READ) == 0) { goto get_random; } val4 = atse_ethernet_option_bits[0] << 24; val4 |= atse_ethernet_option_bits[1] << 16; val4 |= atse_ethernet_option_bits[2] << 8; val4 |= atse_ethernet_option_bits[3]; /* They chose "safe". */ if (val4 != le32toh(0x00005afe)) { device_printf(sc->atse_dev, "Magic '5afe' is not safe: 0x%08x. " "Falling back to random numbers for hardware address.\n", val4); goto get_random; } sc->atse_eth_addr[0] = atse_ethernet_option_bits[4]; sc->atse_eth_addr[1] = atse_ethernet_option_bits[5]; sc->atse_eth_addr[2] = atse_ethernet_option_bits[6]; sc->atse_eth_addr[3] = atse_ethernet_option_bits[7]; sc->atse_eth_addr[4] = atse_ethernet_option_bits[8]; sc->atse_eth_addr[5] = atse_ethernet_option_bits[9]; /* Handle factory default ethernet addresss: 00:07:ed:ff:ed:15 */ if (sc->atse_eth_addr[0] == 0x00 && sc->atse_eth_addr[1] == 0x07 && sc->atse_eth_addr[2] == 0xed && sc->atse_eth_addr[3] == 0xff && sc->atse_eth_addr[4] == 0xed && sc->atse_eth_addr[5] == 0x15) { device_printf(sc->atse_dev, "Factory programmed Ethernet " "hardware address blacklisted. Falling back to random " "address to avoid collisions.\n"); device_printf(sc->atse_dev, "Please re-program your flash.\n"); goto get_random; } if (sc->atse_eth_addr[0] == 0x00 && sc->atse_eth_addr[1] == 0x00 && sc->atse_eth_addr[2] == 0x00 && sc->atse_eth_addr[3] == 0x00 && sc->atse_eth_addr[4] == 0x00 && sc->atse_eth_addr[5] == 0x00) { device_printf(sc->atse_dev, "All zero's Ethernet hardware " "address blacklisted. Falling back to random address.\n"); device_printf(sc->atse_dev, "Please re-program your flash.\n"); goto get_random; } if (ETHER_IS_MULTICAST(sc->atse_eth_addr)) { device_printf(sc->atse_dev, "Multicast Ethernet hardware " "address blacklisted. Falling back to random address.\n"); device_printf(sc->atse_dev, "Please re-program your flash.\n"); goto get_random; } /* * If we find an Altera prefixed address with a 0x0 ending * adjust by device unit. If not and this is not the first * Ethernet, go to random. 
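atse_get_eth_address() above only trusts the flash option bits when they start with the 0x00005afe ("safe") magic, and it still rejects the factory default 00:07:ed:ff:ed:15, all-zero, and multicast addresses. The following sketch condenses those checks into one validation helper; the helper name and sample data are invented.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int
option_bits_mac_ok(const uint8_t bits[10], uint8_t mac[6])
{
	static const uint8_t factory[6] =
	    { 0x00, 0x07, 0xed, 0xff, 0xed, 0x15 };
	static const uint8_t zero[6] = { 0 };
	uint32_t magic;

	magic = (uint32_t)bits[0] << 24 | bits[1] << 16 | bits[2] << 8 |
	    bits[3];
	if (magic != 0x00005afe)		/* "They chose 'safe'." */
		return (0);
	memcpy(mac, &bits[4], 6);
	if (memcmp(mac, factory, 6) == 0 ||	/* blacklisted factory MAC */
	    memcmp(mac, zero, 6) == 0 ||	/* all zeroes */
	    (mac[0] & 0x01) != 0)		/* multicast bit set */
		return (0);
	return (1);
}

int
main(void)
{
	const uint8_t bits[10] =
	    { 0x00, 0x00, 0x5a, 0xfe, 0x00, 0x07, 0xed, 0x12, 0x34, 0x50 };
	uint8_t mac[6];

	printf("usable: %d\n", option_bits_mac_ok(bits, mac));
	return (0);
}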
*/ unit = device_get_unit(sc->atse_dev); if (unit == 0x00) { return (0); } if (unit > 0x0f) { device_printf(sc->atse_dev, "We do not support Ethernet " "addresses for more than 16 MACs. Falling back to " "random hadware address.\n"); goto get_random; } if ((sc->atse_eth_addr[0] & ~0x2) != 0 || sc->atse_eth_addr[1] != 0x07 || sc->atse_eth_addr[2] != 0xed || (sc->atse_eth_addr[5] & 0x0f) != 0x0) { device_printf(sc->atse_dev, "Ethernet address not meeting our " "multi-MAC standards. Falling back to random hadware " "address.\n"); goto get_random; } sc->atse_eth_addr[5] |= (unit & 0x0f); return (0); get_random: /* * Fall back to random code we also use on bridge(4). */ getcredhostid(curthread->td_ucred, &hostid); if (hostid == 0) { arc4rand(sc->atse_eth_addr, ETHER_ADDR_LEN, 1); sc->atse_eth_addr[0] &= ~1;/* clear multicast bit */ sc->atse_eth_addr[0] |= 2; /* set the LAA bit */ } else { sc->atse_eth_addr[0] = 0x2; sc->atse_eth_addr[1] = (hostid >> 24) & 0xff; sc->atse_eth_addr[2] = (hostid >> 16) & 0xff; sc->atse_eth_addr[3] = (hostid >> 8 ) & 0xff; sc->atse_eth_addr[4] = hostid & 0xff; sc->atse_eth_addr[5] = sc->atse_unit & 0xff; } return (0); } static int atse_set_eth_address(struct atse_softc *sc, int n) { uint32_t v0, v1; v0 = (sc->atse_eth_addr[3] << 24) | (sc->atse_eth_addr[2] << 16) | (sc->atse_eth_addr[1] << 8) | sc->atse_eth_addr[0]; v1 = (sc->atse_eth_addr[5] << 8) | sc->atse_eth_addr[4]; if (n & ATSE_ETH_ADDR_DEF) { CSR_WRITE_4(sc, BASE_CFG_MAC_0, v0); CSR_WRITE_4(sc, BASE_CFG_MAC_1, v1); } if (n & ATSE_ETH_ADDR_SUPP1) { CSR_WRITE_4(sc, SUPPL_ADDR_SMAC_0_0, v0); CSR_WRITE_4(sc, SUPPL_ADDR_SMAC_0_1, v1); } if (n & ATSE_ETH_ADDR_SUPP2) { CSR_WRITE_4(sc, SUPPL_ADDR_SMAC_1_0, v0); CSR_WRITE_4(sc, SUPPL_ADDR_SMAC_1_1, v1); } if (n & ATSE_ETH_ADDR_SUPP3) { CSR_WRITE_4(sc, SUPPL_ADDR_SMAC_2_0, v0); CSR_WRITE_4(sc, SUPPL_ADDR_SMAC_2_1, v1); } if (n & ATSE_ETH_ADDR_SUPP4) { CSR_WRITE_4(sc, SUPPL_ADDR_SMAC_3_0, v0); CSR_WRITE_4(sc, SUPPL_ADDR_SMAC_3_1, v1); } return (0); } static int atse_reset(struct atse_softc *sc) { uint32_t val4, mask; uint16_t val; int i; /* 1. External PHY Initialization using MDIO. */ /* * We select the right MDIO space in atse_attach() and let MII do * anything else. */ /* 2. PCS Configuration Register Initialization. */ /* a. Set auto negotiation link timer to 1.6ms for SGMII. */ PCS_WRITE_2(sc, PCS_EXT_LINK_TIMER_0, 0x0D40); PCS_WRITE_2(sc, PCS_EXT_LINK_TIMER_1, 0x0003); /* b. Configure SGMII. */ val = PCS_EXT_IF_MODE_SGMII_ENA|PCS_EXT_IF_MODE_USE_SGMII_AN; PCS_WRITE_2(sc, PCS_EXT_IF_MODE, val); /* c. Enable auto negotiation. */ /* Ignore Bits 6,8,13; should be set,set,unset. */ val = PCS_READ_2(sc, PCS_CONTROL); val &= ~(PCS_CONTROL_ISOLATE|PCS_CONTROL_POWERDOWN); val &= ~PCS_CONTROL_LOOPBACK; /* Make this a -link1 option? */ val |= PCS_CONTROL_AUTO_NEGOTIATION_ENABLE; PCS_WRITE_2(sc, PCS_CONTROL, val); /* d. PCS reset. */ val = PCS_READ_2(sc, PCS_CONTROL); val |= PCS_CONTROL_RESET; PCS_WRITE_2(sc, PCS_CONTROL, val); /* Wait for reset bit to clear; i=100 is excessive. */ for (i = 0; i < 100; i++) { val = PCS_READ_2(sc, PCS_CONTROL); if ((val & PCS_CONTROL_RESET) == 0) { break; } DELAY(10); } if ((val & PCS_CONTROL_RESET) != 0) { device_printf(sc->atse_dev, "PCS reset timed out.\n"); return (ENXIO); } /* 3. MAC Configuration Register Initialization. */ /* a. Disable MAC transmit and receive datapath. 
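When no usable flash address exists, the code above falls back to a locally administered address derived from the host id (or a random one when the host id is zero), with the device unit in the last octet. A standalone sketch of the host-id variant follows; the example host id is arbitrary.

#include <stdint.h>
#include <stdio.h>

static void
mac_from_hostid(unsigned long hostid, int unit, uint8_t mac[6])
{
	mac[0] = 0x02;			/* locally administered, unicast */
	mac[1] = (hostid >> 24) & 0xff;
	mac[2] = (hostid >> 16) & 0xff;
	mac[3] = (hostid >> 8) & 0xff;
	mac[4] = hostid & 0xff;
	mac[5] = unit & 0xff;		/* distinguishes atse0, atse1, ... */
}

int
main(void)
{
	uint8_t mac[6];

	mac_from_hostid(0xdeadbeefUL, 1, mac);	/* example host id, unit 1 */
	printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
	    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
	return (0);
}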
*/ mask = BASE_CFG_COMMAND_CONFIG_TX_ENA|BASE_CFG_COMMAND_CONFIG_RX_ENA; val4 = CSR_READ_4(sc, BASE_CFG_COMMAND_CONFIG); val4 &= ~mask; /* Samples in the manual do have the SW_RESET bit set here, why? */ CSR_WRITE_4(sc, BASE_CFG_COMMAND_CONFIG, val4); /* Wait for bits to be cleared; i=100 is excessive. */ for (i = 0; i < 100; i++) { val4 = CSR_READ_4(sc, BASE_CFG_COMMAND_CONFIG); if ((val4 & mask) == 0) { break; } DELAY(10); } if ((val4 & mask) != 0) { device_printf(sc->atse_dev, "Disabling MAC TX/RX timed out.\n"); return (ENXIO); } /* b. MAC FIFO configuration. */ CSR_WRITE_4(sc, BASE_CFG_TX_SECTION_EMPTY, FIFO_DEPTH_TX - 16); CSR_WRITE_4(sc, BASE_CFG_TX_ALMOST_FULL, 3); CSR_WRITE_4(sc, BASE_CFG_TX_ALMOST_EMPTY, 8); CSR_WRITE_4(sc, BASE_CFG_RX_SECTION_EMPTY, FIFO_DEPTH_RX - 16); CSR_WRITE_4(sc, BASE_CFG_RX_ALMOST_FULL, 8); CSR_WRITE_4(sc, BASE_CFG_RX_ALMOST_EMPTY, 8); #if 0 CSR_WRITE_4(sc, BASE_CFG_TX_SECTION_FULL, 16); CSR_WRITE_4(sc, BASE_CFG_RX_SECTION_FULL, 16); #else /* For store-and-forward mode, set this threshold to 0. */ CSR_WRITE_4(sc, BASE_CFG_TX_SECTION_FULL, 0); CSR_WRITE_4(sc, BASE_CFG_RX_SECTION_FULL, 0); #endif /* c. MAC address configuration. */ /* Also intialize supplementary addresses to our primary one. */ /* XXX-BZ FreeBSD really needs to grow and API for using these. */ atse_get_eth_address(sc); atse_set_eth_address(sc, ATSE_ETH_ADDR_ALL); /* d. MAC function configuration. */ CSR_WRITE_4(sc, BASE_CFG_FRM_LENGTH, 1518); /* Default. */ CSR_WRITE_4(sc, BASE_CFG_TX_IPG_LENGTH, 12); CSR_WRITE_4(sc, BASE_CFG_PAUSE_QUANT, 0xFFFF); val4 = CSR_READ_4(sc, BASE_CFG_COMMAND_CONFIG); /* * If 1000BASE-X/SGMII PCS is initialized, set the ETH_SPEED (bit 3) * and ENA_10 (bit 25) in command_config register to 0. If half duplex * is reported in the PHY/PCS status register, set the HD_ENA (bit 10) * to 1 in command_config register. * BZ: We shoot for 1000 instead. */ #if 0 val4 |= BASE_CFG_COMMAND_CONFIG_ETH_SPEED; #else val4 &= ~BASE_CFG_COMMAND_CONFIG_ETH_SPEED; #endif val4 &= ~BASE_CFG_COMMAND_CONFIG_ENA_10; #if 0 /* * We do not want to set this, otherwise, we could not even send * random raw ethernet frames for various other research. By default * FreeBSD will use the right ether source address. */ val4 |= BASE_CFG_COMMAND_CONFIG_TX_ADDR_INS; #endif val4 |= BASE_CFG_COMMAND_CONFIG_PAD_EN; val4 &= ~BASE_CFG_COMMAND_CONFIG_CRC_FWD; #if 0 val4 |= BASE_CFG_COMMAND_CONFIG_CNTL_FRM_ENA; #endif #if 1 val4 |= BASE_CFG_COMMAND_CONFIG_RX_ERR_DISC; #endif val &= ~BASE_CFG_COMMAND_CONFIG_LOOP_ENA; /* link0? */ CSR_WRITE_4(sc, BASE_CFG_COMMAND_CONFIG, val4); /* * Make sure we do not enable 32bit alignment; FreeBSD cannot * cope with the additional padding (though we should!?). * Also make sure we get the CRC appended. */ val4 = CSR_READ_4(sc, TX_CMD_STAT); val4 &= ~(TX_CMD_STAT_OMIT_CRC|TX_CMD_STAT_TX_SHIFT16); CSR_WRITE_4(sc, TX_CMD_STAT, val4); val4 = CSR_READ_4(sc, RX_CMD_STAT); val4 &= ~RX_CMD_STAT_RX_SHIFT16; val4 |= RX_CMD_STAT_RX_SHIFT16; CSR_WRITE_4(sc, RX_CMD_STAT, val4); /* e. Reset MAC. */ val4 = CSR_READ_4(sc, BASE_CFG_COMMAND_CONFIG); val4 |= BASE_CFG_COMMAND_CONFIG_SW_RESET; CSR_WRITE_4(sc, BASE_CFG_COMMAND_CONFIG, val4); /* Wait for bits to be cleared; i=100 is excessive. */ for (i = 0; i < 100; i++) { val4 = CSR_READ_4(sc, BASE_CFG_COMMAND_CONFIG); if ((val4 & BASE_CFG_COMMAND_CONFIG_SW_RESET) == 0) { break; } DELAY(10); } if ((val4 & BASE_CFG_COMMAND_CONFIG_SW_RESET) != 0) { device_printf(sc->atse_dev, "MAC reset timed out.\n"); return (ENXIO); } /* f. 
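atse_reset() above repeats the same "write, then poll until the bit clears, give up after roughly a millisecond" idiom for the PCS reset, the MAC disable, the software reset, and the re-enable. The sketch below factors that idiom into one helper; the callback-based register read exists only so the example runs without hardware.

#include <stdint.h>
#include <stdio.h>

static int
poll_until_clear(uint32_t (*read_reg)(void), uint32_t mask, int tries)
{
	for (int i = 0; i < tries; i++) {
		if ((read_reg() & mask) == 0)
			return (0);	/* bit(s) cleared in time */
		/* DELAY(10) would go here in the driver. */
	}
	return (-1);			/* timed out; caller reports it */
}

static uint32_t
fake_reg_read(void)
{
	static int calls;

	return (++calls < 3 ? 0x1 : 0x0);	/* clears on the third read */
}

int
main(void)
{
	printf("poll result: %d\n", poll_until_clear(fake_reg_read, 0x1, 100));
	return (0);
}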
Enable MAC transmit and receive datapath. */ mask = BASE_CFG_COMMAND_CONFIG_TX_ENA|BASE_CFG_COMMAND_CONFIG_RX_ENA; val4 = CSR_READ_4(sc, BASE_CFG_COMMAND_CONFIG); val4 |= mask; CSR_WRITE_4(sc, BASE_CFG_COMMAND_CONFIG, val4); /* Wait for bits to be cleared; i=100 is excessive. */ for (i = 0; i < 100; i++) { val4 = CSR_READ_4(sc, BASE_CFG_COMMAND_CONFIG); if ((val4 & mask) == mask) { break; } DELAY(10); } if ((val4 & mask) != mask) { device_printf(sc->atse_dev, "Enabling MAC TX/RX timed out.\n"); return (ENXIO); } return (0); } static void atse_init_locked(struct atse_softc *sc) { struct ifnet *ifp; struct mii_data *mii; uint8_t *eaddr; ATSE_LOCK_ASSERT(sc); ifp = sc->atse_ifp; if ((ifp->if_drv_flags & IFF_DRV_RUNNING) != 0) { return; } /* * Must update the ether address if changed. Given we do not handle * in atse_ioctl() but it's in the general framework, just always * do it here before atse_reset(). */ eaddr = IF_LLADDR(sc->atse_ifp); bcopy(eaddr, &sc->atse_eth_addr, ETHER_ADDR_LEN); /* Make things frind to halt, cleanup, ... */ atse_stop_locked(sc); atse_reset(sc); /* ... and fire up the engine again. */ atse_rxfilter_locked(sc); sc->atse_flags &= ATSE_FLAGS_LINK; /* Preserve. */ mii = device_get_softc(sc->atse_miibus); sc->atse_flags &= ~ATSE_FLAGS_LINK; mii_mediachg(mii); ifp->if_drv_flags |= IFF_DRV_RUNNING; ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; callout_reset(&sc->atse_tick, hz, atse_tick, sc); } static void atse_init(void *xsc) { struct atse_softc *sc; /* * XXXRW: There is some argument that we should immediately do RX * processing after enabling interrupts, or one may not fire if there * are buffered packets. */ sc = (struct atse_softc *)xsc; ATSE_LOCK(sc); atse_init_locked(sc); ATSE_UNLOCK(sc); } static int atse_ioctl(struct ifnet *ifp, u_long command, caddr_t data) { struct atse_softc *sc; struct ifreq *ifr; int error, mask; error = 0; sc = ifp->if_softc; ifr = (struct ifreq *)data; switch (command) { case SIOCSIFFLAGS: ATSE_LOCK(sc); if (ifp->if_flags & IFF_UP) { if ((ifp->if_drv_flags & IFF_DRV_RUNNING) != 0 && ((ifp->if_flags ^ sc->atse_if_flags) & (IFF_PROMISC | IFF_ALLMULTI)) != 0) atse_rxfilter_locked(sc); else atse_init_locked(sc); } else if (ifp->if_drv_flags & IFF_DRV_RUNNING) atse_stop_locked(sc); sc->atse_if_flags = ifp->if_flags; ATSE_UNLOCK(sc); break; case SIOCSIFCAP: ATSE_LOCK(sc); mask = ifr->ifr_reqcap ^ ifp->if_capenable; ATSE_UNLOCK(sc); break; case SIOCADDMULTI: case SIOCDELMULTI: ATSE_LOCK(sc); atse_rxfilter_locked(sc); ATSE_UNLOCK(sc); break; case SIOCGIFMEDIA: case SIOCSIFMEDIA: { struct mii_data *mii; struct ifreq *ifr; mii = device_get_softc(sc->atse_miibus); ifr = (struct ifreq *)data; error = ifmedia_ioctl(ifp, ifr, &mii->mii_media, command); break; } default: error = ether_ioctl(ifp, command, data); break; } return (error); } static void atse_tick(void *xsc) { struct atse_softc *sc; struct mii_data *mii; struct ifnet *ifp; sc = (struct atse_softc *)xsc; ATSE_LOCK_ASSERT(sc); ifp = sc->atse_ifp; mii = device_get_softc(sc->atse_miibus); mii_tick(mii); if ((sc->atse_flags & ATSE_FLAGS_LINK) == 0) { atse_miibus_statchg(sc->atse_dev); } callout_reset(&sc->atse_tick, hz, atse_tick, sc); } /* * Set media options. 
*/ static int atse_ifmedia_upd(struct ifnet *ifp) { struct atse_softc *sc; struct mii_data *mii; struct mii_softc *miisc; int error; sc = ifp->if_softc; ATSE_LOCK(sc); mii = device_get_softc(sc->atse_miibus); LIST_FOREACH(miisc, &mii->mii_phys, mii_list) { PHY_RESET(miisc); } error = mii_mediachg(mii); ATSE_UNLOCK(sc); return (error); } /* * Report current media status. */ static void atse_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr) { struct atse_softc *sc; struct mii_data *mii; sc = ifp->if_softc; ATSE_LOCK(sc); mii = device_get_softc(sc->atse_miibus); mii_pollstat(mii); ifmr->ifm_active = mii->mii_media_active; ifmr->ifm_status = mii->mii_media_status; ATSE_UNLOCK(sc); } static struct atse_mac_stats_regs { const char *name; const char *descr; /* Mostly copied from Altera datasheet. */ } atse_mac_stats_regs[] = { [0x1a] = { "aFramesTransmittedOK", "The number of frames that are successfully transmitted including " "the pause frames." }, { "aFramesReceivedOK", "The number of frames that are successfully received including the " "pause frames." }, { "aFrameCheckSequenceErrors", "The number of receive frames with CRC error." }, { "aAlignmentErrors", "The number of receive frames with alignment error." }, { "aOctetsTransmittedOK", "The lower 32 bits of the number of data and padding octets that " "are successfully transmitted." }, { "aOctetsReceivedOK", "The lower 32 bits of the number of data and padding octets that " " are successfully received." }, { "aTxPAUSEMACCtrlFrames", "The number of pause frames transmitted." }, { "aRxPAUSEMACCtrlFrames", "The number received pause frames received." }, { "ifInErrors", "The number of errored frames received." }, { "ifOutErrors", "The number of transmit frames with either a FIFO overflow error, " "a FIFO underflow error, or a error defined by the user " "application." }, { "ifInUcastPkts", "The number of valid unicast frames received." }, { "ifInMulticastPkts", "The number of valid multicast frames received. The count does " "not include pause frames." }, { "ifInBroadcastPkts", "The number of valid broadcast frames received." }, { "ifOutDiscards", "This statistics counter is not in use. The MAC function does not " "discard frames that are written to the FIFO buffer by the user " "application." }, { "ifOutUcastPkts", "The number of valid unicast frames transmitted." }, { "ifOutMulticastPkts", "The number of valid multicast frames transmitted, excluding pause " "frames." }, { "ifOutBroadcastPkts", "The number of valid broadcast frames transmitted." }, { "etherStatsDropEvents", "The number of frames that are dropped due to MAC internal errors " "when FIFO buffer overflow persists." }, { "etherStatsOctets", "The lower 32 bits of the total number of octets received. This " "count includes both good and errored frames." }, { "etherStatsPkts", "The total number of good and errored frames received." }, { "etherStatsUndersizePkts", "The number of frames received with length less than 64 bytes. " "This count does not include errored frames." }, { "etherStatsOversizePkts", "The number of frames received that are longer than the value " "configured in the frm_length register. This count does not " "include errored frames." }, { "etherStatsPkts64Octets", "The number of 64-byte frames received. This count includes good " "and errored frames." }, { "etherStatsPkts65to127Octets", "The number of received good and errored frames between the length " "of 65 and 127 bytes." 
}, { "etherStatsPkts128to255Octets", "The number of received good and errored frames between the length " "of 128 and 255 bytes." }, { "etherStatsPkts256to511Octets", "The number of received good and errored frames between the length " "of 256 and 511 bytes." }, { "etherStatsPkts512to1023Octets", "The number of received good and errored frames between the length " "of 512 and 1023 bytes." }, { "etherStatsPkts1024to1518Octets", "The number of received good and errored frames between the length " "of 1024 and 1518 bytes." }, { "etherStatsPkts1519toXOctets", "The number of received good and errored frames between the length " "of 1519 and the maximum frame length configured in the frm_length " "register." }, { "etherStatsJabbers", "Too long frames with CRC error." }, { "etherStatsFragments", "Too short frames with CRC error." }, /* 0x39 unused, 0x3a/b non-stats. */ [0x3c] = /* Extended Statistics Counters */ { "msb_aOctetsTransmittedOK", "Upper 32 bits of the number of data and padding octets that are " "successfully transmitted." }, { "msb_aOctetsReceivedOK", "Upper 32 bits of the number of data and padding octets that are " "successfully received." }, { "msb_etherStatsOctets", "Upper 32 bits of the total number of octets received. This count " "includes both good and errored frames." } }; static int sysctl_atse_mac_stats_proc(SYSCTL_HANDLER_ARGS) { struct atse_softc *sc; int error, offset, s; sc = arg1; offset = arg2; s = CSR_READ_4(sc, offset); error = sysctl_handle_int(oidp, &s, 0, req); if (error || !req->newptr) { return (error); } return (0); } static struct atse_rx_err_stats_regs { const char *name; const char *descr; } atse_rx_err_stats_regs[] = { #define ATSE_RX_ERR_FIFO_THRES_EOP 0 /* FIFO threshold reached, on EOP. */ #define ATSE_RX_ERR_ELEN 1 /* Frame/payload length not valid. */ #define ATSE_RX_ERR_CRC32 2 /* CRC-32 error. */ #define ATSE_RX_ERR_FIFO_THRES_TRUNC 3 /* FIFO thresh., truncated frame. */ #define ATSE_RX_ERR_4 4 /* ? */ #define ATSE_RX_ERR_5 5 /* / */ { "rx_err_fifo_thres_eop", "FIFO threshold reached, reported on EOP." }, { "rx_err_fifo_elen", "Frame or payload length not valid." }, { "rx_err_fifo_crc32", "CRC-32 error." }, { "rx_err_fifo_thres_trunc", "FIFO threshold reached, truncated frame" }, { "rx_err_4", "?" }, { "rx_err_5", "?" }, }; static int sysctl_atse_rx_err_stats_proc(SYSCTL_HANDLER_ARGS) { struct atse_softc *sc; int error, offset, s; sc = arg1; offset = arg2; s = sc->atse_rx_err[offset]; error = sysctl_handle_int(oidp, &s, 0, req); if (error || !req->newptr) { return (error); } return (0); } static void atse_sysctl_stats_attach(device_t dev) { struct sysctl_ctx_list *sctx; struct sysctl_oid *soid; struct atse_softc *sc; int i; sc = device_get_softc(dev); sctx = device_get_sysctl_ctx(dev); soid = device_get_sysctl_tree(dev); /* MAC statistics. */ for (i = 0; i < nitems(atse_mac_stats_regs); i++) { if (atse_mac_stats_regs[i].name == NULL || atse_mac_stats_regs[i].descr == NULL) { continue; } SYSCTL_ADD_PROC(sctx, SYSCTL_CHILDREN(soid), OID_AUTO, atse_mac_stats_regs[i].name, CTLTYPE_UINT|CTLFLAG_RD, sc, i, sysctl_atse_mac_stats_proc, "IU", atse_mac_stats_regs[i].descr); } /* rx_err[]. */ for (i = 0; i < ATSE_RX_ERR_MAX; i++) { if (atse_rx_err_stats_regs[i].name == NULL || atse_rx_err_stats_regs[i].descr == NULL) { continue; } SYSCTL_ADD_PROC(sctx, SYSCTL_CHILDREN(soid), OID_AUTO, atse_rx_err_stats_regs[i].name, CTLTYPE_UINT|CTLFLAG_RD, sc, i, sysctl_atse_rx_err_stats_proc, "IU", atse_rx_err_stats_regs[i].descr); } } /* * Generic device handling routines. 
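The statistics table above leans on designated initializers: the first counter of each run is pinned to its CSR offset (0x1a for the basic counters, 0x3c for the 64-bit extensions), so the array index handed to the sysctl handler as arg2 doubles as the register to read. The sketch below shows that sparse-table trick in isolation, with only a few of the entries.

#include <stdio.h>

struct stat_reg {
	const char *name;
};

static const struct stat_reg stats[] = {
	[0x1a] = { "aFramesTransmittedOK" },
		 { "aFramesReceivedOK" },	/* implicitly index 0x1b */
	[0x3c] = { "msb_aOctetsTransmittedOK" },
};

int
main(void)
{
	/* Gaps left by the designators stay NULL and are skipped. */
	for (unsigned i = 0; i < sizeof(stats) / sizeof(stats[0]); i++)
		if (stats[i].name != NULL)
			printf("CSR 0x%02x -> %s\n", i, stats[i].name);
	return (0);
}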
*/ int atse_attach(device_t dev) { struct atse_softc *sc; struct ifnet *ifp; uint32_t caps; int error; sc = device_get_softc(dev); sc->dev = dev; /* Get xDMA controller */ sc->xdma_tx = xdma_ofw_get(sc->dev, "tx"); if (sc->xdma_tx == NULL) { device_printf(dev, "Can't find DMA controller.\n"); return (ENXIO); } /* * Only final (EOP) write can be less than "symbols per beat" value * so we have to defrag mbuf chain. * Chapter 15. On-Chip FIFO Memory Core. * Embedded Peripherals IP User Guide. */ - caps = XCHAN_CAP_BUSDMA_NOSEG; + caps = XCHAN_CAP_NOSEG; /* Alloc xDMA virtual channel. */ sc->xchan_tx = xdma_channel_alloc(sc->xdma_tx, caps); if (sc->xchan_tx == NULL) { device_printf(dev, "Can't alloc virtual DMA channel.\n"); return (ENXIO); } /* Setup interrupt handler. */ error = xdma_setup_intr(sc->xchan_tx, atse_xdma_tx_intr, sc, &sc->ih_tx); if (error) { device_printf(sc->dev, "Can't setup xDMA interrupt handler.\n"); return (ENXIO); } xdma_prep_sg(sc->xchan_tx, TX_QUEUE_SIZE, /* xchan requests queue size */ MCLBYTES, /* maxsegsize */ 8, /* maxnsegs */ 16, /* alignment */ 0, /* boundary */ BUS_SPACE_MAXADDR_32BIT, BUS_SPACE_MAXADDR); /* Get RX xDMA controller */ sc->xdma_rx = xdma_ofw_get(sc->dev, "rx"); if (sc->xdma_rx == NULL) { device_printf(dev, "Can't find DMA controller.\n"); return (ENXIO); } /* Alloc xDMA virtual channel. */ sc->xchan_rx = xdma_channel_alloc(sc->xdma_rx, caps); if (sc->xchan_rx == NULL) { device_printf(dev, "Can't alloc virtual DMA channel.\n"); return (ENXIO); } /* Setup interrupt handler. */ error = xdma_setup_intr(sc->xchan_rx, atse_xdma_rx_intr, sc, &sc->ih_rx); if (error) { device_printf(sc->dev, "Can't setup xDMA interrupt handler.\n"); return (ENXIO); } xdma_prep_sg(sc->xchan_rx, RX_QUEUE_SIZE, /* xchan requests queue size */ MCLBYTES, /* maxsegsize */ 1, /* maxnsegs */ 16, /* alignment */ 0, /* boundary */ BUS_SPACE_MAXADDR_32BIT, BUS_SPACE_MAXADDR); mtx_init(&sc->br_mtx, "buf ring mtx", NULL, MTX_DEF); sc->br = buf_ring_alloc(BUFRING_SIZE, M_DEVBUF, M_NOWAIT, &sc->br_mtx); if (sc->br == NULL) { return (ENOMEM); } atse_ethernet_option_bits_read(dev); mtx_init(&sc->atse_mtx, device_get_nameunit(dev), MTX_NETWORK_LOCK, MTX_DEF); callout_init_mtx(&sc->atse_tick, &sc->atse_mtx, 0); /* * We are only doing single-PHY with this driver currently. The * defaults would be right so that BASE_CFG_MDIO_ADDR0 points to the * 1st PHY address (0) apart from the fact that BMCR0 is always * the PCS mapping, so we always use BMCR1. See Table 5-1 0xA0-0xBF. */ #if 0 /* Always PCS. */ sc->atse_bmcr0 = MDIO_0_START; CSR_WRITE_4(sc, BASE_CFG_MDIO_ADDR0, 0x00); #endif /* Always use matching PHY for atse[0..]. */ sc->atse_phy_addr = device_get_unit(dev); sc->atse_bmcr1 = MDIO_1_START; CSR_WRITE_4(sc, BASE_CFG_MDIO_ADDR1, sc->atse_phy_addr); /* Reset the adapter. */ atse_reset(sc); /* Setup interface. */ ifp = sc->atse_ifp = if_alloc(IFT_ETHER); if (ifp == NULL) { device_printf(dev, "if_alloc() failed\n"); error = ENOSPC; goto err; } ifp->if_softc = sc; if_initname(ifp, device_get_name(dev), device_get_unit(dev)); ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST; ifp->if_ioctl = atse_ioctl; ifp->if_transmit = atse_transmit; ifp->if_qflush = atse_qflush; ifp->if_init = atse_init; IFQ_SET_MAXLEN(&ifp->if_snd, ATSE_TX_LIST_CNT - 1); ifp->if_snd.ifq_drv_maxlen = ATSE_TX_LIST_CNT - 1; IFQ_SET_READY(&ifp->if_snd); /* MII setup. 
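The attach path above requests a no-segmentation DMA channel (the XCHAN_CAP_NOSEG change in this hunk) because, as the quoted FIFO core documentation notes, only the final end-of-packet write may be narrower than the symbols-per-beat width. The sketch below illustrates that constraint for a 4-byte beat; the beat width is an assumption chosen for illustration, not a value taken from the hardware.

#include <stdio.h>

int
main(void)
{
	int len = 61, beat = 4;		/* example packet length and beat */

	for (int off = 0; off < len; off += beat) {
		int chunk = (len - off < beat) ? len - off : beat;

		printf("write %d byte%s%s\n", chunk, chunk == 1 ? "" : "s",
		    (off + beat >= len) ? " (EOP, may be short)" : "");
	}
	return (0);
}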
*/ error = mii_attach(dev, &sc->atse_miibus, ifp, atse_ifmedia_upd, atse_ifmedia_sts, BMSR_DEFCAPMASK, MII_PHY_ANY, MII_OFFSET_ANY, 0); if (error != 0) { device_printf(dev, "attaching PHY failed: %d\n", error); goto err; } /* Call media-indepedent attach routine. */ ether_ifattach(ifp, sc->atse_eth_addr); /* Tell the upper layer(s) about vlan mtu support. */ ifp->if_hdrlen = sizeof(struct ether_vlan_header); ifp->if_capabilities |= IFCAP_VLAN_MTU; ifp->if_capenable = ifp->if_capabilities; err: if (error != 0) { atse_detach(dev); } if (error == 0) { atse_sysctl_stats_attach(dev); } atse_rx_enqueue(sc, NUM_RX_MBUF); xdma_queue_submit(sc->xchan_rx); return (error); } static int atse_detach(device_t dev) { struct atse_softc *sc; struct ifnet *ifp; sc = device_get_softc(dev); KASSERT(mtx_initialized(&sc->atse_mtx), ("%s: mutex not initialized", device_get_nameunit(dev))); ifp = sc->atse_ifp; /* Only cleanup if attach succeeded. */ if (device_is_attached(dev)) { ATSE_LOCK(sc); atse_stop_locked(sc); ATSE_UNLOCK(sc); callout_drain(&sc->atse_tick); ether_ifdetach(ifp); } if (sc->atse_miibus != NULL) { device_delete_child(dev, sc->atse_miibus); } if (ifp != NULL) { if_free(ifp); } mtx_destroy(&sc->atse_mtx); + + xdma_channel_free(sc->xchan_tx); + xdma_channel_free(sc->xchan_rx); + xdma_put(sc->xdma_tx); + xdma_put(sc->xdma_rx); return (0); } /* Shared between nexus and fdt implementation. */ void atse_detach_resources(device_t dev) { struct atse_softc *sc; sc = device_get_softc(dev); if (sc->atse_mem_res != NULL) { bus_release_resource(dev, SYS_RES_MEMORY, sc->atse_mem_rid, sc->atse_mem_res); sc->atse_mem_res = NULL; } } int atse_detach_dev(device_t dev) { int error; error = atse_detach(dev); if (error) { /* We are basically in undefined state now. */ device_printf(dev, "atse_detach() failed: %d\n", error); return (error); } atse_detach_resources(dev); return (0); } int atse_miibus_readreg(device_t dev, int phy, int reg) { struct atse_softc *sc; int val; sc = device_get_softc(dev); /* * We currently do not support re-mapping of MDIO space on-the-fly * but de-facto hard-code the phy#. */ if (phy != sc->atse_phy_addr) { return (0); } val = PHY_READ_2(sc, reg); return (val); } int atse_miibus_writereg(device_t dev, int phy, int reg, int data) { struct atse_softc *sc; sc = device_get_softc(dev); /* * We currently do not support re-mapping of MDIO space on-the-fly * but de-facto hard-code the phy#. */ if (phy != sc->atse_phy_addr) { return (0); } PHY_WRITE_2(sc, reg, data); return (0); } void atse_miibus_statchg(device_t dev) { struct atse_softc *sc; struct mii_data *mii; struct ifnet *ifp; uint32_t val4; sc = device_get_softc(dev); ATSE_LOCK_ASSERT(sc); mii = device_get_softc(sc->atse_miibus); ifp = sc->atse_ifp; if (mii == NULL || ifp == NULL || (ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) { return; } val4 = CSR_READ_4(sc, BASE_CFG_COMMAND_CONFIG); /* Assume no link. 
*/ sc->atse_flags &= ~ATSE_FLAGS_LINK; if ((mii->mii_media_status & (IFM_ACTIVE | IFM_AVALID)) == (IFM_ACTIVE | IFM_AVALID)) { switch (IFM_SUBTYPE(mii->mii_media_active)) { case IFM_10_T: val4 |= BASE_CFG_COMMAND_CONFIG_ENA_10; val4 &= ~BASE_CFG_COMMAND_CONFIG_ETH_SPEED; sc->atse_flags |= ATSE_FLAGS_LINK; break; case IFM_100_TX: val4 &= ~BASE_CFG_COMMAND_CONFIG_ENA_10; val4 &= ~BASE_CFG_COMMAND_CONFIG_ETH_SPEED; sc->atse_flags |= ATSE_FLAGS_LINK; break; case IFM_1000_T: val4 &= ~BASE_CFG_COMMAND_CONFIG_ENA_10; val4 |= BASE_CFG_COMMAND_CONFIG_ETH_SPEED; sc->atse_flags |= ATSE_FLAGS_LINK; break; default: break; } } if ((sc->atse_flags & ATSE_FLAGS_LINK) == 0) { /* Need to stop the MAC? */ return; } if (IFM_OPTIONS(mii->mii_media_active & IFM_FDX) != 0) { val4 &= ~BASE_CFG_COMMAND_CONFIG_HD_ENA; } else { val4 |= BASE_CFG_COMMAND_CONFIG_HD_ENA; } /* flow control? */ /* Make sure the MAC is activated. */ val4 |= BASE_CFG_COMMAND_CONFIG_TX_ENA; val4 |= BASE_CFG_COMMAND_CONFIG_RX_ENA; CSR_WRITE_4(sc, BASE_CFG_COMMAND_CONFIG, val4); } MODULE_DEPEND(atse, ether, 1, 1, 1); MODULE_DEPEND(atse, miibus, 1, 1, 1); Index: user/ngie/bug-237403/sys/dev/altera/softdma/softdma.c =================================================================== --- user/ngie/bug-237403/sys/dev/altera/softdma/softdma.c (revision 346925) +++ user/ngie/bug-237403/sys/dev/altera/softdma/softdma.c (revision 346926) @@ -1,864 +1,888 @@ /*- * Copyright (c) 2017-2018 Ruslan Bukin * All rights reserved. * * This software was developed by SRI International and the University of * Cambridge Computer Laboratory under DARPA/AFRL contract FA8750-10-C-0237 * ("CTSRD"), as part of the DARPA CRASH research programme. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ /* This is driver for SoftDMA device built using Altera FIFO component. */ #include __FBSDID("$FreeBSD$"); #include "opt_platform.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef FDT #include #include #include #endif #include #include #include "xdma_if.h" #define SOFTDMA_DEBUG #undef SOFTDMA_DEBUG #ifdef SOFTDMA_DEBUG #define dprintf(fmt, ...) printf(fmt, ##__VA_ARGS__) #else #define dprintf(fmt, ...) 
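/*
 * To get the dprintf() tracing in this file, drop the #undef above so that
 * SOFTDMA_DEBUG stays defined.
 */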
#endif #define AVALON_FIFO_TX_BASIC_OPTS_DEPTH 16 #define SOFTDMA_NCHANNELS 1 #define CONTROL_GEN_SOP (1 << 0) #define CONTROL_GEN_EOP (1 << 1) #define CONTROL_OWN (1 << 31) #define SOFTDMA_RX_EVENTS \ (A_ONCHIP_FIFO_MEM_CORE_INTR_FULL | \ A_ONCHIP_FIFO_MEM_CORE_INTR_OVERFLOW | \ A_ONCHIP_FIFO_MEM_CORE_INTR_UNDERFLOW) #define SOFTDMA_TX_EVENTS \ (A_ONCHIP_FIFO_MEM_CORE_INTR_EMPTY | \ A_ONCHIP_FIFO_MEM_CORE_INTR_OVERFLOW | \ A_ONCHIP_FIFO_MEM_CORE_INTR_UNDERFLOW) struct softdma_channel { struct softdma_softc *sc; struct mtx mtx; xdma_channel_t *xchan; struct proc *p; int used; int index; int run; uint32_t idx_tail; uint32_t idx_head; struct softdma_desc *descs; uint32_t descs_num; uint32_t descs_used_count; }; struct softdma_desc { uint64_t src_addr; uint64_t dst_addr; uint32_t len; uint32_t access_width; uint32_t count; uint16_t src_incr; uint16_t dst_incr; uint32_t direction; struct softdma_desc *next; uint32_t transfered; uint32_t status; uint32_t reserved; uint32_t control; }; struct softdma_softc { device_t dev; struct resource *res[3]; bus_space_tag_t bst; bus_space_handle_t bsh; bus_space_tag_t bst_c; bus_space_handle_t bsh_c; void *ih; struct softdma_channel channels[SOFTDMA_NCHANNELS]; }; static struct resource_spec softdma_spec[] = { { SYS_RES_MEMORY, 0, RF_ACTIVE }, /* fifo */ { SYS_RES_MEMORY, 1, RF_ACTIVE }, /* core */ { SYS_RES_IRQ, 0, RF_ACTIVE }, { -1, 0 } }; static int softdma_probe(device_t dev); static int softdma_attach(device_t dev); static int softdma_detach(device_t dev); static inline uint32_t softdma_next_desc(struct softdma_channel *chan, uint32_t curidx) { return ((curidx + 1) % chan->descs_num); } static void softdma_mem_write(struct softdma_softc *sc, uint32_t reg, uint32_t val) { bus_write_4(sc->res[0], reg, htole32(val)); } static uint32_t softdma_mem_read(struct softdma_softc *sc, uint32_t reg) { uint32_t val; val = bus_read_4(sc->res[0], reg); return (le32toh(val)); } static void softdma_memc_write(struct softdma_softc *sc, uint32_t reg, uint32_t val) { bus_write_4(sc->res[1], reg, htole32(val)); } static uint32_t softdma_memc_read(struct softdma_softc *sc, uint32_t reg) { uint32_t val; val = bus_read_4(sc->res[1], reg); return (le32toh(val)); } static uint32_t softdma_fill_level(struct softdma_softc *sc) { uint32_t val; val = softdma_memc_read(sc, A_ONCHIP_FIFO_MEM_CORE_STATUS_REG_FILL_LEVEL); return (val); } +static uint32_t +fifo_fill_level_wait(struct softdma_softc *sc) +{ + uint32_t val; + + do + val = softdma_fill_level(sc); + while (val == AVALON_FIFO_TX_BASIC_OPTS_DEPTH); + + return (val); +} + static void softdma_intr(void *arg) { struct softdma_channel *chan; struct softdma_softc *sc; int reg; int err; sc = arg; chan = &sc->channels[0]; reg = softdma_memc_read(sc, A_ONCHIP_FIFO_MEM_CORE_STATUS_REG_EVENT); if (reg & (A_ONCHIP_FIFO_MEM_CORE_EVENT_OVERFLOW | A_ONCHIP_FIFO_MEM_CORE_EVENT_UNDERFLOW)) { /* Errors */ err = (((reg & A_ONCHIP_FIFO_MEM_CORE_ERROR_MASK) >> \ A_ONCHIP_FIFO_MEM_CORE_ERROR_SHIFT) & 0xff); } if (reg != 0) { softdma_memc_write(sc, A_ONCHIP_FIFO_MEM_CORE_STATUS_REG_EVENT, reg); chan->run = 1; wakeup(chan); } } static int softdma_probe(device_t dev) { if (!ofw_bus_status_okay(dev)) return (ENXIO); if (!ofw_bus_is_compatible(dev, "altr,softdma")) return (ENXIO); device_set_desc(dev, "SoftDMA"); return (BUS_PROBE_DEFAULT); } static int softdma_attach(device_t dev) { struct softdma_softc *sc; phandle_t xref, node; int err; sc = device_get_softc(dev); sc->dev = dev; if (bus_alloc_resources(dev, softdma_spec, sc->res)) { 
device_printf(dev, "could not allocate resources for device\n"); return (ENXIO); } /* FIFO memory interface */ sc->bst = rman_get_bustag(sc->res[0]); sc->bsh = rman_get_bushandle(sc->res[0]); /* FIFO control memory interface */ sc->bst_c = rman_get_bustag(sc->res[1]); sc->bsh_c = rman_get_bushandle(sc->res[1]); /* Setup interrupt handler */ err = bus_setup_intr(dev, sc->res[2], INTR_TYPE_MISC | INTR_MPSAFE, NULL, softdma_intr, sc, &sc->ih); if (err) { device_printf(dev, "Unable to alloc interrupt resource.\n"); return (ENXIO); } node = ofw_bus_get_node(dev); xref = OF_xref_from_node(node); OF_device_register_xref(xref, dev); return (0); } static int softdma_detach(device_t dev) { struct softdma_softc *sc; sc = device_get_softc(dev); return (0); } static int softdma_process_tx(struct softdma_channel *chan, struct softdma_desc *desc) { struct softdma_softc *sc; - uint32_t src_offs, dst_offs; + uint64_t addr; + uint64_t buf; + uint32_t word; + uint32_t missing; uint32_t reg; - uint32_t fill_level; - uint32_t leftm; - uint32_t tmp; - uint32_t val; - uint32_t c; + int got_bits; + int len; sc = chan->sc; - fill_level = softdma_fill_level(sc); - while (fill_level == AVALON_FIFO_TX_BASIC_OPTS_DEPTH) - fill_level = softdma_fill_level(sc); + fifo_fill_level_wait(sc); /* Set start of packet. */ - if (desc->control & CONTROL_GEN_SOP) { - reg = 0; - reg |= A_ONCHIP_FIFO_MEM_CORE_SOP; - softdma_mem_write(sc, A_ONCHIP_FIFO_MEM_CORE_METADATA, reg); - } + if (desc->control & CONTROL_GEN_SOP) + softdma_mem_write(sc, A_ONCHIP_FIFO_MEM_CORE_METADATA, + A_ONCHIP_FIFO_MEM_CORE_SOP); - src_offs = dst_offs = 0; - c = 0; - while ((desc->len - c) >= 4) { - val = *(uint32_t *)(desc->src_addr + src_offs); - bus_write_4(sc->res[0], A_ONCHIP_FIFO_MEM_CORE_DATA, val); - if (desc->src_incr) - src_offs += 4; - if (desc->dst_incr) - dst_offs += 4; - fill_level += 1; + got_bits = 0; + buf = 0; - while (fill_level == AVALON_FIFO_TX_BASIC_OPTS_DEPTH) { - fill_level = softdma_fill_level(sc); - } - c += 4; + addr = desc->src_addr; + len = desc->len; + + if (addr & 1) { + buf = (buf << 8) | *(uint8_t *)addr; + got_bits += 8; + addr += 1; + len -= 1; } - val = 0; - leftm = (desc->len - c); + if (len >= 2 && addr & 2) { + buf = (buf << 16) | *(uint16_t *)addr; + got_bits += 16; + addr += 2; + len -= 2; + } - switch (leftm) { - case 1: - val = *(uint8_t *)(desc->src_addr + src_offs); - val <<= 24; - src_offs += 1; - break; - case 2: - case 3: - val = *(uint16_t *)(desc->src_addr + src_offs); - val <<= 16; - src_offs += 2; + while (len >= 4) { + buf = (buf << 32) | (uint64_t)*(uint32_t *)addr; + addr += 4; + len -= 4; + word = (uint32_t)((buf >> got_bits) & 0xffffffff); - if (leftm == 3) { - tmp = *(uint8_t *)(desc->src_addr + src_offs); - val |= (tmp << 8); - src_offs += 1; - } - break; - case 0: - default: - break; + fifo_fill_level_wait(sc); + if (len == 0 && got_bits == 0 && + (desc->control & CONTROL_GEN_EOP) != 0) + softdma_mem_write(sc, A_ONCHIP_FIFO_MEM_CORE_METADATA, + A_ONCHIP_FIFO_MEM_CORE_EOP); + bus_write_4(sc->res[0], A_ONCHIP_FIFO_MEM_CORE_DATA, word); } - /* Set end of packet. */ - reg = 0; - if (desc->control & CONTROL_GEN_EOP) - reg |= A_ONCHIP_FIFO_MEM_CORE_EOP; - reg |= ((4 - leftm) << A_ONCHIP_FIFO_MEM_CORE_EMPTY_SHIFT); - softdma_mem_write(sc, A_ONCHIP_FIFO_MEM_CORE_METADATA, reg); + if (len & 2) { + buf = (buf << 16) | *(uint16_t *)addr; + got_bits += 16; + addr += 2; + len -= 2; + } - /* Ensure there is a FIFO entry available. 
*/ - fill_level = softdma_fill_level(sc); - while (fill_level == AVALON_FIFO_TX_BASIC_OPTS_DEPTH) - fill_level = softdma_fill_level(sc); + if (len & 1) { + buf = (buf << 8) | *(uint8_t *)addr; + got_bits += 8; + addr += 1; + len -= 1; + } - /* Final write */ - bus_write_4(sc->res[0], A_ONCHIP_FIFO_MEM_CORE_DATA, val); + if (got_bits >= 32) { + got_bits -= 32; + word = (uint32_t)((buf >> got_bits) & 0xffffffff); - return (dst_offs); + fifo_fill_level_wait(sc); + if (len == 0 && got_bits == 0 && + (desc->control & CONTROL_GEN_EOP) != 0) + softdma_mem_write(sc, A_ONCHIP_FIFO_MEM_CORE_METADATA, + A_ONCHIP_FIFO_MEM_CORE_EOP); + bus_write_4(sc->res[0], A_ONCHIP_FIFO_MEM_CORE_DATA, word); + } + + if (got_bits) { + missing = 32 - got_bits; + got_bits /= 8; + + fifo_fill_level_wait(sc); + reg = A_ONCHIP_FIFO_MEM_CORE_EOP | + ((4 - got_bits) << A_ONCHIP_FIFO_MEM_CORE_EMPTY_SHIFT); + softdma_mem_write(sc, A_ONCHIP_FIFO_MEM_CORE_METADATA, reg); + word = (uint32_t)((buf << missing) & 0xffffffff); + bus_write_4(sc->res[0], A_ONCHIP_FIFO_MEM_CORE_DATA, word); + } + + return (desc->len); } static int softdma_process_rx(struct softdma_channel *chan, struct softdma_desc *desc) { uint32_t src_offs, dst_offs; struct softdma_softc *sc; uint32_t fill_level; uint32_t empty; uint32_t meta; uint32_t data; int sop_rcvd; int timeout; size_t len; int error; sc = chan->sc; empty = 0; src_offs = dst_offs = 0; error = 0; fill_level = softdma_fill_level(sc); if (fill_level == 0) { /* Nothing to receive. */ return (0); } len = desc->len; sop_rcvd = 0; while (fill_level) { empty = 0; data = bus_read_4(sc->res[0], A_ONCHIP_FIFO_MEM_CORE_DATA); meta = softdma_mem_read(sc, A_ONCHIP_FIFO_MEM_CORE_METADATA); if (meta & A_ONCHIP_FIFO_MEM_CORE_ERROR_MASK) { error = 1; break; } if ((meta & A_ONCHIP_FIFO_MEM_CORE_CHANNEL_MASK) != 0) { error = 1; break; } if (meta & A_ONCHIP_FIFO_MEM_CORE_SOP) { sop_rcvd = 1; } if (meta & A_ONCHIP_FIFO_MEM_CORE_EOP) { empty = (meta & A_ONCHIP_FIFO_MEM_CORE_EMPTY_MASK) >> A_ONCHIP_FIFO_MEM_CORE_EMPTY_SHIFT; } if (sop_rcvd == 0) { error = 1; break; } if (empty == 0) { *(uint32_t *)(desc->dst_addr + dst_offs) = data; dst_offs += 4; } else if (empty == 1) { *(uint16_t *)(desc->dst_addr + dst_offs) = ((data >> 16) & 0xffff); dst_offs += 2; *(uint8_t *)(desc->dst_addr + dst_offs) = ((data >> 8) & 0xff); dst_offs += 1; } else { panic("empty %d\n", empty); } if (meta & A_ONCHIP_FIFO_MEM_CORE_EOP) break; fill_level = softdma_fill_level(sc); timeout = 100; while (fill_level == 0 && timeout--) fill_level = softdma_fill_level(sc); if (timeout == 0) { /* No EOP received. Broken packet. */ error = 1; break; } } if (error) { return (-1); } return (dst_offs); } static uint32_t softdma_process_descriptors(struct softdma_channel *chan, xdma_transfer_status_t *status) { struct xdma_channel *xchan; struct softdma_desc *desc; struct softdma_softc *sc; xdma_transfer_status_t st; int ret; sc = chan->sc; xchan = chan->xchan; desc = &chan->descs[chan->idx_tail]; while (desc != NULL) { if ((desc->control & CONTROL_OWN) == 0) { break; } if (desc->direction == XDMA_MEM_TO_DEV) { ret = softdma_process_tx(chan, desc); } else { ret = softdma_process_rx(chan, desc); if (ret == 0) { /* No new data available. */ break; } } /* Descriptor processed. 
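 * Clearing the control word drops the CONTROL_OWN bit, handing the
 * descriptor back to the submitter.  A non-negative return from the
 * per-direction process routine is the byte count reported through
 * xchan_seg_done(); a negative return marks the whole transfer as failed
 * and stops the loop.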
*/ desc->control = 0; if (ret >= 0) { st.error = 0; st.transferred = ret; } else { st.error = ret; st.transferred = 0; } xchan_seg_done(xchan, &st); atomic_subtract_int(&chan->descs_used_count, 1); if (ret >= 0) { status->transferred += ret; } else { status->error = 1; break; } chan->idx_tail = softdma_next_desc(chan, chan->idx_tail); /* Process next descriptor, if any. */ desc = desc->next; } return (0); } static void softdma_worker(void *arg) { xdma_transfer_status_t status; struct softdma_channel *chan; struct softdma_softc *sc; chan = arg; sc = chan->sc; while (1) { mtx_lock(&chan->mtx); do { mtx_sleep(chan, &chan->mtx, 0, "softdma_wait", hz / 2); } while (chan->run == 0); status.error = 0; status.transferred = 0; softdma_process_descriptors(chan, &status); /* Finish operation */ chan->run = 0; xdma_callback(chan->xchan, &status); mtx_unlock(&chan->mtx); } } static int softdma_proc_create(struct softdma_channel *chan) { struct softdma_softc *sc; sc = chan->sc; if (chan->p != NULL) { /* Already created */ return (0); } mtx_init(&chan->mtx, "SoftDMA", NULL, MTX_DEF); if (kproc_create(softdma_worker, (void *)chan, &chan->p, 0, 0, "softdma_worker") != 0) { device_printf(sc->dev, "%s: Failed to create worker thread.\n", __func__); return (-1); } return (0); } static int softdma_channel_alloc(device_t dev, struct xdma_channel *xchan) { struct softdma_channel *chan; struct softdma_softc *sc; int i; sc = device_get_softc(dev); for (i = 0; i < SOFTDMA_NCHANNELS; i++) { chan = &sc->channels[i]; if (chan->used == 0) { chan->xchan = xchan; xchan->chan = (void *)chan; + xchan->caps |= XCHAN_CAP_NOBUFS; + xchan->caps |= XCHAN_CAP_NOSEG; chan->index = i; chan->idx_head = 0; chan->idx_tail = 0; chan->descs_used_count = 0; chan->descs_num = 1024; chan->sc = sc; if (softdma_proc_create(chan) != 0) { return (-1); } chan->used = 1; return (0); } } return (-1); } static int softdma_channel_free(device_t dev, struct xdma_channel *xchan) { struct softdma_channel *chan; struct softdma_softc *sc; sc = device_get_softc(dev); chan = (struct softdma_channel *)xchan->chan; if (chan->descs != NULL) { free(chan->descs, M_DEVBUF); } chan->used = 0; return (0); } static int softdma_desc_alloc(struct xdma_channel *xchan) { struct softdma_channel *chan; uint32_t nsegments; chan = (struct softdma_channel *)xchan->chan; nsegments = chan->descs_num; chan->descs = malloc(nsegments * sizeof(struct softdma_desc), M_DEVBUF, (M_WAITOK | M_ZERO)); return (0); } static int softdma_channel_prep_sg(device_t dev, struct xdma_channel *xchan) { struct softdma_channel *chan; struct softdma_desc *desc; struct softdma_softc *sc; int ret; int i; sc = device_get_softc(dev); chan = (struct softdma_channel *)xchan->chan; ret = softdma_desc_alloc(xchan); if (ret != 0) { device_printf(sc->dev, "%s: Can't allocate descriptors.\n", __func__); return (-1); } for (i = 0; i < chan->descs_num; i++) { desc = &chan->descs[i]; if (i == (chan->descs_num - 1)) { desc->next = &chan->descs[0]; } else { desc->next = &chan->descs[i+1]; } } return (0); } static int softdma_channel_capacity(device_t dev, xdma_channel_t *xchan, uint32_t *capacity) { struct softdma_channel *chan; uint32_t c; chan = (struct softdma_channel *)xchan->chan; /* At least one descriptor must be left empty. 
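 * Reserving one slot presumably keeps idx_head from catching up with
 * idx_tail while descriptors are outstanding, which is why the capacity
 * reported below is descs_num - descs_used_count - 1 rather than the full
 * ring size.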
*/ c = (chan->descs_num - chan->descs_used_count - 1); *capacity = c; return (0); } static int softdma_channel_submit_sg(device_t dev, struct xdma_channel *xchan, struct xdma_sglist *sg, uint32_t sg_n) { struct softdma_channel *chan; struct softdma_desc *desc; struct softdma_softc *sc; uint32_t enqueued; uint32_t saved_dir; uint32_t tmp; uint32_t len; int i; sc = device_get_softc(dev); chan = (struct softdma_channel *)xchan->chan; enqueued = 0; for (i = 0; i < sg_n; i++) { len = (uint32_t)sg[i].len; desc = &chan->descs[chan->idx_head]; desc->src_addr = sg[i].src_addr; desc->dst_addr = sg[i].dst_addr; if (sg[i].direction == XDMA_MEM_TO_DEV) { desc->src_incr = 1; desc->dst_incr = 0; } else { desc->src_incr = 0; desc->dst_incr = 1; } desc->direction = sg[i].direction; saved_dir = sg[i].direction; desc->len = len; desc->transfered = 0; desc->status = 0; desc->reserved = 0; desc->control = 0; if (sg[i].first == 1) desc->control |= CONTROL_GEN_SOP; if (sg[i].last == 1) desc->control |= CONTROL_GEN_EOP; tmp = chan->idx_head; chan->idx_head = softdma_next_desc(chan, chan->idx_head); atomic_add_int(&chan->descs_used_count, 1); desc->control |= CONTROL_OWN; enqueued += 1; } if (enqueued == 0) return (0); if (saved_dir == XDMA_MEM_TO_DEV) { chan->run = 1; wakeup(chan); } else softdma_memc_write(sc, A_ONCHIP_FIFO_MEM_CORE_STATUS_REG_INT_ENABLE, SOFTDMA_RX_EVENTS); return (0); } static int softdma_channel_request(device_t dev, struct xdma_channel *xchan, struct xdma_request *req) { struct softdma_channel *chan; struct softdma_desc *desc; struct softdma_softc *sc; int ret; sc = device_get_softc(dev); chan = (struct softdma_channel *)xchan->chan; ret = softdma_desc_alloc(xchan); if (ret != 0) { device_printf(sc->dev, "%s: Can't allocate descriptors.\n", __func__); return (-1); } desc = &chan->descs[0]; desc->src_addr = req->src_addr; desc->dst_addr = req->dst_addr; desc->len = req->block_len; desc->src_incr = 1; desc->dst_incr = 1; desc->next = NULL; return (0); } static int softdma_channel_control(device_t dev, xdma_channel_t *xchan, int cmd) { struct softdma_channel *chan; struct softdma_softc *sc; sc = device_get_softc(dev); chan = (struct softdma_channel *)xchan->chan; switch (cmd) { case XDMA_CMD_BEGIN: case XDMA_CMD_TERMINATE: case XDMA_CMD_PAUSE: /* TODO: implement me */ return (-1); } return (0); } #ifdef FDT static int softdma_ofw_md_data(device_t dev, pcell_t *cells, int ncells, void **ptr) { return (0); } #endif static device_method_t softdma_methods[] = { /* Device interface */ DEVMETHOD(device_probe, softdma_probe), DEVMETHOD(device_attach, softdma_attach), DEVMETHOD(device_detach, softdma_detach), /* xDMA Interface */ DEVMETHOD(xdma_channel_alloc, softdma_channel_alloc), DEVMETHOD(xdma_channel_free, softdma_channel_free), DEVMETHOD(xdma_channel_request, softdma_channel_request), DEVMETHOD(xdma_channel_control, softdma_channel_control), /* xDMA SG Interface */ DEVMETHOD(xdma_channel_prep_sg, softdma_channel_prep_sg), DEVMETHOD(xdma_channel_submit_sg, softdma_channel_submit_sg), DEVMETHOD(xdma_channel_capacity, softdma_channel_capacity), #ifdef FDT DEVMETHOD(xdma_ofw_md_data, softdma_ofw_md_data), #endif DEVMETHOD_END }; static driver_t softdma_driver = { "softdma", softdma_methods, sizeof(struct softdma_softc), }; static devclass_t softdma_devclass; EARLY_DRIVER_MODULE(softdma, simplebus, softdma_driver, softdma_devclass, 0, 0, BUS_PASS_INTERRUPT + BUS_PASS_ORDER_LATE); Index: user/ngie/bug-237403/sys/dev/cadence/if_cgem.c =================================================================== 
--- user/ngie/bug-237403/sys/dev/cadence/if_cgem.c (revision 346925) +++ user/ngie/bug-237403/sys/dev/cadence/if_cgem.c (revision 346926) @@ -1,1868 +1,1874 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2012-2014 Thomas Skibo * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ /* * A network interface driver for Cadence GEM Gigabit Ethernet * interface such as the one used in Xilinx Zynq-7000 SoC. * * Reference: Zynq-7000 All Programmable SoC Technical Reference Manual. * (v1.4) November 16, 2012. Xilinx doc UG585. GEM is covered in Ch. 16 * and register definitions are in appendix B.18. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef INET #include #include #include #include #endif #include #include #include #include #include #include #include #include #include "miibus_if.h" #define IF_CGEM_NAME "cgem" #define CGEM_NUM_RX_DESCS 512 /* size of receive descriptor ring */ #define CGEM_NUM_TX_DESCS 512 /* size of transmit descriptor ring */ #define MAX_DESC_RING_SIZE (MAX(CGEM_NUM_RX_DESCS*sizeof(struct cgem_rx_desc),\ CGEM_NUM_TX_DESCS*sizeof(struct cgem_tx_desc))) /* Default for sysctl rxbufs. Must be < CGEM_NUM_RX_DESCS of course. */ #define DEFAULT_NUM_RX_BUFS 256 /* number of receive bufs to queue. 
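 * This is the initial value behind the writable "rxbufs" sysctl registered
 * in cgem_add_sysctls(), so it can be tuned per device at run time.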
*/ #define TX_MAX_DMA_SEGS 8 /* maximum segs in a tx mbuf dma */ #define CGEM_CKSUM_ASSIST (CSUM_IP | CSUM_TCP | CSUM_UDP | \ CSUM_TCP_IPV6 | CSUM_UDP_IPV6) +static struct ofw_compat_data compat_data[] = { + { "cadence,gem", 1 }, + { "cdns,macb", 1 }, + { NULL, 0 }, +}; + struct cgem_softc { if_t ifp; struct mtx sc_mtx; device_t dev; device_t miibus; u_int mii_media_active; /* last active media */ int if_old_flags; struct resource *mem_res; struct resource *irq_res; void *intrhand; struct callout tick_ch; uint32_t net_ctl_shadow; int ref_clk_num; u_char eaddr[6]; bus_dma_tag_t desc_dma_tag; bus_dma_tag_t mbuf_dma_tag; /* receive descriptor ring */ struct cgem_rx_desc *rxring; bus_addr_t rxring_physaddr; struct mbuf *rxring_m[CGEM_NUM_RX_DESCS]; bus_dmamap_t rxring_m_dmamap[CGEM_NUM_RX_DESCS]; int rxring_hd_ptr; /* where to put rcv bufs */ int rxring_tl_ptr; /* where to get receives */ int rxring_queued; /* how many rcv bufs queued */ bus_dmamap_t rxring_dma_map; int rxbufs; /* tunable number rcv bufs */ int rxhangwar; /* rx hang work-around */ u_int rxoverruns; /* rx overruns */ u_int rxnobufs; /* rx buf ring empty events */ u_int rxdmamapfails; /* rx dmamap failures */ uint32_t rx_frames_prev; /* transmit descriptor ring */ struct cgem_tx_desc *txring; bus_addr_t txring_physaddr; struct mbuf *txring_m[CGEM_NUM_TX_DESCS]; bus_dmamap_t txring_m_dmamap[CGEM_NUM_TX_DESCS]; int txring_hd_ptr; /* where to put next xmits */ int txring_tl_ptr; /* next xmit mbuf to free */ int txring_queued; /* num xmits segs queued */ bus_dmamap_t txring_dma_map; u_int txfull; /* tx ring full events */ u_int txdefrags; /* tx calls to m_defrag() */ u_int txdefragfails; /* tx m_defrag() failures */ u_int txdmamapfails; /* tx dmamap failures */ /* hardware provided statistics */ struct cgem_hw_stats { uint64_t tx_bytes; uint32_t tx_frames; uint32_t tx_frames_bcast; uint32_t tx_frames_multi; uint32_t tx_frames_pause; uint32_t tx_frames_64b; uint32_t tx_frames_65to127b; uint32_t tx_frames_128to255b; uint32_t tx_frames_256to511b; uint32_t tx_frames_512to1023b; uint32_t tx_frames_1024to1536b; uint32_t tx_under_runs; uint32_t tx_single_collisn; uint32_t tx_multi_collisn; uint32_t tx_excsv_collisn; uint32_t tx_late_collisn; uint32_t tx_deferred_frames; uint32_t tx_carrier_sense_errs; uint64_t rx_bytes; uint32_t rx_frames; uint32_t rx_frames_bcast; uint32_t rx_frames_multi; uint32_t rx_frames_pause; uint32_t rx_frames_64b; uint32_t rx_frames_65to127b; uint32_t rx_frames_128to255b; uint32_t rx_frames_256to511b; uint32_t rx_frames_512to1023b; uint32_t rx_frames_1024to1536b; uint32_t rx_frames_undersize; uint32_t rx_frames_oversize; uint32_t rx_frames_jabber; uint32_t rx_frames_fcs_errs; uint32_t rx_frames_length_errs; uint32_t rx_symbol_errs; uint32_t rx_align_errs; uint32_t rx_resource_errs; uint32_t rx_overrun_errs; uint32_t rx_ip_hdr_csum_errs; uint32_t rx_tcp_csum_errs; uint32_t rx_udp_csum_errs; } stats; }; #define RD4(sc, off) (bus_read_4((sc)->mem_res, (off))) #define WR4(sc, off, val) (bus_write_4((sc)->mem_res, (off), (val))) #define BARRIER(sc, off, len, flags) \ (bus_barrier((sc)->mem_res, (off), (len), (flags)) #define CGEM_LOCK(sc) mtx_lock(&(sc)->sc_mtx) #define CGEM_UNLOCK(sc) mtx_unlock(&(sc)->sc_mtx) #define CGEM_LOCK_INIT(sc) \ mtx_init(&(sc)->sc_mtx, device_get_nameunit((sc)->dev), \ MTX_NETWORK_LOCK, MTX_DEF) #define CGEM_LOCK_DESTROY(sc) mtx_destroy(&(sc)->sc_mtx) #define CGEM_ASSERT_LOCKED(sc) mtx_assert(&(sc)->sc_mtx, MA_OWNED) /* Allow platforms to optionally provide a way to set the reference clock. 
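 * cgem_set_ref_clk() is only a weak symbol here: the stub
 * cgem_default_set_ref_clk() further down returns 0, and a platform that
 * actually controls the reference clock can provide its own strong
 * definition.  cgem_mediachange() calls it whenever the media speed
 * changes.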
*/ int cgem_set_ref_clk(int unit, int frequency); static devclass_t cgem_devclass; static int cgem_probe(device_t dev); static int cgem_attach(device_t dev); static int cgem_detach(device_t dev); static void cgem_tick(void *); static void cgem_intr(void *); static void cgem_mediachange(struct cgem_softc *, struct mii_data *); static void cgem_get_mac(struct cgem_softc *sc, u_char eaddr[]) { int i; uint32_t rnd; /* See if boot loader gave us a MAC address already. */ for (i = 0; i < 4; i++) { uint32_t low = RD4(sc, CGEM_SPEC_ADDR_LOW(i)); uint32_t high = RD4(sc, CGEM_SPEC_ADDR_HI(i)) & 0xffff; if (low != 0 || high != 0) { eaddr[0] = low & 0xff; eaddr[1] = (low >> 8) & 0xff; eaddr[2] = (low >> 16) & 0xff; eaddr[3] = (low >> 24) & 0xff; eaddr[4] = high & 0xff; eaddr[5] = (high >> 8) & 0xff; break; } } /* No MAC from boot loader? Assign a random one. */ if (i == 4) { rnd = arc4random(); eaddr[0] = 'b'; eaddr[1] = 's'; eaddr[2] = 'd'; eaddr[3] = (rnd >> 16) & 0xff; eaddr[4] = (rnd >> 8) & 0xff; eaddr[5] = rnd & 0xff; device_printf(sc->dev, "no mac address found, assigning " "random: %02x:%02x:%02x:%02x:%02x:%02x\n", eaddr[0], eaddr[1], eaddr[2], eaddr[3], eaddr[4], eaddr[5]); } /* Move address to first slot and zero out the rest. */ WR4(sc, CGEM_SPEC_ADDR_LOW(0), (eaddr[3] << 24) | (eaddr[2] << 16) | (eaddr[1] << 8) | eaddr[0]); WR4(sc, CGEM_SPEC_ADDR_HI(0), (eaddr[5] << 8) | eaddr[4]); for (i = 1; i < 4; i++) { WR4(sc, CGEM_SPEC_ADDR_LOW(i), 0); WR4(sc, CGEM_SPEC_ADDR_HI(i), 0); } } /* cgem_mac_hash(): map 48-bit address to a 6-bit hash. * The 6-bit hash corresponds to a bit in a 64-bit hash * register. Setting that bit in the hash register enables * reception of all frames with a destination address that hashes * to that 6-bit value. * * The hash function is described in sec. 16.2.3 in the Zynq-7000 Tech * Reference Manual. Bits 0-5 in the hash are the exclusive-or of * every sixth bit in the destination address. */ static int cgem_mac_hash(u_char eaddr[]) { int hash; int i, j; hash = 0; for (i = 0; i < 6; i++) for (j = i; j < 48; j += 6) if ((eaddr[j >> 3] & (1 << (j & 7))) != 0) hash ^= (1 << i); return hash; } /* After any change in rx flags or multi-cast addresses, set up * hash registers and net config register bits. */ static void cgem_rx_filter(struct cgem_softc *sc) { if_t ifp = sc->ifp; u_char *mta; int index, i, mcnt; uint32_t hash_hi, hash_lo; uint32_t net_cfg; hash_hi = 0; hash_lo = 0; net_cfg = RD4(sc, CGEM_NET_CFG); net_cfg &= ~(CGEM_NET_CFG_MULTI_HASH_EN | CGEM_NET_CFG_NO_BCAST | CGEM_NET_CFG_COPY_ALL); if ((if_getflags(ifp) & IFF_PROMISC) != 0) net_cfg |= CGEM_NET_CFG_COPY_ALL; else { if ((if_getflags(ifp) & IFF_BROADCAST) == 0) net_cfg |= CGEM_NET_CFG_NO_BCAST; if ((if_getflags(ifp) & IFF_ALLMULTI) != 0) { hash_hi = 0xffffffff; hash_lo = 0xffffffff; } else { mcnt = if_multiaddr_count(ifp, -1); mta = malloc(ETHER_ADDR_LEN * mcnt, M_DEVBUF, M_NOWAIT); if (mta == NULL) { device_printf(sc->dev, "failed to allocate temp mcast list\n"); return; } if_multiaddr_array(ifp, mta, &mcnt, mcnt); for (i = 0; i < mcnt; i++) { index = cgem_mac_hash( LLADDR((struct sockaddr_dl *) (mta + (i * ETHER_ADDR_LEN)))); if (index > 31) hash_hi |= (1 << (index - 32)); else hash_lo |= (1 << index); } free(mta, M_DEVBUF); } if (hash_hi != 0 || hash_lo != 0) net_cfg |= CGEM_NET_CFG_MULTI_HASH_EN; } WR4(sc, CGEM_HASH_TOP, hash_hi); WR4(sc, CGEM_HASH_BOT, hash_lo); WR4(sc, CGEM_NET_CFG, net_cfg); } /* For bus_dmamap_load() callback. 
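 * cgem_getaddr() simply copies the single segment's bus address into the
 * bus_addr_t the caller passed as its argument; it is used when loading
 * the RX and TX descriptor rings.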
*/ static void cgem_getaddr(void *arg, bus_dma_segment_t *segs, int nsegs, int error) { if (nsegs != 1 || error != 0) return; *(bus_addr_t *)arg = segs[0].ds_addr; } /* Create DMA'able descriptor rings. */ static int cgem_setup_descs(struct cgem_softc *sc) { int i, err; sc->txring = NULL; sc->rxring = NULL; /* Allocate non-cached DMA space for RX and TX descriptors. */ err = bus_dma_tag_create(bus_get_dma_tag(sc->dev), 1, 0, BUS_SPACE_MAXADDR_32BIT, BUS_SPACE_MAXADDR, NULL, NULL, MAX_DESC_RING_SIZE, 1, MAX_DESC_RING_SIZE, 0, busdma_lock_mutex, &sc->sc_mtx, &sc->desc_dma_tag); if (err) return (err); /* Set up a bus_dma_tag for mbufs. */ err = bus_dma_tag_create(bus_get_dma_tag(sc->dev), 1, 0, BUS_SPACE_MAXADDR_32BIT, BUS_SPACE_MAXADDR, NULL, NULL, MCLBYTES, TX_MAX_DMA_SEGS, MCLBYTES, 0, busdma_lock_mutex, &sc->sc_mtx, &sc->mbuf_dma_tag); if (err) return (err); /* Allocate DMA memory in non-cacheable space. */ err = bus_dmamem_alloc(sc->desc_dma_tag, (void **)&sc->rxring, BUS_DMA_NOWAIT | BUS_DMA_COHERENT, &sc->rxring_dma_map); if (err) return (err); /* Load descriptor DMA memory. */ err = bus_dmamap_load(sc->desc_dma_tag, sc->rxring_dma_map, (void *)sc->rxring, CGEM_NUM_RX_DESCS*sizeof(struct cgem_rx_desc), cgem_getaddr, &sc->rxring_physaddr, BUS_DMA_NOWAIT); if (err) return (err); /* Initialize RX descriptors. */ for (i = 0; i < CGEM_NUM_RX_DESCS; i++) { sc->rxring[i].addr = CGEM_RXDESC_OWN; sc->rxring[i].ctl = 0; sc->rxring_m[i] = NULL; sc->rxring_m_dmamap[i] = NULL; } sc->rxring[CGEM_NUM_RX_DESCS - 1].addr |= CGEM_RXDESC_WRAP; sc->rxring_hd_ptr = 0; sc->rxring_tl_ptr = 0; sc->rxring_queued = 0; /* Allocate DMA memory for TX descriptors in non-cacheable space. */ err = bus_dmamem_alloc(sc->desc_dma_tag, (void **)&sc->txring, BUS_DMA_NOWAIT | BUS_DMA_COHERENT, &sc->txring_dma_map); if (err) return (err); /* Load TX descriptor DMA memory. */ err = bus_dmamap_load(sc->desc_dma_tag, sc->txring_dma_map, (void *)sc->txring, CGEM_NUM_TX_DESCS*sizeof(struct cgem_tx_desc), cgem_getaddr, &sc->txring_physaddr, BUS_DMA_NOWAIT); if (err) return (err); /* Initialize TX descriptor ring. */ for (i = 0; i < CGEM_NUM_TX_DESCS; i++) { sc->txring[i].addr = 0; sc->txring[i].ctl = CGEM_TXDESC_USED; sc->txring_m[i] = NULL; sc->txring_m_dmamap[i] = NULL; } sc->txring[CGEM_NUM_TX_DESCS - 1].ctl |= CGEM_TXDESC_WRAP; sc->txring_hd_ptr = 0; sc->txring_tl_ptr = 0; sc->txring_queued = 0; return (0); } /* Fill receive descriptor ring with mbufs. */ static void cgem_fill_rqueue(struct cgem_softc *sc) { struct mbuf *m = NULL; bus_dma_segment_t segs[TX_MAX_DMA_SEGS]; int nsegs; CGEM_ASSERT_LOCKED(sc); while (sc->rxring_queued < sc->rxbufs) { /* Get a cluster mbuf. */ m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR); if (m == NULL) break; m->m_len = MCLBYTES; m->m_pkthdr.len = MCLBYTES; m->m_pkthdr.rcvif = sc->ifp; /* Load map and plug in physical address. */ if (bus_dmamap_create(sc->mbuf_dma_tag, 0, &sc->rxring_m_dmamap[sc->rxring_hd_ptr])) { sc->rxdmamapfails++; m_free(m); break; } if (bus_dmamap_load_mbuf_sg(sc->mbuf_dma_tag, sc->rxring_m_dmamap[sc->rxring_hd_ptr], m, segs, &nsegs, BUS_DMA_NOWAIT)) { sc->rxdmamapfails++; bus_dmamap_destroy(sc->mbuf_dma_tag, sc->rxring_m_dmamap[sc->rxring_hd_ptr]); sc->rxring_m_dmamap[sc->rxring_hd_ptr] = NULL; m_free(m); break; } sc->rxring_m[sc->rxring_hd_ptr] = m; /* Sync cache with receive buffer. */ bus_dmamap_sync(sc->mbuf_dma_tag, sc->rxring_m_dmamap[sc->rxring_hd_ptr], BUS_DMASYNC_PREREAD); /* Write rx descriptor and increment head pointer. 
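 * The last slot in the ring gets CGEM_RXDESC_WRAP or'ed into its address
 * word and the head pointer is reset to zero; otherwise the address is
 * written as-is and the pointer simply advances.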
*/ sc->rxring[sc->rxring_hd_ptr].ctl = 0; if (sc->rxring_hd_ptr == CGEM_NUM_RX_DESCS - 1) { sc->rxring[sc->rxring_hd_ptr].addr = segs[0].ds_addr | CGEM_RXDESC_WRAP; sc->rxring_hd_ptr = 0; } else sc->rxring[sc->rxring_hd_ptr++].addr = segs[0].ds_addr; sc->rxring_queued++; } } /* Pull received packets off of receive descriptor ring. */ static void cgem_recv(struct cgem_softc *sc) { if_t ifp = sc->ifp; struct mbuf *m, *m_hd, **m_tl; uint32_t ctl; CGEM_ASSERT_LOCKED(sc); /* Pick up all packets in which the OWN bit is set. */ m_hd = NULL; m_tl = &m_hd; while (sc->rxring_queued > 0 && (sc->rxring[sc->rxring_tl_ptr].addr & CGEM_RXDESC_OWN) != 0) { ctl = sc->rxring[sc->rxring_tl_ptr].ctl; /* Grab filled mbuf. */ m = sc->rxring_m[sc->rxring_tl_ptr]; sc->rxring_m[sc->rxring_tl_ptr] = NULL; /* Sync cache with receive buffer. */ bus_dmamap_sync(sc->mbuf_dma_tag, sc->rxring_m_dmamap[sc->rxring_tl_ptr], BUS_DMASYNC_POSTREAD); /* Unload and destroy dmamap. */ bus_dmamap_unload(sc->mbuf_dma_tag, sc->rxring_m_dmamap[sc->rxring_tl_ptr]); bus_dmamap_destroy(sc->mbuf_dma_tag, sc->rxring_m_dmamap[sc->rxring_tl_ptr]); sc->rxring_m_dmamap[sc->rxring_tl_ptr] = NULL; /* Increment tail pointer. */ if (++sc->rxring_tl_ptr == CGEM_NUM_RX_DESCS) sc->rxring_tl_ptr = 0; sc->rxring_queued--; /* Check FCS and make sure entire packet landed in one mbuf * cluster (which is much bigger than the largest ethernet * packet). */ if ((ctl & CGEM_RXDESC_BAD_FCS) != 0 || (ctl & (CGEM_RXDESC_SOF | CGEM_RXDESC_EOF)) != (CGEM_RXDESC_SOF | CGEM_RXDESC_EOF)) { /* discard. */ m_free(m); if_inc_counter(ifp, IFCOUNTER_IERRORS, 1); continue; } /* Ready it to hand off to upper layers. */ m->m_data += ETHER_ALIGN; m->m_len = (ctl & CGEM_RXDESC_LENGTH_MASK); m->m_pkthdr.rcvif = ifp; m->m_pkthdr.len = m->m_len; /* Are we using hardware checksumming? Check the * status in the receive descriptor. */ if ((if_getcapenable(ifp) & IFCAP_RXCSUM) != 0) { /* TCP or UDP checks out, IP checks out too. */ if ((ctl & CGEM_RXDESC_CKSUM_STAT_MASK) == CGEM_RXDESC_CKSUM_STAT_TCP_GOOD || (ctl & CGEM_RXDESC_CKSUM_STAT_MASK) == CGEM_RXDESC_CKSUM_STAT_UDP_GOOD) { m->m_pkthdr.csum_flags |= CSUM_IP_CHECKED | CSUM_IP_VALID | CSUM_DATA_VALID | CSUM_PSEUDO_HDR; m->m_pkthdr.csum_data = 0xffff; } else if ((ctl & CGEM_RXDESC_CKSUM_STAT_MASK) == CGEM_RXDESC_CKSUM_STAT_IP_GOOD) { /* Only IP checks out. */ m->m_pkthdr.csum_flags |= CSUM_IP_CHECKED | CSUM_IP_VALID; m->m_pkthdr.csum_data = 0xffff; } } /* Queue it up for delivery below. */ *m_tl = m; m_tl = &m->m_next; } /* Replenish receive buffers. */ cgem_fill_rqueue(sc); /* Unlock and send up packets. */ CGEM_UNLOCK(sc); while (m_hd != NULL) { m = m_hd; m_hd = m_hd->m_next; m->m_next = NULL; if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1); if_input(ifp, m); } CGEM_LOCK(sc); } /* Find completed transmits and free their mbufs. */ static void cgem_clean_tx(struct cgem_softc *sc) { struct mbuf *m; uint32_t ctl; CGEM_ASSERT_LOCKED(sc); /* free up finished transmits. */ while (sc->txring_queued > 0 && ((ctl = sc->txring[sc->txring_tl_ptr].ctl) & CGEM_TXDESC_USED) != 0) { /* Sync cache. */ bus_dmamap_sync(sc->mbuf_dma_tag, sc->txring_m_dmamap[sc->txring_tl_ptr], BUS_DMASYNC_POSTWRITE); /* Unload and destroy DMA map. */ bus_dmamap_unload(sc->mbuf_dma_tag, sc->txring_m_dmamap[sc->txring_tl_ptr]); bus_dmamap_destroy(sc->mbuf_dma_tag, sc->txring_m_dmamap[sc->txring_tl_ptr]); sc->txring_m_dmamap[sc->txring_tl_ptr] = NULL; /* Free up the mbuf. 
*/ m = sc->txring_m[sc->txring_tl_ptr]; sc->txring_m[sc->txring_tl_ptr] = NULL; m_freem(m); /* Check the status. */ if ((ctl & CGEM_TXDESC_AHB_ERR) != 0) { /* Serious bus error. log to console. */ device_printf(sc->dev, "cgem_clean_tx: Whoa! " "AHB error, addr=0x%x\n", sc->txring[sc->txring_tl_ptr].addr); } else if ((ctl & (CGEM_TXDESC_RETRY_ERR | CGEM_TXDESC_LATE_COLL)) != 0) { if_inc_counter(sc->ifp, IFCOUNTER_OERRORS, 1); } else if_inc_counter(sc->ifp, IFCOUNTER_OPACKETS, 1); /* If the packet spanned more than one tx descriptor, * skip descriptors until we find the end so that only * start-of-frame descriptors are processed. */ while ((ctl & CGEM_TXDESC_LAST_BUF) == 0) { if ((ctl & CGEM_TXDESC_WRAP) != 0) sc->txring_tl_ptr = 0; else sc->txring_tl_ptr++; sc->txring_queued--; ctl = sc->txring[sc->txring_tl_ptr].ctl; sc->txring[sc->txring_tl_ptr].ctl = ctl | CGEM_TXDESC_USED; } /* Next descriptor. */ if ((ctl & CGEM_TXDESC_WRAP) != 0) sc->txring_tl_ptr = 0; else sc->txring_tl_ptr++; sc->txring_queued--; if_setdrvflagbits(sc->ifp, 0, IFF_DRV_OACTIVE); } } /* Start transmits. */ static void cgem_start_locked(if_t ifp) { struct cgem_softc *sc = (struct cgem_softc *) if_getsoftc(ifp); struct mbuf *m; bus_dma_segment_t segs[TX_MAX_DMA_SEGS]; uint32_t ctl; int i, nsegs, wrap, err; CGEM_ASSERT_LOCKED(sc); if ((if_getdrvflags(ifp) & IFF_DRV_OACTIVE) != 0) return; for (;;) { /* Check that there is room in the descriptor ring. */ if (sc->txring_queued >= CGEM_NUM_TX_DESCS - TX_MAX_DMA_SEGS * 2) { /* Try to make room. */ cgem_clean_tx(sc); /* Still no room? */ if (sc->txring_queued >= CGEM_NUM_TX_DESCS - TX_MAX_DMA_SEGS * 2) { if_setdrvflagbits(ifp, IFF_DRV_OACTIVE, 0); sc->txfull++; break; } } /* Grab next transmit packet. */ m = if_dequeue(ifp); if (m == NULL) break; /* Create and load DMA map. */ if (bus_dmamap_create(sc->mbuf_dma_tag, 0, &sc->txring_m_dmamap[sc->txring_hd_ptr])) { m_freem(m); sc->txdmamapfails++; continue; } err = bus_dmamap_load_mbuf_sg(sc->mbuf_dma_tag, sc->txring_m_dmamap[sc->txring_hd_ptr], m, segs, &nsegs, BUS_DMA_NOWAIT); if (err == EFBIG) { /* Too many segments! defrag and try again. */ struct mbuf *m2 = m_defrag(m, M_NOWAIT); if (m2 == NULL) { sc->txdefragfails++; m_freem(m); bus_dmamap_destroy(sc->mbuf_dma_tag, sc->txring_m_dmamap[sc->txring_hd_ptr]); sc->txring_m_dmamap[sc->txring_hd_ptr] = NULL; continue; } m = m2; err = bus_dmamap_load_mbuf_sg(sc->mbuf_dma_tag, sc->txring_m_dmamap[sc->txring_hd_ptr], m, segs, &nsegs, BUS_DMA_NOWAIT); sc->txdefrags++; } if (err) { /* Give up. */ m_freem(m); bus_dmamap_destroy(sc->mbuf_dma_tag, sc->txring_m_dmamap[sc->txring_hd_ptr]); sc->txring_m_dmamap[sc->txring_hd_ptr] = NULL; sc->txdmamapfails++; continue; } sc->txring_m[sc->txring_hd_ptr] = m; /* Sync tx buffer with cache. */ bus_dmamap_sync(sc->mbuf_dma_tag, sc->txring_m_dmamap[sc->txring_hd_ptr], BUS_DMASYNC_PREWRITE); /* Set wrap flag if next packet might run off end of ring. */ wrap = sc->txring_hd_ptr + nsegs + TX_MAX_DMA_SEGS >= CGEM_NUM_TX_DESCS; /* Fill in the TX descriptors back to front so that USED * bit in first descriptor is cleared last. */ for (i = nsegs - 1; i >= 0; i--) { /* Descriptor address. */ sc->txring[sc->txring_hd_ptr + i].addr = segs[i].ds_addr; /* Descriptor control word. 
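 * ctl starts out as the segment length; the last segment also gets
 * CGEM_TXDESC_LAST_BUF, plus CGEM_TXDESC_WRAP when the ring is about to
 * wrap.  Writing ctl clears the USED bit, and because the segments are
 * filled back to front the first descriptor is handed to the hardware
 * last.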
*/ ctl = segs[i].ds_len; if (i == nsegs - 1) { ctl |= CGEM_TXDESC_LAST_BUF; if (wrap) ctl |= CGEM_TXDESC_WRAP; } sc->txring[sc->txring_hd_ptr + i].ctl = ctl; if (i != 0) sc->txring_m[sc->txring_hd_ptr + i] = NULL; } if (wrap) sc->txring_hd_ptr = 0; else sc->txring_hd_ptr += nsegs; sc->txring_queued += nsegs; /* Kick the transmitter. */ WR4(sc, CGEM_NET_CTRL, sc->net_ctl_shadow | CGEM_NET_CTRL_START_TX); /* If there is a BPF listener, bounce a copy to him. */ ETHER_BPF_MTAP(ifp, m); } } static void cgem_start(if_t ifp) { struct cgem_softc *sc = (struct cgem_softc *) if_getsoftc(ifp); CGEM_LOCK(sc); cgem_start_locked(ifp); CGEM_UNLOCK(sc); } static void cgem_poll_hw_stats(struct cgem_softc *sc) { uint32_t n; CGEM_ASSERT_LOCKED(sc); sc->stats.tx_bytes += RD4(sc, CGEM_OCTETS_TX_BOT); sc->stats.tx_bytes += (uint64_t)RD4(sc, CGEM_OCTETS_TX_TOP) << 32; sc->stats.tx_frames += RD4(sc, CGEM_FRAMES_TX); sc->stats.tx_frames_bcast += RD4(sc, CGEM_BCAST_FRAMES_TX); sc->stats.tx_frames_multi += RD4(sc, CGEM_MULTI_FRAMES_TX); sc->stats.tx_frames_pause += RD4(sc, CGEM_PAUSE_FRAMES_TX); sc->stats.tx_frames_64b += RD4(sc, CGEM_FRAMES_64B_TX); sc->stats.tx_frames_65to127b += RD4(sc, CGEM_FRAMES_65_127B_TX); sc->stats.tx_frames_128to255b += RD4(sc, CGEM_FRAMES_128_255B_TX); sc->stats.tx_frames_256to511b += RD4(sc, CGEM_FRAMES_256_511B_TX); sc->stats.tx_frames_512to1023b += RD4(sc, CGEM_FRAMES_512_1023B_TX); sc->stats.tx_frames_1024to1536b += RD4(sc, CGEM_FRAMES_1024_1518B_TX); sc->stats.tx_under_runs += RD4(sc, CGEM_TX_UNDERRUNS); n = RD4(sc, CGEM_SINGLE_COLL_FRAMES); sc->stats.tx_single_collisn += n; if_inc_counter(sc->ifp, IFCOUNTER_COLLISIONS, n); n = RD4(sc, CGEM_MULTI_COLL_FRAMES); sc->stats.tx_multi_collisn += n; if_inc_counter(sc->ifp, IFCOUNTER_COLLISIONS, n); n = RD4(sc, CGEM_EXCESSIVE_COLL_FRAMES); sc->stats.tx_excsv_collisn += n; if_inc_counter(sc->ifp, IFCOUNTER_COLLISIONS, n); n = RD4(sc, CGEM_LATE_COLL); sc->stats.tx_late_collisn += n; if_inc_counter(sc->ifp, IFCOUNTER_COLLISIONS, n); sc->stats.tx_deferred_frames += RD4(sc, CGEM_DEFERRED_TX_FRAMES); sc->stats.tx_carrier_sense_errs += RD4(sc, CGEM_CARRIER_SENSE_ERRS); sc->stats.rx_bytes += RD4(sc, CGEM_OCTETS_RX_BOT); sc->stats.rx_bytes += (uint64_t)RD4(sc, CGEM_OCTETS_RX_TOP) << 32; sc->stats.rx_frames += RD4(sc, CGEM_FRAMES_RX); sc->stats.rx_frames_bcast += RD4(sc, CGEM_BCAST_FRAMES_RX); sc->stats.rx_frames_multi += RD4(sc, CGEM_MULTI_FRAMES_RX); sc->stats.rx_frames_pause += RD4(sc, CGEM_PAUSE_FRAMES_RX); sc->stats.rx_frames_64b += RD4(sc, CGEM_FRAMES_64B_RX); sc->stats.rx_frames_65to127b += RD4(sc, CGEM_FRAMES_65_127B_RX); sc->stats.rx_frames_128to255b += RD4(sc, CGEM_FRAMES_128_255B_RX); sc->stats.rx_frames_256to511b += RD4(sc, CGEM_FRAMES_256_511B_RX); sc->stats.rx_frames_512to1023b += RD4(sc, CGEM_FRAMES_512_1023B_RX); sc->stats.rx_frames_1024to1536b += RD4(sc, CGEM_FRAMES_1024_1518B_RX); sc->stats.rx_frames_undersize += RD4(sc, CGEM_UNDERSZ_RX); sc->stats.rx_frames_oversize += RD4(sc, CGEM_OVERSZ_RX); sc->stats.rx_frames_jabber += RD4(sc, CGEM_JABBERS_RX); sc->stats.rx_frames_fcs_errs += RD4(sc, CGEM_FCS_ERRS); sc->stats.rx_frames_length_errs += RD4(sc, CGEM_LENGTH_FIELD_ERRS); sc->stats.rx_symbol_errs += RD4(sc, CGEM_RX_SYMBOL_ERRS); sc->stats.rx_align_errs += RD4(sc, CGEM_ALIGN_ERRS); sc->stats.rx_resource_errs += RD4(sc, CGEM_RX_RESOURCE_ERRS); sc->stats.rx_overrun_errs += RD4(sc, CGEM_RX_OVERRUN_ERRS); sc->stats.rx_ip_hdr_csum_errs += RD4(sc, CGEM_IP_HDR_CKSUM_ERRS); sc->stats.rx_tcp_csum_errs += RD4(sc, CGEM_TCP_CKSUM_ERRS); 
sc->stats.rx_udp_csum_errs += RD4(sc, CGEM_UDP_CKSUM_ERRS); } static void cgem_tick(void *arg) { struct cgem_softc *sc = (struct cgem_softc *)arg; struct mii_data *mii; CGEM_ASSERT_LOCKED(sc); /* Poll the phy. */ if (sc->miibus != NULL) { mii = device_get_softc(sc->miibus); mii_tick(mii); } /* Poll statistics registers. */ cgem_poll_hw_stats(sc); /* Check for receiver hang. */ if (sc->rxhangwar && sc->rx_frames_prev == sc->stats.rx_frames) { /* * Reset receiver logic by toggling RX_EN bit. 1usec * delay is necessary especially when operating at 100mbps * and 10mbps speeds. */ WR4(sc, CGEM_NET_CTRL, sc->net_ctl_shadow & ~CGEM_NET_CTRL_RX_EN); DELAY(1); WR4(sc, CGEM_NET_CTRL, sc->net_ctl_shadow); } sc->rx_frames_prev = sc->stats.rx_frames; /* Next callout in one second. */ callout_reset(&sc->tick_ch, hz, cgem_tick, sc); } /* Interrupt handler. */ static void cgem_intr(void *arg) { struct cgem_softc *sc = (struct cgem_softc *)arg; if_t ifp = sc->ifp; uint32_t istatus; CGEM_LOCK(sc); if ((if_getdrvflags(ifp) & IFF_DRV_RUNNING) == 0) { CGEM_UNLOCK(sc); return; } /* Read interrupt status and immediately clear the bits. */ istatus = RD4(sc, CGEM_INTR_STAT); WR4(sc, CGEM_INTR_STAT, istatus); /* Packets received. */ if ((istatus & CGEM_INTR_RX_COMPLETE) != 0) cgem_recv(sc); /* Free up any completed transmit buffers. */ cgem_clean_tx(sc); /* Hresp not ok. Something is very bad with DMA. Try to clear. */ if ((istatus & CGEM_INTR_HRESP_NOT_OK) != 0) { device_printf(sc->dev, "cgem_intr: hresp not okay! " "rx_status=0x%x\n", RD4(sc, CGEM_RX_STAT)); WR4(sc, CGEM_RX_STAT, CGEM_RX_STAT_HRESP_NOT_OK); } /* Receiver overrun. */ if ((istatus & CGEM_INTR_RX_OVERRUN) != 0) { /* Clear status bit. */ WR4(sc, CGEM_RX_STAT, CGEM_RX_STAT_OVERRUN); sc->rxoverruns++; } /* Receiver ran out of bufs. */ if ((istatus & CGEM_INTR_RX_USED_READ) != 0) { WR4(sc, CGEM_NET_CTRL, sc->net_ctl_shadow | CGEM_NET_CTRL_FLUSH_DPRAM_PKT); cgem_fill_rqueue(sc); sc->rxnobufs++; } /* Restart transmitter if needed. */ if (!if_sendq_empty(ifp)) cgem_start_locked(ifp); CGEM_UNLOCK(sc); } /* Reset hardware. */ static void cgem_reset(struct cgem_softc *sc) { CGEM_ASSERT_LOCKED(sc); WR4(sc, CGEM_NET_CTRL, 0); WR4(sc, CGEM_NET_CFG, 0); WR4(sc, CGEM_NET_CTRL, CGEM_NET_CTRL_CLR_STAT_REGS); WR4(sc, CGEM_TX_STAT, CGEM_TX_STAT_ALL); WR4(sc, CGEM_RX_STAT, CGEM_RX_STAT_ALL); WR4(sc, CGEM_INTR_DIS, CGEM_INTR_ALL); WR4(sc, CGEM_HASH_BOT, 0); WR4(sc, CGEM_HASH_TOP, 0); WR4(sc, CGEM_TX_QBAR, 0); /* manual says do this. */ WR4(sc, CGEM_RX_QBAR, 0); /* Get management port running even if interface is down. */ WR4(sc, CGEM_NET_CFG, CGEM_NET_CFG_DBUS_WIDTH_32 | CGEM_NET_CFG_MDC_CLK_DIV_64); sc->net_ctl_shadow = CGEM_NET_CTRL_MGMT_PORT_EN; WR4(sc, CGEM_NET_CTRL, sc->net_ctl_shadow); } /* Bring up the hardware. */ static void cgem_config(struct cgem_softc *sc) { if_t ifp = sc->ifp; uint32_t net_cfg; uint32_t dma_cfg; u_char *eaddr = if_getlladdr(ifp); CGEM_ASSERT_LOCKED(sc); /* Program Net Config Register. */ net_cfg = CGEM_NET_CFG_DBUS_WIDTH_32 | CGEM_NET_CFG_MDC_CLK_DIV_64 | CGEM_NET_CFG_FCS_REMOVE | CGEM_NET_CFG_RX_BUF_OFFSET(ETHER_ALIGN) | CGEM_NET_CFG_GIGE_EN | CGEM_NET_CFG_1536RXEN | CGEM_NET_CFG_FULL_DUPLEX | CGEM_NET_CFG_SPEED100; /* Enable receive checksum offloading? */ if ((if_getcapenable(ifp) & IFCAP_RXCSUM) != 0) net_cfg |= CGEM_NET_CFG_RX_CHKSUM_OFFLD_EN; WR4(sc, CGEM_NET_CFG, net_cfg); /* Program DMA Config Register. 
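 * Besides the MCLBYTES receive buffer size, this picks the 8K RX packet
 * buffer size, the TX packet buffer size select, a fixed AHB burst length
 * of 16 and CGEM_DMA_CFG_DISC_WHEN_NO_AHB; checksum generation offload is
 * only added when IFCAP_TXCSUM is enabled.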
*/ dma_cfg = CGEM_DMA_CFG_RX_BUF_SIZE(MCLBYTES) | CGEM_DMA_CFG_RX_PKTBUF_MEMSZ_SEL_8K | CGEM_DMA_CFG_TX_PKTBUF_MEMSZ_SEL | CGEM_DMA_CFG_AHB_FIXED_BURST_LEN_16 | CGEM_DMA_CFG_DISC_WHEN_NO_AHB; /* Enable transmit checksum offloading? */ if ((if_getcapenable(ifp) & IFCAP_TXCSUM) != 0) dma_cfg |= CGEM_DMA_CFG_CHKSUM_GEN_OFFLOAD_EN; WR4(sc, CGEM_DMA_CFG, dma_cfg); /* Write the rx and tx descriptor ring addresses to the QBAR regs. */ WR4(sc, CGEM_RX_QBAR, (uint32_t) sc->rxring_physaddr); WR4(sc, CGEM_TX_QBAR, (uint32_t) sc->txring_physaddr); /* Enable rx and tx. */ sc->net_ctl_shadow |= (CGEM_NET_CTRL_TX_EN | CGEM_NET_CTRL_RX_EN); WR4(sc, CGEM_NET_CTRL, sc->net_ctl_shadow); /* Set receive address in case it changed. */ WR4(sc, CGEM_SPEC_ADDR_LOW(0), (eaddr[3] << 24) | (eaddr[2] << 16) | (eaddr[1] << 8) | eaddr[0]); WR4(sc, CGEM_SPEC_ADDR_HI(0), (eaddr[5] << 8) | eaddr[4]); /* Set up interrupts. */ WR4(sc, CGEM_INTR_EN, CGEM_INTR_RX_COMPLETE | CGEM_INTR_RX_OVERRUN | CGEM_INTR_TX_USED_READ | CGEM_INTR_RX_USED_READ | CGEM_INTR_HRESP_NOT_OK); } /* Turn on interface and load up receive ring with buffers. */ static void cgem_init_locked(struct cgem_softc *sc) { struct mii_data *mii; CGEM_ASSERT_LOCKED(sc); if ((if_getdrvflags(sc->ifp) & IFF_DRV_RUNNING) != 0) return; cgem_config(sc); cgem_fill_rqueue(sc); if_setdrvflagbits(sc->ifp, IFF_DRV_RUNNING, IFF_DRV_OACTIVE); mii = device_get_softc(sc->miibus); mii_mediachg(mii); callout_reset(&sc->tick_ch, hz, cgem_tick, sc); } static void cgem_init(void *arg) { struct cgem_softc *sc = (struct cgem_softc *)arg; CGEM_LOCK(sc); cgem_init_locked(sc); CGEM_UNLOCK(sc); } /* Turn off interface. Free up any buffers in transmit or receive queues. */ static void cgem_stop(struct cgem_softc *sc) { int i; CGEM_ASSERT_LOCKED(sc); callout_stop(&sc->tick_ch); /* Shut down hardware. */ cgem_reset(sc); /* Clear out transmit queue. */ for (i = 0; i < CGEM_NUM_TX_DESCS; i++) { sc->txring[i].ctl = CGEM_TXDESC_USED; sc->txring[i].addr = 0; if (sc->txring_m[i]) { /* Unload and destroy dmamap. */ bus_dmamap_unload(sc->mbuf_dma_tag, sc->txring_m_dmamap[i]); bus_dmamap_destroy(sc->mbuf_dma_tag, sc->txring_m_dmamap[i]); sc->txring_m_dmamap[i] = NULL; m_freem(sc->txring_m[i]); sc->txring_m[i] = NULL; } } sc->txring[CGEM_NUM_TX_DESCS - 1].ctl |= CGEM_TXDESC_WRAP; sc->txring_hd_ptr = 0; sc->txring_tl_ptr = 0; sc->txring_queued = 0; /* Clear out receive queue. */ for (i = 0; i < CGEM_NUM_RX_DESCS; i++) { sc->rxring[i].addr = CGEM_RXDESC_OWN; sc->rxring[i].ctl = 0; if (sc->rxring_m[i]) { /* Unload and destroy dmamap. */ bus_dmamap_unload(sc->mbuf_dma_tag, sc->rxring_m_dmamap[i]); bus_dmamap_destroy(sc->mbuf_dma_tag, sc->rxring_m_dmamap[i]); sc->rxring_m_dmamap[i] = NULL; m_freem(sc->rxring_m[i]); sc->rxring_m[i] = NULL; } } sc->rxring[CGEM_NUM_RX_DESCS - 1].addr |= CGEM_RXDESC_WRAP; sc->rxring_hd_ptr = 0; sc->rxring_tl_ptr = 0; sc->rxring_queued = 0; /* Force next statchg or linkchg to program net config register. 
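 * cgem_miibus_statchg() and cgem_miibus_linkchg() only call
 * cgem_mediachange() when mii_media_active differs from the current MII
 * state, so zeroing it here guarantees the next link event reprograms the
 * speed and duplex bits.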
*/ sc->mii_media_active = 0; } static int cgem_ioctl(if_t ifp, u_long cmd, caddr_t data) { struct cgem_softc *sc = if_getsoftc(ifp); struct ifreq *ifr = (struct ifreq *)data; struct mii_data *mii; int error = 0, mask; switch (cmd) { case SIOCSIFFLAGS: CGEM_LOCK(sc); if ((if_getflags(ifp) & IFF_UP) != 0) { if ((if_getdrvflags(ifp) & IFF_DRV_RUNNING) != 0) { if (((if_getflags(ifp) ^ sc->if_old_flags) & (IFF_PROMISC | IFF_ALLMULTI)) != 0) { cgem_rx_filter(sc); } } else { cgem_init_locked(sc); } } else if ((if_getdrvflags(ifp) & IFF_DRV_RUNNING) != 0) { if_setdrvflagbits(ifp, 0, IFF_DRV_RUNNING); cgem_stop(sc); } sc->if_old_flags = if_getflags(ifp); CGEM_UNLOCK(sc); break; case SIOCADDMULTI: case SIOCDELMULTI: /* Set up multi-cast filters. */ if ((if_getdrvflags(ifp) & IFF_DRV_RUNNING) != 0) { CGEM_LOCK(sc); cgem_rx_filter(sc); CGEM_UNLOCK(sc); } break; case SIOCSIFMEDIA: case SIOCGIFMEDIA: mii = device_get_softc(sc->miibus); error = ifmedia_ioctl(ifp, ifr, &mii->mii_media, cmd); break; case SIOCSIFCAP: CGEM_LOCK(sc); mask = if_getcapenable(ifp) ^ ifr->ifr_reqcap; if ((mask & IFCAP_TXCSUM) != 0) { if ((ifr->ifr_reqcap & IFCAP_TXCSUM) != 0) { /* Turn on TX checksumming. */ if_setcapenablebit(ifp, IFCAP_TXCSUM | IFCAP_TXCSUM_IPV6, 0); if_sethwassistbits(ifp, CGEM_CKSUM_ASSIST, 0); WR4(sc, CGEM_DMA_CFG, RD4(sc, CGEM_DMA_CFG) | CGEM_DMA_CFG_CHKSUM_GEN_OFFLOAD_EN); } else { /* Turn off TX checksumming. */ if_setcapenablebit(ifp, 0, IFCAP_TXCSUM | IFCAP_TXCSUM_IPV6); if_sethwassistbits(ifp, 0, CGEM_CKSUM_ASSIST); WR4(sc, CGEM_DMA_CFG, RD4(sc, CGEM_DMA_CFG) & ~CGEM_DMA_CFG_CHKSUM_GEN_OFFLOAD_EN); } } if ((mask & IFCAP_RXCSUM) != 0) { if ((ifr->ifr_reqcap & IFCAP_RXCSUM) != 0) { /* Turn on RX checksumming. */ if_setcapenablebit(ifp, IFCAP_RXCSUM | IFCAP_RXCSUM_IPV6, 0); WR4(sc, CGEM_NET_CFG, RD4(sc, CGEM_NET_CFG) | CGEM_NET_CFG_RX_CHKSUM_OFFLD_EN); } else { /* Turn off RX checksumming. */ if_setcapenablebit(ifp, 0, IFCAP_RXCSUM | IFCAP_RXCSUM_IPV6); WR4(sc, CGEM_NET_CFG, RD4(sc, CGEM_NET_CFG) & ~CGEM_NET_CFG_RX_CHKSUM_OFFLD_EN); } } if ((if_getcapenable(ifp) & (IFCAP_RXCSUM | IFCAP_TXCSUM)) == (IFCAP_RXCSUM | IFCAP_TXCSUM)) if_setcapenablebit(ifp, IFCAP_VLAN_HWCSUM, 0); else if_setcapenablebit(ifp, 0, IFCAP_VLAN_HWCSUM); CGEM_UNLOCK(sc); break; default: error = ether_ioctl(ifp, cmd, data); break; } return (error); } /* MII bus support routines. 
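 * The readreg/writereg methods below poll CGEM_NET_STAT for the
 * PHY_MGMT_IDLE bit, giving up after roughly a millisecond (200 iterations
 * with a 5 microsecond delay) and returning -1 on timeout.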
*/ static void cgem_child_detached(device_t dev, device_t child) { struct cgem_softc *sc = device_get_softc(dev); if (child == sc->miibus) sc->miibus = NULL; } static int cgem_ifmedia_upd(if_t ifp) { struct cgem_softc *sc = (struct cgem_softc *) if_getsoftc(ifp); struct mii_data *mii; struct mii_softc *miisc; int error = 0; mii = device_get_softc(sc->miibus); CGEM_LOCK(sc); if ((if_getflags(ifp) & IFF_UP) != 0) { LIST_FOREACH(miisc, &mii->mii_phys, mii_list) PHY_RESET(miisc); error = mii_mediachg(mii); } CGEM_UNLOCK(sc); return (error); } static void cgem_ifmedia_sts(if_t ifp, struct ifmediareq *ifmr) { struct cgem_softc *sc = (struct cgem_softc *) if_getsoftc(ifp); struct mii_data *mii; mii = device_get_softc(sc->miibus); CGEM_LOCK(sc); mii_pollstat(mii); ifmr->ifm_active = mii->mii_media_active; ifmr->ifm_status = mii->mii_media_status; CGEM_UNLOCK(sc); } static int cgem_miibus_readreg(device_t dev, int phy, int reg) { struct cgem_softc *sc = device_get_softc(dev); int tries, val; WR4(sc, CGEM_PHY_MAINT, CGEM_PHY_MAINT_CLAUSE_22 | CGEM_PHY_MAINT_MUST_10 | CGEM_PHY_MAINT_OP_READ | (phy << CGEM_PHY_MAINT_PHY_ADDR_SHIFT) | (reg << CGEM_PHY_MAINT_REG_ADDR_SHIFT)); /* Wait for completion. */ tries=0; while ((RD4(sc, CGEM_NET_STAT) & CGEM_NET_STAT_PHY_MGMT_IDLE) == 0) { DELAY(5); if (++tries > 200) { device_printf(dev, "phy read timeout: %d\n", reg); return (-1); } } val = RD4(sc, CGEM_PHY_MAINT) & CGEM_PHY_MAINT_DATA_MASK; if (reg == MII_EXTSR) /* * MAC does not support half-duplex at gig speeds. * Let mii(4) exclude the capability. */ val &= ~(EXTSR_1000XHDX | EXTSR_1000THDX); return (val); } static int cgem_miibus_writereg(device_t dev, int phy, int reg, int data) { struct cgem_softc *sc = device_get_softc(dev); int tries; WR4(sc, CGEM_PHY_MAINT, CGEM_PHY_MAINT_CLAUSE_22 | CGEM_PHY_MAINT_MUST_10 | CGEM_PHY_MAINT_OP_WRITE | (phy << CGEM_PHY_MAINT_PHY_ADDR_SHIFT) | (reg << CGEM_PHY_MAINT_REG_ADDR_SHIFT) | (data & CGEM_PHY_MAINT_DATA_MASK)); /* Wait for completion. */ tries = 0; while ((RD4(sc, CGEM_NET_STAT) & CGEM_NET_STAT_PHY_MGMT_IDLE) == 0) { DELAY(5); if (++tries > 200) { device_printf(dev, "phy write timeout: %d\n", reg); return (-1); } } return (0); } static void cgem_miibus_statchg(device_t dev) { struct cgem_softc *sc = device_get_softc(dev); struct mii_data *mii = device_get_softc(sc->miibus); CGEM_ASSERT_LOCKED(sc); if ((mii->mii_media_status & (IFM_ACTIVE | IFM_AVALID)) == (IFM_ACTIVE | IFM_AVALID) && sc->mii_media_active != mii->mii_media_active) cgem_mediachange(sc, mii); } static void cgem_miibus_linkchg(device_t dev) { struct cgem_softc *sc = device_get_softc(dev); struct mii_data *mii = device_get_softc(sc->miibus); CGEM_ASSERT_LOCKED(sc); if ((mii->mii_media_status & (IFM_ACTIVE | IFM_AVALID)) == (IFM_ACTIVE | IFM_AVALID) && sc->mii_media_active != mii->mii_media_active) cgem_mediachange(sc, mii); } /* * Overridable weak symbol cgem_set_ref_clk(). This allows platforms to * provide a function to set the cgem's reference clock. */ static int __used cgem_default_set_ref_clk(int unit, int frequency) { return 0; } __weak_reference(cgem_default_set_ref_clk, cgem_set_ref_clk); /* Call to set reference clock and network config bits according to media. */ static void cgem_mediachange(struct cgem_softc *sc, struct mii_data *mii) { uint32_t net_cfg; int ref_clk_freq; CGEM_ASSERT_LOCKED(sc); /* Update hardware to reflect media. 
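 * The reference clock is requested at 125 MHz for 1000BASE-T, 25 MHz for
 * 100BASE-TX and 2.5 MHz otherwise, matching the ref_clk_freq values
 * chosen in the switch below.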
*/ net_cfg = RD4(sc, CGEM_NET_CFG); net_cfg &= ~(CGEM_NET_CFG_SPEED100 | CGEM_NET_CFG_GIGE_EN | CGEM_NET_CFG_FULL_DUPLEX); switch (IFM_SUBTYPE(mii->mii_media_active)) { case IFM_1000_T: net_cfg |= (CGEM_NET_CFG_SPEED100 | CGEM_NET_CFG_GIGE_EN); ref_clk_freq = 125000000; break; case IFM_100_TX: net_cfg |= CGEM_NET_CFG_SPEED100; ref_clk_freq = 25000000; break; default: ref_clk_freq = 2500000; } if ((mii->mii_media_active & IFM_FDX) != 0) net_cfg |= CGEM_NET_CFG_FULL_DUPLEX; WR4(sc, CGEM_NET_CFG, net_cfg); /* Set the reference clock if necessary. */ if (cgem_set_ref_clk(sc->ref_clk_num, ref_clk_freq)) device_printf(sc->dev, "cgem_mediachange: " "could not set ref clk%d to %d.\n", sc->ref_clk_num, ref_clk_freq); sc->mii_media_active = mii->mii_media_active; } static void cgem_add_sysctls(device_t dev) { struct cgem_softc *sc = device_get_softc(dev); struct sysctl_ctx_list *ctx; struct sysctl_oid_list *child; struct sysctl_oid *tree; ctx = device_get_sysctl_ctx(dev); child = SYSCTL_CHILDREN(device_get_sysctl_tree(dev)); SYSCTL_ADD_INT(ctx, child, OID_AUTO, "rxbufs", CTLFLAG_RW, &sc->rxbufs, 0, "Number receive buffers to provide"); SYSCTL_ADD_INT(ctx, child, OID_AUTO, "rxhangwar", CTLFLAG_RW, &sc->rxhangwar, 0, "Enable receive hang work-around"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "_rxoverruns", CTLFLAG_RD, &sc->rxoverruns, 0, "Receive overrun events"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "_rxnobufs", CTLFLAG_RD, &sc->rxnobufs, 0, "Receive buf queue empty events"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "_rxdmamapfails", CTLFLAG_RD, &sc->rxdmamapfails, 0, "Receive DMA map failures"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "_txfull", CTLFLAG_RD, &sc->txfull, 0, "Transmit ring full events"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "_txdmamapfails", CTLFLAG_RD, &sc->txdmamapfails, 0, "Transmit DMA map failures"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "_txdefrags", CTLFLAG_RD, &sc->txdefrags, 0, "Transmit m_defrag() calls"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "_txdefragfails", CTLFLAG_RD, &sc->txdefragfails, 0, "Transmit m_defrag() failures"); tree = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "stats", CTLFLAG_RD, NULL, "GEM statistics"); child = SYSCTL_CHILDREN(tree); SYSCTL_ADD_UQUAD(ctx, child, OID_AUTO, "tx_bytes", CTLFLAG_RD, &sc->stats.tx_bytes, "Total bytes transmitted"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_frames", CTLFLAG_RD, &sc->stats.tx_frames, 0, "Total frames transmitted"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_frames_bcast", CTLFLAG_RD, &sc->stats.tx_frames_bcast, 0, "Number broadcast frames transmitted"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_frames_multi", CTLFLAG_RD, &sc->stats.tx_frames_multi, 0, "Number multicast frames transmitted"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_frames_pause", CTLFLAG_RD, &sc->stats.tx_frames_pause, 0, "Number pause frames transmitted"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_frames_64b", CTLFLAG_RD, &sc->stats.tx_frames_64b, 0, "Number frames transmitted of size 64 bytes or less"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_frames_65to127b", CTLFLAG_RD, &sc->stats.tx_frames_65to127b, 0, "Number frames transmitted of size 65-127 bytes"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_frames_128to255b", CTLFLAG_RD, &sc->stats.tx_frames_128to255b, 0, "Number frames transmitted of size 128-255 bytes"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_frames_256to511b", CTLFLAG_RD, &sc->stats.tx_frames_256to511b, 0, "Number frames transmitted of size 256-511 bytes"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_frames_512to1023b", CTLFLAG_RD, 
&sc->stats.tx_frames_512to1023b, 0, "Number frames transmitted of size 512-1023 bytes"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_frames_1024to1536b", CTLFLAG_RD, &sc->stats.tx_frames_1024to1536b, 0, "Number frames transmitted of size 1024-1536 bytes"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_under_runs", CTLFLAG_RD, &sc->stats.tx_under_runs, 0, "Number transmit under-run events"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_single_collisn", CTLFLAG_RD, &sc->stats.tx_single_collisn, 0, "Number single-collision transmit frames"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_multi_collisn", CTLFLAG_RD, &sc->stats.tx_multi_collisn, 0, "Number multi-collision transmit frames"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_excsv_collisn", CTLFLAG_RD, &sc->stats.tx_excsv_collisn, 0, "Number excessive collision transmit frames"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_late_collisn", CTLFLAG_RD, &sc->stats.tx_late_collisn, 0, "Number late-collision transmit frames"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_deferred_frames", CTLFLAG_RD, &sc->stats.tx_deferred_frames, 0, "Number deferred transmit frames"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "tx_carrier_sense_errs", CTLFLAG_RD, &sc->stats.tx_carrier_sense_errs, 0, "Number carrier sense errors on transmit"); SYSCTL_ADD_UQUAD(ctx, child, OID_AUTO, "rx_bytes", CTLFLAG_RD, &sc->stats.rx_bytes, "Total bytes received"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames", CTLFLAG_RD, &sc->stats.rx_frames, 0, "Total frames received"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_bcast", CTLFLAG_RD, &sc->stats.rx_frames_bcast, 0, "Number broadcast frames received"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_multi", CTLFLAG_RD, &sc->stats.rx_frames_multi, 0, "Number multicast frames received"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_pause", CTLFLAG_RD, &sc->stats.rx_frames_pause, 0, "Number pause frames received"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_64b", CTLFLAG_RD, &sc->stats.rx_frames_64b, 0, "Number frames received of size 64 bytes or less"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_65to127b", CTLFLAG_RD, &sc->stats.rx_frames_65to127b, 0, "Number frames received of size 65-127 bytes"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_128to255b", CTLFLAG_RD, &sc->stats.rx_frames_128to255b, 0, "Number frames received of size 128-255 bytes"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_256to511b", CTLFLAG_RD, &sc->stats.rx_frames_256to511b, 0, "Number frames received of size 256-511 bytes"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_512to1023b", CTLFLAG_RD, &sc->stats.rx_frames_512to1023b, 0, "Number frames received of size 512-1023 bytes"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_1024to1536b", CTLFLAG_RD, &sc->stats.rx_frames_1024to1536b, 0, "Number frames received of size 1024-1536 bytes"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_undersize", CTLFLAG_RD, &sc->stats.rx_frames_undersize, 0, "Number undersize frames received"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_oversize", CTLFLAG_RD, &sc->stats.rx_frames_oversize, 0, "Number oversize frames received"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_jabber", CTLFLAG_RD, &sc->stats.rx_frames_jabber, 0, "Number jabber frames received"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_fcs_errs", CTLFLAG_RD, &sc->stats.rx_frames_fcs_errs, 0, "Number frames received with FCS errors"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_length_errs", CTLFLAG_RD, &sc->stats.rx_frames_length_errs, 0, "Number frames received with 
length errors"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_symbol_errs", CTLFLAG_RD, &sc->stats.rx_symbol_errs, 0, "Number receive symbol errors"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_align_errs", CTLFLAG_RD, &sc->stats.rx_align_errs, 0, "Number receive alignment errors"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_resource_errs", CTLFLAG_RD, &sc->stats.rx_resource_errs, 0, "Number frames received when no rx buffer available"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_overrun_errs", CTLFLAG_RD, &sc->stats.rx_overrun_errs, 0, "Number frames received but not copied due to " "receive overrun"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_ip_hdr_csum_errs", CTLFLAG_RD, &sc->stats.rx_ip_hdr_csum_errs, 0, "Number frames received with IP header checksum " "errors"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_tcp_csum_errs", CTLFLAG_RD, &sc->stats.rx_tcp_csum_errs, 0, "Number frames received with TCP checksum errors"); SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_frames_udp_csum_errs", CTLFLAG_RD, &sc->stats.rx_udp_csum_errs, 0, "Number frames received with UDP checksum errors"); } static int cgem_probe(device_t dev) { if (!ofw_bus_status_okay(dev)) return (ENXIO); - if (!ofw_bus_is_compatible(dev, "cadence,gem")) + if (ofw_bus_search_compatible(dev, compat_data)->ocd_data == 0) return (ENXIO); device_set_desc(dev, "Cadence CGEM Gigabit Ethernet Interface"); return (0); } static int cgem_attach(device_t dev) { struct cgem_softc *sc = device_get_softc(dev); if_t ifp = NULL; phandle_t node; pcell_t cell; int rid, err; u_char eaddr[ETHER_ADDR_LEN]; sc->dev = dev; CGEM_LOCK_INIT(sc); /* Get reference clock number and base divider from fdt. */ node = ofw_bus_get_node(dev); sc->ref_clk_num = 0; if (OF_getprop(node, "ref-clock-num", &cell, sizeof(cell)) > 0) sc->ref_clk_num = fdt32_to_cpu(cell); /* Get memory resource. */ rid = 0; sc->mem_res = bus_alloc_resource_any(dev, SYS_RES_MEMORY, &rid, RF_ACTIVE); if (sc->mem_res == NULL) { device_printf(dev, "could not allocate memory resources.\n"); return (ENOMEM); } /* Get IRQ resource. */ rid = 0; sc->irq_res = bus_alloc_resource_any(dev, SYS_RES_IRQ, &rid, RF_ACTIVE); if (sc->irq_res == NULL) { device_printf(dev, "could not allocate interrupt resource.\n"); cgem_detach(dev); return (ENOMEM); } /* Set up ifnet structure. */ ifp = sc->ifp = if_alloc(IFT_ETHER); if (ifp == NULL) { device_printf(dev, "could not allocate ifnet structure\n"); cgem_detach(dev); return (ENOMEM); } if_setsoftc(ifp, sc); if_initname(ifp, IF_CGEM_NAME, device_get_unit(dev)); if_setflags(ifp, IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST); if_setinitfn(ifp, cgem_init); if_setioctlfn(ifp, cgem_ioctl); if_setstartfn(ifp, cgem_start); if_setcapabilitiesbit(ifp, IFCAP_HWCSUM | IFCAP_HWCSUM_IPV6 | IFCAP_VLAN_MTU | IFCAP_VLAN_HWCSUM, 0); if_setsendqlen(ifp, CGEM_NUM_TX_DESCS); if_setsendqready(ifp); /* Disable hardware checksumming by default. */ if_sethwassist(ifp, 0); if_setcapenable(ifp, if_getcapabilities(ifp) & ~(IFCAP_HWCSUM | IFCAP_HWCSUM_IPV6 | IFCAP_VLAN_HWCSUM)); sc->if_old_flags = if_getflags(ifp); sc->rxbufs = DEFAULT_NUM_RX_BUFS; sc->rxhangwar = 1; /* Reset hardware. */ CGEM_LOCK(sc); cgem_reset(sc); CGEM_UNLOCK(sc); /* Attach phy to mii bus. */ err = mii_attach(dev, &sc->miibus, ifp, cgem_ifmedia_upd, cgem_ifmedia_sts, BMSR_DEFCAPMASK, MII_PHY_ANY, MII_OFFSET_ANY, 0); if (err) { device_printf(dev, "attaching PHYs failed\n"); cgem_detach(dev); return (err); } /* Set up TX and RX descriptor area. 
*/ err = cgem_setup_descs(sc); if (err) { device_printf(dev, "could not set up dma mem for descs.\n"); cgem_detach(dev); return (ENOMEM); } /* Get a MAC address. */ cgem_get_mac(sc, eaddr); /* Start ticks. */ callout_init_mtx(&sc->tick_ch, &sc->sc_mtx, 0); ether_ifattach(ifp, eaddr); err = bus_setup_intr(dev, sc->irq_res, INTR_TYPE_NET | INTR_MPSAFE | INTR_EXCL, NULL, cgem_intr, sc, &sc->intrhand); if (err) { device_printf(dev, "could not set interrupt handler.\n"); ether_ifdetach(ifp); cgem_detach(dev); return (err); } cgem_add_sysctls(dev); return (0); } static int cgem_detach(device_t dev) { struct cgem_softc *sc = device_get_softc(dev); int i; if (sc == NULL) return (ENODEV); if (device_is_attached(dev)) { CGEM_LOCK(sc); cgem_stop(sc); CGEM_UNLOCK(sc); callout_drain(&sc->tick_ch); if_setflagbits(sc->ifp, 0, IFF_UP); ether_ifdetach(sc->ifp); } if (sc->miibus != NULL) { device_delete_child(dev, sc->miibus); sc->miibus = NULL; } /* Release resources. */ if (sc->mem_res != NULL) { bus_release_resource(dev, SYS_RES_MEMORY, rman_get_rid(sc->mem_res), sc->mem_res); sc->mem_res = NULL; } if (sc->irq_res != NULL) { if (sc->intrhand) bus_teardown_intr(dev, sc->irq_res, sc->intrhand); bus_release_resource(dev, SYS_RES_IRQ, rman_get_rid(sc->irq_res), sc->irq_res); sc->irq_res = NULL; } /* Release DMA resources. */ if (sc->rxring != NULL) { if (sc->rxring_physaddr != 0) { bus_dmamap_unload(sc->desc_dma_tag, sc->rxring_dma_map); sc->rxring_physaddr = 0; } bus_dmamem_free(sc->desc_dma_tag, sc->rxring, sc->rxring_dma_map); sc->rxring = NULL; for (i = 0; i < CGEM_NUM_RX_DESCS; i++) if (sc->rxring_m_dmamap[i] != NULL) { bus_dmamap_destroy(sc->mbuf_dma_tag, sc->rxring_m_dmamap[i]); sc->rxring_m_dmamap[i] = NULL; } } if (sc->txring != NULL) { if (sc->txring_physaddr != 0) { bus_dmamap_unload(sc->desc_dma_tag, sc->txring_dma_map); sc->txring_physaddr = 0; } bus_dmamem_free(sc->desc_dma_tag, sc->txring, sc->txring_dma_map); sc->txring = NULL; for (i = 0; i < CGEM_NUM_TX_DESCS; i++) if (sc->txring_m_dmamap[i] != NULL) { bus_dmamap_destroy(sc->mbuf_dma_tag, sc->txring_m_dmamap[i]); sc->txring_m_dmamap[i] = NULL; } } if (sc->desc_dma_tag != NULL) { bus_dma_tag_destroy(sc->desc_dma_tag); sc->desc_dma_tag = NULL; } if (sc->mbuf_dma_tag != NULL) { bus_dma_tag_destroy(sc->mbuf_dma_tag); sc->mbuf_dma_tag = NULL; } bus_generic_detach(dev); CGEM_LOCK_DESTROY(sc); return (0); } static device_method_t cgem_methods[] = { /* Device interface */ DEVMETHOD(device_probe, cgem_probe), DEVMETHOD(device_attach, cgem_attach), DEVMETHOD(device_detach, cgem_detach), /* Bus interface */ DEVMETHOD(bus_child_detached, cgem_child_detached), /* MII interface */ DEVMETHOD(miibus_readreg, cgem_miibus_readreg), DEVMETHOD(miibus_writereg, cgem_miibus_writereg), DEVMETHOD(miibus_statchg, cgem_miibus_statchg), DEVMETHOD(miibus_linkchg, cgem_miibus_linkchg), DEVMETHOD_END }; static driver_t cgem_driver = { "cgem", cgem_methods, sizeof(struct cgem_softc), }; DRIVER_MODULE(cgem, simplebus, cgem_driver, cgem_devclass, NULL, NULL); DRIVER_MODULE(miibus, cgem, miibus_driver, miibus_devclass, NULL, NULL); MODULE_DEPEND(cgem, miibus, 1, 1, 1); MODULE_DEPEND(cgem, ether, 1, 1, 1); Index: user/ngie/bug-237403/sys/dev/cxgbe/crypto/t4_crypto.c =================================================================== --- user/ngie/bug-237403/sys/dev/cxgbe/crypto/t4_crypto.c (revision 346925) +++ user/ngie/bug-237403/sys/dev/cxgbe/crypto/t4_crypto.c (revision 346926) @@ -1,2388 +1,2952 @@ /*- * Copyright (c) 2017 Chelsio Communications, Inc. 
* All rights reserved. * Written by: John Baldwin * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include "cryptodev_if.h" #include "common/common.h" #include "crypto/t4_crypto.h" /* * Requests consist of: * * +-------------------------------+ * | struct fw_crypto_lookaside_wr | * +-------------------------------+ * | struct ulp_txpkt | * +-------------------------------+ * | struct ulptx_idata | * +-------------------------------+ * | struct cpl_tx_sec_pdu | * +-------------------------------+ * | struct cpl_tls_tx_scmd_fmt | * +-------------------------------+ * | key context header | * +-------------------------------+ * | AES key | ----- For requests with AES * +-------------------------------+ * | Hash state | ----- For hash-only requests * +-------------------------------+ - * | IPAD (16-byte aligned) | \ * +-------------------------------+ +---- For requests with HMAC * | OPAD (16-byte aligned) | / * +-------------------------------+ - * | GMAC H | ----- For AES-GCM * +-------------------------------+ - * | struct cpl_rx_phys_dsgl | \ * +-------------------------------+ +---- Destination buffer for * | PHYS_DSGL entries | / non-hash-only requests * +-------------------------------+ - * | 16 dummy bytes | ----- Only for HMAC/hash-only requests * +-------------------------------+ * | IV | ----- If immediate IV * +-------------------------------+ * | Payload | ----- If immediate Payload * +-------------------------------+ - * | struct ulptx_sgl | \ * +-------------------------------+ +---- If payload via SGL * | SGL entries | / * +-------------------------------+ - * * Note that the key context must be padded to ensure 16-byte alignment. * For HMAC requests, the key consists of the partial hash of the IPAD * followed by the partial hash of the OPAD. * * Replies consist of: * * +-------------------------------+ * | struct cpl_fw6_pld | * +-------------------------------+ * | hash digest | ----- For HMAC request with * +-------------------------------+ 'hash_size' set in work request * * A 32-bit big-endian error status word is supplied in the last 4 * bytes of data[0] in the CPL_FW6_PLD message. bit 0 indicates a * "MAC" error and bit 1 indicates a "PAD" error. 
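 * (The PAD bit is what ccr_authenc_done() below tests via
 * CHK_PAD_ERR_BIT() when deciding whether a failed MAC compare can
 * be recovered by copying the hash from the reply into the buffer.)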
* * The 64-bit 'cookie' field from the fw_crypto_lookaside_wr message * in the request is returned in data[1] of the CPL_FW6_PLD message. * * For block cipher replies, the updated IV is supplied in data[2] and * data[3] of the CPL_FW6_PLD message. * * For hash replies where the work request set 'hash_size' to request * a copy of the hash in the reply, the hash digest is supplied * immediately following the CPL_FW6_PLD message. */ /* * The crypto engine supports a maximum AAD size of 511 bytes. */ #define MAX_AAD_LEN 511 /* * The documentation for CPL_RX_PHYS_DSGL claims a maximum of 32 SG * entries. While the CPL includes a 16-bit length field, the T6 can * sometimes hang if an error occurs while processing a request with a * single DSGL entry larger than 2k. */ #define MAX_RX_PHYS_DSGL_SGE 32 #define DSGL_SGE_MAXLEN 2048 /* * The adapter only supports requests with a total input or output * length of 64k-1 or smaller. Longer requests either result in hung * requests or incorrect results. */ #define MAX_REQUEST_SIZE 65535 static MALLOC_DEFINE(M_CCR, "ccr", "Chelsio T6 crypto"); struct ccr_session_hmac { struct auth_hash *auth_hash; int hash_len; unsigned int partial_digest_len; unsigned int auth_mode; unsigned int mk_size; char ipad[CHCR_HASH_MAX_BLOCK_SIZE_128]; char opad[CHCR_HASH_MAX_BLOCK_SIZE_128]; }; struct ccr_session_gmac { int hash_len; char ghash_h[GMAC_BLOCK_LEN]; }; +struct ccr_session_ccm_mac { + int hash_len; +}; + struct ccr_session_blkcipher { unsigned int cipher_mode; unsigned int key_len; unsigned int iv_len; __be32 key_ctx_hdr; char enckey[CHCR_AES_MAX_KEY_LEN]; char deckey[CHCR_AES_MAX_KEY_LEN]; }; struct ccr_session { bool active; int pending; - enum { HASH, HMAC, BLKCIPHER, AUTHENC, GCM } mode; + enum { HASH, HMAC, BLKCIPHER, AUTHENC, GCM, CCM } mode; union { struct ccr_session_hmac hmac; struct ccr_session_gmac gmac; + struct ccr_session_ccm_mac ccm_mac; }; struct ccr_session_blkcipher blkcipher; }; struct ccr_softc { struct adapter *adapter; device_t dev; uint32_t cid; int tx_channel_id; struct mtx lock; bool detaching; struct sge_wrq *txq; struct sge_rxq *rxq; /* * Pre-allocate S/G lists used when preparing a work request. * 'sg_crp' contains an sglist describing the entire buffer * for a 'struct cryptop'. 'sg_ulptx' is used to describe * the data the engine should DMA as input via ULPTX_SGL. * 'sg_dsgl' is used to describe the destination that cipher * text and a tag should be written to. */ struct sglist *sg_crp; struct sglist *sg_ulptx; struct sglist *sg_dsgl; /* * Pre-allocate a dummy output buffer for the IV and AAD for * AEAD requests. */ char *iv_aad_buf; struct sglist *sg_iv_aad; /* Statistics. */ uint64_t stats_blkcipher_encrypt; uint64_t stats_blkcipher_decrypt; uint64_t stats_hash; uint64_t stats_hmac; uint64_t stats_authenc_encrypt; uint64_t stats_authenc_decrypt; uint64_t stats_gcm_encrypt; uint64_t stats_gcm_decrypt; + uint64_t stats_ccm_encrypt; + uint64_t stats_ccm_decrypt; uint64_t stats_wr_nomem; uint64_t stats_inflight; uint64_t stats_mac_error; uint64_t stats_pad_error; uint64_t stats_bad_session; uint64_t stats_sglist_error; uint64_t stats_process_error; uint64_t stats_sw_fallback; }; /* * Crypto requests involve two kind of scatter/gather lists. * * Non-hash-only requests require a PHYS_DSGL that describes the * location to store the results of the encryption or decryption * operation. This SGL uses a different format (PHYS_DSGL) and should * exclude the crd_skip bytes at the start of the data as well as * any AAD or IV. 
For authenticated encryption requests it should * cover include the destination of the hash or tag. * * The input payload may either be supplied inline as immediate data, * or via a standard ULP_TX SGL. This SGL should include AAD, * ciphertext, and the hash or tag for authenticated decryption * requests. * * These scatter/gather lists can describe different subsets of the * buffer described by the crypto operation. ccr_populate_sglist() * generates a scatter/gather list that covers the entire crypto * operation buffer that is then used to construct the other * scatter/gather lists. */ static int ccr_populate_sglist(struct sglist *sg, struct cryptop *crp) { int error; sglist_reset(sg); if (crp->crp_flags & CRYPTO_F_IMBUF) error = sglist_append_mbuf(sg, (struct mbuf *)crp->crp_buf); else if (crp->crp_flags & CRYPTO_F_IOV) error = sglist_append_uio(sg, (struct uio *)crp->crp_buf); else error = sglist_append(sg, crp->crp_buf, crp->crp_ilen); return (error); } /* * Segments in 'sg' larger than 'maxsegsize' are counted as multiple * segments. */ static int ccr_count_sgl(struct sglist *sg, int maxsegsize) { int i, nsegs; nsegs = 0; for (i = 0; i < sg->sg_nseg; i++) nsegs += howmany(sg->sg_segs[i].ss_len, maxsegsize); return (nsegs); } /* These functions deal with PHYS_DSGL for the reply buffer. */ static inline int ccr_phys_dsgl_len(int nsegs) { int len; len = (nsegs / 8) * sizeof(struct phys_sge_pairs); if ((nsegs % 8) != 0) { len += sizeof(uint16_t) * 8; len += roundup2(nsegs % 8, 2) * sizeof(uint64_t); } return (len); } static void ccr_write_phys_dsgl(struct ccr_softc *sc, void *dst, int nsegs) { struct sglist *sg; struct cpl_rx_phys_dsgl *cpl; struct phys_sge_pairs *sgl; vm_paddr_t paddr; size_t seglen; u_int i, j; sg = sc->sg_dsgl; cpl = dst; cpl->op_to_tid = htobe32(V_CPL_RX_PHYS_DSGL_OPCODE(CPL_RX_PHYS_DSGL) | V_CPL_RX_PHYS_DSGL_ISRDMA(0)); cpl->pcirlxorder_to_noofsgentr = htobe32( V_CPL_RX_PHYS_DSGL_PCIRLXORDER(0) | V_CPL_RX_PHYS_DSGL_PCINOSNOOP(0) | V_CPL_RX_PHYS_DSGL_PCITPHNTENB(0) | V_CPL_RX_PHYS_DSGL_DCAID(0) | V_CPL_RX_PHYS_DSGL_NOOFSGENTR(nsegs)); cpl->rss_hdr_int.opcode = CPL_RX_PHYS_ADDR; cpl->rss_hdr_int.qid = htobe16(sc->rxq->iq.abs_id); cpl->rss_hdr_int.hash_val = 0; sgl = (struct phys_sge_pairs *)(cpl + 1); j = 0; for (i = 0; i < sg->sg_nseg; i++) { seglen = sg->sg_segs[i].ss_len; paddr = sg->sg_segs[i].ss_paddr; do { sgl->addr[j] = htobe64(paddr); if (seglen > DSGL_SGE_MAXLEN) { sgl->len[j] = htobe16(DSGL_SGE_MAXLEN); paddr += DSGL_SGE_MAXLEN; seglen -= DSGL_SGE_MAXLEN; } else { sgl->len[j] = htobe16(seglen); seglen = 0; } j++; if (j == 8) { sgl++; j = 0; } } while (seglen != 0); } MPASS(j + 8 * (sgl - (struct phys_sge_pairs *)(cpl + 1)) == nsegs); } /* These functions deal with the ULPTX_SGL for input payload. 
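 * The first segment rides in the ulptx_sgl header itself
 * (len0/addr0); each further pair of segments takes one 24-byte
 * ulptx_sge_pair, an odd leftover segment takes a padded half pair,
 * and the total is rounded up to a 16-byte multiple.  That is the
 * arithmetic ccr_ulptx_sgl_len() below encodes.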
*/ static inline int ccr_ulptx_sgl_len(int nsegs) { u_int n; nsegs--; /* first segment is part of ulptx_sgl */ n = sizeof(struct ulptx_sgl) + 8 * ((3 * nsegs) / 2 + (nsegs & 1)); return (roundup2(n, 16)); } static void ccr_write_ulptx_sgl(struct ccr_softc *sc, void *dst, int nsegs) { struct ulptx_sgl *usgl; struct sglist *sg; struct sglist_seg *ss; int i; sg = sc->sg_ulptx; MPASS(nsegs == sg->sg_nseg); ss = &sg->sg_segs[0]; usgl = dst; usgl->cmd_nsge = htobe32(V_ULPTX_CMD(ULP_TX_SC_DSGL) | V_ULPTX_NSGE(nsegs)); usgl->len0 = htobe32(ss->ss_len); usgl->addr0 = htobe64(ss->ss_paddr); ss++; for (i = 0; i < sg->sg_nseg - 1; i++) { usgl->sge[i / 2].len[i & 1] = htobe32(ss->ss_len); usgl->sge[i / 2].addr[i & 1] = htobe64(ss->ss_paddr); ss++; } } static bool ccr_use_imm_data(u_int transhdr_len, u_int input_len) { if (input_len > CRYPTO_MAX_IMM_TX_PKT_LEN) return (false); if (roundup2(transhdr_len, 16) + roundup2(input_len, 16) > SGE_MAX_WR_LEN) return (false); return (true); } static void ccr_populate_wreq(struct ccr_softc *sc, struct chcr_wr *crwr, u_int kctx_len, u_int wr_len, u_int imm_len, u_int sgl_len, u_int hash_size, struct cryptop *crp) { - u_int cctx_size; + u_int cctx_size, idata_len; cctx_size = sizeof(struct _key_ctx) + kctx_len; crwr->wreq.op_to_cctx_size = htobe32( V_FW_CRYPTO_LOOKASIDE_WR_OPCODE(FW_CRYPTO_LOOKASIDE_WR) | V_FW_CRYPTO_LOOKASIDE_WR_COMPL(0) | V_FW_CRYPTO_LOOKASIDE_WR_IMM_LEN(imm_len) | V_FW_CRYPTO_LOOKASIDE_WR_CCTX_LOC(1) | V_FW_CRYPTO_LOOKASIDE_WR_CCTX_SIZE(cctx_size >> 4)); crwr->wreq.len16_pkd = htobe32( V_FW_CRYPTO_LOOKASIDE_WR_LEN16(wr_len / 16)); crwr->wreq.session_id = 0; crwr->wreq.rx_chid_to_rx_q_id = htobe32( V_FW_CRYPTO_LOOKASIDE_WR_RX_CHID(sc->tx_channel_id) | V_FW_CRYPTO_LOOKASIDE_WR_LCB(0) | V_FW_CRYPTO_LOOKASIDE_WR_PHASH(0) | V_FW_CRYPTO_LOOKASIDE_WR_IV(IV_NOP) | V_FW_CRYPTO_LOOKASIDE_WR_FQIDX(0) | V_FW_CRYPTO_LOOKASIDE_WR_TX_CH(0) | V_FW_CRYPTO_LOOKASIDE_WR_RX_Q_ID(sc->rxq->iq.abs_id)); crwr->wreq.key_addr = 0; crwr->wreq.pld_size_hash_size = htobe32( V_FW_CRYPTO_LOOKASIDE_WR_PLD_SIZE(sgl_len) | V_FW_CRYPTO_LOOKASIDE_WR_HASH_SIZE(hash_size)); crwr->wreq.cookie = htobe64((uintptr_t)crp); crwr->ulptx.cmd_dest = htobe32(V_ULPTX_CMD(ULP_TX_PKT) | V_ULP_TXPKT_DATAMODIFY(0) | V_ULP_TXPKT_CHANNELID(sc->tx_channel_id) | V_ULP_TXPKT_DEST(0) | V_ULP_TXPKT_FID(0) | V_ULP_TXPKT_RO(1)); crwr->ulptx.len = htobe32( ((wr_len - sizeof(struct fw_crypto_lookaside_wr)) / 16)); crwr->sc_imm.cmd_more = htobe32(V_ULPTX_CMD(ULP_TX_SC_IMM) | - V_ULP_TX_SC_MORE(imm_len != 0 ? 0 : 1)); - crwr->sc_imm.len = htobe32(wr_len - offsetof(struct chcr_wr, sec_cpl) - - sgl_len); + V_ULP_TX_SC_MORE(sgl_len != 0 ? 1 : 0)); + idata_len = wr_len - offsetof(struct chcr_wr, sec_cpl) - sgl_len; + if (imm_len % 16 != 0) + idata_len -= 16 - imm_len % 16; + crwr->sc_imm.len = htobe32(idata_len); } static int ccr_hash(struct ccr_softc *sc, struct ccr_session *s, struct cryptop *crp) { struct chcr_wr *crwr; struct wrqe *wr; struct auth_hash *axf; struct cryptodesc *crd; char *dst; u_int hash_size_in_response, kctx_flits, kctx_len, transhdr_len, wr_len; u_int hmac_ctrl, imm_len, iopad_size; int error, sgl_nsegs, sgl_len, use_opad; crd = crp->crp_desc; /* Reject requests with too large of an input buffer. */ if (crd->crd_len > MAX_REQUEST_SIZE) return (EFBIG); axf = s->hmac.auth_hash; if (s->mode == HMAC) { use_opad = 1; hmac_ctrl = SCMD_HMAC_CTRL_NO_TRUNC; } else { use_opad = 0; hmac_ctrl = SCMD_HMAC_CTRL_NOP; } /* PADs must be 128-bit aligned. 
*/ iopad_size = roundup2(s->hmac.partial_digest_len, 16); /* * The 'key' part of the context includes the aligned IPAD and * OPAD. */ kctx_len = iopad_size; if (use_opad) kctx_len += iopad_size; hash_size_in_response = axf->hashsize; transhdr_len = HASH_TRANSHDR_SIZE(kctx_len); if (crd->crd_len == 0) { imm_len = axf->blocksize; sgl_nsegs = 0; sgl_len = 0; } else if (ccr_use_imm_data(transhdr_len, crd->crd_len)) { imm_len = crd->crd_len; sgl_nsegs = 0; sgl_len = 0; } else { imm_len = 0; sglist_reset(sc->sg_ulptx); error = sglist_append_sglist(sc->sg_ulptx, sc->sg_crp, crd->crd_skip, crd->crd_len); if (error) return (error); sgl_nsegs = sc->sg_ulptx->sg_nseg; sgl_len = ccr_ulptx_sgl_len(sgl_nsegs); } wr_len = roundup2(transhdr_len, 16) + roundup2(imm_len, 16) + sgl_len; if (wr_len > SGE_MAX_WR_LEN) return (EFBIG); wr = alloc_wrqe(wr_len, sc->txq); if (wr == NULL) { sc->stats_wr_nomem++; return (ENOMEM); } crwr = wrtod(wr); memset(crwr, 0, wr_len); ccr_populate_wreq(sc, crwr, kctx_len, wr_len, imm_len, sgl_len, hash_size_in_response, crp); /* XXX: Hardcodes SGE loopback channel of 0. */ crwr->sec_cpl.op_ivinsrtofst = htobe32( V_CPL_TX_SEC_PDU_OPCODE(CPL_TX_SEC_PDU) | V_CPL_TX_SEC_PDU_RXCHID(sc->tx_channel_id) | V_CPL_TX_SEC_PDU_ACKFOLLOWS(0) | V_CPL_TX_SEC_PDU_ULPTXLPBK(1) | V_CPL_TX_SEC_PDU_CPLLEN(2) | V_CPL_TX_SEC_PDU_PLACEHOLDER(0) | V_CPL_TX_SEC_PDU_IVINSRTOFST(0)); crwr->sec_cpl.pldlen = htobe32(crd->crd_len == 0 ? axf->blocksize : crd->crd_len); crwr->sec_cpl.cipherstop_lo_authinsert = htobe32( V_CPL_TX_SEC_PDU_AUTHSTART(1) | V_CPL_TX_SEC_PDU_AUTHSTOP(0)); /* These two flits are actually a CPL_TLS_TX_SCMD_FMT. */ crwr->sec_cpl.seqno_numivs = htobe32( V_SCMD_SEQ_NO_CTRL(0) | V_SCMD_PROTO_VERSION(SCMD_PROTO_VERSION_GENERIC) | V_SCMD_CIPH_MODE(SCMD_CIPH_MODE_NOP) | V_SCMD_AUTH_MODE(s->hmac.auth_mode) | V_SCMD_HMAC_CTRL(hmac_ctrl)); crwr->sec_cpl.ivgen_hdrlen = htobe32( V_SCMD_LAST_FRAG(0) | V_SCMD_MORE_FRAGS(crd->crd_len == 0 ? 1 : 0) | V_SCMD_MAC_ONLY(1)); memcpy(crwr->key_ctx.key, s->hmac.ipad, s->hmac.partial_digest_len); if (use_opad) memcpy(crwr->key_ctx.key + iopad_size, s->hmac.opad, s->hmac.partial_digest_len); /* XXX: F_KEY_CONTEXT_SALT_PRESENT set, but 'salt' not set. 
*/ kctx_flits = (sizeof(struct _key_ctx) + kctx_len) / 16; crwr->key_ctx.ctx_hdr = htobe32(V_KEY_CONTEXT_CTX_LEN(kctx_flits) | V_KEY_CONTEXT_OPAD_PRESENT(use_opad) | V_KEY_CONTEXT_SALT_PRESENT(1) | V_KEY_CONTEXT_CK_SIZE(CHCR_KEYCTX_NO_KEY) | V_KEY_CONTEXT_MK_SIZE(s->hmac.mk_size) | V_KEY_CONTEXT_VALID(1)); dst = (char *)(crwr + 1) + kctx_len + DUMMY_BYTES; if (crd->crd_len == 0) { dst[0] = 0x80; - *(uint64_t *)(dst + axf->blocksize - sizeof(uint64_t)) = - htobe64(axf->blocksize << 3); + if (s->mode == HMAC) + *(uint64_t *)(dst + axf->blocksize - sizeof(uint64_t)) = + htobe64(axf->blocksize << 3); } else if (imm_len != 0) crypto_copydata(crp->crp_flags, crp->crp_buf, crd->crd_skip, crd->crd_len, dst); else ccr_write_ulptx_sgl(sc, dst, sgl_nsegs); /* XXX: TODO backpressure */ t4_wrq_tx(sc->adapter, wr); return (0); } static int ccr_hash_done(struct ccr_softc *sc, struct ccr_session *s, struct cryptop *crp, const struct cpl_fw6_pld *cpl, int error) { struct cryptodesc *crd; crd = crp->crp_desc; if (error == 0) { crypto_copyback(crp->crp_flags, crp->crp_buf, crd->crd_inject, s->hmac.hash_len, (c_caddr_t)(cpl + 1)); } return (error); } static int ccr_blkcipher(struct ccr_softc *sc, struct ccr_session *s, struct cryptop *crp) { char iv[CHCR_MAX_CRYPTO_IV_LEN]; struct chcr_wr *crwr; struct wrqe *wr; struct cryptodesc *crd; char *dst; u_int kctx_len, key_half, op_type, transhdr_len, wr_len; u_int imm_len; int dsgl_nsegs, dsgl_len; int sgl_nsegs, sgl_len; int error; crd = crp->crp_desc; if (s->blkcipher.key_len == 0 || crd->crd_len == 0) return (EINVAL); if (crd->crd_alg == CRYPTO_AES_CBC && (crd->crd_len % AES_BLOCK_LEN) != 0) return (EINVAL); /* Reject requests with too large of an input buffer. */ if (crd->crd_len > MAX_REQUEST_SIZE) return (EFBIG); if (crd->crd_flags & CRD_F_ENCRYPT) op_type = CHCR_ENCRYPT_OP; else op_type = CHCR_DECRYPT_OP; sglist_reset(sc->sg_dsgl); error = sglist_append_sglist(sc->sg_dsgl, sc->sg_crp, crd->crd_skip, crd->crd_len); if (error) return (error); dsgl_nsegs = ccr_count_sgl(sc->sg_dsgl, DSGL_SGE_MAXLEN); if (dsgl_nsegs > MAX_RX_PHYS_DSGL_SGE) return (EFBIG); dsgl_len = ccr_phys_dsgl_len(dsgl_nsegs); /* The 'key' must be 128-bit aligned. */ kctx_len = roundup2(s->blkcipher.key_len, 16); transhdr_len = CIPHER_TRANSHDR_SIZE(kctx_len, dsgl_len); if (ccr_use_imm_data(transhdr_len, crd->crd_len + s->blkcipher.iv_len)) { imm_len = crd->crd_len; sgl_nsegs = 0; sgl_len = 0; } else { imm_len = 0; sglist_reset(sc->sg_ulptx); error = sglist_append_sglist(sc->sg_ulptx, sc->sg_crp, crd->crd_skip, crd->crd_len); if (error) return (error); sgl_nsegs = sc->sg_ulptx->sg_nseg; sgl_len = ccr_ulptx_sgl_len(sgl_nsegs); } wr_len = roundup2(transhdr_len, 16) + s->blkcipher.iv_len + roundup2(imm_len, 16) + sgl_len; if (wr_len > SGE_MAX_WR_LEN) return (EFBIG); wr = alloc_wrqe(wr_len, sc->txq); if (wr == NULL) { sc->stats_wr_nomem++; return (ENOMEM); } crwr = wrtod(wr); memset(crwr, 0, wr_len); /* * Read the existing IV from the request or generate a random * one if none is provided. Optionally copy the generated IV * into the output buffer if requested. 
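 * CRD_F_IV_EXPLICIT means the IV is passed in crd_iv.  For
 * encryption without an explicit IV, a fresh one is generated with
 * arc4rand() and, unless CRD_F_IV_PRESENT is set, written back to
 * the buffer at crd_inject; for decryption without an explicit IV,
 * the IV is read from the buffer at crd_inject.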
*/ if (op_type == CHCR_ENCRYPT_OP) { if (crd->crd_flags & CRD_F_IV_EXPLICIT) memcpy(iv, crd->crd_iv, s->blkcipher.iv_len); else arc4rand(iv, s->blkcipher.iv_len, 0); if ((crd->crd_flags & CRD_F_IV_PRESENT) == 0) crypto_copyback(crp->crp_flags, crp->crp_buf, crd->crd_inject, s->blkcipher.iv_len, iv); } else { if (crd->crd_flags & CRD_F_IV_EXPLICIT) memcpy(iv, crd->crd_iv, s->blkcipher.iv_len); else crypto_copydata(crp->crp_flags, crp->crp_buf, crd->crd_inject, s->blkcipher.iv_len, iv); } ccr_populate_wreq(sc, crwr, kctx_len, wr_len, imm_len, sgl_len, 0, crp); /* XXX: Hardcodes SGE loopback channel of 0. */ crwr->sec_cpl.op_ivinsrtofst = htobe32( V_CPL_TX_SEC_PDU_OPCODE(CPL_TX_SEC_PDU) | V_CPL_TX_SEC_PDU_RXCHID(sc->tx_channel_id) | V_CPL_TX_SEC_PDU_ACKFOLLOWS(0) | V_CPL_TX_SEC_PDU_ULPTXLPBK(1) | V_CPL_TX_SEC_PDU_CPLLEN(2) | V_CPL_TX_SEC_PDU_PLACEHOLDER(0) | V_CPL_TX_SEC_PDU_IVINSRTOFST(1)); crwr->sec_cpl.pldlen = htobe32(s->blkcipher.iv_len + crd->crd_len); crwr->sec_cpl.aadstart_cipherstop_hi = htobe32( V_CPL_TX_SEC_PDU_CIPHERSTART(s->blkcipher.iv_len + 1) | V_CPL_TX_SEC_PDU_CIPHERSTOP_HI(0)); crwr->sec_cpl.cipherstop_lo_authinsert = htobe32( V_CPL_TX_SEC_PDU_CIPHERSTOP_LO(0)); /* These two flits are actually a CPL_TLS_TX_SCMD_FMT. */ crwr->sec_cpl.seqno_numivs = htobe32( V_SCMD_SEQ_NO_CTRL(0) | V_SCMD_PROTO_VERSION(SCMD_PROTO_VERSION_GENERIC) | V_SCMD_ENC_DEC_CTRL(op_type) | V_SCMD_CIPH_MODE(s->blkcipher.cipher_mode) | V_SCMD_AUTH_MODE(SCMD_AUTH_MODE_NOP) | V_SCMD_HMAC_CTRL(SCMD_HMAC_CTRL_NOP) | V_SCMD_IV_SIZE(s->blkcipher.iv_len / 2) | V_SCMD_NUM_IVS(0)); crwr->sec_cpl.ivgen_hdrlen = htobe32( V_SCMD_IV_GEN_CTRL(0) | V_SCMD_MORE_FRAGS(0) | V_SCMD_LAST_FRAG(0) | V_SCMD_MAC_ONLY(0) | V_SCMD_AADIVDROP(1) | V_SCMD_HDR_LEN(dsgl_len)); crwr->key_ctx.ctx_hdr = s->blkcipher.key_ctx_hdr; switch (crd->crd_alg) { case CRYPTO_AES_CBC: if (crd->crd_flags & CRD_F_ENCRYPT) memcpy(crwr->key_ctx.key, s->blkcipher.enckey, s->blkcipher.key_len); else memcpy(crwr->key_ctx.key, s->blkcipher.deckey, s->blkcipher.key_len); break; case CRYPTO_AES_ICM: memcpy(crwr->key_ctx.key, s->blkcipher.enckey, s->blkcipher.key_len); break; case CRYPTO_AES_XTS: key_half = s->blkcipher.key_len / 2; memcpy(crwr->key_ctx.key, s->blkcipher.enckey + key_half, key_half); if (crd->crd_flags & CRD_F_ENCRYPT) memcpy(crwr->key_ctx.key + key_half, s->blkcipher.enckey, key_half); else memcpy(crwr->key_ctx.key + key_half, s->blkcipher.deckey, key_half); break; } dst = (char *)(crwr + 1) + kctx_len; ccr_write_phys_dsgl(sc, dst, dsgl_nsegs); dst += sizeof(struct cpl_rx_phys_dsgl) + dsgl_len; memcpy(dst, iv, s->blkcipher.iv_len); dst += s->blkcipher.iv_len; if (imm_len != 0) crypto_copydata(crp->crp_flags, crp->crp_buf, crd->crd_skip, crd->crd_len, dst); else ccr_write_ulptx_sgl(sc, dst, sgl_nsegs); /* XXX: TODO backpressure */ t4_wrq_tx(sc->adapter, wr); return (0); } static int ccr_blkcipher_done(struct ccr_softc *sc, struct ccr_session *s, struct cryptop *crp, const struct cpl_fw6_pld *cpl, int error) { /* * The updated IV to permit chained requests is at * cpl->data[2], but OCF doesn't permit chained requests. */ return (error); } /* * 'hashsize' is the length of a full digest. 'authsize' is the * requested digest length for this operation which may be less * than 'hashsize'. 
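 * For example, with SHA-256 (hashsize 32) an authsize of 16 maps to
 * SCMD_HMAC_CTRL_DIV2, 12 to SCMD_HMAC_CTRL_IPSEC_96BIT, 10 to
 * SCMD_HMAC_CTRL_TRUNC_RFC4366, and the full 32 bytes to
 * SCMD_HMAC_CTRL_NO_TRUNC.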
*/ static int ccr_hmac_ctrl(unsigned int hashsize, unsigned int authsize) { if (authsize == 10) return (SCMD_HMAC_CTRL_TRUNC_RFC4366); if (authsize == 12) return (SCMD_HMAC_CTRL_IPSEC_96BIT); if (authsize == hashsize / 2) return (SCMD_HMAC_CTRL_DIV2); return (SCMD_HMAC_CTRL_NO_TRUNC); } static int ccr_authenc(struct ccr_softc *sc, struct ccr_session *s, struct cryptop *crp, struct cryptodesc *crda, struct cryptodesc *crde) { char iv[CHCR_MAX_CRYPTO_IV_LEN]; struct chcr_wr *crwr; struct wrqe *wr; struct auth_hash *axf; char *dst; u_int kctx_len, key_half, op_type, transhdr_len, wr_len; u_int hash_size_in_response, imm_len, iopad_size; u_int aad_start, aad_len, aad_stop; u_int auth_start, auth_stop, auth_insert; u_int cipher_start, cipher_stop; u_int hmac_ctrl, input_len; int dsgl_nsegs, dsgl_len; int sgl_nsegs, sgl_len; int error; /* * If there is a need in the future, requests with an empty * payload could be supported as HMAC-only requests. */ if (s->blkcipher.key_len == 0 || crde->crd_len == 0) return (EINVAL); if (crde->crd_alg == CRYPTO_AES_CBC && (crde->crd_len % AES_BLOCK_LEN) != 0) return (EINVAL); /* * Compute the length of the AAD (data covered by the * authentication descriptor but not the encryption * descriptor). To simplify the logic, AAD is only permitted * before the cipher/plain text, not after. This is true of * all currently-generated requests. */ if (crda->crd_len + crda->crd_skip > crde->crd_len + crde->crd_skip) return (EINVAL); if (crda->crd_skip < crde->crd_skip) { if (crda->crd_skip + crda->crd_len > crde->crd_skip) aad_len = (crde->crd_skip - crda->crd_skip); else aad_len = crda->crd_len; } else aad_len = 0; if (aad_len + s->blkcipher.iv_len > MAX_AAD_LEN) return (EINVAL); axf = s->hmac.auth_hash; hash_size_in_response = s->hmac.hash_len; if (crde->crd_flags & CRD_F_ENCRYPT) op_type = CHCR_ENCRYPT_OP; else op_type = CHCR_DECRYPT_OP; /* * The output buffer consists of the cipher text followed by * the hash when encrypting. For decryption it only contains * the plain text. * * Due to a firmware bug, the output buffer must include a * dummy output buffer for the IV and AAD prior to the real * output buffer. */ if (op_type == CHCR_ENCRYPT_OP) { if (s->blkcipher.iv_len + aad_len + crde->crd_len + hash_size_in_response > MAX_REQUEST_SIZE) return (EFBIG); } else { if (s->blkcipher.iv_len + aad_len + crde->crd_len > MAX_REQUEST_SIZE) return (EFBIG); } sglist_reset(sc->sg_dsgl); error = sglist_append_sglist(sc->sg_dsgl, sc->sg_iv_aad, 0, s->blkcipher.iv_len + aad_len); if (error) return (error); error = sglist_append_sglist(sc->sg_dsgl, sc->sg_crp, crde->crd_skip, crde->crd_len); if (error) return (error); if (op_type == CHCR_ENCRYPT_OP) { error = sglist_append_sglist(sc->sg_dsgl, sc->sg_crp, crda->crd_inject, hash_size_in_response); if (error) return (error); } dsgl_nsegs = ccr_count_sgl(sc->sg_dsgl, DSGL_SGE_MAXLEN); if (dsgl_nsegs > MAX_RX_PHYS_DSGL_SGE) return (EFBIG); dsgl_len = ccr_phys_dsgl_len(dsgl_nsegs); /* PADs must be 128-bit aligned. */ iopad_size = roundup2(s->hmac.partial_digest_len, 16); /* * The 'key' part of the key context consists of the key followed * by the IPAD and OPAD. */ kctx_len = roundup2(s->blkcipher.key_len, 16) + iopad_size * 2; transhdr_len = CIPHER_TRANSHDR_SIZE(kctx_len, dsgl_len); /* * The input buffer consists of the IV, any AAD, and then the * cipher/plain text. For decryption requests the hash is * appended after the cipher text. 
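 *
 * Seen by the engine, the payload is laid out as
 *
 *	| IV | AAD | cipher/plain text | hash (decrypt only) |
 *
 * and the AADSTART/AADSTOP and CIPHERSTART values computed below are
 * 1-based offsets into this buffer, counting the IV.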
* * The IV is always stored at the start of the input buffer * even though it may be duplicated in the payload. The * crypto engine doesn't work properly if the IV offset points * inside of the AAD region, so a second copy is always * required. */ input_len = aad_len + crde->crd_len; /* * The firmware hangs if sent a request which is a * bit smaller than MAX_REQUEST_SIZE. In particular, the * firmware appears to require 512 - 16 bytes of spare room * along with the size of the hash even if the hash isn't * included in the input buffer. */ if (input_len + roundup2(axf->hashsize, 16) + (512 - 16) > MAX_REQUEST_SIZE) return (EFBIG); if (op_type == CHCR_DECRYPT_OP) input_len += hash_size_in_response; if (ccr_use_imm_data(transhdr_len, s->blkcipher.iv_len + input_len)) { imm_len = input_len; sgl_nsegs = 0; sgl_len = 0; } else { imm_len = 0; sglist_reset(sc->sg_ulptx); if (aad_len != 0) { error = sglist_append_sglist(sc->sg_ulptx, sc->sg_crp, crda->crd_skip, aad_len); if (error) return (error); } error = sglist_append_sglist(sc->sg_ulptx, sc->sg_crp, crde->crd_skip, crde->crd_len); if (error) return (error); if (op_type == CHCR_DECRYPT_OP) { error = sglist_append_sglist(sc->sg_ulptx, sc->sg_crp, crda->crd_inject, hash_size_in_response); if (error) return (error); } sgl_nsegs = sc->sg_ulptx->sg_nseg; sgl_len = ccr_ulptx_sgl_len(sgl_nsegs); } /* * Any auth-only data before the cipher region is marked as AAD. * Auth-data that overlaps with the cipher region is placed in * the auth section. */ if (aad_len != 0) { aad_start = s->blkcipher.iv_len + 1; aad_stop = aad_start + aad_len - 1; } else { aad_start = 0; aad_stop = 0; } cipher_start = s->blkcipher.iv_len + aad_len + 1; if (op_type == CHCR_DECRYPT_OP) cipher_stop = hash_size_in_response; else cipher_stop = 0; if (aad_len == crda->crd_len) { auth_start = 0; auth_stop = 0; } else { if (aad_len != 0) auth_start = cipher_start; else auth_start = s->blkcipher.iv_len + crda->crd_skip - crde->crd_skip + 1; auth_stop = (crde->crd_skip + crde->crd_len) - (crda->crd_skip + crda->crd_len) + cipher_stop; } if (op_type == CHCR_DECRYPT_OP) auth_insert = hash_size_in_response; else auth_insert = 0; wr_len = roundup2(transhdr_len, 16) + s->blkcipher.iv_len + roundup2(imm_len, 16) + sgl_len; if (wr_len > SGE_MAX_WR_LEN) return (EFBIG); wr = alloc_wrqe(wr_len, sc->txq); if (wr == NULL) { sc->stats_wr_nomem++; return (ENOMEM); } crwr = wrtod(wr); memset(crwr, 0, wr_len); /* * Read the existing IV from the request or generate a random * one if none is provided. Optionally copy the generated IV * into the output buffer if requested. */ if (op_type == CHCR_ENCRYPT_OP) { if (crde->crd_flags & CRD_F_IV_EXPLICIT) memcpy(iv, crde->crd_iv, s->blkcipher.iv_len); else arc4rand(iv, s->blkcipher.iv_len, 0); if ((crde->crd_flags & CRD_F_IV_PRESENT) == 0) crypto_copyback(crp->crp_flags, crp->crp_buf, crde->crd_inject, s->blkcipher.iv_len, iv); } else { if (crde->crd_flags & CRD_F_IV_EXPLICIT) memcpy(iv, crde->crd_iv, s->blkcipher.iv_len); else crypto_copydata(crp->crp_flags, crp->crp_buf, crde->crd_inject, s->blkcipher.iv_len, iv); } ccr_populate_wreq(sc, crwr, kctx_len, wr_len, imm_len, sgl_len, op_type == CHCR_DECRYPT_OP ? hash_size_in_response : 0, crp); /* XXX: Hardcodes SGE loopback channel of 0. 
*/ crwr->sec_cpl.op_ivinsrtofst = htobe32( V_CPL_TX_SEC_PDU_OPCODE(CPL_TX_SEC_PDU) | V_CPL_TX_SEC_PDU_RXCHID(sc->tx_channel_id) | V_CPL_TX_SEC_PDU_ACKFOLLOWS(0) | V_CPL_TX_SEC_PDU_ULPTXLPBK(1) | V_CPL_TX_SEC_PDU_CPLLEN(2) | V_CPL_TX_SEC_PDU_PLACEHOLDER(0) | V_CPL_TX_SEC_PDU_IVINSRTOFST(1)); crwr->sec_cpl.pldlen = htobe32(s->blkcipher.iv_len + input_len); crwr->sec_cpl.aadstart_cipherstop_hi = htobe32( V_CPL_TX_SEC_PDU_AADSTART(aad_start) | V_CPL_TX_SEC_PDU_AADSTOP(aad_stop) | V_CPL_TX_SEC_PDU_CIPHERSTART(cipher_start) | V_CPL_TX_SEC_PDU_CIPHERSTOP_HI(cipher_stop >> 4)); crwr->sec_cpl.cipherstop_lo_authinsert = htobe32( V_CPL_TX_SEC_PDU_CIPHERSTOP_LO(cipher_stop & 0xf) | V_CPL_TX_SEC_PDU_AUTHSTART(auth_start) | V_CPL_TX_SEC_PDU_AUTHSTOP(auth_stop) | V_CPL_TX_SEC_PDU_AUTHINSERT(auth_insert)); /* These two flits are actually a CPL_TLS_TX_SCMD_FMT. */ hmac_ctrl = ccr_hmac_ctrl(axf->hashsize, hash_size_in_response); crwr->sec_cpl.seqno_numivs = htobe32( V_SCMD_SEQ_NO_CTRL(0) | V_SCMD_PROTO_VERSION(SCMD_PROTO_VERSION_GENERIC) | V_SCMD_ENC_DEC_CTRL(op_type) | V_SCMD_CIPH_AUTH_SEQ_CTRL(op_type == CHCR_ENCRYPT_OP ? 1 : 0) | V_SCMD_CIPH_MODE(s->blkcipher.cipher_mode) | V_SCMD_AUTH_MODE(s->hmac.auth_mode) | V_SCMD_HMAC_CTRL(hmac_ctrl) | V_SCMD_IV_SIZE(s->blkcipher.iv_len / 2) | V_SCMD_NUM_IVS(0)); crwr->sec_cpl.ivgen_hdrlen = htobe32( V_SCMD_IV_GEN_CTRL(0) | V_SCMD_MORE_FRAGS(0) | V_SCMD_LAST_FRAG(0) | V_SCMD_MAC_ONLY(0) | V_SCMD_AADIVDROP(0) | V_SCMD_HDR_LEN(dsgl_len)); crwr->key_ctx.ctx_hdr = s->blkcipher.key_ctx_hdr; switch (crde->crd_alg) { case CRYPTO_AES_CBC: if (crde->crd_flags & CRD_F_ENCRYPT) memcpy(crwr->key_ctx.key, s->blkcipher.enckey, s->blkcipher.key_len); else memcpy(crwr->key_ctx.key, s->blkcipher.deckey, s->blkcipher.key_len); break; case CRYPTO_AES_ICM: memcpy(crwr->key_ctx.key, s->blkcipher.enckey, s->blkcipher.key_len); break; case CRYPTO_AES_XTS: key_half = s->blkcipher.key_len / 2; memcpy(crwr->key_ctx.key, s->blkcipher.enckey + key_half, key_half); if (crde->crd_flags & CRD_F_ENCRYPT) memcpy(crwr->key_ctx.key + key_half, s->blkcipher.enckey, key_half); else memcpy(crwr->key_ctx.key + key_half, s->blkcipher.deckey, key_half); break; } dst = crwr->key_ctx.key + roundup2(s->blkcipher.key_len, 16); memcpy(dst, s->hmac.ipad, s->hmac.partial_digest_len); memcpy(dst + iopad_size, s->hmac.opad, s->hmac.partial_digest_len); dst = (char *)(crwr + 1) + kctx_len; ccr_write_phys_dsgl(sc, dst, dsgl_nsegs); dst += sizeof(struct cpl_rx_phys_dsgl) + dsgl_len; memcpy(dst, iv, s->blkcipher.iv_len); dst += s->blkcipher.iv_len; if (imm_len != 0) { if (aad_len != 0) { crypto_copydata(crp->crp_flags, crp->crp_buf, crda->crd_skip, aad_len, dst); dst += aad_len; } crypto_copydata(crp->crp_flags, crp->crp_buf, crde->crd_skip, crde->crd_len, dst); dst += crde->crd_len; if (op_type == CHCR_DECRYPT_OP) crypto_copydata(crp->crp_flags, crp->crp_buf, crda->crd_inject, hash_size_in_response, dst); } else ccr_write_ulptx_sgl(sc, dst, sgl_nsegs); /* XXX: TODO backpressure */ t4_wrq_tx(sc->adapter, wr); return (0); } static int ccr_authenc_done(struct ccr_softc *sc, struct ccr_session *s, struct cryptop *crp, const struct cpl_fw6_pld *cpl, int error) { struct cryptodesc *crd; /* * The updated IV to permit chained requests is at * cpl->data[2], but OCF doesn't permit chained requests. * * For a decryption request, the hardware may do a verification * of the HMAC which will fail if the existing HMAC isn't in the * buffer. If that happens, clear the error and copy the HMAC * from the CPL reply into the buffer. 
* * For encryption requests, crd should be the cipher request * which will have CRD_F_ENCRYPT set. For decryption * requests, crp_desc will be the HMAC request which should * not have this flag set. */ crd = crp->crp_desc; if (error == EBADMSG && !CHK_PAD_ERR_BIT(be64toh(cpl->data[0])) && !(crd->crd_flags & CRD_F_ENCRYPT)) { crypto_copyback(crp->crp_flags, crp->crp_buf, crd->crd_inject, s->hmac.hash_len, (c_caddr_t)(cpl + 1)); error = 0; } return (error); } static int ccr_gcm(struct ccr_softc *sc, struct ccr_session *s, struct cryptop *crp, struct cryptodesc *crda, struct cryptodesc *crde) { char iv[CHCR_MAX_CRYPTO_IV_LEN]; struct chcr_wr *crwr; struct wrqe *wr; char *dst; u_int iv_len, kctx_len, op_type, transhdr_len, wr_len; u_int hash_size_in_response, imm_len; u_int aad_start, aad_stop, cipher_start, cipher_stop, auth_insert; u_int hmac_ctrl, input_len; int dsgl_nsegs, dsgl_len; int sgl_nsegs, sgl_len; int error; if (s->blkcipher.key_len == 0) return (EINVAL); /* * The crypto engine doesn't handle GCM requests with an empty * payload, so handle those in software instead. */ if (crde->crd_len == 0) return (EMSGSIZE); /* * AAD is only permitted before the cipher/plain text, not * after. */ if (crda->crd_len + crda->crd_skip > crde->crd_len + crde->crd_skip) return (EMSGSIZE); if (crda->crd_len + AES_BLOCK_LEN > MAX_AAD_LEN) return (EMSGSIZE); hash_size_in_response = s->gmac.hash_len; if (crde->crd_flags & CRD_F_ENCRYPT) op_type = CHCR_ENCRYPT_OP; else op_type = CHCR_DECRYPT_OP; /* * The IV handling for GCM in OCF is a bit more complicated in * that IPSec provides a full 16-byte IV (including the * counter), whereas the /dev/crypto interface sometimes * provides a full 16-byte IV (if no IV is provided in the * ioctl) and sometimes a 12-byte IV (if the IV was explicit). * * When provided a 12-byte IV, assume the IV is really 16 bytes * with a counter in the last 4 bytes initialized to 1. * * While iv_len is checked below, the value is currently * always set to 12 when creating a GCM session in this driver * due to limitations in OCF (there is no way to know what the * IV length of a given request will be). This means that the * driver always assumes as 12-byte IV for now. */ if (s->blkcipher.iv_len == 12) iv_len = AES_BLOCK_LEN; else iv_len = s->blkcipher.iv_len; /* * The output buffer consists of the cipher text followed by * the tag when encrypting. For decryption it only contains * the plain text. * * Due to a firmware bug, the output buffer must include a * dummy output buffer for the IV and AAD prior to the real * output buffer. */ if (op_type == CHCR_ENCRYPT_OP) { if (iv_len + crda->crd_len + crde->crd_len + hash_size_in_response > MAX_REQUEST_SIZE) return (EFBIG); } else { if (iv_len + crda->crd_len + crde->crd_len > MAX_REQUEST_SIZE) return (EFBIG); } sglist_reset(sc->sg_dsgl); error = sglist_append_sglist(sc->sg_dsgl, sc->sg_iv_aad, 0, iv_len + crda->crd_len); if (error) return (error); error = sglist_append_sglist(sc->sg_dsgl, sc->sg_crp, crde->crd_skip, crde->crd_len); if (error) return (error); if (op_type == CHCR_ENCRYPT_OP) { error = sglist_append_sglist(sc->sg_dsgl, sc->sg_crp, crda->crd_inject, hash_size_in_response); if (error) return (error); } dsgl_nsegs = ccr_count_sgl(sc->sg_dsgl, DSGL_SGE_MAXLEN); if (dsgl_nsegs > MAX_RX_PHYS_DSGL_SGE) return (EFBIG); dsgl_len = ccr_phys_dsgl_len(dsgl_nsegs); /* * The 'key' part of the key context consists of the key followed * by the Galois hash key. 
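 * The Galois hash key H is the AES encryption of the all-zero block;
 * it is kept in s->gmac.ghash_h and copied in right after the AES
 * key below.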
*/ kctx_len = roundup2(s->blkcipher.key_len, 16) + GMAC_BLOCK_LEN; transhdr_len = CIPHER_TRANSHDR_SIZE(kctx_len, dsgl_len); /* * The input buffer consists of the IV, any AAD, and then the * cipher/plain text. For decryption requests the hash is * appended after the cipher text. * * The IV is always stored at the start of the input buffer * even though it may be duplicated in the payload. The * crypto engine doesn't work properly if the IV offset points * inside of the AAD region, so a second copy is always * required. */ input_len = crda->crd_len + crde->crd_len; if (op_type == CHCR_DECRYPT_OP) input_len += hash_size_in_response; if (input_len > MAX_REQUEST_SIZE) return (EFBIG); if (ccr_use_imm_data(transhdr_len, iv_len + input_len)) { imm_len = input_len; sgl_nsegs = 0; sgl_len = 0; } else { imm_len = 0; sglist_reset(sc->sg_ulptx); if (crda->crd_len != 0) { error = sglist_append_sglist(sc->sg_ulptx, sc->sg_crp, crda->crd_skip, crda->crd_len); if (error) return (error); } error = sglist_append_sglist(sc->sg_ulptx, sc->sg_crp, crde->crd_skip, crde->crd_len); if (error) return (error); if (op_type == CHCR_DECRYPT_OP) { error = sglist_append_sglist(sc->sg_ulptx, sc->sg_crp, crda->crd_inject, hash_size_in_response); if (error) return (error); } sgl_nsegs = sc->sg_ulptx->sg_nseg; sgl_len = ccr_ulptx_sgl_len(sgl_nsegs); } if (crda->crd_len != 0) { aad_start = iv_len + 1; aad_stop = aad_start + crda->crd_len - 1; } else { aad_start = 0; aad_stop = 0; } cipher_start = iv_len + crda->crd_len + 1; if (op_type == CHCR_DECRYPT_OP) cipher_stop = hash_size_in_response; else cipher_stop = 0; if (op_type == CHCR_DECRYPT_OP) auth_insert = hash_size_in_response; else auth_insert = 0; wr_len = roundup2(transhdr_len, 16) + iv_len + roundup2(imm_len, 16) + sgl_len; if (wr_len > SGE_MAX_WR_LEN) return (EFBIG); wr = alloc_wrqe(wr_len, sc->txq); if (wr == NULL) { sc->stats_wr_nomem++; return (ENOMEM); } crwr = wrtod(wr); memset(crwr, 0, wr_len); /* * Read the existing IV from the request or generate a random * one if none is provided. Optionally copy the generated IV * into the output buffer if requested. * * If the input IV is 12 bytes, append an explicit 4-byte * counter of 1. */ if (op_type == CHCR_ENCRYPT_OP) { if (crde->crd_flags & CRD_F_IV_EXPLICIT) memcpy(iv, crde->crd_iv, s->blkcipher.iv_len); else arc4rand(iv, s->blkcipher.iv_len, 0); if ((crde->crd_flags & CRD_F_IV_PRESENT) == 0) crypto_copyback(crp->crp_flags, crp->crp_buf, crde->crd_inject, s->blkcipher.iv_len, iv); } else { if (crde->crd_flags & CRD_F_IV_EXPLICIT) memcpy(iv, crde->crd_iv, s->blkcipher.iv_len); else crypto_copydata(crp->crp_flags, crp->crp_buf, crde->crd_inject, s->blkcipher.iv_len, iv); } if (s->blkcipher.iv_len == 12) *(uint32_t *)&iv[12] = htobe32(1); ccr_populate_wreq(sc, crwr, kctx_len, wr_len, imm_len, sgl_len, 0, crp); /* XXX: Hardcodes SGE loopback channel of 0. */ crwr->sec_cpl.op_ivinsrtofst = htobe32( V_CPL_TX_SEC_PDU_OPCODE(CPL_TX_SEC_PDU) | V_CPL_TX_SEC_PDU_RXCHID(sc->tx_channel_id) | V_CPL_TX_SEC_PDU_ACKFOLLOWS(0) | V_CPL_TX_SEC_PDU_ULPTXLPBK(1) | V_CPL_TX_SEC_PDU_CPLLEN(2) | V_CPL_TX_SEC_PDU_PLACEHOLDER(0) | V_CPL_TX_SEC_PDU_IVINSRTOFST(1)); crwr->sec_cpl.pldlen = htobe32(iv_len + input_len); /* * NB: cipherstop is explicitly set to 0. On encrypt it * should normally be set to 0 anyway (as the encrypt crd ends * at the end of the input). However, for decrypt the cipher * ends before the tag in the AUTHENC case (and authstop is * set to stop before the tag), but for GCM the cipher still * runs to the end of the buffer. 
Not sure if this is * intentional or a firmware quirk, but it is required for * working tag validation with GCM decryption. */ crwr->sec_cpl.aadstart_cipherstop_hi = htobe32( V_CPL_TX_SEC_PDU_AADSTART(aad_start) | V_CPL_TX_SEC_PDU_AADSTOP(aad_stop) | V_CPL_TX_SEC_PDU_CIPHERSTART(cipher_start) | V_CPL_TX_SEC_PDU_CIPHERSTOP_HI(0)); crwr->sec_cpl.cipherstop_lo_authinsert = htobe32( V_CPL_TX_SEC_PDU_CIPHERSTOP_LO(0) | V_CPL_TX_SEC_PDU_AUTHSTART(cipher_start) | V_CPL_TX_SEC_PDU_AUTHSTOP(cipher_stop) | V_CPL_TX_SEC_PDU_AUTHINSERT(auth_insert)); /* These two flits are actually a CPL_TLS_TX_SCMD_FMT. */ hmac_ctrl = ccr_hmac_ctrl(AES_GMAC_HASH_LEN, hash_size_in_response); crwr->sec_cpl.seqno_numivs = htobe32( V_SCMD_SEQ_NO_CTRL(0) | V_SCMD_PROTO_VERSION(SCMD_PROTO_VERSION_GENERIC) | V_SCMD_ENC_DEC_CTRL(op_type) | V_SCMD_CIPH_AUTH_SEQ_CTRL(op_type == CHCR_ENCRYPT_OP ? 1 : 0) | V_SCMD_CIPH_MODE(SCMD_CIPH_MODE_AES_GCM) | V_SCMD_AUTH_MODE(SCMD_AUTH_MODE_GHASH) | V_SCMD_HMAC_CTRL(hmac_ctrl) | V_SCMD_IV_SIZE(iv_len / 2) | V_SCMD_NUM_IVS(0)); crwr->sec_cpl.ivgen_hdrlen = htobe32( V_SCMD_IV_GEN_CTRL(0) | V_SCMD_MORE_FRAGS(0) | V_SCMD_LAST_FRAG(0) | V_SCMD_MAC_ONLY(0) | V_SCMD_AADIVDROP(0) | V_SCMD_HDR_LEN(dsgl_len)); crwr->key_ctx.ctx_hdr = s->blkcipher.key_ctx_hdr; memcpy(crwr->key_ctx.key, s->blkcipher.enckey, s->blkcipher.key_len); dst = crwr->key_ctx.key + roundup2(s->blkcipher.key_len, 16); memcpy(dst, s->gmac.ghash_h, GMAC_BLOCK_LEN); dst = (char *)(crwr + 1) + kctx_len; ccr_write_phys_dsgl(sc, dst, dsgl_nsegs); dst += sizeof(struct cpl_rx_phys_dsgl) + dsgl_len; memcpy(dst, iv, iv_len); dst += iv_len; if (imm_len != 0) { if (crda->crd_len != 0) { crypto_copydata(crp->crp_flags, crp->crp_buf, crda->crd_skip, crda->crd_len, dst); dst += crda->crd_len; } crypto_copydata(crp->crp_flags, crp->crp_buf, crde->crd_skip, crde->crd_len, dst); dst += crde->crd_len; if (op_type == CHCR_DECRYPT_OP) crypto_copydata(crp->crp_flags, crp->crp_buf, crda->crd_inject, hash_size_in_response, dst); } else ccr_write_ulptx_sgl(sc, dst, sgl_nsegs); /* XXX: TODO backpressure */ t4_wrq_tx(sc->adapter, wr); return (0); } static int ccr_gcm_done(struct ccr_softc *sc, struct ccr_session *s, struct cryptop *crp, const struct cpl_fw6_pld *cpl, int error) { /* * The updated IV to permit chained requests is at * cpl->data[2], but OCF doesn't permit chained requests. * * Note that the hardware should always verify the GMAC hash. */ return (error); } /* * Handle a GCM request that is not supported by the crypto engine by * performing the operation in software. Derived from swcr_authenc(). */ static void ccr_gcm_soft(struct ccr_session *s, struct cryptop *crp, struct cryptodesc *crda, struct cryptodesc *crde) { struct auth_hash *axf; struct enc_xform *exf; void *auth_ctx; uint8_t *kschedule; char block[GMAC_BLOCK_LEN]; char digest[GMAC_DIGEST_LEN]; char iv[AES_BLOCK_LEN]; int error, i, len; auth_ctx = NULL; kschedule = NULL; /* Initialize the MAC. */ switch (s->blkcipher.key_len) { case 16: axf = &auth_hash_nist_gmac_aes_128; break; case 24: axf = &auth_hash_nist_gmac_aes_192; break; case 32: axf = &auth_hash_nist_gmac_aes_256; break; default: error = EINVAL; goto out; } auth_ctx = malloc(axf->ctxsize, M_CCR, M_NOWAIT); if (auth_ctx == NULL) { error = ENOMEM; goto out; } axf->Init(auth_ctx); axf->Setkey(auth_ctx, s->blkcipher.enckey, s->blkcipher.key_len); /* Initialize the cipher. 
*/ exf = &enc_xform_aes_nist_gcm; error = exf->setkey(&kschedule, s->blkcipher.enckey, s->blkcipher.key_len); if (error) goto out; /* * This assumes a 12-byte IV from the crp. See longer comment * above in ccr_gcm() for more details. */ if (crde->crd_flags & CRD_F_ENCRYPT) { if (crde->crd_flags & CRD_F_IV_EXPLICIT) memcpy(iv, crde->crd_iv, 12); else arc4rand(iv, 12, 0); if ((crde->crd_flags & CRD_F_IV_PRESENT) == 0) crypto_copyback(crp->crp_flags, crp->crp_buf, crde->crd_inject, 12, iv); } else { if (crde->crd_flags & CRD_F_IV_EXPLICIT) memcpy(iv, crde->crd_iv, 12); else crypto_copydata(crp->crp_flags, crp->crp_buf, crde->crd_inject, 12, iv); } *(uint32_t *)&iv[12] = htobe32(1); axf->Reinit(auth_ctx, iv, sizeof(iv)); /* MAC the AAD. */ for (i = 0; i < crda->crd_len; i += sizeof(block)) { len = imin(crda->crd_len - i, sizeof(block)); crypto_copydata(crp->crp_flags, crp->crp_buf, crda->crd_skip + i, len, block); bzero(block + len, sizeof(block) - len); axf->Update(auth_ctx, block, sizeof(block)); } exf->reinit(kschedule, iv); /* Do encryption with MAC */ for (i = 0; i < crde->crd_len; i += sizeof(block)) { len = imin(crde->crd_len - i, sizeof(block)); crypto_copydata(crp->crp_flags, crp->crp_buf, crde->crd_skip + i, len, block); bzero(block + len, sizeof(block) - len); if (crde->crd_flags & CRD_F_ENCRYPT) { exf->encrypt(kschedule, block); axf->Update(auth_ctx, block, len); crypto_copyback(crp->crp_flags, crp->crp_buf, crde->crd_skip + i, len, block); } else { axf->Update(auth_ctx, block, len); } } /* Length block. */ bzero(block, sizeof(block)); ((uint32_t *)block)[1] = htobe32(crda->crd_len * 8); ((uint32_t *)block)[3] = htobe32(crde->crd_len * 8); axf->Update(auth_ctx, block, sizeof(block)); /* Finalize MAC. */ axf->Final(digest, auth_ctx); /* Inject or validate tag. */ if (crde->crd_flags & CRD_F_ENCRYPT) { crypto_copyback(crp->crp_flags, crp->crp_buf, crda->crd_inject, sizeof(digest), digest); error = 0; } else { char digest2[GMAC_DIGEST_LEN]; crypto_copydata(crp->crp_flags, crp->crp_buf, crda->crd_inject, sizeof(digest2), digest2); if (timingsafe_bcmp(digest, digest2, sizeof(digest)) == 0) { error = 0; /* Tag matches, decrypt data. */ for (i = 0; i < crde->crd_len; i += sizeof(block)) { len = imin(crde->crd_len - i, sizeof(block)); crypto_copydata(crp->crp_flags, crp->crp_buf, crde->crd_skip + i, len, block); bzero(block + len, sizeof(block) - len); exf->decrypt(kschedule, block); crypto_copyback(crp->crp_flags, crp->crp_buf, crde->crd_skip + i, len, block); } } else error = EBADMSG; } exf->zerokey(&kschedule); out: if (auth_ctx != NULL) { memset(auth_ctx, 0, axf->ctxsize); free(auth_ctx, M_CCR); } crp->crp_etype = error; crypto_done(crp); } static void +generate_ccm_b0(struct cryptodesc *crda, struct cryptodesc *crde, + u_int hash_size_in_response, const char *iv, char *b0) +{ + u_int i, payload_len; + + /* NB: L is already set in the first byte of the IV. */ + memcpy(b0, iv, CCM_B0_SIZE); + + /* Set length of hash in bits 3 - 5. */ + b0[0] |= (((hash_size_in_response - 2) / 2) << 3); + + /* Store the payload length as a big-endian value. */ + payload_len = crde->crd_len; + for (i = 0; i < iv[0]; i++) { + b0[CCM_CBC_BLOCK_LEN - 1 - i] = payload_len; + payload_len >>= 8; + } + + /* + * If there is AAD in the request, set bit 6 in the flags + * field and store the AAD length as a big-endian value at the + * start of block 1. This only assumes a 16-bit AAD length + * since T6 doesn't support large AAD sizes. 
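+ *
+ * This matches the B_0 layout from RFC 3610: the low three bits of
+ * the flags byte hold L - 1 (already present in iv[0]), bits 3-5
+ * hold (M - 2) / 2 for an M-byte tag, and bit 6 is the Adata flag.
+ * For example, a 12-byte nonce (L = 3), a 16-byte tag, and non-empty
+ * AAD give a flags byte of 0x02 | (7 << 3) | 0x40 = 0x7a.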
+ */ + if (crda->crd_len != 0) { + b0[0] |= (1 << 6); + *(uint16_t *)(b0 + CCM_B0_SIZE) = htobe16(crda->crd_len); + } +} + +static int +ccr_ccm(struct ccr_softc *sc, struct ccr_session *s, struct cryptop *crp, + struct cryptodesc *crda, struct cryptodesc *crde) +{ + char iv[CHCR_MAX_CRYPTO_IV_LEN]; + struct ulptx_idata *idata; + struct chcr_wr *crwr; + struct wrqe *wr; + char *dst; + u_int iv_len, kctx_len, op_type, transhdr_len, wr_len; + u_int aad_len, b0_len, hash_size_in_response, imm_len; + u_int aad_start, aad_stop, cipher_start, cipher_stop, auth_insert; + u_int hmac_ctrl, input_len; + int dsgl_nsegs, dsgl_len; + int sgl_nsegs, sgl_len; + int error; + + if (s->blkcipher.key_len == 0) + return (EINVAL); + + /* + * The crypto engine doesn't handle CCM requests with an empty + * payload, so handle those in software instead. + */ + if (crde->crd_len == 0) + return (EMSGSIZE); + + /* + * AAD is only permitted before the cipher/plain text, not + * after. + */ + if (crda->crd_len + crda->crd_skip > crde->crd_len + crde->crd_skip) + return (EMSGSIZE); + + /* + * CCM always includes block 0 in the AAD before AAD from the + * request. + */ + b0_len = CCM_B0_SIZE; + if (crda->crd_len != 0) + b0_len += CCM_AAD_FIELD_SIZE; + aad_len = b0_len + crda->crd_len; + + /* + * Always assume a 12 byte input IV for now since that is what + * OCF always generates. The full IV in the work request is + * 16 bytes. + */ + iv_len = AES_BLOCK_LEN; + + if (iv_len + aad_len > MAX_AAD_LEN) + return (EMSGSIZE); + + hash_size_in_response = s->ccm_mac.hash_len; + if (crde->crd_flags & CRD_F_ENCRYPT) + op_type = CHCR_ENCRYPT_OP; + else + op_type = CHCR_DECRYPT_OP; + + /* + * The output buffer consists of the cipher text followed by + * the tag when encrypting. For decryption it only contains + * the plain text. + * + * Due to a firmware bug, the output buffer must include a + * dummy output buffer for the IV and AAD prior to the real + * output buffer. + */ + if (op_type == CHCR_ENCRYPT_OP) { + if (iv_len + aad_len + crde->crd_len + hash_size_in_response > + MAX_REQUEST_SIZE) + return (EFBIG); + } else { + if (iv_len + aad_len + crde->crd_len > MAX_REQUEST_SIZE) + return (EFBIG); + } + sglist_reset(sc->sg_dsgl); + error = sglist_append_sglist(sc->sg_dsgl, sc->sg_iv_aad, 0, iv_len + + aad_len); + if (error) + return (error); + error = sglist_append_sglist(sc->sg_dsgl, sc->sg_crp, crde->crd_skip, + crde->crd_len); + if (error) + return (error); + if (op_type == CHCR_ENCRYPT_OP) { + error = sglist_append_sglist(sc->sg_dsgl, sc->sg_crp, + crda->crd_inject, hash_size_in_response); + if (error) + return (error); + } + dsgl_nsegs = ccr_count_sgl(sc->sg_dsgl, DSGL_SGE_MAXLEN); + if (dsgl_nsegs > MAX_RX_PHYS_DSGL_SGE) + return (EFBIG); + dsgl_len = ccr_phys_dsgl_len(dsgl_nsegs); + + /* + * The 'key' part of the key context consists of two copies of + * the AES key. + */ + kctx_len = roundup2(s->blkcipher.key_len, 16) * 2; + transhdr_len = CIPHER_TRANSHDR_SIZE(kctx_len, dsgl_len); + + /* + * The input buffer consists of the IV, AAD (including block + * 0), and then the cipher/plain text. For decryption + * requests the hash is appended after the cipher text. + * + * The IV is always stored at the start of the input buffer + * even though it may be duplicated in the payload. The + * crypto engine doesn't work properly if the IV offset points + * inside of the AAD region, so a second copy is always + * required. 
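+ *
+ * The 16-byte IV constructed below is the initial CCM counter
+ * block: one flags byte holding L - 1, the 12-byte nonce, and a
+ * counter in the remaining three bytes initialized to zero.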
+ */ + input_len = aad_len + crde->crd_len; + if (op_type == CHCR_DECRYPT_OP) + input_len += hash_size_in_response; + if (input_len > MAX_REQUEST_SIZE) + return (EFBIG); + if (ccr_use_imm_data(transhdr_len, iv_len + input_len)) { + imm_len = input_len; + sgl_nsegs = 0; + sgl_len = 0; + } else { + /* Block 0 is passed as immediate data. */ + imm_len = b0_len; + + sglist_reset(sc->sg_ulptx); + if (crda->crd_len != 0) { + error = sglist_append_sglist(sc->sg_ulptx, sc->sg_crp, + crda->crd_skip, crda->crd_len); + if (error) + return (error); + } + error = sglist_append_sglist(sc->sg_ulptx, sc->sg_crp, + crde->crd_skip, crde->crd_len); + if (error) + return (error); + if (op_type == CHCR_DECRYPT_OP) { + error = sglist_append_sglist(sc->sg_ulptx, sc->sg_crp, + crda->crd_inject, hash_size_in_response); + if (error) + return (error); + } + sgl_nsegs = sc->sg_ulptx->sg_nseg; + sgl_len = ccr_ulptx_sgl_len(sgl_nsegs); + } + + aad_start = iv_len + 1; + aad_stop = aad_start + aad_len - 1; + cipher_start = aad_stop + 1; + if (op_type == CHCR_DECRYPT_OP) + cipher_stop = hash_size_in_response; + else + cipher_stop = 0; + if (op_type == CHCR_DECRYPT_OP) + auth_insert = hash_size_in_response; + else + auth_insert = 0; + + wr_len = roundup2(transhdr_len, 16) + iv_len + roundup2(imm_len, 16) + + sgl_len; + if (wr_len > SGE_MAX_WR_LEN) + return (EFBIG); + wr = alloc_wrqe(wr_len, sc->txq); + if (wr == NULL) { + sc->stats_wr_nomem++; + return (ENOMEM); + } + crwr = wrtod(wr); + memset(crwr, 0, wr_len); + + /* + * Read the nonce from the request or generate a random one if + * none is provided. Use the nonce to generate the full IV + * with the counter set to 0. + */ + memset(iv, 0, iv_len); + iv[0] = (15 - AES_CCM_IV_LEN) - 1; + if (op_type == CHCR_ENCRYPT_OP) { + if (crde->crd_flags & CRD_F_IV_EXPLICIT) + memcpy(iv + 1, crde->crd_iv, AES_CCM_IV_LEN); + else + arc4rand(iv + 1, AES_CCM_IV_LEN, 0); + if ((crde->crd_flags & CRD_F_IV_PRESENT) == 0) + crypto_copyback(crp->crp_flags, crp->crp_buf, + crde->crd_inject, AES_CCM_IV_LEN, iv + 1); + } else { + if (crde->crd_flags & CRD_F_IV_EXPLICIT) + memcpy(iv + 1, crde->crd_iv, AES_CCM_IV_LEN); + else + crypto_copydata(crp->crp_flags, crp->crp_buf, + crde->crd_inject, AES_CCM_IV_LEN, iv + 1); + } + + ccr_populate_wreq(sc, crwr, kctx_len, wr_len, imm_len, sgl_len, 0, + crp); + + /* XXX: Hardcodes SGE loopback channel of 0. */ + crwr->sec_cpl.op_ivinsrtofst = htobe32( + V_CPL_TX_SEC_PDU_OPCODE(CPL_TX_SEC_PDU) | + V_CPL_TX_SEC_PDU_RXCHID(sc->tx_channel_id) | + V_CPL_TX_SEC_PDU_ACKFOLLOWS(0) | V_CPL_TX_SEC_PDU_ULPTXLPBK(1) | + V_CPL_TX_SEC_PDU_CPLLEN(2) | V_CPL_TX_SEC_PDU_PLACEHOLDER(0) | + V_CPL_TX_SEC_PDU_IVINSRTOFST(1)); + + crwr->sec_cpl.pldlen = htobe32(iv_len + input_len); + + /* + * NB: cipherstop is explicitly set to 0. See comments above + * in ccr_gcm(). + */ + crwr->sec_cpl.aadstart_cipherstop_hi = htobe32( + V_CPL_TX_SEC_PDU_AADSTART(aad_start) | + V_CPL_TX_SEC_PDU_AADSTOP(aad_stop) | + V_CPL_TX_SEC_PDU_CIPHERSTART(cipher_start) | + V_CPL_TX_SEC_PDU_CIPHERSTOP_HI(0)); + crwr->sec_cpl.cipherstop_lo_authinsert = htobe32( + V_CPL_TX_SEC_PDU_CIPHERSTOP_LO(0) | + V_CPL_TX_SEC_PDU_AUTHSTART(cipher_start) | + V_CPL_TX_SEC_PDU_AUTHSTOP(cipher_stop) | + V_CPL_TX_SEC_PDU_AUTHINSERT(auth_insert)); + + /* These two flits are actually a CPL_TLS_TX_SCMD_FMT. 
*/ + hmac_ctrl = ccr_hmac_ctrl(AES_CBC_MAC_HASH_LEN, hash_size_in_response); + crwr->sec_cpl.seqno_numivs = htobe32( + V_SCMD_SEQ_NO_CTRL(0) | + V_SCMD_PROTO_VERSION(SCMD_PROTO_VERSION_GENERIC) | + V_SCMD_ENC_DEC_CTRL(op_type) | + V_SCMD_CIPH_AUTH_SEQ_CTRL(op_type == CHCR_ENCRYPT_OP ? 0 : 1) | + V_SCMD_CIPH_MODE(SCMD_CIPH_MODE_AES_CCM) | + V_SCMD_AUTH_MODE(SCMD_AUTH_MODE_CBCMAC) | + V_SCMD_HMAC_CTRL(hmac_ctrl) | + V_SCMD_IV_SIZE(iv_len / 2) | + V_SCMD_NUM_IVS(0)); + crwr->sec_cpl.ivgen_hdrlen = htobe32( + V_SCMD_IV_GEN_CTRL(0) | + V_SCMD_MORE_FRAGS(0) | V_SCMD_LAST_FRAG(0) | V_SCMD_MAC_ONLY(0) | + V_SCMD_AADIVDROP(0) | V_SCMD_HDR_LEN(dsgl_len)); + + crwr->key_ctx.ctx_hdr = s->blkcipher.key_ctx_hdr; + memcpy(crwr->key_ctx.key, s->blkcipher.enckey, s->blkcipher.key_len); + memcpy(crwr->key_ctx.key + roundup(s->blkcipher.key_len, 16), + s->blkcipher.enckey, s->blkcipher.key_len); + + dst = (char *)(crwr + 1) + kctx_len; + ccr_write_phys_dsgl(sc, dst, dsgl_nsegs); + dst += sizeof(struct cpl_rx_phys_dsgl) + dsgl_len; + memcpy(dst, iv, iv_len); + dst += iv_len; + generate_ccm_b0(crda, crde, hash_size_in_response, iv, dst); + if (sgl_nsegs == 0) { + dst += b0_len; + if (crda->crd_len != 0) { + crypto_copydata(crp->crp_flags, crp->crp_buf, + crda->crd_skip, crda->crd_len, dst); + dst += crda->crd_len; + } + crypto_copydata(crp->crp_flags, crp->crp_buf, crde->crd_skip, + crde->crd_len, dst); + dst += crde->crd_len; + if (op_type == CHCR_DECRYPT_OP) + crypto_copydata(crp->crp_flags, crp->crp_buf, + crda->crd_inject, hash_size_in_response, dst); + } else { + dst += CCM_B0_SIZE; + if (b0_len > CCM_B0_SIZE) { + /* + * If there is AAD, insert padding including a + * ULP_TX_SC_NOOP so that the ULP_TX_SC_DSGL + * is 16-byte aligned. + */ + KASSERT(b0_len - CCM_B0_SIZE == CCM_AAD_FIELD_SIZE, + ("b0_len mismatch")); + memset(dst + CCM_AAD_FIELD_SIZE, 0, + 8 - CCM_AAD_FIELD_SIZE); + idata = (void *)(dst + 8); + idata->cmd_more = htobe32(V_ULPTX_CMD(ULP_TX_SC_NOOP)); + idata->len = htobe32(0); + dst = (void *)(idata + 1); + } + ccr_write_ulptx_sgl(sc, dst, sgl_nsegs); + } + + /* XXX: TODO backpressure */ + t4_wrq_tx(sc->adapter, wr); + + return (0); +} + +static int +ccr_ccm_done(struct ccr_softc *sc, struct ccr_session *s, + struct cryptop *crp, const struct cpl_fw6_pld *cpl, int error) +{ + + /* + * The updated IV to permit chained requests is at + * cpl->data[2], but OCF doesn't permit chained requests. + * + * Note that the hardware should always verify the CBC MAC + * hash. + */ + return (error); +} + +/* + * Handle a CCM request that is not supported by the crypto engine by + * performing the operation in software. Derived from swcr_authenc(). + */ +static void +ccr_ccm_soft(struct ccr_session *s, struct cryptop *crp, + struct cryptodesc *crda, struct cryptodesc *crde) +{ + struct auth_hash *axf; + struct enc_xform *exf; + union authctx *auth_ctx; + uint8_t *kschedule; + char block[CCM_CBC_BLOCK_LEN]; + char digest[AES_CBC_MAC_HASH_LEN]; + char iv[AES_CCM_IV_LEN]; + int error, i, len; + + auth_ctx = NULL; + kschedule = NULL; + + /* Initialize the MAC. 
*/ + switch (s->blkcipher.key_len) { + case 16: + axf = &auth_hash_ccm_cbc_mac_128; + break; + case 24: + axf = &auth_hash_ccm_cbc_mac_192; + break; + case 32: + axf = &auth_hash_ccm_cbc_mac_256; + break; + default: + error = EINVAL; + goto out; + } + auth_ctx = malloc(axf->ctxsize, M_CCR, M_NOWAIT); + if (auth_ctx == NULL) { + error = ENOMEM; + goto out; + } + axf->Init(auth_ctx); + axf->Setkey(auth_ctx, s->blkcipher.enckey, s->blkcipher.key_len); + + /* Initialize the cipher. */ + exf = &enc_xform_ccm; + error = exf->setkey(&kschedule, s->blkcipher.enckey, + s->blkcipher.key_len); + if (error) + goto out; + + if (crde->crd_flags & CRD_F_ENCRYPT) { + if (crde->crd_flags & CRD_F_IV_EXPLICIT) + memcpy(iv, crde->crd_iv, AES_CCM_IV_LEN); + else + arc4rand(iv, AES_CCM_IV_LEN, 0); + if ((crde->crd_flags & CRD_F_IV_PRESENT) == 0) + crypto_copyback(crp->crp_flags, crp->crp_buf, + crde->crd_inject, AES_CCM_IV_LEN, iv); + } else { + if (crde->crd_flags & CRD_F_IV_EXPLICIT) + memcpy(iv, crde->crd_iv, AES_CCM_IV_LEN); + else + crypto_copydata(crp->crp_flags, crp->crp_buf, + crde->crd_inject, AES_CCM_IV_LEN, iv); + } + + auth_ctx->aes_cbc_mac_ctx.authDataLength = crda->crd_len; + auth_ctx->aes_cbc_mac_ctx.cryptDataLength = crde->crd_len; + axf->Reinit(auth_ctx, iv, sizeof(iv)); + + /* MAC the AAD. */ + for (i = 0; i < crda->crd_len; i += sizeof(block)) { + len = imin(crda->crd_len - i, sizeof(block)); + crypto_copydata(crp->crp_flags, crp->crp_buf, crda->crd_skip + + i, len, block); + bzero(block + len, sizeof(block) - len); + axf->Update(auth_ctx, block, sizeof(block)); + } + + exf->reinit(kschedule, iv); + + /* Do encryption/decryption with MAC */ + for (i = 0; i < crde->crd_len; i += sizeof(block)) { + len = imin(crde->crd_len - i, sizeof(block)); + crypto_copydata(crp->crp_flags, crp->crp_buf, crde->crd_skip + + i, len, block); + bzero(block + len, sizeof(block) - len); + if (crde->crd_flags & CRD_F_ENCRYPT) { + axf->Update(auth_ctx, block, len); + exf->encrypt(kschedule, block); + crypto_copyback(crp->crp_flags, crp->crp_buf, + crde->crd_skip + i, len, block); + } else { + exf->decrypt(kschedule, block); + axf->Update(auth_ctx, block, len); + } + } + + /* Finalize MAC. */ + axf->Final(digest, auth_ctx); + + /* Inject or validate tag. */ + if (crde->crd_flags & CRD_F_ENCRYPT) { + crypto_copyback(crp->crp_flags, crp->crp_buf, crda->crd_inject, + sizeof(digest), digest); + error = 0; + } else { + char digest2[GMAC_DIGEST_LEN]; + + crypto_copydata(crp->crp_flags, crp->crp_buf, crda->crd_inject, + sizeof(digest2), digest2); + if (timingsafe_bcmp(digest, digest2, sizeof(digest)) == 0) { + error = 0; + + /* Tag matches, decrypt data. 
*/ + exf->reinit(kschedule, iv); + for (i = 0; i < crde->crd_len; i += sizeof(block)) { + len = imin(crde->crd_len - i, sizeof(block)); + crypto_copydata(crp->crp_flags, crp->crp_buf, + crde->crd_skip + i, len, block); + bzero(block + len, sizeof(block) - len); + exf->decrypt(kschedule, block); + crypto_copyback(crp->crp_flags, crp->crp_buf, + crde->crd_skip + i, len, block); + } + } else + error = EBADMSG; + } + + exf->zerokey(&kschedule); +out: + if (auth_ctx != NULL) { + memset(auth_ctx, 0, axf->ctxsize); + free(auth_ctx, M_CCR); + } + crp->crp_etype = error; + crypto_done(crp); +} + +static void ccr_identify(driver_t *driver, device_t parent) { struct adapter *sc; sc = device_get_softc(parent); if (sc->cryptocaps & FW_CAPS_CONFIG_CRYPTO_LOOKASIDE && device_find_child(parent, "ccr", -1) == NULL) device_add_child(parent, "ccr", -1); } static int ccr_probe(device_t dev) { device_set_desc(dev, "Chelsio Crypto Accelerator"); return (BUS_PROBE_DEFAULT); } static void ccr_sysctls(struct ccr_softc *sc) { struct sysctl_ctx_list *ctx; struct sysctl_oid *oid; struct sysctl_oid_list *children; ctx = device_get_sysctl_ctx(sc->dev); /* * dev.ccr.X. */ oid = device_get_sysctl_tree(sc->dev); children = SYSCTL_CHILDREN(oid); /* * dev.ccr.X.stats. */ oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, "stats", CTLFLAG_RD, NULL, "statistics"); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_U64(ctx, children, OID_AUTO, "hash", CTLFLAG_RD, &sc->stats_hash, 0, "Hash requests submitted"); SYSCTL_ADD_U64(ctx, children, OID_AUTO, "hmac", CTLFLAG_RD, &sc->stats_hmac, 0, "HMAC requests submitted"); SYSCTL_ADD_U64(ctx, children, OID_AUTO, "cipher_encrypt", CTLFLAG_RD, &sc->stats_blkcipher_encrypt, 0, "Cipher encryption requests submitted"); SYSCTL_ADD_U64(ctx, children, OID_AUTO, "cipher_decrypt", CTLFLAG_RD, &sc->stats_blkcipher_decrypt, 0, "Cipher decryption requests submitted"); SYSCTL_ADD_U64(ctx, children, OID_AUTO, "authenc_encrypt", CTLFLAG_RD, &sc->stats_authenc_encrypt, 0, "Combined AES+HMAC encryption requests submitted"); SYSCTL_ADD_U64(ctx, children, OID_AUTO, "authenc_decrypt", CTLFLAG_RD, &sc->stats_authenc_decrypt, 0, "Combined AES+HMAC decryption requests submitted"); SYSCTL_ADD_U64(ctx, children, OID_AUTO, "gcm_encrypt", CTLFLAG_RD, &sc->stats_gcm_encrypt, 0, "AES-GCM encryption requests submitted"); SYSCTL_ADD_U64(ctx, children, OID_AUTO, "gcm_decrypt", CTLFLAG_RD, &sc->stats_gcm_decrypt, 0, "AES-GCM decryption requests submitted"); + SYSCTL_ADD_U64(ctx, children, OID_AUTO, "ccm_encrypt", CTLFLAG_RD, + &sc->stats_ccm_encrypt, 0, "AES-CCM encryption requests submitted"); + SYSCTL_ADD_U64(ctx, children, OID_AUTO, "ccm_decrypt", CTLFLAG_RD, + &sc->stats_ccm_decrypt, 0, "AES-CCM decryption requests submitted"); SYSCTL_ADD_U64(ctx, children, OID_AUTO, "wr_nomem", CTLFLAG_RD, &sc->stats_wr_nomem, 0, "Work request memory allocation failures"); SYSCTL_ADD_U64(ctx, children, OID_AUTO, "inflight", CTLFLAG_RD, &sc->stats_inflight, 0, "Requests currently pending"); SYSCTL_ADD_U64(ctx, children, OID_AUTO, "mac_error", CTLFLAG_RD, &sc->stats_mac_error, 0, "MAC errors"); SYSCTL_ADD_U64(ctx, children, OID_AUTO, "pad_error", CTLFLAG_RD, &sc->stats_pad_error, 0, "Padding errors"); SYSCTL_ADD_U64(ctx, children, OID_AUTO, "bad_session", CTLFLAG_RD, &sc->stats_bad_session, 0, "Requests with invalid session ID"); SYSCTL_ADD_U64(ctx, children, OID_AUTO, "sglist_error", CTLFLAG_RD, &sc->stats_sglist_error, 0, "Requests for which DMA mapping failed"); SYSCTL_ADD_U64(ctx, children, OID_AUTO, "process_error", CTLFLAG_RD, 
&sc->stats_process_error, 0, "Requests failed during queueing"); SYSCTL_ADD_U64(ctx, children, OID_AUTO, "sw_fallback", CTLFLAG_RD, &sc->stats_sw_fallback, 0, "Requests processed by falling back to software"); } static int ccr_attach(device_t dev) { struct ccr_softc *sc; int32_t cid; sc = device_get_softc(dev); sc->dev = dev; sc->adapter = device_get_softc(device_get_parent(dev)); sc->txq = &sc->adapter->sge.ctrlq[0]; sc->rxq = &sc->adapter->sge.rxq[0]; cid = crypto_get_driverid(dev, sizeof(struct ccr_session), CRYPTOCAP_F_HARDWARE); if (cid < 0) { device_printf(dev, "could not get crypto driver id\n"); return (ENXIO); } sc->cid = cid; sc->adapter->ccr_softc = sc; /* XXX: TODO? */ sc->tx_channel_id = 0; mtx_init(&sc->lock, "ccr", NULL, MTX_DEF); sc->sg_crp = sglist_alloc(TX_SGL_SEGS, M_WAITOK); sc->sg_ulptx = sglist_alloc(TX_SGL_SEGS, M_WAITOK); sc->sg_dsgl = sglist_alloc(MAX_RX_PHYS_DSGL_SGE, M_WAITOK); sc->iv_aad_buf = malloc(MAX_AAD_LEN, M_CCR, M_WAITOK); sc->sg_iv_aad = sglist_build(sc->iv_aad_buf, MAX_AAD_LEN, M_WAITOK); ccr_sysctls(sc); crypto_register(cid, CRYPTO_SHA1, 0, 0); crypto_register(cid, CRYPTO_SHA2_224, 0, 0); crypto_register(cid, CRYPTO_SHA2_256, 0, 0); crypto_register(cid, CRYPTO_SHA2_384, 0, 0); crypto_register(cid, CRYPTO_SHA2_512, 0, 0); crypto_register(cid, CRYPTO_SHA1_HMAC, 0, 0); crypto_register(cid, CRYPTO_SHA2_224_HMAC, 0, 0); crypto_register(cid, CRYPTO_SHA2_256_HMAC, 0, 0); crypto_register(cid, CRYPTO_SHA2_384_HMAC, 0, 0); crypto_register(cid, CRYPTO_SHA2_512_HMAC, 0, 0); crypto_register(cid, CRYPTO_AES_CBC, 0, 0); crypto_register(cid, CRYPTO_AES_ICM, 0, 0); crypto_register(cid, CRYPTO_AES_NIST_GCM_16, 0, 0); crypto_register(cid, CRYPTO_AES_128_NIST_GMAC, 0, 0); crypto_register(cid, CRYPTO_AES_192_NIST_GMAC, 0, 0); crypto_register(cid, CRYPTO_AES_256_NIST_GMAC, 0, 0); crypto_register(cid, CRYPTO_AES_XTS, 0, 0); + crypto_register(cid, CRYPTO_AES_CCM_16, 0, 0); + crypto_register(cid, CRYPTO_AES_CCM_CBC_MAC, 0, 0); return (0); } static int ccr_detach(device_t dev) { struct ccr_softc *sc; sc = device_get_softc(dev); mtx_lock(&sc->lock); sc->detaching = true; mtx_unlock(&sc->lock); crypto_unregister_all(sc->cid); mtx_destroy(&sc->lock); sglist_free(sc->sg_iv_aad); free(sc->iv_aad_buf, M_CCR); sglist_free(sc->sg_dsgl); sglist_free(sc->sg_ulptx); sglist_free(sc->sg_crp); sc->adapter->ccr_softc = NULL; return (0); } static void ccr_copy_partial_hash(void *dst, int cri_alg, union authctx *auth_ctx) { uint32_t *u32; uint64_t *u64; u_int i; u32 = (uint32_t *)dst; u64 = (uint64_t *)dst; switch (cri_alg) { case CRYPTO_SHA1: case CRYPTO_SHA1_HMAC: for (i = 0; i < SHA1_HASH_LEN / 4; i++) u32[i] = htobe32(auth_ctx->sha1ctx.h.b32[i]); break; case CRYPTO_SHA2_224: case CRYPTO_SHA2_224_HMAC: for (i = 0; i < SHA2_256_HASH_LEN / 4; i++) u32[i] = htobe32(auth_ctx->sha224ctx.state[i]); break; case CRYPTO_SHA2_256: case CRYPTO_SHA2_256_HMAC: for (i = 0; i < SHA2_256_HASH_LEN / 4; i++) u32[i] = htobe32(auth_ctx->sha256ctx.state[i]); break; case CRYPTO_SHA2_384: case CRYPTO_SHA2_384_HMAC: for (i = 0; i < SHA2_512_HASH_LEN / 8; i++) u64[i] = htobe64(auth_ctx->sha384ctx.state[i]); break; case CRYPTO_SHA2_512: case CRYPTO_SHA2_512_HMAC: for (i = 0; i < SHA2_512_HASH_LEN / 8; i++) u64[i] = htobe64(auth_ctx->sha512ctx.state[i]); break; } } static void ccr_init_hash_digest(struct ccr_session *s, int cri_alg) { union authctx auth_ctx; struct auth_hash *axf; axf = s->hmac.auth_hash; axf->Init(&auth_ctx); ccr_copy_partial_hash(s->hmac.ipad, cri_alg, &auth_ctx); } static void 
ccr_init_hmac_digest(struct ccr_session *s, int cri_alg, char *key, int klen) { union authctx auth_ctx; struct auth_hash *axf; u_int i; /* * If the key is larger than the block size, use the digest of * the key as the key instead. */ axf = s->hmac.auth_hash; klen /= 8; if (klen > axf->blocksize) { axf->Init(&auth_ctx); axf->Update(&auth_ctx, key, klen); axf->Final(s->hmac.ipad, &auth_ctx); klen = axf->hashsize; } else memcpy(s->hmac.ipad, key, klen); memset(s->hmac.ipad + klen, 0, axf->blocksize - klen); memcpy(s->hmac.opad, s->hmac.ipad, axf->blocksize); for (i = 0; i < axf->blocksize; i++) { s->hmac.ipad[i] ^= HMAC_IPAD_VAL; s->hmac.opad[i] ^= HMAC_OPAD_VAL; } /* * Hash the raw ipad and opad and store the partial result in * the same buffer. */ axf->Init(&auth_ctx); axf->Update(&auth_ctx, s->hmac.ipad, axf->blocksize); ccr_copy_partial_hash(s->hmac.ipad, cri_alg, &auth_ctx); axf->Init(&auth_ctx); axf->Update(&auth_ctx, s->hmac.opad, axf->blocksize); ccr_copy_partial_hash(s->hmac.opad, cri_alg, &auth_ctx); } /* * Borrowed from AES_GMAC_Setkey(). */ static void ccr_init_gmac_hash(struct ccr_session *s, char *key, int klen) { static char zeroes[GMAC_BLOCK_LEN]; uint32_t keysched[4 * (RIJNDAEL_MAXNR + 1)]; int rounds; rounds = rijndaelKeySetupEnc(keysched, key, klen); rijndaelEncrypt(keysched, rounds, zeroes, s->gmac.ghash_h); } static int ccr_aes_check_keylen(int alg, int klen) { switch (klen) { case 128: case 192: if (alg == CRYPTO_AES_XTS) return (EINVAL); break; case 256: break; case 512: if (alg != CRYPTO_AES_XTS) return (EINVAL); break; default: return (EINVAL); } return (0); } static void ccr_aes_setkey(struct ccr_session *s, int alg, const void *key, int klen) { unsigned int ck_size, iopad_size, kctx_flits, kctx_len, kbits, mk_size; unsigned int opad_present; if (alg == CRYPTO_AES_XTS) kbits = klen / 2; else kbits = klen; switch (kbits) { case 128: ck_size = CHCR_KEYCTX_CIPHER_KEY_SIZE_128; break; case 192: ck_size = CHCR_KEYCTX_CIPHER_KEY_SIZE_192; break; case 256: ck_size = CHCR_KEYCTX_CIPHER_KEY_SIZE_256; break; default: panic("should not get here"); } s->blkcipher.key_len = klen / 8; memcpy(s->blkcipher.enckey, key, s->blkcipher.key_len); switch (alg) { case CRYPTO_AES_CBC: case CRYPTO_AES_XTS: t4_aes_getdeckey(s->blkcipher.deckey, key, kbits); break; } kctx_len = roundup2(s->blkcipher.key_len, 16); switch (s->mode) { case AUTHENC: mk_size = s->hmac.mk_size; opad_present = 1; iopad_size = roundup2(s->hmac.partial_digest_len, 16); kctx_len += iopad_size * 2; break; case GCM: mk_size = CHCR_KEYCTX_MAC_KEY_SIZE_128; opad_present = 0; kctx_len += GMAC_BLOCK_LEN; break; + case CCM: + switch (kbits) { + case 128: + mk_size = CHCR_KEYCTX_MAC_KEY_SIZE_128; + break; + case 192: + mk_size = CHCR_KEYCTX_MAC_KEY_SIZE_192; + break; + case 256: + mk_size = CHCR_KEYCTX_MAC_KEY_SIZE_256; + break; + default: + panic("should not get here"); + } + opad_present = 0; + kctx_len *= 2; + break; default: mk_size = CHCR_KEYCTX_NO_KEY; opad_present = 0; break; } kctx_flits = (sizeof(struct _key_ctx) + kctx_len) / 16; s->blkcipher.key_ctx_hdr = htobe32(V_KEY_CONTEXT_CTX_LEN(kctx_flits) | V_KEY_CONTEXT_DUAL_CK(alg == CRYPTO_AES_XTS) | V_KEY_CONTEXT_OPAD_PRESENT(opad_present) | V_KEY_CONTEXT_SALT_PRESENT(1) | V_KEY_CONTEXT_CK_SIZE(ck_size) | V_KEY_CONTEXT_MK_SIZE(mk_size) | V_KEY_CONTEXT_VALID(1)); } static int ccr_newsession(device_t dev, crypto_session_t cses, struct cryptoini *cri) { struct ccr_softc *sc; struct ccr_session *s; struct auth_hash *auth_hash; struct cryptoini *c, *hash, *cipher; unsigned int 
auth_mode, cipher_mode, iv_len, mk_size; unsigned int partial_digest_len; int error; bool gcm_hash, hmac; if (cri == NULL) return (EINVAL); gcm_hash = false; hmac = false; cipher = NULL; hash = NULL; auth_hash = NULL; auth_mode = SCMD_AUTH_MODE_NOP; cipher_mode = SCMD_CIPH_MODE_NOP; iv_len = 0; mk_size = 0; partial_digest_len = 0; for (c = cri; c != NULL; c = c->cri_next) { switch (c->cri_alg) { case CRYPTO_SHA1: case CRYPTO_SHA2_224: case CRYPTO_SHA2_256: case CRYPTO_SHA2_384: case CRYPTO_SHA2_512: case CRYPTO_SHA1_HMAC: case CRYPTO_SHA2_224_HMAC: case CRYPTO_SHA2_256_HMAC: case CRYPTO_SHA2_384_HMAC: case CRYPTO_SHA2_512_HMAC: case CRYPTO_AES_128_NIST_GMAC: case CRYPTO_AES_192_NIST_GMAC: case CRYPTO_AES_256_NIST_GMAC: + case CRYPTO_AES_CCM_CBC_MAC: if (hash) return (EINVAL); hash = c; switch (c->cri_alg) { case CRYPTO_SHA1: case CRYPTO_SHA1_HMAC: auth_hash = &auth_hash_hmac_sha1; auth_mode = SCMD_AUTH_MODE_SHA1; mk_size = CHCR_KEYCTX_MAC_KEY_SIZE_160; partial_digest_len = SHA1_HASH_LEN; break; case CRYPTO_SHA2_224: case CRYPTO_SHA2_224_HMAC: auth_hash = &auth_hash_hmac_sha2_224; auth_mode = SCMD_AUTH_MODE_SHA224; mk_size = CHCR_KEYCTX_MAC_KEY_SIZE_256; partial_digest_len = SHA2_256_HASH_LEN; break; case CRYPTO_SHA2_256: case CRYPTO_SHA2_256_HMAC: auth_hash = &auth_hash_hmac_sha2_256; auth_mode = SCMD_AUTH_MODE_SHA256; mk_size = CHCR_KEYCTX_MAC_KEY_SIZE_256; partial_digest_len = SHA2_256_HASH_LEN; break; case CRYPTO_SHA2_384: case CRYPTO_SHA2_384_HMAC: auth_hash = &auth_hash_hmac_sha2_384; auth_mode = SCMD_AUTH_MODE_SHA512_384; mk_size = CHCR_KEYCTX_MAC_KEY_SIZE_512; partial_digest_len = SHA2_512_HASH_LEN; break; case CRYPTO_SHA2_512: case CRYPTO_SHA2_512_HMAC: auth_hash = &auth_hash_hmac_sha2_512; auth_mode = SCMD_AUTH_MODE_SHA512_512; mk_size = CHCR_KEYCTX_MAC_KEY_SIZE_512; partial_digest_len = SHA2_512_HASH_LEN; break; case CRYPTO_AES_128_NIST_GMAC: case CRYPTO_AES_192_NIST_GMAC: case CRYPTO_AES_256_NIST_GMAC: gcm_hash = true; auth_mode = SCMD_AUTH_MODE_GHASH; mk_size = CHCR_KEYCTX_MAC_KEY_SIZE_128; break; + case CRYPTO_AES_CCM_CBC_MAC: + auth_mode = SCMD_AUTH_MODE_CBCMAC; + break; } switch (c->cri_alg) { case CRYPTO_SHA1_HMAC: case CRYPTO_SHA2_224_HMAC: case CRYPTO_SHA2_256_HMAC: case CRYPTO_SHA2_384_HMAC: case CRYPTO_SHA2_512_HMAC: hmac = true; break; } break; case CRYPTO_AES_CBC: case CRYPTO_AES_ICM: case CRYPTO_AES_NIST_GCM_16: case CRYPTO_AES_XTS: + case CRYPTO_AES_CCM_16: if (cipher) return (EINVAL); cipher = c; switch (c->cri_alg) { case CRYPTO_AES_CBC: cipher_mode = SCMD_CIPH_MODE_AES_CBC; iv_len = AES_BLOCK_LEN; break; case CRYPTO_AES_ICM: cipher_mode = SCMD_CIPH_MODE_AES_CTR; iv_len = AES_BLOCK_LEN; break; case CRYPTO_AES_NIST_GCM_16: cipher_mode = SCMD_CIPH_MODE_AES_GCM; iv_len = AES_GCM_IV_LEN; break; case CRYPTO_AES_XTS: cipher_mode = SCMD_CIPH_MODE_AES_XTS; iv_len = AES_BLOCK_LEN; break; + case CRYPTO_AES_CCM_16: + cipher_mode = SCMD_CIPH_MODE_AES_CCM; + iv_len = AES_CCM_IV_LEN; + break; } if (c->cri_key != NULL) { error = ccr_aes_check_keylen(c->cri_alg, c->cri_klen); if (error) return (error); } break; default: return (EINVAL); } } if (gcm_hash != (cipher_mode == SCMD_CIPH_MODE_AES_GCM)) return (EINVAL); + if ((auth_mode == SCMD_AUTH_MODE_CBCMAC) != + (cipher_mode == SCMD_CIPH_MODE_AES_CCM)) + return (EINVAL); if (hash == NULL && cipher == NULL) return (EINVAL); if (hash != NULL) { - if ((hmac || gcm_hash) && hash->cri_key == NULL) - return (EINVAL); - if (!(hmac || gcm_hash) && hash->cri_key != NULL) - return (EINVAL); + if (hmac || gcm_hash || auth_mode == 
SCMD_AUTH_MODE_CBCMAC) { + if (hash->cri_key == NULL) + return (EINVAL); + } else { + if (hash->cri_key != NULL) + return (EINVAL); + } } sc = device_get_softc(dev); /* * XXX: Don't create a session if the queues aren't * initialized. This is racy as the rxq can be destroyed by * the associated VI detaching. Eventually ccr should use * dedicated queues. */ if (sc->rxq->iq.adapter == NULL || sc->txq->adapter == NULL) return (ENXIO); mtx_lock(&sc->lock); if (sc->detaching) { mtx_unlock(&sc->lock); return (ENXIO); } s = crypto_get_driver_session(cses); if (gcm_hash) s->mode = GCM; + else if (cipher_mode == SCMD_CIPH_MODE_AES_CCM) + s->mode = CCM; else if (hash != NULL && cipher != NULL) s->mode = AUTHENC; else if (hash != NULL) { if (hmac) s->mode = HMAC; else s->mode = HASH; } else { MPASS(cipher != NULL); s->mode = BLKCIPHER; } if (gcm_hash) { if (hash->cri_mlen == 0) s->gmac.hash_len = AES_GMAC_HASH_LEN; else s->gmac.hash_len = hash->cri_mlen; ccr_init_gmac_hash(s, hash->cri_key, hash->cri_klen); + } else if (auth_mode == SCMD_AUTH_MODE_CBCMAC) { + if (hash->cri_mlen == 0) + s->ccm_mac.hash_len = AES_CBC_MAC_HASH_LEN; + else + s->ccm_mac.hash_len = hash->cri_mlen; } else if (hash != NULL) { s->hmac.auth_hash = auth_hash; s->hmac.auth_mode = auth_mode; s->hmac.mk_size = mk_size; s->hmac.partial_digest_len = partial_digest_len; if (hash->cri_mlen == 0) s->hmac.hash_len = auth_hash->hashsize; else s->hmac.hash_len = hash->cri_mlen; if (hmac) ccr_init_hmac_digest(s, hash->cri_alg, hash->cri_key, hash->cri_klen); else ccr_init_hash_digest(s, hash->cri_alg); } if (cipher != NULL) { s->blkcipher.cipher_mode = cipher_mode; s->blkcipher.iv_len = iv_len; if (cipher->cri_key != NULL) ccr_aes_setkey(s, cipher->cri_alg, cipher->cri_key, cipher->cri_klen); } s->active = true; mtx_unlock(&sc->lock); return (0); } static void ccr_freesession(device_t dev, crypto_session_t cses) { struct ccr_softc *sc; struct ccr_session *s; sc = device_get_softc(dev); s = crypto_get_driver_session(cses); mtx_lock(&sc->lock); if (s->pending != 0) device_printf(dev, "session %p freed with %d pending requests\n", s, s->pending); s->active = false; mtx_unlock(&sc->lock); } static int ccr_process(device_t dev, struct cryptop *crp, int hint) { struct ccr_softc *sc; struct ccr_session *s; struct cryptodesc *crd, *crda, *crde; int error; if (crp == NULL) return (EINVAL); crd = crp->crp_desc; s = crypto_get_driver_session(crp->crp_session); sc = device_get_softc(dev); mtx_lock(&sc->lock); error = ccr_populate_sglist(sc->sg_crp, crp); if (error) { sc->stats_sglist_error++; goto out; } switch (s->mode) { case HASH: error = ccr_hash(sc, s, crp); if (error == 0) sc->stats_hash++; break; case HMAC: if (crd->crd_flags & CRD_F_KEY_EXPLICIT) ccr_init_hmac_digest(s, crd->crd_alg, crd->crd_key, crd->crd_klen); error = ccr_hash(sc, s, crp); if (error == 0) sc->stats_hmac++; break; case BLKCIPHER: if (crd->crd_flags & CRD_F_KEY_EXPLICIT) { error = ccr_aes_check_keylen(crd->crd_alg, crd->crd_klen); if (error) break; ccr_aes_setkey(s, crd->crd_alg, crd->crd_key, crd->crd_klen); } error = ccr_blkcipher(sc, s, crp); if (error == 0) { if (crd->crd_flags & CRD_F_ENCRYPT) sc->stats_blkcipher_encrypt++; else sc->stats_blkcipher_decrypt++; } break; case AUTHENC: error = 0; switch (crd->crd_alg) { case CRYPTO_AES_CBC: case CRYPTO_AES_ICM: case CRYPTO_AES_XTS: /* Only encrypt-then-authenticate supported. 
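For reference, the session that ccr_newsession() accepts for the new CCM mode pairs a CRYPTO_AES_CCM_16 cipher descriptor with a CRYPTO_AES_CCM_CBC_MAC hash descriptor on the cryptoini chain. A minimal consumer-side sketch against the cryptoini-era opencrypto KPI used on this branch; the function and variable names are illustrative, and requesting a hardware driver via the crid argument is an assumption, not something this diff mandates:

#include <sys/param.h>
#include <sys/systm.h>
#include <opencrypto/cryptodev.h>

/* Open an AES-128-CCM session; 'key' is an illustrative 16-byte AES key. */
static int
example_ccm_newsession(const uint8_t *key, crypto_session_t *csesp)
{
	struct cryptoini crie, cria;

	memset(&crie, 0, sizeof(crie));
	crie.cri_alg = CRYPTO_AES_CCM_16;
	crie.cri_klen = 128;			/* key length in bits */
	crie.cri_key = __DECONST(caddr_t, key);

	memset(&cria, 0, sizeof(cria));
	cria.cri_alg = CRYPTO_AES_CCM_CBC_MAC;
	cria.cri_klen = 128;			/* CBC-MAC reuses the AES key */
	cria.cri_key = __DECONST(caddr_t, key);
	cria.cri_mlen = 0;			/* 0 selects the full-length tag */

	crie.cri_next = &cria;
	return (crypto_newsession(csesp, &crie, CRYPTOCAP_F_HARDWARE));
}

A request on such a session then carries two chained cryptodesc entries, matching what ccr_process() and ccr_ccm() expect: one for the payload (CRYPTO_AES_CCM_16, with crd_skip/crd_len covering the data and crd_inject giving the nonce location in the buffer) and one for the AAD and tag (CRYPTO_AES_CCM_CBC_MAC, with crd_skip/crd_len covering the AAD and crd_inject giving the tag location).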
*/ crde = crd; crda = crd->crd_next; if (!(crde->crd_flags & CRD_F_ENCRYPT)) { error = EINVAL; break; } break; default: crda = crd; crde = crd->crd_next; if (crde->crd_flags & CRD_F_ENCRYPT) { error = EINVAL; break; } break; } if (error) break; if (crda->crd_flags & CRD_F_KEY_EXPLICIT) ccr_init_hmac_digest(s, crda->crd_alg, crda->crd_key, crda->crd_klen); if (crde->crd_flags & CRD_F_KEY_EXPLICIT) { error = ccr_aes_check_keylen(crde->crd_alg, crde->crd_klen); if (error) break; ccr_aes_setkey(s, crde->crd_alg, crde->crd_key, crde->crd_klen); } error = ccr_authenc(sc, s, crp, crda, crde); if (error == 0) { if (crde->crd_flags & CRD_F_ENCRYPT) sc->stats_authenc_encrypt++; else sc->stats_authenc_decrypt++; } break; case GCM: error = 0; if (crd->crd_alg == CRYPTO_AES_NIST_GCM_16) { crde = crd; crda = crd->crd_next; } else { crda = crd; crde = crd->crd_next; } if (crda->crd_flags & CRD_F_KEY_EXPLICIT) ccr_init_gmac_hash(s, crda->crd_key, crda->crd_klen); if (crde->crd_flags & CRD_F_KEY_EXPLICIT) { error = ccr_aes_check_keylen(crde->crd_alg, crde->crd_klen); if (error) break; ccr_aes_setkey(s, crde->crd_alg, crde->crd_key, crde->crd_klen); } if (crde->crd_len == 0) { mtx_unlock(&sc->lock); ccr_gcm_soft(s, crp, crda, crde); return (0); } error = ccr_gcm(sc, s, crp, crda, crde); if (error == EMSGSIZE) { sc->stats_sw_fallback++; mtx_unlock(&sc->lock); ccr_gcm_soft(s, crp, crda, crde); return (0); } if (error == 0) { if (crde->crd_flags & CRD_F_ENCRYPT) sc->stats_gcm_encrypt++; else sc->stats_gcm_decrypt++; } break; + case CCM: + error = 0; + if (crd->crd_alg == CRYPTO_AES_CCM_16) { + crde = crd; + crda = crd->crd_next; + } else { + crda = crd; + crde = crd->crd_next; + } + if (crde->crd_flags & CRD_F_KEY_EXPLICIT) { + error = ccr_aes_check_keylen(crde->crd_alg, + crde->crd_klen); + if (error) + break; + ccr_aes_setkey(s, crde->crd_alg, crde->crd_key, + crde->crd_klen); + } + error = ccr_ccm(sc, s, crp, crda, crde); + if (error == EMSGSIZE) { + sc->stats_sw_fallback++; + mtx_unlock(&sc->lock); + ccr_ccm_soft(s, crp, crda, crde); + return (0); + } + if (error == 0) { + if (crde->crd_flags & CRD_F_ENCRYPT) + sc->stats_ccm_encrypt++; + else + sc->stats_ccm_decrypt++; + } + break; } if (error == 0) { s->pending++; sc->stats_inflight++; } else sc->stats_process_error++; out: mtx_unlock(&sc->lock); if (error) { crp->crp_etype = error; crypto_done(crp); } return (0); } static int do_cpl6_fw_pld(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m) { struct ccr_softc *sc = iq->adapter->ccr_softc; struct ccr_session *s; const struct cpl_fw6_pld *cpl; struct cryptop *crp; uint32_t status; int error; if (m != NULL) cpl = mtod(m, const void *); else cpl = (const void *)(rss + 1); crp = (struct cryptop *)(uintptr_t)be64toh(cpl->data[1]); s = crypto_get_driver_session(crp->crp_session); status = be64toh(cpl->data[0]); if (CHK_MAC_ERR_BIT(status) || CHK_PAD_ERR_BIT(status)) error = EBADMSG; else error = 0; mtx_lock(&sc->lock); s->pending--; sc->stats_inflight--; switch (s->mode) { case HASH: case HMAC: error = ccr_hash_done(sc, s, crp, cpl, error); break; case BLKCIPHER: error = ccr_blkcipher_done(sc, s, crp, cpl, error); break; case AUTHENC: error = ccr_authenc_done(sc, s, crp, cpl, error); break; case GCM: error = ccr_gcm_done(sc, s, crp, cpl, error); + break; + case CCM: + error = ccr_ccm_done(sc, s, crp, cpl, error); break; } if (error == EBADMSG) { if (CHK_MAC_ERR_BIT(status)) sc->stats_mac_error++; if (CHK_PAD_ERR_BIT(status)) sc->stats_pad_error++; } mtx_unlock(&sc->lock); crp->crp_etype = error; 
crypto_done(crp); m_freem(m); return (0); } static int ccr_modevent(module_t mod, int cmd, void *arg) { switch (cmd) { case MOD_LOAD: t4_register_cpl_handler(CPL_FW6_PLD, do_cpl6_fw_pld); return (0); case MOD_UNLOAD: t4_register_cpl_handler(CPL_FW6_PLD, NULL); return (0); default: return (EOPNOTSUPP); } } static device_method_t ccr_methods[] = { DEVMETHOD(device_identify, ccr_identify), DEVMETHOD(device_probe, ccr_probe), DEVMETHOD(device_attach, ccr_attach), DEVMETHOD(device_detach, ccr_detach), DEVMETHOD(cryptodev_newsession, ccr_newsession), DEVMETHOD(cryptodev_freesession, ccr_freesession), DEVMETHOD(cryptodev_process, ccr_process), DEVMETHOD_END }; static driver_t ccr_driver = { "ccr", ccr_methods, sizeof(struct ccr_softc) }; static devclass_t ccr_devclass; DRIVER_MODULE(ccr, t6nex, ccr_driver, ccr_devclass, ccr_modevent, NULL); MODULE_VERSION(ccr, 1); MODULE_DEPEND(ccr, crypto, 1, 1, 1); MODULE_DEPEND(ccr, t6nex, 1, 1, 1); Index: user/ngie/bug-237403/sys/dev/cxgbe/crypto/t4_crypto.h =================================================================== --- user/ngie/bug-237403/sys/dev/cxgbe/crypto/t4_crypto.h (revision 346925) +++ user/ngie/bug-237403/sys/dev/cxgbe/crypto/t4_crypto.h (revision 346926) @@ -1,191 +1,194 @@ /*- * Copyright (c) 2017 Chelsio Communications, Inc. * All rights reserved. * Written by: John Baldwin * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $FreeBSD$ */ #ifndef __T4_CRYPTO_H__ #define __T4_CRYPTO_H__ /* From chr_core.h */ #define PAD_ERROR_BIT 1 #define CHK_PAD_ERR_BIT(x) (((x) >> PAD_ERROR_BIT) & 1) #define MAC_ERROR_BIT 0 #define CHK_MAC_ERR_BIT(x) (((x) >> MAC_ERROR_BIT) & 1) #define MAX_SALT 4 struct _key_ctx { __be32 ctx_hdr; u8 salt[MAX_SALT]; __be64 reserverd; unsigned char key[0]; }; struct chcr_wr { struct fw_crypto_lookaside_wr wreq; struct ulp_txpkt ulptx; struct ulptx_idata sc_imm; struct cpl_tx_sec_pdu sec_cpl; struct _key_ctx key_ctx; }; /* From chr_algo.h */ /* Crypto key context */ #define S_KEY_CONTEXT_CTX_LEN 24 #define M_KEY_CONTEXT_CTX_LEN 0xff #define V_KEY_CONTEXT_CTX_LEN(x) ((x) << S_KEY_CONTEXT_CTX_LEN) #define G_KEY_CONTEXT_CTX_LEN(x) \ (((x) >> S_KEY_CONTEXT_CTX_LEN) & M_KEY_CONTEXT_CTX_LEN) #define S_KEY_CONTEXT_DUAL_CK 12 #define M_KEY_CONTEXT_DUAL_CK 0x1 #define V_KEY_CONTEXT_DUAL_CK(x) ((x) << S_KEY_CONTEXT_DUAL_CK) #define G_KEY_CONTEXT_DUAL_CK(x) \ (((x) >> S_KEY_CONTEXT_DUAL_CK) & M_KEY_CONTEXT_DUAL_CK) #define F_KEY_CONTEXT_DUAL_CK V_KEY_CONTEXT_DUAL_CK(1U) #define S_KEY_CONTEXT_OPAD_PRESENT 11 #define M_KEY_CONTEXT_OPAD_PRESENT 0x1 #define V_KEY_CONTEXT_OPAD_PRESENT(x) ((x) << S_KEY_CONTEXT_OPAD_PRESENT) #define G_KEY_CONTEXT_OPAD_PRESENT(x) \ (((x) >> S_KEY_CONTEXT_OPAD_PRESENT) & \ M_KEY_CONTEXT_OPAD_PRESENT) #define F_KEY_CONTEXT_OPAD_PRESENT V_KEY_CONTEXT_OPAD_PRESENT(1U) #define S_KEY_CONTEXT_SALT_PRESENT 10 #define M_KEY_CONTEXT_SALT_PRESENT 0x1 #define V_KEY_CONTEXT_SALT_PRESENT(x) ((x) << S_KEY_CONTEXT_SALT_PRESENT) #define G_KEY_CONTEXT_SALT_PRESENT(x) \ (((x) >> S_KEY_CONTEXT_SALT_PRESENT) & \ M_KEY_CONTEXT_SALT_PRESENT) #define F_KEY_CONTEXT_SALT_PRESENT V_KEY_CONTEXT_SALT_PRESENT(1U) #define S_KEY_CONTEXT_CK_SIZE 6 #define M_KEY_CONTEXT_CK_SIZE 0xf #define V_KEY_CONTEXT_CK_SIZE(x) ((x) << S_KEY_CONTEXT_CK_SIZE) #define G_KEY_CONTEXT_CK_SIZE(x) \ (((x) >> S_KEY_CONTEXT_CK_SIZE) & M_KEY_CONTEXT_CK_SIZE) #define S_KEY_CONTEXT_MK_SIZE 2 #define M_KEY_CONTEXT_MK_SIZE 0xf #define V_KEY_CONTEXT_MK_SIZE(x) ((x) << S_KEY_CONTEXT_MK_SIZE) #define G_KEY_CONTEXT_MK_SIZE(x) \ (((x) >> S_KEY_CONTEXT_MK_SIZE) & M_KEY_CONTEXT_MK_SIZE) #define S_KEY_CONTEXT_VALID 0 #define M_KEY_CONTEXT_VALID 0x1 #define V_KEY_CONTEXT_VALID(x) ((x) << S_KEY_CONTEXT_VALID) #define G_KEY_CONTEXT_VALID(x) \ (((x) >> S_KEY_CONTEXT_VALID) & \ M_KEY_CONTEXT_VALID) #define F_KEY_CONTEXT_VALID V_KEY_CONTEXT_VALID(1U) #define CHCR_HASH_MAX_DIGEST_SIZE 64 #define DUMMY_BYTES 16 #define TRANSHDR_SIZE(kctx_len)\ (sizeof(struct chcr_wr) +\ kctx_len) #define CIPHER_TRANSHDR_SIZE(kctx_len, sge_pairs) \ (TRANSHDR_SIZE((kctx_len)) + (sge_pairs) +\ sizeof(struct cpl_rx_phys_dsgl)) #define HASH_TRANSHDR_SIZE(kctx_len)\ (TRANSHDR_SIZE(kctx_len) + DUMMY_BYTES) #define CRYPTO_MAX_IMM_TX_PKT_LEN 256 struct phys_sge_pairs { __be16 len[8]; __be64 addr[8]; }; /* From chr_crypto.h */ +#define CCM_B0_SIZE 16 +#define CCM_AAD_FIELD_SIZE 2 + #define CHCR_AES_MAX_KEY_LEN (AES_XTS_MAX_KEY) #define CHCR_MAX_CRYPTO_IV_LEN 16 /* AES IV len */ #define CHCR_ENCRYPT_OP 0 #define CHCR_DECRYPT_OP 1 #define SCMD_ENCDECCTRL_ENCRYPT 0 #define SCMD_ENCDECCTRL_DECRYPT 1 #define SCMD_PROTO_VERSION_TLS_1_2 0 #define SCMD_PROTO_VERSION_TLS_1_1 1 #define SCMD_PROTO_VERSION_GENERIC 4 #define SCMD_CIPH_MODE_NOP 0 #define SCMD_CIPH_MODE_AES_CBC 1 #define SCMD_CIPH_MODE_AES_GCM 2 #define SCMD_CIPH_MODE_AES_CTR 3 #define SCMD_CIPH_MODE_GENERIC_AES 4 #define SCMD_CIPH_MODE_AES_XTS 6 #define SCMD_CIPH_MODE_AES_CCM 7 #define SCMD_AUTH_MODE_NOP 0 #define 
SCMD_AUTH_MODE_SHA1 1 #define SCMD_AUTH_MODE_SHA224 2 #define SCMD_AUTH_MODE_SHA256 3 #define SCMD_AUTH_MODE_GHASH 4 #define SCMD_AUTH_MODE_SHA512_224 5 #define SCMD_AUTH_MODE_SHA512_256 6 #define SCMD_AUTH_MODE_SHA512_384 7 #define SCMD_AUTH_MODE_SHA512_512 8 #define SCMD_AUTH_MODE_CBCMAC 9 #define SCMD_AUTH_MODE_CMAC 10 #define SCMD_HMAC_CTRL_NOP 0 #define SCMD_HMAC_CTRL_NO_TRUNC 1 #define SCMD_HMAC_CTRL_TRUNC_RFC4366 2 #define SCMD_HMAC_CTRL_IPSEC_96BIT 3 #define SCMD_HMAC_CTRL_PL1 4 #define SCMD_HMAC_CTRL_PL2 5 #define SCMD_HMAC_CTRL_PL3 6 #define SCMD_HMAC_CTRL_DIV2 7 /* This are not really mac key size. They are intermediate values * of sha engine and its size */ #define CHCR_KEYCTX_MAC_KEY_SIZE_128 0 #define CHCR_KEYCTX_MAC_KEY_SIZE_160 1 #define CHCR_KEYCTX_MAC_KEY_SIZE_192 2 #define CHCR_KEYCTX_MAC_KEY_SIZE_256 3 #define CHCR_KEYCTX_MAC_KEY_SIZE_512 4 #define CHCR_KEYCTX_CIPHER_KEY_SIZE_128 0 #define CHCR_KEYCTX_CIPHER_KEY_SIZE_192 1 #define CHCR_KEYCTX_CIPHER_KEY_SIZE_256 2 #define CHCR_KEYCTX_NO_KEY 15 #define IV_NOP 0 #define IV_IMMEDIATE 1 #define IV_DSGL 2 #define CHCR_HASH_MAX_BLOCK_SIZE_64 64 #define CHCR_HASH_MAX_BLOCK_SIZE_128 128 #endif /* !__T4_CRYPTO_H__ */ Index: user/ngie/bug-237403/sys/dev/cxgbe/t4_sge.c =================================================================== --- user/ngie/bug-237403/sys/dev/cxgbe/t4_sge.c (revision 346925) +++ user/ngie/bug-237403/sys/dev/cxgbe/t4_sge.c (revision 346926) @@ -1,6051 +1,6054 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2011 Chelsio Communications, Inc. * All rights reserved. * Written by: Navdeep Parhar * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include "opt_ratelimit.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef DEV_NETMAP #include #include #include #include #include #endif #include "common/common.h" #include "common/t4_regs.h" #include "common/t4_regs_values.h" #include "common/t4_msg.h" #include "t4_l2t.h" #include "t4_mp_ring.h" #ifdef T4_PKT_TIMESTAMP #define RX_COPY_THRESHOLD (MINCLSIZE - 8) #else #define RX_COPY_THRESHOLD MINCLSIZE #endif /* Internal mbuf flags stored in PH_loc.eight[1]. */ #define MC_RAW_WR 0x02 /* * Ethernet frames are DMA'd at this byte offset into the freelist buffer. * 0-7 are valid values. */ static int fl_pktshift = 0; SYSCTL_INT(_hw_cxgbe, OID_AUTO, fl_pktshift, CTLFLAG_RDTUN, &fl_pktshift, 0, "payload DMA offset in rx buffer (bytes)"); /* * Pad ethernet payload up to this boundary. * -1: driver should figure out a good value. * 0: disable padding. * Any power of 2 from 32 to 4096 (both inclusive) is also a valid value. */ int fl_pad = -1; SYSCTL_INT(_hw_cxgbe, OID_AUTO, fl_pad, CTLFLAG_RDTUN, &fl_pad, 0, "payload pad boundary (bytes)"); /* * Status page length. * -1: driver should figure out a good value. * 64 or 128 are the only other valid values. */ static int spg_len = -1; SYSCTL_INT(_hw_cxgbe, OID_AUTO, spg_len, CTLFLAG_RDTUN, &spg_len, 0, "status page size (bytes)"); /* * Congestion drops. * -1: no congestion feedback (not recommended). * 0: backpressure the channel instead of dropping packets right away. * 1: no backpressure, drop packets for the congested queue immediately. */ static int cong_drop = 0; SYSCTL_INT(_hw_cxgbe, OID_AUTO, cong_drop, CTLFLAG_RDTUN, &cong_drop, 0, "Congestion control for RX queues (0 = backpressure, 1 = drop"); /* * Deliver multiple frames in the same free list buffer if they fit. * -1: let the driver decide whether to enable buffer packing or not. * 0: disable buffer packing. * 1: enable buffer packing. */ static int buffer_packing = -1; SYSCTL_INT(_hw_cxgbe, OID_AUTO, buffer_packing, CTLFLAG_RDTUN, &buffer_packing, 0, "Enable buffer packing"); /* * Start next frame in a packed buffer at this boundary. * -1: driver should figure out a good value. * T4: driver will ignore this and use the same value as fl_pad above. * T5: 16, or a power of 2 from 64 to 4096 (both inclusive) is a valid value. */ static int fl_pack = -1; SYSCTL_INT(_hw_cxgbe, OID_AUTO, fl_pack, CTLFLAG_RDTUN, &fl_pack, 0, "payload pack boundary (bytes)"); /* * Allow the driver to create mbuf(s) in a cluster allocated for rx. * 0: never; always allocate mbufs from the zone_mbuf UMA zone. * 1: ok to create mbuf(s) within a cluster if there is room. */ static int allow_mbufs_in_cluster = 1; SYSCTL_INT(_hw_cxgbe, OID_AUTO, allow_mbufs_in_cluster, CTLFLAG_RDTUN, &allow_mbufs_in_cluster, 0, "Allow driver to create mbufs within a rx cluster"); /* * Largest rx cluster size that the driver is allowed to allocate. */ static int largest_rx_cluster = MJUM16BYTES; SYSCTL_INT(_hw_cxgbe, OID_AUTO, largest_rx_cluster, CTLFLAG_RDTUN, &largest_rx_cluster, 0, "Largest rx cluster (bytes)"); /* * Size of cluster allocation that's most likely to succeed. The driver will * fall back to this size if it fails to allocate clusters larger than this. 
*/ static int safest_rx_cluster = PAGE_SIZE; SYSCTL_INT(_hw_cxgbe, OID_AUTO, safest_rx_cluster, CTLFLAG_RDTUN, &safest_rx_cluster, 0, "Safe rx cluster (bytes)"); #ifdef RATELIMIT /* * Knob to control TCP timestamp rewriting, and the granularity of the tick used * for rewriting. -1 and 0-3 are all valid values. * -1: hardware should leave the TCP timestamps alone. * 0: 1ms * 1: 100us * 2: 10us * 3: 1us */ static int tsclk = -1; SYSCTL_INT(_hw_cxgbe, OID_AUTO, tsclk, CTLFLAG_RDTUN, &tsclk, 0, "Control TCP timestamp rewriting when using pacing"); static int eo_max_backlog = 1024 * 1024; SYSCTL_INT(_hw_cxgbe, OID_AUTO, eo_max_backlog, CTLFLAG_RDTUN, &eo_max_backlog, 0, "Maximum backlog of ratelimited data per flow"); #endif /* * The interrupt holdoff timers are multiplied by this value on T6+. * 1 and 3-17 (both inclusive) are legal values. */ static int tscale = 1; SYSCTL_INT(_hw_cxgbe, OID_AUTO, tscale, CTLFLAG_RDTUN, &tscale, 0, "Interrupt holdoff timer scale on T6+"); /* * Number of LRO entries in the lro_ctrl structure per rx queue. */ static int lro_entries = TCP_LRO_ENTRIES; SYSCTL_INT(_hw_cxgbe, OID_AUTO, lro_entries, CTLFLAG_RDTUN, &lro_entries, 0, "Number of LRO entries per RX queue"); /* * This enables presorting of frames before they're fed into tcp_lro_rx. */ static int lro_mbufs = 0; SYSCTL_INT(_hw_cxgbe, OID_AUTO, lro_mbufs, CTLFLAG_RDTUN, &lro_mbufs, 0, "Enable presorting of LRO frames"); struct txpkts { u_int wr_type; /* type 0 or type 1 */ u_int npkt; /* # of packets in this work request */ u_int plen; /* total payload (sum of all packets) */ u_int len16; /* # of 16B pieces used by this work request */ }; /* A packet's SGL. This + m_pkthdr has all info needed for tx */ struct sgl { struct sglist sg; struct sglist_seg seg[TX_SGL_SEGS]; }; static int service_iq(struct sge_iq *, int); static int service_iq_fl(struct sge_iq *, int); static struct mbuf *get_fl_payload(struct adapter *, struct sge_fl *, uint32_t); static int t4_eth_rx(struct sge_iq *, const struct rss_header *, struct mbuf *); static inline void init_iq(struct sge_iq *, struct adapter *, int, int, int); static inline void init_fl(struct adapter *, struct sge_fl *, int, int, char *); static inline void init_eq(struct adapter *, struct sge_eq *, int, int, uint8_t, uint16_t, char *); static int alloc_ring(struct adapter *, size_t, bus_dma_tag_t *, bus_dmamap_t *, bus_addr_t *, void **); static int free_ring(struct adapter *, bus_dma_tag_t, bus_dmamap_t, bus_addr_t, void *); static int alloc_iq_fl(struct vi_info *, struct sge_iq *, struct sge_fl *, int, int); static int free_iq_fl(struct vi_info *, struct sge_iq *, struct sge_fl *); static void add_iq_sysctls(struct sysctl_ctx_list *, struct sysctl_oid *, struct sge_iq *); static void add_fl_sysctls(struct adapter *, struct sysctl_ctx_list *, struct sysctl_oid *, struct sge_fl *); static int alloc_fwq(struct adapter *); static int free_fwq(struct adapter *); static int alloc_ctrlq(struct adapter *, struct sge_wrq *, int, struct sysctl_oid *); static int alloc_rxq(struct vi_info *, struct sge_rxq *, int, int, struct sysctl_oid *); static int free_rxq(struct vi_info *, struct sge_rxq *); #ifdef TCP_OFFLOAD static int alloc_ofld_rxq(struct vi_info *, struct sge_ofld_rxq *, int, int, struct sysctl_oid *); static int free_ofld_rxq(struct vi_info *, struct sge_ofld_rxq *); #endif #ifdef DEV_NETMAP static int alloc_nm_rxq(struct vi_info *, struct sge_nm_rxq *, int, int, struct sysctl_oid *); static int free_nm_rxq(struct vi_info *, struct sge_nm_rxq *); static int 
alloc_nm_txq(struct vi_info *, struct sge_nm_txq *, int, int, struct sysctl_oid *); static int free_nm_txq(struct vi_info *, struct sge_nm_txq *); #endif static int ctrl_eq_alloc(struct adapter *, struct sge_eq *); static int eth_eq_alloc(struct adapter *, struct vi_info *, struct sge_eq *); #if defined(TCP_OFFLOAD) || defined(RATELIMIT) static int ofld_eq_alloc(struct adapter *, struct vi_info *, struct sge_eq *); #endif static int alloc_eq(struct adapter *, struct vi_info *, struct sge_eq *); static int free_eq(struct adapter *, struct sge_eq *); static int alloc_wrq(struct adapter *, struct vi_info *, struct sge_wrq *, struct sysctl_oid *); static int free_wrq(struct adapter *, struct sge_wrq *); static int alloc_txq(struct vi_info *, struct sge_txq *, int, struct sysctl_oid *); static int free_txq(struct vi_info *, struct sge_txq *); static void oneseg_dma_callback(void *, bus_dma_segment_t *, int, int); static inline void ring_fl_db(struct adapter *, struct sge_fl *); static int refill_fl(struct adapter *, struct sge_fl *, int); static void refill_sfl(void *); static int alloc_fl_sdesc(struct sge_fl *); static void free_fl_sdesc(struct adapter *, struct sge_fl *); static void find_best_refill_source(struct adapter *, struct sge_fl *, int); static void find_safe_refill_source(struct adapter *, struct sge_fl *); static void add_fl_to_sfl(struct adapter *, struct sge_fl *); static inline void get_pkt_gl(struct mbuf *, struct sglist *); static inline u_int txpkt_len16(u_int, u_int); static inline u_int txpkt_vm_len16(u_int, u_int); static inline u_int txpkts0_len16(u_int); static inline u_int txpkts1_len16(void); static u_int write_raw_wr(struct sge_txq *, void *, struct mbuf *, u_int); static u_int write_txpkt_wr(struct sge_txq *, struct fw_eth_tx_pkt_wr *, struct mbuf *, u_int); static u_int write_txpkt_vm_wr(struct adapter *, struct sge_txq *, struct fw_eth_tx_pkt_vm_wr *, struct mbuf *, u_int); static int try_txpkts(struct mbuf *, struct mbuf *, struct txpkts *, u_int); static int add_to_txpkts(struct mbuf *, struct txpkts *, u_int); static u_int write_txpkts_wr(struct sge_txq *, struct fw_eth_tx_pkts_wr *, struct mbuf *, const struct txpkts *, u_int); static void write_gl_to_txd(struct sge_txq *, struct mbuf *, caddr_t *, int); static inline void copy_to_txd(struct sge_eq *, caddr_t, caddr_t *, int); static inline void ring_eq_db(struct adapter *, struct sge_eq *, u_int); static inline uint16_t read_hw_cidx(struct sge_eq *); static inline u_int reclaimable_tx_desc(struct sge_eq *); static inline u_int total_available_tx_desc(struct sge_eq *); static u_int reclaim_tx_descs(struct sge_txq *, u_int); static void tx_reclaim(void *, int); static __be64 get_flit(struct sglist_seg *, int, int); static int handle_sge_egr_update(struct sge_iq *, const struct rss_header *, struct mbuf *); static int handle_fw_msg(struct sge_iq *, const struct rss_header *, struct mbuf *); static int t4_handle_wrerr_rpl(struct adapter *, const __be64 *); static void wrq_tx_drain(void *, int); static void drain_wrq_wr_list(struct adapter *, struct sge_wrq *); static int sysctl_uint16(SYSCTL_HANDLER_ARGS); static int sysctl_bufsizes(SYSCTL_HANDLER_ARGS); #ifdef RATELIMIT static inline u_int txpkt_eo_len16(u_int, u_int, u_int); static int ethofld_fw4_ack(struct sge_iq *, const struct rss_header *, struct mbuf *); #endif static counter_u64_t extfree_refs; static counter_u64_t extfree_rels; an_handler_t t4_an_handler; fw_msg_handler_t t4_fw_msg_handler[NUM_FW6_TYPES]; cpl_handler_t t4_cpl_handler[NUM_CPL_CMDS]; 
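These handler tables are filled in at runtime rather than statically: single-consumer opcodes go through t4_register_cpl_handler() (the pattern ccr_modevent() uses earlier in this diff for CPL_FW6_PLD), while opcodes shared by several consumers are demultiplexed by cookie through t4_register_shared_cpl_handler(); passing NULL removes a handler again. A short sketch with a hypothetical handler, included only to show the registration pattern:

/* Matches cpl_handler_t; the CPL either follows the RSS header or rides in m. */
static int
example_cpl_handler(struct sge_iq *iq, const struct rss_header *rss,
    struct mbuf *m)
{
	m_freem(m);		/* m_freem() is a no-op on NULL */
	return (0);
}

static void
example_register(void)
{
	/* Sole owner of an opcode. */
	t4_register_cpl_handler(CPL_FW6_PLD, example_cpl_handler);

	/* One of several consumers of a shared opcode, keyed by cookie. */
	t4_register_shared_cpl_handler(CPL_SET_TCB_RPL, example_cpl_handler,
	    CPL_COOKIE_TOM);
}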
cpl_handler_t set_tcb_rpl_handlers[NUM_CPL_COOKIES]; cpl_handler_t l2t_write_rpl_handlers[NUM_CPL_COOKIES]; cpl_handler_t act_open_rpl_handlers[NUM_CPL_COOKIES]; cpl_handler_t abort_rpl_rss_handlers[NUM_CPL_COOKIES]; cpl_handler_t fw4_ack_handlers[NUM_CPL_COOKIES]; void t4_register_an_handler(an_handler_t h) { uintptr_t *loc; MPASS(h == NULL || t4_an_handler == NULL); loc = (uintptr_t *)&t4_an_handler; atomic_store_rel_ptr(loc, (uintptr_t)h); } void t4_register_fw_msg_handler(int type, fw_msg_handler_t h) { uintptr_t *loc; MPASS(type < nitems(t4_fw_msg_handler)); MPASS(h == NULL || t4_fw_msg_handler[type] == NULL); /* * These are dispatched by the handler for FW{4|6}_CPL_MSG using the CPL * handler dispatch table. Reject any attempt to install a handler for * this subtype. */ MPASS(type != FW_TYPE_RSSCPL); MPASS(type != FW6_TYPE_RSSCPL); loc = (uintptr_t *)&t4_fw_msg_handler[type]; atomic_store_rel_ptr(loc, (uintptr_t)h); } void t4_register_cpl_handler(int opcode, cpl_handler_t h) { uintptr_t *loc; MPASS(opcode < nitems(t4_cpl_handler)); MPASS(h == NULL || t4_cpl_handler[opcode] == NULL); loc = (uintptr_t *)&t4_cpl_handler[opcode]; atomic_store_rel_ptr(loc, (uintptr_t)h); } static int set_tcb_rpl_handler(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m) { const struct cpl_set_tcb_rpl *cpl = (const void *)(rss + 1); u_int tid; int cookie; MPASS(m == NULL); tid = GET_TID(cpl); if (is_hpftid(iq->adapter, tid) || is_ftid(iq->adapter, tid)) { /* * The return code for filter-write is put in the CPL cookie so * we have to rely on the hardware tid (is_ftid) to determine * that this is a response to a filter. */ cookie = CPL_COOKIE_FILTER; } else { cookie = G_COOKIE(cpl->cookie); } MPASS(cookie > CPL_COOKIE_RESERVED); MPASS(cookie < nitems(set_tcb_rpl_handlers)); return (set_tcb_rpl_handlers[cookie](iq, rss, m)); } static int l2t_write_rpl_handler(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m) { const struct cpl_l2t_write_rpl *rpl = (const void *)(rss + 1); unsigned int cookie; MPASS(m == NULL); cookie = GET_TID(rpl) & F_SYNC_WR ? 
CPL_COOKIE_TOM : CPL_COOKIE_FILTER; return (l2t_write_rpl_handlers[cookie](iq, rss, m)); } static int act_open_rpl_handler(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m) { const struct cpl_act_open_rpl *cpl = (const void *)(rss + 1); u_int cookie = G_TID_COOKIE(G_AOPEN_ATID(be32toh(cpl->atid_status))); MPASS(m == NULL); MPASS(cookie != CPL_COOKIE_RESERVED); return (act_open_rpl_handlers[cookie](iq, rss, m)); } static int abort_rpl_rss_handler(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m) { struct adapter *sc = iq->adapter; u_int cookie; MPASS(m == NULL); if (is_hashfilter(sc)) cookie = CPL_COOKIE_HASHFILTER; else cookie = CPL_COOKIE_TOM; return (abort_rpl_rss_handlers[cookie](iq, rss, m)); } static int fw4_ack_handler(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m) { struct adapter *sc = iq->adapter; const struct cpl_fw4_ack *cpl = (const void *)(rss + 1); unsigned int tid = G_CPL_FW4_ACK_FLOWID(be32toh(OPCODE_TID(cpl))); u_int cookie; MPASS(m == NULL); if (is_etid(sc, tid)) cookie = CPL_COOKIE_ETHOFLD; else cookie = CPL_COOKIE_TOM; return (fw4_ack_handlers[cookie](iq, rss, m)); } static void t4_init_shared_cpl_handlers(void) { t4_register_cpl_handler(CPL_SET_TCB_RPL, set_tcb_rpl_handler); t4_register_cpl_handler(CPL_L2T_WRITE_RPL, l2t_write_rpl_handler); t4_register_cpl_handler(CPL_ACT_OPEN_RPL, act_open_rpl_handler); t4_register_cpl_handler(CPL_ABORT_RPL_RSS, abort_rpl_rss_handler); t4_register_cpl_handler(CPL_FW4_ACK, fw4_ack_handler); } void t4_register_shared_cpl_handler(int opcode, cpl_handler_t h, int cookie) { uintptr_t *loc; MPASS(opcode < nitems(t4_cpl_handler)); MPASS(cookie > CPL_COOKIE_RESERVED); MPASS(cookie < NUM_CPL_COOKIES); MPASS(t4_cpl_handler[opcode] != NULL); switch (opcode) { case CPL_SET_TCB_RPL: loc = (uintptr_t *)&set_tcb_rpl_handlers[cookie]; break; case CPL_L2T_WRITE_RPL: loc = (uintptr_t *)&l2t_write_rpl_handlers[cookie]; break; case CPL_ACT_OPEN_RPL: loc = (uintptr_t *)&act_open_rpl_handlers[cookie]; break; case CPL_ABORT_RPL_RSS: loc = (uintptr_t *)&abort_rpl_rss_handlers[cookie]; break; case CPL_FW4_ACK: loc = (uintptr_t *)&fw4_ack_handlers[cookie]; break; default: MPASS(0); return; } MPASS(h == NULL || *loc == (uintptr_t)NULL); atomic_store_rel_ptr(loc, (uintptr_t)h); } /* * Called on MOD_LOAD. Validates and calculates the SGE tunables. */ void t4_sge_modload(void) { if (fl_pktshift < 0 || fl_pktshift > 7) { printf("Invalid hw.cxgbe.fl_pktshift value (%d)," " using 0 instead.\n", fl_pktshift); fl_pktshift = 0; } if (spg_len != 64 && spg_len != 128) { int len; #if defined(__i386__) || defined(__amd64__) len = cpu_clflush_line_size > 64 ? 
128 : 64; #else len = 64; #endif if (spg_len != -1) { printf("Invalid hw.cxgbe.spg_len value (%d)," " using %d instead.\n", spg_len, len); } spg_len = len; } if (cong_drop < -1 || cong_drop > 1) { printf("Invalid hw.cxgbe.cong_drop value (%d)," " using 0 instead.\n", cong_drop); cong_drop = 0; } if (tscale != 1 && (tscale < 3 || tscale > 17)) { printf("Invalid hw.cxgbe.tscale value (%d)," " using 1 instead.\n", tscale); tscale = 1; } extfree_refs = counter_u64_alloc(M_WAITOK); extfree_rels = counter_u64_alloc(M_WAITOK); counter_u64_zero(extfree_refs); counter_u64_zero(extfree_rels); t4_init_shared_cpl_handlers(); t4_register_cpl_handler(CPL_FW4_MSG, handle_fw_msg); t4_register_cpl_handler(CPL_FW6_MSG, handle_fw_msg); t4_register_cpl_handler(CPL_SGE_EGR_UPDATE, handle_sge_egr_update); t4_register_cpl_handler(CPL_RX_PKT, t4_eth_rx); #ifdef RATELIMIT t4_register_shared_cpl_handler(CPL_FW4_ACK, ethofld_fw4_ack, CPL_COOKIE_ETHOFLD); #endif t4_register_fw_msg_handler(FW6_TYPE_CMD_RPL, t4_handle_fw_rpl); t4_register_fw_msg_handler(FW6_TYPE_WRERR_RPL, t4_handle_wrerr_rpl); } void t4_sge_modunload(void) { counter_u64_free(extfree_refs); counter_u64_free(extfree_rels); } uint64_t t4_sge_extfree_refs(void) { uint64_t refs, rels; rels = counter_u64_fetch(extfree_rels); refs = counter_u64_fetch(extfree_refs); return (refs - rels); } static inline void setup_pad_and_pack_boundaries(struct adapter *sc) { uint32_t v, m; int pad, pack, pad_shift; pad_shift = chip_id(sc) > CHELSIO_T5 ? X_T6_INGPADBOUNDARY_SHIFT : X_INGPADBOUNDARY_SHIFT; pad = fl_pad; if (fl_pad < (1 << pad_shift) || fl_pad > (1 << (pad_shift + M_INGPADBOUNDARY)) || !powerof2(fl_pad)) { /* * If there is any chance that we might use buffer packing and * the chip is a T4, then pick 64 as the pad/pack boundary. Set * it to the minimum allowed in all other cases. */ pad = is_t4(sc) && buffer_packing ? 64 : 1 << pad_shift; /* * For fl_pad = 0 we'll still write a reasonable value to the * register but all the freelists will opt out of padding. * We'll complain here only if the user tried to set it to a * value greater than 0 that was invalid. */ if (fl_pad > 0) { device_printf(sc->dev, "Invalid hw.cxgbe.fl_pad value" " (%d), using %d instead.\n", fl_pad, pad); } } m = V_INGPADBOUNDARY(M_INGPADBOUNDARY); v = V_INGPADBOUNDARY(ilog2(pad) - pad_shift); t4_set_reg_field(sc, A_SGE_CONTROL, m, v); if (is_t4(sc)) { if (fl_pack != -1 && fl_pack != pad) { /* Complain but carry on. */ device_printf(sc->dev, "hw.cxgbe.fl_pack (%d) ignored," " using %d instead.\n", fl_pack, pad); } return; } pack = fl_pack; if (fl_pack < 16 || fl_pack == 32 || fl_pack > 4096 || !powerof2(fl_pack)) { pack = max(sc->params.pci.mps, CACHE_LINE_SIZE); MPASS(powerof2(pack)); if (pack < 16) pack = 16; if (pack == 32) pack = 64; if (pack > 4096) pack = 4096; if (fl_pack != -1) { device_printf(sc->dev, "Invalid hw.cxgbe.fl_pack value" " (%d), using %d instead.\n", fl_pack, pack); } } m = V_INGPACKBOUNDARY(M_INGPACKBOUNDARY); if (pack == 16) v = V_INGPACKBOUNDARY(0); else v = V_INGPACKBOUNDARY(ilog2(pack) - 5); MPASS(!is_t4(sc)); /* T4 doesn't have SGE_CONTROL2 */ t4_set_reg_field(sc, A_SGE_CONTROL2, m, v); } /* * adap->params.vpd.cclk must be set up before this is called. 
*/ void t4_tweak_chip_settings(struct adapter *sc) { int i; uint32_t v, m; int intr_timer[SGE_NTIMERS] = {1, 5, 10, 50, 100, 200}; int timer_max = M_TIMERVALUE0 * 1000 / sc->params.vpd.cclk; int intr_pktcount[SGE_NCOUNTERS] = {1, 8, 16, 32}; /* 63 max */ uint16_t indsz = min(RX_COPY_THRESHOLD - 1, M_INDICATESIZE); static int sge_flbuf_sizes[] = { MCLBYTES, #if MJUMPAGESIZE != MCLBYTES MJUMPAGESIZE, MJUMPAGESIZE - CL_METADATA_SIZE, MJUMPAGESIZE - 2 * MSIZE - CL_METADATA_SIZE, #endif MJUM9BYTES, MJUM16BYTES, MCLBYTES - MSIZE - CL_METADATA_SIZE, MJUM9BYTES - CL_METADATA_SIZE, MJUM16BYTES - CL_METADATA_SIZE, }; KASSERT(sc->flags & MASTER_PF, ("%s: trying to change chip settings when not master.", __func__)); m = V_PKTSHIFT(M_PKTSHIFT) | F_RXPKTCPLMODE | F_EGRSTATUSPAGESIZE; v = V_PKTSHIFT(fl_pktshift) | F_RXPKTCPLMODE | V_EGRSTATUSPAGESIZE(spg_len == 128); t4_set_reg_field(sc, A_SGE_CONTROL, m, v); setup_pad_and_pack_boundaries(sc); v = V_HOSTPAGESIZEPF0(PAGE_SHIFT - 10) | V_HOSTPAGESIZEPF1(PAGE_SHIFT - 10) | V_HOSTPAGESIZEPF2(PAGE_SHIFT - 10) | V_HOSTPAGESIZEPF3(PAGE_SHIFT - 10) | V_HOSTPAGESIZEPF4(PAGE_SHIFT - 10) | V_HOSTPAGESIZEPF5(PAGE_SHIFT - 10) | V_HOSTPAGESIZEPF6(PAGE_SHIFT - 10) | V_HOSTPAGESIZEPF7(PAGE_SHIFT - 10); t4_write_reg(sc, A_SGE_HOST_PAGE_SIZE, v); KASSERT(nitems(sge_flbuf_sizes) <= SGE_FLBUF_SIZES, ("%s: hw buffer size table too big", __func__)); t4_write_reg(sc, A_SGE_FL_BUFFER_SIZE0, 4096); t4_write_reg(sc, A_SGE_FL_BUFFER_SIZE1, 65536); for (i = 0; i < min(nitems(sge_flbuf_sizes), SGE_FLBUF_SIZES); i++) { t4_write_reg(sc, A_SGE_FL_BUFFER_SIZE15 - (4 * i), sge_flbuf_sizes[i]); } v = V_THRESHOLD_0(intr_pktcount[0]) | V_THRESHOLD_1(intr_pktcount[1]) | V_THRESHOLD_2(intr_pktcount[2]) | V_THRESHOLD_3(intr_pktcount[3]); t4_write_reg(sc, A_SGE_INGRESS_RX_THRESHOLD, v); KASSERT(intr_timer[0] <= timer_max, ("%s: not a single usable timer (%d, %d)", __func__, intr_timer[0], timer_max)); for (i = 1; i < nitems(intr_timer); i++) { KASSERT(intr_timer[i] >= intr_timer[i - 1], ("%s: timers not listed in increasing order (%d)", __func__, i)); while (intr_timer[i] > timer_max) { if (i == nitems(intr_timer) - 1) { intr_timer[i] = timer_max; break; } intr_timer[i] += intr_timer[i - 1]; intr_timer[i] /= 2; } } v = V_TIMERVALUE0(us_to_core_ticks(sc, intr_timer[0])) | V_TIMERVALUE1(us_to_core_ticks(sc, intr_timer[1])); t4_write_reg(sc, A_SGE_TIMER_VALUE_0_AND_1, v); v = V_TIMERVALUE2(us_to_core_ticks(sc, intr_timer[2])) | V_TIMERVALUE3(us_to_core_ticks(sc, intr_timer[3])); t4_write_reg(sc, A_SGE_TIMER_VALUE_2_AND_3, v); v = V_TIMERVALUE4(us_to_core_ticks(sc, intr_timer[4])) | V_TIMERVALUE5(us_to_core_ticks(sc, intr_timer[5])); t4_write_reg(sc, A_SGE_TIMER_VALUE_4_AND_5, v); if (chip_id(sc) >= CHELSIO_T6) { m = V_TSCALE(M_TSCALE); if (tscale == 1) v = 0; else v = V_TSCALE(tscale - 2); t4_set_reg_field(sc, A_SGE_ITP_CONTROL, m, v); if (sc->debug_flags & DF_DISABLE_TCB_CACHE) { m = V_RDTHRESHOLD(M_RDTHRESHOLD) | F_WRTHRTHRESHEN | V_WRTHRTHRESH(M_WRTHRTHRESH); t4_tp_pio_read(sc, &v, 1, A_TP_CMM_CONFIG, 1); v &= ~m; v |= V_RDTHRESHOLD(1) | F_WRTHRTHRESHEN | V_WRTHRTHRESH(16); t4_tp_pio_write(sc, &v, 1, A_TP_CMM_CONFIG, 1); } } /* 4K, 16K, 64K, 256K DDP "page sizes" for TDDP */ v = V_HPZ0(0) | V_HPZ1(2) | V_HPZ2(4) | V_HPZ3(6); t4_write_reg(sc, A_ULP_RX_TDDP_PSZ, v); /* * 4K, 8K, 16K, 64K DDP "page sizes" for iSCSI DDP. These have been * chosen with MAXPHYS = 128K in mind. The largest DDP buffer that we * may have to deal with is MAXPHYS + 1 page. 
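* Each V_HPZn() value is a left shift applied to the 4KB base page size.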
*/ v = V_HPZ0(0) | V_HPZ1(1) | V_HPZ2(2) | V_HPZ3(4); t4_write_reg(sc, A_ULP_RX_ISCSI_PSZ, v); /* We use multiple DDP page sizes both in plain-TOE and ISCSI modes. */ m = v = F_TDDPTAGTCB | F_ISCSITAGTCB; t4_set_reg_field(sc, A_ULP_RX_CTL, m, v); m = V_INDICATESIZE(M_INDICATESIZE) | F_REARMDDPOFFSET | F_RESETDDPOFFSET; v = V_INDICATESIZE(indsz) | F_REARMDDPOFFSET | F_RESETDDPOFFSET; t4_set_reg_field(sc, A_TP_PARA_REG5, m, v); } /* * SGE wants the buffer to be at least 64B and then a multiple of 16. If * padding is in use, the buffer's start and end need to be aligned to the pad * boundary as well. We'll just make sure that the size is a multiple of the * boundary here, it is up to the buffer allocation code to make sure the start * of the buffer is aligned as well. */ static inline int hwsz_ok(struct adapter *sc, int hwsz) { int mask = fl_pad ? sc->params.sge.pad_boundary - 1 : 16 - 1; return (hwsz >= 64 && (hwsz & mask) == 0); } /* * XXX: driver really should be able to deal with unexpected settings. */ int t4_read_chip_settings(struct adapter *sc) { struct sge *s = &sc->sge; struct sge_params *sp = &sc->params.sge; int i, j, n, rc = 0; uint32_t m, v, r; uint16_t indsz = min(RX_COPY_THRESHOLD - 1, M_INDICATESIZE); static int sw_buf_sizes[] = { /* Sorted by size */ MCLBYTES, #if MJUMPAGESIZE != MCLBYTES MJUMPAGESIZE, #endif MJUM9BYTES, MJUM16BYTES }; struct sw_zone_info *swz, *safe_swz; struct hw_buf_info *hwb; m = F_RXPKTCPLMODE; v = F_RXPKTCPLMODE; r = sc->params.sge.sge_control; if ((r & m) != v) { device_printf(sc->dev, "invalid SGE_CONTROL(0x%x)\n", r); rc = EINVAL; } /* * If this changes then every single use of PAGE_SHIFT in the driver * needs to be carefully reviewed for PAGE_SHIFT vs sp->page_shift. */ if (sp->page_shift != PAGE_SHIFT) { device_printf(sc->dev, "invalid SGE_HOST_PAGE_SIZE(0x%x)\n", r); rc = EINVAL; } /* Filter out unusable hw buffer sizes entirely (mark with -2). */ hwb = &s->hw_buf_info[0]; for (i = 0; i < nitems(s->hw_buf_info); i++, hwb++) { r = sc->params.sge.sge_fl_buffer_size[i]; hwb->size = r; hwb->zidx = hwsz_ok(sc, r) ? -1 : -2; hwb->next = -1; } /* * Create a sorted list in decreasing order of hw buffer sizes (and so * increasing order of spare area) for each software zone. * * If padding is enabled then the start and end of the buffer must align * to the pad boundary; if packing is enabled then they must align with * the pack boundary as well. Allocations from the cluster zones are * aligned to min(size, 4K), so the buffer starts at that alignment and * ends at hwb->size alignment. If mbuf inlining is allowed the * starting alignment will be reduced to MSIZE and the driver will * exercise appropriate caution when deciding on the best buffer layout * to use. 
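* A size that duplicates one already in a zone's list is marked unusable (zidx -2) so each hw buffer size appears at most once per zone.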
*/ n = 0; /* no usable buffer size to begin with */ swz = &s->sw_zone_info[0]; safe_swz = NULL; for (i = 0; i < SW_ZONE_SIZES; i++, swz++) { int8_t head = -1, tail = -1; swz->size = sw_buf_sizes[i]; swz->zone = m_getzone(swz->size); swz->type = m_gettype(swz->size); if (swz->size < PAGE_SIZE) { MPASS(powerof2(swz->size)); if (fl_pad && (swz->size % sp->pad_boundary != 0)) continue; } if (swz->size == safest_rx_cluster) safe_swz = swz; hwb = &s->hw_buf_info[0]; for (j = 0; j < SGE_FLBUF_SIZES; j++, hwb++) { if (hwb->zidx != -1 || hwb->size > swz->size) continue; #ifdef INVARIANTS if (fl_pad) MPASS(hwb->size % sp->pad_boundary == 0); #endif hwb->zidx = i; if (head == -1) head = tail = j; else if (hwb->size < s->hw_buf_info[tail].size) { s->hw_buf_info[tail].next = j; tail = j; } else { int8_t *cur; struct hw_buf_info *t; for (cur = &head; *cur != -1; cur = &t->next) { t = &s->hw_buf_info[*cur]; if (hwb->size == t->size) { hwb->zidx = -2; break; } if (hwb->size > t->size) { hwb->next = *cur; *cur = j; break; } } } } swz->head_hwidx = head; swz->tail_hwidx = tail; if (tail != -1) { n++; if (swz->size - s->hw_buf_info[tail].size >= CL_METADATA_SIZE) sc->flags |= BUF_PACKING_OK; } } if (n == 0) { device_printf(sc->dev, "no usable SGE FL buffer size.\n"); rc = EINVAL; } s->safe_hwidx1 = -1; s->safe_hwidx2 = -1; if (safe_swz != NULL) { s->safe_hwidx1 = safe_swz->head_hwidx; for (i = safe_swz->head_hwidx; i != -1; i = hwb->next) { int spare; hwb = &s->hw_buf_info[i]; #ifdef INVARIANTS if (fl_pad) MPASS(hwb->size % sp->pad_boundary == 0); #endif spare = safe_swz->size - hwb->size; if (spare >= CL_METADATA_SIZE) { s->safe_hwidx2 = i; break; } } } if (sc->flags & IS_VF) return (0); v = V_HPZ0(0) | V_HPZ1(2) | V_HPZ2(4) | V_HPZ3(6); r = t4_read_reg(sc, A_ULP_RX_TDDP_PSZ); if (r != v) { device_printf(sc->dev, "invalid ULP_RX_TDDP_PSZ(0x%x)\n", r); rc = EINVAL; } m = v = F_TDDPTAGTCB; r = t4_read_reg(sc, A_ULP_RX_CTL); if ((r & m) != v) { device_printf(sc->dev, "invalid ULP_RX_CTL(0x%x)\n", r); rc = EINVAL; } m = V_INDICATESIZE(M_INDICATESIZE) | F_REARMDDPOFFSET | F_RESETDDPOFFSET; v = V_INDICATESIZE(indsz) | F_REARMDDPOFFSET | F_RESETDDPOFFSET; r = t4_read_reg(sc, A_TP_PARA_REG5); if ((r & m) != v) { device_printf(sc->dev, "invalid TP_PARA_REG5(0x%x)\n", r); rc = EINVAL; } t4_init_tp_params(sc, 1); t4_read_mtu_tbl(sc, sc->params.mtus, NULL); t4_load_mtus(sc, sc->params.mtus, sc->params.a_wnd, sc->params.b_wnd); return (rc); } int t4_create_dma_tag(struct adapter *sc) { int rc; rc = bus_dma_tag_create(bus_get_dma_tag(sc->dev), 1, 0, BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL, BUS_SPACE_MAXSIZE, BUS_SPACE_UNRESTRICTED, BUS_SPACE_MAXSIZE, BUS_DMA_ALLOCNOW, NULL, NULL, &sc->dmat); if (rc != 0) { device_printf(sc->dev, "failed to create main DMA tag: %d\n", rc); } return (rc); } void t4_sge_sysctls(struct adapter *sc, struct sysctl_ctx_list *ctx, struct sysctl_oid_list *children) { struct sge_params *sp = &sc->params.sge; SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "buffer_sizes", CTLTYPE_STRING | CTLFLAG_RD, &sc->sge, 0, sysctl_bufsizes, "A", "freelist buffer sizes"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "fl_pktshift", CTLFLAG_RD, NULL, sp->fl_pktshift, "payload DMA offset in rx buffer (bytes)"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "fl_pad", CTLFLAG_RD, NULL, sp->pad_boundary, "payload pad boundary (bytes)"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "spg_len", CTLFLAG_RD, NULL, sp->spg_len, "status page size (bytes)"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "cong_drop", CTLFLAG_RD, NULL, 
cong_drop, "congestion drop setting"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "fl_pack", CTLFLAG_RD, NULL, sp->pack_boundary, "payload pack boundary (bytes)"); } int t4_destroy_dma_tag(struct adapter *sc) { if (sc->dmat) bus_dma_tag_destroy(sc->dmat); return (0); } /* * Allocate and initialize the firmware event queue, control queues, and special * purpose rx queues owned by the adapter. * * Returns errno on failure. Resources allocated up to that point may still be * allocated. Caller is responsible for cleanup in case this function fails. */ int t4_setup_adapter_queues(struct adapter *sc) { struct sysctl_oid *oid; struct sysctl_oid_list *children; int rc, i; ADAPTER_LOCK_ASSERT_NOTOWNED(sc); sysctl_ctx_init(&sc->ctx); sc->flags |= ADAP_SYSCTL_CTX; /* * Firmware event queue */ rc = alloc_fwq(sc); if (rc != 0) return (rc); /* * That's all for the VF driver. */ if (sc->flags & IS_VF) return (rc); oid = device_get_sysctl_tree(sc->dev); children = SYSCTL_CHILDREN(oid); /* * XXX: General purpose rx queues, one per port. */ /* * Control queues, one per port. */ oid = SYSCTL_ADD_NODE(&sc->ctx, children, OID_AUTO, "ctrlq", CTLFLAG_RD, NULL, "control queues"); for_each_port(sc, i) { struct sge_wrq *ctrlq = &sc->sge.ctrlq[i]; rc = alloc_ctrlq(sc, ctrlq, i, oid); if (rc != 0) return (rc); } return (rc); } /* * Idempotent */ int t4_teardown_adapter_queues(struct adapter *sc) { int i; ADAPTER_LOCK_ASSERT_NOTOWNED(sc); /* Do this before freeing the queue */ if (sc->flags & ADAP_SYSCTL_CTX) { sysctl_ctx_free(&sc->ctx); sc->flags &= ~ADAP_SYSCTL_CTX; } if (!(sc->flags & IS_VF)) { for_each_port(sc, i) free_wrq(sc, &sc->sge.ctrlq[i]); } free_fwq(sc); return (0); } /* Maximum payload that can be delivered with a single iq descriptor */ static inline int mtu_to_max_payload(struct adapter *sc, int mtu, const int toe) { int payload; #ifdef TCP_OFFLOAD if (toe) { int rxcs = G_RXCOALESCESIZE(t4_read_reg(sc, A_TP_PARA_REG2)); /* Note that COP can set rx_coalesce on/off per connection. */ payload = max(mtu, rxcs); } else { #endif /* large enough even when hw VLAN extraction is disabled */ payload = sc->params.sge.fl_pktshift + ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN + mtu; #ifdef TCP_OFFLOAD } #endif return (payload); } int t4_setup_vi_queues(struct vi_info *vi) { int rc = 0, i, intr_idx, iqidx; struct sge_rxq *rxq; struct sge_txq *txq; #ifdef TCP_OFFLOAD struct sge_ofld_rxq *ofld_rxq; #endif #if defined(TCP_OFFLOAD) || defined(RATELIMIT) struct sge_wrq *ofld_txq; #endif #ifdef DEV_NETMAP int saved_idx; struct sge_nm_rxq *nm_rxq; struct sge_nm_txq *nm_txq; #endif char name[16]; struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; struct ifnet *ifp = vi->ifp; struct sysctl_oid *oid = device_get_sysctl_tree(vi->dev); struct sysctl_oid_list *children = SYSCTL_CHILDREN(oid); int maxp, mtu = ifp->if_mtu; /* Interrupt vector to start from (when using multiple vectors) */ intr_idx = vi->first_intr; #ifdef DEV_NETMAP saved_idx = intr_idx; if (ifp->if_capabilities & IFCAP_NETMAP) { /* netmap is supported with direct interrupts only. */ MPASS(!forwarding_intr_to_fwq(sc)); /* * We don't have buffers to back the netmap rx queues * right now so we create the queues in a way that * doesn't set off any congestion signal in the chip. 
*/ oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, "nm_rxq", CTLFLAG_RD, NULL, "rx queues"); for_each_nm_rxq(vi, i, nm_rxq) { rc = alloc_nm_rxq(vi, nm_rxq, intr_idx, i, oid); if (rc != 0) goto done; intr_idx++; } oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, "nm_txq", CTLFLAG_RD, NULL, "tx queues"); for_each_nm_txq(vi, i, nm_txq) { iqidx = vi->first_nm_rxq + (i % vi->nnmrxq); rc = alloc_nm_txq(vi, nm_txq, iqidx, i, oid); if (rc != 0) goto done; } } /* Normal rx queues and netmap rx queues share the same interrupts. */ intr_idx = saved_idx; #endif /* * Allocate rx queues first because a default iqid is required when * creating a tx queue. */ maxp = mtu_to_max_payload(sc, mtu, 0); oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, "rxq", CTLFLAG_RD, NULL, "rx queues"); for_each_rxq(vi, i, rxq) { init_iq(&rxq->iq, sc, vi->tmr_idx, vi->pktc_idx, vi->qsize_rxq); snprintf(name, sizeof(name), "%s rxq%d-fl", device_get_nameunit(vi->dev), i); init_fl(sc, &rxq->fl, vi->qsize_rxq / 8, maxp, name); rc = alloc_rxq(vi, rxq, forwarding_intr_to_fwq(sc) ? -1 : intr_idx, i, oid); if (rc != 0) goto done; intr_idx++; } #ifdef DEV_NETMAP if (ifp->if_capabilities & IFCAP_NETMAP) intr_idx = saved_idx + max(vi->nrxq, vi->nnmrxq); #endif #ifdef TCP_OFFLOAD maxp = mtu_to_max_payload(sc, mtu, 1); oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, "ofld_rxq", CTLFLAG_RD, NULL, "rx queues for offloaded TCP connections"); for_each_ofld_rxq(vi, i, ofld_rxq) { init_iq(&ofld_rxq->iq, sc, vi->ofld_tmr_idx, vi->ofld_pktc_idx, vi->qsize_rxq); snprintf(name, sizeof(name), "%s ofld_rxq%d-fl", device_get_nameunit(vi->dev), i); init_fl(sc, &ofld_rxq->fl, vi->qsize_rxq / 8, maxp, name); rc = alloc_ofld_rxq(vi, ofld_rxq, forwarding_intr_to_fwq(sc) ? -1 : intr_idx, i, oid); if (rc != 0) goto done; intr_idx++; } #endif /* * Now the tx queues. 
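* Each tx queue's eq is bound to an rx queue's iq (via its cntxt_id) so the hardware has somewhere to deliver egress update notifications.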
*/ oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, "txq", CTLFLAG_RD, NULL, "tx queues"); for_each_txq(vi, i, txq) { iqidx = vi->first_rxq + (i % vi->nrxq); snprintf(name, sizeof(name), "%s txq%d", device_get_nameunit(vi->dev), i); init_eq(sc, &txq->eq, EQ_ETH, vi->qsize_txq, pi->tx_chan, sc->sge.rxq[iqidx].iq.cntxt_id, name); rc = alloc_txq(vi, txq, i, oid); if (rc != 0) goto done; } #if defined(TCP_OFFLOAD) || defined(RATELIMIT) oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, "ofld_txq", CTLFLAG_RD, NULL, "tx queues for TOE/ETHOFLD"); for_each_ofld_txq(vi, i, ofld_txq) { struct sysctl_oid *oid2; snprintf(name, sizeof(name), "%s ofld_txq%d", device_get_nameunit(vi->dev), i); if (vi->nofldrxq > 0) { iqidx = vi->first_ofld_rxq + (i % vi->nofldrxq); init_eq(sc, &ofld_txq->eq, EQ_OFLD, vi->qsize_txq, pi->tx_chan, sc->sge.ofld_rxq[iqidx].iq.cntxt_id, name); } else { iqidx = vi->first_rxq + (i % vi->nrxq); init_eq(sc, &ofld_txq->eq, EQ_OFLD, vi->qsize_txq, pi->tx_chan, sc->sge.rxq[iqidx].iq.cntxt_id, name); } snprintf(name, sizeof(name), "%d", i); oid2 = SYSCTL_ADD_NODE(&vi->ctx, SYSCTL_CHILDREN(oid), OID_AUTO, name, CTLFLAG_RD, NULL, "offload tx queue"); rc = alloc_wrq(sc, vi, ofld_txq, oid2); if (rc != 0) goto done; } #endif done: if (rc) t4_teardown_vi_queues(vi); return (rc); } /* * Idempotent */ int t4_teardown_vi_queues(struct vi_info *vi) { int i; struct sge_rxq *rxq; struct sge_txq *txq; #if defined(TCP_OFFLOAD) || defined(RATELIMIT) struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; struct sge_wrq *ofld_txq; #endif #ifdef TCP_OFFLOAD struct sge_ofld_rxq *ofld_rxq; #endif #ifdef DEV_NETMAP struct sge_nm_rxq *nm_rxq; struct sge_nm_txq *nm_txq; #endif /* Do this before freeing the queues */ if (vi->flags & VI_SYSCTL_CTX) { sysctl_ctx_free(&vi->ctx); vi->flags &= ~VI_SYSCTL_CTX; } #ifdef DEV_NETMAP if (vi->ifp->if_capabilities & IFCAP_NETMAP) { for_each_nm_txq(vi, i, nm_txq) { free_nm_txq(vi, nm_txq); } for_each_nm_rxq(vi, i, nm_rxq) { free_nm_rxq(vi, nm_rxq); } } #endif /* * Take down all the tx queues first, as they reference the rx queues * (for egress updates, etc.). */ for_each_txq(vi, i, txq) { free_txq(vi, txq); } #if defined(TCP_OFFLOAD) || defined(RATELIMIT) for_each_ofld_txq(vi, i, ofld_txq) { free_wrq(sc, ofld_txq); } #endif /* * Then take down the rx queues. */ for_each_rxq(vi, i, rxq) { free_rxq(vi, rxq); } #ifdef TCP_OFFLOAD for_each_ofld_rxq(vi, i, ofld_rxq) { free_ofld_rxq(vi, ofld_rxq); } #endif return (0); } /* * Interrupt handler when the driver is using only 1 interrupt. This is a very * unusual scenario. * * a) Deals with errors, if any. * b) Services firmware event queue, which is taking interrupts for all other * queues. */ void t4_intr_all(void *arg) { struct adapter *sc = arg; struct sge_iq *fwq = &sc->sge.fwq; MPASS(sc->intr_count == 1); if (sc->intr_type == INTR_INTX) t4_write_reg(sc, MYPF_REG(A_PCIE_PF_CLI), 0); t4_intr_err(arg); t4_intr_evt(fwq); } /* * Interrupt handler for errors (installed directly when multiple interrupts are * being used, or called by t4_intr_all). */ void t4_intr_err(void *arg) { struct adapter *sc = arg; uint32_t v; const bool verbose = (sc->debug_flags & DF_VERBOSE_SLOWINTR) != 0; if (sc->flags & ADAP_ERR) return; v = t4_read_reg(sc, MYPF_REG(A_PL_PF_INT_CAUSE)); if (v & F_PFSW) { sc->swintr++; t4_write_reg(sc, MYPF_REG(A_PL_PF_INT_CAUSE), v); } t4_slow_intr_handler(sc, verbose); } /* * Interrupt handler for iq-only queues. The firmware event queue is the only * such queue right now. 
*/ void t4_intr_evt(void *arg) { struct sge_iq *iq = arg; if (atomic_cmpset_int(&iq->state, IQS_IDLE, IQS_BUSY)) { service_iq(iq, 0); (void) atomic_cmpset_int(&iq->state, IQS_BUSY, IQS_IDLE); } } /* * Interrupt handler for iq+fl queues. */ void t4_intr(void *arg) { struct sge_iq *iq = arg; if (atomic_cmpset_int(&iq->state, IQS_IDLE, IQS_BUSY)) { service_iq_fl(iq, 0); (void) atomic_cmpset_int(&iq->state, IQS_BUSY, IQS_IDLE); } } #ifdef DEV_NETMAP /* * Interrupt handler for netmap rx queues. */ void t4_nm_intr(void *arg) { struct sge_nm_rxq *nm_rxq = arg; if (atomic_cmpset_int(&nm_rxq->nm_state, NM_ON, NM_BUSY)) { service_nm_rxq(nm_rxq); (void) atomic_cmpset_int(&nm_rxq->nm_state, NM_BUSY, NM_ON); } } /* * Interrupt handler for vectors shared between NIC and netmap rx queues. */ void t4_vi_intr(void *arg) { struct irq *irq = arg; MPASS(irq->nm_rxq != NULL); t4_nm_intr(irq->nm_rxq); MPASS(irq->rxq != NULL); t4_intr(irq->rxq); } #endif /* * Deals with interrupts on an iq-only (no freelist) queue. */ static int service_iq(struct sge_iq *iq, int budget) { struct sge_iq *q; struct adapter *sc = iq->adapter; struct iq_desc *d = &iq->desc[iq->cidx]; int ndescs = 0, limit; int rsp_type; uint32_t lq; STAILQ_HEAD(, sge_iq) iql = STAILQ_HEAD_INITIALIZER(iql); KASSERT(iq->state == IQS_BUSY, ("%s: iq %p not BUSY", __func__, iq)); KASSERT((iq->flags & IQ_HAS_FL) == 0, ("%s: called for iq %p with fl (iq->flags 0x%x)", __func__, iq, iq->flags)); MPASS((iq->flags & IQ_ADJ_CREDIT) == 0); MPASS((iq->flags & IQ_LRO_ENABLED) == 0); limit = budget ? budget : iq->qsize / 16; /* * We always come back and check the descriptor ring for new indirect * interrupts and other responses after running a single handler. */ for (;;) { while ((d->rsp.u.type_gen & F_RSPD_GEN) == iq->gen) { rmb(); rsp_type = G_RSPD_TYPE(d->rsp.u.type_gen); lq = be32toh(d->rsp.pldbuflen_qid); switch (rsp_type) { case X_RSPD_TYPE_FLBUF: panic("%s: data for an iq (%p) with no freelist", __func__, iq); /* NOTREACHED */ case X_RSPD_TYPE_CPL: KASSERT(d->rss.opcode < NUM_CPL_CMDS, ("%s: bad opcode %02x.", __func__, d->rss.opcode)); t4_cpl_handler[d->rss.opcode](iq, &d->rss, NULL); break; case X_RSPD_TYPE_INTR: /* * There are 1K interrupt-capable queues (qids 0 * through 1023). A response type indicating a * forwarded interrupt with a qid >= 1K is an * iWARP async notification. */ if (__predict_true(lq >= 1024)) { t4_an_handler(iq, &d->rsp); break; } q = sc->sge.iqmap[lq - sc->sge.iq_start - sc->sge.iq_base]; if (atomic_cmpset_int(&q->state, IQS_IDLE, IQS_BUSY)) { if (service_iq_fl(q, q->qsize / 16) == 0) { (void) atomic_cmpset_int(&q->state, IQS_BUSY, IQS_IDLE); } else { STAILQ_INSERT_TAIL(&iql, q, link); } } break; default: KASSERT(0, ("%s: illegal response type %d on iq %p", __func__, rsp_type, iq)); log(LOG_ERR, "%s: illegal response type %d on iq %p", device_get_nameunit(sc->dev), rsp_type, iq); break; } d++; if (__predict_false(++iq->cidx == iq->sidx)) { iq->cidx = 0; iq->gen ^= F_RSPD_GEN; d = &iq->desc[0]; } if (__predict_false(++ndescs == limit)) { t4_write_reg(sc, sc->sge_gts_reg, V_CIDXINC(ndescs) | V_INGRESSQID(iq->cntxt_id) | V_SEINTARM(V_QINTR_TIMER_IDX(X_TIMERREG_UPDATE_CIDX))); ndescs = 0; if (budget) { return (EINPROGRESS); } } } if (STAILQ_EMPTY(&iql)) break; /* * Process the head only, and send it to the back of the list if * it's still not done. 
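* Limiting each pass to one queue keeps a single busy forwarded queue from starving the others.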
*/ q = STAILQ_FIRST(&iql); STAILQ_REMOVE_HEAD(&iql, link); if (service_iq_fl(q, q->qsize / 8) == 0) (void) atomic_cmpset_int(&q->state, IQS_BUSY, IQS_IDLE); else STAILQ_INSERT_TAIL(&iql, q, link); } t4_write_reg(sc, sc->sge_gts_reg, V_CIDXINC(ndescs) | V_INGRESSQID((u32)iq->cntxt_id) | V_SEINTARM(iq->intr_params)); return (0); } static inline int sort_before_lro(struct lro_ctrl *lro) { return (lro->lro_mbuf_max != 0); } static inline uint64_t last_flit_to_ns(struct adapter *sc, uint64_t lf) { uint64_t n = be64toh(lf) & 0xfffffffffffffff; /* 60b, not 64b. */ if (n > UINT64_MAX / 1000000) return (n / sc->params.vpd.cclk * 1000000); else return (n * 1000000 / sc->params.vpd.cclk); } /* * Deals with interrupts on an iq+fl queue. */ static int service_iq_fl(struct sge_iq *iq, int budget) { struct sge_rxq *rxq = iq_to_rxq(iq); struct sge_fl *fl; struct adapter *sc = iq->adapter; struct iq_desc *d = &iq->desc[iq->cidx]; int ndescs = 0, limit; int rsp_type, refill, starved; uint32_t lq; uint16_t fl_hw_cidx; struct mbuf *m0; #if defined(INET) || defined(INET6) const struct timeval lro_timeout = {0, sc->lro_timeout}; struct lro_ctrl *lro = &rxq->lro; #endif KASSERT(iq->state == IQS_BUSY, ("%s: iq %p not BUSY", __func__, iq)); MPASS(iq->flags & IQ_HAS_FL); limit = budget ? budget : iq->qsize / 16; fl = &rxq->fl; fl_hw_cidx = fl->hw_cidx; /* stable snapshot */ #if defined(INET) || defined(INET6) if (iq->flags & IQ_ADJ_CREDIT) { MPASS(sort_before_lro(lro)); iq->flags &= ~IQ_ADJ_CREDIT; if ((d->rsp.u.type_gen & F_RSPD_GEN) != iq->gen) { tcp_lro_flush_all(lro); t4_write_reg(sc, sc->sge_gts_reg, V_CIDXINC(1) | V_INGRESSQID((u32)iq->cntxt_id) | V_SEINTARM(iq->intr_params)); return (0); } ndescs = 1; } #else MPASS((iq->flags & IQ_ADJ_CREDIT) == 0); #endif while ((d->rsp.u.type_gen & F_RSPD_GEN) == iq->gen) { rmb(); refill = 0; m0 = NULL; rsp_type = G_RSPD_TYPE(d->rsp.u.type_gen); lq = be32toh(d->rsp.pldbuflen_qid); switch (rsp_type) { case X_RSPD_TYPE_FLBUF: m0 = get_fl_payload(sc, fl, lq); if (__predict_false(m0 == NULL)) goto out; refill = IDXDIFF(fl->hw_cidx, fl_hw_cidx, fl->sidx) > 2; if (iq->flags & IQ_RX_TIMESTAMP) { /* * Fill up rcv_tstmp but do not set M_TSTMP. * rcv_tstmp is not in the format that the * kernel expects and we don't want to mislead * it. For now this is only for custom code * that knows how to interpret cxgbe's stamp. */ m0->m_pkthdr.rcv_tstmp = last_flit_to_ns(sc, d->rsp.u.last_flit); #ifdef notyet m0->m_flags |= M_TSTMP; #endif } /* fall through */ case X_RSPD_TYPE_CPL: KASSERT(d->rss.opcode < NUM_CPL_CMDS, ("%s: bad opcode %02x.", __func__, d->rss.opcode)); t4_cpl_handler[d->rss.opcode](iq, &d->rss, m0); break; case X_RSPD_TYPE_INTR: /* * There are 1K interrupt-capable queues (qids 0 * through 1023). A response type indicating a * forwarded interrupt with a qid >= 1K is an * iWARP async notification. That is the only * acceptable indirect interrupt on this queue. 
*/ if (__predict_false(lq < 1024)) { panic("%s: indirect interrupt on iq_fl %p " "with qid %u", __func__, iq, lq); } t4_an_handler(iq, &d->rsp); break; default: KASSERT(0, ("%s: illegal response type %d on iq %p", __func__, rsp_type, iq)); log(LOG_ERR, "%s: illegal response type %d on iq %p", device_get_nameunit(sc->dev), rsp_type, iq); break; } d++; if (__predict_false(++iq->cidx == iq->sidx)) { iq->cidx = 0; iq->gen ^= F_RSPD_GEN; d = &iq->desc[0]; } if (__predict_false(++ndescs == limit)) { t4_write_reg(sc, sc->sge_gts_reg, V_CIDXINC(ndescs) | V_INGRESSQID(iq->cntxt_id) | V_SEINTARM(V_QINTR_TIMER_IDX(X_TIMERREG_UPDATE_CIDX))); ndescs = 0; #if defined(INET) || defined(INET6) if (iq->flags & IQ_LRO_ENABLED && !sort_before_lro(lro) && sc->lro_timeout != 0) { tcp_lro_flush_inactive(lro, &lro_timeout); } #endif if (budget) { FL_LOCK(fl); refill_fl(sc, fl, 32); FL_UNLOCK(fl); return (EINPROGRESS); } } if (refill) { FL_LOCK(fl); refill_fl(sc, fl, 32); FL_UNLOCK(fl); fl_hw_cidx = fl->hw_cidx; } } out: #if defined(INET) || defined(INET6) if (iq->flags & IQ_LRO_ENABLED) { if (ndescs > 0 && lro->lro_mbuf_count > 8) { MPASS(sort_before_lro(lro)); /* hold back one credit and don't flush LRO state */ iq->flags |= IQ_ADJ_CREDIT; ndescs--; } else { tcp_lro_flush_all(lro); } } #endif t4_write_reg(sc, sc->sge_gts_reg, V_CIDXINC(ndescs) | V_INGRESSQID((u32)iq->cntxt_id) | V_SEINTARM(iq->intr_params)); FL_LOCK(fl); starved = refill_fl(sc, fl, 64); FL_UNLOCK(fl); if (__predict_false(starved != 0)) add_fl_to_sfl(sc, fl); return (0); } static inline int cl_has_metadata(struct sge_fl *fl, struct cluster_layout *cll) { int rc = fl->flags & FL_BUF_PACKING || cll->region1 > 0; if (rc) MPASS(cll->region3 >= CL_METADATA_SIZE); return (rc); } static inline struct cluster_metadata * cl_metadata(struct adapter *sc, struct sge_fl *fl, struct cluster_layout *cll, caddr_t cl) { if (cl_has_metadata(fl, cll)) { struct sw_zone_info *swz = &sc->sge.sw_zone_info[cll->zidx]; return ((struct cluster_metadata *)(cl + swz->size) - 1); } return (NULL); } static void rxb_free(struct mbuf *m) { uma_zone_t zone = m->m_ext.ext_arg1; void *cl = m->m_ext.ext_arg2; uma_zfree(zone, cl); counter_u64_add(extfree_rels, 1); } /* * The mbuf returned by this function could be allocated from zone_mbuf or * constructed in spare room in the cluster. * * The mbuf carries the payload in one of these ways * a) frame inside the mbuf (mbuf from zone_mbuf) * b) m_cljset (for clusters without metadata) zone_mbuf * c) m_extaddref (cluster with metadata) inline mbuf * d) m_extaddref (cluster with metadata) zone_mbuf */ static struct mbuf * get_scatter_segment(struct adapter *sc, struct sge_fl *fl, int fr_offset, int remaining) { struct mbuf *m; struct fl_sdesc *sd = &fl->sdesc[fl->cidx]; struct cluster_layout *cll = &sd->cll; struct sw_zone_info *swz = &sc->sge.sw_zone_info[cll->zidx]; struct hw_buf_info *hwb = &sc->sge.hw_buf_info[cll->hwidx]; struct cluster_metadata *clm = cl_metadata(sc, fl, cll, sd->cl); int len, blen; caddr_t payload; blen = hwb->size - fl->rx_offset; /* max possible in this buf */ len = min(remaining, blen); payload = sd->cl + cll->region1 + fl->rx_offset; if (fl->flags & FL_BUF_PACKING) { const u_int l = fr_offset + len; const u_int pad = roundup2(l, fl->buf_boundary) - l; if (fl->rx_offset + len + pad < hwb->size) blen = len + pad; MPASS(fl->rx_offset + blen <= hwb->size); } else { MPASS(fl->rx_offset == 0); /* not packing */ } if (sc->sc_do_rxcopy && len < RX_COPY_THRESHOLD) { /* * Copy payload into a freshly allocated mbuf. 
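* Copying small frames is cheaper than tying up an entire cluster for them.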
*/ m = fr_offset == 0 ? m_gethdr(M_NOWAIT, MT_DATA) : m_get(M_NOWAIT, MT_DATA); if (m == NULL) return (NULL); fl->mbuf_allocated++; /* copy data to mbuf */ bcopy(payload, mtod(m, caddr_t), len); } else if (sd->nmbuf * MSIZE < cll->region1) { /* * There's spare room in the cluster for an mbuf. Create one * and associate it with the payload that's in the cluster. */ MPASS(clm != NULL); m = (struct mbuf *)(sd->cl + sd->nmbuf * MSIZE); /* No bzero required */ if (m_init(m, M_NOWAIT, MT_DATA, fr_offset == 0 ? M_PKTHDR | M_NOFREE : M_NOFREE)) return (NULL); fl->mbuf_inlined++; m_extaddref(m, payload, blen, &clm->refcount, rxb_free, swz->zone, sd->cl); if (sd->nmbuf++ == 0) counter_u64_add(extfree_refs, 1); } else { /* * Grab an mbuf from zone_mbuf and associate it with the * payload in the cluster. */ m = fr_offset == 0 ? m_gethdr(M_NOWAIT, MT_DATA) : m_get(M_NOWAIT, MT_DATA); if (m == NULL) return (NULL); fl->mbuf_allocated++; if (clm != NULL) { m_extaddref(m, payload, blen, &clm->refcount, rxb_free, swz->zone, sd->cl); if (sd->nmbuf++ == 0) counter_u64_add(extfree_refs, 1); } else { m_cljset(m, sd->cl, swz->type); sd->cl = NULL; /* consumed, not a recycle candidate */ } } if (fr_offset == 0) m->m_pkthdr.len = remaining; m->m_len = len; if (fl->flags & FL_BUF_PACKING) { fl->rx_offset += blen; MPASS(fl->rx_offset <= hwb->size); if (fl->rx_offset < hwb->size) return (m); /* without advancing the cidx */ } if (__predict_false(++fl->cidx % 8 == 0)) { uint16_t cidx = fl->cidx / 8; if (__predict_false(cidx == fl->sidx)) fl->cidx = cidx = 0; fl->hw_cidx = cidx; } fl->rx_offset = 0; return (m); } static struct mbuf * get_fl_payload(struct adapter *sc, struct sge_fl *fl, uint32_t len_newbuf) { struct mbuf *m0, *m, **pnext; u_int remaining; const u_int total = G_RSPD_LEN(len_newbuf); if (__predict_false(fl->flags & FL_BUF_RESUME)) { M_ASSERTPKTHDR(fl->m0); MPASS(fl->m0->m_pkthdr.len == total); MPASS(fl->remaining < total); m0 = fl->m0; pnext = fl->pnext; remaining = fl->remaining; fl->flags &= ~FL_BUF_RESUME; goto get_segment; } if (fl->rx_offset > 0 && len_newbuf & F_RSPD_NEWBUF) { fl->rx_offset = 0; if (__predict_false(++fl->cidx % 8 == 0)) { uint16_t cidx = fl->cidx / 8; if (__predict_false(cidx == fl->sidx)) fl->cidx = cidx = 0; fl->hw_cidx = cidx; } } /* * Payload starts at rx_offset in the current hw buffer. Its length is * 'len' and it may span multiple hw buffers. 
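* ('len' here is the total frame length, G_RSPD_LEN(len_newbuf).)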
*/ m0 = get_scatter_segment(sc, fl, 0, total); if (m0 == NULL) return (NULL); remaining = total - m0->m_len; pnext = &m0->m_next; while (remaining > 0) { get_segment: MPASS(fl->rx_offset == 0); m = get_scatter_segment(sc, fl, total - remaining, remaining); if (__predict_false(m == NULL)) { fl->m0 = m0; fl->pnext = pnext; fl->remaining = remaining; fl->flags |= FL_BUF_RESUME; return (NULL); } *pnext = m; pnext = &m->m_next; remaining -= m->m_len; } *pnext = NULL; M_ASSERTPKTHDR(m0); return (m0); } static int t4_eth_rx(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m0) { struct sge_rxq *rxq = iq_to_rxq(iq); struct ifnet *ifp = rxq->ifp; struct adapter *sc = iq->adapter; const struct cpl_rx_pkt *cpl = (const void *)(rss + 1); #if defined(INET) || defined(INET6) struct lro_ctrl *lro = &rxq->lro; #endif static const int sw_hashtype[4][2] = { {M_HASHTYPE_NONE, M_HASHTYPE_NONE}, {M_HASHTYPE_RSS_IPV4, M_HASHTYPE_RSS_IPV6}, {M_HASHTYPE_RSS_TCP_IPV4, M_HASHTYPE_RSS_TCP_IPV6}, {M_HASHTYPE_RSS_UDP_IPV4, M_HASHTYPE_RSS_UDP_IPV6}, }; KASSERT(m0 != NULL, ("%s: no payload with opcode %02x", __func__, rss->opcode)); m0->m_pkthdr.len -= sc->params.sge.fl_pktshift; m0->m_len -= sc->params.sge.fl_pktshift; m0->m_data += sc->params.sge.fl_pktshift; m0->m_pkthdr.rcvif = ifp; M_HASHTYPE_SET(m0, sw_hashtype[rss->hash_type][rss->ipv6]); m0->m_pkthdr.flowid = be32toh(rss->hash_val); if (cpl->csum_calc && !(cpl->err_vec & sc->params.tp.err_vec_mask)) { if (ifp->if_capenable & IFCAP_RXCSUM && cpl->l2info & htobe32(F_RXF_IP)) { m0->m_pkthdr.csum_flags = (CSUM_IP_CHECKED | CSUM_IP_VALID | CSUM_DATA_VALID | CSUM_PSEUDO_HDR); rxq->rxcsum++; } else if (ifp->if_capenable & IFCAP_RXCSUM_IPV6 && cpl->l2info & htobe32(F_RXF_IP6)) { m0->m_pkthdr.csum_flags = (CSUM_DATA_VALID_IPV6 | CSUM_PSEUDO_HDR); rxq->rxcsum++; } if (__predict_false(cpl->ip_frag)) m0->m_pkthdr.csum_data = be16toh(cpl->csum); else m0->m_pkthdr.csum_data = 0xffff; } if (cpl->vlan_ex) { m0->m_pkthdr.ether_vtag = be16toh(cpl->vlan); m0->m_flags |= M_VLANTAG; rxq->vlan_extraction++; } +#ifdef NUMA + m0->m_pkthdr.numa_domain = ifp->if_numa_domain; +#endif #if defined(INET) || defined(INET6) if (iq->flags & IQ_LRO_ENABLED) { if (sort_before_lro(lro)) { tcp_lro_queue_mbuf(lro, m0); return (0); /* queued for sort, then LRO */ } if (tcp_lro_rx(lro, m0, 0) == 0) return (0); /* queued for LRO */ } #endif ifp->if_input(ifp, m0); return (0); } /* * Must drain the wrq or make sure that someone else will. */ static void wrq_tx_drain(void *arg, int n) { struct sge_wrq *wrq = arg; struct sge_eq *eq = &wrq->eq; EQ_LOCK(eq); if (TAILQ_EMPTY(&wrq->incomplete_wrs) && !STAILQ_EMPTY(&wrq->wr_list)) drain_wrq_wr_list(wrq->adapter, wrq); EQ_UNLOCK(eq); } static void drain_wrq_wr_list(struct adapter *sc, struct sge_wrq *wrq) { struct sge_eq *eq = &wrq->eq; u_int available, dbdiff; /* # of hardware descriptors */ u_int n; struct wrqe *wr; struct fw_eth_tx_pkt_wr *dst; /* any fw WR struct will do */ EQ_LOCK_ASSERT_OWNED(eq); MPASS(TAILQ_EMPTY(&wrq->incomplete_wrs)); wr = STAILQ_FIRST(&wrq->wr_list); MPASS(wr != NULL); /* Must be called with something useful to do */ MPASS(eq->pidx == eq->dbidx); dbdiff = 0; do { eq->cidx = read_hw_cidx(eq); if (eq->pidx == eq->cidx) available = eq->sidx - 1; else available = IDXDIFF(eq->cidx, eq->pidx, eq->sidx) - 1; MPASS(wr->wrq == wrq); n = howmany(wr->wr_len, EQ_ESIZE); if (available < n) break; dst = (void *)&eq->desc[eq->pidx]; if (__predict_true(eq->sidx - eq->pidx > n)) { /* Won't wrap, won't end exactly at the status page. 
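* A single contiguous copy into the descriptor ring is sufficient.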
*/ bcopy(&wr->wr[0], dst, wr->wr_len); eq->pidx += n; } else { int first_portion = (eq->sidx - eq->pidx) * EQ_ESIZE; bcopy(&wr->wr[0], dst, first_portion); if (wr->wr_len > first_portion) { bcopy(&wr->wr[first_portion], &eq->desc[0], wr->wr_len - first_portion); } eq->pidx = n - (eq->sidx - eq->pidx); } wrq->tx_wrs_copied++; if (available < eq->sidx / 4 && atomic_cmpset_int(&eq->equiq, 0, 1)) { /* * XXX: This is not 100% reliable with some * types of WRs. But this is a very unusual * situation for an ofld/ctrl queue anyway. */ dst->equiq_to_len16 |= htobe32(F_FW_WR_EQUIQ | F_FW_WR_EQUEQ); } dbdiff += n; if (dbdiff >= 16) { ring_eq_db(sc, eq, dbdiff); dbdiff = 0; } STAILQ_REMOVE_HEAD(&wrq->wr_list, link); free_wrqe(wr); MPASS(wrq->nwr_pending > 0); wrq->nwr_pending--; MPASS(wrq->ndesc_needed >= n); wrq->ndesc_needed -= n; } while ((wr = STAILQ_FIRST(&wrq->wr_list)) != NULL); if (dbdiff) ring_eq_db(sc, eq, dbdiff); } /* * Doesn't fail. Holds on to work requests it can't send right away. */ void t4_wrq_tx_locked(struct adapter *sc, struct sge_wrq *wrq, struct wrqe *wr) { #ifdef INVARIANTS struct sge_eq *eq = &wrq->eq; #endif EQ_LOCK_ASSERT_OWNED(eq); MPASS(wr != NULL); MPASS(wr->wr_len > 0 && wr->wr_len <= SGE_MAX_WR_LEN); MPASS((wr->wr_len & 0x7) == 0); STAILQ_INSERT_TAIL(&wrq->wr_list, wr, link); wrq->nwr_pending++; wrq->ndesc_needed += howmany(wr->wr_len, EQ_ESIZE); if (!TAILQ_EMPTY(&wrq->incomplete_wrs)) return; /* commit_wrq_wr will drain wr_list as well. */ drain_wrq_wr_list(sc, wrq); /* Doorbell must have caught up to the pidx. */ MPASS(eq->pidx == eq->dbidx); } void t4_update_fl_bufsize(struct ifnet *ifp) { struct vi_info *vi = ifp->if_softc; struct adapter *sc = vi->pi->adapter; struct sge_rxq *rxq; #ifdef TCP_OFFLOAD struct sge_ofld_rxq *ofld_rxq; #endif struct sge_fl *fl; int i, maxp, mtu = ifp->if_mtu; maxp = mtu_to_max_payload(sc, mtu, 0); for_each_rxq(vi, i, rxq) { fl = &rxq->fl; FL_LOCK(fl); find_best_refill_source(sc, fl, maxp); FL_UNLOCK(fl); } #ifdef TCP_OFFLOAD maxp = mtu_to_max_payload(sc, mtu, 1); for_each_ofld_rxq(vi, i, ofld_rxq) { fl = &ofld_rxq->fl; FL_LOCK(fl); find_best_refill_source(sc, fl, maxp); FL_UNLOCK(fl); } #endif } static inline int mbuf_nsegs(struct mbuf *m) { M_ASSERTPKTHDR(m); KASSERT(m->m_pkthdr.l5hlen > 0, ("%s: mbuf %p missing information on # of segments.", __func__, m)); return (m->m_pkthdr.l5hlen); } static inline void set_mbuf_nsegs(struct mbuf *m, uint8_t nsegs) { M_ASSERTPKTHDR(m); m->m_pkthdr.l5hlen = nsegs; } static inline int mbuf_cflags(struct mbuf *m) { M_ASSERTPKTHDR(m); return (m->m_pkthdr.PH_loc.eight[4]); } static inline void set_mbuf_cflags(struct mbuf *m, uint8_t flags) { M_ASSERTPKTHDR(m); m->m_pkthdr.PH_loc.eight[4] = flags; } static inline int mbuf_len16(struct mbuf *m) { int n; M_ASSERTPKTHDR(m); n = m->m_pkthdr.PH_loc.eight[0]; MPASS(n > 0 && n <= SGE_MAX_WR_LEN / 16); return (n); } static inline void set_mbuf_len16(struct mbuf *m, uint8_t len16) { M_ASSERTPKTHDR(m); m->m_pkthdr.PH_loc.eight[0] = len16; } #ifdef RATELIMIT static inline int mbuf_eo_nsegs(struct mbuf *m) { M_ASSERTPKTHDR(m); return (m->m_pkthdr.PH_loc.eight[1]); } static inline void set_mbuf_eo_nsegs(struct mbuf *m, uint8_t nsegs) { M_ASSERTPKTHDR(m); m->m_pkthdr.PH_loc.eight[1] = nsegs; } static inline int mbuf_eo_len16(struct mbuf *m) { int n; M_ASSERTPKTHDR(m); n = m->m_pkthdr.PH_loc.eight[2]; MPASS(n > 0 && n <= SGE_MAX_WR_LEN / 16); return (n); } static inline void set_mbuf_eo_len16(struct mbuf *m, uint8_t len16) { M_ASSERTPKTHDR(m); 
m->m_pkthdr.PH_loc.eight[2] = len16; } static inline int mbuf_eo_tsclk_tsoff(struct mbuf *m) { M_ASSERTPKTHDR(m); return (m->m_pkthdr.PH_loc.eight[3]); } static inline void set_mbuf_eo_tsclk_tsoff(struct mbuf *m, uint8_t tsclk_tsoff) { M_ASSERTPKTHDR(m); m->m_pkthdr.PH_loc.eight[3] = tsclk_tsoff; } static inline int needs_eo(struct mbuf *m) { return (m->m_pkthdr.snd_tag != NULL); } #endif /* * Try to allocate an mbuf to contain a raw work request. To make it * easy to construct the work request, don't allocate a chain but a * single mbuf. */ struct mbuf * alloc_wr_mbuf(int len, int how) { struct mbuf *m; if (len <= MHLEN) m = m_gethdr(how, MT_DATA); else if (len <= MCLBYTES) m = m_getcl(how, MT_DATA, M_PKTHDR); else m = NULL; if (m == NULL) return (NULL); m->m_pkthdr.len = len; m->m_len = len; set_mbuf_cflags(m, MC_RAW_WR); set_mbuf_len16(m, howmany(len, 16)); return (m); } static inline int needs_tso(struct mbuf *m) { M_ASSERTPKTHDR(m); return (m->m_pkthdr.csum_flags & CSUM_TSO); } static inline int needs_l3_csum(struct mbuf *m) { M_ASSERTPKTHDR(m); return (m->m_pkthdr.csum_flags & (CSUM_IP | CSUM_TSO)); } static inline int needs_l4_csum(struct mbuf *m) { M_ASSERTPKTHDR(m); return (m->m_pkthdr.csum_flags & (CSUM_TCP | CSUM_UDP | CSUM_UDP_IPV6 | CSUM_TCP_IPV6 | CSUM_TSO)); } static inline int needs_tcp_csum(struct mbuf *m) { M_ASSERTPKTHDR(m); return (m->m_pkthdr.csum_flags & (CSUM_TCP | CSUM_TCP_IPV6 | CSUM_TSO)); } #ifdef RATELIMIT static inline int needs_udp_csum(struct mbuf *m) { M_ASSERTPKTHDR(m); return (m->m_pkthdr.csum_flags & (CSUM_UDP | CSUM_UDP_IPV6)); } #endif static inline int needs_vlan_insertion(struct mbuf *m) { M_ASSERTPKTHDR(m); return (m->m_flags & M_VLANTAG); } static void * m_advance(struct mbuf **pm, int *poffset, int len) { struct mbuf *m = *pm; int offset = *poffset; uintptr_t p = 0; MPASS(len > 0); for (;;) { if (offset + len < m->m_len) { offset += len; p = mtod(m, uintptr_t) + offset; break; } len -= m->m_len - offset; m = m->m_next; offset = 0; MPASS(m != NULL); } *poffset = offset; *pm = m; return ((void *)p); } /* * Can deal with empty mbufs in the chain that have m_len = 0, but the chain * must have at least one mbuf that's not empty. It is possible for this * routine to return 0 if skip accounts for all the contents of the mbuf chain. */ static inline int count_mbuf_nsegs(struct mbuf *m, int skip) { vm_paddr_t lastb, next; vm_offset_t va; int len, nsegs; M_ASSERTPKTHDR(m); MPASS(m->m_pkthdr.len > 0); MPASS(m->m_pkthdr.len >= skip); nsegs = 0; lastb = 0; for (; m; m = m->m_next) { len = m->m_len; if (__predict_false(len == 0)) continue; if (skip >= len) { skip -= len; continue; } va = mtod(m, vm_offset_t) + skip; len -= skip; skip = 0; next = pmap_kextract(va); nsegs += sglist_count((void *)(uintptr_t)va, len); if (lastb + 1 == next) nsegs--; lastb = pmap_kextract(va + len - 1); } return (nsegs); } /* * Analyze the mbuf to determine its tx needs. The mbuf passed in may change: * a) caller can assume it's been freed if this function returns with an error. * b) it may get defragged up if the gather list is too long for the hardware. 
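* c) its packet header is updated with the segment count and work request length for the tx path to use.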
*/ int parse_pkt(struct adapter *sc, struct mbuf **mp) { struct mbuf *m0 = *mp, *m; int rc, nsegs, defragged = 0, offset; struct ether_header *eh; void *l3hdr; #if defined(INET) || defined(INET6) struct tcphdr *tcp; #endif uint16_t eh_type; M_ASSERTPKTHDR(m0); if (__predict_false(m0->m_pkthdr.len < ETHER_HDR_LEN)) { rc = EINVAL; fail: m_freem(m0); *mp = NULL; return (rc); } restart: /* * First count the number of gather list segments in the payload. * Defrag the mbuf if nsegs exceeds the hardware limit. */ M_ASSERTPKTHDR(m0); MPASS(m0->m_pkthdr.len > 0); nsegs = count_mbuf_nsegs(m0, 0); if (nsegs > (needs_tso(m0) ? TX_SGL_SEGS_TSO : TX_SGL_SEGS)) { if (defragged++ > 0 || (m = m_defrag(m0, M_NOWAIT)) == NULL) { rc = EFBIG; goto fail; } *mp = m0 = m; /* update caller's copy after defrag */ goto restart; } if (__predict_false(nsegs > 2 && m0->m_pkthdr.len <= MHLEN)) { m0 = m_pullup(m0, m0->m_pkthdr.len); if (m0 == NULL) { /* Should have left well enough alone. */ rc = EFBIG; goto fail; } *mp = m0; /* update caller's copy after pullup */ goto restart; } set_mbuf_nsegs(m0, nsegs); set_mbuf_cflags(m0, 0); if (sc->flags & IS_VF) set_mbuf_len16(m0, txpkt_vm_len16(nsegs, needs_tso(m0))); else set_mbuf_len16(m0, txpkt_len16(nsegs, needs_tso(m0))); #ifdef RATELIMIT /* * Ethofld is limited to TCP and UDP for now, and only when L4 hw * checksumming is enabled. needs_l4_csum happens to check for all the * right things. */ if (__predict_false(needs_eo(m0) && !needs_l4_csum(m0))) m0->m_pkthdr.snd_tag = NULL; #endif if (!needs_tso(m0) && #ifdef RATELIMIT !needs_eo(m0) && #endif !(sc->flags & IS_VF && (needs_l3_csum(m0) || needs_l4_csum(m0)))) return (0); m = m0; eh = mtod(m, struct ether_header *); eh_type = ntohs(eh->ether_type); if (eh_type == ETHERTYPE_VLAN) { struct ether_vlan_header *evh = (void *)eh; eh_type = ntohs(evh->evl_proto); m0->m_pkthdr.l2hlen = sizeof(*evh); } else m0->m_pkthdr.l2hlen = sizeof(*eh); offset = 0; l3hdr = m_advance(&m, &offset, m0->m_pkthdr.l2hlen); switch (eh_type) { #ifdef INET6 case ETHERTYPE_IPV6: { struct ip6_hdr *ip6 = l3hdr; MPASS(!needs_tso(m0) || ip6->ip6_nxt == IPPROTO_TCP); m0->m_pkthdr.l3hlen = sizeof(*ip6); break; } #endif #ifdef INET case ETHERTYPE_IP: { struct ip *ip = l3hdr; m0->m_pkthdr.l3hlen = ip->ip_hl * 4; break; } #endif default: panic("%s: ethertype 0x%04x unknown. if_cxgbe must be compiled" " with the same INET/INET6 options as the kernel.", __func__, eh_type); } #if defined(INET) || defined(INET6) if (needs_tcp_csum(m0)) { tcp = m_advance(&m, &offset, m0->m_pkthdr.l3hlen); m0->m_pkthdr.l4hlen = tcp->th_off * 4; #ifdef RATELIMIT if (tsclk >= 0 && *(uint32_t *)(tcp + 1) == ntohl(0x0101080a)) { set_mbuf_eo_tsclk_tsoff(m0, V_FW_ETH_TX_EO_WR_TSCLK(tsclk) | V_FW_ETH_TX_EO_WR_TSOFF(sizeof(*tcp) / 2 + 1)); } else set_mbuf_eo_tsclk_tsoff(m0, 0); } else if (needs_udp_csum(m)) { m0->m_pkthdr.l4hlen = sizeof(struct udphdr); #endif } #ifdef RATELIMIT if (needs_eo(m0)) { u_int immhdrs; /* EO WRs have the headers in the WR and not the GL. 
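* Skip those header bytes when counting gather list segments.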
*/ immhdrs = m0->m_pkthdr.l2hlen + m0->m_pkthdr.l3hlen + m0->m_pkthdr.l4hlen; nsegs = count_mbuf_nsegs(m0, immhdrs); set_mbuf_eo_nsegs(m0, nsegs); set_mbuf_eo_len16(m0, txpkt_eo_len16(nsegs, immhdrs, needs_tso(m0))); } #endif #endif MPASS(m0 == *mp); return (0); } void * start_wrq_wr(struct sge_wrq *wrq, int len16, struct wrq_cookie *cookie) { struct sge_eq *eq = &wrq->eq; struct adapter *sc = wrq->adapter; int ndesc, available; struct wrqe *wr; void *w; MPASS(len16 > 0); ndesc = howmany(len16, EQ_ESIZE / 16); MPASS(ndesc > 0 && ndesc <= SGE_MAX_WR_NDESC); EQ_LOCK(eq); if (TAILQ_EMPTY(&wrq->incomplete_wrs) && !STAILQ_EMPTY(&wrq->wr_list)) drain_wrq_wr_list(sc, wrq); if (!STAILQ_EMPTY(&wrq->wr_list)) { slowpath: EQ_UNLOCK(eq); wr = alloc_wrqe(len16 * 16, wrq); if (__predict_false(wr == NULL)) return (NULL); cookie->pidx = -1; cookie->ndesc = ndesc; return (&wr->wr); } eq->cidx = read_hw_cidx(eq); if (eq->pidx == eq->cidx) available = eq->sidx - 1; else available = IDXDIFF(eq->cidx, eq->pidx, eq->sidx) - 1; if (available < ndesc) goto slowpath; cookie->pidx = eq->pidx; cookie->ndesc = ndesc; TAILQ_INSERT_TAIL(&wrq->incomplete_wrs, cookie, link); w = &eq->desc[eq->pidx]; IDXINCR(eq->pidx, ndesc, eq->sidx); if (__predict_false(cookie->pidx + ndesc > eq->sidx)) { w = &wrq->ss[0]; wrq->ss_pidx = cookie->pidx; wrq->ss_len = len16 * 16; } EQ_UNLOCK(eq); return (w); } void commit_wrq_wr(struct sge_wrq *wrq, void *w, struct wrq_cookie *cookie) { struct sge_eq *eq = &wrq->eq; struct adapter *sc = wrq->adapter; int ndesc, pidx; struct wrq_cookie *prev, *next; if (cookie->pidx == -1) { struct wrqe *wr = __containerof(w, struct wrqe, wr); t4_wrq_tx(sc, wr); return; } if (__predict_false(w == &wrq->ss[0])) { int n = (eq->sidx - wrq->ss_pidx) * EQ_ESIZE; MPASS(wrq->ss_len > n); /* WR had better wrap around. */ bcopy(&wrq->ss[0], &eq->desc[wrq->ss_pidx], n); bcopy(&wrq->ss[n], &eq->desc[0], wrq->ss_len - n); wrq->tx_wrs_ss++; } else wrq->tx_wrs_direct++; EQ_LOCK(eq); ndesc = cookie->ndesc; /* Can be more than SGE_MAX_WR_NDESC here. */ pidx = cookie->pidx; MPASS(pidx >= 0 && pidx < eq->sidx); prev = TAILQ_PREV(cookie, wrq_incomplete_wrs, link); next = TAILQ_NEXT(cookie, link); if (prev == NULL) { MPASS(pidx == eq->dbidx); if (next == NULL || ndesc >= 16) { int available; struct fw_eth_tx_pkt_wr *dst; /* any fw WR struct will do */ /* * Note that the WR via which we'll request tx updates * is at pidx and not eq->pidx, which has moved on * already. */ dst = (void *)&eq->desc[pidx]; available = IDXDIFF(eq->cidx, eq->pidx, eq->sidx) - 1; if (available < eq->sidx / 4 && atomic_cmpset_int(&eq->equiq, 0, 1)) { /* * XXX: This is not 100% reliable with some * types of WRs. But this is a very unusual * situation for an ofld/ctrl queue anyway. */ dst->equiq_to_len16 |= htobe32(F_FW_WR_EQUIQ | F_FW_WR_EQUEQ); } ring_eq_db(wrq->adapter, eq, ndesc); } else { MPASS(IDXDIFF(next->pidx, pidx, eq->sidx) == ndesc); next->pidx = pidx; next->ndesc += ndesc; } } else { MPASS(IDXDIFF(pidx, prev->pidx, eq->sidx) == prev->ndesc); prev->ndesc += ndesc; } TAILQ_REMOVE(&wrq->incomplete_wrs, cookie, link); if (TAILQ_EMPTY(&wrq->incomplete_wrs) && !STAILQ_EMPTY(&wrq->wr_list)) drain_wrq_wr_list(sc, wrq); #ifdef INVARIANTS if (TAILQ_EMPTY(&wrq->incomplete_wrs)) { /* Doorbell must have caught up to the pidx. 
*/ MPASS(wrq->eq.pidx == wrq->eq.dbidx); } #endif EQ_UNLOCK(eq); } static u_int can_resume_eth_tx(struct mp_ring *r) { struct sge_eq *eq = r->cookie; return (total_available_tx_desc(eq) > eq->sidx / 8); } static inline int cannot_use_txpkts(struct mbuf *m) { /* maybe put a GL limit too, to avoid silliness? */ return (needs_tso(m) || (mbuf_cflags(m) & MC_RAW_WR) != 0); } static inline int discard_tx(struct sge_eq *eq) { return ((eq->flags & (EQ_ENABLED | EQ_QFLUSH)) != EQ_ENABLED); } static inline int wr_can_update_eq(struct fw_eth_tx_pkts_wr *wr) { switch (G_FW_WR_OP(be32toh(wr->op_pkd))) { case FW_ULPTX_WR: case FW_ETH_TX_PKT_WR: case FW_ETH_TX_PKTS_WR: case FW_ETH_TX_PKT_VM_WR: return (1); default: return (0); } } /* * r->items[cidx] to r->items[pidx], with a wraparound at r->size, are ready to * be consumed. Return the actual number consumed. 0 indicates a stall. */ static u_int eth_tx(struct mp_ring *r, u_int cidx, u_int pidx) { struct sge_txq *txq = r->cookie; struct sge_eq *eq = &txq->eq; struct ifnet *ifp = txq->ifp; struct vi_info *vi = ifp->if_softc; struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; u_int total, remaining; /* # of packets */ u_int available, dbdiff; /* # of hardware descriptors */ u_int n, next_cidx; struct mbuf *m0, *tail; struct txpkts txp; struct fw_eth_tx_pkts_wr *wr; /* any fw WR struct will do */ remaining = IDXDIFF(pidx, cidx, r->size); MPASS(remaining > 0); /* Must not be called without work to do. */ total = 0; TXQ_LOCK(txq); if (__predict_false(discard_tx(eq))) { while (cidx != pidx) { m0 = r->items[cidx]; m_freem(m0); if (++cidx == r->size) cidx = 0; } reclaim_tx_descs(txq, 2048); total = remaining; goto done; } /* How many hardware descriptors do we have readily available. */ if (eq->pidx == eq->cidx) available = eq->sidx - 1; else available = IDXDIFF(eq->cidx, eq->pidx, eq->sidx) - 1; dbdiff = IDXDIFF(eq->pidx, eq->dbidx, eq->sidx); while (remaining > 0) { m0 = r->items[cidx]; M_ASSERTPKTHDR(m0); MPASS(m0->m_nextpkt == NULL); if (available < SGE_MAX_WR_NDESC) { available += reclaim_tx_descs(txq, 64); if (available < howmany(mbuf_len16(m0), EQ_ESIZE / 16)) break; /* out of descriptors */ } next_cidx = cidx + 1; if (__predict_false(next_cidx == r->size)) next_cidx = 0; wr = (void *)&eq->desc[eq->pidx]; if (sc->flags & IS_VF) { total++; remaining--; ETHER_BPF_MTAP(ifp, m0); n = write_txpkt_vm_wr(sc, txq, (void *)wr, m0, available); } else if (remaining > 1 && try_txpkts(m0, r->items[next_cidx], &txp, available) == 0) { /* pkts at cidx, next_cidx should both be in txp. 
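* (try_txpkts succeeded, so the two frames will be coalesced into a single work request.)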
*/ MPASS(txp.npkt == 2); tail = r->items[next_cidx]; MPASS(tail->m_nextpkt == NULL); ETHER_BPF_MTAP(ifp, m0); ETHER_BPF_MTAP(ifp, tail); m0->m_nextpkt = tail; if (__predict_false(++next_cidx == r->size)) next_cidx = 0; while (next_cidx != pidx) { if (add_to_txpkts(r->items[next_cidx], &txp, available) != 0) break; tail->m_nextpkt = r->items[next_cidx]; tail = tail->m_nextpkt; ETHER_BPF_MTAP(ifp, tail); if (__predict_false(++next_cidx == r->size)) next_cidx = 0; } n = write_txpkts_wr(txq, wr, m0, &txp, available); total += txp.npkt; remaining -= txp.npkt; } else if (mbuf_cflags(m0) & MC_RAW_WR) { total++; remaining--; n = write_raw_wr(txq, (void *)wr, m0, available); } else { total++; remaining--; ETHER_BPF_MTAP(ifp, m0); n = write_txpkt_wr(txq, (void *)wr, m0, available); } MPASS(n >= 1 && n <= available && n <= SGE_MAX_WR_NDESC); available -= n; dbdiff += n; IDXINCR(eq->pidx, n, eq->sidx); if (wr_can_update_eq(wr)) { if (total_available_tx_desc(eq) < eq->sidx / 4 && atomic_cmpset_int(&eq->equiq, 0, 1)) { wr->equiq_to_len16 |= htobe32(F_FW_WR_EQUIQ | F_FW_WR_EQUEQ); eq->equeqidx = eq->pidx; } else if (IDXDIFF(eq->pidx, eq->equeqidx, eq->sidx) >= 32) { wr->equiq_to_len16 |= htobe32(F_FW_WR_EQUEQ); eq->equeqidx = eq->pidx; } } if (dbdiff >= 16 && remaining >= 4) { ring_eq_db(sc, eq, dbdiff); available += reclaim_tx_descs(txq, 4 * dbdiff); dbdiff = 0; } cidx = next_cidx; } if (dbdiff != 0) { ring_eq_db(sc, eq, dbdiff); reclaim_tx_descs(txq, 32); } done: TXQ_UNLOCK(txq); return (total); } static inline void init_iq(struct sge_iq *iq, struct adapter *sc, int tmr_idx, int pktc_idx, int qsize) { KASSERT(tmr_idx >= 0 && tmr_idx < SGE_NTIMERS, ("%s: bad tmr_idx %d", __func__, tmr_idx)); KASSERT(pktc_idx < SGE_NCOUNTERS, /* -ve is ok, means don't use */ ("%s: bad pktc_idx %d", __func__, pktc_idx)); iq->flags = 0; iq->adapter = sc; iq->intr_params = V_QINTR_TIMER_IDX(tmr_idx); iq->intr_pktc_idx = SGE_NCOUNTERS - 1; if (pktc_idx >= 0) { iq->intr_params |= F_QINTR_CNT_EN; iq->intr_pktc_idx = pktc_idx; } iq->qsize = roundup2(qsize, 16); /* See FW_IQ_CMD/iqsize */ iq->sidx = iq->qsize - sc->params.sge.spg_len / IQ_ESIZE; } static inline void init_fl(struct adapter *sc, struct sge_fl *fl, int qsize, int maxp, char *name) { fl->qsize = qsize; fl->sidx = qsize - sc->params.sge.spg_len / EQ_ESIZE; strlcpy(fl->lockname, name, sizeof(fl->lockname)); if (sc->flags & BUF_PACKING_OK && ((!is_t4(sc) && buffer_packing) || /* T5+: enabled unless 0 */ (is_t4(sc) && buffer_packing == 1)))/* T4: disabled unless 1 */ fl->flags |= FL_BUF_PACKING; find_best_refill_source(sc, fl, maxp); find_safe_refill_source(sc, fl); } static inline void init_eq(struct adapter *sc, struct sge_eq *eq, int eqtype, int qsize, uint8_t tx_chan, uint16_t iqid, char *name) { KASSERT(eqtype <= EQ_TYPEMASK, ("%s: bad qtype %d", __func__, eqtype)); eq->flags = eqtype & EQ_TYPEMASK; eq->tx_chan = tx_chan; eq->iqid = iqid; eq->sidx = qsize - sc->params.sge.spg_len / EQ_ESIZE; strlcpy(eq->lockname, name, sizeof(eq->lockname)); } static int alloc_ring(struct adapter *sc, size_t len, bus_dma_tag_t *tag, bus_dmamap_t *map, bus_addr_t *pa, void **va) { int rc; rc = bus_dma_tag_create(sc->dmat, 512, 0, BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL, len, 1, len, 0, NULL, NULL, tag); if (rc != 0) { device_printf(sc->dev, "cannot allocate DMA tag: %d\n", rc); goto done; } rc = bus_dmamem_alloc(*tag, va, BUS_DMA_WAITOK | BUS_DMA_COHERENT | BUS_DMA_ZERO, map); if (rc != 0) { device_printf(sc->dev, "cannot allocate DMA memory: %d\n", rc); goto done; } rc = 
bus_dmamap_load(*tag, *map, *va, len, oneseg_dma_callback, pa, 0); if (rc != 0) { device_printf(sc->dev, "cannot load DMA map: %d\n", rc); goto done; } done: if (rc) free_ring(sc, *tag, *map, *pa, *va); return (rc); } static int free_ring(struct adapter *sc, bus_dma_tag_t tag, bus_dmamap_t map, bus_addr_t pa, void *va) { if (pa) bus_dmamap_unload(tag, map); if (va) bus_dmamem_free(tag, va, map); if (tag) bus_dma_tag_destroy(tag); return (0); } /* * Allocates the ring for an ingress queue and an optional freelist. If the * freelist is specified it will be allocated and then associated with the * ingress queue. * * Returns errno on failure. Resources allocated up to that point may still be * allocated. Caller is responsible for cleanup in case this function fails. * * If the ingress queue will take interrupts directly then the intr_idx * specifies the vector, starting from 0. -1 means the interrupts for this * queue should be forwarded to the fwq. */ static int alloc_iq_fl(struct vi_info *vi, struct sge_iq *iq, struct sge_fl *fl, int intr_idx, int cong) { int rc, i, cntxt_id; size_t len; struct fw_iq_cmd c; struct port_info *pi = vi->pi; struct adapter *sc = iq->adapter; struct sge_params *sp = &sc->params.sge; __be32 v = 0; len = iq->qsize * IQ_ESIZE; rc = alloc_ring(sc, len, &iq->desc_tag, &iq->desc_map, &iq->ba, (void **)&iq->desc); if (rc != 0) return (rc); bzero(&c, sizeof(c)); c.op_to_vfn = htobe32(V_FW_CMD_OP(FW_IQ_CMD) | F_FW_CMD_REQUEST | F_FW_CMD_WRITE | F_FW_CMD_EXEC | V_FW_IQ_CMD_PFN(sc->pf) | V_FW_IQ_CMD_VFN(0)); c.alloc_to_len16 = htobe32(F_FW_IQ_CMD_ALLOC | F_FW_IQ_CMD_IQSTART | FW_LEN16(c)); /* Special handling for firmware event queue */ if (iq == &sc->sge.fwq) v |= F_FW_IQ_CMD_IQASYNCH; if (intr_idx < 0) { /* Forwarded interrupts, all headed to fwq */ v |= F_FW_IQ_CMD_IQANDST; v |= V_FW_IQ_CMD_IQANDSTINDEX(sc->sge.fwq.cntxt_id); } else { KASSERT(intr_idx < sc->intr_count, ("%s: invalid direct intr_idx %d", __func__, intr_idx)); v |= V_FW_IQ_CMD_IQANDSTINDEX(intr_idx); } c.type_to_iqandstindex = htobe32(v | V_FW_IQ_CMD_TYPE(FW_IQ_TYPE_FL_INT_CAP) | V_FW_IQ_CMD_VIID(vi->viid) | V_FW_IQ_CMD_IQANUD(X_UPDATEDELIVERY_INTERRUPT)); c.iqdroprss_to_iqesize = htobe16(V_FW_IQ_CMD_IQPCIECH(pi->tx_chan) | F_FW_IQ_CMD_IQGTSMODE | V_FW_IQ_CMD_IQINTCNTTHRESH(iq->intr_pktc_idx) | V_FW_IQ_CMD_IQESIZE(ilog2(IQ_ESIZE) - 4)); c.iqsize = htobe16(iq->qsize); c.iqaddr = htobe64(iq->ba); if (cong >= 0) c.iqns_to_fl0congen = htobe32(F_FW_IQ_CMD_IQFLINTCONGEN); if (fl) { mtx_init(&fl->fl_lock, fl->lockname, NULL, MTX_DEF); len = fl->qsize * EQ_ESIZE; rc = alloc_ring(sc, len, &fl->desc_tag, &fl->desc_map, &fl->ba, (void **)&fl->desc); if (rc) return (rc); /* Allocate space for one software descriptor per buffer. */ rc = alloc_fl_sdesc(fl); if (rc != 0) { device_printf(sc->dev, "failed to setup fl software descriptors: %d\n", rc); return (rc); } if (fl->flags & FL_BUF_PACKING) { fl->lowat = roundup2(sp->fl_starve_threshold2, 8); fl->buf_boundary = sp->pack_boundary; } else { fl->lowat = roundup2(sp->fl_starve_threshold, 8); fl->buf_boundary = 16; } if (fl_pad && fl->buf_boundary < sp->pad_boundary) fl->buf_boundary = sp->pad_boundary; c.iqns_to_fl0congen |= htobe32(V_FW_IQ_CMD_FL0HOSTFCMODE(X_HOSTFCMODE_NONE) | F_FW_IQ_CMD_FL0FETCHRO | F_FW_IQ_CMD_FL0DATARO | (fl_pad ? F_FW_IQ_CMD_FL0PADEN : 0) | (fl->flags & FL_BUF_PACKING ? 
F_FW_IQ_CMD_FL0PACKEN : 0)); if (cong >= 0) { c.iqns_to_fl0congen |= htobe32(V_FW_IQ_CMD_FL0CNGCHMAP(cong) | F_FW_IQ_CMD_FL0CONGCIF | F_FW_IQ_CMD_FL0CONGEN); } c.fl0dcaen_to_fl0cidxfthresh = htobe16(V_FW_IQ_CMD_FL0FBMIN(chip_id(sc) <= CHELSIO_T5 ? X_FETCHBURSTMIN_128B : X_FETCHBURSTMIN_64B) | V_FW_IQ_CMD_FL0FBMAX(chip_id(sc) <= CHELSIO_T5 ? X_FETCHBURSTMAX_512B : X_FETCHBURSTMAX_256B)); c.fl0size = htobe16(fl->qsize); c.fl0addr = htobe64(fl->ba); } rc = -t4_wr_mbox(sc, sc->mbox, &c, sizeof(c), &c); if (rc != 0) { device_printf(sc->dev, "failed to create ingress queue: %d\n", rc); return (rc); } iq->cidx = 0; iq->gen = F_RSPD_GEN; iq->intr_next = iq->intr_params; iq->cntxt_id = be16toh(c.iqid); iq->abs_id = be16toh(c.physiqid); iq->flags |= IQ_ALLOCATED; cntxt_id = iq->cntxt_id - sc->sge.iq_start; if (cntxt_id >= sc->sge.niq) { panic ("%s: iq->cntxt_id (%d) more than the max (%d)", __func__, cntxt_id, sc->sge.niq - 1); } sc->sge.iqmap[cntxt_id] = iq; if (fl) { u_int qid; iq->flags |= IQ_HAS_FL; fl->cntxt_id = be16toh(c.fl0id); fl->pidx = fl->cidx = 0; cntxt_id = fl->cntxt_id - sc->sge.eq_start; if (cntxt_id >= sc->sge.neq) { panic("%s: fl->cntxt_id (%d) more than the max (%d)", __func__, cntxt_id, sc->sge.neq - 1); } sc->sge.eqmap[cntxt_id] = (void *)fl; qid = fl->cntxt_id; if (isset(&sc->doorbells, DOORBELL_UDB)) { uint32_t s_qpp = sc->params.sge.eq_s_qpp; uint32_t mask = (1 << s_qpp) - 1; volatile uint8_t *udb; udb = sc->udbs_base + UDBS_DB_OFFSET; udb += (qid >> s_qpp) << PAGE_SHIFT; qid &= mask; if (qid < PAGE_SIZE / UDBS_SEG_SIZE) { udb += qid << UDBS_SEG_SHIFT; qid = 0; } fl->udb = (volatile void *)udb; } fl->dbval = V_QID(qid) | sc->chip_params->sge_fl_db; FL_LOCK(fl); /* Enough to make sure the SGE doesn't think it's starved */ refill_fl(sc, fl, fl->lowat); FL_UNLOCK(fl); } if (chip_id(sc) >= CHELSIO_T5 && !(sc->flags & IS_VF) && cong >= 0) { uint32_t param, val; param = V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DMAQ) | V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DMAQ_CONM_CTXT) | V_FW_PARAMS_PARAM_YZ(iq->cntxt_id); if (cong == 0) val = 1 << 19; else { val = 2 << 19; for (i = 0; i < 4; i++) { if (cong & (1 << i)) val |= 1 << (i << 2); } } rc = -t4_set_params(sc, sc->mbox, sc->pf, 0, 1, ¶m, &val); if (rc != 0) { /* report error but carry on */ device_printf(sc->dev, "failed to set congestion manager context for " "ingress queue %d: %d\n", iq->cntxt_id, rc); } } /* Enable IQ interrupts */ atomic_store_rel_int(&iq->state, IQS_IDLE); t4_write_reg(sc, sc->sge_gts_reg, V_SEINTARM(iq->intr_params) | V_INGRESSQID(iq->cntxt_id)); return (0); } static int free_iq_fl(struct vi_info *vi, struct sge_iq *iq, struct sge_fl *fl) { int rc; struct adapter *sc = iq->adapter; device_t dev; if (sc == NULL) return (0); /* nothing to do */ dev = vi ? vi->dev : sc->dev; if (iq->flags & IQ_ALLOCATED) { rc = -t4_iq_free(sc, sc->mbox, sc->pf, 0, FW_IQ_TYPE_FL_INT_CAP, iq->cntxt_id, fl ? 
fl->cntxt_id : 0xffff, 0xffff); if (rc != 0) { device_printf(dev, "failed to free queue %p: %d\n", iq, rc); return (rc); } iq->flags &= ~IQ_ALLOCATED; } free_ring(sc, iq->desc_tag, iq->desc_map, iq->ba, iq->desc); bzero(iq, sizeof(*iq)); if (fl) { free_ring(sc, fl->desc_tag, fl->desc_map, fl->ba, fl->desc); if (fl->sdesc) free_fl_sdesc(sc, fl); if (mtx_initialized(&fl->fl_lock)) mtx_destroy(&fl->fl_lock); bzero(fl, sizeof(*fl)); } return (0); } static void add_iq_sysctls(struct sysctl_ctx_list *ctx, struct sysctl_oid *oid, struct sge_iq *iq) { struct sysctl_oid_list *children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_UAUTO(ctx, children, OID_AUTO, "ba", CTLFLAG_RD, &iq->ba, "bus address of descriptor ring"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "dmalen", CTLFLAG_RD, NULL, iq->qsize * IQ_ESIZE, "descriptor ring size in bytes"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "abs_id", CTLTYPE_INT | CTLFLAG_RD, &iq->abs_id, 0, sysctl_uint16, "I", "absolute id of the queue"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cntxt_id", CTLTYPE_INT | CTLFLAG_RD, &iq->cntxt_id, 0, sysctl_uint16, "I", "SGE context id of the queue"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cidx", CTLTYPE_INT | CTLFLAG_RD, &iq->cidx, 0, sysctl_uint16, "I", "consumer index"); } static void add_fl_sysctls(struct adapter *sc, struct sysctl_ctx_list *ctx, struct sysctl_oid *oid, struct sge_fl *fl) { struct sysctl_oid_list *children = SYSCTL_CHILDREN(oid); oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, "fl", CTLFLAG_RD, NULL, "freelist"); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_UAUTO(ctx, children, OID_AUTO, "ba", CTLFLAG_RD, &fl->ba, "bus address of descriptor ring"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "dmalen", CTLFLAG_RD, NULL, fl->sidx * EQ_ESIZE + sc->params.sge.spg_len, "desc ring size in bytes"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cntxt_id", CTLTYPE_INT | CTLFLAG_RD, &fl->cntxt_id, 0, sysctl_uint16, "I", "SGE context id of the freelist"); SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "padding", CTLFLAG_RD, NULL, fl_pad ? 1 : 0, "padding enabled"); SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "packing", CTLFLAG_RD, NULL, fl->flags & FL_BUF_PACKING ? 1 : 0, "packing enabled"); SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "cidx", CTLFLAG_RD, &fl->cidx, 0, "consumer index"); if (fl->flags & FL_BUF_PACKING) { SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rx_offset", CTLFLAG_RD, &fl->rx_offset, 0, "packing rx offset"); } SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "pidx", CTLFLAG_RD, &fl->pidx, 0, "producer index"); SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "mbuf_allocated", CTLFLAG_RD, &fl->mbuf_allocated, "# of mbuf allocated"); SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "mbuf_inlined", CTLFLAG_RD, &fl->mbuf_inlined, "# of mbuf inlined in clusters"); SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "cluster_allocated", CTLFLAG_RD, &fl->cl_allocated, "# of clusters allocated"); SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "cluster_recycled", CTLFLAG_RD, &fl->cl_recycled, "# of clusters recycled"); SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "cluster_fast_recycled", CTLFLAG_RD, &fl->cl_fast_recycled, "# of clusters recycled (fast)"); } static int alloc_fwq(struct adapter *sc) { int rc, intr_idx; struct sge_iq *fwq = &sc->sge.fwq; struct sysctl_oid *oid = device_get_sysctl_tree(sc->dev); struct sysctl_oid_list *children = SYSCTL_CHILDREN(oid); init_iq(fwq, sc, 0, 0, FW_IQ_QSIZE); if (sc->flags & IS_VF) intr_idx = 0; else intr_idx = sc->intr_count > 1 ? 
1 : 0; rc = alloc_iq_fl(&sc->port[0]->vi[0], fwq, NULL, intr_idx, -1); if (rc != 0) { device_printf(sc->dev, "failed to create firmware event queue: %d\n", rc); return (rc); } oid = SYSCTL_ADD_NODE(&sc->ctx, children, OID_AUTO, "fwq", CTLFLAG_RD, NULL, "firmware event queue"); add_iq_sysctls(&sc->ctx, oid, fwq); return (0); } static int free_fwq(struct adapter *sc) { return free_iq_fl(NULL, &sc->sge.fwq, NULL); } static int alloc_ctrlq(struct adapter *sc, struct sge_wrq *ctrlq, int idx, struct sysctl_oid *oid) { int rc; char name[16]; struct sysctl_oid_list *children; snprintf(name, sizeof(name), "%s ctrlq%d", device_get_nameunit(sc->dev), idx); init_eq(sc, &ctrlq->eq, EQ_CTRL, CTRL_EQ_QSIZE, sc->port[idx]->tx_chan, sc->sge.fwq.cntxt_id, name); children = SYSCTL_CHILDREN(oid); snprintf(name, sizeof(name), "%d", idx); oid = SYSCTL_ADD_NODE(&sc->ctx, children, OID_AUTO, name, CTLFLAG_RD, NULL, "ctrl queue"); rc = alloc_wrq(sc, NULL, ctrlq, oid); return (rc); } int tnl_cong(struct port_info *pi, int drop) { if (drop == -1) return (-1); else if (drop == 1) return (0); else return (pi->rx_e_chan_map); } static int alloc_rxq(struct vi_info *vi, struct sge_rxq *rxq, int intr_idx, int idx, struct sysctl_oid *oid) { int rc; struct adapter *sc = vi->pi->adapter; struct sysctl_oid_list *children; char name[16]; rc = alloc_iq_fl(vi, &rxq->iq, &rxq->fl, intr_idx, tnl_cong(vi->pi, cong_drop)); if (rc != 0) return (rc); if (idx == 0) sc->sge.iq_base = rxq->iq.abs_id - rxq->iq.cntxt_id; else KASSERT(rxq->iq.cntxt_id + sc->sge.iq_base == rxq->iq.abs_id, ("iq_base mismatch")); KASSERT(sc->sge.iq_base == 0 || sc->flags & IS_VF, ("PF with non-zero iq_base")); /* * The freelist is just barely above the starvation threshold right now, * fill it up a bit more. */ FL_LOCK(&rxq->fl); refill_fl(sc, &rxq->fl, 128); FL_UNLOCK(&rxq->fl); #if defined(INET) || defined(INET6) rc = tcp_lro_init_args(&rxq->lro, vi->ifp, lro_entries, lro_mbufs); if (rc != 0) return (rc); MPASS(rxq->lro.ifp == vi->ifp); /* also indicates LRO init'ed */ if (vi->ifp->if_capenable & IFCAP_LRO) rxq->iq.flags |= IQ_LRO_ENABLED; #endif if (vi->ifp->if_capenable & IFCAP_HWRXTSTMP) rxq->iq.flags |= IQ_RX_TIMESTAMP; rxq->ifp = vi->ifp; children = SYSCTL_CHILDREN(oid); snprintf(name, sizeof(name), "%d", idx); oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, name, CTLFLAG_RD, NULL, "rx queue"); children = SYSCTL_CHILDREN(oid); add_iq_sysctls(&vi->ctx, oid, &rxq->iq); #if defined(INET) || defined(INET6) SYSCTL_ADD_U64(&vi->ctx, children, OID_AUTO, "lro_queued", CTLFLAG_RD, &rxq->lro.lro_queued, 0, NULL); SYSCTL_ADD_U64(&vi->ctx, children, OID_AUTO, "lro_flushed", CTLFLAG_RD, &rxq->lro.lro_flushed, 0, NULL); #endif SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "rxcsum", CTLFLAG_RD, &rxq->rxcsum, "# of times hardware assisted with checksum"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "vlan_extraction", CTLFLAG_RD, &rxq->vlan_extraction, "# of times hardware extracted 802.1Q tag"); add_fl_sysctls(sc, &vi->ctx, oid, &rxq->fl); return (rc); } static int free_rxq(struct vi_info *vi, struct sge_rxq *rxq) { int rc; #if defined(INET) || defined(INET6) if (rxq->lro.ifp) { tcp_lro_free(&rxq->lro); rxq->lro.ifp = NULL; } #endif rc = free_iq_fl(vi, &rxq->iq, &rxq->fl); if (rc == 0) bzero(rxq, sizeof(*rxq)); return (rc); } #ifdef TCP_OFFLOAD static int alloc_ofld_rxq(struct vi_info *vi, struct sge_ofld_rxq *ofld_rxq, int intr_idx, int idx, struct sysctl_oid *oid) { struct port_info *pi = vi->pi; int rc; struct sysctl_oid_list *children; char name[16]; rc = 
alloc_iq_fl(vi, &ofld_rxq->iq, &ofld_rxq->fl, intr_idx, 0); if (rc != 0) return (rc); children = SYSCTL_CHILDREN(oid); snprintf(name, sizeof(name), "%d", idx); oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, name, CTLFLAG_RD, NULL, "rx queue"); add_iq_sysctls(&vi->ctx, oid, &ofld_rxq->iq); add_fl_sysctls(pi->adapter, &vi->ctx, oid, &ofld_rxq->fl); return (rc); } static int free_ofld_rxq(struct vi_info *vi, struct sge_ofld_rxq *ofld_rxq) { int rc; rc = free_iq_fl(vi, &ofld_rxq->iq, &ofld_rxq->fl); if (rc == 0) bzero(ofld_rxq, sizeof(*ofld_rxq)); return (rc); } #endif #ifdef DEV_NETMAP static int alloc_nm_rxq(struct vi_info *vi, struct sge_nm_rxq *nm_rxq, int intr_idx, int idx, struct sysctl_oid *oid) { int rc; struct sysctl_oid_list *children; struct sysctl_ctx_list *ctx; char name[16]; size_t len; struct adapter *sc = vi->pi->adapter; struct netmap_adapter *na = NA(vi->ifp); MPASS(na != NULL); len = vi->qsize_rxq * IQ_ESIZE; rc = alloc_ring(sc, len, &nm_rxq->iq_desc_tag, &nm_rxq->iq_desc_map, &nm_rxq->iq_ba, (void **)&nm_rxq->iq_desc); if (rc != 0) return (rc); len = na->num_rx_desc * EQ_ESIZE + sc->params.sge.spg_len; rc = alloc_ring(sc, len, &nm_rxq->fl_desc_tag, &nm_rxq->fl_desc_map, &nm_rxq->fl_ba, (void **)&nm_rxq->fl_desc); if (rc != 0) return (rc); nm_rxq->vi = vi; nm_rxq->nid = idx; nm_rxq->iq_cidx = 0; nm_rxq->iq_sidx = vi->qsize_rxq - sc->params.sge.spg_len / IQ_ESIZE; nm_rxq->iq_gen = F_RSPD_GEN; nm_rxq->fl_pidx = nm_rxq->fl_cidx = 0; nm_rxq->fl_sidx = na->num_rx_desc; nm_rxq->intr_idx = intr_idx; nm_rxq->iq_cntxt_id = INVALID_NM_RXQ_CNTXT_ID; ctx = &vi->ctx; children = SYSCTL_CHILDREN(oid); snprintf(name, sizeof(name), "%d", idx); oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, name, CTLFLAG_RD, NULL, "rx queue"); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "abs_id", CTLTYPE_INT | CTLFLAG_RD, &nm_rxq->iq_abs_id, 0, sysctl_uint16, "I", "absolute id of the queue"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cntxt_id", CTLTYPE_INT | CTLFLAG_RD, &nm_rxq->iq_cntxt_id, 0, sysctl_uint16, "I", "SGE context id of the queue"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cidx", CTLTYPE_INT | CTLFLAG_RD, &nm_rxq->iq_cidx, 0, sysctl_uint16, "I", "consumer index"); children = SYSCTL_CHILDREN(oid); oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, "fl", CTLFLAG_RD, NULL, "freelist"); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cntxt_id", CTLTYPE_INT | CTLFLAG_RD, &nm_rxq->fl_cntxt_id, 0, sysctl_uint16, "I", "SGE context id of the freelist"); SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "cidx", CTLFLAG_RD, &nm_rxq->fl_cidx, 0, "consumer index"); SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "pidx", CTLFLAG_RD, &nm_rxq->fl_pidx, 0, "producer index"); return (rc); } static int free_nm_rxq(struct vi_info *vi, struct sge_nm_rxq *nm_rxq) { struct adapter *sc = vi->pi->adapter; if (vi->flags & VI_INIT_DONE) MPASS(nm_rxq->iq_cntxt_id == INVALID_NM_RXQ_CNTXT_ID); else MPASS(nm_rxq->iq_cntxt_id == 0); free_ring(sc, nm_rxq->iq_desc_tag, nm_rxq->iq_desc_map, nm_rxq->iq_ba, nm_rxq->iq_desc); free_ring(sc, nm_rxq->fl_desc_tag, nm_rxq->fl_desc_map, nm_rxq->fl_ba, nm_rxq->fl_desc); return (0); } static int alloc_nm_txq(struct vi_info *vi, struct sge_nm_txq *nm_txq, int iqidx, int idx, struct sysctl_oid *oid) { int rc; size_t len; struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; struct netmap_adapter *na = NA(vi->ifp); char name[16]; struct sysctl_oid_list *children = SYSCTL_CHILDREN(oid); len = na->num_tx_desc * EQ_ESIZE + 
sc->params.sge.spg_len; rc = alloc_ring(sc, len, &nm_txq->desc_tag, &nm_txq->desc_map, &nm_txq->ba, (void **)&nm_txq->desc); if (rc) return (rc); nm_txq->pidx = nm_txq->cidx = 0; nm_txq->sidx = na->num_tx_desc; nm_txq->nid = idx; nm_txq->iqidx = iqidx; nm_txq->cpl_ctrl0 = htobe32(V_TXPKT_OPCODE(CPL_TX_PKT) | V_TXPKT_INTF(pi->tx_chan) | V_TXPKT_PF(sc->pf) | V_TXPKT_VF(vi->vin) | V_TXPKT_VF_VLD(vi->vfvld)); nm_txq->cntxt_id = INVALID_NM_TXQ_CNTXT_ID; snprintf(name, sizeof(name), "%d", idx); oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, name, CTLFLAG_RD, NULL, "netmap tx queue"); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_UINT(&vi->ctx, children, OID_AUTO, "cntxt_id", CTLFLAG_RD, &nm_txq->cntxt_id, 0, "SGE context id of the queue"); SYSCTL_ADD_PROC(&vi->ctx, children, OID_AUTO, "cidx", CTLTYPE_INT | CTLFLAG_RD, &nm_txq->cidx, 0, sysctl_uint16, "I", "consumer index"); SYSCTL_ADD_PROC(&vi->ctx, children, OID_AUTO, "pidx", CTLTYPE_INT | CTLFLAG_RD, &nm_txq->pidx, 0, sysctl_uint16, "I", "producer index"); return (rc); } static int free_nm_txq(struct vi_info *vi, struct sge_nm_txq *nm_txq) { struct adapter *sc = vi->pi->adapter; if (vi->flags & VI_INIT_DONE) MPASS(nm_txq->cntxt_id == INVALID_NM_TXQ_CNTXT_ID); else MPASS(nm_txq->cntxt_id == 0); free_ring(sc, nm_txq->desc_tag, nm_txq->desc_map, nm_txq->ba, nm_txq->desc); return (0); } #endif /* * Returns a reasonable automatic cidx flush threshold for a given queue size. */ static u_int qsize_to_fthresh(int qsize) { u_int fthresh; while (!powerof2(qsize)) qsize++; fthresh = ilog2(qsize); if (fthresh > X_CIDXFLUSHTHRESH_128) fthresh = X_CIDXFLUSHTHRESH_128; return (fthresh); } static int ctrl_eq_alloc(struct adapter *sc, struct sge_eq *eq) { int rc, cntxt_id; struct fw_eq_ctrl_cmd c; int qsize = eq->sidx + sc->params.sge.spg_len / EQ_ESIZE; bzero(&c, sizeof(c)); c.op_to_vfn = htobe32(V_FW_CMD_OP(FW_EQ_CTRL_CMD) | F_FW_CMD_REQUEST | F_FW_CMD_WRITE | F_FW_CMD_EXEC | V_FW_EQ_CTRL_CMD_PFN(sc->pf) | V_FW_EQ_CTRL_CMD_VFN(0)); c.alloc_to_len16 = htobe32(F_FW_EQ_CTRL_CMD_ALLOC | F_FW_EQ_CTRL_CMD_EQSTART | FW_LEN16(c)); c.cmpliqid_eqid = htonl(V_FW_EQ_CTRL_CMD_CMPLIQID(eq->iqid)); c.physeqid_pkd = htobe32(0); c.fetchszm_to_iqid = htobe32(V_FW_EQ_CTRL_CMD_HOSTFCMODE(X_HOSTFCMODE_STATUS_PAGE) | V_FW_EQ_CTRL_CMD_PCIECHN(eq->tx_chan) | F_FW_EQ_CTRL_CMD_FETCHRO | V_FW_EQ_CTRL_CMD_IQID(eq->iqid)); c.dcaen_to_eqsize = htobe32(V_FW_EQ_CTRL_CMD_FBMIN(X_FETCHBURSTMIN_64B) | V_FW_EQ_CTRL_CMD_FBMAX(X_FETCHBURSTMAX_512B) | V_FW_EQ_CTRL_CMD_CIDXFTHRESH(qsize_to_fthresh(qsize)) | V_FW_EQ_CTRL_CMD_EQSIZE(qsize)); c.eqaddr = htobe64(eq->ba); rc = -t4_wr_mbox(sc, sc->mbox, &c, sizeof(c), &c); if (rc != 0) { device_printf(sc->dev, "failed to create control queue %d: %d\n", eq->tx_chan, rc); return (rc); } eq->flags |= EQ_ALLOCATED; eq->cntxt_id = G_FW_EQ_CTRL_CMD_EQID(be32toh(c.cmpliqid_eqid)); cntxt_id = eq->cntxt_id - sc->sge.eq_start; if (cntxt_id >= sc->sge.neq) panic("%s: eq->cntxt_id (%d) more than the max (%d)", __func__, cntxt_id, sc->sge.neq - 1); sc->sge.eqmap[cntxt_id] = eq; return (rc); } static int eth_eq_alloc(struct adapter *sc, struct vi_info *vi, struct sge_eq *eq) { int rc, cntxt_id; struct fw_eq_eth_cmd c; int qsize = eq->sidx + sc->params.sge.spg_len / EQ_ESIZE; bzero(&c, sizeof(c)); c.op_to_vfn = htobe32(V_FW_CMD_OP(FW_EQ_ETH_CMD) | F_FW_CMD_REQUEST | F_FW_CMD_WRITE | F_FW_CMD_EXEC | V_FW_EQ_ETH_CMD_PFN(sc->pf) | V_FW_EQ_ETH_CMD_VFN(0)); c.alloc_to_len16 = htobe32(F_FW_EQ_ETH_CMD_ALLOC | F_FW_EQ_ETH_CMD_EQSTART | FW_LEN16(c)); c.autoequiqe_to_viid = 
htobe32(F_FW_EQ_ETH_CMD_AUTOEQUIQE | F_FW_EQ_ETH_CMD_AUTOEQUEQE | V_FW_EQ_ETH_CMD_VIID(vi->viid)); c.fetchszm_to_iqid = htobe32(V_FW_EQ_ETH_CMD_HOSTFCMODE(X_HOSTFCMODE_NONE) | V_FW_EQ_ETH_CMD_PCIECHN(eq->tx_chan) | F_FW_EQ_ETH_CMD_FETCHRO | V_FW_EQ_ETH_CMD_IQID(eq->iqid)); c.dcaen_to_eqsize = htobe32(V_FW_EQ_ETH_CMD_FBMIN(X_FETCHBURSTMIN_64B) | V_FW_EQ_ETH_CMD_FBMAX(X_FETCHBURSTMAX_512B) | V_FW_EQ_ETH_CMD_EQSIZE(qsize)); c.eqaddr = htobe64(eq->ba); rc = -t4_wr_mbox(sc, sc->mbox, &c, sizeof(c), &c); if (rc != 0) { device_printf(vi->dev, "failed to create Ethernet egress queue: %d\n", rc); return (rc); } eq->flags |= EQ_ALLOCATED; eq->cntxt_id = G_FW_EQ_ETH_CMD_EQID(be32toh(c.eqid_pkd)); eq->abs_id = G_FW_EQ_ETH_CMD_PHYSEQID(be32toh(c.physeqid_pkd)); cntxt_id = eq->cntxt_id - sc->sge.eq_start; if (cntxt_id >= sc->sge.neq) panic("%s: eq->cntxt_id (%d) more than the max (%d)", __func__, cntxt_id, sc->sge.neq - 1); sc->sge.eqmap[cntxt_id] = eq; return (rc); } #if defined(TCP_OFFLOAD) || defined(RATELIMIT) static int ofld_eq_alloc(struct adapter *sc, struct vi_info *vi, struct sge_eq *eq) { int rc, cntxt_id; struct fw_eq_ofld_cmd c; int qsize = eq->sidx + sc->params.sge.spg_len / EQ_ESIZE; bzero(&c, sizeof(c)); c.op_to_vfn = htonl(V_FW_CMD_OP(FW_EQ_OFLD_CMD) | F_FW_CMD_REQUEST | F_FW_CMD_WRITE | F_FW_CMD_EXEC | V_FW_EQ_OFLD_CMD_PFN(sc->pf) | V_FW_EQ_OFLD_CMD_VFN(0)); c.alloc_to_len16 = htonl(F_FW_EQ_OFLD_CMD_ALLOC | F_FW_EQ_OFLD_CMD_EQSTART | FW_LEN16(c)); c.fetchszm_to_iqid = htonl(V_FW_EQ_OFLD_CMD_HOSTFCMODE(X_HOSTFCMODE_STATUS_PAGE) | V_FW_EQ_OFLD_CMD_PCIECHN(eq->tx_chan) | F_FW_EQ_OFLD_CMD_FETCHRO | V_FW_EQ_OFLD_CMD_IQID(eq->iqid)); c.dcaen_to_eqsize = htobe32(V_FW_EQ_OFLD_CMD_FBMIN(X_FETCHBURSTMIN_64B) | V_FW_EQ_OFLD_CMD_FBMAX(X_FETCHBURSTMAX_512B) | V_FW_EQ_OFLD_CMD_CIDXFTHRESH(qsize_to_fthresh(qsize)) | V_FW_EQ_OFLD_CMD_EQSIZE(qsize)); c.eqaddr = htobe64(eq->ba); rc = -t4_wr_mbox(sc, sc->mbox, &c, sizeof(c), &c); if (rc != 0) { device_printf(vi->dev, "failed to create egress queue for TCP offload: %d\n", rc); return (rc); } eq->flags |= EQ_ALLOCATED; eq->cntxt_id = G_FW_EQ_OFLD_CMD_EQID(be32toh(c.eqid_pkd)); cntxt_id = eq->cntxt_id - sc->sge.eq_start; if (cntxt_id >= sc->sge.neq) panic("%s: eq->cntxt_id (%d) more than the max (%d)", __func__, cntxt_id, sc->sge.neq - 1); sc->sge.eqmap[cntxt_id] = eq; return (rc); } #endif static int alloc_eq(struct adapter *sc, struct vi_info *vi, struct sge_eq *eq) { int rc, qsize; size_t len; mtx_init(&eq->eq_lock, eq->lockname, NULL, MTX_DEF); qsize = eq->sidx + sc->params.sge.spg_len / EQ_ESIZE; len = qsize * EQ_ESIZE; rc = alloc_ring(sc, len, &eq->desc_tag, &eq->desc_map, &eq->ba, (void **)&eq->desc); if (rc) return (rc); eq->pidx = eq->cidx = eq->dbidx = 0; /* Note that equeqidx is not used with sge_wrq (OFLD/CTRL) queues. 
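It records the pidx at which the last egress update request (EQUEQ) was posted, which the Ethernet tx path uses to decide when to ask the hardware for another update.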
*/ eq->equeqidx = 0; eq->doorbells = sc->doorbells; switch (eq->flags & EQ_TYPEMASK) { case EQ_CTRL: rc = ctrl_eq_alloc(sc, eq); break; case EQ_ETH: rc = eth_eq_alloc(sc, vi, eq); break; #if defined(TCP_OFFLOAD) || defined(RATELIMIT) case EQ_OFLD: rc = ofld_eq_alloc(sc, vi, eq); break; #endif default: panic("%s: invalid eq type %d.", __func__, eq->flags & EQ_TYPEMASK); } if (rc != 0) { device_printf(sc->dev, "failed to allocate egress queue(%d): %d\n", eq->flags & EQ_TYPEMASK, rc); } if (isset(&eq->doorbells, DOORBELL_UDB) || isset(&eq->doorbells, DOORBELL_UDBWC) || isset(&eq->doorbells, DOORBELL_WCWR)) { uint32_t s_qpp = sc->params.sge.eq_s_qpp; uint32_t mask = (1 << s_qpp) - 1; volatile uint8_t *udb; udb = sc->udbs_base + UDBS_DB_OFFSET; udb += (eq->cntxt_id >> s_qpp) << PAGE_SHIFT; /* pg offset */ eq->udb_qid = eq->cntxt_id & mask; /* id in page */ if (eq->udb_qid >= PAGE_SIZE / UDBS_SEG_SIZE) clrbit(&eq->doorbells, DOORBELL_WCWR); else { udb += eq->udb_qid << UDBS_SEG_SHIFT; /* seg offset */ eq->udb_qid = 0; } eq->udb = (volatile void *)udb; } return (rc); } static int free_eq(struct adapter *sc, struct sge_eq *eq) { int rc; if (eq->flags & EQ_ALLOCATED) { switch (eq->flags & EQ_TYPEMASK) { case EQ_CTRL: rc = -t4_ctrl_eq_free(sc, sc->mbox, sc->pf, 0, eq->cntxt_id); break; case EQ_ETH: rc = -t4_eth_eq_free(sc, sc->mbox, sc->pf, 0, eq->cntxt_id); break; #if defined(TCP_OFFLOAD) || defined(RATELIMIT) case EQ_OFLD: rc = -t4_ofld_eq_free(sc, sc->mbox, sc->pf, 0, eq->cntxt_id); break; #endif default: panic("%s: invalid eq type %d.", __func__, eq->flags & EQ_TYPEMASK); } if (rc != 0) { device_printf(sc->dev, "failed to free egress queue (%d): %d\n", eq->flags & EQ_TYPEMASK, rc); return (rc); } eq->flags &= ~EQ_ALLOCATED; } free_ring(sc, eq->desc_tag, eq->desc_map, eq->ba, eq->desc); if (mtx_initialized(&eq->eq_lock)) mtx_destroy(&eq->eq_lock); bzero(eq, sizeof(*eq)); return (0); } static int alloc_wrq(struct adapter *sc, struct vi_info *vi, struct sge_wrq *wrq, struct sysctl_oid *oid) { int rc; struct sysctl_ctx_list *ctx = vi ? 
&vi->ctx : &sc->ctx; struct sysctl_oid_list *children = SYSCTL_CHILDREN(oid); rc = alloc_eq(sc, vi, &wrq->eq); if (rc) return (rc); wrq->adapter = sc; TASK_INIT(&wrq->wrq_tx_task, 0, wrq_tx_drain, wrq); TAILQ_INIT(&wrq->incomplete_wrs); STAILQ_INIT(&wrq->wr_list); wrq->nwr_pending = 0; wrq->ndesc_needed = 0; SYSCTL_ADD_UAUTO(ctx, children, OID_AUTO, "ba", CTLFLAG_RD, &wrq->eq.ba, "bus address of descriptor ring"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "dmalen", CTLFLAG_RD, NULL, wrq->eq.sidx * EQ_ESIZE + sc->params.sge.spg_len, "desc ring size in bytes"); SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "cntxt_id", CTLFLAG_RD, &wrq->eq.cntxt_id, 0, "SGE context id of the queue"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cidx", CTLTYPE_INT | CTLFLAG_RD, &wrq->eq.cidx, 0, sysctl_uint16, "I", "consumer index"); SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "pidx", CTLTYPE_INT | CTLFLAG_RD, &wrq->eq.pidx, 0, sysctl_uint16, "I", "producer index"); SYSCTL_ADD_INT(ctx, children, OID_AUTO, "sidx", CTLFLAG_RD, NULL, wrq->eq.sidx, "status page index"); SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "tx_wrs_direct", CTLFLAG_RD, &wrq->tx_wrs_direct, "# of work requests (direct)"); SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "tx_wrs_copied", CTLFLAG_RD, &wrq->tx_wrs_copied, "# of work requests (copied)"); SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "tx_wrs_sspace", CTLFLAG_RD, &wrq->tx_wrs_ss, "# of work requests (copied from scratch space)"); return (rc); } static int free_wrq(struct adapter *sc, struct sge_wrq *wrq) { int rc; rc = free_eq(sc, &wrq->eq); if (rc) return (rc); bzero(wrq, sizeof(*wrq)); return (0); } static int alloc_txq(struct vi_info *vi, struct sge_txq *txq, int idx, struct sysctl_oid *oid) { int rc; struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; struct sge_eq *eq = &txq->eq; char name[16]; struct sysctl_oid_list *children = SYSCTL_CHILDREN(oid); rc = mp_ring_alloc(&txq->r, eq->sidx, txq, eth_tx, can_resume_eth_tx, M_CXGBE, M_WAITOK); if (rc != 0) { device_printf(sc->dev, "failed to allocate mp_ring: %d\n", rc); return (rc); } rc = alloc_eq(sc, vi, eq); if (rc != 0) { mp_ring_free(txq->r); txq->r = NULL; return (rc); } /* Can't fail after this point. 
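The remaining allocations use M_WAITOK and the sysctl additions are not error-checked, so the function returns 0 from here on.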
*/ if (idx == 0) sc->sge.eq_base = eq->abs_id - eq->cntxt_id; else KASSERT(eq->cntxt_id + sc->sge.eq_base == eq->abs_id, ("eq_base mismatch")); KASSERT(sc->sge.eq_base == 0 || sc->flags & IS_VF, ("PF with non-zero eq_base")); TASK_INIT(&txq->tx_reclaim_task, 0, tx_reclaim, eq); txq->ifp = vi->ifp; txq->gl = sglist_alloc(TX_SGL_SEGS, M_WAITOK); if (sc->flags & IS_VF) txq->cpl_ctrl0 = htobe32(V_TXPKT_OPCODE(CPL_TX_PKT_XT) | V_TXPKT_INTF(pi->tx_chan)); else txq->cpl_ctrl0 = htobe32(V_TXPKT_OPCODE(CPL_TX_PKT) | V_TXPKT_INTF(pi->tx_chan) | V_TXPKT_PF(sc->pf) | V_TXPKT_VF(vi->vin) | V_TXPKT_VF_VLD(vi->vfvld)); txq->tc_idx = -1; txq->sdesc = malloc(eq->sidx * sizeof(struct tx_sdesc), M_CXGBE, M_ZERO | M_WAITOK); snprintf(name, sizeof(name), "%d", idx); oid = SYSCTL_ADD_NODE(&vi->ctx, children, OID_AUTO, name, CTLFLAG_RD, NULL, "tx queue"); children = SYSCTL_CHILDREN(oid); SYSCTL_ADD_UAUTO(&vi->ctx, children, OID_AUTO, "ba", CTLFLAG_RD, &eq->ba, "bus address of descriptor ring"); SYSCTL_ADD_INT(&vi->ctx, children, OID_AUTO, "dmalen", CTLFLAG_RD, NULL, eq->sidx * EQ_ESIZE + sc->params.sge.spg_len, "desc ring size in bytes"); SYSCTL_ADD_UINT(&vi->ctx, children, OID_AUTO, "abs_id", CTLFLAG_RD, &eq->abs_id, 0, "absolute id of the queue"); SYSCTL_ADD_UINT(&vi->ctx, children, OID_AUTO, "cntxt_id", CTLFLAG_RD, &eq->cntxt_id, 0, "SGE context id of the queue"); SYSCTL_ADD_PROC(&vi->ctx, children, OID_AUTO, "cidx", CTLTYPE_INT | CTLFLAG_RD, &eq->cidx, 0, sysctl_uint16, "I", "consumer index"); SYSCTL_ADD_PROC(&vi->ctx, children, OID_AUTO, "pidx", CTLTYPE_INT | CTLFLAG_RD, &eq->pidx, 0, sysctl_uint16, "I", "producer index"); SYSCTL_ADD_INT(&vi->ctx, children, OID_AUTO, "sidx", CTLFLAG_RD, NULL, eq->sidx, "status page index"); SYSCTL_ADD_PROC(&vi->ctx, children, OID_AUTO, "tc", CTLTYPE_INT | CTLFLAG_RW, vi, idx, sysctl_tc, "I", "traffic class (-1 means none)"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "txcsum", CTLFLAG_RD, &txq->txcsum, "# of times hardware assisted with checksum"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "vlan_insertion", CTLFLAG_RD, &txq->vlan_insertion, "# of times hardware inserted 802.1Q tag"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "tso_wrs", CTLFLAG_RD, &txq->tso_wrs, "# of TSO work requests"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "imm_wrs", CTLFLAG_RD, &txq->imm_wrs, "# of work requests with immediate data"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "sgl_wrs", CTLFLAG_RD, &txq->sgl_wrs, "# of work requests with direct SGL"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "txpkt_wrs", CTLFLAG_RD, &txq->txpkt_wrs, "# of txpkt work requests (one pkt/WR)"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "txpkts0_wrs", CTLFLAG_RD, &txq->txpkts0_wrs, "# of txpkts (type 0) work requests"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "txpkts1_wrs", CTLFLAG_RD, &txq->txpkts1_wrs, "# of txpkts (type 1) work requests"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "txpkts0_pkts", CTLFLAG_RD, &txq->txpkts0_pkts, "# of frames tx'd using type0 txpkts work requests"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "txpkts1_pkts", CTLFLAG_RD, &txq->txpkts1_pkts, "# of frames tx'd using type1 txpkts work requests"); SYSCTL_ADD_UQUAD(&vi->ctx, children, OID_AUTO, "raw_wrs", CTLFLAG_RD, &txq->raw_wrs, "# of raw work requests (non-packets)"); SYSCTL_ADD_COUNTER_U64(&vi->ctx, children, OID_AUTO, "r_enqueues", CTLFLAG_RD, &txq->r->enqueues, "# of enqueues to the mp_ring for this queue"); SYSCTL_ADD_COUNTER_U64(&vi->ctx, children, OID_AUTO, "r_drops", CTLFLAG_RD, 
&txq->r->drops, "# of drops in the mp_ring for this queue"); SYSCTL_ADD_COUNTER_U64(&vi->ctx, children, OID_AUTO, "r_starts", CTLFLAG_RD, &txq->r->starts, "# of normal consumer starts in the mp_ring for this queue"); SYSCTL_ADD_COUNTER_U64(&vi->ctx, children, OID_AUTO, "r_stalls", CTLFLAG_RD, &txq->r->stalls, "# of consumer stalls in the mp_ring for this queue"); SYSCTL_ADD_COUNTER_U64(&vi->ctx, children, OID_AUTO, "r_restarts", CTLFLAG_RD, &txq->r->restarts, "# of consumer restarts in the mp_ring for this queue"); SYSCTL_ADD_COUNTER_U64(&vi->ctx, children, OID_AUTO, "r_abdications", CTLFLAG_RD, &txq->r->abdications, "# of consumer abdications in the mp_ring for this queue"); return (0); } static int free_txq(struct vi_info *vi, struct sge_txq *txq) { int rc; struct adapter *sc = vi->pi->adapter; struct sge_eq *eq = &txq->eq; rc = free_eq(sc, eq); if (rc) return (rc); sglist_free(txq->gl); free(txq->sdesc, M_CXGBE); mp_ring_free(txq->r); bzero(txq, sizeof(*txq)); return (0); } static void oneseg_dma_callback(void *arg, bus_dma_segment_t *segs, int nseg, int error) { bus_addr_t *ba = arg; KASSERT(nseg == 1, ("%s meant for single segment mappings only.", __func__)); *ba = error ? 0 : segs->ds_addr; } static inline void ring_fl_db(struct adapter *sc, struct sge_fl *fl) { uint32_t n, v; n = IDXDIFF(fl->pidx / 8, fl->dbidx, fl->sidx); MPASS(n > 0); wmb(); v = fl->dbval | V_PIDX(n); if (fl->udb) *fl->udb = htole32(v); else t4_write_reg(sc, sc->sge_kdoorbell_reg, v); IDXINCR(fl->dbidx, n, fl->sidx); } /* * Fills up the freelist by allocating up to 'n' buffers. Buffers that are * recycled do not count towards this allocation budget. * * Returns non-zero to indicate that this freelist should be added to the list * of starving freelists. */ static int refill_fl(struct adapter *sc, struct sge_fl *fl, int n) { __be64 *d; struct fl_sdesc *sd; uintptr_t pa; caddr_t cl; struct cluster_layout *cll; struct sw_zone_info *swz; struct cluster_metadata *clm; uint16_t max_pidx; uint16_t hw_cidx = fl->hw_cidx; /* stable snapshot */ FL_LOCK_ASSERT_OWNED(fl); /* * We always stop at the beginning of the hardware descriptor that's just * before the one with the hw cidx. This is to avoid hw pidx = hw cidx, * which would mean an empty freelist to the chip. */ max_pidx = __predict_false(hw_cidx == 0) ? fl->sidx - 1 : hw_cidx - 1; if (fl->pidx == max_pidx * 8) return (0); d = &fl->desc[fl->pidx]; sd = &fl->sdesc[fl->pidx]; cll = &fl->cll_def; /* default layout */ swz = &sc->sge.sw_zone_info[cll->zidx]; while (n > 0) { if (sd->cl != NULL) { if (sd->nmbuf == 0) { /* * Fast recycle without involving any atomics on * the cluster's metadata (if the cluster has * metadata). This happens when all frames * received in the cluster were small enough to * fit within a single mbuf each. */ fl->cl_fast_recycled++; #ifdef INVARIANTS clm = cl_metadata(sc, fl, &sd->cll, sd->cl); if (clm != NULL) MPASS(clm->refcount == 1); #endif goto recycled_fast; } /* * Cluster is guaranteed to have metadata. Clusters * without metadata always take the fast recycle path * when they're recycled. 
*/ clm = cl_metadata(sc, fl, &sd->cll, sd->cl); MPASS(clm != NULL); if (atomic_fetchadd_int(&clm->refcount, -1) == 1) { fl->cl_recycled++; counter_u64_add(extfree_rels, 1); goto recycled; } sd->cl = NULL; /* gave up my reference */ } MPASS(sd->cl == NULL); alloc: cl = uma_zalloc(swz->zone, M_NOWAIT); if (__predict_false(cl == NULL)) { if (cll == &fl->cll_alt || fl->cll_alt.zidx == -1 || fl->cll_def.zidx == fl->cll_alt.zidx) break; /* fall back to the safe zone */ cll = &fl->cll_alt; swz = &sc->sge.sw_zone_info[cll->zidx]; goto alloc; } fl->cl_allocated++; n--; pa = pmap_kextract((vm_offset_t)cl); pa += cll->region1; sd->cl = cl; sd->cll = *cll; *d = htobe64(pa | cll->hwidx); clm = cl_metadata(sc, fl, cll, cl); if (clm != NULL) { recycled: #ifdef INVARIANTS clm->sd = sd; #endif clm->refcount = 1; } sd->nmbuf = 0; recycled_fast: d++; sd++; if (__predict_false(++fl->pidx % 8 == 0)) { uint16_t pidx = fl->pidx / 8; if (__predict_false(pidx == fl->sidx)) { fl->pidx = 0; pidx = 0; sd = fl->sdesc; d = fl->desc; } if (pidx == max_pidx) break; if (IDXDIFF(pidx, fl->dbidx, fl->sidx) >= 4) ring_fl_db(sc, fl); } } if (fl->pidx / 8 != fl->dbidx) ring_fl_db(sc, fl); return (FL_RUNNING_LOW(fl) && !(fl->flags & FL_STARVING)); } /* * Attempt to refill all starving freelists. */ static void refill_sfl(void *arg) { struct adapter *sc = arg; struct sge_fl *fl, *fl_temp; mtx_assert(&sc->sfl_lock, MA_OWNED); TAILQ_FOREACH_SAFE(fl, &sc->sfl, link, fl_temp) { FL_LOCK(fl); refill_fl(sc, fl, 64); if (FL_NOT_RUNNING_LOW(fl) || fl->flags & FL_DOOMED) { TAILQ_REMOVE(&sc->sfl, fl, link); fl->flags &= ~FL_STARVING; } FL_UNLOCK(fl); } if (!TAILQ_EMPTY(&sc->sfl)) callout_schedule(&sc->sfl_callout, hz / 5); } static int alloc_fl_sdesc(struct sge_fl *fl) { fl->sdesc = malloc(fl->sidx * 8 * sizeof(struct fl_sdesc), M_CXGBE, M_ZERO | M_WAITOK); return (0); } static void free_fl_sdesc(struct adapter *sc, struct sge_fl *fl) { struct fl_sdesc *sd; struct cluster_metadata *clm; struct cluster_layout *cll; int i; sd = fl->sdesc; for (i = 0; i < fl->sidx * 8; i++, sd++) { if (sd->cl == NULL) continue; cll = &sd->cll; clm = cl_metadata(sc, fl, cll, sd->cl); if (sd->nmbuf == 0) uma_zfree(sc->sge.sw_zone_info[cll->zidx].zone, sd->cl); else if (clm && atomic_fetchadd_int(&clm->refcount, -1) == 1) { uma_zfree(sc->sge.sw_zone_info[cll->zidx].zone, sd->cl); counter_u64_add(extfree_rels, 1); } sd->cl = NULL; } free(fl->sdesc, M_CXGBE); fl->sdesc = NULL; } static inline void get_pkt_gl(struct mbuf *m, struct sglist *gl) { int rc; M_ASSERTPKTHDR(m); sglist_reset(gl); rc = sglist_append_mbuf(gl, m); if (__predict_false(rc != 0)) { panic("%s: mbuf %p (%d segs) was vetted earlier but now fails " "with %d.", __func__, m, mbuf_nsegs(m), rc); } KASSERT(gl->sg_nseg == mbuf_nsegs(m), ("%s: nsegs changed for mbuf %p from %d to %d", __func__, m, mbuf_nsegs(m), gl->sg_nseg)); KASSERT(gl->sg_nseg > 0 && gl->sg_nseg <= (needs_tso(m) ? TX_SGL_SEGS_TSO : TX_SGL_SEGS), ("%s: %d segments, should have been 1 <= nsegs <= %d", __func__, gl->sg_nseg, needs_tso(m) ? TX_SGL_SEGS_TSO : TX_SGL_SEGS)); } /* * len16 for a txpkt WR with a GL. Includes the firmware work request header. */ static inline u_int txpkt_len16(u_int nsegs, u_int tso) { u_int n; MPASS(nsegs > 0); nsegs--; /* first segment is part of ulptx_sgl */ n = sizeof(struct fw_eth_tx_pkt_wr) + sizeof(struct cpl_tx_pkt_core) + sizeof(struct ulptx_sgl) + 8 * ((3 * nsegs) / 2 + (nsegs & 1)); if (tso) n += sizeof(struct cpl_tx_pkt_lso_core); return (howmany(n, 16)); } /* * len16 for a txpkt_vm WR with a GL. 
Includes the firmware work * request header. */ static inline u_int txpkt_vm_len16(u_int nsegs, u_int tso) { u_int n; MPASS(nsegs > 0); nsegs--; /* first segment is part of ulptx_sgl */ n = sizeof(struct fw_eth_tx_pkt_vm_wr) + sizeof(struct cpl_tx_pkt_core) + sizeof(struct ulptx_sgl) + 8 * ((3 * nsegs) / 2 + (nsegs & 1)); if (tso) n += sizeof(struct cpl_tx_pkt_lso_core); return (howmany(n, 16)); } /* * len16 for a txpkts type 0 WR with a GL. Does not include the firmware work * request header. */ static inline u_int txpkts0_len16(u_int nsegs) { u_int n; MPASS(nsegs > 0); nsegs--; /* first segment is part of ulptx_sgl */ n = sizeof(struct ulp_txpkt) + sizeof(struct ulptx_idata) + sizeof(struct cpl_tx_pkt_core) + sizeof(struct ulptx_sgl) + 8 * ((3 * nsegs) / 2 + (nsegs & 1)); return (howmany(n, 16)); } /* * len16 for a txpkts type 1 WR with a GL. Does not include the firmware work * request header. */ static inline u_int txpkts1_len16(void) { u_int n; n = sizeof(struct cpl_tx_pkt_core) + sizeof(struct ulptx_sgl); return (howmany(n, 16)); } static inline u_int imm_payload(u_int ndesc) { u_int n; n = ndesc * EQ_ESIZE - sizeof(struct fw_eth_tx_pkt_wr) - sizeof(struct cpl_tx_pkt_core); return (n); } /* * Write a VM txpkt WR for this packet to the hardware descriptors, update the * software descriptor, and advance the pidx. It is guaranteed that enough * descriptors are available. * * The return value is the # of hardware descriptors used. */ static u_int write_txpkt_vm_wr(struct adapter *sc, struct sge_txq *txq, struct fw_eth_tx_pkt_vm_wr *wr, struct mbuf *m0, u_int available) { struct sge_eq *eq = &txq->eq; struct tx_sdesc *txsd; struct cpl_tx_pkt_core *cpl; uint32_t ctrl; /* used in many unrelated places */ uint64_t ctrl1; int csum_type, len16, ndesc, pktlen, nsegs; caddr_t dst; TXQ_LOCK_ASSERT_OWNED(txq); M_ASSERTPKTHDR(m0); MPASS(available > 0 && available < eq->sidx); len16 = mbuf_len16(m0); nsegs = mbuf_nsegs(m0); pktlen = m0->m_pkthdr.len; ctrl = sizeof(struct cpl_tx_pkt_core); if (needs_tso(m0)) ctrl += sizeof(struct cpl_tx_pkt_lso_core); ndesc = howmany(len16, EQ_ESIZE / 16); MPASS(ndesc <= available); /* Firmware work request header */ MPASS(wr == (void *)&eq->desc[eq->pidx]); wr->op_immdlen = htobe32(V_FW_WR_OP(FW_ETH_TX_PKT_VM_WR) | V_FW_ETH_TX_PKT_WR_IMMDLEN(ctrl)); ctrl = V_FW_WR_LEN16(len16); wr->equiq_to_len16 = htobe32(ctrl); wr->r3[0] = 0; wr->r3[1] = 0; /* * Copy over ethmacdst, ethmacsrc, ethtype, and vlantci. * vlantci is ignored unless the ethtype is 0x8100, so it's * simpler to always copy it rather than making it * conditional. Also, it seems that we do not have to set * vlantci or fake the ethtype when doing VLAN tag insertion. 
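The 16 bytes copied below cover the ethmacdst, ethmacsrc, ethtype, and vlantci fields of the work request header.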
*/ m_copydata(m0, 0, sizeof(struct ether_header) + 2, wr->ethmacdst); csum_type = -1; if (needs_tso(m0)) { struct cpl_tx_pkt_lso_core *lso = (void *)(wr + 1); KASSERT(m0->m_pkthdr.l2hlen > 0 && m0->m_pkthdr.l3hlen > 0 && m0->m_pkthdr.l4hlen > 0, ("%s: mbuf %p needs TSO but missing header lengths", __func__, m0)); ctrl = V_LSO_OPCODE(CPL_TX_PKT_LSO) | F_LSO_FIRST_SLICE | F_LSO_LAST_SLICE | V_LSO_IPHDR_LEN(m0->m_pkthdr.l3hlen >> 2) | V_LSO_TCPHDR_LEN(m0->m_pkthdr.l4hlen >> 2); if (m0->m_pkthdr.l2hlen == sizeof(struct ether_vlan_header)) ctrl |= V_LSO_ETHHDR_LEN(1); if (m0->m_pkthdr.l3hlen == sizeof(struct ip6_hdr)) ctrl |= F_LSO_IPV6; lso->lso_ctrl = htobe32(ctrl); lso->ipid_ofst = htobe16(0); lso->mss = htobe16(m0->m_pkthdr.tso_segsz); lso->seqno_offset = htobe32(0); lso->len = htobe32(pktlen); if (m0->m_pkthdr.l3hlen == sizeof(struct ip6_hdr)) csum_type = TX_CSUM_TCPIP6; else csum_type = TX_CSUM_TCPIP; cpl = (void *)(lso + 1); txq->tso_wrs++; } else { if (m0->m_pkthdr.csum_flags & CSUM_IP_TCP) csum_type = TX_CSUM_TCPIP; else if (m0->m_pkthdr.csum_flags & CSUM_IP_UDP) csum_type = TX_CSUM_UDPIP; else if (m0->m_pkthdr.csum_flags & CSUM_IP6_TCP) csum_type = TX_CSUM_TCPIP6; else if (m0->m_pkthdr.csum_flags & CSUM_IP6_UDP) csum_type = TX_CSUM_UDPIP6; #if defined(INET) else if (m0->m_pkthdr.csum_flags & CSUM_IP) { /* * XXX: The firmware appears to stomp on the * fragment/flags field of the IP header when * using TX_CSUM_IP. Fall back to doing * software checksums. */ u_short *sump; struct mbuf *m; int offset; m = m0; offset = 0; sump = m_advance(&m, &offset, m0->m_pkthdr.l2hlen + offsetof(struct ip, ip_sum)); *sump = in_cksum_skip(m0, m0->m_pkthdr.l2hlen + m0->m_pkthdr.l3hlen, m0->m_pkthdr.l2hlen); m0->m_pkthdr.csum_flags &= ~CSUM_IP; } #endif cpl = (void *)(wr + 1); } /* Checksum offload */ ctrl1 = 0; if (needs_l3_csum(m0) == 0) ctrl1 |= F_TXPKT_IPCSUM_DIS; if (csum_type >= 0) { KASSERT(m0->m_pkthdr.l2hlen > 0 && m0->m_pkthdr.l3hlen > 0, ("%s: mbuf %p needs checksum offload but missing header lengths", __func__, m0)); if (chip_id(sc) <= CHELSIO_T5) { ctrl1 |= V_TXPKT_ETHHDR_LEN(m0->m_pkthdr.l2hlen - ETHER_HDR_LEN); } else { ctrl1 |= V_T6_TXPKT_ETHHDR_LEN(m0->m_pkthdr.l2hlen - ETHER_HDR_LEN); } ctrl1 |= V_TXPKT_IPHDR_LEN(m0->m_pkthdr.l3hlen); ctrl1 |= V_TXPKT_CSUM_TYPE(csum_type); } else ctrl1 |= F_TXPKT_L4CSUM_DIS; if (m0->m_pkthdr.csum_flags & (CSUM_IP | CSUM_TCP | CSUM_UDP | CSUM_UDP_IPV6 | CSUM_TCP_IPV6 | CSUM_TSO)) txq->txcsum++; /* some hardware assistance provided */ /* VLAN tag insertion */ if (needs_vlan_insertion(m0)) { ctrl1 |= F_TXPKT_VLAN_VLD | V_TXPKT_VLAN(m0->m_pkthdr.ether_vtag); txq->vlan_insertion++; } /* CPL header */ cpl->ctrl0 = txq->cpl_ctrl0; cpl->pack = 0; cpl->len = htobe16(pktlen); cpl->ctrl1 = htobe64(ctrl1); /* SGL */ dst = (void *)(cpl + 1); /* * A packet using TSO will use up an entire descriptor for the * firmware work request header, LSO CPL, and TX_PKT_XT CPL. * If this descriptor is the last descriptor in the ring, wrap * around to the front of the ring explicitly for the start of * the sgl. */ if (dst == (void *)&eq->desc[eq->sidx]) { dst = (void *)&eq->desc[0]; write_gl_to_txd(txq, m0, &dst, 0); } else write_gl_to_txd(txq, m0, &dst, eq->sidx - ndesc < eq->pidx); txq->sgl_wrs++; txq->txpkt_wrs++; txsd = &txq->sdesc[eq->pidx]; txsd->m = m0; txsd->desc_used = ndesc; return (ndesc); } /* * Write a raw WR to the hardware descriptors, update the software * descriptor, and advance the pidx. It is guaranteed that enough * descriptors are available. 
* * The return value is the # of hardware descriptors used. */ static u_int write_raw_wr(struct sge_txq *txq, void *wr, struct mbuf *m0, u_int available) { struct sge_eq *eq = &txq->eq; struct tx_sdesc *txsd; struct mbuf *m; caddr_t dst; int len16, ndesc; len16 = mbuf_len16(m0); ndesc = howmany(len16, EQ_ESIZE / 16); MPASS(ndesc <= available); dst = wr; for (m = m0; m != NULL; m = m->m_next) copy_to_txd(eq, mtod(m, caddr_t), &dst, m->m_len); txq->raw_wrs++; txsd = &txq->sdesc[eq->pidx]; txsd->m = m0; txsd->desc_used = ndesc; return (ndesc); } /* * Write a txpkt WR for this packet to the hardware descriptors, update the * software descriptor, and advance the pidx. It is guaranteed that enough * descriptors are available. * * The return value is the # of hardware descriptors used. */ static u_int write_txpkt_wr(struct sge_txq *txq, struct fw_eth_tx_pkt_wr *wr, struct mbuf *m0, u_int available) { struct sge_eq *eq = &txq->eq; struct tx_sdesc *txsd; struct cpl_tx_pkt_core *cpl; uint32_t ctrl; /* used in many unrelated places */ uint64_t ctrl1; int len16, ndesc, pktlen, nsegs; caddr_t dst; TXQ_LOCK_ASSERT_OWNED(txq); M_ASSERTPKTHDR(m0); MPASS(available > 0 && available < eq->sidx); len16 = mbuf_len16(m0); nsegs = mbuf_nsegs(m0); pktlen = m0->m_pkthdr.len; ctrl = sizeof(struct cpl_tx_pkt_core); if (needs_tso(m0)) ctrl += sizeof(struct cpl_tx_pkt_lso_core); else if (pktlen <= imm_payload(2) && available >= 2) { /* Immediate data. Recalculate len16 and set nsegs to 0. */ ctrl += pktlen; len16 = howmany(sizeof(struct fw_eth_tx_pkt_wr) + sizeof(struct cpl_tx_pkt_core) + pktlen, 16); nsegs = 0; } ndesc = howmany(len16, EQ_ESIZE / 16); MPASS(ndesc <= available); /* Firmware work request header */ MPASS(wr == (void *)&eq->desc[eq->pidx]); wr->op_immdlen = htobe32(V_FW_WR_OP(FW_ETH_TX_PKT_WR) | V_FW_ETH_TX_PKT_WR_IMMDLEN(ctrl)); ctrl = V_FW_WR_LEN16(len16); wr->equiq_to_len16 = htobe32(ctrl); wr->r3 = 0; if (needs_tso(m0)) { struct cpl_tx_pkt_lso_core *lso = (void *)(wr + 1); KASSERT(m0->m_pkthdr.l2hlen > 0 && m0->m_pkthdr.l3hlen > 0 && m0->m_pkthdr.l4hlen > 0, ("%s: mbuf %p needs TSO but missing header lengths", __func__, m0)); ctrl = V_LSO_OPCODE(CPL_TX_PKT_LSO) | F_LSO_FIRST_SLICE | F_LSO_LAST_SLICE | V_LSO_IPHDR_LEN(m0->m_pkthdr.l3hlen >> 2) | V_LSO_TCPHDR_LEN(m0->m_pkthdr.l4hlen >> 2); if (m0->m_pkthdr.l2hlen == sizeof(struct ether_vlan_header)) ctrl |= V_LSO_ETHHDR_LEN(1); if (m0->m_pkthdr.l3hlen == sizeof(struct ip6_hdr)) ctrl |= F_LSO_IPV6; lso->lso_ctrl = htobe32(ctrl); lso->ipid_ofst = htobe16(0); lso->mss = htobe16(m0->m_pkthdr.tso_segsz); lso->seqno_offset = htobe32(0); lso->len = htobe32(pktlen); cpl = (void *)(lso + 1); txq->tso_wrs++; } else cpl = (void *)(wr + 1); /* Checksum offload */ ctrl1 = 0; if (needs_l3_csum(m0) == 0) ctrl1 |= F_TXPKT_IPCSUM_DIS; if (needs_l4_csum(m0) == 0) ctrl1 |= F_TXPKT_L4CSUM_DIS; if (m0->m_pkthdr.csum_flags & (CSUM_IP | CSUM_TCP | CSUM_UDP | CSUM_UDP_IPV6 | CSUM_TCP_IPV6 | CSUM_TSO)) txq->txcsum++; /* some hardware assistance provided */ /* VLAN tag insertion */ if (needs_vlan_insertion(m0)) { ctrl1 |= F_TXPKT_VLAN_VLD | V_TXPKT_VLAN(m0->m_pkthdr.ether_vtag); txq->vlan_insertion++; } /* CPL header */ cpl->ctrl0 = txq->cpl_ctrl0; cpl->pack = 0; cpl->len = htobe16(pktlen); cpl->ctrl1 = htobe64(ctrl1); /* SGL */ dst = (void *)(cpl + 1); if (nsegs > 0) { write_gl_to_txd(txq, m0, &dst, eq->sidx - ndesc < eq->pidx); txq->sgl_wrs++; } else { struct mbuf *m; for (m = m0; m != NULL; m = m->m_next) { copy_to_txd(eq, mtod(m, caddr_t), &dst, m->m_len); #ifdef INVARIANTS 
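/* Deduct each copied mbuf's length; the KASSERT below verifies that the entire packet was written as immediate data. */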
pktlen -= m->m_len; #endif } #ifdef INVARIANTS KASSERT(pktlen == 0, ("%s: %d bytes left.", __func__, pktlen)); #endif txq->imm_wrs++; } txq->txpkt_wrs++; txsd = &txq->sdesc[eq->pidx]; txsd->m = m0; txsd->desc_used = ndesc; return (ndesc); } static int try_txpkts(struct mbuf *m, struct mbuf *n, struct txpkts *txp, u_int available) { u_int needed, nsegs1, nsegs2, l1, l2; if (cannot_use_txpkts(m) || cannot_use_txpkts(n)) return (1); nsegs1 = mbuf_nsegs(m); nsegs2 = mbuf_nsegs(n); if (nsegs1 + nsegs2 == 2) { txp->wr_type = 1; l1 = l2 = txpkts1_len16(); } else { txp->wr_type = 0; l1 = txpkts0_len16(nsegs1); l2 = txpkts0_len16(nsegs2); } txp->len16 = howmany(sizeof(struct fw_eth_tx_pkts_wr), 16) + l1 + l2; needed = howmany(txp->len16, EQ_ESIZE / 16); if (needed > SGE_MAX_WR_NDESC || needed > available) return (1); txp->plen = m->m_pkthdr.len + n->m_pkthdr.len; if (txp->plen > 65535) return (1); txp->npkt = 2; set_mbuf_len16(m, l1); set_mbuf_len16(n, l2); return (0); } static int add_to_txpkts(struct mbuf *m, struct txpkts *txp, u_int available) { u_int plen, len16, needed, nsegs; MPASS(txp->wr_type == 0 || txp->wr_type == 1); if (cannot_use_txpkts(m)) return (1); nsegs = mbuf_nsegs(m); if (txp->wr_type == 1 && nsegs != 1) return (1); plen = txp->plen + m->m_pkthdr.len; if (plen > 65535) return (1); if (txp->wr_type == 0) len16 = txpkts0_len16(nsegs); else len16 = txpkts1_len16(); needed = howmany(txp->len16 + len16, EQ_ESIZE / 16); if (needed > SGE_MAX_WR_NDESC || needed > available) return (1); txp->npkt++; txp->plen = plen; txp->len16 += len16; set_mbuf_len16(m, len16); return (0); } /* * Write a txpkts WR for the packets in txp to the hardware descriptors, update * the software descriptor, and advance the pidx. It is guaranteed that enough * descriptors are available. * * The return value is the # of hardware descriptors used. */ static u_int write_txpkts_wr(struct sge_txq *txq, struct fw_eth_tx_pkts_wr *wr, struct mbuf *m0, const struct txpkts *txp, u_int available) { struct sge_eq *eq = &txq->eq; struct tx_sdesc *txsd; struct cpl_tx_pkt_core *cpl; uint32_t ctrl; uint64_t ctrl1; int ndesc, checkwrap; struct mbuf *m; void *flitp; TXQ_LOCK_ASSERT_OWNED(txq); MPASS(txp->npkt > 0); MPASS(txp->plen < 65536); MPASS(m0 != NULL); MPASS(m0->m_nextpkt != NULL); MPASS(txp->len16 <= howmany(SGE_MAX_WR_LEN, 16)); MPASS(available > 0 && available < eq->sidx); ndesc = howmany(txp->len16, EQ_ESIZE / 16); MPASS(ndesc <= available); MPASS(wr == (void *)&eq->desc[eq->pidx]); wr->op_pkd = htobe32(V_FW_WR_OP(FW_ETH_TX_PKTS_WR)); ctrl = V_FW_WR_LEN16(txp->len16); wr->equiq_to_len16 = htobe32(ctrl); wr->plen = htobe16(txp->plen); wr->npkt = txp->npkt; wr->r3 = 0; wr->type = txp->wr_type; flitp = wr + 1; /* * At this point we are 16B into a hardware descriptor. If checkwrap is * set then we know the WR is going to wrap around somewhere. We'll * check for that at appropriate points. 
*/ checkwrap = eq->sidx - ndesc < eq->pidx; for (m = m0; m != NULL; m = m->m_nextpkt) { if (txp->wr_type == 0) { struct ulp_txpkt *ulpmc; struct ulptx_idata *ulpsc; /* ULP master command */ ulpmc = flitp; ulpmc->cmd_dest = htobe32(V_ULPTX_CMD(ULP_TX_PKT) | V_ULP_TXPKT_DEST(0) | V_ULP_TXPKT_FID(eq->iqid)); ulpmc->len = htobe32(mbuf_len16(m)); /* ULP subcommand */ ulpsc = (void *)(ulpmc + 1); ulpsc->cmd_more = htobe32(V_ULPTX_CMD(ULP_TX_SC_IMM) | F_ULP_TX_SC_MORE); ulpsc->len = htobe32(sizeof(struct cpl_tx_pkt_core)); cpl = (void *)(ulpsc + 1); if (checkwrap && (uintptr_t)cpl == (uintptr_t)&eq->desc[eq->sidx]) cpl = (void *)&eq->desc[0]; } else { cpl = flitp; } /* Checksum offload */ ctrl1 = 0; if (needs_l3_csum(m) == 0) ctrl1 |= F_TXPKT_IPCSUM_DIS; if (needs_l4_csum(m) == 0) ctrl1 |= F_TXPKT_L4CSUM_DIS; if (m->m_pkthdr.csum_flags & (CSUM_IP | CSUM_TCP | CSUM_UDP | CSUM_UDP_IPV6 | CSUM_TCP_IPV6 | CSUM_TSO)) txq->txcsum++; /* some hardware assistance provided */ /* VLAN tag insertion */ if (needs_vlan_insertion(m)) { ctrl1 |= F_TXPKT_VLAN_VLD | V_TXPKT_VLAN(m->m_pkthdr.ether_vtag); txq->vlan_insertion++; } /* CPL header */ cpl->ctrl0 = txq->cpl_ctrl0; cpl->pack = 0; cpl->len = htobe16(m->m_pkthdr.len); cpl->ctrl1 = htobe64(ctrl1); flitp = cpl + 1; if (checkwrap && (uintptr_t)flitp == (uintptr_t)&eq->desc[eq->sidx]) flitp = (void *)&eq->desc[0]; write_gl_to_txd(txq, m, (caddr_t *)(&flitp), checkwrap); } if (txp->wr_type == 0) { txq->txpkts0_pkts += txp->npkt; txq->txpkts0_wrs++; } else { txq->txpkts1_pkts += txp->npkt; txq->txpkts1_wrs++; } txsd = &txq->sdesc[eq->pidx]; txsd->m = m0; txsd->desc_used = ndesc; return (ndesc); } /* * If the SGL ends on an address that is not 16 byte aligned, this function will * add a 0 filled flit at the end. */ static void write_gl_to_txd(struct sge_txq *txq, struct mbuf *m, caddr_t *to, int checkwrap) { struct sge_eq *eq = &txq->eq; struct sglist *gl = txq->gl; struct sglist_seg *seg; __be64 *flitp, *wrap; struct ulptx_sgl *usgl; int i, nflits, nsegs; KASSERT(((uintptr_t)(*to) & 0xf) == 0, ("%s: SGL must start at a 16 byte boundary: %p", __func__, *to)); MPASS((uintptr_t)(*to) >= (uintptr_t)&eq->desc[0]); MPASS((uintptr_t)(*to) < (uintptr_t)&eq->desc[eq->sidx]); get_pkt_gl(m, gl); nsegs = gl->sg_nseg; MPASS(nsegs > 0); nflits = (3 * (nsegs - 1)) / 2 + ((nsegs - 1) & 1) + 2; flitp = (__be64 *)(*to); wrap = (__be64 *)(&eq->desc[eq->sidx]); seg = &gl->sg_segs[0]; usgl = (void *)flitp; /* * We start at a 16 byte boundary somewhere inside the tx descriptor * ring, so we're at least 16 bytes away from the status page. There is * no chance of a wrap around in the middle of usgl (which is 16 bytes). 
*/ usgl->cmd_nsge = htobe32(V_ULPTX_CMD(ULP_TX_SC_DSGL) | V_ULPTX_NSGE(nsegs)); usgl->len0 = htobe32(seg->ss_len); usgl->addr0 = htobe64(seg->ss_paddr); seg++; if (checkwrap == 0 || (uintptr_t)(flitp + nflits) <= (uintptr_t)wrap) { /* Won't wrap around at all */ for (i = 0; i < nsegs - 1; i++, seg++) { usgl->sge[i / 2].len[i & 1] = htobe32(seg->ss_len); usgl->sge[i / 2].addr[i & 1] = htobe64(seg->ss_paddr); } if (i & 1) usgl->sge[i / 2].len[1] = htobe32(0); flitp += nflits; } else { /* Will wrap somewhere in the rest of the SGL */ /* 2 flits already written, write the rest flit by flit */ flitp = (void *)(usgl + 1); for (i = 0; i < nflits - 2; i++) { if (flitp == wrap) flitp = (void *)eq->desc; *flitp++ = get_flit(seg, nsegs - 1, i); } } if (nflits & 1) { MPASS(((uintptr_t)flitp) & 0xf); *flitp++ = 0; } MPASS((((uintptr_t)flitp) & 0xf) == 0); if (__predict_false(flitp == wrap)) *to = (void *)eq->desc; else *to = (void *)flitp; } static inline void copy_to_txd(struct sge_eq *eq, caddr_t from, caddr_t *to, int len) { MPASS((uintptr_t)(*to) >= (uintptr_t)&eq->desc[0]); MPASS((uintptr_t)(*to) < (uintptr_t)&eq->desc[eq->sidx]); if (__predict_true((uintptr_t)(*to) + len <= (uintptr_t)&eq->desc[eq->sidx])) { bcopy(from, *to, len); (*to) += len; } else { int portion = (uintptr_t)&eq->desc[eq->sidx] - (uintptr_t)(*to); bcopy(from, *to, portion); from += portion; portion = len - portion; /* remaining */ bcopy(from, (void *)eq->desc, portion); (*to) = (caddr_t)eq->desc + portion; } } static inline void ring_eq_db(struct adapter *sc, struct sge_eq *eq, u_int n) { u_int db; MPASS(n > 0); db = eq->doorbells; if (n > 1) clrbit(&db, DOORBELL_WCWR); wmb(); switch (ffs(db) - 1) { case DOORBELL_UDB: *eq->udb = htole32(V_QID(eq->udb_qid) | V_PIDX(n)); break; case DOORBELL_WCWR: { volatile uint64_t *dst, *src; int i; /* * Queues whose 128B doorbell segment fits in the page do not * use relative qid (udb_qid is always 0). Only queues with * doorbell segments can do WCWR. */ KASSERT(eq->udb_qid == 0 && n == 1, ("%s: inappropriate doorbell (0x%x, %d, %d) for eq %p", __func__, eq->doorbells, n, eq->dbidx, eq)); dst = (volatile void *)((uintptr_t)eq->udb + UDBS_WR_OFFSET - UDBS_DB_OFFSET); i = eq->dbidx; src = (void *)&eq->desc[i]; while (src != (void *)&eq->desc[i + 1]) *dst++ = *src++; wmb(); break; } case DOORBELL_UDBWC: *eq->udb = htole32(V_QID(eq->udb_qid) | V_PIDX(n)); wmb(); break; case DOORBELL_KDB: t4_write_reg(sc, sc->sge_kdoorbell_reg, V_QID(eq->cntxt_id) | V_PIDX(n)); break; } IDXINCR(eq->dbidx, n, eq->sidx); } static inline u_int reclaimable_tx_desc(struct sge_eq *eq) { uint16_t hw_cidx; hw_cidx = read_hw_cidx(eq); return (IDXDIFF(hw_cidx, eq->cidx, eq->sidx)); } static inline u_int total_available_tx_desc(struct sge_eq *eq) { uint16_t hw_cidx, pidx; hw_cidx = read_hw_cidx(eq); pidx = eq->pidx; if (pidx == hw_cidx) return (eq->sidx - 1); else return (IDXDIFF(hw_cidx, pidx, eq->sidx) - 1); } static inline uint16_t read_hw_cidx(struct sge_eq *eq) { struct sge_qstat *spg = (void *)&eq->desc[eq->sidx]; uint16_t cidx = spg->cidx; /* stable snapshot */ return (be16toh(cidx)); } /* * Reclaim 'n' descriptors approximately. 
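Whole work requests are reclaimed, so the count may exceed 'n'; the firmware never returns partial credits for a work request.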
*/ static u_int reclaim_tx_descs(struct sge_txq *txq, u_int n) { struct tx_sdesc *txsd; struct sge_eq *eq = &txq->eq; u_int can_reclaim, reclaimed; TXQ_LOCK_ASSERT_OWNED(txq); MPASS(n > 0); reclaimed = 0; can_reclaim = reclaimable_tx_desc(eq); while (can_reclaim && reclaimed < n) { int ndesc; struct mbuf *m, *nextpkt; txsd = &txq->sdesc[eq->cidx]; ndesc = txsd->desc_used; /* Firmware doesn't return "partial" credits. */ KASSERT(can_reclaim >= ndesc, ("%s: unexpected number of credits: %d, %d", __func__, can_reclaim, ndesc)); KASSERT(ndesc != 0, ("%s: descriptor with no credits: cidx %d", __func__, eq->cidx)); for (m = txsd->m; m != NULL; m = nextpkt) { nextpkt = m->m_nextpkt; m->m_nextpkt = NULL; m_freem(m); } reclaimed += ndesc; can_reclaim -= ndesc; IDXINCR(eq->cidx, ndesc, eq->sidx); } return (reclaimed); } static void tx_reclaim(void *arg, int n) { struct sge_txq *txq = arg; struct sge_eq *eq = &txq->eq; do { if (TXQ_TRYLOCK(txq) == 0) break; n = reclaim_tx_descs(txq, 32); if (eq->cidx == eq->pidx) eq->equeqidx = eq->pidx; TXQ_UNLOCK(txq); } while (n > 0); } static __be64 get_flit(struct sglist_seg *segs, int nsegs, int idx) { int i = (idx / 3) * 2; switch (idx % 3) { case 0: { uint64_t rc; rc = (uint64_t)segs[i].ss_len << 32; if (i + 1 < nsegs) rc |= (uint64_t)(segs[i + 1].ss_len); return (htobe64(rc)); } case 1: return (htobe64(segs[i].ss_paddr)); case 2: return (htobe64(segs[i + 1].ss_paddr)); } return (0); } static void find_best_refill_source(struct adapter *sc, struct sge_fl *fl, int maxp) { int8_t zidx, hwidx, idx; uint16_t region1, region3; int spare, spare_needed, n; struct sw_zone_info *swz; struct hw_buf_info *hwb, *hwb_list = &sc->sge.hw_buf_info[0]; /* * Buffer Packing: Look for PAGE_SIZE or larger zone which has a bufsize * large enough for the max payload and cluster metadata. Otherwise * settle for the largest bufsize that leaves enough room in the cluster * for metadata. * * Without buffer packing: Look for the smallest zone which has a * bufsize large enough for the max payload. Settle for the largest * bufsize available if there's nothing big enough for max payload. */ spare_needed = fl->flags & FL_BUF_PACKING ? CL_METADATA_SIZE : 0; swz = &sc->sge.sw_zone_info[0]; hwidx = -1; for (zidx = 0; zidx < SW_ZONE_SIZES; zidx++, swz++) { if (swz->size > largest_rx_cluster) { if (__predict_true(hwidx != -1)) break; /* * This is a misconfiguration. largest_rx_cluster is * preventing us from finding a refill source. See * dev.t5nex..buffer_sizes to figure out why. */ device_printf(sc->dev, "largest_rx_cluster=%u leaves no" " refill source for fl %p (dma %u). Ignored.\n", largest_rx_cluster, fl, maxp); } for (idx = swz->head_hwidx; idx != -1; idx = hwb->next) { hwb = &hwb_list[idx]; spare = swz->size - hwb->size; if (spare < spare_needed) continue; hwidx = idx; /* best option so far */ if (hwb->size >= maxp) { if ((fl->flags & FL_BUF_PACKING) == 0) goto done; /* stop looking (not packing) */ if (swz->size >= safest_rx_cluster) goto done; /* stop looking (packing) */ } break; /* keep looking, next zone */ } } done: /* A usable hwidx has been located. */ MPASS(hwidx != -1); hwb = &hwb_list[hwidx]; zidx = hwb->zidx; swz = &sc->sge.sw_zone_info[zidx]; region1 = 0; region3 = swz->size - hwb->size; /* * Stay within this zone and see if there is a better match when mbuf * inlining is allowed. Remember that the hwidx's are sorted in * decreasing order of size (so in increasing order of spare area). 
*/ for (idx = hwidx; idx != -1; idx = hwb->next) { hwb = &hwb_list[idx]; spare = swz->size - hwb->size; if (allow_mbufs_in_cluster == 0 || hwb->size < maxp) break; /* * Do not inline mbufs if doing so would violate the pad/pack * boundary alignment requirement. */ if (fl_pad && (MSIZE % sc->params.sge.pad_boundary) != 0) continue; if (fl->flags & FL_BUF_PACKING && (MSIZE % sc->params.sge.pack_boundary) != 0) continue; if (spare < CL_METADATA_SIZE + MSIZE) continue; n = (spare - CL_METADATA_SIZE) / MSIZE; if (n > howmany(hwb->size, maxp)) break; hwidx = idx; if (fl->flags & FL_BUF_PACKING) { region1 = n * MSIZE; region3 = spare - region1; } else { region1 = MSIZE; region3 = spare - region1; break; } } KASSERT(zidx >= 0 && zidx < SW_ZONE_SIZES, ("%s: bad zone %d for fl %p, maxp %d", __func__, zidx, fl, maxp)); KASSERT(hwidx >= 0 && hwidx <= SGE_FLBUF_SIZES, ("%s: bad hwidx %d for fl %p, maxp %d", __func__, hwidx, fl, maxp)); KASSERT(region1 + sc->sge.hw_buf_info[hwidx].size + region3 == sc->sge.sw_zone_info[zidx].size, ("%s: bad buffer layout for fl %p, maxp %d. " "cl %d; r1 %d, payload %d, r3 %d", __func__, fl, maxp, sc->sge.sw_zone_info[zidx].size, region1, sc->sge.hw_buf_info[hwidx].size, region3)); if (fl->flags & FL_BUF_PACKING || region1 > 0) { KASSERT(region3 >= CL_METADATA_SIZE, ("%s: no room for metadata. fl %p, maxp %d; " "cl %d; r1 %d, payload %d, r3 %d", __func__, fl, maxp, sc->sge.sw_zone_info[zidx].size, region1, sc->sge.hw_buf_info[hwidx].size, region3)); KASSERT(region1 % MSIZE == 0, ("%s: bad mbuf region for fl %p, maxp %d. " "cl %d; r1 %d, payload %d, r3 %d", __func__, fl, maxp, sc->sge.sw_zone_info[zidx].size, region1, sc->sge.hw_buf_info[hwidx].size, region3)); } fl->cll_def.zidx = zidx; fl->cll_def.hwidx = hwidx; fl->cll_def.region1 = region1; fl->cll_def.region3 = region3; } static void find_safe_refill_source(struct adapter *sc, struct sge_fl *fl) { struct sge *s = &sc->sge; struct hw_buf_info *hwb; struct sw_zone_info *swz; int spare; int8_t hwidx; if (fl->flags & FL_BUF_PACKING) hwidx = s->safe_hwidx2; /* with room for metadata */ else if (allow_mbufs_in_cluster && s->safe_hwidx2 != -1) { hwidx = s->safe_hwidx2; hwb = &s->hw_buf_info[hwidx]; swz = &s->sw_zone_info[hwb->zidx]; spare = swz->size - hwb->size; /* no good if there isn't room for an mbuf as well */ if (spare < CL_METADATA_SIZE + MSIZE) hwidx = s->safe_hwidx1; } else hwidx = s->safe_hwidx1; if (hwidx == -1) { /* No fallback source */ fl->cll_alt.hwidx = -1; fl->cll_alt.zidx = -1; return; } hwb = &s->hw_buf_info[hwidx]; swz = &s->sw_zone_info[hwb->zidx]; spare = swz->size - hwb->size; fl->cll_alt.hwidx = hwidx; fl->cll_alt.zidx = hwb->zidx; if (allow_mbufs_in_cluster && (fl_pad == 0 || (MSIZE % sc->params.sge.pad_boundary) == 0)) fl->cll_alt.region1 = ((spare - CL_METADATA_SIZE) / MSIZE) * MSIZE; else fl->cll_alt.region1 = 0; fl->cll_alt.region3 = spare - fl->cll_alt.region1; } static void add_fl_to_sfl(struct adapter *sc, struct sge_fl *fl) { mtx_lock(&sc->sfl_lock); FL_LOCK(fl); if ((fl->flags & FL_DOOMED) == 0) { fl->flags |= FL_STARVING; TAILQ_INSERT_TAIL(&sc->sfl, fl, link); callout_reset(&sc->sfl_callout, hz / 5, refill_sfl, sc); } FL_UNLOCK(fl); mtx_unlock(&sc->sfl_lock); } static void handle_wrq_egr_update(struct adapter *sc, struct sge_eq *eq) { struct sge_wrq *wrq = (void *)eq; atomic_readandclear_int(&eq->equiq); taskqueue_enqueue(sc->tq[eq->tx_chan], &wrq->wrq_tx_task); } static void handle_eth_egr_update(struct adapter *sc, struct sge_eq *eq) { struct sge_txq *txq = (void *)eq; MPASS((eq->flags 
& EQ_TYPEMASK) == EQ_ETH); atomic_readandclear_int(&eq->equiq); mp_ring_check_drainage(txq->r, 0); taskqueue_enqueue(sc->tq[eq->tx_chan], &txq->tx_reclaim_task); } static int handle_sge_egr_update(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m) { const struct cpl_sge_egr_update *cpl = (const void *)(rss + 1); unsigned int qid = G_EGR_QID(ntohl(cpl->opcode_qid)); struct adapter *sc = iq->adapter; struct sge *s = &sc->sge; struct sge_eq *eq; static void (*h[])(struct adapter *, struct sge_eq *) = {NULL, &handle_wrq_egr_update, &handle_eth_egr_update, &handle_wrq_egr_update}; KASSERT(m == NULL, ("%s: payload with opcode %02x", __func__, rss->opcode)); eq = s->eqmap[qid - s->eq_start - s->eq_base]; (*h[eq->flags & EQ_TYPEMASK])(sc, eq); return (0); } /* handle_fw_msg works for both fw4_msg and fw6_msg because this is valid */ CTASSERT(offsetof(struct cpl_fw4_msg, data) == \ offsetof(struct cpl_fw6_msg, data)); static int handle_fw_msg(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m) { struct adapter *sc = iq->adapter; const struct cpl_fw6_msg *cpl = (const void *)(rss + 1); KASSERT(m == NULL, ("%s: payload with opcode %02x", __func__, rss->opcode)); if (cpl->type == FW_TYPE_RSSCPL || cpl->type == FW6_TYPE_RSSCPL) { const struct rss_header *rss2; rss2 = (const struct rss_header *)&cpl->data[0]; return (t4_cpl_handler[rss2->opcode](iq, rss2, m)); } return (t4_fw_msg_handler[cpl->type](sc, &cpl->data[0])); } /** * t4_handle_wrerr_rpl - process a FW work request error message * @adap: the adapter * @rpl: start of the FW message */ static int t4_handle_wrerr_rpl(struct adapter *adap, const __be64 *rpl) { u8 opcode = *(const u8 *)rpl; const struct fw_error_cmd *e = (const void *)rpl; unsigned int i; if (opcode != FW_ERROR_CMD) { log(LOG_ERR, "%s: Received WRERR_RPL message with opcode %#x\n", device_get_nameunit(adap->dev), opcode); return (EINVAL); } log(LOG_ERR, "%s: FW_ERROR (%s) ", device_get_nameunit(adap->dev), G_FW_ERROR_CMD_FATAL(be32toh(e->op_to_type)) ? "fatal" : "non-fatal"); switch (G_FW_ERROR_CMD_TYPE(be32toh(e->op_to_type))) { case FW_ERROR_TYPE_EXCEPTION: log(LOG_ERR, "exception info:\n"); for (i = 0; i < nitems(e->u.exception.info); i++) log(LOG_ERR, "%s%08x", i == 0 ? "\t" : " ", be32toh(e->u.exception.info[i])); log(LOG_ERR, "\n"); break; case FW_ERROR_TYPE_HWMODULE: log(LOG_ERR, "HW module regaddr %08x regval %08x\n", be32toh(e->u.hwmodule.regaddr), be32toh(e->u.hwmodule.regval)); break; case FW_ERROR_TYPE_WR: log(LOG_ERR, "WR cidx %d PF %d VF %d eqid %d hdr:\n", be16toh(e->u.wr.cidx), G_FW_ERROR_CMD_PFN(be16toh(e->u.wr.pfn_vfn)), G_FW_ERROR_CMD_VFN(be16toh(e->u.wr.pfn_vfn)), be32toh(e->u.wr.eqid)); for (i = 0; i < nitems(e->u.wr.wrhdr); i++) log(LOG_ERR, "%s%02x", i == 0 ? "\t" : " ", e->u.wr.wrhdr[i]); log(LOG_ERR, "\n"); break; case FW_ERROR_TYPE_ACL: log(LOG_ERR, "ACL cidx %d PF %d VF %d eqid %d %s", be16toh(e->u.acl.cidx), G_FW_ERROR_CMD_PFN(be16toh(e->u.acl.pfn_vfn)), G_FW_ERROR_CMD_VFN(be16toh(e->u.acl.pfn_vfn)), be32toh(e->u.acl.eqid), G_FW_ERROR_CMD_MV(be16toh(e->u.acl.mv_pkd)) ? 
"vlanid" : "MAC"); for (i = 0; i < nitems(e->u.acl.val); i++) log(LOG_ERR, " %02x", e->u.acl.val[i]); log(LOG_ERR, "\n"); break; default: log(LOG_ERR, "type %#x\n", G_FW_ERROR_CMD_TYPE(be32toh(e->op_to_type))); return (EINVAL); } return (0); } static int sysctl_uint16(SYSCTL_HANDLER_ARGS) { uint16_t *id = arg1; int i = *id; return sysctl_handle_int(oidp, &i, 0, req); } static int sysctl_bufsizes(SYSCTL_HANDLER_ARGS) { struct sge *s = arg1; struct hw_buf_info *hwb = &s->hw_buf_info[0]; struct sw_zone_info *swz = &s->sw_zone_info[0]; int i, rc; struct sbuf sb; char c; sbuf_new(&sb, NULL, 32, SBUF_AUTOEXTEND); for (i = 0; i < SGE_FLBUF_SIZES; i++, hwb++) { if (hwb->zidx >= 0 && swz[hwb->zidx].size <= largest_rx_cluster) c = '*'; else c = '\0'; sbuf_printf(&sb, "%u%c ", hwb->size, c); } sbuf_trim(&sb); sbuf_finish(&sb); rc = sysctl_handle_string(oidp, sbuf_data(&sb), sbuf_len(&sb), req); sbuf_delete(&sb); return (rc); } #ifdef RATELIMIT /* * len16 for a txpkt WR with a GL. Includes the firmware work request header. */ static inline u_int txpkt_eo_len16(u_int nsegs, u_int immhdrs, u_int tso) { u_int n; MPASS(immhdrs > 0); n = roundup2(sizeof(struct fw_eth_tx_eo_wr) + sizeof(struct cpl_tx_pkt_core) + immhdrs, 16); if (__predict_false(nsegs == 0)) goto done; nsegs--; /* first segment is part of ulptx_sgl */ n += sizeof(struct ulptx_sgl) + 8 * ((3 * nsegs) / 2 + (nsegs & 1)); if (tso) n += sizeof(struct cpl_tx_pkt_lso_core); done: return (howmany(n, 16)); } #define ETID_FLOWC_NPARAMS 6 #define ETID_FLOWC_LEN (roundup2((sizeof(struct fw_flowc_wr) + \ ETID_FLOWC_NPARAMS * sizeof(struct fw_flowc_mnemval)), 16)) #define ETID_FLOWC_LEN16 (howmany(ETID_FLOWC_LEN, 16)) static int send_etid_flowc_wr(struct cxgbe_snd_tag *cst, struct port_info *pi, struct vi_info *vi) { struct wrq_cookie cookie; u_int pfvf = pi->adapter->pf << S_FW_VIID_PFN; struct fw_flowc_wr *flowc; mtx_assert(&cst->lock, MA_OWNED); MPASS((cst->flags & (EO_FLOWC_PENDING | EO_FLOWC_RPL_PENDING)) == EO_FLOWC_PENDING); flowc = start_wrq_wr(cst->eo_txq, ETID_FLOWC_LEN16, &cookie); if (__predict_false(flowc == NULL)) return (ENOMEM); bzero(flowc, ETID_FLOWC_LEN); flowc->op_to_nparams = htobe32(V_FW_WR_OP(FW_FLOWC_WR) | V_FW_FLOWC_WR_NPARAMS(ETID_FLOWC_NPARAMS) | V_FW_WR_COMPL(0)); flowc->flowid_len16 = htonl(V_FW_WR_LEN16(ETID_FLOWC_LEN16) | V_FW_WR_FLOWID(cst->etid)); flowc->mnemval[0].mnemonic = FW_FLOWC_MNEM_PFNVFN; flowc->mnemval[0].val = htobe32(pfvf); flowc->mnemval[1].mnemonic = FW_FLOWC_MNEM_CH; flowc->mnemval[1].val = htobe32(pi->tx_chan); flowc->mnemval[2].mnemonic = FW_FLOWC_MNEM_PORT; flowc->mnemval[2].val = htobe32(pi->tx_chan); flowc->mnemval[3].mnemonic = FW_FLOWC_MNEM_IQID; flowc->mnemval[3].val = htobe32(cst->iqid); flowc->mnemval[4].mnemonic = FW_FLOWC_MNEM_EOSTATE; flowc->mnemval[4].val = htobe32(FW_FLOWC_MNEM_EOSTATE_ESTABLISHED); flowc->mnemval[5].mnemonic = FW_FLOWC_MNEM_SCHEDCLASS; flowc->mnemval[5].val = htobe32(cst->schedcl); commit_wrq_wr(cst->eo_txq, flowc, &cookie); cst->flags &= ~EO_FLOWC_PENDING; cst->flags |= EO_FLOWC_RPL_PENDING; MPASS(cst->tx_credits >= ETID_FLOWC_LEN16); /* flowc is first WR. 
*/ cst->tx_credits -= ETID_FLOWC_LEN16; return (0); } #define ETID_FLUSH_LEN16 (howmany(sizeof (struct fw_flowc_wr), 16)) void send_etid_flush_wr(struct cxgbe_snd_tag *cst) { struct fw_flowc_wr *flowc; struct wrq_cookie cookie; mtx_assert(&cst->lock, MA_OWNED); flowc = start_wrq_wr(cst->eo_txq, ETID_FLUSH_LEN16, &cookie); if (__predict_false(flowc == NULL)) CXGBE_UNIMPLEMENTED(__func__); bzero(flowc, ETID_FLUSH_LEN16 * 16); flowc->op_to_nparams = htobe32(V_FW_WR_OP(FW_FLOWC_WR) | V_FW_FLOWC_WR_NPARAMS(0) | F_FW_WR_COMPL); flowc->flowid_len16 = htobe32(V_FW_WR_LEN16(ETID_FLUSH_LEN16) | V_FW_WR_FLOWID(cst->etid)); commit_wrq_wr(cst->eo_txq, flowc, &cookie); cst->flags |= EO_FLUSH_RPL_PENDING; MPASS(cst->tx_credits >= ETID_FLUSH_LEN16); cst->tx_credits -= ETID_FLUSH_LEN16; cst->ncompl++; } static void write_ethofld_wr(struct cxgbe_snd_tag *cst, struct fw_eth_tx_eo_wr *wr, struct mbuf *m0, int compl) { struct cpl_tx_pkt_core *cpl; uint64_t ctrl1; uint32_t ctrl; /* used in many unrelated places */ int len16, pktlen, nsegs, immhdrs; caddr_t dst; uintptr_t p; struct ulptx_sgl *usgl; struct sglist sg; struct sglist_seg segs[38]; /* XXX: find real limit. XXX: get off the stack */ mtx_assert(&cst->lock, MA_OWNED); M_ASSERTPKTHDR(m0); KASSERT(m0->m_pkthdr.l2hlen > 0 && m0->m_pkthdr.l3hlen > 0 && m0->m_pkthdr.l4hlen > 0, ("%s: ethofld mbuf %p is missing header lengths", __func__, m0)); len16 = mbuf_eo_len16(m0); nsegs = mbuf_eo_nsegs(m0); pktlen = m0->m_pkthdr.len; ctrl = sizeof(struct cpl_tx_pkt_core); if (needs_tso(m0)) ctrl += sizeof(struct cpl_tx_pkt_lso_core); immhdrs = m0->m_pkthdr.l2hlen + m0->m_pkthdr.l3hlen + m0->m_pkthdr.l4hlen; ctrl += immhdrs; wr->op_immdlen = htobe32(V_FW_WR_OP(FW_ETH_TX_EO_WR) | V_FW_ETH_TX_EO_WR_IMMDLEN(ctrl) | V_FW_WR_COMPL(!!compl)); wr->equiq_to_len16 = htobe32(V_FW_WR_LEN16(len16) | V_FW_WR_FLOWID(cst->etid)); wr->r3 = 0; if (needs_udp_csum(m0)) { wr->u.udpseg.type = FW_ETH_TX_EO_TYPE_UDPSEG; wr->u.udpseg.ethlen = m0->m_pkthdr.l2hlen; wr->u.udpseg.iplen = htobe16(m0->m_pkthdr.l3hlen); wr->u.udpseg.udplen = m0->m_pkthdr.l4hlen; wr->u.udpseg.rtplen = 0; wr->u.udpseg.r4 = 0; wr->u.udpseg.mss = htobe16(pktlen - immhdrs); wr->u.udpseg.schedpktsize = wr->u.udpseg.mss; wr->u.udpseg.plen = htobe32(pktlen - immhdrs); cpl = (void *)(wr + 1); } else { MPASS(needs_tcp_csum(m0)); wr->u.tcpseg.type = FW_ETH_TX_EO_TYPE_TCPSEG; wr->u.tcpseg.ethlen = m0->m_pkthdr.l2hlen; wr->u.tcpseg.iplen = htobe16(m0->m_pkthdr.l3hlen); wr->u.tcpseg.tcplen = m0->m_pkthdr.l4hlen; wr->u.tcpseg.tsclk_tsoff = mbuf_eo_tsclk_tsoff(m0); wr->u.tcpseg.r4 = 0; wr->u.tcpseg.r5 = 0; wr->u.tcpseg.plen = htobe32(pktlen - immhdrs); if (needs_tso(m0)) { struct cpl_tx_pkt_lso_core *lso = (void *)(wr + 1); wr->u.tcpseg.mss = htobe16(m0->m_pkthdr.tso_segsz); ctrl = V_LSO_OPCODE(CPL_TX_PKT_LSO) | F_LSO_FIRST_SLICE | F_LSO_LAST_SLICE | V_LSO_IPHDR_LEN(m0->m_pkthdr.l3hlen >> 2) | V_LSO_TCPHDR_LEN(m0->m_pkthdr.l4hlen >> 2); if (m0->m_pkthdr.l2hlen == sizeof(struct ether_vlan_header)) ctrl |= V_LSO_ETHHDR_LEN(1); if (m0->m_pkthdr.l3hlen == sizeof(struct ip6_hdr)) ctrl |= F_LSO_IPV6; lso->lso_ctrl = htobe32(ctrl); lso->ipid_ofst = htobe16(0); lso->mss = htobe16(m0->m_pkthdr.tso_segsz); lso->seqno_offset = htobe32(0); lso->len = htobe32(pktlen); cpl = (void *)(lso + 1); } else { wr->u.tcpseg.mss = htobe16(0xffff); cpl = (void *)(wr + 1); } } /* Checksum offload must be requested for ethofld. 
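 * The MPASS just below enforces this, and ctrl1 carries only the VLAN bits,
 * so no checksum-disable flags are set in the CPL.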
*/ ctrl1 = 0; MPASS(needs_l4_csum(m0)); /* VLAN tag insertion */ if (needs_vlan_insertion(m0)) { ctrl1 |= F_TXPKT_VLAN_VLD | V_TXPKT_VLAN(m0->m_pkthdr.ether_vtag); } /* CPL header */ cpl->ctrl0 = cst->ctrl0; cpl->pack = 0; cpl->len = htobe16(pktlen); cpl->ctrl1 = htobe64(ctrl1); /* Copy Ethernet, IP & TCP/UDP hdrs as immediate data */ p = (uintptr_t)(cpl + 1); m_copydata(m0, 0, immhdrs, (void *)p); /* SGL */ dst = (void *)(cpl + 1); if (nsegs > 0) { int i, pad; /* zero-pad upto next 16Byte boundary, if not 16Byte aligned */ p += immhdrs; pad = 16 - (immhdrs & 0xf); bzero((void *)p, pad); usgl = (void *)(p + pad); usgl->cmd_nsge = htobe32(V_ULPTX_CMD(ULP_TX_SC_DSGL) | V_ULPTX_NSGE(nsegs)); sglist_init(&sg, nitems(segs), segs); for (; m0 != NULL; m0 = m0->m_next) { if (__predict_false(m0->m_len == 0)) continue; if (immhdrs >= m0->m_len) { immhdrs -= m0->m_len; continue; } sglist_append(&sg, mtod(m0, char *) + immhdrs, m0->m_len - immhdrs); immhdrs = 0; } MPASS(sg.sg_nseg == nsegs); /* * Zero pad last 8B in case the WR doesn't end on a 16B * boundary. */ *(uint64_t *)((char *)wr + len16 * 16 - 8) = 0; usgl->len0 = htobe32(segs[0].ss_len); usgl->addr0 = htobe64(segs[0].ss_paddr); for (i = 0; i < nsegs - 1; i++) { usgl->sge[i / 2].len[i & 1] = htobe32(segs[i + 1].ss_len); usgl->sge[i / 2].addr[i & 1] = htobe64(segs[i + 1].ss_paddr); } if (i & 1) usgl->sge[i / 2].len[1] = htobe32(0); } } static void ethofld_tx(struct cxgbe_snd_tag *cst) { struct mbuf *m; struct wrq_cookie cookie; int next_credits, compl; struct fw_eth_tx_eo_wr *wr; mtx_assert(&cst->lock, MA_OWNED); while ((m = mbufq_first(&cst->pending_tx)) != NULL) { M_ASSERTPKTHDR(m); /* How many len16 credits do we need to send this mbuf. */ next_credits = mbuf_eo_len16(m); MPASS(next_credits > 0); if (next_credits > cst->tx_credits) { /* * Tx will make progress eventually because there is at * least one outstanding fw4_ack that will return * credits and kick the tx. */ MPASS(cst->ncompl > 0); return; } wr = start_wrq_wr(cst->eo_txq, next_credits, &cookie); if (__predict_false(wr == NULL)) { /* XXX: wishful thinking, not a real assertion. 
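 * If the WR queue is full, bail out here and rely on an outstanding
 * fw4_ack completion to restart transmission later.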
*/ MPASS(cst->ncompl > 0); return; } cst->tx_credits -= next_credits; cst->tx_nocompl += next_credits; compl = cst->ncompl == 0 || cst->tx_nocompl >= cst->tx_total / 2; ETHER_BPF_MTAP(cst->com.ifp, m); write_ethofld_wr(cst, wr, m, compl); commit_wrq_wr(cst->eo_txq, wr, &cookie); if (compl) { cst->ncompl++; cst->tx_nocompl = 0; } (void) mbufq_dequeue(&cst->pending_tx); mbufq_enqueue(&cst->pending_fwack, m); } } int ethofld_transmit(struct ifnet *ifp, struct mbuf *m0) { struct cxgbe_snd_tag *cst; int rc; MPASS(m0->m_nextpkt == NULL); MPASS(m0->m_pkthdr.snd_tag != NULL); cst = mst_to_cst(m0->m_pkthdr.snd_tag); mtx_lock(&cst->lock); MPASS(cst->flags & EO_SND_TAG_REF); if (__predict_false(cst->flags & EO_FLOWC_PENDING)) { struct vi_info *vi = ifp->if_softc; struct port_info *pi = vi->pi; struct adapter *sc = pi->adapter; const uint32_t rss_mask = vi->rss_size - 1; uint32_t rss_hash; cst->eo_txq = &sc->sge.ofld_txq[vi->first_ofld_txq]; if (M_HASHTYPE_ISHASH(m0)) rss_hash = m0->m_pkthdr.flowid; else rss_hash = arc4random(); /* We assume RSS hashing */ cst->iqid = vi->rss[rss_hash & rss_mask]; cst->eo_txq += rss_hash % vi->nofldtxq; rc = send_etid_flowc_wr(cst, pi, vi); if (rc != 0) goto done; } if (__predict_false(cst->plen + m0->m_pkthdr.len > eo_max_backlog)) { rc = ENOBUFS; goto done; } mbufq_enqueue(&cst->pending_tx, m0); cst->plen += m0->m_pkthdr.len; ethofld_tx(cst); rc = 0; done: mtx_unlock(&cst->lock); if (__predict_false(rc != 0)) m_freem(m0); return (rc); } static int ethofld_fw4_ack(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m0) { struct adapter *sc = iq->adapter; const struct cpl_fw4_ack *cpl = (const void *)(rss + 1); struct mbuf *m; u_int etid = G_CPL_FW4_ACK_FLOWID(be32toh(OPCODE_TID(cpl))); struct cxgbe_snd_tag *cst; uint8_t credits = cpl->credits; cst = lookup_etid(sc, etid); mtx_lock(&cst->lock); if (__predict_false(cst->flags & EO_FLOWC_RPL_PENDING)) { MPASS(credits >= ETID_FLOWC_LEN16); credits -= ETID_FLOWC_LEN16; cst->flags &= ~EO_FLOWC_RPL_PENDING; } KASSERT(cst->ncompl > 0, ("%s: etid %u (%p) wasn't expecting completion.", __func__, etid, cst)); cst->ncompl--; while (credits > 0) { m = mbufq_dequeue(&cst->pending_fwack); if (__predict_false(m == NULL)) { /* * The remaining credits are for the final flush that * was issued when the tag was freed by the kernel. */ MPASS((cst->flags & (EO_FLUSH_RPL_PENDING | EO_SND_TAG_REF)) == EO_FLUSH_RPL_PENDING); MPASS(credits == ETID_FLUSH_LEN16); MPASS(cst->tx_credits + cpl->credits == cst->tx_total); MPASS(cst->ncompl == 0); cst->flags &= ~EO_FLUSH_RPL_PENDING; cst->tx_credits += cpl->credits; freetag: cxgbe_snd_tag_free_locked(cst); return (0); /* cst is gone. 
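 * Do not touch cst (or its lock) after the free above.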
*/ } KASSERT(m != NULL, ("%s: too many credits (%u, %u)", __func__, cpl->credits, credits)); KASSERT(credits >= mbuf_eo_len16(m), ("%s: too few credits (%u, %u, %u)", __func__, cpl->credits, credits, mbuf_eo_len16(m))); credits -= mbuf_eo_len16(m); cst->plen -= m->m_pkthdr.len; m_freem(m); } cst->tx_credits += cpl->credits; MPASS(cst->tx_credits <= cst->tx_total); m = mbufq_first(&cst->pending_tx); if (m != NULL && cst->tx_credits >= mbuf_eo_len16(m)) ethofld_tx(cst); if (__predict_false((cst->flags & EO_SND_TAG_REF) == 0) && cst->ncompl == 0) { if (cst->tx_credits == cst->tx_total) goto freetag; else { MPASS((cst->flags & EO_FLUSH_RPL_PENDING) == 0); send_etid_flush_wr(cst); } } mtx_unlock(&cst->lock); return (0); } #endif Index: user/ngie/bug-237403/sys/dev/gpio/gpioc.c =================================================================== --- user/ngie/bug-237403/sys/dev/gpio/gpioc.c (revision 346925) +++ user/ngie/bug-237403/sys/dev/gpio/gpioc.c (revision 346926) @@ -1,231 +1,231 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2009 Oleksandr Tymoshenko * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include "gpio_if.h" #include "gpiobus_if.h" #undef GPIOC_DEBUG #ifdef GPIOC_DEBUG #define dprintf printf #else #define dprintf(x, arg...) 
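/* Without GPIOC_DEBUG, dprintf() expands to nothing. */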
#endif static int gpioc_probe(device_t dev); static int gpioc_attach(device_t dev); static int gpioc_detach(device_t dev); static d_ioctl_t gpioc_ioctl; static struct cdevsw gpioc_cdevsw = { .d_version = D_VERSION, .d_ioctl = gpioc_ioctl, .d_name = "gpioc", }; struct gpioc_softc { device_t sc_dev; /* gpiocX dev */ device_t sc_pdev; /* gpioX dev */ struct cdev *sc_ctl_dev; /* controller device */ int sc_unit; }; static int gpioc_probe(device_t dev) { device_set_desc(dev, "GPIO controller"); return (0); } static int gpioc_attach(device_t dev) { int err; struct gpioc_softc *sc; struct make_dev_args devargs; sc = device_get_softc(dev); sc->sc_dev = dev; sc->sc_pdev = device_get_parent(dev); sc->sc_unit = device_get_unit(dev); make_dev_args_init(&devargs); devargs.mda_devsw = &gpioc_cdevsw; devargs.mda_uid = UID_ROOT; devargs.mda_gid = GID_WHEEL; devargs.mda_mode = 0600; devargs.mda_si_drv1 = sc; err = make_dev_s(&devargs, &sc->sc_ctl_dev, "gpioc%d", sc->sc_unit); if (err != 0) { printf("Failed to create gpioc%d", sc->sc_unit); return (ENXIO); } return (0); } static int gpioc_detach(device_t dev) { struct gpioc_softc *sc = device_get_softc(dev); int err; if (sc->sc_ctl_dev) destroy_dev(sc->sc_ctl_dev); if ((err = bus_generic_detach(dev)) != 0) return (err); return (0); } static int gpioc_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag, struct thread *td) { device_t bus; int max_pin, res; struct gpioc_softc *sc = cdev->si_drv1; struct gpio_pin pin; struct gpio_req req; struct gpio_access_32 *a32; struct gpio_config_32 *c32; uint32_t caps; bus = GPIO_GET_BUS(sc->sc_pdev); if (bus == NULL) return (EINVAL); switch (cmd) { case GPIOMAXPIN: max_pin = -1; res = GPIO_PIN_MAX(sc->sc_pdev, &max_pin); bcopy(&max_pin, arg, sizeof(max_pin)); break; case GPIOGETCONFIG: bcopy(arg, &pin, sizeof(pin)); dprintf("get config pin %d\n", pin.gp_pin); res = GPIO_PIN_GETFLAGS(sc->sc_pdev, pin.gp_pin, &pin.gp_flags); /* Fail early */ if (res) break; GPIO_PIN_GETCAPS(sc->sc_pdev, pin.gp_pin, &pin.gp_caps); GPIOBUS_PIN_GETNAME(bus, pin.gp_pin, pin.gp_name); bcopy(&pin, arg, sizeof(pin)); break; case GPIOSETCONFIG: bcopy(arg, &pin, sizeof(pin)); dprintf("set config pin %d\n", pin.gp_pin); res = GPIO_PIN_GETCAPS(sc->sc_pdev, pin.gp_pin, &caps); if (res == 0) res = gpio_check_flags(caps, pin.gp_flags); if (res == 0) res = GPIO_PIN_SETFLAGS(sc->sc_pdev, pin.gp_pin, pin.gp_flags); break; case GPIOGET: bcopy(arg, &req, sizeof(req)); res = GPIO_PIN_GET(sc->sc_pdev, req.gp_pin, &req.gp_value); dprintf("read pin %d -> %d\n", req.gp_pin, req.gp_value); bcopy(&req, arg, sizeof(req)); break; case GPIOSET: bcopy(arg, &req, sizeof(req)); res = GPIO_PIN_SET(sc->sc_pdev, req.gp_pin, req.gp_value); dprintf("write pin %d -> %d\n", req.gp_pin, req.gp_value); break; case GPIOTOGGLE: bcopy(arg, &req, sizeof(req)); dprintf("toggle pin %d\n", req.gp_pin); res = GPIO_PIN_TOGGLE(sc->sc_pdev, req.gp_pin); break; case GPIOSETNAME: bcopy(arg, &pin, sizeof(pin)); dprintf("set name on pin %d\n", pin.gp_pin); res = GPIOBUS_PIN_SETNAME(bus, pin.gp_pin, pin.gp_name); break; case GPIOACCESS32: a32 = (struct gpio_access_32 *)arg; res = GPIO_PIN_ACCESS_32(sc->sc_pdev, a32->first_pin, - a32->clear_pins, a32->orig_pins, &a32->orig_pins); + a32->clear_pins, a32->change_pins, &a32->orig_pins); break; case GPIOCONFIG32: c32 = (struct gpio_config_32 *)arg; res = GPIO_PIN_CONFIG_32(sc->sc_pdev, c32->first_pin, c32->num_pins, c32->pin_flags); break; default: return (ENOTTY); break; } return (res); } static device_method_t gpioc_methods[] = { /* Device 
interface */ DEVMETHOD(device_probe, gpioc_probe), DEVMETHOD(device_attach, gpioc_attach), DEVMETHOD(device_detach, gpioc_detach), DEVMETHOD(device_shutdown, bus_generic_shutdown), DEVMETHOD(device_suspend, bus_generic_suspend), DEVMETHOD(device_resume, bus_generic_resume), DEVMETHOD_END }; driver_t gpioc_driver = { "gpioc", gpioc_methods, sizeof(struct gpioc_softc) }; devclass_t gpioc_devclass; DRIVER_MODULE(gpioc, gpio, gpioc_driver, gpioc_devclass, 0, 0); MODULE_VERSION(gpioc, 1); Index: user/ngie/bug-237403/sys/dev/isp/isp_pci.c =================================================================== --- user/ngie/bug-237403/sys/dev/isp/isp_pci.c (revision 346925) +++ user/ngie/bug-237403/sys/dev/isp/isp_pci.c (revision 346926) @@ -1,2001 +1,2010 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2009-2018 Alexander Motin * Copyright (c) 1997-2008 by Matthew Jacob * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice immediately at the beginning of the file, without modification, * this list of conditions, and the following disclaimer. * 2. The name of the author may not be used to endorse or promote products * derived from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ /* * PCI specific probe and attach routines for Qlogic ISP SCSI adapters. * FreeBSD Version. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef __sparc64__ #include #include #endif #include static uint32_t isp_pci_rd_reg(ispsoftc_t *, int); static void isp_pci_wr_reg(ispsoftc_t *, int, uint32_t); static uint32_t isp_pci_rd_reg_1080(ispsoftc_t *, int); static void isp_pci_wr_reg_1080(ispsoftc_t *, int, uint32_t); static uint32_t isp_pci_rd_reg_2400(ispsoftc_t *, int); static void isp_pci_wr_reg_2400(ispsoftc_t *, int, uint32_t); static uint32_t isp_pci_rd_reg_2600(ispsoftc_t *, int); static void isp_pci_wr_reg_2600(ispsoftc_t *, int, uint32_t); static void isp_pci_run_isr(ispsoftc_t *); static void isp_pci_run_isr_2300(ispsoftc_t *); static void isp_pci_run_isr_2400(ispsoftc_t *); static int isp_pci_mbxdma(ispsoftc_t *); static void isp_pci_mbxdmafree(ispsoftc_t *); static int isp_pci_dmasetup(ispsoftc_t *, XS_T *, void *); static int isp_pci_irqsetup(ispsoftc_t *); static void isp_pci_dumpregs(ispsoftc_t *, const char *); static struct ispmdvec mdvec = { isp_pci_run_isr, isp_pci_rd_reg, isp_pci_wr_reg, isp_pci_mbxdma, isp_pci_dmasetup, isp_common_dmateardown, isp_pci_irqsetup, isp_pci_dumpregs, NULL, BIU_BURST_ENABLE|BIU_PCI_CONF1_FIFO_64 }; static struct ispmdvec mdvec_1080 = { isp_pci_run_isr, isp_pci_rd_reg_1080, isp_pci_wr_reg_1080, isp_pci_mbxdma, isp_pci_dmasetup, isp_common_dmateardown, isp_pci_irqsetup, isp_pci_dumpregs, NULL, BIU_BURST_ENABLE|BIU_PCI_CONF1_FIFO_64 }; static struct ispmdvec mdvec_12160 = { isp_pci_run_isr, isp_pci_rd_reg_1080, isp_pci_wr_reg_1080, isp_pci_mbxdma, isp_pci_dmasetup, isp_common_dmateardown, isp_pci_irqsetup, isp_pci_dumpregs, NULL, BIU_BURST_ENABLE|BIU_PCI_CONF1_FIFO_64 }; static struct ispmdvec mdvec_2100 = { isp_pci_run_isr, isp_pci_rd_reg, isp_pci_wr_reg, isp_pci_mbxdma, isp_pci_dmasetup, isp_common_dmateardown, isp_pci_irqsetup, isp_pci_dumpregs }; static struct ispmdvec mdvec_2200 = { isp_pci_run_isr, isp_pci_rd_reg, isp_pci_wr_reg, isp_pci_mbxdma, isp_pci_dmasetup, isp_common_dmateardown, isp_pci_irqsetup, isp_pci_dumpregs }; static struct ispmdvec mdvec_2300 = { isp_pci_run_isr_2300, isp_pci_rd_reg, isp_pci_wr_reg, isp_pci_mbxdma, isp_pci_dmasetup, isp_common_dmateardown, isp_pci_irqsetup, isp_pci_dumpregs }; static struct ispmdvec mdvec_2400 = { isp_pci_run_isr_2400, isp_pci_rd_reg_2400, isp_pci_wr_reg_2400, isp_pci_mbxdma, isp_pci_dmasetup, isp_common_dmateardown, isp_pci_irqsetup, NULL }; static struct ispmdvec mdvec_2500 = { isp_pci_run_isr_2400, isp_pci_rd_reg_2400, isp_pci_wr_reg_2400, isp_pci_mbxdma, isp_pci_dmasetup, isp_common_dmateardown, isp_pci_irqsetup, NULL }; static struct ispmdvec mdvec_2600 = { isp_pci_run_isr_2400, isp_pci_rd_reg_2600, isp_pci_wr_reg_2600, isp_pci_mbxdma, isp_pci_dmasetup, isp_common_dmateardown, isp_pci_irqsetup, NULL }; static struct ispmdvec mdvec_2700 = { isp_pci_run_isr_2400, isp_pci_rd_reg_2600, isp_pci_wr_reg_2600, isp_pci_mbxdma, isp_pci_dmasetup, isp_common_dmateardown, isp_pci_irqsetup, NULL }; #ifndef PCIM_CMD_INVEN #define PCIM_CMD_INVEN 0x10 #endif #ifndef PCIM_CMD_BUSMASTEREN #define PCIM_CMD_BUSMASTEREN 0x0004 #endif #ifndef PCIM_CMD_PERRESPEN #define PCIM_CMD_PERRESPEN 0x0040 #endif #ifndef PCIM_CMD_SEREN #define PCIM_CMD_SEREN 0x0100 #endif #ifndef PCIM_CMD_INTX_DISABLE #define PCIM_CMD_INTX_DISABLE 0x0400 #endif #ifndef PCIR_COMMAND #define PCIR_COMMAND 0x04 #endif #ifndef PCIR_CACHELNSZ #define PCIR_CACHELNSZ 0x0c #endif #ifndef PCIR_LATTIMER #define 
PCIR_LATTIMER 0x0d #endif #ifndef PCIR_ROMADDR #define PCIR_ROMADDR 0x30 #endif #define PCI_VENDOR_QLOGIC 0x1077 #define PCI_PRODUCT_QLOGIC_ISP1020 0x1020 #define PCI_PRODUCT_QLOGIC_ISP1080 0x1080 #define PCI_PRODUCT_QLOGIC_ISP10160 0x1016 #define PCI_PRODUCT_QLOGIC_ISP12160 0x1216 #define PCI_PRODUCT_QLOGIC_ISP1240 0x1240 #define PCI_PRODUCT_QLOGIC_ISP1280 0x1280 #define PCI_PRODUCT_QLOGIC_ISP2100 0x2100 #define PCI_PRODUCT_QLOGIC_ISP2200 0x2200 #define PCI_PRODUCT_QLOGIC_ISP2300 0x2300 #define PCI_PRODUCT_QLOGIC_ISP2312 0x2312 #define PCI_PRODUCT_QLOGIC_ISP2322 0x2322 #define PCI_PRODUCT_QLOGIC_ISP2422 0x2422 #define PCI_PRODUCT_QLOGIC_ISP2432 0x2432 #define PCI_PRODUCT_QLOGIC_ISP2532 0x2532 #define PCI_PRODUCT_QLOGIC_ISP5432 0x5432 #define PCI_PRODUCT_QLOGIC_ISP6312 0x6312 #define PCI_PRODUCT_QLOGIC_ISP6322 0x6322 #define PCI_PRODUCT_QLOGIC_ISP2031 0x2031 #define PCI_PRODUCT_QLOGIC_ISP8031 0x8031 #define PCI_PRODUCT_QLOGIC_ISP2684 0x2171 #define PCI_PRODUCT_QLOGIC_ISP2692 0x2b61 #define PCI_PRODUCT_QLOGIC_ISP2714 0x2071 #define PCI_PRODUCT_QLOGIC_ISP2722 0x2261 #define PCI_QLOGIC_ISP1020 \ ((PCI_PRODUCT_QLOGIC_ISP1020 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP1080 \ ((PCI_PRODUCT_QLOGIC_ISP1080 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP10160 \ ((PCI_PRODUCT_QLOGIC_ISP10160 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP12160 \ ((PCI_PRODUCT_QLOGIC_ISP12160 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP1240 \ ((PCI_PRODUCT_QLOGIC_ISP1240 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP1280 \ ((PCI_PRODUCT_QLOGIC_ISP1280 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP2100 \ ((PCI_PRODUCT_QLOGIC_ISP2100 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP2200 \ ((PCI_PRODUCT_QLOGIC_ISP2200 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP2300 \ ((PCI_PRODUCT_QLOGIC_ISP2300 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP2312 \ ((PCI_PRODUCT_QLOGIC_ISP2312 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP2322 \ ((PCI_PRODUCT_QLOGIC_ISP2322 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP2422 \ ((PCI_PRODUCT_QLOGIC_ISP2422 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP2432 \ ((PCI_PRODUCT_QLOGIC_ISP2432 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP2532 \ ((PCI_PRODUCT_QLOGIC_ISP2532 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP5432 \ ((PCI_PRODUCT_QLOGIC_ISP5432 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP6312 \ ((PCI_PRODUCT_QLOGIC_ISP6312 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP6322 \ ((PCI_PRODUCT_QLOGIC_ISP6322 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP2031 \ ((PCI_PRODUCT_QLOGIC_ISP2031 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP8031 \ ((PCI_PRODUCT_QLOGIC_ISP8031 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP2684 \ ((PCI_PRODUCT_QLOGIC_ISP2684 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP2692 \ ((PCI_PRODUCT_QLOGIC_ISP2692 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP2714 \ ((PCI_PRODUCT_QLOGIC_ISP2714 << 16) | PCI_VENDOR_QLOGIC) #define PCI_QLOGIC_ISP2722 \ ((PCI_PRODUCT_QLOGIC_ISP2722 << 16) | PCI_VENDOR_QLOGIC) /* * Odd case for some AMI raid cards... We need to *not* attach to this. 
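 * isp_pci_probe() below returns ENXIO for ISP12160 devices whose PCI
 * subvendor matches AMI_RAID_SUBVENDOR_ID.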
*/ #define AMI_RAID_SUBVENDOR_ID 0x101e #define PCI_DFLT_LTNCY 0x40 #define PCI_DFLT_LNSZ 0x10 static int isp_pci_probe (device_t); static int isp_pci_attach (device_t); static int isp_pci_detach (device_t); #define ISP_PCD(isp) ((struct isp_pcisoftc *)isp)->pci_dev struct isp_pcisoftc { ispsoftc_t pci_isp; device_t pci_dev; struct resource * regs; struct resource * regs1; struct resource * regs2; struct { int iqd; struct resource * irq; void * ih; } irq[ISP_MAX_IRQS]; int rtp; int rgd; int rtp1; int rgd1; int rtp2; int rgd2; int16_t pci_poff[_NREG_BLKS]; bus_dma_tag_t dmat; int msicount; }; static device_method_t isp_pci_methods[] = { /* Device interface */ DEVMETHOD(device_probe, isp_pci_probe), DEVMETHOD(device_attach, isp_pci_attach), DEVMETHOD(device_detach, isp_pci_detach), { 0, 0 } }; static driver_t isp_pci_driver = { "isp", isp_pci_methods, sizeof (struct isp_pcisoftc) }; static devclass_t isp_devclass; DRIVER_MODULE(isp, pci, isp_pci_driver, isp_devclass, 0, 0); MODULE_DEPEND(isp, cam, 1, 1, 1); MODULE_DEPEND(isp, firmware, 1, 1, 1); static int isp_nvports = 0; static int isp_pci_probe(device_t dev) { switch ((pci_get_device(dev) << 16) | (pci_get_vendor(dev))) { case PCI_QLOGIC_ISP1020: device_set_desc(dev, "Qlogic ISP 1020/1040 PCI SCSI Adapter"); break; case PCI_QLOGIC_ISP1080: device_set_desc(dev, "Qlogic ISP 1080 PCI SCSI Adapter"); break; case PCI_QLOGIC_ISP1240: device_set_desc(dev, "Qlogic ISP 1240 PCI SCSI Adapter"); break; case PCI_QLOGIC_ISP1280: device_set_desc(dev, "Qlogic ISP 1280 PCI SCSI Adapter"); break; case PCI_QLOGIC_ISP10160: device_set_desc(dev, "Qlogic ISP 10160 PCI SCSI Adapter"); break; case PCI_QLOGIC_ISP12160: if (pci_get_subvendor(dev) == AMI_RAID_SUBVENDOR_ID) { return (ENXIO); } device_set_desc(dev, "Qlogic ISP 12160 PCI SCSI Adapter"); break; case PCI_QLOGIC_ISP2100: device_set_desc(dev, "Qlogic ISP 2100 PCI FC-AL Adapter"); break; case PCI_QLOGIC_ISP2200: device_set_desc(dev, "Qlogic ISP 2200 PCI FC-AL Adapter"); break; case PCI_QLOGIC_ISP2300: device_set_desc(dev, "Qlogic ISP 2300 PCI FC-AL Adapter"); break; case PCI_QLOGIC_ISP2312: device_set_desc(dev, "Qlogic ISP 2312 PCI FC-AL Adapter"); break; case PCI_QLOGIC_ISP2322: device_set_desc(dev, "Qlogic ISP 2322 PCI FC-AL Adapter"); break; case PCI_QLOGIC_ISP2422: device_set_desc(dev, "Qlogic ISP 2422 PCI FC-AL Adapter"); break; case PCI_QLOGIC_ISP2432: device_set_desc(dev, "Qlogic ISP 2432 PCI FC-AL Adapter"); break; case PCI_QLOGIC_ISP2532: device_set_desc(dev, "Qlogic ISP 2532 PCI FC-AL Adapter"); break; case PCI_QLOGIC_ISP5432: device_set_desc(dev, "Qlogic ISP 5432 PCI FC-AL Adapter"); break; case PCI_QLOGIC_ISP6312: device_set_desc(dev, "Qlogic ISP 6312 PCI FC-AL Adapter"); break; case PCI_QLOGIC_ISP6322: device_set_desc(dev, "Qlogic ISP 6322 PCI FC-AL Adapter"); break; case PCI_QLOGIC_ISP2031: device_set_desc(dev, "Qlogic ISP 2031 PCI FC-AL Adapter"); break; case PCI_QLOGIC_ISP8031: device_set_desc(dev, "Qlogic ISP 8031 PCI FCoE Adapter"); break; case PCI_QLOGIC_ISP2684: device_set_desc(dev, "Qlogic ISP 2684 PCI FC Adapter"); break; case PCI_QLOGIC_ISP2692: device_set_desc(dev, "Qlogic ISP 2692 PCI FC Adapter"); break; case PCI_QLOGIC_ISP2714: device_set_desc(dev, "Qlogic ISP 2714 PCI FC Adapter"); break; case PCI_QLOGIC_ISP2722: device_set_desc(dev, "Qlogic ISP 2722 PCI FC Adapter"); break; default: return (ENXIO); } if (isp_announced == 0 && bootverbose) { printf("Qlogic ISP Driver, FreeBSD Version %d.%d, " "Core Version %d.%d\n", ISP_PLATFORM_VERSION_MAJOR, ISP_PLATFORM_VERSION_MINOR, 
ISP_CORE_VERSION_MAJOR, ISP_CORE_VERSION_MINOR); isp_announced++; } /* * XXXX: Here is where we might load the f/w module * XXXX: (or increase a reference count to it). */ return (BUS_PROBE_DEFAULT); } static void isp_get_generic_options(device_t dev, ispsoftc_t *isp) { int tval; tval = 0; if (resource_int_value(device_get_name(dev), device_get_unit(dev), "fwload_disable", &tval) == 0 && tval != 0) { isp->isp_confopts |= ISP_CFG_NORELOAD; } tval = 0; if (resource_int_value(device_get_name(dev), device_get_unit(dev), "ignore_nvram", &tval) == 0 && tval != 0) { isp->isp_confopts |= ISP_CFG_NONVRAM; } tval = 0; (void) resource_int_value(device_get_name(dev), device_get_unit(dev), "debug", &tval); if (tval) { isp->isp_dblev = tval; } else { isp->isp_dblev = ISP_LOGWARN|ISP_LOGERR; } if (bootverbose) { isp->isp_dblev |= ISP_LOGCONFIG|ISP_LOGINFO; } tval = -1; (void) resource_int_value(device_get_name(dev), device_get_unit(dev), "vports", &tval); if (tval > 0 && tval <= 254) { isp_nvports = tval; } tval = 7; (void) resource_int_value(device_get_name(dev), device_get_unit(dev), "quickboot_time", &tval); isp_quickboot_time = tval; } static void isp_get_specific_options(device_t dev, int chan, ispsoftc_t *isp) { const char *sptr; int tval = 0; char prefix[12], name[16]; if (chan == 0) prefix[0] = 0; else snprintf(prefix, sizeof(prefix), "chan%d.", chan); snprintf(name, sizeof(name), "%siid", prefix); if (resource_int_value(device_get_name(dev), device_get_unit(dev), name, &tval)) { if (IS_FC(isp)) { ISP_FC_PC(isp, chan)->default_id = 109 - chan; } else { #ifdef __sparc64__ ISP_SPI_PC(isp, chan)->iid = OF_getscsinitid(dev); #else ISP_SPI_PC(isp, chan)->iid = 7; #endif } } else { if (IS_FC(isp)) { ISP_FC_PC(isp, chan)->default_id = tval - chan; } else { ISP_SPI_PC(isp, chan)->iid = tval; } isp->isp_confopts |= ISP_CFG_OWNLOOPID; } if (IS_SCSI(isp)) return; tval = -1; snprintf(name, sizeof(name), "%srole", prefix); if (resource_int_value(device_get_name(dev), device_get_unit(dev), name, &tval) == 0) { switch (tval) { case ISP_ROLE_NONE: case ISP_ROLE_INITIATOR: case ISP_ROLE_TARGET: case ISP_ROLE_BOTH: device_printf(dev, "Chan %d setting role to 0x%x\n", chan, tval); break; default: tval = -1; break; } } if (tval == -1) { tval = ISP_DEFAULT_ROLES; } ISP_FC_PC(isp, chan)->def_role = tval; tval = 0; snprintf(name, sizeof(name), "%sfullduplex", prefix); if (resource_int_value(device_get_name(dev), device_get_unit(dev), name, &tval) == 0 && tval != 0) { isp->isp_confopts |= ISP_CFG_FULL_DUPLEX; } sptr = NULL; snprintf(name, sizeof(name), "%stopology", prefix); if (resource_string_value(device_get_name(dev), device_get_unit(dev), name, (const char **) &sptr) == 0 && sptr != NULL) { if (strcmp(sptr, "lport") == 0) { isp->isp_confopts |= ISP_CFG_LPORT; } else if (strcmp(sptr, "nport") == 0) { isp->isp_confopts |= ISP_CFG_NPORT; } else if (strcmp(sptr, "lport-only") == 0) { isp->isp_confopts |= ISP_CFG_LPORT_ONLY; } else if (strcmp(sptr, "nport-only") == 0) { isp->isp_confopts |= ISP_CFG_NPORT_ONLY; } } #ifdef ISP_FCTAPE_OFF isp->isp_confopts |= ISP_CFG_NOFCTAPE; #else isp->isp_confopts |= ISP_CFG_FCTAPE; #endif tval = 0; snprintf(name, sizeof(name), "%snofctape", prefix); (void) resource_int_value(device_get_name(dev), device_get_unit(dev), name, &tval); if (tval) { isp->isp_confopts &= ~ISP_CFG_FCTAPE; isp->isp_confopts |= ISP_CFG_NOFCTAPE; } tval = 0; snprintf(name, sizeof(name), "%sfctape", prefix); (void) resource_int_value(device_get_name(dev), device_get_unit(dev), name, &tval); if (tval) { 
isp->isp_confopts &= ~ISP_CFG_NOFCTAPE; isp->isp_confopts |= ISP_CFG_FCTAPE; } /* * Because the resource_*_value functions can neither return * 64 bit integer values, nor can they be directly coerced * to interpret the right hand side of the assignment as * you want them to interpret it, we have to force WWN * hint replacement to specify WWN strings with a leading * 'w' (e..g w50000000aaaa0001). Sigh. */ sptr = NULL; snprintf(name, sizeof(name), "%sportwwn", prefix); tval = resource_string_value(device_get_name(dev), device_get_unit(dev), name, (const char **) &sptr); if (tval == 0 && sptr != NULL && *sptr++ == 'w') { char *eptr = NULL; ISP_FC_PC(isp, chan)->def_wwpn = strtouq(sptr, &eptr, 16); if (eptr < sptr + 16 || ISP_FC_PC(isp, chan)->def_wwpn == -1) { device_printf(dev, "mangled portwwn hint '%s'\n", sptr); ISP_FC_PC(isp, chan)->def_wwpn = 0; } } sptr = NULL; snprintf(name, sizeof(name), "%snodewwn", prefix); tval = resource_string_value(device_get_name(dev), device_get_unit(dev), name, (const char **) &sptr); if (tval == 0 && sptr != NULL && *sptr++ == 'w') { char *eptr = NULL; ISP_FC_PC(isp, chan)->def_wwnn = strtouq(sptr, &eptr, 16); if (eptr < sptr + 16 || ISP_FC_PC(isp, chan)->def_wwnn == 0) { device_printf(dev, "mangled nodewwn hint '%s'\n", sptr); ISP_FC_PC(isp, chan)->def_wwnn = 0; } } tval = -1; snprintf(name, sizeof(name), "%sloop_down_limit", prefix); (void) resource_int_value(device_get_name(dev), device_get_unit(dev), name, &tval); if (tval >= 0 && tval < 0xffff) { ISP_FC_PC(isp, chan)->loop_down_limit = tval; } else { ISP_FC_PC(isp, chan)->loop_down_limit = isp_loop_down_limit; } tval = -1; snprintf(name, sizeof(name), "%sgone_device_time", prefix); (void) resource_int_value(device_get_name(dev), device_get_unit(dev), name, &tval); if (tval >= 0 && tval < 0xffff) { ISP_FC_PC(isp, chan)->gone_device_time = tval; } else { ISP_FC_PC(isp, chan)->gone_device_time = isp_gone_device_time; } } static int isp_pci_attach(device_t dev) { struct isp_pcisoftc *pcs = device_get_softc(dev); ispsoftc_t *isp = &pcs->pci_isp; int i; uint32_t data, cmd, linesz, did; size_t psize, xsize; char fwname[32]; pcs->pci_dev = dev; isp->isp_dev = dev; isp->isp_nchan = 1; mtx_init(&isp->isp_lock, "isp", NULL, MTX_DEF); /* * Get Generic Options */ isp_nvports = 0; isp_get_generic_options(dev, isp); linesz = PCI_DFLT_LNSZ; pcs->regs = pcs->regs2 = NULL; pcs->rgd = pcs->rtp = 0; pcs->pci_dev = dev; pcs->pci_poff[BIU_BLOCK >> _BLK_REG_SHFT] = BIU_REGS_OFF; pcs->pci_poff[MBOX_BLOCK >> _BLK_REG_SHFT] = PCI_MBOX_REGS_OFF; pcs->pci_poff[SXP_BLOCK >> _BLK_REG_SHFT] = PCI_SXP_REGS_OFF; pcs->pci_poff[RISC_BLOCK >> _BLK_REG_SHFT] = PCI_RISC_REGS_OFF; pcs->pci_poff[DMA_BLOCK >> _BLK_REG_SHFT] = DMA_REGS_OFF; switch (pci_get_devid(dev)) { case PCI_QLOGIC_ISP1020: did = 0x1040; isp->isp_mdvec = &mdvec; isp->isp_type = ISP_HA_SCSI_UNKNOWN; break; case PCI_QLOGIC_ISP1080: did = 0x1080; isp->isp_mdvec = &mdvec_1080; isp->isp_type = ISP_HA_SCSI_1080; pcs->pci_poff[DMA_BLOCK >> _BLK_REG_SHFT] = ISP1080_DMA_REGS_OFF; break; case PCI_QLOGIC_ISP1240: did = 0x1080; isp->isp_mdvec = &mdvec_1080; isp->isp_type = ISP_HA_SCSI_1240; isp->isp_nchan = 2; pcs->pci_poff[DMA_BLOCK >> _BLK_REG_SHFT] = ISP1080_DMA_REGS_OFF; break; case PCI_QLOGIC_ISP1280: did = 0x1080; isp->isp_mdvec = &mdvec_1080; isp->isp_type = ISP_HA_SCSI_1280; pcs->pci_poff[DMA_BLOCK >> _BLK_REG_SHFT] = ISP1080_DMA_REGS_OFF; break; case PCI_QLOGIC_ISP10160: did = 0x12160; isp->isp_mdvec = &mdvec_12160; isp->isp_type = ISP_HA_SCSI_10160; pcs->pci_poff[DMA_BLOCK 
>> _BLK_REG_SHFT] = ISP1080_DMA_REGS_OFF; break; case PCI_QLOGIC_ISP12160: did = 0x12160; isp->isp_nchan = 2; isp->isp_mdvec = &mdvec_12160; isp->isp_type = ISP_HA_SCSI_12160; pcs->pci_poff[DMA_BLOCK >> _BLK_REG_SHFT] = ISP1080_DMA_REGS_OFF; break; case PCI_QLOGIC_ISP2100: did = 0x2100; isp->isp_mdvec = &mdvec_2100; isp->isp_type = ISP_HA_FC_2100; pcs->pci_poff[MBOX_BLOCK >> _BLK_REG_SHFT] = PCI_MBOX_REGS2100_OFF; if (pci_get_revid(dev) < 3) { /* * XXX: Need to get the actual revision * XXX: number of the 2100 FB. At any rate, * XXX: lower cache line size for early revision * XXX; boards. */ linesz = 1; } break; case PCI_QLOGIC_ISP2200: did = 0x2200; isp->isp_mdvec = &mdvec_2200; isp->isp_type = ISP_HA_FC_2200; pcs->pci_poff[MBOX_BLOCK >> _BLK_REG_SHFT] = PCI_MBOX_REGS2100_OFF; break; case PCI_QLOGIC_ISP2300: did = 0x2300; isp->isp_mdvec = &mdvec_2300; isp->isp_type = ISP_HA_FC_2300; pcs->pci_poff[MBOX_BLOCK >> _BLK_REG_SHFT] = PCI_MBOX_REGS2300_OFF; break; case PCI_QLOGIC_ISP2312: case PCI_QLOGIC_ISP6312: did = 0x2300; isp->isp_mdvec = &mdvec_2300; isp->isp_type = ISP_HA_FC_2312; pcs->pci_poff[MBOX_BLOCK >> _BLK_REG_SHFT] = PCI_MBOX_REGS2300_OFF; break; case PCI_QLOGIC_ISP2322: case PCI_QLOGIC_ISP6322: did = 0x2322; isp->isp_mdvec = &mdvec_2300; isp->isp_type = ISP_HA_FC_2322; pcs->pci_poff[MBOX_BLOCK >> _BLK_REG_SHFT] = PCI_MBOX_REGS2300_OFF; break; case PCI_QLOGIC_ISP2422: case PCI_QLOGIC_ISP2432: did = 0x2400; isp->isp_nchan += isp_nvports; isp->isp_mdvec = &mdvec_2400; isp->isp_type = ISP_HA_FC_2400; pcs->pci_poff[MBOX_BLOCK >> _BLK_REG_SHFT] = PCI_MBOX_REGS2400_OFF; break; case PCI_QLOGIC_ISP2532: did = 0x2500; isp->isp_nchan += isp_nvports; isp->isp_mdvec = &mdvec_2500; isp->isp_type = ISP_HA_FC_2500; pcs->pci_poff[MBOX_BLOCK >> _BLK_REG_SHFT] = PCI_MBOX_REGS2400_OFF; break; case PCI_QLOGIC_ISP5432: did = 0x2500; isp->isp_mdvec = &mdvec_2500; isp->isp_type = ISP_HA_FC_2500; pcs->pci_poff[MBOX_BLOCK >> _BLK_REG_SHFT] = PCI_MBOX_REGS2400_OFF; break; case PCI_QLOGIC_ISP2031: case PCI_QLOGIC_ISP8031: did = 0x2600; isp->isp_nchan += isp_nvports; isp->isp_mdvec = &mdvec_2600; isp->isp_type = ISP_HA_FC_2600; pcs->pci_poff[MBOX_BLOCK >> _BLK_REG_SHFT] = PCI_MBOX_REGS2400_OFF; break; case PCI_QLOGIC_ISP2684: case PCI_QLOGIC_ISP2692: case PCI_QLOGIC_ISP2714: case PCI_QLOGIC_ISP2722: did = 0x2700; isp->isp_nchan += isp_nvports; isp->isp_mdvec = &mdvec_2700; isp->isp_type = ISP_HA_FC_2700; pcs->pci_poff[MBOX_BLOCK >> _BLK_REG_SHFT] = PCI_MBOX_REGS2400_OFF; break; default: device_printf(dev, "unknown device type\n"); goto bad; break; } isp->isp_revision = pci_get_revid(dev); if (IS_26XX(isp)) { pcs->rtp = SYS_RES_MEMORY; pcs->rgd = PCIR_BAR(0); pcs->regs = bus_alloc_resource_any(dev, pcs->rtp, &pcs->rgd, RF_ACTIVE); pcs->rtp1 = SYS_RES_MEMORY; pcs->rgd1 = PCIR_BAR(2); pcs->regs1 = bus_alloc_resource_any(dev, pcs->rtp1, &pcs->rgd1, RF_ACTIVE); pcs->rtp2 = SYS_RES_MEMORY; pcs->rgd2 = PCIR_BAR(4); pcs->regs2 = bus_alloc_resource_any(dev, pcs->rtp2, &pcs->rgd2, RF_ACTIVE); } else { pcs->rtp = SYS_RES_MEMORY; pcs->rgd = PCIR_BAR(1); pcs->regs = bus_alloc_resource_any(dev, pcs->rtp, &pcs->rgd, RF_ACTIVE); if (pcs->regs == NULL) { pcs->rtp = SYS_RES_IOPORT; pcs->rgd = PCIR_BAR(0); pcs->regs = bus_alloc_resource_any(dev, pcs->rtp, &pcs->rgd, RF_ACTIVE); } } if (pcs->regs == NULL) { device_printf(dev, "Unable to map any ports\n"); goto bad; } if (bootverbose) { device_printf(dev, "Using %s space register mapping\n", (pcs->rtp == SYS_RES_IOPORT)? 
"I/O" : "Memory"); } isp->isp_regs = pcs->regs; isp->isp_regs2 = pcs->regs2; if (IS_FC(isp)) { psize = sizeof (fcparam); xsize = sizeof (struct isp_fc); } else { psize = sizeof (sdparam); xsize = sizeof (struct isp_spi); } psize *= isp->isp_nchan; xsize *= isp->isp_nchan; isp->isp_param = malloc(psize, M_DEVBUF, M_NOWAIT | M_ZERO); if (isp->isp_param == NULL) { device_printf(dev, "cannot allocate parameter data\n"); goto bad; } isp->isp_osinfo.pc.ptr = malloc(xsize, M_DEVBUF, M_NOWAIT | M_ZERO); if (isp->isp_osinfo.pc.ptr == NULL) { device_printf(dev, "cannot allocate parameter data\n"); goto bad; } /* * Now that we know who we are (roughly) get/set specific options */ for (i = 0; i < isp->isp_nchan; i++) { isp_get_specific_options(dev, i, isp); } isp->isp_osinfo.fw = NULL; if (isp->isp_osinfo.fw == NULL) { snprintf(fwname, sizeof (fwname), "isp_%04x", did); isp->isp_osinfo.fw = firmware_get(fwname); } if (isp->isp_osinfo.fw != NULL) { isp_prt(isp, ISP_LOGCONFIG, "loaded firmware %s", fwname); isp->isp_mdvec->dv_ispfw = isp->isp_osinfo.fw->data; } /* * Make sure that SERR, PERR, WRITE INVALIDATE and BUSMASTER are set. */ cmd = pci_read_config(dev, PCIR_COMMAND, 2); cmd |= PCIM_CMD_SEREN | PCIM_CMD_PERRESPEN | PCIM_CMD_BUSMASTEREN | PCIM_CMD_INVEN; if (IS_2300(isp)) { /* per QLogic errata */ cmd &= ~PCIM_CMD_INVEN; } if (IS_2322(isp) || pci_get_devid(dev) == PCI_QLOGIC_ISP6312) { cmd &= ~PCIM_CMD_INTX_DISABLE; } if (IS_24XX(isp)) { cmd &= ~PCIM_CMD_INTX_DISABLE; } pci_write_config(dev, PCIR_COMMAND, cmd, 2); /* * Make sure the Cache Line Size register is set sensibly. */ data = pci_read_config(dev, PCIR_CACHELNSZ, 1); if (data == 0 || (linesz != PCI_DFLT_LNSZ && data != linesz)) { isp_prt(isp, ISP_LOGDEBUG0, "set PCI line size to %d from %d", linesz, data); data = linesz; pci_write_config(dev, PCIR_CACHELNSZ, data, 1); } /* * Make sure the Latency Timer is sane. */ data = pci_read_config(dev, PCIR_LATTIMER, 1); if (data < PCI_DFLT_LTNCY) { data = PCI_DFLT_LTNCY; isp_prt(isp, ISP_LOGDEBUG0, "set PCI latency to %d", data); pci_write_config(dev, PCIR_LATTIMER, data, 1); } /* * Make sure we've disabled the ROM. */ data = pci_read_config(dev, PCIR_ROMADDR, 4); data &= ~1; pci_write_config(dev, PCIR_ROMADDR, data, 4); /* * Last minute checks... */ if (IS_23XX(isp) || IS_24XX(isp)) { isp->isp_port = pci_get_function(dev); } /* * Make sure we're in reset state. */ ISP_LOCK(isp); if (isp_reinit(isp, 1) != 0) { ISP_UNLOCK(isp); goto bad; } ISP_UNLOCK(isp); if (isp_attach(isp)) { ISP_LOCK(isp); isp_shutdown(isp); ISP_UNLOCK(isp); goto bad; } return (0); bad: + if (isp->isp_osinfo.fw == NULL && !IS_26XX(isp)) { + /* + * Failure to attach at boot time might have been caused + * by a missing ispfw(4). Except for for 16Gb adapters, + * there's no loadable firmware for them. 
+ */ + isp_prt(isp, ISP_LOGWARN, "See the ispfw(4) man page on " + "how to load known good firmware at boot time"); + } for (i = 0; i < isp->isp_nirq; i++) { (void) bus_teardown_intr(dev, pcs->irq[i].irq, pcs->irq[i].ih); (void) bus_release_resource(dev, SYS_RES_IRQ, pcs->irq[i].iqd, pcs->irq[0].irq); } if (pcs->msicount) { pci_release_msi(dev); } if (pcs->regs) (void) bus_release_resource(dev, pcs->rtp, pcs->rgd, pcs->regs); if (pcs->regs1) (void) bus_release_resource(dev, pcs->rtp1, pcs->rgd1, pcs->regs1); if (pcs->regs2) (void) bus_release_resource(dev, pcs->rtp2, pcs->rgd2, pcs->regs2); if (pcs->pci_isp.isp_param) { free(pcs->pci_isp.isp_param, M_DEVBUF); pcs->pci_isp.isp_param = NULL; } if (pcs->pci_isp.isp_osinfo.pc.ptr) { free(pcs->pci_isp.isp_osinfo.pc.ptr, M_DEVBUF); pcs->pci_isp.isp_osinfo.pc.ptr = NULL; } mtx_destroy(&isp->isp_lock); return (ENXIO); } static int isp_pci_detach(device_t dev) { struct isp_pcisoftc *pcs = device_get_softc(dev); ispsoftc_t *isp = &pcs->pci_isp; int i, status; status = isp_detach(isp); if (status) return (status); ISP_LOCK(isp); isp_shutdown(isp); ISP_UNLOCK(isp); for (i = 0; i < isp->isp_nirq; i++) { (void) bus_teardown_intr(dev, pcs->irq[i].irq, pcs->irq[i].ih); (void) bus_release_resource(dev, SYS_RES_IRQ, pcs->irq[i].iqd, pcs->irq[i].irq); } if (pcs->msicount) pci_release_msi(dev); (void) bus_release_resource(dev, pcs->rtp, pcs->rgd, pcs->regs); if (pcs->regs1) (void) bus_release_resource(dev, pcs->rtp1, pcs->rgd1, pcs->regs1); if (pcs->regs2) (void) bus_release_resource(dev, pcs->rtp2, pcs->rgd2, pcs->regs2); isp_pci_mbxdmafree(isp); if (pcs->pci_isp.isp_param) { free(pcs->pci_isp.isp_param, M_DEVBUF); pcs->pci_isp.isp_param = NULL; } if (pcs->pci_isp.isp_osinfo.pc.ptr) { free(pcs->pci_isp.isp_osinfo.pc.ptr, M_DEVBUF); pcs->pci_isp.isp_osinfo.pc.ptr = NULL; } mtx_destroy(&isp->isp_lock); return (0); } #define IspVirt2Off(a, x) \ (((struct isp_pcisoftc *)a)->pci_poff[((x) & _BLK_REG_MASK) >> \ _BLK_REG_SHFT] + ((x) & 0xfff)) #define BXR2(isp, off) bus_read_2((isp)->isp_regs, (off)) #define BXW2(isp, off, v) bus_write_2((isp)->isp_regs, (off), (v)) #define BXR4(isp, off) bus_read_4((isp)->isp_regs, (off)) #define BXW4(isp, off, v) bus_write_4((isp)->isp_regs, (off), (v)) #define B2R4(isp, off) bus_read_4((isp)->isp_regs2, (off)) #define B2W4(isp, off, v) bus_write_4((isp)->isp_regs2, (off), (v)) static ISP_INLINE uint16_t isp_pci_rd_debounced(ispsoftc_t *isp, int off) { uint16_t val, prev; val = BXR2(isp, IspVirt2Off(isp, off)); do { prev = val; val = BXR2(isp, IspVirt2Off(isp, off)); } while (val != prev); return (val); } static void isp_pci_run_isr(ispsoftc_t *isp) { uint16_t isr, sema, info; if (IS_2100(isp)) { isr = isp_pci_rd_debounced(isp, BIU_ISR); sema = isp_pci_rd_debounced(isp, BIU_SEMA); } else { isr = BXR2(isp, IspVirt2Off(isp, BIU_ISR)); sema = BXR2(isp, IspVirt2Off(isp, BIU_SEMA)); } isp_prt(isp, ISP_LOGDEBUG3, "ISR 0x%x SEMA 0x%x", isr, sema); isr &= INT_PENDING_MASK(isp); sema &= BIU_SEMA_LOCK; if (isr == 0 && sema == 0) return; if (sema != 0) { if (IS_2100(isp)) info = isp_pci_rd_debounced(isp, OUTMAILBOX0); else info = BXR2(isp, IspVirt2Off(isp, OUTMAILBOX0)); if (info & MBOX_COMMAND_COMPLETE) isp_intr_mbox(isp, info); else isp_intr_async(isp, info); if (!IS_FC(isp) && isp->isp_state == ISP_RUNSTATE) isp_intr_respq(isp); } else isp_intr_respq(isp); ISP_WRITE(isp, HCCR, HCCR_CMD_CLEAR_RISC_INT); if (sema) ISP_WRITE(isp, BIU_SEMA, 0); } static void isp_pci_run_isr_2300(ispsoftc_t *isp) { uint32_t hccr, r2hisr; uint16_t isr, info; 
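	/* Nothing to do unless the RISC-to-host interrupt is asserted. */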
if ((BXR2(isp, IspVirt2Off(isp, BIU_ISR)) & BIU2100_ISR_RISC_INT) == 0) return; r2hisr = BXR4(isp, IspVirt2Off(isp, BIU_R2HSTSLO)); isp_prt(isp, ISP_LOGDEBUG3, "RISC2HOST ISR 0x%x", r2hisr); if ((r2hisr & BIU_R2HST_INTR) == 0) return; isr = r2hisr & BIU_R2HST_ISTAT_MASK; info = r2hisr >> 16; switch (isr) { case ISPR2HST_ROM_MBX_OK: case ISPR2HST_ROM_MBX_FAIL: case ISPR2HST_MBX_OK: case ISPR2HST_MBX_FAIL: isp_intr_mbox(isp, info); break; case ISPR2HST_ASYNC_EVENT: isp_intr_async(isp, info); break; case ISPR2HST_RIO_16: isp_intr_async(isp, ASYNC_RIO16_1); break; case ISPR2HST_FPOST: isp_intr_async(isp, ASYNC_CMD_CMPLT); break; case ISPR2HST_FPOST_CTIO: isp_intr_async(isp, ASYNC_CTIO_DONE); break; case ISPR2HST_RSPQ_UPDATE: isp_intr_respq(isp); break; default: hccr = ISP_READ(isp, HCCR); if (hccr & HCCR_PAUSE) { ISP_WRITE(isp, HCCR, HCCR_RESET); isp_prt(isp, ISP_LOGERR, "RISC paused at interrupt (%x->%x)", hccr, ISP_READ(isp, HCCR)); ISP_WRITE(isp, BIU_ICR, 0); } else { isp_prt(isp, ISP_LOGERR, "unknown interrupt 0x%x\n", r2hisr); } } ISP_WRITE(isp, HCCR, HCCR_CMD_CLEAR_RISC_INT); ISP_WRITE(isp, BIU_SEMA, 0); } static void isp_pci_run_isr_2400(ispsoftc_t *isp) { uint32_t r2hisr; uint16_t isr, info; r2hisr = BXR4(isp, IspVirt2Off(isp, BIU2400_R2HSTSLO)); isp_prt(isp, ISP_LOGDEBUG3, "RISC2HOST ISR 0x%x", r2hisr); if ((r2hisr & BIU_R2HST_INTR) == 0) return; isr = r2hisr & BIU_R2HST_ISTAT_MASK; info = (r2hisr >> 16); switch (isr) { case ISPR2HST_ROM_MBX_OK: case ISPR2HST_ROM_MBX_FAIL: case ISPR2HST_MBX_OK: case ISPR2HST_MBX_FAIL: isp_intr_mbox(isp, info); break; case ISPR2HST_ASYNC_EVENT: isp_intr_async(isp, info); break; case ISPR2HST_RSPQ_UPDATE: isp_intr_respq(isp); break; case ISPR2HST_RSPQ_UPDATE2: #ifdef ISP_TARGET_MODE case ISPR2HST_ATIO_RSPQ_UPDATE: #endif isp_intr_respq(isp); /* FALLTHROUGH */ #ifdef ISP_TARGET_MODE case ISPR2HST_ATIO_UPDATE: case ISPR2HST_ATIO_UPDATE2: isp_intr_atioq(isp); #endif break; default: isp_prt(isp, ISP_LOGERR, "unknown interrupt 0x%x\n", r2hisr); } ISP_WRITE(isp, BIU2400_HCCR, HCCR_2400_CMD_CLEAR_RISC_INT); } static uint32_t isp_pci_rd_reg(ispsoftc_t *isp, int regoff) { uint16_t rv; int oldconf = 0; if ((regoff & _BLK_REG_MASK) == SXP_BLOCK) { /* * We will assume that someone has paused the RISC processor. */ oldconf = BXR2(isp, IspVirt2Off(isp, BIU_CONF1)); BXW2(isp, IspVirt2Off(isp, BIU_CONF1), oldconf | BIU_PCI_CONF1_SXP); MEMORYBARRIER(isp, SYNC_REG, IspVirt2Off(isp, BIU_CONF1), 2, -1); } rv = BXR2(isp, IspVirt2Off(isp, regoff)); if ((regoff & _BLK_REG_MASK) == SXP_BLOCK) { BXW2(isp, IspVirt2Off(isp, BIU_CONF1), oldconf); MEMORYBARRIER(isp, SYNC_REG, IspVirt2Off(isp, BIU_CONF1), 2, -1); } return (rv); } static void isp_pci_wr_reg(ispsoftc_t *isp, int regoff, uint32_t val) { int oldconf = 0; if ((regoff & _BLK_REG_MASK) == SXP_BLOCK) { /* * We will assume that someone has paused the RISC processor. */ oldconf = BXR2(isp, IspVirt2Off(isp, BIU_CONF1)); BXW2(isp, IspVirt2Off(isp, BIU_CONF1), oldconf | BIU_PCI_CONF1_SXP); MEMORYBARRIER(isp, SYNC_REG, IspVirt2Off(isp, BIU_CONF1), 2, -1); } BXW2(isp, IspVirt2Off(isp, regoff), val); MEMORYBARRIER(isp, SYNC_REG, IspVirt2Off(isp, regoff), 2, -1); if ((regoff & _BLK_REG_MASK) == SXP_BLOCK) { BXW2(isp, IspVirt2Off(isp, BIU_CONF1), oldconf); MEMORYBARRIER(isp, SYNC_REG, IspVirt2Off(isp, BIU_CONF1), 2, -1); } } static uint32_t isp_pci_rd_reg_1080(ispsoftc_t *isp, int regoff) { uint32_t rv, oc = 0; if ((regoff & _BLK_REG_MASK) == SXP_BLOCK) { uint32_t tc; /* * We will assume that someone has paused the RISC processor. 
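 * The SXP and DMA register banks are windowed through BIU_CONF1:
 * select the bank, perform the access, then restore the saved value.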
*/ oc = BXR2(isp, IspVirt2Off(isp, BIU_CONF1)); tc = oc & ~BIU_PCI1080_CONF1_DMA; if (regoff & SXP_BANK1_SELECT) tc |= BIU_PCI1080_CONF1_SXP1; else tc |= BIU_PCI1080_CONF1_SXP0; BXW2(isp, IspVirt2Off(isp, BIU_CONF1), tc); MEMORYBARRIER(isp, SYNC_REG, IspVirt2Off(isp, BIU_CONF1), 2, -1); } else if ((regoff & _BLK_REG_MASK) == DMA_BLOCK) { oc = BXR2(isp, IspVirt2Off(isp, BIU_CONF1)); BXW2(isp, IspVirt2Off(isp, BIU_CONF1), oc | BIU_PCI1080_CONF1_DMA); MEMORYBARRIER(isp, SYNC_REG, IspVirt2Off(isp, BIU_CONF1), 2, -1); } rv = BXR2(isp, IspVirt2Off(isp, regoff)); if (oc) { BXW2(isp, IspVirt2Off(isp, BIU_CONF1), oc); MEMORYBARRIER(isp, SYNC_REG, IspVirt2Off(isp, BIU_CONF1), 2, -1); } return (rv); } static void isp_pci_wr_reg_1080(ispsoftc_t *isp, int regoff, uint32_t val) { int oc = 0; if ((regoff & _BLK_REG_MASK) == SXP_BLOCK) { uint32_t tc; /* * We will assume that someone has paused the RISC processor. */ oc = BXR2(isp, IspVirt2Off(isp, BIU_CONF1)); tc = oc & ~BIU_PCI1080_CONF1_DMA; if (regoff & SXP_BANK1_SELECT) tc |= BIU_PCI1080_CONF1_SXP1; else tc |= BIU_PCI1080_CONF1_SXP0; BXW2(isp, IspVirt2Off(isp, BIU_CONF1), tc); MEMORYBARRIER(isp, SYNC_REG, IspVirt2Off(isp, BIU_CONF1), 2, -1); } else if ((regoff & _BLK_REG_MASK) == DMA_BLOCK) { oc = BXR2(isp, IspVirt2Off(isp, BIU_CONF1)); BXW2(isp, IspVirt2Off(isp, BIU_CONF1), oc | BIU_PCI1080_CONF1_DMA); MEMORYBARRIER(isp, SYNC_REG, IspVirt2Off(isp, BIU_CONF1), 2, -1); } BXW2(isp, IspVirt2Off(isp, regoff), val); MEMORYBARRIER(isp, SYNC_REG, IspVirt2Off(isp, regoff), 2, -1); if (oc) { BXW2(isp, IspVirt2Off(isp, BIU_CONF1), oc); MEMORYBARRIER(isp, SYNC_REG, IspVirt2Off(isp, BIU_CONF1), 2, -1); } } static uint32_t isp_pci_rd_reg_2400(ispsoftc_t *isp, int regoff) { uint32_t rv; int block = regoff & _BLK_REG_MASK; switch (block) { case BIU_BLOCK: break; case MBOX_BLOCK: return (BXR2(isp, IspVirt2Off(isp, regoff))); case SXP_BLOCK: isp_prt(isp, ISP_LOGERR, "SXP_BLOCK read at 0x%x", regoff); return (0xffffffff); case RISC_BLOCK: isp_prt(isp, ISP_LOGERR, "RISC_BLOCK read at 0x%x", regoff); return (0xffffffff); case DMA_BLOCK: isp_prt(isp, ISP_LOGERR, "DMA_BLOCK read at 0x%x", regoff); return (0xffffffff); default: isp_prt(isp, ISP_LOGERR, "unknown block read at 0x%x", regoff); return (0xffffffff); } switch (regoff) { case BIU2400_FLASH_ADDR: case BIU2400_FLASH_DATA: case BIU2400_ICR: case BIU2400_ISR: case BIU2400_CSR: case BIU2400_REQINP: case BIU2400_REQOUTP: case BIU2400_RSPINP: case BIU2400_RSPOUTP: case BIU2400_PRI_REQINP: case BIU2400_PRI_REQOUTP: case BIU2400_ATIO_RSPINP: case BIU2400_ATIO_RSPOUTP: case BIU2400_HCCR: case BIU2400_GPIOD: case BIU2400_GPIOE: case BIU2400_HSEMA: rv = BXR4(isp, IspVirt2Off(isp, regoff)); break; case BIU2400_R2HSTSLO: rv = BXR4(isp, IspVirt2Off(isp, regoff)); break; case BIU2400_R2HSTSHI: rv = BXR4(isp, IspVirt2Off(isp, regoff)) >> 16; break; default: isp_prt(isp, ISP_LOGERR, "unknown register read at 0x%x", regoff); rv = 0xffffffff; break; } return (rv); } static void isp_pci_wr_reg_2400(ispsoftc_t *isp, int regoff, uint32_t val) { int block = regoff & _BLK_REG_MASK; switch (block) { case BIU_BLOCK: break; case MBOX_BLOCK: BXW2(isp, IspVirt2Off(isp, regoff), val); MEMORYBARRIER(isp, SYNC_REG, IspVirt2Off(isp, regoff), 2, -1); return; case SXP_BLOCK: isp_prt(isp, ISP_LOGERR, "SXP_BLOCK write at 0x%x", regoff); return; case RISC_BLOCK: isp_prt(isp, ISP_LOGERR, "RISC_BLOCK write at 0x%x", regoff); return; case DMA_BLOCK: isp_prt(isp, ISP_LOGERR, "DMA_BLOCK write at 0x%x", regoff); return; default: isp_prt(isp, ISP_LOGERR, 
"unknown block write at 0x%x", regoff); break; } switch (regoff) { case BIU2400_FLASH_ADDR: case BIU2400_FLASH_DATA: case BIU2400_ICR: case BIU2400_ISR: case BIU2400_CSR: case BIU2400_REQINP: case BIU2400_REQOUTP: case BIU2400_RSPINP: case BIU2400_RSPOUTP: case BIU2400_PRI_REQINP: case BIU2400_PRI_REQOUTP: case BIU2400_ATIO_RSPINP: case BIU2400_ATIO_RSPOUTP: case BIU2400_HCCR: case BIU2400_GPIOD: case BIU2400_GPIOE: case BIU2400_HSEMA: BXW4(isp, IspVirt2Off(isp, regoff), val); #ifdef MEMORYBARRIERW if (regoff == BIU2400_REQINP || regoff == BIU2400_RSPOUTP || regoff == BIU2400_PRI_REQINP || regoff == BIU2400_ATIO_RSPOUTP) MEMORYBARRIERW(isp, SYNC_REG, IspVirt2Off(isp, regoff), 4, -1) else #endif MEMORYBARRIER(isp, SYNC_REG, IspVirt2Off(isp, regoff), 4, -1); break; default: isp_prt(isp, ISP_LOGERR, "unknown register write at 0x%x", regoff); break; } } static uint32_t isp_pci_rd_reg_2600(ispsoftc_t *isp, int regoff) { uint32_t rv; switch (regoff) { case BIU2400_PRI_REQINP: case BIU2400_PRI_REQOUTP: isp_prt(isp, ISP_LOGERR, "unknown register read at 0x%x", regoff); rv = 0xffffffff; break; case BIU2400_REQINP: rv = B2R4(isp, 0x00); break; case BIU2400_REQOUTP: rv = B2R4(isp, 0x04); break; case BIU2400_RSPINP: rv = B2R4(isp, 0x08); break; case BIU2400_RSPOUTP: rv = B2R4(isp, 0x0c); break; case BIU2400_ATIO_RSPINP: rv = B2R4(isp, 0x10); break; case BIU2400_ATIO_RSPOUTP: rv = B2R4(isp, 0x14); break; default: rv = isp_pci_rd_reg_2400(isp, regoff); break; } return (rv); } static void isp_pci_wr_reg_2600(ispsoftc_t *isp, int regoff, uint32_t val) { int off; switch (regoff) { case BIU2400_PRI_REQINP: case BIU2400_PRI_REQOUTP: isp_prt(isp, ISP_LOGERR, "unknown register write at 0x%x", regoff); return; case BIU2400_REQINP: off = 0x00; break; case BIU2400_REQOUTP: off = 0x04; break; case BIU2400_RSPINP: off = 0x08; break; case BIU2400_RSPOUTP: off = 0x0c; break; case BIU2400_ATIO_RSPINP: off = 0x10; break; case BIU2400_ATIO_RSPOUTP: off = 0x14; break; default: isp_pci_wr_reg_2400(isp, regoff, val); return; } B2W4(isp, off, val); } struct imush { bus_addr_t maddr; int error; }; static void imc(void *arg, bus_dma_segment_t *segs, int nseg, int error) { struct imush *imushp = (struct imush *) arg; if (!(imushp->error = error)) imushp->maddr = segs[0].ds_addr; } static int isp_pci_mbxdma(ispsoftc_t *isp) { caddr_t base; uint32_t len, nsegs; int i, error, cmap = 0; bus_size_t slim; /* segment size */ bus_addr_t llim; /* low limit of unavailable dma */ bus_addr_t hlim; /* high limit of unavailable dma */ struct imush im; isp_ecmd_t *ecmd; /* Already been here? If so, leave... */ if (isp->isp_xflist != NULL) return (0); if (isp->isp_rquest != NULL && isp->isp_maxcmds == 0) return (0); ISP_UNLOCK(isp); if (isp->isp_rquest != NULL) goto gotmaxcmds; hlim = BUS_SPACE_MAXADDR; if (IS_ULTRA2(isp) || IS_FC(isp) || IS_1240(isp)) { if (sizeof (bus_size_t) > 4) slim = (bus_size_t) (1ULL << 32); else slim = (bus_size_t) (1UL << 31); llim = BUS_SPACE_MAXADDR; } else { slim = (1UL << 24); llim = BUS_SPACE_MAXADDR_32BIT; } if (sizeof (bus_size_t) > 4) nsegs = ISP_NSEG64_MAX; else nsegs = ISP_NSEG_MAX; if (bus_dma_tag_create(bus_get_dma_tag(ISP_PCD(isp)), 1, slim, llim, hlim, NULL, NULL, BUS_SPACE_MAXSIZE, nsegs, slim, 0, busdma_lock_mutex, &isp->isp_lock, &isp->isp_osinfo.dmat)) { ISP_LOCK(isp); isp_prt(isp, ISP_LOGERR, "could not create master dma tag"); return (1); } /* * Allocate and map the request queue and a region for external * DMA addressable command/status structures (22XX and later). 
*/ len = ISP_QUEUE_SIZE(RQUEST_QUEUE_LEN(isp)); if (isp->isp_type >= ISP_HA_FC_2200) len += (N_XCMDS * XCMD_SIZE); if (bus_dma_tag_create(isp->isp_osinfo.dmat, QENTRY_LEN, slim, BUS_SPACE_MAXADDR_32BIT, BUS_SPACE_MAXADDR, NULL, NULL, len, 1, len, 0, busdma_lock_mutex, &isp->isp_lock, &isp->isp_osinfo.reqdmat)) { isp_prt(isp, ISP_LOGERR, "cannot create request DMA tag"); goto bad; } if (bus_dmamem_alloc(isp->isp_osinfo.reqdmat, (void **)&base, BUS_DMA_COHERENT, &isp->isp_osinfo.reqmap) != 0) { isp_prt(isp, ISP_LOGERR, "cannot allocate request DMA memory"); bus_dma_tag_destroy(isp->isp_osinfo.reqdmat); goto bad; } isp->isp_rquest = base; im.error = 0; if (bus_dmamap_load(isp->isp_osinfo.reqdmat, isp->isp_osinfo.reqmap, base, len, imc, &im, 0) || im.error) { isp_prt(isp, ISP_LOGERR, "error loading request DMA map %d", im.error); goto bad; } isp_prt(isp, ISP_LOGDEBUG0, "request area @ 0x%jx/0x%jx", (uintmax_t)im.maddr, (uintmax_t)len); isp->isp_rquest_dma = im.maddr; base += ISP_QUEUE_SIZE(RQUEST_QUEUE_LEN(isp)); im.maddr += ISP_QUEUE_SIZE(RQUEST_QUEUE_LEN(isp)); if (isp->isp_type >= ISP_HA_FC_2200) { isp->isp_osinfo.ecmd_dma = im.maddr; isp->isp_osinfo.ecmd_free = (isp_ecmd_t *)base; isp->isp_osinfo.ecmd_base = isp->isp_osinfo.ecmd_free; for (ecmd = isp->isp_osinfo.ecmd_free; ecmd < &isp->isp_osinfo.ecmd_free[N_XCMDS]; ecmd++) { if (ecmd == &isp->isp_osinfo.ecmd_free[N_XCMDS - 1]) ecmd->next = NULL; else ecmd->next = ecmd + 1; } } /* * Allocate and map the result queue. */ len = ISP_QUEUE_SIZE(RESULT_QUEUE_LEN(isp)); if (bus_dma_tag_create(isp->isp_osinfo.dmat, QENTRY_LEN, slim, BUS_SPACE_MAXADDR_32BIT, BUS_SPACE_MAXADDR, NULL, NULL, len, 1, len, 0, busdma_lock_mutex, &isp->isp_lock, &isp->isp_osinfo.respdmat)) { isp_prt(isp, ISP_LOGERR, "cannot create response DMA tag"); goto bad; } if (bus_dmamem_alloc(isp->isp_osinfo.respdmat, (void **)&base, BUS_DMA_COHERENT, &isp->isp_osinfo.respmap) != 0) { isp_prt(isp, ISP_LOGERR, "cannot allocate response DMA memory"); bus_dma_tag_destroy(isp->isp_osinfo.respdmat); goto bad; } isp->isp_result = base; im.error = 0; if (bus_dmamap_load(isp->isp_osinfo.respdmat, isp->isp_osinfo.respmap, base, len, imc, &im, 0) || im.error) { isp_prt(isp, ISP_LOGERR, "error loading response DMA map %d", im.error); goto bad; } isp_prt(isp, ISP_LOGDEBUG0, "response area @ 0x%jx/0x%jx", (uintmax_t)im.maddr, (uintmax_t)len); isp->isp_result_dma = im.maddr; #ifdef ISP_TARGET_MODE /* * Allocate and map ATIO queue on 24xx with target mode. 
*/ if (IS_24XX(isp)) { len = ISP_QUEUE_SIZE(RESULT_QUEUE_LEN(isp)); if (bus_dma_tag_create(isp->isp_osinfo.dmat, QENTRY_LEN, slim, BUS_SPACE_MAXADDR_32BIT, BUS_SPACE_MAXADDR, NULL, NULL, len, 1, len, 0, busdma_lock_mutex, &isp->isp_lock, &isp->isp_osinfo.atiodmat)) { isp_prt(isp, ISP_LOGERR, "cannot create ATIO DMA tag"); goto bad; } if (bus_dmamem_alloc(isp->isp_osinfo.atiodmat, (void **)&base, BUS_DMA_COHERENT, &isp->isp_osinfo.atiomap) != 0) { isp_prt(isp, ISP_LOGERR, "cannot allocate ATIO DMA memory"); bus_dma_tag_destroy(isp->isp_osinfo.atiodmat); goto bad; } isp->isp_atioq = base; im.error = 0; if (bus_dmamap_load(isp->isp_osinfo.atiodmat, isp->isp_osinfo.atiomap, base, len, imc, &im, 0) || im.error) { isp_prt(isp, ISP_LOGERR, "error loading ATIO DMA map %d", im.error); goto bad; } isp_prt(isp, ISP_LOGDEBUG0, "ATIO area @ 0x%jx/0x%jx", (uintmax_t)im.maddr, (uintmax_t)len); isp->isp_atioq_dma = im.maddr; } #endif if (IS_FC(isp)) { if (bus_dma_tag_create(isp->isp_osinfo.dmat, 64, slim, BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL, 2*QENTRY_LEN, 1, 2*QENTRY_LEN, 0, busdma_lock_mutex, &isp->isp_lock, &isp->isp_osinfo.iocbdmat)) { goto bad; } if (bus_dmamem_alloc(isp->isp_osinfo.iocbdmat, (void **)&base, BUS_DMA_COHERENT, &isp->isp_osinfo.iocbmap) != 0) goto bad; isp->isp_iocb = base; im.error = 0; if (bus_dmamap_load(isp->isp_osinfo.iocbdmat, isp->isp_osinfo.iocbmap, base, 2*QENTRY_LEN, imc, &im, 0) || im.error) goto bad; isp->isp_iocb_dma = im.maddr; if (bus_dma_tag_create(isp->isp_osinfo.dmat, 64, slim, BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL, ISP_FC_SCRLEN, 1, ISP_FC_SCRLEN, 0, busdma_lock_mutex, &isp->isp_lock, &isp->isp_osinfo.scdmat)) goto bad; for (cmap = 0; cmap < isp->isp_nchan; cmap++) { struct isp_fc *fc = ISP_FC_PC(isp, cmap); if (bus_dmamem_alloc(isp->isp_osinfo.scdmat, (void **)&base, BUS_DMA_COHERENT, &fc->scmap) != 0) goto bad; FCPARAM(isp, cmap)->isp_scratch = base; im.error = 0; if (bus_dmamap_load(isp->isp_osinfo.scdmat, fc->scmap, base, ISP_FC_SCRLEN, imc, &im, 0) || im.error) { bus_dmamem_free(isp->isp_osinfo.scdmat, base, fc->scmap); FCPARAM(isp, cmap)->isp_scratch = NULL; goto bad; } FCPARAM(isp, cmap)->isp_scdma = im.maddr; if (!IS_2100(isp)) { for (i = 0; i < INITIAL_NEXUS_COUNT; i++) { struct isp_nexus *n = malloc(sizeof (struct isp_nexus), M_DEVBUF, M_NOWAIT | M_ZERO); if (n == NULL) { while (fc->nexus_free_list) { n = fc->nexus_free_list; fc->nexus_free_list = n->next; free(n, M_DEVBUF); } goto bad; } n->next = fc->nexus_free_list; fc->nexus_free_list = n; } } } } if (isp->isp_maxcmds == 0) { ISP_LOCK(isp); return (0); } gotmaxcmds: len = isp->isp_maxcmds * sizeof (struct isp_pcmd); isp->isp_osinfo.pcmd_pool = (struct isp_pcmd *) malloc(len, M_DEVBUF, M_WAITOK | M_ZERO); for (i = 0; i < isp->isp_maxcmds; i++) { struct isp_pcmd *pcmd = &isp->isp_osinfo.pcmd_pool[i]; error = bus_dmamap_create(isp->isp_osinfo.dmat, 0, &pcmd->dmap); if (error) { isp_prt(isp, ISP_LOGERR, "error %d creating per-cmd DMA maps", error); while (--i >= 0) { bus_dmamap_destroy(isp->isp_osinfo.dmat, isp->isp_osinfo.pcmd_pool[i].dmap); } goto bad; } callout_init_mtx(&pcmd->wdog, &isp->isp_lock, 0); if (i == isp->isp_maxcmds-1) pcmd->next = NULL; else pcmd->next = &isp->isp_osinfo.pcmd_pool[i+1]; } isp->isp_osinfo.pcmd_free = &isp->isp_osinfo.pcmd_pool[0]; len = sizeof (isp_hdl_t) * isp->isp_maxcmds; isp->isp_xflist = (isp_hdl_t *) malloc(len, M_DEVBUF, M_WAITOK | M_ZERO); for (len = 0; len < isp->isp_maxcmds - 1; len++) isp->isp_xflist[len].cmd = &isp->isp_xflist[len+1]; 
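/*
 * The handle array was allocated with M_ZERO, so the final entry's cmd
 * pointer is left NULL and terminates the free list threaded through the
 * cmd fields above; isp_xffree below points at its head.
 */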
isp->isp_xffree = isp->isp_xflist; ISP_LOCK(isp); return (0); bad: isp_pci_mbxdmafree(isp); ISP_LOCK(isp); return (1); } static void isp_pci_mbxdmafree(ispsoftc_t *isp) { int i; if (isp->isp_xflist != NULL) { free(isp->isp_xflist, M_DEVBUF); isp->isp_xflist = NULL; } if (isp->isp_osinfo.pcmd_pool != NULL) { for (i = 0; i < isp->isp_maxcmds; i++) { bus_dmamap_destroy(isp->isp_osinfo.dmat, isp->isp_osinfo.pcmd_pool[i].dmap); } free(isp->isp_osinfo.pcmd_pool, M_DEVBUF); isp->isp_osinfo.pcmd_pool = NULL; } if (IS_FC(isp)) { for (i = 0; i < isp->isp_nchan; i++) { struct isp_fc *fc = ISP_FC_PC(isp, i); if (FCPARAM(isp, i)->isp_scdma != 0) { bus_dmamap_unload(isp->isp_osinfo.scdmat, fc->scmap); FCPARAM(isp, i)->isp_scdma = 0; } if (FCPARAM(isp, i)->isp_scratch != NULL) { bus_dmamem_free(isp->isp_osinfo.scdmat, FCPARAM(isp, i)->isp_scratch, fc->scmap); FCPARAM(isp, i)->isp_scratch = NULL; } while (fc->nexus_free_list) { struct isp_nexus *n = fc->nexus_free_list; fc->nexus_free_list = n->next; free(n, M_DEVBUF); } } if (isp->isp_iocb_dma != 0) { bus_dma_tag_destroy(isp->isp_osinfo.scdmat); bus_dmamap_unload(isp->isp_osinfo.iocbdmat, isp->isp_osinfo.iocbmap); isp->isp_iocb_dma = 0; } if (isp->isp_iocb != NULL) { bus_dmamem_free(isp->isp_osinfo.iocbdmat, isp->isp_iocb, isp->isp_osinfo.iocbmap); bus_dma_tag_destroy(isp->isp_osinfo.iocbdmat); } } #ifdef ISP_TARGET_MODE if (IS_24XX(isp)) { if (isp->isp_atioq_dma != 0) { bus_dmamap_unload(isp->isp_osinfo.atiodmat, isp->isp_osinfo.atiomap); isp->isp_atioq_dma = 0; } if (isp->isp_atioq != NULL) { bus_dmamem_free(isp->isp_osinfo.atiodmat, isp->isp_atioq, isp->isp_osinfo.atiomap); bus_dma_tag_destroy(isp->isp_osinfo.atiodmat); isp->isp_atioq = NULL; } } #endif if (isp->isp_result_dma != 0) { bus_dmamap_unload(isp->isp_osinfo.respdmat, isp->isp_osinfo.respmap); isp->isp_result_dma = 0; } if (isp->isp_result != NULL) { bus_dmamem_free(isp->isp_osinfo.respdmat, isp->isp_result, isp->isp_osinfo.respmap); bus_dma_tag_destroy(isp->isp_osinfo.respdmat); isp->isp_result = NULL; } if (isp->isp_rquest_dma != 0) { bus_dmamap_unload(isp->isp_osinfo.reqdmat, isp->isp_osinfo.reqmap); isp->isp_rquest_dma = 0; } if (isp->isp_rquest != NULL) { bus_dmamem_free(isp->isp_osinfo.reqdmat, isp->isp_rquest, isp->isp_osinfo.reqmap); bus_dma_tag_destroy(isp->isp_osinfo.reqdmat); isp->isp_rquest = NULL; } } typedef struct { ispsoftc_t *isp; void *cmd_token; void *rq; /* original request */ int error; } mush_t; #define MUSHERR_NOQENTRIES -2 static void dma2(void *arg, bus_dma_segment_t *dm_segs, int nseg, int error) { mush_t *mp = (mush_t *) arg; ispsoftc_t *isp= mp->isp; struct ccb_scsiio *csio = mp->cmd_token; isp_ddir_t ddir; int sdir; if (error) { mp->error = error; return; } if (nseg == 0) { ddir = ISP_NOXFR; } else { if ((csio->ccb_h.flags & CAM_DIR_MASK) == CAM_DIR_IN) { ddir = ISP_FROM_DEVICE; } else { ddir = ISP_TO_DEVICE; } if ((csio->ccb_h.func_code == XPT_CONT_TARGET_IO) ^ ((csio->ccb_h.flags & CAM_DIR_MASK) == CAM_DIR_IN)) { sdir = BUS_DMASYNC_PREREAD; } else { sdir = BUS_DMASYNC_PREWRITE; } bus_dmamap_sync(isp->isp_osinfo.dmat, PISP_PCMD(csio)->dmap, sdir); } error = isp_send_cmd(isp, mp->rq, dm_segs, nseg, XS_XFRLEN(csio), ddir, (ispds64_t *)csio->req_map); switch (error) { case CMD_EAGAIN: mp->error = MUSHERR_NOQENTRIES; break; case CMD_QUEUED: break; default: mp->error = EIO; break; } } static int isp_pci_dmasetup(ispsoftc_t *isp, struct ccb_scsiio *csio, void *ff) { mush_t mush, *mp; int error; mp = &mush; mp->isp = isp; mp->cmd_token = csio; mp->rq = ff; mp->error = 
0; error = bus_dmamap_load_ccb(isp->isp_osinfo.dmat, PISP_PCMD(csio)->dmap, (union ccb *)csio, dma2, mp, 0); if (error == EINPROGRESS) { bus_dmamap_unload(isp->isp_osinfo.dmat, PISP_PCMD(csio)->dmap); mp->error = EINVAL; isp_prt(isp, ISP_LOGERR, "deferred dma allocation not supported"); } else if (error && mp->error == 0) { #ifdef DIAGNOSTIC isp_prt(isp, ISP_LOGERR, "error %d in dma mapping code", error); #endif mp->error = error; } if (mp->error) { int retval = CMD_COMPLETE; if (mp->error == MUSHERR_NOQENTRIES) { retval = CMD_EAGAIN; } else if (mp->error == EFBIG) { csio->ccb_h.status = CAM_REQ_TOO_BIG; } else if (mp->error == EINVAL) { csio->ccb_h.status = CAM_REQ_INVALID; } else { csio->ccb_h.status = CAM_UNREC_HBA_ERROR; } return (retval); } return (CMD_QUEUED); } static int isp_pci_irqsetup(ispsoftc_t *isp) { device_t dev = isp->isp_osinfo.dev; struct isp_pcisoftc *pcs = device_get_softc(dev); driver_intr_t *f; int i, max_irq; /* Allocate IRQs only once. */ if (isp->isp_nirq > 0) return (0); ISP_UNLOCK(isp); if (ISP_CAP_MSIX(isp)) { max_irq = IS_26XX(isp) ? 3 : (IS_25XX(isp) ? 2 : 0); resource_int_value(device_get_name(dev), device_get_unit(dev), "msix", &max_irq); max_irq = imin(ISP_MAX_IRQS, max_irq); pcs->msicount = imin(pci_msix_count(dev), max_irq); if (pcs->msicount > 0 && pci_alloc_msix(dev, &pcs->msicount) != 0) pcs->msicount = 0; } if (pcs->msicount == 0) { max_irq = 1; resource_int_value(device_get_name(dev), device_get_unit(dev), "msi", &max_irq); max_irq = imin(1, max_irq); pcs->msicount = imin(pci_msi_count(dev), max_irq); if (pcs->msicount > 0 && pci_alloc_msi(dev, &pcs->msicount) != 0) pcs->msicount = 0; } for (i = 0; i < MAX(1, pcs->msicount); i++) { pcs->irq[i].iqd = i + (pcs->msicount > 0); pcs->irq[i].irq = bus_alloc_resource_any(dev, SYS_RES_IRQ, &pcs->irq[i].iqd, RF_ACTIVE | RF_SHAREABLE); if (pcs->irq[i].irq == NULL) { device_printf(dev, "could not allocate interrupt\n"); break; } if (i == 0) f = isp_platform_intr; else if (i == 1) f = isp_platform_intr_resp; else f = isp_platform_intr_atio; if (bus_setup_intr(dev, pcs->irq[i].irq, ISP_IFLAGS, NULL, f, isp, &pcs->irq[i].ih)) { device_printf(dev, "could not setup interrupt\n"); (void) bus_release_resource(dev, SYS_RES_IRQ, pcs->irq[i].iqd, pcs->irq[i].irq); break; } if (pcs->msicount > 1) { bus_describe_intr(dev, pcs->irq[i].irq, pcs->irq[i].ih, "%d", i); } isp->isp_nirq = i + 1; } ISP_LOCK(isp); return (isp->isp_nirq == 0); } static void isp_pci_dumpregs(ispsoftc_t *isp, const char *msg) { struct isp_pcisoftc *pcs = (struct isp_pcisoftc *)isp; if (msg) printf("%s: %s\n", device_get_nameunit(isp->isp_dev), msg); else printf("%s:\n", device_get_nameunit(isp->isp_dev)); if (IS_SCSI(isp)) printf(" biu_conf1=%x", ISP_READ(isp, BIU_CONF1)); else printf(" biu_csr=%x", ISP_READ(isp, BIU2100_CSR)); printf(" biu_icr=%x biu_isr=%x biu_sema=%x ", ISP_READ(isp, BIU_ICR), ISP_READ(isp, BIU_ISR), ISP_READ(isp, BIU_SEMA)); printf("risc_hccr=%x\n", ISP_READ(isp, HCCR)); if (IS_SCSI(isp)) { ISP_WRITE(isp, HCCR, HCCR_CMD_PAUSE); printf(" cdma_conf=%x cdma_sts=%x cdma_fifostat=%x\n", ISP_READ(isp, CDMA_CONF), ISP_READ(isp, CDMA_STATUS), ISP_READ(isp, CDMA_FIFO_STS)); printf(" ddma_conf=%x ddma_sts=%x ddma_fifostat=%x\n", ISP_READ(isp, DDMA_CONF), ISP_READ(isp, DDMA_STATUS), ISP_READ(isp, DDMA_FIFO_STS)); printf(" sxp_int=%x sxp_gross=%x sxp(scsi_ctrl)=%x\n", ISP_READ(isp, SXP_INTERRUPT), ISP_READ(isp, SXP_GROSS_ERR), ISP_READ(isp, SXP_PINS_CTRL)); ISP_WRITE(isp, HCCR, HCCR_CMD_RELEASE); } printf(" mbox regs: %x %x %x %x %x\n", 
ISP_READ(isp, OUTMAILBOX0), ISP_READ(isp, OUTMAILBOX1), ISP_READ(isp, OUTMAILBOX2), ISP_READ(isp, OUTMAILBOX3), ISP_READ(isp, OUTMAILBOX4)); printf(" PCI Status Command/Status=%x\n", pci_read_config(pcs->pci_dev, PCIR_COMMAND, 1)); } Index: user/ngie/bug-237403/sys/dev/mlx5/mlx5_en/mlx5_en_rx.c =================================================================== --- user/ngie/bug-237403/sys/dev/mlx5/mlx5_en/mlx5_en_rx.c (revision 346925) +++ user/ngie/bug-237403/sys/dev/mlx5/mlx5_en/mlx5_en_rx.c (revision 346926) @@ -1,588 +1,591 @@ /*- * Copyright (c) 2015 Mellanox Technologies. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS `AS IS' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $FreeBSD$ */ #include "en.h" #include static inline int mlx5e_alloc_rx_wqe(struct mlx5e_rq *rq, struct mlx5e_rx_wqe *wqe, u16 ix) { bus_dma_segment_t segs[rq->nsegs]; struct mbuf *mb; int nsegs; int err; #if (MLX5E_MAX_RX_SEGS != 1) struct mbuf *mb_head; int i; #endif if (rq->mbuf[ix].mbuf != NULL) return (0); #if (MLX5E_MAX_RX_SEGS == 1) mb = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, rq->wqe_sz); if (unlikely(!mb)) return (-ENOMEM); mb->m_pkthdr.len = mb->m_len = rq->wqe_sz; #else mb_head = mb = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MLX5E_MAX_RX_BYTES); if (unlikely(mb == NULL)) return (-ENOMEM); mb->m_len = MLX5E_MAX_RX_BYTES; mb->m_pkthdr.len = MLX5E_MAX_RX_BYTES; for (i = 1; i < rq->nsegs; i++) { if (mb_head->m_pkthdr.len >= rq->wqe_sz) break; mb = mb->m_next = m_getjcl(M_NOWAIT, MT_DATA, 0, MLX5E_MAX_RX_BYTES); if (unlikely(mb == NULL)) { m_freem(mb_head); return (-ENOMEM); } mb->m_len = MLX5E_MAX_RX_BYTES; mb_head->m_pkthdr.len += MLX5E_MAX_RX_BYTES; } /* rewind to first mbuf in chain */ mb = mb_head; #endif /* get IP header aligned */ m_adj(mb, MLX5E_NET_IP_ALIGN); err = -bus_dmamap_load_mbuf_sg(rq->dma_tag, rq->mbuf[ix].dma_map, mb, segs, &nsegs, BUS_DMA_NOWAIT); if (err != 0) goto err_free_mbuf; if (unlikely(nsegs == 0)) { bus_dmamap_unload(rq->dma_tag, rq->mbuf[ix].dma_map); err = -ENOMEM; goto err_free_mbuf; } #if (MLX5E_MAX_RX_SEGS == 1) wqe->data[0].addr = cpu_to_be64(segs[0].ds_addr); #else wqe->data[0].addr = cpu_to_be64(segs[0].ds_addr); wqe->data[0].byte_count = cpu_to_be32(segs[0].ds_len | MLX5_HW_START_PADDING); for (i = 1; i != nsegs; i++) { wqe->data[i].addr = cpu_to_be64(segs[i].ds_addr); wqe->data[i].byte_count = cpu_to_be32(segs[i].ds_len); } for (; i < rq->nsegs; i++) { wqe->data[i].addr = 0; wqe->data[i].byte_count = 0; } #endif rq->mbuf[ix].mbuf = mb; rq->mbuf[ix].data = mb->m_data; bus_dmamap_sync(rq->dma_tag, rq->mbuf[ix].dma_map, BUS_DMASYNC_PREREAD); return (0); err_free_mbuf: m_freem(mb); return (err); } static void mlx5e_post_rx_wqes(struct mlx5e_rq *rq) { if (unlikely(rq->enabled == 0)) return; while (!mlx5_wq_ll_is_full(&rq->wq)) { struct mlx5e_rx_wqe *wqe = mlx5_wq_ll_get_wqe(&rq->wq, rq->wq.head); if (unlikely(mlx5e_alloc_rx_wqe(rq, wqe, rq->wq.head))) { callout_reset_curcpu(&rq->watchdog, 1, (void *)&mlx5e_post_rx_wqes, rq); break; } mlx5_wq_ll_push(&rq->wq, be16_to_cpu(wqe->next.next_wqe_index)); } /* ensure wqes are visible to device before updating doorbell record */ atomic_thread_fence_rel(); mlx5_wq_ll_update_db_record(&rq->wq); } static void mlx5e_lro_update_hdr(struct mbuf *mb, struct mlx5_cqe64 *cqe) { /* TODO: consider vlans, ip options, ... 
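 *
 * This rewrites the Ethernet/IP/TCP headers of the first mbuf of an
 * LRO-aggregated packet so that the total length, TTL/hop limit, TCP
 * flags, ack/window fields and (when present) the TCP timestamp option
 * describe the merged segment that is handed up to the stack.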
*/ struct ether_header *eh; uint16_t eh_type; uint16_t tot_len; struct ip6_hdr *ip6 = NULL; struct ip *ip4 = NULL; struct tcphdr *th; uint32_t *ts_ptr; uint8_t l4_hdr_type; int tcp_ack; eh = mtod(mb, struct ether_header *); eh_type = ntohs(eh->ether_type); l4_hdr_type = get_cqe_l4_hdr_type(cqe); tcp_ack = ((CQE_L4_HDR_TYPE_TCP_ACK_NO_DATA == l4_hdr_type) || (CQE_L4_HDR_TYPE_TCP_ACK_AND_DATA == l4_hdr_type)); /* TODO: consider vlan */ tot_len = be32_to_cpu(cqe->byte_cnt) - ETHER_HDR_LEN; switch (eh_type) { case ETHERTYPE_IP: ip4 = (struct ip *)(eh + 1); th = (struct tcphdr *)(ip4 + 1); break; case ETHERTYPE_IPV6: ip6 = (struct ip6_hdr *)(eh + 1); th = (struct tcphdr *)(ip6 + 1); break; default: return; } ts_ptr = (uint32_t *)(th + 1); if (get_cqe_lro_tcppsh(cqe)) th->th_flags |= TH_PUSH; if (tcp_ack) { th->th_flags |= TH_ACK; th->th_ack = cqe->lro_ack_seq_num; th->th_win = cqe->lro_tcp_win; /* * FreeBSD handles only 32bit aligned timestamp right after * the TCP hdr * +--------+--------+--------+--------+ * | NOP | NOP | TSopt | 10 | * +--------+--------+--------+--------+ * | TSval timestamp | * +--------+--------+--------+--------+ * | TSecr timestamp | * +--------+--------+--------+--------+ */ if (get_cqe_lro_timestamp_valid(cqe) && (__predict_true(*ts_ptr) == ntohl(TCPOPT_NOP << 24 | TCPOPT_NOP << 16 | TCPOPT_TIMESTAMP << 8 | TCPOLEN_TIMESTAMP))) { /* * cqe->timestamp is 64bit long. * [0-31] - timestamp. * [32-64] - timestamp echo replay. */ ts_ptr[1] = *(uint32_t *)&cqe->timestamp; ts_ptr[2] = *((uint32_t *)&cqe->timestamp + 1); } } if (ip4) { ip4->ip_ttl = cqe->lro_min_ttl; ip4->ip_len = cpu_to_be16(tot_len); ip4->ip_sum = 0; ip4->ip_sum = in_cksum(mb, ip4->ip_hl << 2); } else { ip6->ip6_hlim = cqe->lro_min_ttl; ip6->ip6_plen = cpu_to_be16(tot_len - sizeof(struct ip6_hdr)); } /* TODO: handle tcp checksum */ } static uint64_t mlx5e_mbuf_tstmp(struct mlx5e_priv *priv, uint64_t hw_tstmp) { struct mlx5e_clbr_point *cp; uint64_t a1, a2, res; u_int gen; do { cp = &priv->clbr_points[priv->clbr_curr]; gen = atomic_load_acq_int(&cp->clbr_gen); a1 = (hw_tstmp - cp->clbr_hw_prev) >> MLX5E_TSTMP_PREC; a2 = (cp->base_curr - cp->base_prev) >> MLX5E_TSTMP_PREC; res = (a1 * a2) << MLX5E_TSTMP_PREC; /* * Divisor cannot be zero because calibration callback * checks for the condition and disables timestamping * if clock halted. 
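 *
 * The conversion is a linear interpolation between the two most recent
 * calibration points:
 *
 *   res = base_prev + (hw_tstmp - clbr_hw_prev) *
 *         (base_curr - base_prev) / (clbr_hw_curr - clbr_hw_prev)
 *
 * with the operands pre-shifted by MLX5E_TSTMP_PREC so the 64-bit
 * intermediate product cannot overflow.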
*/ res /= (cp->clbr_hw_curr - cp->clbr_hw_prev) >> MLX5E_TSTMP_PREC; res += cp->base_prev; atomic_thread_fence_acq(); } while (gen == 0 || gen != cp->clbr_gen); return (res); } static inline void mlx5e_build_rx_mbuf(struct mlx5_cqe64 *cqe, struct mlx5e_rq *rq, struct mbuf *mb, u32 cqe_bcnt) { struct ifnet *ifp = rq->ifp; struct mlx5e_channel *c; #if (MLX5E_MAX_RX_SEGS != 1) struct mbuf *mb_head; #endif int lro_num_seg; /* HW LRO session aggregated packets counter */ uint64_t tstmp; lro_num_seg = be32_to_cpu(cqe->srqn) >> 24; if (lro_num_seg > 1) { mlx5e_lro_update_hdr(mb, cqe); rq->stats.lro_packets++; rq->stats.lro_bytes += cqe_bcnt; } #if (MLX5E_MAX_RX_SEGS == 1) mb->m_pkthdr.len = mb->m_len = cqe_bcnt; #else mb->m_pkthdr.len = cqe_bcnt; for (mb_head = mb; mb != NULL; mb = mb->m_next) { if (mb->m_len > cqe_bcnt) mb->m_len = cqe_bcnt; cqe_bcnt -= mb->m_len; if (likely(cqe_bcnt == 0)) { if (likely(mb->m_next != NULL)) { /* trim off empty mbufs */ m_freem(mb->m_next); mb->m_next = NULL; } break; } } /* rewind to first mbuf in chain */ mb = mb_head; #endif /* check if a Toeplitz hash was computed */ if (cqe->rss_hash_type != 0) { mb->m_pkthdr.flowid = be32_to_cpu(cqe->rss_hash_result); #ifdef RSS /* decode the RSS hash type */ switch (cqe->rss_hash_type & (CQE_RSS_DST_HTYPE_L4 | CQE_RSS_DST_HTYPE_IP)) { /* IPv4 */ case (CQE_RSS_DST_HTYPE_TCP | CQE_RSS_DST_HTYPE_IPV4): M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_TCP_IPV4); break; case (CQE_RSS_DST_HTYPE_UDP | CQE_RSS_DST_HTYPE_IPV4): M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_UDP_IPV4); break; case CQE_RSS_DST_HTYPE_IPV4: M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_IPV4); break; /* IPv6 */ case (CQE_RSS_DST_HTYPE_TCP | CQE_RSS_DST_HTYPE_IPV6): M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_TCP_IPV6); break; case (CQE_RSS_DST_HTYPE_UDP | CQE_RSS_DST_HTYPE_IPV6): M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_UDP_IPV6); break; case CQE_RSS_DST_HTYPE_IPV6: M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_IPV6); break; default: /* Other */ M_HASHTYPE_SET(mb, M_HASHTYPE_OPAQUE_HASH); break; } #else M_HASHTYPE_SET(mb, M_HASHTYPE_OPAQUE_HASH); #endif } else { mb->m_pkthdr.flowid = rq->ix; M_HASHTYPE_SET(mb, M_HASHTYPE_OPAQUE); } mb->m_pkthdr.rcvif = ifp; if (likely(ifp->if_capenable & (IFCAP_RXCSUM | IFCAP_RXCSUM_IPV6)) && ((cqe->hds_ip_ext & (CQE_L2_OK | CQE_L3_OK | CQE_L4_OK)) == (CQE_L2_OK | CQE_L3_OK | CQE_L4_OK))) { mb->m_pkthdr.csum_flags = CSUM_IP_CHECKED | CSUM_IP_VALID | CSUM_DATA_VALID | CSUM_PSEUDO_HDR; mb->m_pkthdr.csum_data = htons(0xffff); } else { rq->stats.csum_none++; } if (cqe_has_vlan(cqe)) { mb->m_pkthdr.ether_vtag = be16_to_cpu(cqe->vlan_info); mb->m_flags |= M_VLANTAG; } c = container_of(rq, struct mlx5e_channel, rq); if (c->priv->clbr_done >= 2) { tstmp = mlx5e_mbuf_tstmp(c->priv, be64_to_cpu(cqe->timestamp)); if ((tstmp & MLX5_CQE_TSTMP_PTP) != 0) { /* * Timestamp was taken on the packet entrance, * instead of the cqe generation. 
*/ tstmp &= ~MLX5_CQE_TSTMP_PTP; mb->m_flags |= M_TSTMP_HPREC; } mb->m_pkthdr.rcv_tstmp = tstmp; mb->m_flags |= M_TSTMP; } } static inline void mlx5e_read_cqe_slot(struct mlx5e_cq *cq, u32 cc, void *data) { memcpy(data, mlx5_cqwq_get_wqe(&cq->wq, (cc & cq->wq.sz_m1)), sizeof(struct mlx5_cqe64)); } static inline void mlx5e_write_cqe_slot(struct mlx5e_cq *cq, u32 cc, void *data) { memcpy(mlx5_cqwq_get_wqe(&cq->wq, cc & cq->wq.sz_m1), data, sizeof(struct mlx5_cqe64)); } static inline void mlx5e_decompress_cqe(struct mlx5e_cq *cq, struct mlx5_cqe64 *title, struct mlx5_mini_cqe8 *mini, u16 wqe_counter, int i) { /* * NOTE: The fields which are not set here are copied from the * initial and common title. See memcpy() in * mlx5e_write_cqe_slot(). */ title->byte_cnt = mini->byte_cnt; title->wqe_counter = cpu_to_be16((wqe_counter + i) & cq->wq.sz_m1); title->check_sum = mini->checksum; title->op_own = (title->op_own & 0xf0) | (((cq->wq.cc + i) >> cq->wq.log_sz) & 1); } #define MLX5E_MINI_ARRAY_SZ 8 /* Make sure structs are not packet differently */ CTASSERT(sizeof(struct mlx5_cqe64) == sizeof(struct mlx5_mini_cqe8) * MLX5E_MINI_ARRAY_SZ); static void mlx5e_decompress_cqes(struct mlx5e_cq *cq) { struct mlx5_mini_cqe8 mini_array[MLX5E_MINI_ARRAY_SZ]; struct mlx5_cqe64 title; u32 cqe_count; u32 i = 0; u16 title_wqe_counter; mlx5e_read_cqe_slot(cq, cq->wq.cc, &title); title_wqe_counter = be16_to_cpu(title.wqe_counter); cqe_count = be32_to_cpu(title.byte_cnt); /* Make sure we won't overflow */ KASSERT(cqe_count <= cq->wq.sz_m1, ("%s: cqe_count %u > cq->wq.sz_m1 %u", __func__, cqe_count, cq->wq.sz_m1)); mlx5e_read_cqe_slot(cq, cq->wq.cc + 1, mini_array); while (true) { mlx5e_decompress_cqe(cq, &title, &mini_array[i % MLX5E_MINI_ARRAY_SZ], title_wqe_counter, i); mlx5e_write_cqe_slot(cq, cq->wq.cc + i, &title); i++; if (i == cqe_count) break; if (i % MLX5E_MINI_ARRAY_SZ == 0) mlx5e_read_cqe_slot(cq, cq->wq.cc + i, mini_array); } } static int mlx5e_poll_rx_cq(struct mlx5e_rq *rq, int budget) { struct pfil_head *pfil; int i, rv; CURVNET_SET_QUIET(rq->ifp->if_vnet); pfil = rq->channel->priv->pfil; for (i = 0; i < budget; i++) { struct mlx5e_rx_wqe *wqe; struct mlx5_cqe64 *cqe; struct mbuf *mb; __be16 wqe_counter_be; u16 wqe_counter; u32 byte_cnt, seglen; cqe = mlx5e_get_cqe(&rq->cq); if (!cqe) break; if (mlx5_get_cqe_format(cqe) == MLX5_COMPRESSED) mlx5e_decompress_cqes(&rq->cq); mlx5_cqwq_pop(&rq->cq.wq); wqe_counter_be = cqe->wqe_counter; wqe_counter = be16_to_cpu(wqe_counter_be); wqe = mlx5_wq_ll_get_wqe(&rq->wq, wqe_counter); byte_cnt = be32_to_cpu(cqe->byte_cnt); bus_dmamap_sync(rq->dma_tag, rq->mbuf[wqe_counter].dma_map, BUS_DMASYNC_POSTREAD); if (unlikely((cqe->op_own >> 4) != MLX5_CQE_RESP_SEND)) { rq->stats.wqe_err++; goto wq_ll_pop; } if (pfil != NULL && PFIL_HOOKED_IN(pfil)) { seglen = MIN(byte_cnt, MLX5E_MAX_RX_BYTES); rv = pfil_run_hooks(rq->channel->priv->pfil, rq->mbuf[wqe_counter].data, rq->ifp, seglen | PFIL_MEMPTR | PFIL_IN, NULL); switch (rv) { case PFIL_DROPPED: case PFIL_CONSUMED: /* * Filter dropped or consumed it. In * either case, we can just recycle * buffer; there is no more work to do. */ rq->stats.packets++; goto wq_ll_pop; case PFIL_REALLOCED: /* * Filter copied it; recycle buffer * and receive the new mbuf allocated * by the Filter */ mb = pfil_mem2mbuf(rq->mbuf[wqe_counter].data); goto rx_common; default: /* * The Filter said it was OK, so * receive like normal. 
*/ KASSERT(rv == PFIL_PASS, ("Filter returned %d!\n", rv)); } } if ((MHLEN - MLX5E_NET_IP_ALIGN) >= byte_cnt && (mb = m_gethdr(M_NOWAIT, MT_DATA)) != NULL) { #if (MLX5E_MAX_RX_SEGS != 1) /* set maximum mbuf length */ mb->m_len = MHLEN - MLX5E_NET_IP_ALIGN; #endif /* get IP header aligned */ mb->m_data += MLX5E_NET_IP_ALIGN; bcopy(rq->mbuf[wqe_counter].data, mtod(mb, caddr_t), byte_cnt); } else { mb = rq->mbuf[wqe_counter].mbuf; rq->mbuf[wqe_counter].mbuf = NULL; /* safety clear */ bus_dmamap_unload(rq->dma_tag, rq->mbuf[wqe_counter].dma_map); } rx_common: mlx5e_build_rx_mbuf(cqe, rq, mb, byte_cnt); rq->stats.bytes += byte_cnt; rq->stats.packets++; +#ifdef NUMA + mb->m_pkthdr.numa_domain = rq->ifp->if_numa_domain; +#endif #if !defined(HAVE_TCP_LRO_RX) tcp_lro_queue_mbuf(&rq->lro, mb); #else if (mb->m_pkthdr.csum_flags == 0 || (rq->ifp->if_capenable & IFCAP_LRO) == 0 || rq->lro.lro_cnt == 0 || tcp_lro_rx(&rq->lro, mb, 0) != 0) { rq->ifp->if_input(rq->ifp, mb); } #endif wq_ll_pop: mlx5_wq_ll_pop(&rq->wq, wqe_counter_be, &wqe->next.next_wqe_index); } CURVNET_RESTORE(); mlx5_cqwq_update_db_record(&rq->cq.wq); /* ensure cq space is freed before enabling more cqes */ atomic_thread_fence_rel(); return (i); } void mlx5e_rx_cq_comp(struct mlx5_core_cq *mcq) { struct mlx5e_rq *rq = container_of(mcq, struct mlx5e_rq, cq.mcq); int i = 0; #ifdef HAVE_PER_CQ_EVENT_PACKET #if (MHLEN < 15) #error "MHLEN is too small" #endif struct mbuf *mb = m_gethdr(M_NOWAIT, MT_DATA); if (mb != NULL) { /* this code is used for debugging purpose only */ mb->m_pkthdr.len = mb->m_len = 15; memset(mb->m_data, 255, 14); mb->m_data[14] = rq->ix; mb->m_pkthdr.rcvif = rq->ifp; rq->ifp->if_input(rq->ifp, mb); } #endif mtx_lock(&rq->mtx); /* * Polling the entire CQ without posting new WQEs results in * lack of receive WQEs during heavy traffic scenarios. */ while (1) { if (mlx5e_poll_rx_cq(rq, MLX5E_RX_BUDGET_MAX) != MLX5E_RX_BUDGET_MAX) break; i += MLX5E_RX_BUDGET_MAX; if (i >= MLX5E_BUDGET_MAX) break; mlx5e_post_rx_wqes(rq); } mlx5e_post_rx_wqes(rq); mlx5e_cq_arm(&rq->cq, MLX5_GET_DOORBELL_LOCK(&rq->channel->priv->doorbell_lock)); tcp_lro_flush_all(&rq->lro); mtx_unlock(&rq->mtx); } Index: user/ngie/bug-237403/sys/dev/uart/uart_cpu_arm64.c =================================================================== --- user/ngie/bug-237403/sys/dev/uart/uart_cpu_arm64.c (revision 346925) +++ user/ngie/bug-237403/sys/dev/uart/uart_cpu_arm64.c (revision 346926) @@ -1,217 +1,224 @@ /*- * Copyright (c) 2016 The FreeBSD Foundation * All rights reserved. * * This software was developed by Andrew Turner under sponsorship from * the FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include "opt_acpi.h" #include "opt_platform.h" #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #ifdef DEV_ACPI #include #include #include #include #endif #ifdef FDT #include #include #include #include #endif /* * UART console routines. */ extern struct bus_space memmap_bus; bus_space_tag_t uart_bus_space_io; bus_space_tag_t uart_bus_space_mem = &memmap_bus; int uart_cpu_eqres(struct uart_bas *b1, struct uart_bas *b2) { if (pmap_kextract(b1->bsh) == 0) return (0); if (pmap_kextract(b2->bsh) == 0) return (0); return ((pmap_kextract(b1->bsh) == pmap_kextract(b2->bsh)) ? 1 : 0); } #ifdef DEV_ACPI static struct acpi_uart_compat_data * uart_cpu_acpi_scan(uint8_t interface_type) { struct acpi_uart_compat_data **cd, *curcd; int i; SET_FOREACH(cd, uart_acpi_class_and_device_set) { curcd = *cd; for (i = 0; curcd[i].cd_hid != NULL; i++) { if (curcd[i].cd_port_subtype == interface_type) return (&curcd[i]); } } SET_FOREACH(cd, uart_acpi_class_set) { curcd = *cd; for (i = 0; curcd[i].cd_hid != NULL; i++) { if (curcd[i].cd_port_subtype == interface_type) return (&curcd[i]); } } return (NULL); } static int uart_cpu_acpi_probe(struct uart_class **classp, bus_space_tag_t *bst, bus_space_handle_t *bsh, int *baud, u_int *rclk, u_int *shiftp, u_int *iowidthp) { struct acpi_uart_compat_data *cd; ACPI_TABLE_SPCR *spcr; vm_paddr_t spcr_physaddr; int err; err = ENXIO; spcr_physaddr = acpi_find_table(ACPI_SIG_SPCR); if (spcr_physaddr == 0) return (ENXIO); spcr = acpi_map_table(spcr_physaddr, ACPI_SIG_SPCR); cd = uart_cpu_acpi_scan(spcr->InterfaceType); if (cd == NULL) goto out; switch(spcr->BaudRate) { + case 0: + /* + * A BaudRate of 0 is a special value which means not to + * change the rate that's already programmed. + */ + *baud = 0; + break; case 3: *baud = 9600; break; case 4: *baud = 19200; break; case 6: *baud = 57600; break; case 7: *baud = 115200; break; default: goto out; } err = acpi_map_addr(&spcr->SerialPort, bst, bsh, PAGE_SIZE); if (err != 0) goto out; *classp = cd->cd_class; *rclk = 0; *shiftp = spcr->SerialPort.AccessWidth - 1; *iowidthp = spcr->SerialPort.BitWidth / 8; if ((cd->cd_quirks & UART_F_IGNORE_SPCR_REGSHFT) == UART_F_IGNORE_SPCR_REGSHFT) { *shiftp = cd->cd_regshft; } out: acpi_unmap_table(spcr); return (err); } #endif int uart_cpu_getdev(int devtype, struct uart_devinfo *di) { struct uart_class *class; bus_space_handle_t bsh; bus_space_tag_t bst; u_int rclk, shift, iowidth; int br, err; /* Allow overriding the FDT using the environment. */ class = &uart_ns8250_class; err = uart_getenv(devtype, di, class); if (err == 0) return (0); if (devtype != UART_DEV_CONSOLE) return (ENXIO); err = ENXIO; #ifdef DEV_ACPI err = uart_cpu_acpi_probe(&class, &bst, &bsh, &br, &rclk, &shift, &iowidth); #endif #ifdef FDT if (err != 0) { err = uart_cpu_fdt_probe(&class, &bst, &bsh, &br, &rclk, &shift, &iowidth); } #endif if (err != 0) return (err); /* * Finalize configuration. 
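 * A baud rate of 0 (the SPCR "keep the current rate" value handled above)
 * is passed through in di->baudrate unchanged, so the UART driver can
 * leave the firmware-programmed divisor alone.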
*/ di->bas.chan = 0; di->bas.regshft = shift; di->bas.regiowidth = iowidth; di->baudrate = br; di->bas.rclk = rclk; di->ops = uart_getops(class); di->databits = 8; di->stopbits = 1; di->parity = UART_PARITY_NONE; di->bas.bst = bst; di->bas.bsh = bsh; uart_bus_space_mem = di->bas.bst; uart_bus_space_io = NULL; return (0); } Index: user/ngie/bug-237403/sys/dev/xdma/xdma.h =================================================================== --- user/ngie/bug-237403/sys/dev/xdma/xdma.h (revision 346925) +++ user/ngie/bug-237403/sys/dev/xdma/xdma.h (revision 346926) @@ -1,264 +1,264 @@ /*- * Copyright (c) 2016-2018 Ruslan Bukin * All rights reserved. * * This software was developed by SRI International and the University of * Cambridge Computer Laboratory under DARPA/AFRL contract FA8750-10-C-0237 * ("CTSRD"), as part of the DARPA CRASH research programme. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _DEV_XDMA_XDMA_H_ #define _DEV_XDMA_XDMA_H_ #include enum xdma_direction { XDMA_MEM_TO_MEM, XDMA_MEM_TO_DEV, XDMA_DEV_TO_MEM, XDMA_DEV_TO_DEV, }; enum xdma_operation_type { XDMA_MEMCPY, XDMA_CYCLIC, XDMA_FIFO, XDMA_SG, }; enum xdma_request_type { XR_TYPE_PHYS, XR_TYPE_VIRT, XR_TYPE_MBUF, XR_TYPE_BIO, }; enum xdma_command { XDMA_CMD_BEGIN, XDMA_CMD_PAUSE, XDMA_CMD_TERMINATE, }; struct xdma_transfer_status { uint32_t transferred; int error; }; typedef struct xdma_transfer_status xdma_transfer_status_t; struct xdma_controller { device_t dev; /* DMA consumer device_t. */ device_t dma_dev; /* A real DMA device_t. */ void *data; /* OFW MD part. */ /* List of virtual channels allocated. 
*/ TAILQ_HEAD(xdma_channel_list, xdma_channel) channels; }; typedef struct xdma_controller xdma_controller_t; struct xchan_buf { bus_dmamap_t map; uint32_t nsegs; uint32_t nsegs_left; - void *cbuf; }; struct xdma_request { struct mbuf *m; struct bio *bp; enum xdma_operation_type operation; enum xdma_request_type req_type; enum xdma_direction direction; bus_addr_t src_addr; bus_addr_t dst_addr; uint8_t src_width; uint8_t dst_width; bus_size_t block_num; bus_size_t block_len; xdma_transfer_status_t status; void *user; TAILQ_ENTRY(xdma_request) xr_next; struct xchan_buf buf; }; struct xdma_sglist { bus_addr_t src_addr; bus_addr_t dst_addr; size_t len; uint8_t src_width; uint8_t dst_width; enum xdma_direction direction; bool first; bool last; }; struct xdma_channel { xdma_controller_t *xdma; uint32_t flags; #define XCHAN_BUFS_ALLOCATED (1 << 0) #define XCHAN_SGLIST_ALLOCATED (1 << 1) #define XCHAN_CONFIGURED (1 << 2) #define XCHAN_TYPE_CYCLIC (1 << 3) #define XCHAN_TYPE_MEMCPY (1 << 4) #define XCHAN_TYPE_FIFO (1 << 5) #define XCHAN_TYPE_SG (1 << 6) uint32_t caps; #define XCHAN_CAP_BUSDMA (1 << 0) -#define XCHAN_CAP_BUSDMA_NOSEG (1 << 1) +#define XCHAN_CAP_NOSEG (1 << 1) +#define XCHAN_CAP_NOBUFS (1 << 2) /* A real hardware driver channel. */ void *chan; /* Interrupt handlers. */ TAILQ_HEAD(, xdma_intr_handler) ie_handlers; TAILQ_ENTRY(xdma_channel) xchan_next; struct sx sx_lock; struct sx sx_qin_lock; struct sx sx_qout_lock; struct sx sx_bank_lock; struct sx sx_proc_lock; /* Request queue. */ bus_dma_tag_t dma_tag_bufs; struct xdma_request *xr_mem; uint32_t xr_num; /* Bus dma tag options. */ bus_size_t maxsegsize; bus_size_t maxnsegs; bus_size_t alignment; bus_addr_t boundary; bus_addr_t lowaddr; bus_addr_t highaddr; struct xdma_sglist *sg; TAILQ_HEAD(, xdma_request) bank; TAILQ_HEAD(, xdma_request) queue_in; TAILQ_HEAD(, xdma_request) queue_out; TAILQ_HEAD(, xdma_request) processing; }; typedef struct xdma_channel xdma_channel_t; struct xdma_intr_handler { int (*cb)(void *cb_user, xdma_transfer_status_t *status); void *cb_user; TAILQ_ENTRY(xdma_intr_handler) ih_next; }; static MALLOC_DEFINE(M_XDMA, "xdma", "xDMA framework"); #define XCHAN_LOCK(xchan) sx_xlock(&(xchan)->sx_lock) #define XCHAN_UNLOCK(xchan) sx_xunlock(&(xchan)->sx_lock) #define XCHAN_ASSERT_LOCKED(xchan) \ sx_assert(&(xchan)->sx_lock, SX_XLOCKED) #define QUEUE_IN_LOCK(xchan) sx_xlock(&(xchan)->sx_qin_lock) #define QUEUE_IN_UNLOCK(xchan) sx_xunlock(&(xchan)->sx_qin_lock) #define QUEUE_IN_ASSERT_LOCKED(xchan) \ sx_assert(&(xchan)->sx_qin_lock, SX_XLOCKED) #define QUEUE_OUT_LOCK(xchan) sx_xlock(&(xchan)->sx_qout_lock) #define QUEUE_OUT_UNLOCK(xchan) sx_xunlock(&(xchan)->sx_qout_lock) #define QUEUE_OUT_ASSERT_LOCKED(xchan) \ sx_assert(&(xchan)->sx_qout_lock, SX_XLOCKED) #define QUEUE_BANK_LOCK(xchan) sx_xlock(&(xchan)->sx_bank_lock) #define QUEUE_BANK_UNLOCK(xchan) sx_xunlock(&(xchan)->sx_bank_lock) #define QUEUE_BANK_ASSERT_LOCKED(xchan) \ sx_assert(&(xchan)->sx_bank_lock, SX_XLOCKED) #define QUEUE_PROC_LOCK(xchan) sx_xlock(&(xchan)->sx_proc_lock) #define QUEUE_PROC_UNLOCK(xchan) sx_xunlock(&(xchan)->sx_proc_lock) #define QUEUE_PROC_ASSERT_LOCKED(xchan) \ sx_assert(&(xchan)->sx_proc_lock, SX_XLOCKED) #define XDMA_SGLIST_MAXLEN 2048 #define XDMA_MAX_SEG 128 /* xDMA controller ops */ xdma_controller_t *xdma_ofw_get(device_t dev, const char *prop); int xdma_put(xdma_controller_t *xdma); /* xDMA channel ops */ xdma_channel_t * xdma_channel_alloc(xdma_controller_t *, uint32_t caps); int xdma_channel_free(xdma_channel_t *); int 
xdma_request(xdma_channel_t *xchan, struct xdma_request *r); /* SG interface */ int xdma_prep_sg(xdma_channel_t *, uint32_t, bus_size_t, bus_size_t, bus_size_t, bus_addr_t, bus_addr_t, bus_addr_t); void xdma_channel_free_sg(xdma_channel_t *xchan); int xdma_queue_submit_sg(xdma_channel_t *xchan); void xchan_seg_done(xdma_channel_t *xchan, xdma_transfer_status_t *); /* Queue operations */ int xdma_dequeue_mbuf(xdma_channel_t *xchan, struct mbuf **m, xdma_transfer_status_t *); int xdma_enqueue_mbuf(xdma_channel_t *xchan, struct mbuf **m, uintptr_t addr, uint8_t, uint8_t, enum xdma_direction dir); int xdma_dequeue_bio(xdma_channel_t *xchan, struct bio **bp, xdma_transfer_status_t *status); int xdma_enqueue_bio(xdma_channel_t *xchan, struct bio **bp, bus_addr_t addr, uint8_t, uint8_t, enum xdma_direction dir); int xdma_dequeue(xdma_channel_t *xchan, void **user, xdma_transfer_status_t *status); int xdma_enqueue(xdma_channel_t *xchan, uintptr_t src, uintptr_t dst, uint8_t, uint8_t, bus_size_t, enum xdma_direction dir, void *); int xdma_queue_submit(xdma_channel_t *xchan); /* Mbuf operations */ uint32_t xdma_mbuf_defrag(xdma_channel_t *xchan, struct xdma_request *xr); uint32_t xdma_mbuf_chain_count(struct mbuf *m0); /* Channel Control */ int xdma_control(xdma_channel_t *xchan, enum xdma_command cmd); /* Interrupt callback */ int xdma_setup_intr(xdma_channel_t *xchan, int (*cb)(void *, xdma_transfer_status_t *), void *arg, void **); int xdma_teardown_intr(xdma_channel_t *xchan, struct xdma_intr_handler *ih); int xdma_teardown_all_intr(xdma_channel_t *xchan); void xdma_callback(struct xdma_channel *xchan, xdma_transfer_status_t *status); /* Sglist */ int xchan_sglist_alloc(xdma_channel_t *xchan); void xchan_sglist_free(xdma_channel_t *xchan); int xdma_sglist_add(struct xdma_sglist *sg, struct bus_dma_segment *seg, uint32_t nsegs, struct xdma_request *xr); /* Requests bank */ void xchan_bank_init(xdma_channel_t *xchan); int xchan_bank_free(xdma_channel_t *xchan); struct xdma_request * xchan_bank_get(xdma_channel_t *xchan); int xchan_bank_put(xdma_channel_t *xchan, struct xdma_request *xr); #endif /* !_DEV_XDMA_XDMA_H_ */ Index: user/ngie/bug-237403/sys/dev/xdma/xdma_mbuf.c =================================================================== --- user/ngie/bug-237403/sys/dev/xdma/xdma_mbuf.c (revision 346925) +++ user/ngie/bug-237403/sys/dev/xdma/xdma_mbuf.c (revision 346926) @@ -1,154 +1,150 @@ /*- * Copyright (c) 2017-2018 Ruslan Bukin * All rights reserved. * * This software was developed by SRI International and the University of * Cambridge Computer Laboratory under DARPA/AFRL contract FA8750-10-C-0237 * ("CTSRD"), as part of the DARPA CRASH research programme. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include "opt_platform.h" #include #include #include #include #include #include #include #ifdef FDT #include #include #include #endif #include int xdma_dequeue_mbuf(xdma_channel_t *xchan, struct mbuf **mp, xdma_transfer_status_t *status) { struct xdma_request *xr; struct xdma_request *xr_tmp; QUEUE_OUT_LOCK(xchan); TAILQ_FOREACH_SAFE(xr, &xchan->queue_out, xr_next, xr_tmp) { TAILQ_REMOVE(&xchan->queue_out, xr, xr_next); break; } QUEUE_OUT_UNLOCK(xchan); if (xr == NULL) return (-1); *mp = xr->m; status->error = xr->status.error; status->transferred = xr->status.transferred; xchan_bank_put(xchan, xr); return (0); } int xdma_enqueue_mbuf(xdma_channel_t *xchan, struct mbuf **mp, uintptr_t addr, uint8_t src_width, uint8_t dst_width, enum xdma_direction dir) { struct xdma_request *xr; xdma_controller_t *xdma; xdma = xchan->xdma; xr = xchan_bank_get(xchan); if (xr == NULL) return (-1); /* No space is available yet. */ xr->direction = dir; xr->m = *mp; xr->req_type = XR_TYPE_MBUF; if (dir == XDMA_MEM_TO_DEV) { xr->dst_addr = addr; xr->src_addr = 0; } else { xr->src_addr = addr; xr->dst_addr = 0; } xr->src_width = src_width; xr->dst_width = dst_width; QUEUE_IN_LOCK(xchan); TAILQ_INSERT_TAIL(&xchan->queue_in, xr, xr_next); QUEUE_IN_UNLOCK(xchan); return (0); } uint32_t xdma_mbuf_chain_count(struct mbuf *m0) { struct mbuf *m; uint32_t c; c = 0; for (m = m0; m != NULL; m = m->m_next) c++; return (c); } uint32_t xdma_mbuf_defrag(xdma_channel_t *xchan, struct xdma_request *xr) { xdma_controller_t *xdma; struct mbuf *m; uint32_t c; xdma = xchan->xdma; c = xdma_mbuf_chain_count(xr->m); if (c == 1) return (c); /* Nothing to do. */ - if (xchan->caps & XCHAN_CAP_BUSDMA) { - if ((xchan->caps & XCHAN_CAP_BUSDMA_NOSEG) || \ - (c > xchan->maxnsegs)) { - if ((m = m_defrag(xr->m, M_NOWAIT)) == NULL) { - device_printf(xdma->dma_dev, - "%s: Can't defrag mbuf\n", - __func__); - return (c); - } - xr->m = m; - c = 1; - } + if ((m = m_defrag(xr->m, M_NOWAIT)) == NULL) { + device_printf(xdma->dma_dev, + "%s: Can't defrag mbuf\n", + __func__); + return (c); } + + xr->m = m; + c = 1; return (c); } Index: user/ngie/bug-237403/sys/dev/xdma/xdma_sg.c =================================================================== --- user/ngie/bug-237403/sys/dev/xdma/xdma_sg.c (revision 346925) +++ user/ngie/bug-237403/sys/dev/xdma/xdma_sg.c (revision 346926) @@ -1,594 +1,586 @@ /*- * Copyright (c) 2018 Ruslan Bukin * All rights reserved. * * This software was developed by SRI International and the University of * Cambridge Computer Laboratory under DARPA/AFRL contract FA8750-10-C-0237 * ("CTSRD"), as part of the DARPA CRASH research programme. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. 
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include "opt_platform.h" #include #include #include #include #include #include #include #include #ifdef FDT #include #include #include #endif #include #include struct seg_load_request { struct bus_dma_segment *seg; uint32_t nsegs; uint32_t error; }; static int _xchan_bufs_alloc(xdma_channel_t *xchan) { xdma_controller_t *xdma; struct xdma_request *xr; int i; xdma = xchan->xdma; for (i = 0; i < xchan->xr_num; i++) { xr = &xchan->xr_mem[i]; - xr->buf.cbuf = contigmalloc(xchan->maxsegsize, - M_XDMA, 0, 0, ~0, PAGE_SIZE, 0); - if (xr->buf.cbuf == NULL) { - device_printf(xdma->dev, - "%s: Can't allocate contiguous kernel" - " physical memory\n", __func__); - return (-1); - } + /* TODO: bounce buffer */ } return (0); } static int _xchan_bufs_alloc_busdma(xdma_channel_t *xchan) { xdma_controller_t *xdma; struct xdma_request *xr; int err; int i; xdma = xchan->xdma; /* Create bus_dma tag */ err = bus_dma_tag_create( bus_get_dma_tag(xdma->dev), /* Parent tag. */ xchan->alignment, /* alignment */ xchan->boundary, /* boundary */ xchan->lowaddr, /* lowaddr */ xchan->highaddr, /* highaddr */ NULL, NULL, /* filter, filterarg */ xchan->maxsegsize * xchan->maxnsegs, /* maxsize */ xchan->maxnsegs, /* nsegments */ xchan->maxsegsize, /* maxsegsize */ 0, /* flags */ NULL, NULL, /* lockfunc, lockarg */ &xchan->dma_tag_bufs); if (err != 0) { device_printf(xdma->dev, "%s: Can't create bus_dma tag.\n", __func__); return (-1); } for (i = 0; i < xchan->xr_num; i++) { xr = &xchan->xr_mem[i]; err = bus_dmamap_create(xchan->dma_tag_bufs, 0, &xr->buf.map); if (err != 0) { device_printf(xdma->dev, "%s: Can't create buf DMA map.\n", __func__); /* Cleanup. 
*/ bus_dma_tag_destroy(xchan->dma_tag_bufs); return (-1); } } return (0); } static int xchan_bufs_alloc(xdma_channel_t *xchan) { xdma_controller_t *xdma; int ret; xdma = xchan->xdma; if (xdma == NULL) { device_printf(xdma->dev, "%s: Channel was not allocated properly.\n", __func__); return (-1); } if (xchan->caps & XCHAN_CAP_BUSDMA) ret = _xchan_bufs_alloc_busdma(xchan); else ret = _xchan_bufs_alloc(xchan); if (ret != 0) { device_printf(xdma->dev, "%s: Can't allocate bufs.\n", __func__); return (-1); } xchan->flags |= XCHAN_BUFS_ALLOCATED; return (0); } static int xchan_bufs_free(xdma_channel_t *xchan) { struct xdma_request *xr; struct xchan_buf *b; int i; if ((xchan->flags & XCHAN_BUFS_ALLOCATED) == 0) return (-1); if (xchan->caps & XCHAN_CAP_BUSDMA) { for (i = 0; i < xchan->xr_num; i++) { xr = &xchan->xr_mem[i]; b = &xr->buf; bus_dmamap_destroy(xchan->dma_tag_bufs, b->map); } bus_dma_tag_destroy(xchan->dma_tag_bufs); } else { for (i = 0; i < xchan->xr_num; i++) { xr = &xchan->xr_mem[i]; - contigfree(xr->buf.cbuf, xchan->maxsegsize, M_XDMA); + /* TODO: bounce buffer */ } } xchan->flags &= ~XCHAN_BUFS_ALLOCATED; return (0); } void xdma_channel_free_sg(xdma_channel_t *xchan) { xchan_bufs_free(xchan); xchan_sglist_free(xchan); xchan_bank_free(xchan); } /* * Prepare xchan for a scatter-gather transfer. * xr_num - xdma requests queue size, * maxsegsize - maximum allowed scatter-gather list element size in bytes */ int xdma_prep_sg(xdma_channel_t *xchan, uint32_t xr_num, bus_size_t maxsegsize, bus_size_t maxnsegs, bus_size_t alignment, bus_addr_t boundary, bus_addr_t lowaddr, bus_addr_t highaddr) { xdma_controller_t *xdma; int ret; xdma = xchan->xdma; KASSERT(xdma != NULL, ("xdma is NULL")); if (xchan->flags & XCHAN_CONFIGURED) { device_printf(xdma->dev, "%s: Channel is already configured.\n", __func__); return (-1); } xchan->xr_num = xr_num; xchan->maxsegsize = maxsegsize; xchan->maxnsegs = maxnsegs; xchan->alignment = alignment; xchan->boundary = boundary; xchan->lowaddr = lowaddr; xchan->highaddr = highaddr; if (xchan->maxnsegs > XDMA_MAX_SEG) { device_printf(xdma->dev, "%s: maxnsegs is too big\n", __func__); return (-1); } xchan_bank_init(xchan); /* Allocate sglist. */ ret = xchan_sglist_alloc(xchan); if (ret != 0) { device_printf(xdma->dev, "%s: Can't allocate sglist.\n", __func__); return (-1); } - /* Allocate bufs. */ - ret = xchan_bufs_alloc(xchan); - if (ret != 0) { - device_printf(xdma->dev, - "%s: Can't allocate bufs.\n", __func__); + /* Allocate buffers if required. 
*/ + if ((xchan->caps & XCHAN_CAP_NOBUFS) == 0) { + ret = xchan_bufs_alloc(xchan); + if (ret != 0) { + device_printf(xdma->dev, + "%s: Can't allocate bufs.\n", __func__); - /* Cleanup */ - xchan_sglist_free(xchan); - xchan_bank_free(xchan); + /* Cleanup */ + xchan_sglist_free(xchan); + xchan_bank_free(xchan); - return (-1); + return (-1); + } } xchan->flags |= (XCHAN_CONFIGURED | XCHAN_TYPE_SG); XCHAN_LOCK(xchan); ret = XDMA_CHANNEL_PREP_SG(xdma->dma_dev, xchan); if (ret != 0) { device_printf(xdma->dev, "%s: Can't prepare SG transfer.\n", __func__); XCHAN_UNLOCK(xchan); return (-1); } XCHAN_UNLOCK(xchan); return (0); } void xchan_seg_done(xdma_channel_t *xchan, struct xdma_transfer_status *st) { struct xdma_request *xr; xdma_controller_t *xdma; struct xchan_buf *b; xdma = xchan->xdma; xr = TAILQ_FIRST(&xchan->processing); if (xr == NULL) panic("request not found\n"); b = &xr->buf; atomic_subtract_int(&b->nsegs_left, 1); if (b->nsegs_left == 0) { if (xchan->caps & XCHAN_CAP_BUSDMA) { if (xr->direction == XDMA_MEM_TO_DEV) bus_dmamap_sync(xchan->dma_tag_bufs, b->map, BUS_DMASYNC_POSTWRITE); else bus_dmamap_sync(xchan->dma_tag_bufs, b->map, BUS_DMASYNC_POSTREAD); bus_dmamap_unload(xchan->dma_tag_bufs, b->map); } xr->status.error = st->error; xr->status.transferred = st->transferred; QUEUE_PROC_LOCK(xchan); TAILQ_REMOVE(&xchan->processing, xr, xr_next); QUEUE_PROC_UNLOCK(xchan); QUEUE_OUT_LOCK(xchan); TAILQ_INSERT_TAIL(&xchan->queue_out, xr, xr_next); QUEUE_OUT_UNLOCK(xchan); } } static void xdma_dmamap_cb(void *arg, bus_dma_segment_t *segs, int nsegs, int error) { struct seg_load_request *slr; struct bus_dma_segment *seg; int i; slr = arg; seg = slr->seg; if (error != 0) { slr->error = error; return; } slr->nsegs = nsegs; for (i = 0; i < nsegs; i++) { seg[i].ds_addr = segs[i].ds_addr; seg[i].ds_len = segs[i].ds_len; } } static int _xdma_load_data_busdma(xdma_channel_t *xchan, struct xdma_request *xr, struct bus_dma_segment *seg) { xdma_controller_t *xdma; struct seg_load_request slr; uint32_t nsegs; void *addr; int error; xdma = xchan->xdma; error = 0; nsegs = 0; switch (xr->req_type) { case XR_TYPE_MBUF: error = bus_dmamap_load_mbuf_sg(xchan->dma_tag_bufs, xr->buf.map, xr->m, seg, &nsegs, BUS_DMA_NOWAIT); break; case XR_TYPE_BIO: slr.nsegs = 0; slr.error = 0; slr.seg = seg; error = bus_dmamap_load_bio(xchan->dma_tag_bufs, xr->buf.map, xr->bp, xdma_dmamap_cb, &slr, BUS_DMA_NOWAIT); if (slr.error != 0) { device_printf(xdma->dma_dev, "%s: bus_dmamap_load failed, err %d\n", __func__, slr.error); return (0); } nsegs = slr.nsegs; break; case XR_TYPE_VIRT: switch (xr->direction) { case XDMA_MEM_TO_DEV: addr = (void *)xr->src_addr; break; case XDMA_DEV_TO_MEM: addr = (void *)xr->dst_addr; break; default: device_printf(xdma->dma_dev, "%s: Direction is not supported\n", __func__); return (0); } slr.nsegs = 0; slr.error = 0; slr.seg = seg; error = bus_dmamap_load(xchan->dma_tag_bufs, xr->buf.map, addr, (xr->block_len * xr->block_num), xdma_dmamap_cb, &slr, BUS_DMA_NOWAIT); if (slr.error != 0) { device_printf(xdma->dma_dev, "%s: bus_dmamap_load failed, err %d\n", __func__, slr.error); return (0); } nsegs = slr.nsegs; break; default: break; } if (error != 0) { if (error == ENOMEM) { /* * Out of memory. Try again later. * TODO: count errors. 
*/ } else device_printf(xdma->dma_dev, "%s: bus_dmamap_load failed with err %d\n", __func__, error); return (0); } if (xr->direction == XDMA_MEM_TO_DEV) bus_dmamap_sync(xchan->dma_tag_bufs, xr->buf.map, BUS_DMASYNC_PREWRITE); else bus_dmamap_sync(xchan->dma_tag_bufs, xr->buf.map, BUS_DMASYNC_PREREAD); return (nsegs); } static int _xdma_load_data(xdma_channel_t *xchan, struct xdma_request *xr, struct bus_dma_segment *seg) { xdma_controller_t *xdma; struct mbuf *m; uint32_t nsegs; xdma = xchan->xdma; m = xr->m; nsegs = 1; switch (xr->req_type) { case XR_TYPE_MBUF: - if (xr->direction == XDMA_MEM_TO_DEV) { - m_copydata(m, 0, m->m_pkthdr.len, xr->buf.cbuf); - seg[0].ds_addr = (bus_addr_t)xr->buf.cbuf; - seg[0].ds_len = m->m_pkthdr.len; - } else { - seg[0].ds_addr = mtod(m, bus_addr_t); - seg[0].ds_len = m->m_pkthdr.len; - } + seg[0].ds_addr = mtod(m, bus_addr_t); + seg[0].ds_len = m->m_pkthdr.len; break; case XR_TYPE_BIO: case XR_TYPE_VIRT: default: panic("implement me\n"); } return (nsegs); } static int xdma_load_data(xdma_channel_t *xchan, struct xdma_request *xr, struct bus_dma_segment *seg) { xdma_controller_t *xdma; int error; int nsegs; xdma = xchan->xdma; error = 0; nsegs = 0; if (xchan->caps & XCHAN_CAP_BUSDMA) nsegs = _xdma_load_data_busdma(xchan, xr, seg); else nsegs = _xdma_load_data(xchan, xr, seg); if (nsegs == 0) return (0); /* Try again later. */ xr->buf.nsegs = nsegs; xr->buf.nsegs_left = nsegs; return (nsegs); } static int xdma_process(xdma_channel_t *xchan, struct xdma_sglist *sg) { struct bus_dma_segment seg[XDMA_MAX_SEG]; struct xdma_request *xr; struct xdma_request *xr_tmp; xdma_controller_t *xdma; uint32_t capacity; uint32_t n; uint32_t c; int nsegs; int ret; XCHAN_ASSERT_LOCKED(xchan); xdma = xchan->xdma; n = 0; ret = XDMA_CHANNEL_CAPACITY(xdma->dma_dev, xchan, &capacity); if (ret != 0) { device_printf(xdma->dev, "%s: Can't get DMA controller capacity.\n", __func__); return (-1); } TAILQ_FOREACH_SAFE(xr, &xchan->queue_in, xr_next, xr_tmp) { switch (xr->req_type) { case XR_TYPE_MBUF: - c = xdma_mbuf_defrag(xchan, xr); + if ((xchan->caps & XCHAN_CAP_NOSEG) || + (c > xchan->maxnsegs)) + c = xdma_mbuf_defrag(xchan, xr); break; case XR_TYPE_BIO: case XR_TYPE_VIRT: default: c = 1; } if (capacity <= (c + n)) { /* * No space yet available for the entire * request in the DMA engine. */ break; } if ((c + n + xchan->maxnsegs) >= XDMA_SGLIST_MAXLEN) { /* Sglist is full. */ break; } nsegs = xdma_load_data(xchan, xr, seg); if (nsegs == 0) break; xdma_sglist_add(&sg[n], seg, nsegs, xr); n += nsegs; QUEUE_IN_LOCK(xchan); TAILQ_REMOVE(&xchan->queue_in, xr, xr_next); QUEUE_IN_UNLOCK(xchan); QUEUE_PROC_LOCK(xchan); TAILQ_INSERT_TAIL(&xchan->processing, xr, xr_next); QUEUE_PROC_UNLOCK(xchan); } return (n); } int xdma_queue_submit_sg(xdma_channel_t *xchan) { struct xdma_sglist *sg; xdma_controller_t *xdma; uint32_t sg_n; int ret; xdma = xchan->xdma; KASSERT(xdma != NULL, ("xdma is NULL")); XCHAN_ASSERT_LOCKED(xchan); sg = xchan->sg; - if ((xchan->flags & XCHAN_BUFS_ALLOCATED) == 0) { + if ((xchan->caps & XCHAN_CAP_NOBUFS) == 0 && + (xchan->flags & XCHAN_BUFS_ALLOCATED) == 0) { device_printf(xdma->dev, "%s: Can't submit a transfer: no bufs\n", __func__); return (-1); } sg_n = xdma_process(xchan, sg); if (sg_n == 0) return (0); /* Nothing to submit */ /* Now submit sglist to DMA engine driver. 
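 * sg[0..sg_n - 1] was filled by xdma_process() with the channel lock
 * held; the controller driver consumes it through the
 * XDMA_CHANNEL_SUBMIT_SG method.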
*/ ret = XDMA_CHANNEL_SUBMIT_SG(xdma->dma_dev, xchan, sg, sg_n); if (ret != 0) { device_printf(xdma->dev, "%s: Can't submit an sglist.\n", __func__); return (-1); } return (0); } Index: user/ngie/bug-237403/sys/geom/geom_dev.c =================================================================== --- user/ngie/bug-237403/sys/geom/geom_dev.c (revision 346925) +++ user/ngie/bug-237403/sys/geom/geom_dev.c (revision 346926) @@ -1,869 +1,870 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 2002 Poul-Henning Kamp * Copyright (c) 2002 Networks Associates Technology, Inc. * All rights reserved. * * This software was developed for the FreeBSD Project by Poul-Henning Kamp * and NAI Labs, the Security Research Division of Network Associates, Inc. * under DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the * DARPA CHATS research program. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The names of the authors may not be used to endorse or promote * products derived from this software without specific prior written * permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include struct g_dev_softc { struct mtx sc_mtx; struct cdev *sc_dev; struct cdev *sc_alias; int sc_open; u_int sc_active; #define SC_A_DESTROY (1 << 31) #define SC_A_OPEN (1 << 30) #define SC_A_ACTIVE (SC_A_OPEN - 1) }; static d_open_t g_dev_open; static d_close_t g_dev_close; static d_strategy_t g_dev_strategy; static d_ioctl_t g_dev_ioctl; static struct cdevsw g_dev_cdevsw = { .d_version = D_VERSION, .d_open = g_dev_open, .d_close = g_dev_close, .d_read = physread, .d_write = physwrite, .d_ioctl = g_dev_ioctl, .d_strategy = g_dev_strategy, .d_name = "g_dev", .d_flags = D_DISK | D_TRACKCLOSE, }; static g_init_t g_dev_init; static g_fini_t g_dev_fini; static g_taste_t g_dev_taste; static g_orphan_t g_dev_orphan; static g_attrchanged_t g_dev_attrchanged; static g_resize_t g_dev_resize; static struct g_class g_dev_class = { .name = "DEV", .version = G_VERSION, .init = g_dev_init, .fini = g_dev_fini, .taste = g_dev_taste, .orphan = g_dev_orphan, .attrchanged = g_dev_attrchanged, .resize = g_dev_resize }; /* * We target 262144 (8 x 32768) sectors by default as this significantly * increases the throughput on commonly used SSD's with a marginal * increase in non-interruptible request latency. */ static uint64_t g_dev_del_max_sectors = 262144; SYSCTL_DECL(_kern_geom); SYSCTL_NODE(_kern_geom, OID_AUTO, dev, CTLFLAG_RW, 0, "GEOM_DEV stuff"); SYSCTL_QUAD(_kern_geom_dev, OID_AUTO, delete_max_sectors, CTLFLAG_RW, &g_dev_del_max_sectors, 0, "Maximum number of sectors in a single " "delete request sent to the provider. Larger requests are chunked " "so they can be interrupted. 
(0 = disable chunking)"); static char *dumpdev = NULL; static void g_dev_init(struct g_class *mp) { dumpdev = kern_getenv("dumpdev"); } static void g_dev_fini(struct g_class *mp) { freeenv(dumpdev); dumpdev = NULL; } static int g_dev_setdumpdev(struct cdev *dev, struct diocskerneldump_arg *kda, struct thread *td) { struct g_kerneldump kd; struct g_consumer *cp; int error, len; if (dev == NULL || kda == NULL) return (clear_dumper(td)); cp = dev->si_drv2; len = sizeof(kd); memset(&kd, 0, len); kd.offset = 0; kd.length = OFF_MAX; error = g_io_getattr("GEOM::kerneldump", cp, &len, &kd); if (error != 0) return (error); error = set_dumper(&kd.di, devtoname(dev), td, kda->kda_compression, kda->kda_encryption, kda->kda_key, kda->kda_encryptedkeysize, kda->kda_encryptedkey); if (error == 0) dev->si_flags |= SI_DUMPDEV; return (error); } static int init_dumpdev(struct cdev *dev) { struct diocskerneldump_arg kda; struct g_consumer *cp; const char *devprefix = "/dev/", *devname; int error; size_t len; bzero(&kda, sizeof(kda)); kda.kda_enable = 1; if (dumpdev == NULL) return (0); len = strlen(devprefix); devname = devtoname(dev); if (strcmp(devname, dumpdev) != 0 && (strncmp(dumpdev, devprefix, len) != 0 || strcmp(devname, dumpdev + len) != 0)) return (0); cp = (struct g_consumer *)dev->si_drv2; error = g_access(cp, 1, 0, 0); if (error != 0) return (error); error = g_dev_setdumpdev(dev, &kda, curthread); if (error == 0) { freeenv(dumpdev); dumpdev = NULL; } (void)g_access(cp, -1, 0, 0); return (error); } static void g_dev_destroy(void *arg, int flags __unused) { struct g_consumer *cp; struct g_geom *gp; struct g_dev_softc *sc; char buf[SPECNAMELEN + 6]; g_topology_assert(); cp = arg; gp = cp->geom; sc = cp->private; g_trace(G_T_TOPOLOGY, "g_dev_destroy(%p(%s))", cp, gp->name); snprintf(buf, sizeof(buf), "cdev=%s", gp->name); devctl_notify_f("GEOM", "DEV", "DESTROY", buf, M_WAITOK); if (cp->acr > 0 || cp->acw > 0 || cp->ace > 0) g_access(cp, -cp->acr, -cp->acw, -cp->ace); g_detach(cp); g_destroy_consumer(cp); g_destroy_geom(gp); mtx_destroy(&sc->sc_mtx); g_free(sc); } void g_dev_print(void) { struct g_geom *gp; char const *p = ""; LIST_FOREACH(gp, &g_dev_class.geom, geom) { printf("%s%s", p, gp->name); p = " "; } printf("\n"); } static void g_dev_set_physpath(struct g_consumer *cp) { struct g_dev_softc *sc; char *physpath; int error, physpath_len; if (g_access(cp, 1, 0, 0) != 0) return; sc = cp->private; physpath_len = MAXPATHLEN; physpath = g_malloc(physpath_len, M_WAITOK|M_ZERO); error = g_io_getattr("GEOM::physpath", cp, &physpath_len, physpath); g_access(cp, -1, 0, 0); if (error == 0 && strlen(physpath) != 0) { struct cdev *dev, *old_alias_dev; struct cdev **alias_devp; dev = sc->sc_dev; old_alias_dev = sc->sc_alias; alias_devp = (struct cdev **)&sc->sc_alias; make_dev_physpath_alias(MAKEDEV_WAITOK, alias_devp, dev, old_alias_dev, physpath); } else if (sc->sc_alias) { destroy_dev((struct cdev *)sc->sc_alias); sc->sc_alias = NULL; } g_free(physpath); } static void g_dev_set_media(struct g_consumer *cp) { struct g_dev_softc *sc; struct cdev *dev; char buf[SPECNAMELEN + 6]; sc = cp->private; dev = sc->sc_dev; snprintf(buf, sizeof(buf), "cdev=%s", dev->si_name); devctl_notify_f("DEVFS", "CDEV", "MEDIACHANGE", buf, M_WAITOK); devctl_notify_f("GEOM", "DEV", "MEDIACHANGE", buf, M_WAITOK); dev = sc->sc_alias; if (dev != NULL) { snprintf(buf, sizeof(buf), "cdev=%s", dev->si_name); devctl_notify_f("DEVFS", "CDEV", "MEDIACHANGE", buf, M_WAITOK); devctl_notify_f("GEOM", "DEV", "MEDIACHANGE", buf, M_WAITOK); } } 
static void g_dev_attrchanged(struct g_consumer *cp, const char *attr) { if (strcmp(attr, "GEOM::media") == 0) { g_dev_set_media(cp); return; } if (strcmp(attr, "GEOM::physpath") == 0) { g_dev_set_physpath(cp); return; } } static void g_dev_resize(struct g_consumer *cp) { char buf[SPECNAMELEN + 6]; snprintf(buf, sizeof(buf), "cdev=%s", cp->provider->name); devctl_notify_f("GEOM", "DEV", "SIZECHANGE", buf, M_WAITOK); } struct g_provider * g_dev_getprovider(struct cdev *dev) { struct g_consumer *cp; g_topology_assert(); if (dev == NULL) return (NULL); if (dev->si_devsw != &g_dev_cdevsw) return (NULL); cp = dev->si_drv2; return (cp->provider); } static struct g_geom * g_dev_taste(struct g_class *mp, struct g_provider *pp, int insist __unused) { struct g_geom *gp; struct g_geom_alias *gap; struct g_consumer *cp; struct g_dev_softc *sc; int error; struct cdev *dev, *adev; char buf[SPECNAMELEN + 6]; g_trace(G_T_TOPOLOGY, "dev_taste(%s,%s)", mp->name, pp->name); g_topology_assert(); gp = g_new_geomf(mp, "%s", pp->name); sc = g_malloc(sizeof(*sc), M_WAITOK | M_ZERO); mtx_init(&sc->sc_mtx, "g_dev", NULL, MTX_DEF); cp = g_new_consumer(gp); cp->private = sc; cp->flags |= G_CF_DIRECT_SEND | G_CF_DIRECT_RECEIVE; error = g_attach(cp, pp); KASSERT(error == 0, ("g_dev_taste(%s) failed to g_attach, err=%d", pp->name, error)); error = make_dev_p(MAKEDEV_CHECKNAME | MAKEDEV_WAITOK, &dev, &g_dev_cdevsw, NULL, UID_ROOT, GID_OPERATOR, 0640, "%s", gp->name); if (error != 0) { printf("%s: make_dev_p() failed (gp->name=%s, error=%d)\n", __func__, gp->name, error); g_detach(cp); g_destroy_consumer(cp); g_destroy_geom(gp); mtx_destroy(&sc->sc_mtx); g_free(sc); return (NULL); } dev->si_flags |= SI_UNMAPPED; sc->sc_dev = dev; dev->si_iosize_max = MAXPHYS; dev->si_drv2 = cp; error = init_dumpdev(dev); if (error != 0) printf("%s: init_dumpdev() failed (gp->name=%s, error=%d)\n", __func__, gp->name, error); g_dev_attrchanged(cp, "GEOM::physpath"); snprintf(buf, sizeof(buf), "cdev=%s", gp->name); devctl_notify_f("GEOM", "DEV", "CREATE", buf, M_WAITOK); /* * Now add all the aliases for this drive */ LIST_FOREACH(gap, &pp->geom->aliases, ga_next) { error = make_dev_alias_p(MAKEDEV_CHECKNAME | MAKEDEV_WAITOK, &adev, dev, "%s", gap->ga_alias); if (error) { printf("%s: make_dev_alias_p() failed (name=%s, error=%d)\n", __func__, gap->ga_alias, error); continue; } snprintf(buf, sizeof(buf), "cdev=%s", gap->ga_alias); devctl_notify_f("GEOM", "DEV", "CREATE", buf, M_WAITOK); } return (gp); } static int g_dev_open(struct cdev *dev, int flags, int fmt, struct thread *td) { struct g_consumer *cp; struct g_dev_softc *sc; int error, r, w, e; cp = dev->si_drv2; if (cp == NULL) return (ENXIO); /* g_dev_taste() not done yet */ g_trace(G_T_ACCESS, "g_dev_open(%s, %d, %d, %p)", cp->geom->name, flags, fmt, td); r = flags & FREAD ? 1 : 0; w = flags & FWRITE ? 1 : 0; #ifdef notyet e = flags & O_EXCL ? 1 : 0; #else e = 0; #endif /* * This happens on attempt to open a device node with O_EXEC. */ if (r + w + e == 0) return (EINVAL); if (w) { /* * When running in very secure mode, do not allow * opens for writing of any disks. 
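 * securelevel_ge(td->td_ucred, 2) returns EPERM at securelevel 2 or
 * higher, so such opens fail before any g_access() counts are taken.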
*/ error = securelevel_ge(td->td_ucred, 2); if (error) return (error); } g_topology_lock(); error = g_access(cp, r, w, e); g_topology_unlock(); if (error == 0) { sc = cp->private; mtx_lock(&sc->sc_mtx); if (sc->sc_open == 0 && (sc->sc_active & SC_A_ACTIVE) != 0) wakeup(&sc->sc_active); sc->sc_open += r + w + e; if (sc->sc_open == 0) atomic_clear_int(&sc->sc_active, SC_A_OPEN); else atomic_set_int(&sc->sc_active, SC_A_OPEN); mtx_unlock(&sc->sc_mtx); } return (error); } static int g_dev_close(struct cdev *dev, int flags, int fmt, struct thread *td) { struct g_consumer *cp; struct g_dev_softc *sc; int error, r, w, e; cp = dev->si_drv2; if (cp == NULL) return (ENXIO); g_trace(G_T_ACCESS, "g_dev_close(%s, %d, %d, %p)", cp->geom->name, flags, fmt, td); r = flags & FREAD ? -1 : 0; w = flags & FWRITE ? -1 : 0; #ifdef notyet e = flags & O_EXCL ? -1 : 0; #else e = 0; #endif /* * The vgonel(9) - caused by eg. forced unmount of devfs - calls * VOP_CLOSE(9) on devfs vnode without any FREAD or FWRITE flags, * which would result in zero deltas, which in turn would cause * panic in g_access(9). * * Note that we cannot zero the counters (ie. do "r = cp->acr" * etc) instead, because the consumer might be opened in another * devfs instance. */ if (r + w + e == 0) return (EINVAL); sc = cp->private; mtx_lock(&sc->sc_mtx); sc->sc_open += r + w + e; if (sc->sc_open == 0) atomic_clear_int(&sc->sc_active, SC_A_OPEN); else atomic_set_int(&sc->sc_active, SC_A_OPEN); while (sc->sc_open == 0 && (sc->sc_active & SC_A_ACTIVE) != 0) msleep(&sc->sc_active, &sc->sc_mtx, 0, "g_dev_close", hz / 10); mtx_unlock(&sc->sc_mtx); g_topology_lock(); error = g_access(cp, r, w, e); g_topology_unlock(); return (error); } /* * XXX: Until we have unmessed the ioctl situation, there is a race against * XXX: a concurrent orphanization. We cannot close it by holding topology * XXX: since that would prevent us from doing our job, and stalling events * XXX: will break (actually: stall) the BSD disklabel hacks. 
*/ static int g_dev_ioctl(struct cdev *dev, u_long cmd, caddr_t data, int fflag, struct thread *td) { struct g_consumer *cp; struct g_provider *pp; off_t offset, length, chunk, odd; int i, error; cp = dev->si_drv2; pp = cp->provider; error = 0; KASSERT(cp->acr || cp->acw, ("Consumer with zero access count in g_dev_ioctl")); i = IOCPARM_LEN(cmd); switch (cmd) { case DIOCGSECTORSIZE: *(u_int *)data = cp->provider->sectorsize; if (*(u_int *)data == 0) error = ENOENT; break; case DIOCGMEDIASIZE: *(off_t *)data = cp->provider->mediasize; if (*(off_t *)data == 0) error = ENOENT; break; case DIOCGFWSECTORS: error = g_io_getattr("GEOM::fwsectors", cp, &i, data); if (error == 0 && *(u_int *)data == 0) error = ENOENT; break; case DIOCGFWHEADS: error = g_io_getattr("GEOM::fwheads", cp, &i, data); if (error == 0 && *(u_int *)data == 0) error = ENOENT; break; case DIOCGFRONTSTUFF: error = g_io_getattr("GEOM::frontstuff", cp, &i, data); break; #ifdef COMPAT_FREEBSD11 case DIOCSKERNELDUMP_FREEBSD11: { struct diocskerneldump_arg kda; bzero(&kda, sizeof(kda)); kda.kda_encryption = KERNELDUMP_ENC_NONE; kda.kda_enable = (uint8_t)*(u_int *)data; if (kda.kda_enable == 0) error = g_dev_setdumpdev(NULL, NULL, td); else error = g_dev_setdumpdev(dev, &kda, td); break; } #endif case DIOCSKERNELDUMP: { struct diocskerneldump_arg *kda; uint8_t *encryptedkey; kda = (struct diocskerneldump_arg *)data; if (kda->kda_enable == 0) { error = g_dev_setdumpdev(NULL, NULL, td); break; } if (kda->kda_encryption != KERNELDUMP_ENC_NONE) { if (kda->kda_encryptedkeysize <= 0 || kda->kda_encryptedkeysize > KERNELDUMP_ENCKEY_MAX_SIZE) { return (EINVAL); } encryptedkey = malloc(kda->kda_encryptedkeysize, M_TEMP, M_WAITOK); error = copyin(kda->kda_encryptedkey, encryptedkey, kda->kda_encryptedkeysize); } else { encryptedkey = NULL; } if (error == 0) { kda->kda_encryptedkey = encryptedkey; error = g_dev_setdumpdev(dev, kda, td); } if (encryptedkey != NULL) { explicit_bzero(encryptedkey, kda->kda_encryptedkeysize); free(encryptedkey, M_TEMP); } explicit_bzero(kda, sizeof(*kda)); break; } case DIOCGFLUSH: error = g_io_flush(cp); break; case DIOCGDELETE: offset = ((off_t *)data)[0]; length = ((off_t *)data)[1]; if ((offset % cp->provider->sectorsize) != 0 || (length % cp->provider->sectorsize) != 0 || length <= 0) { printf("%s: offset=%jd length=%jd\n", __func__, offset, length); error = EINVAL; break; } if ((cp->provider->mediasize > 0) && (offset >= cp->provider->mediasize)) { /* * Catch out-of-bounds requests here. The problem is * that due to historical GEOM I/O implementation * peculatities, g_delete_data() would always return * success for requests starting just the next byte * after providers media boundary. Condition check on * non-zero media size, since that condition would * (most likely) cause ENXIO instead. */ error = EIO; break; } while (length > 0) { chunk = length; if (g_dev_del_max_sectors != 0 && chunk > g_dev_del_max_sectors * cp->provider->sectorsize) { chunk = g_dev_del_max_sectors * cp->provider->sectorsize; if (cp->provider->stripesize > 0) { odd = (offset + chunk + cp->provider->stripeoffset) % cp->provider->stripesize; if (chunk > odd) chunk -= odd; } } error = g_delete_data(cp, offset, chunk); length -= chunk; offset += chunk; if (error) break; /* * Since the request size can be large, the service * time can be is likewise. We make this ioctl * interruptible by checking for signals for each bio. 
*/ if (SIGPENDING(td)) break; } break; case DIOCGIDENT: error = g_io_getattr("GEOM::ident", cp, &i, data); break; case DIOCGPROVIDERNAME: if (pp == NULL) return (ENOENT); strlcpy(data, pp->name, i); break; case DIOCGSTRIPESIZE: *(off_t *)data = cp->provider->stripesize; break; case DIOCGSTRIPEOFFSET: *(off_t *)data = cp->provider->stripeoffset; break; case DIOCGPHYSPATH: error = g_io_getattr("GEOM::physpath", cp, &i, data); if (error == 0 && *(char *)data == '\0') error = ENOENT; break; case DIOCGATTR: { struct diocgattr_arg *arg = (struct diocgattr_arg *)data; if (arg->len > sizeof(arg->value)) { error = EINVAL; break; } error = g_io_getattr(arg->name, cp, &arg->len, &arg->value); break; } case DIOCZONECMD: { struct disk_zone_args *zone_args =(struct disk_zone_args *)data; struct disk_zone_rep_entry *new_entries, *old_entries; struct disk_zone_report *rep; size_t alloc_size; old_entries = NULL; new_entries = NULL; rep = NULL; alloc_size = 0; if (zone_args->zone_cmd == DISK_ZONE_REPORT_ZONES) { rep = &zone_args->zone_params.report; #define MAXENTRIES (MAXPHYS / sizeof(struct disk_zone_rep_entry)) if (rep->entries_allocated > MAXENTRIES) rep->entries_allocated = MAXENTRIES; alloc_size = rep->entries_allocated * sizeof(struct disk_zone_rep_entry); if (alloc_size != 0) new_entries = g_malloc(alloc_size, M_WAITOK| M_ZERO); old_entries = rep->entries; rep->entries = new_entries; } error = g_io_zonecmd(zone_args, cp); if (zone_args->zone_cmd == DISK_ZONE_REPORT_ZONES && alloc_size != 0 && error == 0) error = copyout(new_entries, old_entries, alloc_size); if (old_entries != NULL && rep != NULL) rep->entries = old_entries; if (new_entries != NULL) g_free(new_entries); break; } default: if (cp->provider->geom->ioctl != NULL) { error = cp->provider->geom->ioctl(cp->provider, cmd, data, fflag, td); } else { error = ENOIOCTL; } } return (error); } static void g_dev_done(struct bio *bp2) { struct g_consumer *cp; struct g_dev_softc *sc; struct bio *bp; int active; cp = bp2->bio_from; sc = cp->private; bp = bp2->bio_parent; bp->bio_error = bp2->bio_error; bp->bio_completed = bp2->bio_completed; bp->bio_resid = bp->bio_length - bp2->bio_completed; if (bp2->bio_cmd == BIO_ZONE) bcopy(&bp2->bio_zone, &bp->bio_zone, sizeof(bp->bio_zone)); if (bp2->bio_error != 0) { g_trace(G_T_BIO, "g_dev_done(%p) had error %d", bp2, bp2->bio_error); bp->bio_flags |= BIO_ERROR; } else { g_trace(G_T_BIO, "g_dev_done(%p/%p) resid %ld completed %jd", bp2, bp, bp2->bio_resid, (intmax_t)bp2->bio_completed); } g_destroy_bio(bp2); active = atomic_fetchadd_int(&sc->sc_active, -1) - 1; if ((active & SC_A_ACTIVE) == 0) { if ((active & SC_A_OPEN) == 0) wakeup(&sc->sc_active); if (active & SC_A_DESTROY) g_post_event(g_dev_destroy, cp, M_NOWAIT, NULL); } biodone(bp); } static void g_dev_strategy(struct bio *bp) { struct g_consumer *cp; struct bio *bp2; struct cdev *dev; struct g_dev_softc *sc; KASSERT(bp->bio_cmd == BIO_READ || bp->bio_cmd == BIO_WRITE || bp->bio_cmd == BIO_DELETE || bp->bio_cmd == BIO_FLUSH || bp->bio_cmd == BIO_ZONE, ("Wrong bio_cmd bio=%p cmd=%d", bp, bp->bio_cmd)); dev = bp->bio_dev; cp = dev->si_drv2; sc = cp->private; KASSERT(cp->acr || cp->acw, ("Consumer with zero access count in g_dev_strategy")); biotrack(bp, __func__); #ifdef INVARIANTS if ((bp->bio_offset % cp->provider->sectorsize) != 0 || (bp->bio_bcount % cp->provider->sectorsize) != 0) { bp->bio_resid = bp->bio_bcount; biofinish(bp, NULL, EINVAL); return; } #endif KASSERT(sc->sc_open > 0, ("Closed device in g_dev_strategy")); 
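	/*
	 * The low SC_A_ACTIVE bits of sc_active count in-flight bios; the
	 * top bits (SC_A_OPEN, SC_A_DESTROY) are flags.  g_dev_done() drops
	 * the count and, once it reaches zero with SC_A_DESTROY set,
	 * schedules the deferred geom destruction.
	 */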
atomic_add_int(&sc->sc_active, 1); for (;;) { /* * XXX: This is not an ideal solution, but I believe it to * XXX: deadlock safely, all things considered. */ bp2 = g_clone_bio(bp); if (bp2 != NULL) break; pause("gdstrat", hz / 10); } KASSERT(bp2 != NULL, ("XXX: ENOMEM in a bad place")); bp2->bio_done = g_dev_done; g_trace(G_T_BIO, "g_dev_strategy(%p/%p) offset %jd length %jd data %p cmd %d", bp, bp2, (intmax_t)bp->bio_offset, (intmax_t)bp2->bio_length, bp2->bio_data, bp2->bio_cmd); g_io_request(bp2, cp); KASSERT(cp->acr || cp->acw, ("g_dev_strategy raced with g_dev_close and lost")); } /* * g_dev_callback() * * Called by devfs when asynchronous device destruction is completed. * - Mark that we have no attached device any more. * - If there are no outstanding requests, schedule geom destruction. * Otherwise destruction will be scheduled later by g_dev_done(). */ static void g_dev_callback(void *arg) { struct g_consumer *cp; struct g_dev_softc *sc; int active; cp = arg; sc = cp->private; g_trace(G_T_TOPOLOGY, "g_dev_callback(%p(%s))", cp, cp->geom->name); sc->sc_dev = NULL; sc->sc_alias = NULL; active = atomic_fetchadd_int(&sc->sc_active, SC_A_DESTROY); if ((active & SC_A_ACTIVE) == 0) g_post_event(g_dev_destroy, cp, M_WAITOK, NULL); } /* * g_dev_orphan() * * Called from below when the provider orphaned us. * - Clear any dump settings. * - Request asynchronous device destruction to prevent any more requests * from coming in. The provider is already marked with an error, so * anything which comes in the interim will be returned immediately. */ static void g_dev_orphan(struct g_consumer *cp) { struct cdev *dev; struct g_dev_softc *sc; g_topology_assert(); sc = cp->private; dev = sc->sc_dev; g_trace(G_T_TOPOLOGY, "g_dev_orphan(%p(%s))", cp, cp->geom->name); /* Reset any dump-area set on this device */ if (dev->si_flags & SI_DUMPDEV) (void)clear_dumper(curthread); /* Destroy the struct cdev *so we get no more requests */ + delist_dev(dev); destroy_dev_sched_cb(dev, g_dev_callback, cp); } DECLARE_GEOM_CLASS(g_dev_class, g_dev); Index: user/ngie/bug-237403/sys/kern/kern_sig.c =================================================================== --- user/ngie/bug-237403/sys/kern/kern_sig.c (revision 346925) +++ user/ngie/bug-237403/sys/kern/kern_sig.c (revision 346926) @@ -1,3840 +1,3838 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1982, 1986, 1989, 1991, 1993 * The Regents of the University of California. All rights reserved. * (c) UNIX System Laboratories, Inc. * All or some portions of this file are derived from material licensed * to the University of California by American Telephone and Telegraph * Co. or Unix System Laboratories, Inc. and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. 
* * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)kern_sig.c 8.7 (Berkeley) 4/18/94 */ #include __FBSDID("$FreeBSD$"); #include "opt_ktrace.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define ONSIG 32 /* NSIG for osig* syscalls. XXX. */ SDT_PROVIDER_DECLARE(proc); SDT_PROBE_DEFINE3(proc, , , signal__send, "struct thread *", "struct proc *", "int"); SDT_PROBE_DEFINE2(proc, , , signal__clear, "int", "ksiginfo_t *"); SDT_PROBE_DEFINE3(proc, , , signal__discard, "struct thread *", "struct proc *", "int"); static int coredump(struct thread *); static int killpg1(struct thread *td, int sig, int pgid, int all, ksiginfo_t *ksi); static int issignal(struct thread *td); static int sigprop(int sig); static void tdsigwakeup(struct thread *, int, sig_t, int); static int sig_suspend_threads(struct thread *, struct proc *, int); static int filt_sigattach(struct knote *kn); static void filt_sigdetach(struct knote *kn); static int filt_signal(struct knote *kn, long hint); static struct thread *sigtd(struct proc *p, int sig, int prop); static void sigqueue_start(void); static uma_zone_t ksiginfo_zone = NULL; struct filterops sig_filtops = { .f_isfd = 0, .f_attach = filt_sigattach, .f_detach = filt_sigdetach, .f_event = filt_signal, }; static int kern_logsigexit = 1; SYSCTL_INT(_kern, KERN_LOGSIGEXIT, logsigexit, CTLFLAG_RW, &kern_logsigexit, 0, "Log processes quitting on abnormal signals to syslog(3)"); static int kern_forcesigexit = 1; SYSCTL_INT(_kern, OID_AUTO, forcesigexit, CTLFLAG_RW, &kern_forcesigexit, 0, "Force trap signal to be handled"); static SYSCTL_NODE(_kern, OID_AUTO, sigqueue, CTLFLAG_RW, 0, "POSIX real time signal"); static int max_pending_per_proc = 128; SYSCTL_INT(_kern_sigqueue, OID_AUTO, max_pending_per_proc, CTLFLAG_RW, &max_pending_per_proc, 0, "Max pending signals per proc"); static int preallocate_siginfo = 1024; SYSCTL_INT(_kern_sigqueue, OID_AUTO, preallocate, CTLFLAG_RDTUN, &preallocate_siginfo, 0, "Preallocated signal memory size"); static int signal_overflow = 0; SYSCTL_INT(_kern_sigqueue, OID_AUTO, overflow, CTLFLAG_RD, &signal_overflow, 0, "Number of signals overflew"); static int signal_alloc_fail = 0; SYSCTL_INT(_kern_sigqueue, OID_AUTO, alloc_fail, CTLFLAG_RD, &signal_alloc_fail, 0, "signals failed to be allocated"); static int kern_lognosys = 0; SYSCTL_INT(_kern, OID_AUTO, lognosys, CTLFLAG_RWTUN, &kern_lognosys, 0, "Log invalid syscalls"); SYSINIT(signal, SI_SUB_P1003_1B, SI_ORDER_FIRST+3, 
sigqueue_start, NULL); /* * Policy -- Can ucred cr1 send SIGIO to process cr2? * Should use cr_cansignal() once cr_cansignal() allows SIGIO and SIGURG * in the right situations. */ #define CANSIGIO(cr1, cr2) \ ((cr1)->cr_uid == 0 || \ (cr1)->cr_ruid == (cr2)->cr_ruid || \ (cr1)->cr_uid == (cr2)->cr_ruid || \ (cr1)->cr_ruid == (cr2)->cr_uid || \ (cr1)->cr_uid == (cr2)->cr_uid) static int sugid_coredump; SYSCTL_INT(_kern, OID_AUTO, sugid_coredump, CTLFLAG_RWTUN, &sugid_coredump, 0, "Allow setuid and setgid processes to dump core"); static int capmode_coredump; SYSCTL_INT(_kern, OID_AUTO, capmode_coredump, CTLFLAG_RWTUN, &capmode_coredump, 0, "Allow processes in capability mode to dump core"); static int do_coredump = 1; SYSCTL_INT(_kern, OID_AUTO, coredump, CTLFLAG_RW, &do_coredump, 0, "Enable/Disable coredumps"); static int set_core_nodump_flag = 0; SYSCTL_INT(_kern, OID_AUTO, nodump_coredump, CTLFLAG_RW, &set_core_nodump_flag, 0, "Enable setting the NODUMP flag on coredump files"); static int coredump_devctl = 0; SYSCTL_INT(_kern, OID_AUTO, coredump_devctl, CTLFLAG_RW, &coredump_devctl, 0, "Generate a devctl notification when processes coredump"); /* * Signal properties and actions. * The array below categorizes the signals and their default actions * according to the following properties: */ #define SIGPROP_KILL 0x01 /* terminates process by default */ #define SIGPROP_CORE 0x02 /* ditto and coredumps */ #define SIGPROP_STOP 0x04 /* suspend process */ #define SIGPROP_TTYSTOP 0x08 /* ditto, from tty */ #define SIGPROP_IGNORE 0x10 /* ignore by default */ #define SIGPROP_CONT 0x20 /* continue if suspended */ #define SIGPROP_CANTMASK 0x40 /* non-maskable, catchable */ static int sigproptbl[NSIG] = { [SIGHUP] = SIGPROP_KILL, [SIGINT] = SIGPROP_KILL, [SIGQUIT] = SIGPROP_KILL | SIGPROP_CORE, [SIGILL] = SIGPROP_KILL | SIGPROP_CORE, [SIGTRAP] = SIGPROP_KILL | SIGPROP_CORE, [SIGABRT] = SIGPROP_KILL | SIGPROP_CORE, [SIGEMT] = SIGPROP_KILL | SIGPROP_CORE, [SIGFPE] = SIGPROP_KILL | SIGPROP_CORE, [SIGKILL] = SIGPROP_KILL, [SIGBUS] = SIGPROP_KILL | SIGPROP_CORE, [SIGSEGV] = SIGPROP_KILL | SIGPROP_CORE, [SIGSYS] = SIGPROP_KILL | SIGPROP_CORE, [SIGPIPE] = SIGPROP_KILL, [SIGALRM] = SIGPROP_KILL, [SIGTERM] = SIGPROP_KILL, [SIGURG] = SIGPROP_IGNORE, [SIGSTOP] = SIGPROP_STOP, [SIGTSTP] = SIGPROP_STOP | SIGPROP_TTYSTOP, [SIGCONT] = SIGPROP_IGNORE | SIGPROP_CONT, [SIGCHLD] = SIGPROP_IGNORE, [SIGTTIN] = SIGPROP_STOP | SIGPROP_TTYSTOP, [SIGTTOU] = SIGPROP_STOP | SIGPROP_TTYSTOP, [SIGIO] = SIGPROP_IGNORE, [SIGXCPU] = SIGPROP_KILL, [SIGXFSZ] = SIGPROP_KILL, [SIGVTALRM] = SIGPROP_KILL, [SIGPROF] = SIGPROP_KILL, [SIGWINCH] = SIGPROP_IGNORE, [SIGINFO] = SIGPROP_IGNORE, [SIGUSR1] = SIGPROP_KILL, [SIGUSR2] = SIGPROP_KILL, }; static void reschedule_signals(struct proc *p, sigset_t block, int flags); static void sigqueue_start(void) { ksiginfo_zone = uma_zcreate("ksiginfo", sizeof(ksiginfo_t), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0); uma_prealloc(ksiginfo_zone, preallocate_siginfo); p31b_setcfg(CTL_P1003_1B_REALTIME_SIGNALS, _POSIX_REALTIME_SIGNALS); p31b_setcfg(CTL_P1003_1B_RTSIG_MAX, SIGRTMAX - SIGRTMIN + 1); p31b_setcfg(CTL_P1003_1B_SIGQUEUE_MAX, max_pending_per_proc); } ksiginfo_t * ksiginfo_alloc(int wait) { int flags; flags = M_ZERO; if (! 
wait) flags |= M_NOWAIT; if (ksiginfo_zone != NULL) return ((ksiginfo_t *)uma_zalloc(ksiginfo_zone, flags)); return (NULL); } void ksiginfo_free(ksiginfo_t *ksi) { uma_zfree(ksiginfo_zone, ksi); } static __inline int ksiginfo_tryfree(ksiginfo_t *ksi) { if (!(ksi->ksi_flags & KSI_EXT)) { uma_zfree(ksiginfo_zone, ksi); return (1); } return (0); } void sigqueue_init(sigqueue_t *list, struct proc *p) { SIGEMPTYSET(list->sq_signals); SIGEMPTYSET(list->sq_kill); SIGEMPTYSET(list->sq_ptrace); TAILQ_INIT(&list->sq_list); list->sq_proc = p; list->sq_flags = SQ_INIT; } /* * Get a signal's ksiginfo. * Return: * 0 - signal not found * others - signal number */ static int sigqueue_get(sigqueue_t *sq, int signo, ksiginfo_t *si) { struct proc *p = sq->sq_proc; struct ksiginfo *ksi, *next; int count = 0; KASSERT(sq->sq_flags & SQ_INIT, ("sigqueue not inited")); if (!SIGISMEMBER(sq->sq_signals, signo)) return (0); if (SIGISMEMBER(sq->sq_ptrace, signo)) { count++; SIGDELSET(sq->sq_ptrace, signo); si->ksi_flags |= KSI_PTRACE; } if (SIGISMEMBER(sq->sq_kill, signo)) { count++; if (count == 1) SIGDELSET(sq->sq_kill, signo); } TAILQ_FOREACH_SAFE(ksi, &sq->sq_list, ksi_link, next) { if (ksi->ksi_signo == signo) { if (count == 0) { TAILQ_REMOVE(&sq->sq_list, ksi, ksi_link); ksi->ksi_sigq = NULL; ksiginfo_copy(ksi, si); if (ksiginfo_tryfree(ksi) && p != NULL) p->p_pendingcnt--; } if (++count > 1) break; } } if (count <= 1) SIGDELSET(sq->sq_signals, signo); si->ksi_signo = signo; return (signo); } void sigqueue_take(ksiginfo_t *ksi) { struct ksiginfo *kp; struct proc *p; sigqueue_t *sq; if (ksi == NULL || (sq = ksi->ksi_sigq) == NULL) return; p = sq->sq_proc; TAILQ_REMOVE(&sq->sq_list, ksi, ksi_link); ksi->ksi_sigq = NULL; if (!(ksi->ksi_flags & KSI_EXT) && p != NULL) p->p_pendingcnt--; for (kp = TAILQ_FIRST(&sq->sq_list); kp != NULL; kp = TAILQ_NEXT(kp, ksi_link)) { if (kp->ksi_signo == ksi->ksi_signo) break; } if (kp == NULL && !SIGISMEMBER(sq->sq_kill, ksi->ksi_signo) && !SIGISMEMBER(sq->sq_ptrace, ksi->ksi_signo)) SIGDELSET(sq->sq_signals, ksi->ksi_signo); } static int sigqueue_add(sigqueue_t *sq, int signo, ksiginfo_t *si) { struct proc *p = sq->sq_proc; struct ksiginfo *ksi; int ret = 0; KASSERT(sq->sq_flags & SQ_INIT, ("sigqueue not inited")); /* * SIGKILL/SIGSTOP cannot be caught or masked, so take the fast path * for these signals. 
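 * The fast path records the signal in sq_kill only, so no ksiginfo
 * allocation is needed and queueing cannot fail with EAGAIN.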
*/ if (signo == SIGKILL || signo == SIGSTOP || si == NULL) { SIGADDSET(sq->sq_kill, signo); goto out_set_bit; } /* directly insert the ksi, don't copy it */ if (si->ksi_flags & KSI_INS) { if (si->ksi_flags & KSI_HEAD) TAILQ_INSERT_HEAD(&sq->sq_list, si, ksi_link); else TAILQ_INSERT_TAIL(&sq->sq_list, si, ksi_link); si->ksi_sigq = sq; goto out_set_bit; } if (__predict_false(ksiginfo_zone == NULL)) { SIGADDSET(sq->sq_kill, signo); goto out_set_bit; } if (p != NULL && p->p_pendingcnt >= max_pending_per_proc) { signal_overflow++; ret = EAGAIN; } else if ((ksi = ksiginfo_alloc(0)) == NULL) { signal_alloc_fail++; ret = EAGAIN; } else { if (p != NULL) p->p_pendingcnt++; ksiginfo_copy(si, ksi); ksi->ksi_signo = signo; if (si->ksi_flags & KSI_HEAD) TAILQ_INSERT_HEAD(&sq->sq_list, ksi, ksi_link); else TAILQ_INSERT_TAIL(&sq->sq_list, ksi, ksi_link); ksi->ksi_sigq = sq; } if (ret != 0) { if ((si->ksi_flags & KSI_PTRACE) != 0) { SIGADDSET(sq->sq_ptrace, signo); ret = 0; goto out_set_bit; } else if ((si->ksi_flags & KSI_TRAP) != 0 || (si->ksi_flags & KSI_SIGQ) == 0) { SIGADDSET(sq->sq_kill, signo); ret = 0; goto out_set_bit; } return (ret); } out_set_bit: SIGADDSET(sq->sq_signals, signo); return (ret); } void sigqueue_flush(sigqueue_t *sq) { struct proc *p = sq->sq_proc; ksiginfo_t *ksi; KASSERT(sq->sq_flags & SQ_INIT, ("sigqueue not inited")); if (p != NULL) PROC_LOCK_ASSERT(p, MA_OWNED); while ((ksi = TAILQ_FIRST(&sq->sq_list)) != NULL) { TAILQ_REMOVE(&sq->sq_list, ksi, ksi_link); ksi->ksi_sigq = NULL; if (ksiginfo_tryfree(ksi) && p != NULL) p->p_pendingcnt--; } SIGEMPTYSET(sq->sq_signals); SIGEMPTYSET(sq->sq_kill); SIGEMPTYSET(sq->sq_ptrace); } static void sigqueue_move_set(sigqueue_t *src, sigqueue_t *dst, const sigset_t *set) { sigset_t tmp; struct proc *p1, *p2; ksiginfo_t *ksi, *next; KASSERT(src->sq_flags & SQ_INIT, ("src sigqueue not inited")); KASSERT(dst->sq_flags & SQ_INIT, ("dst sigqueue not inited")); p1 = src->sq_proc; p2 = dst->sq_proc; /* Move siginfo to target list */ TAILQ_FOREACH_SAFE(ksi, &src->sq_list, ksi_link, next) { if (SIGISMEMBER(*set, ksi->ksi_signo)) { TAILQ_REMOVE(&src->sq_list, ksi, ksi_link); if (p1 != NULL) p1->p_pendingcnt--; TAILQ_INSERT_TAIL(&dst->sq_list, ksi, ksi_link); ksi->ksi_sigq = dst; if (p2 != NULL) p2->p_pendingcnt++; } } /* Move pending bits to target list */ tmp = src->sq_kill; SIGSETAND(tmp, *set); SIGSETOR(dst->sq_kill, tmp); SIGSETNAND(src->sq_kill, tmp); tmp = src->sq_ptrace; SIGSETAND(tmp, *set); SIGSETOR(dst->sq_ptrace, tmp); SIGSETNAND(src->sq_ptrace, tmp); tmp = src->sq_signals; SIGSETAND(tmp, *set); SIGSETOR(dst->sq_signals, tmp); SIGSETNAND(src->sq_signals, tmp); } #if 0 static void sigqueue_move(sigqueue_t *src, sigqueue_t *dst, int signo) { sigset_t set; SIGEMPTYSET(set); SIGADDSET(set, signo); sigqueue_move_set(src, dst, &set); } #endif static void sigqueue_delete_set(sigqueue_t *sq, const sigset_t *set) { struct proc *p = sq->sq_proc; ksiginfo_t *ksi, *next; KASSERT(sq->sq_flags & SQ_INIT, ("src sigqueue not inited")); /* Remove siginfo queue */ TAILQ_FOREACH_SAFE(ksi, &sq->sq_list, ksi_link, next) { if (SIGISMEMBER(*set, ksi->ksi_signo)) { TAILQ_REMOVE(&sq->sq_list, ksi, ksi_link); ksi->ksi_sigq = NULL; if (ksiginfo_tryfree(ksi) && p != NULL) p->p_pendingcnt--; } } SIGSETNAND(sq->sq_kill, *set); SIGSETNAND(sq->sq_ptrace, *set); SIGSETNAND(sq->sq_signals, *set); } void sigqueue_delete(sigqueue_t *sq, int signo) { sigset_t set; SIGEMPTYSET(set); SIGADDSET(set, signo); sigqueue_delete_set(sq, &set); } /* Remove a set of signals for a process 
*/ static void sigqueue_delete_set_proc(struct proc *p, const sigset_t *set) { sigqueue_t worklist; struct thread *td0; PROC_LOCK_ASSERT(p, MA_OWNED); sigqueue_init(&worklist, NULL); sigqueue_move_set(&p->p_sigqueue, &worklist, set); FOREACH_THREAD_IN_PROC(p, td0) sigqueue_move_set(&td0->td_sigqueue, &worklist, set); sigqueue_flush(&worklist); } void sigqueue_delete_proc(struct proc *p, int signo) { sigset_t set; SIGEMPTYSET(set); SIGADDSET(set, signo); sigqueue_delete_set_proc(p, &set); } static void sigqueue_delete_stopmask_proc(struct proc *p) { sigset_t set; SIGEMPTYSET(set); SIGADDSET(set, SIGSTOP); SIGADDSET(set, SIGTSTP); SIGADDSET(set, SIGTTIN); SIGADDSET(set, SIGTTOU); sigqueue_delete_set_proc(p, &set); } /* * Determine signal that should be delivered to thread td, the current * thread, 0 if none. If there is a pending stop signal with default * action, the process stops in issignal(). */ int cursig(struct thread *td) { PROC_LOCK_ASSERT(td->td_proc, MA_OWNED); mtx_assert(&td->td_proc->p_sigacts->ps_mtx, MA_OWNED); THREAD_LOCK_ASSERT(td, MA_NOTOWNED); return (SIGPENDING(td) ? issignal(td) : 0); } /* * Arrange for ast() to handle unmasked pending signals on return to user * mode. This must be called whenever a signal is added to td_sigqueue or * unmasked in td_sigmask. */ void signotify(struct thread *td) { PROC_LOCK_ASSERT(td->td_proc, MA_OWNED); if (SIGPENDING(td)) { thread_lock(td); td->td_flags |= TDF_NEEDSIGCHK | TDF_ASTPENDING; thread_unlock(td); } } /* * Returns 1 (true) if altstack is configured for the thread, and the * passed stack bottom address falls into the altstack range. Handles * the 43 compat special case where the alt stack size is zero. */ int sigonstack(size_t sp) { struct thread *td; td = curthread; if ((td->td_pflags & TDP_ALTSTACK) == 0) return (0); #if defined(COMPAT_43) if (td->td_sigstk.ss_size == 0) return ((td->td_sigstk.ss_flags & SS_ONSTACK) != 0); #endif return (sp >= (size_t)td->td_sigstk.ss_sp && sp < td->td_sigstk.ss_size + (size_t)td->td_sigstk.ss_sp); } static __inline int sigprop(int sig) { if (sig > 0 && sig < nitems(sigproptbl)) return (sigproptbl[sig]); return (0); } int sig_ffs(sigset_t *set) { int i; for (i = 0; i < _SIG_WORDS; i++) if (set->__bits[i]) return (ffs(set->__bits[i]) + (i * 32)); return (0); } static bool sigact_flag_test(const struct sigaction *act, int flag) { /* * SA_SIGINFO is reset when signal disposition is set to * ignore or default. Other flags are kept according to user * settings. 
*/ return ((act->sa_flags & flag) != 0 && (flag != SA_SIGINFO || ((__sighandler_t *)act->sa_sigaction != SIG_IGN && (__sighandler_t *)act->sa_sigaction != SIG_DFL))); } /* * kern_sigaction * sigaction * freebsd4_sigaction * osigaction */ int kern_sigaction(struct thread *td, int sig, const struct sigaction *act, struct sigaction *oact, int flags) { struct sigacts *ps; struct proc *p = td->td_proc; if (!_SIG_VALID(sig)) return (EINVAL); if (act != NULL && act->sa_handler != SIG_DFL && act->sa_handler != SIG_IGN && (act->sa_flags & ~(SA_ONSTACK | SA_RESTART | SA_RESETHAND | SA_NOCLDSTOP | SA_NODEFER | SA_NOCLDWAIT | SA_SIGINFO)) != 0) return (EINVAL); PROC_LOCK(p); ps = p->p_sigacts; mtx_lock(&ps->ps_mtx); if (oact) { memset(oact, 0, sizeof(*oact)); oact->sa_mask = ps->ps_catchmask[_SIG_IDX(sig)]; if (SIGISMEMBER(ps->ps_sigonstack, sig)) oact->sa_flags |= SA_ONSTACK; if (!SIGISMEMBER(ps->ps_sigintr, sig)) oact->sa_flags |= SA_RESTART; if (SIGISMEMBER(ps->ps_sigreset, sig)) oact->sa_flags |= SA_RESETHAND; if (SIGISMEMBER(ps->ps_signodefer, sig)) oact->sa_flags |= SA_NODEFER; if (SIGISMEMBER(ps->ps_siginfo, sig)) { oact->sa_flags |= SA_SIGINFO; oact->sa_sigaction = (__siginfohandler_t *)ps->ps_sigact[_SIG_IDX(sig)]; } else oact->sa_handler = ps->ps_sigact[_SIG_IDX(sig)]; if (sig == SIGCHLD && ps->ps_flag & PS_NOCLDSTOP) oact->sa_flags |= SA_NOCLDSTOP; if (sig == SIGCHLD && ps->ps_flag & PS_NOCLDWAIT) oact->sa_flags |= SA_NOCLDWAIT; } if (act) { if ((sig == SIGKILL || sig == SIGSTOP) && act->sa_handler != SIG_DFL) { mtx_unlock(&ps->ps_mtx); PROC_UNLOCK(p); return (EINVAL); } /* * Change setting atomically. */ ps->ps_catchmask[_SIG_IDX(sig)] = act->sa_mask; SIG_CANTMASK(ps->ps_catchmask[_SIG_IDX(sig)]); if (sigact_flag_test(act, SA_SIGINFO)) { ps->ps_sigact[_SIG_IDX(sig)] = (__sighandler_t *)act->sa_sigaction; SIGADDSET(ps->ps_siginfo, sig); } else { ps->ps_sigact[_SIG_IDX(sig)] = act->sa_handler; SIGDELSET(ps->ps_siginfo, sig); } if (!sigact_flag_test(act, SA_RESTART)) SIGADDSET(ps->ps_sigintr, sig); else SIGDELSET(ps->ps_sigintr, sig); if (sigact_flag_test(act, SA_ONSTACK)) SIGADDSET(ps->ps_sigonstack, sig); else SIGDELSET(ps->ps_sigonstack, sig); if (sigact_flag_test(act, SA_RESETHAND)) SIGADDSET(ps->ps_sigreset, sig); else SIGDELSET(ps->ps_sigreset, sig); if (sigact_flag_test(act, SA_NODEFER)) SIGADDSET(ps->ps_signodefer, sig); else SIGDELSET(ps->ps_signodefer, sig); if (sig == SIGCHLD) { if (act->sa_flags & SA_NOCLDSTOP) ps->ps_flag |= PS_NOCLDSTOP; else ps->ps_flag &= ~PS_NOCLDSTOP; if (act->sa_flags & SA_NOCLDWAIT) { /* * Paranoia: since SA_NOCLDWAIT is implemented * by reparenting the dying child to PID 1 (and * trust it to reap the zombie), PID 1 itself * is forbidden to set SA_NOCLDWAIT. */ if (p->p_pid == 1) ps->ps_flag &= ~PS_NOCLDWAIT; else ps->ps_flag |= PS_NOCLDWAIT; } else ps->ps_flag &= ~PS_NOCLDWAIT; if (ps->ps_sigact[_SIG_IDX(SIGCHLD)] == SIG_IGN) ps->ps_flag |= PS_CLDSIGIGN; else ps->ps_flag &= ~PS_CLDSIGIGN; } /* * Set bit in ps_sigignore for signals that are set to SIG_IGN, * and for signals set to SIG_DFL where the default is to * ignore. However, don't put SIGCONT in ps_sigignore, as we * have to restart the process. 
*/ if (ps->ps_sigact[_SIG_IDX(sig)] == SIG_IGN || (sigprop(sig) & SIGPROP_IGNORE && ps->ps_sigact[_SIG_IDX(sig)] == SIG_DFL)) { /* never to be seen again */ sigqueue_delete_proc(p, sig); if (sig != SIGCONT) /* easier in psignal */ SIGADDSET(ps->ps_sigignore, sig); SIGDELSET(ps->ps_sigcatch, sig); } else { SIGDELSET(ps->ps_sigignore, sig); if (ps->ps_sigact[_SIG_IDX(sig)] == SIG_DFL) SIGDELSET(ps->ps_sigcatch, sig); else SIGADDSET(ps->ps_sigcatch, sig); } #ifdef COMPAT_FREEBSD4 if (ps->ps_sigact[_SIG_IDX(sig)] == SIG_IGN || ps->ps_sigact[_SIG_IDX(sig)] == SIG_DFL || (flags & KSA_FREEBSD4) == 0) SIGDELSET(ps->ps_freebsd4, sig); else SIGADDSET(ps->ps_freebsd4, sig); #endif #ifdef COMPAT_43 if (ps->ps_sigact[_SIG_IDX(sig)] == SIG_IGN || ps->ps_sigact[_SIG_IDX(sig)] == SIG_DFL || (flags & KSA_OSIGSET) == 0) SIGDELSET(ps->ps_osigset, sig); else SIGADDSET(ps->ps_osigset, sig); #endif } mtx_unlock(&ps->ps_mtx); PROC_UNLOCK(p); return (0); } #ifndef _SYS_SYSPROTO_H_ struct sigaction_args { int sig; struct sigaction *act; struct sigaction *oact; }; #endif int sys_sigaction(struct thread *td, struct sigaction_args *uap) { struct sigaction act, oact; struct sigaction *actp, *oactp; int error; actp = (uap->act != NULL) ? &act : NULL; oactp = (uap->oact != NULL) ? &oact : NULL; if (actp) { error = copyin(uap->act, actp, sizeof(act)); if (error) return (error); } error = kern_sigaction(td, uap->sig, actp, oactp, 0); if (oactp && !error) error = copyout(oactp, uap->oact, sizeof(oact)); return (error); } #ifdef COMPAT_FREEBSD4 #ifndef _SYS_SYSPROTO_H_ struct freebsd4_sigaction_args { int sig; struct sigaction *act; struct sigaction *oact; }; #endif int freebsd4_sigaction(struct thread *td, struct freebsd4_sigaction_args *uap) { struct sigaction act, oact; struct sigaction *actp, *oactp; int error; actp = (uap->act != NULL) ? &act : NULL; oactp = (uap->oact != NULL) ? &oact : NULL; if (actp) { error = copyin(uap->act, actp, sizeof(act)); if (error) return (error); } error = kern_sigaction(td, uap->sig, actp, oactp, KSA_FREEBSD4); if (oactp && !error) error = copyout(oactp, uap->oact, sizeof(oact)); return (error); } #endif /* COMAPT_FREEBSD4 */ #ifdef COMPAT_43 /* XXX - COMPAT_FBSD3 */ #ifndef _SYS_SYSPROTO_H_ struct osigaction_args { int signum; struct osigaction *nsa; struct osigaction *osa; }; #endif int osigaction(struct thread *td, struct osigaction_args *uap) { struct osigaction sa; struct sigaction nsa, osa; struct sigaction *nsap, *osap; int error; if (uap->signum <= 0 || uap->signum >= ONSIG) return (EINVAL); nsap = (uap->nsa != NULL) ? &nsa : NULL; osap = (uap->osa != NULL) ? &osa : NULL; if (nsap) { error = copyin(uap->nsa, &sa, sizeof(sa)); if (error) return (error); nsap->sa_handler = sa.sa_handler; nsap->sa_flags = sa.sa_flags; OSIG2SIG(sa.sa_mask, nsap->sa_mask); } error = kern_sigaction(td, uap->signum, nsap, osap, KSA_OSIGSET); if (osap && !error) { sa.sa_handler = osap->sa_handler; sa.sa_flags = osap->sa_flags; SIG2OSIG(osap->sa_mask, sa.sa_mask); error = copyout(&sa, uap->osa, sizeof(sa)); } return (error); } #if !defined(__i386__) /* Avoid replicating the same stub everywhere */ int osigreturn(struct thread *td, struct osigreturn_args *uap) { return (nosys(td, (struct nosys_args *)uap)); } #endif #endif /* COMPAT_43 */ /* * Initialize signal state for process 0; * set to ignore signals that are ignored by default. 
*/ void siginit(struct proc *p) { int i; struct sigacts *ps; PROC_LOCK(p); ps = p->p_sigacts; mtx_lock(&ps->ps_mtx); for (i = 1; i <= NSIG; i++) { if (sigprop(i) & SIGPROP_IGNORE && i != SIGCONT) { SIGADDSET(ps->ps_sigignore, i); } } mtx_unlock(&ps->ps_mtx); PROC_UNLOCK(p); } /* * Reset specified signal to the default disposition. */ static void sigdflt(struct sigacts *ps, int sig) { mtx_assert(&ps->ps_mtx, MA_OWNED); SIGDELSET(ps->ps_sigcatch, sig); if ((sigprop(sig) & SIGPROP_IGNORE) != 0 && sig != SIGCONT) SIGADDSET(ps->ps_sigignore, sig); ps->ps_sigact[_SIG_IDX(sig)] = SIG_DFL; SIGDELSET(ps->ps_siginfo, sig); } /* * Reset signals for an exec of the specified process. */ void execsigs(struct proc *p) { sigset_t osigignore; struct sigacts *ps; int sig; struct thread *td; /* * Reset caught signals. Held signals remain held * through td_sigmask (unless they were caught, * and are now ignored by default). */ PROC_LOCK_ASSERT(p, MA_OWNED); ps = p->p_sigacts; mtx_lock(&ps->ps_mtx); while (SIGNOTEMPTY(ps->ps_sigcatch)) { sig = sig_ffs(&ps->ps_sigcatch); sigdflt(ps, sig); if ((sigprop(sig) & SIGPROP_IGNORE) != 0) sigqueue_delete_proc(p, sig); } /* * As CloudABI processes cannot modify signal handlers, fully * reset all signals to their default behavior. Do ignore * SIGPIPE, as it would otherwise be impossible to recover from * writes to broken pipes and sockets. */ if (SV_PROC_ABI(p) == SV_ABI_CLOUDABI) { osigignore = ps->ps_sigignore; while (SIGNOTEMPTY(osigignore)) { sig = sig_ffs(&osigignore); SIGDELSET(osigignore, sig); if (sig != SIGPIPE) sigdflt(ps, sig); } SIGADDSET(ps->ps_sigignore, SIGPIPE); } /* * Reset stack state to the user stack. * Clear set of signals caught on the signal stack. */ td = curthread; MPASS(td->td_proc == p); td->td_sigstk.ss_flags = SS_DISABLE; td->td_sigstk.ss_size = 0; td->td_sigstk.ss_sp = 0; td->td_pflags &= ~TDP_ALTSTACK; /* * Reset no zombies if child dies flag as Solaris does. */ ps->ps_flag &= ~(PS_NOCLDWAIT | PS_CLDSIGIGN); if (ps->ps_sigact[_SIG_IDX(SIGCHLD)] == SIG_IGN) ps->ps_sigact[_SIG_IDX(SIGCHLD)] = SIG_DFL; mtx_unlock(&ps->ps_mtx); } /* * kern_sigprocmask() * * Manipulate signal mask. */ int kern_sigprocmask(struct thread *td, int how, sigset_t *set, sigset_t *oset, int flags) { sigset_t new_block, oset1; struct proc *p; int error; p = td->td_proc; if ((flags & SIGPROCMASK_PROC_LOCKED) != 0) PROC_LOCK_ASSERT(p, MA_OWNED); else PROC_LOCK(p); mtx_assert(&p->p_sigacts->ps_mtx, (flags & SIGPROCMASK_PS_LOCKED) != 0 ? MA_OWNED : MA_NOTOWNED); if (oset != NULL) *oset = td->td_sigmask; error = 0; if (set != NULL) { switch (how) { case SIG_BLOCK: SIG_CANTMASK(*set); oset1 = td->td_sigmask; SIGSETOR(td->td_sigmask, *set); new_block = td->td_sigmask; SIGSETNAND(new_block, oset1); break; case SIG_UNBLOCK: SIGSETNAND(td->td_sigmask, *set); signotify(td); goto out; case SIG_SETMASK: SIG_CANTMASK(*set); oset1 = td->td_sigmask; if (flags & SIGPROCMASK_OLD) SIGSETLO(td->td_sigmask, *set); else td->td_sigmask = *set; new_block = td->td_sigmask; SIGSETNAND(new_block, oset1); signotify(td); break; default: error = EINVAL; goto out; } /* * The new_block set contains signals that were not previously * blocked, but are blocked now. * * In case we block any signal that was not previously blocked * for td, and process has the signal pending, try to schedule * signal delivery to some thread that does not block the * signal, possibly waking it up. 
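	 * A single-threaded process has nobody else to hand the signal to,
	 * so the reschedule below is skipped in that case.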
*/ if (p->p_numthreads != 1) reschedule_signals(p, new_block, flags); } out: if (!(flags & SIGPROCMASK_PROC_LOCKED)) PROC_UNLOCK(p); return (error); } #ifndef _SYS_SYSPROTO_H_ struct sigprocmask_args { int how; const sigset_t *set; sigset_t *oset; }; #endif int sys_sigprocmask(struct thread *td, struct sigprocmask_args *uap) { sigset_t set, oset; sigset_t *setp, *osetp; int error; setp = (uap->set != NULL) ? &set : NULL; osetp = (uap->oset != NULL) ? &oset : NULL; if (setp) { error = copyin(uap->set, setp, sizeof(set)); if (error) return (error); } error = kern_sigprocmask(td, uap->how, setp, osetp, 0); if (osetp && !error) { error = copyout(osetp, uap->oset, sizeof(oset)); } return (error); } #ifdef COMPAT_43 /* XXX - COMPAT_FBSD3 */ #ifndef _SYS_SYSPROTO_H_ struct osigprocmask_args { int how; osigset_t mask; }; #endif int osigprocmask(struct thread *td, struct osigprocmask_args *uap) { sigset_t set, oset; int error; OSIG2SIG(uap->mask, set); error = kern_sigprocmask(td, uap->how, &set, &oset, 1); SIG2OSIG(oset, td->td_retval[0]); return (error); } #endif /* COMPAT_43 */ int sys_sigwait(struct thread *td, struct sigwait_args *uap) { ksiginfo_t ksi; sigset_t set; int error; error = copyin(uap->set, &set, sizeof(set)); if (error) { td->td_retval[0] = error; return (0); } error = kern_sigtimedwait(td, set, &ksi, NULL); if (error) { if (error == EINTR && td->td_proc->p_osrel < P_OSREL_SIGWAIT) error = ERESTART; if (error == ERESTART) return (error); td->td_retval[0] = error; return (0); } error = copyout(&ksi.ksi_signo, uap->sig, sizeof(ksi.ksi_signo)); td->td_retval[0] = error; return (0); } int sys_sigtimedwait(struct thread *td, struct sigtimedwait_args *uap) { struct timespec ts; struct timespec *timeout; sigset_t set; ksiginfo_t ksi; int error; if (uap->timeout) { error = copyin(uap->timeout, &ts, sizeof(ts)); if (error) return (error); timeout = &ts; } else timeout = NULL; error = copyin(uap->set, &set, sizeof(set)); if (error) return (error); error = kern_sigtimedwait(td, set, &ksi, timeout); if (error) return (error); if (uap->info) error = copyout(&ksi.ksi_info, uap->info, sizeof(siginfo_t)); if (error == 0) td->td_retval[0] = ksi.ksi_signo; return (error); } int sys_sigwaitinfo(struct thread *td, struct sigwaitinfo_args *uap) { ksiginfo_t ksi; sigset_t set; int error; error = copyin(uap->set, &set, sizeof(set)); if (error) return (error); error = kern_sigtimedwait(td, set, &ksi, NULL); if (error) return (error); if (uap->info) error = copyout(&ksi.ksi_info, uap->info, sizeof(siginfo_t)); if (error == 0) td->td_retval[0] = ksi.ksi_signo; return (error); } static void proc_td_siginfo_capture(struct thread *td, siginfo_t *si) { struct thread *thr; FOREACH_THREAD_IN_PROC(td->td_proc, thr) { if (thr == td) thr->td_si = *si; else thr->td_si.si_signo = 0; } } int kern_sigtimedwait(struct thread *td, sigset_t waitset, ksiginfo_t *ksi, struct timespec *timeout) { struct sigacts *ps; sigset_t saved_mask, new_block; struct proc *p; int error, sig, timo, timevalid = 0; struct timespec rts, ets, ts; struct timeval tv; p = td->td_proc; error = 0; ets.tv_sec = 0; ets.tv_nsec = 0; if (timeout != NULL) { if (timeout->tv_nsec >= 0 && timeout->tv_nsec < 1000000000) { timevalid = 1; getnanouptime(&rts); timespecadd(&rts, timeout, &ets); } } ksiginfo_init(ksi); /* Some signals can not be waited for. 
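 * SIG_CANTMASK() strips SIGKILL and SIGSTOP from the wait set; those
 * signals keep their normal disposition instead of being consumed by
 * the wait.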
*/ SIG_CANTMASK(waitset); ps = p->p_sigacts; PROC_LOCK(p); saved_mask = td->td_sigmask; SIGSETNAND(td->td_sigmask, waitset); for (;;) { mtx_lock(&ps->ps_mtx); sig = cursig(td); mtx_unlock(&ps->ps_mtx); KASSERT(sig >= 0, ("sig %d", sig)); if (sig != 0 && SIGISMEMBER(waitset, sig)) { if (sigqueue_get(&td->td_sigqueue, sig, ksi) != 0 || sigqueue_get(&p->p_sigqueue, sig, ksi) != 0) { error = 0; break; } } if (error != 0) break; /* * POSIX says this must be checked after looking for pending * signals. */ if (timeout != NULL) { if (!timevalid) { error = EINVAL; break; } getnanouptime(&rts); if (timespeccmp(&rts, &ets, >=)) { error = EAGAIN; break; } timespecsub(&ets, &rts, &ts); TIMESPEC_TO_TIMEVAL(&tv, &ts); timo = tvtohz(&tv); } else { timo = 0; } error = msleep(ps, &p->p_mtx, PPAUSE|PCATCH, "sigwait", timo); if (timeout != NULL) { if (error == ERESTART) { /* Timeout can not be restarted. */ error = EINTR; } else if (error == EAGAIN) { /* We will calculate timeout by ourself. */ error = 0; } } } new_block = saved_mask; SIGSETNAND(new_block, td->td_sigmask); td->td_sigmask = saved_mask; /* * Fewer signals can be delivered to us, reschedule signal * notification. */ if (p->p_numthreads != 1) reschedule_signals(p, new_block, 0); if (error == 0) { SDT_PROBE2(proc, , , signal__clear, sig, ksi); if (ksi->ksi_code == SI_TIMER) itimer_accept(p, ksi->ksi_timerid, ksi); #ifdef KTRACE if (KTRPOINT(td, KTR_PSIG)) { sig_t action; mtx_lock(&ps->ps_mtx); action = ps->ps_sigact[_SIG_IDX(sig)]; mtx_unlock(&ps->ps_mtx); ktrpsig(sig, action, &td->td_sigmask, ksi->ksi_code); } #endif if (sig == SIGKILL) { proc_td_siginfo_capture(td, &ksi->ksi_info); sigexit(td, sig); } } PROC_UNLOCK(p); return (error); } #ifndef _SYS_SYSPROTO_H_ struct sigpending_args { sigset_t *set; }; #endif int sys_sigpending(struct thread *td, struct sigpending_args *uap) { struct proc *p = td->td_proc; sigset_t pending; PROC_LOCK(p); pending = p->p_sigqueue.sq_signals; SIGSETOR(pending, td->td_sigqueue.sq_signals); PROC_UNLOCK(p); return (copyout(&pending, uap->set, sizeof(sigset_t))); } #ifdef COMPAT_43 /* XXX - COMPAT_FBSD3 */ #ifndef _SYS_SYSPROTO_H_ struct osigpending_args { int dummy; }; #endif int osigpending(struct thread *td, struct osigpending_args *uap) { struct proc *p = td->td_proc; sigset_t pending; PROC_LOCK(p); pending = p->p_sigqueue.sq_signals; SIGSETOR(pending, td->td_sigqueue.sq_signals); PROC_UNLOCK(p); SIG2OSIG(pending, td->td_retval[0]); return (0); } #endif /* COMPAT_43 */ #if defined(COMPAT_43) /* * Generalized interface signal handler, 4.3-compatible. */ #ifndef _SYS_SYSPROTO_H_ struct osigvec_args { int signum; struct sigvec *nsv; struct sigvec *osv; }; #endif /* ARGSUSED */ int osigvec(struct thread *td, struct osigvec_args *uap) { struct sigvec vec; struct sigaction nsa, osa; struct sigaction *nsap, *osap; int error; if (uap->signum <= 0 || uap->signum >= ONSIG) return (EINVAL); nsap = (uap->nsv != NULL) ? &nsa : NULL; osap = (uap->osv != NULL) ? 
&osa : NULL; if (nsap) { error = copyin(uap->nsv, &vec, sizeof(vec)); if (error) return (error); nsap->sa_handler = vec.sv_handler; OSIG2SIG(vec.sv_mask, nsap->sa_mask); nsap->sa_flags = vec.sv_flags; nsap->sa_flags ^= SA_RESTART; /* opposite of SV_INTERRUPT */ } error = kern_sigaction(td, uap->signum, nsap, osap, KSA_OSIGSET); if (osap && !error) { vec.sv_handler = osap->sa_handler; SIG2OSIG(osap->sa_mask, vec.sv_mask); vec.sv_flags = osap->sa_flags; vec.sv_flags &= ~SA_NOCLDWAIT; vec.sv_flags ^= SA_RESTART; error = copyout(&vec, uap->osv, sizeof(vec)); } return (error); } #ifndef _SYS_SYSPROTO_H_ struct osigblock_args { int mask; }; #endif int osigblock(struct thread *td, struct osigblock_args *uap) { sigset_t set, oset; OSIG2SIG(uap->mask, set); kern_sigprocmask(td, SIG_BLOCK, &set, &oset, 0); SIG2OSIG(oset, td->td_retval[0]); return (0); } #ifndef _SYS_SYSPROTO_H_ struct osigsetmask_args { int mask; }; #endif int osigsetmask(struct thread *td, struct osigsetmask_args *uap) { sigset_t set, oset; OSIG2SIG(uap->mask, set); kern_sigprocmask(td, SIG_SETMASK, &set, &oset, 0); SIG2OSIG(oset, td->td_retval[0]); return (0); } #endif /* COMPAT_43 */ /* * Suspend calling thread until signal, providing mask to be set in the * meantime. */ #ifndef _SYS_SYSPROTO_H_ struct sigsuspend_args { const sigset_t *sigmask; }; #endif /* ARGSUSED */ int sys_sigsuspend(struct thread *td, struct sigsuspend_args *uap) { sigset_t mask; int error; error = copyin(uap->sigmask, &mask, sizeof(mask)); if (error) return (error); return (kern_sigsuspend(td, mask)); } int kern_sigsuspend(struct thread *td, sigset_t mask) { struct proc *p = td->td_proc; int has_sig, sig; /* * When returning from sigsuspend, we want * the old mask to be restored after the * signal handler has finished. Thus, we * save it here and mark the sigacts structure * to indicate this. */ PROC_LOCK(p); kern_sigprocmask(td, SIG_SETMASK, &mask, &td->td_oldsigmask, SIGPROCMASK_PROC_LOCKED); td->td_pflags |= TDP_OLDMASK; /* * Process signals now. Otherwise, we can get spurious wakeup * due to signal entered process queue, but delivered to other * thread. But sigsuspend should return only on signal * delivery. */ (p->p_sysent->sv_set_syscall_retval)(td, EINTR); for (has_sig = 0; !has_sig;) { while (msleep(&p->p_sigacts, &p->p_mtx, PPAUSE|PCATCH, "pause", 0) == 0) /* void */; thread_suspend_check(0); mtx_lock(&p->p_sigacts->ps_mtx); while ((sig = cursig(td)) != 0) { KASSERT(sig >= 0, ("sig %d", sig)); has_sig += postsig(sig); } mtx_unlock(&p->p_sigacts->ps_mtx); } PROC_UNLOCK(p); td->td_errno = EINTR; td->td_pflags |= TDP_NERRNO; return (EJUSTRETURN); } #ifdef COMPAT_43 /* XXX - COMPAT_FBSD3 */ /* * Compatibility sigsuspend call for old binaries. Note nonstandard calling * convention: libc stub passes mask, not pointer, to save a copyin. 
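 *
 * For reference, the usual race-free wait against the modern
 * sigsuspend(2) above looks like this in userland (an illustrative
 * sketch, not part of this change; got_usr1 is a hypothetical flag set
 * by the SIGUSR1 handler):
 *
 *	sigset_t block, old;
 *
 *	sigemptyset(&block);
 *	sigaddset(&block, SIGUSR1);
 *	sigprocmask(SIG_BLOCK, &block, &old);
 *	while (!got_usr1)
 *		sigsuspend(&old);
 *	sigprocmask(SIG_SETMASK, &old, NULL);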
*/ #ifndef _SYS_SYSPROTO_H_ struct osigsuspend_args { osigset_t mask; }; #endif /* ARGSUSED */ int osigsuspend(struct thread *td, struct osigsuspend_args *uap) { sigset_t mask; OSIG2SIG(uap->mask, mask); return (kern_sigsuspend(td, mask)); } #endif /* COMPAT_43 */ #if defined(COMPAT_43) #ifndef _SYS_SYSPROTO_H_ struct osigstack_args { struct sigstack *nss; struct sigstack *oss; }; #endif /* ARGSUSED */ int osigstack(struct thread *td, struct osigstack_args *uap) { struct sigstack nss, oss; int error = 0; if (uap->nss != NULL) { error = copyin(uap->nss, &nss, sizeof(nss)); if (error) return (error); } oss.ss_sp = td->td_sigstk.ss_sp; oss.ss_onstack = sigonstack(cpu_getstack(td)); if (uap->nss != NULL) { td->td_sigstk.ss_sp = nss.ss_sp; td->td_sigstk.ss_size = 0; td->td_sigstk.ss_flags |= nss.ss_onstack & SS_ONSTACK; td->td_pflags |= TDP_ALTSTACK; } if (uap->oss != NULL) error = copyout(&oss, uap->oss, sizeof(oss)); return (error); } #endif /* COMPAT_43 */ #ifndef _SYS_SYSPROTO_H_ struct sigaltstack_args { stack_t *ss; stack_t *oss; }; #endif /* ARGSUSED */ int sys_sigaltstack(struct thread *td, struct sigaltstack_args *uap) { stack_t ss, oss; int error; if (uap->ss != NULL) { error = copyin(uap->ss, &ss, sizeof(ss)); if (error) return (error); } error = kern_sigaltstack(td, (uap->ss != NULL) ? &ss : NULL, (uap->oss != NULL) ? &oss : NULL); if (error) return (error); if (uap->oss != NULL) error = copyout(&oss, uap->oss, sizeof(stack_t)); return (error); } int kern_sigaltstack(struct thread *td, stack_t *ss, stack_t *oss) { struct proc *p = td->td_proc; int oonstack; oonstack = sigonstack(cpu_getstack(td)); if (oss != NULL) { *oss = td->td_sigstk; oss->ss_flags = (td->td_pflags & TDP_ALTSTACK) ? ((oonstack) ? SS_ONSTACK : 0) : SS_DISABLE; } if (ss != NULL) { if (oonstack) return (EPERM); if ((ss->ss_flags & ~SS_DISABLE) != 0) return (EINVAL); if (!(ss->ss_flags & SS_DISABLE)) { if (ss->ss_size < p->p_sysent->sv_minsigstksz) return (ENOMEM); td->td_sigstk = *ss; td->td_pflags |= TDP_ALTSTACK; } else { td->td_pflags &= ~TDP_ALTSTACK; } } return (0); } /* * Common code for kill process group/broadcast kill. * cp is calling process. */ static int killpg1(struct thread *td, int sig, int pgid, int all, ksiginfo_t *ksi) { struct proc *p; struct pgrp *pgrp; int err; int ret; ret = ESRCH; if (all) { /* * broadcast */ sx_slock(&allproc_lock); FOREACH_PROC_IN_SYSTEM(p) { if (p->p_pid <= 1 || p->p_flag & P_SYSTEM || p == td->td_proc || p->p_state == PRS_NEW) { continue; } PROC_LOCK(p); err = p_cansignal(td, p, sig); if (err == 0) { if (sig) pksignal(p, sig, ksi); ret = err; } else if (ret == ESRCH) ret = err; PROC_UNLOCK(p); } sx_sunlock(&allproc_lock); } else { sx_slock(&proctree_lock); if (pgid == 0) { /* * zero pgid means send to my process group. 
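 * (the kill(0, sig) / killpg(0, sig) case), so the caller's own pgrp is
 * used directly instead of being looked up with pgfind().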
*/ pgrp = td->td_proc->p_pgrp; PGRP_LOCK(pgrp); } else { pgrp = pgfind(pgid); if (pgrp == NULL) { sx_sunlock(&proctree_lock); return (ESRCH); } } sx_sunlock(&proctree_lock); LIST_FOREACH(p, &pgrp->pg_members, p_pglist) { PROC_LOCK(p); if (p->p_pid <= 1 || p->p_flag & P_SYSTEM || p->p_state == PRS_NEW) { PROC_UNLOCK(p); continue; } err = p_cansignal(td, p, sig); if (err == 0) { if (sig) pksignal(p, sig, ksi); ret = err; } else if (ret == ESRCH) ret = err; PROC_UNLOCK(p); } PGRP_UNLOCK(pgrp); } return (ret); } #ifndef _SYS_SYSPROTO_H_ struct kill_args { int pid; int signum; }; #endif /* ARGSUSED */ int sys_kill(struct thread *td, struct kill_args *uap) { ksiginfo_t ksi; struct proc *p; int error; /* * A process in capability mode can send signals only to himself. * The main rationale behind this is that abort(3) is implemented as * kill(getpid(), SIGABRT). */ if (IN_CAPABILITY_MODE(td) && uap->pid != td->td_proc->p_pid) return (ECAPMODE); AUDIT_ARG_SIGNUM(uap->signum); AUDIT_ARG_PID(uap->pid); if ((u_int)uap->signum > _SIG_MAXSIG) return (EINVAL); ksiginfo_init(&ksi); ksi.ksi_signo = uap->signum; ksi.ksi_code = SI_USER; ksi.ksi_pid = td->td_proc->p_pid; ksi.ksi_uid = td->td_ucred->cr_ruid; if (uap->pid > 0) { /* kill single process */ if ((p = pfind_any(uap->pid)) == NULL) return (ESRCH); AUDIT_ARG_PROCESS(p); error = p_cansignal(td, p, uap->signum); if (error == 0 && uap->signum) pksignal(p, uap->signum, &ksi); PROC_UNLOCK(p); return (error); } switch (uap->pid) { case -1: /* broadcast signal */ return (killpg1(td, uap->signum, 0, 1, &ksi)); case 0: /* signal own process group */ return (killpg1(td, uap->signum, 0, 0, &ksi)); default: /* negative explicit process group */ return (killpg1(td, uap->signum, -uap->pid, 0, &ksi)); } /* NOTREACHED */ } int sys_pdkill(struct thread *td, struct pdkill_args *uap) { struct proc *p; int error; AUDIT_ARG_SIGNUM(uap->signum); AUDIT_ARG_FD(uap->fd); if ((u_int)uap->signum > _SIG_MAXSIG) return (EINVAL); error = procdesc_find(td, uap->fd, &cap_pdkill_rights, &p); if (error) return (error); AUDIT_ARG_PROCESS(p); error = p_cansignal(td, p, uap->signum); if (error == 0 && uap->signum) kern_psignal(p, uap->signum); PROC_UNLOCK(p); return (error); } #if defined(COMPAT_43) #ifndef _SYS_SYSPROTO_H_ struct okillpg_args { int pgid; int signum; }; #endif /* ARGSUSED */ int okillpg(struct thread *td, struct okillpg_args *uap) { ksiginfo_t ksi; AUDIT_ARG_SIGNUM(uap->signum); AUDIT_ARG_PID(uap->pgid); if ((u_int)uap->signum > _SIG_MAXSIG) return (EINVAL); ksiginfo_init(&ksi); ksi.ksi_signo = uap->signum; ksi.ksi_code = SI_USER; ksi.ksi_pid = td->td_proc->p_pid; ksi.ksi_uid = td->td_ucred->cr_ruid; return (killpg1(td, uap->signum, uap->pgid, 0, &ksi)); } #endif /* COMPAT_43 */ #ifndef _SYS_SYSPROTO_H_ struct sigqueue_args { pid_t pid; int signum; /* union sigval */ void *value; }; #endif int sys_sigqueue(struct thread *td, struct sigqueue_args *uap) { union sigval sv; sv.sival_ptr = uap->value; return (kern_sigqueue(td, uap->pid, uap->signum, &sv)); } int kern_sigqueue(struct thread *td, pid_t pid, int signum, union sigval *value) { ksiginfo_t ksi; struct proc *p; int error; if ((u_int)signum > _SIG_MAXSIG) return (EINVAL); /* * Specification says sigqueue can only send signal to * single process. 
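 * Unlike kill(2), a zero or negative pid does not address a process
 * group here; it is rejected with EINVAL just below.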
*/ if (pid <= 0) return (EINVAL); if ((p = pfind_any(pid)) == NULL) return (ESRCH); error = p_cansignal(td, p, signum); if (error == 0 && signum != 0) { ksiginfo_init(&ksi); ksi.ksi_flags = KSI_SIGQ; ksi.ksi_signo = signum; ksi.ksi_code = SI_QUEUE; ksi.ksi_pid = td->td_proc->p_pid; ksi.ksi_uid = td->td_ucred->cr_ruid; ksi.ksi_value = *value; error = pksignal(p, ksi.ksi_signo, &ksi); } PROC_UNLOCK(p); return (error); } /* * Send a signal to a process group. */ void gsignal(int pgid, int sig, ksiginfo_t *ksi) { struct pgrp *pgrp; if (pgid != 0) { sx_slock(&proctree_lock); pgrp = pgfind(pgid); sx_sunlock(&proctree_lock); if (pgrp != NULL) { pgsignal(pgrp, sig, 0, ksi); PGRP_UNLOCK(pgrp); } } } /* * Send a signal to a process group. If checktty is 1, * limit to members which have a controlling terminal. */ void pgsignal(struct pgrp *pgrp, int sig, int checkctty, ksiginfo_t *ksi) { struct proc *p; if (pgrp) { PGRP_LOCK_ASSERT(pgrp, MA_OWNED); LIST_FOREACH(p, &pgrp->pg_members, p_pglist) { PROC_LOCK(p); if (p->p_state == PRS_NORMAL && (checkctty == 0 || p->p_flag & P_CONTROLT)) pksignal(p, sig, ksi); PROC_UNLOCK(p); } } } /* * Recalculate the signal mask and reset the signal disposition after * usermode frame for delivery is formed. Should be called after * mach-specific routine, because sysent->sv_sendsig() needs correct * ps_siginfo and signal mask. */ static void postsig_done(int sig, struct thread *td, struct sigacts *ps) { sigset_t mask; mtx_assert(&ps->ps_mtx, MA_OWNED); td->td_ru.ru_nsignals++; mask = ps->ps_catchmask[_SIG_IDX(sig)]; if (!SIGISMEMBER(ps->ps_signodefer, sig)) SIGADDSET(mask, sig); kern_sigprocmask(td, SIG_BLOCK, &mask, NULL, SIGPROCMASK_PROC_LOCKED | SIGPROCMASK_PS_LOCKED); if (SIGISMEMBER(ps->ps_sigreset, sig)) sigdflt(ps, sig); } /* * Send a signal caused by a trap to the current thread. If it will be * caught immediately, deliver it with correct code. Otherwise, post it * normally. */ void trapsignal(struct thread *td, ksiginfo_t *ksi) { struct sigacts *ps; struct proc *p; int sig; int code; p = td->td_proc; sig = ksi->ksi_signo; code = ksi->ksi_code; KASSERT(_SIG_VALID(sig), ("invalid signal")); PROC_LOCK(p); ps = p->p_sigacts; mtx_lock(&ps->ps_mtx); if ((p->p_flag & P_TRACED) == 0 && SIGISMEMBER(ps->ps_sigcatch, sig) && !SIGISMEMBER(td->td_sigmask, sig)) { #ifdef KTRACE if (KTRPOINT(curthread, KTR_PSIG)) ktrpsig(sig, ps->ps_sigact[_SIG_IDX(sig)], &td->td_sigmask, code); #endif (*p->p_sysent->sv_sendsig)(ps->ps_sigact[_SIG_IDX(sig)], ksi, &td->td_sigmask); postsig_done(sig, td, ps); mtx_unlock(&ps->ps_mtx); } else { /* * Avoid a possible infinite loop if the thread * masking the signal or process is ignoring the * signal. */ if (kern_forcesigexit && (SIGISMEMBER(td->td_sigmask, sig) || ps->ps_sigact[_SIG_IDX(sig)] == SIG_IGN)) { SIGDELSET(td->td_sigmask, sig); SIGDELSET(ps->ps_sigcatch, sig); SIGDELSET(ps->ps_sigignore, sig); ps->ps_sigact[_SIG_IDX(sig)] = SIG_DFL; } mtx_unlock(&ps->ps_mtx); - p->p_code = code; /* XXX for core dump/debugger */ p->p_sig = sig; /* XXX to verify code */ tdsendsignal(p, td, sig, ksi); } PROC_UNLOCK(p); } static struct thread * sigtd(struct proc *p, int sig, int prop) { struct thread *td, *signal_td; PROC_LOCK_ASSERT(p, MA_OWNED); /* * Check if current thread can handle the signal without * switching context to another thread. 
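 * Otherwise, prefer the first thread in the process that does not have
 * the signal blocked, falling back to the first thread when every
 * thread blocks it.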
*/ if (curproc == p && !SIGISMEMBER(curthread->td_sigmask, sig)) return (curthread); signal_td = NULL; FOREACH_THREAD_IN_PROC(p, td) { if (!SIGISMEMBER(td->td_sigmask, sig)) { signal_td = td; break; } } if (signal_td == NULL) signal_td = FIRST_THREAD_IN_PROC(p); return (signal_td); } /* * Send the signal to the process. If the signal has an action, the action * is usually performed by the target process rather than the caller; we add * the signal to the set of pending signals for the process. * * Exceptions: * o When a stop signal is sent to a sleeping process that takes the * default action, the process is stopped without awakening it. * o SIGCONT restarts stopped processes (or puts them back to sleep) * regardless of the signal action (eg, blocked or ignored). * * Other ignored signals are discarded immediately. * * NB: This function may be entered from the debugger via the "kill" DDB * command. There is little that can be done to mitigate the possibly messy * side effects of this unwise possibility. */ void kern_psignal(struct proc *p, int sig) { ksiginfo_t ksi; ksiginfo_init(&ksi); ksi.ksi_signo = sig; ksi.ksi_code = SI_KERNEL; (void) tdsendsignal(p, NULL, sig, &ksi); } int pksignal(struct proc *p, int sig, ksiginfo_t *ksi) { return (tdsendsignal(p, NULL, sig, ksi)); } /* Utility function for finding a thread to send signal event to. */ int sigev_findtd(struct proc *p ,struct sigevent *sigev, struct thread **ttd) { struct thread *td; if (sigev->sigev_notify == SIGEV_THREAD_ID) { td = tdfind(sigev->sigev_notify_thread_id, p->p_pid); if (td == NULL) return (ESRCH); *ttd = td; } else { *ttd = NULL; PROC_LOCK(p); } return (0); } void tdsignal(struct thread *td, int sig) { ksiginfo_t ksi; ksiginfo_init(&ksi); ksi.ksi_signo = sig; ksi.ksi_code = SI_KERNEL; (void) tdsendsignal(td->td_proc, td, sig, &ksi); } void tdksignal(struct thread *td, int sig, ksiginfo_t *ksi) { (void) tdsendsignal(td->td_proc, td, sig, ksi); } int tdsendsignal(struct proc *p, struct thread *td, int sig, ksiginfo_t *ksi) { sig_t action; sigqueue_t *sigqueue; int prop; struct sigacts *ps; int intrval; int ret = 0; int wakeup_swapper; MPASS(td == NULL || p == td->td_proc); PROC_LOCK_ASSERT(p, MA_OWNED); if (!_SIG_VALID(sig)) panic("%s(): invalid signal %d", __func__, sig); KASSERT(ksi == NULL || !KSI_ONQ(ksi), ("%s: ksi on queue", __func__)); /* * IEEE Std 1003.1-2001: return success when killing a zombie. */ if (p->p_state == PRS_ZOMBIE) { if (ksi && (ksi->ksi_flags & KSI_INS)) ksiginfo_tryfree(ksi); return (ret); } ps = p->p_sigacts; KNOTE_LOCKED(p->p_klist, NOTE_SIGNAL | sig); prop = sigprop(sig); if (td == NULL) { td = sigtd(p, sig, prop); sigqueue = &p->p_sigqueue; } else sigqueue = &td->td_sigqueue; SDT_PROBE3(proc, , , signal__send, td, p, sig); /* * If the signal is being ignored, * then we forget about it immediately. * (Note: we don't set SIGCONT in ps_sigignore, * and if it is set to SIG_IGN, * action will be SIG_DFL here.) 
*/ mtx_lock(&ps->ps_mtx); if (SIGISMEMBER(ps->ps_sigignore, sig)) { SDT_PROBE3(proc, , , signal__discard, td, p, sig); mtx_unlock(&ps->ps_mtx); if (ksi && (ksi->ksi_flags & KSI_INS)) ksiginfo_tryfree(ksi); return (ret); } if (SIGISMEMBER(td->td_sigmask, sig)) action = SIG_HOLD; else if (SIGISMEMBER(ps->ps_sigcatch, sig)) action = SIG_CATCH; else action = SIG_DFL; if (SIGISMEMBER(ps->ps_sigintr, sig)) intrval = EINTR; else intrval = ERESTART; mtx_unlock(&ps->ps_mtx); if (prop & SIGPROP_CONT) sigqueue_delete_stopmask_proc(p); else if (prop & SIGPROP_STOP) { /* * If sending a tty stop signal to a member of an orphaned * process group, discard the signal here if the action * is default; don't stop the process below if sleeping, * and don't clear any pending SIGCONT. */ if ((prop & SIGPROP_TTYSTOP) && (p->p_pgrp->pg_jobc == 0) && (action == SIG_DFL)) { if (ksi && (ksi->ksi_flags & KSI_INS)) ksiginfo_tryfree(ksi); return (ret); } sigqueue_delete_proc(p, SIGCONT); if (p->p_flag & P_CONTINUED) { p->p_flag &= ~P_CONTINUED; PROC_LOCK(p->p_pptr); sigqueue_take(p->p_ksi); PROC_UNLOCK(p->p_pptr); } } ret = sigqueue_add(sigqueue, sig, ksi); if (ret != 0) return (ret); signotify(td); /* * Defer further processing for signals which are held, * except that stopped processes must be continued by SIGCONT. */ if (action == SIG_HOLD && !((prop & SIGPROP_CONT) && (p->p_flag & P_STOPPED_SIG))) return (ret); /* SIGKILL: Remove procfs STOPEVENTs. */ if (sig == SIGKILL) { /* from procfs_ioctl.c: PIOCBIC */ p->p_stops = 0; /* from procfs_ioctl.c: PIOCCONT */ p->p_step = 0; wakeup(&p->p_step); } /* * Some signals have a process-wide effect and a per-thread * component. Most processing occurs when the process next * tries to cross the user boundary, however there are some * times when processing needs to be done immediately, such as * waking up threads so that they can cross the user boundary. * We try to do the per-process part here. */ if (P_SHOULDSTOP(p)) { KASSERT(!(p->p_flag & P_WEXIT), ("signal to stopped but exiting process")); if (sig == SIGKILL) { /* * If traced process is already stopped, * then no further action is necessary. */ if (p->p_flag & P_TRACED) goto out; /* * SIGKILL sets process running. * It will die elsewhere. * All threads must be restarted. */ p->p_flag &= ~P_STOPPED_SIG; goto runfast; } if (prop & SIGPROP_CONT) { /* * If traced process is already stopped, * then no further action is necessary. */ if (p->p_flag & P_TRACED) goto out; /* * If SIGCONT is default (or ignored), we continue the * process but don't leave the signal in sigqueue as * it has no further action. If SIGCONT is held, we * continue the process and leave the signal in * sigqueue. If the process catches SIGCONT, let it * handle the signal itself. If it isn't waiting on * an event, it goes back to run state. * Otherwise, process goes back to sleep state. */ p->p_flag &= ~P_STOPPED_SIG; PROC_SLOCK(p); if (p->p_numthreads == p->p_suspcount) { PROC_SUNLOCK(p); p->p_flag |= P_CONTINUED; p->p_xsig = SIGCONT; PROC_LOCK(p->p_pptr); childproc_continued(p); PROC_UNLOCK(p->p_pptr); PROC_SLOCK(p); } if (action == SIG_DFL) { thread_unsuspend(p); PROC_SUNLOCK(p); sigqueue_delete(sigqueue, sig); goto out; } if (action == SIG_CATCH) { /* * The process wants to catch it so it needs * to run at least one thread, but which one? */ PROC_SUNLOCK(p); goto runfast; } /* * The signal is not ignored or caught. 
*/ thread_unsuspend(p); PROC_SUNLOCK(p); goto out; } if (prop & SIGPROP_STOP) { /* * If traced process is already stopped, * then no further action is necessary. */ if (p->p_flag & P_TRACED) goto out; /* * Already stopped, don't need to stop again * (If we did the shell could get confused). * Just make sure the signal STOP bit set. */ p->p_flag |= P_STOPPED_SIG; sigqueue_delete(sigqueue, sig); goto out; } /* * All other kinds of signals: * If a thread is sleeping interruptibly, simulate a * wakeup so that when it is continued it will be made * runnable and can look at the signal. However, don't make * the PROCESS runnable, leave it stopped. * It may run a bit until it hits a thread_suspend_check(). */ wakeup_swapper = 0; PROC_SLOCK(p); thread_lock(td); if (TD_ON_SLEEPQ(td) && (td->td_flags & TDF_SINTR)) wakeup_swapper = sleepq_abort(td, intrval); thread_unlock(td); PROC_SUNLOCK(p); if (wakeup_swapper) kick_proc0(); goto out; /* * Mutexes are short lived. Threads waiting on them will * hit thread_suspend_check() soon. */ } else if (p->p_state == PRS_NORMAL) { if (p->p_flag & P_TRACED || action == SIG_CATCH) { tdsigwakeup(td, sig, action, intrval); goto out; } MPASS(action == SIG_DFL); if (prop & SIGPROP_STOP) { if (p->p_flag & (P_PPWAIT|P_WEXIT)) goto out; p->p_flag |= P_STOPPED_SIG; p->p_xsig = sig; PROC_SLOCK(p); wakeup_swapper = sig_suspend_threads(td, p, 1); if (p->p_numthreads == p->p_suspcount) { /* * only thread sending signal to another * process can reach here, if thread is sending * signal to its process, because thread does * not suspend itself here, p_numthreads * should never be equal to p_suspcount. */ thread_stopped(p); PROC_SUNLOCK(p); sigqueue_delete_proc(p, p->p_xsig); } else PROC_SUNLOCK(p); if (wakeup_swapper) kick_proc0(); goto out; } } else { /* Not in "NORMAL" state. discard the signal. */ sigqueue_delete(sigqueue, sig); goto out; } /* * The process is not stopped so we need to apply the signal to all the * running threads. */ runfast: tdsigwakeup(td, sig, action, intrval); PROC_SLOCK(p); thread_unsuspend(p); PROC_SUNLOCK(p); out: /* If we jump here, proc slock should not be owned. */ PROC_SLOCK_ASSERT(p, MA_NOTOWNED); return (ret); } /* * The force of a signal has been directed against a single * thread. We need to see what we can do about knocking it * out of any sleep it may be in etc. */ static void tdsigwakeup(struct thread *td, int sig, sig_t action, int intrval) { struct proc *p = td->td_proc; int prop; int wakeup_swapper; wakeup_swapper = 0; PROC_LOCK_ASSERT(p, MA_OWNED); prop = sigprop(sig); PROC_SLOCK(p); thread_lock(td); /* * Bring the priority of a thread up if we want it to get * killed in this lifetime. Be careful to avoid bumping the * priority of the idle thread, since we still allow to signal * kernel processes. */ if (action == SIG_DFL && (prop & SIGPROP_KILL) != 0 && td->td_priority > PUSER && !TD_IS_IDLETHREAD(td)) sched_prio(td, PUSER); if (TD_ON_SLEEPQ(td)) { /* * If thread is sleeping uninterruptibly * we can't interrupt the sleep... the signal will * be noticed when the process returns through * trap() or syscall(). */ if ((td->td_flags & TDF_SINTR) == 0) goto out; /* * If SIGCONT is default (or ignored) and process is * asleep, we are finished; the process should not * be awakened. */ if ((prop & SIGPROP_CONT) && action == SIG_DFL) { thread_unlock(td); PROC_SUNLOCK(p); sigqueue_delete(&p->p_sigqueue, sig); /* * It may be on either list in this state. * Remove from both for now. 
*/ sigqueue_delete(&td->td_sigqueue, sig); return; } /* * Don't awaken a sleeping thread for SIGSTOP if the * STOP signal is deferred. */ if ((prop & SIGPROP_STOP) != 0 && (td->td_flags & (TDF_SBDRY | TDF_SERESTART | TDF_SEINTR)) == TDF_SBDRY) goto out; /* * Give low priority threads a better chance to run. */ if (td->td_priority > PUSER && !TD_IS_IDLETHREAD(td)) sched_prio(td, PUSER); wakeup_swapper = sleepq_abort(td, intrval); } else { /* * Other states do nothing with the signal immediately, * other than kicking ourselves if we are running. * It will either never be noticed, or noticed very soon. */ #ifdef SMP if (TD_IS_RUNNING(td) && td != curthread) forward_signal(td); #endif } out: PROC_SUNLOCK(p); thread_unlock(td); if (wakeup_swapper) kick_proc0(); } static int sig_suspend_threads(struct thread *td, struct proc *p, int sending) { struct thread *td2; int wakeup_swapper; PROC_LOCK_ASSERT(p, MA_OWNED); PROC_SLOCK_ASSERT(p, MA_OWNED); MPASS(sending || td == curthread); wakeup_swapper = 0; FOREACH_THREAD_IN_PROC(p, td2) { thread_lock(td2); td2->td_flags |= TDF_ASTPENDING | TDF_NEEDSUSPCHK; if ((TD_IS_SLEEPING(td2) || TD_IS_SWAPPED(td2)) && (td2->td_flags & TDF_SINTR)) { if (td2->td_flags & TDF_SBDRY) { /* * Once a thread is asleep with * TDF_SBDRY and without TDF_SERESTART * or TDF_SEINTR set, it should never * become suspended due to this check. */ KASSERT(!TD_IS_SUSPENDED(td2), ("thread with deferred stops suspended")); if (TD_SBDRY_INTR(td2)) wakeup_swapper |= sleepq_abort(td2, TD_SBDRY_ERRNO(td2)); } else if (!TD_IS_SUSPENDED(td2)) { thread_suspend_one(td2); } } else if (!TD_IS_SUSPENDED(td2)) { if (sending || td != td2) td2->td_flags |= TDF_ASTPENDING; #ifdef SMP if (TD_IS_RUNNING(td2) && td2 != td) forward_signal(td2); #endif } thread_unlock(td2); } return (wakeup_swapper); } /* * Stop the process for an event deemed interesting to the debugger. If si is * non-NULL, this is a signal exchange; the new signal requested by the * debugger will be returned for handling. If si is NULL, this is some other * type of interesting event. The debugger may request a signal be delivered in * that case as well, however it will be deferred until it can be handled. */ int ptracestop(struct thread *td, int sig, ksiginfo_t *si) { struct proc *p = td->td_proc; struct thread *td2; ksiginfo_t ksi; int prop; PROC_LOCK_ASSERT(p, MA_OWNED); KASSERT(!(p->p_flag & P_WEXIT), ("Stopping exiting process")); WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, &p->p_mtx.lock_object, "Stopping for traced signal"); td->td_xsig = sig; if (si == NULL || (si->ksi_flags & KSI_PTRACE) == 0) { td->td_dbgflags |= TDB_XSIG; CTR4(KTR_PTRACE, "ptracestop: tid %d (pid %d) flags %#x sig %d", td->td_tid, p->p_pid, td->td_dbgflags, sig); PROC_SLOCK(p); while ((p->p_flag & P_TRACED) && (td->td_dbgflags & TDB_XSIG)) { if (P_KILLED(p)) { /* * Ensure that, if we've been PT_KILLed, the * exit status reflects that. Another thread * may also be in ptracestop(), having just * received the SIGKILL, but this thread was * unsuspended first. */ td->td_dbgflags &= ~TDB_XSIG; td->td_xsig = SIGKILL; p->p_ptevents = 0; break; } if (p->p_flag & P_SINGLE_EXIT && !(td->td_dbgflags & TDB_EXIT)) { /* * Ignore ptrace stops except for thread exit * events when the process exits. */ td->td_dbgflags &= ~TDB_XSIG; PROC_SUNLOCK(p); return (0); } /* * Make wait(2) work. Ensure that right after the * attach, the thread which was decided to become the * leader of attach gets reported to the waiter. 
* Otherwise, just avoid overwriting another thread's * assignment to p_xthread. If another thread has * already set p_xthread, the current thread will get * a chance to report itself upon the next iteration. */ if ((td->td_dbgflags & TDB_FSTP) != 0 || ((p->p_flag2 & P2_PTRACE_FSTP) == 0 && p->p_xthread == NULL)) { p->p_xsig = sig; p->p_xthread = td; td->td_dbgflags &= ~TDB_FSTP; p->p_flag2 &= ~P2_PTRACE_FSTP; p->p_flag |= P_STOPPED_SIG | P_STOPPED_TRACE; sig_suspend_threads(td, p, 0); } if ((td->td_dbgflags & TDB_STOPATFORK) != 0) { td->td_dbgflags &= ~TDB_STOPATFORK; } stopme: thread_suspend_switch(td, p); if (p->p_xthread == td) p->p_xthread = NULL; if (!(p->p_flag & P_TRACED)) break; if (td->td_dbgflags & TDB_SUSPEND) { if (p->p_flag & P_SINGLE_EXIT) break; goto stopme; } } PROC_SUNLOCK(p); } if (si != NULL && sig == td->td_xsig) { /* Parent wants us to take the original signal unchanged. */ si->ksi_flags |= KSI_HEAD; if (sigqueue_add(&td->td_sigqueue, sig, si) != 0) si->ksi_signo = 0; } else if (td->td_xsig != 0) { /* * If parent wants us to take a new signal, then it will leave * it in td->td_xsig; otherwise we just look for signals again. */ ksiginfo_init(&ksi); ksi.ksi_signo = td->td_xsig; ksi.ksi_flags |= KSI_PTRACE; prop = sigprop(td->td_xsig); td2 = sigtd(p, td->td_xsig, prop); tdsendsignal(p, td2, td->td_xsig, &ksi); if (td != td2) return (0); } return (td->td_xsig); } static void reschedule_signals(struct proc *p, sigset_t block, int flags) { struct sigacts *ps; struct thread *td; int sig; PROC_LOCK_ASSERT(p, MA_OWNED); ps = p->p_sigacts; mtx_assert(&ps->ps_mtx, (flags & SIGPROCMASK_PS_LOCKED) != 0 ? MA_OWNED : MA_NOTOWNED); if (SIGISEMPTY(p->p_siglist)) return; SIGSETAND(block, p->p_siglist); while ((sig = sig_ffs(&block)) != 0) { SIGDELSET(block, sig); td = sigtd(p, sig, 0); signotify(td); if (!(flags & SIGPROCMASK_PS_LOCKED)) mtx_lock(&ps->ps_mtx); if (p->p_flag & P_TRACED || (SIGISMEMBER(ps->ps_sigcatch, sig) && !SIGISMEMBER(td->td_sigmask, sig))) tdsigwakeup(td, sig, SIG_CATCH, (SIGISMEMBER(ps->ps_sigintr, sig) ? EINTR : ERESTART)); if (!(flags & SIGPROCMASK_PS_LOCKED)) mtx_unlock(&ps->ps_mtx); } } void tdsigcleanup(struct thread *td) { struct proc *p; sigset_t unblocked; p = td->td_proc; PROC_LOCK_ASSERT(p, MA_OWNED); sigqueue_flush(&td->td_sigqueue); if (p->p_numthreads == 1) return; /* * Since we cannot handle signals, notify signal post code * about this by filling the sigmask. * * Also, if needed, wake up thread(s) that do not block the * same signals as the exiting thread, since the thread might * have been selected for delivery and woken up. */ SIGFILLSET(unblocked); SIGSETNAND(unblocked, td->td_sigmask); SIGFILLSET(td->td_sigmask); reschedule_signals(p, unblocked, 0); } static int sigdeferstop_curr_flags(int cflags) { MPASS((cflags & (TDF_SEINTR | TDF_SERESTART)) == 0 || (cflags & TDF_SBDRY) != 0); return (cflags & (TDF_SBDRY | TDF_SEINTR | TDF_SERESTART)); } /* * Defer the delivery of SIGSTOP for the current thread, according to * the requested mode. Returns previous flags, which must be restored * by sigallowstop(). * * TDF_SBDRY, TDF_SEINTR, and TDF_SERESTART flags are only set and * cleared by the current thread, which allow the lock-less read-only * accesses below. 
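 *
 * Typical use is through the sigdeferstop()/sigallowstop() wrappers
 * (assumed here; they are thin inlines around the *_impl() functions
 * below), e.g.:
 *
 *	int sds;
 *
 *	sds = sigdeferstop(SIGDEFERSTOP_SILENT);
 *	error = some_sleepable_operation();
 *	sigallowstop(sds);
 *
 * where some_sleepable_operation() stands in for any code that must not
 * be suspended part-way by a stop signal.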
*/ int sigdeferstop_impl(int mode) { struct thread *td; int cflags, nflags; td = curthread; cflags = sigdeferstop_curr_flags(td->td_flags); switch (mode) { case SIGDEFERSTOP_NOP: nflags = cflags; break; case SIGDEFERSTOP_OFF: nflags = 0; break; case SIGDEFERSTOP_SILENT: nflags = (cflags | TDF_SBDRY) & ~(TDF_SEINTR | TDF_SERESTART); break; case SIGDEFERSTOP_EINTR: nflags = (cflags | TDF_SBDRY | TDF_SEINTR) & ~TDF_SERESTART; break; case SIGDEFERSTOP_ERESTART: nflags = (cflags | TDF_SBDRY | TDF_SERESTART) & ~TDF_SEINTR; break; default: panic("sigdeferstop: invalid mode %x", mode); break; } if (cflags == nflags) return (SIGDEFERSTOP_VAL_NCHG); thread_lock(td); td->td_flags = (td->td_flags & ~cflags) | nflags; thread_unlock(td); return (cflags); } /* * Restores the STOP handling mode, typically permitting the delivery * of SIGSTOP for the current thread. This does not immediately * suspend if a stop was posted. Instead, the thread will suspend * either via ast() or a subsequent interruptible sleep. */ void sigallowstop_impl(int prev) { struct thread *td; int cflags; KASSERT(prev != SIGDEFERSTOP_VAL_NCHG, ("failed sigallowstop")); KASSERT((prev & ~(TDF_SBDRY | TDF_SEINTR | TDF_SERESTART)) == 0, ("sigallowstop: incorrect previous mode %x", prev)); td = curthread; cflags = sigdeferstop_curr_flags(td->td_flags); if (cflags != prev) { thread_lock(td); td->td_flags = (td->td_flags & ~cflags) | prev; thread_unlock(td); } } /* * If the current process has received a signal (should be caught or cause * termination, should interrupt current syscall), return the signal number. * Stop signals with default action are processed immediately, then cleared; * they aren't returned. This is checked after each entry to the system for * a syscall or trap (though this can usually be done without calling issignal * by checking the pending signal masks in cursig.) The normal call * sequence is * * while (sig = cursig(curthread)) * postsig(sig); */ static int issignal(struct thread *td) { struct proc *p; struct sigacts *ps; struct sigqueue *queue; sigset_t sigpending; ksiginfo_t ksi; int prop, sig, traced; p = td->td_proc; ps = p->p_sigacts; mtx_assert(&ps->ps_mtx, MA_OWNED); PROC_LOCK_ASSERT(p, MA_OWNED); for (;;) { traced = (p->p_flag & P_TRACED) || (p->p_stops & S_SIG); sigpending = td->td_sigqueue.sq_signals; SIGSETOR(sigpending, p->p_sigqueue.sq_signals); SIGSETNAND(sigpending, td->td_sigmask); if ((p->p_flag & P_PPWAIT) != 0 || (td->td_flags & (TDF_SBDRY | TDF_SERESTART | TDF_SEINTR)) == TDF_SBDRY) SIG_STOPSIGMASK(sigpending); if (SIGISEMPTY(sigpending)) /* no signal to send */ return (0); if ((p->p_flag & (P_TRACED | P_PPTRACE)) == P_TRACED && (p->p_flag2 & P2_PTRACE_FSTP) != 0 && SIGISMEMBER(sigpending, SIGSTOP)) { /* * If debugger just attached, always consume * SIGSTOP from ptrace(PT_ATTACH) first, to * execute the debugger attach ritual in * order. */ sig = SIGSTOP; td->td_dbgflags |= TDB_FSTP; } else { sig = sig_ffs(&sigpending); } if (p->p_stops & S_SIG) { mtx_unlock(&ps->ps_mtx); stopevent(p, S_SIG, sig); mtx_lock(&ps->ps_mtx); } /* * We should see pending but ignored signals * only if P_TRACED was on when they were posted. */ if (SIGISMEMBER(ps->ps_sigignore, sig) && (traced == 0)) { sigqueue_delete(&td->td_sigqueue, sig); sigqueue_delete(&p->p_sigqueue, sig); continue; } if ((p->p_flag & (P_TRACED | P_PPTRACE)) == P_TRACED) { /* * If traced, always stop. * Remove old signal from queue before the stop. * XXX shrug off debugger, it causes siginfo to * be thrown away. 
*/ queue = &td->td_sigqueue; ksiginfo_init(&ksi); if (sigqueue_get(queue, sig, &ksi) == 0) { queue = &p->p_sigqueue; sigqueue_get(queue, sig, &ksi); } td->td_si = ksi.ksi_info; mtx_unlock(&ps->ps_mtx); sig = ptracestop(td, sig, &ksi); mtx_lock(&ps->ps_mtx); td->td_si.si_signo = 0; /* * Keep looking if the debugger discarded or * replaced the signal. */ if (sig == 0) continue; /* * If the signal became masked, re-queue it. */ if (SIGISMEMBER(td->td_sigmask, sig)) { ksi.ksi_flags |= KSI_HEAD; sigqueue_add(&p->p_sigqueue, sig, &ksi); continue; } /* * If the traced bit got turned off, requeue * the signal and go back up to the top to * rescan signals. This ensures that p_sig* * and p_sigact are consistent. */ if ((p->p_flag & P_TRACED) == 0) { ksi.ksi_flags |= KSI_HEAD; sigqueue_add(queue, sig, &ksi); continue; } } prop = sigprop(sig); /* * Decide whether the signal should be returned. * Return the signal's number, or fall through * to clear it from the pending mask. */ switch ((intptr_t)p->p_sigacts->ps_sigact[_SIG_IDX(sig)]) { case (intptr_t)SIG_DFL: /* * Don't take default actions on system processes. */ if (p->p_pid <= 1) { #ifdef DIAGNOSTIC /* * Are you sure you want to ignore SIGSEGV * in init? XXX */ printf("Process (pid %lu) got signal %d\n", (u_long)p->p_pid, sig); #endif break; /* == ignore */ } /* * If there is a pending stop signal to process with * default action, stop here, then clear the signal. * Traced or exiting processes should ignore stops. * Additionally, a member of an orphaned process group * should ignore tty stops. */ if (prop & SIGPROP_STOP) { if (p->p_flag & (P_TRACED | P_WEXIT | P_SINGLE_EXIT) || (p->p_pgrp->pg_jobc == 0 && prop & SIGPROP_TTYSTOP)) break; /* == ignore */ if (TD_SBDRY_INTR(td)) { KASSERT((td->td_flags & TDF_SBDRY) != 0, ("lost TDF_SBDRY")); return (-1); } mtx_unlock(&ps->ps_mtx); WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, &p->p_mtx.lock_object, "Catching SIGSTOP"); sigqueue_delete(&td->td_sigqueue, sig); sigqueue_delete(&p->p_sigqueue, sig); p->p_flag |= P_STOPPED_SIG; p->p_xsig = sig; PROC_SLOCK(p); sig_suspend_threads(td, p, 0); thread_suspend_switch(td, p); PROC_SUNLOCK(p); mtx_lock(&ps->ps_mtx); goto next; } else if (prop & SIGPROP_IGNORE) { /* * Except for SIGCONT, shouldn't get here. * Default action is to ignore; drop it. */ break; /* == ignore */ } else return (sig); /*NOTREACHED*/ case (intptr_t)SIG_IGN: /* * Masking above should prevent us ever trying * to take action on an ignored signal other * than SIGCONT, unless process is traced. */ if ((prop & SIGPROP_CONT) == 0 && (p->p_flag & P_TRACED) == 0) printf("issignal\n"); break; /* == ignore */ default: /* * This signal has an action, let * postsig() process it. */ return (sig); } sigqueue_delete(&td->td_sigqueue, sig); /* take the signal! */ sigqueue_delete(&p->p_sigqueue, sig); next:; } /* NOTREACHED */ } void thread_stopped(struct proc *p) { int n; PROC_LOCK_ASSERT(p, MA_OWNED); PROC_SLOCK_ASSERT(p, MA_OWNED); n = p->p_suspcount; if (p == curproc) n++; if ((p->p_flag & P_STOPPED_SIG) && (n == p->p_numthreads)) { PROC_SUNLOCK(p); p->p_flag &= ~P_WAITED; PROC_LOCK(p->p_pptr); childproc_stopped(p, (p->p_flag & P_TRACED) ? CLD_TRAPPED : CLD_STOPPED); PROC_UNLOCK(p->p_pptr); PROC_SLOCK(p); } } /* * Take the action for the specified signal * from the current set of pending signals. 
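 * Returns 1 if an action was taken, or 0 if the signal was no longer
 * pending on either the thread or the process queue.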
*/ int postsig(int sig) { struct thread *td; struct proc *p; struct sigacts *ps; sig_t action; ksiginfo_t ksi; sigset_t returnmask; KASSERT(sig != 0, ("postsig")); td = curthread; p = td->td_proc; PROC_LOCK_ASSERT(p, MA_OWNED); ps = p->p_sigacts; mtx_assert(&ps->ps_mtx, MA_OWNED); ksiginfo_init(&ksi); if (sigqueue_get(&td->td_sigqueue, sig, &ksi) == 0 && sigqueue_get(&p->p_sigqueue, sig, &ksi) == 0) return (0); ksi.ksi_signo = sig; if (ksi.ksi_code == SI_TIMER) itimer_accept(p, ksi.ksi_timerid, &ksi); action = ps->ps_sigact[_SIG_IDX(sig)]; #ifdef KTRACE if (KTRPOINT(td, KTR_PSIG)) ktrpsig(sig, action, td->td_pflags & TDP_OLDMASK ? &td->td_oldsigmask : &td->td_sigmask, ksi.ksi_code); #endif if ((p->p_stops & S_SIG) != 0) { mtx_unlock(&ps->ps_mtx); stopevent(p, S_SIG, sig); mtx_lock(&ps->ps_mtx); } if (action == SIG_DFL) { /* * Default action, where the default is to kill * the process. (Other cases were ignored above.) */ mtx_unlock(&ps->ps_mtx); proc_td_siginfo_capture(td, &ksi.ksi_info); sigexit(td, sig); /* NOTREACHED */ } else { /* * If we get here, the signal must be caught. */ KASSERT(action != SIG_IGN, ("postsig action %p", action)); KASSERT(!SIGISMEMBER(td->td_sigmask, sig), ("postsig action: blocked sig %d", sig)); /* * Set the new mask value and also defer further * occurrences of this signal. * * Special case: user has done a sigsuspend. Here the * current mask is not of interest, but rather the * mask from before the sigsuspend is what we want * restored after the signal processing is completed. */ if (td->td_pflags & TDP_OLDMASK) { returnmask = td->td_oldsigmask; td->td_pflags &= ~TDP_OLDMASK; } else returnmask = td->td_sigmask; if (p->p_sig == sig) { - p->p_code = 0; p->p_sig = 0; } (*p->p_sysent->sv_sendsig)(action, &ksi, &returnmask); postsig_done(sig, td, ps); } return (1); } void proc_wkilled(struct proc *p) { PROC_LOCK_ASSERT(p, MA_OWNED); if ((p->p_flag & P_WKILLED) == 0) { p->p_flag |= P_WKILLED; /* * Notify swapper that there is a process to swap in. * The notification is racy, at worst it would take 10 * seconds for the swapper process to notice. */ if ((p->p_flag & (P_INMEM | P_SWAPPINGIN)) == 0) wakeup(&proc0); } } /* * Kill the current process for stated reason. */ void killproc(struct proc *p, char *why) { PROC_LOCK_ASSERT(p, MA_OWNED); CTR3(KTR_PROC, "killproc: proc %p (pid %d, %s)", p, p->p_pid, p->p_comm); log(LOG_ERR, "pid %d (%s), jid %d, uid %d, was killed: %s\n", p->p_pid, p->p_comm, p->p_ucred->cr_prison->pr_id, p->p_ucred->cr_uid, why); proc_wkilled(p); kern_psignal(p, SIGKILL); } /* * Force the current process to exit with the specified signal, dumping core * if appropriate. We bypass the normal tests for masked and caught signals, * allowing unrecoverable failures to terminate the process without changing * signal state. Mark the accounting record with the signal termination. * If dumping core, save the signal number for the debugger. Calls exit and * does not return. */ void sigexit(struct thread *td, int sig) { struct proc *p = td->td_proc; PROC_LOCK_ASSERT(p, MA_OWNED); p->p_acflag |= AXSIG; /* * We must be single-threading to generate a core dump. This * ensures that the registers in the core file are up-to-date. * Also, the ELF dump handler assumes that the thread list doesn't * change out from under it. * * XXX If another thread attempts to single-thread before us * (e.g. via fork()), we won't get a dump at all. 
*/ if ((sigprop(sig) & SIGPROP_CORE) && thread_single(p, SINGLE_NO_EXIT) == 0) { p->p_sig = sig; /* * Log signals which would cause core dumps * (Log as LOG_INFO to appease those who don't want * these messages.) * XXX : Todo, as well as euid, write out ruid too * Note that coredump() drops proc lock. */ if (coredump(td) == 0) sig |= WCOREFLAG; if (kern_logsigexit) log(LOG_INFO, "pid %d (%s), jid %d, uid %d: exited on " "signal %d%s\n", p->p_pid, p->p_comm, p->p_ucred->cr_prison->pr_id, td->td_ucred->cr_uid, sig &~ WCOREFLAG, sig & WCOREFLAG ? " (core dumped)" : ""); } else PROC_UNLOCK(p); exit1(td, 0, sig); /* NOTREACHED */ } /* * Send queued SIGCHLD to parent when child process's state * is changed. */ static void sigparent(struct proc *p, int reason, int status) { PROC_LOCK_ASSERT(p, MA_OWNED); PROC_LOCK_ASSERT(p->p_pptr, MA_OWNED); if (p->p_ksi != NULL) { p->p_ksi->ksi_signo = SIGCHLD; p->p_ksi->ksi_code = reason; p->p_ksi->ksi_status = status; p->p_ksi->ksi_pid = p->p_pid; p->p_ksi->ksi_uid = p->p_ucred->cr_ruid; if (KSI_ONQ(p->p_ksi)) return; } pksignal(p->p_pptr, SIGCHLD, p->p_ksi); } static void childproc_jobstate(struct proc *p, int reason, int sig) { struct sigacts *ps; PROC_LOCK_ASSERT(p, MA_OWNED); PROC_LOCK_ASSERT(p->p_pptr, MA_OWNED); /* * Wake up parent sleeping in kern_wait(), also send * SIGCHLD to parent, but SIGCHLD does not guarantee * that parent will awake, because parent may masked * the signal. */ p->p_pptr->p_flag |= P_STATCHILD; wakeup(p->p_pptr); ps = p->p_pptr->p_sigacts; mtx_lock(&ps->ps_mtx); if ((ps->ps_flag & PS_NOCLDSTOP) == 0) { mtx_unlock(&ps->ps_mtx); sigparent(p, reason, sig); } else mtx_unlock(&ps->ps_mtx); } void childproc_stopped(struct proc *p, int reason) { childproc_jobstate(p, reason, p->p_xsig); } void childproc_continued(struct proc *p) { childproc_jobstate(p, CLD_CONTINUED, SIGCONT); } void childproc_exited(struct proc *p) { int reason, status; if (WCOREDUMP(p->p_xsig)) { reason = CLD_DUMPED; status = WTERMSIG(p->p_xsig); } else if (WIFSIGNALED(p->p_xsig)) { reason = CLD_KILLED; status = WTERMSIG(p->p_xsig); } else { reason = CLD_EXITED; status = p->p_xexit; } /* * XXX avoid calling wakeup(p->p_pptr), the work is * done in exit1(). 
*/ sigparent(p, reason, status); } #define MAX_NUM_CORE_FILES 100000 #ifndef NUM_CORE_FILES #define NUM_CORE_FILES 5 #endif CTASSERT(NUM_CORE_FILES >= 0 && NUM_CORE_FILES <= MAX_NUM_CORE_FILES); static int num_cores = NUM_CORE_FILES; static int sysctl_debug_num_cores_check (SYSCTL_HANDLER_ARGS) { int error; int new_val; new_val = num_cores; error = sysctl_handle_int(oidp, &new_val, 0, req); if (error != 0 || req->newptr == NULL) return (error); if (new_val > MAX_NUM_CORE_FILES) new_val = MAX_NUM_CORE_FILES; if (new_val < 0) new_val = 0; num_cores = new_val; return (0); } SYSCTL_PROC(_debug, OID_AUTO, ncores, CTLTYPE_INT|CTLFLAG_RW, 0, sizeof(int), sysctl_debug_num_cores_check, "I", "Maximum number of generated process corefiles while using index format"); #define GZIP_SUFFIX ".gz" #define ZSTD_SUFFIX ".zst" int compress_user_cores = 0; static int sysctl_compress_user_cores(SYSCTL_HANDLER_ARGS) { int error, val; val = compress_user_cores; error = sysctl_handle_int(oidp, &val, 0, req); if (error != 0 || req->newptr == NULL) return (error); if (val != 0 && !compressor_avail(val)) return (EINVAL); compress_user_cores = val; return (error); } SYSCTL_PROC(_kern, OID_AUTO, compress_user_cores, CTLTYPE_INT | CTLFLAG_RWTUN, 0, sizeof(int), sysctl_compress_user_cores, "I", "Enable compression of user corefiles (" __XSTRING(COMPRESS_GZIP) " = gzip, " __XSTRING(COMPRESS_ZSTD) " = zstd)"); int compress_user_cores_level = 6; SYSCTL_INT(_kern, OID_AUTO, compress_user_cores_level, CTLFLAG_RWTUN, &compress_user_cores_level, 0, "Corefile compression level"); /* * Protect the access to corefilename[] by allproc_lock. */ #define corefilename_lock allproc_lock static char corefilename[MAXPATHLEN] = {"%N.core"}; TUNABLE_STR("kern.corefile", corefilename, sizeof(corefilename)); static int sysctl_kern_corefile(SYSCTL_HANDLER_ARGS) { int error; sx_xlock(&corefilename_lock); error = sysctl_handle_string(oidp, corefilename, sizeof(corefilename), req); sx_xunlock(&corefilename_lock); return (error); } SYSCTL_PROC(_kern, OID_AUTO, corefile, CTLTYPE_STRING | CTLFLAG_RW | CTLFLAG_MPSAFE, 0, 0, sysctl_kern_corefile, "A", "Process corefile name format string"); static void vnode_close_locked(struct thread *td, struct vnode *vp) { VOP_UNLOCK(vp, 0); vn_close(vp, FWRITE, td->td_ucred, td); } /* * If the core format has a %I in it, then we need to check * for existing corefiles before defining a name. * To do this we iterate over 0..ncores to find a * non-existing core file name to use. If all core files are * already used we choose the oldest one. */ static int corefile_open_last(struct thread *td, char *name, int indexpos, int indexlen, int ncores, struct vnode **vpp) { struct vnode *oldvp, *nextvp, *vp; struct vattr vattr; struct nameidata nd; int error, i, flags, oflags, cmode; char ch; struct timespec lasttime; nextvp = oldvp = NULL; cmode = S_IRUSR | S_IWUSR; oflags = VN_OPEN_NOAUDIT | VN_OPEN_NAMECACHE | (capmode_coredump ? 
VN_OPEN_NOCAPCHECK : 0); for (i = 0; i < ncores; i++) { flags = O_CREAT | FWRITE | O_NOFOLLOW; ch = name[indexpos + indexlen]; (void)snprintf(name + indexpos, indexlen + 1, "%.*u", indexlen, i); name[indexpos + indexlen] = ch; NDINIT(&nd, LOOKUP, NOFOLLOW, UIO_SYSSPACE, name, td); error = vn_open_cred(&nd, &flags, cmode, oflags, td->td_ucred, NULL); if (error != 0) break; vp = nd.ni_vp; NDFREE(&nd, NDF_ONLY_PNBUF); if ((flags & O_CREAT) == O_CREAT) { nextvp = vp; break; } error = VOP_GETATTR(vp, &vattr, td->td_ucred); if (error != 0) { vnode_close_locked(td, vp); break; } if (oldvp == NULL || lasttime.tv_sec > vattr.va_mtime.tv_sec || (lasttime.tv_sec == vattr.va_mtime.tv_sec && lasttime.tv_nsec >= vattr.va_mtime.tv_nsec)) { if (oldvp != NULL) vnode_close_locked(td, oldvp); oldvp = vp; lasttime = vattr.va_mtime; } else { vnode_close_locked(td, vp); } } if (oldvp != NULL) { if (nextvp == NULL) nextvp = oldvp; else vnode_close_locked(td, oldvp); } if (error != 0) { if (nextvp != NULL) vnode_close_locked(td, oldvp); } else { *vpp = nextvp; } return (error); } /* * corefile_open(comm, uid, pid, td, compress, vpp, namep) * Expand the name described in corefilename, using name, uid, and pid * and open/create core file. * corefilename is a printf-like string, with three format specifiers: * %N name of process ("name") * %P process id (pid) * %U user id (uid) * For example, "%N.core" is the default; they can be disabled completely * by using "/dev/null", or all core files can be stored in "/cores/%U/%N-%P". * This is controlled by the sysctl variable kern.corefile (see above). */ static int corefile_open(const char *comm, uid_t uid, pid_t pid, struct thread *td, int compress, struct vnode **vpp, char **namep) { struct sbuf sb; struct nameidata nd; const char *format; char *hostname, *name; int cmode, error, flags, i, indexpos, indexlen, oflags, ncores; hostname = NULL; format = corefilename; name = malloc(MAXPATHLEN, M_TEMP, M_WAITOK | M_ZERO); indexlen = 0; indexpos = -1; ncores = num_cores; (void)sbuf_new(&sb, name, MAXPATHLEN, SBUF_FIXEDLEN); sx_slock(&corefilename_lock); for (i = 0; format[i] != '\0'; i++) { switch (format[i]) { case '%': /* Format character */ i++; switch (format[i]) { case '%': sbuf_putc(&sb, '%'); break; case 'H': /* hostname */ if (hostname == NULL) { hostname = malloc(MAXHOSTNAMELEN, M_TEMP, M_WAITOK); } getcredhostname(td->td_ucred, hostname, MAXHOSTNAMELEN); sbuf_printf(&sb, "%s", hostname); break; case 'I': /* autoincrementing index */ if (indexpos != -1) { sbuf_printf(&sb, "%%I"); break; } indexpos = sbuf_len(&sb); sbuf_printf(&sb, "%u", ncores - 1); indexlen = sbuf_len(&sb) - indexpos; break; case 'N': /* process name */ sbuf_printf(&sb, "%s", comm); break; case 'P': /* process id */ sbuf_printf(&sb, "%u", pid); break; case 'U': /* user id */ sbuf_printf(&sb, "%u", uid); break; default: log(LOG_ERR, "Unknown format character %c in " "corename `%s'\n", format[i], format); break; } break; default: sbuf_putc(&sb, format[i]); break; } } sx_sunlock(&corefilename_lock); free(hostname, M_TEMP); if (compress == COMPRESS_GZIP) sbuf_printf(&sb, GZIP_SUFFIX); else if (compress == COMPRESS_ZSTD) sbuf_printf(&sb, ZSTD_SUFFIX); if (sbuf_error(&sb) != 0) { log(LOG_ERR, "pid %ld (%s), uid (%lu): corename is too " "long\n", (long)pid, comm, (u_long)uid); sbuf_delete(&sb); free(name, M_TEMP); return (ENOMEM); } sbuf_finish(&sb); sbuf_delete(&sb); if (indexpos != -1) { error = corefile_open_last(td, name, indexpos, indexlen, ncores, vpp); if (error != 0) { log(LOG_ERR, "pid %d (%s), 
uid (%u): Path `%s' failed " "on initial open test, error = %d\n", pid, comm, uid, name, error); } } else { cmode = S_IRUSR | S_IWUSR; oflags = VN_OPEN_NOAUDIT | VN_OPEN_NAMECACHE | (capmode_coredump ? VN_OPEN_NOCAPCHECK : 0); flags = O_CREAT | FWRITE | O_NOFOLLOW; NDINIT(&nd, LOOKUP, NOFOLLOW, UIO_SYSSPACE, name, td); error = vn_open_cred(&nd, &flags, cmode, oflags, td->td_ucred, NULL); if (error == 0) { *vpp = nd.ni_vp; NDFREE(&nd, NDF_ONLY_PNBUF); } } if (error != 0) { #ifdef AUDIT audit_proc_coredump(td, name, error); #endif free(name, M_TEMP); return (error); } *namep = name; return (0); } /* * Dump a process' core. The main routine does some * policy checking, and creates the name of the coredump; * then it passes on a vnode and a size limit to the process-specific * coredump routine if there is one; if there _is not_ one, it returns * ENOSYS; otherwise it returns the error from the process-specific routine. */ static int coredump(struct thread *td) { struct proc *p = td->td_proc; struct ucred *cred = td->td_ucred; struct vnode *vp; struct flock lf; struct vattr vattr; int error, error1, locked; char *name; /* name of corefile */ void *rl_cookie; off_t limit; char *fullpath, *freepath = NULL; struct sbuf *sb; PROC_LOCK_ASSERT(p, MA_OWNED); MPASS((p->p_flag & P_HADTHREADS) == 0 || p->p_singlethread == td); _STOPEVENT(p, S_CORE, 0); if (!do_coredump || (!sugid_coredump && (p->p_flag & P_SUGID) != 0) || (p->p_flag2 & P2_NOTRACE) != 0) { PROC_UNLOCK(p); return (EFAULT); } /* * Note that the bulk of limit checking is done after * the corefile is created. The exception is if the limit * for corefiles is 0, in which case we don't bother * creating the corefile at all. This layout means that * a corefile is truncated instead of not being created, * if it is larger than the limit. */ limit = (off_t)lim_cur(td, RLIMIT_CORE); if (limit == 0 || racct_get_available(p, RACCT_CORE) == 0) { PROC_UNLOCK(p); return (EFBIG); } PROC_UNLOCK(p); error = corefile_open(p->p_comm, cred->cr_uid, p->p_pid, td, compress_user_cores, &vp, &name); if (error != 0) return (error); /* * Don't dump to non-regular files or files with links. * Do not dump into system files. */ if (vp->v_type != VREG || VOP_GETATTR(vp, &vattr, cred) != 0 || vattr.va_nlink != 1 || (vp->v_vflag & VV_SYSTEM) != 0) { VOP_UNLOCK(vp, 0); error = EFAULT; goto out; } VOP_UNLOCK(vp, 0); /* Postpone other writers, including core dumps of other processes. */ rl_cookie = vn_rangelock_wlock(vp, 0, OFF_MAX); lf.l_whence = SEEK_SET; lf.l_start = 0; lf.l_len = 0; lf.l_type = F_WRLCK; locked = (VOP_ADVLOCK(vp, (caddr_t)p, F_SETLK, &lf, F_FLOCK) == 0); VATTR_NULL(&vattr); vattr.va_size = 0; if (set_core_nodump_flag) vattr.va_flags = UF_NODUMP; vn_lock(vp, LK_EXCLUSIVE | LK_RETRY); VOP_SETATTR(vp, &vattr, cred); VOP_UNLOCK(vp, 0); PROC_LOCK(p); p->p_acflag |= ACORE; PROC_UNLOCK(p); if (p->p_sysent->sv_coredump != NULL) { error = p->p_sysent->sv_coredump(td, vp, limit, 0); } else { error = ENOSYS; } if (locked) { lf.l_type = F_UNLCK; VOP_ADVLOCK(vp, (caddr_t)p, F_UNLCK, &lf, F_FLOCK); } vn_rangelock_unlock(vp, rl_cookie); /* * Notify the userland helper that a process triggered a core dump. * This allows the helper to run an automated debugging session. 
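 * The notification is posted via devctl on system "kernel", subsystem
 * "signal", type "coredump", with comm="..." and core="..." in its data,
 * so a devd(8) rule can match on it and act on the freshly written core.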
*/ if (error != 0 || coredump_devctl == 0) goto out; sb = sbuf_new_auto(); if (vn_fullpath_global(td, p->p_textvp, &fullpath, &freepath) != 0) goto out2; sbuf_printf(sb, "comm=\""); devctl_safe_quote_sb(sb, fullpath); free(freepath, M_TEMP); sbuf_printf(sb, "\" core=\""); /* * We can't lookup core file vp directly. When we're replacing a core, and * other random times, we flush the name cache, so it will fail. Instead, * if the path of the core is relative, add the current dir in front if it. */ if (name[0] != '/') { fullpath = malloc(MAXPATHLEN, M_TEMP, M_WAITOK); if (kern___getcwd(td, fullpath, UIO_SYSSPACE, MAXPATHLEN, MAXPATHLEN) != 0) { free(fullpath, M_TEMP); goto out2; } devctl_safe_quote_sb(sb, fullpath); free(fullpath, M_TEMP); sbuf_putc(sb, '/'); } devctl_safe_quote_sb(sb, name); sbuf_printf(sb, "\""); if (sbuf_finish(sb) == 0) devctl_notify("kernel", "signal", "coredump", sbuf_data(sb)); out2: sbuf_delete(sb); out: error1 = vn_close(vp, FWRITE, cred, td); if (error == 0) error = error1; #ifdef AUDIT audit_proc_coredump(td, name, error); #endif free(name, M_TEMP); return (error); } /* * Nonexistent system call-- signal process (may want to handle it). Flag * error in case process won't see signal immediately (blocked or ignored). */ #ifndef _SYS_SYSPROTO_H_ struct nosys_args { int dummy; }; #endif /* ARGSUSED */ int nosys(struct thread *td, struct nosys_args *args) { struct proc *p; p = td->td_proc; PROC_LOCK(p); tdsignal(td, SIGSYS); PROC_UNLOCK(p); if (kern_lognosys == 1 || kern_lognosys == 3) { uprintf("pid %d comm %s: nosys %d\n", p->p_pid, p->p_comm, td->td_sa.code); } if (kern_lognosys == 2 || kern_lognosys == 3) { printf("pid %d comm %s: nosys %d\n", p->p_pid, p->p_comm, td->td_sa.code); } return (ENOSYS); } /* * Send a SIGIO or SIGURG signal to a process or process group using stored * credentials rather than those of the current process. */ void pgsigio(struct sigio **sigiop, int sig, int checkctty) { ksiginfo_t ksi; struct sigio *sigio; ksiginfo_init(&ksi); ksi.ksi_signo = sig; ksi.ksi_code = SI_KERNEL; SIGIO_LOCK(); sigio = *sigiop; if (sigio == NULL) { SIGIO_UNLOCK(); return; } if (sigio->sio_pgid > 0) { PROC_LOCK(sigio->sio_proc); if (CANSIGIO(sigio->sio_ucred, sigio->sio_proc->p_ucred)) kern_psignal(sigio->sio_proc, sig); PROC_UNLOCK(sigio->sio_proc); } else if (sigio->sio_pgid < 0) { struct proc *p; PGRP_LOCK(sigio->sio_pgrp); LIST_FOREACH(p, &sigio->sio_pgrp->pg_members, p_pglist) { PROC_LOCK(p); if (p->p_state == PRS_NORMAL && CANSIGIO(sigio->sio_ucred, p->p_ucred) && (checkctty == 0 || (p->p_flag & P_CONTROLT))) kern_psignal(p, sig); PROC_UNLOCK(p); } PGRP_UNLOCK(sigio->sio_pgrp); } SIGIO_UNLOCK(); } static int filt_sigattach(struct knote *kn) { struct proc *p = curproc; kn->kn_ptr.p_proc = p; kn->kn_flags |= EV_CLEAR; /* automatically set */ knlist_add(p->p_klist, kn, 0); return (0); } static void filt_sigdetach(struct knote *kn) { struct proc *p = kn->kn_ptr.p_proc; knlist_remove(p->p_klist, kn, 0); } /* * signal knotes are shared with proc knotes, so we apply a mask to * the hint in order to differentiate them from process hints. This * could be avoided by using a signal-specific knote list, but probably * isn't worth the trouble. 
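 *
 * From userland this is the EVFILT_SIGNAL filter; a minimal consumer
 * sketch (illustrative only, kq assumed to come from kqueue()) is:
 *
 *	struct kevent kev;
 *
 *	signal(SIGUSR1, SIG_IGN);
 *	EV_SET(&kev, SIGUSR1, EVFILT_SIGNAL, EV_ADD, 0, 0, NULL);
 *	kevent(kq, &kev, 1, NULL, 0, NULL);
 *
 * after which the returned kev.data counts deliveries of that signal
 * since the previous retrieval (EV_CLEAR is set automatically above).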
*/ static int filt_signal(struct knote *kn, long hint) { if (hint & NOTE_SIGNAL) { hint &= ~NOTE_SIGNAL; if (kn->kn_id == hint) kn->kn_data++; } return (kn->kn_data != 0); } struct sigacts * sigacts_alloc(void) { struct sigacts *ps; ps = malloc(sizeof(struct sigacts), M_SUBPROC, M_WAITOK | M_ZERO); refcount_init(&ps->ps_refcnt, 1); mtx_init(&ps->ps_mtx, "sigacts", NULL, MTX_DEF); return (ps); } void sigacts_free(struct sigacts *ps) { if (refcount_release(&ps->ps_refcnt) == 0) return; mtx_destroy(&ps->ps_mtx); free(ps, M_SUBPROC); } struct sigacts * sigacts_hold(struct sigacts *ps) { refcount_acquire(&ps->ps_refcnt); return (ps); } void sigacts_copy(struct sigacts *dest, struct sigacts *src) { KASSERT(dest->ps_refcnt == 1, ("sigacts_copy to shared dest")); mtx_lock(&src->ps_mtx); bcopy(src, dest, offsetof(struct sigacts, ps_refcnt)); mtx_unlock(&src->ps_mtx); } int sigacts_shared(struct sigacts *ps) { return (ps->ps_refcnt > 1); } Index: user/ngie/bug-237403/sys/kern/kern_thread.c =================================================================== --- user/ngie/bug-237403/sys/kern/kern_thread.c (revision 346925) +++ user/ngie/bug-237403/sys/kern/kern_thread.c (revision 346926) @@ -1,1267 +1,1267 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (C) 2001 Julian Elischer . * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice(s), this list of conditions and the following disclaimer as * the first lines of this file unmodified other than the possible * addition of one or more copyright notices. * 2. Redistributions in binary form must reproduce the above copyright * notice(s), this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) ``AS IS'' AND ANY * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) BE LIABLE FOR ANY * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH * DAMAGE. */ #include "opt_witness.h" #include "opt_hwpmc_hooks.h" #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef HWPMC_HOOKS #include #endif #include #include #include #include #include /* * Asserts below verify the stability of struct thread and struct proc * layout, as exposed by KBI to modules. On head, the KBI is allowed * to drift, change to the structures must be accompanied by the * assert update. * * On the stable branches after KBI freeze, conditions must not be * violated. Typically new fields are moved to the end of the * structures. 
*/ #ifdef __amd64__ _Static_assert(offsetof(struct thread, td_flags) == 0xfc, "struct thread KBI td_flags"); _Static_assert(offsetof(struct thread, td_pflags) == 0x104, "struct thread KBI td_pflags"); _Static_assert(offsetof(struct thread, td_frame) == 0x478, "struct thread KBI td_frame"); _Static_assert(offsetof(struct thread, td_emuldata) == 0x530, "struct thread KBI td_emuldata"); _Static_assert(offsetof(struct proc, p_flag) == 0xb0, "struct proc KBI p_flag"); _Static_assert(offsetof(struct proc, p_pid) == 0xbc, "struct proc KBI p_pid"); -_Static_assert(offsetof(struct proc, p_filemon) == 0x3d0, +_Static_assert(offsetof(struct proc, p_filemon) == 0x3c8, "struct proc KBI p_filemon"); -_Static_assert(offsetof(struct proc, p_comm) == 0x3e8, +_Static_assert(offsetof(struct proc, p_comm) == 0x3e0, "struct proc KBI p_comm"); -_Static_assert(offsetof(struct proc, p_emuldata) == 0x4c8, +_Static_assert(offsetof(struct proc, p_emuldata) == 0x4c0, "struct proc KBI p_emuldata"); #endif #ifdef __i386__ _Static_assert(offsetof(struct thread, td_flags) == 0x98, "struct thread KBI td_flags"); _Static_assert(offsetof(struct thread, td_pflags) == 0xa0, "struct thread KBI td_pflags"); _Static_assert(offsetof(struct thread, td_frame) == 0x2ec, "struct thread KBI td_frame"); _Static_assert(offsetof(struct thread, td_emuldata) == 0x338, "struct thread KBI td_emuldata"); _Static_assert(offsetof(struct proc, p_flag) == 0x68, "struct proc KBI p_flag"); _Static_assert(offsetof(struct proc, p_pid) == 0x74, "struct proc KBI p_pid"); -_Static_assert(offsetof(struct proc, p_filemon) == 0x27c, +_Static_assert(offsetof(struct proc, p_filemon) == 0x278, "struct proc KBI p_filemon"); -_Static_assert(offsetof(struct proc, p_comm) == 0x290, +_Static_assert(offsetof(struct proc, p_comm) == 0x28c, "struct proc KBI p_comm"); -_Static_assert(offsetof(struct proc, p_emuldata) == 0x31c, +_Static_assert(offsetof(struct proc, p_emuldata) == 0x318, "struct proc KBI p_emuldata"); #endif SDT_PROVIDER_DECLARE(proc); SDT_PROBE_DEFINE(proc, , , lwp__exit); /* * thread related storage. */ static uma_zone_t thread_zone; TAILQ_HEAD(, thread) zombie_threads = TAILQ_HEAD_INITIALIZER(zombie_threads); static struct mtx zombie_lock; MTX_SYSINIT(zombie_lock, &zombie_lock, "zombie lock", MTX_SPIN); static void thread_zombie(struct thread *); static int thread_unsuspend_one(struct thread *td, struct proc *p, bool boundary); #define TID_BUFFER_SIZE 1024 struct mtx tid_lock; static struct unrhdr *tid_unrhdr; static lwpid_t tid_buffer[TID_BUFFER_SIZE]; static int tid_head, tid_tail; static MALLOC_DEFINE(M_TIDHASH, "tidhash", "thread hash"); struct tidhashhead *tidhashtbl; u_long tidhash; struct rwlock tidhash_lock; EVENTHANDLER_LIST_DEFINE(thread_ctor); EVENTHANDLER_LIST_DEFINE(thread_dtor); EVENTHANDLER_LIST_DEFINE(thread_init); EVENTHANDLER_LIST_DEFINE(thread_fini); static lwpid_t tid_alloc(void) { lwpid_t tid; tid = alloc_unr(tid_unrhdr); if (tid != -1) return (tid); mtx_lock(&tid_lock); if (tid_head == tid_tail) { mtx_unlock(&tid_lock); return (-1); } tid = tid_buffer[tid_head]; tid_head = (tid_head + 1) % TID_BUFFER_SIZE; mtx_unlock(&tid_lock); return (tid); } static void tid_free(lwpid_t tid) { lwpid_t tmp_tid = -1; mtx_lock(&tid_lock); if ((tid_tail + 1) % TID_BUFFER_SIZE == tid_head) { tmp_tid = tid_buffer[tid_head]; tid_head = (tid_head + 1) % TID_BUFFER_SIZE; } tid_buffer[tid_tail] = tid; tid_tail = (tid_tail + 1) % TID_BUFFER_SIZE; mtx_unlock(&tid_lock); if (tmp_tid != -1) free_unr(tid_unrhdr, tmp_tid); } /* * Prepare a thread for use. 
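 * thread_ctor()/thread_dtor() run on every allocation from and release to
 * the UMA zone, while thread_init()/thread_fini() run only when the zone
 * creates or destroys the backing item, so the sleep queue, turnstile and
 * umtx state set up in thread_init() remain type-stable across reuse.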
*/ static int thread_ctor(void *mem, int size, void *arg, int flags) { struct thread *td; td = (struct thread *)mem; td->td_state = TDS_INACTIVE; td->td_lastcpu = td->td_oncpu = NOCPU; td->td_tid = tid_alloc(); /* * Note that td_critnest begins life as 1 because the thread is not * running and is thereby implicitly waiting to be on the receiving * end of a context switch. */ td->td_critnest = 1; td->td_lend_user_pri = PRI_MAX; EVENTHANDLER_DIRECT_INVOKE(thread_ctor, td); #ifdef AUDIT audit_thread_alloc(td); #endif umtx_thread_alloc(td); return (0); } /* * Reclaim a thread after use. */ static void thread_dtor(void *mem, int size, void *arg) { struct thread *td; td = (struct thread *)mem; #ifdef INVARIANTS /* Verify that this thread is in a safe state to free. */ switch (td->td_state) { case TDS_INHIBITED: case TDS_RUNNING: case TDS_CAN_RUN: case TDS_RUNQ: /* * We must never unlink a thread that is in one of * these states, because it is currently active. */ panic("bad state for thread unlinking"); /* NOTREACHED */ case TDS_INACTIVE: break; default: panic("bad thread state"); /* NOTREACHED */ } #endif #ifdef AUDIT audit_thread_free(td); #endif /* Free all OSD associated to this thread. */ osd_thread_exit(td); td_softdep_cleanup(td); MPASS(td->td_su == NULL); EVENTHANDLER_DIRECT_INVOKE(thread_dtor, td); tid_free(td->td_tid); } /* * Initialize type-stable parts of a thread (when newly created). */ static int thread_init(void *mem, int size, int flags) { struct thread *td; td = (struct thread *)mem; td->td_sleepqueue = sleepq_alloc(); td->td_turnstile = turnstile_alloc(); td->td_rlqe = NULL; EVENTHANDLER_DIRECT_INVOKE(thread_init, td); umtx_thread_init(td); epoch_thread_init(td); td->td_kstack = 0; td->td_sel = NULL; return (0); } /* * Tear down type-stable parts of a thread (just before being discarded). */ static void thread_fini(void *mem, int size) { struct thread *td; td = (struct thread *)mem; EVENTHANDLER_DIRECT_INVOKE(thread_fini, td); rlqentry_free(td->td_rlqe); turnstile_free(td->td_turnstile); sleepq_free(td->td_sleepqueue); umtx_thread_fini(td); epoch_thread_fini(td); seltdfini(td); } /* * For a newly created process, * link up all the structures and its initial threads etc. * called from: * {arch}/{arch}/machdep.c {arch}_init(), init386() etc. * proc_dtor() (should go away) * proc_init() */ void proc_linkup0(struct proc *p, struct thread *td) { TAILQ_INIT(&p->p_threads); /* all threads in proc */ proc_linkup(p, td); } void proc_linkup(struct proc *p, struct thread *td) { sigqueue_init(&p->p_sigqueue, p); p->p_ksi = ksiginfo_alloc(1); if (p->p_ksi != NULL) { /* XXX p_ksi may be null if ksiginfo zone is not ready */ p->p_ksi->ksi_flags = KSI_EXT | KSI_INS; } LIST_INIT(&p->p_mqnotifier); p->p_numthreads = 0; thread_link(td, p); } /* * Initialize global thread allocation resources. */ void threadinit(void) { mtx_init(&tid_lock, "TID lock", NULL, MTX_DEF); /* * pid_max cannot be greater than PID_MAX. * leave one number for thread0. */ tid_unrhdr = new_unrhdr(PID_MAX + 2, INT_MAX, &tid_lock); thread_zone = uma_zcreate("THREAD", sched_sizeof_thread(), thread_ctor, thread_dtor, thread_init, thread_fini, 32 - 1, UMA_ZONE_NOFREE); tidhashtbl = hashinit(maxproc / 2, M_TIDHASH, &tidhash); rw_init(&tidhash_lock, "tidhash"); } /* * Place an unused thread on the zombie list. * Use the slpq as that must be unused by now. 
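 * An exiting thread cannot free itself while it is still running on its
 * own kernel stack, so it is parked here and reclaimed later by
 * thread_reap(), which thread_alloc() invokes before handing out a new
 * thread.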
*/ void thread_zombie(struct thread *td) { mtx_lock_spin(&zombie_lock); TAILQ_INSERT_HEAD(&zombie_threads, td, td_slpq); mtx_unlock_spin(&zombie_lock); } /* * Release a thread that has exited after cpu_throw(). */ void thread_stash(struct thread *td) { atomic_subtract_rel_int(&td->td_proc->p_exitthreads, 1); thread_zombie(td); } /* * Reap zombie resources. */ void thread_reap(void) { struct thread *td_first, *td_next; /* * Don't even bother to lock if none at this instant, * we really don't care about the next instant. */ if (!TAILQ_EMPTY(&zombie_threads)) { mtx_lock_spin(&zombie_lock); td_first = TAILQ_FIRST(&zombie_threads); if (td_first) TAILQ_INIT(&zombie_threads); mtx_unlock_spin(&zombie_lock); while (td_first) { td_next = TAILQ_NEXT(td_first, td_slpq); thread_cow_free(td_first); thread_free(td_first); td_first = td_next; } } } /* * Allocate a thread. */ struct thread * thread_alloc(int pages) { struct thread *td; thread_reap(); /* check if any zombies to get */ td = (struct thread *)uma_zalloc(thread_zone, M_WAITOK); KASSERT(td->td_kstack == 0, ("thread_alloc got thread with kstack")); if (!vm_thread_new(td, pages)) { uma_zfree(thread_zone, td); return (NULL); } cpu_thread_alloc(td); return (td); } int thread_alloc_stack(struct thread *td, int pages) { KASSERT(td->td_kstack == 0, ("thread_alloc_stack called on a thread with kstack")); if (!vm_thread_new(td, pages)) return (0); cpu_thread_alloc(td); return (1); } /* * Deallocate a thread. */ void thread_free(struct thread *td) { lock_profile_thread_exit(td); if (td->td_cpuset) cpuset_rel(td->td_cpuset); td->td_cpuset = NULL; cpu_thread_free(td); if (td->td_kstack != 0) vm_thread_dispose(td); callout_drain(&td->td_slpcallout); uma_zfree(thread_zone, td); } void thread_cow_get_proc(struct thread *newtd, struct proc *p) { PROC_LOCK_ASSERT(p, MA_OWNED); newtd->td_ucred = crhold(p->p_ucred); newtd->td_limit = lim_hold(p->p_limit); newtd->td_cowgen = p->p_cowgen; } void thread_cow_get(struct thread *newtd, struct thread *td) { newtd->td_ucred = crhold(td->td_ucred); newtd->td_limit = lim_hold(td->td_limit); newtd->td_cowgen = td->td_cowgen; } void thread_cow_free(struct thread *td) { if (td->td_ucred != NULL) crfree(td->td_ucred); if (td->td_limit != NULL) lim_free(td->td_limit); } void thread_cow_update(struct thread *td) { struct proc *p; struct ucred *oldcred; struct plimit *oldlimit; p = td->td_proc; oldcred = NULL; oldlimit = NULL; PROC_LOCK(p); if (td->td_ucred != p->p_ucred) { oldcred = td->td_ucred; td->td_ucred = crhold(p->p_ucred); } if (td->td_limit != p->p_limit) { oldlimit = td->td_limit; td->td_limit = lim_hold(p->p_limit); } td->td_cowgen = p->p_cowgen; PROC_UNLOCK(p); if (oldcred != NULL) crfree(oldcred); if (oldlimit != NULL) lim_free(oldlimit); } /* * Discard the current thread and exit from its context. * Always called with scheduler locked. * * Because we can't free a thread while we're operating under its context, * push the current thread into our CPU's deadthread holder. This means * we needn't worry about someone else grabbing our context before we * do a cpu_throw(). 
*/ void thread_exit(void) { uint64_t runtime, new_switchtime; struct thread *td; struct thread *td2; struct proc *p; int wakeup_swapper; td = curthread; p = td->td_proc; PROC_SLOCK_ASSERT(p, MA_OWNED); mtx_assert(&Giant, MA_NOTOWNED); PROC_LOCK_ASSERT(p, MA_OWNED); KASSERT(p != NULL, ("thread exiting without a process")); CTR3(KTR_PROC, "thread_exit: thread %p (pid %ld, %s)", td, (long)p->p_pid, td->td_name); SDT_PROBE0(proc, , , lwp__exit); KASSERT(TAILQ_EMPTY(&td->td_sigqueue.sq_list), ("signal pending")); /* * drop FPU & debug register state storage, or any other * architecture specific resources that * would not be on a new untouched process. */ cpu_thread_exit(td); /* * The last thread is left attached to the process * So that the whole bundle gets recycled. Skip * all this stuff if we never had threads. * EXIT clears all sign of other threads when * it goes to single threading, so the last thread always * takes the short path. */ if (p->p_flag & P_HADTHREADS) { if (p->p_numthreads > 1) { atomic_add_int(&td->td_proc->p_exitthreads, 1); thread_unlink(td); td2 = FIRST_THREAD_IN_PROC(p); sched_exit_thread(td2, td); /* * The test below is NOT true if we are the * sole exiting thread. P_STOPPED_SINGLE is unset * in exit1() after it is the only survivor. */ if (P_SHOULDSTOP(p) == P_STOPPED_SINGLE) { if (p->p_numthreads == p->p_suspcount) { thread_lock(p->p_singlethread); wakeup_swapper = thread_unsuspend_one( p->p_singlethread, p, false); thread_unlock(p->p_singlethread); if (wakeup_swapper) kick_proc0(); } } PCPU_SET(deadthread, td); } else { /* * The last thread is exiting.. but not through exit() */ panic ("thread_exit: Last thread exiting on its own"); } } #ifdef HWPMC_HOOKS /* * If this thread is part of a process that is being tracked by hwpmc(4), * inform the module of the thread's impending exit. */ if (PMC_PROC_IS_USING_PMCS(td->td_proc)) { PMC_SWITCH_CONTEXT(td, PMC_FN_CSW_OUT); PMC_CALL_HOOK_UNLOCKED(td, PMC_FN_THR_EXIT, NULL); } else if (PMC_SYSTEM_SAMPLING_ACTIVE()) PMC_CALL_HOOK_UNLOCKED(td, PMC_FN_THR_EXIT_LOG, NULL); #endif PROC_UNLOCK(p); PROC_STATLOCK(p); thread_lock(td); PROC_SUNLOCK(p); /* Do the same timestamp bookkeeping that mi_switch() would do. */ new_switchtime = cpu_ticks(); runtime = new_switchtime - PCPU_GET(switchtime); td->td_runtime += runtime; td->td_incruntime += runtime; PCPU_SET(switchtime, new_switchtime); PCPU_SET(switchticks, ticks); VM_CNT_INC(v_swtch); /* Save our resource usage in our process. */ td->td_ru.ru_nvcsw++; ruxagg(p, td); rucollect(&p->p_ru, &td->td_ru); PROC_STATUNLOCK(p); td->td_state = TDS_INACTIVE; #ifdef WITNESS witness_thread_exit(td); #endif CTR1(KTR_PROC, "thread_exit: cpu_throw() thread %p", td); sched_throw(td); panic("I'm a teapot!"); /* NOTREACHED */ } /* * Do any thread specific cleanups that may be needed in wait() * called with Giant, proc and schedlock not held. */ void thread_wait(struct proc *p) { struct thread *td; mtx_assert(&Giant, MA_NOTOWNED); KASSERT(p->p_numthreads == 1, ("multiple threads in thread_wait()")); KASSERT(p->p_exitthreads == 0, ("p_exitthreads leaking")); td = FIRST_THREAD_IN_PROC(p); /* Lock the last thread so we spin until it exits cpu_throw(). */ thread_lock(td); thread_unlock(td); lock_profile_thread_exit(td); cpuset_rel(td->td_cpuset); td->td_cpuset = NULL; cpu_thread_clean(td); thread_cow_free(td); callout_drain(&td->td_slpcallout); thread_reap(); /* check for zombie threads etc. */ } /* * Link a thread to a process. 
* set up anything that needs to be initialized for it to * be used by the process. */ void thread_link(struct thread *td, struct proc *p) { /* * XXX This can't be enabled because it's called for proc0 before * its lock has been created. * PROC_LOCK_ASSERT(p, MA_OWNED); */ td->td_state = TDS_INACTIVE; td->td_proc = p; td->td_flags = TDF_INMEM; LIST_INIT(&td->td_contested); LIST_INIT(&td->td_lprof[0]); LIST_INIT(&td->td_lprof[1]); sigqueue_init(&td->td_sigqueue, p); callout_init(&td->td_slpcallout, 1); TAILQ_INSERT_TAIL(&p->p_threads, td, td_plist); p->p_numthreads++; } /* * Called from: * thread_exit() */ void thread_unlink(struct thread *td) { struct proc *p = td->td_proc; PROC_LOCK_ASSERT(p, MA_OWNED); TAILQ_REMOVE(&p->p_threads, td, td_plist); p->p_numthreads--; /* could clear a few other things here */ /* Must NOT clear links to proc! */ } static int calc_remaining(struct proc *p, int mode) { int remaining; PROC_LOCK_ASSERT(p, MA_OWNED); PROC_SLOCK_ASSERT(p, MA_OWNED); if (mode == SINGLE_EXIT) remaining = p->p_numthreads; else if (mode == SINGLE_BOUNDARY) remaining = p->p_numthreads - p->p_boundary_count; else if (mode == SINGLE_NO_EXIT || mode == SINGLE_ALLPROC) remaining = p->p_numthreads - p->p_suspcount; else panic("calc_remaining: wrong mode %d", mode); return (remaining); } static int remain_for_mode(int mode) { return (mode == SINGLE_ALLPROC ? 0 : 1); } static int weed_inhib(int mode, struct thread *td2, struct proc *p) { int wakeup_swapper; PROC_LOCK_ASSERT(p, MA_OWNED); PROC_SLOCK_ASSERT(p, MA_OWNED); THREAD_LOCK_ASSERT(td2, MA_OWNED); wakeup_swapper = 0; switch (mode) { case SINGLE_EXIT: if (TD_IS_SUSPENDED(td2)) wakeup_swapper |= thread_unsuspend_one(td2, p, true); if (TD_ON_SLEEPQ(td2) && (td2->td_flags & TDF_SINTR) != 0) wakeup_swapper |= sleepq_abort(td2, EINTR); break; case SINGLE_BOUNDARY: case SINGLE_NO_EXIT: if (TD_IS_SUSPENDED(td2) && (td2->td_flags & TDF_BOUNDARY) == 0) wakeup_swapper |= thread_unsuspend_one(td2, p, false); if (TD_ON_SLEEPQ(td2) && (td2->td_flags & TDF_SINTR) != 0) wakeup_swapper |= sleepq_abort(td2, ERESTART); break; case SINGLE_ALLPROC: /* * ALLPROC suspend tries to avoid spurious EINTR for * threads sleeping interruptable, by suspending the * thread directly, similarly to sig_suspend_threads(). * Since such sleep is not performed at the user * boundary, TDF_BOUNDARY flag is not set, and TDF_ALLPROCSUSP * is used to avoid immediate un-suspend. */ if (TD_IS_SUSPENDED(td2) && (td2->td_flags & (TDF_BOUNDARY | TDF_ALLPROCSUSP)) == 0) wakeup_swapper |= thread_unsuspend_one(td2, p, false); if (TD_ON_SLEEPQ(td2) && (td2->td_flags & TDF_SINTR) != 0) { if ((td2->td_flags & TDF_SBDRY) == 0) { thread_suspend_one(td2); td2->td_flags |= TDF_ALLPROCSUSP; } else { wakeup_swapper |= sleepq_abort(td2, ERESTART); } } break; } return (wakeup_swapper); } /* * Enforce single-threading. * * Returns 1 if the caller must abort (another thread is waiting to * exit the process or similar). Process is locked! * Returns 0 when you are successfully the only thread running. * A process has successfully single threaded in the suspend mode when * There are no threads in user mode. Threads in the kernel must be * allowed to continue until they get to the user boundary. They may even * copy out their return values and data before suspending. They may however be * accelerated in reaching the user boundary as we will wake up * any sleeping threads that are interruptable. (PCATCH). 
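 *
 * A caller therefore looks roughly like the following sketch (the exact
 * mode and error handling vary by call site):
 *
 *	PROC_LOCK(p);
 *	if (thread_single(p, SINGLE_BOUNDARY) != 0) {
 *		PROC_UNLOCK(p);
 *		return (ERESTART);	<- another thread is already
 *					   single-threading the process
 *	}
 *	... work while the process is single-threaded ...
 *	thread_single_end(p, SINGLE_BOUNDARY);
 *	PROC_UNLOCK(p);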
*/ int thread_single(struct proc *p, int mode) { struct thread *td; struct thread *td2; int remaining, wakeup_swapper; td = curthread; KASSERT(mode == SINGLE_EXIT || mode == SINGLE_BOUNDARY || mode == SINGLE_ALLPROC || mode == SINGLE_NO_EXIT, ("invalid mode %d", mode)); /* * If allowing non-ALLPROC singlethreading for non-curproc * callers, calc_remaining() and remain_for_mode() should be * adjusted to also account for td->td_proc != p. For now * this is not implemented because it is not used. */ KASSERT((mode == SINGLE_ALLPROC && td->td_proc != p) || (mode != SINGLE_ALLPROC && td->td_proc == p), ("mode %d proc %p curproc %p", mode, p, td->td_proc)); mtx_assert(&Giant, MA_NOTOWNED); PROC_LOCK_ASSERT(p, MA_OWNED); if ((p->p_flag & P_HADTHREADS) == 0 && mode != SINGLE_ALLPROC) return (0); /* Is someone already single threading? */ if (p->p_singlethread != NULL && p->p_singlethread != td) return (1); if (mode == SINGLE_EXIT) { p->p_flag |= P_SINGLE_EXIT; p->p_flag &= ~P_SINGLE_BOUNDARY; } else { p->p_flag &= ~P_SINGLE_EXIT; if (mode == SINGLE_BOUNDARY) p->p_flag |= P_SINGLE_BOUNDARY; else p->p_flag &= ~P_SINGLE_BOUNDARY; } if (mode == SINGLE_ALLPROC) p->p_flag |= P_TOTAL_STOP; p->p_flag |= P_STOPPED_SINGLE; PROC_SLOCK(p); p->p_singlethread = td; remaining = calc_remaining(p, mode); while (remaining != remain_for_mode(mode)) { if (P_SHOULDSTOP(p) != P_STOPPED_SINGLE) goto stopme; wakeup_swapper = 0; FOREACH_THREAD_IN_PROC(p, td2) { if (td2 == td) continue; thread_lock(td2); td2->td_flags |= TDF_ASTPENDING | TDF_NEEDSUSPCHK; if (TD_IS_INHIBITED(td2)) { wakeup_swapper |= weed_inhib(mode, td2, p); #ifdef SMP } else if (TD_IS_RUNNING(td2) && td != td2) { forward_signal(td2); #endif } thread_unlock(td2); } if (wakeup_swapper) kick_proc0(); remaining = calc_remaining(p, mode); /* * Maybe we suspended some threads.. was it enough? */ if (remaining == remain_for_mode(mode)) break; stopme: /* * Wake us up when everyone else has suspended. * In the mean time we suspend as well. */ thread_suspend_switch(td, p); remaining = calc_remaining(p, mode); } if (mode == SINGLE_EXIT) { /* * Convert the process to an unthreaded process. The * SINGLE_EXIT is called by exit1() or execve(), in * both cases other threads must be retired. */ KASSERT(p->p_numthreads == 1, ("Unthreading with >1 threads")); p->p_singlethread = NULL; p->p_flag &= ~(P_STOPPED_SINGLE | P_SINGLE_EXIT | P_HADTHREADS); /* * Wait for any remaining threads to exit cpu_throw(). */ while (p->p_exitthreads != 0) { PROC_SUNLOCK(p); PROC_UNLOCK(p); sched_relinquish(td); PROC_LOCK(p); PROC_SLOCK(p); } } else if (mode == SINGLE_BOUNDARY) { /* * Wait until all suspended threads are removed from * the processors. The thread_suspend_check() * increments p_boundary_count while it is still * running, which makes it possible for the execve() * to destroy vmspace while our other threads are * still using the address space. * * We lock the thread, which is only allowed to * succeed after context switch code finished using * the address space. 
*/ FOREACH_THREAD_IN_PROC(p, td2) { if (td2 == td) continue; thread_lock(td2); KASSERT((td2->td_flags & TDF_BOUNDARY) != 0, ("td %p not on boundary", td2)); KASSERT(TD_IS_SUSPENDED(td2), ("td %p is not suspended", td2)); thread_unlock(td2); } } PROC_SUNLOCK(p); return (0); } bool thread_suspend_check_needed(void) { struct proc *p; struct thread *td; td = curthread; p = td->td_proc; PROC_LOCK_ASSERT(p, MA_OWNED); return (P_SHOULDSTOP(p) || ((p->p_flag & P_TRACED) != 0 && (td->td_dbgflags & TDB_SUSPEND) != 0)); } /* * Called in from locations that can safely check to see * whether we have to suspend or at least throttle for a * single-thread event (e.g. fork). * * Such locations include userret(). * If the "return_instead" argument is non zero, the thread must be able to * accept 0 (caller may continue), or 1 (caller must abort) as a result. * * The 'return_instead' argument tells the function if it may do a * thread_exit() or suspend, or whether the caller must abort and back * out instead. * * If the thread that set the single_threading request has set the * P_SINGLE_EXIT bit in the process flags then this call will never return * if 'return_instead' is false, but will exit. * * P_SINGLE_EXIT | return_instead == 0| return_instead != 0 *---------------+--------------------+--------------------- * 0 | returns 0 | returns 0 or 1 * | when ST ends | immediately *---------------+--------------------+--------------------- * 1 | thread exits | returns 1 * | | immediately * 0 = thread_exit() or suspension ok, * other = return error instead of stopping the thread. * * While a full suspension is under effect, even a single threading * thread would be suspended if it made this call (but it shouldn't). * This call should only be made from places where * thread_exit() would be safe as that may be the outcome unless * return_instead is set. */ int thread_suspend_check(int return_instead) { struct thread *td; struct proc *p; int wakeup_swapper; td = curthread; p = td->td_proc; mtx_assert(&Giant, MA_NOTOWNED); PROC_LOCK_ASSERT(p, MA_OWNED); while (thread_suspend_check_needed()) { if (P_SHOULDSTOP(p) == P_STOPPED_SINGLE) { KASSERT(p->p_singlethread != NULL, ("singlethread not set")); /* * The only suspension in action is a * single-threading. Single threader need not stop. * It is safe to access p->p_singlethread unlocked * because it can only be set to our address by us. */ if (p->p_singlethread == td) return (0); /* Exempt from stopping. */ } if ((p->p_flag & P_SINGLE_EXIT) && return_instead) return (EINTR); /* Should we goto user boundary if we didn't come from there? */ if (P_SHOULDSTOP(p) == P_STOPPED_SINGLE && (p->p_flag & P_SINGLE_BOUNDARY) && return_instead) return (ERESTART); /* * Ignore suspend requests if they are deferred. */ if ((td->td_flags & TDF_SBDRY) != 0) { KASSERT(return_instead, ("TDF_SBDRY set for unsafe thread_suspend_check")); KASSERT((td->td_flags & (TDF_SEINTR | TDF_SERESTART)) != (TDF_SEINTR | TDF_SERESTART), ("both TDF_SEINTR and TDF_SERESTART")); return (TD_SBDRY_INTR(td) ? TD_SBDRY_ERRNO(td) : 0); } /* * If the process is waiting for us to exit, * this thread should just suicide. * Assumes that P_SINGLE_EXIT implies P_STOPPED_SINGLE. */ if ((p->p_flag & P_SINGLE_EXIT) && (p->p_singlethread != td)) { PROC_UNLOCK(p); /* * Allow Linux emulation layer to do some work * before thread suicide. 
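 * (for example, clearing and futex-waking a tid address the emulated
 * program registered via set_tid_address(2), so a joining thread is not
 * left waiting forever).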
*/ if (__predict_false(p->p_sysent->sv_thread_detach != NULL)) (p->p_sysent->sv_thread_detach)(td); umtx_thread_exit(td); kern_thr_exit(td); panic("stopped thread did not exit"); } PROC_SLOCK(p); thread_stopped(p); if (P_SHOULDSTOP(p) == P_STOPPED_SINGLE) { if (p->p_numthreads == p->p_suspcount + 1) { thread_lock(p->p_singlethread); wakeup_swapper = thread_unsuspend_one( p->p_singlethread, p, false); thread_unlock(p->p_singlethread); if (wakeup_swapper) kick_proc0(); } } PROC_UNLOCK(p); thread_lock(td); /* * When a thread suspends, it just * gets taken off all queues. */ thread_suspend_one(td); if (return_instead == 0) { p->p_boundary_count++; td->td_flags |= TDF_BOUNDARY; } PROC_SUNLOCK(p); mi_switch(SW_INVOL | SWT_SUSPEND, NULL); thread_unlock(td); PROC_LOCK(p); } return (0); } void thread_suspend_switch(struct thread *td, struct proc *p) { KASSERT(!TD_IS_SUSPENDED(td), ("already suspended")); PROC_LOCK_ASSERT(p, MA_OWNED); PROC_SLOCK_ASSERT(p, MA_OWNED); /* * We implement thread_suspend_one in stages here to avoid * dropping the proc lock while the thread lock is owned. */ if (p == td->td_proc) { thread_stopped(p); p->p_suspcount++; } PROC_UNLOCK(p); thread_lock(td); td->td_flags &= ~TDF_NEEDSUSPCHK; TD_SET_SUSPENDED(td); sched_sleep(td, 0); PROC_SUNLOCK(p); DROP_GIANT(); mi_switch(SW_VOL | SWT_SUSPEND, NULL); thread_unlock(td); PICKUP_GIANT(); PROC_LOCK(p); PROC_SLOCK(p); } void thread_suspend_one(struct thread *td) { struct proc *p; p = td->td_proc; PROC_SLOCK_ASSERT(p, MA_OWNED); THREAD_LOCK_ASSERT(td, MA_OWNED); KASSERT(!TD_IS_SUSPENDED(td), ("already suspended")); p->p_suspcount++; td->td_flags &= ~TDF_NEEDSUSPCHK; TD_SET_SUSPENDED(td); sched_sleep(td, 0); } static int thread_unsuspend_one(struct thread *td, struct proc *p, bool boundary) { THREAD_LOCK_ASSERT(td, MA_OWNED); KASSERT(TD_IS_SUSPENDED(td), ("Thread not suspended")); TD_CLR_SUSPENDED(td); td->td_flags &= ~TDF_ALLPROCSUSP; if (td->td_proc == p) { PROC_SLOCK_ASSERT(p, MA_OWNED); p->p_suspcount--; if (boundary && (td->td_flags & TDF_BOUNDARY) != 0) { td->td_flags &= ~TDF_BOUNDARY; p->p_boundary_count--; } } return (setrunnable(td)); } /* * Allow all threads blocked by single threading to continue running. */ void thread_unsuspend(struct proc *p) { struct thread *td; int wakeup_swapper; PROC_LOCK_ASSERT(p, MA_OWNED); PROC_SLOCK_ASSERT(p, MA_OWNED); wakeup_swapper = 0; if (!P_SHOULDSTOP(p)) { FOREACH_THREAD_IN_PROC(p, td) { thread_lock(td); if (TD_IS_SUSPENDED(td)) { wakeup_swapper |= thread_unsuspend_one(td, p, true); } thread_unlock(td); } } else if (P_SHOULDSTOP(p) == P_STOPPED_SINGLE && p->p_numthreads == p->p_suspcount) { /* * Stopping everything also did the job for the single * threading request. Now we've downgraded to single-threaded, * let it continue. */ if (p->p_singlethread->td_proc == p) { thread_lock(p->p_singlethread); wakeup_swapper = thread_unsuspend_one( p->p_singlethread, p, false); thread_unlock(p->p_singlethread); } } if (wakeup_swapper) kick_proc0(); } /* * End the single threading mode.. 
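 * The mode must match the one previously passed to thread_single(); the
 * single-threading flags are cleared and any siblings still suspended at
 * the boundary are allowed to run again, unless a blanket stop is still
 * in effect for the process.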
*/ void thread_single_end(struct proc *p, int mode) { struct thread *td; int wakeup_swapper; KASSERT(mode == SINGLE_EXIT || mode == SINGLE_BOUNDARY || mode == SINGLE_ALLPROC || mode == SINGLE_NO_EXIT, ("invalid mode %d", mode)); PROC_LOCK_ASSERT(p, MA_OWNED); KASSERT((mode == SINGLE_ALLPROC && (p->p_flag & P_TOTAL_STOP) != 0) || (mode != SINGLE_ALLPROC && (p->p_flag & P_TOTAL_STOP) == 0), ("mode %d does not match P_TOTAL_STOP", mode)); KASSERT(mode == SINGLE_ALLPROC || p->p_singlethread == curthread, ("thread_single_end from other thread %p %p", curthread, p->p_singlethread)); KASSERT(mode != SINGLE_BOUNDARY || (p->p_flag & P_SINGLE_BOUNDARY) != 0, ("mis-matched SINGLE_BOUNDARY flags %x", p->p_flag)); p->p_flag &= ~(P_STOPPED_SINGLE | P_SINGLE_EXIT | P_SINGLE_BOUNDARY | P_TOTAL_STOP); PROC_SLOCK(p); p->p_singlethread = NULL; wakeup_swapper = 0; /* * If there are other threads they may now run, * unless of course there is a blanket 'stop order' * on the process. The single threader must be allowed * to continue however as this is a bad place to stop. */ if (p->p_numthreads != remain_for_mode(mode) && !P_SHOULDSTOP(p)) { FOREACH_THREAD_IN_PROC(p, td) { thread_lock(td); if (TD_IS_SUSPENDED(td)) { wakeup_swapper |= thread_unsuspend_one(td, p, mode == SINGLE_BOUNDARY); } thread_unlock(td); } } KASSERT(mode != SINGLE_BOUNDARY || p->p_boundary_count == 0, ("inconsistent boundary count %d", p->p_boundary_count)); PROC_SUNLOCK(p); if (wakeup_swapper) kick_proc0(); } struct thread * thread_find(struct proc *p, lwpid_t tid) { struct thread *td; PROC_LOCK_ASSERT(p, MA_OWNED); FOREACH_THREAD_IN_PROC(p, td) { if (td->td_tid == tid) break; } return (td); } /* Locate a thread by number; return with proc lock held. */ struct thread * tdfind(lwpid_t tid, pid_t pid) { #define RUN_THRESH 16 struct thread *td; int run = 0; rw_rlock(&tidhash_lock); LIST_FOREACH(td, TIDHASH(tid), td_hash) { if (td->td_tid == tid) { if (pid != -1 && td->td_proc->p_pid != pid) { td = NULL; break; } PROC_LOCK(td->td_proc); if (td->td_proc->p_state == PRS_NEW) { PROC_UNLOCK(td->td_proc); td = NULL; break; } if (run > RUN_THRESH) { if (rw_try_upgrade(&tidhash_lock)) { LIST_REMOVE(td, td_hash); LIST_INSERT_HEAD(TIDHASH(td->td_tid), td, td_hash); rw_wunlock(&tidhash_lock); return (td); } } break; } run++; } rw_runlock(&tidhash_lock); return (td); } void tidhash_add(struct thread *td) { rw_wlock(&tidhash_lock); LIST_INSERT_HEAD(TIDHASH(td->td_tid), td, td_hash); rw_wunlock(&tidhash_lock); } void tidhash_remove(struct thread *td) { rw_wlock(&tidhash_lock); LIST_REMOVE(td, td_hash); rw_wunlock(&tidhash_lock); } Index: user/ngie/bug-237403/sys/kern/systrace_args.c =================================================================== --- user/ngie/bug-237403/sys/kern/systrace_args.c (revision 346925) +++ user/ngie/bug-237403/sys/kern/systrace_args.c (revision 346926) @@ -1,10748 +1,10748 @@ /* * System call argument to DTrace register array converstion. * * DO NOT EDIT-- this file is automatically generated. * $FreeBSD$ * This file is part of the DTrace syscall provider. 
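 * Each case below copies one syscall's arguments into the uarg[] array
 * (signed values through the overlaid iarg[] alias, pointer arguments
 * cast through intptr_t) and stores the argument count in *n_args for
 * the DTrace syscall provider.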
*/ static void systrace_args(int sysnum, void *params, uint64_t *uarg, int *n_args) { int64_t *iarg = (int64_t *) uarg; switch (sysnum) { /* nosys */ case 0: { *n_args = 0; break; } /* sys_exit */ case 1: { struct sys_exit_args *p = params; iarg[0] = p->rval; /* int */ *n_args = 1; break; } /* fork */ case 2: { *n_args = 0; break; } /* read */ case 3: { struct read_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->buf; /* void * */ uarg[2] = p->nbyte; /* size_t */ *n_args = 3; break; } /* write */ case 4: { struct write_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->buf; /* const void * */ uarg[2] = p->nbyte; /* size_t */ *n_args = 3; break; } /* open */ case 5: { struct open_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->flags; /* int */ iarg[2] = p->mode; /* mode_t */ *n_args = 3; break; } /* close */ case 6: { struct close_args *p = params; iarg[0] = p->fd; /* int */ *n_args = 1; break; } /* wait4 */ case 7: { struct wait4_args *p = params; iarg[0] = p->pid; /* int */ uarg[1] = (intptr_t) p->status; /* int * */ iarg[2] = p->options; /* int */ uarg[3] = (intptr_t) p->rusage; /* struct rusage * */ *n_args = 4; break; } /* link */ case 9: { struct link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = (intptr_t) p->link; /* const char * */ *n_args = 2; break; } /* unlink */ case 10: { struct unlink_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* chdir */ case 12: { struct chdir_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* fchdir */ case 13: { struct fchdir_args *p = params; iarg[0] = p->fd; /* int */ *n_args = 1; break; } /* chmod */ case 15: { struct chmod_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->mode; /* mode_t */ *n_args = 2; break; } /* chown */ case 16: { struct chown_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->uid; /* int */ iarg[2] = p->gid; /* int */ *n_args = 3; break; } /* break */ case 17: { struct break_args *p = params; uarg[0] = (intptr_t) p->nsize; /* char * */ *n_args = 1; break; } /* getpid */ case 20: { *n_args = 0; break; } /* mount */ case 21: { struct mount_args *p = params; uarg[0] = (intptr_t) p->type; /* const char * */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->flags; /* int */ uarg[3] = (intptr_t) p->data; /* void * */ *n_args = 4; break; } /* unmount */ case 22: { struct unmount_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->flags; /* int */ *n_args = 2; break; } /* setuid */ case 23: { struct setuid_args *p = params; uarg[0] = p->uid; /* uid_t */ *n_args = 1; break; } /* getuid */ case 24: { *n_args = 0; break; } /* geteuid */ case 25: { *n_args = 0; break; } /* ptrace */ case 26: { struct ptrace_args *p = params; iarg[0] = p->req; /* int */ iarg[1] = p->pid; /* pid_t */ uarg[2] = (intptr_t) p->addr; /* caddr_t */ iarg[3] = p->data; /* int */ *n_args = 4; break; } /* recvmsg */ case 27: { struct recvmsg_args *p = params; iarg[0] = p->s; /* int */ uarg[1] = (intptr_t) p->msg; /* struct msghdr * */ iarg[2] = p->flags; /* int */ *n_args = 3; break; } /* sendmsg */ case 28: { struct sendmsg_args *p = params; iarg[0] = p->s; /* int */ uarg[1] = (intptr_t) p->msg; /* struct msghdr * */ iarg[2] = p->flags; /* int */ *n_args = 3; break; } /* recvfrom */ case 29: { struct recvfrom_args *p = params; iarg[0] = p->s; /* int */ uarg[1] = (intptr_t) p->buf; /* 
void * */ uarg[2] = p->len; /* size_t */ iarg[3] = p->flags; /* int */ uarg[4] = (intptr_t) p->from; /* struct sockaddr * */ uarg[5] = (intptr_t) p->fromlenaddr; /* __socklen_t * */ *n_args = 6; break; } /* accept */ case 30: { struct accept_args *p = params; iarg[0] = p->s; /* int */ uarg[1] = (intptr_t) p->name; /* struct sockaddr * */ uarg[2] = (intptr_t) p->anamelen; /* __socklen_t * */ *n_args = 3; break; } /* getpeername */ case 31: { struct getpeername_args *p = params; iarg[0] = p->fdes; /* int */ uarg[1] = (intptr_t) p->asa; /* struct sockaddr * */ uarg[2] = (intptr_t) p->alen; /* __socklen_t * */ *n_args = 3; break; } /* getsockname */ case 32: { struct getsockname_args *p = params; iarg[0] = p->fdes; /* int */ uarg[1] = (intptr_t) p->asa; /* struct sockaddr * */ uarg[2] = (intptr_t) p->alen; /* __socklen_t * */ *n_args = 3; break; } /* access */ case 33: { struct access_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->amode; /* int */ *n_args = 2; break; } /* chflags */ case 34: { struct chflags_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = p->flags; /* u_long */ *n_args = 2; break; } /* fchflags */ case 35: { struct fchflags_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = p->flags; /* u_long */ *n_args = 2; break; } /* sync */ case 36: { *n_args = 0; break; } /* kill */ case 37: { struct kill_args *p = params; iarg[0] = p->pid; /* int */ iarg[1] = p->signum; /* int */ *n_args = 2; break; } /* getppid */ case 39: { *n_args = 0; break; } /* dup */ case 41: { struct dup_args *p = params; uarg[0] = p->fd; /* u_int */ *n_args = 1; break; } /* getegid */ case 43: { *n_args = 0; break; } /* profil */ case 44: { struct profil_args *p = params; uarg[0] = (intptr_t) p->samples; /* char * */ uarg[1] = p->size; /* size_t */ uarg[2] = p->offset; /* size_t */ uarg[3] = p->scale; /* u_int */ *n_args = 4; break; } /* ktrace */ case 45: { struct ktrace_args *p = params; uarg[0] = (intptr_t) p->fname; /* const char * */ iarg[1] = p->ops; /* int */ iarg[2] = p->facs; /* int */ iarg[3] = p->pid; /* int */ *n_args = 4; break; } /* getgid */ case 47: { *n_args = 0; break; } /* getlogin */ case 49: { struct getlogin_args *p = params; uarg[0] = (intptr_t) p->namebuf; /* char * */ uarg[1] = p->namelen; /* u_int */ *n_args = 2; break; } /* setlogin */ case 50: { struct setlogin_args *p = params; uarg[0] = (intptr_t) p->namebuf; /* const char * */ *n_args = 1; break; } /* acct */ case 51: { struct acct_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* sigaltstack */ case 53: { struct sigaltstack_args *p = params; uarg[0] = (intptr_t) p->ss; /* stack_t * */ uarg[1] = (intptr_t) p->oss; /* stack_t * */ *n_args = 2; break; } /* ioctl */ case 54: { struct ioctl_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = p->com; /* u_long */ uarg[2] = (intptr_t) p->data; /* char * */ *n_args = 3; break; } /* reboot */ case 55: { struct reboot_args *p = params; iarg[0] = p->opt; /* int */ *n_args = 1; break; } /* revoke */ case 56: { struct revoke_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* symlink */ case 57: { struct symlink_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = (intptr_t) p->link; /* const char * */ *n_args = 2; break; } /* readlink */ case 58: { struct readlink_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = (intptr_t) p->buf; /* char * */ uarg[2] = p->count; /* size_t */ *n_args = 3; 
break; } /* execve */ case 59: { struct execve_args *p = params; uarg[0] = (intptr_t) p->fname; /* const char * */ uarg[1] = (intptr_t) p->argv; /* char ** */ uarg[2] = (intptr_t) p->envv; /* char ** */ *n_args = 3; break; } /* umask */ case 60: { struct umask_args *p = params; iarg[0] = p->newmask; /* mode_t */ *n_args = 1; break; } /* chroot */ case 61: { struct chroot_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* msync */ case 65: { struct msync_args *p = params; uarg[0] = (intptr_t) p->addr; /* void * */ uarg[1] = p->len; /* size_t */ iarg[2] = p->flags; /* int */ *n_args = 3; break; } /* vfork */ case 66: { *n_args = 0; break; } /* sbrk */ case 69: { struct sbrk_args *p = params; iarg[0] = p->incr; /* int */ *n_args = 1; break; } /* sstk */ case 70: { struct sstk_args *p = params; iarg[0] = p->incr; /* int */ *n_args = 1; break; } /* munmap */ case 73: { struct munmap_args *p = params; uarg[0] = (intptr_t) p->addr; /* void * */ uarg[1] = p->len; /* size_t */ *n_args = 2; break; } /* mprotect */ case 74: { struct mprotect_args *p = params; uarg[0] = (intptr_t) p->addr; /* void * */ uarg[1] = p->len; /* size_t */ iarg[2] = p->prot; /* int */ *n_args = 3; break; } /* madvise */ case 75: { struct madvise_args *p = params; uarg[0] = (intptr_t) p->addr; /* void * */ uarg[1] = p->len; /* size_t */ iarg[2] = p->behav; /* int */ *n_args = 3; break; } /* mincore */ case 78: { struct mincore_args *p = params; uarg[0] = (intptr_t) p->addr; /* const void * */ uarg[1] = p->len; /* size_t */ uarg[2] = (intptr_t) p->vec; /* char * */ *n_args = 3; break; } /* getgroups */ case 79: { struct getgroups_args *p = params; uarg[0] = p->gidsetsize; /* u_int */ uarg[1] = (intptr_t) p->gidset; /* gid_t * */ *n_args = 2; break; } /* setgroups */ case 80: { struct setgroups_args *p = params; uarg[0] = p->gidsetsize; /* u_int */ uarg[1] = (intptr_t) p->gidset; /* gid_t * */ *n_args = 2; break; } /* getpgrp */ case 81: { *n_args = 0; break; } /* setpgid */ case 82: { struct setpgid_args *p = params; iarg[0] = p->pid; /* int */ iarg[1] = p->pgid; /* int */ *n_args = 2; break; } /* setitimer */ case 83: { struct setitimer_args *p = params; uarg[0] = p->which; /* u_int */ uarg[1] = (intptr_t) p->itv; /* struct itimerval * */ uarg[2] = (intptr_t) p->oitv; /* struct itimerval * */ *n_args = 3; break; } /* swapon */ case 85: { struct swapon_args *p = params; uarg[0] = (intptr_t) p->name; /* const char * */ *n_args = 1; break; } /* getitimer */ case 86: { struct getitimer_args *p = params; uarg[0] = p->which; /* u_int */ uarg[1] = (intptr_t) p->itv; /* struct itimerval * */ *n_args = 2; break; } /* getdtablesize */ case 89: { *n_args = 0; break; } /* dup2 */ case 90: { struct dup2_args *p = params; uarg[0] = p->from; /* u_int */ uarg[1] = p->to; /* u_int */ *n_args = 2; break; } /* fcntl */ case 92: { struct fcntl_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->cmd; /* int */ iarg[2] = p->arg; /* long */ *n_args = 3; break; } /* select */ case 93: { struct select_args *p = params; iarg[0] = p->nd; /* int */ uarg[1] = (intptr_t) p->in; /* fd_set * */ uarg[2] = (intptr_t) p->ou; /* fd_set * */ uarg[3] = (intptr_t) p->ex; /* fd_set * */ uarg[4] = (intptr_t) p->tv; /* struct timeval * */ *n_args = 5; break; } /* fsync */ case 95: { struct fsync_args *p = params; iarg[0] = p->fd; /* int */ *n_args = 1; break; } /* setpriority */ case 96: { struct setpriority_args *p = params; iarg[0] = p->which; /* int */ iarg[1] = p->who; /* int */ iarg[2] = p->prio; /* int */ *n_args = 
3; break; } /* socket */ case 97: { struct socket_args *p = params; iarg[0] = p->domain; /* int */ iarg[1] = p->type; /* int */ iarg[2] = p->protocol; /* int */ *n_args = 3; break; } /* connect */ case 98: { struct connect_args *p = params; iarg[0] = p->s; /* int */ uarg[1] = (intptr_t) p->name; /* const struct sockaddr * */ iarg[2] = p->namelen; /* int */ *n_args = 3; break; } /* getpriority */ case 100: { struct getpriority_args *p = params; iarg[0] = p->which; /* int */ iarg[1] = p->who; /* int */ *n_args = 2; break; } /* bind */ case 104: { struct bind_args *p = params; iarg[0] = p->s; /* int */ uarg[1] = (intptr_t) p->name; /* const struct sockaddr * */ iarg[2] = p->namelen; /* int */ *n_args = 3; break; } /* setsockopt */ case 105: { struct setsockopt_args *p = params; iarg[0] = p->s; /* int */ iarg[1] = p->level; /* int */ iarg[2] = p->name; /* int */ uarg[3] = (intptr_t) p->val; /* const void * */ iarg[4] = p->valsize; /* int */ *n_args = 5; break; } /* listen */ case 106: { struct listen_args *p = params; iarg[0] = p->s; /* int */ iarg[1] = p->backlog; /* int */ *n_args = 2; break; } /* gettimeofday */ case 116: { struct gettimeofday_args *p = params; uarg[0] = (intptr_t) p->tp; /* struct timeval * */ uarg[1] = (intptr_t) p->tzp; /* struct timezone * */ *n_args = 2; break; } /* getrusage */ case 117: { struct getrusage_args *p = params; iarg[0] = p->who; /* int */ uarg[1] = (intptr_t) p->rusage; /* struct rusage * */ *n_args = 2; break; } /* getsockopt */ case 118: { struct getsockopt_args *p = params; iarg[0] = p->s; /* int */ iarg[1] = p->level; /* int */ iarg[2] = p->name; /* int */ uarg[3] = (intptr_t) p->val; /* void * */ uarg[4] = (intptr_t) p->avalsize; /* int * */ *n_args = 5; break; } /* readv */ case 120: { struct readv_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->iovp; /* struct iovec * */ uarg[2] = p->iovcnt; /* u_int */ *n_args = 3; break; } /* writev */ case 121: { struct writev_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->iovp; /* struct iovec * */ uarg[2] = p->iovcnt; /* u_int */ *n_args = 3; break; } /* settimeofday */ case 122: { struct settimeofday_args *p = params; uarg[0] = (intptr_t) p->tv; /* struct timeval * */ uarg[1] = (intptr_t) p->tzp; /* struct timezone * */ *n_args = 2; break; } /* fchown */ case 123: { struct fchown_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->uid; /* int */ iarg[2] = p->gid; /* int */ *n_args = 3; break; } /* fchmod */ case 124: { struct fchmod_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->mode; /* mode_t */ *n_args = 2; break; } /* setreuid */ case 126: { struct setreuid_args *p = params; iarg[0] = p->ruid; /* int */ iarg[1] = p->euid; /* int */ *n_args = 2; break; } /* setregid */ case 127: { struct setregid_args *p = params; iarg[0] = p->rgid; /* int */ iarg[1] = p->egid; /* int */ *n_args = 2; break; } /* rename */ case 128: { struct rename_args *p = params; uarg[0] = (intptr_t) p->from; /* const char * */ uarg[1] = (intptr_t) p->to; /* const char * */ *n_args = 2; break; } /* flock */ case 131: { struct flock_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->how; /* int */ *n_args = 2; break; } /* mkfifo */ case 132: { struct mkfifo_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->mode; /* mode_t */ *n_args = 2; break; } /* sendto */ case 133: { struct sendto_args *p = params; iarg[0] = p->s; /* int */ uarg[1] = (intptr_t) p->buf; /* const void * */ uarg[2] = p->len; /* size_t */ iarg[3] = p->flags; /* int */ uarg[4] 
= (intptr_t) p->to; /* const struct sockaddr * */ iarg[5] = p->tolen; /* int */ *n_args = 6; break; } /* shutdown */ case 134: { struct shutdown_args *p = params; iarg[0] = p->s; /* int */ iarg[1] = p->how; /* int */ *n_args = 2; break; } /* socketpair */ case 135: { struct socketpair_args *p = params; iarg[0] = p->domain; /* int */ iarg[1] = p->type; /* int */ iarg[2] = p->protocol; /* int */ uarg[3] = (intptr_t) p->rsv; /* int * */ *n_args = 4; break; } /* mkdir */ case 136: { struct mkdir_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->mode; /* mode_t */ *n_args = 2; break; } /* rmdir */ case 137: { struct rmdir_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* utimes */ case 138: { struct utimes_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = (intptr_t) p->tptr; /* struct timeval * */ *n_args = 2; break; } /* adjtime */ case 140: { struct adjtime_args *p = params; uarg[0] = (intptr_t) p->delta; /* struct timeval * */ uarg[1] = (intptr_t) p->olddelta; /* struct timeval * */ *n_args = 2; break; } /* setsid */ case 147: { *n_args = 0; break; } /* quotactl */ case 148: { struct quotactl_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->cmd; /* int */ iarg[2] = p->uid; /* int */ uarg[3] = (intptr_t) p->arg; /* void * */ *n_args = 4; break; } /* nlm_syscall */ case 154: { struct nlm_syscall_args *p = params; iarg[0] = p->debug_level; /* int */ iarg[1] = p->grace_period; /* int */ iarg[2] = p->addr_count; /* int */ uarg[3] = (intptr_t) p->addrs; /* char ** */ *n_args = 4; break; } /* nfssvc */ case 155: { struct nfssvc_args *p = params; iarg[0] = p->flag; /* int */ uarg[1] = (intptr_t) p->argp; /* void * */ *n_args = 2; break; } /* lgetfh */ case 160: { struct lgetfh_args *p = params; uarg[0] = (intptr_t) p->fname; /* const char * */ uarg[1] = (intptr_t) p->fhp; /* struct fhandle * */ *n_args = 2; break; } /* getfh */ case 161: { struct getfh_args *p = params; uarg[0] = (intptr_t) p->fname; /* const char * */ uarg[1] = (intptr_t) p->fhp; /* struct fhandle * */ *n_args = 2; break; } /* sysarch */ case 165: { struct sysarch_args *p = params; iarg[0] = p->op; /* int */ uarg[1] = (intptr_t) p->parms; /* char * */ *n_args = 2; break; } /* rtprio */ case 166: { struct rtprio_args *p = params; iarg[0] = p->function; /* int */ iarg[1] = p->pid; /* pid_t */ uarg[2] = (intptr_t) p->rtp; /* struct rtprio * */ *n_args = 3; break; } /* semsys */ case 169: { struct semsys_args *p = params; iarg[0] = p->which; /* int */ iarg[1] = p->a2; /* int */ iarg[2] = p->a3; /* int */ iarg[3] = p->a4; /* int */ iarg[4] = p->a5; /* int */ *n_args = 5; break; } /* msgsys */ case 170: { struct msgsys_args *p = params; iarg[0] = p->which; /* int */ iarg[1] = p->a2; /* int */ iarg[2] = p->a3; /* int */ iarg[3] = p->a4; /* int */ iarg[4] = p->a5; /* int */ iarg[5] = p->a6; /* int */ *n_args = 6; break; } /* shmsys */ case 171: { struct shmsys_args *p = params; iarg[0] = p->which; /* int */ iarg[1] = p->a2; /* int */ iarg[2] = p->a3; /* int */ iarg[3] = p->a4; /* int */ *n_args = 4; break; } /* setfib */ case 175: { struct setfib_args *p = params; iarg[0] = p->fibnum; /* int */ *n_args = 1; break; } /* ntp_adjtime */ case 176: { struct ntp_adjtime_args *p = params; uarg[0] = (intptr_t) p->tp; /* struct timex * */ *n_args = 1; break; } /* setgid */ case 181: { struct setgid_args *p = params; iarg[0] = p->gid; /* gid_t */ *n_args = 1; break; } /* setegid */ case 182: { struct setegid_args *p = 
params; iarg[0] = p->egid; /* gid_t */ *n_args = 1; break; } /* seteuid */ case 183: { struct seteuid_args *p = params; uarg[0] = p->euid; /* uid_t */ *n_args = 1; break; } /* pathconf */ case 191: { struct pathconf_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->name; /* int */ *n_args = 2; break; } /* fpathconf */ case 192: { struct fpathconf_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->name; /* int */ *n_args = 2; break; } /* getrlimit */ case 194: { struct __getrlimit_args *p = params; uarg[0] = p->which; /* u_int */ uarg[1] = (intptr_t) p->rlp; /* struct rlimit * */ *n_args = 2; break; } /* setrlimit */ case 195: { struct __setrlimit_args *p = params; uarg[0] = p->which; /* u_int */ uarg[1] = (intptr_t) p->rlp; /* struct rlimit * */ *n_args = 2; break; } /* nosys */ case 198: { *n_args = 0; break; } /* __sysctl */ case 202: { struct sysctl_args *p = params; uarg[0] = (intptr_t) p->name; /* int * */ uarg[1] = p->namelen; /* u_int */ uarg[2] = (intptr_t) p->old; /* void * */ uarg[3] = (intptr_t) p->oldlenp; /* size_t * */ uarg[4] = (intptr_t) p->new; /* const void * */ uarg[5] = p->newlen; /* size_t */ *n_args = 6; break; } /* mlock */ case 203: { struct mlock_args *p = params; uarg[0] = (intptr_t) p->addr; /* const void * */ uarg[1] = p->len; /* size_t */ *n_args = 2; break; } /* munlock */ case 204: { struct munlock_args *p = params; uarg[0] = (intptr_t) p->addr; /* const void * */ uarg[1] = p->len; /* size_t */ *n_args = 2; break; } /* undelete */ case 205: { struct undelete_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* futimes */ case 206: { struct futimes_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->tptr; /* struct timeval * */ *n_args = 2; break; } /* getpgid */ case 207: { struct getpgid_args *p = params; iarg[0] = p->pid; /* pid_t */ *n_args = 1; break; } /* poll */ case 209: { struct poll_args *p = params; uarg[0] = (intptr_t) p->fds; /* struct pollfd * */ uarg[1] = p->nfds; /* u_int */ iarg[2] = p->timeout; /* int */ *n_args = 3; break; } /* lkmnosys */ case 210: { *n_args = 0; break; } /* lkmnosys */ case 211: { *n_args = 0; break; } /* lkmnosys */ case 212: { *n_args = 0; break; } /* lkmnosys */ case 213: { *n_args = 0; break; } /* lkmnosys */ case 214: { *n_args = 0; break; } /* lkmnosys */ case 215: { *n_args = 0; break; } /* lkmnosys */ case 216: { *n_args = 0; break; } /* lkmnosys */ case 217: { *n_args = 0; break; } /* lkmnosys */ case 218: { *n_args = 0; break; } /* lkmnosys */ case 219: { *n_args = 0; break; } /* semget */ case 221: { struct semget_args *p = params; iarg[0] = p->key; /* key_t */ iarg[1] = p->nsems; /* int */ iarg[2] = p->semflg; /* int */ *n_args = 3; break; } /* semop */ case 222: { struct semop_args *p = params; iarg[0] = p->semid; /* int */ uarg[1] = (intptr_t) p->sops; /* struct sembuf * */ uarg[2] = p->nsops; /* size_t */ *n_args = 3; break; } /* msgget */ case 225: { struct msgget_args *p = params; iarg[0] = p->key; /* key_t */ iarg[1] = p->msgflg; /* int */ *n_args = 2; break; } /* msgsnd */ case 226: { struct msgsnd_args *p = params; iarg[0] = p->msqid; /* int */ uarg[1] = (intptr_t) p->msgp; /* const void * */ uarg[2] = p->msgsz; /* size_t */ iarg[3] = p->msgflg; /* int */ *n_args = 4; break; } /* msgrcv */ case 227: { struct msgrcv_args *p = params; iarg[0] = p->msqid; /* int */ uarg[1] = (intptr_t) p->msgp; /* void * */ uarg[2] = p->msgsz; /* size_t */ iarg[3] = p->msgtyp; /* long */ iarg[4] = p->msgflg; /* int */ *n_args = 5; 
break; } /* shmat */ case 228: { struct shmat_args *p = params; iarg[0] = p->shmid; /* int */ uarg[1] = (intptr_t) p->shmaddr; /* const void * */ iarg[2] = p->shmflg; /* int */ *n_args = 3; break; } /* shmdt */ case 230: { struct shmdt_args *p = params; uarg[0] = (intptr_t) p->shmaddr; /* const void * */ *n_args = 1; break; } /* shmget */ case 231: { struct shmget_args *p = params; iarg[0] = p->key; /* key_t */ uarg[1] = p->size; /* size_t */ iarg[2] = p->shmflg; /* int */ *n_args = 3; break; } /* clock_gettime */ case 232: { struct clock_gettime_args *p = params; iarg[0] = p->clock_id; /* clockid_t */ uarg[1] = (intptr_t) p->tp; /* struct timespec * */ *n_args = 2; break; } /* clock_settime */ case 233: { struct clock_settime_args *p = params; iarg[0] = p->clock_id; /* clockid_t */ uarg[1] = (intptr_t) p->tp; /* const struct timespec * */ *n_args = 2; break; } /* clock_getres */ case 234: { struct clock_getres_args *p = params; iarg[0] = p->clock_id; /* clockid_t */ uarg[1] = (intptr_t) p->tp; /* struct timespec * */ *n_args = 2; break; } /* ktimer_create */ case 235: { struct ktimer_create_args *p = params; iarg[0] = p->clock_id; /* clockid_t */ uarg[1] = (intptr_t) p->evp; /* struct sigevent * */ uarg[2] = (intptr_t) p->timerid; /* int * */ *n_args = 3; break; } /* ktimer_delete */ case 236: { struct ktimer_delete_args *p = params; iarg[0] = p->timerid; /* int */ *n_args = 1; break; } /* ktimer_settime */ case 237: { struct ktimer_settime_args *p = params; iarg[0] = p->timerid; /* int */ iarg[1] = p->flags; /* int */ uarg[2] = (intptr_t) p->value; /* const struct itimerspec * */ uarg[3] = (intptr_t) p->ovalue; /* struct itimerspec * */ *n_args = 4; break; } /* ktimer_gettime */ case 238: { struct ktimer_gettime_args *p = params; iarg[0] = p->timerid; /* int */ uarg[1] = (intptr_t) p->value; /* struct itimerspec * */ *n_args = 2; break; } /* ktimer_getoverrun */ case 239: { struct ktimer_getoverrun_args *p = params; iarg[0] = p->timerid; /* int */ *n_args = 1; break; } /* nanosleep */ case 240: { struct nanosleep_args *p = params; uarg[0] = (intptr_t) p->rqtp; /* const struct timespec * */ uarg[1] = (intptr_t) p->rmtp; /* struct timespec * */ *n_args = 2; break; } /* ffclock_getcounter */ case 241: { struct ffclock_getcounter_args *p = params; uarg[0] = (intptr_t) p->ffcount; /* ffcounter * */ *n_args = 1; break; } /* ffclock_setestimate */ case 242: { struct ffclock_setestimate_args *p = params; uarg[0] = (intptr_t) p->cest; /* struct ffclock_estimate * */ *n_args = 1; break; } /* ffclock_getestimate */ case 243: { struct ffclock_getestimate_args *p = params; uarg[0] = (intptr_t) p->cest; /* struct ffclock_estimate * */ *n_args = 1; break; } /* clock_nanosleep */ case 244: { struct clock_nanosleep_args *p = params; iarg[0] = p->clock_id; /* clockid_t */ iarg[1] = p->flags; /* int */ uarg[2] = (intptr_t) p->rqtp; /* const struct timespec * */ uarg[3] = (intptr_t) p->rmtp; /* struct timespec * */ *n_args = 4; break; } /* clock_getcpuclockid2 */ case 247: { struct clock_getcpuclockid2_args *p = params; iarg[0] = p->id; /* id_t */ iarg[1] = p->which; /* int */ uarg[2] = (intptr_t) p->clock_id; /* clockid_t * */ *n_args = 3; break; } /* ntp_gettime */ case 248: { struct ntp_gettime_args *p = params; uarg[0] = (intptr_t) p->ntvp; /* struct ntptimeval * */ *n_args = 1; break; } /* minherit */ case 250: { struct minherit_args *p = params; uarg[0] = (intptr_t) p->addr; /* void * */ uarg[1] = p->len; /* size_t */ iarg[2] = p->inherit; /* int */ *n_args = 3; break; } /* rfork */ case 251: { struct 
rfork_args *p = params; iarg[0] = p->flags; /* int */ *n_args = 1; break; } /* issetugid */ case 253: { *n_args = 0; break; } /* lchown */ case 254: { struct lchown_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->uid; /* int */ iarg[2] = p->gid; /* int */ *n_args = 3; break; } /* aio_read */ case 255: { struct aio_read_args *p = params; uarg[0] = (intptr_t) p->aiocbp; /* struct aiocb * */ *n_args = 1; break; } /* aio_write */ case 256: { struct aio_write_args *p = params; uarg[0] = (intptr_t) p->aiocbp; /* struct aiocb * */ *n_args = 1; break; } /* lio_listio */ case 257: { struct lio_listio_args *p = params; iarg[0] = p->mode; /* int */ uarg[1] = (intptr_t) p->acb_list; /* struct aiocb *const * */ iarg[2] = p->nent; /* int */ uarg[3] = (intptr_t) p->sig; /* struct sigevent * */ *n_args = 4; break; } /* lchmod */ case 274: { struct lchmod_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->mode; /* mode_t */ *n_args = 2; break; } /* lutimes */ case 276: { struct lutimes_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = (intptr_t) p->tptr; /* struct timeval * */ *n_args = 2; break; } /* preadv */ case 289: { struct preadv_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->iovp; /* struct iovec * */ uarg[2] = p->iovcnt; /* u_int */ iarg[3] = p->offset; /* off_t */ *n_args = 4; break; } /* pwritev */ case 290: { struct pwritev_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->iovp; /* struct iovec * */ uarg[2] = p->iovcnt; /* u_int */ iarg[3] = p->offset; /* off_t */ *n_args = 4; break; } /* fhopen */ case 298: { struct fhopen_args *p = params; uarg[0] = (intptr_t) p->u_fhp; /* const struct fhandle * */ iarg[1] = p->flags; /* int */ *n_args = 2; break; } /* modnext */ case 300: { struct modnext_args *p = params; iarg[0] = p->modid; /* int */ *n_args = 1; break; } /* modstat */ case 301: { struct modstat_args *p = params; iarg[0] = p->modid; /* int */ uarg[1] = (intptr_t) p->stat; /* struct module_stat * */ *n_args = 2; break; } /* modfnext */ case 302: { struct modfnext_args *p = params; iarg[0] = p->modid; /* int */ *n_args = 1; break; } /* modfind */ case 303: { struct modfind_args *p = params; uarg[0] = (intptr_t) p->name; /* const char * */ *n_args = 1; break; } /* kldload */ case 304: { struct kldload_args *p = params; uarg[0] = (intptr_t) p->file; /* const char * */ *n_args = 1; break; } /* kldunload */ case 305: { struct kldunload_args *p = params; iarg[0] = p->fileid; /* int */ *n_args = 1; break; } /* kldfind */ case 306: { struct kldfind_args *p = params; uarg[0] = (intptr_t) p->file; /* const char * */ *n_args = 1; break; } /* kldnext */ case 307: { struct kldnext_args *p = params; iarg[0] = p->fileid; /* int */ *n_args = 1; break; } /* kldstat */ case 308: { struct kldstat_args *p = params; iarg[0] = p->fileid; /* int */ uarg[1] = (intptr_t) p->stat; /* struct kld_file_stat * */ *n_args = 2; break; } /* kldfirstmod */ case 309: { struct kldfirstmod_args *p = params; iarg[0] = p->fileid; /* int */ *n_args = 1; break; } /* getsid */ case 310: { struct getsid_args *p = params; iarg[0] = p->pid; /* pid_t */ *n_args = 1; break; } /* setresuid */ case 311: { struct setresuid_args *p = params; uarg[0] = p->ruid; /* uid_t */ uarg[1] = p->euid; /* uid_t */ uarg[2] = p->suid; /* uid_t */ *n_args = 3; break; } /* setresgid */ case 312: { struct setresgid_args *p = params; iarg[0] = p->rgid; /* gid_t */ iarg[1] = p->egid; /* gid_t */ iarg[2] = p->sgid; /* gid_t */ 
*n_args = 3; break; } /* aio_return */ case 314: { struct aio_return_args *p = params; uarg[0] = (intptr_t) p->aiocbp; /* struct aiocb * */ *n_args = 1; break; } /* aio_suspend */ case 315: { struct aio_suspend_args *p = params; uarg[0] = (intptr_t) p->aiocbp; /* struct aiocb *const * */ iarg[1] = p->nent; /* int */ uarg[2] = (intptr_t) p->timeout; /* const struct timespec * */ *n_args = 3; break; } /* aio_cancel */ case 316: { struct aio_cancel_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->aiocbp; /* struct aiocb * */ *n_args = 2; break; } /* aio_error */ case 317: { struct aio_error_args *p = params; uarg[0] = (intptr_t) p->aiocbp; /* struct aiocb * */ *n_args = 1; break; } /* yield */ case 321: { *n_args = 0; break; } /* mlockall */ case 324: { struct mlockall_args *p = params; iarg[0] = p->how; /* int */ *n_args = 1; break; } /* munlockall */ case 325: { *n_args = 0; break; } /* __getcwd */ case 326: { struct __getcwd_args *p = params; uarg[0] = (intptr_t) p->buf; /* char * */ uarg[1] = p->buflen; /* size_t */ *n_args = 2; break; } /* sched_setparam */ case 327: { struct sched_setparam_args *p = params; iarg[0] = p->pid; /* pid_t */ uarg[1] = (intptr_t) p->param; /* const struct sched_param * */ *n_args = 2; break; } /* sched_getparam */ case 328: { struct sched_getparam_args *p = params; iarg[0] = p->pid; /* pid_t */ uarg[1] = (intptr_t) p->param; /* struct sched_param * */ *n_args = 2; break; } /* sched_setscheduler */ case 329: { struct sched_setscheduler_args *p = params; iarg[0] = p->pid; /* pid_t */ iarg[1] = p->policy; /* int */ uarg[2] = (intptr_t) p->param; /* const struct sched_param * */ *n_args = 3; break; } /* sched_getscheduler */ case 330: { struct sched_getscheduler_args *p = params; iarg[0] = p->pid; /* pid_t */ *n_args = 1; break; } /* sched_yield */ case 331: { *n_args = 0; break; } /* sched_get_priority_max */ case 332: { struct sched_get_priority_max_args *p = params; iarg[0] = p->policy; /* int */ *n_args = 1; break; } /* sched_get_priority_min */ case 333: { struct sched_get_priority_min_args *p = params; iarg[0] = p->policy; /* int */ *n_args = 1; break; } /* sched_rr_get_interval */ case 334: { struct sched_rr_get_interval_args *p = params; iarg[0] = p->pid; /* pid_t */ uarg[1] = (intptr_t) p->interval; /* struct timespec * */ *n_args = 2; break; } /* utrace */ case 335: { struct utrace_args *p = params; uarg[0] = (intptr_t) p->addr; /* const void * */ uarg[1] = p->len; /* size_t */ *n_args = 2; break; } /* kldsym */ case 337: { struct kldsym_args *p = params; iarg[0] = p->fileid; /* int */ iarg[1] = p->cmd; /* int */ uarg[2] = (intptr_t) p->data; /* void * */ *n_args = 3; break; } /* jail */ case 338: { struct jail_args *p = params; uarg[0] = (intptr_t) p->jail; /* struct jail * */ *n_args = 1; break; } /* nnpfs_syscall */ case 339: { struct nnpfs_syscall_args *p = params; iarg[0] = p->operation; /* int */ uarg[1] = (intptr_t) p->a_pathP; /* char * */ iarg[2] = p->a_opcode; /* int */ uarg[3] = (intptr_t) p->a_paramsP; /* void * */ iarg[4] = p->a_followSymlinks; /* int */ *n_args = 5; break; } /* sigprocmask */ case 340: { struct sigprocmask_args *p = params; iarg[0] = p->how; /* int */ uarg[1] = (intptr_t) p->set; /* const sigset_t * */ uarg[2] = (intptr_t) p->oset; /* sigset_t * */ *n_args = 3; break; } /* sigsuspend */ case 341: { struct sigsuspend_args *p = params; uarg[0] = (intptr_t) p->sigmask; /* const sigset_t * */ *n_args = 1; break; } /* sigpending */ case 343: { struct sigpending_args *p = params; uarg[0] = (intptr_t) p->set; 
/* sigset_t * */ *n_args = 1; break; } /* sigtimedwait */ case 345: { struct sigtimedwait_args *p = params; uarg[0] = (intptr_t) p->set; /* const sigset_t * */ uarg[1] = (intptr_t) p->info; /* siginfo_t * */ uarg[2] = (intptr_t) p->timeout; /* const struct timespec * */ *n_args = 3; break; } /* sigwaitinfo */ case 346: { struct sigwaitinfo_args *p = params; uarg[0] = (intptr_t) p->set; /* const sigset_t * */ uarg[1] = (intptr_t) p->info; /* siginfo_t * */ *n_args = 2; break; } /* __acl_get_file */ case 347: { struct __acl_get_file_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* __acl_set_file */ case 348: { struct __acl_set_file_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* __acl_get_fd */ case 349: { struct __acl_get_fd_args *p = params; iarg[0] = p->filedes; /* int */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* __acl_set_fd */ case 350: { struct __acl_set_fd_args *p = params; iarg[0] = p->filedes; /* int */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* __acl_delete_file */ case 351: { struct __acl_delete_file_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->type; /* acl_type_t */ *n_args = 2; break; } /* __acl_delete_fd */ case 352: { struct __acl_delete_fd_args *p = params; iarg[0] = p->filedes; /* int */ iarg[1] = p->type; /* acl_type_t */ *n_args = 2; break; } /* __acl_aclcheck_file */ case 353: { struct __acl_aclcheck_file_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* __acl_aclcheck_fd */ case 354: { struct __acl_aclcheck_fd_args *p = params; iarg[0] = p->filedes; /* int */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* extattrctl */ case 355: { struct extattrctl_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->cmd; /* int */ uarg[2] = (intptr_t) p->filename; /* const char * */ iarg[3] = p->attrnamespace; /* int */ uarg[4] = (intptr_t) p->attrname; /* const char * */ *n_args = 5; break; } /* extattr_set_file */ case 356: { struct extattr_set_file_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ uarg[3] = (intptr_t) p->data; /* void * */ uarg[4] = p->nbytes; /* size_t */ *n_args = 5; break; } /* extattr_get_file */ case 357: { struct extattr_get_file_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ uarg[3] = (intptr_t) p->data; /* void * */ uarg[4] = p->nbytes; /* size_t */ *n_args = 5; break; } /* extattr_delete_file */ case 358: { struct extattr_delete_file_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ *n_args = 3; break; } /* aio_waitcomplete */ case 359: { struct aio_waitcomplete_args *p = params; uarg[0] = (intptr_t) p->aiocbp; /* struct aiocb ** */ uarg[1] = (intptr_t) p->timeout; /* struct timespec * */ *n_args = 2; 
break; } /* getresuid */ case 360: { struct getresuid_args *p = params; uarg[0] = (intptr_t) p->ruid; /* uid_t * */ uarg[1] = (intptr_t) p->euid; /* uid_t * */ uarg[2] = (intptr_t) p->suid; /* uid_t * */ *n_args = 3; break; } /* getresgid */ case 361: { struct getresgid_args *p = params; uarg[0] = (intptr_t) p->rgid; /* gid_t * */ uarg[1] = (intptr_t) p->egid; /* gid_t * */ uarg[2] = (intptr_t) p->sgid; /* gid_t * */ *n_args = 3; break; } /* kqueue */ case 362: { *n_args = 0; break; } /* extattr_set_fd */ case 371: { struct extattr_set_fd_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ uarg[3] = (intptr_t) p->data; /* void * */ uarg[4] = p->nbytes; /* size_t */ *n_args = 5; break; } /* extattr_get_fd */ case 372: { struct extattr_get_fd_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ uarg[3] = (intptr_t) p->data; /* void * */ uarg[4] = p->nbytes; /* size_t */ *n_args = 5; break; } /* extattr_delete_fd */ case 373: { struct extattr_delete_fd_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ *n_args = 3; break; } /* __setugid */ case 374: { struct __setugid_args *p = params; iarg[0] = p->flag; /* int */ *n_args = 1; break; } /* eaccess */ case 376: { struct eaccess_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->amode; /* int */ *n_args = 2; break; } /* afs3_syscall */ case 377: { struct afs3_syscall_args *p = params; iarg[0] = p->syscall; /* long */ iarg[1] = p->parm1; /* long */ iarg[2] = p->parm2; /* long */ iarg[3] = p->parm3; /* long */ iarg[4] = p->parm4; /* long */ iarg[5] = p->parm5; /* long */ iarg[6] = p->parm6; /* long */ *n_args = 7; break; } /* nmount */ case 378: { struct nmount_args *p = params; uarg[0] = (intptr_t) p->iovp; /* struct iovec * */ uarg[1] = p->iovcnt; /* unsigned int */ iarg[2] = p->flags; /* int */ *n_args = 3; break; } /* __mac_get_proc */ case 384: { struct __mac_get_proc_args *p = params; uarg[0] = (intptr_t) p->mac_p; /* struct mac * */ *n_args = 1; break; } /* __mac_set_proc */ case 385: { struct __mac_set_proc_args *p = params; uarg[0] = (intptr_t) p->mac_p; /* struct mac * */ *n_args = 1; break; } /* __mac_get_fd */ case 386: { struct __mac_get_fd_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->mac_p; /* struct mac * */ *n_args = 2; break; } /* __mac_get_file */ case 387: { struct __mac_get_file_args *p = params; uarg[0] = (intptr_t) p->path_p; /* const char * */ uarg[1] = (intptr_t) p->mac_p; /* struct mac * */ *n_args = 2; break; } /* __mac_set_fd */ case 388: { struct __mac_set_fd_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->mac_p; /* struct mac * */ *n_args = 2; break; } /* __mac_set_file */ case 389: { struct __mac_set_file_args *p = params; uarg[0] = (intptr_t) p->path_p; /* const char * */ uarg[1] = (intptr_t) p->mac_p; /* struct mac * */ *n_args = 2; break; } /* kenv */ case 390: { struct kenv_args *p = params; iarg[0] = p->what; /* int */ uarg[1] = (intptr_t) p->name; /* const char * */ uarg[2] = (intptr_t) p->value; /* char * */ iarg[3] = p->len; /* int */ *n_args = 4; break; } /* lchflags */ case 391: { struct lchflags_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = p->flags; /* u_long */ *n_args = 2; break; } /* uuidgen */ case 392: { struct uuidgen_args *p = params; uarg[0] = 
(intptr_t) p->store; /* struct uuid * */ iarg[1] = p->count; /* int */ *n_args = 2; break; } /* sendfile */ case 393: { struct sendfile_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->s; /* int */ iarg[2] = p->offset; /* off_t */ uarg[3] = p->nbytes; /* size_t */ uarg[4] = (intptr_t) p->hdtr; /* struct sf_hdtr * */ uarg[5] = (intptr_t) p->sbytes; /* off_t * */ iarg[6] = p->flags; /* int */ *n_args = 7; break; } /* mac_syscall */ case 394: { struct mac_syscall_args *p = params; uarg[0] = (intptr_t) p->policy; /* const char * */ iarg[1] = p->call; /* int */ uarg[2] = (intptr_t) p->arg; /* void * */ *n_args = 3; break; } /* ksem_close */ case 400: { struct ksem_close_args *p = params; iarg[0] = p->id; /* semid_t */ *n_args = 1; break; } /* ksem_post */ case 401: { struct ksem_post_args *p = params; iarg[0] = p->id; /* semid_t */ *n_args = 1; break; } /* ksem_wait */ case 402: { struct ksem_wait_args *p = params; iarg[0] = p->id; /* semid_t */ *n_args = 1; break; } /* ksem_trywait */ case 403: { struct ksem_trywait_args *p = params; iarg[0] = p->id; /* semid_t */ *n_args = 1; break; } /* ksem_init */ case 404: { struct ksem_init_args *p = params; uarg[0] = (intptr_t) p->idp; /* semid_t * */ uarg[1] = p->value; /* unsigned int */ *n_args = 2; break; } /* ksem_open */ case 405: { struct ksem_open_args *p = params; uarg[0] = (intptr_t) p->idp; /* semid_t * */ uarg[1] = (intptr_t) p->name; /* const char * */ iarg[2] = p->oflag; /* int */ iarg[3] = p->mode; /* mode_t */ uarg[4] = p->value; /* unsigned int */ *n_args = 5; break; } /* ksem_unlink */ case 406: { struct ksem_unlink_args *p = params; uarg[0] = (intptr_t) p->name; /* const char * */ *n_args = 1; break; } /* ksem_getvalue */ case 407: { struct ksem_getvalue_args *p = params; iarg[0] = p->id; /* semid_t */ uarg[1] = (intptr_t) p->val; /* int * */ *n_args = 2; break; } /* ksem_destroy */ case 408: { struct ksem_destroy_args *p = params; iarg[0] = p->id; /* semid_t */ *n_args = 1; break; } /* __mac_get_pid */ case 409: { struct __mac_get_pid_args *p = params; iarg[0] = p->pid; /* pid_t */ uarg[1] = (intptr_t) p->mac_p; /* struct mac * */ *n_args = 2; break; } /* __mac_get_link */ case 410: { struct __mac_get_link_args *p = params; uarg[0] = (intptr_t) p->path_p; /* const char * */ uarg[1] = (intptr_t) p->mac_p; /* struct mac * */ *n_args = 2; break; } /* __mac_set_link */ case 411: { struct __mac_set_link_args *p = params; uarg[0] = (intptr_t) p->path_p; /* const char * */ uarg[1] = (intptr_t) p->mac_p; /* struct mac * */ *n_args = 2; break; } /* extattr_set_link */ case 412: { struct extattr_set_link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ uarg[3] = (intptr_t) p->data; /* void * */ uarg[4] = p->nbytes; /* size_t */ *n_args = 5; break; } /* extattr_get_link */ case 413: { struct extattr_get_link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ uarg[3] = (intptr_t) p->data; /* void * */ uarg[4] = p->nbytes; /* size_t */ *n_args = 5; break; } /* extattr_delete_link */ case 414: { struct extattr_delete_link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->attrname; /* const char * */ *n_args = 3; break; } /* __mac_execve */ case 415: { struct __mac_execve_args *p = params; uarg[0] = (intptr_t) p->fname; /* const char * */ uarg[1] = 
(intptr_t) p->argv; /* char ** */ uarg[2] = (intptr_t) p->envv; /* char ** */ uarg[3] = (intptr_t) p->mac_p; /* struct mac * */ *n_args = 4; break; } /* sigaction */ case 416: { struct sigaction_args *p = params; iarg[0] = p->sig; /* int */ uarg[1] = (intptr_t) p->act; /* const struct sigaction * */ uarg[2] = (intptr_t) p->oact; /* struct sigaction * */ *n_args = 3; break; } /* sigreturn */ case 417: { struct sigreturn_args *p = params; uarg[0] = (intptr_t) p->sigcntxp; /* const struct __ucontext * */ *n_args = 1; break; } /* getcontext */ case 421: { struct getcontext_args *p = params; uarg[0] = (intptr_t) p->ucp; /* struct __ucontext * */ *n_args = 1; break; } /* setcontext */ case 422: { struct setcontext_args *p = params; uarg[0] = (intptr_t) p->ucp; /* const struct __ucontext * */ *n_args = 1; break; } /* swapcontext */ case 423: { struct swapcontext_args *p = params; uarg[0] = (intptr_t) p->oucp; /* struct __ucontext * */ uarg[1] = (intptr_t) p->ucp; /* const struct __ucontext * */ *n_args = 2; break; } /* swapoff */ case 424: { struct swapoff_args *p = params; uarg[0] = (intptr_t) p->name; /* const char * */ *n_args = 1; break; } /* __acl_get_link */ case 425: { struct __acl_get_link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* __acl_set_link */ case 426: { struct __acl_set_link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* __acl_delete_link */ case 427: { struct __acl_delete_link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->type; /* acl_type_t */ *n_args = 2; break; } /* __acl_aclcheck_link */ case 428: { struct __acl_aclcheck_link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->type; /* acl_type_t */ uarg[2] = (intptr_t) p->aclp; /* struct acl * */ *n_args = 3; break; } /* sigwait */ case 429: { struct sigwait_args *p = params; uarg[0] = (intptr_t) p->set; /* const sigset_t * */ uarg[1] = (intptr_t) p->sig; /* int * */ *n_args = 2; break; } /* thr_create */ case 430: { struct thr_create_args *p = params; uarg[0] = (intptr_t) p->ctx; /* ucontext_t * */ uarg[1] = (intptr_t) p->id; /* long * */ iarg[2] = p->flags; /* int */ *n_args = 3; break; } /* thr_exit */ case 431: { struct thr_exit_args *p = params; uarg[0] = (intptr_t) p->state; /* long * */ *n_args = 1; break; } /* thr_self */ case 432: { struct thr_self_args *p = params; uarg[0] = (intptr_t) p->id; /* long * */ *n_args = 1; break; } /* thr_kill */ case 433: { struct thr_kill_args *p = params; iarg[0] = p->id; /* long */ iarg[1] = p->sig; /* int */ *n_args = 2; break; } /* jail_attach */ case 436: { struct jail_attach_args *p = params; iarg[0] = p->jid; /* int */ *n_args = 1; break; } /* extattr_list_fd */ case 437: { struct extattr_list_fd_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->data; /* void * */ uarg[3] = p->nbytes; /* size_t */ *n_args = 4; break; } /* extattr_list_file */ case 438: { struct extattr_list_file_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->data; /* void * */ uarg[3] = p->nbytes; /* size_t */ *n_args = 4; break; } /* extattr_list_link */ case 439: { struct extattr_list_link_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * 
*/ iarg[1] = p->attrnamespace; /* int */ uarg[2] = (intptr_t) p->data; /* void * */ uarg[3] = p->nbytes; /* size_t */ *n_args = 4; break; } /* ksem_timedwait */ case 441: { struct ksem_timedwait_args *p = params; iarg[0] = p->id; /* semid_t */ uarg[1] = (intptr_t) p->abstime; /* const struct timespec * */ *n_args = 2; break; } /* thr_suspend */ case 442: { struct thr_suspend_args *p = params; uarg[0] = (intptr_t) p->timeout; /* const struct timespec * */ *n_args = 1; break; } /* thr_wake */ case 443: { struct thr_wake_args *p = params; iarg[0] = p->id; /* long */ *n_args = 1; break; } /* kldunloadf */ case 444: { struct kldunloadf_args *p = params; iarg[0] = p->fileid; /* int */ iarg[1] = p->flags; /* int */ *n_args = 2; break; } /* audit */ case 445: { struct audit_args *p = params; uarg[0] = (intptr_t) p->record; /* const void * */ uarg[1] = p->length; /* u_int */ *n_args = 2; break; } /* auditon */ case 446: { struct auditon_args *p = params; iarg[0] = p->cmd; /* int */ uarg[1] = (intptr_t) p->data; /* void * */ uarg[2] = p->length; /* u_int */ *n_args = 3; break; } /* getauid */ case 447: { struct getauid_args *p = params; uarg[0] = (intptr_t) p->auid; /* uid_t * */ *n_args = 1; break; } /* setauid */ case 448: { struct setauid_args *p = params; uarg[0] = (intptr_t) p->auid; /* uid_t * */ *n_args = 1; break; } /* getaudit */ case 449: { struct getaudit_args *p = params; uarg[0] = (intptr_t) p->auditinfo; /* struct auditinfo * */ *n_args = 1; break; } /* setaudit */ case 450: { struct setaudit_args *p = params; uarg[0] = (intptr_t) p->auditinfo; /* struct auditinfo * */ *n_args = 1; break; } /* getaudit_addr */ case 451: { struct getaudit_addr_args *p = params; uarg[0] = (intptr_t) p->auditinfo_addr; /* struct auditinfo_addr * */ uarg[1] = p->length; /* u_int */ *n_args = 2; break; } /* setaudit_addr */ case 452: { struct setaudit_addr_args *p = params; uarg[0] = (intptr_t) p->auditinfo_addr; /* struct auditinfo_addr * */ uarg[1] = p->length; /* u_int */ *n_args = 2; break; } /* auditctl */ case 453: { struct auditctl_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* _umtx_op */ case 454: { struct _umtx_op_args *p = params; uarg[0] = (intptr_t) p->obj; /* void * */ iarg[1] = p->op; /* int */ uarg[2] = p->val; /* u_long */ uarg[3] = (intptr_t) p->uaddr1; /* void * */ uarg[4] = (intptr_t) p->uaddr2; /* void * */ *n_args = 5; break; } /* thr_new */ case 455: { struct thr_new_args *p = params; uarg[0] = (intptr_t) p->param; /* struct thr_param * */ iarg[1] = p->param_size; /* int */ *n_args = 2; break; } /* sigqueue */ case 456: { struct sigqueue_args *p = params; iarg[0] = p->pid; /* pid_t */ iarg[1] = p->signum; /* int */ uarg[2] = (intptr_t) p->value; /* void * */ *n_args = 3; break; } /* kmq_open */ case 457: { struct kmq_open_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->flags; /* int */ iarg[2] = p->mode; /* mode_t */ uarg[3] = (intptr_t) p->attr; /* const struct mq_attr * */ *n_args = 4; break; } /* kmq_setattr */ case 458: { struct kmq_setattr_args *p = params; iarg[0] = p->mqd; /* int */ uarg[1] = (intptr_t) p->attr; /* const struct mq_attr * */ uarg[2] = (intptr_t) p->oattr; /* struct mq_attr * */ *n_args = 3; break; } /* kmq_timedreceive */ case 459: { struct kmq_timedreceive_args *p = params; iarg[0] = p->mqd; /* int */ uarg[1] = (intptr_t) p->msg_ptr; /* char * */ uarg[2] = p->msg_len; /* size_t */ uarg[3] = (intptr_t) p->msg_prio; /* unsigned * */ uarg[4] = (intptr_t) p->abs_timeout; /* const struct 
timespec * */ *n_args = 5; break; } /* kmq_timedsend */ case 460: { struct kmq_timedsend_args *p = params; iarg[0] = p->mqd; /* int */ uarg[1] = (intptr_t) p->msg_ptr; /* const char * */ uarg[2] = p->msg_len; /* size_t */ uarg[3] = p->msg_prio; /* unsigned */ uarg[4] = (intptr_t) p->abs_timeout; /* const struct timespec * */ *n_args = 5; break; } /* kmq_notify */ case 461: { struct kmq_notify_args *p = params; iarg[0] = p->mqd; /* int */ uarg[1] = (intptr_t) p->sigev; /* const struct sigevent * */ *n_args = 2; break; } /* kmq_unlink */ case 462: { struct kmq_unlink_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* abort2 */ case 463: { struct abort2_args *p = params; uarg[0] = (intptr_t) p->why; /* const char * */ iarg[1] = p->nargs; /* int */ uarg[2] = (intptr_t) p->args; /* void ** */ *n_args = 3; break; } /* thr_set_name */ case 464: { struct thr_set_name_args *p = params; iarg[0] = p->id; /* long */ uarg[1] = (intptr_t) p->name; /* const char * */ *n_args = 2; break; } /* aio_fsync */ case 465: { struct aio_fsync_args *p = params; iarg[0] = p->op; /* int */ uarg[1] = (intptr_t) p->aiocbp; /* struct aiocb * */ *n_args = 2; break; } /* rtprio_thread */ case 466: { struct rtprio_thread_args *p = params; iarg[0] = p->function; /* int */ iarg[1] = p->lwpid; /* lwpid_t */ uarg[2] = (intptr_t) p->rtp; /* struct rtprio * */ *n_args = 3; break; } /* sctp_peeloff */ case 471: { struct sctp_peeloff_args *p = params; iarg[0] = p->sd; /* int */ uarg[1] = p->name; /* uint32_t */ *n_args = 2; break; } /* sctp_generic_sendmsg */ case 472: { struct sctp_generic_sendmsg_args *p = params; iarg[0] = p->sd; /* int */ uarg[1] = (intptr_t) p->msg; /* void * */ iarg[2] = p->mlen; /* int */ uarg[3] = (intptr_t) p->to; /* struct sockaddr * */ iarg[4] = p->tolen; /* __socklen_t */ uarg[5] = (intptr_t) p->sinfo; /* struct sctp_sndrcvinfo * */ iarg[6] = p->flags; /* int */ *n_args = 7; break; } /* sctp_generic_sendmsg_iov */ case 473: { struct sctp_generic_sendmsg_iov_args *p = params; iarg[0] = p->sd; /* int */ uarg[1] = (intptr_t) p->iov; /* struct iovec * */ iarg[2] = p->iovlen; /* int */ uarg[3] = (intptr_t) p->to; /* struct sockaddr * */ iarg[4] = p->tolen; /* __socklen_t */ uarg[5] = (intptr_t) p->sinfo; /* struct sctp_sndrcvinfo * */ iarg[6] = p->flags; /* int */ *n_args = 7; break; } /* sctp_generic_recvmsg */ case 474: { struct sctp_generic_recvmsg_args *p = params; iarg[0] = p->sd; /* int */ uarg[1] = (intptr_t) p->iov; /* struct iovec * */ iarg[2] = p->iovlen; /* int */ uarg[3] = (intptr_t) p->from; /* struct sockaddr * */ uarg[4] = (intptr_t) p->fromlenaddr; /* __socklen_t * */ uarg[5] = (intptr_t) p->sinfo; /* struct sctp_sndrcvinfo * */ uarg[6] = (intptr_t) p->msg_flags; /* int * */ *n_args = 7; break; } /* pread */ case 475: { struct pread_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->buf; /* void * */ uarg[2] = p->nbyte; /* size_t */ iarg[3] = p->offset; /* off_t */ *n_args = 4; break; } /* pwrite */ case 476: { struct pwrite_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->buf; /* const void * */ uarg[2] = p->nbyte; /* size_t */ iarg[3] = p->offset; /* off_t */ *n_args = 4; break; } /* mmap */ case 477: { struct mmap_args *p = params; uarg[0] = (intptr_t) p->addr; /* void * */ uarg[1] = p->len; /* size_t */ iarg[2] = p->prot; /* int */ iarg[3] = p->flags; /* int */ iarg[4] = p->fd; /* int */ iarg[5] = p->pos; /* off_t */ *n_args = 6; break; } /* lseek */ case 478: { struct lseek_args *p = params; iarg[0] = p->fd; 
/* int */ iarg[1] = p->offset; /* off_t */ iarg[2] = p->whence; /* int */ *n_args = 3; break; } /* truncate */ case 479: { struct truncate_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->length; /* off_t */ *n_args = 2; break; } /* ftruncate */ case 480: { struct ftruncate_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->length; /* off_t */ *n_args = 2; break; } /* thr_kill2 */ case 481: { struct thr_kill2_args *p = params; iarg[0] = p->pid; /* pid_t */ iarg[1] = p->id; /* long */ iarg[2] = p->sig; /* int */ *n_args = 3; break; } /* shm_open */ case 482: { struct shm_open_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->flags; /* int */ iarg[2] = p->mode; /* mode_t */ *n_args = 3; break; } /* shm_unlink */ case 483: { struct shm_unlink_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* cpuset */ case 484: { struct cpuset_args *p = params; uarg[0] = (intptr_t) p->setid; /* cpusetid_t * */ *n_args = 1; break; } /* cpuset_setid */ case 485: { struct cpuset_setid_args *p = params; iarg[0] = p->which; /* cpuwhich_t */ iarg[1] = p->id; /* id_t */ iarg[2] = p->setid; /* cpusetid_t */ *n_args = 3; break; } /* cpuset_getid */ case 486: { struct cpuset_getid_args *p = params; iarg[0] = p->level; /* cpulevel_t */ iarg[1] = p->which; /* cpuwhich_t */ iarg[2] = p->id; /* id_t */ uarg[3] = (intptr_t) p->setid; /* cpusetid_t * */ *n_args = 4; break; } /* cpuset_getaffinity */ case 487: { struct cpuset_getaffinity_args *p = params; iarg[0] = p->level; /* cpulevel_t */ iarg[1] = p->which; /* cpuwhich_t */ iarg[2] = p->id; /* id_t */ uarg[3] = p->cpusetsize; /* size_t */ uarg[4] = (intptr_t) p->mask; /* cpuset_t * */ *n_args = 5; break; } /* cpuset_setaffinity */ case 488: { struct cpuset_setaffinity_args *p = params; iarg[0] = p->level; /* cpulevel_t */ iarg[1] = p->which; /* cpuwhich_t */ iarg[2] = p->id; /* id_t */ uarg[3] = p->cpusetsize; /* size_t */ uarg[4] = (intptr_t) p->mask; /* const cpuset_t * */ *n_args = 5; break; } /* faccessat */ case 489: { struct faccessat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->amode; /* int */ iarg[3] = p->flag; /* int */ *n_args = 4; break; } /* fchmodat */ case 490: { struct fchmodat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->mode; /* mode_t */ iarg[3] = p->flag; /* int */ *n_args = 4; break; } /* fchownat */ case 491: { struct fchownat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ uarg[2] = p->uid; /* uid_t */ iarg[3] = p->gid; /* gid_t */ iarg[4] = p->flag; /* int */ *n_args = 5; break; } /* fexecve */ case 492: { struct fexecve_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->argv; /* char ** */ uarg[2] = (intptr_t) p->envv; /* char ** */ *n_args = 3; break; } /* futimesat */ case 494: { struct futimesat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ uarg[2] = (intptr_t) p->times; /* struct timeval * */ *n_args = 3; break; } /* linkat */ case 495: { struct linkat_args *p = params; iarg[0] = p->fd1; /* int */ uarg[1] = (intptr_t) p->path1; /* const char * */ iarg[2] = p->fd2; /* int */ uarg[3] = (intptr_t) p->path2; /* const char * */ iarg[4] = p->flag; /* int */ *n_args = 5; break; } /* mkdirat */ case 496: { struct mkdirat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * 
*/ iarg[2] = p->mode; /* mode_t */ *n_args = 3; break; } /* mkfifoat */ case 497: { struct mkfifoat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->mode; /* mode_t */ *n_args = 3; break; } /* openat */ case 499: { struct openat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->flag; /* int */ iarg[3] = p->mode; /* mode_t */ *n_args = 4; break; } /* readlinkat */ case 500: { struct readlinkat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ uarg[2] = (intptr_t) p->buf; /* char * */ uarg[3] = p->bufsize; /* size_t */ *n_args = 4; break; } /* renameat */ case 501: { struct renameat_args *p = params; iarg[0] = p->oldfd; /* int */ uarg[1] = (intptr_t) p->old; /* const char * */ iarg[2] = p->newfd; /* int */ uarg[3] = (intptr_t) p->new; /* const char * */ *n_args = 4; break; } /* symlinkat */ case 502: { struct symlinkat_args *p = params; uarg[0] = (intptr_t) p->path1; /* const char * */ iarg[1] = p->fd; /* int */ uarg[2] = (intptr_t) p->path2; /* const char * */ *n_args = 3; break; } /* unlinkat */ case 503: { struct unlinkat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->flag; /* int */ *n_args = 3; break; } /* posix_openpt */ case 504: { struct posix_openpt_args *p = params; iarg[0] = p->flags; /* int */ *n_args = 1; break; } /* gssd_syscall */ case 505: { struct gssd_syscall_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ *n_args = 1; break; } /* jail_get */ case 506: { struct jail_get_args *p = params; uarg[0] = (intptr_t) p->iovp; /* struct iovec * */ uarg[1] = p->iovcnt; /* unsigned int */ iarg[2] = p->flags; /* int */ *n_args = 3; break; } /* jail_set */ case 507: { struct jail_set_args *p = params; uarg[0] = (intptr_t) p->iovp; /* struct iovec * */ uarg[1] = p->iovcnt; /* unsigned int */ iarg[2] = p->flags; /* int */ *n_args = 3; break; } /* jail_remove */ case 508: { struct jail_remove_args *p = params; iarg[0] = p->jid; /* int */ *n_args = 1; break; } /* closefrom */ case 509: { struct closefrom_args *p = params; iarg[0] = p->lowfd; /* int */ *n_args = 1; break; } /* __semctl */ case 510: { struct __semctl_args *p = params; iarg[0] = p->semid; /* int */ iarg[1] = p->semnum; /* int */ iarg[2] = p->cmd; /* int */ uarg[3] = (intptr_t) p->arg; /* union semun * */ *n_args = 4; break; } /* msgctl */ case 511: { struct msgctl_args *p = params; iarg[0] = p->msqid; /* int */ iarg[1] = p->cmd; /* int */ uarg[2] = (intptr_t) p->buf; /* struct msqid_ds * */ *n_args = 3; break; } /* shmctl */ case 512: { struct shmctl_args *p = params; iarg[0] = p->shmid; /* int */ iarg[1] = p->cmd; /* int */ uarg[2] = (intptr_t) p->buf; /* struct shmid_ds * */ *n_args = 3; break; } /* lpathconf */ case 513: { struct lpathconf_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ iarg[1] = p->name; /* int */ *n_args = 2; break; } /* __cap_rights_get */ case 515: { struct __cap_rights_get_args *p = params; iarg[0] = p->version; /* int */ iarg[1] = p->fd; /* int */ uarg[2] = (intptr_t) p->rightsp; /* cap_rights_t * */ *n_args = 3; break; } /* cap_enter */ case 516: { *n_args = 0; break; } /* cap_getmode */ case 517: { struct cap_getmode_args *p = params; uarg[0] = (intptr_t) p->modep; /* u_int * */ *n_args = 1; break; } /* pdfork */ case 518: { struct pdfork_args *p = params; uarg[0] = (intptr_t) p->fdp; /* int * */ iarg[1] = p->flags; /* int */ *n_args = 2; break; } 
/* pdkill */ case 519: { struct pdkill_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->signum; /* int */ *n_args = 2; break; } /* pdgetpid */ case 520: { struct pdgetpid_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->pidp; /* pid_t * */ *n_args = 2; break; } /* pselect */ case 522: { struct pselect_args *p = params; iarg[0] = p->nd; /* int */ uarg[1] = (intptr_t) p->in; /* fd_set * */ uarg[2] = (intptr_t) p->ou; /* fd_set * */ uarg[3] = (intptr_t) p->ex; /* fd_set * */ uarg[4] = (intptr_t) p->ts; /* const struct timespec * */ uarg[5] = (intptr_t) p->sm; /* const sigset_t * */ *n_args = 6; break; } /* getloginclass */ case 523: { struct getloginclass_args *p = params; uarg[0] = (intptr_t) p->namebuf; /* char * */ uarg[1] = p->namelen; /* size_t */ *n_args = 2; break; } /* setloginclass */ case 524: { struct setloginclass_args *p = params; uarg[0] = (intptr_t) p->namebuf; /* const char * */ *n_args = 1; break; } /* rctl_get_racct */ case 525: { struct rctl_get_racct_args *p = params; uarg[0] = (intptr_t) p->inbufp; /* const void * */ uarg[1] = p->inbuflen; /* size_t */ uarg[2] = (intptr_t) p->outbufp; /* void * */ uarg[3] = p->outbuflen; /* size_t */ *n_args = 4; break; } /* rctl_get_rules */ case 526: { struct rctl_get_rules_args *p = params; uarg[0] = (intptr_t) p->inbufp; /* const void * */ uarg[1] = p->inbuflen; /* size_t */ uarg[2] = (intptr_t) p->outbufp; /* void * */ uarg[3] = p->outbuflen; /* size_t */ *n_args = 4; break; } /* rctl_get_limits */ case 527: { struct rctl_get_limits_args *p = params; uarg[0] = (intptr_t) p->inbufp; /* const void * */ uarg[1] = p->inbuflen; /* size_t */ uarg[2] = (intptr_t) p->outbufp; /* void * */ uarg[3] = p->outbuflen; /* size_t */ *n_args = 4; break; } /* rctl_add_rule */ case 528: { struct rctl_add_rule_args *p = params; uarg[0] = (intptr_t) p->inbufp; /* const void * */ uarg[1] = p->inbuflen; /* size_t */ uarg[2] = (intptr_t) p->outbufp; /* void * */ uarg[3] = p->outbuflen; /* size_t */ *n_args = 4; break; } /* rctl_remove_rule */ case 529: { struct rctl_remove_rule_args *p = params; uarg[0] = (intptr_t) p->inbufp; /* const void * */ uarg[1] = p->inbuflen; /* size_t */ uarg[2] = (intptr_t) p->outbufp; /* void * */ uarg[3] = p->outbuflen; /* size_t */ *n_args = 4; break; } /* posix_fallocate */ case 530: { struct posix_fallocate_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->offset; /* off_t */ iarg[2] = p->len; /* off_t */ *n_args = 3; break; } /* posix_fadvise */ case 531: { struct posix_fadvise_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->offset; /* off_t */ iarg[2] = p->len; /* off_t */ iarg[3] = p->advice; /* int */ *n_args = 4; break; } /* wait6 */ case 532: { struct wait6_args *p = params; iarg[0] = p->idtype; /* idtype_t */ iarg[1] = p->id; /* id_t */ uarg[2] = (intptr_t) p->status; /* int * */ iarg[3] = p->options; /* int */ uarg[4] = (intptr_t) p->wrusage; /* struct __wrusage * */ uarg[5] = (intptr_t) p->info; /* siginfo_t * */ *n_args = 6; break; } /* cap_rights_limit */ case 533: { struct cap_rights_limit_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->rightsp; /* cap_rights_t * */ *n_args = 2; break; } /* cap_ioctls_limit */ case 534: { struct cap_ioctls_limit_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->cmds; /* const u_long * */ uarg[2] = p->ncmds; /* size_t */ *n_args = 3; break; } /* cap_ioctls_get */ case 535: { struct cap_ioctls_get_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->cmds; /* u_long * */ uarg[2] = 
p->maxcmds; /* size_t */ *n_args = 3; break; } /* cap_fcntls_limit */ case 536: { struct cap_fcntls_limit_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = p->fcntlrights; /* uint32_t */ *n_args = 2; break; } /* cap_fcntls_get */ case 537: { struct cap_fcntls_get_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->fcntlrightsp; /* uint32_t * */ *n_args = 2; break; } /* bindat */ case 538: { struct bindat_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->s; /* int */ uarg[2] = (intptr_t) p->name; /* const struct sockaddr * */ iarg[3] = p->namelen; /* int */ *n_args = 4; break; } /* connectat */ case 539: { struct connectat_args *p = params; iarg[0] = p->fd; /* int */ iarg[1] = p->s; /* int */ uarg[2] = (intptr_t) p->name; /* const struct sockaddr * */ iarg[3] = p->namelen; /* int */ *n_args = 4; break; } /* chflagsat */ case 540: { struct chflagsat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ uarg[2] = p->flags; /* u_long */ iarg[3] = p->atflag; /* int */ *n_args = 4; break; } /* accept4 */ case 541: { struct accept4_args *p = params; iarg[0] = p->s; /* int */ uarg[1] = (intptr_t) p->name; /* struct sockaddr * */ uarg[2] = (intptr_t) p->anamelen; /* __socklen_t * */ iarg[3] = p->flags; /* int */ *n_args = 4; break; } /* pipe2 */ case 542: { struct pipe2_args *p = params; uarg[0] = (intptr_t) p->fildes; /* int * */ iarg[1] = p->flags; /* int */ *n_args = 2; break; } /* aio_mlock */ case 543: { struct aio_mlock_args *p = params; uarg[0] = (intptr_t) p->aiocbp; /* struct aiocb * */ *n_args = 1; break; } /* procctl */ case 544: { struct procctl_args *p = params; iarg[0] = p->idtype; /* idtype_t */ iarg[1] = p->id; /* id_t */ iarg[2] = p->com; /* int */ uarg[3] = (intptr_t) p->data; /* void * */ *n_args = 4; break; } /* ppoll */ case 545: { struct ppoll_args *p = params; uarg[0] = (intptr_t) p->fds; /* struct pollfd * */ uarg[1] = p->nfds; /* u_int */ uarg[2] = (intptr_t) p->ts; /* const struct timespec * */ uarg[3] = (intptr_t) p->set; /* const sigset_t * */ *n_args = 4; break; } /* futimens */ case 546: { struct futimens_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->times; /* struct timespec * */ *n_args = 2; break; } /* utimensat */ case 547: { struct utimensat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ uarg[2] = (intptr_t) p->times; /* struct timespec * */ iarg[3] = p->flag; /* int */ *n_args = 4; break; } /* fdatasync */ case 550: { struct fdatasync_args *p = params; iarg[0] = p->fd; /* int */ *n_args = 1; break; } /* fstat */ case 551: { struct fstat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->sb; /* struct stat * */ *n_args = 2; break; } /* fstatat */ case 552: { struct fstatat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ uarg[2] = (intptr_t) p->buf; /* struct stat * */ iarg[3] = p->flag; /* int */ *n_args = 4; break; } /* fhstat */ case 553: { struct fhstat_args *p = params; uarg[0] = (intptr_t) p->u_fhp; /* const struct fhandle * */ uarg[1] = (intptr_t) p->sb; /* struct stat * */ *n_args = 2; break; } /* getdirentries */ case 554: { struct getdirentries_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->buf; /* char * */ uarg[2] = p->count; /* size_t */ uarg[3] = (intptr_t) p->basep; /* off_t * */ *n_args = 4; break; } /* statfs */ case 555: { struct statfs_args *p = params; uarg[0] = (intptr_t) p->path; /* const char * */ uarg[1] = (intptr_t) 
p->buf; /* struct statfs * */ *n_args = 2; break; } /* fstatfs */ case 556: { struct fstatfs_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->buf; /* struct statfs * */ *n_args = 2; break; } /* getfsstat */ case 557: { struct getfsstat_args *p = params; uarg[0] = (intptr_t) p->buf; /* struct statfs * */ iarg[1] = p->bufsize; /* long */ iarg[2] = p->mode; /* int */ *n_args = 3; break; } /* fhstatfs */ case 558: { struct fhstatfs_args *p = params; uarg[0] = (intptr_t) p->u_fhp; /* const struct fhandle * */ uarg[1] = (intptr_t) p->buf; /* struct statfs * */ *n_args = 2; break; } /* mknodat */ case 559: { struct mknodat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->mode; /* mode_t */ iarg[3] = p->dev; /* dev_t */ *n_args = 4; break; } /* kevent */ case 560: { struct kevent_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->changelist; /* struct kevent * */ iarg[2] = p->nchanges; /* int */ uarg[3] = (intptr_t) p->eventlist; /* struct kevent * */ iarg[4] = p->nevents; /* int */ uarg[5] = (intptr_t) p->timeout; /* const struct timespec * */ *n_args = 6; break; } /* cpuset_getdomain */ case 561: { struct cpuset_getdomain_args *p = params; iarg[0] = p->level; /* cpulevel_t */ iarg[1] = p->which; /* cpuwhich_t */ iarg[2] = p->id; /* id_t */ uarg[3] = p->domainsetsize; /* size_t */ uarg[4] = (intptr_t) p->mask; /* domainset_t * */ uarg[5] = (intptr_t) p->policy; /* int * */ *n_args = 6; break; } /* cpuset_setdomain */ case 562: { struct cpuset_setdomain_args *p = params; iarg[0] = p->level; /* cpulevel_t */ iarg[1] = p->which; /* cpuwhich_t */ iarg[2] = p->id; /* id_t */ uarg[3] = p->domainsetsize; /* size_t */ uarg[4] = (intptr_t) p->mask; /* domainset_t * */ iarg[5] = p->policy; /* int */ *n_args = 6; break; } /* getrandom */ case 563: { struct getrandom_args *p = params; uarg[0] = (intptr_t) p->buf; /* void * */ uarg[1] = p->buflen; /* size_t */ uarg[2] = p->flags; /* unsigned int */ *n_args = 3; break; } /* getfhat */ case 564: { struct getfhat_args *p = params; iarg[0] = p->fd; /* int */ uarg[1] = (intptr_t) p->path; /* char * */ uarg[2] = (intptr_t) p->fhp; /* struct fhandle * */ iarg[3] = p->flags; /* int */ *n_args = 4; break; } /* fhlink */ case 565: { struct fhlink_args *p = params; uarg[0] = (intptr_t) p->fhp; /* struct fhandle * */ uarg[1] = (intptr_t) p->to; /* const char * */ *n_args = 2; break; } /* fhlinkat */ case 566: { struct fhlinkat_args *p = params; uarg[0] = (intptr_t) p->fhp; /* struct fhandle * */ iarg[1] = p->tofd; /* int */ uarg[2] = (intptr_t) p->to; /* const char * */ *n_args = 3; break; } /* fhreadlink */ case 567: { struct fhreadlink_args *p = params; uarg[0] = (intptr_t) p->fhp; /* struct fhandle * */ uarg[1] = (intptr_t) p->buf; /* char * */ uarg[2] = p->bufsize; /* size_t */ *n_args = 3; break; } /* funlinkat */ case 568: { struct funlinkat_args *p = params; iarg[0] = p->dfd; /* int */ uarg[1] = (intptr_t) p->path; /* const char * */ iarg[2] = p->fd; /* int */ iarg[3] = p->flag; /* int */ *n_args = 4; break; } default: *n_args = 0; break; }; } static void systrace_entry_setargdesc(int sysnum, int ndx, char *desc, size_t descsz) { const char *p = NULL; switch (sysnum) { /* nosys */ case 0: break; /* sys_exit */ case 1: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* fork */ case 2: break; /* read */ case 3: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland void *"; break; case 2: p = "size_t"; break; default: break; }; break; /* write */ 
case 4: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const void *"; break; case 2: p = "size_t"; break; default: break; }; break; /* open */ case 5: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "mode_t"; break; default: break; }; break; /* close */ case 6: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* wait4 */ case 7: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland int *"; break; case 2: p = "int"; break; case 3: p = "userland struct rusage *"; break; default: break; }; break; /* link */ case 9: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland const char *"; break; default: break; }; break; /* unlink */ case 10: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* chdir */ case 12: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* fchdir */ case 13: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* chmod */ case 15: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "mode_t"; break; default: break; }; break; /* chown */ case 16: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "int"; break; default: break; }; break; /* break */ case 17: switch(ndx) { case 0: p = "userland char *"; break; default: break; }; break; /* getpid */ case 20: break; /* mount */ case 21: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland const char *"; break; case 2: p = "int"; break; case 3: p = "userland void *"; break; default: break; }; break; /* unmount */ case 22: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; default: break; }; break; /* setuid */ case 23: switch(ndx) { case 0: p = "uid_t"; break; default: break; }; break; /* getuid */ case 24: break; /* geteuid */ case 25: break; /* ptrace */ case 26: switch(ndx) { case 0: p = "int"; break; case 1: p = "pid_t"; break; case 2: p = "caddr_t"; break; case 3: p = "int"; break; default: break; }; break; /* recvmsg */ case 27: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct msghdr *"; break; case 2: p = "int"; break; default: break; }; break; /* sendmsg */ case 28: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct msghdr *"; break; case 2: p = "int"; break; default: break; }; break; /* recvfrom */ case 29: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland void *"; break; case 2: p = "size_t"; break; case 3: p = "int"; break; case 4: p = "userland struct sockaddr *"; break; case 5: p = "userland __socklen_t *"; break; default: break; }; break; /* accept */ case 30: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct sockaddr *"; break; case 2: p = "userland __socklen_t *"; break; default: break; }; break; /* getpeername */ case 31: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct sockaddr *"; break; case 2: p = "userland __socklen_t *"; break; default: break; }; break; /* getsockname */ case 32: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct sockaddr *"; break; case 2: p = "userland __socklen_t *"; break; default: break; }; break; /* access */ case 33: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; default: break; }; break; /* chflags */ case 34: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "u_long"; break; default: break; }; break; /* 
fchflags */ case 35: switch(ndx) { case 0: p = "int"; break; case 1: p = "u_long"; break; default: break; }; break; /* sync */ case 36: break; /* kill */ case 37: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* getppid */ case 39: break; /* dup */ case 41: switch(ndx) { case 0: p = "u_int"; break; default: break; }; break; /* getegid */ case 43: break; /* profil */ case 44: switch(ndx) { case 0: p = "userland char *"; break; case 1: p = "size_t"; break; case 2: p = "size_t"; break; case 3: p = "u_int"; break; default: break; }; break; /* ktrace */ case 45: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "int"; break; default: break; }; break; /* getgid */ case 47: break; /* getlogin */ case 49: switch(ndx) { case 0: p = "userland char *"; break; case 1: p = "u_int"; break; default: break; }; break; /* setlogin */ case 50: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* acct */ case 51: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* sigaltstack */ case 53: switch(ndx) { case 0: p = "userland stack_t *"; break; case 1: p = "userland stack_t *"; break; default: break; }; break; /* ioctl */ case 54: switch(ndx) { case 0: p = "int"; break; case 1: p = "u_long"; break; case 2: p = "userland char *"; break; default: break; }; break; /* reboot */ case 55: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* revoke */ case 56: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* symlink */ case 57: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland const char *"; break; default: break; }; break; /* readlink */ case 58: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland char *"; break; case 2: p = "size_t"; break; default: break; }; break; /* execve */ case 59: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland char **"; break; case 2: p = "userland char **"; break; default: break; }; break; /* umask */ case 60: switch(ndx) { case 0: p = "mode_t"; break; default: break; }; break; /* chroot */ case 61: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* msync */ case 65: switch(ndx) { case 0: p = "userland void *"; break; case 1: p = "size_t"; break; case 2: p = "int"; break; default: break; }; break; /* vfork */ case 66: break; /* sbrk */ case 69: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* sstk */ case 70: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* munmap */ case 73: switch(ndx) { case 0: p = "userland void *"; break; case 1: p = "size_t"; break; default: break; }; break; /* mprotect */ case 74: switch(ndx) { case 0: p = "userland void *"; break; case 1: p = "size_t"; break; case 2: p = "int"; break; default: break; }; break; /* madvise */ case 75: switch(ndx) { case 0: p = "userland void *"; break; case 1: p = "size_t"; break; case 2: p = "int"; break; default: break; }; break; /* mincore */ case 78: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; case 2: p = "userland char *"; break; default: break; }; break; /* getgroups */ case 79: switch(ndx) { case 0: p = "u_int"; break; case 1: p = "userland gid_t *"; break; default: break; }; break; /* setgroups */ case 80: switch(ndx) { case 0: p = "u_int"; break; case 1: p = "userland gid_t *"; break; 
default: break; }; break; /* getpgrp */ case 81: break; /* setpgid */ case 82: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* setitimer */ case 83: switch(ndx) { case 0: p = "u_int"; break; case 1: p = "userland struct itimerval *"; break; case 2: p = "userland struct itimerval *"; break; default: break; }; break; /* swapon */ case 85: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* getitimer */ case 86: switch(ndx) { case 0: p = "u_int"; break; case 1: p = "userland struct itimerval *"; break; default: break; }; break; /* getdtablesize */ case 89: break; /* dup2 */ case 90: switch(ndx) { case 0: p = "u_int"; break; case 1: p = "u_int"; break; default: break; }; break; /* fcntl */ case 92: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "long"; break; default: break; }; break; /* select */ case 93: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland fd_set *"; break; case 2: p = "userland fd_set *"; break; case 3: p = "userland fd_set *"; break; case 4: p = "userland struct timeval *"; break; default: break; }; break; /* fsync */ case 95: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* setpriority */ case 96: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; default: break; }; break; /* socket */ case 97: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; default: break; }; break; /* connect */ case 98: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const struct sockaddr *"; break; case 2: p = "int"; break; default: break; }; break; /* getpriority */ case 100: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* bind */ case 104: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const struct sockaddr *"; break; case 2: p = "int"; break; default: break; }; break; /* setsockopt */ case 105: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "userland const void *"; break; case 4: p = "int"; break; default: break; }; break; /* listen */ case 106: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* gettimeofday */ case 116: switch(ndx) { case 0: p = "userland struct timeval *"; break; case 1: p = "userland struct timezone *"; break; default: break; }; break; /* getrusage */ case 117: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct rusage *"; break; default: break; }; break; /* getsockopt */ case 118: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "userland void *"; break; case 4: p = "userland int *"; break; default: break; }; break; /* readv */ case 120: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct iovec *"; break; case 2: p = "u_int"; break; default: break; }; break; /* writev */ case 121: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct iovec *"; break; case 2: p = "u_int"; break; default: break; }; break; /* settimeofday */ case 122: switch(ndx) { case 0: p = "userland struct timeval *"; break; case 1: p = "userland struct timezone *"; break; default: break; }; break; /* fchown */ case 123: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; default: break; }; break; /* fchmod */ case 124: switch(ndx) { case 0: p = "int"; break; case 1: p = 
"mode_t"; break; default: break; }; break; /* setreuid */ case 126: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* setregid */ case 127: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* rename */ case 128: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland const char *"; break; default: break; }; break; /* flock */ case 131: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* mkfifo */ case 132: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "mode_t"; break; default: break; }; break; /* sendto */ case 133: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const void *"; break; case 2: p = "size_t"; break; case 3: p = "int"; break; case 4: p = "userland const struct sockaddr *"; break; case 5: p = "int"; break; default: break; }; break; /* shutdown */ case 134: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* socketpair */ case 135: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "userland int *"; break; default: break; }; break; /* mkdir */ case 136: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "mode_t"; break; default: break; }; break; /* rmdir */ case 137: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* utimes */ case 138: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland struct timeval *"; break; default: break; }; break; /* adjtime */ case 140: switch(ndx) { case 0: p = "userland struct timeval *"; break; case 1: p = "userland struct timeval *"; break; default: break; }; break; /* setsid */ case 147: break; /* quotactl */ case 148: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "userland void *"; break; default: break; }; break; /* nlm_syscall */ case 154: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "userland char **"; break; default: break; }; break; /* nfssvc */ case 155: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland void *"; break; default: break; }; break; /* lgetfh */ case 160: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland struct fhandle *"; break; default: break; }; break; /* getfh */ case 161: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland struct fhandle *"; break; default: break; }; break; /* sysarch */ case 165: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland char *"; break; default: break; }; break; /* rtprio */ case 166: switch(ndx) { case 0: p = "int"; break; case 1: p = "pid_t"; break; case 2: p = "userland struct rtprio *"; break; default: break; }; break; /* semsys */ case 169: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "int"; break; case 4: p = "int"; break; default: break; }; break; /* msgsys */ case 170: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "int"; break; case 4: p = "int"; break; case 5: p = "int"; break; default: break; }; break; /* shmsys */ case 171: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "int"; break; default: break; }; break; /* setfib */ case 175: 
switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* ntp_adjtime */ case 176: switch(ndx) { case 0: p = "userland struct timex *"; break; default: break; }; break; /* setgid */ case 181: switch(ndx) { case 0: p = "gid_t"; break; default: break; }; break; /* setegid */ case 182: switch(ndx) { case 0: p = "gid_t"; break; default: break; }; break; /* seteuid */ case 183: switch(ndx) { case 0: p = "uid_t"; break; default: break; }; break; /* pathconf */ case 191: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; default: break; }; break; /* fpathconf */ case 192: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* getrlimit */ case 194: switch(ndx) { case 0: p = "u_int"; break; case 1: p = "userland struct rlimit *"; break; default: break; }; break; /* setrlimit */ case 195: switch(ndx) { case 0: p = "u_int"; break; case 1: p = "userland struct rlimit *"; break; default: break; }; break; /* nosys */ case 198: break; /* __sysctl */ case 202: switch(ndx) { case 0: p = "userland int *"; break; case 1: p = "u_int"; break; case 2: p = "userland void *"; break; case 3: p = "userland size_t *"; break; case 4: p = "userland const void *"; break; case 5: p = "size_t"; break; default: break; }; break; /* mlock */ case 203: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; default: break; }; break; /* munlock */ case 204: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; default: break; }; break; /* undelete */ case 205: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* futimes */ case 206: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct timeval *"; break; default: break; }; break; /* getpgid */ case 207: switch(ndx) { case 0: p = "pid_t"; break; default: break; }; break; /* poll */ case 209: switch(ndx) { case 0: p = "userland struct pollfd *"; break; case 1: p = "u_int"; break; case 2: p = "int"; break; default: break; }; break; /* lkmnosys */ case 210: break; /* lkmnosys */ case 211: break; /* lkmnosys */ case 212: break; /* lkmnosys */ case 213: break; /* lkmnosys */ case 214: break; /* lkmnosys */ case 215: break; /* lkmnosys */ case 216: break; /* lkmnosys */ case 217: break; /* lkmnosys */ case 218: break; /* lkmnosys */ case 219: break; /* semget */ case 221: switch(ndx) { case 0: p = "key_t"; break; case 1: p = "int"; break; case 2: p = "int"; break; default: break; }; break; /* semop */ case 222: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct sembuf *"; break; case 2: p = "size_t"; break; default: break; }; break; /* msgget */ case 225: switch(ndx) { case 0: p = "key_t"; break; case 1: p = "int"; break; default: break; }; break; /* msgsnd */ case 226: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const void *"; break; case 2: p = "size_t"; break; case 3: p = "int"; break; default: break; }; break; /* msgrcv */ case 227: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland void *"; break; case 2: p = "size_t"; break; case 3: p = "long"; break; case 4: p = "int"; break; default: break; }; break; /* shmat */ case 228: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const void *"; break; case 2: p = "int"; break; default: break; }; break; /* shmdt */ case 230: switch(ndx) { case 0: p = "userland const void *"; break; default: break; }; break; /* shmget */ case 231: switch(ndx) { case 0: p = "key_t"; break; case 1: p = 
"size_t"; break; case 2: p = "int"; break; default: break; }; break; /* clock_gettime */ case 232: switch(ndx) { case 0: p = "clockid_t"; break; case 1: p = "userland struct timespec *"; break; default: break; }; break; /* clock_settime */ case 233: switch(ndx) { case 0: p = "clockid_t"; break; case 1: p = "userland const struct timespec *"; break; default: break; }; break; /* clock_getres */ case 234: switch(ndx) { case 0: p = "clockid_t"; break; case 1: p = "userland struct timespec *"; break; default: break; }; break; /* ktimer_create */ case 235: switch(ndx) { case 0: p = "clockid_t"; break; case 1: p = "userland struct sigevent *"; break; case 2: p = "userland int *"; break; default: break; }; break; /* ktimer_delete */ case 236: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* ktimer_settime */ case 237: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland const struct itimerspec *"; break; case 3: p = "userland struct itimerspec *"; break; default: break; }; break; /* ktimer_gettime */ case 238: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct itimerspec *"; break; default: break; }; break; /* ktimer_getoverrun */ case 239: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* nanosleep */ case 240: switch(ndx) { case 0: p = "userland const struct timespec *"; break; case 1: p = "userland struct timespec *"; break; default: break; }; break; /* ffclock_getcounter */ case 241: switch(ndx) { case 0: p = "userland ffcounter *"; break; default: break; }; break; /* ffclock_setestimate */ case 242: switch(ndx) { case 0: p = "userland struct ffclock_estimate *"; break; default: break; }; break; /* ffclock_getestimate */ case 243: switch(ndx) { case 0: p = "userland struct ffclock_estimate *"; break; default: break; }; break; /* clock_nanosleep */ case 244: switch(ndx) { case 0: p = "clockid_t"; break; case 1: p = "int"; break; case 2: p = "userland const struct timespec *"; break; case 3: p = "userland struct timespec *"; break; default: break; }; break; /* clock_getcpuclockid2 */ case 247: switch(ndx) { case 0: p = "id_t"; break; case 1: p = "int"; break; case 2: p = "userland clockid_t *"; break; default: break; }; break; /* ntp_gettime */ case 248: switch(ndx) { case 0: p = "userland struct ntptimeval *"; break; default: break; }; break; /* minherit */ case 250: switch(ndx) { case 0: p = "userland void *"; break; case 1: p = "size_t"; break; case 2: p = "int"; break; default: break; }; break; /* rfork */ case 251: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* issetugid */ case 253: break; /* lchown */ case 254: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "int"; break; default: break; }; break; /* aio_read */ case 255: switch(ndx) { case 0: p = "userland struct aiocb *"; break; default: break; }; break; /* aio_write */ case 256: switch(ndx) { case 0: p = "userland struct aiocb *"; break; default: break; }; break; /* lio_listio */ case 257: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct aiocb *const *"; break; case 2: p = "int"; break; case 3: p = "userland struct sigevent *"; break; default: break; }; break; /* lchmod */ case 274: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "mode_t"; break; default: break; }; break; /* lutimes */ case 276: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland struct timeval *"; break; default: break; }; break; /* preadv 
*/ case 289: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct iovec *"; break; case 2: p = "u_int"; break; case 3: p = "off_t"; break; default: break; }; break; /* pwritev */ case 290: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct iovec *"; break; case 2: p = "u_int"; break; case 3: p = "off_t"; break; default: break; }; break; /* fhopen */ case 298: switch(ndx) { case 0: p = "userland const struct fhandle *"; break; case 1: p = "int"; break; default: break; }; break; /* modnext */ case 300: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* modstat */ case 301: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct module_stat *"; break; default: break; }; break; /* modfnext */ case 302: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* modfind */ case 303: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* kldload */ case 304: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* kldunload */ case 305: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* kldfind */ case 306: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* kldnext */ case 307: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* kldstat */ case 308: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct kld_file_stat *"; break; default: break; }; break; /* kldfirstmod */ case 309: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* getsid */ case 310: switch(ndx) { case 0: p = "pid_t"; break; default: break; }; break; /* setresuid */ case 311: switch(ndx) { case 0: p = "uid_t"; break; case 1: p = "uid_t"; break; case 2: p = "uid_t"; break; default: break; }; break; /* setresgid */ case 312: switch(ndx) { case 0: p = "gid_t"; break; case 1: p = "gid_t"; break; case 2: p = "gid_t"; break; default: break; }; break; /* aio_return */ case 314: switch(ndx) { case 0: p = "userland struct aiocb *"; break; default: break; }; break; /* aio_suspend */ case 315: switch(ndx) { case 0: p = "userland struct aiocb *const *"; break; case 1: p = "int"; break; case 2: p = "userland const struct timespec *"; break; default: break; }; break; /* aio_cancel */ case 316: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct aiocb *"; break; default: break; }; break; /* aio_error */ case 317: switch(ndx) { case 0: p = "userland struct aiocb *"; break; default: break; }; break; /* yield */ case 321: break; /* mlockall */ case 324: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* munlockall */ case 325: break; /* __getcwd */ case 326: switch(ndx) { case 0: p = "userland char *"; break; case 1: p = "size_t"; break; default: break; }; break; /* sched_setparam */ case 327: switch(ndx) { case 0: p = "pid_t"; break; case 1: p = "userland const struct sched_param *"; break; default: break; }; break; /* sched_getparam */ case 328: switch(ndx) { case 0: p = "pid_t"; break; case 1: p = "userland struct sched_param *"; break; default: break; }; break; /* sched_setscheduler */ case 329: switch(ndx) { case 0: p = "pid_t"; break; case 1: p = "int"; break; case 2: p = "userland const struct sched_param *"; break; default: break; }; break; /* sched_getscheduler */ case 330: switch(ndx) { case 0: p = "pid_t"; break; default: break; }; break; /* sched_yield */ case 331: break; /* sched_get_priority_max */ case 332: switch(ndx) { case 0: p = "int"; break; default: break; 
}; break; /* sched_get_priority_min */ case 333: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* sched_rr_get_interval */ case 334: switch(ndx) { case 0: p = "pid_t"; break; case 1: p = "userland struct timespec *"; break; default: break; }; break; /* utrace */ case 335: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; default: break; }; break; /* kldsym */ case 337: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland void *"; break; default: break; }; break; /* jail */ case 338: switch(ndx) { case 0: p = "userland struct jail *"; break; default: break; }; break; /* nnpfs_syscall */ case 339: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland char *"; break; case 2: p = "int"; break; case 3: p = "userland void *"; break; case 4: p = "int"; break; default: break; }; break; /* sigprocmask */ case 340: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const sigset_t *"; break; case 2: p = "userland sigset_t *"; break; default: break; }; break; /* sigsuspend */ case 341: switch(ndx) { case 0: p = "userland const sigset_t *"; break; default: break; }; break; /* sigpending */ case 343: switch(ndx) { case 0: p = "userland sigset_t *"; break; default: break; }; break; /* sigtimedwait */ case 345: switch(ndx) { case 0: p = "userland const sigset_t *"; break; case 1: p = "userland siginfo_t *"; break; case 2: p = "userland const struct timespec *"; break; default: break; }; break; /* sigwaitinfo */ case 346: switch(ndx) { case 0: p = "userland const sigset_t *"; break; case 1: p = "userland siginfo_t *"; break; default: break; }; break; /* __acl_get_file */ case 347: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* __acl_set_file */ case 348: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* __acl_get_fd */ case 349: switch(ndx) { case 0: p = "int"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* __acl_set_fd */ case 350: switch(ndx) { case 0: p = "int"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* __acl_delete_file */ case 351: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "acl_type_t"; break; default: break; }; break; /* __acl_delete_fd */ case 352: switch(ndx) { case 0: p = "int"; break; case 1: p = "acl_type_t"; break; default: break; }; break; /* __acl_aclcheck_file */ case 353: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* __acl_aclcheck_fd */ case 354: switch(ndx) { case 0: p = "int"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* extattrctl */ case 355: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; case 3: p = "int"; break; case 4: p = "userland const char *"; break; default: break; }; break; /* extattr_set_file */ case 356: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; case 3: p = "userland void *"; break; case 4: p = "size_t"; break; default: break; }; 
break; /* extattr_get_file */ case 357: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; case 3: p = "userland void *"; break; case 4: p = "size_t"; break; default: break; }; break; /* extattr_delete_file */ case 358: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; default: break; }; break; /* aio_waitcomplete */ case 359: switch(ndx) { case 0: p = "userland struct aiocb **"; break; case 1: p = "userland struct timespec *"; break; default: break; }; break; /* getresuid */ case 360: switch(ndx) { case 0: p = "userland uid_t *"; break; case 1: p = "userland uid_t *"; break; case 2: p = "userland uid_t *"; break; default: break; }; break; /* getresgid */ case 361: switch(ndx) { case 0: p = "userland gid_t *"; break; case 1: p = "userland gid_t *"; break; case 2: p = "userland gid_t *"; break; default: break; }; break; /* kqueue */ case 362: break; /* extattr_set_fd */ case 371: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; case 3: p = "userland void *"; break; case 4: p = "size_t"; break; default: break; }; break; /* extattr_get_fd */ case 372: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; case 3: p = "userland void *"; break; case 4: p = "size_t"; break; default: break; }; break; /* extattr_delete_fd */ case 373: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; default: break; }; break; /* __setugid */ case 374: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* eaccess */ case 376: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; default: break; }; break; /* afs3_syscall */ case 377: switch(ndx) { case 0: p = "long"; break; case 1: p = "long"; break; case 2: p = "long"; break; case 3: p = "long"; break; case 4: p = "long"; break; case 5: p = "long"; break; case 6: p = "long"; break; default: break; }; break; /* nmount */ case 378: switch(ndx) { case 0: p = "userland struct iovec *"; break; case 1: p = "unsigned int"; break; case 2: p = "int"; break; default: break; }; break; /* __mac_get_proc */ case 384: switch(ndx) { case 0: p = "userland struct mac *"; break; default: break; }; break; /* __mac_set_proc */ case 385: switch(ndx) { case 0: p = "userland struct mac *"; break; default: break; }; break; /* __mac_get_fd */ case 386: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct mac *"; break; default: break; }; break; /* __mac_get_file */ case 387: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland struct mac *"; break; default: break; }; break; /* __mac_set_fd */ case 388: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct mac *"; break; default: break; }; break; /* __mac_set_file */ case 389: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland struct mac *"; break; default: break; }; break; /* kenv */ case 390: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "userland char *"; break; case 3: p = "int"; break; default: break; }; break; /* lchflags */ case 391: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "u_long"; break; default: break; }; break; /* uuidgen */ case 392: switch(ndx) { case 0: p = "userland struct uuid *"; break; case 
1: p = "int"; break; default: break; }; break; /* sendfile */ case 393: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "off_t"; break; case 3: p = "size_t"; break; case 4: p = "userland struct sf_hdtr *"; break; case 5: p = "userland off_t *"; break; case 6: p = "int"; break; default: break; }; break; /* mac_syscall */ case 394: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland void *"; break; default: break; }; break; /* ksem_close */ case 400: switch(ndx) { case 0: p = "semid_t"; break; default: break; }; break; /* ksem_post */ case 401: switch(ndx) { case 0: p = "semid_t"; break; default: break; }; break; /* ksem_wait */ case 402: switch(ndx) { case 0: p = "semid_t"; break; default: break; }; break; /* ksem_trywait */ case 403: switch(ndx) { case 0: p = "semid_t"; break; default: break; }; break; /* ksem_init */ case 404: switch(ndx) { case 0: p = "userland semid_t *"; break; case 1: p = "unsigned int"; break; default: break; }; break; /* ksem_open */ case 405: switch(ndx) { case 0: p = "userland semid_t *"; break; case 1: p = "userland const char *"; break; case 2: p = "int"; break; case 3: p = "mode_t"; break; case 4: p = "unsigned int"; break; default: break; }; break; /* ksem_unlink */ case 406: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* ksem_getvalue */ case 407: switch(ndx) { case 0: p = "semid_t"; break; case 1: p = "userland int *"; break; default: break; }; break; /* ksem_destroy */ case 408: switch(ndx) { case 0: p = "semid_t"; break; default: break; }; break; /* __mac_get_pid */ case 409: switch(ndx) { case 0: p = "pid_t"; break; case 1: p = "userland struct mac *"; break; default: break; }; break; /* __mac_get_link */ case 410: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland struct mac *"; break; default: break; }; break; /* __mac_set_link */ case 411: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland struct mac *"; break; default: break; }; break; /* extattr_set_link */ case 412: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; case 3: p = "userland void *"; break; case 4: p = "size_t"; break; default: break; }; break; /* extattr_get_link */ case 413: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; case 3: p = "userland void *"; break; case 4: p = "size_t"; break; default: break; }; break; /* extattr_delete_link */ case 414: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; default: break; }; break; /* __mac_execve */ case 415: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland char **"; break; case 2: p = "userland char **"; break; case 3: p = "userland struct mac *"; break; default: break; }; break; /* sigaction */ case 416: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const struct sigaction *"; break; case 2: p = "userland struct sigaction *"; break; default: break; }; break; /* sigreturn */ case 417: switch(ndx) { case 0: p = "userland const struct __ucontext *"; break; default: break; }; break; /* getcontext */ case 421: switch(ndx) { case 0: p = "userland struct __ucontext *"; break; default: break; }; break; /* setcontext */ case 422: switch(ndx) { case 0: p = "userland const struct __ucontext *"; 
break; default: break; }; break; /* swapcontext */ case 423: switch(ndx) { case 0: p = "userland struct __ucontext *"; break; case 1: p = "userland const struct __ucontext *"; break; default: break; }; break; /* swapoff */ case 424: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* __acl_get_link */ case 425: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* __acl_set_link */ case 426: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* __acl_delete_link */ case 427: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "acl_type_t"; break; default: break; }; break; /* __acl_aclcheck_link */ case 428: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "acl_type_t"; break; case 2: p = "userland struct acl *"; break; default: break; }; break; /* sigwait */ case 429: switch(ndx) { case 0: p = "userland const sigset_t *"; break; case 1: p = "userland int *"; break; default: break; }; break; /* thr_create */ case 430: switch(ndx) { case 0: p = "userland ucontext_t *"; break; case 1: p = "userland long *"; break; case 2: p = "int"; break; default: break; }; break; /* thr_exit */ case 431: switch(ndx) { case 0: p = "userland long *"; break; default: break; }; break; /* thr_self */ case 432: switch(ndx) { case 0: p = "userland long *"; break; default: break; }; break; /* thr_kill */ case 433: switch(ndx) { case 0: p = "long"; break; case 1: p = "int"; break; default: break; }; break; /* jail_attach */ case 436: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* extattr_list_fd */ case 437: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland void *"; break; case 3: p = "size_t"; break; default: break; }; break; /* extattr_list_file */ case 438: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland void *"; break; case 3: p = "size_t"; break; default: break; }; break; /* extattr_list_link */ case 439: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland void *"; break; case 3: p = "size_t"; break; default: break; }; break; /* ksem_timedwait */ case 441: switch(ndx) { case 0: p = "semid_t"; break; case 1: p = "userland const struct timespec *"; break; default: break; }; break; /* thr_suspend */ case 442: switch(ndx) { case 0: p = "userland const struct timespec *"; break; default: break; }; break; /* thr_wake */ case 443: switch(ndx) { case 0: p = "long"; break; default: break; }; break; /* kldunloadf */ case 444: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* audit */ case 445: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "u_int"; break; default: break; }; break; /* auditon */ case 446: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland void *"; break; case 2: p = "u_int"; break; default: break; }; break; /* getauid */ case 447: switch(ndx) { case 0: p = "userland uid_t *"; break; default: break; }; break; /* setauid */ case 448: switch(ndx) { case 0: p = "userland uid_t *"; break; default: break; }; break; /* getaudit */ case 449: switch(ndx) { case 0: p = "userland struct auditinfo *"; break; default: break; }; break; /* setaudit */ case 450: switch(ndx) { case 0: p 
= "userland struct auditinfo *"; break; default: break; }; break; /* getaudit_addr */ case 451: switch(ndx) { case 0: p = "userland struct auditinfo_addr *"; break; case 1: p = "u_int"; break; default: break; }; break; /* setaudit_addr */ case 452: switch(ndx) { case 0: p = "userland struct auditinfo_addr *"; break; case 1: p = "u_int"; break; default: break; }; break; /* auditctl */ case 453: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* _umtx_op */ case 454: switch(ndx) { case 0: p = "userland void *"; break; case 1: p = "int"; break; case 2: p = "u_long"; break; case 3: p = "userland void *"; break; case 4: p = "userland void *"; break; default: break; }; break; /* thr_new */ case 455: switch(ndx) { case 0: p = "userland struct thr_param *"; break; case 1: p = "int"; break; default: break; }; break; /* sigqueue */ case 456: switch(ndx) { case 0: p = "pid_t"; break; case 1: p = "int"; break; case 2: p = "userland void *"; break; default: break; }; break; /* kmq_open */ case 457: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "mode_t"; break; case 3: p = "userland const struct mq_attr *"; break; default: break; }; break; /* kmq_setattr */ case 458: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const struct mq_attr *"; break; case 2: p = "userland struct mq_attr *"; break; default: break; }; break; /* kmq_timedreceive */ case 459: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland char *"; break; case 2: p = "size_t"; break; case 3: p = "userland unsigned *"; break; case 4: p = "userland const struct timespec *"; break; default: break; }; break; /* kmq_timedsend */ case 460: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "size_t"; break; case 3: p = "unsigned"; break; case 4: p = "userland const struct timespec *"; break; default: break; }; break; /* kmq_notify */ case 461: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const struct sigevent *"; break; default: break; }; break; /* kmq_unlink */ case 462: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* abort2 */ case 463: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland void **"; break; default: break; }; break; /* thr_set_name */ case 464: switch(ndx) { case 0: p = "long"; break; case 1: p = "userland const char *"; break; default: break; }; break; /* aio_fsync */ case 465: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct aiocb *"; break; default: break; }; break; /* rtprio_thread */ case 466: switch(ndx) { case 0: p = "int"; break; case 1: p = "lwpid_t"; break; case 2: p = "userland struct rtprio *"; break; default: break; }; break; /* sctp_peeloff */ case 471: switch(ndx) { case 0: p = "int"; break; case 1: p = "uint32_t"; break; default: break; }; break; /* sctp_generic_sendmsg */ case 472: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland void *"; break; case 2: p = "int"; break; case 3: p = "userland struct sockaddr *"; break; case 4: p = "__socklen_t"; break; case 5: p = "userland struct sctp_sndrcvinfo *"; break; case 6: p = "int"; break; default: break; }; break; /* sctp_generic_sendmsg_iov */ case 473: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct iovec *"; break; case 2: p = "int"; break; case 3: p = "userland struct sockaddr *"; break; case 4: p = "__socklen_t"; break; case 5: p = "userland struct 
sctp_sndrcvinfo *"; break; case 6: p = "int"; break; default: break; }; break; /* sctp_generic_recvmsg */ case 474: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct iovec *"; break; case 2: p = "int"; break; case 3: p = "userland struct sockaddr *"; break; case 4: p = "userland __socklen_t *"; break; case 5: p = "userland struct sctp_sndrcvinfo *"; break; case 6: p = "userland int *"; break; default: break; }; break; /* pread */ case 475: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland void *"; break; case 2: p = "size_t"; break; case 3: p = "off_t"; break; default: break; }; break; /* pwrite */ case 476: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const void *"; break; case 2: p = "size_t"; break; case 3: p = "off_t"; break; default: break; }; break; /* mmap */ case 477: switch(ndx) { case 0: p = "userland void *"; break; case 1: p = "size_t"; break; case 2: p = "int"; break; case 3: p = "int"; break; case 4: p = "int"; break; case 5: p = "off_t"; break; default: break; }; break; /* lseek */ case 478: switch(ndx) { case 0: p = "int"; break; case 1: p = "off_t"; break; case 2: p = "int"; break; default: break; }; break; /* truncate */ case 479: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "off_t"; break; default: break; }; break; /* ftruncate */ case 480: switch(ndx) { case 0: p = "int"; break; case 1: p = "off_t"; break; default: break; }; break; /* thr_kill2 */ case 481: switch(ndx) { case 0: p = "pid_t"; break; case 1: p = "long"; break; case 2: p = "int"; break; default: break; }; break; /* shm_open */ case 482: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "mode_t"; break; default: break; }; break; /* shm_unlink */ case 483: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* cpuset */ case 484: switch(ndx) { case 0: p = "userland cpusetid_t *"; break; default: break; }; break; /* cpuset_setid */ case 485: switch(ndx) { case 0: p = "cpuwhich_t"; break; case 1: p = "id_t"; break; case 2: p = "cpusetid_t"; break; default: break; }; break; /* cpuset_getid */ case 486: switch(ndx) { case 0: p = "cpulevel_t"; break; case 1: p = "cpuwhich_t"; break; case 2: p = "id_t"; break; case 3: p = "userland cpusetid_t *"; break; default: break; }; break; /* cpuset_getaffinity */ case 487: switch(ndx) { case 0: p = "cpulevel_t"; break; case 1: p = "cpuwhich_t"; break; case 2: p = "id_t"; break; case 3: p = "size_t"; break; case 4: p = "userland cpuset_t *"; break; default: break; }; break; /* cpuset_setaffinity */ case 488: switch(ndx) { case 0: p = "cpulevel_t"; break; case 1: p = "cpuwhich_t"; break; case 2: p = "id_t"; break; case 3: p = "size_t"; break; case 4: p = "userland const cpuset_t *"; break; default: break; }; break; /* faccessat */ case 489: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "int"; break; case 3: p = "int"; break; default: break; }; break; /* fchmodat */ case 490: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "mode_t"; break; case 3: p = "int"; break; default: break; }; break; /* fchownat */ case 491: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "uid_t"; break; case 3: p = "gid_t"; break; case 4: p = "int"; break; default: break; }; break; /* fexecve */ case 492: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland char **"; break; case 2: p = "userland char 
**"; break; default: break; }; break; /* futimesat */ case 494: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "userland struct timeval *"; break; default: break; }; break; /* linkat */ case 495: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "int"; break; case 3: p = "userland const char *"; break; case 4: p = "int"; break; default: break; }; break; /* mkdirat */ case 496: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "mode_t"; break; default: break; }; break; /* mkfifoat */ case 497: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "mode_t"; break; default: break; }; break; /* openat */ case 499: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "int"; break; case 3: p = "mode_t"; break; default: break; }; break; /* readlinkat */ case 500: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "userland char *"; break; case 3: p = "size_t"; break; default: break; }; break; /* renameat */ case 501: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "int"; break; case 3: p = "userland const char *"; break; default: break; }; break; /* symlinkat */ case 502: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; default: break; }; break; /* unlinkat */ case 503: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "int"; break; default: break; }; break; /* posix_openpt */ case 504: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* gssd_syscall */ case 505: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* jail_get */ case 506: switch(ndx) { case 0: p = "userland struct iovec *"; break; case 1: p = "unsigned int"; break; case 2: p = "int"; break; default: break; }; break; /* jail_set */ case 507: switch(ndx) { case 0: p = "userland struct iovec *"; break; case 1: p = "unsigned int"; break; case 2: p = "int"; break; default: break; }; break; /* jail_remove */ case 508: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* closefrom */ case 509: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* __semctl */ case 510: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "int"; break; case 3: p = "userland union semun *"; break; default: break; }; break; /* msgctl */ case 511: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland struct msqid_ds *"; break; default: break; }; break; /* shmctl */ case 512: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland struct shmid_ds *"; break; default: break; }; break; /* lpathconf */ case 513: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "int"; break; default: break; }; break; /* __cap_rights_get */ case 515: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland cap_rights_t *"; break; default: break; }; break; /* cap_enter */ case 516: break; /* cap_getmode */ case 517: switch(ndx) { case 0: p = "userland u_int *"; break; default: break; }; break; /* pdfork */ case 518: switch(ndx) { case 0: p = "userland int *"; break; case 1: p = "int"; break; default: break; }; break; /* pdkill 
*/ case 519: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; default: break; }; break; /* pdgetpid */ case 520: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland pid_t *"; break; default: break; }; break; /* pselect */ case 522: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland fd_set *"; break; case 2: p = "userland fd_set *"; break; case 3: p = "userland fd_set *"; break; case 4: p = "userland const struct timespec *"; break; case 5: p = "userland const sigset_t *"; break; default: break; }; break; /* getloginclass */ case 523: switch(ndx) { case 0: p = "userland char *"; break; case 1: p = "size_t"; break; default: break; }; break; /* setloginclass */ case 524: switch(ndx) { case 0: p = "userland const char *"; break; default: break; }; break; /* rctl_get_racct */ case 525: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; case 2: p = "userland void *"; break; case 3: p = "size_t"; break; default: break; }; break; /* rctl_get_rules */ case 526: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; case 2: p = "userland void *"; break; case 3: p = "size_t"; break; default: break; }; break; /* rctl_get_limits */ case 527: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; case 2: p = "userland void *"; break; case 3: p = "size_t"; break; default: break; }; break; /* rctl_add_rule */ case 528: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; case 2: p = "userland void *"; break; case 3: p = "size_t"; break; default: break; }; break; /* rctl_remove_rule */ case 529: switch(ndx) { case 0: p = "userland const void *"; break; case 1: p = "size_t"; break; case 2: p = "userland void *"; break; case 3: p = "size_t"; break; default: break; }; break; /* posix_fallocate */ case 530: switch(ndx) { case 0: p = "int"; break; case 1: p = "off_t"; break; case 2: p = "off_t"; break; default: break; }; break; /* posix_fadvise */ case 531: switch(ndx) { case 0: p = "int"; break; case 1: p = "off_t"; break; case 2: p = "off_t"; break; case 3: p = "int"; break; default: break; }; break; /* wait6 */ case 532: switch(ndx) { case 0: p = "idtype_t"; break; case 1: p = "id_t"; break; case 2: p = "userland int *"; break; case 3: p = "int"; break; case 4: p = "userland struct __wrusage *"; break; case 5: p = "userland siginfo_t *"; break; default: break; }; break; /* cap_rights_limit */ case 533: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland cap_rights_t *"; break; default: break; }; break; /* cap_ioctls_limit */ case 534: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const u_long *"; break; case 2: p = "size_t"; break; default: break; }; break; /* cap_ioctls_get */ case 535: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland u_long *"; break; case 2: p = "size_t"; break; default: break; }; break; /* cap_fcntls_limit */ case 536: switch(ndx) { case 0: p = "int"; break; case 1: p = "uint32_t"; break; default: break; }; break; /* cap_fcntls_get */ case 537: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland uint32_t *"; break; default: break; }; break; /* bindat */ case 538: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland const struct sockaddr *"; break; case 3: p = "int"; break; default: break; }; break; /* connectat */ case 539: switch(ndx) { case 0: p = "int"; break; case 1: p = "int"; break; case 2: p = "userland const struct sockaddr *"; 
break; case 3: p = "int"; break; default: break; }; break; /* chflagsat */ case 540: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "u_long"; break; case 3: p = "int"; break; default: break; }; break; /* accept4 */ case 541: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct sockaddr *"; break; case 2: p = "userland __socklen_t *"; break; case 3: p = "int"; break; default: break; }; break; /* pipe2 */ case 542: switch(ndx) { case 0: p = "userland int *"; break; case 1: p = "int"; break; default: break; }; break; /* aio_mlock */ case 543: switch(ndx) { case 0: p = "userland struct aiocb *"; break; default: break; }; break; /* procctl */ case 544: switch(ndx) { case 0: p = "idtype_t"; break; case 1: p = "id_t"; break; case 2: p = "int"; break; case 3: p = "userland void *"; break; default: break; }; break; /* ppoll */ case 545: switch(ndx) { case 0: p = "userland struct pollfd *"; break; case 1: p = "u_int"; break; case 2: p = "userland const struct timespec *"; break; case 3: p = "userland const sigset_t *"; break; default: break; }; break; /* futimens */ case 546: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct timespec *"; break; default: break; }; break; /* utimensat */ case 547: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "userland struct timespec *"; break; case 3: p = "int"; break; default: break; }; break; /* fdatasync */ case 550: switch(ndx) { case 0: p = "int"; break; default: break; }; break; /* fstat */ case 551: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct stat *"; break; default: break; }; break; /* fstatat */ case 552: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "userland struct stat *"; break; case 3: p = "int"; break; default: break; }; break; /* fhstat */ case 553: switch(ndx) { case 0: p = "userland const struct fhandle *"; break; case 1: p = "userland struct stat *"; break; default: break; }; break; /* getdirentries */ case 554: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland char *"; break; case 2: p = "size_t"; break; case 3: p = "userland off_t *"; break; default: break; }; break; /* statfs */ case 555: switch(ndx) { case 0: p = "userland const char *"; break; case 1: p = "userland struct statfs *"; break; default: break; }; break; /* fstatfs */ case 556: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct statfs *"; break; default: break; }; break; /* getfsstat */ case 557: switch(ndx) { case 0: p = "userland struct statfs *"; break; case 1: p = "long"; break; case 2: p = "int"; break; default: break; }; break; /* fhstatfs */ case 558: switch(ndx) { case 0: p = "userland const struct fhandle *"; break; case 1: p = "userland struct statfs *"; break; default: break; }; break; /* mknodat */ case 559: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "mode_t"; break; case 3: p = "dev_t"; break; default: break; }; break; /* kevent */ case 560: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland struct kevent *"; break; case 2: p = "int"; break; case 3: p = "userland struct kevent *"; break; case 4: p = "int"; break; case 5: p = "userland const struct timespec *"; break; default: break; }; break; /* cpuset_getdomain */ case 561: switch(ndx) { case 0: p = "cpulevel_t"; break; case 1: p = "cpuwhich_t"; break; case 2: p = "id_t"; break; case 3: p = "size_t"; break; case 4: p = 
"userland domainset_t *"; break; case 5: p = "userland int *"; break; default: break; }; break; /* cpuset_setdomain */ case 562: switch(ndx) { case 0: p = "cpulevel_t"; break; case 1: p = "cpuwhich_t"; break; case 2: p = "id_t"; break; case 3: p = "size_t"; break; case 4: p = "userland domainset_t *"; break; case 5: p = "int"; break; default: break; }; break; /* getrandom */ case 563: switch(ndx) { case 0: p = "userland void *"; break; case 1: p = "size_t"; break; case 2: p = "unsigned int"; break; default: break; }; break; /* getfhat */ case 564: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland char *"; break; case 2: p = "userland struct fhandle *"; break; case 3: p = "int"; break; default: break; }; break; /* fhlink */ case 565: switch(ndx) { case 0: p = "userland struct fhandle *"; break; case 1: p = "userland const char *"; break; default: break; }; break; /* fhlinkat */ case 566: switch(ndx) { case 0: p = "userland struct fhandle *"; break; case 1: p = "int"; break; case 2: p = "userland const char *"; break; default: break; }; break; /* fhreadlink */ case 567: switch(ndx) { case 0: p = "userland struct fhandle *"; break; case 1: p = "userland char *"; break; case 2: p = "size_t"; break; default: break; }; break; /* funlinkat */ case 568: switch(ndx) { case 0: p = "int"; break; case 1: p = "userland const char *"; break; case 2: p = "int"; break; case 3: p = "int"; break; default: break; }; break; default: break; }; if (p != NULL) strlcpy(desc, p, descsz); } static void systrace_return_setargdesc(int sysnum, int ndx, char *desc, size_t descsz) { const char *p = NULL; switch (sysnum) { /* nosys */ case 0: /* sys_exit */ case 1: if (ndx == 0 || ndx == 1) p = "void"; break; /* fork */ case 2: /* read */ case 3: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* write */ case 4: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* open */ case 5: if (ndx == 0 || ndx == 1) p = "int"; break; /* close */ case 6: if (ndx == 0 || ndx == 1) p = "int"; break; /* wait4 */ case 7: if (ndx == 0 || ndx == 1) p = "int"; break; /* link */ case 9: if (ndx == 0 || ndx == 1) p = "int"; break; /* unlink */ case 10: if (ndx == 0 || ndx == 1) p = "int"; break; /* chdir */ case 12: if (ndx == 0 || ndx == 1) p = "int"; break; /* fchdir */ case 13: if (ndx == 0 || ndx == 1) p = "int"; break; /* chmod */ case 15: if (ndx == 0 || ndx == 1) p = "int"; break; /* chown */ case 16: if (ndx == 0 || ndx == 1) p = "int"; break; /* break */ case 17: if (ndx == 0 || ndx == 1) p = "void *"; break; /* getpid */ case 20: /* mount */ case 21: if (ndx == 0 || ndx == 1) p = "int"; break; /* unmount */ case 22: if (ndx == 0 || ndx == 1) p = "int"; break; /* setuid */ case 23: if (ndx == 0 || ndx == 1) p = "int"; break; /* getuid */ case 24: /* geteuid */ case 25: /* ptrace */ case 26: if (ndx == 0 || ndx == 1) p = "int"; break; /* recvmsg */ case 27: if (ndx == 0 || ndx == 1) p = "int"; break; /* sendmsg */ case 28: if (ndx == 0 || ndx == 1) p = "int"; break; /* recvfrom */ case 29: if (ndx == 0 || ndx == 1) p = "int"; break; /* accept */ case 30: if (ndx == 0 || ndx == 1) p = "int"; break; /* getpeername */ case 31: if (ndx == 0 || ndx == 1) p = "int"; break; /* getsockname */ case 32: if (ndx == 0 || ndx == 1) p = "int"; break; /* access */ case 33: if (ndx == 0 || ndx == 1) p = "int"; break; /* chflags */ case 34: if (ndx == 0 || ndx == 1) p = "int"; break; /* fchflags */ case 35: if (ndx == 0 || ndx == 1) p = "int"; break; /* sync */ case 36: /* kill */ case 37: if (ndx == 0 || ndx == 1) p = "int"; break; /* 
getppid */ case 39: /* dup */ case 41: if (ndx == 0 || ndx == 1) p = "int"; break; /* getegid */ case 43: /* profil */ case 44: if (ndx == 0 || ndx == 1) p = "int"; break; /* ktrace */ case 45: if (ndx == 0 || ndx == 1) p = "int"; break; /* getgid */ case 47: /* getlogin */ case 49: if (ndx == 0 || ndx == 1) p = "int"; break; /* setlogin */ case 50: if (ndx == 0 || ndx == 1) p = "int"; break; /* acct */ case 51: if (ndx == 0 || ndx == 1) p = "int"; break; /* sigaltstack */ case 53: if (ndx == 0 || ndx == 1) p = "int"; break; /* ioctl */ case 54: if (ndx == 0 || ndx == 1) p = "int"; break; /* reboot */ case 55: if (ndx == 0 || ndx == 1) p = "int"; break; /* revoke */ case 56: if (ndx == 0 || ndx == 1) p = "int"; break; /* symlink */ case 57: if (ndx == 0 || ndx == 1) p = "int"; break; /* readlink */ case 58: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* execve */ case 59: if (ndx == 0 || ndx == 1) p = "int"; break; /* umask */ case 60: if (ndx == 0 || ndx == 1) p = "int"; break; /* chroot */ case 61: if (ndx == 0 || ndx == 1) p = "int"; break; /* msync */ case 65: if (ndx == 0 || ndx == 1) p = "int"; break; /* vfork */ case 66: /* sbrk */ case 69: if (ndx == 0 || ndx == 1) p = "int"; break; /* sstk */ case 70: if (ndx == 0 || ndx == 1) p = "int"; break; /* munmap */ case 73: if (ndx == 0 || ndx == 1) p = "int"; break; /* mprotect */ case 74: if (ndx == 0 || ndx == 1) p = "int"; break; /* madvise */ case 75: if (ndx == 0 || ndx == 1) p = "int"; break; /* mincore */ case 78: if (ndx == 0 || ndx == 1) p = "int"; break; /* getgroups */ case 79: if (ndx == 0 || ndx == 1) p = "int"; break; /* setgroups */ case 80: if (ndx == 0 || ndx == 1) p = "int"; break; /* getpgrp */ case 81: /* setpgid */ case 82: if (ndx == 0 || ndx == 1) p = "int"; break; /* setitimer */ case 83: if (ndx == 0 || ndx == 1) p = "int"; break; /* swapon */ case 85: if (ndx == 0 || ndx == 1) p = "int"; break; /* getitimer */ case 86: if (ndx == 0 || ndx == 1) p = "int"; break; /* getdtablesize */ case 89: /* dup2 */ case 90: if (ndx == 0 || ndx == 1) p = "int"; break; /* fcntl */ case 92: if (ndx == 0 || ndx == 1) p = "int"; break; /* select */ case 93: if (ndx == 0 || ndx == 1) p = "int"; break; /* fsync */ case 95: if (ndx == 0 || ndx == 1) p = "int"; break; /* setpriority */ case 96: if (ndx == 0 || ndx == 1) p = "int"; break; /* socket */ case 97: if (ndx == 0 || ndx == 1) p = "int"; break; /* connect */ case 98: if (ndx == 0 || ndx == 1) p = "int"; break; /* getpriority */ case 100: if (ndx == 0 || ndx == 1) p = "int"; break; /* bind */ case 104: if (ndx == 0 || ndx == 1) p = "int"; break; /* setsockopt */ case 105: if (ndx == 0 || ndx == 1) p = "int"; break; /* listen */ case 106: if (ndx == 0 || ndx == 1) p = "int"; break; /* gettimeofday */ case 116: if (ndx == 0 || ndx == 1) p = "int"; break; /* getrusage */ case 117: if (ndx == 0 || ndx == 1) p = "int"; break; /* getsockopt */ case 118: if (ndx == 0 || ndx == 1) p = "int"; break; /* readv */ case 120: if (ndx == 0 || ndx == 1) p = "int"; break; /* writev */ case 121: if (ndx == 0 || ndx == 1) p = "int"; break; /* settimeofday */ case 122: if (ndx == 0 || ndx == 1) p = "int"; break; /* fchown */ case 123: if (ndx == 0 || ndx == 1) p = "int"; break; /* fchmod */ case 124: if (ndx == 0 || ndx == 1) p = "int"; break; /* setreuid */ case 126: if (ndx == 0 || ndx == 1) p = "int"; break; /* setregid */ case 127: if (ndx == 0 || ndx == 1) p = "int"; break; /* rename */ case 128: if (ndx == 0 || ndx == 1) p = "int"; break; /* flock */ case 131: if (ndx == 0 || ndx == 1) p 
= "int"; break; /* mkfifo */ case 132: if (ndx == 0 || ndx == 1) p = "int"; break; /* sendto */ case 133: if (ndx == 0 || ndx == 1) p = "int"; break; /* shutdown */ case 134: if (ndx == 0 || ndx == 1) p = "int"; break; /* socketpair */ case 135: if (ndx == 0 || ndx == 1) p = "int"; break; /* mkdir */ case 136: if (ndx == 0 || ndx == 1) p = "int"; break; /* rmdir */ case 137: if (ndx == 0 || ndx == 1) p = "int"; break; /* utimes */ case 138: if (ndx == 0 || ndx == 1) p = "int"; break; /* adjtime */ case 140: if (ndx == 0 || ndx == 1) p = "int"; break; /* setsid */ case 147: /* quotactl */ case 148: if (ndx == 0 || ndx == 1) p = "int"; break; /* nlm_syscall */ case 154: if (ndx == 0 || ndx == 1) p = "int"; break; /* nfssvc */ case 155: if (ndx == 0 || ndx == 1) p = "int"; break; /* lgetfh */ case 160: if (ndx == 0 || ndx == 1) p = "int"; break; /* getfh */ case 161: if (ndx == 0 || ndx == 1) p = "int"; break; /* sysarch */ case 165: if (ndx == 0 || ndx == 1) p = "int"; break; /* rtprio */ case 166: if (ndx == 0 || ndx == 1) p = "int"; break; /* semsys */ case 169: if (ndx == 0 || ndx == 1) p = "int"; break; /* msgsys */ case 170: if (ndx == 0 || ndx == 1) p = "int"; break; /* shmsys */ case 171: if (ndx == 0 || ndx == 1) p = "int"; break; /* setfib */ case 175: if (ndx == 0 || ndx == 1) p = "int"; break; /* ntp_adjtime */ case 176: if (ndx == 0 || ndx == 1) p = "int"; break; /* setgid */ case 181: if (ndx == 0 || ndx == 1) p = "int"; break; /* setegid */ case 182: if (ndx == 0 || ndx == 1) p = "int"; break; /* seteuid */ case 183: if (ndx == 0 || ndx == 1) p = "int"; break; /* pathconf */ case 191: if (ndx == 0 || ndx == 1) p = "int"; break; /* fpathconf */ case 192: if (ndx == 0 || ndx == 1) p = "int"; break; /* getrlimit */ case 194: if (ndx == 0 || ndx == 1) p = "int"; break; /* setrlimit */ case 195: if (ndx == 0 || ndx == 1) p = "int"; break; /* nosys */ case 198: /* __sysctl */ case 202: if (ndx == 0 || ndx == 1) p = "int"; break; /* mlock */ case 203: if (ndx == 0 || ndx == 1) p = "int"; break; /* munlock */ case 204: if (ndx == 0 || ndx == 1) p = "int"; break; /* undelete */ case 205: if (ndx == 0 || ndx == 1) p = "int"; break; /* futimes */ case 206: if (ndx == 0 || ndx == 1) p = "int"; break; /* getpgid */ case 207: if (ndx == 0 || ndx == 1) p = "int"; break; /* poll */ case 209: if (ndx == 0 || ndx == 1) p = "int"; break; /* lkmnosys */ case 210: /* lkmnosys */ case 211: /* lkmnosys */ case 212: /* lkmnosys */ case 213: /* lkmnosys */ case 214: /* lkmnosys */ case 215: /* lkmnosys */ case 216: /* lkmnosys */ case 217: /* lkmnosys */ case 218: /* lkmnosys */ case 219: /* semget */ case 221: if (ndx == 0 || ndx == 1) p = "int"; break; /* semop */ case 222: if (ndx == 0 || ndx == 1) p = "int"; break; /* msgget */ case 225: if (ndx == 0 || ndx == 1) p = "int"; break; /* msgsnd */ case 226: if (ndx == 0 || ndx == 1) p = "int"; break; /* msgrcv */ case 227: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* shmat */ case 228: if (ndx == 0 || ndx == 1) p = "void *"; break; /* shmdt */ case 230: if (ndx == 0 || ndx == 1) p = "int"; break; /* shmget */ case 231: if (ndx == 0 || ndx == 1) p = "int"; break; /* clock_gettime */ case 232: if (ndx == 0 || ndx == 1) p = "int"; break; /* clock_settime */ case 233: if (ndx == 0 || ndx == 1) p = "int"; break; /* clock_getres */ case 234: if (ndx == 0 || ndx == 1) p = "int"; break; /* ktimer_create */ case 235: if (ndx == 0 || ndx == 1) p = "int"; break; /* ktimer_delete */ case 236: if (ndx == 0 || ndx == 1) p = "int"; break; /* ktimer_settime */ 
case 237: if (ndx == 0 || ndx == 1) p = "int"; break; /* ktimer_gettime */ case 238: if (ndx == 0 || ndx == 1) p = "int"; break; /* ktimer_getoverrun */ case 239: if (ndx == 0 || ndx == 1) p = "int"; break; /* nanosleep */ case 240: if (ndx == 0 || ndx == 1) p = "int"; break; /* ffclock_getcounter */ case 241: if (ndx == 0 || ndx == 1) p = "int"; break; /* ffclock_setestimate */ case 242: if (ndx == 0 || ndx == 1) p = "int"; break; /* ffclock_getestimate */ case 243: if (ndx == 0 || ndx == 1) p = "int"; break; /* clock_nanosleep */ case 244: if (ndx == 0 || ndx == 1) p = "int"; break; /* clock_getcpuclockid2 */ case 247: if (ndx == 0 || ndx == 1) p = "int"; break; /* ntp_gettime */ case 248: if (ndx == 0 || ndx == 1) p = "int"; break; /* minherit */ case 250: if (ndx == 0 || ndx == 1) p = "int"; break; /* rfork */ case 251: if (ndx == 0 || ndx == 1) p = "int"; break; /* issetugid */ case 253: /* lchown */ case 254: if (ndx == 0 || ndx == 1) p = "int"; break; /* aio_read */ case 255: if (ndx == 0 || ndx == 1) p = "int"; break; /* aio_write */ case 256: if (ndx == 0 || ndx == 1) p = "int"; break; /* lio_listio */ case 257: if (ndx == 0 || ndx == 1) p = "int"; break; /* lchmod */ case 274: if (ndx == 0 || ndx == 1) p = "int"; break; /* lutimes */ case 276: if (ndx == 0 || ndx == 1) p = "int"; break; /* preadv */ case 289: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* pwritev */ case 290: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* fhopen */ case 298: if (ndx == 0 || ndx == 1) p = "int"; break; /* modnext */ case 300: if (ndx == 0 || ndx == 1) p = "int"; break; /* modstat */ case 301: if (ndx == 0 || ndx == 1) p = "int"; break; /* modfnext */ case 302: if (ndx == 0 || ndx == 1) p = "int"; break; /* modfind */ case 303: if (ndx == 0 || ndx == 1) p = "int"; break; /* kldload */ case 304: if (ndx == 0 || ndx == 1) p = "int"; break; /* kldunload */ case 305: if (ndx == 0 || ndx == 1) p = "int"; break; /* kldfind */ case 306: if (ndx == 0 || ndx == 1) p = "int"; break; /* kldnext */ case 307: if (ndx == 0 || ndx == 1) p = "int"; break; /* kldstat */ case 308: if (ndx == 0 || ndx == 1) p = "int"; break; /* kldfirstmod */ case 309: if (ndx == 0 || ndx == 1) p = "int"; break; /* getsid */ case 310: if (ndx == 0 || ndx == 1) p = "int"; break; /* setresuid */ case 311: if (ndx == 0 || ndx == 1) p = "int"; break; /* setresgid */ case 312: if (ndx == 0 || ndx == 1) p = "int"; break; /* aio_return */ case 314: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* aio_suspend */ case 315: if (ndx == 0 || ndx == 1) p = "int"; break; /* aio_cancel */ case 316: if (ndx == 0 || ndx == 1) p = "int"; break; /* aio_error */ case 317: if (ndx == 0 || ndx == 1) p = "int"; break; /* yield */ case 321: /* mlockall */ case 324: if (ndx == 0 || ndx == 1) p = "int"; break; /* munlockall */ case 325: /* __getcwd */ case 326: if (ndx == 0 || ndx == 1) p = "int"; break; /* sched_setparam */ case 327: if (ndx == 0 || ndx == 1) p = "int"; break; /* sched_getparam */ case 328: if (ndx == 0 || ndx == 1) p = "int"; break; /* sched_setscheduler */ case 329: if (ndx == 0 || ndx == 1) p = "int"; break; /* sched_getscheduler */ case 330: if (ndx == 0 || ndx == 1) p = "int"; break; /* sched_yield */ case 331: /* sched_get_priority_max */ case 332: if (ndx == 0 || ndx == 1) p = "int"; break; /* sched_get_priority_min */ case 333: if (ndx == 0 || ndx == 1) p = "int"; break; /* sched_rr_get_interval */ case 334: if (ndx == 0 || ndx == 1) p = "int"; break; /* utrace */ case 335: if (ndx == 0 || ndx == 1) p = "int"; break; /* 
kldsym */ case 337: if (ndx == 0 || ndx == 1) p = "int"; break; /* jail */ case 338: if (ndx == 0 || ndx == 1) p = "int"; break; /* nnpfs_syscall */ case 339: if (ndx == 0 || ndx == 1) p = "int"; break; /* sigprocmask */ case 340: if (ndx == 0 || ndx == 1) p = "int"; break; /* sigsuspend */ case 341: if (ndx == 0 || ndx == 1) p = "int"; break; /* sigpending */ case 343: if (ndx == 0 || ndx == 1) p = "int"; break; /* sigtimedwait */ case 345: if (ndx == 0 || ndx == 1) p = "int"; break; /* sigwaitinfo */ case 346: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_get_file */ case 347: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_set_file */ case 348: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_get_fd */ case 349: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_set_fd */ case 350: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_delete_file */ case 351: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_delete_fd */ case 352: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_aclcheck_file */ case 353: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_aclcheck_fd */ case 354: if (ndx == 0 || ndx == 1) p = "int"; break; /* extattrctl */ case 355: if (ndx == 0 || ndx == 1) p = "int"; break; /* extattr_set_file */ case 356: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* extattr_get_file */ case 357: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* extattr_delete_file */ case 358: if (ndx == 0 || ndx == 1) p = "int"; break; /* aio_waitcomplete */ case 359: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* getresuid */ case 360: if (ndx == 0 || ndx == 1) p = "int"; break; /* getresgid */ case 361: if (ndx == 0 || ndx == 1) p = "int"; break; /* kqueue */ case 362: /* extattr_set_fd */ case 371: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* extattr_get_fd */ case 372: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* extattr_delete_fd */ case 373: if (ndx == 0 || ndx == 1) p = "int"; break; /* __setugid */ case 374: if (ndx == 0 || ndx == 1) p = "int"; break; /* eaccess */ case 376: if (ndx == 0 || ndx == 1) p = "int"; break; /* afs3_syscall */ case 377: if (ndx == 0 || ndx == 1) p = "int"; break; /* nmount */ case 378: if (ndx == 0 || ndx == 1) p = "int"; break; /* __mac_get_proc */ case 384: if (ndx == 0 || ndx == 1) p = "int"; break; /* __mac_set_proc */ case 385: if (ndx == 0 || ndx == 1) p = "int"; break; /* __mac_get_fd */ case 386: if (ndx == 0 || ndx == 1) p = "int"; break; /* __mac_get_file */ case 387: if (ndx == 0 || ndx == 1) p = "int"; break; /* __mac_set_fd */ case 388: if (ndx == 0 || ndx == 1) p = "int"; break; /* __mac_set_file */ case 389: if (ndx == 0 || ndx == 1) p = "int"; break; /* kenv */ case 390: if (ndx == 0 || ndx == 1) p = "int"; break; /* lchflags */ case 391: if (ndx == 0 || ndx == 1) p = "int"; break; /* uuidgen */ case 392: if (ndx == 0 || ndx == 1) p = "int"; break; /* sendfile */ case 393: if (ndx == 0 || ndx == 1) p = "int"; break; /* mac_syscall */ case 394: if (ndx == 0 || ndx == 1) p = "int"; break; /* ksem_close */ case 400: if (ndx == 0 || ndx == 1) p = "int"; break; /* ksem_post */ case 401: if (ndx == 0 || ndx == 1) p = "int"; break; /* ksem_wait */ case 402: if (ndx == 0 || ndx == 1) p = "int"; break; /* ksem_trywait */ case 403: if (ndx == 0 || ndx == 1) p = "int"; break; /* ksem_init */ case 404: if (ndx == 0 || ndx == 1) p = "int"; break; /* ksem_open */ case 405: if (ndx == 0 || ndx == 1) p = "int"; break; /* ksem_unlink */ case 406: if (ndx == 0 || ndx == 1) p = "int"; break; /* ksem_getvalue */ case 
407: if (ndx == 0 || ndx == 1) p = "int"; break; /* ksem_destroy */ case 408: if (ndx == 0 || ndx == 1) p = "int"; break; /* __mac_get_pid */ case 409: if (ndx == 0 || ndx == 1) p = "int"; break; /* __mac_get_link */ case 410: if (ndx == 0 || ndx == 1) p = "int"; break; /* __mac_set_link */ case 411: if (ndx == 0 || ndx == 1) p = "int"; break; /* extattr_set_link */ case 412: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* extattr_get_link */ case 413: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* extattr_delete_link */ case 414: if (ndx == 0 || ndx == 1) p = "int"; break; /* __mac_execve */ case 415: if (ndx == 0 || ndx == 1) p = "int"; break; /* sigaction */ case 416: if (ndx == 0 || ndx == 1) p = "int"; break; /* sigreturn */ case 417: if (ndx == 0 || ndx == 1) p = "int"; break; /* getcontext */ case 421: if (ndx == 0 || ndx == 1) p = "int"; break; /* setcontext */ case 422: if (ndx == 0 || ndx == 1) p = "int"; break; /* swapcontext */ case 423: if (ndx == 0 || ndx == 1) p = "int"; break; /* swapoff */ case 424: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_get_link */ case 425: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_set_link */ case 426: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_delete_link */ case 427: if (ndx == 0 || ndx == 1) p = "int"; break; /* __acl_aclcheck_link */ case 428: if (ndx == 0 || ndx == 1) p = "int"; break; /* sigwait */ case 429: if (ndx == 0 || ndx == 1) p = "int"; break; /* thr_create */ case 430: if (ndx == 0 || ndx == 1) p = "int"; break; /* thr_exit */ case 431: if (ndx == 0 || ndx == 1) p = "void"; break; /* thr_self */ case 432: if (ndx == 0 || ndx == 1) p = "int"; break; /* thr_kill */ case 433: if (ndx == 0 || ndx == 1) p = "int"; break; /* jail_attach */ case 436: if (ndx == 0 || ndx == 1) p = "int"; break; /* extattr_list_fd */ case 437: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* extattr_list_file */ case 438: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* extattr_list_link */ case 439: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* ksem_timedwait */ case 441: if (ndx == 0 || ndx == 1) p = "int"; break; /* thr_suspend */ case 442: if (ndx == 0 || ndx == 1) p = "int"; break; /* thr_wake */ case 443: if (ndx == 0 || ndx == 1) p = "int"; break; /* kldunloadf */ case 444: if (ndx == 0 || ndx == 1) p = "int"; break; /* audit */ case 445: if (ndx == 0 || ndx == 1) p = "int"; break; /* auditon */ case 446: if (ndx == 0 || ndx == 1) p = "int"; break; /* getauid */ case 447: if (ndx == 0 || ndx == 1) p = "int"; break; /* setauid */ case 448: if (ndx == 0 || ndx == 1) p = "int"; break; /* getaudit */ case 449: if (ndx == 0 || ndx == 1) p = "int"; break; /* setaudit */ case 450: if (ndx == 0 || ndx == 1) p = "int"; break; /* getaudit_addr */ case 451: if (ndx == 0 || ndx == 1) p = "int"; break; /* setaudit_addr */ case 452: if (ndx == 0 || ndx == 1) p = "int"; break; /* auditctl */ case 453: if (ndx == 0 || ndx == 1) p = "int"; break; /* _umtx_op */ case 454: if (ndx == 0 || ndx == 1) p = "int"; break; /* thr_new */ case 455: if (ndx == 0 || ndx == 1) p = "int"; break; /* sigqueue */ case 456: if (ndx == 0 || ndx == 1) p = "int"; break; /* kmq_open */ case 457: if (ndx == 0 || ndx == 1) p = "int"; break; /* kmq_setattr */ case 458: if (ndx == 0 || ndx == 1) p = "int"; break; /* kmq_timedreceive */ case 459: if (ndx == 0 || ndx == 1) p = "int"; break; /* kmq_timedsend */ case 460: if (ndx == 0 || ndx == 1) p = "int"; break; /* kmq_notify */ case 461: if (ndx == 0 || ndx == 1) p = "int"; break; /* kmq_unlink */ 
case 462: if (ndx == 0 || ndx == 1) p = "int"; break; /* abort2 */ case 463: if (ndx == 0 || ndx == 1) p = "int"; break; /* thr_set_name */ case 464: if (ndx == 0 || ndx == 1) p = "int"; break; /* aio_fsync */ case 465: if (ndx == 0 || ndx == 1) p = "int"; break; /* rtprio_thread */ case 466: if (ndx == 0 || ndx == 1) p = "int"; break; /* sctp_peeloff */ case 471: if (ndx == 0 || ndx == 1) p = "int"; break; /* sctp_generic_sendmsg */ case 472: if (ndx == 0 || ndx == 1) p = "int"; break; /* sctp_generic_sendmsg_iov */ case 473: if (ndx == 0 || ndx == 1) p = "int"; break; /* sctp_generic_recvmsg */ case 474: if (ndx == 0 || ndx == 1) p = "int"; break; /* pread */ case 475: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* pwrite */ case 476: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* mmap */ case 477: if (ndx == 0 || ndx == 1) p = "void *"; break; /* lseek */ case 478: if (ndx == 0 || ndx == 1) p = "off_t"; break; /* truncate */ case 479: if (ndx == 0 || ndx == 1) p = "int"; break; /* ftruncate */ case 480: if (ndx == 0 || ndx == 1) p = "int"; break; /* thr_kill2 */ case 481: if (ndx == 0 || ndx == 1) p = "int"; break; /* shm_open */ case 482: if (ndx == 0 || ndx == 1) p = "int"; break; /* shm_unlink */ case 483: if (ndx == 0 || ndx == 1) p = "int"; break; /* cpuset */ case 484: if (ndx == 0 || ndx == 1) p = "int"; break; /* cpuset_setid */ case 485: if (ndx == 0 || ndx == 1) p = "int"; break; /* cpuset_getid */ case 486: if (ndx == 0 || ndx == 1) p = "int"; break; /* cpuset_getaffinity */ case 487: if (ndx == 0 || ndx == 1) p = "int"; break; /* cpuset_setaffinity */ case 488: if (ndx == 0 || ndx == 1) p = "int"; break; /* faccessat */ case 489: if (ndx == 0 || ndx == 1) p = "int"; break; /* fchmodat */ case 490: if (ndx == 0 || ndx == 1) p = "int"; break; /* fchownat */ case 491: if (ndx == 0 || ndx == 1) p = "int"; break; /* fexecve */ case 492: if (ndx == 0 || ndx == 1) p = "int"; break; /* futimesat */ case 494: if (ndx == 0 || ndx == 1) p = "int"; break; /* linkat */ case 495: if (ndx == 0 || ndx == 1) p = "int"; break; /* mkdirat */ case 496: if (ndx == 0 || ndx == 1) p = "int"; break; /* mkfifoat */ case 497: if (ndx == 0 || ndx == 1) p = "int"; break; /* openat */ case 499: if (ndx == 0 || ndx == 1) p = "int"; break; /* readlinkat */ case 500: if (ndx == 0 || ndx == 1) - p = "int"; + p = "ssize_t"; break; /* renameat */ case 501: if (ndx == 0 || ndx == 1) p = "int"; break; /* symlinkat */ case 502: if (ndx == 0 || ndx == 1) p = "int"; break; /* unlinkat */ case 503: if (ndx == 0 || ndx == 1) p = "int"; break; /* posix_openpt */ case 504: if (ndx == 0 || ndx == 1) p = "int"; break; /* gssd_syscall */ case 505: if (ndx == 0 || ndx == 1) p = "int"; break; /* jail_get */ case 506: if (ndx == 0 || ndx == 1) p = "int"; break; /* jail_set */ case 507: if (ndx == 0 || ndx == 1) p = "int"; break; /* jail_remove */ case 508: if (ndx == 0 || ndx == 1) p = "int"; break; /* closefrom */ case 509: if (ndx == 0 || ndx == 1) p = "int"; break; /* __semctl */ case 510: if (ndx == 0 || ndx == 1) p = "int"; break; /* msgctl */ case 511: if (ndx == 0 || ndx == 1) p = "int"; break; /* shmctl */ case 512: if (ndx == 0 || ndx == 1) p = "int"; break; /* lpathconf */ case 513: if (ndx == 0 || ndx == 1) p = "int"; break; /* __cap_rights_get */ case 515: if (ndx == 0 || ndx == 1) p = "int"; break; /* cap_enter */ case 516: /* cap_getmode */ case 517: if (ndx == 0 || ndx == 1) p = "int"; break; /* pdfork */ case 518: if (ndx == 0 || ndx == 1) p = "int"; break; /* pdkill */ case 519: if (ndx == 0 || ndx 
== 1) p = "int"; break; /* pdgetpid */ case 520: if (ndx == 0 || ndx == 1) p = "int"; break; /* pselect */ case 522: if (ndx == 0 || ndx == 1) p = "int"; break; /* getloginclass */ case 523: if (ndx == 0 || ndx == 1) p = "int"; break; /* setloginclass */ case 524: if (ndx == 0 || ndx == 1) p = "int"; break; /* rctl_get_racct */ case 525: if (ndx == 0 || ndx == 1) p = "int"; break; /* rctl_get_rules */ case 526: if (ndx == 0 || ndx == 1) p = "int"; break; /* rctl_get_limits */ case 527: if (ndx == 0 || ndx == 1) p = "int"; break; /* rctl_add_rule */ case 528: if (ndx == 0 || ndx == 1) p = "int"; break; /* rctl_remove_rule */ case 529: if (ndx == 0 || ndx == 1) p = "int"; break; /* posix_fallocate */ case 530: if (ndx == 0 || ndx == 1) p = "int"; break; /* posix_fadvise */ case 531: if (ndx == 0 || ndx == 1) p = "int"; break; /* wait6 */ case 532: if (ndx == 0 || ndx == 1) p = "int"; break; /* cap_rights_limit */ case 533: if (ndx == 0 || ndx == 1) p = "int"; break; /* cap_ioctls_limit */ case 534: if (ndx == 0 || ndx == 1) p = "int"; break; /* cap_ioctls_get */ case 535: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* cap_fcntls_limit */ case 536: if (ndx == 0 || ndx == 1) p = "int"; break; /* cap_fcntls_get */ case 537: if (ndx == 0 || ndx == 1) p = "int"; break; /* bindat */ case 538: if (ndx == 0 || ndx == 1) p = "int"; break; /* connectat */ case 539: if (ndx == 0 || ndx == 1) p = "int"; break; /* chflagsat */ case 540: if (ndx == 0 || ndx == 1) p = "int"; break; /* accept4 */ case 541: if (ndx == 0 || ndx == 1) p = "int"; break; /* pipe2 */ case 542: if (ndx == 0 || ndx == 1) p = "int"; break; /* aio_mlock */ case 543: if (ndx == 0 || ndx == 1) p = "int"; break; /* procctl */ case 544: if (ndx == 0 || ndx == 1) p = "int"; break; /* ppoll */ case 545: if (ndx == 0 || ndx == 1) p = "int"; break; /* futimens */ case 546: if (ndx == 0 || ndx == 1) p = "int"; break; /* utimensat */ case 547: if (ndx == 0 || ndx == 1) p = "int"; break; /* fdatasync */ case 550: if (ndx == 0 || ndx == 1) p = "int"; break; /* fstat */ case 551: if (ndx == 0 || ndx == 1) p = "int"; break; /* fstatat */ case 552: if (ndx == 0 || ndx == 1) p = "int"; break; /* fhstat */ case 553: if (ndx == 0 || ndx == 1) p = "int"; break; /* getdirentries */ case 554: if (ndx == 0 || ndx == 1) p = "ssize_t"; break; /* statfs */ case 555: if (ndx == 0 || ndx == 1) p = "int"; break; /* fstatfs */ case 556: if (ndx == 0 || ndx == 1) p = "int"; break; /* getfsstat */ case 557: if (ndx == 0 || ndx == 1) p = "int"; break; /* fhstatfs */ case 558: if (ndx == 0 || ndx == 1) p = "int"; break; /* mknodat */ case 559: if (ndx == 0 || ndx == 1) p = "int"; break; /* kevent */ case 560: if (ndx == 0 || ndx == 1) p = "int"; break; /* cpuset_getdomain */ case 561: if (ndx == 0 || ndx == 1) p = "int"; break; /* cpuset_setdomain */ case 562: if (ndx == 0 || ndx == 1) p = "int"; break; /* getrandom */ case 563: if (ndx == 0 || ndx == 1) p = "int"; break; /* getfhat */ case 564: if (ndx == 0 || ndx == 1) p = "int"; break; /* fhlink */ case 565: if (ndx == 0 || ndx == 1) p = "int"; break; /* fhlinkat */ case 566: if (ndx == 0 || ndx == 1) p = "int"; break; /* fhreadlink */ case 567: if (ndx == 0 || ndx == 1) p = "int"; break; /* funlinkat */ case 568: if (ndx == 0 || ndx == 1) p = "int"; break; default: break; }; if (p != NULL) strlcpy(desc, p, descsz); } Index: user/ngie/bug-237403/sys/kern/uipc_mbuf.c =================================================================== --- user/ngie/bug-237403/sys/kern/uipc_mbuf.c (revision 346925) +++ 
user/ngie/bug-237403/sys/kern/uipc_mbuf.c (revision 346926) @@ -1,1872 +1,1875 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1982, 1986, 1988, 1991, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)uipc_mbuf.c 8.2 (Berkeley) 1/4/94 */ #include __FBSDID("$FreeBSD$"); #include "opt_param.h" #include "opt_mbuf_stress_test.h" #include "opt_mbuf_profiling.h" #include #include #include #include #include #include #include #include #include #include #include #include SDT_PROBE_DEFINE5_XLATE(sdt, , , m__init, "struct mbuf *", "mbufinfo_t *", "uint32_t", "uint32_t", "uint16_t", "uint16_t", "uint32_t", "uint32_t", "uint32_t", "uint32_t"); SDT_PROBE_DEFINE3_XLATE(sdt, , , m__gethdr, "uint32_t", "uint32_t", "uint16_t", "uint16_t", "struct mbuf *", "mbufinfo_t *"); SDT_PROBE_DEFINE3_XLATE(sdt, , , m__get, "uint32_t", "uint32_t", "uint16_t", "uint16_t", "struct mbuf *", "mbufinfo_t *"); SDT_PROBE_DEFINE4_XLATE(sdt, , , m__getcl, "uint32_t", "uint32_t", "uint16_t", "uint16_t", "uint32_t", "uint32_t", "struct mbuf *", "mbufinfo_t *"); SDT_PROBE_DEFINE3_XLATE(sdt, , , m__clget, "struct mbuf *", "mbufinfo_t *", "uint32_t", "uint32_t", "uint32_t", "uint32_t"); SDT_PROBE_DEFINE4_XLATE(sdt, , , m__cljget, "struct mbuf *", "mbufinfo_t *", "uint32_t", "uint32_t", "uint32_t", "uint32_t", "void*", "void*"); SDT_PROBE_DEFINE(sdt, , , m__cljset); SDT_PROBE_DEFINE1_XLATE(sdt, , , m__free, "struct mbuf *", "mbufinfo_t *"); SDT_PROBE_DEFINE1_XLATE(sdt, , , m__freem, "struct mbuf *", "mbufinfo_t *"); #include int max_linkhdr; int max_protohdr; int max_hdr; int max_datalen; #ifdef MBUF_STRESS_TEST int m_defragpackets; int m_defragbytes; int m_defraguseless; int m_defragfailure; int m_defragrandomfailures; #endif /* * sysctl(8) exported objects */ SYSCTL_INT(_kern_ipc, KIPC_MAX_LINKHDR, max_linkhdr, CTLFLAG_RD, &max_linkhdr, 0, "Size of largest link layer header"); SYSCTL_INT(_kern_ipc, KIPC_MAX_PROTOHDR, max_protohdr, CTLFLAG_RD, &max_protohdr, 0, "Size of largest protocol layer header"); SYSCTL_INT(_kern_ipc, KIPC_MAX_HDR, max_hdr, CTLFLAG_RD, 
&max_hdr, 0, "Size of largest link plus protocol header"); SYSCTL_INT(_kern_ipc, KIPC_MAX_DATALEN, max_datalen, CTLFLAG_RD, &max_datalen, 0, "Minimum space left in mbuf after max_hdr"); #ifdef MBUF_STRESS_TEST SYSCTL_INT(_kern_ipc, OID_AUTO, m_defragpackets, CTLFLAG_RD, &m_defragpackets, 0, ""); SYSCTL_INT(_kern_ipc, OID_AUTO, m_defragbytes, CTLFLAG_RD, &m_defragbytes, 0, ""); SYSCTL_INT(_kern_ipc, OID_AUTO, m_defraguseless, CTLFLAG_RD, &m_defraguseless, 0, ""); SYSCTL_INT(_kern_ipc, OID_AUTO, m_defragfailure, CTLFLAG_RD, &m_defragfailure, 0, ""); SYSCTL_INT(_kern_ipc, OID_AUTO, m_defragrandomfailures, CTLFLAG_RW, &m_defragrandomfailures, 0, ""); #endif /* * Ensure the correct size of various mbuf parameters. It could be off due * to compiler-induced padding and alignment artifacts. */ CTASSERT(MSIZE - offsetof(struct mbuf, m_dat) == MLEN); CTASSERT(MSIZE - offsetof(struct mbuf, m_pktdat) == MHLEN); /* * mbuf data storage should be 64-bit aligned regardless of architectural * pointer size; check this is the case with and without a packet header. */ CTASSERT(offsetof(struct mbuf, m_dat) % 8 == 0); CTASSERT(offsetof(struct mbuf, m_pktdat) % 8 == 0); /* * While the specific values here don't matter too much (i.e., +/- a few * words), we do want to ensure that changes to these values are carefully * reasoned about and properly documented. This is especially the case as * network-protocol and device-driver modules encode these layouts, and must * be recompiled if the structures change. Check these values at compile time * against the ones documented in comments in mbuf.h. * * NB: Possibly they should be documented there via #define's and not just * comments. */ #if defined(__LP64__) CTASSERT(offsetof(struct mbuf, m_dat) == 32); CTASSERT(sizeof(struct pkthdr) == 56); CTASSERT(sizeof(struct m_ext) == 48); #else CTASSERT(offsetof(struct mbuf, m_dat) == 24); CTASSERT(sizeof(struct pkthdr) == 48); CTASSERT(sizeof(struct m_ext) == 28); #endif /* * Assert that the queue(3) macros produce code of the same size as an old * plain pointer does. */ #ifdef INVARIANTS static struct mbuf __used m_assertbuf; CTASSERT(sizeof(m_assertbuf.m_slist) == sizeof(m_assertbuf.m_next)); CTASSERT(sizeof(m_assertbuf.m_stailq) == sizeof(m_assertbuf.m_next)); CTASSERT(sizeof(m_assertbuf.m_slistpkt) == sizeof(m_assertbuf.m_nextpkt)); CTASSERT(sizeof(m_assertbuf.m_stailqpkt) == sizeof(m_assertbuf.m_nextpkt)); #endif /* * Attach the cluster from *m to *n, set up m_ext in *n * and bump the refcount of the cluster. */ void mb_dupcl(struct mbuf *n, struct mbuf *m) { volatile u_int *refcnt; KASSERT(m->m_flags & M_EXT, ("%s: M_EXT not set on %p", __func__, m)); KASSERT(!(n->m_flags & M_EXT), ("%s: M_EXT set on %p", __func__, n)); /* * Cache access optimization. For most kinds of external * storage we don't need full copy of m_ext, since the * holder of the 'ext_count' is responsible to carry the * free routine and its arguments. Exclusion is EXT_EXTREF, * where 'ext_cnt' doesn't point into mbuf at all. */ if (m->m_ext.ext_type == EXT_EXTREF) bcopy(&m->m_ext, &n->m_ext, sizeof(struct m_ext)); else bcopy(&m->m_ext, &n->m_ext, m_ext_copylen); n->m_flags |= M_EXT; n->m_flags |= m->m_flags & M_RDONLY; /* See if this is the mbuf that holds the embedded refcount. 
*/ if (m->m_ext.ext_flags & EXT_FLAG_EMBREF) { refcnt = n->m_ext.ext_cnt = &m->m_ext.ext_count; n->m_ext.ext_flags &= ~EXT_FLAG_EMBREF; } else { KASSERT(m->m_ext.ext_cnt != NULL, ("%s: no refcounting pointer on %p", __func__, m)); refcnt = m->m_ext.ext_cnt; } if (*refcnt == 1) *refcnt += 1; else atomic_add_int(refcnt, 1); } void m_demote_pkthdr(struct mbuf *m) { M_ASSERTPKTHDR(m); m_tag_delete_chain(m, NULL); m->m_flags &= ~M_PKTHDR; bzero(&m->m_pkthdr, sizeof(struct pkthdr)); } /* * Clean up mbuf (chain) from any tags and packet headers. * If "all" is set then the first mbuf in the chain will be * cleaned too. */ void m_demote(struct mbuf *m0, int all, int flags) { struct mbuf *m; for (m = all ? m0 : m0->m_next; m != NULL; m = m->m_next) { KASSERT(m->m_nextpkt == NULL, ("%s: m_nextpkt in m %p, m0 %p", __func__, m, m0)); if (m->m_flags & M_PKTHDR) m_demote_pkthdr(m); m->m_flags = m->m_flags & (M_EXT | M_RDONLY | M_NOFREE | flags); } } /* * Sanity checks on mbuf (chain) for use in KASSERT() and general * debugging. * Returns 0 or panics when bad and 1 on all tests passed. * Sanitize, 0 to run M_SANITY_ACTION, 1 to garble things so they * blow up later. */ int m_sanity(struct mbuf *m0, int sanitize) { struct mbuf *m; caddr_t a, b; int pktlen = 0; #ifdef INVARIANTS #define M_SANITY_ACTION(s) panic("mbuf %p: " s, m) #else #define M_SANITY_ACTION(s) printf("mbuf %p: " s, m) #endif for (m = m0; m != NULL; m = m->m_next) { /* * Basic pointer checks. If any of these fails then some * unrelated kernel memory before or after us is trashed. * No way to recover from that. */ a = M_START(m); b = a + M_SIZE(m); if ((caddr_t)m->m_data < a) M_SANITY_ACTION("m_data outside mbuf data range left"); if ((caddr_t)m->m_data > b) M_SANITY_ACTION("m_data outside mbuf data range right"); if ((caddr_t)m->m_data + m->m_len > b) M_SANITY_ACTION("m_data + m_len exeeds mbuf space"); /* m->m_nextpkt may only be set on first mbuf in chain. */ if (m != m0 && m->m_nextpkt != NULL) { if (sanitize) { m_freem(m->m_nextpkt); m->m_nextpkt = (struct mbuf *)0xDEADC0DE; } else M_SANITY_ACTION("m->m_nextpkt on in-chain mbuf"); } /* packet length (not mbuf length!) calculation */ if (m0->m_flags & M_PKTHDR) pktlen += m->m_len; /* m_tags may only be attached to first mbuf in chain. */ if (m != m0 && m->m_flags & M_PKTHDR && !SLIST_EMPTY(&m->m_pkthdr.tags)) { if (sanitize) { m_tag_delete_chain(m, NULL); /* put in 0xDEADC0DE perhaps? */ } else M_SANITY_ACTION("m_tags on in-chain mbuf"); } /* M_PKTHDR may only be set on first mbuf in chain */ if (m != m0 && m->m_flags & M_PKTHDR) { if (sanitize) { bzero(&m->m_pkthdr, sizeof(m->m_pkthdr)); m->m_flags &= ~M_PKTHDR; /* put in 0xDEADCODE and leave hdr flag in */ } else M_SANITY_ACTION("M_PKTHDR on in-chain mbuf"); } } m = m0; if (pktlen && pktlen != m->m_pkthdr.len) { if (sanitize) m->m_pkthdr.len = 0; else M_SANITY_ACTION("m_pkthdr.len != mbuf chain length"); } return 1; #undef M_SANITY_ACTION } /* * Non-inlined part of m_init(). */ int m_pkthdr_init(struct mbuf *m, int how) { #ifdef MAC int error; #endif m->m_data = m->m_pktdat; bzero(&m->m_pkthdr, sizeof(m->m_pkthdr)); +#ifdef NUMA + m->m_pkthdr.numa_domain = M_NODOM; +#endif #ifdef MAC /* If the label init fails, fail the alloc */ error = mac_mbuf_init(m, how); if (error) return (error); #endif return (0); } /* * "Move" mbuf pkthdr from "from" to "to". * "from" must have M_PKTHDR set, and "to" must be empty. 
*/ void m_move_pkthdr(struct mbuf *to, struct mbuf *from) { #if 0 /* see below for why these are not enabled */ M_ASSERTPKTHDR(to); /* Note: with MAC, this may not be a good assertion. */ KASSERT(SLIST_EMPTY(&to->m_pkthdr.tags), ("m_move_pkthdr: to has tags")); #endif #ifdef MAC /* * XXXMAC: It could be this should also occur for non-MAC? */ if (to->m_flags & M_PKTHDR) m_tag_delete_chain(to, NULL); #endif to->m_flags = (from->m_flags & M_COPYFLAGS) | (to->m_flags & M_EXT); if ((to->m_flags & M_EXT) == 0) to->m_data = to->m_pktdat; to->m_pkthdr = from->m_pkthdr; /* especially tags */ SLIST_INIT(&from->m_pkthdr.tags); /* purge tags from src */ from->m_flags &= ~M_PKTHDR; } /* * Duplicate "from"'s mbuf pkthdr in "to". * "from" must have M_PKTHDR set, and "to" must be empty. * In particular, this does a deep copy of the packet tags. */ int m_dup_pkthdr(struct mbuf *to, const struct mbuf *from, int how) { #if 0 /* * The mbuf allocator only initializes the pkthdr * when the mbuf is allocated with m_gethdr(). Many users * (e.g. m_copy*, m_prepend) use m_get() and then * smash the pkthdr as needed causing these * assertions to trip. For now just disable them. */ M_ASSERTPKTHDR(to); /* Note: with MAC, this may not be a good assertion. */ KASSERT(SLIST_EMPTY(&to->m_pkthdr.tags), ("m_dup_pkthdr: to has tags")); #endif MBUF_CHECKSLEEP(how); #ifdef MAC if (to->m_flags & M_PKTHDR) m_tag_delete_chain(to, NULL); #endif to->m_flags = (from->m_flags & M_COPYFLAGS) | (to->m_flags & M_EXT); if ((to->m_flags & M_EXT) == 0) to->m_data = to->m_pktdat; to->m_pkthdr = from->m_pkthdr; SLIST_INIT(&to->m_pkthdr.tags); return (m_tag_copy_chain(to, from, how)); } /* * Lesser-used path for M_PREPEND: * allocate new mbuf to prepend to chain, * copy junk along. */ struct mbuf * m_prepend(struct mbuf *m, int len, int how) { struct mbuf *mn; if (m->m_flags & M_PKTHDR) mn = m_gethdr(how, m->m_type); else mn = m_get(how, m->m_type); if (mn == NULL) { m_freem(m); return (NULL); } if (m->m_flags & M_PKTHDR) m_move_pkthdr(mn, m); mn->m_next = m; m = mn; if (len < M_SIZE(m)) M_ALIGN(m, len); m->m_len = len; return (m); } /* * Make a copy of an mbuf chain starting "off0" bytes from the beginning, * continuing for "len" bytes. If len is M_COPYALL, copy to end of mbuf. * The wait parameter is a choice of M_WAITOK/M_NOWAIT from caller. * Note that the copy is read-only, because clusters are not copied, * only their reference counts are incremented. 
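 *
 * A minimal usage sketch (illustrative only, not part of this revision):
 * copy the first "hlen" bytes of a packet for inspection and release the
 * copy afterwards.  The variable names and "hlen" are hypothetical; only
 * m_copym() and m_freem() are interfaces defined in this file.
 *
 *	struct mbuf *hdr;
 *
 *	hdr = m_copym(m, 0, hlen, M_NOWAIT);
 *	if (hdr == NULL)
 *		return (ENOBUFS);	(allocation failed; "m" itself is not freed)
 *	(... inspect hdr; it may share read-only clusters with "m" ...)
 *	m_freem(hdr);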
*/ struct mbuf * m_copym(struct mbuf *m, int off0, int len, int wait) { struct mbuf *n, **np; int off = off0; struct mbuf *top; int copyhdr = 0; KASSERT(off >= 0, ("m_copym, negative off %d", off)); KASSERT(len >= 0, ("m_copym, negative len %d", len)); MBUF_CHECKSLEEP(wait); if (off == 0 && m->m_flags & M_PKTHDR) copyhdr = 1; while (off > 0) { KASSERT(m != NULL, ("m_copym, offset > size of mbuf chain")); if (off < m->m_len) break; off -= m->m_len; m = m->m_next; } np = &top; top = NULL; while (len > 0) { if (m == NULL) { KASSERT(len == M_COPYALL, ("m_copym, length > size of mbuf chain")); break; } if (copyhdr) n = m_gethdr(wait, m->m_type); else n = m_get(wait, m->m_type); *np = n; if (n == NULL) goto nospace; if (copyhdr) { if (!m_dup_pkthdr(n, m, wait)) goto nospace; if (len == M_COPYALL) n->m_pkthdr.len -= off0; else n->m_pkthdr.len = len; copyhdr = 0; } n->m_len = min(len, m->m_len - off); if (m->m_flags & M_EXT) { n->m_data = m->m_data + off; mb_dupcl(n, m); } else bcopy(mtod(m, caddr_t)+off, mtod(n, caddr_t), (u_int)n->m_len); if (len != M_COPYALL) len -= n->m_len; off = 0; m = m->m_next; np = &n->m_next; } return (top); nospace: m_freem(top); return (NULL); } /* * Copy an entire packet, including header (which must be present). * An optimization of the common case `m_copym(m, 0, M_COPYALL, how)'. * Note that the copy is read-only, because clusters are not copied, * only their reference counts are incremented. * Preserve alignment of the first mbuf so if the creator has left * some room at the beginning (e.g. for inserting protocol headers) * the copies still have the room available. */ struct mbuf * m_copypacket(struct mbuf *m, int how) { struct mbuf *top, *n, *o; MBUF_CHECKSLEEP(how); n = m_get(how, m->m_type); top = n; if (n == NULL) goto nospace; if (!m_dup_pkthdr(n, m, how)) goto nospace; n->m_len = m->m_len; if (m->m_flags & M_EXT) { n->m_data = m->m_data; mb_dupcl(n, m); } else { n->m_data = n->m_pktdat + (m->m_data - m->m_pktdat ); bcopy(mtod(m, char *), mtod(n, char *), n->m_len); } m = m->m_next; while (m) { o = m_get(how, m->m_type); if (o == NULL) goto nospace; n->m_next = o; n = n->m_next; n->m_len = m->m_len; if (m->m_flags & M_EXT) { n->m_data = m->m_data; mb_dupcl(n, m); } else { bcopy(mtod(m, char *), mtod(n, char *), n->m_len); } m = m->m_next; } return top; nospace: m_freem(top); return (NULL); } /* * Copy data from an mbuf chain starting "off" bytes from the beginning, * continuing for "len" bytes, into the indicated buffer. */ void m_copydata(const struct mbuf *m, int off, int len, caddr_t cp) { u_int count; KASSERT(off >= 0, ("m_copydata, negative off %d", off)); KASSERT(len >= 0, ("m_copydata, negative len %d", len)); while (off > 0) { KASSERT(m != NULL, ("m_copydata, offset > size of mbuf chain")); if (off < m->m_len) break; off -= m->m_len; m = m->m_next; } while (len > 0) { KASSERT(m != NULL, ("m_copydata, length > size of mbuf chain")); count = min(m->m_len - off, len); bcopy(mtod(m, caddr_t) + off, cp, count); len -= count; cp += count; off = 0; m = m->m_next; } } /* * Copy a packet header mbuf chain into a completely new chain, including * copying any mbuf clusters. Use this instead of m_copypacket() when * you need a writable copy of an mbuf chain.
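 *
 * Hypothetical sketch (not part of this revision) contrasting the two
 * copy routines; "m" and the error handling are placeholders, and "m" is
 * assumed to carry a packet header:
 *
 *	struct mbuf *ro, *rw;
 *
 *	ro = m_copym(m, 0, M_COPYALL, M_NOWAIT);	(shares clusters, read-only)
 *	rw = m_dup(m, M_NOWAIT);			(deep copy, safe to modify)
 *	if (ro == NULL || rw == NULL)
 *		goto fail;
 *	*mtod(rw, uint8_t *) ^= 0xff;			(does not affect "m" or "ro")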
*/ struct mbuf * m_dup(const struct mbuf *m, int how) { struct mbuf **p, *top = NULL; int remain, moff, nsize; MBUF_CHECKSLEEP(how); /* Sanity check */ if (m == NULL) return (NULL); M_ASSERTPKTHDR(m); /* While there's more data, get a new mbuf, tack it on, and fill it */ remain = m->m_pkthdr.len; moff = 0; p = &top; while (remain > 0 || top == NULL) { /* allow m->m_pkthdr.len == 0 */ struct mbuf *n; /* Get the next new mbuf */ if (remain >= MINCLSIZE) { n = m_getcl(how, m->m_type, 0); nsize = MCLBYTES; } else { n = m_get(how, m->m_type); nsize = MLEN; } if (n == NULL) goto nospace; if (top == NULL) { /* First one, must be PKTHDR */ if (!m_dup_pkthdr(n, m, how)) { m_free(n); goto nospace; } if ((n->m_flags & M_EXT) == 0) nsize = MHLEN; n->m_flags &= ~M_RDONLY; } n->m_len = 0; /* Link it into the new chain */ *p = n; p = &n->m_next; /* Copy data from original mbuf(s) into new mbuf */ while (n->m_len < nsize && m != NULL) { int chunk = min(nsize - n->m_len, m->m_len - moff); bcopy(m->m_data + moff, n->m_data + n->m_len, chunk); moff += chunk; n->m_len += chunk; remain -= chunk; if (moff == m->m_len) { m = m->m_next; moff = 0; } } /* Check correct total mbuf length */ KASSERT((remain > 0 && m != NULL) || (remain == 0 && m == NULL), ("%s: bogus m_pkthdr.len", __func__)); } return (top); nospace: m_freem(top); return (NULL); } /* * Concatenate mbuf chain n to m. * Both chains must be of the same type (e.g. MT_DATA). * Any m_pkthdr is not updated. */ void m_cat(struct mbuf *m, struct mbuf *n) { while (m->m_next) m = m->m_next; while (n) { if (!M_WRITABLE(m) || M_TRAILINGSPACE(m) < n->m_len) { /* just join the two chains */ m->m_next = n; return; } /* splat the data from one into the other */ bcopy(mtod(n, caddr_t), mtod(m, caddr_t) + m->m_len, (u_int)n->m_len); m->m_len += n->m_len; n = m_free(n); } } /* * Concatenate two pkthdr mbuf chains. */ void m_catpkt(struct mbuf *m, struct mbuf *n) { M_ASSERTPKTHDR(m); M_ASSERTPKTHDR(n); m->m_pkthdr.len += n->m_pkthdr.len; m_demote(n, 1, 0); m_cat(m, n); } void m_adj(struct mbuf *mp, int req_len) { int len = req_len; struct mbuf *m; int count; if ((m = mp) == NULL) return; if (len >= 0) { /* * Trim from head. */ while (m != NULL && len > 0) { if (m->m_len <= len) { len -= m->m_len; m->m_len = 0; m = m->m_next; } else { m->m_len -= len; m->m_data += len; len = 0; } } if (mp->m_flags & M_PKTHDR) mp->m_pkthdr.len -= (req_len - len); } else { /* * Trim from tail. Scan the mbuf chain, * calculating its length and finding the last mbuf. * If the adjustment only affects this mbuf, then just * adjust and return. Otherwise, rescan and truncate * after the remaining size. */ len = -len; count = 0; for (;;) { count += m->m_len; if (m->m_next == (struct mbuf *)0) break; m = m->m_next; } if (m->m_len >= len) { m->m_len -= len; if (mp->m_flags & M_PKTHDR) mp->m_pkthdr.len -= len; return; } count -= len; if (count < 0) count = 0; /* * Correct length for chain is "count". * Find the mbuf with last data, adjust its length, * and toss data from remaining mbufs on chain. */ m = mp; if (m->m_flags & M_PKTHDR) m->m_pkthdr.len = count; for (; m; m = m->m_next) { if (m->m_len >= count) { m->m_len = count; if (m->m_next != NULL) { m_freem(m->m_next); m->m_next = NULL; } break; } count -= m->m_len; } } } /* * Rearrange an mbuf chain so that len bytes are contiguous * and in the data area of an mbuf (so that mtod will work * for a structure of size len). Returns the resulting * mbuf chain on success, frees it and returns null on failure.
* If there is room, it will add up to max_protohdr-len extra bytes to the * contiguous region in an attempt to avoid being called next time. */ struct mbuf * m_pullup(struct mbuf *n, int len) { struct mbuf *m; int count; int space; /* * If first mbuf has no cluster, and has room for len bytes * without shifting current data, pullup into it, * otherwise allocate a new mbuf to prepend to the chain. */ if ((n->m_flags & M_EXT) == 0 && n->m_data + len < &n->m_dat[MLEN] && n->m_next) { if (n->m_len >= len) return (n); m = n; n = n->m_next; len -= m->m_len; } else { if (len > MHLEN) goto bad; m = m_get(M_NOWAIT, n->m_type); if (m == NULL) goto bad; if (n->m_flags & M_PKTHDR) m_move_pkthdr(m, n); } space = &m->m_dat[MLEN] - (m->m_data + m->m_len); do { count = min(min(max(len, max_protohdr), space), n->m_len); bcopy(mtod(n, caddr_t), mtod(m, caddr_t) + m->m_len, (u_int)count); len -= count; m->m_len += count; n->m_len -= count; space -= count; if (n->m_len) n->m_data += count; else n = m_free(n); } while (len > 0 && n); if (len > 0) { (void) m_free(m); goto bad; } m->m_next = n; return (m); bad: m_freem(n); return (NULL); } /* * Like m_pullup(), except a new mbuf is always allocated, and we allow * the amount of empty space before the data in the new mbuf to be specified * (in the event that the caller expects to prepend later). */ struct mbuf * m_copyup(struct mbuf *n, int len, int dstoff) { struct mbuf *m; int count, space; if (len > (MHLEN - dstoff)) goto bad; m = m_get(M_NOWAIT, n->m_type); if (m == NULL) goto bad; if (n->m_flags & M_PKTHDR) m_move_pkthdr(m, n); m->m_data += dstoff; space = &m->m_dat[MLEN] - (m->m_data + m->m_len); do { count = min(min(max(len, max_protohdr), space), n->m_len); memcpy(mtod(m, caddr_t) + m->m_len, mtod(n, caddr_t), (unsigned)count); len -= count; m->m_len += count; n->m_len -= count; space -= count; if (n->m_len) n->m_data += count; else n = m_free(n); } while (len > 0 && n); if (len > 0) { (void) m_free(m); goto bad; } m->m_next = n; return (m); bad: m_freem(n); return (NULL); } /* * Partition an mbuf chain in two pieces, returning the tail -- * all but the first len0 bytes. In case of failure, it returns NULL and * attempts to restore the chain to its original state. * * Note that the resulting mbufs might be read-only, because the new * mbuf can end up sharing an mbuf cluster with the original mbuf if * the "breaking point" happens to lie within a cluster mbuf. Use the * M_WRITABLE() macro to check for this case. 
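 *
 * Illustrative sketch (hypothetical, not taken from this change): split off
 * the payload after an assumed "hdrlen" bytes.  Only m_split() and
 * M_WRITABLE() below are existing mbuf(9) interfaces.
 *
 *	struct mbuf *tail;
 *
 *	tail = m_split(m, hdrlen, M_NOWAIT);
 *	if (tail == NULL)
 *		return (ENOBUFS);	(the original chain is restored, see above)
 *	("m" now carries the first hdrlen bytes, "tail" the remainder; check
 *	 M_WRITABLE() on either before modifying data in place.)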
*/ struct mbuf * m_split(struct mbuf *m0, int len0, int wait) { struct mbuf *m, *n; u_int len = len0, remain; MBUF_CHECKSLEEP(wait); for (m = m0; m && len > m->m_len; m = m->m_next) len -= m->m_len; if (m == NULL) return (NULL); remain = m->m_len - len; if (m0->m_flags & M_PKTHDR && remain == 0) { n = m_gethdr(wait, m0->m_type); if (n == NULL) return (NULL); n->m_next = m->m_next; m->m_next = NULL; n->m_pkthdr.rcvif = m0->m_pkthdr.rcvif; n->m_pkthdr.len = m0->m_pkthdr.len - len0; m0->m_pkthdr.len = len0; return (n); } else if (m0->m_flags & M_PKTHDR) { n = m_gethdr(wait, m0->m_type); if (n == NULL) return (NULL); n->m_pkthdr.rcvif = m0->m_pkthdr.rcvif; n->m_pkthdr.len = m0->m_pkthdr.len - len0; m0->m_pkthdr.len = len0; if (m->m_flags & M_EXT) goto extpacket; if (remain > MHLEN) { /* m can't be the lead packet */ M_ALIGN(n, 0); n->m_next = m_split(m, len, wait); if (n->m_next == NULL) { (void) m_free(n); return (NULL); } else { n->m_len = 0; return (n); } } else M_ALIGN(n, remain); } else if (remain == 0) { n = m->m_next; m->m_next = NULL; return (n); } else { n = m_get(wait, m->m_type); if (n == NULL) return (NULL); M_ALIGN(n, remain); } extpacket: if (m->m_flags & M_EXT) { n->m_data = m->m_data + len; mb_dupcl(n, m); } else { bcopy(mtod(m, caddr_t) + len, mtod(n, caddr_t), remain); } n->m_len = remain; m->m_len = len; n->m_next = m->m_next; m->m_next = NULL; return (n); } /* * Routine to copy from device local memory into mbufs. * Note that `off' argument is offset into first mbuf of target chain from * which to begin copying the data to. */ struct mbuf * m_devget(char *buf, int totlen, int off, struct ifnet *ifp, void (*copy)(char *from, caddr_t to, u_int len)) { struct mbuf *m; struct mbuf *top = NULL, **mp = &top; int len; if (off < 0 || off > MHLEN) return (NULL); while (totlen > 0) { if (top == NULL) { /* First one, must be PKTHDR */ if (totlen + off >= MINCLSIZE) { m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR); len = MCLBYTES; } else { m = m_gethdr(M_NOWAIT, MT_DATA); len = MHLEN; /* Place initial small packet/header at end of mbuf */ if (m && totlen + off + max_linkhdr <= MHLEN) { m->m_data += max_linkhdr; len -= max_linkhdr; } } if (m == NULL) return NULL; m->m_pkthdr.rcvif = ifp; m->m_pkthdr.len = totlen; } else { if (totlen + off >= MINCLSIZE) { m = m_getcl(M_NOWAIT, MT_DATA, 0); len = MCLBYTES; } else { m = m_get(M_NOWAIT, MT_DATA); len = MLEN; } if (m == NULL) { m_freem(top); return NULL; } } if (off) { m->m_data += off; len -= off; off = 0; } m->m_len = len = min(totlen, len); if (copy) copy(buf, mtod(m, caddr_t), (u_int)len); else bcopy(buf, mtod(m, caddr_t), (u_int)len); buf += len; *mp = m; mp = &m->m_next; totlen -= len; } return (top); } /* * Copy data from a buffer back into the indicated mbuf chain, * starting "off" bytes from the beginning, extending the mbuf * chain if necessary.
*/ void m_copyback(struct mbuf *m0, int off, int len, c_caddr_t cp) { int mlen; struct mbuf *m = m0, *n; int totlen = 0; if (m0 == NULL) return; while (off > (mlen = m->m_len)) { off -= mlen; totlen += mlen; if (m->m_next == NULL) { n = m_get(M_NOWAIT, m->m_type); if (n == NULL) goto out; bzero(mtod(n, caddr_t), MLEN); n->m_len = min(MLEN, len + off); m->m_next = n; } m = m->m_next; } while (len > 0) { if (m->m_next == NULL && (len > m->m_len - off)) { m->m_len += min(len - (m->m_len - off), M_TRAILINGSPACE(m)); } mlen = min (m->m_len - off, len); bcopy(cp, off + mtod(m, caddr_t), (u_int)mlen); cp += mlen; len -= mlen; mlen += off; off = 0; totlen += mlen; if (len == 0) break; if (m->m_next == NULL) { n = m_get(M_NOWAIT, m->m_type); if (n == NULL) break; n->m_len = min(MLEN, len); m->m_next = n; } m = m->m_next; } out: if (((m = m0)->m_flags & M_PKTHDR) && (m->m_pkthdr.len < totlen)) m->m_pkthdr.len = totlen; } /* * Append the specified data to the indicated mbuf chain, * Extend the mbuf chain if the new data does not fit in * existing space. * * Return 1 if able to complete the job; otherwise 0. */ int m_append(struct mbuf *m0, int len, c_caddr_t cp) { struct mbuf *m, *n; int remainder, space; for (m = m0; m->m_next != NULL; m = m->m_next) ; remainder = len; space = M_TRAILINGSPACE(m); if (space > 0) { /* * Copy into available space. */ if (space > remainder) space = remainder; bcopy(cp, mtod(m, caddr_t) + m->m_len, space); m->m_len += space; cp += space, remainder -= space; } while (remainder > 0) { /* * Allocate a new mbuf; could check space * and allocate a cluster instead. */ n = m_get(M_NOWAIT, m->m_type); if (n == NULL) break; n->m_len = min(MLEN, remainder); bcopy(cp, mtod(n, caddr_t), n->m_len); cp += n->m_len, remainder -= n->m_len; m->m_next = n; m = n; } if (m0->m_flags & M_PKTHDR) m0->m_pkthdr.len += len - remainder; return (remainder == 0); } /* * Apply function f to the data in an mbuf chain starting "off" bytes from * the beginning, continuing for "len" bytes. */ int m_apply(struct mbuf *m, int off, int len, int (*f)(void *, void *, u_int), void *arg) { u_int count; int rval; KASSERT(off >= 0, ("m_apply, negative off %d", off)); KASSERT(len >= 0, ("m_apply, negative len %d", len)); while (off > 0) { KASSERT(m != NULL, ("m_apply, offset > size of mbuf chain")); if (off < m->m_len) break; off -= m->m_len; m = m->m_next; } while (len > 0) { KASSERT(m != NULL, ("m_apply, offset > size of mbuf chain")); count = min(m->m_len - off, len); rval = (*f)(arg, mtod(m, caddr_t) + off, count); if (rval) return (rval); len -= count; off = 0; m = m->m_next; } return (0); } /* * Return a pointer to mbuf/offset of location in mbuf chain. */ struct mbuf * m_getptr(struct mbuf *m, int loc, int *off) { while (loc >= 0) { /* Normal end of search. */ if (m->m_len > loc) { *off = loc; return (m); } else { loc -= m->m_len; if (m->m_next == NULL) { if (loc == 0) { /* Point at the end of valid data. */ *off = m->m_len; return (m); } return (NULL); } m = m->m_next; } } return (NULL); } void m_print(const struct mbuf *m, int maxlen) { int len; int pdata; const struct mbuf *m2; if (m == NULL) { printf("mbuf: %p\n", m); return; } if (m->m_flags & M_PKTHDR) len = m->m_pkthdr.len; else len = -1; m2 = m; while (m2 != NULL && (len == -1 || len)) { pdata = m2->m_len; if (maxlen != -1 && pdata > maxlen) pdata = maxlen; printf("mbuf: %p len: %d, next: %p, %b%s", m2, m2->m_len, m2->m_next, m2->m_flags, "\20\20freelist\17skipfw" "\11proto5\10proto4\7proto3\6proto2\5proto1\4rdonly" "\3eor\2pkthdr\1ext", pdata ? 
"" : "\n"); if (pdata) printf(", %*D\n", pdata, (u_char *)m2->m_data, "-"); if (len != -1) len -= m2->m_len; m2 = m2->m_next; } if (len > 0) printf("%d bytes unaccounted for.\n", len); return; } u_int m_fixhdr(struct mbuf *m0) { u_int len; len = m_length(m0, NULL); m0->m_pkthdr.len = len; return (len); } u_int m_length(struct mbuf *m0, struct mbuf **last) { struct mbuf *m; u_int len; len = 0; for (m = m0; m != NULL; m = m->m_next) { len += m->m_len; if (m->m_next == NULL) break; } if (last != NULL) *last = m; return (len); } /* * Defragment a mbuf chain, returning the shortest possible * chain of mbufs and clusters. If allocation fails and * this cannot be completed, NULL will be returned, but * the passed in chain will be unchanged. Upon success, * the original chain will be freed, and the new chain * will be returned. * * If a non-packet header is passed in, the original * mbuf (chain?) will be returned unharmed. */ struct mbuf * m_defrag(struct mbuf *m0, int how) { struct mbuf *m_new = NULL, *m_final = NULL; int progress = 0, length; MBUF_CHECKSLEEP(how); if (!(m0->m_flags & M_PKTHDR)) return (m0); m_fixhdr(m0); /* Needed sanity check */ #ifdef MBUF_STRESS_TEST if (m_defragrandomfailures) { int temp = arc4random() & 0xff; if (temp == 0xba) goto nospace; } #endif if (m0->m_pkthdr.len > MHLEN) m_final = m_getcl(how, MT_DATA, M_PKTHDR); else m_final = m_gethdr(how, MT_DATA); if (m_final == NULL) goto nospace; if (m_dup_pkthdr(m_final, m0, how) == 0) goto nospace; m_new = m_final; while (progress < m0->m_pkthdr.len) { length = m0->m_pkthdr.len - progress; if (length > MCLBYTES) length = MCLBYTES; if (m_new == NULL) { if (length > MLEN) m_new = m_getcl(how, MT_DATA, 0); else m_new = m_get(how, MT_DATA); if (m_new == NULL) goto nospace; } m_copydata(m0, progress, length, mtod(m_new, caddr_t)); progress += length; m_new->m_len = length; if (m_new != m_final) m_cat(m_final, m_new); m_new = NULL; } #ifdef MBUF_STRESS_TEST if (m0->m_next == NULL) m_defraguseless++; #endif m_freem(m0); m0 = m_final; #ifdef MBUF_STRESS_TEST m_defragpackets++; m_defragbytes += m0->m_pkthdr.len; #endif return (m0); nospace: #ifdef MBUF_STRESS_TEST m_defragfailure++; #endif if (m_final) m_freem(m_final); return (NULL); } /* * Defragment an mbuf chain, returning at most maxfrags separate * mbufs+clusters. If this is not possible NULL is returned and * the original mbuf chain is left in its present (potentially * modified) state. We use two techniques: collapsing consecutive * mbufs and replacing consecutive mbufs by a cluster. * * NB: this should really be named m_defrag but that name is taken */ struct mbuf * m_collapse(struct mbuf *m0, int how, int maxfrags) { struct mbuf *m, *n, *n2, **prev; u_int curfrags; /* * Calculate the current number of frags. */ curfrags = 0; for (m = m0; m != NULL; m = m->m_next) curfrags++; /* * First, try to collapse mbufs. Note that we always collapse * towards the front so we don't need to deal with moving the * pkthdr. This may be suboptimal if the first mbuf has much * less data than the following. */ m = m0; again: for (;;) { n = m->m_next; if (n == NULL) break; if (M_WRITABLE(m) && n->m_len < M_TRAILINGSPACE(m)) { bcopy(mtod(n, void *), mtod(m, char *) + m->m_len, n->m_len); m->m_len += n->m_len; m->m_next = n->m_next; m_free(n); if (--curfrags <= maxfrags) return m0; } else m = n; } KASSERT(maxfrags > 1, ("maxfrags %u, but normal collapse failed", maxfrags)); /* * Collapse consecutive mbufs to a cluster. 
*/ prev = &m0->m_next; /* NB: not the first mbuf */ while ((n = *prev) != NULL) { if ((n2 = n->m_next) != NULL && n->m_len + n2->m_len < MCLBYTES) { m = m_getcl(how, MT_DATA, 0); if (m == NULL) goto bad; bcopy(mtod(n, void *), mtod(m, void *), n->m_len); bcopy(mtod(n2, void *), mtod(m, char *) + n->m_len, n2->m_len); m->m_len = n->m_len + n2->m_len; m->m_next = n2->m_next; *prev = m; m_free(n); m_free(n2); if (--curfrags <= maxfrags) /* +1 cl -2 mbufs */ return m0; /* * Still not there, try the normal collapse * again before we allocate another cluster. */ goto again; } prev = &n->m_next; } /* * No place where we can collapse to a cluster; punt. * This can occur if, for example, you request 2 frags * but the packet requires that both be clusters (we * never reallocate the first mbuf to avoid moving the * packet header). */ bad: return NULL; } #ifdef MBUF_STRESS_TEST /* * Fragment an mbuf chain. There's no reason you'd ever want to do * this in normal usage, but it's great for stress testing various * mbuf consumers. * * If fragmentation is not possible, the original chain will be * returned. * * Possible length values: * 0 no fragmentation will occur * > 0 each fragment will be of the specified length * -1 each fragment will be the same random value in length * -2 each fragment's length will be entirely random * (Random values range from 1 to 256) */ struct mbuf * m_fragment(struct mbuf *m0, int how, int length) { struct mbuf *m_first, *m_last; int divisor = 255, progress = 0, fraglen; if (!(m0->m_flags & M_PKTHDR)) return (m0); if (length == 0 || length < -2) return (m0); if (length > MCLBYTES) length = MCLBYTES; if (length < 0 && divisor > MCLBYTES) divisor = MCLBYTES; if (length == -1) length = 1 + (arc4random() % divisor); if (length > 0) fraglen = length; m_fixhdr(m0); /* Needed sanity check */ m_first = m_getcl(how, MT_DATA, M_PKTHDR); if (m_first == NULL) goto nospace; if (m_dup_pkthdr(m_first, m0, how) == 0) goto nospace; m_last = m_first; while (progress < m0->m_pkthdr.len) { if (length == -2) fraglen = 1 + (arc4random() % divisor); if (fraglen > m0->m_pkthdr.len - progress) fraglen = m0->m_pkthdr.len - progress; if (progress != 0) { struct mbuf *m_new = m_getcl(how, MT_DATA, 0); if (m_new == NULL) goto nospace; m_last->m_next = m_new; m_last = m_new; } m_copydata(m0, progress, fraglen, mtod(m_last, caddr_t)); progress += fraglen; m_last->m_len = fraglen; } m_freem(m0); m0 = m_first; return (m0); nospace: if (m_first) m_freem(m_first); /* Return the original chain on failure */ return (m0); } #endif /* * Copy the contents of uio into a properly sized mbuf chain. */ struct mbuf * m_uiotombuf(struct uio *uio, int how, int len, int align, int flags) { struct mbuf *m, *mb; int error, length; ssize_t total; int progress = 0; /* * len can be zero or an arbitrary large value bound by * the total data supplied by the uio. */ if (len > 0) total = (uio->uio_resid < len) ? uio->uio_resid : len; else total = uio->uio_resid; /* * The smallest unit returned by m_getm2() is a single mbuf * with pkthdr. We can't align past it. */ if (align >= MHLEN) return (NULL); /* * Give us the full allocation or nothing. * If len is zero return the smallest empty mbuf. */ m = m_getm2(NULL, max(total + align, 1), how, MT_DATA, flags); if (m == NULL) return (NULL); m->m_data += align; /* Fill all mbufs with uio data and update header information. 
*/ for (mb = m; mb != NULL; mb = mb->m_next) { length = min(M_TRAILINGSPACE(mb), total - progress); error = uiomove(mtod(mb, void *), length, uio); if (error) { m_freem(m); return (NULL); } mb->m_len = length; progress += length; if (flags & M_PKTHDR) m->m_pkthdr.len += length; } KASSERT(progress == total, ("%s: progress != total", __func__)); return (m); } /* * Copy an mbuf chain into a uio limited by len if set. */ int m_mbuftouio(struct uio *uio, const struct mbuf *m, int len) { int error, length, total; int progress = 0; if (len > 0) total = min(uio->uio_resid, len); else total = uio->uio_resid; /* Fill the uio with data from the mbufs. */ for (; m != NULL; m = m->m_next) { length = min(m->m_len, total - progress); error = uiomove(mtod(m, void *), length, uio); if (error) return (error); progress += length; } return (0); } /* * Create a writable copy of the mbuf chain. While doing this * we compact the chain with a goal of producing a chain with * at most two mbufs. The second mbuf in this chain is likely * to be a cluster. The primary purpose of this work is to create * a writable packet for encryption, compression, etc. The * secondary goal is to linearize the data so the data can be * passed to crypto hardware in the most efficient manner possible. */ struct mbuf * m_unshare(struct mbuf *m0, int how) { struct mbuf *m, *mprev; struct mbuf *n, *mfirst, *mlast; int len, off; mprev = NULL; for (m = m0; m != NULL; m = mprev->m_next) { /* * Regular mbufs are ignored unless there's a cluster * in front of it that we can use to coalesce. We do * the latter mainly so later clusters can be coalesced * also w/o having to handle them specially (i.e. convert * mbuf+cluster -> cluster). This optimization is heavily * influenced by the assumption that we're running over * Ethernet where MCLBYTES is large enough that the max * packet size will permit lots of coalescing into a * single cluster. This in turn permits efficient * crypto operations, especially when using hardware. */ if ((m->m_flags & M_EXT) == 0) { if (mprev && (mprev->m_flags & M_EXT) && m->m_len <= M_TRAILINGSPACE(mprev)) { /* XXX: this ignores mbuf types */ memcpy(mtod(mprev, caddr_t) + mprev->m_len, mtod(m, caddr_t), m->m_len); mprev->m_len += m->m_len; mprev->m_next = m->m_next; /* unlink from chain */ m_free(m); /* reclaim mbuf */ } else { mprev = m; } continue; } /* * Writable mbufs are left alone (for now). */ if (M_WRITABLE(m)) { mprev = m; continue; } /* * Not writable, replace with a copy or coalesce with * the previous mbuf if possible (since we have to copy * it anyway, we try to reduce the number of mbufs and * clusters so that future work is easier). */ KASSERT(m->m_flags & M_EXT, ("m_flags 0x%x", m->m_flags)); /* NB: we only coalesce into a cluster or larger */ if (mprev != NULL && (mprev->m_flags & M_EXT) && m->m_len <= M_TRAILINGSPACE(mprev)) { /* XXX: this ignores mbuf types */ memcpy(mtod(mprev, caddr_t) + mprev->m_len, mtod(m, caddr_t), m->m_len); mprev->m_len += m->m_len; mprev->m_next = m->m_next; /* unlink from chain */ m_free(m); /* reclaim mbuf */ continue; } /* * Allocate new space to hold the copy and copy the data. * We deal with jumbo mbufs (i.e. m_len > MCLBYTES) by * splitting them into clusters. We could just malloc a * buffer and make it external but too many device drivers * don't know how to break up the non-contiguous memory when * doing DMA. 
*/ n = m_getcl(how, m->m_type, m->m_flags & M_COPYFLAGS); if (n == NULL) { m_freem(m0); return (NULL); } if (m->m_flags & M_PKTHDR) { KASSERT(mprev == NULL, ("%s: m0 %p, m %p has M_PKTHDR", __func__, m0, m)); m_move_pkthdr(n, m); } len = m->m_len; off = 0; mfirst = n; mlast = NULL; for (;;) { int cc = min(len, MCLBYTES); memcpy(mtod(n, caddr_t), mtod(m, caddr_t) + off, cc); n->m_len = cc; if (mlast != NULL) mlast->m_next = n; mlast = n; #if 0 newipsecstat.ips_clcopied++; #endif len -= cc; if (len <= 0) break; off += cc; n = m_getcl(how, m->m_type, m->m_flags & M_COPYFLAGS); if (n == NULL) { m_freem(mfirst); m_freem(m0); return (NULL); } } n->m_next = m->m_next; if (mprev == NULL) m0 = mfirst; /* new head of chain */ else mprev->m_next = mfirst; /* replace old mbuf */ m_free(m); /* release old mbuf */ mprev = mfirst; } return (m0); } #ifdef MBUF_PROFILING #define MP_BUCKETS 32 /* don't just change this as things may overflow.*/ struct mbufprofile { uintmax_t wasted[MP_BUCKETS]; uintmax_t used[MP_BUCKETS]; uintmax_t segments[MP_BUCKETS]; } mbprof; #define MP_MAXDIGITS 21 /* strlen("16,000,000,000,000,000,000") == 21 */ #define MP_NUMLINES 6 #define MP_NUMSPERLINE 16 #define MP_EXTRABYTES 64 /* > strlen("used:\nwasted:\nsegments:\n") */ /* work out max space needed and add a bit of spare space too */ #define MP_MAXLINE ((MP_MAXDIGITS+1) * MP_NUMSPERLINE) #define MP_BUFSIZE ((MP_MAXLINE * MP_NUMLINES) + 1 + MP_EXTRABYTES) char mbprofbuf[MP_BUFSIZE]; void m_profile(struct mbuf *m) { int segments = 0; int used = 0; int wasted = 0; while (m) { segments++; used += m->m_len; if (m->m_flags & M_EXT) { wasted += MHLEN - sizeof(m->m_ext) + m->m_ext.ext_size - m->m_len; } else { if (m->m_flags & M_PKTHDR) wasted += MHLEN - m->m_len; else wasted += MLEN - m->m_len; } m = m->m_next; } /* be paranoid.. it helps */ if (segments > MP_BUCKETS - 1) segments = MP_BUCKETS - 1; if (used > 100000) used = 100000; if (wasted > 100000) wasted = 100000; /* store in the appropriate bucket */ /* don't bother locking. if it's slightly off, so what? 
*/ mbprof.segments[segments]++; mbprof.used[fls(used)]++; mbprof.wasted[fls(wasted)]++; } static void mbprof_textify(void) { int offset; char *c; uint64_t *p; p = &mbprof.wasted[0]; c = mbprofbuf; offset = snprintf(c, MP_MAXLINE + 10, "wasted:\n" "%ju %ju %ju %ju %ju %ju %ju %ju " "%ju %ju %ju %ju %ju %ju %ju %ju\n", p[0], p[1], p[2], p[3], p[4], p[5], p[6], p[7], p[8], p[9], p[10], p[11], p[12], p[13], p[14], p[15]); #ifdef BIG_ARRAY p = &mbprof.wasted[16]; c += offset; offset = snprintf(c, MP_MAXLINE, "%ju %ju %ju %ju %ju %ju %ju %ju " "%ju %ju %ju %ju %ju %ju %ju %ju\n", p[0], p[1], p[2], p[3], p[4], p[5], p[6], p[7], p[8], p[9], p[10], p[11], p[12], p[13], p[14], p[15]); #endif p = &mbprof.used[0]; c += offset; offset = snprintf(c, MP_MAXLINE + 10, "used:\n" "%ju %ju %ju %ju %ju %ju %ju %ju " "%ju %ju %ju %ju %ju %ju %ju %ju\n", p[0], p[1], p[2], p[3], p[4], p[5], p[6], p[7], p[8], p[9], p[10], p[11], p[12], p[13], p[14], p[15]); #ifdef BIG_ARRAY p = &mbprof.used[16]; c += offset; offset = snprintf(c, MP_MAXLINE, "%ju %ju %ju %ju %ju %ju %ju %ju " "%ju %ju %ju %ju %ju %ju %ju %ju\n", p[0], p[1], p[2], p[3], p[4], p[5], p[6], p[7], p[8], p[9], p[10], p[11], p[12], p[13], p[14], p[15]); #endif p = &mbprof.segments[0]; c += offset; offset = snprintf(c, MP_MAXLINE + 10, "segments:\n" "%ju %ju %ju %ju %ju %ju %ju %ju " "%ju %ju %ju %ju %ju %ju %ju %ju\n", p[0], p[1], p[2], p[3], p[4], p[5], p[6], p[7], p[8], p[9], p[10], p[11], p[12], p[13], p[14], p[15]); #ifdef BIG_ARRAY p = &mbprof.segments[16]; c += offset; offset = snprintf(c, MP_MAXLINE, "%ju %ju %ju %ju %ju %ju %ju %ju " "%ju %ju %ju %ju %ju %ju %ju %jju", p[0], p[1], p[2], p[3], p[4], p[5], p[6], p[7], p[8], p[9], p[10], p[11], p[12], p[13], p[14], p[15]); #endif } static int mbprof_handler(SYSCTL_HANDLER_ARGS) { int error; mbprof_textify(); error = SYSCTL_OUT(req, mbprofbuf, strlen(mbprofbuf) + 1); return (error); } static int mbprof_clr_handler(SYSCTL_HANDLER_ARGS) { int clear, error; clear = 0; error = sysctl_handle_int(oidp, &clear, 0, req); if (error || !req->newptr) return (error); if (clear) { bzero(&mbprof, sizeof(mbprof)); } return (error); } SYSCTL_PROC(_kern_ipc, OID_AUTO, mbufprofile, CTLTYPE_STRING|CTLFLAG_RD, NULL, 0, mbprof_handler, "A", "mbuf profiling statistics"); SYSCTL_PROC(_kern_ipc, OID_AUTO, mbufprofileclr, CTLTYPE_INT|CTLFLAG_RW, NULL, 0, mbprof_clr_handler, "I", "clear mbuf profiling statistics"); #endif Index: user/ngie/bug-237403/sys/kern/vfs_bio.c =================================================================== --- user/ngie/bug-237403/sys/kern/vfs_bio.c (revision 346925) +++ user/ngie/bug-237403/sys/kern/vfs_bio.c (revision 346926) @@ -1,5505 +1,5499 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2004 Poul-Henning Kamp * Copyright (c) 1994,1997 John S. Dyson * Copyright (c) 2013 The FreeBSD Foundation * All rights reserved. * * Portions of this software were developed by Konstantin Belousov * under sponsorship from the FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. 
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ /* * this file contains a new buffer I/O scheme implementing a coherent * VM object and buffer cache scheme. Pains have been taken to make * sure that the performance degradation associated with schemes such * as this is not realized. * * Author: John S. Dyson * Significant help during the development and debugging phases * had been provided by David Greenman, also of the FreeBSD core team. * * see man buf(9) for more info. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include static MALLOC_DEFINE(M_BIOBUF, "biobuf", "BIO buffer"); struct bio_ops bioops; /* I/O operation notification */ struct buf_ops buf_ops_bio = { .bop_name = "buf_ops_bio", .bop_write = bufwrite, .bop_strategy = bufstrategy, .bop_sync = bufsync, .bop_bdflush = bufbdflush, }; struct bufqueue { struct mtx_padalign bq_lock; TAILQ_HEAD(, buf) bq_queue; uint8_t bq_index; uint16_t bq_subqueue; int bq_len; } __aligned(CACHE_LINE_SIZE); #define BQ_LOCKPTR(bq) (&(bq)->bq_lock) #define BQ_LOCK(bq) mtx_lock(BQ_LOCKPTR((bq))) #define BQ_UNLOCK(bq) mtx_unlock(BQ_LOCKPTR((bq))) #define BQ_ASSERT_LOCKED(bq) mtx_assert(BQ_LOCKPTR((bq)), MA_OWNED) struct bufdomain { struct bufqueue bd_subq[MAXCPU + 1]; /* Per-cpu sub queues + global */ struct bufqueue bd_dirtyq; struct bufqueue *bd_cleanq; struct mtx_padalign bd_run_lock; /* Constants */ long bd_maxbufspace; long bd_hibufspace; long bd_lobufspace; long bd_bufspacethresh; int bd_hifreebuffers; int bd_lofreebuffers; int bd_hidirtybuffers; int bd_lodirtybuffers; int bd_dirtybufthresh; int bd_lim; /* atomics */ int bd_wanted; int __aligned(CACHE_LINE_SIZE) bd_numdirtybuffers; int __aligned(CACHE_LINE_SIZE) bd_running; long __aligned(CACHE_LINE_SIZE) bd_bufspace; int __aligned(CACHE_LINE_SIZE) bd_freebuffers; } __aligned(CACHE_LINE_SIZE); #define BD_LOCKPTR(bd) (&(bd)->bd_cleanq->bq_lock) #define BD_LOCK(bd) mtx_lock(BD_LOCKPTR((bd))) #define BD_UNLOCK(bd) mtx_unlock(BD_LOCKPTR((bd))) #define BD_ASSERT_LOCKED(bd) mtx_assert(BD_LOCKPTR((bd)), MA_OWNED) #define BD_RUN_LOCKPTR(bd) (&(bd)->bd_run_lock) #define BD_RUN_LOCK(bd) mtx_lock(BD_RUN_LOCKPTR((bd))) #define BD_RUN_UNLOCK(bd) mtx_unlock(BD_RUN_LOCKPTR((bd))) #define BD_DOMAIN(bd) (bd - bdomain) static struct buf *buf; /* buffer header pool */ extern struct buf *swbuf; /* Swap buffer header pool. 
*/ caddr_t unmapped_buf; /* Used below and for softdep flushing threads in ufs/ffs/ffs_softdep.c */ struct proc *bufdaemonproc; static int inmem(struct vnode *vp, daddr_t blkno); static void vm_hold_free_pages(struct buf *bp, int newbsize); static void vm_hold_load_pages(struct buf *bp, vm_offset_t from, vm_offset_t to); static void vfs_page_set_valid(struct buf *bp, vm_ooffset_t off, vm_page_t m); static void vfs_page_set_validclean(struct buf *bp, vm_ooffset_t off, vm_page_t m); static void vfs_clean_pages_dirty_buf(struct buf *bp); static void vfs_setdirty_locked_object(struct buf *bp); static void vfs_vmio_invalidate(struct buf *bp); static void vfs_vmio_truncate(struct buf *bp, int npages); static void vfs_vmio_extend(struct buf *bp, int npages, int size); static int vfs_bio_clcheck(struct vnode *vp, int size, daddr_t lblkno, daddr_t blkno); static void breada(struct vnode *, daddr_t *, int *, int, struct ucred *, int, void (*)(struct buf *)); static int buf_flush(struct vnode *vp, struct bufdomain *, int); static int flushbufqueues(struct vnode *, struct bufdomain *, int, int); static void buf_daemon(void); static __inline void bd_wakeup(void); static int sysctl_runningspace(SYSCTL_HANDLER_ARGS); static void bufkva_reclaim(vmem_t *, int); static void bufkva_free(struct buf *); static int buf_import(void *, void **, int, int, int); static void buf_release(void *, void **, int); static void maxbcachebuf_adjust(void); static inline struct bufdomain *bufdomain(struct buf *); static void bq_remove(struct bufqueue *bq, struct buf *bp); static void bq_insert(struct bufqueue *bq, struct buf *bp, bool unlock); static int buf_recycle(struct bufdomain *, bool kva); static void bq_init(struct bufqueue *bq, int qindex, int cpu, const char *lockname); static void bd_init(struct bufdomain *bd); static int bd_flushall(struct bufdomain *bd); static int sysctl_bufdomain_long(SYSCTL_HANDLER_ARGS); static int sysctl_bufdomain_int(SYSCTL_HANDLER_ARGS); static int sysctl_bufspace(SYSCTL_HANDLER_ARGS); int vmiodirenable = TRUE; SYSCTL_INT(_vfs, OID_AUTO, vmiodirenable, CTLFLAG_RW, &vmiodirenable, 0, "Use the VM system for directory writes"); long runningbufspace; SYSCTL_LONG(_vfs, OID_AUTO, runningbufspace, CTLFLAG_RD, &runningbufspace, 0, "Amount of presently outstanding async buffer io"); SYSCTL_PROC(_vfs, OID_AUTO, bufspace, CTLTYPE_LONG|CTLFLAG_MPSAFE|CTLFLAG_RD, NULL, 0, sysctl_bufspace, "L", "Physical memory used for buffers"); static counter_u64_t bufkvaspace; SYSCTL_COUNTER_U64(_vfs, OID_AUTO, bufkvaspace, CTLFLAG_RD, &bufkvaspace, "Kernel virtual memory used for buffers"); static long maxbufspace; SYSCTL_PROC(_vfs, OID_AUTO, maxbufspace, CTLTYPE_LONG|CTLFLAG_MPSAFE|CTLFLAG_RW, &maxbufspace, __offsetof(struct bufdomain, bd_maxbufspace), sysctl_bufdomain_long, "L", "Maximum allowed value of bufspace (including metadata)"); static long bufmallocspace; SYSCTL_LONG(_vfs, OID_AUTO, bufmallocspace, CTLFLAG_RD, &bufmallocspace, 0, "Amount of malloced memory for buffers"); static long maxbufmallocspace; SYSCTL_LONG(_vfs, OID_AUTO, maxmallocbufspace, CTLFLAG_RW, &maxbufmallocspace, 0, "Maximum amount of malloced memory for buffers"); static long lobufspace; SYSCTL_PROC(_vfs, OID_AUTO, lobufspace, CTLTYPE_LONG|CTLFLAG_MPSAFE|CTLFLAG_RW, &lobufspace, __offsetof(struct bufdomain, bd_lobufspace), sysctl_bufdomain_long, "L", "Minimum amount of buffers we want to have"); long hibufspace; SYSCTL_PROC(_vfs, OID_AUTO, hibufspace, CTLTYPE_LONG|CTLFLAG_MPSAFE|CTLFLAG_RW, &hibufspace, __offsetof(struct bufdomain, 
bd_hibufspace), sysctl_bufdomain_long, "L", "Maximum allowed value of bufspace (excluding metadata)"); long bufspacethresh; SYSCTL_PROC(_vfs, OID_AUTO, bufspacethresh, CTLTYPE_LONG|CTLFLAG_MPSAFE|CTLFLAG_RW, &bufspacethresh, __offsetof(struct bufdomain, bd_bufspacethresh), sysctl_bufdomain_long, "L", "Bufspace consumed before waking the daemon to free some"); static counter_u64_t buffreekvacnt; SYSCTL_COUNTER_U64(_vfs, OID_AUTO, buffreekvacnt, CTLFLAG_RW, &buffreekvacnt, "Number of times we have freed the KVA space from some buffer"); static counter_u64_t bufdefragcnt; SYSCTL_COUNTER_U64(_vfs, OID_AUTO, bufdefragcnt, CTLFLAG_RW, &bufdefragcnt, "Number of times we have had to repeat buffer allocation to defragment"); static long lorunningspace; SYSCTL_PROC(_vfs, OID_AUTO, lorunningspace, CTLTYPE_LONG | CTLFLAG_MPSAFE | CTLFLAG_RW, &lorunningspace, 0, sysctl_runningspace, "L", "Minimum preferred space used for in-progress I/O"); static long hirunningspace; SYSCTL_PROC(_vfs, OID_AUTO, hirunningspace, CTLTYPE_LONG | CTLFLAG_MPSAFE | CTLFLAG_RW, &hirunningspace, 0, sysctl_runningspace, "L", "Maximum amount of space to use for in-progress I/O"); int dirtybufferflushes; SYSCTL_INT(_vfs, OID_AUTO, dirtybufferflushes, CTLFLAG_RW, &dirtybufferflushes, 0, "Number of bdwrite to bawrite conversions to limit dirty buffers"); int bdwriteskip; SYSCTL_INT(_vfs, OID_AUTO, bdwriteskip, CTLFLAG_RW, &bdwriteskip, 0, "Number of buffers supplied to bdwrite with snapshot deadlock risk"); int altbufferflushes; SYSCTL_INT(_vfs, OID_AUTO, altbufferflushes, CTLFLAG_RW, &altbufferflushes, 0, "Number of fsync flushes to limit dirty buffers"); static int recursiveflushes; SYSCTL_INT(_vfs, OID_AUTO, recursiveflushes, CTLFLAG_RW, &recursiveflushes, 0, "Number of flushes skipped due to being recursive"); static int sysctl_numdirtybuffers(SYSCTL_HANDLER_ARGS); SYSCTL_PROC(_vfs, OID_AUTO, numdirtybuffers, CTLTYPE_INT|CTLFLAG_MPSAFE|CTLFLAG_RD, NULL, 0, sysctl_numdirtybuffers, "I", "Number of buffers that are dirty (has unwritten changes) at the moment"); static int lodirtybuffers; SYSCTL_PROC(_vfs, OID_AUTO, lodirtybuffers, CTLTYPE_INT|CTLFLAG_MPSAFE|CTLFLAG_RW, &lodirtybuffers, __offsetof(struct bufdomain, bd_lodirtybuffers), sysctl_bufdomain_int, "I", "How many buffers we want to have free before bufdaemon can sleep"); static int hidirtybuffers; SYSCTL_PROC(_vfs, OID_AUTO, hidirtybuffers, CTLTYPE_INT|CTLFLAG_MPSAFE|CTLFLAG_RW, &hidirtybuffers, __offsetof(struct bufdomain, bd_hidirtybuffers), sysctl_bufdomain_int, "I", "When the number of dirty buffers is considered severe"); int dirtybufthresh; SYSCTL_PROC(_vfs, OID_AUTO, dirtybufthresh, CTLTYPE_INT|CTLFLAG_MPSAFE|CTLFLAG_RW, &dirtybufthresh, __offsetof(struct bufdomain, bd_dirtybufthresh), sysctl_bufdomain_int, "I", "Number of bdwrite to bawrite conversions to clear dirty buffers"); static int numfreebuffers; SYSCTL_INT(_vfs, OID_AUTO, numfreebuffers, CTLFLAG_RD, &numfreebuffers, 0, "Number of free buffers"); static int lofreebuffers; SYSCTL_PROC(_vfs, OID_AUTO, lofreebuffers, CTLTYPE_INT|CTLFLAG_MPSAFE|CTLFLAG_RW, &lofreebuffers, __offsetof(struct bufdomain, bd_lofreebuffers), sysctl_bufdomain_int, "I", "Target number of free buffers"); static int hifreebuffers; SYSCTL_PROC(_vfs, OID_AUTO, hifreebuffers, CTLTYPE_INT|CTLFLAG_MPSAFE|CTLFLAG_RW, &hifreebuffers, __offsetof(struct bufdomain, bd_hifreebuffers), sysctl_bufdomain_int, "I", "Threshold for clean buffer recycling"); static counter_u64_t getnewbufcalls; SYSCTL_COUNTER_U64(_vfs, OID_AUTO, getnewbufcalls, CTLFLAG_RD, 
&getnewbufcalls, "Number of calls to getnewbuf"); static counter_u64_t getnewbufrestarts; SYSCTL_COUNTER_U64(_vfs, OID_AUTO, getnewbufrestarts, CTLFLAG_RD, &getnewbufrestarts, "Number of times getnewbuf has had to restart a buffer acquisition"); static counter_u64_t mappingrestarts; SYSCTL_COUNTER_U64(_vfs, OID_AUTO, mappingrestarts, CTLFLAG_RD, &mappingrestarts, "Number of times getblk has had to restart a buffer mapping for " "unmapped buffer"); static counter_u64_t numbufallocfails; SYSCTL_COUNTER_U64(_vfs, OID_AUTO, numbufallocfails, CTLFLAG_RW, &numbufallocfails, "Number of times buffer allocations failed"); static int flushbufqtarget = 100; SYSCTL_INT(_vfs, OID_AUTO, flushbufqtarget, CTLFLAG_RW, &flushbufqtarget, 0, "Amount of work to do in flushbufqueues when helping bufdaemon"); static counter_u64_t notbufdflushes; SYSCTL_COUNTER_U64(_vfs, OID_AUTO, notbufdflushes, CTLFLAG_RD, ¬bufdflushes, "Number of dirty buffer flushes done by the bufdaemon helpers"); static long barrierwrites; SYSCTL_LONG(_vfs, OID_AUTO, barrierwrites, CTLFLAG_RW, &barrierwrites, 0, "Number of barrier writes"); SYSCTL_INT(_vfs, OID_AUTO, unmapped_buf_allowed, CTLFLAG_RD, &unmapped_buf_allowed, 0, "Permit the use of the unmapped i/o"); int maxbcachebuf = MAXBCACHEBUF; SYSCTL_INT(_vfs, OID_AUTO, maxbcachebuf, CTLFLAG_RDTUN, &maxbcachebuf, 0, "Maximum size of a buffer cache block"); /* * This lock synchronizes access to bd_request. */ static struct mtx_padalign __exclusive_cache_line bdlock; /* * This lock protects the runningbufreq and synchronizes runningbufwakeup and * waitrunningbufspace(). */ static struct mtx_padalign __exclusive_cache_line rbreqlock; /* * Lock that protects bdirtywait. */ static struct mtx_padalign __exclusive_cache_line bdirtylock; /* * Wakeup point for bufdaemon, as well as indicator of whether it is already * active. Set to 1 when the bufdaemon is already "on" the queue, 0 when it * is idling. */ static int bd_request; /* * Request for the buf daemon to write more buffers than is indicated by * lodirtybuf. This may be necessary to push out excess dependencies or * defragment the address space where a simple count of the number of dirty * buffers is insufficient to characterize the demand for flushing them. */ static int bd_speedupreq; /* * Synchronization (sleep/wakeup) variable for active buffer space requests. * Set when wait starts, cleared prior to wakeup(). * Used in runningbufwakeup() and waitrunningbufspace(). */ static int runningbufreq; /* * Synchronization for bwillwrite() waiters. */ static int bdirtywait; /* * Definitions for the buffer free lists. */ #define QUEUE_NONE 0 /* on no queue */ #define QUEUE_EMPTY 1 /* empty buffer headers */ #define QUEUE_DIRTY 2 /* B_DELWRI buffers */ #define QUEUE_CLEAN 3 /* non-B_DELWRI buffers */ #define QUEUE_SENTINEL 4 /* not an queue index, but mark for sentinel */ /* Maximum number of buffer domains. */ #define BUF_DOMAINS 8 struct bufdomainset bdlodirty; /* Domains > lodirty */ struct bufdomainset bdhidirty; /* Domains > hidirty */ /* Configured number of clean queues. */ static int __read_mostly buf_domains; BITSET_DEFINE(bufdomainset, BUF_DOMAINS); struct bufdomain __exclusive_cache_line bdomain[BUF_DOMAINS]; struct bufqueue __exclusive_cache_line bqempty; /* * per-cpu empty buffer cache. */ uma_zone_t buf_zone; /* * Single global constant for BUF_WMESG, to avoid getting multiple references. * buf_wmesg is referred from macros. 
*/ const char *buf_wmesg = BUF_WMESG; static int sysctl_runningspace(SYSCTL_HANDLER_ARGS) { long value; int error; value = *(long *)arg1; error = sysctl_handle_long(oidp, &value, 0, req); if (error != 0 || req->newptr == NULL) return (error); mtx_lock(&rbreqlock); if (arg1 == &hirunningspace) { if (value < lorunningspace) error = EINVAL; else hirunningspace = value; } else { KASSERT(arg1 == &lorunningspace, ("%s: unknown arg1", __func__)); if (value > hirunningspace) error = EINVAL; else lorunningspace = value; } mtx_unlock(&rbreqlock); return (error); } static int sysctl_bufdomain_int(SYSCTL_HANDLER_ARGS) { int error; int value; int i; value = *(int *)arg1; error = sysctl_handle_int(oidp, &value, 0, req); if (error != 0 || req->newptr == NULL) return (error); *(int *)arg1 = value; for (i = 0; i < buf_domains; i++) *(int *)(uintptr_t)(((uintptr_t)&bdomain[i]) + arg2) = value / buf_domains; return (error); } static int sysctl_bufdomain_long(SYSCTL_HANDLER_ARGS) { long value; int error; int i; value = *(long *)arg1; error = sysctl_handle_long(oidp, &value, 0, req); if (error != 0 || req->newptr == NULL) return (error); *(long *)arg1 = value; for (i = 0; i < buf_domains; i++) *(long *)(uintptr_t)(((uintptr_t)&bdomain[i]) + arg2) = value / buf_domains; return (error); } #if defined(COMPAT_FREEBSD4) || defined(COMPAT_FREEBSD5) || \ defined(COMPAT_FREEBSD6) || defined(COMPAT_FREEBSD7) static int sysctl_bufspace(SYSCTL_HANDLER_ARGS) { long lvalue; int ivalue; int i; lvalue = 0; for (i = 0; i < buf_domains; i++) lvalue += bdomain[i].bd_bufspace; if (sizeof(int) == sizeof(long) || req->oldlen >= sizeof(long)) return (sysctl_handle_long(oidp, &lvalue, 0, req)); if (lvalue > INT_MAX) /* On overflow, still write out a long to trigger ENOMEM. */ return (sysctl_handle_long(oidp, &lvalue, 0, req)); ivalue = lvalue; return (sysctl_handle_int(oidp, &ivalue, 0, req)); } #else static int sysctl_bufspace(SYSCTL_HANDLER_ARGS) { long lvalue; int i; lvalue = 0; for (i = 0; i < buf_domains; i++) lvalue += bdomain[i].bd_bufspace; return (sysctl_handle_long(oidp, &lvalue, 0, req)); } #endif static int sysctl_numdirtybuffers(SYSCTL_HANDLER_ARGS) { int value; int i; value = 0; for (i = 0; i < buf_domains; i++) value += bdomain[i].bd_numdirtybuffers; return (sysctl_handle_int(oidp, &value, 0, req)); } /* * bdirtywakeup: * * Wakeup any bwillwrite() waiters. */ static void bdirtywakeup(void) { mtx_lock(&bdirtylock); if (bdirtywait) { bdirtywait = 0; wakeup(&bdirtywait); } mtx_unlock(&bdirtylock); } /* * bd_clear: * * Clear a domain from the appropriate bitsets when dirtybuffers * is decremented. */ static void bd_clear(struct bufdomain *bd) { mtx_lock(&bdirtylock); if (bd->bd_numdirtybuffers <= bd->bd_lodirtybuffers) BIT_CLR(BUF_DOMAINS, BD_DOMAIN(bd), &bdlodirty); if (bd->bd_numdirtybuffers <= bd->bd_hidirtybuffers) BIT_CLR(BUF_DOMAINS, BD_DOMAIN(bd), &bdhidirty); mtx_unlock(&bdirtylock); } /* * bd_set: * * Set a domain in the appropriate bitsets when dirtybuffers * is incremented. */ static void bd_set(struct bufdomain *bd) { mtx_lock(&bdirtylock); if (bd->bd_numdirtybuffers > bd->bd_lodirtybuffers) BIT_SET(BUF_DOMAINS, BD_DOMAIN(bd), &bdlodirty); if (bd->bd_numdirtybuffers > bd->bd_hidirtybuffers) BIT_SET(BUF_DOMAINS, BD_DOMAIN(bd), &bdhidirty); mtx_unlock(&bdirtylock); } /* * bdirtysub: * * Decrement the numdirtybuffers count by one and wakeup any * threads blocked in bwillwrite(). 
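The sysctl_bufdomain_long()/sysctl_bufdomain_int() handlers above accept a single global tunable and spread it evenly across every buffer domain by writing value / buf_domains into a field located by the byte offset passed as arg2. A minimal userland sketch of that offset-based fan-out follows; the struct, field, and function names and the four-domain count are hypothetical, not the kernel's.

#include <stddef.h>
#include <stdio.h>

#define NDOMAINS 4

struct domain {
	long d_maxspace;
	long d_hispace;
};

static struct domain domains[NDOMAINS];

/* Write value / NDOMAINS into the field at byte offset "off" of each domain. */
static void
fanout_long(size_t off, long value)
{
	int i;

	for (i = 0; i < NDOMAINS; i++)
		*(long *)((char *)&domains[i] + off) = value / NDOMAINS;
}

int
main(void)
{
	fanout_long(offsetof(struct domain, d_maxspace), 1000);
	printf("per-domain maxspace: %ld\n", domains[0].d_maxspace);	/* 250 */
	return (0);
}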
*/ static void bdirtysub(struct buf *bp) { struct bufdomain *bd; int num; bd = bufdomain(bp); num = atomic_fetchadd_int(&bd->bd_numdirtybuffers, -1); if (num == (bd->bd_lodirtybuffers + bd->bd_hidirtybuffers) / 2) bdirtywakeup(); if (num == bd->bd_lodirtybuffers || num == bd->bd_hidirtybuffers) bd_clear(bd); } /* * bdirtyadd: * * Increment the numdirtybuffers count by one and wakeup the buf * daemon if needed. */ static void bdirtyadd(struct buf *bp) { struct bufdomain *bd; int num; /* * Only do the wakeup once as we cross the boundary. The * buf daemon will keep running until the condition clears. */ bd = bufdomain(bp); num = atomic_fetchadd_int(&bd->bd_numdirtybuffers, 1); if (num == (bd->bd_lodirtybuffers + bd->bd_hidirtybuffers) / 2) bd_wakeup(); if (num == bd->bd_lodirtybuffers || num == bd->bd_hidirtybuffers) bd_set(bd); } /* * bufspace_daemon_wakeup: * * Wakeup the daemons responsible for freeing clean bufs. */ static void bufspace_daemon_wakeup(struct bufdomain *bd) { /* * avoid the lock if the daemon is running. */ if (atomic_fetchadd_int(&bd->bd_running, 1) == 0) { BD_RUN_LOCK(bd); atomic_store_int(&bd->bd_running, 1); wakeup(&bd->bd_running); BD_RUN_UNLOCK(bd); } } /* * bufspace_daemon_wait: * * Sleep until the domain falls below a limit or one second passes. */ static void bufspace_daemon_wait(struct bufdomain *bd) { /* * Re-check our limits and sleep. bd_running must be * cleared prior to checking the limits to avoid missed * wakeups. The waker will adjust one of bufspace or * freebuffers prior to checking bd_running. */ BD_RUN_LOCK(bd); atomic_store_int(&bd->bd_running, 0); if (bd->bd_bufspace < bd->bd_bufspacethresh && bd->bd_freebuffers > bd->bd_lofreebuffers) { msleep(&bd->bd_running, BD_RUN_LOCKPTR(bd), PRIBIO|PDROP, "-", hz); } else { /* Avoid spurious wakeups while running. */ atomic_store_int(&bd->bd_running, 1); BD_RUN_UNLOCK(bd); } } /* * bufspace_adjust: * * Adjust the reported bufspace for a KVA managed buffer, possibly * waking any waiters. */ static void bufspace_adjust(struct buf *bp, int bufsize) { struct bufdomain *bd; long space; int diff; KASSERT((bp->b_flags & B_MALLOC) == 0, ("bufspace_adjust: malloc buf %p", bp)); bd = bufdomain(bp); diff = bufsize - bp->b_bufsize; if (diff < 0) { atomic_subtract_long(&bd->bd_bufspace, -diff); } else if (diff > 0) { space = atomic_fetchadd_long(&bd->bd_bufspace, diff); /* Wake up the daemon on the transition. */ if (space < bd->bd_bufspacethresh && space + diff >= bd->bd_bufspacethresh) bufspace_daemon_wakeup(bd); } bp->b_bufsize = bufsize; } /* * bufspace_reserve: * * Reserve bufspace before calling allocbuf(). metadata has a * different space limit than data. */ static int bufspace_reserve(struct bufdomain *bd, int size, bool metadata) { long limit, new; long space; if (metadata) limit = bd->bd_maxbufspace; else limit = bd->bd_hibufspace; space = atomic_fetchadd_long(&bd->bd_bufspace, size); new = space + size; if (new > limit) { atomic_subtract_long(&bd->bd_bufspace, size); return (ENOSPC); } /* Wake up the daemon on the transition. */ if (space < bd->bd_bufspacethresh && new >= bd->bd_bufspacethresh) bufspace_daemon_wakeup(bd); return (0); } /* * bufspace_release: * * Release reserved bufspace after bufspace_adjust() has consumed it. */ static void bufspace_release(struct bufdomain *bd, int size) { atomic_subtract_long(&bd->bd_bufspace, size); } /* * bufspace_wait: * * Wait for bufspace, acting as the buf daemon if a locked vnode is * supplied. bd_wanted must be set prior to polling for space. 
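bufspace_adjust() and bufspace_reserve() above wake the bufspace daemon only on the transition across bd_bufspacethresh: atomic_fetchadd_long() returns the pre-add total, so comparing the old and new totals against the threshold fires the wakeup exactly once per crossing. A minimal sketch of the same idiom with C11 atomics; the threshold value, names, and the wake_daemon() stub are assumptions for illustration.

#include <stdatomic.h>
#include <stdio.h>

static atomic_long space;
static long thresh = 100;

static void
wake_daemon(void)
{
	printf("daemon woken\n");
}

static void
space_add(long diff)
{
	long old;

	old = atomic_fetch_add(&space, diff);
	if (old < thresh && old + diff >= thresh)
		wake_daemon();	/* fires once, on the crossing only */
}

int
main(void)
{
	space_add(60);	/* 0 -> 60: below the threshold, no wakeup */
	space_add(60);	/* 60 -> 120: crosses 100, wakes the daemon */
	space_add(60);	/* 120 -> 180: already above, no wakeup */
	return (0);
}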
The * operation must be re-tried on return. */ static void bufspace_wait(struct bufdomain *bd, struct vnode *vp, int gbflags, int slpflag, int slptimeo) { struct thread *td; int error, fl, norunbuf; if ((gbflags & GB_NOWAIT_BD) != 0) return; td = curthread; BD_LOCK(bd); while (bd->bd_wanted) { if (vp != NULL && vp->v_type != VCHR && (td->td_pflags & TDP_BUFNEED) == 0) { BD_UNLOCK(bd); /* * getblk() is called with a vnode locked, and * some majority of the dirty buffers may as * well belong to the vnode. Flushing the * buffers there would make a progress that * cannot be achieved by the buf_daemon, that * cannot lock the vnode. */ norunbuf = ~(TDP_BUFNEED | TDP_NORUNNINGBUF) | (td->td_pflags & TDP_NORUNNINGBUF); /* * Play bufdaemon. The getnewbuf() function * may be called while the thread owns lock * for another dirty buffer for the same * vnode, which makes it impossible to use * VOP_FSYNC() there, due to the buffer lock * recursion. */ td->td_pflags |= TDP_BUFNEED | TDP_NORUNNINGBUF; fl = buf_flush(vp, bd, flushbufqtarget); td->td_pflags &= norunbuf; BD_LOCK(bd); if (fl != 0) continue; if (bd->bd_wanted == 0) break; } error = msleep(&bd->bd_wanted, BD_LOCKPTR(bd), (PRIBIO + 4) | slpflag, "newbuf", slptimeo); if (error != 0) break; } BD_UNLOCK(bd); } /* * bufspace_daemon: * * buffer space management daemon. Tries to maintain some marginal * amount of free buffer space so that requesting processes neither * block nor work to reclaim buffers. */ static void bufspace_daemon(void *arg) { struct bufdomain *bd; EVENTHANDLER_REGISTER(shutdown_pre_sync, kthread_shutdown, curthread, SHUTDOWN_PRI_LAST + 100); bd = arg; for (;;) { kthread_suspend_check(); /* * Free buffers from the clean queue until we meet our * targets. * * Theory of operation: The buffer cache is most efficient * when some free buffer headers and space are always * available to getnewbuf(). This daemon attempts to prevent * the excessive blocking and synchronization associated * with shortfall. It goes through three phases according * demand: * * 1) The daemon wakes up voluntarily once per-second * during idle periods when the counters are below * the wakeup thresholds (bufspacethresh, lofreebuffers). * * 2) The daemon wakes up as we cross the thresholds * ahead of any potential blocking. This may bounce * slightly according to the rate of consumption and * release. * * 3) The daemon and consumers are starved for working * clean buffers. This is the 'bufspace' sleep below * which will inefficiently trade bufs with bqrelse * until we return to condition 2. */ while (bd->bd_bufspace > bd->bd_lobufspace || bd->bd_freebuffers < bd->bd_hifreebuffers) { if (buf_recycle(bd, false) != 0) { if (bd_flushall(bd)) continue; /* * Speedup dirty if we've run out of clean * buffers. This is possible in particular * because softdep may held many bufs locked * pending writes to other bufs which are * marked for delayed write, exhausting * clean space until they are written. */ bd_speedup(); BD_LOCK(bd); if (bd->bd_wanted) { msleep(&bd->bd_wanted, BD_LOCKPTR(bd), PRIBIO|PDROP, "bufspace", hz/10); } else BD_UNLOCK(bd); } maybe_yield(); } bufspace_daemon_wait(bd); } } /* * bufmallocadjust: * * Adjust the reported bufspace for a malloc managed buffer, possibly * waking any waiters. 
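bufspace_daemon_wakeup() and bufspace_daemon_wait() above avoid both lost wakeups and needless locking: the waker takes the run lock only on the 0 -> 1 transition of bd_running, and the daemon clears bd_running before re-checking its limits so a wakeup that races with it is still observed. A rough pthread sketch of that handshake, with hypothetical names and a plain pending_work counter standing in for the bufspace/freebuffers limits; the kernel daemon additionally bounds its sleep at one second, which the sketch omits.

#include <pthread.h>
#include <stdatomic.h>

static pthread_mutex_t run_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t run_cv = PTHREAD_COND_INITIALIZER;
static atomic_int running;
static atomic_long pending_work;

/* Callers update pending_work before calling this. */
static void
daemon_wakeup(void)
{
	/* Skip the lock entirely if the daemon is already awake. */
	if (atomic_fetch_add(&running, 1) == 0) {
		pthread_mutex_lock(&run_lock);
		atomic_store(&running, 1);
		pthread_cond_signal(&run_cv);
		pthread_mutex_unlock(&run_lock);
	}
}

static void
daemon_wait(void)
{
	pthread_mutex_lock(&run_lock);
	/* Clear "running" before checking for work so a racing wakeup is seen. */
	atomic_store(&running, 0);
	while (atomic_load(&pending_work) == 0)
		pthread_cond_wait(&run_cv, &run_lock);
	atomic_store(&running, 1);
	pthread_mutex_unlock(&run_lock);
}

int
main(void)
{
	atomic_fetch_add(&pending_work, 1);	/* produce work ... */
	daemon_wakeup();			/* ... then notify */
	daemon_wait();				/* returns at once: work pending */
	return (0);
}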
*/ static void bufmallocadjust(struct buf *bp, int bufsize) { int diff; KASSERT((bp->b_flags & B_MALLOC) != 0, ("bufmallocadjust: non-malloc buf %p", bp)); diff = bufsize - bp->b_bufsize; if (diff < 0) atomic_subtract_long(&bufmallocspace, -diff); else atomic_add_long(&bufmallocspace, diff); bp->b_bufsize = bufsize; } /* * runningwakeup: * * Wake up processes that are waiting on asynchronous writes to fall * below lorunningspace. */ static void runningwakeup(void) { mtx_lock(&rbreqlock); if (runningbufreq) { runningbufreq = 0; wakeup(&runningbufreq); } mtx_unlock(&rbreqlock); } /* * runningbufwakeup: * * Decrement the outstanding write count according. */ void runningbufwakeup(struct buf *bp) { long space, bspace; bspace = bp->b_runningbufspace; if (bspace == 0) return; space = atomic_fetchadd_long(&runningbufspace, -bspace); KASSERT(space >= bspace, ("runningbufspace underflow %ld %ld", space, bspace)); bp->b_runningbufspace = 0; /* * Only acquire the lock and wakeup on the transition from exceeding * the threshold to falling below it. */ if (space < lorunningspace) return; if (space - bspace > lorunningspace) return; runningwakeup(); } /* * waitrunningbufspace() * * runningbufspace is a measure of the amount of I/O currently * running. This routine is used in async-write situations to * prevent creating huge backups of pending writes to a device. * Only asynchronous writes are governed by this function. * * This does NOT turn an async write into a sync write. It waits * for earlier writes to complete and generally returns before the * caller's write has reached the device. */ void waitrunningbufspace(void) { mtx_lock(&rbreqlock); while (runningbufspace > hirunningspace) { runningbufreq = 1; msleep(&runningbufreq, &rbreqlock, PVM, "wdrain", 0); } mtx_unlock(&rbreqlock); } /* * vfs_buf_test_cache: * * Called when a buffer is extended. This function clears the B_CACHE * bit if the newly extended portion of the buffer does not contain * valid data. */ static __inline void vfs_buf_test_cache(struct buf *bp, vm_ooffset_t foff, vm_offset_t off, vm_offset_t size, vm_page_t m) { VM_OBJECT_ASSERT_LOCKED(m->object); if (bp->b_flags & B_CACHE) { int base = (foff + off) & PAGE_MASK; if (vm_page_is_valid(m, base, size) == 0) bp->b_flags &= ~B_CACHE; } } /* Wake up the buffer daemon if necessary */ static void bd_wakeup(void) { mtx_lock(&bdlock); if (bd_request == 0) { bd_request = 1; wakeup(&bd_request); } mtx_unlock(&bdlock); } /* * Adjust the maxbcachbuf tunable. */ static void maxbcachebuf_adjust(void) { int i; /* * maxbcachebuf must be a power of 2 >= MAXBSIZE. */ i = 2; while (i * 2 <= maxbcachebuf) i *= 2; maxbcachebuf = i; if (maxbcachebuf < MAXBSIZE) maxbcachebuf = MAXBSIZE; if (maxbcachebuf > MAXPHYS) maxbcachebuf = MAXPHYS; if (bootverbose != 0 && maxbcachebuf != MAXBCACHEBUF) printf("maxbcachebuf=%d\n", maxbcachebuf); } /* * bd_speedup - speedup the buffer cache flushing code */ void bd_speedup(void) { int needwake; mtx_lock(&bdlock); needwake = 0; if (bd_speedupreq == 0 || bd_request == 0) needwake = 1; bd_speedupreq = 1; bd_request = 1; if (needwake) wakeup(&bd_request); mtx_unlock(&bdlock); } #ifdef __i386__ #define TRANSIENT_DENOM 5 #else #define TRANSIENT_DENOM 10 #endif /* * Calculating buffer cache scaling values and reserve space for buffer * headers. This is called during low level kernel initialization and * may be called more then once. We CANNOT write to the memory area * being reserved at this time. 
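maxbcachebuf_adjust() above forces the maxbcachebuf tunable down to a power of two and clamps it between MAXBSIZE and MAXPHYS. A worked userland example of the same rounding; the 64 KiB and 128 KiB stand-ins for those two kernel constants are assumptions, not the values on every platform.

#include <stdio.h>

#define MAXBSIZE_EX (64 * 1024)		/* stand-in for MAXBSIZE */
#define MAXPHYS_EX (128 * 1024)		/* stand-in for MAXPHYS */

static int
adjust(int request)
{
	int i;

	/* Largest power of two that is <= request. */
	i = 2;
	while (i * 2 <= request)
		i *= 2;
	if (i < MAXBSIZE_EX)
		i = MAXBSIZE_EX;
	if (i > MAXPHYS_EX)
		i = MAXPHYS_EX;
	return (i);
}

int
main(void)
{
	printf("%d\n", adjust(100 * 1024));	/* 65536: rounded down to 2^16 */
	printf("%d\n", adjust(1024));		/* 65536: clamped up to MAXBSIZE_EX */
	printf("%d\n", adjust(1024 * 1024));	/* 131072: clamped down to MAXPHYS_EX */
	return (0);
}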
*/ caddr_t kern_vfs_bio_buffer_alloc(caddr_t v, long physmem_est) { int tuned_nbuf; long maxbuf, maxbuf_sz, buf_sz, biotmap_sz; /* * physmem_est is in pages. Convert it to kilobytes (assumes * PAGE_SIZE is >= 1K) */ physmem_est = physmem_est * (PAGE_SIZE / 1024); maxbcachebuf_adjust(); /* * The nominal buffer size (and minimum KVA allocation) is BKVASIZE. * For the first 64MB of ram nominally allocate sufficient buffers to * cover 1/4 of our ram. Beyond the first 64MB allocate additional * buffers to cover 1/10 of our ram over 64MB. When auto-sizing * the buffer cache we limit the eventual kva reservation to * maxbcache bytes. * * factor represents the 1/4 x ram conversion. */ if (nbuf == 0) { int factor = 4 * BKVASIZE / 1024; nbuf = 50; if (physmem_est > 4096) nbuf += min((physmem_est - 4096) / factor, 65536 / factor); if (physmem_est > 65536) nbuf += min((physmem_est - 65536) * 2 / (factor * 5), 32 * 1024 * 1024 / (factor * 5)); if (maxbcache && nbuf > maxbcache / BKVASIZE) nbuf = maxbcache / BKVASIZE; tuned_nbuf = 1; } else tuned_nbuf = 0; /* XXX Avoid unsigned long overflows later on with maxbufspace. */ maxbuf = (LONG_MAX / 3) / BKVASIZE; if (nbuf > maxbuf) { if (!tuned_nbuf) printf("Warning: nbufs lowered from %d to %ld\n", nbuf, maxbuf); nbuf = maxbuf; } /* * Ideal allocation size for the transient bio submap is 10% * of the maximal space buffer map. This roughly corresponds * to the amount of the buffer mapped for typical UFS load. * * Clip the buffer map to reserve space for the transient * BIOs, if its extent is bigger than 90% (80% on i386) of the * maximum buffer map extent on the platform. * * The fall-back to the maxbuf in case of maxbcache unset, * allows to not trim the buffer KVA for the architectures * with ample KVA space. */ if (bio_transient_maxcnt == 0 && unmapped_buf_allowed) { maxbuf_sz = maxbcache != 0 ? maxbcache : maxbuf * BKVASIZE; buf_sz = (long)nbuf * BKVASIZE; if (buf_sz < maxbuf_sz / TRANSIENT_DENOM * (TRANSIENT_DENOM - 1)) { /* * There is more KVA than memory. Do not * adjust buffer map size, and assign the rest * of maxbuf to transient map. */ biotmap_sz = maxbuf_sz - buf_sz; } else { /* * Buffer map spans all KVA we could afford on * this platform. Give 10% (20% on i386) of * the buffer map to the transient bio map. */ biotmap_sz = buf_sz / TRANSIENT_DENOM; buf_sz -= biotmap_sz; } if (biotmap_sz / INT_MAX > MAXPHYS) bio_transient_maxcnt = INT_MAX; else bio_transient_maxcnt = biotmap_sz / MAXPHYS; /* * Artificially limit to 1024 simultaneous in-flight I/Os * using the transient mapping. */ if (bio_transient_maxcnt > 1024) bio_transient_maxcnt = 1024; if (tuned_nbuf) nbuf = buf_sz / BKVASIZE; } if (nswbuf == 0) { nswbuf = min(nbuf / 4, 256); if (nswbuf < NSWBUF_MIN) nswbuf = NSWBUF_MIN; } /* * Reserve space for the buffer cache buffers */ buf = (void *)v; v = (caddr_t)(buf + nbuf); return(v); } /* Initialize the buffer subsystem. Called before use of any buffers. 
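The nbuf auto-tuning in kern_vfs_bio_buffer_alloc() above amounts to one BKVASIZE buffer per 64 KiB of the first 64 MiB of RAM (1/4 coverage) plus one per 160 KiB beyond that (1/10 coverage), with both terms capped. A worked userland sketch of that arithmetic, assuming the common 16 KiB BKVASIZE; names and constants carrying an _EX suffix are illustrative stand-ins.

#include <stdio.h>

#define BKVASIZE_EX 16384L
#define MIN_EX(a, b) ((a) < (b) ? (a) : (b))

static long
tune_nbuf(long physmem_kb)
{
	long factor = 4 * BKVASIZE_EX / 1024;	/* 64: one buf per 64 KiB => 1/4 */
	long nbuf = 50;

	if (physmem_kb > 4096)
		nbuf += MIN_EX((physmem_kb - 4096) / factor, 65536 / factor);
	if (physmem_kb > 65536)
		nbuf += MIN_EX((physmem_kb - 65536) * 2 / (factor * 5),
		    32 * 1024 * 1024 / (factor * 5));
	return (nbuf);
}

int
main(void)
{
	/* A 1 GiB machine works out to 50 + 1024 + 6144 = 7218 buffer headers. */
	printf("nbuf for 1 GiB: %ld\n", tune_nbuf(1024 * 1024));
	return (0);
}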
*/ void bufinit(void) { struct buf *bp; int i; KASSERT(maxbcachebuf >= MAXBSIZE, ("maxbcachebuf (%d) must be >= MAXBSIZE (%d)\n", maxbcachebuf, MAXBSIZE)); bq_init(&bqempty, QUEUE_EMPTY, -1, "bufq empty lock"); mtx_init(&rbreqlock, "runningbufspace lock", NULL, MTX_DEF); mtx_init(&bdlock, "buffer daemon lock", NULL, MTX_DEF); mtx_init(&bdirtylock, "dirty buf lock", NULL, MTX_DEF); unmapped_buf = (caddr_t)kva_alloc(MAXPHYS); /* finally, initialize each buffer header and stick on empty q */ for (i = 0; i < nbuf; i++) { bp = &buf[i]; bzero(bp, sizeof *bp); bp->b_flags = B_INVAL; bp->b_rcred = NOCRED; bp->b_wcred = NOCRED; bp->b_qindex = QUEUE_NONE; bp->b_domain = -1; bp->b_subqueue = mp_maxid + 1; bp->b_xflags = 0; bp->b_data = bp->b_kvabase = unmapped_buf; LIST_INIT(&bp->b_dep); BUF_LOCKINIT(bp); bq_insert(&bqempty, bp, false); } /* * maxbufspace is the absolute maximum amount of buffer space we are * allowed to reserve in KVM and in real terms. The absolute maximum * is nominally used by metadata. hibufspace is the nominal maximum * used by most other requests. The differential is required to * ensure that metadata deadlocks don't occur. * * maxbufspace is based on BKVASIZE. Allocating buffers larger then * this may result in KVM fragmentation which is not handled optimally * by the system. XXX This is less true with vmem. We could use * PAGE_SIZE. */ maxbufspace = (long)nbuf * BKVASIZE; hibufspace = lmax(3 * maxbufspace / 4, maxbufspace - maxbcachebuf * 10); lobufspace = (hibufspace / 20) * 19; /* 95% */ bufspacethresh = lobufspace + (hibufspace - lobufspace) / 2; /* * Note: The 16 MiB upper limit for hirunningspace was chosen * arbitrarily and may need further tuning. It corresponds to * 128 outstanding write IO requests (if IO size is 128 KiB), * which fits with many RAID controllers' tagged queuing limits. * The lower 1 MiB limit is the historical upper limit for * hirunningspace. */ hirunningspace = lmax(lmin(roundup(hibufspace / 64, maxbcachebuf), 16 * 1024 * 1024), 1024 * 1024); lorunningspace = roundup((hirunningspace * 2) / 3, maxbcachebuf); /* * Limit the amount of malloc memory since it is wired permanently into * the kernel space. Even though this is accounted for in the buffer * allocation, we don't want the malloced region to grow uncontrolled. * The malloc scheme improves memory utilization significantly on * average (small) directories. */ maxbufmallocspace = hibufspace / 20; /* * Reduce the chance of a deadlock occurring by limiting the number * of delayed-write dirty buffers we allow to stack up. */ hidirtybuffers = nbuf / 4 + 20; dirtybufthresh = hidirtybuffers * 9 / 10; /* * To support extreme low-memory systems, make sure hidirtybuffers * cannot eat up all available buffer space. This occurs when our * minimum cannot be met. We try to size hidirtybuffers to 3/4 our * buffer space assuming BKVASIZE'd buffers. */ while ((long)hidirtybuffers * BKVASIZE > 3 * hibufspace / 4) { hidirtybuffers >>= 1; } lodirtybuffers = hidirtybuffers / 2; /* * lofreebuffers should be sufficient to avoid stalling waiting on * buf headers under heavy utilization. The bufs in per-cpu caches * are counted as free but will be unavailable to threads executing * on other cpus. * * hifreebuffers is the free target for the bufspace daemon. This * should be set appropriately to limit work per-iteration. */ lofreebuffers = MIN((nbuf / 25) + (20 * mp_ncpus), 128 * mp_ncpus); hifreebuffers = (3 * lofreebuffers) / 2; numfreebuffers = nbuf; /* Setup the kva and free list allocators. 
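bufinit() above derives the bufspace watermarks from nbuf: hibufspace leaves roughly ten maxbcachebuf blocks of metadata headroom under maxbufspace (but never drops below 3/4 of it), lobufspace is 95% of hibufspace, and bufspacethresh sits halfway between the two. A small sketch of those relationships, using assumed stand-in constants (16 KiB BKVASIZE, 64 KiB maxbcachebuf) and the nbuf figure from the previous example.

#include <stdio.h>

#define BKVASIZE_EX 16384L
#define MAXBCACHEBUF_EX 65536L

int
main(void)
{
	long nbuf = 7218;	/* e.g. the 1 GiB tuning result above */
	long maxbufspace, hibufspace, lobufspace, bufspacethresh;

	maxbufspace = nbuf * BKVASIZE_EX;
	hibufspace = maxbufspace - MAXBCACHEBUF_EX * 10;
	if (hibufspace < 3 * maxbufspace / 4)
		hibufspace = 3 * maxbufspace / 4;
	lobufspace = (hibufspace / 20) * 19;	/* 95% */
	bufspacethresh = lobufspace + (hibufspace - lobufspace) / 2;
	printf("max %ld hi %ld lo %ld thresh %ld\n",
	    maxbufspace, hibufspace, lobufspace, bufspacethresh);
	return (0);
}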
*/ vmem_set_reclaim(buffer_arena, bufkva_reclaim); buf_zone = uma_zcache_create("buf free cache", sizeof(struct buf), NULL, NULL, NULL, NULL, buf_import, buf_release, NULL, 0); /* * Size the clean queue according to the amount of buffer space. * One queue per-256mb up to the max. More queues gives better * concurrency but less accurate LRU. */ buf_domains = MIN(howmany(maxbufspace, 256*1024*1024), BUF_DOMAINS); for (i = 0 ; i < buf_domains; i++) { struct bufdomain *bd; bd = &bdomain[i]; bd_init(bd); bd->bd_freebuffers = nbuf / buf_domains; bd->bd_hifreebuffers = hifreebuffers / buf_domains; bd->bd_lofreebuffers = lofreebuffers / buf_domains; bd->bd_bufspace = 0; bd->bd_maxbufspace = maxbufspace / buf_domains; bd->bd_hibufspace = hibufspace / buf_domains; bd->bd_lobufspace = lobufspace / buf_domains; bd->bd_bufspacethresh = bufspacethresh / buf_domains; bd->bd_numdirtybuffers = 0; bd->bd_hidirtybuffers = hidirtybuffers / buf_domains; bd->bd_lodirtybuffers = lodirtybuffers / buf_domains; bd->bd_dirtybufthresh = dirtybufthresh / buf_domains; /* Don't allow more than 2% of bufs in the per-cpu caches. */ bd->bd_lim = nbuf / buf_domains / 50 / mp_ncpus; } getnewbufcalls = counter_u64_alloc(M_WAITOK); getnewbufrestarts = counter_u64_alloc(M_WAITOK); mappingrestarts = counter_u64_alloc(M_WAITOK); numbufallocfails = counter_u64_alloc(M_WAITOK); notbufdflushes = counter_u64_alloc(M_WAITOK); buffreekvacnt = counter_u64_alloc(M_WAITOK); bufdefragcnt = counter_u64_alloc(M_WAITOK); bufkvaspace = counter_u64_alloc(M_WAITOK); } #ifdef INVARIANTS static inline void vfs_buf_check_mapped(struct buf *bp) { KASSERT(bp->b_kvabase != unmapped_buf, ("mapped buf: b_kvabase was not updated %p", bp)); KASSERT(bp->b_data != unmapped_buf, ("mapped buf: b_data was not updated %p", bp)); KASSERT(bp->b_data < unmapped_buf || bp->b_data >= unmapped_buf + MAXPHYS, ("b_data + b_offset unmapped %p", bp)); } static inline void vfs_buf_check_unmapped(struct buf *bp) { KASSERT(bp->b_data == unmapped_buf, ("unmapped buf: corrupted b_data %p", bp)); } #define BUF_CHECK_MAPPED(bp) vfs_buf_check_mapped(bp) #define BUF_CHECK_UNMAPPED(bp) vfs_buf_check_unmapped(bp) #else #define BUF_CHECK_MAPPED(bp) do {} while (0) #define BUF_CHECK_UNMAPPED(bp) do {} while (0) #endif static int isbufbusy(struct buf *bp) { if (((bp->b_flags & B_INVAL) == 0 && BUF_ISLOCKED(bp)) || ((bp->b_flags & (B_DELWRI | B_INVAL)) == B_DELWRI)) return (1); return (0); } /* * Shutdown the system cleanly to prepare for reboot, halt, or power off. */ void bufshutdown(int show_busybufs) { static int first_buf_printf = 1; struct buf *bp; int iter, nbusy, pbusy; #ifndef PREEMPTION int subiter; #endif /* * Sync filesystems for shutdown */ wdog_kern_pat(WD_LASTVAL); sys_sync(curthread, NULL); /* * With soft updates, some buffers that are * written will be remarked as dirty until other * buffers are written. */ for (iter = pbusy = 0; iter < 20; iter++) { nbusy = 0; for (bp = &buf[nbuf]; --bp >= buf; ) if (isbufbusy(bp)) nbusy++; if (nbusy == 0) { if (first_buf_printf) printf("All buffers synced."); break; } if (first_buf_printf) { printf("Syncing disks, buffers remaining... "); first_buf_printf = 0; } printf("%d ", nbusy); if (nbusy < pbusy) iter = 0; pbusy = nbusy; wdog_kern_pat(WD_LASTVAL); sys_sync(curthread, NULL); #ifdef PREEMPTION /* * Spin for a while to allow interrupt threads to run. */ DELAY(50000 * iter); #else /* * Context switch several times to allow interrupt * threads to run. 
*/ for (subiter = 0; subiter < 50 * iter; subiter++) { thread_lock(curthread); mi_switch(SW_VOL, NULL); thread_unlock(curthread); DELAY(1000); } #endif } printf("\n"); /* * Count only busy local buffers to prevent forcing * a fsck if we're just a client of a wedged NFS server */ nbusy = 0; for (bp = &buf[nbuf]; --bp >= buf; ) { if (isbufbusy(bp)) { #if 0 /* XXX: This is bogus. We should probably have a BO_REMOTE flag instead */ if (bp->b_dev == NULL) { TAILQ_REMOVE(&mountlist, bp->b_vp->v_mount, mnt_list); continue; } #endif nbusy++; if (show_busybufs > 0) { printf( "%d: buf:%p, vnode:%p, flags:%0x, blkno:%jd, lblkno:%jd, buflock:", nbusy, bp, bp->b_vp, bp->b_flags, (intmax_t)bp->b_blkno, (intmax_t)bp->b_lblkno); BUF_LOCKPRINTINFO(bp); if (show_busybufs > 1) vn_printf(bp->b_vp, "vnode content: "); } } } if (nbusy) { /* * Failed to sync all blocks. Indicate this and don't * unmount filesystems (thus forcing an fsck on reboot). */ printf("Giving up on %d buffers\n", nbusy); DELAY(5000000); /* 5 seconds */ } else { if (!first_buf_printf) printf("Final sync complete\n"); /* * Unmount filesystems */ if (panicstr == NULL) vfs_unmountall(); } swapoff_all(); DELAY(100000); /* wait for console output to finish */ } static void bpmap_qenter(struct buf *bp) { BUF_CHECK_MAPPED(bp); /* * bp->b_data is relative to bp->b_offset, but * bp->b_offset may be offset into the first page. */ bp->b_data = (caddr_t)trunc_page((vm_offset_t)bp->b_data); pmap_qenter((vm_offset_t)bp->b_data, bp->b_pages, bp->b_npages); bp->b_data = (caddr_t)((vm_offset_t)bp->b_data | (vm_offset_t)(bp->b_offset & PAGE_MASK)); } static inline struct bufdomain * bufdomain(struct buf *bp) { return (&bdomain[bp->b_domain]); } static struct bufqueue * bufqueue(struct buf *bp) { switch (bp->b_qindex) { case QUEUE_NONE: /* FALLTHROUGH */ case QUEUE_SENTINEL: return (NULL); case QUEUE_EMPTY: return (&bqempty); case QUEUE_DIRTY: return (&bufdomain(bp)->bd_dirtyq); case QUEUE_CLEAN: return (&bufdomain(bp)->bd_subq[bp->b_subqueue]); default: break; } panic("bufqueue(%p): Unhandled type %d\n", bp, bp->b_qindex); } /* * Return the locked bufqueue that bp is a member of. */ static struct bufqueue * bufqueue_acquire(struct buf *bp) { struct bufqueue *bq, *nbq; /* * bp can be pushed from a per-cpu queue to the * cleanq while we're waiting on the lock. Retry * if the queues don't match. */ bq = bufqueue(bp); BQ_LOCK(bq); for (;;) { nbq = bufqueue(bp); if (bq == nbq) break; BQ_UNLOCK(bq); BQ_LOCK(nbq); bq = nbq; } return (bq); } /* * binsfree: * * Insert the buffer into the appropriate free list. Requires a * locked buffer on entry and buffer is unlocked before return. */ static void binsfree(struct buf *bp, int qindex) { struct bufdomain *bd; struct bufqueue *bq; KASSERT(qindex == QUEUE_CLEAN || qindex == QUEUE_DIRTY, ("binsfree: Invalid qindex %d", qindex)); BUF_ASSERT_XLOCKED(bp); /* * Handle delayed bremfree() processing. */ if (bp->b_flags & B_REMFREE) { if (bp->b_qindex == qindex) { bp->b_flags |= B_REUSE; bp->b_flags &= ~B_REMFREE; BUF_UNLOCK(bp); return; } bq = bufqueue_acquire(bp); bq_remove(bq, bp); BQ_UNLOCK(bq); } bd = bufdomain(bp); if (qindex == QUEUE_CLEAN) { if (bd->bd_lim != 0) bq = &bd->bd_subq[PCPU_GET(cpuid)]; else bq = bd->bd_cleanq; } else bq = &bd->bd_dirtyq; bq_insert(bq, bp, true); } /* * buf_free: * * Free a buffer to the buf zone once it no longer has valid contents. 
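bufqueue_acquire() above has to cope with a buffer migrating from a per-CPU subqueue to the clean queue while the caller is blocked on a queue lock, so it locks the queue it last observed and re-checks until the two agree. A minimal pthread sketch of that lock-and-revalidate loop; the types are hypothetical, and it assumes i_queue is only changed while the relevant queue lock is held.

#include <pthread.h>

struct queue {
	pthread_mutex_t q_lock;
};

struct item {
	struct queue *i_queue;	/* may change while we sleep on a lock */
};

static struct queue *
item_queue_acquire(struct item *it)
{
	struct queue *q, *nq;

	q = it->i_queue;
	pthread_mutex_lock(&q->q_lock);
	for (;;) {
		nq = it->i_queue;
		if (q == nq)
			break;		/* still on the queue we locked */
		pthread_mutex_unlock(&q->q_lock);
		pthread_mutex_lock(&nq->q_lock);
		q = nq;
	}
	return (q);	/* returned with its lock held */
}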
*/ static void buf_free(struct buf *bp) { if (bp->b_flags & B_REMFREE) bremfreef(bp); if (bp->b_vflags & BV_BKGRDINPROG) panic("losing buffer 1"); if (bp->b_rcred != NOCRED) { crfree(bp->b_rcred); bp->b_rcred = NOCRED; } if (bp->b_wcred != NOCRED) { crfree(bp->b_wcred); bp->b_wcred = NOCRED; } if (!LIST_EMPTY(&bp->b_dep)) buf_deallocate(bp); bufkva_free(bp); atomic_add_int(&bufdomain(bp)->bd_freebuffers, 1); BUF_UNLOCK(bp); uma_zfree(buf_zone, bp); } /* * buf_import: * * Import bufs into the uma cache from the buf list. The system still * expects a static array of bufs and much of the synchronization * around bufs assumes type stable storage. As a result, UMA is used * only as a per-cpu cache of bufs still maintained on a global list. */ static int buf_import(void *arg, void **store, int cnt, int domain, int flags) { struct buf *bp; int i; BQ_LOCK(&bqempty); for (i = 0; i < cnt; i++) { bp = TAILQ_FIRST(&bqempty.bq_queue); if (bp == NULL) break; bq_remove(&bqempty, bp); store[i] = bp; } BQ_UNLOCK(&bqempty); return (i); } /* * buf_release: * * Release bufs from the uma cache back to the buffer queues. */ static void buf_release(void *arg, void **store, int cnt) { struct bufqueue *bq; struct buf *bp; int i; bq = &bqempty; BQ_LOCK(bq); for (i = 0; i < cnt; i++) { bp = store[i]; /* Inline bq_insert() to batch locking. */ TAILQ_INSERT_TAIL(&bq->bq_queue, bp, b_freelist); bp->b_flags &= ~(B_AGE | B_REUSE); bq->bq_len++; bp->b_qindex = bq->bq_index; } BQ_UNLOCK(bq); } /* * buf_alloc: * * Allocate an empty buffer header. */ static struct buf * buf_alloc(struct bufdomain *bd) { struct buf *bp; int freebufs; /* * We can only run out of bufs in the buf zone if the average buf * is less than BKVASIZE. In this case the actual wait/block will * come from buf_reycle() failing to flush one of these small bufs. */ bp = NULL; freebufs = atomic_fetchadd_int(&bd->bd_freebuffers, -1); if (freebufs > 0) bp = uma_zalloc(buf_zone, M_NOWAIT); if (bp == NULL) { atomic_add_int(&bd->bd_freebuffers, 1); bufspace_daemon_wakeup(bd); counter_u64_add(numbufallocfails, 1); return (NULL); } /* * Wake-up the bufspace daemon on transition below threshold. */ if (freebufs == bd->bd_lofreebuffers) bufspace_daemon_wakeup(bd); if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) != 0) panic("getnewbuf_empty: Locked buf %p on free queue.", bp); KASSERT(bp->b_vp == NULL, ("bp: %p still has vnode %p.", bp, bp->b_vp)); KASSERT((bp->b_flags & (B_DELWRI | B_NOREUSE)) == 0, ("invalid buffer %p flags %#x", bp, bp->b_flags)); KASSERT((bp->b_xflags & (BX_VNCLEAN|BX_VNDIRTY)) == 0, ("bp: %p still on a buffer list. xflags %X", bp, bp->b_xflags)); KASSERT(bp->b_npages == 0, ("bp: %p still has %d vm pages\n", bp, bp->b_npages)); KASSERT(bp->b_kvasize == 0, ("bp: %p still has kva\n", bp)); KASSERT(bp->b_bufsize == 0, ("bp: %p still has bufspace\n", bp)); bp->b_domain = BD_DOMAIN(bd); bp->b_flags = 0; bp->b_ioflags = 0; bp->b_xflags = 0; bp->b_vflags = 0; bp->b_vp = NULL; bp->b_blkno = bp->b_lblkno = 0; bp->b_offset = NOOFFSET; bp->b_iodone = 0; bp->b_error = 0; bp->b_resid = 0; bp->b_bcount = 0; bp->b_npages = 0; bp->b_dirtyoff = bp->b_dirtyend = 0; bp->b_bufobj = NULL; bp->b_data = bp->b_kvabase = unmapped_buf; bp->b_fsprivate1 = NULL; bp->b_fsprivate2 = NULL; bp->b_fsprivate3 = NULL; LIST_INIT(&bp->b_dep); return (bp); } /* * buf_recycle: * * Free a buffer from the given bufqueue. kva controls whether the * freed buf must own some kva resources. This is used for * defragmenting. 
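buf_import() and buf_release() above let UMA act purely as a per-CPU front cache over the static buf array: the lock protecting the global empty queue is taken once per batch instead of once per buffer. A userland sketch of the same batched import/release shape, with hypothetical names and a mutex-protected tail queue standing in for bqempty.

#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

struct obj {
	TAILQ_ENTRY(obj) link;
};
TAILQ_HEAD(objq, obj);

static struct objq freelist = TAILQ_HEAD_INITIALIZER(freelist);
static pthread_mutex_t freelist_lock = PTHREAD_MUTEX_INITIALIZER;

/* Fill "store" with up to "cnt" objects from the global list. */
static int
cache_import(void **store, int cnt)
{
	struct obj *o;
	int i;

	pthread_mutex_lock(&freelist_lock);
	for (i = 0; i < cnt; i++) {
		if ((o = TAILQ_FIRST(&freelist)) == NULL)
			break;
		TAILQ_REMOVE(&freelist, o, link);
		store[i] = o;
	}
	pthread_mutex_unlock(&freelist_lock);
	return (i);
}

/* Return a batch of objects to the global list under one lock hold. */
static void
cache_release(void **store, int cnt)
{
	int i;

	pthread_mutex_lock(&freelist_lock);
	for (i = 0; i < cnt; i++)
		TAILQ_INSERT_TAIL(&freelist, (struct obj *)store[i], link);
	pthread_mutex_unlock(&freelist_lock);
}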
*/ static int buf_recycle(struct bufdomain *bd, bool kva) { struct bufqueue *bq; struct buf *bp, *nbp; if (kva) counter_u64_add(bufdefragcnt, 1); nbp = NULL; bq = bd->bd_cleanq; BQ_LOCK(bq); KASSERT(BQ_LOCKPTR(bq) == BD_LOCKPTR(bd), ("buf_recycle: Locks don't match")); nbp = TAILQ_FIRST(&bq->bq_queue); /* * Run scan, possibly freeing data and/or kva mappings on the fly * depending. */ while ((bp = nbp) != NULL) { /* * Calculate next bp (we can only use it if we do not * release the bqlock). */ nbp = TAILQ_NEXT(bp, b_freelist); /* * If we are defragging then we need a buffer with * some kva to reclaim. */ if (kva && bp->b_kvasize == 0) continue; if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) != 0) continue; /* * Implement a second chance algorithm for frequently * accessed buffers. */ if ((bp->b_flags & B_REUSE) != 0) { TAILQ_REMOVE(&bq->bq_queue, bp, b_freelist); TAILQ_INSERT_TAIL(&bq->bq_queue, bp, b_freelist); bp->b_flags &= ~B_REUSE; BUF_UNLOCK(bp); continue; } /* * Skip buffers with background writes in progress. */ if ((bp->b_vflags & BV_BKGRDINPROG) != 0) { BUF_UNLOCK(bp); continue; } KASSERT(bp->b_qindex == QUEUE_CLEAN, ("buf_recycle: inconsistent queue %d bp %p", bp->b_qindex, bp)); KASSERT(bp->b_domain == BD_DOMAIN(bd), ("getnewbuf: queue domain %d doesn't match request %d", bp->b_domain, (int)BD_DOMAIN(bd))); /* * NOTE: nbp is now entirely invalid. We can only restart * the scan from this point on. */ bq_remove(bq, bp); BQ_UNLOCK(bq); /* * Requeue the background write buffer with error and * restart the scan. */ if ((bp->b_vflags & BV_BKGRDERR) != 0) { bqrelse(bp); BQ_LOCK(bq); nbp = TAILQ_FIRST(&bq->bq_queue); continue; } bp->b_flags |= B_INVAL; brelse(bp); return (0); } bd->bd_wanted = 1; BQ_UNLOCK(bq); return (ENOBUFS); } /* * bremfree: * * Mark the buffer for removal from the appropriate free list. * */ void bremfree(struct buf *bp) { CTR3(KTR_BUF, "bremfree(%p) vp %p flags %X", bp, bp->b_vp, bp->b_flags); KASSERT((bp->b_flags & B_REMFREE) == 0, ("bremfree: buffer %p already marked for delayed removal.", bp)); KASSERT(bp->b_qindex != QUEUE_NONE, ("bremfree: buffer %p not on a queue.", bp)); BUF_ASSERT_XLOCKED(bp); bp->b_flags |= B_REMFREE; } /* * bremfreef: * * Force an immediate removal from a free list. Used only in nfs when * it abuses the b_freelist pointer. */ void bremfreef(struct buf *bp) { struct bufqueue *bq; bq = bufqueue_acquire(bp); bq_remove(bq, bp); BQ_UNLOCK(bq); } static void bq_init(struct bufqueue *bq, int qindex, int subqueue, const char *lockname) { mtx_init(&bq->bq_lock, lockname, NULL, MTX_DEF); TAILQ_INIT(&bq->bq_queue); bq->bq_len = 0; bq->bq_index = qindex; bq->bq_subqueue = subqueue; } static void bd_init(struct bufdomain *bd) { int i; bd->bd_cleanq = &bd->bd_subq[mp_maxid + 1]; bq_init(bd->bd_cleanq, QUEUE_CLEAN, mp_maxid + 1, "bufq clean lock"); bq_init(&bd->bd_dirtyq, QUEUE_DIRTY, -1, "bufq dirty lock"); for (i = 0; i <= mp_maxid; i++) bq_init(&bd->bd_subq[i], QUEUE_CLEAN, i, "bufq clean subqueue lock"); mtx_init(&bd->bd_run_lock, "bufspace daemon run lock", NULL, MTX_DEF); } /* * bq_remove: * * Removes a buffer from the free list, must be called with the * correct qlock held. 
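buf_recycle() above implements a second-chance policy on the clean queue: a buffer that was marked B_REUSE after it was queued is rotated to the tail with the flag cleared rather than reclaimed, which cheaply approximates LRU. A minimal sketch of that scan over a generic tail queue; the item type is hypothetical, and the locking, background-write, and KVA checks of the real function are left out.

#include <sys/queue.h>
#include <stdbool.h>
#include <stddef.h>

struct item {
	TAILQ_ENTRY(item) link;
	bool reused;		/* set when the item is touched while queued */
};
TAILQ_HEAD(itemq, item);

static struct item *
recycle_one(struct itemq *q)
{
	struct item *it, *next;

	for (it = TAILQ_FIRST(q); it != NULL; it = next) {
		next = TAILQ_NEXT(it, link);
		if (it->reused) {
			/* Second chance: rotate to the tail, clear the hint. */
			TAILQ_REMOVE(q, it, link);
			TAILQ_INSERT_TAIL(q, it, link);
			it->reused = false;
			continue;
		}
		TAILQ_REMOVE(q, it, link);
		return (it);		/* victim to reclaim */
	}
	return (NULL);			/* nothing reclaimable on this pass */
}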
*/ static void bq_remove(struct bufqueue *bq, struct buf *bp) { CTR3(KTR_BUF, "bq_remove(%p) vp %p flags %X", bp, bp->b_vp, bp->b_flags); KASSERT(bp->b_qindex != QUEUE_NONE, ("bq_remove: buffer %p not on a queue.", bp)); KASSERT(bufqueue(bp) == bq, ("bq_remove: Remove buffer %p from wrong queue.", bp)); BQ_ASSERT_LOCKED(bq); if (bp->b_qindex != QUEUE_EMPTY) { BUF_ASSERT_XLOCKED(bp); } KASSERT(bq->bq_len >= 1, ("queue %d underflow", bp->b_qindex)); TAILQ_REMOVE(&bq->bq_queue, bp, b_freelist); bq->bq_len--; bp->b_qindex = QUEUE_NONE; bp->b_flags &= ~(B_REMFREE | B_REUSE); } static void bd_flush(struct bufdomain *bd, struct bufqueue *bq) { struct buf *bp; BQ_ASSERT_LOCKED(bq); if (bq != bd->bd_cleanq) { BD_LOCK(bd); while ((bp = TAILQ_FIRST(&bq->bq_queue)) != NULL) { TAILQ_REMOVE(&bq->bq_queue, bp, b_freelist); TAILQ_INSERT_TAIL(&bd->bd_cleanq->bq_queue, bp, b_freelist); bp->b_subqueue = bd->bd_cleanq->bq_subqueue; } bd->bd_cleanq->bq_len += bq->bq_len; bq->bq_len = 0; } if (bd->bd_wanted) { bd->bd_wanted = 0; wakeup(&bd->bd_wanted); } if (bq != bd->bd_cleanq) BD_UNLOCK(bd); } static int bd_flushall(struct bufdomain *bd) { struct bufqueue *bq; int flushed; int i; if (bd->bd_lim == 0) return (0); flushed = 0; for (i = 0; i <= mp_maxid; i++) { bq = &bd->bd_subq[i]; if (bq->bq_len == 0) continue; BQ_LOCK(bq); bd_flush(bd, bq); BQ_UNLOCK(bq); flushed++; } return (flushed); } static void bq_insert(struct bufqueue *bq, struct buf *bp, bool unlock) { struct bufdomain *bd; if (bp->b_qindex != QUEUE_NONE) panic("bq_insert: free buffer %p onto another queue?", bp); bd = bufdomain(bp); if (bp->b_flags & B_AGE) { /* Place this buf directly on the real queue. */ if (bq->bq_index == QUEUE_CLEAN) bq = bd->bd_cleanq; BQ_LOCK(bq); TAILQ_INSERT_HEAD(&bq->bq_queue, bp, b_freelist); } else { BQ_LOCK(bq); TAILQ_INSERT_TAIL(&bq->bq_queue, bp, b_freelist); } bp->b_flags &= ~(B_AGE | B_REUSE); bq->bq_len++; bp->b_qindex = bq->bq_index; bp->b_subqueue = bq->bq_subqueue; /* * Unlock before we notify so that we don't wakeup a waiter that * fails a trylock on the buf and sleeps again. */ if (unlock) BUF_UNLOCK(bp); if (bp->b_qindex == QUEUE_CLEAN) { /* * Flush the per-cpu queue and notify any waiters. */ if (bd->bd_wanted || (bq != bd->bd_cleanq && bq->bq_len >= bd->bd_lim)) bd_flush(bd, bq); } BQ_UNLOCK(bq); } /* * bufkva_free: * * Free the kva allocation for a buffer. * */ static void bufkva_free(struct buf *bp) { #ifdef INVARIANTS if (bp->b_kvasize == 0) { KASSERT(bp->b_kvabase == unmapped_buf && bp->b_data == unmapped_buf, ("Leaked KVA space on %p", bp)); } else if (buf_mapped(bp)) BUF_CHECK_MAPPED(bp); else BUF_CHECK_UNMAPPED(bp); #endif if (bp->b_kvasize == 0) return; vmem_free(buffer_arena, (vm_offset_t)bp->b_kvabase, bp->b_kvasize); counter_u64_add(bufkvaspace, -bp->b_kvasize); counter_u64_add(buffreekvacnt, 1); bp->b_data = bp->b_kvabase = unmapped_buf; bp->b_kvasize = 0; } /* * bufkva_alloc: * * Allocate the buffer KVA and set b_kvasize and b_kvabase. */ static int bufkva_alloc(struct buf *bp, int maxsize, int gbflags) { vm_offset_t addr; int error; KASSERT((gbflags & GB_UNMAPPED) == 0 || (gbflags & GB_KVAALLOC) != 0, ("Invalid gbflags 0x%x in %s", gbflags, __func__)); bufkva_free(bp); addr = 0; error = vmem_alloc(buffer_arena, maxsize, M_BESTFIT | M_NOWAIT, &addr); if (error != 0) { /* * Buffer map is too fragmented. Request the caller * to defragment the map. 
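bq_insert() and bd_flush() above keep frees on a small per-CPU subqueue and only splice it into the shared clean queue once the subqueue exceeds bd_lim or a waiter exists, trading exact LRU ordering for far fewer acquisitions of the global lock. A rough userland sketch of that spill, with hypothetical names and plain mutexes in place of the kernel's padded mtx locks.

#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

struct node {
	TAILQ_ENTRY(node) link;
};
TAILQ_HEAD(nodeq, node);

struct subq {
	pthread_mutex_t lock;
	struct nodeq queue;
	int len;
};

static struct subq globalq = { PTHREAD_MUTEX_INITIALIZER,
    TAILQ_HEAD_INITIALIZER(globalq.queue), 0 };

#define LOCAL_LIMIT 8

/* Move everything from a locked local queue onto the global queue. */
static void
flush_local(struct subq *local)
{
	struct node *n;

	pthread_mutex_lock(&globalq.lock);
	while ((n = TAILQ_FIRST(&local->queue)) != NULL) {
		TAILQ_REMOVE(&local->queue, n, link);
		TAILQ_INSERT_TAIL(&globalq.queue, n, link);
	}
	globalq.len += local->len;
	local->len = 0;
	pthread_mutex_unlock(&globalq.lock);
}

/* Free a node to the local queue; spill to the global queue when it fills. */
static void
node_free(struct subq *local, struct node *n)
{
	pthread_mutex_lock(&local->lock);
	TAILQ_INSERT_TAIL(&local->queue, n, link);
	local->len++;
	if (local->len >= LOCAL_LIMIT)
		flush_local(local);
	pthread_mutex_unlock(&local->lock);
}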
*/ return (error); } bp->b_kvabase = (caddr_t)addr; bp->b_kvasize = maxsize; counter_u64_add(bufkvaspace, bp->b_kvasize); if ((gbflags & GB_UNMAPPED) != 0) { bp->b_data = unmapped_buf; BUF_CHECK_UNMAPPED(bp); } else { bp->b_data = bp->b_kvabase; BUF_CHECK_MAPPED(bp); } return (0); } /* * bufkva_reclaim: * * Reclaim buffer kva by freeing buffers holding kva. This is a vmem * callback that fires to avoid returning failure. */ static void bufkva_reclaim(vmem_t *vmem, int flags) { bool done; int q; int i; done = false; for (i = 0; i < 5; i++) { for (q = 0; q < buf_domains; q++) if (buf_recycle(&bdomain[q], true) != 0) done = true; if (done) break; } return; } /* * Attempt to initiate asynchronous I/O on read-ahead blocks. We must * clear BIO_ERROR and B_INVAL prior to initiating I/O . If B_CACHE is set, * the buffer is valid and we do not have to do anything. */ static void breada(struct vnode * vp, daddr_t * rablkno, int * rabsize, int cnt, struct ucred * cred, int flags, void (*ckhashfunc)(struct buf *)) { struct buf *rabp; + struct thread *td; int i; + td = curthread; + for (i = 0; i < cnt; i++, rablkno++, rabsize++) { if (inmem(vp, *rablkno)) continue; rabp = getblk(vp, *rablkno, *rabsize, 0, 0, 0); if ((rabp->b_flags & B_CACHE) != 0) { brelse(rabp); continue; } - if (!TD_IS_IDLETHREAD(curthread)) { #ifdef RACCT - if (racct_enable) { - PROC_LOCK(curproc); - racct_add_buf(curproc, rabp, 0); - PROC_UNLOCK(curproc); - } -#endif /* RACCT */ - curthread->td_ru.ru_inblock++; + if (racct_enable) { + PROC_LOCK(curproc); + racct_add_buf(curproc, rabp, 0); + PROC_UNLOCK(curproc); } +#endif /* RACCT */ + td->td_ru.ru_inblock++; rabp->b_flags |= B_ASYNC; rabp->b_flags &= ~B_INVAL; if ((flags & GB_CKHASH) != 0) { rabp->b_flags |= B_CKHASH; rabp->b_ckhashcalc = ckhashfunc; } rabp->b_ioflags &= ~BIO_ERROR; rabp->b_iocmd = BIO_READ; if (rabp->b_rcred == NOCRED && cred != NOCRED) rabp->b_rcred = crhold(cred); vfs_busy_pages(rabp, 0); BUF_KERNPROC(rabp); rabp->b_iooffset = dbtob(rabp->b_blkno); bstrategy(rabp); } } /* * Entry point for bread() and breadn() via #defines in sys/buf.h. * * Get a buffer with the specified data. Look in the cache first. We * must clear BIO_ERROR and B_INVAL prior to initiating I/O. If B_CACHE * is set, the buffer is valid and we do not have to do anything, see * getblk(). Also starts asynchronous I/O on read-ahead blocks. * * Always return a NULL buffer pointer (in bpp) when returning an error. */ int breadn_flags(struct vnode *vp, daddr_t blkno, int size, daddr_t *rablkno, int *rabsize, int cnt, struct ucred *cred, int flags, void (*ckhashfunc)(struct buf *), struct buf **bpp) { struct buf *bp; struct thread *td; int error, readwait, rv; CTR3(KTR_BUF, "breadn(%p, %jd, %d)", vp, blkno, size); td = curthread; /* * Can only return NULL if GB_LOCK_NOWAIT or GB_SPARSE flags * are specified. 
*/ error = getblkx(vp, blkno, size, 0, 0, flags, &bp); if (error != 0) { *bpp = NULL; return (error); } flags &= ~GB_NOSPARSE; *bpp = bp; /* * If not found in cache, do some I/O */ readwait = 0; if ((bp->b_flags & B_CACHE) == 0) { - if (!TD_IS_IDLETHREAD(td)) { #ifdef RACCT - if (racct_enable) { - PROC_LOCK(td->td_proc); - racct_add_buf(td->td_proc, bp, 0); - PROC_UNLOCK(td->td_proc); - } -#endif /* RACCT */ - td->td_ru.ru_inblock++; + if (racct_enable) { + PROC_LOCK(td->td_proc); + racct_add_buf(td->td_proc, bp, 0); + PROC_UNLOCK(td->td_proc); } +#endif /* RACCT */ + td->td_ru.ru_inblock++; bp->b_iocmd = BIO_READ; bp->b_flags &= ~B_INVAL; if ((flags & GB_CKHASH) != 0) { bp->b_flags |= B_CKHASH; bp->b_ckhashcalc = ckhashfunc; } bp->b_ioflags &= ~BIO_ERROR; if (bp->b_rcred == NOCRED && cred != NOCRED) bp->b_rcred = crhold(cred); vfs_busy_pages(bp, 0); bp->b_iooffset = dbtob(bp->b_blkno); bstrategy(bp); ++readwait; } /* * Attempt to initiate asynchronous I/O on read-ahead blocks. */ breada(vp, rablkno, rabsize, cnt, cred, flags, ckhashfunc); rv = 0; if (readwait) { rv = bufwait(bp); if (rv != 0) { brelse(bp); *bpp = NULL; } } return (rv); } /* * Write, release buffer on completion. (Done by iodone * if async). Do not bother writing anything if the buffer * is invalid. * * Note that we set B_CACHE here, indicating that buffer is * fully valid and thus cacheable. This is true even of NFS * now so we set it generally. This could be set either here * or in biodone() since the I/O is synchronous. We put it * here. */ int bufwrite(struct buf *bp) { int oldflags; struct vnode *vp; long space; int vp_md; CTR3(KTR_BUF, "bufwrite(%p) vp %p flags %X", bp, bp->b_vp, bp->b_flags); if ((bp->b_bufobj->bo_flag & BO_DEAD) != 0) { bp->b_flags |= B_INVAL | B_RELBUF; bp->b_flags &= ~B_CACHE; brelse(bp); return (ENXIO); } if (bp->b_flags & B_INVAL) { brelse(bp); return (0); } if (bp->b_flags & B_BARRIER) atomic_add_long(&barrierwrites, 1); oldflags = bp->b_flags; BUF_ASSERT_HELD(bp); KASSERT(!(bp->b_vflags & BV_BKGRDINPROG), ("FFS background buffer should not get here %p", bp)); vp = bp->b_vp; if (vp) vp_md = vp->v_vflag & VV_MD; else vp_md = 0; /* * Mark the buffer clean. Increment the bufobj write count * before bundirty() call, to prevent other thread from seeing * empty dirty list and zero counter for writes in progress, * falsely indicating that the bufobj is clean. */ bufobj_wref(bp->b_bufobj); bundirty(bp); bp->b_flags &= ~B_DONE; bp->b_ioflags &= ~BIO_ERROR; bp->b_flags |= B_CACHE; bp->b_iocmd = BIO_WRITE; vfs_busy_pages(bp, 1); /* * Normal bwrites pipeline writes */ bp->b_runningbufspace = bp->b_bufsize; space = atomic_fetchadd_long(&runningbufspace, bp->b_runningbufspace); - if (!TD_IS_IDLETHREAD(curthread)) { #ifdef RACCT - if (racct_enable) { - PROC_LOCK(curproc); - racct_add_buf(curproc, bp, 1); - PROC_UNLOCK(curproc); - } -#endif /* RACCT */ - curthread->td_ru.ru_oublock++; + if (racct_enable) { + PROC_LOCK(curproc); + racct_add_buf(curproc, bp, 1); + PROC_UNLOCK(curproc); } +#endif /* RACCT */ + curthread->td_ru.ru_oublock++; if (oldflags & B_ASYNC) BUF_KERNPROC(bp); bp->b_iooffset = dbtob(bp->b_blkno); buf_track(bp, __func__); bstrategy(bp); if ((oldflags & B_ASYNC) == 0) { int rtval = bufwait(bp); brelse(bp); return (rtval); } else if (space > hirunningspace) { /* * don't allow the async write to saturate the I/O * system. We will not deadlock here because * we are blocking waiting for I/O that is already in-progress * to complete. 
We do not block here if it is the update * or syncer daemon trying to clean up as that can lead * to deadlock. */ if ((curthread->td_pflags & TDP_NORUNNINGBUF) == 0 && !vp_md) waitrunningbufspace(); } return (0); } void bufbdflush(struct bufobj *bo, struct buf *bp) { struct buf *nbp; if (bo->bo_dirty.bv_cnt > dirtybufthresh + 10) { (void) VOP_FSYNC(bp->b_vp, MNT_NOWAIT, curthread); altbufferflushes++; } else if (bo->bo_dirty.bv_cnt > dirtybufthresh) { BO_LOCK(bo); /* * Try to find a buffer to flush. */ TAILQ_FOREACH(nbp, &bo->bo_dirty.bv_hd, b_bobufs) { if ((nbp->b_vflags & BV_BKGRDINPROG) || BUF_LOCK(nbp, LK_EXCLUSIVE | LK_NOWAIT, NULL)) continue; if (bp == nbp) panic("bdwrite: found ourselves"); BO_UNLOCK(bo); /* Don't countdeps with the bo lock held. */ if (buf_countdeps(nbp, 0)) { BO_LOCK(bo); BUF_UNLOCK(nbp); continue; } if (nbp->b_flags & B_CLUSTEROK) { vfs_bio_awrite(nbp); } else { bremfree(nbp); bawrite(nbp); } dirtybufferflushes++; break; } if (nbp == NULL) BO_UNLOCK(bo); } } /* * Delayed write. (Buffer is marked dirty). Do not bother writing * anything if the buffer is marked invalid. * * Note that since the buffer must be completely valid, we can safely * set B_CACHE. In fact, we have to set B_CACHE here rather then in * biodone() in order to prevent getblk from writing the buffer * out synchronously. */ void bdwrite(struct buf *bp) { struct thread *td = curthread; struct vnode *vp; struct bufobj *bo; CTR3(KTR_BUF, "bdwrite(%p) vp %p flags %X", bp, bp->b_vp, bp->b_flags); KASSERT(bp->b_bufobj != NULL, ("No b_bufobj %p", bp)); KASSERT((bp->b_flags & B_BARRIER) == 0, ("Barrier request in delayed write %p", bp)); BUF_ASSERT_HELD(bp); if (bp->b_flags & B_INVAL) { brelse(bp); return; } /* * If we have too many dirty buffers, don't create any more. * If we are wildly over our limit, then force a complete * cleanup. Otherwise, just keep the situation from getting * out of control. Note that we have to avoid a recursive * disaster and not try to clean up after our own cleanup! */ vp = bp->b_vp; bo = bp->b_bufobj; if ((td->td_pflags & (TDP_COWINPROGRESS|TDP_INBDFLUSH)) == 0) { td->td_pflags |= TDP_INBDFLUSH; BO_BDFLUSH(bo, bp); td->td_pflags &= ~TDP_INBDFLUSH; } else recursiveflushes++; bdirty(bp); /* * Set B_CACHE, indicating that the buffer is fully valid. This is * true even of NFS now. */ bp->b_flags |= B_CACHE; /* * This bmap keeps the system from needing to do the bmap later, * perhaps when the system is attempting to do a sync. Since it * is likely that the indirect block -- or whatever other datastructure * that the filesystem needs is still in memory now, it is a good * thing to do this. Note also, that if the pageout daemon is * requesting a sync -- there might not be enough memory to do * the bmap then... So, this is important to do. */ if (vp->v_type != VCHR && bp->b_lblkno == bp->b_blkno) { VOP_BMAP(vp, bp->b_lblkno, NULL, &bp->b_blkno, NULL, NULL); } buf_track(bp, __func__); /* * Set the *dirty* buffer range based upon the VM system dirty * pages. * * Mark the buffer pages as clean. We need to do this here to * satisfy the vnode_pager and the pageout daemon, so that it * thinks that the pages have been "cleaned". Note that since * the pages are in a delayed write buffer -- the VFS layer * "will" see that the pages get written out on the next sync, * or perhaps the cluster will be completed. */ vfs_clean_pages_dirty_buf(bp); bqrelse(bp); /* * note: we cannot initiate I/O from a bdwrite even if we wanted to, * due to the softdep code. 
*/ } /* * bdirty: * * Turn buffer into delayed write request. We must clear BIO_READ and * B_RELBUF, and we must set B_DELWRI. We reassign the buffer to * itself to properly update it in the dirty/clean lists. We mark it * B_DONE to ensure that any asynchronization of the buffer properly * clears B_DONE ( else a panic will occur later ). * * bdirty() is kinda like bdwrite() - we have to clear B_INVAL which * might have been set pre-getblk(). Unlike bwrite/bdwrite, bdirty() * should only be called if the buffer is known-good. * * Since the buffer is not on a queue, we do not update the numfreebuffers * count. * * The buffer must be on QUEUE_NONE. */ void bdirty(struct buf *bp) { CTR3(KTR_BUF, "bdirty(%p) vp %p flags %X", bp, bp->b_vp, bp->b_flags); KASSERT(bp->b_bufobj != NULL, ("No b_bufobj %p", bp)); KASSERT(bp->b_flags & B_REMFREE || bp->b_qindex == QUEUE_NONE, ("bdirty: buffer %p still on queue %d", bp, bp->b_qindex)); BUF_ASSERT_HELD(bp); bp->b_flags &= ~(B_RELBUF); bp->b_iocmd = BIO_WRITE; if ((bp->b_flags & B_DELWRI) == 0) { bp->b_flags |= /* XXX B_DONE | */ B_DELWRI; reassignbuf(bp); bdirtyadd(bp); } } /* * bundirty: * * Clear B_DELWRI for buffer. * * Since the buffer is not on a queue, we do not update the numfreebuffers * count. * * The buffer must be on QUEUE_NONE. */ void bundirty(struct buf *bp) { CTR3(KTR_BUF, "bundirty(%p) vp %p flags %X", bp, bp->b_vp, bp->b_flags); KASSERT(bp->b_bufobj != NULL, ("No b_bufobj %p", bp)); KASSERT(bp->b_flags & B_REMFREE || bp->b_qindex == QUEUE_NONE, ("bundirty: buffer %p still on queue %d", bp, bp->b_qindex)); BUF_ASSERT_HELD(bp); if (bp->b_flags & B_DELWRI) { bp->b_flags &= ~B_DELWRI; reassignbuf(bp); bdirtysub(bp); } /* * Since it is now being written, we can clear its deferred write flag. */ bp->b_flags &= ~B_DEFERRED; } /* * bawrite: * * Asynchronous write. Start output on a buffer, but do not wait for * it to complete. The buffer is released when the output completes. * * bwrite() ( or the VOP routine anyway ) is responsible for handling * B_INVAL buffers. Not us. */ void bawrite(struct buf *bp) { bp->b_flags |= B_ASYNC; (void) bwrite(bp); } /* * babarrierwrite: * * Asynchronous barrier write. Start output on a buffer, but do not * wait for it to complete. Place a write barrier after this write so * that this buffer and all buffers written before it are committed to * the disk before any buffers written after this write are committed * to the disk. The buffer is released when the output completes. */ void babarrierwrite(struct buf *bp) { bp->b_flags |= B_ASYNC | B_BARRIER; (void) bwrite(bp); } /* * bbarrierwrite: * * Synchronous barrier write. Start output on a buffer and wait for * it to complete. Place a write barrier after this write so that * this buffer and all buffers written before it are committed to * the disk before any buffers written after this write are committed * to the disk. The buffer is released when the output completes. */ int bbarrierwrite(struct buf *bp) { bp->b_flags |= B_BARRIER; return (bwrite(bp)); } /* * bwillwrite: * * Called prior to the locking of any vnodes when we are expecting to * write. We do not want to starve the buffer cache with too many * dirty buffers so we block here. By blocking prior to the locking * of any vnodes we attempt to avoid the situation where a locked vnode * prevents the various system daemons from flushing related buffers. 
*/ void bwillwrite(void) { if (buf_dirty_count_severe()) { mtx_lock(&bdirtylock); while (buf_dirty_count_severe()) { bdirtywait = 1; msleep(&bdirtywait, &bdirtylock, (PRIBIO + 4), "flswai", 0); } mtx_unlock(&bdirtylock); } } /* * Return true if we have too many dirty buffers. */ int buf_dirty_count_severe(void) { return (!BIT_EMPTY(BUF_DOMAINS, &bdhidirty)); } /* * brelse: * * Release a busy buffer and, if requested, free its resources. The * buffer will be stashed in the appropriate bufqueue[] allowing it * to be accessed later as a cache entity or reused for other purposes. */ void brelse(struct buf *bp) { struct mount *v_mnt; int qindex; /* * Many functions erroneously call brelse with a NULL bp under rare * error conditions. Simply return when called with a NULL bp. */ if (bp == NULL) return; CTR3(KTR_BUF, "brelse(%p) vp %p flags %X", bp, bp->b_vp, bp->b_flags); KASSERT(!(bp->b_flags & (B_CLUSTER|B_PAGING)), ("brelse: inappropriate B_PAGING or B_CLUSTER bp %p", bp)); KASSERT((bp->b_flags & B_VMIO) != 0 || (bp->b_flags & B_NOREUSE) == 0, ("brelse: non-VMIO buffer marked NOREUSE")); if (BUF_LOCKRECURSED(bp)) { /* * Do not process, in particular, do not handle the * B_INVAL/B_RELBUF and do not release to free list. */ BUF_UNLOCK(bp); return; } if (bp->b_flags & B_MANAGED) { bqrelse(bp); return; } if ((bp->b_vflags & (BV_BKGRDINPROG | BV_BKGRDERR)) == BV_BKGRDERR) { BO_LOCK(bp->b_bufobj); bp->b_vflags &= ~BV_BKGRDERR; BO_UNLOCK(bp->b_bufobj); bdirty(bp); } if (bp->b_iocmd == BIO_WRITE && (bp->b_ioflags & BIO_ERROR) && (bp->b_error != ENXIO || !LIST_EMPTY(&bp->b_dep)) && !(bp->b_flags & B_INVAL)) { /* * Failed write, redirty. All errors except ENXIO (which * means the device is gone) are treated as being * transient. * * XXX Treating EIO as transient is not correct; the * contract with the local storage device drivers is that * they will only return EIO once the I/O is no longer * retriable. Network I/O also respects this through the * guarantees of TCP and/or the internal retries of NFS. * ENOMEM might be transient, but we also have no way of * knowing when its ok to retry/reschedule. In general, * this entire case should be made obsolete through better * error handling/recovery and resource scheduling. * * Do this also for buffers that failed with ENXIO, but have * non-empty dependencies - the soft updates code might need * to access the buffer to untangle them. * * Must clear BIO_ERROR to prevent pages from being scrapped. */ bp->b_ioflags &= ~BIO_ERROR; bdirty(bp); } else if ((bp->b_flags & (B_NOCACHE | B_INVAL)) || (bp->b_ioflags & BIO_ERROR) || (bp->b_bufsize <= 0)) { /* * Either a failed read I/O, or we were asked to free or not * cache the buffer, or we failed to write to a device that's * no longer present. */ bp->b_flags |= B_INVAL; if (!LIST_EMPTY(&bp->b_dep)) buf_deallocate(bp); if (bp->b_flags & B_DELWRI) bdirtysub(bp); bp->b_flags &= ~(B_DELWRI | B_CACHE); if ((bp->b_flags & B_VMIO) == 0) { allocbuf(bp, 0); if (bp->b_vp) brelvp(bp); } } /* * We must clear B_RELBUF if B_DELWRI is set. If vfs_vmio_truncate() * is called with B_DELWRI set, the underlying pages may wind up * getting freed causing a previous write (bdwrite()) to get 'lost' * because pages associated with a B_DELWRI bp are marked clean. * * We still allow the B_INVAL case to call vfs_vmio_truncate(), even * if B_DELWRI is set. */ if (bp->b_flags & B_DELWRI) bp->b_flags &= ~B_RELBUF; /* * VMIO buffer rundown. It is not very necessary to keep a VMIO buffer * constituted, not even NFS buffers now. 
Two flags effect this. If * B_INVAL, the struct buf is invalidated but the VM object is kept * around ( i.e. so it is trivial to reconstitute the buffer later ). * * If BIO_ERROR or B_NOCACHE is set, pages in the VM object will be * invalidated. BIO_ERROR cannot be set for a failed write unless the * buffer is also B_INVAL because it hits the re-dirtying code above. * * Normally we can do this whether a buffer is B_DELWRI or not. If * the buffer is an NFS buffer, it is tracking piecemeal writes or * the commit state and we cannot afford to lose the buffer. If the * buffer has a background write in progress, we need to keep it * around to prevent it from being reconstituted and starting a second * background write. */ v_mnt = bp->b_vp != NULL ? bp->b_vp->v_mount : NULL; if ((bp->b_flags & B_VMIO) && (bp->b_flags & B_NOCACHE || (bp->b_ioflags & BIO_ERROR && bp->b_iocmd == BIO_READ)) && (v_mnt == NULL || (v_mnt->mnt_vfc->vfc_flags & VFCF_NETWORK) == 0 || vn_isdisk(bp->b_vp, NULL) || (bp->b_flags & B_DELWRI) == 0)) { vfs_vmio_invalidate(bp); allocbuf(bp, 0); } if ((bp->b_flags & (B_INVAL | B_RELBUF)) != 0 || (bp->b_flags & (B_DELWRI | B_NOREUSE)) == B_NOREUSE) { allocbuf(bp, 0); bp->b_flags &= ~B_NOREUSE; if (bp->b_vp != NULL) brelvp(bp); } /* * If the buffer has junk contents signal it and eventually * clean up B_DELWRI and diassociate the vnode so that gbincore() * doesn't find it. */ if (bp->b_bufsize == 0 || (bp->b_ioflags & BIO_ERROR) != 0 || (bp->b_flags & (B_INVAL | B_NOCACHE | B_RELBUF)) != 0) bp->b_flags |= B_INVAL; if (bp->b_flags & B_INVAL) { if (bp->b_flags & B_DELWRI) bundirty(bp); if (bp->b_vp) brelvp(bp); } buf_track(bp, __func__); /* buffers with no memory */ if (bp->b_bufsize == 0) { buf_free(bp); return; } /* buffers with junk contents */ if (bp->b_flags & (B_INVAL | B_NOCACHE | B_RELBUF) || (bp->b_ioflags & BIO_ERROR)) { bp->b_xflags &= ~(BX_BKGRDWRITE | BX_ALTDATA); if (bp->b_vflags & BV_BKGRDINPROG) panic("losing buffer 2"); qindex = QUEUE_CLEAN; bp->b_flags |= B_AGE; /* remaining buffers */ } else if (bp->b_flags & B_DELWRI) qindex = QUEUE_DIRTY; else qindex = QUEUE_CLEAN; if ((bp->b_flags & B_DELWRI) == 0 && (bp->b_xflags & BX_VNDIRTY)) panic("brelse: not dirty"); bp->b_flags &= ~(B_ASYNC | B_NOCACHE | B_RELBUF | B_DIRECT); /* binsfree unlocks bp. */ binsfree(bp, qindex); } /* * Release a buffer back to the appropriate queue but do not try to free * it. The buffer is expected to be used again soon. * * bqrelse() is used by bdwrite() to requeue a delayed write, and used by * biodone() to requeue an async I/O on completion. It is also used when * known good buffers need to be requeued but we think we may need the data * again soon. * * XXX we should be able to leave the B_RELBUF hint set on completion. 
*/ void bqrelse(struct buf *bp) { int qindex; CTR3(KTR_BUF, "bqrelse(%p) vp %p flags %X", bp, bp->b_vp, bp->b_flags); KASSERT(!(bp->b_flags & (B_CLUSTER|B_PAGING)), ("bqrelse: inappropriate B_PAGING or B_CLUSTER bp %p", bp)); qindex = QUEUE_NONE; if (BUF_LOCKRECURSED(bp)) { /* do not release to free list */ BUF_UNLOCK(bp); return; } bp->b_flags &= ~(B_ASYNC | B_NOCACHE | B_AGE | B_RELBUF); if (bp->b_flags & B_MANAGED) { if (bp->b_flags & B_REMFREE) bremfreef(bp); goto out; } /* buffers with stale but valid contents */ if ((bp->b_flags & B_DELWRI) != 0 || (bp->b_vflags & (BV_BKGRDINPROG | BV_BKGRDERR)) == BV_BKGRDERR) { BO_LOCK(bp->b_bufobj); bp->b_vflags &= ~BV_BKGRDERR; BO_UNLOCK(bp->b_bufobj); qindex = QUEUE_DIRTY; } else { if ((bp->b_flags & B_DELWRI) == 0 && (bp->b_xflags & BX_VNDIRTY)) panic("bqrelse: not dirty"); if ((bp->b_flags & B_NOREUSE) != 0) { brelse(bp); return; } qindex = QUEUE_CLEAN; } buf_track(bp, __func__); /* binsfree unlocks bp. */ binsfree(bp, qindex); return; out: buf_track(bp, __func__); /* unlock */ BUF_UNLOCK(bp); } /* * Complete I/O to a VMIO backed page. Validate the pages as appropriate, * restore bogus pages. */ static void vfs_vmio_iodone(struct buf *bp) { vm_ooffset_t foff; vm_page_t m; vm_object_t obj; struct vnode *vp __unused; int i, iosize, resid; bool bogus; obj = bp->b_bufobj->bo_object; KASSERT(obj->paging_in_progress >= bp->b_npages, ("vfs_vmio_iodone: paging in progress(%d) < b_npages(%d)", obj->paging_in_progress, bp->b_npages)); vp = bp->b_vp; KASSERT(vp->v_holdcnt > 0, ("vfs_vmio_iodone: vnode %p has zero hold count", vp)); KASSERT(vp->v_object != NULL, ("vfs_vmio_iodone: vnode %p has no vm_object", vp)); foff = bp->b_offset; KASSERT(bp->b_offset != NOOFFSET, ("vfs_vmio_iodone: bp %p has no buffer offset", bp)); bogus = false; iosize = bp->b_bcount - bp->b_resid; VM_OBJECT_WLOCK(obj); for (i = 0; i < bp->b_npages; i++) { resid = ((foff + PAGE_SIZE) & ~(off_t)PAGE_MASK) - foff; if (resid > iosize) resid = iosize; /* * cleanup bogus pages, restoring the originals */ m = bp->b_pages[i]; if (m == bogus_page) { bogus = true; m = vm_page_lookup(obj, OFF_TO_IDX(foff)); if (m == NULL) panic("biodone: page disappeared!"); bp->b_pages[i] = m; } else if ((bp->b_iocmd == BIO_READ) && resid > 0) { /* * In the write case, the valid and clean bits are * already changed correctly ( see bdwrite() ), so we * only need to do this here in the read case. */ KASSERT((m->dirty & vm_page_bits(foff & PAGE_MASK, resid)) == 0, ("vfs_vmio_iodone: page %p " "has unexpected dirty bits", m)); vfs_page_set_valid(bp, foff, m); } KASSERT(OFF_TO_IDX(foff) == m->pindex, ("vfs_vmio_iodone: foff(%jd)/pindex(%ju) mismatch", (intmax_t)foff, (uintmax_t)m->pindex)); vm_page_sunbusy(m); foff = (foff + PAGE_SIZE) & ~(off_t)PAGE_MASK; iosize -= resid; } vm_object_pip_wakeupn(obj, bp->b_npages); VM_OBJECT_WUNLOCK(obj); if (bogus && buf_mapped(bp)) { BUF_CHECK_MAPPED(bp); pmap_qenter(trunc_page((vm_offset_t)bp->b_data), bp->b_pages, bp->b_npages); } } /* * Unwire a page held by a buf and either free it or update the page queues to * reflect its recent use. */ static void vfs_vmio_unwire(struct buf *bp, vm_page_t m) { bool freed; vm_page_lock(m); if (vm_page_unwire_noq(m)) { if ((bp->b_flags & B_DIRECT) != 0) freed = vm_page_try_to_free(m); else freed = false; if (!freed) { /* * Use a racy check of the valid bits to determine * whether we can accelerate reclamation of the page. 
* The valid bits will be stable unless the page is * being mapped or is referenced by multiple buffers, * and in those cases we expect races to be rare. At * worst we will either accelerate reclamation of a * valid page and violate LRU, or unnecessarily defer * reclamation of an invalid page. * * The B_NOREUSE flag marks data that is not expected to * be reused, so accelerate reclamation in that case * too. Otherwise, maintain LRU. */ if (m->valid == 0 || (bp->b_flags & B_NOREUSE) != 0) vm_page_deactivate_noreuse(m); else if (vm_page_active(m)) vm_page_reference(m); else vm_page_deactivate(m); } } vm_page_unlock(m); } /* * Perform page invalidation when a buffer is released. The fully invalid * pages will be reclaimed later in vfs_vmio_truncate(). */ static void vfs_vmio_invalidate(struct buf *bp) { vm_object_t obj; vm_page_t m; int i, resid, poffset, presid; if (buf_mapped(bp)) { BUF_CHECK_MAPPED(bp); pmap_qremove(trunc_page((vm_offset_t)bp->b_data), bp->b_npages); } else BUF_CHECK_UNMAPPED(bp); /* * Get the base offset and length of the buffer. Note that * in the VMIO case if the buffer block size is not * page-aligned then b_data pointer may not be page-aligned. * But our b_pages[] array *IS* page aligned. * * block sizes less then DEV_BSIZE (usually 512) are not * supported due to the page granularity bits (m->valid, * m->dirty, etc...). * * See man buf(9) for more information */ obj = bp->b_bufobj->bo_object; resid = bp->b_bufsize; poffset = bp->b_offset & PAGE_MASK; VM_OBJECT_WLOCK(obj); for (i = 0; i < bp->b_npages; i++) { m = bp->b_pages[i]; if (m == bogus_page) panic("vfs_vmio_invalidate: Unexpected bogus page."); bp->b_pages[i] = NULL; presid = resid > (PAGE_SIZE - poffset) ? (PAGE_SIZE - poffset) : resid; KASSERT(presid >= 0, ("brelse: extra page")); while (vm_page_xbusied(m)) { vm_page_lock(m); VM_OBJECT_WUNLOCK(obj); vm_page_busy_sleep(m, "mbncsh", true); VM_OBJECT_WLOCK(obj); } if (pmap_page_wired_mappings(m) == 0) vm_page_set_invalid(m, poffset, presid); vfs_vmio_unwire(bp, m); resid -= presid; poffset = 0; } VM_OBJECT_WUNLOCK(obj); bp->b_npages = 0; } /* * Page-granular truncation of an existing VMIO buffer. */ static void vfs_vmio_truncate(struct buf *bp, int desiredpages) { vm_object_t obj; vm_page_t m; int i; if (bp->b_npages == desiredpages) return; if (buf_mapped(bp)) { BUF_CHECK_MAPPED(bp); pmap_qremove((vm_offset_t)trunc_page((vm_offset_t)bp->b_data) + (desiredpages << PAGE_SHIFT), bp->b_npages - desiredpages); } else BUF_CHECK_UNMAPPED(bp); /* * The object lock is needed only if we will attempt to free pages. */ obj = (bp->b_flags & B_DIRECT) != 0 ? bp->b_bufobj->bo_object : NULL; if (obj != NULL) VM_OBJECT_WLOCK(obj); for (i = desiredpages; i < bp->b_npages; i++) { m = bp->b_pages[i]; KASSERT(m != bogus_page, ("allocbuf: bogus page found")); bp->b_pages[i] = NULL; vfs_vmio_unwire(bp, m); } if (obj != NULL) VM_OBJECT_WUNLOCK(obj); bp->b_npages = desiredpages; } /* * Byte granular extension of VMIO buffers. */ static void vfs_vmio_extend(struct buf *bp, int desiredpages, int size) { /* * We are growing the buffer, possibly in a * byte-granular fashion. */ vm_object_t obj; vm_offset_t toff; vm_offset_t tinc; vm_page_t m; /* * Step 1, bring in the VM pages from the object, allocating * them if necessary. We must clear B_CACHE if these pages * are not valid for the range covered by the buffer. 
*/ obj = bp->b_bufobj->bo_object; VM_OBJECT_WLOCK(obj); if (bp->b_npages < desiredpages) { /* * We must allocate system pages since blocking * here could interfere with paging I/O, no * matter which process we are. * * Only exclusive busy can be tested here. * Blocking on shared busy might lead to * deadlocks once allocbuf() is called after * pages are vfs_busy_pages(). */ (void)vm_page_grab_pages(obj, OFF_TO_IDX(bp->b_offset) + bp->b_npages, VM_ALLOC_SYSTEM | VM_ALLOC_IGN_SBUSY | VM_ALLOC_NOBUSY | VM_ALLOC_WIRED, &bp->b_pages[bp->b_npages], desiredpages - bp->b_npages); bp->b_npages = desiredpages; } /* * Step 2. We've loaded the pages into the buffer, * we have to figure out if we can still have B_CACHE * set. Note that B_CACHE is set according to the * byte-granular range ( bcount and size ), not the * aligned range ( newbsize ). * * The VM test is against m->valid, which is DEV_BSIZE * aligned. Needless to say, the validity of the data * needs to also be DEV_BSIZE aligned. Note that this * fails with NFS if the server or some other client * extends the file's EOF. If our buffer is resized, * B_CACHE may remain set! XXX */ toff = bp->b_bcount; tinc = PAGE_SIZE - ((bp->b_offset + toff) & PAGE_MASK); while ((bp->b_flags & B_CACHE) && toff < size) { vm_pindex_t pi; if (tinc > (size - toff)) tinc = size - toff; pi = ((bp->b_offset & PAGE_MASK) + toff) >> PAGE_SHIFT; m = bp->b_pages[pi]; vfs_buf_test_cache(bp, bp->b_offset, toff, tinc, m); toff += tinc; tinc = PAGE_SIZE; } VM_OBJECT_WUNLOCK(obj); /* * Step 3, fixup the KVA pmap. */ if (buf_mapped(bp)) bpmap_qenter(bp); else BUF_CHECK_UNMAPPED(bp); } /* * Check to see if a block at a particular lbn is available for a clustered * write. */ static int vfs_bio_clcheck(struct vnode *vp, int size, daddr_t lblkno, daddr_t blkno) { struct buf *bpa; int match; match = 0; /* If the buf isn't in core skip it */ if ((bpa = gbincore(&vp->v_bufobj, lblkno)) == NULL) return (0); /* If the buf is busy we don't want to wait for it */ if (BUF_LOCK(bpa, LK_EXCLUSIVE | LK_NOWAIT, NULL) != 0) return (0); /* Only cluster with valid clusterable delayed write buffers */ if ((bpa->b_flags & (B_DELWRI | B_CLUSTEROK | B_INVAL)) != (B_DELWRI | B_CLUSTEROK)) goto done; if (bpa->b_bufsize != size) goto done; /* * Check to see if it is in the expected place on disk and that the * block has been mapped. */ if ((bpa->b_blkno != bpa->b_lblkno) && (bpa->b_blkno == blkno)) match = 1; done: BUF_UNLOCK(bpa); return (match); } /* * vfs_bio_awrite: * * Implement clustered async writes for clearing out B_DELWRI buffers. * This is much better then the old way of writing only one buffer at * a time. Note that we may not be presented with the buffers in the * correct order, so we search for the cluster in both directions. */ int vfs_bio_awrite(struct buf *bp) { struct bufobj *bo; int i; int j; daddr_t lblkno = bp->b_lblkno; struct vnode *vp = bp->b_vp; int ncl; int nwritten; int size; int maxcl; int gbflags; bo = &vp->v_bufobj; gbflags = (bp->b_data == unmapped_buf) ? GB_UNMAPPED : 0; /* * right now we support clustered writing only to regular files. If * we find a clusterable block we could be in the middle of a cluster * rather then at the beginning. 
*/ if ((vp->v_type == VREG) && (vp->v_mount != 0) && /* Only on nodes that have the size info */ (bp->b_flags & (B_CLUSTEROK | B_INVAL)) == B_CLUSTEROK) { size = vp->v_mount->mnt_stat.f_iosize; maxcl = MAXPHYS / size; BO_RLOCK(bo); for (i = 1; i < maxcl; i++) if (vfs_bio_clcheck(vp, size, lblkno + i, bp->b_blkno + ((i * size) >> DEV_BSHIFT)) == 0) break; for (j = 1; i + j <= maxcl && j <= lblkno; j++) if (vfs_bio_clcheck(vp, size, lblkno - j, bp->b_blkno - ((j * size) >> DEV_BSHIFT)) == 0) break; BO_RUNLOCK(bo); --j; ncl = i + j; /* * this is a possible cluster write */ if (ncl != 1) { BUF_UNLOCK(bp); nwritten = cluster_wbuild(vp, size, lblkno - j, ncl, gbflags); return (nwritten); } } bremfree(bp); bp->b_flags |= B_ASYNC; /* * default (old) behavior, writing out only one block * * XXX returns b_bufsize instead of b_bcount for nwritten? */ nwritten = bp->b_bufsize; (void) bwrite(bp); return (nwritten); } /* * getnewbuf_kva: * * Allocate KVA for an empty buf header according to gbflags. */ static int getnewbuf_kva(struct buf *bp, int gbflags, int maxsize) { if ((gbflags & (GB_UNMAPPED | GB_KVAALLOC)) != GB_UNMAPPED) { /* * In order to keep fragmentation sane we only allocate kva * in BKVASIZE chunks. XXX with vmem we can do page size. */ maxsize = (maxsize + BKVAMASK) & ~BKVAMASK; if (maxsize != bp->b_kvasize && bufkva_alloc(bp, maxsize, gbflags)) return (ENOSPC); } return (0); } /* * getnewbuf: * * Find and initialize a new buffer header, freeing up existing buffers * in the bufqueues as necessary. The new buffer is returned locked. * * We block if: * We have insufficient buffer headers * We have insufficient buffer space * buffer_arena is too fragmented ( space reservation fails ) * If we have to flush dirty buffers ( but we try to avoid this ) * * The caller is responsible for releasing the reserved bufspace after * allocbuf() is called. */ static struct buf * getnewbuf(struct vnode *vp, int slpflag, int slptimeo, int maxsize, int gbflags) { struct bufdomain *bd; struct buf *bp; bool metadata, reserved; bp = NULL; KASSERT((gbflags & (GB_UNMAPPED | GB_KVAALLOC)) != GB_KVAALLOC, ("GB_KVAALLOC only makes sense with GB_UNMAPPED")); if (!unmapped_buf_allowed) gbflags &= ~(GB_UNMAPPED | GB_KVAALLOC); if (vp == NULL || (vp->v_vflag & (VV_MD | VV_SYSTEM)) != 0 || vp->v_type == VCHR) metadata = true; else metadata = false; if (vp == NULL) bd = &bdomain[0]; else bd = &bdomain[vp->v_bufobj.bo_domain]; counter_u64_add(getnewbufcalls, 1); reserved = false; do { if (reserved == false && bufspace_reserve(bd, maxsize, metadata) != 0) { counter_u64_add(getnewbufrestarts, 1); continue; } reserved = true; if ((bp = buf_alloc(bd)) == NULL) { counter_u64_add(getnewbufrestarts, 1); continue; } if (getnewbuf_kva(bp, gbflags, maxsize) == 0) return (bp); break; } while (buf_recycle(bd, false) == 0); if (reserved) bufspace_release(bd, maxsize); if (bp != NULL) { bp->b_flags |= B_INVAL; brelse(bp); } bufspace_wait(bd, vp, gbflags, slpflag, slptimeo); return (NULL); } /* * buf_daemon: * * buffer flushing daemon. Buffers are normally flushed by the * update daemon but if it cannot keep up this process starts to * take the load in an attempt to prevent getnewbuf() from blocking. 
*/ static struct kproc_desc buf_kp = { "bufdaemon", buf_daemon, &bufdaemonproc }; SYSINIT(bufdaemon, SI_SUB_KTHREAD_BUF, SI_ORDER_FIRST, kproc_start, &buf_kp); static int buf_flush(struct vnode *vp, struct bufdomain *bd, int target) { int flushed; flushed = flushbufqueues(vp, bd, target, 0); if (flushed == 0) { /* * Could not find any buffers without rollback * dependencies, so just write the first one * in the hopes of eventually making progress. */ if (vp != NULL && target > 2) target /= 2; flushbufqueues(vp, bd, target, 1); } return (flushed); } static void buf_daemon() { struct bufdomain *bd; int speedupreq; int lodirty; int i; /* * This process needs to be suspended prior to shutdown sync. */ EVENTHANDLER_REGISTER(shutdown_pre_sync, kthread_shutdown, curthread, SHUTDOWN_PRI_LAST + 100); /* * Start the buf clean daemons as children threads. */ for (i = 0 ; i < buf_domains; i++) { int error; error = kthread_add((void (*)(void *))bufspace_daemon, &bdomain[i], curproc, NULL, 0, 0, "bufspacedaemon-%d", i); if (error) panic("error %d spawning bufspace daemon", error); } /* * This process is allowed to take the buffer cache to the limit */ curthread->td_pflags |= TDP_NORUNNINGBUF | TDP_BUFNEED; mtx_lock(&bdlock); for (;;) { bd_request = 0; mtx_unlock(&bdlock); kthread_suspend_check(); /* * Save speedupreq for this pass and reset to capture new * requests. */ speedupreq = bd_speedupreq; bd_speedupreq = 0; /* * Flush each domain sequentially according to its level and * the speedup request. */ for (i = 0; i < buf_domains; i++) { bd = &bdomain[i]; if (speedupreq) lodirty = bd->bd_numdirtybuffers / 2; else lodirty = bd->bd_lodirtybuffers; while (bd->bd_numdirtybuffers > lodirty) { if (buf_flush(NULL, bd, bd->bd_numdirtybuffers - lodirty) == 0) break; kern_yield(PRI_USER); } } /* * Only clear bd_request if we have reached our low water * mark. The buf_daemon normally waits 1 second and * then incrementally flushes any dirty buffers that have * built up, within reason. * * If we were unable to hit our low water mark and couldn't * find any flushable buffers, we sleep for a short period * to avoid endless loops on unlockable buffers. */ mtx_lock(&bdlock); if (!BIT_EMPTY(BUF_DOMAINS, &bdlodirty)) { /* * We reached our low water mark, reset the * request and sleep until we are needed again. * The sleep is just so the suspend code works. */ bd_request = 0; /* * Do an extra wakeup in case dirty threshold * changed via sysctl and the explicit transition * out of shortfall was missed. */ bdirtywakeup(); if (runningbufspace <= lorunningspace) runningwakeup(); msleep(&bd_request, &bdlock, PVM, "psleep", hz); } else { /* * We couldn't find any flushable dirty buffers but * still have too many dirty buffers, we * have to sleep and try again. (rare) */ msleep(&bd_request, &bdlock, PVM, "qsleep", hz / 10); } } } /* * flushbufqueues: * * Try to flush a buffer in the dirty queue. We must be careful to * free up B_INVAL buffers instead of write them, which NFS is * particularly sensitive to. 
*/ static int flushwithdeps = 0; SYSCTL_INT(_vfs, OID_AUTO, flushwithdeps, CTLFLAG_RW, &flushwithdeps, 0, "Number of buffers flushed with dependecies that require rollbacks"); static int flushbufqueues(struct vnode *lvp, struct bufdomain *bd, int target, int flushdeps) { struct bufqueue *bq; struct buf *sentinel; struct vnode *vp; struct mount *mp; struct buf *bp; int hasdeps; int flushed; int error; bool unlock; flushed = 0; bq = &bd->bd_dirtyq; bp = NULL; sentinel = malloc(sizeof(struct buf), M_TEMP, M_WAITOK | M_ZERO); sentinel->b_qindex = QUEUE_SENTINEL; BQ_LOCK(bq); TAILQ_INSERT_HEAD(&bq->bq_queue, sentinel, b_freelist); BQ_UNLOCK(bq); while (flushed != target) { maybe_yield(); BQ_LOCK(bq); bp = TAILQ_NEXT(sentinel, b_freelist); if (bp != NULL) { TAILQ_REMOVE(&bq->bq_queue, sentinel, b_freelist); TAILQ_INSERT_AFTER(&bq->bq_queue, bp, sentinel, b_freelist); } else { BQ_UNLOCK(bq); break; } /* * Skip sentinels inserted by other invocations of the * flushbufqueues(), taking care to not reorder them. * * Only flush the buffers that belong to the * vnode locked by the curthread. */ if (bp->b_qindex == QUEUE_SENTINEL || (lvp != NULL && bp->b_vp != lvp)) { BQ_UNLOCK(bq); continue; } error = BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL); BQ_UNLOCK(bq); if (error != 0) continue; /* * BKGRDINPROG can only be set with the buf and bufobj * locks both held. We tolerate a race to clear it here. */ if ((bp->b_vflags & BV_BKGRDINPROG) != 0 || (bp->b_flags & B_DELWRI) == 0) { BUF_UNLOCK(bp); continue; } if (bp->b_flags & B_INVAL) { bremfreef(bp); brelse(bp); flushed++; continue; } if (!LIST_EMPTY(&bp->b_dep) && buf_countdeps(bp, 0)) { if (flushdeps == 0) { BUF_UNLOCK(bp); continue; } hasdeps = 1; } else hasdeps = 0; /* * We must hold the lock on a vnode before writing * one of its buffers. Otherwise we may confuse, or * in the case of a snapshot vnode, deadlock the * system. * * The lock order here is the reverse of the normal * of vnode followed by buf lock. This is ok because * the NOWAIT will prevent deadlock. */ vp = bp->b_vp; if (vn_start_write(vp, &mp, V_NOWAIT) != 0) { BUF_UNLOCK(bp); continue; } if (lvp == NULL) { unlock = true; error = vn_lock(vp, LK_EXCLUSIVE | LK_NOWAIT); } else { ASSERT_VOP_LOCKED(vp, "getbuf"); unlock = false; error = VOP_ISLOCKED(vp) == LK_EXCLUSIVE ? 0 : vn_lock(vp, LK_TRYUPGRADE); } if (error == 0) { CTR3(KTR_BUF, "flushbufqueue(%p) vp %p flags %X", bp, bp->b_vp, bp->b_flags); if (curproc == bufdaemonproc) { vfs_bio_awrite(bp); } else { bremfree(bp); bwrite(bp); counter_u64_add(notbufdflushes, 1); } vn_finished_write(mp); if (unlock) VOP_UNLOCK(vp, 0); flushwithdeps += hasdeps; flushed++; /* * Sleeping on runningbufspace while holding * vnode lock leads to deadlock. */ if (curproc == bufdaemonproc && runningbufspace > hirunningspace) waitrunningbufspace(); continue; } vn_finished_write(mp); BUF_UNLOCK(bp); } BQ_LOCK(bq); TAILQ_REMOVE(&bq->bq_queue, sentinel, b_freelist); BQ_UNLOCK(bq); free(sentinel, M_TEMP); return (flushed); } /* * Check to see if a block is currently memory resident. */ struct buf * incore(struct bufobj *bo, daddr_t blkno) { struct buf *bp; BO_RLOCK(bo); bp = gbincore(bo, blkno); BO_RUNLOCK(bo); return (bp); } /* * Returns true if no I/O is needed to access the * associated VM object. This is like incore except * it also hunts around in the VM system for the data. 
*/ static int inmem(struct vnode * vp, daddr_t blkno) { vm_object_t obj; vm_offset_t toff, tinc, size; vm_page_t m; vm_ooffset_t off; ASSERT_VOP_LOCKED(vp, "inmem"); if (incore(&vp->v_bufobj, blkno)) return 1; if (vp->v_mount == NULL) return 0; obj = vp->v_object; if (obj == NULL) return (0); size = PAGE_SIZE; if (size > vp->v_mount->mnt_stat.f_iosize) size = vp->v_mount->mnt_stat.f_iosize; off = (vm_ooffset_t)blkno * (vm_ooffset_t)vp->v_mount->mnt_stat.f_iosize; VM_OBJECT_RLOCK(obj); for (toff = 0; toff < vp->v_mount->mnt_stat.f_iosize; toff += tinc) { m = vm_page_lookup(obj, OFF_TO_IDX(off + toff)); if (!m) goto notinmem; tinc = size; if (tinc > PAGE_SIZE - ((toff + off) & PAGE_MASK)) tinc = PAGE_SIZE - ((toff + off) & PAGE_MASK); if (vm_page_is_valid(m, (vm_offset_t) ((toff + off) & PAGE_MASK), tinc) == 0) goto notinmem; } VM_OBJECT_RUNLOCK(obj); return 1; notinmem: VM_OBJECT_RUNLOCK(obj); return (0); } /* * Set the dirty range for a buffer based on the status of the dirty * bits in the pages comprising the buffer. The range is limited * to the size of the buffer. * * Tell the VM system that the pages associated with this buffer * are clean. This is used for delayed writes where the data is * going to go to disk eventually without additional VM intevention. * * Note that while we only really need to clean through to b_bcount, we * just go ahead and clean through to b_bufsize. */ static void vfs_clean_pages_dirty_buf(struct buf *bp) { vm_ooffset_t foff, noff, eoff; vm_page_t m; int i; if ((bp->b_flags & B_VMIO) == 0 || bp->b_bufsize == 0) return; foff = bp->b_offset; KASSERT(bp->b_offset != NOOFFSET, ("vfs_clean_pages_dirty_buf: no buffer offset")); VM_OBJECT_WLOCK(bp->b_bufobj->bo_object); vfs_drain_busy_pages(bp); vfs_setdirty_locked_object(bp); for (i = 0; i < bp->b_npages; i++) { noff = (foff + PAGE_SIZE) & ~(off_t)PAGE_MASK; eoff = noff; if (eoff > bp->b_offset + bp->b_bufsize) eoff = bp->b_offset + bp->b_bufsize; m = bp->b_pages[i]; vfs_page_set_validclean(bp, foff, m); /* vm_page_clear_dirty(m, foff & PAGE_MASK, eoff - foff); */ foff = noff; } VM_OBJECT_WUNLOCK(bp->b_bufobj->bo_object); } static void vfs_setdirty_locked_object(struct buf *bp) { vm_object_t object; int i; object = bp->b_bufobj->bo_object; VM_OBJECT_ASSERT_WLOCKED(object); /* * We qualify the scan for modified pages on whether the * object has been flushed yet. */ if ((object->flags & OBJ_MIGHTBEDIRTY) != 0) { vm_offset_t boffset; vm_offset_t eoffset; /* * test the pages to see if they have been modified directly * by users through the VM system. */ for (i = 0; i < bp->b_npages; i++) vm_page_test_dirty(bp->b_pages[i]); /* * Calculate the encompassing dirty range, boffset and eoffset, * (eoffset - boffset) bytes. */ for (i = 0; i < bp->b_npages; i++) { if (bp->b_pages[i]->dirty) break; } boffset = (i << PAGE_SHIFT) - (bp->b_offset & PAGE_MASK); for (i = bp->b_npages - 1; i >= 0; --i) { if (bp->b_pages[i]->dirty) { break; } } eoffset = ((i + 1) << PAGE_SHIFT) - (bp->b_offset & PAGE_MASK); /* * Fit it to the buffer. */ if (eoffset > bp->b_bcount) eoffset = bp->b_bcount; /* * If we have a good dirty range, merge with the existing * dirty range. */ if (boffset < eoffset) { if (bp->b_dirtyoff > boffset) bp->b_dirtyoff = boffset; if (bp->b_dirtyend < eoffset) bp->b_dirtyend = eoffset; } } } /* * Allocate the KVA mapping for an existing buffer. * If an unmapped buffer is provided but a mapped buffer is requested, take * also care to properly setup mappings between pages and KVA. 
*/ static void bp_unmapped_get_kva(struct buf *bp, daddr_t blkno, int size, int gbflags) { int bsize, maxsize, need_mapping, need_kva; off_t offset; need_mapping = bp->b_data == unmapped_buf && (gbflags & GB_UNMAPPED) == 0; need_kva = bp->b_kvabase == unmapped_buf && bp->b_data == unmapped_buf && (gbflags & GB_KVAALLOC) != 0; if (!need_mapping && !need_kva) return; BUF_CHECK_UNMAPPED(bp); if (need_mapping && bp->b_kvabase != unmapped_buf) { /* * Buffer is not mapped, but the KVA was already * reserved at the time of the instantiation. Use the * allocated space. */ goto has_addr; } /* * Calculate the amount of the address space we would reserve * if the buffer was mapped. */ bsize = vn_isdisk(bp->b_vp, NULL) ? DEV_BSIZE : bp->b_bufobj->bo_bsize; KASSERT(bsize != 0, ("bsize == 0, check bo->bo_bsize")); offset = blkno * bsize; maxsize = size + (offset & PAGE_MASK); maxsize = imax(maxsize, bsize); while (bufkva_alloc(bp, maxsize, gbflags) != 0) { if ((gbflags & GB_NOWAIT_BD) != 0) { /* * XXXKIB: defragmentation cannot * succeed, not sure what else to do. */ panic("GB_NOWAIT_BD and GB_UNMAPPED %p", bp); } counter_u64_add(mappingrestarts, 1); bufspace_wait(bufdomain(bp), bp->b_vp, gbflags, 0, 0); } has_addr: if (need_mapping) { /* b_offset is handled by bpmap_qenter. */ bp->b_data = bp->b_kvabase; BUF_CHECK_MAPPED(bp); bpmap_qenter(bp); } } struct buf * getblk(struct vnode *vp, daddr_t blkno, int size, int slpflag, int slptimeo, int flags) { struct buf *bp; int error; error = getblkx(vp, blkno, size, slpflag, slptimeo, flags, &bp); if (error != 0) return (NULL); return (bp); } /* * getblkx: * * Get a block given a specified block and offset into a file/device. * The buffers B_DONE bit will be cleared on return, making it almost * ready for an I/O initiation. B_INVAL may or may not be set on * return. The caller should clear B_INVAL prior to initiating a * READ. * * For a non-VMIO buffer, B_CACHE is set to the opposite of B_INVAL for * an existing buffer. * * For a VMIO buffer, B_CACHE is modified according to the backing VM. * If getblk()ing a previously 0-sized invalid buffer, B_CACHE is set * and then cleared based on the backing VM. If the previous buffer is * non-0-sized but invalid, B_CACHE will be cleared. * * If getblk() must create a new buffer, the new buffer is returned with * both B_INVAL and B_CACHE clear unless it is a VMIO buffer, in which * case it is returned with B_INVAL clear and B_CACHE set based on the * backing VM. * * getblk() also forces a bwrite() for any B_DELWRI buffer whos * B_CACHE bit is clear. * * What this means, basically, is that the caller should use B_CACHE to * determine whether the buffer is fully valid or not and should clear * B_INVAL prior to issuing a read. If the caller intends to validate * the buffer by loading its data area with something, the caller needs * to clear B_INVAL. If the caller does this without issuing an I/O, * the caller should set B_CACHE ( as an optimization ), else the caller * should issue the I/O and biodone() will set B_CACHE if the I/O was * a write attempt or if it was a successful read. If the caller * intends to issue a READ, the caller must clear B_INVAL and BIO_ERROR * prior to issuing the READ. biodone() will *not* clear B_INVAL. 
*/ int getblkx(struct vnode *vp, daddr_t blkno, int size, int slpflag, int slptimeo, int flags, struct buf **bpp) { struct buf *bp; struct bufobj *bo; daddr_t d_blkno; int bsize, error, maxsize, vmio; off_t offset; CTR3(KTR_BUF, "getblk(%p, %ld, %d)", vp, (long)blkno, size); KASSERT((flags & (GB_UNMAPPED | GB_KVAALLOC)) != GB_KVAALLOC, ("GB_KVAALLOC only makes sense with GB_UNMAPPED")); ASSERT_VOP_LOCKED(vp, "getblk"); if (size > maxbcachebuf) panic("getblk: size(%d) > maxbcachebuf(%d)\n", size, maxbcachebuf); if (!unmapped_buf_allowed) flags &= ~(GB_UNMAPPED | GB_KVAALLOC); bo = &vp->v_bufobj; d_blkno = blkno; loop: BO_RLOCK(bo); bp = gbincore(bo, blkno); if (bp != NULL) { int lockflags; /* * Buffer is in-core. If the buffer is not busy nor managed, * it must be on a queue. */ lockflags = LK_EXCLUSIVE | LK_SLEEPFAIL | LK_INTERLOCK; if ((flags & GB_LOCK_NOWAIT) != 0) lockflags |= LK_NOWAIT; error = BUF_TIMELOCK(bp, lockflags, BO_LOCKPTR(bo), "getblk", slpflag, slptimeo); /* * If we slept and got the lock we have to restart in case * the buffer changed identities. */ if (error == ENOLCK) goto loop; /* We timed out or were interrupted. */ else if (error != 0) return (error); /* If recursed, assume caller knows the rules. */ else if (BUF_LOCKRECURSED(bp)) goto end; /* * The buffer is locked. B_CACHE is cleared if the buffer is * invalid. Otherwise, for a non-VMIO buffer, B_CACHE is set * and for a VMIO buffer B_CACHE is adjusted according to the * backing VM cache. */ if (bp->b_flags & B_INVAL) bp->b_flags &= ~B_CACHE; else if ((bp->b_flags & (B_VMIO | B_INVAL)) == 0) bp->b_flags |= B_CACHE; if (bp->b_flags & B_MANAGED) MPASS(bp->b_qindex == QUEUE_NONE); else bremfree(bp); /* * check for size inconsistencies for non-VMIO case. */ if (bp->b_bcount != size) { if ((bp->b_flags & B_VMIO) == 0 || (size > bp->b_kvasize)) { if (bp->b_flags & B_DELWRI) { bp->b_flags |= B_NOCACHE; bwrite(bp); } else { if (LIST_EMPTY(&bp->b_dep)) { bp->b_flags |= B_RELBUF; brelse(bp); } else { bp->b_flags |= B_NOCACHE; bwrite(bp); } } goto loop; } } /* * Handle the case of unmapped buffer which should * become mapped, or the buffer for which KVA * reservation is requested. */ bp_unmapped_get_kva(bp, blkno, size, flags); /* * If the size is inconsistent in the VMIO case, we can resize * the buffer. This might lead to B_CACHE getting set or * cleared. If the size has not changed, B_CACHE remains * unchanged from its previous state. */ allocbuf(bp, size); KASSERT(bp->b_offset != NOOFFSET, ("getblk: no buffer offset")); /* * A buffer with B_DELWRI set and B_CACHE clear must * be committed before we can return the buffer in * order to prevent the caller from issuing a read * ( due to B_CACHE not being set ) and overwriting * it. * * Most callers, including NFS and FFS, need this to * operate properly either because they assume they * can issue a read if B_CACHE is not set, or because * ( for example ) an uncached B_DELWRI might loop due * to softupdates re-dirtying the buffer. In the latter * case, B_CACHE is set after the first write completes, * preventing further loops. * NOTE! b*write() sets B_CACHE. If we cleared B_CACHE * above while extending the buffer, we cannot allow the * buffer to remain with B_CACHE set after the write * completes or it will represent a corrupt state. To * deal with this we set B_NOCACHE to scrap the buffer * after the write. 
* * We might be able to do something fancy, like setting * B_CACHE in bwrite() except if B_DELWRI is already set, * so the below call doesn't set B_CACHE, but that gets real * confusing. This is much easier. */ if ((bp->b_flags & (B_CACHE|B_DELWRI)) == B_DELWRI) { bp->b_flags |= B_NOCACHE; bwrite(bp); goto loop; } bp->b_flags &= ~B_DONE; } else { /* * Buffer is not in-core, create new buffer. The buffer * returned by getnewbuf() is locked. Note that the returned * buffer is also considered valid (not marked B_INVAL). */ BO_RUNLOCK(bo); /* * If the user does not want us to create the buffer, bail out * here. */ if (flags & GB_NOCREAT) return (EEXIST); - if (bdomain[bo->bo_domain].bd_freebuffers == 0 && - TD_IS_IDLETHREAD(curthread)) - return (EBUSY); bsize = vn_isdisk(vp, NULL) ? DEV_BSIZE : bo->bo_bsize; KASSERT(bsize != 0, ("bsize == 0, check bo->bo_bsize")); offset = blkno * bsize; vmio = vp->v_object != NULL; if (vmio) { maxsize = size + (offset & PAGE_MASK); } else { maxsize = size; /* Do not allow non-VMIO notmapped buffers. */ flags &= ~(GB_UNMAPPED | GB_KVAALLOC); } maxsize = imax(maxsize, bsize); if ((flags & GB_NOSPARSE) != 0 && vmio && !vn_isdisk(vp, NULL)) { error = VOP_BMAP(vp, blkno, NULL, &d_blkno, 0, 0); KASSERT(error != EOPNOTSUPP, ("GB_NOSPARSE from fs not supporting bmap, vp %p", vp)); if (error != 0) return (error); if (d_blkno == -1) return (EJUSTRETURN); } bp = getnewbuf(vp, slpflag, slptimeo, maxsize, flags); if (bp == NULL) { if (slpflag || slptimeo) return (ETIMEDOUT); /* * XXX This is here until the sleep path is diagnosed * enough to work under very low memory conditions. * * There's an issue on low memory, 4BSD+non-preempt * systems (eg MIPS routers with 32MB RAM) where buffer * exhaustion occurs without sleeping for buffer * reclaimation. This just sticks in a loop and * constantly attempts to allocate a buffer, which * hits exhaustion and tries to wakeup bufdaemon. * This never happens because we never yield. * * The real solution is to identify and fix these cases * so we aren't effectively busy-waiting in a loop * until the reclaimation path has cycles to run. */ kern_yield(PRI_USER); goto loop; } /* * This code is used to make sure that a buffer is not * created while the getnewbuf routine is blocked. * This can be a problem whether the vnode is locked or not. * If the buffer is created out from under us, we have to * throw away the one we just created. * * Note: this must occur before we associate the buffer * with the vp especially considering limitations in * the splay tree implementation when dealing with duplicate * lblkno's. */ BO_LOCK(bo); if (gbincore(bo, blkno)) { BO_UNLOCK(bo); bp->b_flags |= B_INVAL; bufspace_release(bufdomain(bp), maxsize); brelse(bp); goto loop; } /* * Insert the buffer into the hash, so that it can * be found by incore. */ bp->b_lblkno = blkno; bp->b_blkno = d_blkno; bp->b_offset = offset; bgetvp(vp, bp); BO_UNLOCK(bo); /* * set B_VMIO bit. allocbuf() the buffer bigger. Since the * buffer size starts out as 0, B_CACHE will be set by * allocbuf() for the VMIO case prior to it testing the * backing store for validity. */ if (vmio) { bp->b_flags |= B_VMIO; KASSERT(vp->v_object == bp->b_bufobj->bo_object, ("ARGH! different b_bufobj->bo_object %p %p %p\n", bp, vp->v_object, bp->b_bufobj->bo_object)); } else { bp->b_flags &= ~B_VMIO; KASSERT(bp->b_bufobj->bo_object == NULL, ("ARGH! 
has b_bufobj->bo_object %p %p\n", bp, bp->b_bufobj->bo_object)); BUF_CHECK_MAPPED(bp); } allocbuf(bp, size); bufspace_release(bufdomain(bp), maxsize); bp->b_flags &= ~B_DONE; } CTR4(KTR_BUF, "getblk(%p, %ld, %d) = %p", vp, (long)blkno, size, bp); BUF_ASSERT_HELD(bp); end: buf_track(bp, __func__); KASSERT(bp->b_bufobj == bo, ("bp %p wrong b_bufobj %p should be %p", bp, bp->b_bufobj, bo)); *bpp = bp; return (0); } /* * Get an empty, disassociated buffer of given size. The buffer is initially * set to B_INVAL. */ struct buf * geteblk(int size, int flags) { struct buf *bp; int maxsize; maxsize = (size + BKVAMASK) & ~BKVAMASK; while ((bp = getnewbuf(NULL, 0, 0, maxsize, flags)) == NULL) { if ((flags & GB_NOWAIT_BD) && (curthread->td_pflags & TDP_BUFNEED) != 0) return (NULL); } allocbuf(bp, size); bufspace_release(bufdomain(bp), maxsize); bp->b_flags |= B_INVAL; /* b_dep cleared by getnewbuf() */ BUF_ASSERT_HELD(bp); return (bp); } /* * Truncate the backing store for a non-vmio buffer. */ static void vfs_nonvmio_truncate(struct buf *bp, int newbsize) { if (bp->b_flags & B_MALLOC) { /* * malloced buffers are not shrunk */ if (newbsize == 0) { bufmallocadjust(bp, 0); free(bp->b_data, M_BIOBUF); bp->b_data = bp->b_kvabase; bp->b_flags &= ~B_MALLOC; } return; } vm_hold_free_pages(bp, newbsize); bufspace_adjust(bp, newbsize); } /* * Extend the backing for a non-VMIO buffer. */ static void vfs_nonvmio_extend(struct buf *bp, int newbsize) { caddr_t origbuf; int origbufsize; /* * We only use malloced memory on the first allocation. * and revert to page-allocated memory when the buffer * grows. * * There is a potential smp race here that could lead * to bufmallocspace slightly passing the max. It * is probably extremely rare and not worth worrying * over. */ if (bp->b_bufsize == 0 && newbsize <= PAGE_SIZE/2 && bufmallocspace < maxbufmallocspace) { bp->b_data = malloc(newbsize, M_BIOBUF, M_WAITOK); bp->b_flags |= B_MALLOC; bufmallocadjust(bp, newbsize); return; } /* * If the buffer is growing on its other-than-first * allocation then we revert to the page-allocation * scheme. */ origbuf = NULL; origbufsize = 0; if (bp->b_flags & B_MALLOC) { origbuf = bp->b_data; origbufsize = bp->b_bufsize; bp->b_data = bp->b_kvabase; bufmallocadjust(bp, 0); bp->b_flags &= ~B_MALLOC; newbsize = round_page(newbsize); } vm_hold_load_pages(bp, (vm_offset_t) bp->b_data + bp->b_bufsize, (vm_offset_t) bp->b_data + newbsize); if (origbuf != NULL) { bcopy(origbuf, bp->b_data, origbufsize); free(origbuf, M_BIOBUF); } bufspace_adjust(bp, newbsize); } /* * This code constitutes the buffer memory from either anonymous system * memory (in the case of non-VMIO operations) or from an associated * VM object (in the case of VMIO operations). This code is able to * resize a buffer up or down. * * Note that this code is tricky, and has many complications to resolve * deadlock or inconsistent data situations. Tread lightly!!! * There are B_CACHE and B_DELWRI interactions that must be dealt with by * the caller. Calling this code willy nilly can result in the loss of data. * * allocbuf() only adjusts B_CACHE for VMIO buffers. getblk() deals with * B_CACHE for the non-VMIO case. 
*/ int allocbuf(struct buf *bp, int size) { int newbsize; BUF_ASSERT_HELD(bp); if (bp->b_bcount == size) return (1); if (bp->b_kvasize != 0 && bp->b_kvasize < size) panic("allocbuf: buffer too small"); newbsize = roundup2(size, DEV_BSIZE); if ((bp->b_flags & B_VMIO) == 0) { if ((bp->b_flags & B_MALLOC) == 0) newbsize = round_page(newbsize); /* * Just get anonymous memory from the kernel. Don't * mess with B_CACHE. */ if (newbsize < bp->b_bufsize) vfs_nonvmio_truncate(bp, newbsize); else if (newbsize > bp->b_bufsize) vfs_nonvmio_extend(bp, newbsize); } else { int desiredpages; desiredpages = (size == 0) ? 0 : num_pages((bp->b_offset & PAGE_MASK) + newbsize); if (bp->b_flags & B_MALLOC) panic("allocbuf: VMIO buffer can't be malloced"); /* * Set B_CACHE initially if buffer is 0 length or will become * 0-length. */ if (size == 0 || bp->b_bufsize == 0) bp->b_flags |= B_CACHE; if (newbsize < bp->b_bufsize) vfs_vmio_truncate(bp, desiredpages); /* XXX This looks as if it should be newbsize > b_bufsize */ else if (size > bp->b_bcount) vfs_vmio_extend(bp, desiredpages, size); bufspace_adjust(bp, newbsize); } bp->b_bcount = size; /* requested buffer size. */ return (1); } extern int inflight_transient_maps; static struct bio_queue nondump_bios; void biodone(struct bio *bp) { struct mtx *mtxp; void (*done)(struct bio *); vm_offset_t start, end; biotrack(bp, __func__); /* * Avoid completing I/O when dumping after a panic since that may * result in a deadlock in the filesystem or pager code. Note that * this doesn't affect dumps that were started manually since we aim * to keep the system usable after it has been resumed. */ if (__predict_false(dumping && SCHEDULER_STOPPED())) { TAILQ_INSERT_HEAD(&nondump_bios, bp, bio_queue); return; } if ((bp->bio_flags & BIO_TRANSIENT_MAPPING) != 0) { bp->bio_flags &= ~BIO_TRANSIENT_MAPPING; bp->bio_flags |= BIO_UNMAPPED; start = trunc_page((vm_offset_t)bp->bio_data); end = round_page((vm_offset_t)bp->bio_data + bp->bio_length); bp->bio_data = unmapped_buf; pmap_qremove(start, atop(end - start)); vmem_free(transient_arena, start, end - start); atomic_add_int(&inflight_transient_maps, -1); } done = bp->bio_done; if (done == NULL) { mtxp = mtx_pool_find(mtxpool_sleep, bp); mtx_lock(mtxp); bp->bio_flags |= BIO_DONE; wakeup(bp); mtx_unlock(mtxp); } else done(bp); } /* * Wait for a BIO to finish. */ int biowait(struct bio *bp, const char *wchan) { struct mtx *mtxp; mtxp = mtx_pool_find(mtxpool_sleep, bp); mtx_lock(mtxp); while ((bp->bio_flags & BIO_DONE) == 0) msleep(bp, mtxp, PRIBIO, wchan, 0); mtx_unlock(mtxp); if (bp->bio_error != 0) return (bp->bio_error); if (!(bp->bio_flags & BIO_ERROR)) return (0); return (EIO); } void biofinish(struct bio *bp, struct devstat *stat, int error) { if (error) { bp->bio_error = error; bp->bio_flags |= BIO_ERROR; } if (stat != NULL) devstat_end_transaction_bio(stat, bp); biodone(bp); } #if defined(BUF_TRACKING) || defined(FULL_BUF_TRACKING) void biotrack_buf(struct bio *bp, const char *location) { buf_track(bp->bio_track_bp, location); } #endif /* * bufwait: * * Wait for buffer I/O completion, returning error status. The buffer * is left locked and B_DONE on return. B_EINTR is converted into an EINTR * error and cleared. */ int bufwait(struct buf *bp) { if (bp->b_iocmd == BIO_READ) bwait(bp, PRIBIO, "biord"); else bwait(bp, PRIBIO, "biowr"); if (bp->b_flags & B_EINTR) { bp->b_flags &= ~B_EINTR; return (EINTR); } if (bp->b_ioflags & BIO_ERROR) { return (bp->b_error ? 
bp->b_error : EIO); } else { return (0); } } /* * bufdone: * * Finish I/O on a buffer, optionally calling a completion function. * This is usually called from an interrupt so process blocking is * not allowed. * * biodone is also responsible for setting B_CACHE in a B_VMIO bp. * In a non-VMIO bp, B_CACHE will be set on the next getblk() * assuming B_INVAL is clear. * * For the VMIO case, we set B_CACHE if the op was a read and no * read error occurred, or if the op was a write. B_CACHE is never * set if the buffer is invalid or otherwise uncacheable. * - * biodone does not mess with B_INVAL, allowing the I/O routine or the + * bufdone does not mess with B_INVAL, allowing the I/O routine or the * initiator to leave B_INVAL set to brelse the buffer out of existence * in the biodone routine. */ void bufdone(struct buf *bp) { struct bufobj *dropobj; void (*biodone)(struct buf *); buf_track(bp, __func__); CTR3(KTR_BUF, "bufdone(%p) vp %p flags %X", bp, bp->b_vp, bp->b_flags); dropobj = NULL; KASSERT(!(bp->b_flags & B_DONE), ("biodone: bp %p already done", bp)); BUF_ASSERT_HELD(bp); runningbufwakeup(bp); if (bp->b_iocmd == BIO_WRITE) dropobj = bp->b_bufobj; /* call optional completion function if requested */ if (bp->b_iodone != NULL) { biodone = bp->b_iodone; bp->b_iodone = NULL; (*biodone) (bp); if (dropobj) bufobj_wdrop(dropobj); return; } if (bp->b_flags & B_VMIO) { /* * Set B_CACHE if the op was a normal read and no error * occurred. B_CACHE is set for writes in the b*write() * routines. */ if (bp->b_iocmd == BIO_READ && !(bp->b_flags & (B_INVAL|B_NOCACHE)) && !(bp->b_ioflags & BIO_ERROR)) bp->b_flags |= B_CACHE; vfs_vmio_iodone(bp); } if (!LIST_EMPTY(&bp->b_dep)) buf_complete(bp); if ((bp->b_flags & B_CKHASH) != 0) { KASSERT(bp->b_iocmd == BIO_READ, ("bufdone: b_iocmd %d not BIO_READ", bp->b_iocmd)); KASSERT(buf_mapped(bp), ("bufdone: bp %p not mapped", bp)); (*bp->b_ckhashcalc)(bp); } /* * For asynchronous completions, release the buffer now. The brelse * will do a wakeup there if necessary - so no need to do a wakeup * here in the async case. The sync case always needs to do a wakeup. */ if (bp->b_flags & B_ASYNC) { if ((bp->b_flags & (B_NOCACHE | B_INVAL | B_RELBUF)) || (bp->b_ioflags & BIO_ERROR)) brelse(bp); else bqrelse(bp); } else bdone(bp); if (dropobj) bufobj_wdrop(dropobj); } /* * This routine is called in lieu of iodone in the case of * incomplete I/O. This keeps the busy status for pages * consistent. */ void vfs_unbusy_pages(struct buf *bp) { int i; vm_object_t obj; vm_page_t m; runningbufwakeup(bp); if (!(bp->b_flags & B_VMIO)) return; obj = bp->b_bufobj->bo_object; VM_OBJECT_WLOCK(obj); for (i = 0; i < bp->b_npages; i++) { m = bp->b_pages[i]; if (m == bogus_page) { m = vm_page_lookup(obj, OFF_TO_IDX(bp->b_offset) + i); if (!m) panic("vfs_unbusy_pages: page missing\n"); bp->b_pages[i] = m; if (buf_mapped(bp)) { BUF_CHECK_MAPPED(bp); pmap_qenter(trunc_page((vm_offset_t)bp->b_data), bp->b_pages, bp->b_npages); } else BUF_CHECK_UNMAPPED(bp); } vm_page_sunbusy(m); } vm_object_pip_wakeupn(obj, bp->b_npages); VM_OBJECT_WUNLOCK(obj); } /* * vfs_page_set_valid: * * Set the valid bits in a page based on the supplied offset. The * range is restricted to the buffer's size. * * This routine is typically called after a read completes. */ static void vfs_page_set_valid(struct buf *bp, vm_ooffset_t off, vm_page_t m) { vm_ooffset_t eoff; /* * Compute the end offset, eoff, such that [off, eoff) does not span a * page boundary and eoff is not greater than the end of the buffer. 
* The end of the buffer, in this case, is our file EOF, not the * allocation size of the buffer. */ eoff = (off + PAGE_SIZE) & ~(vm_ooffset_t)PAGE_MASK; if (eoff > bp->b_offset + bp->b_bcount) eoff = bp->b_offset + bp->b_bcount; /* * Set valid range. This is typically the entire buffer and thus the * entire page. */ if (eoff > off) vm_page_set_valid_range(m, off & PAGE_MASK, eoff - off); } /* * vfs_page_set_validclean: * * Set the valid bits and clear the dirty bits in a page based on the * supplied offset. The range is restricted to the buffer's size. */ static void vfs_page_set_validclean(struct buf *bp, vm_ooffset_t off, vm_page_t m) { vm_ooffset_t soff, eoff; /* * Start and end offsets in buffer. eoff - soff may not cross a * page boundary or cross the end of the buffer. The end of the * buffer, in this case, is our file EOF, not the allocation size * of the buffer. */ soff = off; eoff = (off + PAGE_SIZE) & ~(off_t)PAGE_MASK; if (eoff > bp->b_offset + bp->b_bcount) eoff = bp->b_offset + bp->b_bcount; /* * Set valid range. This is typically the entire buffer and thus the * entire page. */ if (eoff > soff) { vm_page_set_validclean( m, (vm_offset_t) (soff & PAGE_MASK), (vm_offset_t) (eoff - soff) ); } } /* * Ensure that all buffer pages are not exclusive busied. If any page is * exclusive busy, drain it. */ void vfs_drain_busy_pages(struct buf *bp) { vm_page_t m; int i, last_busied; VM_OBJECT_ASSERT_WLOCKED(bp->b_bufobj->bo_object); last_busied = 0; for (i = 0; i < bp->b_npages; i++) { m = bp->b_pages[i]; if (vm_page_xbusied(m)) { for (; last_busied < i; last_busied++) vm_page_sbusy(bp->b_pages[last_busied]); while (vm_page_xbusied(m)) { vm_page_lock(m); VM_OBJECT_WUNLOCK(bp->b_bufobj->bo_object); vm_page_busy_sleep(m, "vbpage", true); VM_OBJECT_WLOCK(bp->b_bufobj->bo_object); } } } for (i = 0; i < last_busied; i++) vm_page_sunbusy(bp->b_pages[i]); } /* * This routine is called before a device strategy routine. * It is used to tell the VM system that paging I/O is in * progress, and treat the pages associated with the buffer * almost as being exclusive busy. Also the object paging_in_progress * flag is handled to make sure that the object doesn't become * inconsistent. * * Since I/O has not been initiated yet, certain buffer flags * such as BIO_ERROR or B_INVAL may be in an inconsistent state * and should be ignored. */ void vfs_busy_pages(struct buf *bp, int clear_modify) { vm_object_t obj; vm_ooffset_t foff; vm_page_t m; int i; bool bogus; if (!(bp->b_flags & B_VMIO)) return; obj = bp->b_bufobj->bo_object; foff = bp->b_offset; KASSERT(bp->b_offset != NOOFFSET, ("vfs_busy_pages: no buffer offset")); VM_OBJECT_WLOCK(obj); vfs_drain_busy_pages(bp); if (bp->b_bufsize != 0) vfs_setdirty_locked_object(bp); bogus = false; for (i = 0; i < bp->b_npages; i++) { m = bp->b_pages[i]; if ((bp->b_flags & B_CLUSTER) == 0) { vm_object_pip_add(obj, 1); vm_page_sbusy(m); } /* * When readying a buffer for a read ( i.e * clear_modify == 0 ), it is important to do * bogus_page replacement for valid pages in * partially instantiated buffers. Partially * instantiated buffers can, in turn, occur when * reconstituting a buffer from its VM backing store * base. We only have to do this if B_CACHE is * clear ( which causes the I/O to occur in the * first place ). The replacement prevents the read * I/O from overwriting potentially dirty VM-backed * pages. XXX bogus page replacement is, uh, bogus. * It may not work properly with small-block devices. * We need to find a better way. 
*/ if (clear_modify) { pmap_remove_write(m); vfs_page_set_validclean(bp, foff, m); } else if (m->valid == VM_PAGE_BITS_ALL && (bp->b_flags & B_CACHE) == 0) { bp->b_pages[i] = bogus_page; bogus = true; } foff = (foff + PAGE_SIZE) & ~(off_t)PAGE_MASK; } VM_OBJECT_WUNLOCK(obj); if (bogus && buf_mapped(bp)) { BUF_CHECK_MAPPED(bp); pmap_qenter(trunc_page((vm_offset_t)bp->b_data), bp->b_pages, bp->b_npages); } } /* * vfs_bio_set_valid: * * Set the range within the buffer to valid. The range is * relative to the beginning of the buffer, b_offset. Note that * b_offset itself may be offset from the beginning of the first * page. */ void vfs_bio_set_valid(struct buf *bp, int base, int size) { int i, n; vm_page_t m; if (!(bp->b_flags & B_VMIO)) return; /* * Fixup base to be relative to beginning of first page. * Set initial n to be the maximum number of bytes in the * first page that can be validated. */ base += (bp->b_offset & PAGE_MASK); n = PAGE_SIZE - (base & PAGE_MASK); VM_OBJECT_WLOCK(bp->b_bufobj->bo_object); for (i = base / PAGE_SIZE; size > 0 && i < bp->b_npages; ++i) { m = bp->b_pages[i]; if (n > size) n = size; vm_page_set_valid_range(m, base & PAGE_MASK, n); base += n; size -= n; n = PAGE_SIZE; } VM_OBJECT_WUNLOCK(bp->b_bufobj->bo_object); } /* * vfs_bio_clrbuf: * * If the specified buffer is a non-VMIO buffer, clear the entire * buffer. If the specified buffer is a VMIO buffer, clear and * validate only the previously invalid portions of the buffer. * This routine essentially fakes an I/O, so we need to clear * BIO_ERROR and B_INVAL. * * Note that while we only theoretically need to clear through b_bcount, * we go ahead and clear through b_bufsize. */ void vfs_bio_clrbuf(struct buf *bp) { int i, j, mask, sa, ea, slide; if ((bp->b_flags & (B_VMIO | B_MALLOC)) != B_VMIO) { clrbuf(bp); return; } bp->b_flags &= ~B_INVAL; bp->b_ioflags &= ~BIO_ERROR; VM_OBJECT_WLOCK(bp->b_bufobj->bo_object); if ((bp->b_npages == 1) && (bp->b_bufsize < PAGE_SIZE) && (bp->b_offset & PAGE_MASK) == 0) { if (bp->b_pages[0] == bogus_page) goto unlock; mask = (1 << (bp->b_bufsize / DEV_BSIZE)) - 1; VM_OBJECT_ASSERT_WLOCKED(bp->b_pages[0]->object); if ((bp->b_pages[0]->valid & mask) == mask) goto unlock; if ((bp->b_pages[0]->valid & mask) == 0) { pmap_zero_page_area(bp->b_pages[0], 0, bp->b_bufsize); bp->b_pages[0]->valid |= mask; goto unlock; } } sa = bp->b_offset & PAGE_MASK; slide = 0; for (i = 0; i < bp->b_npages; i++, sa = 0) { slide = imin(slide + PAGE_SIZE, bp->b_offset + bp->b_bufsize); ea = slide & PAGE_MASK; if (ea == 0) ea = PAGE_SIZE; if (bp->b_pages[i] == bogus_page) continue; j = sa / DEV_BSIZE; mask = ((1 << ((ea - sa) / DEV_BSIZE)) - 1) << j; VM_OBJECT_ASSERT_WLOCKED(bp->b_pages[i]->object); if ((bp->b_pages[i]->valid & mask) == mask) continue; if ((bp->b_pages[i]->valid & mask) == 0) pmap_zero_page_area(bp->b_pages[i], sa, ea - sa); else { for (; sa < ea; sa += DEV_BSIZE, j++) { if ((bp->b_pages[i]->valid & (1 << j)) == 0) { pmap_zero_page_area(bp->b_pages[i], sa, DEV_BSIZE); } } } bp->b_pages[i]->valid |= mask; } unlock: VM_OBJECT_WUNLOCK(bp->b_bufobj->bo_object); bp->b_resid = 0; } void vfs_bio_bzero_buf(struct buf *bp, int base, int size) { vm_page_t m; int i, n; if (buf_mapped(bp)) { BUF_CHECK_MAPPED(bp); bzero(bp->b_data + base, size); } else { BUF_CHECK_UNMAPPED(bp); n = PAGE_SIZE - (base & PAGE_MASK); for (i = base / PAGE_SIZE; size > 0 && i < bp->b_npages; ++i) { m = bp->b_pages[i]; if (n > size) n = size; pmap_zero_page_area(m, base & PAGE_MASK, n); base += n; size -= n; n = PAGE_SIZE; } } } 
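/*
 * Illustrative sketch (not part of the original file): a minimal,
 * filesystem-style consumer of the buffer-cache interfaces documented
 * above.  The function name and the caller-supplied vnode "vp", logical
 * block "lbn" and block "size" are hypothetical, and error handling is
 * kept to the minimum needed to show the read/modify/delayed-write
 * pattern that bread(), getblk(), bdwrite() and brelse() implement.
 * Guarded by #if 0 because it is an example only.
 */
#if 0
static int
example_read_modify_block(struct vnode *vp, daddr_t lbn, int size)
{
	struct buf *bp;
	int error;

	/*
	 * bread() is the usual front end: it calls getblk() and, when
	 * B_CACHE is clear, clears B_INVAL and issues the read itself,
	 * waiting for completion.  On error the buffer has already been
	 * released, so only the error needs to be propagated.
	 */
	error = bread(vp, lbn, size, NOCRED, &bp);
	if (error != 0)
		return (error);

	/* ... modify bp->b_data here ... */

	/*
	 * bdwrite() marks the buffer dirty and requeues it via bqrelse();
	 * the update daemon or buf daemon writes it out later.  Use
	 * bwrite(bp) for a synchronous write instead, or brelse(bp) to
	 * release the buffer without dirtying it.
	 */
	bdwrite(bp);
	return (0);
}
#endif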
/* * Update buffer flags based on I/O request parameters, optionally releasing the * buffer. If it's VMIO or direct I/O, the buffer pages are released to the VM, * where they may be placed on a page queue (VMIO) or freed immediately (direct * I/O). Otherwise the buffer is released to the cache. */ static void b_io_dismiss(struct buf *bp, int ioflag, bool release) { KASSERT((ioflag & IO_NOREUSE) == 0 || (ioflag & IO_VMIO) != 0, ("buf %p non-VMIO noreuse", bp)); if ((ioflag & IO_DIRECT) != 0) bp->b_flags |= B_DIRECT; if ((ioflag & IO_EXT) != 0) bp->b_xflags |= BX_ALTDATA; if ((ioflag & (IO_VMIO | IO_DIRECT)) != 0 && LIST_EMPTY(&bp->b_dep)) { bp->b_flags |= B_RELBUF; if ((ioflag & IO_NOREUSE) != 0) bp->b_flags |= B_NOREUSE; if (release) brelse(bp); } else if (release) bqrelse(bp); } void vfs_bio_brelse(struct buf *bp, int ioflag) { b_io_dismiss(bp, ioflag, true); } void vfs_bio_set_flags(struct buf *bp, int ioflag) { b_io_dismiss(bp, ioflag, false); } /* * vm_hold_load_pages and vm_hold_free_pages get pages into * a buffers address space. The pages are anonymous and are * not associated with a file object. */ static void vm_hold_load_pages(struct buf *bp, vm_offset_t from, vm_offset_t to) { vm_offset_t pg; vm_page_t p; int index; BUF_CHECK_MAPPED(bp); to = round_page(to); from = round_page(from); index = (from - trunc_page((vm_offset_t)bp->b_data)) >> PAGE_SHIFT; for (pg = from; pg < to; pg += PAGE_SIZE, index++) { /* * note: must allocate system pages since blocking here * could interfere with paging I/O, no matter which * process we are. */ p = vm_page_alloc(NULL, 0, VM_ALLOC_SYSTEM | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_COUNT((to - pg) >> PAGE_SHIFT) | VM_ALLOC_WAITOK); pmap_qenter(pg, &p, 1); bp->b_pages[index] = p; } bp->b_npages = index; } /* Return pages associated with this buf to the vm system */ static void vm_hold_free_pages(struct buf *bp, int newbsize) { vm_offset_t from; vm_page_t p; int index, newnpages; BUF_CHECK_MAPPED(bp); from = round_page((vm_offset_t)bp->b_data + newbsize); newnpages = (from - trunc_page((vm_offset_t)bp->b_data)) >> PAGE_SHIFT; if (bp->b_npages > newnpages) pmap_qremove(from, bp->b_npages - newnpages); for (index = newnpages; index < bp->b_npages; index++) { p = bp->b_pages[index]; bp->b_pages[index] = NULL; p->wire_count--; vm_page_free(p); } vm_wire_sub(bp->b_npages - newnpages); bp->b_npages = newnpages; } /* * Map an IO request into kernel virtual address space. * * All requests are (re)mapped into kernel VA space. * Notice that we use b_bufsize for the size of the buffer * to be mapped. b_bcount might be modified by the driver. * * Note that even if the caller determines that the address space should * be valid, a race or a smaller-file mapped into a larger space may * actually cause vmapbuf() to fail, so all callers of vmapbuf() MUST * check the return value. * * This function only works with pager buffers. 
*/ int vmapbuf(struct buf *bp, int mapbuf) { vm_prot_t prot; int pidx; if (bp->b_bufsize < 0) return (-1); prot = VM_PROT_READ; if (bp->b_iocmd == BIO_READ) prot |= VM_PROT_WRITE; /* Less backwards than it looks */ if ((pidx = vm_fault_quick_hold_pages(&curproc->p_vmspace->vm_map, (vm_offset_t)bp->b_data, bp->b_bufsize, prot, bp->b_pages, btoc(MAXPHYS))) < 0) return (-1); bp->b_npages = pidx; bp->b_offset = ((vm_offset_t)bp->b_data) & PAGE_MASK; if (mapbuf || !unmapped_buf_allowed) { pmap_qenter((vm_offset_t)bp->b_kvabase, bp->b_pages, pidx); bp->b_data = bp->b_kvabase + bp->b_offset; } else bp->b_data = unmapped_buf; return(0); } /* * Free the io map PTEs associated with this IO operation. * We also invalidate the TLB entries and restore the original b_addr. * * This function only works with pager buffers. */ void vunmapbuf(struct buf *bp) { int npages; npages = bp->b_npages; if (buf_mapped(bp)) pmap_qremove(trunc_page((vm_offset_t)bp->b_data), npages); vm_page_unhold_pages(bp->b_pages, npages); bp->b_data = unmapped_buf; } void bdone(struct buf *bp) { struct mtx *mtxp; mtxp = mtx_pool_find(mtxpool_sleep, bp); mtx_lock(mtxp); bp->b_flags |= B_DONE; wakeup(bp); mtx_unlock(mtxp); } void bwait(struct buf *bp, u_char pri, const char *wchan) { struct mtx *mtxp; mtxp = mtx_pool_find(mtxpool_sleep, bp); mtx_lock(mtxp); while ((bp->b_flags & B_DONE) == 0) msleep(bp, mtxp, pri, wchan, 0); mtx_unlock(mtxp); } int bufsync(struct bufobj *bo, int waitfor) { return (VOP_FSYNC(bo2vnode(bo), waitfor, curthread)); } void bufstrategy(struct bufobj *bo, struct buf *bp) { int i __unused; struct vnode *vp; vp = bp->b_vp; KASSERT(vp == bo->bo_private, ("Inconsistent vnode bufstrategy")); KASSERT(vp->v_type != VCHR && vp->v_type != VBLK, ("Wrong vnode in bufstrategy(bp=%p, vp=%p)", bp, vp)); i = VOP_STRATEGY(vp, bp); KASSERT(i == 0, ("VOP_STRATEGY failed bp=%p vp=%p", bp, bp->b_vp)); } /* * Initialize a struct bufobj before use. Memory is assumed zero filled. */ void bufobj_init(struct bufobj *bo, void *private) { static volatile int bufobj_cleanq; bo->bo_domain = atomic_fetchadd_int(&bufobj_cleanq, 1) % buf_domains; rw_init(BO_LOCKPTR(bo), "bufobj interlock"); bo->bo_private = private; TAILQ_INIT(&bo->bo_clean.bv_hd); TAILQ_INIT(&bo->bo_dirty.bv_hd); } void bufobj_wrefl(struct bufobj *bo) { KASSERT(bo != NULL, ("NULL bo in bufobj_wref")); ASSERT_BO_WLOCKED(bo); bo->bo_numoutput++; } void bufobj_wref(struct bufobj *bo) { KASSERT(bo != NULL, ("NULL bo in bufobj_wref")); BO_LOCK(bo); bo->bo_numoutput++; BO_UNLOCK(bo); } void bufobj_wdrop(struct bufobj *bo) { KASSERT(bo != NULL, ("NULL bo in bufobj_wdrop")); BO_LOCK(bo); KASSERT(bo->bo_numoutput > 0, ("bufobj_wdrop non-positive count")); if ((--bo->bo_numoutput == 0) && (bo->bo_flag & BO_WWAIT)) { bo->bo_flag &= ~BO_WWAIT; wakeup(&bo->bo_numoutput); } BO_UNLOCK(bo); } int bufobj_wwait(struct bufobj *bo, int slpflag, int timeo) { int error; KASSERT(bo != NULL, ("NULL bo in bufobj_wwait")); ASSERT_BO_WLOCKED(bo); error = 0; while (bo->bo_numoutput) { bo->bo_flag |= BO_WWAIT; error = msleep(&bo->bo_numoutput, BO_LOCKPTR(bo), slpflag | (PRIBIO + 1), "bo_wwait", timeo); if (error) break; } return (error); } /* * Set bio_data or bio_ma for struct bio from the struct buf. 
*/ void bdata2bio(struct buf *bp, struct bio *bip) { if (!buf_mapped(bp)) { KASSERT(unmapped_buf_allowed, ("unmapped")); bip->bio_ma = bp->b_pages; bip->bio_ma_n = bp->b_npages; bip->bio_data = unmapped_buf; bip->bio_ma_offset = (vm_offset_t)bp->b_offset & PAGE_MASK; bip->bio_flags |= BIO_UNMAPPED; KASSERT(round_page(bip->bio_ma_offset + bip->bio_length) / PAGE_SIZE == bp->b_npages, ("Buffer %p too short: %d %lld %d", bp, bip->bio_ma_offset, (long long)bip->bio_length, bip->bio_ma_n)); } else { bip->bio_data = bp->b_data; bip->bio_ma = NULL; } } /* * The MIPS pmap code currently doesn't handle aliased pages. * The VIPT caches may not handle page aliasing themselves, leading * to data corruption. * * As such, this code makes a system extremely unhappy if said * system doesn't support unaliasing the above situation in hardware. * Some "recent" systems (eg some mips24k/mips74k cores) don't enable * this feature at build time, so it has to be handled in software. * * Once the MIPS pmap/cache code grows to support this function on * earlier chips, it should be flipped back off. */ #ifdef __mips__ static int buf_pager_relbuf = 1; #else static int buf_pager_relbuf = 0; #endif SYSCTL_INT(_vfs, OID_AUTO, buf_pager_relbuf, CTLFLAG_RWTUN, &buf_pager_relbuf, 0, "Make buffer pager release buffers after reading"); /* * The buffer pager. It uses buffer reads to validate pages. * * In contrast to the generic local pager from vm/vnode_pager.c, this * pager correctly and easily handles volumes where the underlying * device block size is greater than the machine page size. The * buffer cache transparently extends the requested page run to be * aligned at the block boundary, and does the necessary bogus page * replacements in the addends to avoid obliterating already valid * pages. * * The only non-trivial issue is that the exclusive busy state for * pages, which is assumed by the vm_pager_getpages() interface, is * incompatible with the VMIO buffer cache's desire to share-busy the * pages. This function performs a trivial downgrade of the pages' * state before reading buffers, and a less trivial upgrade from the * shared-busy to excl-busy state after the read. */ int vfs_bio_getpages(struct vnode *vp, vm_page_t *ma, int count, int *rbehind, int *rahead, vbg_get_lblkno_t get_lblkno, vbg_get_blksize_t get_blksize) { vm_page_t m; vm_object_t object; struct buf *bp; struct mount *mp; daddr_t lbn, lbnp; vm_ooffset_t la, lb, poff, poffe; long bsize; int bo_bs, br_flags, error, i, pgsin, pgsin_a, pgsin_b; bool redo, lpart; object = vp->v_object; mp = vp->v_mount; error = 0; la = IDX_TO_OFF(ma[count - 1]->pindex); if (la >= object->un_pager.vnp.vnp_size) return (VM_PAGER_BAD); /* * Change the meaning of la from where the last requested page starts * to where it ends, because that's the end of the requested region * and the start of the potential read-ahead region. */ la += PAGE_SIZE; lpart = la > object->un_pager.vnp.vnp_size; bo_bs = get_blksize(vp, get_lblkno(vp, IDX_TO_OFF(ma[0]->pindex))); /* * Calculate read-ahead, behind and total pages. 
*/ pgsin = count; lb = IDX_TO_OFF(ma[0]->pindex); pgsin_b = OFF_TO_IDX(lb - rounddown2(lb, bo_bs)); pgsin += pgsin_b; if (rbehind != NULL) *rbehind = pgsin_b; pgsin_a = OFF_TO_IDX(roundup2(la, bo_bs) - la); if (la + IDX_TO_OFF(pgsin_a) >= object->un_pager.vnp.vnp_size) pgsin_a = OFF_TO_IDX(roundup2(object->un_pager.vnp.vnp_size, PAGE_SIZE) - la); pgsin += pgsin_a; if (rahead != NULL) *rahead = pgsin_a; VM_CNT_INC(v_vnodein); VM_CNT_ADD(v_vnodepgsin, pgsin); br_flags = (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0) ? GB_UNMAPPED : 0; VM_OBJECT_WLOCK(object); again: for (i = 0; i < count; i++) vm_page_busy_downgrade(ma[i]); VM_OBJECT_WUNLOCK(object); lbnp = -1; for (i = 0; i < count; i++) { m = ma[i]; /* * Pages are shared busy and the object lock is not * owned, which together allow for the pages' * invalidation. The racy test for validity avoids * useless creation of the buffer for the most typical * case when invalidation is not used in redo or for * parallel read. The shared->excl upgrade loop at * the end of the function catches the race in a * reliable way (protected by the object lock). */ if (m->valid == VM_PAGE_BITS_ALL) continue; poff = IDX_TO_OFF(m->pindex); poffe = MIN(poff + PAGE_SIZE, object->un_pager.vnp.vnp_size); for (; poff < poffe; poff += bsize) { lbn = get_lblkno(vp, poff); if (lbn == lbnp) goto next_page; lbnp = lbn; bsize = get_blksize(vp, lbn); error = bread_gb(vp, lbn, bsize, curthread->td_ucred, br_flags, &bp); if (error != 0) goto end_pages; if (LIST_EMPTY(&bp->b_dep)) { /* * Invalidation clears m->valid, but * may leave B_CACHE flag if the * buffer existed at the invalidation * time. In this case, recycle the * buffer to do real read on next * bread() after redo. * * Otherwise B_RELBUF is not strictly * necessary, enable to reduce buf * cache pressure. */ if (buf_pager_relbuf || m->valid != VM_PAGE_BITS_ALL) bp->b_flags |= B_RELBUF; bp->b_flags &= ~B_NOCACHE; brelse(bp); } else { bqrelse(bp); } } KASSERT(1 /* racy, enable for debugging */ || m->valid == VM_PAGE_BITS_ALL || i == count - 1, ("buf %d %p invalid", i, m)); if (i == count - 1 && lpart) { VM_OBJECT_WLOCK(object); if (m->valid != 0 && m->valid != VM_PAGE_BITS_ALL) vm_page_zero_invalid(m, TRUE); VM_OBJECT_WUNLOCK(object); } next_page:; } end_pages: VM_OBJECT_WLOCK(object); redo = false; for (i = 0; i < count; i++) { vm_page_sunbusy(ma[i]); ma[i] = vm_page_grab(object, ma[i]->pindex, VM_ALLOC_NORMAL); /* * Since the pages were only sbusy while neither the * buffer nor the object lock was held by us, or * reallocated while vm_page_grab() slept for busy * relinguish, they could have been invalidated. * Recheck the valid bits and re-read as needed. * * Note that the last page is made fully valid in the * read loop, and partial validity for the page at * index count - 1 could mean that the page was * invalidated or removed, so we must restart for * safety as well. */ if (ma[i]->valid != VM_PAGE_BITS_ALL) redo = true; } if (redo && error == 0) goto again; VM_OBJECT_WUNLOCK(object); return (error != 0 ? 
VM_PAGER_ERROR : VM_PAGER_OK); } #include "opt_ddb.h" #ifdef DDB #include /* DDB command to show buffer data */ DB_SHOW_COMMAND(buffer, db_show_buffer) { /* get args */ struct buf *bp = (struct buf *)addr; #ifdef FULL_BUF_TRACKING uint32_t i, j; #endif if (!have_addr) { db_printf("usage: show buffer \n"); return; } db_printf("buf at %p\n", bp); db_printf("b_flags = 0x%b, b_xflags=0x%b\n", (u_int)bp->b_flags, PRINT_BUF_FLAGS, (u_int)bp->b_xflags, PRINT_BUF_XFLAGS); db_printf("b_vflags=0x%b b_ioflags0x%b\n", (u_int)bp->b_vflags, PRINT_BUF_VFLAGS, (u_int)bp->b_ioflags, PRINT_BIO_FLAGS); db_printf( "b_error = %d, b_bufsize = %ld, b_bcount = %ld, b_resid = %ld\n" "b_bufobj = (%p), b_data = %p\n, b_blkno = %jd, b_lblkno = %jd, " "b_vp = %p, b_dep = %p\n", bp->b_error, bp->b_bufsize, bp->b_bcount, bp->b_resid, bp->b_bufobj, bp->b_data, (intmax_t)bp->b_blkno, (intmax_t)bp->b_lblkno, bp->b_vp, bp->b_dep.lh_first); db_printf("b_kvabase = %p, b_kvasize = %d\n", bp->b_kvabase, bp->b_kvasize); if (bp->b_npages) { int i; db_printf("b_npages = %d, pages(OBJ, IDX, PA): ", bp->b_npages); for (i = 0; i < bp->b_npages; i++) { vm_page_t m; m = bp->b_pages[i]; if (m != NULL) db_printf("(%p, 0x%lx, 0x%lx)", m->object, (u_long)m->pindex, (u_long)VM_PAGE_TO_PHYS(m)); else db_printf("( ??? )"); if ((i + 1) < bp->b_npages) db_printf(","); } db_printf("\n"); } BUF_LOCKPRINTINFO(bp); #if defined(FULL_BUF_TRACKING) db_printf("b_io_tracking: b_io_tcnt = %u\n", bp->b_io_tcnt); i = bp->b_io_tcnt % BUF_TRACKING_SIZE; for (j = 1; j <= BUF_TRACKING_SIZE; j++) { if (bp->b_io_tracking[BUF_TRACKING_ENTRY(i - j)] == NULL) continue; db_printf(" %2u: %s\n", j, bp->b_io_tracking[BUF_TRACKING_ENTRY(i - j)]); } #elif defined(BUF_TRACKING) db_printf("b_io_tracking: %s\n", bp->b_io_tracking); #endif db_printf(" "); } DB_SHOW_COMMAND(bufqueues, bufqueues) { struct bufdomain *bd; struct buf *bp; long total; int i, j, cnt; db_printf("bqempty: %d\n", bqempty.bq_len); for (i = 0; i < buf_domains; i++) { bd = &bdomain[i]; db_printf("Buf domain %d\n", i); db_printf("\tfreebufs\t%d\n", bd->bd_freebuffers); db_printf("\tlofreebufs\t%d\n", bd->bd_lofreebuffers); db_printf("\thifreebufs\t%d\n", bd->bd_hifreebuffers); db_printf("\n"); db_printf("\tbufspace\t%ld\n", bd->bd_bufspace); db_printf("\tmaxbufspace\t%ld\n", bd->bd_maxbufspace); db_printf("\thibufspace\t%ld\n", bd->bd_hibufspace); db_printf("\tlobufspace\t%ld\n", bd->bd_lobufspace); db_printf("\tbufspacethresh\t%ld\n", bd->bd_bufspacethresh); db_printf("\n"); db_printf("\tnumdirtybuffers\t%d\n", bd->bd_numdirtybuffers); db_printf("\tlodirtybuffers\t%d\n", bd->bd_lodirtybuffers); db_printf("\thidirtybuffers\t%d\n", bd->bd_hidirtybuffers); db_printf("\tdirtybufthresh\t%d\n", bd->bd_dirtybufthresh); db_printf("\n"); total = 0; TAILQ_FOREACH(bp, &bd->bd_cleanq->bq_queue, b_freelist) total += bp->b_bufsize; db_printf("\tcleanq count\t%d (%ld)\n", bd->bd_cleanq->bq_len, total); total = 0; TAILQ_FOREACH(bp, &bd->bd_dirtyq.bq_queue, b_freelist) total += bp->b_bufsize; db_printf("\tdirtyq count\t%d (%ld)\n", bd->bd_dirtyq.bq_len, total); db_printf("\twakeup\t\t%d\n", bd->bd_wanted); db_printf("\tlim\t\t%d\n", bd->bd_lim); db_printf("\tCPU "); for (j = 0; j <= mp_maxid; j++) db_printf("%d, ", bd->bd_subq[j].bq_len); db_printf("\n"); cnt = 0; total = 0; for (j = 0; j < nbuf; j++) if (buf[j].b_domain == i && BUF_ISLOCKED(&buf[j])) { cnt++; total += buf[j].b_bufsize; } db_printf("\tLocked buffers: %d space %ld\n", cnt, total); cnt = 0; total = 0; for (j = 0; j < nbuf; j++) if (buf[j].b_domain == i) { 
cnt++; total += buf[j].b_bufsize; } db_printf("\tTotal buffers: %d space %ld\n", cnt, total); } } DB_SHOW_COMMAND(lockedbufs, lockedbufs) { struct buf *bp; int i; for (i = 0; i < nbuf; i++) { bp = &buf[i]; if (BUF_ISLOCKED(bp)) { db_show_buffer((uintptr_t)bp, 1, 0, NULL); db_printf("\n"); if (db_pager_quit) break; } } } DB_SHOW_COMMAND(vnodebufs, db_show_vnodebufs) { struct vnode *vp; struct buf *bp; if (!have_addr) { db_printf("usage: show vnodebufs \n"); return; } vp = (struct vnode *)addr; db_printf("Clean buffers:\n"); TAILQ_FOREACH(bp, &vp->v_bufobj.bo_clean.bv_hd, b_bobufs) { db_show_buffer((uintptr_t)bp, 1, 0, NULL); db_printf("\n"); } db_printf("Dirty buffers:\n"); TAILQ_FOREACH(bp, &vp->v_bufobj.bo_dirty.bv_hd, b_bobufs) { db_show_buffer((uintptr_t)bp, 1, 0, NULL); db_printf("\n"); } } DB_COMMAND(countfreebufs, db_coundfreebufs) { struct buf *bp; int i, used = 0, nfree = 0; if (have_addr) { db_printf("usage: countfreebufs\n"); return; } for (i = 0; i < nbuf; i++) { bp = &buf[i]; if (bp->b_qindex == QUEUE_EMPTY) nfree++; else used++; } db_printf("Counted %d free, %d used (%d tot)\n", nfree, used, nfree + used); db_printf("numfreebuffers is %d\n", numfreebuffers); } #endif /* DDB */ Index: user/ngie/bug-237403/sys/modules/allwinner/Makefile =================================================================== --- user/ngie/bug-237403/sys/modules/allwinner/Makefile (revision 346925) +++ user/ngie/bug-237403/sys/modules/allwinner/Makefile (revision 346926) @@ -1,7 +1,14 @@ # $FreeBSD$ # Build modules specific to Allwinner. SUBDIR = \ + aw_pwm \ + aw_rtc \ + aw_rsb \ + aw_sid \ aw_spi \ + aw_thermal \ + axp81x \ + if_awg .include Index: user/ngie/bug-237403/sys/modules/allwinner/aw_pwm/Makefile =================================================================== --- user/ngie/bug-237403/sys/modules/allwinner/aw_pwm/Makefile (nonexistent) +++ user/ngie/bug-237403/sys/modules/allwinner/aw_pwm/Makefile (revision 346926) @@ -0,0 +1,15 @@ +# $FreeBSD$ + +.PATH: ${SRCTOP}/sys/arm/allwinner + +KMOD= aw_pwm +SRCS= aw_pwm.c + +SRCS+= \ + bus_if.h \ + clknode_if.h \ + device_if.h \ + ofw_bus_if.h \ + pwm_if.h + +.include Property changes on: user/ngie/bug-237403/sys/modules/allwinner/aw_pwm/Makefile ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: user/ngie/bug-237403/sys/modules/allwinner/aw_rsb/Makefile =================================================================== --- user/ngie/bug-237403/sys/modules/allwinner/aw_rsb/Makefile (nonexistent) +++ user/ngie/bug-237403/sys/modules/allwinner/aw_rsb/Makefile (revision 346926) @@ -0,0 +1,15 @@ +# $FreeBSD$ + +.PATH: ${SRCTOP}/sys/arm/allwinner + +KMOD= aw_rsb +SRCS= aw_rsb.c + +SRCS+= \ + bus_if.h \ + clknode_if.h \ + device_if.h \ + ofw_bus_if.h \ + iicbus_if.h + +.include Property changes on: user/ngie/bug-237403/sys/modules/allwinner/aw_rsb/Makefile ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: user/ngie/bug-237403/sys/modules/allwinner/aw_rtc/Makefile =================================================================== --- 
user/ngie/bug-237403/sys/modules/allwinner/aw_rtc/Makefile (nonexistent) +++ user/ngie/bug-237403/sys/modules/allwinner/aw_rtc/Makefile (revision 346926) @@ -0,0 +1,15 @@ +# $FreeBSD$ + +.PATH: ${SRCTOP}/sys/arm/allwinner + +KMOD= aw_rtc +SRCS= aw_rtc.c + +SRCS+= \ + bus_if.h \ + clknode_if.h \ + device_if.h \ + ofw_bus_if.h \ + spibus_if.h \ + +.include Property changes on: user/ngie/bug-237403/sys/modules/allwinner/aw_rtc/Makefile ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: user/ngie/bug-237403/sys/modules/allwinner/aw_sid/Makefile =================================================================== --- user/ngie/bug-237403/sys/modules/allwinner/aw_sid/Makefile (nonexistent) +++ user/ngie/bug-237403/sys/modules/allwinner/aw_sid/Makefile (revision 346926) @@ -0,0 +1,14 @@ +# $FreeBSD$ + +.PATH: ${SRCTOP}/sys/arm/allwinner + +KMOD= aw_sid +SRCS= aw_sid.c + +SRCS+= \ + bus_if.h \ + clknode_if.h \ + device_if.h \ + ofw_bus_if.h \ + +.include Property changes on: user/ngie/bug-237403/sys/modules/allwinner/aw_sid/Makefile ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: user/ngie/bug-237403/sys/modules/allwinner/aw_thermal/Makefile =================================================================== --- user/ngie/bug-237403/sys/modules/allwinner/aw_thermal/Makefile (nonexistent) +++ user/ngie/bug-237403/sys/modules/allwinner/aw_thermal/Makefile (revision 346926) @@ -0,0 +1,14 @@ +# $FreeBSD$ + +.PATH: ${SRCTOP}/sys/arm/allwinner + +KMOD= aw_thermal +SRCS= aw_thermal.c + +SRCS+= \ + bus_if.h \ + clknode_if.h \ + device_if.h \ + ofw_bus_if.h \ + +.include Property changes on: user/ngie/bug-237403/sys/modules/allwinner/aw_thermal/Makefile ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: user/ngie/bug-237403/sys/modules/allwinner/axp81x/Makefile =================================================================== --- user/ngie/bug-237403/sys/modules/allwinner/axp81x/Makefile (nonexistent) +++ user/ngie/bug-237403/sys/modules/allwinner/axp81x/Makefile (revision 346926) @@ -0,0 +1,15 @@ +# $FreeBSD$ + +.PATH: ${SRCTOP}/sys/arm/allwinner + +KMOD= axp81x +SRCS= axp81x.c + +SRCS+= \ + bus_if.h \ + clknode_if.h \ + device_if.h \ + ofw_bus_if.h \ + iicbus_if.h + +.include Property changes on: user/ngie/bug-237403/sys/modules/allwinner/axp81x/Makefile ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: user/ngie/bug-237403/sys/modules/allwinner/if_awg/Makefile =================================================================== --- user/ngie/bug-237403/sys/modules/allwinner/if_awg/Makefile 
(nonexistent) +++ user/ngie/bug-237403/sys/modules/allwinner/if_awg/Makefile (revision 346926) @@ -0,0 +1,14 @@ +# $FreeBSD$ + +.PATH: ${SRCTOP}/sys/arm/allwinner + +KMOD= if_awg +SRCS= if_awg.c + +SRCS+= \ + bus_if.h \ + clknode_if.h \ + device_if.h \ + ofw_bus_if.h \ + +.include Property changes on: user/ngie/bug-237403/sys/modules/allwinner/if_awg/Makefile ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: user/ngie/bug-237403/sys/modules/fusefs/Makefile =================================================================== --- user/ngie/bug-237403/sys/modules/fusefs/Makefile (revision 346925) +++ user/ngie/bug-237403/sys/modules/fusefs/Makefile (revision 346926) @@ -1,13 +1,29 @@ # $FreeBSD$ .PATH: ${SRCTOP}/sys/fs/fuse KMOD= fusefs SRCS= vnode_if.h \ fuse_node.c fuse_io.c fuse_device.c fuse_ipc.c fuse_file.c \ fuse_vfsops.c fuse_vnops.c fuse_internal.c fuse_main.c # Symlink for backwards compatibility with systems installed at 12.0 or older +.if ${MACHINE_CPUARCH} != "powerpc" SYMLINKS= ${KMOD}.ko ${KMODDIR}/fuse.ko +.else +# Some PPC systems use msdosfs for /boot, which can't handle links or symlinks +afterinstall: alias alias_debug +alias: .PHONY + ${INSTALL} -T release -o ${KMODOWN} -g ${KMODGRP} -m ${KMODMODE} \ + ${_INSTALLFLAGS} ${PROG} ${DESTDIR}${KMODDIR}/fuse.ko +.if defined(DEBUG_FLAGS) && !defined(INSTALL_NODEBUG) && "${MK_KERNEL_SYMBOLS}" != "no" +alias_debug: .PHONY + ${INSTALL} -T debug -o ${KMODOWN} -g ${KMODGRP} -m ${KMODMODE} \ + ${_INSTALLFLAGS} ${PROG}.debug \ + ${DESTDIR}${KERN_DEBUGDIR}${KMODDIR}/fuse.ko +.else +alias_debug: .PHONY +.endif +.endif .include Index: user/ngie/bug-237403/sys/modules/if_gre/Makefile =================================================================== --- user/ngie/bug-237403/sys/modules/if_gre/Makefile (revision 346925) +++ user/ngie/bug-237403/sys/modules/if_gre/Makefile (revision 346926) @@ -1,12 +1,12 @@ # $FreeBSD$ SYSDIR?=${SRCTOP}/sys .PATH: ${SYSDIR}/net ${SYSDIR}/netinet ${SYSDIR}/netinet6 .include "${SYSDIR}/conf/kern.opts.mk" KMOD= if_gre -SRCS= if_gre.c opt_inet.h opt_inet6.h +SRCS= if_gre.c opt_inet.h opt_inet6.h opt_rss.h SRCS.INET= ip_gre.c SRCS.INET6= ip6_gre.c .include Index: user/ngie/bug-237403/sys/net/if_gre.c =================================================================== --- user/ngie/bug-237403/sys/net/if_gre.c (revision 346925) +++ user/ngie/bug-237403/sys/net/if_gre.c (revision 346926) @@ -1,679 +1,820 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 1998 The NetBSD Foundation, Inc. * Copyright (c) 2014, 2018 Andrey V. Elsukov * All rights reserved. * * This code is derived from software contributed to The NetBSD Foundation * by Heiko W.Rupp * * IPv6-over-GRE contributed by Gert Doering * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. 
AND CONTRIBUTORS * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * * $NetBSD: if_gre.c,v 1.49 2003/12/11 00:22:29 itojun Exp $ */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" +#include "opt_rss.h" #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #ifdef INET #include #include #include +#ifdef RSS +#include #endif +#endif #ifdef INET6 #include #include #include +#ifdef RSS +#include #endif +#endif #include +#include #include #include #include #include #define GREMTU 1476 static const char grename[] = "gre"; MALLOC_DEFINE(M_GRE, grename, "Generic Routing Encapsulation"); static struct sx gre_ioctl_sx; SX_SYSINIT(gre_ioctl_sx, &gre_ioctl_sx, "gre_ioctl"); static int gre_clone_create(struct if_clone *, int, caddr_t); static void gre_clone_destroy(struct ifnet *); VNET_DEFINE_STATIC(struct if_clone *, gre_cloner); #define V_gre_cloner VNET(gre_cloner) static void gre_qflush(struct ifnet *); static int gre_transmit(struct ifnet *, struct mbuf *); static int gre_ioctl(struct ifnet *, u_long, caddr_t); static int gre_output(struct ifnet *, struct mbuf *, const struct sockaddr *, struct route *); static void gre_delete_tunnel(struct gre_softc *); SYSCTL_DECL(_net_link); static SYSCTL_NODE(_net_link, IFT_TUNNEL, gre, CTLFLAG_RW, 0, "Generic Routing Encapsulation"); #ifndef MAX_GRE_NEST /* * This macro controls the default upper limitation on nesting of gre tunnels. * Since, setting a large value to this macro with a careless configuration * may introduce system crash, we don't allow any nestings by default. * If you need to configure nested gre tunnels, you can define this macro * in your kernel configuration file. However, if you do so, please be * careful to configure the tunnels so that it won't make a loop. 
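 *
 * (Editor's note, not part of the original comment: with the default of 1 a
 * packet may be encapsulated by a single gre(4) interface; routing that
 * traffic through a second gre interface trips the nesting check and the
 * packet is dropped.  Raising the limit to 2 permits gre-over-gre, provided
 * the tunnels are laid out so the encapsulated traffic cannot loop back
 * into the interface that produced it.)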
*/ #define MAX_GRE_NEST 1 #endif VNET_DEFINE_STATIC(int, max_gre_nesting) = MAX_GRE_NEST; #define V_max_gre_nesting VNET(max_gre_nesting) SYSCTL_INT(_net_link_gre, OID_AUTO, max_nesting, CTLFLAG_RW | CTLFLAG_VNET, &VNET_NAME(max_gre_nesting), 0, "Max nested tunnels"); static void vnet_gre_init(const void *unused __unused) { V_gre_cloner = if_clone_simple(grename, gre_clone_create, gre_clone_destroy, 0); #ifdef INET in_gre_init(); #endif #ifdef INET6 in6_gre_init(); #endif } VNET_SYSINIT(vnet_gre_init, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_ANY, vnet_gre_init, NULL); static void vnet_gre_uninit(const void *unused __unused) { if_clone_detach(V_gre_cloner); #ifdef INET in_gre_uninit(); #endif #ifdef INET6 in6_gre_uninit(); #endif + /* XXX: epoch_call drain */ } VNET_SYSUNINIT(vnet_gre_uninit, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_ANY, vnet_gre_uninit, NULL); static int gre_clone_create(struct if_clone *ifc, int unit, caddr_t params) { struct gre_softc *sc; sc = malloc(sizeof(struct gre_softc), M_GRE, M_WAITOK | M_ZERO); sc->gre_fibnum = curthread->td_proc->p_fibnum; GRE2IFP(sc) = if_alloc(IFT_TUNNEL); GRE2IFP(sc)->if_softc = sc; if_initname(GRE2IFP(sc), grename, unit); GRE2IFP(sc)->if_mtu = GREMTU; GRE2IFP(sc)->if_flags = IFF_POINTOPOINT|IFF_MULTICAST; GRE2IFP(sc)->if_output = gre_output; GRE2IFP(sc)->if_ioctl = gre_ioctl; GRE2IFP(sc)->if_transmit = gre_transmit; GRE2IFP(sc)->if_qflush = gre_qflush; GRE2IFP(sc)->if_capabilities |= IFCAP_LINKSTATE; GRE2IFP(sc)->if_capenable |= IFCAP_LINKSTATE; if_attach(GRE2IFP(sc)); bpfattach(GRE2IFP(sc), DLT_NULL, sizeof(u_int32_t)); return (0); } static void gre_clone_destroy(struct ifnet *ifp) { struct gre_softc *sc; sx_xlock(&gre_ioctl_sx); sc = ifp->if_softc; gre_delete_tunnel(sc); bpfdetach(ifp); if_detach(ifp); ifp->if_softc = NULL; sx_xunlock(&gre_ioctl_sx); GRE_WAIT(); if_free(ifp); free(sc, M_GRE); } static int gre_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data) { struct ifreq *ifr = (struct ifreq *)data; struct gre_softc *sc; uint32_t opt; int error; switch (cmd) { case SIOCSIFMTU: /* XXX: */ if (ifr->ifr_mtu < 576) return (EINVAL); ifp->if_mtu = ifr->ifr_mtu; return (0); case SIOCSIFADDR: ifp->if_flags |= IFF_UP; case SIOCSIFFLAGS: case SIOCADDMULTI: case SIOCDELMULTI: return (0); case GRESADDRS: case GRESADDRD: case GREGADDRS: case GREGADDRD: case GRESPROTO: case GREGPROTO: return (EOPNOTSUPP); } sx_xlock(&gre_ioctl_sx); sc = ifp->if_softc; if (sc == NULL) { error = ENXIO; goto end; } error = 0; switch (cmd) { case SIOCDIFPHYADDR: if (sc->gre_family == 0) break; gre_delete_tunnel(sc); break; #ifdef INET case SIOCSIFPHYADDR: case SIOCGIFPSRCADDR: case SIOCGIFPDSTADDR: error = in_gre_ioctl(sc, cmd, data); break; #endif #ifdef INET6 case SIOCSIFPHYADDR_IN6: case SIOCGIFPSRCADDR_IN6: case SIOCGIFPDSTADDR_IN6: error = in6_gre_ioctl(sc, cmd, data); break; #endif case SIOCGTUNFIB: ifr->ifr_fib = sc->gre_fibnum; break; case SIOCSTUNFIB: if ((error = priv_check(curthread, PRIV_NET_GRE)) != 0) break; if (ifr->ifr_fib >= rt_numfibs) error = EINVAL; else sc->gre_fibnum = ifr->ifr_fib; break; case GRESKEY: case GRESOPTS: + case GRESPORT: if ((error = priv_check(curthread, PRIV_NET_GRE)) != 0) break; if ((error = copyin(ifr_data_get_ptr(ifr), &opt, sizeof(opt))) != 0) break; if (cmd == GRESKEY) { if (sc->gre_key == opt) break; } else if (cmd == GRESOPTS) { if (opt & ~GRE_OPTMASK) { error = EINVAL; break; } if (sc->gre_options == opt) break; + } else if (cmd == GRESPORT) { + if (opt != 0 && (opt < V_ipport_hifirstauto || + opt > V_ipport_hilastauto)) { + 
error = EINVAL; + break; + } + if (sc->gre_port == opt) + break; + if ((sc->gre_options & GRE_UDPENCAP) == 0) { + /* + * UDP encapsulation is not enabled, thus + * there is no need to reattach softc. + */ + sc->gre_port = opt; + break; + } } switch (sc->gre_family) { #ifdef INET case AF_INET: - in_gre_setopts(sc, cmd, opt); + error = in_gre_setopts(sc, cmd, opt); break; #endif #ifdef INET6 case AF_INET6: - in6_gre_setopts(sc, cmd, opt); + error = in6_gre_setopts(sc, cmd, opt); break; #endif default: + /* + * Tunnel is not yet configured. + * We can just change any parameters. + */ if (cmd == GRESKEY) sc->gre_key = opt; - else + if (cmd == GRESOPTS) sc->gre_options = opt; + if (cmd == GRESPORT) + sc->gre_port = opt; break; } /* * XXX: Do we need to initiate change of interface * state here? */ break; case GREGKEY: error = copyout(&sc->gre_key, ifr_data_get_ptr(ifr), sizeof(sc->gre_key)); break; case GREGOPTS: error = copyout(&sc->gre_options, ifr_data_get_ptr(ifr), sizeof(sc->gre_options)); break; + case GREGPORT: + error = copyout(&sc->gre_port, ifr_data_get_ptr(ifr), + sizeof(sc->gre_port)); + break; default: error = EINVAL; break; } if (error == 0 && sc->gre_family != 0) { if ( #ifdef INET cmd == SIOCSIFPHYADDR || #endif #ifdef INET6 cmd == SIOCSIFPHYADDR_IN6 || #endif 0) { if_link_state_change(ifp, LINK_STATE_UP); } } end: sx_xunlock(&gre_ioctl_sx); return (error); } static void gre_delete_tunnel(struct gre_softc *sc) { + struct gre_socket *gs; sx_assert(&gre_ioctl_sx, SA_XLOCKED); if (sc->gre_family != 0) { CK_LIST_REMOVE(sc, chain); CK_LIST_REMOVE(sc, srchash); GRE_WAIT(); free(sc->gre_hdr, M_GRE); sc->gre_family = 0; } + /* + * If this Tunnel was the last one that could use UDP socket, + * we should unlink socket from hash table and close it. 
+ */ + if ((gs = sc->gre_so) != NULL && CK_LIST_EMPTY(&gs->list)) { + CK_LIST_REMOVE(gs, chain); + soclose(gs->so); + epoch_call(net_epoch_preempt, &gs->epoch_ctx, gre_sofree); + sc->gre_so = NULL; + } GRE2IFP(sc)->if_drv_flags &= ~IFF_DRV_RUNNING; if_link_state_change(GRE2IFP(sc), LINK_STATE_DOWN); } struct gre_list * gre_hashinit(void) { struct gre_list *hash; int i; hash = malloc(sizeof(struct gre_list) * GRE_HASH_SIZE, M_GRE, M_WAITOK); for (i = 0; i < GRE_HASH_SIZE; i++) CK_LIST_INIT(&hash[i]); return (hash); } void gre_hashdestroy(struct gre_list *hash) { free(hash, M_GRE); } void -gre_updatehdr(struct gre_softc *sc, struct grehdr *gh) +gre_sofree(epoch_context_t ctx) { + struct gre_socket *gs; + + gs = __containerof(ctx, struct gre_socket, epoch_ctx); + free(gs, M_GRE); +} + +static __inline uint16_t +gre_cksum_add(uint16_t sum, uint16_t a) +{ + uint16_t res; + + res = sum + a; + return (res + (res < a)); +} + +void +gre_update_udphdr(struct gre_softc *sc, struct udphdr *udp, uint16_t csum) +{ + + sx_assert(&gre_ioctl_sx, SA_XLOCKED); + MPASS(sc->gre_options & GRE_UDPENCAP); + + udp->uh_dport = htons(GRE_UDPPORT); + udp->uh_sport = htons(sc->gre_port); + udp->uh_sum = csum; + udp->uh_ulen = 0; +} + +void +gre_update_hdr(struct gre_softc *sc, struct grehdr *gh) +{ uint32_t *opts; uint16_t flags; sx_assert(&gre_ioctl_sx, SA_XLOCKED); flags = 0; opts = gh->gre_opts; if (sc->gre_options & GRE_ENABLE_CSUM) { flags |= GRE_FLAGS_CP; sc->gre_hlen += 2 * sizeof(uint16_t); *opts++ = 0; } if (sc->gre_key != 0) { flags |= GRE_FLAGS_KP; sc->gre_hlen += sizeof(uint32_t); *opts++ = htonl(sc->gre_key); } if (sc->gre_options & GRE_ENABLE_SEQ) { flags |= GRE_FLAGS_SP; sc->gre_hlen += sizeof(uint32_t); *opts++ = 0; } else sc->gre_oseq = 0; gh->gre_flags = htons(flags); } int gre_input(struct mbuf *m, int off, int proto, void *arg) { struct gre_softc *sc = arg; struct grehdr *gh; struct ifnet *ifp; uint32_t *opts; #ifdef notyet uint32_t key; #endif uint16_t flags; int hlen, isr, af; ifp = GRE2IFP(sc); hlen = off + sizeof(struct grehdr) + 4 * sizeof(uint32_t); if (m->m_pkthdr.len < hlen) goto drop; if (m->m_len < hlen) { m = m_pullup(m, hlen); if (m == NULL) goto drop; } gh = (struct grehdr *)mtodo(m, off); flags = ntohs(gh->gre_flags); if (flags & ~GRE_FLAGS_MASK) goto drop; opts = gh->gre_opts; hlen = 2 * sizeof(uint16_t); if (flags & GRE_FLAGS_CP) { /* reserved1 field must be zero */ if (((uint16_t *)opts)[1] != 0) goto drop; if (in_cksum_skip(m, m->m_pkthdr.len, off) != 0) goto drop; hlen += 2 * sizeof(uint16_t); opts++; } if (flags & GRE_FLAGS_KP) { #ifdef notyet /* * XXX: The current implementation uses the key only for outgoing * packets. But we can check the key value here, or even in the * encapcheck function. */ key = ntohl(*opts); #endif hlen += sizeof(uint32_t); opts++; } #ifdef notyet } else key = 0; if (sc->gre_key != 0 && (key != sc->gre_key || key != 0)) goto drop; #endif if (flags & GRE_FLAGS_SP) { #ifdef notyet seq = ntohl(*opts); #endif hlen += sizeof(uint32_t); } switch (ntohs(gh->gre_proto)) { case ETHERTYPE_WCCP: /* * For WCCP skip an additional 4 bytes if after GRE header * doesn't follow an IP header. 
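 *
 * (Editor's note, not part of the original comment: the test below masks
 * the high nibble of the first payload byte, which is the IP version
 * field.  A value of 0x40 means a plain IPv4 header follows the GRE header
 * directly; anything else implies the 4-byte WCCPv2 redirect header is
 * present and has to be skipped as well.)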
*/ if (flags == 0 && (*(uint8_t *)gh->gre_opts & 0xF0) != 0x40) hlen += sizeof(uint32_t); /* FALLTHROUGH */ case ETHERTYPE_IP: isr = NETISR_IP; af = AF_INET; break; case ETHERTYPE_IPV6: isr = NETISR_IPV6; af = AF_INET6; break; default: goto drop; } m_adj(m, off + hlen); m_clrprotoflags(m); m->m_pkthdr.rcvif = ifp; M_SETFIB(m, ifp->if_fib); #ifdef MAC mac_ifnet_create_mbuf(ifp, m); #endif BPF_MTAP2(ifp, &af, sizeof(af), m); if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1); if_inc_counter(ifp, IFCOUNTER_IBYTES, m->m_pkthdr.len); if ((ifp->if_flags & IFF_MONITOR) != 0) m_freem(m); else netisr_dispatch(isr, m); return (IPPROTO_DONE); drop: if_inc_counter(ifp, IFCOUNTER_IERRORS, 1); m_freem(m); return (IPPROTO_DONE); } static int gre_output(struct ifnet *ifp, struct mbuf *m, const struct sockaddr *dst, struct route *ro) { uint32_t af; if (dst->sa_family == AF_UNSPEC) bcopy(dst->sa_data, &af, sizeof(af)); else af = dst->sa_family; /* * Now save the af in the inbound pkt csum data, this is a cheat since * we are using the inbound csum_data field to carry the af over to * the gre_transmit() routine, avoiding using yet another mtag. */ m->m_pkthdr.csum_data = af; return (ifp->if_transmit(ifp, m)); } static void gre_setseqn(struct grehdr *gh, uint32_t seq) { uint32_t *opts; uint16_t flags; opts = gh->gre_opts; flags = ntohs(gh->gre_flags); KASSERT((flags & GRE_FLAGS_SP) != 0, ("gre_setseqn called, but GRE_FLAGS_SP isn't set ")); if (flags & GRE_FLAGS_CP) opts++; if (flags & GRE_FLAGS_KP) opts++; *opts = htonl(seq); } +static uint32_t +gre_flowid(struct gre_softc *sc, struct mbuf *m, uint32_t af) +{ + uint32_t flowid; + + if ((sc->gre_options & GRE_UDPENCAP) == 0 || sc->gre_port != 0) + return (0); +#ifndef RSS + switch (af) { +#ifdef INET + case AF_INET: + flowid = mtod(m, struct ip *)->ip_src.s_addr ^ + mtod(m, struct ip *)->ip_dst.s_addr; + break; +#endif +#ifdef INET6 + case AF_INET6: + flowid = mtod(m, struct ip6_hdr *)->ip6_src.s6_addr32[3] ^ + mtod(m, struct ip6_hdr *)->ip6_dst.s6_addr32[3]; + break; +#endif + default: + flowid = 0; + } +#else /* RSS */ + switch (af) { +#ifdef INET + case AF_INET: + flowid = rss_hash_ip4_2tuple(mtod(m, struct ip *)->ip_src, + mtod(m, struct ip *)->ip_dst); + break; +#endif +#ifdef INET6 + case AF_INET6: + flowid = rss_hash_ip6_2tuple( + &mtod(m, struct ip6_hdr *)->ip6_src, + &mtod(m, struct ip6_hdr *)->ip6_dst); + break; +#endif + default: + flowid = 0; + } +#endif + return (flowid); +} + #define MTAG_GRE 1307983903 static int gre_transmit(struct ifnet *ifp, struct mbuf *m) { GRE_RLOCK_TRACKER; struct gre_softc *sc; struct grehdr *gh; - uint32_t af; + struct udphdr *uh; + uint32_t af, flowid; int error, len; uint16_t proto; len = 0; GRE_RLOCK(); #ifdef MAC error = mac_ifnet_check_transmit(ifp, m); if (error) { m_freem(m); goto drop; } #endif error = ENETDOWN; sc = ifp->if_softc; if ((ifp->if_flags & IFF_MONITOR) != 0 || (ifp->if_flags & IFF_UP) == 0 || (ifp->if_drv_flags & IFF_DRV_RUNNING) == 0 || sc->gre_family == 0 || (error = if_tunnel_check_nesting(ifp, m, MTAG_GRE, V_max_gre_nesting)) != 0) { m_freem(m); goto drop; } af = m->m_pkthdr.csum_data; BPF_MTAP2(ifp, &af, sizeof(af), m); m->m_flags &= ~(M_BCAST|M_MCAST); + flowid = gre_flowid(sc, m, af); M_SETFIB(m, sc->gre_fibnum); M_PREPEND(m, sc->gre_hlen, M_NOWAIT); if (m == NULL) { error = ENOBUFS; goto drop; } bcopy(sc->gre_hdr, mtod(m, void *), sc->gre_hlen); /* Determine GRE proto */ switch (af) { #ifdef INET case AF_INET: proto = htons(ETHERTYPE_IP); break; #endif #ifdef INET6 case AF_INET6: proto = 
htons(ETHERTYPE_IPV6); break; #endif default: m_freem(m); error = ENETDOWN; goto drop; } /* Determine offset of GRE header */ switch (sc->gre_family) { #ifdef INET case AF_INET: len = sizeof(struct ip); break; #endif #ifdef INET6 case AF_INET6: len = sizeof(struct ip6_hdr); break; #endif default: m_freem(m); error = ENETDOWN; goto drop; } + if (sc->gre_options & GRE_UDPENCAP) { + uh = (struct udphdr *)mtodo(m, len); + uh->uh_sport |= htons(V_ipport_hifirstauto) | + (flowid >> 16) | (flowid & 0xFFFF); + uh->uh_sport = htons(ntohs(uh->uh_sport) % + V_ipport_hilastauto); + uh->uh_ulen = htons(m->m_pkthdr.len - len); + uh->uh_sum = gre_cksum_add(uh->uh_sum, + htons(m->m_pkthdr.len - len + IPPROTO_UDP)); + m->m_pkthdr.csum_flags = sc->gre_csumflags; + m->m_pkthdr.csum_data = offsetof(struct udphdr, uh_sum); + len += sizeof(struct udphdr); + } gh = (struct grehdr *)mtodo(m, len); gh->gre_proto = proto; if (sc->gre_options & GRE_ENABLE_SEQ) gre_setseqn(gh, sc->gre_oseq++); if (sc->gre_options & GRE_ENABLE_CSUM) { *(uint16_t *)gh->gre_opts = in_cksum_skip(m, m->m_pkthdr.len, len); } len = m->m_pkthdr.len - len; switch (sc->gre_family) { #ifdef INET case AF_INET: error = in_gre_output(m, af, sc->gre_hlen); break; #endif #ifdef INET6 case AF_INET6: - error = in6_gre_output(m, af, sc->gre_hlen); + error = in6_gre_output(m, af, sc->gre_hlen, flowid); break; #endif default: m_freem(m); error = ENETDOWN; } drop: if (error) if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); else { if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1); if_inc_counter(ifp, IFCOUNTER_OBYTES, len); } GRE_RUNLOCK(); return (error); } static void gre_qflush(struct ifnet *ifp __unused) { } static int gremodevent(module_t mod, int type, void *data) { switch (type) { case MOD_LOAD: case MOD_UNLOAD: break; default: return (EOPNOTSUPP); } return (0); } static moduledata_t gre_mod = { "if_gre", gremodevent, 0 }; DECLARE_MODULE(if_gre, gre_mod, SI_SUB_PSEUDO, SI_ORDER_ANY); MODULE_VERSION(if_gre, 1); Index: user/ngie/bug-237403/sys/net/if_gre.h =================================================================== --- user/ngie/bug-237403/sys/net/if_gre.h (revision 346925) +++ user/ngie/bug-237403/sys/net/if_gre.h (revision 346926) @@ -1,147 +1,185 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 1998 The NetBSD Foundation, Inc. * Copyright (c) 2014 Andrey V. Elsukov * All rights reserved * * This code is derived from software contributed to The NetBSD Foundation * by Heiko W.Rupp * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * * $NetBSD: if_gre.h,v 1.13 2003/11/10 08:51:52 wiz Exp $ * $FreeBSD$ */ #ifndef _NET_IF_GRE_H_ #define _NET_IF_GRE_H_ #ifdef _KERNEL /* GRE header according to RFC 2784 and RFC 2890 */ struct grehdr { uint16_t gre_flags; /* GRE flags */ #define GRE_FLAGS_CP 0x8000 /* checksum present */ #define GRE_FLAGS_KP 0x2000 /* key present */ #define GRE_FLAGS_SP 0x1000 /* sequence present */ #define GRE_FLAGS_MASK (GRE_FLAGS_CP|GRE_FLAGS_KP|GRE_FLAGS_SP) uint16_t gre_proto; /* protocol type */ uint32_t gre_opts[0]; /* optional fields */ } __packed; #ifdef INET struct greip { struct ip gi_ip; struct grehdr gi_gre; } __packed; -#endif +struct greudp { + struct ip gi_ip; + struct udphdr gi_udp; + struct grehdr gi_gre; +} __packed; +#endif /* INET */ + #ifdef INET6 struct greip6 { struct ip6_hdr gi6_ip6; struct grehdr gi6_gre; } __packed; -#endif +struct greudp6 { + struct ip6_hdr gi6_ip6; + struct udphdr gi6_udp; + struct grehdr gi6_gre; +} __packed; +#endif /* INET6 */ + +CK_LIST_HEAD(gre_list, gre_softc); +CK_LIST_HEAD(gre_sockets, gre_socket); +struct gre_socket { + struct socket *so; + struct gre_list list; + CK_LIST_ENTRY(gre_socket) chain; + struct epoch_context epoch_ctx; +}; + struct gre_softc { struct ifnet *gre_ifp; int gre_family; /* AF of delivery header */ uint32_t gre_iseq; uint32_t gre_oseq; uint32_t gre_key; uint32_t gre_options; + uint32_t gre_csumflags; + uint32_t gre_port; u_int gre_fibnum; u_int gre_hlen; /* header size */ union { void *hdr; #ifdef INET - struct greip *gihdr; + struct greip *iphdr; + struct greudp *udphdr; #endif #ifdef INET6 - struct greip6 *gi6hdr; + struct greip6 *ip6hdr; + struct greudp6 *udp6hdr; #endif } gre_uhdr; + struct gre_socket *gre_so; CK_LIST_ENTRY(gre_softc) chain; CK_LIST_ENTRY(gre_softc) srchash; }; -CK_LIST_HEAD(gre_list, gre_softc); MALLOC_DECLARE(M_GRE); #ifndef GRE_HASH_SIZE #define GRE_HASH_SIZE (1 << 4) #endif #define GRE2IFP(sc) ((sc)->gre_ifp) #define GRE_RLOCK_TRACKER struct epoch_tracker gre_et #define GRE_RLOCK() epoch_enter_preempt(net_epoch_preempt, &gre_et) #define GRE_RUNLOCK() epoch_exit_preempt(net_epoch_preempt, &gre_et) #define GRE_WAIT() epoch_wait_preempt(net_epoch_preempt) #define gre_hdr gre_uhdr.hdr -#define gre_gihdr gre_uhdr.gihdr -#define gre_gi6hdr gre_uhdr.gi6hdr -#define gre_oip gre_gihdr->gi_ip -#define gre_oip6 gre_gi6hdr->gi6_ip6 +#define gre_iphdr gre_uhdr.iphdr +#define gre_ip6hdr gre_uhdr.ip6hdr +#define gre_udphdr gre_uhdr.udphdr +#define gre_udp6hdr gre_uhdr.udp6hdr +#define gre_oip gre_iphdr->gi_ip +#define gre_udp gre_udphdr->gi_udp +#define gre_oip6 gre_ip6hdr->gi6_ip6 +#define gre_udp6 gre_udp6hdr->gi6_udp + struct gre_list *gre_hashinit(void); void gre_hashdestroy(struct gre_list *); int gre_input(struct mbuf *, int, int, void *); -void gre_updatehdr(struct gre_softc *, struct grehdr *); +void gre_update_hdr(struct gre_softc *, struct grehdr *); +void gre_update_udphdr(struct gre_softc *, struct udphdr *, uint16_t); +void gre_sofree(epoch_context_t); void in_gre_init(void); void 
in_gre_uninit(void); -void in_gre_setopts(struct gre_softc *, u_long, uint32_t); +int in_gre_setopts(struct gre_softc *, u_long, uint32_t); int in_gre_ioctl(struct gre_softc *, u_long, caddr_t); int in_gre_output(struct mbuf *, int, int); void in6_gre_init(void); void in6_gre_uninit(void); -void in6_gre_setopts(struct gre_softc *, u_long, uint32_t); +int in6_gre_setopts(struct gre_softc *, u_long, uint32_t); int in6_gre_ioctl(struct gre_softc *, u_long, caddr_t); -int in6_gre_output(struct mbuf *, int, int); +int in6_gre_output(struct mbuf *, int, int, uint32_t); /* * CISCO uses special type for GRE tunnel created as part of WCCP * connection, while in fact those packets are just IPv4 encapsulated * into GRE. */ #define ETHERTYPE_WCCP 0x883E #endif /* _KERNEL */ #define GRESADDRS _IOW('i', 101, struct ifreq) #define GRESADDRD _IOW('i', 102, struct ifreq) #define GREGADDRS _IOWR('i', 103, struct ifreq) #define GREGADDRD _IOWR('i', 104, struct ifreq) #define GRESPROTO _IOW('i' , 105, struct ifreq) #define GREGPROTO _IOWR('i', 106, struct ifreq) #define GREGKEY _IOWR('i', 107, struct ifreq) #define GRESKEY _IOW('i', 108, struct ifreq) #define GREGOPTS _IOWR('i', 109, struct ifreq) #define GRESOPTS _IOW('i', 110, struct ifreq) +#define GREGPORT _IOWR('i', 111, struct ifreq) +#define GRESPORT _IOW('i', 112, struct ifreq) +/* GRE-in-UDP encapsulation destination port as defined in RFC8086 */ +#define GRE_UDPPORT 4754 + #define GRE_ENABLE_CSUM 0x0001 #define GRE_ENABLE_SEQ 0x0002 -#define GRE_OPTMASK (GRE_ENABLE_CSUM|GRE_ENABLE_SEQ) +#define GRE_UDPENCAP 0x0004 +#define GRE_OPTMASK (GRE_ENABLE_CSUM|GRE_ENABLE_SEQ|GRE_UDPENCAP) #endif /* _NET_IF_GRE_H_ */ Index: user/ngie/bug-237403/sys/net/if_tap.c =================================================================== --- user/ngie/bug-237403/sys/net/if_tap.c (revision 346925) +++ user/ngie/bug-237403/sys/net/if_tap.c (revision 346926) @@ -1,1127 +1,1145 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (C) 1999-2000 by Maksim Yevmenkin * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * BASED ON: * ------------------------------------------------------------------------- * * Copyright (c) 1988, Julian Onions * Nottingham University 1987. 
*/ /* * $FreeBSD$ * $Id: if_tap.c,v 0.21 2000/07/23 21:46:02 max Exp $ */ #include "opt_inet.h" #include #include +#include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define CDEV_NAME "tap" #define TAPDEBUG if (tapdebug) printf static const char tapname[] = "tap"; static const char vmnetname[] = "vmnet"; #define TAPMAXUNIT 0x7fff #define VMNET_DEV_MASK CLONE_FLAG0 /* module */ static int tapmodevent(module_t, int, void *); /* device */ static void tapclone(void *, struct ucred *, char *, int, struct cdev **); static void tapcreate(struct cdev *); /* network interface */ static void tapifstart(struct ifnet *); static int tapifioctl(struct ifnet *, u_long, caddr_t); static void tapifinit(void *); static int tap_clone_create(struct if_clone *, int, caddr_t); static void tap_clone_destroy(struct ifnet *); static struct if_clone *tap_cloner; static int vmnet_clone_create(struct if_clone *, int, caddr_t); static void vmnet_clone_destroy(struct ifnet *); static struct if_clone *vmnet_cloner; /* character device */ static d_open_t tapopen; static d_close_t tapclose; static d_read_t tapread; static d_write_t tapwrite; static d_ioctl_t tapioctl; static d_poll_t tappoll; static d_kqfilter_t tapkqfilter; /* kqueue(2) */ static int tapkqread(struct knote *, long); static int tapkqwrite(struct knote *, long); static void tapkqdetach(struct knote *); static struct filterops tap_read_filterops = { .f_isfd = 1, .f_attach = NULL, .f_detach = tapkqdetach, .f_event = tapkqread, }; static struct filterops tap_write_filterops = { .f_isfd = 1, .f_attach = NULL, .f_detach = tapkqdetach, .f_event = tapkqwrite, }; static struct cdevsw tap_cdevsw = { .d_version = D_VERSION, .d_flags = D_NEEDMINOR, .d_open = tapopen, .d_close = tapclose, .d_read = tapread, .d_write = tapwrite, .d_ioctl = tapioctl, .d_poll = tappoll, .d_name = CDEV_NAME, .d_kqfilter = tapkqfilter, }; /* * All global variables in if_tap.c are locked with tapmtx, with the * exception of tapdebug, which is accessed unlocked; tapclones is * static at runtime. 
*/ static struct mtx tapmtx; static int tapdebug = 0; /* debug flag */ static int tapuopen = 0; /* allow user open() */ static int tapuponopen = 0; /* IFF_UP on open() */ static int tapdclone = 1; /* enable devfs cloning */ static SLIST_HEAD(, tap_softc) taphead; /* first device */ static struct clonedevs *tapclones; MALLOC_DECLARE(M_TAP); MALLOC_DEFINE(M_TAP, CDEV_NAME, "Ethernet tunnel interface"); SYSCTL_INT(_debug, OID_AUTO, if_tap_debug, CTLFLAG_RW, &tapdebug, 0, ""); +static struct sx tap_ioctl_sx; +SX_SYSINIT(tap_ioctl_sx, &tap_ioctl_sx, "tap_ioctl"); + SYSCTL_DECL(_net_link); static SYSCTL_NODE(_net_link, OID_AUTO, tap, CTLFLAG_RW, 0, "Ethernet tunnel software network interface"); SYSCTL_INT(_net_link_tap, OID_AUTO, user_open, CTLFLAG_RW, &tapuopen, 0, "Allow user to open /dev/tap (based on node permissions)"); SYSCTL_INT(_net_link_tap, OID_AUTO, up_on_open, CTLFLAG_RW, &tapuponopen, 0, "Bring interface up when /dev/tap is opened"); SYSCTL_INT(_net_link_tap, OID_AUTO, devfs_cloning, CTLFLAG_RWTUN, &tapdclone, 0, "Enable legacy devfs interface creation"); SYSCTL_INT(_net_link_tap, OID_AUTO, debug, CTLFLAG_RW, &tapdebug, 0, ""); DEV_MODULE(if_tap, tapmodevent, NULL); +MODULE_VERSION(if_tap, 1); static int tap_clone_create(struct if_clone *ifc, int unit, caddr_t params) { struct cdev *dev; int i; /* Find any existing device, or allocate new unit number. */ i = clone_create(&tapclones, &tap_cdevsw, &unit, &dev, 0); if (i) { dev = make_dev(&tap_cdevsw, unit, UID_ROOT, GID_WHEEL, 0600, "%s%d", tapname, unit); } tapcreate(dev); return (0); } /* vmnet devices are tap devices in disguise */ static int vmnet_clone_create(struct if_clone *ifc, int unit, caddr_t params) { struct cdev *dev; int i; /* Find any existing device, or allocate new unit number. 
*/ i = clone_create(&tapclones, &tap_cdevsw, &unit, &dev, VMNET_DEV_MASK); if (i) { dev = make_dev(&tap_cdevsw, unit | VMNET_DEV_MASK, UID_ROOT, GID_WHEEL, 0600, "%s%d", vmnetname, unit); } tapcreate(dev); return (0); } static void tap_destroy(struct tap_softc *tp) { struct ifnet *ifp = tp->tap_ifp; CURVNET_SET(ifp->if_vnet); + sx_xlock(&tap_ioctl_sx); + ifp->if_softc = NULL; + sx_xunlock(&tap_ioctl_sx); + destroy_dev(tp->tap_dev); seldrain(&tp->tap_rsel); knlist_clear(&tp->tap_rsel.si_note, 0); knlist_destroy(&tp->tap_rsel.si_note); ether_ifdetach(ifp); if_free(ifp); mtx_destroy(&tp->tap_mtx); free(tp, M_TAP); CURVNET_RESTORE(); } static void tap_clone_destroy(struct ifnet *ifp) { struct tap_softc *tp = ifp->if_softc; mtx_lock(&tapmtx); SLIST_REMOVE(&taphead, tp, tap_softc, tap_next); mtx_unlock(&tapmtx); tap_destroy(tp); } /* vmnet devices are tap devices in disguise */ static void vmnet_clone_destroy(struct ifnet *ifp) { tap_clone_destroy(ifp); } /* * tapmodevent * * module event handler */ static int tapmodevent(module_t mod, int type, void *data) { static eventhandler_tag eh_tag = NULL; struct tap_softc *tp = NULL; struct ifnet *ifp = NULL; switch (type) { case MOD_LOAD: /* intitialize device */ mtx_init(&tapmtx, "tapmtx", NULL, MTX_DEF); SLIST_INIT(&taphead); clone_setup(&tapclones); eh_tag = EVENTHANDLER_REGISTER(dev_clone, tapclone, 0, 1000); if (eh_tag == NULL) { clone_cleanup(&tapclones); mtx_destroy(&tapmtx); return (ENOMEM); } tap_cloner = if_clone_simple(tapname, tap_clone_create, tap_clone_destroy, 0); vmnet_cloner = if_clone_simple(vmnetname, vmnet_clone_create, vmnet_clone_destroy, 0); return (0); case MOD_UNLOAD: /* * The EBUSY algorithm here can't quite atomically * guarantee that this is race-free since we have to * release the tap mtx to deregister the clone handler. */ mtx_lock(&tapmtx); SLIST_FOREACH(tp, &taphead, tap_next) { mtx_lock(&tp->tap_mtx); if (tp->tap_flags & TAP_OPEN) { mtx_unlock(&tp->tap_mtx); mtx_unlock(&tapmtx); return (EBUSY); } mtx_unlock(&tp->tap_mtx); } mtx_unlock(&tapmtx); EVENTHANDLER_DEREGISTER(dev_clone, eh_tag); if_clone_detach(tap_cloner); if_clone_detach(vmnet_cloner); drain_dev_clone_events(); mtx_lock(&tapmtx); while ((tp = SLIST_FIRST(&taphead)) != NULL) { SLIST_REMOVE_HEAD(&taphead, tap_next); mtx_unlock(&tapmtx); ifp = tp->tap_ifp; TAPDEBUG("detaching %s\n", ifp->if_xname); tap_destroy(tp); mtx_lock(&tapmtx); } mtx_unlock(&tapmtx); clone_cleanup(&tapclones); mtx_destroy(&tapmtx); break; default: return (EOPNOTSUPP); } return (0); } /* tapmodevent */ /* * DEVFS handler * * We need to support two kind of devices - tap and vmnet */ static void tapclone(void *arg, struct ucred *cred, char *name, int namelen, struct cdev **dev) { char devname[SPECNAMELEN + 1]; int i, unit, append_unit; int extra; if (*dev != NULL) return; if (!tapdclone || (!tapuopen && priv_check_cred(cred, PRIV_NET_IFCREATE) != 0)) return; unit = 0; append_unit = 0; extra = 0; /* We're interested in only tap/vmnet devices. 
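 *
 * (Editor's illustration, not part of the original comment: dev_stdclone()
 * splits a request such as "tap5" into the base name and unit 5, whereas a
 * bare "tap" or "vmnet" leaves unit at -1 so that clone_create() below
 * allocates a free unit which is then appended to the device name.)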
*/ if (strcmp(name, tapname) == 0) { unit = -1; } else if (strcmp(name, vmnetname) == 0) { unit = -1; extra = VMNET_DEV_MASK; } else if (dev_stdclone(name, NULL, tapname, &unit) != 1) { if (dev_stdclone(name, NULL, vmnetname, &unit) != 1) { return; } else { extra = VMNET_DEV_MASK; } } if (unit == -1) append_unit = 1; CURVNET_SET(CRED_TO_VNET(cred)); /* find any existing device, or allocate new unit number */ i = clone_create(&tapclones, &tap_cdevsw, &unit, dev, extra); if (i) { if (append_unit) { /* * We were passed 'tun' or 'tap', with no unit specified * so we'll need to append it now. */ namelen = snprintf(devname, sizeof(devname), "%s%d", name, unit); name = devname; } *dev = make_dev_credf(MAKEDEV_REF, &tap_cdevsw, unit | extra, cred, UID_ROOT, GID_WHEEL, 0600, "%s", name); } if_clone_create(name, namelen, NULL); CURVNET_RESTORE(); } /* tapclone */ /* * tapcreate * * to create interface */ static void tapcreate(struct cdev *dev) { struct ifnet *ifp = NULL; struct tap_softc *tp = NULL; unsigned short macaddr_hi; uint32_t macaddr_mid; int unit; const char *name = NULL; u_char eaddr[6]; /* allocate driver storage and create device */ tp = malloc(sizeof(*tp), M_TAP, M_WAITOK | M_ZERO); mtx_init(&tp->tap_mtx, "tap_mtx", NULL, MTX_DEF); mtx_lock(&tapmtx); SLIST_INSERT_HEAD(&taphead, tp, tap_next); mtx_unlock(&tapmtx); unit = dev2unit(dev); /* select device: tap or vmnet */ if (unit & VMNET_DEV_MASK) { name = vmnetname; tp->tap_flags |= TAP_VMNET; } else name = tapname; unit &= TAPMAXUNIT; TAPDEBUG("tapcreate(%s%d). minor = %#x\n", name, unit, dev2unit(dev)); /* generate fake MAC address: 00 bd xx xx xx unit_no */ macaddr_hi = htons(0x00bd); macaddr_mid = (uint32_t) ticks; bcopy(&macaddr_hi, eaddr, sizeof(short)); bcopy(&macaddr_mid, &eaddr[2], sizeof(uint32_t)); eaddr[5] = (u_char)unit; /* fill the rest and attach interface */ ifp = tp->tap_ifp = if_alloc(IFT_ETHER); if (ifp == NULL) panic("%s%d: can not if_alloc()", name, unit); ifp->if_softc = tp; if_initname(ifp, name, unit); ifp->if_init = tapifinit; ifp->if_start = tapifstart; ifp->if_ioctl = tapifioctl; ifp->if_mtu = ETHERMTU; ifp->if_flags = (IFF_BROADCAST|IFF_SIMPLEX|IFF_MULTICAST); IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); ifp->if_capabilities |= IFCAP_LINKSTATE; ifp->if_capenable |= IFCAP_LINKSTATE; dev->si_drv1 = tp; tp->tap_dev = dev; ether_ifattach(ifp, eaddr); mtx_lock(&tp->tap_mtx); tp->tap_flags |= TAP_INITED; mtx_unlock(&tp->tap_mtx); knlist_init_mtx(&tp->tap_rsel.si_note, &tp->tap_mtx); TAPDEBUG("interface %s is created. minor = %#x\n", ifp->if_xname, dev2unit(dev)); } /* tapcreate */ /* * tapopen * * to open tunnel. must be superuser */ static int tapopen(struct cdev *dev, int flag, int mode, struct thread *td) { struct tap_softc *tp = NULL; struct ifnet *ifp = NULL; int error; if (tapuopen == 0) { error = priv_check(td, PRIV_NET_TAP); if (error) return (error); } if ((dev2unit(dev) & CLONE_UNITMASK) > TAPMAXUNIT) return (ENXIO); tp = dev->si_drv1; mtx_lock(&tp->tap_mtx); if (tp->tap_flags & TAP_OPEN) { mtx_unlock(&tp->tap_mtx); return (EBUSY); } bcopy(IF_LLADDR(tp->tap_ifp), tp->ether_addr, sizeof(tp->ether_addr)); tp->tap_pid = td->td_proc->p_pid; tp->tap_flags |= TAP_OPEN; ifp = tp->tap_ifp; ifp->if_drv_flags |= IFF_DRV_RUNNING; ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; if (tapuponopen) ifp->if_flags |= IFF_UP; if_link_state_change(ifp, LINK_STATE_UP); mtx_unlock(&tp->tap_mtx); TAPDEBUG("%s is open. 
minor = %#x\n", ifp->if_xname, dev2unit(dev)); return (0); } /* tapopen */ /* * tapclose * * close the device - mark i/f down & delete routing info */ static int tapclose(struct cdev *dev, int foo, int bar, struct thread *td) { struct ifaddr *ifa; struct tap_softc *tp = dev->si_drv1; struct ifnet *ifp = tp->tap_ifp; /* junk all pending output */ mtx_lock(&tp->tap_mtx); CURVNET_SET(ifp->if_vnet); IF_DRAIN(&ifp->if_snd); /* * Do not bring the interface down, and do not anything with * interface, if we are in VMnet mode. Just close the device. */ if (((tp->tap_flags & TAP_VMNET) == 0) && (ifp->if_flags & (IFF_UP | IFF_LINK0)) == IFF_UP) { mtx_unlock(&tp->tap_mtx); if_down(ifp); mtx_lock(&tp->tap_mtx); if (ifp->if_drv_flags & IFF_DRV_RUNNING) { ifp->if_drv_flags &= ~IFF_DRV_RUNNING; mtx_unlock(&tp->tap_mtx); CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { rtinit(ifa, (int)RTM_DELETE, 0); } if_purgeaddrs(ifp); mtx_lock(&tp->tap_mtx); } } if_link_state_change(ifp, LINK_STATE_DOWN); CURVNET_RESTORE(); funsetown(&tp->tap_sigio); selwakeuppri(&tp->tap_rsel, PZERO+1); KNOTE_LOCKED(&tp->tap_rsel.si_note, 0); tp->tap_flags &= ~TAP_OPEN; tp->tap_pid = 0; mtx_unlock(&tp->tap_mtx); TAPDEBUG("%s is closed. minor = %#x\n", ifp->if_xname, dev2unit(dev)); return (0); } /* tapclose */ /* * tapifinit * * network interface initialization function */ static void tapifinit(void *xtp) { struct tap_softc *tp = (struct tap_softc *)xtp; struct ifnet *ifp = tp->tap_ifp; TAPDEBUG("initializing %s\n", ifp->if_xname); mtx_lock(&tp->tap_mtx); ifp->if_drv_flags |= IFF_DRV_RUNNING; ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; mtx_unlock(&tp->tap_mtx); /* attempt to start output */ tapifstart(ifp); } /* tapifinit */ /* * tapifioctl * * Process an ioctl request on network interface */ static int tapifioctl(struct ifnet *ifp, u_long cmd, caddr_t data) { - struct tap_softc *tp = ifp->if_softc; + struct tap_softc *tp; struct ifreq *ifr = (struct ifreq *)data; struct ifstat *ifs = NULL; struct ifmediareq *ifmr = NULL; int dummy, error = 0; + sx_xlock(&tap_ioctl_sx); + tp = ifp->if_softc; + if (tp == NULL) { + error = ENXIO; + goto bad; + } switch (cmd) { case SIOCSIFFLAGS: /* XXX -- just like vmnet does */ case SIOCADDMULTI: case SIOCDELMULTI: break; case SIOCGIFMEDIA: ifmr = (struct ifmediareq *)data; dummy = ifmr->ifm_count; ifmr->ifm_count = 1; ifmr->ifm_status = IFM_AVALID; ifmr->ifm_active = IFM_ETHER; if (tp->tap_flags & TAP_OPEN) ifmr->ifm_status |= IFM_ACTIVE; ifmr->ifm_current = ifmr->ifm_active; if (dummy >= 1) { int media = IFM_ETHER; error = copyout(&media, ifmr->ifm_ulist, sizeof(int)); } break; case SIOCSIFMTU: ifp->if_mtu = ifr->ifr_mtu; break; case SIOCGIFSTATUS: ifs = (struct ifstat *)data; mtx_lock(&tp->tap_mtx); if (tp->tap_pid != 0) snprintf(ifs->ascii, sizeof(ifs->ascii), "\tOpened by PID %d\n", tp->tap_pid); else ifs->ascii[0] = '\0'; mtx_unlock(&tp->tap_mtx); break; default: error = ether_ioctl(ifp, cmd, data); break; } +bad: + sx_xunlock(&tap_ioctl_sx); return (error); } /* tapifioctl */ /* * tapifstart * * queue packets from higher level ready to put out */ static void tapifstart(struct ifnet *ifp) { struct tap_softc *tp = ifp->if_softc; TAPDEBUG("%s starting\n", ifp->if_xname); /* * do not junk pending output if we are in VMnet mode. * XXX: can this do any harm because of queue overflow? */ mtx_lock(&tp->tap_mtx); if (((tp->tap_flags & TAP_VMNET) == 0) && ((tp->tap_flags & TAP_READY) != TAP_READY)) { struct mbuf *m; /* Unlocked read. 
*/ TAPDEBUG("%s not ready, tap_flags = 0x%x\n", ifp->if_xname, tp->tap_flags); for (;;) { IF_DEQUEUE(&ifp->if_snd, m); if (m != NULL) { m_freem(m); if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); } else break; } mtx_unlock(&tp->tap_mtx); return; } ifp->if_drv_flags |= IFF_DRV_OACTIVE; if (!IFQ_IS_EMPTY(&ifp->if_snd)) { if (tp->tap_flags & TAP_RWAIT) { tp->tap_flags &= ~TAP_RWAIT; wakeup(tp); } if ((tp->tap_flags & TAP_ASYNC) && (tp->tap_sigio != NULL)) { mtx_unlock(&tp->tap_mtx); pgsigio(&tp->tap_sigio, SIGIO, 0); mtx_lock(&tp->tap_mtx); } selwakeuppri(&tp->tap_rsel, PZERO+1); KNOTE_LOCKED(&tp->tap_rsel.si_note, 0); if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1); /* obytes are counted in ether_output */ } ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; mtx_unlock(&tp->tap_mtx); } /* tapifstart */ /* * tapioctl * * the cdevsw interface is now pretty minimal */ static int tapioctl(struct cdev *dev, u_long cmd, caddr_t data, int flag, struct thread *td) { struct ifreq ifr; struct tap_softc *tp = dev->si_drv1; struct ifnet *ifp = tp->tap_ifp; struct tapinfo *tapp = NULL; int f; int error; #if defined(COMPAT_FREEBSD6) || defined(COMPAT_FREEBSD5) || \ defined(COMPAT_FREEBSD4) int ival; #endif switch (cmd) { case TAPSIFINFO: tapp = (struct tapinfo *)data; if (ifp->if_type != tapp->type) return (EPROTOTYPE); mtx_lock(&tp->tap_mtx); if (ifp->if_mtu != tapp->mtu) { strlcpy(ifr.ifr_name, if_name(ifp), IFNAMSIZ); ifr.ifr_mtu = tapp->mtu; CURVNET_SET(ifp->if_vnet); error = ifhwioctl(SIOCSIFMTU, ifp, (caddr_t)&ifr, td); CURVNET_RESTORE(); if (error) { mtx_unlock(&tp->tap_mtx); return (error); } } ifp->if_baudrate = tapp->baudrate; mtx_unlock(&tp->tap_mtx); break; case TAPGIFINFO: tapp = (struct tapinfo *)data; mtx_lock(&tp->tap_mtx); tapp->mtu = ifp->if_mtu; tapp->type = ifp->if_type; tapp->baudrate = ifp->if_baudrate; mtx_unlock(&tp->tap_mtx); break; case TAPSDEBUG: tapdebug = *(int *)data; break; case TAPGDEBUG: *(int *)data = tapdebug; break; case TAPGIFNAME: { struct ifreq *ifr = (struct ifreq *) data; strlcpy(ifr->ifr_name, ifp->if_xname, IFNAMSIZ); } break; case FIONBIO: break; case FIOASYNC: mtx_lock(&tp->tap_mtx); if (*(int *)data) tp->tap_flags |= TAP_ASYNC; else tp->tap_flags &= ~TAP_ASYNC; mtx_unlock(&tp->tap_mtx); break; case FIONREAD: if (!IFQ_IS_EMPTY(&ifp->if_snd)) { struct mbuf *mb; IFQ_LOCK(&ifp->if_snd); IFQ_POLL_NOLOCK(&ifp->if_snd, mb); for (*(int *)data = 0; mb != NULL; mb = mb->m_next) *(int *)data += mb->m_len; IFQ_UNLOCK(&ifp->if_snd); } else *(int *)data = 0; break; case FIOSETOWN: return (fsetown(*(int *)data, &tp->tap_sigio)); case FIOGETOWN: *(int *)data = fgetown(&tp->tap_sigio); return (0); /* this is deprecated, FIOSETOWN should be used instead */ case TIOCSPGRP: return (fsetown(-(*(int *)data), &tp->tap_sigio)); /* this is deprecated, FIOGETOWN should be used instead */ case TIOCGPGRP: *(int *)data = -fgetown(&tp->tap_sigio); return (0); /* VMware/VMnet port ioctl's */ #if defined(COMPAT_FREEBSD6) || defined(COMPAT_FREEBSD5) || \ defined(COMPAT_FREEBSD4) case _IO('V', 0): ival = IOCPARM_IVAL(data); data = (caddr_t)&ival; /* FALLTHROUGH */ #endif case VMIO_SIOCSIFFLAGS: /* VMware/VMnet SIOCSIFFLAGS */ f = *(int *)data; f &= 0x0fff; f &= ~IFF_CANTCHANGE; f |= IFF_UP; mtx_lock(&tp->tap_mtx); ifp->if_flags = f | (ifp->if_flags & IFF_CANTCHANGE); mtx_unlock(&tp->tap_mtx); break; case SIOCGIFADDR: /* get MAC address of the remote side */ mtx_lock(&tp->tap_mtx); bcopy(tp->ether_addr, data, sizeof(tp->ether_addr)); mtx_unlock(&tp->tap_mtx); break; case SIOCSIFADDR: /* set MAC address of the remote 
side */ mtx_lock(&tp->tap_mtx); bcopy(data, tp->ether_addr, sizeof(tp->ether_addr)); mtx_unlock(&tp->tap_mtx); break; default: return (ENOTTY); } return (0); } /* tapioctl */ /* * tapread * * the cdevsw read interface - reads a packet at a time, or at * least as much of a packet as can be read */ static int tapread(struct cdev *dev, struct uio *uio, int flag) { struct tap_softc *tp = dev->si_drv1; struct ifnet *ifp = tp->tap_ifp; struct mbuf *m = NULL; int error = 0, len; TAPDEBUG("%s reading, minor = %#x\n", ifp->if_xname, dev2unit(dev)); mtx_lock(&tp->tap_mtx); if ((tp->tap_flags & TAP_READY) != TAP_READY) { mtx_unlock(&tp->tap_mtx); /* Unlocked read. */ TAPDEBUG("%s not ready. minor = %#x, tap_flags = 0x%x\n", ifp->if_xname, dev2unit(dev), tp->tap_flags); return (EHOSTDOWN); } tp->tap_flags &= ~TAP_RWAIT; /* sleep until we get a packet */ do { IF_DEQUEUE(&ifp->if_snd, m); if (m == NULL) { if (flag & O_NONBLOCK) { mtx_unlock(&tp->tap_mtx); return (EWOULDBLOCK); } tp->tap_flags |= TAP_RWAIT; error = mtx_sleep(tp, &tp->tap_mtx, PCATCH | (PZERO + 1), "taprd", 0); if (error) { mtx_unlock(&tp->tap_mtx); return (error); } } } while (m == NULL); mtx_unlock(&tp->tap_mtx); /* feed packet to bpf */ BPF_MTAP(ifp, m); /* xfer packet to user space */ while ((m != NULL) && (uio->uio_resid > 0) && (error == 0)) { len = min(uio->uio_resid, m->m_len); if (len == 0) break; error = uiomove(mtod(m, void *), len, uio); m = m_free(m); } if (m != NULL) { TAPDEBUG("%s dropping mbuf, minor = %#x\n", ifp->if_xname, dev2unit(dev)); m_freem(m); } return (error); } /* tapread */ /* * tapwrite * * the cdevsw write interface - an atomic write is a packet - or else! */ static int tapwrite(struct cdev *dev, struct uio *uio, int flag) { struct ether_header *eh; struct tap_softc *tp = dev->si_drv1; struct ifnet *ifp = tp->tap_ifp; struct mbuf *m; TAPDEBUG("%s writing, minor = %#x\n", ifp->if_xname, dev2unit(dev)); if (uio->uio_resid == 0) return (0); if ((uio->uio_resid < 0) || (uio->uio_resid > TAPMRU)) { TAPDEBUG("%s invalid packet len = %zd, minor = %#x\n", ifp->if_xname, uio->uio_resid, dev2unit(dev)); return (EIO); } if ((m = m_uiotombuf(uio, M_NOWAIT, 0, ETHER_ALIGN, M_PKTHDR)) == NULL) { if_inc_counter(ifp, IFCOUNTER_IERRORS, 1); return (ENOBUFS); } m->m_pkthdr.rcvif = ifp; /* * Only pass a unicast frame to ether_input(), if it would actually * have been received by non-virtual hardware. */ if (m->m_len < sizeof(struct ether_header)) { m_freem(m); return (0); } eh = mtod(m, struct ether_header *); if (eh && (ifp->if_flags & IFF_PROMISC) == 0 && !ETHER_IS_MULTICAST(eh->ether_dhost) && bcmp(eh->ether_dhost, IF_LLADDR(ifp), ETHER_ADDR_LEN) != 0) { m_freem(m); return (0); } /* Pass packet up to parent. */ CURVNET_SET(ifp->if_vnet); (*ifp->if_input)(ifp, m); CURVNET_RESTORE(); if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1); /* ibytes are counted in parent */ return (0); } /* tapwrite */ /* * tappoll * * the poll interface, this is only useful on reads * really. the write detect always returns true, write never blocks * anyway, it either accepts the packet or drops it */ static int tappoll(struct cdev *dev, int events, struct thread *td) { struct tap_softc *tp = dev->si_drv1; struct ifnet *ifp = tp->tap_ifp; int revents = 0; TAPDEBUG("%s polling, minor = %#x\n", ifp->if_xname, dev2unit(dev)); if (events & (POLLIN | POLLRDNORM)) { IFQ_LOCK(&ifp->if_snd); if (!IFQ_IS_EMPTY(&ifp->if_snd)) { TAPDEBUG("%s have data in queue. 
len = %d, " \ "minor = %#x\n", ifp->if_xname, ifp->if_snd.ifq_len, dev2unit(dev)); revents |= (events & (POLLIN | POLLRDNORM)); } else { TAPDEBUG("%s waiting for data, minor = %#x\n", ifp->if_xname, dev2unit(dev)); selrecord(td, &tp->tap_rsel); } IFQ_UNLOCK(&ifp->if_snd); } if (events & (POLLOUT | POLLWRNORM)) revents |= (events & (POLLOUT | POLLWRNORM)); return (revents); } /* tappoll */ /* * tap_kqfilter * * support for kevent() system call */ static int tapkqfilter(struct cdev *dev, struct knote *kn) { struct tap_softc *tp = dev->si_drv1; struct ifnet *ifp = tp->tap_ifp; switch (kn->kn_filter) { case EVFILT_READ: TAPDEBUG("%s kqfilter: EVFILT_READ, minor = %#x\n", ifp->if_xname, dev2unit(dev)); kn->kn_fop = &tap_read_filterops; break; case EVFILT_WRITE: TAPDEBUG("%s kqfilter: EVFILT_WRITE, minor = %#x\n", ifp->if_xname, dev2unit(dev)); kn->kn_fop = &tap_write_filterops; break; default: TAPDEBUG("%s kqfilter: invalid filter, minor = %#x\n", ifp->if_xname, dev2unit(dev)); return (EINVAL); /* NOT REACHED */ } kn->kn_hook = tp; knlist_add(&tp->tap_rsel.si_note, kn, 0); return (0); } /* tapkqfilter */ /* * tap_kqread * * Return true if there is data in the interface queue */ static int tapkqread(struct knote *kn, long hint) { int ret; struct tap_softc *tp = kn->kn_hook; struct cdev *dev = tp->tap_dev; struct ifnet *ifp = tp->tap_ifp; if ((kn->kn_data = ifp->if_snd.ifq_len) > 0) { TAPDEBUG("%s have data in queue. len = %d, minor = %#x\n", ifp->if_xname, ifp->if_snd.ifq_len, dev2unit(dev)); ret = 1; } else { TAPDEBUG("%s waiting for data, minor = %#x\n", ifp->if_xname, dev2unit(dev)); ret = 0; } return (ret); } /* tapkqread */ /* * tap_kqwrite * * Always can write. Return the MTU in kn->data */ static int tapkqwrite(struct knote *kn, long hint) { struct tap_softc *tp = kn->kn_hook; struct ifnet *ifp = tp->tap_ifp; kn->kn_data = ifp->if_mtu; return (1); } /* tapkqwrite */ static void tapkqdetach(struct knote *kn) { struct tap_softc *tp = kn->kn_hook; knlist_remove(&tp->tap_rsel.si_note, kn, 0); } /* tapkqdetach */ Index: user/ngie/bug-237403/sys/net/if_tun.c =================================================================== --- user/ngie/bug-237403/sys/net/if_tun.c (revision 346925) +++ user/ngie/bug-237403/sys/net/if_tun.c (revision 346926) @@ -1,1096 +1,1112 @@ /* $NetBSD: if_tun.c,v 1.14 1994/06/29 06:36:25 cgd Exp $ */ /*- * Copyright (c) 1988, Julian Onions * Nottingham University 1987. * * This source may be freely distributed, however I would be interested * in any changes that are made. * * This driver takes packets off the IP i/f and hands them up to a * user process to have its wicked way with. This driver has it's * roots in a similar driver written by Phil Cockcroft (formerly) at * UCL. This driver is based much more on read/write/poll mode of * operation though. * * $FreeBSD$ */ #include "opt_inet.h" #include "opt_inet6.h" #include +#include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef INET #include #endif #include #include #include #include #include /* * tun_list is protected by global tunmtx. Other mutable fields are * protected by tun->tun_mtx, or by their owning subsystem. tun_dev is * static for the duration of a tunnel interface. 
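In addition to tunmtx and tun_mtx, this revision introduces tun_ioctl_sx, an sx lock held across tunifioctl() and across the point in tun_destroy() where if_softc is cleared, so an interface ioctl already in flight cannot dereference a softc that is being torn down; once if_softc is NULL the ioctl fails with ENXIO. A new TUN_DYING flag marks a softc whose destruction has begun, and tunopen() refuses such a device with EBUSY.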
*/ struct tun_softc { TAILQ_ENTRY(tun_softc) tun_list; struct cdev *tun_dev; u_short tun_flags; /* misc flags */ #define TUN_OPEN 0x0001 #define TUN_INITED 0x0002 #define TUN_RCOLL 0x0004 #define TUN_IASET 0x0008 #define TUN_DSTADDR 0x0010 #define TUN_LMODE 0x0020 #define TUN_RWAIT 0x0040 #define TUN_ASYNC 0x0080 #define TUN_IFHEAD 0x0100 +#define TUN_DYING 0x0200 #define TUN_READY (TUN_OPEN | TUN_INITED) - /* - * XXXRW: tun_pid is used to exclusively lock /dev/tun. Is this - * actually needed? Can we just return EBUSY if already open? - * Problem is that this involved inherent races when a tun device - * is handed off from one process to another, as opposed to just - * being slightly stale informationally. - */ pid_t tun_pid; /* owning pid */ struct ifnet *tun_ifp; /* the interface */ struct sigio *tun_sigio; /* information for async I/O */ struct selinfo tun_rsel; /* read select */ struct mtx tun_mtx; /* protect mutable softc fields */ struct cv tun_cv; /* protect against ref'd dev destroy */ }; #define TUN2IFP(sc) ((sc)->tun_ifp) #define TUNDEBUG if (tundebug) if_printf /* * All mutable global variables in if_tun are locked using tunmtx, with * the exception of tundebug, which is used unlocked, and tunclones, * which is static after setup. */ static struct mtx tunmtx; static eventhandler_tag tag; static const char tunname[] = "tun"; static MALLOC_DEFINE(M_TUN, tunname, "Tunnel Interface"); static int tundebug = 0; static int tundclone = 1; static struct clonedevs *tunclones; static TAILQ_HEAD(,tun_softc) tunhead = TAILQ_HEAD_INITIALIZER(tunhead); SYSCTL_INT(_debug, OID_AUTO, if_tun_debug, CTLFLAG_RW, &tundebug, 0, ""); +static struct sx tun_ioctl_sx; +SX_SYSINIT(tun_ioctl_sx, &tun_ioctl_sx, "tun_ioctl"); + SYSCTL_DECL(_net_link); static SYSCTL_NODE(_net_link, OID_AUTO, tun, CTLFLAG_RW, 0, "IP tunnel software network interface."); SYSCTL_INT(_net_link_tun, OID_AUTO, devfs_cloning, CTLFLAG_RWTUN, &tundclone, 0, "Enable legacy devfs interface creation."); static void tunclone(void *arg, struct ucred *cred, char *name, int namelen, struct cdev **dev); static void tuncreate(const char *name, struct cdev *dev); static int tunifioctl(struct ifnet *, u_long, caddr_t); static void tuninit(struct ifnet *); static int tunmodevent(module_t, int, void *); static int tunoutput(struct ifnet *, struct mbuf *, const struct sockaddr *, struct route *ro); static void tunstart(struct ifnet *); static int tun_clone_match(struct if_clone *ifc, const char *name); static int tun_clone_create(struct if_clone *, char *, size_t, caddr_t); static int tun_clone_destroy(struct if_clone *, struct ifnet *); static struct unrhdr *tun_unrhdr; VNET_DEFINE_STATIC(struct if_clone *, tun_cloner); #define V_tun_cloner VNET(tun_cloner) static d_open_t tunopen; static d_close_t tunclose; static d_read_t tunread; static d_write_t tunwrite; static d_ioctl_t tunioctl; static d_poll_t tunpoll; static d_kqfilter_t tunkqfilter; static int tunkqread(struct knote *, long); static int tunkqwrite(struct knote *, long); static void tunkqdetach(struct knote *); static struct filterops tun_read_filterops = { .f_isfd = 1, .f_attach = NULL, .f_detach = tunkqdetach, .f_event = tunkqread, }; static struct filterops tun_write_filterops = { .f_isfd = 1, .f_attach = NULL, .f_detach = tunkqdetach, .f_event = tunkqwrite, }; static struct cdevsw tun_cdevsw = { .d_version = D_VERSION, .d_flags = D_NEEDMINOR, .d_open = tunopen, .d_close = tunclose, .d_read = tunread, .d_write = tunwrite, .d_ioctl = tunioctl, .d_poll = tunpoll, .d_kqfilter = 
tunkqfilter, .d_name = tunname, }; static int tun_clone_match(struct if_clone *ifc, const char *name) { if (strncmp(tunname, name, 3) == 0 && (name[3] == '\0' || isdigit(name[3]))) return (1); return (0); } static int tun_clone_create(struct if_clone *ifc, char *name, size_t len, caddr_t params) { struct cdev *dev; int err, unit, i; err = ifc_name2unit(name, &unit); if (err != 0) return (err); if (unit != -1) { /* If this unit number is still available that/s okay. */ if (alloc_unr_specific(tun_unrhdr, unit) == -1) return (EEXIST); } else { unit = alloc_unr(tun_unrhdr); } snprintf(name, IFNAMSIZ, "%s%d", tunname, unit); /* find any existing device, or allocate new unit number */ i = clone_create(&tunclones, &tun_cdevsw, &unit, &dev, 0); if (i) { /* No preexisting struct cdev *, create one */ dev = make_dev(&tun_cdevsw, unit, UID_UUCP, GID_DIALER, 0600, "%s%d", tunname, unit); } tuncreate(tunname, dev); return (0); } static void tunclone(void *arg, struct ucred *cred, char *name, int namelen, struct cdev **dev) { char devname[SPECNAMELEN + 1]; int u, i, append_unit; if (*dev != NULL) return; /* * If tun cloning is enabled, only the superuser can create an * interface. */ if (!tundclone || priv_check_cred(cred, PRIV_NET_IFCREATE) != 0) return; if (strcmp(name, tunname) == 0) { u = -1; } else if (dev_stdclone(name, NULL, tunname, &u) != 1) return; /* Don't recognise the name */ if (u != -1 && u > IF_MAXUNIT) return; /* Unit number too high */ if (u == -1) append_unit = 1; else append_unit = 0; CURVNET_SET(CRED_TO_VNET(cred)); /* find any existing device, or allocate new unit number */ i = clone_create(&tunclones, &tun_cdevsw, &u, dev, 0); if (i) { if (append_unit) { namelen = snprintf(devname, sizeof(devname), "%s%d", name, u); name = devname; } /* No preexisting struct cdev *, create one */ *dev = make_dev_credf(MAKEDEV_REF, &tun_cdevsw, u, cred, UID_UUCP, GID_DIALER, 0600, "%s", name); } if_clone_create(name, namelen, NULL); CURVNET_RESTORE(); } static void tun_destroy(struct tun_softc *tp) { struct cdev *dev; mtx_lock(&tp->tun_mtx); + tp->tun_flags |= TUN_DYING; if ((tp->tun_flags & TUN_OPEN) != 0) cv_wait_unlock(&tp->tun_cv, &tp->tun_mtx); else mtx_unlock(&tp->tun_mtx); CURVNET_SET(TUN2IFP(tp)->if_vnet); + sx_xlock(&tun_ioctl_sx); + TUN2IFP(tp)->if_softc = NULL; + sx_xunlock(&tun_ioctl_sx); + dev = tp->tun_dev; bpfdetach(TUN2IFP(tp)); if_detach(TUN2IFP(tp)); free_unr(tun_unrhdr, TUN2IFP(tp)->if_dunit); if_free(TUN2IFP(tp)); destroy_dev(dev); seldrain(&tp->tun_rsel); knlist_clear(&tp->tun_rsel.si_note, 0); knlist_destroy(&tp->tun_rsel.si_note); mtx_destroy(&tp->tun_mtx); cv_destroy(&tp->tun_cv); free(tp, M_TUN); CURVNET_RESTORE(); } static int tun_clone_destroy(struct if_clone *ifc, struct ifnet *ifp) { struct tun_softc *tp = ifp->if_softc; mtx_lock(&tunmtx); TAILQ_REMOVE(&tunhead, tp, tun_list); mtx_unlock(&tunmtx); tun_destroy(tp); return (0); } static void vnet_tun_init(const void *unused __unused) { V_tun_cloner = if_clone_advanced(tunname, 0, tun_clone_match, tun_clone_create, tun_clone_destroy); } VNET_SYSINIT(vnet_tun_init, SI_SUB_PROTO_IF, SI_ORDER_ANY, vnet_tun_init, NULL); static void vnet_tun_uninit(const void *unused __unused) { if_clone_detach(V_tun_cloner); } VNET_SYSUNINIT(vnet_tun_uninit, SI_SUB_PROTO_IF, SI_ORDER_ANY, vnet_tun_uninit, NULL); static void tun_uninit(const void *unused __unused) { struct tun_softc *tp; EVENTHANDLER_DEREGISTER(dev_clone, tag); drain_dev_clone_events(); mtx_lock(&tunmtx); while ((tp = TAILQ_FIRST(&tunhead)) != NULL) { TAILQ_REMOVE(&tunhead, tp, 
tun_list); mtx_unlock(&tunmtx); tun_destroy(tp); mtx_lock(&tunmtx); } mtx_unlock(&tunmtx); delete_unrhdr(tun_unrhdr); clone_cleanup(&tunclones); mtx_destroy(&tunmtx); } SYSUNINIT(tun_uninit, SI_SUB_PROTO_IF, SI_ORDER_ANY, tun_uninit, NULL); static int tunmodevent(module_t mod, int type, void *data) { switch (type) { case MOD_LOAD: mtx_init(&tunmtx, "tunmtx", NULL, MTX_DEF); clone_setup(&tunclones); tun_unrhdr = new_unrhdr(0, IF_MAXUNIT, &tunmtx); tag = EVENTHANDLER_REGISTER(dev_clone, tunclone, 0, 1000); if (tag == NULL) return (ENOMEM); break; case MOD_UNLOAD: /* See tun_uninit, so it's done after the vnet_sysuninit() */ break; default: return EOPNOTSUPP; } return 0; } static moduledata_t tun_mod = { "if_tun", tunmodevent, 0 }; DECLARE_MODULE(if_tun, tun_mod, SI_SUB_PSEUDO, SI_ORDER_ANY); MODULE_VERSION(if_tun, 1); static void tunstart(struct ifnet *ifp) { struct tun_softc *tp = ifp->if_softc; struct mbuf *m; TUNDEBUG(ifp,"%s starting\n", ifp->if_xname); if (ALTQ_IS_ENABLED(&ifp->if_snd)) { IFQ_LOCK(&ifp->if_snd); IFQ_POLL_NOLOCK(&ifp->if_snd, m); if (m == NULL) { IFQ_UNLOCK(&ifp->if_snd); return; } IFQ_UNLOCK(&ifp->if_snd); } mtx_lock(&tp->tun_mtx); if (tp->tun_flags & TUN_RWAIT) { tp->tun_flags &= ~TUN_RWAIT; wakeup(tp); } selwakeuppri(&tp->tun_rsel, PZERO + 1); KNOTE_LOCKED(&tp->tun_rsel.si_note, 0); if (tp->tun_flags & TUN_ASYNC && tp->tun_sigio) { mtx_unlock(&tp->tun_mtx); pgsigio(&tp->tun_sigio, SIGIO, 0); } else mtx_unlock(&tp->tun_mtx); } /* XXX: should return an error code so it can fail. */ static void tuncreate(const char *name, struct cdev *dev) { struct tun_softc *sc; struct ifnet *ifp; sc = malloc(sizeof(*sc), M_TUN, M_WAITOK | M_ZERO); mtx_init(&sc->tun_mtx, "tun_mtx", NULL, MTX_DEF); cv_init(&sc->tun_cv, "tun_condvar"); sc->tun_flags = TUN_INITED; sc->tun_dev = dev; mtx_lock(&tunmtx); TAILQ_INSERT_TAIL(&tunhead, sc, tun_list); mtx_unlock(&tunmtx); ifp = sc->tun_ifp = if_alloc(IFT_PPP); if (ifp == NULL) panic("%s%d: failed to if_alloc() interface.\n", name, dev2unit(dev)); if_initname(ifp, name, dev2unit(dev)); ifp->if_mtu = TUNMTU; ifp->if_ioctl = tunifioctl; ifp->if_output = tunoutput; ifp->if_start = tunstart; ifp->if_flags = IFF_POINTOPOINT | IFF_MULTICAST; ifp->if_softc = sc; IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); ifp->if_snd.ifq_drv_maxlen = 0; IFQ_SET_READY(&ifp->if_snd); knlist_init_mtx(&sc->tun_rsel.si_note, &sc->tun_mtx); ifp->if_capabilities |= IFCAP_LINKSTATE; ifp->if_capenable |= IFCAP_LINKSTATE; if_attach(ifp); bpfattach(ifp, DLT_NULL, sizeof(u_int32_t)); dev->si_drv1 = sc; TUNDEBUG(ifp, "interface %s is created, minor = %#x\n", ifp->if_xname, dev2unit(dev)); } static int tunopen(struct cdev *dev, int flag, int mode, struct thread *td) { struct ifnet *ifp; struct tun_softc *tp; /* * XXXRW: Non-atomic test and set of dev->si_drv1 requires * synchronization. */ tp = dev->si_drv1; if (!tp) { tuncreate(tunname, dev); tp = dev->si_drv1; } - /* - * XXXRW: This use of tun_pid is subject to error due to the - * fact that a reference to the tunnel can live beyond the - * death of the process that created it. Can we replace this - * with a simple busy flag? 
- */ mtx_lock(&tp->tun_mtx); - if (tp->tun_pid != 0 && tp->tun_pid != td->td_proc->p_pid) { + if ((tp->tun_flags & (TUN_OPEN | TUN_DYING)) != 0) { mtx_unlock(&tp->tun_mtx); return (EBUSY); } - tp->tun_pid = td->td_proc->p_pid; + tp->tun_pid = td->td_proc->p_pid; tp->tun_flags |= TUN_OPEN; ifp = TUN2IFP(tp); if_link_state_change(ifp, LINK_STATE_UP); TUNDEBUG(ifp, "open\n"); mtx_unlock(&tp->tun_mtx); return (0); } /* * tunclose - close the device - mark i/f down & delete * routing info */ static int tunclose(struct cdev *dev, int foo, int bar, struct thread *td) { struct tun_softc *tp; struct ifnet *ifp; tp = dev->si_drv1; ifp = TUN2IFP(tp); mtx_lock(&tp->tun_mtx); + /* + * Simply close the device if this isn't the controlling process. This + * may happen if, for instance, the tunnel has been handed off to + * another process. The original controller should be able to close it + * without putting us into an inconsistent state. + */ + if (td->td_proc->p_pid != tp->tun_pid) { + mtx_unlock(&tp->tun_mtx); + return (0); + } /* * junk all pending output */ CURVNET_SET(ifp->if_vnet); IFQ_PURGE(&ifp->if_snd); if (ifp->if_flags & IFF_UP) { mtx_unlock(&tp->tun_mtx); if_down(ifp); mtx_lock(&tp->tun_mtx); } /* Delete all addresses and routes which reference this interface. */ if (ifp->if_drv_flags & IFF_DRV_RUNNING) { struct ifaddr *ifa; ifp->if_drv_flags &= ~IFF_DRV_RUNNING; mtx_unlock(&tp->tun_mtx); CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { /* deal w/IPv4 PtP destination; unlocked read */ if (ifa->ifa_addr->sa_family == AF_INET) { rtinit(ifa, (int)RTM_DELETE, tp->tun_flags & TUN_DSTADDR ? RTF_HOST : 0); } else { rtinit(ifa, (int)RTM_DELETE, 0); } } if_purgeaddrs(ifp); mtx_lock(&tp->tun_mtx); } if_link_state_change(ifp, LINK_STATE_DOWN); CURVNET_RESTORE(); funsetown(&tp->tun_sigio); selwakeuppri(&tp->tun_rsel, PZERO + 1); KNOTE_LOCKED(&tp->tun_rsel.si_note, 0); TUNDEBUG (ifp, "closed\n"); tp->tun_flags &= ~TUN_OPEN; tp->tun_pid = 0; cv_broadcast(&tp->tun_cv); mtx_unlock(&tp->tun_mtx); return (0); } static void tuninit(struct ifnet *ifp) { struct tun_softc *tp = ifp->if_softc; #ifdef INET struct ifaddr *ifa; #endif TUNDEBUG(ifp, "tuninit\n"); mtx_lock(&tp->tun_mtx); ifp->if_flags |= IFF_UP; ifp->if_drv_flags |= IFF_DRV_RUNNING; getmicrotime(&ifp->if_lastchange); #ifdef INET if_addr_rlock(ifp); CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { if (ifa->ifa_addr->sa_family == AF_INET) { struct sockaddr_in *si; si = (struct sockaddr_in *)ifa->ifa_addr; if (si->sin_addr.s_addr) tp->tun_flags |= TUN_IASET; si = (struct sockaddr_in *)ifa->ifa_dstaddr; if (si && si->sin_addr.s_addr) tp->tun_flags |= TUN_DSTADDR; } } if_addr_runlock(ifp); #endif mtx_unlock(&tp->tun_mtx); } /* * Process an ioctl request. 
*/ static int tunifioctl(struct ifnet *ifp, u_long cmd, caddr_t data) { struct ifreq *ifr = (struct ifreq *)data; - struct tun_softc *tp = ifp->if_softc; + struct tun_softc *tp; struct ifstat *ifs; int error = 0; + sx_xlock(&tun_ioctl_sx); + tp = ifp->if_softc; + if (tp == NULL) { + error = ENXIO; + goto bad; + } switch(cmd) { case SIOCGIFSTATUS: ifs = (struct ifstat *)data; mtx_lock(&tp->tun_mtx); if (tp->tun_pid) snprintf(ifs->ascii, sizeof(ifs->ascii), "\tOpened by PID %d\n", tp->tun_pid); else ifs->ascii[0] = '\0'; mtx_unlock(&tp->tun_mtx); break; case SIOCSIFADDR: tuninit(ifp); TUNDEBUG(ifp, "address set\n"); break; case SIOCSIFMTU: ifp->if_mtu = ifr->ifr_mtu; TUNDEBUG(ifp, "mtu set\n"); break; case SIOCSIFFLAGS: case SIOCADDMULTI: case SIOCDELMULTI: break; default: error = EINVAL; } +bad: + sx_xunlock(&tun_ioctl_sx); return (error); } /* * tunoutput - queue packets from higher level ready to put out. */ static int tunoutput(struct ifnet *ifp, struct mbuf *m0, const struct sockaddr *dst, struct route *ro) { struct tun_softc *tp = ifp->if_softc; u_short cached_tun_flags; int error; u_int32_t af; TUNDEBUG (ifp, "tunoutput\n"); #ifdef MAC error = mac_ifnet_check_transmit(ifp, m0); if (error) { m_freem(m0); return (error); } #endif /* Could be unlocked read? */ mtx_lock(&tp->tun_mtx); cached_tun_flags = tp->tun_flags; mtx_unlock(&tp->tun_mtx); if ((cached_tun_flags & TUN_READY) != TUN_READY) { TUNDEBUG (ifp, "not ready 0%o\n", tp->tun_flags); m_freem (m0); return (EHOSTDOWN); } if ((ifp->if_flags & IFF_UP) != IFF_UP) { m_freem (m0); return (EHOSTDOWN); } /* BPF writes need to be handled specially. */ if (dst->sa_family == AF_UNSPEC) bcopy(dst->sa_data, &af, sizeof(af)); else af = dst->sa_family; if (bpf_peers_present(ifp->if_bpf)) bpf_mtap2(ifp->if_bpf, &af, sizeof(af), m0); /* prepend sockaddr? this may abort if the mbuf allocation fails */ if (cached_tun_flags & TUN_LMODE) { /* allocate space for sockaddr */ M_PREPEND(m0, dst->sa_len, M_NOWAIT); /* if allocation failed drop packet */ if (m0 == NULL) { if_inc_counter(ifp, IFCOUNTER_IQDROPS, 1); if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); return (ENOBUFS); } else { bcopy(dst, m0->m_data, dst->sa_len); } } if (cached_tun_flags & TUN_IFHEAD) { /* Prepend the address family */ M_PREPEND(m0, 4, M_NOWAIT); /* if allocation failed drop packet */ if (m0 == NULL) { if_inc_counter(ifp, IFCOUNTER_IQDROPS, 1); if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); return (ENOBUFS); } else *(u_int32_t *)m0->m_data = htonl(af); } else { #ifdef INET if (af != AF_INET) #endif { m_freem(m0); return (EAFNOSUPPORT); } } error = (ifp->if_transmit)(ifp, m0); if (error) return (ENOBUFS); if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1); return (0); } /* * the cdevsw interface is now pretty minimal. 
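These are the ioctls issued against the character device (dev->si_drv1 of /dev/tunN), as opposed to tunifioctl() above, which handles requests arriving through the network stack on the ifnet. TUNSIFINFO and TUNGIFINFO set and get the MTU, interface type and baudrate; TUNSLMODE and TUNSIFHEAD choose whether a sockaddr or a 4-byte address-family word is prepended to packets handed to the reader; TUNSIFPID records the caller as the controlling process in tun_pid.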
*/ static int tunioctl(struct cdev *dev, u_long cmd, caddr_t data, int flag, struct thread *td) { struct ifreq ifr; struct tun_softc *tp = dev->si_drv1; struct tuninfo *tunp; int error; switch (cmd) { case TUNSIFINFO: tunp = (struct tuninfo *)data; if (TUN2IFP(tp)->if_type != tunp->type) return (EPROTOTYPE); mtx_lock(&tp->tun_mtx); if (TUN2IFP(tp)->if_mtu != tunp->mtu) { strlcpy(ifr.ifr_name, if_name(TUN2IFP(tp)), IFNAMSIZ); ifr.ifr_mtu = tunp->mtu; CURVNET_SET(TUN2IFP(tp)->if_vnet); error = ifhwioctl(SIOCSIFMTU, TUN2IFP(tp), (caddr_t)&ifr, td); CURVNET_RESTORE(); if (error) { mtx_unlock(&tp->tun_mtx); return (error); } } TUN2IFP(tp)->if_baudrate = tunp->baudrate; mtx_unlock(&tp->tun_mtx); break; case TUNGIFINFO: tunp = (struct tuninfo *)data; mtx_lock(&tp->tun_mtx); tunp->mtu = TUN2IFP(tp)->if_mtu; tunp->type = TUN2IFP(tp)->if_type; tunp->baudrate = TUN2IFP(tp)->if_baudrate; mtx_unlock(&tp->tun_mtx); break; case TUNSDEBUG: tundebug = *(int *)data; break; case TUNGDEBUG: *(int *)data = tundebug; break; case TUNSLMODE: mtx_lock(&tp->tun_mtx); if (*(int *)data) { tp->tun_flags |= TUN_LMODE; tp->tun_flags &= ~TUN_IFHEAD; } else tp->tun_flags &= ~TUN_LMODE; mtx_unlock(&tp->tun_mtx); break; case TUNSIFHEAD: mtx_lock(&tp->tun_mtx); if (*(int *)data) { tp->tun_flags |= TUN_IFHEAD; tp->tun_flags &= ~TUN_LMODE; } else tp->tun_flags &= ~TUN_IFHEAD; mtx_unlock(&tp->tun_mtx); break; case TUNGIFHEAD: mtx_lock(&tp->tun_mtx); *(int *)data = (tp->tun_flags & TUN_IFHEAD) ? 1 : 0; mtx_unlock(&tp->tun_mtx); break; case TUNSIFMODE: /* deny this if UP */ if (TUN2IFP(tp)->if_flags & IFF_UP) return(EBUSY); switch (*(int *)data & ~IFF_MULTICAST) { case IFF_POINTOPOINT: case IFF_BROADCAST: mtx_lock(&tp->tun_mtx); TUN2IFP(tp)->if_flags &= ~(IFF_BROADCAST|IFF_POINTOPOINT|IFF_MULTICAST); TUN2IFP(tp)->if_flags |= *(int *)data; mtx_unlock(&tp->tun_mtx); break; default: return(EINVAL); } break; case TUNSIFPID: mtx_lock(&tp->tun_mtx); tp->tun_pid = curthread->td_proc->p_pid; mtx_unlock(&tp->tun_mtx); break; case FIONBIO: break; case FIOASYNC: mtx_lock(&tp->tun_mtx); if (*(int *)data) tp->tun_flags |= TUN_ASYNC; else tp->tun_flags &= ~TUN_ASYNC; mtx_unlock(&tp->tun_mtx); break; case FIONREAD: if (!IFQ_IS_EMPTY(&TUN2IFP(tp)->if_snd)) { struct mbuf *mb; IFQ_LOCK(&TUN2IFP(tp)->if_snd); IFQ_POLL_NOLOCK(&TUN2IFP(tp)->if_snd, mb); for (*(int *)data = 0; mb != NULL; mb = mb->m_next) *(int *)data += mb->m_len; IFQ_UNLOCK(&TUN2IFP(tp)->if_snd); } else *(int *)data = 0; break; case FIOSETOWN: return (fsetown(*(int *)data, &tp->tun_sigio)); case FIOGETOWN: *(int *)data = fgetown(&tp->tun_sigio); return (0); /* This is deprecated, FIOSETOWN should be used instead. */ case TIOCSPGRP: return (fsetown(-(*(int *)data), &tp->tun_sigio)); /* This is deprecated, FIOGETOWN should be used instead. */ case TIOCGPGRP: *(int *)data = -fgetown(&tp->tun_sigio); return (0); default: return (ENOTTY); } return (0); } /* * The cdevsw read interface - reads a packet at a time, or at * least as much of a packet as can be read. 
*/ static int tunread(struct cdev *dev, struct uio *uio, int flag) { struct tun_softc *tp = dev->si_drv1; struct ifnet *ifp = TUN2IFP(tp); struct mbuf *m; int error=0, len; TUNDEBUG (ifp, "read\n"); mtx_lock(&tp->tun_mtx); if ((tp->tun_flags & TUN_READY) != TUN_READY) { mtx_unlock(&tp->tun_mtx); TUNDEBUG (ifp, "not ready 0%o\n", tp->tun_flags); return (EHOSTDOWN); } tp->tun_flags &= ~TUN_RWAIT; do { IFQ_DEQUEUE(&ifp->if_snd, m); if (m == NULL) { if (flag & O_NONBLOCK) { mtx_unlock(&tp->tun_mtx); return (EWOULDBLOCK); } tp->tun_flags |= TUN_RWAIT; error = mtx_sleep(tp, &tp->tun_mtx, PCATCH | (PZERO + 1), "tunread", 0); if (error != 0) { mtx_unlock(&tp->tun_mtx); return (error); } } } while (m == NULL); mtx_unlock(&tp->tun_mtx); while (m && uio->uio_resid > 0 && error == 0) { len = min(uio->uio_resid, m->m_len); if (len != 0) error = uiomove(mtod(m, void *), len, uio); m = m_free(m); } if (m) { TUNDEBUG(ifp, "Dropping mbuf\n"); m_freem(m); } return (error); } /* * the cdevsw write interface - an atomic write is a packet - or else! */ static int tunwrite(struct cdev *dev, struct uio *uio, int flag) { struct tun_softc *tp = dev->si_drv1; struct ifnet *ifp = TUN2IFP(tp); struct mbuf *m; uint32_t family, mru; int isr; TUNDEBUG(ifp, "tunwrite\n"); if ((ifp->if_flags & IFF_UP) != IFF_UP) /* ignore silently */ return (0); if (uio->uio_resid == 0) return (0); mru = TUNMRU; if (tp->tun_flags & TUN_IFHEAD) mru += sizeof(family); if (uio->uio_resid < 0 || uio->uio_resid > mru) { TUNDEBUG(ifp, "len=%zd!\n", uio->uio_resid); return (EIO); } if ((m = m_uiotombuf(uio, M_NOWAIT, 0, 0, M_PKTHDR)) == NULL) { if_inc_counter(ifp, IFCOUNTER_IERRORS, 1); return (ENOBUFS); } m->m_pkthdr.rcvif = ifp; #ifdef MAC mac_ifnet_create_mbuf(ifp, m); #endif /* Could be unlocked read? */ mtx_lock(&tp->tun_mtx); if (tp->tun_flags & TUN_IFHEAD) { mtx_unlock(&tp->tun_mtx); if (m->m_len < sizeof(family) && (m = m_pullup(m, sizeof(family))) == NULL) return (ENOBUFS); family = ntohl(*mtod(m, u_int32_t *)); m_adj(m, sizeof(family)); } else { mtx_unlock(&tp->tun_mtx); family = AF_INET; } BPF_MTAP2(ifp, &family, sizeof(family), m); switch (family) { #ifdef INET case AF_INET: isr = NETISR_IP; break; #endif #ifdef INET6 case AF_INET6: isr = NETISR_IPV6; break; #endif default: m_freem(m); return (EAFNOSUPPORT); } random_harvest_queue(m, sizeof(*m), RANDOM_NET_TUN); if_inc_counter(ifp, IFCOUNTER_IBYTES, m->m_pkthdr.len); if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1); CURVNET_SET(ifp->if_vnet); M_SETFIB(m, ifp->if_fib); netisr_dispatch(isr, m); CURVNET_RESTORE(); return (0); } /* * tunpoll - the poll interface, this is only useful on reads * really. The write detect always returns true, write never blocks * anyway, it either accepts the packet or drops it. */ static int tunpoll(struct cdev *dev, int events, struct thread *td) { struct tun_softc *tp = dev->si_drv1; struct ifnet *ifp = TUN2IFP(tp); int revents = 0; struct mbuf *m; TUNDEBUG(ifp, "tunpoll\n"); if (events & (POLLIN | POLLRDNORM)) { IFQ_LOCK(&ifp->if_snd); IFQ_POLL_NOLOCK(&ifp->if_snd, m); if (m != NULL) { TUNDEBUG(ifp, "tunpoll q=%d\n", ifp->if_snd.ifq_len); revents |= events & (POLLIN | POLLRDNORM); } else { TUNDEBUG(ifp, "tunpoll waiting\n"); selrecord(td, &tp->tun_rsel); } IFQ_UNLOCK(&ifp->if_snd); } if (events & (POLLOUT | POLLWRNORM)) revents |= events & (POLLOUT | POLLWRNORM); return (revents); } /* * tunkqfilter - support for the kevent() system call. 
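For illustration only: a hypothetical userland consumer (the descriptor tunfd and the helper wait_for_packet() below are assumptions, not part of this driver) could use kevent(2) to sleep until the filter installed by tunkqfilter() reports queued data, roughly as sketched here:

	#include <sys/types.h>
	#include <sys/event.h>
	#include <unistd.h>

	static int
	wait_for_packet(int tunfd)
	{
		struct kevent change, ev;
		int kq, n;

		kq = kqueue();
		if (kq == -1)
			return (-1);
		// Register interest in readability of the tun descriptor.
		EV_SET(&change, tunfd, EVFILT_READ, EV_ADD, 0, 0, NULL);
		// One call both applies the change and blocks for an event.
		n = kevent(kq, &change, 1, &ev, 1, NULL);
		close(kq);
		return (n == 1 ? 0 : -1);
	}

A subsequent read(2) on tunfd then returns the queued packet.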
*/ static int tunkqfilter(struct cdev *dev, struct knote *kn) { struct tun_softc *tp = dev->si_drv1; struct ifnet *ifp = TUN2IFP(tp); switch(kn->kn_filter) { case EVFILT_READ: TUNDEBUG(ifp, "%s kqfilter: EVFILT_READ, minor = %#x\n", ifp->if_xname, dev2unit(dev)); kn->kn_fop = &tun_read_filterops; break; case EVFILT_WRITE: TUNDEBUG(ifp, "%s kqfilter: EVFILT_WRITE, minor = %#x\n", ifp->if_xname, dev2unit(dev)); kn->kn_fop = &tun_write_filterops; break; default: TUNDEBUG(ifp, "%s kqfilter: invalid filter, minor = %#x\n", ifp->if_xname, dev2unit(dev)); return(EINVAL); } kn->kn_hook = tp; knlist_add(&tp->tun_rsel.si_note, kn, 0); return (0); } /* * Return true of there is data in the interface queue. */ static int tunkqread(struct knote *kn, long hint) { int ret; struct tun_softc *tp = kn->kn_hook; struct cdev *dev = tp->tun_dev; struct ifnet *ifp = TUN2IFP(tp); if ((kn->kn_data = ifp->if_snd.ifq_len) > 0) { TUNDEBUG(ifp, "%s have data in the queue. Len = %d, minor = %#x\n", ifp->if_xname, ifp->if_snd.ifq_len, dev2unit(dev)); ret = 1; } else { TUNDEBUG(ifp, "%s waiting for data, minor = %#x\n", ifp->if_xname, dev2unit(dev)); ret = 0; } return (ret); } /* * Always can write, always return MTU in kn->data. */ static int tunkqwrite(struct knote *kn, long hint) { struct tun_softc *tp = kn->kn_hook; struct ifnet *ifp = TUN2IFP(tp); kn->kn_data = ifp->if_mtu; return (1); } static void tunkqdetach(struct knote *kn) { struct tun_softc *tp = kn->kn_hook; knlist_remove(&tp->tun_rsel.si_note, kn, 0); } Index: user/ngie/bug-237403/sys/net/iflib.c =================================================================== --- user/ngie/bug-237403/sys/net/iflib.c (revision 346925) +++ user/ngie/bug-237403/sys/net/iflib.c (revision 346926) @@ -1,6537 +1,6730 @@ /*- * Copyright (c) 2014-2018, Matthew Macy * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * 1. Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * 2. Neither the name of Matthew Macy nor the names of its * contributors may be used to endorse or promote products derived from * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. 
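In iflib.c proper, this revision adds per-instance controls over how queues are spread across CPUs (ifc_sysctl_core_offset with CORE_OFFSET_UNSPECIFIED, ifc_sysctl_separate_txrx, and a shared cpu_offset list protected by cpu_offset_mtx), gives each receive queue a pfil(9) hook point, and retires the rx_mbuf_null debug counter.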
*/ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include "opt_acpi.h" #include "opt_sched.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "ifdi_if.h" #ifdef PCI_IOV #include #endif #include /* * enable accounting of every mbuf as it comes in to and goes out of * iflib's software descriptor references */ #define MEMORY_LOGGING 0 /* * Enable mbuf vectors for compressing long mbuf chains */ /* * NB: * - Prefetching in tx cleaning should perhaps be a tunable. The distance ahead * we prefetch needs to be determined by the time spent in m_free vis a vis * the cost of a prefetch. This will of course vary based on the workload: * - NFLX's m_free path is dominated by vm-based M_EXT manipulation which * is quite expensive, thus suggesting very little prefetch. * - small packet forwarding which is just returning a single mbuf to * UMA will typically be very fast vis a vis the cost of a memory * access. */ /* * File organization: * - private structures * - iflib private utility functions * - ifnet functions * - vlan registry and other exported functions * - iflib public core functions * * */ MALLOC_DEFINE(M_IFLIB, "iflib", "ifnet library"); struct iflib_txq; typedef struct iflib_txq *iflib_txq_t; struct iflib_rxq; typedef struct iflib_rxq *iflib_rxq_t; struct iflib_fl; typedef struct iflib_fl *iflib_fl_t; struct iflib_ctx; static void iru_init(if_rxd_update_t iru, iflib_rxq_t rxq, uint8_t flid); static void iflib_timer(void *arg); typedef struct iflib_filter_info { driver_filter_t *ifi_filter; void *ifi_filter_arg; struct grouptask *ifi_task; void *ifi_ctx; } *iflib_filter_info_t; struct iflib_ctx { KOBJ_FIELDS; /* * Pointer to hardware driver's softc */ void *ifc_softc; device_t ifc_dev; if_t ifc_ifp; cpuset_t ifc_cpus; if_shared_ctx_t ifc_sctx; struct if_softc_ctx ifc_softc_ctx; struct sx ifc_ctx_sx; struct mtx ifc_state_mtx; iflib_txq_t ifc_txqs; iflib_rxq_t ifc_rxqs; uint32_t ifc_if_flags; uint32_t ifc_flags; uint32_t ifc_max_fl_buf_size; uint32_t ifc_rx_mbuf_sz; int ifc_link_state; int ifc_link_irq; int ifc_watchdog_events; struct cdev *ifc_led_dev; struct resource *ifc_msix_mem; struct if_irq ifc_legacy_irq; struct grouptask ifc_admin_task; struct grouptask ifc_vflr_task; struct iflib_filter_info ifc_filter_info; struct ifmedia ifc_media; struct sysctl_oid *ifc_sysctl_node; uint16_t ifc_sysctl_ntxqs; uint16_t ifc_sysctl_nrxqs; uint16_t ifc_sysctl_qs_eq_override; uint16_t ifc_sysctl_rx_budget; uint16_t ifc_sysctl_tx_abdicate; + uint16_t ifc_sysctl_core_offset; +#define CORE_OFFSET_UNSPECIFIED 0xffff + uint8_t ifc_sysctl_separate_txrx; qidx_t ifc_sysctl_ntxds[8]; qidx_t ifc_sysctl_nrxds[8]; struct if_txrx ifc_txrx; #define isc_txd_encap ifc_txrx.ift_txd_encap #define isc_txd_flush ifc_txrx.ift_txd_flush #define isc_txd_credits_update ifc_txrx.ift_txd_credits_update #define isc_rxd_available ifc_txrx.ift_rxd_available #define isc_rxd_pkt_get ifc_txrx.ift_rxd_pkt_get #define isc_rxd_refill ifc_txrx.ift_rxd_refill #define isc_rxd_flush ifc_txrx.ift_rxd_flush #define isc_rxd_refill ifc_txrx.ift_rxd_refill #define isc_rxd_refill ifc_txrx.ift_rxd_refill #define isc_legacy_intr 
ifc_txrx.ift_legacy_intr eventhandler_tag ifc_vlan_attach_event; eventhandler_tag ifc_vlan_detach_event; struct ether_addr ifc_mac; char ifc_mtx_name[16]; }; void * iflib_get_softc(if_ctx_t ctx) { return (ctx->ifc_softc); } device_t iflib_get_dev(if_ctx_t ctx) { return (ctx->ifc_dev); } if_t iflib_get_ifp(if_ctx_t ctx) { return (ctx->ifc_ifp); } struct ifmedia * iflib_get_media(if_ctx_t ctx) { return (&ctx->ifc_media); } uint32_t iflib_get_flags(if_ctx_t ctx) { return (ctx->ifc_flags); } void iflib_set_mac(if_ctx_t ctx, uint8_t mac[ETHER_ADDR_LEN]) { bcopy(mac, ctx->ifc_mac.octet, ETHER_ADDR_LEN); } if_softc_ctx_t iflib_get_softc_ctx(if_ctx_t ctx) { return (&ctx->ifc_softc_ctx); } if_shared_ctx_t iflib_get_sctx(if_ctx_t ctx) { return (ctx->ifc_sctx); } #define IP_ALIGNED(m) ((((uintptr_t)(m)->m_data) & 0x3) == 0x2) #define CACHE_PTR_INCREMENT (CACHE_LINE_SIZE/sizeof(void*)) #define CACHE_PTR_NEXT(ptr) ((void *)(((uintptr_t)(ptr)+CACHE_LINE_SIZE-1) & (CACHE_LINE_SIZE-1))) #define LINK_ACTIVE(ctx) ((ctx)->ifc_link_state == LINK_STATE_UP) #define CTX_IS_VF(ctx) ((ctx)->ifc_sctx->isc_flags & IFLIB_IS_VF) typedef struct iflib_sw_rx_desc_array { bus_dmamap_t *ifsd_map; /* bus_dma maps for packet */ struct mbuf **ifsd_m; /* pkthdr mbufs */ caddr_t *ifsd_cl; /* direct cluster pointer for rx */ bus_addr_t *ifsd_ba; /* bus addr of cluster for rx */ } iflib_rxsd_array_t; typedef struct iflib_sw_tx_desc_array { bus_dmamap_t *ifsd_map; /* bus_dma maps for packet */ bus_dmamap_t *ifsd_tso_map; /* bus_dma maps for TSO packet */ struct mbuf **ifsd_m; /* pkthdr mbufs */ } if_txsd_vec_t; /* magic number that should be high enough for any hardware */ #define IFLIB_MAX_TX_SEGS 128 #define IFLIB_RX_COPY_THRESH 128 #define IFLIB_MAX_RX_REFRESH 32 /* The minimum descriptors per second before we start coalescing */ #define IFLIB_MIN_DESC_SEC 16384 #define IFLIB_DEFAULT_TX_UPDATE_FREQ 16 #define IFLIB_QUEUE_IDLE 0 #define IFLIB_QUEUE_HUNG 1 #define IFLIB_QUEUE_WORKING 2 /* maximum number of txqs that can share an rx interrupt */ #define IFLIB_MAX_TX_SHARED_INTR 4 /* this should really scale with ring size - this is a fairly arbitrary value */ #define TX_BATCH_SIZE 32 #define IFLIB_RESTART_BUDGET 8 #define CSUM_OFFLOAD (CSUM_IP_TSO|CSUM_IP6_TSO|CSUM_IP| \ CSUM_IP_UDP|CSUM_IP_TCP|CSUM_IP_SCTP| \ CSUM_IP6_UDP|CSUM_IP6_TCP|CSUM_IP6_SCTP) struct iflib_txq { qidx_t ift_in_use; qidx_t ift_cidx; qidx_t ift_cidx_processed; qidx_t ift_pidx; uint8_t ift_gen; uint8_t ift_br_offset; uint16_t ift_npending; uint16_t ift_db_pending; uint16_t ift_rs_pending; /* implicit pad */ uint8_t ift_txd_size[8]; uint64_t ift_processed; uint64_t ift_cleaned; uint64_t ift_cleaned_prev; #if MEMORY_LOGGING uint64_t ift_enqueued; uint64_t ift_dequeued; #endif uint64_t ift_no_tx_dma_setup; uint64_t ift_no_desc_avail; uint64_t ift_mbuf_defrag_failed; uint64_t ift_mbuf_defrag; uint64_t ift_map_failed; uint64_t ift_txd_encap_efbig; uint64_t ift_pullups; uint64_t ift_last_timer_tick; struct mtx ift_mtx; struct mtx ift_db_mtx; /* constant values */ if_ctx_t ift_ctx; struct ifmp_ring *ift_br; struct grouptask ift_task; qidx_t ift_size; uint16_t ift_id; struct callout ift_timer; if_txsd_vec_t ift_sds; uint8_t ift_qstatus; uint8_t ift_closed; uint8_t ift_update_freq; struct iflib_filter_info ift_filter_info; bus_dma_tag_t ift_buf_tag; bus_dma_tag_t ift_tso_buf_tag; iflib_dma_info_t ift_ifdi; #define MTX_NAME_LEN 16 char ift_mtx_name[MTX_NAME_LEN]; char ift_db_mtx_name[MTX_NAME_LEN]; bus_dma_segment_t ift_segs[IFLIB_MAX_TX_SEGS] __aligned(CACHE_LINE_SIZE); 
#ifdef IFLIB_DIAGNOSTICS uint64_t ift_cpu_exec_count[256]; #endif } __aligned(CACHE_LINE_SIZE); struct iflib_fl { qidx_t ifl_cidx; qidx_t ifl_pidx; qidx_t ifl_credits; uint8_t ifl_gen; uint8_t ifl_rxd_size; #if MEMORY_LOGGING uint64_t ifl_m_enqueued; uint64_t ifl_m_dequeued; uint64_t ifl_cl_enqueued; uint64_t ifl_cl_dequeued; #endif /* implicit pad */ bitstr_t *ifl_rx_bitmap; qidx_t ifl_fragidx; /* constant */ qidx_t ifl_size; uint16_t ifl_buf_size; uint16_t ifl_cltype; uma_zone_t ifl_zone; iflib_rxsd_array_t ifl_sds; iflib_rxq_t ifl_rxq; uint8_t ifl_id; bus_dma_tag_t ifl_buf_tag; iflib_dma_info_t ifl_ifdi; uint64_t ifl_bus_addrs[IFLIB_MAX_RX_REFRESH] __aligned(CACHE_LINE_SIZE); caddr_t ifl_vm_addrs[IFLIB_MAX_RX_REFRESH]; qidx_t ifl_rxd_idxs[IFLIB_MAX_RX_REFRESH]; } __aligned(CACHE_LINE_SIZE); static inline qidx_t get_inuse(int size, qidx_t cidx, qidx_t pidx, uint8_t gen) { qidx_t used; if (pidx > cidx) used = pidx - cidx; else if (pidx < cidx) used = size - cidx + pidx; else if (gen == 0 && pidx == cidx) used = 0; else if (gen == 1 && pidx == cidx) used = size; else panic("bad state"); return (used); } #define TXQ_AVAIL(txq) (txq->ift_size - get_inuse(txq->ift_size, txq->ift_cidx, txq->ift_pidx, txq->ift_gen)) #define IDXDIFF(head, tail, wrap) \ ((head) >= (tail) ? (head) - (tail) : (wrap) - (tail) + (head)) struct iflib_rxq { /* If there is a separate completion queue - * these are the cq cidx and pidx. Otherwise * these are unused. */ qidx_t ifr_size; qidx_t ifr_cq_cidx; qidx_t ifr_cq_pidx; uint8_t ifr_cq_gen; uint8_t ifr_fl_offset; if_ctx_t ifr_ctx; iflib_fl_t ifr_fl; uint64_t ifr_rx_irq; + struct pfil_head *pfil; uint16_t ifr_id; uint8_t ifr_lro_enabled; uint8_t ifr_nfl; uint8_t ifr_ntxqirq; uint8_t ifr_txqid[IFLIB_MAX_TX_SHARED_INTR]; struct lro_ctrl ifr_lc; struct grouptask ifr_task; struct iflib_filter_info ifr_filter_info; iflib_dma_info_t ifr_ifdi; /* dynamically allocate if any drivers need a value substantially larger than this */ struct if_rxd_frag ifr_frags[IFLIB_MAX_RX_SEGS] __aligned(CACHE_LINE_SIZE); #ifdef IFLIB_DIAGNOSTICS uint64_t ifr_cpu_exec_count[256]; #endif } __aligned(CACHE_LINE_SIZE); typedef struct if_rxsd { caddr_t *ifsd_cl; - struct mbuf **ifsd_m; iflib_fl_t ifsd_fl; qidx_t ifsd_cidx; } *if_rxsd_t; /* multiple of word size */ #ifdef __LP64__ #define PKT_INFO_SIZE 6 #define RXD_INFO_SIZE 5 #define PKT_TYPE uint64_t #else #define PKT_INFO_SIZE 11 #define RXD_INFO_SIZE 8 #define PKT_TYPE uint32_t #endif #define PKT_LOOP_BOUND ((PKT_INFO_SIZE/3)*3) #define RXD_LOOP_BOUND ((RXD_INFO_SIZE/4)*4) typedef struct if_pkt_info_pad { PKT_TYPE pkt_val[PKT_INFO_SIZE]; } *if_pkt_info_pad_t; typedef struct if_rxd_info_pad { PKT_TYPE rxd_val[RXD_INFO_SIZE]; } *if_rxd_info_pad_t; CTASSERT(sizeof(struct if_pkt_info_pad) == sizeof(struct if_pkt_info)); CTASSERT(sizeof(struct if_rxd_info_pad) == sizeof(struct if_rxd_info)); static inline void pkt_info_zero(if_pkt_info_t pi) { if_pkt_info_pad_t pi_pad; pi_pad = (if_pkt_info_pad_t)pi; pi_pad->pkt_val[0] = 0; pi_pad->pkt_val[1] = 0; pi_pad->pkt_val[2] = 0; pi_pad->pkt_val[3] = 0; pi_pad->pkt_val[4] = 0; pi_pad->pkt_val[5] = 0; #ifndef __LP64__ pi_pad->pkt_val[6] = 0; pi_pad->pkt_val[7] = 0; pi_pad->pkt_val[8] = 0; pi_pad->pkt_val[9] = 0; pi_pad->pkt_val[10] = 0; #endif } static device_method_t iflib_pseudo_methods[] = { DEVMETHOD(device_attach, noop_attach), DEVMETHOD(device_detach, iflib_pseudo_detach), DEVMETHOD_END }; driver_t iflib_pseudodriver = { "iflib_pseudo", iflib_pseudo_methods, sizeof(struct iflib_ctx), }; static inline 
void rxd_info_zero(if_rxd_info_t ri) { if_rxd_info_pad_t ri_pad; int i; ri_pad = (if_rxd_info_pad_t)ri; for (i = 0; i < RXD_LOOP_BOUND; i += 4) { ri_pad->rxd_val[i] = 0; ri_pad->rxd_val[i+1] = 0; ri_pad->rxd_val[i+2] = 0; ri_pad->rxd_val[i+3] = 0; } #ifdef __LP64__ ri_pad->rxd_val[RXD_INFO_SIZE-1] = 0; #endif } /* * Only allow a single packet to take up most 1/nth of the tx ring */ #define MAX_SINGLE_PACKET_FRACTION 12 #define IF_BAD_DMA (bus_addr_t)-1 #define CTX_ACTIVE(ctx) ((if_getdrvflags((ctx)->ifc_ifp) & IFF_DRV_RUNNING)) #define CTX_LOCK_INIT(_sc) sx_init(&(_sc)->ifc_ctx_sx, "iflib ctx lock") #define CTX_LOCK(ctx) sx_xlock(&(ctx)->ifc_ctx_sx) #define CTX_UNLOCK(ctx) sx_xunlock(&(ctx)->ifc_ctx_sx) #define CTX_LOCK_DESTROY(ctx) sx_destroy(&(ctx)->ifc_ctx_sx) #define STATE_LOCK_INIT(_sc, _name) mtx_init(&(_sc)->ifc_state_mtx, _name, "iflib state lock", MTX_DEF) #define STATE_LOCK(ctx) mtx_lock(&(ctx)->ifc_state_mtx) #define STATE_UNLOCK(ctx) mtx_unlock(&(ctx)->ifc_state_mtx) #define STATE_LOCK_DESTROY(ctx) mtx_destroy(&(ctx)->ifc_state_mtx) #define CALLOUT_LOCK(txq) mtx_lock(&txq->ift_mtx) #define CALLOUT_UNLOCK(txq) mtx_unlock(&txq->ift_mtx) void iflib_set_detach(if_ctx_t ctx) { STATE_LOCK(ctx); ctx->ifc_flags |= IFC_IN_DETACH; STATE_UNLOCK(ctx); } /* Our boot-time initialization hook */ static int iflib_module_event_handler(module_t, int, void *); static moduledata_t iflib_moduledata = { "iflib", iflib_module_event_handler, NULL }; DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_INIT_IF, SI_ORDER_ANY); MODULE_VERSION(iflib, 1); MODULE_DEPEND(iflib, pci, 1, 1, 1); MODULE_DEPEND(iflib, ether, 1, 1, 1); TASKQGROUP_DEFINE(if_io_tqg, mp_ncpus, 1); TASKQGROUP_DEFINE(if_config_tqg, 1, 1); #ifndef IFLIB_DEBUG_COUNTERS #ifdef INVARIANTS #define IFLIB_DEBUG_COUNTERS 1 #else #define IFLIB_DEBUG_COUNTERS 0 #endif /* !INVARIANTS */ #endif static SYSCTL_NODE(_net, OID_AUTO, iflib, CTLFLAG_RD, 0, "iflib driver parameters"); /* * XXX need to ensure that this can't accidentally cause the head to be moved backwards */ static int iflib_min_tx_latency = 0; SYSCTL_INT(_net_iflib, OID_AUTO, min_tx_latency, CTLFLAG_RW, &iflib_min_tx_latency, 0, "minimize transmit latency at the possible expense of throughput"); static int iflib_no_tx_batch = 0; SYSCTL_INT(_net_iflib, OID_AUTO, no_tx_batch, CTLFLAG_RW, &iflib_no_tx_batch, 0, "minimize transmit latency at the possible expense of throughput"); #if IFLIB_DEBUG_COUNTERS static int iflib_tx_seen; static int iflib_tx_sent; static int iflib_tx_encap; static int iflib_rx_allocs; static int iflib_fl_refills; static int iflib_fl_refills_large; static int iflib_tx_frees; SYSCTL_INT(_net_iflib, OID_AUTO, tx_seen, CTLFLAG_RD, &iflib_tx_seen, 0, "# tx mbufs seen"); SYSCTL_INT(_net_iflib, OID_AUTO, tx_sent, CTLFLAG_RD, &iflib_tx_sent, 0, "# tx mbufs sent"); SYSCTL_INT(_net_iflib, OID_AUTO, tx_encap, CTLFLAG_RD, &iflib_tx_encap, 0, "# tx mbufs encapped"); SYSCTL_INT(_net_iflib, OID_AUTO, tx_frees, CTLFLAG_RD, &iflib_tx_frees, 0, "# tx frees"); SYSCTL_INT(_net_iflib, OID_AUTO, rx_allocs, CTLFLAG_RD, &iflib_rx_allocs, 0, "# rx allocations"); SYSCTL_INT(_net_iflib, OID_AUTO, fl_refills, CTLFLAG_RD, &iflib_fl_refills, 0, "# refills"); SYSCTL_INT(_net_iflib, OID_AUTO, fl_refills_large, CTLFLAG_RD, &iflib_fl_refills_large, 0, "# large refills"); static int iflib_txq_drain_flushing; static int iflib_txq_drain_oactive; static int iflib_txq_drain_notready; SYSCTL_INT(_net_iflib, OID_AUTO, txq_drain_flushing, CTLFLAG_RD, &iflib_txq_drain_flushing, 0, "# drain flushes"); 
SYSCTL_INT(_net_iflib, OID_AUTO, txq_drain_oactive, CTLFLAG_RD, &iflib_txq_drain_oactive, 0, "# drain oactives"); SYSCTL_INT(_net_iflib, OID_AUTO, txq_drain_notready, CTLFLAG_RD, &iflib_txq_drain_notready, 0, "# drain notready"); static int iflib_encap_load_mbuf_fail; static int iflib_encap_pad_mbuf_fail; static int iflib_encap_txq_avail_fail; static int iflib_encap_txd_encap_fail; SYSCTL_INT(_net_iflib, OID_AUTO, encap_load_mbuf_fail, CTLFLAG_RD, &iflib_encap_load_mbuf_fail, 0, "# busdma load failures"); SYSCTL_INT(_net_iflib, OID_AUTO, encap_pad_mbuf_fail, CTLFLAG_RD, &iflib_encap_pad_mbuf_fail, 0, "# runt frame pad failures"); SYSCTL_INT(_net_iflib, OID_AUTO, encap_txq_avail_fail, CTLFLAG_RD, &iflib_encap_txq_avail_fail, 0, "# txq avail failures"); SYSCTL_INT(_net_iflib, OID_AUTO, encap_txd_encap_fail, CTLFLAG_RD, &iflib_encap_txd_encap_fail, 0, "# driver encap failures"); static int iflib_task_fn_rxs; static int iflib_rx_intr_enables; static int iflib_fast_intrs; static int iflib_rx_unavail; static int iflib_rx_ctx_inactive; static int iflib_rx_if_input; -static int iflib_rx_mbuf_null; static int iflib_rxd_flush; static int iflib_verbose_debug; SYSCTL_INT(_net_iflib, OID_AUTO, task_fn_rx, CTLFLAG_RD, &iflib_task_fn_rxs, 0, "# task_fn_rx calls"); SYSCTL_INT(_net_iflib, OID_AUTO, rx_intr_enables, CTLFLAG_RD, &iflib_rx_intr_enables, 0, "# rx intr enables"); SYSCTL_INT(_net_iflib, OID_AUTO, fast_intrs, CTLFLAG_RD, &iflib_fast_intrs, 0, "# fast_intr calls"); SYSCTL_INT(_net_iflib, OID_AUTO, rx_unavail, CTLFLAG_RD, &iflib_rx_unavail, 0, "# times rxeof called with no available data"); SYSCTL_INT(_net_iflib, OID_AUTO, rx_ctx_inactive, CTLFLAG_RD, &iflib_rx_ctx_inactive, 0, "# times rxeof called with inactive context"); SYSCTL_INT(_net_iflib, OID_AUTO, rx_if_input, CTLFLAG_RD, &iflib_rx_if_input, 0, "# times rxeof called if_input"); -SYSCTL_INT(_net_iflib, OID_AUTO, rx_mbuf_null, CTLFLAG_RD, - &iflib_rx_mbuf_null, 0, "# times rxeof got null mbuf"); SYSCTL_INT(_net_iflib, OID_AUTO, rxd_flush, CTLFLAG_RD, &iflib_rxd_flush, 0, "# times rxd_flush called"); SYSCTL_INT(_net_iflib, OID_AUTO, verbose_debug, CTLFLAG_RW, &iflib_verbose_debug, 0, "enable verbose debugging"); #define DBG_COUNTER_INC(name) atomic_add_int(&(iflib_ ## name), 1) static void iflib_debug_reset(void) { iflib_tx_seen = iflib_tx_sent = iflib_tx_encap = iflib_rx_allocs = iflib_fl_refills = iflib_fl_refills_large = iflib_tx_frees = iflib_txq_drain_flushing = iflib_txq_drain_oactive = iflib_txq_drain_notready = iflib_encap_load_mbuf_fail = iflib_encap_pad_mbuf_fail = iflib_encap_txq_avail_fail = iflib_encap_txd_encap_fail = iflib_task_fn_rxs = iflib_rx_intr_enables = iflib_fast_intrs = iflib_rx_unavail = iflib_rx_ctx_inactive = iflib_rx_if_input = - iflib_rx_mbuf_null = iflib_rxd_flush = 0; + iflib_rxd_flush = 0; } #else #define DBG_COUNTER_INC(name) static void iflib_debug_reset(void) {} #endif #define IFLIB_DEBUG 0 static void iflib_tx_structures_free(if_ctx_t ctx); static void iflib_rx_structures_free(if_ctx_t ctx); static int iflib_queues_alloc(if_ctx_t ctx); static int iflib_tx_credits_update(if_ctx_t ctx, iflib_txq_t txq); static int iflib_rxd_avail(if_ctx_t ctx, iflib_rxq_t rxq, qidx_t cidx, qidx_t budget); static int iflib_qset_structures_setup(if_ctx_t ctx); static int iflib_msix_init(if_ctx_t ctx); static int iflib_legacy_setup(if_ctx_t ctx, driver_filter_t filter, void *filterarg, int *rid, const char *str); static void iflib_txq_check_drain(iflib_txq_t txq, int budget); static uint32_t iflib_txq_can_drain(struct ifmp_ring 
*); #ifdef ALTQ static void iflib_altq_if_start(if_t ifp); static int iflib_altq_if_transmit(if_t ifp, struct mbuf *m); #endif static int iflib_register(if_ctx_t); static void iflib_init_locked(if_ctx_t ctx); static void iflib_add_device_sysctl_pre(if_ctx_t ctx); static void iflib_add_device_sysctl_post(if_ctx_t ctx); static void iflib_ifmp_purge(iflib_txq_t txq); static void _iflib_pre_assert(if_softc_ctx_t scctx); static void iflib_if_init_locked(if_ctx_t ctx); static void iflib_free_intr_mem(if_ctx_t ctx); #ifndef __NO_STRICT_ALIGNMENT static struct mbuf * iflib_fixup_rx(struct mbuf *m); #endif +static SLIST_HEAD(cpu_offset_list, cpu_offset) cpu_offsets = + SLIST_HEAD_INITIALIZER(cpu_offsets); +struct cpu_offset { + SLIST_ENTRY(cpu_offset) entries; + cpuset_t set; + unsigned int refcount; + uint16_t offset; +}; +static struct mtx cpu_offset_mtx; +MTX_SYSINIT(iflib_cpu_offset, &cpu_offset_mtx, "iflib_cpu_offset lock", + MTX_DEF); + NETDUMP_DEFINE(iflib); #ifdef DEV_NETMAP #include #include #include MODULE_DEPEND(iflib, netmap, 1, 1, 1); static int netmap_fl_refill(iflib_rxq_t rxq, struct netmap_kring *kring, uint32_t nm_i, bool init); /* * device-specific sysctl variables: * * iflib_crcstrip: 0: keep CRC in rx frames (default), 1: strip it. * During regular operations the CRC is stripped, but on some * hardware reception of frames not multiple of 64 is slower, * so using crcstrip=0 helps in benchmarks. * * iflib_rx_miss, iflib_rx_miss_bufs: * count packets that might be missed due to lost interrupts. */ SYSCTL_DECL(_dev_netmap); /* * The xl driver by default strips CRCs and we do not override it. */ int iflib_crcstrip = 1; SYSCTL_INT(_dev_netmap, OID_AUTO, iflib_crcstrip, CTLFLAG_RW, &iflib_crcstrip, 1, "strip CRC on rx frames"); int iflib_rx_miss, iflib_rx_miss_bufs; SYSCTL_INT(_dev_netmap, OID_AUTO, iflib_rx_miss, CTLFLAG_RW, &iflib_rx_miss, 0, "potentially missed rx intr"); SYSCTL_INT(_dev_netmap, OID_AUTO, iflib_rx_miss_bufs, CTLFLAG_RW, &iflib_rx_miss_bufs, 0, "potentially missed rx intr bufs"); /* * Register/unregister. We are already under netmap lock. * Only called on the first register or the last unregister. */ static int iflib_netmap_register(struct netmap_adapter *na, int onoff) { struct ifnet *ifp = na->ifp; if_ctx_t ctx = ifp->if_softc; int status; CTX_LOCK(ctx); IFDI_INTR_DISABLE(ctx); /* Tell the stack that the interface is no longer active */ ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE); if (!CTX_IS_VF(ctx)) IFDI_CRCSTRIP_SET(ctx, onoff, iflib_crcstrip); /* enable or disable flags and callbacks in na and ifp */ if (onoff) { nm_set_native_flags(na); } else { nm_clear_native_flags(na); } iflib_stop(ctx); iflib_init_locked(ctx); IFDI_CRCSTRIP_SET(ctx, onoff, iflib_crcstrip); // XXX why twice ? status = ifp->if_drv_flags & IFF_DRV_RUNNING ? 
0 : 1; if (status) nm_clear_native_flags(na); CTX_UNLOCK(ctx); return (status); } static int netmap_fl_refill(iflib_rxq_t rxq, struct netmap_kring *kring, uint32_t nm_i, bool init) { struct netmap_adapter *na = kring->na; u_int const lim = kring->nkr_num_slots - 1; u_int head = kring->rhead; struct netmap_ring *ring = kring->ring; bus_dmamap_t *map; struct if_rxd_update iru; if_ctx_t ctx = rxq->ifr_ctx; iflib_fl_t fl = &rxq->ifr_fl[0]; uint32_t refill_pidx, nic_i; #if IFLIB_DEBUG_COUNTERS int rf_count = 0; #endif if (nm_i == head && __predict_true(!init)) return 0; iru_init(&iru, rxq, 0 /* flid */); map = fl->ifl_sds.ifsd_map; refill_pidx = netmap_idx_k2n(kring, nm_i); /* * IMPORTANT: we must leave one free slot in the ring, * so move head back by one unit */ head = nm_prev(head, lim); nic_i = UINT_MAX; DBG_COUNTER_INC(fl_refills); while (nm_i != head) { #if IFLIB_DEBUG_COUNTERS if (++rf_count == 9) DBG_COUNTER_INC(fl_refills_large); #endif for (int tmp_pidx = 0; tmp_pidx < IFLIB_MAX_RX_REFRESH && nm_i != head; tmp_pidx++) { struct netmap_slot *slot = &ring->slot[nm_i]; void *addr = PNMB(na, slot, &fl->ifl_bus_addrs[tmp_pidx]); uint32_t nic_i_dma = refill_pidx; nic_i = netmap_idx_k2n(kring, nm_i); MPASS(tmp_pidx < IFLIB_MAX_RX_REFRESH); if (addr == NETMAP_BUF_BASE(na)) /* bad buf */ return netmap_ring_reinit(kring); fl->ifl_vm_addrs[tmp_pidx] = addr; if (__predict_false(init)) { netmap_load_map(na, fl->ifl_buf_tag, map[nic_i], addr); } else if (slot->flags & NS_BUF_CHANGED) { /* buffer has changed, reload map */ netmap_reload_map(na, fl->ifl_buf_tag, map[nic_i], addr); } slot->flags &= ~NS_BUF_CHANGED; nm_i = nm_next(nm_i, lim); fl->ifl_rxd_idxs[tmp_pidx] = nic_i = nm_next(nic_i, lim); if (nm_i != head && tmp_pidx < IFLIB_MAX_RX_REFRESH-1) continue; iru.iru_pidx = refill_pidx; iru.iru_count = tmp_pidx+1; ctx->isc_rxd_refill(ctx->ifc_softc, &iru); refill_pidx = nic_i; for (int n = 0; n < iru.iru_count; n++) { bus_dmamap_sync(fl->ifl_buf_tag, map[nic_i_dma], BUS_DMASYNC_PREREAD); /* XXX - change this to not use the netmap func*/ nic_i_dma = nm_next(nic_i_dma, lim); } } } kring->nr_hwcur = head; bus_dmamap_sync(fl->ifl_ifdi->idi_tag, fl->ifl_ifdi->idi_map, BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE); if (__predict_true(nic_i != UINT_MAX)) { ctx->isc_rxd_flush(ctx->ifc_softc, rxq->ifr_id, fl->ifl_id, nic_i); DBG_COUNTER_INC(rxd_flush); } return (0); } /* * Reconcile kernel and user view of the transmit ring. * * All information is in the kring. * Userspace wants to send packets up to the one before kring->rhead, * kernel knows kring->nr_hwcur is the first unsent packet. * * Here we push packets out (as many as possible), and possibly * reclaim buffers from previously completed transmission. * * The caller (netmap) guarantees that there is only one instance * running at any time. Any interference with other driver * methods should be handled by the individual drivers. 
*/ static int iflib_netmap_txsync(struct netmap_kring *kring, int flags) { struct netmap_adapter *na = kring->na; struct ifnet *ifp = na->ifp; struct netmap_ring *ring = kring->ring; u_int nm_i; /* index into the netmap kring */ u_int nic_i; /* index into the NIC ring */ u_int n; u_int const lim = kring->nkr_num_slots - 1; u_int const head = kring->rhead; struct if_pkt_info pi; /* * interrupts on every tx packet are expensive so request * them every half ring, or where NS_REPORT is set */ u_int report_frequency = kring->nkr_num_slots >> 1; /* device-specific */ if_ctx_t ctx = ifp->if_softc; iflib_txq_t txq = &ctx->ifc_txqs[kring->ring_id]; bus_dmamap_sync(txq->ift_ifdi->idi_tag, txq->ift_ifdi->idi_map, BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); /* * First part: process new packets to send. * nm_i is the current index in the netmap kring, * nic_i is the corresponding index in the NIC ring. * * If we have packets to send (nm_i != head) * iterate over the netmap ring, fetch length and update * the corresponding slot in the NIC ring. Some drivers also * need to update the buffer's physical address in the NIC slot * even NS_BUF_CHANGED is not set (PNMB computes the addresses). * * The netmap_reload_map() calls is especially expensive, * even when (as in this case) the tag is 0, so do only * when the buffer has actually changed. * * If possible do not set the report/intr bit on all slots, * but only a few times per ring or when NS_REPORT is set. * * Finally, on 10G and faster drivers, it might be useful * to prefetch the next slot and txr entry. */ nm_i = kring->nr_hwcur; if (nm_i != head) { /* we have new packets to send */ pkt_info_zero(&pi); pi.ipi_segs = txq->ift_segs; pi.ipi_qsidx = kring->ring_id; nic_i = netmap_idx_k2n(kring, nm_i); __builtin_prefetch(&ring->slot[nm_i]); __builtin_prefetch(&txq->ift_sds.ifsd_m[nic_i]); __builtin_prefetch(&txq->ift_sds.ifsd_map[nic_i]); for (n = 0; nm_i != head; n++) { struct netmap_slot *slot = &ring->slot[nm_i]; u_int len = slot->len; uint64_t paddr; void *addr = PNMB(na, slot, &paddr); int flags = (slot->flags & NS_REPORT || nic_i == 0 || nic_i == report_frequency) ? IPI_TX_INTR : 0; /* device-specific */ pi.ipi_len = len; pi.ipi_segs[0].ds_addr = paddr; pi.ipi_segs[0].ds_len = len; pi.ipi_nsegs = 1; pi.ipi_ndescs = 0; pi.ipi_pidx = nic_i; pi.ipi_flags = flags; /* Fill the slot in the NIC ring. */ ctx->isc_txd_encap(ctx->ifc_softc, &pi); DBG_COUNTER_INC(tx_encap); /* prefetch for next round */ __builtin_prefetch(&ring->slot[nm_i + 1]); __builtin_prefetch(&txq->ift_sds.ifsd_m[nic_i + 1]); __builtin_prefetch(&txq->ift_sds.ifsd_map[nic_i + 1]); NM_CHECK_ADDR_LEN(na, addr, len); if (slot->flags & NS_BUF_CHANGED) { /* buffer has changed, reload map */ netmap_reload_map(na, txq->ift_buf_tag, txq->ift_sds.ifsd_map[nic_i], addr); } /* make sure changes to the buffer are synced */ bus_dmamap_sync(txq->ift_buf_tag, txq->ift_sds.ifsd_map[nic_i], BUS_DMASYNC_PREWRITE); slot->flags &= ~(NS_REPORT | NS_BUF_CHANGED); nm_i = nm_next(nm_i, lim); nic_i = nm_next(nic_i, lim); } kring->nr_hwcur = nm_i; /* synchronize the NIC ring */ bus_dmamap_sync(txq->ift_ifdi->idi_tag, txq->ift_ifdi->idi_map, BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE); /* (re)start the tx unit up to slot nic_i (excluded) */ ctx->isc_txd_flush(ctx->ifc_softc, txq->ift_id, nic_i); } /* * Second part: reclaim buffers for completed transmissions. * * If there are unclaimed buffers, attempt to reclaim them. 
* If none are reclaimed, and TX IRQs are not in use, do an initial * minimal delay, then trigger the tx handler which will spin in the * group task queue. */ if (kring->nr_hwtail != nm_prev(kring->nr_hwcur, lim)) { if (iflib_tx_credits_update(ctx, txq)) { /* some tx completed, increment avail */ nic_i = txq->ift_cidx_processed; kring->nr_hwtail = nm_prev(netmap_idx_n2k(kring, nic_i), lim); } } if (!(ctx->ifc_flags & IFC_NETMAP_TX_IRQ)) if (kring->nr_hwtail != nm_prev(kring->nr_hwcur, lim)) { callout_reset_on(&txq->ift_timer, hz < 2000 ? 1 : hz / 1000, iflib_timer, txq, txq->ift_timer.c_cpu); } return (0); } /* * Reconcile kernel and user view of the receive ring. * Same as for the txsync, this routine must be efficient. * The caller guarantees a single invocations, but races against * the rest of the driver should be handled here. * * On call, kring->rhead is the first packet that userspace wants * to keep, and kring->rcur is the wakeup point. * The kernel has previously reported packets up to kring->rtail. * * If (flags & NAF_FORCE_READ) also check for incoming packets irrespective * of whether or not we received an interrupt. */ static int iflib_netmap_rxsync(struct netmap_kring *kring, int flags) { struct netmap_adapter *na = kring->na; struct netmap_ring *ring = kring->ring; iflib_fl_t fl; uint32_t nm_i; /* index into the netmap ring */ uint32_t nic_i; /* index into the NIC ring */ u_int i, n; u_int const lim = kring->nkr_num_slots - 1; u_int const head = kring->rhead; int force_update = (flags & NAF_FORCE_READ) || kring->nr_kflags & NKR_PENDINTR; struct if_rxd_info ri; struct ifnet *ifp = na->ifp; if_ctx_t ctx = ifp->if_softc; iflib_rxq_t rxq = &ctx->ifc_rxqs[kring->ring_id]; if (head > lim) return netmap_ring_reinit(kring); /* * XXX netmap_fl_refill() only ever (re)fills free list 0 so far. */ for (i = 0, fl = rxq->ifr_fl; i < rxq->ifr_nfl; i++, fl++) { bus_dmamap_sync(fl->ifl_ifdi->idi_tag, fl->ifl_ifdi->idi_map, BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); } /* * First part: import newly received packets. * * nm_i is the index of the next free slot in the netmap ring, * nic_i is the index of the next received packet in the NIC ring, * and they may differ in case if_init() has been called while * in netmap mode. For the receive ring we have * * nic_i = rxr->next_check; * nm_i = kring->nr_hwtail (previous) * and * nm_i == (nic_i + kring->nkr_hwofs) % ring_size * * rxr->next_check is set to 0 on a ring reinit */ if (netmap_no_pendintr || force_update) { int crclen = iflib_crcstrip ? 0 : 4; int error, avail; for (i = 0; i < rxq->ifr_nfl; i++) { fl = &rxq->ifr_fl[i]; nic_i = fl->ifl_cidx; nm_i = netmap_idx_n2k(kring, nic_i); avail = ctx->isc_rxd_available(ctx->ifc_softc, rxq->ifr_id, nic_i, USHRT_MAX); for (n = 0; avail > 0; n++, avail--) { rxd_info_zero(&ri); ri.iri_frags = rxq->ifr_frags; ri.iri_qsidx = kring->ring_id; ri.iri_ifp = ctx->ifc_ifp; ri.iri_cidx = nic_i; error = ctx->isc_rxd_pkt_get(ctx->ifc_softc, &ri); ring->slot[nm_i].len = error ? 0 : ri.iri_len - crclen; ring->slot[nm_i].flags = 0; bus_dmamap_sync(fl->ifl_buf_tag, fl->ifl_sds.ifsd_map[nic_i], BUS_DMASYNC_POSTREAD); nm_i = nm_next(nm_i, lim); nic_i = nm_next(nic_i, lim); } if (n) { /* update the state variables */ if (netmap_no_pendintr && !force_update) { /* diagnostics */ iflib_rx_miss ++; iflib_rx_miss_bufs += n; } fl->ifl_cidx = nic_i; kring->nr_hwtail = nm_i; } kring->nr_kflags &= ~NKR_PENDINTR; } } /* * Second part: skip past packets that userspace has released. 
* (kring->nr_hwcur to head excluded), * and make the buffers available for reception. * As usual nm_i is the index in the netmap ring, * nic_i is the index in the NIC ring, and * nm_i == (nic_i + kring->nkr_hwofs) % ring_size */ /* XXX not sure how this will work with multiple free lists */ nm_i = kring->nr_hwcur; return (netmap_fl_refill(rxq, kring, nm_i, false)); } static void iflib_netmap_intr(struct netmap_adapter *na, int onoff) { struct ifnet *ifp = na->ifp; if_ctx_t ctx = ifp->if_softc; CTX_LOCK(ctx); if (onoff) { IFDI_INTR_ENABLE(ctx); } else { IFDI_INTR_DISABLE(ctx); } CTX_UNLOCK(ctx); } static int iflib_netmap_attach(if_ctx_t ctx) { struct netmap_adapter na; if_softc_ctx_t scctx = &ctx->ifc_softc_ctx; bzero(&na, sizeof(na)); na.ifp = ctx->ifc_ifp; na.na_flags = NAF_BDG_MAYSLEEP; MPASS(ctx->ifc_softc_ctx.isc_ntxqsets); MPASS(ctx->ifc_softc_ctx.isc_nrxqsets); na.num_tx_desc = scctx->isc_ntxd[0]; na.num_rx_desc = scctx->isc_nrxd[0]; na.nm_txsync = iflib_netmap_txsync; na.nm_rxsync = iflib_netmap_rxsync; na.nm_register = iflib_netmap_register; na.nm_intr = iflib_netmap_intr; na.num_tx_rings = ctx->ifc_softc_ctx.isc_ntxqsets; na.num_rx_rings = ctx->ifc_softc_ctx.isc_nrxqsets; return (netmap_attach(&na)); } static void iflib_netmap_txq_init(if_ctx_t ctx, iflib_txq_t txq) { struct netmap_adapter *na = NA(ctx->ifc_ifp); struct netmap_slot *slot; slot = netmap_reset(na, NR_TX, txq->ift_id, 0); if (slot == NULL) return; for (int i = 0; i < ctx->ifc_softc_ctx.isc_ntxd[0]; i++) { /* * In netmap mode, set the map for the packet buffer. * NOTE: Some drivers (not this one) also need to set * the physical buffer address in the NIC ring. * netmap_idx_n2k() maps a nic index, i, into the corresponding * netmap slot index, si */ int si = netmap_idx_n2k(na->tx_rings[txq->ift_id], i); netmap_load_map(na, txq->ift_buf_tag, txq->ift_sds.ifsd_map[i], NMB(na, slot + si)); } } static void iflib_netmap_rxq_init(if_ctx_t ctx, iflib_rxq_t rxq) { struct netmap_adapter *na = NA(ctx->ifc_ifp); struct netmap_kring *kring = na->rx_rings[rxq->ifr_id]; struct netmap_slot *slot; uint32_t nm_i; slot = netmap_reset(na, NR_RX, rxq->ifr_id, 0); if (slot == NULL) return; nm_i = netmap_idx_n2k(kring, 0); netmap_fl_refill(rxq, kring, nm_i, true); } static void iflib_netmap_timer_adjust(if_ctx_t ctx, iflib_txq_t txq, uint32_t *reset_on) { struct netmap_kring *kring; uint16_t txqid; txqid = txq->ift_id; kring = NA(ctx->ifc_ifp)->tx_rings[txqid]; if (kring->nr_hwcur != nm_next(kring->nr_hwtail, kring->nkr_num_slots - 1)) { bus_dmamap_sync(txq->ift_ifdi->idi_tag, txq->ift_ifdi->idi_map, BUS_DMASYNC_POSTREAD); if (ctx->isc_txd_credits_update(ctx->ifc_softc, txqid, false)) netmap_tx_irq(ctx->ifc_ifp, txqid); if (!(ctx->ifc_flags & IFC_NETMAP_TX_IRQ)) { if (hz < 2000) *reset_on = 1; else *reset_on = hz / 1000; } } } #define iflib_netmap_detach(ifp) netmap_detach(ifp) #else #define iflib_netmap_txq_init(ctx, txq) #define iflib_netmap_rxq_init(ctx, rxq) #define iflib_netmap_detach(ifp) #define iflib_netmap_attach(ctx) (0) #define netmap_rx_irq(ifp, qid, budget) (0) #define netmap_tx_irq(ifp, qid) do {} while (0) #define iflib_netmap_timer_adjust(ctx, txq, reset_on) #endif #if defined(__i386__) || defined(__amd64__) static __inline void prefetch(void *x) { __asm volatile("prefetcht0 %0" :: "m" (*(unsigned long *)x)); } static __inline void prefetch2cachelines(void *x) { __asm volatile("prefetcht0 %0" :: "m" (*(unsigned long *)x)); #if (CACHE_LINE_SIZE < 128) __asm volatile("prefetcht0 %0" :: "m" (*(((unsigned long 
*)x)+CACHE_LINE_SIZE/(sizeof(unsigned long))))); #endif } #else #define prefetch(x) #define prefetch2cachelines(x) #endif static void iru_init(if_rxd_update_t iru, iflib_rxq_t rxq, uint8_t flid) { iflib_fl_t fl; fl = &rxq->ifr_fl[flid]; iru->iru_paddrs = fl->ifl_bus_addrs; iru->iru_vaddrs = &fl->ifl_vm_addrs[0]; iru->iru_idxs = fl->ifl_rxd_idxs; iru->iru_qsidx = rxq->ifr_id; iru->iru_buf_size = fl->ifl_buf_size; iru->iru_flidx = fl->ifl_id; } static void _iflib_dmamap_cb(void *arg, bus_dma_segment_t *segs, int nseg, int err) { if (err) return; *(bus_addr_t *) arg = segs[0].ds_addr; } int iflib_dma_alloc_align(if_ctx_t ctx, int size, int align, iflib_dma_info_t dma, int mapflags) { int err; device_t dev = ctx->ifc_dev; err = bus_dma_tag_create(bus_get_dma_tag(dev), /* parent */ align, 0, /* alignment, bounds */ BUS_SPACE_MAXADDR, /* lowaddr */ BUS_SPACE_MAXADDR, /* highaddr */ NULL, NULL, /* filter, filterarg */ size, /* maxsize */ 1, /* nsegments */ size, /* maxsegsize */ BUS_DMA_ALLOCNOW, /* flags */ NULL, /* lockfunc */ NULL, /* lockarg */ &dma->idi_tag); if (err) { device_printf(dev, "%s: bus_dma_tag_create failed: %d\n", __func__, err); goto fail_0; } err = bus_dmamem_alloc(dma->idi_tag, (void**) &dma->idi_vaddr, BUS_DMA_NOWAIT | BUS_DMA_COHERENT | BUS_DMA_ZERO, &dma->idi_map); if (err) { device_printf(dev, "%s: bus_dmamem_alloc(%ju) failed: %d\n", __func__, (uintmax_t)size, err); goto fail_1; } dma->idi_paddr = IF_BAD_DMA; err = bus_dmamap_load(dma->idi_tag, dma->idi_map, dma->idi_vaddr, size, _iflib_dmamap_cb, &dma->idi_paddr, mapflags | BUS_DMA_NOWAIT); if (err || dma->idi_paddr == IF_BAD_DMA) { device_printf(dev, "%s: bus_dmamap_load failed: %d\n", __func__, err); goto fail_2; } dma->idi_size = size; return (0); fail_2: bus_dmamem_free(dma->idi_tag, dma->idi_vaddr, dma->idi_map); fail_1: bus_dma_tag_destroy(dma->idi_tag); fail_0: dma->idi_tag = NULL; return (err); } int iflib_dma_alloc(if_ctx_t ctx, int size, iflib_dma_info_t dma, int mapflags) { if_shared_ctx_t sctx = ctx->ifc_sctx; KASSERT(sctx->isc_q_align != 0, ("alignment value not initialized")); return (iflib_dma_alloc_align(ctx, size, sctx->isc_q_align, dma, mapflags)); } int iflib_dma_alloc_multi(if_ctx_t ctx, int *sizes, iflib_dma_info_t *dmalist, int mapflags, int count) { int i, err; iflib_dma_info_t *dmaiter; dmaiter = dmalist; for (i = 0; i < count; i++, dmaiter++) { if ((err = iflib_dma_alloc(ctx, sizes[i], *dmaiter, mapflags)) != 0) break; } if (err) iflib_dma_free_multi(dmalist, i); return (err); } void iflib_dma_free(iflib_dma_info_t dma) { if (dma->idi_tag == NULL) return; if (dma->idi_paddr != IF_BAD_DMA) { bus_dmamap_sync(dma->idi_tag, dma->idi_map, BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); bus_dmamap_unload(dma->idi_tag, dma->idi_map); dma->idi_paddr = IF_BAD_DMA; } if (dma->idi_vaddr != NULL) { bus_dmamem_free(dma->idi_tag, dma->idi_vaddr, dma->idi_map); dma->idi_vaddr = NULL; } bus_dma_tag_destroy(dma->idi_tag); dma->idi_tag = NULL; } void iflib_dma_free_multi(iflib_dma_info_t *dmalist, int count) { int i; iflib_dma_info_t *dmaiter = dmalist; for (i = 0; i < count; i++, dmaiter++) iflib_dma_free(*dmaiter); } #ifdef EARLY_AP_STARTUP static const int iflib_started = 1; #else /* * We used to abuse the smp_started flag to decide if the queues have been * fully initialized (by late taskqgroup_adjust() calls in a SYSINIT()). * That gave bad races, since the SYSINIT() runs strictly after smp_started * is set. Run a SYSINIT() strictly after that to just set a usable * completion flag. 
*/ static int iflib_started; static void iflib_record_started(void *arg) { iflib_started = 1; } SYSINIT(iflib_record_started, SI_SUB_SMP + 1, SI_ORDER_FIRST, iflib_record_started, NULL); #endif static int iflib_fast_intr(void *arg) { iflib_filter_info_t info = arg; struct grouptask *gtask = info->ifi_task; int result; if (!iflib_started) return (FILTER_STRAY); DBG_COUNTER_INC(fast_intrs); if (info->ifi_filter != NULL) { result = info->ifi_filter(info->ifi_filter_arg); if ((result & FILTER_SCHEDULE_THREAD) == 0) return (result); } GROUPTASK_ENQUEUE(gtask); return (FILTER_HANDLED); } static int iflib_fast_intr_rxtx(void *arg) { iflib_filter_info_t info = arg; struct grouptask *gtask = info->ifi_task; if_ctx_t ctx; iflib_rxq_t rxq = (iflib_rxq_t)info->ifi_ctx; iflib_txq_t txq; void *sc; int i, cidx, result; qidx_t txqid; if (!iflib_started) return (FILTER_STRAY); DBG_COUNTER_INC(fast_intrs); if (info->ifi_filter != NULL) { result = info->ifi_filter(info->ifi_filter_arg); if ((result & FILTER_SCHEDULE_THREAD) == 0) return (result); } ctx = rxq->ifr_ctx; sc = ctx->ifc_softc; MPASS(rxq->ifr_ntxqirq); for (i = 0; i < rxq->ifr_ntxqirq; i++) { txqid = rxq->ifr_txqid[i]; txq = &ctx->ifc_txqs[txqid]; bus_dmamap_sync(txq->ift_ifdi->idi_tag, txq->ift_ifdi->idi_map, BUS_DMASYNC_POSTREAD); if (!ctx->isc_txd_credits_update(sc, txqid, false)) { IFDI_TX_QUEUE_INTR_ENABLE(ctx, txqid); continue; } GROUPTASK_ENQUEUE(&txq->ift_task); } if (ctx->ifc_sctx->isc_flags & IFLIB_HAS_RXCQ) cidx = rxq->ifr_cq_cidx; else cidx = rxq->ifr_fl[0].ifl_cidx; if (iflib_rxd_avail(ctx, rxq, cidx, 1)) GROUPTASK_ENQUEUE(gtask); else { IFDI_RX_QUEUE_INTR_ENABLE(ctx, rxq->ifr_id); DBG_COUNTER_INC(rx_intr_enables); } return (FILTER_HANDLED); } static int iflib_fast_intr_ctx(void *arg) { iflib_filter_info_t info = arg; struct grouptask *gtask = info->ifi_task; int result; if (!iflib_started) return (FILTER_STRAY); DBG_COUNTER_INC(fast_intrs); if (info->ifi_filter != NULL) { result = info->ifi_filter(info->ifi_filter_arg); if ((result & FILTER_SCHEDULE_THREAD) == 0) return (result); } GROUPTASK_ENQUEUE(gtask); return (FILTER_HANDLED); } static int _iflib_irq_alloc(if_ctx_t ctx, if_irq_t irq, int rid, driver_filter_t filter, driver_intr_t handler, void *arg, const char *name) { int rc, flags; struct resource *res; void *tag = NULL; device_t dev = ctx->ifc_dev; flags = RF_ACTIVE; if (ctx->ifc_flags & IFC_LEGACY) flags |= RF_SHAREABLE; MPASS(rid < 512); irq->ii_rid = rid; res = bus_alloc_resource_any(dev, SYS_RES_IRQ, &irq->ii_rid, flags); if (res == NULL) { device_printf(dev, "failed to allocate IRQ for rid %d, name %s.\n", rid, name); return (ENOMEM); } irq->ii_res = res; KASSERT(filter == NULL || handler == NULL, ("filter and handler can't both be non-NULL")); rc = bus_setup_intr(dev, res, INTR_MPSAFE | INTR_TYPE_NET, filter, handler, arg, &tag); if (rc != 0) { device_printf(dev, "failed to setup interrupt for rid %d, name %s: %d\n", rid, name ? name : "unknown", rc); return (rc); } else if (name) bus_describe_intr(dev, res, tag, "%s", name); irq->ii_tag = tag; return (0); } /********************************************************************* * * Allocate DMA resources for TX buffers as well as memory for the TX * mbuf map. TX DMA maps (non-TSO/TSO) and TX mbuf map are kept in a * iflib_sw_tx_desc_array structure, storing all the information that * is needed to transmit a packet on the wire. This is called only * once at attach, setup is done every reset. 
* **********************************************************************/ static int iflib_txsd_alloc(iflib_txq_t txq) { if_ctx_t ctx = txq->ift_ctx; if_shared_ctx_t sctx = ctx->ifc_sctx; if_softc_ctx_t scctx = &ctx->ifc_softc_ctx; device_t dev = ctx->ifc_dev; bus_size_t tsomaxsize; int err, nsegments, ntsosegments; bool tso; nsegments = scctx->isc_tx_nsegments; ntsosegments = scctx->isc_tx_tso_segments_max; tsomaxsize = scctx->isc_tx_tso_size_max; if (if_getcapabilities(ctx->ifc_ifp) & IFCAP_VLAN_MTU) tsomaxsize += sizeof(struct ether_vlan_header); MPASS(scctx->isc_ntxd[0] > 0); MPASS(scctx->isc_ntxd[txq->ift_br_offset] > 0); MPASS(nsegments > 0); if (if_getcapabilities(ctx->ifc_ifp) & IFCAP_TSO) { MPASS(ntsosegments > 0); MPASS(sctx->isc_tso_maxsize >= tsomaxsize); } /* * Set up DMA tags for TX buffers. */ if ((err = bus_dma_tag_create(bus_get_dma_tag(dev), 1, 0, /* alignment, bounds */ BUS_SPACE_MAXADDR, /* lowaddr */ BUS_SPACE_MAXADDR, /* highaddr */ NULL, NULL, /* filter, filterarg */ sctx->isc_tx_maxsize, /* maxsize */ nsegments, /* nsegments */ sctx->isc_tx_maxsegsize, /* maxsegsize */ 0, /* flags */ NULL, /* lockfunc */ NULL, /* lockfuncarg */ &txq->ift_buf_tag))) { device_printf(dev,"Unable to allocate TX DMA tag: %d\n", err); device_printf(dev,"maxsize: %ju nsegments: %d maxsegsize: %ju\n", (uintmax_t)sctx->isc_tx_maxsize, nsegments, (uintmax_t)sctx->isc_tx_maxsegsize); goto fail; } tso = (if_getcapabilities(ctx->ifc_ifp) & IFCAP_TSO) != 0; if (tso && (err = bus_dma_tag_create(bus_get_dma_tag(dev), 1, 0, /* alignment, bounds */ BUS_SPACE_MAXADDR, /* lowaddr */ BUS_SPACE_MAXADDR, /* highaddr */ NULL, NULL, /* filter, filterarg */ tsomaxsize, /* maxsize */ ntsosegments, /* nsegments */ sctx->isc_tso_maxsegsize,/* maxsegsize */ 0, /* flags */ NULL, /* lockfunc */ NULL, /* lockfuncarg */ &txq->ift_tso_buf_tag))) { device_printf(dev, "Unable to allocate TSO TX DMA tag: %d\n", err); goto fail; } /* Allocate memory for the TX mbuf map. */ if (!(txq->ift_sds.ifsd_m = (struct mbuf **) malloc(sizeof(struct mbuf *) * scctx->isc_ntxd[txq->ift_br_offset], M_IFLIB, M_NOWAIT | M_ZERO))) { device_printf(dev, "Unable to allocate TX mbuf map memory\n"); err = ENOMEM; goto fail; } /* * Create the DMA maps for TX buffers. 
*/ if ((txq->ift_sds.ifsd_map = (bus_dmamap_t *)malloc( sizeof(bus_dmamap_t) * scctx->isc_ntxd[txq->ift_br_offset], M_IFLIB, M_NOWAIT | M_ZERO)) == NULL) { device_printf(dev, "Unable to allocate TX buffer DMA map memory\n"); err = ENOMEM; goto fail; } if (tso && (txq->ift_sds.ifsd_tso_map = (bus_dmamap_t *)malloc( sizeof(bus_dmamap_t) * scctx->isc_ntxd[txq->ift_br_offset], M_IFLIB, M_NOWAIT | M_ZERO)) == NULL) { device_printf(dev, "Unable to allocate TSO TX buffer map memory\n"); err = ENOMEM; goto fail; } for (int i = 0; i < scctx->isc_ntxd[txq->ift_br_offset]; i++) { err = bus_dmamap_create(txq->ift_buf_tag, 0, &txq->ift_sds.ifsd_map[i]); if (err != 0) { device_printf(dev, "Unable to create TX DMA map\n"); goto fail; } if (!tso) continue; err = bus_dmamap_create(txq->ift_tso_buf_tag, 0, &txq->ift_sds.ifsd_tso_map[i]); if (err != 0) { device_printf(dev, "Unable to create TSO TX DMA map\n"); goto fail; } } return (0); fail: /* We free all, it handles case where we are in the middle */ iflib_tx_structures_free(ctx); return (err); } static void iflib_txsd_destroy(if_ctx_t ctx, iflib_txq_t txq, int i) { bus_dmamap_t map; map = NULL; if (txq->ift_sds.ifsd_map != NULL) map = txq->ift_sds.ifsd_map[i]; if (map != NULL) { bus_dmamap_sync(txq->ift_buf_tag, map, BUS_DMASYNC_POSTWRITE); bus_dmamap_unload(txq->ift_buf_tag, map); bus_dmamap_destroy(txq->ift_buf_tag, map); txq->ift_sds.ifsd_map[i] = NULL; } map = NULL; if (txq->ift_sds.ifsd_tso_map != NULL) map = txq->ift_sds.ifsd_tso_map[i]; if (map != NULL) { bus_dmamap_sync(txq->ift_tso_buf_tag, map, BUS_DMASYNC_POSTWRITE); bus_dmamap_unload(txq->ift_tso_buf_tag, map); bus_dmamap_destroy(txq->ift_tso_buf_tag, map); txq->ift_sds.ifsd_tso_map[i] = NULL; } } static void iflib_txq_destroy(iflib_txq_t txq) { if_ctx_t ctx = txq->ift_ctx; for (int i = 0; i < txq->ift_size; i++) iflib_txsd_destroy(ctx, txq, i); if (txq->ift_sds.ifsd_map != NULL) { free(txq->ift_sds.ifsd_map, M_IFLIB); txq->ift_sds.ifsd_map = NULL; } if (txq->ift_sds.ifsd_tso_map != NULL) { free(txq->ift_sds.ifsd_tso_map, M_IFLIB); txq->ift_sds.ifsd_tso_map = NULL; } if (txq->ift_sds.ifsd_m != NULL) { free(txq->ift_sds.ifsd_m, M_IFLIB); txq->ift_sds.ifsd_m = NULL; } if (txq->ift_buf_tag != NULL) { bus_dma_tag_destroy(txq->ift_buf_tag); txq->ift_buf_tag = NULL; } if (txq->ift_tso_buf_tag != NULL) { bus_dma_tag_destroy(txq->ift_tso_buf_tag); txq->ift_tso_buf_tag = NULL; } } static void iflib_txsd_free(if_ctx_t ctx, iflib_txq_t txq, int i) { struct mbuf **mp; mp = &txq->ift_sds.ifsd_m[i]; if (*mp == NULL) return; if (txq->ift_sds.ifsd_map != NULL) { bus_dmamap_sync(txq->ift_buf_tag, txq->ift_sds.ifsd_map[i], BUS_DMASYNC_POSTWRITE); bus_dmamap_unload(txq->ift_buf_tag, txq->ift_sds.ifsd_map[i]); } if (txq->ift_sds.ifsd_tso_map != NULL) { bus_dmamap_sync(txq->ift_tso_buf_tag, txq->ift_sds.ifsd_tso_map[i], BUS_DMASYNC_POSTWRITE); bus_dmamap_unload(txq->ift_tso_buf_tag, txq->ift_sds.ifsd_tso_map[i]); } m_free(*mp); DBG_COUNTER_INC(tx_frees); *mp = NULL; } static int iflib_txq_setup(iflib_txq_t txq) { if_ctx_t ctx = txq->ift_ctx; if_softc_ctx_t scctx = &ctx->ifc_softc_ctx; if_shared_ctx_t sctx = ctx->ifc_sctx; iflib_dma_info_t di; int i; /* Set number of descriptors available */ txq->ift_qstatus = IFLIB_QUEUE_IDLE; /* XXX make configurable */ txq->ift_update_freq = IFLIB_DEFAULT_TX_UPDATE_FREQ; /* Reset indices */ txq->ift_cidx_processed = 0; txq->ift_pidx = txq->ift_cidx = txq->ift_npending = 0; txq->ift_size = scctx->isc_ntxd[txq->ift_br_offset]; for (i = 0, di = txq->ift_ifdi; i < sctx->isc_ntxqs; 
i++, di++) bzero((void *)di->idi_vaddr, di->idi_size); IFDI_TXQ_SETUP(ctx, txq->ift_id); for (i = 0, di = txq->ift_ifdi; i < sctx->isc_ntxqs; i++, di++) bus_dmamap_sync(di->idi_tag, di->idi_map, BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE); return (0); } /********************************************************************* * * Allocate DMA resources for RX buffers as well as memory for the RX * mbuf map, direct RX cluster pointer map and RX cluster bus address * map. RX DMA map, RX mbuf map, direct RX cluster pointer map and * RX cluster map are kept in a iflib_sw_rx_desc_array structure. * Since we use use one entry in iflib_sw_rx_desc_array per received * packet, the maximum number of entries we'll need is equal to the * number of hardware receive descriptors that we've allocated. * **********************************************************************/ static int iflib_rxsd_alloc(iflib_rxq_t rxq) { if_ctx_t ctx = rxq->ifr_ctx; if_shared_ctx_t sctx = ctx->ifc_sctx; if_softc_ctx_t scctx = &ctx->ifc_softc_ctx; device_t dev = ctx->ifc_dev; iflib_fl_t fl; int err; MPASS(scctx->isc_nrxd[0] > 0); MPASS(scctx->isc_nrxd[rxq->ifr_fl_offset] > 0); fl = rxq->ifr_fl; for (int i = 0; i < rxq->ifr_nfl; i++, fl++) { fl->ifl_size = scctx->isc_nrxd[rxq->ifr_fl_offset]; /* this isn't necessarily the same */ /* Set up DMA tag for RX buffers. */ err = bus_dma_tag_create(bus_get_dma_tag(dev), /* parent */ 1, 0, /* alignment, bounds */ BUS_SPACE_MAXADDR, /* lowaddr */ BUS_SPACE_MAXADDR, /* highaddr */ NULL, NULL, /* filter, filterarg */ sctx->isc_rx_maxsize, /* maxsize */ sctx->isc_rx_nsegments, /* nsegments */ sctx->isc_rx_maxsegsize, /* maxsegsize */ 0, /* flags */ NULL, /* lockfunc */ NULL, /* lockarg */ &fl->ifl_buf_tag); if (err) { device_printf(dev, "Unable to allocate RX DMA tag: %d\n", err); goto fail; } /* Allocate memory for the RX mbuf map. */ if (!(fl->ifl_sds.ifsd_m = (struct mbuf **) malloc(sizeof(struct mbuf *) * scctx->isc_nrxd[rxq->ifr_fl_offset], M_IFLIB, M_NOWAIT | M_ZERO))) { device_printf(dev, "Unable to allocate RX mbuf map memory\n"); err = ENOMEM; goto fail; } /* Allocate memory for the direct RX cluster pointer map. */ if (!(fl->ifl_sds.ifsd_cl = (caddr_t *) malloc(sizeof(caddr_t) * scctx->isc_nrxd[rxq->ifr_fl_offset], M_IFLIB, M_NOWAIT | M_ZERO))) { device_printf(dev, "Unable to allocate RX cluster map memory\n"); err = ENOMEM; goto fail; } /* Allocate memory for the RX cluster bus address map. */ if (!(fl->ifl_sds.ifsd_ba = (bus_addr_t *) malloc(sizeof(bus_addr_t) * scctx->isc_nrxd[rxq->ifr_fl_offset], M_IFLIB, M_NOWAIT | M_ZERO))) { device_printf(dev, "Unable to allocate RX bus address map memory\n"); err = ENOMEM; goto fail; } /* * Create the DMA maps for RX buffers. 
*/ if (!(fl->ifl_sds.ifsd_map = (bus_dmamap_t *) malloc(sizeof(bus_dmamap_t) * scctx->isc_nrxd[rxq->ifr_fl_offset], M_IFLIB, M_NOWAIT | M_ZERO))) { device_printf(dev, "Unable to allocate RX buffer DMA map memory\n"); err = ENOMEM; goto fail; } for (int i = 0; i < scctx->isc_nrxd[rxq->ifr_fl_offset]; i++) { err = bus_dmamap_create(fl->ifl_buf_tag, 0, &fl->ifl_sds.ifsd_map[i]); if (err != 0) { device_printf(dev, "Unable to create RX buffer DMA map\n"); goto fail; } } } return (0); fail: iflib_rx_structures_free(ctx); return (err); } /* * Internal service routines */ struct rxq_refill_cb_arg { int error; bus_dma_segment_t seg; int nseg; }; static void _rxq_refill_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error) { struct rxq_refill_cb_arg *cb_arg = arg; cb_arg->error = error; cb_arg->seg = segs[0]; cb_arg->nseg = nseg; } /** * rxq_refill - refill an rxq free-buffer list * @ctx: the iflib context * @rxq: the free-list to refill * @n: the number of new buffers to allocate * * (Re)populate an rxq free-buffer list with up to @n new packet buffers. * The caller must assure that @n does not exceed the queue's capacity. */ static void _iflib_fl_refill(if_ctx_t ctx, iflib_fl_t fl, int count) { struct if_rxd_update iru; struct rxq_refill_cb_arg cb_arg; struct mbuf *m; caddr_t cl, *sd_cl; struct mbuf **sd_m; bus_dmamap_t *sd_map; bus_addr_t bus_addr, *sd_ba; int err, frag_idx, i, idx, n, pidx; qidx_t credits; sd_m = fl->ifl_sds.ifsd_m; sd_map = fl->ifl_sds.ifsd_map; sd_cl = fl->ifl_sds.ifsd_cl; sd_ba = fl->ifl_sds.ifsd_ba; pidx = fl->ifl_pidx; idx = pidx; frag_idx = fl->ifl_fragidx; credits = fl->ifl_credits; i = 0; n = count; MPASS(n > 0); MPASS(credits + n <= fl->ifl_size); if (pidx < fl->ifl_cidx) MPASS(pidx + n <= fl->ifl_cidx); if (pidx == fl->ifl_cidx && (credits < fl->ifl_size)) MPASS(fl->ifl_gen == 0); if (pidx > fl->ifl_cidx) MPASS(n <= fl->ifl_size - pidx + fl->ifl_cidx); DBG_COUNTER_INC(fl_refills); if (n > 8) DBG_COUNTER_INC(fl_refills_large); iru_init(&iru, fl->ifl_rxq, fl->ifl_id); while (n--) { /* * We allocate an uninitialized mbuf + cluster, mbuf is * initialized after rx. * * If the cluster is still set then we know a minimum sized packet was received */ bit_ffc_at(fl->ifl_rx_bitmap, frag_idx, fl->ifl_size, &frag_idx); if (frag_idx < 0) bit_ffc(fl->ifl_rx_bitmap, fl->ifl_size, &frag_idx); MPASS(frag_idx >= 0); if ((cl = sd_cl[frag_idx]) == NULL) { if ((cl = m_cljget(NULL, M_NOWAIT, fl->ifl_buf_size)) == NULL) break; cb_arg.error = 0; MPASS(sd_map != NULL); err = bus_dmamap_load(fl->ifl_buf_tag, sd_map[frag_idx], cl, fl->ifl_buf_size, _rxq_refill_cb, &cb_arg, BUS_DMA_NOWAIT); if (err != 0 || cb_arg.error) { /* * !zone_pack ? 
*/ if (fl->ifl_zone == zone_pack) uma_zfree(fl->ifl_zone, cl); break; } sd_ba[frag_idx] = bus_addr = cb_arg.seg.ds_addr; sd_cl[frag_idx] = cl; #if MEMORY_LOGGING fl->ifl_cl_enqueued++; #endif } else { bus_addr = sd_ba[frag_idx]; } bus_dmamap_sync(fl->ifl_buf_tag, sd_map[frag_idx], BUS_DMASYNC_PREREAD); - MPASS(sd_m[frag_idx] == NULL); - if ((m = m_gethdr(M_NOWAIT, MT_NOINIT)) == NULL) { - break; + if (sd_m[frag_idx] == NULL) { + if ((m = m_gethdr(M_NOWAIT, MT_NOINIT)) == NULL) { + break; + } + sd_m[frag_idx] = m; } - sd_m[frag_idx] = m; bit_set(fl->ifl_rx_bitmap, frag_idx); #if MEMORY_LOGGING fl->ifl_m_enqueued++; #endif DBG_COUNTER_INC(rx_allocs); fl->ifl_rxd_idxs[i] = frag_idx; fl->ifl_bus_addrs[i] = bus_addr; fl->ifl_vm_addrs[i] = cl; credits++; i++; MPASS(credits <= fl->ifl_size); if (++idx == fl->ifl_size) { fl->ifl_gen = 1; idx = 0; } if (n == 0 || i == IFLIB_MAX_RX_REFRESH) { iru.iru_pidx = pidx; iru.iru_count = i; ctx->isc_rxd_refill(ctx->ifc_softc, &iru); i = 0; pidx = idx; fl->ifl_pidx = idx; fl->ifl_credits = credits; } } if (i) { iru.iru_pidx = pidx; iru.iru_count = i; ctx->isc_rxd_refill(ctx->ifc_softc, &iru); fl->ifl_pidx = idx; fl->ifl_credits = credits; } DBG_COUNTER_INC(rxd_flush); if (fl->ifl_pidx == 0) pidx = fl->ifl_size - 1; else pidx = fl->ifl_pidx - 1; bus_dmamap_sync(fl->ifl_ifdi->idi_tag, fl->ifl_ifdi->idi_map, BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE); ctx->isc_rxd_flush(ctx->ifc_softc, fl->ifl_rxq->ifr_id, fl->ifl_id, pidx); fl->ifl_fragidx = frag_idx; } static __inline void __iflib_fl_refill_lt(if_ctx_t ctx, iflib_fl_t fl, int max) { /* we avoid allowing pidx to catch up with cidx as it confuses ixl */ int32_t reclaimable = fl->ifl_size - fl->ifl_credits - 1; #ifdef INVARIANTS int32_t delta = fl->ifl_size - get_inuse(fl->ifl_size, fl->ifl_cidx, fl->ifl_pidx, fl->ifl_gen) - 1; #endif MPASS(fl->ifl_credits <= fl->ifl_size); MPASS(reclaimable == delta); if (reclaimable > 0) _iflib_fl_refill(ctx, fl, min(max, reclaimable)); } uint8_t iflib_in_detach(if_ctx_t ctx) { bool in_detach; STATE_LOCK(ctx); in_detach = !!(ctx->ifc_flags & IFC_IN_DETACH); STATE_UNLOCK(ctx); return (in_detach); } static void iflib_fl_bufs_free(iflib_fl_t fl) { iflib_dma_info_t idi = fl->ifl_ifdi; bus_dmamap_t sd_map; uint32_t i; for (i = 0; i < fl->ifl_size; i++) { struct mbuf **sd_m = &fl->ifl_sds.ifsd_m[i]; caddr_t *sd_cl = &fl->ifl_sds.ifsd_cl[i]; if (*sd_cl != NULL) { sd_map = fl->ifl_sds.ifsd_map[i]; bus_dmamap_sync(fl->ifl_buf_tag, sd_map, BUS_DMASYNC_POSTREAD); bus_dmamap_unload(fl->ifl_buf_tag, sd_map); if (*sd_cl != NULL) uma_zfree(fl->ifl_zone, *sd_cl); // XXX: Should this get moved out? if (iflib_in_detach(fl->ifl_rxq->ifr_ctx)) bus_dmamap_destroy(fl->ifl_buf_tag, sd_map); if (*sd_m != NULL) { m_init(*sd_m, M_NOWAIT, MT_DATA, 0); uma_zfree(zone_mbuf, *sd_m); } } else { MPASS(*sd_cl == NULL); MPASS(*sd_m == NULL); } #if MEMORY_LOGGING fl->ifl_m_dequeued++; fl->ifl_cl_dequeued++; #endif *sd_cl = NULL; *sd_m = NULL; } #ifdef INVARIANTS for (i = 0; i < fl->ifl_size; i++) { MPASS(fl->ifl_sds.ifsd_cl[i] == NULL); MPASS(fl->ifl_sds.ifsd_m[i] == NULL); } #endif /* * Reset free list values */ fl->ifl_credits = fl->ifl_cidx = fl->ifl_pidx = fl->ifl_gen = fl->ifl_fragidx = 0; bzero(idi->idi_vaddr, idi->idi_size); } /********************************************************************* * * Initialize a receive ring and its buffers. 
* **********************************************************************/ static int iflib_fl_setup(iflib_fl_t fl) { iflib_rxq_t rxq = fl->ifl_rxq; if_ctx_t ctx = rxq->ifr_ctx; bit_nclear(fl->ifl_rx_bitmap, 0, fl->ifl_size - 1); /* ** Free current RX buffer structs and their mbufs */ iflib_fl_bufs_free(fl); /* Now replenish the mbufs */ MPASS(fl->ifl_credits == 0); fl->ifl_buf_size = ctx->ifc_rx_mbuf_sz; if (fl->ifl_buf_size > ctx->ifc_max_fl_buf_size) ctx->ifc_max_fl_buf_size = fl->ifl_buf_size; fl->ifl_cltype = m_gettype(fl->ifl_buf_size); fl->ifl_zone = m_getzone(fl->ifl_buf_size); /* avoid pre-allocating zillions of clusters to an idle card * potentially speeding up attach */ _iflib_fl_refill(ctx, fl, min(128, fl->ifl_size)); MPASS(min(128, fl->ifl_size) == fl->ifl_credits); if (min(128, fl->ifl_size) != fl->ifl_credits) return (ENOBUFS); /* * handle failure */ MPASS(rxq != NULL); MPASS(fl->ifl_ifdi != NULL); bus_dmamap_sync(fl->ifl_ifdi->idi_tag, fl->ifl_ifdi->idi_map, BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE); return (0); } /********************************************************************* * * Free receive ring data structures * **********************************************************************/ static void iflib_rx_sds_free(iflib_rxq_t rxq) { iflib_fl_t fl; int i, j; if (rxq->ifr_fl != NULL) { for (i = 0; i < rxq->ifr_nfl; i++) { fl = &rxq->ifr_fl[i]; if (fl->ifl_buf_tag != NULL) { if (fl->ifl_sds.ifsd_map != NULL) { for (j = 0; j < fl->ifl_size; j++) { if (fl->ifl_sds.ifsd_map[j] == NULL) continue; bus_dmamap_sync( fl->ifl_buf_tag, fl->ifl_sds.ifsd_map[j], BUS_DMASYNC_POSTREAD); bus_dmamap_unload( fl->ifl_buf_tag, fl->ifl_sds.ifsd_map[j]); } } bus_dma_tag_destroy(fl->ifl_buf_tag); fl->ifl_buf_tag = NULL; } free(fl->ifl_sds.ifsd_m, M_IFLIB); free(fl->ifl_sds.ifsd_cl, M_IFLIB); free(fl->ifl_sds.ifsd_ba, M_IFLIB); free(fl->ifl_sds.ifsd_map, M_IFLIB); fl->ifl_sds.ifsd_m = NULL; fl->ifl_sds.ifsd_cl = NULL; fl->ifl_sds.ifsd_ba = NULL; fl->ifl_sds.ifsd_map = NULL; } free(rxq->ifr_fl, M_IFLIB); rxq->ifr_fl = NULL; rxq->ifr_cq_gen = rxq->ifr_cq_cidx = rxq->ifr_cq_pidx = 0; } } /* * MI independent logic * */ static void iflib_timer(void *arg) { iflib_txq_t txq = arg; if_ctx_t ctx = txq->ift_ctx; if_softc_ctx_t sctx = &ctx->ifc_softc_ctx; uint64_t this_tick = ticks; uint32_t reset_on = hz / 2; if (!(if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING)) return; /* ** Check on the state of the TX queue(s), this ** can be done without the lock because its RO ** and the HUNG state will be static if set. 
*/ if (this_tick - txq->ift_last_timer_tick >= hz / 2) { txq->ift_last_timer_tick = this_tick; IFDI_TIMER(ctx, txq->ift_id); if ((txq->ift_qstatus == IFLIB_QUEUE_HUNG) && ((txq->ift_cleaned_prev == txq->ift_cleaned) || (sctx->isc_pause_frames == 0))) goto hung; if (ifmp_ring_is_stalled(txq->ift_br)) txq->ift_qstatus = IFLIB_QUEUE_HUNG; txq->ift_cleaned_prev = txq->ift_cleaned; } #ifdef DEV_NETMAP if (if_getcapenable(ctx->ifc_ifp) & IFCAP_NETMAP) iflib_netmap_timer_adjust(ctx, txq, &reset_on); #endif /* handle any laggards */ if (txq->ift_db_pending) GROUPTASK_ENQUEUE(&txq->ift_task); sctx->isc_pause_frames = 0; if (if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING) callout_reset_on(&txq->ift_timer, reset_on, iflib_timer, txq, txq->ift_timer.c_cpu); return; hung: device_printf(ctx->ifc_dev, "TX(%d) desc avail = %d, pidx = %d\n", txq->ift_id, TXQ_AVAIL(txq), txq->ift_pidx); STATE_LOCK(ctx); if_setdrvflagbits(ctx->ifc_ifp, IFF_DRV_OACTIVE, IFF_DRV_RUNNING); ctx->ifc_flags |= (IFC_DO_WATCHDOG|IFC_DO_RESET); iflib_admin_intr_deferred(ctx); STATE_UNLOCK(ctx); } static void iflib_calc_rx_mbuf_sz(if_ctx_t ctx) { if_softc_ctx_t sctx = &ctx->ifc_softc_ctx; /* * XXX don't set the max_frame_size to larger * than the hardware can handle */ if (sctx->isc_max_frame_size <= MCLBYTES) ctx->ifc_rx_mbuf_sz = MCLBYTES; else ctx->ifc_rx_mbuf_sz = MJUMPAGESIZE; } uint32_t iflib_get_rx_mbuf_sz(if_ctx_t ctx) { return (ctx->ifc_rx_mbuf_sz); } static void iflib_init_locked(if_ctx_t ctx) { if_softc_ctx_t sctx = &ctx->ifc_softc_ctx; if_softc_ctx_t scctx = &ctx->ifc_softc_ctx; if_t ifp = ctx->ifc_ifp; iflib_fl_t fl; iflib_txq_t txq; iflib_rxq_t rxq; int i, j, tx_ip_csum_flags, tx_ip6_csum_flags; if_setdrvflagbits(ifp, IFF_DRV_OACTIVE, IFF_DRV_RUNNING); IFDI_INTR_DISABLE(ctx); tx_ip_csum_flags = scctx->isc_tx_csum_flags & (CSUM_IP | CSUM_TCP | CSUM_UDP | CSUM_SCTP); tx_ip6_csum_flags = scctx->isc_tx_csum_flags & (CSUM_IP6_TCP | CSUM_IP6_UDP | CSUM_IP6_SCTP); /* Set hardware offload abilities */ if_clearhwassist(ifp); if (if_getcapenable(ifp) & IFCAP_TXCSUM) if_sethwassistbits(ifp, tx_ip_csum_flags, 0); if (if_getcapenable(ifp) & IFCAP_TXCSUM_IPV6) if_sethwassistbits(ifp, tx_ip6_csum_flags, 0); if (if_getcapenable(ifp) & IFCAP_TSO4) if_sethwassistbits(ifp, CSUM_IP_TSO, 0); if (if_getcapenable(ifp) & IFCAP_TSO6) if_sethwassistbits(ifp, CSUM_IP6_TSO, 0); for (i = 0, txq = ctx->ifc_txqs; i < sctx->isc_ntxqsets; i++, txq++) { CALLOUT_LOCK(txq); callout_stop(&txq->ift_timer); CALLOUT_UNLOCK(txq); iflib_netmap_txq_init(ctx, txq); } /* * Calculate a suitable Rx mbuf size prior to calling IFDI_INIT, so * that drivers can use the value when setting up the hardware receive * buffers. 
*/ iflib_calc_rx_mbuf_sz(ctx); #ifdef INVARIANTS i = if_getdrvflags(ifp); #endif IFDI_INIT(ctx); MPASS(if_getdrvflags(ifp) == i); for (i = 0, rxq = ctx->ifc_rxqs; i < sctx->isc_nrxqsets; i++, rxq++) { /* XXX this should really be done on a per-queue basis */ if (if_getcapenable(ifp) & IFCAP_NETMAP) { MPASS(rxq->ifr_id == i); iflib_netmap_rxq_init(ctx, rxq); continue; } for (j = 0, fl = rxq->ifr_fl; j < rxq->ifr_nfl; j++, fl++) { if (iflib_fl_setup(fl)) { device_printf(ctx->ifc_dev, "freelist setup failed - check cluster settings\n"); goto done; } } } done: if_setdrvflagbits(ctx->ifc_ifp, IFF_DRV_RUNNING, IFF_DRV_OACTIVE); IFDI_INTR_ENABLE(ctx); txq = ctx->ifc_txqs; for (i = 0; i < sctx->isc_ntxqsets; i++, txq++) callout_reset_on(&txq->ift_timer, hz/2, iflib_timer, txq, txq->ift_timer.c_cpu); } static int iflib_media_change(if_t ifp) { if_ctx_t ctx = if_getsoftc(ifp); int err; CTX_LOCK(ctx); if ((err = IFDI_MEDIA_CHANGE(ctx)) == 0) iflib_init_locked(ctx); CTX_UNLOCK(ctx); return (err); } static void iflib_media_status(if_t ifp, struct ifmediareq *ifmr) { if_ctx_t ctx = if_getsoftc(ifp); CTX_LOCK(ctx); IFDI_UPDATE_ADMIN_STATUS(ctx); IFDI_MEDIA_STATUS(ctx, ifmr); CTX_UNLOCK(ctx); } void iflib_stop(if_ctx_t ctx) { iflib_txq_t txq = ctx->ifc_txqs; iflib_rxq_t rxq = ctx->ifc_rxqs; if_softc_ctx_t scctx = &ctx->ifc_softc_ctx; if_shared_ctx_t sctx = ctx->ifc_sctx; iflib_dma_info_t di; iflib_fl_t fl; int i, j; /* Tell the stack that the interface is no longer active */ if_setdrvflagbits(ctx->ifc_ifp, IFF_DRV_OACTIVE, IFF_DRV_RUNNING); IFDI_INTR_DISABLE(ctx); DELAY(1000); IFDI_STOP(ctx); DELAY(1000); iflib_debug_reset(); /* Wait for current tx queue users to exit to disarm watchdog timer. */ for (i = 0; i < scctx->isc_ntxqsets; i++, txq++) { /* make sure all transmitters have completed before proceeding XXX */ CALLOUT_LOCK(txq); callout_stop(&txq->ift_timer); CALLOUT_UNLOCK(txq); /* clean any enqueued buffers */ iflib_ifmp_purge(txq); /* Free any existing tx buffers. */ for (j = 0; j < txq->ift_size; j++) { iflib_txsd_free(ctx, txq, j); } txq->ift_processed = txq->ift_cleaned = txq->ift_cidx_processed = 0; txq->ift_in_use = txq->ift_gen = txq->ift_cidx = txq->ift_pidx = txq->ift_no_desc_avail = 0; txq->ift_closed = txq->ift_mbuf_defrag = txq->ift_mbuf_defrag_failed = 0; txq->ift_no_tx_dma_setup = txq->ift_txd_encap_efbig = txq->ift_map_failed = 0; txq->ift_pullups = 0; ifmp_ring_reset_stats(txq->ift_br); for (j = 0, di = txq->ift_ifdi; j < sctx->isc_ntxqs; j++, di++) bzero((void *)di->idi_vaddr, di->idi_size); } for (i = 0; i < scctx->isc_nrxqsets; i++, rxq++) { /* make sure all transmitters have completed before proceeding XXX */ rxq->ifr_cq_gen = rxq->ifr_cq_cidx = rxq->ifr_cq_pidx = 0; for (j = 0, di = rxq->ifr_ifdi; j < sctx->isc_nrxqs; j++, di++) bzero((void *)di->idi_vaddr, di->idi_size); /* also resets the free lists pidx/cidx */ for (j = 0, fl = rxq->ifr_fl; j < rxq->ifr_nfl; j++, fl++) iflib_fl_bufs_free(fl); } } static inline caddr_t calc_next_rxd(iflib_fl_t fl, int cidx) { qidx_t size; int nrxd; caddr_t start, end, cur, next; nrxd = fl->ifl_size; size = fl->ifl_rxd_size; start = fl->ifl_ifdi->idi_vaddr; if (__predict_false(size == 0)) return (start); cur = start + size*cidx; end = start + size*nrxd; next = CACHE_PTR_NEXT(cur); return (next < end ? 
next : start); } static inline void prefetch_pkts(iflib_fl_t fl, int cidx) { int nextptr; int nrxd = fl->ifl_size; caddr_t next_rxd; nextptr = (cidx + CACHE_PTR_INCREMENT) & (nrxd-1); prefetch(&fl->ifl_sds.ifsd_m[nextptr]); prefetch(&fl->ifl_sds.ifsd_cl[nextptr]); next_rxd = calc_next_rxd(fl, cidx); prefetch(next_rxd); prefetch(fl->ifl_sds.ifsd_m[(cidx + 1) & (nrxd-1)]); prefetch(fl->ifl_sds.ifsd_m[(cidx + 2) & (nrxd-1)]); prefetch(fl->ifl_sds.ifsd_m[(cidx + 3) & (nrxd-1)]); prefetch(fl->ifl_sds.ifsd_m[(cidx + 4) & (nrxd-1)]); prefetch(fl->ifl_sds.ifsd_cl[(cidx + 1) & (nrxd-1)]); prefetch(fl->ifl_sds.ifsd_cl[(cidx + 2) & (nrxd-1)]); prefetch(fl->ifl_sds.ifsd_cl[(cidx + 3) & (nrxd-1)]); prefetch(fl->ifl_sds.ifsd_cl[(cidx + 4) & (nrxd-1)]); } -static void -rxd_frag_to_sd(iflib_rxq_t rxq, if_rxd_frag_t irf, int unload, if_rxsd_t sd) +static struct mbuf * +rxd_frag_to_sd(iflib_rxq_t rxq, if_rxd_frag_t irf, bool unload, if_rxsd_t sd, + int *pf_rv, if_rxd_info_t ri) { - int flid, cidx; bus_dmamap_t map; iflib_fl_t fl; - int next; + caddr_t payload; + struct mbuf *m; + int flid, cidx, len, next; map = NULL; flid = irf->irf_flid; cidx = irf->irf_idx; fl = &rxq->ifr_fl[flid]; sd->ifsd_fl = fl; sd->ifsd_cidx = cidx; - sd->ifsd_m = &fl->ifl_sds.ifsd_m[cidx]; + m = fl->ifl_sds.ifsd_m[cidx]; sd->ifsd_cl = &fl->ifl_sds.ifsd_cl[cidx]; fl->ifl_credits--; #if MEMORY_LOGGING fl->ifl_m_dequeued++; #endif if (rxq->ifr_ctx->ifc_flags & IFC_PREFETCH) prefetch_pkts(fl, cidx); next = (cidx + CACHE_PTR_INCREMENT) & (fl->ifl_size-1); prefetch(&fl->ifl_sds.ifsd_map[next]); map = fl->ifl_sds.ifsd_map[cidx]; next = (cidx + CACHE_LINE_SIZE) & (fl->ifl_size-1); /* not valid assert if bxe really does SGE from non-contiguous elements */ MPASS(fl->ifl_cidx == cidx); bus_dmamap_sync(fl->ifl_buf_tag, map, BUS_DMASYNC_POSTREAD); + + if (rxq->pfil != NULL && PFIL_HOOKED_IN(rxq->pfil) && pf_rv != NULL) { + payload = *sd->ifsd_cl; + payload += ri->iri_pad; + len = ri->iri_len - ri->iri_pad; + *pf_rv = pfil_run_hooks(rxq->pfil, payload, ri->iri_ifp, + len | PFIL_MEMPTR | PFIL_IN, NULL); + switch (*pf_rv) { + case PFIL_DROPPED: + case PFIL_CONSUMED: + /* + * The filter ate it. Everything is recycled. + */ + m = NULL; + unload = 0; + break; + case PFIL_REALLOCED: + /* + * The filter copied it. Everything is recycled. 
+ */ + m = pfil_mem2mbuf(payload); + unload = 0; + break; + case PFIL_PASS: + /* + * Filter said it was OK, so receive like + * normal + */ + fl->ifl_sds.ifsd_m[cidx] = NULL; + break; + default: + MPASS(0); + } + } else { + fl->ifl_sds.ifsd_m[cidx] = NULL; + *pf_rv = PFIL_PASS; + } + if (unload) bus_dmamap_unload(fl->ifl_buf_tag, map); fl->ifl_cidx = (fl->ifl_cidx + 1) & (fl->ifl_size-1); if (__predict_false(fl->ifl_cidx == 0)) fl->ifl_gen = 0; bit_clear(fl->ifl_rx_bitmap, cidx); + return (m); } static struct mbuf * -assemble_segments(iflib_rxq_t rxq, if_rxd_info_t ri, if_rxsd_t sd) +assemble_segments(iflib_rxq_t rxq, if_rxd_info_t ri, if_rxsd_t sd, int *pf_rv) { - int i, padlen , flags; struct mbuf *m, *mh, *mt; caddr_t cl; + int *pf_rv_ptr, flags, i, padlen; + bool consumed; i = 0; mh = NULL; + consumed = false; + *pf_rv = PFIL_PASS; + pf_rv_ptr = pf_rv; do { - rxd_frag_to_sd(rxq, &ri->iri_frags[i], TRUE, sd); + m = rxd_frag_to_sd(rxq, &ri->iri_frags[i], !consumed, sd, + pf_rv_ptr, ri); MPASS(*sd->ifsd_cl != NULL); - MPASS(*sd->ifsd_m != NULL); - /* Don't include zero-length frags */ - if (ri->iri_frags[i].irf_len == 0) { + /* + * Exclude zero-length frags & frags from + * packets the filter has consumed or dropped + */ + if (ri->iri_frags[i].irf_len == 0 || consumed || + *pf_rv == PFIL_CONSUMED || *pf_rv == PFIL_DROPPED) { + if (mh == NULL) { + /* everything saved here */ + consumed = true; + pf_rv_ptr = NULL; + continue; + } /* XXX we can save the cluster here, but not the mbuf */ - m_init(*sd->ifsd_m, M_NOWAIT, MT_DATA, 0); - m_free(*sd->ifsd_m); - *sd->ifsd_m = NULL; + m_init(m, M_NOWAIT, MT_DATA, 0); + m_free(m); continue; } - m = *sd->ifsd_m; - *sd->ifsd_m = NULL; if (mh == NULL) { flags = M_PKTHDR|M_EXT; mh = mt = m; padlen = ri->iri_pad; } else { flags = M_EXT; mt->m_next = m; mt = m; /* assuming padding is only on the first fragment */ padlen = 0; } cl = *sd->ifsd_cl; *sd->ifsd_cl = NULL; /* Can these two be made one ? */ m_init(m, M_NOWAIT, MT_DATA, flags); m_cljset(m, cl, sd->ifsd_fl->ifl_cltype); /* * These must follow m_init and m_cljset */ m->m_data += padlen; ri->iri_len -= padlen; m->m_len = ri->iri_frags[i].irf_len; } while (++i < ri->iri_nfrags); return (mh); } /* * Process one software descriptor */ static struct mbuf * iflib_rxd_pkt_get(iflib_rxq_t rxq, if_rxd_info_t ri) { struct if_rxsd sd; struct mbuf *m; + int pf_rv; /* should I merge this back in now that the two paths are basically duplicated? 
*/ if (ri->iri_nfrags == 1 && ri->iri_frags[0].irf_len <= MIN(IFLIB_RX_COPY_THRESH, MHLEN)) { - rxd_frag_to_sd(rxq, &ri->iri_frags[0], FALSE, &sd); - m = *sd.ifsd_m; - *sd.ifsd_m = NULL; - m_init(m, M_NOWAIT, MT_DATA, M_PKTHDR); + m = rxd_frag_to_sd(rxq, &ri->iri_frags[0], false, &sd, + &pf_rv, ri); + if (pf_rv != PFIL_PASS && pf_rv != PFIL_REALLOCED) + return (m); + if (pf_rv == PFIL_PASS) { + m_init(m, M_NOWAIT, MT_DATA, M_PKTHDR); #ifndef __NO_STRICT_ALIGNMENT - if (!IP_ALIGNED(m)) - m->m_data += 2; + if (!IP_ALIGNED(m)) + m->m_data += 2; #endif - memcpy(m->m_data, *sd.ifsd_cl, ri->iri_len); - m->m_len = ri->iri_frags[0].irf_len; - } else { - m = assemble_segments(rxq, ri, &sd); + memcpy(m->m_data, *sd.ifsd_cl, ri->iri_len); + m->m_len = ri->iri_frags[0].irf_len; + } + } else { + m = assemble_segments(rxq, ri, &sd, &pf_rv); + if (pf_rv != PFIL_PASS && pf_rv != PFIL_REALLOCED) + return (m); } m->m_pkthdr.len = ri->iri_len; m->m_pkthdr.rcvif = ri->iri_ifp; m->m_flags |= ri->iri_flags; m->m_pkthdr.ether_vtag = ri->iri_vtag; m->m_pkthdr.flowid = ri->iri_flowid; M_HASHTYPE_SET(m, ri->iri_rsstype); m->m_pkthdr.csum_flags = ri->iri_csum_flags; m->m_pkthdr.csum_data = ri->iri_csum_data; return (m); } #if defined(INET6) || defined(INET) static void iflib_get_ip_forwarding(struct lro_ctrl *lc, bool *v4, bool *v6) { CURVNET_SET(lc->ifp->if_vnet); #if defined(INET6) *v6 = VNET(ip6_forwarding); #endif #if defined(INET) *v4 = VNET(ipforwarding); #endif CURVNET_RESTORE(); } /* * Returns true if it's possible this packet could be LROed. * if it returns false, it is guaranteed that tcp_lro_rx() * would not return zero. */ static bool iflib_check_lro_possible(struct mbuf *m, bool v4_forwarding, bool v6_forwarding) { struct ether_header *eh; uint16_t eh_type; eh = mtod(m, struct ether_header *); eh_type = ntohs(eh->ether_type); switch (eh_type) { #if defined(INET6) case ETHERTYPE_IPV6: return !v6_forwarding; #endif #if defined (INET) case ETHERTYPE_IP: return !v4_forwarding; #endif } return false; } #else static void iflib_get_ip_forwarding(struct lro_ctrl *lc __unused, bool *v4 __unused, bool *v6 __unused) { } #endif static bool iflib_rxeof(iflib_rxq_t rxq, qidx_t budget) { if_ctx_t ctx = rxq->ifr_ctx; if_shared_ctx_t sctx = ctx->ifc_sctx; if_softc_ctx_t scctx = &ctx->ifc_softc_ctx; int avail, i; qidx_t *cidxp; struct if_rxd_info ri; int err, budget_left, rx_bytes, rx_pkts; iflib_fl_t fl; struct ifnet *ifp; int lro_enabled; bool v4_forwarding, v6_forwarding, lro_possible; /* * XXX early demux data packets so that if_input processing only handles * acks in interrupt context */ struct mbuf *m, *mh, *mt, *mf; lro_possible = v4_forwarding = v6_forwarding = false; ifp = ctx->ifc_ifp; mh = mt = NULL; MPASS(budget > 0); rx_pkts = rx_bytes = 0; if (sctx->isc_flags & IFLIB_HAS_RXCQ) cidxp = &rxq->ifr_cq_cidx; else cidxp = &rxq->ifr_fl[0].ifl_cidx; if ((avail = iflib_rxd_avail(ctx, rxq, *cidxp, budget)) == 0) { for (i = 0, fl = &rxq->ifr_fl[0]; i < sctx->isc_nfl; i++, fl++) __iflib_fl_refill_lt(ctx, fl, budget + 8); DBG_COUNTER_INC(rx_unavail); return (false); } + /* pfil needs the vnet to be set */ + CURVNET_SET_QUIET(ifp->if_vnet); for (budget_left = budget; budget_left > 0 && avail > 0;) { if (__predict_false(!CTX_ACTIVE(ctx))) { DBG_COUNTER_INC(rx_ctx_inactive); break; } /* * Reset client set fields to their default values */ rxd_info_zero(&ri); ri.iri_qsidx = rxq->ifr_id; ri.iri_cidx = *cidxp; ri.iri_ifp = ifp; ri.iri_frags = rxq->ifr_frags; err = ctx->isc_rxd_pkt_get(ctx->ifc_softc, &ri); if (err) goto err; 
+ rx_pkts += 1; + rx_bytes += ri.iri_len; if (sctx->isc_flags & IFLIB_HAS_RXCQ) { *cidxp = ri.iri_cidx; /* Update our consumer index */ /* XXX NB: shurd - check if this is still safe */ while (rxq->ifr_cq_cidx >= scctx->isc_nrxd[0]) { rxq->ifr_cq_cidx -= scctx->isc_nrxd[0]; rxq->ifr_cq_gen = 0; } /* was this only a completion queue message? */ if (__predict_false(ri.iri_nfrags == 0)) continue; } MPASS(ri.iri_nfrags != 0); MPASS(ri.iri_len != 0); /* will advance the cidx on the corresponding free lists */ m = iflib_rxd_pkt_get(rxq, &ri); avail--; budget_left--; if (avail == 0 && budget_left) avail = iflib_rxd_avail(ctx, rxq, *cidxp, budget_left); - if (__predict_false(m == NULL)) { - DBG_COUNTER_INC(rx_mbuf_null); + if (__predict_false(m == NULL)) continue; - } + /* imm_pkt: -- cxgb */ if (mh == NULL) mh = mt = m; else { mt->m_nextpkt = m; mt = m; } } + CURVNET_RESTORE(); /* make sure that we can refill faster than drain */ for (i = 0, fl = &rxq->ifr_fl[0]; i < sctx->isc_nfl; i++, fl++) __iflib_fl_refill_lt(ctx, fl, budget + 8); lro_enabled = (if_getcapenable(ifp) & IFCAP_LRO); if (lro_enabled) iflib_get_ip_forwarding(&rxq->ifr_lc, &v4_forwarding, &v6_forwarding); mt = mf = NULL; while (mh != NULL) { m = mh; mh = mh->m_nextpkt; m->m_nextpkt = NULL; #ifndef __NO_STRICT_ALIGNMENT if (!IP_ALIGNED(m) && (m = iflib_fixup_rx(m)) == NULL) continue; #endif rx_bytes += m->m_pkthdr.len; rx_pkts++; #if defined(INET6) || defined(INET) if (lro_enabled) { if (!lro_possible) { lro_possible = iflib_check_lro_possible(m, v4_forwarding, v6_forwarding); if (lro_possible && mf != NULL) { ifp->if_input(ifp, mf); DBG_COUNTER_INC(rx_if_input); mt = mf = NULL; } } if ((m->m_pkthdr.csum_flags & (CSUM_L4_CALC|CSUM_L4_VALID)) == (CSUM_L4_CALC|CSUM_L4_VALID)) { if (lro_possible && tcp_lro_rx(&rxq->ifr_lc, m, 0) == 0) continue; } } #endif if (lro_possible) { ifp->if_input(ifp, m); DBG_COUNTER_INC(rx_if_input); continue; } if (mf == NULL) mf = m; if (mt != NULL) mt->m_nextpkt = m; mt = m; } if (mf != NULL) { ifp->if_input(ifp, mf); DBG_COUNTER_INC(rx_if_input); } if_inc_counter(ifp, IFCOUNTER_IBYTES, rx_bytes); if_inc_counter(ifp, IFCOUNTER_IPACKETS, rx_pkts); /* * Flush any outstanding LRO work */ #if defined(INET6) || defined(INET) tcp_lro_flush_all(&rxq->ifr_lc); #endif if (avail) return true; return (iflib_rxd_avail(ctx, rxq, *cidxp, 1)); err: STATE_LOCK(ctx); ctx->ifc_flags |= IFC_DO_RESET; iflib_admin_intr_deferred(ctx); STATE_UNLOCK(ctx); return (false); } #define TXD_NOTIFY_COUNT(txq) (((txq)->ift_size / (txq)->ift_update_freq)-1) static inline qidx_t txq_max_db_deferred(iflib_txq_t txq, qidx_t in_use) { qidx_t notify_count = TXD_NOTIFY_COUNT(txq); qidx_t minthresh = txq->ift_size / 8; if (in_use > 4*minthresh) return (notify_count); if (in_use > 2*minthresh) return (notify_count >> 1); if (in_use > minthresh) return (notify_count >> 3); return (0); } static inline qidx_t txq_max_rs_deferred(iflib_txq_t txq) { qidx_t notify_count = TXD_NOTIFY_COUNT(txq); qidx_t minthresh = txq->ift_size / 8; if (txq->ift_in_use > 4*minthresh) return (notify_count); if (txq->ift_in_use > 2*minthresh) return (notify_count >> 1); if (txq->ift_in_use > minthresh) return (notify_count >> 2); return (2); } #define M_CSUM_FLAGS(m) ((m)->m_pkthdr.csum_flags) #define M_HAS_VLANTAG(m) (m->m_flags & M_VLANTAG) #define TXQ_MAX_DB_DEFERRED(txq, in_use) txq_max_db_deferred((txq), (in_use)) #define TXQ_MAX_RS_DEFERRED(txq) txq_max_rs_deferred(txq) #define TXQ_MAX_DB_CONSUMED(size) (size >> 4) /* forward compatibility for cxgb */ #define 
FIRST_QSET(ctx) 0 #define NTXQSETS(ctx) ((ctx)->ifc_softc_ctx.isc_ntxqsets) #define NRXQSETS(ctx) ((ctx)->ifc_softc_ctx.isc_nrxqsets) #define QIDX(ctx, m) ((((m)->m_pkthdr.flowid & ctx->ifc_softc_ctx.isc_rss_table_mask) % NTXQSETS(ctx)) + FIRST_QSET(ctx)) #define DESC_RECLAIMABLE(q) ((int)((q)->ift_processed - (q)->ift_cleaned - (q)->ift_ctx->ifc_softc_ctx.isc_tx_nsegments)) /* XXX we should be setting this to something other than zero */ #define RECLAIM_THRESH(ctx) ((ctx)->ifc_sctx->isc_tx_reclaim_thresh) #define MAX_TX_DESC(ctx) max((ctx)->ifc_softc_ctx.isc_tx_tso_segments_max, \ (ctx)->ifc_softc_ctx.isc_tx_nsegments) static inline bool iflib_txd_db_check(if_ctx_t ctx, iflib_txq_t txq, int ring, qidx_t in_use) { qidx_t dbval, max; bool rang; rang = false; max = TXQ_MAX_DB_DEFERRED(txq, in_use); if (ring || txq->ift_db_pending >= max) { dbval = txq->ift_npending ? txq->ift_npending : txq->ift_pidx; bus_dmamap_sync(txq->ift_ifdi->idi_tag, txq->ift_ifdi->idi_map, BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE); ctx->isc_txd_flush(ctx->ifc_softc, txq->ift_id, dbval); txq->ift_db_pending = txq->ift_npending = 0; rang = true; } return (rang); } #ifdef PKT_DEBUG static void print_pkt(if_pkt_info_t pi) { printf("pi len: %d qsidx: %d nsegs: %d ndescs: %d flags: %x pidx: %d\n", pi->ipi_len, pi->ipi_qsidx, pi->ipi_nsegs, pi->ipi_ndescs, pi->ipi_flags, pi->ipi_pidx); printf("pi new_pidx: %d csum_flags: %lx tso_segsz: %d mflags: %x vtag: %d\n", pi->ipi_new_pidx, pi->ipi_csum_flags, pi->ipi_tso_segsz, pi->ipi_mflags, pi->ipi_vtag); printf("pi etype: %d ehdrlen: %d ip_hlen: %d ipproto: %d\n", pi->ipi_etype, pi->ipi_ehdrlen, pi->ipi_ip_hlen, pi->ipi_ipproto); } #endif #define IS_TSO4(pi) ((pi)->ipi_csum_flags & CSUM_IP_TSO) #define IS_TX_OFFLOAD4(pi) ((pi)->ipi_csum_flags & (CSUM_IP_TCP | CSUM_IP_TSO)) #define IS_TSO6(pi) ((pi)->ipi_csum_flags & CSUM_IP6_TSO) #define IS_TX_OFFLOAD6(pi) ((pi)->ipi_csum_flags & (CSUM_IP6_TCP | CSUM_IP6_TSO)) static int iflib_parse_header(iflib_txq_t txq, if_pkt_info_t pi, struct mbuf **mp) { if_shared_ctx_t sctx = txq->ift_ctx->ifc_sctx; struct ether_vlan_header *eh; struct mbuf *m; m = *mp; if ((sctx->isc_flags & IFLIB_NEED_SCRATCH) && M_WRITABLE(m) == 0) { if ((m = m_dup(m, M_NOWAIT)) == NULL) { return (ENOMEM); } else { m_freem(*mp); DBG_COUNTER_INC(tx_frees); *mp = m; } } /* * Determine where frame payload starts. * Jump over vlan headers if already present, * helpful for QinQ too. 
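	 * For example, an 802.1Q-tagged IPv4 frame ends up with ipi_etype ==
	 * ETHERTYPE_IP and ipi_ehdrlen == ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN
	 * (14 + 4 = 18 bytes), while an untagged frame gets ipi_ehdrlen ==
	 * ETHER_HDR_LEN (14 bytes).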
*/ if (__predict_false(m->m_len < sizeof(*eh))) { txq->ift_pullups++; if (__predict_false((m = m_pullup(m, sizeof(*eh))) == NULL)) return (ENOMEM); } eh = mtod(m, struct ether_vlan_header *); if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) { pi->ipi_etype = ntohs(eh->evl_proto); pi->ipi_ehdrlen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN; } else { pi->ipi_etype = ntohs(eh->evl_encap_proto); pi->ipi_ehdrlen = ETHER_HDR_LEN; } switch (pi->ipi_etype) { #ifdef INET case ETHERTYPE_IP: { struct mbuf *n; struct ip *ip = NULL; struct tcphdr *th = NULL; int minthlen; minthlen = min(m->m_pkthdr.len, pi->ipi_ehdrlen + sizeof(*ip) + sizeof(*th)); if (__predict_false(m->m_len < minthlen)) { /* * if this code bloat is causing too much of a hit * move it to a separate function and mark it noinline */ if (m->m_len == pi->ipi_ehdrlen) { n = m->m_next; MPASS(n); if (n->m_len >= sizeof(*ip)) { ip = (struct ip *)n->m_data; if (n->m_len >= (ip->ip_hl << 2) + sizeof(*th)) th = (struct tcphdr *)((caddr_t)ip + (ip->ip_hl << 2)); } else { txq->ift_pullups++; if (__predict_false((m = m_pullup(m, minthlen)) == NULL)) return (ENOMEM); ip = (struct ip *)(m->m_data + pi->ipi_ehdrlen); } } else { txq->ift_pullups++; if (__predict_false((m = m_pullup(m, minthlen)) == NULL)) return (ENOMEM); ip = (struct ip *)(m->m_data + pi->ipi_ehdrlen); if (m->m_len >= (ip->ip_hl << 2) + sizeof(*th)) th = (struct tcphdr *)((caddr_t)ip + (ip->ip_hl << 2)); } } else { ip = (struct ip *)(m->m_data + pi->ipi_ehdrlen); if (m->m_len >= (ip->ip_hl << 2) + sizeof(*th)) th = (struct tcphdr *)((caddr_t)ip + (ip->ip_hl << 2)); } pi->ipi_ip_hlen = ip->ip_hl << 2; pi->ipi_ipproto = ip->ip_p; pi->ipi_flags |= IPI_TX_IPV4; /* TCP checksum offload may require TCP header length */ if (IS_TX_OFFLOAD4(pi)) { if (__predict_true(pi->ipi_ipproto == IPPROTO_TCP)) { if (__predict_false(th == NULL)) { txq->ift_pullups++; if (__predict_false((m = m_pullup(m, (ip->ip_hl << 2) + sizeof(*th))) == NULL)) return (ENOMEM); th = (struct tcphdr *)((caddr_t)ip + pi->ipi_ip_hlen); } pi->ipi_tcp_hflags = th->th_flags; pi->ipi_tcp_hlen = th->th_off << 2; pi->ipi_tcp_seq = th->th_seq; } if (IS_TSO4(pi)) { if (__predict_false(ip->ip_p != IPPROTO_TCP)) return (ENXIO); /* * TSO always requires hardware checksum offload. */ pi->ipi_csum_flags |= (CSUM_IP_TCP | CSUM_IP); th->th_sum = in_pseudo(ip->ip_src.s_addr, ip->ip_dst.s_addr, htons(IPPROTO_TCP)); pi->ipi_tso_segsz = m->m_pkthdr.tso_segsz; if (sctx->isc_flags & IFLIB_TSO_INIT_IP) { ip->ip_sum = 0; ip->ip_len = htons(pi->ipi_ip_hlen + pi->ipi_tcp_hlen + pi->ipi_tso_segsz); } } } if ((sctx->isc_flags & IFLIB_NEED_ZERO_CSUM) && (pi->ipi_csum_flags & CSUM_IP)) ip->ip_sum = 0; break; } #endif #ifdef INET6 case ETHERTYPE_IPV6: { struct ip6_hdr *ip6 = (struct ip6_hdr *)(m->m_data + pi->ipi_ehdrlen); struct tcphdr *th; pi->ipi_ip_hlen = sizeof(struct ip6_hdr); if (__predict_false(m->m_len < pi->ipi_ehdrlen + sizeof(struct ip6_hdr))) { txq->ift_pullups++; if (__predict_false((m = m_pullup(m, pi->ipi_ehdrlen + sizeof(struct ip6_hdr))) == NULL)) return (ENOMEM); } th = (struct tcphdr *)((caddr_t)ip6 + pi->ipi_ip_hlen); /* XXX-BZ this will go badly in case of ext hdrs. 
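	 * (th above is taken at a fixed sizeof(struct ip6_hdr) offset and
	 * ip6_nxt below would name the extension header rather than the final
	 * protocol, so both ipi_ipproto and the TCP fields would be wrong.)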
*/ pi->ipi_ipproto = ip6->ip6_nxt; pi->ipi_flags |= IPI_TX_IPV6; /* TCP checksum offload may require TCP header length */ if (IS_TX_OFFLOAD6(pi)) { if (pi->ipi_ipproto == IPPROTO_TCP) { if (__predict_false(m->m_len < pi->ipi_ehdrlen + sizeof(struct ip6_hdr) + sizeof(struct tcphdr))) { txq->ift_pullups++; if (__predict_false((m = m_pullup(m, pi->ipi_ehdrlen + sizeof(struct ip6_hdr) + sizeof(struct tcphdr))) == NULL)) return (ENOMEM); } pi->ipi_tcp_hflags = th->th_flags; pi->ipi_tcp_hlen = th->th_off << 2; pi->ipi_tcp_seq = th->th_seq; } if (IS_TSO6(pi)) { if (__predict_false(ip6->ip6_nxt != IPPROTO_TCP)) return (ENXIO); /* * TSO always requires hardware checksum offload. */ pi->ipi_csum_flags |= CSUM_IP6_TCP; th->th_sum = in6_cksum_pseudo(ip6, 0, IPPROTO_TCP, 0); pi->ipi_tso_segsz = m->m_pkthdr.tso_segsz; } } break; } #endif default: pi->ipi_csum_flags &= ~CSUM_OFFLOAD; pi->ipi_ip_hlen = 0; break; } *mp = m; return (0); } /* * If dodgy hardware rejects the scatter gather chain we've handed it * we'll need to remove the mbuf chain from ifsg_m[] before we can add the * m_defrag'd mbufs */ static __noinline struct mbuf * iflib_remove_mbuf(iflib_txq_t txq) { int ntxd, pidx; struct mbuf *m, **ifsd_m; ifsd_m = txq->ift_sds.ifsd_m; ntxd = txq->ift_size; pidx = txq->ift_pidx & (ntxd - 1); ifsd_m = txq->ift_sds.ifsd_m; m = ifsd_m[pidx]; ifsd_m[pidx] = NULL; bus_dmamap_unload(txq->ift_buf_tag, txq->ift_sds.ifsd_map[pidx]); if (txq->ift_sds.ifsd_tso_map != NULL) bus_dmamap_unload(txq->ift_tso_buf_tag, txq->ift_sds.ifsd_tso_map[pidx]); #if MEMORY_LOGGING txq->ift_dequeued++; #endif return (m); } static inline caddr_t calc_next_txd(iflib_txq_t txq, int cidx, uint8_t qid) { qidx_t size; int ntxd; caddr_t start, end, cur, next; ntxd = txq->ift_size; size = txq->ift_txd_size[qid]; start = txq->ift_ifdi[qid].idi_vaddr; if (__predict_false(size == 0)) return (start); cur = start + size*cidx; end = start + size*ntxd; next = CACHE_PTR_NEXT(cur); return (next < end ? next : start); } /* * Pad an mbuf to ensure a minimum ethernet frame size. 
* min_frame_size is the frame size (less CRC) to pad the mbuf to */ static __noinline int iflib_ether_pad(device_t dev, struct mbuf **m_head, uint16_t min_frame_size) { /* * 18 is enough bytes to pad an ARP packet to 46 bytes, and * and ARP message is the smallest common payload I can think of */ static char pad[18]; /* just zeros */ int n; struct mbuf *new_head; if (!M_WRITABLE(*m_head)) { new_head = m_dup(*m_head, M_NOWAIT); if (new_head == NULL) { m_freem(*m_head); device_printf(dev, "cannot pad short frame, m_dup() failed"); DBG_COUNTER_INC(encap_pad_mbuf_fail); DBG_COUNTER_INC(tx_frees); return ENOMEM; } m_freem(*m_head); *m_head = new_head; } for (n = min_frame_size - (*m_head)->m_pkthdr.len; n > 0; n -= sizeof(pad)) if (!m_append(*m_head, min(n, sizeof(pad)), pad)) break; if (n > 0) { m_freem(*m_head); device_printf(dev, "cannot pad short frame\n"); DBG_COUNTER_INC(encap_pad_mbuf_fail); DBG_COUNTER_INC(tx_frees); return (ENOBUFS); } return 0; } static int iflib_encap(iflib_txq_t txq, struct mbuf **m_headp) { if_ctx_t ctx; if_shared_ctx_t sctx; if_softc_ctx_t scctx; bus_dma_tag_t buf_tag; bus_dma_segment_t *segs; struct mbuf *m_head, **ifsd_m; void *next_txd; bus_dmamap_t map; struct if_pkt_info pi; int remap = 0; int err, nsegs, ndesc, max_segs, pidx, cidx, next, ntxd; ctx = txq->ift_ctx; sctx = ctx->ifc_sctx; scctx = &ctx->ifc_softc_ctx; segs = txq->ift_segs; ntxd = txq->ift_size; m_head = *m_headp; map = NULL; /* * If we're doing TSO the next descriptor to clean may be quite far ahead */ cidx = txq->ift_cidx; pidx = txq->ift_pidx; if (ctx->ifc_flags & IFC_PREFETCH) { next = (cidx + CACHE_PTR_INCREMENT) & (ntxd-1); if (!(ctx->ifc_flags & IFLIB_HAS_TXCQ)) { next_txd = calc_next_txd(txq, cidx, 0); prefetch(next_txd); } /* prefetch the next cache line of mbuf pointers and flags */ prefetch(&txq->ift_sds.ifsd_m[next]); prefetch(&txq->ift_sds.ifsd_map[next]); next = (cidx + CACHE_LINE_SIZE) & (ntxd-1); } map = txq->ift_sds.ifsd_map[pidx]; ifsd_m = txq->ift_sds.ifsd_m; if (m_head->m_pkthdr.csum_flags & CSUM_TSO) { buf_tag = txq->ift_tso_buf_tag; max_segs = scctx->isc_tx_tso_segments_max; map = txq->ift_sds.ifsd_tso_map[pidx]; MPASS(buf_tag != NULL); MPASS(max_segs > 0); } else { buf_tag = txq->ift_buf_tag; max_segs = scctx->isc_tx_nsegments; map = txq->ift_sds.ifsd_map[pidx]; } if ((sctx->isc_flags & IFLIB_NEED_ETHER_PAD) && __predict_false(m_head->m_pkthdr.len < scctx->isc_min_frame_size)) { err = iflib_ether_pad(ctx->ifc_dev, m_headp, scctx->isc_min_frame_size); if (err) { DBG_COUNTER_INC(encap_txd_encap_fail); return err; } } m_head = *m_headp; pkt_info_zero(&pi); pi.ipi_mflags = (m_head->m_flags & (M_VLANTAG|M_BCAST|M_MCAST)); pi.ipi_pidx = pidx; pi.ipi_qsidx = txq->ift_id; pi.ipi_len = m_head->m_pkthdr.len; pi.ipi_csum_flags = m_head->m_pkthdr.csum_flags; pi.ipi_vtag = (m_head->m_flags & M_VLANTAG) ? 
m_head->m_pkthdr.ether_vtag : 0; /* deliberate bitwise OR to make one condition */ if (__predict_true((pi.ipi_csum_flags | pi.ipi_vtag))) { if (__predict_false((err = iflib_parse_header(txq, &pi, m_headp)) != 0)) { DBG_COUNTER_INC(encap_txd_encap_fail); return (err); } m_head = *m_headp; } retry: err = bus_dmamap_load_mbuf_sg(buf_tag, map, m_head, segs, &nsegs, BUS_DMA_NOWAIT); defrag: if (__predict_false(err)) { switch (err) { case EFBIG: /* try collapse once and defrag once */ if (remap == 0) { m_head = m_collapse(*m_headp, M_NOWAIT, max_segs); /* try defrag if collapsing fails */ if (m_head == NULL) remap++; } if (remap == 1) { txq->ift_mbuf_defrag++; m_head = m_defrag(*m_headp, M_NOWAIT); } /* * remap should never be >1 unless bus_dmamap_load_mbuf_sg * failed to map an mbuf that was run through m_defrag */ MPASS(remap <= 1); if (__predict_false(m_head == NULL || remap > 1)) goto defrag_failed; remap++; *m_headp = m_head; goto retry; break; case ENOMEM: txq->ift_no_tx_dma_setup++; break; default: txq->ift_no_tx_dma_setup++; m_freem(*m_headp); DBG_COUNTER_INC(tx_frees); *m_headp = NULL; break; } txq->ift_map_failed++; DBG_COUNTER_INC(encap_load_mbuf_fail); DBG_COUNTER_INC(encap_txd_encap_fail); return (err); } ifsd_m[pidx] = m_head; /* * XXX assumes a 1 to 1 relationship between segments and * descriptors - this does not hold true on all drivers, e.g. * cxgb */ if (__predict_false(nsegs + 2 > TXQ_AVAIL(txq))) { txq->ift_no_desc_avail++; bus_dmamap_unload(buf_tag, map); DBG_COUNTER_INC(encap_txq_avail_fail); DBG_COUNTER_INC(encap_txd_encap_fail); if ((txq->ift_task.gt_task.ta_flags & TASK_ENQUEUED) == 0) GROUPTASK_ENQUEUE(&txq->ift_task); return (ENOBUFS); } /* * On Intel cards we can greatly reduce the number of TX interrupts * we see by only setting report status on every Nth descriptor. * However, this also means that the driver will need to keep track * of the descriptors that RS was set on to check them for the DD bit. */ txq->ift_rs_pending += nsegs + 1; if (txq->ift_rs_pending > TXQ_MAX_RS_DEFERRED(txq) || iflib_no_tx_batch || (TXQ_AVAIL(txq) - nsegs) <= MAX_TX_DESC(ctx) + 2) { pi.ipi_flags |= IPI_TX_INTR; txq->ift_rs_pending = 0; } pi.ipi_segs = segs; pi.ipi_nsegs = nsegs; MPASS(pidx >= 0 && pidx < txq->ift_size); #ifdef PKT_DEBUG print_pkt(&pi); #endif if ((err = ctx->isc_txd_encap(ctx->ifc_softc, &pi)) == 0) { bus_dmamap_sync(buf_tag, map, BUS_DMASYNC_PREWRITE); DBG_COUNTER_INC(tx_encap); MPASS(pi.ipi_new_pidx < txq->ift_size); ndesc = pi.ipi_new_pidx - pi.ipi_pidx; if (pi.ipi_new_pidx < pi.ipi_pidx) { ndesc += txq->ift_size; txq->ift_gen = 1; } /* * drivers can need as many as * two sentinels */ MPASS(ndesc <= pi.ipi_nsegs + 2); MPASS(pi.ipi_new_pidx != pidx); MPASS(ndesc > 0); txq->ift_in_use += ndesc; /* * We update the last software descriptor again here because there may * be a sentinel and/or there may be more mbufs than segments */ txq->ift_pidx = pi.ipi_new_pidx; txq->ift_npending += pi.ipi_ndescs; } else { *m_headp = m_head = iflib_remove_mbuf(txq); if (err == EFBIG) { txq->ift_txd_encap_efbig++; if (remap < 2) { remap = 1; goto defrag; } } goto defrag_failed; } /* * err can't possibly be non-zero here, so we don't neet to test it * to see if we need to DBG_COUNTER_INC(encap_txd_encap_fail). 
*/ return (err); defrag_failed: txq->ift_mbuf_defrag_failed++; txq->ift_map_failed++; m_freem(*m_headp); DBG_COUNTER_INC(tx_frees); *m_headp = NULL; DBG_COUNTER_INC(encap_txd_encap_fail); return (ENOMEM); } static void iflib_tx_desc_free(iflib_txq_t txq, int n) { uint32_t qsize, cidx, mask, gen; struct mbuf *m, **ifsd_m; bool do_prefetch; cidx = txq->ift_cidx; gen = txq->ift_gen; qsize = txq->ift_size; mask = qsize-1; ifsd_m = txq->ift_sds.ifsd_m; do_prefetch = (txq->ift_ctx->ifc_flags & IFC_PREFETCH); while (n-- > 0) { if (do_prefetch) { prefetch(ifsd_m[(cidx + 3) & mask]); prefetch(ifsd_m[(cidx + 4) & mask]); } if ((m = ifsd_m[cidx]) != NULL) { prefetch(&ifsd_m[(cidx + CACHE_PTR_INCREMENT) & mask]); if (m->m_pkthdr.csum_flags & CSUM_TSO) { bus_dmamap_sync(txq->ift_tso_buf_tag, txq->ift_sds.ifsd_tso_map[cidx], BUS_DMASYNC_POSTWRITE); bus_dmamap_unload(txq->ift_tso_buf_tag, txq->ift_sds.ifsd_tso_map[cidx]); } else { bus_dmamap_sync(txq->ift_buf_tag, txq->ift_sds.ifsd_map[cidx], BUS_DMASYNC_POSTWRITE); bus_dmamap_unload(txq->ift_buf_tag, txq->ift_sds.ifsd_map[cidx]); } /* XXX we don't support any drivers that batch packets yet */ MPASS(m->m_nextpkt == NULL); m_freem(m); ifsd_m[cidx] = NULL; #if MEMORY_LOGGING txq->ift_dequeued++; #endif DBG_COUNTER_INC(tx_frees); } if (__predict_false(++cidx == qsize)) { cidx = 0; gen = 0; } } txq->ift_cidx = cidx; txq->ift_gen = gen; } static __inline int iflib_completed_tx_reclaim(iflib_txq_t txq, int thresh) { int reclaim; if_ctx_t ctx = txq->ift_ctx; KASSERT(thresh >= 0, ("invalid threshold to reclaim")); MPASS(thresh /*+ MAX_TX_DESC(txq->ift_ctx) */ < txq->ift_size); /* * Need a rate-limiting check so that this isn't called every time */ iflib_tx_credits_update(ctx, txq); reclaim = DESC_RECLAIMABLE(txq); if (reclaim <= thresh /* + MAX_TX_DESC(txq->ift_ctx) */) { #ifdef INVARIANTS if (iflib_verbose_debug) { printf("%s processed=%ju cleaned=%ju tx_nsegments=%d reclaim=%d thresh=%d\n", __FUNCTION__, txq->ift_processed, txq->ift_cleaned, txq->ift_ctx->ifc_softc_ctx.isc_tx_nsegments, reclaim, thresh); } #endif return (0); } iflib_tx_desc_free(txq, reclaim); txq->ift_cleaned += reclaim; txq->ift_in_use -= reclaim; return (reclaim); } static struct mbuf ** _ring_peek_one(struct ifmp_ring *r, int cidx, int offset, int remaining) { int next, size; struct mbuf **items; size = r->size; next = (cidx + CACHE_PTR_INCREMENT) & (size-1); items = __DEVOLATILE(struct mbuf **, &r->items[0]); prefetch(items[(cidx + offset) & (size-1)]); if (remaining > 1) { prefetch2cachelines(&items[next]); prefetch2cachelines(items[(cidx + offset + 1) & (size-1)]); prefetch2cachelines(items[(cidx + offset + 2) & (size-1)]); prefetch2cachelines(items[(cidx + offset + 3) & (size-1)]); } return (__DEVOLATILE(struct mbuf **, &r->items[(cidx + offset) & (size-1)])); } static void iflib_txq_check_drain(iflib_txq_t txq, int budget) { ifmp_ring_check_drainage(txq->ift_br, budget); } static uint32_t iflib_txq_can_drain(struct ifmp_ring *r) { iflib_txq_t txq = r->cookie; if_ctx_t ctx = txq->ift_ctx; if (TXQ_AVAIL(txq) > MAX_TX_DESC(ctx) + 2) return (1); bus_dmamap_sync(txq->ift_ifdi->idi_tag, txq->ift_ifdi->idi_map, BUS_DMASYNC_POSTREAD); return (ctx->isc_txd_credits_update(ctx->ifc_softc, txq->ift_id, false)); } static uint32_t iflib_txq_drain(struct ifmp_ring *r, uint32_t cidx, uint32_t pidx) { iflib_txq_t txq = r->cookie; if_ctx_t ctx = txq->ift_ctx; struct ifnet *ifp = ctx->ifc_ifp; struct mbuf **mp, *m; int i, count, consumed, pkt_sent, bytes_sent, mcast_sent, avail; int reclaimed, err, 
in_use_prev, desc_used; bool do_prefetch, ring, rang; if (__predict_false(!(if_getdrvflags(ifp) & IFF_DRV_RUNNING) || !LINK_ACTIVE(ctx))) { DBG_COUNTER_INC(txq_drain_notready); return (0); } reclaimed = iflib_completed_tx_reclaim(txq, RECLAIM_THRESH(ctx)); rang = iflib_txd_db_check(ctx, txq, reclaimed, txq->ift_in_use); avail = IDXDIFF(pidx, cidx, r->size); if (__predict_false(ctx->ifc_flags & IFC_QFLUSH)) { DBG_COUNTER_INC(txq_drain_flushing); for (i = 0; i < avail; i++) { if (__predict_true(r->items[(cidx + i) & (r->size-1)] != (void *)txq)) m_free(r->items[(cidx + i) & (r->size-1)]); r->items[(cidx + i) & (r->size-1)] = NULL; } return (avail); } if (__predict_false(if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_OACTIVE)) { txq->ift_qstatus = IFLIB_QUEUE_IDLE; CALLOUT_LOCK(txq); callout_stop(&txq->ift_timer); CALLOUT_UNLOCK(txq); DBG_COUNTER_INC(txq_drain_oactive); return (0); } if (reclaimed) txq->ift_qstatus = IFLIB_QUEUE_IDLE; consumed = mcast_sent = bytes_sent = pkt_sent = 0; count = MIN(avail, TX_BATCH_SIZE); #ifdef INVARIANTS if (iflib_verbose_debug) printf("%s avail=%d ifc_flags=%x txq_avail=%d ", __FUNCTION__, avail, ctx->ifc_flags, TXQ_AVAIL(txq)); #endif do_prefetch = (ctx->ifc_flags & IFC_PREFETCH); avail = TXQ_AVAIL(txq); err = 0; for (desc_used = i = 0; i < count && avail > MAX_TX_DESC(ctx) + 2; i++) { int rem = do_prefetch ? count - i : 0; mp = _ring_peek_one(r, cidx, i, rem); MPASS(mp != NULL && *mp != NULL); if (__predict_false(*mp == (struct mbuf *)txq)) { consumed++; reclaimed++; continue; } in_use_prev = txq->ift_in_use; err = iflib_encap(txq, mp); if (__predict_false(err)) { /* no room - bail out */ if (err == ENOBUFS) break; consumed++; /* we can't send this packet - skip it */ continue; } consumed++; pkt_sent++; m = *mp; DBG_COUNTER_INC(tx_sent); bytes_sent += m->m_pkthdr.len; mcast_sent += !!(m->m_flags & M_MCAST); avail = TXQ_AVAIL(txq); txq->ift_db_pending += (txq->ift_in_use - in_use_prev); desc_used += (txq->ift_in_use - in_use_prev); ETHER_BPF_MTAP(ifp, m); if (__predict_false(!(ifp->if_drv_flags & IFF_DRV_RUNNING))) break; rang = iflib_txd_db_check(ctx, txq, false, in_use_prev); } /* deliberate use of bitwise or to avoid gratuitous short-circuit */ ring = rang ? 
false : (iflib_min_tx_latency | err) || (TXQ_AVAIL(txq) < MAX_TX_DESC(ctx)); iflib_txd_db_check(ctx, txq, ring, txq->ift_in_use); if_inc_counter(ifp, IFCOUNTER_OBYTES, bytes_sent); if_inc_counter(ifp, IFCOUNTER_OPACKETS, pkt_sent); if (mcast_sent) if_inc_counter(ifp, IFCOUNTER_OMCASTS, mcast_sent); #ifdef INVARIANTS if (iflib_verbose_debug) printf("consumed=%d\n", consumed); #endif return (consumed); } static uint32_t iflib_txq_drain_always(struct ifmp_ring *r) { return (1); } static uint32_t iflib_txq_drain_free(struct ifmp_ring *r, uint32_t cidx, uint32_t pidx) { int i, avail; struct mbuf **mp; iflib_txq_t txq; txq = r->cookie; txq->ift_qstatus = IFLIB_QUEUE_IDLE; CALLOUT_LOCK(txq); callout_stop(&txq->ift_timer); CALLOUT_UNLOCK(txq); avail = IDXDIFF(pidx, cidx, r->size); for (i = 0; i < avail; i++) { mp = _ring_peek_one(r, cidx, i, avail - i); if (__predict_false(*mp == (struct mbuf *)txq)) continue; m_freem(*mp); DBG_COUNTER_INC(tx_frees); } MPASS(ifmp_ring_is_stalled(r) == 0); return (avail); } static void iflib_ifmp_purge(iflib_txq_t txq) { struct ifmp_ring *r; r = txq->ift_br; r->drain = iflib_txq_drain_free; r->can_drain = iflib_txq_drain_always; ifmp_ring_check_drainage(r, r->size); r->drain = iflib_txq_drain; r->can_drain = iflib_txq_can_drain; } static void _task_fn_tx(void *context) { iflib_txq_t txq = context; if_ctx_t ctx = txq->ift_ctx; #if defined(ALTQ) || defined(DEV_NETMAP) if_t ifp = ctx->ifc_ifp; #endif int abdicate = ctx->ifc_sysctl_tx_abdicate; #ifdef IFLIB_DIAGNOSTICS txq->ift_cpu_exec_count[curcpu]++; #endif if (!(if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING)) return; #ifdef DEV_NETMAP if (if_getcapenable(ifp) & IFCAP_NETMAP) { bus_dmamap_sync(txq->ift_ifdi->idi_tag, txq->ift_ifdi->idi_map, BUS_DMASYNC_POSTREAD); if (ctx->isc_txd_credits_update(ctx->ifc_softc, txq->ift_id, false)) netmap_tx_irq(ifp, txq->ift_id); IFDI_TX_QUEUE_INTR_ENABLE(ctx, txq->ift_id); return; } #endif #ifdef ALTQ if (ALTQ_IS_ENABLED(&ifp->if_snd)) iflib_altq_if_start(ifp); #endif if (txq->ift_db_pending) ifmp_ring_enqueue(txq->ift_br, (void **)&txq, 1, TX_BATCH_SIZE, abdicate); else if (!abdicate) ifmp_ring_check_drainage(txq->ift_br, TX_BATCH_SIZE); /* * When abdicating, we always need to check drainage, not just when we don't enqueue */ if (abdicate) ifmp_ring_check_drainage(txq->ift_br, TX_BATCH_SIZE); if (ctx->ifc_flags & IFC_LEGACY) IFDI_INTR_ENABLE(ctx); else { #ifdef INVARIANTS int rc = #endif IFDI_TX_QUEUE_INTR_ENABLE(ctx, txq->ift_id); KASSERT(rc != ENOTSUP, ("MSI-X support requires queue_intr_enable, but not implemented in driver")); } } static void _task_fn_rx(void *context) { iflib_rxq_t rxq = context; if_ctx_t ctx = rxq->ifr_ctx; bool more; uint16_t budget; #ifdef IFLIB_DIAGNOSTICS rxq->ifr_cpu_exec_count[curcpu]++; #endif DBG_COUNTER_INC(task_fn_rxs); if (__predict_false(!(if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING))) return; more = true; #ifdef DEV_NETMAP if (if_getcapenable(ctx->ifc_ifp) & IFCAP_NETMAP) { u_int work = 0; if (netmap_rx_irq(ctx->ifc_ifp, rxq->ifr_id, &work)) { more = false; } } #endif budget = ctx->ifc_sysctl_rx_budget; if (budget == 0) budget = 16; /* XXX */ if (more == false || (more = iflib_rxeof(rxq, budget)) == false) { if (ctx->ifc_flags & IFC_LEGACY) IFDI_INTR_ENABLE(ctx); else { #ifdef INVARIANTS int rc = #endif IFDI_RX_QUEUE_INTR_ENABLE(ctx, rxq->ifr_id); KASSERT(rc != ENOTSUP, ("MSI-X support requires queue_intr_enable, but not implemented in driver")); DBG_COUNTER_INC(rx_intr_enables); } } if (__predict_false(!(if_getdrvflags(ctx->ifc_ifp) & 
IFF_DRV_RUNNING))) return; if (more) GROUPTASK_ENQUEUE(&rxq->ifr_task); } static void _task_fn_admin(void *context) { if_ctx_t ctx = context; if_softc_ctx_t sctx = &ctx->ifc_softc_ctx; iflib_txq_t txq; int i; bool oactive, running, do_reset, do_watchdog, in_detach; uint32_t reset_on = hz / 2; STATE_LOCK(ctx); running = (if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING); oactive = (if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_OACTIVE); do_reset = (ctx->ifc_flags & IFC_DO_RESET); do_watchdog = (ctx->ifc_flags & IFC_DO_WATCHDOG); in_detach = (ctx->ifc_flags & IFC_IN_DETACH); ctx->ifc_flags &= ~(IFC_DO_RESET|IFC_DO_WATCHDOG); STATE_UNLOCK(ctx); if ((!running && !oactive) && !(ctx->ifc_sctx->isc_flags & IFLIB_ADMIN_ALWAYS_RUN)) return; if (in_detach) return; CTX_LOCK(ctx); for (txq = ctx->ifc_txqs, i = 0; i < sctx->isc_ntxqsets; i++, txq++) { CALLOUT_LOCK(txq); callout_stop(&txq->ift_timer); CALLOUT_UNLOCK(txq); } if (do_watchdog) { ctx->ifc_watchdog_events++; IFDI_WATCHDOG_RESET(ctx); } IFDI_UPDATE_ADMIN_STATUS(ctx); for (txq = ctx->ifc_txqs, i = 0; i < sctx->isc_ntxqsets; i++, txq++) { #ifdef DEV_NETMAP reset_on = hz / 2; if (if_getcapenable(ctx->ifc_ifp) & IFCAP_NETMAP) iflib_netmap_timer_adjust(ctx, txq, &reset_on); #endif callout_reset_on(&txq->ift_timer, reset_on, iflib_timer, txq, txq->ift_timer.c_cpu); } IFDI_LINK_INTR_ENABLE(ctx); if (do_reset) iflib_if_init_locked(ctx); CTX_UNLOCK(ctx); if (LINK_ACTIVE(ctx) == 0) return; for (txq = ctx->ifc_txqs, i = 0; i < sctx->isc_ntxqsets; i++, txq++) iflib_txq_check_drain(txq, IFLIB_RESTART_BUDGET); } static void _task_fn_iov(void *context) { if_ctx_t ctx = context; if (!(if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING) && !(ctx->ifc_sctx->isc_flags & IFLIB_ADMIN_ALWAYS_RUN)) return; CTX_LOCK(ctx); IFDI_VFLR_HANDLE(ctx); CTX_UNLOCK(ctx); } static int iflib_sysctl_int_delay(SYSCTL_HANDLER_ARGS) { int err; if_int_delay_info_t info; if_ctx_t ctx; info = (if_int_delay_info_t)arg1; ctx = info->iidi_ctx; info->iidi_req = req; info->iidi_oidp = oidp; CTX_LOCK(ctx); err = IFDI_SYSCTL_INT_DELAY(ctx, info); CTX_UNLOCK(ctx); return (err); } /********************************************************************* * * IFNET FUNCTIONS * **********************************************************************/ static void iflib_if_init_locked(if_ctx_t ctx) { iflib_stop(ctx); iflib_init_locked(ctx); } static void iflib_if_init(void *arg) { if_ctx_t ctx = arg; CTX_LOCK(ctx); iflib_if_init_locked(ctx); CTX_UNLOCK(ctx); } static int iflib_if_transmit(if_t ifp, struct mbuf *m) { if_ctx_t ctx = if_getsoftc(ifp); iflib_txq_t txq; int err, qidx; int abdicate = ctx->ifc_sysctl_tx_abdicate; if (__predict_false((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0 || !LINK_ACTIVE(ctx))) { DBG_COUNTER_INC(tx_frees); m_freem(m); return (ENETDOWN); } MPASS(m->m_nextpkt == NULL); /* ALTQ-enabled interfaces always use queue 0. */ qidx = 0; if ((NTXQSETS(ctx) > 1) && M_HASHTYPE_GET(m) && !ALTQ_IS_ENABLED(&ifp->if_snd)) qidx = QIDX(ctx, m); /* * XXX calculate buf_ring based on flowid (divvy up bits?) 
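	 * As it stands, QIDX() just folds the flowid into the TX queue range:
	 * with the default 64-entry RSS table (mask 63) and, say, eight TX
	 * queue sets, a flowid of 42 selects txq (42 & 63) % 8 == 2.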
*/ txq = &ctx->ifc_txqs[qidx]; #ifdef DRIVER_BACKPRESSURE if (txq->ift_closed) { while (m != NULL) { next = m->m_nextpkt; m->m_nextpkt = NULL; m_freem(m); DBG_COUNTER_INC(tx_frees); m = next; } return (ENOBUFS); } #endif #ifdef notyet qidx = count = 0; mp = marr; next = m; do { count++; next = next->m_nextpkt; } while (next != NULL); if (count > nitems(marr)) if ((mp = malloc(count*sizeof(struct mbuf *), M_IFLIB, M_NOWAIT)) == NULL) { /* XXX check nextpkt */ m_freem(m); /* XXX simplify for now */ DBG_COUNTER_INC(tx_frees); return (ENOBUFS); } for (next = m, i = 0; next != NULL; i++) { mp[i] = next; next = next->m_nextpkt; mp[i]->m_nextpkt = NULL; } #endif DBG_COUNTER_INC(tx_seen); err = ifmp_ring_enqueue(txq->ift_br, (void **)&m, 1, TX_BATCH_SIZE, abdicate); if (abdicate) GROUPTASK_ENQUEUE(&txq->ift_task); if (err) { if (!abdicate) GROUPTASK_ENQUEUE(&txq->ift_task); /* support forthcoming later */ #ifdef DRIVER_BACKPRESSURE txq->ift_closed = TRUE; #endif ifmp_ring_check_drainage(txq->ift_br, TX_BATCH_SIZE); m_freem(m); DBG_COUNTER_INC(tx_frees); } return (err); } #ifdef ALTQ /* * The overall approach to integrating iflib with ALTQ is to continue to use * the iflib mp_ring machinery between the ALTQ queue(s) and the hardware * ring. Technically, when using ALTQ, queueing to an intermediate mp_ring * is redundant/unnecessary, but doing so minimizes the amount of * ALTQ-specific code required in iflib. It is assumed that the overhead of * redundantly queueing to an intermediate mp_ring is swamped by the * performance limitations inherent in using ALTQ. * * When ALTQ support is compiled in, all iflib drivers will use a transmit * routine, iflib_altq_if_transmit(), that checks if ALTQ is enabled for the * given interface. If ALTQ is enabled for an interface, then all * transmitted packets for that interface will be submitted to the ALTQ * subsystem via IFQ_ENQUEUE(). We don't use the legacy if_transmit() * implementation because it uses IFQ_HANDOFF(), which will duplicatively * update stats that the iflib machinery handles, and which is sensitve to * the disused IFF_DRV_OACTIVE flag. Additionally, iflib_altq_if_start() * will be installed as the start routine for use by ALTQ facilities that * need to trigger queue drains on a scheduled basis. * */ static void iflib_altq_if_start(if_t ifp) { struct ifaltq *ifq = &ifp->if_snd; struct mbuf *m; IFQ_LOCK(ifq); IFQ_DEQUEUE_NOLOCK(ifq, m); while (m != NULL) { iflib_if_transmit(ifp, m); IFQ_DEQUEUE_NOLOCK(ifq, m); } IFQ_UNLOCK(ifq); } static int iflib_altq_if_transmit(if_t ifp, struct mbuf *m) { int err; if (ALTQ_IS_ENABLED(&ifp->if_snd)) { IFQ_ENQUEUE(&ifp->if_snd, m, err); if (err == 0) iflib_altq_if_start(ifp); } else err = iflib_if_transmit(ifp, m); return (err); } #endif /* ALTQ */ static void iflib_if_qflush(if_t ifp) { if_ctx_t ctx = if_getsoftc(ifp); iflib_txq_t txq = ctx->ifc_txqs; int i; STATE_LOCK(ctx); ctx->ifc_flags |= IFC_QFLUSH; STATE_UNLOCK(ctx); for (i = 0; i < NTXQSETS(ctx); i++, txq++) while (!(ifmp_ring_is_idle(txq->ift_br) || ifmp_ring_is_stalled(txq->ift_br))) iflib_txq_check_drain(txq, 0); STATE_LOCK(ctx); ctx->ifc_flags &= ~IFC_QFLUSH; STATE_UNLOCK(ctx); /* * When ALTQ is enabled, this will also take care of purging the * ALTQ queue(s). 
*/ if_qflush(ifp); } #define IFCAP_FLAGS (IFCAP_HWCSUM_IPV6 | IFCAP_HWCSUM | IFCAP_LRO | \ IFCAP_TSO | IFCAP_VLAN_HWTAGGING | IFCAP_HWSTATS | \ IFCAP_VLAN_MTU | IFCAP_VLAN_HWFILTER | \ IFCAP_VLAN_HWTSO | IFCAP_VLAN_HWCSUM) static int iflib_if_ioctl(if_t ifp, u_long command, caddr_t data) { if_ctx_t ctx = if_getsoftc(ifp); struct ifreq *ifr = (struct ifreq *)data; #if defined(INET) || defined(INET6) struct ifaddr *ifa = (struct ifaddr *)data; #endif bool avoid_reset = FALSE; int err = 0, reinit = 0, bits; switch (command) { case SIOCSIFADDR: #ifdef INET if (ifa->ifa_addr->sa_family == AF_INET) avoid_reset = TRUE; #endif #ifdef INET6 if (ifa->ifa_addr->sa_family == AF_INET6) avoid_reset = TRUE; #endif /* ** Calling init results in link renegotiation, ** so we avoid doing it when possible. */ if (avoid_reset) { if_setflagbits(ifp, IFF_UP,0); if (!(if_getdrvflags(ifp) & IFF_DRV_RUNNING)) reinit = 1; #ifdef INET if (!(if_getflags(ifp) & IFF_NOARP)) arp_ifinit(ifp, ifa); #endif } else err = ether_ioctl(ifp, command, data); break; case SIOCSIFMTU: CTX_LOCK(ctx); if (ifr->ifr_mtu == if_getmtu(ifp)) { CTX_UNLOCK(ctx); break; } bits = if_getdrvflags(ifp); /* stop the driver and free any clusters before proceeding */ iflib_stop(ctx); if ((err = IFDI_MTU_SET(ctx, ifr->ifr_mtu)) == 0) { STATE_LOCK(ctx); if (ifr->ifr_mtu > ctx->ifc_max_fl_buf_size) ctx->ifc_flags |= IFC_MULTISEG; else ctx->ifc_flags &= ~IFC_MULTISEG; STATE_UNLOCK(ctx); err = if_setmtu(ifp, ifr->ifr_mtu); } iflib_init_locked(ctx); STATE_LOCK(ctx); if_setdrvflags(ifp, bits); STATE_UNLOCK(ctx); CTX_UNLOCK(ctx); break; case SIOCSIFFLAGS: CTX_LOCK(ctx); if (if_getflags(ifp) & IFF_UP) { if (if_getdrvflags(ifp) & IFF_DRV_RUNNING) { if ((if_getflags(ifp) ^ ctx->ifc_if_flags) & (IFF_PROMISC | IFF_ALLMULTI)) { err = IFDI_PROMISC_SET(ctx, if_getflags(ifp)); } } else reinit = 1; } else if (if_getdrvflags(ifp) & IFF_DRV_RUNNING) { iflib_stop(ctx); } ctx->ifc_if_flags = if_getflags(ifp); CTX_UNLOCK(ctx); break; case SIOCADDMULTI: case SIOCDELMULTI: if (if_getdrvflags(ifp) & IFF_DRV_RUNNING) { CTX_LOCK(ctx); IFDI_INTR_DISABLE(ctx); IFDI_MULTI_SET(ctx); IFDI_INTR_ENABLE(ctx); CTX_UNLOCK(ctx); } break; case SIOCSIFMEDIA: CTX_LOCK(ctx); IFDI_MEDIA_SET(ctx); CTX_UNLOCK(ctx); /* falls thru */ case SIOCGIFMEDIA: case SIOCGIFXMEDIA: err = ifmedia_ioctl(ifp, ifr, &ctx->ifc_media, command); break; case SIOCGI2C: { struct ifi2creq i2c; err = copyin(ifr_data_get_ptr(ifr), &i2c, sizeof(i2c)); if (err != 0) break; if (i2c.dev_addr != 0xA0 && i2c.dev_addr != 0xA2) { err = EINVAL; break; } if (i2c.len > sizeof(i2c.data)) { err = EINVAL; break; } if ((err = IFDI_I2C_REQ(ctx, &i2c)) == 0) err = copyout(&i2c, ifr_data_get_ptr(ifr), sizeof(i2c)); break; } case SIOCSIFCAP: { int mask, setmask, oldmask; oldmask = if_getcapenable(ifp); mask = ifr->ifr_reqcap ^ oldmask; mask &= ctx->ifc_softc_ctx.isc_capabilities; setmask = 0; #ifdef TCP_OFFLOAD setmask |= mask & (IFCAP_TOE4|IFCAP_TOE6); #endif setmask |= (mask & IFCAP_FLAGS); setmask |= (mask & IFCAP_WOL); /* * If any RX csum has changed, change all the ones that * are supported by the driver. 
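	 * (Most hardware cannot enable IPv4 and IPv6 receive checksum offload
	 * independently, so whichever of the two bits the driver supports is
	 * toggled along with the one that was requested.)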
*/ if (setmask & (IFCAP_RXCSUM | IFCAP_RXCSUM_IPV6)) { setmask |= ctx->ifc_softc_ctx.isc_capabilities & (IFCAP_RXCSUM | IFCAP_RXCSUM_IPV6); } /* * want to ensure that traffic has stopped before we change any of the flags */ if (setmask) { CTX_LOCK(ctx); bits = if_getdrvflags(ifp); if (bits & IFF_DRV_RUNNING && setmask & ~IFCAP_WOL) iflib_stop(ctx); STATE_LOCK(ctx); if_togglecapenable(ifp, setmask); STATE_UNLOCK(ctx); if (bits & IFF_DRV_RUNNING && setmask & ~IFCAP_WOL) iflib_init_locked(ctx); STATE_LOCK(ctx); if_setdrvflags(ifp, bits); STATE_UNLOCK(ctx); CTX_UNLOCK(ctx); } if_vlancap(ifp); break; } case SIOCGPRIVATE_0: case SIOCSDRVSPEC: case SIOCGDRVSPEC: CTX_LOCK(ctx); err = IFDI_PRIV_IOCTL(ctx, command, data); CTX_UNLOCK(ctx); break; default: err = ether_ioctl(ifp, command, data); break; } if (reinit) iflib_if_init(ctx); return (err); } static uint64_t iflib_if_get_counter(if_t ifp, ift_counter cnt) { if_ctx_t ctx = if_getsoftc(ifp); return (IFDI_GET_COUNTER(ctx, cnt)); } /********************************************************************* * * OTHER FUNCTIONS EXPORTED TO THE STACK * **********************************************************************/ static void iflib_vlan_register(void *arg, if_t ifp, uint16_t vtag) { if_ctx_t ctx = if_getsoftc(ifp); if ((void *)ctx != arg) return; if ((vtag == 0) || (vtag > 4095)) return; CTX_LOCK(ctx); IFDI_VLAN_REGISTER(ctx, vtag); /* Re-init to load the changes */ if (if_getcapenable(ifp) & IFCAP_VLAN_HWFILTER) iflib_if_init_locked(ctx); CTX_UNLOCK(ctx); } static void iflib_vlan_unregister(void *arg, if_t ifp, uint16_t vtag) { if_ctx_t ctx = if_getsoftc(ifp); if ((void *)ctx != arg) return; if ((vtag == 0) || (vtag > 4095)) return; CTX_LOCK(ctx); IFDI_VLAN_UNREGISTER(ctx, vtag); /* Re-init to load the changes */ if (if_getcapenable(ifp) & IFCAP_VLAN_HWFILTER) iflib_if_init_locked(ctx); CTX_UNLOCK(ctx); } static void iflib_led_func(void *arg, int onoff) { if_ctx_t ctx = arg; CTX_LOCK(ctx); IFDI_LED_FUNC(ctx, onoff); CTX_UNLOCK(ctx); } /********************************************************************* * * BUS FUNCTION DEFINITIONS * **********************************************************************/ int iflib_device_probe(device_t dev) { pci_vendor_info_t *ent; uint16_t pci_vendor_id, pci_device_id; uint16_t pci_subvendor_id, pci_subdevice_id; uint16_t pci_rev_id; if_shared_ctx_t sctx; if ((sctx = DEVICE_REGISTER(dev)) == NULL || sctx->isc_magic != IFLIB_MAGIC) return (ENOTSUP); pci_vendor_id = pci_get_vendor(dev); pci_device_id = pci_get_device(dev); pci_subvendor_id = pci_get_subvendor(dev); pci_subdevice_id = pci_get_subdevice(dev); pci_rev_id = pci_get_revid(dev); if (sctx->isc_parse_devinfo != NULL) sctx->isc_parse_devinfo(&pci_device_id, &pci_subvendor_id, &pci_subdevice_id, &pci_rev_id); ent = sctx->isc_vendor_info; while (ent->pvi_vendor_id != 0) { if (pci_vendor_id != ent->pvi_vendor_id) { ent++; continue; } if ((pci_device_id == ent->pvi_device_id) && ((pci_subvendor_id == ent->pvi_subvendor_id) || (ent->pvi_subvendor_id == 0)) && ((pci_subdevice_id == ent->pvi_subdevice_id) || (ent->pvi_subdevice_id == 0)) && ((pci_rev_id == ent->pvi_rev_id) || (ent->pvi_rev_id == 0))) { device_set_desc_copy(dev, ent->pvi_name); /* this needs to be changed to zero if the bus probing code * ever stops re-probing on best match because the sctx * may have its values over written by register calls * in subsequent probes */ return (BUS_PROBE_DEFAULT); } ent++; } return (ENXIO); } static void iflib_reset_qvalues(if_ctx_t ctx) { if_softc_ctx_t scctx = 
&ctx->ifc_softc_ctx; if_shared_ctx_t sctx = ctx->ifc_sctx; device_t dev = ctx->ifc_dev; int i; scctx->isc_txrx_budget_bytes_max = IFLIB_MAX_TX_BYTES; scctx->isc_tx_qdepth = IFLIB_DEFAULT_TX_QDEPTH; /* * XXX sanity check that ntxd & nrxd are a power of 2 */ if (ctx->ifc_sysctl_ntxqs != 0) scctx->isc_ntxqsets = ctx->ifc_sysctl_ntxqs; if (ctx->ifc_sysctl_nrxqs != 0) scctx->isc_nrxqsets = ctx->ifc_sysctl_nrxqs; for (i = 0; i < sctx->isc_ntxqs; i++) { if (ctx->ifc_sysctl_ntxds[i] != 0) scctx->isc_ntxd[i] = ctx->ifc_sysctl_ntxds[i]; else scctx->isc_ntxd[i] = sctx->isc_ntxd_default[i]; } for (i = 0; i < sctx->isc_nrxqs; i++) { if (ctx->ifc_sysctl_nrxds[i] != 0) scctx->isc_nrxd[i] = ctx->ifc_sysctl_nrxds[i]; else scctx->isc_nrxd[i] = sctx->isc_nrxd_default[i]; } for (i = 0; i < sctx->isc_nrxqs; i++) { if (scctx->isc_nrxd[i] < sctx->isc_nrxd_min[i]) { device_printf(dev, "nrxd%d: %d less than nrxd_min %d - resetting to min\n", i, scctx->isc_nrxd[i], sctx->isc_nrxd_min[i]); scctx->isc_nrxd[i] = sctx->isc_nrxd_min[i]; } if (scctx->isc_nrxd[i] > sctx->isc_nrxd_max[i]) { device_printf(dev, "nrxd%d: %d greater than nrxd_max %d - resetting to max\n", i, scctx->isc_nrxd[i], sctx->isc_nrxd_max[i]); scctx->isc_nrxd[i] = sctx->isc_nrxd_max[i]; } } for (i = 0; i < sctx->isc_ntxqs; i++) { if (scctx->isc_ntxd[i] < sctx->isc_ntxd_min[i]) { device_printf(dev, "ntxd%d: %d less than ntxd_min %d - resetting to min\n", i, scctx->isc_ntxd[i], sctx->isc_ntxd_min[i]); scctx->isc_ntxd[i] = sctx->isc_ntxd_min[i]; } if (scctx->isc_ntxd[i] > sctx->isc_ntxd_max[i]) { device_printf(dev, "ntxd%d: %d greater than ntxd_max %d - resetting to max\n", i, scctx->isc_ntxd[i], sctx->isc_ntxd_max[i]); scctx->isc_ntxd[i] = sctx->isc_ntxd_max[i]; } } } +static void +iflib_add_pfil(if_ctx_t ctx) +{ + struct pfil_head *pfil; + struct pfil_head_args pa; + iflib_rxq_t rxq; + int i; + + pa.pa_version = PFIL_VERSION; + pa.pa_flags = PFIL_IN; + pa.pa_type = PFIL_TYPE_ETHERNET; + pa.pa_headname = ctx->ifc_ifp->if_xname; + pfil = pfil_head_register(&pa); + + for (i = 0, rxq = ctx->ifc_rxqs; i < NRXQSETS(ctx); i++, rxq++) { + rxq->pfil = pfil; + } +} + +static void +iflib_rem_pfil(if_ctx_t ctx) +{ + struct pfil_head *pfil; + iflib_rxq_t rxq; + int i; + + rxq = ctx->ifc_rxqs; + pfil = rxq->pfil; + for (i = 0; i < NRXQSETS(ctx); i++, rxq++) { + rxq->pfil = NULL; + } + pfil_head_unregister(pfil); +} + +static uint16_t +get_ctx_core_offset(if_ctx_t ctx) +{ + if_softc_ctx_t scctx = &ctx->ifc_softc_ctx; + struct cpu_offset *op; + uint16_t qc; + uint16_t ret = ctx->ifc_sysctl_core_offset; + + if (ret != CORE_OFFSET_UNSPECIFIED) + return (ret); + + if (ctx->ifc_sysctl_separate_txrx) + qc = scctx->isc_ntxqsets + scctx->isc_nrxqsets; + else + qc = max(scctx->isc_ntxqsets, scctx->isc_nrxqsets); + + mtx_lock(&cpu_offset_mtx); + SLIST_FOREACH(op, &cpu_offsets, entries) { + if (CPU_CMP(&ctx->ifc_cpus, &op->set) == 0) { + ret = op->offset; + op->offset += qc; + MPASS(op->refcount < UINT_MAX); + op->refcount++; + break; + } + } + if (ret == CORE_OFFSET_UNSPECIFIED) { + ret = 0; + op = malloc(sizeof(struct cpu_offset), M_IFLIB, + M_NOWAIT | M_ZERO); + if (op == NULL) { + device_printf(ctx->ifc_dev, + "allocation for cpu offset failed.\n"); + } else { + op->offset = qc; + op->refcount = 1; + CPU_COPY(&ctx->ifc_cpus, &op->set); + SLIST_INSERT_HEAD(&cpu_offsets, op, entries); + } + } + mtx_unlock(&cpu_offset_mtx); + + return (ret); +} + +static void +unref_ctx_core_offset(if_ctx_t ctx) +{ + struct cpu_offset *op, *top; + + mtx_lock(&cpu_offset_mtx); + 
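+	/*
+	 * cpu_offsets keeps one refcounted entry per distinct CPU set;
+	 * get_ctx_core_offset() above hands out successive offsets from it
+	 * so that interfaces sharing a CPU set start their queue-to-core
+	 * assignment at different cores.  Here we drop this context's
+	 * reference and free the entry once its last user detaches.
+	 */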
SLIST_FOREACH_SAFE(op, &cpu_offsets, entries, top) { + if (CPU_CMP(&ctx->ifc_cpus, &op->set) == 0) { + MPASS(op->refcount > 0); + op->refcount--; + if (op->refcount == 0) { + SLIST_REMOVE(&cpu_offsets, op, cpu_offset, entries); + free(op, M_IFLIB); + } + break; + } + } + mtx_unlock(&cpu_offset_mtx); +} + int iflib_device_register(device_t dev, void *sc, if_shared_ctx_t sctx, if_ctx_t *ctxp) { int err, rid, msix; if_ctx_t ctx; if_t ifp; if_softc_ctx_t scctx; int i; uint16_t main_txq; uint16_t main_rxq; ctx = malloc(sizeof(* ctx), M_IFLIB, M_WAITOK|M_ZERO); if (sc == NULL) { sc = malloc(sctx->isc_driver->size, M_IFLIB, M_WAITOK|M_ZERO); device_set_softc(dev, ctx); ctx->ifc_flags |= IFC_SC_ALLOCATED; } ctx->ifc_sctx = sctx; ctx->ifc_dev = dev; ctx->ifc_softc = sc; if ((err = iflib_register(ctx)) != 0) { device_printf(dev, "iflib_register failed %d\n", err); goto fail_ctx_free; } iflib_add_device_sysctl_pre(ctx); scctx = &ctx->ifc_softc_ctx; ifp = ctx->ifc_ifp; iflib_reset_qvalues(ctx); CTX_LOCK(ctx); if ((err = IFDI_ATTACH_PRE(ctx)) != 0) { device_printf(dev, "IFDI_ATTACH_PRE failed %d\n", err); goto fail_unlock; } _iflib_pre_assert(scctx); ctx->ifc_txrx = *scctx->isc_txrx; #ifdef INVARIANTS MPASS(scctx->isc_capabilities); if (scctx->isc_capabilities & IFCAP_TXCSUM) MPASS(scctx->isc_tx_csum_flags); #endif if_setcapabilities(ifp, scctx->isc_capabilities | IFCAP_HWSTATS); if_setcapenable(ifp, scctx->isc_capenable | IFCAP_HWSTATS); if (scctx->isc_ntxqsets == 0 || (scctx->isc_ntxqsets_max && scctx->isc_ntxqsets_max < scctx->isc_ntxqsets)) scctx->isc_ntxqsets = scctx->isc_ntxqsets_max; if (scctx->isc_nrxqsets == 0 || (scctx->isc_nrxqsets_max && scctx->isc_nrxqsets_max < scctx->isc_nrxqsets)) scctx->isc_nrxqsets = scctx->isc_nrxqsets_max; main_txq = (sctx->isc_flags & IFLIB_HAS_TXCQ) ? 1 : 0; main_rxq = (sctx->isc_flags & IFLIB_HAS_RXCQ) ? 1 : 0; /* XXX change for per-queue sizes */ device_printf(dev, "Using %d tx descriptors and %d rx descriptors\n", scctx->isc_ntxd[main_txq], scctx->isc_nrxd[main_rxq]); for (i = 0; i < sctx->isc_nrxqs; i++) { if (!powerof2(scctx->isc_nrxd[i])) { /* round down instead? */ device_printf(dev, "# rx descriptors must be a power of 2\n"); err = EINVAL; goto fail_iflib_detach; } } for (i = 0; i < sctx->isc_ntxqs; i++) { if (!powerof2(scctx->isc_ntxd[i])) { device_printf(dev, "# tx descriptors must be a power of 2"); err = EINVAL; goto fail_iflib_detach; } } if (scctx->isc_tx_nsegments > scctx->isc_ntxd[main_txq] / MAX_SINGLE_PACKET_FRACTION) scctx->isc_tx_nsegments = max(1, scctx->isc_ntxd[main_txq] / MAX_SINGLE_PACKET_FRACTION); if (scctx->isc_tx_tso_segments_max > scctx->isc_ntxd[main_txq] / MAX_SINGLE_PACKET_FRACTION) scctx->isc_tx_tso_segments_max = max(1, scctx->isc_ntxd[main_txq] / MAX_SINGLE_PACKET_FRACTION); /* TSO parameters - dig these out of the data sheet - simply correspond to tag setup */ if (if_getcapabilities(ifp) & IFCAP_TSO) { /* * The stack can't handle a TSO size larger than IP_MAXPACKET, * but some MACs do. */ if_sethwtsomax(ifp, min(scctx->isc_tx_tso_size_max, IP_MAXPACKET)); /* * Take maximum number of m_pullup(9)'s in iflib_parse_header() * into account. In the worst case, each of these calls will * add another mbuf and, thus, the requirement for another DMA * segment. So for best performance, it doesn't make sense to * advertize a maximum of TSO segments that typically will * require defragmentation in iflib_encap(). 
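	 * Hence the limit advertised just below is isc_tx_tso_segments_max
	 * minus three, reserving headroom for the extra mbufs (and therefore
	 * DMA segments) those m_pullup() calls can introduce.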
*/ if_sethwtsomaxsegcount(ifp, scctx->isc_tx_tso_segments_max - 3); if_sethwtsomaxsegsize(ifp, scctx->isc_tx_tso_segsize_max); } if (scctx->isc_rss_table_size == 0) scctx->isc_rss_table_size = 64; scctx->isc_rss_table_mask = scctx->isc_rss_table_size-1; GROUPTASK_INIT(&ctx->ifc_admin_task, 0, _task_fn_admin, ctx); /* XXX format name */ taskqgroup_attach(qgroup_if_config_tqg, &ctx->ifc_admin_task, ctx, NULL, NULL, "admin"); /* Set up cpu set. If it fails, use the set of all CPUs. */ if (bus_get_cpus(dev, INTR_CPUS, sizeof(ctx->ifc_cpus), &ctx->ifc_cpus) != 0) { device_printf(dev, "Unable to fetch CPU list\n"); CPU_COPY(&all_cpus, &ctx->ifc_cpus); } MPASS(CPU_COUNT(&ctx->ifc_cpus) > 0); /* ** Now set up MSI or MSI-X, should return us the number of supported ** vectors (will be 1 for a legacy interrupt and MSI). */ if (sctx->isc_flags & IFLIB_SKIP_MSIX) { msix = scctx->isc_vectors; } else if (scctx->isc_msix_bar != 0) /* * The simple fact that isc_msix_bar is not 0 does not mean we * we have a good value there that is known to work. */ msix = iflib_msix_init(ctx); else { scctx->isc_vectors = 1; scctx->isc_ntxqsets = 1; scctx->isc_nrxqsets = 1; scctx->isc_intr = IFLIB_INTR_LEGACY; msix = 0; } /* Get memory for the station queues */ if ((err = iflib_queues_alloc(ctx))) { device_printf(dev, "Unable to allocate queue memory\n"); goto fail_intr_free; } if ((err = iflib_qset_structures_setup(ctx))) goto fail_queues; /* + * Now that we know how many queues there are, get the core offset. + */ + ctx->ifc_sysctl_core_offset = get_ctx_core_offset(ctx); + + /* * Group taskqueues aren't properly set up until SMP is started, * so we disable interrupts until we can handle them post * SI_SUB_SMP. * * XXX: disabling interrupts doesn't actually work, at least for * the non-MSI case. When they occur before SI_SUB_SMP completes, * we do null handling and depend on this not causing too large an * interrupt storm. */ IFDI_INTR_DISABLE(ctx); if (msix > 1 && (err = IFDI_MSIX_INTR_ASSIGN(ctx, msix)) != 0) { device_printf(dev, "IFDI_MSIX_INTR_ASSIGN failed %d\n", err); goto fail_queues; } if (msix <= 1) { rid = 0; if (scctx->isc_intr == IFLIB_INTR_MSI) { MPASS(msix == 1); rid = 1; } if ((err = iflib_legacy_setup(ctx, ctx->isc_legacy_intr, ctx->ifc_softc, &rid, "irq0")) != 0) { device_printf(dev, "iflib_legacy_setup failed %d\n", err); goto fail_queues; } } ether_ifattach(ctx->ifc_ifp, ctx->ifc_mac.octet); if ((err = IFDI_ATTACH_POST(ctx)) != 0) { device_printf(dev, "IFDI_ATTACH_POST failed %d\n", err); goto fail_detach; } /* * Tell the upper layer(s) if IFCAP_VLAN_MTU is supported. * This must appear after the call to ether_ifattach() because * ether_ifattach() sets if_hdrlen to the default value. 
*/ if (if_getcapabilities(ifp) & IFCAP_VLAN_MTU) if_setifheaderlen(ifp, sizeof(struct ether_vlan_header)); if ((err = iflib_netmap_attach(ctx))) { device_printf(ctx->ifc_dev, "netmap attach failed: %d\n", err); goto fail_detach; } *ctxp = ctx; NETDUMP_SET(ctx->ifc_ifp, iflib); if_setgetcounterfn(ctx->ifc_ifp, iflib_if_get_counter); iflib_add_device_sysctl_post(ctx); + iflib_add_pfil(ctx); ctx->ifc_flags |= IFC_INIT_DONE; CTX_UNLOCK(ctx); return (0); fail_detach: ether_ifdetach(ctx->ifc_ifp); fail_intr_free: iflib_free_intr_mem(ctx); fail_queues: iflib_tx_structures_free(ctx); iflib_rx_structures_free(ctx); fail_iflib_detach: IFDI_DETACH(ctx); fail_unlock: CTX_UNLOCK(ctx); fail_ctx_free: if (ctx->ifc_flags & IFC_SC_ALLOCATED) free(ctx->ifc_softc, M_IFLIB); free(ctx, M_IFLIB); return (err); } int iflib_pseudo_register(device_t dev, if_shared_ctx_t sctx, if_ctx_t *ctxp, struct iflib_cloneattach_ctx *clctx) { int err; if_ctx_t ctx; if_t ifp; if_softc_ctx_t scctx; int i; void *sc; uint16_t main_txq; uint16_t main_rxq; ctx = malloc(sizeof(*ctx), M_IFLIB, M_WAITOK|M_ZERO); sc = malloc(sctx->isc_driver->size, M_IFLIB, M_WAITOK|M_ZERO); ctx->ifc_flags |= IFC_SC_ALLOCATED; if (sctx->isc_flags & (IFLIB_PSEUDO|IFLIB_VIRTUAL)) ctx->ifc_flags |= IFC_PSEUDO; ctx->ifc_sctx = sctx; ctx->ifc_softc = sc; ctx->ifc_dev = dev; if ((err = iflib_register(ctx)) != 0) { device_printf(dev, "%s: iflib_register failed %d\n", __func__, err); goto fail_ctx_free; } iflib_add_device_sysctl_pre(ctx); scctx = &ctx->ifc_softc_ctx; ifp = ctx->ifc_ifp; /* * XXX sanity check that ntxd & nrxd are a power of 2 */ iflib_reset_qvalues(ctx); CTX_LOCK(ctx); if ((err = IFDI_ATTACH_PRE(ctx)) != 0) { device_printf(dev, "IFDI_ATTACH_PRE failed %d\n", err); goto fail_unlock; } if (sctx->isc_flags & IFLIB_GEN_MAC) ether_gen_addr(ifp, &ctx->ifc_mac); if ((err = IFDI_CLONEATTACH(ctx, clctx->cc_ifc, clctx->cc_name, clctx->cc_params)) != 0) { device_printf(dev, "IFDI_CLONEATTACH failed %d\n", err); goto fail_ctx_free; } ifmedia_add(&ctx->ifc_media, IFM_ETHER | IFM_1000_T | IFM_FDX, 0, NULL); ifmedia_add(&ctx->ifc_media, IFM_ETHER | IFM_AUTO, 0, NULL); ifmedia_set(&ctx->ifc_media, IFM_ETHER | IFM_AUTO); #ifdef INVARIANTS MPASS(scctx->isc_capabilities); if (scctx->isc_capabilities & IFCAP_TXCSUM) MPASS(scctx->isc_tx_csum_flags); #endif if_setcapabilities(ifp, scctx->isc_capabilities | IFCAP_HWSTATS | IFCAP_LINKSTATE); if_setcapenable(ifp, scctx->isc_capenable | IFCAP_HWSTATS | IFCAP_LINKSTATE); ifp->if_flags |= IFF_NOGROUP; if (sctx->isc_flags & IFLIB_PSEUDO) { ether_ifattach(ctx->ifc_ifp, ctx->ifc_mac.octet); if ((err = IFDI_ATTACH_POST(ctx)) != 0) { device_printf(dev, "IFDI_ATTACH_POST failed %d\n", err); goto fail_detach; } *ctxp = ctx; /* * Tell the upper layer(s) if IFCAP_VLAN_MTU is supported. * This must appear after the call to ether_ifattach() because * ether_ifattach() sets if_hdrlen to the default value. 
*/ if (if_getcapabilities(ifp) & IFCAP_VLAN_MTU) if_setifheaderlen(ifp, sizeof(struct ether_vlan_header)); if_setgetcounterfn(ctx->ifc_ifp, iflib_if_get_counter); iflib_add_device_sysctl_post(ctx); ctx->ifc_flags |= IFC_INIT_DONE; return (0); } _iflib_pre_assert(scctx); ctx->ifc_txrx = *scctx->isc_txrx; if (scctx->isc_ntxqsets == 0 || (scctx->isc_ntxqsets_max && scctx->isc_ntxqsets_max < scctx->isc_ntxqsets)) scctx->isc_ntxqsets = scctx->isc_ntxqsets_max; if (scctx->isc_nrxqsets == 0 || (scctx->isc_nrxqsets_max && scctx->isc_nrxqsets_max < scctx->isc_nrxqsets)) scctx->isc_nrxqsets = scctx->isc_nrxqsets_max; main_txq = (sctx->isc_flags & IFLIB_HAS_TXCQ) ? 1 : 0; main_rxq = (sctx->isc_flags & IFLIB_HAS_RXCQ) ? 1 : 0; /* XXX change for per-queue sizes */ device_printf(dev, "Using %d tx descriptors and %d rx descriptors\n", scctx->isc_ntxd[main_txq], scctx->isc_nrxd[main_rxq]); for (i = 0; i < sctx->isc_nrxqs; i++) { if (!powerof2(scctx->isc_nrxd[i])) { /* round down instead? */ device_printf(dev, "# rx descriptors must be a power of 2\n"); err = EINVAL; goto fail_iflib_detach; } } for (i = 0; i < sctx->isc_ntxqs; i++) { if (!powerof2(scctx->isc_ntxd[i])) { device_printf(dev, "# tx descriptors must be a power of 2"); err = EINVAL; goto fail_iflib_detach; } } if (scctx->isc_tx_nsegments > scctx->isc_ntxd[main_txq] / MAX_SINGLE_PACKET_FRACTION) scctx->isc_tx_nsegments = max(1, scctx->isc_ntxd[main_txq] / MAX_SINGLE_PACKET_FRACTION); if (scctx->isc_tx_tso_segments_max > scctx->isc_ntxd[main_txq] / MAX_SINGLE_PACKET_FRACTION) scctx->isc_tx_tso_segments_max = max(1, scctx->isc_ntxd[main_txq] / MAX_SINGLE_PACKET_FRACTION); /* TSO parameters - dig these out of the data sheet - simply correspond to tag setup */ if (if_getcapabilities(ifp) & IFCAP_TSO) { /* * The stack can't handle a TSO size larger than IP_MAXPACKET, * but some MACs do. */ if_sethwtsomax(ifp, min(scctx->isc_tx_tso_size_max, IP_MAXPACKET)); /* * Take maximum number of m_pullup(9)'s in iflib_parse_header() * into account. In the worst case, each of these calls will * add another mbuf and, thus, the requirement for another DMA * segment. So for best performance, it doesn't make sense to * advertize a maximum of TSO segments that typically will * require defragmentation in iflib_encap(). */ if_sethwtsomaxsegcount(ifp, scctx->isc_tx_tso_segments_max - 3); if_sethwtsomaxsegsize(ifp, scctx->isc_tx_tso_segsize_max); } if (scctx->isc_rss_table_size == 0) scctx->isc_rss_table_size = 64; scctx->isc_rss_table_mask = scctx->isc_rss_table_size-1; GROUPTASK_INIT(&ctx->ifc_admin_task, 0, _task_fn_admin, ctx); /* XXX format name */ taskqgroup_attach(qgroup_if_config_tqg, &ctx->ifc_admin_task, ctx, NULL, NULL, "admin"); /* XXX --- can support > 1 -- but keep it simple for now */ scctx->isc_intr = IFLIB_INTR_LEGACY; /* Get memory for the station queues */ if ((err = iflib_queues_alloc(ctx))) { device_printf(dev, "Unable to allocate queue memory\n"); goto fail_iflib_detach; } if ((err = iflib_qset_structures_setup(ctx))) { device_printf(dev, "qset structure setup failed %d\n", err); goto fail_queues; } /* * XXX What if anything do we want to do about interrupts? */ ether_ifattach(ctx->ifc_ifp, ctx->ifc_mac.octet); if ((err = IFDI_ATTACH_POST(ctx)) != 0) { device_printf(dev, "IFDI_ATTACH_POST failed %d\n", err); goto fail_detach; } /* * Tell the upper layer(s) if IFCAP_VLAN_MTU is supported. * This must appear after the call to ether_ifattach() because * ether_ifattach() sets if_hdrlen to the default value. 
*/ if (if_getcapabilities(ifp) & IFCAP_VLAN_MTU) if_setifheaderlen(ifp, sizeof(struct ether_vlan_header)); /* XXX handle more than one queue */ for (i = 0; i < scctx->isc_nrxqsets; i++) IFDI_RX_CLSET(ctx, 0, i, ctx->ifc_rxqs[i].ifr_fl[0].ifl_sds.ifsd_cl); *ctxp = ctx; if_setgetcounterfn(ctx->ifc_ifp, iflib_if_get_counter); iflib_add_device_sysctl_post(ctx); ctx->ifc_flags |= IFC_INIT_DONE; CTX_UNLOCK(ctx); return (0); fail_detach: ether_ifdetach(ctx->ifc_ifp); fail_queues: iflib_tx_structures_free(ctx); iflib_rx_structures_free(ctx); fail_iflib_detach: IFDI_DETACH(ctx); fail_unlock: CTX_UNLOCK(ctx); fail_ctx_free: free(ctx->ifc_softc, M_IFLIB); free(ctx, M_IFLIB); return (err); } int iflib_pseudo_deregister(if_ctx_t ctx) { if_t ifp = ctx->ifc_ifp; iflib_txq_t txq; iflib_rxq_t rxq; int i, j; struct taskqgroup *tqg; iflib_fl_t fl; /* Unregister VLAN events */ if (ctx->ifc_vlan_attach_event != NULL) EVENTHANDLER_DEREGISTER(vlan_config, ctx->ifc_vlan_attach_event); if (ctx->ifc_vlan_detach_event != NULL) EVENTHANDLER_DEREGISTER(vlan_unconfig, ctx->ifc_vlan_detach_event); ether_ifdetach(ifp); /* ether_ifdetach calls if_qflush - lock must be destroy afterwards*/ CTX_LOCK_DESTROY(ctx); /* XXX drain any dependent tasks */ tqg = qgroup_if_io_tqg; for (txq = ctx->ifc_txqs, i = 0; i < NTXQSETS(ctx); i++, txq++) { callout_drain(&txq->ift_timer); if (txq->ift_task.gt_uniq != NULL) taskqgroup_detach(tqg, &txq->ift_task); } for (i = 0, rxq = ctx->ifc_rxqs; i < NRXQSETS(ctx); i++, rxq++) { if (rxq->ifr_task.gt_uniq != NULL) taskqgroup_detach(tqg, &rxq->ifr_task); for (j = 0, fl = rxq->ifr_fl; j < rxq->ifr_nfl; j++, fl++) free(fl->ifl_rx_bitmap, M_IFLIB); } tqg = qgroup_if_config_tqg; if (ctx->ifc_admin_task.gt_uniq != NULL) taskqgroup_detach(tqg, &ctx->ifc_admin_task); if (ctx->ifc_vflr_task.gt_uniq != NULL) taskqgroup_detach(tqg, &ctx->ifc_vflr_task); if_free(ifp); iflib_tx_structures_free(ctx); iflib_rx_structures_free(ctx); if (ctx->ifc_flags & IFC_SC_ALLOCATED) free(ctx->ifc_softc, M_IFLIB); free(ctx, M_IFLIB); return (0); } int iflib_device_attach(device_t dev) { if_ctx_t ctx; if_shared_ctx_t sctx; if ((sctx = DEVICE_REGISTER(dev)) == NULL || sctx->isc_magic != IFLIB_MAGIC) return (ENOTSUP); pci_enable_busmaster(dev); return (iflib_device_register(dev, NULL, sctx, &ctx)); } int iflib_device_deregister(if_ctx_t ctx) { if_t ifp = ctx->ifc_ifp; iflib_txq_t txq; iflib_rxq_t rxq; device_t dev = ctx->ifc_dev; int i, j; struct taskqgroup *tqg; iflib_fl_t fl; /* Make sure VLANS are not using driver */ if (if_vlantrunkinuse(ifp)) { device_printf(dev, "Vlan in use, detach first\n"); return (EBUSY); } #ifdef PCI_IOV if (!CTX_IS_VF(ctx) && pci_iov_detach(dev) != 0) { device_printf(dev, "SR-IOV in use; detach first.\n"); return (EBUSY); } #endif STATE_LOCK(ctx); ctx->ifc_flags |= IFC_IN_DETACH; STATE_UNLOCK(ctx); CTX_LOCK(ctx); iflib_stop(ctx); CTX_UNLOCK(ctx); /* Unregister VLAN events */ if (ctx->ifc_vlan_attach_event != NULL) EVENTHANDLER_DEREGISTER(vlan_config, ctx->ifc_vlan_attach_event); if (ctx->ifc_vlan_detach_event != NULL) EVENTHANDLER_DEREGISTER(vlan_unconfig, ctx->ifc_vlan_detach_event); iflib_netmap_detach(ifp); ether_ifdetach(ifp); + iflib_rem_pfil(ctx); if (ctx->ifc_led_dev != NULL) led_destroy(ctx->ifc_led_dev); /* XXX drain any dependent tasks */ tqg = qgroup_if_io_tqg; for (txq = ctx->ifc_txqs, i = 0; i < NTXQSETS(ctx); i++, txq++) { callout_drain(&txq->ift_timer); if (txq->ift_task.gt_uniq != NULL) taskqgroup_detach(tqg, &txq->ift_task); } for (i = 0, rxq = ctx->ifc_rxqs; i < NRXQSETS(ctx); 
i++, rxq++) { if (rxq->ifr_task.gt_uniq != NULL) taskqgroup_detach(tqg, &rxq->ifr_task); for (j = 0, fl = rxq->ifr_fl; j < rxq->ifr_nfl; j++, fl++) free(fl->ifl_rx_bitmap, M_IFLIB); } tqg = qgroup_if_config_tqg; if (ctx->ifc_admin_task.gt_uniq != NULL) taskqgroup_detach(tqg, &ctx->ifc_admin_task); if (ctx->ifc_vflr_task.gt_uniq != NULL) taskqgroup_detach(tqg, &ctx->ifc_vflr_task); CTX_LOCK(ctx); IFDI_DETACH(ctx); CTX_UNLOCK(ctx); /* ether_ifdetach calls if_qflush - lock must be destroy afterwards*/ CTX_LOCK_DESTROY(ctx); device_set_softc(ctx->ifc_dev, NULL); iflib_free_intr_mem(ctx); bus_generic_detach(dev); if_free(ifp); iflib_tx_structures_free(ctx); iflib_rx_structures_free(ctx); if (ctx->ifc_flags & IFC_SC_ALLOCATED) free(ctx->ifc_softc, M_IFLIB); + unref_ctx_core_offset(ctx); STATE_LOCK_DESTROY(ctx); free(ctx, M_IFLIB); return (0); } static void iflib_free_intr_mem(if_ctx_t ctx) { if (ctx->ifc_softc_ctx.isc_intr != IFLIB_INTR_MSIX) { iflib_irq_free(ctx, &ctx->ifc_legacy_irq); } if (ctx->ifc_softc_ctx.isc_intr != IFLIB_INTR_LEGACY) { pci_release_msi(ctx->ifc_dev); } if (ctx->ifc_msix_mem != NULL) { bus_release_resource(ctx->ifc_dev, SYS_RES_MEMORY, rman_get_rid(ctx->ifc_msix_mem), ctx->ifc_msix_mem); ctx->ifc_msix_mem = NULL; } } int iflib_device_detach(device_t dev) { if_ctx_t ctx = device_get_softc(dev); return (iflib_device_deregister(ctx)); } int iflib_device_suspend(device_t dev) { if_ctx_t ctx = device_get_softc(dev); CTX_LOCK(ctx); IFDI_SUSPEND(ctx); CTX_UNLOCK(ctx); return bus_generic_suspend(dev); } int iflib_device_shutdown(device_t dev) { if_ctx_t ctx = device_get_softc(dev); CTX_LOCK(ctx); IFDI_SHUTDOWN(ctx); CTX_UNLOCK(ctx); return bus_generic_suspend(dev); } int iflib_device_resume(device_t dev) { if_ctx_t ctx = device_get_softc(dev); iflib_txq_t txq = ctx->ifc_txqs; CTX_LOCK(ctx); IFDI_RESUME(ctx); iflib_if_init_locked(ctx); CTX_UNLOCK(ctx); for (int i = 0; i < NTXQSETS(ctx); i++, txq++) iflib_txq_check_drain(txq, IFLIB_RESTART_BUDGET); return (bus_generic_resume(dev)); } int iflib_device_iov_init(device_t dev, uint16_t num_vfs, const nvlist_t *params) { int error; if_ctx_t ctx = device_get_softc(dev); CTX_LOCK(ctx); error = IFDI_IOV_INIT(ctx, num_vfs, params); CTX_UNLOCK(ctx); return (error); } void iflib_device_iov_uninit(device_t dev) { if_ctx_t ctx = device_get_softc(dev); CTX_LOCK(ctx); IFDI_IOV_UNINIT(ctx); CTX_UNLOCK(ctx); } int iflib_device_iov_add_vf(device_t dev, uint16_t vfnum, const nvlist_t *params) { int error; if_ctx_t ctx = device_get_softc(dev); CTX_LOCK(ctx); error = IFDI_IOV_VF_ADD(ctx, vfnum, params); CTX_UNLOCK(ctx); return (error); } /********************************************************************* * * MODULE FUNCTION DEFINITIONS * **********************************************************************/ /* * - Start a fast taskqueue thread for each core * - Start a taskqueue for control operations */ static int iflib_module_init(void) { return (0); } static int iflib_module_event_handler(module_t mod, int what, void *arg) { int err; switch (what) { case MOD_LOAD: if ((err = iflib_module_init()) != 0) return (err); break; case MOD_UNLOAD: return (EBUSY); default: return (EOPNOTSUPP); } return (0); } /********************************************************************* * * PUBLIC FUNCTION DEFINITIONS * ordered as in iflib.h * **********************************************************************/ static void _iflib_assert(if_shared_ctx_t sctx) { MPASS(sctx->isc_tx_maxsize); MPASS(sctx->isc_tx_maxsegsize); MPASS(sctx->isc_rx_maxsize); 
MPASS(sctx->isc_rx_nsegments); MPASS(sctx->isc_rx_maxsegsize); MPASS(sctx->isc_nrxd_min[0]); MPASS(sctx->isc_nrxd_max[0]); MPASS(sctx->isc_nrxd_default[0]); MPASS(sctx->isc_ntxd_min[0]); MPASS(sctx->isc_ntxd_max[0]); MPASS(sctx->isc_ntxd_default[0]); } static void _iflib_pre_assert(if_softc_ctx_t scctx) { MPASS(scctx->isc_txrx->ift_txd_encap); MPASS(scctx->isc_txrx->ift_txd_flush); MPASS(scctx->isc_txrx->ift_txd_credits_update); MPASS(scctx->isc_txrx->ift_rxd_available); MPASS(scctx->isc_txrx->ift_rxd_pkt_get); MPASS(scctx->isc_txrx->ift_rxd_refill); MPASS(scctx->isc_txrx->ift_rxd_flush); } static int iflib_register(if_ctx_t ctx) { if_shared_ctx_t sctx = ctx->ifc_sctx; driver_t *driver = sctx->isc_driver; device_t dev = ctx->ifc_dev; if_t ifp; _iflib_assert(sctx); CTX_LOCK_INIT(ctx); STATE_LOCK_INIT(ctx, device_get_nameunit(ctx->ifc_dev)); ifp = ctx->ifc_ifp = if_alloc(IFT_ETHER); if (ifp == NULL) { device_printf(dev, "can not allocate ifnet structure\n"); return (ENOMEM); } /* * Initialize our context's device specific methods */ kobj_init((kobj_t) ctx, (kobj_class_t) driver); kobj_class_compile((kobj_class_t) driver); driver->refs++; if_initname(ifp, device_get_name(dev), device_get_unit(dev)); if_setsoftc(ifp, ctx); if_setdev(ifp, dev); if_setinitfn(ifp, iflib_if_init); if_setioctlfn(ifp, iflib_if_ioctl); #ifdef ALTQ if_setstartfn(ifp, iflib_altq_if_start); if_settransmitfn(ifp, iflib_altq_if_transmit); if_setsendqready(ifp); #else if_settransmitfn(ifp, iflib_if_transmit); #endif if_setqflushfn(ifp, iflib_if_qflush); if_setflags(ifp, IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST); ctx->ifc_vlan_attach_event = EVENTHANDLER_REGISTER(vlan_config, iflib_vlan_register, ctx, EVENTHANDLER_PRI_FIRST); ctx->ifc_vlan_detach_event = EVENTHANDLER_REGISTER(vlan_unconfig, iflib_vlan_unregister, ctx, EVENTHANDLER_PRI_FIRST); ifmedia_init(&ctx->ifc_media, IFM_IMASK, iflib_media_change, iflib_media_status); return (0); } static int iflib_queues_alloc(if_ctx_t ctx) { if_shared_ctx_t sctx = ctx->ifc_sctx; if_softc_ctx_t scctx = &ctx->ifc_softc_ctx; device_t dev = ctx->ifc_dev; int nrxqsets = scctx->isc_nrxqsets; int ntxqsets = scctx->isc_ntxqsets; iflib_txq_t txq; iflib_rxq_t rxq; iflib_fl_t fl = NULL; int i, j, cpu, err, txconf, rxconf; iflib_dma_info_t ifdip; uint32_t *rxqsizes = scctx->isc_rxqsizes; uint32_t *txqsizes = scctx->isc_txqsizes; uint8_t nrxqs = sctx->isc_nrxqs; uint8_t ntxqs = sctx->isc_ntxqs; int nfree_lists = sctx->isc_nfl ? 
sctx->isc_nfl : 1; caddr_t *vaddrs; uint64_t *paddrs; KASSERT(ntxqs > 0, ("number of queues per qset must be at least 1")); KASSERT(nrxqs > 0, ("number of queues per qset must be at least 1")); /* Allocate the TX ring struct memory */ if (!(ctx->ifc_txqs = (iflib_txq_t) malloc(sizeof(struct iflib_txq) * ntxqsets, M_IFLIB, M_NOWAIT | M_ZERO))) { device_printf(dev, "Unable to allocate TX ring memory\n"); err = ENOMEM; goto fail; } /* Now allocate the RX */ if (!(ctx->ifc_rxqs = (iflib_rxq_t) malloc(sizeof(struct iflib_rxq) * nrxqsets, M_IFLIB, M_NOWAIT | M_ZERO))) { device_printf(dev, "Unable to allocate RX ring memory\n"); err = ENOMEM; goto rx_fail; } txq = ctx->ifc_txqs; rxq = ctx->ifc_rxqs; /* * XXX handle allocation failure */ for (txconf = i = 0, cpu = CPU_FIRST(); i < ntxqsets; i++, txconf++, txq++, cpu = CPU_NEXT(cpu)) { /* Set up some basics */ if ((ifdip = malloc(sizeof(struct iflib_dma_info) * ntxqs, M_IFLIB, M_NOWAIT | M_ZERO)) == NULL) { device_printf(dev, "Unable to allocate TX DMA info memory\n"); err = ENOMEM; goto err_tx_desc; } txq->ift_ifdi = ifdip; for (j = 0; j < ntxqs; j++, ifdip++) { if (iflib_dma_alloc(ctx, txqsizes[j], ifdip, 0)) { device_printf(dev, "Unable to allocate TX descriptors\n"); err = ENOMEM; goto err_tx_desc; } txq->ift_txd_size[j] = scctx->isc_txd_size[j]; bzero((void *)ifdip->idi_vaddr, txqsizes[j]); } txq->ift_ctx = ctx; txq->ift_id = i; if (sctx->isc_flags & IFLIB_HAS_TXCQ) { txq->ift_br_offset = 1; } else { txq->ift_br_offset = 0; } /* XXX fix this */ txq->ift_timer.c_cpu = cpu; if (iflib_txsd_alloc(txq)) { device_printf(dev, "Critical Failure setting up TX buffers\n"); err = ENOMEM; goto err_tx_desc; } /* Initialize the TX lock */ snprintf(txq->ift_mtx_name, MTX_NAME_LEN, "%s:tx(%d):callout", device_get_nameunit(dev), txq->ift_id); mtx_init(&txq->ift_mtx, txq->ift_mtx_name, NULL, MTX_DEF); callout_init_mtx(&txq->ift_timer, &txq->ift_mtx, 0); snprintf(txq->ift_db_mtx_name, MTX_NAME_LEN, "%s:tx(%d):db", device_get_nameunit(dev), txq->ift_id); err = ifmp_ring_alloc(&txq->ift_br, 2048, txq, iflib_txq_drain, iflib_txq_can_drain, M_IFLIB, M_WAITOK); if (err) { /* XXX free any allocated rings */ device_printf(dev, "Unable to allocate buf_ring\n"); goto err_tx_desc; } } for (rxconf = i = 0; i < nrxqsets; i++, rxconf++, rxq++) { /* Set up some basics */ if ((ifdip = malloc(sizeof(struct iflib_dma_info) * nrxqs, M_IFLIB, M_NOWAIT | M_ZERO)) == NULL) { device_printf(dev, "Unable to allocate RX DMA info memory\n"); err = ENOMEM; goto err_tx_desc; } rxq->ifr_ifdi = ifdip; /* XXX this needs to be changed if #rx queues != #tx queues */ rxq->ifr_ntxqirq = 1; rxq->ifr_txqid[0] = i; for (j = 0; j < nrxqs; j++, ifdip++) { if (iflib_dma_alloc(ctx, rxqsizes[j], ifdip, 0)) { device_printf(dev, "Unable to allocate RX descriptors\n"); err = ENOMEM; goto err_tx_desc; } bzero((void *)ifdip->idi_vaddr, rxqsizes[j]); } rxq->ifr_ctx = ctx; rxq->ifr_id = i; if (sctx->isc_flags & IFLIB_HAS_RXCQ) { rxq->ifr_fl_offset = 1; } else { rxq->ifr_fl_offset = 0; } rxq->ifr_nfl = nfree_lists; if (!(fl = (iflib_fl_t) malloc(sizeof(struct iflib_fl) * nfree_lists, M_IFLIB, M_NOWAIT | M_ZERO))) { device_printf(dev, "Unable to allocate free list memory\n"); err = ENOMEM; goto err_tx_desc; } rxq->ifr_fl = fl; for (j = 0; j < nfree_lists; j++) { fl[j].ifl_rxq = rxq; fl[j].ifl_id = j; fl[j].ifl_ifdi = &rxq->ifr_ifdi[j + rxq->ifr_fl_offset]; fl[j].ifl_rxd_size = scctx->isc_rxd_size[j]; } /* Allocate receive buffers for the ring */ if (iflib_rxsd_alloc(rxq)) { device_printf(dev, "Critical Failure 
setting up receive buffers\n"); err = ENOMEM; goto err_rx_desc; } for (j = 0, fl = rxq->ifr_fl; j < rxq->ifr_nfl; j++, fl++) fl->ifl_rx_bitmap = bit_alloc(fl->ifl_size, M_IFLIB, M_WAITOK); } /* TXQs */ vaddrs = malloc(sizeof(caddr_t)*ntxqsets*ntxqs, M_IFLIB, M_WAITOK); paddrs = malloc(sizeof(uint64_t)*ntxqsets*ntxqs, M_IFLIB, M_WAITOK); for (i = 0; i < ntxqsets; i++) { iflib_dma_info_t di = ctx->ifc_txqs[i].ift_ifdi; for (j = 0; j < ntxqs; j++, di++) { vaddrs[i*ntxqs + j] = di->idi_vaddr; paddrs[i*ntxqs + j] = di->idi_paddr; } } if ((err = IFDI_TX_QUEUES_ALLOC(ctx, vaddrs, paddrs, ntxqs, ntxqsets)) != 0) { device_printf(ctx->ifc_dev, "Unable to allocate device TX queue\n"); iflib_tx_structures_free(ctx); free(vaddrs, M_IFLIB); free(paddrs, M_IFLIB); goto err_rx_desc; } free(vaddrs, M_IFLIB); free(paddrs, M_IFLIB); /* RXQs */ vaddrs = malloc(sizeof(caddr_t)*nrxqsets*nrxqs, M_IFLIB, M_WAITOK); paddrs = malloc(sizeof(uint64_t)*nrxqsets*nrxqs, M_IFLIB, M_WAITOK); for (i = 0; i < nrxqsets; i++) { iflib_dma_info_t di = ctx->ifc_rxqs[i].ifr_ifdi; for (j = 0; j < nrxqs; j++, di++) { vaddrs[i*nrxqs + j] = di->idi_vaddr; paddrs[i*nrxqs + j] = di->idi_paddr; } } if ((err = IFDI_RX_QUEUES_ALLOC(ctx, vaddrs, paddrs, nrxqs, nrxqsets)) != 0) { device_printf(ctx->ifc_dev, "Unable to allocate device RX queue\n"); iflib_tx_structures_free(ctx); free(vaddrs, M_IFLIB); free(paddrs, M_IFLIB); goto err_rx_desc; } free(vaddrs, M_IFLIB); free(paddrs, M_IFLIB); return (0); /* XXX handle allocation failure changes */ err_rx_desc: err_tx_desc: rx_fail: if (ctx->ifc_rxqs != NULL) free(ctx->ifc_rxqs, M_IFLIB); ctx->ifc_rxqs = NULL; if (ctx->ifc_txqs != NULL) free(ctx->ifc_txqs, M_IFLIB); ctx->ifc_txqs = NULL; fail: return (err); } static int iflib_tx_structures_setup(if_ctx_t ctx) { iflib_txq_t txq = ctx->ifc_txqs; int i; for (i = 0; i < NTXQSETS(ctx); i++, txq++) iflib_txq_setup(txq); return (0); } static void iflib_tx_structures_free(if_ctx_t ctx) { iflib_txq_t txq = ctx->ifc_txqs; if_shared_ctx_t sctx = ctx->ifc_sctx; int i, j; for (i = 0; i < NTXQSETS(ctx); i++, txq++) { iflib_txq_destroy(txq); for (j = 0; j < sctx->isc_ntxqs; j++) iflib_dma_free(&txq->ift_ifdi[j]); } free(ctx->ifc_txqs, M_IFLIB); ctx->ifc_txqs = NULL; IFDI_QUEUES_FREE(ctx); } /********************************************************************* * * Initialize all receive rings. * **********************************************************************/ static int iflib_rx_structures_setup(if_ctx_t ctx) { iflib_rxq_t rxq = ctx->ifc_rxqs; int q; #if defined(INET6) || defined(INET) int i, err; #endif for (q = 0; q < ctx->ifc_softc_ctx.isc_nrxqsets; q++, rxq++) { #if defined(INET6) || defined(INET) tcp_lro_free(&rxq->ifr_lc); if ((err = tcp_lro_init_args(&rxq->ifr_lc, ctx->ifc_ifp, TCP_LRO_ENTRIES, min(1024, ctx->ifc_softc_ctx.isc_nrxd[rxq->ifr_fl_offset]))) != 0) { device_printf(ctx->ifc_dev, "LRO Initialization failed!\n"); goto fail; } rxq->ifr_lro_enabled = TRUE; #endif IFDI_RXQ_SETUP(ctx, rxq->ifr_id); } return (0); #if defined(INET6) || defined(INET) fail: /* * Free RX software descriptors allocated so far, we will only handle * the rings that completed, the failing case will have * cleaned up for itself. 'q' failed, so its the terminus. */ rxq = ctx->ifc_rxqs; for (i = 0; i < q; ++i, rxq++) { iflib_rx_sds_free(rxq); rxq->ifr_cq_gen = rxq->ifr_cq_cidx = rxq->ifr_cq_pidx = 0; } return (err); #endif } /********************************************************************* * * Free all receive rings. 
* **********************************************************************/ static void iflib_rx_structures_free(if_ctx_t ctx) { iflib_rxq_t rxq = ctx->ifc_rxqs; for (int i = 0; i < ctx->ifc_softc_ctx.isc_nrxqsets; i++, rxq++) { iflib_rx_sds_free(rxq); } free(ctx->ifc_rxqs, M_IFLIB); ctx->ifc_rxqs = NULL; } static int iflib_qset_structures_setup(if_ctx_t ctx) { int err; /* * It is expected that the caller takes care of freeing queues if this * fails. */ if ((err = iflib_tx_structures_setup(ctx)) != 0) { device_printf(ctx->ifc_dev, "iflib_tx_structures_setup failed: %d\n", err); return (err); } if ((err = iflib_rx_structures_setup(ctx)) != 0) device_printf(ctx->ifc_dev, "iflib_rx_structures_setup failed: %d\n", err); return (err); } int iflib_irq_alloc(if_ctx_t ctx, if_irq_t irq, int rid, driver_filter_t filter, void *filter_arg, driver_intr_t handler, void *arg, const char *name) { return (_iflib_irq_alloc(ctx, irq, rid, filter, handler, arg, name)); } #ifdef SMP static int find_nth(if_ctx_t ctx, int qid) { cpuset_t cpus; int i, cpuid, eqid, count; CPU_COPY(&ctx->ifc_cpus, &cpus); count = CPU_COUNT(&cpus); eqid = qid % count; /* clear up to the qid'th bit */ for (i = 0; i < eqid; i++) { cpuid = CPU_FFS(&cpus); MPASS(cpuid != 0); CPU_CLR(cpuid-1, &cpus); } cpuid = CPU_FFS(&cpus); MPASS(cpuid != 0); return (cpuid-1); } #ifdef SCHED_ULE extern struct cpu_group *cpu_top; /* CPU topology */ static int find_child_with_core(int cpu, struct cpu_group *grp) { int i; if (grp->cg_children == 0) return -1; MPASS(grp->cg_child); for (i = 0; i < grp->cg_children; i++) { if (CPU_ISSET(cpu, &grp->cg_child[i].cg_mask)) return i; } return -1; } /* * Find the nth "close" core to the specified core * "close" is defined as the deepest level that shares * at least an L2 cache. With threads, this will be - * threads on the same core. If the sahred cache is L3 + * threads on the same core. If the shared cache is L3 * or higher, simply returns the same core. */ static int find_close_core(int cpu, int core_offset) { struct cpu_group *grp; int i; int fcpu; cpuset_t cs; grp = cpu_top; if (grp == NULL) return cpu; i = 0; while ((i = find_child_with_core(cpu, grp)) != -1) { /* If the child only has one cpu, don't descend */ if (grp->cg_child[i].cg_count <= 1) break; grp = &grp->cg_child[i]; } /* If they don't share at least an L2 cache, use the same CPU */ if (grp->cg_level > CG_SHARE_L2 || grp->cg_level == CG_SHARE_NONE) return cpu; /* Now pick one */ CPU_COPY(&grp->cg_mask, &cs); /* Add the selected CPU offset to core offset. 
*/ for (i = 0; (fcpu = CPU_FFS(&cs)) != 0; i++) { if (fcpu - 1 == cpu) break; CPU_CLR(fcpu - 1, &cs); } MPASS(fcpu); core_offset += i; CPU_COPY(&grp->cg_mask, &cs); for (i = core_offset % grp->cg_count; i > 0; i--) { MPASS(CPU_FFS(&cs)); CPU_CLR(CPU_FFS(&cs) - 1, &cs); } MPASS(CPU_FFS(&cs)); return CPU_FFS(&cs) - 1; } #else static int find_close_core(int cpu, int core_offset __unused) { return cpu; } #endif static int get_core_offset(if_ctx_t ctx, iflib_intr_type_t type, int qid) { switch (type) { case IFLIB_INTR_TX: /* TX queues get cores which share at least an L2 cache with the corresponding RX queue */ /* XXX handle multiple RX threads per core and more than two core per L2 group */ return qid / CPU_COUNT(&ctx->ifc_cpus) + 1; case IFLIB_INTR_RX: case IFLIB_INTR_RXTX: /* RX queues get the specified core */ return qid / CPU_COUNT(&ctx->ifc_cpus); default: return -1; } } #else #define get_core_offset(ctx, type, qid) CPU_FIRST() #define find_close_core(cpuid, tid) CPU_FIRST() #define find_nth(ctx, gid) CPU_FIRST() #endif /* Just to avoid copy/paste */ static inline int iflib_irq_set_affinity(if_ctx_t ctx, if_irq_t irq, iflib_intr_type_t type, int qid, struct grouptask *gtask, struct taskqgroup *tqg, void *uniq, const char *name) { device_t dev; - int err, cpuid, tid; + int co, cpuid, err, tid; dev = ctx->ifc_dev; - cpuid = find_nth(ctx, qid); + co = ctx->ifc_sysctl_core_offset; + if (ctx->ifc_sysctl_separate_txrx && type == IFLIB_INTR_TX) + co += ctx->ifc_softc_ctx.isc_nrxqsets; + cpuid = find_nth(ctx, qid + co); tid = get_core_offset(ctx, type, qid); MPASS(tid >= 0); cpuid = find_close_core(cpuid, tid); err = taskqgroup_attach_cpu(tqg, gtask, uniq, cpuid, dev, irq->ii_res, name); if (err) { device_printf(dev, "taskqgroup_attach_cpu failed %d\n", err); return (err); } #ifdef notyet if (cpuid > ctx->ifc_cpuid_highest) ctx->ifc_cpuid_highest = cpuid; #endif return 0; } int iflib_irq_alloc_generic(if_ctx_t ctx, if_irq_t irq, int rid, iflib_intr_type_t type, driver_filter_t *filter, void *filter_arg, int qid, const char *name) { device_t dev; struct grouptask *gtask; struct taskqgroup *tqg; iflib_filter_info_t info; gtask_fn_t *fn; int tqrid, err; driver_filter_t *intr_fast; void *q; info = &ctx->ifc_filter_info; tqrid = rid; switch (type) { /* XXX merge tx/rx for netmap? 
*/ case IFLIB_INTR_TX: q = &ctx->ifc_txqs[qid]; info = &ctx->ifc_txqs[qid].ift_filter_info; gtask = &ctx->ifc_txqs[qid].ift_task; tqg = qgroup_if_io_tqg; fn = _task_fn_tx; intr_fast = iflib_fast_intr; GROUPTASK_INIT(gtask, 0, fn, q); ctx->ifc_flags |= IFC_NETMAP_TX_IRQ; break; case IFLIB_INTR_RX: q = &ctx->ifc_rxqs[qid]; info = &ctx->ifc_rxqs[qid].ifr_filter_info; gtask = &ctx->ifc_rxqs[qid].ifr_task; tqg = qgroup_if_io_tqg; fn = _task_fn_rx; intr_fast = iflib_fast_intr; GROUPTASK_INIT(gtask, 0, fn, q); break; case IFLIB_INTR_RXTX: q = &ctx->ifc_rxqs[qid]; info = &ctx->ifc_rxqs[qid].ifr_filter_info; gtask = &ctx->ifc_rxqs[qid].ifr_task; tqg = qgroup_if_io_tqg; fn = _task_fn_rx; intr_fast = iflib_fast_intr_rxtx; GROUPTASK_INIT(gtask, 0, fn, q); break; case IFLIB_INTR_ADMIN: q = ctx; tqrid = -1; info = &ctx->ifc_filter_info; gtask = &ctx->ifc_admin_task; tqg = qgroup_if_config_tqg; fn = _task_fn_admin; intr_fast = iflib_fast_intr_ctx; break; default: panic("unknown net intr type"); } info->ifi_filter = filter; info->ifi_filter_arg = filter_arg; info->ifi_task = gtask; info->ifi_ctx = q; dev = ctx->ifc_dev; err = _iflib_irq_alloc(ctx, irq, rid, intr_fast, NULL, info, name); if (err != 0) { device_printf(dev, "_iflib_irq_alloc failed %d\n", err); return (err); } if (type == IFLIB_INTR_ADMIN) return (0); if (tqrid != -1) { err = iflib_irq_set_affinity(ctx, irq, type, qid, gtask, tqg, q, name); if (err) return (err); } else { taskqgroup_attach(tqg, gtask, q, dev, irq->ii_res, name); } return (0); } void iflib_softirq_alloc_generic(if_ctx_t ctx, if_irq_t irq, iflib_intr_type_t type, void *arg, int qid, const char *name) { struct grouptask *gtask; struct taskqgroup *tqg; gtask_fn_t *fn; void *q; int err; switch (type) { case IFLIB_INTR_TX: q = &ctx->ifc_txqs[qid]; gtask = &ctx->ifc_txqs[qid].ift_task; tqg = qgroup_if_io_tqg; fn = _task_fn_tx; break; case IFLIB_INTR_RX: q = &ctx->ifc_rxqs[qid]; gtask = &ctx->ifc_rxqs[qid].ifr_task; tqg = qgroup_if_io_tqg; fn = _task_fn_rx; break; case IFLIB_INTR_IOV: q = ctx; gtask = &ctx->ifc_vflr_task; tqg = qgroup_if_config_tqg; fn = _task_fn_iov; break; default: panic("unknown net intr type"); } GROUPTASK_INIT(gtask, 0, fn, q); if (irq != NULL) { err = iflib_irq_set_affinity(ctx, irq, type, qid, gtask, tqg, q, name); if (err) taskqgroup_attach(tqg, gtask, q, ctx->ifc_dev, irq->ii_res, name); } else { taskqgroup_attach(tqg, gtask, q, NULL, NULL, name); } } void iflib_irq_free(if_ctx_t ctx, if_irq_t irq) { if (irq->ii_tag) bus_teardown_intr(ctx->ifc_dev, irq->ii_res, irq->ii_tag); if (irq->ii_res) bus_release_resource(ctx->ifc_dev, SYS_RES_IRQ, rman_get_rid(irq->ii_res), irq->ii_res); } static int iflib_legacy_setup(if_ctx_t ctx, driver_filter_t filter, void *filter_arg, int *rid, const char *name) { iflib_txq_t txq = ctx->ifc_txqs; iflib_rxq_t rxq = ctx->ifc_rxqs; if_irq_t irq = &ctx->ifc_legacy_irq; iflib_filter_info_t info; device_t dev; struct grouptask *gtask; struct resource *res; struct taskqgroup *tqg; gtask_fn_t *fn; int tqrid; void *q; int err; q = &ctx->ifc_rxqs[0]; info = &rxq[0].ifr_filter_info; gtask = &rxq[0].ifr_task; tqg = qgroup_if_io_tqg; tqrid = irq->ii_rid = *rid; fn = _task_fn_rx; ctx->ifc_flags |= IFC_LEGACY; info->ifi_filter = filter; info->ifi_filter_arg = filter_arg; info->ifi_task = gtask; info->ifi_ctx = ctx; dev = ctx->ifc_dev; /* We allocate a single interrupt resource */ if ((err = _iflib_irq_alloc(ctx, irq, tqrid, iflib_fast_intr_ctx, NULL, info, name)) != 0) return (err); GROUPTASK_INIT(gtask, 0, fn, q); res = irq->ii_res; 
taskqgroup_attach(tqg, gtask, q, dev, res, name); GROUPTASK_INIT(&txq->ift_task, 0, _task_fn_tx, txq); taskqgroup_attach(qgroup_if_io_tqg, &txq->ift_task, txq, dev, res, "tx"); return (0); } void iflib_led_create(if_ctx_t ctx) { ctx->ifc_led_dev = led_create(iflib_led_func, ctx, device_get_nameunit(ctx->ifc_dev)); } void iflib_tx_intr_deferred(if_ctx_t ctx, int txqid) { GROUPTASK_ENQUEUE(&ctx->ifc_txqs[txqid].ift_task); } void iflib_rx_intr_deferred(if_ctx_t ctx, int rxqid) { GROUPTASK_ENQUEUE(&ctx->ifc_rxqs[rxqid].ifr_task); } void iflib_admin_intr_deferred(if_ctx_t ctx) { #ifdef INVARIANTS struct grouptask *gtask; gtask = &ctx->ifc_admin_task; MPASS(gtask != NULL && gtask->gt_taskqueue != NULL); #endif GROUPTASK_ENQUEUE(&ctx->ifc_admin_task); } void iflib_iov_intr_deferred(if_ctx_t ctx) { GROUPTASK_ENQUEUE(&ctx->ifc_vflr_task); } void iflib_io_tqg_attach(struct grouptask *gt, void *uniq, int cpu, char *name) { taskqgroup_attach_cpu(qgroup_if_io_tqg, gt, uniq, cpu, NULL, NULL, name); } void iflib_config_gtask_init(void *ctx, struct grouptask *gtask, gtask_fn_t *fn, const char *name) { GROUPTASK_INIT(gtask, 0, fn, ctx); taskqgroup_attach(qgroup_if_config_tqg, gtask, gtask, NULL, NULL, name); } void iflib_config_gtask_deinit(struct grouptask *gtask) { taskqgroup_detach(qgroup_if_config_tqg, gtask); } void iflib_link_state_change(if_ctx_t ctx, int link_state, uint64_t baudrate) { if_t ifp = ctx->ifc_ifp; iflib_txq_t txq = ctx->ifc_txqs; if_setbaudrate(ifp, baudrate); if (baudrate >= IF_Gbps(10)) { STATE_LOCK(ctx); ctx->ifc_flags |= IFC_PREFETCH; STATE_UNLOCK(ctx); } /* If link down, disable watchdog */ if ((ctx->ifc_link_state == LINK_STATE_UP) && (link_state == LINK_STATE_DOWN)) { for (int i = 0; i < ctx->ifc_softc_ctx.isc_ntxqsets; i++, txq++) txq->ift_qstatus = IFLIB_QUEUE_IDLE; } ctx->ifc_link_state = link_state; if_link_state_change(ifp, link_state); } static int iflib_tx_credits_update(if_ctx_t ctx, iflib_txq_t txq) { int credits; #ifdef INVARIANTS int credits_pre = txq->ift_cidx_processed; #endif if (ctx->isc_txd_credits_update == NULL) return (0); bus_dmamap_sync(txq->ift_ifdi->idi_tag, txq->ift_ifdi->idi_map, BUS_DMASYNC_POSTREAD); if ((credits = ctx->isc_txd_credits_update(ctx->ifc_softc, txq->ift_id, true)) == 0) return (0); txq->ift_processed += credits; txq->ift_cidx_processed += credits; MPASS(credits_pre + credits == txq->ift_cidx_processed); if (txq->ift_cidx_processed >= txq->ift_size) txq->ift_cidx_processed -= txq->ift_size; return (credits); } static int iflib_rxd_avail(if_ctx_t ctx, iflib_rxq_t rxq, qidx_t cidx, qidx_t budget) { iflib_fl_t fl; u_int i; for (i = 0, fl = &rxq->ifr_fl[0]; i < rxq->ifr_nfl; i++, fl++) bus_dmamap_sync(fl->ifl_ifdi->idi_tag, fl->ifl_ifdi->idi_map, BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); return (ctx->isc_rxd_available(ctx->ifc_softc, rxq->ifr_id, cidx, budget)); } void iflib_add_int_delay_sysctl(if_ctx_t ctx, const char *name, const char *description, if_int_delay_info_t info, int offset, int value) { info->iidi_ctx = ctx; info->iidi_offset = offset; info->iidi_value = value; SYSCTL_ADD_PROC(device_get_sysctl_ctx(ctx->ifc_dev), SYSCTL_CHILDREN(device_get_sysctl_tree(ctx->ifc_dev)), OID_AUTO, name, CTLTYPE_INT|CTLFLAG_RW, info, 0, iflib_sysctl_int_delay, "I", description); } struct sx * iflib_ctx_lock_get(if_ctx_t ctx) { return (&ctx->ifc_ctx_sx); } static int iflib_msix_init(if_ctx_t ctx) { device_t dev = ctx->ifc_dev; if_shared_ctx_t sctx = ctx->ifc_sctx; if_softc_ctx_t scctx = &ctx->ifc_softc_ctx; int vectors, queues, rx_queues, 
tx_queues, queuemsgs, msgs; int iflib_num_tx_queues, iflib_num_rx_queues; int err, admincnt, bar; iflib_num_tx_queues = ctx->ifc_sysctl_ntxqs; iflib_num_rx_queues = ctx->ifc_sysctl_nrxqs; if (bootverbose) device_printf(dev, "msix_init qsets capped at %d\n", imax(scctx->isc_ntxqsets, scctx->isc_nrxqsets)); bar = ctx->ifc_softc_ctx.isc_msix_bar; admincnt = sctx->isc_admin_intrcnt; /* Override by tuneable */ if (scctx->isc_disable_msix) goto msi; /* First try MSI-X */ if ((msgs = pci_msix_count(dev)) == 0) { if (bootverbose) device_printf(dev, "MSI-X not supported or disabled\n"); goto msi; } /* * bar == -1 => "trust me I know what I'm doing" * Some drivers are for hardware that is so shoddily * documented that no one knows which bars are which * so the developer has to map all bars. This hack * allows shoddy garbage to use MSI-X in this framework. */ if (bar != -1) { ctx->ifc_msix_mem = bus_alloc_resource_any(dev, SYS_RES_MEMORY, &bar, RF_ACTIVE); if (ctx->ifc_msix_mem == NULL) { device_printf(dev, "Unable to map MSI-X table\n"); goto msi; } } #if IFLIB_DEBUG /* use only 1 qset in debug mode */ queuemsgs = min(msgs - admincnt, 1); #else queuemsgs = msgs - admincnt; #endif #ifdef RSS queues = imin(queuemsgs, rss_getnumbuckets()); #else queues = queuemsgs; #endif queues = imin(CPU_COUNT(&ctx->ifc_cpus), queues); if (bootverbose) device_printf(dev, "intr CPUs: %d queue msgs: %d admincnt: %d\n", CPU_COUNT(&ctx->ifc_cpus), queuemsgs, admincnt); #ifdef RSS /* If we're doing RSS, clamp at the number of RSS buckets */ if (queues > rss_getnumbuckets()) queues = rss_getnumbuckets(); #endif if (iflib_num_rx_queues > 0 && iflib_num_rx_queues < queuemsgs - admincnt) rx_queues = iflib_num_rx_queues; else rx_queues = queues; if (rx_queues > scctx->isc_nrxqsets) rx_queues = scctx->isc_nrxqsets; /* * We want this to be all logical CPUs by default */ if (iflib_num_tx_queues > 0 && iflib_num_tx_queues < queues) tx_queues = iflib_num_tx_queues; else tx_queues = mp_ncpus; if (tx_queues > scctx->isc_ntxqsets) tx_queues = scctx->isc_ntxqsets; if (ctx->ifc_sysctl_qs_eq_override == 0) { #ifdef INVARIANTS if (tx_queues != rx_queues) device_printf(dev, "queue equality override not set, capping rx_queues at %d and tx_queues at %d\n", min(rx_queues, tx_queues), min(rx_queues, tx_queues)); #endif tx_queues = min(rx_queues, tx_queues); rx_queues = min(rx_queues, tx_queues); } device_printf(dev, "Using %d rx queues %d tx queues\n", rx_queues, tx_queues); vectors = rx_queues + admincnt; if ((err = pci_alloc_msix(dev, &vectors)) == 0) { device_printf(dev, "Using MSI-X interrupts with %d vectors\n", vectors); scctx->isc_vectors = vectors; scctx->isc_nrxqsets = rx_queues; scctx->isc_ntxqsets = tx_queues; scctx->isc_intr = IFLIB_INTR_MSIX; return (vectors); } else { device_printf(dev, "failed to allocate %d MSI-X vectors, err: %d - using MSI\n", vectors, err); bus_release_resource(dev, SYS_RES_MEMORY, bar, ctx->ifc_msix_mem); ctx->ifc_msix_mem = NULL; } msi: vectors = pci_msi_count(dev); scctx->isc_nrxqsets = 1; scctx->isc_ntxqsets = 1; scctx->isc_vectors = vectors; if (vectors == 1 && pci_alloc_msi(dev, &vectors) == 0) { device_printf(dev,"Using an MSI interrupt\n"); scctx->isc_intr = IFLIB_INTR_MSI; } else { scctx->isc_vectors = 1; device_printf(dev,"Using a Legacy interrupt\n"); scctx->isc_intr = IFLIB_INTR_LEGACY; } return (vectors); } static const char *ring_states[] = { "IDLE", "BUSY", "STALLED", "ABDICATED" }; static int mp_ring_state_handler(SYSCTL_HANDLER_ARGS) { int rc; uint16_t *state = ((uint16_t *)oidp->oid_arg1); 
struct sbuf *sb; const char *ring_state = "UNKNOWN"; /* XXX needed ? */ rc = sysctl_wire_old_buffer(req, 0); MPASS(rc == 0); if (rc != 0) return (rc); sb = sbuf_new_for_sysctl(NULL, NULL, 80, req); MPASS(sb != NULL); if (sb == NULL) return (ENOMEM); if (state[3] <= 3) ring_state = ring_states[state[3]]; sbuf_printf(sb, "pidx_head: %04hd pidx_tail: %04hd cidx: %04hd state: %s", state[0], state[1], state[2], ring_state); rc = sbuf_finish(sb); sbuf_delete(sb); return(rc); } enum iflib_ndesc_handler { IFLIB_NTXD_HANDLER, IFLIB_NRXD_HANDLER, }; static int mp_ndesc_handler(SYSCTL_HANDLER_ARGS) { if_ctx_t ctx = (void *)arg1; enum iflib_ndesc_handler type = arg2; char buf[256] = {0}; qidx_t *ndesc; char *p, *next; int nqs, rc, i; MPASS(type == IFLIB_NTXD_HANDLER || type == IFLIB_NRXD_HANDLER); nqs = 8; switch(type) { case IFLIB_NTXD_HANDLER: ndesc = ctx->ifc_sysctl_ntxds; if (ctx->ifc_sctx) nqs = ctx->ifc_sctx->isc_ntxqs; break; case IFLIB_NRXD_HANDLER: ndesc = ctx->ifc_sysctl_nrxds; if (ctx->ifc_sctx) nqs = ctx->ifc_sctx->isc_nrxqs; break; default: panic("unhandled type"); } if (nqs == 0) nqs = 8; for (i=0; i<8; i++) { if (i >= nqs) break; if (i) strcat(buf, ","); sprintf(strchr(buf, 0), "%d", ndesc[i]); } rc = sysctl_handle_string(oidp, buf, sizeof(buf), req); if (rc || req->newptr == NULL) return rc; for (i = 0, next = buf, p = strsep(&next, " ,"); i < 8 && p; i++, p = strsep(&next, " ,")) { ndesc[i] = strtoul(p, NULL, 10); } return(rc); } #define NAME_BUFLEN 32 static void iflib_add_device_sysctl_pre(if_ctx_t ctx) { device_t dev = iflib_get_dev(ctx); struct sysctl_oid_list *child, *oid_list; struct sysctl_ctx_list *ctx_list; struct sysctl_oid *node; ctx_list = device_get_sysctl_ctx(dev); child = SYSCTL_CHILDREN(device_get_sysctl_tree(dev)); ctx->ifc_sysctl_node = node = SYSCTL_ADD_NODE(ctx_list, child, OID_AUTO, "iflib", CTLFLAG_RD, NULL, "IFLIB fields"); oid_list = SYSCTL_CHILDREN(node); SYSCTL_ADD_CONST_STRING(ctx_list, oid_list, OID_AUTO, "driver_version", CTLFLAG_RD, ctx->ifc_sctx->isc_driver_version, "driver version"); SYSCTL_ADD_U16(ctx_list, oid_list, OID_AUTO, "override_ntxqs", CTLFLAG_RWTUN, &ctx->ifc_sysctl_ntxqs, 0, "# of txqs to use, 0 => use default #"); SYSCTL_ADD_U16(ctx_list, oid_list, OID_AUTO, "override_nrxqs", CTLFLAG_RWTUN, &ctx->ifc_sysctl_nrxqs, 0, "# of rxqs to use, 0 => use default #"); SYSCTL_ADD_U16(ctx_list, oid_list, OID_AUTO, "override_qs_enable", CTLFLAG_RWTUN, &ctx->ifc_sysctl_qs_eq_override, 0, "permit #txq != #rxq"); SYSCTL_ADD_INT(ctx_list, oid_list, OID_AUTO, "disable_msix", CTLFLAG_RWTUN, &ctx->ifc_softc_ctx.isc_disable_msix, 0, "disable MSI-X (default 0)"); SYSCTL_ADD_U16(ctx_list, oid_list, OID_AUTO, "rx_budget", CTLFLAG_RWTUN, &ctx->ifc_sysctl_rx_budget, 0, "set the rx budget"); SYSCTL_ADD_U16(ctx_list, oid_list, OID_AUTO, "tx_abdicate", CTLFLAG_RWTUN, &ctx->ifc_sysctl_tx_abdicate, 0, "cause tx to abdicate instead of running to completion"); + ctx->ifc_sysctl_core_offset = CORE_OFFSET_UNSPECIFIED; + SYSCTL_ADD_U16(ctx_list, oid_list, OID_AUTO, "core_offset", + CTLFLAG_RDTUN, &ctx->ifc_sysctl_core_offset, 0, + "offset to start using cores at"); + SYSCTL_ADD_U8(ctx_list, oid_list, OID_AUTO, "separate_txrx", + CTLFLAG_RDTUN, &ctx->ifc_sysctl_separate_txrx, 0, + "use separate cores for TX and RX"); /* XXX change for per-queue sizes */ SYSCTL_ADD_PROC(ctx_list, oid_list, OID_AUTO, "override_ntxds", CTLTYPE_STRING|CTLFLAG_RWTUN, ctx, IFLIB_NTXD_HANDLER, mp_ndesc_handler, "A", "list of # of tx descriptors to use, 0 = use default #"); SYSCTL_ADD_PROC(ctx_list, 
oid_list, OID_AUTO, "override_nrxds", CTLTYPE_STRING|CTLFLAG_RWTUN, ctx, IFLIB_NRXD_HANDLER, mp_ndesc_handler, "A", "list of # of rx descriptors to use, 0 = use default #"); } static void iflib_add_device_sysctl_post(if_ctx_t ctx) { if_shared_ctx_t sctx = ctx->ifc_sctx; if_softc_ctx_t scctx = &ctx->ifc_softc_ctx; device_t dev = iflib_get_dev(ctx); struct sysctl_oid_list *child; struct sysctl_ctx_list *ctx_list; iflib_fl_t fl; iflib_txq_t txq; iflib_rxq_t rxq; int i, j; char namebuf[NAME_BUFLEN]; char *qfmt; struct sysctl_oid *queue_node, *fl_node, *node; struct sysctl_oid_list *queue_list, *fl_list; ctx_list = device_get_sysctl_ctx(dev); node = ctx->ifc_sysctl_node; child = SYSCTL_CHILDREN(node); if (scctx->isc_ntxqsets > 100) qfmt = "txq%03d"; else if (scctx->isc_ntxqsets > 10) qfmt = "txq%02d"; else qfmt = "txq%d"; for (i = 0, txq = ctx->ifc_txqs; i < scctx->isc_ntxqsets; i++, txq++) { snprintf(namebuf, NAME_BUFLEN, qfmt, i); queue_node = SYSCTL_ADD_NODE(ctx_list, child, OID_AUTO, namebuf, CTLFLAG_RD, NULL, "Queue Name"); queue_list = SYSCTL_CHILDREN(queue_node); #if MEMORY_LOGGING SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "txq_dequeued", CTLFLAG_RD, &txq->ift_dequeued, "total mbufs freed"); SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "txq_enqueued", CTLFLAG_RD, &txq->ift_enqueued, "total mbufs enqueued"); #endif SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "mbuf_defrag", CTLFLAG_RD, &txq->ift_mbuf_defrag, "# of times m_defrag was called"); SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "m_pullups", CTLFLAG_RD, &txq->ift_pullups, "# of times m_pullup was called"); SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "mbuf_defrag_failed", CTLFLAG_RD, &txq->ift_mbuf_defrag_failed, "# of times m_defrag failed"); SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "no_desc_avail", CTLFLAG_RD, &txq->ift_no_desc_avail, "# of times no descriptors were available"); SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "tx_map_failed", CTLFLAG_RD, &txq->ift_map_failed, "# of times dma map failed"); SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "txd_encap_efbig", CTLFLAG_RD, &txq->ift_txd_encap_efbig, "# of times txd_encap returned EFBIG"); SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "no_tx_dma_setup", CTLFLAG_RD, &txq->ift_no_tx_dma_setup, "# of times map failed for other than EFBIG"); SYSCTL_ADD_U16(ctx_list, queue_list, OID_AUTO, "txq_pidx", CTLFLAG_RD, &txq->ift_pidx, 1, "Producer Index"); SYSCTL_ADD_U16(ctx_list, queue_list, OID_AUTO, "txq_cidx", CTLFLAG_RD, &txq->ift_cidx, 1, "Consumer Index"); SYSCTL_ADD_U16(ctx_list, queue_list, OID_AUTO, "txq_cidx_processed", CTLFLAG_RD, &txq->ift_cidx_processed, 1, "Consumer Index seen by credit update"); SYSCTL_ADD_U16(ctx_list, queue_list, OID_AUTO, "txq_in_use", CTLFLAG_RD, &txq->ift_in_use, 1, "descriptors in use"); SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "txq_processed", CTLFLAG_RD, &txq->ift_processed, "descriptors procesed for clean"); SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "txq_cleaned", CTLFLAG_RD, &txq->ift_cleaned, "total cleaned"); SYSCTL_ADD_PROC(ctx_list, queue_list, OID_AUTO, "ring_state", CTLTYPE_STRING | CTLFLAG_RD, __DEVOLATILE(uint64_t *, &txq->ift_br->state), 0, mp_ring_state_handler, "A", "soft ring state"); SYSCTL_ADD_COUNTER_U64(ctx_list, queue_list, OID_AUTO, "r_enqueues", CTLFLAG_RD, &txq->ift_br->enqueues, "# of enqueues to the mp_ring for this queue"); SYSCTL_ADD_COUNTER_U64(ctx_list, queue_list, OID_AUTO, "r_drops", CTLFLAG_RD, &txq->ift_br->drops, "# of drops in the mp_ring for this queue"); 
SYSCTL_ADD_COUNTER_U64(ctx_list, queue_list, OID_AUTO, "r_starts", CTLFLAG_RD, &txq->ift_br->starts, "# of normal consumer starts in the mp_ring for this queue"); SYSCTL_ADD_COUNTER_U64(ctx_list, queue_list, OID_AUTO, "r_stalls", CTLFLAG_RD, &txq->ift_br->stalls, "# of consumer stalls in the mp_ring for this queue"); SYSCTL_ADD_COUNTER_U64(ctx_list, queue_list, OID_AUTO, "r_restarts", CTLFLAG_RD, &txq->ift_br->restarts, "# of consumer restarts in the mp_ring for this queue"); SYSCTL_ADD_COUNTER_U64(ctx_list, queue_list, OID_AUTO, "r_abdications", CTLFLAG_RD, &txq->ift_br->abdications, "# of consumer abdications in the mp_ring for this queue"); } if (scctx->isc_nrxqsets > 100) qfmt = "rxq%03d"; else if (scctx->isc_nrxqsets > 10) qfmt = "rxq%02d"; else qfmt = "rxq%d"; for (i = 0, rxq = ctx->ifc_rxqs; i < scctx->isc_nrxqsets; i++, rxq++) { snprintf(namebuf, NAME_BUFLEN, qfmt, i); queue_node = SYSCTL_ADD_NODE(ctx_list, child, OID_AUTO, namebuf, CTLFLAG_RD, NULL, "Queue Name"); queue_list = SYSCTL_CHILDREN(queue_node); if (sctx->isc_flags & IFLIB_HAS_RXCQ) { SYSCTL_ADD_U16(ctx_list, queue_list, OID_AUTO, "rxq_cq_pidx", CTLFLAG_RD, &rxq->ifr_cq_pidx, 1, "Producer Index"); SYSCTL_ADD_U16(ctx_list, queue_list, OID_AUTO, "rxq_cq_cidx", CTLFLAG_RD, &rxq->ifr_cq_cidx, 1, "Consumer Index"); } for (j = 0, fl = rxq->ifr_fl; j < rxq->ifr_nfl; j++, fl++) { snprintf(namebuf, NAME_BUFLEN, "rxq_fl%d", j); fl_node = SYSCTL_ADD_NODE(ctx_list, queue_list, OID_AUTO, namebuf, CTLFLAG_RD, NULL, "freelist Name"); fl_list = SYSCTL_CHILDREN(fl_node); SYSCTL_ADD_U16(ctx_list, fl_list, OID_AUTO, "pidx", CTLFLAG_RD, &fl->ifl_pidx, 1, "Producer Index"); SYSCTL_ADD_U16(ctx_list, fl_list, OID_AUTO, "cidx", CTLFLAG_RD, &fl->ifl_cidx, 1, "Consumer Index"); SYSCTL_ADD_U16(ctx_list, fl_list, OID_AUTO, "credits", CTLFLAG_RD, &fl->ifl_credits, 1, "credits available"); #if MEMORY_LOGGING SYSCTL_ADD_QUAD(ctx_list, fl_list, OID_AUTO, "fl_m_enqueued", CTLFLAG_RD, &fl->ifl_m_enqueued, "mbufs allocated"); SYSCTL_ADD_QUAD(ctx_list, fl_list, OID_AUTO, "fl_m_dequeued", CTLFLAG_RD, &fl->ifl_m_dequeued, "mbufs freed"); SYSCTL_ADD_QUAD(ctx_list, fl_list, OID_AUTO, "fl_cl_enqueued", CTLFLAG_RD, &fl->ifl_cl_enqueued, "clusters allocated"); SYSCTL_ADD_QUAD(ctx_list, fl_list, OID_AUTO, "fl_cl_dequeued", CTLFLAG_RD, &fl->ifl_cl_dequeued, "clusters freed"); #endif } } } void iflib_request_reset(if_ctx_t ctx) { STATE_LOCK(ctx); ctx->ifc_flags |= IFC_DO_RESET; STATE_UNLOCK(ctx); } #ifndef __NO_STRICT_ALIGNMENT static struct mbuf * iflib_fixup_rx(struct mbuf *m) { struct mbuf *n; if (m->m_len <= (MCLBYTES - ETHER_HDR_LEN)) { bcopy(m->m_data, m->m_data + ETHER_HDR_LEN, m->m_len); m->m_data += ETHER_HDR_LEN; n = m; } else { MGETHDR(n, M_NOWAIT, MT_DATA); if (n == NULL) { m_freem(m); return (NULL); } bcopy(m->m_data, n->m_data, ETHER_HDR_LEN); m->m_data += ETHER_HDR_LEN; m->m_len -= ETHER_HDR_LEN; n->m_len = ETHER_HDR_LEN; M_MOVE_PKTHDR(n, m); n->m_next = m; } return (n); } #endif #ifdef NETDUMP static void iflib_netdump_init(struct ifnet *ifp, int *nrxr, int *ncl, int *clsize) { if_ctx_t ctx; ctx = if_getsoftc(ifp); CTX_LOCK(ctx); *nrxr = NRXQSETS(ctx); *ncl = ctx->ifc_rxqs[0].ifr_fl->ifl_size; *clsize = ctx->ifc_rxqs[0].ifr_fl->ifl_buf_size; CTX_UNLOCK(ctx); } static void iflib_netdump_event(struct ifnet *ifp, enum netdump_ev event) { if_ctx_t ctx; if_softc_ctx_t scctx; iflib_fl_t fl; iflib_rxq_t rxq; int i, j; ctx = if_getsoftc(ifp); scctx = &ctx->ifc_softc_ctx; switch (event) { case NETDUMP_START: for (i = 0; i < scctx->isc_nrxqsets; i++) { rxq = 
&ctx->ifc_rxqs[i]; for (j = 0; j < rxq->ifr_nfl; j++) { fl = rxq->ifr_fl; fl->ifl_zone = m_getzone(fl->ifl_buf_size); } } iflib_no_tx_batch = 1; break; default: break; } } static int iflib_netdump_transmit(struct ifnet *ifp, struct mbuf *m) { if_ctx_t ctx; iflib_txq_t txq; int error; ctx = if_getsoftc(ifp); if ((if_getdrvflags(ifp) & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) != IFF_DRV_RUNNING) return (EBUSY); txq = &ctx->ifc_txqs[0]; error = iflib_encap(txq, &m); if (error == 0) (void)iflib_txd_db_check(ctx, txq, true, txq->ift_in_use); return (error); } static int iflib_netdump_poll(struct ifnet *ifp, int count) { if_ctx_t ctx; if_softc_ctx_t scctx; iflib_txq_t txq; int i; ctx = if_getsoftc(ifp); scctx = &ctx->ifc_softc_ctx; if ((if_getdrvflags(ifp) & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) != IFF_DRV_RUNNING) return (EBUSY); txq = &ctx->ifc_txqs[0]; (void)iflib_completed_tx_reclaim(txq, RECLAIM_THRESH(ctx)); for (i = 0; i < scctx->isc_nrxqsets; i++) (void)iflib_rxeof(&ctx->ifc_rxqs[i], 16 /* XXX */); return (0); } #endif /* NETDUMP */ Index: user/ngie/bug-237403/sys/netinet/in_mcast.c =================================================================== --- user/ngie/bug-237403/sys/netinet/in_mcast.c (revision 346925) +++ user/ngie/bug-237403/sys/netinet/in_mcast.c (revision 346926) @@ -1,3151 +1,3162 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 2007-2009 Bruce Simpson. * Copyright (c) 2005 Robert N. M. Watson. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the author may not be used to endorse or promote * products derived from this software without specific prior written * permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ /* * IPv4 multicast socket, group, and socket option processing module. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifndef KTR_IGMPV3 #define KTR_IGMPV3 KTR_INET #endif #ifndef __SOCKUNION_DECLARED union sockunion { struct sockaddr_storage ss; struct sockaddr sa; struct sockaddr_dl sdl; struct sockaddr_in sin; }; typedef union sockunion sockunion_t; #define __SOCKUNION_DECLARED #endif /* __SOCKUNION_DECLARED */ static MALLOC_DEFINE(M_INMFILTER, "in_mfilter", "IPv4 multicast PCB-layer source filter"); static MALLOC_DEFINE(M_IPMADDR, "in_multi", "IPv4 multicast group"); static MALLOC_DEFINE(M_IPMOPTS, "ip_moptions", "IPv4 multicast options"); static MALLOC_DEFINE(M_IPMSOURCE, "ip_msource", "IPv4 multicast IGMP-layer source filter"); /* * Locking: * - Lock order is: Giant, INP_WLOCK, IN_MULTI_LIST_LOCK, IGMP_LOCK, IF_ADDR_LOCK. * - The IF_ADDR_LOCK is implicitly taken by inm_lookup() earlier, however * it can be taken by code in net/if.c also. * - ip_moptions and in_mfilter are covered by the INP_WLOCK. * * struct in_multi is covered by IN_MULTI_LIST_LOCK. There isn't strictly * any need for in_multi itself to be virtualized -- it is bound to an ifp * anyway no matter what happens. */ struct mtx in_multi_list_mtx; MTX_SYSINIT(in_multi_mtx, &in_multi_list_mtx, "in_multi_list_mtx", MTX_DEF); struct mtx in_multi_free_mtx; MTX_SYSINIT(in_multi_free_mtx, &in_multi_free_mtx, "in_multi_free_mtx", MTX_DEF); struct sx in_multi_sx; SX_SYSINIT(in_multi_sx, &in_multi_sx, "in_multi_sx"); int ifma_restart; /* * Functions with non-static linkage defined in this file should be * declared in in_var.h: * imo_multi_filter() * in_addmulti() * in_delmulti() * in_joingroup() * in_joingroup_locked() * in_leavegroup() * in_leavegroup_locked() * and ip_var.h: * inp_freemoptions() * inp_getmoptions() * inp_setmoptions() * * XXX: Both carp and pf need to use the legacy (*,G) KPIs in_addmulti() * and in_delmulti(). 
*/ static void imf_commit(struct in_mfilter *); static int imf_get_source(struct in_mfilter *imf, const struct sockaddr_in *psin, struct in_msource **); static struct in_msource * imf_graft(struct in_mfilter *, const uint8_t, const struct sockaddr_in *); static void imf_leave(struct in_mfilter *); static int imf_prune(struct in_mfilter *, const struct sockaddr_in *); static void imf_purge(struct in_mfilter *); static void imf_rollback(struct in_mfilter *); static void imf_reap(struct in_mfilter *); static int imo_grow(struct ip_moptions *); static size_t imo_match_group(const struct ip_moptions *, const struct ifnet *, const struct sockaddr *); static struct in_msource * imo_match_source(const struct ip_moptions *, const size_t, const struct sockaddr *); static void ims_merge(struct ip_msource *ims, const struct in_msource *lims, const int rollback); static int in_getmulti(struct ifnet *, const struct in_addr *, struct in_multi **); static int inm_get_source(struct in_multi *inm, const in_addr_t haddr, const int noalloc, struct ip_msource **pims); #ifdef KTR static int inm_is_ifp_detached(const struct in_multi *); #endif static int inm_merge(struct in_multi *, /*const*/ struct in_mfilter *); static void inm_purge(struct in_multi *); static void inm_reap(struct in_multi *); static void inm_release(struct in_multi *); static struct ip_moptions * inp_findmoptions(struct inpcb *); static int inp_get_source_filters(struct inpcb *, struct sockopt *); static int inp_join_group(struct inpcb *, struct sockopt *); static int inp_leave_group(struct inpcb *, struct sockopt *); static struct ifnet * inp_lookup_mcast_ifp(const struct inpcb *, const struct sockaddr_in *, const struct in_addr); static int inp_block_unblock_source(struct inpcb *, struct sockopt *); static int inp_set_multicast_if(struct inpcb *, struct sockopt *); static int inp_set_source_filters(struct inpcb *, struct sockopt *); static int sysctl_ip_mcast_filters(SYSCTL_HANDLER_ARGS); static SYSCTL_NODE(_net_inet_ip, OID_AUTO, mcast, CTLFLAG_RW, 0, "IPv4 multicast"); static u_long in_mcast_maxgrpsrc = IP_MAX_GROUP_SRC_FILTER; SYSCTL_ULONG(_net_inet_ip_mcast, OID_AUTO, maxgrpsrc, CTLFLAG_RWTUN, &in_mcast_maxgrpsrc, 0, "Max source filters per group"); static u_long in_mcast_maxsocksrc = IP_MAX_SOCK_SRC_FILTER; SYSCTL_ULONG(_net_inet_ip_mcast, OID_AUTO, maxsocksrc, CTLFLAG_RWTUN, &in_mcast_maxsocksrc, 0, "Max source filters per socket"); int in_mcast_loop = IP_DEFAULT_MULTICAST_LOOP; SYSCTL_INT(_net_inet_ip_mcast, OID_AUTO, loop, CTLFLAG_RWTUN, &in_mcast_loop, 0, "Loopback multicast datagrams by default"); static SYSCTL_NODE(_net_inet_ip_mcast, OID_AUTO, filters, CTLFLAG_RD | CTLFLAG_MPSAFE, sysctl_ip_mcast_filters, "Per-interface stack-wide source filters"); #ifdef KTR /* * Inline function which wraps assertions for a valid ifp. * The ifnet layer will set the ifma's ifp pointer to NULL if the ifp * is detached. */ static int __inline inm_is_ifp_detached(const struct in_multi *inm) { struct ifnet *ifp; KASSERT(inm->inm_ifma != NULL, ("%s: no ifma", __func__)); ifp = inm->inm_ifma->ifma_ifp; if (ifp != NULL) { /* * Sanity check that netinet's notion of ifp is the * same as net's. 
*/ KASSERT(inm->inm_ifp == ifp, ("%s: bad ifp", __func__)); } return (ifp == NULL); } #endif static struct grouptask free_gtask; static struct in_multi_head inm_free_list; static void inm_release_task(void *arg __unused); static void inm_init(void) { SLIST_INIT(&inm_free_list); taskqgroup_config_gtask_init(NULL, &free_gtask, inm_release_task, "inm release task"); } #ifdef EARLY_AP_STARTUP SYSINIT(inm_init, SI_SUB_SMP + 1, SI_ORDER_FIRST, inm_init, NULL); #else SYSINIT(inm_init, SI_SUB_ROOT_CONF - 1, SI_ORDER_FIRST, inm_init, NULL); #endif void inm_release_list_deferred(struct in_multi_head *inmh) { if (SLIST_EMPTY(inmh)) return; mtx_lock(&in_multi_free_mtx); SLIST_CONCAT(&inm_free_list, inmh, in_multi, inm_nrele); mtx_unlock(&in_multi_free_mtx); GROUPTASK_ENQUEUE(&free_gtask); } void inm_disconnect(struct in_multi *inm) { struct ifnet *ifp; struct ifmultiaddr *ifma, *ll_ifma; ifp = inm->inm_ifp; IF_ADDR_WLOCK_ASSERT(ifp); ifma = inm->inm_ifma; if_ref(ifp); if (ifma->ifma_flags & IFMA_F_ENQUEUED) { CK_STAILQ_REMOVE(&ifp->if_multiaddrs, ifma, ifmultiaddr, ifma_link); ifma->ifma_flags &= ~IFMA_F_ENQUEUED; } MCDPRINTF("removed ifma: %p from %s\n", ifma, ifp->if_xname); if ((ll_ifma = ifma->ifma_llifma) != NULL) { MPASS(ifma != ll_ifma); ifma->ifma_llifma = NULL; MPASS(ll_ifma->ifma_llifma == NULL); MPASS(ll_ifma->ifma_ifp == ifp); if (--ll_ifma->ifma_refcount == 0) { if (ll_ifma->ifma_flags & IFMA_F_ENQUEUED) { CK_STAILQ_REMOVE(&ifp->if_multiaddrs, ll_ifma, ifmultiaddr, ifma_link); ll_ifma->ifma_flags &= ~IFMA_F_ENQUEUED; } MCDPRINTF("removed ll_ifma: %p from %s\n", ll_ifma, ifp->if_xname); if_freemulti(ll_ifma); ifma_restart = true; } } } void inm_release_deferred(struct in_multi *inm) { struct in_multi_head tmp; IN_MULTI_LIST_LOCK_ASSERT(); MPASS(inm->inm_refcount > 0); if (--inm->inm_refcount == 0) { SLIST_INIT(&tmp); inm_disconnect(inm); inm->inm_ifma->ifma_protospec = NULL; SLIST_INSERT_HEAD(&tmp, inm, inm_nrele); inm_release_list_deferred(&tmp); } } static void inm_release_task(void *arg __unused) { struct in_multi_head inm_free_tmp; struct in_multi *inm, *tinm; SLIST_INIT(&inm_free_tmp); mtx_lock(&in_multi_free_mtx); SLIST_CONCAT(&inm_free_tmp, &inm_free_list, in_multi, inm_nrele); mtx_unlock(&in_multi_free_mtx); IN_MULTI_LOCK(); SLIST_FOREACH_SAFE(inm, &inm_free_tmp, inm_nrele, tinm) { SLIST_REMOVE_HEAD(&inm_free_tmp, inm_nrele); MPASS(inm); inm_release(inm); } IN_MULTI_UNLOCK(); } /* * Initialize an in_mfilter structure to a known state at t0, t1 * with an empty source filter list. */ static __inline void imf_init(struct in_mfilter *imf, const int st0, const int st1) { memset(imf, 0, sizeof(struct in_mfilter)); RB_INIT(&imf->imf_sources); imf->imf_st[0] = st0; imf->imf_st[1] = st1; } /* * Function for looking up an in_multi record for an IPv4 multicast address * on a given interface. ifp must be valid. If no record found, return NULL. * The IN_MULTI_LIST_LOCK and IF_ADDR_LOCK on ifp must be held. */ struct in_multi * inm_lookup_locked(struct ifnet *ifp, const struct in_addr ina) { struct ifmultiaddr *ifma; struct in_multi *inm; IN_MULTI_LIST_LOCK_ASSERT(); IF_ADDR_LOCK_ASSERT(ifp); inm = NULL; CK_STAILQ_FOREACH(ifma, &((ifp)->if_multiaddrs), ifma_link) { if (ifma->ifma_addr->sa_family != AF_INET || ifma->ifma_protospec == NULL) continue; inm = (struct in_multi *)ifma->ifma_protospec; if (inm->inm_addr.s_addr == ina.s_addr) break; inm = NULL; } return (inm); } /* * Wrapper for inm_lookup_locked(). * The IF_ADDR_LOCK will be taken on ifp and released on return. 
*/ struct in_multi * inm_lookup(struct ifnet *ifp, const struct in_addr ina) { struct epoch_tracker et; struct in_multi *inm; IN_MULTI_LIST_LOCK_ASSERT(); NET_EPOCH_ENTER(et); inm = inm_lookup_locked(ifp, ina); NET_EPOCH_EXIT(et); return (inm); } /* * Resize the ip_moptions vector to the next power-of-two minus 1. * May be called with locks held; do not sleep. */ static int imo_grow(struct ip_moptions *imo) { struct in_multi **nmships; struct in_multi **omships; struct in_mfilter *nmfilters; struct in_mfilter *omfilters; size_t idx; size_t newmax; size_t oldmax; nmships = NULL; nmfilters = NULL; omships = imo->imo_membership; omfilters = imo->imo_mfilters; oldmax = imo->imo_max_memberships; newmax = ((oldmax + 1) * 2) - 1; if (newmax <= IP_MAX_MEMBERSHIPS) { nmships = (struct in_multi **)realloc(omships, sizeof(struct in_multi *) * newmax, M_IPMOPTS, M_NOWAIT); nmfilters = (struct in_mfilter *)realloc(omfilters, sizeof(struct in_mfilter) * newmax, M_INMFILTER, M_NOWAIT); if (nmships != NULL && nmfilters != NULL) { /* Initialize newly allocated source filter heads. */ for (idx = oldmax; idx < newmax; idx++) { imf_init(&nmfilters[idx], MCAST_UNDEFINED, MCAST_EXCLUDE); } imo->imo_max_memberships = newmax; imo->imo_membership = nmships; imo->imo_mfilters = nmfilters; } } if (nmships == NULL || nmfilters == NULL) { if (nmships != NULL) free(nmships, M_IPMOPTS); if (nmfilters != NULL) free(nmfilters, M_INMFILTER); return (ETOOMANYREFS); } return (0); } /* * Find an IPv4 multicast group entry for this ip_moptions instance * which matches the specified group, and optionally an interface. * Return its index into the array, or -1 if not found. */ static size_t imo_match_group(const struct ip_moptions *imo, const struct ifnet *ifp, const struct sockaddr *group) { const struct sockaddr_in *gsin; struct in_multi **pinm; int idx; int nmships; gsin = (const struct sockaddr_in *)group; /* The imo_membership array may be lazy allocated. */ if (imo->imo_membership == NULL || imo->imo_num_memberships == 0) return (-1); nmships = imo->imo_num_memberships; pinm = &imo->imo_membership[0]; for (idx = 0; idx < nmships; idx++, pinm++) { if (*pinm == NULL) continue; if ((ifp == NULL || ((*pinm)->inm_ifp == ifp)) && in_hosteq((*pinm)->inm_addr, gsin->sin_addr)) { break; } } if (idx >= nmships) idx = -1; return (idx); } /* * Find an IPv4 multicast source entry for this imo which matches * the given group index for this socket, and source address. * * NOTE: This does not check if the entry is in-mode, merely if * it exists, which may not be the desired behaviour. */ static struct in_msource * imo_match_source(const struct ip_moptions *imo, const size_t gidx, const struct sockaddr *src) { struct ip_msource find; struct in_mfilter *imf; struct ip_msource *ims; const sockunion_t *psa; KASSERT(src->sa_family == AF_INET, ("%s: !AF_INET", __func__)); KASSERT(gidx != -1 && gidx < imo->imo_num_memberships, ("%s: invalid index %d\n", __func__, (int)gidx)); /* The imo_mfilters array may be lazy allocated. */ if (imo->imo_mfilters == NULL) return (NULL); imf = &imo->imo_mfilters[gidx]; /* Source trees are keyed in host byte order. */ psa = (const sockunion_t *)src; find.ims_haddr = ntohl(psa->sin.sin_addr.s_addr); ims = RB_FIND(ip_msource_tree, &imf->imf_sources, &find); return ((struct in_msource *)ims); } /* * Perform filtering for multicast datagrams on a socket by group and source. 
* * Returns 0 if a datagram should be allowed through, or various error codes * if the socket was not a member of the group, or the source was muted, etc. */ int imo_multi_filter(const struct ip_moptions *imo, const struct ifnet *ifp, const struct sockaddr *group, const struct sockaddr *src) { size_t gidx; struct in_msource *ims; int mode; KASSERT(ifp != NULL, ("%s: null ifp", __func__)); gidx = imo_match_group(imo, ifp, group); if (gidx == -1) return (MCAST_NOTGMEMBER); /* * Check if the source was included in an (S,G) join. * Allow reception on exclusive memberships by default, * reject reception on inclusive memberships by default. * Exclude source only if an in-mode exclude filter exists. * Include source only if an in-mode include filter exists. * NOTE: We are comparing group state here at IGMP t1 (now) * with socket-layer t0 (since last downcall). */ mode = imo->imo_mfilters[gidx].imf_st[1]; ims = imo_match_source(imo, gidx, src); if ((ims == NULL && mode == MCAST_INCLUDE) || (ims != NULL && ims->imsl_st[0] != mode)) return (MCAST_NOTSMEMBER); return (MCAST_PASS); } /* * Find and return a reference to an in_multi record for (ifp, group), * and bump its reference count. * If one does not exist, try to allocate it, and update link-layer multicast * filters on ifp to listen for group. * Assumes the IN_MULTI lock is held across the call. * Return 0 if successful, otherwise return an appropriate error code. */ static int in_getmulti(struct ifnet *ifp, const struct in_addr *group, struct in_multi **pinm) { struct sockaddr_in gsin; struct ifmultiaddr *ifma; struct in_ifinfo *ii; struct in_multi *inm; int error; IN_MULTI_LOCK_ASSERT(); ii = (struct in_ifinfo *)ifp->if_afdata[AF_INET]; IN_MULTI_LIST_LOCK(); inm = inm_lookup(ifp, *group); if (inm != NULL) { /* * If we already joined this group, just bump the * refcount and return it. */ KASSERT(inm->inm_refcount >= 1, ("%s: bad refcount %d", __func__, inm->inm_refcount)); inm_acquire_locked(inm); *pinm = inm; } IN_MULTI_LIST_UNLOCK(); if (inm != NULL) return (0); memset(&gsin, 0, sizeof(gsin)); gsin.sin_family = AF_INET; gsin.sin_len = sizeof(struct sockaddr_in); gsin.sin_addr = *group; /* * Check if a link-layer group is already associated * with this network-layer group on the given ifnet. */ error = if_addmulti(ifp, (struct sockaddr *)&gsin, &ifma); if (error != 0) return (error); /* XXX ifma_protospec must be covered by IF_ADDR_LOCK */ IN_MULTI_LIST_LOCK(); IF_ADDR_WLOCK(ifp); /* * If something other than netinet is occupying the link-layer * group, print a meaningful error message and back out of * the allocation. * Otherwise, bump the refcount on the existing network-layer * group association and return it. */ if (ifma->ifma_protospec != NULL) { inm = (struct in_multi *)ifma->ifma_protospec; #ifdef INVARIANTS KASSERT(ifma->ifma_addr != NULL, ("%s: no ifma_addr", __func__)); KASSERT(ifma->ifma_addr->sa_family == AF_INET, ("%s: ifma not AF_INET", __func__)); KASSERT(inm != NULL, ("%s: no ifma_protospec", __func__)); if (inm->inm_ifma != ifma || inm->inm_ifp != ifp || !in_hosteq(inm->inm_addr, *group)) { char addrbuf[INET_ADDRSTRLEN]; panic("%s: ifma %p is inconsistent with %p (%s)", __func__, ifma, inm, inet_ntoa_r(*group, addrbuf)); } #endif inm_acquire_locked(inm); *pinm = inm; goto out_locked; } IF_ADDR_WLOCK_ASSERT(ifp); /* * A new in_multi record is needed; allocate and initialize it. * We DO NOT perform an IGMP join as the in_ layer may need to * push an initial source list down to IGMP to support SSM. 
* * The initial source filter state is INCLUDE, {} as per the RFC. */ inm = malloc(sizeof(*inm), M_IPMADDR, M_NOWAIT | M_ZERO); if (inm == NULL) { IF_ADDR_WUNLOCK(ifp); IN_MULTI_LIST_UNLOCK(); if_delmulti_ifma(ifma); return (ENOMEM); } inm->inm_addr = *group; inm->inm_ifp = ifp; inm->inm_igi = ii->ii_igmp; inm->inm_ifma = ifma; inm->inm_refcount = 1; inm->inm_state = IGMP_NOT_MEMBER; mbufq_init(&inm->inm_scq, IGMP_MAX_STATE_CHANGES); inm->inm_st[0].iss_fmode = MCAST_UNDEFINED; inm->inm_st[1].iss_fmode = MCAST_UNDEFINED; RB_INIT(&inm->inm_srcs); ifma->ifma_protospec = inm; *pinm = inm; out_locked: IF_ADDR_WUNLOCK(ifp); IN_MULTI_LIST_UNLOCK(); return (0); } /* * Drop a reference to an in_multi record. * * If the refcount drops to 0, free the in_multi record and * delete the underlying link-layer membership. */ static void inm_release(struct in_multi *inm) { struct ifmultiaddr *ifma; struct ifnet *ifp; CTR2(KTR_IGMPV3, "%s: refcount is %d", __func__, inm->inm_refcount); MPASS(inm->inm_refcount == 0); CTR2(KTR_IGMPV3, "%s: freeing inm %p", __func__, inm); ifma = inm->inm_ifma; ifp = inm->inm_ifp; /* XXX this access is not covered by IF_ADDR_LOCK */ CTR2(KTR_IGMPV3, "%s: purging ifma %p", __func__, ifma); if (ifp != NULL) { CURVNET_SET(ifp->if_vnet); inm_purge(inm); free(inm, M_IPMADDR); if_delmulti_ifma_flags(ifma, 1); CURVNET_RESTORE(); if_rele(ifp); } else { inm_purge(inm); free(inm, M_IPMADDR); if_delmulti_ifma_flags(ifma, 1); } } /* * Clear recorded source entries for a group. * Used by the IGMP code. Caller must hold the IN_MULTI lock. * FIXME: Should reap. */ void inm_clear_recorded(struct in_multi *inm) { struct ip_msource *ims; IN_MULTI_LIST_LOCK_ASSERT(); RB_FOREACH(ims, ip_msource_tree, &inm->inm_srcs) { if (ims->ims_stp) { ims->ims_stp = 0; --inm->inm_st[1].iss_rec; } } KASSERT(inm->inm_st[1].iss_rec == 0, ("%s: iss_rec %d not 0", __func__, inm->inm_st[1].iss_rec)); } /* * Record a source as pending for a Source-Group IGMPv3 query. * This lives here as it modifies the shared tree. * * inm is the group descriptor. * naddr is the address of the source to record in network-byte order. * * If the net.inet.igmp.sgalloc sysctl is non-zero, we will * lazy-allocate a source node in response to an SG query. * Otherwise, no allocation is performed. This saves some memory * with the trade-off that the source will not be reported to the * router if joined in the window between the query response and * the group actually being joined on the local host. * * VIMAGE: XXX: Currently the igmp_sgalloc feature has been removed. * This turns off the allocation of a recorded source entry if * the group has not been joined. * * Return 0 if the source didn't exist or was already marked as recorded. * Return 1 if the source was marked as recorded by this function. * Return <0 if any error occurred (negated errno code). */ int inm_record_source(struct in_multi *inm, const in_addr_t naddr) { struct ip_msource find; struct ip_msource *ims, *nims; IN_MULTI_LIST_LOCK_ASSERT(); find.ims_haddr = ntohl(naddr); ims = RB_FIND(ip_msource_tree, &inm->inm_srcs, &find); if (ims && ims->ims_stp) return (0); if (ims == NULL) { if (inm->inm_nsrc == in_mcast_maxgrpsrc) return (-ENOSPC); nims = malloc(sizeof(struct ip_msource), M_IPMSOURCE, M_NOWAIT | M_ZERO); if (nims == NULL) return (-ENOMEM); nims->ims_haddr = find.ims_haddr; RB_INSERT(ip_msource_tree, &inm->inm_srcs, nims); ++inm->inm_nsrc; ims = nims; } /* * Mark the source as recorded and update the recorded * source count. 
*/ ++ims->ims_stp; ++inm->inm_st[1].iss_rec; return (1); } /* * Return a pointer to an in_msource owned by an in_mfilter, * given its source address. * Lazy-allocate if needed. If this is a new entry its filter state is * undefined at t0. * * imf is the filter set being modified. * haddr is the source address in *host* byte-order. * * SMPng: May be called with locks held; malloc must not block. */ static int imf_get_source(struct in_mfilter *imf, const struct sockaddr_in *psin, struct in_msource **plims) { struct ip_msource find; struct ip_msource *ims, *nims; struct in_msource *lims; int error; error = 0; ims = NULL; lims = NULL; /* key is host byte order */ find.ims_haddr = ntohl(psin->sin_addr.s_addr); ims = RB_FIND(ip_msource_tree, &imf->imf_sources, &find); lims = (struct in_msource *)ims; if (lims == NULL) { if (imf->imf_nsrc == in_mcast_maxsocksrc) return (ENOSPC); nims = malloc(sizeof(struct in_msource), M_INMFILTER, M_NOWAIT | M_ZERO); if (nims == NULL) return (ENOMEM); lims = (struct in_msource *)nims; lims->ims_haddr = find.ims_haddr; lims->imsl_st[0] = MCAST_UNDEFINED; RB_INSERT(ip_msource_tree, &imf->imf_sources, nims); ++imf->imf_nsrc; } *plims = lims; return (error); } /* * Graft a source entry into an existing socket-layer filter set, * maintaining any required invariants and checking allocations. * * The source is marked as being in the new filter mode at t1. * * Return the pointer to the new node, otherwise return NULL. */ static struct in_msource * imf_graft(struct in_mfilter *imf, const uint8_t st1, const struct sockaddr_in *psin) { struct ip_msource *nims; struct in_msource *lims; nims = malloc(sizeof(struct in_msource), M_INMFILTER, M_NOWAIT | M_ZERO); if (nims == NULL) return (NULL); lims = (struct in_msource *)nims; lims->ims_haddr = ntohl(psin->sin_addr.s_addr); lims->imsl_st[0] = MCAST_UNDEFINED; lims->imsl_st[1] = st1; RB_INSERT(ip_msource_tree, &imf->imf_sources, nims); ++imf->imf_nsrc; return (lims); } /* * Prune a source entry from an existing socket-layer filter set, * maintaining any required invariants and checking allocations. * * The source is marked as being left at t1, it is not freed. * * Return 0 if no error occurred, otherwise return an errno value. */ static int imf_prune(struct in_mfilter *imf, const struct sockaddr_in *psin) { struct ip_msource find; struct ip_msource *ims; struct in_msource *lims; /* key is host byte order */ find.ims_haddr = ntohl(psin->sin_addr.s_addr); ims = RB_FIND(ip_msource_tree, &imf->imf_sources, &find); if (ims == NULL) return (ENOENT); lims = (struct in_msource *)ims; lims->imsl_st[1] = MCAST_UNDEFINED; return (0); } /* * Revert socket-layer filter set deltas at t1 to t0 state. */ static void imf_rollback(struct in_mfilter *imf) { struct ip_msource *ims, *tims; struct in_msource *lims; RB_FOREACH_SAFE(ims, ip_msource_tree, &imf->imf_sources, tims) { lims = (struct in_msource *)ims; if (lims->imsl_st[0] == lims->imsl_st[1]) { /* no change at t1 */ continue; } else if (lims->imsl_st[0] != MCAST_UNDEFINED) { /* revert change to existing source at t1 */ lims->imsl_st[1] = lims->imsl_st[0]; } else { /* revert source added t1 */ CTR2(KTR_IGMPV3, "%s: free ims %p", __func__, ims); RB_REMOVE(ip_msource_tree, &imf->imf_sources, ims); free(ims, M_INMFILTER); imf->imf_nsrc--; } } imf->imf_st[1] = imf->imf_st[0]; } /* * Mark socket-layer filter set as INCLUDE {} at t1. 
*/ static void imf_leave(struct in_mfilter *imf) { struct ip_msource *ims; struct in_msource *lims; RB_FOREACH(ims, ip_msource_tree, &imf->imf_sources) { lims = (struct in_msource *)ims; lims->imsl_st[1] = MCAST_UNDEFINED; } imf->imf_st[1] = MCAST_INCLUDE; } /* * Mark socket-layer filter set deltas as committed. */ static void imf_commit(struct in_mfilter *imf) { struct ip_msource *ims; struct in_msource *lims; RB_FOREACH(ims, ip_msource_tree, &imf->imf_sources) { lims = (struct in_msource *)ims; lims->imsl_st[0] = lims->imsl_st[1]; } imf->imf_st[0] = imf->imf_st[1]; } /* * Reap unreferenced sources from socket-layer filter set. */ static void imf_reap(struct in_mfilter *imf) { struct ip_msource *ims, *tims; struct in_msource *lims; RB_FOREACH_SAFE(ims, ip_msource_tree, &imf->imf_sources, tims) { lims = (struct in_msource *)ims; if ((lims->imsl_st[0] == MCAST_UNDEFINED) && (lims->imsl_st[1] == MCAST_UNDEFINED)) { CTR2(KTR_IGMPV3, "%s: free lims %p", __func__, ims); RB_REMOVE(ip_msource_tree, &imf->imf_sources, ims); free(ims, M_INMFILTER); imf->imf_nsrc--; } } } /* * Purge socket-layer filter set. */ static void imf_purge(struct in_mfilter *imf) { struct ip_msource *ims, *tims; RB_FOREACH_SAFE(ims, ip_msource_tree, &imf->imf_sources, tims) { CTR2(KTR_IGMPV3, "%s: free ims %p", __func__, ims); RB_REMOVE(ip_msource_tree, &imf->imf_sources, ims); free(ims, M_INMFILTER); imf->imf_nsrc--; } imf->imf_st[0] = imf->imf_st[1] = MCAST_UNDEFINED; KASSERT(RB_EMPTY(&imf->imf_sources), ("%s: imf_sources not empty", __func__)); } /* * Look up a source filter entry for a multicast group. * * inm is the group descriptor to work with. * haddr is the host-byte-order IPv4 address to look up. * noalloc may be non-zero to suppress allocation of sources. * *pims will be set to the address of the retrieved or allocated source. * * SMPng: NOTE: may be called with locks held. * Return 0 if successful, otherwise return a non-zero error code. */ static int inm_get_source(struct in_multi *inm, const in_addr_t haddr, const int noalloc, struct ip_msource **pims) { struct ip_msource find; struct ip_msource *ims, *nims; find.ims_haddr = haddr; ims = RB_FIND(ip_msource_tree, &inm->inm_srcs, &find); if (ims == NULL && !noalloc) { if (inm->inm_nsrc == in_mcast_maxgrpsrc) return (ENOSPC); nims = malloc(sizeof(struct ip_msource), M_IPMSOURCE, M_NOWAIT | M_ZERO); if (nims == NULL) return (ENOMEM); nims->ims_haddr = haddr; RB_INSERT(ip_msource_tree, &inm->inm_srcs, nims); ++inm->inm_nsrc; ims = nims; #ifdef KTR CTR3(KTR_IGMPV3, "%s: allocated 0x%08x as %p", __func__, haddr, ims); #endif } *pims = ims; return (0); } /* * Merge socket-layer source into IGMP-layer source. * If rollback is non-zero, perform the inverse of the merge. */ static void ims_merge(struct ip_msource *ims, const struct in_msource *lims, const int rollback) { int n = rollback ? 
-1 : 1; if (lims->imsl_st[0] == MCAST_EXCLUDE) { CTR3(KTR_IGMPV3, "%s: t1 ex -= %d on 0x%08x", __func__, n, ims->ims_haddr); ims->ims_st[1].ex -= n; } else if (lims->imsl_st[0] == MCAST_INCLUDE) { CTR3(KTR_IGMPV3, "%s: t1 in -= %d on 0x%08x", __func__, n, ims->ims_haddr); ims->ims_st[1].in -= n; } if (lims->imsl_st[1] == MCAST_EXCLUDE) { CTR3(KTR_IGMPV3, "%s: t1 ex += %d on 0x%08x", __func__, n, ims->ims_haddr); ims->ims_st[1].ex += n; } else if (lims->imsl_st[1] == MCAST_INCLUDE) { CTR3(KTR_IGMPV3, "%s: t1 in += %d on 0x%08x", __func__, n, ims->ims_haddr); ims->ims_st[1].in += n; } } /* * Atomically update the global in_multi state, when a membership's * filter list is being updated in any way. * * imf is the per-inpcb-membership group filter pointer. * A fake imf may be passed for in-kernel consumers. * * XXX This is a candidate for a set-symmetric-difference style loop * which would eliminate the repeated lookup from root of ims nodes, * as they share the same key space. * * If any error occurred this function will back out of refcounts * and return a non-zero value. */ static int inm_merge(struct in_multi *inm, /*const*/ struct in_mfilter *imf) { struct ip_msource *ims, *nims; struct in_msource *lims; int schanged, error; int nsrc0, nsrc1; schanged = 0; error = 0; nsrc1 = nsrc0 = 0; IN_MULTI_LIST_LOCK_ASSERT(); /* * Update the source filters first, as this may fail. * Maintain count of in-mode filters at t0, t1. These are * used to work out if we transition into ASM mode or not. * Maintain a count of source filters whose state was * actually modified by this operation. */ RB_FOREACH(ims, ip_msource_tree, &imf->imf_sources) { lims = (struct in_msource *)ims; if (lims->imsl_st[0] == imf->imf_st[0]) nsrc0++; if (lims->imsl_st[1] == imf->imf_st[1]) nsrc1++; if (lims->imsl_st[0] == lims->imsl_st[1]) continue; error = inm_get_source(inm, lims->ims_haddr, 0, &nims); ++schanged; if (error) break; ims_merge(nims, lims, 0); } if (error) { struct ip_msource *bims; RB_FOREACH_REVERSE_FROM(ims, ip_msource_tree, nims) { lims = (struct in_msource *)ims; if (lims->imsl_st[0] == lims->imsl_st[1]) continue; (void)inm_get_source(inm, lims->ims_haddr, 1, &bims); if (bims == NULL) continue; ims_merge(bims, lims, 1); } goto out_reap; } CTR3(KTR_IGMPV3, "%s: imf filters in-mode: %d at t0, %d at t1", __func__, nsrc0, nsrc1); /* Handle transition between INCLUDE {n} and INCLUDE {} on socket. */ if (imf->imf_st[0] == imf->imf_st[1] && imf->imf_st[1] == MCAST_INCLUDE) { if (nsrc1 == 0) { CTR1(KTR_IGMPV3, "%s: --in on inm at t1", __func__); --inm->inm_st[1].iss_in; } } /* Handle filter mode transition on socket. */ if (imf->imf_st[0] != imf->imf_st[1]) { CTR3(KTR_IGMPV3, "%s: imf transition %d to %d", __func__, imf->imf_st[0], imf->imf_st[1]); if (imf->imf_st[0] == MCAST_EXCLUDE) { CTR1(KTR_IGMPV3, "%s: --ex on inm at t1", __func__); --inm->inm_st[1].iss_ex; } else if (imf->imf_st[0] == MCAST_INCLUDE) { CTR1(KTR_IGMPV3, "%s: --in on inm at t1", __func__); --inm->inm_st[1].iss_in; } if (imf->imf_st[1] == MCAST_EXCLUDE) { CTR1(KTR_IGMPV3, "%s: ex++ on inm at t1", __func__); inm->inm_st[1].iss_ex++; } else if (imf->imf_st[1] == MCAST_INCLUDE && nsrc1 > 0) { CTR1(KTR_IGMPV3, "%s: in++ on inm at t1", __func__); inm->inm_st[1].iss_in++; } } /* * Track inm filter state in terms of listener counts. * If there are any exclusive listeners, stack-wide * membership is exclusive. * Otherwise, if only inclusive listeners, stack-wide is inclusive. 
* If no listeners remain, state is undefined at t1, * and the IGMP lifecycle for this group should finish. */ if (inm->inm_st[1].iss_ex > 0) { CTR1(KTR_IGMPV3, "%s: transition to EX", __func__); inm->inm_st[1].iss_fmode = MCAST_EXCLUDE; } else if (inm->inm_st[1].iss_in > 0) { CTR1(KTR_IGMPV3, "%s: transition to IN", __func__); inm->inm_st[1].iss_fmode = MCAST_INCLUDE; } else { CTR1(KTR_IGMPV3, "%s: transition to UNDEF", __func__); inm->inm_st[1].iss_fmode = MCAST_UNDEFINED; } /* Decrement ASM listener count on transition out of ASM mode. */ if (imf->imf_st[0] == MCAST_EXCLUDE && nsrc0 == 0) { if ((imf->imf_st[1] != MCAST_EXCLUDE) || (imf->imf_st[1] == MCAST_EXCLUDE && nsrc1 > 0)) { CTR1(KTR_IGMPV3, "%s: --asm on inm at t1", __func__); --inm->inm_st[1].iss_asm; } } /* Increment ASM listener count on transition to ASM mode. */ if (imf->imf_st[1] == MCAST_EXCLUDE && nsrc1 == 0) { CTR1(KTR_IGMPV3, "%s: asm++ on inm at t1", __func__); inm->inm_st[1].iss_asm++; } CTR3(KTR_IGMPV3, "%s: merged imf %p to inm %p", __func__, imf, inm); inm_print(inm); out_reap: if (schanged > 0) { CTR1(KTR_IGMPV3, "%s: sources changed; reaping", __func__); inm_reap(inm); } return (error); } /* * Mark an in_multi's filter set deltas as committed. * Called by IGMP after a state change has been enqueued. */ void inm_commit(struct in_multi *inm) { struct ip_msource *ims; CTR2(KTR_IGMPV3, "%s: commit inm %p", __func__, inm); CTR1(KTR_IGMPV3, "%s: pre commit:", __func__); inm_print(inm); RB_FOREACH(ims, ip_msource_tree, &inm->inm_srcs) { ims->ims_st[0] = ims->ims_st[1]; } inm->inm_st[0] = inm->inm_st[1]; } /* * Reap unreferenced nodes from an in_multi's filter set. */ static void inm_reap(struct in_multi *inm) { struct ip_msource *ims, *tims; RB_FOREACH_SAFE(ims, ip_msource_tree, &inm->inm_srcs, tims) { if (ims->ims_st[0].ex > 0 || ims->ims_st[0].in > 0 || ims->ims_st[1].ex > 0 || ims->ims_st[1].in > 0 || ims->ims_stp != 0) continue; CTR2(KTR_IGMPV3, "%s: free ims %p", __func__, ims); RB_REMOVE(ip_msource_tree, &inm->inm_srcs, ims); free(ims, M_IPMSOURCE); inm->inm_nsrc--; } } /* * Purge all source nodes from an in_multi's filter set. */ static void inm_purge(struct in_multi *inm) { struct ip_msource *ims, *tims; RB_FOREACH_SAFE(ims, ip_msource_tree, &inm->inm_srcs, tims) { CTR2(KTR_IGMPV3, "%s: free ims %p", __func__, ims); RB_REMOVE(ip_msource_tree, &inm->inm_srcs, ims); free(ims, M_IPMSOURCE); inm->inm_nsrc--; } } /* * Join a multicast group; unlocked entry point. * * SMPng: XXX: in_joingroup() is called from in_control() when Giant * is not held. Fortunately, ifp is unlikely to have been detached * at this point, so we assume it's OK to recurse. */ int in_joingroup(struct ifnet *ifp, const struct in_addr *gina, /*const*/ struct in_mfilter *imf, struct in_multi **pinm) { int error; IN_MULTI_LOCK(); error = in_joingroup_locked(ifp, gina, imf, pinm); IN_MULTI_UNLOCK(); return (error); } /* * Join a multicast group; real entry point. * * Only preserves atomicity at inm level. * NOTE: imf argument cannot be const due to sys/tree.h limitations. * * If the IGMP downcall fails, the group is not joined, and an error * code is returned. 
*/ int in_joingroup_locked(struct ifnet *ifp, const struct in_addr *gina, /*const*/ struct in_mfilter *imf, struct in_multi **pinm) { struct in_mfilter timf; struct in_multi *inm; int error; IN_MULTI_LOCK_ASSERT(); IN_MULTI_LIST_UNLOCK_ASSERT(); CTR4(KTR_IGMPV3, "%s: join 0x%08x on %p(%s))", __func__, ntohl(gina->s_addr), ifp, ifp->if_xname); error = 0; inm = NULL; /* * If no imf was specified (i.e. kernel consumer), * fake one up and assume it is an ASM join. */ if (imf == NULL) { imf_init(&timf, MCAST_UNDEFINED, MCAST_EXCLUDE); imf = &timf; } error = in_getmulti(ifp, gina, &inm); if (error) { CTR1(KTR_IGMPV3, "%s: in_getmulti() failure", __func__); return (error); } IN_MULTI_LIST_LOCK(); CTR1(KTR_IGMPV3, "%s: merge inm state", __func__); error = inm_merge(inm, imf); if (error) { CTR1(KTR_IGMPV3, "%s: failed to merge inm state", __func__); goto out_inm_release; } CTR1(KTR_IGMPV3, "%s: doing igmp downcall", __func__); error = igmp_change_state(inm); if (error) { CTR1(KTR_IGMPV3, "%s: failed to update source", __func__); goto out_inm_release; } out_inm_release: if (error) { CTR2(KTR_IGMPV3, "%s: dropping ref on %p", __func__, inm); inm_release_deferred(inm); } else { *pinm = inm; } IN_MULTI_LIST_UNLOCK(); return (error); } /* * Leave a multicast group; unlocked entry point. */ int in_leavegroup(struct in_multi *inm, /*const*/ struct in_mfilter *imf) { int error; IN_MULTI_LOCK(); error = in_leavegroup_locked(inm, imf); IN_MULTI_UNLOCK(); return (error); } /* * Leave a multicast group; real entry point. * All source filters will be expunged. * * Only preserves atomicity at inm level. * * Holding the write lock for the INP which contains imf * is highly advisable. We can't assert for it as imf does not * contain a back-pointer to the owning inp. * * Note: This is not the same as inm_release(*) as this function also * makes a state change downcall into IGMP. */ int in_leavegroup_locked(struct in_multi *inm, /*const*/ struct in_mfilter *imf) { struct in_mfilter timf; int error; error = 0; IN_MULTI_LOCK_ASSERT(); IN_MULTI_LIST_UNLOCK_ASSERT(); CTR5(KTR_IGMPV3, "%s: leave inm %p, 0x%08x/%s, imf %p", __func__, inm, ntohl(inm->inm_addr.s_addr), (inm_is_ifp_detached(inm) ? "null" : inm->inm_ifp->if_xname), imf); /* * If no imf was specified (i.e. kernel consumer), * fake one up and assume it is an ASM join. */ if (imf == NULL) { imf_init(&timf, MCAST_EXCLUDE, MCAST_UNDEFINED); imf = &timf; } /* * Begin state merge transaction at IGMP layer. * * As this particular invocation should not cause any memory * to be allocated, and there is no opportunity to roll back * the transaction, it MUST NOT fail. */ CTR1(KTR_IGMPV3, "%s: merge inm state", __func__); IN_MULTI_LIST_LOCK(); error = inm_merge(inm, imf); KASSERT(error == 0, ("%s: failed to merge inm state", __func__)); CTR1(KTR_IGMPV3, "%s: doing igmp downcall", __func__); CURVNET_SET(inm->inm_ifp->if_vnet); error = igmp_change_state(inm); IF_ADDR_WLOCK(inm->inm_ifp); inm_release_deferred(inm); IF_ADDR_WUNLOCK(inm->inm_ifp); IN_MULTI_LIST_UNLOCK(); CURVNET_RESTORE(); if (error) CTR1(KTR_IGMPV3, "%s: failed igmp downcall", __func__); CTR2(KTR_IGMPV3, "%s: dropping ref on %p", __func__, inm); return (error); } /*#ifndef BURN_BRIDGES*/ /* * Join an IPv4 multicast group in (*,G) exclusive mode. * The group must be a 224.0.0.0/24 link-scope group. * This KPI is for legacy kernel consumers only. 
*/ struct in_multi * in_addmulti(struct in_addr *ap, struct ifnet *ifp) { struct in_multi *pinm; int error; #ifdef INVARIANTS char addrbuf[INET_ADDRSTRLEN]; #endif KASSERT(IN_LOCAL_GROUP(ntohl(ap->s_addr)), ("%s: %s not in 224.0.0.0/24", __func__, inet_ntoa_r(*ap, addrbuf))); error = in_joingroup(ifp, ap, NULL, &pinm); if (error != 0) pinm = NULL; return (pinm); } /* * Block or unblock an ASM multicast source on an inpcb. * This implements the delta-based API described in RFC 3678. * * The delta-based API applies only to exclusive-mode memberships. * An IGMP downcall will be performed. * * SMPng: NOTE: Must take Giant as a join may create a new ifma. * * Return 0 if successful, otherwise return an appropriate error code. */ static int inp_block_unblock_source(struct inpcb *inp, struct sockopt *sopt) { struct group_source_req gsr; struct rm_priotracker in_ifa_tracker; sockunion_t *gsa, *ssa; struct ifnet *ifp; struct in_mfilter *imf; struct ip_moptions *imo; struct in_msource *ims; struct in_multi *inm; size_t idx; uint16_t fmode; int error, doblock; ifp = NULL; error = 0; doblock = 0; memset(&gsr, 0, sizeof(struct group_source_req)); gsa = (sockunion_t *)&gsr.gsr_group; ssa = (sockunion_t *)&gsr.gsr_source; switch (sopt->sopt_name) { case IP_BLOCK_SOURCE: case IP_UNBLOCK_SOURCE: { struct ip_mreq_source mreqs; error = sooptcopyin(sopt, &mreqs, sizeof(struct ip_mreq_source), sizeof(struct ip_mreq_source)); if (error) return (error); gsa->sin.sin_family = AF_INET; gsa->sin.sin_len = sizeof(struct sockaddr_in); gsa->sin.sin_addr = mreqs.imr_multiaddr; ssa->sin.sin_family = AF_INET; ssa->sin.sin_len = sizeof(struct sockaddr_in); ssa->sin.sin_addr = mreqs.imr_sourceaddr; if (!in_nullhost(mreqs.imr_interface)) { IN_IFADDR_RLOCK(&in_ifa_tracker); INADDR_TO_IFP(mreqs.imr_interface, ifp); IN_IFADDR_RUNLOCK(&in_ifa_tracker); } if (sopt->sopt_name == IP_BLOCK_SOURCE) doblock = 1; CTR3(KTR_IGMPV3, "%s: imr_interface = 0x%08x, ifp = %p", __func__, ntohl(mreqs.imr_interface.s_addr), ifp); break; } case MCAST_BLOCK_SOURCE: case MCAST_UNBLOCK_SOURCE: error = sooptcopyin(sopt, &gsr, sizeof(struct group_source_req), sizeof(struct group_source_req)); if (error) return (error); if (gsa->sin.sin_family != AF_INET || gsa->sin.sin_len != sizeof(struct sockaddr_in)) return (EINVAL); if (ssa->sin.sin_family != AF_INET || ssa->sin.sin_len != sizeof(struct sockaddr_in)) return (EINVAL); if (gsr.gsr_interface == 0 || V_if_index < gsr.gsr_interface) return (EADDRNOTAVAIL); ifp = ifnet_byindex(gsr.gsr_interface); if (sopt->sopt_name == MCAST_BLOCK_SOURCE) doblock = 1; break; default: CTR2(KTR_IGMPV3, "%s: unknown sopt_name %d", __func__, sopt->sopt_name); return (EOPNOTSUPP); break; } if (!IN_MULTICAST(ntohl(gsa->sin.sin_addr.s_addr))) return (EINVAL); /* * Check if we are actually a member of this group. */ imo = inp_findmoptions(inp); idx = imo_match_group(imo, ifp, &gsa->sa); if (idx == -1 || imo->imo_mfilters == NULL) { error = EADDRNOTAVAIL; goto out_inp_locked; } KASSERT(imo->imo_mfilters != NULL, ("%s: imo_mfilters not allocated", __func__)); imf = &imo->imo_mfilters[idx]; inm = imo->imo_membership[idx]; /* * Attempting to use the delta-based API on an * non exclusive-mode membership is an error. */ fmode = imf->imf_st[0]; if (fmode != MCAST_EXCLUDE) { error = EINVAL; goto out_inp_locked; } /* * Deal with error cases up-front: * Asked to block, but already blocked; or * Asked to unblock, but nothing to unblock. * If adding a new block entry, allocate it. 
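For context, a minimal userland sketch of the delta-based API this handler implements (RFC 3678): the socket first joins the group any-source, then blocks one sender. The socket descriptor, interface address and the literal addresses used here are illustrative placeholders, not taken from this file.

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/* Join 239.1.1.1 any-source, then block datagrams from 192.0.2.7. */
static int
block_one_source(int sd, const char *ifaddr)
{
        struct ip_mreq mreq;
        struct ip_mreq_source mreqs;

        memset(&mreq, 0, sizeof(mreq));
        inet_pton(AF_INET, "239.1.1.1", &mreq.imr_multiaddr);
        inet_pton(AF_INET, ifaddr, &mreq.imr_interface);
        if (setsockopt(sd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq,
            sizeof(mreq)) == -1)
                return (-1);

        memset(&mreqs, 0, sizeof(mreqs));
        mreqs.imr_multiaddr = mreq.imr_multiaddr;
        inet_pton(AF_INET, "192.0.2.7", &mreqs.imr_sourceaddr);
        mreqs.imr_interface = mreq.imr_interface;
        /* IP_UNBLOCK_SOURCE with the same arguments undoes the block. */
        return (setsockopt(sd, IPPROTO_IP, IP_BLOCK_SOURCE, &mreqs,
            sizeof(mreqs)));
}

As the handler requires, this only works on an exclusive-mode membership; attempting it on a source-specific (inclusive) membership returns EINVAL.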
*/ ims = imo_match_source(imo, idx, &ssa->sa); if ((ims != NULL && doblock) || (ims == NULL && !doblock)) { CTR3(KTR_IGMPV3, "%s: source 0x%08x %spresent", __func__, ntohl(ssa->sin.sin_addr.s_addr), doblock ? "" : "not "); error = EADDRNOTAVAIL; goto out_inp_locked; } INP_WLOCK_ASSERT(inp); /* * Begin state merge transaction at socket layer. */ if (doblock) { CTR2(KTR_IGMPV3, "%s: %s source", __func__, "block"); ims = imf_graft(imf, fmode, &ssa->sin); if (ims == NULL) error = ENOMEM; } else { CTR2(KTR_IGMPV3, "%s: %s source", __func__, "allow"); error = imf_prune(imf, &ssa->sin); } if (error) { CTR1(KTR_IGMPV3, "%s: merge imf state failed", __func__); goto out_imf_rollback; } /* * Begin state merge transaction at IGMP layer. */ IN_MULTI_LOCK(); CTR1(KTR_IGMPV3, "%s: merge inm state", __func__); IN_MULTI_LIST_LOCK(); error = inm_merge(inm, imf); if (error) { CTR1(KTR_IGMPV3, "%s: failed to merge inm state", __func__); IN_MULTI_LIST_UNLOCK(); goto out_in_multi_locked; } CTR1(KTR_IGMPV3, "%s: doing igmp downcall", __func__); error = igmp_change_state(inm); IN_MULTI_LIST_UNLOCK(); if (error) CTR1(KTR_IGMPV3, "%s: failed igmp downcall", __func__); out_in_multi_locked: IN_MULTI_UNLOCK(); out_imf_rollback: if (error) imf_rollback(imf); else imf_commit(imf); imf_reap(imf); out_inp_locked: INP_WUNLOCK(inp); return (error); } /* * Given an inpcb, return its multicast options structure pointer. Accepts * an unlocked inpcb pointer, but will return it locked. May sleep. * * SMPng: NOTE: Potentially calls malloc(M_WAITOK) with Giant held. * SMPng: NOTE: Returns with the INP write lock held. */ static struct ip_moptions * inp_findmoptions(struct inpcb *inp) { struct ip_moptions *imo; struct in_multi **immp; struct in_mfilter *imfp; size_t idx; INP_WLOCK(inp); if (inp->inp_moptions != NULL) return (inp->inp_moptions); INP_WUNLOCK(inp); imo = malloc(sizeof(*imo), M_IPMOPTS, M_WAITOK); immp = malloc(sizeof(*immp) * IP_MIN_MEMBERSHIPS, M_IPMOPTS, M_WAITOK | M_ZERO); imfp = malloc(sizeof(struct in_mfilter) * IP_MIN_MEMBERSHIPS, M_INMFILTER, M_WAITOK); imo->imo_multicast_ifp = NULL; imo->imo_multicast_addr.s_addr = INADDR_ANY; imo->imo_multicast_vif = -1; imo->imo_multicast_ttl = IP_DEFAULT_MULTICAST_TTL; imo->imo_multicast_loop = in_mcast_loop; imo->imo_num_memberships = 0; imo->imo_max_memberships = IP_MIN_MEMBERSHIPS; imo->imo_membership = immp; /* Initialize per-group source filters. */ for (idx = 0; idx < IP_MIN_MEMBERSHIPS; idx++) imf_init(&imfp[idx], MCAST_UNDEFINED, MCAST_EXCLUDE); imo->imo_mfilters = imfp; INP_WLOCK(inp); if (inp->inp_moptions != NULL) { free(imfp, M_INMFILTER); free(immp, M_IPMOPTS); free(imo, M_IPMOPTS); return (inp->inp_moptions); } inp->inp_moptions = imo; return (imo); } static void inp_gcmoptions(struct ip_moptions *imo) { struct in_mfilter *imf; struct in_multi *inm; struct ifnet *ifp; size_t idx, nmships; nmships = imo->imo_num_memberships; for (idx = 0; idx < nmships; ++idx) { imf = imo->imo_mfilters ? &imo->imo_mfilters[idx] : NULL; if (imf) imf_leave(imf); inm = imo->imo_membership[idx]; ifp = inm->inm_ifp; if (ifp != NULL) { CURVNET_SET(ifp->if_vnet); (void)in_leavegroup(inm, imf); CURVNET_RESTORE(); } else { (void)in_leavegroup(inm, imf); } if (imf) imf_purge(imf); } if (imo->imo_mfilters) free(imo->imo_mfilters, M_INMFILTER); free(imo->imo_membership, M_IPMOPTS); free(imo, M_IPMOPTS); } /* * Discard the IP multicast options (and source filters). 
To minimize * the amount of work done while holding locks such as the INP's * pcbinfo lock (which is used in the receive path), the free * operation is deferred to the epoch callback task. */ void inp_freemoptions(struct ip_moptions *imo) { if (imo == NULL) return; inp_gcmoptions(imo); } /* * Atomically get source filters on a socket for an IPv4 multicast group. * Called with INP lock held; returns with lock released. */ static int inp_get_source_filters(struct inpcb *inp, struct sockopt *sopt) { struct __msfilterreq msfr; sockunion_t *gsa; struct ifnet *ifp; struct ip_moptions *imo; struct in_mfilter *imf; struct ip_msource *ims; struct in_msource *lims; struct sockaddr_in *psin; struct sockaddr_storage *ptss; struct sockaddr_storage *tss; int error; size_t idx, nsrcs, ncsrcs; INP_WLOCK_ASSERT(inp); imo = inp->inp_moptions; KASSERT(imo != NULL, ("%s: null ip_moptions", __func__)); INP_WUNLOCK(inp); error = sooptcopyin(sopt, &msfr, sizeof(struct __msfilterreq), sizeof(struct __msfilterreq)); if (error) return (error); if (msfr.msfr_ifindex == 0 || V_if_index < msfr.msfr_ifindex) return (EINVAL); ifp = ifnet_byindex(msfr.msfr_ifindex); if (ifp == NULL) return (EINVAL); INP_WLOCK(inp); /* * Lookup group on the socket. */ gsa = (sockunion_t *)&msfr.msfr_group; idx = imo_match_group(imo, ifp, &gsa->sa); if (idx == -1 || imo->imo_mfilters == NULL) { INP_WUNLOCK(inp); return (EADDRNOTAVAIL); } imf = &imo->imo_mfilters[idx]; /* * Ignore memberships which are in limbo. */ if (imf->imf_st[1] == MCAST_UNDEFINED) { INP_WUNLOCK(inp); return (EAGAIN); } msfr.msfr_fmode = imf->imf_st[1]; /* * If the user specified a buffer, copy out the source filter * entries to userland gracefully. * We only copy out the number of entries which userland * has asked for, but we always tell userland how big the * buffer really needs to be. */ if (msfr.msfr_nsrcs > in_mcast_maxsocksrc) msfr.msfr_nsrcs = in_mcast_maxsocksrc; tss = NULL; if (msfr.msfr_srcs != NULL && msfr.msfr_nsrcs > 0) { tss = malloc(sizeof(struct sockaddr_storage) * msfr.msfr_nsrcs, M_TEMP, M_NOWAIT | M_ZERO); if (tss == NULL) { INP_WUNLOCK(inp); return (ENOBUFS); } } /* * Count number of sources in-mode at t0. * If buffer space exists and remains, copy out source entries. */ nsrcs = msfr.msfr_nsrcs; ncsrcs = 0; ptss = tss; RB_FOREACH(ims, ip_msource_tree, &imf->imf_sources) { lims = (struct in_msource *)ims; if (lims->imsl_st[0] == MCAST_UNDEFINED || lims->imsl_st[0] != imf->imf_st[0]) continue; ++ncsrcs; if (tss != NULL && nsrcs > 0) { psin = (struct sockaddr_in *)ptss; psin->sin_family = AF_INET; psin->sin_len = sizeof(struct sockaddr_in); psin->sin_addr.s_addr = htonl(lims->ims_haddr); psin->sin_port = 0; ++ptss; --nsrcs; } } INP_WUNLOCK(inp); if (tss != NULL) { error = copyout(tss, msfr.msfr_srcs, sizeof(struct sockaddr_storage) * msfr.msfr_nsrcs); free(tss, M_TEMP); if (error) return (error); } msfr.msfr_nsrcs = ncsrcs; error = sooptcopyout(sopt, &msfr, sizeof(struct __msfilterreq)); return (error); } /* * Return the IP multicast options in response to user getsockopt(). */ int inp_getmoptions(struct inpcb *inp, struct sockopt *sopt) { struct rm_priotracker in_ifa_tracker; struct ip_mreqn mreqn; struct ip_moptions *imo; struct ifnet *ifp; struct in_ifaddr *ia; int error, optval; u_char coptval; INP_WLOCK(inp); imo = inp->inp_moptions; /* * If socket is neither of type SOCK_RAW or SOCK_DGRAM, * or is a divert socket, reject it. 
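A hedged sketch of how userland typically drives the IP_MSFILTER read path above, via the RFC 3678 getsourcefilter(3) wrapper: one call to learn the source count, then a second call with a buffer of that size, mirroring how the kernel reports the full count even when the caller's buffer is short. The interface name and group address are placeholders.

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <net/if.h>
#include <stdlib.h>
#include <string.h>

/* Report the filter mode and installed sources for one group. */
static int
dump_filter(int sd, const char *ifname)
{
        struct sockaddr_in grp;
        struct sockaddr_storage *slist;
        uint32_t fmode, nsrcs;

        memset(&grp, 0, sizeof(grp));
        grp.sin_family = AF_INET;
        grp.sin_len = sizeof(grp);
        inet_pton(AF_INET, "239.1.1.1", &grp.sin_addr);

        /* First pass: learn how many sources are installed. */
        nsrcs = 0;
        if (getsourcefilter(sd, if_nametoindex(ifname),
            (struct sockaddr *)&grp, sizeof(grp), &fmode, &nsrcs, NULL) == -1)
                return (-1);

        /* Second pass: fetch the sources themselves. */
        slist = calloc(nsrcs, sizeof(*slist));
        if (slist == NULL)
                return (-1);
        if (getsourcefilter(sd, if_nametoindex(ifname),
            (struct sockaddr *)&grp, sizeof(grp), &fmode, &nsrcs,
            slist) == -1) {
                free(slist);
                return (-1);
        }
        /* fmode is MCAST_INCLUDE or MCAST_EXCLUDE; slist holds the sources. */
        free(slist);
        return (0);
}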
*/ if (inp->inp_socket->so_proto->pr_protocol == IPPROTO_DIVERT || (inp->inp_socket->so_proto->pr_type != SOCK_RAW && inp->inp_socket->so_proto->pr_type != SOCK_DGRAM)) { INP_WUNLOCK(inp); return (EOPNOTSUPP); } error = 0; switch (sopt->sopt_name) { case IP_MULTICAST_VIF: if (imo != NULL) optval = imo->imo_multicast_vif; else optval = -1; INP_WUNLOCK(inp); error = sooptcopyout(sopt, &optval, sizeof(int)); break; case IP_MULTICAST_IF: memset(&mreqn, 0, sizeof(struct ip_mreqn)); if (imo != NULL) { ifp = imo->imo_multicast_ifp; if (!in_nullhost(imo->imo_multicast_addr)) { mreqn.imr_address = imo->imo_multicast_addr; } else if (ifp != NULL) { struct epoch_tracker et; mreqn.imr_ifindex = ifp->if_index; NET_EPOCH_ENTER(et); IFP_TO_IA(ifp, ia, &in_ifa_tracker); if (ia != NULL) mreqn.imr_address = IA_SIN(ia)->sin_addr; NET_EPOCH_EXIT(et); } } INP_WUNLOCK(inp); if (sopt->sopt_valsize == sizeof(struct ip_mreqn)) { error = sooptcopyout(sopt, &mreqn, sizeof(struct ip_mreqn)); } else { error = sooptcopyout(sopt, &mreqn.imr_address, sizeof(struct in_addr)); } break; case IP_MULTICAST_TTL: if (imo == NULL) optval = coptval = IP_DEFAULT_MULTICAST_TTL; else optval = coptval = imo->imo_multicast_ttl; INP_WUNLOCK(inp); if (sopt->sopt_valsize == sizeof(u_char)) error = sooptcopyout(sopt, &coptval, sizeof(u_char)); else error = sooptcopyout(sopt, &optval, sizeof(int)); break; case IP_MULTICAST_LOOP: if (imo == NULL) optval = coptval = IP_DEFAULT_MULTICAST_LOOP; else optval = coptval = imo->imo_multicast_loop; INP_WUNLOCK(inp); if (sopt->sopt_valsize == sizeof(u_char)) error = sooptcopyout(sopt, &coptval, sizeof(u_char)); else error = sooptcopyout(sopt, &optval, sizeof(int)); break; case IP_MSFILTER: if (imo == NULL) { error = EADDRNOTAVAIL; INP_WUNLOCK(inp); } else { error = inp_get_source_filters(inp, sopt); } break; default: INP_WUNLOCK(inp); error = ENOPROTOOPT; break; } INP_UNLOCK_ASSERT(inp); return (error); } /* * Look up the ifnet to use for a multicast group membership, * given the IPv4 address of an interface, and the IPv4 group address. * * This routine exists to support legacy multicast applications * which do not understand that multicast memberships are scoped to * specific physical links in the networking stack, or which need * to join link-scope groups before IPv4 addresses are configured. * * If inp is non-NULL, use this socket's current FIB number for any * required FIB lookup. * If ina is INADDR_ANY, look up the group address in the unicast FIB, * and use its ifp; usually, this points to the default next-hop. * * If the FIB lookup fails, attempt to use the first non-loopback * interface with multicast capability in the system as a * last resort. The legacy IPv4 ASM API requires that we do * this in order to allow groups to be joined when the routing * table has not yet been populated during boot. * * Returns NULL if no ifp could be found. * * FUTURE: Implement IPv4 source-address selection. */ static struct ifnet * inp_lookup_mcast_ifp(const struct inpcb *inp, const struct sockaddr_in *gsin, const struct in_addr ina) { struct rm_priotracker in_ifa_tracker; struct ifnet *ifp; struct nhop4_basic nh4; uint32_t fibnum; KASSERT(gsin->sin_family == AF_INET, ("%s: not AF_INET", __func__)); KASSERT(IN_MULTICAST(ntohl(gsin->sin_addr.s_addr)), ("%s: not multicast", __func__)); ifp = NULL; if (!in_nullhost(ina)) { IN_IFADDR_RLOCK(&in_ifa_tracker); INADDR_TO_IFP(ina, ifp); IN_IFADDR_RUNLOCK(&in_ifa_tracker); } else { fibnum = inp ? 
inp->inp_inc.inc_fibnum : 0; if (fib4_lookup_nh_basic(fibnum, gsin->sin_addr, 0, 0, &nh4)==0) ifp = nh4.nh_ifp; else { struct in_ifaddr *ia; struct ifnet *mifp; mifp = NULL; IN_IFADDR_RLOCK(&in_ifa_tracker); CK_STAILQ_FOREACH(ia, &V_in_ifaddrhead, ia_link) { mifp = ia->ia_ifp; if (!(mifp->if_flags & IFF_LOOPBACK) && (mifp->if_flags & IFF_MULTICAST)) { ifp = mifp; break; } } IN_IFADDR_RUNLOCK(&in_ifa_tracker); } } return (ifp); } /* * Join an IPv4 multicast group, possibly with a source. */ static int inp_join_group(struct inpcb *inp, struct sockopt *sopt) { struct group_source_req gsr; sockunion_t *gsa, *ssa; struct ifnet *ifp; struct in_mfilter *imf; struct ip_moptions *imo; struct in_multi *inm; struct in_msource *lims; size_t idx; int error, is_new; ifp = NULL; imf = NULL; lims = NULL; error = 0; is_new = 0; memset(&gsr, 0, sizeof(struct group_source_req)); gsa = (sockunion_t *)&gsr.gsr_group; gsa->ss.ss_family = AF_UNSPEC; ssa = (sockunion_t *)&gsr.gsr_source; ssa->ss.ss_family = AF_UNSPEC; switch (sopt->sopt_name) { case IP_ADD_MEMBERSHIP: { struct ip_mreqn mreqn; if (sopt->sopt_valsize == sizeof(struct ip_mreqn)) error = sooptcopyin(sopt, &mreqn, sizeof(struct ip_mreqn), sizeof(struct ip_mreqn)); else error = sooptcopyin(sopt, &mreqn, sizeof(struct ip_mreq), sizeof(struct ip_mreq)); if (error) return (error); gsa->sin.sin_family = AF_INET; gsa->sin.sin_len = sizeof(struct sockaddr_in); gsa->sin.sin_addr = mreqn.imr_multiaddr; if (!IN_MULTICAST(ntohl(gsa->sin.sin_addr.s_addr))) return (EINVAL); if (sopt->sopt_valsize == sizeof(struct ip_mreqn) && mreqn.imr_ifindex != 0) ifp = ifnet_byindex(mreqn.imr_ifindex); else ifp = inp_lookup_mcast_ifp(inp, &gsa->sin, mreqn.imr_address); break; } case IP_ADD_SOURCE_MEMBERSHIP: { struct ip_mreq_source mreqs; error = sooptcopyin(sopt, &mreqs, sizeof(struct ip_mreq_source), sizeof(struct ip_mreq_source)); if (error) return (error); gsa->sin.sin_family = ssa->sin.sin_family = AF_INET; gsa->sin.sin_len = ssa->sin.sin_len = sizeof(struct sockaddr_in); gsa->sin.sin_addr = mreqs.imr_multiaddr; if (!IN_MULTICAST(ntohl(gsa->sin.sin_addr.s_addr))) return (EINVAL); ssa->sin.sin_addr = mreqs.imr_sourceaddr; ifp = inp_lookup_mcast_ifp(inp, &gsa->sin, mreqs.imr_interface); CTR3(KTR_IGMPV3, "%s: imr_interface = 0x%08x, ifp = %p", __func__, ntohl(mreqs.imr_interface.s_addr), ifp); break; } case MCAST_JOIN_GROUP: case MCAST_JOIN_SOURCE_GROUP: if (sopt->sopt_name == MCAST_JOIN_GROUP) { error = sooptcopyin(sopt, &gsr, sizeof(struct group_req), sizeof(struct group_req)); } else if (sopt->sopt_name == MCAST_JOIN_SOURCE_GROUP) { error = sooptcopyin(sopt, &gsr, sizeof(struct group_source_req), sizeof(struct group_source_req)); } if (error) return (error); if (gsa->sin.sin_family != AF_INET || gsa->sin.sin_len != sizeof(struct sockaddr_in)) return (EINVAL); /* * Overwrite the port field if present, as the sockaddr * being copied in may be matched with a binary comparison. 
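For reference, the protocol-independent join that reaches this code via MCAST_JOIN_GROUP looks roughly like the following from userland; the older IP_ADD_MEMBERSHIP path takes a struct ip_mreq or ip_mreqn instead. The interface and group here are placeholders.

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <net/if.h>
#include <string.h>

/* Any-source join of 239.1.1.1 on a named interface, RFC 3678 style. */
static int
join_asm(int sd, const char *ifname)
{
        struct group_req greq;
        struct sockaddr_in *sin;

        memset(&greq, 0, sizeof(greq));
        greq.gr_interface = if_nametoindex(ifname);
        sin = (struct sockaddr_in *)&greq.gr_group;
        sin->sin_family = AF_INET;
        sin->sin_len = sizeof(*sin);
        inet_pton(AF_INET, "239.1.1.1", &sin->sin_addr);
        return (setsockopt(sd, IPPROTO_IP, MCAST_JOIN_GROUP, &greq,
            sizeof(greq)));
}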
*/ gsa->sin.sin_port = 0; if (sopt->sopt_name == MCAST_JOIN_SOURCE_GROUP) { if (ssa->sin.sin_family != AF_INET || ssa->sin.sin_len != sizeof(struct sockaddr_in)) return (EINVAL); ssa->sin.sin_port = 0; } if (!IN_MULTICAST(ntohl(gsa->sin.sin_addr.s_addr))) return (EINVAL); if (gsr.gsr_interface == 0 || V_if_index < gsr.gsr_interface) return (EADDRNOTAVAIL); ifp = ifnet_byindex(gsr.gsr_interface); break; default: CTR2(KTR_IGMPV3, "%s: unknown sopt_name %d", __func__, sopt->sopt_name); return (EOPNOTSUPP); break; } if (ifp == NULL || (ifp->if_flags & IFF_MULTICAST) == 0) return (EADDRNOTAVAIL); imo = inp_findmoptions(inp); idx = imo_match_group(imo, ifp, &gsa->sa); if (idx == -1) { is_new = 1; } else { inm = imo->imo_membership[idx]; imf = &imo->imo_mfilters[idx]; if (ssa->ss.ss_family != AF_UNSPEC) { /* * MCAST_JOIN_SOURCE_GROUP on an exclusive membership * is an error. On an existing inclusive membership, * it just adds the source to the filter list. */ if (imf->imf_st[1] != MCAST_INCLUDE) { error = EINVAL; goto out_inp_locked; } /* * Throw out duplicates. * * XXX FIXME: This makes a naive assumption that * even if entries exist for *ssa in this imf, * they will be rejected as dupes, even if they * are not valid in the current mode (in-mode). * * in_msource is transactioned just as for anything * else in SSM -- but note naive use of inm_graft() * below for allocating new filter entries. * * This is only an issue if someone mixes the * full-state SSM API with the delta-based API, * which is discouraged in the relevant RFCs. */ lims = imo_match_source(imo, idx, &ssa->sa); if (lims != NULL /*&& lims->imsl_st[1] == MCAST_INCLUDE*/) { error = EADDRNOTAVAIL; goto out_inp_locked; } } else { /* * MCAST_JOIN_GROUP on an existing exclusive * membership is an error; return EADDRINUSE * to preserve 4.4BSD API idempotence, and * avoid tedious detour to code below. * NOTE: This is bending RFC 3678 a bit. * * On an existing inclusive membership, this is also * an error; if you want to change filter mode, * you must use the userland API setsourcefilter(). * XXX We don't reject this for imf in UNDEFINED * state at t1, because allocation of a filter * is atomic with allocation of a membership. */ error = EINVAL; if (imf->imf_st[1] == MCAST_EXCLUDE) error = EADDRINUSE; goto out_inp_locked; } } /* * Begin state merge transaction at socket layer. */ INP_WLOCK_ASSERT(inp); if (is_new) { if (imo->imo_num_memberships == imo->imo_max_memberships) { error = imo_grow(imo); if (error) goto out_inp_locked; } /* * Allocate the new slot upfront so we can deal with * grafting the new source filter in same code path * as for join-source on existing membership. */ idx = imo->imo_num_memberships; imo->imo_membership[idx] = NULL; imo->imo_num_memberships++; KASSERT(imo->imo_mfilters != NULL, ("%s: imf_mfilters vector was not allocated", __func__)); imf = &imo->imo_mfilters[idx]; KASSERT(RB_EMPTY(&imf->imf_sources), ("%s: imf_sources not empty", __func__)); } /* * Graft new source into filter list for this inpcb's * membership of the group. The in_multi may not have * been allocated yet if this is a new membership, however, * the in_mfilter slot will be allocated and must be initialized. * * Note: Grafting of exclusive mode filters doesn't happen * in this path. * XXX: Should check for non-NULL lims (node exists but may * not be in-mode) for interop with full-state API. 
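The source-specific variant handled just below, MCAST_JOIN_SOURCE_GROUP, is driven from userland roughly as follows; addresses and the interface name are illustrative only.

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <net/if.h>
#include <string.h>

/* Source-specific join: receive 232.1.1.1 only from sender 192.0.2.7. */
static int
join_ssm(int sd, const char *ifname)
{
        struct group_source_req gsreq;
        struct sockaddr_in *grp, *src;

        memset(&gsreq, 0, sizeof(gsreq));
        gsreq.gsr_interface = if_nametoindex(ifname);

        grp = (struct sockaddr_in *)&gsreq.gsr_group;
        grp->sin_family = AF_INET;
        grp->sin_len = sizeof(*grp);
        inet_pton(AF_INET, "232.1.1.1", &grp->sin_addr);

        src = (struct sockaddr_in *)&gsreq.gsr_source;
        src->sin_family = AF_INET;
        src->sin_len = sizeof(*src);
        inet_pton(AF_INET, "192.0.2.7", &src->sin_addr);

        /*
         * Repeating this call with other sources adds them to the
         * inclusive filter; the surrounding handler rejects mixing it
         * with an exclusive-mode join on the same group.
         */
        return (setsockopt(sd, IPPROTO_IP, MCAST_JOIN_SOURCE_GROUP, &gsreq,
            sizeof(gsreq)));
}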
*/ if (ssa->ss.ss_family != AF_UNSPEC) { /* Membership starts in IN mode */ if (is_new) { CTR1(KTR_IGMPV3, "%s: new join w/source", __func__); imf_init(imf, MCAST_UNDEFINED, MCAST_INCLUDE); } else { CTR2(KTR_IGMPV3, "%s: %s source", __func__, "allow"); } lims = imf_graft(imf, MCAST_INCLUDE, &ssa->sin); if (lims == NULL) { CTR1(KTR_IGMPV3, "%s: merge imf state failed", __func__); error = ENOMEM; goto out_imo_free; } } else { /* No address specified; Membership starts in EX mode */ if (is_new) { CTR1(KTR_IGMPV3, "%s: new join w/o source", __func__); imf_init(imf, MCAST_UNDEFINED, MCAST_EXCLUDE); } } /* * Begin state merge transaction at IGMP layer. */ in_pcbref(inp); INP_WUNLOCK(inp); IN_MULTI_LOCK(); if (is_new) { error = in_joingroup_locked(ifp, &gsa->sin.sin_addr, imf, &inm); if (error) { CTR1(KTR_IGMPV3, "%s: in_joingroup_locked failed", __func__); IN_MULTI_LIST_UNLOCK(); goto out_imo_free; } inm_acquire(inm); imo->imo_membership[idx] = inm; } else { CTR1(KTR_IGMPV3, "%s: merge inm state", __func__); IN_MULTI_LIST_LOCK(); error = inm_merge(inm, imf); if (error) { CTR1(KTR_IGMPV3, "%s: failed to merge inm state", __func__); IN_MULTI_LIST_UNLOCK(); goto out_in_multi_locked; } CTR1(KTR_IGMPV3, "%s: doing igmp downcall", __func__); error = igmp_change_state(inm); IN_MULTI_LIST_UNLOCK(); if (error) { CTR1(KTR_IGMPV3, "%s: failed igmp downcall", __func__); goto out_in_multi_locked; } } out_in_multi_locked: IN_MULTI_UNLOCK(); INP_WLOCK(inp); if (in_pcbrele_wlocked(inp)) return (ENXIO); if (error) { imf_rollback(imf); if (is_new) imf_purge(imf); else imf_reap(imf); } else { imf_commit(imf); } out_imo_free: if (error && is_new) { inm = imo->imo_membership[idx]; if (inm != NULL) { IN_MULTI_LIST_LOCK(); inm_release_deferred(inm); IN_MULTI_LIST_UNLOCK(); } imo->imo_membership[idx] = NULL; --imo->imo_num_memberships; } out_inp_locked: INP_WUNLOCK(inp); return (error); } /* * Leave an IPv4 multicast group on an inpcb, possibly with a source. */ static int inp_leave_group(struct inpcb *inp, struct sockopt *sopt) { struct group_source_req gsr; struct ip_mreq_source mreqs; struct rm_priotracker in_ifa_tracker; sockunion_t *gsa, *ssa; struct ifnet *ifp; struct in_mfilter *imf; struct ip_moptions *imo; struct in_msource *ims; struct in_multi *inm; size_t idx; int error, is_final; ifp = NULL; error = 0; is_final = 1; memset(&gsr, 0, sizeof(struct group_source_req)); gsa = (sockunion_t *)&gsr.gsr_group; gsa->ss.ss_family = AF_UNSPEC; ssa = (sockunion_t *)&gsr.gsr_source; ssa->ss.ss_family = AF_UNSPEC; switch (sopt->sopt_name) { case IP_DROP_MEMBERSHIP: case IP_DROP_SOURCE_MEMBERSHIP: if (sopt->sopt_name == IP_DROP_MEMBERSHIP) { error = sooptcopyin(sopt, &mreqs, sizeof(struct ip_mreq), sizeof(struct ip_mreq)); /* * Swap interface and sourceaddr arguments, * as ip_mreq and ip_mreq_source are laid * out differently. */ mreqs.imr_interface = mreqs.imr_sourceaddr; mreqs.imr_sourceaddr.s_addr = INADDR_ANY; } else if (sopt->sopt_name == IP_DROP_SOURCE_MEMBERSHIP) { error = sooptcopyin(sopt, &mreqs, sizeof(struct ip_mreq_source), sizeof(struct ip_mreq_source)); } if (error) return (error); gsa->sin.sin_family = AF_INET; gsa->sin.sin_len = sizeof(struct sockaddr_in); gsa->sin.sin_addr = mreqs.imr_multiaddr; if (sopt->sopt_name == IP_DROP_SOURCE_MEMBERSHIP) { ssa->sin.sin_family = AF_INET; ssa->sin.sin_len = sizeof(struct sockaddr_in); ssa->sin.sin_addr = mreqs.imr_sourceaddr; } /* * Attempt to look up hinted ifp from interface address. 
* Fallthrough with null ifp iff lookup fails, to * preserve 4.4BSD mcast API idempotence. * XXX NOTE WELL: The RFC 3678 API is preferred because * using an IPv4 address as a key is racy. */ if (!in_nullhost(mreqs.imr_interface)) { IN_IFADDR_RLOCK(&in_ifa_tracker); INADDR_TO_IFP(mreqs.imr_interface, ifp); IN_IFADDR_RUNLOCK(&in_ifa_tracker); } CTR3(KTR_IGMPV3, "%s: imr_interface = 0x%08x, ifp = %p", __func__, ntohl(mreqs.imr_interface.s_addr), ifp); break; case MCAST_LEAVE_GROUP: case MCAST_LEAVE_SOURCE_GROUP: if (sopt->sopt_name == MCAST_LEAVE_GROUP) { error = sooptcopyin(sopt, &gsr, sizeof(struct group_req), sizeof(struct group_req)); } else if (sopt->sopt_name == MCAST_LEAVE_SOURCE_GROUP) { error = sooptcopyin(sopt, &gsr, sizeof(struct group_source_req), sizeof(struct group_source_req)); } if (error) return (error); if (gsa->sin.sin_family != AF_INET || gsa->sin.sin_len != sizeof(struct sockaddr_in)) return (EINVAL); if (sopt->sopt_name == MCAST_LEAVE_SOURCE_GROUP) { if (ssa->sin.sin_family != AF_INET || ssa->sin.sin_len != sizeof(struct sockaddr_in)) return (EINVAL); } if (gsr.gsr_interface == 0 || V_if_index < gsr.gsr_interface) return (EADDRNOTAVAIL); ifp = ifnet_byindex(gsr.gsr_interface); if (ifp == NULL) return (EADDRNOTAVAIL); break; default: CTR2(KTR_IGMPV3, "%s: unknown sopt_name %d", __func__, sopt->sopt_name); return (EOPNOTSUPP); break; } if (!IN_MULTICAST(ntohl(gsa->sin.sin_addr.s_addr))) return (EINVAL); /* * Find the membership in the membership array. */ imo = inp_findmoptions(inp); idx = imo_match_group(imo, ifp, &gsa->sa); if (idx == -1) { error = EADDRNOTAVAIL; goto out_inp_locked; } inm = imo->imo_membership[idx]; imf = &imo->imo_mfilters[idx]; if (ssa->ss.ss_family != AF_UNSPEC) is_final = 0; /* * Begin state merge transaction at socket layer. */ INP_WLOCK_ASSERT(inp); /* * If we were instructed only to leave a given source, do so. * MCAST_LEAVE_SOURCE_GROUP is only valid for inclusive memberships. */ if (is_final) { imf_leave(imf); } else { if (imf->imf_st[0] == MCAST_EXCLUDE) { error = EADDRNOTAVAIL; goto out_inp_locked; } ims = imo_match_source(imo, idx, &ssa->sa); if (ims == NULL) { CTR3(KTR_IGMPV3, "%s: source 0x%08x %spresent", __func__, ntohl(ssa->sin.sin_addr.s_addr), "not "); error = EADDRNOTAVAIL; goto out_inp_locked; } CTR2(KTR_IGMPV3, "%s: %s source", __func__, "block"); error = imf_prune(imf, &ssa->sin); if (error) { CTR1(KTR_IGMPV3, "%s: merge imf state failed", __func__); goto out_inp_locked; } } /* * Begin state merge transaction at IGMP layer. */ in_pcbref(inp); INP_WUNLOCK(inp); IN_MULTI_LOCK(); if (is_final) { /* * Give up the multicast address record to which * the membership points. */ (void)in_leavegroup_locked(inm, imf); } else { CTR1(KTR_IGMPV3, "%s: merge inm state", __func__); IN_MULTI_LIST_LOCK(); error = inm_merge(inm, imf); if (error) { CTR1(KTR_IGMPV3, "%s: failed to merge inm state", __func__); IN_MULTI_LIST_UNLOCK(); goto out_in_multi_locked; } CTR1(KTR_IGMPV3, "%s: doing igmp downcall", __func__); error = igmp_change_state(inm); IN_MULTI_LIST_UNLOCK(); if (error) { CTR1(KTR_IGMPV3, "%s: failed igmp downcall", __func__); } } out_in_multi_locked: IN_MULTI_UNLOCK(); INP_WLOCK(inp); if (in_pcbrele_wlocked(inp)) return (ENXIO); if (error) imf_rollback(imf); else imf_commit(imf); imf_reap(imf); if (is_final) { /* Remove the gap in the membership and filter array. 
*/ KASSERT(RB_EMPTY(&imf->imf_sources), ("%s: imf_sources not empty", __func__)); for (++idx; idx < imo->imo_num_memberships; ++idx) { imo->imo_membership[idx - 1] = imo->imo_membership[idx]; imo->imo_mfilters[idx - 1] = imo->imo_mfilters[idx]; } imf_init(&imo->imo_mfilters[idx - 1], MCAST_UNDEFINED, MCAST_EXCLUDE); imo->imo_num_memberships--; } out_inp_locked: INP_WUNLOCK(inp); return (error); } /* * Select the interface for transmitting IPv4 multicast datagrams. * * Either an instance of struct in_addr or an instance of struct ip_mreqn * may be passed to this socket option. An address of INADDR_ANY or an * interface index of 0 is used to remove a previous selection. * When no interface is selected, one is chosen for every send. */ static int inp_set_multicast_if(struct inpcb *inp, struct sockopt *sopt) { struct rm_priotracker in_ifa_tracker; struct in_addr addr; struct ip_mreqn mreqn; struct ifnet *ifp; struct ip_moptions *imo; int error; if (sopt->sopt_valsize == sizeof(struct ip_mreqn)) { /* * An interface index was specified using the * Linux-derived ip_mreqn structure. */ error = sooptcopyin(sopt, &mreqn, sizeof(struct ip_mreqn), sizeof(struct ip_mreqn)); if (error) return (error); if (mreqn.imr_ifindex < 0 || V_if_index < mreqn.imr_ifindex) return (EINVAL); if (mreqn.imr_ifindex == 0) { ifp = NULL; } else { ifp = ifnet_byindex(mreqn.imr_ifindex); if (ifp == NULL) return (EADDRNOTAVAIL); } } else { /* * An interface was specified by IPv4 address. * This is the traditional BSD usage. */ error = sooptcopyin(sopt, &addr, sizeof(struct in_addr), sizeof(struct in_addr)); if (error) return (error); if (in_nullhost(addr)) { ifp = NULL; } else { IN_IFADDR_RLOCK(&in_ifa_tracker); INADDR_TO_IFP(addr, ifp); IN_IFADDR_RUNLOCK(&in_ifa_tracker); if (ifp == NULL) return (EADDRNOTAVAIL); } CTR3(KTR_IGMPV3, "%s: ifp = %p, addr = 0x%08x", __func__, ifp, ntohl(addr.s_addr)); } /* Reject interfaces which do not support multicast. */ if (ifp != NULL && (ifp->if_flags & IFF_MULTICAST) == 0) return (EOPNOTSUPP); imo = inp_findmoptions(inp); imo->imo_multicast_ifp = ifp; imo->imo_multicast_addr.s_addr = INADDR_ANY; INP_WUNLOCK(inp); return (0); } /* * Atomically set source filters on a socket for an IPv4 multicast group. * * SMPng: NOTE: Potentially calls malloc(M_WAITOK) with Giant held. */ static int inp_set_source_filters(struct inpcb *inp, struct sockopt *sopt) { struct __msfilterreq msfr; sockunion_t *gsa; struct ifnet *ifp; struct in_mfilter *imf; struct ip_moptions *imo; struct in_multi *inm; size_t idx; int error; error = sooptcopyin(sopt, &msfr, sizeof(struct __msfilterreq), sizeof(struct __msfilterreq)); if (error) return (error); if (msfr.msfr_nsrcs > in_mcast_maxsocksrc) return (ENOBUFS); if ((msfr.msfr_fmode != MCAST_EXCLUDE && msfr.msfr_fmode != MCAST_INCLUDE)) return (EINVAL); if (msfr.msfr_group.ss_family != AF_INET || msfr.msfr_group.ss_len != sizeof(struct sockaddr_in)) return (EINVAL); gsa = (sockunion_t *)&msfr.msfr_group; if (!IN_MULTICAST(ntohl(gsa->sin.sin_addr.s_addr))) return (EINVAL); gsa->sin.sin_port = 0; /* ignore port */ if (msfr.msfr_ifindex == 0 || V_if_index < msfr.msfr_ifindex) return (EADDRNOTAVAIL); ifp = ifnet_byindex(msfr.msfr_ifindex); if (ifp == NULL) return (EADDRNOTAVAIL); /* * Take the INP write lock. * Check if this socket is a member of this group. 
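A sketch of the userland side of this full-state path, using the setsourcefilter(3) wrapper around IP_MSFILTER. It assumes the socket has already joined the group on the named interface (otherwise the handler returns EADDRNOTAVAIL); the interface name and blocked addresses are placeholders.

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <net/if.h>
#include <string.h>

/* Install an exclusive-mode filter that blocks two senders at once. */
static int
exclude_two_sources(int sd, const char *ifname)
{
        const char *blocked[2] = { "192.0.2.7", "192.0.2.8" };
        struct sockaddr_storage slist[2];
        struct sockaddr_in grp, *sin;
        int i;

        memset(&grp, 0, sizeof(grp));
        grp.sin_family = AF_INET;
        grp.sin_len = sizeof(grp);
        inet_pton(AF_INET, "239.1.1.1", &grp.sin_addr);

        memset(slist, 0, sizeof(slist));
        for (i = 0; i < 2; i++) {
                sin = (struct sockaddr_in *)&slist[i];
                sin->sin_family = AF_INET;
                sin->sin_len = sizeof(*sin);
                inet_pton(AF_INET, blocked[i], &sin->sin_addr);
        }
        /* The entire filter is replaced in one transaction, as above. */
        return (setsourcefilter(sd, if_nametoindex(ifname),
            (struct sockaddr *)&grp, sizeof(grp), MCAST_EXCLUDE, 2, slist));
}

Passing MCAST_INCLUDE instead installs an inclusive filter listing the only sources to accept.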
*/ imo = inp_findmoptions(inp); idx = imo_match_group(imo, ifp, &gsa->sa); if (idx == -1 || imo->imo_mfilters == NULL) { error = EADDRNOTAVAIL; goto out_inp_locked; } inm = imo->imo_membership[idx]; imf = &imo->imo_mfilters[idx]; /* * Begin state merge transaction at socket layer. */ INP_WLOCK_ASSERT(inp); imf->imf_st[1] = msfr.msfr_fmode; /* * Apply any new source filters, if present. * Make a copy of the user-space source vector so * that we may copy them with a single copyin. This * allows us to deal with page faults up-front. */ if (msfr.msfr_nsrcs > 0) { struct in_msource *lims; struct sockaddr_in *psin; struct sockaddr_storage *kss, *pkss; int i; INP_WUNLOCK(inp); CTR2(KTR_IGMPV3, "%s: loading %lu source list entries", __func__, (unsigned long)msfr.msfr_nsrcs); kss = malloc(sizeof(struct sockaddr_storage) * msfr.msfr_nsrcs, M_TEMP, M_WAITOK); error = copyin(msfr.msfr_srcs, kss, sizeof(struct sockaddr_storage) * msfr.msfr_nsrcs); if (error) { free(kss, M_TEMP); return (error); } INP_WLOCK(inp); /* * Mark all source filters as UNDEFINED at t1. * Restore new group filter mode, as imf_leave() * will set it to INCLUDE. */ imf_leave(imf); imf->imf_st[1] = msfr.msfr_fmode; /* * Update socket layer filters at t1, lazy-allocating * new entries. This saves a bunch of memory at the * cost of one RB_FIND() per source entry; duplicate * entries in the msfr_nsrcs vector are ignored. * If we encounter an error, rollback transaction. * * XXX This too could be replaced with a set-symmetric * difference like loop to avoid walking from root * every time, as the key space is common. */ for (i = 0, pkss = kss; i < msfr.msfr_nsrcs; i++, pkss++) { psin = (struct sockaddr_in *)pkss; if (psin->sin_family != AF_INET) { error = EAFNOSUPPORT; break; } if (psin->sin_len != sizeof(struct sockaddr_in)) { error = EINVAL; break; } error = imf_get_source(imf, psin, &lims); if (error) break; lims->imsl_st[1] = imf->imf_st[1]; } free(kss, M_TEMP); } if (error) goto out_imf_rollback; INP_WLOCK_ASSERT(inp); IN_MULTI_LOCK(); /* * Begin state merge transaction at IGMP layer. */ CTR1(KTR_IGMPV3, "%s: merge inm state", __func__); IN_MULTI_LIST_LOCK(); error = inm_merge(inm, imf); if (error) { CTR1(KTR_IGMPV3, "%s: failed to merge inm state", __func__); IN_MULTI_LIST_UNLOCK(); goto out_in_multi_locked; } CTR1(KTR_IGMPV3, "%s: doing igmp downcall", __func__); error = igmp_change_state(inm); IN_MULTI_LIST_UNLOCK(); if (error) CTR1(KTR_IGMPV3, "%s: failed igmp downcall", __func__); out_in_multi_locked: IN_MULTI_UNLOCK(); out_imf_rollback: if (error) imf_rollback(imf); else imf_commit(imf); imf_reap(imf); out_inp_locked: INP_WUNLOCK(inp); return (error); } /* * Set the IP multicast options in response to user setsockopt(). * * Many of the socket options handled in this function duplicate the * functionality of socket options in the regular unicast API. However, * it is not possible to merge the duplicate code, because the idempotence * of the IPv4 multicast part of the BSD Sockets API must be preserved; * the effects of these options must be treated as separate and distinct. * * SMPng: XXX: Unlocked read of inp_socket believed OK. * FUTURE: The IP_MULTICAST_VIF option may be eliminated if MROUTING * is refactored to no longer use vifs. */ int inp_setmoptions(struct inpcb *inp, struct sockopt *sopt) { struct ip_moptions *imo; int error; error = 0; /* * If socket is neither of type SOCK_RAW or SOCK_DGRAM, * or is a divert socket, reject it. 
*/ if (inp->inp_socket->so_proto->pr_protocol == IPPROTO_DIVERT || (inp->inp_socket->so_proto->pr_type != SOCK_RAW && inp->inp_socket->so_proto->pr_type != SOCK_DGRAM)) return (EOPNOTSUPP); switch (sopt->sopt_name) { case IP_MULTICAST_VIF: { int vifi; /* * Select a multicast VIF for transmission. * Only useful if multicast forwarding is active. */ if (legal_vif_num == NULL) { error = EOPNOTSUPP; break; } error = sooptcopyin(sopt, &vifi, sizeof(int), sizeof(int)); if (error) break; if (!legal_vif_num(vifi) && (vifi != -1)) { error = EINVAL; break; } imo = inp_findmoptions(inp); imo->imo_multicast_vif = vifi; INP_WUNLOCK(inp); break; } case IP_MULTICAST_IF: error = inp_set_multicast_if(inp, sopt); break; case IP_MULTICAST_TTL: { u_char ttl; /* * Set the IP time-to-live for outgoing multicast packets. * The original multicast API required a char argument, * which is inconsistent with the rest of the socket API. * We allow either a char or an int. */ if (sopt->sopt_valsize == sizeof(u_char)) { error = sooptcopyin(sopt, &ttl, sizeof(u_char), sizeof(u_char)); if (error) break; } else { u_int ittl; error = sooptcopyin(sopt, &ittl, sizeof(u_int), sizeof(u_int)); if (error) break; if (ittl > 255) { error = EINVAL; break; } ttl = (u_char)ittl; } imo = inp_findmoptions(inp); imo->imo_multicast_ttl = ttl; INP_WUNLOCK(inp); break; } case IP_MULTICAST_LOOP: { u_char loop; /* * Set the loopback flag for outgoing multicast packets. * Must be zero or one. The original multicast API required a * char argument, which is inconsistent with the rest * of the socket API. We allow either a char or an int. */ if (sopt->sopt_valsize == sizeof(u_char)) { error = sooptcopyin(sopt, &loop, sizeof(u_char), sizeof(u_char)); if (error) break; } else { u_int iloop; error = sooptcopyin(sopt, &iloop, sizeof(u_int), sizeof(u_int)); if (error) break; loop = (u_char)iloop; } imo = inp_findmoptions(inp); imo->imo_multicast_loop = !!loop; INP_WUNLOCK(inp); break; } case IP_ADD_MEMBERSHIP: case IP_ADD_SOURCE_MEMBERSHIP: case MCAST_JOIN_GROUP: case MCAST_JOIN_SOURCE_GROUP: error = inp_join_group(inp, sopt); break; case IP_DROP_MEMBERSHIP: case IP_DROP_SOURCE_MEMBERSHIP: case MCAST_LEAVE_GROUP: case MCAST_LEAVE_SOURCE_GROUP: error = inp_leave_group(inp, sopt); break; case IP_BLOCK_SOURCE: case IP_UNBLOCK_SOURCE: case MCAST_BLOCK_SOURCE: case MCAST_UNBLOCK_SOURCE: error = inp_block_unblock_source(inp, sopt); break; case IP_MSFILTER: error = inp_set_source_filters(inp, sopt); break; default: error = EOPNOTSUPP; break; } INP_UNLOCK_ASSERT(inp); return (error); } /* * Expose IGMP's multicast filter mode and source list(s) to userland, * keyed by (ifindex, group). * The filter mode is written out as a uint32_t, followed by * 0..n of struct in_addr. * For use by ifmcstat(8). * SMPng: NOTE: unlocked read of ifindex space. 
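For completeness, the non-membership options handled by this setsockopt path are typically used from userland as below; the egress address and TTL chosen are placeholders.

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/*
 * Typical sender-side setup: pick an egress interface by address, set a
 * TTL of 32, and disable loopback of our own datagrams.
 */
static int
setup_sender(int sd, const char *ifaddr)
{
        struct in_addr ifa;
        u_char ttl = 32;
        u_char loop = 0;

        if (inet_pton(AF_INET, ifaddr, &ifa) != 1)
                return (-1);
        if (setsockopt(sd, IPPROTO_IP, IP_MULTICAST_IF, &ifa,
            sizeof(ifa)) == -1)
                return (-1);
        /* Either a u_char or an int is accepted here, as the handler notes. */
        if (setsockopt(sd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl,
            sizeof(ttl)) == -1)
                return (-1);
        return (setsockopt(sd, IPPROTO_IP, IP_MULTICAST_LOOP, &loop,
            sizeof(loop)));
}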
*/ static int sysctl_ip_mcast_filters(SYSCTL_HANDLER_ARGS) { struct in_addr src, group; struct epoch_tracker et; struct ifnet *ifp; struct ifmultiaddr *ifma; struct in_multi *inm; struct ip_msource *ims; int *name; int retval; u_int namelen; uint32_t fmode, ifindex; name = (int *)arg1; namelen = arg2; if (req->newptr != NULL) return (EPERM); if (namelen != 2) return (EINVAL); ifindex = name[0]; if (ifindex <= 0 || ifindex > V_if_index) { CTR2(KTR_IGMPV3, "%s: ifindex %u out of range", __func__, ifindex); return (ENOENT); } group.s_addr = name[1]; if (!IN_MULTICAST(ntohl(group.s_addr))) { CTR2(KTR_IGMPV3, "%s: group 0x%08x is not multicast", __func__, ntohl(group.s_addr)); return (EINVAL); } ifp = ifnet_byindex(ifindex); if (ifp == NULL) { CTR2(KTR_IGMPV3, "%s: no ifp for ifindex %u", __func__, ifindex); return (ENOENT); } retval = sysctl_wire_old_buffer(req, sizeof(uint32_t) + (in_mcast_maxgrpsrc * sizeof(struct in_addr))); if (retval) return (retval); IN_MULTI_LIST_LOCK(); NET_EPOCH_ENTER(et); CK_STAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { if (ifma->ifma_addr->sa_family != AF_INET || ifma->ifma_protospec == NULL) continue; inm = (struct in_multi *)ifma->ifma_protospec; if (!in_hosteq(inm->inm_addr, group)) continue; fmode = inm->inm_st[1].iss_fmode; retval = SYSCTL_OUT(req, &fmode, sizeof(uint32_t)); if (retval != 0) break; RB_FOREACH(ims, ip_msource_tree, &inm->inm_srcs) { CTR2(KTR_IGMPV3, "%s: visit node 0x%08x", __func__, ims->ims_haddr); /* * Only copy-out sources which are in-mode. */ if (fmode != ims_get_mode(inm, ims, 1)) { CTR1(KTR_IGMPV3, "%s: skip non-in-mode", __func__); continue; } src.s_addr = htonl(ims->ims_haddr); retval = SYSCTL_OUT(req, &src, sizeof(struct in_addr)); if (retval != 0) break; } } NET_EPOCH_EXIT(et); IN_MULTI_LIST_UNLOCK(); return (retval); } #if defined(KTR) && (KTR_COMPILE & KTR_IGMPV3) -static const char *inm_modestrs[] = { "un", "in", "ex" }; +static const char *inm_modestrs[] = { + [MCAST_UNDEFINED] = "un", + [MCAST_INCLUDE] = "in", + [MCAST_EXCLUDE] = "ex", +}; +_Static_assert(MCAST_UNDEFINED == 0 && + MCAST_EXCLUDE + 1 == nitems(inm_modestrs), + "inm_modestrs: no longer matches #defines"); static const char * inm_mode_str(const int mode) { if (mode >= MCAST_UNDEFINED && mode <= MCAST_EXCLUDE) return (inm_modestrs[mode]); return ("??"); } static const char *inm_statestrs[] = { - "not-member", - "silent", - "idle", - "lazy", - "sleeping", - "awakening", - "query-pending", - "sg-query-pending", - "leaving" + [IGMP_NOT_MEMBER] = "not-member", + [IGMP_SILENT_MEMBER] = "silent", + [IGMP_REPORTING_MEMBER] = "reporting", + [IGMP_IDLE_MEMBER] = "idle", + [IGMP_LAZY_MEMBER] = "lazy", + [IGMP_SLEEPING_MEMBER] = "sleeping", + [IGMP_AWAKENING_MEMBER] = "awakening", + [IGMP_G_QUERY_PENDING_MEMBER] = "query-pending", + [IGMP_SG_QUERY_PENDING_MEMBER] = "sg-query-pending", + [IGMP_LEAVING_MEMBER] = "leaving", }; +_Static_assert(IGMP_NOT_MEMBER == 0 && + IGMP_LEAVING_MEMBER + 1 == nitems(inm_statestrs), + "inm_statetrs: no longer matches #defines"); static const char * inm_state_str(const int state) { if (state >= IGMP_NOT_MEMBER && state <= IGMP_LEAVING_MEMBER) return (inm_statestrs[state]); return ("??"); } /* * Dump an in_multi structure to the console. 
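The hunk above replaces positional string tables with designated initializers plus compile-time size checks. As a small self-contained illustration of that idiom (names here are invented for the example; nitems() comes from sys/param.h on FreeBSD):

#include <sys/param.h>          /* nitems() */

enum demo_state { DEMO_IDLE = 0, DEMO_RUNNING, DEMO_DONE };

static const char *demo_statestrs[] = {
        [DEMO_IDLE]    = "idle",
        [DEMO_RUNNING] = "running",
        [DEMO_DONE]    = "done",
};
/* Catches the table falling out of step with the last enum value. */
_Static_assert(DEMO_DONE + 1 == nitems(demo_statestrs),
    "demo_statestrs: no longer matches enum demo_state");

The designated form keeps each string tied to its constant even if the numbering of the constants changes, which is exactly the hazard the old positional arrays had.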
*/ void inm_print(const struct in_multi *inm) { int t; char addrbuf[INET_ADDRSTRLEN]; if ((ktr_mask & KTR_IGMPV3) == 0) return; printf("%s: --- begin inm %p ---\n", __func__, inm); printf("addr %s ifp %p(%s) ifma %p\n", inet_ntoa_r(inm->inm_addr, addrbuf), inm->inm_ifp, inm->inm_ifp->if_xname, inm->inm_ifma); printf("timer %u state %s refcount %u scq.len %u\n", inm->inm_timer, inm_state_str(inm->inm_state), inm->inm_refcount, inm->inm_scq.mq_len); printf("igi %p nsrc %lu sctimer %u scrv %u\n", inm->inm_igi, inm->inm_nsrc, inm->inm_sctimer, inm->inm_scrv); for (t = 0; t < 2; t++) { printf("t%d: fmode %s asm %u ex %u in %u rec %u\n", t, inm_mode_str(inm->inm_st[t].iss_fmode), inm->inm_st[t].iss_asm, inm->inm_st[t].iss_ex, inm->inm_st[t].iss_in, inm->inm_st[t].iss_rec); } printf("%s: --- end inm %p ---\n", __func__, inm); } #else /* !KTR || !(KTR_COMPILE & KTR_IGMPV3) */ void inm_print(const struct in_multi *inm) { } #endif /* KTR && (KTR_COMPILE & KTR_IGMPV3) */ RB_GENERATE(ip_msource_tree, ip_msource, ims_link, ip_msource_cmp); Index: user/ngie/bug-237403/sys/netinet/in_pcb.c =================================================================== --- user/ngie/bug-237403/sys/netinet/in_pcb.c (revision 346925) +++ user/ngie/bug-237403/sys/netinet/in_pcb.c (revision 346926) @@ -1,3431 +1,3434 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1982, 1986, 1991, 1993, 1995 * The Regents of the University of California. * Copyright (c) 2007-2009 Robert N. M. Watson * Copyright (c) 2010-2011 Juniper Networks, Inc. * All rights reserved. * * Portions of this software were developed by Robert N. M. Watson under * contract to Juniper Networks, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * @(#)in_pcb.c 8.4 (Berkeley) 5/24/95 */ #include __FBSDID("$FreeBSD$"); #include "opt_ddb.h" #include "opt_ipsec.h" #include "opt_inet.h" #include "opt_inet6.h" #include "opt_ratelimit.h" #include "opt_pcbgroup.h" #include "opt_rss.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef DDB #include #endif #include #include #include #include #include #include #include #include #if defined(INET) || defined(INET6) #include #include #include #include #ifdef TCPHPTS #include #endif #include #include #endif #ifdef INET #include #endif #ifdef INET6 #include #include #include #include #endif /* INET6 */ #include #include #define INPCBLBGROUP_SIZMIN 8 #define INPCBLBGROUP_SIZMAX 256 static struct callout ipport_tick_callout; /* * These configure the range of local port addresses assigned to * "unspecified" outgoing connections/packets/whatever. */ VNET_DEFINE(int, ipport_lowfirstauto) = IPPORT_RESERVED - 1; /* 1023 */ VNET_DEFINE(int, ipport_lowlastauto) = IPPORT_RESERVEDSTART; /* 600 */ VNET_DEFINE(int, ipport_firstauto) = IPPORT_EPHEMERALFIRST; /* 10000 */ VNET_DEFINE(int, ipport_lastauto) = IPPORT_EPHEMERALLAST; /* 65535 */ VNET_DEFINE(int, ipport_hifirstauto) = IPPORT_HIFIRSTAUTO; /* 49152 */ VNET_DEFINE(int, ipport_hilastauto) = IPPORT_HILASTAUTO; /* 65535 */ /* * Reserved ports accessible only to root. There are significant * security considerations that must be accounted for when changing these, * but the security benefits can be great. Please be careful. */ VNET_DEFINE(int, ipport_reservedhigh) = IPPORT_RESERVED - 1; /* 1023 */ VNET_DEFINE(int, ipport_reservedlow); /* Variables dealing with random ephemeral port allocation. 
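 *
 * These are all virtualized per-VNET and, apart from the internal state
 * (stoprandom, tcpallocs, tcplastcount), are exported as sysctls under
 * net.inet.ip.portrange by the declarations further down:
 *
 *	net.inet.ip.portrange.randomized	enable random allocation
 *	net.inet.ip.portrange.randomcps		random allocations allowed
 *						before temporarily falling
 *						back to sequential allocation
 *	net.inet.ip.portrange.randomtime	seconds to stay sequential
 *						before randomizing again
 *
 * ipport_tick() uses randomcps and randomtime together with the
 * internal counters to toggle ipport_stoprandom.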
*/ VNET_DEFINE(int, ipport_randomized) = 1; /* user controlled via sysctl */ VNET_DEFINE(int, ipport_randomcps) = 10; /* user controlled via sysctl */ VNET_DEFINE(int, ipport_randomtime) = 45; /* user controlled via sysctl */ VNET_DEFINE(int, ipport_stoprandom); /* toggled by ipport_tick */ VNET_DEFINE(int, ipport_tcpallocs); VNET_DEFINE_STATIC(int, ipport_tcplastcount); #define V_ipport_tcplastcount VNET(ipport_tcplastcount) static void in_pcbremlists(struct inpcb *inp); #ifdef INET static struct inpcb *in_pcblookup_hash_locked(struct inpcbinfo *pcbinfo, struct in_addr faddr, u_int fport_arg, struct in_addr laddr, u_int lport_arg, int lookupflags, struct ifnet *ifp); #define RANGECHK(var, min, max) \ if ((var) < (min)) { (var) = (min); } \ else if ((var) > (max)) { (var) = (max); } static int sysctl_net_ipport_check(SYSCTL_HANDLER_ARGS) { int error; error = sysctl_handle_int(oidp, arg1, arg2, req); if (error == 0) { RANGECHK(V_ipport_lowfirstauto, 1, IPPORT_RESERVED - 1); RANGECHK(V_ipport_lowlastauto, 1, IPPORT_RESERVED - 1); RANGECHK(V_ipport_firstauto, IPPORT_RESERVED, IPPORT_MAX); RANGECHK(V_ipport_lastauto, IPPORT_RESERVED, IPPORT_MAX); RANGECHK(V_ipport_hifirstauto, IPPORT_RESERVED, IPPORT_MAX); RANGECHK(V_ipport_hilastauto, IPPORT_RESERVED, IPPORT_MAX); } return (error); } #undef RANGECHK static SYSCTL_NODE(_net_inet_ip, IPPROTO_IP, portrange, CTLFLAG_RW, 0, "IP Ports"); SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, lowfirst, CTLFLAG_VNET | CTLTYPE_INT | CTLFLAG_RW, &VNET_NAME(ipport_lowfirstauto), 0, &sysctl_net_ipport_check, "I", ""); SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, lowlast, CTLFLAG_VNET | CTLTYPE_INT | CTLFLAG_RW, &VNET_NAME(ipport_lowlastauto), 0, &sysctl_net_ipport_check, "I", ""); SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, first, CTLFLAG_VNET | CTLTYPE_INT | CTLFLAG_RW, &VNET_NAME(ipport_firstauto), 0, &sysctl_net_ipport_check, "I", ""); SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, last, CTLFLAG_VNET | CTLTYPE_INT | CTLFLAG_RW, &VNET_NAME(ipport_lastauto), 0, &sysctl_net_ipport_check, "I", ""); SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, hifirst, CTLFLAG_VNET | CTLTYPE_INT | CTLFLAG_RW, &VNET_NAME(ipport_hifirstauto), 0, &sysctl_net_ipport_check, "I", ""); SYSCTL_PROC(_net_inet_ip_portrange, OID_AUTO, hilast, CTLFLAG_VNET | CTLTYPE_INT | CTLFLAG_RW, &VNET_NAME(ipport_hilastauto), 0, &sysctl_net_ipport_check, "I", ""); SYSCTL_INT(_net_inet_ip_portrange, OID_AUTO, reservedhigh, CTLFLAG_VNET | CTLFLAG_RW | CTLFLAG_SECURE, &VNET_NAME(ipport_reservedhigh), 0, ""); SYSCTL_INT(_net_inet_ip_portrange, OID_AUTO, reservedlow, CTLFLAG_RW|CTLFLAG_SECURE, &VNET_NAME(ipport_reservedlow), 0, ""); SYSCTL_INT(_net_inet_ip_portrange, OID_AUTO, randomized, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(ipport_randomized), 0, "Enable random port allocation"); SYSCTL_INT(_net_inet_ip_portrange, OID_AUTO, randomcps, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(ipport_randomcps), 0, "Maximum number of random port " "allocations before switching to a sequental one"); SYSCTL_INT(_net_inet_ip_portrange, OID_AUTO, randomtime, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(ipport_randomtime), 0, "Minimum time to keep sequental port " "allocation before switching to a random one"); #endif /* INET */ /* * in_pcb.c: manage the Protocol Control Blocks. * * NOTE: It is assumed that most of these functions will be called with * the pcbinfo lock held, and often, the inpcb lock held, as these utility * functions often modify hash chains or addresses in pcbs. 
*/ static struct inpcblbgroup * in_pcblbgroup_alloc(struct inpcblbgrouphead *hdr, u_char vflag, uint16_t port, const union in_dependaddr *addr, int size) { struct inpcblbgroup *grp; size_t bytes; bytes = __offsetof(struct inpcblbgroup, il_inp[size]); grp = malloc(bytes, M_PCB, M_ZERO | M_NOWAIT); if (!grp) return (NULL); grp->il_vflag = vflag; grp->il_lport = port; grp->il_dependladdr = *addr; grp->il_inpsiz = size; CK_LIST_INSERT_HEAD(hdr, grp, il_list); return (grp); } static void in_pcblbgroup_free_deferred(epoch_context_t ctx) { struct inpcblbgroup *grp; grp = __containerof(ctx, struct inpcblbgroup, il_epoch_ctx); free(grp, M_PCB); } static void in_pcblbgroup_free(struct inpcblbgroup *grp) { CK_LIST_REMOVE(grp, il_list); epoch_call(net_epoch_preempt, &grp->il_epoch_ctx, in_pcblbgroup_free_deferred); } static struct inpcblbgroup * in_pcblbgroup_resize(struct inpcblbgrouphead *hdr, struct inpcblbgroup *old_grp, int size) { struct inpcblbgroup *grp; int i; grp = in_pcblbgroup_alloc(hdr, old_grp->il_vflag, old_grp->il_lport, &old_grp->il_dependladdr, size); if (grp == NULL) return (NULL); KASSERT(old_grp->il_inpcnt < grp->il_inpsiz, ("invalid new local group size %d and old local group count %d", grp->il_inpsiz, old_grp->il_inpcnt)); for (i = 0; i < old_grp->il_inpcnt; ++i) grp->il_inp[i] = old_grp->il_inp[i]; grp->il_inpcnt = old_grp->il_inpcnt; in_pcblbgroup_free(old_grp); return (grp); } /* * PCB at index 'i' is removed from the group. Pull up the ones below il_inp[i] * and shrink group if possible. */ static void in_pcblbgroup_reorder(struct inpcblbgrouphead *hdr, struct inpcblbgroup **grpp, int i) { struct inpcblbgroup *grp, *new_grp; grp = *grpp; for (; i + 1 < grp->il_inpcnt; ++i) grp->il_inp[i] = grp->il_inp[i + 1]; grp->il_inpcnt--; if (grp->il_inpsiz > INPCBLBGROUP_SIZMIN && grp->il_inpcnt <= grp->il_inpsiz / 4) { /* Shrink this group. */ new_grp = in_pcblbgroup_resize(hdr, grp, grp->il_inpsiz / 2); if (new_grp != NULL) *grpp = new_grp; } } /* * Add PCB to load balance group for SO_REUSEPORT_LB option. */ static int in_pcbinslbgrouphash(struct inpcb *inp) { const static struct timeval interval = { 60, 0 }; static struct timeval lastprint; struct inpcbinfo *pcbinfo; struct inpcblbgrouphead *hdr; struct inpcblbgroup *grp; uint32_t idx; pcbinfo = inp->inp_pcbinfo; INP_WLOCK_ASSERT(inp); INP_HASH_WLOCK_ASSERT(pcbinfo); /* * Don't allow jailed socket to join local group. */ if (inp->inp_socket != NULL && jailed(inp->inp_socket->so_cred)) return (0); #ifdef INET6 /* * Don't allow IPv4 mapped INET6 wild socket. */ if ((inp->inp_vflag & INP_IPV4) && inp->inp_laddr.s_addr == INADDR_ANY && INP_CHECK_SOCKAF(inp->inp_socket, AF_INET6)) { return (0); } #endif idx = INP_PCBPORTHASH(inp->inp_lport, pcbinfo->ipi_lbgrouphashmask); hdr = &pcbinfo->ipi_lbgrouphashbase[idx]; CK_LIST_FOREACH(grp, hdr, il_list) { if (grp->il_vflag == inp->inp_vflag && grp->il_lport == inp->inp_lport && memcmp(&grp->il_dependladdr, &inp->inp_inc.inc_ie.ie_dependladdr, sizeof(grp->il_dependladdr)) == 0) break; } if (grp == NULL) { /* Create new load balance group. */ grp = in_pcblbgroup_alloc(hdr, inp->inp_vflag, inp->inp_lport, &inp->inp_inc.inc_ie.ie_dependladdr, INPCBLBGROUP_SIZMIN); if (grp == NULL) return (ENOBUFS); } else if (grp->il_inpcnt == grp->il_inpsiz) { if (grp->il_inpsiz >= INPCBLBGROUP_SIZMAX) { if (ratecheck(&lastprint, &interval)) printf("lb group port %d, limit reached\n", ntohs(grp->il_lport)); return (0); } /* Expand this local group. 
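 *
 * Groups start at INPCBLBGROUP_SIZMIN (8) slots and double each time
 * they fill, so the sizes run 8, 16, 32, ... up to INPCBLBGROUP_SIZMAX
 * (256).  Once the maximum is reached, additional SO_REUSEPORT_LB
 * sockets still bind successfully (0 is returned above) but are not
 * entered into the group, and the rate-limited printf warns about it.
 * The inverse operation, halving the group once il_inpcnt falls to a
 * quarter of il_inpsiz, is handled by in_pcblbgroup_reorder().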
*/ grp = in_pcblbgroup_resize(hdr, grp, grp->il_inpsiz * 2); if (grp == NULL) return (ENOBUFS); } KASSERT(grp->il_inpcnt < grp->il_inpsiz, ("invalid local group size %d and count %d", grp->il_inpsiz, grp->il_inpcnt)); grp->il_inp[grp->il_inpcnt] = inp; grp->il_inpcnt++; return (0); } /* * Remove PCB from load balance group. */ static void in_pcbremlbgrouphash(struct inpcb *inp) { struct inpcbinfo *pcbinfo; struct inpcblbgrouphead *hdr; struct inpcblbgroup *grp; int i; pcbinfo = inp->inp_pcbinfo; INP_WLOCK_ASSERT(inp); INP_HASH_WLOCK_ASSERT(pcbinfo); hdr = &pcbinfo->ipi_lbgrouphashbase[ INP_PCBPORTHASH(inp->inp_lport, pcbinfo->ipi_lbgrouphashmask)]; CK_LIST_FOREACH(grp, hdr, il_list) { for (i = 0; i < grp->il_inpcnt; ++i) { if (grp->il_inp[i] != inp) continue; if (grp->il_inpcnt == 1) { /* We are the last, free this local group. */ in_pcblbgroup_free(grp); } else { /* Pull up inpcbs, shrink group if possible. */ in_pcblbgroup_reorder(hdr, &grp, i); } return; } } } /* * Different protocols initialize their inpcbs differently - giving * different name to the lock. But they all are disposed the same. */ static void inpcb_fini(void *mem, int size) { struct inpcb *inp = mem; INP_LOCK_DESTROY(inp); } /* * Initialize an inpcbinfo -- we should be able to reduce the number of * arguments in time. */ void in_pcbinfo_init(struct inpcbinfo *pcbinfo, const char *name, struct inpcbhead *listhead, int hash_nelements, int porthash_nelements, char *inpcbzone_name, uma_init inpcbzone_init, u_int hashfields) { porthash_nelements = imin(porthash_nelements, IPPORT_MAX + 1); INP_INFO_LOCK_INIT(pcbinfo, name); INP_HASH_LOCK_INIT(pcbinfo, "pcbinfohash"); /* XXXRW: argument? */ INP_LIST_LOCK_INIT(pcbinfo, "pcbinfolist"); #ifdef VIMAGE pcbinfo->ipi_vnet = curvnet; #endif pcbinfo->ipi_listhead = listhead; CK_LIST_INIT(pcbinfo->ipi_listhead); pcbinfo->ipi_count = 0; pcbinfo->ipi_hashbase = hashinit(hash_nelements, M_PCB, &pcbinfo->ipi_hashmask); pcbinfo->ipi_porthashbase = hashinit(porthash_nelements, M_PCB, &pcbinfo->ipi_porthashmask); pcbinfo->ipi_lbgrouphashbase = hashinit(porthash_nelements, M_PCB, &pcbinfo->ipi_lbgrouphashmask); #ifdef PCBGROUP in_pcbgroup_init(pcbinfo, hashfields, hash_nelements); #endif pcbinfo->ipi_zone = uma_zcreate(inpcbzone_name, sizeof(struct inpcb), NULL, NULL, inpcbzone_init, inpcb_fini, UMA_ALIGN_PTR, 0); uma_zone_set_max(pcbinfo->ipi_zone, maxsockets); uma_zone_set_warning(pcbinfo->ipi_zone, "kern.ipc.maxsockets limit reached"); } /* * Destroy an inpcbinfo. */ void in_pcbinfo_destroy(struct inpcbinfo *pcbinfo) { KASSERT(pcbinfo->ipi_count == 0, ("%s: ipi_count = %u", __func__, pcbinfo->ipi_count)); hashdestroy(pcbinfo->ipi_hashbase, M_PCB, pcbinfo->ipi_hashmask); hashdestroy(pcbinfo->ipi_porthashbase, M_PCB, pcbinfo->ipi_porthashmask); hashdestroy(pcbinfo->ipi_lbgrouphashbase, M_PCB, pcbinfo->ipi_lbgrouphashmask); #ifdef PCBGROUP in_pcbgroup_destroy(pcbinfo); #endif uma_zdestroy(pcbinfo->ipi_zone); INP_LIST_LOCK_DESTROY(pcbinfo); INP_HASH_LOCK_DESTROY(pcbinfo); INP_INFO_LOCK_DESTROY(pcbinfo); } /* * Allocate a PCB and associate it with the socket. * On success return with the PCB locked. 
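 *
 * A rough sketch of a typical caller, loosely modelled on the datagram
 * protocol attach routines (the exact locking discipline differs per
 * protocol and is not prescribed here):
 *
 *	INP_INFO_WLOCK(pcbinfo);
 *	error = in_pcballoc(so, pcbinfo);
 *	if (error == 0) {
 *		inp = sotoinpcb(so);
 *		... protocol specific initialization ...
 *		INP_WUNLOCK(inp);
 *	}
 *	INP_INFO_WUNLOCK(pcbinfo);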
*/ int in_pcballoc(struct socket *so, struct inpcbinfo *pcbinfo) { struct inpcb *inp; int error; #ifdef INVARIANTS if (pcbinfo == &V_tcbinfo) { INP_INFO_RLOCK_ASSERT(pcbinfo); } else { INP_INFO_WLOCK_ASSERT(pcbinfo); } #endif error = 0; inp = uma_zalloc(pcbinfo->ipi_zone, M_NOWAIT); if (inp == NULL) return (ENOBUFS); bzero(&inp->inp_start_zero, inp_zero_size); +#ifdef NUMA + inp->inp_numa_domain = M_NODOM; +#endif inp->inp_pcbinfo = pcbinfo; inp->inp_socket = so; inp->inp_cred = crhold(so->so_cred); inp->inp_inc.inc_fibnum = so->so_fibnum; #ifdef MAC error = mac_inpcb_init(inp, M_NOWAIT); if (error != 0) goto out; mac_inpcb_create(so, inp); #endif #if defined(IPSEC) || defined(IPSEC_SUPPORT) error = ipsec_init_pcbpolicy(inp); if (error != 0) { #ifdef MAC mac_inpcb_destroy(inp); #endif goto out; } #endif /*IPSEC*/ #ifdef INET6 if (INP_SOCKAF(so) == AF_INET6) { inp->inp_vflag |= INP_IPV6PROTO; if (V_ip6_v6only) inp->inp_flags |= IN6P_IPV6_V6ONLY; } #endif INP_WLOCK(inp); INP_LIST_WLOCK(pcbinfo); CK_LIST_INSERT_HEAD(pcbinfo->ipi_listhead, inp, inp_list); pcbinfo->ipi_count++; so->so_pcb = (caddr_t)inp; #ifdef INET6 if (V_ip6_auto_flowlabel) inp->inp_flags |= IN6P_AUTOFLOWLABEL; #endif inp->inp_gencnt = ++pcbinfo->ipi_gencnt; refcount_init(&inp->inp_refcount, 1); /* Reference from inpcbinfo */ /* * Routes in inpcb's can cache L2 as well; they are guaranteed * to be cleaned up. */ inp->inp_route.ro_flags = RT_LLE_CACHE; INP_LIST_WUNLOCK(pcbinfo); #if defined(IPSEC) || defined(IPSEC_SUPPORT) || defined(MAC) out: if (error != 0) { crfree(inp->inp_cred); uma_zfree(pcbinfo->ipi_zone, inp); } #endif return (error); } #ifdef INET int in_pcbbind(struct inpcb *inp, struct sockaddr *nam, struct ucred *cred) { int anonport, error; INP_WLOCK_ASSERT(inp); INP_HASH_WLOCK_ASSERT(inp->inp_pcbinfo); if (inp->inp_lport != 0 || inp->inp_laddr.s_addr != INADDR_ANY) return (EINVAL); anonport = nam == NULL || ((struct sockaddr_in *)nam)->sin_port == 0; error = in_pcbbind_setup(inp, nam, &inp->inp_laddr.s_addr, &inp->inp_lport, cred); if (error) return (error); if (in_pcbinshash(inp) != 0) { inp->inp_laddr.s_addr = INADDR_ANY; inp->inp_lport = 0; return (EAGAIN); } if (anonport) inp->inp_flags |= INP_ANONPORT; return (0); } #endif /* * Select a local port (number) to use. */ #if defined(INET) || defined(INET6) int in_pcb_lport(struct inpcb *inp, struct in_addr *laddrp, u_short *lportp, struct ucred *cred, int lookupflags) { struct inpcbinfo *pcbinfo; struct inpcb *tmpinp; unsigned short *lastport; int count, dorandom, error; u_short aux, first, last, lport; #ifdef INET struct in_addr laddr; #endif pcbinfo = inp->inp_pcbinfo; /* * Because no actual state changes occur here, a global write lock on * the pcbinfo isn't required. */ INP_LOCK_ASSERT(inp); INP_HASH_LOCK_ASSERT(pcbinfo); if (inp->inp_flags & INP_HIGHPORT) { first = V_ipport_hifirstauto; /* sysctl */ last = V_ipport_hilastauto; lastport = &pcbinfo->ipi_lasthi; } else if (inp->inp_flags & INP_LOWPORT) { error = priv_check_cred(cred, PRIV_NETINET_RESERVEDPORT); if (error) return (error); first = V_ipport_lowfirstauto; /* 1023 */ last = V_ipport_lowlastauto; /* 600 */ lastport = &pcbinfo->ipi_lastlow; } else { first = V_ipport_firstauto; /* sysctl */ last = V_ipport_lastauto; lastport = &pcbinfo->ipi_lastport; } /* * For UDP(-Lite), use random port allocation as long as the user * allows it. For TCP (and as of yet unknown) connections, * use random port allocation only if the user allows it AND * ipport_tick() allows it. 
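 *
 * That is, randomization is used when V_ipport_randomized is set and
 * either it has not been suspended via V_ipport_stoprandom (toggled by
 * ipport_tick()) or the pcbinfo belongs to UDP/UDP-Lite.  When it is
 * used, the search below starts from a random offset,
 *
 *	*lastport = first + (arc4random() % (last - first));
 *
 * and then walks the range sequentially, wrapping back to 'first',
 * until an unused port is found or the whole range has been tried, in
 * which case EADDRNOTAVAIL is returned.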
*/ if (V_ipport_randomized && (!V_ipport_stoprandom || pcbinfo == &V_udbinfo || pcbinfo == &V_ulitecbinfo)) dorandom = 1; else dorandom = 0; /* * It makes no sense to do random port allocation if * we have the only port available. */ if (first == last) dorandom = 0; /* Make sure to not include UDP(-Lite) packets in the count. */ if (pcbinfo != &V_udbinfo || pcbinfo != &V_ulitecbinfo) V_ipport_tcpallocs++; /* * Instead of having two loops further down counting up or down * make sure that first is always <= last and go with only one * code path implementing all logic. */ if (first > last) { aux = first; first = last; last = aux; } #ifdef INET /* Make the compiler happy. */ laddr.s_addr = 0; if ((inp->inp_vflag & (INP_IPV4|INP_IPV6)) == INP_IPV4) { KASSERT(laddrp != NULL, ("%s: laddrp NULL for v4 inp %p", __func__, inp)); laddr = *laddrp; } #endif tmpinp = NULL; /* Make compiler happy. */ lport = *lportp; if (dorandom) *lastport = first + (arc4random() % (last - first)); count = last - first; do { if (count-- < 0) /* completely used? */ return (EADDRNOTAVAIL); ++*lastport; if (*lastport < first || *lastport > last) *lastport = first; lport = htons(*lastport); #ifdef INET6 if ((inp->inp_vflag & INP_IPV6) != 0) tmpinp = in6_pcblookup_local(pcbinfo, &inp->in6p_laddr, lport, lookupflags, cred); #endif #if defined(INET) && defined(INET6) else #endif #ifdef INET tmpinp = in_pcblookup_local(pcbinfo, laddr, lport, lookupflags, cred); #endif } while (tmpinp != NULL); #ifdef INET if ((inp->inp_vflag & (INP_IPV4|INP_IPV6)) == INP_IPV4) laddrp->s_addr = laddr.s_addr; #endif *lportp = lport; return (0); } /* * Return cached socket options. */ int inp_so_options(const struct inpcb *inp) { int so_options; so_options = 0; if ((inp->inp_flags2 & INP_REUSEPORT_LB) != 0) so_options |= SO_REUSEPORT_LB; if ((inp->inp_flags2 & INP_REUSEPORT) != 0) so_options |= SO_REUSEPORT; if ((inp->inp_flags2 & INP_REUSEADDR) != 0) so_options |= SO_REUSEADDR; return (so_options); } #endif /* INET || INET6 */ /* * Check if a new BINDMULTI socket is allowed to be created. * * ni points to the new inp. * oi points to the exisitng inp. * * This checks whether the existing inp also has BINDMULTI and * whether the credentials match. */ int in_pcbbind_check_bindmulti(const struct inpcb *ni, const struct inpcb *oi) { /* Check permissions match */ if ((ni->inp_flags2 & INP_BINDMULTI) && (ni->inp_cred->cr_uid != oi->inp_cred->cr_uid)) return (0); /* Check the existing inp has BINDMULTI set */ if ((ni->inp_flags2 & INP_BINDMULTI) && ((oi->inp_flags2 & INP_BINDMULTI) == 0)) return (0); /* * We're okay - either INP_BINDMULTI isn't set on ni, or * it is and it matches the checks. */ return (1); } #ifdef INET /* * Set up a bind operation on a PCB, performing port allocation * as required, but do not actually modify the PCB. Callers can * either complete the bind by setting inp_laddr/inp_lport and * calling in_pcbinshash(), or they can just use the resulting * port and address to authorise the sending of a once-off packet. * * On error, the values of *laddrp and *lportp are not changed. 
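 *
 * The first pattern is what in_pcbbind() above does, roughly:
 *
 *	error = in_pcbbind_setup(inp, nam, &inp->inp_laddr.s_addr,
 *	    &inp->inp_lport, cred);
 *	if (error == 0 && in_pcbinshash(inp) != 0) {
 *		inp->inp_laddr.s_addr = INADDR_ANY;
 *		inp->inp_lport = 0;
 *		error = EAGAIN;
 *	}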
*/ int in_pcbbind_setup(struct inpcb *inp, struct sockaddr *nam, in_addr_t *laddrp, u_short *lportp, struct ucred *cred) { struct socket *so = inp->inp_socket; struct sockaddr_in *sin; struct inpcbinfo *pcbinfo = inp->inp_pcbinfo; struct in_addr laddr; u_short lport = 0; int lookupflags = 0, reuseport = (so->so_options & SO_REUSEPORT); int error; /* * XXX: Maybe we could let SO_REUSEPORT_LB set SO_REUSEPORT bit here * so that we don't have to add to the (already messy) code below. */ int reuseport_lb = (so->so_options & SO_REUSEPORT_LB); /* * No state changes, so read locks are sufficient here. */ INP_LOCK_ASSERT(inp); INP_HASH_LOCK_ASSERT(pcbinfo); if (CK_STAILQ_EMPTY(&V_in_ifaddrhead)) /* XXX broken! */ return (EADDRNOTAVAIL); laddr.s_addr = *laddrp; if (nam != NULL && laddr.s_addr != INADDR_ANY) return (EINVAL); if ((so->so_options & (SO_REUSEADDR|SO_REUSEPORT|SO_REUSEPORT_LB)) == 0) lookupflags = INPLOOKUP_WILDCARD; if (nam == NULL) { if ((error = prison_local_ip4(cred, &laddr)) != 0) return (error); } else { sin = (struct sockaddr_in *)nam; if (nam->sa_len != sizeof (*sin)) return (EINVAL); #ifdef notdef /* * We should check the family, but old programs * incorrectly fail to initialize it. */ if (sin->sin_family != AF_INET) return (EAFNOSUPPORT); #endif error = prison_local_ip4(cred, &sin->sin_addr); if (error) return (error); if (sin->sin_port != *lportp) { /* Don't allow the port to change. */ if (*lportp != 0) return (EINVAL); lport = sin->sin_port; } /* NB: lport is left as 0 if the port isn't being changed. */ if (IN_MULTICAST(ntohl(sin->sin_addr.s_addr))) { /* * Treat SO_REUSEADDR as SO_REUSEPORT for multicast; * allow complete duplication of binding if * SO_REUSEPORT is set, or if SO_REUSEADDR is set * and a multicast address is bound on both * new and duplicated sockets. */ if ((so->so_options & (SO_REUSEADDR|SO_REUSEPORT)) != 0) reuseport = SO_REUSEADDR|SO_REUSEPORT; /* * XXX: How to deal with SO_REUSEPORT_LB here? * Treat same as SO_REUSEPORT for now. */ if ((so->so_options & (SO_REUSEADDR|SO_REUSEPORT_LB)) != 0) reuseport_lb = SO_REUSEADDR|SO_REUSEPORT_LB; } else if (sin->sin_addr.s_addr != INADDR_ANY) { sin->sin_port = 0; /* yech... */ bzero(&sin->sin_zero, sizeof(sin->sin_zero)); /* * Is the address a local IP address? * If INP_BINDANY is set, then the socket may be bound * to any endpoint address, local or not. */ if ((inp->inp_flags & INP_BINDANY) == 0 && ifa_ifwithaddr_check((struct sockaddr *)sin) == 0) return (EADDRNOTAVAIL); } laddr = sin->sin_addr; if (lport) { struct inpcb *t; struct tcptw *tw; /* GROSS */ if (ntohs(lport) <= V_ipport_reservedhigh && ntohs(lport) >= V_ipport_reservedlow && priv_check_cred(cred, PRIV_NETINET_RESERVEDPORT)) return (EACCES); if (!IN_MULTICAST(ntohl(sin->sin_addr.s_addr)) && priv_check_cred(inp->inp_cred, PRIV_NETINET_REUSEPORT) != 0) { t = in_pcblookup_local(pcbinfo, sin->sin_addr, lport, INPLOOKUP_WILDCARD, cred); /* * XXX * This entire block sorely needs a rewrite. */ if (t && ((inp->inp_flags2 & INP_BINDMULTI) == 0) && ((t->inp_flags & INP_TIMEWAIT) == 0) && (so->so_type != SOCK_STREAM || ntohl(t->inp_faddr.s_addr) == INADDR_ANY) && (ntohl(sin->sin_addr.s_addr) != INADDR_ANY || ntohl(t->inp_laddr.s_addr) != INADDR_ANY || (t->inp_flags2 & INP_REUSEPORT) || (t->inp_flags2 & INP_REUSEPORT_LB) == 0) && (inp->inp_cred->cr_uid != t->inp_cred->cr_uid)) return (EADDRINUSE); /* * If the socket is a BINDMULTI socket, then * the credentials need to match and the * original socket also has to have been bound * with BINDMULTI. */ if (t && (! 
in_pcbbind_check_bindmulti(inp, t))) return (EADDRINUSE); } t = in_pcblookup_local(pcbinfo, sin->sin_addr, lport, lookupflags, cred); if (t && (t->inp_flags & INP_TIMEWAIT)) { /* * XXXRW: If an incpb has had its timewait * state recycled, we treat the address as * being in use (for now). This is better * than a panic, but not desirable. */ tw = intotw(t); if (tw == NULL || ((reuseport & tw->tw_so_options) == 0 && (reuseport_lb & tw->tw_so_options) == 0)) { return (EADDRINUSE); } } else if (t && ((inp->inp_flags2 & INP_BINDMULTI) == 0) && (reuseport & inp_so_options(t)) == 0 && (reuseport_lb & inp_so_options(t)) == 0) { #ifdef INET6 if (ntohl(sin->sin_addr.s_addr) != INADDR_ANY || ntohl(t->inp_laddr.s_addr) != INADDR_ANY || (inp->inp_vflag & INP_IPV6PROTO) == 0 || (t->inp_vflag & INP_IPV6PROTO) == 0) #endif return (EADDRINUSE); if (t && (! in_pcbbind_check_bindmulti(inp, t))) return (EADDRINUSE); } } } if (*lportp != 0) lport = *lportp; if (lport == 0) { error = in_pcb_lport(inp, &laddr, &lport, cred, lookupflags); if (error != 0) return (error); } *laddrp = laddr.s_addr; *lportp = lport; return (0); } /* * Connect from a socket to a specified address. * Both address and port must be specified in argument sin. * If don't have a local address for this socket yet, * then pick one. */ int in_pcbconnect_mbuf(struct inpcb *inp, struct sockaddr *nam, struct ucred *cred, struct mbuf *m) { u_short lport, fport; in_addr_t laddr, faddr; int anonport, error; INP_WLOCK_ASSERT(inp); INP_HASH_WLOCK_ASSERT(inp->inp_pcbinfo); lport = inp->inp_lport; laddr = inp->inp_laddr.s_addr; anonport = (lport == 0); error = in_pcbconnect_setup(inp, nam, &laddr, &lport, &faddr, &fport, NULL, cred); if (error) return (error); /* Do the initial binding of the local address if required. */ if (inp->inp_laddr.s_addr == INADDR_ANY && inp->inp_lport == 0) { inp->inp_lport = lport; inp->inp_laddr.s_addr = laddr; if (in_pcbinshash(inp) != 0) { inp->inp_laddr.s_addr = INADDR_ANY; inp->inp_lport = 0; return (EAGAIN); } } /* Commit the remaining changes. */ inp->inp_lport = lport; inp->inp_laddr.s_addr = laddr; inp->inp_faddr.s_addr = faddr; inp->inp_fport = fport; in_pcbrehash_mbuf(inp, m); if (anonport) inp->inp_flags |= INP_ANONPORT; return (0); } int in_pcbconnect(struct inpcb *inp, struct sockaddr *nam, struct ucred *cred) { return (in_pcbconnect_mbuf(inp, nam, cred, NULL)); } /* * Do proper source address selection on an unbound socket in case * of connect. Take jails into account as well. */ int in_pcbladdr(struct inpcb *inp, struct in_addr *faddr, struct in_addr *laddr, struct ucred *cred) { struct ifaddr *ifa; struct sockaddr *sa; struct sockaddr_in *sin; struct route sro; struct epoch_tracker et; int error; KASSERT(laddr != NULL, ("%s: laddr NULL", __func__)); /* * Bypass source address selection and use the primary jail IP * if requested. */ if (cred != NULL && !prison_saddrsel_ip4(cred, laddr)) return (0); error = 0; bzero(&sro, sizeof(sro)); sin = (struct sockaddr_in *)&sro.ro_dst; sin->sin_family = AF_INET; sin->sin_len = sizeof(struct sockaddr_in); sin->sin_addr.s_addr = faddr->s_addr; /* * If route is known our src addr is taken from the i/f, * else punt. * * Find out route to destination. */ if ((inp->inp_socket->so_options & SO_DONTROUTE) == 0) in_rtalloc_ign(&sro, 0, inp->inp_inc.inc_fibnum); /* * If we found a route, use the address corresponding to * the outgoing interface. 
* * Otherwise assume faddr is reachable on a directly connected * network and try to find a corresponding interface to take * the source address from. */ NET_EPOCH_ENTER(et); if (sro.ro_rt == NULL || sro.ro_rt->rt_ifp == NULL) { struct in_ifaddr *ia; struct ifnet *ifp; ia = ifatoia(ifa_ifwithdstaddr((struct sockaddr *)sin, inp->inp_socket->so_fibnum)); if (ia == NULL) { ia = ifatoia(ifa_ifwithnet((struct sockaddr *)sin, 0, inp->inp_socket->so_fibnum)); } if (ia == NULL) { error = ENETUNREACH; goto done; } if (cred == NULL || !prison_flag(cred, PR_IP4)) { laddr->s_addr = ia->ia_addr.sin_addr.s_addr; goto done; } ifp = ia->ia_ifp; ia = NULL; CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { sa = ifa->ifa_addr; if (sa->sa_family != AF_INET) continue; sin = (struct sockaddr_in *)sa; if (prison_check_ip4(cred, &sin->sin_addr) == 0) { ia = (struct in_ifaddr *)ifa; break; } } if (ia != NULL) { laddr->s_addr = ia->ia_addr.sin_addr.s_addr; goto done; } /* 3. As a last resort return the 'default' jail address. */ error = prison_get_ip4(cred, laddr); goto done; } /* * If the outgoing interface on the route found is not * a loopback interface, use the address from that interface. * In case of jails do those three steps: * 1. check if the interface address belongs to the jail. If so use it. * 2. check if we have any address on the outgoing interface * belonging to this jail. If so use it. * 3. as a last resort return the 'default' jail address. */ if ((sro.ro_rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0) { struct in_ifaddr *ia; struct ifnet *ifp; /* If not jailed, use the default returned. */ if (cred == NULL || !prison_flag(cred, PR_IP4)) { ia = (struct in_ifaddr *)sro.ro_rt->rt_ifa; laddr->s_addr = ia->ia_addr.sin_addr.s_addr; goto done; } /* Jailed. */ /* 1. Check if the iface address belongs to the jail. */ sin = (struct sockaddr_in *)sro.ro_rt->rt_ifa->ifa_addr; if (prison_check_ip4(cred, &sin->sin_addr) == 0) { ia = (struct in_ifaddr *)sro.ro_rt->rt_ifa; laddr->s_addr = ia->ia_addr.sin_addr.s_addr; goto done; } /* * 2. Check if we have any address on the outgoing interface * belonging to this jail. */ ia = NULL; ifp = sro.ro_rt->rt_ifp; CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { sa = ifa->ifa_addr; if (sa->sa_family != AF_INET) continue; sin = (struct sockaddr_in *)sa; if (prison_check_ip4(cred, &sin->sin_addr) == 0) { ia = (struct in_ifaddr *)ifa; break; } } if (ia != NULL) { laddr->s_addr = ia->ia_addr.sin_addr.s_addr; goto done; } /* 3. As a last resort return the 'default' jail address. */ error = prison_get_ip4(cred, laddr); goto done; } /* * The outgoing interface is marked with 'loopback net', so a route * to ourselves is here. * Try to find the interface of the destination address and then * take the address from there. That interface is not necessarily * a loopback interface. * In case of jails, check that it is an address of the jail * and if we cannot find, fall back to the 'default' jail address. 
*/ if ((sro.ro_rt->rt_ifp->if_flags & IFF_LOOPBACK) != 0) { struct sockaddr_in sain; struct in_ifaddr *ia; bzero(&sain, sizeof(struct sockaddr_in)); sain.sin_family = AF_INET; sain.sin_len = sizeof(struct sockaddr_in); sain.sin_addr.s_addr = faddr->s_addr; ia = ifatoia(ifa_ifwithdstaddr(sintosa(&sain), inp->inp_socket->so_fibnum)); if (ia == NULL) ia = ifatoia(ifa_ifwithnet(sintosa(&sain), 0, inp->inp_socket->so_fibnum)); if (ia == NULL) ia = ifatoia(ifa_ifwithaddr(sintosa(&sain))); if (cred == NULL || !prison_flag(cred, PR_IP4)) { if (ia == NULL) { error = ENETUNREACH; goto done; } laddr->s_addr = ia->ia_addr.sin_addr.s_addr; goto done; } /* Jailed. */ if (ia != NULL) { struct ifnet *ifp; ifp = ia->ia_ifp; ia = NULL; CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { sa = ifa->ifa_addr; if (sa->sa_family != AF_INET) continue; sin = (struct sockaddr_in *)sa; if (prison_check_ip4(cred, &sin->sin_addr) == 0) { ia = (struct in_ifaddr *)ifa; break; } } if (ia != NULL) { laddr->s_addr = ia->ia_addr.sin_addr.s_addr; goto done; } } /* 3. As a last resort return the 'default' jail address. */ error = prison_get_ip4(cred, laddr); goto done; } done: NET_EPOCH_EXIT(et); if (sro.ro_rt != NULL) RTFREE(sro.ro_rt); return (error); } /* * Set up for a connect from a socket to the specified address. * On entry, *laddrp and *lportp should contain the current local * address and port for the PCB; these are updated to the values * that should be placed in inp_laddr and inp_lport to complete * the connect. * * On success, *faddrp and *fportp will be set to the remote address * and port. These are not updated in the error case. * * If the operation fails because the connection already exists, * *oinpp will be set to the PCB of that connection so that the * caller can decide to override it. In all other cases, *oinpp * is set to NULL. */ int in_pcbconnect_setup(struct inpcb *inp, struct sockaddr *nam, in_addr_t *laddrp, u_short *lportp, in_addr_t *faddrp, u_short *fportp, struct inpcb **oinpp, struct ucred *cred) { struct rm_priotracker in_ifa_tracker; struct sockaddr_in *sin = (struct sockaddr_in *)nam; struct in_ifaddr *ia; struct inpcb *oinp; struct in_addr laddr, faddr; u_short lport, fport; int error; /* * Because a global state change doesn't actually occur here, a read * lock is sufficient. */ INP_LOCK_ASSERT(inp); INP_HASH_LOCK_ASSERT(inp->inp_pcbinfo); if (oinpp != NULL) *oinpp = NULL; if (nam->sa_len != sizeof (*sin)) return (EINVAL); if (sin->sin_family != AF_INET) return (EAFNOSUPPORT); if (sin->sin_port == 0) return (EADDRNOTAVAIL); laddr.s_addr = *laddrp; lport = *lportp; faddr = sin->sin_addr; fport = sin->sin_port; if (!CK_STAILQ_EMPTY(&V_in_ifaddrhead)) { /* * If the destination address is INADDR_ANY, * use the primary local address. * If the supplied address is INADDR_BROADCAST, * and the primary interface supports broadcast, * choose the broadcast address for that interface. 
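 *
 * In effect a connect() to 0.0.0.0 behaves like a connect() to the
 * first address on V_in_ifaddrhead (subject to the jail rewrite via
 * prison_get_ip4()), and a connect() to 255.255.255.255 is rewritten
 * to that interface's ia_broadaddr when IFF_BROADCAST is set on it.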
*/ if (faddr.s_addr == INADDR_ANY) { IN_IFADDR_RLOCK(&in_ifa_tracker); faddr = IA_SIN(CK_STAILQ_FIRST(&V_in_ifaddrhead))->sin_addr; IN_IFADDR_RUNLOCK(&in_ifa_tracker); if (cred != NULL && (error = prison_get_ip4(cred, &faddr)) != 0) return (error); } else if (faddr.s_addr == (u_long)INADDR_BROADCAST) { IN_IFADDR_RLOCK(&in_ifa_tracker); if (CK_STAILQ_FIRST(&V_in_ifaddrhead)->ia_ifp->if_flags & IFF_BROADCAST) faddr = satosin(&CK_STAILQ_FIRST( &V_in_ifaddrhead)->ia_broadaddr)->sin_addr; IN_IFADDR_RUNLOCK(&in_ifa_tracker); } } if (laddr.s_addr == INADDR_ANY) { error = in_pcbladdr(inp, &faddr, &laddr, cred); /* * If the destination address is multicast and an outgoing * interface has been set as a multicast option, prefer the * address of that interface as our source address. */ if (IN_MULTICAST(ntohl(faddr.s_addr)) && inp->inp_moptions != NULL) { struct ip_moptions *imo; struct ifnet *ifp; imo = inp->inp_moptions; if (imo->imo_multicast_ifp != NULL) { ifp = imo->imo_multicast_ifp; IN_IFADDR_RLOCK(&in_ifa_tracker); CK_STAILQ_FOREACH(ia, &V_in_ifaddrhead, ia_link) { if ((ia->ia_ifp == ifp) && (cred == NULL || prison_check_ip4(cred, &ia->ia_addr.sin_addr) == 0)) break; } if (ia == NULL) error = EADDRNOTAVAIL; else { laddr = ia->ia_addr.sin_addr; error = 0; } IN_IFADDR_RUNLOCK(&in_ifa_tracker); } } if (error) return (error); } oinp = in_pcblookup_hash_locked(inp->inp_pcbinfo, faddr, fport, laddr, lport, 0, NULL); if (oinp != NULL) { if (oinpp != NULL) *oinpp = oinp; return (EADDRINUSE); } if (lport == 0) { error = in_pcbbind_setup(inp, NULL, &laddr.s_addr, &lport, cred); if (error) return (error); } *laddrp = laddr.s_addr; *lportp = lport; *faddrp = faddr.s_addr; *fportp = fport; return (0); } void in_pcbdisconnect(struct inpcb *inp) { INP_WLOCK_ASSERT(inp); INP_HASH_WLOCK_ASSERT(inp->inp_pcbinfo); inp->inp_faddr.s_addr = INADDR_ANY; inp->inp_fport = 0; in_pcbrehash(inp); } #endif /* INET */ /* * in_pcbdetach() is responsibe for disassociating a socket from an inpcb. * For most protocols, this will be invoked immediately prior to calling * in_pcbfree(). However, with TCP the inpcb may significantly outlive the * socket, in which case in_pcbfree() is deferred. */ void in_pcbdetach(struct inpcb *inp) { KASSERT(inp->inp_socket != NULL, ("%s: inp_socket == NULL", __func__)); #ifdef RATELIMIT if (inp->inp_snd_tag != NULL) in_pcbdetach_txrtlmt(inp); #endif inp->inp_socket->so_pcb = NULL; inp->inp_socket = NULL; } /* * in_pcbref() bumps the reference count on an inpcb in order to maintain * stability of an inpcb pointer despite the inpcb lock being released. This * is used in TCP when the inpcbinfo lock needs to be acquired or upgraded, * but where the inpcb lock may already held, or when acquiring a reference * via a pcbgroup. * * in_pcbref() should be used only to provide brief memory stability, and * must always be followed by a call to INP_WLOCK() and in_pcbrele() to * garbage collect the inpcb if it has been in_pcbfree()'d from another * context. Until in_pcbrele() has returned that the inpcb is still valid, * lock and rele are the *only* safe operations that may be performed on the * inpcb. * * While the inpcb will not be freed, releasing the inpcb lock means that the * connection's state may change, so the caller should be careful to * revalidate any cached state on reacquiring the lock. Drop the reference * using in_pcbrele(). 
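 *
 * A minimal sketch of the intended pattern (the TCP code follows this
 * general shape):
 *
 *	in_pcbref(inp);
 *	INP_WUNLOCK(inp);
 *	... acquire the pcbinfo lock, possibly sleeping ...
 *	INP_WLOCK(inp);
 *	if (in_pcbrele_wlocked(inp))
 *		return;
 *
 * A non-zero return from in_pcbrele_wlocked() means the inpcb has been
 * freed (or is being torn down) and its lock has already been dropped,
 * so it must not be touched again; a zero return means it is still
 * valid and remains locked, but cached state should be revalidated as
 * described above.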
*/ void in_pcbref(struct inpcb *inp) { KASSERT(inp->inp_refcount > 0, ("%s: refcount 0", __func__)); refcount_acquire(&inp->inp_refcount); } /* * Drop a refcount on an inpcb elevated using in_pcbref(); because a call to * in_pcbfree() may have been made between in_pcbref() and in_pcbrele(), we * return a flag indicating whether or not the inpcb remains valid. If it is * valid, we return with the inpcb lock held. * * Notice that, unlike in_pcbref(), the inpcb lock must be held to drop a * reference on an inpcb. Historically more work was done here (actually, in * in_pcbfree_internal()) but has been moved to in_pcbfree() to avoid the * need for the pcbinfo lock in in_pcbrele(). Deferring the free is entirely * about memory stability (and continued use of the write lock). */ int in_pcbrele_rlocked(struct inpcb *inp) { struct inpcbinfo *pcbinfo; KASSERT(inp->inp_refcount > 0, ("%s: refcount 0", __func__)); INP_RLOCK_ASSERT(inp); if (refcount_release(&inp->inp_refcount) == 0) { /* * If the inpcb has been freed, let the caller know, even if * this isn't the last reference. */ if (inp->inp_flags2 & INP_FREED) { INP_RUNLOCK(inp); return (1); } return (0); } KASSERT(inp->inp_socket == NULL, ("%s: inp_socket != NULL", __func__)); #ifdef TCPHPTS if (inp->inp_in_hpts || inp->inp_in_input) { struct tcp_hpts_entry *hpts; /* * We should not be on the hpts at * this point in any form. we must * get the lock to be sure. */ hpts = tcp_hpts_lock(inp); if (inp->inp_in_hpts) panic("Hpts:%p inp:%p at free still on hpts", hpts, inp); mtx_unlock(&hpts->p_mtx); hpts = tcp_input_lock(inp); if (inp->inp_in_input) panic("Hpts:%p inp:%p at free still on input hpts", hpts, inp); mtx_unlock(&hpts->p_mtx); } #endif INP_RUNLOCK(inp); pcbinfo = inp->inp_pcbinfo; uma_zfree(pcbinfo->ipi_zone, inp); return (1); } int in_pcbrele_wlocked(struct inpcb *inp) { struct inpcbinfo *pcbinfo; KASSERT(inp->inp_refcount > 0, ("%s: refcount 0", __func__)); INP_WLOCK_ASSERT(inp); if (refcount_release(&inp->inp_refcount) == 0) { /* * If the inpcb has been freed, let the caller know, even if * this isn't the last reference. */ if (inp->inp_flags2 & INP_FREED) { INP_WUNLOCK(inp); return (1); } return (0); } KASSERT(inp->inp_socket == NULL, ("%s: inp_socket != NULL", __func__)); #ifdef TCPHPTS if (inp->inp_in_hpts || inp->inp_in_input) { struct tcp_hpts_entry *hpts; /* * We should not be on the hpts at * this point in any form. we must * get the lock to be sure. */ hpts = tcp_hpts_lock(inp); if (inp->inp_in_hpts) panic("Hpts:%p inp:%p at free still on hpts", hpts, inp); mtx_unlock(&hpts->p_mtx); hpts = tcp_input_lock(inp); if (inp->inp_in_input) panic("Hpts:%p inp:%p at free still on input hpts", hpts, inp); mtx_unlock(&hpts->p_mtx); } #endif INP_WUNLOCK(inp); pcbinfo = inp->inp_pcbinfo; uma_zfree(pcbinfo->ipi_zone, inp); return (1); } /* * Temporary wrapper. 
*/ int in_pcbrele(struct inpcb *inp) { return (in_pcbrele_wlocked(inp)); } void in_pcblist_rele_rlocked(epoch_context_t ctx) { struct in_pcblist *il; struct inpcb *inp; struct inpcbinfo *pcbinfo; int i, n; il = __containerof(ctx, struct in_pcblist, il_epoch_ctx); pcbinfo = il->il_pcbinfo; n = il->il_count; INP_INFO_WLOCK(pcbinfo); for (i = 0; i < n; i++) { inp = il->il_inp_list[i]; INP_RLOCK(inp); if (!in_pcbrele_rlocked(inp)) INP_RUNLOCK(inp); } INP_INFO_WUNLOCK(pcbinfo); free(il, M_TEMP); } static void inpcbport_free(epoch_context_t ctx) { struct inpcbport *phd; phd = __containerof(ctx, struct inpcbport, phd_epoch_ctx); free(phd, M_PCB); } static void in_pcbfree_deferred(epoch_context_t ctx) { struct inpcb *inp; int released __unused; inp = __containerof(ctx, struct inpcb, inp_epoch_ctx); INP_WLOCK(inp); CURVNET_SET(inp->inp_vnet); #ifdef INET struct ip_moptions *imo = inp->inp_moptions; inp->inp_moptions = NULL; #endif /* XXXRW: Do as much as possible here. */ #if defined(IPSEC) || defined(IPSEC_SUPPORT) if (inp->inp_sp != NULL) ipsec_delete_pcbpolicy(inp); #endif #ifdef INET6 struct ip6_moptions *im6o = NULL; if (inp->inp_vflag & INP_IPV6PROTO) { ip6_freepcbopts(inp->in6p_outputopts); im6o = inp->in6p_moptions; inp->in6p_moptions = NULL; } #endif if (inp->inp_options) (void)m_free(inp->inp_options); inp->inp_vflag = 0; crfree(inp->inp_cred); #ifdef MAC mac_inpcb_destroy(inp); #endif released = in_pcbrele_wlocked(inp); MPASS(released); #ifdef INET6 ip6_freemoptions(im6o); #endif #ifdef INET inp_freemoptions(imo); #endif CURVNET_RESTORE(); } /* * Unconditionally schedule an inpcb to be freed by decrementing its * reference count, which should occur only after the inpcb has been detached * from its socket. If another thread holds a temporary reference (acquired * using in_pcbref()) then the free is deferred until that reference is * released using in_pcbrele(), but the inpcb is still unlocked. Almost all * work, including removal from global lists, is done in this context, where * the pcbinfo lock is held. */ void in_pcbfree(struct inpcb *inp) { struct inpcbinfo *pcbinfo = inp->inp_pcbinfo; KASSERT(inp->inp_socket == NULL, ("%s: inp_socket != NULL", __func__)); KASSERT((inp->inp_flags2 & INP_FREED) == 0, ("%s: called twice for pcb %p", __func__, inp)); if (inp->inp_flags2 & INP_FREED) { INP_WUNLOCK(inp); return; } #ifdef INVARIANTS if (pcbinfo == &V_tcbinfo) { INP_INFO_LOCK_ASSERT(pcbinfo); } else { INP_INFO_WLOCK_ASSERT(pcbinfo); } #endif INP_WLOCK_ASSERT(inp); INP_LIST_WLOCK(pcbinfo); in_pcbremlists(inp); INP_LIST_WUNLOCK(pcbinfo); RO_INVALIDATE_CACHE(&inp->inp_route); /* mark as destruction in progress */ inp->inp_flags2 |= INP_FREED; INP_WUNLOCK(inp); epoch_call(net_epoch_preempt, &inp->inp_epoch_ctx, in_pcbfree_deferred); } /* * in_pcbdrop() removes an inpcb from hashed lists, releasing its address and * port reservation, and preventing it from being returned by inpcb lookups. * * It is used by TCP to mark an inpcb as unused and avoid future packet * delivery or event notification when a socket remains open but TCP has * closed. This might occur as a result of a shutdown()-initiated TCP close * or a RST on the wire, and allows the port binding to be reused while still * maintaining the invariant that so_pcb always points to a valid inpcb until * in_pcbdetach(). * * XXXRW: Possibly in_pcbdrop() should also prevent future notifications by * in_pcbnotifyall() and in_pcbpurgeif0()? 
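 *
 * Code that re-locks an inpcb is therefore expected to test for this
 * state before acting on it; a sketch of the usual check (the TCP
 * user-request paths use this shape):
 *
 *	INP_WLOCK(inp);
 *	if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) {
 *		INP_WUNLOCK(inp);
 *		return (ECONNRESET);
 *	}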
*/ void in_pcbdrop(struct inpcb *inp) { INP_WLOCK_ASSERT(inp); #ifdef INVARIANTS if (inp->inp_socket != NULL && inp->inp_ppcb != NULL) MPASS(inp->inp_refcount > 1); #endif /* * XXXRW: Possibly we should protect the setting of INP_DROPPED with * the hash lock...? */ inp->inp_flags |= INP_DROPPED; if (inp->inp_flags & INP_INHASHLIST) { struct inpcbport *phd = inp->inp_phd; INP_HASH_WLOCK(inp->inp_pcbinfo); in_pcbremlbgrouphash(inp); CK_LIST_REMOVE(inp, inp_hash); CK_LIST_REMOVE(inp, inp_portlist); if (CK_LIST_FIRST(&phd->phd_pcblist) == NULL) { CK_LIST_REMOVE(phd, phd_hash); epoch_call(net_epoch_preempt, &phd->phd_epoch_ctx, inpcbport_free); } INP_HASH_WUNLOCK(inp->inp_pcbinfo); inp->inp_flags &= ~INP_INHASHLIST; #ifdef PCBGROUP in_pcbgroup_remove(inp); #endif } } #ifdef INET /* * Common routines to return the socket addresses associated with inpcbs. */ struct sockaddr * in_sockaddr(in_port_t port, struct in_addr *addr_p) { struct sockaddr_in *sin; sin = malloc(sizeof *sin, M_SONAME, M_WAITOK | M_ZERO); sin->sin_family = AF_INET; sin->sin_len = sizeof(*sin); sin->sin_addr = *addr_p; sin->sin_port = port; return (struct sockaddr *)sin; } int in_getsockaddr(struct socket *so, struct sockaddr **nam) { struct inpcb *inp; struct in_addr addr; in_port_t port; inp = sotoinpcb(so); KASSERT(inp != NULL, ("in_getsockaddr: inp == NULL")); INP_RLOCK(inp); port = inp->inp_lport; addr = inp->inp_laddr; INP_RUNLOCK(inp); *nam = in_sockaddr(port, &addr); return 0; } int in_getpeeraddr(struct socket *so, struct sockaddr **nam) { struct inpcb *inp; struct in_addr addr; in_port_t port; inp = sotoinpcb(so); KASSERT(inp != NULL, ("in_getpeeraddr: inp == NULL")); INP_RLOCK(inp); port = inp->inp_fport; addr = inp->inp_faddr; INP_RUNLOCK(inp); *nam = in_sockaddr(port, &addr); return 0; } void in_pcbnotifyall(struct inpcbinfo *pcbinfo, struct in_addr faddr, int errno, struct inpcb *(*notify)(struct inpcb *, int)) { struct inpcb *inp, *inp_temp; INP_INFO_WLOCK(pcbinfo); CK_LIST_FOREACH_SAFE(inp, pcbinfo->ipi_listhead, inp_list, inp_temp) { INP_WLOCK(inp); #ifdef INET6 if ((inp->inp_vflag & INP_IPV4) == 0) { INP_WUNLOCK(inp); continue; } #endif if (inp->inp_faddr.s_addr != faddr.s_addr || inp->inp_socket == NULL) { INP_WUNLOCK(inp); continue; } if ((*notify)(inp, errno)) INP_WUNLOCK(inp); } INP_INFO_WUNLOCK(pcbinfo); } void in_pcbpurgeif0(struct inpcbinfo *pcbinfo, struct ifnet *ifp) { struct inpcb *inp; struct ip_moptions *imo; int i, gap; INP_INFO_WLOCK(pcbinfo); CK_LIST_FOREACH(inp, pcbinfo->ipi_listhead, inp_list) { INP_WLOCK(inp); imo = inp->inp_moptions; if ((inp->inp_vflag & INP_IPV4) && imo != NULL) { /* * Unselect the outgoing interface if it is being * detached. */ if (imo->imo_multicast_ifp == ifp) imo->imo_multicast_ifp = NULL; /* * Drop multicast group membership if we joined * through the interface being detached. * * XXX This can all be deferred to an epoch_call */ for (i = 0, gap = 0; i < imo->imo_num_memberships; i++) { if (imo->imo_membership[i]->inm_ifp == ifp) { IN_MULTI_LOCK_ASSERT(); in_leavegroup_locked(imo->imo_membership[i], NULL); gap++; } else if (gap != 0) imo->imo_membership[i - gap] = imo->imo_membership[i]; } imo->imo_num_memberships -= gap; } INP_WUNLOCK(inp); } INP_INFO_WUNLOCK(pcbinfo); } /* * Lookup a PCB based on the local address and port. Caller must hold the * hash lock. No inpcb locks or references are acquired. 
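 *
 * in_pcbbind_setup() above is a representative caller: with the hash
 * lock held it probes for conflicting bindings, e.g.
 *
 *	t = in_pcblookup_local(pcbinfo, sin->sin_addr, lport,
 *	    INPLOOKUP_WILDCARD, cred);
 *
 * and any inpcb returned may only be examined while that lock remains
 * held, since neither a reference nor an inpcb lock is taken on it.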
*/ #define INP_LOOKUP_MAPPED_PCB_COST 3 struct inpcb * in_pcblookup_local(struct inpcbinfo *pcbinfo, struct in_addr laddr, u_short lport, int lookupflags, struct ucred *cred) { struct inpcb *inp; #ifdef INET6 int matchwild = 3 + INP_LOOKUP_MAPPED_PCB_COST; #else int matchwild = 3; #endif int wildcard; KASSERT((lookupflags & ~(INPLOOKUP_WILDCARD)) == 0, ("%s: invalid lookup flags %d", __func__, lookupflags)); INP_HASH_LOCK_ASSERT(pcbinfo); if ((lookupflags & INPLOOKUP_WILDCARD) == 0) { struct inpcbhead *head; /* * Look for an unconnected (wildcard foreign addr) PCB that * matches the local address and port we're looking for. */ head = &pcbinfo->ipi_hashbase[INP_PCBHASH(INADDR_ANY, lport, 0, pcbinfo->ipi_hashmask)]; CK_LIST_FOREACH(inp, head, inp_hash) { #ifdef INET6 /* XXX inp locking */ if ((inp->inp_vflag & INP_IPV4) == 0) continue; #endif if (inp->inp_faddr.s_addr == INADDR_ANY && inp->inp_laddr.s_addr == laddr.s_addr && inp->inp_lport == lport) { /* * Found? */ if (cred == NULL || prison_equal_ip4(cred->cr_prison, inp->inp_cred->cr_prison)) return (inp); } } /* * Not found. */ return (NULL); } else { struct inpcbporthead *porthash; struct inpcbport *phd; struct inpcb *match = NULL; /* * Best fit PCB lookup. * * First see if this local port is in use by looking on the * port hash list. */ porthash = &pcbinfo->ipi_porthashbase[INP_PCBPORTHASH(lport, pcbinfo->ipi_porthashmask)]; CK_LIST_FOREACH(phd, porthash, phd_hash) { if (phd->phd_port == lport) break; } if (phd != NULL) { /* * Port is in use by one or more PCBs. Look for best * fit. */ CK_LIST_FOREACH(inp, &phd->phd_pcblist, inp_portlist) { wildcard = 0; if (cred != NULL && !prison_equal_ip4(inp->inp_cred->cr_prison, cred->cr_prison)) continue; #ifdef INET6 /* XXX inp locking */ if ((inp->inp_vflag & INP_IPV4) == 0) continue; /* * We never select the PCB that has * INP_IPV6 flag and is bound to :: if * we have another PCB which is bound * to 0.0.0.0. If a PCB has the * INP_IPV6 flag, then we set its cost * higher than IPv4 only PCBs. * * Note that the case only happens * when a socket is bound to ::, under * the condition that the use of the * mapped address is allowed. */ if ((inp->inp_vflag & INP_IPV6) != 0) wildcard += INP_LOOKUP_MAPPED_PCB_COST; #endif if (inp->inp_faddr.s_addr != INADDR_ANY) wildcard++; if (inp->inp_laddr.s_addr != INADDR_ANY) { if (laddr.s_addr == INADDR_ANY) wildcard++; else if (inp->inp_laddr.s_addr != laddr.s_addr) continue; } else { if (laddr.s_addr != INADDR_ANY) wildcard++; } if (wildcard < matchwild) { match = inp; matchwild = wildcard; if (matchwild == 0) break; } } } return (match); } } #undef INP_LOOKUP_MAPPED_PCB_COST static struct inpcb * in_pcblookup_lbgroup(const struct inpcbinfo *pcbinfo, const struct in_addr *laddr, uint16_t lport, const struct in_addr *faddr, uint16_t fport, int lookupflags) { struct inpcb *local_wild; const struct inpcblbgrouphead *hdr; struct inpcblbgroup *grp; uint32_t idx; INP_HASH_LOCK_ASSERT(pcbinfo); hdr = &pcbinfo->ipi_lbgrouphashbase[ INP_PCBPORTHASH(lport, pcbinfo->ipi_lbgrouphashmask)]; /* * Order of socket selection: * 1. non-wild. * 2. wild (if lookupflags contains INPLOOKUP_WILDCARD). 
* * NOTE: * - Load balanced group does not contain jailed sockets * - Load balanced group does not contain IPv4 mapped INET6 wild sockets */ local_wild = NULL; CK_LIST_FOREACH(grp, hdr, il_list) { #ifdef INET6 if (!(grp->il_vflag & INP_IPV4)) continue; #endif if (grp->il_lport != lport) continue; idx = INP_PCBLBGROUP_PKTHASH(faddr->s_addr, lport, fport) % grp->il_inpcnt; if (grp->il_laddr.s_addr == laddr->s_addr) return (grp->il_inp[idx]); if (grp->il_laddr.s_addr == INADDR_ANY && (lookupflags & INPLOOKUP_WILDCARD) != 0) local_wild = grp->il_inp[idx]; } return (local_wild); } #ifdef PCBGROUP /* * Lookup PCB in hash list, using pcbgroup tables. */ static struct inpcb * in_pcblookup_group(struct inpcbinfo *pcbinfo, struct inpcbgroup *pcbgroup, struct in_addr faddr, u_int fport_arg, struct in_addr laddr, u_int lport_arg, int lookupflags, struct ifnet *ifp) { struct inpcbhead *head; struct inpcb *inp, *tmpinp; u_short fport = fport_arg, lport = lport_arg; bool locked; /* * First look for an exact match. */ tmpinp = NULL; INP_GROUP_LOCK(pcbgroup); head = &pcbgroup->ipg_hashbase[INP_PCBHASH(faddr.s_addr, lport, fport, pcbgroup->ipg_hashmask)]; CK_LIST_FOREACH(inp, head, inp_pcbgrouphash) { #ifdef INET6 /* XXX inp locking */ if ((inp->inp_vflag & INP_IPV4) == 0) continue; #endif if (inp->inp_faddr.s_addr == faddr.s_addr && inp->inp_laddr.s_addr == laddr.s_addr && inp->inp_fport == fport && inp->inp_lport == lport) { /* * XXX We should be able to directly return * the inp here, without any checks. * Well unless both bound with SO_REUSEPORT? */ if (prison_flag(inp->inp_cred, PR_IP4)) goto found; if (tmpinp == NULL) tmpinp = inp; } } if (tmpinp != NULL) { inp = tmpinp; goto found; } #ifdef RSS /* * For incoming connections, we may wish to do a wildcard * match for an RSS-local socket. */ if ((lookupflags & INPLOOKUP_WILDCARD) != 0) { struct inpcb *local_wild = NULL, *local_exact = NULL; #ifdef INET6 struct inpcb *local_wild_mapped = NULL; #endif struct inpcb *jail_wild = NULL; struct inpcbhead *head; int injail; /* * Order of socket selection - we always prefer jails. * 1. jailed, non-wild. * 2. jailed, wild. * 3. non-jailed, non-wild. * 4. non-jailed, wild. */ head = &pcbgroup->ipg_hashbase[INP_PCBHASH(INADDR_ANY, lport, 0, pcbgroup->ipg_hashmask)]; CK_LIST_FOREACH(inp, head, inp_pcbgrouphash) { #ifdef INET6 /* XXX inp locking */ if ((inp->inp_vflag & INP_IPV4) == 0) continue; #endif if (inp->inp_faddr.s_addr != INADDR_ANY || inp->inp_lport != lport) continue; injail = prison_flag(inp->inp_cred, PR_IP4); if (injail) { if (prison_check_ip4(inp->inp_cred, &laddr) != 0) continue; } else { if (local_exact != NULL) continue; } if (inp->inp_laddr.s_addr == laddr.s_addr) { if (injail) goto found; else local_exact = inp; } else if (inp->inp_laddr.s_addr == INADDR_ANY) { #ifdef INET6 /* XXX inp locking, NULL check */ if (inp->inp_vflag & INP_IPV6PROTO) local_wild_mapped = inp; else #endif if (injail) jail_wild = inp; else local_wild = inp; } } /* LIST_FOREACH */ inp = jail_wild; if (inp == NULL) inp = local_exact; if (inp == NULL) inp = local_wild; #ifdef INET6 if (inp == NULL) inp = local_wild_mapped; #endif if (inp != NULL) goto found; } #endif /* * Then look for a wildcard match, if requested. */ if ((lookupflags & INPLOOKUP_WILDCARD) != 0) { struct inpcb *local_wild = NULL, *local_exact = NULL; #ifdef INET6 struct inpcb *local_wild_mapped = NULL; #endif struct inpcb *jail_wild = NULL; struct inpcbhead *head; int injail; /* * Order of socket selection - we always prefer jails. * 1. jailed, non-wild. * 2. 
jailed, wild. * 3. non-jailed, non-wild. * 4. non-jailed, wild. */ head = &pcbinfo->ipi_wildbase[INP_PCBHASH(INADDR_ANY, lport, 0, pcbinfo->ipi_wildmask)]; CK_LIST_FOREACH(inp, head, inp_pcbgroup_wild) { #ifdef INET6 /* XXX inp locking */ if ((inp->inp_vflag & INP_IPV4) == 0) continue; #endif if (inp->inp_faddr.s_addr != INADDR_ANY || inp->inp_lport != lport) continue; injail = prison_flag(inp->inp_cred, PR_IP4); if (injail) { if (prison_check_ip4(inp->inp_cred, &laddr) != 0) continue; } else { if (local_exact != NULL) continue; } if (inp->inp_laddr.s_addr == laddr.s_addr) { if (injail) goto found; else local_exact = inp; } else if (inp->inp_laddr.s_addr == INADDR_ANY) { #ifdef INET6 /* XXX inp locking, NULL check */ if (inp->inp_vflag & INP_IPV6PROTO) local_wild_mapped = inp; else #endif if (injail) jail_wild = inp; else local_wild = inp; } } /* LIST_FOREACH */ inp = jail_wild; if (inp == NULL) inp = local_exact; if (inp == NULL) inp = local_wild; #ifdef INET6 if (inp == NULL) inp = local_wild_mapped; #endif if (inp != NULL) goto found; } /* if (lookupflags & INPLOOKUP_WILDCARD) */ INP_GROUP_UNLOCK(pcbgroup); return (NULL); found: if (lookupflags & INPLOOKUP_WLOCKPCB) locked = INP_TRY_WLOCK(inp); else if (lookupflags & INPLOOKUP_RLOCKPCB) locked = INP_TRY_RLOCK(inp); else panic("%s: locking bug", __func__); if (__predict_false(locked && (inp->inp_flags2 & INP_FREED))) { if (lookupflags & INPLOOKUP_WLOCKPCB) INP_WUNLOCK(inp); else INP_RUNLOCK(inp); return (NULL); } else if (!locked) in_pcbref(inp); INP_GROUP_UNLOCK(pcbgroup); if (!locked) { if (lookupflags & INPLOOKUP_WLOCKPCB) { INP_WLOCK(inp); if (in_pcbrele_wlocked(inp)) return (NULL); } else { INP_RLOCK(inp); if (in_pcbrele_rlocked(inp)) return (NULL); } } #ifdef INVARIANTS if (lookupflags & INPLOOKUP_WLOCKPCB) INP_WLOCK_ASSERT(inp); else INP_RLOCK_ASSERT(inp); #endif return (inp); } #endif /* PCBGROUP */ /* * Lookup PCB in hash list, using pcbinfo tables. This variation assumes * that the caller has locked the hash list, and will not perform any further * locking or reference operations on either the hash list or the connection. */ static struct inpcb * in_pcblookup_hash_locked(struct inpcbinfo *pcbinfo, struct in_addr faddr, u_int fport_arg, struct in_addr laddr, u_int lport_arg, int lookupflags, struct ifnet *ifp) { struct inpcbhead *head; struct inpcb *inp, *tmpinp; u_short fport = fport_arg, lport = lport_arg; #ifdef INVARIANTS KASSERT((lookupflags & ~(INPLOOKUP_WILDCARD)) == 0, ("%s: invalid lookup flags %d", __func__, lookupflags)); if (!mtx_owned(&pcbinfo->ipi_hash_lock)) MPASS(in_epoch_verbose(net_epoch_preempt, 1)); #endif /* * First look for an exact match. */ tmpinp = NULL; head = &pcbinfo->ipi_hashbase[INP_PCBHASH(faddr.s_addr, lport, fport, pcbinfo->ipi_hashmask)]; CK_LIST_FOREACH(inp, head, inp_hash) { #ifdef INET6 /* XXX inp locking */ if ((inp->inp_vflag & INP_IPV4) == 0) continue; #endif if (inp->inp_faddr.s_addr == faddr.s_addr && inp->inp_laddr.s_addr == laddr.s_addr && inp->inp_fport == fport && inp->inp_lport == lport) { /* * XXX We should be able to directly return * the inp here, without any checks. * Well unless both bound with SO_REUSEPORT? */ if (prison_flag(inp->inp_cred, PR_IP4)) return (inp); if (tmpinp == NULL) tmpinp = inp; } } if (tmpinp != NULL) return (tmpinp); /* * Then look in lb group (for wildcard match). 
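 *
 * The group lookup spreads flows across the member sockets by hashing
 * the foreign address and the ports, roughly
 *
 *	idx = INP_PCBLBGROUP_PKTHASH(faddr, lport, fport) % grp->il_inpcnt;
 *	inp = grp->il_inp[idx];
 *
 * so a given flow keeps mapping to the same SO_REUSEPORT_LB socket for
 * as long as the group membership stays unchanged.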
*/ if ((lookupflags & INPLOOKUP_WILDCARD) != 0) { inp = in_pcblookup_lbgroup(pcbinfo, &laddr, lport, &faddr, fport, lookupflags); if (inp != NULL) return (inp); } /* * Then look for a wildcard match, if requested. */ if ((lookupflags & INPLOOKUP_WILDCARD) != 0) { struct inpcb *local_wild = NULL, *local_exact = NULL; #ifdef INET6 struct inpcb *local_wild_mapped = NULL; #endif struct inpcb *jail_wild = NULL; int injail; /* * Order of socket selection - we always prefer jails. * 1. jailed, non-wild. * 2. jailed, wild. * 3. non-jailed, non-wild. * 4. non-jailed, wild. */ head = &pcbinfo->ipi_hashbase[INP_PCBHASH(INADDR_ANY, lport, 0, pcbinfo->ipi_hashmask)]; CK_LIST_FOREACH(inp, head, inp_hash) { #ifdef INET6 /* XXX inp locking */ if ((inp->inp_vflag & INP_IPV4) == 0) continue; #endif if (inp->inp_faddr.s_addr != INADDR_ANY || inp->inp_lport != lport) continue; injail = prison_flag(inp->inp_cred, PR_IP4); if (injail) { if (prison_check_ip4(inp->inp_cred, &laddr) != 0) continue; } else { if (local_exact != NULL) continue; } if (inp->inp_laddr.s_addr == laddr.s_addr) { if (injail) return (inp); else local_exact = inp; } else if (inp->inp_laddr.s_addr == INADDR_ANY) { #ifdef INET6 /* XXX inp locking, NULL check */ if (inp->inp_vflag & INP_IPV6PROTO) local_wild_mapped = inp; else #endif if (injail) jail_wild = inp; else local_wild = inp; } } /* LIST_FOREACH */ if (jail_wild != NULL) return (jail_wild); if (local_exact != NULL) return (local_exact); if (local_wild != NULL) return (local_wild); #ifdef INET6 if (local_wild_mapped != NULL) return (local_wild_mapped); #endif } /* if ((lookupflags & INPLOOKUP_WILDCARD) != 0) */ return (NULL); } /* * Lookup PCB in hash list, using pcbinfo tables. This variation locks the * hash list lock, and will return the inpcb locked (i.e., requires * INPLOOKUP_LOCKPCB). */ static struct inpcb * in_pcblookup_hash(struct inpcbinfo *pcbinfo, struct in_addr faddr, u_int fport, struct in_addr laddr, u_int lport, int lookupflags, struct ifnet *ifp) { struct inpcb *inp; INP_HASH_RLOCK(pcbinfo); inp = in_pcblookup_hash_locked(pcbinfo, faddr, fport, laddr, lport, (lookupflags & ~(INPLOOKUP_RLOCKPCB | INPLOOKUP_WLOCKPCB)), ifp); if (inp != NULL) { if (lookupflags & INPLOOKUP_WLOCKPCB) { INP_WLOCK(inp); if (__predict_false(inp->inp_flags2 & INP_FREED)) { INP_WUNLOCK(inp); inp = NULL; } } else if (lookupflags & INPLOOKUP_RLOCKPCB) { INP_RLOCK(inp); if (__predict_false(inp->inp_flags2 & INP_FREED)) { INP_RUNLOCK(inp); inp = NULL; } } else panic("%s: locking bug", __func__); #ifdef INVARIANTS if (inp != NULL) { if (lookupflags & INPLOOKUP_WLOCKPCB) INP_WLOCK_ASSERT(inp); else INP_RLOCK_ASSERT(inp); } #endif } INP_HASH_RUNLOCK(pcbinfo); return (inp); } /* * Public inpcb lookup routines, accepting a 4-tuple, and optionally, an mbuf * from which a pre-calculated hash value may be extracted. * * Possibly more of this logic should be in in_pcbgroup.c. */ struct inpcb * in_pcblookup(struct inpcbinfo *pcbinfo, struct in_addr faddr, u_int fport, struct in_addr laddr, u_int lport, int lookupflags, struct ifnet *ifp) { #if defined(PCBGROUP) && !defined(RSS) struct inpcbgroup *pcbgroup; #endif KASSERT((lookupflags & ~INPLOOKUP_MASK) == 0, ("%s: invalid lookup flags %d", __func__, lookupflags)); KASSERT((lookupflags & (INPLOOKUP_RLOCKPCB | INPLOOKUP_WLOCKPCB)) != 0, ("%s: LOCKPCB not set", __func__)); /* * When not using RSS, use connection groups in preference to the * reservation table when looking up 4-tuples. 
When using RSS, just * use the reservation table, due to the cost of the Toeplitz hash * in software. * * XXXRW: This policy belongs in the pcbgroup code, as in principle * we could be doing RSS with a non-Toeplitz hash that is affordable * in software. */ #if defined(PCBGROUP) && !defined(RSS) if (in_pcbgroup_enabled(pcbinfo)) { pcbgroup = in_pcbgroup_bytuple(pcbinfo, laddr, lport, faddr, fport); return (in_pcblookup_group(pcbinfo, pcbgroup, faddr, fport, laddr, lport, lookupflags, ifp)); } #endif return (in_pcblookup_hash(pcbinfo, faddr, fport, laddr, lport, lookupflags, ifp)); } struct inpcb * in_pcblookup_mbuf(struct inpcbinfo *pcbinfo, struct in_addr faddr, u_int fport, struct in_addr laddr, u_int lport, int lookupflags, struct ifnet *ifp, struct mbuf *m) { #ifdef PCBGROUP struct inpcbgroup *pcbgroup; #endif KASSERT((lookupflags & ~INPLOOKUP_MASK) == 0, ("%s: invalid lookup flags %d", __func__, lookupflags)); KASSERT((lookupflags & (INPLOOKUP_RLOCKPCB | INPLOOKUP_WLOCKPCB)) != 0, ("%s: LOCKPCB not set", __func__)); #ifdef PCBGROUP /* * If we can use a hardware-generated hash to look up the connection * group, use that connection group to find the inpcb. Otherwise * fall back on a software hash -- or the reservation table if we're * using RSS. * * XXXRW: As above, that policy belongs in the pcbgroup code. */ if (in_pcbgroup_enabled(pcbinfo) && !(M_HASHTYPE_TEST(m, M_HASHTYPE_NONE))) { pcbgroup = in_pcbgroup_byhash(pcbinfo, M_HASHTYPE_GET(m), m->m_pkthdr.flowid); if (pcbgroup != NULL) return (in_pcblookup_group(pcbinfo, pcbgroup, faddr, fport, laddr, lport, lookupflags, ifp)); #ifndef RSS pcbgroup = in_pcbgroup_bytuple(pcbinfo, laddr, lport, faddr, fport); return (in_pcblookup_group(pcbinfo, pcbgroup, faddr, fport, laddr, lport, lookupflags, ifp)); #endif } #endif return (in_pcblookup_hash(pcbinfo, faddr, fport, laddr, lport, lookupflags, ifp)); } #endif /* INET */ /* * Insert PCB onto various hash lists. */ static int in_pcbinshash_internal(struct inpcb *inp, int do_pcbgroup_update) { struct inpcbhead *pcbhash; struct inpcbporthead *pcbporthash; struct inpcbinfo *pcbinfo = inp->inp_pcbinfo; struct inpcbport *phd; u_int32_t hashkey_faddr; int so_options; INP_WLOCK_ASSERT(inp); INP_HASH_WLOCK_ASSERT(pcbinfo); KASSERT((inp->inp_flags & INP_INHASHLIST) == 0, ("in_pcbinshash: INP_INHASHLIST")); #ifdef INET6 if (inp->inp_vflag & INP_IPV6) hashkey_faddr = INP6_PCBHASHKEY(&inp->in6p_faddr); else #endif hashkey_faddr = inp->inp_faddr.s_addr; pcbhash = &pcbinfo->ipi_hashbase[INP_PCBHASH(hashkey_faddr, inp->inp_lport, inp->inp_fport, pcbinfo->ipi_hashmask)]; pcbporthash = &pcbinfo->ipi_porthashbase[ INP_PCBPORTHASH(inp->inp_lport, pcbinfo->ipi_porthashmask)]; /* * Add entry to load balance group. * Only do this if SO_REUSEPORT_LB is set. */ so_options = inp_so_options(inp); if (so_options & SO_REUSEPORT_LB) { int ret = in_pcbinslbgrouphash(inp); if (ret) { /* pcb lb group malloc fail (ret=ENOBUFS). */ return (ret); } } /* * Go through port list and look for a head for this lport. */ CK_LIST_FOREACH(phd, pcbporthash, phd_hash) { if (phd->phd_port == inp->inp_lport) break; } /* * If none exists, malloc one and tack it on. 
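/*
 * User-space sketch of the find-or-create pattern used just below for the
 * per-port head (struct inpcbport): search the port-hash chain for an entry
 * matching the local port and allocate one if it is missing.  The toy
 * structure and plain malloc() are illustrative stand-ins.
 */
#include <stdint.h>
#include <stdlib.h>

struct toy_porthead {
	uint16_t port;			/* local port, network order */
	struct toy_porthead *next;	/* port-hash chain link */
};

static struct toy_porthead *
toy_port_find_or_create(struct toy_porthead **bucket, uint16_t port)
{
	struct toy_porthead *phd;

	for (phd = *bucket; phd != NULL; phd = phd->next)
		if (phd->port == port)
			return (phd);

	phd = malloc(sizeof(*phd));
	if (phd == NULL)
		return (NULL);		/* caller maps this to ENOBUFS */
	phd->port = port;
	phd->next = *bucket;		/* insert at head of the chain */
	*bucket = phd;
	return (phd);
}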
*/ if (phd == NULL) { phd = malloc(sizeof(struct inpcbport), M_PCB, M_NOWAIT); if (phd == NULL) { return (ENOBUFS); /* XXX */ } bzero(&phd->phd_epoch_ctx, sizeof(struct epoch_context)); phd->phd_port = inp->inp_lport; CK_LIST_INIT(&phd->phd_pcblist); CK_LIST_INSERT_HEAD(pcbporthash, phd, phd_hash); } inp->inp_phd = phd; CK_LIST_INSERT_HEAD(&phd->phd_pcblist, inp, inp_portlist); CK_LIST_INSERT_HEAD(pcbhash, inp, inp_hash); inp->inp_flags |= INP_INHASHLIST; #ifdef PCBGROUP if (do_pcbgroup_update) in_pcbgroup_update(inp); #endif return (0); } /* * For now, there are two public interfaces to insert an inpcb into the hash * lists -- one that does update pcbgroups, and one that doesn't. The latter * is used only in the TCP syncache, where in_pcbinshash is called before the * full 4-tuple is set for the inpcb, and we don't want to install in the * pcbgroup until later. * * XXXRW: This seems like a misfeature. in_pcbinshash should always update * connection groups, and partially initialised inpcbs should not be exposed * to either reservation hash tables or pcbgroups. */ int in_pcbinshash(struct inpcb *inp) { return (in_pcbinshash_internal(inp, 1)); } int in_pcbinshash_nopcbgroup(struct inpcb *inp) { return (in_pcbinshash_internal(inp, 0)); } /* * Move PCB to the proper hash bucket when { faddr, fport } have been * changed. NOTE: This does not handle the case of the lport changing (the * hashed port list would have to be updated as well), so the lport must * not change after in_pcbinshash() has been called. */ void in_pcbrehash_mbuf(struct inpcb *inp, struct mbuf *m) { struct inpcbinfo *pcbinfo = inp->inp_pcbinfo; struct inpcbhead *head; u_int32_t hashkey_faddr; INP_WLOCK_ASSERT(inp); INP_HASH_WLOCK_ASSERT(pcbinfo); KASSERT(inp->inp_flags & INP_INHASHLIST, ("in_pcbrehash: !INP_INHASHLIST")); #ifdef INET6 if (inp->inp_vflag & INP_IPV6) hashkey_faddr = INP6_PCBHASHKEY(&inp->in6p_faddr); else #endif hashkey_faddr = inp->inp_faddr.s_addr; head = &pcbinfo->ipi_hashbase[INP_PCBHASH(hashkey_faddr, inp->inp_lport, inp->inp_fport, pcbinfo->ipi_hashmask)]; CK_LIST_REMOVE(inp, inp_hash); CK_LIST_INSERT_HEAD(head, inp, inp_hash); #ifdef PCBGROUP if (m != NULL) in_pcbgroup_update_mbuf(inp, m); else in_pcbgroup_update(inp); #endif } void in_pcbrehash(struct inpcb *inp) { in_pcbrehash_mbuf(inp, NULL); } /* * Remove PCB from various lists. */ static void in_pcbremlists(struct inpcb *inp) { struct inpcbinfo *pcbinfo = inp->inp_pcbinfo; #ifdef INVARIANTS if (pcbinfo == &V_tcbinfo) { INP_INFO_RLOCK_ASSERT(pcbinfo); } else { INP_INFO_WLOCK_ASSERT(pcbinfo); } #endif INP_WLOCK_ASSERT(inp); INP_LIST_WLOCK_ASSERT(pcbinfo); inp->inp_gencnt = ++pcbinfo->ipi_gencnt; if (inp->inp_flags & INP_INHASHLIST) { struct inpcbport *phd = inp->inp_phd; INP_HASH_WLOCK(pcbinfo); /* XXX: Only do if SO_REUSEPORT_LB set? */ in_pcbremlbgrouphash(inp); CK_LIST_REMOVE(inp, inp_hash); CK_LIST_REMOVE(inp, inp_portlist); if (CK_LIST_FIRST(&phd->phd_pcblist) == NULL) { CK_LIST_REMOVE(phd, phd_hash); epoch_call(net_epoch_preempt, &phd->phd_epoch_ctx, inpcbport_free); } INP_HASH_WUNLOCK(pcbinfo); inp->inp_flags &= ~INP_INHASHLIST; } CK_LIST_REMOVE(inp, inp_list); pcbinfo->ipi_count--; #ifdef PCBGROUP in_pcbgroup_remove(inp); #endif } /* * Check for alternatives when higher level complains * about service problems. For now, invalidate cached * routing information. If the route was created dynamically * (by a redirect), time to try a default gateway again. 
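/*
 * Sketch of the rehash step performed by in_pcbrehash_mbuf() above: when the
 * foreign address/port of an entry changes, it is unlinked from its current
 * chain and re-linked into the bucket selected by the new key.  Toy
 * doubly-linked chains keep the example short; bucket selection simply masks
 * the new key, as the kernel does with its hash mask.
 */
#include <stddef.h>
#include <stdint.h>

struct toy_entry {
	uint32_t key;				/* derived from faddr/ports */
	struct toy_entry *next, **prevp;	/* chain linkage */
};

static void
toy_unlink(struct toy_entry *e)
{
	*e->prevp = e->next;
	if (e->next != NULL)
		e->next->prevp = e->prevp;
}

static void
toy_link_head(struct toy_entry **bucket, struct toy_entry *e)
{
	e->next = *bucket;
	if (e->next != NULL)
		e->next->prevp = &e->next;
	e->prevp = bucket;
	*bucket = e;
}

static void
toy_rehash(struct toy_entry **table, uint32_t mask, struct toy_entry *e,
    uint32_t newkey)
{
	toy_unlink(e);				/* remove from old bucket */
	e->key = newkey;
	toy_link_head(&table[newkey & mask], e);	/* insert into new one */
}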
*/ void in_losing(struct inpcb *inp) { RO_INVALIDATE_CACHE(&inp->inp_route); return; } /* * A set label operation has occurred at the socket layer, propagate the * label change into the in_pcb for the socket. */ void in_pcbsosetlabel(struct socket *so) { #ifdef MAC struct inpcb *inp; inp = sotoinpcb(so); KASSERT(inp != NULL, ("in_pcbsosetlabel: so->so_pcb == NULL")); INP_WLOCK(inp); SOCK_LOCK(so); mac_inpcb_sosetlabel(so, inp); SOCK_UNLOCK(so); INP_WUNLOCK(inp); #endif } /* * ipport_tick runs once per second, determining if random port allocation * should be continued. If more than ipport_randomcps ports have been * allocated in the last second, then we return to sequential port * allocation. We return to random allocation only once we drop below * ipport_randomcps for at least ipport_randomtime seconds. */ static void ipport_tick(void *xtp) { VNET_ITERATOR_DECL(vnet_iter); VNET_LIST_RLOCK_NOSLEEP(); VNET_FOREACH(vnet_iter) { CURVNET_SET(vnet_iter); /* XXX appease INVARIANTS here */ if (V_ipport_tcpallocs <= V_ipport_tcplastcount + V_ipport_randomcps) { if (V_ipport_stoprandom > 0) V_ipport_stoprandom--; } else V_ipport_stoprandom = V_ipport_randomtime; V_ipport_tcplastcount = V_ipport_tcpallocs; CURVNET_RESTORE(); } VNET_LIST_RUNLOCK_NOSLEEP(); callout_reset(&ipport_tick_callout, hz, ipport_tick, NULL); } static void ip_fini(void *xtp) { callout_stop(&ipport_tick_callout); } /* * The ipport_callout should start running at about the time we attach the * inet or inet6 domains. */ static void ipport_tick_init(const void *unused __unused) { /* Start ipport_tick. */ callout_init(&ipport_tick_callout, 1); callout_reset(&ipport_tick_callout, 1, ipport_tick, NULL); EVENTHANDLER_REGISTER(shutdown_pre_sync, ip_fini, NULL, SHUTDOWN_PRI_DEFAULT); } SYSINIT(ipport_tick_init, SI_SUB_PROTO_DOMAIN, SI_ORDER_MIDDLE, ipport_tick_init, NULL); void inp_wlock(struct inpcb *inp) { INP_WLOCK(inp); } void inp_wunlock(struct inpcb *inp) { INP_WUNLOCK(inp); } void inp_rlock(struct inpcb *inp) { INP_RLOCK(inp); } void inp_runlock(struct inpcb *inp) { INP_RUNLOCK(inp); } #ifdef INVARIANT_SUPPORT void inp_lock_assert(struct inpcb *inp) { INP_WLOCK_ASSERT(inp); } void inp_unlock_assert(struct inpcb *inp) { INP_UNLOCK_ASSERT(inp); } #endif void inp_apply_all(void (*func)(struct inpcb *, void *), void *arg) { struct inpcb *inp; INP_INFO_WLOCK(&V_tcbinfo); CK_LIST_FOREACH(inp, V_tcbinfo.ipi_listhead, inp_list) { INP_WLOCK(inp); func(inp, arg); INP_WUNLOCK(inp); } INP_INFO_WUNLOCK(&V_tcbinfo); } struct socket * inp_inpcbtosocket(struct inpcb *inp) { INP_WLOCK_ASSERT(inp); return (inp->inp_socket); } struct tcpcb * inp_inpcbtotcpcb(struct inpcb *inp) { INP_WLOCK_ASSERT(inp); return ((struct tcpcb *)inp->inp_ppcb); } int inp_ip_tos_get(const struct inpcb *inp) { return (inp->inp_ip_tos); } void inp_ip_tos_set(struct inpcb *inp, int val) { inp->inp_ip_tos = val; } void inp_4tuple_get(struct inpcb *inp, uint32_t *laddr, uint16_t *lp, uint32_t *faddr, uint16_t *fp) { INP_LOCK_ASSERT(inp); *laddr = inp->inp_laddr.s_addr; *faddr = inp->inp_faddr.s_addr; *lp = inp->inp_lport; *fp = inp->inp_fport; } struct inpcb * so_sotoinpcb(struct socket *so) { return (sotoinpcb(so)); } struct tcpcb * so_sototcpcb(struct socket *so) { return (sototcpcb(so)); } /* * Create an external-format (``xinpcb'') structure using the information in * the kernel-format in_pcb structure pointed to by inp. 
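/*
 * Stand-alone sketch of the once-per-second decision made by ipport_tick()
 * above.  The state names mirror the vnet variables; the structure and any
 * values passed in are examples only.
 */
#include <stdint.h>

struct toy_portrandom {
	uint32_t allocs;	/* total ephemeral ports handed out */
	uint32_t lastcount;	/* snapshot taken one second ago */
	uint32_t randomcps;	/* allowed allocations/second while random */
	uint32_t randomtime;	/* seconds to stay sequential after a burst */
	uint32_t stoprandom;	/* >0: use sequential allocation */
};

static void
toy_port_tick(struct toy_portrandom *s)
{
	if (s->allocs <= s->lastcount + s->randomcps) {
		/* Quiet second: count down toward random allocation again. */
		if (s->stoprandom > 0)
			s->stoprandom--;
	} else {
		/* Burst detected: stay sequential for randomtime seconds. */
		s->stoprandom = s->randomtime;
	}
	s->lastcount = s->allocs;
}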
This is done to * reduce the spew of irrelevant information over this interface, to isolate * user code from changes in the kernel structure, and potentially to provide * information-hiding if we decide that some of this information should be * hidden from users. */ void in_pcbtoxinpcb(const struct inpcb *inp, struct xinpcb *xi) { bzero(xi, sizeof(*xi)); xi->xi_len = sizeof(struct xinpcb); if (inp->inp_socket) sotoxsocket(inp->inp_socket, &xi->xi_socket); bcopy(&inp->inp_inc, &xi->inp_inc, sizeof(struct in_conninfo)); xi->inp_gencnt = inp->inp_gencnt; xi->inp_ppcb = (uintptr_t)inp->inp_ppcb; xi->inp_flow = inp->inp_flow; xi->inp_flowid = inp->inp_flowid; xi->inp_flowtype = inp->inp_flowtype; xi->inp_flags = inp->inp_flags; xi->inp_flags2 = inp->inp_flags2; xi->inp_rss_listen_bucket = inp->inp_rss_listen_bucket; xi->in6p_cksum = inp->in6p_cksum; xi->in6p_hops = inp->in6p_hops; xi->inp_ip_tos = inp->inp_ip_tos; xi->inp_vflag = inp->inp_vflag; xi->inp_ip_ttl = inp->inp_ip_ttl; xi->inp_ip_p = inp->inp_ip_p; xi->inp_ip_minttl = inp->inp_ip_minttl; } #ifdef DDB static void db_print_indent(int indent) { int i; for (i = 0; i < indent; i++) db_printf(" "); } static void db_print_inconninfo(struct in_conninfo *inc, const char *name, int indent) { char faddr_str[48], laddr_str[48]; db_print_indent(indent); db_printf("%s at %p\n", name, inc); indent += 2; #ifdef INET6 if (inc->inc_flags & INC_ISIPV6) { /* IPv6. */ ip6_sprintf(laddr_str, &inc->inc6_laddr); ip6_sprintf(faddr_str, &inc->inc6_faddr); } else #endif { /* IPv4. */ inet_ntoa_r(inc->inc_laddr, laddr_str); inet_ntoa_r(inc->inc_faddr, faddr_str); } db_print_indent(indent); db_printf("inc_laddr %s inc_lport %u\n", laddr_str, ntohs(inc->inc_lport)); db_print_indent(indent); db_printf("inc_faddr %s inc_fport %u\n", faddr_str, ntohs(inc->inc_fport)); } static void db_print_inpflags(int inp_flags) { int comma; comma = 0; if (inp_flags & INP_RECVOPTS) { db_printf("%sINP_RECVOPTS", comma ? ", " : ""); comma = 1; } if (inp_flags & INP_RECVRETOPTS) { db_printf("%sINP_RECVRETOPTS", comma ? ", " : ""); comma = 1; } if (inp_flags & INP_RECVDSTADDR) { db_printf("%sINP_RECVDSTADDR", comma ? ", " : ""); comma = 1; } if (inp_flags & INP_ORIGDSTADDR) { db_printf("%sINP_ORIGDSTADDR", comma ? ", " : ""); comma = 1; } if (inp_flags & INP_HDRINCL) { db_printf("%sINP_HDRINCL", comma ? ", " : ""); comma = 1; } if (inp_flags & INP_HIGHPORT) { db_printf("%sINP_HIGHPORT", comma ? ", " : ""); comma = 1; } if (inp_flags & INP_LOWPORT) { db_printf("%sINP_LOWPORT", comma ? ", " : ""); comma = 1; } if (inp_flags & INP_ANONPORT) { db_printf("%sINP_ANONPORT", comma ? ", " : ""); comma = 1; } if (inp_flags & INP_RECVIF) { db_printf("%sINP_RECVIF", comma ? ", " : ""); comma = 1; } if (inp_flags & INP_MTUDISC) { db_printf("%sINP_MTUDISC", comma ? ", " : ""); comma = 1; } if (inp_flags & INP_RECVTTL) { db_printf("%sINP_RECVTTL", comma ? ", " : ""); comma = 1; } if (inp_flags & INP_DONTFRAG) { db_printf("%sINP_DONTFRAG", comma ? ", " : ""); comma = 1; } if (inp_flags & INP_RECVTOS) { db_printf("%sINP_RECVTOS", comma ? ", " : ""); comma = 1; } if (inp_flags & IN6P_IPV6_V6ONLY) { db_printf("%sIN6P_IPV6_V6ONLY", comma ? ", " : ""); comma = 1; } if (inp_flags & IN6P_PKTINFO) { db_printf("%sIN6P_PKTINFO", comma ? ", " : ""); comma = 1; } if (inp_flags & IN6P_HOPLIMIT) { db_printf("%sIN6P_HOPLIMIT", comma ? ", " : ""); comma = 1; } if (inp_flags & IN6P_HOPOPTS) { db_printf("%sIN6P_HOPOPTS", comma ? 
", " : ""); comma = 1; } if (inp_flags & IN6P_DSTOPTS) { db_printf("%sIN6P_DSTOPTS", comma ? ", " : ""); comma = 1; } if (inp_flags & IN6P_RTHDR) { db_printf("%sIN6P_RTHDR", comma ? ", " : ""); comma = 1; } if (inp_flags & IN6P_RTHDRDSTOPTS) { db_printf("%sIN6P_RTHDRDSTOPTS", comma ? ", " : ""); comma = 1; } if (inp_flags & IN6P_TCLASS) { db_printf("%sIN6P_TCLASS", comma ? ", " : ""); comma = 1; } if (inp_flags & IN6P_AUTOFLOWLABEL) { db_printf("%sIN6P_AUTOFLOWLABEL", comma ? ", " : ""); comma = 1; } if (inp_flags & INP_TIMEWAIT) { db_printf("%sINP_TIMEWAIT", comma ? ", " : ""); comma = 1; } if (inp_flags & INP_ONESBCAST) { db_printf("%sINP_ONESBCAST", comma ? ", " : ""); comma = 1; } if (inp_flags & INP_DROPPED) { db_printf("%sINP_DROPPED", comma ? ", " : ""); comma = 1; } if (inp_flags & INP_SOCKREF) { db_printf("%sINP_SOCKREF", comma ? ", " : ""); comma = 1; } if (inp_flags & IN6P_RFC2292) { db_printf("%sIN6P_RFC2292", comma ? ", " : ""); comma = 1; } if (inp_flags & IN6P_MTU) { db_printf("IN6P_MTU%s", comma ? ", " : ""); comma = 1; } } static void db_print_inpvflag(u_char inp_vflag) { int comma; comma = 0; if (inp_vflag & INP_IPV4) { db_printf("%sINP_IPV4", comma ? ", " : ""); comma = 1; } if (inp_vflag & INP_IPV6) { db_printf("%sINP_IPV6", comma ? ", " : ""); comma = 1; } if (inp_vflag & INP_IPV6PROTO) { db_printf("%sINP_IPV6PROTO", comma ? ", " : ""); comma = 1; } } static void db_print_inpcb(struct inpcb *inp, const char *name, int indent) { db_print_indent(indent); db_printf("%s at %p\n", name, inp); indent += 2; db_print_indent(indent); db_printf("inp_flow: 0x%x\n", inp->inp_flow); db_print_inconninfo(&inp->inp_inc, "inp_conninfo", indent); db_print_indent(indent); db_printf("inp_ppcb: %p inp_pcbinfo: %p inp_socket: %p\n", inp->inp_ppcb, inp->inp_pcbinfo, inp->inp_socket); db_print_indent(indent); db_printf("inp_label: %p inp_flags: 0x%x (", inp->inp_label, inp->inp_flags); db_print_inpflags(inp->inp_flags); db_printf(")\n"); db_print_indent(indent); db_printf("inp_sp: %p inp_vflag: 0x%x (", inp->inp_sp, inp->inp_vflag); db_print_inpvflag(inp->inp_vflag); db_printf(")\n"); db_print_indent(indent); db_printf("inp_ip_ttl: %d inp_ip_p: %d inp_ip_minttl: %d\n", inp->inp_ip_ttl, inp->inp_ip_p, inp->inp_ip_minttl); db_print_indent(indent); #ifdef INET6 if (inp->inp_vflag & INP_IPV6) { db_printf("in6p_options: %p in6p_outputopts: %p " "in6p_moptions: %p\n", inp->in6p_options, inp->in6p_outputopts, inp->in6p_moptions); db_printf("in6p_icmp6filt: %p in6p_cksum %d " "in6p_hops %u\n", inp->in6p_icmp6filt, inp->in6p_cksum, inp->in6p_hops); } else #endif { db_printf("inp_ip_tos: %d inp_ip_options: %p " "inp_ip_moptions: %p\n", inp->inp_ip_tos, inp->inp_options, inp->inp_moptions); } db_print_indent(indent); db_printf("inp_phd: %p inp_gencnt: %ju\n", inp->inp_phd, (uintmax_t)inp->inp_gencnt); } DB_SHOW_COMMAND(inpcb, db_show_inpcb) { struct inpcb *inp; if (!have_addr) { db_printf("usage: show inpcb \n"); return; } inp = (struct inpcb *)addr; db_print_inpcb(inp, "inpcb", 0); } #endif /* DDB */ #ifdef RATELIMIT /* * Modify TX rate limit based on the existing "inp->inp_snd_tag", * if any. 
*/ int in_pcbmodify_txrtlmt(struct inpcb *inp, uint32_t max_pacing_rate) { union if_snd_tag_modify_params params = { .rate_limit.max_rate = max_pacing_rate, }; struct m_snd_tag *mst; struct ifnet *ifp; int error; mst = inp->inp_snd_tag; if (mst == NULL) return (EINVAL); ifp = mst->ifp; if (ifp == NULL) return (EINVAL); if (ifp->if_snd_tag_modify == NULL) { error = EOPNOTSUPP; } else { error = ifp->if_snd_tag_modify(mst, &params); } return (error); } /* * Query existing TX rate limit based on the existing * "inp->inp_snd_tag", if any. */ int in_pcbquery_txrtlmt(struct inpcb *inp, uint32_t *p_max_pacing_rate) { union if_snd_tag_query_params params = { }; struct m_snd_tag *mst; struct ifnet *ifp; int error; mst = inp->inp_snd_tag; if (mst == NULL) return (EINVAL); ifp = mst->ifp; if (ifp == NULL) return (EINVAL); if (ifp->if_snd_tag_query == NULL) { error = EOPNOTSUPP; } else { error = ifp->if_snd_tag_query(mst, &params); if (error == 0 && p_max_pacing_rate != NULL) *p_max_pacing_rate = params.rate_limit.max_rate; } return (error); } /* * Query existing TX queue level based on the existing * "inp->inp_snd_tag", if any. */ int in_pcbquery_txrlevel(struct inpcb *inp, uint32_t *p_txqueue_level) { union if_snd_tag_query_params params = { }; struct m_snd_tag *mst; struct ifnet *ifp; int error; mst = inp->inp_snd_tag; if (mst == NULL) return (EINVAL); ifp = mst->ifp; if (ifp == NULL) return (EINVAL); if (ifp->if_snd_tag_query == NULL) return (EOPNOTSUPP); error = ifp->if_snd_tag_query(mst, &params); if (error == 0 && p_txqueue_level != NULL) *p_txqueue_level = params.rate_limit.queue_level; return (error); } /* * Allocate a new TX rate limit send tag from the network interface * given by the "ifp" argument and save it in "inp->inp_snd_tag": */ int in_pcbattach_txrtlmt(struct inpcb *inp, struct ifnet *ifp, uint32_t flowtype, uint32_t flowid, uint32_t max_pacing_rate) { union if_snd_tag_alloc_params params = { .rate_limit.hdr.type = (max_pacing_rate == -1U) ? IF_SND_TAG_TYPE_UNLIMITED : IF_SND_TAG_TYPE_RATE_LIMIT, .rate_limit.hdr.flowid = flowid, .rate_limit.hdr.flowtype = flowtype, .rate_limit.max_rate = max_pacing_rate, }; int error; INP_WLOCK_ASSERT(inp); if (inp->inp_snd_tag != NULL) return (EINVAL); if (ifp->if_snd_tag_alloc == NULL) { error = EOPNOTSUPP; } else { error = ifp->if_snd_tag_alloc(ifp, &params, &inp->inp_snd_tag); /* * At success increment the refcount on * the send tag's network interface: */ if (error == 0) if_ref(inp->inp_snd_tag->ifp); } return (error); } /* * Free an existing TX rate limit tag based on the "inp->inp_snd_tag", * if any: */ void in_pcbdetach_txrtlmt(struct inpcb *inp) { struct m_snd_tag *mst; struct ifnet *ifp; INP_WLOCK_ASSERT(inp); mst = inp->inp_snd_tag; inp->inp_snd_tag = NULL; if (mst == NULL) return; ifp = mst->ifp; if (ifp == NULL) return; /* * If the device was detached while we still had reference(s) * on the ifp, we assume if_snd_tag_free() was replaced with * stubs. */ ifp->if_snd_tag_free(mst); /* release reference count on network interface */ if_rele(ifp); } /* * This function should be called when the INP_RATE_LIMIT_CHANGED flag * is set in the fast path and will attach/detach/modify the TX rate * limit send tag based on the socket's so_max_pacing_rate value.
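/*
 * The rate-limit helpers above treat the interface's send-tag methods as
 * optional: a NULL method pointer means the driver does not implement the
 * operation and EOPNOTSUPP is returned.  A hedged user-space sketch of that
 * convention, with an invented method table:
 */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct toy_ifops {
	/* Optional: may be left NULL by drivers without rate limiting. */
	int	(*modify_rate)(void *tag, uint64_t max_rate);
};

static int
toy_modify_rate(const struct toy_ifops *ops, void *tag, uint64_t max_rate)
{
	if (ops->modify_rate == NULL)
		return (EOPNOTSUPP);	/* driver lacks the capability */
	return (ops->modify_rate(tag, max_rate));
}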
*/ void in_pcboutput_txrtlmt(struct inpcb *inp, struct ifnet *ifp, struct mbuf *mb) { struct socket *socket; uint32_t max_pacing_rate; bool did_upgrade; int error; if (inp == NULL) return; socket = inp->inp_socket; if (socket == NULL) return; if (!INP_WLOCKED(inp)) { /* * NOTE: If the write locking fails, we need to bail * out and use the non-ratelimited ring for the * transmit until there is a new chance to get the * write lock. */ if (!INP_TRY_UPGRADE(inp)) return; did_upgrade = 1; } else { did_upgrade = 0; } /* * NOTE: The so_max_pacing_rate value is read unlocked, * because atomic updates are not required since the variable * is checked at every mbuf we send. It is assumed that the * variable read itself will be atomic. */ max_pacing_rate = socket->so_max_pacing_rate; /* * NOTE: When attaching to a network interface a reference is * made to ensure the network interface doesn't go away until * all ratelimit connections are gone. The network interface * pointers compared below represent valid network interfaces, * except when comparing towards NULL. */ if (max_pacing_rate == 0 && inp->inp_snd_tag == NULL) { error = 0; } else if (!(ifp->if_capenable & IFCAP_TXRTLMT)) { if (inp->inp_snd_tag != NULL) in_pcbdetach_txrtlmt(inp); error = 0; } else if (inp->inp_snd_tag == NULL) { /* * In order to utilize packet pacing with RSS, we need * to wait until there is a valid RSS hash before we * can proceed: */ if (M_HASHTYPE_GET(mb) == M_HASHTYPE_NONE) { error = EAGAIN; } else { error = in_pcbattach_txrtlmt(inp, ifp, M_HASHTYPE_GET(mb), mb->m_pkthdr.flowid, max_pacing_rate); } } else { error = in_pcbmodify_txrtlmt(inp, max_pacing_rate); } if (error == 0 || error == EOPNOTSUPP) inp->inp_flags2 &= ~INP_RATE_LIMIT_CHANGED; if (did_upgrade) INP_DOWNGRADE(inp); } /* * Track route changes for TX rate limiting. */ void in_pcboutput_eagain(struct inpcb *inp) { bool did_upgrade; if (inp == NULL) return; if (inp->inp_snd_tag == NULL) return; if (!INP_WLOCKED(inp)) { /* * NOTE: If the write locking fails, we need to bail * out and use the non-ratelimited ring for the * transmit until there is a new chance to get the * write lock. */ if (!INP_TRY_UPGRADE(inp)) return; did_upgrade = 1; } else { did_upgrade = 0; } /* detach rate limiting */ in_pcbdetach_txrtlmt(inp); /* make sure new mbuf send tag allocation is made */ inp->inp_flags2 |= INP_RATE_LIMIT_CHANGED; if (did_upgrade) INP_DOWNGRADE(inp); } #endif /* RATELIMIT */ Index: user/ngie/bug-237403/sys/netinet/in_pcb.h =================================================================== --- user/ngie/bug-237403/sys/netinet/in_pcb.h (revision 346925) +++ user/ngie/bug-237403/sys/netinet/in_pcb.h (revision 346926) @@ -1,894 +1,894 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1982, 1986, 1990, 1993 * The Regents of the University of California. * Copyright (c) 2010-2011 Juniper Networks, Inc. * All rights reserved. * * Portions of this software were developed by Robert N. M. Watson under * contract to Juniper Networks, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. 
Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)in_pcb.h 8.1 (Berkeley) 6/10/93 * $FreeBSD$ */ #ifndef _NETINET_IN_PCB_H_ #define _NETINET_IN_PCB_H_ #include #include #include #include #include #include #ifdef _KERNEL #include #include #include #include #include #include #endif #include #define in6pcb inpcb /* for KAME src sync over BSD*'s */ #define in6p_sp inp_sp /* for KAME src sync over BSD*'s */ /* * struct inpcb is the common protocol control block structure used in most * IP transport protocols. * * Pointers to local and foreign host table entries, local and foreign socket * numbers, and pointers up (to a socket structure) and down (to a * protocol-specific control block) are stored here. */ CK_LIST_HEAD(inpcbhead, inpcb); CK_LIST_HEAD(inpcbporthead, inpcbport); CK_LIST_HEAD(inpcblbgrouphead, inpcblbgroup); typedef uint64_t inp_gen_t; /* * PCB with AF_INET6 null bind'ed laddr can receive AF_INET input packet. * So, AF_INET6 null laddr is also used as AF_INET null laddr, by utilizing * the following structure. */ struct in_addr_4in6 { u_int32_t ia46_pad32[3]; struct in_addr ia46_addr4; }; union in_dependaddr { struct in_addr_4in6 id46_addr; struct in6_addr id6_addr; }; /* * NOTE: ipv6 addrs should be 64-bit aligned, per RFC 2553. in_conninfo has * some extra padding to accomplish this. * NOTE 2: tcp_syncache.c uses first 5 32-bit words, which identify fport, * lport, faddr to generate hash, so these fields shouldn't be moved. */ struct in_endpoints { u_int16_t ie_fport; /* foreign port */ u_int16_t ie_lport; /* local port */ /* protocol dependent part, local and foreign addr */ union in_dependaddr ie_dependfaddr; /* foreign host table entry */ union in_dependaddr ie_dependladdr; /* local host table entry */ #define ie_faddr ie_dependfaddr.id46_addr.ia46_addr4 #define ie_laddr ie_dependladdr.id46_addr.ia46_addr4 #define ie6_faddr ie_dependfaddr.id6_addr #define ie6_laddr ie_dependladdr.id6_addr u_int32_t ie6_zoneid; /* scope zone id */ }; /* * XXX The defines for inc_* are hacks and should be changed to direct * references. */ struct in_conninfo { u_int8_t inc_flags; u_int8_t inc_len; u_int16_t inc_fibnum; /* XXX was pad, 16 bits is plenty */ /* protocol dependent part */ struct in_endpoints inc_ie; }; /* * Flags for inc_flags. 
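/*
 * Sketch verifying the layout trick behind in_addr_4in6 above: three words
 * of padding push the 4-byte IPv4 address into the last four bytes of a
 * 16-byte slot, so it overlays the low-order words of the IPv6 address in
 * union in_dependaddr.  Plain fixed-width types stand in for the kernel
 * structures; the assertions are the point of the example.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_addr_4in6 {
	uint32_t pad32[3];	/* ia46_pad32 */
	uint32_t addr4;		/* ia46_addr4 (IPv4 address) */
};

union toy_dependaddr {
	struct toy_addr_4in6 id46;
	uint8_t id6[16];	/* IPv6 address */
};

static_assert(sizeof(union toy_dependaddr) == 16,
    "IPv4 form must occupy the full IPv6 slot");
static_assert(offsetof(struct toy_addr_4in6, addr4) == 12,
    "IPv4 address must land in the last four bytes");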
*/ #define INC_ISIPV6 0x01 #define INC_IPV6MINMTU 0x02 #define inc_fport inc_ie.ie_fport #define inc_lport inc_ie.ie_lport #define inc_faddr inc_ie.ie_faddr #define inc_laddr inc_ie.ie_laddr #define inc6_faddr inc_ie.ie6_faddr #define inc6_laddr inc_ie.ie6_laddr #define inc6_zoneid inc_ie.ie6_zoneid #if defined(_KERNEL) || defined(_WANT_INPCB) /* * struct inpcb captures the network layer state for TCP, UDP, and raw IPv4 and * IPv6 sockets. In the case of TCP and UDP, further per-connection state is * hung off of inp_ppcb most of the time. Almost all fields of struct inpcb * are static after creation or protected by a per-inpcb rwlock, inp_lock. A * few fields are protected by multiple locks as indicated in the locking notes * below. For these fields, all of the listed locks must be write-locked for * any modifications. However, these fields can be safely read while any one of * the listed locks are read-locked. This model can permit greater concurrency * for read operations. For example, connections can be looked up while only * holding a read lock on the global pcblist lock. This is important for * performance when attempting to find the connection for a packet given its IP * and port tuple. * * One noteworthy exception is that the global pcbinfo lock follows a different * set of rules in relation to the inp_list field. Rather than being * write-locked for modifications and read-locked for list iterations, it must * be read-locked during modifications and write-locked during list iterations. * This ensures that the relatively rare global list iterations safely walk a * stable snapshot of connections while allowing more common list modifications * to safely grab the pcblist lock just while adding or removing a connection * from the global list. * * Key: * (b) - Protected by the hpts lock. * (c) - Constant after initialization * (e) - Protected by the net_epoch_prempt epoch * (g) - Protected by the pcbgroup lock * (i) - Protected by the inpcb lock * (p) - Protected by the pcbinfo lock for the inpcb * (l) - Protected by the pcblist lock for the inpcb * (h) - Protected by the pcbhash lock for the inpcb * (s) - Protected by another subsystem's locks * (x) - Undefined locking * * Notes on the tcp_hpts: * * First Hpts lock order is * 1) INP_WLOCK() * 2) HPTS_LOCK() i.e. hpts->pmtx * * To insert a TCB on the hpts you *must* be holding the INP_WLOCK(). * You may check the inp->inp_in_hpts flag without the hpts lock. * The hpts is the only one that will clear this flag holding * only the hpts lock. This means that in your tcp_output() * routine when you test for the inp_in_hpts flag to be 1 * it may be transitioning to 0 (by the hpts). * That's ok since that will just mean an extra call to tcp_output * that most likely will find the call you executed * (when the mis-match occured) will have put the TCB back * on the hpts and it will return. If your * call did not add the inp back to the hpts then you will either * over-send or the cwnd will block you from sending more. * * Note you should also be holding the INP_WLOCK() when you * call the remove from the hpts as well. Though usually * you are either doing this from a timer, where you need and have * the INP_WLOCK() or from destroying your TCB where again * you should already have the INP_WLOCK(). * * The inp_hpts_cpu, inp_hpts_cpu_set, inp_input_cpu and * inp_input_cpu_set fields are controlled completely by * the hpts. Do not ever set these. 
The inp_hpts_cpu_set * and inp_input_cpu_set fields indicate if the hpts has * setup the respective cpu field. It is advised if this * field is 0, to enqueue the packet with the appropriate * hpts_immediate() call. If the _set field is 1, then * you may compare the inp_*_cpu field to the curcpu and * may want to again insert onto the hpts if these fields * are not equal (i.e. you are not on the expected CPU). * * A note on inp_hpts_calls and inp_input_calls, these * flags are set when the hpts calls either the output * or do_segment routines respectively. If the routine * being called wants to use this, then it needs to * clear the flag before returning. The hpts will not * clear the flag. The flags can be used to tell if * the hpts is the function calling the respective * routine. * * A few other notes: * * When a read lock is held, stability of the field is guaranteed; to write * to a field, a write lock must generally be held. * * netinet/netinet6-layer code should not assume that the inp_socket pointer * is safe to dereference without inp_lock being held, even for protocols * other than TCP (where the inpcb persists during TIMEWAIT even after the * socket has been freed), or there may be close(2)-related races. * * The inp_vflag field is overloaded, and would otherwise ideally be (c). * * TODO: Currently only the TCP stack is leveraging the global pcbinfo lock * read-lock usage during modification, this model can be applied to other * protocols (especially SCTP). */ struct icmp6_filter; struct inpcbpolicy; struct m_snd_tag; struct inpcb { /* Cache line #1 (amd64) */ CK_LIST_ENTRY(inpcb) inp_hash; /* [w](h/i) [r](e/i) hash list */ CK_LIST_ENTRY(inpcb) inp_pcbgrouphash; /* (g/i) hash list */ struct rwlock inp_lock; /* Cache line #2 (amd64) */ #define inp_start_zero inp_hpts #define inp_zero_size (sizeof(struct inpcb) - \ offsetof(struct inpcb, inp_start_zero)) TAILQ_ENTRY(inpcb) inp_hpts; /* pacing out queue next lock(b) */ uint32_t inp_hpts_request; /* Current hpts request, zero if * fits in the pacing window (i&b). */ /* * Note the next fields are protected by a * different lock (hpts-lock). This means that * they must correspond in size to the smallest * protectable bit field (uint8_t on x86, and * other platfomrs potentially uint32_t?). Also * since CPU switches can occur at different times the two * fields can *not* be collapsed into a signal bit field. 
*/ #if defined(__amd64__) || defined(__i386__) volatile uint8_t inp_in_hpts; /* on output hpts (lock b) */ volatile uint8_t inp_in_input; /* on input hpts (lock b) */ #else volatile uint32_t inp_in_hpts; /* on output hpts (lock b) */ volatile uint32_t inp_in_input; /* on input hpts (lock b) */ #endif volatile uint16_t inp_hpts_cpu; /* Lock (i) */ u_int inp_refcount; /* (i) refcount */ int inp_flags; /* (i) generic IP/datagram flags */ int inp_flags2; /* (i) generic IP/datagram flags #2*/ volatile uint16_t inp_input_cpu; /* Lock (i) */ volatile uint8_t inp_hpts_cpu_set :1, /* on output hpts (i) */ inp_input_cpu_set : 1, /* on input hpts (i) */ inp_hpts_calls :1, /* (i) from output hpts */ inp_input_calls :1, /* (i) from input hpts */ inp_spare_bits2 : 4; - uint8_t inp_spare_byte; /* Compiler hole */ + uint8_t inp_numa_domain; /* numa domain */ void *inp_ppcb; /* (i) pointer to per-protocol pcb */ struct socket *inp_socket; /* (i) back pointer to socket */ uint32_t inp_hptsslot; /* Hpts wheel slot this tcb is Lock(i&b) */ uint32_t inp_hpts_drop_reas; /* reason we are dropping the PCB (lock i&b) */ TAILQ_ENTRY(inpcb) inp_input; /* pacing in queue next lock(b) */ struct inpcbinfo *inp_pcbinfo; /* (c) PCB list info */ struct inpcbgroup *inp_pcbgroup; /* (g/i) PCB group list */ CK_LIST_ENTRY(inpcb) inp_pcbgroup_wild; /* (g/i/h) group wildcard entry */ struct ucred *inp_cred; /* (c) cache of socket cred */ u_int32_t inp_flow; /* (i) IPv6 flow information */ u_char inp_vflag; /* (i) IP version flag (v4/v6) */ u_char inp_ip_ttl; /* (i) time to live proto */ u_char inp_ip_p; /* (c) protocol proto */ u_char inp_ip_minttl; /* (i) minimum TTL or drop */ uint32_t inp_flowid; /* (x) flow id / queue id */ struct m_snd_tag *inp_snd_tag; /* (i) send tag for outgoing mbufs */ uint32_t inp_flowtype; /* (x) M_HASHTYPE value */ uint32_t inp_rss_listen_bucket; /* (x) overridden RSS listen bucket */ /* Local and foreign ports, local and foreign addr. */ struct in_conninfo inp_inc; /* (i) list for PCB's local port */ /* MAC and IPSEC policy information. */ struct label *inp_label; /* (i) MAC label */ struct inpcbpolicy *inp_sp; /* (s) for IPSEC */ /* Protocol-dependent part; options. */ struct { u_char inp_ip_tos; /* (i) type of service proto */ struct mbuf *inp_options; /* (i) IP options */ struct ip_moptions *inp_moptions; /* (i) mcast options */ }; struct { /* (i) IP options */ struct mbuf *in6p_options; /* (i) IP6 options for outgoing packets */ struct ip6_pktopts *in6p_outputopts; /* (i) IP multicast options */ struct ip6_moptions *in6p_moptions; /* (i) ICMPv6 code type filter */ struct icmp6_filter *in6p_icmp6filt; /* (i) IPV6_CHECKSUM setsockopt */ int in6p_cksum; short in6p_hops; }; CK_LIST_ENTRY(inpcb) inp_portlist; /* (i/h) */ struct inpcbport *inp_phd; /* (i/h) head of this list */ inp_gen_t inp_gencnt; /* (c) generation count */ void *spare_ptr; /* Spare pointer. 
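/*
 * The protocol-dependent option blocks above are anonymous struct members
 * (a C11 feature), so their fields are addressed directly as inp->inp_ip_tos,
 * inp->in6p_hops, and so on, without an intermediate member name.  A minimal
 * stand-alone illustration with made-up field names:
 */
#include <stdint.h>

struct toy_pcb_opts {
	struct {			/* IPv4-specific options */
		uint8_t	 ip_tos;
		void	*ip_options;
	};
	struct {			/* IPv6-specific options */
		void	*ip6_options;
		int16_t	 ip6_hops;
	};
};

static int16_t
toy_get_hops(struct toy_pcb_opts *o)
{
	o->ip_tos = 0;			/* members of both anonymous structs */
	return (o->ip6_hops);		/* are reachable without a name */
}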
*/ rt_gen_t inp_rt_cookie; /* generation for route entry */ union { /* cached L3 information */ struct route inp_route; struct route_in6 inp_route6; }; CK_LIST_ENTRY(inpcb) inp_list; /* (p/l) list for all PCBs for proto */ /* (e[r]) for list iteration */ /* (p[w]/l) for addition/removal */ struct epoch_context inp_epoch_ctx; }; #endif /* _KERNEL */ #define inp_fport inp_inc.inc_fport #define inp_lport inp_inc.inc_lport #define inp_faddr inp_inc.inc_faddr #define inp_laddr inp_inc.inc_laddr #define in6p_faddr inp_inc.inc6_faddr #define in6p_laddr inp_inc.inc6_laddr #define in6p_zoneid inp_inc.inc6_zoneid #define in6p_flowinfo inp_flow #define inp_vnet inp_pcbinfo->ipi_vnet /* * The range of the generation count, as used in this implementation, is 9e19. * We would have to create 300 billion connections per second for this number * to roll over in a year. This seems sufficiently unlikely that we simply * don't concern ourselves with that possibility. */ /* * Interface exported to userland by various protocols which use inpcbs. Hack * alert -- only define if struct xsocket is in scope. * Fields prefixed with "xi_" are unique to this structure, and the rest * match fields in the struct inpcb, to ease coding and porting. * * Legend: * (s) - used by userland utilities in src * (p) - used by utilities in ports * (3) - is known to be used by third party software not in ports * (n) - no known usage */ #ifdef _SYS_SOCKETVAR_H_ struct xinpcb { ksize_t xi_len; /* length of this structure */ struct xsocket xi_socket; /* (s,p) */ struct in_conninfo inp_inc; /* (s,p) */ uint64_t inp_gencnt; /* (s,p) */ kvaddr_t inp_ppcb; /* (s) netstat(1) */ int64_t inp_spare64[4]; uint32_t inp_flow; /* (s) */ uint32_t inp_flowid; /* (s) */ uint32_t inp_flowtype; /* (s) */ int32_t inp_flags; /* (s,p) */ int32_t inp_flags2; /* (s) */ int32_t inp_rss_listen_bucket; /* (n) */ int32_t in6p_cksum; /* (n) */ int32_t inp_spare32[4]; uint16_t in6p_hops; /* (n) */ uint8_t inp_ip_tos; /* (n) */ int8_t pad8; uint8_t inp_vflag; /* (s,p) */ uint8_t inp_ip_ttl; /* (n) */ uint8_t inp_ip_p; /* (n) */ uint8_t inp_ip_minttl; /* (n) */ int8_t inp_spare8[4]; } __aligned(8); struct xinpgen { ksize_t xig_len; /* length of this structure */ u_int xig_count; /* number of PCBs at this time */ uint32_t _xig_spare32; inp_gen_t xig_gen; /* generation count at this time */ so_gen_t xig_sogen; /* socket generation count this time */ uint64_t _xig_spare64[4]; } __aligned(8); #ifdef _KERNEL void in_pcbtoxinpcb(const struct inpcb *, struct xinpcb *); #endif #endif /* _SYS_SOCKETVAR_H_ */ struct inpcbport { struct epoch_context phd_epoch_ctx; CK_LIST_ENTRY(inpcbport) phd_hash; struct inpcbhead phd_pcblist; u_short phd_port; }; struct in_pcblist { int il_count; struct epoch_context il_epoch_ctx; struct inpcbinfo *il_pcbinfo; struct inpcb *il_inp_list[0]; }; /*- * Global data structure for each high-level protocol (UDP, TCP, ...) in both * IPv4 and IPv6. Holds inpcb lists and information for managing them. 
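/*
 * The export structures above carry their own length (xig_len/xi_len) so
 * that userland can skip records whose layout it does not fully understand.
 * A hedged sketch of walking such a buffer; the record layout here is
 * invented and only the "advance by the embedded length" idea is taken from
 * the structures above.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_record {
	uint64_t len;		/* total size of this record, like xi_len */
	/* ... protocol fields follow ... */
};

static size_t
toy_count_records(const char *buf, size_t buflen)
{
	size_t off = 0, n = 0;
	struct toy_record rec;

	while (off + sizeof(rec) <= buflen) {
		memcpy(&rec, buf + off, sizeof(rec));
		if (rec.len < sizeof(rec) || rec.len > buflen - off)
			break;		/* malformed or truncated record */
		n++;
		off += rec.len;		/* skip the whole record */
	}
	return (n);
}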
* * Each pcbinfo is protected by three locks: ipi_lock, ipi_hash_lock and * ipi_list_lock: * - ipi_lock covering the global pcb list stability during loop iteration, * - ipi_hash_lock covering the hashed lookup tables, * - ipi_list_lock covering mutable global fields (such as the global * pcb list) * * The lock order is: * * ipi_lock (before) * inpcb locks (before) * ipi_list locks (before) * {ipi_hash_lock, pcbgroup locks} * * Locking key: * * (c) Constant or nearly constant after initialisation * (e) - Protected by the net_epoch_prempt epoch * (g) Locked by ipi_lock * (l) Locked by ipi_list_lock * (h) Read using either net_epoch_preempt or inpcb lock; write requires both ipi_hash_lock and inpcb lock * (p) Protected by one or more pcbgroup locks * (x) Synchronisation properties poorly defined */ struct inpcbinfo { /* * Global lock protecting inpcb list modification */ struct mtx ipi_lock; /* * Global list of inpcbs on the protocol. */ struct inpcbhead *ipi_listhead; /* [r](e) [w](g/l) */ u_int ipi_count; /* (l) */ /* * Generation count -- incremented each time a connection is allocated * or freed. */ u_quad_t ipi_gencnt; /* (l) */ /* * Fields associated with port lookup and allocation. */ u_short ipi_lastport; /* (x) */ u_short ipi_lastlow; /* (x) */ u_short ipi_lasthi; /* (x) */ /* * UMA zone from which inpcbs are allocated for this protocol. */ struct uma_zone *ipi_zone; /* (c) */ /* * Connection groups associated with this protocol. These fields are * constant, but pcbgroup structures themselves are protected by * per-pcbgroup locks. */ struct inpcbgroup *ipi_pcbgroups; /* (c) */ u_int ipi_npcbgroups; /* (c) */ u_int ipi_hashfields; /* (c) */ /* * Global lock protecting modification non-pcbgroup hash lookup tables. */ struct mtx ipi_hash_lock; /* * Global hash of inpcbs, hashed by local and foreign addresses and * port numbers. */ struct inpcbhead *ipi_hashbase; /* (h) */ u_long ipi_hashmask; /* (h) */ /* * Global hash of inpcbs, hashed by only local port number. */ struct inpcbporthead *ipi_porthashbase; /* (h) */ u_long ipi_porthashmask; /* (h) */ /* * List of wildcard inpcbs for use with pcbgroups. In the past, was * per-pcbgroup but is now global. All pcbgroup locks must be held * to modify the list, so any is sufficient to read it. */ struct inpcbhead *ipi_wildbase; /* (p) */ u_long ipi_wildmask; /* (p) */ /* * Load balance groups used for the SO_REUSEPORT_LB option, * hashed by local port. */ struct inpcblbgrouphead *ipi_lbgrouphashbase; /* (h) */ u_long ipi_lbgrouphashmask; /* (h) */ /* * Pointer to network stack instance */ struct vnet *ipi_vnet; /* (c) */ /* * general use 2 */ void *ipi_pspare[2]; /* * Global lock protecting global inpcb list, inpcb count, etc. */ struct rwlock ipi_list_lock; }; #ifdef _KERNEL /* * Connection groups hold sets of connections that have similar CPU/thread * affinity. Each connection belongs to exactly one connection group. */ struct inpcbgroup { /* * Per-connection group hash of inpcbs, hashed by local and foreign * addresses and port numbers. */ struct inpcbhead *ipg_hashbase; /* (c) */ u_long ipg_hashmask; /* (c) */ /* * Notional affinity of this pcbgroup. */ u_int ipg_cpu; /* (p) */ /* * Per-connection group lock, not to be confused with ipi_lock. * Protects the hash table hung off the group, but also the global * wildcard list in inpcbinfo. */ struct mtx ipg_lock; } __aligned(CACHE_LINE_SIZE); /* * Load balance groups used for the SO_REUSEPORT_LB socket option. 
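/*
 * Each hash table in struct inpcbinfo above is paired with a mask
 * (ipi_hashmask, ipi_porthashmask, ...) rather than a size: the table has a
 * power-of-two number of buckets and the mask is nbuckets - 1, so bucket
 * selection is a single AND.  A user-space sketch of that convention, with
 * calloc() standing in for the kernel's hash table allocator:
 */
#include <stdint.h>
#include <stdlib.h>

struct toy_bucket;

static struct toy_bucket **
toy_hash_alloc(unsigned int nbuckets, uint32_t *maskp)
{
	struct toy_bucket **table;
	unsigned int size = 1;

	/* Round up to a power of two so that "& mask" is a valid modulus. */
	while (size < nbuckets)
		size <<= 1;

	table = calloc(size, sizeof(*table));
	if (table != NULL)
		*maskp = size - 1;
	return (table);
}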
Each group * (or unique address:port combination) can be re-used at most * INPCBLBGROUP_SIZMAX (256) times. The inpcbs are stored in il_inp which * is dynamically resized as processes bind/unbind to that specific group. */ struct inpcblbgroup { CK_LIST_ENTRY(inpcblbgroup) il_list; struct epoch_context il_epoch_ctx; uint16_t il_lport; /* (c) */ u_char il_vflag; /* (c) */ u_char il_pad; uint32_t il_pad2; union in_dependaddr il_dependladdr; /* (c) */ #define il_laddr il_dependladdr.id46_addr.ia46_addr4 #define il6_laddr il_dependladdr.id6_addr uint32_t il_inpsiz; /* max count in il_inp[] (h) */ uint32_t il_inpcnt; /* cur count in il_inp[] (h) */ struct inpcb *il_inp[]; /* (h) */ }; #define INP_LOCK_INIT(inp, d, t) \ rw_init_flags(&(inp)->inp_lock, (t), RW_RECURSE | RW_DUPOK) #define INP_LOCK_DESTROY(inp) rw_destroy(&(inp)->inp_lock) #define INP_RLOCK(inp) rw_rlock(&(inp)->inp_lock) #define INP_WLOCK(inp) rw_wlock(&(inp)->inp_lock) #define INP_TRY_RLOCK(inp) rw_try_rlock(&(inp)->inp_lock) #define INP_TRY_WLOCK(inp) rw_try_wlock(&(inp)->inp_lock) #define INP_RUNLOCK(inp) rw_runlock(&(inp)->inp_lock) #define INP_WUNLOCK(inp) rw_wunlock(&(inp)->inp_lock) #define INP_TRY_UPGRADE(inp) rw_try_upgrade(&(inp)->inp_lock) #define INP_DOWNGRADE(inp) rw_downgrade(&(inp)->inp_lock) #define INP_WLOCKED(inp) rw_wowned(&(inp)->inp_lock) #define INP_LOCK_ASSERT(inp) rw_assert(&(inp)->inp_lock, RA_LOCKED) #define INP_RLOCK_ASSERT(inp) rw_assert(&(inp)->inp_lock, RA_RLOCKED) #define INP_WLOCK_ASSERT(inp) rw_assert(&(inp)->inp_lock, RA_WLOCKED) #define INP_UNLOCK_ASSERT(inp) rw_assert(&(inp)->inp_lock, RA_UNLOCKED) /* * These locking functions are for inpcb consumers outside of sys/netinet, * more specifically, they were added for the benefit of TOE drivers. The * macros are reserved for use by the stack. 
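/*
 * Sketch of how a load-balance group like struct inpcblbgroup above can
 * spread incoming connections across its members: hash the foreign address
 * and the ports, then index the il_inp-style array modulo the current member
 * count.  The hash expression is the INP_PCBLBGROUP_PKTHASH arithmetic from
 * this header; the surrounding structure and the claim that lookup picks
 * exactly this index are an illustration, not a copy of
 * in_pcblookup_lbgroup().
 */
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>

struct toy_member;

struct toy_lbgroup {
	uint32_t inpcnt;		/* current members, like il_inpcnt */
	struct toy_member *inp[256];	/* bounded like INPCBLBGROUP_SIZMAX */
};

static struct toy_member *
toy_lb_pick(const struct toy_lbgroup *grp, uint32_t faddr, uint16_t lport,
    uint16_t fport)
{
	uint32_t hash;

	if (grp->inpcnt == 0)
		return (NULL);
	hash = faddr ^ (faddr >> 16) ^ ntohs(lport ^ fport);
	return (grp->inp[hash % grp->inpcnt]);
}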
*/ void inp_wlock(struct inpcb *); void inp_wunlock(struct inpcb *); void inp_rlock(struct inpcb *); void inp_runlock(struct inpcb *); #ifdef INVARIANT_SUPPORT void inp_lock_assert(struct inpcb *); void inp_unlock_assert(struct inpcb *); #else #define inp_lock_assert(inp) do {} while (0) #define inp_unlock_assert(inp) do {} while (0) #endif void inp_apply_all(void (*func)(struct inpcb *, void *), void *arg); int inp_ip_tos_get(const struct inpcb *inp); void inp_ip_tos_set(struct inpcb *inp, int val); struct socket * inp_inpcbtosocket(struct inpcb *inp); struct tcpcb * inp_inpcbtotcpcb(struct inpcb *inp); void inp_4tuple_get(struct inpcb *inp, uint32_t *laddr, uint16_t *lp, uint32_t *faddr, uint16_t *fp); int inp_so_options(const struct inpcb *inp); #endif /* _KERNEL */ #define INP_INFO_LOCK_INIT(ipi, d) \ mtx_init(&(ipi)->ipi_lock, (d), NULL, MTX_DEF| MTX_RECURSE) #define INP_INFO_LOCK_DESTROY(ipi) mtx_destroy(&(ipi)->ipi_lock) #define INP_INFO_RLOCK_ET(ipi, et) NET_EPOCH_ENTER((et)) #define INP_INFO_WLOCK(ipi) mtx_lock(&(ipi)->ipi_lock) #define INP_INFO_TRY_WLOCK(ipi) mtx_trylock(&(ipi)->ipi_lock) #define INP_INFO_WLOCKED(ipi) mtx_owned(&(ipi)->ipi_lock) #define INP_INFO_RUNLOCK_ET(ipi, et) NET_EPOCH_EXIT((et)) #define INP_INFO_RUNLOCK_TP(ipi, tp) NET_EPOCH_EXIT(*(tp)->t_inpcb->inp_et) #define INP_INFO_WUNLOCK(ipi) mtx_unlock(&(ipi)->ipi_lock) #define INP_INFO_LOCK_ASSERT(ipi) MPASS(in_epoch(net_epoch_preempt) || mtx_owned(&(ipi)->ipi_lock)) #define INP_INFO_RLOCK_ASSERT(ipi) MPASS(in_epoch(net_epoch_preempt)) #define INP_INFO_WLOCK_ASSERT(ipi) mtx_assert(&(ipi)->ipi_lock, MA_OWNED) #define INP_INFO_WUNLOCK_ASSERT(ipi) \ mtx_assert(&(ipi)->ipi_lock, MA_NOTOWNED) #define INP_INFO_UNLOCK_ASSERT(ipi) MPASS(!in_epoch(net_epoch_preempt) && !mtx_owned(&(ipi)->ipi_lock)) #define INP_LIST_LOCK_INIT(ipi, d) \ rw_init_flags(&(ipi)->ipi_list_lock, (d), 0) #define INP_LIST_LOCK_DESTROY(ipi) rw_destroy(&(ipi)->ipi_list_lock) #define INP_LIST_RLOCK(ipi) rw_rlock(&(ipi)->ipi_list_lock) #define INP_LIST_WLOCK(ipi) rw_wlock(&(ipi)->ipi_list_lock) #define INP_LIST_TRY_RLOCK(ipi) rw_try_rlock(&(ipi)->ipi_list_lock) #define INP_LIST_TRY_WLOCK(ipi) rw_try_wlock(&(ipi)->ipi_list_lock) #define INP_LIST_TRY_UPGRADE(ipi) rw_try_upgrade(&(ipi)->ipi_list_lock) #define INP_LIST_RUNLOCK(ipi) rw_runlock(&(ipi)->ipi_list_lock) #define INP_LIST_WUNLOCK(ipi) rw_wunlock(&(ipi)->ipi_list_lock) #define INP_LIST_LOCK_ASSERT(ipi) \ rw_assert(&(ipi)->ipi_list_lock, RA_LOCKED) #define INP_LIST_RLOCK_ASSERT(ipi) \ rw_assert(&(ipi)->ipi_list_lock, RA_RLOCKED) #define INP_LIST_WLOCK_ASSERT(ipi) \ rw_assert(&(ipi)->ipi_list_lock, RA_WLOCKED) #define INP_LIST_UNLOCK_ASSERT(ipi) \ rw_assert(&(ipi)->ipi_list_lock, RA_UNLOCKED) #define INP_HASH_LOCK_INIT(ipi, d) mtx_init(&(ipi)->ipi_hash_lock, (d), NULL, MTX_DEF) #define INP_HASH_LOCK_DESTROY(ipi) mtx_destroy(&(ipi)->ipi_hash_lock) #define INP_HASH_RLOCK(ipi) struct epoch_tracker inp_hash_et; epoch_enter_preempt(net_epoch_preempt, &inp_hash_et) #define INP_HASH_RLOCK_ET(ipi, et) epoch_enter_preempt(net_epoch_preempt, &(et)) #define INP_HASH_WLOCK(ipi) mtx_lock(&(ipi)->ipi_hash_lock) #define INP_HASH_RUNLOCK(ipi) NET_EPOCH_EXIT(inp_hash_et) #define INP_HASH_RUNLOCK_ET(ipi, et) NET_EPOCH_EXIT((et)) #define INP_HASH_WUNLOCK(ipi) mtx_unlock(&(ipi)->ipi_hash_lock) #define INP_HASH_LOCK_ASSERT(ipi) MPASS(in_epoch(net_epoch_preempt) || mtx_owned(&(ipi)->ipi_hash_lock)) #define INP_HASH_WLOCK_ASSERT(ipi) mtx_assert(&(ipi)->ipi_hash_lock, MA_OWNED); #define INP_GROUP_LOCK_INIT(ipg, d) 
mtx_init(&(ipg)->ipg_lock, (d), NULL, \ MTX_DEF | MTX_DUPOK) #define INP_GROUP_LOCK_DESTROY(ipg) mtx_destroy(&(ipg)->ipg_lock) #define INP_GROUP_LOCK(ipg) mtx_lock(&(ipg)->ipg_lock) #define INP_GROUP_LOCK_ASSERT(ipg) mtx_assert(&(ipg)->ipg_lock, MA_OWNED) #define INP_GROUP_UNLOCK(ipg) mtx_unlock(&(ipg)->ipg_lock) #define INP_PCBHASH(faddr, lport, fport, mask) \ (((faddr) ^ ((faddr) >> 16) ^ ntohs((lport) ^ (fport))) & (mask)) #define INP_PCBPORTHASH(lport, mask) \ (ntohs((lport)) & (mask)) #define INP_PCBLBGROUP_PKTHASH(faddr, lport, fport) \ ((faddr) ^ ((faddr) >> 16) ^ ntohs((lport) ^ (fport))) #define INP6_PCBHASHKEY(faddr) ((faddr)->s6_addr32[3]) /* * Flags for inp_vflags -- historically version flags only */ #define INP_IPV4 0x1 #define INP_IPV6 0x2 #define INP_IPV6PROTO 0x4 /* opened under IPv6 protocol */ /* * Flags for inp_flags. */ #define INP_RECVOPTS 0x00000001 /* receive incoming IP options */ #define INP_RECVRETOPTS 0x00000002 /* receive IP options for reply */ #define INP_RECVDSTADDR 0x00000004 /* receive IP dst address */ #define INP_HDRINCL 0x00000008 /* user supplies entire IP header */ #define INP_HIGHPORT 0x00000010 /* user wants "high" port binding */ #define INP_LOWPORT 0x00000020 /* user wants "low" port binding */ #define INP_ANONPORT 0x00000040 /* port chosen for user */ #define INP_RECVIF 0x00000080 /* receive incoming interface */ #define INP_MTUDISC 0x00000100 /* user can do MTU discovery */ /* 0x000200 unused: was INP_FAITH */ #define INP_RECVTTL 0x00000400 /* receive incoming IP TTL */ #define INP_DONTFRAG 0x00000800 /* don't fragment packet */ #define INP_BINDANY 0x00001000 /* allow bind to any address */ #define INP_INHASHLIST 0x00002000 /* in_pcbinshash() has been called */ #define INP_RECVTOS 0x00004000 /* receive incoming IP TOS */ #define IN6P_IPV6_V6ONLY 0x00008000 /* restrict AF_INET6 socket for v6 */ #define IN6P_PKTINFO 0x00010000 /* receive IP6 dst and I/F */ #define IN6P_HOPLIMIT 0x00020000 /* receive hoplimit */ #define IN6P_HOPOPTS 0x00040000 /* receive hop-by-hop options */ #define IN6P_DSTOPTS 0x00080000 /* receive dst options after rthdr */ #define IN6P_RTHDR 0x00100000 /* receive routing header */ #define IN6P_RTHDRDSTOPTS 0x00200000 /* receive dstoptions before rthdr */ #define IN6P_TCLASS 0x00400000 /* receive traffic class value */ #define IN6P_AUTOFLOWLABEL 0x00800000 /* attach flowlabel automatically */ #define INP_TIMEWAIT 0x01000000 /* in TIMEWAIT, ppcb is tcptw */ #define INP_ONESBCAST 0x02000000 /* send all-ones broadcast */ #define INP_DROPPED 0x04000000 /* protocol drop flag */ #define INP_SOCKREF 0x08000000 /* strong socket reference */ #define INP_RESERVED_0 0x10000000 /* reserved field */ #define INP_RESERVED_1 0x20000000 /* reserved field */ #define IN6P_RFC2292 0x40000000 /* used RFC2292 API on the socket */ #define IN6P_MTU 0x80000000 /* receive path MTU */ #define INP_CONTROLOPTS (INP_RECVOPTS|INP_RECVRETOPTS|INP_RECVDSTADDR|\ INP_RECVIF|INP_RECVTTL|INP_RECVTOS|\ IN6P_PKTINFO|IN6P_HOPLIMIT|IN6P_HOPOPTS|\ IN6P_DSTOPTS|IN6P_RTHDR|IN6P_RTHDRDSTOPTS|\ IN6P_TCLASS|IN6P_AUTOFLOWLABEL|IN6P_RFC2292|\ IN6P_MTU) /* * Flags for inp_flags2. 
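/*
 * Worked example of the hash macros above, runnable in user space: feed a
 * sample connection through the INP_PCBHASH expression and print the
 * resulting bucket indices.  The addresses, ports, and the mask value
 * (511, i.e. 512 buckets) are made-up example inputs.
 */
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint32_t faddr = inet_addr("203.0.113.9");	/* network order */
	uint16_t lport = htons(443), fport = htons(55555);
	uint32_t mask = 511;
	uint32_t bucket;

	/* Same expression as INP_PCBHASH(faddr, lport, fport, mask). */
	bucket = (faddr ^ (faddr >> 16) ^ ntohs(lport ^ fport)) & mask;
	printf("4-tuple bucket: %u\n", (unsigned)bucket);

	/* INP_PCBPORTHASH(lport, mask): local port only. */
	printf("port bucket: %u\n", (unsigned)(ntohs(lport) & mask));
	return (0);
}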
*/ #define INP_2UNUSED1 0x00000001 #define INP_2UNUSED2 0x00000002 #define INP_PCBGROUPWILD 0x00000004 /* in pcbgroup wildcard list */ #define INP_REUSEPORT 0x00000008 /* SO_REUSEPORT option is set */ #define INP_FREED 0x00000010 /* inp itself is not valid */ #define INP_REUSEADDR 0x00000020 /* SO_REUSEADDR option is set */ #define INP_BINDMULTI 0x00000040 /* IP_BINDMULTI option is set */ #define INP_RSS_BUCKET_SET 0x00000080 /* IP_RSS_LISTEN_BUCKET is set */ #define INP_RECVFLOWID 0x00000100 /* populate recv datagram with flow info */ #define INP_RECVRSSBUCKETID 0x00000200 /* populate recv datagram with bucket id */ #define INP_RATE_LIMIT_CHANGED 0x00000400 /* rate limit needs attention */ #define INP_ORIGDSTADDR 0x00000800 /* receive IP dst address/port */ #define INP_CANNOT_DO_ECN 0x00001000 /* The stack does not do ECN */ #define INP_REUSEPORT_LB 0x00002000 /* SO_REUSEPORT_LB option is set */ /* * Flags passed to in_pcblookup*() functions. */ #define INPLOOKUP_WILDCARD 0x00000001 /* Allow wildcard sockets. */ #define INPLOOKUP_RLOCKPCB 0x00000002 /* Return inpcb read-locked. */ #define INPLOOKUP_WLOCKPCB 0x00000004 /* Return inpcb write-locked. */ #define INPLOOKUP_MASK (INPLOOKUP_WILDCARD | INPLOOKUP_RLOCKPCB | \ INPLOOKUP_WLOCKPCB) #define sotoinpcb(so) ((struct inpcb *)(so)->so_pcb) #define sotoin6pcb(so) sotoinpcb(so) /* for KAME src sync over BSD*'s */ #define INP_SOCKAF(so) so->so_proto->pr_domain->dom_family #define INP_CHECK_SOCKAF(so, af) (INP_SOCKAF(so) == af) /* * Constants for pcbinfo.ipi_hashfields. */ #define IPI_HASHFIELDS_NONE 0 #define IPI_HASHFIELDS_2TUPLE 1 #define IPI_HASHFIELDS_4TUPLE 2 #ifdef _KERNEL VNET_DECLARE(int, ipport_reservedhigh); VNET_DECLARE(int, ipport_reservedlow); VNET_DECLARE(int, ipport_lowfirstauto); VNET_DECLARE(int, ipport_lowlastauto); VNET_DECLARE(int, ipport_firstauto); VNET_DECLARE(int, ipport_lastauto); VNET_DECLARE(int, ipport_hifirstauto); VNET_DECLARE(int, ipport_hilastauto); VNET_DECLARE(int, ipport_randomized); VNET_DECLARE(int, ipport_randomcps); VNET_DECLARE(int, ipport_randomtime); VNET_DECLARE(int, ipport_stoprandom); VNET_DECLARE(int, ipport_tcpallocs); #define V_ipport_reservedhigh VNET(ipport_reservedhigh) #define V_ipport_reservedlow VNET(ipport_reservedlow) #define V_ipport_lowfirstauto VNET(ipport_lowfirstauto) #define V_ipport_lowlastauto VNET(ipport_lowlastauto) #define V_ipport_firstauto VNET(ipport_firstauto) #define V_ipport_lastauto VNET(ipport_lastauto) #define V_ipport_hifirstauto VNET(ipport_hifirstauto) #define V_ipport_hilastauto VNET(ipport_hilastauto) #define V_ipport_randomized VNET(ipport_randomized) #define V_ipport_randomcps VNET(ipport_randomcps) #define V_ipport_randomtime VNET(ipport_randomtime) #define V_ipport_stoprandom VNET(ipport_stoprandom) #define V_ipport_tcpallocs VNET(ipport_tcpallocs) void in_pcbinfo_destroy(struct inpcbinfo *); void in_pcbinfo_init(struct inpcbinfo *, const char *, struct inpcbhead *, int, int, char *, uma_init, u_int); int in_pcbbind_check_bindmulti(const struct inpcb *ni, const struct inpcb *oi); struct inpcbgroup * in_pcbgroup_byhash(struct inpcbinfo *, u_int, uint32_t); struct inpcbgroup * in_pcbgroup_byinpcb(struct inpcb *); struct inpcbgroup * in_pcbgroup_bytuple(struct inpcbinfo *, struct in_addr, u_short, struct in_addr, u_short); void in_pcbgroup_destroy(struct inpcbinfo *); int in_pcbgroup_enabled(struct inpcbinfo *); void in_pcbgroup_init(struct inpcbinfo *, u_int, int); void in_pcbgroup_remove(struct inpcb *); void in_pcbgroup_update(struct inpcb *); void 
in_pcbgroup_update_mbuf(struct inpcb *, struct mbuf *); void in_pcbpurgeif0(struct inpcbinfo *, struct ifnet *); int in_pcballoc(struct socket *, struct inpcbinfo *); int in_pcbbind(struct inpcb *, struct sockaddr *, struct ucred *); int in_pcb_lport(struct inpcb *, struct in_addr *, u_short *, struct ucred *, int); int in_pcbbind_setup(struct inpcb *, struct sockaddr *, in_addr_t *, u_short *, struct ucred *); int in_pcbconnect(struct inpcb *, struct sockaddr *, struct ucred *); int in_pcbconnect_mbuf(struct inpcb *, struct sockaddr *, struct ucred *, struct mbuf *); int in_pcbconnect_setup(struct inpcb *, struct sockaddr *, in_addr_t *, u_short *, in_addr_t *, u_short *, struct inpcb **, struct ucred *); void in_pcbdetach(struct inpcb *); void in_pcbdisconnect(struct inpcb *); void in_pcbdrop(struct inpcb *); void in_pcbfree(struct inpcb *); int in_pcbinshash(struct inpcb *); int in_pcbinshash_nopcbgroup(struct inpcb *); int in_pcbladdr(struct inpcb *, struct in_addr *, struct in_addr *, struct ucred *); struct inpcb * in_pcblookup_local(struct inpcbinfo *, struct in_addr, u_short, int, struct ucred *); struct inpcb * in_pcblookup(struct inpcbinfo *, struct in_addr, u_int, struct in_addr, u_int, int, struct ifnet *); struct inpcb * in_pcblookup_mbuf(struct inpcbinfo *, struct in_addr, u_int, struct in_addr, u_int, int, struct ifnet *, struct mbuf *); void in_pcbnotifyall(struct inpcbinfo *pcbinfo, struct in_addr, int, struct inpcb *(*)(struct inpcb *, int)); void in_pcbref(struct inpcb *); void in_pcbrehash(struct inpcb *); void in_pcbrehash_mbuf(struct inpcb *, struct mbuf *); int in_pcbrele(struct inpcb *); int in_pcbrele_rlocked(struct inpcb *); int in_pcbrele_wlocked(struct inpcb *); void in_pcblist_rele_rlocked(epoch_context_t ctx); void in_losing(struct inpcb *); void in_pcbsetsolabel(struct socket *so); int in_getpeeraddr(struct socket *so, struct sockaddr **nam); int in_getsockaddr(struct socket *so, struct sockaddr **nam); struct sockaddr * in_sockaddr(in_port_t port, struct in_addr *addr); void in_pcbsosetlabel(struct socket *so); #ifdef RATELIMIT int in_pcbattach_txrtlmt(struct inpcb *, struct ifnet *, uint32_t, uint32_t, uint32_t); void in_pcbdetach_txrtlmt(struct inpcb *); int in_pcbmodify_txrtlmt(struct inpcb *, uint32_t); int in_pcbquery_txrtlmt(struct inpcb *, uint32_t *); int in_pcbquery_txrlevel(struct inpcb *, uint32_t *); void in_pcboutput_txrtlmt(struct inpcb *, struct ifnet *, struct mbuf *); void in_pcboutput_eagain(struct inpcb *); #endif #endif /* _KERNEL */ #endif /* !_NETINET_IN_PCB_H_ */ Index: user/ngie/bug-237403/sys/netinet/ip_gre.c =================================================================== --- user/ngie/bug-237403/sys/netinet/ip_gre.c (revision 346925) +++ user/ngie/bug-237403/sys/netinet/ip_gre.c (revision 346926) @@ -1,358 +1,595 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-NetBSD * * Copyright (c) 1998 The NetBSD Foundation, Inc. * Copyright (c) 2014, 2018 Andrey V. Elsukov * All rights reserved. * * This code is derived from software contributed to The NetBSD Foundation * by Heiko W.Rupp * * IPv6-over-GRE contributed by Gert Doering * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. 
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * * $NetBSD: ip_gre.c,v 1.29 2003/09/05 23:02:43 itojun Exp $ */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include +#include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include +#include +#include #ifdef INET6 #include #endif #include +#include #define GRE_TTL 30 VNET_DEFINE(int, ip_gre_ttl) = GRE_TTL; #define V_ip_gre_ttl VNET(ip_gre_ttl) SYSCTL_INT(_net_inet_ip, OID_AUTO, grettl, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(ip_gre_ttl), 0, "Default TTL value for encapsulated packets"); +struct in_gre_socket { + struct gre_socket base; + in_addr_t addr; +}; +VNET_DEFINE_STATIC(struct gre_sockets *, ipv4_sockets) = NULL; VNET_DEFINE_STATIC(struct gre_list *, ipv4_hashtbl) = NULL; VNET_DEFINE_STATIC(struct gre_list *, ipv4_srchashtbl) = NULL; +#define V_ipv4_sockets VNET(ipv4_sockets) #define V_ipv4_hashtbl VNET(ipv4_hashtbl) #define V_ipv4_srchashtbl VNET(ipv4_srchashtbl) #define GRE_HASH(src, dst) (V_ipv4_hashtbl[\ in_gre_hashval((src), (dst)) & (GRE_HASH_SIZE - 1)]) #define GRE_SRCHASH(src) (V_ipv4_srchashtbl[\ fnv_32_buf(&(src), sizeof(src), FNV1_32_INIT) & (GRE_HASH_SIZE - 1)]) +#define GRE_SOCKHASH(src) (V_ipv4_sockets[\ + fnv_32_buf(&(src), sizeof(src), FNV1_32_INIT) & (GRE_HASH_SIZE - 1)]) #define GRE_HASH_SC(sc) GRE_HASH((sc)->gre_oip.ip_src.s_addr,\ (sc)->gre_oip.ip_dst.s_addr) static uint32_t in_gre_hashval(in_addr_t src, in_addr_t dst) { uint32_t ret; ret = fnv_32_buf(&src, sizeof(src), FNV1_32_INIT); return (fnv_32_buf(&dst, sizeof(dst), ret)); } +static struct gre_socket* +in_gre_lookup_socket(in_addr_t addr) +{ + struct gre_socket *gs; + struct in_gre_socket *s; + + CK_LIST_FOREACH(gs, &GRE_SOCKHASH(addr), chain) { + s = __containerof(gs, struct in_gre_socket, base); + if (s->addr == addr) + break; + } + return (gs); +} + static int -in_gre_checkdup(const struct gre_softc *sc, in_addr_t src, in_addr_t dst) +in_gre_checkdup(const struct gre_softc *sc, in_addr_t src, in_addr_t dst, + uint32_t opts) { + struct gre_list *head; struct gre_softc *tmp; + struct gre_socket *gs; if (sc->gre_family == AF_INET && sc->gre_oip.ip_src.s_addr == src && - sc->gre_oip.ip_dst.s_addr == dst) + sc->gre_oip.ip_dst.s_addr == dst && + (sc->gre_options & GRE_UDPENCAP) == (opts & GRE_UDPENCAP)) return (EEXIST); - CK_LIST_FOREACH(tmp, &GRE_HASH(src, dst), chain) { + if (opts & GRE_UDPENCAP) { + gs = in_gre_lookup_socket(src); + if (gs == NULL) + return (0); + head = &gs->list; + } else 
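The GRE_HASH()/GRE_SRCHASH() macros above reduce the outer tunnel addresses to a bucket index with chained 32-bit FNV-1. A standalone sketch of that computation follows; the mini fnv_32_buf() mirrors the inline in sys/sys/fnv_hash.h, and GRE_HASH_SIZE here is an assumed power-of-two table size chosen only for illustration:

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <arpa/inet.h>

#define FNV1_32_INIT	0x811c9dc5u
#define FNV_32_PRIME	0x01000193u
#define GRE_HASH_SIZE	32		/* assumed table size for the sketch */

static uint32_t
fnv_32_buf(const void *buf, size_t len, uint32_t hval)
{
	const unsigned char *p = buf;

	while (len-- != 0) {
		hval *= FNV_32_PRIME;	/* FNV-1: multiply, then xor */
		hval ^= *p++;
	}
	return (hval);
}

int
main(void)
{
	uint32_t src = inet_addr("192.0.2.1");		/* outer ip_src */
	uint32_t dst = inet_addr("198.51.100.2");	/* outer ip_dst */
	uint32_t h;

	h = fnv_32_buf(&src, sizeof(src), FNV1_32_INIT);
	h = fnv_32_buf(&dst, sizeof(dst), h);
	printf("bucket = %u\n", h & (GRE_HASH_SIZE - 1));
	return (0);
}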
+ head = &GRE_HASH(src, dst); + + CK_LIST_FOREACH(tmp, head, chain) { if (tmp == sc) continue; if (tmp->gre_oip.ip_src.s_addr == src && tmp->gre_oip.ip_dst.s_addr == dst) return (EADDRNOTAVAIL); } return (0); } static int in_gre_lookup(const struct mbuf *m, int off, int proto, void **arg) { const struct ip *ip; struct gre_softc *sc; if (V_ipv4_hashtbl == NULL) return (0); MPASS(in_epoch(net_epoch_preempt)); ip = mtod(m, const struct ip *); CK_LIST_FOREACH(sc, &GRE_HASH(ip->ip_dst.s_addr, ip->ip_src.s_addr), chain) { /* * This is an inbound packet, its ip_dst is source address * in softc. */ if (sc->gre_oip.ip_src.s_addr == ip->ip_dst.s_addr && sc->gre_oip.ip_dst.s_addr == ip->ip_src.s_addr) { if ((GRE2IFP(sc)->if_flags & IFF_UP) == 0) return (0); *arg = sc; return (ENCAP_DRV_LOOKUP); } } return (0); } /* * Check that ingress address belongs to local host. */ static void in_gre_set_running(struct gre_softc *sc) { if (in_localip(sc->gre_oip.ip_src)) GRE2IFP(sc)->if_drv_flags |= IFF_DRV_RUNNING; else GRE2IFP(sc)->if_drv_flags &= ~IFF_DRV_RUNNING; } /* * ifaddr_event handler. * Clear IFF_DRV_RUNNING flag when ingress address disappears to prevent * source address spoofing. */ static void in_gre_srcaddr(void *arg __unused, const struct sockaddr *sa, int event __unused) { const struct sockaddr_in *sin; struct gre_softc *sc; /* Check that VNET is ready */ if (V_ipv4_hashtbl == NULL) return; MPASS(in_epoch(net_epoch_preempt)); sin = (const struct sockaddr_in *)sa; CK_LIST_FOREACH(sc, &GRE_SRCHASH(sin->sin_addr.s_addr), srchash) { if (sc->gre_oip.ip_src.s_addr != sin->sin_addr.s_addr) continue; in_gre_set_running(sc); } } static void +in_gre_udp_input(struct mbuf *m, int off, struct inpcb *inp, + const struct sockaddr *sa, void *ctx) +{ + struct epoch_tracker et; + struct gre_socket *gs; + struct gre_softc *sc; + in_addr_t dst; + + NET_EPOCH_ENTER(et); + /* + * udp_append() holds reference to inp, it is safe to check + * inp_flags2 without INP_RLOCK(). + * If socket was closed before we have entered NET_EPOCH section, + * INP_FREED flag should be set. Otherwise it should be safe to + * make access to ctx data, because gre_so will be freed by + * gre_sofree() via epoch_call(). + */ + if (__predict_false(inp->inp_flags2 & INP_FREED)) { + NET_EPOCH_EXIT(et); + m_freem(m); + return; + } + + gs = (struct gre_socket *)ctx; + dst = ((const struct sockaddr_in *)sa)->sin_addr.s_addr; + CK_LIST_FOREACH(sc, &gs->list, chain) { + if (sc->gre_oip.ip_dst.s_addr == dst) + break; + } + if (sc != NULL && (GRE2IFP(sc)->if_flags & IFF_UP) != 0){ + gre_input(m, off + sizeof(struct udphdr), IPPROTO_UDP, sc); + NET_EPOCH_EXIT(et); + return; + } + m_freem(m); + NET_EPOCH_EXIT(et); +} + +static int +in_gre_setup_socket(struct gre_softc *sc) +{ + struct sockopt sopt; + struct sockaddr_in sin; + struct in_gre_socket *s; + struct gre_socket *gs; + in_addr_t addr; + int error, value; + + /* + * NOTE: we are protected with gre_ioctl_sx lock. + * + * First check that socket is already configured. + * If so, check that source addres was not changed. + * If address is different, check that there are no other tunnels + * and close socket. 
+ */ + addr = sc->gre_oip.ip_src.s_addr; + gs = sc->gre_so; + if (gs != NULL) { + s = __containerof(gs, struct in_gre_socket, base); + if (s->addr != addr) { + if (CK_LIST_EMPTY(&gs->list)) { + CK_LIST_REMOVE(gs, chain); + soclose(gs->so); + epoch_call(net_epoch_preempt, &gs->epoch_ctx, + gre_sofree); + } + gs = sc->gre_so = NULL; + } + } + + if (gs == NULL) { + /* + * Check that socket for given address is already + * configured. + */ + gs = in_gre_lookup_socket(addr); + if (gs == NULL) { + s = malloc(sizeof(*s), M_GRE, M_WAITOK | M_ZERO); + s->addr = addr; + gs = &s->base; + + error = socreate(sc->gre_family, &gs->so, + SOCK_DGRAM, IPPROTO_UDP, curthread->td_ucred, + curthread); + if (error != 0) { + if_printf(GRE2IFP(sc), + "cannot create socket: %d\n", error); + free(s, M_GRE); + return (error); + } + + error = udp_set_kernel_tunneling(gs->so, + in_gre_udp_input, NULL, gs); + if (error != 0) { + if_printf(GRE2IFP(sc), + "cannot set UDP tunneling: %d\n", error); + goto fail; + } + + memset(&sopt, 0, sizeof(sopt)); + sopt.sopt_dir = SOPT_SET; + sopt.sopt_level = IPPROTO_IP; + sopt.sopt_name = IP_BINDANY; + sopt.sopt_val = &value; + sopt.sopt_valsize = sizeof(value); + value = 1; + error = sosetopt(gs->so, &sopt); + if (error != 0) { + if_printf(GRE2IFP(sc), + "cannot set IP_BINDANY opt: %d\n", error); + goto fail; + } + + memset(&sin, 0, sizeof(sin)); + sin.sin_family = AF_INET; + sin.sin_len = sizeof(sin); + sin.sin_addr.s_addr = addr; + sin.sin_port = htons(GRE_UDPPORT); + error = sobind(gs->so, (struct sockaddr *)&sin, + curthread); + if (error != 0) { + if_printf(GRE2IFP(sc), + "cannot bind socket: %d\n", error); + goto fail; + } + /* Add socket to the chain */ + CK_LIST_INSERT_HEAD(&GRE_SOCKHASH(addr), gs, chain); + } + } + + /* Add softc to the socket's list */ + CK_LIST_INSERT_HEAD(&gs->list, sc, chain); + sc->gre_so = gs; + return (0); +fail: + soclose(gs->so); + free(s, M_GRE); + return (error); +} + +static int in_gre_attach(struct gre_softc *sc) { + struct grehdr *gh; + int error; - sc->gre_hlen = sizeof(struct greip); + if (sc->gre_options & GRE_UDPENCAP) { + sc->gre_csumflags = CSUM_UDP; + sc->gre_hlen = sizeof(struct greudp); + sc->gre_oip.ip_p = IPPROTO_UDP; + gh = &sc->gre_udphdr->gi_gre; + gre_update_udphdr(sc, &sc->gre_udp, + in_pseudo(sc->gre_oip.ip_src.s_addr, + sc->gre_oip.ip_dst.s_addr, 0)); + } else { + sc->gre_hlen = sizeof(struct greip); + sc->gre_oip.ip_p = IPPROTO_GRE; + gh = &sc->gre_iphdr->gi_gre; + } sc->gre_oip.ip_v = IPVERSION; sc->gre_oip.ip_hl = sizeof(struct ip) >> 2; - sc->gre_oip.ip_p = IPPROTO_GRE; - gre_updatehdr(sc, &sc->gre_gihdr->gi_gre); - CK_LIST_INSERT_HEAD(&GRE_HASH_SC(sc), sc, chain); + gre_update_hdr(sc, gh); + + /* + * If we return error, this means that sc is not linked, + * and caller should reset gre_family and free(sc->gre_hdr). 
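To keep the two encapsulations straight, a hedged sketch of the outer framings selected by GRE_UDPENCAP; the struct below is only a guessed mirror of struct greudp (the real layout lives in net/if_gre.h), and 4754 is the IANA GRE-in-UDP port from RFC 8086 that GRE_UDPPORT is assumed to carry:

/*
 * plain GRE  (struct greip) :  [ outer IP, ip_p = IPPROTO_GRE | GRE | payload ]
 * GRE-in-UDP (struct greudp):  [ outer IP, ip_p = IPPROTO_UDP |
 *                                UDP, dport = GRE_UDPPORT | GRE | payload ]
 *
 * gre_update_udphdr() is handed in_pseudo(src, dst, 0), so the UDP
 * pseudo-header checksum is precomputed once; per-packet transmit can then
 * rely on CSUM_UDP delayed or offloaded checksumming.
 */
struct greudp_sketch {			/* hypothetical mirror of struct greudp */
	struct ip	gi_ip;		/* outer IPv4 header */
	struct udphdr	gi_udp;		/* dport = htons(GRE_UDPPORT) */
	struct grehdr	gi_gre;		/* GRE header proper */
} __packed;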
+ */ + if (sc->gre_options & GRE_UDPENCAP) { + error = in_gre_setup_socket(sc); + if (error != 0) + return (error); + } else + CK_LIST_INSERT_HEAD(&GRE_HASH_SC(sc), sc, chain); CK_LIST_INSERT_HEAD(&GRE_SRCHASH(sc->gre_oip.ip_src.s_addr), sc, srchash); + + /* Set IFF_DRV_RUNNING if interface is ready */ + in_gre_set_running(sc); + return (0); } -void +int in_gre_setopts(struct gre_softc *sc, u_long cmd, uint32_t value) { + int error; - MPASS(cmd == GRESKEY || cmd == GRESOPTS); - /* NOTE: we are protected with gre_ioctl_sx lock */ + MPASS(cmd == GRESKEY || cmd == GRESOPTS || cmd == GRESPORT); MPASS(sc->gre_family == AF_INET); + + /* + * If we are going to change encapsulation protocol, do check + * for duplicate tunnels. Return EEXIST here to do not confuse + * user. + */ + if (cmd == GRESOPTS && + (sc->gre_options & GRE_UDPENCAP) != (value & GRE_UDPENCAP) && + in_gre_checkdup(sc, sc->gre_oip.ip_src.s_addr, + sc->gre_oip.ip_dst.s_addr, value) == EADDRNOTAVAIL) + return (EEXIST); + CK_LIST_REMOVE(sc, chain); CK_LIST_REMOVE(sc, srchash); GRE_WAIT(); - if (cmd == GRESKEY) + switch (cmd) { + case GRESKEY: sc->gre_key = value; - else + break; + case GRESOPTS: sc->gre_options = value; - in_gre_attach(sc); + break; + case GRESPORT: + sc->gre_port = value; + break; + } + error = in_gre_attach(sc); + if (error != 0) { + sc->gre_family = 0; + free(sc->gre_hdr, M_GRE); + } + return (error); } int in_gre_ioctl(struct gre_softc *sc, u_long cmd, caddr_t data) { struct ifreq *ifr = (struct ifreq *)data; struct sockaddr_in *dst, *src; struct ip *ip; int error; /* NOTE: we are protected with gre_ioctl_sx lock */ error = EINVAL; switch (cmd) { case SIOCSIFPHYADDR: src = &((struct in_aliasreq *)data)->ifra_addr; dst = &((struct in_aliasreq *)data)->ifra_dstaddr; /* sanity checks */ if (src->sin_family != dst->sin_family || src->sin_family != AF_INET || src->sin_len != dst->sin_len || src->sin_len != sizeof(*src)) break; if (src->sin_addr.s_addr == INADDR_ANY || dst->sin_addr.s_addr == INADDR_ANY) { error = EADDRNOTAVAIL; break; } if (V_ipv4_hashtbl == NULL) { V_ipv4_hashtbl = gre_hashinit(); V_ipv4_srchashtbl = gre_hashinit(); + V_ipv4_sockets = (struct gre_sockets *)gre_hashinit(); } error = in_gre_checkdup(sc, src->sin_addr.s_addr, - dst->sin_addr.s_addr); + dst->sin_addr.s_addr, sc->gre_options); if (error == EADDRNOTAVAIL) break; if (error == EEXIST) { /* Addresses are the same. Just return. */ error = 0; break; } - ip = malloc(sizeof(struct greip) + 3 * sizeof(uint32_t), + ip = malloc(sizeof(struct greudp) + 3 * sizeof(uint32_t), M_GRE, M_WAITOK | M_ZERO); ip->ip_src.s_addr = src->sin_addr.s_addr; ip->ip_dst.s_addr = dst->sin_addr.s_addr; if (sc->gre_family != 0) { /* Detach existing tunnel first */ CK_LIST_REMOVE(sc, chain); CK_LIST_REMOVE(sc, srchash); GRE_WAIT(); free(sc->gre_hdr, M_GRE); /* XXX: should we notify about link state change? */ } sc->gre_family = AF_INET; sc->gre_hdr = ip; sc->gre_oseq = 0; sc->gre_iseq = UINT32_MAX; - in_gre_attach(sc); - in_gre_set_running(sc); + error = in_gre_attach(sc); + if (error != 0) { + sc->gre_family = 0; + free(sc->gre_hdr, M_GRE); + } break; case SIOCGIFPSRCADDR: case SIOCGIFPDSTADDR: if (sc->gre_family != AF_INET) { error = EADDRNOTAVAIL; break; } src = (struct sockaddr_in *)&ifr->ifr_addr; memset(src, 0, sizeof(*src)); src->sin_family = AF_INET; src->sin_len = sizeof(*src); src->sin_addr = (cmd == SIOCGIFPSRCADDR) ? 
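A hypothetical userland sketch of driving the new GRESPORT/GRESOPTS knobs, assuming they follow the same ifr_data-points-at-a-uint32_t convention the existing GRESKEY ioctl uses and that net/if_gre.h exposes the symbols to userland; names and values are illustrative only:

#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_gre.h>
#include <err.h>
#include <stdint.h>
#include <string.h>

static void
gre_set_u32(int s, const char *ifname, unsigned long cmd, uint32_t value)
{
	struct ifreq ifr;

	memset(&ifr, 0, sizeof(ifr));
	strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
	ifr.ifr_data = (caddr_t)&value;	/* assumed GRESKEY-style convention */
	if (ioctl(s, cmd, &ifr) == -1)
		err(1, "ioctl(%#lx)", cmd);
}

/*
 * Usage sketch (s is an AF_INET datagram socket used only for the ioctls):
 *	gre_set_u32(s, "gre0", GRESOPTS, GRE_UDPENCAP);
 *	gre_set_u32(s, "gre0", GRESPORT, 58000);   -- value lands in sc->gre_port
 */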
sc->gre_oip.ip_src: sc->gre_oip.ip_dst; error = prison_if(curthread->td_ucred, (struct sockaddr *)src); if (error != 0) memset(src, 0, sizeof(*src)); break; } return (error); } int in_gre_output(struct mbuf *m, int af, int hlen) { struct greip *gi; gi = mtod(m, struct greip *); switch (af) { case AF_INET: /* * gre_transmit() has used M_PREPEND() that doesn't guarantee * m_data is contiguous more than hlen bytes. Use m_copydata() * here to avoid m_pullup(). */ m_copydata(m, hlen + offsetof(struct ip, ip_tos), sizeof(u_char), &gi->gi_ip.ip_tos); m_copydata(m, hlen + offsetof(struct ip, ip_id), sizeof(u_short), (caddr_t)&gi->gi_ip.ip_id); break; #ifdef INET6 case AF_INET6: gi->gi_ip.ip_tos = 0; /* XXX */ ip_fillid(&gi->gi_ip); break; #endif } gi->gi_ip.ip_ttl = V_ip_gre_ttl; gi->gi_ip.ip_len = htons(m->m_pkthdr.len); return (ip_output(m, NULL, NULL, IP_FORWARDING, NULL, NULL)); } static const struct srcaddrtab *ipv4_srcaddrtab = NULL; static const struct encaptab *ecookie = NULL; static const struct encap_config ipv4_encap_cfg = { .proto = IPPROTO_GRE, .min_length = sizeof(struct greip) + sizeof(struct ip), .exact_match = ENCAP_DRV_LOOKUP, .lookup = in_gre_lookup, .input = gre_input }; void in_gre_init(void) { if (!IS_DEFAULT_VNET(curvnet)) return; ipv4_srcaddrtab = ip_encap_register_srcaddr(in_gre_srcaddr, NULL, M_WAITOK); ecookie = ip_encap_attach(&ipv4_encap_cfg, NULL, M_WAITOK); } void in_gre_uninit(void) { if (IS_DEFAULT_VNET(curvnet)) { ip_encap_detach(ecookie); ip_encap_unregister_srcaddr(ipv4_srcaddrtab); } if (V_ipv4_hashtbl != NULL) { gre_hashdestroy(V_ipv4_hashtbl); V_ipv4_hashtbl = NULL; GRE_WAIT(); gre_hashdestroy(V_ipv4_srchashtbl); + gre_hashdestroy((struct gre_list *)V_ipv4_sockets); } } Index: user/ngie/bug-237403/sys/netinet/ip_output.c =================================================================== --- user/ngie/bug-237403/sys/netinet/ip_output.c (revision 346925) +++ user/ngie/bug-237403/sys/netinet/ip_output.c (revision 346926) @@ -1,1469 +1,1472 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1982, 1986, 1988, 1990, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)ip_output.c 8.3 (Berkeley) 1/21/94 */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_ratelimit.h" #include "opt_ipsec.h" #include "opt_mbuf_stress_test.h" #include "opt_mpath.h" #include "opt_route.h" #include "opt_sctp.h" #include "opt_rss.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef RADIX_MPATH #include #endif #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef SCTP #include #include #endif #include #include #include #ifdef MBUF_STRESS_TEST static int mbuf_frag_size = 0; SYSCTL_INT(_net_inet_ip, OID_AUTO, mbuf_frag_size, CTLFLAG_RW, &mbuf_frag_size, 0, "Fragment outgoing mbufs to this size"); #endif static void ip_mloopback(struct ifnet *, const struct mbuf *, int); extern int in_mcast_loop; extern struct protosw inetsw[]; static inline int ip_output_pfil(struct mbuf **mp, struct ifnet *ifp, struct inpcb *inp, struct sockaddr_in *dst, int *fibnum, int *error) { struct m_tag *fwd_tag = NULL; struct mbuf *m; struct in_addr odst; struct ip *ip; m = *mp; ip = mtod(m, struct ip *); /* Run through list of hooks for output packets. */ odst.s_addr = ip->ip_dst.s_addr; switch (pfil_run_hooks(V_inet_pfil_head, mp, ifp, PFIL_OUT, inp)) { case PFIL_DROPPED: *error = EPERM; /* FALLTHROUGH */ case PFIL_CONSUMED: return 1; /* Finished */ case PFIL_PASS: *error = 0; } m = *mp; ip = mtod(m, struct ip *); /* See if destination IP address was changed by packet filter. */ if (odst.s_addr != ip->ip_dst.s_addr) { m->m_flags |= M_SKIP_FIREWALL; /* If destination is now ourself drop to ip_input(). */ if (in_localip(ip->ip_dst)) { m->m_flags |= M_FASTFWD_OURS; if (m->m_pkthdr.rcvif == NULL) m->m_pkthdr.rcvif = V_loif; if (m->m_pkthdr.csum_flags & CSUM_DELAY_DATA) { m->m_pkthdr.csum_flags |= CSUM_DATA_VALID | CSUM_PSEUDO_HDR; m->m_pkthdr.csum_data = 0xffff; } m->m_pkthdr.csum_flags |= CSUM_IP_CHECKED | CSUM_IP_VALID; #ifdef SCTP if (m->m_pkthdr.csum_flags & CSUM_SCTP) m->m_pkthdr.csum_flags |= CSUM_SCTP_VALID; #endif *error = netisr_queue(NETISR_IP, m); return 1; /* Finished */ } bzero(dst, sizeof(*dst)); dst->sin_family = AF_INET; dst->sin_len = sizeof(*dst); dst->sin_addr = ip->ip_dst; return -1; /* Reloop */ } /* See if fib was changed by packet filter. */ if ((*fibnum) != M_GETFIB(m)) { m->m_flags |= M_SKIP_FIREWALL; *fibnum = M_GETFIB(m); return -1; /* Reloop for FIB change */ } /* See if local, if yes, send it to netisr with IP_FASTFWD_OURS. 
*/ if (m->m_flags & M_FASTFWD_OURS) { if (m->m_pkthdr.rcvif == NULL) m->m_pkthdr.rcvif = V_loif; if (m->m_pkthdr.csum_flags & CSUM_DELAY_DATA) { m->m_pkthdr.csum_flags |= CSUM_DATA_VALID | CSUM_PSEUDO_HDR; m->m_pkthdr.csum_data = 0xffff; } #ifdef SCTP if (m->m_pkthdr.csum_flags & CSUM_SCTP) m->m_pkthdr.csum_flags |= CSUM_SCTP_VALID; #endif m->m_pkthdr.csum_flags |= CSUM_IP_CHECKED | CSUM_IP_VALID; *error = netisr_queue(NETISR_IP, m); return 1; /* Finished */ } /* Or forward to some other address? */ if ((m->m_flags & M_IP_NEXTHOP) && ((fwd_tag = m_tag_find(m, PACKET_TAG_IPFORWARD, NULL)) != NULL)) { bcopy((fwd_tag+1), dst, sizeof(struct sockaddr_in)); m->m_flags |= M_SKIP_FIREWALL; m->m_flags &= ~M_IP_NEXTHOP; m_tag_delete(m, fwd_tag); return -1; /* Reloop for CHANGE of dst */ } return 0; } /* * IP output. The packet in mbuf chain m contains a skeletal IP * header (with len, off, ttl, proto, tos, src, dst). * The mbuf chain containing the packet will be freed. * The mbuf opt, if present, will not be freed. * If route ro is present and has ro_rt initialized, route lookup would be * skipped and ro->ro_rt would be used. If ro is present but ro->ro_rt is NULL, * then result of route lookup is stored in ro->ro_rt. * * In the IP forwarding case, the packet will arrive with options already * inserted, so must have a NULL opt pointer. */ int ip_output(struct mbuf *m, struct mbuf *opt, struct route *ro, int flags, struct ip_moptions *imo, struct inpcb *inp) { struct rm_priotracker in_ifa_tracker; struct epoch_tracker et; struct ip *ip; struct ifnet *ifp = NULL; /* keep compiler happy */ struct mbuf *m0; int hlen = sizeof (struct ip); int mtu; int error = 0; struct sockaddr_in *dst; const struct sockaddr_in *gw; struct in_ifaddr *ia; int isbroadcast; uint16_t ip_len, ip_off; struct route iproute; struct rtentry *rte; /* cache for ro->ro_rt */ uint32_t fibnum; #if defined(IPSEC) || defined(IPSEC_SUPPORT) int no_route_but_check_spd = 0; #endif M_ASSERTPKTHDR(m); if (inp != NULL) { INP_LOCK_ASSERT(inp); M_SETFIB(m, inp->inp_inc.inc_fibnum); if ((flags & IP_NODEFAULTFLOWID) == 0) { m->m_pkthdr.flowid = inp->inp_flowid; M_HASHTYPE_SET(m, inp->inp_flowtype); } +#ifdef NUMA + m->m_pkthdr.numa_domain = inp->inp_numa_domain; +#endif } if (ro == NULL) { ro = &iproute; bzero(ro, sizeof (*ro)); } if (opt) { int len = 0; m = ip_insertoptions(m, opt, &len); if (len != 0) hlen = len; /* ip->ip_hl is updated above */ } ip = mtod(m, struct ip *); ip_len = ntohs(ip->ip_len); ip_off = ntohs(ip->ip_off); if ((flags & (IP_FORWARDING|IP_RAWOUTPUT)) == 0) { ip->ip_v = IPVERSION; ip->ip_hl = hlen >> 2; ip_fillid(ip); } else { /* Header already set, fetch hlen from there */ hlen = ip->ip_hl << 2; } if ((flags & IP_FORWARDING) == 0) IPSTAT_INC(ips_localout); /* * dst/gw handling: * * dst can be rewritten but always points to &ro->ro_dst. * gw is readonly but can point either to dst OR rt_gateway, * therefore we need restore gw if we're redoing lookup. */ gw = dst = (struct sockaddr_in *)&ro->ro_dst; fibnum = (inp != NULL) ? inp->inp_inc.inc_fibnum : M_GETFIB(m); rte = ro->ro_rt; if (rte == NULL) { bzero(dst, sizeof(*dst)); dst->sin_family = AF_INET; dst->sin_len = sizeof(*dst); dst->sin_addr = ip->ip_dst; } NET_EPOCH_ENTER(et); again: /* * Validate route against routing table additions; * a better/more specific route might have been added. */ if (inp) RT_VALIDATE(ro, &inp->inp_rt_cookie, fibnum); /* * If there is a cached route, * check that it is to the same destination * and is still up. If not, free it and try again. 
* The address family should also be checked in case of sharing the * cache with IPv6. * Also check whether routing cache needs invalidation. */ rte = ro->ro_rt; if (rte && ((rte->rt_flags & RTF_UP) == 0 || rte->rt_ifp == NULL || !RT_LINK_IS_UP(rte->rt_ifp) || dst->sin_family != AF_INET || dst->sin_addr.s_addr != ip->ip_dst.s_addr)) { RO_INVALIDATE_CACHE(ro); rte = NULL; } ia = NULL; /* * If routing to interface only, short circuit routing lookup. * The use of an all-ones broadcast address implies this; an * interface is specified by the broadcast address of an interface, * or the destination address of a ptp interface. */ if (flags & IP_SENDONES) { if ((ia = ifatoia(ifa_ifwithbroadaddr(sintosa(dst), M_GETFIB(m)))) == NULL && (ia = ifatoia(ifa_ifwithdstaddr(sintosa(dst), M_GETFIB(m)))) == NULL) { IPSTAT_INC(ips_noroute); error = ENETUNREACH; goto bad; } ip->ip_dst.s_addr = INADDR_BROADCAST; dst->sin_addr = ip->ip_dst; ifp = ia->ia_ifp; ip->ip_ttl = 1; isbroadcast = 1; } else if (flags & IP_ROUTETOIF) { if ((ia = ifatoia(ifa_ifwithdstaddr(sintosa(dst), M_GETFIB(m)))) == NULL && (ia = ifatoia(ifa_ifwithnet(sintosa(dst), 0, M_GETFIB(m)))) == NULL) { IPSTAT_INC(ips_noroute); error = ENETUNREACH; goto bad; } ifp = ia->ia_ifp; ip->ip_ttl = 1; isbroadcast = ifp->if_flags & IFF_BROADCAST ? in_ifaddr_broadcast(dst->sin_addr, ia) : 0; } else if (IN_MULTICAST(ntohl(ip->ip_dst.s_addr)) && imo != NULL && imo->imo_multicast_ifp != NULL) { /* * Bypass the normal routing lookup for multicast * packets if the interface is specified. */ ifp = imo->imo_multicast_ifp; IFP_TO_IA(ifp, ia, &in_ifa_tracker); isbroadcast = 0; /* fool gcc */ } else { /* * We want to do any cloning requested by the link layer, * as this is probably required in all cases for correct * operation (as it is for ARP). */ if (rte == NULL) { #ifdef RADIX_MPATH rtalloc_mpath_fib(ro, ntohl(ip->ip_src.s_addr ^ ip->ip_dst.s_addr), fibnum); #else in_rtalloc_ign(ro, 0, fibnum); #endif rte = ro->ro_rt; } if (rte == NULL || (rte->rt_flags & RTF_UP) == 0 || rte->rt_ifp == NULL || !RT_LINK_IS_UP(rte->rt_ifp)) { #if defined(IPSEC) || defined(IPSEC_SUPPORT) /* * There is no route for this packet, but it is * possible that a matching SPD entry exists. */ no_route_but_check_spd = 1; mtu = 0; /* Silence GCC warning. */ goto sendit; #endif IPSTAT_INC(ips_noroute); error = EHOSTUNREACH; goto bad; } ia = ifatoia(rte->rt_ifa); ifp = rte->rt_ifp; counter_u64_add(rte->rt_pksent, 1); rt_update_ro_flags(ro); if (rte->rt_flags & RTF_GATEWAY) gw = (struct sockaddr_in *)rte->rt_gateway; if (rte->rt_flags & RTF_HOST) isbroadcast = (rte->rt_flags & RTF_BROADCAST); else if (ifp->if_flags & IFF_BROADCAST) isbroadcast = in_ifaddr_broadcast(gw->sin_addr, ia); else isbroadcast = 0; } /* * Calculate MTU. If we have a route that is up, use that, * otherwise use the interface's MTU. */ if (rte != NULL && (rte->rt_flags & (RTF_UP|RTF_HOST))) mtu = rte->rt_mtu; else mtu = ifp->if_mtu; /* Catch a possible divide by zero later. */ KASSERT(mtu > 0, ("%s: mtu %d <= 0, rte=%p (rt_flags=0x%08x) ifp=%p", __func__, mtu, rte, (rte != NULL) ? rte->rt_flags : 0, ifp)); if (IN_MULTICAST(ntohl(ip->ip_dst.s_addr))) { m->m_flags |= M_MCAST; /* * IP destination address is multicast. Make sure "gw" * still points to the address in "ro". (It may have been * changed to point to a gateway address, above.) */ gw = dst; /* * See if the caller provided any multicast options */ if (imo != NULL) { ip->ip_ttl = imo->imo_multicast_ttl; if (imo->imo_multicast_vif != -1) ip->ip_src.s_addr = ip_mcast_src ? 
ip_mcast_src(imo->imo_multicast_vif) : INADDR_ANY; } else ip->ip_ttl = IP_DEFAULT_MULTICAST_TTL; /* * Confirm that the outgoing interface supports multicast. */ if ((imo == NULL) || (imo->imo_multicast_vif == -1)) { if ((ifp->if_flags & IFF_MULTICAST) == 0) { IPSTAT_INC(ips_noroute); error = ENETUNREACH; goto bad; } } /* * If source address not specified yet, use address * of outgoing interface. */ if (ip->ip_src.s_addr == INADDR_ANY) { /* Interface may have no addresses. */ if (ia != NULL) ip->ip_src = IA_SIN(ia)->sin_addr; } if ((imo == NULL && in_mcast_loop) || (imo && imo->imo_multicast_loop)) { /* * Loop back multicast datagram if not expressly * forbidden to do so, even if we are not a member * of the group; ip_input() will filter it later, * thus deferring a hash lookup and mutex acquisition * at the expense of a cheap copy using m_copym(). */ ip_mloopback(ifp, m, hlen); } else { /* * If we are acting as a multicast router, perform * multicast forwarding as if the packet had just * arrived on the interface to which we are about * to send. The multicast forwarding function * recursively calls this function, using the * IP_FORWARDING flag to prevent infinite recursion. * * Multicasts that are looped back by ip_mloopback(), * above, will be forwarded by the ip_input() routine, * if necessary. */ if (V_ip_mrouter && (flags & IP_FORWARDING) == 0) { /* * If rsvp daemon is not running, do not * set ip_moptions. This ensures that the packet * is multicast and not just sent down one link * as prescribed by rsvpd. */ if (!V_rsvp_on) imo = NULL; if (ip_mforward && ip_mforward(ip, ifp, m, imo) != 0) { m_freem(m); goto done; } } } /* * Multicasts with a time-to-live of zero may be looped- * back, above, but must not be transmitted on a network. * Also, multicasts addressed to the loopback interface * are not sent -- the above call to ip_mloopback() will * loop back a copy. ip_input() will drop the copy if * this host does not belong to the destination group on * the loopback interface. */ if (ip->ip_ttl == 0 || ifp->if_flags & IFF_LOOPBACK) { m_freem(m); goto done; } goto sendit; } /* * If the source address is not specified yet, use the address * of the outoing interface. */ if (ip->ip_src.s_addr == INADDR_ANY) { /* Interface may have no addresses. */ if (ia != NULL) { ip->ip_src = IA_SIN(ia)->sin_addr; } } /* * Look for broadcast address and * verify user is allowed to send * such a packet. */ if (isbroadcast) { if ((ifp->if_flags & IFF_BROADCAST) == 0) { error = EADDRNOTAVAIL; goto bad; } if ((flags & IP_ALLOWBROADCAST) == 0) { error = EACCES; goto bad; } /* don't allow broadcast messages to be fragmented */ if (ip_len > mtu) { error = EMSGSIZE; goto bad; } m->m_flags |= M_BCAST; } else { m->m_flags &= ~M_BCAST; } sendit: #if defined(IPSEC) || defined(IPSEC_SUPPORT) if (IPSEC_ENABLED(ipv4)) { if ((error = IPSEC_OUTPUT(ipv4, m, inp)) != 0) { if (error == EINPROGRESS) error = 0; goto done; } } /* * Check if there was a route for this packet; return error if not. */ if (no_route_but_check_spd) { IPSTAT_INC(ips_noroute); error = EHOSTUNREACH; goto bad; } /* Update variables that are affected by ipsec4_output(). */ ip = mtod(m, struct ip *); hlen = ip->ip_hl << 2; #endif /* IPSEC */ /* Jump over all PFIL processing if hooks are not active. 
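The ip_moptions fields consulted in the multicast branch above (imo_multicast_ifp, imo_multicast_ttl, imo_multicast_loop) are filled in from userland through setsockopt(2); a minimal self-contained sketch, with placeholder interface and group addresses:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <unistd.h>

int
main(void)
{
	int s = socket(AF_INET, SOCK_DGRAM, 0);
	struct in_addr ifaddr;
	unsigned char ttl = 16;		/* overrides IP_DEFAULT_MULTICAST_TTL (1) */
	unsigned char loop = 0;		/* suppress the ip_mloopback() copy */

	if (s == -1)
		err(1, "socket");
	ifaddr.s_addr = inet_addr("192.0.2.10");	/* outgoing interface */
	if (setsockopt(s, IPPROTO_IP, IP_MULTICAST_IF, &ifaddr,
	    sizeof(ifaddr)) == -1 ||
	    setsockopt(s, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl)) == -1 ||
	    setsockopt(s, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop)) == -1)
		err(1, "setsockopt");
	/* sendto(s, ..., group:port) now uses the options set above. */
	close(s);
	return (0);
}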
*/ if (PFIL_HOOKED_OUT(V_inet_pfil_head)) { switch (ip_output_pfil(&m, ifp, inp, dst, &fibnum, &error)) { case 1: /* Finished */ goto done; case 0: /* Continue normally */ ip = mtod(m, struct ip *); break; case -1: /* Need to try again */ /* Reset everything for a new round */ RO_RTFREE(ro); ro->ro_prepend = NULL; rte = NULL; gw = dst; ip = mtod(m, struct ip *); goto again; } } /* IN_LOOPBACK must not appear on the wire - RFC1122. */ if (IN_LOOPBACK(ntohl(ip->ip_dst.s_addr)) || IN_LOOPBACK(ntohl(ip->ip_src.s_addr))) { if ((ifp->if_flags & IFF_LOOPBACK) == 0) { IPSTAT_INC(ips_badaddr); error = EADDRNOTAVAIL; goto bad; } } m->m_pkthdr.csum_flags |= CSUM_IP; if (m->m_pkthdr.csum_flags & CSUM_DELAY_DATA & ~ifp->if_hwassist) { in_delayed_cksum(m); m->m_pkthdr.csum_flags &= ~CSUM_DELAY_DATA; } #ifdef SCTP if (m->m_pkthdr.csum_flags & CSUM_SCTP & ~ifp->if_hwassist) { sctp_delayed_cksum(m, (uint32_t)(ip->ip_hl << 2)); m->m_pkthdr.csum_flags &= ~CSUM_SCTP; } #endif /* * If small enough for interface, or the interface will take * care of the fragmentation for us, we can just send directly. */ if (ip_len <= mtu || (m->m_pkthdr.csum_flags & ifp->if_hwassist & CSUM_TSO) != 0) { ip->ip_sum = 0; if (m->m_pkthdr.csum_flags & CSUM_IP & ~ifp->if_hwassist) { ip->ip_sum = in_cksum(m, hlen); m->m_pkthdr.csum_flags &= ~CSUM_IP; } /* * Record statistics for this interface address. * With CSUM_TSO the byte/packet count will be slightly * incorrect because we count the IP+TCP headers only * once instead of for every generated packet. */ if (!(flags & IP_FORWARDING) && ia) { if (m->m_pkthdr.csum_flags & CSUM_TSO) counter_u64_add(ia->ia_ifa.ifa_opackets, m->m_pkthdr.len / m->m_pkthdr.tso_segsz); else counter_u64_add(ia->ia_ifa.ifa_opackets, 1); counter_u64_add(ia->ia_ifa.ifa_obytes, m->m_pkthdr.len); } #ifdef MBUF_STRESS_TEST if (mbuf_frag_size && m->m_pkthdr.len > mbuf_frag_size) m = m_fragment(m, M_NOWAIT, mbuf_frag_size); #endif /* * Reset layer specific mbuf flags * to avoid confusing lower layers. */ m_clrprotoflags(m); IP_PROBE(send, NULL, NULL, ip, ifp, ip, NULL); #ifdef RATELIMIT if (inp != NULL) { if (inp->inp_flags2 & INP_RATE_LIMIT_CHANGED) in_pcboutput_txrtlmt(inp, ifp, m); /* stamp send tag on mbuf */ m->m_pkthdr.snd_tag = inp->inp_snd_tag; } else { m->m_pkthdr.snd_tag = NULL; } #endif error = (*ifp->if_output)(ifp, m, (const struct sockaddr *)gw, ro); #ifdef RATELIMIT /* check for route change */ if (error == EAGAIN) in_pcboutput_eagain(inp); #endif goto done; } /* Balk when DF bit is set or the interface didn't support TSO. */ if ((ip_off & IP_DF) || (m->m_pkthdr.csum_flags & CSUM_TSO)) { error = EMSGSIZE; IPSTAT_INC(ips_cantfrag); goto bad; } /* * Too large for interface; fragment if possible. If successful, * on return, m will point to a list of packets to be sent. */ error = ip_fragment(ip, &m, mtu, ifp->if_hwassist); if (error) goto bad; for (; m; m = m0) { m0 = m->m_nextpkt; m->m_nextpkt = 0; if (error == 0) { /* Record statistics for this interface address. */ if (ia != NULL) { counter_u64_add(ia->ia_ifa.ifa_opackets, 1); counter_u64_add(ia->ia_ifa.ifa_obytes, m->m_pkthdr.len); } /* * Reset layer specific mbuf flags * to avoid confusing upper layers. 
*/ m_clrprotoflags(m); IP_PROBE(send, NULL, NULL, mtod(m, struct ip *), ifp, mtod(m, struct ip *), NULL); #ifdef RATELIMIT if (inp != NULL) { if (inp->inp_flags2 & INP_RATE_LIMIT_CHANGED) in_pcboutput_txrtlmt(inp, ifp, m); /* stamp send tag on mbuf */ m->m_pkthdr.snd_tag = inp->inp_snd_tag; } else { m->m_pkthdr.snd_tag = NULL; } #endif error = (*ifp->if_output)(ifp, m, (const struct sockaddr *)gw, ro); #ifdef RATELIMIT /* check for route change */ if (error == EAGAIN) in_pcboutput_eagain(inp); #endif } else m_freem(m); } if (error == 0) IPSTAT_INC(ips_fragmented); done: if (ro == &iproute) RO_RTFREE(ro); else if (rte == NULL) /* * If the caller supplied a route but somehow the reference * to it has been released need to prevent the caller * calling RTFREE on it again. */ ro->ro_rt = NULL; NET_EPOCH_EXIT(et); return (error); bad: m_freem(m); goto done; } /* * Create a chain of fragments which fit the given mtu. m_frag points to the * mbuf to be fragmented; on return it points to the chain with the fragments. * Return 0 if no error. If error, m_frag may contain a partially built * chain of fragments that should be freed by the caller. * * if_hwassist_flags is the hw offload capabilities (see if_data.ifi_hwassist) */ int ip_fragment(struct ip *ip, struct mbuf **m_frag, int mtu, u_long if_hwassist_flags) { int error = 0; int hlen = ip->ip_hl << 2; int len = (mtu - hlen) & ~7; /* size of payload in each fragment */ int off; struct mbuf *m0 = *m_frag; /* the original packet */ int firstlen; struct mbuf **mnext; int nfrags; uint16_t ip_len, ip_off; ip_len = ntohs(ip->ip_len); ip_off = ntohs(ip->ip_off); if (ip_off & IP_DF) { /* Fragmentation not allowed */ IPSTAT_INC(ips_cantfrag); return EMSGSIZE; } /* * Must be able to put at least 8 bytes per fragment. */ if (len < 8) return EMSGSIZE; /* * If the interface will not calculate checksums on * fragmented packets, then do it here. */ if (m0->m_pkthdr.csum_flags & CSUM_DELAY_DATA) { in_delayed_cksum(m0); m0->m_pkthdr.csum_flags &= ~CSUM_DELAY_DATA; } #ifdef SCTP if (m0->m_pkthdr.csum_flags & CSUM_SCTP) { sctp_delayed_cksum(m0, hlen); m0->m_pkthdr.csum_flags &= ~CSUM_SCTP; } #endif if (len > PAGE_SIZE) { /* * Fragment large datagrams such that each segment * contains a multiple of PAGE_SIZE amount of data, * plus headers. This enables a receiver to perform * page-flipping zero-copy optimizations. * * XXX When does this help given that sender and receiver * could have different page sizes, and also mtu could * be less than the receiver's page size ? */ int newlen; off = MIN(mtu, m0->m_pkthdr.len); /* * firstlen (off - hlen) must be aligned on an * 8-byte boundary */ if (off < hlen) goto smart_frag_failure; off = ((off - hlen) & ~7) + hlen; newlen = (~PAGE_MASK) & mtu; if ((newlen + sizeof (struct ip)) > mtu) { /* we failed, go back the default */ smart_frag_failure: newlen = len; off = hlen + len; } len = newlen; } else { off = hlen + len; } firstlen = off - hlen; mnext = &m0->m_nextpkt; /* pointer to next packet */ /* * Loop through length of segment after first fragment, * make new header and copy data of each part and link onto chain. * Here, m0 is the original packet, m is the fragment being created. * The fragments are linked off the m_nextpkt of the original * packet, which after processing serves as the first fragment. 
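A standalone worked example of the fragmentation arithmetic used by ip_fragment(): with an MTU of 1500 and a 20-byte header, len = (1500 - 20) & ~7 = 1480, so a 4000-byte datagram splits into payloads of 1480/1480/1020 at 8-byte-unit offsets 0, 185 and 370, with all but the last fragment carrying IP_MF. The numbers and the tiny program are illustrative only:

#include <stdio.h>

int
main(void)
{
	int mtu = 1500, hlen = 20, ip_len = 4000;
	int len = (mtu - hlen) & ~7;	/* payload bytes per fragment */
	int off;

	for (off = 0; off < ip_len - hlen; off += len) {
		int chunk = ip_len - hlen - off;

		if (chunk > len)
			chunk = len;
		printf("frag: ip_off=%d (8-byte units), payload=%d, MF=%d\n",
		    off >> 3, chunk, off + chunk < ip_len - hlen);
	}
	return (0);
}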
*/ for (nfrags = 1; off < ip_len; off += len, nfrags++) { struct ip *mhip; /* ip header on the fragment */ struct mbuf *m; int mhlen = sizeof (struct ip); m = m_gethdr(M_NOWAIT, MT_DATA); if (m == NULL) { error = ENOBUFS; IPSTAT_INC(ips_odropped); goto done; } /* * Make sure the complete packet header gets copied * from the originating mbuf to the newly created * mbuf. This also ensures that existing firewall * classification(s), VLAN tags and so on get copied * to the resulting fragmented packet(s): */ if (m_dup_pkthdr(m, m0, M_NOWAIT) == 0) { m_free(m); error = ENOBUFS; IPSTAT_INC(ips_odropped); goto done; } /* * In the first mbuf, leave room for the link header, then * copy the original IP header including options. The payload * goes into an additional mbuf chain returned by m_copym(). */ m->m_data += max_linkhdr; mhip = mtod(m, struct ip *); *mhip = *ip; if (hlen > sizeof (struct ip)) { mhlen = ip_optcopy(ip, mhip) + sizeof (struct ip); mhip->ip_v = IPVERSION; mhip->ip_hl = mhlen >> 2; } m->m_len = mhlen; /* XXX do we need to add ip_off below ? */ mhip->ip_off = ((off - hlen) >> 3) + ip_off; if (off + len >= ip_len) len = ip_len - off; else mhip->ip_off |= IP_MF; mhip->ip_len = htons((u_short)(len + mhlen)); m->m_next = m_copym(m0, off, len, M_NOWAIT); if (m->m_next == NULL) { /* copy failed */ m_free(m); error = ENOBUFS; /* ??? */ IPSTAT_INC(ips_odropped); goto done; } m->m_pkthdr.len = mhlen + len; #ifdef MAC mac_netinet_fragment(m0, m); #endif mhip->ip_off = htons(mhip->ip_off); mhip->ip_sum = 0; if (m->m_pkthdr.csum_flags & CSUM_IP & ~if_hwassist_flags) { mhip->ip_sum = in_cksum(m, mhlen); m->m_pkthdr.csum_flags &= ~CSUM_IP; } *mnext = m; mnext = &m->m_nextpkt; } IPSTAT_ADD(ips_ofragments, nfrags); /* * Update first fragment by trimming what's been copied out * and updating header. */ m_adj(m0, hlen + firstlen - ip_len); m0->m_pkthdr.len = hlen + firstlen; ip->ip_len = htons((u_short)m0->m_pkthdr.len); ip->ip_off = htons(ip_off | IP_MF); ip->ip_sum = 0; if (m0->m_pkthdr.csum_flags & CSUM_IP & ~if_hwassist_flags) { ip->ip_sum = in_cksum(m0, hlen); m0->m_pkthdr.csum_flags &= ~CSUM_IP; } done: *m_frag = m0; return error; } void in_delayed_cksum(struct mbuf *m) { struct ip *ip; struct udphdr *uh; uint16_t cklen, csum, offset; ip = mtod(m, struct ip *); offset = ip->ip_hl << 2 ; if (m->m_pkthdr.csum_flags & CSUM_UDP) { /* if udp header is not in the first mbuf copy udplen */ if (offset + sizeof(struct udphdr) > m->m_len) { m_copydata(m, offset + offsetof(struct udphdr, uh_ulen), sizeof(cklen), (caddr_t)&cklen); cklen = ntohs(cklen); } else { uh = (struct udphdr *)mtodo(m, offset); cklen = ntohs(uh->uh_ulen); } csum = in_cksum_skip(m, cklen + offset, offset); if (csum == 0) csum = 0xffff; } else { cklen = ntohs(ip->ip_len); csum = in_cksum_skip(m, cklen, offset); } offset += m->m_pkthdr.csum_data; /* checksum offset */ if (offset + sizeof(csum) > m->m_len) m_copyback(m, offset, sizeof(csum), (caddr_t)&csum); else *(u_short *)mtodo(m, offset) = csum; } /* * IP socket option processing. 
*/ int ip_ctloutput(struct socket *so, struct sockopt *sopt) { struct inpcb *inp = sotoinpcb(so); int error, optval; #ifdef RSS uint32_t rss_bucket; int retval; #endif error = optval = 0; if (sopt->sopt_level != IPPROTO_IP) { error = EINVAL; if (sopt->sopt_level == SOL_SOCKET && sopt->sopt_dir == SOPT_SET) { switch (sopt->sopt_name) { case SO_REUSEADDR: INP_WLOCK(inp); if ((so->so_options & SO_REUSEADDR) != 0) inp->inp_flags2 |= INP_REUSEADDR; else inp->inp_flags2 &= ~INP_REUSEADDR; INP_WUNLOCK(inp); error = 0; break; case SO_REUSEPORT: INP_WLOCK(inp); if ((so->so_options & SO_REUSEPORT) != 0) inp->inp_flags2 |= INP_REUSEPORT; else inp->inp_flags2 &= ~INP_REUSEPORT; INP_WUNLOCK(inp); error = 0; break; case SO_REUSEPORT_LB: INP_WLOCK(inp); if ((so->so_options & SO_REUSEPORT_LB) != 0) inp->inp_flags2 |= INP_REUSEPORT_LB; else inp->inp_flags2 &= ~INP_REUSEPORT_LB; INP_WUNLOCK(inp); error = 0; break; case SO_SETFIB: INP_WLOCK(inp); inp->inp_inc.inc_fibnum = so->so_fibnum; INP_WUNLOCK(inp); error = 0; break; case SO_MAX_PACING_RATE: #ifdef RATELIMIT INP_WLOCK(inp); inp->inp_flags2 |= INP_RATE_LIMIT_CHANGED; INP_WUNLOCK(inp); error = 0; #else error = EOPNOTSUPP; #endif break; default: break; } } return (error); } switch (sopt->sopt_dir) { case SOPT_SET: switch (sopt->sopt_name) { case IP_OPTIONS: #ifdef notyet case IP_RETOPTS: #endif { struct mbuf *m; if (sopt->sopt_valsize > MLEN) { error = EMSGSIZE; break; } m = m_get(sopt->sopt_td ? M_WAITOK : M_NOWAIT, MT_DATA); if (m == NULL) { error = ENOBUFS; break; } m->m_len = sopt->sopt_valsize; error = sooptcopyin(sopt, mtod(m, char *), m->m_len, m->m_len); if (error) { m_free(m); break; } INP_WLOCK(inp); error = ip_pcbopts(inp, sopt->sopt_name, m); INP_WUNLOCK(inp); return (error); } case IP_BINDANY: if (sopt->sopt_td != NULL) { error = priv_check(sopt->sopt_td, PRIV_NETINET_BINDANY); if (error) break; } /* FALLTHROUGH */ case IP_BINDMULTI: #ifdef RSS case IP_RSS_LISTEN_BUCKET: #endif case IP_TOS: case IP_TTL: case IP_MINTTL: case IP_RECVOPTS: case IP_RECVRETOPTS: case IP_ORIGDSTADDR: case IP_RECVDSTADDR: case IP_RECVTTL: case IP_RECVIF: case IP_ONESBCAST: case IP_DONTFRAG: case IP_RECVTOS: case IP_RECVFLOWID: #ifdef RSS case IP_RECVRSSBUCKETID: #endif error = sooptcopyin(sopt, &optval, sizeof optval, sizeof optval); if (error) break; switch (sopt->sopt_name) { case IP_TOS: inp->inp_ip_tos = optval; break; case IP_TTL: inp->inp_ip_ttl = optval; break; case IP_MINTTL: if (optval >= 0 && optval <= MAXTTL) inp->inp_ip_minttl = optval; else error = EINVAL; break; #define OPTSET(bit) do { \ INP_WLOCK(inp); \ if (optval) \ inp->inp_flags |= bit; \ else \ inp->inp_flags &= ~bit; \ INP_WUNLOCK(inp); \ } while (0) #define OPTSET2(bit, val) do { \ INP_WLOCK(inp); \ if (val) \ inp->inp_flags2 |= bit; \ else \ inp->inp_flags2 &= ~bit; \ INP_WUNLOCK(inp); \ } while (0) case IP_RECVOPTS: OPTSET(INP_RECVOPTS); break; case IP_RECVRETOPTS: OPTSET(INP_RECVRETOPTS); break; case IP_RECVDSTADDR: OPTSET(INP_RECVDSTADDR); break; case IP_ORIGDSTADDR: OPTSET2(INP_ORIGDSTADDR, optval); break; case IP_RECVTTL: OPTSET(INP_RECVTTL); break; case IP_RECVIF: OPTSET(INP_RECVIF); break; case IP_ONESBCAST: OPTSET(INP_ONESBCAST); break; case IP_DONTFRAG: OPTSET(INP_DONTFRAG); break; case IP_BINDANY: OPTSET(INP_BINDANY); break; case IP_RECVTOS: OPTSET(INP_RECVTOS); break; case IP_BINDMULTI: OPTSET2(INP_BINDMULTI, optval); break; case IP_RECVFLOWID: OPTSET2(INP_RECVFLOWID, optval); break; #ifdef RSS case IP_RSS_LISTEN_BUCKET: if ((optval >= 0) && (optval < rss_getnumbuckets())) { 
inp->inp_rss_listen_bucket = optval; OPTSET2(INP_RSS_BUCKET_SET, 1); } else { error = EINVAL; } break; case IP_RECVRSSBUCKETID: OPTSET2(INP_RECVRSSBUCKETID, optval); break; #endif } break; #undef OPTSET #undef OPTSET2 /* * Multicast socket options are processed by the in_mcast * module. */ case IP_MULTICAST_IF: case IP_MULTICAST_VIF: case IP_MULTICAST_TTL: case IP_MULTICAST_LOOP: case IP_ADD_MEMBERSHIP: case IP_DROP_MEMBERSHIP: case IP_ADD_SOURCE_MEMBERSHIP: case IP_DROP_SOURCE_MEMBERSHIP: case IP_BLOCK_SOURCE: case IP_UNBLOCK_SOURCE: case IP_MSFILTER: case MCAST_JOIN_GROUP: case MCAST_LEAVE_GROUP: case MCAST_JOIN_SOURCE_GROUP: case MCAST_LEAVE_SOURCE_GROUP: case MCAST_BLOCK_SOURCE: case MCAST_UNBLOCK_SOURCE: error = inp_setmoptions(inp, sopt); break; case IP_PORTRANGE: error = sooptcopyin(sopt, &optval, sizeof optval, sizeof optval); if (error) break; INP_WLOCK(inp); switch (optval) { case IP_PORTRANGE_DEFAULT: inp->inp_flags &= ~(INP_LOWPORT); inp->inp_flags &= ~(INP_HIGHPORT); break; case IP_PORTRANGE_HIGH: inp->inp_flags &= ~(INP_LOWPORT); inp->inp_flags |= INP_HIGHPORT; break; case IP_PORTRANGE_LOW: inp->inp_flags &= ~(INP_HIGHPORT); inp->inp_flags |= INP_LOWPORT; break; default: error = EINVAL; break; } INP_WUNLOCK(inp); break; #if defined(IPSEC) || defined(IPSEC_SUPPORT) case IP_IPSEC_POLICY: if (IPSEC_ENABLED(ipv4)) { error = IPSEC_PCBCTL(ipv4, inp, sopt); break; } /* FALLTHROUGH */ #endif /* IPSEC */ default: error = ENOPROTOOPT; break; } break; case SOPT_GET: switch (sopt->sopt_name) { case IP_OPTIONS: case IP_RETOPTS: INP_RLOCK(inp); if (inp->inp_options) { struct mbuf *options; options = m_copym(inp->inp_options, 0, M_COPYALL, M_NOWAIT); INP_RUNLOCK(inp); if (options != NULL) { error = sooptcopyout(sopt, mtod(options, char *), options->m_len); m_freem(options); } else error = ENOMEM; } else { INP_RUNLOCK(inp); sopt->sopt_valsize = 0; } break; case IP_TOS: case IP_TTL: case IP_MINTTL: case IP_RECVOPTS: case IP_RECVRETOPTS: case IP_ORIGDSTADDR: case IP_RECVDSTADDR: case IP_RECVTTL: case IP_RECVIF: case IP_PORTRANGE: case IP_ONESBCAST: case IP_DONTFRAG: case IP_BINDANY: case IP_RECVTOS: case IP_BINDMULTI: case IP_FLOWID: case IP_FLOWTYPE: case IP_RECVFLOWID: #ifdef RSS case IP_RSSBUCKETID: case IP_RECVRSSBUCKETID: #endif switch (sopt->sopt_name) { case IP_TOS: optval = inp->inp_ip_tos; break; case IP_TTL: optval = inp->inp_ip_ttl; break; case IP_MINTTL: optval = inp->inp_ip_minttl; break; #define OPTBIT(bit) (inp->inp_flags & bit ? 1 : 0) #define OPTBIT2(bit) (inp->inp_flags2 & bit ? 
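The SOPT_SET/SOPT_GET cases above are what setsockopt(2)/getsockopt(2) at level IPPROTO_IP land in; a small self-contained sketch exercising a few of them (the option values are arbitrary):

#include <sys/socket.h>
#include <netinet/in.h>
#include <err.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	int s = socket(AF_INET, SOCK_DGRAM, 0);
	int tos = 0x10, ttl = 64, range = IP_PORTRANGE_HIGH, v;
	socklen_t len = sizeof(v);

	if (s == -1)
		err(1, "socket");
	if (setsockopt(s, IPPROTO_IP, IP_TOS, &tos, sizeof(tos)) == -1 ||
	    setsockopt(s, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)) == -1 ||
	    setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &range, sizeof(range)) == -1)
		err(1, "setsockopt");
	if (getsockopt(s, IPPROTO_IP, IP_TOS, &v, &len) == -1)
		err(1, "getsockopt");
	printf("IP_TOS readback: %#x\n", v);	/* stored in inp->inp_ip_tos */
	close(s);
	return (0);
}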
1 : 0) case IP_RECVOPTS: optval = OPTBIT(INP_RECVOPTS); break; case IP_RECVRETOPTS: optval = OPTBIT(INP_RECVRETOPTS); break; case IP_RECVDSTADDR: optval = OPTBIT(INP_RECVDSTADDR); break; case IP_ORIGDSTADDR: optval = OPTBIT2(INP_ORIGDSTADDR); break; case IP_RECVTTL: optval = OPTBIT(INP_RECVTTL); break; case IP_RECVIF: optval = OPTBIT(INP_RECVIF); break; case IP_PORTRANGE: if (inp->inp_flags & INP_HIGHPORT) optval = IP_PORTRANGE_HIGH; else if (inp->inp_flags & INP_LOWPORT) optval = IP_PORTRANGE_LOW; else optval = 0; break; case IP_ONESBCAST: optval = OPTBIT(INP_ONESBCAST); break; case IP_DONTFRAG: optval = OPTBIT(INP_DONTFRAG); break; case IP_BINDANY: optval = OPTBIT(INP_BINDANY); break; case IP_RECVTOS: optval = OPTBIT(INP_RECVTOS); break; case IP_FLOWID: optval = inp->inp_flowid; break; case IP_FLOWTYPE: optval = inp->inp_flowtype; break; case IP_RECVFLOWID: optval = OPTBIT2(INP_RECVFLOWID); break; #ifdef RSS case IP_RSSBUCKETID: retval = rss_hash2bucket(inp->inp_flowid, inp->inp_flowtype, &rss_bucket); if (retval == 0) optval = rss_bucket; else error = EINVAL; break; case IP_RECVRSSBUCKETID: optval = OPTBIT2(INP_RECVRSSBUCKETID); break; #endif case IP_BINDMULTI: optval = OPTBIT2(INP_BINDMULTI); break; } error = sooptcopyout(sopt, &optval, sizeof optval); break; /* * Multicast socket options are processed by the in_mcast * module. */ case IP_MULTICAST_IF: case IP_MULTICAST_VIF: case IP_MULTICAST_TTL: case IP_MULTICAST_LOOP: case IP_MSFILTER: error = inp_getmoptions(inp, sopt); break; #if defined(IPSEC) || defined(IPSEC_SUPPORT) case IP_IPSEC_POLICY: if (IPSEC_ENABLED(ipv4)) { error = IPSEC_PCBCTL(ipv4, inp, sopt); break; } /* FALLTHROUGH */ #endif /* IPSEC */ default: error = ENOPROTOOPT; break; } break; } return (error); } /* * Routine called from ip_output() to loop back a copy of an IP multicast * packet to the input queue of a specified interface. Note that this * calls the output routine of the loopback "driver", but with an interface * pointer that might NOT be a loopback interface -- evil, but easier than * replicating that code here. */ static void ip_mloopback(struct ifnet *ifp, const struct mbuf *m, int hlen) { struct ip *ip; struct mbuf *copym; /* * Make a deep copy of the packet because we're going to * modify the pack in order to generate checksums. */ copym = m_dup(m, M_NOWAIT); if (copym != NULL && (!M_WRITABLE(copym) || copym->m_len < hlen)) copym = m_pullup(copym, hlen); if (copym != NULL) { /* If needed, compute the checksum and mark it as valid. */ if (copym->m_pkthdr.csum_flags & CSUM_DELAY_DATA) { in_delayed_cksum(copym); copym->m_pkthdr.csum_flags &= ~CSUM_DELAY_DATA; copym->m_pkthdr.csum_flags |= CSUM_DATA_VALID | CSUM_PSEUDO_HDR; copym->m_pkthdr.csum_data = 0xffff; } /* * We don't bother to fragment if the IP length is greater * than the interface's MTU. Can this possibly matter? */ ip = mtod(copym, struct ip *); ip->ip_sum = 0; ip->ip_sum = in_cksum(copym, hlen); if_simloop(ifp, copym, AF_INET, 0); } } Index: user/ngie/bug-237403/sys/netinet/tcp_syncache.c =================================================================== --- user/ngie/bug-237403/sys/netinet/tcp_syncache.c (revision 346925) +++ user/ngie/bug-237403/sys/netinet/tcp_syncache.c (revision 346926) @@ -1,2297 +1,2300 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2001 McAfee, Inc. * Copyright (c) 2006,2013 Andre Oppermann, Internet Business Solutions AG * All rights reserved. 
* * This software was developed for the FreeBSD Project by Jonathan Lemon * and McAfee Research, the Security Research Division of McAfee, Inc. under * DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the * DARPA CHATS research program. [2001 McAfee, Inc.] * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include "opt_ipsec.h" #include "opt_pcbgroup.h" #include #include #include #include #include #include #include #include #include #include #include #include /* for proc0 declaration */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef INET6 #include #include #include #include #include #endif #include #include #include #include #include #include #include #ifdef INET6 #include #endif #ifdef TCP_OFFLOAD #include #endif #include #include #include VNET_DEFINE_STATIC(int, tcp_syncookies) = 1; #define V_tcp_syncookies VNET(tcp_syncookies) SYSCTL_INT(_net_inet_tcp, OID_AUTO, syncookies, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(tcp_syncookies), 0, "Use TCP SYN cookies if the syncache overflows"); VNET_DEFINE_STATIC(int, tcp_syncookiesonly) = 0; #define V_tcp_syncookiesonly VNET(tcp_syncookiesonly) SYSCTL_INT(_net_inet_tcp, OID_AUTO, syncookies_only, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(tcp_syncookiesonly), 0, "Use only TCP SYN cookies"); VNET_DEFINE_STATIC(int, functions_inherit_listen_socket_stack) = 1; #define V_functions_inherit_listen_socket_stack \ VNET(functions_inherit_listen_socket_stack) SYSCTL_INT(_net_inet_tcp, OID_AUTO, functions_inherit_listen_socket_stack, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(functions_inherit_listen_socket_stack), 0, "Inherit listen socket's stack"); #ifdef TCP_OFFLOAD #define ADDED_BY_TOE(sc) ((sc)->sc_tod != NULL) #endif static void syncache_drop(struct syncache *, struct syncache_head *); static void syncache_free(struct syncache *); static void syncache_insert(struct syncache *, struct syncache_head *); static int syncache_respond(struct syncache *, struct syncache_head *, const struct mbuf *, int); static struct socket *syncache_socket(struct syncache *, struct socket *, struct mbuf *m); static void syncache_timeout(struct syncache *sc, struct 
syncache_head *sch, int docallout); static void syncache_timer(void *); static uint32_t syncookie_mac(struct in_conninfo *, tcp_seq, uint8_t, uint8_t *, uintptr_t); static tcp_seq syncookie_generate(struct syncache_head *, struct syncache *); static struct syncache *syncookie_lookup(struct in_conninfo *, struct syncache_head *, struct syncache *, struct tcphdr *, struct tcpopt *, struct socket *); static void syncookie_reseed(void *); #ifdef INVARIANTS static int syncookie_cmp(struct in_conninfo *inc, struct syncache_head *sch, struct syncache *sc, struct tcphdr *th, struct tcpopt *to, struct socket *lso); #endif /* * Transmit the SYN,ACK fewer times than TCP_MAXRXTSHIFT specifies. * 3 retransmits corresponds to a timeout with default values of * tcp_rexmit_initial * ( 1 + * tcp_backoff[1] + * tcp_backoff[2] + * tcp_backoff[3]) + 3 * tcp_rexmit_slop, * 1000 ms * (1 + 2 + 4 + 8) + 3 * 200 ms = 15600 ms, * the odds are that the user has given up attempting to connect by then. */ #define SYNCACHE_MAXREXMTS 3 /* Arbitrary values */ #define TCP_SYNCACHE_HASHSIZE 512 #define TCP_SYNCACHE_BUCKETLIMIT 30 VNET_DEFINE_STATIC(struct tcp_syncache, tcp_syncache); #define V_tcp_syncache VNET(tcp_syncache) static SYSCTL_NODE(_net_inet_tcp, OID_AUTO, syncache, CTLFLAG_RW, 0, "TCP SYN cache"); SYSCTL_UINT(_net_inet_tcp_syncache, OID_AUTO, bucketlimit, CTLFLAG_VNET | CTLFLAG_RDTUN, &VNET_NAME(tcp_syncache.bucket_limit), 0, "Per-bucket hash limit for syncache"); SYSCTL_UINT(_net_inet_tcp_syncache, OID_AUTO, cachelimit, CTLFLAG_VNET | CTLFLAG_RDTUN, &VNET_NAME(tcp_syncache.cache_limit), 0, "Overall entry limit for syncache"); SYSCTL_UMA_CUR(_net_inet_tcp_syncache, OID_AUTO, count, CTLFLAG_VNET, &VNET_NAME(tcp_syncache.zone), "Current number of entries in syncache"); SYSCTL_UINT(_net_inet_tcp_syncache, OID_AUTO, hashsize, CTLFLAG_VNET | CTLFLAG_RDTUN, &VNET_NAME(tcp_syncache.hashsize), 0, "Size of TCP syncache hashtable"); static int sysctl_net_inet_tcp_syncache_rexmtlimit_check(SYSCTL_HANDLER_ARGS) { int error; u_int new; new = V_tcp_syncache.rexmt_limit; error = sysctl_handle_int(oidp, &new, 0, req); if ((error == 0) && (req->newptr != NULL)) { if (new > TCP_MAXRXTSHIFT) error = EINVAL; else V_tcp_syncache.rexmt_limit = new; } return (error); } SYSCTL_PROC(_net_inet_tcp_syncache, OID_AUTO, rexmtlimit, CTLFLAG_VNET | CTLTYPE_UINT | CTLFLAG_RW, &VNET_NAME(tcp_syncache.rexmt_limit), 0, sysctl_net_inet_tcp_syncache_rexmtlimit_check, "UI", "Limit on SYN/ACK retransmissions"); VNET_DEFINE(int, tcp_sc_rst_sock_fail) = 1; SYSCTL_INT(_net_inet_tcp_syncache, OID_AUTO, rst_on_sock_fail, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(tcp_sc_rst_sock_fail), 0, "Send reset on socket allocation failure"); static MALLOC_DEFINE(M_SYNCACHE, "syncache", "TCP syncache"); #define SCH_LOCK(sch) mtx_lock(&(sch)->sch_mtx) #define SCH_UNLOCK(sch) mtx_unlock(&(sch)->sch_mtx) #define SCH_LOCK_ASSERT(sch) mtx_assert(&(sch)->sch_mtx, MA_OWNED) /* * Requires the syncache entry to be already removed from the bucket list. 
*/ static void syncache_free(struct syncache *sc) { if (sc->sc_ipopts) (void) m_free(sc->sc_ipopts); if (sc->sc_cred) crfree(sc->sc_cred); #ifdef MAC mac_syncache_destroy(&sc->sc_label); #endif uma_zfree(V_tcp_syncache.zone, sc); } void syncache_init(void) { int i; V_tcp_syncache.hashsize = TCP_SYNCACHE_HASHSIZE; V_tcp_syncache.bucket_limit = TCP_SYNCACHE_BUCKETLIMIT; V_tcp_syncache.rexmt_limit = SYNCACHE_MAXREXMTS; V_tcp_syncache.hash_secret = arc4random(); TUNABLE_INT_FETCH("net.inet.tcp.syncache.hashsize", &V_tcp_syncache.hashsize); TUNABLE_INT_FETCH("net.inet.tcp.syncache.bucketlimit", &V_tcp_syncache.bucket_limit); if (!powerof2(V_tcp_syncache.hashsize) || V_tcp_syncache.hashsize == 0) { printf("WARNING: syncache hash size is not a power of 2.\n"); V_tcp_syncache.hashsize = TCP_SYNCACHE_HASHSIZE; } V_tcp_syncache.hashmask = V_tcp_syncache.hashsize - 1; /* Set limits. */ V_tcp_syncache.cache_limit = V_tcp_syncache.hashsize * V_tcp_syncache.bucket_limit; TUNABLE_INT_FETCH("net.inet.tcp.syncache.cachelimit", &V_tcp_syncache.cache_limit); /* Allocate the hash table. */ V_tcp_syncache.hashbase = malloc(V_tcp_syncache.hashsize * sizeof(struct syncache_head), M_SYNCACHE, M_WAITOK | M_ZERO); #ifdef VIMAGE V_tcp_syncache.vnet = curvnet; #endif /* Initialize the hash buckets. */ for (i = 0; i < V_tcp_syncache.hashsize; i++) { TAILQ_INIT(&V_tcp_syncache.hashbase[i].sch_bucket); mtx_init(&V_tcp_syncache.hashbase[i].sch_mtx, "tcp_sc_head", NULL, MTX_DEF); callout_init_mtx(&V_tcp_syncache.hashbase[i].sch_timer, &V_tcp_syncache.hashbase[i].sch_mtx, 0); V_tcp_syncache.hashbase[i].sch_length = 0; V_tcp_syncache.hashbase[i].sch_sc = &V_tcp_syncache; V_tcp_syncache.hashbase[i].sch_last_overflow = -(SYNCOOKIE_LIFETIME + 1); } /* Create the syncache entry zone. */ V_tcp_syncache.zone = uma_zcreate("syncache", sizeof(struct syncache), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0); V_tcp_syncache.cache_limit = uma_zone_set_max(V_tcp_syncache.zone, V_tcp_syncache.cache_limit); /* Start the SYN cookie reseeder callout. */ callout_init(&V_tcp_syncache.secret.reseed, 1); arc4rand(V_tcp_syncache.secret.key[0], SYNCOOKIE_SECRET_SIZE, 0); arc4rand(V_tcp_syncache.secret.key[1], SYNCOOKIE_SECRET_SIZE, 0); callout_reset(&V_tcp_syncache.secret.reseed, SYNCOOKIE_LIFETIME * hz, syncookie_reseed, &V_tcp_syncache); } #ifdef VIMAGE void syncache_destroy(void) { struct syncache_head *sch; struct syncache *sc, *nsc; int i; /* * Stop the re-seed timer before freeing resources. No need to * possibly schedule it another time. */ callout_drain(&V_tcp_syncache.secret.reseed); /* Cleanup hash buckets: stop timers, free entries, destroy locks. */ for (i = 0; i < V_tcp_syncache.hashsize; i++) { sch = &V_tcp_syncache.hashbase[i]; callout_drain(&sch->sch_timer); SCH_LOCK(sch); TAILQ_FOREACH_SAFE(sc, &sch->sch_bucket, sc_hash, nsc) syncache_drop(sc, sch); SCH_UNLOCK(sch); KASSERT(TAILQ_EMPTY(&sch->sch_bucket), ("%s: sch->sch_bucket not empty", __func__)); KASSERT(sch->sch_length == 0, ("%s: sch->sch_length %d not 0", __func__, sch->sch_length)); mtx_destroy(&sch->sch_mtx); } KASSERT(uma_zone_get_cur(V_tcp_syncache.zone) == 0, ("%s: cache_count not 0", __func__)); /* Free the allocated global resources. */ uma_zdestroy(V_tcp_syncache.zone); free(V_tcp_syncache.hashbase, M_SYNCACHE); } #endif /* * Inserts a syncache entry into the specified bucket row. * Locks and unlocks the syncache_head autonomously. 
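For scale, a short note on the defaults wired up in syncache_init() above; the arithmetic follows directly from the constants shown, and the rounding remark is an assumption about uma_zone_set_max() behavior:

/*
 * TCP_SYNCACHE_HASHSIZE (512) * TCP_SYNCACHE_BUCKETLIMIT (30) gives a
 * default cache_limit of 15360 entries; the value assigned back from
 * uma_zone_set_max() means the effective limit may be rounded up to a
 * slab multiple.  All three values can be overridden at boot through the
 * net.inet.tcp.syncache.{hashsize,bucketlimit,cachelimit} tunables fetched
 * with TUNABLE_INT_FETCH() before the hash table is allocated.
 */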
*/ static void syncache_insert(struct syncache *sc, struct syncache_head *sch) { struct syncache *sc2; SCH_LOCK(sch); /* * Make sure that we don't overflow the per-bucket limit. * If the bucket is full, toss the oldest element. */ if (sch->sch_length >= V_tcp_syncache.bucket_limit) { KASSERT(!TAILQ_EMPTY(&sch->sch_bucket), ("sch->sch_length incorrect")); sc2 = TAILQ_LAST(&sch->sch_bucket, sch_head); sch->sch_last_overflow = time_uptime; syncache_drop(sc2, sch); TCPSTAT_INC(tcps_sc_bucketoverflow); } /* Put it into the bucket. */ TAILQ_INSERT_HEAD(&sch->sch_bucket, sc, sc_hash); sch->sch_length++; #ifdef TCP_OFFLOAD if (ADDED_BY_TOE(sc)) { struct toedev *tod = sc->sc_tod; tod->tod_syncache_added(tod, sc->sc_todctx); } #endif /* Reinitialize the bucket row's timer. */ if (sch->sch_length == 1) sch->sch_nextc = ticks + INT_MAX; syncache_timeout(sc, sch, 1); SCH_UNLOCK(sch); TCPSTATES_INC(TCPS_SYN_RECEIVED); TCPSTAT_INC(tcps_sc_added); } /* * Remove and free entry from syncache bucket row. * Expects locked syncache head. */ static void syncache_drop(struct syncache *sc, struct syncache_head *sch) { SCH_LOCK_ASSERT(sch); TCPSTATES_DEC(TCPS_SYN_RECEIVED); TAILQ_REMOVE(&sch->sch_bucket, sc, sc_hash); sch->sch_length--; #ifdef TCP_OFFLOAD if (ADDED_BY_TOE(sc)) { struct toedev *tod = sc->sc_tod; tod->tod_syncache_removed(tod, sc->sc_todctx); } #endif syncache_free(sc); } /* * Engage/reengage time on bucket row. */ static void syncache_timeout(struct syncache *sc, struct syncache_head *sch, int docallout) { int rexmt; if (sc->sc_rxmits == 0) rexmt = tcp_rexmit_initial; else TCPT_RANGESET(rexmt, tcp_rexmit_initial * tcp_backoff[sc->sc_rxmits], tcp_rexmit_min, TCPTV_REXMTMAX); sc->sc_rxttime = ticks + rexmt; sc->sc_rxmits++; if (TSTMP_LT(sc->sc_rxttime, sch->sch_nextc)) { sch->sch_nextc = sc->sc_rxttime; if (docallout) callout_reset(&sch->sch_timer, sch->sch_nextc - ticks, syncache_timer, (void *)sch); } } /* * Walk the timer queues, looking for SYN,ACKs that need to be retransmitted. * If we have retransmitted an entry the maximum number of times, expire it. * One separate timer for each bucket row. */ static void syncache_timer(void *xsch) { struct syncache_head *sch = (struct syncache_head *)xsch; struct syncache *sc, *nsc; int tick = ticks; char *s; CURVNET_SET(sch->sch_sc->vnet); /* NB: syncache_head has already been locked by the callout. */ SCH_LOCK_ASSERT(sch); /* * In the following cycle we may remove some entries and/or * advance some timeouts, so re-initialize the bucket timer. */ sch->sch_nextc = tick + INT_MAX; TAILQ_FOREACH_SAFE(sc, &sch->sch_bucket, sc_hash, nsc) { /* * We do not check if the listen socket still exists * and accept the case where the listen socket may be * gone by the time we resend the SYN/ACK. We do * not expect this to happens often. If it does, * then the RST will be sent by the time the remote * host does the SYN/ACK->ACK. 
*/ if (TSTMP_GT(sc->sc_rxttime, tick)) { if (TSTMP_LT(sc->sc_rxttime, sch->sch_nextc)) sch->sch_nextc = sc->sc_rxttime; continue; } if (sc->sc_rxmits > V_tcp_syncache.rexmt_limit) { if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) { log(LOG_DEBUG, "%s; %s: Retransmits exhausted, " "giving up and removing syncache entry\n", s, __func__); free(s, M_TCPLOG); } syncache_drop(sc, sch); TCPSTAT_INC(tcps_sc_stale); continue; } if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) { log(LOG_DEBUG, "%s; %s: Response timeout, " "retransmitting (%u) SYN|ACK\n", s, __func__, sc->sc_rxmits); free(s, M_TCPLOG); } syncache_respond(sc, sch, NULL, TH_SYN|TH_ACK); TCPSTAT_INC(tcps_sc_retransmitted); syncache_timeout(sc, sch, 0); } if (!TAILQ_EMPTY(&(sch)->sch_bucket)) callout_reset(&(sch)->sch_timer, (sch)->sch_nextc - tick, syncache_timer, (void *)(sch)); CURVNET_RESTORE(); } /* * Find an entry in the syncache. * Returns always with locked syncache_head plus a matching entry or NULL. */ static struct syncache * syncache_lookup(struct in_conninfo *inc, struct syncache_head **schp) { struct syncache *sc; struct syncache_head *sch; uint32_t hash; /* * The hash is built on foreign port + local port + foreign address. * We rely on the fact that struct in_conninfo starts with 16 bits * of foreign port, then 16 bits of local port then followed by 128 * bits of foreign address. In case of IPv4 address, the first 3 * 32-bit words of the address always are zeroes. */ hash = jenkins_hash32((uint32_t *)&inc->inc_ie, 5, V_tcp_syncache.hash_secret) & V_tcp_syncache.hashmask; sch = &V_tcp_syncache.hashbase[hash]; *schp = sch; SCH_LOCK(sch); /* Circle through bucket row to find matching entry. */ TAILQ_FOREACH(sc, &sch->sch_bucket, sc_hash) if (bcmp(&inc->inc_ie, &sc->sc_inc.inc_ie, sizeof(struct in_endpoints)) == 0) break; return (sc); /* Always returns with locked sch. */ } /* * This function is called when we get a RST for a * non-existent connection, so that we can see if the * connection is in the syn cache. If it is, zap it. * If required send a challenge ACK. */ void syncache_chkrst(struct in_conninfo *inc, struct tcphdr *th, struct mbuf *m) { struct syncache *sc; struct syncache_head *sch; char *s = NULL; sc = syncache_lookup(inc, &sch); /* returns locked sch */ SCH_LOCK_ASSERT(sch); /* * Any RST to our SYN|ACK must not carry ACK, SYN or FIN flags. * See RFC 793 page 65, section SEGMENT ARRIVES. */ if (th->th_flags & (TH_ACK|TH_SYN|TH_FIN)) { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Spurious RST with ACK, SYN or " "FIN flag set, segment ignored\n", s, __func__); TCPSTAT_INC(tcps_badrst); goto done; } /* * No corresponding connection was found in syncache. * If syncookies are enabled and possibly exclusively * used, or we are under memory pressure, a valid RST * may not find a syncache entry. In that case we're * done and no SYN|ACK retransmissions will happen. * Otherwise the RST was misdirected or spoofed. */ if (sc == NULL) { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Spurious RST without matching " "syncache entry (possibly syncookie only), " "segment ignored\n", s, __func__); TCPSTAT_INC(tcps_badrst); goto done; } /* * If the RST bit is set, check the sequence number to see * if this is a valid reset segment. * * RFC 793 page 37: * In all states except SYN-SENT, all reset (RST) segments * are validated by checking their SEQ-fields. A reset is * valid if its sequence number is in the window. 
* * RFC 793 page 69: * There are four cases for the acceptability test for an incoming * segment: * * Segment Receive Test * Length Window * ------- ------- ------------------------------------------- * 0 0 SEG.SEQ = RCV.NXT * 0 >0 RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND * >0 0 not acceptable * >0 >0 RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND * or RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND * * Note that when receiving a SYN segment in the LISTEN state, * IRS is set to SEG.SEQ and RCV.NXT is set to SEG.SEQ+1, as * described in RFC 793, page 66. */ if ((SEQ_GEQ(th->th_seq, sc->sc_irs + 1) && SEQ_LT(th->th_seq, sc->sc_irs + 1 + sc->sc_wnd)) || (sc->sc_wnd == 0 && th->th_seq == sc->sc_irs + 1)) { if (V_tcp_insecure_rst || th->th_seq == sc->sc_irs + 1) { syncache_drop(sc, sch); if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Our SYN|ACK was rejected, " "connection attempt aborted by remote " "endpoint\n", s, __func__); TCPSTAT_INC(tcps_sc_reset); } else { TCPSTAT_INC(tcps_badrst); /* Send challenge ACK. */ if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: RST with invalid " " SEQ %u != NXT %u (+WND %u), " "sending challenge ACK\n", s, __func__, th->th_seq, sc->sc_irs + 1, sc->sc_wnd); syncache_respond(sc, sch, m, TH_ACK); } } else { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: RST with invalid SEQ %u != " "NXT %u (+WND %u), segment ignored\n", s, __func__, th->th_seq, sc->sc_irs + 1, sc->sc_wnd); TCPSTAT_INC(tcps_badrst); } done: if (s != NULL) free(s, M_TCPLOG); SCH_UNLOCK(sch); } void syncache_badack(struct in_conninfo *inc) { struct syncache *sc; struct syncache_head *sch; sc = syncache_lookup(inc, &sch); /* returns locked sch */ SCH_LOCK_ASSERT(sch); if (sc != NULL) { syncache_drop(sc, sch); TCPSTAT_INC(tcps_sc_badack); } SCH_UNLOCK(sch); } void syncache_unreach(struct in_conninfo *inc, tcp_seq th_seq) { struct syncache *sc; struct syncache_head *sch; sc = syncache_lookup(inc, &sch); /* returns locked sch */ SCH_LOCK_ASSERT(sch); if (sc == NULL) goto done; /* If the sequence number != sc_iss, then it's a bogus ICMP msg */ if (ntohl(th_seq) != sc->sc_iss) goto done; /* * If we've rertransmitted 3 times and this is our second error, * we remove the entry. Otherwise, we allow it to continue on. * This prevents us from incorrectly nuking an entry during a * spurious network outage. * * See tcp_notify(). */ if ((sc->sc_flags & SCF_UNREACH) == 0 || sc->sc_rxmits < 3 + 1) { sc->sc_flags |= SCF_UNREACH; goto done; } syncache_drop(sc, sch); TCPSTAT_INC(tcps_sc_unreach); done: SCH_UNLOCK(sch); } /* * Build a new TCP socket structure from a syncache entry. * * On success return the newly created socket with its underlying inp locked. */ static struct socket * syncache_socket(struct syncache *sc, struct socket *lso, struct mbuf *m) { struct tcp_function_block *blk; struct inpcb *inp = NULL; struct socket *so; struct tcpcb *tp; int error; char *s; INP_INFO_RLOCK_ASSERT(&V_tcbinfo); /* * Ok, create the full blown connection, and set things up * as they would have been set up if we had created the * connection when the SYN arrived. If we can't create * the connection, abort it. */ so = sonewconn(lso, 0); if (so == NULL) { /* * Drop the connection; we will either send a RST or * have the peer retransmit its SYN again after its * RTO and try again. 
*/ TCPSTAT_INC(tcps_listendrop); if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) { log(LOG_DEBUG, "%s; %s: Socket create failed " "due to limits or memory shortage\n", s, __func__); free(s, M_TCPLOG); } goto abort2; } #ifdef MAC mac_socketpeer_set_from_mbuf(m, so); #endif inp = sotoinpcb(so); inp->inp_inc.inc_fibnum = so->so_fibnum; INP_WLOCK(inp); /* * Exclusive pcbinfo lock is not required in syncache socket case even * if two inpcb locks can be acquired simultaneously: * - the inpcb in LISTEN state, * - the newly created inp. * * In this case, an inp cannot be at same time in LISTEN state and * just created by an accept() call. */ INP_HASH_WLOCK(&V_tcbinfo); /* Insert new socket into PCB hash list. */ inp->inp_inc.inc_flags = sc->sc_inc.inc_flags; #ifdef INET6 if (sc->sc_inc.inc_flags & INC_ISIPV6) { inp->inp_vflag &= ~INP_IPV4; inp->inp_vflag |= INP_IPV6; inp->in6p_laddr = sc->sc_inc.inc6_laddr; } else { inp->inp_vflag &= ~INP_IPV6; inp->inp_vflag |= INP_IPV4; #endif inp->inp_laddr = sc->sc_inc.inc_laddr; #ifdef INET6 } #endif /* * If there's an mbuf and it has a flowid, then let's initialise the * inp with that particular flowid. */ if (m != NULL && M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) { inp->inp_flowid = m->m_pkthdr.flowid; inp->inp_flowtype = M_HASHTYPE_GET(m); +#ifdef NUMA + inp->inp_numa_domain = m->m_pkthdr.numa_domain; +#endif } /* * Install in the reservation hash table for now, but don't yet * install a connection group since the full 4-tuple isn't yet * configured. */ inp->inp_lport = sc->sc_inc.inc_lport; if ((error = in_pcbinshash_nopcbgroup(inp)) != 0) { /* * Undo the assignments above if we failed to * put the PCB on the hash lists. */ #ifdef INET6 if (sc->sc_inc.inc_flags & INC_ISIPV6) inp->in6p_laddr = in6addr_any; else #endif inp->inp_laddr.s_addr = INADDR_ANY; inp->inp_lport = 0; if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) { log(LOG_DEBUG, "%s; %s: in_pcbinshash failed " "with error %i\n", s, __func__, error); free(s, M_TCPLOG); } INP_HASH_WUNLOCK(&V_tcbinfo); goto abort; } #ifdef INET6 if (inp->inp_vflag & INP_IPV6PROTO) { struct inpcb *oinp = sotoinpcb(lso); /* * Inherit socket options from the listening socket. * Note that in6p_inputopts are not (and should not be) * copied, since it stores previously received options and is * used to detect if each new option is different than the * previous one and hence should be passed to a user. * If we copied in6p_inputopts, a user would not be able to * receive options just after calling the accept system call. */ inp->inp_flags |= oinp->inp_flags & INP_CONTROLOPTS; if (oinp->in6p_outputopts) inp->in6p_outputopts = ip6_copypktopts(oinp->in6p_outputopts, M_NOWAIT); } if (sc->sc_inc.inc_flags & INC_ISIPV6) { struct in6_addr laddr6; struct sockaddr_in6 sin6; sin6.sin6_family = AF_INET6; sin6.sin6_len = sizeof(sin6); sin6.sin6_addr = sc->sc_inc.inc6_faddr; sin6.sin6_port = sc->sc_inc.inc_fport; sin6.sin6_flowinfo = sin6.sin6_scope_id = 0; laddr6 = inp->in6p_laddr; if (IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_laddr)) inp->in6p_laddr = sc->sc_inc.inc6_laddr; if ((error = in6_pcbconnect_mbuf(inp, (struct sockaddr *)&sin6, thread0.td_ucred, m)) != 0) { inp->in6p_laddr = laddr6; if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) { log(LOG_DEBUG, "%s; %s: in6_pcbconnect failed " "with error %i\n", s, __func__, error); free(s, M_TCPLOG); } INP_HASH_WUNLOCK(&V_tcbinfo); goto abort; } /* Override flowlabel from in6_pcbconnect. 
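 * with the label that was used for our SYN|ACK (saved in sc_flowlabel).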
*/ inp->inp_flow &= ~IPV6_FLOWLABEL_MASK; inp->inp_flow |= sc->sc_flowlabel; } #endif /* INET6 */ #if defined(INET) && defined(INET6) else #endif #ifdef INET { struct in_addr laddr; struct sockaddr_in sin; inp->inp_options = (m) ? ip_srcroute(m) : NULL; if (inp->inp_options == NULL) { inp->inp_options = sc->sc_ipopts; sc->sc_ipopts = NULL; } sin.sin_family = AF_INET; sin.sin_len = sizeof(sin); sin.sin_addr = sc->sc_inc.inc_faddr; sin.sin_port = sc->sc_inc.inc_fport; bzero((caddr_t)sin.sin_zero, sizeof(sin.sin_zero)); laddr = inp->inp_laddr; if (inp->inp_laddr.s_addr == INADDR_ANY) inp->inp_laddr = sc->sc_inc.inc_laddr; if ((error = in_pcbconnect_mbuf(inp, (struct sockaddr *)&sin, thread0.td_ucred, m)) != 0) { inp->inp_laddr = laddr; if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) { log(LOG_DEBUG, "%s; %s: in_pcbconnect failed " "with error %i\n", s, __func__, error); free(s, M_TCPLOG); } INP_HASH_WUNLOCK(&V_tcbinfo); goto abort; } } #endif /* INET */ #if defined(IPSEC) || defined(IPSEC_SUPPORT) /* Copy old policy into new socket's. */ if (ipsec_copy_pcbpolicy(sotoinpcb(lso), inp) != 0) printf("syncache_socket: could not copy policy\n"); #endif INP_HASH_WUNLOCK(&V_tcbinfo); tp = intotcpcb(inp); tcp_state_change(tp, TCPS_SYN_RECEIVED); tp->iss = sc->sc_iss; tp->irs = sc->sc_irs; tcp_rcvseqinit(tp); tcp_sendseqinit(tp); blk = sototcpcb(lso)->t_fb; if (V_functions_inherit_listen_socket_stack && blk != tp->t_fb) { /* * Our parents t_fb was not the default, * we need to release our ref on tp->t_fb and * pickup one on the new entry. */ struct tcp_function_block *rblk; rblk = find_and_ref_tcp_fb(blk); KASSERT(rblk != NULL, ("cannot find blk %p out of syncache?", blk)); if (tp->t_fb->tfb_tcp_fb_fini) (*tp->t_fb->tfb_tcp_fb_fini)(tp, 0); refcount_release(&tp->t_fb->tfb_refcnt); tp->t_fb = rblk; /* * XXXrrs this is quite dangerous, it is possible * for the new function to fail to init. We also * are not asking if the handoff_is_ok though at * the very start thats probalbly ok. */ if (tp->t_fb->tfb_tcp_fb_init) { (*tp->t_fb->tfb_tcp_fb_init)(tp); } } tp->snd_wl1 = sc->sc_irs; tp->snd_max = tp->iss + 1; tp->snd_nxt = tp->iss + 1; tp->rcv_up = sc->sc_irs + 1; tp->rcv_wnd = sc->sc_wnd; tp->rcv_adv += tp->rcv_wnd; tp->last_ack_sent = tp->rcv_nxt; tp->t_flags = sototcpcb(lso)->t_flags & (TF_NOPUSH|TF_NODELAY); if (sc->sc_flags & SCF_NOOPT) tp->t_flags |= TF_NOOPT; else { if (sc->sc_flags & SCF_WINSCALE) { tp->t_flags |= TF_REQ_SCALE|TF_RCVD_SCALE; tp->snd_scale = sc->sc_requested_s_scale; tp->request_r_scale = sc->sc_requested_r_scale; } if (sc->sc_flags & SCF_TIMESTAMP) { tp->t_flags |= TF_REQ_TSTMP|TF_RCVD_TSTMP; tp->ts_recent = sc->sc_tsreflect; tp->ts_recent_age = tcp_ts_getticks(); tp->ts_offset = sc->sc_tsoff; } #if defined(IPSEC_SUPPORT) || defined(TCP_SIGNATURE) if (sc->sc_flags & SCF_SIGNATURE) tp->t_flags |= TF_SIGNATURE; #endif if (sc->sc_flags & SCF_SACK) tp->t_flags |= TF_SACK_PERMIT; } if (sc->sc_flags & SCF_ECN) tp->t_flags |= TF_ECN_PERMIT; /* * Set up MSS and get cached values from tcp_hostcache. * This might overwrite some of the defaults we just set. */ tcp_mss(tp, sc->sc_peer_mss); /* * If the SYN,ACK was retransmitted, indicate that CWND to be * limited to one segment in cc_conn_init(). * NB: sc_rxmits counts all SYN,ACK transmits, not just retransmits. */ if (sc->sc_rxmits > 1) tp->snd_cwnd = 1; #ifdef TCP_OFFLOAD /* * Allow a TOE driver to install its hooks. 
Note that we hold the * pcbinfo lock too and that prevents tcp_usr_accept from accepting a * new connection before the TOE driver has done its thing. */ if (ADDED_BY_TOE(sc)) { struct toedev *tod = sc->sc_tod; tod->tod_offload_socket(tod, sc->sc_todctx, so); } #endif /* * Copy and activate timers. */ tp->t_keepinit = sototcpcb(lso)->t_keepinit; tp->t_keepidle = sototcpcb(lso)->t_keepidle; tp->t_keepintvl = sototcpcb(lso)->t_keepintvl; tp->t_keepcnt = sototcpcb(lso)->t_keepcnt; tcp_timer_activate(tp, TT_KEEP, TP_KEEPINIT(tp)); TCPSTAT_INC(tcps_accepts); return (so); abort: INP_WUNLOCK(inp); abort2: if (so != NULL) soabort(so); return (NULL); } /* * This function gets called when we receive an ACK for a * socket in the LISTEN state. We look up the connection * in the syncache, and if its there, we pull it out of * the cache and turn it into a full-blown connection in * the SYN-RECEIVED state. * * On syncache_socket() success the newly created socket * has its underlying inp locked. */ int syncache_expand(struct in_conninfo *inc, struct tcpopt *to, struct tcphdr *th, struct socket **lsop, struct mbuf *m) { struct syncache *sc; struct syncache_head *sch; struct syncache scs; char *s; /* * Global TCP locks are held because we manipulate the PCB lists * and create a new socket. */ INP_INFO_RLOCK_ASSERT(&V_tcbinfo); KASSERT((th->th_flags & (TH_RST|TH_ACK|TH_SYN)) == TH_ACK, ("%s: can handle only ACK", __func__)); sc = syncache_lookup(inc, &sch); /* returns locked sch */ SCH_LOCK_ASSERT(sch); #ifdef INVARIANTS /* * Test code for syncookies comparing the syncache stored * values with the reconstructed values from the cookie. */ if (sc != NULL) syncookie_cmp(inc, sch, sc, th, to, *lsop); #endif if (sc == NULL) { /* * There is no syncache entry, so see if this ACK is * a returning syncookie. To do this, first: * A. Check if syncookies are used in case of syncache * overflows * B. See if this socket has had a syncache entry dropped in * the recent past. We don't want to accept a bogus * syncookie if we've never received a SYN or accept it * twice. * C. check that the syncookie is valid. If it is, then * cobble up a fake syncache entry, and return. */ if (!V_tcp_syncookies) { SCH_UNLOCK(sch); if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Spurious ACK, " "segment rejected (syncookies disabled)\n", s, __func__); goto failed; } if (!V_tcp_syncookiesonly && sch->sch_last_overflow < time_uptime - SYNCOOKIE_LIFETIME) { SCH_UNLOCK(sch); if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Spurious ACK, " "segment rejected (no syncache entry)\n", s, __func__); goto failed; } bzero(&scs, sizeof(scs)); sc = syncookie_lookup(inc, sch, &scs, th, to, *lsop); SCH_UNLOCK(sch); if (sc == NULL) { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Segment failed " "SYNCOOKIE authentication, segment rejected " "(probably spoofed)\n", s, __func__); goto failed; } #if defined(IPSEC_SUPPORT) || defined(TCP_SIGNATURE) /* If received ACK has MD5 signature, check it. */ if ((to->to_flags & TOF_SIGNATURE) != 0 && (!TCPMD5_ENABLED() || TCPMD5_INPUT(m, th, to->to_signature) != 0)) { /* Drop the ACK. 
*/ if ((s = tcp_log_addrs(inc, th, NULL, NULL))) { log(LOG_DEBUG, "%s; %s: Segment rejected, " "MD5 signature doesn't match.\n", s, __func__); free(s, M_TCPLOG); } TCPSTAT_INC(tcps_sig_err_sigopt); return (-1); /* Do not send RST */ } #endif /* TCP_SIGNATURE */ } else { #if defined(IPSEC_SUPPORT) || defined(TCP_SIGNATURE) /* * If listening socket requested TCP digests, check that * received ACK has signature and it is correct. * If not, drop the ACK and leave sc entry in th cache, * because SYN was received with correct signature. */ if (sc->sc_flags & SCF_SIGNATURE) { if ((to->to_flags & TOF_SIGNATURE) == 0) { /* No signature */ TCPSTAT_INC(tcps_sig_err_nosigopt); SCH_UNLOCK(sch); if ((s = tcp_log_addrs(inc, th, NULL, NULL))) { log(LOG_DEBUG, "%s; %s: Segment " "rejected, MD5 signature wasn't " "provided.\n", s, __func__); free(s, M_TCPLOG); } return (-1); /* Do not send RST */ } if (!TCPMD5_ENABLED() || TCPMD5_INPUT(m, th, to->to_signature) != 0) { /* Doesn't match or no SA */ SCH_UNLOCK(sch); if ((s = tcp_log_addrs(inc, th, NULL, NULL))) { log(LOG_DEBUG, "%s; %s: Segment " "rejected, MD5 signature doesn't " "match.\n", s, __func__); free(s, M_TCPLOG); } return (-1); /* Do not send RST */ } } #endif /* TCP_SIGNATURE */ /* * Pull out the entry to unlock the bucket row. * * NOTE: We must decrease TCPS_SYN_RECEIVED count here, not * tcp_state_change(). The tcpcb is not existent at this * moment. A new one will be allocated via syncache_socket-> * sonewconn->tcp_usr_attach in TCPS_CLOSED state, then * syncache_socket() will change it to TCPS_SYN_RECEIVED. */ TCPSTATES_DEC(TCPS_SYN_RECEIVED); TAILQ_REMOVE(&sch->sch_bucket, sc, sc_hash); sch->sch_length--; #ifdef TCP_OFFLOAD if (ADDED_BY_TOE(sc)) { struct toedev *tod = sc->sc_tod; tod->tod_syncache_removed(tod, sc->sc_todctx); } #endif SCH_UNLOCK(sch); } /* * Segment validation: * ACK must match our initial sequence number + 1 (the SYN|ACK). */ if (th->th_ack != sc->sc_iss + 1) { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: ACK %u != ISS+1 %u, segment " "rejected\n", s, __func__, th->th_ack, sc->sc_iss); goto failed; } /* * The SEQ must fall in the window starting at the received * initial receive sequence number + 1 (the SYN). */ if (SEQ_LEQ(th->th_seq, sc->sc_irs) || SEQ_GT(th->th_seq, sc->sc_irs + sc->sc_wnd)) { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: SEQ %u != IRS+1 %u, segment " "rejected\n", s, __func__, th->th_seq, sc->sc_irs); goto failed; } /* * If timestamps were not negotiated during SYN/ACK they * must not appear on any segment during this session. */ if (!(sc->sc_flags & SCF_TIMESTAMP) && (to->to_flags & TOF_TS)) { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Timestamp not expected, " "segment rejected\n", s, __func__); goto failed; } /* * If timestamps were negotiated during SYN/ACK they should * appear on every segment during this session. * XXXAO: This is only informal as there have been unverified * reports of non-compliants stacks. */ if ((sc->sc_flags & SCF_TIMESTAMP) && !(to->to_flags & TOF_TS)) { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) { log(LOG_DEBUG, "%s; %s: Timestamp missing, " "no action\n", s, __func__); free(s, M_TCPLOG); s = NULL; } } *lsop = syncache_socket(sc, *lsop, m); if (*lsop == NULL) TCPSTAT_INC(tcps_sc_aborted); else TCPSTAT_INC(tcps_sc_completed); /* how do we find the inp for the new socket? 
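 * (syncache_socket() returned it in *lsop with the new socket's inpcb
 * locked.)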
*/ if (sc != &scs) syncache_free(sc); return (1); failed: if (sc != NULL && sc != &scs) syncache_free(sc); if (s != NULL) free(s, M_TCPLOG); *lsop = NULL; return (0); } static void syncache_tfo_expand(struct syncache *sc, struct socket **lsop, struct mbuf *m, uint64_t response_cookie) { struct inpcb *inp; struct tcpcb *tp; unsigned int *pending_counter; /* * Global TCP locks are held because we manipulate the PCB lists * and create a new socket. */ INP_INFO_RLOCK_ASSERT(&V_tcbinfo); pending_counter = intotcpcb(sotoinpcb(*lsop))->t_tfo_pending; *lsop = syncache_socket(sc, *lsop, m); if (*lsop == NULL) { TCPSTAT_INC(tcps_sc_aborted); atomic_subtract_int(pending_counter, 1); } else { soisconnected(*lsop); inp = sotoinpcb(*lsop); tp = intotcpcb(inp); tp->t_flags |= TF_FASTOPEN; tp->t_tfo_cookie.server = response_cookie; tp->snd_max = tp->iss; tp->snd_nxt = tp->iss; tp->t_tfo_pending = pending_counter; TCPSTAT_INC(tcps_sc_completed); } } /* * Given a LISTEN socket and an inbound SYN request, add * this to the syn cache, and send back a segment: * * to the source. * * IMPORTANT NOTE: We do _NOT_ ACK data that might accompany the SYN. * Doing so would require that we hold onto the data and deliver it * to the application. However, if we are the target of a SYN-flood * DoS attack, an attacker could send data which would eventually * consume all available buffer space if it were ACKed. By not ACKing * the data, we avoid this DoS scenario. * * The exception to the above is when a SYN with a valid TCP Fast Open (TFO) * cookie is processed and a new socket is created. In this case, any data * accompanying the SYN will be queued to the socket by tcp_input() and will * be ACKed either when the application sends response data or the delayed * ACK timer expires, whichever comes first. */ int syncache_add(struct in_conninfo *inc, struct tcpopt *to, struct tcphdr *th, struct inpcb *inp, struct socket **lsop, struct mbuf *m, void *tod, void *todctx) { struct tcpcb *tp; struct socket *so; struct syncache *sc = NULL; struct syncache_head *sch; struct mbuf *ipopts = NULL; u_int ltflags; int win, ip_ttl, ip_tos; char *s; int rv = 0; #ifdef INET6 int autoflowlabel = 0; #endif #ifdef MAC struct label *maclabel; #endif struct syncache scs; struct ucred *cred; uint64_t tfo_response_cookie; unsigned int *tfo_pending = NULL; int tfo_cookie_valid = 0; int tfo_response_cookie_valid = 0; INP_WLOCK_ASSERT(inp); /* listen socket */ KASSERT((th->th_flags & (TH_RST|TH_ACK|TH_SYN)) == TH_SYN, ("%s: unexpected tcp flags", __func__)); /* * Combine all so/tp operations very early to drop the INP lock as * soon as possible. */ so = *lsop; KASSERT(SOLISTENING(so), ("%s: %p not listening", __func__, so)); tp = sototcpcb(so); cred = crhold(so->so_cred); #ifdef INET6 if ((inc->inc_flags & INC_ISIPV6) && (inp->inp_flags & IN6P_AUTOFLOWLABEL)) autoflowlabel = 1; #endif ip_ttl = inp->inp_ip_ttl; ip_tos = inp->inp_ip_tos; win = so->sol_sbrcv_hiwat; ltflags = (tp->t_flags & (TF_NOOPT | TF_SIGNATURE)); if (V_tcp_fastopen_server_enable && IS_FASTOPEN(tp->t_flags) && (tp->t_tfo_pending != NULL) && (to->to_flags & TOF_FASTOPEN)) { /* * Limit the number of pending TFO connections to * approximately half of the queue limit. This prevents TFO * SYN floods from starving the service by filling the * listen queue with bogus TFO connections. 
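 * E.g. with a sol_qlimit of 128, at most about 64 TFO connections may be
 * pending at once; beyond that the SYN is still processed, but via the
 * regular three-way handshake rather than the TFO fast path.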
*/ if (atomic_fetchadd_int(tp->t_tfo_pending, 1) <= (so->sol_qlimit / 2)) { int result; result = tcp_fastopen_check_cookie(inc, to->to_tfo_cookie, to->to_tfo_len, &tfo_response_cookie); tfo_cookie_valid = (result > 0); tfo_response_cookie_valid = (result >= 0); } /* * Remember the TFO pending counter as it will have to be * decremented below if we don't make it to syncache_tfo_expand(). */ tfo_pending = tp->t_tfo_pending; } /* By the time we drop the lock these should no longer be used. */ so = NULL; tp = NULL; #ifdef MAC if (mac_syncache_init(&maclabel) != 0) { INP_WUNLOCK(inp); goto done; } else mac_syncache_create(maclabel, inp); #endif if (!tfo_cookie_valid) INP_WUNLOCK(inp); /* * Remember the IP options, if any. */ #ifdef INET6 if (!(inc->inc_flags & INC_ISIPV6)) #endif #ifdef INET ipopts = (m) ? ip_srcroute(m) : NULL; #else ipopts = NULL; #endif #if defined(IPSEC_SUPPORT) || defined(TCP_SIGNATURE) /* * If listening socket requested TCP digests, check that received * SYN has signature and it is correct. If signature doesn't match * or TCP_SIGNATURE support isn't enabled, drop the packet. */ if (ltflags & TF_SIGNATURE) { if ((to->to_flags & TOF_SIGNATURE) == 0) { TCPSTAT_INC(tcps_sig_err_nosigopt); goto done; } if (!TCPMD5_ENABLED() || TCPMD5_INPUT(m, th, to->to_signature) != 0) goto done; } #endif /* TCP_SIGNATURE */ /* * See if we already have an entry for this connection. * If we do, resend the SYN,ACK, and reset the retransmit timer. * * XXX: should the syncache be re-initialized with the contents * of the new SYN here (which may have different options?) * * XXX: We do not check the sequence number to see if this is a * real retransmit or a new connection attempt. The question is * how to handle such a case; either ignore it as spoofed, or * drop the current entry and create a new one? */ sc = syncache_lookup(inc, &sch); /* returns locked entry */ SCH_LOCK_ASSERT(sch); if (sc != NULL) { if (tfo_cookie_valid) INP_WUNLOCK(inp); TCPSTAT_INC(tcps_sc_dupsyn); if (ipopts) { /* * If we were remembering a previous source route, * forget it and use the new one we've been given. */ if (sc->sc_ipopts) (void) m_free(sc->sc_ipopts); sc->sc_ipopts = ipopts; } /* * Update timestamp if present. */ if ((sc->sc_flags & SCF_TIMESTAMP) && (to->to_flags & TOF_TS)) sc->sc_tsreflect = to->to_tsval; else sc->sc_flags &= ~SCF_TIMESTAMP; #ifdef MAC /* * Since we have already unconditionally allocated label * storage, free it up. The syncache entry will already * have an initialized label we can use. */ mac_syncache_destroy(&maclabel); #endif TCP_PROBE5(receive, NULL, NULL, m, NULL, th); /* Retransmit SYN|ACK and reset retransmit count. */ if ((s = tcp_log_addrs(&sc->sc_inc, th, NULL, NULL))) { log(LOG_DEBUG, "%s; %s: Received duplicate SYN, " "resetting timer and retransmitting SYN|ACK\n", s, __func__); free(s, M_TCPLOG); } if (syncache_respond(sc, sch, m, TH_SYN|TH_ACK) == 0) { sc->sc_rxmits = 0; syncache_timeout(sc, sch, 1); TCPSTAT_INC(tcps_sndacks); TCPSTAT_INC(tcps_sndtotal); } SCH_UNLOCK(sch); goto donenoprobe; } if (tfo_cookie_valid) { bzero(&scs, sizeof(scs)); sc = &scs; goto skip_alloc; } sc = uma_zalloc(V_tcp_syncache.zone, M_NOWAIT | M_ZERO); if (sc == NULL) { /* * The zone allocator couldn't provide more entries. * Treat this as if the cache was full; drop the oldest * entry and insert the new one. 
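 * If the retried allocation still fails and syncookies are enabled, fall
 * back to the on-stack entry (scs), which lives only long enough to send
 * the SYN|ACK.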
*/ TCPSTAT_INC(tcps_sc_zonefail); if ((sc = TAILQ_LAST(&sch->sch_bucket, sch_head)) != NULL) { sch->sch_last_overflow = time_uptime; syncache_drop(sc, sch); } sc = uma_zalloc(V_tcp_syncache.zone, M_NOWAIT | M_ZERO); if (sc == NULL) { if (V_tcp_syncookies) { bzero(&scs, sizeof(scs)); sc = &scs; } else { SCH_UNLOCK(sch); if (ipopts) (void) m_free(ipopts); goto done; } } } skip_alloc: if (!tfo_cookie_valid && tfo_response_cookie_valid) sc->sc_tfo_cookie = &tfo_response_cookie; /* * Fill in the syncache values. */ #ifdef MAC sc->sc_label = maclabel; #endif sc->sc_cred = cred; cred = NULL; sc->sc_ipopts = ipopts; bcopy(inc, &sc->sc_inc, sizeof(struct in_conninfo)); #ifdef INET6 if (!(inc->inc_flags & INC_ISIPV6)) #endif { sc->sc_ip_tos = ip_tos; sc->sc_ip_ttl = ip_ttl; } #ifdef TCP_OFFLOAD sc->sc_tod = tod; sc->sc_todctx = todctx; #endif sc->sc_irs = th->th_seq; sc->sc_iss = arc4random(); sc->sc_flags = 0; sc->sc_flowlabel = 0; /* * Initial receive window: clip sbspace to [0 .. TCP_MAXWIN]. * win was derived from socket earlier in the function. */ win = imax(win, 0); win = imin(win, TCP_MAXWIN); sc->sc_wnd = win; if (V_tcp_do_rfc1323) { /* * A timestamp received in a SYN makes * it ok to send timestamp requests and replies. */ if (to->to_flags & TOF_TS) { sc->sc_tsreflect = to->to_tsval; sc->sc_flags |= SCF_TIMESTAMP; sc->sc_tsoff = tcp_new_ts_offset(inc); } if (to->to_flags & TOF_SCALE) { int wscale = 0; /* * Pick the smallest possible scaling factor that * will still allow us to scale up to sb_max, aka * kern.ipc.maxsockbuf. * * We do this because there are broken firewalls that * will corrupt the window scale option, leading to * the other endpoint believing that our advertised * window is unscaled. At scale factors larger than * 5 the unscaled window will drop below 1500 bytes, * leading to serious problems when traversing these * broken firewalls. * * With the default maxsockbuf of 256K, a scale factor * of 3 will be chosen by this algorithm. Those who * choose a larger maxsockbuf should watch out * for the compatibility problems mentioned above. * * RFC1323: The Window field in a SYN (i.e., a * or ) segment itself is never scaled. */ while (wscale < TCP_MAX_WINSHIFT && (TCP_MAXWIN << wscale) < sb_max) wscale++; sc->sc_requested_r_scale = wscale; sc->sc_requested_s_scale = to->to_wscale; sc->sc_flags |= SCF_WINSCALE; } } #if defined(IPSEC_SUPPORT) || defined(TCP_SIGNATURE) /* * If listening socket requested TCP digests, flag this in the * syncache so that syncache_respond() will do the right thing * with the SYN+ACK. */ if (ltflags & TF_SIGNATURE) sc->sc_flags |= SCF_SIGNATURE; #endif /* TCP_SIGNATURE */ if (to->to_flags & TOF_SACKPERM) sc->sc_flags |= SCF_SACK; if (to->to_flags & TOF_MSS) sc->sc_peer_mss = to->to_mss; /* peer mss may be zero */ if (ltflags & TF_NOOPT) sc->sc_flags |= SCF_NOOPT; if ((th->th_flags & (TH_ECE|TH_CWR)) && V_tcp_do_ecn) sc->sc_flags |= SCF_ECN; if (V_tcp_syncookies) sc->sc_iss = syncookie_generate(sch, sc); #ifdef INET6 if (autoflowlabel) { if (V_tcp_syncookies) sc->sc_flowlabel = sc->sc_iss; else sc->sc_flowlabel = ip6_randomflowlabel(); sc->sc_flowlabel = htonl(sc->sc_flowlabel) & IPV6_FLOWLABEL_MASK; } #endif SCH_UNLOCK(sch); if (tfo_cookie_valid) { syncache_tfo_expand(sc, lsop, m, tfo_response_cookie); /* INP_WUNLOCK(inp) will be performed by the caller */ rv = 1; goto tfo_expanded; } TCP_PROBE5(receive, NULL, NULL, m, NULL, th); /* * Do a standard 3-way handshake. 
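 * Send the SYN|ACK and, unless we are in syncookie-only mode or used the
 * on-stack entry, insert the entry so that syncache_expand() can complete
 * the handshake when the ACK returns.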
*/ if (syncache_respond(sc, sch, m, TH_SYN|TH_ACK) == 0) { if (V_tcp_syncookies && V_tcp_syncookiesonly && sc != &scs) syncache_free(sc); else if (sc != &scs) syncache_insert(sc, sch); /* locks and unlocks sch */ TCPSTAT_INC(tcps_sndacks); TCPSTAT_INC(tcps_sndtotal); } else { if (sc != &scs) syncache_free(sc); TCPSTAT_INC(tcps_sc_dropped); } goto donenoprobe; done: TCP_PROBE5(receive, NULL, NULL, m, NULL, th); donenoprobe: if (m) { *lsop = NULL; m_freem(m); } /* * If tfo_pending is not NULL here, then a TFO SYN that did not * result in a new socket was processed and the associated pending * counter has not yet been decremented. All such TFO processing paths * transit this point. */ if (tfo_pending != NULL) tcp_fastopen_decrement_counter(tfo_pending); tfo_expanded: if (cred != NULL) crfree(cred); #ifdef MAC if (sc == &scs) mac_syncache_destroy(&maclabel); #endif return (rv); } /* * Send SYN|ACK or ACK to the peer. Either in response to a peer's segment, * i.e. m0 != NULL, or upon 3WHS ACK timeout, i.e. m0 == NULL. */ static int syncache_respond(struct syncache *sc, struct syncache_head *sch, const struct mbuf *m0, int flags) { struct ip *ip = NULL; struct mbuf *m; struct tcphdr *th = NULL; int optlen, error = 0; /* Make compiler happy */ u_int16_t hlen, tlen, mssopt; struct tcpopt to; #ifdef INET6 struct ip6_hdr *ip6 = NULL; #endif hlen = #ifdef INET6 (sc->sc_inc.inc_flags & INC_ISIPV6) ? sizeof(struct ip6_hdr) : #endif sizeof(struct ip); tlen = hlen + sizeof(struct tcphdr); /* Determine MSS we advertize to other end of connection. */ mssopt = max(tcp_mssopt(&sc->sc_inc), V_tcp_minmss); /* XXX: Assume that the entire packet will fit in a header mbuf. */ KASSERT(max_linkhdr + tlen + TCP_MAXOLEN <= MHLEN, ("syncache: mbuf too small")); /* Create the IP+TCP header from scratch. */ m = m_gethdr(M_NOWAIT, MT_DATA); if (m == NULL) return (ENOBUFS); #ifdef MAC mac_syncache_create_mbuf(sc->sc_label, m); #endif m->m_data += max_linkhdr; m->m_len = tlen; m->m_pkthdr.len = tlen; m->m_pkthdr.rcvif = NULL; #ifdef INET6 if (sc->sc_inc.inc_flags & INC_ISIPV6) { ip6 = mtod(m, struct ip6_hdr *); ip6->ip6_vfc = IPV6_VERSION; ip6->ip6_nxt = IPPROTO_TCP; ip6->ip6_src = sc->sc_inc.inc6_laddr; ip6->ip6_dst = sc->sc_inc.inc6_faddr; ip6->ip6_plen = htons(tlen - hlen); /* ip6_hlim is set after checksum */ ip6->ip6_flow &= ~IPV6_FLOWLABEL_MASK; ip6->ip6_flow |= sc->sc_flowlabel; th = (struct tcphdr *)(ip6 + 1); } #endif #if defined(INET6) && defined(INET) else #endif #ifdef INET { ip = mtod(m, struct ip *); ip->ip_v = IPVERSION; ip->ip_hl = sizeof(struct ip) >> 2; ip->ip_len = htons(tlen); ip->ip_id = 0; ip->ip_off = 0; ip->ip_sum = 0; ip->ip_p = IPPROTO_TCP; ip->ip_src = sc->sc_inc.inc_laddr; ip->ip_dst = sc->sc_inc.inc_faddr; ip->ip_ttl = sc->sc_ip_ttl; ip->ip_tos = sc->sc_ip_tos; /* * See if we should do MTU discovery. 
Route lookups are * expensive, so we will only unset the DF bit if: * * 1) path_mtu_discovery is disabled * 2) the SCF_UNREACH flag has been set */ if (V_path_mtu_discovery && ((sc->sc_flags & SCF_UNREACH) == 0)) ip->ip_off |= htons(IP_DF); th = (struct tcphdr *)(ip + 1); } #endif /* INET */ th->th_sport = sc->sc_inc.inc_lport; th->th_dport = sc->sc_inc.inc_fport; if (flags & TH_SYN) th->th_seq = htonl(sc->sc_iss); else th->th_seq = htonl(sc->sc_iss + 1); th->th_ack = htonl(sc->sc_irs + 1); th->th_off = sizeof(struct tcphdr) >> 2; th->th_x2 = 0; th->th_flags = flags; th->th_win = htons(sc->sc_wnd); th->th_urp = 0; if ((flags & TH_SYN) && (sc->sc_flags & SCF_ECN)) { th->th_flags |= TH_ECE; TCPSTAT_INC(tcps_ecn_shs); } /* Tack on the TCP options. */ if ((sc->sc_flags & SCF_NOOPT) == 0) { to.to_flags = 0; if (flags & TH_SYN) { to.to_mss = mssopt; to.to_flags = TOF_MSS; if (sc->sc_flags & SCF_WINSCALE) { to.to_wscale = sc->sc_requested_r_scale; to.to_flags |= TOF_SCALE; } if (sc->sc_flags & SCF_SACK) to.to_flags |= TOF_SACKPERM; #if defined(IPSEC_SUPPORT) || defined(TCP_SIGNATURE) if (sc->sc_flags & SCF_SIGNATURE) to.to_flags |= TOF_SIGNATURE; #endif if (sc->sc_tfo_cookie) { to.to_flags |= TOF_FASTOPEN; to.to_tfo_len = TCP_FASTOPEN_COOKIE_LEN; to.to_tfo_cookie = sc->sc_tfo_cookie; /* don't send cookie again when retransmitting response */ sc->sc_tfo_cookie = NULL; } } if (sc->sc_flags & SCF_TIMESTAMP) { to.to_tsval = sc->sc_tsoff + tcp_ts_getticks(); to.to_tsecr = sc->sc_tsreflect; to.to_flags |= TOF_TS; } optlen = tcp_addoptions(&to, (u_char *)(th + 1)); /* Adjust headers by option size. */ th->th_off = (sizeof(struct tcphdr) + optlen) >> 2; m->m_len += optlen; m->m_pkthdr.len += optlen; #ifdef INET6 if (sc->sc_inc.inc_flags & INC_ISIPV6) ip6->ip6_plen = htons(ntohs(ip6->ip6_plen) + optlen); else #endif ip->ip_len = htons(ntohs(ip->ip_len) + optlen); #if defined(IPSEC_SUPPORT) || defined(TCP_SIGNATURE) if (sc->sc_flags & SCF_SIGNATURE) { KASSERT(to.to_flags & TOF_SIGNATURE, ("tcp_addoptions() didn't set tcp_signature")); /* NOTE: to.to_signature is inside of mbuf */ if (!TCPMD5_ENABLED() || TCPMD5_OUTPUT(m, th, to.to_signature) != 0) { m_freem(m); return (EACCES); } } #endif } else optlen = 0; M_SETFIB(m, sc->sc_inc.inc_fibnum); m->m_pkthdr.csum_data = offsetof(struct tcphdr, th_sum); /* * If we have peer's SYN and it has a flowid, then let's assign it to * our SYN|ACK. ip6_output() and ip_output() will not assign flowid * to SYN|ACK due to lack of inp here. 
*/ if (m0 != NULL && M_HASHTYPE_GET(m0) != M_HASHTYPE_NONE) { m->m_pkthdr.flowid = m0->m_pkthdr.flowid; M_HASHTYPE_SET(m, M_HASHTYPE_GET(m0)); } #ifdef INET6 if (sc->sc_inc.inc_flags & INC_ISIPV6) { m->m_pkthdr.csum_flags = CSUM_TCP_IPV6; th->th_sum = in6_cksum_pseudo(ip6, tlen + optlen - hlen, IPPROTO_TCP, 0); ip6->ip6_hlim = in6_selecthlim(NULL, NULL); #ifdef TCP_OFFLOAD if (ADDED_BY_TOE(sc)) { struct toedev *tod = sc->sc_tod; error = tod->tod_syncache_respond(tod, sc->sc_todctx, m); return (error); } #endif TCP_PROBE5(send, NULL, NULL, ip6, NULL, th); error = ip6_output(m, NULL, NULL, 0, NULL, NULL, NULL); } #endif #if defined(INET6) && defined(INET) else #endif #ifdef INET { m->m_pkthdr.csum_flags = CSUM_TCP; th->th_sum = in_pseudo(ip->ip_src.s_addr, ip->ip_dst.s_addr, htons(tlen + optlen - hlen + IPPROTO_TCP)); #ifdef TCP_OFFLOAD if (ADDED_BY_TOE(sc)) { struct toedev *tod = sc->sc_tod; error = tod->tod_syncache_respond(tod, sc->sc_todctx, m); return (error); } #endif TCP_PROBE5(send, NULL, NULL, ip, NULL, th); error = ip_output(m, sc->sc_ipopts, NULL, 0, NULL, NULL); } #endif return (error); } /* * The purpose of syncookies is to handle spoofed SYN flooding DoS attacks * that exceed the capacity of the syncache by avoiding the storage of any * of the SYNs we receive. Syncookies defend against blind SYN flooding * attacks where the attacker does not have access to our responses. * * Syncookies encode and include all necessary information about the * connection setup within the SYN|ACK that we send back. That way we * can avoid keeping any local state until the ACK to our SYN|ACK returns * (if ever). Normally the syncache and syncookies are running in parallel * with the latter taking over when the former is exhausted. When matching * syncache entry is found the syncookie is ignored. * * The only reliable information persisting the 3WHS is our initial sequence * number ISS of 32 bits. Syncookies embed a cryptographically sufficient * strong hash (MAC) value and a few bits of TCP SYN options in the ISS * of our SYN|ACK. The MAC can be recomputed when the ACK to our SYN|ACK * returns and signifies a legitimate connection if it matches the ACK. * * The available space of 32 bits to store the hash and to encode the SYN * option information is very tight and we should have at least 24 bits for * the MAC to keep the number of guesses by blind spoofing reasonably high. * * SYN option information we have to encode to fully restore a connection: * MSS: is imporant to chose an optimal segment size to avoid IP level * fragmentation along the path. The common MSS values can be encoded * in a 3-bit table. Uncommon values are captured by the next lower value * in the table leading to a slight increase in packetization overhead. * WSCALE: is necessary to allow large windows to be used for high delay- * bandwidth product links. Not scaling the window when it was initially * negotiated is bad for performance as lack of scaling further decreases * the apparent available send window. We only need to encode the WSCALE * we received from the remote end. Our end can be recalculated at any * time. The common WSCALE values can be encoded in a 3-bit table. * Uncommon values are captured by the next lower value in the table * making us under-estimate the available window size halving our * theoretically possible maximum throughput for that connection. * SACK: Greatly assists in packet loss recovery and requires 1 bit. 
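 * Together these option bits consume 3 (MSS index) + 3 (WSCALE index) +
 * 1 (SACK) + 1 (odd/even secret) = 8 bits, leaving exactly the remaining
 * 24 bits of the ISS for the truncated MAC.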
* TIMESTAMP and SIGNATURE is not encoded because they are permanent options * that are included in all segments on a connection. We enable them when * the ACK has them. * * Security of syncookies and attack vectors: * * The MAC is computed over (faddr||laddr||fport||lport||irs||flags||secmod) * together with the gloabl secret to make it unique per connection attempt. * Thus any change of any of those parameters results in a different MAC output * in an unpredictable way unless a collision is encountered. 24 bits of the * MAC are embedded into the ISS. * * To prevent replay attacks two rotating global secrets are updated with a * new random value every 15 seconds. The life-time of a syncookie is thus * 15-30 seconds. * * Vector 1: Attacking the secret. This requires finding a weakness in the * MAC itself or the way it is used here. The attacker can do a chosen plain * text attack by varying and testing the all parameters under his control. * The strength depends on the size and randomness of the secret, and the * cryptographic security of the MAC function. Due to the constant updating * of the secret the attacker has at most 29.999 seconds to find the secret * and launch spoofed connections. After that he has to start all over again. * * Vector 2: Collision attack on the MAC of a single ACK. With a 24 bit MAC * size an average of 4,823 attempts are required for a 50% chance of success * to spoof a single syncookie (birthday collision paradox). However the * attacker is blind and doesn't know if one of his attempts succeeded unless * he has a side channel to interfere success from. A single connection setup * success average of 90% requires 8,790 packets, 99.99% requires 17,578 packets. * This many attempts are required for each one blind spoofed connection. For * every additional spoofed connection he has to launch another N attempts. * Thus for a sustained rate 100 spoofed connections per second approximately * 1,800,000 packets per second would have to be sent. * * NB: The MAC function should be fast so that it doesn't become a CPU * exhaustion attack vector itself. * * References: * RFC4987 TCP SYN Flooding Attacks and Common Mitigations * SYN cookies were first proposed by cryptographer Dan J. Bernstein in 1996 * http://cr.yp.to/syncookies.html (overview) * http://cr.yp.to/syncookies/archive (details) * * * Schematic construction of a syncookie enabled Initial Sequence Number: * 0 1 2 3 * 12345678901234567890123456789012 * |xxxxxxxxxxxxxxxxxxxxxxxxWWWMMMSP| * * x 24 MAC (truncated) * W 3 Send Window Scale index * M 3 MSS index * S 1 SACK permitted * P 1 Odd/even secret */ /* * Distribution and probability of certain MSS values. Those in between are * rounded down to the next lower one. * [An Analysis of TCP Maximum Segment Sizes, S. Alcock and R. Nelson, 2011] * .2% .3% 5% 7% 7% 20% 15% 45% */ static int tcp_sc_msstab[] = { 216, 536, 1200, 1360, 1400, 1440, 1452, 1460 }; /* * Distribution and probability of certain WSCALE values. We have to map the * (send) window scale (shift) option with a range of 0-14 from 4 bits into 3 * bits based on prevalence of certain values. Where we don't have an exact * match for are rounded down to the next lower one letting us under-estimate * the true available window. At the moment this would happen only for the * very uncommon values 3, 5 and those above 8 (more than 16MB socket buffer * and window size). The absence of the WSCALE option (no scaling in either * direction) is encoded with index zero. 
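 * For example, a received WSCALE of 5 is encoded as index 4 (value 4) and
 * any value of 9 or above as index 7 (value 8), slightly under-estimating
 * the window those peers actually offer.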
* [WSCALE values histograms, Allman, 2012] * X 10 10 35 5 6 14 10% by host * X 11 4 5 5 18 49 3% by connections */ static int tcp_sc_wstab[] = { 0, 0, 1, 2, 4, 6, 7, 8 }; /* * Compute the MAC for the SYN cookie. SIPHASH-2-4 is chosen for its speed * and good cryptographic properties. */ static uint32_t syncookie_mac(struct in_conninfo *inc, tcp_seq irs, uint8_t flags, uint8_t *secbits, uintptr_t secmod) { SIPHASH_CTX ctx; uint32_t siphash[2]; SipHash24_Init(&ctx); SipHash_SetKey(&ctx, secbits); switch (inc->inc_flags & INC_ISIPV6) { #ifdef INET case 0: SipHash_Update(&ctx, &inc->inc_faddr, sizeof(inc->inc_faddr)); SipHash_Update(&ctx, &inc->inc_laddr, sizeof(inc->inc_laddr)); break; #endif #ifdef INET6 case INC_ISIPV6: SipHash_Update(&ctx, &inc->inc6_faddr, sizeof(inc->inc6_faddr)); SipHash_Update(&ctx, &inc->inc6_laddr, sizeof(inc->inc6_laddr)); break; #endif } SipHash_Update(&ctx, &inc->inc_fport, sizeof(inc->inc_fport)); SipHash_Update(&ctx, &inc->inc_lport, sizeof(inc->inc_lport)); SipHash_Update(&ctx, &irs, sizeof(irs)); SipHash_Update(&ctx, &flags, sizeof(flags)); SipHash_Update(&ctx, &secmod, sizeof(secmod)); SipHash_Final((u_int8_t *)&siphash, &ctx); return (siphash[0] ^ siphash[1]); } static tcp_seq syncookie_generate(struct syncache_head *sch, struct syncache *sc) { u_int i, secbit, wscale; uint32_t iss, hash; uint8_t *secbits; union syncookie cookie; SCH_LOCK_ASSERT(sch); cookie.cookie = 0; /* Map our computed MSS into the 3-bit index. */ for (i = nitems(tcp_sc_msstab) - 1; tcp_sc_msstab[i] > sc->sc_peer_mss && i > 0; i--) ; cookie.flags.mss_idx = i; /* * Map the send window scale into the 3-bit index but only if * the wscale option was received. */ if (sc->sc_flags & SCF_WINSCALE) { wscale = sc->sc_requested_s_scale; for (i = nitems(tcp_sc_wstab) - 1; tcp_sc_wstab[i] > wscale && i > 0; i--) ; cookie.flags.wscale_idx = i; } /* Can we do SACK? */ if (sc->sc_flags & SCF_SACK) cookie.flags.sack_ok = 1; /* Which of the two secrets to use. */ secbit = sch->sch_sc->secret.oddeven & 0x1; cookie.flags.odd_even = secbit; secbits = sch->sch_sc->secret.key[secbit]; hash = syncookie_mac(&sc->sc_inc, sc->sc_irs, cookie.cookie, secbits, (uintptr_t)sch); /* * Put the flags into the hash and XOR them to get better ISS number * variance. This doesn't enhance the cryptographic strength and is * done to prevent the 8 cookie bits from showing up directly on the * wire. */ iss = hash & ~0xff; iss |= cookie.cookie ^ (hash >> 24); TCPSTAT_INC(tcps_sc_sendcookie); return (iss); } static struct syncache * syncookie_lookup(struct in_conninfo *inc, struct syncache_head *sch, struct syncache *sc, struct tcphdr *th, struct tcpopt *to, struct socket *lso) { uint32_t hash; uint8_t *secbits; tcp_seq ack, seq; int wnd, wscale = 0; union syncookie cookie; SCH_LOCK_ASSERT(sch); /* * Pull information out of SYN-ACK/ACK and revert sequence number * advances. */ ack = th->th_ack - 1; seq = th->th_seq - 1; /* * Unpack the flags containing enough information to restore the * connection. */ cookie.cookie = (ack & 0xff) ^ (ack >> 24); /* Which of the two secrets to use. */ secbits = sch->sch_sc->secret.key[cookie.flags.odd_even]; hash = syncookie_mac(inc, seq, cookie.cookie, secbits, (uintptr_t)sch); /* The recomputed hash matches the ACK if this was a genuine cookie. */ if ((ack & ~0xff) != (hash & ~0xff)) return (NULL); /* Fill in the syncache values. 
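 * Everything below is reconstructed from the 8 cookie bits, the segment
 * itself and the listening socket, since no per-connection state was kept.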
*/ sc->sc_flags = 0; bcopy(inc, &sc->sc_inc, sizeof(struct in_conninfo)); sc->sc_ipopts = NULL; sc->sc_irs = seq; sc->sc_iss = ack; switch (inc->inc_flags & INC_ISIPV6) { #ifdef INET case 0: sc->sc_ip_ttl = sotoinpcb(lso)->inp_ip_ttl; sc->sc_ip_tos = sotoinpcb(lso)->inp_ip_tos; break; #endif #ifdef INET6 case INC_ISIPV6: if (sotoinpcb(lso)->inp_flags & IN6P_AUTOFLOWLABEL) sc->sc_flowlabel = sc->sc_iss & IPV6_FLOWLABEL_MASK; break; #endif } sc->sc_peer_mss = tcp_sc_msstab[cookie.flags.mss_idx]; /* We can simply recompute receive window scale we sent earlier. */ while (wscale < TCP_MAX_WINSHIFT && (TCP_MAXWIN << wscale) < sb_max) wscale++; /* Only use wscale if it was enabled in the orignal SYN. */ if (cookie.flags.wscale_idx > 0) { sc->sc_requested_r_scale = wscale; sc->sc_requested_s_scale = tcp_sc_wstab[cookie.flags.wscale_idx]; sc->sc_flags |= SCF_WINSCALE; } wnd = lso->sol_sbrcv_hiwat; wnd = imax(wnd, 0); wnd = imin(wnd, TCP_MAXWIN); sc->sc_wnd = wnd; if (cookie.flags.sack_ok) sc->sc_flags |= SCF_SACK; if (to->to_flags & TOF_TS) { sc->sc_flags |= SCF_TIMESTAMP; sc->sc_tsreflect = to->to_tsval; sc->sc_tsoff = tcp_new_ts_offset(inc); } if (to->to_flags & TOF_SIGNATURE) sc->sc_flags |= SCF_SIGNATURE; sc->sc_rxmits = 0; TCPSTAT_INC(tcps_sc_recvcookie); return (sc); } #ifdef INVARIANTS static int syncookie_cmp(struct in_conninfo *inc, struct syncache_head *sch, struct syncache *sc, struct tcphdr *th, struct tcpopt *to, struct socket *lso) { struct syncache scs, *scx; char *s; bzero(&scs, sizeof(scs)); scx = syncookie_lookup(inc, sch, &scs, th, to, lso); if ((s = tcp_log_addrs(inc, th, NULL, NULL)) == NULL) return (0); if (scx != NULL) { if (sc->sc_peer_mss != scx->sc_peer_mss) log(LOG_DEBUG, "%s; %s: mss different %i vs %i\n", s, __func__, sc->sc_peer_mss, scx->sc_peer_mss); if (sc->sc_requested_r_scale != scx->sc_requested_r_scale) log(LOG_DEBUG, "%s; %s: rwscale different %i vs %i\n", s, __func__, sc->sc_requested_r_scale, scx->sc_requested_r_scale); if (sc->sc_requested_s_scale != scx->sc_requested_s_scale) log(LOG_DEBUG, "%s; %s: swscale different %i vs %i\n", s, __func__, sc->sc_requested_s_scale, scx->sc_requested_s_scale); if ((sc->sc_flags & SCF_SACK) != (scx->sc_flags & SCF_SACK)) log(LOG_DEBUG, "%s; %s: SACK different\n", s, __func__); } if (s != NULL) free(s, M_TCPLOG); return (0); } #endif /* INVARIANTS */ static void syncookie_reseed(void *arg) { struct tcp_syncache *sc = arg; uint8_t *secbits; int secbit; /* * Reseeding the secret doesn't have to be protected by a lock. * It only must be ensured that the new random values are visible * to all CPUs in a SMP environment. The atomic with release * semantics ensures that. */ secbit = (sc->secret.oddeven & 0x1) ? 0 : 1; secbits = sc->secret.key[secbit]; arc4rand(secbits, SYNCOOKIE_SECRET_SIZE, 0); atomic_add_rel_int(&sc->secret.oddeven, 1); /* Reschedule ourself. */ callout_schedule(&sc->secret.reseed, SYNCOOKIE_LIFETIME * hz); } /* * Exports the syncache entries to userland so that netstat can display * them alongside the other sockets. This function is intended to be * called only from tcp_pcblist. * * Due to concurrency on an active system, the number of pcbs exported * may have no relation to max_pcbs. max_pcbs merely indicates the * amount of space the caller allocated for this function to use. 
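 * Entries that the requesting credential may not see (cr_cansee()) are
 * skipped and do not count against max_pcbs.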
*/ int syncache_pcblist(struct sysctl_req *req, int max_pcbs, int *pcbs_exported) { struct xtcpcb xt; struct syncache *sc; struct syncache_head *sch; int count, error, i; for (count = 0, error = 0, i = 0; i < V_tcp_syncache.hashsize; i++) { sch = &V_tcp_syncache.hashbase[i]; SCH_LOCK(sch); TAILQ_FOREACH(sc, &sch->sch_bucket, sc_hash) { if (count >= max_pcbs) { SCH_UNLOCK(sch); goto exit; } if (cr_cansee(req->td->td_ucred, sc->sc_cred) != 0) continue; bzero(&xt, sizeof(xt)); xt.xt_len = sizeof(xt); if (sc->sc_inc.inc_flags & INC_ISIPV6) xt.xt_inp.inp_vflag = INP_IPV6; else xt.xt_inp.inp_vflag = INP_IPV4; bcopy(&sc->sc_inc, &xt.xt_inp.inp_inc, sizeof (struct in_conninfo)); xt.t_state = TCPS_SYN_RECEIVED; xt.xt_inp.xi_socket.xso_protocol = IPPROTO_TCP; xt.xt_inp.xi_socket.xso_len = sizeof (struct xsocket); xt.xt_inp.xi_socket.so_type = SOCK_STREAM; xt.xt_inp.xi_socket.so_state = SS_ISCONNECTING; error = SYSCTL_OUT(req, &xt, sizeof xt); if (error) { SCH_UNLOCK(sch); goto exit; } count++; } SCH_UNLOCK(sch); } exit: *pcbs_exported = count; return error; } Index: user/ngie/bug-237403/sys/netinet6/ip6_gre.c =================================================================== --- user/ngie/bug-237403/sys/netinet6/ip6_gre.c (revision 346925) +++ user/ngie/bug-237403/sys/netinet6/ip6_gre.c (revision 346926) @@ -1,346 +1,595 @@ /*- * Copyright (c) 2014, 2018 Andrey V. Elsukov * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include +#include #include #include #include #include #include #include #include #include #include #include #include #ifdef INET #include #include #endif +#include #include +#include #include +#include +#include #include #include #include #include VNET_DEFINE(int, ip6_gre_hlim) = IPV6_DEFHLIM; #define V_ip6_gre_hlim VNET(ip6_gre_hlim) SYSCTL_DECL(_net_inet6_ip6); SYSCTL_INT(_net_inet6_ip6, OID_AUTO, grehlim, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(ip6_gre_hlim), 0, "Default hop limit for encapsulated packets"); +struct in6_gre_socket { + struct gre_socket base; + struct in6_addr addr; /* scope zone id is embedded */ +}; +VNET_DEFINE_STATIC(struct gre_sockets *, ipv6_sockets) = NULL; VNET_DEFINE_STATIC(struct gre_list *, ipv6_hashtbl) = NULL; VNET_DEFINE_STATIC(struct gre_list *, ipv6_srchashtbl) = NULL; +#define V_ipv6_sockets VNET(ipv6_sockets) #define V_ipv6_hashtbl VNET(ipv6_hashtbl) #define V_ipv6_srchashtbl VNET(ipv6_srchashtbl) #define GRE_HASH(src, dst) (V_ipv6_hashtbl[\ in6_gre_hashval((src), (dst)) & (GRE_HASH_SIZE - 1)]) #define GRE_SRCHASH(src) (V_ipv6_srchashtbl[\ fnv_32_buf((src), sizeof(*src), FNV1_32_INIT) & (GRE_HASH_SIZE - 1)]) +#define GRE_SOCKHASH(src) (V_ipv6_sockets[\ + fnv_32_buf((src), sizeof(*src), FNV1_32_INIT) & (GRE_HASH_SIZE - 1)]) #define GRE_HASH_SC(sc) GRE_HASH(&(sc)->gre_oip6.ip6_src,\ &(sc)->gre_oip6.ip6_dst) static uint32_t in6_gre_hashval(const struct in6_addr *src, const struct in6_addr *dst) { uint32_t ret; ret = fnv_32_buf(src, sizeof(*src), FNV1_32_INIT); return (fnv_32_buf(dst, sizeof(*dst), ret)); } +static struct gre_socket* +in6_gre_lookup_socket(const struct in6_addr *addr) +{ + struct gre_socket *gs; + struct in6_gre_socket *s; + + CK_LIST_FOREACH(gs, &GRE_SOCKHASH(addr), chain) { + s = __containerof(gs, struct in6_gre_socket, base); + if (IN6_ARE_ADDR_EQUAL(&s->addr, addr)) + break; + } + return (gs); +} + static int in6_gre_checkdup(const struct gre_softc *sc, const struct in6_addr *src, - const struct in6_addr *dst) + const struct in6_addr *dst, uint32_t opts) { + struct gre_list *head; struct gre_softc *tmp; + struct gre_socket *gs; if (sc->gre_family == AF_INET6 && IN6_ARE_ADDR_EQUAL(&sc->gre_oip6.ip6_src, src) && - IN6_ARE_ADDR_EQUAL(&sc->gre_oip6.ip6_dst, dst)) + IN6_ARE_ADDR_EQUAL(&sc->gre_oip6.ip6_dst, dst) && + (sc->gre_options & GRE_UDPENCAP) == (opts & GRE_UDPENCAP)) return (EEXIST); - CK_LIST_FOREACH(tmp, &GRE_HASH(src, dst), chain) { + if (opts & GRE_UDPENCAP) { + gs = in6_gre_lookup_socket(src); + if (gs == NULL) + return (0); + head = &gs->list; + } else + head = &GRE_HASH(src, dst); + + CK_LIST_FOREACH(tmp, head, chain) { if (tmp == sc) continue; if (IN6_ARE_ADDR_EQUAL(&tmp->gre_oip6.ip6_src, src) && IN6_ARE_ADDR_EQUAL(&tmp->gre_oip6.ip6_dst, dst)) return (EADDRNOTAVAIL); } return (0); } static int in6_gre_lookup(const struct mbuf *m, int off, int proto, void **arg) { const struct ip6_hdr *ip6; struct gre_softc *sc; if (V_ipv6_hashtbl == NULL) return (0); MPASS(in_epoch(net_epoch_preempt)); ip6 = mtod(m, const struct ip6_hdr *); CK_LIST_FOREACH(sc, &GRE_HASH(&ip6->ip6_dst, &ip6->ip6_src), chain) { /* * This is an inbound packet, its ip6_dst is source address * in softc. 
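 * So the hash bucket and the comparison below use (ip6_dst, ip6_src),
 * i.e. the tunnel's local/remote address pair in that order.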
*/ if (IN6_ARE_ADDR_EQUAL(&sc->gre_oip6.ip6_src, &ip6->ip6_dst) && IN6_ARE_ADDR_EQUAL(&sc->gre_oip6.ip6_dst, &ip6->ip6_src)) { if ((GRE2IFP(sc)->if_flags & IFF_UP) == 0) return (0); *arg = sc; return (ENCAP_DRV_LOOKUP); } } return (0); } /* * Check that ingress address belongs to local host. */ static void in6_gre_set_running(struct gre_softc *sc) { if (in6_localip(&sc->gre_oip6.ip6_src)) GRE2IFP(sc)->if_drv_flags |= IFF_DRV_RUNNING; else GRE2IFP(sc)->if_drv_flags &= ~IFF_DRV_RUNNING; } /* * ifaddr_event handler. * Clear IFF_DRV_RUNNING flag when ingress address disappears to prevent * source address spoofing. */ static void in6_gre_srcaddr(void *arg __unused, const struct sockaddr *sa, int event __unused) { const struct sockaddr_in6 *sin; struct gre_softc *sc; /* Check that VNET is ready */ if (V_ipv6_hashtbl == NULL) return; MPASS(in_epoch(net_epoch_preempt)); sin = (const struct sockaddr_in6 *)sa; CK_LIST_FOREACH(sc, &GRE_SRCHASH(&sin->sin6_addr), srchash) { if (IN6_ARE_ADDR_EQUAL(&sc->gre_oip6.ip6_src, &sin->sin6_addr) == 0) continue; in6_gre_set_running(sc); } } static void +in6_gre_udp_input(struct mbuf *m, int off, struct inpcb *inp, + const struct sockaddr *sa, void *ctx) +{ + struct epoch_tracker et; + struct gre_socket *gs; + struct gre_softc *sc; + struct sockaddr_in6 dst; + + NET_EPOCH_ENTER(et); + /* + * udp_append() holds reference to inp, it is safe to check + * inp_flags2 without INP_RLOCK(). + * If socket was closed before we have entered NET_EPOCH section, + * INP_FREED flag should be set. Otherwise it should be safe to + * make access to ctx data, because gre_so will be freed by + * gre_sofree() via epoch_call(). + */ + if (__predict_false(inp->inp_flags2 & INP_FREED)) { + NET_EPOCH_EXIT(et); + m_freem(m); + return; + } + + gs = (struct gre_socket *)ctx; + dst = *(const struct sockaddr_in6 *)sa; + if (sa6_embedscope(&dst, 0)) { + NET_EPOCH_EXIT(et); + m_freem(m); + return; + } + CK_LIST_FOREACH(sc, &gs->list, chain) { + if (IN6_ARE_ADDR_EQUAL(&sc->gre_oip6.ip6_dst, &dst.sin6_addr)) + break; + } + if (sc != NULL && (GRE2IFP(sc)->if_flags & IFF_UP) != 0){ + gre_input(m, off + sizeof(struct udphdr), IPPROTO_UDP, sc); + NET_EPOCH_EXIT(et); + return; + } + m_freem(m); + NET_EPOCH_EXIT(et); +} + +static int +in6_gre_setup_socket(struct gre_softc *sc) +{ + struct sockopt sopt; + struct sockaddr_in6 sin6; + struct in6_gre_socket *s; + struct gre_socket *gs; + int error, value; + + /* + * NOTE: we are protected with gre_ioctl_sx lock. + * + * First check that socket is already configured. + * If so, check that source addres was not changed. + * If address is different, check that there are no other tunnels + * and close socket. + */ + gs = sc->gre_so; + if (gs != NULL) { + s = __containerof(gs, struct in6_gre_socket, base); + if (!IN6_ARE_ADDR_EQUAL(&s->addr, &sc->gre_oip6.ip6_src)) { + if (CK_LIST_EMPTY(&gs->list)) { + CK_LIST_REMOVE(gs, chain); + soclose(gs->so); + epoch_call(net_epoch_preempt, &gs->epoch_ctx, + gre_sofree); + } + gs = sc->gre_so = NULL; + } + } + + if (gs == NULL) { + /* + * Check that socket for given address is already + * configured. 
+ */ + gs = in6_gre_lookup_socket(&sc->gre_oip6.ip6_src); + if (gs == NULL) { + s = malloc(sizeof(*s), M_GRE, M_WAITOK | M_ZERO); + s->addr = sc->gre_oip6.ip6_src; + gs = &s->base; + + error = socreate(sc->gre_family, &gs->so, + SOCK_DGRAM, IPPROTO_UDP, curthread->td_ucred, + curthread); + if (error != 0) { + if_printf(GRE2IFP(sc), + "cannot create socket: %d\n", error); + free(s, M_GRE); + return (error); + } + + error = udp_set_kernel_tunneling(gs->so, + in6_gre_udp_input, NULL, gs); + if (error != 0) { + if_printf(GRE2IFP(sc), + "cannot set UDP tunneling: %d\n", error); + goto fail; + } + + memset(&sopt, 0, sizeof(sopt)); + sopt.sopt_dir = SOPT_SET; + sopt.sopt_level = IPPROTO_IPV6; + sopt.sopt_name = IPV6_BINDANY; + sopt.sopt_val = &value; + sopt.sopt_valsize = sizeof(value); + value = 1; + error = sosetopt(gs->so, &sopt); + if (error != 0) { + if_printf(GRE2IFP(sc), + "cannot set IPV6_BINDANY opt: %d\n", + error); + goto fail; + } + + memset(&sin6, 0, sizeof(sin6)); + sin6.sin6_family = AF_INET6; + sin6.sin6_len = sizeof(sin6); + sin6.sin6_addr = sc->gre_oip6.ip6_src; + sin6.sin6_port = htons(GRE_UDPPORT); + error = sa6_recoverscope(&sin6); + if (error != 0) { + if_printf(GRE2IFP(sc), + "cannot determine scope zone id: %d\n", + error); + goto fail; + } + error = sobind(gs->so, (struct sockaddr *)&sin6, + curthread); + if (error != 0) { + if_printf(GRE2IFP(sc), + "cannot bind socket: %d\n", error); + goto fail; + } + /* Add socket to the chain */ + CK_LIST_INSERT_HEAD( + &GRE_SOCKHASH(&sc->gre_oip6.ip6_src), gs, chain); + } + } + + /* Add softc to the socket's list */ + CK_LIST_INSERT_HEAD(&gs->list, sc, chain); + sc->gre_so = gs; + return (0); +fail: + soclose(gs->so); + free(s, M_GRE); + return (error); +} + +static int in6_gre_attach(struct gre_softc *sc) { + struct grehdr *gh; + int error; - sc->gre_hlen = sizeof(struct greip6); + if (sc->gre_options & GRE_UDPENCAP) { + sc->gre_csumflags = CSUM_UDP_IPV6; + sc->gre_hlen = sizeof(struct greudp6); + sc->gre_oip6.ip6_nxt = IPPROTO_UDP; + gh = &sc->gre_udp6hdr->gi6_gre; + gre_update_udphdr(sc, &sc->gre_udp6, + in6_cksum_pseudo(&sc->gre_oip6, 0, 0, 0)); + } else { + sc->gre_hlen = sizeof(struct greip6); + sc->gre_oip6.ip6_nxt = IPPROTO_GRE; + gh = &sc->gre_ip6hdr->gi6_gre; + } sc->gre_oip6.ip6_vfc = IPV6_VERSION; - sc->gre_oip6.ip6_nxt = IPPROTO_GRE; - gre_updatehdr(sc, &sc->gre_gi6hdr->gi6_gre); - CK_LIST_INSERT_HEAD(&GRE_HASH_SC(sc), sc, chain); + gre_update_hdr(sc, gh); + + /* + * If we return error, this means that sc is not linked, + * and caller should reset gre_family and free(sc->gre_hdr). + */ + if (sc->gre_options & GRE_UDPENCAP) { + error = in6_gre_setup_socket(sc); + if (error != 0) + return (error); + } else + CK_LIST_INSERT_HEAD(&GRE_HASH_SC(sc), sc, chain); CK_LIST_INSERT_HEAD(&GRE_SRCHASH(&sc->gre_oip6.ip6_src), sc, srchash); + + /* Set IFF_DRV_RUNNING if interface is ready */ + in6_gre_set_running(sc); + return (0); } -void +int in6_gre_setopts(struct gre_softc *sc, u_long cmd, uint32_t value) { + int error; - MPASS(cmd == GRESKEY || cmd == GRESOPTS); - /* NOTE: we are protected with gre_ioctl_sx lock */ + MPASS(cmd == GRESKEY || cmd == GRESOPTS || cmd == GRESPORT); MPASS(sc->gre_family == AF_INET6); + + /* + * If we are going to change encapsulation protocol, do check + * for duplicate tunnels. Return EEXIST here to do not confuse + * user. 
+ */ + if (cmd == GRESOPTS && + (sc->gre_options & GRE_UDPENCAP) != (value & GRE_UDPENCAP) && + in6_gre_checkdup(sc, &sc->gre_oip6.ip6_src, + &sc->gre_oip6.ip6_dst, value) == EADDRNOTAVAIL) + return (EEXIST); + CK_LIST_REMOVE(sc, chain); CK_LIST_REMOVE(sc, srchash); GRE_WAIT(); - if (cmd == GRESKEY) + switch (cmd) { + case GRESKEY: sc->gre_key = value; - else + break; + case GRESOPTS: sc->gre_options = value; - in6_gre_attach(sc); + break; + case GRESPORT: + sc->gre_port = value; + break; + } + error = in6_gre_attach(sc); + if (error != 0) { + sc->gre_family = 0; + free(sc->gre_hdr, M_GRE); + } + return (error); } int in6_gre_ioctl(struct gre_softc *sc, u_long cmd, caddr_t data) { struct in6_ifreq *ifr = (struct in6_ifreq *)data; struct sockaddr_in6 *dst, *src; struct ip6_hdr *ip6; int error; /* NOTE: we are protected with gre_ioctl_sx lock */ error = EINVAL; switch (cmd) { case SIOCSIFPHYADDR_IN6: src = &((struct in6_aliasreq *)data)->ifra_addr; dst = &((struct in6_aliasreq *)data)->ifra_dstaddr; /* sanity checks */ if (src->sin6_family != dst->sin6_family || src->sin6_family != AF_INET6 || src->sin6_len != dst->sin6_len || src->sin6_len != sizeof(*src)) break; if (IN6_IS_ADDR_UNSPECIFIED(&src->sin6_addr) || IN6_IS_ADDR_UNSPECIFIED(&dst->sin6_addr)) { error = EADDRNOTAVAIL; break; } /* * Check validity of the scope zone ID of the * addresses, and convert it into the kernel * internal form if necessary. */ if ((error = sa6_embedscope(src, 0)) != 0 || (error = sa6_embedscope(dst, 0)) != 0) break; if (V_ipv6_hashtbl == NULL) { V_ipv6_hashtbl = gre_hashinit(); V_ipv6_srchashtbl = gre_hashinit(); + V_ipv6_sockets = (struct gre_sockets *)gre_hashinit(); } error = in6_gre_checkdup(sc, &src->sin6_addr, - &dst->sin6_addr); + &dst->sin6_addr, sc->gre_options); if (error == EADDRNOTAVAIL) break; if (error == EEXIST) { /* Addresses are the same. Just return. */ error = 0; break; } - ip6 = malloc(sizeof(struct greip6) + 3 * sizeof(uint32_t), + ip6 = malloc(sizeof(struct greudp6) + 3 * sizeof(uint32_t), M_GRE, M_WAITOK | M_ZERO); ip6->ip6_src = src->sin6_addr; ip6->ip6_dst = dst->sin6_addr; if (sc->gre_family != 0) { /* Detach existing tunnel first */ CK_LIST_REMOVE(sc, chain); CK_LIST_REMOVE(sc, srchash); GRE_WAIT(); free(sc->gre_hdr, M_GRE); /* XXX: should we notify about link state change? */ } sc->gre_family = AF_INET6; sc->gre_hdr = ip6; sc->gre_oseq = 0; sc->gre_iseq = UINT32_MAX; - in6_gre_attach(sc); - in6_gre_set_running(sc); + error = in6_gre_attach(sc); + if (error != 0) { + sc->gre_family = 0; + free(sc->gre_hdr, M_GRE); + } break; case SIOCGIFPSRCADDR_IN6: case SIOCGIFPDSTADDR_IN6: if (sc->gre_family != AF_INET6) { error = EADDRNOTAVAIL; break; } src = (struct sockaddr_in6 *)&ifr->ifr_addr; memset(src, 0, sizeof(*src)); src->sin6_family = AF_INET6; src->sin6_len = sizeof(*src); src->sin6_addr = (cmd == SIOCGIFPSRCADDR_IN6) ? 
sc->gre_oip6.ip6_src: sc->gre_oip6.ip6_dst; error = prison_if(curthread->td_ucred, (struct sockaddr *)src); if (error == 0) error = sa6_recoverscope(src); if (error != 0) memset(src, 0, sizeof(*src)); break; } return (error); } int -in6_gre_output(struct mbuf *m, int af __unused, int hlen __unused) +in6_gre_output(struct mbuf *m, int af __unused, int hlen __unused, + uint32_t flowid) { struct greip6 *gi6; gi6 = mtod(m, struct greip6 *); gi6->gi6_ip6.ip6_hlim = V_ip6_gre_hlim; + gi6->gi6_ip6.ip6_flow |= flowid & IPV6_FLOWLABEL_MASK; return (ip6_output(m, NULL, NULL, IPV6_MINMTU, NULL, NULL, NULL)); } static const struct srcaddrtab *ipv6_srcaddrtab = NULL; static const struct encaptab *ecookie = NULL; static const struct encap_config ipv6_encap_cfg = { .proto = IPPROTO_GRE, .min_length = sizeof(struct greip6) + #ifdef INET sizeof(struct ip), #else sizeof(struct ip6_hdr), #endif .exact_match = ENCAP_DRV_LOOKUP, .lookup = in6_gre_lookup, .input = gre_input }; void in6_gre_init(void) { if (!IS_DEFAULT_VNET(curvnet)) return; ipv6_srcaddrtab = ip6_encap_register_srcaddr(in6_gre_srcaddr, NULL, M_WAITOK); ecookie = ip6_encap_attach(&ipv6_encap_cfg, NULL, M_WAITOK); } void in6_gre_uninit(void) { if (IS_DEFAULT_VNET(curvnet)) { ip6_encap_detach(ecookie); ip6_encap_unregister_srcaddr(ipv6_srcaddrtab); } if (V_ipv6_hashtbl != NULL) { gre_hashdestroy(V_ipv6_hashtbl); V_ipv6_hashtbl = NULL; GRE_WAIT(); gre_hashdestroy(V_ipv6_srchashtbl); + gre_hashdestroy((struct gre_list *)V_ipv6_sockets); } } Index: user/ngie/bug-237403/sys/netinet6/ip6_output.c =================================================================== --- user/ngie/bug-237403/sys/netinet6/ip6_output.c (revision 346925) +++ user/ngie/bug-237403/sys/netinet6/ip6_output.c (revision 346926) @@ -1,3146 +1,3149 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the project nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $KAME: ip6_output.c,v 1.279 2002/01/26 06:12:30 jinmei Exp $ */ /*- * Copyright (c) 1982, 1986, 1988, 1990, 1993 * The Regents of the University of California. All rights reserved. 
* * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)ip_output.c 8.3 (Berkeley) 1/21/94 */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include "opt_ratelimit.h" #include "opt_ipsec.h" #include "opt_sctp.h" #include "opt_route.h" #include "opt_rss.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef SCTP #include #include #endif #include #include extern int in6_mcast_loop; struct ip6_exthdrs { struct mbuf *ip6e_ip6; struct mbuf *ip6e_hbh; struct mbuf *ip6e_dest1; struct mbuf *ip6e_rthdr; struct mbuf *ip6e_dest2; }; static MALLOC_DEFINE(M_IP6OPT, "ip6opt", "IPv6 options"); static int ip6_pcbopt(int, u_char *, int, struct ip6_pktopts **, struct ucred *, int); static int ip6_pcbopts(struct ip6_pktopts **, struct mbuf *, struct socket *, struct sockopt *); static int ip6_getpcbopt(struct inpcb *, int, struct sockopt *); static int ip6_setpktopt(int, u_char *, int, struct ip6_pktopts *, struct ucred *, int, int, int); static int ip6_copyexthdr(struct mbuf **, caddr_t, int); static int ip6_insertfraghdr(struct mbuf *, struct mbuf *, int, struct ip6_frag **); static int ip6_insert_jumboopt(struct ip6_exthdrs *, u_int32_t); static int ip6_splithdr(struct mbuf *, struct ip6_exthdrs *); static int ip6_getpmtu(struct route_in6 *, int, struct ifnet *, const struct in6_addr *, u_long *, int *, u_int, u_int); static int ip6_calcmtu(struct ifnet *, const struct in6_addr *, u_long, u_long *, int *, u_int); static int ip6_getpmtu_ctl(u_int, const struct in6_addr *, u_long *); static int copypktopts(struct ip6_pktopts *, struct ip6_pktopts *, int); /* * Make an extension header from option data. hp is the source, and * mp is the destination. 
*/ #define MAKE_EXTHDR(hp, mp) \ do { \ if (hp) { \ struct ip6_ext *eh = (struct ip6_ext *)(hp); \ error = ip6_copyexthdr((mp), (caddr_t)(hp), \ ((eh)->ip6e_len + 1) << 3); \ if (error) \ goto freehdrs; \ } \ } while (/*CONSTCOND*/ 0) /* * Form a chain of extension headers. * m is the extension header mbuf * mp is the previous mbuf in the chain * p is the next header * i is the type of option. */ #define MAKE_CHAIN(m, mp, p, i)\ do {\ if (m) {\ if (!hdrsplit) \ panic("assumption failed: hdr not split"); \ *mtod((m), u_char *) = *(p);\ *(p) = (i);\ p = mtod((m), u_char *);\ (m)->m_next = (mp)->m_next;\ (mp)->m_next = (m);\ (mp) = (m);\ }\ } while (/*CONSTCOND*/ 0) void in6_delayed_cksum(struct mbuf *m, uint32_t plen, u_short offset) { u_short csum; csum = in_cksum_skip(m, offset + plen, offset); if (m->m_pkthdr.csum_flags & CSUM_UDP_IPV6 && csum == 0) csum = 0xffff; offset += m->m_pkthdr.csum_data; /* checksum offset */ if (offset + sizeof(csum) > m->m_len) m_copyback(m, offset, sizeof(csum), (caddr_t)&csum); else *(u_short *)mtodo(m, offset) = csum; } int ip6_fragment(struct ifnet *ifp, struct mbuf *m0, int hlen, u_char nextproto, int fraglen , uint32_t id) { struct mbuf *m, **mnext, *m_frgpart; struct ip6_hdr *ip6, *mhip6; struct ip6_frag *ip6f; int off; int error; int tlen = m0->m_pkthdr.len; KASSERT((fraglen % 8 == 0), ("Fragment length must be a multiple of 8")); m = m0; ip6 = mtod(m, struct ip6_hdr *); mnext = &m->m_nextpkt; for (off = hlen; off < tlen; off += fraglen) { m = m_gethdr(M_NOWAIT, MT_DATA); if (!m) { IP6STAT_INC(ip6s_odropped); return (ENOBUFS); } m->m_flags = m0->m_flags & M_COPYFLAGS; *mnext = m; mnext = &m->m_nextpkt; m->m_data += max_linkhdr; mhip6 = mtod(m, struct ip6_hdr *); *mhip6 = *ip6; m->m_len = sizeof(*mhip6); error = ip6_insertfraghdr(m0, m, hlen, &ip6f); if (error) { IP6STAT_INC(ip6s_odropped); return (error); } ip6f->ip6f_offlg = htons((u_short)((off - hlen) & ~7)); if (off + fraglen >= tlen) fraglen = tlen - off; else ip6f->ip6f_offlg |= IP6F_MORE_FRAG; mhip6->ip6_plen = htons((u_short)(fraglen + hlen + sizeof(*ip6f) - sizeof(struct ip6_hdr))); if ((m_frgpart = m_copym(m0, off, fraglen, M_NOWAIT)) == NULL) { IP6STAT_INC(ip6s_odropped); return (ENOBUFS); } m_cat(m, m_frgpart); m->m_pkthdr.len = fraglen + hlen + sizeof(*ip6f); m->m_pkthdr.fibnum = m0->m_pkthdr.fibnum; m->m_pkthdr.rcvif = NULL; ip6f->ip6f_reserved = 0; ip6f->ip6f_ident = id; ip6f->ip6f_nxt = nextproto; IP6STAT_INC(ip6s_ofragments); in6_ifstat_inc(ifp, ifs6_out_fragcreat); } return (0); } /* * IP6 output. The packet in mbuf chain m contains a skeletal IP6 * header (with pri, len, nxt, hlim, src, dst). * This function may modify ver and hlim only. * The mbuf chain containing the packet will be freed. * The mbuf opt, if present, will not be freed. * If route_in6 ro is present and has ro_rt initialized, route lookup would be * skipped and ro->ro_rt would be used. If ro is present but ro->ro_rt is NULL, * then result of route lookup is stored in ro->ro_rt. * * type of "mtu": rt_mtu is u_long, ifnet.ifr_mtu is int, and * nd_ifinfo.linkmtu is u_int32_t. so we use u_long to hold largest one, * which is rt_mtu. * * ifpp - XXX: just for statistics */ /* * XXX TODO: no flowid is assigned for outbound flows? 
*/ int ip6_output(struct mbuf *m0, struct ip6_pktopts *opt, struct route_in6 *ro, int flags, struct ip6_moptions *im6o, struct ifnet **ifpp, struct inpcb *inp) { struct ip6_hdr *ip6; struct ifnet *ifp, *origifp; struct mbuf *m = m0; struct mbuf *mprev = NULL; int hlen, tlen, len; struct route_in6 ip6route; struct rtentry *rt = NULL; struct sockaddr_in6 *dst, src_sa, dst_sa; struct in6_addr odst; int error = 0; struct in6_ifaddr *ia = NULL; u_long mtu; int alwaysfrag, dontfrag; u_int32_t optlen = 0, plen = 0, unfragpartlen = 0; struct ip6_exthdrs exthdrs; struct in6_addr src0, dst0; u_int32_t zone; struct route_in6 *ro_pmtu = NULL; int hdrsplit = 0; int sw_csum, tso; int needfiblookup; uint32_t fibnum; struct m_tag *fwd_tag = NULL; uint32_t id; if (inp != NULL) { INP_LOCK_ASSERT(inp); M_SETFIB(m, inp->inp_inc.inc_fibnum); if ((flags & IP_NODEFAULTFLOWID) == 0) { /* unconditionally set flowid */ m->m_pkthdr.flowid = inp->inp_flowid; M_HASHTYPE_SET(m, inp->inp_flowtype); } +#ifdef NUMA + m->m_pkthdr.numa_domain = inp->inp_numa_domain; +#endif } #if defined(IPSEC) || defined(IPSEC_SUPPORT) /* * IPSec checking which handles several cases. * FAST IPSEC: We re-injected the packet. * XXX: need scope argument. */ if (IPSEC_ENABLED(ipv6)) { if ((error = IPSEC_OUTPUT(ipv6, m, inp)) != 0) { if (error == EINPROGRESS) error = 0; goto done; } } #endif /* IPSEC */ bzero(&exthdrs, sizeof(exthdrs)); if (opt) { /* Hop-by-Hop options header */ MAKE_EXTHDR(opt->ip6po_hbh, &exthdrs.ip6e_hbh); /* Destination options header(1st part) */ if (opt->ip6po_rthdr) { /* * Destination options header(1st part) * This only makes sense with a routing header. * See Section 9.2 of RFC 3542. * Disabling this part just for MIP6 convenience is * a bad idea. We need to think carefully about a * way to make the advanced API coexist with MIP6 * options, which might automatically be inserted in * the kernel. */ MAKE_EXTHDR(opt->ip6po_dest1, &exthdrs.ip6e_dest1); } /* Routing header */ MAKE_EXTHDR(opt->ip6po_rthdr, &exthdrs.ip6e_rthdr); /* Destination options header(2nd part) */ MAKE_EXTHDR(opt->ip6po_dest2, &exthdrs.ip6e_dest2); } /* * Calculate the total length of the extension header chain. * Keep the length of the unfragmentable part for fragmentation. */ optlen = 0; if (exthdrs.ip6e_hbh) optlen += exthdrs.ip6e_hbh->m_len; if (exthdrs.ip6e_dest1) optlen += exthdrs.ip6e_dest1->m_len; if (exthdrs.ip6e_rthdr) optlen += exthdrs.ip6e_rthdr->m_len; unfragpartlen = optlen + sizeof(struct ip6_hdr); /* NOTE: we don't add AH/ESP length here (done in ip6_ipsec_output) */ if (exthdrs.ip6e_dest2) optlen += exthdrs.ip6e_dest2->m_len; /* * If there is at least one extension header, * separate IP6 header from the payload. */ if (optlen && !hdrsplit) { if ((error = ip6_splithdr(m, &exthdrs)) != 0) { m = NULL; goto freehdrs; } m = exthdrs.ip6e_ip6; hdrsplit++; } ip6 = mtod(m, struct ip6_hdr *); /* adjust mbuf packet header length */ m->m_pkthdr.len += optlen; plen = m->m_pkthdr.len - sizeof(*ip6); /* If this is a jumbo payload, insert a jumbo payload option. */ if (plen > IPV6_MAXPACKET) { if (!hdrsplit) { if ((error = ip6_splithdr(m, &exthdrs)) != 0) { m = NULL; goto freehdrs; } m = exthdrs.ip6e_ip6; hdrsplit++; } /* adjust pointer */ ip6 = mtod(m, struct ip6_hdr *); if ((error = ip6_insert_jumboopt(&exthdrs, plen)) != 0) goto freehdrs; ip6->ip6_plen = 0; } else ip6->ip6_plen = htons(plen); /* * Concatenate headers and fill in next header fields. * Here we have, on "m" * IPv6 payload * and we insert headers accordingly. 
Finally, we should be getting: * IPv6 hbh dest1 rthdr ah* [esp* dest2 payload] * * during the header composing process, "m" points to IPv6 header. * "mprev" points to an extension header prior to esp. */ u_char *nexthdrp = &ip6->ip6_nxt; mprev = m; /* * we treat dest2 specially. this makes IPsec processing * much easier. the goal here is to make mprev point the * mbuf prior to dest2. * * result: IPv6 dest2 payload * m and mprev will point to IPv6 header. */ if (exthdrs.ip6e_dest2) { if (!hdrsplit) panic("assumption failed: hdr not split"); exthdrs.ip6e_dest2->m_next = m->m_next; m->m_next = exthdrs.ip6e_dest2; *mtod(exthdrs.ip6e_dest2, u_char *) = ip6->ip6_nxt; ip6->ip6_nxt = IPPROTO_DSTOPTS; } /* * result: IPv6 hbh dest1 rthdr dest2 payload * m will point to IPv6 header. mprev will point to the * extension header prior to dest2 (rthdr in the above case). */ MAKE_CHAIN(exthdrs.ip6e_hbh, mprev, nexthdrp, IPPROTO_HOPOPTS); MAKE_CHAIN(exthdrs.ip6e_dest1, mprev, nexthdrp, IPPROTO_DSTOPTS); MAKE_CHAIN(exthdrs.ip6e_rthdr, mprev, nexthdrp, IPPROTO_ROUTING); /* * If there is a routing header, discard the packet. */ if (exthdrs.ip6e_rthdr) { error = EINVAL; goto bad; } /* Source address validation */ if (IN6_IS_ADDR_UNSPECIFIED(&ip6->ip6_src) && (flags & IPV6_UNSPECSRC) == 0) { error = EOPNOTSUPP; IP6STAT_INC(ip6s_badscope); goto bad; } if (IN6_IS_ADDR_MULTICAST(&ip6->ip6_src)) { error = EOPNOTSUPP; IP6STAT_INC(ip6s_badscope); goto bad; } IP6STAT_INC(ip6s_localout); /* * Route packet. */ if (ro == NULL) { ro = &ip6route; bzero((caddr_t)ro, sizeof(*ro)); } ro_pmtu = ro; if (opt && opt->ip6po_rthdr) ro = &opt->ip6po_route; dst = (struct sockaddr_in6 *)&ro->ro_dst; fibnum = (inp != NULL) ? inp->inp_inc.inc_fibnum : M_GETFIB(m); again: /* * if specified, try to fill in the traffic class field. * do not override if a non-zero value is already set. * we check the diffserv field and the ecn field separately. */ if (opt && opt->ip6po_tclass >= 0) { int mask = 0; if ((ip6->ip6_flow & htonl(0xfc << 20)) == 0) mask |= 0xfc; if ((ip6->ip6_flow & htonl(0x03 << 20)) == 0) mask |= 0x03; if (mask != 0) ip6->ip6_flow |= htonl((opt->ip6po_tclass & mask) << 20); } /* fill in or override the hop limit field, if necessary. */ if (opt && opt->ip6po_hlim != -1) ip6->ip6_hlim = opt->ip6po_hlim & 0xff; else if (IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst)) { if (im6o != NULL) ip6->ip6_hlim = im6o->im6o_multicast_hlim; else ip6->ip6_hlim = V_ip6_defmcasthlim; } /* * Validate route against routing table additions; * a better/more specific route might have been added. * Make sure address family is set in route. */ if (inp) { ro->ro_dst.sin6_family = AF_INET6; RT_VALIDATE((struct route *)ro, &inp->inp_rt_cookie, fibnum); } if (ro->ro_rt && fwd_tag == NULL && (ro->ro_rt->rt_flags & RTF_UP) && ro->ro_dst.sin6_family == AF_INET6 && IN6_ARE_ADDR_EQUAL(&ro->ro_dst.sin6_addr, &ip6->ip6_dst)) { rt = ro->ro_rt; ifp = ro->ro_rt->rt_ifp; } else { if (ro->ro_lle) LLE_FREE(ro->ro_lle); /* zeros ro_lle */ ro->ro_lle = NULL; if (fwd_tag == NULL) { bzero(&dst_sa, sizeof(dst_sa)); dst_sa.sin6_family = AF_INET6; dst_sa.sin6_len = sizeof(dst_sa); dst_sa.sin6_addr = ip6->ip6_dst; } error = in6_selectroute_fib(&dst_sa, opt, im6o, ro, &ifp, &rt, fibnum); if (error != 0) { if (ifp != NULL) in6_ifstat_inc(ifp, ifs6_out_discard); goto bad; } } if (rt == NULL) { /* * If in6_selectroute() does not return a route entry, * dst may not have been updated. */ *dst = dst_sa; /* XXX */ } /* * then rt (for unicast) and ifp must be non-NULL valid values. 
*/ if ((flags & IPV6_FORWARDING) == 0) { /* XXX: the FORWARDING flag can be set for mrouting. */ in6_ifstat_inc(ifp, ifs6_out_request); } if (rt != NULL) { ia = (struct in6_ifaddr *)(rt->rt_ifa); counter_u64_add(rt->rt_pksent, 1); } /* Setup data structures for scope ID checks. */ src0 = ip6->ip6_src; bzero(&src_sa, sizeof(src_sa)); src_sa.sin6_family = AF_INET6; src_sa.sin6_len = sizeof(src_sa); src_sa.sin6_addr = ip6->ip6_src; dst0 = ip6->ip6_dst; /* re-initialize to be sure */ bzero(&dst_sa, sizeof(dst_sa)); dst_sa.sin6_family = AF_INET6; dst_sa.sin6_len = sizeof(dst_sa); dst_sa.sin6_addr = ip6->ip6_dst; /* Check for valid scope ID. */ if (in6_setscope(&src0, ifp, &zone) == 0 && sa6_recoverscope(&src_sa) == 0 && zone == src_sa.sin6_scope_id && in6_setscope(&dst0, ifp, &zone) == 0 && sa6_recoverscope(&dst_sa) == 0 && zone == dst_sa.sin6_scope_id) { /* * The outgoing interface is in the zone of the source * and destination addresses. * * Because the loopback interface cannot receive * packets with a different scope ID than its own, * there is a trick is to pretend the outgoing packet * was received by the real network interface, by * setting "origifp" different from "ifp". This is * only allowed when "ifp" is a loopback network * interface. Refer to code in nd6_output_ifp() for * more details. */ origifp = ifp; /* * We should use ia_ifp to support the case of sending * packets to an address of our own. */ if (ia != NULL && ia->ia_ifp) ifp = ia->ia_ifp; } else if ((ifp->if_flags & IFF_LOOPBACK) == 0 || sa6_recoverscope(&src_sa) != 0 || sa6_recoverscope(&dst_sa) != 0 || dst_sa.sin6_scope_id == 0 || (src_sa.sin6_scope_id != 0 && src_sa.sin6_scope_id != dst_sa.sin6_scope_id) || (origifp = ifnet_byindex(dst_sa.sin6_scope_id)) == NULL) { /* * If the destination network interface is not a * loopback interface, or the destination network * address has no scope ID, or the source address has * a scope ID set which is different from the * destination address one, or there is no network * interface representing this scope ID, the address * pair is considered invalid. */ IP6STAT_INC(ip6s_badscope); in6_ifstat_inc(ifp, ifs6_out_discard); if (error == 0) error = EHOSTUNREACH; /* XXX */ goto bad; } /* All scope ID checks are successful. */ if (rt && !IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst)) { if (opt && opt->ip6po_nextroute.ro_rt) { /* * The nexthop is explicitly specified by the * application. We assume the next hop is an IPv6 * address. */ dst = (struct sockaddr_in6 *)opt->ip6po_nexthop; } else if ((rt->rt_flags & RTF_GATEWAY)) dst = (struct sockaddr_in6 *)rt->rt_gateway; } if (!IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst)) { m->m_flags &= ~(M_BCAST | M_MCAST); /* just in case */ } else { m->m_flags = (m->m_flags & ~M_BCAST) | M_MCAST; in6_ifstat_inc(ifp, ifs6_out_mcast); /* * Confirm that the outgoing interface supports multicast. */ if (!(ifp->if_flags & IFF_MULTICAST)) { IP6STAT_INC(ip6s_noroute); in6_ifstat_inc(ifp, ifs6_out_discard); error = ENETUNREACH; goto bad; } if ((im6o == NULL && in6_mcast_loop) || (im6o && im6o->im6o_multicast_loop)) { /* * Loop back multicast datagram if not expressly * forbidden to do so, even if we have not joined * the address; protocols will filter it later, * thus deferring a hash lookup and lock acquisition * at the expense of an m_copym(). */ ip6_mloopback(ifp, m); } else { /* * If we are acting as a multicast router, perform * multicast forwarding as if the packet had just * arrived on the interface to which we are about * to send. 
The multicast forwarding function * recursively calls this function, using the * IPV6_FORWARDING flag to prevent infinite recursion. * * Multicasts that are looped back by ip6_mloopback(), * above, will be forwarded by the ip6_input() routine, * if necessary. */ if (V_ip6_mrouter && (flags & IPV6_FORWARDING) == 0) { /* * XXX: ip6_mforward expects that rcvif is NULL * when it is called from the originating path. * However, it may not always be the case. */ m->m_pkthdr.rcvif = NULL; if (ip6_mforward(ip6, ifp, m) != 0) { m_freem(m); goto done; } } } /* * Multicasts with a hoplimit of zero may be looped back, * above, but must not be transmitted on a network. * Also, multicasts addressed to the loopback interface * are not sent -- the above call to ip6_mloopback() will * loop back a copy if this host actually belongs to the * destination group on the loopback interface. */ if (ip6->ip6_hlim == 0 || (ifp->if_flags & IFF_LOOPBACK) || IN6_IS_ADDR_MC_INTFACELOCAL(&ip6->ip6_dst)) { m_freem(m); goto done; } } /* * Fill the outgoing inteface to tell the upper layer * to increment per-interface statistics. */ if (ifpp) *ifpp = ifp; /* Determine path MTU. */ if ((error = ip6_getpmtu(ro_pmtu, ro != ro_pmtu, ifp, &ip6->ip6_dst, &mtu, &alwaysfrag, fibnum, *nexthdrp)) != 0) goto bad; /* * The caller of this function may specify to use the minimum MTU * in some cases. * An advanced API option (IPV6_USE_MIN_MTU) can also override MTU * setting. The logic is a bit complicated; by default, unicast * packets will follow path MTU while multicast packets will be sent at * the minimum MTU. If IP6PO_MINMTU_ALL is specified, all packets * including unicast ones will be sent at the minimum MTU. Multicast * packets will always be sent at the minimum MTU unless * IP6PO_MINMTU_DISABLE is explicitly specified. * See RFC 3542 for more details. */ if (mtu > IPV6_MMTU) { if ((flags & IPV6_MINMTU)) mtu = IPV6_MMTU; else if (opt && opt->ip6po_minmtu == IP6PO_MINMTU_ALL) mtu = IPV6_MMTU; else if (IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst) && (opt == NULL || opt->ip6po_minmtu != IP6PO_MINMTU_DISABLE)) { mtu = IPV6_MMTU; } } /* * clear embedded scope identifiers if necessary. * in6_clearscope will touch the addresses only when necessary. */ in6_clearscope(&ip6->ip6_src); in6_clearscope(&ip6->ip6_dst); /* * If the outgoing packet contains a hop-by-hop options header, * it must be examined and processed even by the source node. * (RFC 2460, section 4.) */ if (exthdrs.ip6e_hbh) { struct ip6_hbh *hbh = mtod(exthdrs.ip6e_hbh, struct ip6_hbh *); u_int32_t dummy; /* XXX unused */ u_int32_t plen = 0; /* XXX: ip6_process will check the value */ #ifdef DIAGNOSTIC if ((hbh->ip6h_len + 1) << 3 > exthdrs.ip6e_hbh->m_len) panic("ip6e_hbh is not contiguous"); #endif /* * XXX: if we have to send an ICMPv6 error to the sender, * we need the M_LOOP flag since icmp6_error() expects * the IPv6 and the hop-by-hop options header are * contiguous unless the flag is set. */ m->m_flags |= M_LOOP; m->m_pkthdr.rcvif = ifp; if (ip6_process_hopopts(m, (u_int8_t *)(hbh + 1), ((hbh->ip6h_len + 1) << 3) - sizeof(struct ip6_hbh), &dummy, &plen) < 0) { /* m was already freed at this point */ error = EINVAL;/* better error? */ goto done; } m->m_flags &= ~M_LOOP; /* XXX */ m->m_pkthdr.rcvif = NULL; } /* Jump over all PFIL processing if hooks are not active. */ if (!PFIL_HOOKED_OUT(V_inet6_pfil_head)) goto passout; odst = ip6->ip6_dst; /* Run through list of hooks for output packets. 
*/ switch (pfil_run_hooks(V_inet6_pfil_head, &m, ifp, PFIL_OUT, inp)) { case PFIL_PASS: ip6 = mtod(m, struct ip6_hdr *); break; case PFIL_DROPPED: error = EPERM; /* FALLTHROUGH */ case PFIL_CONSUMED: goto done; } needfiblookup = 0; /* See if destination IP address was changed by packet filter. */ if (!IN6_ARE_ADDR_EQUAL(&odst, &ip6->ip6_dst)) { m->m_flags |= M_SKIP_FIREWALL; /* If destination is now ourself drop to ip6_input(). */ if (in6_localip(&ip6->ip6_dst)) { m->m_flags |= M_FASTFWD_OURS; if (m->m_pkthdr.rcvif == NULL) m->m_pkthdr.rcvif = V_loif; if (m->m_pkthdr.csum_flags & CSUM_DELAY_DATA_IPV6) { m->m_pkthdr.csum_flags |= CSUM_DATA_VALID_IPV6 | CSUM_PSEUDO_HDR; m->m_pkthdr.csum_data = 0xffff; } #ifdef SCTP if (m->m_pkthdr.csum_flags & CSUM_SCTP_IPV6) m->m_pkthdr.csum_flags |= CSUM_SCTP_VALID; #endif error = netisr_queue(NETISR_IPV6, m); goto done; } else { RO_INVALIDATE_CACHE(ro); needfiblookup = 1; /* Redo the routing table lookup. */ } } /* See if fib was changed by packet filter. */ if (fibnum != M_GETFIB(m)) { m->m_flags |= M_SKIP_FIREWALL; fibnum = M_GETFIB(m); RO_INVALIDATE_CACHE(ro); needfiblookup = 1; } if (needfiblookup) goto again; /* See if local, if yes, send it to netisr. */ if (m->m_flags & M_FASTFWD_OURS) { if (m->m_pkthdr.rcvif == NULL) m->m_pkthdr.rcvif = V_loif; if (m->m_pkthdr.csum_flags & CSUM_DELAY_DATA_IPV6) { m->m_pkthdr.csum_flags |= CSUM_DATA_VALID_IPV6 | CSUM_PSEUDO_HDR; m->m_pkthdr.csum_data = 0xffff; } #ifdef SCTP if (m->m_pkthdr.csum_flags & CSUM_SCTP_IPV6) m->m_pkthdr.csum_flags |= CSUM_SCTP_VALID; #endif error = netisr_queue(NETISR_IPV6, m); goto done; } /* Or forward to some other address? */ if ((m->m_flags & M_IP6_NEXTHOP) && (fwd_tag = m_tag_find(m, PACKET_TAG_IPFORWARD, NULL)) != NULL) { dst = (struct sockaddr_in6 *)&ro->ro_dst; bcopy((fwd_tag+1), &dst_sa, sizeof(struct sockaddr_in6)); m->m_flags |= M_SKIP_FIREWALL; m->m_flags &= ~M_IP6_NEXTHOP; m_tag_delete(m, fwd_tag); goto again; } passout: /* * Send the packet to the outgoing interface. * If necessary, do IPv6 fragmentation before sending. * * the logic here is rather complex: * 1: normal case (dontfrag == 0, alwaysfrag == 0) * 1-a: send as is if tlen <= path mtu * 1-b: fragment if tlen > path mtu * * 2: if user asks us not to fragment (dontfrag == 1) * 2-a: send as is if tlen <= interface mtu * 2-b: error if tlen > interface mtu * * 3: if we always need to attach fragment header (alwaysfrag == 1) * always fragment * * 4: if dontfrag == 1 && alwaysfrag == 1 * error, as we cannot handle this conflicting request */ sw_csum = m->m_pkthdr.csum_flags; if (!hdrsplit) { tso = ((sw_csum & ifp->if_hwassist & CSUM_TSO) != 0) ? 1 : 0; sw_csum &= ~ifp->if_hwassist; } else tso = 0; /* * If we added extension headers, we will not do TSO and calculate the * checksums ourselves for now. * XXX-BZ Need a framework to know when the NIC can handle it, even * with ext. hdrs. 
*/ if (sw_csum & CSUM_DELAY_DATA_IPV6) { sw_csum &= ~CSUM_DELAY_DATA_IPV6; in6_delayed_cksum(m, plen, sizeof(struct ip6_hdr)); } #ifdef SCTP if (sw_csum & CSUM_SCTP_IPV6) { sw_csum &= ~CSUM_SCTP_IPV6; sctp_delayed_cksum(m, sizeof(struct ip6_hdr)); } #endif m->m_pkthdr.csum_flags &= ifp->if_hwassist; tlen = m->m_pkthdr.len; if ((opt && (opt->ip6po_flags & IP6PO_DONTFRAG)) || tso) dontfrag = 1; else dontfrag = 0; if (dontfrag && alwaysfrag) { /* case 4 */ /* conflicting request - can't transmit */ error = EMSGSIZE; goto bad; } if (dontfrag && tlen > IN6_LINKMTU(ifp) && !tso) { /* case 2-b */ /* * Even if the DONTFRAG option is specified, we cannot send the * packet when the data length is larger than the MTU of the * outgoing interface. * Notify the error by sending IPV6_PATHMTU ancillary data if * application wanted to know the MTU value. Also return an * error code (this is not described in the API spec). */ if (inp != NULL) ip6_notify_pmtu(inp, &dst_sa, (u_int32_t)mtu); error = EMSGSIZE; goto bad; } /* * transmit packet without fragmentation */ if (dontfrag || (!alwaysfrag && tlen <= mtu)) { /* case 1-a and 2-a */ struct in6_ifaddr *ia6; ip6 = mtod(m, struct ip6_hdr *); ia6 = in6_ifawithifp(ifp, &ip6->ip6_src); if (ia6) { /* Record statistics for this interface address. */ counter_u64_add(ia6->ia_ifa.ifa_opackets, 1); counter_u64_add(ia6->ia_ifa.ifa_obytes, m->m_pkthdr.len); ifa_free(&ia6->ia_ifa); } #ifdef RATELIMIT if (inp != NULL) { if (inp->inp_flags2 & INP_RATE_LIMIT_CHANGED) in_pcboutput_txrtlmt(inp, ifp, m); /* stamp send tag on mbuf */ m->m_pkthdr.snd_tag = inp->inp_snd_tag; } else { m->m_pkthdr.snd_tag = NULL; } #endif error = nd6_output_ifp(ifp, origifp, m, dst, (struct route *)ro); #ifdef RATELIMIT /* check for route change */ if (error == EAGAIN) in_pcboutput_eagain(inp); #endif goto done; } /* * try to fragment the packet. case 1-b and 3 */ if (mtu < IPV6_MMTU) { /* path MTU cannot be less than IPV6_MMTU */ error = EMSGSIZE; in6_ifstat_inc(ifp, ifs6_out_fragfail); goto bad; } else if (ip6->ip6_plen == 0) { /* jumbo payload cannot be fragmented */ error = EMSGSIZE; in6_ifstat_inc(ifp, ifs6_out_fragfail); goto bad; } else { u_char nextproto; /* * Too large for the destination or interface; * fragment if possible. * Must be able to put at least 8 bytes per fragment. */ hlen = unfragpartlen; if (mtu > IPV6_MAXPACKET) mtu = IPV6_MAXPACKET; len = (mtu - hlen - sizeof(struct ip6_frag)) & ~7; if (len < 8) { error = EMSGSIZE; in6_ifstat_inc(ifp, ifs6_out_fragfail); goto bad; } /* * If the interface will not calculate checksums on * fragmented packets, then do it here. * XXX-BZ handle the hw offloading case. Need flags. */ if (m->m_pkthdr.csum_flags & CSUM_DELAY_DATA_IPV6) { in6_delayed_cksum(m, plen, hlen); m->m_pkthdr.csum_flags &= ~CSUM_DELAY_DATA_IPV6; } #ifdef SCTP if (m->m_pkthdr.csum_flags & CSUM_SCTP_IPV6) { sctp_delayed_cksum(m, hlen); m->m_pkthdr.csum_flags &= ~CSUM_SCTP_IPV6; } #endif /* * Change the next header field of the last header in the * unfragmentable part. 
*/ if (exthdrs.ip6e_rthdr) { nextproto = *mtod(exthdrs.ip6e_rthdr, u_char *); *mtod(exthdrs.ip6e_rthdr, u_char *) = IPPROTO_FRAGMENT; } else if (exthdrs.ip6e_dest1) { nextproto = *mtod(exthdrs.ip6e_dest1, u_char *); *mtod(exthdrs.ip6e_dest1, u_char *) = IPPROTO_FRAGMENT; } else if (exthdrs.ip6e_hbh) { nextproto = *mtod(exthdrs.ip6e_hbh, u_char *); *mtod(exthdrs.ip6e_hbh, u_char *) = IPPROTO_FRAGMENT; } else { nextproto = ip6->ip6_nxt; ip6->ip6_nxt = IPPROTO_FRAGMENT; } /* * Loop through length of segment after first fragment, * make new header and copy data of each part and link onto * chain. */ m0 = m; id = htonl(ip6_randomid()); if ((error = ip6_fragment(ifp, m, hlen, nextproto, len, id))) goto sendorfree; in6_ifstat_inc(ifp, ifs6_out_fragok); } /* * Remove leading garbages. */ sendorfree: m = m0->m_nextpkt; m0->m_nextpkt = 0; m_freem(m0); for (; m; m = m0) { m0 = m->m_nextpkt; m->m_nextpkt = 0; if (error == 0) { /* Record statistics for this interface address. */ if (ia) { counter_u64_add(ia->ia_ifa.ifa_opackets, 1); counter_u64_add(ia->ia_ifa.ifa_obytes, m->m_pkthdr.len); } #ifdef RATELIMIT if (inp != NULL) { if (inp->inp_flags2 & INP_RATE_LIMIT_CHANGED) in_pcboutput_txrtlmt(inp, ifp, m); /* stamp send tag on mbuf */ m->m_pkthdr.snd_tag = inp->inp_snd_tag; } else { m->m_pkthdr.snd_tag = NULL; } #endif error = nd6_output_ifp(ifp, origifp, m, dst, (struct route *)ro); #ifdef RATELIMIT /* check for route change */ if (error == EAGAIN) in_pcboutput_eagain(inp); #endif } else m_freem(m); } if (error == 0) IP6STAT_INC(ip6s_fragmented); done: if (ro == &ip6route) RO_RTFREE(ro); return (error); freehdrs: m_freem(exthdrs.ip6e_hbh); /* m_freem will check if mbuf is 0 */ m_freem(exthdrs.ip6e_dest1); m_freem(exthdrs.ip6e_rthdr); m_freem(exthdrs.ip6e_dest2); /* FALLTHROUGH */ bad: if (m) m_freem(m); goto done; } static int ip6_copyexthdr(struct mbuf **mp, caddr_t hdr, int hlen) { struct mbuf *m; if (hlen > MCLBYTES) return (ENOBUFS); /* XXX */ if (hlen > MLEN) m = m_getcl(M_NOWAIT, MT_DATA, 0); else m = m_get(M_NOWAIT, MT_DATA); if (m == NULL) return (ENOBUFS); m->m_len = hlen; if (hdr) bcopy(hdr, mtod(m, caddr_t), hlen); *mp = m; return (0); } /* * Insert jumbo payload option. */ static int ip6_insert_jumboopt(struct ip6_exthdrs *exthdrs, u_int32_t plen) { struct mbuf *mopt; u_char *optbuf; u_int32_t v; #define JUMBOOPTLEN 8 /* length of jumbo payload option and padding */ /* * If there is no hop-by-hop options header, allocate new one. * If there is one but it doesn't have enough space to store the * jumbo payload option, allocate a cluster to store the whole options. * Otherwise, use it to store the options. */ if (exthdrs->ip6e_hbh == NULL) { mopt = m_get(M_NOWAIT, MT_DATA); if (mopt == NULL) return (ENOBUFS); mopt->m_len = JUMBOOPTLEN; optbuf = mtod(mopt, u_char *); optbuf[1] = 0; /* = ((JUMBOOPTLEN) >> 3) - 1 */ exthdrs->ip6e_hbh = mopt; } else { struct ip6_hbh *hbh; mopt = exthdrs->ip6e_hbh; if (M_TRAILINGSPACE(mopt) < JUMBOOPTLEN) { /* * XXX assumption: * - exthdrs->ip6e_hbh is not referenced from places * other than exthdrs. * - exthdrs->ip6e_hbh is not an mbuf chain. */ int oldoptlen = mopt->m_len; struct mbuf *n; /* * XXX: give up if the whole (new) hbh header does * not fit even in an mbuf cluster. */ if (oldoptlen + JUMBOOPTLEN > MCLBYTES) return (ENOBUFS); /* * As a consequence, we must always prepare a cluster * at this point. 
*/ n = m_getcl(M_NOWAIT, MT_DATA, 0); if (n == NULL) return (ENOBUFS); n->m_len = oldoptlen + JUMBOOPTLEN; bcopy(mtod(mopt, caddr_t), mtod(n, caddr_t), oldoptlen); optbuf = mtod(n, caddr_t) + oldoptlen; m_freem(mopt); mopt = exthdrs->ip6e_hbh = n; } else { optbuf = mtod(mopt, u_char *) + mopt->m_len; mopt->m_len += JUMBOOPTLEN; } optbuf[0] = IP6OPT_PADN; optbuf[1] = 1; /* * Adjust the header length according to the pad and * the jumbo payload option. */ hbh = mtod(mopt, struct ip6_hbh *); hbh->ip6h_len += (JUMBOOPTLEN >> 3); } /* fill in the option. */ optbuf[2] = IP6OPT_JUMBO; optbuf[3] = 4; v = (u_int32_t)htonl(plen + JUMBOOPTLEN); bcopy(&v, &optbuf[4], sizeof(u_int32_t)); /* finally, adjust the packet header length */ exthdrs->ip6e_ip6->m_pkthdr.len += JUMBOOPTLEN; return (0); #undef JUMBOOPTLEN } /* * Insert fragment header and copy unfragmentable header portions. */ static int ip6_insertfraghdr(struct mbuf *m0, struct mbuf *m, int hlen, struct ip6_frag **frghdrp) { struct mbuf *n, *mlast; if (hlen > sizeof(struct ip6_hdr)) { n = m_copym(m0, sizeof(struct ip6_hdr), hlen - sizeof(struct ip6_hdr), M_NOWAIT); if (n == NULL) return (ENOBUFS); m->m_next = n; } else n = m; /* Search for the last mbuf of unfragmentable part. */ for (mlast = n; mlast->m_next; mlast = mlast->m_next) ; if (M_WRITABLE(mlast) && M_TRAILINGSPACE(mlast) >= sizeof(struct ip6_frag)) { /* use the trailing space of the last mbuf for the fragment hdr */ *frghdrp = (struct ip6_frag *)(mtod(mlast, caddr_t) + mlast->m_len); mlast->m_len += sizeof(struct ip6_frag); m->m_pkthdr.len += sizeof(struct ip6_frag); } else { /* allocate a new mbuf for the fragment header */ struct mbuf *mfrg; mfrg = m_get(M_NOWAIT, MT_DATA); if (mfrg == NULL) return (ENOBUFS); mfrg->m_len = sizeof(struct ip6_frag); *frghdrp = mtod(mfrg, struct ip6_frag *); mlast->m_next = mfrg; } return (0); } /* * Calculates IPv6 path mtu for destination @dst. * Resulting MTU is stored in @mtup. * * Returns 0 on success. */ static int ip6_getpmtu_ctl(u_int fibnum, const struct in6_addr *dst, u_long *mtup) { struct nhop6_extended nh6; struct in6_addr kdst; uint32_t scopeid; struct ifnet *ifp; u_long mtu; int error; in6_splitscope(dst, &kdst, &scopeid); if (fib6_lookup_nh_ext(fibnum, &kdst, scopeid, NHR_REF, 0, &nh6) != 0) return (EHOSTUNREACH); ifp = nh6.nh_ifp; mtu = nh6.nh_mtu; error = ip6_calcmtu(ifp, dst, mtu, mtup, NULL, 0); fib6_free_nh_ext(fibnum, &nh6); return (error); } /* * Calculates IPv6 path MTU for @dst based on transmit @ifp, * and cached data in @ro_pmtu. * MTU from (successful) route lookup is saved (along with dst) * inside @ro_pmtu to avoid subsequent route lookups after packet * filter processing. * * Stores mtu and always-frag value into @mtup and @alwaysfragp. * Returns 0 on success. */ static int ip6_getpmtu(struct route_in6 *ro_pmtu, int do_lookup, struct ifnet *ifp, const struct in6_addr *dst, u_long *mtup, int *alwaysfragp, u_int fibnum, u_int proto) { struct nhop6_basic nh6; struct in6_addr kdst; uint32_t scopeid; struct sockaddr_in6 *sa6_dst; u_long mtu; mtu = 0; if (do_lookup) { /* * Here ro_pmtu has final destination address, while * ro might represent immediate destination. * Use ro_pmtu destination since mtu might differ. 
*/ sa6_dst = (struct sockaddr_in6 *)&ro_pmtu->ro_dst; if (!IN6_ARE_ADDR_EQUAL(&sa6_dst->sin6_addr, dst)) ro_pmtu->ro_mtu = 0; if (ro_pmtu->ro_mtu == 0) { bzero(sa6_dst, sizeof(*sa6_dst)); sa6_dst->sin6_family = AF_INET6; sa6_dst->sin6_len = sizeof(struct sockaddr_in6); sa6_dst->sin6_addr = *dst; in6_splitscope(dst, &kdst, &scopeid); if (fib6_lookup_nh_basic(fibnum, &kdst, scopeid, 0, 0, &nh6) == 0) ro_pmtu->ro_mtu = nh6.nh_mtu; } mtu = ro_pmtu->ro_mtu; } if (ro_pmtu->ro_rt) mtu = ro_pmtu->ro_rt->rt_mtu; return (ip6_calcmtu(ifp, dst, mtu, mtup, alwaysfragp, proto)); } /* * Calculate MTU based on transmit @ifp, route mtu @rt_mtu and * hostcache data for @dst. * Stores mtu and always-frag value into @mtup and @alwaysfragp. * * Returns 0 on success. */ static int ip6_calcmtu(struct ifnet *ifp, const struct in6_addr *dst, u_long rt_mtu, u_long *mtup, int *alwaysfragp, u_int proto) { u_long mtu = 0; int alwaysfrag = 0; int error = 0; if (rt_mtu > 0) { u_int32_t ifmtu; struct in_conninfo inc; bzero(&inc, sizeof(inc)); inc.inc_flags |= INC_ISIPV6; inc.inc6_faddr = *dst; ifmtu = IN6_LINKMTU(ifp); /* TCP is known to react to pmtu changes so skip hc */ if (proto != IPPROTO_TCP) mtu = tcp_hc_getmtu(&inc); if (mtu) mtu = min(mtu, rt_mtu); else mtu = rt_mtu; if (mtu == 0) mtu = ifmtu; else if (mtu < IPV6_MMTU) { /* * RFC2460 section 5, last paragraph: * if we record ICMPv6 too big message with * mtu < IPV6_MMTU, transmit packets sized IPV6_MMTU * or smaller, with framgent header attached. * (fragment header is needed regardless from the * packet size, for translators to identify packets) */ alwaysfrag = 1; mtu = IPV6_MMTU; } } else if (ifp) { mtu = IN6_LINKMTU(ifp); } else error = EHOSTUNREACH; /* XXX */ *mtup = mtu; if (alwaysfragp) *alwaysfragp = alwaysfrag; return (error); } /* * IP6 socket option processing. */ int ip6_ctloutput(struct socket *so, struct sockopt *sopt) { int optdatalen, uproto; void *optdata; struct inpcb *in6p = sotoinpcb(so); int error, optval; int level, op, optname; int optlen; struct thread *td; #ifdef RSS uint32_t rss_bucket; int retval; #endif /* * Don't use more than a quarter of mbuf clusters. N.B.: * nmbclusters is an int, but nmbclusters * MCLBYTES may overflow * on LP64 architectures, so cast to u_long to avoid undefined * behavior. ILP32 architectures cannot have nmbclusters * large enough to overflow for other reasons. 
*/ #define IPV6_PKTOPTIONS_MBUF_LIMIT ((u_long)nmbclusters * MCLBYTES / 4) level = sopt->sopt_level; op = sopt->sopt_dir; optname = sopt->sopt_name; optlen = sopt->sopt_valsize; td = sopt->sopt_td; error = 0; optval = 0; uproto = (int)so->so_proto->pr_protocol; if (level != IPPROTO_IPV6) { error = EINVAL; if (sopt->sopt_level == SOL_SOCKET && sopt->sopt_dir == SOPT_SET) { switch (sopt->sopt_name) { case SO_REUSEADDR: INP_WLOCK(in6p); if ((so->so_options & SO_REUSEADDR) != 0) in6p->inp_flags2 |= INP_REUSEADDR; else in6p->inp_flags2 &= ~INP_REUSEADDR; INP_WUNLOCK(in6p); error = 0; break; case SO_REUSEPORT: INP_WLOCK(in6p); if ((so->so_options & SO_REUSEPORT) != 0) in6p->inp_flags2 |= INP_REUSEPORT; else in6p->inp_flags2 &= ~INP_REUSEPORT; INP_WUNLOCK(in6p); error = 0; break; case SO_REUSEPORT_LB: INP_WLOCK(in6p); if ((so->so_options & SO_REUSEPORT_LB) != 0) in6p->inp_flags2 |= INP_REUSEPORT_LB; else in6p->inp_flags2 &= ~INP_REUSEPORT_LB; INP_WUNLOCK(in6p); error = 0; break; case SO_SETFIB: INP_WLOCK(in6p); in6p->inp_inc.inc_fibnum = so->so_fibnum; INP_WUNLOCK(in6p); error = 0; break; case SO_MAX_PACING_RATE: #ifdef RATELIMIT INP_WLOCK(in6p); in6p->inp_flags2 |= INP_RATE_LIMIT_CHANGED; INP_WUNLOCK(in6p); error = 0; #else error = EOPNOTSUPP; #endif break; default: break; } } } else { /* level == IPPROTO_IPV6 */ switch (op) { case SOPT_SET: switch (optname) { case IPV6_2292PKTOPTIONS: #ifdef IPV6_PKTOPTIONS case IPV6_PKTOPTIONS: #endif { struct mbuf *m; if (optlen > IPV6_PKTOPTIONS_MBUF_LIMIT) { printf("ip6_ctloutput: mbuf limit hit\n"); error = ENOBUFS; break; } error = soopt_getm(sopt, &m); /* XXX */ if (error != 0) break; error = soopt_mcopyin(sopt, m); /* XXX */ if (error != 0) break; error = ip6_pcbopts(&in6p->in6p_outputopts, m, so, sopt); m_freem(m); /* XXX */ break; } /* * Use of some Hop-by-Hop options or some * Destination options, might require special * privilege. That is, normal applications * (without special privilege) might be forbidden * from setting certain options in outgoing packets, * and might never see certain options in received * packets. [RFC 2292 Section 6] * KAME specific note: * KAME prevents non-privileged users from sending or * receiving ANY hbh/dst options in order to avoid * overhead of parsing options in the kernel. 
*/ case IPV6_RECVHOPOPTS: case IPV6_RECVDSTOPTS: case IPV6_RECVRTHDRDSTOPTS: if (td != NULL) { error = priv_check(td, PRIV_NETINET_SETHDROPTS); if (error) break; } /* FALLTHROUGH */ case IPV6_UNICAST_HOPS: case IPV6_HOPLIMIT: case IPV6_RECVPKTINFO: case IPV6_RECVHOPLIMIT: case IPV6_RECVRTHDR: case IPV6_RECVPATHMTU: case IPV6_RECVTCLASS: case IPV6_RECVFLOWID: #ifdef RSS case IPV6_RECVRSSBUCKETID: #endif case IPV6_V6ONLY: case IPV6_AUTOFLOWLABEL: case IPV6_ORIGDSTADDR: case IPV6_BINDANY: case IPV6_BINDMULTI: #ifdef RSS case IPV6_RSS_LISTEN_BUCKET: #endif if (optname == IPV6_BINDANY && td != NULL) { error = priv_check(td, PRIV_NETINET_BINDANY); if (error) break; } if (optlen != sizeof(int)) { error = EINVAL; break; } error = sooptcopyin(sopt, &optval, sizeof optval, sizeof optval); if (error) break; switch (optname) { case IPV6_UNICAST_HOPS: if (optval < -1 || optval >= 256) error = EINVAL; else { /* -1 = kernel default */ in6p->in6p_hops = optval; if ((in6p->inp_vflag & INP_IPV4) != 0) in6p->inp_ip_ttl = optval; } break; #define OPTSET(bit) \ do { \ INP_WLOCK(in6p); \ if (optval) \ in6p->inp_flags |= (bit); \ else \ in6p->inp_flags &= ~(bit); \ INP_WUNLOCK(in6p); \ } while (/*CONSTCOND*/ 0) #define OPTSET2292(bit) \ do { \ INP_WLOCK(in6p); \ in6p->inp_flags |= IN6P_RFC2292; \ if (optval) \ in6p->inp_flags |= (bit); \ else \ in6p->inp_flags &= ~(bit); \ INP_WUNLOCK(in6p); \ } while (/*CONSTCOND*/ 0) #define OPTBIT(bit) (in6p->inp_flags & (bit) ? 1 : 0) #define OPTSET2_N(bit, val) do { \ if (val) \ in6p->inp_flags2 |= bit; \ else \ in6p->inp_flags2 &= ~bit; \ } while (0) #define OPTSET2(bit, val) do { \ INP_WLOCK(in6p); \ OPTSET2_N(bit, val); \ INP_WUNLOCK(in6p); \ } while (0) #define OPTBIT2(bit) (in6p->inp_flags2 & (bit) ? 1 : 0) #define OPTSET2292_EXCLUSIVE(bit) \ do { \ INP_WLOCK(in6p); \ if (OPTBIT(IN6P_RFC2292)) { \ error = EINVAL; \ } else { \ if (optval) \ in6p->inp_flags |= (bit); \ else \ in6p->inp_flags &= ~(bit); \ } \ INP_WUNLOCK(in6p); \ } while (/*CONSTCOND*/ 0) case IPV6_RECVPKTINFO: OPTSET2292_EXCLUSIVE(IN6P_PKTINFO); break; case IPV6_HOPLIMIT: { struct ip6_pktopts **optp; /* cannot mix with RFC2292 */ if (OPTBIT(IN6P_RFC2292)) { error = EINVAL; break; } INP_WLOCK(in6p); if (in6p->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) { INP_WUNLOCK(in6p); return (ECONNRESET); } optp = &in6p->in6p_outputopts; error = ip6_pcbopt(IPV6_HOPLIMIT, (u_char *)&optval, sizeof(optval), optp, (td != NULL) ? td->td_ucred : NULL, uproto); INP_WUNLOCK(in6p); break; } case IPV6_RECVHOPLIMIT: OPTSET2292_EXCLUSIVE(IN6P_HOPLIMIT); break; case IPV6_RECVHOPOPTS: OPTSET2292_EXCLUSIVE(IN6P_HOPOPTS); break; case IPV6_RECVDSTOPTS: OPTSET2292_EXCLUSIVE(IN6P_DSTOPTS); break; case IPV6_RECVRTHDRDSTOPTS: OPTSET2292_EXCLUSIVE(IN6P_RTHDRDSTOPTS); break; case IPV6_RECVRTHDR: OPTSET2292_EXCLUSIVE(IN6P_RTHDR); break; case IPV6_RECVPATHMTU: /* * We ignore this option for TCP * sockets. * (RFC3542 leaves this case * unspecified.) */ if (uproto != IPPROTO_TCP) OPTSET(IN6P_MTU); break; case IPV6_RECVFLOWID: OPTSET2(INP_RECVFLOWID, optval); break; #ifdef RSS case IPV6_RECVRSSBUCKETID: OPTSET2(INP_RECVRSSBUCKETID, optval); break; #endif case IPV6_V6ONLY: /* * make setsockopt(IPV6_V6ONLY) * available only prior to bind(2). * see ipng mailing list, Jun 22 2001. 
*/ if (in6p->inp_lport || !IN6_IS_ADDR_UNSPECIFIED(&in6p->in6p_laddr)) { error = EINVAL; break; } OPTSET(IN6P_IPV6_V6ONLY); if (optval) in6p->inp_vflag &= ~INP_IPV4; else in6p->inp_vflag |= INP_IPV4; break; case IPV6_RECVTCLASS: /* cannot mix with RFC2292 XXX */ OPTSET2292_EXCLUSIVE(IN6P_TCLASS); break; case IPV6_AUTOFLOWLABEL: OPTSET(IN6P_AUTOFLOWLABEL); break; case IPV6_ORIGDSTADDR: OPTSET2(INP_ORIGDSTADDR, optval); break; case IPV6_BINDANY: OPTSET(INP_BINDANY); break; case IPV6_BINDMULTI: OPTSET2(INP_BINDMULTI, optval); break; #ifdef RSS case IPV6_RSS_LISTEN_BUCKET: if ((optval >= 0) && (optval < rss_getnumbuckets())) { INP_WLOCK(in6p); in6p->inp_rss_listen_bucket = optval; OPTSET2_N(INP_RSS_BUCKET_SET, 1); INP_WUNLOCK(in6p); } else { error = EINVAL; } break; #endif } break; case IPV6_TCLASS: case IPV6_DONTFRAG: case IPV6_USE_MIN_MTU: case IPV6_PREFER_TEMPADDR: if (optlen != sizeof(optval)) { error = EINVAL; break; } error = sooptcopyin(sopt, &optval, sizeof optval, sizeof optval); if (error) break; { struct ip6_pktopts **optp; INP_WLOCK(in6p); if (in6p->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) { INP_WUNLOCK(in6p); return (ECONNRESET); } optp = &in6p->in6p_outputopts; error = ip6_pcbopt(optname, (u_char *)&optval, sizeof(optval), optp, (td != NULL) ? td->td_ucred : NULL, uproto); INP_WUNLOCK(in6p); break; } case IPV6_2292PKTINFO: case IPV6_2292HOPLIMIT: case IPV6_2292HOPOPTS: case IPV6_2292DSTOPTS: case IPV6_2292RTHDR: /* RFC 2292 */ if (optlen != sizeof(int)) { error = EINVAL; break; } error = sooptcopyin(sopt, &optval, sizeof optval, sizeof optval); if (error) break; switch (optname) { case IPV6_2292PKTINFO: OPTSET2292(IN6P_PKTINFO); break; case IPV6_2292HOPLIMIT: OPTSET2292(IN6P_HOPLIMIT); break; case IPV6_2292HOPOPTS: /* * Check super-user privilege. * See comments for IPV6_RECVHOPOPTS. */ if (td != NULL) { error = priv_check(td, PRIV_NETINET_SETHDROPTS); if (error) return (error); } OPTSET2292(IN6P_HOPOPTS); break; case IPV6_2292DSTOPTS: if (td != NULL) { error = priv_check(td, PRIV_NETINET_SETHDROPTS); if (error) return (error); } OPTSET2292(IN6P_DSTOPTS|IN6P_RTHDRDSTOPTS); /* XXX */ break; case IPV6_2292RTHDR: OPTSET2292(IN6P_RTHDR); break; } break; case IPV6_PKTINFO: case IPV6_HOPOPTS: case IPV6_RTHDR: case IPV6_DSTOPTS: case IPV6_RTHDRDSTOPTS: case IPV6_NEXTHOP: { /* new advanced API (RFC3542) */ u_char *optbuf; u_char optbuf_storage[MCLBYTES]; int optlen; struct ip6_pktopts **optp; /* cannot mix with RFC2292 */ if (OPTBIT(IN6P_RFC2292)) { error = EINVAL; break; } /* * We only ensure valsize is not too large * here. Further validation will be done * later. */ error = sooptcopyin(sopt, optbuf_storage, sizeof(optbuf_storage), 0); if (error) break; optlen = sopt->sopt_valsize; optbuf = optbuf_storage; INP_WLOCK(in6p); if (in6p->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) { INP_WUNLOCK(in6p); return (ECONNRESET); } optp = &in6p->in6p_outputopts; error = ip6_pcbopt(optname, optbuf, optlen, optp, (td != NULL) ? 
td->td_ucred : NULL, uproto); INP_WUNLOCK(in6p); break; } #undef OPTSET case IPV6_MULTICAST_IF: case IPV6_MULTICAST_HOPS: case IPV6_MULTICAST_LOOP: case IPV6_JOIN_GROUP: case IPV6_LEAVE_GROUP: case IPV6_MSFILTER: case MCAST_BLOCK_SOURCE: case MCAST_UNBLOCK_SOURCE: case MCAST_JOIN_GROUP: case MCAST_LEAVE_GROUP: case MCAST_JOIN_SOURCE_GROUP: case MCAST_LEAVE_SOURCE_GROUP: error = ip6_setmoptions(in6p, sopt); break; case IPV6_PORTRANGE: error = sooptcopyin(sopt, &optval, sizeof optval, sizeof optval); if (error) break; INP_WLOCK(in6p); switch (optval) { case IPV6_PORTRANGE_DEFAULT: in6p->inp_flags &= ~(INP_LOWPORT); in6p->inp_flags &= ~(INP_HIGHPORT); break; case IPV6_PORTRANGE_HIGH: in6p->inp_flags &= ~(INP_LOWPORT); in6p->inp_flags |= INP_HIGHPORT; break; case IPV6_PORTRANGE_LOW: in6p->inp_flags &= ~(INP_HIGHPORT); in6p->inp_flags |= INP_LOWPORT; break; default: error = EINVAL; break; } INP_WUNLOCK(in6p); break; #if defined(IPSEC) || defined(IPSEC_SUPPORT) case IPV6_IPSEC_POLICY: if (IPSEC_ENABLED(ipv6)) { error = IPSEC_PCBCTL(ipv6, in6p, sopt); break; } /* FALLTHROUGH */ #endif /* IPSEC */ default: error = ENOPROTOOPT; break; } break; case SOPT_GET: switch (optname) { case IPV6_2292PKTOPTIONS: #ifdef IPV6_PKTOPTIONS case IPV6_PKTOPTIONS: #endif /* * RFC3542 (effectively) deprecated the * semantics of the 2292-style pktoptions. * Since it was not reliable in nature (i.e., * applications had to expect the lack of some * information after all), it would make sense * to simplify this part by always returning * empty data. */ sopt->sopt_valsize = 0; break; case IPV6_RECVHOPOPTS: case IPV6_RECVDSTOPTS: case IPV6_RECVRTHDRDSTOPTS: case IPV6_UNICAST_HOPS: case IPV6_RECVPKTINFO: case IPV6_RECVHOPLIMIT: case IPV6_RECVRTHDR: case IPV6_RECVPATHMTU: case IPV6_V6ONLY: case IPV6_PORTRANGE: case IPV6_RECVTCLASS: case IPV6_AUTOFLOWLABEL: case IPV6_BINDANY: case IPV6_FLOWID: case IPV6_FLOWTYPE: case IPV6_RECVFLOWID: #ifdef RSS case IPV6_RSSBUCKETID: case IPV6_RECVRSSBUCKETID: #endif case IPV6_BINDMULTI: switch (optname) { case IPV6_RECVHOPOPTS: optval = OPTBIT(IN6P_HOPOPTS); break; case IPV6_RECVDSTOPTS: optval = OPTBIT(IN6P_DSTOPTS); break; case IPV6_RECVRTHDRDSTOPTS: optval = OPTBIT(IN6P_RTHDRDSTOPTS); break; case IPV6_UNICAST_HOPS: optval = in6p->in6p_hops; break; case IPV6_RECVPKTINFO: optval = OPTBIT(IN6P_PKTINFO); break; case IPV6_RECVHOPLIMIT: optval = OPTBIT(IN6P_HOPLIMIT); break; case IPV6_RECVRTHDR: optval = OPTBIT(IN6P_RTHDR); break; case IPV6_RECVPATHMTU: optval = OPTBIT(IN6P_MTU); break; case IPV6_V6ONLY: optval = OPTBIT(IN6P_IPV6_V6ONLY); break; case IPV6_PORTRANGE: { int flags; flags = in6p->inp_flags; if (flags & INP_HIGHPORT) optval = IPV6_PORTRANGE_HIGH; else if (flags & INP_LOWPORT) optval = IPV6_PORTRANGE_LOW; else optval = 0; break; } case IPV6_RECVTCLASS: optval = OPTBIT(IN6P_TCLASS); break; case IPV6_AUTOFLOWLABEL: optval = OPTBIT(IN6P_AUTOFLOWLABEL); break; case IPV6_ORIGDSTADDR: optval = OPTBIT2(INP_ORIGDSTADDR); break; case IPV6_BINDANY: optval = OPTBIT(INP_BINDANY); break; case IPV6_FLOWID: optval = in6p->inp_flowid; break; case IPV6_FLOWTYPE: optval = in6p->inp_flowtype; break; case IPV6_RECVFLOWID: optval = OPTBIT2(INP_RECVFLOWID); break; #ifdef RSS case IPV6_RSSBUCKETID: retval = rss_hash2bucket(in6p->inp_flowid, in6p->inp_flowtype, &rss_bucket); if (retval == 0) optval = rss_bucket; else error = EINVAL; break; case IPV6_RECVRSSBUCKETID: optval = OPTBIT2(INP_RECVRSSBUCKETID); break; #endif case IPV6_BINDMULTI: optval = OPTBIT2(INP_BINDMULTI); break; } if (error) break; error = 
sooptcopyout(sopt, &optval, sizeof optval); break; case IPV6_PATHMTU: { u_long pmtu = 0; struct ip6_mtuinfo mtuinfo; struct in6_addr addr; if (!(so->so_state & SS_ISCONNECTED)) return (ENOTCONN); /* * XXX: we dot not consider the case of source * routing, or optional information to specify * the outgoing interface. * Copy faddr out of in6p to avoid holding lock * on inp during route lookup. */ INP_RLOCK(in6p); bcopy(&in6p->in6p_faddr, &addr, sizeof(addr)); INP_RUNLOCK(in6p); error = ip6_getpmtu_ctl(so->so_fibnum, &addr, &pmtu); if (error) break; if (pmtu > IPV6_MAXPACKET) pmtu = IPV6_MAXPACKET; bzero(&mtuinfo, sizeof(mtuinfo)); mtuinfo.ip6m_mtu = (u_int32_t)pmtu; optdata = (void *)&mtuinfo; optdatalen = sizeof(mtuinfo); error = sooptcopyout(sopt, optdata, optdatalen); break; } case IPV6_2292PKTINFO: case IPV6_2292HOPLIMIT: case IPV6_2292HOPOPTS: case IPV6_2292RTHDR: case IPV6_2292DSTOPTS: switch (optname) { case IPV6_2292PKTINFO: optval = OPTBIT(IN6P_PKTINFO); break; case IPV6_2292HOPLIMIT: optval = OPTBIT(IN6P_HOPLIMIT); break; case IPV6_2292HOPOPTS: optval = OPTBIT(IN6P_HOPOPTS); break; case IPV6_2292RTHDR: optval = OPTBIT(IN6P_RTHDR); break; case IPV6_2292DSTOPTS: optval = OPTBIT(IN6P_DSTOPTS|IN6P_RTHDRDSTOPTS); break; } error = sooptcopyout(sopt, &optval, sizeof optval); break; case IPV6_PKTINFO: case IPV6_HOPOPTS: case IPV6_RTHDR: case IPV6_DSTOPTS: case IPV6_RTHDRDSTOPTS: case IPV6_NEXTHOP: case IPV6_TCLASS: case IPV6_DONTFRAG: case IPV6_USE_MIN_MTU: case IPV6_PREFER_TEMPADDR: error = ip6_getpcbopt(in6p, optname, sopt); break; case IPV6_MULTICAST_IF: case IPV6_MULTICAST_HOPS: case IPV6_MULTICAST_LOOP: case IPV6_MSFILTER: error = ip6_getmoptions(in6p, sopt); break; #if defined(IPSEC) || defined(IPSEC_SUPPORT) case IPV6_IPSEC_POLICY: if (IPSEC_ENABLED(ipv6)) { error = IPSEC_PCBCTL(ipv6, in6p, sopt); break; } /* FALLTHROUGH */ #endif /* IPSEC */ default: error = ENOPROTOOPT; break; } break; } } return (error); } int ip6_raw_ctloutput(struct socket *so, struct sockopt *sopt) { int error = 0, optval, optlen; const int icmp6off = offsetof(struct icmp6_hdr, icmp6_cksum); struct inpcb *in6p = sotoinpcb(so); int level, op, optname; level = sopt->sopt_level; op = sopt->sopt_dir; optname = sopt->sopt_name; optlen = sopt->sopt_valsize; if (level != IPPROTO_IPV6) { return (EINVAL); } switch (optname) { case IPV6_CHECKSUM: /* * For ICMPv6 sockets, no modification allowed for checksum * offset, permit "no change" values to help existing apps. * * RFC3542 says: "An attempt to set IPV6_CHECKSUM * for an ICMPv6 socket will fail." * The current behavior does not meet RFC3542. */ switch (op) { case SOPT_SET: if (optlen != sizeof(int)) { error = EINVAL; break; } error = sooptcopyin(sopt, &optval, sizeof(optval), sizeof(optval)); if (error) break; if (optval < -1 || (optval % 2) != 0) { /* * The API assumes non-negative even offset * values or -1 as a special value. */ error = EINVAL; } else if (so->so_proto->pr_protocol == IPPROTO_ICMPV6) { if (optval != icmp6off) error = EINVAL; } else in6p->in6p_cksum = optval; break; case SOPT_GET: if (so->so_proto->pr_protocol == IPPROTO_ICMPV6) optval = icmp6off; else optval = in6p->in6p_cksum; error = sooptcopyout(sopt, &optval, sizeof(optval)); break; default: error = EINVAL; break; } break; default: error = ENOPROTOOPT; break; } return (error); } /* * Set up IP6 options in pcb for insertion in output packets or * specifying behavior of outgoing packets. 
*/ static int ip6_pcbopts(struct ip6_pktopts **pktopt, struct mbuf *m, struct socket *so, struct sockopt *sopt) { struct ip6_pktopts *opt = *pktopt; int error = 0; struct thread *td = sopt->sopt_td; /* turn off any old options. */ if (opt) { #ifdef DIAGNOSTIC if (opt->ip6po_pktinfo || opt->ip6po_nexthop || opt->ip6po_hbh || opt->ip6po_dest1 || opt->ip6po_dest2 || opt->ip6po_rhinfo.ip6po_rhi_rthdr) printf("ip6_pcbopts: all specified options are cleared.\n"); #endif ip6_clearpktopts(opt, -1); } else opt = malloc(sizeof(*opt), M_IP6OPT, M_WAITOK); *pktopt = NULL; if (!m || m->m_len == 0) { /* * Only turning off any previous options, regardless of * whether the opt is just created or given. */ free(opt, M_IP6OPT); return (0); } /* set options specified by user. */ if ((error = ip6_setpktopts(m, opt, NULL, (td != NULL) ? td->td_ucred : NULL, so->so_proto->pr_protocol)) != 0) { ip6_clearpktopts(opt, -1); /* XXX: discard all options */ free(opt, M_IP6OPT); return (error); } *pktopt = opt; return (0); } /* * initialize ip6_pktopts. beware that there are non-zero default values in * the struct. */ void ip6_initpktopts(struct ip6_pktopts *opt) { bzero(opt, sizeof(*opt)); opt->ip6po_hlim = -1; /* -1 means default hop limit */ opt->ip6po_tclass = -1; /* -1 means default traffic class */ opt->ip6po_minmtu = IP6PO_MINMTU_MCASTONLY; opt->ip6po_prefer_tempaddr = IP6PO_TEMPADDR_SYSTEM; } static int ip6_pcbopt(int optname, u_char *buf, int len, struct ip6_pktopts **pktopt, struct ucred *cred, int uproto) { struct ip6_pktopts *opt; if (*pktopt == NULL) { *pktopt = malloc(sizeof(struct ip6_pktopts), M_IP6OPT, M_NOWAIT); if (*pktopt == NULL) return (ENOBUFS); ip6_initpktopts(*pktopt); } opt = *pktopt; return (ip6_setpktopt(optname, buf, len, opt, cred, 1, 0, uproto)); } #define GET_PKTOPT_VAR(field, lenexpr) do { \ if (pktopt && pktopt->field) { \ INP_RUNLOCK(in6p); \ optdata = malloc(sopt->sopt_valsize, M_TEMP, M_WAITOK); \ malloc_optdata = true; \ INP_RLOCK(in6p); \ if (in6p->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) { \ INP_RUNLOCK(in6p); \ free(optdata, M_TEMP); \ return (ECONNRESET); \ } \ pktopt = in6p->in6p_outputopts; \ if (pktopt && pktopt->field) { \ optdatalen = min(lenexpr, sopt->sopt_valsize); \ bcopy(&pktopt->field, optdata, optdatalen); \ } else { \ free(optdata, M_TEMP); \ optdata = NULL; \ malloc_optdata = false; \ } \ } \ } while(0) #define GET_PKTOPT_EXT_HDR(field) GET_PKTOPT_VAR(field, \ (((struct ip6_ext *)pktopt->field)->ip6e_len + 1) << 3) #define GET_PKTOPT_SOCKADDR(field) GET_PKTOPT_VAR(field, \ pktopt->field->sa_len) static int ip6_getpcbopt(struct inpcb *in6p, int optname, struct sockopt *sopt) { void *optdata = NULL; bool malloc_optdata = false; int optdatalen = 0; int error = 0; struct in6_pktinfo null_pktinfo; int deftclass = 0, on; int defminmtu = IP6PO_MINMTU_MCASTONLY; int defpreftemp = IP6PO_TEMPADDR_SYSTEM; struct ip6_pktopts *pktopt; INP_RLOCK(in6p); pktopt = in6p->in6p_outputopts; switch (optname) { case IPV6_PKTINFO: optdata = (void *)&null_pktinfo; if (pktopt && pktopt->ip6po_pktinfo) { bcopy(pktopt->ip6po_pktinfo, &null_pktinfo, sizeof(null_pktinfo)); in6_clearscope(&null_pktinfo.ipi6_addr); } else { /* XXX: we don't have to do this every time... 
*/ bzero(&null_pktinfo, sizeof(null_pktinfo)); } optdatalen = sizeof(struct in6_pktinfo); break; case IPV6_TCLASS: if (pktopt && pktopt->ip6po_tclass >= 0) deftclass = pktopt->ip6po_tclass; optdata = (void *)&deftclass; optdatalen = sizeof(int); break; case IPV6_HOPOPTS: GET_PKTOPT_EXT_HDR(ip6po_hbh); break; case IPV6_RTHDR: GET_PKTOPT_EXT_HDR(ip6po_rthdr); break; case IPV6_RTHDRDSTOPTS: GET_PKTOPT_EXT_HDR(ip6po_dest1); break; case IPV6_DSTOPTS: GET_PKTOPT_EXT_HDR(ip6po_dest2); break; case IPV6_NEXTHOP: GET_PKTOPT_SOCKADDR(ip6po_nexthop); break; case IPV6_USE_MIN_MTU: if (pktopt) defminmtu = pktopt->ip6po_minmtu; optdata = (void *)&defminmtu; optdatalen = sizeof(int); break; case IPV6_DONTFRAG: if (pktopt && ((pktopt->ip6po_flags) & IP6PO_DONTFRAG)) on = 1; else on = 0; optdata = (void *)&on; optdatalen = sizeof(on); break; case IPV6_PREFER_TEMPADDR: if (pktopt) defpreftemp = pktopt->ip6po_prefer_tempaddr; optdata = (void *)&defpreftemp; optdatalen = sizeof(int); break; default: /* should not happen */ #ifdef DIAGNOSTIC panic("ip6_getpcbopt: unexpected option\n"); #endif INP_RUNLOCK(in6p); return (ENOPROTOOPT); } INP_RUNLOCK(in6p); error = sooptcopyout(sopt, optdata, optdatalen); if (malloc_optdata) free(optdata, M_TEMP); return (error); } void ip6_clearpktopts(struct ip6_pktopts *pktopt, int optname) { if (pktopt == NULL) return; if (optname == -1 || optname == IPV6_PKTINFO) { if (pktopt->ip6po_pktinfo) free(pktopt->ip6po_pktinfo, M_IP6OPT); pktopt->ip6po_pktinfo = NULL; } if (optname == -1 || optname == IPV6_HOPLIMIT) pktopt->ip6po_hlim = -1; if (optname == -1 || optname == IPV6_TCLASS) pktopt->ip6po_tclass = -1; if (optname == -1 || optname == IPV6_NEXTHOP) { if (pktopt->ip6po_nextroute.ro_rt) { RTFREE(pktopt->ip6po_nextroute.ro_rt); pktopt->ip6po_nextroute.ro_rt = NULL; } if (pktopt->ip6po_nexthop) free(pktopt->ip6po_nexthop, M_IP6OPT); pktopt->ip6po_nexthop = NULL; } if (optname == -1 || optname == IPV6_HOPOPTS) { if (pktopt->ip6po_hbh) free(pktopt->ip6po_hbh, M_IP6OPT); pktopt->ip6po_hbh = NULL; } if (optname == -1 || optname == IPV6_RTHDRDSTOPTS) { if (pktopt->ip6po_dest1) free(pktopt->ip6po_dest1, M_IP6OPT); pktopt->ip6po_dest1 = NULL; } if (optname == -1 || optname == IPV6_RTHDR) { if (pktopt->ip6po_rhinfo.ip6po_rhi_rthdr) free(pktopt->ip6po_rhinfo.ip6po_rhi_rthdr, M_IP6OPT); pktopt->ip6po_rhinfo.ip6po_rhi_rthdr = NULL; if (pktopt->ip6po_route.ro_rt) { RTFREE(pktopt->ip6po_route.ro_rt); pktopt->ip6po_route.ro_rt = NULL; } } if (optname == -1 || optname == IPV6_DSTOPTS) { if (pktopt->ip6po_dest2) free(pktopt->ip6po_dest2, M_IP6OPT); pktopt->ip6po_dest2 = NULL; } } #define PKTOPT_EXTHDRCPY(type) \ do {\ if (src->type) {\ int hlen = (((struct ip6_ext *)src->type)->ip6e_len + 1) << 3;\ dst->type = malloc(hlen, M_IP6OPT, canwait);\ if (dst->type == NULL)\ goto bad;\ bcopy(src->type, dst->type, hlen);\ }\ } while (/*CONSTCOND*/ 0) static int copypktopts(struct ip6_pktopts *dst, struct ip6_pktopts *src, int canwait) { if (dst == NULL || src == NULL) { printf("ip6_clearpktopts: invalid argument\n"); return (EINVAL); } dst->ip6po_hlim = src->ip6po_hlim; dst->ip6po_tclass = src->ip6po_tclass; dst->ip6po_flags = src->ip6po_flags; dst->ip6po_minmtu = src->ip6po_minmtu; dst->ip6po_prefer_tempaddr = src->ip6po_prefer_tempaddr; if (src->ip6po_pktinfo) { dst->ip6po_pktinfo = malloc(sizeof(*dst->ip6po_pktinfo), M_IP6OPT, canwait); if (dst->ip6po_pktinfo == NULL) goto bad; *dst->ip6po_pktinfo = *src->ip6po_pktinfo; } if (src->ip6po_nexthop) { dst->ip6po_nexthop = malloc(src->ip6po_nexthop->sa_len, 
M_IP6OPT, canwait); if (dst->ip6po_nexthop == NULL) goto bad; bcopy(src->ip6po_nexthop, dst->ip6po_nexthop, src->ip6po_nexthop->sa_len); } PKTOPT_EXTHDRCPY(ip6po_hbh); PKTOPT_EXTHDRCPY(ip6po_dest1); PKTOPT_EXTHDRCPY(ip6po_dest2); PKTOPT_EXTHDRCPY(ip6po_rthdr); /* not copy the cached route */ return (0); bad: ip6_clearpktopts(dst, -1); return (ENOBUFS); } #undef PKTOPT_EXTHDRCPY struct ip6_pktopts * ip6_copypktopts(struct ip6_pktopts *src, int canwait) { int error; struct ip6_pktopts *dst; dst = malloc(sizeof(*dst), M_IP6OPT, canwait); if (dst == NULL) return (NULL); ip6_initpktopts(dst); if ((error = copypktopts(dst, src, canwait)) != 0) { free(dst, M_IP6OPT); return (NULL); } return (dst); } void ip6_freepcbopts(struct ip6_pktopts *pktopt) { if (pktopt == NULL) return; ip6_clearpktopts(pktopt, -1); free(pktopt, M_IP6OPT); } /* * Set IPv6 outgoing packet options based on advanced API. */ int ip6_setpktopts(struct mbuf *control, struct ip6_pktopts *opt, struct ip6_pktopts *stickyopt, struct ucred *cred, int uproto) { struct cmsghdr *cm = NULL; if (control == NULL || opt == NULL) return (EINVAL); ip6_initpktopts(opt); if (stickyopt) { int error; /* * If stickyopt is provided, make a local copy of the options * for this particular packet, then override them by ancillary * objects. * XXX: copypktopts() does not copy the cached route to a next * hop (if any). This is not very good in terms of efficiency, * but we can allow this since this option should be rarely * used. */ if ((error = copypktopts(opt, stickyopt, M_NOWAIT)) != 0) return (error); } /* * XXX: Currently, we assume all the optional information is stored * in a single mbuf. */ if (control->m_next) return (EINVAL); for (; control->m_len > 0; control->m_data += CMSG_ALIGN(cm->cmsg_len), control->m_len -= CMSG_ALIGN(cm->cmsg_len)) { int error; if (control->m_len < CMSG_LEN(0)) return (EINVAL); cm = mtod(control, struct cmsghdr *); if (cm->cmsg_len == 0 || cm->cmsg_len > control->m_len) return (EINVAL); if (cm->cmsg_level != IPPROTO_IPV6) continue; error = ip6_setpktopt(cm->cmsg_type, CMSG_DATA(cm), cm->cmsg_len - CMSG_LEN(0), opt, cred, 0, 1, uproto); if (error) return (error); } return (0); } /* * Set a particular packet option, as a sticky option or an ancillary data * item. "len" can be 0 only when it's a sticky option. * We have 4 cases of combination of "sticky" and "cmsg": * "sticky=0, cmsg=0": impossible * "sticky=0, cmsg=1": RFC2292 or RFC3542 ancillary data * "sticky=1, cmsg=0": RFC3542 socket option * "sticky=1, cmsg=1": RFC2292 socket option */ static int ip6_setpktopt(int optname, u_char *buf, int len, struct ip6_pktopts *opt, struct ucred *cred, int sticky, int cmsg, int uproto) { int minmtupolicy, preftemp; int error; if (!sticky && !cmsg) { #ifdef DIAGNOSTIC printf("ip6_setpktopt: impossible case\n"); #endif return (EINVAL); } /* * IPV6_2292xxx is for backward compatibility to RFC2292, and should * not be specified in the context of RFC3542. Conversely, * RFC3542 types should not be specified in the context of RFC2292. 
*/ if (!cmsg) { switch (optname) { case IPV6_2292PKTINFO: case IPV6_2292HOPLIMIT: case IPV6_2292NEXTHOP: case IPV6_2292HOPOPTS: case IPV6_2292DSTOPTS: case IPV6_2292RTHDR: case IPV6_2292PKTOPTIONS: return (ENOPROTOOPT); } } if (sticky && cmsg) { switch (optname) { case IPV6_PKTINFO: case IPV6_HOPLIMIT: case IPV6_NEXTHOP: case IPV6_HOPOPTS: case IPV6_DSTOPTS: case IPV6_RTHDRDSTOPTS: case IPV6_RTHDR: case IPV6_USE_MIN_MTU: case IPV6_DONTFRAG: case IPV6_TCLASS: case IPV6_PREFER_TEMPADDR: /* XXX: not an RFC3542 option */ return (ENOPROTOOPT); } } switch (optname) { case IPV6_2292PKTINFO: case IPV6_PKTINFO: { struct ifnet *ifp = NULL; struct in6_pktinfo *pktinfo; if (len != sizeof(struct in6_pktinfo)) return (EINVAL); pktinfo = (struct in6_pktinfo *)buf; /* * An application can clear any sticky IPV6_PKTINFO option by * doing a "regular" setsockopt with ipi6_addr being * in6addr_any and ipi6_ifindex being zero. * [RFC 3542, Section 6] */ if (optname == IPV6_PKTINFO && opt->ip6po_pktinfo && pktinfo->ipi6_ifindex == 0 && IN6_IS_ADDR_UNSPECIFIED(&pktinfo->ipi6_addr)) { ip6_clearpktopts(opt, optname); break; } if (uproto == IPPROTO_TCP && optname == IPV6_PKTINFO && sticky && !IN6_IS_ADDR_UNSPECIFIED(&pktinfo->ipi6_addr)) { return (EINVAL); } if (IN6_IS_ADDR_MULTICAST(&pktinfo->ipi6_addr)) return (EINVAL); /* validate the interface index if specified. */ if (pktinfo->ipi6_ifindex > V_if_index) return (ENXIO); if (pktinfo->ipi6_ifindex) { ifp = ifnet_byindex(pktinfo->ipi6_ifindex); if (ifp == NULL) return (ENXIO); } if (ifp != NULL && (ifp->if_afdata[AF_INET6] == NULL || (ND_IFINFO(ifp)->flags & ND6_IFF_IFDISABLED) != 0)) return (ENETDOWN); if (ifp != NULL && !IN6_IS_ADDR_UNSPECIFIED(&pktinfo->ipi6_addr)) { struct in6_ifaddr *ia; in6_setscope(&pktinfo->ipi6_addr, ifp, NULL); ia = in6ifa_ifpwithaddr(ifp, &pktinfo->ipi6_addr); if (ia == NULL) return (EADDRNOTAVAIL); ifa_free(&ia->ia_ifa); } /* * We store the address anyway, and let in6_selectsrc() * validate the specified address. This is because ipi6_addr * may not have enough information about its scope zone, and * we may need additional information (such as outgoing * interface or the scope zone of a destination address) to * disambiguate the scope. * XXX: the delay of the validation may confuse the * application when it is used as a sticky option. */ if (opt->ip6po_pktinfo == NULL) { opt->ip6po_pktinfo = malloc(sizeof(*pktinfo), M_IP6OPT, M_NOWAIT); if (opt->ip6po_pktinfo == NULL) return (ENOBUFS); } bcopy(pktinfo, opt->ip6po_pktinfo, sizeof(*pktinfo)); break; } case IPV6_2292HOPLIMIT: case IPV6_HOPLIMIT: { int *hlimp; /* * RFC 3542 deprecated the usage of sticky IPV6_HOPLIMIT * to simplify the ordering among hoplimit options. 
*/ if (optname == IPV6_HOPLIMIT && sticky) return (ENOPROTOOPT); if (len != sizeof(int)) return (EINVAL); hlimp = (int *)buf; if (*hlimp < -1 || *hlimp > 255) return (EINVAL); opt->ip6po_hlim = *hlimp; break; } case IPV6_TCLASS: { int tclass; if (len != sizeof(int)) return (EINVAL); tclass = *(int *)buf; if (tclass < -1 || tclass > 255) return (EINVAL); opt->ip6po_tclass = tclass; break; } case IPV6_2292NEXTHOP: case IPV6_NEXTHOP: if (cred != NULL) { error = priv_check_cred(cred, PRIV_NETINET_SETHDROPTS); if (error) return (error); } if (len == 0) { /* just remove the option */ ip6_clearpktopts(opt, IPV6_NEXTHOP); break; } /* check if cmsg_len is large enough for sa_len */ if (len < sizeof(struct sockaddr) || len < *buf) return (EINVAL); switch (((struct sockaddr *)buf)->sa_family) { case AF_INET6: { struct sockaddr_in6 *sa6 = (struct sockaddr_in6 *)buf; int error; if (sa6->sin6_len != sizeof(struct sockaddr_in6)) return (EINVAL); if (IN6_IS_ADDR_UNSPECIFIED(&sa6->sin6_addr) || IN6_IS_ADDR_MULTICAST(&sa6->sin6_addr)) { return (EINVAL); } if ((error = sa6_embedscope(sa6, V_ip6_use_defzone)) != 0) { return (error); } break; } case AF_LINK: /* should eventually be supported */ default: return (EAFNOSUPPORT); } /* turn off the previous option, then set the new option. */ ip6_clearpktopts(opt, IPV6_NEXTHOP); opt->ip6po_nexthop = malloc(*buf, M_IP6OPT, M_NOWAIT); if (opt->ip6po_nexthop == NULL) return (ENOBUFS); bcopy(buf, opt->ip6po_nexthop, *buf); break; case IPV6_2292HOPOPTS: case IPV6_HOPOPTS: { struct ip6_hbh *hbh; int hbhlen; /* * XXX: We don't allow a non-privileged user to set ANY HbH * options, since per-option restriction has too much * overhead. */ if (cred != NULL) { error = priv_check_cred(cred, PRIV_NETINET_SETHDROPTS); if (error) return (error); } if (len == 0) { ip6_clearpktopts(opt, IPV6_HOPOPTS); break; /* just remove the option */ } /* message length validation */ if (len < sizeof(struct ip6_hbh)) return (EINVAL); hbh = (struct ip6_hbh *)buf; hbhlen = (hbh->ip6h_len + 1) << 3; if (len != hbhlen) return (EINVAL); /* turn off the previous option, then set the new option. */ ip6_clearpktopts(opt, IPV6_HOPOPTS); opt->ip6po_hbh = malloc(hbhlen, M_IP6OPT, M_NOWAIT); if (opt->ip6po_hbh == NULL) return (ENOBUFS); bcopy(hbh, opt->ip6po_hbh, hbhlen); break; } case IPV6_2292DSTOPTS: case IPV6_DSTOPTS: case IPV6_RTHDRDSTOPTS: { struct ip6_dest *dest, **newdest = NULL; int destlen; if (cred != NULL) { /* XXX: see the comment for IPV6_HOPOPTS */ error = priv_check_cred(cred, PRIV_NETINET_SETHDROPTS); if (error) return (error); } if (len == 0) { ip6_clearpktopts(opt, optname); break; /* just remove the option */ } /* message length validation */ if (len < sizeof(struct ip6_dest)) return (EINVAL); dest = (struct ip6_dest *)buf; destlen = (dest->ip6d_len + 1) << 3; if (len != destlen) return (EINVAL); /* * Determine the position that the destination options header * should be inserted; before or after the routing header. */ switch (optname) { case IPV6_2292DSTOPTS: /* * The old advacned API is ambiguous on this point. * Our approach is to determine the position based * according to the existence of a routing header. * Note, however, that this depends on the order of the * extension headers in the ancillary data; the 1st * part of the destination options header must appear * before the routing header in the ancillary data, * too. * RFC3542 solved the ambiguity by introducing * separate ancillary data or option types. 
*/ if (opt->ip6po_rthdr == NULL) newdest = &opt->ip6po_dest1; else newdest = &opt->ip6po_dest2; break; case IPV6_RTHDRDSTOPTS: newdest = &opt->ip6po_dest1; break; case IPV6_DSTOPTS: newdest = &opt->ip6po_dest2; break; } /* turn off the previous option, then set the new option. */ ip6_clearpktopts(opt, optname); *newdest = malloc(destlen, M_IP6OPT, M_NOWAIT); if (*newdest == NULL) return (ENOBUFS); bcopy(dest, *newdest, destlen); break; } case IPV6_2292RTHDR: case IPV6_RTHDR: { struct ip6_rthdr *rth; int rthlen; if (len == 0) { ip6_clearpktopts(opt, IPV6_RTHDR); break; /* just remove the option */ } /* message length validation */ if (len < sizeof(struct ip6_rthdr)) return (EINVAL); rth = (struct ip6_rthdr *)buf; rthlen = (rth->ip6r_len + 1) << 3; if (len != rthlen) return (EINVAL); switch (rth->ip6r_type) { case IPV6_RTHDR_TYPE_0: if (rth->ip6r_len == 0) /* must contain one addr */ return (EINVAL); if (rth->ip6r_len % 2) /* length must be even */ return (EINVAL); if (rth->ip6r_len / 2 != rth->ip6r_segleft) return (EINVAL); break; default: return (EINVAL); /* not supported */ } /* turn off the previous option */ ip6_clearpktopts(opt, IPV6_RTHDR); opt->ip6po_rthdr = malloc(rthlen, M_IP6OPT, M_NOWAIT); if (opt->ip6po_rthdr == NULL) return (ENOBUFS); bcopy(rth, opt->ip6po_rthdr, rthlen); break; } case IPV6_USE_MIN_MTU: if (len != sizeof(int)) return (EINVAL); minmtupolicy = *(int *)buf; if (minmtupolicy != IP6PO_MINMTU_MCASTONLY && minmtupolicy != IP6PO_MINMTU_DISABLE && minmtupolicy != IP6PO_MINMTU_ALL) { return (EINVAL); } opt->ip6po_minmtu = minmtupolicy; break; case IPV6_DONTFRAG: if (len != sizeof(int)) return (EINVAL); if (uproto == IPPROTO_TCP || *(int *)buf == 0) { /* * we ignore this option for TCP sockets. * (RFC3542 leaves this case unspecified.) */ opt->ip6po_flags &= ~IP6PO_DONTFRAG; } else opt->ip6po_flags |= IP6PO_DONTFRAG; break; case IPV6_PREFER_TEMPADDR: if (len != sizeof(int)) return (EINVAL); preftemp = *(int *)buf; if (preftemp != IP6PO_TEMPADDR_SYSTEM && preftemp != IP6PO_TEMPADDR_NOTPREFER && preftemp != IP6PO_TEMPADDR_PREFER) { return (EINVAL); } opt->ip6po_prefer_tempaddr = preftemp; break; default: return (ENOPROTOOPT); } /* end of switch */ return (0); } /* * Routine called from ip6_output() to loop back a copy of an IP6 multicast * packet to the input queue of a specified interface. Note that this * calls the output routine of the loopback "driver", but with an interface * pointer that might NOT be &loif -- easier than replicating that code here. */ void ip6_mloopback(struct ifnet *ifp, struct mbuf *m) { struct mbuf *copym; struct ip6_hdr *ip6; copym = m_copym(m, 0, M_COPYALL, M_NOWAIT); if (copym == NULL) return; /* * Make sure to deep-copy IPv6 header portion in case the data * is in an mbuf cluster, so that we can safely override the IPv6 * header portion later. */ if (!M_WRITABLE(copym) || copym->m_len < sizeof(struct ip6_hdr)) { copym = m_pullup(copym, sizeof(struct ip6_hdr)); if (copym == NULL) return; } ip6 = mtod(copym, struct ip6_hdr *); /* * clear embedded scope identifiers if necessary. * in6_clearscope will touch the addresses only when necessary. */ in6_clearscope(&ip6->ip6_src); in6_clearscope(&ip6->ip6_dst); if (copym->m_pkthdr.csum_flags & CSUM_DELAY_DATA_IPV6) { copym->m_pkthdr.csum_flags |= CSUM_DATA_VALID_IPV6 | CSUM_PSEUDO_HDR; copym->m_pkthdr.csum_data = 0xffff; } if_simloop(ifp, copym, AF_INET6, 0); } /* * Chop IPv6 header off from the payload. 
*/ static int ip6_splithdr(struct mbuf *m, struct ip6_exthdrs *exthdrs) { struct mbuf *mh; struct ip6_hdr *ip6; ip6 = mtod(m, struct ip6_hdr *); if (m->m_len > sizeof(*ip6)) { mh = m_gethdr(M_NOWAIT, MT_DATA); if (mh == NULL) { m_freem(m); return ENOBUFS; } m_move_pkthdr(mh, m); M_ALIGN(mh, sizeof(*ip6)); m->m_len -= sizeof(*ip6); m->m_data += sizeof(*ip6); mh->m_next = m; m = mh; m->m_len = sizeof(*ip6); bcopy((caddr_t)ip6, mtod(m, caddr_t), sizeof(*ip6)); } exthdrs->ip6e_ip6 = m; return 0; } /* * Compute IPv6 extension header length. */ int ip6_optlen(struct inpcb *in6p) { int len; if (!in6p->in6p_outputopts) return 0; len = 0; #define elen(x) \ (((struct ip6_ext *)(x)) ? (((struct ip6_ext *)(x))->ip6e_len + 1) << 3 : 0) len += elen(in6p->in6p_outputopts->ip6po_hbh); if (in6p->in6p_outputopts->ip6po_rthdr) /* dest1 is valid with rthdr only */ len += elen(in6p->in6p_outputopts->ip6po_dest1); len += elen(in6p->in6p_outputopts->ip6po_rthdr); len += elen(in6p->in6p_outputopts->ip6po_dest2); return len; #undef elen } Index: user/ngie/bug-237403/sys/netpfil/ipfw/ip_fw2.c =================================================================== --- user/ngie/bug-237403/sys/netpfil/ipfw/ip_fw2.c (revision 346925) +++ user/ngie/bug-237403/sys/netpfil/ipfw/ip_fw2.c (revision 346926) @@ -1,3489 +1,3491 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2002-2009 Luigi Rizzo, Universita` di Pisa * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); /* * The FreeBSD IP packet firewall, main file */ #include "opt_ipfw.h" #include "opt_ipdivert.h" #include "opt_inet.h" #ifndef INET #error "IPFIREWALL requires INET" #endif /* INET */ #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* for ETHERTYPE_IP */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef INET6 #include #include #include #include #endif #include /* for struct grehdr */ #include #include /* XXX for in_cksum */ #ifdef MAC #include #endif /* * static variables followed by global ones. * All ipfw global variables are here. */ VNET_DEFINE_STATIC(int, fw_deny_unknown_exthdrs); #define V_fw_deny_unknown_exthdrs VNET(fw_deny_unknown_exthdrs) VNET_DEFINE_STATIC(int, fw_permit_single_frag6) = 1; #define V_fw_permit_single_frag6 VNET(fw_permit_single_frag6) #ifdef IPFIREWALL_DEFAULT_TO_ACCEPT static int default_to_accept = 1; #else static int default_to_accept; #endif VNET_DEFINE(int, autoinc_step); VNET_DEFINE(int, fw_one_pass) = 1; VNET_DEFINE(unsigned int, fw_tables_max); VNET_DEFINE(unsigned int, fw_tables_sets) = 0; /* Don't use set-aware tables */ /* Use 128 tables by default */ static unsigned int default_fw_tables = IPFW_TABLES_DEFAULT; #ifndef LINEAR_SKIPTO static int jump_fast(struct ip_fw_chain *chain, struct ip_fw *f, int num, int tablearg, int jump_backwards); #define JUMP(ch, f, num, targ, back) jump_fast(ch, f, num, targ, back) #else static int jump_linear(struct ip_fw_chain *chain, struct ip_fw *f, int num, int tablearg, int jump_backwards); #define JUMP(ch, f, num, targ, back) jump_linear(ch, f, num, targ, back) #endif /* * Each rule belongs to one of 32 different sets (0..31). * The variable set_disable contains one bit per set. * If the bit is set, all rules in the corresponding set * are disabled. Set RESVD_SET(31) is reserved for the default rule * and rules that are not deleted by the flush command, * and CANNOT be disabled. * Rules in set RESVD_SET can only be deleted individually. */ VNET_DEFINE(u_int32_t, set_disable); #define V_set_disable VNET(set_disable) VNET_DEFINE(int, fw_verbose); /* counter for ipfw_log(NULL...) 
*/ VNET_DEFINE(u_int64_t, norule_counter); VNET_DEFINE(int, verbose_limit); /* layer3_chain contains the list of rules for layer 3 */ VNET_DEFINE(struct ip_fw_chain, layer3_chain); /* ipfw_vnet_ready controls when we are open for business */ VNET_DEFINE(int, ipfw_vnet_ready) = 0; VNET_DEFINE(int, ipfw_nat_ready) = 0; ipfw_nat_t *ipfw_nat_ptr = NULL; struct cfg_nat *(*lookup_nat_ptr)(struct nat_list *, int); ipfw_nat_cfg_t *ipfw_nat_cfg_ptr; ipfw_nat_cfg_t *ipfw_nat_del_ptr; ipfw_nat_cfg_t *ipfw_nat_get_cfg_ptr; ipfw_nat_cfg_t *ipfw_nat_get_log_ptr; #ifdef SYSCTL_NODE uint32_t dummy_def = IPFW_DEFAULT_RULE; static int sysctl_ipfw_table_num(SYSCTL_HANDLER_ARGS); static int sysctl_ipfw_tables_sets(SYSCTL_HANDLER_ARGS); SYSBEGIN(f3) SYSCTL_NODE(_net_inet_ip, OID_AUTO, fw, CTLFLAG_RW, 0, "Firewall"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, one_pass, CTLFLAG_VNET | CTLFLAG_RW | CTLFLAG_SECURE3, &VNET_NAME(fw_one_pass), 0, "Only do a single pass through ipfw when using dummynet(4)"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, autoinc_step, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(autoinc_step), 0, "Rule number auto-increment step"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, verbose, CTLFLAG_VNET | CTLFLAG_RW | CTLFLAG_SECURE3, &VNET_NAME(fw_verbose), 0, "Log matches to ipfw rules"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, verbose_limit, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(verbose_limit), 0, "Set upper limit of matches of ipfw rules logged"); SYSCTL_UINT(_net_inet_ip_fw, OID_AUTO, default_rule, CTLFLAG_RD, &dummy_def, 0, "The default/max possible rule number."); SYSCTL_PROC(_net_inet_ip_fw, OID_AUTO, tables_max, CTLFLAG_VNET | CTLTYPE_UINT | CTLFLAG_RW, 0, 0, sysctl_ipfw_table_num, "IU", "Maximum number of concurrently used tables"); SYSCTL_PROC(_net_inet_ip_fw, OID_AUTO, tables_sets, CTLFLAG_VNET | CTLTYPE_UINT | CTLFLAG_RW, 0, 0, sysctl_ipfw_tables_sets, "IU", "Use per-set namespace for tables"); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, default_to_accept, CTLFLAG_RDTUN, &default_to_accept, 0, "Make the default rule accept all packets."); TUNABLE_INT("net.inet.ip.fw.tables_max", (int *)&default_fw_tables); SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, static_count, CTLFLAG_VNET | CTLFLAG_RD, &VNET_NAME(layer3_chain.n_rules), 0, "Number of static rules"); #ifdef INET6 SYSCTL_DECL(_net_inet6_ip6); SYSCTL_NODE(_net_inet6_ip6, OID_AUTO, fw, CTLFLAG_RW, 0, "Firewall"); SYSCTL_INT(_net_inet6_ip6_fw, OID_AUTO, deny_unknown_exthdrs, CTLFLAG_VNET | CTLFLAG_RW | CTLFLAG_SECURE, &VNET_NAME(fw_deny_unknown_exthdrs), 0, "Deny packets with unknown IPv6 Extension Headers"); SYSCTL_INT(_net_inet6_ip6_fw, OID_AUTO, permit_single_frag6, CTLFLAG_VNET | CTLFLAG_RW | CTLFLAG_SECURE, &VNET_NAME(fw_permit_single_frag6), 0, "Permit single packet IPv6 fragments"); #endif /* INET6 */ SYSEND #endif /* SYSCTL_NODE */ /* * Some macros used in the various matching options. * L3HDR maps an ipv4 pointer into a layer3 header pointer of type T * Other macros just cast void * into the appropriate type */ #define L3HDR(T, ip) ((T *)((u_int32_t *)(ip) + (ip)->ip_hl)) #define TCP(p) ((struct tcphdr *)(p)) #define SCTP(p) ((struct sctphdr *)(p)) #define UDP(p) ((struct udphdr *)(p)) #define ICMP(p) ((struct icmphdr *)(p)) #define ICMP6(p) ((struct icmp6_hdr *)(p)) static __inline int icmptype_match(struct icmphdr *icmp, ipfw_insn_u32 *cmd) { int type = icmp->icmp_type; return (type <= ICMP_MAXTYPE && (cmd->d[0] & (1<<type)) ); } #define TT ( (1 << ICMP_ECHO) | (1 << ICMP_ROUTERSOLICIT) | \ (1 << ICMP_TSTAMP) | (1 << ICMP_IREQ) | (1 << ICMP_MASKREQ) ) static int is_icmp_query(struct icmphdr *icmp) { int type = icmp->icmp_type; return (type <= ICMP_MAXTYPE && (TT & (1<<type)) ); } #undef TT /* * The following checks use two arrays of 8 or 16 bits to store the * bits that we want set or clear, respectively. They are in the * low and high half of cmd->arg1 or cmd->d[0]. * * We scan options and store the bits we find set.
We succeed if * * (want_set & ~bits) == 0 && (want_clear & ~bits) == want_clear * * The code is sometimes optimized not to store additional variables. */ static int flags_match(ipfw_insn *cmd, u_int8_t bits) { u_char want_clear; bits = ~bits; if ( ((cmd->arg1 & 0xff) & bits) != 0) return 0; /* some bits we want set were clear */ want_clear = (cmd->arg1 >> 8) & 0xff; if ( (want_clear & bits) != want_clear) return 0; /* some bits we want clear were set */ return 1; } static int ipopts_match(struct ip *ip, ipfw_insn *cmd) { int optlen, bits = 0; u_char *cp = (u_char *)(ip + 1); int x = (ip->ip_hl << 2) - sizeof (struct ip); for (; x > 0; x -= optlen, cp += optlen) { int opt = cp[IPOPT_OPTVAL]; if (opt == IPOPT_EOL) break; if (opt == IPOPT_NOP) optlen = 1; else { optlen = cp[IPOPT_OLEN]; if (optlen <= 0 || optlen > x) return 0; /* invalid or truncated */ } switch (opt) { default: break; case IPOPT_LSRR: bits |= IP_FW_IPOPT_LSRR; break; case IPOPT_SSRR: bits |= IP_FW_IPOPT_SSRR; break; case IPOPT_RR: bits |= IP_FW_IPOPT_RR; break; case IPOPT_TS: bits |= IP_FW_IPOPT_TS; break; } } return (flags_match(cmd, bits)); } static int tcpopts_match(struct tcphdr *tcp, ipfw_insn *cmd) { int optlen, bits = 0; u_char *cp = (u_char *)(tcp + 1); int x = (tcp->th_off << 2) - sizeof(struct tcphdr); for (; x > 0; x -= optlen, cp += optlen) { int opt = cp[0]; if (opt == TCPOPT_EOL) break; if (opt == TCPOPT_NOP) optlen = 1; else { optlen = cp[1]; if (optlen <= 0) break; } switch (opt) { default: break; case TCPOPT_MAXSEG: bits |= IP_FW_TCPOPT_MSS; break; case TCPOPT_WINDOW: bits |= IP_FW_TCPOPT_WINDOW; break; case TCPOPT_SACK_PERMITTED: case TCPOPT_SACK: bits |= IP_FW_TCPOPT_SACK; break; case TCPOPT_TIMESTAMP: bits |= IP_FW_TCPOPT_TS; break; } } return (flags_match(cmd, bits)); } static int iface_match(struct ifnet *ifp, ipfw_insn_if *cmd, struct ip_fw_chain *chain, uint32_t *tablearg) { if (ifp == NULL) /* no iface with this packet, match fails */ return (0); /* Check by name or by IP address */ if (cmd->name[0] != '\0') { /* match by name */ if (cmd->name[0] == '\1') /* use tablearg to match */ return ipfw_lookup_table(chain, cmd->p.kidx, 0, &ifp->if_index, tablearg); /* Check name */ if (cmd->p.glob) { if (fnmatch(cmd->name, ifp->if_xname, 0) == 0) return(1); } else { if (strncmp(ifp->if_xname, cmd->name, IFNAMSIZ) == 0) return(1); } } else { #if !defined(USERSPACE) && defined(__FreeBSD__) /* and OSX too ? */ struct ifaddr *ia; if_addr_rlock(ifp); CK_STAILQ_FOREACH(ia, &ifp->if_addrhead, ifa_link) { if (ia->ifa_addr->sa_family != AF_INET) continue; if (cmd->p.ip.s_addr == ((struct sockaddr_in *) (ia->ifa_addr))->sin_addr.s_addr) { if_addr_runlock(ifp); return(1); /* match */ } } if_addr_runlock(ifp); #endif /* __FreeBSD__ */ } return(0); /* no match, fail ... */ } /* * The verify_path function checks if a route to the src exists and * if it is reachable via ifp (when provided). * * The 'verrevpath' option checks that the interface that an IP packet * arrives on is the same interface that traffic destined for the * packet's source address would be routed out of. * The 'versrcreach' option just checks that the source address is * reachable via any route (except default) in the routing table. * These two are a measure to block forged packets. This is also * commonly known as "anti-spoofing" or Unicast Reverse Path * Forwarding (Unicast RFP) in Cisco-ese. 
The name of the knobs * is purposely reminiscent of the Cisco IOS command, * * ip verify unicast reverse-path * ip verify unicast source reachable-via any * * which implements the same functionality. But note that the syntax * is misleading, and the check may be performed on all IP packets * whether unicast, multicast, or broadcast. */ static int verify_path(struct in_addr src, struct ifnet *ifp, u_int fib) { #if defined(USERSPACE) || !defined(__FreeBSD__) return 0; #else struct nhop4_basic nh4; if (fib4_lookup_nh_basic(fib, src, NHR_IFAIF, 0, &nh4) != 0) return (0); /* * If ifp is provided, check for equality with rtentry. * We should use rt->rt_ifa->ifa_ifp, instead of rt->rt_ifp, * in order to pass packets injected back by if_simloop(): * routing entry (via lo0) for our own address * may exist, so we need to handle routing assymetry. */ if (ifp != NULL && ifp != nh4.nh_ifp) return (0); /* if no ifp provided, check if rtentry is not default route */ if (ifp == NULL && (nh4.nh_flags & NHF_DEFAULT) != 0) return (0); /* or if this is a blackhole/reject route */ if (ifp == NULL && (nh4.nh_flags & (NHF_REJECT|NHF_BLACKHOLE)) != 0) return (0); /* found valid route */ return 1; #endif /* __FreeBSD__ */ } /* * Generate an SCTP packet containing an ABORT chunk. The verification tag * is given by vtag. The T-bit is set in the ABORT chunk if and only if * reflected is not 0. */ static struct mbuf * ipfw_send_abort(struct mbuf *replyto, struct ipfw_flow_id *id, u_int32_t vtag, int reflected) { struct mbuf *m; struct ip *ip; #ifdef INET6 struct ip6_hdr *ip6; #endif struct sctphdr *sctp; struct sctp_chunkhdr *chunk; u_int16_t hlen, plen, tlen; MGETHDR(m, M_NOWAIT, MT_DATA); if (m == NULL) return (NULL); M_SETFIB(m, id->fib); #ifdef MAC if (replyto != NULL) mac_netinet_firewall_reply(replyto, m); else mac_netinet_firewall_send(m); #else (void)replyto; /* don't warn about unused arg */ #endif switch (id->addr_type) { case 4: hlen = sizeof(struct ip); break; #ifdef INET6 case 6: hlen = sizeof(struct ip6_hdr); break; #endif default: /* XXX: log me?!? */ FREE_PKT(m); return (NULL); } plen = sizeof(struct sctphdr) + sizeof(struct sctp_chunkhdr); tlen = hlen + plen; m->m_data += max_linkhdr; m->m_flags |= M_SKIP_FIREWALL; m->m_pkthdr.len = m->m_len = tlen; m->m_pkthdr.rcvif = NULL; bzero(m->m_data, tlen); switch (id->addr_type) { case 4: ip = mtod(m, struct ip *); ip->ip_v = 4; ip->ip_hl = sizeof(struct ip) >> 2; ip->ip_tos = IPTOS_LOWDELAY; ip->ip_len = htons(tlen); ip->ip_id = htons(0); ip->ip_off = htons(0); ip->ip_ttl = V_ip_defttl; ip->ip_p = IPPROTO_SCTP; ip->ip_sum = 0; ip->ip_src.s_addr = htonl(id->dst_ip); ip->ip_dst.s_addr = htonl(id->src_ip); sctp = (struct sctphdr *)(ip + 1); break; #ifdef INET6 case 6: ip6 = mtod(m, struct ip6_hdr *); ip6->ip6_vfc = IPV6_VERSION; ip6->ip6_plen = htons(plen); ip6->ip6_nxt = IPPROTO_SCTP; ip6->ip6_hlim = IPV6_DEFHLIM; ip6->ip6_src = id->dst_ip6; ip6->ip6_dst = id->src_ip6; sctp = (struct sctphdr *)(ip6 + 1); break; #endif } sctp->src_port = htons(id->dst_port); sctp->dest_port = htons(id->src_port); sctp->v_tag = htonl(vtag); sctp->checksum = htonl(0); chunk = (struct sctp_chunkhdr *)(sctp + 1); chunk->chunk_type = SCTP_ABORT_ASSOCIATION; chunk->chunk_flags = 0; if (reflected != 0) { chunk->chunk_flags |= SCTP_HAD_NO_TCB; } chunk->chunk_length = htons(sizeof(struct sctp_chunkhdr)); sctp->checksum = sctp_calculate_cksum(m, hlen); return (m); } /* * Generate a TCP packet, containing either a RST or a keepalive. 
* When flags & TH_RST, we are sending a RST packet, because of a * "reset" action matched the packet. * Otherwise we are sending a keepalive, and flags & TH_ * The 'replyto' mbuf is the mbuf being replied to, if any, and is required * so that MAC can label the reply appropriately. */ struct mbuf * ipfw_send_pkt(struct mbuf *replyto, struct ipfw_flow_id *id, u_int32_t seq, u_int32_t ack, int flags) { struct mbuf *m = NULL; /* stupid compiler */ struct ip *h = NULL; /* stupid compiler */ #ifdef INET6 struct ip6_hdr *h6 = NULL; #endif struct tcphdr *th = NULL; int len, dir; MGETHDR(m, M_NOWAIT, MT_DATA); if (m == NULL) return (NULL); M_SETFIB(m, id->fib); #ifdef MAC if (replyto != NULL) mac_netinet_firewall_reply(replyto, m); else mac_netinet_firewall_send(m); #else (void)replyto; /* don't warn about unused arg */ #endif switch (id->addr_type) { case 4: len = sizeof(struct ip) + sizeof(struct tcphdr); break; #ifdef INET6 case 6: len = sizeof(struct ip6_hdr) + sizeof(struct tcphdr); break; #endif default: /* XXX: log me?!? */ FREE_PKT(m); return (NULL); } dir = ((flags & (TH_SYN | TH_RST)) == TH_SYN); m->m_data += max_linkhdr; m->m_flags |= M_SKIP_FIREWALL; m->m_pkthdr.len = m->m_len = len; m->m_pkthdr.rcvif = NULL; bzero(m->m_data, len); switch (id->addr_type) { case 4: h = mtod(m, struct ip *); /* prepare for checksum */ h->ip_p = IPPROTO_TCP; h->ip_len = htons(sizeof(struct tcphdr)); if (dir) { h->ip_src.s_addr = htonl(id->src_ip); h->ip_dst.s_addr = htonl(id->dst_ip); } else { h->ip_src.s_addr = htonl(id->dst_ip); h->ip_dst.s_addr = htonl(id->src_ip); } th = (struct tcphdr *)(h + 1); break; #ifdef INET6 case 6: h6 = mtod(m, struct ip6_hdr *); /* prepare for checksum */ h6->ip6_nxt = IPPROTO_TCP; h6->ip6_plen = htons(sizeof(struct tcphdr)); if (dir) { h6->ip6_src = id->src_ip6; h6->ip6_dst = id->dst_ip6; } else { h6->ip6_src = id->dst_ip6; h6->ip6_dst = id->src_ip6; } th = (struct tcphdr *)(h6 + 1); break; #endif } if (dir) { th->th_sport = htons(id->src_port); th->th_dport = htons(id->dst_port); } else { th->th_sport = htons(id->dst_port); th->th_dport = htons(id->src_port); } th->th_off = sizeof(struct tcphdr) >> 2; if (flags & TH_RST) { if (flags & TH_ACK) { th->th_seq = htonl(ack); th->th_flags = TH_RST; } else { if (flags & TH_SYN) seq++; th->th_ack = htonl(seq); th->th_flags = TH_RST | TH_ACK; } } else { /* * Keepalive - use caller provided sequence numbers */ th->th_seq = htonl(seq); th->th_ack = htonl(ack); th->th_flags = TH_ACK; } switch (id->addr_type) { case 4: th->th_sum = in_cksum(m, len); /* finish the ip header */ h->ip_v = 4; h->ip_hl = sizeof(*h) >> 2; h->ip_tos = IPTOS_LOWDELAY; h->ip_off = htons(0); h->ip_len = htons(len); h->ip_ttl = V_ip_defttl; h->ip_sum = 0; break; #ifdef INET6 case 6: th->th_sum = in6_cksum(m, IPPROTO_TCP, sizeof(*h6), sizeof(struct tcphdr)); /* finish the ip6 header */ h6->ip6_vfc |= IPV6_VERSION; h6->ip6_hlim = IPV6_DEFHLIM; break; #endif } return (m); } #ifdef INET6 /* * ipv6 specific rules here... 
*/ static __inline int icmp6type_match (int type, ipfw_insn_u32 *cmd) { return (type <= ICMP6_MAXTYPE && (cmd->d[type/32] & (1<<(type%32)) ) ); } static int flow6id_match( int curr_flow, ipfw_insn_u32 *cmd ) { int i; for (i=0; i <= cmd->o.arg1; ++i ) if (curr_flow == cmd->d[i] ) return 1; return 0; } /* support for IP6_*_ME opcodes */ static const struct in6_addr lla_mask = {{{ 0xff, 0xff, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }}}; static int ipfw_localip6(struct in6_addr *in6) { struct rm_priotracker in6_ifa_tracker; struct in6_ifaddr *ia; if (IN6_IS_ADDR_MULTICAST(in6)) return (0); if (!IN6_IS_ADDR_LINKLOCAL(in6)) return (in6_localip(in6)); IN6_IFADDR_RLOCK(&in6_ifa_tracker); CK_STAILQ_FOREACH(ia, &V_in6_ifaddrhead, ia_link) { if (!IN6_IS_ADDR_LINKLOCAL(&ia->ia_addr.sin6_addr)) continue; if (IN6_ARE_MASKED_ADDR_EQUAL(&ia->ia_addr.sin6_addr, in6, &lla_mask)) { IN6_IFADDR_RUNLOCK(&in6_ifa_tracker); return (1); } } IN6_IFADDR_RUNLOCK(&in6_ifa_tracker); return (0); } static int verify_path6(struct in6_addr *src, struct ifnet *ifp, u_int fib) { struct nhop6_basic nh6; if (IN6_IS_SCOPE_LINKLOCAL(src)) return (1); if (fib6_lookup_nh_basic(fib, src, 0, NHR_IFAIF, 0, &nh6) != 0) return (0); /* If ifp is provided, check for equality with route table. */ if (ifp != NULL && ifp != nh6.nh_ifp) return (0); /* if no ifp provided, check if rtentry is not default route */ if (ifp == NULL && (nh6.nh_flags & NHF_DEFAULT) != 0) return (0); /* or if this is a blackhole/reject route */ if (ifp == NULL && (nh6.nh_flags & (NHF_REJECT|NHF_BLACKHOLE)) != 0) return (0); /* found valid route */ return 1; } static int is_icmp6_query(int icmp6_type) { if ((icmp6_type <= ICMP6_MAXTYPE) && (icmp6_type == ICMP6_ECHO_REQUEST || icmp6_type == ICMP6_MEMBERSHIP_QUERY || icmp6_type == ICMP6_WRUREQUEST || icmp6_type == ICMP6_FQDN_QUERY || icmp6_type == ICMP6_NI_QUERY)) return (1); return (0); } static int map_icmp_unreach(int code) { /* RFC 7915 p4.2 */ switch (code) { case ICMP_UNREACH_NET: case ICMP_UNREACH_HOST: case ICMP_UNREACH_SRCFAIL: case ICMP_UNREACH_NET_UNKNOWN: case ICMP_UNREACH_HOST_UNKNOWN: case ICMP_UNREACH_TOSNET: case ICMP_UNREACH_TOSHOST: return (ICMP6_DST_UNREACH_NOROUTE); case ICMP_UNREACH_PORT: return (ICMP6_DST_UNREACH_NOPORT); default: /* * Map the rest of codes into admit prohibited. * XXX: unreach proto should be mapped into ICMPv6 * parameter problem, but we use only unreach type. 
*/ return (ICMP6_DST_UNREACH_ADMIN); } } static void send_reject6(struct ip_fw_args *args, int code, u_int hlen, struct ip6_hdr *ip6) { struct mbuf *m; m = args->m; if (code == ICMP6_UNREACH_RST && args->f_id.proto == IPPROTO_TCP) { struct tcphdr *tcp; tcp = (struct tcphdr *)((char *)ip6 + hlen); if ((tcp->th_flags & TH_RST) == 0) { struct mbuf *m0; m0 = ipfw_send_pkt(args->m, &(args->f_id), ntohl(tcp->th_seq), ntohl(tcp->th_ack), tcp->th_flags | TH_RST); if (m0 != NULL) ip6_output(m0, NULL, NULL, 0, NULL, NULL, NULL); } FREE_PKT(m); } else if (code == ICMP6_UNREACH_ABORT && args->f_id.proto == IPPROTO_SCTP) { struct mbuf *m0; struct sctphdr *sctp; u_int32_t v_tag; int reflected; sctp = (struct sctphdr *)((char *)ip6 + hlen); reflected = 1; v_tag = ntohl(sctp->v_tag); /* Investigate the first chunk header if available */ if (m->m_len >= hlen + sizeof(struct sctphdr) + sizeof(struct sctp_chunkhdr)) { struct sctp_chunkhdr *chunk; chunk = (struct sctp_chunkhdr *)(sctp + 1); switch (chunk->chunk_type) { case SCTP_INITIATION: /* * Packets containing an INIT chunk MUST have * a zero v-tag. */ if (v_tag != 0) { v_tag = 0; break; } /* INIT chunk MUST NOT be bundled */ if (m->m_pkthdr.len > hlen + sizeof(struct sctphdr) + ntohs(chunk->chunk_length) + 3) { break; } /* Use the initiate tag if available */ if ((m->m_len >= hlen + sizeof(struct sctphdr) + sizeof(struct sctp_chunkhdr) + offsetof(struct sctp_init, a_rwnd))) { struct sctp_init *init; init = (struct sctp_init *)(chunk + 1); v_tag = ntohl(init->initiate_tag); reflected = 0; } break; case SCTP_ABORT_ASSOCIATION: /* * If the packet contains an ABORT chunk, don't * reply. * XXX: We should search through all chunks, * but don't do to avoid attacks. */ v_tag = 0; break; } } if (v_tag == 0) { m0 = NULL; } else { m0 = ipfw_send_abort(args->m, &(args->f_id), v_tag, reflected); } if (m0 != NULL) ip6_output(m0, NULL, NULL, 0, NULL, NULL, NULL); FREE_PKT(m); } else if (code != ICMP6_UNREACH_RST && code != ICMP6_UNREACH_ABORT) { /* Send an ICMPv6 unreach. */ #if 0 /* * Unlike above, the mbufs need to line up with the ip6 hdr, * as the contents are read. We need to m_adj() the * needed amount. * The mbuf will however be thrown away so we can adjust it. * Remember we did an m_pullup on it already so we * can make some assumptions about contiguousness. */ if (args->L3offset) m_adj(m, args->L3offset); #endif icmp6_error(m, ICMP6_DST_UNREACH, code, 0); } else FREE_PKT(m); args->m = NULL; } #endif /* INET6 */ /* * sends a reject message, consuming the mbuf passed as an argument. */ static void send_reject(struct ip_fw_args *args, int code, int iplen, struct ip *ip) { #if 0 /* XXX When ip is not guaranteed to be at mtod() we will * need to account for this */ * The mbuf will however be thrown away so we can adjust it. * Remember we did an m_pullup on it already so we * can make some assumptions about contiguousness. 
*/ if (args->L3offset) m_adj(m, args->L3offset); #endif if (code != ICMP_REJECT_RST && code != ICMP_REJECT_ABORT) { /* Send an ICMP unreach */ icmp_error(args->m, ICMP_UNREACH, code, 0L, 0); } else if (code == ICMP_REJECT_RST && args->f_id.proto == IPPROTO_TCP) { struct tcphdr *const tcp = L3HDR(struct tcphdr, mtod(args->m, struct ip *)); if ( (tcp->th_flags & TH_RST) == 0) { struct mbuf *m; m = ipfw_send_pkt(args->m, &(args->f_id), ntohl(tcp->th_seq), ntohl(tcp->th_ack), tcp->th_flags | TH_RST); if (m != NULL) ip_output(m, NULL, NULL, 0, NULL, NULL); } FREE_PKT(args->m); } else if (code == ICMP_REJECT_ABORT && args->f_id.proto == IPPROTO_SCTP) { struct mbuf *m; struct sctphdr *sctp; struct sctp_chunkhdr *chunk; struct sctp_init *init; u_int32_t v_tag; int reflected; sctp = L3HDR(struct sctphdr, mtod(args->m, struct ip *)); reflected = 1; v_tag = ntohl(sctp->v_tag); if (iplen >= (ip->ip_hl << 2) + sizeof(struct sctphdr) + sizeof(struct sctp_chunkhdr)) { /* Look at the first chunk header if available */ chunk = (struct sctp_chunkhdr *)(sctp + 1); switch (chunk->chunk_type) { case SCTP_INITIATION: /* * Packets containing an INIT chunk MUST have * a zero v-tag. */ if (v_tag != 0) { v_tag = 0; break; } /* INIT chunk MUST NOT be bundled */ if (iplen > (ip->ip_hl << 2) + sizeof(struct sctphdr) + ntohs(chunk->chunk_length) + 3) { break; } /* Use the initiate tag if available */ if ((iplen >= (ip->ip_hl << 2) + sizeof(struct sctphdr) + sizeof(struct sctp_chunkhdr) + offsetof(struct sctp_init, a_rwnd))) { init = (struct sctp_init *)(chunk + 1); v_tag = ntohl(init->initiate_tag); reflected = 0; } break; case SCTP_ABORT_ASSOCIATION: /* * If the packet contains an ABORT chunk, don't * reply. * XXX: We should search through all chunks, * but don't do to avoid attacks. */ v_tag = 0; break; } } if (v_tag == 0) { m = NULL; } else { m = ipfw_send_abort(args->m, &(args->f_id), v_tag, reflected); } if (m != NULL) ip_output(m, NULL, NULL, 0, NULL, NULL); FREE_PKT(args->m); } else FREE_PKT(args->m); args->m = NULL; } /* * Support for uid/gid/jail lookup. These tests are expensive * (because we may need to look into the list of active sockets) * so we cache the results. ugid_lookupp is 0 if we have not * yet done a lookup, 1 if we succeeded, and -1 if we tried * and failed. The function always returns the match value. * We could actually spare the variable and use *uc, setting * it to '(void *)check_uidgid if we have no info, NULL if * we tried and failed, or any other value if successful. */ static int check_uidgid(ipfw_insn_u32 *insn, struct ip_fw_args *args, int *ugid_lookupp, struct ucred **uc) { #if defined(USERSPACE) return 0; // not supported in userspace #else #ifndef __FreeBSD__ /* XXX */ return cred_check(insn, proto, oif, dst_ip, dst_port, src_ip, src_port, (struct bsd_ucred *)uc, ugid_lookupp, ((struct mbuf *)inp)->m_skb); #else /* FreeBSD */ struct in_addr src_ip, dst_ip; struct inpcbinfo *pi; struct ipfw_flow_id *id; struct inpcb *pcb, *inp; int lookupflags; int match; id = &args->f_id; inp = args->inp; /* * Check to see if the UDP or TCP stack supplied us with * the PCB. If so, rather then holding a lock and looking * up the PCB, we can use the one that was supplied. */ if (inp && *ugid_lookupp == 0) { INP_LOCK_ASSERT(inp); if (inp->inp_socket != NULL) { *uc = crhold(inp->inp_cred); *ugid_lookupp = 1; } else *ugid_lookupp = -1; } /* * If we have already been here and the packet has no * PCB entry associated with it, then we can safely * assume that this is a no match. 
*/ if (*ugid_lookupp == -1) return (0); if (id->proto == IPPROTO_TCP) { lookupflags = 0; pi = &V_tcbinfo; } else if (id->proto == IPPROTO_UDP) { lookupflags = INPLOOKUP_WILDCARD; pi = &V_udbinfo; } else if (id->proto == IPPROTO_UDPLITE) { lookupflags = INPLOOKUP_WILDCARD; pi = &V_ulitecbinfo; } else return 0; lookupflags |= INPLOOKUP_RLOCKPCB; match = 0; if (*ugid_lookupp == 0) { if (id->addr_type == 6) { #ifdef INET6 if (args->flags & IPFW_ARGS_IN) pcb = in6_pcblookup_mbuf(pi, &id->src_ip6, htons(id->src_port), &id->dst_ip6, htons(id->dst_port), lookupflags, NULL, args->m); else pcb = in6_pcblookup_mbuf(pi, &id->dst_ip6, htons(id->dst_port), &id->src_ip6, htons(id->src_port), lookupflags, args->ifp, args->m); #else *ugid_lookupp = -1; return (0); #endif } else { src_ip.s_addr = htonl(id->src_ip); dst_ip.s_addr = htonl(id->dst_ip); if (args->flags & IPFW_ARGS_IN) pcb = in_pcblookup_mbuf(pi, src_ip, htons(id->src_port), dst_ip, htons(id->dst_port), lookupflags, NULL, args->m); else pcb = in_pcblookup_mbuf(pi, dst_ip, htons(id->dst_port), src_ip, htons(id->src_port), lookupflags, args->ifp, args->m); } if (pcb != NULL) { INP_RLOCK_ASSERT(pcb); *uc = crhold(pcb->inp_cred); *ugid_lookupp = 1; INP_RUNLOCK(pcb); } if (*ugid_lookupp == 0) { /* * We tried and failed, set the variable to -1 * so we will not try again on this packet. */ *ugid_lookupp = -1; return (0); } } if (insn->o.opcode == O_UID) match = ((*uc)->cr_uid == (uid_t)insn->d[0]); else if (insn->o.opcode == O_GID) match = groupmember((gid_t)insn->d[0], *uc); else if (insn->o.opcode == O_JAIL) match = ((*uc)->cr_prison->pr_id == (int)insn->d[0]); return (match); #endif /* __FreeBSD__ */ #endif /* not supported in userspace */ } /* * Helper function to set args with info on the rule after the matching * one. slot is precise, whereas we guess rule_id as they are * assigned sequentially. */ static inline void set_match(struct ip_fw_args *args, int slot, struct ip_fw_chain *chain) { args->rule.chain_id = chain->id; args->rule.slot = slot + 1; /* we use 0 as a marker */ args->rule.rule_id = 1 + chain->map[slot]->id; args->rule.rulenum = chain->map[slot]->rulenum; args->flags |= IPFW_ARGS_REF; } #ifndef LINEAR_SKIPTO /* * Helper function to enable cached rule lookups using * cached_id and cached_pos fields in ipfw rule. */ static int jump_fast(struct ip_fw_chain *chain, struct ip_fw *f, int num, int tablearg, int jump_backwards) { int f_pos; /* If possible use cached f_pos (in f->cached_pos), * whose version is written in f->cached_id * (horrible hacks to avoid changing the ABI). */ if (num != IP_FW_TARG && f->cached_id == chain->id) f_pos = f->cached_pos; else { int i = IP_FW_ARG_TABLEARG(chain, num, skipto); /* make sure we do not jump backward */ if (jump_backwards == 0 && i <= f->rulenum) i = f->rulenum + 1; if (chain->idxmap != NULL) f_pos = chain->idxmap[i]; else f_pos = ipfw_find_rule(chain, i, 0); /* update the cache */ if (num != IP_FW_TARG) { f->cached_id = chain->id; f->cached_pos = f_pos; } } return (f_pos); } #else /* * Helper function to enable real fast rule lookups. */ static int jump_linear(struct ip_fw_chain *chain, struct ip_fw *f, int num, int tablearg, int jump_backwards) { int f_pos; num = IP_FW_ARG_TABLEARG(chain, num, skipto); /* make sure we do not jump backward */ if (jump_backwards == 0 && num <= f->rulenum) num = f->rulenum + 1; f_pos = chain->idxmap[num]; return (f_pos); } #endif #define TARG(k, f) IP_FW_ARG_TABLEARG(chain, k, f) /* * The main check routine for the firewall. 
* * All arguments are in args so we can modify them and return them * back to the caller. * * Parameters: * * args->m (in/out) The packet; we set to NULL when/if we nuke it. * Starts with the IP header. * args->L3offset Number of bytes bypassed if we came from L2. * e.g. often sizeof(eh) ** NOTYET ** * args->ifp Incoming or outgoing interface. * args->divert_rule (in/out) * Skip up to the first rule past this rule number; * upon return, non-zero port number for divert or tee. * * args->rule Pointer to the last matching rule (in/out) * args->next_hop Socket we are forwarding to (out). * args->next_hop6 IPv6 next hop we are forwarding to (out). * args->f_id Addresses grabbed from the packet (out) * args->rule.info a cookie depending on rule action * * Return value: * * IP_FW_PASS the packet must be accepted * IP_FW_DENY the packet must be dropped * IP_FW_DIVERT divert packet, port in m_tag * IP_FW_TEE tee packet, port in m_tag * IP_FW_DUMMYNET to dummynet, pipe in args->cookie * IP_FW_NETGRAPH into netgraph, cookie args->cookie * args->rule contains the matching rule, * args->rule.info has additional information. * */ int ipfw_chk(struct ip_fw_args *args) { /* * Local variables holding state while processing a packet: * * IMPORTANT NOTE: to speed up the processing of rules, there * are some assumption on the values of the variables, which * are documented here. Should you change them, please check * the implementation of the various instructions to make sure * that they still work. * * m | args->m Pointer to the mbuf, as received from the caller. * It may change if ipfw_chk() does an m_pullup, or if it * consumes the packet because it calls send_reject(). * XXX This has to change, so that ipfw_chk() never modifies * or consumes the buffer. * OR * args->mem Pointer to contigous memory chunk. * ip Is the beginning of the ip(4 or 6) header. * eh Ethernet header in case if input is Layer2. */ struct mbuf *m; struct ip *ip; struct ether_header *eh; /* * For rules which contain uid/gid or jail constraints, cache * a copy of the users credentials after the pcb lookup has been * executed. This will speed up the processing of rules with * these types of constraints, as well as decrease contention * on pcb related locks. */ #ifndef __FreeBSD__ struct bsd_ucred ucred_cache; #else struct ucred *ucred_cache = NULL; #endif int ucred_lookup = 0; int f_pos = 0; /* index of current rule in the array */ int retval = 0; struct ifnet *oif, *iif; /* * hlen The length of the IP header. */ u_int hlen = 0; /* hlen >0 means we have an IP pkt */ /* * offset The offset of a fragment. offset != 0 means that * we have a fragment at this offset of an IPv4 packet. * offset == 0 means that (if this is an IPv4 packet) * this is the first or only fragment. * For IPv6 offset|ip6f_mf == 0 means there is no Fragment Header * or there is a single packet fragment (fragment header added * without needed). We will treat a single packet fragment as if * there was no fragment header (or log/block depending on the * V_fw_permit_single_frag6 sysctl setting). */ u_short offset = 0; u_short ip6f_mf = 0; /* * Local copies of addresses. They are only valid if we have * an IP packet. * * proto The protocol. Set to 0 for non-ip packets, * or to the protocol read from the packet otherwise. * proto != 0 means that we have an IPv4 packet. * * src_port, dst_port port numbers, in HOST format. Only * valid for TCP and UDP packets. * * src_ip, dst_ip ip addresses, in NETWORK format. * Only valid for IPv4 packets. 
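 *
 * (Consequence of the mixed formats for the opcode handlers below,
 * as a rough guide: an address test such as O_IP_SRC compares
 *	((ipfw_insn_ip *)cmd)->addr.s_addr == src_ip.s_addr
 * with no byte swapping, while port-range tests compare the
 * host-order src_port/dst_port against host-order bounds.)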
*/ uint8_t proto; uint16_t src_port, dst_port; /* NOTE: host format */ struct in_addr src_ip, dst_ip; /* NOTE: network format */ int iplen = 0; int pktlen; struct ipfw_dyn_info dyn_info; struct ip_fw *q = NULL; struct ip_fw_chain *chain = &V_layer3_chain; /* * We store in ulp a pointer to the upper layer protocol header. * In the ipv4 case this is easy to determine from the header, * but for ipv6 we might have some additional headers in the middle. * ulp is NULL if not found. */ void *ulp = NULL; /* upper layer protocol pointer. */ /* XXX ipv6 variables */ int is_ipv6 = 0; uint8_t icmp6_type = 0; uint16_t ext_hd = 0; /* bits vector for extension header filtering */ /* end of ipv6 variables */ int is_ipv4 = 0; int done = 0; /* flag to exit the outer loop */ IPFW_RLOCK_TRACKER; bool mem; if ((mem = (args->flags & IPFW_ARGS_LENMASK))) { if (args->flags & IPFW_ARGS_ETHER) { eh = (struct ether_header *)args->mem; if (eh->ether_type == htons(ETHERTYPE_VLAN)) ip = (struct ip *) ((struct ether_vlan_header *)eh + 1); else ip = (struct ip *)(eh + 1); } else { eh = NULL; ip = (struct ip *)args->mem; } pktlen = IPFW_ARGS_LENGTH(args->flags); args->f_id.fib = args->ifp->if_fib; /* best guess */ } else { m = args->m; if (m->m_flags & M_SKIP_FIREWALL || (! V_ipfw_vnet_ready)) return (IP_FW_PASS); /* accept */ if (args->flags & IPFW_ARGS_ETHER) { /* We need some amount of data to be contiguous. */ if (m->m_len < min(m->m_pkthdr.len, max_protohdr) && (args->m = m = m_pullup(m, min(m->m_pkthdr.len, max_protohdr))) == NULL) goto pullup_failed; eh = mtod(m, struct ether_header *); ip = (struct ip *)(eh + 1); } else { eh = NULL; ip = mtod(m, struct ip *); } pktlen = m->m_pkthdr.len; args->f_id.fib = M_GETFIB(m); /* mbuf not altered */ } dst_ip.s_addr = 0; /* make sure it is initialized */ src_ip.s_addr = 0; /* make sure it is initialized */ src_port = dst_port = 0; DYN_INFO_INIT(&dyn_info); /* * PULLUP_TO(len, p, T) makes sure that len + sizeof(T) is contiguous, * then it sets p to point at the offset "len" in the mbuf. WARNING: the * pointer might become stale after other pullups (but we never use it * this way). */ #define PULLUP_TO(_len, p, T) PULLUP_LEN(_len, p, sizeof(T)) #define EHLEN (eh != NULL ? ((char *)ip - (char *)eh) : 0) #define PULLUP_LEN(_len, p, T) \ do { \ int x = (_len) + T + EHLEN; \ if (mem) { \ MPASS(pktlen >= x); \ p = (char *)args->mem + (_len) + EHLEN; \ } else { \ if (__predict_false((m)->m_len < x)) { \ args->m = m = m_pullup(m, x); \ if (m == NULL) \ goto pullup_failed; \ } \ p = mtod(m, char *) + (_len) + EHLEN; \ } \ } while (0) /* * In case pointers got stale after pullups, update them. */ #define UPDATE_POINTERS() \ do { \ if (!mem) { \ if (eh != NULL) { \ eh = mtod(m, struct ether_header *); \ ip = (struct ip *)(eh + 1); \ } else \ ip = mtod(m, struct ip *); \ args->m = m; \ } \ } while (0) /* Identify IP packets and fill up variables. 
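 *
 * A sketch of how the macros defined above are used by the protocol
 * walk below (these are the same calls the real code makes):
 *
 *	PULLUP_TO(hlen, ulp, struct tcphdr);	// make the header contiguous
 *	dst_port = TCP(ulp)->th_dport;		// ulp now points at it
 *	...
 *	UPDATE_POINTERS();		// m_pullup() may have moved the data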
*/ if (pktlen >= sizeof(struct ip6_hdr) && (eh == NULL || eh->ether_type == htons(ETHERTYPE_IPV6)) && ip->ip_v == 6) { struct ip6_hdr *ip6 = (struct ip6_hdr *)ip; is_ipv6 = 1; args->flags |= IPFW_ARGS_IP6; hlen = sizeof(struct ip6_hdr); proto = ip6->ip6_nxt; /* Search extension headers to find upper layer protocols */ while (ulp == NULL && offset == 0) { switch (proto) { case IPPROTO_ICMPV6: PULLUP_TO(hlen, ulp, struct icmp6_hdr); icmp6_type = ICMP6(ulp)->icmp6_type; break; case IPPROTO_TCP: PULLUP_TO(hlen, ulp, struct tcphdr); dst_port = TCP(ulp)->th_dport; src_port = TCP(ulp)->th_sport; /* save flags for dynamic rules */ args->f_id._flags = TCP(ulp)->th_flags; break; case IPPROTO_SCTP: if (pktlen >= hlen + sizeof(struct sctphdr) + sizeof(struct sctp_chunkhdr) + offsetof(struct sctp_init, a_rwnd)) PULLUP_LEN(hlen, ulp, sizeof(struct sctphdr) + sizeof(struct sctp_chunkhdr) + offsetof(struct sctp_init, a_rwnd)); else if (pktlen >= hlen + sizeof(struct sctphdr)) PULLUP_LEN(hlen, ulp, pktlen - hlen); else PULLUP_LEN(hlen, ulp, sizeof(struct sctphdr)); src_port = SCTP(ulp)->src_port; dst_port = SCTP(ulp)->dest_port; break; case IPPROTO_UDP: case IPPROTO_UDPLITE: PULLUP_TO(hlen, ulp, struct udphdr); dst_port = UDP(ulp)->uh_dport; src_port = UDP(ulp)->uh_sport; break; case IPPROTO_HOPOPTS: /* RFC 2460 */ PULLUP_TO(hlen, ulp, struct ip6_hbh); ext_hd |= EXT_HOPOPTS; hlen += (((struct ip6_hbh *)ulp)->ip6h_len + 1) << 3; proto = ((struct ip6_hbh *)ulp)->ip6h_nxt; ulp = NULL; break; case IPPROTO_ROUTING: /* RFC 2460 */ PULLUP_TO(hlen, ulp, struct ip6_rthdr); switch (((struct ip6_rthdr *)ulp)->ip6r_type) { case 0: ext_hd |= EXT_RTHDR0; break; case 2: ext_hd |= EXT_RTHDR2; break; default: if (V_fw_verbose) printf("IPFW2: IPV6 - Unknown " "Routing Header type(%d)\n", ((struct ip6_rthdr *) ulp)->ip6r_type); if (V_fw_deny_unknown_exthdrs) return (IP_FW_DENY); break; } ext_hd |= EXT_ROUTING; hlen += (((struct ip6_rthdr *)ulp)->ip6r_len + 1) << 3; proto = ((struct ip6_rthdr *)ulp)->ip6r_nxt; ulp = NULL; break; case IPPROTO_FRAGMENT: /* RFC 2460 */ PULLUP_TO(hlen, ulp, struct ip6_frag); ext_hd |= EXT_FRAGMENT; hlen += sizeof (struct ip6_frag); proto = ((struct ip6_frag *)ulp)->ip6f_nxt; offset = ((struct ip6_frag *)ulp)->ip6f_offlg & IP6F_OFF_MASK; ip6f_mf = ((struct ip6_frag *)ulp)->ip6f_offlg & IP6F_MORE_FRAG; if (V_fw_permit_single_frag6 == 0 && offset == 0 && ip6f_mf == 0) { if (V_fw_verbose) printf("IPFW2: IPV6 - Invalid " "Fragment Header\n"); if (V_fw_deny_unknown_exthdrs) return (IP_FW_DENY); break; } args->f_id.extra = ntohl(((struct ip6_frag *)ulp)->ip6f_ident); ulp = NULL; break; case IPPROTO_DSTOPTS: /* RFC 2460 */ PULLUP_TO(hlen, ulp, struct ip6_hbh); ext_hd |= EXT_DSTOPTS; hlen += (((struct ip6_hbh *)ulp)->ip6h_len + 1) << 3; proto = ((struct ip6_hbh *)ulp)->ip6h_nxt; ulp = NULL; break; case IPPROTO_AH: /* RFC 2402 */ PULLUP_TO(hlen, ulp, struct ip6_ext); ext_hd |= EXT_AH; hlen += (((struct ip6_ext *)ulp)->ip6e_len + 2) << 2; proto = ((struct ip6_ext *)ulp)->ip6e_nxt; ulp = NULL; break; case IPPROTO_ESP: /* RFC 2406 */ PULLUP_TO(hlen, ulp, uint32_t); /* SPI, Seq# */ /* Anything past Seq# is variable length and * data past this ext. header is encrypted. */ ext_hd |= EXT_ESP; break; case IPPROTO_NONE: /* RFC 2460 */ /* * Packet ends here, and IPv6 header has * already been pulled up. If ip6e_len!=0 * then octets must be ignored. */ ulp = ip; /* non-NULL to get out of loop. */ break; case IPPROTO_OSPFIGP: /* XXX OSPF header check? 
*/ PULLUP_TO(hlen, ulp, struct ip6_ext); break; case IPPROTO_PIM: /* XXX PIM header check? */ PULLUP_TO(hlen, ulp, struct pim); break; case IPPROTO_GRE: /* RFC 1701 */ /* XXX GRE header check? */ PULLUP_TO(hlen, ulp, struct grehdr); break; case IPPROTO_CARP: PULLUP_TO(hlen, ulp, offsetof( struct carp_header, carp_counter)); if (CARP_ADVERTISEMENT != ((struct carp_header *)ulp)->carp_type) return (IP_FW_DENY); break; case IPPROTO_IPV6: /* RFC 2893 */ PULLUP_TO(hlen, ulp, struct ip6_hdr); break; case IPPROTO_IPV4: /* RFC 2893 */ PULLUP_TO(hlen, ulp, struct ip); break; default: if (V_fw_verbose) printf("IPFW2: IPV6 - Unknown " "Extension Header(%d), ext_hd=%x\n", proto, ext_hd); if (V_fw_deny_unknown_exthdrs) return (IP_FW_DENY); PULLUP_TO(hlen, ulp, struct ip6_ext); break; } /*switch */ } UPDATE_POINTERS(); ip6 = (struct ip6_hdr *)ip; args->f_id.addr_type = 6; args->f_id.src_ip6 = ip6->ip6_src; args->f_id.dst_ip6 = ip6->ip6_dst; args->f_id.flow_id6 = ntohl(ip6->ip6_flow); iplen = ntohs(ip6->ip6_plen) + sizeof(*ip6); } else if (pktlen >= sizeof(struct ip) && (eh == NULL || eh->ether_type == htons(ETHERTYPE_IP)) && ip->ip_v == 4) { is_ipv4 = 1; args->flags |= IPFW_ARGS_IP4; hlen = ip->ip_hl << 2; /* * Collect parameters into local variables for faster * matching. */ proto = ip->ip_p; src_ip = ip->ip_src; dst_ip = ip->ip_dst; offset = ntohs(ip->ip_off) & IP_OFFMASK; iplen = ntohs(ip->ip_len); if (offset == 0) { switch (proto) { case IPPROTO_TCP: PULLUP_TO(hlen, ulp, struct tcphdr); dst_port = TCP(ulp)->th_dport; src_port = TCP(ulp)->th_sport; /* save flags for dynamic rules */ args->f_id._flags = TCP(ulp)->th_flags; break; case IPPROTO_SCTP: if (pktlen >= hlen + sizeof(struct sctphdr) + sizeof(struct sctp_chunkhdr) + offsetof(struct sctp_init, a_rwnd)) PULLUP_LEN(hlen, ulp, sizeof(struct sctphdr) + sizeof(struct sctp_chunkhdr) + offsetof(struct sctp_init, a_rwnd)); else if (pktlen >= hlen + sizeof(struct sctphdr)) PULLUP_LEN(hlen, ulp, pktlen - hlen); else PULLUP_LEN(hlen, ulp, sizeof(struct sctphdr)); src_port = SCTP(ulp)->src_port; dst_port = SCTP(ulp)->dest_port; break; case IPPROTO_UDP: case IPPROTO_UDPLITE: PULLUP_TO(hlen, ulp, struct udphdr); dst_port = UDP(ulp)->uh_dport; src_port = UDP(ulp)->uh_sport; break; case IPPROTO_ICMP: PULLUP_TO(hlen, ulp, struct icmphdr); //args->f_id.flags = ICMP(ulp)->icmp_type; break; default: break; } } UPDATE_POINTERS(); args->f_id.addr_type = 4; args->f_id.src_ip = ntohl(src_ip.s_addr); args->f_id.dst_ip = ntohl(dst_ip.s_addr); } else { proto = 0; dst_ip.s_addr = src_ip.s_addr = 0; args->f_id.addr_type = 1; /* XXX */ } #undef PULLUP_TO pktlen = iplen < pktlen ? iplen: pktlen; /* Properly initialize the rest of f_id */ args->f_id.proto = proto; args->f_id.src_port = src_port = ntohs(src_port); args->f_id.dst_port = dst_port = ntohs(dst_port); IPFW_PF_RLOCK(chain); if (! V_ipfw_vnet_ready) { /* shutting down, leave NOW. */ IPFW_PF_RUNLOCK(chain); return (IP_FW_PASS); /* accept */ } if (args->flags & IPFW_ARGS_REF) { /* * Packet has already been tagged as a result of a previous * match on rule args->rule aka args->rule_id (PIPE, QUEUE, * REASS, NETGRAPH, DIVERT/TEE...) * Validate the slot and continue from the next one * if still present, otherwise do a lookup. */ f_pos = (args->rule.chain_id == chain->id) ? args->rule.slot : ipfw_find_rule(chain, args->rule.rulenum, args->rule.rule_id); } else { f_pos = 0; } if (args->flags & IPFW_ARGS_IN) { iif = args->ifp; oif = NULL; } else { MPASS(args->flags & IPFW_ARGS_OUT); iif = mem ? 
NULL : m->m_pkthdr.rcvif; oif = args->ifp; } /* * Now scan the rules, and parse microinstructions for each rule. * We have two nested loops and an inner switch. Sometimes we * need to break out of one or both loops, or re-enter one of * the loops with updated variables. Loop variables are: * * f_pos (outer loop) points to the current rule. * On output it points to the matching rule. * done (outer loop) is used as a flag to break the loop. * l (inner loop) residual length of current rule. * cmd points to the current microinstruction. * * We break the inner loop by setting l=0 and possibly * cmdlen=0 if we don't want to advance cmd. * We break the outer loop by setting done=1 * We can restart the inner loop by setting l>0 and f_pos, f, cmd * as needed. */ for (; f_pos < chain->n_rules; f_pos++) { ipfw_insn *cmd; uint32_t tablearg = 0; int l, cmdlen, skip_or; /* skip rest of OR block */ struct ip_fw *f; f = chain->map[f_pos]; if (V_set_disable & (1 << f->set) ) continue; skip_or = 0; for (l = f->cmd_len, cmd = f->cmd ; l > 0 ; l -= cmdlen, cmd += cmdlen) { int match; /* * check_body is a jump target used when we find a * CHECK_STATE, and need to jump to the body of * the target rule. */ /* check_body: */ cmdlen = F_LEN(cmd); /* * An OR block (insn_1 || .. || insn_n) has the * F_OR bit set in all but the last instruction. * The first match will set "skip_or", and cause * the following instructions to be skipped until * past the one with the F_OR bit clear. */ if (skip_or) { /* skip this instruction */ if ((cmd->len & F_OR) == 0) skip_or = 0; /* next one is good */ continue; } match = 0; /* set to 1 if we succeed */ switch (cmd->opcode) { /* * The first set of opcodes compares the packet's * fields with some pattern, setting 'match' if a * match is found. At the end of the loop there is * logic to deal with F_NOT and F_OR flags associated * with the opcode. */ case O_NOP: match = 1; break; case O_FORWARD_MAC: printf("ipfw: opcode %d unimplemented\n", cmd->opcode); break; case O_GID: case O_UID: case O_JAIL: /* * We only check offset == 0 && proto != 0, * as this ensures that we have a * packet with the ports info. 
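 *
 * (e.g. an illustrative rule such as
 *	deny tcp from any to any 25 uid games
 * ends up here; the credential lookup and the actual uid/gid/jail
 * comparison are done by check_uidgid() above.)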
*/ if (offset != 0) break; if (proto == IPPROTO_TCP || proto == IPPROTO_UDP || proto == IPPROTO_UDPLITE) match = check_uidgid( (ipfw_insn_u32 *)cmd, args, &ucred_lookup, #ifdef __FreeBSD__ &ucred_cache); #else (void *)&ucred_cache); #endif break; case O_RECV: match = iface_match(iif, (ipfw_insn_if *)cmd, chain, &tablearg); break; case O_XMIT: match = iface_match(oif, (ipfw_insn_if *)cmd, chain, &tablearg); break; case O_VIA: match = iface_match(args->ifp, (ipfw_insn_if *)cmd, chain, &tablearg); break; case O_MACADDR2: if (args->flags & IPFW_ARGS_ETHER) { u_int32_t *want = (u_int32_t *) ((ipfw_insn_mac *)cmd)->addr; u_int32_t *mask = (u_int32_t *) ((ipfw_insn_mac *)cmd)->mask; u_int32_t *hdr = (u_int32_t *)eh; match = ( want[0] == (hdr[0] & mask[0]) && want[1] == (hdr[1] & mask[1]) && want[2] == (hdr[2] & mask[2]) ); } break; case O_MAC_TYPE: if (args->flags & IPFW_ARGS_ETHER) { u_int16_t *p = ((ipfw_insn_u16 *)cmd)->ports; int i; for (i = cmdlen - 1; !match && i>0; i--, p += 2) match = (ntohs(eh->ether_type) >= p[0] && ntohs(eh->ether_type) <= p[1]); } break; case O_FRAG: match = (offset != 0); break; case O_IN: /* "out" is "not in" */ match = (oif == NULL); break; case O_LAYER2: match = (args->flags & IPFW_ARGS_ETHER); break; case O_DIVERTED: if ((args->flags & IPFW_ARGS_REF) == 0) break; /* * For diverted packets, args->rule.info * contains the divert port (in host format) * reason and direction. */ match = ((args->rule.info & IPFW_IS_MASK) == IPFW_IS_DIVERT) && ( ((args->rule.info & IPFW_INFO_IN) ? 1: 2) & cmd->arg1); break; case O_PROTO: /* * We do not allow an arg of 0 so the * check of "proto" only suffices. */ match = (proto == cmd->arg1); break; case O_IP_SRC: match = is_ipv4 && (((ipfw_insn_ip *)cmd)->addr.s_addr == src_ip.s_addr); break; case O_IP_DST_LOOKUP: { void *pkey; uint32_t vidx, key; uint16_t keylen; if (cmdlen > F_INSN_SIZE(ipfw_insn_u32)) { /* Determine lookup key type */ vidx = ((ipfw_insn_u32 *)cmd)->d[1]; if (vidx != 4 /* uid */ && vidx != 5 /* jail */ && is_ipv6 == 0 && is_ipv4 == 0) break; /* Determine key length */ if (vidx == 0 /* dst-ip */ || vidx == 1 /* src-ip */) keylen = is_ipv6 ? sizeof(struct in6_addr): sizeof(in_addr_t); else { keylen = sizeof(key); pkey = &key; } if (vidx == 0 /* dst-ip */) pkey = is_ipv4 ? (void *)&dst_ip: (void *)&args->f_id.dst_ip6; else if (vidx == 1 /* src-ip */) pkey = is_ipv4 ? 
(void *)&src_ip: (void *)&args->f_id.src_ip6; else if (vidx == 6 /* dscp */) { if (is_ipv4) key = ip->ip_tos >> 2; else { key = args->f_id.flow_id6; key = (key & 0x0f) << 2 | (key & 0xf000) >> 14; } key &= 0x3f; } else if (vidx == 2 /* dst-port */ || vidx == 3 /* src-port */) { /* Skip fragments */ if (offset != 0) break; /* Skip proto without ports */ if (proto != IPPROTO_TCP && proto != IPPROTO_UDP && proto != IPPROTO_UDPLITE && proto != IPPROTO_SCTP) break; if (vidx == 2 /* dst-port */) key = dst_port; else key = src_port; } #ifndef USERSPACE else if (vidx == 4 /* uid */ || vidx == 5 /* jail */) { check_uidgid( (ipfw_insn_u32 *)cmd, args, &ucred_lookup, #ifdef __FreeBSD__ &ucred_cache); if (vidx == 4 /* uid */) key = ucred_cache->cr_uid; else if (vidx == 5 /* jail */) key = ucred_cache->cr_prison->pr_id; #else /* !__FreeBSD__ */ (void *)&ucred_cache); if (vidx == 4 /* uid */) key = ucred_cache.uid; else if (vidx == 5 /* jail */) key = ucred_cache.xid; #endif /* !__FreeBSD__ */ } #endif /* !USERSPACE */ else break; match = ipfw_lookup_table(chain, cmd->arg1, keylen, pkey, &vidx); if (!match) break; tablearg = vidx; break; } /* cmdlen =< F_INSN_SIZE(ipfw_insn_u32) */ /* FALLTHROUGH */ } case O_IP_SRC_LOOKUP: { void *pkey; uint32_t vidx; uint16_t keylen; if (is_ipv4) { keylen = sizeof(in_addr_t); if (cmd->opcode == O_IP_DST_LOOKUP) pkey = &dst_ip; else pkey = &src_ip; } else if (is_ipv6) { keylen = sizeof(struct in6_addr); if (cmd->opcode == O_IP_DST_LOOKUP) pkey = &args->f_id.dst_ip6; else pkey = &args->f_id.src_ip6; } else break; match = ipfw_lookup_table(chain, cmd->arg1, keylen, pkey, &vidx); if (!match) break; if (cmdlen == F_INSN_SIZE(ipfw_insn_u32)) { match = ((ipfw_insn_u32 *)cmd)->d[0] == TARG_VAL(chain, vidx, tag); if (!match) break; } tablearg = vidx; break; } case O_IP_FLOW_LOOKUP: { uint32_t v = 0; match = ipfw_lookup_table(chain, cmd->arg1, 0, &args->f_id, &v); if (cmdlen == F_INSN_SIZE(ipfw_insn_u32)) match = ((ipfw_insn_u32 *)cmd)->d[0] == TARG_VAL(chain, v, tag); if (match) tablearg = v; } break; case O_IP_SRC_MASK: case O_IP_DST_MASK: if (is_ipv4) { uint32_t a = (cmd->opcode == O_IP_DST_MASK) ? dst_ip.s_addr : src_ip.s_addr; uint32_t *p = ((ipfw_insn_u32 *)cmd)->d; int i = cmdlen-1; for (; !match && i>0; i-= 2, p+= 2) match = (p[0] == (a & p[1])); } break; case O_IP_SRC_ME: if (is_ipv4) { match = in_localip(src_ip); break; } #ifdef INET6 /* FALLTHROUGH */ case O_IP6_SRC_ME: match = is_ipv6 && ipfw_localip6(&args->f_id.src_ip6); #endif break; case O_IP_DST_SET: case O_IP_SRC_SET: if (is_ipv4) { u_int32_t *d = (u_int32_t *)(cmd+1); u_int32_t addr = cmd->opcode == O_IP_DST_SET ? args->f_id.dst_ip : args->f_id.src_ip; if (addr < d[0]) break; addr -= d[0]; /* subtract base */ match = (addr < cmd->arg1) && ( d[ 1 + (addr>>5)] & (1<<(addr & 0x1f)) ); } break; case O_IP_DST: match = is_ipv4 && (((ipfw_insn_ip *)cmd)->addr.s_addr == dst_ip.s_addr); break; case O_IP_DST_ME: if (is_ipv4) { match = in_localip(dst_ip); break; } #ifdef INET6 /* FALLTHROUGH */ case O_IP6_DST_ME: match = is_ipv6 && ipfw_localip6(&args->f_id.dst_ip6); #endif break; case O_IP_SRCPORT: case O_IP_DSTPORT: /* * offset == 0 && proto != 0 is enough * to guarantee that we have a * packet with port info. */ if ((proto == IPPROTO_UDP || proto == IPPROTO_UDPLITE || proto == IPPROTO_TCP || proto == IPPROTO_SCTP) && offset == 0) { u_int16_t x = (cmd->opcode == O_IP_SRCPORT) ? 
src_port : dst_port ; u_int16_t *p = ((ipfw_insn_u16 *)cmd)->ports; int i; for (i = cmdlen - 1; !match && i>0; i--, p += 2) match = (x>=p[0] && x<=p[1]); } break; case O_ICMPTYPE: match = (offset == 0 && proto==IPPROTO_ICMP && icmptype_match(ICMP(ulp), (ipfw_insn_u32 *)cmd) ); break; #ifdef INET6 case O_ICMP6TYPE: match = is_ipv6 && offset == 0 && proto==IPPROTO_ICMPV6 && icmp6type_match( ICMP6(ulp)->icmp6_type, (ipfw_insn_u32 *)cmd); break; #endif /* INET6 */ case O_IPOPT: match = (is_ipv4 && ipopts_match(ip, cmd) ); break; case O_IPVER: match = (is_ipv4 && cmd->arg1 == ip->ip_v); break; case O_IPID: - case O_IPLEN: case O_IPTTL: - if (is_ipv4) { /* only for IP packets */ + if (!is_ipv4) + break; + case O_IPLEN: + { /* only for IP packets */ uint16_t x; uint16_t *p; int i; if (cmd->opcode == O_IPLEN) x = iplen; else if (cmd->opcode == O_IPTTL) x = ip->ip_ttl; else /* must be IPID */ x = ntohs(ip->ip_id); if (cmdlen == 1) { match = (cmd->arg1 == x); break; } /* otherwise we have ranges */ p = ((ipfw_insn_u16 *)cmd)->ports; i = cmdlen - 1; for (; !match && i>0; i--, p += 2) match = (x >= p[0] && x <= p[1]); } break; case O_IPPRECEDENCE: match = (is_ipv4 && (cmd->arg1 == (ip->ip_tos & 0xe0)) ); break; case O_IPTOS: match = (is_ipv4 && flags_match(cmd, ip->ip_tos)); break; case O_DSCP: { uint32_t *p; uint16_t x; p = ((ipfw_insn_u32 *)cmd)->d; if (is_ipv4) x = ip->ip_tos >> 2; else if (is_ipv6) { uint8_t *v; v = &((struct ip6_hdr *)ip)->ip6_vfc; x = (*v & 0x0F) << 2; v++; x |= *v >> 6; } else break; /* DSCP bitmask is stored as low_u32 high_u32 */ if (x >= 32) match = *(p + 1) & (1 << (x - 32)); else match = *p & (1 << x); } break; case O_TCPDATALEN: if (proto == IPPROTO_TCP && offset == 0) { struct tcphdr *tcp; uint16_t x; uint16_t *p; int i; #ifdef INET6 if (is_ipv6) { struct ip6_hdr *ip6; ip6 = (struct ip6_hdr *)ip; if (ip6->ip6_plen == 0) { /* * Jumbo payload is not * supported by this * opcode. */ break; } x = iplen - hlen; } else #endif /* INET6 */ x = iplen - (ip->ip_hl << 2); tcp = TCP(ulp); x -= tcp->th_off << 2; if (cmdlen == 1) { match = (cmd->arg1 == x); break; } /* otherwise we have ranges */ p = ((ipfw_insn_u16 *)cmd)->ports; i = cmdlen - 1; for (; !match && i>0; i--, p += 2) match = (x >= p[0] && x <= p[1]); } break; case O_TCPFLAGS: match = (proto == IPPROTO_TCP && offset == 0 && flags_match(cmd, TCP(ulp)->th_flags)); break; case O_TCPOPTS: if (proto == IPPROTO_TCP && offset == 0 && ulp){ PULLUP_LEN(hlen, ulp, (TCP(ulp)->th_off << 2)); match = tcpopts_match(TCP(ulp), cmd); } break; case O_TCPSEQ: match = (proto == IPPROTO_TCP && offset == 0 && ((ipfw_insn_u32 *)cmd)->d[0] == TCP(ulp)->th_seq); break; case O_TCPACK: match = (proto == IPPROTO_TCP && offset == 0 && ((ipfw_insn_u32 *)cmd)->d[0] == TCP(ulp)->th_ack); break; case O_TCPWIN: if (proto == IPPROTO_TCP && offset == 0) { uint16_t x; uint16_t *p; int i; x = ntohs(TCP(ulp)->th_win); if (cmdlen == 1) { match = (cmd->arg1 == x); break; } /* Otherwise we have ranges. */ p = ((ipfw_insn_u16 *)cmd)->ports; i = cmdlen - 1; for (; !match && i > 0; i--, p += 2) match = (x >= p[0] && x <= p[1]); } break; case O_ESTAB: /* reject packets which have SYN only */ /* XXX should i also check for TH_ACK ? */ match = (proto == IPPROTO_TCP && offset == 0 && (TCP(ulp)->th_flags & (TH_RST | TH_ACK | TH_SYN)) != TH_SYN); break; case O_ALTQ: { struct pf_mtag *at; struct m_tag *mtag; ipfw_insn_altq *altq = (ipfw_insn_altq *)cmd; /* * ALTQ uses mbuf tags from another * packet filtering system - pf(4). 
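 * (In rule terms this is the "altq" option, e.g. an illustrative
 *	allow ip from any to any altq slow
 * where the queue name is resolved to a numeric qid when the rule
 * is installed; that qid is what lands in at->qid below.)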
* We allocate a tag in its format * and fill it in, pretending to be pf(4). */ match = 1; at = pf_find_mtag(m); if (at != NULL && at->qid != 0) break; mtag = m_tag_get(PACKET_TAG_PF, sizeof(struct pf_mtag), M_NOWAIT | M_ZERO); if (mtag == NULL) { /* * Let the packet fall back to the * default ALTQ. */ break; } m_tag_prepend(m, mtag); at = (struct pf_mtag *)(mtag + 1); at->qid = altq->qid; at->hdr = ip; break; } case O_LOG: ipfw_log(chain, f, hlen, args, offset | ip6f_mf, tablearg, ip); match = 1; break; case O_PROB: match = (random()<((ipfw_insn_u32 *)cmd)->d[0]); break; case O_VERREVPATH: /* Outgoing packets automatically pass/match */ match = (args->flags & IPFW_ARGS_OUT || ( #ifdef INET6 is_ipv6 ? verify_path6(&(args->f_id.src_ip6), iif, args->f_id.fib) : #endif verify_path(src_ip, iif, args->f_id.fib))); break; case O_VERSRCREACH: /* Outgoing packets automatically pass/match */ match = (hlen > 0 && ((oif != NULL) || ( #ifdef INET6 is_ipv6 ? verify_path6(&(args->f_id.src_ip6), NULL, args->f_id.fib) : #endif verify_path(src_ip, NULL, args->f_id.fib)))); break; case O_ANTISPOOF: /* Outgoing packets automatically pass/match */ if (oif == NULL && hlen > 0 && ( (is_ipv4 && in_localaddr(src_ip)) #ifdef INET6 || (is_ipv6 && in6_localaddr(&(args->f_id.src_ip6))) #endif )) match = #ifdef INET6 is_ipv6 ? verify_path6( &(args->f_id.src_ip6), iif, args->f_id.fib) : #endif verify_path(src_ip, iif, args->f_id.fib); else match = 1; break; case O_IPSEC: match = (m_tag_find(m, PACKET_TAG_IPSEC_IN_DONE, NULL) != NULL); /* otherwise no match */ break; #ifdef INET6 case O_IP6_SRC: match = is_ipv6 && IN6_ARE_ADDR_EQUAL(&args->f_id.src_ip6, &((ipfw_insn_ip6 *)cmd)->addr6); break; case O_IP6_DST: match = is_ipv6 && IN6_ARE_ADDR_EQUAL(&args->f_id.dst_ip6, &((ipfw_insn_ip6 *)cmd)->addr6); break; case O_IP6_SRC_MASK: case O_IP6_DST_MASK: if (is_ipv6) { int i = cmdlen - 1; struct in6_addr p; struct in6_addr *d = &((ipfw_insn_ip6 *)cmd)->addr6; for (; !match && i > 0; d += 2, i -= F_INSN_SIZE(struct in6_addr) * 2) { p = (cmd->opcode == O_IP6_SRC_MASK) ? args->f_id.src_ip6: args->f_id.dst_ip6; APPLY_MASK(&p, &d[1]); match = IN6_ARE_ADDR_EQUAL(&d[0], &p); } } break; case O_FLOW6ID: match = is_ipv6 && flow6id_match(args->f_id.flow_id6, (ipfw_insn_u32 *) cmd); break; case O_EXT_HDR: match = is_ipv6 && (ext_hd & ((ipfw_insn *) cmd)->arg1); break; case O_IP6: match = is_ipv6; break; #endif case O_IP4: match = is_ipv4; break; case O_TAG: { struct m_tag *mtag; uint32_t tag = TARG(cmd->arg1, tag); /* Packet is already tagged with this tag? */ mtag = m_tag_locate(m, MTAG_IPFW, tag, NULL); /* We have `untag' action when F_NOT flag is * present. And we must remove this mtag from * mbuf and reset `match' to zero (`match' will * be inversed later). * Otherwise we should allocate new mtag and * push it into mbuf. */ if (cmd->len & F_NOT) { /* `untag' action */ if (mtag != NULL) m_tag_delete(m, mtag); match = 0; } else { if (mtag == NULL) { mtag = m_tag_alloc( MTAG_IPFW, tag, 0, M_NOWAIT); if (mtag != NULL) m_tag_prepend(m, mtag); } match = 1; } break; } case O_FIB: /* try match the specified fib */ if (args->f_id.fib == cmd->arg1) match = 1; break; case O_SOCKARG: { #ifndef USERSPACE /* not supported in userspace */ struct inpcb *inp = args->inp; struct inpcbinfo *pi; if (is_ipv6) /* XXX can we remove this ? 
*/ break; if (proto == IPPROTO_TCP) pi = &V_tcbinfo; else if (proto == IPPROTO_UDP) pi = &V_udbinfo; else if (proto == IPPROTO_UDPLITE) pi = &V_ulitecbinfo; else break; /* * XXXRW: so_user_cookie should almost * certainly be inp_user_cookie? */ /* For incoming packet, lookup up the inpcb using the src/dest ip/port tuple */ if (inp == NULL) { inp = in_pcblookup(pi, src_ip, htons(src_port), dst_ip, htons(dst_port), INPLOOKUP_RLOCKPCB, NULL); if (inp != NULL) { tablearg = inp->inp_socket->so_user_cookie; if (tablearg) match = 1; INP_RUNLOCK(inp); } } else { if (inp->inp_socket) { tablearg = inp->inp_socket->so_user_cookie; if (tablearg) match = 1; } } #endif /* !USERSPACE */ break; } case O_TAGGED: { struct m_tag *mtag; uint32_t tag = TARG(cmd->arg1, tag); if (cmdlen == 1) { match = m_tag_locate(m, MTAG_IPFW, tag, NULL) != NULL; break; } /* we have ranges */ for (mtag = m_tag_first(m); mtag != NULL && !match; mtag = m_tag_next(m, mtag)) { uint16_t *p; int i; if (mtag->m_tag_cookie != MTAG_IPFW) continue; p = ((ipfw_insn_u16 *)cmd)->ports; i = cmdlen - 1; for(; !match && i > 0; i--, p += 2) match = mtag->m_tag_id >= p[0] && mtag->m_tag_id <= p[1]; } break; } /* * The second set of opcodes represents 'actions', * i.e. the terminal part of a rule once the packet * matches all previous patterns. * Typically there is only one action for each rule, * and the opcode is stored at the end of the rule * (but there are exceptions -- see below). * * In general, here we set retval and terminate the * outer loop (would be a 'break 3' in some language, * but we need to set l=0, done=1) * * Exceptions: * O_COUNT and O_SKIPTO actions: * instead of terminating, we jump to the next rule * (setting l=0), or to the SKIPTO target (setting * f/f_len, cmd and l as needed), respectively. * * O_TAG, O_LOG and O_ALTQ action parameters: * perform some action and set match = 1; * * O_LIMIT and O_KEEP_STATE: these opcodes are * not real 'actions', and are stored right * before the 'action' part of the rule (one * exception is O_SKIP_ACTION which could be * between these opcodes and 'action' one). * These opcodes try to install an entry in the * state tables; if successful, we continue with * the next opcode (match=1; break;), otherwise * the packet must be dropped (set retval, * break loops with l=0, done=1) * * O_PROBE_STATE and O_CHECK_STATE: these opcodes * cause a lookup of the state table, and a jump * to the 'action' part of the parent rule * if an entry is found, or * (CHECK_STATE only) a jump to the next rule if * the entry is not found. * The result of the lookup is cached so that * further instances of these opcodes become NOPs. * The jump to the next rule is done by setting * l=0, cmdlen=0. * * O_SKIP_ACTION: this opcode is not a real 'action' * either, and is stored right before the 'action' * part of the rule, right after the O_KEEP_STATE * opcode. It causes match failure so the real * 'action' could be executed only if the rule * is checked via dynamic rule from the state * table, as in such case execution starts * from the true 'action' opcode directly. * */ case O_LIMIT: case O_KEEP_STATE: if (ipfw_dyn_install_state(chain, f, (ipfw_insn_limit *)cmd, args, ulp, pktlen, &dyn_info, tablearg)) { /* error or limit violation */ retval = IP_FW_DENY; l = 0; /* exit inner loop */ done = 1; /* exit outer loop */ } match = 1; break; case O_PROBE_STATE: case O_CHECK_STATE: /* * dynamic rules are checked at the first * keep-state or check-state occurrence, * with the result being stored in dyn_info. 
* The compiler introduces a PROBE_STATE * instruction for us when we have a * KEEP_STATE (because PROBE_STATE needs * to be run first). */ if (DYN_LOOKUP_NEEDED(&dyn_info, cmd) && (q = ipfw_dyn_lookup_state(args, ulp, pktlen, cmd, &dyn_info)) != NULL) { /* * Found dynamic entry, jump to the * 'action' part of the parent rule * by setting f, cmd, l and clearing * cmdlen. */ f = q; f_pos = dyn_info.f_pos; cmd = ACTION_PTR(f); l = f->cmd_len - f->act_ofs; cmdlen = 0; match = 1; break; } /* * Dynamic entry not found. If CHECK_STATE, * skip to next rule, if PROBE_STATE just * ignore and continue with next opcode. */ if (cmd->opcode == O_CHECK_STATE) l = 0; /* exit inner loop */ match = 1; break; case O_SKIP_ACTION: match = 0; /* skip to the next rule */ l = 0; /* exit inner loop */ break; case O_ACCEPT: retval = 0; /* accept */ l = 0; /* exit inner loop */ done = 1; /* exit outer loop */ break; case O_PIPE: case O_QUEUE: set_match(args, f_pos, chain); args->rule.info = TARG(cmd->arg1, pipe); if (cmd->opcode == O_PIPE) args->rule.info |= IPFW_IS_PIPE; if (V_fw_one_pass) args->rule.info |= IPFW_ONEPASS; retval = IP_FW_DUMMYNET; l = 0; /* exit inner loop */ done = 1; /* exit outer loop */ break; case O_DIVERT: case O_TEE: if (args->flags & IPFW_ARGS_ETHER) break; /* not on layer 2 */ /* otherwise this is terminal */ l = 0; /* exit inner loop */ done = 1; /* exit outer loop */ retval = (cmd->opcode == O_DIVERT) ? IP_FW_DIVERT : IP_FW_TEE; set_match(args, f_pos, chain); args->rule.info = TARG(cmd->arg1, divert); break; case O_COUNT: IPFW_INC_RULE_COUNTER(f, pktlen); l = 0; /* exit inner loop */ break; case O_SKIPTO: IPFW_INC_RULE_COUNTER(f, pktlen); f_pos = JUMP(chain, f, cmd->arg1, tablearg, 0); /* * Skip disabled rules, and re-enter * the inner loop with the correct * f_pos, f, l and cmd. * Also clear cmdlen and skip_or */ for (; f_pos < chain->n_rules - 1 && (V_set_disable & (1 << chain->map[f_pos]->set)); f_pos++) ; /* Re-enter the inner loop at the skipto rule. */ f = chain->map[f_pos]; l = f->cmd_len; cmd = f->cmd; match = 1; cmdlen = 0; skip_or = 0; continue; break; /* not reached */ case O_CALLRETURN: { /* * Implementation of `subroutine' call/return, * in the stack carried in an mbuf tag. This * is different from `skipto' in that any call * address is possible (`skipto' must prevent * backward jumps to avoid endless loops). * We have `return' action when F_NOT flag is * present. The `m_tag_id' field is used as * stack pointer. */ struct m_tag *mtag; uint16_t jmpto, *stack; #define IS_CALL ((cmd->len & F_NOT) == 0) #define IS_RETURN ((cmd->len & F_NOT) != 0) /* * Hand-rolled version of m_tag_locate() with * wildcard `type'. * If not already tagged, allocate new tag. */ mtag = m_tag_first(m); while (mtag != NULL) { if (mtag->m_tag_cookie == MTAG_IPFW_CALL) break; mtag = m_tag_next(m, mtag); } if (mtag == NULL && IS_CALL) { mtag = m_tag_alloc(MTAG_IPFW_CALL, 0, IPFW_CALLSTACK_SIZE * sizeof(uint16_t), M_NOWAIT); if (mtag != NULL) m_tag_prepend(m, mtag); } /* * On error both `call' and `return' just * continue with next rule. */ if (IS_RETURN && (mtag == NULL || mtag->m_tag_id == 0)) { l = 0; /* exit inner loop */ break; } if (IS_CALL && (mtag == NULL || mtag->m_tag_id >= IPFW_CALLSTACK_SIZE)) { printf("ipfw: call stack error, " "go to next rule\n"); l = 0; /* exit inner loop */ break; } IPFW_INC_RULE_COUNTER(f, pktlen); stack = (uint16_t *)(mtag + 1); /* * The `call' action may use cached f_pos * (in f->next_rule), whose version is written * in f->next_rule. 
* The `return' action, however, doesn't have * fixed jump address in cmd->arg1 and can't use * cache. */ if (IS_CALL) { stack[mtag->m_tag_id] = f->rulenum; mtag->m_tag_id++; f_pos = JUMP(chain, f, cmd->arg1, tablearg, 1); } else { /* `return' action */ mtag->m_tag_id--; jmpto = stack[mtag->m_tag_id] + 1; f_pos = ipfw_find_rule(chain, jmpto, 0); } /* * Skip disabled rules, and re-enter * the inner loop with the correct * f_pos, f, l and cmd. * Also clear cmdlen and skip_or */ for (; f_pos < chain->n_rules - 1 && (V_set_disable & (1 << chain->map[f_pos]->set)); f_pos++) ; /* Re-enter the inner loop at the dest rule. */ f = chain->map[f_pos]; l = f->cmd_len; cmd = f->cmd; cmdlen = 0; skip_or = 0; continue; break; /* NOTREACHED */ } #undef IS_CALL #undef IS_RETURN case O_REJECT: /* * Drop the packet and send a reject notice * if the packet is not ICMP (or is an ICMP * query), and it is not multicast/broadcast. */ if (hlen > 0 && is_ipv4 && offset == 0 && (proto != IPPROTO_ICMP || is_icmp_query(ICMP(ulp))) && !(m->m_flags & (M_BCAST|M_MCAST)) && !IN_MULTICAST(ntohl(dst_ip.s_addr))) { send_reject(args, cmd->arg1, iplen, ip); m = args->m; } /* FALLTHROUGH */ #ifdef INET6 case O_UNREACH6: if (hlen > 0 && is_ipv6 && ((offset & IP6F_OFF_MASK) == 0) && (proto != IPPROTO_ICMPV6 || (is_icmp6_query(icmp6_type) == 1)) && !(m->m_flags & (M_BCAST|M_MCAST)) && !IN6_IS_ADDR_MULTICAST( &args->f_id.dst_ip6)) { send_reject6(args, cmd->opcode == O_REJECT ? map_icmp_unreach(cmd->arg1): cmd->arg1, hlen, (struct ip6_hdr *)ip); m = args->m; } /* FALLTHROUGH */ #endif case O_DENY: retval = IP_FW_DENY; l = 0; /* exit inner loop */ done = 1; /* exit outer loop */ break; case O_FORWARD_IP: if (args->flags & IPFW_ARGS_ETHER) break; /* not valid on layer2 pkts */ if (q != f || dyn_info.direction == MATCH_FORWARD) { struct sockaddr_in *sa; sa = &(((ipfw_insn_sa *)cmd)->sa); if (sa->sin_addr.s_addr == INADDR_ANY) { #ifdef INET6 /* * We use O_FORWARD_IP opcode for * fwd rule with tablearg, but tables * now support IPv6 addresses. And * when we are inspecting IPv6 packet, * we can use nh6 field from * table_value as next_hop6 address. */ if (is_ipv6) { struct ip_fw_nh6 *nh6; args->flags |= IPFW_ARGS_NH6; nh6 = &args->hopstore6; nh6->sin6_addr = TARG_VAL( chain, tablearg, nh6); nh6->sin6_port = sa->sin_port; nh6->sin6_scope_id = TARG_VAL( chain, tablearg, zoneid); } else #endif { args->flags |= IPFW_ARGS_NH4; args->hopstore.sin_port = sa->sin_port; sa = &args->hopstore; sa->sin_family = AF_INET; sa->sin_len = sizeof(*sa); sa->sin_addr.s_addr = htonl( TARG_VAL(chain, tablearg, nh4)); } } else { args->flags |= IPFW_ARGS_NH4PTR; args->next_hop = sa; } } retval = IP_FW_PASS; l = 0; /* exit inner loop */ done = 1; /* exit outer loop */ break; #ifdef INET6 case O_FORWARD_IP6: if (args->flags & IPFW_ARGS_ETHER) break; /* not valid on layer2 pkts */ if (q != f || dyn_info.direction == MATCH_FORWARD) { struct sockaddr_in6 *sin6; sin6 = &(((ipfw_insn_sa6 *)cmd)->sa); args->flags |= IPFW_ARGS_NH6PTR; args->next_hop6 = sin6; } retval = IP_FW_PASS; l = 0; /* exit inner loop */ done = 1; /* exit outer loop */ break; #endif case O_NETGRAPH: case O_NGTEE: set_match(args, f_pos, chain); args->rule.info = TARG(cmd->arg1, netgraph); if (V_fw_one_pass) args->rule.info |= IPFW_ONEPASS; retval = (cmd->opcode == O_NETGRAPH) ? 
IP_FW_NETGRAPH : IP_FW_NGTEE; l = 0; /* exit inner loop */ done = 1; /* exit outer loop */ break; case O_SETFIB: { uint32_t fib; IPFW_INC_RULE_COUNTER(f, pktlen); fib = TARG(cmd->arg1, fib) & 0x7FFF; if (fib >= rt_numfibs) fib = 0; M_SETFIB(m, fib); args->f_id.fib = fib; /* XXX */ l = 0; /* exit inner loop */ break; } case O_SETDSCP: { uint16_t code; code = TARG(cmd->arg1, dscp) & 0x3F; l = 0; /* exit inner loop */ if (is_ipv4) { uint16_t old; old = *(uint16_t *)ip; ip->ip_tos = (code << 2) | (ip->ip_tos & 0x03); ip->ip_sum = cksum_adjust(ip->ip_sum, old, *(uint16_t *)ip); } else if (is_ipv6) { uint8_t *v; v = &((struct ip6_hdr *)ip)->ip6_vfc; *v = (*v & 0xF0) | (code >> 2); v++; *v = (*v & 0x3F) | ((code & 0x03) << 6); } else break; IPFW_INC_RULE_COUNTER(f, pktlen); break; } case O_NAT: l = 0; /* exit inner loop */ done = 1; /* exit outer loop */ /* * Ensure that we do not invoke NAT handler for * non IPv4 packets. Libalias expects only IPv4. */ if (!is_ipv4 || !IPFW_NAT_LOADED) { retval = IP_FW_DENY; break; } struct cfg_nat *t; int nat_id; args->rule.info = 0; set_match(args, f_pos, chain); /* Check if this is 'global' nat rule */ if (cmd->arg1 == IP_FW_NAT44_GLOBAL) { retval = ipfw_nat_ptr(args, NULL, m); break; } t = ((ipfw_insn_nat *)cmd)->nat; if (t == NULL) { nat_id = TARG(cmd->arg1, nat); t = (*lookup_nat_ptr)(&chain->nat, nat_id); if (t == NULL) { retval = IP_FW_DENY; break; } if (cmd->arg1 != IP_FW_TARG) ((ipfw_insn_nat *)cmd)->nat = t; } retval = ipfw_nat_ptr(args, t, m); break; case O_REASS: { int ip_off; l = 0; /* in any case exit inner loop */ if (is_ipv6) /* IPv6 is not supported yet */ break; IPFW_INC_RULE_COUNTER(f, pktlen); ip_off = ntohs(ip->ip_off); /* if not fragmented, go to next rule */ if ((ip_off & (IP_MF | IP_OFFMASK)) == 0) break; args->m = m = ip_reass(m); /* * do IP header checksum fixup. */ if (m == NULL) { /* fragment got swallowed */ retval = IP_FW_DENY; } else { /* good, packet complete */ int hlen; ip = mtod(m, struct ip *); hlen = ip->ip_hl << 2; ip->ip_sum = 0; if (hlen == sizeof(struct ip)) ip->ip_sum = in_cksum_hdr(ip); else ip->ip_sum = in_cksum(m, hlen); retval = IP_FW_REASS; args->rule.info = 0; set_match(args, f_pos, chain); } done = 1; /* exit outer loop */ break; } case O_EXTERNAL_ACTION: l = 0; /* in any case exit inner loop */ retval = ipfw_run_eaction(chain, args, cmd, &done); /* * If both @retval and @done are zero, * consider this as rule matching and * update counters. */ if (retval == 0 && done == 0) { IPFW_INC_RULE_COUNTER(f, pktlen); /* * Reset the result of the last * dynamic state lookup. * External action can change * @args content, and it may be * used for new state lookup later. */ DYN_INFO_INIT(&dyn_info); } break; default: panic("-- unknown opcode %d\n", cmd->opcode); } /* end of switch() on opcodes */ /* * if we get here with l=0, then match is irrelevant. 
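 *
 * Worked example of the OR-block logic below: for a rule body
 *	{ insn_a or insn_b } insn_c
 * insn_a carries F_OR and insn_b (the last of the block) does not.
 * If insn_a matches, skip_or is set, insn_b is skipped, and
 * evaluation resumes at insn_c; F_NOT only inverts the result of
 * the single instruction it is attached to.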
*/ if (cmd->len & F_NOT) match = !match; if (match) { if (cmd->len & F_OR) skip_or = 1; } else { if (!(cmd->len & F_OR)) /* not an OR block, */ break; /* try next rule */ } } /* end of inner loop, scan opcodes */ #undef PULLUP_LEN if (done) break; /* next_rule:; */ /* try next rule */ } /* end of outer for, scan rules */ if (done) { struct ip_fw *rule = chain->map[f_pos]; /* Update statistics */ IPFW_INC_RULE_COUNTER(rule, pktlen); } else { retval = IP_FW_DENY; printf("ipfw: ouch!, skip past end of rules, denying packet\n"); } IPFW_PF_RUNLOCK(chain); #ifdef __FreeBSD__ if (ucred_cache != NULL) crfree(ucred_cache); #endif return (retval); pullup_failed: if (V_fw_verbose) printf("ipfw: pullup failed\n"); return (IP_FW_DENY); } /* * Set maximum number of tables that can be used in given VNET ipfw instance. */ #ifdef SYSCTL_NODE static int sysctl_ipfw_table_num(SYSCTL_HANDLER_ARGS) { int error; unsigned int ntables; ntables = V_fw_tables_max; error = sysctl_handle_int(oidp, &ntables, 0, req); /* Read operation or some error */ if ((error != 0) || (req->newptr == NULL)) return (error); return (ipfw_resize_tables(&V_layer3_chain, ntables)); } /* * Switches table namespace between global and per-set. */ static int sysctl_ipfw_tables_sets(SYSCTL_HANDLER_ARGS) { int error; unsigned int sets; sets = V_fw_tables_sets; error = sysctl_handle_int(oidp, &sets, 0, req); /* Read operation or some error */ if ((error != 0) || (req->newptr == NULL)) return (error); return (ipfw_switch_tables_namespace(&V_layer3_chain, sets)); } #endif /* * Module and VNET glue */ /* * Stuff that must be initialised only on boot or module load */ static int ipfw_init(void) { int error = 0; /* * Only print out this stuff the first time around, * when called from the sysinit code. */ printf("ipfw2 " #ifdef INET6 "(+ipv6) " #endif "initialized, divert %s, nat %s, " "default to %s, logging ", #ifdef IPDIVERT "enabled", #else "loadable", #endif #ifdef IPFIREWALL_NAT "enabled", #else "loadable", #endif default_to_accept ? "accept" : "deny"); /* * Note: V_xxx variables can be accessed here but the vnet specific * initializer may not have been called yet for the VIMAGE case. * Tuneables will have been processed. We will print out values for * the default vnet. * XXX This should all be rationalized AFTER 8.0 */ if (V_fw_verbose == 0) printf("disabled\n"); else if (V_verbose_limit == 0) printf("unlimited\n"); else printf("limited to %d packets/entry by default\n", V_verbose_limit); /* Check user-supplied table count for validness */ if (default_fw_tables > IPFW_TABLES_MAX) default_fw_tables = IPFW_TABLES_MAX; ipfw_init_sopt_handler(); ipfw_init_obj_rewriter(); ipfw_iface_init(); return (error); } /* * Called for the removal of the last instance only on module unload. */ static void ipfw_destroy(void) { ipfw_iface_destroy(); ipfw_destroy_sopt_handler(); ipfw_destroy_obj_rewriter(); printf("IP firewall unloaded\n"); } /* * Stuff that must be initialized for every instance * (including the first of course). */ static int vnet_ipfw_init(const void *unused) { int error, first; struct ip_fw *rule = NULL; struct ip_fw_chain *chain; chain = &V_layer3_chain; first = IS_DEFAULT_VNET(curvnet) ? 
1 : 0; /* First set up some values that are compile time options */ V_autoinc_step = 100; /* bounded to 1..1000 in add_rule() */ V_fw_deny_unknown_exthdrs = 1; #ifdef IPFIREWALL_VERBOSE V_fw_verbose = 1; #endif #ifdef IPFIREWALL_VERBOSE_LIMIT V_verbose_limit = IPFIREWALL_VERBOSE_LIMIT; #endif #ifdef IPFIREWALL_NAT LIST_INIT(&chain->nat); #endif /* Init shared services hash table */ ipfw_init_srv(chain); ipfw_init_counters(); /* Set initial number of tables */ V_fw_tables_max = default_fw_tables; error = ipfw_init_tables(chain, first); if (error) { printf("ipfw2: setting up tables failed\n"); free(chain->map, M_IPFW); free(rule, M_IPFW); return (ENOSPC); } IPFW_LOCK_INIT(chain); /* fill and insert the default rule */ rule = ipfw_alloc_rule(chain, sizeof(struct ip_fw)); rule->cmd_len = 1; rule->cmd[0].len = 1; rule->cmd[0].opcode = default_to_accept ? O_ACCEPT : O_DENY; chain->default_rule = rule; ipfw_add_protected_rule(chain, rule, 0); ipfw_dyn_init(chain); ipfw_eaction_init(chain, first); #ifdef LINEAR_SKIPTO ipfw_init_skipto_cache(chain); #endif ipfw_bpf_init(first); /* First set up some values that are compile time options */ V_ipfw_vnet_ready = 1; /* Open for business */ /* * Hook the sockopt handler and pfil hooks for ipv4 and ipv6. * Even if the latter two fail we still keep the module alive * because the sockopt and layer2 paths are still useful. * ipfw[6]_hook return 0 on success, ENOENT on failure, * so we can ignore the exact return value and just set a flag. * * Note that V_fw[6]_enable are manipulated by a SYSCTL_PROC so * changes in the underlying (per-vnet) variables trigger * immediate hook()/unhook() calls. * In layer2 we have the same behaviour, except that V_ether_ipfw * is checked on each packet because there are no pfil hooks. */ V_ip_fw_ctl_ptr = ipfw_ctl3; error = ipfw_attach_hooks(); return (error); } /* * Called for the removal of each instance. */ static int vnet_ipfw_uninit(const void *unused) { struct ip_fw *reap; struct ip_fw_chain *chain = &V_layer3_chain; int i, last; V_ipfw_vnet_ready = 0; /* tell new callers to go away */ /* * disconnect from ipv4, ipv6, layer2 and sockopt. * Then grab, release and grab again the WLOCK so we make * sure the update is propagated and nobody will be in. */ ipfw_detach_hooks(); V_ip_fw_ctl_ptr = NULL; last = IS_DEFAULT_VNET(curvnet) ? 1 : 0; IPFW_UH_WLOCK(chain); IPFW_UH_WUNLOCK(chain); ipfw_dyn_uninit(0); /* run the callout_drain */ IPFW_UH_WLOCK(chain); reap = NULL; IPFW_WLOCK(chain); for (i = 0; i < chain->n_rules; i++) ipfw_reap_add(chain, &reap, chain->map[i]); free(chain->map, M_IPFW); #ifdef LINEAR_SKIPTO ipfw_destroy_skipto_cache(chain); #endif IPFW_WUNLOCK(chain); IPFW_UH_WUNLOCK(chain); ipfw_destroy_tables(chain, last); ipfw_eaction_uninit(chain, last); if (reap != NULL) ipfw_reap_rules(reap); vnet_ipfw_iface_destroy(chain); ipfw_destroy_srv(chain); IPFW_LOCK_DESTROY(chain); ipfw_dyn_uninit(1); /* free the remaining parts */ ipfw_destroy_counters(); ipfw_bpf_uninit(last); return (0); } /* * Module event handler. * In general we have the choice of handling most of these events by the * event handler or by the (VNET_)SYS(UN)INIT handlers. I have chosen to * use the SYSINIT handlers as they are more capable of expressing the * flow of control during module and vnet operations, so this is just * a skeleton. Note there is no SYSINIT equivalent of the module * SHUTDOWN handler, but we don't have anything to do in that case anyhow. 
*/ static int ipfw_modevent(module_t mod, int type, void *unused) { int err = 0; switch (type) { case MOD_LOAD: /* Called once at module load or * system boot if compiled in. */ break; case MOD_QUIESCE: /* Called before unload. May veto unloading. */ break; case MOD_UNLOAD: /* Called during unload. */ break; case MOD_SHUTDOWN: /* Called during system shutdown. */ break; default: err = EOPNOTSUPP; break; } return err; } static moduledata_t ipfwmod = { "ipfw", ipfw_modevent, 0 }; /* Define startup order. */ #define IPFW_SI_SUB_FIREWALL SI_SUB_PROTO_FIREWALL #define IPFW_MODEVENT_ORDER (SI_ORDER_ANY - 255) /* On boot slot in here. */ #define IPFW_MODULE_ORDER (IPFW_MODEVENT_ORDER + 1) /* A little later. */ #define IPFW_VNET_ORDER (IPFW_MODEVENT_ORDER + 2) /* Later still. */ DECLARE_MODULE(ipfw, ipfwmod, IPFW_SI_SUB_FIREWALL, IPFW_MODEVENT_ORDER); FEATURE(ipfw_ctl3, "ipfw new sockopt calls"); MODULE_VERSION(ipfw, 3); /* should declare some dependencies here */ /* * Starting up. Done in order after ipfwmod() has been called. * VNET_SYSINIT is also called for each existing vnet and each new vnet. */ SYSINIT(ipfw_init, IPFW_SI_SUB_FIREWALL, IPFW_MODULE_ORDER, ipfw_init, NULL); VNET_SYSINIT(vnet_ipfw_init, IPFW_SI_SUB_FIREWALL, IPFW_VNET_ORDER, vnet_ipfw_init, NULL); /* * Closing up shop. These are done in REVERSE ORDER, but still * after ipfwmod() has been called. Not called on reboot. * VNET_SYSUNINIT is also called for each exiting vnet as it exits. * or when the module is unloaded. */ SYSUNINIT(ipfw_destroy, IPFW_SI_SUB_FIREWALL, IPFW_MODULE_ORDER, ipfw_destroy, NULL); VNET_SYSUNINIT(vnet_ipfw_uninit, IPFW_SI_SUB_FIREWALL, IPFW_VNET_ORDER, vnet_ipfw_uninit, NULL); /* end of file */ Index: user/ngie/bug-237403/sys/opencrypto/cbc_mac.c =================================================================== --- user/ngie/bug-237403/sys/opencrypto/cbc_mac.c (revision 346925) +++ user/ngie/bug-237403/sys/opencrypto/cbc_mac.c (revision 346926) @@ -1,268 +1,265 @@ /* * Copyright (c) 2018-2019 iXsystems Inc. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include /* * Given two CCM_CBC_BLOCK_LEN blocks, xor * them into dst, and then encrypt dst. 
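 *
 * This is one step of the CCM CBC-MAC chain (RFC 3610):
 *
 *	X_1     = E_K(B_0)
 *	X_(i+1) = E_K(X_i ^ B_i)	for i = 1, ..., n
 *
 * where dst holds the running X_i and src supplies the next block B_i.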
*/ static void xor_and_encrypt(struct aes_cbc_mac_ctx *ctx, const uint8_t *src, uint8_t *dst) { const uint64_t *b1; uint64_t *b2; uint64_t temp_block[CCM_CBC_BLOCK_LEN/sizeof(uint64_t)]; b1 = (const uint64_t*)src; b2 = (uint64_t*)dst; for (size_t count = 0; count < CCM_CBC_BLOCK_LEN/sizeof(uint64_t); count++) { temp_block[count] = b1[count] ^ b2[count]; } rijndaelEncrypt(ctx->keysched, ctx->rounds, (void*)temp_block, dst); } void AES_CBC_MAC_Init(struct aes_cbc_mac_ctx *ctx) { bzero(ctx, sizeof(*ctx)); } void AES_CBC_MAC_Setkey(struct aes_cbc_mac_ctx *ctx, const uint8_t *key, uint16_t klen) { ctx->rounds = rijndaelKeySetupEnc(ctx->keysched, key, klen * 8); } /* * This is called to set the nonce, aka IV. * Before this call, the authDataLength and cryptDataLength fields * MUST have been set. Sadly, there's no way to return an error. * * The CBC-MAC algorithm requires that the first block contain the * nonce, as well as information about the sizes and lengths involved. */ void AES_CBC_MAC_Reinit(struct aes_cbc_mac_ctx *ctx, const uint8_t *nonce, uint16_t nonceLen) { uint8_t b0[CCM_CBC_BLOCK_LEN]; uint8_t *bp = b0, flags = 0; uint8_t L = 0; uint64_t dataLength = ctx->cryptDataLength; - - KASSERT(ctx->authDataLength != 0 || ctx->cryptDataLength != 0, - ("Auth Data and Data lengths cannot both be 0")); KASSERT(nonceLen >= 7 && nonceLen <= 13, ("nonceLen must be between 7 and 13 bytes")); ctx->nonce = nonce; ctx->nonceLength = nonceLen; ctx->authDataCount = 0; ctx->blockIndex = 0; explicit_bzero(ctx->staging_block, sizeof(ctx->staging_block)); /* * Need to determine the L field value. This is the number of * bytes needed to specify the length of the message; the length * is whatever is left in the 16 bytes after specifying flags and * the nonce. */ L = 15 - nonceLen; flags = ((ctx->authDataLength > 0) << 6) + (((AES_CBC_MAC_HASH_LEN - 2) / 2) << 3) + L - 1; /* * Now we need to set up the first block, which has flags, nonce, * and the message length. */ b0[0] = flags; bcopy(nonce, b0 + 1, nonceLen); bp = b0 + 1 + nonceLen; /* Need to copy L' [aka L-1] bytes of cryptDataLength */ for (uint8_t *dst = b0 + sizeof(b0) - 1; dst >= bp; dst--) { *dst = dataLength; dataLength >>= 8; } /* Now need to encrypt b0 */ rijndaelEncrypt(ctx->keysched, ctx->rounds, b0, ctx->block); /* If there is auth data, we need to set up the staging block */ if (ctx->authDataLength) { size_t addLength; if (ctx->authDataLength < ((1<<16) - (1<<8))) { uint16_t sizeVal = htobe16(ctx->authDataLength); bcopy(&sizeVal, ctx->staging_block, sizeof(sizeVal)); addLength = sizeof(sizeVal); } else if (ctx->authDataLength < (1ULL<<32)) { uint32_t sizeVal = htobe32(ctx->authDataLength); ctx->staging_block[0] = 0xff; ctx->staging_block[1] = 0xfe; bcopy(&sizeVal, ctx->staging_block+2, sizeof(sizeVal)); addLength = 2 + sizeof(sizeVal); } else { uint64_t sizeVal = htobe64(ctx->authDataLength); ctx->staging_block[0] = 0xff; ctx->staging_block[1] = 0xff; bcopy(&sizeVal, ctx->staging_block+2, sizeof(sizeVal)); addLength = 2 + sizeof(sizeVal); } ctx->blockIndex = addLength; /* * The length descriptor goes into the AAD buffer, so we * need to account for it. */ ctx->authDataLength += addLength; ctx->authDataCount = addLength; } } int AES_CBC_MAC_Update(struct aes_cbc_mac_ctx *ctx, const uint8_t *data, uint16_t length) { size_t copy_amt; /* * This will be called in one of two phases: * (1) Applying authentication data, or * (2) Applying the payload data. 
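 *
 * A sketch of the call sequence a consumer (e.g. the OCF glue) is
 * expected to follow; the lengths and the 12-byte nonce are
 * illustrative only:
 *
 *	AES_CBC_MAC_Setkey(&ctx, key, 16);
 *	ctx.authDataLength = aad_len;
 *	ctx.cryptDataLength = payload_len;	// both set before Reinit
 *	AES_CBC_MAC_Reinit(&ctx, nonce, 12);
 *	AES_CBC_MAC_Update(&ctx, aad, aad_len);		// phase (1)
 *	AES_CBC_MAC_Update(&ctx, payload, payload_len);	// phase (2)
 *	AES_CBC_MAC_Final(tag, &ctx);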
* * Because CBC-MAC puts the authentication data size before the * data, subsequent calls won't be block-size-aligned. Which * complicates things a fair bit. * * The payload data doesn't have that problem. */ if (ctx->authDataCount < ctx->authDataLength) { /* * We need to process data as authentication data. * Since we may be out of sync, we may also need * to pad out the staging block. */ const uint8_t *ptr = data; while (length > 0) { copy_amt = MIN(length, sizeof(ctx->staging_block) - ctx->blockIndex); bcopy(ptr, ctx->staging_block + ctx->blockIndex, copy_amt); ptr += copy_amt; length -= copy_amt; ctx->authDataCount += copy_amt; ctx->blockIndex += copy_amt; ctx->blockIndex %= sizeof(ctx->staging_block); if (ctx->blockIndex == 0 || ctx->authDataCount == ctx->authDataLength) { /* * We're done with this block, so we * xor staging_block with block, and then * encrypt it. */ xor_and_encrypt(ctx, ctx->staging_block, ctx->block); bzero(ctx->staging_block, sizeof(ctx->staging_block)); ctx->blockIndex = 0; if (ctx->authDataCount >= ctx->authDataLength) break; } } /* * We'd like to be able to check length == 0 and return * here, but the way OCF calls us, length is always * blksize (16, in this case). So we have to count on * the fact that OCF calls us separately for the AAD and * for the real data. */ return (0); } /* * If we're here, then we're encoding payload data. * This is marginally easier, except that _Update can * be called with non-aligned update lengths. As a result, * we still need to use the staging block. */ KASSERT((length + ctx->cryptDataCount) <= ctx->cryptDataLength, ("More encryption data than allowed")); while (length) { uint8_t *ptr; copy_amt = MIN(sizeof(ctx->staging_block) - ctx->blockIndex, length); ptr = ctx->staging_block + ctx->blockIndex; bcopy(data, ptr, copy_amt); data += copy_amt; ctx->blockIndex += copy_amt; ctx->cryptDataCount += copy_amt; length -= copy_amt; if (ctx->blockIndex == sizeof(ctx->staging_block)) { /* We've got a full block */ xor_and_encrypt(ctx, ctx->staging_block, ctx->block); ctx->blockIndex = 0; bzero(ctx->staging_block, sizeof(ctx->staging_block)); } } return (0); } void AES_CBC_MAC_Final(uint8_t *buf, struct aes_cbc_mac_ctx *ctx) { uint8_t s0[CCM_CBC_BLOCK_LEN]; /* * We first need to check to see if we've got any data * left over to encrypt. */ if (ctx->blockIndex != 0) { xor_and_encrypt(ctx, ctx->staging_block, ctx->block); ctx->cryptDataCount += ctx->blockIndex; ctx->blockIndex = 0; explicit_bzero(ctx->staging_block, sizeof(ctx->staging_block)); } bzero(s0, sizeof(s0)); s0[0] = (15 - ctx->nonceLength) - 1; bcopy(ctx->nonce, s0 + 1, ctx->nonceLength); rijndaelEncrypt(ctx->keysched, ctx->rounds, s0, s0); for (size_t indx = 0; indx < AES_CBC_MAC_HASH_LEN; indx++) buf[indx] = ctx->block[indx] ^ s0[indx]; explicit_bzero(s0, sizeof(s0)); } Index: user/ngie/bug-237403/sys/powerpc/aim/aim_machdep.c =================================================================== --- user/ngie/bug-237403/sys/powerpc/aim/aim_machdep.c (revision 346925) +++ user/ngie/bug-237403/sys/powerpc/aim/aim_machdep.c (revision 346926) @@ -1,694 +1,695 @@ /*- * Copyright (C) 1995, 1996 Wolfgang Solfrank. * Copyright (C) 1995, 1996 TooLs GmbH. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. 
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by TooLs GmbH. * 4. The name of TooLs GmbH may not be used to endorse or promote products * derived from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY TOOLS GMBH ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL TOOLS GMBH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ /*- * Copyright (C) 2001 Benno Rice * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY Benno Rice ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL TOOLS GMBH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * $NetBSD: machdep.c,v 1.74.2.1 2000/11/01 16:13:48 tv Exp $ */ #include __FBSDID("$FreeBSD$"); #include "opt_ddb.h" #include "opt_kstack_pages.h" #include "opt_platform.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifndef __powerpc64__ #include #endif #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef __powerpc64__ #include "mmu_oea64.h" #endif #ifndef __powerpc64__ struct bat battable[16]; #endif #ifndef __powerpc64__ /* Bits for running on 64-bit systems in 32-bit mode. 
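 * (aim_cpu_init() below probes for a 64-bit CPU with testppc64, copies the restorebridge snippet in front of the non-generic trap handlers, and patches the two rfi instances to rfid when PPC_FEATURE_64 is set.)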
*/ extern void *testppc64, *testppc64size; extern void *restorebridge, *restorebridgesize; extern void *rfid_patch, *rfi_patch1, *rfi_patch2; extern void *trapcode64; extern Elf_Addr _GLOBAL_OFFSET_TABLE_[]; #endif extern void *rstcode, *rstcodeend; extern void *trapcode, *trapcodeend; extern void *hypertrapcode, *hypertrapcodeend; extern void *generictrap, *generictrap64; extern void *alitrap, *aliend; extern void *dsitrap, *dsiend; extern void *decrint, *decrsize; extern void *extint, *extsize; extern void *dblow, *dbend; extern void *imisstrap, *imisssize; extern void *dlmisstrap, *dlmisssize; extern void *dsmisstrap, *dsmisssize; extern void *ap_pcpu; extern void __restartkernel(vm_offset_t, vm_offset_t, vm_offset_t, void *, uint32_t, register_t offset, register_t msr); void aim_early_init(vm_offset_t fdt, vm_offset_t toc, vm_offset_t ofentry, void *mdp, uint32_t mdp_cookie); void aim_cpu_init(vm_offset_t toc); void aim_early_init(vm_offset_t fdt, vm_offset_t toc, vm_offset_t ofentry, void *mdp, uint32_t mdp_cookie) { register_t scratch; /* * If running from an FDT, make sure we are in real mode to avoid * tromping on firmware page tables. Everything in the kernel assumes * 1:1 mappings out of firmware, so this won't break anything not * already broken. This doesn't work if there is live OF, since OF * may internally use non-1:1 mappings. */ if (ofentry == 0) mtmsr(mfmsr() & ~(PSL_IR | PSL_DR)); #ifdef __powerpc64__ /* * If in real mode, relocate to high memory so that the kernel * can execute from the direct map. */ if (!(mfmsr() & PSL_DR) && (vm_offset_t)&aim_early_init < DMAP_BASE_ADDRESS) __restartkernel(fdt, 0, ofentry, mdp, mdp_cookie, DMAP_BASE_ADDRESS, mfmsr()); #endif /* Various very early CPU fix ups */ switch (mfpvr() >> 16) { /* * PowerPC 970 CPUs have a misfeature requested by Apple that * makes them pretend they have a 32-byte cacheline. Turn this * off before we measure the cacheline size. */ case IBM970: case IBM970FX: case IBM970MP: case IBM970GX: scratch = mfspr(SPR_HID5); scratch &= ~HID5_970_DCBZ_SIZE_HI; mtspr(SPR_HID5, scratch); break; #ifdef __powerpc64__ case IBMPOWER7: case IBMPOWER7PLUS: case IBMPOWER8: case IBMPOWER8E: + case IBMPOWER8NVL: case IBMPOWER9: /* XXX: get from ibm,slb-size in device tree */ n_slbs = 32; break; #endif } } void aim_cpu_init(vm_offset_t toc) { size_t trap_offset, trapsize; vm_offset_t trap; register_t msr; uint8_t *cache_check; int cacheline_warn; #ifndef __powerpc64__ register_t scratch; int ppc64; #endif trap_offset = 0; cacheline_warn = 0; /* General setup for AIM CPUs */ psl_kernset = PSL_EE | PSL_ME | PSL_IR | PSL_DR | PSL_RI; #ifdef __powerpc64__ psl_kernset |= PSL_SF; if (mfmsr() & PSL_HV) psl_kernset |= PSL_HV; #endif psl_userset = psl_kernset | PSL_PR; #ifdef __powerpc64__ psl_userset32 = psl_userset & ~PSL_SF; #endif /* Bits that users aren't allowed to change */ psl_userstatic = ~(PSL_VEC | PSL_FP | PSL_FE0 | PSL_FE1); /* * Mask bits from the SRR1 that aren't really the MSR: * Bits 1-4, 10-15 (ppc32), 33-36, 42-47 (ppc64) */ psl_userstatic &= ~0x783f0000UL; /* * Initialize the interrupt tables and figure out our cache line * size and whether or not we need the 64-bit bridge code. */ /* * Disable translation in case the vector area hasn't been * mapped (G5). Note that no OFW calls can be made until * translation is re-enabled. */ msr = mfmsr(); mtmsr((msr & ~(PSL_IR | PSL_DR)) | PSL_RI); /* * Measure the cacheline size using dcbz * * Use EXC_PGM as a playground. 
We are about to overwrite it * anyway, we know it exists, and we know it is cache-aligned. */ cache_check = (void *)EXC_PGM; for (cacheline_size = 0; cacheline_size < 0x100; cacheline_size++) cache_check[cacheline_size] = 0xff; __asm __volatile("dcbz 0,%0":: "r" (cache_check) : "memory"); /* Find the first byte dcbz did not zero to get the cache line size */ for (cacheline_size = 0; cacheline_size < 0x100 && cache_check[cacheline_size] == 0; cacheline_size++); /* Work around psim bug */ if (cacheline_size == 0) { cacheline_warn = 1; cacheline_size = 32; } #ifndef __powerpc64__ /* * Figure out whether we need to use the 64 bit PMAP. This works by * executing an instruction that is only legal on 64-bit PPC (mtmsrd), * and setting ppc64 = 0 if that causes a trap. */ ppc64 = 1; bcopy(&testppc64, (void *)EXC_PGM, (size_t)&testppc64size); __syncicache((void *)EXC_PGM, (size_t)&testppc64size); __asm __volatile("\ mfmsr %0; \ mtsprg2 %1; \ \ mtmsrd %0; \ mfsprg2 %1;" : "=r"(scratch), "=r"(ppc64)); if (ppc64) cpu_features |= PPC_FEATURE_64; /* * Now copy restorebridge into all the handlers, if necessary, * and set up the trap tables. */ if (cpu_features & PPC_FEATURE_64) { /* Patch the two instances of rfi -> rfid */ bcopy(&rfid_patch,&rfi_patch1,4); #ifdef KDB /* rfi_patch2 is at the end of dbleave */ bcopy(&rfid_patch,&rfi_patch2,4); #endif } #else /* powerpc64 */ cpu_features |= PPC_FEATURE_64; #endif trapsize = (size_t)&trapcodeend - (size_t)&trapcode; /* * Copy generic handler into every possible trap. Special cases will get * different ones in a minute. */ for (trap = EXC_RST; trap < EXC_LAST; trap += 0x20) bcopy(&trapcode, (void *)trap, trapsize); #ifndef __powerpc64__ if (cpu_features & PPC_FEATURE_64) { /* * Copy a code snippet to restore 32-bit bridge mode * to the top of every non-generic trap handler */ trap_offset += (size_t)&restorebridgesize; bcopy(&restorebridge, (void *)EXC_RST, trap_offset); bcopy(&restorebridge, (void *)EXC_DSI, trap_offset); bcopy(&restorebridge, (void *)EXC_ALI, trap_offset); bcopy(&restorebridge, (void *)EXC_PGM, trap_offset); bcopy(&restorebridge, (void *)EXC_MCHK, trap_offset); bcopy(&restorebridge, (void *)EXC_TRC, trap_offset); bcopy(&restorebridge, (void *)EXC_BPT, trap_offset); } #else trapsize = (size_t)&hypertrapcodeend - (size_t)&hypertrapcode; bcopy(&hypertrapcode, (void *)(EXC_HEA + trap_offset), trapsize); bcopy(&hypertrapcode, (void *)(EXC_HMI + trap_offset), trapsize); bcopy(&hypertrapcode, (void *)(EXC_HVI + trap_offset), trapsize); bcopy(&hypertrapcode, (void *)(EXC_SOFT_PATCH + trap_offset), trapsize); #endif bcopy(&rstcode, (void *)(EXC_RST + trap_offset), (size_t)&rstcodeend - (size_t)&rstcode); #ifdef KDB bcopy(&dblow, (void *)(EXC_MCHK + trap_offset), (size_t)&dbend - (size_t)&dblow); bcopy(&dblow, (void *)(EXC_PGM + trap_offset), (size_t)&dbend - (size_t)&dblow); bcopy(&dblow, (void *)(EXC_TRC + trap_offset), (size_t)&dbend - (size_t)&dblow); bcopy(&dblow, (void *)(EXC_BPT + trap_offset), (size_t)&dbend - (size_t)&dblow); #endif bcopy(&alitrap, (void *)(EXC_ALI + trap_offset), (size_t)&aliend - (size_t)&alitrap); bcopy(&dsitrap, (void *)(EXC_DSI + trap_offset), (size_t)&dsiend - (size_t)&dsitrap); #ifdef __powerpc64__ /* Set TOC base so that the interrupt code can get at it */ *((void **)TRAP_GENTRAP) = &generictrap; *((register_t *)TRAP_TOCBASE) = toc; #else /* Set branch address for trap code */ if (cpu_features & PPC_FEATURE_64) *((void **)TRAP_GENTRAP) = &generictrap64; else *((void **)TRAP_GENTRAP) = &generictrap; *((void 
**)TRAP_TOCBASE) = _GLOBAL_OFFSET_TABLE_; /* G2-specific TLB miss helper handlers */ bcopy(&imisstrap, (void *)EXC_IMISS, (size_t)&imisssize); bcopy(&dlmisstrap, (void *)EXC_DLMISS, (size_t)&dlmisssize); bcopy(&dsmisstrap, (void *)EXC_DSMISS, (size_t)&dsmisssize); #endif __syncicache(EXC_RSVD, EXC_LAST - EXC_RSVD); /* * Restore MSR */ mtmsr(msr); /* Warn if cacheline size was not determined */ if (cacheline_warn == 1) { printf("WARNING: cacheline size undetermined, setting to 32\n"); } /* * Initialise virtual memory. Use BUS_PROBE_GENERIC priority * in case the platform module had a better idea of what we * should do. */ if (cpu_features & PPC_FEATURE_64) pmap_mmu_install(MMU_TYPE_G5, BUS_PROBE_GENERIC); else pmap_mmu_install(MMU_TYPE_OEA, BUS_PROBE_GENERIC); } /* * Shutdown the CPU as much as possible. */ void cpu_halt(void) { OF_exit(); } int ptrace_single_step(struct thread *td) { struct trapframe *tf; tf = td->td_frame; tf->srr1 |= PSL_SE; return (0); } int ptrace_clear_single_step(struct thread *td) { struct trapframe *tf; tf = td->td_frame; tf->srr1 &= ~PSL_SE; return (0); } void kdb_cpu_clear_singlestep(void) { kdb_frame->srr1 &= ~PSL_SE; } void kdb_cpu_set_singlestep(void) { kdb_frame->srr1 |= PSL_SE; } /* * Initialise a struct pcpu. */ void cpu_pcpu_init(struct pcpu *pcpu, int cpuid, size_t sz) { #ifdef __powerpc64__ /* Copy the SLB contents from the current CPU */ memcpy(pcpu->pc_aim.slb, PCPU_GET(aim.slb), sizeof(pcpu->pc_aim.slb)); #endif } #ifndef __powerpc64__ uint64_t va_to_vsid(pmap_t pm, vm_offset_t va) { return ((pm->pm_sr[(uintptr_t)va >> ADDR_SR_SHFT]) & SR_VSID_MASK); } #endif /* * These functions need to provide addresses that both (a) work in real mode * (or whatever mode/circumstances the kernel is in, in early boot (now)) and * (b) can still, in principle, work once the kernel is going. Because these * rely on existing mappings/real mode, unmap is a no-op. */ vm_offset_t pmap_early_io_map(vm_paddr_t pa, vm_size_t size) { KASSERT(!pmap_bootstrapped, ("Not available after PMAP started!")); /* * If we have the MMU up in early boot, assume it is 1:1. Otherwise, * try to get the address in a memory region compatible with the * direct map for efficiency later. */ if (mfmsr() & PSL_DR) return (pa); else return (DMAP_BASE_ADDRESS + pa); } void pmap_early_io_unmap(vm_offset_t va, vm_size_t size) { KASSERT(!pmap_bootstrapped, ("Not available after PMAP started!")); } /* From p3-53 of the MPC7450 RISC Microprocessor Family Reference Manual */ void flush_disable_caches(void) { register_t msr; register_t msscr0; register_t cache_reg; volatile uint32_t *memp; uint32_t temp; int i; int x; msr = mfmsr(); powerpc_sync(); mtmsr(msr & ~(PSL_EE | PSL_DR)); msscr0 = mfspr(SPR_MSSCR0); msscr0 &= ~MSSCR0_L2PFE; mtspr(SPR_MSSCR0, msscr0); powerpc_sync(); isync(); __asm__ __volatile__("dssall; sync"); powerpc_sync(); isync(); __asm__ __volatile__("dcbf 0,%0" :: "r"(0)); __asm__ __volatile__("dcbf 0,%0" :: "r"(0)); __asm__ __volatile__("dcbf 0,%0" :: "r"(0)); /* Lock the L1 Data cache. */ mtspr(SPR_LDSTCR, mfspr(SPR_LDSTCR) | 0xFF); powerpc_sync(); isync(); mtspr(SPR_LDSTCR, 0); /* * Perform this in two stages: Flush the cache starting in RAM, then do it * from ROM.
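 * (The ROM pass appears to walk the L1 one way at a time: each LDSTCR value below, 0xfe, 0xfd, ... 0x7f, leaves exactly one way unlocked while 128 32-byte lines are read and flushed.)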
*/ memp = (volatile uint32_t *)0x00000000; for (i = 0; i < 128 * 1024; i++) { temp = *memp; __asm__ __volatile__("dcbf 0,%0" :: "r"(memp)); memp += 32/sizeof(*memp); } memp = (volatile uint32_t *)0xfff00000; x = 0xfe; for (; x != 0xff;) { mtspr(SPR_LDSTCR, x); for (i = 0; i < 128; i++) { temp = *memp; __asm__ __volatile__("dcbf 0,%0" :: "r"(memp)); memp += 32/sizeof(*memp); } x = ((x << 1) | 1) & 0xff; } mtspr(SPR_LDSTCR, 0); cache_reg = mfspr(SPR_L2CR); if (cache_reg & L2CR_L2E) { cache_reg &= ~(L2CR_L2IO_7450 | L2CR_L2DO_7450); mtspr(SPR_L2CR, cache_reg); powerpc_sync(); mtspr(SPR_L2CR, cache_reg | L2CR_L2HWF); while (mfspr(SPR_L2CR) & L2CR_L2HWF) ; /* Busy wait for cache to flush */ powerpc_sync(); cache_reg &= ~L2CR_L2E; mtspr(SPR_L2CR, cache_reg); powerpc_sync(); mtspr(SPR_L2CR, cache_reg | L2CR_L2I); powerpc_sync(); while (mfspr(SPR_L2CR) & L2CR_L2I) ; /* Busy wait for L2 cache invalidate */ powerpc_sync(); } cache_reg = mfspr(SPR_L3CR); if (cache_reg & L3CR_L3E) { cache_reg &= ~(L3CR_L3IO | L3CR_L3DO); mtspr(SPR_L3CR, cache_reg); powerpc_sync(); mtspr(SPR_L3CR, cache_reg | L3CR_L3HWF); while (mfspr(SPR_L3CR) & L3CR_L3HWF) ; /* Busy wait for cache to flush */ powerpc_sync(); cache_reg &= ~L3CR_L3E; mtspr(SPR_L3CR, cache_reg); powerpc_sync(); mtspr(SPR_L3CR, cache_reg | L3CR_L3I); powerpc_sync(); while (mfspr(SPR_L3CR) & L3CR_L3I) ; /* Busy wait for L3 cache invalidate */ powerpc_sync(); } mtspr(SPR_HID0, mfspr(SPR_HID0) & ~HID0_DCE); powerpc_sync(); isync(); mtmsr(msr); } void cpu_sleep() { static u_quad_t timebase = 0; static register_t sprgs[4]; static register_t srrs[2]; jmp_buf resetjb; struct thread *fputd; struct thread *vectd; register_t hid0; register_t msr; register_t saved_msr; ap_pcpu = pcpup; PCPU_SET(restore, &resetjb); saved_msr = mfmsr(); fputd = PCPU_GET(fputhread); vectd = PCPU_GET(vecthread); if (fputd != NULL) save_fpu(fputd); if (vectd != NULL) save_vec(vectd); if (setjmp(resetjb) == 0) { sprgs[0] = mfspr(SPR_SPRG0); sprgs[1] = mfspr(SPR_SPRG1); sprgs[2] = mfspr(SPR_SPRG2); sprgs[3] = mfspr(SPR_SPRG3); srrs[0] = mfspr(SPR_SRR0); srrs[1] = mfspr(SPR_SRR1); timebase = mftb(); powerpc_sync(); flush_disable_caches(); hid0 = mfspr(SPR_HID0); hid0 = (hid0 & ~(HID0_DOZE | HID0_NAP)) | HID0_SLEEP; powerpc_sync(); isync(); msr = mfmsr() | PSL_POW; mtspr(SPR_HID0, hid0); powerpc_sync(); while (1) mtmsr(msr); } platform_smp_timebase_sync(timebase, 0); PCPU_SET(curthread, curthread); PCPU_SET(curpcb, curthread->td_pcb); pmap_activate(curthread); powerpc_sync(); mtspr(SPR_SPRG0, sprgs[0]); mtspr(SPR_SPRG1, sprgs[1]); mtspr(SPR_SPRG2, sprgs[2]); mtspr(SPR_SPRG3, sprgs[3]); mtspr(SPR_SRR0, srrs[0]); mtspr(SPR_SRR1, srrs[1]); mtmsr(saved_msr); if (fputd == curthread) enable_fpu(curthread); if (vectd == curthread) enable_vec(curthread); powerpc_sync(); } Index: user/ngie/bug-237403/sys/powerpc/aim/mp_cpudep.c =================================================================== --- user/ngie/bug-237403/sys/powerpc/aim/mp_cpudep.c (revision 346925) +++ user/ngie/bug-237403/sys/powerpc/aim/mp_cpudep.c (revision 346926) @@ -1,420 +1,428 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2008 Marcel Moolenaar * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. 
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include void *ap_pcpu; static register_t bsp_state[8] __aligned(8); static void cpudep_save_config(void *dummy); SYSINIT(cpu_save_config, SI_SUB_CPU, SI_ORDER_ANY, cpudep_save_config, NULL); void cpudep_ap_early_bootstrap(void) { #ifndef __powerpc64__ register_t reg; #endif switch (mfpvr() >> 16) { case IBM970: case IBM970FX: case IBM970MP: /* Restore HID4 and HID5, which are necessary for the MMU */ #ifdef __powerpc64__ mtspr(SPR_HID4, bsp_state[2]); powerpc_sync(); isync(); mtspr(SPR_HID5, bsp_state[3]); powerpc_sync(); isync(); #else __asm __volatile("ld %0, 16(%2); sync; isync; \ mtspr %1, %0; sync; isync;" : "=r"(reg) : "K"(SPR_HID4), "b"(bsp_state)); __asm __volatile("ld %0, 24(%2); sync; isync; \ mtspr %1, %0; sync; isync;" : "=r"(reg) : "K"(SPR_HID5), "b"(bsp_state)); #endif powerpc_sync(); break; case IBMPOWER8: case IBMPOWER8E: + case IBMPOWER8NVL: case IBMPOWER9: #ifdef __powerpc64__ if (mfmsr() & PSL_HV) { isync(); /* * Direct interrupts to SRR instead of HSRR and * reset LPCR otherwise */ mtspr(SPR_LPID, 0); isync(); mtspr(SPR_LPCR, lpcr); isync(); + + /* + * Nuke FSCR, to be managed on a per-process basis + * later. + */ + mtspr(SPR_FSCR, 0); } #endif break; } __asm __volatile("mtsprg 0, %0" :: "r"(ap_pcpu)); powerpc_sync(); } uintptr_t cpudep_ap_bootstrap(void) { register_t msr, sp; msr = psl_kernset & ~PSL_EE; mtmsr(msr); pcpup->pc_curthread = pcpup->pc_idlethread; #ifdef __powerpc64__ __asm __volatile("mr 13,%0" :: "r"(pcpup->pc_curthread)); #else __asm __volatile("mr 2,%0" :: "r"(pcpup->pc_curthread)); #endif pcpup->pc_curpcb = pcpup->pc_curthread->td_pcb; sp = pcpup->pc_curpcb->pcb_sp; return (sp); } static register_t mpc74xx_l2_enable(register_t l2cr_config) { register_t ccr, bit; uint16_t vers; vers = mfpvr() >> 16; switch (vers) { case MPC7400: case MPC7410: bit = L2CR_L2IP; break; default: bit = L2CR_L2I; break; } ccr = mfspr(SPR_L2CR); if (ccr & L2CR_L2E) return (ccr); /* Configure L2 cache. */ ccr = l2cr_config & ~L2CR_L2E; mtspr(SPR_L2CR, ccr | L2CR_L2I); do { ccr = mfspr(SPR_L2CR); } while (ccr & bit); powerpc_sync(); mtspr(SPR_L2CR, l2cr_config); powerpc_sync(); return (l2cr_config); } static register_t mpc745x_l3_enable(register_t l3cr_config) { register_t ccr; ccr = mfspr(SPR_L3CR); if (ccr & L3CR_L3E) return (ccr); /* Configure L3 cache. 
*/ ccr = l3cr_config & ~(L3CR_L3E | L3CR_L3I | L3CR_L3PE | L3CR_L3CLKEN); mtspr(SPR_L3CR, ccr); ccr |= 0x4000000; /* Magic, but documented. */ mtspr(SPR_L3CR, ccr); ccr |= L3CR_L3CLKEN; mtspr(SPR_L3CR, ccr); mtspr(SPR_L3CR, ccr | L3CR_L3I); while (mfspr(SPR_L3CR) & L3CR_L3I) ; mtspr(SPR_L3CR, ccr & ~L3CR_L3CLKEN); powerpc_sync(); DELAY(100); mtspr(SPR_L3CR, ccr); powerpc_sync(); DELAY(100); ccr |= L3CR_L3E; mtspr(SPR_L3CR, ccr); powerpc_sync(); return(ccr); } static register_t mpc74xx_l1d_enable(void) { register_t hid; hid = mfspr(SPR_HID0); if (hid & HID0_DCE) return (hid); /* Enable L1 D-cache */ hid |= HID0_DCE; powerpc_sync(); mtspr(SPR_HID0, hid | HID0_DCFI); powerpc_sync(); return (hid); } static register_t mpc74xx_l1i_enable(void) { register_t hid; hid = mfspr(SPR_HID0); if (hid & HID0_ICE) return (hid); /* Enable L1 I-cache */ hid |= HID0_ICE; isync(); mtspr(SPR_HID0, hid | HID0_ICFI); isync(); return (hid); } static void cpudep_save_config(void *dummy) { uint16_t vers; vers = mfpvr() >> 16; switch(vers) { case IBM970: case IBM970FX: case IBM970MP: #ifdef __powerpc64__ bsp_state[0] = mfspr(SPR_HID0); bsp_state[1] = mfspr(SPR_HID1); bsp_state[2] = mfspr(SPR_HID4); bsp_state[3] = mfspr(SPR_HID5); #else __asm __volatile ("mfspr %0,%2; mr %1,%0; srdi %0,%0,32" : "=r" (bsp_state[0]),"=r" (bsp_state[1]) : "K" (SPR_HID0)); __asm __volatile ("mfspr %0,%2; mr %1,%0; srdi %0,%0,32" : "=r" (bsp_state[2]),"=r" (bsp_state[3]) : "K" (SPR_HID1)); __asm __volatile ("mfspr %0,%2; mr %1,%0; srdi %0,%0,32" : "=r" (bsp_state[4]),"=r" (bsp_state[5]) : "K" (SPR_HID4)); __asm __volatile ("mfspr %0,%2; mr %1,%0; srdi %0,%0,32" : "=r" (bsp_state[6]),"=r" (bsp_state[7]) : "K" (SPR_HID5)); #endif powerpc_sync(); break; case IBMCELLBE: #ifdef NOTYET /* Causes problems if in instruction stream on 970 */ if (mfmsr() & PSL_HV) { bsp_state[0] = mfspr(SPR_HID0); bsp_state[1] = mfspr(SPR_HID1); bsp_state[2] = mfspr(SPR_HID4); bsp_state[3] = mfspr(SPR_HID6); bsp_state[4] = mfspr(SPR_CELL_TSCR); } #endif bsp_state[5] = mfspr(SPR_CELL_TSRL); break; case MPC7450: case MPC7455: case MPC7457: /* Only MPC745x CPUs have an L3 cache. */ bsp_state[3] = mfspr(SPR_L3CR); /* Fallthrough */ case MPC7400: case MPC7410: case MPC7447A: case MPC7448: bsp_state[2] = mfspr(SPR_L2CR); bsp_state[1] = mfspr(SPR_HID1); bsp_state[0] = mfspr(SPR_HID0); break; } } void cpudep_ap_setup() { register_t reg; uint16_t vers; vers = mfpvr() >> 16; /* The following is needed for restoring from sleep. */ platform_smp_timebase_sync(0, 1); switch(vers) { case IBM970: case IBM970FX: case IBM970MP: /* Set HIOR to 0 */ __asm __volatile("mtspr 311,%0" :: "r"(0)); powerpc_sync(); /* * The 970 has strange rules about how to update HID registers. 
* See Table 2-3, 970MP manual * * Note: HID4 and HID5 restored already in * cpudep_ap_early_bootstrap() */ __asm __volatile("mtasr %0; sync" :: "r"(0)); #ifdef __powerpc64__ __asm __volatile(" \ sync; isync; \ mtspr %1, %0; \ mfspr %0, %1; mfspr %0, %1; mfspr %0, %1; \ mfspr %0, %1; mfspr %0, %1; mfspr %0, %1; \ sync; isync" :: "r"(bsp_state[0]), "K"(SPR_HID0)); __asm __volatile("sync; isync; \ mtspr %1, %0; mtspr %1, %0; sync; isync" :: "r"(bsp_state[1]), "K"(SPR_HID1)); #else __asm __volatile(" \ ld %0,0(%2); \ sync; isync; \ mtspr %1, %0; \ mfspr %0, %1; mfspr %0, %1; mfspr %0, %1; \ mfspr %0, %1; mfspr %0, %1; mfspr %0, %1; \ sync; isync" : "=r"(reg) : "K"(SPR_HID0), "b"(bsp_state)); __asm __volatile("ld %0, 8(%2); sync; isync; \ mtspr %1, %0; mtspr %1, %0; sync; isync" : "=r"(reg) : "K"(SPR_HID1), "b"(bsp_state)); #endif powerpc_sync(); break; case IBMCELLBE: #ifdef NOTYET /* Causes problems if in instruction stream on 970 */ if (mfmsr() & PSL_HV) { mtspr(SPR_HID0, bsp_state[0]); mtspr(SPR_HID1, bsp_state[1]); mtspr(SPR_HID4, bsp_state[2]); mtspr(SPR_HID6, bsp_state[3]); mtspr(SPR_CELL_TSCR, bsp_state[4]); } #endif mtspr(SPR_CELL_TSRL, bsp_state[5]); break; case MPC7400: case MPC7410: case MPC7447A: case MPC7448: case MPC7450: case MPC7455: case MPC7457: /* XXX: Program the CPU ID into PIR */ __asm __volatile("mtspr 1023,%0" :: "r"(PCPU_GET(cpuid))); powerpc_sync(); isync(); mtspr(SPR_HID0, bsp_state[0]); isync(); mtspr(SPR_HID1, bsp_state[1]); isync(); /* Now enable the L3 cache. */ switch (vers) { case MPC7450: case MPC7455: case MPC7457: /* Only MPC745x CPUs have an L3 cache. */ reg = mpc745x_l3_enable(bsp_state[3]); default: break; } reg = mpc74xx_l2_enable(bsp_state[2]); reg = mpc74xx_l1d_enable(); reg = mpc74xx_l1i_enable(); break; case IBMPOWER7: case IBMPOWER7PLUS: case IBMPOWER8: case IBMPOWER8E: + case IBMPOWER8NVL: case IBMPOWER9: #ifdef __powerpc64__ if (mfmsr() & PSL_HV) { mtspr(SPR_LPCR, mfspr(SPR_LPCR) | lpcr | LPCR_PECE_WAKESET); isync(); } #endif break; default: #ifdef __powerpc64__ if (!(mfmsr() & PSL_HV)) /* Rely on HV to have set things up */ break; #endif printf("WARNING: Unknown CPU type. Cache performance may be " "suboptimal.\n"); break; } } Index: user/ngie/bug-237403/sys/powerpc/conf/GENERIC64 =================================================================== --- user/ngie/bug-237403/sys/powerpc/conf/GENERIC64 (revision 346925) +++ user/ngie/bug-237403/sys/powerpc/conf/GENERIC64 (revision 346926) @@ -1,255 +1,256 @@ # # GENERIC -- Generic kernel configuration file for FreeBSD/powerpc # # For more information on this file, please read the handbook section on # Kernel Configuration Files: # # https://www.FreeBSD.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-config.html # # The handbook is also available locally in /usr/share/doc/handbook # if you've installed the doc distribution, otherwise always see the # FreeBSD World Wide Web server (https://www.FreeBSD.org/) for the # latest information. # # An exhaustive list of options and more detailed explanations of the # device lines is also present in the ../../conf/NOTES and NOTES files. # If you are in doubt as to the purpose or necessity of a line, check first # in NOTES. # # $FreeBSD$ cpu AIM ident GENERIC machine powerpc powerpc64 makeoptions DEBUG=-g #Build kernel with gdb(1) debug symbols makeoptions WITH_CTF=1 # Platform support options POWERMAC #NewWorld Apple PowerMacs options PS3 #Sony Playstation 3 options MAMBO #IBM Mambo Full System Simulator options PSERIES #PAPR-compliant systems (e.g.
IBM p) options POWERNV #Non-virtualized OpenPOWER systems options FDT #Flattened Device Tree options SCHED_ULE #ULE scheduler options NUMA #Non-Uniform Memory Architecture support options PREEMPTION #Enable kernel thread preemption options VIMAGE # Subsystem virtualization, e.g. VNET options INET #InterNETworking options INET6 #IPv6 communications protocols options IPSEC # IP (v4/v6) security options IPSEC_SUPPORT # Allow kldload of ipsec and tcpmd5 options TCP_OFFLOAD # TCP offload options TCP_BLACKBOX # Enhanced TCP event logging options TCP_HHOOK # hhook(9) framework for TCP options TCP_RFC7413 # TCP Fast Open options SCTP #Stream Control Transmission Protocol options FFS #Berkeley Fast Filesystem options SOFTUPDATES #Enable FFS soft updates support options UFS_ACL #Support for access control lists options UFS_DIRHASH #Improve performance on big directories options UFS_GJOURNAL #Enable gjournal-based UFS journaling options QUOTA #Enable disk quotas for UFS options MD_ROOT #MD is a potential root device options NFSCL #Network Filesystem Client options NFSD #Network Filesystem Server options NFSLOCKD #Network Lock Manager options NFS_ROOT #NFS usable as root device options MSDOSFS #MSDOS Filesystem options CD9660 #ISO 9660 Filesystem options PROCFS #Process filesystem (requires PSEUDOFS) options PSEUDOFS #Pseudo-filesystem framework options GEOM_PART_APM #Apple Partition Maps. options GEOM_PART_GPT #GUID Partition Tables. options GEOM_LABEL #Provides labelization options COMPAT_FREEBSD32 #Compatible with FreeBSD/powerpc binaries options COMPAT_FREEBSD5 #Compatible with FreeBSD5 options COMPAT_FREEBSD6 #Compatible with FreeBSD6 options COMPAT_FREEBSD7 #Compatible with FreeBSD7 options COMPAT_FREEBSD9 # Compatible with FreeBSD9 options COMPAT_FREEBSD10 # Compatible with FreeBSD10 options COMPAT_FREEBSD11 # Compatible with FreeBSD11 options SCSI_DELAY=5000 #Delay (in ms) before probing SCSI options KTRACE #ktrace(1) syscall trace support options STACK #stack(9) support options SYSVSHM #SYSV-style shared memory options SYSVMSG #SYSV-style message queues options SYSVSEM #SYSV-style semaphores options _KPOSIX_PRIORITY_SCHEDULING #Posix P1003_1B real-time extensions options PRINTF_BUFR_SIZE=128 # Prevent printf output being interspersed. options HWPMC_HOOKS # Necessary kernel hooks for hwpmc(4) options AUDIT # Security event auditing options CAPABILITY_MODE # Capsicum capability mode options CAPABILITIES # Capsicum capabilities options MAC # TrustedBSD MAC Framework options KDTRACE_HOOKS # Kernel DTrace hooks options DDB_CTF # Kernel ELF linker loads CTF data options INCLUDE_CONFIG_FILE # Include this file in kernel options RACCT # Resource accounting framework options RACCT_DEFAULT_TO_DISABLED # Set kern.racct.enable=0 by default options RCTL # Resource limits # Debugging support. Always need this: options KDB # Enable kernel debugger support. options KDB_TRACE # Print a stack trace for a panic. # For full debugger support use (turn off in stable branch): options DDB #Support DDB #options DEADLKRES #Enable the deadlock resolver options INVARIANTS #Enable calls of extra sanity checking options INVARIANT_SUPPORT #Extra sanity checks of internal structures, required by INVARIANTS options WITNESS #Enable checks to detect deadlocks and cycles options WITNESS_SKIPSPIN #Don't run witness on spinlocks for speed options MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones options VERBOSE_SYSINIT=0 # Support debug.verbose_sysinit, off by default # Kernel dump features. 
options EKCD # Support for encrypted kernel dumps options GZIO # gzip-compressed kernel and user dumps options ZSTDIO # zstd-compressed kernel and user dumps options NETDUMP # netdump(4) client support # Make an SMP-capable kernel by default options SMP # Symmetric MultiProcessor Kernel # CPU frequency control device cpufreq # Standard busses device pci options PCI_HP # PCI-Express native HotPlug device agp # ATA controllers device ahci # AHCI-compatible SATA controllers device ata # Legacy ATA/SATA controllers device mvs # Marvell 88SX50XX/88SX60XX/88SX70XX/SoC SATA device siis # SiliconImage SiI3124/SiI3132/SiI3531 SATA # NVM Express (NVMe) support device nvme # base NVMe driver options NVME_USE_NVD=0 # prefer the cam(4) based nda(4) driver device nvd # expose NVMe namespaces as disks, depends on nvme # SCSI Controllers device ahc # AHA2940 and onboard AIC7xxx devices options AHC_ALLOW_MEMIO # Attempt to use memory mapped I/O device isp # Qlogic family device ispfw # Firmware module for Qlogic host adapters device mpt # LSI-Logic MPT-Fusion device mps # LSI-Logic MPT-Fusion 2 device sym # NCR/Symbios/LSI Logic 53C8XX/53C1010/53C1510D # ATA/SCSI peripherals device scbus # SCSI bus (required for ATA/SCSI) device ch # SCSI media changers device da # Direct Access (disks) device sa # Sequential Access (tape etc) device cd # CD device pass # Passthrough device (direct ATA/SCSI access) device ses # Enclosure Service (SES and SAF-TE) # vt is the default console driver, resembling an SCO console device vt # Core console driver device kbdmux # Serial (COM) ports device scc device uart device uart_z8530 device iflib # Ethernet hardware device em # Intel PRO/1000 Gigabit Ethernet Family device ix # Intel PRO/10GbE PCIE PF Ethernet Family device ixv # Intel PRO/10GbE PCIE VF Ethernet Family device glc # Sony Playstation 3 Ethernet device llan # IBM pSeries Virtual Ethernet device cxgbe # Chelsio 10/25G NIC # PCI Ethernet NICs that use the common MII bus controller code. device miibus # MII bus support device bge # Broadcom BCM570xx Gigabit Ethernet device gem # Sun GEM/Sun ERI/Apple GMAC device dc # DEC/Intel 21143 and various workalikes device fxp # Intel EtherExpress PRO/100B (82557, 82558) device re # RealTek 8139C+/8169/8169S/8110S device rl # RealTek 8129/8139 # Pseudo devices. device crypto # core crypto support device loop # Network loopback device random # Entropy device device ether # Ethernet support device vlan # 802.1Q VLAN support device tun # Packet tunnel. device md # Memory "disks" device ofwd # Open Firmware disks device gif # IPv6 and IPv4 tunneling device firmware # firmware assist module # The `bpf' device enables the Berkeley Packet Filter. # Be aware of the administrative consequences of enabling this! # Note that 'bpf' is required for DHCP. 
device bpf #Berkeley packet filter # USB support options USB_DEBUG # enable debug msgs device uhci # UHCI PCI->USB interface device ohci # OHCI PCI->USB interface device ehci # EHCI PCI->USB interface device xhci # XHCI PCI->USB interface device usb # USB Bus (required) device uhid # "Human Interface Devices" device ukbd # Keyboard options KBD_INSTALL_CDEV # install a CDEV entry in /dev device umass # Disks/Mass storage - Requires scbus and da0 device ums # Mouse # USB Ethernet device aue # ADMtek USB Ethernet device axe # ASIX Electronics USB Ethernet device cdce # Generic USB over Ethernet device cue # CATC USB Ethernet device kue # Kawasaki LSI USB Ethernet # Wireless NIC cards options IEEE80211_SUPPORT_MESH # FireWire support device firewire # FireWire bus code device sbp # SCSI over FireWire (Requires scbus and da) device fwe # Ethernet over FireWire (non-standard!) # Misc device iicbus # I2C bus code device iic device kiic # Keywest I2C device ad7417 # PowerMac7,2 temperature sensor device ds1631 # PowerMac11,2 temperature sensor device ds1775 # PowerMac7,2 temperature sensor device fcu # Apple Fan Control Unit device max6690 # PowerMac7,2 temperature sensor device powermac_nvram # Open Firmware configuration NVRAM device smu # Apple System Management Unit device atibl # ATI-based backlight driver for PowerBooks/iBooks device nvbl # nVidia-based backlight driver for PowerBooks/iBooks +device opalflash # PowerNV embedded flash memory # ADB support device adb device pmu # Sound support device sound # Generic sound driver (required) device snd_ai2s # Apple I2S audio device snd_uaudio # USB Audio # Netmap provides direct access to TX/RX rings on supported NICs device netmap # netmap(4) support # evdev interface options EVDEV_SUPPORT # evdev support in legacy drivers device evdev # input event device support device uinput # install /dev/uinput cdev Index: user/ngie/bug-237403/sys/powerpc/include/cpu.h =================================================================== --- user/ngie/bug-237403/sys/powerpc/include/cpu.h (revision 346925) +++ user/ngie/bug-237403/sys/powerpc/include/cpu.h (revision 346926) @@ -1,149 +1,150 @@ /*- * SPDX-License-Identifier: BSD-4-Clause * * Copyright (C) 1995-1997 Wolfgang Solfrank. * Copyright (C) 1995-1997 TooLs GmbH. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by TooLs GmbH. * 4. The name of TooLs GmbH may not be used to endorse or promote products * derived from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY TOOLS GMBH ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
* IN NO EVENT SHALL TOOLS GMBH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * $NetBSD: cpu.h,v 1.11 2000/05/26 21:19:53 thorpej Exp $ * $FreeBSD$ */ #ifndef _MACHINE_CPU_H_ #define _MACHINE_CPU_H_ #include #include #include /* * CPU Feature Attributes * * These are defined in the PowerPC ELF ABI for the AT_HWCAP vector, * and are exported to userland via the machdep.cpu_features * sysctl. */ extern u_long cpu_features; extern u_long cpu_features2; #define PPC_FEATURE_32 0x80000000 /* Always true */ #define PPC_FEATURE_64 0x40000000 /* Defined on a 64-bit CPU */ #define PPC_FEATURE_601_INSTR 0x20000000 /* Defined on a 64-bit CPU */ #define PPC_FEATURE_HAS_ALTIVEC 0x10000000 #define PPC_FEATURE_HAS_FPU 0x08000000 #define PPC_FEATURE_HAS_MMU 0x04000000 #define PPC_FEATURE_UNIFIED_CACHE 0x01000000 #define PPC_FEATURE_HAS_SPE 0x00800000 #define PPC_FEATURE_HAS_EFP_SINGLE 0x00400000 #define PPC_FEATURE_HAS_EFP_DOUBLE 0x00200000 #define PPC_FEATURE_NO_TB 0x00100000 #define PPC_FEATURE_POWER4 0x00080000 #define PPC_FEATURE_POWER5 0x00040000 #define PPC_FEATURE_POWER5_PLUS 0x00020000 #define PPC_FEATURE_CELL 0x00010000 #define PPC_FEATURE_BOOKE 0x00008000 #define PPC_FEATURE_SMT 0x00004000 #define PPC_FEATURE_ICACHE_SNOOP 0x00002000 #define PPC_FEATURE_ARCH_2_05 0x00001000 #define PPC_FEATURE_HAS_DFP 0x00000400 #define PPC_FEATURE_POWER6_EXT 0x00000200 #define PPC_FEATURE_ARCH_2_06 0x00000100 #define PPC_FEATURE_HAS_VSX 0x00000080 #define PPC_FEATURE_TRUE_LE 0x00000002 #define PPC_FEATURE_PPC_LE 0x00000001 #define PPC_FEATURE2_ARCH_2_07 0x80000000 #define PPC_FEATURE2_HTM 0x40000000 #define PPC_FEATURE2_DSCR 0x20000000 +#define PPC_FEATURE2_EBB 0x10000000 #define PPC_FEATURE2_ISEL 0x08000000 #define PPC_FEATURE2_TAR 0x04000000 #define PPC_FEATURE2_HAS_VEC_CRYPTO 0x02000000 #define PPC_FEATURE2_HTM_NOSC 0x01000000 #define PPC_FEATURE2_ARCH_3_00 0x00800000 #define PPC_FEATURE2_HAS_IEEE128 0x00400000 #define PPC_FEATURE2_DARN 0x00200000 #define PPC_FEATURE2_SCV 0x00100000 -#define PPC_FEATURE2_HTM_NOSUSPEND 0x01000000 +#define PPC_FEATURE2_HTM_NOSUSPEND 0x00080000 #define PPC_FEATURE_BITMASK \ "\20" \ "\040PPC32\037PPC64\036PPC601\035ALTIVEC\034FPU\033MMU\031UNIFIEDCACHE" \ "\030SPE\027SPESFP\026DPESFP\025NOTB\024POWER4\023POWER5\022P5PLUS\021CELL"\ "\020BOOKE\017SMT\016ISNOOP\015ARCH205\013DFP\011ARCH206\010VSX"\ "\002TRUELE\001PPCLE" #define PPC_FEATURE2_BITMASK \ "\20" \ "\040ARCH207\037HTM\036DSCR\034ISEL\033TAR\032VCRYPTO\031HTMNOSC" \ "\030ARCH300\027IEEE128\026DARN\025SCV\024HTMNOSUSP" #define TRAPF_USERMODE(frame) (((frame)->srr1 & PSL_PR) != 0) #define TRAPF_PC(frame) ((frame)->srr0) /* * CTL_MACHDEP definitions. 
*/ #define CPU_CACHELINE 1 static __inline u_int64_t get_cyclecount(void) { u_int32_t _upper, _lower; u_int64_t _time; __asm __volatile( "mftb %0\n" "mftbu %1" : "=r" (_lower), "=r" (_upper)); _time = (u_int64_t)_upper; _time = (_time << 32) + _lower; return (_time); } #define cpu_getstack(td) ((td)->td_frame->fixreg[1]) #define cpu_spinwait() __asm __volatile("or 27,27,27") /* yield */ #define cpu_lock_delay() DELAY(1) extern char btext[]; extern char etext[]; #ifdef __powerpc64__ extern void enter_idle_powerx(void); extern uint64_t can_wakeup; extern register_t lpcr; #endif void cpu_halt(void); void cpu_reset(void); void cpu_sleep(void); void flush_disable_caches(void); void fork_trampoline(void); void swi_vm(void *); #endif /* _MACHINE_CPU_H_ */ Index: user/ngie/bug-237403/sys/powerpc/include/pcb.h =================================================================== --- user/ngie/bug-237403/sys/powerpc/include/pcb.h (revision 346925) +++ user/ngie/bug-237403/sys/powerpc/include/pcb.h (revision 346926) @@ -1,111 +1,125 @@ /*- * SPDX-License-Identifier: BSD-4-Clause * * Copyright (C) 1995, 1996 Wolfgang Solfrank. * Copyright (C) 1995, 1996 TooLs GmbH. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by TooLs GmbH. * 4. The name of TooLs GmbH may not be used to endorse or promote products * derived from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY TOOLS GMBH ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL TOOLS GMBH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
* * $NetBSD: pcb.h,v 1.4 2000/06/04 11:57:17 tsubai Exp $ * $FreeBSD$ */ #ifndef _MACHINE_PCB_H_ #define _MACHINE_PCB_H_ #include #ifndef _STANDALONE struct pcb { register_t pcb_context[20]; /* non-volatile r14-r31 */ register_t pcb_cr; /* Condition register */ register_t pcb_sp; /* stack pointer */ register_t pcb_toc; /* toc pointer */ register_t pcb_lr; /* link register */ register_t pcb_dscr; /* dscr value */ + register_t pcb_fscr; + register_t pcb_tar; struct pmap *pcb_pm; /* pmap of our vmspace */ jmp_buf *pcb_onfault; /* For use during copyin/copyout */ int pcb_flags; #define PCB_FPU 0x1 /* Process uses FPU */ #define PCB_FPREGS 0x2 /* Process had FPU registers initialized */ #define PCB_VEC 0x4 /* Process had Altivec initialized */ #define PCB_VSX 0x8 /* Process had VSX initialized */ #define PCB_CDSCR 0x10 /* Process had Custom DSCR initialized */ #define PCB_HTM 0x20 /* Process had HTM initialized */ +#define PCB_CFSCR 0x40 /* Process had FSCR updated */ struct fpu { union { double fpr; uint32_t vsr[4]; } fpr[32]; double fpscr; /* FPSCR stored as double for easier access */ } pcb_fpu; /* Floating point processor */ unsigned int pcb_fpcpu; /* which CPU had our FPU stuff. */ struct vec { uint32_t vr[32][4]; uint32_t spare[2]; uint32_t vrsave; uint32_t vscr; /* aligned at vector element 3 */ } pcb_vec __aligned(16); /* Vector processor */ unsigned int pcb_veccpu; /* which CPU had our vector stuff. */ struct htm { uint64_t tfhar; uint64_t texasr; uint64_t tfiar; } pcb_htm; + + struct ebb { + uint64_t ebbhr; + uint64_t ebbrr; + uint64_t bescr; + } pcb_ebb; + + struct lmon { + uint64_t lmrr; + uint64_t lmser; + } pcb_lm; union { struct { vm_offset_t usr_segm; /* Base address */ register_t usr_vsid; /* USER_SR segment */ } aim; struct { register_t dbcr0; } booke; } pcb_cpu; vm_offset_t pcb_lastill; /* Last illegal instruction */ }; #endif #ifdef _KERNEL struct trapframe; #ifndef curpcb extern struct pcb *curpcb; #endif extern struct pmap *curpm; extern struct proc *fpuproc; void makectx(struct trapframe *, struct pcb *); void savectx(struct pcb *) __returns_twice; #endif #endif /* _MACHINE_PCB_H_ */ Index: user/ngie/bug-237403/sys/powerpc/include/spr.h =================================================================== --- user/ngie/bug-237403/sys/powerpc/include/spr.h (revision 346925) +++ user/ngie/bug-237403/sys/powerpc/include/spr.h (revision 346926) @@ -1,891 +1,912 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2001 The NetBSD Foundation, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * * $NetBSD: spr.h,v 1.25 2002/08/14 15:38:40 matt Exp $ * $FreeBSD$ */ #ifndef _POWERPC_SPR_H_ #define _POWERPC_SPR_H_ #ifndef _LOCORE #define mtspr(reg, val) \ __asm __volatile("mtspr %0,%1" : : "K"(reg), "r"(val)) #define mfspr(reg) \ ( { register_t val; \ __asm __volatile("mfspr %0,%1" : "=r"(val) : "K"(reg)); \ val; } ) #ifndef __powerpc64__ /* The following routines allow manipulation of the full 64-bit width * of SPRs on 64 bit CPUs in bridge mode */ #define mtspr64(reg,valhi,vallo,scratch) \ __asm __volatile(" \ mfmsr %0; \ insrdi %0,%5,1,0; \ mtmsrd %0; \ isync; \ \ sld %1,%1,%4; \ or %1,%1,%2; \ mtspr %3,%1; \ srd %1,%1,%4; \ \ clrldi %0,%0,1; \ mtmsrd %0; \ isync;" \ : "=r"(scratch), "=r"(valhi) : "r"(vallo), "K"(reg), "r"(32), "r"(1)) #define mfspr64upper(reg,scratch) \ ( { register_t val; \ __asm __volatile(" \ mfmsr %0; \ insrdi %0,%4,1,0; \ mtmsrd %0; \ isync; \ \ mfspr %1,%2; \ srd %1,%1,%3; \ \ clrldi %0,%0,1; \ mtmsrd %0; \ isync;" \ : "=r"(scratch), "=r"(val) : "K"(reg), "r"(32), "r"(1)); \ val; } ) #endif #endif /* _LOCORE */ /* * Special Purpose Register declarations. * * The first column in the comments indicates which PowerPC * architectures the SPR is valid on - 4 for 4xx series, * 6 for 6xx/7xx series and 8 for 8xx and 8xxx series. */ #define SPR_MQ 0x000 /* .6. 601 MQ register */ #define SPR_XER 0x001 /* 468 Fixed Point Exception Register */ +#define SPR_DSCR 0x003 /* .6. Data Stream Control Register (Unprivileged) */ #define SPR_RTCU_R 0x004 /* .6. 601 RTC Upper - Read */ #define SPR_RTCL_R 0x005 /* .6. 601 RTC Lower - Read */ #define SPR_LR 0x008 /* 468 Link Register */ #define SPR_CTR 0x009 /* 468 Count Register */ -#define SPR_DSCR 0x011 /* Data Stream Control Register */ +#define SPR_DSCRP 0x011 /* Data Stream Control Register (Privileged) */ #define SPR_DSISR 0x012 /* .68 DSI exception source */ #define DSISR_DIRECT 0x80000000 /* Direct-store error exception */ #define DSISR_NOTFOUND 0x40000000 /* Translation not found */ #define DSISR_PROTECT 0x08000000 /* Memory access not permitted */ #define DSISR_INVRX 0x04000000 /* Reserve-indexed insn direct-store access */ #define DSISR_STORE 0x02000000 /* Store operation */ #define DSISR_DABR 0x00400000 /* DABR match */ #define DSISR_SEGMENT 0x00200000 /* XXX; not in 6xx PEM */ #define DSISR_EAR 0x00100000 /* eciwx/ecowx && EAR[E] == 0 */ #define SPR_DAR 0x013 /* .68 Data Address Register */ #define SPR_RTCU_W 0x014 /* .6. 601 RTC Upper - Write */ #define SPR_RTCL_W 0x015 /* .6. 
601 RTC Lower - Write */ #define SPR_DEC 0x016 /* .68 DECrementer register */ #define SPR_SDR1 0x019 /* .68 Page table base address register */ #define SPR_SRR0 0x01a /* 468 Save/Restore Register 0 */ #define SPR_SRR1 0x01b /* 468 Save/Restore Register 1 */ #define SRR1_ISI_PFAULT 0x40000000 /* ISI page not found */ #define SRR1_ISI_NOEXECUTE 0x10000000 /* Memory marked no-execute */ #define SRR1_ISI_PP 0x08000000 /* PP bits forbid access */ #define SPR_DECAR 0x036 /* ..8 Decrementer auto reload */ #define SPR_EIE 0x050 /* ..8 Exception Interrupt ??? */ #define SPR_EID 0x051 /* ..8 Exception Interrupt ??? */ #define SPR_NRI 0x052 /* ..8 Exception Interrupt ??? */ #define SPR_FSCR 0x099 /* Facility Status and Control Register */ -#define FSCR_IC_MASK 0xFF00000000000000ULL /* FSCR[0:7] is Interrupt Cause */ -#define FSCR_IC_FP 0x0000000000000000ULL /* FP unavailable */ -#define FSCR_IC_VSX 0x0100000000000000ULL /* VSX unavailable */ -#define FSCR_IC_DSCR 0x0200000000000000ULL /* Access to the DSCR at SPRs 3 or 17 */ -#define FSCR_IC_PM 0x0300000000000000ULL /* Read or write access of a Performance Monitor SPR in group A */ -#define FSCR_IC_BHRB 0x0400000000000000ULL /* Execution of a BHRB Instruction */ -#define FSCR_IC_HTM 0x0500000000000000ULL /* Access to a Transactional Memory */ +#define FSCR_IC_MASK 0xFF00000000000000ULL /* FSCR[0:7] is Interrupt Cause */ +#define FSCR_IC_FP 0x0000000000000000ULL /* FP unavailable */ +#define FSCR_IC_VSX 0x0100000000000000ULL /* VSX unavailable */ +#define FSCR_IC_DSCR 0x0200000000000000ULL /* Access to the DSCR at SPRs 3 or 17 */ +#define FSCR_IC_PM 0x0300000000000000ULL /* Read or write access of a Performance Monitor SPR in group A */ +#define FSCR_IC_BHRB 0x0400000000000000ULL /* Execution of a BHRB Instruction */ +#define FSCR_IC_HTM 0x0500000000000000ULL /* Access to a Transactional Memory */ /* Reserved 0x0600000000000000ULL */ -#define FSCR_IC_EBB 0x0700000000000000ULL /* Access to Event-Based Branch */ -#define FSCR_IC_TAR 0x0800000000000000ULL /* Access to Target Address Register */ -#define FSCR_IC_STOP 0x0900000000000000ULL /* Access to the 'stop' instruction in privileged non-hypervisor state */ -#define FSCR_IC_MSG 0x0A00000000000000ULL /* Access to 'msgsndp' or 'msgclrp' instructions */ -#define FSCR_IC_SCV 0x0C00000000000000ULL /* Execution of a 'scv' instruction */ +#define FSCR_IC_EBB 0x0700000000000000ULL /* Access to Event-Based Branch */ +#define FSCR_IC_TAR 0x0800000000000000ULL /* Access to Target Address Register */ +#define FSCR_IC_STOP 0x0900000000000000ULL /* Access to the 'stop' instruction in privileged non-hypervisor state */ +#define FSCR_IC_MSG 0x0A00000000000000ULL /* Access to 'msgsndp' or 'msgclrp' instructions */ +#define FSCR_IC_LM 0x0A00000000000000ULL /* Access to load monitored facility */ +#define FSCR_IC_SCV 0x0C00000000000000ULL /* Execution of a 'scv' instruction */ +#define FSCR_SCV 0x0000000000001000 /* scv instruction available */ +#define FSCR_LM 0x0000000000000800 /* Load monitored facilities available */ +#define FSCR_MSGP 0x0000000000000400 /* msgsndp and SPRs available */ +#define FSCR_TAR 0x0000000000000100 /* TAR register available */ +#define FSCR_EBB 0x0000000000000080 /* Event-based branch available */ +#define FSCR_DSCR 0x0000000000000004 /* DSCR available in PR state */ +#define SPR_DPDES 0x0b0 /* .6. Directed Privileged Doorbell Exception State Register */ #define SPR_USPRG0 0x100 /* 4.. User SPR General 0 */ #define SPR_VRSAVE 0x100 /* .6. 
AltiVec VRSAVE */ #define SPR_SPRG0 0x110 /* 468 SPR General 0 */ #define SPR_SPRG1 0x111 /* 468 SPR General 1 */ #define SPR_SPRG2 0x112 /* 468 SPR General 2 */ #define SPR_SPRG3 0x113 /* 468 SPR General 3 */ #define SPR_SPRG4 0x114 /* 4.. SPR General 4 */ #define SPR_SPRG5 0x115 /* 4.. SPR General 5 */ #define SPR_SPRG6 0x116 /* 4.. SPR General 6 */ #define SPR_SPRG7 0x117 /* 4.. SPR General 7 */ #define SPR_SCOMC 0x114 /* ... SCOM Address Register (970) */ #define SPR_SCOMD 0x115 /* ... SCOM Data Register (970) */ #define SPR_ASR 0x118 /* ... Address Space Register (PPC64) */ #define SPR_EAR 0x11a /* .68 External Access Register */ #define SPR_PVR 0x11f /* 468 Processor Version Register */ #define MPC601 0x0001 #define MPC603 0x0003 #define MPC604 0x0004 #define MPC602 0x0005 #define MPC603e 0x0006 #define MPC603ev 0x0007 #define MPC750 0x0008 #define MPC750CL 0x7000 /* Nintendo Wii's Broadway */ #define MPC604ev 0x0009 #define MPC7400 0x000c #define MPC620 0x0014 #define IBM403 0x0020 #define IBM401A1 0x0021 #define IBM401B2 0x0022 #define IBM401C2 0x0023 #define IBM401D2 0x0024 #define IBM401E2 0x0025 #define IBM401F2 0x0026 #define IBM401G2 0x0027 #define IBMRS64II 0x0033 #define IBMRS64III 0x0034 #define IBMPOWER4 0x0035 #define IBMRS64III_2 0x0036 #define IBMRS64IV 0x0037 #define IBMPOWER4PLUS 0x0038 #define IBM970 0x0039 #define IBMPOWER5 0x003a #define IBMPOWER5PLUS 0x003b #define IBM970FX 0x003c #define IBMPOWER6 0x003e #define IBMPOWER7 0x003f #define IBMPOWER3 0x0040 #define IBMPOWER3PLUS 0x0041 #define IBM970MP 0x0044 #define IBM970GX 0x0045 #define IBMPOWERPCA2 0x0049 #define IBMPOWER7PLUS 0x004a #define IBMPOWER8E 0x004b +#define IBMPOWER8NVL 0x004c #define IBMPOWER8 0x004d #define IBMPOWER9 0x004e #define MPC860 0x0050 #define IBMCELLBE 0x0070 #define MPC8240 0x0081 #define PA6T 0x0090 #define IBM405GP 0x4011 #define IBM405L 0x4161 #define IBM750FX 0x7000 #define MPC745X_P(v) ((v & 0xFFF8) == 0x8000) #define MPC7450 0x8000 #define MPC7455 0x8001 #define MPC7457 0x8002 #define MPC7447A 0x8003 #define MPC7448 0x8004 #define MPC7410 0x800c #define MPC8245 0x8081 #define FSL_E500v1 0x8020 #define FSL_E500v2 0x8021 #define FSL_E500mc 0x8023 #define FSL_E5500 0x8024 #define FSL_E6500 0x8040 #define FSL_E300C1 0x8083 #define FSL_E300C2 0x8084 #define FSL_E300C3 0x8085 #define FSL_E300C4 0x8086 #define LPCR_PECE_WAKESET (LPCR_PECE_EXT | LPCR_PECE_DECR | LPCR_PECE_ME) #define SPR_EPCR 0x133 #define EPCR_EXTGS 0x80000000 #define EPCR_DTLBGS 0x40000000 #define EPCR_ITLBGS 0x20000000 #define EPCR_DSIGS 0x10000000 #define EPCR_ISIGS 0x08000000 #define EPCR_DUVGS 0x04000000 #define EPCR_ICM 0x02000000 #define EPCR_GICMGS 0x01000000 #define EPCR_DGTMI 0x00800000 #define EPCR_DMIUH 0x00400000 #define EPCR_PMGS 0x00200000 #define SPR_HSRR0 0x13a #define SPR_HSRR1 0x13b #define SPR_LPCR 0x13e /* Logical Partitioning Control */ #define LPCR_LPES 0x008 /* Bit 60 */ #define LPCR_HVICE 0x002 /* Hypervisor Virtualization Interrupt (Arch 3.0) */ #define LPCR_PECE_DRBL (1ULL << 16) /* Directed Privileged Doorbell */ #define LPCR_PECE_HDRBL (1ULL << 15) /* Directed Hypervisor Doorbell */ #define LPCR_PECE_EXT (1ULL << 14) /* External exceptions */ #define LPCR_PECE_DECR (1ULL << 13) /* Decrementer exceptions */ #define LPCR_PECE_ME (1ULL << 12) /* Machine Check and Hypervisor */ /* Maintenance exceptions */ #define SPR_LPID 0x13f /* Logical Partitioning Control */ #define SPR_HMER 0x150 /* Hypervisor Maintenance Exception Register */ #define SPR_HMEER 0x151 /* Hypervisor Maintenance Exception 
Enable Register */ +#define SPR_TIR 0x1be /* .6. Thread Identification Register */ #define SPR_PTCR 0x1d0 /* Partition Table Control Register */ #define SPR_SPEFSCR 0x200 /* ..8 Signal Processing Engine FSCR. */ #define SPEFSCR_SOVH 0x80000000 #define SPEFSCR_OVH 0x40000000 #define SPEFSCR_FGH 0x20000000 #define SPEFSCR_FXH 0x10000000 #define SPEFSCR_FINVH 0x08000000 #define SPEFSCR_FDBZH 0x04000000 #define SPEFSCR_FUNFH 0x02000000 #define SPEFSCR_FOVFH 0x01000000 #define SPEFSCR_FINXS 0x00200000 #define SPEFSCR_FINVS 0x00100000 #define SPEFSCR_FDBZS 0x00080000 #define SPEFSCR_FUNFS 0x00040000 #define SPEFSCR_FOVFS 0x00020000 #define SPEFSCR_SOV 0x00008000 #define SPEFSCR_OV 0x00004000 #define SPEFSCR_FG 0x00002000 #define SPEFSCR_FX 0x00001000 #define SPEFSCR_FINV 0x00000800 #define SPEFSCR_FDBZ 0x00000400 #define SPEFSCR_FUNF 0x00000200 #define SPEFSCR_FOVF 0x00000100 #define SPEFSCR_FINXE 0x00000040 #define SPEFSCR_FINVE 0x00000020 #define SPEFSCR_FDBZE 0x00000010 #define SPEFSCR_FUNFE 0x00000008 #define SPEFSCR_FOVFE 0x00000004 #define SPEFSCR_FRMC_M 0x00000003 #define SPR_IBAT0U 0x210 /* .6. Instruction BAT Reg 0 Upper */ #define SPR_IBAT0L 0x211 /* .6. Instruction BAT Reg 0 Lower */ #define SPR_IBAT1U 0x212 /* .6. Instruction BAT Reg 1 Upper */ #define SPR_IBAT1L 0x213 /* .6. Instruction BAT Reg 1 Lower */ #define SPR_IBAT2U 0x214 /* .6. Instruction BAT Reg 2 Upper */ #define SPR_IBAT2L 0x215 /* .6. Instruction BAT Reg 2 Lower */ #define SPR_IBAT3U 0x216 /* .6. Instruction BAT Reg 3 Upper */ #define SPR_IBAT3L 0x217 /* .6. Instruction BAT Reg 3 Lower */ #define SPR_DBAT0U 0x218 /* .6. Data BAT Reg 0 Upper */ #define SPR_DBAT0L 0x219 /* .6. Data BAT Reg 0 Lower */ #define SPR_DBAT1U 0x21a /* .6. Data BAT Reg 1 Upper */ #define SPR_DBAT1L 0x21b /* .6. Data BAT Reg 1 Lower */ #define SPR_DBAT2U 0x21c /* .6. Data BAT Reg 2 Upper */ #define SPR_DBAT2L 0x21d /* .6. Data BAT Reg 2 Lower */ #define SPR_DBAT3U 0x21e /* .6. Data BAT Reg 3 Upper */ #define SPR_DBAT3L 0x21f /* .6. Data BAT Reg 3 Lower */ #define SPR_IC_CST 0x230 /* ..8 Instruction Cache CSR */ #define IC_CST_IEN 0x80000000 /* I cache is ENabled (RO) */ #define IC_CST_CMD_INVALL 0x0c000000 /* I cache invalidate all */ #define IC_CST_CMD_UNLOCKALL 0x0a000000 /* I cache unlock all */ #define IC_CST_CMD_UNLOCK 0x08000000 /* I cache unlock block */ #define IC_CST_CMD_LOADLOCK 0x06000000 /* I cache load & lock block */ #define IC_CST_CMD_DISABLE 0x04000000 /* I cache disable */ #define IC_CST_CMD_ENABLE 0x02000000 /* I cache enable */ #define IC_CST_CCER1 0x00200000 /* I cache error type 1 (RO) */ #define IC_CST_CCER2 0x00100000 /* I cache error type 2 (RO) */ #define IC_CST_CCER3 0x00080000 /* I cache error type 3 (RO) */ #define SPR_IBAT4U 0x230 /* .6. Instruction BAT Reg 4 Upper */ #define SPR_IC_ADR 0x231 /* ..8 Instruction Cache Address */ #define SPR_IBAT4L 0x231 /* .6. Instruction BAT Reg 4 Lower */ #define SPR_IC_DAT 0x232 /* ..8 Instruction Cache Data */ #define SPR_IBAT5U 0x232 /* .6. Instruction BAT Reg 5 Upper */ #define SPR_IBAT5L 0x233 /* .6. Instruction BAT Reg 5 Lower */ #define SPR_IBAT6U 0x234 /* .6. Instruction BAT Reg 6 Upper */ #define SPR_IBAT6L 0x235 /* .6. Instruction BAT Reg 6 Lower */ #define SPR_IBAT7U 0x236 /* .6. Instruction BAT Reg 7 Upper */ #define SPR_IBAT7L 0x237 /* .6. 
Instruction BAT Reg 7 Lower */ #define SPR_DC_CST 0x230 /* ..8 Data Cache CSR */ #define DC_CST_DEN 0x80000000 /* D cache ENabled (RO) */ #define DC_CST_DFWT 0x40000000 /* D cache Force Write-Thru (RO) */ #define DC_CST_LES 0x20000000 /* D cache Little Endian Swap (RO) */ #define DC_CST_CMD_FLUSH 0x0e000000 /* D cache invalidate all */ #define DC_CST_CMD_INVALL 0x0c000000 /* D cache invalidate all */ #define DC_CST_CMD_UNLOCKALL 0x0a000000 /* D cache unlock all */ #define DC_CST_CMD_UNLOCK 0x08000000 /* D cache unlock block */ #define DC_CST_CMD_CLRLESWAP 0x07000000 /* D cache clr little-endian swap */ #define DC_CST_CMD_LOADLOCK 0x06000000 /* D cache load & lock block */ #define DC_CST_CMD_SETLESWAP 0x05000000 /* D cache set little-endian swap */ #define DC_CST_CMD_DISABLE 0x04000000 /* D cache disable */ #define DC_CST_CMD_CLRFWT 0x03000000 /* D cache clear forced write-thru */ #define DC_CST_CMD_ENABLE 0x02000000 /* D cache enable */ #define DC_CST_CMD_SETFWT 0x01000000 /* D cache set forced write-thru */ #define DC_CST_CCER1 0x00200000 /* D cache error type 1 (RO) */ #define DC_CST_CCER2 0x00100000 /* D cache error type 2 (RO) */ #define DC_CST_CCER3 0x00080000 /* D cache error type 3 (RO) */ #define SPR_DBAT4U 0x238 /* .6. Data BAT Reg 4 Upper */ #define SPR_DC_ADR 0x231 /* ..8 Data Cache Address */ #define SPR_DBAT4L 0x239 /* .6. Data BAT Reg 4 Lower */ #define SPR_DC_DAT 0x232 /* ..8 Data Cache Data */ #define SPR_DBAT5U 0x23a /* .6. Data BAT Reg 5 Upper */ #define SPR_DBAT5L 0x23b /* .6. Data BAT Reg 5 Lower */ #define SPR_DBAT6U 0x23c /* .6. Data BAT Reg 6 Upper */ #define SPR_DBAT6L 0x23d /* .6. Data BAT Reg 6 Lower */ #define SPR_DBAT7U 0x23e /* .6. Data BAT Reg 7 Upper */ #define SPR_DBAT7L 0x23f /* .6. Data BAT Reg 7 Lower */ #define SPR_SPRG8 0x25c /* ..8 SPR General 8 */ #define SPR_MI_CTR 0x310 /* ..8 IMMU control */ #define Mx_CTR_GPM 0x80000000 /* Group Protection Mode */ #define Mx_CTR_PPM 0x40000000 /* Page Protection Mode */ #define Mx_CTR_CIDEF 0x20000000 /* Cache-Inhibit DEFault */ #define MD_CTR_WTDEF 0x20000000 /* Write-Through DEFault */ #define Mx_CTR_RSV4 0x08000000 /* Reserve 4 TLB entries */ #define MD_CTR_TWAM 0x04000000 /* TableWalk Assist Mode */ #define Mx_CTR_PPCS 0x02000000 /* Priv/user state compare mode */ #define Mx_CTR_TLB_INDX 0x000001f0 /* TLB index mask */ #define Mx_CTR_TLB_INDX_BITPOS 8 /* TLB index shift */ #define SPR_MI_AP 0x312 /* ..8 IMMU access protection */ #define Mx_GP_SUPER(n) (0 << (2*(15-(n)))) /* access is supervisor */ #define Mx_GP_PAGE (1 << (2*(15-(n)))) /* access is page protect */ #define Mx_GP_SWAPPED (2 << (2*(15-(n)))) /* access is swapped */ #define Mx_GP_USER (3 << (2*(15-(n)))) /* access is user */ #define SPR_MI_EPN 0x313 /* ..8 IMMU effective number */ #define Mx_EPN_EPN 0xfffff000 /* Effective Page Number mask */ #define Mx_EPN_EV 0x00000020 /* Entry Valid */ #define Mx_EPN_ASID 0x0000000f /* Address Space ID */ #define SPR_MI_TWC 0x315 /* ..8 IMMU tablewalk control */ #define MD_TWC_L2TB 0xfffff000 /* Level-2 Tablewalk Base */ #define Mx_TWC_APG 0x000001e0 /* Access Protection Group */ #define Mx_TWC_G 0x00000010 /* Guarded memory */ #define Mx_TWC_PS 0x0000000c /* Page Size (L1) */ #define MD_TWC_WT 0x00000002 /* Write-Through */ #define Mx_TWC_V 0x00000001 /* Entry Valid */ #define SPR_MI_RPN 0x316 /* ..8 IMMU real (phys) page number */ #define Mx_RPN_RPN 0xfffff000 /* Real Page Number */ #define Mx_RPN_PP 0x00000ff0 /* Page Protection */ #define Mx_RPN_SPS 0x00000008 /* Small Page Size */ #define Mx_RPN_SH 
0x00000004 /* SHared page */ #define Mx_RPN_CI 0x00000002 /* Cache Inhibit */ #define Mx_RPN_V 0x00000001 /* Valid */ #define SPR_MD_CTR 0x318 /* ..8 DMMU control */ #define SPR_M_CASID 0x319 /* ..8 CASID */ #define M_CASID 0x0000000f /* Current AS Id */ #define SPR_MD_AP 0x31a /* ..8 DMMU access protection */ #define SPR_MD_EPN 0x31b /* ..8 DMMU effective number */ #define SPR_970MMCR0 0x31b /* ... Monitor Mode Control Register 0 (PPC 970) */ #define SPR_970MMCR0_PMC1SEL(x) ((x) << 8) /* PMC1 selector (970) */ #define SPR_970MMCR0_PMC2SEL(x) ((x) << 1) /* PMC2 selector (970) */ #define SPR_970MMCR1 0x31e /* ... Monitor Mode Control Register 1 (PPC 970) */ #define SPR_970MMCR1_PMC3SEL(x) (((x) & 0x1f) << 27) /* PMC 3 selector */ #define SPR_970MMCR1_PMC4SEL(x) (((x) & 0x1f) << 22) /* PMC 4 selector */ #define SPR_970MMCR1_PMC5SEL(x) (((x) & 0x1f) << 17) /* PMC 5 selector */ #define SPR_970MMCR1_PMC6SEL(x) (((x) & 0x1f) << 12) /* PMC 6 selector */ #define SPR_970MMCR1_PMC7SEL(x) (((x) & 0x1f) << 7) /* PMC 7 selector */ #define SPR_970MMCR1_PMC8SEL(x) (((x) & 0x1f) << 2) /* PMC 8 selector */ #define SPR_970MMCRA 0x312 /* ... Monitor Mode Control Register 2 (PPC 970) */ #define SPR_970PMC1 0x313 /* ... PMC 1 */ #define SPR_970PMC2 0x314 /* ... PMC 2 */ #define SPR_970PMC3 0x315 /* ... PMC 3 */ #define SPR_970PMC4 0x316 /* ... PMC 4 */ #define SPR_970PMC5 0x317 /* ... PMC 5 */ #define SPR_970PMC6 0x318 /* ... PMC 6 */ #define SPR_970PMC7 0x319 /* ... PMC 7 */ #define SPR_970PMC8 0x31a /* ... PMC 8 */ #define SPR_M_TWB 0x31c /* ..8 MMU tablewalk base */ #define M_TWB_L1TB 0xfffff000 /* level-1 translation base */ #define M_TWB_L1INDX 0x00000ffc /* level-1 index */ #define SPR_MD_TWC 0x31d /* ..8 DMMU tablewalk control */ #define SPR_MD_RPN 0x31e /* ..8 DMMU real (phys) page number */ #define SPR_MD_TW 0x31f /* ..8 MMU tablewalk scratch */ +#define SPR_BESCRS 0x320 /* .6. Branch Event Status and Control Set Register */ +#define SPR_BESCRSU 0x321 /* .6. Branch Event Status and Control Set Register (upper 32-bit) */ +#define SPR_BESCRR 0x322 /* .6. Branch Event Status and Control Reset Register */ +#define SPR_BESCRRU 0x323 /* .6. Branch Event Status and Control Register (upper 32-bit) */ +#define SPR_EBBHR 0x324 /* .6. Event-based Branch Handler Register */ +#define SPR_EBBRR 0x325 /* .6. Event-based Branch Return Register */ +#define SPR_BESCR 0x326 /* .6. Branch Event Status and Control Register */ +#define SPR_LMRR 0x32d /* .6. Load Monitored Region Register */ +#define SPR_LMSER 0x32e /* .6. Load Monitored Section Enable Register */ +#define SPR_TAR 0x32f /* .6. 
Branch Target Address Register */ #define SPR_MI_CAM 0x330 /* ..8 IMMU CAM entry read */ #define SPR_MI_RAM0 0x331 /* ..8 IMMU RAM entry read reg 0 */ #define SPR_MI_RAM1 0x332 /* ..8 IMMU RAM entry read reg 1 */ #define SPR_MD_CAM 0x338 /* ..8 IMMU CAM entry read */ #define SPR_MD_RAM0 0x339 /* ..8 IMMU RAM entry read reg 0 */ #define SPR_MD_RAM1 0x33a /* ..8 IMMU RAM entry read reg 1 */ #define SPR_PSSCR 0x357 /* Processor Stop Status and Control Register (ISA 3.0) */ #define PSSCR_PLS_S 60 #define PSSCR_PLS_M (0xf << PSSCR_PLS_S) #define PSSCR_SD (1 << 22) #define PSSCR_ESL (1 << 21) #define PSSCR_EC (1 << 20) #define PSSCR_PSLL_S 16 #define PSSCR_PSLL_M (0xf << PSSCR_PSLL_S) #define PSSCR_TR_S 8 #define PSSCR_TR_M (0x3 << PSSCR_TR_S) #define PSSCR_MTL_S 4 #define PSSCR_MTL_M (0xf << PSSCR_MTL_S) #define PSSCR_RL_S 0 #define PSSCR_RL_M (0xf << PSSCR_RL_S) #define SPR_PMCR 0x374 /* Processor Management Control Register */ #define SPR_UMMCR2 0x3a0 /* .6. User Monitor Mode Control Register 2 */ #define SPR_UMMCR0 0x3a8 /* .6. User Monitor Mode Control Register 0 */ #define SPR_USIA 0x3ab /* .6. User Sampled Instruction Address */ #define SPR_UMMCR1 0x3ac /* .6. User Monitor Mode Control Register 1 */ #define SPR_ZPR 0x3b0 /* 4.. Zone Protection Register */ #define SPR_MMCR2 0x3b0 /* .6. Monitor Mode Control Register 2 */ #define SPR_MMCR2_THRESHMULT_32 0x80000000 /* Multiply MMCR0 threshold by 32 */ #define SPR_MMCR2_THRESHMULT_2 0x00000000 /* Multiply MMCR0 threshold by 2 */ #define SPR_PID 0x3b1 /* 4.. Process ID */ #define SPR_PMC5 0x3b1 /* .6. Performance Counter Register 5 */ #define SPR_PMC6 0x3b2 /* .6. Performance Counter Register 6 */ #define SPR_CCR0 0x3b3 /* 4.. Core Configuration Register 0 */ #define SPR_IAC3 0x3b4 /* 4.. Instruction Address Compare 3 */ #define SPR_IAC4 0x3b5 /* 4.. Instruction Address Compare 4 */ #define SPR_DVC1 0x3b6 /* 4.. Data Value Compare 1 */ #define SPR_DVC2 0x3b7 /* 4.. Data Value Compare 2 */ #define SPR_MMCR0 0x3b8 /* .6. Monitor Mode Control Register 0 */ #define SPR_MMCR0_FC 0x80000000 /* Freeze counters */ #define SPR_MMCR0_FCS 0x40000000 /* Freeze counters in supervisor mode */ #define SPR_MMCR0_FCP 0x20000000 /* Freeze counters in user mode */ #define SPR_MMCR0_FCM1 0x10000000 /* Freeze counters when mark=1 */ #define SPR_MMCR0_FCM0 0x08000000 /* Freeze counters when mark=0 */ #define SPR_MMCR0_PMXE 0x04000000 /* Enable PM interrupt */ #define SPR_MMCR0_FCECE 0x02000000 /* Freeze counters after event */ #define SPR_MMCR0_TBSEL_15 0x01800000 /* Count bit 15 of TBL */ #define SPR_MMCR0_TBSEL_19 0x01000000 /* Count bit 19 of TBL */ #define SPR_MMCR0_TBSEL_23 0x00800000 /* Count bit 23 of TBL */ #define SPR_MMCR0_TBSEL_31 0x00000000 /* Count bit 31 of TBL */ #define SPR_MMCR0_TBEE 0x00400000 /* Time-base event enable */ #define SPR_MMCRO_THRESHOLD(x) ((x) << 16) /* Threshold value */ #define SPR_MMCR0_PMC1CE 0x00008000 /* PMC1 condition enable */ #define SPR_MMCR0_PMCNCE 0x00004000 /* PMCn condition enable */ #define SPR_MMCR0_TRIGGER 0x00002000 /* Trigger */ #define SPR_MMCR0_PMC1SEL(x) (((x) & 0x3f) << 6) /* PMC1 selector */ #define SPR_MMCR0_PMC2SEL(x) (((x) & 0x3f) << 0) /* PMC2 selector */ #define SPR_SGR 0x3b9 /* 4.. Storage Guarded Register */ #define SPR_PMC1 0x3b9 /* .6. Performance Counter Register 1 */ #define SPR_DCWR 0x3ba /* 4.. Data Cache Write-through Register */ #define SPR_PMC2 0x3ba /* .6. Performance Counter Register 2 */ #define SPR_SLER 0x3bb /* 4.. Storage Little Endian Register */ #define SPR_SIA 0x3bb /* .6. 
Sampled Instruction Address */ #define SPR_MMCR1 0x3bc /* .6. Monitor Mode Control Register 2 */ #define SPR_MMCR1_PMC3SEL(x) (((x) & 0x1f) << 27) /* PMC 3 selector */ #define SPR_MMCR1_PMC4SEL(x) (((x) & 0x1f) << 22) /* PMC 4 selector */ #define SPR_MMCR1_PMC5SEL(x) (((x) & 0x1f) << 17) /* PMC 5 selector */ #define SPR_MMCR1_PMC6SEL(x) (((x) & 0x3f) << 11) /* PMC 6 selector */ #define SPR_SU0R 0x3bc /* 4.. Storage User-defined 0 Register */ #define SPR_PMC3 0x3bd /* .6. Performance Counter Register 3 */ #define SPR_PMC4 0x3be /* .6. Performance Counter Register 4 */ #define SPR_DMISS 0x3d0 /* .68 Data TLB Miss Address Register */ #define SPR_DCMP 0x3d1 /* .68 Data TLB Compare Register */ #define SPR_HASH1 0x3d2 /* .68 Primary Hash Address Register */ #define SPR_ICDBDR 0x3d3 /* 4.. Instruction Cache Debug Data Register */ #define SPR_HASH2 0x3d3 /* .68 Secondary Hash Address Register */ #define SPR_IMISS 0x3d4 /* .68 Instruction TLB Miss Address Register */ #define SPR_TLBMISS 0x3d4 /* .6. TLB Miss Address Register */ #define SPR_DEAR 0x3d5 /* 4.. Data Error Address Register */ #define SPR_ICMP 0x3d5 /* .68 Instruction TLB Compare Register */ #define SPR_PTEHI 0x3d5 /* .6. Instruction TLB Compare Register */ #define SPR_EVPR 0x3d6 /* 4.. Exception Vector Prefix Register */ #define SPR_RPA 0x3d6 /* .68 Required Physical Address Register */ #define SPR_PTELO 0x3d6 /* .6. Required Physical Address Register */ #define SPR_TSR 0x150 /* ..8 Timer Status Register */ #define SPR_TCR 0x154 /* ..8 Timer Control Register */ #define TSR_ENW 0x80000000 /* Enable Next Watchdog */ #define TSR_WIS 0x40000000 /* Watchdog Interrupt Status */ #define TSR_WRS_MASK 0x30000000 /* Watchdog Reset Status */ #define TSR_WRS_NONE 0x00000000 /* No watchdog reset has occurred */ #define TSR_WRS_CORE 0x10000000 /* Core reset was forced by the watchdog */ #define TSR_WRS_CHIP 0x20000000 /* Chip reset was forced by the watchdog */ #define TSR_WRS_SYSTEM 0x30000000 /* System reset was forced by the watchdog */ #define TSR_PIS 0x08000000 /* PIT Interrupt Status */ #define TSR_DIS 0x08000000 /* Decrementer Interrupt Status */ #define TSR_FIS 0x04000000 /* FIT Interrupt Status */ #define TCR_WP_MASK 0xc0000000 /* Watchdog Period mask */ #define TCR_WP_2_17 0x00000000 /* 2**17 clocks */ #define TCR_WP_2_21 0x40000000 /* 2**21 clocks */ #define TCR_WP_2_25 0x80000000 /* 2**25 clocks */ #define TCR_WP_2_29 0xc0000000 /* 2**29 clocks */ #define TCR_WRC_MASK 0x30000000 /* Watchdog Reset Control mask */ #define TCR_WRC_NONE 0x00000000 /* No watchdog reset */ #define TCR_WRC_CORE 0x10000000 /* Core reset */ #define TCR_WRC_CHIP 0x20000000 /* Chip reset */ #define TCR_WRC_SYSTEM 0x30000000 /* System reset */ #define TCR_WIE 0x08000000 /* Watchdog Interrupt Enable */ #define TCR_PIE 0x04000000 /* PIT Interrupt Enable */ #define TCR_DIE 0x04000000 /* Pecrementer Interrupt Enable */ #define TCR_FP_MASK 0x03000000 /* FIT Period */ #define TCR_FP_2_9 0x00000000 /* 2**9 clocks */ #define TCR_FP_2_13 0x01000000 /* 2**13 clocks */ #define TCR_FP_2_17 0x02000000 /* 2**17 clocks */ #define TCR_FP_2_21 0x03000000 /* 2**21 clocks */ #define TCR_FIE 0x00800000 /* FIT Interrupt Enable */ #define TCR_ARE 0x00400000 /* Auto Reload Enable */ #define SPR_PIT 0x3db /* 4.. Programmable Interval Timer */ #define SPR_SRR2 0x3de /* 4.. Save/Restore Register 2 */ #define SPR_SRR3 0x3df /* 4.. 
Save/Restore Register 3 */ #define SPR_HID0 0x3f0 /* ..8 Hardware Implementation Register 0 */ #define SPR_HID1 0x3f1 /* ..8 Hardware Implementation Register 1 */ #define SPR_HID2 0x3f3 /* ..8 Hardware Implementation Register 2 */ #define SPR_HID4 0x3f4 /* ..8 Hardware Implementation Register 4 */ #define SPR_HID5 0x3f6 /* ..8 Hardware Implementation Register 5 */ #define SPR_HID6 0x3f9 /* ..8 Hardware Implementation Register 6 */ #define SPR_CELL_TSRL 0x380 /* ... Cell BE Thread Status Register */ #define SPR_CELL_TSCR 0x399 /* ... Cell BE Thread Switch Register */ #if defined(AIM) #define SPR_DBSR 0x3f0 /* 4.. Debug Status Register */ #define DBSR_IC 0x80000000 /* Instruction completion debug event */ #define DBSR_BT 0x40000000 /* Branch Taken debug event */ #define DBSR_EDE 0x20000000 /* Exception debug event */ #define DBSR_TIE 0x10000000 /* Trap Instruction debug event */ #define DBSR_UDE 0x08000000 /* Unconditional debug event */ #define DBSR_IA1 0x04000000 /* IAC1 debug event */ #define DBSR_IA2 0x02000000 /* IAC2 debug event */ #define DBSR_DR1 0x01000000 /* DAC1 Read debug event */ #define DBSR_DW1 0x00800000 /* DAC1 Write debug event */ #define DBSR_DR2 0x00400000 /* DAC2 Read debug event */ #define DBSR_DW2 0x00200000 /* DAC2 Write debug event */ #define DBSR_IDE 0x00100000 /* Imprecise debug event */ #define DBSR_IA3 0x00080000 /* IAC3 debug event */ #define DBSR_IA4 0x00040000 /* IAC4 debug event */ #define DBSR_MRR 0x00000300 /* Most recent reset */ #define SPR_DBCR0 0x3f2 /* 4.. Debug Control Register 0 */ #define SPR_DBCR1 0x3bd /* 4.. Debug Control Register 1 */ #define SPR_IAC1 0x3f4 /* 4.. Instruction Address Compare 1 */ #define SPR_IAC2 0x3f5 /* 4.. Instruction Address Compare 2 */ #define SPR_DAC1 0x3f6 /* 4.. Data Address Compare 1 */ #define SPR_DAC2 0x3f7 /* 4.. Data Address Compare 2 */ #define SPR_PIR 0x3ff /* .6. Processor Identification Register */ #elif defined(BOOKE) #define SPR_PIR 0x11e /* ..8 Processor Identification Register */ #define SPR_DBSR 0x130 /* ..8 Debug Status Register */ #define DBSR_IDE 0x80000000 /* Imprecise debug event. */ #define DBSR_UDE 0x40000000 /* Unconditional debug event. */ #define DBSR_MRR 0x30000000 /* Most recent Reset (mask). */ #define DBSR_ICMP 0x08000000 /* Instr. complete debug event. */ #define DBSR_BRT 0x04000000 /* Branch taken debug event. */ #define DBSR_IRPT 0x02000000 /* Interrupt taken debug event. */ #define DBSR_TRAP 0x01000000 /* Trap instr. debug event. */ #define DBSR_IAC1 0x00800000 /* Instr. address compare #1. */ #define DBSR_IAC2 0x00400000 /* Instr. address compare #2. */ #define DBSR_IAC3 0x00200000 /* Instr. address compare #3. */ #define DBSR_IAC4 0x00100000 /* Instr. address compare #4. */ #define DBSR_DAC1R 0x00080000 /* Data addr. read compare #1. */ #define DBSR_DAC1W 0x00040000 /* Data addr. write compare #1. */ #define DBSR_DAC2R 0x00020000 /* Data addr. read compare #2. */ #define DBSR_DAC2W 0x00010000 /* Data addr. write compare #2. */ #define DBSR_RET 0x00008000 /* Return debug event. 
*/ #define SPR_DBCR0 0x134 /* ..8 Debug Control Register 0 */ #define SPR_DBCR1 0x135 /* ..8 Debug Control Register 1 */ #define SPR_IAC1 0x138 /* ..8 Instruction Address Compare 1 */ #define SPR_IAC2 0x139 /* ..8 Instruction Address Compare 2 */ #define SPR_DAC1 0x13c /* ..8 Data Address Compare 1 */ #define SPR_DAC2 0x13d /* ..8 Data Address Compare 2 */ #endif #define DBCR0_EDM 0x80000000 /* External Debug Mode */ #define DBCR0_IDM 0x40000000 /* Internal Debug Mode */ #define DBCR0_RST_MASK 0x30000000 /* ReSeT */ #define DBCR0_RST_NONE 0x00000000 /* No action */ #define DBCR0_RST_CORE 0x10000000 /* Core reset */ #define DBCR0_RST_CHIP 0x20000000 /* Chip reset */ #define DBCR0_RST_SYSTEM 0x30000000 /* System reset */ #define DBCR0_IC 0x08000000 /* Instruction Completion debug event */ #define DBCR0_BT 0x04000000 /* Branch Taken debug event */ #define DBCR0_EDE 0x02000000 /* Exception Debug Event */ #define DBCR0_TDE 0x01000000 /* Trap Debug Event */ #define DBCR0_IA1 0x00800000 /* IAC (Instruction Address Compare) 1 debug event */ #define DBCR0_IA2 0x00400000 /* IAC 2 debug event */ #define DBCR0_IA12 0x00200000 /* Instruction Address Range Compare 1-2 */ #define DBCR0_IA12X 0x00100000 /* IA12 eXclusive */ #define DBCR0_IA3 0x00080000 /* IAC 3 debug event */ #define DBCR0_IA4 0x00040000 /* IAC 4 debug event */ #define DBCR0_IA34 0x00020000 /* Instruction Address Range Compare 3-4 */ #define DBCR0_IA34X 0x00010000 /* IA34 eXclusive */ #define DBCR0_IA12T 0x00008000 /* Instruction Address Range Compare 1-2 range Toggle */ #define DBCR0_IA34T 0x00004000 /* Instruction Address Range Compare 3-4 range Toggle */ #define DBCR0_FT 0x00000001 /* Freeze Timers on debug event */ #define SPR_IABR 0x3f2 /* ..8 Instruction Address Breakpoint Register 0 */ #define SPR_DABR 0x3f5 /* .6. Data Address Breakpoint Register */ #define SPR_MSSCR0 0x3f6 /* .6. Memory SubSystem Control Register */ #define MSSCR0_SHDEN 0x80000000 /* 0: Shared-state enable */ #define MSSCR0_SHDPEN3 0x40000000 /* 1: ~SHD[01] signal enable in MEI mode */ #define MSSCR0_L1INTVEN 0x38000000 /* 2-4: L1 data cache ~HIT intervention enable */ #define MSSCR0_L2INTVEN 0x07000000 /* 5-7: L2 data cache ~HIT intervention enable*/ #define MSSCR0_DL1HWF 0x00800000 /* 8: L1 data cache hardware flush */ #define MSSCR0_MBO 0x00400000 /* 9: must be one */ #define MSSCR0_EMODE 0x00200000 /* 10: MPX bus mode (read-only) */ #define MSSCR0_ABD 0x00100000 /* 11: address bus driven (read-only) */ #define MSSCR0_MBZ 0x000fffff /* 12-31: must be zero */ #define MSSCR0_L2PFE 0x00000003 /* 30-31: L2 prefetch enable */ #define SPR_MSSSR0 0x3f7 /* .6. Memory Subsystem Status Register (MPC745x) */ #define MSSSR0_L2TAG 0x00040000 /* 13: L2 tag parity error */ #define MSSSR0_L2DAT 0x00020000 /* 14: L2 data parity error */ #define MSSSR0_L3TAG 0x00010000 /* 15: L3 tag parity error */ #define MSSSR0_L3DAT 0x00008000 /* 16: L3 data parity error */ #define MSSSR0_APE 0x00004000 /* 17: Address parity error */ #define MSSSR0_DPE 0x00002000 /* 18: Data parity error */ #define MSSSR0_TEA 0x00001000 /* 19: Bus transfer error acknowledge */ #define SPR_LDSTCR 0x3f8 /* .6. Load/Store Control Register */ #define SPR_L2PM 0x3f8 /* .6. L2 Private Memory Control Register */ #define SPR_L2CR 0x3f9 /* .6. 
L2 Control Register */ #define L2CR_L2E 0x80000000 /* 0: L2 enable */ #define L2CR_L2PE 0x40000000 /* 1: L2 data parity enable */ #define L2CR_L2SIZ 0x30000000 /* 2-3: L2 size */ #define L2SIZ_2M 0x00000000 #define L2SIZ_256K 0x10000000 #define L2SIZ_512K 0x20000000 #define L2SIZ_1M 0x30000000 #define L2CR_L2CLK 0x0e000000 /* 4-6: L2 clock ratio */ #define L2CLK_DIS 0x00000000 /* disable L2 clock */ #define L2CLK_10 0x02000000 /* core clock / 1 */ #define L2CLK_15 0x04000000 /* / 1.5 */ #define L2CLK_20 0x08000000 /* / 2 */ #define L2CLK_25 0x0a000000 /* / 2.5 */ #define L2CLK_30 0x0c000000 /* / 3 */ #define L2CR_L2RAM 0x01800000 /* 7-8: L2 RAM type */ #define L2RAM_FLOWTHRU_BURST 0x00000000 #define L2RAM_PIPELINE_BURST 0x01000000 #define L2RAM_PIPELINE_LATE 0x01800000 #define L2CR_L2DO 0x00400000 /* 9: L2 data-only. Setting this bit disables instruction caching. */ #define L2CR_L2I 0x00200000 /* 10: L2 global invalidate. */ #define L2CR_L2IO_7450 0x00010000 /* 11: L2 instruction-only (MPC745x). */ #define L2CR_L2CTL 0x00100000 /* 11: L2 RAM control (ZZ enable). Enables automatic operation of the L2ZZ (low-power mode) signal. */ #define L2CR_L2WT 0x00080000 /* 12: L2 write-through. */ #define L2CR_L2TS 0x00040000 /* 13: L2 test support. */ #define L2CR_L2OH 0x00030000 /* 14-15: L2 output hold. */ #define L2CR_L2DO_7450 0x00010000 /* 15: L2 data-only (MPC745x). */ #define L2CR_L2SL 0x00008000 /* 16: L2 DLL slow. */ #define L2CR_L2DF 0x00004000 /* 17: L2 differential clock. */ #define L2CR_L2BYP 0x00002000 /* 18: L2 DLL bypass. */ #define L2CR_L2FA 0x00001000 /* 19: L2 flush assist (for software flush). */ #define L2CR_L2HWF 0x00000800 /* 20: L2 hardware flush. */ #define L2CR_L2IO 0x00000400 /* 21: L2 instruction-only. */ #define L2CR_L2CLKSTP 0x00000200 /* 22: L2 clock stop. */ #define L2CR_L2DRO 0x00000100 /* 23: L2DLL rollover checkstop enable. */ #define L2CR_L2IP 0x00000001 /* 31: L2 global invalidate in */ /* progress (read only). */ #define SPR_L3CR 0x3fa /* .6. L3 Control Register */ #define L3CR_L3E 0x80000000 /* 0: L3 enable */ #define L3CR_L3PE 0x40000000 /* 1: L3 data parity enable */ #define L3CR_L3APE 0x20000000 #define L3CR_L3SIZ 0x10000000 /* 3: L3 size (0=1MB, 1=2MB) */ #define L3CR_L3CLKEN 0x08000000 /* 4: Enables L3_CLK[0:1] */ #define L3CR_L3CLK 0x03800000 #define L3CR_L3IO 0x00400000 #define L3CR_L3CLKEXT 0x00200000 #define L3CR_L3CKSPEXT 0x00100000 #define L3CR_L3OH1 0x00080000 #define L3CR_L3SPO 0x00040000 #define L3CR_L3CKSP 0x00030000 #define L3CR_L3PSP 0x0000e000 #define L3CR_L3REP 0x00001000 #define L3CR_L3HWF 0x00000800 #define L3CR_L3I 0x00000400 /* 21: L3 global invalidate */ #define L3CR_L3RT 0x00000300 #define L3CR_L3NIRCA 0x00000080 #define L3CR_L3DO 0x00000040 #define L3CR_PMEN 0x00000004 #define L3CR_PMSIZ 0x00000003 #define SPR_DCCR 0x3fa /* 4.. Data Cache Cachability Register */ #define SPR_ICCR 0x3fb /* 4.. Instruction Cache Cachability Register */ #define SPR_THRM1 0x3fc /* .6. Thermal Management Register */ #define SPR_THRM2 0x3fd /* .6. Thermal Management Register */ #define SPR_THRM_TIN 0x80000000 /* Thermal interrupt bit (RO) */ #define SPR_THRM_TIV 0x40000000 /* Thermal interrupt valid (RO) */ #define SPR_THRM_THRESHOLD(x) ((x) << 23) /* Thermal sensor threshold */ #define SPR_THRM_TID 0x00000004 /* Thermal interrupt direction */ #define SPR_THRM_TIE 0x00000002 /* Thermal interrupt enable */ #define SPR_THRM_VALID 0x00000001 /* Valid bit */ #define SPR_THRM3 0x3fe /* .6. 
Thermal Management Register */ #define SPR_THRM_TIMER(x) ((x) << 1) /* Sampling interval timer */ #define SPR_THRM_ENABLE 0x00000001 /* TAU Enable */ #define SPR_FPECR 0x3fe /* .6. Floating-Point Exception Cause Register */ /* Time Base Register declarations */ #define TBR_TBL 0x10c /* 468 Time Base Lower - read */ #define TBR_TBU 0x10d /* 468 Time Base Upper - read */ #define TBR_TBWL 0x11c /* 468 Time Base Lower - supervisor, write */ #define TBR_TBWU 0x11d /* 468 Time Base Upper - supervisor, write */ /* Performance counter declarations */ #define PMC_OVERFLOW 0x80000000 /* Counter has overflowed */ /* The first five countable [non-]events are common to many PMC's */ #define PMCN_NONE 0 /* Count nothing */ #define PMCN_CYCLES 1 /* Processor cycles */ #define PMCN_ICOMP 2 /* Instructions completed */ #define PMCN_TBLTRANS 3 /* TBL bit transitions */ #define PCMN_IDISPATCH 4 /* Instructions dispatched */ /* Similar things for the 970 PMC direct counters */ #define PMC970N_NONE 0x8 /* Count nothing */ #define PMC970N_CYCLES 0xf /* Processor cycles */ #define PMC970N_ICOMP 0x9 /* Instructions completed */ #if defined(BOOKE) #define SPR_MCARU 0x239 /* ..8 Machine Check Address register upper bits */ #define SPR_MCSR 0x23c /* ..8 Machine Check Syndrome register */ #define SPR_MCAR 0x23d /* ..8 Machine Check Address register */ #define SPR_ESR 0x003e /* ..8 Exception Syndrome Register */ #define ESR_PIL 0x08000000 /* Program interrupt - illegal */ #define ESR_PPR 0x04000000 /* Program interrupt - privileged */ #define ESR_PTR 0x02000000 /* Program interrupt - trap */ #define ESR_ST 0x00800000 /* Store operation */ #define ESR_DLK 0x00200000 /* Data storage, D cache locking */ #define ESR_ILK 0x00100000 /* Data storage, I cache locking */ #define ESR_BO 0x00020000 /* Data/instruction storage, byte ordering */ #define ESR_SPE 0x00000080 /* SPE exception bit */ #define SPR_CSRR0 0x03a /* ..8 58 Critical SRR0 */ #define SPR_CSRR1 0x03b /* ..8 59 Critical SRR1 */ #define SPR_MCSRR0 0x23a /* ..8 570 Machine check SRR0 */ #define SPR_MCSRR1 0x23b /* ..8 571 Machine check SRR1 */ #define SPR_DSRR0 0x23e /* ..8 574 Debug SRR0 */ #define SPR_DSRR1 0x23f /* ..8 575 Debug SRR1 */ #define SPR_MMUCR 0x3b2 /* 4.. 
MMU Control Register */ #define MMUCR_SWOA (0x80000000 >> 7) #define MMUCR_U1TE (0x80000000 >> 9) #define MMUCR_U2SWOAE (0x80000000 >> 10) #define MMUCR_DULXE (0x80000000 >> 12) #define MMUCR_IULXE (0x80000000 >> 13) #define MMUCR_STS (0x80000000 >> 15) #define MMUCR_STID_MASK (0xFF000000 >> 24) #define SPR_MMUCSR0 0x3f4 /* ..8 1012 MMU Control and Status Register 0 */ #define MMUCSR0_L2TLB0_FI 0x04 /* TLB0 flash invalidate */ #define MMUCSR0_L2TLB1_FI 0x02 /* TLB1 flash invalidate */ #define SPR_SVR 0x3ff /* ..8 1023 System Version Register */ #define SVR_MPC8533 0x8034 #define SVR_MPC8533E 0x803c #define SVR_MPC8541 0x8072 #define SVR_MPC8541E 0x807a #define SVR_MPC8548 0x8031 #define SVR_MPC8548E 0x8039 #define SVR_MPC8555 0x8071 #define SVR_MPC8555E 0x8079 #define SVR_MPC8572 0x80e0 #define SVR_MPC8572E 0x80e8 #define SVR_P1011 0x80e5 #define SVR_P1011E 0x80ed #define SVR_P1013 0x80e7 #define SVR_P1013E 0x80ef #define SVR_P1020 0x80e4 #define SVR_P1020E 0x80ec #define SVR_P1022 0x80e6 #define SVR_P1022E 0x80ee #define SVR_P2010 0x80e3 #define SVR_P2010E 0x80eb #define SVR_P2020 0x80e2 #define SVR_P2020E 0x80ea #define SVR_P2041 0x8210 #define SVR_P2041E 0x8218 #define SVR_P3041 0x8211 #define SVR_P3041E 0x8219 #define SVR_P4040 0x8200 #define SVR_P4040E 0x8208 #define SVR_P4080 0x8201 #define SVR_P4080E 0x8209 #define SVR_P5010 0x8221 #define SVR_P5010E 0x8229 #define SVR_P5020 0x8220 #define SVR_P5020E 0x8228 #define SVR_P5021 0x8205 #define SVR_P5021E 0x820d #define SVR_P5040 0x8204 #define SVR_P5040E 0x820c #define SVR_VER(svr) (((svr) >> 16) & 0xffff) #define SPR_PID0 0x030 /* ..8 Process ID Register 0 */ #define SPR_PID1 0x279 /* ..8 Process ID Register 1 */ #define SPR_PID2 0x27a /* ..8 Process ID Register 2 */ #define SPR_TLB0CFG 0x2B0 /* ..8 TLB 0 Config Register */ #define SPR_TLB1CFG 0x2B1 /* ..8 TLB 1 Config Register */ #define TLBCFG_ASSOC_MASK 0xff000000 /* Associativity of TLB */ #define TLBCFG_ASSOC_SHIFT 24 #define TLBCFG_NENTRY_MASK 0x00000fff /* Number of entries in TLB */ #define SPR_IVPR 0x03f /* ..8 Interrupt Vector Prefix Register */ #define SPR_IVOR0 0x190 /* ..8 Critical input */ #define SPR_IVOR1 0x191 /* ..8 Machine check */ #define SPR_IVOR2 0x192 #define SPR_IVOR3 0x193 #define SPR_IVOR4 0x194 #define SPR_IVOR5 0x195 #define SPR_IVOR6 0x196 #define SPR_IVOR7 0x197 #define SPR_IVOR8 0x198 #define SPR_IVOR9 0x199 #define SPR_IVOR10 0x19a #define SPR_IVOR11 0x19b #define SPR_IVOR12 0x19c #define SPR_IVOR13 0x19d #define SPR_IVOR14 0x19e #define SPR_IVOR15 0x19f #define SPR_IVOR32 0x210 #define SPR_IVOR33 0x211 #define SPR_IVOR34 0x212 #define SPR_IVOR35 0x213 #define SPR_MAS0 0x270 /* ..8 MMU Assist Register 0 Book-E/e500 */ #define SPR_MAS1 0x271 /* ..8 MMU Assist Register 1 Book-E/e500 */ #define SPR_MAS2 0x272 /* ..8 MMU Assist Register 2 Book-E/e500 */ #define SPR_MAS3 0x273 /* ..8 MMU Assist Register 3 Book-E/e500 */ #define SPR_MAS4 0x274 /* ..8 MMU Assist Register 4 Book-E/e500 */ #define SPR_MAS5 0x275 /* ..8 MMU Assist Register 5 Book-E */ #define SPR_MAS6 0x276 /* ..8 MMU Assist Register 6 Book-E/e500 */ #define SPR_MAS7 0x3B0 /* ..8 MMU Assist Register 7 Book-E/e500 */ #define SPR_MAS8 0x155 /* ..8 MMU Assist Register 8 Book-E/e500 */ #define SPR_L1CFG0 0x203 /* ..8 L1 cache configuration register 0 */ #define SPR_L1CFG1 0x204 /* ..8 L1 cache configuration register 1 */ #define SPR_CCR1 0x378 #define CCR1_L2COBE 0x00000040 #define DCR_L2DCDCRAI 0x0000 /* L2 D-Cache DCR Address Pointer */ #define DCR_L2DCDCRDI 0x0001 /* L2 D-Cache DCR Data Indirect */ 
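A minimal sketch of how the Book-E TLB-geometry and SVR macros above are typically consumed, assuming the mfspr() accessor defined earlier in this header and the kernel printf(9); the helper name is hypothetical:

static void
booke_print_tlb1_geometry(void)
{
	register_t cfg, svr;
	u_int assoc, nentries;

	/* TLB1CFG packs associativity in the top byte, entry count in bits 0-11. */
	cfg = mfspr(SPR_TLB1CFG);
	assoc = (cfg & TLBCFG_ASSOC_MASK) >> TLBCFG_ASSOC_SHIFT;
	nentries = cfg & TLBCFG_NENTRY_MASK;

	/* SVR_VER() yields the 16-bit version field compared against the SVR_* constants. */
	svr = mfspr(SPR_SVR);

	printf("TLB1: %u entries, %u-way set associative, SoC version 0x%04x\n",
	    nentries, assoc, (u_int)SVR_VER(svr));
}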
#define DCR_L2CR0 0x00 /* L2 Cache Configuration Register 0 */ #define L2CR0_AS 0x30000000 #define SPR_L1CSR0 0x3F2 /* ..8 L1 Cache Control and Status Register 0 */ #define L1CSR0_DCPE 0x00010000 /* Data Cache Parity Enable */ #define L1CSR0_DCLFR 0x00000100 /* Data Cache Lock Bits Flash Reset */ #define L1CSR0_DCFI 0x00000002 /* Data Cache Flash Invalidate */ #define L1CSR0_DCE 0x00000001 /* Data Cache Enable */ #define SPR_L1CSR1 0x3F3 /* ..8 L1 Cache Control and Status Register 1 */ #define L1CSR1_ICPE 0x00010000 /* Instruction Cache Parity Enable */ #define L1CSR1_ICUL 0x00000400 /* Instr Cache Unable to Lock */ #define L1CSR1_ICLFR 0x00000100 /* Instruction Cache Lock Bits Flash Reset */ #define L1CSR1_ICFI 0x00000002 /* Instruction Cache Flash Invalidate */ #define L1CSR1_ICE 0x00000001 /* Instruction Cache Enable */ #define SPR_L2CSR0 0x3F9 /* ..8 L2 Cache Control and Status Register 0 */ #define L2CSR0_L2E 0x80000000 /* L2 Cache Enable */ #define L2CSR0_L2PE 0x40000000 /* L2 Cache Parity Enable */ #define L2CSR0_L2FI 0x00200000 /* L2 Cache Flash Invalidate */ #define L2CSR0_L2LFC 0x00000400 /* L2 Cache Lock Flags Clear */ #define SPR_BUCSR 0x3F5 /* ..8 Branch Unit Control and Status Register */ #define BUCSR_BPEN 0x00000001 /* Branch Prediction Enable */ #define BUCSR_BBFI 0x00000200 /* Branch Buffer Flash Invalidate */ #endif /* BOOKE */ #endif /* !_POWERPC_SPR_H_ */ Index: user/ngie/bug-237403/sys/powerpc/powernv/opal_dev.c =================================================================== --- user/ngie/bug-237403/sys/powerpc/powernv/opal_dev.c (revision 346925) +++ user/ngie/bug-237403/sys/powerpc/powernv/opal_dev.c (revision 346926) @@ -1,423 +1,424 @@ /*- * Copyright (c) 2015 Nathan Whitehorn * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "clock_if.h" #include "opal.h" static int opaldev_probe(device_t); static int opaldev_attach(device_t); /* clock interface */ static int opal_gettime(device_t dev, struct timespec *ts); static int opal_settime(device_t dev, struct timespec *ts); /* ofw bus interface */ static const struct ofw_bus_devinfo *opaldev_get_devinfo(device_t dev, device_t child); static void opal_shutdown(void *arg, int howto); static void opal_handle_shutdown_message(void *unused, struct opal_msg *msg); static void opal_intr(void *); static device_method_t opaldev_methods[] = { /* Device interface */ DEVMETHOD(device_probe, opaldev_probe), DEVMETHOD(device_attach, opaldev_attach), /* clock interface */ DEVMETHOD(clock_gettime, opal_gettime), DEVMETHOD(clock_settime, opal_settime), /* Bus interface */ DEVMETHOD(bus_child_pnpinfo_str, ofw_bus_gen_child_pnpinfo_str), /* ofw_bus interface */ DEVMETHOD(ofw_bus_get_devinfo, opaldev_get_devinfo), DEVMETHOD(ofw_bus_get_compat, ofw_bus_gen_get_compat), DEVMETHOD(ofw_bus_get_model, ofw_bus_gen_get_model), DEVMETHOD(ofw_bus_get_name, ofw_bus_gen_get_name), DEVMETHOD(ofw_bus_get_node, ofw_bus_gen_get_node), DEVMETHOD(ofw_bus_get_type, ofw_bus_gen_get_type), DEVMETHOD_END }; static driver_t opaldev_driver = { "opal", opaldev_methods, 0 }; static devclass_t opaldev_devclass; -DRIVER_MODULE(opaldev, ofwbus, opaldev_driver, opaldev_devclass, 0, 0); +EARLY_DRIVER_MODULE(opaldev, ofwbus, opaldev_driver, opaldev_devclass, 0, 0, + BUS_PASS_BUS); static void opal_heartbeat(void); static void opal_handle_messages(void); static struct proc *opal_hb_proc; static struct kproc_desc opal_heartbeat_kp = { "opal_heartbeat", opal_heartbeat, &opal_hb_proc }; SYSINIT(opal_heartbeat_setup, SI_SUB_KTHREAD_IDLE, SI_ORDER_ANY, kproc_start, &opal_heartbeat_kp); static int opal_heartbeat_ms; EVENTHANDLER_LIST_DEFINE(OPAL_ASYNC_COMP); EVENTHANDLER_LIST_DEFINE(OPAL_EPOW); EVENTHANDLER_LIST_DEFINE(OPAL_SHUTDOWN); EVENTHANDLER_LIST_DEFINE(OPAL_HMI_EVT); EVENTHANDLER_LIST_DEFINE(OPAL_DPO); EVENTHANDLER_LIST_DEFINE(OPAL_OCC); #define OPAL_SOFT_OFF 0 #define OPAL_SOFT_REBOOT 1 static void opal_heartbeat(void) { uint64_t events; if (opal_heartbeat_ms == 0) kproc_exit(0); while (1) { events = 0; /* Turn the OPAL state crank */ opal_call(OPAL_POLL_EVENTS, vtophys(&events)); if (events & OPAL_EVENT_MSG_PENDING) opal_handle_messages(); tsleep(opal_hb_proc, 0, "opal", MSEC_2_TICKS(opal_heartbeat_ms)); } } static int opaldev_probe(device_t dev) { phandle_t iparent; pcell_t *irqs; int i, n_irqs; if (!ofw_bus_is_compatible(dev, "ibm,opal-v3")) return (ENXIO); if (opal_check() != 0) return (ENXIO); device_set_desc(dev, "OPAL Abstraction Firmware"); /* Manually add IRQs before attaching */ if (OF_hasprop(ofw_bus_get_node(dev), "opal-interrupts")) { iparent = OF_finddevice("/interrupt-controller@0"); iparent = OF_xref_from_node(iparent); n_irqs = OF_getproplen(ofw_bus_get_node(dev), "opal-interrupts") / sizeof(*irqs); irqs = malloc(n_irqs * sizeof(*irqs), M_DEVBUF, M_WAITOK); OF_getencprop(ofw_bus_get_node(dev), "opal-interrupts", irqs, n_irqs * sizeof(*irqs)); for (i = 0; i < n_irqs; i++) bus_set_resource(dev, SYS_RES_IRQ, i, ofw_bus_map_intr(dev, iparent, 1, &irqs[i]), 1); free(irqs, M_DEVBUF); } return (BUS_PROBE_SPECIFIC); } static int opaldev_attach(device_t dev) { phandle_t child; device_t cdev; uint64_t junk; int i, 
rv; uint32_t async_count; struct ofw_bus_devinfo *dinfo; struct resource *irq; /* Test for RTC support and register clock if it works */ rv = opal_call(OPAL_RTC_READ, vtophys(&junk), vtophys(&junk)); do { rv = opal_call(OPAL_RTC_READ, vtophys(&junk), vtophys(&junk)); if (rv == OPAL_BUSY_EVENT) rv = opal_call(OPAL_POLL_EVENTS, 0); } while (rv == OPAL_BUSY_EVENT); if (rv == OPAL_SUCCESS) clock_register(dev, 2000); EVENTHANDLER_REGISTER(OPAL_SHUTDOWN, opal_handle_shutdown_message, NULL, EVENTHANDLER_PRI_ANY); EVENTHANDLER_REGISTER(shutdown_final, opal_shutdown, NULL, SHUTDOWN_PRI_LAST); OF_getencprop(ofw_bus_get_node(dev), "ibm,heartbeat-ms", &opal_heartbeat_ms, sizeof(opal_heartbeat_ms)); /* Bind to interrupts */ for (i = 0; (irq = bus_alloc_resource_any(dev, SYS_RES_IRQ, &i, RF_ACTIVE)) != NULL; i++) bus_setup_intr(dev, irq, INTR_TYPE_TTY | INTR_MPSAFE | INTR_ENTROPY, NULL, opal_intr, (void *)rman_get_start(irq), NULL); OF_getencprop(ofw_bus_get_node(dev), "opal-msg-async-num", &async_count, sizeof(async_count)); opal_init_async_tokens(async_count); for (child = OF_child(ofw_bus_get_node(dev)); child != 0; child = OF_peer(child)) { dinfo = malloc(sizeof(*dinfo), M_DEVBUF, M_WAITOK | M_ZERO); if (ofw_bus_gen_setup_devinfo(dinfo, child) != 0) { free(dinfo, M_DEVBUF); continue; } cdev = device_add_child(dev, NULL, -1); if (cdev == NULL) { device_printf(dev, "<%s>: device_add_child failed\n", dinfo->obd_name); ofw_bus_gen_destroy_devinfo(dinfo); free(dinfo, M_DEVBUF); continue; } device_set_ivars(cdev, dinfo); } return (bus_generic_attach(dev)); } static int bcd2bin32(int bcd) { int out = 0; out += bcd2bin(bcd & 0xff); out += 100*bcd2bin((bcd & 0x0000ff00) >> 8); out += 10000*bcd2bin((bcd & 0x00ff0000) >> 16); out += 1000000*bcd2bin((bcd & 0xffff0000) >> 24); return (out); } static int bin2bcd32(int bin) { int out = 0; int tmp; tmp = bin % 100; out += bin2bcd(tmp) * 1; bin = bin / 100; tmp = bin % 100; out += bin2bcd(tmp) * 100; bin = bin / 100; tmp = bin % 100; out += bin2bcd(tmp) * 10000; return (out); } static int opal_gettime(device_t dev, struct timespec *ts) { int rv; struct clocktime ct; uint32_t ymd; uint64_t hmsm; rv = opal_call(OPAL_RTC_READ, vtophys(&ymd), vtophys(&hmsm)); while (rv == OPAL_BUSY_EVENT) { opal_call(OPAL_POLL_EVENTS, 0); pause("opalrtc", 1); rv = opal_call(OPAL_RTC_READ, vtophys(&ymd), vtophys(&hmsm)); } if (rv != OPAL_SUCCESS) return (ENXIO); hmsm = be64toh(hmsm); ymd = be32toh(ymd); ct.nsec = bcd2bin32((hmsm & 0x000000ffffff0000) >> 16) * 1000; ct.sec = bcd2bin((hmsm & 0x0000ff0000000000) >> 40); ct.min = bcd2bin((hmsm & 0x00ff000000000000) >> 48); ct.hour = bcd2bin((hmsm & 0xff00000000000000) >> 56); ct.day = bcd2bin((ymd & 0x000000ff) >> 0); ct.mon = bcd2bin((ymd & 0x0000ff00) >> 8); ct.year = bcd2bin32((ymd & 0xffff0000) >> 16); return (clock_ct_to_ts(&ct, ts)); } static int opal_settime(device_t dev, struct timespec *ts) { int rv; struct clocktime ct; uint32_t ymd = 0; uint64_t hmsm = 0; clock_ts_to_ct(ts, &ct); ymd |= (uint32_t)bin2bcd(ct.day); ymd |= ((uint32_t)bin2bcd(ct.mon) << 8); ymd |= ((uint32_t)bin2bcd32(ct.year) << 16); hmsm |= ((uint64_t)bin2bcd32(ct.nsec/1000) << 16); hmsm |= ((uint64_t)bin2bcd(ct.sec) << 40); hmsm |= ((uint64_t)bin2bcd(ct.min) << 48); hmsm |= ((uint64_t)bin2bcd(ct.hour) << 56); hmsm = htobe64(hmsm); ymd = htobe32(ymd); do { rv = opal_call(OPAL_RTC_WRITE, vtophys(&ymd), vtophys(&hmsm)); if (rv == OPAL_BUSY_EVENT) { rv = opal_call(OPAL_POLL_EVENTS, 0); pause("opalrtc", 1); } } while (rv == OPAL_BUSY_EVENT); if (rv != OPAL_SUCCESS) 
return (ENXIO); return (0); } static const struct ofw_bus_devinfo * opaldev_get_devinfo(device_t dev, device_t child) { return (device_get_ivars(child)); } static void opal_shutdown(void *arg, int howto) { if (howto & RB_HALT) opal_call(OPAL_CEC_POWER_DOWN, 0 /* Normal power off */); else opal_call(OPAL_CEC_REBOOT); opal_call(OPAL_RETURN_CPU); } static void opal_handle_shutdown_message(void *unused, struct opal_msg *msg) { int howto; switch (be64toh(msg->params[0])) { case OPAL_SOFT_OFF: howto = RB_POWEROFF; break; case OPAL_SOFT_REBOOT: howto = RB_REROOT; break; } shutdown_nice(howto); } static void opal_handle_messages(void) { static struct opal_msg msg; uint64_t rv; uint32_t type; rv = opal_call(OPAL_GET_MSG, vtophys(&msg), sizeof(msg)); if (rv != OPAL_SUCCESS) return; type = be32toh(msg.msg_type); switch (type) { case OPAL_MSG_ASYNC_COMP: EVENTHANDLER_DIRECT_INVOKE(OPAL_ASYNC_COMP, &msg); break; case OPAL_MSG_EPOW: EVENTHANDLER_DIRECT_INVOKE(OPAL_EPOW, &msg); break; case OPAL_MSG_SHUTDOWN: EVENTHANDLER_DIRECT_INVOKE(OPAL_SHUTDOWN, &msg); break; case OPAL_MSG_HMI_EVT: EVENTHANDLER_DIRECT_INVOKE(OPAL_HMI_EVT, &msg); break; case OPAL_MSG_DPO: EVENTHANDLER_DIRECT_INVOKE(OPAL_DPO, &msg); break; case OPAL_MSG_OCC: EVENTHANDLER_DIRECT_INVOKE(OPAL_OCC, &msg); break; default: printf("Unknown OPAL message type %d\n", type); } } static void opal_intr(void *xintr) { uint64_t events = 0; opal_call(OPAL_HANDLE_INTERRUPT, (uint32_t)(uint64_t)xintr, vtophys(&events)); /* Wake up the heartbeat, if it's been setup. */ if (events != 0 && opal_hb_proc != NULL) wakeup(opal_hb_proc); } Index: user/ngie/bug-237403/sys/powerpc/powerpc/cpu.c =================================================================== --- user/ngie/bug-237403/sys/powerpc/powerpc/cpu.c (revision 346925) +++ user/ngie/bug-237403/sys/powerpc/powerpc/cpu.c (revision 346926) @@ -1,836 +1,848 @@ /*- * SPDX-License-Identifier: BSD-4-Clause AND BSD-2-Clause-FreeBSD * * Copyright (c) 2001 Matt Thomas. * Copyright (c) 2001 Tsubai Masanari. * Copyright (c) 1998, 1999, 2001 Internet Research Institute, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by * Internet Research Institute, Inc. * 4. The name of the author may not be used to endorse or promote products * derived from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
* IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ /*- * Copyright (C) 2003 Benno Rice. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY Benno Rice ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL TOOLS GMBH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
* * from $NetBSD: cpu_subr.c,v 1.1 2003/02/03 17:10:09 matt Exp $ * $FreeBSD$ */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include static void cpu_6xx_setup(int cpuid, uint16_t vers); static void cpu_970_setup(int cpuid, uint16_t vers); static void cpu_booke_setup(int cpuid, uint16_t vers); static void cpu_powerx_setup(int cpuid, uint16_t vers); int powerpc_pow_enabled; void (*cpu_idle_hook)(sbintime_t) = NULL; static void cpu_idle_60x(sbintime_t); static void cpu_idle_booke(sbintime_t); #ifdef BOOKE_E500 static void cpu_idle_e500mc(sbintime_t sbt); #endif #if defined(__powerpc64__) && defined(AIM) static void cpu_idle_powerx(sbintime_t); static void cpu_idle_power9(sbintime_t); #endif struct cputab { const char *name; uint16_t version; uint16_t revfmt; int features; /* Do not include PPC_FEATURE_32 or * PPC_FEATURE_HAS_MMU */ int features2; void (*cpu_setup)(int cpuid, uint16_t vers); }; #define REVFMT_MAJMIN 1 /* %u.%u */ #define REVFMT_HEX 2 /* 0x%04x */ #define REVFMT_DEC 3 /* %u */ static const struct cputab models[] = { { "Motorola PowerPC 601", MPC601, REVFMT_DEC, PPC_FEATURE_HAS_FPU | PPC_FEATURE_UNIFIED_CACHE, 0, cpu_6xx_setup }, { "Motorola PowerPC 602", MPC602, REVFMT_DEC, PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "Motorola PowerPC 603", MPC603, REVFMT_MAJMIN, PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "Motorola PowerPC 603e", MPC603e, REVFMT_MAJMIN, PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "Motorola PowerPC 603ev", MPC603ev, REVFMT_MAJMIN, PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "Motorola PowerPC 604", MPC604, REVFMT_MAJMIN, PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "Motorola PowerPC 604ev", MPC604ev, REVFMT_MAJMIN, PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "Motorola PowerPC 620", MPC620, REVFMT_HEX, PPC_FEATURE_64 | PPC_FEATURE_HAS_FPU, 0, NULL }, { "Motorola PowerPC 750", MPC750, REVFMT_MAJMIN, PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "IBM PowerPC 750FX", IBM750FX, REVFMT_MAJMIN, PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "IBM PowerPC 970", IBM970, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU, 0, cpu_970_setup }, { "IBM PowerPC 970FX", IBM970FX, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU, 0, cpu_970_setup }, { "IBM PowerPC 970GX", IBM970GX, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU, 0, cpu_970_setup }, { "IBM PowerPC 970MP", IBM970MP, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU, 0, cpu_970_setup }, { "IBM POWER4", IBMPOWER4, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_FPU | PPC_FEATURE_POWER4, 0, NULL }, { "IBM POWER4+", IBMPOWER4PLUS, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_FPU | PPC_FEATURE_POWER4, 0, NULL }, { "IBM POWER5", IBMPOWER5, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_FPU | PPC_FEATURE_POWER4 | PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP, 0, NULL }, { "IBM POWER5+", IBMPOWER5PLUS, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_FPU | PPC_FEATURE_POWER5_PLUS | PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP, 0, NULL }, { "IBM POWER6", IBMPOWER6, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU | PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP | PPC_FEATURE_ARCH_2_05 | PPC_FEATURE_TRUE_LE, 0, NULL }, { "IBM POWER7", IBMPOWER7, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU | PPC_FEATURE_SMT | PPC_FEATURE_ARCH_2_05 | PPC_FEATURE_ARCH_2_06 | 
PPC_FEATURE_HAS_VSX | PPC_FEATURE_TRUE_LE, PPC_FEATURE2_DSCR, NULL }, { "IBM POWER7+", IBMPOWER7PLUS, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU | PPC_FEATURE_SMT | PPC_FEATURE_ARCH_2_05 | PPC_FEATURE_ARCH_2_06 | PPC_FEATURE_HAS_VSX, PPC_FEATURE2_DSCR, NULL }, { "IBM POWER8E", IBMPOWER8E, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU | PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP | PPC_FEATURE_ARCH_2_05 | PPC_FEATURE_ARCH_2_06 | PPC_FEATURE_HAS_VSX | PPC_FEATURE_TRUE_LE, PPC_FEATURE2_ARCH_2_07 | PPC_FEATURE2_HTM | PPC_FEATURE2_DSCR | PPC_FEATURE2_ISEL | PPC_FEATURE2_TAR | PPC_FEATURE2_HAS_VEC_CRYPTO | PPC_FEATURE2_HTM_NOSC, cpu_powerx_setup }, + { "IBM POWER8NVL", IBMPOWER8NVL, REVFMT_MAJMIN, + PPC_FEATURE_64 | PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU | + PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP | PPC_FEATURE_ARCH_2_05 | + PPC_FEATURE_ARCH_2_06 | PPC_FEATURE_HAS_VSX | PPC_FEATURE_TRUE_LE, + PPC_FEATURE2_ARCH_2_07 | PPC_FEATURE2_HTM | PPC_FEATURE2_DSCR | + PPC_FEATURE2_ISEL | PPC_FEATURE2_TAR | PPC_FEATURE2_HAS_VEC_CRYPTO | + PPC_FEATURE2_HTM_NOSC, cpu_powerx_setup }, { "IBM POWER8", IBMPOWER8, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU | PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP | PPC_FEATURE_ARCH_2_05 | PPC_FEATURE_ARCH_2_06 | PPC_FEATURE_HAS_VSX | PPC_FEATURE_TRUE_LE, PPC_FEATURE2_ARCH_2_07 | PPC_FEATURE2_HTM | PPC_FEATURE2_DSCR | PPC_FEATURE2_ISEL | PPC_FEATURE2_TAR | PPC_FEATURE2_HAS_VEC_CRYPTO | PPC_FEATURE2_HTM_NOSC, cpu_powerx_setup }, { "IBM POWER9", IBMPOWER9, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU | PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP | PPC_FEATURE_ARCH_2_05 | PPC_FEATURE_ARCH_2_06 | PPC_FEATURE_HAS_VSX | PPC_FEATURE_TRUE_LE, PPC_FEATURE2_ARCH_2_07 | PPC_FEATURE2_HTM | PPC_FEATURE2_DSCR | - PPC_FEATURE2_ISEL | PPC_FEATURE2_TAR | PPC_FEATURE2_HAS_VEC_CRYPTO | + PPC_FEATURE2_EBB | PPC_FEATURE2_ISEL | PPC_FEATURE2_TAR | + PPC_FEATURE2_HAS_VEC_CRYPTO | PPC_FEATURE2_HTM_NOSC | PPC_FEATURE2_ARCH_3_00 | PPC_FEATURE2_HAS_IEEE128 | PPC_FEATURE2_DARN, cpu_powerx_setup }, { "Motorola PowerPC 7400", MPC7400, REVFMT_MAJMIN, PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "Motorola PowerPC 7410", MPC7410, REVFMT_MAJMIN, PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "Motorola PowerPC 7450", MPC7450, REVFMT_MAJMIN, PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "Motorola PowerPC 7455", MPC7455, REVFMT_MAJMIN, PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "Motorola PowerPC 7457", MPC7457, REVFMT_MAJMIN, PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "Motorola PowerPC 7447A", MPC7447A, REVFMT_MAJMIN, PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "Motorola PowerPC 7448", MPC7448, REVFMT_MAJMIN, PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "Motorola PowerPC 8240", MPC8240, REVFMT_MAJMIN, PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "Motorola PowerPC 8245", MPC8245, REVFMT_MAJMIN, PPC_FEATURE_HAS_FPU, 0, cpu_6xx_setup }, { "Freescale e500v1 core", FSL_E500v1, REVFMT_MAJMIN, PPC_FEATURE_HAS_SPE | PPC_FEATURE_HAS_EFP_SINGLE | PPC_FEATURE_BOOKE, PPC_FEATURE2_ISEL, cpu_booke_setup }, { "Freescale e500v2 core", FSL_E500v2, REVFMT_MAJMIN, PPC_FEATURE_HAS_SPE | PPC_FEATURE_BOOKE | PPC_FEATURE_HAS_EFP_SINGLE | PPC_FEATURE_HAS_EFP_DOUBLE, PPC_FEATURE2_ISEL, cpu_booke_setup }, { "Freescale e500mc core", 
FSL_E500mc, REVFMT_MAJMIN, PPC_FEATURE_HAS_FPU | PPC_FEATURE_BOOKE | PPC_FEATURE_ARCH_2_05 | PPC_FEATURE_ARCH_2_06, PPC_FEATURE2_ISEL, cpu_booke_setup }, { "Freescale e5500 core", FSL_E5500, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_FPU | PPC_FEATURE_BOOKE | PPC_FEATURE_ARCH_2_05 | PPC_FEATURE_ARCH_2_06, PPC_FEATURE2_ISEL, cpu_booke_setup }, { "Freescale e6500 core", FSL_E6500, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU | PPC_FEATURE_BOOKE | PPC_FEATURE_ARCH_2_05 | PPC_FEATURE_ARCH_2_06, PPC_FEATURE2_ISEL, cpu_booke_setup }, { "IBM Cell Broadband Engine", IBMCELLBE, REVFMT_MAJMIN, PPC_FEATURE_64 | PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_FPU | PPC_FEATURE_CELL | PPC_FEATURE_SMT, 0, NULL}, { "Unknown PowerPC CPU", 0, REVFMT_HEX, 0, 0, NULL }, }; static void cpu_6xx_print_cacheinfo(u_int, uint16_t); static int cpu_feature_bit(SYSCTL_HANDLER_ARGS); static char model[64]; SYSCTL_STRING(_hw, HW_MODEL, model, CTLFLAG_RD, model, 0, ""); static const struct cputab *cput; u_long cpu_features = PPC_FEATURE_32 | PPC_FEATURE_HAS_MMU; u_long cpu_features2 = 0; SYSCTL_OPAQUE(_hw, OID_AUTO, cpu_features, CTLFLAG_RD, &cpu_features, sizeof(cpu_features), "LX", "PowerPC CPU features"); SYSCTL_OPAQUE(_hw, OID_AUTO, cpu_features2, CTLFLAG_RD, &cpu_features2, sizeof(cpu_features2), "LX", "PowerPC CPU features 2"); #ifdef __powerpc64__ register_t lpcr = LPCR_LPES; #endif /* Provide some user-friendly aliases for bits in cpu_features */ SYSCTL_PROC(_hw, OID_AUTO, floatingpoint, CTLTYPE_INT | CTLFLAG_RD, 0, PPC_FEATURE_HAS_FPU, cpu_feature_bit, "I", "Floating point instructions executed in hardware"); SYSCTL_PROC(_hw, OID_AUTO, altivec, CTLTYPE_INT | CTLFLAG_RD, 0, PPC_FEATURE_HAS_ALTIVEC, cpu_feature_bit, "I", "CPU supports Altivec"); /* * Phase 1 (early) CPU setup. Setup the cpu_features/cpu_features2 variables, * so they can be used during platform and MMU bringup. */ void cpu_feature_setup() { u_int pvr; uint16_t vers; const struct cputab *cp; pvr = mfpvr(); vers = pvr >> 16; for (cp = models; cp->version != 0; cp++) { if (cp->version == vers) break; } cput = cp; cpu_features |= cp->features; cpu_features2 |= cp->features2; } void cpu_setup(u_int cpuid) { uint64_t cps; const char *name; u_int maj, min, pvr; uint16_t rev, revfmt, vers; pvr = mfpvr(); vers = pvr >> 16; rev = pvr; switch (vers) { case MPC7410: min = (pvr >> 0) & 0xff; maj = min <= 4 ? 1 : 2; break; case FSL_E500v1: case FSL_E500v2: case FSL_E500mc: case FSL_E5500: maj = (pvr >> 4) & 0xf; min = (pvr >> 0) & 0xf; break; default: maj = (pvr >> 8) & 0xf; min = (pvr >> 0) & 0xf; } revfmt = cput->revfmt; name = cput->name; if (rev == MPC750 && pvr == 15) { name = "Motorola MPC755"; revfmt = REVFMT_HEX; } strncpy(model, name, sizeof(model) - 1); printf("cpu%d: %s revision ", cpuid, name); switch (revfmt) { case REVFMT_MAJMIN: printf("%u.%u", maj, min); break; case REVFMT_HEX: printf("0x%04x", rev); break; case REVFMT_DEC: printf("%u", rev); break; } if (cpu_est_clockrate(0, &cps) == 0) printf(", %jd.%02jd MHz", cps / 1000000, (cps / 10000) % 100); printf("\n"); printf("cpu%d: Features %b\n", cpuid, (int)cpu_features, PPC_FEATURE_BITMASK); if (cpu_features2 != 0) printf("cpu%d: Features2 %b\n", cpuid, (int)cpu_features2, PPC_FEATURE2_BITMASK); /* * Configure CPU */ if (cput->cpu_setup != NULL) cput->cpu_setup(cpuid, vers); } /* Get current clock frequency for the given cpu id. 
*/ int cpu_est_clockrate(int cpu_id, uint64_t *cps) { uint16_t vers; register_t msr; phandle_t cpu, dev, root; int res = 0; char buf[8]; vers = mfpvr() >> 16; msr = mfmsr(); mtmsr(msr & ~PSL_EE); switch (vers) { case MPC7450: case MPC7455: case MPC7457: case MPC750: case IBM750FX: case MPC7400: case MPC7410: case MPC7447A: case MPC7448: mtspr(SPR_MMCR0, SPR_MMCR0_FC); mtspr(SPR_PMC1, 0); mtspr(SPR_MMCR0, SPR_MMCR0_PMC1SEL(PMCN_CYCLES)); DELAY(1000); *cps = (mfspr(SPR_PMC1) * 1000) + 4999; mtspr(SPR_MMCR0, SPR_MMCR0_FC); mtmsr(msr); return (0); case IBM970: case IBM970FX: case IBM970MP: isync(); mtspr(SPR_970MMCR0, SPR_MMCR0_FC); isync(); mtspr(SPR_970MMCR1, 0); mtspr(SPR_970MMCRA, 0); mtspr(SPR_970PMC1, 0); mtspr(SPR_970MMCR0, SPR_970MMCR0_PMC1SEL(PMC970N_CYCLES)); isync(); DELAY(1000); powerpc_sync(); mtspr(SPR_970MMCR0, SPR_MMCR0_FC); *cps = (mfspr(SPR_970PMC1) * 1000) + 4999; mtmsr(msr); return (0); default: root = OF_peer(0); if (root == 0) return (ENXIO); dev = OF_child(root); while (dev != 0) { res = OF_getprop(dev, "name", buf, sizeof(buf)); if (res > 0 && strcmp(buf, "cpus") == 0) break; dev = OF_peer(dev); } cpu = OF_child(dev); while (cpu != 0) { res = OF_getprop(cpu, "device_type", buf, sizeof(buf)); if (res > 0 && strcmp(buf, "cpu") == 0) break; cpu = OF_peer(cpu); } if (cpu == 0) return (ENOENT); if (OF_getprop(cpu, "ibm,extended-clock-frequency", cps, sizeof(*cps)) >= 0) { return (0); } else if (OF_getprop(cpu, "clock-frequency", cps, sizeof(cell_t)) >= 0) { *cps >>= 32; return (0); } else { return (ENOENT); } } } void cpu_6xx_setup(int cpuid, uint16_t vers) { register_t hid0, pvr; const char *bitmask; hid0 = mfspr(SPR_HID0); pvr = mfpvr(); /* * Configure power-saving mode. */ switch (vers) { case MPC603: case MPC603e: case MPC603ev: case MPC604ev: case MPC750: case IBM750FX: case MPC7400: case MPC7410: case MPC8240: case MPC8245: /* Select DOZE mode. */ hid0 &= ~(HID0_DOZE | HID0_NAP | HID0_SLEEP); hid0 |= HID0_DOZE | HID0_DPM; powerpc_pow_enabled = 1; break; case MPC7448: case MPC7447A: case MPC7457: case MPC7455: case MPC7450: /* Enable the 7450 branch caches */ hid0 |= HID0_SGE | HID0_BTIC; hid0 |= HID0_LRSTK | HID0_FOLD | HID0_BHT; /* Disable BTIC on 7450 Rev 2.0 or earlier and on 7457 */ if (((pvr >> 16) == MPC7450 && (pvr & 0xFFFF) <= 0x0200) || (pvr >> 16) == MPC7457) hid0 &= ~HID0_BTIC; /* Select NAP mode. */ hid0 &= ~(HID0_DOZE | HID0_NAP | HID0_SLEEP); hid0 |= HID0_NAP | HID0_DPM; powerpc_pow_enabled = 1; break; default: /* No power-saving mode is available. */ ; } switch (vers) { case IBM750FX: case MPC750: hid0 &= ~HID0_DBP; /* XXX correct? */ hid0 |= HID0_EMCP | HID0_BTIC | HID0_SGE | HID0_BHT; break; case MPC7400: case MPC7410: hid0 &= ~HID0_SPD; hid0 |= HID0_EMCP | HID0_BTIC | HID0_SGE | HID0_BHT; hid0 |= HID0_EIEC; break; } mtspr(SPR_HID0, hid0); if (bootverbose) cpu_6xx_print_cacheinfo(cpuid, vers); switch (vers) { case MPC7447A: case MPC7448: case MPC7450: case MPC7455: case MPC7457: bitmask = HID0_7450_BITMASK; break; default: bitmask = HID0_BITMASK; break; } printf("cpu%d: HID0 %b\n", cpuid, (int)hid0, bitmask); if (cpu_idle_hook == NULL) cpu_idle_hook = cpu_idle_60x; } static void cpu_6xx_print_cacheinfo(u_int cpuid, uint16_t vers) { register_t hid; hid = mfspr(SPR_HID0); printf("cpu%u: ", cpuid); printf("L1 I-cache %sabled, ", (hid & HID0_ICE) ? "en" : "dis"); printf("L1 D-cache %sabled\n", (hid & HID0_DCE) ? 
"en" : "dis"); printf("cpu%u: ", cpuid); if (mfspr(SPR_L2CR) & L2CR_L2E) { switch (vers) { case MPC7450: case MPC7455: case MPC7457: printf("256KB L2 cache, "); if (mfspr(SPR_L3CR) & L3CR_L3E) printf("%cMB L3 backside cache", mfspr(SPR_L3CR) & L3CR_L3SIZ ? '2' : '1'); else printf("L3 cache disabled"); printf("\n"); break; case IBM750FX: printf("512KB L2 cache\n"); break; default: switch (mfspr(SPR_L2CR) & L2CR_L2SIZ) { case L2SIZ_256K: printf("256KB "); break; case L2SIZ_512K: printf("512KB "); break; case L2SIZ_1M: printf("1MB "); break; } printf("write-%s", (mfspr(SPR_L2CR) & L2CR_L2WT) ? "through" : "back"); if (mfspr(SPR_L2CR) & L2CR_L2PE) printf(", with parity"); printf(" backside cache\n"); break; } } else printf("L2 cache disabled\n"); } static void cpu_booke_setup(int cpuid, uint16_t vers) { #ifdef BOOKE_E500 register_t hid0; const char *bitmask; hid0 = mfspr(SPR_HID0); switch (vers) { case FSL_E500mc: bitmask = HID0_E500MC_BITMASK; cpu_idle_hook = cpu_idle_e500mc; break; case FSL_E5500: case FSL_E6500: bitmask = HID0_E5500_BITMASK; cpu_idle_hook = cpu_idle_e500mc; break; case FSL_E500v1: case FSL_E500v2: /* Only e500v1/v2 support HID0 power management setup. */ /* Program power-management mode. */ hid0 &= ~(HID0_DOZE | HID0_NAP | HID0_SLEEP); hid0 |= HID0_DOZE; mtspr(SPR_HID0, hid0); default: bitmask = HID0_E500_BITMASK; break; } printf("cpu%d: HID0 %b\n", cpuid, (int)hid0, bitmask); #endif if (cpu_idle_hook == NULL) cpu_idle_hook = cpu_idle_booke; } static void cpu_970_setup(int cpuid, uint16_t vers) { #ifdef AIM uint32_t hid0_hi, hid0_lo; __asm __volatile ("mfspr %0,%2; clrldi %1,%0,32; srdi %0,%0,32;" : "=r" (hid0_hi), "=r" (hid0_lo) : "K" (SPR_HID0)); /* Configure power-saving mode */ switch (vers) { case IBM970MP: hid0_hi |= (HID0_DEEPNAP | HID0_NAP | HID0_DPM); hid0_hi &= ~HID0_DOZE; break; default: hid0_hi |= (HID0_NAP | HID0_DPM); hid0_hi &= ~(HID0_DOZE | HID0_DEEPNAP); break; } powerpc_pow_enabled = 1; __asm __volatile (" \ sync; isync; \ sldi %0,%0,32; or %0,%0,%1; \ mtspr %2, %0; \ mfspr %0, %2; mfspr %0, %2; mfspr %0, %2; \ mfspr %0, %2; mfspr %0, %2; mfspr %0, %2; \ sync; isync" :: "r" (hid0_hi), "r"(hid0_lo), "K" (SPR_HID0)); __asm __volatile ("mfspr %0,%1; srdi %0,%0,32;" : "=r" (hid0_hi) : "K" (SPR_HID0)); printf("cpu%d: HID0 %b\n", cpuid, (int)(hid0_hi), HID0_970_BITMASK); #endif cpu_idle_hook = cpu_idle_60x; } static void cpu_powerx_setup(int cpuid, uint16_t vers) { #if defined(__powerpc64__) && defined(AIM) if ((mfmsr() & PSL_HV) == 0) return; + /* Nuke the FSCR, to disable all facilities. */ + mtspr(SPR_FSCR, 0); + /* Configure power-saving */ switch (vers) { case IBMPOWER8: case IBMPOWER8E: + case IBMPOWER8NVL: cpu_idle_hook = cpu_idle_powerx; mtspr(SPR_LPCR, mfspr(SPR_LPCR) | LPCR_PECE_WAKESET); isync(); break; case IBMPOWER9: cpu_idle_hook = cpu_idle_power9; mtspr(SPR_LPCR, mfspr(SPR_LPCR) | LPCR_PECE_WAKESET); isync(); break; default: return; } #endif } static int cpu_feature_bit(SYSCTL_HANDLER_ARGS) { int result; result = (cpu_features & arg2) ? 
1 : 0; return (sysctl_handle_int(oidp, &result, 0, req)); } void cpu_idle(int busy) { sbintime_t sbt = -1; #ifdef INVARIANTS if ((mfmsr() & PSL_EE) != PSL_EE) { struct thread *td = curthread; printf("td msr %#lx\n", (u_long)td->td_md.md_saved_msr); panic("ints disabled in idleproc!"); } #endif CTR2(KTR_SPARE2, "cpu_idle(%d) at %d", busy, curcpu); if (cpu_idle_hook != NULL) { if (!busy) { critical_enter(); sbt = cpu_idleclock(); } cpu_idle_hook(sbt); if (!busy) { cpu_activeclock(); critical_exit(); } } CTR2(KTR_SPARE2, "cpu_idle(%d) at %d done", busy, curcpu); } static void cpu_idle_60x(sbintime_t sbt) { register_t msr; uint16_t vers; if (!powerpc_pow_enabled) return; msr = mfmsr(); vers = mfpvr() >> 16; #ifdef AIM switch (vers) { case IBM970: case IBM970FX: case IBM970MP: case MPC7447A: case MPC7448: case MPC7450: case MPC7455: case MPC7457: __asm __volatile("\ dssall; sync; mtmsr %0; isync" :: "r"(msr | PSL_POW)); break; default: powerpc_sync(); mtmsr(msr | PSL_POW); break; } #endif } #ifdef BOOKE_E500 static void cpu_idle_e500mc(sbintime_t sbt) { /* * Base binutils doesn't know what the 'wait' instruction is, so * use the opcode encoding here. */ __asm __volatile(".long 0x7c00007c"); } #endif static void cpu_idle_booke(sbintime_t sbt) { register_t msr; msr = mfmsr(); #ifdef BOOKE_E500 powerpc_sync(); mtmsr(msr | PSL_WE); #endif } #if defined(__powerpc64__) && defined(AIM) static void cpu_idle_powerx(sbintime_t sbt) { /* Sleeping when running on one cpu gives no advantages - avoid it */ if (smp_started == 0) return; spinlock_enter(); if (sched_runnable()) { spinlock_exit(); return; } if (can_wakeup == 0) can_wakeup = 1; mb(); enter_idle_powerx(); spinlock_exit(); } static void cpu_idle_power9(sbintime_t sbt) { register_t msr; msr = mfmsr(); /* Suspend external interrupts until stop instruction completes. */ mtmsr(msr & ~PSL_EE); /* Set the stop state to lowest latency, wake up to next instruction */ /* Set maximum transition level to 2, for deepest lossless sleep. */ mtspr(SPR_PSSCR, (2 << PSSCR_MTL_S) | (0 << PSSCR_RL_S)); /* "stop" instruction (PowerISA 3.0) */ __asm __volatile (".long 0x4c0002e4"); /* * Re-enable external interrupts to capture the interrupt that caused * the wake up. */ mtmsr(msr); } #endif int cpu_idle_wakeup(int cpu) { return (0); } Index: user/ngie/bug-237403/sys/powerpc/powerpc/exec_machdep.c =================================================================== --- user/ngie/bug-237403/sys/powerpc/powerpc/exec_machdep.c (revision 346925) +++ user/ngie/bug-237403/sys/powerpc/powerpc/exec_machdep.c (revision 346926) @@ -1,1135 +1,1165 @@ /*- * SPDX-License-Identifier: BSD-4-Clause AND BSD-2-Clause-FreeBSD * * Copyright (C) 1995, 1996 Wolfgang Solfrank. * Copyright (C) 1995, 1996 TooLs GmbH. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by TooLs GmbH. * 4. 
The name of TooLs GmbH may not be used to endorse or promote products * derived from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY TOOLS GMBH ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL TOOLS GMBH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ /*- * Copyright (C) 2001 Benno Rice * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY Benno Rice ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL TOOLS GMBH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
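An aside on the hw.* handles visible in the cpu.c context above: hw.cpu_features and hw.cpu_features2 are exported as opaque u_long values, and hw.floatingpoint / hw.altivec are per-bit aliases served by cpu_feature_bit(). A minimal userland reader, assuming nothing beyond those sysctl names and ordinary sysctlbyname(3), might look like this sketch:

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
        unsigned long features;
        int fpu, altivec;
        size_t len;

        len = sizeof(features);
        if (sysctlbyname("hw.cpu_features", &features, &len, NULL, 0) == 0)
            printf("hw.cpu_features = %#lx\n", features);

        len = sizeof(fpu);
        if (sysctlbyname("hw.floatingpoint", &fpu, &len, NULL, 0) == 0)
            printf("hardware FPU: %s\n", fpu ? "yes" : "no");

        len = sizeof(altivec);
        if (sysctlbyname("hw.altivec", &altivec, &len, NULL, 0) == 0)
            printf("AltiVec: %s\n", altivec ? "yes" : "no");

        return (0);
    }

On a kernel that lacks any of these nodes the corresponding sysctlbyname() call simply fails and that entry is skipped.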
* $NetBSD: machdep.c,v 1.74.2.1 2000/11/01 16:13:48 tv Exp $ */ #include __FBSDID("$FreeBSD$"); #include "opt_fpu_emu.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef FPU_EMU #include #endif #ifdef COMPAT_FREEBSD32 #include #include #include typedef struct __ucontext32 { sigset_t uc_sigmask; mcontext32_t uc_mcontext; uint32_t uc_link; struct sigaltstack32 uc_stack; uint32_t uc_flags; uint32_t __spare__[4]; } ucontext32_t; struct sigframe32 { ucontext32_t sf_uc; struct siginfo32 sf_si; }; static int grab_mcontext32(struct thread *td, mcontext32_t *, int flags); #endif static int grab_mcontext(struct thread *, mcontext_t *, int); +static void cleanup_power_extras(struct thread *); + #ifdef __powerpc64__ extern struct sysentvec elf64_freebsd_sysvec_v2; #endif void sendsig(sig_t catcher, ksiginfo_t *ksi, sigset_t *mask) { struct trapframe *tf; struct sigacts *psp; struct sigframe sf; struct thread *td; struct proc *p; #ifdef COMPAT_FREEBSD32 struct siginfo32 siginfo32; struct sigframe32 sf32; #endif size_t sfpsize; caddr_t sfp, usfp; int oonstack, rndfsize; int sig; int code; td = curthread; p = td->td_proc; PROC_LOCK_ASSERT(p, MA_OWNED); psp = p->p_sigacts; mtx_assert(&psp->ps_mtx, MA_OWNED); tf = td->td_frame; oonstack = sigonstack(tf->fixreg[1]); /* * Fill siginfo structure. */ ksi->ksi_info.si_signo = ksi->ksi_signo; ksi->ksi_info.si_addr = (void *)((tf->exc == EXC_DSI || tf->exc == EXC_DSE) ? tf->dar : tf->srr0); #ifdef COMPAT_FREEBSD32 if (SV_PROC_FLAG(p, SV_ILP32)) { siginfo_to_siginfo32(&ksi->ksi_info, &siginfo32); sig = siginfo32.si_signo; code = siginfo32.si_code; sfp = (caddr_t)&sf32; sfpsize = sizeof(sf32); rndfsize = roundup(sizeof(sf32), 16); /* * Save user context */ memset(&sf32, 0, sizeof(sf32)); grab_mcontext32(td, &sf32.sf_uc.uc_mcontext, 0); sf32.sf_uc.uc_sigmask = *mask; sf32.sf_uc.uc_stack.ss_sp = (uintptr_t)td->td_sigstk.ss_sp; sf32.sf_uc.uc_stack.ss_size = (uint32_t)td->td_sigstk.ss_size; sf32.sf_uc.uc_stack.ss_flags = (td->td_pflags & TDP_ALTSTACK) ? ((oonstack) ? SS_ONSTACK : 0) : SS_DISABLE; sf32.sf_uc.uc_mcontext.mc_onstack = (oonstack) ? 1 : 0; } else { #endif sig = ksi->ksi_signo; code = ksi->ksi_code; sfp = (caddr_t)&sf; sfpsize = sizeof(sf); #ifdef __powerpc64__ /* * 64-bit PPC defines a 288 byte scratch region * below the stack. */ rndfsize = 288 + roundup(sizeof(sf), 48); #else rndfsize = roundup(sizeof(sf), 16); #endif /* * Save user context */ memset(&sf, 0, sizeof(sf)); grab_mcontext(td, &sf.sf_uc.uc_mcontext, 0); sf.sf_uc.uc_sigmask = *mask; sf.sf_uc.uc_stack = td->td_sigstk; sf.sf_uc.uc_stack.ss_flags = (td->td_pflags & TDP_ALTSTACK) ? ((oonstack) ? SS_ONSTACK : 0) : SS_DISABLE; sf.sf_uc.uc_mcontext.mc_onstack = (oonstack) ? 1 : 0; #ifdef COMPAT_FREEBSD32 } #endif CTR4(KTR_SIG, "sendsig: td=%p (%s) catcher=%p sig=%d", td, p->p_comm, catcher, sig); /* * Allocate and validate space for the signal handler context. */ if ((td->td_pflags & TDP_ALTSTACK) != 0 && !oonstack && SIGISMEMBER(psp->ps_sigonstack, sig)) { usfp = (void *)(((uintptr_t)td->td_sigstk.ss_sp + td->td_sigstk.ss_size - rndfsize) & ~0xFul); } else { usfp = (void *)((tf->fixreg[1] - rndfsize) & ~0xFul); } /* * Save the floating-point state, if necessary, then copy it. */ /* XXX */ /* * Set up the registers to return to sigcode. 
* * r1/sp - sigframe ptr * lr - sig function, dispatched to by blrl in trampoline * r3 - sig number * r4 - SIGINFO ? &siginfo : exception code * r5 - user context * srr0 - trampoline function addr */ tf->lr = (register_t)catcher; tf->fixreg[1] = (register_t)usfp; tf->fixreg[FIRSTARG] = sig; #ifdef COMPAT_FREEBSD32 tf->fixreg[FIRSTARG+2] = (register_t)usfp + ((SV_PROC_FLAG(p, SV_ILP32)) ? offsetof(struct sigframe32, sf_uc) : offsetof(struct sigframe, sf_uc)); #else tf->fixreg[FIRSTARG+2] = (register_t)usfp + offsetof(struct sigframe, sf_uc); #endif if (SIGISMEMBER(psp->ps_siginfo, sig)) { /* * Signal handler installed with SA_SIGINFO. */ #ifdef COMPAT_FREEBSD32 if (SV_PROC_FLAG(p, SV_ILP32)) { sf32.sf_si = siginfo32; tf->fixreg[FIRSTARG+1] = (register_t)usfp + offsetof(struct sigframe32, sf_si); sf32.sf_si = siginfo32; } else { #endif tf->fixreg[FIRSTARG+1] = (register_t)usfp + offsetof(struct sigframe, sf_si); sf.sf_si = ksi->ksi_info; #ifdef COMPAT_FREEBSD32 } #endif } else { /* Old FreeBSD-style arguments. */ tf->fixreg[FIRSTARG+1] = code; tf->fixreg[FIRSTARG+3] = (tf->exc == EXC_DSI) ? tf->dar : tf->srr0; } mtx_unlock(&psp->ps_mtx); PROC_UNLOCK(p); tf->srr0 = (register_t)p->p_sysent->sv_sigcode_base; /* * copy the frame out to userland. */ if (copyout(sfp, usfp, sfpsize) != 0) { /* * Process has trashed its stack. Kill it. */ CTR2(KTR_SIG, "sendsig: sigexit td=%p sfp=%p", td, sfp); PROC_LOCK(p); sigexit(td, SIGILL); } CTR3(KTR_SIG, "sendsig: return td=%p pc=%#x sp=%#x", td, tf->srr0, tf->fixreg[1]); PROC_LOCK(p); mtx_lock(&psp->ps_mtx); } int sys_sigreturn(struct thread *td, struct sigreturn_args *uap) { ucontext_t uc; int error; CTR2(KTR_SIG, "sigreturn: td=%p ucp=%p", td, uap->sigcntxp); if (copyin(uap->sigcntxp, &uc, sizeof(uc)) != 0) { CTR1(KTR_SIG, "sigreturn: efault td=%p", td); return (EFAULT); } error = set_mcontext(td, &uc.uc_mcontext); if (error != 0) return (error); kern_sigprocmask(td, SIG_SETMASK, &uc.uc_sigmask, NULL, 0); CTR3(KTR_SIG, "sigreturn: return td=%p pc=%#x sp=%#x", td, uc.uc_mcontext.mc_srr0, uc.uc_mcontext.mc_gpr[1]); return (EJUSTRETURN); } #ifdef COMPAT_FREEBSD4 int freebsd4_sigreturn(struct thread *td, struct freebsd4_sigreturn_args *uap) { return sys_sigreturn(td, (struct sigreturn_args *)uap); } #endif /* * Construct a PCB from a trapframe. This is called from kdb_trap() where * we want to start a backtrace from the function that caused us to enter * the debugger. We have the context in the trapframe, but base the trace * on the PCB. The PCB doesn't have to be perfect, as long as it contains * enough for a backtrace. */ void makectx(struct trapframe *tf, struct pcb *pcb) { pcb->pcb_lr = tf->srr0; pcb->pcb_sp = tf->fixreg[1]; } /* * get_mcontext/sendsig helper routine that doesn't touch the * proc lock */ static int grab_mcontext(struct thread *td, mcontext_t *mcp, int flags) { struct pcb *pcb; int i; pcb = td->td_pcb; memset(mcp, 0, sizeof(mcontext_t)); mcp->mc_vers = _MC_VERSION; mcp->mc_flags = 0; memcpy(&mcp->mc_frame, td->td_frame, sizeof(struct trapframe)); if (flags & GET_MC_CLEAR_RET) { mcp->mc_gpr[3] = 0; mcp->mc_gpr[4] = 0; } /* * This assumes that floating-point context is *not* lazy, * so if the thread has used FP there would have been a * FP-unavailable exception that would have set things up * correctly. 
*/ if (pcb->pcb_flags & PCB_FPREGS) { if (pcb->pcb_flags & PCB_FPU) { KASSERT(td == curthread, ("get_mcontext: fp save not curthread")); critical_enter(); save_fpu(td); critical_exit(); } mcp->mc_flags |= _MC_FP_VALID; memcpy(&mcp->mc_fpscr, &pcb->pcb_fpu.fpscr, sizeof(double)); for (i = 0; i < 32; i++) memcpy(&mcp->mc_fpreg[i], &pcb->pcb_fpu.fpr[i].fpr, sizeof(double)); } if (pcb->pcb_flags & PCB_VSX) { for (i = 0; i < 32; i++) memcpy(&mcp->mc_vsxfpreg[i], &pcb->pcb_fpu.fpr[i].vsr[2], sizeof(double)); } /* * Repeat for Altivec context */ if (pcb->pcb_flags & PCB_VEC) { KASSERT(td == curthread, ("get_mcontext: fp save not curthread")); critical_enter(); save_vec(td); critical_exit(); mcp->mc_flags |= _MC_AV_VALID; mcp->mc_vscr = pcb->pcb_vec.vscr; mcp->mc_vrsave = pcb->pcb_vec.vrsave; memcpy(mcp->mc_avec, pcb->pcb_vec.vr, sizeof(mcp->mc_avec)); } mcp->mc_len = sizeof(*mcp); return (0); } int get_mcontext(struct thread *td, mcontext_t *mcp, int flags) { int error; error = grab_mcontext(td, mcp, flags); if (error == 0) { PROC_LOCK(curthread->td_proc); mcp->mc_onstack = sigonstack(td->td_frame->fixreg[1]); PROC_UNLOCK(curthread->td_proc); } return (error); } int set_mcontext(struct thread *td, mcontext_t *mcp) { struct pcb *pcb; struct trapframe *tf; register_t tls; int i; pcb = td->td_pcb; tf = td->td_frame; if (mcp->mc_vers != _MC_VERSION || mcp->mc_len != sizeof(*mcp)) return (EINVAL); /* * Don't let the user set privileged MSR bits */ if ((mcp->mc_srr1 & psl_userstatic) != (tf->srr1 & psl_userstatic)) { return (EINVAL); } /* Copy trapframe, preserving TLS pointer across context change */ if (SV_PROC_FLAG(td->td_proc, SV_LP64)) tls = tf->fixreg[13]; else tls = tf->fixreg[2]; memcpy(tf, mcp->mc_frame, sizeof(mcp->mc_frame)); if (SV_PROC_FLAG(td->td_proc, SV_LP64)) tf->fixreg[13] = tls; else tf->fixreg[2] = tls; /* Disable FPU */ tf->srr1 &= ~PSL_FP; pcb->pcb_flags &= ~PCB_FPU; if (mcp->mc_flags & _MC_FP_VALID) { /* enable_fpu() will happen lazily on a fault */ pcb->pcb_flags |= PCB_FPREGS; memcpy(&pcb->pcb_fpu.fpscr, &mcp->mc_fpscr, sizeof(double)); bzero(pcb->pcb_fpu.fpr, sizeof(pcb->pcb_fpu.fpr)); for (i = 0; i < 32; i++) { memcpy(&pcb->pcb_fpu.fpr[i].fpr, &mcp->mc_fpreg[i], sizeof(double)); memcpy(&pcb->pcb_fpu.fpr[i].vsr[2], &mcp->mc_vsxfpreg[i], sizeof(double)); } } if (mcp->mc_flags & _MC_AV_VALID) { if ((pcb->pcb_flags & PCB_VEC) != PCB_VEC) { critical_enter(); enable_vec(td); critical_exit(); } pcb->pcb_vec.vscr = mcp->mc_vscr; pcb->pcb_vec.vrsave = mcp->mc_vrsave; memcpy(pcb->pcb_vec.vr, mcp->mc_avec, sizeof(mcp->mc_avec)); } return (0); } /* + * Clean up extra POWER state. Some per-process registers and states are not + * managed by the MSR, so must be cleaned up explicitly on thread exit. + * + * Currently this includes: + * DSCR -- Data stream control register (PowerISA 2.06+) + * FSCR -- Facility Status and Control Register (PowerISA 2.07+) + */ +static void +cleanup_power_extras(struct thread *td) +{ + uint32_t pcb_flags; + + if (td != curthread) + return; + + pcb_flags = td->td_pcb->pcb_flags; + /* Clean up registers not managed by MSR. */ + if (pcb_flags & PCB_CFSCR) + mtspr(SPR_FSCR, 0); + if (pcb_flags & PCB_CDSCR) + mtspr(SPR_DSCRP, 0); +} + +/* * Set set up registers on exec. 
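The cleanup_power_extras() helper added above only resets the SPRs a thread has actually dirtied, as recorded in pcb_flags. A stand-alone sketch of that gating, with made-up flag values and a printf stub in place of mtspr (the real PCB_CDSCR/PCB_CFSCR definitions live in the pcb header), is:

    #include <stdio.h>

    #define PCB_CDSCR  0x40    /* placeholder value, for illustration only */
    #define PCB_CFSCR  0x80    /* placeholder value, for illustration only */

    /* Stub standing in for the kernel's mtspr(). */
    static void
    mtspr_stub(const char *spr, unsigned long val)
    {
        printf("mtspr %s <- %#lx\n", spr, val);
    }

    /*
     * Mirrors the shape of cleanup_power_extras(): only registers the
     * thread actually touched (per pcb_flags) are reset to zero.
     */
    static void
    cleanup_extras(unsigned int pcb_flags)
    {
        if (pcb_flags & PCB_CFSCR)
            mtspr_stub("FSCR", 0);
        if (pcb_flags & PCB_CDSCR)
            mtspr_stub("DSCRP", 0);
    }

    int
    main(void)
    {
        cleanup_extras(PCB_CDSCR);              /* only DSCR is reset */
        cleanup_extras(PCB_CDSCR | PCB_CFSCR);  /* both are reset */
        return (0);
    }

The flags keep exec and thread exit cheap for threads that never touch DSCR or any FSCR-controlled facility.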
*/ void exec_setregs(struct thread *td, struct image_params *imgp, u_long stack) { struct trapframe *tf; register_t argc; tf = trapframe(td); bzero(tf, sizeof *tf); #ifdef __powerpc64__ tf->fixreg[1] = -roundup(-stack + 48, 16); #else tf->fixreg[1] = -roundup(-stack + 8, 16); #endif /* * Set up arguments for _start(): * _start(argc, argv, envp, obj, cleanup, ps_strings); * * Notes: * - obj and cleanup are the auxilliary and termination * vectors. They are fixed up by ld.elf_so. * - ps_strings is a NetBSD extention, and will be * ignored by executables which are strictly * compliant with the SVR4 ABI. */ /* Collect argc from the user stack */ argc = fuword((void *)stack); tf->fixreg[3] = argc; tf->fixreg[4] = stack + sizeof(register_t); tf->fixreg[5] = stack + (2 + argc)*sizeof(register_t); tf->fixreg[6] = 0; /* auxillary vector */ tf->fixreg[7] = 0; /* termination vector */ tf->fixreg[8] = (register_t)imgp->ps_strings; /* NetBSD extension */ tf->srr0 = imgp->entry_addr; #ifdef __powerpc64__ tf->fixreg[12] = imgp->entry_addr; #endif tf->srr1 = psl_userset | PSL_FE_DFLT; + cleanup_power_extras(td); td->td_pcb->pcb_flags = 0; } #ifdef COMPAT_FREEBSD32 void ppc32_setregs(struct thread *td, struct image_params *imgp, u_long stack) { struct trapframe *tf; uint32_t argc; tf = trapframe(td); bzero(tf, sizeof *tf); tf->fixreg[1] = -roundup(-stack + 8, 16); argc = fuword32((void *)stack); tf->fixreg[3] = argc; tf->fixreg[4] = stack + sizeof(uint32_t); tf->fixreg[5] = stack + (2 + argc)*sizeof(uint32_t); tf->fixreg[6] = 0; /* auxillary vector */ tf->fixreg[7] = 0; /* termination vector */ tf->fixreg[8] = (register_t)imgp->ps_strings; /* NetBSD extension */ tf->srr0 = imgp->entry_addr; tf->srr1 = psl_userset32 | PSL_FE_DFLT; + cleanup_power_extras(td); td->td_pcb->pcb_flags = 0; } #endif int fill_regs(struct thread *td, struct reg *regs) { struct trapframe *tf; tf = td->td_frame; memcpy(regs, tf, sizeof(struct reg)); return (0); } int fill_dbregs(struct thread *td, struct dbreg *dbregs) { /* No debug registers on PowerPC */ return (ENOSYS); } int fill_fpregs(struct thread *td, struct fpreg *fpregs) { struct pcb *pcb; int i; pcb = td->td_pcb; if ((pcb->pcb_flags & PCB_FPREGS) == 0) memset(fpregs, 0, sizeof(struct fpreg)); else { memcpy(&fpregs->fpscr, &pcb->pcb_fpu.fpscr, sizeof(double)); for (i = 0; i < 32; i++) memcpy(&fpregs->fpreg[i], &pcb->pcb_fpu.fpr[i].fpr, sizeof(double)); } return (0); } int set_regs(struct thread *td, struct reg *regs) { struct trapframe *tf; tf = td->td_frame; memcpy(tf, regs, sizeof(struct reg)); return (0); } int set_dbregs(struct thread *td, struct dbreg *dbregs) { /* No debug registers on PowerPC */ return (ENOSYS); } int set_fpregs(struct thread *td, struct fpreg *fpregs) { struct pcb *pcb; int i; pcb = td->td_pcb; pcb->pcb_flags |= PCB_FPREGS; memcpy(&pcb->pcb_fpu.fpscr, &fpregs->fpscr, sizeof(double)); for (i = 0; i < 32; i++) { memcpy(&pcb->pcb_fpu.fpr[i].fpr, &fpregs->fpreg[i], sizeof(double)); } return (0); } #ifdef COMPAT_FREEBSD32 int set_regs32(struct thread *td, struct reg32 *regs) { struct trapframe *tf; int i; tf = td->td_frame; for (i = 0; i < 32; i++) tf->fixreg[i] = regs->fixreg[i]; tf->lr = regs->lr; tf->cr = regs->cr; tf->xer = regs->xer; tf->ctr = regs->ctr; tf->srr0 = regs->pc; return (0); } int fill_regs32(struct thread *td, struct reg32 *regs) { struct trapframe *tf; int i; tf = td->td_frame; for (i = 0; i < 32; i++) regs->fixreg[i] = tf->fixreg[i]; regs->lr = tf->lr; regs->cr = tf->cr; regs->xer = tf->xer; regs->ctr = tf->ctr; regs->pc = tf->srr0; 
return (0); } static int grab_mcontext32(struct thread *td, mcontext32_t *mcp, int flags) { mcontext_t mcp64; int i, error; error = grab_mcontext(td, &mcp64, flags); if (error != 0) return (error); mcp->mc_vers = mcp64.mc_vers; mcp->mc_flags = mcp64.mc_flags; mcp->mc_onstack = mcp64.mc_onstack; mcp->mc_len = mcp64.mc_len; memcpy(mcp->mc_avec,mcp64.mc_avec,sizeof(mcp64.mc_avec)); memcpy(mcp->mc_av,mcp64.mc_av,sizeof(mcp64.mc_av)); for (i = 0; i < 42; i++) mcp->mc_frame[i] = mcp64.mc_frame[i]; memcpy(mcp->mc_fpreg,mcp64.mc_fpreg,sizeof(mcp64.mc_fpreg)); memcpy(mcp->mc_vsxfpreg,mcp64.mc_vsxfpreg,sizeof(mcp64.mc_vsxfpreg)); return (0); } static int get_mcontext32(struct thread *td, mcontext32_t *mcp, int flags) { int error; error = grab_mcontext32(td, mcp, flags); if (error == 0) { PROC_LOCK(curthread->td_proc); mcp->mc_onstack = sigonstack(td->td_frame->fixreg[1]); PROC_UNLOCK(curthread->td_proc); } return (error); } static int set_mcontext32(struct thread *td, mcontext32_t *mcp) { mcontext_t mcp64; int i, error; mcp64.mc_vers = mcp->mc_vers; mcp64.mc_flags = mcp->mc_flags; mcp64.mc_onstack = mcp->mc_onstack; mcp64.mc_len = mcp->mc_len; memcpy(mcp64.mc_avec,mcp->mc_avec,sizeof(mcp64.mc_avec)); memcpy(mcp64.mc_av,mcp->mc_av,sizeof(mcp64.mc_av)); for (i = 0; i < 42; i++) mcp64.mc_frame[i] = mcp->mc_frame[i]; mcp64.mc_srr1 |= (td->td_frame->srr1 & 0xFFFFFFFF00000000ULL); memcpy(mcp64.mc_fpreg,mcp->mc_fpreg,sizeof(mcp64.mc_fpreg)); memcpy(mcp64.mc_vsxfpreg,mcp->mc_vsxfpreg,sizeof(mcp64.mc_vsxfpreg)); error = set_mcontext(td, &mcp64); return (error); } #endif #ifdef COMPAT_FREEBSD32 int freebsd32_sigreturn(struct thread *td, struct freebsd32_sigreturn_args *uap) { ucontext32_t uc; int error; CTR2(KTR_SIG, "sigreturn: td=%p ucp=%p", td, uap->sigcntxp); if (copyin(uap->sigcntxp, &uc, sizeof(uc)) != 0) { CTR1(KTR_SIG, "sigreturn: efault td=%p", td); return (EFAULT); } error = set_mcontext32(td, &uc.uc_mcontext); if (error != 0) return (error); kern_sigprocmask(td, SIG_SETMASK, &uc.uc_sigmask, NULL, 0); CTR3(KTR_SIG, "sigreturn: return td=%p pc=%#x sp=%#x", td, uc.uc_mcontext.mc_srr0, uc.uc_mcontext.mc_gpr[1]); return (EJUSTRETURN); } /* * The first two fields of a ucontext_t are the signal mask and the machine * context. The next field is uc_link; we want to avoid destroying the link * when copying out contexts. */ #define UC32_COPY_SIZE offsetof(ucontext32_t, uc_link) int freebsd32_getcontext(struct thread *td, struct freebsd32_getcontext_args *uap) { ucontext32_t uc; int ret; if (uap->ucp == NULL) ret = EINVAL; else { bzero(&uc, sizeof(uc)); get_mcontext32(td, &uc.uc_mcontext, GET_MC_CLEAR_RET); PROC_LOCK(td->td_proc); uc.uc_sigmask = td->td_sigmask; PROC_UNLOCK(td->td_proc); ret = copyout(&uc, uap->ucp, UC32_COPY_SIZE); } return (ret); } int freebsd32_setcontext(struct thread *td, struct freebsd32_setcontext_args *uap) { ucontext32_t uc; int ret; if (uap->ucp == NULL) ret = EINVAL; else { ret = copyin(uap->ucp, &uc, UC32_COPY_SIZE); if (ret == 0) { ret = set_mcontext32(td, &uc.uc_mcontext); if (ret == 0) { kern_sigprocmask(td, SIG_SETMASK, &uc.uc_sigmask, NULL, 0); } } } return (ret == 0 ? 
EJUSTRETURN : ret); } int freebsd32_swapcontext(struct thread *td, struct freebsd32_swapcontext_args *uap) { ucontext32_t uc; int ret; if (uap->oucp == NULL || uap->ucp == NULL) ret = EINVAL; else { bzero(&uc, sizeof(uc)); get_mcontext32(td, &uc.uc_mcontext, GET_MC_CLEAR_RET); PROC_LOCK(td->td_proc); uc.uc_sigmask = td->td_sigmask; PROC_UNLOCK(td->td_proc); ret = copyout(&uc, uap->oucp, UC32_COPY_SIZE); if (ret == 0) { ret = copyin(uap->ucp, &uc, UC32_COPY_SIZE); if (ret == 0) { ret = set_mcontext32(td, &uc.uc_mcontext); if (ret == 0) { kern_sigprocmask(td, SIG_SETMASK, &uc.uc_sigmask, NULL, 0); } } } } return (ret == 0 ? EJUSTRETURN : ret); } #endif void cpu_set_syscall_retval(struct thread *td, int error) { struct proc *p; struct trapframe *tf; int fixup; if (error == EJUSTRETURN) return; p = td->td_proc; tf = td->td_frame; if (tf->fixreg[0] == SYS___syscall && (SV_PROC_FLAG(p, SV_ILP32))) { int code = tf->fixreg[FIRSTARG + 1]; fixup = ( #if defined(COMPAT_FREEBSD6) && defined(SYS_freebsd6_lseek) code != SYS_freebsd6_lseek && #endif code != SYS_lseek) ? 1 : 0; } else fixup = 0; switch (error) { case 0: if (fixup) { /* * 64-bit return, 32-bit syscall. Fixup byte order */ tf->fixreg[FIRSTARG] = 0; tf->fixreg[FIRSTARG + 1] = td->td_retval[0]; } else { tf->fixreg[FIRSTARG] = td->td_retval[0]; tf->fixreg[FIRSTARG + 1] = td->td_retval[1]; } tf->cr &= ~0x10000000; /* Unset summary overflow */ break; case ERESTART: /* * Set user's pc back to redo the system call. */ tf->srr0 -= 4; break; default: tf->fixreg[FIRSTARG] = SV_ABI_ERRNO(p, error); tf->cr |= 0x10000000; /* Set summary overflow */ break; } } /* * Threading functions */ void cpu_thread_exit(struct thread *td) { + cleanup_power_extras(td); } void cpu_thread_clean(struct thread *td) { } void cpu_thread_alloc(struct thread *td) { struct pcb *pcb; pcb = (struct pcb *)((td->td_kstack + td->td_kstack_pages * PAGE_SIZE - sizeof(struct pcb)) & ~0x2fUL); td->td_pcb = pcb; td->td_frame = (struct trapframe *)pcb - 1; } void cpu_thread_free(struct thread *td) { } int cpu_set_user_tls(struct thread *td, void *tls_base) { if (SV_PROC_FLAG(td->td_proc, SV_LP64)) td->td_frame->fixreg[13] = (register_t)tls_base + 0x7010; else td->td_frame->fixreg[2] = (register_t)tls_base + 0x7008; return (0); } void cpu_copy_thread(struct thread *td, struct thread *td0) { struct pcb *pcb2; struct trapframe *tf; struct callframe *cf; pcb2 = td->td_pcb; /* Copy the upcall pcb */ bcopy(td0->td_pcb, pcb2, sizeof(*pcb2)); /* Create a stack for the new thread */ tf = td->td_frame; bcopy(td0->td_frame, tf, sizeof(struct trapframe)); tf->fixreg[FIRSTARG] = 0; tf->fixreg[FIRSTARG + 1] = 0; tf->cr &= ~0x10000000; /* Set registers for trampoline to user mode. */ cf = (struct callframe *)tf - 1; memset(cf, 0, sizeof(struct callframe)); cf->cf_func = (register_t)fork_return; cf->cf_arg0 = (register_t)td; cf->cf_arg1 = (register_t)tf; pcb2->pcb_sp = (register_t)cf; #if defined(__powerpc64__) && (!defined(_CALL_ELF) || _CALL_ELF == 1) pcb2->pcb_lr = ((register_t *)fork_trampoline)[0]; pcb2->pcb_toc = ((register_t *)fork_trampoline)[1]; #else pcb2->pcb_lr = (register_t)fork_trampoline; pcb2->pcb_context[0] = pcb2->pcb_lr; #endif pcb2->pcb_cpu.aim.usr_vsid = 0; #ifdef __SPE__ pcb2->pcb_vec.vscr = SPEFSCR_FINVE | SPEFSCR_FDBZE | SPEFSCR_FUNFE | SPEFSCR_FOVFE; #endif /* Setup to release spin count in fork_exit(). 
*/ td->td_md.md_spinlock_count = 1; td->td_md.md_saved_msr = psl_kernset; } void cpu_set_upcall(struct thread *td, void (*entry)(void *), void *arg, stack_t *stack) { struct trapframe *tf; uintptr_t sp; tf = td->td_frame; /* align stack and alloc space for frame ptr and saved LR */ #ifdef __powerpc64__ sp = ((uintptr_t)stack->ss_sp + stack->ss_size - 48) & ~0x1f; #else sp = ((uintptr_t)stack->ss_sp + stack->ss_size - 8) & ~0x1f; #endif bzero(tf, sizeof(struct trapframe)); tf->fixreg[1] = (register_t)sp; tf->fixreg[3] = (register_t)arg; if (SV_PROC_FLAG(td->td_proc, SV_ILP32)) { tf->srr0 = (register_t)entry; #ifdef __powerpc64__ tf->srr1 = psl_userset32 | PSL_FE_DFLT; #else tf->srr1 = psl_userset | PSL_FE_DFLT; #endif } else { #ifdef __powerpc64__ if (td->td_proc->p_sysent == &elf64_freebsd_sysvec_v2) { tf->srr0 = (register_t)entry; /* ELFv2 ABI requires that the global entry point be in r12. */ tf->fixreg[12] = (register_t)entry; } else { register_t entry_desc[3]; (void)copyin((void *)entry, entry_desc, sizeof(entry_desc)); tf->srr0 = entry_desc[0]; tf->fixreg[2] = entry_desc[1]; tf->fixreg[11] = entry_desc[2]; } tf->srr1 = psl_userset | PSL_FE_DFLT; #endif } td->td_pcb->pcb_flags = 0; #ifdef __SPE__ td->td_pcb->pcb_vec.vscr = SPEFSCR_FINVE | SPEFSCR_FDBZE | SPEFSCR_FUNFE | SPEFSCR_FOVFE; #endif td->td_retval[0] = (register_t)entry; td->td_retval[1] = 0; } static int emulate_mfspr(int spr, int reg, struct trapframe *frame){ struct thread *td; td = curthread; - if (spr == SPR_DSCR) { + if (spr == SPR_DSCR || spr == SPR_DSCRP) { // If DSCR was never set, get the default DSCR if ((td->td_pcb->pcb_flags & PCB_CDSCR) == 0) - td->td_pcb->pcb_dscr = mfspr(SPR_DSCR); + td->td_pcb->pcb_dscr = mfspr(SPR_DSCRP); frame->fixreg[reg] = td->td_pcb->pcb_dscr; frame->srr0 += 4; return 0; } else return SIGILL; } static int emulate_mtspr(int spr, int reg, struct trapframe *frame){ struct thread *td; td = curthread; - if (spr == SPR_DSCR) { + if (spr == SPR_DSCR || spr == SPR_DSCRP) { td->td_pcb->pcb_flags |= PCB_CDSCR; td->td_pcb->pcb_dscr = frame->fixreg[reg]; + mtspr(SPR_DSCRP, frame->fixreg[reg]); frame->srr0 += 4; return 0; } else return SIGILL; } #define XFX 0xFC0007FF int ppc_instr_emulate(struct trapframe *frame, struct thread *td) { struct pcb *pcb; uint32_t instr; int reg, sig; int rs, spr; instr = fuword32((void *)frame->srr0); sig = SIGILL; if ((instr & 0xfc1fffff) == 0x7c1f42a6) { /* mfpvr */ reg = (instr & ~0xfc1fffff) >> 21; frame->fixreg[reg] = mfpvr(); frame->srr0 += 4; return (0); } else if ((instr & XFX) == 0x7c0002a6) { /* mfspr */ rs = (instr & 0x3e00000) >> 21; spr = (instr & 0x1ff800) >> 16; return emulate_mfspr(spr, rs, frame); } else if ((instr & XFX) == 0x7c0003a6) { /* mtspr */ rs = (instr & 0x3e00000) >> 21; spr = (instr & 0x1ff800) >> 16; return emulate_mtspr(spr, rs, frame); } else if ((instr & 0xfc000ffe) == 0x7c0004ac) { /* various sync */ powerpc_sync(); /* Do a heavy-weight sync */ frame->srr0 += 4; return (0); } pcb = td->td_pcb; #ifdef FPU_EMU if (!(pcb->pcb_flags & PCB_FPREGS)) { bzero(&pcb->pcb_fpu, sizeof(pcb->pcb_fpu)); pcb->pcb_flags |= PCB_FPREGS; } else if (pcb->pcb_flags & PCB_FPU) save_fpu(td); sig = fpu_emulate(frame, &pcb->pcb_fpu); if ((sig == 0 || sig == SIGFPE) && pcb->pcb_flags & PCB_FPU) enable_fpu(td); #endif if (sig == SIGILL) { if (pcb->pcb_lastill != frame->srr0) { /* Allow a second chance, in case of cache sync issues. 
*/ sig = 0; pmap_sync_icache(PCPU_GET(curpmap), frame->srr0, 4); pcb->pcb_lastill = frame->srr0; } } return (sig); } Index: user/ngie/bug-237403/sys/powerpc/powerpc/genassym.c =================================================================== --- user/ngie/bug-237403/sys/powerpc/powerpc/genassym.c (revision 346925) +++ user/ngie/bug-237403/sys/powerpc/powerpc/genassym.c (revision 346926) @@ -1,271 +1,281 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1982, 1990 The Regents of the University of California. * All rights reserved. * * This code is derived from software contributed to Berkeley by * William Jolitz. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
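Returning to the mfspr/mtspr emulation in exec_machdep.c above: ppc_instr_emulate() matches the XFX form and then pulls the RS/RT and SPR fields out of the instruction word with fixed masks. The stand-alone program below reuses those masks on a hand-assembled "mfspr r5, 3" (user DSCR) purely to show how the fields fall out; note the mask keeps only the low half of the split SPR field, which is sufficient for small SPR numbers such as DSCR (3):

    #include <stdio.h>
    #include <stdint.h>

    /* Field masks copied from ppc_instr_emulate()/emulate_mfspr() above. */
    #define XFX       0xFC0007FF
    #define MFSPR_XFX 0x7C0002A6
    #define MTSPR_XFX 0x7C0003A6

    int
    main(void)
    {
        /* "mfspr r5, 3" (user DSCR); compare 0x7c0802a6, which is "mflr r0". */
        uint32_t instr = 0x7CA302A6;
        int rs, spr;

        if ((instr & XFX) == MFSPR_XFX)
            printf("mfspr ");
        else if ((instr & XFX) == MTSPR_XFX)
            printf("mtspr ");

        rs  = (instr & 0x3e00000) >> 21;    /* RT/RS register field */
        spr = (instr & 0x1ff800) >> 16;     /* low half of the split SPR field */
        printf("r%d, spr %d\n", rs, spr);
        return (0);
    }

Running it prints "mfspr r5, spr 3", matching the arguments the kernel would hand to emulate_mfspr().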
* * from: @(#)genassym.c 5.11 (Berkeley) 5/10/91 * $FreeBSD$ */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include ASSYM(PC_CURTHREAD, offsetof(struct pcpu, pc_curthread)); ASSYM(PC_CURPCB, offsetof(struct pcpu, pc_curpcb)); ASSYM(PC_CURPMAP, offsetof(struct pcpu, pc_curpmap)); ASSYM(PC_TEMPSAVE, offsetof(struct pcpu, pc_tempsave)); ASSYM(PC_DISISAVE, offsetof(struct pcpu, pc_disisave)); ASSYM(PC_DBSAVE, offsetof(struct pcpu, pc_dbsave)); ASSYM(PC_RESTORE, offsetof(struct pcpu, pc_restore)); #if defined(BOOKE) ASSYM(PC_BOOKE_CRITSAVE, offsetof(struct pcpu, pc_booke.critsave)); ASSYM(PC_BOOKE_MCHKSAVE, offsetof(struct pcpu, pc_booke.mchksave)); ASSYM(PC_BOOKE_TLBSAVE, offsetof(struct pcpu, pc_booke.tlbsave)); ASSYM(PC_BOOKE_TLB_LEVEL, offsetof(struct pcpu, pc_booke.tlb_level)); ASSYM(PC_BOOKE_TLB_LOCK, offsetof(struct pcpu, pc_booke.tlb_lock)); #endif ASSYM(CPUSAVE_R27, CPUSAVE_R27*sizeof(register_t)); ASSYM(CPUSAVE_R28, CPUSAVE_R28*sizeof(register_t)); ASSYM(CPUSAVE_R29, CPUSAVE_R29*sizeof(register_t)); ASSYM(CPUSAVE_R30, CPUSAVE_R30*sizeof(register_t)); ASSYM(CPUSAVE_R31, CPUSAVE_R31*sizeof(register_t)); ASSYM(CPUSAVE_SRR0, CPUSAVE_SRR0*sizeof(register_t)); ASSYM(CPUSAVE_SRR1, CPUSAVE_SRR1*sizeof(register_t)); ASSYM(CPUSAVE_AIM_DAR, CPUSAVE_AIM_DAR*sizeof(register_t)); ASSYM(CPUSAVE_AIM_DSISR, CPUSAVE_AIM_DSISR*sizeof(register_t)); ASSYM(CPUSAVE_BOOKE_DEAR, CPUSAVE_BOOKE_DEAR*sizeof(register_t)); ASSYM(CPUSAVE_BOOKE_ESR, CPUSAVE_BOOKE_ESR*sizeof(register_t)); ASSYM(BOOKE_CRITSAVE_SRR0, BOOKE_CRITSAVE_SRR0*sizeof(register_t)); ASSYM(BOOKE_CRITSAVE_SRR1, BOOKE_CRITSAVE_SRR1*sizeof(register_t)); ASSYM(TLBSAVE_BOOKE_LR, TLBSAVE_BOOKE_LR*sizeof(register_t)); ASSYM(TLBSAVE_BOOKE_CR, TLBSAVE_BOOKE_CR*sizeof(register_t)); ASSYM(TLBSAVE_BOOKE_SRR0, TLBSAVE_BOOKE_SRR0*sizeof(register_t)); ASSYM(TLBSAVE_BOOKE_SRR1, TLBSAVE_BOOKE_SRR1*sizeof(register_t)); ASSYM(TLBSAVE_BOOKE_R20, TLBSAVE_BOOKE_R20*sizeof(register_t)); ASSYM(TLBSAVE_BOOKE_R21, TLBSAVE_BOOKE_R21*sizeof(register_t)); ASSYM(TLBSAVE_BOOKE_R22, TLBSAVE_BOOKE_R22*sizeof(register_t)); ASSYM(TLBSAVE_BOOKE_R23, TLBSAVE_BOOKE_R23*sizeof(register_t)); ASSYM(TLBSAVE_BOOKE_R24, TLBSAVE_BOOKE_R24*sizeof(register_t)); ASSYM(TLBSAVE_BOOKE_R25, TLBSAVE_BOOKE_R25*sizeof(register_t)); ASSYM(TLBSAVE_BOOKE_R26, TLBSAVE_BOOKE_R26*sizeof(register_t)); ASSYM(TLBSAVE_BOOKE_R27, TLBSAVE_BOOKE_R27*sizeof(register_t)); ASSYM(TLBSAVE_BOOKE_R28, TLBSAVE_BOOKE_R28*sizeof(register_t)); ASSYM(TLBSAVE_BOOKE_R29, TLBSAVE_BOOKE_R29*sizeof(register_t)); ASSYM(TLBSAVE_BOOKE_R30, TLBSAVE_BOOKE_R30*sizeof(register_t)); ASSYM(TLBSAVE_BOOKE_R31, TLBSAVE_BOOKE_R31*sizeof(register_t)); ASSYM(MTX_LOCK, offsetof(struct mtx, mtx_lock)); #if defined(AIM) ASSYM(USER_ADDR, USER_ADDR); #ifdef __powerpc64__ ASSYM(PC_KERNSLB, offsetof(struct pcpu, pc_aim.slb)); ASSYM(PC_USERSLB, offsetof(struct pcpu, pc_aim.userslb)); ASSYM(PC_SLBSAVE, offsetof(struct pcpu, pc_aim.slbsave)); ASSYM(PC_SLBSTACK, offsetof(struct pcpu, pc_aim.slbstack)); ASSYM(USER_SLB_SLOT, USER_SLB_SLOT); ASSYM(USER_SLB_SLBE, USER_SLB_SLBE); ASSYM(SEGMENT_MASK, SEGMENT_MASK); #else ASSYM(PM_SR, offsetof(struct pmap, pm_sr)); ASSYM(USER_SR, USER_SR); #endif #elif defined(BOOKE) #ifdef __powerpc64__ ASSYM(PM_PP2D, offsetof(struct pmap, pm_pp2d)); #else ASSYM(PM_PDIR, offsetof(struct pmap, pm_pdir)); #endif /* * With pte_t being a bitfield struct, these fields cannot be addressed via * offsetof(). 
*/ ASSYM(PTE_RPN, 0); ASSYM(PTE_FLAGS, sizeof(uint32_t)); #if defined(BOOKE_E500) ASSYM(TLB_ENTRY_SIZE, sizeof(struct tlb_entry)); #endif #endif #ifdef __powerpc64__ ASSYM(FSP, 48); #else ASSYM(FSP, 8); #endif ASSYM(FRAMELEN, FRAMELEN); ASSYM(FRAME_0, offsetof(struct trapframe, fixreg[0])); ASSYM(FRAME_1, offsetof(struct trapframe, fixreg[1])); ASSYM(FRAME_2, offsetof(struct trapframe, fixreg[2])); ASSYM(FRAME_3, offsetof(struct trapframe, fixreg[3])); ASSYM(FRAME_4, offsetof(struct trapframe, fixreg[4])); ASSYM(FRAME_5, offsetof(struct trapframe, fixreg[5])); ASSYM(FRAME_6, offsetof(struct trapframe, fixreg[6])); ASSYM(FRAME_7, offsetof(struct trapframe, fixreg[7])); ASSYM(FRAME_8, offsetof(struct trapframe, fixreg[8])); ASSYM(FRAME_9, offsetof(struct trapframe, fixreg[9])); ASSYM(FRAME_10, offsetof(struct trapframe, fixreg[10])); ASSYM(FRAME_11, offsetof(struct trapframe, fixreg[11])); ASSYM(FRAME_12, offsetof(struct trapframe, fixreg[12])); ASSYM(FRAME_13, offsetof(struct trapframe, fixreg[13])); ASSYM(FRAME_14, offsetof(struct trapframe, fixreg[14])); ASSYM(FRAME_15, offsetof(struct trapframe, fixreg[15])); ASSYM(FRAME_16, offsetof(struct trapframe, fixreg[16])); ASSYM(FRAME_17, offsetof(struct trapframe, fixreg[17])); ASSYM(FRAME_18, offsetof(struct trapframe, fixreg[18])); ASSYM(FRAME_19, offsetof(struct trapframe, fixreg[19])); ASSYM(FRAME_20, offsetof(struct trapframe, fixreg[20])); ASSYM(FRAME_21, offsetof(struct trapframe, fixreg[21])); ASSYM(FRAME_22, offsetof(struct trapframe, fixreg[22])); ASSYM(FRAME_23, offsetof(struct trapframe, fixreg[23])); ASSYM(FRAME_24, offsetof(struct trapframe, fixreg[24])); ASSYM(FRAME_25, offsetof(struct trapframe, fixreg[25])); ASSYM(FRAME_26, offsetof(struct trapframe, fixreg[26])); ASSYM(FRAME_27, offsetof(struct trapframe, fixreg[27])); ASSYM(FRAME_28, offsetof(struct trapframe, fixreg[28])); ASSYM(FRAME_29, offsetof(struct trapframe, fixreg[29])); ASSYM(FRAME_30, offsetof(struct trapframe, fixreg[30])); ASSYM(FRAME_31, offsetof(struct trapframe, fixreg[31])); ASSYM(FRAME_LR, offsetof(struct trapframe, lr)); ASSYM(FRAME_CR, offsetof(struct trapframe, cr)); ASSYM(FRAME_CTR, offsetof(struct trapframe, ctr)); ASSYM(FRAME_XER, offsetof(struct trapframe, xer)); ASSYM(FRAME_SRR0, offsetof(struct trapframe, srr0)); ASSYM(FRAME_SRR1, offsetof(struct trapframe, srr1)); ASSYM(FRAME_EXC, offsetof(struct trapframe, exc)); ASSYM(FRAME_AIM_DAR, offsetof(struct trapframe, dar)); ASSYM(FRAME_AIM_DSISR, offsetof(struct trapframe, cpu.aim.dsisr)); ASSYM(FRAME_BOOKE_DEAR, offsetof(struct trapframe, dar)); ASSYM(FRAME_BOOKE_ESR, offsetof(struct trapframe, cpu.booke.esr)); ASSYM(FRAME_BOOKE_DBCR0, offsetof(struct trapframe, cpu.booke.dbcr0)); ASSYM(CF_FUNC, offsetof(struct callframe, cf_func)); ASSYM(CF_ARG0, offsetof(struct callframe, cf_arg0)); ASSYM(CF_ARG1, offsetof(struct callframe, cf_arg1)); ASSYM(CF_SIZE, sizeof(struct callframe)); ASSYM(PCB_CONTEXT, offsetof(struct pcb, pcb_context)); ASSYM(PCB_CR, offsetof(struct pcb, pcb_cr)); ASSYM(PCB_DSCR, offsetof(struct pcb, pcb_dscr)); +ASSYM(PCB_FSCR, offsetof(struct pcb, pcb_fscr)); +ASSYM(PCB_TAR, offsetof(struct pcb, pcb_tar)); ASSYM(PCB_SP, offsetof(struct pcb, pcb_sp)); ASSYM(PCB_TOC, offsetof(struct pcb, pcb_toc)); ASSYM(PCB_LR, offsetof(struct pcb, pcb_lr)); ASSYM(PCB_ONFAULT, offsetof(struct pcb, pcb_onfault)); ASSYM(PCB_FLAGS, offsetof(struct pcb, pcb_flags)); ASSYM(PCB_FPU, PCB_FPU); ASSYM(PCB_VEC, PCB_VEC); ASSYM(PCB_CDSCR, PCB_CDSCR); +ASSYM(PCB_CFSCR, PCB_CFSCR); ASSYM(PCB_AIM_USR_VSID, 
offsetof(struct pcb, pcb_cpu.aim.usr_vsid)); ASSYM(PCB_BOOKE_DBCR0, offsetof(struct pcb, pcb_cpu.booke.dbcr0)); ASSYM(PCB_VSCR, offsetof(struct pcb, pcb_vec.vscr)); + +ASSYM(PCB_EBB_EBBHR, offsetof(struct pcb, pcb_ebb.ebbhr)); +ASSYM(PCB_EBB_EBBRR, offsetof(struct pcb, pcb_ebb.ebbrr)); +ASSYM(PCB_EBB_BESCR, offsetof(struct pcb, pcb_ebb.bescr)); + +ASSYM(PCB_LMON_LMRR, offsetof(struct pcb, pcb_lm.lmrr)); +ASSYM(PCB_LMON_LMSER, offsetof(struct pcb, pcb_lm.lmser)); ASSYM(TD_LOCK, offsetof(struct thread, td_lock)); ASSYM(TD_PROC, offsetof(struct thread, td_proc)); ASSYM(TD_PCB, offsetof(struct thread, td_pcb)); ASSYM(P_VMSPACE, offsetof(struct proc, p_vmspace)); ASSYM(VM_PMAP, offsetof(struct vmspace, vm_pmap)); ASSYM(TD_FLAGS, offsetof(struct thread, td_flags)); ASSYM(TDF_ASTPENDING, TDF_ASTPENDING); ASSYM(TDF_NEEDRESCHED, TDF_NEEDRESCHED); ASSYM(SF_UC, offsetof(struct sigframe, sf_uc)); ASSYM(DMAP_BASE_ADDRESS, DMAP_BASE_ADDRESS); ASSYM(MAXCOMLEN, MAXCOMLEN); #ifdef __powerpc64__ ASSYM(PSL_CM, PSL_CM); #endif ASSYM(PSL_GS, PSL_GS); ASSYM(PSL_DE, PSL_DE); ASSYM(PSL_DS, PSL_DS); ASSYM(PSL_IS, PSL_IS); ASSYM(PSL_CE, PSL_CE); ASSYM(PSL_UCLE, PSL_UCLE); ASSYM(PSL_WE, PSL_WE); ASSYM(PSL_UBLE, PSL_UBLE); #if defined(AIM) && defined(__powerpc64__) ASSYM(PSL_SF, PSL_SF); ASSYM(PSL_HV, PSL_HV); #endif ASSYM(PSL_POW, PSL_POW); ASSYM(PSL_ILE, PSL_ILE); ASSYM(PSL_LE, PSL_LE); ASSYM(PSL_SE, PSL_SE); ASSYM(PSL_RI, PSL_RI); ASSYM(PSL_DR, PSL_DR); ASSYM(PSL_IP, PSL_IP); ASSYM(PSL_IR, PSL_IR); ASSYM(PSL_FE_DIS, PSL_FE_DIS); ASSYM(PSL_FE_NONREC, PSL_FE_NONREC); ASSYM(PSL_FE_PREC, PSL_FE_PREC); ASSYM(PSL_FE_REC, PSL_FE_REC); ASSYM(PSL_VEC, PSL_VEC); ASSYM(PSL_BE, PSL_BE); ASSYM(PSL_EE, PSL_EE); ASSYM(PSL_FE0, PSL_FE0); ASSYM(PSL_FE1, PSL_FE1); ASSYM(PSL_FP, PSL_FP); ASSYM(PSL_ME, PSL_ME); ASSYM(PSL_PR, PSL_PR); ASSYM(PSL_PMM, PSL_PMM); Index: user/ngie/bug-237403/sys/powerpc/powerpc/swtch32.S =================================================================== --- user/ngie/bug-237403/sys/powerpc/powerpc/swtch32.S (revision 346925) +++ user/ngie/bug-237403/sys/powerpc/powerpc/swtch32.S (revision 346926) @@ -1,220 +1,218 @@ /* $FreeBSD$ */ /* $NetBSD: locore.S,v 1.24 2000/05/31 05:09:17 thorpej Exp $ */ /*- * Copyright (C) 2001 Benno Rice * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY Benno Rice ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL TOOLS GMBH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ /*- * Copyright (C) 1995, 1996 Wolfgang Solfrank. 
* Copyright (C) 1995, 1996 TooLs GmbH. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by TooLs GmbH. * 4. The name of TooLs GmbH may not be used to endorse or promote products * derived from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY TOOLS GMBH ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL TOOLS GMBH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include "assym.inc" #include "opt_sched.h" #include #include #include #include #include /* * void cpu_throw(struct thread *old, struct thread *new) */ ENTRY(cpu_throw) mr %r2, %r4 li %r14,0 /* Tell cpu_switchin not to release a thread */ b cpu_switchin /* * void cpu_switch(struct thread *old, * struct thread *new, * struct mutex *mtx); * * Switch to a new thread saving the current state in the old thread. */ ENTRY(cpu_switch) lwz %r6,TD_PCB(%r3) /* Get the old thread's PCB ptr */ stmw %r12,PCB_CONTEXT(%r6) /* Save the non-volatile GP regs. These can now be used for scratch */ mfcr %r16 /* Save the condition register */ stw %r16,PCB_CR(%r6) mflr %r16 /* Save the link register */ stw %r16,PCB_LR(%r6) stw %r1,PCB_SP(%r6) /* Save the stack pointer */ mr %r14,%r3 /* Copy the old thread ptr... */ mr %r2,%r4 /* and the new thread ptr in curthread */ mr %r16,%r5 /* and the new lock */ mr %r17,%r6 /* and the PCB */ - lwz %r7,PCB_FLAGS(%r17) + lwz %r18,PCB_FLAGS(%r17) /* Save FPU context if needed */ - andi. %r7, %r7, PCB_FPU + andi. %r7, %r18, PCB_FPU beq .L1 bl save_fpu .L1: mr %r3,%r14 /* restore old thread ptr */ - lwz %r7,PCB_FLAGS(%r17) /* Save Altivec context if needed */ - andi. %r7, %r7, PCB_VEC + andi. 
%r7, %r18, PCB_VEC beq .L2 bl save_vec .L2: #if defined(__SPE__) mfspr %r3,SPR_SPEFSCR stw %r3,PCB_VSCR(%r17) #endif mr %r3,%r14 /* restore old thread ptr */ bl pmap_deactivate /* Deactivate the current pmap */ sync /* Make sure all of that finished */ cpu_switchin: #if defined(SMP) && defined(SCHED_ULE) /* Wait for the new thread to become unblocked */ bl _GLOBAL_OFFSET_TABLE_@local-4 mflr %r6 lwz %r6,blocked_lock@got(%r6) blocked_loop: lwz %r7,TD_LOCK(%r2) cmpw %r6,%r7 beq- blocked_loop isync #endif lwz %r17,TD_PCB(%r2) /* Get new current PCB */ lwz %r1,PCB_SP(%r17) /* Load new stack pointer */ /* Release old thread now that we have a stack pointer set up */ cmpwi %r14,0 beq- 1f stw %r16,TD_LOCK(%r14) /* ULE: update old thread's lock */ 1: mfsprg %r7,0 /* Get the pcpu pointer */ stw %r2,PC_CURTHREAD(%r7) /* Store new current thread */ lwz %r17,TD_PCB(%r2) /* Store new current PCB */ stw %r17,PC_CURPCB(%r7) mr %r3,%r2 /* Get new thread ptr */ bl pmap_activate /* Activate the new address space */ - lwz %r6, PCB_FLAGS(%r17) + lwz %r19, PCB_FLAGS(%r17) /* Restore FPU context if needed */ - andi. %r6, %r6, PCB_FPU + andi. %r6, %r19, PCB_FPU beq .L3 mr %r3,%r2 /* Pass curthread to enable_fpu */ bl enable_fpu .L3: - lwz %r6, PCB_FLAGS(%r17) /* Restore Altivec context if needed */ - andi. %r6, %r6, PCB_VEC + andi. %r6, %r19, PCB_VEC beq .L4 mr %r3,%r2 /* Pass curthread to enable_vec */ bl enable_vec .L4: #if defined(__SPE__) lwz %r3,PCB_VSCR(%r17) mtspr SPR_SPEFSCR,%r3 #endif /* thread to restore is in r3 */ mr %r3,%r17 /* Recover PCB ptr */ lmw %r12,PCB_CONTEXT(%r3) /* Load the non-volatile GP regs */ lwz %r5,PCB_CR(%r3) /* Load the condition register */ mtcr %r5 lwz %r5,PCB_LR(%r3) /* Load the link register */ mtlr %r5 lwz %r1,PCB_SP(%r3) /* Load the stack pointer */ /* * Perform a dummy stwcx. to clear any reservations we may have * inherited from the previous thread. It doesn't matter if the * stwcx succeeds or not. pcb_context[0] can be clobbered. */ stwcx. %r1, 0, %r3 blr /* * savectx(pcb) * Update pcb, saving current processor state */ ENTRY(savectx) stmw %r12,PCB_CONTEXT(%r3) /* Save the non-volatile GP regs */ mfcr %r4 /* Save the condition register */ stw %r4,PCB_CR(%r3) stw %r1,PCB_SP(%r3) /* Save the stack pointer */ mflr %r4 /* Save the link register */ stw %r4,PCB_LR(%r3) blr /* * fork_trampoline() * Set up the return from cpu_fork() */ ENTRY(fork_trampoline) lwz %r3,CF_FUNC(%r1) lwz %r4,CF_ARG0(%r1) lwz %r5,CF_ARG1(%r1) bl fork_exit addi %r1,%r1,CF_SIZE-FSP /* Allow 8 bytes in front of trapframe to simulate FRAME_SETUP does when allocating space for a frame pointer/saved LR */ #ifdef __SPE__ li %r3,SPEFSCR_FINVE|SPEFSCR_FDBZE|SPEFSCR_FUNFE|SPEFSCR_FOVFE mtspr SPR_SPEFSCR, %r3 #endif b trapexit Index: user/ngie/bug-237403/sys/powerpc/powerpc/swtch64.S =================================================================== --- user/ngie/bug-237403/sys/powerpc/powerpc/swtch64.S (revision 346925) +++ user/ngie/bug-237403/sys/powerpc/powerpc/swtch64.S (revision 346926) @@ -1,304 +1,359 @@ /* $FreeBSD$ */ /* $NetBSD: locore.S,v 1.24 2000/05/31 05:09:17 thorpej Exp $ */ /*- * Copyright (C) 2001 Benno Rice * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. 
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY Benno Rice ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL TOOLS GMBH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ /*- * Copyright (C) 1995, 1996 Wolfgang Solfrank. * Copyright (C) 1995, 1996 TooLs GmbH. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by TooLs GmbH. * 4. The name of TooLs GmbH may not be used to endorse or promote products * derived from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY TOOLS GMBH ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL TOOLS GMBH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include "assym.inc" #include "opt_sched.h" #include #include #include #include #include #ifdef _CALL_ELF .abiversion _CALL_ELF #endif TOC_ENTRY(blocked_lock) /* * void cpu_throw(struct thread *old, struct thread *new) */ ENTRY(cpu_throw) mr %r13, %r4 li %r14,0 /* Tell cpu_switchin not to release a thread */ b cpu_switchin /* * void cpu_switch(struct thread *old, * struct thread *new, * struct mutex *mtx); * * Switch to a new thread saving the current state in the old thread. + * + * Internally clobbers (not visible outside of this file): + * r18 - old thread pcb_flags + * r19 - new thread pcb_flags */ ENTRY(cpu_switch) ld %r6,TD_PCB(%r3) /* Get the old thread's PCB ptr */ std %r12,PCB_CONTEXT(%r6) /* Save the non-volatile GP regs. 
These can now be used for scratch */ std %r14,PCB_CONTEXT+2*8(%r6) std %r15,PCB_CONTEXT+3*8(%r6) std %r16,PCB_CONTEXT+4*8(%r6) std %r17,PCB_CONTEXT+5*8(%r6) std %r18,PCB_CONTEXT+6*8(%r6) std %r19,PCB_CONTEXT+7*8(%r6) std %r20,PCB_CONTEXT+8*8(%r6) std %r21,PCB_CONTEXT+9*8(%r6) std %r22,PCB_CONTEXT+10*8(%r6) std %r23,PCB_CONTEXT+11*8(%r6) std %r24,PCB_CONTEXT+12*8(%r6) std %r25,PCB_CONTEXT+13*8(%r6) std %r26,PCB_CONTEXT+14*8(%r6) std %r27,PCB_CONTEXT+15*8(%r6) std %r28,PCB_CONTEXT+16*8(%r6) std %r29,PCB_CONTEXT+17*8(%r6) std %r30,PCB_CONTEXT+18*8(%r6) std %r31,PCB_CONTEXT+19*8(%r6) mfcr %r16 /* Save the condition register */ std %r16,PCB_CR(%r6) mflr %r16 /* Save the link register */ std %r16,PCB_LR(%r6) std %r1,PCB_SP(%r6) /* Save the stack pointer */ std %r2,PCB_TOC(%r6) /* Save the TOC pointer */ mr %r14,%r3 /* Copy the old thread ptr... */ mr %r13,%r4 /* and the new thread ptr in curthread*/ mr %r16,%r5 /* and the new lock */ mr %r17,%r6 /* and the PCB */ stdu %r1,-48(%r1) - lwz %r7, PCB_FLAGS(%r17) - andi. %r7, %r7, PCB_CDSCR + lwz %r18, PCB_FLAGS(%r17) + andi. %r7, %r18, PCB_CFSCR + beq 1f + mfspr %r6, SPR_FSCR + std %r6, PCB_FSCR(%r17) +save_ebb: + andi. %r0, %r6, FSCR_EBB + beq save_lm + mfspr %r7, SPR_EBBHR + std %r7, PCB_EBB_EBBHR(%r17) + mfspr %r7, SPR_EBBRR + std %r7, PCB_EBB_EBBRR(%r17) + mfspr %r7, SPR_BESCR + std %r7, PCB_EBB_BESCR(%r17) +save_lm: + andi. %r0, %r6, FSCR_LM + beq save_tar + mfspr %r7, SPR_LMRR + std %r7, PCB_LMON_LMRR(%r17) + mfspr %r7, SPR_LMSER + std %r7, PCB_LMON_LMSER(%r17) +save_tar: + andi. %r0, %r6, FSCR_TAR + beq 1f + mfspr %r7, SPR_TAR + std %r7, PCB_TAR(%r17) +1: + andi. %r7, %r18, PCB_CDSCR beq .L0 - /* Custom DSCR was set. Reseting it to enter kernel */ - li %r7, 0x0 - mtspr SPR_DSCR, %r7 + mfspr %r6, SPR_DSCRP + std %r6, PCB_DSCR(%r17) .L0: - lwz %r7,PCB_FLAGS(%r17) /* Save FPU context if needed */ - andi. %r7, %r7, PCB_FPU + andi. %r7, %r18, PCB_FPU beq .L1 bl save_fpu nop .L1: mr %r3,%r14 /* restore old thread ptr */ - lwz %r7,PCB_FLAGS(%r17) /* Save Altivec context if needed */ - andi. %r7, %r7, PCB_VEC + andi. %r7, %r18, PCB_VEC beq .L2 bl save_vec nop .L2: mr %r3,%r14 /* restore old thread ptr */ bl pmap_deactivate /* Deactivate the current pmap */ nop sync /* Make sure all of that finished */ cpu_switchin: #if defined(SMP) && defined(SCHED_ULE) /* Wait for the new thread to become unblocked */ addis %r6,%r2,TOC_REF(blocked_lock)@ha ld %r6,TOC_REF(blocked_lock)@l(%r6) blocked_loop: ld %r7,TD_LOCK(%r13) cmpd %r6,%r7 beq- blocked_loop isync #endif ld %r17,TD_PCB(%r13) /* Get new PCB */ ld %r1,PCB_SP(%r17) /* Load the stack pointer */ addi %r1,%r1,-48 /* Remember about cpu_switch stack frame */ /* Release old thread now that we have a stack pointer set up */ cmpdi %r14,0 beq- 1f std %r16,TD_LOCK(%r14) /* ULE: update old thread's lock */ 1: mfsprg %r7,0 /* Get the pcpu pointer */ std %r13,PC_CURTHREAD(%r7) /* Store new current thread */ ld %r17,TD_PCB(%r13) /* Store new current PCB */ std %r17,PC_CURPCB(%r7) mr %r3,%r13 /* Get new thread ptr */ bl pmap_activate /* Activate the new address space */ nop - lwz %r6, PCB_FLAGS(%r17) + lwz %r19, PCB_FLAGS(%r17) /* Restore FPU context if needed */ - andi. %r6, %r6, PCB_FPU + andi. %r6, %r19, PCB_FPU beq .L3 mr %r3,%r13 /* Pass curthread to enable_fpu */ bl enable_fpu nop .L3: - lwz %r6, PCB_FLAGS(%r17) /* Restore Altivec context if needed */ - andi. %r6, %r6, PCB_VEC + andi. 
%r6, %r19, PCB_VEC beq .L31 mr %r3,%r13 /* Pass curthread to enable_vec */ bl enable_vec nop .L31: - lwz %r6, PCB_FLAGS(%r17) - /* Restore Custom DSCR if needed */ - andi. %r6, %r6, PCB_CDSCR + /* Load custom DSCR on PowerISA 2.06+ CPUs. */ + /* Load changed FSCR on PowerISA 2.07+ CPUs. */ + or %r18,%r18,%r19 + /* Restore Custom DSCR if needed (zeroes if in old but not new) */ + andi. %r6, %r18, PCB_CDSCR + beq .L32 + ld %r7, PCB_DSCR(%r17) /* Load the DSCR register*/ + mtspr SPR_DSCRP, %r7 +.L32: + /* Restore FSCR if needed (zeroes if in old but not new) */ + andi. %r6, %r18, PCB_CFSCR beq .L4 - ld %r6, PCB_DSCR(%r17) /* Load the DSCR register*/ - mtspr SPR_DSCR, %r6 + ld %r7, PCB_FSCR(%r17) /* Load the FSCR register*/ + mtspr SPR_FSCR, %r7 +restore_ebb: + andi. %r0, %r7, FSCR_EBB + beq restore_lm + ld %r6, PCB_EBB_EBBHR(%r17) + mtspr SPR_EBBHR, %r6 + ld %r6, PCB_EBB_EBBRR(%r17) + mtspr SPR_EBBRR, %r6 + ld %r6, PCB_EBB_BESCR(%r17) + mtspr SPR_BESCR, %r6 +restore_lm: + andi. %r0, %r7, FSCR_LM + beq restore_tar + ld %r6, PCB_LMON_LMRR(%r17) + mtspr SPR_LMRR, %r6 + ld %r6, PCB_LMON_LMSER(%r17) + mtspr SPR_LMSER, %r6 +restore_tar: + andi. %r0, %r7, FSCR_TAR + beq .L4 + ld %r6, PCB_TAR(%r17) + mtspr SPR_TAR, %r6 /* thread to restore is in r3 */ .L4: addi %r1,%r1,48 mr %r3,%r17 /* Recover PCB ptr */ ld %r12,PCB_CONTEXT(%r3) /* Load the non-volatile GP regs. */ ld %r14,PCB_CONTEXT+2*8(%r3) ld %r15,PCB_CONTEXT+3*8(%r3) ld %r16,PCB_CONTEXT+4*8(%r3) ld %r17,PCB_CONTEXT+5*8(%r3) ld %r18,PCB_CONTEXT+6*8(%r3) ld %r19,PCB_CONTEXT+7*8(%r3) ld %r20,PCB_CONTEXT+8*8(%r3) ld %r21,PCB_CONTEXT+9*8(%r3) ld %r22,PCB_CONTEXT+10*8(%r3) ld %r23,PCB_CONTEXT+11*8(%r3) ld %r24,PCB_CONTEXT+12*8(%r3) ld %r25,PCB_CONTEXT+13*8(%r3) ld %r26,PCB_CONTEXT+14*8(%r3) ld %r27,PCB_CONTEXT+15*8(%r3) ld %r28,PCB_CONTEXT+16*8(%r3) ld %r29,PCB_CONTEXT+17*8(%r3) ld %r30,PCB_CONTEXT+18*8(%r3) ld %r31,PCB_CONTEXT+19*8(%r3) ld %r5,PCB_CR(%r3) /* Load the condition register */ mtcr %r5 ld %r5,PCB_LR(%r3) /* Load the link register */ mtlr %r5 ld %r1,PCB_SP(%r3) /* Load the stack pointer */ ld %r2,PCB_TOC(%r3) /* Load the TOC pointer */ /* * Perform a dummy stdcx. to clear any reservations we may have * inherited from the previous thread. It doesn't matter if the * stdcx succeeds or not. pcb_context[0] can be clobbered. */ stdcx. %r1, 0, %r3 blr /* * savectx(pcb) * Update pcb, saving current processor state */ ENTRY(savectx) std %r12,PCB_CONTEXT(%r3) /* Save the non-volatile GP regs. 
*/ std %r13,PCB_CONTEXT+1*8(%r3) std %r14,PCB_CONTEXT+2*8(%r3) std %r15,PCB_CONTEXT+3*8(%r3) std %r16,PCB_CONTEXT+4*8(%r3) std %r17,PCB_CONTEXT+5*8(%r3) std %r18,PCB_CONTEXT+6*8(%r3) std %r19,PCB_CONTEXT+7*8(%r3) std %r20,PCB_CONTEXT+8*8(%r3) std %r21,PCB_CONTEXT+9*8(%r3) std %r22,PCB_CONTEXT+10*8(%r3) std %r23,PCB_CONTEXT+11*8(%r3) std %r24,PCB_CONTEXT+12*8(%r3) std %r25,PCB_CONTEXT+13*8(%r3) std %r26,PCB_CONTEXT+14*8(%r3) std %r27,PCB_CONTEXT+15*8(%r3) std %r28,PCB_CONTEXT+16*8(%r3) std %r29,PCB_CONTEXT+17*8(%r3) std %r30,PCB_CONTEXT+18*8(%r3) std %r31,PCB_CONTEXT+19*8(%r3) mfcr %r4 /* Save the condition register */ std %r4,PCB_CR(%r3) std %r1,PCB_SP(%r3) /* Save the stack pointer */ std %r2,PCB_TOC(%r3) /* Save the TOC pointer */ mflr %r4 /* Save the link register */ std %r4,PCB_LR(%r3) blr /* * fork_trampoline() * Set up the return from cpu_fork() */ ENTRY_NOPROF(fork_trampoline) ld %r3,CF_FUNC(%r1) ld %r4,CF_ARG0(%r1) ld %r5,CF_ARG1(%r1) stdu %r1,-48(%r1) bl fork_exit nop addi %r1,%r1,48+CF_SIZE-FSP /* Allow 8 bytes in front of trapframe to simulate FRAME_SETUP does when allocating space for a frame pointer/saved LR */ bl trapexit nop Index: user/ngie/bug-237403/sys/powerpc/powerpc/trap.c =================================================================== --- user/ngie/bug-237403/sys/powerpc/powerpc/trap.c (revision 346925) +++ user/ngie/bug-237403/sys/powerpc/powerpc/trap.c (revision 346926) @@ -1,992 +1,1022 @@ /*- * Copyright (C) 1995, 1996 Wolfgang Solfrank. * Copyright (C) 1995, 1996 TooLs GmbH. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by TooLs GmbH. * 4. The name of TooLs GmbH may not be used to endorse or promote products * derived from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY TOOLS GMBH ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL TOOLS GMBH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
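The cpu_switch()/cpu_switchin changes above save and restore the PowerISA facility state (FSCR plus the EBB, LM and TAR registers) only for threads that have PCB_CFSCR set, and the custom DSCR only for PCB_CDSCR. As a reading aid, here is a rough C rendering of the save-side decision logic; it is an illustrative sketch only (the real implementation is the assembly above), it assumes the kernel's SPR_*/FSCR_*/PCB_* definitions used in the diff, and the struct below is a simplified stand-in for the machine-dependent pcb:

/*
 * Illustrative sketch only: mirrors the FSCR-gated save path added to
 * cpu_switch().  'struct pcb_sketch' is a stand-in, not the kernel's pcb.
 */
struct pcb_sketch {
	int		pcb_flags;		/* PCB_CFSCR, PCB_CDSCR, ... */
	uint64_t	pcb_fscr, pcb_dscr, pcb_tar;
	uint64_t	pcb_ebbhr, pcb_ebbrr, pcb_bescr;
	uint64_t	pcb_lmrr, pcb_lmser;
};

static void
save_facility_state(struct pcb_sketch *pcb)
{
	uint64_t fscr;

	if (pcb->pcb_flags & PCB_CFSCR) {
		fscr = mfspr(SPR_FSCR);		/* which facilities are live */
		pcb->pcb_fscr = fscr;
		if (fscr & FSCR_EBB) {		/* event-based branch state */
			pcb->pcb_ebbhr = mfspr(SPR_EBBHR);
			pcb->pcb_ebbrr = mfspr(SPR_EBBRR);
			pcb->pcb_bescr = mfspr(SPR_BESCR);
		}
		if (fscr & FSCR_LM) {		/* load-monitored state */
			pcb->pcb_lmrr = mfspr(SPR_LMRR);
			pcb->pcb_lmser = mfspr(SPR_LMSER);
		}
		if (fscr & FSCR_TAR)		/* target address register */
			pcb->pcb_tar = mfspr(SPR_TAR);
	}
	if (pcb->pcb_flags & PCB_CDSCR)		/* custom DSCR */
		pcb->pcb_dscr = mfspr(SPR_DSCRP);
	/* The restore side in cpu_switchin mirrors this with mtspr(). */
}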
* * $NetBSD: trap.c,v 1.58 2002/03/04 04:07:35 dbj Exp $ */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* Below matches setjmp.S */ #define FAULTBUF_LR 21 #define FAULTBUF_R1 1 #define FAULTBUF_R2 2 #define FAULTBUF_CR 22 #define FAULTBUF_R14 3 #define MOREARGS(sp) ((caddr_t)((uintptr_t)(sp) + \ sizeof(struct callframe) - 3*sizeof(register_t))) /* more args go here */ static void trap_fatal(struct trapframe *frame); static void printtrap(u_int vector, struct trapframe *frame, int isfatal, int user); static int trap_pfault(struct trapframe *frame, int user); static int fix_unaligned(struct thread *td, struct trapframe *frame); static int handle_onfault(struct trapframe *frame); static void syscall(struct trapframe *frame); #if defined(__powerpc64__) && defined(AIM) void handle_kernel_slb_spill(int, register_t, register_t); static int handle_user_slb_spill(pmap_t pm, vm_offset_t addr); extern int n_slbs; static void normalize_inputs(void); #endif extern vm_offset_t __startkernel; #ifdef KDB int db_trap_glue(struct trapframe *); /* Called from trap_subr.S */ #endif struct powerpc_exception { u_int vector; char *name; }; #ifdef KDTRACE_HOOKS #include int (*dtrace_invop_jump_addr)(struct trapframe *); #endif static struct powerpc_exception powerpc_exceptions[] = { { EXC_CRIT, "critical input" }, { EXC_RST, "system reset" }, { EXC_MCHK, "machine check" }, { EXC_DSI, "data storage interrupt" }, { EXC_DSE, "data segment exception" }, { EXC_ISI, "instruction storage interrupt" }, { EXC_ISE, "instruction segment exception" }, { EXC_EXI, "external interrupt" }, { EXC_ALI, "alignment" }, { EXC_PGM, "program" }, { EXC_HEA, "hypervisor emulation assistance" }, { EXC_FPU, "floating-point unavailable" }, { EXC_APU, "auxiliary proc unavailable" }, { EXC_DECR, "decrementer" }, { EXC_FIT, "fixed-interval timer" }, { EXC_WDOG, "watchdog timer" }, { EXC_SC, "system call" }, { EXC_TRC, "trace" }, { EXC_FPA, "floating-point assist" }, { EXC_DEBUG, "debug" }, { EXC_PERF, "performance monitoring" }, { EXC_VEC, "altivec unavailable" }, { EXC_VSX, "vsx unavailable" }, { EXC_FAC, "facility unavailable" }, { EXC_ITMISS, "instruction tlb miss" }, { EXC_DLMISS, "data load tlb miss" }, { EXC_DSMISS, "data store tlb miss" }, { EXC_BPT, "instruction breakpoint" }, { EXC_SMI, "system management" }, { EXC_VECAST_G4, "altivec assist" }, { EXC_THRM, "thermal management" }, { EXC_RUNMODETRC, "run mode/trace" }, { EXC_SOFT_PATCH, "soft patch exception" }, { EXC_LAST, NULL } }; #define ESR_BITMASK \ "\20" \ "\040b0\037b1\036b2\035b3\034PIL\033PRR\032PTR\031FP" \ "\030ST\027b9\026DLK\025ILK\024b12\023b13\022BO\021PIE" \ "\020b16\017b17\016b18\015b19\014b20\013b21\012b22\011b23" \ "\010SPE\007EPID\006b26\005b27\004b28\003b29\002b30\001b31" #define MCSR_BITMASK \ "\20" \ "\040MCP\037ICERR\036DCERR\035TLBPERR\034L2MMU_MHIT\033b5\032b6\031b7" \ "\030b8\027b9\026b10\025NMI\024MAV\023MEA\022b14\021IF" \ "\020LD\017ST\016LDG\015b19\014b20\013b21\012b22\011b23" \ "\010b24\007b25\006b26\005b27\004b28\003b29\002TLBSYNC\001BSL2_ERR" #define MSSSR_BITMASK \ "\20" \ "\040b0\037b1\036b2\035b3\034b4\033b5\032b6\031b7" \ "\030b8\027b9\026b10\025b11\024b12\023L2TAG\022L2DAT\021L3TAG" \ "\020L3DAT\017APE\016DPE\015TEA\014b20\013b21\012b22\011b23" \ 
"\010b24\007b25\006b26\005b27\004b28\003b29\002b30\001b31" static const char * trapname(u_int vector) { struct powerpc_exception *pe; for (pe = powerpc_exceptions; pe->vector != EXC_LAST; pe++) { if (pe->vector == vector) return (pe->name); } return ("unknown"); } static inline bool frame_is_trap_inst(struct trapframe *frame) { #ifdef AIM return (frame->exc == EXC_PGM && frame->srr1 & EXC_PGM_TRAP); #else return ((frame->cpu.booke.esr & ESR_PTR) != 0); #endif } void trap(struct trapframe *frame) { struct thread *td; struct proc *p; #ifdef KDTRACE_HOOKS uint32_t inst; #endif int sig, type, user; u_int ucode; ksiginfo_t ksi; register_t fscr; VM_CNT_INC(v_trap); #ifdef KDB if (kdb_active) { kdb_reenter(); return; } #endif td = curthread; p = td->td_proc; type = ucode = frame->exc; sig = 0; user = frame->srr1 & PSL_PR; CTR3(KTR_TRAP, "trap: %s type=%s (%s)", td->td_name, trapname(type), user ? "user" : "kernel"); #ifdef KDTRACE_HOOKS /* * A trap can occur while DTrace executes a probe. Before * executing the probe, DTrace blocks re-scheduling and sets * a flag in its per-cpu flags to indicate that it doesn't * want to fault. On returning from the probe, the no-fault * flag is cleared and finally re-scheduling is enabled. * * If the DTrace kernel module has registered a trap handler, * call it and if it returns non-zero, assume that it has * handled the trap and modified the trap frame so that this * function can return normally. */ if (dtrace_trap_func != NULL && (*dtrace_trap_func)(frame, type) != 0) return; #endif if (user) { td->td_pticks = 0; td->td_frame = frame; if (td->td_cowgen != p->p_cowgen) thread_cow_update(td); /* User Mode Traps */ switch (type) { case EXC_RUNMODETRC: case EXC_TRC: frame->srr1 &= ~PSL_SE; sig = SIGTRAP; ucode = TRAP_TRACE; break; #if defined(__powerpc64__) && defined(AIM) case EXC_ISE: case EXC_DSE: if (handle_user_slb_spill(&p->p_vmspace->vm_pmap, (type == EXC_ISE) ? 
frame->srr0 : frame->dar) != 0){ sig = SIGSEGV; ucode = SEGV_MAPERR; } break; #endif case EXC_DSI: case EXC_ISI: sig = trap_pfault(frame, 1); if (sig == SIGSEGV) ucode = SEGV_MAPERR; break; case EXC_SC: syscall(frame); break; case EXC_FPU: KASSERT((td->td_pcb->pcb_flags & PCB_FPU) != PCB_FPU, ("FPU already enabled for thread")); enable_fpu(td); break; case EXC_VEC: KASSERT((td->td_pcb->pcb_flags & PCB_VEC) != PCB_VEC, ("Altivec already enabled for thread")); enable_vec(td); break; case EXC_VSX: KASSERT((td->td_pcb->pcb_flags & PCB_VSX) != PCB_VSX, ("VSX already enabled for thread")); if (!(td->td_pcb->pcb_flags & PCB_VEC)) enable_vec(td); if (!(td->td_pcb->pcb_flags & PCB_FPU)) save_fpu(td); td->td_pcb->pcb_flags |= PCB_VSX; enable_fpu(td); break; case EXC_FAC: fscr = mfspr(SPR_FSCR); - if ((fscr & FSCR_IC_MASK) == FSCR_IC_HTM) { - CTR0(KTR_TRAP, "Hardware Transactional Memory subsystem disabled"); + switch (fscr & FSCR_IC_MASK) { + case FSCR_IC_HTM: + CTR0(KTR_TRAP, + "Hardware Transactional Memory subsystem disabled"); + sig = SIGILL; + ucode = ILL_ILLOPC; + break; + case FSCR_IC_DSCR: + td->td_pcb->pcb_flags |= PCB_CFSCR | PCB_CDSCR; + fscr |= FSCR_DSCR; + mtspr(SPR_DSCR, 0); + break; + case FSCR_IC_EBB: + td->td_pcb->pcb_flags |= PCB_CFSCR; + fscr |= FSCR_EBB; + mtspr(SPR_EBBHR, 0); + mtspr(SPR_EBBRR, 0); + mtspr(SPR_BESCR, 0); + break; + case FSCR_IC_TAR: + td->td_pcb->pcb_flags |= PCB_CFSCR; + fscr |= FSCR_TAR; + mtspr(SPR_TAR, 0); + break; + case FSCR_IC_LM: + td->td_pcb->pcb_flags |= PCB_CFSCR; + fscr |= FSCR_LM; + mtspr(SPR_LMRR, 0); + mtspr(SPR_LMSER, 0); + break; + default: + sig = SIGILL; + ucode = ILL_ILLOPC; } - sig = SIGILL; - ucode = ILL_ILLOPC; + mtspr(SPR_FSCR, fscr & ~FSCR_IC_MASK); break; case EXC_HEA: sig = SIGILL; ucode = ILL_ILLOPC; break; case EXC_VECAST_E: case EXC_VECAST_G4: case EXC_VECAST_G5: /* * We get a VPU assist exception for IEEE mode * vector operations on denormalized floats. * Emulating this is a giant pain, so for now, * just switch off IEEE mode and treat them as * zero. */ save_vec(td); td->td_pcb->pcb_vec.vscr |= ALTIVEC_VSCR_NJ; enable_vec(td); break; case EXC_ALI: if (fix_unaligned(td, frame) != 0) { sig = SIGBUS; ucode = BUS_ADRALN; } else frame->srr0 += 4; break; case EXC_DEBUG: /* Single stepping */ mtspr(SPR_DBSR, mfspr(SPR_DBSR)); frame->srr1 &= ~PSL_DE; frame->cpu.booke.dbcr0 &= ~(DBCR0_IDM | DBCR0_IC); sig = SIGTRAP; ucode = TRAP_TRACE; break; case EXC_PGM: /* Identify the trap reason */ if (frame_is_trap_inst(frame)) { #ifdef KDTRACE_HOOKS inst = fuword32((const void *)frame->srr0); if (inst == 0x0FFFDDDD && dtrace_pid_probe_ptr != NULL) { (*dtrace_pid_probe_ptr)(frame); break; } #endif sig = SIGTRAP; ucode = TRAP_BRKPT; } else { sig = ppc_instr_emulate(frame, td); if (sig == SIGILL) { if (frame->srr1 & EXC_PGM_PRIV) ucode = ILL_PRVOPC; else if (frame->srr1 & EXC_PGM_ILLEGAL) ucode = ILL_ILLOPC; } else if (sig == SIGFPE) ucode = FPE_FLTINV; /* Punt for now, invalid operation. */ } break; case EXC_MCHK: /* * Note that this may not be recoverable for the user * process, depending on the type of machine check, * but it at least prevents the kernel from dying. */ sig = SIGBUS; ucode = BUS_OBJERR; break; #if defined(__powerpc64__) && defined(AIM) case EXC_SOFT_PATCH: /* * Point to the instruction that generated the exception to execute it again, * and normalize the register values. 
*/ frame->srr0 -= 4; normalize_inputs(); break; #endif default: trap_fatal(frame); } } else { /* Kernel Mode Traps */ KASSERT(cold || td->td_ucred != NULL, ("kernel trap doesn't have ucred")); switch (type) { case EXC_PGM: #ifdef KDTRACE_HOOKS if (frame_is_trap_inst(frame)) { if (*(uint32_t *)frame->srr0 == EXC_DTRACE) { if (dtrace_invop_jump_addr != NULL) { dtrace_invop_jump_addr(frame); return; } } } #endif #ifdef KDB if (db_trap_glue(frame)) return; #endif break; #if defined(__powerpc64__) && defined(AIM) case EXC_DSE: if (td->td_pcb->pcb_cpu.aim.usr_vsid != 0 && (frame->dar & SEGMENT_MASK) == USER_ADDR) { __asm __volatile ("slbmte %0, %1" :: "r"(td->td_pcb->pcb_cpu.aim.usr_vsid), "r"(USER_SLB_SLBE)); return; } break; #endif case EXC_DSI: if (trap_pfault(frame, 0) == 0) return; break; case EXC_MCHK: if (handle_onfault(frame)) return; break; default: break; } trap_fatal(frame); } if (sig != 0) { if (p->p_sysent->sv_transtrap != NULL) sig = (p->p_sysent->sv_transtrap)(sig, type); ksiginfo_init_trap(&ksi); ksi.ksi_signo = sig; ksi.ksi_code = (int) ucode; /* XXX, not POSIX */ ksi.ksi_addr = (void *)frame->srr0; ksi.ksi_trapno = type; trapsignal(td, &ksi); } userret(td, frame); } static void trap_fatal(struct trapframe *frame) { #ifdef KDB bool handled; #endif printtrap(frame->exc, frame, 1, (frame->srr1 & PSL_PR)); #ifdef KDB if (debugger_on_trap) { kdb_why = KDB_WHY_TRAP; handled = kdb_trap(frame->exc, 0, frame); kdb_why = KDB_WHY_UNSET; if (handled) return; } #endif panic("%s trap", trapname(frame->exc)); } static void cpu_printtrap(u_int vector, struct trapframe *frame, int isfatal, int user) { #ifdef AIM uint16_t ver; switch (vector) { case EXC_DSE: case EXC_DSI: case EXC_DTMISS: printf(" dsisr = 0x%lx\n", (u_long)frame->cpu.aim.dsisr); break; case EXC_MCHK: ver = mfpvr() >> 16; if (MPC745X_P(ver)) printf(" msssr0 = 0x%b\n", (int)mfspr(SPR_MSSSR0), MSSSR_BITMASK); break; } #elif defined(BOOKE) vm_paddr_t pa; switch (vector) { case EXC_MCHK: pa = mfspr(SPR_MCARU); pa = (pa << 32) | (u_register_t)mfspr(SPR_MCAR); printf(" mcsr = 0x%b\n", (int)mfspr(SPR_MCSR), MCSR_BITMASK); printf(" mcar = 0x%jx\n", (uintmax_t)pa); } printf(" esr = 0x%b\n", (int)frame->cpu.booke.esr, ESR_BITMASK); #endif } static void printtrap(u_int vector, struct trapframe *frame, int isfatal, int user) { printf("\n"); printf("%s %s trap:\n", isfatal ? "fatal" : "handled", user ? "user" : "kernel"); printf("\n"); printf(" exception = 0x%x (%s)\n", vector, trapname(vector)); switch (vector) { case EXC_DSE: case EXC_DSI: case EXC_DTMISS: case EXC_ALI: printf(" virtual address = 0x%" PRIxPTR "\n", frame->dar); break; case EXC_ISE: case EXC_ISI: case EXC_ITMISS: printf(" virtual address = 0x%" PRIxPTR "\n", frame->srr0); break; case EXC_MCHK: break; } cpu_printtrap(vector, frame, isfatal, user); printf(" srr0 = 0x%" PRIxPTR " (0x%" PRIxPTR ")\n", frame->srr0, frame->srr0 - (register_t)(__startkernel - KERNBASE)); printf(" srr1 = 0x%lx\n", (u_long)frame->srr1); printf(" current msr = 0x%" PRIxPTR "\n", mfmsr()); printf(" lr = 0x%" PRIxPTR " (0x%" PRIxPTR ")\n", frame->lr, frame->lr - (register_t)(__startkernel - KERNBASE)); printf(" frame = %p\n", frame); printf(" curthread = %p\n", curthread); if (curthread != NULL) printf(" pid = %d, comm = %s\n", curthread->td_proc->p_pid, curthread->td_name); printf("\n"); } /* * Handles a fatal fault when we have onfault state to recover. Returns * non-zero if there was onfault recovery state available. 
*/ static int handle_onfault(struct trapframe *frame) { struct thread *td; jmp_buf *fb; td = curthread; fb = td->td_pcb->pcb_onfault; if (fb != NULL) { frame->srr0 = (*fb)->_jb[FAULTBUF_LR]; frame->fixreg[1] = (*fb)->_jb[FAULTBUF_R1]; frame->fixreg[2] = (*fb)->_jb[FAULTBUF_R2]; frame->fixreg[3] = 1; frame->cr = (*fb)->_jb[FAULTBUF_CR]; bcopy(&(*fb)->_jb[FAULTBUF_R14], &frame->fixreg[14], 18 * sizeof(register_t)); td->td_pcb->pcb_onfault = NULL; /* Returns twice, not thrice */ return (1); } return (0); } int cpu_fetch_syscall_args(struct thread *td) { struct proc *p; struct trapframe *frame; struct syscall_args *sa; caddr_t params; size_t argsz; int error, n, i; p = td->td_proc; frame = td->td_frame; sa = &td->td_sa; sa->code = frame->fixreg[0]; params = (caddr_t)(frame->fixreg + FIRSTARG); n = NARGREG; if (sa->code == SYS_syscall) { /* * code is first argument, * followed by actual args. */ sa->code = *(register_t *) params; params += sizeof(register_t); n -= 1; } else if (sa->code == SYS___syscall) { /* * Like syscall, but code is a quad, * so as to maintain quad alignment * for the rest of the args. */ if (SV_PROC_FLAG(p, SV_ILP32)) { params += sizeof(register_t); sa->code = *(register_t *) params; params += sizeof(register_t); n -= 2; } else { sa->code = *(register_t *) params; params += sizeof(register_t); n -= 1; } } if (sa->code >= p->p_sysent->sv_size) sa->callp = &p->p_sysent->sv_table[0]; else sa->callp = &p->p_sysent->sv_table[sa->code]; sa->narg = sa->callp->sy_narg; if (SV_PROC_FLAG(p, SV_ILP32)) { argsz = sizeof(uint32_t); for (i = 0; i < n; i++) sa->args[i] = ((u_register_t *)(params))[i] & 0xffffffff; } else { argsz = sizeof(uint64_t); for (i = 0; i < n; i++) sa->args[i] = ((u_register_t *)(params))[i]; } if (sa->narg > n) error = copyin(MOREARGS(frame->fixreg[1]), sa->args + n, (sa->narg - n) * argsz); else error = 0; #ifdef __powerpc64__ if (SV_PROC_FLAG(p, SV_ILP32) && sa->narg > n) { /* Expand the size of arguments copied from the stack */ for (i = sa->narg; i >= n; i--) sa->args[i] = ((uint32_t *)(&sa->args[n]))[i-n]; } #endif if (error == 0) { td->td_retval[0] = 0; td->td_retval[1] = frame->fixreg[FIRSTARG + 1]; } return (error); } #include "../../kern/subr_syscall.c" void syscall(struct trapframe *frame) { struct thread *td; int error; td = curthread; td->td_frame = frame; #if defined(__powerpc64__) && defined(AIM) /* * Speculatively restore last user SLB segment, which we know is * invalid already, since we are likely to do copyin()/copyout(). */ if (td->td_pcb->pcb_cpu.aim.usr_vsid != 0) __asm __volatile ("slbmte %0, %1; isync" :: "r"(td->td_pcb->pcb_cpu.aim.usr_vsid), "r"(USER_SLB_SLBE)); #endif error = syscallenter(td); syscallret(td, error); } #if defined(__powerpc64__) && defined(AIM) /* Handle kernel SLB faults -- runs in real mode, all seat belts off */ void handle_kernel_slb_spill(int type, register_t dar, register_t srr0) { struct slb *slbcache; uint64_t slbe, slbv; uint64_t esid, addr; int i; addr = (type == EXC_ISE) ? 
srr0 : dar; slbcache = PCPU_GET(aim.slb); esid = (uintptr_t)addr >> ADDR_SR_SHFT; slbe = (esid << SLBE_ESID_SHIFT) | SLBE_VALID; /* See if the hardware flushed this somehow (can happen in LPARs) */ for (i = 0; i < n_slbs; i++) if (slbcache[i].slbe == (slbe | (uint64_t)i)) return; /* Not in the map, needs to actually be added */ slbv = kernel_va_to_slbv(addr); if (slbcache[USER_SLB_SLOT].slbe == 0) { for (i = 0; i < n_slbs; i++) { if (i == USER_SLB_SLOT) continue; if (!(slbcache[i].slbe & SLBE_VALID)) goto fillkernslb; } if (i == n_slbs) slbcache[USER_SLB_SLOT].slbe = 1; } /* Sacrifice a random SLB entry that is not the user entry */ i = mftb() % n_slbs; if (i == USER_SLB_SLOT) i = (i+1) % n_slbs; fillkernslb: /* Write new entry */ slbcache[i].slbv = slbv; slbcache[i].slbe = slbe | (uint64_t)i; /* Trap handler will restore from cache on exit */ } static int handle_user_slb_spill(pmap_t pm, vm_offset_t addr) { struct slb *user_entry; uint64_t esid; int i; if (pm->pm_slb == NULL) return (-1); esid = (uintptr_t)addr >> ADDR_SR_SHFT; PMAP_LOCK(pm); user_entry = user_va_to_slb_entry(pm, addr); if (user_entry == NULL) { /* allocate_vsid auto-spills it */ (void)allocate_user_vsid(pm, esid, 0); } else { /* * Check that another CPU has not already mapped this. * XXX: Per-thread SLB caches would be better. */ for (i = 0; i < pm->pm_slb_len; i++) if (pm->pm_slb[i] == user_entry) break; if (i == pm->pm_slb_len) slb_insert_user(pm, user_entry); } PMAP_UNLOCK(pm); return (0); } #endif static int trap_pfault(struct trapframe *frame, int user) { vm_offset_t eva, va; struct thread *td; struct proc *p; vm_map_t map; vm_prot_t ftype; int rv, is_user; td = curthread; p = td->td_proc; if (frame->exc == EXC_ISI) { eva = frame->srr0; ftype = VM_PROT_EXECUTE; if (frame->srr1 & SRR1_ISI_PFAULT) ftype |= VM_PROT_READ; } else { eva = frame->dar; #ifdef BOOKE if (frame->cpu.booke.esr & ESR_ST) #else if (frame->cpu.aim.dsisr & DSISR_STORE) #endif ftype = VM_PROT_WRITE; else ftype = VM_PROT_READ; } if (user) { KASSERT(p->p_vmspace != NULL, ("trap_pfault: vmspace NULL")); map = &p->p_vmspace->vm_map; } else { rv = pmap_decode_kernel_ptr(eva, &is_user, &eva); if (rv != 0) return (SIGSEGV); if (is_user) map = &p->p_vmspace->vm_map; else map = kernel_map; } va = trunc_page(eva); /* Fault in the page. */ rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL); /* * XXXDTRACE: add dtrace_doubletrap_func here? */ if (rv == KERN_SUCCESS) return (0); if (!user && handle_onfault(frame)) return (0); return (SIGSEGV); } /* * For now, this only deals with the particular unaligned access case * that gcc tends to generate. Eventually it should handle all of the * possibilities that can happen on a 32-bit PowerPC in big-endian mode. */ static int fix_unaligned(struct thread *td, struct trapframe *frame) { struct thread *fputhread; #ifdef __SPE__ uint32_t inst; #endif int indicator, reg; double *fpr; #ifdef __SPE__ indicator = (frame->cpu.booke.esr & (ESR_ST|ESR_SPE)); if (indicator & ESR_SPE) { if (copyin((void *)frame->srr0, &inst, sizeof(inst)) != 0) return (-1); reg = EXC_ALI_SPE_REG(inst); fpr = (double *)td->td_pcb->pcb_vec.vr[reg]; fputhread = PCPU_GET(vecthread); /* Juggle the SPE to ensure that we've initialized * the registers, and that their current state is in * the PCB. 
*/ if (fputhread != td) { if (fputhread) save_vec(fputhread); enable_vec(td); } save_vec(td); if (!(indicator & ESR_ST)) { if (copyin((void *)frame->dar, fpr, sizeof(double)) != 0) return (-1); frame->fixreg[reg] = td->td_pcb->pcb_vec.vr[reg][1]; enable_vec(td); } else { td->td_pcb->pcb_vec.vr[reg][1] = frame->fixreg[reg]; if (copyout(fpr, (void *)frame->dar, sizeof(double)) != 0) return (-1); } return (0); } #else indicator = EXC_ALI_OPCODE_INDICATOR(frame->cpu.aim.dsisr); switch (indicator) { case EXC_ALI_LFD: case EXC_ALI_STFD: reg = EXC_ALI_RST(frame->cpu.aim.dsisr); fpr = &td->td_pcb->pcb_fpu.fpr[reg].fpr; fputhread = PCPU_GET(fputhread); /* Juggle the FPU to ensure that we've initialized * the FPRs, and that their current state is in * the PCB. */ if (fputhread != td) { if (fputhread) save_fpu(fputhread); enable_fpu(td); } save_fpu(td); if (indicator == EXC_ALI_LFD) { if (copyin((void *)frame->dar, fpr, sizeof(double)) != 0) return (-1); enable_fpu(td); } else { if (copyout(fpr, (void *)frame->dar, sizeof(double)) != 0) return (-1); } return (0); break; } #endif return (-1); } #if defined(__powerpc64__) && defined(AIM) #define MSKNSHL(x, m, n) "(((" #x ") & " #m ") << " #n ")" #define MSKNSHR(x, m, n) "(((" #x ") & " #m ") >> " #n ")" /* xvcpsgndp instruction, built in opcode format. * This can be changed to use mnemonic after a toolchain update. */ #define XVCPSGNDP(xt, xa, xb) \ __asm __volatile(".long (" \ MSKNSHL(60, 0x3f, 26) " | " \ MSKNSHL(xt, 0x1f, 21) " | " \ MSKNSHL(xa, 0x1f, 16) " | " \ MSKNSHL(xb, 0x1f, 11) " | " \ MSKNSHL(240, 0xff, 3) " | " \ MSKNSHR(xa, 0x20, 3) " | " \ MSKNSHR(xa, 0x20, 4) " | " \ MSKNSHR(xa, 0x20, 5) ")") /* Macros to normalize 1 or 10 VSX registers */ #define NORM(x) XVCPSGNDP(x, x, x) #define NORM10(x) \ NORM(x ## 0); NORM(x ## 1); NORM(x ## 2); NORM(x ## 3); NORM(x ## 4); \ NORM(x ## 5); NORM(x ## 6); NORM(x ## 7); NORM(x ## 8); NORM(x ## 9) static void normalize_inputs(void) { unsigned long msr; /* enable VSX */ msr = mfmsr(); mtmsr(msr | PSL_VSX); NORM(0); NORM(1); NORM(2); NORM(3); NORM(4); NORM(5); NORM(6); NORM(7); NORM(8); NORM(9); NORM10(1); NORM10(2); NORM10(3); NORM10(4); NORM10(5); NORM(60); NORM(61); NORM(62); NORM(63); /* restore MSR */ mtmsr(msr); } #endif #ifdef KDB int db_trap_glue(struct trapframe *frame) { if (!(frame->srr1 & PSL_PR) && (frame->exc == EXC_TRC || frame->exc == EXC_RUNMODETRC || frame_is_trap_inst(frame) || frame->exc == EXC_BPT || frame->exc == EXC_DEBUG || frame->exc == EXC_DSI)) { int type = frame->exc; /* Ignore DTrace traps. */ if (*(uint32_t *)frame->srr0 == EXC_DTRACE) return (0); if (frame_is_trap_inst(frame)) { type = T_BREAKPOINT; } return (kdb_trap(type, 0, frame)); } return (0); } #endif Index: user/ngie/bug-237403/sys/riscv/riscv/plic.c =================================================================== --- user/ngie/bug-237403/sys/riscv/riscv/plic.c (revision 346925) +++ user/ngie/bug-237403/sys/riscv/riscv/plic.c (revision 346926) @@ -1,272 +1,298 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2018 Ruslan Bukin * All rights reserved. * * This software was developed by SRI International and the University of * Cambridge Computer Laboratory (Department of Computer Science and * Technology) under DARPA contract HR0011-18-C-0016 ("ECATS"), as part of the * DARPA SSITH research programme. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. 
Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include "pic_if.h" #define PLIC_MAX_IRQS 2048 #define PLIC_PRIORITY(n) (0x000000 + (n) * 0x4) #define PLIC_ENABLE(n, h) (0x002000 + (h) * 0x80 + 4 * ((n) / 32)) #define PLIC_THRESHOLD(h) (0x200000 + (h) * 0x1000 + 0x0) #define PLIC_CLAIM(h) (0x200000 + (h) * 0x1000 + 0x4) struct plic_irqsrc { struct intr_irqsrc isrc; u_int irq; }; struct plic_softc { device_t dev; struct resource * intc_res; struct plic_irqsrc isrcs[PLIC_MAX_IRQS]; int ndev; }; #define RD4(sc, reg) \ bus_read_4(sc->intc_res, (reg)) #define WR4(sc, reg, val) \ bus_write_4(sc->intc_res, (reg), (val)) static inline void plic_irq_dispatch(struct plic_softc *sc, u_int irq, struct trapframe *tf) { struct plic_irqsrc *src; src = &sc->isrcs[irq]; if (intr_isrc_dispatch(&src->isrc, tf) != 0) device_printf(sc->dev, "Stray irq %u detected\n", irq); } static int plic_intr(void *arg) { struct plic_softc *sc; struct trapframe *tf; uint32_t pending; uint32_t cpu; sc = arg; cpu = PCPU_GET(cpuid); pending = RD4(sc, PLIC_CLAIM(cpu)); if (pending) { tf = curthread->td_intr_frame; plic_irq_dispatch(sc, pending, tf); WR4(sc, PLIC_CLAIM(cpu), pending); } return (FILTER_HANDLED); } static void plic_disable_intr(device_t dev, struct intr_irqsrc *isrc) { struct plic_softc *sc; struct plic_irqsrc *src; uint32_t reg; uint32_t cpu; sc = device_get_softc(dev); src = (struct plic_irqsrc *)isrc; cpu = PCPU_GET(cpuid); reg = RD4(sc, PLIC_ENABLE(src->irq, cpu)); reg &= ~(1 << (src->irq % 32)); WR4(sc, PLIC_ENABLE(src->irq, cpu), reg); } static void plic_enable_intr(device_t dev, struct intr_irqsrc *isrc) { struct plic_softc *sc; struct plic_irqsrc *src; uint32_t reg; uint32_t cpu; sc = device_get_softc(dev); src = (struct plic_irqsrc *)isrc; WR4(sc, PLIC_PRIORITY(src->irq), 1); cpu = PCPU_GET(cpuid); reg = RD4(sc, PLIC_ENABLE(src->irq, cpu)); reg |= (1 << (src->irq % 32)); WR4(sc, PLIC_ENABLE(src->irq, cpu), reg); } static int plic_map_intr(device_t dev, struct intr_map_data *data, struct intr_irqsrc **isrcp) { struct intr_map_data_fdt *daf; struct plic_softc *sc; sc = device_get_softc(dev); if (data->type != INTR_MAP_DATA_FDT) return (ENOTSUP); daf = (struct intr_map_data_fdt *)data; if (daf->ncells != 1 || daf->cells[0] > sc->ndev) return (EINVAL); *isrcp = &sc->isrcs[daf->cells[0]].isrc; return (0); } static int 
plic_probe(device_t dev) { if (!ofw_bus_status_okay(dev)) return (ENXIO); if (!ofw_bus_is_compatible(dev, "riscv,plic0")) return (ENXIO); device_set_desc(dev, "RISC-V PLIC"); return (BUS_PROBE_DEFAULT); } static int plic_attach(device_t dev) { struct plic_irqsrc *isrcs; struct plic_softc *sc; struct intr_pic *pic; uint32_t irq; const char *name; phandle_t node; phandle_t xref; uint32_t cpu; int error; int rid; sc = device_get_softc(dev); sc->dev = dev; node = ofw_bus_get_node(dev); if ((OF_getencprop(node, "riscv,ndev", &sc->ndev, sizeof(sc->ndev))) < 0) { device_printf(dev, "Error: could not get number of devices\n"); return (ENXIO); } if (sc->ndev >= PLIC_MAX_IRQS) { device_printf(dev, "Error: invalid ndev (%d)\n", sc->ndev); return (ENXIO); } /* Request memory resources */ rid = 0; sc->intc_res = bus_alloc_resource_any(dev, SYS_RES_MEMORY, &rid, RF_ACTIVE); if (sc->intc_res == NULL) { device_printf(dev, "Error: could not allocate memory resources\n"); return (ENXIO); } isrcs = sc->isrcs; name = device_get_nameunit(sc->dev); cpu = PCPU_GET(cpuid); for (irq = 1; irq <= sc->ndev; irq++) { isrcs[irq].irq = irq; error = intr_isrc_register(&isrcs[irq].isrc, sc->dev, 0, "%s,%u", name, irq); if (error != 0) return (error); WR4(sc, PLIC_PRIORITY(irq), 0); WR4(sc, PLIC_ENABLE(irq, cpu), 0); } WR4(sc, PLIC_THRESHOLD(cpu), 0); xref = OF_xref_from_node(node); pic = intr_pic_register(sc->dev, xref); if (pic == NULL) return (ENXIO); csr_set(sie, SIE_SEIE); return (intr_pic_claim_root(sc->dev, xref, plic_intr, sc, 0)); } +static void +plic_pre_ithread(device_t dev, struct intr_irqsrc *isrc) +{ + struct plic_softc *sc; + struct plic_irqsrc *src; + + sc = device_get_softc(dev); + src = (struct plic_irqsrc *)isrc; + + WR4(sc, PLIC_PRIORITY(src->irq), 0); +} + +static void +plic_post_ithread(device_t dev, struct intr_irqsrc *isrc) +{ + struct plic_softc *sc; + struct plic_irqsrc *src; + + sc = device_get_softc(dev); + src = (struct plic_irqsrc *)isrc; + + WR4(sc, PLIC_PRIORITY(src->irq), 1); +} + static device_method_t plic_methods[] = { DEVMETHOD(device_probe, plic_probe), DEVMETHOD(device_attach, plic_attach), DEVMETHOD(pic_disable_intr, plic_disable_intr), DEVMETHOD(pic_enable_intr, plic_enable_intr), DEVMETHOD(pic_map_intr, plic_map_intr), + DEVMETHOD(pic_pre_ithread, plic_pre_ithread), + DEVMETHOD(pic_post_ithread, plic_post_ithread), DEVMETHOD_END }; static driver_t plic_driver = { "plic", plic_methods, sizeof(struct plic_softc), }; static devclass_t plic_devclass; EARLY_DRIVER_MODULE(plic, simplebus, plic_driver, plic_devclass, 0, 0, BUS_PASS_INTERRUPT + BUS_PASS_ORDER_MIDDLE); Index: user/ngie/bug-237403/sys/sys/param.h =================================================================== --- user/ngie/bug-237403/sys/sys/param.h (revision 346925) +++ user/ngie/bug-237403/sys/sys/param.h (revision 346926) @@ -1,367 +1,367 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1982, 1986, 1989, 1993 * The Regents of the University of California. All rights reserved. * (c) UNIX System Laboratories, Inc. * All or some portions of this file are derived from material licensed * to the University of California by American Telephone and Telegraph * Co. or Unix System Laboratories, Inc. and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. 
Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)param.h 8.3 (Berkeley) 4/4/95 * $FreeBSD$ */ #ifndef _SYS_PARAM_H_ #define _SYS_PARAM_H_ #include #define BSD 199506 /* System version (year & month). */ #define BSD4_3 1 #define BSD4_4 1 /* * __FreeBSD_version numbers are documented in the Porter's Handbook. * If you bump the version for any reason, you should update the documentation * there. * Currently this lives here in the doc/ repository: * * head/en_US.ISO8859-1/books/porters-handbook/versions/chapter.xml * * scheme is: Rxx * 'R' is in the range 0 to 4 if this is a release branch or * X.0-CURRENT before releng/X.0 is created, otherwise 'R' is * in the range 5 to 9. */ #undef __FreeBSD_version -#define __FreeBSD_version 1300020 /* Master, propagated to newvers */ +#define __FreeBSD_version 1300021 /* Master, propagated to newvers */ /* * __FreeBSD_kernel__ indicates that this system uses the kernel of FreeBSD, * which by definition is always true on FreeBSD. This macro is also defined * on other systems that use the kernel of FreeBSD, such as GNU/kFreeBSD. * * It is tempting to use this macro in userland code when we want to enable * kernel-specific routines, and in fact it's fine to do this in code that * is part of FreeBSD itself. However, be aware that as presence of this * macro is still not widespread (e.g. older FreeBSD versions, 3rd party * compilers, etc), it is STRONGLY DISCOURAGED to check for this macro in * external applications without also checking for __FreeBSD__ as an * alternative. */ #undef __FreeBSD_kernel__ #define __FreeBSD_kernel__ #if defined(_KERNEL) || defined(IN_RTLD) #define P_OSREL_SIGWAIT 700000 #define P_OSREL_SIGSEGV 700004 #define P_OSREL_MAP_ANON 800104 #define P_OSREL_MAP_FSTRICT 1100036 #define P_OSREL_SHUTDOWN_ENOTCONN 1100077 #define P_OSREL_MAP_GUARD 1200035 #define P_OSREL_WRFSBASE 1200041 #define P_OSREL_CK_CYLGRP 1200046 #define P_OSREL_VMTOTAL64 1200054 #define P_OSREL_CK_SUPERBLOCK 1300000 #define P_OSREL_CK_INODE 1300005 #define P_OSREL_MAJOR(x) ((x) / 100000) #endif #ifndef LOCORE #include #endif /* * Machine-independent constants (some used in following include files). * Redefined constants are from POSIX 1003.1 limits file. 
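The only substantive change to param.h in this revision is the __FreeBSD_version bump to 1300021. For context, the macro is normally consumed by ports and out-of-tree code as a compile-time gate; a hypothetical consumer-side guard looks like the sketch below (only the version value itself is taken from this diff):

#include <sys/param.h>	/* provides __FreeBSD_version on FreeBSD */

#if defined(__FreeBSD_version) && __FreeBSD_version >= 1300021
	/* Build against interfaces available as of this bump. */
#else
	/* Fall back to the older interfaces. */
#endif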
* * MAXCOMLEN should be >= sizeof(ac_comm) (see ) */ #include #define MAXCOMLEN 19 /* max command name remembered */ #define MAXINTERP PATH_MAX /* max interpreter file name length */ #define MAXLOGNAME 33 /* max login name length (incl. NUL) */ #define MAXUPRC CHILD_MAX /* max simultaneous processes */ #define NCARGS ARG_MAX /* max bytes for an exec function */ #define NGROUPS (NGROUPS_MAX+1) /* max number groups */ #define NOFILE OPEN_MAX /* max open files per process */ #define NOGROUP 65535 /* marker for empty group set member */ #define MAXHOSTNAMELEN 256 /* max hostname size */ #define SPECNAMELEN 255 /* max length of devicename */ /* More types and definitions used throughout the kernel. */ #ifdef _KERNEL #include #include #ifndef LOCORE #include #include #endif #ifndef FALSE #define FALSE 0 #endif #ifndef TRUE #define TRUE 1 #endif #endif #ifndef _KERNEL /* Signals. */ #include #endif /* Machine type dependent parameters. */ #include #ifndef _KERNEL #include #endif #ifndef DEV_BSHIFT #define DEV_BSHIFT 9 /* log2(DEV_BSIZE) */ #endif #define DEV_BSIZE (1<>PAGE_SHIFT) #endif /* * btodb() is messy and perhaps slow because `bytes' may be an off_t. We * want to shift an unsigned type to avoid sign extension and we don't * want to widen `bytes' unnecessarily. Assume that the result fits in * a daddr_t. */ #ifndef btodb #define btodb(bytes) /* calculates (bytes / DEV_BSIZE) */ \ (sizeof (bytes) > sizeof(long) \ ? (daddr_t)((unsigned long long)(bytes) >> DEV_BSHIFT) \ : (daddr_t)((unsigned long)(bytes) >> DEV_BSHIFT)) #endif #ifndef dbtob #define dbtob(db) /* calculates (db * DEV_BSIZE) */ \ ((off_t)(db) << DEV_BSHIFT) #endif #define PRIMASK 0x0ff #define PCATCH 0x100 /* OR'd with pri for tsleep to check signals */ #define PDROP 0x200 /* OR'd with pri to stop re-entry of interlock mutex */ #define NZERO 0 /* default "nice" */ #define NBBY 8 /* number of bits in a byte */ #define NBPW sizeof(int) /* number of bytes per word (integer) */ #define CMASK 022 /* default file mask: S_IWGRP|S_IWOTH */ #define NODEV (dev_t)(-1) /* non-existent device */ /* * File system parameters and macros. * * MAXBSIZE - Filesystems are made out of blocks of at most MAXBSIZE bytes * per block. MAXBSIZE may be made larger without effecting * any existing filesystems as long as it does not exceed MAXPHYS, * and may be made smaller at the risk of not being able to use * filesystems which require a block size exceeding MAXBSIZE. * * MAXBCACHEBUF - Maximum size of a buffer in the buffer cache. This must * be >= MAXBSIZE and can be set differently for different * architectures by defining it in . * Making this larger allows NFS to do larger reads/writes. * * BKVASIZE - Nominal buffer space per buffer, in bytes. BKVASIZE is the * minimum KVM memory reservation the kernel is willing to make. * Filesystems can of course request smaller chunks. Actual * backing memory uses a chunk size of a page (PAGE_SIZE). * The default value here can be overridden on a per-architecture * basis by defining it in . * * If you make BKVASIZE too small you risk seriously fragmenting * the buffer KVM map which may slow things down a bit. If you * make it too big the kernel will not be able to optimally use * the KVM memory reserved for the buffer cache and will wind * up with too-few buffers. * * The default is 16384, roughly 2x the block size used by a * normal UFS filesystem. 
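As a quick sanity check of the byte/disk-block conversions defined above: with the default DEV_BSHIFT of 9 (DEV_BSIZE = 512), btodb() is a right shift by 9 and dbtob() a left shift by 9. The standalone sketch below just evaluates those shifts with plain integer types; it does not use the kernel's daddr_t/off_t machinery:

#include <stdio.h>

int
main(void)
{
	unsigned long bytes = 8192;		/* 16 disk blocks of 512 bytes */

	/* btodb(8192) with DEV_BSHIFT == 9: 8192 >> 9 == 16 */
	printf("btodb(%lu) = %lu\n", bytes, bytes >> 9);
	/* dbtob(16): 16 << 9 == 8192 */
	printf("dbtob(16) = %lu\n", 16UL << 9);
	return (0);
}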
*/ #define MAXBSIZE 65536 /* must be power of 2 */ #ifndef MAXBCACHEBUF #define MAXBCACHEBUF MAXBSIZE /* must be a power of 2 >= MAXBSIZE */ #endif #ifndef BKVASIZE #define BKVASIZE 16384 /* must be power of 2 */ #endif #define BKVAMASK (BKVASIZE-1) /* * MAXPATHLEN defines the longest permissible path length after expanding * symbolic links. It is used to allocate a temporary buffer from the buffer * pool in which to do the name expansion, hence should be a power of two, * and must be less than or equal to MAXBSIZE. MAXSYMLINKS defines the * maximum number of symbolic links that may be expanded in a path name. * It should be set high enough to allow all legitimate uses, but halt * infinite loops reasonably quickly. */ #define MAXPATHLEN PATH_MAX #define MAXSYMLINKS 32 /* Bit map related macros. */ #define setbit(a,i) (((unsigned char *)(a))[(i)/NBBY] |= 1<<((i)%NBBY)) #define clrbit(a,i) (((unsigned char *)(a))[(i)/NBBY] &= ~(1<<((i)%NBBY))) #define isset(a,i) \ (((const unsigned char *)(a))[(i)/NBBY] & (1<<((i)%NBBY))) #define isclr(a,i) \ ((((const unsigned char *)(a))[(i)/NBBY] & (1<<((i)%NBBY))) == 0) /* Macros for counting and rounding. */ #ifndef howmany #define howmany(x, y) (((x)+((y)-1))/(y)) #endif #define nitems(x) (sizeof((x)) / sizeof((x)[0])) #define rounddown(x, y) (((x)/(y))*(y)) #define rounddown2(x, y) ((x)&(~((y)-1))) /* if y is power of two */ #define roundup(x, y) ((((x)+((y)-1))/(y))*(y)) /* to any y */ #define roundup2(x, y) (((x)+((y)-1))&(~((y)-1))) /* if y is powers of two */ #define powerof2(x) ((((x)-1)&(x))==0) /* Macros for min/max. */ #define MIN(a,b) (((a)<(b))?(a):(b)) #define MAX(a,b) (((a)>(b))?(a):(b)) #ifdef _KERNEL /* * Basic byte order function prototypes for non-inline functions. */ #ifndef LOCORE #ifndef _BYTEORDER_PROTOTYPED #define _BYTEORDER_PROTOTYPED __BEGIN_DECLS __uint32_t htonl(__uint32_t); __uint16_t htons(__uint16_t); __uint32_t ntohl(__uint32_t); __uint16_t ntohs(__uint16_t); __END_DECLS #endif #endif #ifndef _BYTEORDER_FUNC_DEFINED #define _BYTEORDER_FUNC_DEFINED #define htonl(x) __htonl(x) #define htons(x) __htons(x) #define ntohl(x) __ntohl(x) #define ntohs(x) __ntohs(x) #endif /* !_BYTEORDER_FUNC_DEFINED */ #endif /* _KERNEL */ /* * Scale factor for scaled integers used to count %cpu time and load avgs. * * The number of CPU `tick's that map to a unique `%age' can be expressed * by the formula (1 / (2 ^ (FSHIFT - 11))). The maximum load average that * can be calculated (assuming 32 bits) can be closely approximated using * the formula (2 ^ (2 * (16 - FSHIFT))) for (FSHIFT < 15). * * For the scheduler to maintain a 1:1 mapping of CPU `tick' to `%age', * FSHIFT must be at least 11; this gives us a maximum load avg of ~1024. */ #define FSHIFT 11 /* bits to right of fixed binary point */ #define FSCALE (1<> (PAGE_SHIFT - DEV_BSHIFT)) #define ctodb(db) /* calculates pages to devblks */ \ ((db) << (PAGE_SHIFT - DEV_BSHIFT)) /* * Old spelling of __containerof(). */ #define member2struct(s, m, x) \ ((struct s *)(void *)((char *)(x) - offsetof(struct s, m))) /* * Access a variable length array that has been declared as a fixed * length array. 
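The counting/rounding macros and the FSHIFT fixed-point scale documented above can be checked with ordinary arithmetic. The standalone sketch below re-declares the macros locally so it compiles on its own; FSCALE is assumed here to be (1 << FSHIFT), consistent with the load-average formulas in the comment:

#include <assert.h>
#include <stdio.h>

/* Local copies of the macros above, so this sketch is self-contained. */
#define howmany(x, y)	(((x) + ((y) - 1)) / (y))
#define roundup2(x, y)	(((x) + ((y) - 1)) & (~((y) - 1)))	/* y: power of 2 */
#define powerof2(x)	((((x) - 1) & (x)) == 0)
#define FSHIFT		11
#define FSCALE		(1 << FSHIFT)		/* assumed: 2048 */

int
main(void)
{
	assert(howmany(1000, 512) == 2);	/* 1000 bytes need two 512-byte blocks */
	assert(roundup2(1000, 512) == 1024);	/* next multiple of a power of two */
	assert(powerof2(4096));

	/* Fixed-point load average: 0.50 is stored as FSCALE / 2 == 1024. */
	printf("0.50 * FSCALE = %d\n", FSCALE / 2);
	return (0);
}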
*/ #define __PAST_END(array, offset) (((__typeof__(*(array)) *)(array))[offset]) #endif /* _SYS_PARAM_H_ */ Index: user/ngie/bug-237403/sys/sys/proc.h =================================================================== --- user/ngie/bug-237403/sys/sys/proc.h (revision 346925) +++ user/ngie/bug-237403/sys/sys/proc.h (revision 346926) @@ -1,1192 +1,1191 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1986, 1989, 1991, 1993 * The Regents of the University of California. All rights reserved. * (c) UNIX System Laboratories, Inc. * All or some portions of this file are derived from material licensed * to the University of California by American Telephone and Telegraph * Co. or Unix System Laboratories, Inc. and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)proc.h 8.15 (Berkeley) 5/19/95 * $FreeBSD$ */ #ifndef _SYS_PROC_H_ #define _SYS_PROC_H_ #include /* For struct callout. */ #include /* For struct klist. */ #include #ifndef _KERNEL #include #endif #include #include #include #include #include #include #include /* XXX. */ #include #include #include #include #include #ifndef _KERNEL #include /* For structs itimerval, timeval. */ #else #include #include #endif #include #include #include #include #include /* Machine-dependent proc substruct. */ #ifdef _KERNEL #include #endif /* * One structure allocated per session. * * List of locks * (m) locked by s_mtx mtx * (e) locked by proctree_lock sx * (c) const until freeing */ struct session { u_int s_count; /* Ref cnt; pgrps in session - atomic. */ struct proc *s_leader; /* (m + e) Session leader. */ struct vnode *s_ttyvp; /* (m) Vnode of controlling tty. */ struct cdev_priv *s_ttydp; /* (m) Device of controlling tty. */ struct tty *s_ttyp; /* (e) Controlling tty. */ pid_t s_sid; /* (c) Session ID. */ /* (m) Setlogin() name: */ char s_login[roundup(MAXLOGNAME, sizeof(long))]; struct mtx s_mtx; /* Mutex to protect members. */ }; /* * One structure allocated per process group. 
* * List of locks * (m) locked by pg_mtx mtx * (e) locked by proctree_lock sx * (c) const until freeing */ struct pgrp { LIST_ENTRY(pgrp) pg_hash; /* (e) Hash chain. */ LIST_HEAD(, proc) pg_members; /* (m + e) Pointer to pgrp members. */ struct session *pg_session; /* (c) Pointer to session. */ struct sigiolst pg_sigiolst; /* (m) List of sigio sources. */ pid_t pg_id; /* (c) Process group id. */ int pg_jobc; /* (m) Job control process count. */ struct mtx pg_mtx; /* Mutex to protect members */ }; /* * pargs, used to hold a copy of the command line, if it had a sane length. */ struct pargs { u_int ar_ref; /* Reference count. */ u_int ar_length; /* Length. */ u_char ar_args[1]; /* Arguments. */ }; /*- * Description of a process. * * This structure contains the information needed to manage a thread of * control, known in UN*X as a process; it has references to substructures * containing descriptions of things that the process uses, but may share * with related processes. The process structure and the substructures * are always addressable except for those marked "(CPU)" below, * which might be addressable only on a processor on which the process * is running. * * Below is a key of locks used to protect each member of struct proc. The * lock is indicated by a reference to a specific character in parens in the * associated comment. * * - not yet protected * a - only touched by curproc or parent during fork/wait * b - created at fork, never changes * (exception aiods switch vmspaces, but they are also * marked 'P_SYSTEM' so hopefully it will be left alone) * c - locked by proc mtx * d - locked by allproc_lock lock * e - locked by proctree_lock lock * f - session mtx * g - process group mtx * h - callout_lock mtx * i - by curproc or the master session mtx * j - locked by proc slock * k - only accessed by curthread * k*- only accessed by curthread and from an interrupt * kx- only accessed by curthread and by debugger * l - the attaching proc or attaching proc parent * m - Giant * n - not locked, lazy * o - ktrace lock * q - td_contested lock * r - p_peers lock * s - see sleepq_switch(), sleeping_on_old_rtc(), and sleep(9) * t - thread lock * u - process stat lock * w - process timer lock * x - created at fork, only changes during single threading in exec * y - created at first aio, doesn't change until exit or exec at which * point we are single-threaded and only curthread changes it * z - zombie threads lock * * If the locking key specifies two identifiers (for example, p_pptr) then * either lock is sufficient for read access, but both locks must be held * for write access. */ struct cpuset; struct filecaps; struct filemon; struct kaioinfo; struct kaudit_record; struct kcov_info; struct kdtrace_proc; struct kdtrace_thread; struct mqueue_notifier; struct nlminfo; struct p_sched; struct proc; struct procdesc; struct racct; struct sbuf; struct sleepqueue; struct socket; struct syscall_args; struct td_sched; struct thread; struct trapframe; struct turnstile; struct vm_map; struct vm_map_entry; struct epoch_tracker; /* * XXX: Does this belong in resource.h or resourcevar.h instead? * Resource usage extension. The times in rusage structs in the kernel are * never up to date. The actual times are kept as runtimes and tick counts * (with control info in the "previous" times), and are converted when * userland asks for rusage info. Backwards compatibility prevents putting * this directly in the user-visible rusage struct. * * Locking for p_rux: (cu) means (u) for p_rux and (c) for p_crux. 
* Locking for td_rux: (t) for all fields. */ struct rusage_ext { uint64_t rux_runtime; /* (cu) Real time. */ uint64_t rux_uticks; /* (cu) Statclock hits in user mode. */ uint64_t rux_sticks; /* (cu) Statclock hits in sys mode. */ uint64_t rux_iticks; /* (cu) Statclock hits in intr mode. */ uint64_t rux_uu; /* (c) Previous user time in usec. */ uint64_t rux_su; /* (c) Previous sys time in usec. */ uint64_t rux_tu; /* (c) Previous total time in usec. */ }; /* * Kernel runnable context (thread). * This is what is put to sleep and reactivated. * Thread context. Processes may have multiple threads. */ struct thread { struct mtx *volatile td_lock; /* replaces sched lock */ struct proc *td_proc; /* (*) Associated process. */ TAILQ_ENTRY(thread) td_plist; /* (*) All threads in this proc. */ TAILQ_ENTRY(thread) td_runq; /* (t) Run queue. */ TAILQ_ENTRY(thread) td_slpq; /* (t) Sleep queue. */ TAILQ_ENTRY(thread) td_lockq; /* (t) Lock queue. */ LIST_ENTRY(thread) td_hash; /* (d) Hash chain. */ struct cpuset *td_cpuset; /* (t) CPU affinity mask. */ struct domainset_ref td_domain; /* (a) NUMA policy */ struct seltd *td_sel; /* Select queue/channel. */ struct sleepqueue *td_sleepqueue; /* (k) Associated sleep queue. */ struct turnstile *td_turnstile; /* (k) Associated turnstile. */ struct rl_q_entry *td_rlqe; /* (k) Associated range lock entry. */ struct umtx_q *td_umtxq; /* (c?) Link for when we're blocked. */ lwpid_t td_tid; /* (b) Thread ID. */ sigqueue_t td_sigqueue; /* (c) Sigs arrived, not delivered. */ #define td_siglist td_sigqueue.sq_signals u_char td_lend_user_pri; /* (t) Lend user pri. */ /* Cleared during fork1() */ #define td_startzero td_epochnest u_char td_epochnest; /* (k) Epoch nest counter. */ int td_flags; /* (t) TDF_* flags. */ int td_inhibitors; /* (t) Why can not run. */ int td_pflags; /* (k) Private thread (TDP_*) flags. */ int td_dupfd; /* (k) Ret value from fdopen. XXX */ int td_sqqueue; /* (t) Sleepqueue queue blocked on. */ void *td_wchan; /* (t) Sleep address. */ const char *td_wmesg; /* (t) Reason for sleep. */ volatile u_char td_owepreempt; /* (k*) Preempt on last critical_exit */ u_char td_tsqueue; /* (t) Turnstile queue blocked on. */ short td_locks; /* (k) Debug: count of non-spin locks */ short td_rw_rlocks; /* (k) Count of rwlock read locks. */ short td_sx_slocks; /* (k) Count of sx shared locks. */ short td_lk_slocks; /* (k) Count of lockmgr shared locks. */ short td_stopsched; /* (k) Scheduler stopped. */ struct turnstile *td_blocked; /* (t) Lock thread is blocked on. */ const char *td_lockname; /* (t) Name of lock blocked on. */ LIST_HEAD(, turnstile) td_contested; /* (q) Contested locks. */ struct lock_list_entry *td_sleeplocks; /* (k) Held sleep locks. */ int td_intr_nesting_level; /* (k) Interrupt recursion. */ int td_pinned; /* (k) Temporary cpu pin count. */ struct ucred *td_ucred; /* (k) Reference to credentials. */ struct plimit *td_limit; /* (k) Resource limits. */ int td_slptick; /* (t) Time at sleep. */ int td_blktick; /* (t) Time spent blocked. */ int td_swvoltick; /* (t) Time at last SW_VOL switch. */ int td_swinvoltick; /* (t) Time at last SW_INVOL switch. */ u_int td_cow; /* (*) Number of copy-on-write faults */ struct rusage td_ru; /* (t) rusage information. */ struct rusage_ext td_rux; /* (t) Internal rusage information. */ uint64_t td_incruntime; /* (t) Cpu ticks to transfer to proc. */ uint64_t td_runtime; /* (t) How many cpu ticks we've run. 
*/ u_int td_pticks; /* (t) Statclock hits for profiling */ u_int td_sticks; /* (t) Statclock hits in system mode. */ u_int td_iticks; /* (t) Statclock hits in intr mode. */ u_int td_uticks; /* (t) Statclock hits in user mode. */ int td_intrval; /* (t) Return value for sleepq. */ sigset_t td_oldsigmask; /* (k) Saved mask from pre sigpause. */ volatile u_int td_generation; /* (k) For detection of preemption */ stack_t td_sigstk; /* (k) Stack ptr and on-stack flag. */ int td_xsig; /* (c) Signal for ptrace */ u_long td_profil_addr; /* (k) Temporary addr until AST. */ u_int td_profil_ticks; /* (k) Temporary ticks until AST. */ char td_name[MAXCOMLEN + 1]; /* (*) Thread name. */ struct file *td_fpop; /* (k) file referencing cdev under op */ int td_dbgflags; /* (c) Userland debugger flags */ siginfo_t td_si; /* (c) For debugger or core file */ int td_ng_outbound; /* (k) Thread entered ng from above. */ struct osd td_osd; /* (k) Object specific data. */ struct vm_map_entry *td_map_def_user; /* (k) Deferred entries. */ pid_t td_dbg_forked; /* (c) Child pid for debugger. */ u_int td_vp_reserv; /* (k) Count of reserved vnodes. */ int td_no_sleeping; /* (k) Sleeping disabled count. */ void *td_su; /* (k) FFS SU private */ sbintime_t td_sleeptimo; /* (t) Sleep timeout. */ int td_rtcgen; /* (s) rtc_generation of abs. sleep */ size_t td_vslock_sz; /* (k) amount of vslock-ed space */ struct kcov_info *td_kcov_info; /* (*) Kernel code coverage data */ #define td_endzero td_sigmask /* Copied during fork1() or create_thread(). */ #define td_startcopy td_endzero sigset_t td_sigmask; /* (c) Current signal mask. */ u_char td_rqindex; /* (t) Run queue index. */ u_char td_base_pri; /* (t) Thread base kernel priority. */ u_char td_priority; /* (t) Thread active priority. */ u_char td_pri_class; /* (t) Scheduling class. */ u_char td_user_pri; /* (t) User pri from estcpu and nice. */ u_char td_base_user_pri; /* (t) Base user pri */ u_char td_pre_epoch_prio; /* (k) User pri on entry to epoch */ uintptr_t td_rb_list; /* (k) Robust list head. */ uintptr_t td_rbp_list; /* (k) Robust priv list head. */ uintptr_t td_rb_inact; /* (k) Current in-action mutex loc. */ struct syscall_args td_sa; /* (kx) Syscall parameters. Copied on fork for child tracing. */ #define td_endcopy td_pcb /* * Fields that must be manually set in fork1() or create_thread() * or already have been set in the allocator, constructor, etc. */ struct pcb *td_pcb; /* (k) Kernel VA of pcb and kstack. */ enum td_states { TDS_INACTIVE = 0x0, TDS_INHIBITED, TDS_CAN_RUN, TDS_RUNQ, TDS_RUNNING } td_state; /* (t) thread state */ union { register_t tdu_retval[2]; off_t tdu_off; } td_uretoff; /* (k) Syscall aux returns. */ #define td_retval td_uretoff.tdu_retval u_int td_cowgen; /* (k) Generation of COW pointers. */ /* LP64 hole */ struct callout td_slpcallout; /* (h) Callout for sleep. */ struct trapframe *td_frame; /* (k) */ struct vm_object *td_kstack_obj;/* (a) Kstack object. */ vm_offset_t td_kstack; /* (a) Kernel VA of kstack. */ int td_kstack_pages; /* (a) Size of the kstack. */ volatile u_int td_critnest; /* (k*) Critical section nest level. */ struct mdthread td_md; /* (k) Any machine-dependent fields. */ struct kaudit_record *td_ar; /* (k) Active audit record, if any. */ struct lpohead td_lprof[2]; /* (a) lock profiling objects. */ struct kdtrace_thread *td_dtrace; /* (*) DTrace-specific data. */ int td_errno; /* Error returned by last syscall. */ /* LP64 hole */ struct vnet *td_vnet; /* (k) Effective vnet. 
*/ const char *td_vnet_lpush; /* (k) Debugging vnet push / pop. */ struct trapframe *td_intr_frame;/* (k) Frame of the current irq */ struct proc *td_rfppwait_p; /* (k) The vforked child */ struct vm_page **td_ma; /* (k) uio pages held */ int td_ma_cnt; /* (k) size of *td_ma */ /* LP64 hole */ void *td_emuldata; /* Emulator state data */ int td_lastcpu; /* (t) Last cpu we were on. */ int td_oncpu; /* (t) Which cpu we are on. */ void *td_lkpi_task; /* LinuxKPI task struct pointer */ struct epoch_tracker *td_et; /* (k) compat KPI spare tracker */ int td_pmcpend; }; struct thread0_storage { struct thread t0st_thread; uint64_t t0st_sched[10]; }; struct mtx *thread_lock_block(struct thread *); void thread_lock_unblock(struct thread *, struct mtx *); void thread_lock_set(struct thread *, struct mtx *); #define THREAD_LOCK_ASSERT(td, type) \ do { \ struct mtx *__m = (td)->td_lock; \ if (__m != &blocked_lock) \ mtx_assert(__m, (type)); \ } while (0) #ifdef INVARIANTS #define THREAD_LOCKPTR_ASSERT(td, lock) \ do { \ struct mtx *__m = (td)->td_lock; \ KASSERT((__m == &blocked_lock || __m == (lock)), \ ("Thread %p lock %p does not match %p", td, __m, (lock))); \ } while (0) #define TD_LOCKS_INC(td) ((td)->td_locks++) #define TD_LOCKS_DEC(td) do { \ KASSERT(SCHEDULER_STOPPED_TD(td) || (td)->td_locks > 0, \ ("thread %p owns no locks", (td))); \ (td)->td_locks--; \ } while (0) #else #define THREAD_LOCKPTR_ASSERT(td, lock) #define TD_LOCKS_INC(td) #define TD_LOCKS_DEC(td) #endif /* * Flags kept in td_flags: * To change these you MUST have the scheduler lock. */ #define TDF_BORROWING 0x00000001 /* Thread is borrowing pri from another. */ #define TDF_INPANIC 0x00000002 /* Caused a panic, let it drive crashdump. */ #define TDF_INMEM 0x00000004 /* Thread's stack is in memory. */ #define TDF_SINTR 0x00000008 /* Sleep is interruptible. */ #define TDF_TIMEOUT 0x00000010 /* Timing out during sleep. */ #define TDF_IDLETD 0x00000020 /* This is a per-CPU idle thread. */ #define TDF_CANSWAP 0x00000040 /* Thread can be swapped. */ #define TDF_SLEEPABORT 0x00000080 /* sleepq_abort was called. */ #define TDF_KTH_SUSP 0x00000100 /* kthread is suspended */ #define TDF_ALLPROCSUSP 0x00000200 /* suspended by SINGLE_ALLPROC */ #define TDF_BOUNDARY 0x00000400 /* Thread suspended at user boundary */ #define TDF_ASTPENDING 0x00000800 /* Thread has some asynchronous events. */ #define TDF_UNUSED12 0x00001000 /* --available-- */ #define TDF_SBDRY 0x00002000 /* Stop only on usermode boundary. */ #define TDF_UPIBLOCKED 0x00004000 /* Thread blocked on user PI mutex. */ #define TDF_NEEDSUSPCHK 0x00008000 /* Thread may need to suspend. */ #define TDF_NEEDRESCHED 0x00010000 /* Thread needs to yield. */ #define TDF_NEEDSIGCHK 0x00020000 /* Thread may need signal delivery. */ #define TDF_NOLOAD 0x00040000 /* Ignore during load avg calculations. */ #define TDF_SERESTART 0x00080000 /* ERESTART on stop attempts. */ #define TDF_THRWAKEUP 0x00100000 /* Libthr thread must not suspend itself. */ #define TDF_SEINTR 0x00200000 /* EINTR on stop attempts. */ #define TDF_SWAPINREQ 0x00400000 /* Swapin request due to wakeup. */ #define TDF_UNUSED23 0x00800000 /* --available-- */ #define TDF_SCHED0 0x01000000 /* Reserved for scheduler private use */ #define TDF_SCHED1 0x02000000 /* Reserved for scheduler private use */ #define TDF_SCHED2 0x04000000 /* Reserved for scheduler private use */ #define TDF_SCHED3 0x08000000 /* Reserved for scheduler private use */ #define TDF_ALRMPEND 0x10000000 /* Pending SIGVTALRM needs to be posted. 
*/ #define TDF_PROFPEND 0x20000000 /* Pending SIGPROF needs to be posted. */ #define TDF_MACPEND 0x40000000 /* AST-based MAC event pending. */ /* Userland debug flags */ #define TDB_SUSPEND 0x00000001 /* Thread is suspended by debugger */ #define TDB_XSIG 0x00000002 /* Thread is exchanging signal under trace */ #define TDB_USERWR 0x00000004 /* Debugger modified memory or registers */ #define TDB_SCE 0x00000008 /* Thread performs syscall enter */ #define TDB_SCX 0x00000010 /* Thread performs syscall exit */ #define TDB_EXEC 0x00000020 /* TDB_SCX from exec(2) family */ #define TDB_FORK 0x00000040 /* TDB_SCX from fork(2) that created new process */ #define TDB_STOPATFORK 0x00000080 /* Stop at the return from fork (child only) */ #define TDB_CHILD 0x00000100 /* New child indicator for ptrace() */ #define TDB_BORN 0x00000200 /* New LWP indicator for ptrace() */ #define TDB_EXIT 0x00000400 /* Exiting LWP indicator for ptrace() */ #define TDB_VFORK 0x00000800 /* vfork indicator for ptrace() */ #define TDB_FSTP 0x00001000 /* The thread is PT_ATTACH leader */ #define TDB_STEP 0x00002000 /* (x86) PSL_T set for PT_STEP */ /* * "Private" flags kept in td_pflags: * These are only written by curthread and thus need no locking. */ #define TDP_OLDMASK 0x00000001 /* Need to restore mask after suspend. */ #define TDP_INKTR 0x00000002 /* Thread is currently in KTR code. */ #define TDP_INKTRACE 0x00000004 /* Thread is currently in KTRACE code. */ #define TDP_BUFNEED 0x00000008 /* Do not recurse into the buf flush */ #define TDP_COWINPROGRESS 0x00000010 /* Snapshot copy-on-write in progress. */ #define TDP_ALTSTACK 0x00000020 /* Have alternate signal stack. */ #define TDP_DEADLKTREAT 0x00000040 /* Lock acquisition - deadlock treatment. */ #define TDP_NOFAULTING 0x00000080 /* Do not handle page faults. */ #define TDP_UNUSED9 0x00000100 /* --available-- */ #define TDP_OWEUPC 0x00000200 /* Call addupc() at next AST. */ #define TDP_ITHREAD 0x00000400 /* Thread is an interrupt thread. */ #define TDP_SYNCIO 0x00000800 /* Local override, disable async i/o. */ #define TDP_SCHED1 0x00001000 /* Reserved for scheduler private use */ #define TDP_SCHED2 0x00002000 /* Reserved for scheduler private use */ #define TDP_SCHED3 0x00004000 /* Reserved for scheduler private use */ #define TDP_SCHED4 0x00008000 /* Reserved for scheduler private use */ #define TDP_GEOM 0x00010000 /* Settle GEOM before finishing syscall */ #define TDP_SOFTDEP 0x00020000 /* Stuck processing softdep worklist */ #define TDP_NORUNNINGBUF 0x00040000 /* Ignore runningbufspace check */ #define TDP_WAKEUP 0x00080000 /* Don't sleep in umtx cond_wait */ #define TDP_INBDFLUSH 0x00100000 /* Already in BO_BDFLUSH, do not recurse */ #define TDP_KTHREAD 0x00200000 /* This is an official kernel thread */ #define TDP_CALLCHAIN 0x00400000 /* Capture thread's callchain */ #define TDP_IGNSUSP 0x00800000 /* Permission to ignore the MNTK_SUSPEND* */ #define TDP_AUDITREC 0x01000000 /* Audit record pending on thread */ #define TDP_RFPPWAIT 0x02000000 /* Handle RFPPWAIT on syscall exit */ #define TDP_RESETSPUR 0x04000000 /* Reset spurious page fault history. */ #define TDP_NERRNO 0x08000000 /* Last errno is already in td_errno */ #define TDP_UIOHELD 0x10000000 /* Current uio has pages held in td_ma */ #define TDP_FORKING 0x20000000 /* Thread is being created through fork() */ #define TDP_EXECVMSPC 0x40000000 /* Execve destroyed old vmspace */ /* * Reasons that the current thread can not be run yet. * More than one may apply. 
*/ #define TDI_SUSPENDED 0x0001 /* On suspension queue. */ #define TDI_SLEEPING 0x0002 /* Actually asleep! (tricky). */ #define TDI_SWAPPED 0x0004 /* Stack not in mem. Bad juju if run. */ #define TDI_LOCK 0x0008 /* Stopped on a lock. */ #define TDI_IWAIT 0x0010 /* Awaiting interrupt. */ #define TD_IS_SLEEPING(td) ((td)->td_inhibitors & TDI_SLEEPING) #define TD_ON_SLEEPQ(td) ((td)->td_wchan != NULL) #define TD_IS_SUSPENDED(td) ((td)->td_inhibitors & TDI_SUSPENDED) #define TD_IS_SWAPPED(td) ((td)->td_inhibitors & TDI_SWAPPED) #define TD_ON_LOCK(td) ((td)->td_inhibitors & TDI_LOCK) #define TD_AWAITING_INTR(td) ((td)->td_inhibitors & TDI_IWAIT) #define TD_IS_RUNNING(td) ((td)->td_state == TDS_RUNNING) #define TD_ON_RUNQ(td) ((td)->td_state == TDS_RUNQ) #define TD_CAN_RUN(td) ((td)->td_state == TDS_CAN_RUN) #define TD_IS_INHIBITED(td) ((td)->td_state == TDS_INHIBITED) #define TD_ON_UPILOCK(td) ((td)->td_flags & TDF_UPIBLOCKED) #define TD_IS_IDLETHREAD(td) ((td)->td_flags & TDF_IDLETD) #define KTDSTATE(td) \ (((td)->td_inhibitors & TDI_SLEEPING) != 0 ? "sleep" : \ ((td)->td_inhibitors & TDI_SUSPENDED) != 0 ? "suspended" : \ ((td)->td_inhibitors & TDI_SWAPPED) != 0 ? "swapped" : \ ((td)->td_inhibitors & TDI_LOCK) != 0 ? "blocked" : \ ((td)->td_inhibitors & TDI_IWAIT) != 0 ? "iwait" : "yielding") #define TD_SET_INHIB(td, inhib) do { \ (td)->td_state = TDS_INHIBITED; \ (td)->td_inhibitors |= (inhib); \ } while (0) #define TD_CLR_INHIB(td, inhib) do { \ if (((td)->td_inhibitors & (inhib)) && \ (((td)->td_inhibitors &= ~(inhib)) == 0)) \ (td)->td_state = TDS_CAN_RUN; \ } while (0) #define TD_SET_SLEEPING(td) TD_SET_INHIB((td), TDI_SLEEPING) #define TD_SET_SWAPPED(td) TD_SET_INHIB((td), TDI_SWAPPED) #define TD_SET_LOCK(td) TD_SET_INHIB((td), TDI_LOCK) #define TD_SET_SUSPENDED(td) TD_SET_INHIB((td), TDI_SUSPENDED) #define TD_SET_IWAIT(td) TD_SET_INHIB((td), TDI_IWAIT) #define TD_SET_EXITING(td) TD_SET_INHIB((td), TDI_EXITING) #define TD_CLR_SLEEPING(td) TD_CLR_INHIB((td), TDI_SLEEPING) #define TD_CLR_SWAPPED(td) TD_CLR_INHIB((td), TDI_SWAPPED) #define TD_CLR_LOCK(td) TD_CLR_INHIB((td), TDI_LOCK) #define TD_CLR_SUSPENDED(td) TD_CLR_INHIB((td), TDI_SUSPENDED) #define TD_CLR_IWAIT(td) TD_CLR_INHIB((td), TDI_IWAIT) #define TD_SET_RUNNING(td) (td)->td_state = TDS_RUNNING #define TD_SET_RUNQ(td) (td)->td_state = TDS_RUNQ #define TD_SET_CAN_RUN(td) (td)->td_state = TDS_CAN_RUN #define TD_SBDRY_INTR(td) \ (((td)->td_flags & (TDF_SEINTR | TDF_SERESTART)) != 0) #define TD_SBDRY_ERRNO(td) \ (((td)->td_flags & TDF_SEINTR) != 0 ? EINTR : ERESTART) /* * Process structure. */ struct proc { LIST_ENTRY(proc) p_list; /* (d) List of all processes. */ TAILQ_HEAD(, thread) p_threads; /* (c) all threads. */ struct mtx p_slock; /* process spin lock */ struct ucred *p_ucred; /* (c) Process owner's identity. */ struct filedesc *p_fd; /* (b) Open files. */ struct filedesc_to_leader *p_fdtol; /* (b) Tracking node */ struct pstats *p_stats; /* (b) Accounting/statistics (CPU). */ struct plimit *p_limit; /* (c) Resource limits. */ struct callout p_limco; /* (c) Limit callout handle */ struct sigacts *p_sigacts; /* (x) Signal actions, state (CPU). */ int p_flag; /* (c) P_* flags. */ int p_flag2; /* (c) P2_* flags. */ enum p_states { PRS_NEW = 0, /* In creation */ PRS_NORMAL, /* threads can be run. */ PRS_ZOMBIE } p_state; /* (j/c) Process status. */ pid_t p_pid; /* (b) Process identifier. */ LIST_ENTRY(proc) p_hash; /* (d) Hash chain. */ LIST_ENTRY(proc) p_pglist; /* (g + e) List of processes in pgrp. 
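
The TD_SET_INHIB()/TD_CLR_INHIB() macros above keep td_state and td_inhibitors consistent: setting any inhibitor bit forces TDS_INHIBITED, and the state only returns to TDS_CAN_RUN when the last inhibitor is cleared. A minimal userland model of that rule (a sketch; the FAKE_* names and fake_td type are invented for illustration, this is not kernel code):

    #include <assert.h>
    #include <stdio.h>

    enum state { CAN_RUN, INHIBITED };

    struct fake_td {
            enum state      state;
            int             inhibitors;     /* TDI_*-style bit mask */
    };

    #define FAKE_TDI_SUSPENDED      0x0001
    #define FAKE_TDI_SLEEPING       0x0002

    /* Mirrors TD_SET_INHIB(): any inhibitor forces the inhibited state. */
    static void
    set_inhib(struct fake_td *td, int inhib)
    {
            td->state = INHIBITED;
            td->inhibitors |= inhib;
    }

    /* Mirrors TD_CLR_INHIB(): only clearing the last bit re-enables running. */
    static void
    clr_inhib(struct fake_td *td, int inhib)
    {
            if ((td->inhibitors & inhib) != 0 &&
                (td->inhibitors &= ~inhib) == 0)
                    td->state = CAN_RUN;
    }

    int
    main(void)
    {
            struct fake_td td = { CAN_RUN, 0 };

            set_inhib(&td, FAKE_TDI_SLEEPING);
            set_inhib(&td, FAKE_TDI_SUSPENDED);
            clr_inhib(&td, FAKE_TDI_SLEEPING);
            assert(td.state == INHIBITED);  /* still suspended */
            clr_inhib(&td, FAKE_TDI_SUSPENDED);
            assert(td.state == CAN_RUN);    /* last inhibitor cleared */
            printf("ok\n");
            return (0);
    }
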
*/ struct proc *p_pptr; /* (c + e) Pointer to parent process. */ LIST_ENTRY(proc) p_sibling; /* (e) List of sibling processes. */ LIST_HEAD(, proc) p_children; /* (e) Pointer to list of children. */ struct proc *p_reaper; /* (e) My reaper. */ LIST_HEAD(, proc) p_reaplist; /* (e) List of my descendants (if I am reaper). */ LIST_ENTRY(proc) p_reapsibling; /* (e) List of siblings - descendants of the same reaper. */ struct mtx p_mtx; /* (n) Lock for this struct. */ struct mtx p_statmtx; /* Lock for the stats */ struct mtx p_itimmtx; /* Lock for the virt/prof timers */ struct mtx p_profmtx; /* Lock for the profiling */ struct ksiginfo *p_ksi; /* Locked by parent proc lock */ sigqueue_t p_sigqueue; /* (c) Sigs not delivered to a td. */ #define p_siglist p_sigqueue.sq_signals pid_t p_oppid; /* (c + e) Real parent pid. */ /* The following fields are all zeroed upon creation in fork. */ #define p_startzero p_vmspace struct vmspace *p_vmspace; /* (b) Address space. */ u_int p_swtick; /* (c) Tick when swapped in or out. */ u_int p_cowgen; /* (c) Generation of COW pointers. */ struct itimerval p_realtimer; /* (c) Alarm timer. */ struct rusage p_ru; /* (a) Exit information. */ struct rusage_ext p_rux; /* (cu) Internal resource usage. */ struct rusage_ext p_crux; /* (c) Internal child resource usage. */ int p_profthreads; /* (c) Num threads in addupc_task. */ volatile int p_exitthreads; /* (j) Number of threads exiting */ int p_traceflag; /* (o) Kernel trace points. */ struct vnode *p_tracevp; /* (c + o) Trace to vnode. */ struct ucred *p_tracecred; /* (o) Credentials to trace with. */ struct vnode *p_textvp; /* (b) Vnode of executable. */ u_int p_lock; /* (c) Proclock (prevent swap) count. */ struct sigiolst p_sigiolst; /* (c) List of sigio sources. */ int p_sigparent; /* (c) Signal to parent on exit. */ int p_sig; /* (n) For core dump/debugger XXX. */ - u_long p_code; /* (n) For core dump/debugger XXX. */ u_int p_stops; /* (c) Stop event bitmask. */ u_int p_stype; /* (c) Stop event type. */ char p_step; /* (c) Process is stopped. */ u_char p_pfsflags; /* (c) Procfs flags. */ u_int p_ptevents; /* (c + e) ptrace() event mask. */ struct nlminfo *p_nlminfo; /* (?) Only used by/for lockd. */ struct kaioinfo *p_aioinfo; /* (y) ASYNC I/O info. */ struct thread *p_singlethread;/* (c + j) If single threading this is it */ int p_suspcount; /* (j) Num threads in suspended mode. */ struct thread *p_xthread; /* (c) Trap thread */ int p_boundary_count;/* (j) Num threads at user boundary */ int p_pendingcnt; /* how many signals are pending */ struct itimers *p_itimers; /* (c) POSIX interval timers. */ struct procdesc *p_procdesc; /* (e) Process descriptor, if any. */ u_int p_treeflag; /* (e) P_TREE flags */ int p_pendingexits; /* (c) Count of pending thread exits. */ struct filemon *p_filemon; /* (c) filemon-specific data. */ int p_pdeathsig; /* (c) Signal from parent on exit. */ /* End area that is zeroed on creation. */ #define p_endzero p_magic /* The following fields are all copied upon creation in fork. */ #define p_startcopy p_endzero u_int p_magic; /* (b) Magic number. */ int p_osrel; /* (x) osreldate for the binary (from ELF note, if any) */ uint32_t p_fctl0; /* (x) ABI feature control, ELF note */ char p_comm[MAXCOMLEN + 1]; /* (x) Process name. */ struct sysentvec *p_sysent; /* (b) Syscall dispatch info. */ struct pargs *p_args; /* (c) Process arguments. */ rlim_t p_cpulimit; /* (c) Current CPU limit in seconds. */ signed char p_nice; /* (c) Process "nice" value. 
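
The p_startzero/p_endzero and p_startcopy/p_endcopy markers above delimit the regions of struct proc that fork zeroes or copies wholesale; fork1() effectively clears everything from p_startzero up to (but not including) p_endzero. A small standalone model of that marker-member idiom (struct demo and its fields are made up for illustration):

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    struct demo {
            int     kept;           /* outside the zeroed range */
    #define d_startzero     a
            int     a;
            long    b;
    #define d_endzero       c
            int     c;              /* first member past the range */
    };

    int
    main(void)
    {
            struct demo d = { 1, 2, 3, 4 };

            /* Zero from d_startzero up to (not including) d_endzero. */
            memset(&d.d_startzero, 0,
                offsetof(struct demo, d_endzero) -
                offsetof(struct demo, d_startzero));
            assert(d.kept == 1 && d.a == 0 && d.b == 0 && d.c == 4);
            return (0);
    }
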
*/ int p_fibnum; /* in this routing domain XXX MRT */ pid_t p_reapsubtree; /* (e) Pid of the direct child of the reaper which spawned our subtree. */ uint16_t p_elf_machine; /* (x) ELF machine type */ uint64_t p_elf_flags; /* (x) ELF flags */ /* End area that is copied on creation. */ #define p_endcopy p_xexit u_int p_xexit; /* (c) Exit code. */ u_int p_xsig; /* (c) Stop/kill sig. */ struct pgrp *p_pgrp; /* (c + e) Pointer to process group. */ struct knlist *p_klist; /* (c) Knotes attached to this proc. */ int p_numthreads; /* (c) Number of threads. */ struct mdproc p_md; /* Any machine-dependent fields. */ struct callout p_itcallout; /* (h + c) Interval timer callout. */ u_short p_acflag; /* (c) Accounting flags. */ struct proc *p_peers; /* (r) */ struct proc *p_leader; /* (b) */ void *p_emuldata; /* (c) Emulator state data. */ struct label *p_label; /* (*) Proc (not subject) MAC label. */ STAILQ_HEAD(, ktr_request) p_ktr; /* (o) KTR event queue. */ LIST_HEAD(, mqueue_notifier) p_mqnotifier; /* (c) mqueue notifiers.*/ struct kdtrace_proc *p_dtrace; /* (*) DTrace-specific data. */ struct cv p_pwait; /* (*) wait cv for exit/exec. */ uint64_t p_prev_runtime; /* (c) Resource usage accounting. */ struct racct *p_racct; /* (b) Resource accounting. */ int p_throttled; /* (c) Flag for racct pcpu throttling */ /* * An orphan is the child that has been re-parented to the * debugger as a result of attaching to it. Need to keep * track of them for parent to be able to collect the exit * status of what used to be children. */ LIST_ENTRY(proc) p_orphan; /* (e) List of orphan processes. */ LIST_HEAD(, proc) p_orphans; /* (e) Pointer to list of orphans. */ }; #define p_session p_pgrp->pg_session #define p_pgid p_pgrp->pg_id #define NOCPU (-1) /* For when we aren't on a CPU. */ #define NOCPU_OLD (255) #define MAXCPU_OLD (254) #define PROC_SLOCK(p) mtx_lock_spin(&(p)->p_slock) #define PROC_SUNLOCK(p) mtx_unlock_spin(&(p)->p_slock) #define PROC_SLOCK_ASSERT(p, type) mtx_assert(&(p)->p_slock, (type)) #define PROC_STATLOCK(p) mtx_lock_spin(&(p)->p_statmtx) #define PROC_STATUNLOCK(p) mtx_unlock_spin(&(p)->p_statmtx) #define PROC_STATLOCK_ASSERT(p, type) mtx_assert(&(p)->p_statmtx, (type)) #define PROC_ITIMLOCK(p) mtx_lock_spin(&(p)->p_itimmtx) #define PROC_ITIMUNLOCK(p) mtx_unlock_spin(&(p)->p_itimmtx) #define PROC_ITIMLOCK_ASSERT(p, type) mtx_assert(&(p)->p_itimmtx, (type)) #define PROC_PROFLOCK(p) mtx_lock_spin(&(p)->p_profmtx) #define PROC_PROFUNLOCK(p) mtx_unlock_spin(&(p)->p_profmtx) #define PROC_PROFLOCK_ASSERT(p, type) mtx_assert(&(p)->p_profmtx, (type)) /* These flags are kept in p_flag. */ #define P_ADVLOCK 0x00001 /* Process may hold a POSIX advisory lock. */ #define P_CONTROLT 0x00002 /* Has a controlling terminal. */ #define P_KPROC 0x00004 /* Kernel process. */ #define P_UNUSED3 0x00008 /* --available-- */ #define P_PPWAIT 0x00010 /* Parent is waiting for child to exec/exit. */ #define P_PROFIL 0x00020 /* Has started profiling. */ #define P_STOPPROF 0x00040 /* Has thread requesting to stop profiling. */ #define P_HADTHREADS 0x00080 /* Has had threads (no cleanup shortcuts) */ #define P_SUGID 0x00100 /* Had set id privileges since last exec. */ #define P_SYSTEM 0x00200 /* System proc: no sigs, stats or swapping. */ #define P_SINGLE_EXIT 0x00400 /* Threads suspending should exit, not wait. */ #define P_TRACED 0x00800 /* Debugged process being traced. */ #define P_WAITED 0x01000 /* Someone is waiting for us. */ #define P_WEXIT 0x02000 /* Working on exiting. 
*/ #define P_EXEC 0x04000 /* Process called exec. */ #define P_WKILLED 0x08000 /* Killed, go to kernel/user boundary ASAP. */ #define P_CONTINUED 0x10000 /* Proc has continued from a stopped state. */ #define P_STOPPED_SIG 0x20000 /* Stopped due to SIGSTOP/SIGTSTP. */ #define P_STOPPED_TRACE 0x40000 /* Stopped because of tracing. */ #define P_STOPPED_SINGLE 0x80000 /* Only 1 thread can continue (not to user). */ #define P_PROTECTED 0x100000 /* Do not kill on memory overcommit. */ #define P_SIGEVENT 0x200000 /* Process pending signals changed. */ #define P_SINGLE_BOUNDARY 0x400000 /* Threads should suspend at user boundary. */ #define P_HWPMC 0x800000 /* Process is using HWPMCs */ #define P_JAILED 0x1000000 /* Process is in jail. */ #define P_TOTAL_STOP 0x2000000 /* Stopped in stop_all_proc. */ #define P_INEXEC 0x4000000 /* Process is in execve(). */ #define P_STATCHILD 0x8000000 /* Child process stopped or exited. */ #define P_INMEM 0x10000000 /* Loaded into memory. */ #define P_SWAPPINGOUT 0x20000000 /* Process is being swapped out. */ #define P_SWAPPINGIN 0x40000000 /* Process is being swapped in. */ #define P_PPTRACE 0x80000000 /* PT_TRACEME by vforked child. */ #define P_STOPPED (P_STOPPED_SIG|P_STOPPED_SINGLE|P_STOPPED_TRACE) #define P_SHOULDSTOP(p) ((p)->p_flag & P_STOPPED) #define P_KILLED(p) ((p)->p_flag & P_WKILLED) /* These flags are kept in p_flag2. */ #define P2_INHERIT_PROTECTED 0x00000001 /* New children get P_PROTECTED. */ #define P2_NOTRACE 0x00000002 /* No ptrace(2) attach or coredumps. */ #define P2_NOTRACE_EXEC 0x00000004 /* Keep P2_NOPTRACE on exec(2). */ #define P2_AST_SU 0x00000008 /* Handles SU ast for kthreads. */ #define P2_PTRACE_FSTP 0x00000010 /* SIGSTOP from PT_ATTACH not yet handled. */ #define P2_TRAPCAP 0x00000020 /* SIGTRAP on ENOTCAPABLE */ #define P2_ASLR_ENABLE 0x00000040 /* Force enable ASLR. */ #define P2_ASLR_DISABLE 0x00000080 /* Force disable ASLR. */ #define P2_ASLR_IGNSTART 0x00000100 /* Enable ASLR to consume sbrk area. */ /* Flags protected by proctree_lock, kept in p_treeflags. */ #define P_TREE_ORPHANED 0x00000001 /* Reparented, on orphan list */ #define P_TREE_FIRST_ORPHAN 0x00000002 /* First element of orphan list */ #define P_TREE_REAPER 0x00000004 /* Reaper of subtree */ /* * These were process status values (p_stat), now they are only used in * legacy conversion code. */ #define SIDL 1 /* Process being created by fork. */ #define SRUN 2 /* Currently runnable. */ #define SSLEEP 3 /* Sleeping on an address. */ #define SSTOP 4 /* Process debugging or suspension. */ #define SZOMB 5 /* Awaiting collection by parent. */ #define SWAIT 6 /* Waiting for interrupt. */ #define SLOCK 7 /* Blocked on a lock. */ #define P_MAGIC 0xbeefface #ifdef _KERNEL /* Types and flags for mi_switch(). */ #define SW_TYPE_MASK 0xff /* First 8 bits are switch type */ #define SWT_NONE 0 /* Unspecified switch. */ #define SWT_PREEMPT 1 /* Switching due to preemption. */ #define SWT_OWEPREEMPT 2 /* Switching due to owepreempt. */ #define SWT_TURNSTILE 3 /* Turnstile contention. */ #define SWT_SLEEPQ 4 /* Sleepq wait. */ #define SWT_SLEEPQTIMO 5 /* Sleepq timeout wait. */ #define SWT_RELINQUISH 6 /* yield call. */ #define SWT_NEEDRESCHED 7 /* NEEDRESCHED was set. */ #define SWT_IDLE 8 /* Switching from the idle thread. */ #define SWT_IWAIT 9 /* Waiting for interrupts. */ #define SWT_SUSPEND 10 /* Thread suspended. */ #define SWT_REMOTEPREEMPT 11 /* Remote processor preempted. */ #define SWT_REMOTEWAKEIDLE 12 /* Remote processor preempted idle. 
*/ #define SWT_COUNT 13 /* Number of switch types. */ /* Flags */ #define SW_VOL 0x0100 /* Voluntary switch. */ #define SW_INVOL 0x0200 /* Involuntary switch. */ #define SW_PREEMPT 0x0400 /* The invol switch is a preemption */ /* How values for thread_single(). */ #define SINGLE_NO_EXIT 0 #define SINGLE_EXIT 1 #define SINGLE_BOUNDARY 2 #define SINGLE_ALLPROC 3 #ifdef MALLOC_DECLARE MALLOC_DECLARE(M_PARGS); MALLOC_DECLARE(M_PGRP); MALLOC_DECLARE(M_SESSION); MALLOC_DECLARE(M_SUBPROC); #endif #define FOREACH_PROC_IN_SYSTEM(p) \ LIST_FOREACH((p), &allproc, p_list) #define FOREACH_THREAD_IN_PROC(p, td) \ TAILQ_FOREACH((td), &(p)->p_threads, td_plist) #define FIRST_THREAD_IN_PROC(p) TAILQ_FIRST(&(p)->p_threads) /* * We use process IDs <= pid_max <= PID_MAX; PID_MAX + 1 must also fit * in a pid_t, as it is used to represent "no process group". */ #define PID_MAX 99999 #define NO_PID 100000 extern pid_t pid_max; #define SESS_LEADER(p) ((p)->p_session->s_leader == (p)) #define STOPEVENT(p, e, v) do { \ WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, \ "checking stopevent %d", (e)); \ if ((p)->p_stops & (e)) { \ PROC_LOCK(p); \ stopevent((p), (e), (v)); \ PROC_UNLOCK(p); \ } \ } while (0) #define _STOPEVENT(p, e, v) do { \ PROC_LOCK_ASSERT(p, MA_OWNED); \ WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, &p->p_mtx.lock_object, \ "checking stopevent %d", (e)); \ if ((p)->p_stops & (e)) \ stopevent((p), (e), (v)); \ } while (0) /* Lock and unlock a process. */ #define PROC_LOCK(p) mtx_lock(&(p)->p_mtx) #define PROC_TRYLOCK(p) mtx_trylock(&(p)->p_mtx) #define PROC_UNLOCK(p) mtx_unlock(&(p)->p_mtx) #define PROC_LOCKED(p) mtx_owned(&(p)->p_mtx) #define PROC_LOCK_ASSERT(p, type) mtx_assert(&(p)->p_mtx, (type)) /* Lock and unlock a process group. */ #define PGRP_LOCK(pg) mtx_lock(&(pg)->pg_mtx) #define PGRP_UNLOCK(pg) mtx_unlock(&(pg)->pg_mtx) #define PGRP_LOCKED(pg) mtx_owned(&(pg)->pg_mtx) #define PGRP_LOCK_ASSERT(pg, type) mtx_assert(&(pg)->pg_mtx, (type)) #define PGRP_LOCK_PGSIGNAL(pg) do { \ if ((pg) != NULL) \ PGRP_LOCK(pg); \ } while (0) #define PGRP_UNLOCK_PGSIGNAL(pg) do { \ if ((pg) != NULL) \ PGRP_UNLOCK(pg); \ } while (0) /* Lock and unlock a session. */ #define SESS_LOCK(s) mtx_lock(&(s)->s_mtx) #define SESS_UNLOCK(s) mtx_unlock(&(s)->s_mtx) #define SESS_LOCKED(s) mtx_owned(&(s)->s_mtx) #define SESS_LOCK_ASSERT(s, type) mtx_assert(&(s)->s_mtx, (type)) /* * Non-zero p_lock ensures that: * - exit1() is not performed until p_lock reaches zero; * - the process' threads stack are not swapped out if they are currently * not (P_INMEM). * * PHOLD() asserts that the process (except the current process) is * not exiting, increments p_lock and swaps threads stacks into memory, * if needed. * _PHOLD() is same as PHOLD(), it takes the process locked. * _PHOLD_LITE() also takes the process locked, but comparing with * _PHOLD(), it only guarantees that exit1() is not executed, * faultin() is not called. 
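
FOREACH_THREAD_IN_PROC() and the PROC_LOCK() macros above are normally used together, since the p_threads list is protected by the process lock; per-thread fields marked (t) additionally want the thread lock. An illustrative fragment, not a standalone program (count_sleeping is a made-up example function, and thread_lock()/thread_unlock() come from the mutex code, not from this header):

    /*
     * Count this process's threads that are currently asleep.
     * Illustrative only; assumes the caller holds a valid reference on p.
     */
    static int
    count_sleeping(struct proc *p)
    {
            struct thread *td;
            int n;

            n = 0;
            PROC_LOCK(p);
            FOREACH_THREAD_IN_PROC(p, td) {
                    thread_lock(td);
                    if (TD_IS_SLEEPING(td))
                            n++;
                    thread_unlock(td);
            }
            PROC_UNLOCK(p);
            return (n);
    }
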
*/ #define PHOLD(p) do { \ PROC_LOCK(p); \ _PHOLD(p); \ PROC_UNLOCK(p); \ } while (0) #define _PHOLD(p) do { \ PROC_LOCK_ASSERT((p), MA_OWNED); \ KASSERT(!((p)->p_flag & P_WEXIT) || (p) == curproc, \ ("PHOLD of exiting process %p", p)); \ (p)->p_lock++; \ if (((p)->p_flag & P_INMEM) == 0) \ faultin((p)); \ } while (0) #define _PHOLD_LITE(p) do { \ PROC_LOCK_ASSERT((p), MA_OWNED); \ KASSERT(!((p)->p_flag & P_WEXIT) || (p) == curproc, \ ("PHOLD of exiting process %p", p)); \ (p)->p_lock++; \ } while (0) #define PROC_ASSERT_HELD(p) do { \ KASSERT((p)->p_lock > 0, ("process %p not held", p)); \ } while (0) #define PRELE(p) do { \ PROC_LOCK((p)); \ _PRELE((p)); \ PROC_UNLOCK((p)); \ } while (0) #define _PRELE(p) do { \ PROC_LOCK_ASSERT((p), MA_OWNED); \ PROC_ASSERT_HELD(p); \ (--(p)->p_lock); \ if (((p)->p_flag & P_WEXIT) && (p)->p_lock == 0) \ wakeup(&(p)->p_lock); \ } while (0) #define PROC_ASSERT_NOT_HELD(p) do { \ KASSERT((p)->p_lock == 0, ("process %p held", p)); \ } while (0) #define PROC_UPDATE_COW(p) do { \ PROC_LOCK_ASSERT((p), MA_OWNED); \ (p)->p_cowgen++; \ } while (0) /* Check whether a thread is safe to be swapped out. */ #define thread_safetoswapout(td) ((td)->td_flags & TDF_CANSWAP) /* Control whether or not it is safe for curthread to sleep. */ #define THREAD_NO_SLEEPING() ((curthread)->td_no_sleeping++) #define THREAD_SLEEPING_OK() ((curthread)->td_no_sleeping--) #define THREAD_CAN_SLEEP() ((curthread)->td_no_sleeping == 0) #define PIDHASH(pid) (&pidhashtbl[(pid) & pidhash]) #define PIDHASHLOCK(pid) (&pidhashtbl_lock[((pid) & pidhashlock)]) extern LIST_HEAD(pidhashhead, proc) *pidhashtbl; extern struct sx *pidhashtbl_lock; extern u_long pidhash; extern u_long pidhashlock; #define TIDHASH(tid) (&tidhashtbl[(tid) & tidhash]) extern LIST_HEAD(tidhashhead, thread) *tidhashtbl; extern u_long tidhash; extern struct rwlock tidhash_lock; #define PGRPHASH(pgid) (&pgrphashtbl[(pgid) & pgrphash]) extern LIST_HEAD(pgrphashhead, pgrp) *pgrphashtbl; extern u_long pgrphash; extern struct sx allproc_lock; extern int allproc_gen; extern struct sx zombproc_lock; extern struct sx proctree_lock; extern struct mtx ppeers_lock; extern struct mtx procid_lock; extern struct proc proc0; /* Process slot for swapper. */ extern struct thread0_storage thread0_st; /* Primary thread in proc0. */ #define thread0 (thread0_st.t0st_thread) extern struct vmspace vmspace0; /* VM space for proc0. */ extern int hogticks; /* Limit on kernel cpu hogs. */ extern int lastpid; extern int nprocs, maxproc; /* Current and max number of procs. */ extern int maxprocperuid; /* Max procs per uid. */ extern u_long ps_arg_cache_limit; LIST_HEAD(proclist, proc); TAILQ_HEAD(procqueue, proc); TAILQ_HEAD(threadqueue, thread); extern struct proclist allproc; /* List of all processes. */ extern struct proclist zombproc; /* List of zombie processes. */ extern struct proc *initproc, *pageproc; /* Process slots for init, pager. */ extern struct uma_zone *proc_zone; struct proc *pfind(pid_t); /* Find process by id. */ struct proc *pfind_any(pid_t); /* Find (zombie) process by id. */ struct proc *pfind_any_locked(pid_t pid); /* Find process by id, locked. */ struct pgrp *pgfind(pid_t); /* Find process group by id. */ struct proc *zpfind(pid_t); /* Find zombie process by id. */ void pidhash_slockall(void); /* Shared lock all pid hash lists. */ void pidhash_sunlockall(void); /* Shared unlock all pid hash lists. 
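
As the comment above describes, a non-zero p_lock keeps exit1() from completing and keeps the threads' stacks resident, which lets a caller drop the process lock and still use the process safely. A sketch of the usual pattern (illustrative fragment; pid is assumed to be the caller's argument, and pfind() is taken to return the process with its lock held):

    struct proc *p;

    p = pfind(pid);                 /* returned with PROC_LOCK held */
    if (p == NULL)
            return (ESRCH);
    _PHOLD(p);                      /* prevent exit1(), fault stacks in */
    PROC_UNLOCK(p);

    /* ... possibly sleeping work that must not race with process exit ... */

    PRELE(p);                       /* relock, drop the hold, unlock */
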
*/ struct fork_req { int fr_flags; int fr_pages; int *fr_pidp; struct proc **fr_procp; int *fr_pd_fd; int fr_pd_flags; struct filecaps *fr_pd_fcaps; }; /* * pget() flags. */ #define PGET_HOLD 0x00001 /* Hold the process. */ #define PGET_CANSEE 0x00002 /* Check against p_cansee(). */ #define PGET_CANDEBUG 0x00004 /* Check against p_candebug(). */ #define PGET_ISCURRENT 0x00008 /* Check that the found process is current. */ #define PGET_NOTWEXIT 0x00010 /* Check that the process is not in P_WEXIT. */ #define PGET_NOTINEXEC 0x00020 /* Check that the process is not in P_INEXEC. */ #define PGET_NOTID 0x00040 /* Do not assume tid if pid > PID_MAX. */ #define PGET_WANTREAD (PGET_HOLD | PGET_CANDEBUG | PGET_NOTWEXIT) int pget(pid_t pid, int flags, struct proc **pp); void ast(struct trapframe *framep); struct thread *choosethread(void); int cr_cansee(struct ucred *u1, struct ucred *u2); int cr_canseesocket(struct ucred *cred, struct socket *so); int cr_canseeothergids(struct ucred *u1, struct ucred *u2); int cr_canseeotheruids(struct ucred *u1, struct ucred *u2); int cr_canseejailproc(struct ucred *u1, struct ucred *u2); int cr_cansignal(struct ucred *cred, struct proc *proc, int signum); int enterpgrp(struct proc *p, pid_t pgid, struct pgrp *pgrp, struct session *sess); int enterthispgrp(struct proc *p, struct pgrp *pgrp); void faultin(struct proc *p); void fixjobc(struct proc *p, struct pgrp *pgrp, int entering); int fork1(struct thread *, struct fork_req *); void fork_rfppwait(struct thread *); void fork_exit(void (*)(void *, struct trapframe *), void *, struct trapframe *); void fork_return(struct thread *, struct trapframe *); int inferior(struct proc *p); void kern_proc_vmmap_resident(struct vm_map *map, struct vm_map_entry *entry, int *resident_count, bool *super); void kern_yield(int); void kick_proc0(void); void killjobc(void); int leavepgrp(struct proc *p); int maybe_preempt(struct thread *td); void maybe_yield(void); void mi_switch(int flags, struct thread *newtd); int p_candebug(struct thread *td, struct proc *p); int p_cansee(struct thread *td, struct proc *p); int p_cansched(struct thread *td, struct proc *p); int p_cansignal(struct thread *td, struct proc *p, int signum); int p_canwait(struct thread *td, struct proc *p); struct pargs *pargs_alloc(int len); void pargs_drop(struct pargs *pa); void pargs_hold(struct pargs *pa); int proc_getargv(struct thread *td, struct proc *p, struct sbuf *sb); int proc_getauxv(struct thread *td, struct proc *p, struct sbuf *sb); int proc_getenvv(struct thread *td, struct proc *p, struct sbuf *sb); void procinit(void); int proc_iterate(int (*cb)(struct proc *, void *), void *cbarg); void proc_linkup0(struct proc *p, struct thread *td); void proc_linkup(struct proc *p, struct thread *td); struct proc *proc_realparent(struct proc *child); void proc_reap(struct thread *td, struct proc *p, int *status, int options); void proc_reparent(struct proc *child, struct proc *newparent, bool set_oppid); void proc_set_traced(struct proc *p, bool stop); void proc_wkilled(struct proc *p); struct pstats *pstats_alloc(void); void pstats_fork(struct pstats *src, struct pstats *dst); void pstats_free(struct pstats *ps); void reaper_abandon_children(struct proc *p, bool exiting); int securelevel_ge(struct ucred *cr, int level); int securelevel_gt(struct ucred *cr, int level); void sess_hold(struct session *); void sess_release(struct session *); int setrunnable(struct thread *); void setsugid(struct proc *p); int should_yield(void); int sigonstack(size_t sp); void 
stopevent(struct proc *, u_int, u_int); struct thread *tdfind(lwpid_t, pid_t); void threadinit(void); void tidhash_add(struct thread *); void tidhash_remove(struct thread *); void cpu_idle(int); int cpu_idle_wakeup(int); extern void (*cpu_idle_hook)(sbintime_t); /* Hook to machdep CPU idler. */ void cpu_switch(struct thread *, struct thread *, struct mtx *); void cpu_throw(struct thread *, struct thread *) __dead2; void unsleep(struct thread *); void userret(struct thread *, struct trapframe *); void cpu_exit(struct thread *); void exit1(struct thread *, int, int) __dead2; void cpu_copy_thread(struct thread *td, struct thread *td0); bool cpu_exec_vmspace_reuse(struct proc *p, struct vm_map *map); int cpu_fetch_syscall_args(struct thread *td); void cpu_fork(struct thread *, struct proc *, struct thread *, int); void cpu_fork_kthread_handler(struct thread *, void (*)(void *), void *); int cpu_procctl(struct thread *td, int idtype, id_t id, int com, void *data); void cpu_set_syscall_retval(struct thread *, int); void cpu_set_upcall(struct thread *, void (*)(void *), void *, stack_t *); int cpu_set_user_tls(struct thread *, void *tls_base); void cpu_thread_alloc(struct thread *); void cpu_thread_clean(struct thread *); void cpu_thread_exit(struct thread *); void cpu_thread_free(struct thread *); void cpu_thread_swapin(struct thread *); void cpu_thread_swapout(struct thread *); struct thread *thread_alloc(int pages); int thread_alloc_stack(struct thread *, int pages); void thread_cow_get_proc(struct thread *newtd, struct proc *p); void thread_cow_get(struct thread *newtd, struct thread *td); void thread_cow_free(struct thread *td); void thread_cow_update(struct thread *td); int thread_create(struct thread *td, struct rtprio *rtp, int (*initialize_thread)(struct thread *, void *), void *thunk); void thread_exit(void) __dead2; void thread_free(struct thread *td); void thread_link(struct thread *td, struct proc *p); void thread_reap(void); int thread_single(struct proc *p, int how); void thread_single_end(struct proc *p, int how); void thread_stash(struct thread *td); void thread_stopped(struct proc *p); void childproc_stopped(struct proc *child, int reason); void childproc_continued(struct proc *child); void childproc_exited(struct proc *child); int thread_suspend_check(int how); bool thread_suspend_check_needed(void); void thread_suspend_switch(struct thread *, struct proc *p); void thread_suspend_one(struct thread *td); void thread_unlink(struct thread *td); void thread_unsuspend(struct proc *p); void thread_wait(struct proc *p); struct thread *thread_find(struct proc *p, lwpid_t tid); void stop_all_proc(void); void resume_all_proc(void); static __inline int curthread_pflags_set(int flags) { struct thread *td; int save; td = curthread; save = ~flags | (td->td_pflags & flags); td->td_pflags |= flags; return (save); } static __inline void curthread_pflags_restore(int save) { curthread->td_pflags &= save; } static __inline __pure2 struct td_sched * td_get_sched(struct thread *td) { return ((struct td_sched *)&td[1]); } extern void (*softdep_ast_cleanup)(struct thread *); static __inline void td_softdep_cleanup(struct thread *td) { if (td->td_su != NULL && softdep_ast_cleanup != NULL) softdep_ast_cleanup(td); } #define PROC_ID_PID 0 #define PROC_ID_GROUP 1 #define PROC_ID_SESSION 2 #define PROC_ID_REAP 3 void proc_id_set(int type, pid_t id); void proc_id_set_cond(int type, pid_t id); void proc_id_clear(int type, pid_t id); #endif /* _KERNEL */ #endif /* !_SYS_PROC_H_ */ Index: 
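
curthread_pflags_set() returns a mask that lets curthread_pflags_restore() clear only the TDP_* bits this caller actually turned on, so nested users of the same private flag do not clear each other's state. A standalone model of the save/restore arithmetic (plain ints and invented FLAG_* names, no kernel types):

    #include <assert.h>

    #define FLAG_A  0x1
    #define FLAG_B  0x2

    /* Same arithmetic as curthread_pflags_set()/curthread_pflags_restore(). */
    static int
    pflags_set(int *pflags, int flags)
    {
            int save;

            save = ~flags | (*pflags & flags);
            *pflags |= flags;
            return (save);
    }

    static void
    pflags_restore(int *pflags, int save)
    {
            *pflags &= save;
    }

    int
    main(void)
    {
            int pflags = FLAG_A;    /* FLAG_A already set by an outer caller */
            int save;

            save = pflags_set(&pflags, FLAG_A | FLAG_B);
            assert(pflags == (FLAG_A | FLAG_B));
            pflags_restore(&pflags, save);
            /* FLAG_B is cleared, but the pre-existing FLAG_A survives. */
            assert(pflags == FLAG_A);
            return (0);
    }
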
user/ngie/bug-237403/sys/x86/x86/busdma_bounce.c =================================================================== --- user/ngie/bug-237403/sys/x86/x86/busdma_bounce.c (revision 346925) +++ user/ngie/bug-237403/sys/x86/x86/busdma_bounce.c (revision 346926) @@ -1,1321 +1,1319 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 1997, 1998 Justin T. Gibbs. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions, and the following disclaimer, * without modification, immediately at the beginning of the file. * 2. The name of the author may not be used to endorse or promote products * derived from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef __i386__ #define MAX_BPAGES (Maxmem > atop(0x100000000ULL) ? 
8192 : 512) #else #define MAX_BPAGES 8192 #endif enum { BUS_DMA_COULD_BOUNCE = 0x01, BUS_DMA_MIN_ALLOC_COMP = 0x02, BUS_DMA_KMEM_ALLOC = 0x04, }; struct bounce_zone; struct bus_dma_tag { struct bus_dma_tag_common common; int map_count; int bounce_flags; bus_dma_segment_t *segments; struct bounce_zone *bounce_zone; }; struct bounce_page { vm_offset_t vaddr; /* kva of bounce buffer */ bus_addr_t busaddr; /* Physical address */ vm_offset_t datavaddr; /* kva of client data */ vm_offset_t dataoffs; /* page offset of client data */ vm_page_t datapage[2]; /* physical page(s) of client data */ bus_size_t datacount; /* client data count */ STAILQ_ENTRY(bounce_page) links; }; int busdma_swi_pending; struct bounce_zone { STAILQ_ENTRY(bounce_zone) links; STAILQ_HEAD(bp_list, bounce_page) bounce_page_list; int total_bpages; int free_bpages; int reserved_bpages; int active_bpages; int total_bounced; int total_deferred; int map_count; int domain; bus_size_t alignment; bus_addr_t lowaddr; char zoneid[8]; char lowaddrid[20]; struct sysctl_ctx_list sysctl_tree; struct sysctl_oid *sysctl_tree_top; }; static struct mtx bounce_lock; static int total_bpages; static int busdma_zonecount; static STAILQ_HEAD(, bounce_zone) bounce_zone_list; static SYSCTL_NODE(_hw, OID_AUTO, busdma, CTLFLAG_RD, 0, "Busdma parameters"); SYSCTL_INT(_hw_busdma, OID_AUTO, total_bpages, CTLFLAG_RD, &total_bpages, 0, "Total bounce pages"); struct bus_dmamap { struct bp_list bpages; int pagesneeded; int pagesreserved; bus_dma_tag_t dmat; struct memdesc mem; bus_dmamap_callback_t *callback; void *callback_arg; STAILQ_ENTRY(bus_dmamap) links; }; static STAILQ_HEAD(, bus_dmamap) bounce_map_waitinglist; static STAILQ_HEAD(, bus_dmamap) bounce_map_callbacklist; static struct bus_dmamap nobounce_dmamap; static void init_bounce_pages(void *dummy); static int alloc_bounce_zone(bus_dma_tag_t dmat); static int alloc_bounce_pages(bus_dma_tag_t dmat, u_int numpages); static int reserve_bounce_pages(bus_dma_tag_t dmat, bus_dmamap_t map, int commit); static bus_addr_t add_bounce_page(bus_dma_tag_t dmat, bus_dmamap_t map, vm_offset_t vaddr, vm_paddr_t addr1, vm_paddr_t addr2, bus_size_t size); static void free_bounce_page(bus_dma_tag_t dmat, struct bounce_page *bpage); static void _bus_dmamap_count_pages(bus_dma_tag_t dmat, bus_dmamap_t map, pmap_t pmap, void *buf, bus_size_t buflen, int flags); static void _bus_dmamap_count_phys(bus_dma_tag_t dmat, bus_dmamap_t map, vm_paddr_t buf, bus_size_t buflen, int flags); static int _bus_dmamap_reserve_pages(bus_dma_tag_t dmat, bus_dmamap_t map, int flags); static int bounce_bus_dma_zone_setup(bus_dma_tag_t dmat) { struct bounce_zone *bz; int error; /* Must bounce */ if ((error = alloc_bounce_zone(dmat)) != 0) return (error); bz = dmat->bounce_zone; if (ptoa(bz->total_bpages) < dmat->common.maxsize) { int pages; pages = atop(dmat->common.maxsize) - bz->total_bpages; /* Add pages to our bounce pool */ if (alloc_bounce_pages(dmat, pages) < pages) return (ENOMEM); } /* Performed initial allocation */ dmat->bounce_flags |= BUS_DMA_MIN_ALLOC_COMP; return (0); } /* * Allocate a device specific dma_tag. */ static int bounce_bus_dma_tag_create(bus_dma_tag_t parent, bus_size_t alignment, bus_addr_t boundary, bus_addr_t lowaddr, bus_addr_t highaddr, bus_dma_filter_t *filter, void *filterarg, bus_size_t maxsize, int nsegments, bus_size_t maxsegsz, int flags, bus_dma_lock_t *lockfunc, void *lockfuncarg, bus_dma_tag_t *dmat) { bus_dma_tag_t newtag; int error; *dmat = NULL; error = common_bus_dma_tag_create(parent != NULL ? 
&parent->common : NULL, alignment, boundary, lowaddr, highaddr, filter, filterarg, maxsize, nsegments, maxsegsz, flags, lockfunc, lockfuncarg, sizeof (struct bus_dma_tag), (void **)&newtag); if (error != 0) return (error); newtag->common.impl = &bus_dma_bounce_impl; newtag->map_count = 0; newtag->segments = NULL; if (parent != NULL && (newtag->common.filter != NULL || (parent->bounce_flags & BUS_DMA_COULD_BOUNCE) != 0)) newtag->bounce_flags |= BUS_DMA_COULD_BOUNCE; if (newtag->common.lowaddr < ptoa((vm_paddr_t)Maxmem) || newtag->common.alignment > 1) newtag->bounce_flags |= BUS_DMA_COULD_BOUNCE; if ((newtag->bounce_flags & BUS_DMA_COULD_BOUNCE) != 0 && (flags & BUS_DMA_ALLOCNOW) != 0) error = bounce_bus_dma_zone_setup(newtag); else error = 0; if (error != 0) free(newtag, M_DEVBUF); else *dmat = newtag; CTR4(KTR_BUSDMA, "%s returned tag %p tag flags 0x%x error %d", __func__, newtag, (newtag != NULL ? newtag->common.flags : 0), error); return (error); } /* * Update the domain for the tag. We may need to reallocate the zone and * bounce pages. */ static int bounce_bus_dma_tag_set_domain(bus_dma_tag_t dmat) { KASSERT(dmat->map_count == 0, ("bounce_bus_dma_tag_set_domain: Domain set after use.\n")); if ((dmat->bounce_flags & BUS_DMA_COULD_BOUNCE) == 0 || dmat->bounce_zone == NULL) return (0); dmat->bounce_flags &= ~BUS_DMA_MIN_ALLOC_COMP; return (bounce_bus_dma_zone_setup(dmat)); } static int bounce_bus_dma_tag_destroy(bus_dma_tag_t dmat) { bus_dma_tag_t dmat_copy, parent; int error; error = 0; dmat_copy = dmat; if (dmat != NULL) { if (dmat->map_count != 0) { error = EBUSY; goto out; } while (dmat != NULL) { parent = (bus_dma_tag_t)dmat->common.parent; atomic_subtract_int(&dmat->common.ref_count, 1); if (dmat->common.ref_count == 0) { if (dmat->segments != NULL) free_domain(dmat->segments, M_DEVBUF); free(dmat, M_DEVBUF); /* * Last reference count, so * release our reference * count on our parent. */ dmat = parent; } else dmat = NULL; } } out: CTR3(KTR_BUSDMA, "%s tag %p error %d", __func__, dmat_copy, error); return (error); } /* * Allocate a handle for mapping from kva/uva/physical * address space into bus device space. */ static int bounce_bus_dmamap_create(bus_dma_tag_t dmat, int flags, bus_dmamap_t *mapp) { struct bounce_zone *bz; int error, maxpages, pages; - WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, "%s", __func__); - error = 0; if (dmat->segments == NULL) { dmat->segments = (bus_dma_segment_t *)malloc_domainset( sizeof(bus_dma_segment_t) * dmat->common.nsegments, M_DEVBUF, DOMAINSET_PREF(dmat->common.domain), M_NOWAIT); if (dmat->segments == NULL) { CTR3(KTR_BUSDMA, "%s: tag %p error %d", __func__, dmat, ENOMEM); return (ENOMEM); } } /* * Bouncing might be required if the driver asks for an active * exclusion region, a data alignment that is stricter than 1, and/or * an active address boundary. */ if ((dmat->bounce_flags & BUS_DMA_COULD_BOUNCE) != 0) { /* Must bounce */ if (dmat->bounce_zone == NULL) { if ((error = alloc_bounce_zone(dmat)) != 0) return (error); } bz = dmat->bounce_zone; *mapp = (bus_dmamap_t)malloc_domainset(sizeof(**mapp), M_DEVBUF, DOMAINSET_PREF(dmat->common.domain), M_NOWAIT | M_ZERO); if (*mapp == NULL) { CTR3(KTR_BUSDMA, "%s: tag %p error %d", __func__, dmat, ENOMEM); return (ENOMEM); } /* Initialize the new map */ STAILQ_INIT(&((*mapp)->bpages)); /* * Attempt to add pages to our pool on a per-instance * basis up to a sane limit. 
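
Per the tag-creation logic and the comment above, BUS_DMA_COULD_BOUNCE is armed when a tag carries an exclusion window (lowaddr below the top of RAM), an alignment stricter than 1, or an inherited bounce requirement. A hedged driver-side sketch using the public bus_dma_tag_create(9) wrapper (dev is the driver's device_t; the specific argument values are invented for illustration):

    bus_dma_tag_t tag;
    int error;

    /*
     * A device that can only address the low 4GB: on a machine with more
     * RAM than that, lowaddr < Maxmem makes this tag a bouncing tag.
     */
    error = bus_dma_tag_create(
        bus_get_dma_tag(dev),       /* parent */
        4, 0,                       /* alignment, boundary */
        BUS_SPACE_MAXADDR_32BIT,    /* lowaddr */
        BUS_SPACE_MAXADDR,          /* highaddr */
        NULL, NULL,                 /* filter, filterarg */
        MAXPHYS, 1, MAXPHYS,        /* maxsize, nsegments, maxsegsz */
        0,                          /* flags */
        NULL, NULL,                 /* lockfunc, lockfuncarg */
        &tag);
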
*/ if (dmat->common.alignment > 1) maxpages = MAX_BPAGES; else maxpages = MIN(MAX_BPAGES, Maxmem - atop(dmat->common.lowaddr)); if ((dmat->bounce_flags & BUS_DMA_MIN_ALLOC_COMP) == 0 || (bz->map_count > 0 && bz->total_bpages < maxpages)) { pages = MAX(atop(dmat->common.maxsize), 1); pages = MIN(maxpages - bz->total_bpages, pages); pages = MAX(pages, 1); if (alloc_bounce_pages(dmat, pages) < pages) error = ENOMEM; if ((dmat->bounce_flags & BUS_DMA_MIN_ALLOC_COMP) == 0) { if (error == 0) { dmat->bounce_flags |= BUS_DMA_MIN_ALLOC_COMP; } } else error = 0; } bz->map_count++; } else { *mapp = NULL; } if (error == 0) dmat->map_count++; CTR4(KTR_BUSDMA, "%s: tag %p tag flags 0x%x error %d", __func__, dmat, dmat->common.flags, error); return (error); } /* * Destroy a handle for mapping from kva/uva/physical * address space into bus device space. */ static int bounce_bus_dmamap_destroy(bus_dma_tag_t dmat, bus_dmamap_t map) { if (map != NULL && map != &nobounce_dmamap) { if (STAILQ_FIRST(&map->bpages) != NULL) { CTR3(KTR_BUSDMA, "%s: tag %p error %d", __func__, dmat, EBUSY); return (EBUSY); } if (dmat->bounce_zone) dmat->bounce_zone->map_count--; free_domain(map, M_DEVBUF); } dmat->map_count--; CTR2(KTR_BUSDMA, "%s: tag %p error 0", __func__, dmat); return (0); } /* * Allocate a piece of memory that can be efficiently mapped into * bus device space based on the constraints lited in the dma tag. * A dmamap to for use with dmamap_load is also allocated. */ static int bounce_bus_dmamem_alloc(bus_dma_tag_t dmat, void** vaddr, int flags, bus_dmamap_t *mapp) { vm_memattr_t attr; int mflags; WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, "%s", __func__); if (flags & BUS_DMA_NOWAIT) mflags = M_NOWAIT; else mflags = M_WAITOK; /* If we succeed, no mapping/bouncing will be required */ *mapp = NULL; if (dmat->segments == NULL) { dmat->segments = (bus_dma_segment_t *)malloc_domainset( sizeof(bus_dma_segment_t) * dmat->common.nsegments, M_DEVBUF, DOMAINSET_PREF(dmat->common.domain), mflags); if (dmat->segments == NULL) { CTR4(KTR_BUSDMA, "%s: tag %p tag flags 0x%x error %d", __func__, dmat, dmat->common.flags, ENOMEM); return (ENOMEM); } } if (flags & BUS_DMA_ZERO) mflags |= M_ZERO; if (flags & BUS_DMA_NOCACHE) attr = VM_MEMATTR_UNCACHEABLE; else attr = VM_MEMATTR_DEFAULT; /* * Allocate the buffer from the malloc(9) allocator if... * - It's small enough to fit into a single power of two sized bucket. * - The alignment is less than or equal to the maximum size * - The low address requirement is fulfilled. * else allocate non-contiguous pages if... * - The page count that could get allocated doesn't exceed * nsegments also when the maximum segment size is less * than PAGE_SIZE. * - The alignment constraint isn't larger than a page boundary. * - There are no boundary-crossing constraints. * else allocate a block of contiguous pages because one or more of the * constraints is something that only the contig allocator can fulfill. * * NOTE: The (dmat->common.alignment <= dmat->maxsize) check * below is just a quick hack. The exact alignment guarantees * of malloc(9) need to be nailed down, and the code below * should be rewritten to take that into account. * * In the meantime warn the user if malloc gets it wrong. 
*/ if (dmat->common.maxsize <= PAGE_SIZE && dmat->common.alignment <= dmat->common.maxsize && dmat->common.lowaddr >= ptoa((vm_paddr_t)Maxmem) && attr == VM_MEMATTR_DEFAULT) { *vaddr = malloc_domainset(dmat->common.maxsize, M_DEVBUF, DOMAINSET_PREF(dmat->common.domain), mflags); } else if (dmat->common.nsegments >= howmany(dmat->common.maxsize, MIN(dmat->common.maxsegsz, PAGE_SIZE)) && dmat->common.alignment <= PAGE_SIZE && (dmat->common.boundary % PAGE_SIZE) == 0) { /* Page-based multi-segment allocations allowed */ *vaddr = (void *)kmem_alloc_attr_domainset( DOMAINSET_PREF(dmat->common.domain), dmat->common.maxsize, mflags, 0ul, dmat->common.lowaddr, attr); dmat->bounce_flags |= BUS_DMA_KMEM_ALLOC; } else { *vaddr = (void *)kmem_alloc_contig_domainset( DOMAINSET_PREF(dmat->common.domain), dmat->common.maxsize, mflags, 0ul, dmat->common.lowaddr, dmat->common.alignment != 0 ? dmat->common.alignment : 1ul, dmat->common.boundary, attr); dmat->bounce_flags |= BUS_DMA_KMEM_ALLOC; } if (*vaddr == NULL) { CTR4(KTR_BUSDMA, "%s: tag %p tag flags 0x%x error %d", __func__, dmat, dmat->common.flags, ENOMEM); return (ENOMEM); } else if (vtophys(*vaddr) & (dmat->common.alignment - 1)) { printf("bus_dmamem_alloc failed to align memory properly.\n"); } CTR4(KTR_BUSDMA, "%s: tag %p tag flags 0x%x error %d", __func__, dmat, dmat->common.flags, 0); return (0); } /* * Free a piece of memory and it's allociated dmamap, that was allocated * via bus_dmamem_alloc. Make the same choice for free/contigfree. */ static void bounce_bus_dmamem_free(bus_dma_tag_t dmat, void *vaddr, bus_dmamap_t map) { /* * dmamem does not need to be bounced, so the map should be * NULL and the BUS_DMA_KMEM_ALLOC flag cleared if malloc() * was used and set if kmem_alloc_contig() was used. */ if (map != NULL) panic("bus_dmamem_free: Invalid map freed\n"); if ((dmat->bounce_flags & BUS_DMA_KMEM_ALLOC) == 0) free_domain(vaddr, M_DEVBUF); else kmem_free((vm_offset_t)vaddr, dmat->common.maxsize); CTR3(KTR_BUSDMA, "%s: tag %p flags 0x%x", __func__, dmat, dmat->bounce_flags); } static void _bus_dmamap_count_phys(bus_dma_tag_t dmat, bus_dmamap_t map, vm_paddr_t buf, bus_size_t buflen, int flags) { vm_paddr_t curaddr; bus_size_t sgsize; if (map != &nobounce_dmamap && map->pagesneeded == 0) { /* * Count the number of bounce pages * needed in order to complete this transfer */ curaddr = buf; while (buflen != 0) { sgsize = MIN(buflen, dmat->common.maxsegsz); if (bus_dma_run_filter(&dmat->common, curaddr)) { sgsize = MIN(sgsize, PAGE_SIZE - (curaddr & PAGE_MASK)); map->pagesneeded++; } curaddr += sgsize; buflen -= sgsize; } CTR1(KTR_BUSDMA, "pagesneeded= %d\n", map->pagesneeded); } } static void _bus_dmamap_count_pages(bus_dma_tag_t dmat, bus_dmamap_t map, pmap_t pmap, void *buf, bus_size_t buflen, int flags) { vm_offset_t vaddr; vm_offset_t vendaddr; vm_paddr_t paddr; bus_size_t sg_len; if (map != &nobounce_dmamap && map->pagesneeded == 0) { CTR4(KTR_BUSDMA, "lowaddr= %d Maxmem= %d, boundary= %d, " "alignment= %d", dmat->common.lowaddr, ptoa((vm_paddr_t)Maxmem), dmat->common.boundary, dmat->common.alignment); CTR3(KTR_BUSDMA, "map= %p, nobouncemap= %p, pagesneeded= %d", map, &nobounce_dmamap, map->pagesneeded); /* * Count the number of bounce pages * needed in order to complete this transfer */ vaddr = (vm_offset_t)buf; vendaddr = (vm_offset_t)buf + buflen; while (vaddr < vendaddr) { sg_len = PAGE_SIZE - ((vm_offset_t)vaddr & PAGE_MASK); if (pmap == kernel_pmap) paddr = pmap_kextract(vaddr); else paddr = pmap_extract(pmap, vaddr); if 
(bus_dma_run_filter(&dmat->common, paddr) != 0) { sg_len = roundup2(sg_len, dmat->common.alignment); map->pagesneeded++; } vaddr += sg_len; } CTR1(KTR_BUSDMA, "pagesneeded= %d\n", map->pagesneeded); } } static void _bus_dmamap_count_ma(bus_dma_tag_t dmat, bus_dmamap_t map, struct vm_page **ma, int ma_offs, bus_size_t buflen, int flags) { bus_size_t sg_len, max_sgsize; int page_index; vm_paddr_t paddr; if (map != &nobounce_dmamap && map->pagesneeded == 0) { CTR4(KTR_BUSDMA, "lowaddr= %d Maxmem= %d, boundary= %d, " "alignment= %d", dmat->common.lowaddr, ptoa((vm_paddr_t)Maxmem), dmat->common.boundary, dmat->common.alignment); CTR3(KTR_BUSDMA, "map= %p, nobouncemap= %p, pagesneeded= %d", map, &nobounce_dmamap, map->pagesneeded); /* * Count the number of bounce pages * needed in order to complete this transfer */ page_index = 0; while (buflen > 0) { paddr = VM_PAGE_TO_PHYS(ma[page_index]) + ma_offs; sg_len = PAGE_SIZE - ma_offs; max_sgsize = MIN(buflen, dmat->common.maxsegsz); sg_len = MIN(sg_len, max_sgsize); if (bus_dma_run_filter(&dmat->common, paddr) != 0) { sg_len = roundup2(sg_len, dmat->common.alignment); sg_len = MIN(sg_len, max_sgsize); KASSERT((sg_len & (dmat->common.alignment - 1)) == 0, ("Segment size is not aligned")); map->pagesneeded++; } if (((ma_offs + sg_len) & ~PAGE_MASK) != 0) page_index++; ma_offs = (ma_offs + sg_len) & PAGE_MASK; KASSERT(buflen >= sg_len, ("Segment length overruns original buffer")); buflen -= sg_len; } CTR1(KTR_BUSDMA, "pagesneeded= %d\n", map->pagesneeded); } } static int _bus_dmamap_reserve_pages(bus_dma_tag_t dmat, bus_dmamap_t map, int flags) { /* Reserve Necessary Bounce Pages */ mtx_lock(&bounce_lock); if (flags & BUS_DMA_NOWAIT) { if (reserve_bounce_pages(dmat, map, 0) != 0) { mtx_unlock(&bounce_lock); return (ENOMEM); } } else { if (reserve_bounce_pages(dmat, map, 1) != 0) { /* Queue us for resources */ STAILQ_INSERT_TAIL(&bounce_map_waitinglist, map, links); mtx_unlock(&bounce_lock); return (EINPROGRESS); } } mtx_unlock(&bounce_lock); return (0); } /* * Add a single contiguous physical range to the segment list. */ static int _bus_dmamap_addseg(bus_dma_tag_t dmat, bus_dmamap_t map, vm_paddr_t curaddr, bus_size_t sgsize, bus_dma_segment_t *segs, int *segp) { bus_addr_t baddr, bmask; int seg; KASSERT(curaddr <= BUS_SPACE_MAXADDR, ("ds_addr %#jx > BUS_SPACE_MAXADDR %#jx; dmat %p fl %#x low %#jx " "hi %#jx", (uintmax_t)curaddr, (uintmax_t)BUS_SPACE_MAXADDR, dmat, dmat->bounce_flags, (uintmax_t)dmat->common.lowaddr, (uintmax_t)dmat->common.highaddr)); /* * Make sure we don't cross any boundaries. */ bmask = ~(dmat->common.boundary - 1); if (dmat->common.boundary > 0) { baddr = (curaddr + dmat->common.boundary) & bmask; if (sgsize > (baddr - curaddr)) sgsize = (baddr - curaddr); } /* * Insert chunk into a segment, coalescing with * previous segment if possible. */ seg = *segp; if (seg == -1) { seg = 0; segs[seg].ds_addr = curaddr; segs[seg].ds_len = sgsize; } else { if (curaddr == segs[seg].ds_addr + segs[seg].ds_len && (segs[seg].ds_len + sgsize) <= dmat->common.maxsegsz && (dmat->common.boundary == 0 || (segs[seg].ds_addr & bmask) == (curaddr & bmask))) segs[seg].ds_len += sgsize; else { if (++seg >= dmat->common.nsegments) return (0); segs[seg].ds_addr = curaddr; segs[seg].ds_len = sgsize; } } *segp = seg; return (sgsize); } /* * Utility function to load a physical buffer. segp contains * the starting segment on entrace, and the ending segment on exit. 
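
_bus_dmamap_addseg() above clips each chunk so it never crosses the tag's boundary, then merges it with the previous segment when the addresses are contiguous, the combined length still fits maxsegsz, and both ends lie in the same boundary window. A standalone model of just the boundary clipping (the function name and the test addresses are arbitrary):

    #include <assert.h>
    #include <stdint.h>

    /*
     * Clip a segment starting at curaddr so it does not cross a power-of-two
     * boundary; mirrors the bmask/baddr logic in _bus_dmamap_addseg().
     */
    static uint64_t
    clip_to_boundary(uint64_t curaddr, uint64_t sgsize, uint64_t boundary)
    {
            uint64_t baddr, bmask;

            if (boundary == 0)
                    return (sgsize);
            bmask = ~(boundary - 1);
            baddr = (curaddr + boundary) & bmask;   /* next boundary above */
            if (sgsize > baddr - curaddr)
                    sgsize = baddr - curaddr;
            return (sgsize);
    }

    int
    main(void)
    {
            /* 0x1f00 + 0x400 would cross the 4KB line at 0x2000: clip to 0x100. */
            assert(clip_to_boundary(0x1f00, 0x400, 0x1000) == 0x100);
            /* Fully inside one window: unchanged. */
            assert(clip_to_boundary(0x1000, 0x400, 0x1000) == 0x400);
            return (0);
    }
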
*/ static int bounce_bus_dmamap_load_phys(bus_dma_tag_t dmat, bus_dmamap_t map, vm_paddr_t buf, bus_size_t buflen, int flags, bus_dma_segment_t *segs, int *segp) { bus_size_t sgsize; vm_paddr_t curaddr; int error; if (map == NULL) map = &nobounce_dmamap; if (segs == NULL) segs = dmat->segments; if ((dmat->bounce_flags & BUS_DMA_COULD_BOUNCE) != 0) { _bus_dmamap_count_phys(dmat, map, buf, buflen, flags); if (map->pagesneeded != 0) { error = _bus_dmamap_reserve_pages(dmat, map, flags); if (error) return (error); } } while (buflen > 0) { curaddr = buf; sgsize = MIN(buflen, dmat->common.maxsegsz); if ((dmat->bounce_flags & BUS_DMA_COULD_BOUNCE) != 0 && map->pagesneeded != 0 && bus_dma_run_filter(&dmat->common, curaddr)) { sgsize = MIN(sgsize, PAGE_SIZE - (curaddr & PAGE_MASK)); curaddr = add_bounce_page(dmat, map, 0, curaddr, 0, sgsize); } sgsize = _bus_dmamap_addseg(dmat, map, curaddr, sgsize, segs, segp); if (sgsize == 0) break; buf += sgsize; buflen -= sgsize; } /* * Did we fit? */ return (buflen != 0 ? EFBIG : 0); /* XXX better return value here? */ } /* * Utility function to load a linear buffer. segp contains * the starting segment on entrace, and the ending segment on exit. */ static int bounce_bus_dmamap_load_buffer(bus_dma_tag_t dmat, bus_dmamap_t map, void *buf, bus_size_t buflen, pmap_t pmap, int flags, bus_dma_segment_t *segs, int *segp) { bus_size_t sgsize, max_sgsize; vm_paddr_t curaddr; vm_offset_t kvaddr, vaddr; int error; if (map == NULL) map = &nobounce_dmamap; if (segs == NULL) segs = dmat->segments; if ((dmat->bounce_flags & BUS_DMA_COULD_BOUNCE) != 0) { _bus_dmamap_count_pages(dmat, map, pmap, buf, buflen, flags); if (map->pagesneeded != 0) { error = _bus_dmamap_reserve_pages(dmat, map, flags); if (error) return (error); } } vaddr = (vm_offset_t)buf; while (buflen > 0) { /* * Get the physical address for this segment. */ if (pmap == kernel_pmap) { curaddr = pmap_kextract(vaddr); kvaddr = vaddr; } else { curaddr = pmap_extract(pmap, vaddr); kvaddr = 0; } /* * Compute the segment size, and adjust counts. */ max_sgsize = MIN(buflen, dmat->common.maxsegsz); sgsize = PAGE_SIZE - (curaddr & PAGE_MASK); if ((dmat->bounce_flags & BUS_DMA_COULD_BOUNCE) != 0 && map->pagesneeded != 0 && bus_dma_run_filter(&dmat->common, curaddr)) { sgsize = roundup2(sgsize, dmat->common.alignment); sgsize = MIN(sgsize, max_sgsize); curaddr = add_bounce_page(dmat, map, kvaddr, curaddr, 0, sgsize); } else { sgsize = MIN(sgsize, max_sgsize); } sgsize = _bus_dmamap_addseg(dmat, map, curaddr, sgsize, segs, segp); if (sgsize == 0) break; vaddr += sgsize; buflen -= sgsize; } /* * Did we fit? */ return (buflen != 0 ? EFBIG : 0); /* XXX better return value here? */ } static int bounce_bus_dmamap_load_ma(bus_dma_tag_t dmat, bus_dmamap_t map, struct vm_page **ma, bus_size_t buflen, int ma_offs, int flags, bus_dma_segment_t *segs, int *segp) { vm_paddr_t paddr, next_paddr; int error, page_index; bus_size_t sgsize, max_sgsize; if (dmat->common.flags & BUS_DMA_KEEP_PG_OFFSET) { /* * If we have to keep the offset of each page this function * is not suitable, switch back to bus_dmamap_load_ma_triv * which is going to do the right thing in this case. 
*/ error = bus_dmamap_load_ma_triv(dmat, map, ma, buflen, ma_offs, flags, segs, segp); return (error); } if (map == NULL) map = &nobounce_dmamap; if (segs == NULL) segs = dmat->segments; if ((dmat->bounce_flags & BUS_DMA_COULD_BOUNCE) != 0) { _bus_dmamap_count_ma(dmat, map, ma, ma_offs, buflen, flags); if (map->pagesneeded != 0) { error = _bus_dmamap_reserve_pages(dmat, map, flags); if (error) return (error); } } page_index = 0; while (buflen > 0) { /* * Compute the segment size, and adjust counts. */ paddr = VM_PAGE_TO_PHYS(ma[page_index]) + ma_offs; max_sgsize = MIN(buflen, dmat->common.maxsegsz); sgsize = PAGE_SIZE - ma_offs; if ((dmat->bounce_flags & BUS_DMA_COULD_BOUNCE) != 0 && map->pagesneeded != 0 && bus_dma_run_filter(&dmat->common, paddr)) { sgsize = roundup2(sgsize, dmat->common.alignment); sgsize = MIN(sgsize, max_sgsize); KASSERT((sgsize & (dmat->common.alignment - 1)) == 0, ("Segment size is not aligned")); /* * Check if two pages of the user provided buffer * are used. */ if ((ma_offs + sgsize) > PAGE_SIZE) next_paddr = VM_PAGE_TO_PHYS(ma[page_index + 1]); else next_paddr = 0; paddr = add_bounce_page(dmat, map, 0, paddr, next_paddr, sgsize); } else { sgsize = MIN(sgsize, max_sgsize); } sgsize = _bus_dmamap_addseg(dmat, map, paddr, sgsize, segs, segp); if (sgsize == 0) break; KASSERT(buflen >= sgsize, ("Segment length overruns original buffer")); buflen -= sgsize; if (((ma_offs + sgsize) & ~PAGE_MASK) != 0) page_index++; ma_offs = (ma_offs + sgsize) & PAGE_MASK; } /* * Did we fit? */ return (buflen != 0 ? EFBIG : 0); /* XXX better return value here? */ } static void bounce_bus_dmamap_waitok(bus_dma_tag_t dmat, bus_dmamap_t map, struct memdesc *mem, bus_dmamap_callback_t *callback, void *callback_arg) { if (map == NULL) return; map->mem = *mem; map->dmat = dmat; map->callback = callback; map->callback_arg = callback_arg; } static bus_dma_segment_t * bounce_bus_dmamap_complete(bus_dma_tag_t dmat, bus_dmamap_t map, bus_dma_segment_t *segs, int nsegs, int error) { if (segs == NULL) segs = dmat->segments; return (segs); } /* * Release the mapping held by map. */ static void bounce_bus_dmamap_unload(bus_dma_tag_t dmat, bus_dmamap_t map) { struct bounce_page *bpage; if (map == NULL) return; while ((bpage = STAILQ_FIRST(&map->bpages)) != NULL) { STAILQ_REMOVE_HEAD(&map->bpages, links); free_bounce_page(dmat, bpage); } } static void bounce_bus_dmamap_sync(bus_dma_tag_t dmat, bus_dmamap_t map, bus_dmasync_op_t op) { struct bounce_page *bpage; vm_offset_t datavaddr, tempvaddr; bus_size_t datacount1, datacount2; if (map == NULL || (bpage = STAILQ_FIRST(&map->bpages)) == NULL) return; /* * Handle data bouncing. We might also want to add support for * invalidating the caches on broken hardware. */ CTR4(KTR_BUSDMA, "%s: tag %p tag flags 0x%x op 0x%x " "performing bounce", __func__, dmat, dmat->common.flags, op); if ((op & BUS_DMASYNC_PREWRITE) != 0) { while (bpage != NULL) { tempvaddr = 0; datavaddr = bpage->datavaddr; datacount1 = bpage->datacount; if (datavaddr == 0) { tempvaddr = pmap_quick_enter_page(bpage->datapage[0]); datavaddr = tempvaddr | bpage->dataoffs; datacount1 = min(PAGE_SIZE - bpage->dataoffs, datacount1); } bcopy((void *)datavaddr, (void *)bpage->vaddr, datacount1); if (tempvaddr != 0) pmap_quick_remove_page(tempvaddr); if (bpage->datapage[1] == 0) { KASSERT(datacount1 == bpage->datacount, ("Mismatch between data size and provided memory space")); goto next_w; } /* * We are dealing with an unmapped buffer that expands * over two pages. 
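The PREWRITE/POSTREAD copy loops above only run when the driver brackets its transfers with bus_dmamap_sync(); a hedged sketch of that pairing follows (example_softc and example_start_hw are hypothetical stand-ins for a real device):

#include <sys/param.h>
#include <sys/bus.h>
#include <machine/bus.h>

struct example_softc;				/* hypothetical driver softc */
void example_start_hw(struct example_softc *);	/* hypothetical doorbell write */

static void
example_roundtrip(bus_dma_tag_t tag, bus_dmamap_t map, struct example_softc *sc)
{
	/* CPU filled the buffer; copy into bounce pages before the device reads. */
	bus_dmamap_sync(tag, map, BUS_DMASYNC_PREWRITE);
	example_start_hw(sc);

	/* Later (e.g. in the interrupt handler), after the device wrote data: */
	bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTREAD);
	/* Only now is the CPU guaranteed to see what the device produced. */
}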
*/ datavaddr = pmap_quick_enter_page(bpage->datapage[1]); datacount2 = bpage->datacount - datacount1; bcopy((void *)datavaddr, (void *)(bpage->vaddr + datacount1), datacount2); pmap_quick_remove_page(datavaddr); next_w: bpage = STAILQ_NEXT(bpage, links); } dmat->bounce_zone->total_bounced++; } if ((op & BUS_DMASYNC_POSTREAD) != 0) { while (bpage != NULL) { tempvaddr = 0; datavaddr = bpage->datavaddr; datacount1 = bpage->datacount; if (datavaddr == 0) { tempvaddr = pmap_quick_enter_page(bpage->datapage[0]); datavaddr = tempvaddr | bpage->dataoffs; datacount1 = min(PAGE_SIZE - bpage->dataoffs, datacount1); } bcopy((void *)bpage->vaddr, (void *)datavaddr, datacount1); if (tempvaddr != 0) pmap_quick_remove_page(tempvaddr); if (bpage->datapage[1] == 0) { KASSERT(datacount1 == bpage->datacount, ("Mismatch between data size and provided memory space")); goto next_r; } /* * We are dealing with an unmapped buffer that expands * over two pages. */ datavaddr = pmap_quick_enter_page(bpage->datapage[1]); datacount2 = bpage->datacount - datacount1; bcopy((void *)(bpage->vaddr + datacount1), (void *)datavaddr, datacount2); pmap_quick_remove_page(datavaddr); next_r: bpage = STAILQ_NEXT(bpage, links); } dmat->bounce_zone->total_bounced++; } } static void init_bounce_pages(void *dummy __unused) { total_bpages = 0; STAILQ_INIT(&bounce_zone_list); STAILQ_INIT(&bounce_map_waitinglist); STAILQ_INIT(&bounce_map_callbacklist); mtx_init(&bounce_lock, "bounce pages lock", NULL, MTX_DEF); } SYSINIT(bpages, SI_SUB_LOCK, SI_ORDER_ANY, init_bounce_pages, NULL); static struct sysctl_ctx_list * busdma_sysctl_tree(struct bounce_zone *bz) { return (&bz->sysctl_tree); } static struct sysctl_oid * busdma_sysctl_tree_top(struct bounce_zone *bz) { return (bz->sysctl_tree_top); } static int alloc_bounce_zone(bus_dma_tag_t dmat) { struct bounce_zone *bz; /* Check to see if we already have a suitable zone */ STAILQ_FOREACH(bz, &bounce_zone_list, links) { if (dmat->common.alignment <= bz->alignment && dmat->common.lowaddr >= bz->lowaddr && dmat->common.domain == bz->domain) { dmat->bounce_zone = bz; return (0); } } if ((bz = (struct bounce_zone *)malloc(sizeof(*bz), M_DEVBUF, M_NOWAIT | M_ZERO)) == NULL) return (ENOMEM); STAILQ_INIT(&bz->bounce_page_list); bz->free_bpages = 0; bz->reserved_bpages = 0; bz->active_bpages = 0; bz->lowaddr = dmat->common.lowaddr; bz->alignment = MAX(dmat->common.alignment, PAGE_SIZE); bz->map_count = 0; bz->domain = dmat->common.domain; snprintf(bz->zoneid, 8, "zone%d", busdma_zonecount); busdma_zonecount++; snprintf(bz->lowaddrid, 18, "%#jx", (uintmax_t)bz->lowaddr); STAILQ_INSERT_TAIL(&bounce_zone_list, bz, links); dmat->bounce_zone = bz; sysctl_ctx_init(&bz->sysctl_tree); bz->sysctl_tree_top = SYSCTL_ADD_NODE(&bz->sysctl_tree, SYSCTL_STATIC_CHILDREN(_hw_busdma), OID_AUTO, bz->zoneid, CTLFLAG_RD, 0, ""); if (bz->sysctl_tree_top == NULL) { sysctl_ctx_free(&bz->sysctl_tree); return (0); /* XXX error code? 
*/ } SYSCTL_ADD_INT(busdma_sysctl_tree(bz), SYSCTL_CHILDREN(busdma_sysctl_tree_top(bz)), OID_AUTO, "total_bpages", CTLFLAG_RD, &bz->total_bpages, 0, "Total bounce pages"); SYSCTL_ADD_INT(busdma_sysctl_tree(bz), SYSCTL_CHILDREN(busdma_sysctl_tree_top(bz)), OID_AUTO, "free_bpages", CTLFLAG_RD, &bz->free_bpages, 0, "Free bounce pages"); SYSCTL_ADD_INT(busdma_sysctl_tree(bz), SYSCTL_CHILDREN(busdma_sysctl_tree_top(bz)), OID_AUTO, "reserved_bpages", CTLFLAG_RD, &bz->reserved_bpages, 0, "Reserved bounce pages"); SYSCTL_ADD_INT(busdma_sysctl_tree(bz), SYSCTL_CHILDREN(busdma_sysctl_tree_top(bz)), OID_AUTO, "active_bpages", CTLFLAG_RD, &bz->active_bpages, 0, "Active bounce pages"); SYSCTL_ADD_INT(busdma_sysctl_tree(bz), SYSCTL_CHILDREN(busdma_sysctl_tree_top(bz)), OID_AUTO, "total_bounced", CTLFLAG_RD, &bz->total_bounced, 0, "Total bounce requests"); SYSCTL_ADD_INT(busdma_sysctl_tree(bz), SYSCTL_CHILDREN(busdma_sysctl_tree_top(bz)), OID_AUTO, "total_deferred", CTLFLAG_RD, &bz->total_deferred, 0, "Total bounce requests that were deferred"); SYSCTL_ADD_STRING(busdma_sysctl_tree(bz), SYSCTL_CHILDREN(busdma_sysctl_tree_top(bz)), OID_AUTO, "lowaddr", CTLFLAG_RD, bz->lowaddrid, 0, ""); SYSCTL_ADD_UAUTO(busdma_sysctl_tree(bz), SYSCTL_CHILDREN(busdma_sysctl_tree_top(bz)), OID_AUTO, "alignment", CTLFLAG_RD, &bz->alignment, ""); SYSCTL_ADD_INT(busdma_sysctl_tree(bz), SYSCTL_CHILDREN(busdma_sysctl_tree_top(bz)), OID_AUTO, "domain", CTLFLAG_RD, &bz->domain, 0, "memory domain"); return (0); } static int alloc_bounce_pages(bus_dma_tag_t dmat, u_int numpages) { struct bounce_zone *bz; int count; bz = dmat->bounce_zone; count = 0; while (numpages > 0) { struct bounce_page *bpage; bpage = malloc_domainset(sizeof(*bpage), M_DEVBUF, DOMAINSET_PREF(dmat->common.domain), M_NOWAIT | M_ZERO); if (bpage == NULL) break; bpage->vaddr = (vm_offset_t)contigmalloc_domainset(PAGE_SIZE, M_DEVBUF, DOMAINSET_PREF(dmat->common.domain), M_NOWAIT, 0ul, bz->lowaddr, PAGE_SIZE, 0); if (bpage->vaddr == 0) { free_domain(bpage, M_DEVBUF); break; } bpage->busaddr = pmap_kextract(bpage->vaddr); mtx_lock(&bounce_lock); STAILQ_INSERT_TAIL(&bz->bounce_page_list, bpage, links); total_bpages++; bz->total_bpages++; bz->free_bpages++; mtx_unlock(&bounce_lock); count++; numpages--; } return (count); } static int reserve_bounce_pages(bus_dma_tag_t dmat, bus_dmamap_t map, int commit) { struct bounce_zone *bz; int pages; mtx_assert(&bounce_lock, MA_OWNED); bz = dmat->bounce_zone; pages = MIN(bz->free_bpages, map->pagesneeded - map->pagesreserved); if (commit == 0 && map->pagesneeded > (map->pagesreserved + pages)) return (map->pagesneeded - (map->pagesreserved + pages)); bz->free_bpages -= pages; bz->reserved_bpages += pages; map->pagesreserved += pages; pages = map->pagesneeded - map->pagesreserved; return (pages); } static bus_addr_t add_bounce_page(bus_dma_tag_t dmat, bus_dmamap_t map, vm_offset_t vaddr, vm_paddr_t addr1, vm_paddr_t addr2, bus_size_t size) { struct bounce_zone *bz; struct bounce_page *bpage; KASSERT(dmat->bounce_zone != NULL, ("no bounce zone in dma tag")); KASSERT(map != NULL && map != &nobounce_dmamap, ("add_bounce_page: bad map %p", map)); bz = dmat->bounce_zone; if (map->pagesneeded == 0) panic("add_bounce_page: map doesn't need any pages"); map->pagesneeded--; if (map->pagesreserved == 0) panic("add_bounce_page: map doesn't need any pages"); map->pagesreserved--; mtx_lock(&bounce_lock); bpage = STAILQ_FIRST(&bz->bounce_page_list); if (bpage == NULL) panic("add_bounce_page: free page list is empty"); 
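The page accounting in reserve_bounce_pages() above is easy to trace with concrete numbers; this standalone illustration (values made up) shows the BUS_DMA_NOWAIT case reporting a shortfall instead of committing a partial reservation:

#include <stdio.h>

#define	MIN(a, b)	((a) < (b) ? (a) : (b))

int
main(void)
{
	int free_bpages = 2;	/* pages left in the zone (made up) */
	int pagesneeded = 4;	/* what this map needs (made up) */
	int pagesreserved = 0;
	int commit = 0;		/* 0 == BUS_DMA_NOWAIT caller */
	int pages;

	pages = MIN(free_bpages, pagesneeded - pagesreserved);	/* 2 */
	if (commit == 0 && pagesneeded > pagesreserved + pages) {
		/* Nothing is reserved; the caller sees an ENOMEM-style failure. */
		printf("short by %d pages\n",
		    pagesneeded - (pagesreserved + pages));
		return (0);
	}
	free_bpages -= pages;
	pagesreserved += pages;
	printf("reserved %d, still waiting for %d\n", pagesreserved,
	    pagesneeded - pagesreserved);
	return (0);
}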
STAILQ_REMOVE_HEAD(&bz->bounce_page_list, links); bz->reserved_bpages--; bz->active_bpages++; mtx_unlock(&bounce_lock); if (dmat->common.flags & BUS_DMA_KEEP_PG_OFFSET) { /* Page offset needs to be preserved. */ bpage->vaddr |= addr1 & PAGE_MASK; bpage->busaddr |= addr1 & PAGE_MASK; KASSERT(addr2 == 0, ("Trying to bounce multiple pages with BUS_DMA_KEEP_PG_OFFSET")); } bpage->datavaddr = vaddr; bpage->datapage[0] = PHYS_TO_VM_PAGE(addr1); KASSERT((addr2 & PAGE_MASK) == 0, ("Second page is not aligned")); bpage->datapage[1] = PHYS_TO_VM_PAGE(addr2); bpage->dataoffs = addr1 & PAGE_MASK; bpage->datacount = size; STAILQ_INSERT_TAIL(&(map->bpages), bpage, links); return (bpage->busaddr); } static void free_bounce_page(bus_dma_tag_t dmat, struct bounce_page *bpage) { struct bus_dmamap *map; struct bounce_zone *bz; bz = dmat->bounce_zone; bpage->datavaddr = 0; bpage->datacount = 0; if (dmat->common.flags & BUS_DMA_KEEP_PG_OFFSET) { /* * Reset the bounce page to start at offset 0. Other uses * of this bounce page may need to store a full page of * data and/or assume it starts on a page boundary. */ bpage->vaddr &= ~PAGE_MASK; bpage->busaddr &= ~PAGE_MASK; } mtx_lock(&bounce_lock); STAILQ_INSERT_HEAD(&bz->bounce_page_list, bpage, links); bz->free_bpages++; bz->active_bpages--; if ((map = STAILQ_FIRST(&bounce_map_waitinglist)) != NULL) { if (reserve_bounce_pages(map->dmat, map, 1) == 0) { STAILQ_REMOVE_HEAD(&bounce_map_waitinglist, links); STAILQ_INSERT_TAIL(&bounce_map_callbacklist, map, links); busdma_swi_pending = 1; bz->total_deferred++; swi_sched(vm_ih, 0); } } mtx_unlock(&bounce_lock); } void busdma_swi(void) { bus_dma_tag_t dmat; struct bus_dmamap *map; mtx_lock(&bounce_lock); while ((map = STAILQ_FIRST(&bounce_map_callbacklist)) != NULL) { STAILQ_REMOVE_HEAD(&bounce_map_callbacklist, links); mtx_unlock(&bounce_lock); dmat = map->dmat; (dmat->common.lockfunc)(dmat->common.lockfuncarg, BUS_DMA_LOCK); bus_dmamap_load_mem(map->dmat, map, &map->mem, map->callback, map->callback_arg, BUS_DMA_WAITOK); (dmat->common.lockfunc)(dmat->common.lockfuncarg, BUS_DMA_UNLOCK); mtx_lock(&bounce_lock); } mtx_unlock(&bounce_lock); } struct bus_dma_impl bus_dma_bounce_impl = { .tag_create = bounce_bus_dma_tag_create, .tag_destroy = bounce_bus_dma_tag_destroy, .tag_set_domain = bounce_bus_dma_tag_set_domain, .map_create = bounce_bus_dmamap_create, .map_destroy = bounce_bus_dmamap_destroy, .mem_alloc = bounce_bus_dmamem_alloc, .mem_free = bounce_bus_dmamem_free, .load_phys = bounce_bus_dmamap_load_phys, .load_buffer = bounce_bus_dmamap_load_buffer, .load_ma = bounce_bus_dmamap_load_ma, .map_waitok = bounce_bus_dmamap_waitok, .map_complete = bounce_bus_dmamap_complete, .map_unload = bounce_bus_dmamap_unload, .map_sync = bounce_bus_dmamap_sync, }; Index: user/ngie/bug-237403/sys/x86/x86/mp_x86.c =================================================================== --- user/ngie/bug-237403/sys/x86/x86/mp_x86.c (revision 346925) +++ user/ngie/bug-237403/sys/x86/x86/mp_x86.c (revision 346926) @@ -1,1790 +1,1799 @@ /*- * Copyright (c) 1996, by Steve Passe * Copyright (c) 2003, by Peter Wemm * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. 
The name of the developer may NOT be used to endorse or promote products * derived from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #ifdef __i386__ #include "opt_apic.h" #endif #include "opt_cpu.h" #include "opt_kstack_pages.h" #include "opt_pmap.h" #include "opt_sched.h" #include "opt_smp.h" #include #include #include #include /* cngetc() */ #include #ifdef GPROF #include #endif #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include static MALLOC_DEFINE(M_CPUS, "cpus", "CPU items"); /* lock region used by kernel profiling */ int mcount_lock; int mp_naps; /* # of Applications processors */ int boot_cpu_id = -1; /* designated BSP */ /* AP uses this during bootstrap. Do not staticize. */ char *bootSTK; int bootAP; /* Free these after use */ void *bootstacks[MAXCPU]; void *dpcpu; struct pcb stoppcbs[MAXCPU]; struct susppcb **susppcbs; #ifdef COUNT_IPIS /* Interrupt counts. */ static u_long *ipi_preempt_counts[MAXCPU]; static u_long *ipi_ast_counts[MAXCPU]; u_long *ipi_invltlb_counts[MAXCPU]; u_long *ipi_invlrng_counts[MAXCPU]; u_long *ipi_invlpg_counts[MAXCPU]; u_long *ipi_invlcache_counts[MAXCPU]; u_long *ipi_rendezvous_counts[MAXCPU]; static u_long *ipi_hardclock_counts[MAXCPU]; #endif /* Default cpu_ops implementation. */ struct cpu_ops cpu_ops; /* * Local data and functions. */ static volatile cpuset_t ipi_stop_nmi_pending; volatile cpuset_t resuming_cpus; volatile cpuset_t toresume_cpus; /* used to hold the AP's until we are ready to release them */ struct mtx ap_boot_mtx; /* Set to 1 once we're ready to let the APs out of the pen. */ volatile int aps_ready = 0; /* * Store data from cpu_add() until later in the boot when we actually setup * the APs. 
*/ struct cpu_info *cpu_info; int *apic_cpuids; int cpu_apic_ids[MAXCPU]; _Static_assert(MAXCPU <= MAX_APIC_ID, "MAXCPU cannot be larger that MAX_APIC_ID"); _Static_assert(xAPIC_MAX_APIC_ID <= MAX_APIC_ID, "xAPIC_MAX_APIC_ID cannot be larger that MAX_APIC_ID"); /* Holds pending bitmap based IPIs per CPU */ volatile u_int cpu_ipi_pending[MAXCPU]; static void release_aps(void *dummy); static void cpustop_handler_post(u_int cpu); static int hyperthreading_allowed = 1; SYSCTL_INT(_machdep, OID_AUTO, hyperthreading_allowed, CTLFLAG_RDTUN, &hyperthreading_allowed, 0, "Use Intel HTT logical CPUs"); static struct topo_node topo_root; static int pkg_id_shift; static int node_id_shift; static int core_id_shift; static int disabled_cpus; struct cache_info { int id_shift; int present; } static caches[MAX_CACHE_LEVELS]; unsigned int boot_address; #define MiB(v) (v ## ULL << 20) void mem_range_AP_init(void) { if (mem_range_softc.mr_op && mem_range_softc.mr_op->initAP) mem_range_softc.mr_op->initAP(&mem_range_softc); } /* * Round up to the next power of two, if necessary, and then * take log2. * Returns -1 if argument is zero. */ static __inline int mask_width(u_int x) { return (fls(x << (1 - powerof2(x))) - 1); } /* * Add a cache level to the cache topology description. */ static int add_deterministic_cache(int type, int level, int share_count) { if (type == 0) return (0); if (type > 3) { printf("unexpected cache type %d\n", type); return (1); } if (type == 2) /* ignore instruction cache */ return (1); if (level == 0 || level > MAX_CACHE_LEVELS) { printf("unexpected cache level %d\n", type); return (1); } if (caches[level - 1].present) { printf("WARNING: multiple entries for L%u data cache\n", level); printf("%u => %u\n", caches[level - 1].id_shift, mask_width(share_count)); } caches[level - 1].id_shift = mask_width(share_count); caches[level - 1].present = 1; if (caches[level - 1].id_shift > pkg_id_shift) { printf("WARNING: L%u data cache covers more " "APIC IDs than a package (%u > %u)\n", level, caches[level - 1].id_shift, pkg_id_shift); caches[level - 1].id_shift = pkg_id_shift; } if (caches[level - 1].id_shift < core_id_shift) { printf("WARNING: L%u data cache covers fewer " "APIC IDs than a core (%u < %u)\n", level, caches[level - 1].id_shift, core_id_shift); caches[level - 1].id_shift = core_id_shift; } return (1); } /* * Determine topology of processing units and caches for AMD CPUs. * See: * - AMD CPUID Specification (Publication # 25481) * - BKDG for AMD NPT Family 0Fh Processors (Publication # 32559) * - BKDG For AMD Family 10h Processors (Publication # 31116) * - BKDG For AMD Family 15h Models 00h-0Fh Processors (Publication # 42301) * - BKDG For AMD Family 16h Models 00h-0Fh Processors (Publication # 48751) * - PPR For AMD Family 17h Models 00h-0Fh Processors (Publication # 54945) */ static void topo_probe_amd(void) { u_int p[4]; uint64_t v; int level; int nodes_per_socket; int share_count; int type; int i; /* No multi-core capability. */ if ((amd_feature2 & AMDID2_CMP) == 0) return; /* For families 10h and newer. */ pkg_id_shift = (cpu_procinfo2 & AMDID_COREID_SIZE) >> AMDID_COREID_SIZE_SHIFT; /* For 0Fh family. */ if (pkg_id_shift == 0) pkg_id_shift = mask_width((cpu_procinfo2 & AMDID_CMP_CORES) + 1); /* * Families prior to 16h define the following value as * cores per compute unit and we don't really care about the AMD * compute units at the moment. Perhaps we should treat them as * cores and cores within the compute units as hardware threads, * but that's up for debate. 
* Later families define the value as threads per compute unit, * so we are following AMD's nomenclature here. */ if ((amd_feature2 & AMDID2_TOPOLOGY) != 0 && CPUID_TO_FAMILY(cpu_id) >= 0x16) { cpuid_count(0x8000001e, 0, p); share_count = ((p[1] >> 8) & 0xff) + 1; core_id_shift = mask_width(share_count); /* * For Zen (17h), gather Nodes per Processor. Each node is a * Zeppelin die; TR and EPYC CPUs will have multiple dies per * package. Communication latency between dies is higher than * within them. */ nodes_per_socket = ((p[2] >> 8) & 0x7) + 1; node_id_shift = pkg_id_shift - mask_width(nodes_per_socket); } if ((amd_feature2 & AMDID2_TOPOLOGY) != 0) { for (i = 0; ; i++) { cpuid_count(0x8000001d, i, p); type = p[0] & 0x1f; level = (p[0] >> 5) & 0x7; share_count = 1 + ((p[0] >> 14) & 0xfff); if (!add_deterministic_cache(type, level, share_count)) break; } } else { if (cpu_exthigh >= 0x80000005) { cpuid_count(0x80000005, 0, p); if (((p[2] >> 24) & 0xff) != 0) { caches[0].id_shift = 0; caches[0].present = 1; } } if (cpu_exthigh >= 0x80000006) { cpuid_count(0x80000006, 0, p); if (((p[2] >> 16) & 0xffff) != 0) { caches[1].id_shift = 0; caches[1].present = 1; } if (((p[3] >> 18) & 0x3fff) != 0) { nodes_per_socket = 1; if ((amd_feature2 & AMDID2_NODE_ID) != 0) { /* * Handle multi-node processors that * have multiple chips, each with its * own L3 cache, on the same die. */ v = rdmsr(0xc001100c); nodes_per_socket = 1 + ((v >> 3) & 0x7); } caches[2].id_shift = pkg_id_shift - mask_width(nodes_per_socket); caches[2].present = 1; } } } } /* * Determine topology of processing units for Intel CPUs * using CPUID Leaf 1 and Leaf 4, if supported. * See: * - Intel 64 Architecture Processor Topology Enumeration * - Intel 64 and IA-32 ArchitecturesSoftware Developer’s Manual, * Volume 3A: System Programming Guide, PROGRAMMING CONSIDERATIONS * FOR HARDWARE MULTI-THREADING CAPABLE PROCESSORS */ static void topo_probe_intel_0x4(void) { u_int p[4]; int max_cores; int max_logical; /* Both zero and one here mean one logical processor per package. */ max_logical = (cpu_feature & CPUID_HTT) != 0 ? (cpu_procinfo & CPUID_HTT_CORES) >> 16 : 1; if (max_logical <= 1) return; if (cpu_high >= 0x4) { cpuid_count(0x04, 0, p); max_cores = ((p[0] >> 26) & 0x3f) + 1; } else max_cores = 1; core_id_shift = mask_width(max_logical/max_cores); KASSERT(core_id_shift >= 0, ("intel topo: max_cores > max_logical\n")); pkg_id_shift = core_id_shift + mask_width(max_cores); } /* * Determine topology of processing units for Intel CPUs * using CPUID Leaf 11, if supported. * See: * - Intel 64 Architecture Processor Topology Enumeration * - Intel 64 and IA-32 ArchitecturesSoftware Developer’s Manual, * Volume 3A: System Programming Guide, PROGRAMMING CONSIDERATIONS * FOR HARDWARE MULTI-THREADING CAPABLE PROCESSORS */ static void topo_probe_intel_0xb(void) { u_int p[4]; int bits; int type; int i; /* Fall back if CPU leaf 11 doesn't really exist. */ cpuid_count(0x0b, 0, p); if (p[1] == 0) { topo_probe_intel_0x4(); return; } /* We only support three levels for now. 
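mask_width() and the leaf-4 fallback above reduce to a couple of shifts; this userland illustration (counts made up, fls()/powerof2() behaving as in the kernel headers) shows the resulting ID shifts for a hypothetical 8-core, 16-thread package:

#include <stdio.h>
#include <strings.h>	/* fls() */

#define	powerof2(x)	((((x) - 1) & (x)) == 0)

static int
mask_width(unsigned int x)	/* ceil(log2(x)); -1 for x == 0 */
{
	return (fls(x << (1 - powerof2(x))) - 1);
}

int
main(void)
{
	int max_logical = 16;	/* CPUID.1:EBX[23:16] (made up) */
	int max_cores = 8;	/* CPUID.4:EAX[31:26] + 1 (made up) */
	int core_id_shift, pkg_id_shift;

	core_id_shift = mask_width(max_logical / max_cores);	/* 1 */
	pkg_id_shift = core_id_shift + mask_width(max_cores);	/* 1 + 3 = 4 */
	printf("core_id_shift=%d pkg_id_shift=%d mask_width(6)=%d\n",
	    core_id_shift, pkg_id_shift, mask_width(6));	/* 1 4 3 */
	return (0);
}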
*/ for (i = 0; ; i++) { cpuid_count(0x0b, i, p); bits = p[0] & 0x1f; type = (p[2] >> 8) & 0xff; if (type == 0) break; /* TODO: check for duplicate (re-)assignment */ if (type == CPUID_TYPE_SMT) core_id_shift = bits; else if (type == CPUID_TYPE_CORE) pkg_id_shift = bits; else printf("unknown CPU level type %d\n", type); } if (pkg_id_shift < core_id_shift) { printf("WARNING: core covers more APIC IDs than a package\n"); core_id_shift = pkg_id_shift; } } /* * Determine topology of caches for Intel CPUs. * See: * - Intel 64 Architecture Processor Topology Enumeration * - Intel 64 and IA-32 Architectures Software Developer’s Manual * Volume 2A: Instruction Set Reference, A-M, * CPUID instruction */ static void topo_probe_intel_caches(void) { u_int p[4]; int level; int share_count; int type; int i; if (cpu_high < 0x4) { /* * Available cache level and sizes can be determined * via CPUID leaf 2, but that requires a huge table of hardcoded * values, so for now just assume L1 and L2 caches potentially * shared only by HTT processing units, if HTT is present. */ caches[0].id_shift = pkg_id_shift; caches[0].present = 1; caches[1].id_shift = pkg_id_shift; caches[1].present = 1; return; } for (i = 0; ; i++) { cpuid_count(0x4, i, p); type = p[0] & 0x1f; level = (p[0] >> 5) & 0x7; share_count = 1 + ((p[0] >> 14) & 0xfff); if (!add_deterministic_cache(type, level, share_count)) break; } } /* * Determine topology of processing units and caches for Intel CPUs. * See: * - Intel 64 Architecture Processor Topology Enumeration */ static void topo_probe_intel(void) { /* * Note that 0x1 <= cpu_high < 4 case should be * compatible with topo_probe_intel_0x4() logic when * CPUID.1:EBX[23:16] > 0 (cpu_cores will be 1) * or it should trigger the fallback otherwise. */ if (cpu_high >= 0xb) topo_probe_intel_0xb(); else if (cpu_high >= 0x1) topo_probe_intel_0x4(); topo_probe_intel_caches(); } /* * Topology information is queried only on BSP, on which this * code runs and for which it can query CPUID information. * Then topology is extrapolated on all packages using an * assumption that APIC ID to hardware component ID mapping is * homogenious. * That doesn't necesserily imply that the topology is uniform. */ void topo_probe(void) { static int cpu_topo_probed = 0; struct x86_topo_layer { int type; int subtype; int id_shift; } topo_layers[MAX_CACHE_LEVELS + 4]; struct topo_node *parent; struct topo_node *node; int layer; int nlayers; int node_id; int i; if (cpu_topo_probed) return; CPU_ZERO(&logical_cpus_mask); if (mp_ncpus <= 1) ; /* nothing */ else if (cpu_vendor_id == CPU_VENDOR_AMD) topo_probe_amd(); else if (cpu_vendor_id == CPU_VENDOR_INTEL) topo_probe_intel(); KASSERT(pkg_id_shift >= core_id_shift, ("bug in APIC topology discovery")); nlayers = 0; bzero(topo_layers, sizeof(topo_layers)); topo_layers[nlayers].type = TOPO_TYPE_PKG; topo_layers[nlayers].id_shift = pkg_id_shift; if (bootverbose) printf("Package ID shift: %u\n", topo_layers[nlayers].id_shift); nlayers++; if (pkg_id_shift > node_id_shift && node_id_shift != 0) { topo_layers[nlayers].type = TOPO_TYPE_GROUP; topo_layers[nlayers].id_shift = node_id_shift; if (bootverbose) printf("Node ID shift: %u\n", topo_layers[nlayers].id_shift); nlayers++; } /* * Consider all caches to be within a package/chip * and "in front" of all sub-components like * cores and hardware threads. 
*/ for (i = MAX_CACHE_LEVELS - 1; i >= 0; --i) { if (caches[i].present) { if (node_id_shift != 0) KASSERT(caches[i].id_shift <= node_id_shift, ("bug in APIC topology discovery")); KASSERT(caches[i].id_shift <= pkg_id_shift, ("bug in APIC topology discovery")); KASSERT(caches[i].id_shift >= core_id_shift, ("bug in APIC topology discovery")); topo_layers[nlayers].type = TOPO_TYPE_CACHE; topo_layers[nlayers].subtype = i + 1; topo_layers[nlayers].id_shift = caches[i].id_shift; if (bootverbose) printf("L%u cache ID shift: %u\n", topo_layers[nlayers].subtype, topo_layers[nlayers].id_shift); nlayers++; } } if (pkg_id_shift > core_id_shift) { topo_layers[nlayers].type = TOPO_TYPE_CORE; topo_layers[nlayers].id_shift = core_id_shift; if (bootverbose) printf("Core ID shift: %u\n", topo_layers[nlayers].id_shift); nlayers++; } topo_layers[nlayers].type = TOPO_TYPE_PU; topo_layers[nlayers].id_shift = 0; nlayers++; topo_init_root(&topo_root); for (i = 0; i <= max_apic_id; ++i) { if (!cpu_info[i].cpu_present) continue; parent = &topo_root; for (layer = 0; layer < nlayers; ++layer) { node_id = i >> topo_layers[layer].id_shift; parent = topo_add_node_by_hwid(parent, node_id, topo_layers[layer].type, topo_layers[layer].subtype); } } parent = &topo_root; for (layer = 0; layer < nlayers; ++layer) { node_id = boot_cpu_id >> topo_layers[layer].id_shift; node = topo_find_node_by_hwid(parent, node_id, topo_layers[layer].type, topo_layers[layer].subtype); topo_promote_child(node); parent = node; } cpu_topo_probed = 1; } /* * Assign logical CPU IDs to local APICs. */ void assign_cpu_ids(void) { struct topo_node *node; u_int smt_mask; int nhyper; smt_mask = (1u << core_id_shift) - 1; /* * Assign CPU IDs to local APIC IDs and disable any CPUs * beyond MAXCPU. CPU 0 is always assigned to the BSP. */ mp_ncpus = 0; nhyper = 0; TOPO_FOREACH(node, &topo_root) { if (node->type != TOPO_TYPE_PU) continue; if ((node->hwid & smt_mask) != (boot_cpu_id & smt_mask)) cpu_info[node->hwid].cpu_hyperthread = 1; if (resource_disabled("lapic", node->hwid)) { if (node->hwid != boot_cpu_id) cpu_info[node->hwid].cpu_disabled = 1; else printf("Cannot disable BSP, APIC ID = %d\n", node->hwid); } if (!hyperthreading_allowed && cpu_info[node->hwid].cpu_hyperthread) cpu_info[node->hwid].cpu_disabled = 1; if (mp_ncpus >= MAXCPU) cpu_info[node->hwid].cpu_disabled = 1; if (cpu_info[node->hwid].cpu_disabled) { disabled_cpus++; continue; } if (cpu_info[node->hwid].cpu_hyperthread) nhyper++; cpu_apic_ids[mp_ncpus] = node->hwid; apic_cpuids[node->hwid] = mp_ncpus; topo_set_pu_id(node, mp_ncpus); mp_ncpus++; } KASSERT(mp_maxid >= mp_ncpus - 1, ("%s: counters out of sync: max %d, count %d", __func__, mp_maxid, mp_ncpus)); mp_ncores = mp_ncpus - nhyper; smp_threads_per_core = mp_ncpus / mp_ncores; } /* * Print various information about the SMP system hardware and setup. 
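Once topo_probe() has settled the shift values above, an APIC ID is just a bit field; this illustration decomposes one hypothetical ID (the shifts correspond to a made-up 2-thread-per-core, 4-core-per-package layout):

#include <stdio.h>

int
main(void)
{
	unsigned int apic_id = 0xb;	/* example APIC ID: 0b1011 */
	int core_id_shift = 1;		/* 2 hardware threads per core (made up) */
	int pkg_id_shift = 3;		/* 8 APIC IDs per package (made up) */

	printf("package %u, core %u, thread %u\n",
	    apic_id >> pkg_id_shift,				/* 1 */
	    (apic_id >> core_id_shift) &
	    ((1u << (pkg_id_shift - core_id_shift)) - 1),	/* 1 */
	    apic_id & ((1u << core_id_shift) - 1));		/* 1 */
	return (0);
}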
*/ void cpu_mp_announce(void) { struct topo_node *node; const char *hyperthread; struct topo_analysis topology; printf("FreeBSD/SMP: "); if (topo_analyze(&topo_root, 1, &topology)) { printf("%d package(s)", topology.entities[TOPO_LEVEL_PKG]); if (topology.entities[TOPO_LEVEL_GROUP] > 1) printf(" x %d groups", topology.entities[TOPO_LEVEL_GROUP]); if (topology.entities[TOPO_LEVEL_CACHEGROUP] > 1) printf(" x %d cache groups", topology.entities[TOPO_LEVEL_CACHEGROUP]); if (topology.entities[TOPO_LEVEL_CORE] > 0) printf(" x %d core(s)", topology.entities[TOPO_LEVEL_CORE]); if (topology.entities[TOPO_LEVEL_THREAD] > 1) printf(" x %d hardware threads", topology.entities[TOPO_LEVEL_THREAD]); } else { printf("Non-uniform topology"); } printf("\n"); if (disabled_cpus) { printf("FreeBSD/SMP Online: "); if (topo_analyze(&topo_root, 0, &topology)) { printf("%d package(s)", topology.entities[TOPO_LEVEL_PKG]); if (topology.entities[TOPO_LEVEL_GROUP] > 1) printf(" x %d groups", topology.entities[TOPO_LEVEL_GROUP]); if (topology.entities[TOPO_LEVEL_CACHEGROUP] > 1) printf(" x %d cache groups", topology.entities[TOPO_LEVEL_CACHEGROUP]); if (topology.entities[TOPO_LEVEL_CORE] > 0) printf(" x %d core(s)", topology.entities[TOPO_LEVEL_CORE]); if (topology.entities[TOPO_LEVEL_THREAD] > 1) printf(" x %d hardware threads", topology.entities[TOPO_LEVEL_THREAD]); } else { printf("Non-uniform topology"); } printf("\n"); } if (!bootverbose) return; TOPO_FOREACH(node, &topo_root) { switch (node->type) { case TOPO_TYPE_PKG: printf("Package HW ID = %u\n", node->hwid); break; case TOPO_TYPE_CORE: printf("\tCore HW ID = %u\n", node->hwid); break; case TOPO_TYPE_PU: if (cpu_info[node->hwid].cpu_hyperthread) hyperthread = "/HT"; else hyperthread = ""; if (node->subtype == 0) printf("\t\tCPU (AP%s): APIC ID: %u" "(disabled)\n", hyperthread, node->hwid); else if (node->id == 0) printf("\t\tCPU0 (BSP): APIC ID: %u\n", node->hwid); else printf("\t\tCPU%u (AP%s): APIC ID: %u\n", node->id, hyperthread, node->hwid); break; default: /* ignored */ break; } } } /* * Add a scheduling group, a group of logical processors sharing * a particular cache (and, thus having an affinity), to the scheduling * topology. * This function recursively works on lower level caches. */ static void x86topo_add_sched_group(struct topo_node *root, struct cpu_group *cg_root) { struct topo_node *node; int nchildren; int ncores; int i; KASSERT(root->type == TOPO_TYPE_SYSTEM || root->type == TOPO_TYPE_CACHE || root->type == TOPO_TYPE_GROUP, ("x86topo_add_sched_group: bad type: %u", root->type)); CPU_COPY(&root->cpuset, &cg_root->cg_mask); cg_root->cg_count = root->cpu_count; if (root->type == TOPO_TYPE_SYSTEM) cg_root->cg_level = CG_SHARE_NONE; else cg_root->cg_level = root->subtype; /* * Check how many core nodes we have under the given root node. * If we have multiple logical processors, but not multiple * cores, then those processors must be hardware threads. */ ncores = 0; node = root; while (node != NULL) { if (node->type != TOPO_TYPE_CORE) { node = topo_next_node(root, node); continue; } ncores++; node = topo_next_nonchild_node(root, node); } if (cg_root->cg_level != CG_SHARE_NONE && root->cpu_count > 1 && ncores < 2) cg_root->cg_flags = CG_FLAG_SMT; /* * Find out how many cache nodes we have under the given root node. * We ignore cache nodes that cover all the same processors as the * root node. Also, we do not descend below found cache nodes. * That is, we count top-level "non-redundant" caches under the root * node. 
*/ nchildren = 0; node = root; while (node != NULL) { if ((node->type != TOPO_TYPE_GROUP && node->type != TOPO_TYPE_CACHE) || (root->type != TOPO_TYPE_SYSTEM && CPU_CMP(&node->cpuset, &root->cpuset) == 0)) { node = topo_next_node(root, node); continue; } nchildren++; node = topo_next_nonchild_node(root, node); } cg_root->cg_child = smp_topo_alloc(nchildren); cg_root->cg_children = nchildren; /* * Now find again the same cache nodes as above and recursively * build scheduling topologies for them. */ node = root; i = 0; while (node != NULL) { if ((node->type != TOPO_TYPE_GROUP && node->type != TOPO_TYPE_CACHE) || (root->type != TOPO_TYPE_SYSTEM && CPU_CMP(&node->cpuset, &root->cpuset) == 0)) { node = topo_next_node(root, node); continue; } cg_root->cg_child[i].cg_parent = cg_root; x86topo_add_sched_group(node, &cg_root->cg_child[i]); i++; node = topo_next_nonchild_node(root, node); } } /* * Build the MI scheduling topology from the discovered hardware topology. */ struct cpu_group * cpu_topo(void) { struct cpu_group *cg_root; if (mp_ncpus <= 1) return (smp_topo_none()); cg_root = smp_topo_alloc(1); x86topo_add_sched_group(&topo_root, cg_root); return (cg_root); } static void cpu_alloc(void *dummy __unused) { /* * Dynamically allocate the arrays that depend on the * maximum APIC ID. */ cpu_info = malloc(sizeof(*cpu_info) * (max_apic_id + 1), M_CPUS, M_WAITOK | M_ZERO); apic_cpuids = malloc(sizeof(*apic_cpuids) * (max_apic_id + 1), M_CPUS, M_WAITOK | M_ZERO); } SYSINIT(cpu_alloc, SI_SUB_CPU, SI_ORDER_FIRST, cpu_alloc, NULL); /* * Add a logical CPU to the topology. */ void cpu_add(u_int apic_id, char boot_cpu) { if (apic_id > max_apic_id) { panic("SMP: APIC ID %d too high", apic_id); return; } KASSERT(cpu_info[apic_id].cpu_present == 0, ("CPU %u added twice", apic_id)); cpu_info[apic_id].cpu_present = 1; if (boot_cpu) { KASSERT(boot_cpu_id == -1, ("CPU %u claims to be BSP, but CPU %u already is", apic_id, boot_cpu_id)); boot_cpu_id = apic_id; cpu_info[apic_id].cpu_bsp = 1; } if (bootverbose) printf("SMP: Added CPU %u (%s)\n", apic_id, boot_cpu ? "BSP" : "AP"); } void cpu_mp_setmaxid(void) { /* * mp_ncpus and mp_maxid should be already set by calls to cpu_add(). * If there were no calls to cpu_add() assume this is a UP system. */ if (mp_ncpus == 0) mp_ncpus = 1; } int cpu_mp_probe(void) { /* * Always record BSP in CPU map so that the mbuf init code works * correctly. */ CPU_SETOF(0, &all_cpus); return (mp_ncpus > 1); } /* Allocate memory for the AP trampoline. */ void alloc_ap_trampoline(vm_paddr_t *physmap, unsigned int *physmap_idx) { unsigned int i; bool allocated; allocated = false; for (i = *physmap_idx; i <= *physmap_idx; i -= 2) { /* * Find a memory region big enough and below the 1MB boundary * for the trampoline code. * NB: needs to be page aligned. */ if (physmap[i] >= MiB(1) || (trunc_page(physmap[i + 1]) - round_page(physmap[i])) < round_page(bootMP_size)) continue; allocated = true; /* * Try to steal from the end of the region to mimic previous * behaviour, else fallback to steal from the start. 
*/ if (physmap[i + 1] < MiB(1)) { boot_address = trunc_page(physmap[i + 1]); if ((physmap[i + 1] - boot_address) < bootMP_size) boot_address -= round_page(bootMP_size); physmap[i + 1] = boot_address; } else { boot_address = round_page(physmap[i]); physmap[i] = boot_address + round_page(bootMP_size); } if (physmap[i] == physmap[i + 1] && *physmap_idx != 0) { memmove(&physmap[i], &physmap[i + 2], sizeof(*physmap) * (*physmap_idx - i + 2)); *physmap_idx -= 2; } break; } if (!allocated) { boot_address = basemem * 1024 - bootMP_size; if (bootverbose) printf( "Cannot find enough space for the boot trampoline, placing it at %#x", boot_address); } } /* * AP CPU's call this to initialize themselves. */ void init_secondary_tail(void) { u_int cpuid; pmap_activate_boot(vmspace_pmap(proc0.p_vmspace)); /* * On real hardware, switch to x2apic mode if possible. Do it * after aps_ready was signalled, to avoid manipulating the * mode while BSP might still want to send some IPI to us * (second startup IPI is ignored on modern hardware etc). */ lapic_xapic_mode(); /* Initialize the PAT MSR. */ pmap_init_pat(); /* set up CPU registers and state */ cpu_setregs(); /* set up SSE/NX */ initializecpu(); /* set up FPU state on the AP */ #ifdef __amd64__ fpuinit(); #else npxinit(false); #endif if (cpu_ops.cpu_init) cpu_ops.cpu_init(); /* A quick check from sanity claus */ cpuid = PCPU_GET(cpuid); if (PCPU_GET(apic_id) != lapic_id()) { printf("SMP: cpuid = %d\n", cpuid); printf("SMP: actual apic_id = %d\n", lapic_id()); printf("SMP: correct apic_id = %d\n", PCPU_GET(apic_id)); panic("cpuid mismatch! boom!!"); } /* Initialize curthread. */ KASSERT(PCPU_GET(idlethread) != NULL, ("no idle thread")); PCPU_SET(curthread, PCPU_GET(idlethread)); mtx_lock_spin(&ap_boot_mtx); mca_init(); /* Init local apic for irq's */ lapic_setup(1); /* Set memory range attributes for this CPU to match the BSP */ mem_range_AP_init(); smp_cpus++; CTR1(KTR_SMP, "SMP: AP CPU #%d Launched", cpuid); if (bootverbose) printf("SMP: AP CPU #%d Launched!\n", cpuid); else printf("%s%d%s", smp_cpus == 2 ? "Launching APs: " : "", cpuid, smp_cpus == mp_ncpus ? "\n" : " "); /* Determine if we are a logical CPU. */ if (cpu_info[PCPU_GET(apic_id)].cpu_hyperthread) CPU_SET(cpuid, &logical_cpus_mask); if (bootverbose) lapic_dump("AP"); if (smp_cpus == mp_ncpus) { /* enable IPI's, tlb shootdown, freezes etc */ atomic_store_rel_int(&smp_started, 1); } #ifdef __amd64__ /* * Enable global pages TLB extension * This also implicitly flushes the TLB */ load_cr4(rcr4() | CR4_PGE); if (pmap_pcid_enabled) load_cr4(rcr4() | CR4_PCIDE); load_ds(_udatasel); load_es(_udatasel); load_fs(_ufssel); #endif mtx_unlock_spin(&ap_boot_mtx); /* Wait until all the AP's are up. */ while (atomic_load_acq_int(&smp_started) == 0) ia32_pause(); #ifndef EARLY_AP_STARTUP /* Start per-CPU event timers. */ cpu_initclocks_ap(); #endif sched_throw(NULL); panic("scheduler returned us to %s", __func__); /* NOTREACHED */ } static void smp_after_idle_runnable(void *arg __unused) { struct thread *idle_td; int cpu; for (cpu = 1; cpu < mp_ncpus; cpu++) { idle_td = pcpu_find(cpu)->pc_idlethread; while (atomic_load_int(&idle_td->td_lastcpu) == NOCPU && atomic_load_int(&idle_td->td_oncpu) == NOCPU) cpu_spinwait(); kmem_free((vm_offset_t)bootstacks[cpu], kstack_pages * PAGE_SIZE); } } SYSINIT(smp_after_idle_runnable, SI_SUB_SMP, SI_ORDER_ANY, smp_after_idle_runnable, NULL); /* * We tell the I/O APIC code about all the CPUs we want to receive * interrupts. 
If we don't want certain CPUs to receive IRQs we * can simply not tell the I/O APIC code about them in this function. * We also do not tell it about the BSP since it tells itself about * the BSP internally to work with UP kernels and on UP machines. */ void set_interrupt_apic_ids(void) { u_int i, apic_id; for (i = 0; i < MAXCPU; i++) { apic_id = cpu_apic_ids[i]; if (apic_id == -1) continue; if (cpu_info[apic_id].cpu_bsp) continue; if (cpu_info[apic_id].cpu_disabled) continue; /* Don't let hyperthreads service interrupts. */ if (cpu_info[apic_id].cpu_hyperthread) continue; intr_add_cpu(i); } } #ifdef COUNT_XINVLTLB_HITS u_int xhits_gbl[MAXCPU]; u_int xhits_pg[MAXCPU]; u_int xhits_rng[MAXCPU]; static SYSCTL_NODE(_debug, OID_AUTO, xhits, CTLFLAG_RW, 0, ""); SYSCTL_OPAQUE(_debug_xhits, OID_AUTO, global, CTLFLAG_RW, &xhits_gbl, sizeof(xhits_gbl), "IU", ""); SYSCTL_OPAQUE(_debug_xhits, OID_AUTO, page, CTLFLAG_RW, &xhits_pg, sizeof(xhits_pg), "IU", ""); SYSCTL_OPAQUE(_debug_xhits, OID_AUTO, range, CTLFLAG_RW, &xhits_rng, sizeof(xhits_rng), "IU", ""); u_int ipi_global; u_int ipi_page; u_int ipi_range; u_int ipi_range_size; SYSCTL_INT(_debug_xhits, OID_AUTO, ipi_global, CTLFLAG_RW, &ipi_global, 0, ""); SYSCTL_INT(_debug_xhits, OID_AUTO, ipi_page, CTLFLAG_RW, &ipi_page, 0, ""); SYSCTL_INT(_debug_xhits, OID_AUTO, ipi_range, CTLFLAG_RW, &ipi_range, 0, ""); SYSCTL_INT(_debug_xhits, OID_AUTO, ipi_range_size, CTLFLAG_RW, &ipi_range_size, 0, ""); #endif /* COUNT_XINVLTLB_HITS */ /* * Init and startup IPI. */ void ipi_startup(int apic_id, int vector) { /* * This attempts to follow the algorithm described in the * Intel Multiprocessor Specification v1.4 in section B.4. * For each IPI, we allow the local APIC ~20us to deliver the * IPI. If that times out, we panic. */ /* * first we do an INIT IPI: this INIT IPI might be run, resetting * and running the target CPU. OR this INIT IPI might be latched (P5 * bug), CPU waiting for STARTUP IPI. OR this INIT IPI might be * ignored. */ lapic_ipi_raw(APIC_DEST_DESTFLD | APIC_TRIGMOD_LEVEL | APIC_LEVEL_ASSERT | APIC_DESTMODE_PHY | APIC_DELMODE_INIT, apic_id); lapic_ipi_wait(100); /* Explicitly deassert the INIT IPI. */ lapic_ipi_raw(APIC_DEST_DESTFLD | APIC_TRIGMOD_LEVEL | APIC_LEVEL_DEASSERT | APIC_DESTMODE_PHY | APIC_DELMODE_INIT, apic_id); DELAY(10000); /* wait ~10mS */ /* * next we do a STARTUP IPI: the previous INIT IPI might still be * latched, (P5 bug) this 1st STARTUP would then terminate * immediately, and the previously started INIT IPI would continue. OR * the previous INIT IPI has already run. and this STARTUP IPI will * run. OR the previous INIT IPI was ignored. and this STARTUP IPI * will run. */ lapic_ipi_raw(APIC_DEST_DESTFLD | APIC_TRIGMOD_EDGE | APIC_LEVEL_ASSERT | APIC_DESTMODE_PHY | APIC_DELMODE_STARTUP | vector, apic_id); if (!lapic_ipi_wait(100)) panic("Failed to deliver first STARTUP IPI to APIC %d", apic_id); DELAY(200); /* wait ~200uS */ /* * finally we do a 2nd STARTUP IPI: this 2nd STARTUP IPI should run IF * the previous STARTUP IPI was cancelled by a latched INIT IPI. OR * this STARTUP IPI will be ignored, as only ONE STARTUP IPI is * recognized after hardware RESET or INIT IPI. */ lapic_ipi_raw(APIC_DEST_DESTFLD | APIC_TRIGMOD_EDGE | APIC_LEVEL_ASSERT | APIC_DESTMODE_PHY | APIC_DELMODE_STARTUP | vector, apic_id); if (!lapic_ipi_wait(100)) panic("Failed to deliver second STARTUP IPI to APIC %d", apic_id); DELAY(200); /* wait ~200uS */ } /* * Send an IPI to specified CPU handling the bitmap logic. 
*/ void ipi_send_cpu(int cpu, u_int ipi) { u_int bitmap, old_pending, new_pending; KASSERT(cpu_apic_ids[cpu] != -1, ("IPI to non-existent CPU %d", cpu)); if (IPI_IS_BITMAPED(ipi)) { bitmap = 1 << ipi; ipi = IPI_BITMAP_VECTOR; do { old_pending = cpu_ipi_pending[cpu]; new_pending = old_pending | bitmap; } while (!atomic_cmpset_int(&cpu_ipi_pending[cpu], old_pending, new_pending)); if (old_pending) return; } lapic_ipi_vectored(ipi, cpu_apic_ids[cpu]); } void ipi_bitmap_handler(struct trapframe frame) { struct trapframe *oldframe; struct thread *td; int cpu = PCPU_GET(cpuid); u_int ipi_bitmap; critical_enter(); td = curthread; td->td_intr_nesting_level++; oldframe = td->td_intr_frame; td->td_intr_frame = &frame; ipi_bitmap = atomic_readandclear_int(&cpu_ipi_pending[cpu]); if (ipi_bitmap & (1 << IPI_PREEMPT)) { #ifdef COUNT_IPIS (*ipi_preempt_counts[cpu])++; #endif sched_preempt(td); } if (ipi_bitmap & (1 << IPI_AST)) { #ifdef COUNT_IPIS (*ipi_ast_counts[cpu])++; #endif /* Nothing to do for AST */ } if (ipi_bitmap & (1 << IPI_HARDCLOCK)) { #ifdef COUNT_IPIS (*ipi_hardclock_counts[cpu])++; #endif hardclockintr(); } td->td_intr_frame = oldframe; td->td_intr_nesting_level--; critical_exit(); } /* * send an IPI to a set of cpus. */ void ipi_selected(cpuset_t cpus, u_int ipi) { int cpu; /* * IPI_STOP_HARD maps to a NMI and the trap handler needs a bit * of help in order to understand what is the source. * Set the mask of receiving CPUs for this purpose. */ if (ipi == IPI_STOP_HARD) CPU_OR_ATOMIC(&ipi_stop_nmi_pending, &cpus); while ((cpu = CPU_FFS(&cpus)) != 0) { cpu--; CPU_CLR(cpu, &cpus); CTR3(KTR_SMP, "%s: cpu: %d ipi: %x", __func__, cpu, ipi); ipi_send_cpu(cpu, ipi); } } /* * send an IPI to a specific CPU. */ void ipi_cpu(int cpu, u_int ipi) { /* * IPI_STOP_HARD maps to a NMI and the trap handler needs a bit * of help in order to understand what is the source. * Set the mask of receiving CPUs for this purpose. */ if (ipi == IPI_STOP_HARD) CPU_SET_ATOMIC(cpu, &ipi_stop_nmi_pending); CTR3(KTR_SMP, "%s: cpu: %d ipi: %x", __func__, cpu, ipi); ipi_send_cpu(cpu, ipi); } /* * send an IPI to all CPUs EXCEPT myself */ void ipi_all_but_self(u_int ipi) { cpuset_t other_cpus; other_cpus = all_cpus; CPU_CLR(PCPU_GET(cpuid), &other_cpus); if (IPI_IS_BITMAPED(ipi)) { ipi_selected(other_cpus, ipi); return; } /* * IPI_STOP_HARD maps to a NMI and the trap handler needs a bit * of help in order to understand what is the source. * Set the mask of receiving CPUs for this purpose. */ if (ipi == IPI_STOP_HARD) CPU_OR_ATOMIC(&ipi_stop_nmi_pending, &other_cpus); CTR2(KTR_SMP, "%s: ipi: %x", __func__, ipi); lapic_ipi_vectored(ipi, APIC_IPI_DEST_OTHERS); } int ipi_nmi_handler(void) { u_int cpuid; /* * As long as there is not a simple way to know about a NMI's * source, if the bitmask for the current CPU is present in * the global pending bitword an IPI_STOP_HARD has been issued * and should be handled. 
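ipi_send_cpu() above merges bitmapped IPIs so that only the 0 -> non-zero transition costs a real vectored IPI; the same pattern redone with C11 atomics as a userland illustration (the kernel uses an atomic_cmpset_int() loop instead of fetch_or):

#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned int pending;	/* stands in for cpu_ipi_pending[cpu] */

static void
post_soft_ipi(unsigned int bit)
{
	unsigned int old;

	old = atomic_fetch_or(&pending, 1u << bit);
	if (old == 0)
		printf("bit %u: first one pending, send a hardware IPI\n", bit);
	else
		printf("bit %u: IPI already in flight, bit merged\n", bit);
}

int
main(void)
{
	post_soft_ipi(1);	/* sends the vector */
	post_soft_ipi(3);	/* piggybacks on the pending one */
	return (0);
}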
*/ cpuid = PCPU_GET(cpuid); if (!CPU_ISSET(cpuid, &ipi_stop_nmi_pending)) return (1); CPU_CLR_ATOMIC(cpuid, &ipi_stop_nmi_pending); cpustop_handler(); return (0); } int nmi_kdb_lock; void nmi_call_kdb_smp(u_int type, struct trapframe *frame) { int cpu; bool call_post; cpu = PCPU_GET(cpuid); if (atomic_cmpset_acq_int(&nmi_kdb_lock, 0, 1)) { nmi_call_kdb(cpu, type, frame); call_post = false; } else { savectx(&stoppcbs[cpu]); CPU_SET_ATOMIC(cpu, &stopped_cpus); while (!atomic_cmpset_acq_int(&nmi_kdb_lock, 0, 1)) ia32_pause(); call_post = true; } atomic_store_rel_int(&nmi_kdb_lock, 0); if (call_post) cpustop_handler_post(cpu); } /* * Handle an IPI_STOP by saving our current context and spinning until we * are resumed. */ void cpustop_handler(void) { u_int cpu; cpu = PCPU_GET(cpuid); savectx(&stoppcbs[cpu]); /* Indicate that we are stopped */ CPU_SET_ATOMIC(cpu, &stopped_cpus); /* Wait for restart */ - while (!CPU_ISSET(cpu, &started_cpus)) - ia32_pause(); + while (!CPU_ISSET(cpu, &started_cpus)) { + ia32_pause(); + + /* + * Halt non-BSP CPUs on panic -- we're never going to need them + * again, and might as well save power / release resources + * (e.g., overprovisioned VM infrastructure). + */ + while (__predict_false(!IS_BSP() && panicstr != NULL)) + halt(); + } cpustop_handler_post(cpu); } static void cpustop_handler_post(u_int cpu) { CPU_CLR_ATOMIC(cpu, &started_cpus); CPU_CLR_ATOMIC(cpu, &stopped_cpus); /* * We don't broadcast TLB invalidations to other CPUs when they are * stopped. Hence, we clear the TLB before resuming. */ invltlb_glob(); #if defined(__amd64__) && defined(DDB) amd64_db_resume_dbreg(); #endif if (cpu == 0 && cpustop_restartfunc != NULL) { cpustop_restartfunc(); cpustop_restartfunc = NULL; } } /* * Handle an IPI_SUSPEND by saving our current context and spinning until we * are resumed. */ void cpususpend_handler(void) { u_int cpu; mtx_assert(&smp_ipi_mtx, MA_NOTOWNED); cpu = PCPU_GET(cpuid); if (savectx(&susppcbs[cpu]->sp_pcb)) { #ifdef __amd64__ fpususpend(susppcbs[cpu]->sp_fpususpend); #else npxsuspend(susppcbs[cpu]->sp_fpususpend); #endif /* * suspended_cpus is cleared shortly after each AP is restarted * by a Startup IPI, so that the BSP can proceed to restarting * the next AP. * * resuming_cpus gets cleared when the AP completes * initialization after having been released by the BSP. * resuming_cpus is probably not the best name for the * variable, because it is actually a set of processors that * haven't resumed yet and haven't necessarily started resuming. * * Note that suspended_cpus is meaningful only for ACPI suspend * as it's not really used for Xen suspend since the APs are * automatically restored to the running state and the correct * context. For the same reason resumectx is never called in * that case. */ CPU_SET_ATOMIC(cpu, &suspended_cpus); CPU_SET_ATOMIC(cpu, &resuming_cpus); /* * Invalidate the cache after setting the global status bits. * The last AP to set its bit may end up being an Owner of the * corresponding cache line in MOESI protocol. The AP may be * stopped before the cache line is written to the main memory. */ wbinvd(); } else { #ifdef __amd64__ fpuresume(susppcbs[cpu]->sp_fpususpend); #else npxresume(susppcbs[cpu]->sp_fpususpend); #endif pmap_init_pat(); initializecpu(); PCPU_SET(switchtime, 0); PCPU_SET(switchticks, ticks); /* Indicate that we have restarted and restored the context. 
*/ CPU_CLR_ATOMIC(cpu, &suspended_cpus); } /* Wait for resume directive */ while (!CPU_ISSET(cpu, &toresume_cpus)) ia32_pause(); /* Re-apply microcode updates. */ ucode_reload(); #ifdef __i386__ /* Finish removing the identity mapping of low memory for this AP. */ invltlb_glob(); #endif if (cpu_ops.cpu_resume) cpu_ops.cpu_resume(); #ifdef __amd64__ if (vmm_resume_p) vmm_resume_p(); #endif /* Resume MCA and local APIC */ lapic_xapic_mode(); mca_resume(); lapic_setup(0); /* Indicate that we are resumed */ CPU_CLR_ATOMIC(cpu, &resuming_cpus); CPU_CLR_ATOMIC(cpu, &suspended_cpus); CPU_CLR_ATOMIC(cpu, &toresume_cpus); } void invlcache_handler(void) { uint32_t generation; #ifdef COUNT_IPIS (*ipi_invlcache_counts[PCPU_GET(cpuid)])++; #endif /* COUNT_IPIS */ /* * Reading the generation here allows greater parallelism * since wbinvd is a serializing instruction. Without the * temporary, we'd wait for wbinvd to complete, then the read * would execute, then the dependent write, which must then * complete before return from interrupt. */ generation = smp_tlb_generation; wbinvd(); PCPU_SET(smp_tlb_done, generation); } /* * This is called once the rest of the system is up and running and we're * ready to let the AP's out of the pen. */ static void release_aps(void *dummy __unused) { if (mp_ncpus == 1) return; atomic_store_rel_int(&aps_ready, 1); while (smp_started == 0) ia32_pause(); } SYSINIT(start_aps, SI_SUB_SMP, SI_ORDER_FIRST, release_aps, NULL); #ifdef COUNT_IPIS /* * Setup interrupt counters for IPI handlers. */ static void mp_ipi_intrcnt(void *dummy) { char buf[64]; int i; CPU_FOREACH(i) { snprintf(buf, sizeof(buf), "cpu%d:invltlb", i); intrcnt_add(buf, &ipi_invltlb_counts[i]); snprintf(buf, sizeof(buf), "cpu%d:invlrng", i); intrcnt_add(buf, &ipi_invlrng_counts[i]); snprintf(buf, sizeof(buf), "cpu%d:invlpg", i); intrcnt_add(buf, &ipi_invlpg_counts[i]); snprintf(buf, sizeof(buf), "cpu%d:invlcache", i); intrcnt_add(buf, &ipi_invlcache_counts[i]); snprintf(buf, sizeof(buf), "cpu%d:preempt", i); intrcnt_add(buf, &ipi_preempt_counts[i]); snprintf(buf, sizeof(buf), "cpu%d:ast", i); intrcnt_add(buf, &ipi_ast_counts[i]); snprintf(buf, sizeof(buf), "cpu%d:rendezvous", i); intrcnt_add(buf, &ipi_rendezvous_counts[i]); snprintf(buf, sizeof(buf), "cpu%d:hardclock", i); intrcnt_add(buf, &ipi_hardclock_counts[i]); } } SYSINIT(mp_ipi_intrcnt, SI_SUB_INTR, SI_ORDER_MIDDLE, mp_ipi_intrcnt, NULL); #endif /* * Flush the TLB on other CPU's */ /* Variables needed for SMP tlb shootdown. */ vm_offset_t smp_tlb_addr1, smp_tlb_addr2; pmap_t smp_tlb_pmap; volatile uint32_t smp_tlb_generation; #ifdef __amd64__ #define read_eflags() read_rflags() #endif static void smp_targeted_tlb_shootdown(cpuset_t mask, u_int vector, pmap_t pmap, vm_offset_t addr1, vm_offset_t addr2) { cpuset_t other_cpus; volatile uint32_t *p_cpudone; uint32_t generation; int cpu; /* It is not necessary to signal other CPUs while in the debugger. */ if (kdb_active || panicstr != NULL) return; /* * Check for other cpus. Return if none. 
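The shootdown handlers above read smp_tlb_generation before the serializing instruction and only then acknowledge; reduced to two userland threads, the handshake used by smp_targeted_tlb_shootdown() and the invltlb/invlcache handlers looks roughly like this (names hypothetical, pthreads standing in for the initiating and responding CPUs):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned int tlb_generation;	/* bumped by the initiator */
static _Atomic unsigned int tlb_done;		/* per-CPU acknowledgement */

static void *
responder(void *arg)
{
	unsigned int gen;

	/* Read the generation first, then do the serializing work. */
	gen = atomic_load(&tlb_generation);
	/* ... invltlb()/wbinvd() would run here in the kernel ... */
	atomic_store(&tlb_done, gen);
	return (NULL);
}

int
main(void)
{
	pthread_t td;
	unsigned int gen;

	gen = atomic_fetch_add(&tlb_generation, 1) + 1;
	pthread_create(&td, NULL, responder, NULL);
	while (atomic_load(&tlb_done) != gen)
		;	/* initiator spins, like the ia32_pause() loop */
	pthread_join(td, NULL);
	printf("generation %u acknowledged\n", gen);
	return (0);
}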
*/ if (CPU_ISFULLSET(&mask)) { if (mp_ncpus <= 1) return; } else { CPU_CLR(PCPU_GET(cpuid), &mask); if (CPU_EMPTY(&mask)) return; } if (!(read_eflags() & PSL_I)) panic("%s: interrupts disabled", __func__); mtx_lock_spin(&smp_ipi_mtx); smp_tlb_addr1 = addr1; smp_tlb_addr2 = addr2; smp_tlb_pmap = pmap; generation = ++smp_tlb_generation; if (CPU_ISFULLSET(&mask)) { ipi_all_but_self(vector); other_cpus = all_cpus; CPU_CLR(PCPU_GET(cpuid), &other_cpus); } else { other_cpus = mask; while ((cpu = CPU_FFS(&mask)) != 0) { cpu--; CPU_CLR(cpu, &mask); CTR3(KTR_SMP, "%s: cpu: %d ipi: %x", __func__, cpu, vector); ipi_send_cpu(cpu, vector); } } while ((cpu = CPU_FFS(&other_cpus)) != 0) { cpu--; CPU_CLR(cpu, &other_cpus); p_cpudone = &cpuid_to_pcpu[cpu]->pc_smp_tlb_done; while (*p_cpudone != generation) ia32_pause(); } mtx_unlock_spin(&smp_ipi_mtx); } void smp_masked_invltlb(cpuset_t mask, pmap_t pmap) { if (smp_started) { smp_targeted_tlb_shootdown(mask, IPI_INVLTLB, pmap, 0, 0); #ifdef COUNT_XINVLTLB_HITS ipi_global++; #endif } } void smp_masked_invlpg(cpuset_t mask, vm_offset_t addr, pmap_t pmap) { if (smp_started) { smp_targeted_tlb_shootdown(mask, IPI_INVLPG, pmap, addr, 0); #ifdef COUNT_XINVLTLB_HITS ipi_page++; #endif } } void smp_masked_invlpg_range(cpuset_t mask, vm_offset_t addr1, vm_offset_t addr2, pmap_t pmap) { if (smp_started) { smp_targeted_tlb_shootdown(mask, IPI_INVLRNG, pmap, addr1, addr2); #ifdef COUNT_XINVLTLB_HITS ipi_range++; ipi_range_size += (addr2 - addr1) / PAGE_SIZE; #endif } } void smp_cache_flush(void) { if (smp_started) { smp_targeted_tlb_shootdown(all_cpus, IPI_INVLCACHE, NULL, 0, 0); } } /* * Handlers for TLB related IPIs */ void invltlb_handler(void) { uint32_t generation; #ifdef COUNT_XINVLTLB_HITS xhits_gbl[PCPU_GET(cpuid)]++; #endif /* COUNT_XINVLTLB_HITS */ #ifdef COUNT_IPIS (*ipi_invltlb_counts[PCPU_GET(cpuid)])++; #endif /* COUNT_IPIS */ /* * Reading the generation here allows greater parallelism * since invalidating the TLB is a serializing operation. */ generation = smp_tlb_generation; if (smp_tlb_pmap == kernel_pmap) invltlb_glob(); #ifdef __amd64__ else invltlb(); #endif PCPU_SET(smp_tlb_done, generation); } void invlpg_handler(void) { uint32_t generation; #ifdef COUNT_XINVLTLB_HITS xhits_pg[PCPU_GET(cpuid)]++; #endif /* COUNT_XINVLTLB_HITS */ #ifdef COUNT_IPIS (*ipi_invlpg_counts[PCPU_GET(cpuid)])++; #endif /* COUNT_IPIS */ generation = smp_tlb_generation; /* Overlap with serialization */ #ifdef __i386__ if (smp_tlb_pmap == kernel_pmap) #endif invlpg(smp_tlb_addr1); PCPU_SET(smp_tlb_done, generation); } void invlrng_handler(void) { vm_offset_t addr, addr2; uint32_t generation; #ifdef COUNT_XINVLTLB_HITS xhits_rng[PCPU_GET(cpuid)]++; #endif /* COUNT_XINVLTLB_HITS */ #ifdef COUNT_IPIS (*ipi_invlrng_counts[PCPU_GET(cpuid)])++; #endif /* COUNT_IPIS */ addr = smp_tlb_addr1; addr2 = smp_tlb_addr2; generation = smp_tlb_generation; /* Overlap with serialization */ #ifdef __i386__ if (smp_tlb_pmap == kernel_pmap) #endif do { invlpg(addr); addr += PAGE_SIZE; } while (addr < addr2); PCPU_SET(smp_tlb_done, generation); } Index: user/ngie/bug-237403/tests/sys/opencrypto/cryptodev.py =================================================================== --- user/ngie/bug-237403/tests/sys/opencrypto/cryptodev.py (revision 346925) +++ user/ngie/bug-237403/tests/sys/opencrypto/cryptodev.py (revision 346926) @@ -1,695 +1,695 @@ #!/usr/bin/env python # # Copyright (c) 2014 The FreeBSD Foundation # Copyright 2014 John-Mark Gurney # All rights reserved. 
# Copyright 2019 Enji Cooper # # This software was developed by John-Mark Gurney under # the sponsorship from the FreeBSD Foundation. # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. # # $FreeBSD$ # from __future__ import print_function import array from fcntl import ioctl import os import random import signal from struct import pack as _pack import sys import time import dpkt from cryptodevh import * __all__ = [ 'Crypto', 'MismatchError', ] class FindOp(dpkt.Packet): __byte_order__ = '@' __hdr__ = ( ('crid', 'i', 0), ('name', '32s', 0), ) class SessionOp(dpkt.Packet): __byte_order__ = '@' __hdr__ = ( ('cipher', 'I', 0), ('mac', 'I', 0), ('keylen', 'I', 0), ('key', 'P', 0), ('mackeylen', 'i', 0), ('mackey', 'P', 0), ('ses', 'I', 0), ) class SessionOp2(dpkt.Packet): __byte_order__ = '@' __hdr__ = ( ('cipher', 'I', 0), ('mac', 'I', 0), ('keylen', 'I', 0), ('key', 'P', 0), ('mackeylen', 'i', 0), ('mackey', 'P', 0), ('ses', 'I', 0), ('crid', 'i', 0), ('pad0', 'i', 0), ('pad1', 'i', 0), ('pad2', 'i', 0), ('pad3', 'i', 0), ) class CryptOp(dpkt.Packet): __byte_order__ = '@' __hdr__ = ( ('ses', 'I', 0), ('op', 'H', 0), ('flags', 'H', 0), ('len', 'I', 0), ('src', 'P', 0), ('dst', 'P', 0), ('mac', 'P', 0), ('iv', 'P', 0), ) class CryptAEAD(dpkt.Packet): __byte_order__ = '@' __hdr__ = ( ('ses', 'I', 0), ('op', 'H', 0), ('flags', 'H', 0), ('len', 'I', 0), ('aadlen', 'I', 0), ('ivlen', 'I', 0), ('src', 'P', 0), ('dst', 'P', 0), ('aad', 'P', 0), ('tag', 'P', 0), ('iv', 'P', 0), ) # h2py.py can't handle multiarg macros CRIOGET = 3221513060 CIOCGSESSION = 3224396645 CIOCGSESSION2 = 3225445226 CIOCFSESSION = 2147771238 CIOCCRYPT = 3224396647 CIOCKEY = 3230688104 CIOCASYMFEAT = 1074029417 CIOCKEY2 = 3230688107 CIOCFINDDEV = 3223610220 CIOCCRYPTAEAD = 3225445229 def _getdev(): buf = array.array('I', [0]) fd = os.open('/dev/crypto', os.O_RDWR) try: ioctl(fd, CRIOGET, buf, 1) finally: os.close(fd) return buf[0] _cryptodev = _getdev() def _findop(crid, name): fop = FindOp() fop.crid = crid fop.name = name.encode("ascii") s = array.array('B', fop.pack_hdr()) ioctl(_cryptodev, CIOCFINDDEV, s, 1) fop.unpack(s) try: idx = fop.name.index(b'\x00') name = fop.name[:idx] except ValueError: name = fop.name return fop.crid, name class Crypto: @staticmethod def findcrid(name): return _findop(-1, name)[0] @staticmethod def getcridname(crid): 
return _findop(crid, '')[1] def __init__(self, cipher=0, key=None, mac=0, mackey=None, crid=CRYPTOCAP_F_SOFTWARE | CRYPTOCAP_F_HARDWARE, maclen=None): self._ses = None self._maclen = maclen ses = SessionOp2() ses.cipher = cipher ses.mac = mac if key is not None: ses.keylen = len(key) k = array.array('B', key) ses.key = k.buffer_info()[0] else: self.key = None if mackey is not None: ses.mackeylen = len(mackey) mk = array.array('B', mackey) ses.mackey = mk.buffer_info()[0] if not cipher and not mac: raise ValueError('one of cipher or mac MUST be specified.') ses.crid = crid #print(ses) s = array.array('B', ses.pack_hdr()) #print(s) ioctl(_cryptodev, CIOCGSESSION2, s, 1) ses.unpack(s) self._ses = ses.ses def __del__(self): if self._ses is None: return try: ioctl(_cryptodev, CIOCFSESSION, _pack('I', self._ses)) except TypeError: pass self._ses = None @staticmethod def _to_bytes(val): if sys.version_info[0] >= 3: if isinstance(val, str): return val.encode("ascii") return val def _doop(self, op, src, iv): cop = CryptOp() cop.ses = self._ses cop.op = op cop.flags = 0 cop.len = len(src) s = array.array('B', src) cop.src = cop.dst = s.buffer_info()[0] if self._maclen is not None: m = array.array('B', [0] * self._maclen) cop.mac = m.buffer_info()[0] ivbuf = array.array('B', self._to_bytes(iv)) cop.iv = ivbuf.buffer_info()[0] #print('cop:', cop) ioctl(_cryptodev, CIOCCRYPT, str(cop)) s = s.tostring() if self._maclen is not None: return s, m.tostring() return s def _doaead(self, op, src, aad, iv, tag=None): caead = CryptAEAD() caead.ses = self._ses caead.op = op caead.flags = CRD_F_IV_EXPLICIT caead.flags = 0 caead.len = len(src) src = self._to_bytes(src) s = array.array("B", src) caead.src = caead.dst = s.buffer_info()[0] caead.aadlen = len(aad) aad = self._to_bytes(aad) saad = array.array('B', aad) caead.aad = saad.buffer_info()[0] if self._maclen is None: raise ValueError('must have a tag length') if tag is None: tag = array.array('B', [0] * self._maclen) else: assert len(tag) == self._maclen, \ '%d != %d' % (len(tag), self._maclen) tag = self._to_bytes(tag) tag = array.array('B', tag) caead.tag = tag.buffer_info()[0] ivbuf = array.array('B', iv) caead.ivlen = len(iv) caead.iv = ivbuf.buffer_info()[0] ioctl(_cryptodev, CIOCCRYPTAEAD, self._to_bytes(str(caead))) s = s.tostring() return s, tag.tostring() def perftest(self, op, size, timeo=3): - inp = array.array('B', (random.randint(0, 255) for x in xrange(size))) + inp = array.array('B', (random.randint(0, 255) for x in range(size))) out = array.array('B', self._to_bytes(inp)) # prep ioctl cop = CryptOp() cop.ses = self._ses cop.op = op cop.flags = 0 cop.len = len(inp) s = array.array('B', self._to_bytes(inp)) cop.src = s.buffer_info()[0] cop.dst = out.buffer_info()[0] if self._maclen is not None: m = array.array('B', [0] * self._maclen) cop.mac = m.buffer_info()[0] - ivbuf = array.array('B', (random.randint(0, 255) for x in xrange(16))) + ivbuf = array.array('B', (random.randint(0, 255) for x in range(16))) cop.iv = ivbuf.buffer_info()[0] exit = [ False ] def alarmhandle(a, b, exit=exit): exit[0] = True oldalarm = signal.signal(signal.SIGALRM, alarmhandle) signal.alarm(timeo) start = time.time() reps = 0 while not exit[0]: ioctl(_cryptodev, CIOCCRYPT, self._to_bytes(str(cop))) reps += 1 end = time.time() signal.signal(signal.SIGALRM, oldalarm) print('time:', end - start) print('perf MB/sec:', (reps * size) / (end - start) / 1024 / 1024) def encrypt(self, data, iv, aad=None): if aad is None: return self._doop(COP_ENCRYPT, data, iv) else: return 
self._doaead(COP_ENCRYPT, data, aad, iv) def decrypt(self, data, iv, aad=None, tag=None): if aad is None: return self._doop(COP_DECRYPT, data, iv) else: return self._doaead(COP_DECRYPT, data, aad, iv, tag=tag) class MismatchError(Exception): pass class KATParser: def __init__(self, fname, fields): self.fields = set(fields) self._pending = None self.fname = fname self.fp = None def __enter__(self): self.fp = open(self.fname) return self def __exit__(self, exc_type, exc_value, exc_tb): if self.fp is not None: self.fp.close() def __iter__(self): return self def __next__(self): while True: didread = False if self._pending is not None: i = self._pending self._pending = None else: i = self.fp.readline() didread = True if didread and not i: raise StopIteration if not i.startswith('#') and i.strip(): break if i[0] == '[': yield i[1:].split(']', 1)[0], self.fielditer() else: raise ValueError('unknown line: %r' % repr(i)) def eatblanks(self): while True: line = self.fp.readline() if line == '': break line = line.strip() if line: break return line def fielditer(self): while True: values = {} line = self.eatblanks() if not line or line[0] == '[': self._pending = line return while True: try: f, v = line.split(' =') except: if line == 'FAIL': f, v = 'FAIL', '' else: print('line:', repr(line)) raise v = v.strip() if f in values: raise ValueError('already present: %r' % repr(f)) values[f] = v line = self.fp.readline().strip() if not line: break # we should have everything remain = self.fields.copy() - set(values.keys()) # XXX - special case GCM decrypt if remain and not ('FAIL' in values and 'PT' in remain): raise ValueError('not all fields found: %r' % repr(remain)) yield values # The CCM files use a bit of a different syntax that doesn't quite fit # the generic KATParser. In particular, some keys are set globally at # the start of the file, and some are set globally at the start of a # section. 
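For orientation, here is a hedged sketch of the layout that KATCCMParser (defined next) expects from a NIST CCM response file. The file name and values are illustrative only, but the field names match the ones runCCMEncrypt() and runCCMDecrypt() read in cryptotest.py below:

    # Illustrative .rsp layout (not a real vector file):
    #   Key = 404142...                                   global, applies to every vector
    #   [Alen = 0, Plen = 24, Nlen = 13, Tlen = 16]       section header, merged per section
    #   Count = 0
    #   Nonce = ...
    #   Adata = ...
    #   Payload = ...
    #   CT = ...
    for vector in KATCCMParser('VNT256.rsp'):            # hypothetical file name
        # Each yielded dict merges the global, section and per-Count values.
        key, nonce, ct = vector['Key'], vector['Nonce'], vector['CT']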
class KATCCMParser: def __init__(self, fname): self.fp = open(fname) self._pending = None self.read_globals() def read_globals(self): self.global_values = {} while True: line = self.fp.readline() if not line: return if line[0] == '#' or not line.strip(): continue if line[0] == '[': self._pending = line return try: f, v = line.split(' =') except: print('line:', repr(line)) raise v = v.strip() if f in self.global_values: raise ValueError('already present: %r' % repr(f)) self.global_values[f] = v def read_section_values(self, kwpairs): self.section_values = self.global_values.copy() for pair in kwpairs.split(', '): f, v = pair.split(' = ') if f in self.section_values: raise ValueError('already present: %r' % repr(f)) self.section_values[f] = v while True: line = self.fp.readline() if not line: return if line[0] == '#' or not line.strip(): continue if line[0] == '[': self._pending = line return try: f, v = line.split(' =') except: print('line:', repr(line)) raise if f == 'Count': self._pending = line return v = v.strip() if f in self.section_values: raise ValueError('already present: %r' % repr(f)) self.section_values[f] = v def __iter__(self): while True: if self._pending: line = self._pending self._pending = None else: line = self.fp.readline() if not line: return if (line and line[0] == '#') or not line.strip(): continue if line[0] == '[': section = line[1:].split(']', 1)[0] self.read_section_values(section) continue values = self.section_values.copy() while True: try: f, v = line.split(' =') except: print('line:', repr(line)) raise v = v.strip() if f in values: raise ValueError('already present: %r' % repr(f)) values[f] = v line = self.fp.readline().strip() if not line: break yield values def _spdechex(s): return binascii.hexlify(''.join(s.split())) if __name__ == '__main__': if True: try: crid = Crypto.findcrid('aesni0') print('aesni:', crid) except IOError: print('aesni0 not found') - for i in xrange(10): + for i in range(10): try: name = Crypto.getcridname(i) print('%2d: %r' % (i, repr(name))) except IOError: pass elif False: kp = KATParser('/usr/home/jmg/aesni.testing/format tweak value input - data unit seq no/XTSGenAES128.rsp', [ 'COUNT', 'DataUnitLen', 'Key', 'DataUnitSeqNumber', 'PT', 'CT' ]) for mode, ni in kp: print(i, ni) for j in ni: print(j) elif False: key = _spdechex('c939cc13397c1d37de6ae0e1cb7c423c') iv = _spdechex('00000000000000000000000000000001') pt = _spdechex('ab3cabed693a32946055524052afe3c9cb49664f09fc8b7da824d924006b7496353b8c1657c5dec564d8f38d7432e1de35aae9d95590e66278d4acce883e51abaf94977fcd3679660109a92bf7b2973ccd547f065ec6cee4cb4a72a5e9f45e615d920d76cb34cba482467b3e21422a7242e7d931330c0fbf465c3a3a46fae943029fd899626dda542750a1eee253df323c6ef1573f1c8c156613e2ea0a6cdbf2ae9701020be2d6a83ecb7f3f9d8e') #pt = _spdechex('00000000000000000000000000000000') ct = _spdechex('f42c33853ecc5ce2949865fdb83de3bff1089e9360c94f830baebfaff72836ab5236f77212f1e7396c8c54ac73d81986375a6e9e299cfeca5ba051ed25e8d1affa5beaf6c1d2b45e90802408f2ced21663497e906de5f29341e5e52ddfea5363d628b3eb7806835e17bae051b3a6da3f8e2941fe44384eac17a9d298d2c331ca8320c775b5d53263a5e905059d891b21dede2d8110fd427c7bd5a9a274ddb47b1945ee79522203b6e297d0e399ef') c = Crypto(CRYPTO_AES_ICM, key) enc = c.encrypt(pt, iv) print('enc:', binascii.unhexlify(enc)) print(' ct:', binascii.unhexlify(ct)) assert ct == enc dec = c.decrypt(ct, iv) print('dec:', binascii.unhexlify(dec)) print(' pt:', binascii.unhexlify(pt)) assert pt == dec elif False: key = _spdechex('c939cc13397c1d37de6ae0e1cb7c423c') iv = 
_spdechex('00000000000000000000000000000001') pt = _spdechex('ab3cabed693a32946055524052afe3c9cb49664f09fc8b7da824d924006b7496353b8c1657c5dec564d8f38d7432e1de35aae9d95590e66278d4acce883e51abaf94977fcd3679660109a92bf7b2973ccd547f065ec6cee4cb4a72a5e9f45e615d920d76cb34cba482467b3e21422a7242e7d931330c0fbf465c3a3a46fae943029fd899626dda542750a1eee253df323c6ef1573f1c8c156613e2ea0a6cdbf2ae9701020be2d6a83ecb7f3f9d8e0a3f') #pt = _spdechex('00000000000000000000000000000000') ct = _spdechex('f42c33853ecc5ce2949865fdb83de3bff1089e9360c94f830baebfaff72836ab5236f77212f1e7396c8c54ac73d81986375a6e9e299cfeca5ba051ed25e8d1affa5beaf6c1d2b45e90802408f2ced21663497e906de5f29341e5e52ddfea5363d628b3eb7806835e17bae051b3a6da3f8e2941fe44384eac17a9d298d2c331ca8320c775b5d53263a5e905059d891b21dede2d8110fd427c7bd5a9a274ddb47b1945ee79522203b6e297d0e399ef3768') c = Crypto(CRYPTO_AES_ICM, key) enc = c.encrypt(pt, iv) print('enc:', binascii.hexlify(enc)) print(' ct:', binascii.hexlify(ct)) assert ct == enc dec = c.decrypt(ct, iv) print('dec:', binascii.hexlify(dec)) print(' pt:', binascii.hexlify(pt)) assert pt == dec elif False: key = _spdechex('c939cc13397c1d37de6ae0e1cb7c423c') iv = _spdechex('6eba2716ec0bd6fa5cdef5e6d3a795bc') pt = _spdechex('ab3cabed693a32946055524052afe3c9cb49664f09fc8b7da824d924006b7496353b8c1657c5dec564d8f38d7432e1de35aae9d95590e66278d4acce883e51abaf94977fcd3679660109a92bf7b2973ccd547f065ec6cee4cb4a72a5e9f45e615d920d76cb34cba482467b3e21422a7242e7d931330c0fbf465c3a3a46fae943029fd899626dda542750a1eee253df323c6ef1573f1c8c156613e2ea0a6cdbf2ae9701020be2d6a83ecb7f3f9d8e0a3f') ct = _spdechex('f1f81f12e72e992dbdc304032705dc75dc3e4180eff8ee4819906af6aee876d5b00b7c36d282a445ce3620327be481e8e53a8e5a8e5ca9abfeb2281be88d12ffa8f46d958d8224738c1f7eea48bda03edbf9adeb900985f4fa25648b406d13a886c25e70cfdecdde0ad0f2991420eb48a61c64fd797237cf2798c2675b9bb744360b0a3f329ac53bbceb4e3e7456e6514f1a9d2f06c236c31d0f080b79c15dce1096357416602520daa098b17d1af427') c = Crypto(CRYPTO_AES_CBC, key) enc = c.encrypt(pt, iv) print('enc:', binascii.hexlify(enc)) print(' ct:', binascii.hexlify(ct)) assert ct == enc dec = c.decrypt(ct, iv) print('dec:', binascii.hexlify(dec)) print(' pt:', binascii.hexlify(pt)) assert pt == dec elif False: key = _spdechex('c939cc13397c1d37de6ae0e1cb7c423c') iv = _spdechex('b3d8cc017cbb89b39e0f67e2') pt = _spdechex('c3b3c41f113a31b73d9a5cd4321030') aad = _spdechex('24825602bd12a984e0092d3e448eda5f') ct = _spdechex('93fe7d9e9bfd10348a5606e5cafa7354') ct = _spdechex('93fe7d9e9bfd10348a5606e5cafa73') tag = _spdechex('0032a1dc85f1c9786925a2e71d8272dd') tag = _spdechex('8d11a0929cb3fbe1fef01a4a38d5f8ea') c = Crypto(CRYPTO_AES_NIST_GCM_16, key, mac=CRYPTO_AES_128_NIST_GMAC, mackey=key) enc, enctag = c.encrypt(pt, iv, aad=aad) print('enc:', binascii.hexlify(enc)) print(' ct:', binascii.hexlify(ct)) assert enc == ct print('etg:', binascii.hexlify(enctag)) print('tag:', binascii.hexlify(tag)) assert enctag == tag # Make sure we get EBADMSG #enctag = enctag[:-1] + 'a' dec, dectag = c.decrypt(ct, iv, aad=aad, tag=enctag) print('dec:', binascii.hexlify(dec)) print(' pt:', binascii.hexlify(pt)) assert dec == pt print('dtg:', binascii.hexlify(dectag)) print('tag:', binascii.hexlify(tag)) assert dectag == tag elif False: key = _spdechex('c939cc13397c1d37de6ae0e1cb7c423c') iv = _spdechex('b3d8cc017cbb89b39e0f67e2') key = key + iv[:4] iv = iv[4:] pt = _spdechex('c3b3c41f113a31b73d9a5cd432103069') aad = _spdechex('24825602bd12a984e0092d3e448eda5f') ct = _spdechex('93fe7d9e9bfd10348a5606e5cafa7354') tag = 
_spdechex('0032a1dc85f1c9786925a2e71d8272dd') c = Crypto(CRYPTO_AES_GCM_16, key, mac=CRYPTO_AES_128_GMAC, mackey=key) enc, enctag = c.encrypt(pt, iv, aad=aad) print('enc:', binascii.hexlify(enc)) print(' ct:', binascii.hexlify(ct)) assert enc == ct print('etg:', binascii.hexlify(enctag)) print('tag:', binascii.hexlify(tag)) assert enctag == tag elif False: - for i in xrange(100000): + for i in range(100000): c = Crypto(CRYPTO_AES_XTS, binascii.unhexlify('1bbfeadf539daedcae33ced497343f3ca1f2474ad932b903997d44707db41382')) data = binascii.unhexlify('52a42bca4e9425a25bbc8c8bf6129dec') ct = binascii.unhexlify('517e602becd066b65fa4f4f56ddfe240') iv = _pack('QQ', 71, 0) enc = c.encrypt(data, iv) assert enc == ct elif True: c = Crypto(CRYPTO_AES_XTS, binascii.unhexlify('1bbfeadf539daedcae33ced497343f3ca1f2474ad932b903997d44707db41382')) data = binascii.unhexlify('52a42bca4e9425a25bbc8c8bf6129dec') ct = binascii.unhexlify('517e602becd066b65fa4f4f56ddfe240') iv = _pack('QQ', 71, 0) enc = c.encrypt(data, iv) assert enc == ct dec = c.decrypt(enc, iv) assert dec == data #c.perftest(COP_ENCRYPT, 192*1024, reps=30000) else: key = binascii.unhexlify('1bbfeadf539daedcae33ced497343f3ca1f2474ad932b903997d44707db41382') print('XTS %d testing:' % (len(key) * 8)) c = Crypto(CRYPTO_AES_XTS, key) for i in [ 8192, 192*1024]: print('block size: %d' % i) c.perftest(COP_ENCRYPT, i) c.perftest(COP_DECRYPT, i) Index: user/ngie/bug-237403/tests/sys/opencrypto/cryptotest.py =================================================================== --- user/ngie/bug-237403/tests/sys/opencrypto/cryptotest.py (revision 346925) +++ user/ngie/bug-237403/tests/sys/opencrypto/cryptotest.py (revision 346926) @@ -1,495 +1,495 @@ #!/usr/local/bin/python2 # # Copyright (c) 2014 The FreeBSD Foundation # All rights reserved. # # This software was developed by John-Mark Gurney under # the sponsorship from the FreeBSD Foundation. # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. 
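Before the test driver itself, a minimal usage sketch of the Crypto wrapper defined in cryptodev.py above. The key, IV and plaintext are illustrative, and running it assumes /dev/crypto exists (the cryptodev(4) module is loaded) and is accessible, since the module opens it at import time:

    import binascii
    import cryptodev

    key = binascii.unhexlify('000102030405060708090a0b0c0d0e0f')  # illustrative 128-bit key
    iv  = binascii.unhexlify('00112233445566778899aabbccddeeff')  # illustrative 16-byte IV
    pt  = b'0123456789abcdef'                                     # exactly one AES block

    c = cryptodev.Crypto(cryptodev.CRYPTO_AES_CBC, key)  # crid defaults to software or hardware
    ct = c.encrypt(pt, iv)
    assert c.decrypt(ct, iv) == pt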
# # $FreeBSD$ # from __future__ import print_function import binascii import errno import cryptodev import itertools import os import struct import unittest from cryptodev import * from glob import iglob katdir = '/usr/local/share/nist-kat' def katg(base, glob): assert os.path.exists(katdir), "Please 'pkg install nist-kat'" if not os.path.exists(os.path.join(katdir, base)): raise unittest.SkipTest("Missing %s test vectors" % (base)) return iglob(os.path.join(katdir, base, glob)) aesmodules = [ 'cryptosoft0', 'aesni0', 'ccr0', 'ccp0' ] desmodules = [ 'cryptosoft0', ] shamodules = [ 'cryptosoft0', 'aesni0', 'ccr0', 'ccp0' ] def GenTestCase(cname): try: crid = cryptodev.Crypto.findcrid(cname) except IOError: return None class GendCryptoTestCase(unittest.TestCase): ############### ##### AES ##### ############### @unittest.skipIf(cname not in aesmodules, 'skipping AES-XTS on %s' % (cname)) def test_xts(self): for i in katg('XTSTestVectors/format tweak value input - data unit seq no', '*.rsp'): self.runXTS(i, cryptodev.CRYPTO_AES_XTS) @unittest.skipIf(cname not in aesmodules, 'skipping AES-CBC on %s' % (cname)) def test_cbc(self): for i in katg('KAT_AES', 'CBC[GKV]*.rsp'): self.runCBC(i) @unittest.skipIf(cname not in aesmodules, 'skipping AES-CCM on %s' % (cname)) def test_ccm(self): for i in katg('ccmtestvectors', 'V*.rsp'): self.runCCMEncrypt(i) for i in katg('ccmtestvectors', 'D*.rsp'): self.runCCMDecrypt(i) @unittest.skipIf(cname not in aesmodules, 'skipping AES-GCM on %s' % (cname)) def test_gcm(self): for i in katg('gcmtestvectors', 'gcmEncrypt*'): self.runGCM(i, 'ENCRYPT') for i in katg('gcmtestvectors', 'gcmDecrypt*'): self.runGCM(i, 'DECRYPT') _gmacsizes = { 32: cryptodev.CRYPTO_AES_256_NIST_GMAC, 24: cryptodev.CRYPTO_AES_192_NIST_GMAC, 16: cryptodev.CRYPTO_AES_128_NIST_GMAC, } def runGCM(self, fname, mode): curfun = None if mode == 'ENCRYPT': swapptct = False curfun = Crypto.encrypt elif mode == 'DECRYPT': swapptct = True curfun = Crypto.decrypt else: raise RuntimeError('unknown mode: %r' % repr(mode)) columns = [ 'Count', 'Key', 'IV', 'CT', 'AAD', 'Tag', 'PT', ] with cryptodev.KATParser(fname, columns) as parser: self.runGCMWithParser(parser, mode) def runGCMWithParser(self, parser, mode): for _, lines in next(parser): for data in lines: curcnt = int(data['Count']) cipherkey = binascii.unhexlify(data['Key']) iv = binascii.unhexlify(data['IV']) aad = binascii.unhexlify(data['AAD']) tag = binascii.unhexlify(data['Tag']) if 'FAIL' not in data: pt = binascii.unhexlify(data['PT']) ct = binascii.unhexlify(data['CT']) if len(iv) != 12: # XXX - isn't supported continue try: c = Crypto(cryptodev.CRYPTO_AES_NIST_GCM_16, cipherkey, mac=self._gmacsizes[len(cipherkey)], mackey=cipherkey, crid=crid, maclen=16) except EnvironmentError as e: # Can't test algorithms the driver does not support. if e.errno != errno.EOPNOTSUPP: raise continue if mode == 'ENCRYPT': try: rct, rtag = c.encrypt(pt, iv, aad) except EnvironmentError as e: # Can't test inputs the driver does not support. if e.errno != errno.EINVAL: raise continue rtag = rtag[:len(tag)] data['rct'] = binascii.hexlify(rct) data['rtag'] = binascii.hexlify(rtag) self.assertEqual(rct, ct, repr(data)) self.assertEqual(rtag, tag, repr(data)) else: if len(tag) != 16: continue args = (ct, iv, aad, tag) if 'FAIL' in data: self.assertRaises(IOError, c.decrypt, *args) else: try: rpt, rtag = c.decrypt(*args) except EnvironmentError as e: # Can't test inputs the driver does not support. 
if e.errno != errno.EINVAL: raise continue data['rpt'] = binascii.unhexlify(rpt) data['rtag'] = binascii.unhexlify(rtag) self.assertEqual(rpt, pt, repr(data)) def runCBC(self, fname): columns = [ 'COUNT', 'KEY', 'IV', 'PLAINTEXT', 'CIPHERTEXT', ] with cryptodev.KATParser(fname, columns) as parser: self.runCBCWithParser(parser) def runCBCWithParser(self, parser): curfun = None for mode, lines in next(parser): if mode == 'ENCRYPT': swapptct = False curfun = Crypto.encrypt elif mode == 'DECRYPT': swapptct = True curfun = Crypto.decrypt else: raise RuntimeError('unknown mode: %r' % repr(mode)) for data in lines: curcnt = int(data['COUNT']) cipherkey = binascii.unhexlify(data['KEY']) iv = binascii.unhexlify(data['IV']) pt = binascii.unhexlify(data['PLAINTEXT']) ct = binascii.unhexlify(data['CIPHERTEXT']) if swapptct: pt, ct = ct, pt # run the fun c = Crypto(cryptodev.CRYPTO_AES_CBC, cipherkey, crid=crid) r = curfun(c, pt, iv) self.assertEqual(r, ct) def runXTS(self, fname, meth): columns = [ 'COUNT', 'DataUnitLen', 'Key', 'DataUnitSeqNumber', 'PT', 'CT'] with cryptodev.KATParser(fname, columns) as parser: self.runXTSWithParser(parser, meth) def runXTSWithParser(self, parser, meth): curfun = None for mode, lines in next(parser): if mode == 'ENCRYPT': swapptct = False curfun = Crypto.encrypt elif mode == 'DECRYPT': swapptct = True curfun = Crypto.decrypt else: raise RuntimeError('unknown mode: %r' % repr(mode)) for data in lines: curcnt = int(data['COUNT']) nbits = int(data['DataUnitLen']) cipherkey = binascii.unhexlify(data['Key']) iv = struct.pack('QQ', int(data['DataUnitSeqNumber']), 0) pt = binascii.unhexlify(data['PT']) ct = binascii.unhexlify(data['CT']) if nbits % 128 != 0: # XXX - mark as skipped continue if swapptct: pt, ct = ct, pt # run the fun try: c = Crypto(meth, cipherkey, crid=crid) r = curfun(c, pt, iv) except EnvironmentError as e: # Can't test hashes the driver does not support. if e.errno != errno.EOPNOTSUPP: raise continue self.assertEqual(r, ct) def runCCMEncrypt(self, fname): for data in cryptodev.KATCCMParser(fname): Nlen = int(data['Nlen']) if Nlen != 12: # OCF only supports 12 byte IVs continue key = data['Key'].decode('hex') nonce = data['Nonce'].decode('hex') Alen = int(data['Alen']) if Alen != 0: aad = data['Adata'].decode('hex') else: aad = None payload = data['Payload'].decode('hex') ct = data['CT'].decode('hex') try: c = Crypto(crid=crid, cipher=cryptodev.CRYPTO_AES_CCM_16, key=key, mac=cryptodev.CRYPTO_AES_CCM_CBC_MAC, mackey=key, maclen=16) r, tag = Crypto.encrypt(c, payload, nonce, aad) except EnvironmentError as e: if e.errno != errno.EOPNOTSUPP: raise continue out = r + tag self.assertEqual(out, ct, "Count " + data['Count'] + " Actual: " + \ repr(out.encode("hex")) + " Expected: " + \ repr(data) + " on " + cname) def runCCMDecrypt(self, fname): # XXX: Note that all of the current CCM # decryption test vectors use IV and tag sizes # that aren't supported by OCF none of the # tests are actually ran. 
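To make the preceding note concrete, a small hedged sketch of the size gate the loop below applies; the Nlen and Tlen values are only illustrative of the current NIST decrypt vectors:

    # OCF handles only 12-byte nonces and 16-byte tags, so a vector with any
    # other Nlen/Tlen combination is silently skipped rather than run.
    Nlen, Tlen = 13, 8              # illustrative sizes from a typical decrypt vector
    runnable = (Nlen == 12 and Tlen == 16)
    assert not runnable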
for data in cryptodev.KATCCMParser(fname): Nlen = int(data['Nlen']) if Nlen != 12: # OCF only supports 12 byte IVs continue Tlen = int(data['Tlen']) if Tlen != 16: # OCF only supports 16 byte tags continue key = data['Key'].decode('hex') nonce = data['Nonce'].decode('hex') Alen = int(data['Alen']) if Alen != 0: aad = data['Adata'].decode('hex') else: aad = None ct = data['CT'].decode('hex') tag = ct[-16:] ct = ct[:-16] try: c = Crypto(crid=crid, cipher=cryptodev.CRYPTO_AES_CCM_16, key=key, mac=cryptodev.CRYPTO_AES_CCM_CBC_MAC, mackey=key, maclen=16) except EnvironmentError as e: if e.errno != errno.EOPNOTSUPP: raise continue if data['Result'] == 'Fail': self.assertRaises(IOError, c.decrypt, payload, nonce, aad, tag) else: r = Crypto.decrypt(c, payload, nonce, aad, tag) payload = data['Payload'].decode('hex') - Plen = int(data('Plen')) + plen = int(data('Plen')) payload = payload[:plen] self.assertEqual(r, payload, "Count " + data['Count'] + \ " Actual: " + repr(r.encode("hex")) + \ " Expected: " + repr(data) + \ " on " + cname) ############### ##### DES ##### ############### @unittest.skipIf(cname not in desmodules, 'skipping DES on %s' % (cname)) def test_tdes(self): for i in katg('KAT_TDES', 'TCBC[a-z]*.rsp'): self.runTDES(i) def runTDES(self, fname): columns = [ 'COUNT', 'KEYs', 'IV', 'PLAINTEXT', 'CIPHERTEXT', ] with cryptodev.KATParser(fname, columns) as parser: self.runTDESWithParser(parser) def runTDESWithParser(self, parser): curfun = None for mode, lines in next(parser): if mode == 'ENCRYPT': swapptct = False curfun = Crypto.encrypt elif mode == 'DECRYPT': swapptct = True curfun = Crypto.decrypt else: raise RuntimeError('unknown mode: %r' % repr(mode)) for data in lines: curcnt = int(data['COUNT']) key = data['KEYs'] * 3 cipherkey = binascii.unhexlify(key) iv = binascii.unhexlify(data['IV']) pt = binascii.unhexlify(data['PLAINTEXT']) ct = binascii.unhexlify(data['CIPHERTEXT']) if swapptct: pt, ct = ct, pt # run the fun c = Crypto(cryptodev.CRYPTO_3DES_CBC, cipherkey, crid=crid) r = curfun(c, pt, iv) self.assertEqual(r, ct) ############### ##### SHA ##### ############### @unittest.skipIf(cname not in shamodules, 'skipping SHA on %s' % str(cname)) def test_sha(self): for i in katg('shabytetestvectors', 'SHA*Msg.rsp'): self.runSHA(i) def runSHA(self, fname): # Skip SHA512_(224|256) tests if fname.find('SHA512_') != -1: return for hashlength, lines in cryptodev.KATParser(fname, [ 'Len', 'Msg', 'MD' ]): # E.g., hashlength will be "L=20" (bytes) hashlen = int(hashlength.split("=")[1]) if hashlen == 20: alg = cryptodev.CRYPTO_SHA1 elif hashlen == 28: alg = cryptodev.CRYPTO_SHA2_224 elif hashlen == 32: alg = cryptodev.CRYPTO_SHA2_256 elif hashlen == 48: alg = cryptodev.CRYPTO_SHA2_384 elif hashlen == 64: alg = cryptodev.CRYPTO_SHA2_512 else: # Skip unsupported hashes # Slurp remaining input in section for data in lines: continue continue for data in lines: msg = data['Msg'].decode('hex') - msg = msg[:int(data['Len'])] + msg = msg[:int(data['Len'])] md = data['MD'].decode('hex') try: c = Crypto(mac=alg, crid=crid, maclen=hashlen) except EnvironmentError as e: # Can't test hashes the driver does not support. 
if e.errno != errno.EOPNOTSUPP: raise continue _, r = c.encrypt(msg, iv="") self.assertEqual(r, md, "Actual: " + \ repr(r.encode("hex")) + " Expected: " + repr(data) + " on " + cname) @unittest.skipIf(cname not in shamodules, 'skipping SHA-HMAC on %s' % str(cname)) def test_sha1hmac(self): for i in katg('hmactestvectors', 'HMAC.rsp'): self.runSHA1HMAC(i) def runSHA1HMAC(self, fname): columns = [ 'Count', 'Klen', 'Tlen', 'Key', 'Msg', 'Mac' ] with cryptodev.KATParser(fname, columns) as parser: self.runSHA1HMACWithParser(parser) def runSHA1HMACWithParser(self, parser): for hashlength, lines in next(parser): # E.g., hashlength will be "L=20" (bytes) hashlen = int(hashlength.split("=")[1]) blocksize = None if hashlen == 20: alg = cryptodev.CRYPTO_SHA1_HMAC blocksize = 64 elif hashlen == 28: alg = cryptodev.CRYPTO_SHA2_224_HMAC blocksize = 64 elif hashlen == 32: alg = cryptodev.CRYPTO_SHA2_256_HMAC blocksize = 64 elif hashlen == 48: alg = cryptodev.CRYPTO_SHA2_384_HMAC blocksize = 128 elif hashlen == 64: alg = cryptodev.CRYPTO_SHA2_512_HMAC blocksize = 128 else: # Skip unsupported hashes # Slurp remaining input in section for data in lines: continue continue for data in lines: key = binascii.unhexlify(data['Key']) msg = binascii.unhexlify(data['Msg']) mac = binascii.unhexlify(data['Mac']) tlen = int(data['Tlen']) if len(key) > blocksize: continue try: c = Crypto(mac=alg, mackey=key, crid=crid, maclen=hashlen) except EnvironmentError as e: # Can't test hashes the driver does not support. if e.errno != errno.EOPNOTSUPP: raise continue _, r = c.encrypt(msg, iv="") self.assertEqual(r[:tlen], mac, "Actual: " + \ repr(r.encode("hex")) + " Expected: " + repr(data)) return GendCryptoTestCase cryptosoft = GenTestCase('cryptosoft0') aesni = GenTestCase('aesni0') ccr = GenTestCase('ccr0') ccp = GenTestCase('ccp0') if __name__ == '__main__': unittest.main() Index: user/ngie/bug-237403/tools/boot/ci-qemu-test.sh =================================================================== --- user/ngie/bug-237403/tools/boot/ci-qemu-test.sh (revision 346925) +++ user/ngie/bug-237403/tools/boot/ci-qemu-test.sh (revision 346926) @@ -1,109 +1,110 @@ #!/bin/sh # Install loader, kernel, and enough of userland to boot in QEMU and echo # "Hello world." from init, as a very quick smoke test for CI. Uses QEMU's # virtual FAT filesystem to avoid the need to create a disk image. While # designed for CI automated testing, this script can also be run by hand as # a quick smoke-test. The rootgen.sh and related scripts generate much more # extensive tests for many combinations of boot env (ufs, zfs, geli, etc). # # $FreeBSD$ set -e die() { echo "$*" 1>&2 exit 1 } tempdir_cleanup() { trap - EXIT SIGINT SIGHUP SIGTERM SIGQUIT rm -rf ${ROOTDIR} } tempdir_setup() { # Create minimal directory structure and populate it. # Caller must cd ${SRCTOP} before calling this function. for dir in dev bin efi/boot etc lib libexec sbin usr/lib usr/libexec; do mkdir -p ${ROOTDIR}/${dir} done # Install kernel, loader and minimal userland. make -DNO_ROOT DESTDIR=${ROOTDIR} \ MODULES_OVERRIDE= \ WITHOUT_DEBUG_FILES=yes \ WITHOUT_KERNEL_SYMBOLS=yes \ installkernel for dir in stand \ lib/libc lib/libedit lib/ncurses \ libexec/rtld-elf \ bin/sh sbin/init sbin/shutdown; do make -DNO_ROOT DESTDIR=${ROOTDIR} INSTALL="install -U" \ WITHOUT_DEBUG_FILES= \ WITHOUT_MAN= \ WITHOUT_PROFILE= \ WITHOUT_TESTS= \ WITHOUT_TOOLCHAIN= \ -C ${dir} install done # Put loader in standard EFI location. 
mv ${ROOTDIR}/boot/loader.efi ${ROOTDIR}/efi/boot/BOOTx64.EFI # Configuration files. cat > ${ROOTDIR}/boot/loader.conf < ${ROOTDIR}/etc/rc <&1 | tee ${BOOTLOG} # Check whether we succesfully booted... if grep -q 'Hello world.' ${BOOTLOG}; then echo "OK" else die "Did not boot successfully, see ${BOOTLOG}" fi Index: user/ngie/bug-237403/tools/boot/install-boot.sh =================================================================== --- user/ngie/bug-237403/tools/boot/install-boot.sh (revision 346925) +++ user/ngie/bug-237403/tools/boot/install-boot.sh (revision 346926) @@ -1,429 +1,442 @@ #!/bin/sh # $FreeBSD$ # # Installs/updates the necessary boot blocks for the desired boot environment # # Lightly tested.. Intended to be installed, but until it matures, it will just # be a boot tool for regression testing. # insert code here to guess what you have -- yikes! # Minimum size of FAT filesystems, in KB. fat32min=33292 fat16min=2100 die() { echo $* exit 1 } doit() { echo $* eval $* } find-part() { dev=$1 part=$2 gpart show $dev | tail +2 | awk '$4 == "'$part'" { print $3; }' } get_uefi_bootname() { case ${TARGET:-$(uname -m)} in amd64) echo bootx64 ;; arm64) echo bootaa64 ;; i386) echo bootia32 ;; arm) echo bootarm ;; *) die "machine type $(uname -m) doesn't support UEFI" ;; esac } make_esp_file() { local file sizekb loader device mntpt fatbits efibootname file=$1 sizekb=$2 loader=$3 if [ "$sizekb" -ge "$fat32min" ]; then fatbits=32 elif [ "$sizekb" -ge "$fat16min" ]; then fatbits=16 else fatbits=12 fi dd if=/dev/zero of="${file}" bs=1k count="${sizekb}" device=$(mdconfig -a -t vnode -f "${file}") newfs_msdos -F "${fatbits}" -c 1 -L EFISYS "/dev/${device}" > /dev/null 2>&1 mntpt=$(mktemp -d /tmp/stand-test.XXXXXX) mount -t msdosfs "/dev/${device}" "${mntpt}" mkdir -p "${mntpt}/EFI/BOOT" efibootname=$(get_uefi_bootname) cp "${loader}" "${mntpt}/EFI/BOOT/${efibootname}.efi" umount "${mntpt}" rmdir "${mntpt}" mdconfig -d -u "${device}" } make_esp_device() { local dev file mntpt fstype efibootname kbfree loadersize efibootfile local isboot1 existingbootentryloaderfile bootorder bootentry # ESP device node dev=$1 file=$2 mntpt=$(mktemp -d /tmp/stand-test.XXXXXX) # See if we're using an existing (formatted) ESP fstype=$(fstyp "${dev}") if [ "${fstype}" != "msdosfs" ]; then newfs_msdos -F 32 -c 1 -L EFISYS "${dev}" > /dev/null 2>&1 fi mount -t msdosfs "${dev}" "${mntpt}" if [ $? -ne 0 ]; then die "Failed to mount ${dev} as an msdosfs filesystem" fi echo "Mounted ESP ${dev} on ${mntpt}" efibootname=$(get_uefi_bootname) kbfree=$(df -k "${mntpt}" | tail -1 | cut -w -f 4) loadersize=$(stat -f %z "${file}") loadersize=$((loadersize / 1024)) # Check if /EFI/BOOT/BOOTxx.EFI is the FreeBSD boot1.efi # If it is, remove it to avoid leaving stale files around efibootfile="${mntpt}/EFI/BOOT/${efibootname}.efi" if [ -f "${efibootfile}" ]; then isboot1=$(strings "${efibootfile}" | grep "FreeBSD EFI boot block") if [ -n "${isboot1}" ] && [ "$kbfree" -lt "${loadersize}" ]; then echo "Only ${kbfree}KB space remaining: removing old FreeBSD boot1.efi file /EFI/BOOT/${efibootname}.efi" rm "${efibootfile}" rmdir "${mntpt}/EFI/BOOT" else echo "${kbfree}KB space remaining on ESP: renaming old boot1.efi file /EFI/BOOT/${efibootname}.efi /EFI/BOOT/${efibootname}-old.efi" mv "${efibootfile}" "${mntpt}/EFI/BOOT/${efibootname}-old.efi" fi fi if [ ! 
-f "${mntpt}/EFI/freebsd/loader.efi" ] && [ "$kbfree" -lt "$loadersize" ]; then umount "${mntpt}" rmdir "${mntpt}" echo "Failed to update the EFI System Partition ${dev}" echo "Insufficient space remaining for ${file}" echo "Run e.g \"mount -t msdosfs ${dev} /mnt\" to inspect it for files that can be removed." die fi mkdir -p "${mntpt}/EFI/freebsd" # Keep a copy of the existing loader.efi in case there's a problem with the new one if [ -f "${mntpt}/EFI/freebsd/loader.efi" ] && [ "$kbfree" -gt "$((loadersize * 2))" ]; then cp "${mntpt}/EFI/freebsd/loader.efi" "${mntpt}/EFI/freebsd/loader-old.efi" fi echo "Copying loader to /EFI/freebsd on ESP" cp "${file}" "${mntpt}/EFI/freebsd/loader.efi" - existingbootentryloaderfile=$(efibootmgr -v | grep "${mntpt}//EFI/freebsd/loader.efi") + if [ -n "${updatesystem}" ]; then + existingbootentryloaderfile=$(efibootmgr -v | grep "${mntpt}//EFI/freebsd/loader.efi") - if [ -z "$existingbootentryloaderfile" ]; then - # Try again without the double forward-slash in the path - existingbootentryloaderfile=$(efibootmgr -v | grep "${mntpt}/EFI/freebsd/loader.efi") - fi - - if [ -z "$existingbootentryloaderfile" ]; then - echo "Creating UEFI boot entry for FreeBSD" - efibootmgr --create --label FreeBSD --loader "${mntpt}/EFI/freebsd/loader.efi" > /dev/null - if [ $? -ne 0 ]; then - die "Failed to create new boot entry" + if [ -z "$existingbootentryloaderfile" ]; then + # Try again without the double forward-slash in the path + existingbootentryloaderfile=$(efibootmgr -v | grep "${mntpt}/EFI/freebsd/loader.efi") fi - # When creating new entries, efibootmgr doesn't mark them active, so we need to - # do so. It doesn't make it easy to find which entry it just added, so rely on - # the fact that it places the new entry first in BootOrder. - bootorder=$(efivar --name 8be4df61-93ca-11d2-aa0d-00e098032b8c-BootOrder --print --no-name --hex | head -1) - bootentry=$(echo "${bootorder}" | cut -w -f 3)$(echo "${bootorder}" | cut -w -f 2) - echo "Marking UEFI boot entry ${bootentry} active" - efibootmgr --activate "${bootentry}" > /dev/null + if [ -z "$existingbootentryloaderfile" ]; then + echo "Creating UEFI boot entry for FreeBSD" + efibootmgr --create --label FreeBSD --loader "${mntpt}/EFI/freebsd/loader.efi" > /dev/null + if [ $? -ne 0 ]; then + die "Failed to create new boot entry" + fi + + # When creating new entries, efibootmgr doesn't mark them active, so we need to + # do so. It doesn't make it easy to find which entry it just added, so rely on + # the fact that it places the new entry first in BootOrder. + bootorder=$(efivar --name 8be4df61-93ca-11d2-aa0d-00e098032b8c-BootOrder --print --no-name --hex | head -1) + bootentry=$(echo "${bootorder}" | cut -w -f 3)$(echo "${bootorder}" | cut -w -f 2) + echo "Marking UEFI boot entry ${bootentry} active" + efibootmgr --activate "${bootentry}" > /dev/null + else + echo "Existing UEFI FreeBSD boot entry found: not creating a new one" + fi else - echo "Existing UEFI FreeBSD boot entry found: not creating a new one" + # Configure for booting from removable media + if [ ! 
-d "${mntpt}/EFI/BOOT" ]; then + mkdir -p "${mntpt}/EFI/BOOT" + fi + cp "${file}" "${mntpt}/EFI/BOOT/${efibootname}.efi" fi umount "${mntpt}" rmdir "${mntpt}" echo "Finished updating ESP" } make_esp() { local file loaderfile file=$1 loaderfile=$2 if [ -f "$file" ]; then make_esp_file ${file} ${fat32min} ${loaderfile} else make_esp_device ${file} ${loaderfile} fi } make_esp_mbr() { dev=$1 dst=$2 s=$(find-part $dev "!239") if [ -z "$s" ] ; then s=$(find-part $dev "efi") if [ -z "$s" ] ; then die "No ESP slice found" fi fi make_esp /dev/${dev}s${s} ${dst}/boot/loader.efi } make_esp_gpt() { dev=$1 dst=$2 idx=$(find-part $dev "efi") if [ -z "$idx" ] ; then die "No ESP partition found" fi make_esp /dev/${dev}p${idx} ${dst}/boot/loader.efi } boot_nogeli_gpt_ufs_legacy() { dev=$1 dst=$2 idx=$(find-part $dev "freebsd-boot") if [ -z "$idx" ] ; then die "No freebsd-boot partition found" fi doit gpart bootcode -b ${gpt0} -p ${gpt2} -i $idx $dev } boot_nogeli_gpt_ufs_uefi() { make_esp_gpt $1 $2 } boot_nogeli_gpt_ufs_both() { boot_nogeli_gpt_ufs_legacy $1 $2 $3 boot_nogeli_gpt_ufs_uefi $1 $2 $3 } boot_nogeli_gpt_zfs_legacy() { dev=$1 dst=$2 idx=$(find-part $dev "freebsd-boot") if [ -z "$idx" ] ; then die "No freebsd-boot partition found" fi doit gpart bootcode -b ${gpt0} -p ${gptzfs2} -i $idx $dev } boot_nogeli_gpt_zfs_uefi() { make_esp_gpt $1 $2 } boot_nogeli_gpt_zfs_both() { boot_nogeli_gpt_zfs_legacy $1 $2 $3 boot_nogeli_gpt_zfs_uefi $1 $2 $3 } boot_nogeli_mbr_ufs_legacy() { dev=$1 dst=$2 doit gpart bootcode -b ${mbr0} ${dev} s=$(find-part $dev "freebsd") if [ -z "$s" ] ; then die "No freebsd slice found" fi doit gpart bootcode -p ${mbr2} ${dev}s${s} } boot_nogeli_mbr_ufs_uefi() { make_esp_mbr $1 $2 } boot_nogeli_mbr_ufs_both() { boot_nogeli_mbr_ufs_legacy $1 $2 $3 boot_nogeli_mbr_ufs_uefi $1 $2 $3 } boot_nogeli_mbr_zfs_legacy() { dev=$1 dst=$2 # search to find the BSD slice s=$(find-part $dev "freebsd") if [ -z "$s" ] ; then die "No BSD slice found" fi idx=$(find-part ${dev}s${s} "freebsd-zfs") if [ -z "$idx" ] ; then die "No freebsd-zfs slice found" fi # search to find the freebsd-zfs partition within the slice # Or just assume it is 'a' because it has to be since it fails otherwise doit gpart bootcode -b ${dst}/boot/mbr ${dev} dd if=${dst}/boot/zfsboot of=/tmp/zfsboot1 count=1 doit gpart bootcode -b /tmp/zfsboot1 ${dev}s${s} # Put boot1 into the start of part sysctl kern.geom.debugflags=0x10 # Put boot2 into ZFS boot slot doit dd if=${dst}/boot/zfsboot of=/dev/${dev}s${s}a skip=1 seek=1024 sysctl kern.geom.debugflags=0x0 } boot_nogeli_mbr_zfs_uefi() { make_esp_mbr $1 $2 } boot_nogeli_mbr_zfs_both() { boot_nogeli_mbr_zfs_legacy $1 $2 $3 boot_nogeli_mbr_zfs_uefi $1 $2 $3 } boot_geli_gpt_ufs_legacy() { boot_nogeli_gpt_ufs_legacy $1 $2 $3 } boot_geli_gpt_ufs_uefi() { boot_nogeli_gpt_ufs_uefi $1 $2 $3 } boot_geli_gpt_ufs_both() { boot_nogeli_gpt_ufs_both $1 $2 $3 } boot_geli_gpt_zfs_legacy() { boot_nogeli_gpt_zfs_legacy $1 $2 $3 } boot_geli_gpt_zfs_uefi() { boot_nogeli_gpt_zfs_uefi $1 $2 $3 } boot_geli_gpt_zfs_both() { boot_nogeli_gpt_zfs_both $1 $2 $3 } # GELI+MBR is not a valid configuration boot_geli_mbr_ufs_legacy() { exit 1 } boot_geli_mbr_ufs_uefi() { exit 1 } boot_geli_mbr_ufs_both() { exit 1 } boot_geli_mbr_zfs_legacy() { exit 1 } boot_geli_mbr_zfs_uefi() { exit 1 } boot_geli_mbr_zfs_both() { exit 1 } boot_nogeli_vtoc8_ufs_ofw() { dev=$1 dst=$2 # For non-native builds, ensure that geom_part(4) supports VTOC8. 
kldload geom_part_vtoc8.ko doit gpart bootcode -p ${vtoc8} ${dev} } usage() { printf 'Usage: %s -b bios [-d destdir] -f fs [-g geli] [-h] [-o optargs] -s scheme \n' "$0" printf 'Options:\n' printf ' bootdev device to install the boot code on\n' printf ' -b bios bios type: legacy, uefi or both\n' printf ' -d destdir destination filesystem root\n' printf ' -f fs filesystem type: ufs or zfs\n' printf ' -g geli yes or no\n' printf ' -h this help/usage text\n' + printf ' -u Run commands such as efibootmgr to update the\n' + printf ' currently running system\n' printf ' -o optargs optional arguments\n' printf ' -s scheme mbr or gpt\n' exit 0 } srcroot=/ # Note: we really don't support geli boot in this script yet. geli=nogeli -while getopts "b:d:f:g:ho:s:" opt; do +while getopts "b:d:f:g:ho:s:u" opt; do case "$opt" in b) bios=${OPTARG} ;; d) srcroot=${OPTARG} ;; f) fs=${OPTARG} ;; g) case ${OPTARG} in [Yy][Ee][Ss]|geli) geli=geli ;; *) geli=nogeli ;; esac + ;; + u) + updatesystem=1 ;; o) opts=${OPTARG} ;; s) scheme=${OPTARG} ;; ?|h) usage ;; esac done if [ -n "${scheme}" ] && [ -n "${fs}" ] && [ -n "${bios}" ]; then shift $((OPTIND-1)) dev=$1 fi # For gpt, we need to install pmbr as the primary boot loader # it knows about gpt0=${srcroot}/boot/pmbr gpt2=${srcroot}/boot/gptboot gptzfs2=${srcroot}/boot/gptzfsboot # For MBR, we have lots of choices, but select mbr, boot0 has issues with UEFI mbr0=${srcroot}/boot/mbr mbr2=${srcroot}/boot/boot # VTOC8 vtoc8=${srcroot}/boot/boot1 # sanity check here # Check if we've been given arguments. If not, this script is probably being # sourced, so we shouldn't run anything. if [ -n "${dev}" ]; then eval boot_${geli}_${scheme}_${fs}_${bios} $dev $srcroot $opts || echo "Unsupported boot env: ${geli}-${scheme}-${fs}-${bios}" fi Index: user/ngie/bug-237403/tools/boot/rootgen.sh =================================================================== --- user/ngie/bug-237403/tools/boot/rootgen.sh (revision 346925) +++ user/ngie/bug-237403/tools/boot/rootgen.sh (revision 346926) @@ -1,867 +1,892 @@ #!/bin/sh # $FreeBSD$ passphrase=passphrase iterations=50000 # The smallest FAT32 filesystem is 33292 KB espsize=33292 # # Builds all the bat-shit crazy combinations we support booting from, # at least for amd64. It assume you have a ~sane kernel in /boot/kernel # and copies that into the ~150MB root images we create (we create the du # size of the kernel + 20MB # # Sad panda sez: this runs as root, but could be userland if someone # creates userland geli and zfs tools. # # This assumes an external program install-boot.sh which will install # the appropriate boot files in the appropriate locations. # # These images assume ada0 will be the root image. We should likely # use labels, but we don't. # # Assumes you've already rebuilt... maybe bad? Also maybe bad: the env # vars should likely be conditionally set to allow better automation. # +. $(dirname $0)/install-boot.sh + cpsys() { src=$1 dst=$2 # Copy kernel + boot loader (cd $src ; tar cf - .) 
| (cd $dst; tar xf -) } mk_nogeli_gpt_ufs_legacy() { src=$1 img=$2 cat > ${src}/etc/fstab < ${src}/etc/fstab < ${src}/etc/fstab <> ${mntpt}/boot/loader.conf <> ${mntpt}/boot/loader.conf <> ${mntpt}/boot/loader.conf < ${src}/etc/fstab < ${src}/etc/fstab < ${src}/etc/fstab <> ${mntpt}/boot/loader.conf <> ${mntpt}/boot/loader.conf <> ${mntpt}/boot/loader.conf < ${mntpt}/boot/loader.conf < ${mntpt}/etc/fstab < ${mntpt}/boot/loader.conf < ${mntpt}/etc/fstab < ${mntpt}/boot/loader.conf < ${mntpt}/etc/fstab <> ${mntpt}/boot/loader.conf <> ${mntpt}/boot/loader.conf < ${mntpt}/boot/loader.conf < ${src}/etc/fstab < $sh chmod 755 $sh # https://wiki.freebsd.org/arm64/QEMU also has # -device virtio-net-device,netdev=net0 # -netdev user,id=net0 } # Amd64 qemu qemu_amd64_legacy() { img=$1 sh=$2 echo "qemu-system-x86_64 -m 256m --drive file=${img},format=raw ${qser}" > $sh chmod 755 $sh } qemu_amd64_uefi() { img=$1 sh=$2 echo "qemu-system-x86_64 -m 256m -bios ~/bios/OVMF-X64.fd --drive file=${img},format=raw ${qser}" > $sh chmod 755 $sh } qemu_amd64_both() { img=$1 sh=$2 echo "qemu-system-x86_64 -m 256m --drive file=${img},format=raw ${qser}" > $sh echo "qemu-system-x86_64 -m 256m -bios ~/bios/OVMF-X64.fd --drive file=${img},format=raw ${qser}" >> $sh chmod 755 $sh } # arm # nothing listed? # i386 qemu_i386_legacy() { img=$1 sh=$2 echo "qemu-system-i386 --drive file=${img},format=raw ${qser}" > $sh chmod 755 $sh } # Not yet supported qemu_i386_uefi() { img=$1 sh=$2 echo "qemu-system-i386 -bios ~/bios/OVMF-X32.fd --drive file=${img},format=raw ${qser}" > $sh chmod 755 $sh } # Needs UEFI to be supported qemu_i386_both() { img=$1 sh=$2 echo "qemu-system-i386 --drive file=${img},format=raw ${qser}" > $sh echo "qemu-system-i386 -bios ~/bios/OVMF-X32.fd --drive file=${img},format=raw ${qser}" >> $sh chmod 755 $sh } make_one_image() { local arch=${1?} local geli=${2?} local scheme=${3?} local fs=${4?} local bios=${5?} # Create sparse file and mount newly created filesystem(s) on it img=${IMGDIR}/${arch}-${geli}-${scheme}-${fs}-${bios}.img sh=${IMGDIR}/${arch}-${geli}-${scheme}-${fs}-${bios}.sh echo "vvvvvvvvvvvvvv Creating $img vvvvvvvvvvvvvvv" rm -f ${img}* eval mk_${geli}_${scheme}_${fs}_${bios} ${DESTDIR} ${img} ${MNTPT} ${geli} ${scheme} ${fs} ${bios} eval qemu_${arch}_${bios} ${img} ${sh} [ -n "${SUDO_USER}" ] && chown ${SUDO_USER} ${img}* echo "^^^^^^^^^^^^^^ Created $img ^^^^^^^^^^^^^^^" } # mips # qemu-system-mips -kernel /path/to/rootfs/boot/kernel/kernel -nographic -hda /path/to/disk.img -m 2048 # Powerpc -- doesn't work but maybe it would enough for testing -- needs details # powerpc64 # qemu-system-ppc64 -drive file=/path/to/disk.img,format=raw # sparc64 # qemu-system-sparc64 -drive file=/path/to/disk.img,format=raw # Misc variables SRCTOP=$(make -v SRCTOP) cd ${SRCTOP}/stand OBJDIR=$(make -v .OBJDIR) IMGDIR=${OBJDIR}/boot-images mkdir -p ${IMGDIR} MNTPT=$(mktemp -d /tmp/stand-test.XXXXXX) # Setup the installed tree... 
DESTDIR=${OBJDIR}/boot-tree rm -rf ${DESTDIR} mkdir -p ${DESTDIR}/boot/defaults mkdir -p ${DESTDIR}/boot/kernel cp /boot/kernel/kernel ${DESTDIR}/boot/kernel echo -h -D -S115200 > ${DESTDIR}/boot.config cat > ${DESTDIR}/boot/loader.conf < ${DESTDIR}/etc/rc < #include #ifdef _UWIN # include # include # include # include #endif +#include #include #include #ifndef MAP_FILE # define MAP_FILE 0 #endif #include #include #include #include #include #include #include #include #define NUMPRINTCOLUMNS 32 /* # columns of data to print on each line */ /* * A log entry is an operation and a bunch of arguments. */ struct log_entry { int operation; int args[3]; }; #define LOGSIZE 1000 struct log_entry oplog[LOGSIZE]; /* the log */ int logptr = 0; /* current position in log */ int logcount = 0; /* total ops */ /* * Define operations */ #define OP_READ 1 #define OP_WRITE 2 #define OP_TRUNCATE 3 #define OP_CLOSEOPEN 4 #define OP_MAPREAD 5 #define OP_MAPWRITE 6 #define OP_SKIPPED 7 #define OP_INVALIDATE 8 int page_size; int page_mask; char *original_buf; /* a pointer to the original data */ char *good_buf; /* a pointer to the correct data */ char *temp_buf; /* a pointer to the current data */ char *fname; /* name of our test file */ int fd; /* fd for our test file */ off_t file_size = 0; off_t biggest = 0; char state[256]; unsigned long testcalls = 0; /* calls to function "test" */ unsigned long simulatedopcount = 0; /* -b flag */ int closeprob = 0; /* -c flag */ int invlprob = 0; /* -i flag */ int debug = 0; /* -d flag */ unsigned long debugstart = 0; /* -D flag */ unsigned long maxfilelen = 256 * 1024; /* -l flag */ int sizechecks = 1; /* -n flag disables them */ int maxoplen = 64 * 1024; /* -o flag */ int quiet = 0; /* -q flag */ unsigned long progressinterval = 0; /* -p flag */ int readbdy = 1; /* -r flag */ int style = 0; /* -s flag */ int truncbdy = 1; /* -t flag */ int writebdy = 1; /* -w flag */ long monitorstart = -1; /* -m flag */ long monitorend = -1; /* -m flag */ int lite = 0; /* -L flag */ long numops = -1; /* -N flag */ int randomoplen = 1; /* -O flag disables it */ int seed = 1; /* -S flag */ int mapped_writes = 1; /* -W flag disables */ int mapped_reads = 1; /* -R flag disables it */ int mapped_msync = 1; /* -U flag disables */ int fsxgoodfd = 0; FILE * fsxlogf = NULL; int badoff = -1; int closeopen = 0; int invl = 0; void vwarnc(code, fmt, ap) int code; const char *fmt; va_list ap; { fprintf(stderr, "fsx: "); if (fmt != NULL) { vfprintf(stderr, fmt, ap); fprintf(stderr, ": "); } fprintf(stderr, "%s\n", strerror(code)); } void warn(const char * fmt, ...) { va_list ap; va_start(ap, fmt); vwarnc(errno, fmt, ap); va_end(ap); } void prt(char *fmt, ...) { va_list args; va_start(args, fmt); vfprintf(stdout, fmt, args); va_end(args); if (fsxlogf) { va_start(args, fmt); vfprintf(fsxlogf, fmt, args); va_end(args); } } void prterr(char *prefix) { prt("%s%s%s\n", prefix, prefix ? 
": " : "", strerror(errno)); } void do_log4(int operation, int arg0, int arg1, int arg2) { struct log_entry *le; le = &oplog[logptr]; le->operation = operation; le->args[0] = arg0; le->args[1] = arg1; le->args[2] = arg2; logptr++; logcount++; if (logptr >= LOGSIZE) logptr = 0; } void log4(int operation, int arg0, int arg1, int arg2) { do_log4(operation, arg0, arg1, arg2); if (closeopen) do_log4(OP_CLOSEOPEN, 0, 0, 0); if (invl) do_log4(OP_INVALIDATE, 0, 0, 0); } void logdump(void) { struct log_entry *lp; int i, count, down, opnum; prt("LOG DUMP (%d total operations):\n", logcount); if (logcount < LOGSIZE) { i = 0; count = logcount; } else { i = logptr; count = LOGSIZE; } opnum = i + 1 + (logcount/LOGSIZE)*LOGSIZE; for ( ; count > 0; count--) { lp = &oplog[i]; if (lp->operation == OP_CLOSEOPEN || lp->operation == OP_INVALIDATE) { switch (lp->operation) { case OP_CLOSEOPEN: prt("\t\tCLOSE/OPEN\n"); break; case OP_INVALIDATE: prt("\t\tMS_INVALIDATE\n"); break; } i++; if (i == LOGSIZE) i = 0; continue; } prt("%d(%d mod 256): ", opnum, opnum%256); switch (lp->operation) { case OP_MAPREAD: prt("MAPREAD\t0x%x thru 0x%x\t(0x%x bytes)", lp->args[0], lp->args[0] + lp->args[1] - 1, lp->args[1]); if (badoff >= lp->args[0] && badoff < lp->args[0] + lp->args[1]) prt("\t***RRRR***"); break; case OP_MAPWRITE: prt("MAPWRITE 0x%x thru 0x%x\t(0x%x bytes)", lp->args[0], lp->args[0] + lp->args[1] - 1, lp->args[1]); if (badoff >= lp->args[0] && badoff < lp->args[0] + lp->args[1]) prt("\t******WWWW"); break; case OP_READ: prt("READ\t0x%x thru 0x%x\t(0x%x bytes)", lp->args[0], lp->args[0] + lp->args[1] - 1, lp->args[1]); if (badoff >= lp->args[0] && badoff < lp->args[0] + lp->args[1]) prt("\t***RRRR***"); break; case OP_WRITE: - prt("WRITE\t0x%x thru 0x%x\t(0x%x bytes)", - lp->args[0], lp->args[0] + lp->args[1] - 1, - lp->args[1]); - if (lp->args[0] > lp->args[2]) - prt(" HOLE"); - else if (lp->args[0] + lp->args[1] > lp->args[2]) - prt(" EXTEND"); - if ((badoff >= lp->args[0] || badoff >=lp->args[2]) && - badoff < lp->args[0] + lp->args[1]) - prt("\t***WWWW"); + { + int offset = lp->args[0]; + int len = lp->args[1]; + int oldlen = lp->args[2]; + + prt("WRITE\t0x%x thru 0x%x\t(0x%x bytes)", + offset, offset + len - 1, + len); + if (offset > oldlen) + prt(" HOLE"); + else if (offset + len > oldlen) + prt(" EXTEND"); + if ((badoff >= offset || badoff >=oldlen) && + badoff < offset + len) + prt("\t***WWWW"); + } break; case OP_TRUNCATE: down = lp->args[0] < lp->args[1]; prt("TRUNCATE %s\tfrom 0x%x to 0x%x", down ? "DOWN" : "UP", lp->args[1], lp->args[0]); if (badoff >= lp->args[!down] && badoff < lp->args[!!down]) prt("\t******WWWW"); break; case OP_SKIPPED: prt("SKIPPED (no operation)"); break; default: prt("BOGUS LOG ENTRY (operation code = %d)!", lp->operation); } prt("\n"); opnum++; i++; if (i == LOGSIZE) i = 0; } } void save_buffer(char *buffer, off_t bufferlength, int fd) { off_t ret; ssize_t byteswritten; if (fd <= 0 || bufferlength == 0) return; if (bufferlength > SSIZE_MAX) { prt("fsx flaw: overflow in save_buffer\n"); exit(67); } if (lite) { off_t size_by_seek = lseek(fd, (off_t)0, SEEK_END); if (size_by_seek == (off_t)-1) prterr("save_buffer: lseek eof"); else if (bufferlength > size_by_seek) { warn("save_buffer: .fsxgood file too short... 
will save 0x%llx bytes instead of 0x%llx\n", (unsigned long long)size_by_seek, (unsigned long long)bufferlength); bufferlength = size_by_seek; } } ret = lseek(fd, (off_t)0, SEEK_SET); if (ret == (off_t)-1) prterr("save_buffer: lseek 0"); byteswritten = write(fd, buffer, (size_t)bufferlength); if (byteswritten != bufferlength) { if (byteswritten == -1) prterr("save_buffer write"); else warn("save_buffer: short write, 0x%x bytes instead of 0x%llx\n", (unsigned)byteswritten, (unsigned long long)bufferlength); } } void report_failure(int status) { logdump(); if (fsxgoodfd) { if (good_buf) { save_buffer(good_buf, file_size, fsxgoodfd); prt("Correct content saved for comparison\n"); prt("(maybe hexdump \"%s\" vs \"%s.fsxgood\")\n", fname, fname); } close(fsxgoodfd); } exit(status); } #define short_at(cp) ((unsigned short)((*((unsigned char *)(cp)) << 8) | \ *(((unsigned char *)(cp)) + 1))) void check_buffers(unsigned offset, unsigned size) { unsigned char c, t; unsigned i = 0; unsigned n = 0; unsigned op = 0; unsigned bad = 0; if (memcmp(good_buf + offset, temp_buf, size) != 0) { prt("READ BAD DATA: offset = 0x%x, size = 0x%x\n", offset, size); prt("OFFSET\tGOOD\tBAD\tRANGE\n"); while (size > 0) { c = good_buf[offset]; t = temp_buf[i]; if (c != t) { if (n == 0) { bad = short_at(&temp_buf[i]); prt("0x%5x\t0x%04x\t0x%04x", offset, short_at(&good_buf[offset]), bad); op = temp_buf[offset & 1 ? i+1 : i]; } n++; badoff = offset; } offset++; i++; size--; } if (n) { prt("\t0x%5x\n", n); if (bad) prt("operation# (mod 256) for the bad data may be %u\n", ((unsigned)op & 0xff)); else prt("operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops\n"); } else prt("????????????????\n"); report_failure(110); } } void check_size(void) { struct stat statbuf; off_t size_by_seek; if (fstat(fd, &statbuf)) { prterr("check_size: fstat"); statbuf.st_size = -1; } size_by_seek = lseek(fd, (off_t)0, SEEK_END); if (file_size != statbuf.st_size || file_size != size_by_seek) { prt("Size error: expected 0x%llx stat 0x%llx seek 0x%llx\n", (unsigned long long)file_size, (unsigned long long)statbuf.st_size, (unsigned long long)size_by_seek); report_failure(120); } } void check_trunc_hack(void) { struct stat statbuf; ftruncate(fd, (off_t)0); ftruncate(fd, (off_t)100000); fstat(fd, &statbuf); if (statbuf.st_size != (off_t)100000) { prt("no extend on truncate! 
not posix!\n"); exit(130); } ftruncate(fd, (off_t)0); } void doread(unsigned offset, unsigned size) { off_t ret; unsigned iret; offset -= offset % readbdy; if (size == 0) { if (!quiet && testcalls > simulatedopcount) prt("skipping zero size read\n"); log4(OP_SKIPPED, OP_READ, offset, size); return; } if (size + offset > file_size) { if (!quiet && testcalls > simulatedopcount) prt("skipping seek/read past end of file\n"); log4(OP_SKIPPED, OP_READ, offset, size); return; } log4(OP_READ, offset, size, 0); if (testcalls <= simulatedopcount) return; if (!quiet && ((progressinterval && testcalls % progressinterval == 0) || (debug && (monitorstart == -1 || (offset + size > monitorstart && (monitorend == -1 || offset <= monitorend)))))) prt("%lu read\t0x%x thru\t0x%x\t(0x%x bytes)\n", testcalls, offset, offset + size - 1, size); ret = lseek(fd, (off_t)offset, SEEK_SET); if (ret == (off_t)-1) { prterr("doread: lseek"); report_failure(140); } iret = read(fd, temp_buf, size); if (iret != size) { if (iret == -1) prterr("doread: read"); else prt("short read: 0x%x bytes instead of 0x%x\n", iret, size); report_failure(141); } check_buffers(offset, size); } void check_eofpage(char *s, unsigned offset, char *p, int size) { uintptr_t last_page, should_be_zero; if (offset + size <= (file_size & ~page_mask)) return; /* * we landed in the last page of the file * test to make sure the VM system provided 0's * beyond the true end of the file mapping * (as required by mmap def in 1996 posix 1003.1) */ last_page = ((uintptr_t)p + (offset & page_mask) + size) & ~page_mask; for (should_be_zero = last_page + (file_size & page_mask); should_be_zero < last_page + page_size; should_be_zero++) if (*(char *)should_be_zero) { prt("Mapped %s: non-zero data past EOF (0x%llx) page offset 0x%x is 0x%04x\n", s, file_size - 1, should_be_zero & page_mask, short_at(should_be_zero)); report_failure(205); } } void domapread(unsigned offset, unsigned size) { unsigned pg_offset; unsigned map_size; char *p; offset -= offset % readbdy; if (size == 0) { if (!quiet && testcalls > simulatedopcount) prt("skipping zero size read\n"); log4(OP_SKIPPED, OP_MAPREAD, offset, size); return; } if (size + offset > file_size) { if (!quiet && testcalls > simulatedopcount) prt("skipping seek/read past end of file\n"); log4(OP_SKIPPED, OP_MAPREAD, offset, size); return; } log4(OP_MAPREAD, offset, size, 0); if (testcalls <= simulatedopcount) return; if (!quiet && ((progressinterval && testcalls % progressinterval == 0) || (debug && (monitorstart == -1 || (offset + size > monitorstart && (monitorend == -1 || offset <= monitorend)))))) prt("%lu mapread\t0x%x thru\t0x%x\t(0x%x bytes)\n", testcalls, offset, offset + size - 1, size); pg_offset = offset & page_mask; map_size = pg_offset + size; if ((p = (char *)mmap(0, map_size, PROT_READ, MAP_FILE | MAP_SHARED, fd, (off_t)(offset - pg_offset))) == (char *)-1) { prterr("domapread: mmap"); report_failure(190); } memcpy(temp_buf, p + pg_offset, size); check_eofpage("Read", offset, p, size); if (munmap(p, map_size) != 0) { prterr("domapread: munmap"); report_failure(191); } check_buffers(offset, size); } void gendata(char *original_buf, char *good_buf, unsigned offset, unsigned size) { while (size--) { good_buf[offset] = testcalls % 256; if (offset % 2) good_buf[offset] += original_buf[offset]; offset++; } } void dowrite(unsigned offset, unsigned size) { off_t ret; unsigned iret; offset -= offset % writebdy; if (size == 0) { if (!quiet && testcalls > simulatedopcount) prt("skipping zero size write\n"); 
log4(OP_SKIPPED, OP_WRITE, offset, size); return; } log4(OP_WRITE, offset, size, file_size); gendata(original_buf, good_buf, offset, size); if (file_size < offset + size) { if (file_size < offset) memset(good_buf + file_size, '\0', offset - file_size); file_size = offset + size; if (lite) { warn("Lite file size bug in fsx!"); report_failure(149); } } if (testcalls <= simulatedopcount) return; if (!quiet && ((progressinterval && testcalls % progressinterval == 0) || (debug && (monitorstart == -1 || (offset + size > monitorstart && (monitorend == -1 || offset <= monitorend)))))) prt("%lu write\t0x%x thru\t0x%x\t(0x%x bytes)\n", testcalls, offset, offset + size - 1, size); ret = lseek(fd, (off_t)offset, SEEK_SET); if (ret == (off_t)-1) { prterr("dowrite: lseek"); report_failure(150); } iret = write(fd, good_buf + offset, size); if (iret != size) { if (iret == -1) prterr("dowrite: write"); else prt("short write: 0x%x bytes instead of 0x%x\n", iret, size); report_failure(151); } } void domapwrite(unsigned offset, unsigned size) { unsigned pg_offset; unsigned map_size; off_t cur_filesize; char *p; offset -= offset % writebdy; if (size == 0) { if (!quiet && testcalls > simulatedopcount) prt("skipping zero size write\n"); log4(OP_SKIPPED, OP_MAPWRITE, offset, size); return; } cur_filesize = file_size; log4(OP_MAPWRITE, offset, size, 0); gendata(original_buf, good_buf, offset, size); if (file_size < offset + size) { if (file_size < offset) memset(good_buf + file_size, '\0', offset - file_size); file_size = offset + size; if (lite) { warn("Lite file size bug in fsx!"); report_failure(200); } } if (testcalls <= simulatedopcount) return; if (!quiet && ((progressinterval && testcalls % progressinterval == 0) || (debug && (monitorstart == -1 || (offset + size > monitorstart && (monitorend == -1 || offset <= monitorend)))))) prt("%lu mapwrite\t0x%x thru\t0x%x\t(0x%x bytes)\n", testcalls, offset, offset + size - 1, size); if (file_size > cur_filesize) { if (ftruncate(fd, file_size) == -1) { prterr("domapwrite: ftruncate"); exit(201); } } pg_offset = offset & page_mask; map_size = pg_offset + size; if ((p = (char *)mmap(0, map_size, PROT_READ | PROT_WRITE, MAP_FILE | MAP_SHARED, fd, (off_t)(offset - pg_offset))) == MAP_FAILED) { prterr("domapwrite: mmap"); report_failure(202); } memcpy(p + pg_offset, good_buf + offset, size); if (mapped_msync && msync(p, map_size, MS_SYNC) != 0) { prterr("domapwrite: msync"); report_failure(203); } check_eofpage("Write", offset, p, size); if (munmap(p, map_size) != 0) { prterr("domapwrite: munmap"); report_failure(204); } } void dotruncate(unsigned size) { int oldsize = file_size; size -= size % truncbdy; if (size > biggest) { biggest = size; if (!quiet && testcalls > simulatedopcount) prt("truncating to largest ever: 0x%x\n", size); } log4(OP_TRUNCATE, size, (unsigned)file_size, 0); if (size > file_size) memset(good_buf + file_size, '\0', size - file_size); file_size = size; if (testcalls <= simulatedopcount) return; if ((progressinterval && testcalls % progressinterval == 0) || (debug && (monitorstart == -1 || monitorend == -1 || size <= monitorend))) prt("%lu trunc\tfrom 0x%x to 0x%x\n", testcalls, oldsize, size); if (ftruncate(fd, (off_t)size) == -1) { prt("ftruncate1: %x\n", size); prterr("dotruncate: ftruncate"); report_failure(160); } } void writefileimage() { ssize_t iret; if (lseek(fd, (off_t)0, SEEK_SET) == (off_t)-1) { prterr("writefileimage: lseek"); report_failure(171); } iret = write(fd, good_buf, file_size); if ((off_t)iret != file_size) { if (iret == -1) 
prterr("writefileimage: write"); else prt("short write: 0x%x bytes instead of 0x%llx\n", iret, (unsigned long long)file_size); report_failure(172); } if (lite ? 0 : ftruncate(fd, file_size) == -1) { prt("ftruncate2: %llx\n", (unsigned long long)file_size); prterr("writefileimage: ftruncate"); report_failure(173); } } void docloseopen(void) { if (testcalls <= simulatedopcount) return; if (debug) prt("%lu close/open\n", testcalls); if (close(fd)) { prterr("docloseopen: close"); report_failure(180); } fd = open(fname, O_RDWR, 0); if (fd < 0) { prterr("docloseopen: open"); report_failure(181); } } void doinvl(void) { char *p; if (file_size == 0) return; if (testcalls <= simulatedopcount) return; if (debug) prt("%lu msync(MS_INVALIDATE)\n", testcalls); if ((p = (char *)mmap(0, file_size, PROT_READ | PROT_WRITE, MAP_FILE | MAP_SHARED, fd, 0)) == MAP_FAILED) { prterr("doinvl: mmap"); report_failure(205); } if (msync(p, 0, MS_SYNC | MS_INVALIDATE) != 0) { prterr("doinvl: msync"); report_failure(206); } if (munmap(p, file_size) != 0) { prterr("doinvl: munmap"); report_failure(207); } } void test(void) { unsigned long offset; unsigned long size = maxoplen; unsigned long rv = random(); unsigned long op = rv % (3 + !lite + mapped_writes); /* turn off the map read if necessary */ if (op == 2 && !mapped_reads) op = 0; if (simulatedopcount > 0 && testcalls == simulatedopcount) writefileimage(); testcalls++; if (closeprob) closeopen = (rv >> 3) < (1 << 28) / closeprob; if (invlprob) invl = (rv >> 3) < (1 << 28) / invlprob; if (debugstart > 0 && testcalls >= debugstart) debug = 1; if (!quiet && testcalls < simulatedopcount && testcalls % 100000 == 0) prt("%lu...\n", testcalls); /* * READ: op = 0 * WRITE: op = 1 * MAPREAD: op = 2 * TRUNCATE: op = 3 * MAPWRITE: op = 3 or 4 */ if (lite ? 0 : op == 3 && style == 0) /* vanilla truncate? */ dotruncate(random() % maxfilelen); else { if (randomoplen) size = random() % (maxoplen+1); if (lite ? 0 : op == 3) dotruncate(size); else { offset = random(); if (op == 1 || op == (lite ? 
3 : 4)) { offset %= maxfilelen; if (offset + size > maxfilelen) size = maxfilelen - offset; if (op != 1) domapwrite(offset, size); else dowrite(offset, size); } else { if (file_size) offset %= file_size; else offset = 0; if (offset + size > file_size) size = file_size - offset; if (op != 0) domapread(offset, size); else doread(offset, size); } } } if (sizechecks && testcalls > simulatedopcount) check_size(); if (invl) doinvl(); if (closeopen) docloseopen(); } void cleanup(sig) int sig; { if (sig) prt("signal %d\n", sig); prt("testcalls = %lu\n", testcalls); exit(sig); } void usage(void) { fprintf(stdout, "usage: %s", "fsx [-dnqLOW] [-b opnum] [-c Prob] [-l flen] [-m start:end] [-o oplen] [-p progressinterval] [-r readbdy] [-s style] [-t truncbdy] [-w writebdy] [-D startingop] [-N numops] [-P dirpath] [-S seed] fname\n\ -b opnum: beginning operation number (default 1)\n\ -c P: 1 in P chance of file close+open at each op (default infinity)\n\ -d: debug output for all operations\n\ -i P: 1 in P chance of calling msync(MS_INVALIDATE) (default infinity)\n\ -l flen: the upper bound on file size (default 262144)\n\ -m startop:endop: monitor (print debug output) specified byte range (default 0:infinity)\n\ -n: no verifications of file size\n\ -o oplen: the upper bound on operation size (default 65536)\n\ -p progressinterval: debug output at specified operation interval\n\ -q: quieter operation\n\ -r readbdy: 4096 would make reads page aligned (default 1)\n\ -s style: 1 gives smaller truncates (default 0)\n\ -t truncbdy: 4096 would make truncates page aligned (default 1)\n\ -w writebdy: 4096 would make writes page aligned (default 1)\n\ -D startingop: debug output starting at specified operation\n\ -L: fsxLite - no file creations & no file size changes\n\ -N numops: total # operations to do (default infinity)\n\ -O: use oplen (see -o flag) for every op (default random)\n\ -P dirpath: save .fsxlog and .fsxgood files in dirpath (default ./)\n\ -S seed: for random # generator (default 1) 0 gets timestamp\n\ -W: mapped write operations DISabled\n\ -R: mapped read operations DISabled)\n\ -U: msync after mapped write operations DISabled\n\ fname: this filename is REQUIRED (no default)\n"); exit(90); } int getnum(char *s, char **e) { int ret = -1; *e = (char *) 0; ret = strtol(s, e, 0); if (*e) switch (**e) { case 'b': case 'B': ret *= 512; *e = *e + 1; break; case 'k': case 'K': ret *= 1024; *e = *e + 1; break; case 'm': case 'M': ret *= 1024*1024; *e = *e + 1; break; case 'w': case 'W': ret *= 4; *e = *e + 1; break; } return (ret); } int main(int argc, char **argv) { int i, ch; char *endp; char goodfile[1024]; char logfile[1024]; + struct timespec now; goodfile[0] = 0; logfile[0] = 0; page_size = getpagesize(); page_mask = page_size - 1; setvbuf(stdout, (char *)0, _IOLBF, 0); /* line buffered stdout */ while ((ch = getopt(argc, argv, "b:c:di:l:m:no:p:qr:s:t:w:D:LN:OP:RS:UW")) != -1) switch (ch) { case 'b': simulatedopcount = getnum(optarg, &endp); if (!quiet) fprintf(stdout, "Will begin at operation %ld\n", simulatedopcount); if (simulatedopcount == 0) usage(); simulatedopcount -= 1; break; case 'c': closeprob = getnum(optarg, &endp); if (!quiet) fprintf(stdout, "Chance of close/open is 1 in %d\n", closeprob); if (closeprob <= 0) usage(); break; case 'd': debug = 1; break; case 'i': invlprob = getnum(optarg, &endp); if (!quiet) fprintf(stdout, "Chance of MS_INVALIDATE is 1 in %d\n", invlprob); if (invlprob <= 0) usage(); break; case 'l': maxfilelen = getnum(optarg, &endp); if (maxfilelen <= 0) usage(); 
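getnum() above accepts dd(1)-style size suffixes, so the boundary and length options can be given in blocks, kilobytes, megabytes, or 4-byte words; for instance, "-l 256k" sets the file-length limit to 262144. Hypothetical calls, shown only to illustrate the multipliers:

	char *ep;

	getnum("256k", &ep);	/* 256 * 1024      == 262144  */
	getnum("2b", &ep);	/* 2 * 512         == 1024    */
	getnum("1m", &ep);	/* 1 * 1024 * 1024 == 1048576 */
	getnum("16w", &ep);	/* 16 * 4          == 64      */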
break; case 'm': monitorstart = getnum(optarg, &endp); if (monitorstart < 0) usage(); if (!endp || *endp++ != ':') usage(); monitorend = getnum(endp, &endp); if (monitorend < 0) usage(); if (monitorend == 0) monitorend = -1; /* aka infinity */ debug = 1; case 'n': sizechecks = 0; break; case 'o': maxoplen = getnum(optarg, &endp); if (maxoplen <= 0) usage(); break; case 'p': progressinterval = getnum(optarg, &endp); if (progressinterval < 0) usage(); break; case 'q': quiet = 1; break; case 'r': readbdy = getnum(optarg, &endp); if (readbdy <= 0) usage(); break; case 's': style = getnum(optarg, &endp); if (style < 0 || style > 1) usage(); break; case 't': truncbdy = getnum(optarg, &endp); if (truncbdy <= 0) usage(); break; case 'w': writebdy = getnum(optarg, &endp); if (writebdy <= 0) usage(); break; case 'D': debugstart = getnum(optarg, &endp); if (debugstart < 1) usage(); break; case 'L': lite = 1; break; case 'N': numops = getnum(optarg, &endp); if (numops < 0) usage(); break; case 'O': randomoplen = 0; break; case 'P': strncpy(goodfile, optarg, sizeof(goodfile)); strcat(goodfile, "/"); strncpy(logfile, optarg, sizeof(logfile)); strcat(logfile, "/"); break; case 'R': mapped_reads = 0; break; case 'S': seed = getnum(optarg, &endp); - if (seed == 0) - seed = time(0) % 10000; + if (seed == 0) { + if (clock_gettime(CLOCK_REALTIME, &now) != 0) + err(1, "clock_gettime"); + seed = now.tv_nsec % 10000; + } if (!quiet) fprintf(stdout, "Seed set to %d\n", seed); if (seed < 0) usage(); break; case 'W': mapped_writes = 0; if (!quiet) fprintf(stdout, "mapped writes DISABLED\n"); break; case 'U': mapped_msync = 0; if (!quiet) fprintf(stdout, "mapped msync DISABLED\n"); break; default: usage(); /* NOTREACHED */ } argc -= optind; argv += optind; if (argc != 1) usage(); fname = argv[0]; signal(SIGHUP, cleanup); signal(SIGINT, cleanup); signal(SIGPIPE, cleanup); signal(SIGALRM, cleanup); signal(SIGTERM, cleanup); signal(SIGXCPU, cleanup); signal(SIGXFSZ, cleanup); signal(SIGVTALRM, cleanup); signal(SIGUSR1, cleanup); signal(SIGUSR2, cleanup); initstate(seed, state, 256); setstate(state); fd = open(fname, O_RDWR|(lite ? 
0 : O_CREAT|O_TRUNC), 0666); if (fd < 0) { prterr(fname); exit(91); } strncat(goodfile, fname, 256); strcat (goodfile, ".fsxgood"); fsxgoodfd = open(goodfile, O_RDWR|O_CREAT|O_TRUNC, 0666); if (fsxgoodfd < 0) { prterr(goodfile); exit(92); } strncat(logfile, fname, 256); strcat (logfile, ".fsxlog"); fsxlogf = fopen(logfile, "w"); if (fsxlogf == NULL) { prterr(logfile); exit(93); } if (lite) { off_t ret; file_size = maxfilelen = lseek(fd, (off_t)0, SEEK_END); if (file_size == (off_t)-1) { prterr(fname); warn("main: lseek eof"); exit(94); } ret = lseek(fd, (off_t)0, SEEK_SET); if (ret == (off_t)-1) { prterr(fname); warn("main: lseek 0"); exit(95); } } original_buf = (char *) malloc(maxfilelen); for (i = 0; i < maxfilelen; i++) original_buf[i] = random() % 256; good_buf = (char *) malloc(maxfilelen); memset(good_buf, '\0', maxfilelen); temp_buf = (char *) malloc(maxoplen); memset(temp_buf, '\0', maxoplen); if (lite) { /* zero entire existing file */ ssize_t written; written = write(fd, good_buf, (size_t)maxfilelen); if (written != maxfilelen) { if (written == -1) { prterr(fname); warn("main: error on write"); } else - warn("main: short write, 0x%x bytes instead of 0x%x\n", + warn("main: short write, 0x%x bytes instead of 0x%lx\n", (unsigned)written, maxfilelen); exit(98); } } else check_trunc_hack(); while (numops == -1 || numops--) test(); if (close(fd)) { prterr("close"); report_failure(99); } prt("All operations completed A-OK!\n"); exit(0); return 0; } Index: user/ngie/bug-237403/usr.sbin/bhyve/acpi.c =================================================================== --- user/ngie/bug-237403/usr.sbin/bhyve/acpi.c (revision 346925) +++ user/ngie/bug-237403/usr.sbin/bhyve/acpi.c (revision 346926) @@ -1,984 +1,999 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2012 NetApp, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ /* * bhyve ACPI table generator. * * Create the minimal set of ACPI tables required to boot FreeBSD (and * hopefully other o/s's) by writing out ASL template files for each of * the tables and the compiling them to AML with the Intel iasl compiler. * The AML files are then read into guest memory. * * The tables are placed in the guest's ROM area just below 1MB physical, * above the MPTable. 
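 *
 * As a worked illustration of the size-driven placement noted below
 * (assuming a hypothetical VM_MAXCPU of 32; the real value comes from
 * the vmm headers), the MADT_SIZE and *_OFFSET macros defined later in
 * this file would give:
 *	MADT_SIZE   = 44 + 32*8 + 12 + 2*10 + 6 = 338 (0x152)
 *	FADT_OFFSET = 0x100 + 0x152 = 0x252   (guest-physical 0xf2652)
 *	HPET_OFFSET = 0x252 + 0x140 = 0x392
 *	MCFG_OFFSET = 0x392 + 0x40  = 0x3d2
 *	FACS_OFFSET = 0x3d2 + 0x40  = 0x412
 *	DSDT_OFFSET = 0x412 + 0x40  = 0x452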
* - * Layout + * Layout (No longer correct at FADT and beyond due to properly + * calculating the size of the MADT to allow for changes to + * VM_MAXCPU above 21 which overflows this layout.) * ------ * RSDP -> 0xf2400 (36 bytes fixed) * RSDT -> 0xf2440 (36 bytes + 4*7 table addrs, 4 used) * XSDT -> 0xf2480 (36 bytes + 8*7 table addrs, 4 used) * MADT -> 0xf2500 (depends on #CPUs) * FADT -> 0xf2600 (268 bytes) * HPET -> 0xf2740 (56 bytes) * MCFG -> 0xf2780 (60 bytes) * FACS -> 0xf27C0 (64 bytes) * DSDT -> 0xf2800 (variable - can go up to 0x100000) */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include "bhyverun.h" #include "acpi.h" #include "pci_emul.h" /* - * Define the base address of the ACPI tables, and the offsets to - * the individual tables + * Define the base address of the ACPI tables, the sizes of some tables, + * and the offsets to the individual tables, */ #define BHYVE_ACPI_BASE 0xf2400 #define RSDT_OFFSET 0x040 #define XSDT_OFFSET 0x080 #define MADT_OFFSET 0x100 -#define FADT_OFFSET 0x200 -#define HPET_OFFSET 0x340 -#define MCFG_OFFSET 0x380 -#define FACS_OFFSET 0x3C0 -#define DSDT_OFFSET 0x400 +/* + * The MADT consists of: + * 44 Fixed Header + * 8 * maxcpu Processor Local APIC entries + * 12 I/O APIC entry + * 2 * 10 Interrupt Source Override entires + * 6 Local APIC NMI entry + */ +#define MADT_SIZE (44 + VM_MAXCPU*8 + 12 + 2*10 + 6) +#define FADT_OFFSET (MADT_OFFSET + MADT_SIZE) +#define FADT_SIZE 0x140 +#define HPET_OFFSET (FADT_OFFSET + FADT_SIZE) +#define HPET_SIZE 0x40 +#define MCFG_OFFSET (HPET_OFFSET + HPET_SIZE) +#define MCFG_SIZE 0x40 +#define FACS_OFFSET (MCFG_OFFSET + MCFG_SIZE) +#define FACS_SIZE 0x40 +#define DSDT_OFFSET (FACS_OFFSET + FACS_SIZE) #define BHYVE_ASL_TEMPLATE "bhyve.XXXXXXX" #define BHYVE_ASL_SUFFIX ".aml" #define BHYVE_ASL_COMPILER "/usr/sbin/iasl" static int basl_keep_temps; static int basl_verbose_iasl; static int basl_ncpu; static uint32_t basl_acpi_base = BHYVE_ACPI_BASE; static uint32_t hpet_capabilities; /* * Contains the full pathname of the template to be passed * to mkstemp/mktemps(3) */ static char basl_template[MAXPATHLEN]; static char basl_stemplate[MAXPATHLEN]; /* * State for dsdt_line(), dsdt_indent(), and dsdt_unindent(). */ static FILE *dsdt_fp; static int dsdt_indent_level; static int dsdt_error; struct basl_fio { int fd; FILE *fp; char f_name[MAXPATHLEN]; }; #define EFPRINTF(...) 
\ if (fprintf(__VA_ARGS__) < 0) goto err_exit; #define EFFLUSH(x) \ if (fflush(x) != 0) goto err_exit; static int basl_fwrite_rsdp(FILE *fp) { EFPRINTF(fp, "/*\n"); EFPRINTF(fp, " * bhyve RSDP template\n"); EFPRINTF(fp, " */\n"); EFPRINTF(fp, "[0008]\t\tSignature : \"RSD PTR \"\n"); EFPRINTF(fp, "[0001]\t\tChecksum : 43\n"); EFPRINTF(fp, "[0006]\t\tOem ID : \"BHYVE \"\n"); EFPRINTF(fp, "[0001]\t\tRevision : 02\n"); EFPRINTF(fp, "[0004]\t\tRSDT Address : %08X\n", basl_acpi_base + RSDT_OFFSET); EFPRINTF(fp, "[0004]\t\tLength : 00000024\n"); EFPRINTF(fp, "[0008]\t\tXSDT Address : 00000000%08X\n", basl_acpi_base + XSDT_OFFSET); EFPRINTF(fp, "[0001]\t\tExtended Checksum : 00\n"); EFPRINTF(fp, "[0003]\t\tReserved : 000000\n"); EFFLUSH(fp); return (0); err_exit: return (errno); } static int basl_fwrite_rsdt(FILE *fp) { EFPRINTF(fp, "/*\n"); EFPRINTF(fp, " * bhyve RSDT template\n"); EFPRINTF(fp, " */\n"); EFPRINTF(fp, "[0004]\t\tSignature : \"RSDT\"\n"); EFPRINTF(fp, "[0004]\t\tTable Length : 00000000\n"); EFPRINTF(fp, "[0001]\t\tRevision : 01\n"); EFPRINTF(fp, "[0001]\t\tChecksum : 00\n"); EFPRINTF(fp, "[0006]\t\tOem ID : \"BHYVE \"\n"); EFPRINTF(fp, "[0008]\t\tOem Table ID : \"BVRSDT \"\n"); EFPRINTF(fp, "[0004]\t\tOem Revision : 00000001\n"); /* iasl will fill in the compiler ID/revision fields */ EFPRINTF(fp, "[0004]\t\tAsl Compiler ID : \"xxxx\"\n"); EFPRINTF(fp, "[0004]\t\tAsl Compiler Revision : 00000000\n"); EFPRINTF(fp, "\n"); /* Add in pointers to the MADT, FADT and HPET */ EFPRINTF(fp, "[0004]\t\tACPI Table Address 0 : %08X\n", basl_acpi_base + MADT_OFFSET); EFPRINTF(fp, "[0004]\t\tACPI Table Address 1 : %08X\n", basl_acpi_base + FADT_OFFSET); EFPRINTF(fp, "[0004]\t\tACPI Table Address 2 : %08X\n", basl_acpi_base + HPET_OFFSET); EFPRINTF(fp, "[0004]\t\tACPI Table Address 3 : %08X\n", basl_acpi_base + MCFG_OFFSET); EFFLUSH(fp); return (0); err_exit: return (errno); } static int basl_fwrite_xsdt(FILE *fp) { EFPRINTF(fp, "/*\n"); EFPRINTF(fp, " * bhyve XSDT template\n"); EFPRINTF(fp, " */\n"); EFPRINTF(fp, "[0004]\t\tSignature : \"XSDT\"\n"); EFPRINTF(fp, "[0004]\t\tTable Length : 00000000\n"); EFPRINTF(fp, "[0001]\t\tRevision : 01\n"); EFPRINTF(fp, "[0001]\t\tChecksum : 00\n"); EFPRINTF(fp, "[0006]\t\tOem ID : \"BHYVE \"\n"); EFPRINTF(fp, "[0008]\t\tOem Table ID : \"BVXSDT \"\n"); EFPRINTF(fp, "[0004]\t\tOem Revision : 00000001\n"); /* iasl will fill in the compiler ID/revision fields */ EFPRINTF(fp, "[0004]\t\tAsl Compiler ID : \"xxxx\"\n"); EFPRINTF(fp, "[0004]\t\tAsl Compiler Revision : 00000000\n"); EFPRINTF(fp, "\n"); /* Add in pointers to the MADT, FADT and HPET */ EFPRINTF(fp, "[0004]\t\tACPI Table Address 0 : 00000000%08X\n", basl_acpi_base + MADT_OFFSET); EFPRINTF(fp, "[0004]\t\tACPI Table Address 1 : 00000000%08X\n", basl_acpi_base + FADT_OFFSET); EFPRINTF(fp, "[0004]\t\tACPI Table Address 2 : 00000000%08X\n", basl_acpi_base + HPET_OFFSET); EFPRINTF(fp, "[0004]\t\tACPI Table Address 3 : 00000000%08X\n", basl_acpi_base + MCFG_OFFSET); EFFLUSH(fp); return (0); err_exit: return (errno); } static int basl_fwrite_madt(FILE *fp) { int i; EFPRINTF(fp, "/*\n"); EFPRINTF(fp, " * bhyve MADT template\n"); EFPRINTF(fp, " */\n"); EFPRINTF(fp, "[0004]\t\tSignature : \"APIC\"\n"); EFPRINTF(fp, "[0004]\t\tTable Length : 00000000\n"); EFPRINTF(fp, "[0001]\t\tRevision : 01\n"); EFPRINTF(fp, "[0001]\t\tChecksum : 00\n"); EFPRINTF(fp, "[0006]\t\tOem ID : \"BHYVE \"\n"); EFPRINTF(fp, "[0008]\t\tOem Table ID : \"BVMADT \"\n"); EFPRINTF(fp, "[0004]\t\tOem Revision : 00000001\n"); /* iasl will fill in 
the compiler ID/revision fields */ EFPRINTF(fp, "[0004]\t\tAsl Compiler ID : \"xxxx\"\n"); EFPRINTF(fp, "[0004]\t\tAsl Compiler Revision : 00000000\n"); EFPRINTF(fp, "\n"); EFPRINTF(fp, "[0004]\t\tLocal Apic Address : FEE00000\n"); EFPRINTF(fp, "[0004]\t\tFlags (decoded below) : 00000001\n"); EFPRINTF(fp, "\t\t\tPC-AT Compatibility : 1\n"); EFPRINTF(fp, "\n"); /* Add a Processor Local APIC entry for each CPU */ for (i = 0; i < basl_ncpu; i++) { EFPRINTF(fp, "[0001]\t\tSubtable Type : 00\n"); EFPRINTF(fp, "[0001]\t\tLength : 08\n"); /* iasl expects hex values for the proc and apic id's */ EFPRINTF(fp, "[0001]\t\tProcessor ID : %02x\n", i); EFPRINTF(fp, "[0001]\t\tLocal Apic ID : %02x\n", i); EFPRINTF(fp, "[0004]\t\tFlags (decoded below) : 00000001\n"); EFPRINTF(fp, "\t\t\tProcessor Enabled : 1\n"); EFPRINTF(fp, "\t\t\tRuntime Online Capable : 0\n"); EFPRINTF(fp, "\n"); } /* Always a single IOAPIC entry, with ID 0 */ EFPRINTF(fp, "[0001]\t\tSubtable Type : 01\n"); EFPRINTF(fp, "[0001]\t\tLength : 0C\n"); /* iasl expects a hex value for the i/o apic id */ EFPRINTF(fp, "[0001]\t\tI/O Apic ID : %02x\n", 0); EFPRINTF(fp, "[0001]\t\tReserved : 00\n"); EFPRINTF(fp, "[0004]\t\tAddress : fec00000\n"); EFPRINTF(fp, "[0004]\t\tInterrupt : 00000000\n"); EFPRINTF(fp, "\n"); /* Legacy IRQ0 is connected to pin 2 of the IOAPIC */ EFPRINTF(fp, "[0001]\t\tSubtable Type : 02\n"); EFPRINTF(fp, "[0001]\t\tLength : 0A\n"); EFPRINTF(fp, "[0001]\t\tBus : 00\n"); EFPRINTF(fp, "[0001]\t\tSource : 00\n"); EFPRINTF(fp, "[0004]\t\tInterrupt : 00000002\n"); EFPRINTF(fp, "[0002]\t\tFlags (decoded below) : 0005\n"); EFPRINTF(fp, "\t\t\tPolarity : 1\n"); EFPRINTF(fp, "\t\t\tTrigger Mode : 1\n"); EFPRINTF(fp, "\n"); EFPRINTF(fp, "[0001]\t\tSubtable Type : 02\n"); EFPRINTF(fp, "[0001]\t\tLength : 0A\n"); EFPRINTF(fp, "[0001]\t\tBus : 00\n"); EFPRINTF(fp, "[0001]\t\tSource : %02X\n", SCI_INT); EFPRINTF(fp, "[0004]\t\tInterrupt : %08X\n", SCI_INT); EFPRINTF(fp, "[0002]\t\tFlags (decoded below) : 0000\n"); EFPRINTF(fp, "\t\t\tPolarity : 3\n"); EFPRINTF(fp, "\t\t\tTrigger Mode : 3\n"); EFPRINTF(fp, "\n"); /* Local APIC NMI is connected to LINT 1 on all CPUs */ EFPRINTF(fp, "[0001]\t\tSubtable Type : 04\n"); EFPRINTF(fp, "[0001]\t\tLength : 06\n"); EFPRINTF(fp, "[0001]\t\tProcessorId : FF\n"); EFPRINTF(fp, "[0002]\t\tFlags (decoded below) : 0005\n"); EFPRINTF(fp, "\t\t\tPolarity : 1\n"); EFPRINTF(fp, "\t\t\tTrigger Mode : 1\n"); EFPRINTF(fp, "[0001]\t\tInterrupt : 01\n"); EFPRINTF(fp, "\n"); EFFLUSH(fp); return (0); err_exit: return (errno); } static int basl_fwrite_fadt(FILE *fp) { EFPRINTF(fp, "/*\n"); EFPRINTF(fp, " * bhyve FADT template\n"); EFPRINTF(fp, " */\n"); EFPRINTF(fp, "[0004]\t\tSignature : \"FACP\"\n"); EFPRINTF(fp, "[0004]\t\tTable Length : 0000010C\n"); EFPRINTF(fp, "[0001]\t\tRevision : 05\n"); EFPRINTF(fp, "[0001]\t\tChecksum : 00\n"); EFPRINTF(fp, "[0006]\t\tOem ID : \"BHYVE \"\n"); EFPRINTF(fp, "[0008]\t\tOem Table ID : \"BVFACP \"\n"); EFPRINTF(fp, "[0004]\t\tOem Revision : 00000001\n"); /* iasl will fill in the compiler ID/revision fields */ EFPRINTF(fp, "[0004]\t\tAsl Compiler ID : \"xxxx\"\n"); EFPRINTF(fp, "[0004]\t\tAsl Compiler Revision : 00000000\n"); EFPRINTF(fp, "\n"); EFPRINTF(fp, "[0004]\t\tFACS Address : %08X\n", basl_acpi_base + FACS_OFFSET); EFPRINTF(fp, "[0004]\t\tDSDT Address : %08X\n", basl_acpi_base + DSDT_OFFSET); EFPRINTF(fp, "[0001]\t\tModel : 01\n"); EFPRINTF(fp, "[0001]\t\tPM Profile : 00 [Unspecified]\n"); EFPRINTF(fp, "[0002]\t\tSCI Interrupt : %04X\n", SCI_INT); EFPRINTF(fp, 
"[0004]\t\tSMI Command Port : %08X\n", SMI_CMD); EFPRINTF(fp, "[0001]\t\tACPI Enable Value : %02X\n", BHYVE_ACPI_ENABLE); EFPRINTF(fp, "[0001]\t\tACPI Disable Value : %02X\n", BHYVE_ACPI_DISABLE); EFPRINTF(fp, "[0001]\t\tS4BIOS Command : 00\n"); EFPRINTF(fp, "[0001]\t\tP-State Control : 00\n"); EFPRINTF(fp, "[0004]\t\tPM1A Event Block Address : %08X\n", PM1A_EVT_ADDR); EFPRINTF(fp, "[0004]\t\tPM1B Event Block Address : 00000000\n"); EFPRINTF(fp, "[0004]\t\tPM1A Control Block Address : %08X\n", PM1A_CNT_ADDR); EFPRINTF(fp, "[0004]\t\tPM1B Control Block Address : 00000000\n"); EFPRINTF(fp, "[0004]\t\tPM2 Control Block Address : 00000000\n"); EFPRINTF(fp, "[0004]\t\tPM Timer Block Address : %08X\n", IO_PMTMR); EFPRINTF(fp, "[0004]\t\tGPE0 Block Address : 00000000\n"); EFPRINTF(fp, "[0004]\t\tGPE1 Block Address : 00000000\n"); EFPRINTF(fp, "[0001]\t\tPM1 Event Block Length : 04\n"); EFPRINTF(fp, "[0001]\t\tPM1 Control Block Length : 02\n"); EFPRINTF(fp, "[0001]\t\tPM2 Control Block Length : 00\n"); EFPRINTF(fp, "[0001]\t\tPM Timer Block Length : 04\n"); EFPRINTF(fp, "[0001]\t\tGPE0 Block Length : 00\n"); EFPRINTF(fp, "[0001]\t\tGPE1 Block Length : 00\n"); EFPRINTF(fp, "[0001]\t\tGPE1 Base Offset : 00\n"); EFPRINTF(fp, "[0001]\t\t_CST Support : 00\n"); EFPRINTF(fp, "[0002]\t\tC2 Latency : 0000\n"); EFPRINTF(fp, "[0002]\t\tC3 Latency : 0000\n"); EFPRINTF(fp, "[0002]\t\tCPU Cache Size : 0000\n"); EFPRINTF(fp, "[0002]\t\tCache Flush Stride : 0000\n"); EFPRINTF(fp, "[0001]\t\tDuty Cycle Offset : 00\n"); EFPRINTF(fp, "[0001]\t\tDuty Cycle Width : 00\n"); EFPRINTF(fp, "[0001]\t\tRTC Day Alarm Index : 00\n"); EFPRINTF(fp, "[0001]\t\tRTC Month Alarm Index : 00\n"); EFPRINTF(fp, "[0001]\t\tRTC Century Index : 32\n"); EFPRINTF(fp, "[0002]\t\tBoot Flags (decoded below) : 0000\n"); EFPRINTF(fp, "\t\t\tLegacy Devices Supported (V2) : 0\n"); EFPRINTF(fp, "\t\t\t8042 Present on ports 60/64 (V2) : 0\n"); EFPRINTF(fp, "\t\t\tVGA Not Present (V4) : 1\n"); EFPRINTF(fp, "\t\t\tMSI Not Supported (V4) : 0\n"); EFPRINTF(fp, "\t\t\tPCIe ASPM Not Supported (V4) : 1\n"); EFPRINTF(fp, "\t\t\tCMOS RTC Not Present (V5) : 0\n"); EFPRINTF(fp, "[0001]\t\tReserved : 00\n"); EFPRINTF(fp, "[0004]\t\tFlags (decoded below) : 00000000\n"); EFPRINTF(fp, "\t\t\tWBINVD instruction is operational (V1) : 1\n"); EFPRINTF(fp, "\t\t\tWBINVD flushes all caches (V1) : 0\n"); EFPRINTF(fp, "\t\t\tAll CPUs support C1 (V1) : 1\n"); EFPRINTF(fp, "\t\t\tC2 works on MP system (V1) : 0\n"); EFPRINTF(fp, "\t\t\tControl Method Power Button (V1) : 0\n"); EFPRINTF(fp, "\t\t\tControl Method Sleep Button (V1) : 1\n"); EFPRINTF(fp, "\t\t\tRTC wake not in fixed reg space (V1) : 0\n"); EFPRINTF(fp, "\t\t\tRTC can wake system from S4 (V1) : 0\n"); EFPRINTF(fp, "\t\t\t32-bit PM Timer (V1) : 1\n"); EFPRINTF(fp, "\t\t\tDocking Supported (V1) : 0\n"); EFPRINTF(fp, "\t\t\tReset Register Supported (V2) : 1\n"); EFPRINTF(fp, "\t\t\tSealed Case (V3) : 0\n"); EFPRINTF(fp, "\t\t\tHeadless - No Video (V3) : 1\n"); EFPRINTF(fp, "\t\t\tUse native instr after SLP_TYPx (V3) : 0\n"); EFPRINTF(fp, "\t\t\tPCIEXP_WAK Bits Supported (V4) : 0\n"); EFPRINTF(fp, "\t\t\tUse Platform Timer (V4) : 0\n"); EFPRINTF(fp, "\t\t\tRTC_STS valid on S4 wake (V4) : 0\n"); EFPRINTF(fp, "\t\t\tRemote Power-on capable (V4) : 0\n"); EFPRINTF(fp, "\t\t\tUse APIC Cluster Model (V4) : 0\n"); EFPRINTF(fp, "\t\t\tUse APIC Physical Destination Mode (V4) : 1\n"); EFPRINTF(fp, "\t\t\tHardware Reduced (V5) : 0\n"); EFPRINTF(fp, "\t\t\tLow Power S0 Idle (V5) : 0\n"); EFPRINTF(fp, "\n"); EFPRINTF(fp, 
"[0012]\t\tReset Register : [Generic Address Structure]\n"); EFPRINTF(fp, "[0001]\t\tSpace ID : 01 [SystemIO]\n"); EFPRINTF(fp, "[0001]\t\tBit Width : 08\n"); EFPRINTF(fp, "[0001]\t\tBit Offset : 00\n"); EFPRINTF(fp, "[0001]\t\tEncoded Access Width : 01 [Byte Access:8]\n"); EFPRINTF(fp, "[0008]\t\tAddress : 0000000000000CF9\n"); EFPRINTF(fp, "\n"); EFPRINTF(fp, "[0001]\t\tValue to cause reset : 06\n"); EFPRINTF(fp, "[0002]\t\tARM Flags (decoded below): 0000\n"); EFPRINTF(fp, "\t\t\tPSCI Compliant : 0\n"); EFPRINTF(fp, "\t\t\tMust use HVC for PSCI : 0\n"); EFPRINTF(fp, "[0001]\t\tFADT Minor Revision : 01\n"); EFPRINTF(fp, "[0008]\t\tFACS Address : 00000000%08X\n", basl_acpi_base + FACS_OFFSET); EFPRINTF(fp, "[0008]\t\tDSDT Address : 00000000%08X\n", basl_acpi_base + DSDT_OFFSET); EFPRINTF(fp, "[0012]\t\tPM1A Event Block : [Generic Address Structure]\n"); EFPRINTF(fp, "[0001]\t\tSpace ID : 01 [SystemIO]\n"); EFPRINTF(fp, "[0001]\t\tBit Width : 20\n"); EFPRINTF(fp, "[0001]\t\tBit Offset : 00\n"); EFPRINTF(fp, "[0001]\t\tEncoded Access Width : 02 [Word Access:16]\n"); EFPRINTF(fp, "[0008]\t\tAddress : 00000000%08X\n", PM1A_EVT_ADDR); EFPRINTF(fp, "\n"); EFPRINTF(fp, "[0012]\t\tPM1B Event Block : [Generic Address Structure]\n"); EFPRINTF(fp, "[0001]\t\tSpace ID : 01 [SystemIO]\n"); EFPRINTF(fp, "[0001]\t\tBit Width : 00\n"); EFPRINTF(fp, "[0001]\t\tBit Offset : 00\n"); EFPRINTF(fp, "[0001]\t\tEncoded Access Width : 00 [Undefined/Legacy]\n"); EFPRINTF(fp, "[0008]\t\tAddress : 0000000000000000\n"); EFPRINTF(fp, "\n"); EFPRINTF(fp, "[0012]\t\tPM1A Control Block : [Generic Address Structure]\n"); EFPRINTF(fp, "[0001]\t\tSpace ID : 01 [SystemIO]\n"); EFPRINTF(fp, "[0001]\t\tBit Width : 10\n"); EFPRINTF(fp, "[0001]\t\tBit Offset : 00\n"); EFPRINTF(fp, "[0001]\t\tEncoded Access Width : 02 [Word Access:16]\n"); EFPRINTF(fp, "[0008]\t\tAddress : 00000000%08X\n", PM1A_CNT_ADDR); EFPRINTF(fp, "\n"); EFPRINTF(fp, "[0012]\t\tPM1B Control Block : [Generic Address Structure]\n"); EFPRINTF(fp, "[0001]\t\tSpace ID : 01 [SystemIO]\n"); EFPRINTF(fp, "[0001]\t\tBit Width : 00\n"); EFPRINTF(fp, "[0001]\t\tBit Offset : 00\n"); EFPRINTF(fp, "[0001]\t\tEncoded Access Width : 00 [Undefined/Legacy]\n"); EFPRINTF(fp, "[0008]\t\tAddress : 0000000000000000\n"); EFPRINTF(fp, "\n"); EFPRINTF(fp, "[0012]\t\tPM2 Control Block : [Generic Address Structure]\n"); EFPRINTF(fp, "[0001]\t\tSpace ID : 01 [SystemIO]\n"); EFPRINTF(fp, "[0001]\t\tBit Width : 08\n"); EFPRINTF(fp, "[0001]\t\tBit Offset : 00\n"); EFPRINTF(fp, "[0001]\t\tEncoded Access Width : 00 [Undefined/Legacy]\n"); EFPRINTF(fp, "[0008]\t\tAddress : 0000000000000000\n"); EFPRINTF(fp, "\n"); /* Valid for bhyve */ EFPRINTF(fp, "[0012]\t\tPM Timer Block : [Generic Address Structure]\n"); EFPRINTF(fp, "[0001]\t\tSpace ID : 01 [SystemIO]\n"); EFPRINTF(fp, "[0001]\t\tBit Width : 20\n"); EFPRINTF(fp, "[0001]\t\tBit Offset : 00\n"); EFPRINTF(fp, "[0001]\t\tEncoded Access Width : 03 [DWord Access:32]\n"); EFPRINTF(fp, "[0008]\t\tAddress : 00000000%08X\n", IO_PMTMR); EFPRINTF(fp, "\n"); EFPRINTF(fp, "[0012]\t\tGPE0 Block : [Generic Address Structure]\n"); EFPRINTF(fp, "[0001]\t\tSpace ID : 01 [SystemIO]\n"); EFPRINTF(fp, "[0001]\t\tBit Width : 00\n"); EFPRINTF(fp, "[0001]\t\tBit Offset : 00\n"); EFPRINTF(fp, "[0001]\t\tEncoded Access Width : 01 [Byte Access:8]\n"); EFPRINTF(fp, "[0008]\t\tAddress : 0000000000000000\n"); EFPRINTF(fp, "\n"); EFPRINTF(fp, "[0012]\t\tGPE1 Block : [Generic Address Structure]\n"); EFPRINTF(fp, "[0001]\t\tSpace ID : 01 [SystemIO]\n"); EFPRINTF(fp, 
"[0001]\t\tBit Width : 00\n"); EFPRINTF(fp, "[0001]\t\tBit Offset : 00\n"); EFPRINTF(fp, "[0001]\t\tEncoded Access Width : 00 [Undefined/Legacy]\n"); EFPRINTF(fp, "[0008]\t\tAddress : 0000000000000000\n"); EFPRINTF(fp, "\n"); EFPRINTF(fp, "[0012]\t\tSleep Control Register : [Generic Address Structure]\n"); EFPRINTF(fp, "[0001]\t\tSpace ID : 01 [SystemIO]\n"); EFPRINTF(fp, "[0001]\t\tBit Width : 08\n"); EFPRINTF(fp, "[0001]\t\tBit Offset : 00\n"); EFPRINTF(fp, "[0001]\t\tEncoded Access Width : 01 [Byte Access:8]\n"); EFPRINTF(fp, "[0008]\t\tAddress : 0000000000000000\n"); EFPRINTF(fp, "\n"); EFPRINTF(fp, "[0012]\t\tSleep Status Register : [Generic Address Structure]\n"); EFPRINTF(fp, "[0001]\t\tSpace ID : 01 [SystemIO]\n"); EFPRINTF(fp, "[0001]\t\tBit Width : 08\n"); EFPRINTF(fp, "[0001]\t\tBit Offset : 00\n"); EFPRINTF(fp, "[0001]\t\tEncoded Access Width : 01 [Byte Access:8]\n"); EFPRINTF(fp, "[0008]\t\tAddress : 0000000000000000\n"); EFFLUSH(fp); return (0); err_exit: return (errno); } static int basl_fwrite_hpet(FILE *fp) { EFPRINTF(fp, "/*\n"); EFPRINTF(fp, " * bhyve HPET template\n"); EFPRINTF(fp, " */\n"); EFPRINTF(fp, "[0004]\t\tSignature : \"HPET\"\n"); EFPRINTF(fp, "[0004]\t\tTable Length : 00000000\n"); EFPRINTF(fp, "[0001]\t\tRevision : 01\n"); EFPRINTF(fp, "[0001]\t\tChecksum : 00\n"); EFPRINTF(fp, "[0006]\t\tOem ID : \"BHYVE \"\n"); EFPRINTF(fp, "[0008]\t\tOem Table ID : \"BVHPET \"\n"); EFPRINTF(fp, "[0004]\t\tOem Revision : 00000001\n"); /* iasl will fill in the compiler ID/revision fields */ EFPRINTF(fp, "[0004]\t\tAsl Compiler ID : \"xxxx\"\n"); EFPRINTF(fp, "[0004]\t\tAsl Compiler Revision : 00000000\n"); EFPRINTF(fp, "\n"); EFPRINTF(fp, "[0004]\t\tTimer Block ID : %08X\n", hpet_capabilities); EFPRINTF(fp, "[0012]\t\tTimer Block Register : [Generic Address Structure]\n"); EFPRINTF(fp, "[0001]\t\tSpace ID : 00 [SystemMemory]\n"); EFPRINTF(fp, "[0001]\t\tBit Width : 00\n"); EFPRINTF(fp, "[0001]\t\tBit Offset : 00\n"); EFPRINTF(fp, "[0001]\t\tEncoded Access Width : 00 [Undefined/Legacy]\n"); EFPRINTF(fp, "[0008]\t\tAddress : 00000000FED00000\n"); EFPRINTF(fp, "\n"); EFPRINTF(fp, "[0001]\t\tHPET Number : 00\n"); EFPRINTF(fp, "[0002]\t\tMinimum Clock Ticks : 0000\n"); EFPRINTF(fp, "[0004]\t\tFlags (decoded below) : 00000001\n"); EFPRINTF(fp, "\t\t\t4K Page Protect : 1\n"); EFPRINTF(fp, "\t\t\t64K Page Protect : 0\n"); EFPRINTF(fp, "\n"); EFFLUSH(fp); return (0); err_exit: return (errno); } static int basl_fwrite_mcfg(FILE *fp) { EFPRINTF(fp, "/*\n"); EFPRINTF(fp, " * bhyve MCFG template\n"); EFPRINTF(fp, " */\n"); EFPRINTF(fp, "[0004]\t\tSignature : \"MCFG\"\n"); EFPRINTF(fp, "[0004]\t\tTable Length : 00000000\n"); EFPRINTF(fp, "[0001]\t\tRevision : 01\n"); EFPRINTF(fp, "[0001]\t\tChecksum : 00\n"); EFPRINTF(fp, "[0006]\t\tOem ID : \"BHYVE \"\n"); EFPRINTF(fp, "[0008]\t\tOem Table ID : \"BVMCFG \"\n"); EFPRINTF(fp, "[0004]\t\tOem Revision : 00000001\n"); /* iasl will fill in the compiler ID/revision fields */ EFPRINTF(fp, "[0004]\t\tAsl Compiler ID : \"xxxx\"\n"); EFPRINTF(fp, "[0004]\t\tAsl Compiler Revision : 00000000\n"); EFPRINTF(fp, "[0008]\t\tReserved : 0\n"); EFPRINTF(fp, "\n"); EFPRINTF(fp, "[0008]\t\tBase Address : %016lX\n", pci_ecfg_base()); EFPRINTF(fp, "[0002]\t\tSegment Group: 0000\n"); EFPRINTF(fp, "[0001]\t\tStart Bus: 00\n"); EFPRINTF(fp, "[0001]\t\tEnd Bus: FF\n"); EFPRINTF(fp, "[0004]\t\tReserved : 0\n"); EFFLUSH(fp); return (0); err_exit: return (errno); } static int basl_fwrite_facs(FILE *fp) { EFPRINTF(fp, "/*\n"); EFPRINTF(fp, " * bhyve FACS template\n"); 
EFPRINTF(fp, " */\n"); EFPRINTF(fp, "[0004]\t\tSignature : \"FACS\"\n"); EFPRINTF(fp, "[0004]\t\tLength : 00000040\n"); EFPRINTF(fp, "[0004]\t\tHardware Signature : 00000000\n"); EFPRINTF(fp, "[0004]\t\t32 Firmware Waking Vector : 00000000\n"); EFPRINTF(fp, "[0004]\t\tGlobal Lock : 00000000\n"); EFPRINTF(fp, "[0004]\t\tFlags (decoded below) : 00000000\n"); EFPRINTF(fp, "\t\t\tS4BIOS Support Present : 0\n"); EFPRINTF(fp, "\t\t\t64-bit Wake Supported (V2) : 0\n"); EFPRINTF(fp, "[0008]\t\t64 Firmware Waking Vector : 0000000000000000\n"); EFPRINTF(fp, "[0001]\t\tVersion : 02\n"); EFPRINTF(fp, "[0003]\t\tReserved : 000000\n"); EFPRINTF(fp, "[0004]\t\tOspmFlags (decoded below) : 00000000\n"); EFPRINTF(fp, "\t\t\t64-bit Wake Env Required (V2) : 0\n"); EFFLUSH(fp); return (0); err_exit: return (errno); } /* * Helper routines for writing to the DSDT from other modules. */ void dsdt_line(const char *fmt, ...) { va_list ap; if (dsdt_error != 0) return; if (strcmp(fmt, "") != 0) { if (dsdt_indent_level != 0) EFPRINTF(dsdt_fp, "%*c", dsdt_indent_level * 2, ' '); va_start(ap, fmt); if (vfprintf(dsdt_fp, fmt, ap) < 0) { va_end(ap); goto err_exit; } va_end(ap); } EFPRINTF(dsdt_fp, "\n"); return; err_exit: dsdt_error = errno; } void dsdt_indent(int levels) { dsdt_indent_level += levels; assert(dsdt_indent_level >= 0); } void dsdt_unindent(int levels) { assert(dsdt_indent_level >= levels); dsdt_indent_level -= levels; } void dsdt_fixed_ioport(uint16_t iobase, uint16_t length) { dsdt_line("IO (Decode16,"); dsdt_line(" 0x%04X, // Range Minimum", iobase); dsdt_line(" 0x%04X, // Range Maximum", iobase); dsdt_line(" 0x01, // Alignment"); dsdt_line(" 0x%02X, // Length", length); dsdt_line(" )"); } void dsdt_fixed_irq(uint8_t irq) { dsdt_line("IRQNoFlags ()"); dsdt_line(" {%d}", irq); } void dsdt_fixed_mem32(uint32_t base, uint32_t length) { dsdt_line("Memory32Fixed (ReadWrite,"); dsdt_line(" 0x%08X, // Address Base", base); dsdt_line(" 0x%08X, // Address Length", length); dsdt_line(" )"); } static int basl_fwrite_dsdt(FILE *fp) { dsdt_fp = fp; dsdt_error = 0; dsdt_indent_level = 0; dsdt_line("/*"); dsdt_line(" * bhyve DSDT template"); dsdt_line(" */"); dsdt_line("DefinitionBlock (\"bhyve_dsdt.aml\", \"DSDT\", 2," "\"BHYVE \", \"BVDSDT \", 0x00000001)"); dsdt_line("{"); dsdt_line(" Name (_S5, Package ()"); dsdt_line(" {"); dsdt_line(" 0x05,"); dsdt_line(" Zero,"); dsdt_line(" })"); pci_write_dsdt(); dsdt_line(""); dsdt_line(" Scope (_SB.PC00)"); dsdt_line(" {"); dsdt_line(" Device (HPET)"); dsdt_line(" {"); dsdt_line(" Name (_HID, EISAID(\"PNP0103\"))"); dsdt_line(" Name (_UID, 0)"); dsdt_line(" Name (_CRS, ResourceTemplate ()"); dsdt_line(" {"); dsdt_indent(4); dsdt_fixed_mem32(0xFED00000, 0x400); dsdt_unindent(4); dsdt_line(" })"); dsdt_line(" }"); dsdt_line(" }"); dsdt_line("}"); if (dsdt_error != 0) return (dsdt_error); EFFLUSH(fp); return (0); err_exit: return (errno); } static int basl_open(struct basl_fio *bf, int suffix) { int err; err = 0; if (suffix) { strlcpy(bf->f_name, basl_stemplate, MAXPATHLEN); bf->fd = mkstemps(bf->f_name, strlen(BHYVE_ASL_SUFFIX)); } else { strlcpy(bf->f_name, basl_template, MAXPATHLEN); bf->fd = mkstemp(bf->f_name); } if (bf->fd > 0) { bf->fp = fdopen(bf->fd, "w+"); if (bf->fp == NULL) { unlink(bf->f_name); close(bf->fd); } } else { err = 1; } return (err); } static void basl_close(struct basl_fio *bf) { if (!basl_keep_temps) unlink(bf->f_name); fclose(bf->fp); } static int basl_start(struct basl_fio *in, struct basl_fio *out) { int err; err = basl_open(in, 0); if (!err) { err = 
basl_open(out, 1); if (err) { basl_close(in); } } return (err); } static void basl_end(struct basl_fio *in, struct basl_fio *out) { basl_close(in); basl_close(out); } static int basl_load(struct vmctx *ctx, int fd, uint64_t off) { struct stat sb; void *gaddr; if (fstat(fd, &sb) < 0) return (errno); gaddr = paddr_guest2host(ctx, basl_acpi_base + off, sb.st_size); if (gaddr == NULL) return (EFAULT); if (read(fd, gaddr, sb.st_size) < 0) return (errno); return (0); } static int basl_compile(struct vmctx *ctx, int (*fwrite_section)(FILE *), uint64_t offset) { struct basl_fio io[2]; static char iaslbuf[3*MAXPATHLEN + 10]; char *fmt; int err; err = basl_start(&io[0], &io[1]); if (!err) { err = (*fwrite_section)(io[0].fp); if (!err) { /* * iasl sends the results of the compilation to * stdout. Shut this down by using the shell to * redirect stdout to /dev/null, unless the user * has requested verbose output for debugging * purposes */ fmt = basl_verbose_iasl ? "%s -p %s %s" : "/bin/sh -c \"%s -p %s %s\" 1> /dev/null"; snprintf(iaslbuf, sizeof(iaslbuf), fmt, BHYVE_ASL_COMPILER, io[1].f_name, io[0].f_name); err = system(iaslbuf); if (!err) { /* * Copy the aml output file into guest * memory at the specified location */ err = basl_load(ctx, io[1].fd, offset); } } basl_end(&io[0], &io[1]); } return (err); } static int basl_make_templates(void) { const char *tmpdir; int err; int len; err = 0; /* * */ if ((tmpdir = getenv("BHYVE_TMPDIR")) == NULL || *tmpdir == '\0' || (tmpdir = getenv("TMPDIR")) == NULL || *tmpdir == '\0') { tmpdir = _PATH_TMP; } len = strlen(tmpdir); if ((len + sizeof(BHYVE_ASL_TEMPLATE) + 1) < MAXPATHLEN) { strcpy(basl_template, tmpdir); while (len > 0 && basl_template[len - 1] == '/') len--; basl_template[len] = '/'; strcpy(&basl_template[len + 1], BHYVE_ASL_TEMPLATE); } else err = E2BIG; if (!err) { /* * len has been intialized (and maybe adjusted) above */ if ((len + sizeof(BHYVE_ASL_TEMPLATE) + 1 + sizeof(BHYVE_ASL_SUFFIX)) < MAXPATHLEN) { strcpy(basl_stemplate, tmpdir); basl_stemplate[len] = '/'; strcpy(&basl_stemplate[len + 1], BHYVE_ASL_TEMPLATE); len = strlen(basl_stemplate); strcpy(&basl_stemplate[len], BHYVE_ASL_SUFFIX); } else err = E2BIG; } return (err); } static struct { int (*wsect)(FILE *fp); uint64_t offset; } basl_ftables[] = { { basl_fwrite_rsdp, 0}, { basl_fwrite_rsdt, RSDT_OFFSET }, { basl_fwrite_xsdt, XSDT_OFFSET }, { basl_fwrite_madt, MADT_OFFSET }, { basl_fwrite_fadt, FADT_OFFSET }, { basl_fwrite_hpet, HPET_OFFSET }, { basl_fwrite_mcfg, MCFG_OFFSET }, { basl_fwrite_facs, FACS_OFFSET }, { basl_fwrite_dsdt, DSDT_OFFSET }, { NULL } }; int acpi_build(struct vmctx *ctx, int ncpu) { int err; int i; basl_ncpu = ncpu; err = vm_get_hpet_capabilities(ctx, &hpet_capabilities); if (err != 0) return (err); /* * For debug, allow the user to have iasl compiler output sent * to stdout rather than /dev/null */ if (getenv("BHYVE_ACPI_VERBOSE_IASL")) basl_verbose_iasl = 1; /* * Allow the user to keep the generated ASL files for debugging * instead of deleting them following use */ if (getenv("BHYVE_ACPI_KEEPTMPS")) basl_keep_temps = 1; i = 0; err = basl_make_templates(); /* * Run through all the ASL files, compiling them and * copying them into guest memory */ while (!err && basl_ftables[i].wsect != NULL) { err = basl_compile(ctx, basl_ftables[i].wsect, basl_ftables[i].offset); i++; } return (err); } Index: user/ngie/bug-237403/usr.sbin/bhyve/bhyverun.h =================================================================== --- user/ngie/bug-237403/usr.sbin/bhyve/bhyverun.h (revision 
346925) +++ user/ngie/bug-237403/usr.sbin/bhyve/bhyverun.h (revision 346926) @@ -1,51 +1,52 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2011 NetApp, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _FBSDRUN_H_ #define _FBSDRUN_H_ #define VMEXIT_CONTINUE (0) #define VMEXIT_ABORT (-1) struct vmctx; extern int guest_ncpus; +extern uint16_t cores, sockets, threads; extern char *guest_uuid_str; extern char *vmname; void *paddr_guest2host(struct vmctx *ctx, uintptr_t addr, size_t len); void fbsdrun_set_capabilities(struct vmctx *ctx, int cpu); void fbsdrun_addcpu(struct vmctx *ctx, int fromcpu, int newcpu, uint64_t rip); int fbsdrun_muxed(void); int fbsdrun_vmexit_on_hlt(void); int fbsdrun_vmexit_on_pause(void); int fbsdrun_disable_x2apic(void); int fbsdrun_virtio_msix(void); #endif Index: user/ngie/bug-237403/usr.sbin/bhyve/smbiostbl.c =================================================================== --- user/ngie/bug-237403/usr.sbin/bhyve/smbiostbl.c (revision 346925) +++ user/ngie/bug-237403/usr.sbin/bhyve/smbiostbl.c (revision 346926) @@ -1,829 +1,839 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2014 Tycho Nightingale * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include "bhyverun.h" #include "smbiostbl.h" #define MB (1024*1024) #define GB (1024ULL*1024*1024) #define SMBIOS_BASE 0xF1000 /* BHYVE_ACPI_BASE - SMBIOS_BASE) */ #define SMBIOS_MAX_LENGTH (0xF2400 - 0xF1000) #define SMBIOS_TYPE_BIOS 0 #define SMBIOS_TYPE_SYSTEM 1 #define SMBIOS_TYPE_CHASSIS 3 #define SMBIOS_TYPE_PROCESSOR 4 #define SMBIOS_TYPE_MEMARRAY 16 #define SMBIOS_TYPE_MEMDEVICE 17 #define SMBIOS_TYPE_MEMARRAYMAP 19 #define SMBIOS_TYPE_BOOT 32 #define SMBIOS_TYPE_EOT 127 struct smbios_structure { uint8_t type; uint8_t length; uint16_t handle; } __packed; typedef int (*initializer_func_t)(struct smbios_structure *template_entry, const char **template_strings, char *curaddr, char **endaddr, uint16_t *n, uint16_t *size); struct smbios_template_entry { struct smbios_structure *entry; const char **strings; initializer_func_t initializer; }; /* * SMBIOS Structure Table Entry Point */ #define SMBIOS_ENTRY_EANCHOR "_SM_" #define SMBIOS_ENTRY_EANCHORLEN 4 #define SMBIOS_ENTRY_IANCHOR "_DMI_" #define SMBIOS_ENTRY_IANCHORLEN 5 struct smbios_entry_point { char eanchor[4]; /* anchor tag */ uint8_t echecksum; /* checksum of entry point structure */ uint8_t eplen; /* length in bytes of entry point */ uint8_t major; /* major version of the SMBIOS spec */ uint8_t minor; /* minor version of the SMBIOS spec */ uint16_t maxssize; /* maximum size in bytes of a struct */ uint8_t revision; /* entry point structure revision */ uint8_t format[5]; /* entry point rev-specific data */ char ianchor[5]; /* intermediate anchor tag */ uint8_t ichecksum; /* intermediate checksum */ uint16_t stlen; /* len in bytes of structure table */ uint32_t staddr; /* physical addr of structure table */ uint16_t stnum; /* number of structure table entries */ uint8_t bcdrev; /* BCD value representing DMI ver */ } __packed; /* * BIOS Information */ #define SMBIOS_FL_ISA 0x00000010 /* ISA is supported */ #define SMBIOS_FL_PCI 0x00000080 /* PCI is supported */ #define SMBIOS_FL_SHADOW 0x00001000 /* BIOS shadowing is allowed */ #define SMBIOS_FL_CDBOOT 0x00008000 /* Boot from CD is supported */ #define SMBIOS_FL_SELBOOT 0x00010000 /* Selectable Boot supported */ #define SMBIOS_FL_EDD 0x00080000 /* EDD Spec is supported */ #define SMBIOS_XB1_FL_ACPI 0x00000001 /* ACPI is supported */ #define SMBIOS_XB2_FL_BBS 0x00000001 /* BIOS Boot Specification */ #define SMBIOS_XB2_FL_VM 0x00000010 /* Virtual Machine */ struct smbios_table_type0 { struct smbios_structure header; uint8_t vendor; /* vendor string */ uint8_t version; /* version string */ uint16_t segment; /* address segment location */ uint8_t rel_date; /* release date */ uint8_t size; /* rom size */ uint64_t cflags; /* characteristics */ uint8_t xc_bytes[2]; /* characteristics ext bytes */ uint8_t sb_major_rel; /* system bios version */ uint8_t sb_minor_rele; uint8_t ecfw_major_rel; /* embedded ctrl fw version */ uint8_t ecfw_minor_rel; } __packed; /* 
* System Information */ #define SMBIOS_WAKEUP_SWITCH 0x06 /* power switch */ struct smbios_table_type1 { struct smbios_structure header; uint8_t manufacturer; /* manufacturer string */ uint8_t product; /* product name string */ uint8_t version; /* version string */ uint8_t serial; /* serial number string */ uint8_t uuid[16]; /* uuid byte array */ uint8_t wakeup; /* wake-up event */ uint8_t sku; /* sku number string */ uint8_t family; /* family name string */ } __packed; /* * System Enclosure or Chassis */ #define SMBIOS_CHT_UNKNOWN 0x02 /* unknown */ #define SMBIOS_CHST_SAFE 0x03 /* safe */ #define SMBIOS_CHSC_NONE 0x03 /* none */ struct smbios_table_type3 { struct smbios_structure header; uint8_t manufacturer; /* manufacturer string */ uint8_t type; /* type */ uint8_t version; /* version string */ uint8_t serial; /* serial number string */ uint8_t asset; /* asset tag string */ uint8_t bustate; /* boot-up state */ uint8_t psstate; /* power supply state */ uint8_t tstate; /* thermal state */ uint8_t security; /* security status */ uint8_t uheight; /* height in 'u's */ uint8_t cords; /* number of power cords */ uint8_t elems; /* number of element records */ uint8_t elemlen; /* length of records */ uint8_t sku; /* sku number string */ } __packed; /* * Processor Information */ #define SMBIOS_PRT_CENTRAL 0x03 /* central processor */ #define SMBIOS_PRF_OTHER 0x01 /* other */ #define SMBIOS_PRS_PRESENT 0x40 /* socket is populated */ #define SMBIOS_PRS_ENABLED 0x1 /* enabled */ #define SMBIOS_PRU_NONE 0x06 /* none */ #define SMBIOS_PFL_64B 0x04 /* 64-bit capable */ struct smbios_table_type4 { struct smbios_structure header; uint8_t socket; /* socket designation string */ uint8_t type; /* processor type */ uint8_t family; /* processor family */ uint8_t manufacturer; /* manufacturer string */ uint64_t cpuid; /* processor cpuid */ uint8_t version; /* version string */ uint8_t voltage; /* voltage */ uint16_t clkspeed; /* ext clock speed in mhz */ uint16_t maxspeed; /* maximum speed in mhz */ uint16_t curspeed; /* current speed in mhz */ uint8_t status; /* status */ uint8_t upgrade; /* upgrade */ uint16_t l1handle; /* l1 cache handle */ uint16_t l2handle; /* l2 cache handle */ uint16_t l3handle; /* l3 cache handle */ uint8_t serial; /* serial number string */ uint8_t asset; /* asset tag string */ uint8_t part; /* part number string */ uint8_t cores; /* cores per socket */ uint8_t ecores; /* enabled cores */ uint8_t threads; /* threads per socket */ uint16_t cflags; /* processor characteristics */ uint16_t family2; /* processor family 2 */ } __packed; /* * Physical Memory Array */ #define SMBIOS_MAL_SYSMB 0x03 /* system board or motherboard */ #define SMBIOS_MAU_SYSTEM 0x03 /* system memory */ #define SMBIOS_MAE_NONE 0x03 /* none */ struct smbios_table_type16 { struct smbios_structure header; uint8_t location; /* physical device location */ uint8_t use; /* device functional purpose */ uint8_t ecc; /* err detect/correct method */ uint32_t size; /* max mem capacity in kb */ uint16_t errhand; /* handle of error (if any) */ uint16_t ndevs; /* num of slots or sockets */ uint64_t xsize; /* max mem capacity in bytes */ } __packed; /* * Memory Device */ #define SMBIOS_MDFF_UNKNOWN 0x02 /* unknown */ #define SMBIOS_MDT_UNKNOWN 0x02 /* unknown */ #define SMBIOS_MDF_UNKNOWN 0x0004 /* unknown */ struct smbios_table_type17 { struct smbios_structure header; uint16_t arrayhand; /* handle of physl mem array */ uint16_t errhand; /* handle of mem error data */ uint16_t twidth; /* total width in bits */ uint16_t dwidth; 
/* data width in bits */ uint16_t size; /* size in bytes */ uint8_t form; /* form factor */ uint8_t set; /* set */ uint8_t dloc; /* device locator string */ uint8_t bloc; /* phys bank locator string */ uint8_t type; /* memory type */ uint16_t flags; /* memory characteristics */ uint16_t maxspeed; /* maximum speed in mhz */ uint8_t manufacturer; /* manufacturer string */ uint8_t serial; /* serial number string */ uint8_t asset; /* asset tag string */ uint8_t part; /* part number string */ uint8_t attributes; /* attributes */ uint32_t xsize; /* extended size in mbs */ uint16_t curspeed; /* current speed in mhz */ uint16_t minvoltage; /* minimum voltage */ uint16_t maxvoltage; /* maximum voltage */ uint16_t curvoltage; /* configured voltage */ } __packed; /* * Memory Array Mapped Address */ struct smbios_table_type19 { struct smbios_structure header; uint32_t saddr; /* start phys addr in kb */ uint32_t eaddr; /* end phys addr in kb */ uint16_t arrayhand; /* physical mem array handle */ uint8_t width; /* num of dev in row */ uint64_t xsaddr; /* start phys addr in bytes */ uint64_t xeaddr; /* end phys addr in bytes */ } __packed; /* * System Boot Information */ #define SMBIOS_BOOT_NORMAL 0 /* no errors detected */ struct smbios_table_type32 { struct smbios_structure header; uint8_t reserved[6]; uint8_t status; /* boot status */ } __packed; /* * End-of-Table */ struct smbios_table_type127 { struct smbios_structure header; } __packed; struct smbios_table_type0 smbios_type0_template = { { SMBIOS_TYPE_BIOS, sizeof (struct smbios_table_type0), 0 }, 1, /* bios vendor string */ 2, /* bios version string */ 0xF000, /* bios address segment location */ 3, /* bios release date */ 0x0, /* bios size (64k * (n + 1) is the size in bytes) */ SMBIOS_FL_ISA | SMBIOS_FL_PCI | SMBIOS_FL_SHADOW | SMBIOS_FL_CDBOOT | SMBIOS_FL_EDD, { SMBIOS_XB1_FL_ACPI, SMBIOS_XB2_FL_BBS | SMBIOS_XB2_FL_VM }, 0x0, /* bios major release */ 0x0, /* bios minor release */ 0xff, /* embedded controller firmware major release */ 0xff /* embedded controller firmware minor release */ }; const char *smbios_type0_strings[] = { "BHYVE", /* vendor string */ "1.00", /* bios version string */ "03/14/2014", /* bios release date string */ NULL }; struct smbios_table_type1 smbios_type1_template = { { SMBIOS_TYPE_SYSTEM, sizeof (struct smbios_table_type1), 0 }, 1, /* manufacturer string */ 2, /* product string */ 3, /* version string */ 4, /* serial number string */ { 0 }, SMBIOS_WAKEUP_SWITCH, 5, /* sku string */ 6 /* family string */ }; static int smbios_type1_initializer(struct smbios_structure *template_entry, const char **template_strings, char *curaddr, char **endaddr, uint16_t *n, uint16_t *size); const char *smbios_type1_strings[] = { " ", /* manufacturer string */ "BHYVE", /* product name string */ "1.0", /* version string */ "None", /* serial number string */ "None", /* sku string */ " ", /* family name string */ NULL }; struct smbios_table_type3 smbios_type3_template = { { SMBIOS_TYPE_CHASSIS, sizeof (struct smbios_table_type3), 0 }, 1, /* manufacturer string */ SMBIOS_CHT_UNKNOWN, 2, /* version string */ 3, /* serial number string */ 4, /* asset tag string */ SMBIOS_CHST_SAFE, SMBIOS_CHST_SAFE, SMBIOS_CHST_SAFE, SMBIOS_CHSC_NONE, 0, /* height in 'u's (0=enclosure height unspecified) */ 0, /* number of power cords (0=number unspecified) */ 0, /* number of contained element records */ 0, /* length of records */ 5 /* sku number string */ }; const char *smbios_type3_strings[] = { " ", /* manufacturer string */ "1.0", /* version string */ 
"None", /* serial number string */ "None", /* asset tag string */ "None", /* sku number string */ NULL }; struct smbios_table_type4 smbios_type4_template = { { SMBIOS_TYPE_PROCESSOR, sizeof (struct smbios_table_type4), 0 }, 1, /* socket designation string */ SMBIOS_PRT_CENTRAL, SMBIOS_PRF_OTHER, 2, /* manufacturer string */ 0, /* cpuid */ 3, /* version string */ 0, /* voltage */ 0, /* external clock frequency in mhz (0=unknown) */ 0, /* maximum frequency in mhz (0=unknown) */ 0, /* current frequency in mhz (0=unknown) */ SMBIOS_PRS_PRESENT | SMBIOS_PRS_ENABLED, SMBIOS_PRU_NONE, -1, /* l1 cache handle */ -1, /* l2 cache handle */ -1, /* l3 cache handle */ 4, /* serial number string */ 5, /* asset tag string */ 6, /* part number string */ 0, /* cores per socket (0=unknown) */ 0, /* enabled cores per socket (0=unknown) */ 0, /* threads per socket (0=unknown) */ SMBIOS_PFL_64B, SMBIOS_PRF_OTHER }; const char *smbios_type4_strings[] = { " ", /* socket designation string */ " ", /* manufacturer string */ " ", /* version string */ "None", /* serial number string */ "None", /* asset tag string */ "None", /* part number string */ NULL }; static int smbios_type4_initializer(struct smbios_structure *template_entry, const char **template_strings, char *curaddr, char **endaddr, uint16_t *n, uint16_t *size); struct smbios_table_type16 smbios_type16_template = { { SMBIOS_TYPE_MEMARRAY, sizeof (struct smbios_table_type16), 0 }, SMBIOS_MAL_SYSMB, SMBIOS_MAU_SYSTEM, SMBIOS_MAE_NONE, 0x80000000, /* max mem capacity in kb (0x80000000=use extended) */ -1, /* handle of error (if any) */ 0, /* number of slots or sockets (TBD) */ 0 /* extended maximum memory capacity in bytes (TBD) */ }; static int smbios_type16_initializer(struct smbios_structure *template_entry, const char **template_strings, char *curaddr, char **endaddr, uint16_t *n, uint16_t *size); struct smbios_table_type17 smbios_type17_template = { { SMBIOS_TYPE_MEMDEVICE, sizeof (struct smbios_table_type17), 0 }, -1, /* handle of physical memory array */ -1, /* handle of memory error data */ 64, /* total width in bits including ecc */ 64, /* data width in bits */ 0x7fff, /* size in bytes (0x7fff=use extended)*/ SMBIOS_MDFF_UNKNOWN, 0, /* set (0x00=none, 0xff=unknown) */ 1, /* device locator string */ 2, /* physical bank locator string */ SMBIOS_MDT_UNKNOWN, SMBIOS_MDF_UNKNOWN, 0, /* maximum memory speed in mhz (0=unknown) */ 3, /* manufacturer string */ 4, /* serial number string */ 5, /* asset tag string */ 6, /* part number string */ 0, /* attributes (0=unknown rank information) */ 0, /* extended size in mb (TBD) */ 0, /* current speed in mhz (0=unknown) */ 0, /* minimum voltage in mv (0=unknown) */ 0, /* maximum voltage in mv (0=unknown) */ 0 /* configured voltage in mv (0=unknown) */ }; const char *smbios_type17_strings[] = { " ", /* device locator string */ " ", /* physical bank locator string */ " ", /* manufacturer string */ "None", /* serial number string */ "None", /* asset tag string */ "None", /* part number string */ NULL }; static int smbios_type17_initializer(struct smbios_structure *template_entry, const char **template_strings, char *curaddr, char **endaddr, uint16_t *n, uint16_t *size); struct smbios_table_type19 smbios_type19_template = { { SMBIOS_TYPE_MEMARRAYMAP, sizeof (struct smbios_table_type19), 0 }, 0xffffffff, /* starting phys addr in kb (0xffffffff=use ext) */ 0xffffffff, /* ending phys addr in kb (0xffffffff=use ext) */ -1, /* physical memory array handle */ 1, /* number of devices that form a row */ 0, /* extended starting 
phys addr in bytes (TDB) */ 0 /* extended ending phys addr in bytes (TDB) */ }; static int smbios_type19_initializer(struct smbios_structure *template_entry, const char **template_strings, char *curaddr, char **endaddr, uint16_t *n, uint16_t *size); struct smbios_table_type32 smbios_type32_template = { { SMBIOS_TYPE_BOOT, sizeof (struct smbios_table_type32), 0 }, { 0, 0, 0, 0, 0, 0 }, SMBIOS_BOOT_NORMAL }; struct smbios_table_type127 smbios_type127_template = { { SMBIOS_TYPE_EOT, sizeof (struct smbios_table_type127), 0 } }; static int smbios_generic_initializer(struct smbios_structure *template_entry, const char **template_strings, char *curaddr, char **endaddr, uint16_t *n, uint16_t *size); static struct smbios_template_entry smbios_template[] = { { (struct smbios_structure *)&smbios_type0_template, smbios_type0_strings, smbios_generic_initializer }, { (struct smbios_structure *)&smbios_type1_template, smbios_type1_strings, smbios_type1_initializer }, { (struct smbios_structure *)&smbios_type3_template, smbios_type3_strings, smbios_generic_initializer }, { (struct smbios_structure *)&smbios_type4_template, smbios_type4_strings, smbios_type4_initializer }, { (struct smbios_structure *)&smbios_type16_template, NULL, smbios_type16_initializer }, { (struct smbios_structure *)&smbios_type17_template, smbios_type17_strings, smbios_type17_initializer }, { (struct smbios_structure *)&smbios_type19_template, NULL, smbios_type19_initializer }, { (struct smbios_structure *)&smbios_type32_template, NULL, smbios_generic_initializer }, { (struct smbios_structure *)&smbios_type127_template, NULL, smbios_generic_initializer }, { NULL,NULL, NULL } }; static uint64_t guest_lomem, guest_himem; static uint16_t type16_handle; static int smbios_generic_initializer(struct smbios_structure *template_entry, const char **template_strings, char *curaddr, char **endaddr, uint16_t *n, uint16_t *size) { struct smbios_structure *entry; memcpy(curaddr, template_entry, template_entry->length); entry = (struct smbios_structure *)curaddr; entry->handle = *n + 1; curaddr += entry->length; if (template_strings != NULL) { int i; for (i = 0; template_strings[i] != NULL; i++) { const char *string; int len; string = template_strings[i]; len = strlen(string) + 1; memcpy(curaddr, string, len); curaddr += len; } *curaddr = '\0'; curaddr++; } else { /* Minimum string section is double nul */ *curaddr = '\0'; curaddr++; *curaddr = '\0'; curaddr++; } (*n)++; *endaddr = curaddr; return (0); } static int smbios_type1_initializer(struct smbios_structure *template_entry, const char **template_strings, char *curaddr, char **endaddr, uint16_t *n, uint16_t *size) { struct smbios_table_type1 *type1; smbios_generic_initializer(template_entry, template_strings, curaddr, endaddr, n, size); type1 = (struct smbios_table_type1 *)curaddr; if (guest_uuid_str != NULL) { uuid_t uuid; uint32_t status; uuid_from_string(guest_uuid_str, &uuid, &status); if (status != uuid_s_ok) return (-1); uuid_enc_le(&type1->uuid, &uuid); } else { MD5_CTX mdctx; u_char digest[16]; char hostname[MAXHOSTNAMELEN]; /* * Universally unique and yet reproducible are an * oxymoron, however reproducible is desirable in * this case. */ if (gethostname(hostname, sizeof(hostname))) return (-1); MD5Init(&mdctx); MD5Update(&mdctx, vmname, strlen(vmname)); MD5Update(&mdctx, hostname, sizeof(hostname)); MD5Final(digest, &mdctx); /* * Set the variant and version number. 
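 * (Illustrative note: the masking below mirrors an RFC 4122 name-based,
 * version-3 UUID: the top nibble of byte 6 is forced to 0x3 (the version
 * field) and the top two bits of byte 8 to binary 10 (the variant), with
 * the MD5 of vmname plus hostname supplying the remaining bits.)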
*/ digest[6] &= 0x0F; digest[6] |= 0x30; /* version 3 */ digest[8] &= 0x3F; digest[8] |= 0x80; memcpy(&type1->uuid, digest, sizeof (digest)); } return (0); } static int smbios_type4_initializer(struct smbios_structure *template_entry, const char **template_strings, char *curaddr, char **endaddr, uint16_t *n, uint16_t *size) { int i; - for (i = 0; i < guest_ncpus; i++) { + for (i = 0; i < sockets; i++) { struct smbios_table_type4 *type4; char *p; int nstrings, len; smbios_generic_initializer(template_entry, template_strings, curaddr, endaddr, n, size); type4 = (struct smbios_table_type4 *)curaddr; p = curaddr + sizeof (struct smbios_table_type4); nstrings = 0; while (p < *endaddr - 1) { if (*p++ == '\0') nstrings++; } len = sprintf(*endaddr - 1, "CPU #%d", i) + 1; *endaddr += len - 1; *(*endaddr) = '\0'; (*endaddr)++; type4->socket = nstrings + 1; + /* Revise cores and threads after update to smbios 3.0 */ + if (cores > 254) + type4->cores = 0; + else + type4->cores = cores; + /* This threads is total threads in a socket */ + if ((cores * threads) > 254) + type4->threads = 0; + else + type4->threads = (cores * threads); curaddr = *endaddr; } return (0); } static int smbios_type16_initializer(struct smbios_structure *template_entry, const char **template_strings, char *curaddr, char **endaddr, uint16_t *n, uint16_t *size) { struct smbios_table_type16 *type16; type16_handle = *n; smbios_generic_initializer(template_entry, template_strings, curaddr, endaddr, n, size); type16 = (struct smbios_table_type16 *)curaddr; type16->xsize = guest_lomem + guest_himem; type16->ndevs = guest_himem > 0 ? 2 : 1; return (0); } static int smbios_type17_initializer(struct smbios_structure *template_entry, const char **template_strings, char *curaddr, char **endaddr, uint16_t *n, uint16_t *size) { struct smbios_table_type17 *type17; smbios_generic_initializer(template_entry, template_strings, curaddr, endaddr, n, size); type17 = (struct smbios_table_type17 *)curaddr; type17->arrayhand = type16_handle; type17->xsize = guest_lomem; if (guest_himem > 0) { curaddr = *endaddr; smbios_generic_initializer(template_entry, template_strings, curaddr, endaddr, n, size); type17 = (struct smbios_table_type17 *)curaddr; type17->arrayhand = type16_handle; type17->xsize = guest_himem; } return (0); } static int smbios_type19_initializer(struct smbios_structure *template_entry, const char **template_strings, char *curaddr, char **endaddr, uint16_t *n, uint16_t *size) { struct smbios_table_type19 *type19; smbios_generic_initializer(template_entry, template_strings, curaddr, endaddr, n, size); type19 = (struct smbios_table_type19 *)curaddr; type19->arrayhand = type16_handle; type19->xsaddr = 0; type19->xeaddr = guest_lomem; if (guest_himem > 0) { curaddr = *endaddr; smbios_generic_initializer(template_entry, template_strings, curaddr, endaddr, n, size); type19 = (struct smbios_table_type19 *)curaddr; type19->arrayhand = type16_handle; type19->xsaddr = 4*GB; type19->xeaddr = guest_himem; } return (0); } static void smbios_ep_initializer(struct smbios_entry_point *smbios_ep, uint32_t staddr) { memset(smbios_ep, 0, sizeof(*smbios_ep)); memcpy(smbios_ep->eanchor, SMBIOS_ENTRY_EANCHOR, SMBIOS_ENTRY_EANCHORLEN); smbios_ep->eplen = 0x1F; assert(sizeof (struct smbios_entry_point) == smbios_ep->eplen); smbios_ep->major = 2; smbios_ep->minor = 6; smbios_ep->revision = 0; memcpy(smbios_ep->ianchor, SMBIOS_ENTRY_IANCHOR, SMBIOS_ENTRY_IANCHORLEN); smbios_ep->staddr = staddr; smbios_ep->bcdrev = 0x24; } static void smbios_ep_finalizer(struct 
smbios_entry_point *smbios_ep, uint16_t len, uint16_t num, uint16_t maxssize) { uint8_t checksum; int i; smbios_ep->maxssize = maxssize; smbios_ep->stlen = len; smbios_ep->stnum = num; checksum = 0; for (i = 0x10; i < 0x1f; i++) { checksum -= ((uint8_t *)smbios_ep)[i]; } smbios_ep->ichecksum = checksum; checksum = 0; for (i = 0; i < 0x1f; i++) { checksum -= ((uint8_t *)smbios_ep)[i]; } smbios_ep->echecksum = checksum; } int smbios_build(struct vmctx *ctx) { struct smbios_entry_point *smbios_ep; uint16_t n; uint16_t maxssize; char *curaddr, *startaddr, *ststartaddr; int i; int err; guest_lomem = vm_get_lowmem_size(ctx); guest_himem = vm_get_highmem_size(ctx); startaddr = paddr_guest2host(ctx, SMBIOS_BASE, SMBIOS_MAX_LENGTH); if (startaddr == NULL) { fprintf(stderr, "smbios table requires mapped mem\n"); return (ENOMEM); } curaddr = startaddr; smbios_ep = (struct smbios_entry_point *)curaddr; smbios_ep_initializer(smbios_ep, SMBIOS_BASE + sizeof(struct smbios_entry_point)); curaddr += sizeof(struct smbios_entry_point); ststartaddr = curaddr; n = 0; maxssize = 0; for (i = 0; smbios_template[i].entry != NULL; i++) { struct smbios_structure *entry; const char **strings; initializer_func_t initializer; char *endaddr; uint16_t size; entry = smbios_template[i].entry; strings = smbios_template[i].strings; initializer = smbios_template[i].initializer; err = (*initializer)(entry, strings, curaddr, &endaddr, &n, &size); if (err != 0) return (err); if (size > maxssize) maxssize = size; curaddr = endaddr; } assert(curaddr - startaddr < SMBIOS_MAX_LENGTH); smbios_ep_finalizer(smbios_ep, curaddr - ststartaddr, n, maxssize); return (0); } Index: user/ngie/bug-237403/usr.sbin/bsdinstall/scripts/netconfig_ipv4 =================================================================== --- user/ngie/bug-237403/usr.sbin/bsdinstall/scripts/netconfig_ipv4 (revision 346925) +++ user/ngie/bug-237403/usr.sbin/bsdinstall/scripts/netconfig_ipv4 (revision 346926) @@ -1,102 +1,103 @@ #!/bin/sh #- # Copyright (c) 2011 Nathan Whitehorn # Copyright (c) 2013-2015 Devin Teske # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. # # $FreeBSD$ # ############################################################ INCLUDES BSDCFG_SHARE="/usr/share/bsdconfig" . $BSDCFG_SHARE/common.subr || exit 1 f_dprintf "%s: loading includes..." 
"$0" f_include $BSDCFG_SHARE/dialog.subr ############################################################ MAIN INTERFACE=$1 IFCONFIG_PREFIX="$2" test -z "$IFCONFIG_PREFIX" || IFCONFIG_PREFIX="$2 " case "${INTERFACE}" in "") dialog --backtitle 'FreeBSD Installer' --title 'Network Configuration' \ --msgbox 'No interface specified for IPv4 configuration.' 0 0 exit 1 ;; esac dialog --backtitle 'FreeBSD Installer' --title 'Network Configuration' --yesno 'Would you like to use DHCP to configure this interface?' 0 0 if [ $? -eq $DIALOG_OK ]; then if [ ! -z $BSDINSTALL_CONFIGCURRENT ]; then + ifconfig $INTERFACE up dialog --backtitle 'FreeBSD Installer' --infobox "Acquiring DHCP lease..." 0 0 err=$( dhclient $INTERFACE 2>&1 ) if [ $? -ne 0 ]; then f_dprintf "%s" "$err" dialog --backtitle 'FreeBSD Installer' --msgbox "DHCP lease acquisition failed." 0 0 exec $0 ${INTERFACE} "${IFCONFIG_PREFIX}" fi fi echo ifconfig_$INTERFACE=\"${IFCONFIG_PREFIX}DHCP\" >> $BSDINSTALL_TMPETC/._rc.conf.net exit 0 fi IP_ADDRESS=`ifconfig $INTERFACE inet | awk '/inet/ {printf("%s\n", $2); }'` NETMASK=`ifconfig $INTERFACE inet | awk '/inet/ {printf("%s\n", $4); }'` ROUTER=`netstat -rn -f inet | awk '/default/ {printf("%s\n", $2);}'` exec 3>&1 IF_CONFIG=$(dialog --backtitle 'FreeBSD Installer' --title 'Network Configuration' --form 'Static Network Interface Configuration' 0 0 0 \ 'IP Address' 1 0 "$IP_ADDRESS" 1 20 16 0 \ 'Subnet Mask' 2 0 "$NETMASK" 2 20 16 0 \ 'Default Router' 3 0 "$ROUTER" 3 20 16 0 \ 2>&1 1>&3) if [ $? -eq $DIALOG_CANCEL ]; then exit 1; fi exec 3>&- echo $INTERFACE $IF_CONFIG | awk -v prefix="$IFCONFIG_PREFIX" '{ printf("ifconfig_%s=\"%s\inet %s netmask %s\"\n", $1, prefix, $2, $3); printf("defaultrouter=\"%s\"\n", $4); }' >> $BSDINSTALL_TMPETC/._rc.conf.net retval=$? if [ "$BSDINSTALL_CONFIGCURRENT" ]; then . $BSDINSTALL_TMPETC/._rc.conf.net if [ -n "$2" ]; then ifconfig $INTERFACE `eval echo \\\$ifconfig_$INTERFACE | sed "s|$2||"` else ifconfig $INTERFACE `eval echo \\\$ifconfig_$INTERFACE` fi if [ "$defaultrouter" ]; then route delete -inet default route add -inet default $defaultrouter retval=$? fi fi exit $retval ################################################################################ # END ################################################################################ Index: user/ngie/bug-237403/usr.sbin/kldxref/ef_mips.c =================================================================== --- user/ngie/bug-237403/usr.sbin/kldxref/ef_mips.c (nonexistent) +++ user/ngie/bug-237403/usr.sbin/kldxref/ef_mips.c (revision 346926) @@ -0,0 +1,99 @@ +/*- + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD + * + * Copyright (c) 2019 John Baldwin + * + * This software was developed by SRI International and the University of + * Cambridge Computer Laboratory (Department of Computer Science and + * Technology) under DARPA contract HR0011-18-C-0016 ("ECATS"), as part of the + * DARPA SSITH research programme. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD$ + */ + +#include +#include + +#include +#include + +#include "ef.h" + +/* + * Apply relocations to the values we got from the file. `relbase' is the + * target relocation address of the section, and `dataoff' is the target + * relocation address of the data in `dest'. + */ +int +ef_reloc(struct elf_file *ef, const void *reldata, int reltype, Elf_Off relbase, + Elf_Off dataoff, size_t len, void *dest) +{ + Elf_Addr *where, val; + const Elf_Rel *rel; + const Elf_Rela *rela; + Elf_Addr addend, addr; + Elf_Size rtype, symidx; + + switch (reltype) { + case EF_RELOC_REL: + rel = (const Elf_Rel *)reldata; + where = (Elf_Addr *)((char *)dest + relbase + rel->r_offset - + dataoff); + addend = 0; + rtype = ELF_R_TYPE(rel->r_info); + symidx = ELF_R_SYM(rel->r_info); + break; + case EF_RELOC_RELA: + rela = (const Elf_Rela *)reldata; + where = (Elf_Addr *)((char *)dest + relbase + rela->r_offset - + dataoff); + addend = rela->r_addend; + rtype = ELF_R_TYPE(rela->r_info); + symidx = ELF_R_SYM(rela->r_info); + break; + default: + return (EINVAL); + } + + if ((char *)where < (char *)dest || (char *)where >= (char *)dest + len) + return (0); + + if (reltype == EF_RELOC_REL) + addend = *where; + + switch (rtype) { +#ifdef __LP64__ + case R_MIPS_64: /* S + A */ +#else + case R_MIPS_32: /* S + A */ +#endif + addr = EF_SYMADDR(ef, symidx); + val = addr + addend; + *where = val; + break; + default: + warnx("unhandled relocation type %d", (int)rtype); + } + return (0); +} Property changes on: user/ngie/bug-237403/usr.sbin/kldxref/ef_mips.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: user/ngie/bug-237403/usr.sbin/nfsdumpstate/nfsdumpstate.c =================================================================== --- user/ngie/bug-237403/usr.sbin/nfsdumpstate/nfsdumpstate.c (revision 346925) +++ user/ngie/bug-237403/usr.sbin/nfsdumpstate/nfsdumpstate.c (revision 346926) @@ -1,295 +1,315 @@ /*- * Copyright (c) 2009 Rick Macklem, University of Guelph * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. 
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define DUMPSIZE 10000 static void dump_lockstate(char *); static void dump_openstate(void); static void usage(void); static char *open_flags(uint32_t); static char *deleg_flags(uint32_t); static char *lock_flags(uint32_t); static char *client_flags(uint32_t); static struct nfsd_dumpclients dp[DUMPSIZE]; static struct nfsd_dumplocks lp[DUMPSIZE]; static char flag_string[20]; int main(int argc, char **argv) { int ch, openstate; char *lockfile; if (modfind("nfsd") < 0) errx(1, "nfsd not loaded - self terminating"); openstate = 0; lockfile = NULL; while ((ch = getopt(argc, argv, "ol:")) != -1) switch (ch) { case 'o': openstate = 1; break; case 'l': lockfile = optarg; break; default: usage(); } argc -= optind; argv += optind; if (openstate == 0 && lockfile == NULL) openstate = 1; else if (openstate != 0 && lockfile != NULL) errx(1, "-o and -l cannot both be specified"); /* * For -o, dump all open/lock state. * For -l, dump lock state for that file. */ if (openstate != 0) dump_openstate(); else dump_lockstate(lockfile); exit(0); } static void usage(void) { errx(1, "usage: nfsdumpstate [-o] [-l]"); } /* * Dump all open/lock state. */ static void dump_openstate(void) { struct nfsd_dumplist dumplist; int cnt, i; +#ifdef INET6 char nbuf[INET6_ADDRSTRLEN]; +#endif dumplist.ndl_size = DUMPSIZE; dumplist.ndl_list = (void *)dp; if (nfssvc(NFSSVC_DUMPCLIENTS, &dumplist) < 0) errx(1, "Can't perform dump clients syscall"); printf("%-13s %9.9s %9.9s %9.9s %9.9s %9.9s %9.9s %-45s %s\n", "Flags", "OpenOwner", "Open", "LockOwner", "Lock", "Deleg", "OldDeleg", "Clientaddr", "ClientID"); /* * Loop through results, printing them out. */ cnt = 0; while (dp[cnt].ndcl_clid.nclid_idlen > 0 && cnt < DUMPSIZE) { printf("%-13s ", client_flags(dp[cnt].ndcl_flags)); printf("%9d %9d %9d %9d %9d %9d ", dp[cnt].ndcl_nopenowners, dp[cnt].ndcl_nopens, dp[cnt].ndcl_nlockowners, dp[cnt].ndcl_nlocks, dp[cnt].ndcl_ndelegs, dp[cnt].ndcl_nolddelegs); switch (dp[cnt].ndcl_addrfam) { #ifdef INET case AF_INET: printf("%-45s ", inet_ntoa(dp[cnt].ndcl_cbaddr.sin_addr)); break; #endif #ifdef INET6 case AF_INET6: if (inet_ntop(AF_INET6, &dp[cnt].ndcl_cbaddr.sin6_addr, nbuf, sizeof(nbuf)) != NULL) printf("%-45s ", nbuf); else printf("%-45s ", " "); break; #endif } for (i = 0; i < dp[cnt].ndcl_clid.nclid_idlen; i++) printf("%02x", dp[cnt].ndcl_clid.nclid_id[i]); printf("\n"); cnt++; } } /* * Dump the lock state for a file. 
*/ static void dump_lockstate(char *fname) { struct nfsd_dumplocklist dumplocklist; int cnt, i; +#ifdef INET6 + char nbuf[INET6_ADDRSTRLEN]; +#endif dumplocklist.ndllck_size = DUMPSIZE; dumplocklist.ndllck_list = (void *)lp; dumplocklist.ndllck_fname = fname; if (nfssvc(NFSSVC_DUMPLOCKS, &dumplocklist) < 0) errx(1, "Can't dump locks for %s\n", fname); - printf("%-11s %-36s %-15s %s\n", + printf("%-11s %-36s %-45s %s\n", "Open/Lock", " Stateid or Lock Range", "Clientaddr", "Owner and ClientID"); /* * Loop through results, printing them out. */ cnt = 0; while (lp[cnt].ndlck_clid.nclid_idlen > 0 && cnt < DUMPSIZE) { if (lp[cnt].ndlck_flags & NFSLCK_OPEN) printf("%-11s %9d %08x %08x %08x ", open_flags(lp[cnt].ndlck_flags), lp[cnt].ndlck_stateid.seqid, lp[cnt].ndlck_stateid.other[0], lp[cnt].ndlck_stateid.other[1], lp[cnt].ndlck_stateid.other[2]); else if (lp[cnt].ndlck_flags & (NFSLCK_DELEGREAD | NFSLCK_DELEGWRITE)) printf("%-11s %9d %08x %08x %08x ", deleg_flags(lp[cnt].ndlck_flags), lp[cnt].ndlck_stateid.seqid, lp[cnt].ndlck_stateid.other[0], lp[cnt].ndlck_stateid.other[1], lp[cnt].ndlck_stateid.other[2]); else printf("%-11s %17jd %17jd ", lock_flags(lp[cnt].ndlck_flags), lp[cnt].ndlck_first, lp[cnt].ndlck_end); - if (lp[cnt].ndlck_addrfam == AF_INET) - printf("%-15s ", + switch (lp[cnt].ndlck_addrfam) { +#ifdef INET + case AF_INET: + printf("%-45s ", inet_ntoa(lp[cnt].ndlck_cbaddr.sin_addr)); - else - printf("%-15s ", " "); + break; +#endif +#ifdef INET6 + case AF_INET6: + if (inet_ntop(AF_INET6, &lp[cnt].ndlck_cbaddr.sin6_addr, + nbuf, sizeof(nbuf)) != NULL) + printf("%-45s ", nbuf); + else + printf("%-45s ", " "); + break; +#endif + default: + printf("%-45s ", " "); + break; + } for (i = 0; i < lp[cnt].ndlck_owner.nclid_idlen; i++) printf("%02x", lp[cnt].ndlck_owner.nclid_id[i]); printf(" "); for (i = 0; i < lp[cnt].ndlck_clid.nclid_idlen; i++) printf("%02x", lp[cnt].ndlck_clid.nclid_id[i]); printf("\n"); cnt++; } } /* * Parse the Open/Lock flag bits and create a string to be printed. 
*/ static char * open_flags(uint32_t flags) { int i, j; strlcpy(flag_string, "Open ", sizeof (flag_string)); i = 5; if (flags & NFSLCK_READACCESS) flag_string[i++] = 'R'; if (flags & NFSLCK_WRITEACCESS) flag_string[i++] = 'W'; flag_string[i++] = ' '; flag_string[i++] = 'D'; flag_string[i] = 'N'; j = i; if (flags & NFSLCK_READDENY) flag_string[i++] = 'R'; if (flags & NFSLCK_WRITEDENY) flag_string[i++] = 'W'; if (i == j) i++; flag_string[i] = '\0'; return (flag_string); } static char * deleg_flags(uint32_t flags) { if (flags & NFSLCK_DELEGREAD) strlcpy(flag_string, "Deleg R", sizeof (flag_string)); else strlcpy(flag_string, "Deleg W", sizeof (flag_string)); return (flag_string); } static char * lock_flags(uint32_t flags) { if (flags & NFSLCK_READ) strlcpy(flag_string, "Lock R", sizeof (flag_string)); else strlcpy(flag_string, "Lock W", sizeof (flag_string)); return (flag_string); } static char * client_flags(uint32_t flags) { flag_string[0] = '\0'; if (flags & LCL_NEEDSCONFIRM) strlcat(flag_string, "NC ", sizeof (flag_string)); if (flags & LCL_CALLBACKSON) strlcat(flag_string, "CB ", sizeof (flag_string)); if (flags & LCL_GSS) strlcat(flag_string, "GSS ", sizeof (flag_string)); if (flags & LCL_ADMINREVOKED) strlcat(flag_string, "REV", sizeof (flag_string)); return (flag_string); } Index: user/ngie/bug-237403/usr.sbin/pkg/FreeBSD.conf =================================================================== --- user/ngie/bug-237403/usr.sbin/pkg/FreeBSD.conf (revision 346925) +++ user/ngie/bug-237403/usr.sbin/pkg/FreeBSD.conf (nonexistent) @@ -1,16 +0,0 @@ -# $FreeBSD$ -# -# To disable this repository, instead of modifying or removing this file, -# create a /usr/local/etc/pkg/repos/FreeBSD.conf file: -# -# mkdir -p /usr/local/etc/pkg/repos -# echo "FreeBSD: { enabled: no }" > /usr/local/etc/pkg/repos/FreeBSD.conf -# - -FreeBSD: { - url: "pkg+http://pkg.FreeBSD.org/${ABI}/latest", - mirror_type: "srv", - signature_type: "fingerprints", - fingerprints: "/usr/share/keys/pkg", - enabled: yes -} Property changes on: user/ngie/bug-237403/usr.sbin/pkg/FreeBSD.conf ___________________________________________________________________ Deleted: svn:keywords ## -1 +0,0 ## -FreeBSD=%H \ No newline at end of property Index: user/ngie/bug-237403/usr.sbin/pkg/FreeBSD.conf.latest =================================================================== --- user/ngie/bug-237403/usr.sbin/pkg/FreeBSD.conf.latest (nonexistent) +++ user/ngie/bug-237403/usr.sbin/pkg/FreeBSD.conf.latest (revision 346926) @@ -0,0 +1,16 @@ +# $FreeBSD$ +# +# To disable this repository, instead of modifying or removing this file, +# create a /usr/local/etc/pkg/repos/FreeBSD.conf file: +# +# mkdir -p /usr/local/etc/pkg/repos +# echo "FreeBSD: { enabled: no }" > /usr/local/etc/pkg/repos/FreeBSD.conf +# + +FreeBSD: { + url: "pkg+http://pkg.FreeBSD.org/${ABI}/latest", + mirror_type: "srv", + signature_type: "fingerprints", + fingerprints: "/usr/share/keys/pkg", + enabled: yes +} Property changes on: user/ngie/bug-237403/usr.sbin/pkg/FreeBSD.conf.latest ___________________________________________________________________ Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Index: user/ngie/bug-237403/usr.sbin/pkg/FreeBSD.conf.quarterly =================================================================== --- user/ngie/bug-237403/usr.sbin/pkg/FreeBSD.conf.quarterly (nonexistent) +++ user/ngie/bug-237403/usr.sbin/pkg/FreeBSD.conf.quarterly (revision 346926) @@ -0,0 +1,16 @@ +# $FreeBSD$ +# +# To disable this repository, instead of 
modifying or removing this file, +# create a /usr/local/etc/pkg/repos/FreeBSD.conf file: +# +# mkdir -p /usr/local/etc/pkg/repos +# echo "FreeBSD: { enabled: no }" > /usr/local/etc/pkg/repos/FreeBSD.conf +# + +FreeBSD: { + url: "pkg+http://pkg.FreeBSD.org/${ABI}/quarterly", + mirror_type: "srv", + signature_type: "fingerprints", + fingerprints: "/usr/share/keys/pkg", + enabled: yes +} Property changes on: user/ngie/bug-237403/usr.sbin/pkg/FreeBSD.conf.quarterly ___________________________________________________________________ Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Index: user/ngie/bug-237403/usr.sbin/pkg/Makefile =================================================================== --- user/ngie/bug-237403/usr.sbin/pkg/Makefile (revision 346925) +++ user/ngie/bug-237403/usr.sbin/pkg/Makefile (revision 346926) @@ -1,14 +1,16 @@ # $FreeBSD$ -CONFS= FreeBSD.conf +PKGCONFBRANCH?= latest +CONFS= FreeBSD.conf.${PKGCONFBRANCH} +CONFSNAME= FreeBSD.conf CONFSDIR= /etc/pkg CONFSMODE= 644 PROG= pkg SRCS= pkg.c dns_utils.c config.c MAN= pkg.7 CFLAGS+=-I${SRCTOP}/contrib/libucl/include .PATH: ${SRCTOP}/contrib/libucl/include LIBADD= archive fetch ucl sbuf crypto ssl .include Index: user/ngie/bug-237403 =================================================================== --- user/ngie/bug-237403 (revision 346925) +++ user/ngie/bug-237403 (revision 346926) Property changes on: user/ngie/bug-237403 ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head:r346621-346925
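
Editorial note (not part of the patch): the bhyve smbios.c hunk earlier in this diff emits one SMBIOS type 4 structure per socket and clamps the 8-bit core/thread count fields to 0 ("unknown") when the real values will not fit, since the wider 16-bit counts only exist from SMBIOS 3.0 onward. The following is a minimal standalone sketch of that clamping logic only; the function and variable names here are hypothetical illustrations, not identifiers from the tree.

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Sketch of the per-socket core/thread clamping performed by the
     * type 4 initializer in the hunk above.  SMBIOS 2.x provides only
     * 8-bit core and thread counts, so values that cannot be
     * represented are reported as 0 (unknown).
     */
    static void
    fill_type4_counts(unsigned cores, unsigned threads,
        uint8_t *core_count, uint8_t *thread_count)
    {
            /* Cores per socket; 0 when the value does not fit. */
            *core_count = (cores > 254) ? 0 : (uint8_t)cores;

            /* Thread count is the total threads in the socket. */
            *thread_count = (cores * threads > 254) ?
                0 : (uint8_t)(cores * threads);
    }

    int
    main(void)
    {
            uint8_t cc, tc;

            /* Fits: 64 cores, 128 threads per socket. */
            fill_type4_counts(64, 2, &cc, &tc);
            printf("cores=%u threads=%u\n", cc, tc);

            /* Does not fit in 8 bits: reported as unknown (0). */
            fill_type4_counts(256, 2, &cc, &tc);
            printf("cores=%u threads=%u\n", cc, tc);
            return (0);
    }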