Index: projects/vnet/release/doc/en_US.ISO8859-1/relnotes/article.xml
===================================================================
--- projects/vnet/release/doc/en_US.ISO8859-1/relnotes/article.xml	(revision 301546)
+++ projects/vnet/release/doc/en_US.ISO8859-1/relnotes/article.xml	(revision 301547)
@@ -1,1853 +1,1857 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
 <!DOCTYPE article PUBLIC "-//FreeBSD//DTD DocBook XML V5.0-Based Extension//EN"
 	"../../../share/xml/freebsd50.dtd" [
 <!ENTITY % release PUBLIC "-//FreeBSD//ENTITIES Release Specification//EN" "release.ent">
  %release;
 <!ENTITY % sponsor PUBLIC "-//FreeBSD//ENTITIES Sponsor Specification//EN" "sponsor.ent">
  %sponsor;
 <!ENTITY % vendor PUBLIC "-//FreeBSD//ENTITIES Vendor Specification//EN" "vendor.ent">
  %vendor;
 <!ENTITY security SYSTEM "../../share/xml/security.xml">
 <!ENTITY errata SYSTEM "../../share/xml/errata.xml">
 ]>
 <article xmlns="http://docbook.org/ns/docbook"
   xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0">
 
   <info>
     <title>&os; &release.current; Release Notes</title>
 
     <author>
       <orgname>The &os; Project</orgname>
     </author>
 
     <pubdate>$FreeBSD$</pubdate>
 
     <!-- Last rev: 288943 -->
 
     <copyright>
       <year>2015</year>
       <year>2016</year>
       <holder role="mailto:doc@FreeBSD.org">The &os; Documentation
 	Project</holder>
     </copyright>
 
     <legalnotice xml:id="trademarks" role="trademarks">
       &tm-attrib.freebsd;
       &tm-attrib.ibm;
       &tm-attrib.ieee;
       &tm-attrib.intel;
       &tm-attrib.sparc;
       &tm-attrib.general;
     </legalnotice>
 
     <abstract>
       <para>The release notes for &os; &release.current; contain
 	a summary of the changes made to the &os; base system on the
 	&release.branch; development line.  This document lists
 	applicable security advisories that were issued since the last
 	release, as well as significant changes to the &os; kernel and
 	userland.  Some brief remarks on upgrading are also
 	presented.</para>
     </abstract>
   </info>
 
   <sect1 xml:id="intro">
     <title>Introduction</title>
 
     <para>This document contains the release notes for &os;
       &release.current;.  It describes recently added, changed, or
       deleted features of &os;.  It also provides some notes on
       upgrading from previous versions of &os;.</para>
 
     <para releasetype="current">The &release.type; distribution to
       which these release notes apply represents the latest point
       along the &release.branch; development branch since
       &release.branch; was created.  Information regarding pre-built,
       binary &release.type; distributions along this branch can be
       found at <uri
 	xlink:href="&release.url;">&release.url;</uri>.</para>
 
     <para releasetype="snapshot">The &release.type; distribution to
       which these release notes apply represents a point along the
       &release.branch; development branch between &release.prev; and
       the future &release.next;.  Information regarding pre-built,
       binary &release.type; distributions along this branch can be
       found at <uri
 	xlink:href="&release.url;">&release.url;</uri>.</para>
 
     <para releasetype="release">This distribution of &os;
       &release.current; is a &release.type; distribution.  It can be
       found at <uri xlink:href="&release.url;">&release.url;</uri> or
       any of its mirrors.  More information on obtaining this (or
       other) &release.type; distributions of &os; can be found in the
       <link
 	xlink:href="&url.books.handbook;/mirrors.html"><quote>Obtaining
 	  &os;</quote> appendix</link> to the <link
 	xlink:href="&url.books.handbook;/">&os;
 	Handbook</link>.</para>
 
     <para>All users are encouraged to consult the release errata
       before installing &os;.  The errata document is updated with
       <quote>late-breaking</quote> information discovered late in the
       release cycle or after the release.  Typically, it contains
       information on known bugs, security advisories, and corrections
       to documentation.  An up-to-date copy of the errata for &os;
       &release.current; can be found on the &os; Web site.</para>
 
     <para>This document describes the most user-visible new or changed
       features in &os; since &release.prev;.  In general, changes
       described here are unique to the &release.branch; branch unless
       specifically marked as &merged; features.</para>
 
     <para>Typical release note items document recent security
       advisories issued after &release.prev;, new drivers or hardware
       support, new commands or options, major bug fixes, or
       contributed software upgrades.  They may also list changes to
       major ports/packages or release engineering practices.  Clearly
       the release notes cannot list every single change made to &os;
       between releases; this document focuses primarily on security
       advisories, user-visible changes, and major architectural
       improvements.</para>
   </sect1>
 
   <sect1 xml:id="upgrade">
     <title>Upgrading from Previous Releases of &os;</title>
 
     <para arch="amd64,i386">Binary upgrades between RELEASE versions
       (and snapshots of the various security branches) are supported
       using the &man.freebsd-update.8; utility.  The binary upgrade
       procedure will update unmodified userland utilities, as well as
       unmodified GENERIC kernels distributed as a part of an official
       &os; release.  The &man.freebsd-update.8; utility requires that
       the host being upgraded have Internet connectivity.</para>
 
     <para>Source-based upgrades (those based on recompiling the &os;
       base system from source code) from previous versions are
       supported, according to the instructions in
       <filename>/usr/src/UPDATING</filename>.</para>
 
     <important>
       <para>Upgrading &os; should only be attempted after backing up
 	<emphasis>all</emphasis> data and configuration files.</para>
     </important>
   </sect1>
 
   <sect1 xml:id="security-errata">
     <title>Security and Errata</title>
 
     <para>This section lists the various Security Advisories and
       Errata Notices since &release.prev;.</para>
 
     <sect2 xml:id="security">
       <title>Security Advisories</title>
 
       &security;
     </sect2>
 
     <sect2 xml:id="errata">
       <title>Errata Notices</title>
 
       &errata;
     </sect2>
   </sect1>
 
   <sect1 xml:id="userland">
     <title>Userland</title>
 
     <para>This section covers changes and additions to userland
       applications, contributed software, and system utilities.</para>
 
     <sect2 xml:id="userland-config">
       <title>Userland Configuration Changes</title>
 
       <para revision="266463">The default &man.newsyslog.conf.5; now
 	includes files in the
 	<filename>/etc/newsyslog.conf.d/</filename> and
 	<filename>/usr/local/etc/newsyslog.conf.d/</filename>
 	directories for &man.newsyslog.8;.</para>
 
       <para revision="270675">The &man.mailwrapper.8; utility has been
 	updated to use &man.mailer.conf.5; from the
 	<literal>LOCALBASE</literal> environment variable, which
 	defaults to <filename class="directory">/usr/local</filename>
 	if unset.</para>
 
       <para revision="272350">The <literal>MK_ARM_EABI</literal>
 	&man.src.conf.5; option has been removed.</para>
 
       <para revision="301247">The <application>ntp</application> suite
 	has been updated to version 4.2.8p8.</para>
     </sect2>
 
     <sect2 xml:id="userland-programs">
       <title>Userland Application Changes</title>
 
       <para revision="258838" contrib="sponsor" sponsor="&ff;,
 	&google;" sponsorurl="">The &man.casperd.8; daemon has been
 	added, which provides access to functionality that is not
 	available in the <quote>capability mode</quote>
 	sandbox.</para>
 
       <para revision="260594">When &man.kldload.8; is unable to load
 	a kernel module, it now prints a message directing the user to
 	the output of &man.dmesg.8;, instead of the previous
 	<quote>Exec format error.</quote> message.</para>
 
       <para revision="260910">The &man.pciconf.8; utility has been
 	updated to allow PCI devices that are attached to a driver to
 	be identified by their device name instead of just the
 	selector.  Additionally, the <literal>-l</literal> flag now
 	accepts an optional device argument to restrict the output to
 	details about a single device.</para>
 
       <para revision="260913">A new flag, <quote>onifconsole</quote>,
 	has been added to <filename>/etc/ttys</filename>.  This allows
 	the system to provide a login prompt via serial console if the
 	device is an active kernel console, otherwise it is equivalent
 	to <literal>off</literal>.</para>
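 
       <para>As a sketch, a serial terminal entry in
 	<filename>/etc/ttys</filename> using the new flag might look
 	like the following.  The device name, getty type, and
 	terminal type shown here are illustrative assumptions and
 	should be adjusted for the local system.</para>
 
       <programlisting>ttyu0	"/usr/libexec/getty 3wire"	vt100	onifconsole secure</programlisting>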
 
       <para revision="260926">Support for displaying VPD for PCI
 	devices via &man.pciconf.8; has been added.</para>
 
       <para revision="261498">The &man.ping.8; utility now uses the
 	Capsicum framework to drop privileges, providing protection
 	against malicious network packets.</para>
 
       <para revision="265229">The &man.ps.1; utility has been
 	updated to include the <literal>-J</literal> flag, used to
 	filter output by matching &man.jail.8; IDs and names.
 	Additionally, <literal>0</literal> can be passed as the
 	argument to <literal>-J</literal> to list only processes
 	running on the host system.</para>
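 
       <para>For example, assuming a jail named
 	<literal>www</literal> (a hypothetical name), the following
 	commands list the processes in that jail and the processes
 	running on the host, respectively:</para>
 
       <screen><userinput>ps -J www</userinput>
 <userinput>ps -J 0</userinput></screen>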
 
       <para revision="265249">The &man.top.1; utility has been updated
 	to filter by &man.jail.8; ID or name, in followup to the
 	&man.ps.1; change in <literal>r265229</literal>.</para>
 
       <para revision="266209">The &man.pmcstat.8; utility has been
 	updated to include a new flag, <literal>-l</literal>, which
 	ends event collection after the specified number of
 	seconds.</para>
 
       <para revision="270745">The &man.ps.1; utility has been updated
 	to include a new keyword, <quote>tracer</quote>, which
 	displays the <acronym>PID</acronym> of the tracing
 	process.</para>
 
       <para revision="271482">Support for adding empty partitions has
 	been added to the &man.mkimg.1; utility.</para>
 
       <para revision="272166">The &man.primes.6; utility has been
 	updated to correctly enumerate prime numbers between
 	<literal>4295098369</literal> and
 	<literal>3825123056546413050</literal>.  Prior to this
 	change, values in this range could be incorrectly identified
 	as prime numbers.</para>
 
       <para revision="272198">The &man.mkimg.1; utility has been
 	updated to include three options used to print information
 	about &man.mkimg.1; itself:</para>
 
       <informaltable frame="none" pgwide="0">
 	<tgroup cols="2">
 	  <colspec colwidth="1*"/>
 	  <colspec colwidth="1*"/>
 	  <thead>
 	    <row>
 	      <entry>Option</entry>
 	      <entry>Output</entry>
 	    </row>
 	  </thead>
 
 	  <tbody>
 	    <row>
 	      <entry><literal>--version</literal></entry>
 	      <entry>The current version of the &man.mkimg.1;
 		utility</entry>
 	    </row>
 
 	    <row>
 	      <entry><literal>--formats</literal></entry>
 	      <entry>The disk image file formats supported by
 		&man.mkimg.1;</entry>
 	    </row>
 
 	    <row>
 	      <entry><literal>--schemes</literal></entry>
 	      <entry>The partition schemes supported by
 		&man.mkimg.1;</entry>
 	    </row>
 	  </tbody>
 	</tgroup>
       </informaltable>
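 
       <para>Each option is passed on its own, for example:</para>
 
       <screen><userinput>mkimg --version</userinput>
 <userinput>mkimg --formats</userinput>
 <userinput>mkimg --schemes</userinput></screen>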
 
       <para revision="272488">Userland &man.ctf.5; support in
 	&man.dtrace.1; has been added.  With this change,
 	&man.dtrace.1; is able to resolve type info for function and
 	<acronym>USDT</acronym> probe arguments, and function return
 	values.</para>
 
       <para revision="274960">The &man.elfdump.1; utility has been
 	updated to support capability mode provided by
 	&man.capsicum.4;.</para>
 
       <para revision="275680" contrib="sponsor" sponsor="&ff;">The
 	&man.fstyp.8; utility has been added, which is used to
 	determine the filesystem on a specified device.</para>
 
       <para revision="276881">The <literal>libedit</literal> library
 	has been updated to support <acronym>UTF</acronym>-8, which
 	additionally provides Unicode support to &man.sh.1;.</para>
 
       <para revision="276893" contrib="sponsor" sponsor="&ff;">The
 	&man.mkimg.1; utility has been updated to support the
 	<acronym>MBR</acronym> <acronym>EFI</acronym> partition
 	type.</para>
 
       <para revision="277166" arch="powerpc">The &man.ptrace.2; system
 	call has been updated to include support for Altivec registers
 	on &os;/&arch.powerpc;.</para>
 
       <para revision="278320">A new device control utility,
 	&man.devctl.8;, has been added, which allows making
 	administrative changes to individual devices, such as
 	attaching and detaching drivers, and enabling and disabling
 	devices.  The &man.devctl.8; utility uses the new
 	&man.devctl.3; library.</para>
 
       <para revision="279122" contrib="sponsor"
 	sponsor="&juniper;">The &man.netstat.1; utility has been
 	updated to link against the &man.libxo.3; shared
 	library.</para>
 
       <para revision="279139">A new flag, <literal>-c</literal>, has
 	been added to the &man.mkimg.1; utility, which allows
 	specifying the capacity of the target disk image.</para>
 
       <para revision="279315" contrib="sponsor" sponsor="&ff;">The
 	&man.uefisign.8; utility has been added.</para>
 
       <para revision="279571" contrib="sponsor"
 	sponsor="&scaleengine;">The &man.freebsd-update.8; utility has
 	been updated to prevent fetching updated binary patches when
 	a previous upgrade has not been completed.</para>
 
       <para revision="280870">A regression in the &man.libarchive.3;
 	library that would prevent a directory from being included in
 	the archive when <literal>--one-file-system</literal> is used
 	has been fixed.</para>
 
       <para revision="281311" contrib="sponsor" sponsor="&ff;">The
 	&man.ar.1; utility has been updated to set
 	<literal>ARCHIVE_EXTRACT_SECURE_SYMLINKS</literal> and
 	<literal>ARCHIVE_EXTRACT_SECURE_NODOTDOT</literal> to disallow
 	directory traversal when extracting an archive, similar to
 	&man.tar.1;.</para>
 
       <para revision="281617">A race condition in &man.wc.1; that
 	would cause final results to be sent to &man.stderr.4; when
 	receiving the <literal>SIGINFO</literal> signal has been
 	fixed.</para>
 
       <para revision="282208" contrib="sponsor"
 	sponsor="&multiplay;">The &man.chflags.1;, &man.chgrp.1;,
 	&man.chmod.1;, and &man.chown.8; utilities now affect symbolic
 	links when the <literal>-R</literal> flag is specified, as
 	documented in &man.symlink.7;.</para>
 
       <para revision="282608">The &man.date.1; utility has been
 	updated to print the modification time of the file passed as
 	an argument to the <literal>-r</literal> flag, improving
 	compatibility with the <acronym>GNU</acronym> &man.date.1;
 	utility behavior.</para>
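 
       <para>For example, to print the modification time of a file
 	(the path shown is only an illustration):</para>
 
       <screen><userinput>date -r /etc/rc.conf</userinput></screen>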
 
       <para revision="283961">The &man.pw.8; utility has been updated
 	with a new flag, <literal>-R</literal>, that sets the root
 	directory within which the utility will operate.</para>
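 
       <para>As a sketch, assuming an alternate system root is mounted
 	at <filename class="directory">/mnt</filename> (a hypothetical
 	location), the account database under that root can be
 	queried with:</para>
 
       <screen><userinput>pw -R /mnt usershow root</userinput></screen>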
 
       <para revision="284297" contrib="sponsor"
 	sponsor="&clusterhq;">The &man.lockstat.1; utility has been
 	updated with several improvements:</para>
 
       <itemizedlist>
 	<listitem>
 	  <para>Spin locks are now reported as the amount of time
 	    spinning, instead of loop iterations.</para>
 	</listitem>
 
 	<listitem>
 	  <para>Reader locks are now recognized as adaptive locks that
 	    can spin on &os;.</para>
 	</listitem>
 
 	<listitem>
 	  <para>Lock acquisition events for successful reader try-lock
 	    events are now reported.</para>
 	</listitem>
 
 	<listitem>
 	  <para>Spin and block events are now reported before lock
 	    acquisition events.</para>
 	</listitem>
       </itemizedlist>
 
       <para revision="284589" contrib="sponsor"
 	sponsor="&scaleengine;">The &man.fstyp.8; utility has been
 	updated to be able to detect &man.zfs.8; and &man.geli.8;
 	filesystems.</para>
 
       <para revision="284883">The &man.mkimg.1; utility has been
 	updated to include support for <literal>NTFS</literal>
 	filesystems in both <acronym>MBR</acronym> and
 	<acronym>GPT</acronym> partitioning schemes.</para>
 
       <para revision="285253">The &man.quota.1; utility has been
 	updated to include support for <acronym>IPv6</acronym>.</para>
 
       <para revision="285420">The &man.jexec.8; utility has been
 	updated to include a new flag, <literal>-l</literal>, which
 	ensures a clean environment in the target jail when used.
 	Additionally, &man.jexec.8; will run a shell within the target
 	jail when no commands are specified.</para>
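 
       <para>For example, assuming a running jail named
 	<literal>www</literal> (a hypothetical name), the following
 	starts a shell inside the jail with a clean
 	environment:</para>
 
       <screen><userinput>jexec -l www</userinput></screen>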
 
       <para revision="285550">The &man.w.1; utility has been updated
 	to display the full IPv6 remote address of the host from which
 	a user is connected.</para>
 
       <para revision="285685">The &man.jail.8; framework has been
 	updated to allow mounting &man.linprocfs.5; and
 	&man.linsysfs.5; within a jail.</para>
 
       <para revision="285772" contrib="sponsor"
 	sponsor="&emcisilon;">The &man.patch.1; utility has been
 	updated to include a new option to the <literal>-V</literal>
 	flag, <literal>none</literal>, which disables backup file
 	creation when applying a patch.</para>
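 
       <para>For example, to apply a patch without creating backup
 	files (the patch file name is a placeholder):</para>
 
       <screen><userinput>patch -V none -i example.diff</userinput></screen>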
 
       <para revision="286010" contrib="sponsor" sponsor="&ff;">The
 	&man.ar.1; utility now enables deterministic mode
 	(<literal>-D</literal>) by default.  This behavior can be
 	disabled by specifying the <literal>-U</literal> flag.</para>
 
       <para revision="286289" contrib="sponsor"
 	sponsor="&scaleengine;">The &man.xargs.1; utility has been
 	updated to allow specifying <literal>0</literal> as an
 	argument to the <literal>-P</literal> (parallel mode) flag,
 	which allows creating as many concurrent processes as
 	possible.</para>
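 
       <para>As an illustration, the following compresses every
 	matching file below the current directory, one file per
 	&man.xargs.1; invocation, with as many concurrent processes
 	as possible.  The file pattern and the choice of compression
 	command are arbitrary:</para>
 
       <screen><userinput>find . -name '*.txt' -print0 | xargs -0 -n 1 -P 0 gzip</userinput></screen>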
 
       <para revision="286795">The &man.patch.1; utility has been
 	updated to remove the automatic checkout feature.</para>
 
       <para revision="287473" contrib="sponsor" sponsor="&gandi;">A
 	new utility, &man.sesutil.8;, has been added, which is used
 	to manage &man.ses.4; devices.</para>
 
       <para revision="287522">The &man.pciconf.8; utility has been
 	updated to use the PCI ID database from the <filename
 	  role="package">misc/pciids</filename> package, if present,
 	falling back to the PCI ID database in the &os; base
 	system.</para>
 
       <para revision="287842" contrib="sponsor"
 	sponsor="&scaleengine;">The &man.ifconfig.8; utility has been
 	updated to always exit with an error code if an important
 	&man.ioctl.2; fails.</para>
     </sect2>
 
     <sect2 xml:id="userland-contrib">
       <title>Contributed Software</title>
 
       <para revision="260445">&man.byacc.1; has been updated to
 	version 20140101.</para>
 
       <para revision="296633"><application>OpenSSH</application> has
 	been updated to 7.2p2.</para>
 
       <para revision="261344"><application>mdocml</application> has
 	been updated to version 1.12.3.</para>
 
       <para revision="275718">The <application>binutils</application>
 	suite of utilities has been updated to include upstream
 	patches that add new relocations for &arch.powerpc;
 	support.</para>
 
       <para revision="276398" contrib="sponsor" sponsor="&ff;">The
 	<application>ELF Tool Chain</application> has been updated to
 	upstream revision r3136.</para>
 
       <para revision="276551">The <application>texinfo</application>
 	utility and <literal>info</literal> pages were removed from
 	the base system.  The <filename
 	  role="package">print/texinfo</filename> port should be
 	installed on systems where <literal>info</literal> pages are
 	needed.</para>
 
       <para revision="276796" contrib="sponsor" sponsor="&ff;">The ELF
 	object manipulation tools
 	<application>addr2line</application>,
 	<application>elfcopy (strip)</application>,
 	<application>nm</application>,
 	<application>readelf</application>,
 	<application>size</application>, and
 	<application>strings</application> were switched to the
 	versions from the ELF Tool Chain project.</para>
 
       <para revision="276881">The <literal>libedit</literal> library
 	has been updated to include <acronym>UTF-8</acronym> support,
 	adding <acronym>UTF-8</acronym> support to the &man.sh.1;
 	shell.</para>
 
       <para revision="278433">The &man.xz.1; utility has been updated
 	to support multi-threaded compression.</para>
 
       <para revision="280932" contrib="sponsor" sponsor="&ff;">The
 	<application>elftoolchain</application> utilities have been
 	updated to version 3179.</para>
 
       <para revision="281316">The &man.xz.1; utility has been updated
 	to version 5.2.1.</para>
 
       <para revision="281373">The &man.nvi.1; utility has been updated
 	to version 2.1.3.</para>
 
       <para revision="281806">The &man.wpa.supplicant.8; and
 	&man.hostapd.8; utilities have been updated to version
 	2.4.</para>
 
       <para revision="296190" contrib="sponsor" sponsor="&ff;">The
 	&man.resolvconf.8; utility has been updated to version
 	3.7.3.</para>
 
       <para revision="284254"><application>bmake</application> has
 	been updated to version 20150606.</para>
 
       <para revision="285229"><application>sendmail</application> has
 	been updated to 8.15.2.  Starting with &os;&nbsp;11.0 and
 	sendmail 8.15, sendmail uses uncompressed IPv6 addresses by
 	default, i.e., they will not contain <quote>::</quote>.  For
 	example, instead of <quote>::1</quote>, it will be
 	<quote>0:0:0:0:0:0:0:1</quote>.  This permits a zero subnet to
 	have a more specific match, such as different map entries for
 	IPv6:0:0 versus IPv6:0.  This change requires that
 	configuration data (including maps, files, classes, custom
 	ruleset, etc.) must use the same format, so make certain such
 	configuration data is upgraded.  As a very simple check,
 	search for patterns like 'IPv6:[0-9a-fA-F:]*::' and 'IPv6::'.
 	To return to the old behavior, set the m4 option
 	<literal>confUSE_COMPRESSED_IPV6_ADDRESSES</literal> or the cf
 	option <literal>UseCompressedIPv6Addresses</literal>.</para>
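 
       <para>As a rough sketch of the pattern check, assuming the
 	configuration data lives in
 	<filename>/etc/mail/access</filename> (an illustrative path;
 	other maps and files may also need to be checked):</para>
 
       <screen><userinput>grep -E 'IPv6:[0-9a-fA-F:]*::|IPv6::' /etc/mail/access</userinput></screen>
 
       <para>To keep the compressed form instead, a line such as the
 	following could be added to the <filename>.mc</filename>
 	file.  The value <literal>True</literal> is an assumption
 	based on the usual form of boolean m4 options:</para>
 
       <programlisting>define(`confUSE_COMPRESSED_IPV6_ADDRESSES', `True')</programlisting>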
 
       <para revision="285275">The &man.tcpdump.1; utility has been
 	updated to version 4.7.4.</para>
 
       <para revision="298998"><application>OpenSSL</application> has
 	been updated to version 1.0.2h.</para>
 
       <para revision="285642" contrib="sponsor" sponsor="&dell;">The
 	&man.ssh.1; utility has been updated to re-implement hostname
 	canonicalization before locating the host in
 	<filename>known_hosts</filename>.</para>
 
       <para revision="285972">The &man.libarchive.3; library has been
 	updated to properly skip a sparse file entry in a &man.tar.1;
 	file, which would previously produce errors.</para>
 
       <para revision="286503">The <application>apr</application>
 	library used by &man.svnlite.1; has been updated to version
 	1.5.2.</para>
 
       <para revision="286505">The <application>serf</application>
 	library used by &man.svnlite.1; has been updated to version
 	1.3.8.</para>
 
       <para revision="286505">The &man.svnlite.1; utility has been
 	updated to version 1.8.14.</para>
 
       <para revision="298161">The <application>sqlite3</application>
 	library used by &man.svnlite.1; and &man.kerberos.8; has been
 	updated to version 3.12.1.</para>
 
       <para revision="286750">Timezone data files have been updated to
 	version 2015f.</para>
 
       <para revision="287168">The &man.acpi.4; subsystem has been
 	updated to version 20150818.</para>
 
       <para revision="287917">The &man.unbound.8; utility has been
 	updated to version 1.5.4.</para>
 
       <para revision="288090">&man.jemalloc.3; has been updated to
 	version 4.0.2.</para>
 
       <para revision="298192">The &man.file.1; utility has been
 	updated to version 5.26.</para>
 
       <para revision="288303">The &man.nc.1; utility has been updated
 	to the OpenBSD 5.8 version.</para>
 
       <para revision="296417"><application>Clang</application> has
 	been updated to version 3.8.0.</para>
 
       <para revision="296417"><application>LLVM</application> has
 	been updated to version 3.8.0.</para>
 
       <para revision="296417"><application>LLDB</application> has
 	been updated to version 3.8.0.</para>
 
       <para revision="296417"><application>libc++</application> has
 	been updated to version 3.8.0.</para>
 
       <para revision="296417">The
 	<application>compiler_rt</application> library has been
 	updated to version 3.8.0.</para>
     </sect2>
 
     <sect2 xml:id="userland-installer">
       <title>Installation and Configuration Tools</title>
 
       <para revision="271539">The &man.bsdinstall.8; partition editor
 	and &man.sade.8; utility have been updated to include native
 	<acronym>ZFS</acronym> support.</para>
 
       <para revision="272274">The &os; installation utility,
 	&man.bsdinstall.8;, has been updated to set the
 	<literal>canmount</literal> &man.zfs.8; property to
 	<literal>off</literal> for the <filename
 	  class="directory">/var</filename> dataset, preventing the
 	contents of directories within <filename
 	  class="directory">/var</filename> from conflicting when
 	using multiple boot environments, such as that provided by
 	<filename role="package">sysutils/beadm</filename>.</para>
 
       <para revision="274394">The &man.bsdconfig.8; utility has been
 	updated to skip the initial &man.tzsetup.8;
 	<acronym>UTC</acronym> versus wall-clock time prompt when run
 	in a virtual machine, determined when the
 	<literal>kern.vm_guest</literal> &man.sysctl.8; is set to
 	<literal>1</literal>.</para>
 
       <para revision="275874">The &man.bsdinstall.8; utility has been
 	updated to use the new &man.dpv.3; library to display progress
 	when extracting the &os; distributions.</para>
 
       <para revision="285557" contrib="sponsor"
 	sponsor="&scaleengine;">Support for aligning partitions on
 	1&nbsp;MB boundaries has been added to
 	&man.bsdinstall.8;.</para>
 
       <para revision="285679" contrib="sponsor"
 	sponsor="&scaleengine;">Support for detecting and implementing
 	a workaround for various laptops and motherboards that do not
 	boot properly from <acronym>GPT</acronym>-partitioned disks
 	has been added to &man.bsdinstall.8;.  Additionally, the
 	<literal>active</literal> flag will be set on the partition
 	when needed.</para>
 
       <para revision="285679" contrib="sponsor"
 	sponsor="&scaleengine;">Support for selecting the partitioning
 	scheme when installing on the <acronym>UFS</acronym>
 	filesystem has been added to &man.bsdinstall.8;.</para>
     </sect2>
 
     <sect2 xml:id="userland-rc">
       <title><filename class="directory">/etc/rc.d</filename>
 	Scripts</title>
 
       <para revision="270676">The &man.rc.8; subsystem has been
 	updated to allow configuring services in <filename
 	  class="directory">&dollar;{LOCALBASE}/etc/rc.conf.d/</filename>.
 	If <literal>LOCALBASE</literal> is unset, it defaults to
 	<filename class="directory">/usr/local</filename>.</para>
 
       <para revision="273955">A new &man.rc.8; script,
 	<filename>growfs</filename>, has been added, which will resize
 	the root filesystem on boot if <filename>/firstboot</filename>
 	exists.</para>
 
       <para revision="275299">The <filename>mrouted</filename>
 	&man.rc.8; script has been removed from the base system.  An
 	equivalent script is available from the <filename
 	  role="package">net/mrouted</filename> port.</para>
 
       <para revision="279463" contrib="sponsor"
 	sponsor="&sandvine;">A new &man.rc.8; script,
 	<filename>iovctl</filename>, has been added, which allows
 	automatically starting the &man.iovctl.8; utility at
 	boot.</para>
 
       <para revision="287576" contrib="sponsor"
 	sponsor="&scaleengine;">The &man.service.8; utility has been
 	updated to honor entries within <filename
 	  class="directory">/etc/rc.conf.d/</filename>.</para>
 
     </sect2>
 
     <sect2 xml:id="userland-periodic">
       <title><filename class="directory">/etc/periodic</filename>
 	Scripts</title>
 
       <para revision="271321">The daily &man.periodic.8; script
 	<filename>110.clean-tmps</filename> has been updated to avoid
 	crossing filesystem mount boundaries when cleaning files in
 	<filename class="directory">/tmp</filename>.</para>
 
       <para revision="277216" contrib="sponsor" sponsor="&ff;">A new
 	&man.periodic.8; script,
 	<filename>510.status-world-kernel</filename>, has been added,
 	which evaluates the running userland and kernel versions from
 	the &man.uname.1; <literal>-U</literal> and
 	<literal>-K</literal> arguments, and prints an error if the
 	system userland and kernel are not in sync.</para>
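 
       <para>The two values compared by the script can also be
 	inspected by hand:</para>
 
       <screen><userinput>uname -U</userinput>
 <userinput>uname -K</userinput></screen>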
     </sect2>
 
     <sect2 xml:id="userland-libraries">
       <title>Runtime Libraries and API</title>
 
       <para revision="265995">The Blowfish &man.crypt.3; default
 	format has been changed to
 	<literal>&dollar;2b&dollar;</literal>.</para>
 
       <para revision="268461">The &man.readline.3; library is now
 	statically linked in software within the base system, and the
 	shared library is no longer installed, allowing the Ports
 	Collection to use a modern version of the library.</para>
 
       <para revision="272273">The &man.strptime.3; function has been
 	updated to add support for <acronym>POSIX</acronym>-2001
 	features <literal>%U</literal> and
 	<literal>%W</literal>.</para>
 
       <para revision="272842,272848" contrib="sponsor"
 	sponsor="&ff;">The &man.dl.iterate.phdr.3; function has been
 	changed to always return the path name of the
 	<acronym>ELF</acronym> object in the
 	<literal>dlpi_name</literal> structure member.</para>
 
       <para revision="273562" contrib="sponsor"
 	sponsor="&juniper;">The &man.libxo.3; library has been
 	imported to the base system.</para>
 
       <para revision="273806" contrib="sponsor" sponsor="&chelsio;">A
 	userland library for Chelsio Terminator 5 based iWARP cards
 	has been added, allowing userland <acronym>RDMA</acronym>
 	applications to work over compatible
 	<acronym>NIC</acronym>s.</para>
 
       <para revision="274987">The &man.gpio.3; library has been added,
 	providing a wrapper around the &man.gpio.4; kernel
 	interface.</para>
 
       <para revision="275800" contrib="sponsor" sponsor="&ff;">The
 	&man.procctl.2; system call has been updated to include
 	a facility for non-&man.init.8; processes to be declared as
 	the reaper of child processes and their descendants.</para>
 
       <para revision="277610">The <literal>futimens()</literal> and
 	<literal>utimensat()</literal> system calls have been
 	added.  See &man.utimensat.2; for more information.</para>
 
       <para revision="278934">The &man.elf.3; compile-time dependency
 	has been removed from <filename>dtri.o</filename>, which
 	allows adding <application>DTrace</application> probes to
 	userland applications and libraries without also linking
 	against &man.elf.3;.</para>
 
       <para revision="279186">The &man.setmode.3; function has been
 	updated to consistently set <literal>errno</literal> on
 	failure.</para>
 
       <para revision="279663">The &man.qsort.3; functions have been
 	updated to be able to handle 32-bit aligned data on 64-bit
 	platforms, also providing a significant improvement in 32-bit
 	workloads.</para>
 
       <para revision="281130">Several standard include headers have
 	been updated to make use of <application>gcc</application>
 	attributes, such as <literal>__result_use_check()</literal>,
 	<literal>__alloc_size()</literal>, and
 	<literal>__nonnull()</literal>.</para>
 
       <para revision="281845">Support for file verification in
 	<acronym>MAC</acronym> has been added.</para>
 
       <para revision="282973" contrib="sponsor" sponsor="&ff;">The
 	<literal>libgomp</literal> library is now only built when
 	building <acronym>GCC</acronym> from the base system.  An
 	up-to-date version is available in the Ports Collection as
 	<filename
 	  role="package">devel/libiomp5-devel</filename>.</para>
 
       <para revision="282988">The <filename>stdlib.h</filename> and
 	<filename>malloc.h</filename> headers have been updated to
 	make use of the <application>gcc</application>
 	<literal>alloc_align()</literal> attribute.</para>
 
       <para revision="284483" contrib="sponsor"
 	sponsor="&scaleengine;">The Blowfish &man.crypt.3; library
 	has been updated to support &dollar;2y&dollar; hashes.</para>
 
       <para revision="285277">The &man.execl.3; and &man.execlp.3;
 	library functions have been updated to use the
 	<literal>__sentinel</literal> <application>gcc</application>
 	attribute.</para>
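 
       <para>To illustrate what a sentinel attribute provides, the
 	following is a small sketch that assumes
 	<literal>__sentinel</literal> corresponds to the
 	<application>gcc</application> <literal>sentinel</literal>
 	function attribute.  The attribute is spelled out directly
 	here, and <literal>count_strings()</literal> is
 	a hypothetical function that is not part of the base
 	system.</para>
 
       <programlisting><![CDATA[
```c
#include <stdarg.h>
#include <stddef.h>

/*
 * Hypothetical variadic helper.  The "sentinel" attribute asks the
 * compiler to diagnose calls whose argument list does not end with
 * a null pointer.
 */
size_t count_strings(const char *first, ...) __attribute__((sentinel));

/* Count the string arguments that precede the terminating NULL. */
size_t
count_strings(const char *first, ...)
{
	va_list ap;
	const char *s;
	size_t n = 0;

	va_start(ap, first);
	for (s = first; s != NULL; s = va_arg(ap, const char *))
		n++;
	va_end(ap);
	return (n);
}
```
 ]]></programlisting>
 
       <para>With such a declaration, a call like
 	<literal>count_strings("a", "b")</literal> that omits the
 	trailing <literal>(char *)NULL</literal> can be flagged at
 	compile time, which is the same class of mistake the
 	attribute is meant to catch in calls to &man.execl.3;.</para>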
     </sect2>
 
     <sect2 xml:id="userland-abi">
       <title>ABI Compatibility</title>
 
       <para revision="271982">The &linux; compatibility version has
 	been updated to <literal>2.6.18</literal>.  The
 	<literal>compat.linux.osrelease</literal> &man.sysctl.8; is
 	evaluated when building the <filename
 	  role="package">emulators/linux-c6</filename> and related
 	ports.</para>
 
       <para revision="288669">The stack protector has been upgraded to
 	the <quote>strong</quote> level, elevating the protection against buffer
 	overflows.  While this significantly improves the security of
 	the system, extensive testing was done to ensure there are no
 	measurable side effects in performance or
 	functionality.</para>
     </sect2>
   </sect1>
 
   <sect1 xml:id="kernel">
     <title>Kernel</title>
 
     <para>This section covers changes to kernel configurations, system
       tuning, and system control parameters that are not otherwise
       categorized.</para>
 
     <sect2 xml:id="kernel-bugfix">
       <title>Kernel Bug Fixes</title>
 
       <para revision="265876">A kernel bug that inhibited proper
 	functionality of the <literal>dev.cpu.0.freq</literal>
 	&man.sysctl.8; on &intel; processors with Turbo
 	Boost&nbsp;&trade; enabled has been fixed.</para>
 
       <para revision="271697" arch="powerpc">Support for
 	&man.dtrace.1; stack tracing has been fixed for
 	&os;/&arch.powerpc;, using the <literal>trapexit()</literal>
 	and <literal>asttrapexit()</literal> functions instead of
 	checking within addressed kernel space.</para>
 
       <para revision="271917">A kernel panic triggered when destroying
 	a &man.vnet.9; &man.jail.8; configured with &man.gif.4; has
 	been fixed.</para>
 
       <para revision="271918">A kernel panic triggered when destroying
 	a &man.vnet.9; &man.jail.8; configured with &man.gre.4; has
 	been fixed.</para>
 
       <para revision="272089">A bug in &man.ipfw.4; that could
 	potentially lead to a kernel panic when using &man.dummynet.4;
 	at layer 2 has been fixed.</para>
 
       <para revision="280930" contrib="sponsor" sponsor="&mitail;">The
 	kernel <acronym>RPC</acronym> has been updated to include
 	several enhancements:</para>
 
       <itemizedlist>
 	<listitem>
 	  <para>The 45 MiB limit on requests queued for
 	    &man.nfsd.8; threads has been removed.</para>
 	</listitem>
 
 	<listitem>
 	  <para>Avoids unnecessary throttling by not deferring
 	    accounting for completed requests.</para>
 	</listitem>
 
 	<listitem>
 	  <para>Fixes an integer overflow and signedness bugs.</para>
 	</listitem>
       </itemizedlist>
 
       <para revision="281261" arch="powerpc">Support for
 	&man.dtrace.1; has been added for the
 	Book-E&nbsp;&trade;.</para>
 
       <para revision="287886" contrib="sponsor"
 	sponsor="&multiplay;">The &man.kqueue.2; system call has been
 	updated to handle write events to files larger than 2
 	gigabytes.</para>
     </sect2>
 
     <sect2 xml:id="kernel-config">
       <title>Kernel Configuration</title>
 
       <para revision="266531">The <literal>IMAGACT_BINMISC</literal>
 	kernel configuration option has been enabled by default,
 	which enables application execution through emulators, such
 	as <application>Qemu</application>.</para>
 
       <para revision="268045">The <literal>VT</literal> kernel
 	configuration file has been removed, and the &man.vt.4;
 	driver is included in the <literal>GENERIC</literal> kernel.
 	To enable &man.vt.4;, enter <literal>set kern.vty=vt</literal>
 	at the &man.loader.8; prompt during boot, or add
 	<literal>kern.vty=vt</literal> to &man.loader.conf.5; and
 	reboot the system.</para>
 
       <para revision="277904">The &man.config.8; utility has been
 	updated to allow using a non-standard <filename
 	  class="directory">src/</filename> tree, specified as an
 	argument to the <literal>-s</literal> flag.</para>
 
       <para revision="277990" arch="powerpc64">The
 	&os;/&arch.powerpc64; kernel now builds as
 	a position-independent executable, allowing the kernel to be
 	loaded into and run from any physical or virtual
 	address.</para>
 
       <important>
 	<para>This change requires an update to &man.loader.8;.
 	  The userland and kernel must be updated before rebooting the
 	  system.</para>
       </important>
 
       <para revision="278338" arch="arm">A new module for creating
 	<filename>rpi.dtb</filename> has been added for the Raspberry
 	Pi.</para>
 
       <para revision="278340" arch="arm">The
 	<filename>rpi.dtb</filename> module is now installed to
 	<filename class="directory">/boot/dtb/</filename> by
 	default for the Raspberry Pi system.</para>
 
       <para revision="279189" contrib="sponsor" sponsor="&ff;"
 	arch="powerpc">Kernel support for Vector-Scalar eXtension
 	(<acronym>VSX</acronym>) found on POWER7 and POWER8 hardware
 	has been added.</para>
 
       <para revision="279252" contrib="sponsor" sponsor="&ff;"
 	arch="powerpc">The &man.pmap.9; implementation for 64-bit
 	&powerpc; processors has been overhauled to improve
 	concurrency.</para>
 
       <para revision="279824" arch="arm">A new module for creating
 	the <filename>dtb</filename> module for AM335x systems has
 	been added.</para>
 
       <para revision="281495" contrib="sponsor" sponsor="&ff;">The
 	<literal>PAE_TABLES</literal> kernel configuration option has
 	been added for &os;/&arch.i386;, which instructs &man.pmap.9;
 	to use <acronym>PAE</acronym> format for page tables while
 	maintaining a 32-bit physical address size elsewhere in the
 	kernel.  The use of this option can enhance application-level
 	security by enabling the creation of <quote>no execute</quote>
 	mappings on modern &arch.i386; processors.  Unlike the
 	<literal>PAE</literal> option, <literal>PAE_TABLES</literal>
 	preserves kernel binary interface (<acronym>KBI</acronym>)
 	compatibility with non-<literal>PAE</literal> kernels,
 	allowing non-<literal>PAE</literal> kernel modules and drivers
 	to work with a <literal>PAE_TABLES</literal>-enabled kernel.
 	Additionally, system limits are tuned for 4GB maximum
 	<acronym>RAM</acronym>, avoiding kernel virtual address space
 	(<acronym>KVA</acronym>) exhaustion.</para>
 
       <para revision="282215">The <literal>SIFTR</literal> kernel
 	configuration has been added, allowing building &man.siftr.4;
 	statically into the kernel.</para>
 
       <para revision="282731" arch="arm">The &arch.arm; boot loader,
 	<filename>ubldr</filename>, is now relocatable.  In addition,
 	<filename>ubldr.bin</filename> is now created during build
 	time, which is a stripped binary with an entry point of
 	<literal>0</literal>, providing the ability to specify the
 	load address by running <literal>go
 	  &dollar;{loadaddr}</literal> in
 	<literal>u-boot</literal>.</para>
 
       <para revision="282921" contrib="sponsor" sponsor="&intelcorp;"
 	arch="amd64,i386">The &man.nvd.4; and &man.nvme.4; drivers are
 	now included in the <filename>GENERIC</filename> kernel
 	configuration by default.</para>
 
       <para revision="283959" contrib="sponsor"
 	sponsor="&limelight;">A new kernel configuration option,
 	<literal>EM_MULTIQUEUE</literal>, has been added which enables
 	multi-queue support in the &man.em.4; driver.</para>
 
       <note>
 	<para>Multi-queue support in the &man.em.4; driver is not
 	  officially supported by &intel;.</para>
       </note>
 
       <para revision="285142" contrib="sponsor"
 	sponsor="&netgate;">The <filename>GENERIC</filename> kernel
 	configuration has been updated to include the
 	<literal>IPSEC</literal> option by default.</para>
 
       <para revision="285387" contrib="sponsor"
 	sponsor="&norse;, &dell;">Initial <acronym>NUMA</acronym>
 	affinity and policy configuration has been added.  See
 	&man.numactl.1; and &man.numa.getaffinity.2; for usage
 	details.</para>
 
       <para revision="286231">The &man.pms.4; driver has been added
 	to the <filename>GENERIC</filename> kernel configuration for
 	supported architectures.</para>
 
       <para revision="287306" arch="arm">The
 	<filename>CUBIEBOARD2</filename> kernel configuration has been
 	renamed to <filename>A20</filename>.</para>
 
       <para revision="288176" contrib="sponsor" sponsor="&ff;">Kernel
 	debugging symbols are now installed to <filename
 	  class="directory">/usr/lib/debug/boot/kernel/</filename>.
 	To retain the previous behavior, add
 	<literal>KERN_DEBUGDIR=""</literal> to
 	&man.src.conf.5;.</para>
     </sect2>
 
     <sect2 xml:id="kernel-sysctl">
       <title>System Tuning and Controls</title>
 
       <para revision="275140" contrib="sponsor" sponsor="&ff;">The
 	&man.hwpmc.4; default and maximum callchain depths have been
 	increased.  The default has been increased from 16 to 32, and
 	the maximum increased from 32 to 128.</para>
 
       <para revision="279361">The <literal>kern.osrelease</literal>
 	and <literal>kern.osreldate</literal> are now configurable
 	&man.jail.8; parameters.</para>
 
       <para revision="280308,280949" contrib="sponsor"
 	sponsor="&ix;, &ff;">The &man.devfs.5; device filesystem has
 	been changed to update timestamps for read/write operations
 	using seconds precision.  A new &man.sysctl.8;,
 	<literal>vfs.devfs.dotimes</literal>, has been added, which,
 	when set to a non-zero value, enables default precision
 	timestamps for these operations.</para>
 
       <para revision="282213" contrib="sponsor" sponsor="&ff;">A new
 	&man.sysctl.8;, <literal>kern.racct.enable</literal>, has been
 	added, which when set to a non-zero value allows using
 	&man.rctl.8; with the <literal>GENERIC</literal> kernel.
 	A new kernel configuration option,
 	<literal>RACCT_DISABLED</literal>, has also been added.</para>
 
       <para revision="282901" contrib="sponsor" sponsor="&ff;">The
 	<literal>GENERIC</literal> kernel configuration now includes
 	<literal>RACCT</literal> and <literal>RCTL</literal> by
 	default.</para>
 
       <note>
 	<para>To enable <literal>RACCT</literal> and
 	  <literal>RCTL</literal> on a system using the
 	  <literal>GENERIC</literal> kernel configuration, add
 	  <literal>kern.racct.enable=1</literal> to
 	  &man.loader.conf.5;, and reboot the system.</para>
       </note>
 
       <para revision="283136" contrib="sponsor"
 	sponsor="&limelight;">A new &man.sysctl.8;,
 	<literal>net.inet.tcp.hostcache.purgenow</literal>, has
 	been added, which when set to <literal>1</literal> during
 	runtime will flush all
 	<literal>net.inet.tcp.hostcache</literal> entries.</para>
 
       <para revision="285524">A new &man.sysctl.8;,
 	<literal>hw.model</literal>, has been added, which displays
 	<acronym>CPU</acronym> model information.</para>
 
       <para revision="286591">The &man.uart.4; driver has been
 	updated to allow tuning pulses per second captured in the
 	CTS line during runtime, whereas previously only the DCD line
 	could be used without rebuilding the kernel.</para>
     </sect2>
   </sect1>
 
   <sect1 xml:id="drivers">
     <title>Devices and Drivers</title>
 
     <para>This section covers changes and additions to devices and
       device drivers since &release.prev;.</para>
 
     <sect2 xml:id="drivers-device">
       <title>Device Drivers</title>
 
       <para revision="260903">Support for GPS ports has been added to
 	&man.uhso.4;.</para>
 
       <para revision="265132">The &man.full.4; device has been added,
 	and the <literal>lindev(4)</literal> device has been removed.
 	Prior to this change, <literal>lindev(4)</literal> provided
 	only the <filename>/dev/full</filename> character device,
 	returning <literal>ENOSPC</literal> on write attempts.  As
 	this device is not specific to &linux;, a native &os; version
 	has been added.</para>
 
       <para revision="271705">Hardware context support has been
 	added to the <literal>drm/i915</literal> driver, adding
 	support for <application>Mesa</application> 9.2 and
 	later.</para>
 
       <para revision="273178">The &man.vt.4; driver has been updated,
 	replacing the bitmapped <literal>kern.vt.spclkeys</literal>
 	&man.sysctl.8; with individual
 	<literal>kern.vt.kbd_*</literal> variants.</para>
 
       <para revision="273598">The &man.hpet.4; driver has been updated
 	to create a
 	<filename>/dev/hpet<replaceable>N</replaceable></filename>
 	device, providing access to <acronym>HPET</acronym> from
 	userspace.</para>
 
       <para revision="280183">The <literal>drm</literal> code has
 	been updated to match &linux; version 3.8.13.</para>
 
       <para revision="281440">The &man.psm.4; driver has been updated
 	to include improved support for newer Synaptics&nbsp;&reg;
 	touchpads and the ClickPad&nbsp;&reg; mouse on newer
 	Lenovo&nbsp;&trade; laptops.</para>
 
       <para revision="282783" arch="powerpc">Support for the Freescale
 	<acronym>PCI</acronym> Root Complex device has been
 	added.</para>
 
       <para revision="285876">The &man.cyapa.4; driver has been added,
 	supporting the Cypress APA I2C trackpad.</para>
 
       <para revision="285883">The &man.isl.4; driver has been added,
 	supporting the Intersil I2C ISL29018 digital ambient light
 	sensor.</para>
     </sect2>
 
     <sect2 xml:id="drivers-storage">
       <title>Storage Drivers</title>
 
       <para revision="265236" contrib="sponsor"
 	sponsor="&lsi;, &spectralogic;" sponsorurl="">The &man.mpr.4;
 	device has been added, providing support for LSI Fusion-MPT
 	3 12Gb SCSI/SATA controllers.</para>
 
       <para revision="265555" contrib="sponsor"
 	  sponsor="&lsi;">The &man.mrsas.4; driver has been added,
 	providing support for LSI MegaRAID SAS controllers.  The
 	&man.mfi.4; driver attaches to the controller by default.
 	To enable &man.mrsas.4;, add
 	<literal>hw.mfi.mrsas_enable=1</literal> to
 	<filename>/boot/loader.conf</filename>, which turns off
 	&man.mfi.4; device probing.</para>
 
       <note>
 	<para>At this time, the &man.mfiutil.8; utility and the &os;
 	  version of <application>MegaCLI</application> and
 	  <application>StorCli</application> do not work with
 	  &man.mrsas.4;.</para>
       </note>
 
       <para revision="275461" contrib="sponsor" sponsor="&ix;">The
 	&man.ctl.4; subsystem has been updated, increasing the ports
 	limit from <literal>128</literal> to <literal>256</literal>,
 	and <acronym>LUN</acronym> limit from <literal>256</literal>
 	to <literal>1024</literal>.</para>
 
       <para revision="276526">The <literal>asr(4)</literal> driver has
 	been removed, and is no longer supported.</para>
 
       <para revision="281387">The &man.hptnr.4; driver has been
 	updated to version 1.1.1.</para>
 
       <para revision="285662">The &man.pms.4; driver has been added,
 	providing support for the PMC Sierra line of
 	<acronym>SAS</acronym>/<acronym>SATA</acronym> host bus
 	adapters.</para>
 
       <para revision="287117" contrib="sponsor"
 	sponsor="&emcisilon;">The &man.ioat.4; driver has been added,
 	providing support for the <acronym>PSE</acronym> (Platform
 	Storage Extension).</para>
 
       <para revision="287621" contrib="sponsor" sponsor="&ix;">The
 	<acronym>CTL</acronym> High Availability implementation has
 	been rewritten.</para>
 
       <para revision="288310">The &man.ctl.4; driver has been updated
 	to support CD-ROM and removable devices.</para>
 
       <para contrib="sponsor" sponsor="&ix;">The &man.isp.4; driver has
 	been updated and improved, with added support for 16Gbps FC
 	cards, improved target mode support, and completed Multi-ID
 	(NPIV) functionality.</para>
     </sect2>
 
     <sect2 xml:id="drivers-network">
       <title>Network Drivers</title>
 
       <para revision="258830">Support for Broadcom chipsets BCM57764,
 	BCM57767, BCM57782, BCM57786 and BCM57787 has been added to
 	&man.bge.4;.</para>
 
       <para revision="260448">Support for the &intel; Centrino&trade;
 	Wireless-N 135 chipset has been added.</para>
 
       <para revision="260552">Firmware for &intel; Centrino&trade;
 	Wireless-N 105 devices has been added to the base
 	system.</para>
 
       <para revision="261975">The deprecated <literal>nve(4)</literal> driver has been
 	removed.  Users of NVIDIA nForce MCP network adapters are
 	advised to use the &man.nfe.4; driver instead, which has been
 	the default driver for this hardware since
 	&os;&nbsp;7.0.</para>
 
       <para revision="264601" contrib="sponsor"
 	sponsor="&darpa_afrl;">The <literal>if_nf10bmac(4)</literal>
 	device has been added, providing support for NetFPGA-10G
 	Embedded CPU Ethernet Core.</para>
 
       <note>
 	<para>The <literal>if_nf10bmac(4)</literal> driver operates on
 	  the FPGA, and is not suited for the PCI host
 	  interface.</para>
       </note>
 
       <para revision="265348" contrib="sponsor"
 	sponsor="&netgate;">The &man.ath.hal.4; driver has been
 	updated to support the Atheros AR1111 chipset.</para>
 
       <para revision="266770">Support for the &intel; Centrino&trade;
 	Wireless-N 105 chipset has been added.</para>
 
       <para revision="266757" contrib="sponsor"
 	sponsor="&chelsio;">Support for the &man.cxgbe.4; Terminator
 	5 (T5) 10G/40G cards has been added to &man.netmap.4;.</para>
 
       <para revision="272730">The &man.alc.4; driver has been updated
 	to support AR816x and AR817x Ethernet controllers.</para>
 
       <para revision="272906">The &man.pf.4; packet filter default
 	hash has been changed from <literal>Jenkins</literal> to
 	<literal>Murmur3</literal>, providing a 3-percent performance
 	increase in packets-per-second.</para>
 
       <para revision="273331">The &man.vxlan.4; driver has been added,
 	which creates a virtual Layer 2 (Ethernet) network overlaid on
 	a Layer 3 (IP/UDP) network.  The &man.vxlan.4; driver is
 	analogous to &man.vlan.4;, but is designed to be better suited
 	for large, multiple-tenant datacenter environments.</para>
 
       <para revision="274246" contrib="sponsor" sponsor="&yandex;">The
 	&man.gre.4; driver has been significantly overhauled, and has
 	been split into two separate modules, &man.gre.4; and
 	&man.me.4;.</para>
 
       <para revision="278551">The &man.ral.4; driver has been updated
 	to support the RT5390 and RT5392 chipsets.</para>
 
       <para revision="283514" contrib="sponsor"
 	sponsor="&solarflare;">The &man.sfxge.4; driver has been
 	updated to support Solarflare Flareon Ultra 7000-series
 	chipsets.</para>
 
       <para revision="283766" contrib="sponsor"
 	sponsor="&limelight;">The &man.em.4; driver has been updated
 	with improved transmission queue hang detection.</para>
 
       <para revision="284125">The &man.cdce.4; driver has been updated
 	to include support for the RTL8153 chipset.</para>
 
       <para revision="286441">The &man.iwm.4; driver has been imported
 	from OpenBSD, providing support for &intel; 3160/7260/7265
 	wireless chipsets.</para>
 
       <para revision="286829" contrib="sponsor"
 	sponsor="&limelight;">The &man.em.4; driver has been updated
 	to allow disabling <acronym>CRC</acronym> stripping.</para>
 
       <para revision="287222">The &man.pf.4; implementation has been
 	updated to remove support for the <literal>scrub fragment
 	  crop|drop-ovl</literal> filtering rule.  Systems with this
 	rule in &man.pf.conf.5; will implicitly be converted to the
 	<literal>scrub fragment reassemble</literal> filtering rule,
 	without requiring manual intervention.</para>
 
       <para revision="288654">The &man.lagg.4; driver has been updated
 	to remove support for the <literal>fec</literal>
 	protocol.</para>
     </sect2>
   </sect1>
 
   <sect1 xml:id="hardware">
     <title>Hardware Support</title>
 
     <para>This section covers general hardware support for physical
       machines, hypervisors, and virtualization environments, as well
       as hardware changes and updates that do not otherwise fit in
       other sections of this document.</para>
 
     <sect2 xml:id="hardware-support">
       <title>Hardware Support</title>
 
       <para revision="268303">The &man.asmc.4; driver has been
 	updated to support the &apple;&nbsp;MacMini 3,1.</para>
 
       <para revision="268351">Support for &os;/ia64 has been dropped
 	as of &os;&nbsp;11.</para>
 
       <para revision="274386">An issue that could cause a system to
 	hang when entering <acronym>ACPI</acronym>
 	<literal>S3</literal> state (suspend to
 	<acronym>RAM</acronym>) has been corrected in the &man.acpi.4;
 	and &man.pci.4; drivers.</para>
 
       <para revision="274733" arch="powerpc">The power management unit
 	subsystem has been updated to support power button events on
 	certain &arch.powerpc; hardware, such as aluminum
 	PowerBook&nbsp;&reg;.</para>
 
       <para revision="275171,275190" arch="powerpc">The &man.hwpmc.4;
 	driver has been updated to correct performance counter
 	sampling on G4 (MPC74xxx) and G5 class processors.</para>
 
       <para revision="275732" contrib="sponsor"
 	sponsor="&ff;,&netgate;">The
 	<application>OpenCrypto</application> framework has been
 	updated to include <literal>AES-ICM</literal> and
 	<literal>AES-GCM</literal> modes, both of which have also been
 	added to the &man.aesni.4; driver.</para>
 
       <para revision="281713" arch="powerpc">The &man.hwpmc.4;
 	driver has been updated to support the Freescale e500
 	core.</para>
 
       <para revision="283766">The &man.ig4.4; driver has been added,
 	providing support for the fourth generation &intel;
 	<acronym>I2C</acronym> SMBus.</para>
 
       <para>The &man.uart.4; driver has been updated to support
 	<acronym>AMT</acronym> devices on newer systems.</para>
 
       <para revision="285316" contrib="sponsor" sponsor="&ff;"
 	arch="arm64">Initial <acronym>SMP</acronym> support has been
 	added to the &os;/&arch.arm64; port.</para>
     </sect2>
 
     <sect2 xml:id="hardware-virtualization">
       <title>Virtualization Support</title>
 
       <para revision="260410">Support for the <quote>Virtual Interrupt
 	  Delivery</quote> feature of &intel;&nbsp;VT-x is enabled if
 	supported by the CPU.  This feature can be disabled by running
 	<literal>sysctl hw.vmm.vmx.use_apic_vid=0</literal>.
 	Additionally, to persist this setting across reboots, add
 	<literal>hw.vmm.vmx.use_apic_vid=0</literal> to
 	<filename>/etc/sysctl.conf</filename>.</para>
 
       <para revision="260532">Support for <quote>Posted Interrupt
 	  Processing</quote> is enabled if supported by the CPU.  This
 	feature can be disabled by running <literal>sysctl
 	  hw.vmm.vmx.use_apic_pir=0</literal>.  Additionally, to
 	persist this setting across reboots, add
 	<literal>hw.vmm.vmx.use_apic_pir=0</literal> to
 	<filename>/etc/sysctl.conf</filename>.</para>
 
       <para revision="260582">Unmapped IO support has been added to
 	&man.virtio_blk.4;.</para>
 
       <para revision="260583">Unmapped IO support has been added to
 	&man.virtio_scsi.4;.</para>
 
       <para revision="260847">The &man.virtio_random.4; driver has
 	been added to harvest entropy from the host system.</para>
 
       <para revision="261504">&os;/&arch.i386; guests can be run under
 	&man.bhyve.8;.</para>
 
       <para revision="267536" contrib="sponsor"
 	sponsor="&citrix.rd;">Support for running a &os;/&arch.amd64;
 	<application>Xen</application> guest instance as
 	<acronym>PVH</acronym> guest has been added.
 	<acronym>PVH</acronym> mode, short for <quote>Para-Virtualized
 	  Hardware</quote>, uses para-virtualized drivers for boot and
 	I/O, and uses hardware virtualization extensions for all other
 	tasks, without the need for emulation.</para>
 
       <para revision="273375">The &man.bhyve.8; hypervisor has been
 	updated to support &amd; processors with
 	<acronym>SVM</acronym> and <acronym>AMD-V</acronym> hardware
 	extensions.</para>
 
       <para revision="273515">The &man.virtio.console.4; driver has
 	been added, which provides an interface to VirtIO console
 	devices through a &man.tty.4; device.</para>
 
       <para revision="279957">The &man.bhyve.8; hypervisor has been
 	updated to support <literal>DSM TRIM</literal> commands for
 	virtual <acronym>AHCI</acronym> disks.</para>
 
       <para revision="281439" arch="arm">Support for the
 	<application>QEMU</application> <literal>virt</literal> system
 	has been added.</para>
 
       <para revision="282212" contrib="sponsor" sponsor="&msostc;">The
 	Hyper-V&trade; drivers have been updated with several
 	enhancements:</para>
 
       <itemizedlist>
 	<listitem>
 	  <para>The &man.hv.vmbus.4; driver now has multi-channel
 	    support.</para>
 	</listitem>
 
 	<listitem>
 	  <para>The &man.hv.storvsc.4; driver now has scatter/gather
 	    support, in addition to performance improvements.</para>
 	</listitem>
 
 	<listitem>
 	  <para>The &man.hv.kvp.4; driver has received several bug
 	    fixes.</para>
 	</listitem>
       </itemizedlist>
 
       <para revision="282274">Support for &man.xen.4; para-virtualized
 	<literal>domU</literal> kernels has been removed.</para>
 
       <para revision="284746" contrib="sponsor" sponsor="&msostc;">The
 	&man.hv.netvsc.4; driver has been updated to support checksum
 	offloading and <acronym>TSO</acronym>.</para>
 
       <para revision="286062">The &man.xen.4; driver has been updated
 	to include support for <literal>blkif</literal> indirect
 	segment I/O.</para>
     </sect2>
 
     <sect2 xml:id="hardware-arm">
       <title>ARM Support</title>
 
       <para revision="260921">The &man.nand.4; device is enabled for
 	ARM devices by default.</para>
 
       <para revision="266943" arch="arm">Support for the Exynos 5420
 	Octa system has been added.</para>
 
       <para revision="267390" arch="arm">The <acronym>SMP</acronym>
 	option has been enabled for all Exynos 5 systems supported by
 	&os;.</para>
 
       <para revision="268838" arch="arm">Support for the Toradex
 	Apalis i.MX6 development board has been added.</para>
 
       <para revision="273264" arch="armv6">An issue that could cause
 	instability when detecting <acronym>SD</acronym> cards on the
 	Raspberry Pi <acronym>SOC</acronym> has been fixed.</para>
 
       <para revision="275963">The <literal>bcm2835_cpufreq</literal>
 	driver has been added, which supports <acronym>CPU</acronym>
 	frequency and voltage control on the Raspberry Pi
 	<acronym>SOC</acronym>.</para>
 
       <para revision="277042" arch="arm">Support to turn off the
 	BeagleBone Black system with the &man.shutdown.8;
 	<literal>-p</literal> flag or by invoking &man.poweroff.8; has
 	been added.</para>
 
       <para revision="277644" arch="arm">Audio transmission drivers
 	have been added for Digital Audio Multiplexer
 	(<acronym>AUDMUX</acronym>), Smart Direct Memory Access
 	Controller (<acronym>SDMA</acronym>), and Synchronous Serial
 	Interface (<acronym>SSI</acronym>).</para>
 
       <para revision="280259" contrib="sponsor" sponsor="&ff;">Initial
 	support for the ARM AArch64 architecture has been
 	added.</para>
 
       <para revision="282779" arch="arm">Kernel support for Thumb-2
 	userland has been added.</para>
 
       <para revision="282827">Support for the hardware power button
 	on the BeagleBone Black system has been added.</para>
 
       <para revision="284273" contrib="sponsor"
 	sponsor="&ff;">Initial
 	<acronym>ACPI</acronym> support has been added for
 	&os;/&arch.arm64;.</para>
 
       <para revision="287225">Support for 1-Wire devices has been
 	added, providing support for 1-Wire hardware through
 	&man.gpio.4;.  See &man.ow.4;, &man.owc.4;, and
 	&man.ow.temp.4; for more information.</para>
 
       <para revision="287371" arch="arm64" contrib="sponsor"
 	sponsor="&abt;">Support for the HiSilicon HI6220 SoC has been
 	added.</para>
     </sect2>
   </sect1>
 
   <sect1 xml:id="storage">
     <title>Storage</title>
 
     <para>This section covers changes and additions to file systems
       and other storage subsystems, both local and networked.</para>
 
     <sect2 xml:id="storage-general">
       <title>General Storage</title>
 
       <para revision="278037" contrib="sponsor" sponsor="&ix;">The
 	&man.ctl.4; <acronym>LUN</acronym> mapping has been rewritten,
 	replacing <acronym>iSCSI</acronym>-specific mapping mechanisms
 	with a new mechanism that works for any port.</para>
 
       <para revision="278354" contrib="sponsor" sponsor="&ix;">The
 	&man.ctld.8; utility has been updated to allow controlling
 	non-<acronym>iSCSI</acronym> &man.ctl.4; ports.</para>
 
       <para revision="275681" contrib="sponsor" sponsor="&ff;">The
 	&man.autofs.5; subsystem has been updated to include a new
 	&man.auto.master.5; map, <literal>-media</literal>, which
 	allows automatically mounting removable media, such as
 	<acronym>CD</acronym> drives or <acronym>USB</acronym> flash
 	drives.</para>
 
       <para revision="279955" contrib="sponsor" sponsor="&ff;">The
 	&man.autofs.5; subsystem has been updated to include a new
 	&man.auto.master.5; map, <literal>-noauto</literal>, which
 	handles &man.fstab.5; entries set to
 	<literal>noauto</literal>.</para>
 
       <para revision="286444">The <acronym>GELI</acronym> class has
 	been updated to support the <literal>BIO_DELETE</literal>
 	&man.g.bio.9; <literal>bio_cmd</literal> field, providing
 	<acronym>TRIM</acronym>/<acronym>UNMAP</acronym> support on
 	<acronym>GELI</acronym>-backed <acronym>SSD</acronym> storage
 	providers.</para>
     </sect2>
 
     <sect2 xml:id="storage-net">
       <title>Networked Storage</title>
 
       <para revision="270096" contrib="sponsor" sponsor="&ff;">The new
 	filesystem automount facility, &man.autofs.5;, has been added.
 	The new &man.autofs.5; facility is similar to that found in
 	other &unix;-like operating systems, such as OS&nbsp;X&trade;
 	and Solaris&trade;.  The &man.autofs.5; facility uses
 	a &sun;-compatible &man.auto.master.5; configuration file, and
 	is administered with the &man.automount.8; userland utility,
 	and the &man.automountd.8; and &man.autounmountd.8;
 	daemons.</para>
 
       <para revision="273849" contrib="sponsor" sponsor="&ff;">Support
 	for the <literal>timeo</literal>, <literal>actimeo</literal>,
 	<literal>noac</literal>, and <literal>proto</literal> options
 	has been added to &man.mount.nfs.8;.</para>
     </sect2>
 
     <sect2 xml:id="storage-zfs">
       <title>ZFS</title>
 
       <para revision="275748">The <literal>arc_meta_limit</literal>
 	statistics are now visible through the
 	<literal>kstat</literal> &man.sysctl.8;.  As a result of this
 	change, the <literal>vfs.zfs.arc_meta_used</literal>
 	&man.sysctl.8; has been removed, and replaced with the
 	<literal>kstat.zfs.misc.arcstats.arc_meta_used</literal>
 	&man.sysctl.8;.</para>
 
       <para revision="287099" contrib="sponsor"
 	sponsor="&clusterhq;">The &man.zfs.8; <literal>l2arc</literal>
 	code has been updated to take <literal>ashift</literal> into
 	account when gathering buffers to be written to the
 	<literal>l2arc</literal> device.</para>
 
       <para revision="300906" contrib="sponsor"
 	sponsor="&ix;, &spectralogic;">The zfsd daemon has been added,
 	which manages hotspares and replacements in drive slots that publish
 	physical paths.</para>
     </sect2>
 
     <sect2 xml:id="storage-geom">
       <title>&man.geom.4;</title>
 
       <para revision="267359">Support for the
 	<literal>disklabel64</literal> partitioning scheme has been
 	added to &man.gpart.8;.</para>
 
       <para revision="282465">Support for the
 	<literal>apple-boot</literal>, <literal>apple-hfs</literal>,
 	and <literal>apple-ufs</literal> <acronym>MBR</acronym>
 	partitioning schemes has been added to &man.gpart.8;.</para>
 
       <para revision="285594" contrib="sponsor"
 	sponsor="&scaleengine;">The &man.gpart.8; utility has been
 	updated to include a new attribute for <acronym>GPT</acronym>
 	partitions, <literal>lenovofix</literal>, which, when set,
 	works around <acronym>BIOS</acronym> compatibility
 	issues reported on several Lenovo&trade; laptops.</para>
     </sect2>
   </sect1>
 
   <sect1 xml:id="boot">
     <title>Boot Loader Changes</title>
 
     <para>This section covers the boot loader, boot menu, and other
       boot-related changes.</para>
 
     <sect2 xml:id="boot-loader">
       <title>Boot Loader Changes</title>
 
       <para revision="258431" contrib="sponsor" sponsor="&ff;">The
 	memory test run at boot time on &os;/&arch.amd64; platforms
 	has been disabled by default.</para>
 
       <para revision="262955">A new &man.ttys.5; class,
 	<literal>3wire</literal>, has been added.  This is similar to
 	the existing terminal classes, but does not have a defined
 	baudrate.</para>
 
       <para revision="274085">The &man.vt.4; driver has been made the
 	default system console driver.  The &man.syscons.4; driver is
 	still available, and can be enabled by adding
 	<literal>kern.vty=sc</literal> in &man.loader.conf.5;.
 	Alternatively, &man.syscons.4; can be enabled at boot time by
 	entering <literal>set kern.vty=sc</literal> at the
 	&man.loader.8; prompt.</para>
 
       <para revision="279950">Support for <literal>bzipfs</literal>
 	has been added to the <acronym>EFI</acronym> loader.</para>
 
       <para revision="281616">The boot loader has been updated to
 	support entering the <acronym>GELI</acronym> passphrase before
 	loading the kernel.  To enable this behavior, add
 	<literal>geom_eli_passphrase_prompt="YES"</literal> to
 	&man.loader.conf.5;.</para>
 
       <para revision="284683" contrib="sponsor" sponsor="&ff;"
 	arch="arm">The &man.ttys.5; file for &os;/&arch.arm; has been
 	updated to enable <filename>ttyu1</filename>,
 	<filename>ttyu2</filename>, and <filename>ttyu3</filename> by
 	default, if the callin port is an active console port.</para>
     </sect2>
 
     <sect2 xml:id="boot-menu">
       <title>Boot Menu Changes</title>
 
       <para>&nbsp;</para>
     </sect2>
   </sect1>
 
   <sect1 xml:id="network">
     <title>Networking</title>
 
     <para>This section describes changes that affect networking in
       &os;.</para>
 
     <sect2 xml:id="network-protocols">
       <title>Network Protocols</title>
 
       <para revision="263140">Support for the IPX network transport
 	protocol has been removed, and will not be supported in
 	&os;&nbsp;11 and later releases.</para>
 
       <para revision="272720" contrib="sponsor"
 	sponsor="&limelight;">Support for <acronym>PLPMTUD</acronym>
 	blackhole detection (<acronym>RFC</acronym> 4821) has been
 	added to the &man.tcp.4; stack, disabled by default.  New
 	control tunables have been added:</para>
 
       <informaltable frame="none" pgwide="0">
 	<tgroup cols="2">
 	  <colspec colwidth="1*"/>
 	  <colspec colwidth="1*"/>
 	  <thead>
 	    <row>
 	      <entry>Tunable</entry>
 	      <entry>Description</entry>
 	    </row>
 	  </thead>
 
 	  <tbody>
 	    <row>
 	      <entry><literal>net.inet.tcp.pmtud_blackhole_detection</literal></entry>
 	      <entry>Enables or disables <acronym>PLPMTUD</acronym>
 		blackhole detection</entry>
 	    </row>
 
 	    <row>
 	      <entry><literal>net.inet.tcp.pmtud_blackhole_mss</literal></entry>
 	      <entry><acronym>MSS</acronym> to try for IPv4</entry>
 	    </row>
 
 	    <row>
 	      <entry><literal>net.inet.tcp.v6pmtud_blackhole_mss</literal></entry>
 	      <entry><acronym>MSS</acronym> to try for IPv6</entry>
 	    </row>
 	  </tbody>
 	</tgroup>
       </informaltable>
 
       <para>New monitoring &man.sysctl.8;s have been added:</para>
 
       <informaltable frame="none" pgwide="0">
 	<tgroup cols="2">
 	  <colspec colwidth="1*"/>
 	  <colspec colwidth="1*"/>
 	  <thead>
 	    <row>
 	      <entry>Tunable</entry>
 	      <entry>Description</entry>
 	    </row>
 	  </thead>
 
 	  <tbody>
 	    <row>
 	      <entry><literal>net.inet.tcp.pmtud_blackhole_activated</literal></entry>
 	      <entry>Number of times the code was activated to attempt
 		downshifting the <acronym>MSS</acronym></entry>
 	    </row>
 
 	    <row>
 	      <entry><literal>net.inet.tcp.pmtud_blackhole_min_activated</literal></entry>
 	      <entry>Number of times the blackhole
 		<acronym>MSS</acronym> was used in an attempt to
 		downshift</entry>
 	    </row>
 
 	    <row>
 	      <entry><literal>net.inet.tcp.pmtud_blackhole_failed</literal></entry>
 	      <entry>Number of times that the blackhole failed to
 		connect after downshifting the
 		<acronym>MSS</acronym></entry>
 	    </row>
 	  </tbody>
 	</tgroup>
       </informaltable>
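 
       <para>As an illustrative sketch only (the value shown is an
 	assumption, not a recommendation), blackhole detection could
 	be enabled at boot time by adding the following line to
 	<filename>/etc/sysctl.conf</filename>:</para>
 
       <programlisting>net.inet.tcp.pmtud_blackhole_detection=1</programlisting>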
 
       <para revision="280971" contrib="sponsor"
 	sponsor="&netflix;, &nginx;">Support for <acronym>IP</acronym>
 	identification for atomic datagrams (<acronym>RFC</acronym>
 	6864) has been added.  Support for this feature can be toggled
 	with the <literal>net.inet.ip.rfc6864</literal>
 	&man.sysctl.8;, which is enabled by default.</para>
 
       <para revision="285336" contrib="sponsor"
 	sponsor="&netgate;"><acronym>IPSEC</acronym> has been
 	updated to include support for <acronym>AES</acronym> modes on
 	both software-only and hardware-backed (&man.aesni.4;)
 	systems.</para>
 
       <para revision="287798" contrib="sponsor" sponsor="&dell;">The
 	network stack has been updated to fix handling of
 	<acronym>IPv6</acronym> On-Link redirects.</para>
 
       <para revision="300240">The
 	<literal>net.inet.tcp.ecn.enable</literal> &man.sysctl.8; has
 	been changed from a binary off/on control to a three-way
 	setting:</para>
 
       <informaltable frame="none" pgwide="0">
 	<tgroup cols="2">
 	  <colspec colwidth="1*"/>
 	  <colspec colwidth="1*"/>
 	  <thead>
 	    <row>
 	      <entry>Value</entry>
 	      <entry>Description</entry>
 	    </row>
 	  </thead>
 
 	  <tbody>
 	    <row>
 	      <entry><literal>0</literal></entry>
 	      <entry>Totally disable ECN.</entry>
 	    </row>
 
 	    <row>
 	      <entry><literal>1</literal></entry>
 	      <entry>Enable ECN if incoming connections request it. Outgoing
 	      connections will request ECN.</entry>
 	    </row>
 
 	    <row>
 	      <entry><literal>2</literal></entry>
 	      <entry>Enable ECN if incoming connections request it. Outgoing
 	      connections will not request ECN.</entry>
 	    </row>
 
 	  </tbody>
 	</tgroup>
       </informaltable>
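 
       <para>As an illustrative sketch only (the chosen value is an
 	assumption, not a recommendation), a system that should accept
 	<acronym>ECN</acronym> on incoming connections without
 	requesting it on outgoing ones could add the following line to
 	<filename>/etc/sysctl.conf</filename>:</para>
 
       <programlisting>net.inet.tcp.ecn.enable=2</programlisting>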
 
+      <para revision="300779">Dummynet AQM, an independent implementation of
+      CoDel and FQ-CoDel for ipfw/dummynet, has been imported into the base
+      system.</para>
+
     </sect2>
   </sect1>
 
   <sect1 xml:id="ports">
     <title>Ports Collection and Package Infrastructure</title>
 
     <para>This section covers changes to the &os;&nbsp;Ports
       Collection, package infrastructure, and package maintenance and
       installation tools.</para>
 
     <sect2 xml:id="ports-infrastructure">
       <title>Infrastructure Changes</title>
 
       <para>&nbsp;</para>
     </sect2>
 
     <sect2 xml:id="ports-packages ">
       <title>Packaging Changes</title>
 
       <para>&nbsp;</para>
     </sect2>
   </sect1>
 
   <sect1 xml:id="doc">
     <title>Documentation</title>
 
     <para>This section covers changes to the &os;&nbsp;Documentation
       Project sources and toolchain.</para>
 
     <sect2 xml:id="doc-sources">
       <title>Documentation Source Changes</title>
 
       <para>&nbsp;</para>
     </sect2>
 
     <sect2 xml:id="doc-toolchain">
       <title>Documentation Toolchain Changes</title>
 
       <para>&nbsp;</para>
     </sect2>
   </sect1>
 
   <sect1 xml:id="releng">
     <title>Release Engineering and Integration</title>
 
     <para>This section covers changes that are specific to the
       &os;&nbsp;Release Engineering processes.</para>
 
     <sect2 xml:id="releng-changes">
       <title>Integration Changes</title>
 
       <para revision="277458" contrib="sponsor" sponsor="&ff;">The
 	Release Engineering build tools have been updated to include
 	support for producing virtual machine disk images for various
 	cloud hosting providers.</para>
 
       <para revision="278926">The Release Engineering build tools have
 	been updated to use multi-threaded &man.xz.1;.  By default,
 	the number of &man.xz.1; threads is set to the number of cores
 	available.</para>
 
       <para revision="281802" contrib="sponsor" sponsor="&ff;">The
 	Release Engineering build tools have been updated to include
 	support for building &os;/&arch.arm64; virtual machine and
 	memory stick installation images.</para>
 
       <para revision="282693" contrib="sponsor" sponsor="&ff;">The
 	Release Engineering build tools have been updated to support
 	building &os;/&arch.arm; images without external utilities for
 	supported boards where a corresponding
 	<literal>u-boot</literal> port exists in the Ports
 	Collection.</para>
 
       <para revision="283307" contrib="sponsor" sponsor="&ff;">The
 	&os;/&arch.i386; memory stick installation images are now
 	created using the &man.mkimg.1; utility, matching the way
 	the &os;/&arch.amd64; images are created.</para>
     </sect2>
   </sect1>
 </article>
Index: projects/vnet/share/man/man5/rc.conf.5
===================================================================
--- projects/vnet/share/man/man5/rc.conf.5	(revision 301546)
+++ projects/vnet/share/man/man5/rc.conf.5	(revision 301547)
@@ -1,4663 +1,4679 @@
 .\" Copyright (c) 1995
 .\"	Jordan K. Hubbard
 .\"
 .\" Redistribution and use in source and binary forms, with or without
 .\" modification, are permitted provided that the following conditions
 .\" are met:
 .\" 1. Redistributions of source code must retain the above copyright
 .\"    notice, this list of conditions and the following disclaimer.
 .\" 2. Redistributions in binary form must reproduce the above copyright
 .\"    notice, this list of conditions and the following disclaimer in the
 .\"    documentation and/or other materials provided with the distribution.
 .\"
 .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND
 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR BE LIABLE
 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 .\" SUCH DAMAGE.
 .\"
 .\" $FreeBSD$
 .\"
-.Dd April 30, 2016
+.Dd June 8, 2016
 .Dt RC.CONF 5
 .Os
 .Sh NAME
 .Nm rc.conf
 .Nd system configuration information
 .Sh DESCRIPTION
 The file
 .Nm
 contains descriptive information about the local host name, configuration
 details for any potential network interfaces and which services should be
 started up at system initial boot time.
 In new installations, the
 .Nm
 file is generally initialized by the system installation utility.
 .Pp
 The purpose of
 .Nm
 is not to run commands or perform system startup actions
 directly.
 Instead, it is included by the
 various generic startup scripts in
 .Pa /etc
 which conditionalize their
 internal actions according to the settings found there.
 .Pp
 The
 .Pa /etc/rc.conf
 file is included from the file
 .Pa /etc/defaults/rc.conf ,
 which specifies the default settings for all the available options.
 Options need only be specified in
 .Pa /etc/rc.conf
 when the system administrator wishes to override these defaults.
 The file
 .Pa /etc/rc.conf.local
 is used to override settings in
 .Pa /etc/rc.conf
 for historical reasons.
 .Pp
 In addition to
 .Pa /etc/rc.conf.local
 you can also place smaller configuration files for each
 .Xr rc 8
 script in the
 .Pa /etc/rc.conf.d
 directory or
 .Ao Ar dir Ac Ns Pa /rc.conf.d
 directories specified in
 .Va local_startup ,
 which will be included by the
 .Va load_rc_config
 function.
 For jail configurations you could use the file
 .Pa /etc/rc.conf.d/jail
 to store jail specific configuration options.
 If
 .Va local_startup
 contains
 .Pa /usr/local/etc/rc.d
 and
 .Pa /opt/conf ,
 .Pa /usr/local/rc.conf.d/jail
 and
 .Pa /opt/conf/rc.conf.d/jail
 will be loaded.
 If
 .Ao Ar dir Ac Ns Pa /rc.conf.d/ Ns Ao Ar name Ac
 is a directory,
 all of the files in the directory will be loaded.
 Also see the
 .Va rc_conf_files
 variable below.
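 .Pp
 As an illustrative sketch only,
 a
 .Pa /etc/rc.conf.d/jail
 file might hold nothing more than the settings read by the jail script;
 the value shown is an assumption, not a recommendation:
 .Bd -literal
 jail_enable="YES"
 .Ed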
 .Pp
 Options are set with
 .Dq Ar name Ns Li = Ns Ar value
 assignments that use
 .Xr sh 1
 syntax.
 The following list provides a name and short description for each
 variable that can be set in the
 .Nm
 file:
 .Bl -tag -width indent-two
 .It Va rc_debug
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 enable output of debug messages from rc scripts.
 This variable can be helpful in diagnosing mistakes when
 editing or integrating new scripts.
 Beware that this produces copious output to the terminal and
 .Xr syslog 3 .
 .It Va rc_info
 .Pq Vt bool
 If set to
 .Dq Li NO ,
 disable informational messages from the rc scripts.
 Informational messages are displayed when
 a condition that is not serious enough to warrant a warning or
 an error occurs.
 .It Va rc_startmsgs
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 show
 .Dq Starting foo:
 when faststart is used (e.g., at boot time).
 .It Va early_late_divider
 .Pq Vt str
 The name of the script that should be used as the
 delimiter between the
 .Dq early
 and
 .Dq late
 stages of the boot process.
 The early stage should contain all the services needed to
 get the disks (local or remote) mounted so that the late
 stage can include scripts contained in the directories
 listed in the
 .Va local_startup
 variable (see below).
 Thus, the two likely candidates for this value are
 .Pa mountcritlocal
 for the typical system, and
 .Pa mountcritremote
 if the system needs remote file
 systems mounted to get access to the
 .Va local_startup
 directories; for example when
 .Pa /usr/local
 is NFS mounted.
 For
 .Pa rc.conf
 within a
 .Xr jail 8
 .Pa NETWORKING
 is likely to be an appropriate value.
 Extreme care should be taken when changing this value,
 and before changing it one should ensure that there are
 adequate provisions to recover from a failed boot
 (such as physical contact with the machine,
 or reliable remote console access).
 .It Va always_force_depends
 .Pq Vt bool
 Various
 .Pa rc.d
 scripts use the force_depend function to check whether required
 services are already running, and to start them if necessary.
 By default during boot time this check is bypassed if the
 required service is enabled in
 .Pa /etc/rc.conf[.local] .
 Setting this option will bypass that check at boot time and
 always test whether or not the service is actually running.
 Enabling this option is likely to increase your boot time if
 services are enabled that utilize the force_depend check.
 .It Ao Ar name Ac Ns Va _chroot
 .Pq Vt str
 .Xr chroot 8
 to this directory before running the service.
 .It Ao Ar name Ac Ns Va _user
 .Pq Vt str
 Run the service under this user account.
 .It Ao Ar name Ac Ns Va _group
 .Pq Vt str
 Run the chrooted service under this system group.
 Unlike the
 .Ao Ar name Ac Ns Va _user
 setting, this setting has no effect if the service is not chrooted.
 .It Ao Ar name Ac Ns Va _fib
 .Pq Vt int
 The
 .Xr setfib 1
 value to run the service under.
 .It Ao Ar name Ac Ns Va _nice
 .Pq Vt int
 The
 .Xr nice 1
 value to run the service under.
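 .Pp
 As an illustrative sketch only,
 a hypothetical service named
 .Dq foo
 could be run under an unprivileged account at reduced priority with
 settings such as the following;
 the service name, account, and value are assumptions:
 .Bd -literal
 foo_user="www"
 foo_nice="10"
 .Ed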
 .It Va apm_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 enable support for Advanced Power Management with
 the
 .Xr apm 8
 command.
 .It Va apmd_enable
 .Pq Vt bool
 Run
 .Xr apmd 8
 to handle APM events from userland.
 This also enables support for APM.
 .It Va apmd_flags
 .Pq Vt str
 If
 .Va apmd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr apmd 8
 daemon.
 .It Va devd_enable
 .Pq Vt bool
 Run
 .Xr devd 8
 to handle device added, removed or unknown events from the kernel.
 .It Va ddb_enable
 .Pq Vt bool
 Run
 .Xr ddb 8
 to install
 .Xr ddb 4
 scripts at boot time.
 .It Va ddb_config
 .Pq Vt str
 Configuration file for
 .Xr ddb 8 .
 Default
 .Pa /etc/ddb.conf .
 .It Va kld_list
 .Pq Vt str
 A list of kernel modules to load right after the local
 disks are mounted.
 Loading modules at this point in the boot process is
 much faster than doing it via
 .Pa /boot/loader.conf
 for those modules not necessary for mounting the local disks.
 .It Va kldxref_enable
 .Pq Vt bool
 Set to
 .Dq Li NO
 by default.
 Set to
 .Dq Li YES
 to automatically rebuild
 .Pa linker.hints
 files with
 .Xr kldxref 8
 at boot time.
 .It Va kldxref_clobber
 .Pq Vt bool
 Set to
 .Dq Li NO
 by default.
 If
 .Va kldxref_enable
 is true,
 setting to
 .Dq Li YES
 will overwrite existing
 .Pa linker.hints
 files at boot time.
 Otherwise,
 only missing
 .Pa linker.hints
 files are generated.
 .It Va kldxref_module_path
 .Pq Vt str
 Empty by default.
 A semi-colon
 .Pq Ql \&;
 delimited list of paths containing
 .Xr kld 4
 modules.
 If empty,
 the contents of the
 .Va kern.module_path
 .Xr sysctl 8
 are used.
 .It Va powerd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 enable the system power control facility with the
 .Xr powerd 8
 daemon.
 .It Va powerd_flags
 .Pq Vt str
 If
 .Va powerd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr powerd 8
 daemon.
 .It Va tmpmfs
 Controls the creation of a
 .Pa /tmp
 memory file system.
 Always happens if set to
 .Dq Li YES
 and never happens if set to
 .Dq Li NO .
 If set to anything else, a memory file system is created if
 .Pa /tmp
 is not writable.
 .It Va tmpsize
 Controls the size of a created
 .Pa /tmp
 memory file system.
 .It Va tmpmfs_flags
 Extra options passed to the
 .Xr mdmfs 8
 utility when the memory file system for
 .Pa /tmp
 is created.
 The default is
 .Dq Li "-S" ,
 which inhibits the use of softupdates on
 .Pa /tmp
 so that file system space is freed without delay
 after file truncation or deletion.
 See
 .Xr mdmfs 8
 for other options you can use in
 .Va tmpmfs_flags .
 .It Va varmfs
 Controls the creation of a
 .Pa /var
 memory file system.
 Always happens if set to
 .Dq Li YES
 and never happens if set to
 .Dq Li NO .
 If set to anything else, a memory file system is created if
 .Pa /var
 is not writable.
 .It Va varsize
 Controls the size of a created
 .Pa /var
 memory file system.
 .It Va varmfs_flags
 Extra options passed to the
 .Xr mdmfs 8
 utility when the memory file system for
 .Pa /var
 is created.
 The default is
 .Dq Li "-S" ,
 which inhibits the use of softupdates on
 .Pa /var
 so that file system space is freed without delay
 after file truncation or deletion.
 See
 .Xr mdmfs 8
 for other options you can use in
 .Va varmfs_flags .
 .It Va populate_var
 Controls the automatic population of the
 .Pa /var
 file system.
 Always happens if set to
 .Dq Li YES
 and never happens if set to
 .Dq Li NO .
 If set to anything else, a memory file system is created if
 .Pa /var
 is not writable.
 Note that this process requires access to certain commands in
 .Pa /usr
 before
 .Pa /usr
 is mounted on normal systems.
 .It Va cleanvar_enable
 .Pq Vt bool
 Clean the
 .Pa /var
 directory.
 .It Va local_startup
 .Pq Vt str
 List of directories to search for startup script files.
 .It Va script_name_sep
 .Pq Vt str
 The field separator to use for breaking down the list of startup script files
 into individual filenames.
 The default is a space.
 It is not necessary to change this unless there are startup scripts with names
 containing spaces.
 .It Va hostapd_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to start
 .Xr hostapd 8
 at system boot time.
 .It Va hostname
 .Pq Vt str
 The fully qualified domain name (FQDN) of this host on the network.
 This should almost certainly be set to something meaningful, even if
 there is no network connection.
 If
 .Xr dhclient 8
 is used to set the hostname via DHCP,
 this variable should be set to an empty string.
 If this value remains unset when the system is done booting
 your console login will display the default hostname of
 .Dq Amnesiac .
 .It Va nisdomainname
 .Pq Vt str
 The NIS domain name of this host, or
 .Dq Li NO
 if NIS is not used.
 .It Va dhclient_program
 .Pq Vt str
 Path to the DHCP client program
 .Pa ( /sbin/dhclient ,
 the
 .Ox
 DHCP client,
 is the default).
 .It Va dhclient_flags
 .Pq Vt str
 Additional flags to pass to the DHCP client program.
 For the
 .Ox
 DHCP client, see the
 .Xr dhclient 8
 manpage for a description of the command line options available.
 .It Va dhclient_flags_ Ns Aq Ar iface
 Additional flags to pass to the DHCP client program running on
 .Ar iface
 only.
 When specified, this variable overrides
 .Va dhclient_flags .
 .It Va background_dhclient
 .Pq Vt bool
 Set to
 .Dq Li YES
 to start the DHCP client in background.
 This can cause trouble with applications depending on
 a working network, but it will provide a faster startup
 in many cases.
 .It Va background_dhclient_ Ns Aq Ar iface
 When specified, this variable overrides the
 .Va background_dhclient
 variable for interface
 .Ar iface
 only.
 .It Va synchronous_dhclient
 .Pq Vt bool
 Set to
 .Dq Li YES
 to start
 .Xr dhclient 8
 synchronously at startup.
 This behavior can be overridden on a per-interface basis by replacing
 the
 .Dq Li DHCP
 keyword in the
 .Va ifconfig_ Ns Aq Ar interface
 variable with
 .Dq Li SYNCDHCP
 or
 .Dq Li NOSYNCDHCP .
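 .Pp
 As an illustrative sketch only,
 assuming a hypothetical interface named
 .Li em0 ,
 the following requests synchronous startup on that interface regardless of
 .Va synchronous_dhclient :
 .Bd -literal
 ifconfig_em0="SYNCDHCP"
 .Ed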
 .It Va defaultroute_delay
 .Pq Vt int
 When set to a positive value, wait up to this long after configuring
 DHCP interfaces at startup to give the interfaces time to receive a lease.
 .It Va firewall_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to load firewall rules at startup.
 If the kernel was not built with
 .Cd "options IPFIREWALL" ,
 the
 .Pa ipfw.ko
 kernel module will be loaded.
 See also
 .Va ipfilter_enable .
 .It Va firewall_script
 .Pq Vt str
 This variable specifies the full path to the firewall script to run.
 The default is
 .Pa /etc/rc.firewall .
 .It Va firewall_type
 .Pq Vt str
 Names the firewall type from the selection in
 .Pa /etc/rc.firewall ,
 or the file which contains the local firewall ruleset.
 Valid selections from
 .Pa /etc/rc.firewall
 are:
 .Pp
 .Bl -tag -width ".Li simple" -compact
 .It Li open
 unrestricted IP access
 .It Li closed
 all IP services disabled, except via
 .Dq Li lo0
 .It Li client
 basic protection for a workstation
 .It Li simple
 basic protection for a LAN.
 .El
 .Pp
 If a filename is specified, the full path
 must be given.
 .It Va firewall_quiet
 .Pq Vt bool
 Set to
 .Dq Li YES
 to disable the display of firewall rules on the console during boot.
 .It Va firewall_logging
 .Pq Vt bool
 Set to
 .Dq Li YES
 to enable firewall event logging.
 This is equivalent to the
 .Dv IPFIREWALL_VERBOSE
 kernel option.
 .It Va firewall_logif
 .Pq Vt bool
 Set to
 .Dq Li YES
 to create pseudo interface
 .Li ipfw0
 for logging.
 For more details, see the
 .Xr ipfw 8
 manual page.
 .It Va firewall_flags
 .Pq Vt str
 Flags passed to
 .Xr ipfw 8
 if
 .Va firewall_type
 specifies a filename.
 .It Va firewall_coscripts
 .Pq Vt str
 List of executables and/or rc scripts to run after firewall starts/stops.
 Default is empty.
 .\" ----- firewall_nat_enable setting --------------------------------
 .It Va firewall_nat_enable
 .Pq Vt bool
 The
 .Xr ipfw 8
 equivalent of
 .Va natd_enable .
 Setting this to
 .Dq Li YES
 enables kernel NAT.
 .Va firewall_enable
 must also be set to
 .Dq Li YES .
 .It Va firewall_nat_interface
 .Pq Vt str
 The
 .Xr ipfw 8
 equivalent of
 .Va natd_interface .
 This is the name of the public interface or IP address on which
 kernel NAT should run.
 .It Va firewall_nat_flags
 .Pq Vt str
 Additional configuration parameters for kernel NAT should be placed here.
 .It Va dummynet_enable
 .Pq Vt bool
 Setting this to
 .Dq Li YES
 will automatically load the
 .Xr dummynet 4
 module if
 .Va firewall_enable
 is also set to
 .Dq Li YES .
 .\" -------------------------------------------------------------------
 .It Va natd_program
 .Pq Vt str
 Path to
 .Xr natd 8 .
 .It Va natd_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to enable
 .Xr natd 8 .
 .Va firewall_enable
 must also be set to
 .Dq Li YES ,
 and
 .Xr divert 4
 sockets must be enabled in the kernel.
 If the kernel was not built with
 .Cd "options IPDIVERT" ,
 the
 .Pa ipdivert.ko
 kernel module will be loaded.
 .It Va natd_interface
 .Pq Vt str
 This is the name of the public interface on which
 .Xr natd 8
 should run.
 The interface may be given as an interface name or as an IP address.
 .It Va natd_flags
 .Pq Vt str
 Additional
 .Xr natd 8
 flags should be placed here.
 The
 .Fl n
 or
 .Fl a
 flag is automatically added with the above
 .Va natd_interface
 as an argument.
 .\" ----- ipfilter_enable setting --------------------------------
 .It Va ipfilter_enable
 .Pq Vt bool
 Set to
 .Dq Li NO
 by default.
 Setting this to
 .Dq Li YES
 enables
 .Xr ipf 8
 packet filtering.
 .Pp
 Typical usage will require putting
 .Bd -literal
 ipfilter_enable="YES"
 ipnat_enable="YES"
 ipmon_enable="YES"
 ipfs_enable="YES"
 .Ed
 .Pp
 into
 .Pa /etc/rc.conf
 and editing
 .Pa /etc/ipf.rules
 and
 .Pa /etc/ipnat.rules
 appropriately.
 .Pp
 Note that
 .Va ipfilter_enable
 and
 .Va ipnat_enable
 can be enabled independently.
 .Va ipmon_enable
 and
 .Va ipfs_enable
 both require at least one of
 .Va ipfilter_enable
 and
 .Va ipnat_enable
 to be enabled.
 .Pp
 Having
 .Bd -literal
 options IPFILTER
 options IPFILTER_LOG
 options IPFILTER_DEFAULT_BLOCK
 .Ed
 .Pp
 in the kernel configuration file is a good idea, too.
 .\" ----- ipfilter_program setting ------------------------------
 .It Va ipfilter_program
 .Pq Vt str
 Path to
 .Xr ipf 8
 (default
 .Pa /sbin/ipf ) .
 .\" ----- ipfilter_rules setting --------------------------------
 .It Va ipfilter_rules
 .Pq Vt str
 Set to
 .Pa /etc/ipf.rules
 by default.
 This variable contains the name of the filter rule definition file.
 The file is expected to be readable for the
 .Xr ipf 8
 command to execute.
 .\" ----- ipv6_ipfilter_rules setting ---------------------------
 .It Va ipv6_ipfilter_rules
 .Pq Vt str
 Set to
 .Pa /etc/ipf6.rules
 by default.
 This variable contains the IPv6 filter rule definition file.
 The file is expected to be readable for the
 .Xr ipf 8
 command to execute.
 .\" ----- ipfilter_flags setting --------------------------------
 .It Va ipfilter_flags
 .Pq Vt str
 Empty by default.
 This variable contains flags passed to the
 .Xr ipf 8
 program.
 .\" ----- ipnat_enable setting ----------------------------------
 .It Va ipnat_enable
 .Pq Vt bool
 Set to
 .Dq Li NO
 by default.
 Set it to
 .Dq Li YES
 to enable
 .Xr ipnat 8
 network address translation.
 See
 .Va ipfilter_enable
 for a detailed discussion.
 .\" ----- ipnat_program setting ---------------------------------
 .It Va ipnat_program
 .Pq Vt str
 Path to
 .Xr ipnat 8
 (default
 .Pa /sbin/ipnat ) .
 .\" ----- ipnat_rules setting -----------------------------------
 .It Va ipnat_rules
 .Pq Vt str
 Set to
 .Pa /etc/ipnat.rules
 by default.
 This variable contains the name of the file
 holding the network address translation definition.
 This file is expected to be readable for the
 .Xr ipnat 8
 command to execute.
 .\" ----- ipnat_flags setting -----------------------------------
 .It Va ipnat_flags
 .Pq Vt str
 Empty by default.
 This variable contains flags passed to the
 .Xr ipnat 8
 program.
 .\" ----- ipmon_enable setting ----------------------------------
 .It Va ipmon_enable
 .Pq Vt bool
 Set to
 .Dq Li NO
 by default.
 Set it to
 .Dq Li YES
 to enable
 .Xr ipmon 8
 monitoring (logging
 .Xr ipf 8
 and
 .Xr ipnat 8
 events).
 Setting this variable needs setting
 .Va ipfilter_enable
 or
 .Va ipnat_enable
 too.
 See
 .Va ipfilter_enable
 for a detailed discussion.
 .\" ----- ipmon_program setting ---------------------------------
 .It Va ipmon_program
 .Pq Vt str
 Path to
 .Xr ipmon 8
 (default
 .Pa /sbin/ipmon ) .
 .\" ----- ipmon_flags setting -----------------------------------
 .It Va ipmon_flags
 .Pq Vt str
 Set to
 .Dq Li -Ds
 by default.
 This variable contains flags passed to the
 .Xr ipmon 8
 program.
 Another typical example would be
 .Dq Fl D Pa /var/log/ipflog
 to have
 .Xr ipmon 8
 log directly to a file bypassing
 .Xr syslogd 8 .
 Make sure to adjust
 .Pa /etc/newsyslog.conf
 in such case like this:
 .Bd -literal
 /var/log/ipflog  640  10  100  *  Z  /var/run/ipmon.pid
 .Ed
 .\" ----- ipfs_enable setting -----------------------------------
 .It Va ipfs_enable
 .Pq Vt bool
 Set to
 .Dq Li NO
 by default.
 Set it to
 .Dq Li YES
 to enable
 .Xr ipfs 8
 saving the filter and NAT state tables during shutdown
 and reloading them during startup again.
 Setting this variable needs setting
 .Va ipfilter_enable
 or
 .Va ipnat_enable
 to
 .Dq Li YES
 too.
 See
 .Va ipfilter_enable
 for a detailed discussion.
 Note that if
 .Va kern_securelevel
 is set to 3,
 .Va ipfs_enable
 cannot be used
 because the raised securelevel will prevent
 .Xr ipfs 8
 from saving the state tables at shutdown time.
 .\" ----- ipfs_program setting ----------------------------------
 .It Va ipfs_program
 .Pq Vt str
 Path to
 .Xr ipfs 8
 (default
 .Pa /sbin/ipfs ) .
 .\" ----- ipfs_flags setting ------------------------------------
 .It Va ipfs_flags
 .Pq Vt str
 Empty by default.
 This variable contains flags passed to the
 .Xr ipfs 8
 program.
 .\" ----- end of added ipf hook ---------------------------------
 .It Va pf_enable
 .Pq Vt bool
 Set to
 .Dq Li NO
 by default.
 Setting this to
 .Dq Li YES
 enables
 .Xr pf 4
 packet filtering.
 .Pp
 Typical usage will require putting
 .Pp
 .Dl pf_enable="YES"
 .Pp
 into
 .Pa /etc/rc.conf
 and editing
 .Pa /etc/pf.conf
 appropriately.
 Adding
 .Pp
 .Dl "device pf"
 .Pp
 builds support for
 .Xr pf 4
 into the kernel, otherwise the
 kernel module will be loaded.
 .It Va pf_rules
 .Pq Vt str
 Path to
 .Xr pf 4
 ruleset configuration file
 (default
 .Pa /etc/pf.conf ) .
 .It Va pf_program
 .Pq Vt str
 Path to
 .Xr pfctl 8
 (default
 .Pa /sbin/pfctl ) .
 .It Va pf_flags
 .Pq Vt str
 If
 .Va pf_enable
 is set to
 .Dq Li YES ,
 these flags are passed to the
 .Xr pfctl 8
 program when loading the ruleset.
 .It Va pflog_enable
 .Pq Vt bool
 Set to
 .Dq Li NO
 by default.
 Setting this to
 .Dq Li YES
 enables
 .Xr pflogd 8
 which logs packets from the
 .Xr pf 4
 packet filter.
 .It Va pflog_logfile
 .Pq Vt str
 If
 .Va pflog_enable
 is set to
 .Dq Li YES
 this controls where
 .Xr pflogd 8
 stores the logfile
 (default
 .Pa /var/log/pflog ) .
 Check
 .Pa /etc/newsyslog.conf
 to adjust logfile rotation for this.
 .It Va pflog_program
 .Pq Vt str
 Path to
 .Xr pflogd 8
 (default
 .Pa /sbin/pflogd ) .
 .It Va pflog_flags
 .Pq Vt str
 Empty by default.
 This variable contains additional flags passed to the
 .Xr pflogd 8
 program.
 .It Va pflog_instances
 .Pq Vt str
 If logging to more than one
 .Xr pflog 4
 interface is desired,
 .Va pflog_instances
 is set to the list of
 .Xr pflogd 8
 instances that should be started at system boot time.
 If
 .Va pflog_instances
 is set, for each whitespace-separated
 .Ar element
 in the list,
 .Ao Ar element Ac Ns Va _dev
 and
 .Ao Ar element Ac Ns Va _logfile
 elements are assumed to exist.
 .Ao Ar element Ac Ns Va _dev
 must contain the
 .Xr pflog 4
 interface to be watched by the named
 .Xr pflogd 8
 instance.
 .Ao Ar element Ac Ns Va _logfile
 must contain the name of the logfile that will be used by the
 .Xr pflogd 8
 instance.
 .It Va ftpproxy_enable
 .Pq Vt bool
 Set to
 .Dq Li NO
 by default.
 Setting this to
 .Dq Li YES
 enables
 .Xr ftp-proxy 8
 which supports the
 .Xr pf 4
 packet filter in translating ftp connections.
 .It Va ftpproxy_flags
 .Pq Vt str
 Empty by default.
 This variable contains additional flags passed to the
 .Xr ftp-proxy 8
 program.
 .It Va ftpproxy_instances
 .Pq Vt str
 Empty by default.
 If multiple instances of
 .Xr ftp-proxy 8
 are desired at boot time,
 .Va ftpproxy_instances
 should contain a whitespace-separated list of instance names.
 For each
 .Ar element
 in the list, a variable named
 .Ao Ar element Ac Ns Va _flags
 should be defined, containing the command-line flags to be passed to the
 .Xr ftp-proxy 8
 instance.
 .It Va pfsync_enable
 .Pq Vt bool
 Set to
 .Dq Li NO
 by default.
 Setting this to
 .Dq Li YES
 enables exposing
 .Xr pf 4
 state changes to other hosts over the network by means of
 .Xr pfsync 4 .
 The
 .Va pfsync_syncdev
 variable
 must also be set then.
 .It Va pfsync_syncdev
 .Pq Vt str
 Empty by default.
 This variable specifies the name of the network interface
 .Xr pfsync 4
 should operate through.
 It must be set accordingly if
 .Va pfsync_enable
 is set to
 .Dq Li YES .
 .It Va pfsync_syncpeer
 .Pq Vt str
 Empty by default.
 This variable is optional.
 By default, state change messages are sent out on the synchronisation
 interface using IP multicast packets.
 The protocol is IP protocol 240, PFSYNC, and the multicast group used is
 224.0.0.240.
 When a peer address is specified using the
 .Va pfsync_syncpeer
 option, the peer address is used as a destination for the pfsync
 traffic, and the traffic can then be protected using
 .Xr ipsec 4 .
 See the
 .Xr pfsync 4
 manpage for more details about using
 .Xr ipsec 4
 with
 .Xr pfsync 4
 interfaces.
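 .Pp
 As a minimal sketch, assuming a hypothetical dedicated synchronisation
 interface
 .Li em1
 and a peer at the documentation address 192.0.2.2,
 the pfsync variables might be combined like this:
 .Bd -literal
 pfsync_enable="YES"
 pfsync_syncdev="em1"
 pfsync_syncpeer="192.0.2.2"
 .Ed
 .Pp
 Omitting
 .Va pfsync_syncpeer
 leaves the multicast behavior described above in effect.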
 .It Va pfsync_ifconfig
 .Pq Vt str
 Empty by default.
 This variable can contain additional options to be passed to the
 .Xr ifconfig 8
 command used to set up
 .Xr pfsync 4 .
 .It Va tcp_extensions
 .Pq Vt bool
 Set to
 .Dq Li YES
 by default.
 Setting this to
 .Dq Li NO
 disables certain TCP options as described by
 .Rs
 .%T "RFC 1323"
 .Re
 Setting this to
 .Dq Li NO
 might help remedy connection problems such as random hangs
 or other erratic behavior.
 Some network devices are known
 to be broken with respect to these options.
 .It Va log_in_vain
 .Pq Vt int
 Set to 0 by default.
 The
 .Xr sysctl 8
 variables,
 .Va net.inet.tcp.log_in_vain
 and
 .Va net.inet.udp.log_in_vain ,
 as described in
 .Xr tcp 4
 and
 .Xr udp 4 ,
 are set to the given value.
 .It Va tcp_keepalive
 .Pq Vt bool
 Set to
 .Dq Li YES
 by default.
 Setting to
 .Dq Li NO
 will disable probing idle TCP connections to verify that the
 peer is still up and reachable.
 .It Va tcp_drop_synfin
 .Pq Vt bool
 Set to
 .Dq Li NO
 by default.
 Setting to
 .Dq Li YES
 will cause the kernel to ignore TCP frames that have both
 the SYN and FIN flags set.
 This prevents OS fingerprinting, but may
 break some legitimate applications.
 .It Va icmp_drop_redirect
 .Pq Vt bool
 Set to
 .Dq Li NO
 by default.
 Setting to
 .Dq Li YES
 will cause the kernel to ignore ICMP REDIRECT packets.
 Refer to
 .Xr icmp 4
 for more information.
 .It Va icmp_log_redirect
 .Pq Vt bool
 Set to
 .Dq Li NO
 by default.
 Setting to
 .Dq Li YES
 will cause the kernel to log ICMP REDIRECT packets.
 Note that
 the log messages are not rate-limited, so this option should only be used
 for troubleshooting networks.
 Refer to
 .Xr icmp 4
 for more information.
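 .Pp
 As an illustrative sketch rather than a recommended policy,
 a host that drops SYN+FIN frames and ignores ICMP REDIRECT packets
 without logging them might use:
 .Bd -literal
 tcp_drop_synfin="YES"
 icmp_drop_redirect="YES"
 icmp_log_redirect="NO"
 .Ed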
 .It Va icmp_bmcastecho
 .Pq Vt bool
 Set to
 .Dq Li YES
 to respond to broadcast or multicast ICMP ping packets.
 Refer to
 .Xr icmp 4
 for more information.
 .It Va ip_portrange_first
 .Pq Vt int
 If not set to
 .Dq Li NO ,
 this is the first port in the default portrange.
 Refer to
 .Xr ip 4
 for more information.
 .It Va ip_portrange_last
 .Pq Vt int
 If not set to
 .Dq Li NO ,
 this is the last port in the default portrange.
 Refer to
 .Xr ip 4
 for more information.
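 .Pp
 For instance, assuming the illustrative bounds 49152 and 65535
 are wanted for the default port range:
 .Bd -literal
 ip_portrange_first="49152"
 ip_portrange_last="65535"
 .Ed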
 .It Va network_interfaces
 .Pq Vt str
 Set to the list of network interfaces to configure on this host or
 .Dq Li AUTO
 (the default) for all current interfaces.
 Setting the
 .Va network_interfaces
 variable to anything other than the default is deprecated.
 Interfaces that the administrator wishes to store configuration for,
 but not start at boot should be configured with the
 .Dq Li NOAUTO
 keyword in their
 .Va ifconfig_ Ns Aq Ar interface
 variables as described below.
 .Pp
 An
 .Va ifconfig_ Ns Aq Ar interface
 variable is also assumed to exist for each value of
 .Ar interface .
 When an interface name contains any of the characters
 .Dq Li .-/+
 they are translated to
 .Dq Li _
 before lookup.
 The variable can contain arguments to
 .Xr ifconfig 8 ,
 as well as special case-insensitive keywords described below.
 Such keywords are removed before passing the value to
 .Xr ifconfig 8
 while the order of the other arguments is preserved.
 .Pp
 It is possible to add IP alias entries using
 .Xr ifconfig 8
 syntax with the address family keyword such as
 .Li inet .
 Assuming that the interface in question was
 .Li ed0 ,
 it might look something like this:
 .Bd -literal
 ifconfig_ed0_alias0="inet 127.0.0.253 netmask 0xffffffff"
 ifconfig_ed0_alias1="inet 127.0.0.254 netmask 0xffffffff"
 .Ed
 .Pp
 It is also possible to configure multiple IP addresses in Classless
 Inter-Domain Routing
 .Pq CIDR
 address notation,
 in which each address component can be a range such as
 .Li inet 192.0.2.5-23/24
 or
 .Li inet6 2001:db8:1-f::1/64 .
 This notation accepts only the address and prefix length parts,
 not the other address modifiers.
 Note that the number of addresses generated from a range
 specification is limited to the integer value specified in
 .Va netif_ipexpand_max
 in
 .Xr rc.conf 5
 because a small typo can unexpectedly generate a large number of addresses.
 The default value is
 .Li 2048 .
 It can be increased by adding the following line into
 .Xr rc.conf 5 :
 .Bd -literal
 netif_ipexpand_max="4096"
 .Ed
 .Pp
 In the case of
 .Li 192.0.2.5-23/24 ,
 the address 192.0.2.5 will be configured with the
 netmask /24 and the addresses 192.0.2.6 to 192.0.2.23 with
 the non-conflicting netmask /32 as explained in the
 .Xr ifconfig 8
 alias section.
 Note that this special netmask handling is only for
 .Li inet ,
 not for the other address families such as
 .Li inet6 .
 .Pp
 With the interface in question being
 .Li ed0 ,
 an example could look like:
 .Bd -literal
 ifconfig_ed0_alias2="inet 192.0.2.129/27"
 ifconfig_ed0_alias3="inet 192.0.2.1-5/28"
 .Ed
 .Pp
 and so on.
 .Pp
 Note that the
 .Va ipv4_addrs_ Ns Aq Ar interface
 variable was previously used for IPv4 CIDR address notation.
 It is now deprecated because the functionality was integrated into
 .Va ifconfig_ Ns Ao Ar interface Ac Ns Va _alias Ns Aq Ar n ,
 although
 .Va ipv4_addrs_ Ns Aq Ar interface
 is still supported for backward compatibility.
 .Pp
 For each
 .Va ifconfig_ Ns Ao Ar interface Ac Ns Va _alias Ns Aq Ar n
 entry with an address family keyword,
 its contents are passed to
 .Xr ifconfig 8 .
 Execution stops at the first unsuccessful access, so if
 something like this is present:
 .Bd -literal
 ifconfig_ed0_alias0="inet 127.0.0.251 netmask 0xffffffff"
 ifconfig_ed0_alias1="inet 127.0.0.252 netmask 0xffffffff"
 ifconfig_ed0_alias2="inet 127.0.0.253 netmask 0xffffffff"
 ifconfig_ed0_alias4="inet 127.0.0.254 netmask 0xffffffff"
 .Ed
 .Pp
 Then note that alias4 would
 .Em not
 be added since the search would
 stop with the missing
 .Dq Li alias3
 entry.
 Because this behavior is difficult to manage,
 the
 .Va ifconfig_ Ns Ao Ar interface Ac Ns Va _aliases
 variable is also available.
 It has the same functionality as
 .Va ifconfig_ Ns Ao Ar interface Ac Ns Va _alias Ns Aq Ar n
 and can hold all of the entries in a single variable, as in the following:
 .Bd -literal
 ifconfig_ed0_aliases="\\
 	inet 127.0.0.251 netmask 0xffffffff \\
 	inet 127.0.0.252 netmask 0xffffffff \\
 	inet 127.0.0.253 netmask 0xffffffff \\
 	inet 127.0.0.254 netmask 0xffffffff"
 .Ed
 .Pp
 It also supports CIDR notation.
 .Pp
 If the
 .Pa /etc/start_if. Ns Aq Ar interface
 file is present, it is read and executed by the
 .Xr sh 1
 interpreter
 before configuring the interface as specified in the
 .Va ifconfig_ Ns Aq Ar interface
 and
 .Va ifconfig_ Ns Ao Ar interface Ac Ns Va _alias Ns Aq Ar n
 variables.
 .Pp
 If a
 .Va vlans_ Ns Aq Ar interface
 variable is set,
 a
 .Xr vlan 4
 interface will be created for each item in the list with the
 .Ar vlandev
 argument set to
 .Ar interface .
 If a vlan interface's name is a number,
 then that number is used as the vlan tag and the new vlan interface is
 named
 .Ar interface . Ns Ar tag .
 Otherwise,
 the vlan tag must be specified via a
 .Va vlan
 parameter in the
 .Va create_args_ Ns Aq Ar interface
 variable.
 .Pp
 To create a vlan device named
 .Li em0.101
 on
 .Li em0
 with the vlan tag 101 and the optional IPv4 address 192.0.2.1/24:
 .Bd -literal
 vlans_em0="101"
 ifconfig_em0_101="inet 192.0.2.1/24"
 .Ed
 .Pp
 To create a vlan device named
 .Li myvlan
 on
 .Li em0
 with the vlan tag 102:
 .Bd -literal
 vlans_em0="myvlan"
 create_args_myvlan="vlan 102"
 .Ed
 .Pp
 If a
 .Va wlans_ Ns Aq Ar interface
 variable is set,
 an
 .Xr wlan 4
 interface will be created for each item in the list with the
 .Ar wlandev
 argument set to
 .Ar interface .
 Further wlan cloning arguments may be passed to the
 .Xr ifconfig 8
 .Cm create
 command by setting the
 .Va create_args_ Ns Aq Ar interface
 variable.
 One or more
 .Xr wlan 4
 devices must be created for each wireless device as of
 .Fx 8.0 .
 Debugging flags for
 .Xr wlan 4
 devices as set by
 .Xr wlandebug 8
 may be specified with a
 .Va wlandebug_ Ns Aq Ar interface
 variable.
 The contents of this variable will be passed directly to
 .Xr wlandebug 8 .
 .Pp
 If the
 .Va ifconfig_ Ns Aq Ar interface
 contains the keyword
 .Dq Li NOAUTO
 then the interface will not be configured
 at boot or by
 .Pa /etc/pccard_ether
 when
 .Va network_interfaces
 is set to
 .Dq Li AUTO .
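 .Pp
 For instance, assuming a hypothetical interface
 .Li ed1
 and an illustrative address,
 a configuration that is stored but not applied at boot might look like:
 .Bd -literal
 ifconfig_ed1="inet 192.0.2.10 netmask 0xffffff00 NOAUTO"
 .Ed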
 .Pp
 It is possible to bring up an interface with DHCP by adding
 .Dq Li DHCP
 to the
 .Va ifconfig_ Ns Aq Ar interface
 variable.
 For instance, to initialize the
 .Li ed0
 device via DHCP,
 it is possible to use something like:
 .Bd -literal
 ifconfig_ed0="DHCP"
 .Ed
 .Pp
 If you want to configure your wireless interface with
 .Xr wpa_supplicant 8
 for use with WPA, EAP/LEAP or WEP, you need to add
 .Dq Li WPA
 to the
 .Va ifconfig_ Ns Aq Ar interface
 variable.
 .Pp
 On the other hand, if you want to configure your wireless interface with
 .Xr hostapd 8 ,
 you need to add
 .Dq Li HOSTAP
 to the
 .Va ifconfig_ Ns Aq Ar interface
 variable.
 .Xr hostapd 8
 will use the settings from
 .Pa /etc/hostapd- Ns Ao Ar interface Ac Ns .conf .
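 .Pp
 As a sketch only, assuming a hypothetical
 .Xr ath 4
 device
 .Li ath0
 cloned as
 .Li wlan0
 and an illustrative address,
 an access point configuration might resemble:
 .Bd -literal
 wlans_ath0="wlan0"
 create_args_wlan0="wlanmode hostap"
 ifconfig_wlan0="HOSTAP inet 192.0.2.1 netmask 0xffffff00"
 .Ed
 .Pp
 With these settings,
 .Xr hostapd 8
 would be expected to read
 .Pa /etc/hostapd-wlan0.conf .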
 .Pp
 Finally, you can add
 .Xr ifconfig 8
 options in this variable, in addition to the
 .Pa /etc/start_if. Ns Aq Ar interface
 file.
 For instance, to configure an
 .Xr ath 4
 wireless device in station mode with an address obtained
 via DHCP, using WPA authentication and 802.11b mode, it is
 possible to use something like:
 .Bd -literal
 wlans_ath0="wlan0"
 ifconfig_wlan0="DHCP WPA mode 11b"
 .Ed
 .Pp
 In addition to the
 .Va ifconfig_ Ns Aq Ar interface
 form, a fallback variable
 .Va ifconfig_DEFAULT
 may be configured.
 It will be used for all interfaces with no
 .Va ifconfig_ Ns Aq Ar interface
 variable.
 This is intended to replace the no longer supported
 .Va pccard_ifconfig
 variable.
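 .Pp
 For example, to use DHCP on every interface that has no
 .Va ifconfig_ Ns Aq Ar interface
 variable of its own, something like the following could be used:
 .Bd -literal
 ifconfig_DEFAULT="DHCP"
 .Ed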
 .Pp
 It is also possible to rename an interface by doing:
 .Bd -literal
 ifconfig_ed0_name="net0"
 ifconfig_net0="inet 192.0.2.1 netmask 0xffffff00"
 .Ed
 .It Va ipv6_enable
 .Pq Vt bool
 This variable is deprecated.
 Use
 .Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
 and
 .Va ipv6_activate_all_interfaces
 if necessary.
 .Pp
 If the variable is
 .Dq Li YES ,
 .Dq Li inet6 accept_rtadv
 is added to all of
 .Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
 and the
 .Va ipv6_activate_all_interfaces
 is defined as
 .Dq Li YES .
 .It Va ipv6_prefer
 .Pq Vt bool
 This variable is deprecated.
 Use
 .Va ip6addrctl_policy
 instead.
 .Pp
 If the variable is
 .Dq Li YES ,
 the default address selection policy table set by
 .Xr ip6addrctl 8
 will be IPv6-preferred.
 .Pp
 If the variable is
 .Dq Li NO ,
 the default address selection policy table set by
 .Xr ip6addrctl 8
 will be IPv4-preferred.
 .It Va ipv6_activate_all_interfaces
 .Pq Vt bool
 This controls initial configuration on IPv6-capable
 interfaces with no corresponding
 .Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
 variable.
 Note that it is not always necessary to set this variable to
 .Dq YES
 to use IPv6 functionality on
 .Fx .
 In most cases, just configuring
 .Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
 variables works.
 .Pp
 If the variable is
 .Dq Li NO ,
 all interfaces which do not have a corresponding
 .Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
 variable will be marked as
 .Dq Li IFDISABLED
 at creation.
 This means that all IPv6 functionality on that interface
 is completely disabled to enforce a security policy.
 If the variable is set to
 .Dq YES ,
 the flag will be cleared on all of the interfaces.
 .Pp
 In most cases, just defining an
 .Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
 for an IPv6-capable interface should be sufficient.
 However, if an interface is added dynamically
 .Pq by some tunneling protocols such as PPP, for example ,
 it is often difficult to define the variable in advance.
 In such a case, configuring the
 .Dq Li IFDISABLED
 flag can be disabled by setting this variable to
 .Dq YES .
 .Pp
 For more details of the
 .Dq Li IFDISABLED
 flag and keywords
 .Dq Li inet6 ifdisabled ,
 see
 .Xr ifconfig 8 .
 .Pp
 Default is
 .Dq Li NO .
 .It Va ipv6_privacy
 .Pq Vt bool
 If the variable is
 .Dq Li YES ,
 privacy addresses will be generated for each IPv6
 interface as described in RFC 4941.
 .It Va ipv6_network_interfaces
 .Pq Vt str
 This is the IPv6 equivalent of
 .Va network_interfaces .
 Normally manual configuration of this variable is not needed.
 .It Va ipv6_cpe_wanif
 .Pq Vt str
 If the variable is set to an interface name,
 the
 .Xr ifconfig 8
 options
 .Dq inet6 -no_radr accept_rtadv
 will be added to the specified interface automatically before evaluating
 .Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6 ,
 and two
 .Xr sysctl 8
 variables
 .Va net.inet6.ip6.rfc6204w3
 and
 .Va net.inet6.ip6.no_radr
 will be set to 1.
 .Pp
 This means the specified interface will accept ICMPv6 Router
 Advertisement messages on that link and add the discovered
 routers into the Default Router List.
 While the other interfaces can still accept RA messages if the
 .Dq inet6 accept_rtadv
 option is specified, adding
 routes into the Default Router List will be disabled by the
 .Dq inet6 no_radr
 option by default.
 See
 .Xr ifconfig 8
 for more details.
 .Pp
 Note that ICMPv6 Router Advertisement messages will be
 accepted even when
 .Va net.inet6.ip6.forwarding
 is 1
 .Pq packet forwarding is enabled
 when
 .Va net.inet6.ip6.rfc6204w3
 is set to 1.
 .Pp
 Default is
 .Dq Li NO .
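 .Pp
 As a minimal sketch, assuming a hypothetical upstream-facing interface
 .Li em0
 on a host that also forwards IPv6 packets:
 .Bd -literal
 ipv6_gateway_enable="YES"
 ipv6_cpe_wanif="em0"
 .Ed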
 .It Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
 .Pq Vt str
 IPv6 functionality on an interface should be configured by
 .Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6 ,
 instead of setting ifconfig parameters in
 .Va ifconfig_ Ns Aq Ar interface .
 If this variable is empty, all IPv6 configuration specified for the
 interface by other variables such as
 .Va ipv6_prefix_ Ns Ao Ar interface Ac
 will be ignored.
 .Pp
 Aliases should be set by
 .Va ifconfig_ Ns Ao Ar interface Ac Ns Va _alias Ns Aq Ar n
 with the
 .Dq Li inet6
 keyword.
 For example:
 .Bd -literal
 ifconfig_ed0_ipv6="inet6 2001:db8:1::1 prefixlen 64"
 ifconfig_ed0_alias0="inet6 2001:db8:2::1 prefixlen 64"
 .Ed
 .Pp
 Interfaces that have an
 .Dq Li inet6 accept_rtadv
 keyword in
 .Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
 setting will be automatically configured by SLAAC
 .Pq StateLess Address AutoConfiguration
 described in
 .Rs
 .%T "RFC 4862"
 .Re
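 .Pp
 For example, assuming the interface in question is
 .Li ed0 ,
 SLAAC could be requested with:
 .Bd -literal
 ifconfig_ed0_ipv6="inet6 accept_rtadv"
 .Ed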
 .Pp
 Note that a link-local address will be automatically configured in
 addition to the configured global-scope addresses because the IPv6
 specifications require it on each link.
 The address is calculated from the MAC address by using an algorithm
 defined in
 .Rs
 .%T "RFC 4862"
 .%O "Section 5.3"
 .Re
 .Pp
 If only a link-local address is needed on the interface,
 the following configuration can be used:
 .Bd -literal
 ifconfig_ed0_ipv6="inet6 auto_linklocal"
 .Ed
 .Pp
 A link-local address can also be configured manually.
 This is useful for the default router address of an IPv6 router
 so that it does not change when the network interface
 card is replaced.
 For example:
 .Bd -literal
 ifconfig_ed0_ipv6="inet6 fe80::1 prefixlen 64"
 .Ed
 .It Va ipv6_prefix_ Ns Aq Ar interface
 .Pq Vt str
 If one or more prefixes are defined in
 .Va ipv6_prefix_ Ns Aq Ar interface ,
 addresses based on each prefix and the EUI-64 interface identifier will be
 configured on that interface.
 Note that this variable will be ignored when
 .Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
 is empty.
 .Pp
 For example, the following configuration
 .Bd -literal
 ipv6_prefix_ed0="2001:db8:1:0 2001:db8:2:0"
 .Ed
 .Pp
 is equivalent to the following:
 .Bd -literal
 ifconfig_ed0_alias0="inet6 2001:db8:1:: eui64 prefixlen 64"
 ifconfig_ed0_alias1="inet6 2001:db8:1:: prefixlen 64 anycast"
 ifconfig_ed0_alias2="inet6 2001:db8:2:: eui64 prefixlen 64"
 ifconfig_ed0_alias3="inet6 2001:db8:2:: prefixlen 64 anycast"
 .Ed
 .Pp
 These Subnet-Router anycast addresses will be added only when
 .Va ipv6_gateway_enable
 is set to
 .Dq Li YES .
 .It Va ipv6_default_interface
 .Pq Vt str
 If not set to
 .Dq Li NO ,
 this is the default output interface for scoped addresses.
 This works only when
 .Va ipv6_gateway_enable
 is set to
 .Dq Li NO .
 .It Va ip6addrctl_enable
 .Pq Vt bool
 This variable enables configuration of the default address selection
 policy table
 .Pq RFC 3484 .
 The table can be specified in another variable,
 .Va ip6addrctl_policy .
 For
 .Va ip6addrctl_policy
 the following keywords can be specified:
 .Dq Li ipv4_prefer ,
 .Dq Li ipv6_prefer ,
 or
 .Dq Li AUTO .
 .Pp
 If
 .Dq Li ipv4_prefer
 or
 .Dq Li ipv6_prefer
 is specified,
 .Xr ip6addrctl 8
 installs a pre-defined policy table described in Section 2.1
 .Pq IPv6-preferred
 or 10.3
 .Pq IPv4-preferred
 of RFC 3484.
 .Pp
 If
 .Dq Li AUTO
 is specified, it attempts to read a file
 .Pa /etc/ip6addrctl.conf
 first.
 If this file is found,
 .Xr ip6addrctl 8
 reads and installs it.
 If not found, a policy is automatically set
 according to the
 .Va ipv6_activate_all_interfaces
 variable; if the variable is set to
 .Dq Li YES ,
 the IPv6-preferred policy is used.
 Otherwise, the IPv4-preferred policy is used.
 .Pp
 The default values of
 .Va ip6addrctl_enable
 and
 .Va ip6addrctl_policy
 are
 .Dq Li YES
 and
 .Dq Li AUTO ,
 respectively.
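 .Pp
 For example, to install the pre-defined IPv4-preferred table
 instead of relying on
 .Dq Li AUTO ,
 something like the following could be used:
 .Bd -literal
 ip6addrctl_enable="YES"
 ip6addrctl_policy="ipv4_prefer"
 .Ed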
 .It Va cloned_interfaces
 .Pq Vt str
 Set to the list of clonable network interfaces to create on this host.
 Further cloning arguments may be passed to the
 .Xr ifconfig 8
 .Cm create
 command for each interface by setting the
 .Va create_args_ Ns Aq Ar interface
 variable.
 If an interface name is specified with the
 .Dq :sticky
 keyword,
 the interface will not be destroyed even when the
 .Pa rc.d/netif
 script is invoked with the
 .Dq stop
 argument.
 This is useful when reconfiguring the interface without destroying it.
 Entries in
 .Va cloned_interfaces
 are automatically appended to
 .Va network_interfaces
 for configuration.
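 .Pp
 As an illustration, assuming two hypothetical clones,
 .Li lo1
 and
 .Li bridge0 ,
 with only the latter kept across a
 .Dq stop :
 .Bd -literal
 cloned_interfaces="lo1 bridge0:sticky"
 ifconfig_lo1="inet 192.0.2.20 netmask 0xffffffff"
 .Ed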
 .It Va cloned_interfaces_sticky
 .Pq Vt bool
 This variable globally enables the functionality of the
 .Dq :sticky
 keyword in
 .Va cloned_interfaces
 for all interfaces.
 The default value is
 .Dq NO .
 Even if this variable is set to
 .Dq YES ,
 the
 .Dq :nosticky
 keyword can be used to override it on a per-interface basis.
 .It Va gif_interfaces
 .Pq Vt str
 This variable is deprecated in favor of
 .Va cloned_interfaces .
 Set to the list of
 .Xr gif 4
 tunnel interfaces to configure on this host.
 A
 .Va gifconfig_ Ns Aq Ar interface
 variable is assumed to exist for each value of
 .Ar interface .
 The value of this variable is used to configure the link layer of the
 tunnel according to the syntax of the
 .Cm tunnel
 option to
 .Xr ifconfig 8 .
 Additionally, this option ensures that each listed interface is created
 via the
 .Cm create
 option to
 .Xr ifconfig 8
 before attempting to configure it.
 .It Va sppp_interfaces
 .Pq Vt str
 Set to the list of
 .Xr sppp 4
 interfaces to configure on this host.
 A
 .Va spppconfig_ Ns Aq Ar interface
 variable is assumed to exist for each value of
 .Ar interface .
 Each interface should also be configured by a general
 .Va ifconfig_ Ns Aq Ar interface
 setting.
 Refer to
 .Xr spppcontrol 8
 for more information about available options.
 .It Va ppp_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr ppp 8
 daemon.
 .It Va ppp_profile
 .Pq Vt str
 The name of the profile to use from
 .Pa /etc/ppp/ppp.conf .
 Also used for per-profile overrides of
 .Va ppp_mode
 and
 .Va ppp_nat ,
 and for
 .Va ppp_ Ns Ao Ar profile Ac Ns _unit .
 When the profile name contains any of the characters
 .Dq Li .-/+
 they are translated to
 .Dq Li _
 for the purposes of the override variable names.
 .It Va ppp_mode
 .Pq Vt str
 Mode in which to run the
 .Xr ppp 8
 daemon.
 .It Va ppp_ Ns Ao Ar profile Ac Ns _mode
 .Pq Vt str
 Overrides the global
 .Va ppp_mode
 for
 .Ar profile .
 Accepted modes are
 .Dq Li auto ,
 .Dq Li ddial ,
 .Dq Li direct
 and
 .Dq Li dedicated .
 See the manual for a full description.
 .It Va ppp_nat
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 enables network address translation.
 When used in conjunction with
 .Va gateway_enable ,
 this allows hosts on private network addresses to access the Internet using
 this host as a network address translating router.
 .It Va ppp_ Ns Ao Ar profile Ac Ns _nat
 .Pq Vt str
 Overrides the global
 .Va ppp_nat
 for
 .Ar profile .
 .It Va ppp_ Ns Ao Ar profile Ac Ns _unit
 .Pq Vt int
 Set the unit number to be used for this profile.
 See the manual description of
 .Fl unit Ns Ar N
 for details.
 .It Va ppp_user
 .Pq Vt str
 The name of the user under which
 .Xr ppp 8
 should be started.
 By default,
 .Xr ppp 8
 is started as
 .Dq Li root .
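 .Pp
 As an illustration, assuming a hypothetical profile named
 .Li my-isp
 in
 .Pa /etc/ppp/ppp.conf ,
 a global mode with a per-profile override might look like:
 .Bd -literal
 ppp_enable="YES"
 ppp_profile="my-isp"
 ppp_mode="auto"
 ppp_nat="YES"
 ppp_my_isp_mode="ddial"
 .Ed
 .Pp
 The
 .Dq Li -
 in the profile name becomes
 .Dq Li _
 in the override variable name.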
 .It Va rc_conf_files
 .Pq Vt str
 This option is used to specify a list of files that will override
 the settings in
 .Pa /etc/defaults/rc.conf .
 The files will be read in the order in which they are specified and should
 include the full path to the file.
 By default, the files specified are
 .Pa /etc/rc.conf
 and
 .Pa /etc/rc.conf.local .
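 .Pp
 As a sketch, a third file could be appended to the default list;
 .Pa /etc/rc.conf.site
 is a hypothetical name used only for illustration:
 .Bd -literal
 rc_conf_files="/etc/rc.conf /etc/rc.conf.local /etc/rc.conf.site"
 .Ed
 .Pp
 Because the files are read in order, a setting in a later file
 takes precedence over the same setting in an earlier one.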
 .It Va zfs_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 .Pa /etc/rc.d/zfs
 will attempt to automatically mount ZFS file systems and initialize ZFS volumes
 (ZVOLs).
 .It Va gptboot_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 .Pa /etc/rc.d/gptboot
 will log whether the system booted successfully from a GPT partition
 that had the
 .Ar bootonce
 attribute set using the
 .Xr gpart 8
 utility.
 .It Va gbde_autoattach_all
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 .Pa /etc/rc.d/gbde
 will attempt to automatically initialize your .bde devices in
 .Pa /etc/fstab .
 .It Va gbde_devices
 .Pq Vt str
 List the devices that the script should try to attach,
 or
 .Dq Li AUTO .
 .It Va gbde_lockdir
 .Pq Vt str
 The directory where the
 .Xr gbde 4
 lockfiles are located.
 The default lockfile directory is
 .Pa /etc .
 .Pp
 The lockfile for each individual
 .Xr gbde 4
 device can be overridden by setting the variable
 .Va gbde_lock_ Ns Aq Ar device ,
 where
 .Ar device
 is the encrypted device without the
 .Dq Pa /dev/
 and
 .Dq Pa .bde
 parts.
 .It Va gbde_attach_attempts
 .Pq Vt int
 Number of times to attempt attaching to a
 .Xr gbde 4
 device, i.e., how many times the user is asked for the pass-phrase.
 Default is 3.
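 .Pp
 As a sketch only, assuming a hypothetical encrypted device
 .Pa /dev/ad4s1c.bde
 listed in
 .Pa /etc/fstab ,
 and assuming that the per-device variable takes the path of the lockfile,
 the gbde variables might be combined like this:
 .Bd -literal
 gbde_autoattach_all="YES"
 gbde_lockdir="/etc/gbde"
 gbde_lock_ad4s1c="/root/ad4s1c.lock"
 gbde_attach_attempts="5"
 .Ed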
 .It Va geli_devices
 .Pq Vt str
 List of devices to automatically attach on boot.
 Note that .eli devices from
 .Pa /etc/fstab
 are automatically appended to this list.
 .It Va geli_tries
 .Pq Vt int
 Number of times the user is asked for the pass-phrase.
 If empty, it will be taken from the
 .Va kern.geom.eli.tries
 sysctl variable.
 .It Va geli_default_flags
 .Pq Vt str
 Default flags used by
 .Xr geli 8
 when configuring disk encryption.
 Flags can be configured for every device separately by defining the
 .Va geli_ Ns Ao Ar device Ac Ns Va _flags
 variable.
 .It Va geli_autodetach
 .Pq Vt str
 Specifies if GELI devices should be marked for detach on last close after
 file systems are mounted.
 Default is
 .Dq Li YES .
 This can be changed for every device separately by defining the
 .Va geli_ Ns Ao Ar device Ac Ns Va _autodetach
 variable.
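 .Pp
 As an illustration, assuming a hypothetical provider
 .Li da1
 and a hypothetical key file
 .Pa /root/da1.key
 (see
 .Xr geli 8
 for the meaning of the flags shown),
 per-device settings might look like:
 .Bd -literal
 geli_devices="da1"
 geli_tries="3"
 geli_da1_flags="-p -k /root/da1.key"
 geli_da1_autodetach="NO"
 .Ed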
 .It Va root_rw_mount
 .Pq Vt bool
 Set to
 .Dq Li YES
 by default.
 After the file systems are checked at boot time, the root file system
 is remounted as read-write if this is set to
 .Dq Li YES .
 Diskless systems that mount their root file system from a read-only remote
 NFS share should set this to
 .Dq Li NO
 in their
 .Pa rc.conf .
 .It Va fsck_y_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 .Xr fsck 8
 will be run with the
 .Fl y
 flag if the initial preen
 of the file systems fails.
 .It Va background_fsck
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 the system will attempt to run
 .Xr fsck 8
 in the background where possible.
 .It Va background_fsck_delay
 .Pq Vt int
 The amount of time in seconds to sleep before starting a background
 .Xr fsck 8 .
 It defaults to sixty seconds to allow large applications such as
 the X server to start before disk I/O bandwidth is monopolized by
 .Xr fsck 8 .
 If set to a negative number, the background file system check will be
 delayed indefinitely to allow the administrator to run it at a more
 convenient time.
 For example it may be run from
 .Xr cron 8
 by adding a line like
 .Pp
 .Dl "0 4 * * * root /etc/rc.d/bgfsck forcestart"
 .Pp
 to
 .Pa /etc/crontab .
 .It Va netfs_types
 .Pq Vt str
 List of file system types that are network-based.
 This list should generally not be modified by end users.
 Use
 .Va extra_netfs_types
 instead.
 .It Va extra_netfs_types
 .Pq Vt str
 If set to something other than
 .Dq Li NO
 (the default),
 this variable extends the list of file system types
 for which automatic mounting at startup by
 .Xr rc 8
 should be delayed until the network is initialized.
 It should contain
 a whitespace-separated list of network file system descriptor pairs,
 each consisting of a file system type as passed to
 .Xr mount 8
 and a human-readable, one-word description,
 joined with a colon
 .Pq Ql \&: .
 Extending the default list in this way is only necessary
 when third party file system types are used.
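 .Pp
 For instance, assuming a hypothetical third party file system type
 .Li examplefs
 with the one-word description
 .Dq Example ,
 the descriptor pair would be written as:
 .Bd -literal
 extra_netfs_types="examplefs:Example"
 .Ed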
 .It Va syslogd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr syslogd 8
 daemon.
 .It Va syslogd_program
 .Pq Vt str
 Path to
 .Xr syslogd 8
 (default
 .Pa /usr/sbin/syslogd ) .
 .It Va syslogd_flags
 .Pq Vt str
 If
 .Va syslogd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to
 .Xr syslogd 8 .
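 .Pp
 As a brief example, with
 .Fl s
 chosen only for illustration
 (see
 .Xr syslogd 8
 for its meaning):
 .Bd -literal
 syslogd_enable="YES"
 syslogd_flags="-s"
 .Ed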
 .It Va inetd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr inetd 8
 daemon.
 .It Va inetd_program
 .Pq Vt str
 Path to
 .Xr inetd 8
 (default
 .Pa /usr/sbin/inetd ) .
 .It Va inetd_flags
 .Pq Vt str
 If
 .Va inetd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to
 .Xr inetd 8 .
 .It Va hastd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr hastd 8
 daemon.
 .It Va hastd_program
 .Pq Vt str
 Path to
 .Xr hastd 8
 (default
 .Pa /sbin/hastd ) .
 .It Va hastd_flags
 .Pq Vt str
 If
 .Va hastd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to
 .Xr hastd 8 .
 .It Va local_unbound_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr unbound 8
 daemon as a local caching resolver.
 .It Va kdc_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to start a Kerberos 5 authentication server
 at boot time.
 .It Va kdc_program
 .Pq Vt str
 If
 .Va kdc_enable
 is set to
 .Dq Li YES ,
 this is the path to the Kerberos 5 Authentication Server.
 .It Va kdc_flags
 .Pq Vt str
 Empty by default.
 This variable contains additional flags to be passed to the Kerberos 5
 authentication server.
 .It Va kadmind_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to start
 .Xr kadmind 8 ,
 the Kerberos 5 Administration Daemon; set to
 .Dq Li NO
 on a slave server.
 .It Va kadmind_program
 .Pq Vt str
 If
 .Va kadmind_enable
 is set to
 .Dq Li YES ,
 this is the path to the Kerberos 5 Administration Daemon.
 .It Va kpasswdd_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to start
 .Xr kpasswdd 8 ,
 the Kerberos 5 Password-Changing Daemon; set to
 .Dq Li NO
 on a slave server.
 .It Va kpasswdd_program
 .Pq Vt str
 If
 .Va kpasswdd_enable
 is set to
 .Dq Li YES ,
 this is the path to the Kerberos 5 Password-Changing Daemon.
 .It Va kfd_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to start
 .Xr kfd 8 ,
 the Kerberos 5 ticket forwarding daemon, at boot time.
 .It Va kfd_program
 .Pq Vt str
 Path to
 .Xr kfd 8
 (default
 .Pa /usr/libexec/kfd ) .
 .It Va rwhod_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr rwhod 8
 daemon at boot time.
 .It Va rwhod_flags
 .Pq Vt str
 If
 .Va rwhod_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to it.
 .It Va amd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr amd 8
 daemon at boot time.
 .It Va amd_flags
 .Pq Vt str
 If
 .Va amd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to it.
 See the
 .Xr amd 8
 manpage for more information.
 .It Va amd_map_program
 .Pq Vt str
 If set,
 the specified program is run to get the list of
 .Xr amd 8
 maps.
 For example, if the
 .Xr amd 8
 maps are stored in NIS, one can set this to
 run
 .Xr ypcat 1
 to get a list of
 .Xr amd 8
 maps from the
 .Pa amd.master
 NIS map.
 .It Va update_motd
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 .Pa /etc/motd
 will be updated at boot time to reflect the kernel release
 being run.
 If set to
 .Dq Li NO ,
 .Pa /etc/motd
 will not be updated.
 .It Va nfs_client_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the NFS client daemons at boot time.
 .It Va nfs_access_cache
 .Pq Vt int
 If
 .Va nfs_client_enable
 is set to
 .Dq Li YES ,
 this can be set to
 .Dq Li 0
 to disable NFS ACCESS RPC caching, or to the number of seconds for which
 NFS ACCESS
 results should be cached.
 A value of 2-10 seconds will substantially reduce network
 traffic for many NFS operations.
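 .Pp
 For example, assuming a cache lifetime of 5 seconds is acceptable:
 .Bd -literal
 nfs_client_enable="YES"
 nfs_access_cache="5"
 .Ed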
 .It Va nfs_server_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the NFS server daemons at boot time.
 .It Va nfs_server_flags
 .Pq Vt str
 If
 .Va nfs_server_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr nfsd 8
 daemon.
 .It Va nfsv4_server_enable
 .Pq Vt bool
 If both
 .Va nfs_server_enable
 and
 .Va nfsv4_server_enable
 are set to
 .Dq Li YES ,
 enable the server for NFSv4 as well as NFSv2 and NFSv3.
 .It Va nfsuserd_enable
 .Pq Vt bool
 If
 .Va nfsuserd_enable
 is set to
 .Dq Li YES ,
 run the nfsuserd daemon, which is needed for NFSv4 in order
 to map between user/group names and uid/gid numbers.
 If
 .Va nfsv4_server_enable
 is set to
 .Dq Li YES ,
 this will be forcibly enabled.
 .It Va nfsuserd_flags
 .Pq Vt str
 If
 .Va nfsuserd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr nfsuserd 8
 daemon.
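 .Pp
 As a minimal sketch of a server offering NFSv4 in addition to
 NFSv2 and NFSv3, with
 .Va nfsuserd_enable
 spelled out even though it is forced on in this case:
 .Bd -literal
 rpcbind_enable="YES"
 nfs_server_enable="YES"
 nfsv4_server_enable="YES"
 nfsuserd_enable="YES"
 .Ed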
 .It Va nfscbd_enable
 .Pq Vt bool
 If
 .Va nfscbd_enable
 is set to
 .Dq Li YES ,
 run the nfscbd daemon, which enables callbacks/delegations for the NFSv4 client.
 .It Va nfscbd_flags
 .Pq Vt str
 If
 .Va nfscbd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr nfscbd 8
 daemon.
 .It Va mountd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 and
 .Va nfs_server_enable
 is not set, start
 .Xr mountd 8 ,
 but not the
 .Xr nfsd 8
 daemon.
 This is commonly needed to run CFS without using real NFS.
 .It Va mountd_flags
 .Pq Vt str
 If
 .Va mountd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr mountd 8
 daemon.
 .It Va weak_mountd_authentication
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 allow services like PCNFSD to make non-privileged mount
 requests.
 .It Va nfs_reserved_port_only
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 provide NFS services only on a secure port.
 .It Va nfs_bufpackets
 .Pq Vt int
 If set to a number, indicates the number of packets worth of
 socket buffer space to reserve on an NFS client.
 The kernel default is typically 4.
 Using a higher number may be
 useful on gigabit networks to improve performance.
 The minimum value is
 2 and the maximum is 64.
 .It Va rpc_lockd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES
 and the host is also an NFS server or client, run
 .Xr rpc.lockd 8
 at boot time.
 .It Va rpc_lockd_flags
 .Pq Vt str
 If
 .Va rpc_lockd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr rpc.lockd 8
 daemon.
 .It Va rpc_statd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES
 and the host is also an NFS server or client, run
 .Xr rpc.statd 8
 at boot time.
 .It Va rpc_statd_flags
 .Pq Vt str
 If
 .Va rpc_statd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr rpc.statd 8
 daemon.
 .It Va rpcbind_program
 .Pq Vt str
 Path to
 .Xr rpcbind 8
 (default
 .Pa /usr/sbin/rpcbind ) .
 .It Va rpcbind_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr rpcbind 8
 service at boot time.
 .It Va rpcbind_flags
 .Pq Vt str
 If
 .Va rpcbind_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr rpcbind 8
 daemon.
 .It Va keyserv_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr keyserv 8
 daemon on boot for running Secure RPC.
 .It Va keyserv_flags
 .Pq Vt str
 If
 .Va keyserv_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr keyserv 8
 daemon.
 .It Va pppoed_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr pppoed 8
 daemon at boot time to provide PPP over Ethernet services.
 .It Va pppoed_ Ns Aq Ar provider
 .Pq Vt str
 .Xr pppoed 8
 listens to requests to this
 .Ar provider
 and ultimately runs
 .Xr ppp 8
 with a
 .Ar system
 argument of the same name.
 .It Va pppoed_flags
 .Pq Vt str
 Additional flags to pass to
 .Xr pppoed 8 .
 .It Va pppoed_interface
 .Pq Vt str
 The network interface to run
 .Xr pppoed 8
 on.
 This is mandatory when
 .Va pppoed_enable
 is set to
 .Dq Li YES .
 .It Va timed_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr timed 8
 service at boot time.
 This command is intended for networks of
 machines where a consistent
 .Dq "network time"
 for all hosts must be established.
 This is often useful in large NFS
 environments where time stamps on files are expected to be consistent
 network-wide.
 .It Va timed_flags
 .Pq Vt str
 If
 .Va timed_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr timed 8
 service.
 .It Va ntpdate_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run
 .Xr ntpdate 8
 at system startup.
 This command is intended to
 synchronize the system clock only
 .Em once
 from some standard reference.
 .It Va ntpdate_config
 .Pq Vt str
 Configuration file for
 .Xr ntpdate 8 .
 The default is
 .Pa /etc/ntp.conf .
 .It Va ntpdate_hosts
 .Pq Vt str
 A whitespace-separated list of NTP servers to synchronize with at startup.
 The default is to use the servers listed in
 .Va ntpdate_config ,
 if that file exists.
 .It Va ntpdate_program
 .Pq Vt str
 Path to
 .Xr ntpdate 8
 (default
 .Pa /usr/sbin/ntpdate ) .
 .It Va ntpdate_flags
 .Pq Vt str
 If
 .Va ntpdate_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr ntpdate 8
 command (typically a hostname).
 .It Va ntpd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr ntpd 8
 command at boot time.
 .It Va ntpd_program
 .Pq Vt str
 Path to
 .Xr ntpd 8
 (default
 .Pa /usr/sbin/ntpd ) .
 .It Va ntpd_config
 .Pq Vt str
 Path to
 .Xr ntpd 8
 configuration file.
 The default is
 .Pa /etc/ntp.conf .
 .It Va ntpd_flags
 .Pq Vt str
 If
 .Va ntpd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr ntpd 8
 daemon.
 .It Va ntpd_sync_on_start
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 .Xr ntpd 8
 is run with the
 .Fl g
 flag, which syncs the system's clock on startup.
 See
 .Xr ntpd 8
 for more information regarding the
 .Fl g
 option.
 This is a preferred alternative to using
 .Xr ntpdate 8
 or specifying the
 .Va ntpdate_enable
 variable.
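 .Pp
 For instance, the following sketch (assuming the default
 .Pa /etc/ntp.conf
 is suitable) starts
 .Xr ntpd 8
 and lets it set the clock at startup, without
 .Xr ntpdate 8 :
 .Bd -literal
 ntpd_enable="YES"
 ntpd_sync_on_start="YES"
 .Ed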
 .It Va nis_client_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr ypbind 8
 service at system boot time.
 .It Va nis_client_flags
 .Pq Vt str
 If
 .Va nis_client_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr ypbind 8
 service.
+.It Va nis_ypldap_enable
+.Pq Vt bool
+If set to
+.Dq Li YES ,
+run the
+.Xr ypldap 8
+daemon at system boot time.
+.It Va nis_ypldap_flags
+.Pq Vt str
+If
+.Va nis_ypldap_enable
+is set to
+.Dq Li YES ,
+these are the flags to pass to the
+.Xr ypldap 8
+daemon.
 .It Va nis_ypset_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr ypset 8
 daemon at system boot time.
 .It Va nis_ypset_flags
 .Pq Vt str
 If
 .Va nis_ypset_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr ypset 8
 daemon.
 .It Va nis_server_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr ypserv 8
 daemon at system boot time.
 .It Va nis_server_flags
 .Pq Vt str
 If
 .Va nis_server_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr ypserv 8
 daemon.
 .It Va nis_ypxfrd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr rpc.ypxfrd 8
 daemon at system boot time.
 .It Va nis_ypxfrd_flags
 .Pq Vt str
 If
 .Va nis_ypxfrd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr rpc.ypxfrd 8
 daemon.
 .It Va nis_yppasswdd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr rpc.yppasswdd 8
 daemon at system boot time.
 .It Va nis_yppasswdd_flags
 .Pq Vt str
 If
 .Va nis_yppasswdd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr rpc.yppasswdd 8
 daemon.
 .It Va rpc_ypupdated_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Nm rpc.ypupdated
 daemon at system boot time.
 .It Va bsnmpd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr bsnmpd 1
 daemon at system boot time.
 Be sure to understand the security implications of running an SNMP daemon
 on your host.
 .It Va bsnmpd_flags
 .Pq Vt str
 If
 .Va bsnmpd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr bsnmpd 1
 daemon.
 .It Va defaultrouter
 .Pq Vt str
 If not set to
 .Dq Li NO ,
 create a default route to this host name or IP address
 (use an IP address if this router is also required to get to the
 name server!).
 .It Va ipv6_defaultrouter
 .Pq Vt str
 The IPv6 equivalent of
 .Va defaultrouter .
 .It Va static_arp_pairs
 .Pq Vt str
 Set to the list of static ARP pairs that are to be added at system
 boot time.
 For each whitespace separated
 .Ar element
 in the value, a
 .Va static_arp_ Ns Aq Ar element
 variable is assumed to exist whose contents will later be passed to a
 .Dq Nm arp Cm -S
 operation.
 For example:
 .Bd -literal
 static_arp_pairs="gw"
 static_arp_gw="192.168.1.1 00:01:02:03:04:05"
 .Ed
 .It Va static_ndp_pairs
 .Pq Vt str
 Set to the list of static NDP pairs that are to be added at system
 boot time.
 For each whitespace separated
 .Ar element
 in the value, a
 .Va static_ndp_ Ns Aq Ar element
 variable is assumed to exist whose contents will later be passed to a
 .Dq Nm ndp Cm -s
 operation.
 For example:
 .Bd -literal
 static_ndp_pairs="gw"
 static_ndp_gw="2001:db8:3::1 00:01:02:03:04:05"
 .Ed
 .It Va static_routes
 .Pq Vt str
 Set to the list of static routes that are to be added at system
 boot time.
 If not set to
 .Dq Li NO
 then for each whitespace separated
 .Ar element
 in the value, a
 .Va route_ Ns Aq Ar element
 variable is assumed to exist
 whose contents will later be passed to a
 .Dq Nm route Cm add
 operation.
 For example:
 .Bd -literal
 static_routes="ext mcast:gif0 gif0local:gif0"
 route_ext="-net 10.0.0.0/24 -gateway 192.168.0.1"
 route_mcast="-net 224.0.0.0/4 -iface gif0"
 route_gif0local="-host 169.254.1.1 -iface lo0"
 .Ed
 .Pp
 When an
 .Ar element
 is in the form of
 .Li name:ifname ,
 the route is specific to the interface
 .Li ifname .
 .It Va ipv6_static_routes
 .Pq Vt str
 The IPv6 equivalent of
 .Va static_routes .
 If not set to
 .Dq Li NO
 then for each whitespace separated
 .Ar element
 in the value, a
 .Va ipv6_route_ Ns Aq Ar element
 variable is assumed to exist
 whose contents will later be passed to a
 .Dq Nm route Cm add Fl inet6
 operation.
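 .Pp
 As a hedged illustration, using addresses from the IPv6 documentation
 prefix and a hypothetical element name
 .Dq Li ext :
 .Bd -literal
 ipv6_static_routes="ext"
 ipv6_route_ext="2001:db8:1::/48 2001:db8::1"
 .Ed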
 .It Va natm_static_routes
 .Pq Vt str
 The
 .Xr natmip 4
 equivalent of
 .Va static_routes .
 If not empty then for each whitespace separated
 .Ar element
 in the value, a
 .Va route_ Ns Aq Ar element
 variable is assumed to exist whose contents will later be passed to a
 .Dq Nm atmconfig Cm natm Cm add
 operation.
 .It Va gateway_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 configure the host to act as an IP router, i.e.,\& to forward packets
 between interfaces.
 .It Va ipv6_gateway_enable
 .Pq Vt bool
 The IPv6 equivalent of
 .Va gateway_enable .
 .It Va routed_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run a routing daemon of some sort, based on the
 settings of
 .Va routed_program
 and
 .Va routed_flags .
 .It Va route6d_enable
 .Pq Vt bool
 The IPv6 equivalent of
 .Va routed_enable .
 If set to
 .Dq Li YES ,
 run a routing daemon of some sort, based on the
 settings of
 .Va route6d_program
 and
 .Va route6d_flags .
 .It Va routed_program
 .Pq Vt str
 If
 .Va routed_enable
 is set to
 .Dq Li YES ,
 this is the name of the routing daemon to use.
 .It Va route6d_program
 .Pq Vt str
 The IPv6 equivalent of
 .Va routed_program .
 .It Va routed_flags
 .Pq Vt str
 If
 .Va routed_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the routing daemon.
 .It Va route6d_flags
 .Pq Vt str
 The IPv6 equivalent of
 .Va routed_flags .
 .It Va mroute6d_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the IPv6 multicast routing daemon.
 .Pp
 Note that multicast routing daemons are no longer included in the
 .Fx
 base system; however, both
 .Xr mrouted 8
 and
 .Xr pim6dd 8
 may be installed from the
 .Fx
 Ports Collection.
 .It Va mroute6d_flags
 .Pq Vt str
 If
 .Va mroute6d_enable
 is set to
 .Dq Li YES ,
 these are the flags passed to the IPv6 multicast routing daemon.
 .It Va mroute6d_program
 .Pq Vt str
 If
 .Va mroute6d_enable
 is set to
 .Dq Li YES ,
 this is the path to the IPv6 multicast routing daemon.
 .It Va rtadvd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr rtadvd 8
 daemon at boot time.
 The
 .Xr rtadvd 8
 utility sends ICMPv6 Router Advertisement messages to
 the interfaces specified in
 .Va rtadvd_interfaces .
 This should only be enabled with great care.
 You may want to fine-tune
 .Xr rtadvd.conf 5 .
 .It Va rtadvd_interfaces
 .Pq Vt str
 If
 .Va rtadvd_enable
 is set to
 .Dq Li YES ,
 this is the list of interfaces to use.
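 .Pp
 A minimal sketch, in which the interface name
 .Dq Li em0
 is only a placeholder for the local interface:
 .Bd -literal
 rtadvd_enable="YES"
 rtadvd_interfaces="em0"
 .Ed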
 .It Va arpproxy_all
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 enable global proxy ARP.
 .It Va forward_sourceroute
 .Pq Vt bool
 If set to
 .Dq Li YES
 and
 .Va gateway_enable
 is also set to
 .Dq Li YES ,
 source-routed packets are forwarded.
 .It Va accept_sourceroute
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 the system will accept source-routed packets directed at it.
 .It Va rarpd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr rarpd 8
 daemon at system boot time.
 .It Va rarpd_flags
 .Pq Vt str
 If
 .Va rarpd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr rarpd 8
 daemon.
 .It Va bootparamd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr bootparamd 8
 daemon at system boot time.
 .It Va bootparamd_flags
 .Pq Vt str
 If
 .Va bootparamd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr bootparamd 8
 daemon.
 .It Va stf_interface_ipv4addr
 .Pq Vt str
 If not set to
 .Dq Li NO ,
 this is the local IPv4 address for 6to4 (IPv6 over IPv4 tunneling
 interface).
 Specify this entry to enable the 6to4 interface.
 .It Va stf_interface_ipv4plen
 .Pq Vt int
 Prefix length for 6to4 IPv4 addresses, to limit peer address range.
 Valid values are 0-31.
 .It Va stf_interface_ipv6_ifid
 .Pq Vt str
 IPv6 interface ID for
 .Xr stf 4 .
 This can be set to
 .Dq Li AUTO .
 .It Va stf_interface_ipv6_slaid
 .Pq Vt str
 IPv6 Site Level Aggregator for
 .Xr stf 4 .
 .It Va ipv6_ipv4mapping
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 this enables IPv4 mapped IPv6 address communication (like
 .Li ::ffff:a.b.c.d ) .
 .It Va rtsold_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to enable the
 .Xr rtsold 8
 daemon to send ICMPv6 Router Solicitation messages.
 .It Va rtsold_flags
 .Pq Vt str
 If
 .Va rtsold_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to
 .Xr rtsold 8 .
 .It Va rtsol_flags
 .Pq Vt str
 For interfaces configured with the
 .Dq Li inet6 accept_rtadv
 keyword, these are the flags to pass to
 .Xr rtsol 8 .
 .Pp
 Note that
 .Va rtsold_enable
 is mutually exclusive with
 .Va rtsol_flags ;
 .Va rtsold_enable
 takes precedence.
 .It Va atm_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to enable the configuration of ATM interfaces at system boot time.
 For all of the ATM variables described below, please refer to the
 .Xr atm 8
 manual page for further details on the available command parameters.
 Also refer to the files in
 .Pa /usr/share/examples/atm
 for more detailed configuration information.
 .It Va atm_load
 .Pq Vt str
 This is a list of physical ATM interface drivers to load.
 Typical values are
 .Dq Li hfa_pci
 and/or
 .Dq Li hea_pci .
 .It Va atm_netif_ Ns Aq Ar intf
 .Pq Vt str
 For the ATM physical interface
 .Ar intf ,
 this variable defines the name prefix and count for the ATM network
 interfaces to be created.
 The value will be passed as the parameters of an
 .Dq Nm atm Cm "set netif" Ar intf
 command.
 .It Va atm_sigmgr_ Ns Aq Ar intf
 .Pq Vt str
 For the ATM physical interface
 .Ar intf ,
 this variable defines the ATM signalling manager to be used.
 The value will be passed as the parameters of an
 .Dq Nm atm Cm attach Ar intf
 command.
 .It Va atm_prefix_ Ns Aq Ar intf
 .Pq Vt str
 For the ATM physical interface
 .Ar intf ,
 this variable defines the NSAP prefix for interfaces using a UNI signalling
 manager.
 If set to
 .Dq Li ILMI ,
 the prefix will automatically be set via the
 .Xr ilmid 8
 daemon.
 Otherwise, the value will be passed as the parameters of an
 .Dq Nm atm Cm "set prefix" Ar intf
 command.
 .It Va atm_macaddr_ Ns Aq Ar intf
 .Pq Vt str
 For the ATM physical interface
 .Ar intf ,
 this variable defines the MAC address for interfaces using a UNI signalling
 manager.
 If set to
 .Dq Li NO ,
 the hardware MAC address contained in the ATM interface card will be used.
 Otherwise, the value will be passed as the parameters of an
 .Dq Nm atm Cm "set mac" Ar intf
 command.
 .It Va atm_arpserver_ Ns Aq Ar netif
 .Pq Vt str
 For the ATM network interface
 .Ar netif ,
 this variable defines the ATM address for a host which is to provide ATMARP
 service.
 This variable is only applicable to interfaces using a UNI signalling
 manager.
 If set to
 .Dq Li local ,
 this host will become an ATMARP server.
 The value will be passed as the parameters of an
 .Dq Nm atm Cm "set arpserver" Ar netif
 command.
 .It Va atm_scsparp_ Ns Aq Ar netif
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 SCSP/ATMARP service for the network interface
 .Ar netif
 will be initiated using the
 .Xr scspd 8
 and
 .Xr atmarpd 8
 daemons.
 This variable is only applicable if
 .Va atm_arpserver_ Ns Aq Ar netif
 is set to
 .Dq Li local .
 .It Va atm_pvcs
 .Pq Vt str
 Set to the list of ATM PVCs to be added at system
 boot time.
 For each whitespace separated
 .Ar element
 in the value, an
 .Va atm_pvc_ Ns Aq Ar element
 variable is assumed to exist.
 The value of each of these variables
 will be passed as the parameters of an
 .Dq Nm atm Cm "add pvc"
 command.
 .It Va atm_arps
 .Pq Vt str
 Set to the list of permanent ATM ARP entries to be added
 at system boot time.
 For each whitespace separated
 .Ar element
 in the value, an
 .Va atm_arp_ Ns Aq Ar element
 variable is assumed to exist.
 The value of each of these variables
 will be passed as the parameters of an
 .Dq Nm atm Cm "add arp"
 command.
 .It Va natm_interfaces
 .Pq Vt str
 Set to the list of
 .Xr natm 4
 interfaces that will also be used for HARP through
 .Xr harp 4 .
 If this list is not empty all interfaces in the list will be brought up
 with
 .Xr ifconfig 8
 and
 .Xr harp 4
 will be loaded.
 For this to work the interface drivers must be either compiled into the
 kernel or must reside on the root partition.
 .It Va keybell
 .Pq Vt str
 The keyboard bell sound.
 Set to
 .Dq Li normal ,
 .Dq Li visual ,
 .Dq Li off ,
 or
 .Dq Li NO
 if the default behavior is desired.
 For details, refer to the
 .Xr kbdcontrol 1
 manpage.
 .It Va keyboard
 .Pq Vt str
 If set to a non-null string, the virtual console's keyboard input is
 set to this device.
 .It Va keymap
 .Pq Vt str
 If set to
 .Dq Li NO ,
 no keymap is installed, otherwise the value is used to install
 the keymap file found in
 .Pa /usr/share/syscons/keymaps/ Ns Ao Ar value Ac Ns Pa .kbd
 (if using
 .Xr syscons 4 ) or
 .Pa /usr/share/vt/keymaps/ Ns Ao Ar value Ac Ns Pa .kbd
 (if using
 .Xr vt 4 ) .
 .It Va keyrate
 .Pq Vt str
 The keyboard repeat speed.
 Set to
 .Dq Li slow ,
 .Dq Li normal ,
 .Dq Li fast ,
 or
 .Dq Li NO
 if the default behavior is desired.
 .It Va keychange
 .Pq Vt str
 If not set to
 .Dq Li NO ,
 attempt to program the function keys with the value.
 The value should
 be a single string of the form:
 .Dq Ar funkey_number new_value Op Ar funkey_number new_value ... .
 .It Va cursor
 .Pq Vt str
 Can be set to the value of
 .Dq Li normal ,
 .Dq Li blink ,
 .Dq Li destructive ,
 or
 .Dq Li NO
 to set the cursor behavior explicitly or choose the default behavior.
 .It Va scrnmap
 .Pq Vt str
 If set to
 .Dq Li NO ,
 no screen map is installed, otherwise the value is used to install
 the screen map file in
 .Pa /usr/share/syscons/scrnmaps/ Ns Aq Ar value .
 This parameter is ignored when using
 .Xr vt 4
 as the console driver.
 .It Va font8x16
 .Pq Vt str
 If set to
 .Dq Li NO ,
 the default 8x16 font value is used for screen size requests, otherwise
 the value in
 .Pa /usr/share/syscons/fonts/ Ns Aq Ar value
 or
 .Pa /usr/share/vt/fonts/ Ns Aq Ar value
 is used (depending on the console driver being used).
 .It Va font8x14
 .Pq Vt str
 If set to
 .Dq Li NO ,
 the default 8x14 font value is used for screen size requests, otherwise
 the value in
 .Pa /usr/share/syscons/fonts/ Ns Aq Ar value
 or
 .Pa /usr/share/vt/fonts/ Ns Aq Ar value
 is used (depending on the console driver being used).
 .It Va font8x8
 .Pq Vt str
 If set to
 .Dq Li NO ,
 the default 8x8 font value is used for screen size requests, otherwise
 the value in
 .Pa /usr/share/syscons/fonts/ Ns Aq Ar value
 or
 .Pa /usr/share/vt/fonts/ Ns Aq Ar value
 is used (depending on the console driver being used).
 .It Va blanktime
 .Pq Vt int
 If set to
 .Dq Li NO ,
 the default screen blanking interval is used, otherwise it is set
 to
 .Ar value
 seconds.
 .It Va saver
 .Pq Vt str
 If not set to
 .Dq Li NO ,
 this is the actual screen saver to use
 .Li ( blank , snake , daemon ,
 etc).
 .It Va moused_nondefault_enable
 .Pq Vt str
 If set to
 .Dq Li NO ,
 the mouse device specified on
 the command line is not automatically treated as enabled by the
 .Pa /etc/rc.d/moused
 script.
 Having this variable set to
 .Dq Li YES
 allows a
 .Xr usb 4
 mouse,
 for example,
 to be enabled as soon as it is plugged in.
 .It Va moused_enable
 .Pq Vt str
 If set to
 .Dq Li YES ,
 the
 .Xr moused 8
 daemon is started for doing cut/paste selection on the console.
 .It Va moused_type
 .Pq Vt str
 This is the protocol type of the mouse connected to this host.
 This variable must be set if
 .Va moused_enable
 is set to
 .Dq Li YES .
 The
 .Xr moused 8
 daemon
 is able to detect the appropriate mouse type automatically in many cases.
 Set this variable to
 .Dq Li auto
 to let the daemon detect it, or
 select one from the following list if the automatic detection fails.
 .Pp
 If the mouse is attached to the PS/2 mouse port, choose
 .Dq Li auto
 or
 .Dq Li ps/2 ,
 regardless of the brand and model of the mouse.
 Likewise, if the
 mouse is attached to the bus mouse port, choose
 .Dq Li auto
 or
 .Dq Li busmouse .
 All other protocols are for serial mice and will not work with
 the PS/2 and bus mice.
 If this is a USB mouse,
 .Dq Li auto
 is the only protocol type which will work.
 .Pp
 .Bl -tag -width ".Li x10mouseremote" -compact
 .It Li microsoft
 Microsoft mouse (serial)
 .It Li intellimouse
 Microsoft IntelliMouse (serial)
 .It Li mousesystems
 Mouse systems Corp.\& mouse (serial)
 .It Li mmseries
 MM Series mouse (serial)
 .It Li logitech
 Logitech mouse (serial)
 .It Li busmouse
 A bus mouse
 .It Li mouseman
 Logitech MouseMan and TrackMan (serial)
 .It Li glidepoint
 ALPS GlidePoint (serial)
 .It Li thinkingmouse
 Kensington ThinkingMouse (serial)
 .It Li ps/2
 PS/2 mouse
 .It Li mmhittab
 MM HitTablet (serial)
 .It Li x10mouseremote
 X10 MouseRemote (serial)
 .It Li versapad
 Interlink VersaPad (serial)
 .El
 .Pp
 Even if the mouse is not in the above list, it may be compatible
 with one in the list.
 Refer to the manual page for
 .Xr moused 8
 for compatibility information.
 .Pp
 It should also be noted that while this is enabled, any
 other client of the mouse (such as an X server) should access
 the mouse through the virtual mouse device,
 .Pa /dev/sysmouse ,
 and configure it as a
 .Dq Li sysmouse
 type mouse, since all
 mouse data is converted to this single canonical format when
 using
 .Xr moused 8 .
 If the client program does not support the
 .Dq Li sysmouse
 type,
 specify the
 .Dq Li mousesystems
 type.
 It is the second preferred type.
 .It Va moused_port
 .Pq Vt str
 If
 .Va moused_enable
 is set to
 .Dq Li YES ,
 this is the actual port the mouse is on.
 It might be
 .Pa /dev/cuau0
 for a COM1 serial mouse,
 .Pa /dev/psm0
 for a PS/2 mouse or
 .Pa /dev/mse0
 for a bus mouse, for example.
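 .Pp
 For a PS/2 mouse, a typical configuration might be sketched as follows
 (the port is an assumption and depends on the hardware):
 .Bd -literal
 moused_enable="YES"
 moused_type="auto"
 moused_port="/dev/psm0"
 .Ed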
 .It Va moused_flags
 .Pq Vt str
 If
 .Va moused_flags
 is set, its value is used as an additional set of flags to pass to the
 .Xr moused 8
 daemon.
 .It Va "moused_" Ns Ar XXX Ns Va "_flags"
 When
 .Va moused_nondefault_enable
 is enabled, and a
 .Xr moused 8
 daemon is started for a non-default port, the
 .Va "moused_" Ns Ar XXX Ns Va "_flags"
 set of options has precedence over and replaces the default
 .Va moused_flags
 (where
 .Ar XXX
 is the name of the non-default port, e.g.,\&
 .Ar ums0 ) .
 By setting
 .Va "moused_" Ns Ar XXX Ns Va "_flags"
 it is possible to set up a different set of default flags for each
 .Xr moused 8
 instance.
 For example, you can use
 .Dq Li "-3"
 for the default
 .Va moused_flags
 to make your laptop's touchpad more comfortable to use,
 but an empty set of options for
 .Va moused_ums0_flags
 when your
 .Xr usb 4
 mouse has three or more buttons.
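 .Pp
 The case just described could be written as the following sketch, where
 .Dq Li ums0
 is assumed to be the USB mouse:
 .Bd -literal
 moused_nondefault_enable="YES"
 moused_flags="-3"
 moused_ums0_flags=""
 .Ed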
 .It Va mousechar_start
 .Pq Vt int
 If set to
 .Dq Li NO ,
 the default mouse cursor character range
 .Li 0xd0 Ns - Ns Li 0xd3
 is used,
 otherwise the range start is set
 to
 .Ar value
 character, see
 .Xr vidcontrol 1 .
 Use if the default range is occupied in the language code table.
 .It Va allscreens_flags
 .Pq Vt str
 If set,
 .Xr vidcontrol 1
 is run with these options for each of the virtual terminals
 .Pq Pa /dev/ttyv* .
 For example,
 .Dq Fl m Cm on
 will enable the mouse pointer on all virtual terminals
 if
 .Va moused_enable
 is set to
 .Dq Li YES .
 .It Va allscreens_kbdflags
 .Pq Vt str
 If set,
 .Xr kbdcontrol 1
 is run with these options for each of the virtual terminals
 .Pq Pa /dev/ttyv* .
 For example,
 .Dq Fl h Li 200
 will set the
 .Xr syscons 4
 or
 .Xr vt 4
 scrollback (history) buffer to 200 lines.
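 .Pp
 Combining the two examples given for
 .Va allscreens_flags
 and
 .Va allscreens_kbdflags ,
 a configuration might read:
 .Bd -literal
 allscreens_flags="-m on"
 allscreens_kbdflags="-h 200"
 .Ed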
 .It Va cron_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr cron 8
 daemon at system boot time.
 .It Va cron_program
 .Pq Vt str
 Path to
 .Xr cron 8
 (default
 .Pa /usr/sbin/cron ) .
 .It Va cron_flags
 .Pq Vt str
 If
 .Va cron_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to
 .Xr cron 8 .
 .It Va cron_dst
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 enable the special handling of transitions to and from
 Daylight Saving Time in
 .Xr cron 8
 (equivalent to using the flag
 .Fl s ) .
 .It Va lpd_program
 .Pq Vt str
 Path to
 .Xr lpd 8
 (default
 .Pa /usr/sbin/lpd ) .
 .It Va lpd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr lpd 8
 daemon at system boot time.
 .It Va lpd_flags
 .Pq Vt str
 If
 .Va lpd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr lpd 8
 daemon.
 .It Va chkprintcap_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr chkprintcap 8
 command before starting the
 .Xr lpd 8
 daemon.
 .It Va chkprintcap_flags
 .Pq Vt str
 If
 .Va lpd_enable
 and
 .Va chkprintcap_enable
 are set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr chkprintcap 8
 program.
 The default is
 .Dq Li -d ,
 which causes missing directories to be created.
 .It Va mta_start_script
 .Pq Vt str
 This variable specifies the full path to the script to run to start
 a mail transfer agent.
 The default is
 .Pa /etc/rc.sendmail .
 The
 .Va sendmail_*
 variables which
 .Pa /etc/rc.sendmail
 uses are documented in the
 .Xr rc.sendmail 8
 manual page.
 .It Va dumpdev
 .Pq Vt str
 Indicates the device (usually a swap partition) to which a crash dump
 should be written in the event of a system crash.
 If the value of this variable is
 .Dq Li AUTO ,
 the first suitable swap device listed in
 .Pa /etc/fstab
 will be used as the dump device.
 Otherwise, the value of this variable is passed as the argument to
 .Xr dumpon 8 .
 To disable crash dumps, set this variable to
 .Dq Li NO .
 .It Va dumpdir
 .Pq Vt str
 When the system reboots after a crash and a crash dump is found on the
 device specified by the
 .Va dumpdev
 variable,
 .Xr savecore 8
 will save that crash dump and a copy of the kernel to the directory
 specified by the
 .Va dumpdir
 variable.
 The default value is
 .Pa /var/crash .
 Set to
 .Dq Li NO
 to not run
 .Xr savecore 8
 at boot time when
 .Va dumpdir
 is set.
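 .Pp
 As a sketch, crash dumps could be written to the first suitable swap
 device and saved to the default directory with:
 .Bd -literal
 dumpdev="AUTO"
 dumpdir="/var/crash"
 .Ed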
 .It Va savecore_enable
 .Pq Vt bool
 If set to
 .Dq Li NO ,
 disable automatic extraction of the crash dump from the
 .Va dumpdev .
 .It Va savecore_flags
 .Pq Vt str
 If crash dumps are enabled, these are the flags to pass to the
 .Xr savecore 8
 utility.
 .It Va quota_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to turn on user and group disk quotas on system startup via the
 .Xr quotaon 8
 command for all file systems marked as having quotas enabled in
 .Pa /etc/fstab .
 The kernel must be built with
 .Cd "options QUOTA"
 for disk quotas to function.
 .It Va check_quotas
 .Pq Vt bool
 Set to
 .Dq Li YES
 to enable user and group disk quota checking via the
 .Xr quotacheck 8
 command.
 .It Va quotacheck_flags
 .Pq Vt str
 If
 .Va quota_enable
 is set to
 .Dq Li YES ,
 and
 .Va check_quotas
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr quotacheck 8
 utility.
 The default is
 .Dq Li "-a" ,
 which checks quotas for all file systems with quotas enabled in
 .Pa /etc/fstab .
 .It Va quotaon_flags
 .Pq Vt str
 If
 .Va quota_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr quotaon 8
 utility.
 The default is
 .Dq Li "-a" ,
 which enables quotas for all file systems with quotas enabled in
 .Pa /etc/fstab .
 .It Va quotaoff_flags
 .Pq Vt str
 If
 .Va quota_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr quotaoff 8
 utility when shutting down the quota system.
 The default is
 .Dq Li "-a" ,
 which disables quotas for all file systems with quotas enabled in
 .Pa /etc/fstab .
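 .Pp
 A minimal quota setup, assuming the default
 .Dq Li "-a"
 flags are acceptable and the kernel supports quotas, might be:
 .Bd -literal
 quota_enable="YES"
 check_quotas="YES"
 .Ed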
 .It Va accounting_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to enable system accounting through the
 .Xr accton 8
 facility.
 .It Va ibcs2_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to enable iBCS2 (SCO) binary emulation at system boot
 time.
 .It Va ibcs2_loaders
 .Pq Vt str
 If not set to
 .Dq Li NO
 and if
 .Va ibcs2_enable
 is set to
 .Dq Li YES ,
 this specifies a list of additional iBCS2 loaders to enable.
 .It Va firstboot_sentinel
 .Pq Vt str
 This variable specifies the full path to a
 .Dq first boot
 sentinel file.
 If a file exists with this path,
 .Pa rc.d
 scripts with the
 .Dq firstboot
 keyword will be run on startup and the sentinel file will be deleted
 after the boot process completes.
 The sentinel file must be located on a writable file system which is
 mounted no later than
 .Va early_late_divider
 to function properly.
 The default is
 .Pa /firstboot .
 .It Va linux_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to enable Linux/ELF binary emulation at system
 boot time.
 .It Va svr4_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 enable SysVR4 emulation at boot time.
 .It Va sysvipc_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 load System V IPC primitives at boot time.
 .It Va clear_tmp_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to have
 .Pa /tmp
 cleaned at startup.
 .It Va clear_tmp_X
 .Pq Vt bool
 Set to
 .Dq Li NO
 to disable removing of X11 lock files,
 and the removal and (secure) recreation
 of the various socket directories for X11
 related programs.
 .It Va ldconfig_paths
 .Pq Vt str
 Set to the list of shared library paths to use with
 .Xr ldconfig 8 .
 NOTE:
 .Pa /usr/lib
 will always be added first, so it need not appear in this list.
 .It Va ldconfig32_paths
 .Pq Vt str
 Set to the list of 32-bit compatibility shared library paths to
 use with
 .Xr ldconfig 8 .
 .It Va ldconfig_paths_aout
 .Pq Vt str
 Set to the list of shared library paths to use with
 .Xr ldconfig 8
 legacy
 .Xr a.out 5
 support.
 .It Va ldconfig_insecure
 .Pq Vt bool
 The
 .Xr ldconfig 8
 utility normally refuses to use directories
 which are writable by anyone except root.
 Set this variable to
 .Dq Li YES
 to disable that security check during system startup.
 .It Va ldconfig_local_dirs
 .Pq Vt str
 Set to the list of local
 .Xr ldconfig 8
 directories.
 The names of all files in the directories listed will be
 passed as arguments to
 .Xr ldconfig 8 .
 .It Va ldconfig_local32_dirs
 .Pq Vt str
 Set to the list of local 32-bit compatibility
 .Xr ldconfig 8
 directories.
 The names of all files in the directories listed will be
 passed as arguments to
 .Dq Nm ldconfig Fl 32 .
 .It Va kern_securelevel_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to set the kernel security level at system startup.
 .It Va kern_securelevel
 .Pq Vt int
 The kernel security level to set at startup.
 The allowed
 .Ar value
 ranges from \-1 (the compile time default) to 3 (the
 most secure).
 See
 .Xr security 7
 for the list of possible security levels and their effect
 on system operation.
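 .Pp
 For example, the following sketch raises the security level to 1 at
 startup (the level is an arbitrary choice for illustration; consult
 .Xr security 7
 before choosing one):
 .Bd -literal
 kern_securelevel_enable="YES"
 kern_securelevel="1"
 .Ed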
 .It Va sshd_program
 .Pq Vt str
 Path to the SSH server program
 .Pa ( /usr/sbin/sshd
 is the default).
 .It Va sshd_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to start
 .Xr sshd 8
 at system boot time.
 .It Va sshd_flags
 .Pq Vt str
 If
 .Va sshd_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr sshd 8
 daemon.
 .It Va ftpd_program
 .Pq Vt str
 Path to the FTP server program
 .Pa ( /usr/libexec/ftpd
 is the default).
 .It Va ftpd_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to start
 .Xr ftpd 8
 as a stand-alone daemon at system boot time.
 .It Va ftpd_flags
 .Pq Vt str
 If
 .Va ftpd_enable
 is set to
 .Dq Li YES ,
 these are the additional flags to pass to the
 .Xr ftpd 8
 daemon.
 .It Va watchdogd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 start the
 .Xr watchdogd 8
 daemon at boot time.
 This requires that the kernel have been compiled with a
 .Xr watchdog 4
 compatible device.
 .It Va watchdogd_flags
 .Pq Vt str
 If
 .Va watchdogd_enable
 is set to
 .Dq Li YES ,
 these are the flags passed to the
 .Xr watchdogd 8
 daemon.
 .It Va devfs_rulesets
 .Pq Vt str
 List of files containing sets of rules for
 .Xr devfs 8 .
 .It Va devfs_system_ruleset
 .Pq Vt str
 Rule name(s) to apply to the system
 .Pa /dev
 itself.
 .It Va devfs_set_rulesets
 .Pq Vt str
 Pairs of already-mounted
 .Pa dev
 directories and rulesets that should be applied to them.
 For example: /mount/dev=ruleset_name
 .It Va devfs_load_rulesets
 .Pq Vt bool
 If set, always load the default rulesets listed in
 .Va devfs_rulesets .
 .It Va performance_cx_lowest
 .Pq Vt str
 CPU idle state to use while on AC power.
 The string
 .Dq Li LOW
 indicates that
 .Xr acpi 4
 should use the lowest power state available while
 .Dq Li HIGH
 indicates that the lowest latency state (less power savings) should be used.
 .It Va performance_cpu_freq
 .Pq Vt str
 CPU clock frequency to use while on AC power.
 The string
 .Dq Li LOW
 indicates that
 .Xr cpufreq 4
 should use the lowest frequency available while
 .Dq Li HIGH
 indicates that the highest frequency (less power savings) should be used.
 .It Va economy_cx_lowest
 .Pq Vt str
 CPU idle state to use when off AC power.
 The string
 .Dq Li LOW
 indicates that
 .Xr acpi 4
 should use the lowest power state available while
 .Dq Li HIGH
 indicates that the lowest latency state (less power savings) should be used.
 .It Va economy_cpu_freq
 .Pq Vt str
 CPU clock frequency to use when off AC power.
 The string
 .Dq Li LOW
 indicates that
 .Xr cpufreq 4
 should use the lowest frequency available while
 .Dq Li HIGH
 indicates that the highest frequency (less power savings) should be used.
 .It Va jail_enable
 .Pq Vt bool
 If set to
 .Dq Li NO ,
 any configured jails will not be started.
 .It Va jail_conf
 .Pq Vt str
 The configuration filename used by the
 .Xr jail 8
 utility.
 The default value is
 .Pa /etc/jail.conf .
 .It Va jail_parallel_start
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 all configured jails will be started in the background (in parallel).
 .It Va jail_flags
 .Pq Vt str
 Unset by default.
 When set, use as default value for
 .Va jail_ Ns Ao Ar jname Ac Ns Va _flags
 for every jail in
 .Va jail_list .
 .It Va jail_list
 .Pq Vt str
 A space-delimited list of jail names.
 When left empty, all of the
 .Xr jail 8
 instances defined in the configuration file are started.
 The names specified in this list control the jail startup order.
 .Xr jail 8
 instances missing from
 .Va jail_list
 must be started manually.
 Note that a jail's
 .Va depend
 parameter in the configuration file may override this list.
 .It Va jail_reverse_stop
 .Pq Vt bool
 When set to
 .Dq Li YES ,
 all configured jails in
 .Va jail_list
 are stopped in reverse order.
 .It Va jail_* variables
 Note that older releases supported per-jail configuration via
 .Xr rc.conf 5
 variables.
 For example,
 the hostname of a jail named
 .Li vjail
 could be set with
 .Li jail_vjail_hostname .
 These per-jail configuration variables are now obsolete in favor of the
 .Xr jail 8
 configuration file.
 For backward compatibility,
 when per-jail configuration variables are defined,
 .Xr jail 8
 configuration files are created as
 .Pa /var/run/jail. Ns Ao Ar jname Ac Ns Pa .conf
 and used.
 .Pp
 The following per-jail parameters are set by the
 .Pa rc.d/jail
 script from their corresponding
 .Nm
 variables.
 In addition to them, parameters in
 .Va jail_ Ns Ao Ar jname Ac Ns Va _parameters
 will be added to the configuration file.
 They must be a semi-colon
 .Pq Ql \&;
 delimited list of
 .Dq key=value .
 For more details,
 see the
 .Xr jail 8
 manual page.
 .Bl  -tag -width "host.hostname" -offset indent
 .It Li path
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _rootdir
 .It Li host.hostname
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _hostname
 .It Li exec.consolelog
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _consolelog .
 The default value is
 .Pa /var/log/jail_ Ao Ar jname Ac Pa _console.log .
 .It Li interface
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _interface .
 .It Li vnet.interface
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _vnet_interface .
 This implies that the
 .Li vnet
 parameter will be enabled and cannot be specified with
 .Va jail_ Ns Ao Ar jname Ac Ns Va _interface ,
 .Va jail_ Ns Ao Ar jname Ac Ns Va _ip
 and/or
 .Va jail_ Ns Ao Ar jname Ac Ns Va _ip_multi Ns Aq Ar n
 at the same time.
 .It Li fstab
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _fstab
 .It Li mount
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _procfs_enable .
 .It Li exec.fib
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _fib
 .It Li exec.start
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _exec_start .
 The parameter name was
 .Li command
 in some older releases.
 .It Li exec.prestart
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _exec_prestart
 .It Li exec.poststart
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _exec_poststart
 .It Li exec.stop
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _exec_stop
 .It Li exec.prestop
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _exec_prestop
 .It Li exec.poststop
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _exec_poststop
 .It Li ip4.addr
 set if
 .Va jail_ Ns Ao Ar jname Ac Ns Va _ip
 or
 .Va jail_ Ns Ao Ar jname Ac Ns Va _ip_multi Ns Aq Ar n
 contain IPv4 addresses
 .It Li ip6.addr
 set if
 .Va jail_ Ns Ao Ar jname Ac Ns Va _ip
 or
 .Va jail_ Ns Ao Ar jname Ac Ns Va _ip_multi Ns Aq Ar n
 contain IPv6 addresses
 .It Li allow.mount
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _mount_enable
 .It Li mount.devfs
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _devfs_enable
 .It Li devfs_ruleset
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _devfs_ruleset .
 This must be an integer,
 not a string.
 .It Li mount.fdescfs
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _fdescfs_enable
 .It Li allow.set_hostname
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _set_hostname_allow
 .It Li allow.rawsocket
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _socket_unixiproute_only
 .It Li allow.sysvipc
 set from
 .Va jail_ Ns Ao Ar jname Ac Ns Va _sysvipc_allow
 .El
 .\" -----------------------------------------------------
 .It Va harvest_mask
 .Pq Vt int
 Set to a bit-mask
 representing the entropy sources
 you wish to harvest.
 Refer to
 .Xr random 4
 for more information.
 .It Va entropy_dir
 .Pq Vt str
 Set to
 .Dq Li NO
 to disable caching entropy via
 .Xr cron 8 .
 Otherwise set to the directory
 in which the entropy files are stored.
 To be useful,
 there must be
 a system cron job
 that regularly writes and rotates
 files here.
 All files found
 will be used at boot time.
 The default is
 .Pa /var/db/entropy .
 .It Va entropy_file
 .Pq Vt str
 Set to
 .Dq Li NO
 to disable caching entropy through reboots.
 Otherwise set to the name
 of a file used to store cached entropy.
 This file should be located
 on a file system that is readable
 before all the volumes specified in
 .Xr fstab 5
 are mounted.
 By default,
 .Pa /entropy
 is used,
 but if
 .Pa /var/db/entropy-file
 is found it will also be used.
 This will be of some use to
 .Xr bsdinstall 8 .
 .It Va entropy_boot_file
 .Pq Vt str
 Set to
 .Dq Li NO
 to disable
 very early caching entropy
 through reboots.
 Otherwise set to the filename
 used to read
 very early reboot cached entropy.
 This file should be located where
 .Xr loader 8
 can read it.
 See also
 .Xr loader.conf 5 .
 The default location is
 .Pa /boot/entropy .
 .It Va entropy_save_sz
 .Pq Vt int
 Size of the entropy cache files saved by
 .Nm save-entropy
 periodically.
 .It Va entropy_save_num
 .Pq Vt int
 Number of entropy cache files to save by
 .Nm save-entropy
 periodically.
 .It Va ipsec_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to run
 .Xr setkey 8
 on
 .Va ipsec_file
 at boot time.
 .It Va ipsec_file
 .Pq Vt str
 Configuration file for
 .Xr setkey 8 .
 .It Va dmesg_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to save
 .Xr dmesg 8
 to
 .Pa /var/run/dmesg.boot
 on boot.
 .It Va rcshutdown_timeout
 .Pq Vt int
 If set, start a watchdog timer in the background which will terminate
 .Pa rc.shutdown
 if
 .Xr shutdown 8
 has not completed within the specified time (in seconds).
 Notice that in addition to this soft timeout,
 .Xr init 8
 also applies a hard timeout for the execution of
 .Pa rc.shutdown .
 This is configured via
 .Xr sysctl 8
 variable
 .Va kern.init_shutdown_timeout
 and defaults to 120 seconds.
 Setting the value of
 .Va rcshutdown_timeout
 to more than 120 seconds will have no effect until the
 .Xr sysctl 8
 variable
 .Va kern.init_shutdown_timeout
 is also increased.
 .It Va virecover_enable
 .Pq Vt bool
 Set to
 .Dq Li NO
 to prevent the system from trying to
 recover prematurely terminated
 .Xr vi 1
 sessions.
 .It Va ugidfw_enable
 .Pq Vt bool
 Set to
 .Dq Li YES
 to load the
 .Xr mac_bsdextended 4
 module upon system initialization and load a default
 ruleset file.
 .It Va bsdextended_script
 .Pq Vt str
 The default
 .Xr mac_bsdextended 4
 ruleset file to load.
 The default value of this variable is
 .Pa /etc/rc.bsdextended .
 .It Va newsyslog_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 run the
 .Xr newsyslog 8
 command at startup.
 .It Va newsyslog_flags
 .Pq Vt str
 If
 .Va newsyslog_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr newsyslog 8
 program.
 The default is
 .Dq Li -CN ,
 which causes log files flagged with a
 .Cm C
 to be created.
 .It Va mdconfig_md Ns Aq Ar X
 .Pq Vt str
 Arguments to
 .Xr mdconfig 8
 for
 .Xr md 4
 device
 .Ar X .
 At minimum a
 .Fl t Ar type
 must be specified and either a
 .Fl s Ar size
 for malloc or swap backed
 .Xr md 4
 devices or a
 .Fl f Ar file
 for vnode backed
 .Xr md 4
 devices.
 Note that
 .Va mdconfig_md Ns Aq Ar X
 variables are evaluated until one variable is unset or null.
 .It Va mdconfig_md Ns Ao Ar X Ac Ns Va _newfs
 .Pq Vt str
 Optional arguments passed to
 .Xr newfs 8
 to initialize
 .Xr md 4
 device
 .Ar X .
 .It Va mdconfig_md Ns Ao Ar X Ac Ns Va _owner
 .Pq Vt str
 An ownership specification passed to
 .Xr chown 8
 after the specified
 .Xr md 4
 device
 .Ar X
 has been mounted.
 Both the
 .Xr md 4
 device and the mount point will be changed.
 .It Va mdconfig_md Ns Ao Ar X Ac Ns Va _perms
 .Pq Vt str
 A mode string passed to
 .Xr chmod 1
 after the specified
 .Xr md 4
 device
 .Ar X
 has been mounted.
 Both the
 .Xr md 4
 device and the mount point will be changed.
 .It Va mdconfig_md Ns Ao Ar X Ac Ns Va _files
 .Pq Vt str
 Files to be copied to the mount point of the
 .Xr md 4
 device
 .Ar X
 after it has been mounted.
 .It Va mdconfig_md Ns Ao Ar X Ac Ns Va _cmd
 .Pq Vt str
 Command to execute after the specified
 .Xr md 4
 device
 .Ar X
 has been mounted.
 Note that the command is passed to
 .Ic eval
 and that both
 .Va _dev
 and
 .Va _mp
 variables can be used to reference respectively the
 .Xr md 4
 device and the mount point.
 Assuming that the
 .Xr md 4
 device is
 .Li md0 ,
 one could set the following:
 .Bd -literal
 mdconfig_md0_cmd="tar xfzC /var/file.tgz \e${_mp}"
 .Ed
 .It Va autobridge_interfaces
 .Pq Vt str
 Set to the list of bridge interfaces against which newly arriving interfaces
 are checked so that matching interfaces can be added automatically.
 If not set to
 .Dq Li NO
 then for each whitespace separated
 .Ar element
 in the value, a
 .Va autobridge_ Ns Aq Ar element
 variable is assumed to exist which has a whitespace separated list of interface
 names to match; these names can use wildcards.
 For example:
 .Bd -literal
 autobridge_interfaces="bridge0"
 autobridge_bridge0="tap* dc0 vlan[345]"
 .Ed
 .It Va mixer_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 enable support for sound mixer.
 .It Va hcsecd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 enable Bluetooth security daemon.
 .It Va hcsecd_config
 .Pq Vt str
 Configuration file for
 .Xr hcsecd 8 .
 Default
 .Pa /etc/bluetooth/hcsecd.conf .
 .It Va sdpd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 enable Bluetooth Service Discovery Protocol daemon.
 .It Va sdpd_control
 .Pq Vt str
 Path to
 .Xr sdpd 8
 control socket.
 Default
 .Pa /var/run/sdp .
 .It Va sdpd_groupname
 .Pq Vt str
 Sets
 .Xr sdpd 8
 group to run as after it initializes.
 Default
 .Dq Li nobody .
 .It Va sdpd_username
 .Pq Vt str
 Sets
 .Xr sdpd 8
 user to run as after it initializes.
 Default
 .Dq Li nobody .
 .It Va bthidd_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 enable Bluetooth Human Interface Device daemon.
 .It Va bthidd_config
 .Pq Vt str
 Configuration file for
 .Xr bthidd 8 .
 Default
 .Pa /etc/bluetooth/bthidd.conf .
 .It Va bthidd_hids
 .Pq Vt str
 Path to a file, where
 .Xr bthidd 8
 will store information about known HID devices.
 Default
 .Pa /var/db/bthidd.hids .
 .It Va rfcomm_pppd_server_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 enable Bluetooth RFCOMM PPP wrapper daemon.
 .It Va rfcomm_pppd_server_profile
 .Pq Vt str
 The name of the profile to use from
 .Pa /etc/ppp/ppp.conf .
 Multiple profiles can be specified here.
 Also used to specify per-profile overrides.
 When the profile name contains any of the characters
 .Dq Li .-/+
 they are translated to
 .Dq Li _
 for the purposes of the override variable names.
 .It Va rfcomm_pppd_server_ Ns Ao Ar profile Ac Ns _bdaddr
 .Pq Vt str
 Overrides local address to listen on.
 By default
 .Xr rfcomm_pppd 8
 will listen on
 .Dq Li ANY
 address.
 The address can be specified as BD_ADDR or name.
 .It Va rfcomm_pppd_server_ Ns Ao Ar profile Ac Ns _channel
 .Pq Vt str
 Overrides local RFCOMM channel to listen on.
 By default
 .Xr rfcomm_pppd 8
 will listen on RFCOMM channel 1.
 Must be set properly if multiple profiles are used at the same time.
 .It Va rfcomm_pppd_server_ Ns Ao Ar profile Ac Ns _register_sp
 .Pq Vt bool
 Tells
 .Xr rfcomm_pppd 8
 if it should register Serial Port service on the specified RFCOMM channel.
 Default
 .Dq Li NO .
 .It Va rfcomm_pppd_server_ Ns Ao Ar profile Ac Ns _register_dun
 .Pq Vt bool
 Tells
 .Xr rfcomm_pppd 8
 if it should register Dial-Up Networking service on the specified
 RFCOMM channel.
 Default
 .Dq Li NO .
 .It Va ubthidhci_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 change the USB Bluetooth controller from HID mode to HCI mode.
 You also need to specify the location of USB Bluetooth controller with the
 .Va ubthidhci_busnum
 and
 .Va ubthidhci_addr
 variables.
 .It Va ubthidhci_busnum
 Bus number where the USB Bluetooth controller is located.
 Check the output of
 .Xr usbconfig 8
 on your system to find this information.
 .It Va ubthidhci_addr
 Bus address of the USB Bluetooth controller.
 Check the output of
 .Xr usbconfig 8
 on your system to find this information.
 .It Va netwait_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 delays the start of network-reliant services until
 .Va netwait_if
 is up and ICMP packets to a destination defined in
 .Va netwait_ip
 are flowing.
 Link state is examined first, followed by
 .Dq Li pinging
 an IP address to verify network usability.
 If no destination can be reached or timeouts are exceeded,
 network services are started anyway with no guarantee that
 the network is usable.
 Use of this variable requires both
 .Va netwait_ip
 and
 .Va netwait_if
 to be set.
 .It Va netwait_ip
 .Pq Vt str
 Empty by default.
 This variable contains a space-delimited list of IP addresses to
 .Xr ping 8 .
 DNS hostnames should not be used as resolution is not guaranteed
 to be functional at this point.
 If multiple IP addresses are specified,
 each will be tried until one is successful or the list is exhausted.
 .It Va netwait_timeout
 .Pq Vt int
 Indicates the total number of seconds to perform a
 .Dq Li ping
 against each IP address in
 .Va netwait_ip ,
 at a rate of one ping per second.
 If any of the pings are successful,
 full network connectivity is considered reliable.
 The default is 60.
 .It Va netwait_if
 .Pq Vt str
 Empty by default.
 Defines the name of the network interface on which to watch for link.
 .Xr ifconfig 8
 is used to monitor the interface, looking for
 .Dq Li status: no carrier .
 Once gone, the link is considered up.
 This can be a
 .Xr vlan 4
 interface if desired.
 .It Va netwait_if_timeout
 .Pq Vt int
 Defines the total number of seconds to wait for link to become usable,
 polled at a 1-second interval.
 The default is 30.
 .It Va rctl_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 load
 .Xr rctl 8
 rules from the defined ruleset.
 The kernel must be built with
 .Cd "options RACCT"
 and
 .Cd "options RCTL" .
 .It Va rctl_rules
 .Pq Vt str
 Set to
 .Pa /etc/rctl.conf
 by default.
 This variable contains the
 .Xr rctl.conf 5
 ruleset to load for
 .Xr rctl 8 .
 .It Va iovctl_files
 .Pq Vt str
 A space-separated list of configuration files used by
 .Xr iovctl 8 .
 The default value is an empty string.
 .It Va autofs_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 start the
 .Xr automount 8
 utility and the
 .Xr automountd 8
 and
 .Xr autounmountd 8
 daemons at boot time.
 .It Va automount_flags
 .Pq Vt str
 If
 .Va autofs_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr automount 8
 program.
 By default no flags are passed.
 .It Va automountd_flags
 .Pq Vt str
 If
 .Va autofs_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr automountd 8
 daemon.
 By default no flags are passed.
 .It Va autounmountd_flags
 .Pq Vt str
 If
 .Va autofs_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr autounmountd 8
 daemon.
 By default no flags are passed.
 .It Va ctld_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 start the
 .Xr ctld 8
 daemon at boot time.
 .It Va iscsid_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 start the
 .Xr iscsid 8
 daemon at boot time.
 .It Va iscsictl_enable
 .Pq Vt bool
 If set to
 .Dq Li YES ,
 start the
 .Xr iscsictl 8
 utility at boot time.
 .It Va iscsictl_flags
 .Pq Vt str
 If
 .Va iscsictl_enable
 is set to
 .Dq Li YES ,
 these are the flags to pass to the
 .Xr iscsictl 8
 program.
 The default is
 .Dq Li -Aa ,
 which configures sessions based on the
 .Pa /etc/iscsi.conf
 configuration file.
 .El
 .Sh FILES
 .Bl -tag -width ".Pa /etc/defaults/rc.conf" -compact
 .It Pa /etc/defaults/rc.conf
 .It Pa /etc/rc.conf
 .It Pa /etc/rc.conf.local
 .El
 .Sh SEE ALSO
 .Xr catman 1 ,
 .Xr chmod 1 ,
 .Xr gdb 1 ,
 .Xr info 1 ,
 .Xr kbdcontrol 1 ,
 .Xr makewhatis 1 ,
 .Xr sh 1 ,
 .Xr vi 1 ,
 .Xr vidcontrol 1 ,
 .Xr bridge 4 ,
 .Xr dummynet 4 ,
 .Xr ip 4 ,
 .Xr ipf 4 ,
 .Xr ipfw 4 ,
 .Xr ipnat 4 ,
 .Xr kld 4 ,
 .Xr pf 4 ,
 .Xr pflog 4 ,
 .Xr pfsync 4 ,
 .Xr tcp 4 ,
 .Xr udp 4 ,
 .Xr exports 5 ,
 .Xr fstab 5 ,
 .Xr ipf 5 ,
 .Xr ipnat 5 ,
 .Xr jail.conf 5 ,
 .Xr loader.conf 5 ,
 .Xr motd 5 ,
 .Xr newsyslog.conf 5 ,
 .Xr pf.conf 5 ,
 .Xr security 7 ,
 .Xr accton 8 ,
 .Xr amd 8 ,
 .Xr apm 8 ,
 .Xr atm 8 ,
 .Xr bsdinstall 8 ,
 .Xr bthidd 8 ,
 .Xr chkprintcap 8 ,
 .Xr chown 8 ,
 .Xr cron 8 ,
 .Xr devfs 8 ,
 .Xr dhclient 8 ,
 .Xr ftpd 8 ,
 .Xr geli 8 ,
 .Xr hcsecd 8 ,
 .Xr ifconfig 8 ,
 .Xr inetd 8 ,
 .Xr iovctl 8 ,
 .Xr ipf 8 ,
 .Xr ipfw 8 ,
 .Xr ipnat 8 ,
 .Xr jail 8 ,
 .Xr kldxref 8 ,
 .Xr loader 8 ,
 .Xr lpd 8 ,
 .Xr mdconfig 8 ,
 .Xr mdmfs 8 ,
 .Xr mixer 8 ,
 .Xr mountd 8 ,
 .Xr moused 8 ,
 .Xr newfs 8 ,
 .Xr newsyslog 8 ,
 .Xr nfsd 8 ,
 .Xr ntpd 8 ,
 .Xr ntpdate 8 ,
 .Xr pfctl 8 ,
 .Xr pflogd 8 ,
 .Xr ping 8 ,
 .Xr powerd 8 ,
 .Xr quotacheck 8 ,
 .Xr quotaon 8 ,
 .Xr rc 8 ,
 .Xr rc.sendmail 8 ,
 .Xr rfcomm_pppd 8 ,
 .Xr route 8 ,
 .Xr routed 8 ,
 .Xr rpc.lockd 8 ,
 .Xr rpc.statd 8 ,
 .Xr rpcbind 8 ,
 .Xr rwhod 8 ,
 .Xr savecore 8 ,
 .Xr sdpd 8 ,
 .Xr sshd 8 ,
 .Xr swapon 8 ,
 .Xr sysctl 8 ,
 .Xr syslogd 8 ,
 .Xr timed 8 ,
 .Xr unbound 8 ,
 .Xr usbconfig 8 ,
 .Xr wlandebug 8 ,
 .Xr yp 8 ,
 .Xr ypbind 8 ,
 .Xr ypserv 8 ,
 .Xr ypset 8
 .Sh HISTORY
 The
 .Nm
 file appeared in
 .Fx 2.2.2 .
 .Sh AUTHORS
 .An Jordan K. Hubbard .
Index: projects/vnet/sys/compat/linuxkpi/common/include/linux/etherdevice.h
===================================================================
--- projects/vnet/sys/compat/linuxkpi/common/include/linux/etherdevice.h	(revision 301546)
+++ projects/vnet/sys/compat/linuxkpi/common/include/linux/etherdevice.h	(revision 301547)
@@ -1,111 +1,112 @@
 /*-
- * Copyright (c) 2015 Mellanox Technologies, Ltd. All rights reserved.
+ * Copyright (c) 2015-2016 Mellanox Technologies, Ltd. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice unmodified, this list of conditions, and the following
  *    disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
  * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
  * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
  * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
  * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
  * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
  * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
  * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
  * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 #ifndef _LINUX_ETHERDEVICE
 #define	_LINUX_ETHERDEVICE
 
 #include <linux/types.h>
 
 #include <sys/random.h>
 #include <sys/libkern.h>
 
 #define	ETH_MODULE_SFF_8079		1
 #define	ETH_MODULE_SFF_8079_LEN		256
 #define	ETH_MODULE_SFF_8472		2
 #define	ETH_MODULE_SFF_8472_LEN		512
 #define	ETH_MODULE_SFF_8636		3
 #define	ETH_MODULE_SFF_8636_LEN		256
 #define	ETH_MODULE_SFF_8436		4
 #define	ETH_MODULE_SFF_8436_LEN		256
 
 struct ethtool_eeprom {
 	u32	offset;
 	u32	len;
 };
 
 struct ethtool_modinfo {
 	u32	type;
 	u32	eeprom_len;
 };
 
 static inline bool 
 is_zero_ether_addr(const u8 * addr)
 {
 	return ((addr[0] + addr[1] + addr[2] + addr[3] + addr[4] + addr[5]) == 0x00);
 }
 
 static inline bool 
 is_multicast_ether_addr(const u8 * addr)
 {
 	return (0x01 & addr[0]);
 }
 
 static inline bool 
 is_broadcast_ether_addr(const u8 * addr)
 {
 	return ((addr[0] + addr[1] + addr[2] + addr[3] + addr[4] + addr[5]) == (6 * 0xff));
 }
 
 static inline bool 
 is_valid_ether_addr(const u8 * addr)
 {
 	return !is_multicast_ether_addr(addr) && !is_zero_ether_addr(addr);
 }
 
 static inline void 
 ether_addr_copy(u8 * dst, const u8 * src)
 {
 	memcpy(dst, src, 6);
 }
 
 static inline bool
 ether_addr_equal(const u8 *pa, const u8 *pb)
 {
 	return (memcmp(pa, pb, 6) == 0);
 }
 
 static inline bool
 ether_addr_equal_64bits(const u8 *pa, const u8 *pb)
 {
 	return (memcmp(pa, pb, 6) == 0);
 }
 
 static inline void
 eth_broadcast_addr(u8 *pa)
 {
 	memset(pa, 0xff, 6);
 }
 
 static inline void
 random_ether_addr(u8 * dst)
 {
-	read_random(dst, 6);
+	if (read_random(dst, 6) == 0)
+		arc4rand(dst, 6, 0);
 
 	dst[0] &= 0xfe;
 	dst[0] |= 0x02;
 }
 
 #endif					/* _LINUX_ETHERDEVICE */
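Note on the etherdevice.h change above: after the buffer is filled (by read_random(), or by arc4rand() when read_random() returns 0), random_ether_addr() clears bit 0 of the first octet so the address is unicast, and sets bit 1 so it is marked as locally administered. The following is a minimal userspace sketch of that bit handling only. The sketch_* names are hypothetical and are not part of the LinuxKPI header, and the random fill itself is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same test the header uses in is_multicast_ether_addr(). */
static inline bool
sketch_is_multicast(const uint8_t *addr)
{
	return ((addr[0] & 0x01) != 0);
}

/* Bit 1 of the first octet marks a locally administered address. */
static inline bool
sketch_is_local_admin(const uint8_t *addr)
{
	return ((addr[0] & 0x02) != 0);
}

/*
 * Apply the two masks random_ether_addr() applies once the six bytes
 * have been filled in: clear the multicast bit, set the local bit.
 */
static inline void
sketch_fixup_random_addr(uint8_t *dst)
{
	assert(dst != NULL);
	dst[0] &= 0xfe;
	dst[0] |= 0x02;
}
```

Whatever bytes the random source delivers, the result passes the header's is_valid_ether_addr() check: it is never multicast, and it is never all zeroes because bit 1 of the first octet is always set.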
Index: projects/vnet/sys/compat/linuxkpi/common/include/linux/random.h
===================================================================
--- projects/vnet/sys/compat/linuxkpi/common/include/linux/random.h	(revision 301546)
+++ projects/vnet/sys/compat/linuxkpi/common/include/linux/random.h	(revision 301547)
@@ -1,42 +1,44 @@
 /*-
  * Copyright (c) 2010 Isilon Systems, Inc.
  * Copyright (c) 2010 iX Systems, Inc.
  * Copyright (c) 2010 Panasas, Inc.
- * Copyright (c) 2013, 2014 Mellanox Technologies, Ltd.
+ * Copyright (c) 2013-2016 Mellanox Technologies, Ltd.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice unmodified, this list of conditions, and the following
  *    disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
  * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
  * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
  * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
  * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
  * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
  * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
  * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
  * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 #ifndef	_LINUX_RANDOM_H_
 #define	_LINUX_RANDOM_H_
 
 #include <sys/random.h>
+#include <sys/libkern.h>
 
 static inline void
 get_random_bytes(void *buf, int nbytes)
 {
-	read_random(buf, nbytes);
+	if (read_random(buf, nbytes) == 0)
+		arc4rand(buf, nbytes, 0);
 }
 
 #endif	/* _LINUX_RANDOM_H_ */
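Note on the random.h change above: get_random_bytes() now treats a return value of 0 from read_random() as "no bytes delivered" and falls back to arc4rand(). The pattern can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel code: the function-pointer types, the stub_* sources, and sketch_get_random_bytes() are all hypothetical stand-ins, and the stubs write fixed patterns rather than random data so the control flow can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical primary source: returns bytes written, 0 if unavailable. */
typedef unsigned int (*sketch_primary_fn)(void *buf, unsigned int n);
/* Hypothetical fallback source: always fills the buffer. */
typedef void (*sketch_fallback_fn)(void *buf, unsigned int n);

static int sketch_used_fallback;

/* Stand-in for a primary source that cannot deliver yet. */
static unsigned int
stub_primary_unavailable(void *buf, unsigned int n)
{
	(void)buf;
	(void)n;
	return (0);
}

/* Stand-in for a working primary source (fixed pattern, not random). */
static unsigned int
stub_primary_ready(void *buf, unsigned int n)
{
	memset(buf, 0xA5, n);
	return (n);
}

/* Stand-in for the fallback (fixed pattern, not random). */
static void
stub_fallback(void *buf, unsigned int n)
{
	memset(buf, 0x5A, n);
	sketch_used_fallback = 1;
}

/* Same shape as the patched get_random_bytes(): try, then fall back. */
static void
sketch_get_random_bytes(void *buf, unsigned int nbytes,
    sketch_primary_fn primary, sketch_fallback_fn fallback)
{
	if (primary(buf, nbytes) == 0)
		fallback(buf, nbytes);
}
```

As in the patch, the fallback runs only when the primary source reports that it wrote nothing; a short but non-zero read is not retried in this sketch.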
Index: projects/vnet/sys/dev/cxgb/cxgb_sge.c
===================================================================
--- projects/vnet/sys/dev/cxgb/cxgb_sge.c	(revision 301546)
+++ projects/vnet/sys/dev/cxgb/cxgb_sge.c	(revision 301547)
@@ -1,3706 +1,3707 @@
 /**************************************************************************
 
 Copyright (c) 2007-2009, Chelsio Inc.
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:
 
  1. Redistributions of source code must retain the above copyright notice,
     this list of conditions and the following disclaimer.
 
  2. Neither the name of the Chelsio Corporation nor the names of its
     contributors may be used to endorse or promote products derived from
     this software without specific prior written permission.
  
 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 POSSIBILITY OF SUCH DAMAGE.
 
 ***************************************************************************/
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include "opt_inet6.h"
 #include "opt_inet.h"
 
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/kernel.h>
 #include <sys/module.h>
 #include <sys/bus.h>
 #include <sys/conf.h>
 #include <machine/bus.h>
 #include <machine/resource.h>
 #include <sys/bus_dma.h>
 #include <sys/rman.h>
 #include <sys/queue.h>
 #include <sys/sysctl.h>
 #include <sys/taskqueue.h>
 
 #include <sys/proc.h>
 #include <sys/sbuf.h>
 #include <sys/sched.h>
 #include <sys/smp.h>
 #include <sys/systm.h>
 #include <sys/syslog.h>
 #include <sys/socket.h>
 #include <sys/sglist.h>
 
 #include <net/if.h>
 #include <net/if_var.h>
 #include <net/bpf.h>	
 #include <net/ethernet.h>
 #include <net/if_vlan_var.h>
 
 #include <netinet/in_systm.h>
 #include <netinet/in.h>
 #include <netinet/ip.h>
 #include <netinet/ip6.h>
 #include <netinet/tcp.h>
 
 #include <dev/pci/pcireg.h>
 #include <dev/pci/pcivar.h>
 
 #include <vm/vm.h>
 #include <vm/pmap.h>
 
 #include <cxgb_include.h>
 #include <sys/mvec.h>
 
 int	txq_fills = 0;
 int	multiq_tx_enable = 1;
 
 #ifdef TCP_OFFLOAD
 CTASSERT(NUM_CPL_HANDLERS >= NUM_CPL_CMDS);
 #endif
 
 extern struct sysctl_oid_list sysctl__hw_cxgb_children;
 int cxgb_txq_buf_ring_size = TX_ETH_Q_SIZE;
 SYSCTL_INT(_hw_cxgb, OID_AUTO, txq_mr_size, CTLFLAG_RDTUN, &cxgb_txq_buf_ring_size, 0,
     "size of per-queue mbuf ring");
 
 static int cxgb_tx_coalesce_force = 0;
 SYSCTL_INT(_hw_cxgb, OID_AUTO, tx_coalesce_force, CTLFLAG_RWTUN,
     &cxgb_tx_coalesce_force, 0,
     "coalesce small packets into a single work request regardless of ring state");
 
 #define	COALESCE_START_DEFAULT		TX_ETH_Q_SIZE>>1
 #define	COALESCE_START_MAX		(TX_ETH_Q_SIZE-(TX_ETH_Q_SIZE>>3))
 #define	COALESCE_STOP_DEFAULT		TX_ETH_Q_SIZE>>2
 #define	COALESCE_STOP_MIN		TX_ETH_Q_SIZE>>5
 #define	TX_RECLAIM_DEFAULT		TX_ETH_Q_SIZE>>5
 #define	TX_RECLAIM_MAX			TX_ETH_Q_SIZE>>2
 #define	TX_RECLAIM_MIN			TX_ETH_Q_SIZE>>6
 
 
 static int cxgb_tx_coalesce_enable_start = COALESCE_START_DEFAULT;
 SYSCTL_INT(_hw_cxgb, OID_AUTO, tx_coalesce_enable_start, CTLFLAG_RWTUN,
     &cxgb_tx_coalesce_enable_start, 0,
     "coalesce enable threshold");
 static int cxgb_tx_coalesce_enable_stop = COALESCE_STOP_DEFAULT;
 SYSCTL_INT(_hw_cxgb, OID_AUTO, tx_coalesce_enable_stop, CTLFLAG_RWTUN,
     &cxgb_tx_coalesce_enable_stop, 0,
     "coalesce disable threshold");
 static int cxgb_tx_reclaim_threshold = TX_RECLAIM_DEFAULT;
 SYSCTL_INT(_hw_cxgb, OID_AUTO, tx_reclaim_threshold, CTLFLAG_RWTUN,
     &cxgb_tx_reclaim_threshold, 0,
     "tx cleaning minimum threshold");
 
 /*
  * XXX don't re-enable this until TOE stops assuming
  * we have an m_ext
  */
 static int recycle_enable = 0;
 
 extern int cxgb_use_16k_clusters;
 extern int nmbjumbop;
 extern int nmbjumbo9;
 extern int nmbjumbo16;
 
 #define USE_GTS 0
 
 #define SGE_RX_SM_BUF_SIZE	1536
 #define SGE_RX_DROP_THRES	16
 #define SGE_RX_COPY_THRES	128
 
 /*
  * Period of the Tx buffer reclaim timer.  This timer does not need to run
  * frequently as Tx buffers are usually reclaimed by new Tx packets.
  */
 #define TX_RECLAIM_PERIOD       (hz >> 1)
 
 /* 
  * Values for sge_txq.flags
  */
 enum {
 	TXQ_RUNNING	= 1 << 0,  /* fetch engine is running */
 	TXQ_LAST_PKT_DB = 1 << 1,  /* last packet rang the doorbell */
 };
 
 struct tx_desc {
 	uint64_t	flit[TX_DESC_FLITS];
 } __packed;
 
 struct rx_desc {
 	uint32_t	addr_lo;
 	uint32_t	len_gen;
 	uint32_t	gen2;
 	uint32_t	addr_hi;
 } __packed;
 
 struct rsp_desc {               /* response queue descriptor */
 	struct rss_header	rss_hdr;
 	uint32_t		flags;
 	uint32_t		len_cq;
 	uint8_t			imm_data[47];
 	uint8_t			intr_gen;
 } __packed;
 
 #define RX_SW_DESC_MAP_CREATED	(1 << 0)
 #define TX_SW_DESC_MAP_CREATED	(1 << 1)
 #define RX_SW_DESC_INUSE        (1 << 3)
 #define TX_SW_DESC_MAPPED       (1 << 4)
 
 #define RSPQ_NSOP_NEOP           G_RSPD_SOP_EOP(0)
 #define RSPQ_EOP                 G_RSPD_SOP_EOP(F_RSPD_EOP)
 #define RSPQ_SOP                 G_RSPD_SOP_EOP(F_RSPD_SOP)
 #define RSPQ_SOP_EOP             G_RSPD_SOP_EOP(F_RSPD_SOP|F_RSPD_EOP)
 
 struct tx_sw_desc {                /* SW state per Tx descriptor */
 	struct mbuf	*m;
 	bus_dmamap_t	map;
 	int		flags;
 };
 
 struct rx_sw_desc {                /* SW state per Rx descriptor */
 	caddr_t		rxsd_cl;
 	struct mbuf	*m;
 	bus_dmamap_t	map;
 	int		flags;
 };
 
 struct txq_state {
 	unsigned int	compl;
 	unsigned int	gen;
 	unsigned int	pidx;
 };
 
 struct refill_fl_cb_arg {
 	int               error;
 	bus_dma_segment_t seg;
 	int               nseg;
 };
 
 
 /*
  * Maps a number of flits to the number of Tx descriptors that can hold them.
  * The formula is
  *
  * desc = 1 + (flits - 2) / (WR_FLITS - 1).
  *
  * HW allows up to 4 descriptors to be combined into a WR.
  */
 static uint8_t flit_desc_map[] = {
 	0,
 #if SGE_NUM_GENBITS == 1
 	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 	2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
 	3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
 	4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4
 #elif SGE_NUM_GENBITS == 2
 	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 	2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
 	3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
 	4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
 #else
 # error "SGE_NUM_GENBITS must be 1 or 2"
 #endif
 };
 
 #define	TXQ_LOCK_ASSERT(qs)	mtx_assert(&(qs)->lock, MA_OWNED)
 #define	TXQ_TRYLOCK(qs)		mtx_trylock(&(qs)->lock)	
 #define	TXQ_LOCK(qs)		mtx_lock(&(qs)->lock)	
 #define	TXQ_UNLOCK(qs)		mtx_unlock(&(qs)->lock)	
 #define	TXQ_RING_EMPTY(qs)	drbr_empty((qs)->port->ifp, (qs)->txq[TXQ_ETH].txq_mr)
 #define	TXQ_RING_NEEDS_ENQUEUE(qs)					\
 	drbr_needs_enqueue((qs)->port->ifp, (qs)->txq[TXQ_ETH].txq_mr)
 #define	TXQ_RING_FLUSH(qs)	drbr_flush((qs)->port->ifp, (qs)->txq[TXQ_ETH].txq_mr)
 #define	TXQ_RING_DEQUEUE_COND(qs, func, arg)				\
 	drbr_dequeue_cond((qs)->port->ifp, (qs)->txq[TXQ_ETH].txq_mr, func, arg)
 #define	TXQ_RING_DEQUEUE(qs) \
 	drbr_dequeue((qs)->port->ifp, (qs)->txq[TXQ_ETH].txq_mr)
 
 int cxgb_debug = 0;
 
 static void sge_timer_cb(void *arg);
 static void sge_timer_reclaim(void *arg, int ncount);
 static void sge_txq_reclaim_handler(void *arg, int ncount);
 static void cxgb_start_locked(struct sge_qset *qs);
 
 /*
  * XXX need to cope with bursty scheduling by looking at a wider
  * window than we are now for determining the need for coalescing
  *
  */
 static __inline uint64_t
 check_pkt_coalesce(struct sge_qset *qs) 
 { 
         struct adapter *sc; 
         struct sge_txq *txq; 
 	uint8_t *fill;
 
 	if (__predict_false(cxgb_tx_coalesce_force))
 		return (1);
 	txq = &qs->txq[TXQ_ETH]; 
         sc = qs->port->adapter; 
 	fill = &sc->tunq_fill[qs->idx];
 
 	if (cxgb_tx_coalesce_enable_start > COALESCE_START_MAX)
 		cxgb_tx_coalesce_enable_start = COALESCE_START_MAX;
 	if (cxgb_tx_coalesce_enable_stop < COALESCE_STOP_MIN)
 		cxgb_tx_coalesce_enable_stop = COALESCE_STOP_MIN;
 	/*
 	 * If the hardware transmit queue is more than 1/8 full we mark
 	 * it as coalescing.  We drop back from coalescing when we go
 	 * below 1/32 full and there are no packets enqueued; this
 	 * provides some degree of hysteresis.
 	 */
         if (*fill != 0 && (txq->in_use <= cxgb_tx_coalesce_enable_stop) &&
 	    TXQ_RING_EMPTY(qs) && (qs->coalescing == 0))
                 *fill = 0; 
         else if (*fill == 0 && (txq->in_use >= cxgb_tx_coalesce_enable_start))
                 *fill = 1; 
 
 	return (sc->tunq_coalesce);
 } 
 
 #ifdef __LP64__
 static void
 set_wr_hdr(struct work_request_hdr *wrp, uint32_t wr_hi, uint32_t wr_lo)
 {
 	uint64_t wr_hilo;
 #if _BYTE_ORDER == _LITTLE_ENDIAN
 	wr_hilo = wr_hi;
 	wr_hilo |= (((uint64_t)wr_lo)<<32);
 #else
 	wr_hilo = wr_lo;
 	wr_hilo |= (((uint64_t)wr_hi)<<32);
 #endif	
 	wrp->wrh_hilo = wr_hilo;
 }
 #else
 static void
 set_wr_hdr(struct work_request_hdr *wrp, uint32_t wr_hi, uint32_t wr_lo)
 {
 
 	wrp->wrh_hi = wr_hi;
 	wmb();
 	wrp->wrh_lo = wr_lo;
 }
 #endif
 
 struct coalesce_info {
 	int count;
 	int nbytes;
 };
 
 static int
 coalesce_check(struct mbuf *m, void *arg)
 {
 	struct coalesce_info *ci = arg;
 	int *count = &ci->count;
 	int *nbytes = &ci->nbytes;
 
 	if ((*nbytes == 0) || ((*nbytes + m->m_len <= 10500) &&
 		(*count < 7) && (m->m_next == NULL))) {
 		*count += 1;
 		*nbytes += m->m_len;
 		return (1);
 	}
 	return (0);
 }
 
 static struct mbuf *
 cxgb_dequeue(struct sge_qset *qs)
 {
 	struct mbuf *m, *m_head, *m_tail;
 	struct coalesce_info ci;
 
 	
 	if (check_pkt_coalesce(qs) == 0) 
 		return TXQ_RING_DEQUEUE(qs);
 
 	m_head = m_tail = NULL;
 	ci.count = ci.nbytes = 0;
 	do {
 		m = TXQ_RING_DEQUEUE_COND(qs, coalesce_check, &ci);
 		if (m_head == NULL) {
 			m_tail = m_head = m;
 		} else if (m != NULL) {
 			m_tail->m_nextpkt = m;
 			m_tail = m;
 		}
 	} while (m != NULL);
 	if (ci.count > 7)
 		panic("trying to coalesce %d packets into one WR", ci.count);
 	return (m_head);
 }
 	
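 /**
  *	reclaim_completed_tx - reclaims completed Tx descriptors
  *	@qs: the queue set that owns the Tx queue
  *	@reclaim_min: do nothing unless at least this many descriptors
  *	can be reclaimed
  *	@queue: index of the Tx queue within @qs to reclaim from
  *
  *	Reclaims Tx descriptors that the SGE has indicated it has processed,
  *	and frees the associated buffers if possible.  Called with the queue
  *	set's lock held.  Returns the number of descriptors reclaimed.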
  */
 static __inline int
 reclaim_completed_tx(struct sge_qset *qs, int reclaim_min, int queue)
 {
 	struct sge_txq *q = &qs->txq[queue];
 	int reclaim = desc_reclaimable(q);
 
 	if ((cxgb_tx_reclaim_threshold > TX_RECLAIM_MAX) ||
 	    (cxgb_tx_reclaim_threshold < TX_RECLAIM_MIN))
 		cxgb_tx_reclaim_threshold = TX_RECLAIM_DEFAULT;
 
 	if (reclaim < reclaim_min)
 		return (0);
 
 	mtx_assert(&qs->lock, MA_OWNED);
 	if (reclaim > 0) {
 		t3_free_tx_desc(qs, reclaim, queue);
 		q->cleaned += reclaim;
 		q->in_use -= reclaim;
 	}
 	if (isset(&qs->txq_stopped, TXQ_ETH))
                 clrbit(&qs->txq_stopped, TXQ_ETH);
 
 	return (reclaim);
 }
 
 /**
  *	should_restart_tx - are there enough resources to restart a Tx queue?
  *	@q: the Tx queue
  *
  *	Checks if there are enough descriptors to restart a suspended Tx queue.
  */
 static __inline int
 should_restart_tx(const struct sge_txq *q)
 {
 	unsigned int r = q->processed - q->cleaned;
 
 	return q->in_use - r < (q->size >> 1);
 }
 
 /**
  *	t3_sge_init - initialize SGE
  *	@adap: the adapter
  *	@p: the SGE parameters
  *
  *	Performs SGE initialization needed every time after a chip reset.
  *	We do not initialize any of the queue sets here; instead the driver
  *	top-level must request those individually.  We also do not enable DMA
  *	here; that should be done after the queues have been set up.
  */
 void
 t3_sge_init(adapter_t *adap, struct sge_params *p)
 {
 	u_int ctrl, ups;
 
 	ups = 0; /* = ffs(pci_resource_len(adap->pdev, 2) >> 12); */
 
 	ctrl = F_DROPPKT | V_PKTSHIFT(2) | F_FLMODE | F_AVOIDCQOVFL |
 	       F_CQCRDTCTRL | F_CONGMODE | F_TNLFLMODE | F_FATLPERREN |
 	       V_HOSTPAGESIZE(PAGE_SHIFT - 11) | F_BIGENDIANINGRESS |
 	       V_USERSPACESIZE(ups ? ups - 1 : 0) | F_ISCSICOALESCING;
 #if SGE_NUM_GENBITS == 1
 	ctrl |= F_EGRGENCTRL;
 #endif
 	if (adap->params.rev > 0) {
 		if (!(adap->flags & (USING_MSIX | USING_MSI)))
 			ctrl |= F_ONEINTMULTQ | F_OPTONEINTMULTQ;
 	}
 	t3_write_reg(adap, A_SG_CONTROL, ctrl);
 	t3_write_reg(adap, A_SG_EGR_RCQ_DRB_THRSH, V_HIRCQDRBTHRSH(512) |
 		     V_LORCQDRBTHRSH(512));
 	t3_write_reg(adap, A_SG_TIMER_TICK, core_ticks_per_usec(adap) / 10);
 	t3_write_reg(adap, A_SG_CMDQ_CREDIT_TH, V_THRESHOLD(32) |
 		     V_TIMEOUT(200 * core_ticks_per_usec(adap)));
 	t3_write_reg(adap, A_SG_HI_DRB_HI_THRSH,
 		     adap->params.rev < T3_REV_C ? 1000 : 500);
 	t3_write_reg(adap, A_SG_HI_DRB_LO_THRSH, 256);
 	t3_write_reg(adap, A_SG_LO_DRB_HI_THRSH, 1000);
 	t3_write_reg(adap, A_SG_LO_DRB_LO_THRSH, 256);
 	t3_write_reg(adap, A_SG_OCO_BASE, V_BASE1(0xfff));
 	t3_write_reg(adap, A_SG_DRB_PRI_THRESH, 63 * 1024);
 }
 
 
 /**
  *	sgl_len - calculates the size of an SGL of the given capacity
  *	@n: the number of SGL entries
  *
  *	Calculates the number of flits needed for a scatter/gather list that
  *	can hold the given number of entries.
  */
 static __inline unsigned int
 sgl_len(unsigned int n)
 {
 	return ((3 * n) / 2 + (n & 1));
 }
 
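 /**
  *	get_imm_packet - copy the immediate data of a response into an mbuf
  *	@sc: the adapter
  *	@resp: the response descriptor containing the packet data
  *	@m: the mbuf that receives the immediate data
  *
  *	Copies the immediate data of the given response into @m and sets
  *	the mbuf length accordingly.  Always returns 0.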
  */
 static int
 get_imm_packet(adapter_t *sc, const struct rsp_desc *resp, struct mbuf *m)
 {
 
 	if (resp->rss_hdr.opcode == CPL_RX_DATA) {
 		const struct cpl_rx_data *cpl = (const void *)&resp->imm_data[0];
 		m->m_len = sizeof(*cpl) + ntohs(cpl->len);
 	} else if (resp->rss_hdr.opcode == CPL_RX_PKT) {
 		const struct cpl_rx_pkt *cpl = (const void *)&resp->imm_data[0];
 		m->m_len = sizeof(*cpl) + ntohs(cpl->len);
 	} else
 		m->m_len = IMMED_PKT_SIZE;
 	m->m_ext.ext_buf = NULL;
 	m->m_ext.ext_type = 0;
 	memcpy(mtod(m, uint8_t *), resp->imm_data, m->m_len); 
 	return (0);	
 }
 
 static __inline u_int
 flits_to_desc(u_int n)
 {
 	return (flit_desc_map[n]);
 }
 
 #define SGE_PARERR (F_CPPARITYERROR | F_OCPARITYERROR | F_RCPARITYERROR | \
 		    F_IRPARITYERROR | V_ITPARITYERROR(M_ITPARITYERROR) | \
 		    V_FLPARITYERROR(M_FLPARITYERROR) | F_LODRBPARITYERROR | \
 		    F_HIDRBPARITYERROR | F_LORCQPARITYERROR | \
 		    F_HIRCQPARITYERROR)
 #define SGE_FRAMINGERR (F_UC_REQ_FRAMINGERROR | F_R_REQ_FRAMINGERROR)
 #define SGE_FATALERR (SGE_PARERR | SGE_FRAMINGERR | F_RSPQCREDITOVERFOW | \
 		      F_RSPQDISABLED)
 
 /**
  *	t3_sge_err_intr_handler - SGE async event interrupt handler
  *	@adapter: the adapter
  *
  *	Interrupt handler for SGE asynchronous (non-data) events.
  */
 void
 t3_sge_err_intr_handler(adapter_t *adapter)
 {
 	unsigned int v, status;
 
 	status = t3_read_reg(adapter, A_SG_INT_CAUSE);
 	if (status & SGE_PARERR)
 		CH_ALERT(adapter, "SGE parity error (0x%x)\n",
 			 status & SGE_PARERR);
 	if (status & SGE_FRAMINGERR)
 		CH_ALERT(adapter, "SGE framing error (0x%x)\n",
 			 status & SGE_FRAMINGERR);
 	if (status & F_RSPQCREDITOVERFOW)
 		CH_ALERT(adapter, "SGE response queue credit overflow\n");
 
 	if (status & F_RSPQDISABLED) {
 		v = t3_read_reg(adapter, A_SG_RSPQ_FL_STATUS);
 
 		CH_ALERT(adapter,
 			 "packet delivered to disabled response queue (0x%x)\n",
 			 (v >> S_RSPQ0DISABLED) & 0xff);
 	}
 
 	t3_write_reg(adapter, A_SG_INT_CAUSE, status);
 	if (status & SGE_FATALERR)
 		t3_fatal_err(adapter);
 }
 
 void
 t3_sge_prep(adapter_t *adap, struct sge_params *p)
 {
 	int i, nqsets, fl_q_size, jumbo_q_size, use_16k, jumbo_buf_size;
 
 	nqsets = min(SGE_QSETS / adap->params.nports, mp_ncpus);
 	nqsets *= adap->params.nports;
 
 	fl_q_size = min(nmbclusters/(3*nqsets), FL_Q_SIZE);
 
 	while (!powerof2(fl_q_size))
 		fl_q_size--;
 
 	use_16k = cxgb_use_16k_clusters != -1 ? cxgb_use_16k_clusters :
 	    is_offload(adap);
 
 #if __FreeBSD_version >= 700111
 	if (use_16k) {
 		jumbo_q_size = min(nmbjumbo16/(3*nqsets), JUMBO_Q_SIZE);
 		jumbo_buf_size = MJUM16BYTES;
 	} else {
 		jumbo_q_size = min(nmbjumbo9/(3*nqsets), JUMBO_Q_SIZE);
 		jumbo_buf_size = MJUM9BYTES;
 	}
 #else
 	jumbo_q_size = min(nmbjumbop/(3*nqsets), JUMBO_Q_SIZE);
 	jumbo_buf_size = MJUMPAGESIZE;
 #endif
 	while (!powerof2(jumbo_q_size))
 		jumbo_q_size--;
 
 	if (fl_q_size < (FL_Q_SIZE / 4) || jumbo_q_size < (JUMBO_Q_SIZE / 2))
 		device_printf(adap->dev,
 		    "Insufficient clusters and/or jumbo buffers.\n");
 
 	p->max_pkt_size = jumbo_buf_size - sizeof(struct cpl_rx_data);
 
 	for (i = 0; i < SGE_QSETS; ++i) {
 		struct qset_params *q = p->qset + i;
 
 		if (adap->params.nports > 2) {
 			q->coalesce_usecs = 50;
 		} else {
 #ifdef INVARIANTS			
 			q->coalesce_usecs = 10;
 #else
 			q->coalesce_usecs = 5;
 #endif			
 		}
 		q->polling = 0;
 		q->rspq_size = RSPQ_Q_SIZE;
 		q->fl_size = fl_q_size;
 		q->jumbo_size = jumbo_q_size;
 		q->jumbo_buf_size = jumbo_buf_size;
 		q->txq_size[TXQ_ETH] = TX_ETH_Q_SIZE;
 		q->txq_size[TXQ_OFLD] = is_offload(adap) ? TX_OFLD_Q_SIZE : 16;
 		q->txq_size[TXQ_CTRL] = TX_CTRL_Q_SIZE;
 		q->cong_thres = 0;
 	}
 }
 
 int
 t3_sge_alloc(adapter_t *sc)
 {
 
 	/* The parent tag. */
 	if (bus_dma_tag_create( bus_get_dma_tag(sc->dev),/* PCI parent */
 				1, 0,			/* algnmnt, boundary */
 				BUS_SPACE_MAXADDR,	/* lowaddr */
 				BUS_SPACE_MAXADDR,	/* highaddr */
 				NULL, NULL,		/* filter, filterarg */
 				BUS_SPACE_MAXSIZE_32BIT,/* maxsize */
 				BUS_SPACE_UNRESTRICTED, /* nsegments */
 				BUS_SPACE_MAXSIZE_32BIT,/* maxsegsize */
 				0,			/* flags */
 				NULL, NULL,		/* lock, lockarg */
 				&sc->parent_dmat)) {
 		device_printf(sc->dev, "Cannot allocate parent DMA tag\n");
 		return (ENOMEM);
 	}
 
 	/*
 	 * DMA tag for normal sized RX frames
 	 */
 	if (bus_dma_tag_create(sc->parent_dmat, MCLBYTES, 0, BUS_SPACE_MAXADDR,
 		BUS_SPACE_MAXADDR, NULL, NULL, MCLBYTES, 1,
 		MCLBYTES, BUS_DMA_ALLOCNOW, NULL, NULL, &sc->rx_dmat)) {
 		device_printf(sc->dev, "Cannot allocate RX DMA tag\n");
 		return (ENOMEM);
 	}
 
 	/* 
 	 * DMA tag for jumbo sized RX frames.
 	 */
 	if (bus_dma_tag_create(sc->parent_dmat, MJUM16BYTES, 0, BUS_SPACE_MAXADDR,
 		BUS_SPACE_MAXADDR, NULL, NULL, MJUM16BYTES, 1, MJUM16BYTES,
 		BUS_DMA_ALLOCNOW, NULL, NULL, &sc->rx_jumbo_dmat)) {
 		device_printf(sc->dev, "Cannot allocate RX jumbo DMA tag\n");
 		return (ENOMEM);
 	}
 
 	/* 
 	 * DMA tag for TX frames.
 	 */
 	if (bus_dma_tag_create(sc->parent_dmat, 1, 0, BUS_SPACE_MAXADDR,
 		BUS_SPACE_MAXADDR, NULL, NULL, TX_MAX_SIZE, TX_MAX_SEGS,
 		TX_MAX_SIZE, BUS_DMA_ALLOCNOW,
 		NULL, NULL, &sc->tx_dmat)) {
 		device_printf(sc->dev, "Cannot allocate TX DMA tag\n");
 		return (ENOMEM);
 	}
 
 	return (0);
 }
 
 int
 t3_sge_free(struct adapter * sc)
 {
 
 	if (sc->tx_dmat != NULL)
 		bus_dma_tag_destroy(sc->tx_dmat);
 
 	if (sc->rx_jumbo_dmat != NULL)
 		bus_dma_tag_destroy(sc->rx_jumbo_dmat);
 
 	if (sc->rx_dmat != NULL)
 		bus_dma_tag_destroy(sc->rx_dmat);
 
 	if (sc->parent_dmat != NULL)
 		bus_dma_tag_destroy(sc->parent_dmat);
 
 	return (0);
 }
 
 void
 t3_update_qset_coalesce(struct sge_qset *qs, const struct qset_params *p)
 {
 
 	qs->rspq.holdoff_tmr = max(p->coalesce_usecs * 10, 1U);
 	qs->rspq.polling = 0 /* p->polling */;
 }
 
 #if !defined(__i386__) && !defined(__amd64__)
 static void
 refill_fl_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
 {
 	struct refill_fl_cb_arg *cb_arg = arg;
 	
 	cb_arg->error = error;
 	cb_arg->seg = segs[0];
 	cb_arg->nseg = nseg;
 
 }
 #endif
 /**
  *	refill_fl - refill an SGE free-buffer list
  *	@sc: the controller softc
  *	@q: the free-list to refill
  *	@n: the number of new buffers to allocate
  *
  *	(Re)populate an SGE free-buffer list with up to @n new packet buffers.
  *	The caller must ensure that @n does not exceed the queue's capacity.
  */
 static void
 refill_fl(adapter_t *sc, struct sge_fl *q, int n)
 {
 	struct rx_sw_desc *sd = &q->sdesc[q->pidx];
 	struct rx_desc *d = &q->desc[q->pidx];
 	struct refill_fl_cb_arg cb_arg;
 	struct mbuf *m;
 	caddr_t cl;
 	int err;
 	
 	cb_arg.error = 0;
 	while (n--) {
 		/*
 		 * We allocate an uninitialized mbuf + cluster, mbuf is
 		 * initialized after rx.
 		 */
 		if (q->zone == zone_pack) {
 			if ((m = m_getcl(M_NOWAIT, MT_NOINIT, M_PKTHDR)) == NULL)
 				break;
 			cl = m->m_ext.ext_buf;			
 		} else {
 			if ((cl = m_cljget(NULL, M_NOWAIT, q->buf_size)) == NULL)
 				break;
 			if ((m = m_gethdr(M_NOWAIT, MT_NOINIT)) == NULL) {
 				uma_zfree(q->zone, cl);
 				break;
 			}
 		}
 		if ((sd->flags & RX_SW_DESC_MAP_CREATED) == 0) {
 			if ((err = bus_dmamap_create(q->entry_tag, 0, &sd->map))) {
 				log(LOG_WARNING, "bus_dmamap_create failed %d\n", err);
 				uma_zfree(q->zone, cl);
 				goto done;
 			}
 			sd->flags |= RX_SW_DESC_MAP_CREATED;
 		}
 #if !defined(__i386__) && !defined(__amd64__)
 		err = bus_dmamap_load(q->entry_tag, sd->map,
 		    cl, q->buf_size, refill_fl_cb, &cb_arg, 0);
 		
 		if (err != 0 || cb_arg.error) {
 			if (q->zone != zone_pack)
 				uma_zfree(q->zone, cl);
 			m_free(m);
 			goto done;
 		}
 #else
 		cb_arg.seg.ds_addr = pmap_kextract((vm_offset_t)cl);
 #endif		
 		sd->flags |= RX_SW_DESC_INUSE;
 		sd->rxsd_cl = cl;
 		sd->m = m;
 		d->addr_lo = htobe32(cb_arg.seg.ds_addr & 0xffffffff);
 		d->addr_hi = htobe32(((uint64_t)cb_arg.seg.ds_addr >>32) & 0xffffffff);
 		d->len_gen = htobe32(V_FLD_GEN1(q->gen));
 		d->gen2 = htobe32(V_FLD_GEN2(q->gen));
 
 		d++;
 		sd++;
 
 		if (++q->pidx == q->size) {
 			q->pidx = 0;
 			q->gen ^= 1;
 			sd = q->sdesc;
 			d = q->desc;
 		}
 		q->credits++;
 		q->db_pending++;
 	}
 
 done:
 	if (q->db_pending >= 32) {
 		q->db_pending = 0;
 		t3_write_reg(sc, A_SG_KDOORBELL, V_EGRCNTX(q->cntxt_id));
 	}
 }
 
 
 /**
  *	free_rx_bufs - free the Rx buffers on an SGE free list
  *	@sc: the controller softc
  *	@q: the SGE free list to clean up
  *
  *	Release the buffers on an SGE free-buffer Rx queue.  HW fetching from
  *	this queue should be stopped before calling this function.
  */
 static void
 free_rx_bufs(adapter_t *sc, struct sge_fl *q)
 {
 	u_int cidx = q->cidx;
 
 	while (q->credits--) {
 		struct rx_sw_desc *d = &q->sdesc[cidx];
 
 		if (d->flags & RX_SW_DESC_INUSE) {
 			bus_dmamap_unload(q->entry_tag, d->map);
 			bus_dmamap_destroy(q->entry_tag, d->map);
 			if (q->zone == zone_pack) {
 				m_init(d->m, M_NOWAIT, MT_DATA, M_EXT);
 				uma_zfree(zone_pack, d->m);
 			} else {
 				m_init(d->m, M_NOWAIT, MT_DATA, 0);
 				uma_zfree(zone_mbuf, d->m);
 				uma_zfree(q->zone, d->rxsd_cl);
 			}			
 		}
 		
 		d->rxsd_cl = NULL;
 		d->m = NULL;
 		if (++cidx == q->size)
 			cidx = 0;
 	}
 }
 
 static __inline void
 __refill_fl(adapter_t *adap, struct sge_fl *fl)
 {
 	refill_fl(adap, fl, min(16U, fl->size - fl->credits));
 }
 
 static __inline void
 __refill_fl_lt(adapter_t *adap, struct sge_fl *fl, int max)
 {
 	uint32_t reclaimable = fl->size - fl->credits;
 
 	if (reclaimable > 0)
 		refill_fl(adap, fl, min(max, reclaimable));
 }
 
 /**
  *	recycle_rx_buf - recycle a receive buffer
  *	@adapter: the adapter
  *	@q: the SGE free list
  *	@idx: index of buffer to recycle
  *
  *	Recycles the specified buffer on the given free list by adding it at
  *	the next available slot on the list.
  */
 static void
 recycle_rx_buf(adapter_t *adap, struct sge_fl *q, unsigned int idx)
 {
 	struct rx_desc *from = &q->desc[idx];
 	struct rx_desc *to   = &q->desc[q->pidx];
 
 	q->sdesc[q->pidx] = q->sdesc[idx];
 	to->addr_lo = from->addr_lo;        // already big endian
 	to->addr_hi = from->addr_hi;        // likewise
 	wmb();	/* necessary ? */
 	to->len_gen = htobe32(V_FLD_GEN1(q->gen));
 	to->gen2 = htobe32(V_FLD_GEN2(q->gen));
 	q->credits++;
 
 	if (++q->pidx == q->size) {
 		q->pidx = 0;
 		q->gen ^= 1;
 	}
 	t3_write_reg(adap, A_SG_KDOORBELL, V_EGRCNTX(q->cntxt_id));
 }
 
 static void
 alloc_ring_cb(void *arg, bus_dma_segment_t *segs, int nsegs, int error)
 {
 	bus_addr_t *addr;	/* callers pass a bus_addr_t *, not uint32_t * */
 
 	addr = arg;
 	*addr = segs[0].ds_addr;
 }
 
 static int
 alloc_ring(adapter_t *sc, size_t nelem, size_t elem_size, size_t sw_size,
     bus_addr_t *phys, void *desc, void *sdesc, bus_dma_tag_t *tag,
     bus_dmamap_t *map, bus_dma_tag_t parent_entry_tag, bus_dma_tag_t *entry_tag)
 {
 	size_t len = nelem * elem_size;
 	void *s = NULL;
 	void *p = NULL;
 	int err;
 
 	if ((err = bus_dma_tag_create(sc->parent_dmat, PAGE_SIZE, 0,
 				      BUS_SPACE_MAXADDR_32BIT,
 				      BUS_SPACE_MAXADDR, NULL, NULL, len, 1,
 				      len, 0, NULL, NULL, tag)) != 0) {
 		device_printf(sc->dev, "Cannot allocate descriptor tag\n");
 		return (ENOMEM);
 	}
 
 	if ((err = bus_dmamem_alloc(*tag, (void **)&p, BUS_DMA_NOWAIT,
 				    map)) != 0) {
 		device_printf(sc->dev, "Cannot allocate descriptor memory\n");
 		return (ENOMEM);
 	}
 
 	bus_dmamap_load(*tag, *map, p, len, alloc_ring_cb, phys, 0);
 	bzero(p, len);
 	*(void **)desc = p;
 
 	if (sw_size) {
 		len = nelem * sw_size;
 		s = malloc(len, M_DEVBUF, M_WAITOK|M_ZERO);
 		*(void **)sdesc = s;
 	}
 	if (parent_entry_tag == NULL)
 		return (0);
 	    
 	if ((err = bus_dma_tag_create(parent_entry_tag, 1, 0,
 				      BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR,
 		                      NULL, NULL, TX_MAX_SIZE, TX_MAX_SEGS,
 				      TX_MAX_SIZE, BUS_DMA_ALLOCNOW,
 		                      NULL, NULL, entry_tag)) != 0) {
 		device_printf(sc->dev, "Cannot allocate descriptor entry tag\n");
 		return (ENOMEM);
 	}
 	return (0);
 }
 
 static void
 sge_slow_intr_handler(void *arg, int ncount)
 {
 	adapter_t *sc = arg;
 
 	t3_slow_intr_handler(sc);
 	t3_write_reg(sc, A_PL_INT_ENABLE0, sc->slow_intr_mask);
 	(void) t3_read_reg(sc, A_PL_INT_ENABLE0);
 }
 
 /**
  *	sge_timer_cb - perform periodic maintenance of the SGE queue sets
  *	@arg: the adapter whose queue sets are maintained
  *
  *	Runs periodically from a timer to perform maintenance of the
  *	adapter's SGE queue sets.  It performs four tasks:
  *
  *	a) Cleans up any completed Tx descriptors that may still be pending.
  *	Normal descriptor cleanup happens when new packets are added to a Tx
  *	queue so this timer is relatively infrequent and does any cleanup only
  *	if the Tx queue has not seen any new packets in a while.  We make a
  *	best effort attempt to reclaim descriptors, in that we don't wait
  *	around if we cannot get a queue's lock (which most likely is because
  *	someone else is queueing new packets and so will also handle the clean
  *	up).  Since control queues use immediate data exclusively we don't
  *	bother cleaning them up here.
  *
  *	b) Replenishes Rx queues that have run out due to memory shortage.
  *	Normally new Rx buffers are added when existing ones are consumed but
  *	when out of memory a queue can become empty.  We try to add only a few
  *	buffers here, the queue will be replenished fully as these new buffers
  *	are used up if memory shortage has subsided.
  *	
  *	c) Return coalesced response queue credits in case a response queue is
  *	starved.
  *
  *	d) Ring doorbells for T304 tunnel queues since we have seen doorbell 
  *	fifo overflows and the FW doesn't implement any recovery scheme yet.
  */
 static void
 sge_timer_cb(void *arg)
 {
 	adapter_t *sc = arg;
 	if ((sc->flags & USING_MSIX) == 0) {
 		
 		struct port_info *pi;
 		struct sge_qset *qs;
 		struct sge_txq  *txq;
 		int i, j;
 		int reclaim_ofl, refill_rx;
 
 		if (sc->open_device_map == 0) 
 			return;
 
 		for (i = 0; i < sc->params.nports; i++) {
 			pi = &sc->port[i];
 			for (j = 0; j < pi->nqsets; j++) {
 				qs = &sc->sge.qs[pi->first_qset + j];
 				txq = &qs->txq[0];
 				reclaim_ofl = txq[TXQ_OFLD].processed - txq[TXQ_OFLD].cleaned;
 				refill_rx = ((qs->fl[0].credits < qs->fl[0].size) || 
 				    (qs->fl[1].credits < qs->fl[1].size));
 				if (reclaim_ofl || refill_rx) {
 					taskqueue_enqueue(sc->tq, &pi->timer_reclaim_task);
 					break;
 				}
 			}
 		}
 	}
 	
 	if (sc->params.nports > 2) {
 		int i;
 
 		for_each_port(sc, i) {
 			struct port_info *pi = &sc->port[i];
 
 			t3_write_reg(sc, A_SG_KDOORBELL, 
 				     F_SELEGRCNTX | 
 				     (FW_TUNNEL_SGEEC_START + pi->first_qset));
 		}
 	}	
 	if (((sc->flags & USING_MSIX) == 0 || sc->params.nports > 2) &&
 	    sc->open_device_map != 0)
 		callout_reset(&sc->sge_timer_ch, TX_RECLAIM_PERIOD, sge_timer_cb, sc);
 }
 
 /*
  * This is meant to be a catch-all function to keep sge state private
  * to sge.c
  *
  */
 int
 t3_sge_init_adapter(adapter_t *sc)
 {
 	callout_init(&sc->sge_timer_ch, 1);
 	callout_reset(&sc->sge_timer_ch, TX_RECLAIM_PERIOD, sge_timer_cb, sc);
 	TASK_INIT(&sc->slow_intr_task, 0, sge_slow_intr_handler, sc);
 	return (0);
 }
 
 int
 t3_sge_reset_adapter(adapter_t *sc)
 {
 	callout_reset(&sc->sge_timer_ch, TX_RECLAIM_PERIOD, sge_timer_cb, sc);
 	return (0);
 }
 
 int
 t3_sge_init_port(struct port_info *pi)
 {
 	TASK_INIT(&pi->timer_reclaim_task, 0, sge_timer_reclaim, pi);
 	return (0);
 }
 
 /**
  *	refill_rspq - replenish an SGE response queue
  *	@adapter: the adapter
  *	@q: the response queue to replenish
  *	@credits: how many new responses to make available
  *
  *	Replenishes a response queue by making the supplied number of responses
  *	available to HW.
  */
 static __inline void
 refill_rspq(adapter_t *sc, const struct sge_rspq *q, u_int credits)
 {
 
 	/* mbufs are allocated on demand when a rspq entry is processed. */
 	t3_write_reg(sc, A_SG_RSPQ_CREDIT_RETURN,
 		     V_RSPQ(q->cntxt_id) | V_CREDITS(credits));
 }
 
 static void
 sge_txq_reclaim_handler(void *arg, int ncount)
 {
 	struct sge_qset *qs = arg;
 	int i;
 
 	for (i = 0; i < 3; i++)
 		reclaim_completed_tx(qs, 16, i);
 }
 
 static void
 sge_timer_reclaim(void *arg, int ncount)
 {
 	struct port_info *pi = arg;
 	int i, nqsets = pi->nqsets;
 	adapter_t *sc = pi->adapter;
 	struct sge_qset *qs;
 	struct mtx *lock;
 	
 	KASSERT((sc->flags & USING_MSIX) == 0,
 	    ("can't call timer reclaim for msi-x"));
 
 	for (i = 0; i < nqsets; i++) {
 		qs = &sc->sge.qs[pi->first_qset + i];
 
 		reclaim_completed_tx(qs, 16, TXQ_OFLD);
 		lock = (sc->flags & USING_MSIX) ? &qs->rspq.lock :
 			    &sc->sge.qs[0].rspq.lock;
 
 		if (mtx_trylock(lock)) {
 			/* XXX currently assume that we are *NOT* polling */
 			uint32_t status = t3_read_reg(sc, A_SG_RSPQ_FL_STATUS);
 
 			if (qs->fl[0].credits < qs->fl[0].size - 16)
 				__refill_fl(sc, &qs->fl[0]);
 			if (qs->fl[1].credits < qs->fl[1].size - 16)
 				__refill_fl(sc, &qs->fl[1]);
 			
 			if (status & (1 << qs->rspq.cntxt_id)) {
 				if (qs->rspq.credits) {
 					refill_rspq(sc, &qs->rspq, 1);
 					qs->rspq.credits--;
 					t3_write_reg(sc, A_SG_RSPQ_FL_STATUS, 
 					    1 << qs->rspq.cntxt_id);
 				}
 			}
 			mtx_unlock(lock);
 		}
 	}
 }
 
 /**
  *	init_qset_cntxt - initialize an SGE queue set context info
  *	@qs: the queue set
  *	@id: the queue set id
  *
  *	Initializes the TIDs and context ids for the queues of a queue set.
  */
 static void
 init_qset_cntxt(struct sge_qset *qs, u_int id)
 {
 
 	qs->rspq.cntxt_id = id;
 	qs->fl[0].cntxt_id = 2 * id;
 	qs->fl[1].cntxt_id = 2 * id + 1;
 	qs->txq[TXQ_ETH].cntxt_id = FW_TUNNEL_SGEEC_START + id;
 	qs->txq[TXQ_ETH].token = FW_TUNNEL_TID_START + id;
 	qs->txq[TXQ_OFLD].cntxt_id = FW_OFLD_SGEEC_START + id;
 	qs->txq[TXQ_CTRL].cntxt_id = FW_CTRL_SGEEC_START + id;
 	qs->txq[TXQ_CTRL].token = FW_CTRL_TID_START + id;
 
 	/* XXX: a sane limit is needed instead of INT_MAX */
 	mbufq_init(&qs->txq[TXQ_ETH].sendq, INT_MAX);
 	mbufq_init(&qs->txq[TXQ_OFLD].sendq, INT_MAX);
 	mbufq_init(&qs->txq[TXQ_CTRL].sendq, INT_MAX);
 }
 
 
 static void
 txq_prod(struct sge_txq *txq, unsigned int ndesc, struct txq_state *txqs)
 {
 	txq->in_use += ndesc;
 	/*
 	 * XXX we don't handle stopping of queue
 	 * presumably start handles this when we bump against the end
 	 */
 	txqs->gen = txq->gen;
 	txq->unacked += ndesc;
 	txqs->compl = (txq->unacked & 32) << (S_WR_COMPL - 5);
 	txq->unacked &= 31;
 	txqs->pidx = txq->pidx;
 	txq->pidx += ndesc;
 #ifdef INVARIANTS
 	if (((txqs->pidx > txq->cidx) &&
 		(txq->pidx < txqs->pidx) &&
 		(txq->pidx >= txq->cidx)) ||
 	    ((txqs->pidx < txq->cidx) &&
 		(txq->pidx >= txq-> cidx)) ||
 	    ((txqs->pidx < txq->cidx) &&
 		(txq->cidx < txqs->pidx)))
 		panic("txqs->pidx=%d txq->pidx=%d txq->cidx=%d",
 		    txqs->pidx, txq->pidx, txq->cidx);
 #endif
 	if (txq->pidx >= txq->size) {
 		txq->pidx -= txq->size;
 		txq->gen ^= 1;
 	}
 
 }
 
 /**
  *	calc_tx_descs - calculate the number of Tx descriptors for a packet
  *	@m: the packet mbufs
  *	@nsegs: the number of segments
  *
  *	Returns the number of Tx descriptors needed for the given Ethernet
  *	packet.  Ethernet packets require addition of WR and CPL headers.
  */
 static __inline unsigned int
 calc_tx_descs(const struct mbuf *m, int nsegs)
 {
 	unsigned int flits;
 
 	if (m->m_pkthdr.len <= PIO_LEN)
 		return 1;
 
 	flits = sgl_len(nsegs) + 2;
 	if (m->m_pkthdr.csum_flags & CSUM_TSO)
 		flits++;
 
 	return flits_to_desc(flits);
 }
 
 /**
  *	make_sgl - populate a scatter/gather list for a packet
  *	@sgp: the SGL to populate
  *	@segs: the packet dma segments
  *	@nsegs: the number of segments
  *
  *	Generates a scatter/gather list for the buffers that make up a packet.
  *	Zero-length segments are skipped.  The caller must size the SGL
  *	appropriately.
  */
 static __inline void
 make_sgl(struct sg_ent *sgp, bus_dma_segment_t *segs, int nsegs)
 {
 	int i, idx;
 	
 	for (idx = 0, i = 0; i < nsegs; i++) {
 		/*
 		 * firmware doesn't like empty segments
 		 */
 		if (segs[i].ds_len == 0)
 			continue;
 		if (i && idx == 0) 
 			++sgp;
 		
 		sgp->len[idx] = htobe32(segs[i].ds_len);
 		sgp->addr[idx] = htobe64(segs[i].ds_addr);
 		idx ^= 1;
 	}
 	
 	if (idx) {
 		sgp->len[idx] = 0;
 		sgp->addr[idx] = 0;
 	}
 }
 	
 /**
  *	check_ring_tx_db - check and potentially ring a Tx queue's doorbell
  *	@adap: the adapter
  *	@q: the Tx queue
  *	@mustring: ring the doorbell now instead of batching (only used
  *	when GTS is disabled)
  *
  *	Ring the doorbell if a Tx queue is asleep.  There is a natural race
  *	in which the HW goes to sleep just after we checked; in that case
  *	the interrupt handler will detect the outstanding TX packet and ring
  *	the doorbell for us.
  *
  *	When GTS is disabled we ring the doorbell if @mustring is set or once
  *	32 doorbell requests are pending.
  */
 static __inline void
 check_ring_tx_db(adapter_t *adap, struct sge_txq *q, int mustring)
 {
 #if USE_GTS
 	clear_bit(TXQ_LAST_PKT_DB, &q->flags);
 	if (test_and_set_bit(TXQ_RUNNING, &q->flags) == 0) {
 		set_bit(TXQ_LAST_PKT_DB, &q->flags);
 #ifdef T3_TRACE
 		T3_TRACE1(adap->tb[q->cntxt_id & 7], "doorbell Tx, cntxt %d",
 			  q->cntxt_id);
 #endif
 		t3_write_reg(adap, A_SG_KDOORBELL,
 			     F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
 	}
 #else
 	if (mustring || ++q->db_pending >= 32) {
 		wmb();            /* write descriptors before telling HW */
 		t3_write_reg(adap, A_SG_KDOORBELL,
 		    F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
 		q->db_pending = 0;
 	}
 #endif
 }
 
 static __inline void
 wr_gen2(struct tx_desc *d, unsigned int gen)
 {
 #if SGE_NUM_GENBITS == 2
 	d->flit[TX_DESC_FLITS - 1] = htobe64(gen);
 #endif
 }
 
 /**
  *	write_wr_hdr_sgl - write a WR header and, optionally, SGL
  *	@ndesc: number of Tx descriptors spanned by the SGL
  *	@txd: first Tx descriptor to be written
  *	@txqs: txq state (generation and producer index)
  *	@txq: the SGE Tx queue
  *	@sgl: the SGL
  *	@flits: number of flits to the start of the SGL in the first descriptor
  *	@sgl_flits: the SGL size in flits
  *	@wr_hi: top 32 bits of WR header based on WR type (big endian)
  *	@wr_lo: low 32 bits of WR header based on WR type (big endian)
  *
  *	Write a work request header and an associated SGL.  If the SGL is
  *	small enough to fit into one Tx descriptor it has already been written
  *	and we just need to write the WR header.  Otherwise we distribute the
  *	SGL across the number of descriptors it spans.
  */
 static void
 write_wr_hdr_sgl(unsigned int ndesc, struct tx_desc *txd, struct txq_state *txqs,
     const struct sge_txq *txq, const struct sg_ent *sgl, unsigned int flits,
     unsigned int sgl_flits, unsigned int wr_hi, unsigned int wr_lo)
 {
 
 	struct work_request_hdr *wrp = (struct work_request_hdr *)txd;
 	struct tx_sw_desc *txsd = &txq->sdesc[txqs->pidx];
 	
 	if (__predict_true(ndesc == 1)) {
 		set_wr_hdr(wrp, htonl(F_WR_SOP | F_WR_EOP | V_WR_DATATYPE(1) |
 		    V_WR_SGLSFLT(flits)) | wr_hi,
 		    htonl(V_WR_LEN(flits + sgl_flits) | V_WR_GEN(txqs->gen)) |
 		    wr_lo);
 
 		wr_gen2(txd, txqs->gen);
 		
 	} else {
 		unsigned int ogen = txqs->gen;
 		const uint64_t *fp = (const uint64_t *)sgl;
 		struct work_request_hdr *wp = wrp;
 		
 		wrp->wrh_hi = htonl(F_WR_SOP | V_WR_DATATYPE(1) |
 		    V_WR_SGLSFLT(flits)) | wr_hi;
 		
 		while (sgl_flits) {
 			unsigned int avail = WR_FLITS - flits;
 
 			if (avail > sgl_flits)
 				avail = sgl_flits;
 			memcpy(&txd->flit[flits], fp, avail * sizeof(*fp));
 			sgl_flits -= avail;
 			ndesc--;
 			if (!sgl_flits)
 				break;
 			
 			fp += avail;
 			txd++;
 			txsd++;
 			if (++txqs->pidx == txq->size) {
 				txqs->pidx = 0;
 				txqs->gen ^= 1;
 				txd = txq->desc;
 				txsd = txq->sdesc;
 			}
 
 			/*
 			 * when the head of the mbuf chain
 			 * is freed all clusters will be freed
 			 * with it
 			 */
 			wrp = (struct work_request_hdr *)txd;
 			wrp->wrh_hi = htonl(V_WR_DATATYPE(1) |
 			    V_WR_SGLSFLT(1)) | wr_hi;
 			wrp->wrh_lo = htonl(V_WR_LEN(min(WR_FLITS,
 				    sgl_flits + 1)) |
 			    V_WR_GEN(txqs->gen)) | wr_lo;
 			wr_gen2(txd, txqs->gen);
 			flits = 1;
 		}
 		wrp->wrh_hi |= htonl(F_WR_EOP);
 		wmb();
 		wp->wrh_lo = htonl(V_WR_LEN(WR_FLITS) | V_WR_GEN(ogen)) | wr_lo;
 		wr_gen2((struct tx_desc *)wp, ogen);
 	}
 }
 
 /* sizeof(*eh) + sizeof(*ip) + sizeof(*tcp) */
 #define TCPPKTHDRSIZE (ETHER_HDR_LEN + 20 + 20)
 
 #define GET_VTAG(cntrl, m) \
 do { \
 	if ((m)->m_flags & M_VLANTAG)					            \
 		cntrl |= F_TXPKT_VLAN_VLD | V_TXPKT_VLAN((m)->m_pkthdr.ether_vtag); \
 } while (0)
 
 static int
 t3_encap(struct sge_qset *qs, struct mbuf **m)
 {
 	adapter_t *sc;
 	struct mbuf *m0;
 	struct sge_txq *txq;
 	struct txq_state txqs;
 	struct port_info *pi;
 	unsigned int ndesc, flits, cntrl, mlen;
 	int err, nsegs, tso_info = 0;
 
 	struct work_request_hdr *wrp;
 	struct tx_sw_desc *txsd;
 	struct sg_ent *sgp, *sgl;
 	uint32_t wr_hi, wr_lo, sgl_flits; 
 	bus_dma_segment_t segs[TX_MAX_SEGS];
 
 	struct tx_desc *txd;
 		
 	pi = qs->port;
 	sc = pi->adapter;
 	txq = &qs->txq[TXQ_ETH];
 	txd = &txq->desc[txq->pidx];
 	txsd = &txq->sdesc[txq->pidx];
 	sgl = txq->txq_sgl;
 
 	prefetch(txd);
 	m0 = *m;
 
 	mtx_assert(&qs->lock, MA_OWNED);
 	cntrl = V_TXPKT_INTF(pi->txpkt_intf);
 	KASSERT(m0->m_flags & M_PKTHDR, ("not packet header\n"));
 	
 	if  (m0->m_nextpkt == NULL && m0->m_next != NULL &&
 	    m0->m_pkthdr.csum_flags & (CSUM_TSO))
 		tso_info = V_LSO_MSS(m0->m_pkthdr.tso_segsz);
 
 	if (m0->m_nextpkt != NULL) {
 		busdma_map_sg_vec(txq->entry_tag, txsd->map, m0, segs, &nsegs);
 		ndesc = 1;
 		mlen = 0;
 	} else {
 		if ((err = busdma_map_sg_collapse(txq->entry_tag, txsd->map,
 		    &m0, segs, &nsegs))) {
 			if (cxgb_debug)
 				printf("failed ... err=%d\n", err);
 			return (err);
 		}
 		mlen = m0->m_pkthdr.len;
 		ndesc = calc_tx_descs(m0, nsegs);
 	}
 	txq_prod(txq, ndesc, &txqs);
 
 	KASSERT(m0->m_pkthdr.len, ("empty packet nsegs=%d", nsegs));
 	txsd->m = m0;
 
 	if (m0->m_nextpkt != NULL) {
 		struct cpl_tx_pkt_batch *cpl_batch = (struct cpl_tx_pkt_batch *)txd;
 		int i, fidx;
 
 		if (nsegs > 7)
 			panic("trying to coalesce %d packets in to one WR", nsegs);
 		txq->txq_coalesced += nsegs;
 		wrp = (struct work_request_hdr *)txd;
 		flits = nsegs*2 + 1;
 
 		for (fidx = 1, i = 0; i < nsegs; i++, fidx += 2) {
 			struct cpl_tx_pkt_batch_entry *cbe;
 			uint64_t flit;
 			uint32_t *hflit = (uint32_t *)&flit;
 			int cflags = m0->m_pkthdr.csum_flags;
 
 			cntrl = V_TXPKT_INTF(pi->txpkt_intf);
 			GET_VTAG(cntrl, m0);
 			cntrl |= V_TXPKT_OPCODE(CPL_TX_PKT);
 			if (__predict_false(!(cflags & CSUM_IP)))
 				cntrl |= F_TXPKT_IPCSUM_DIS;
 			if (__predict_false(!(cflags & (CSUM_TCP | CSUM_UDP |
 			    CSUM_UDP_IPV6 | CSUM_TCP_IPV6))))
 				cntrl |= F_TXPKT_L4CSUM_DIS;
 
 			hflit[0] = htonl(cntrl);
 			hflit[1] = htonl(segs[i].ds_len | 0x80000000);
 			flit |= htobe64(1 << 24);
 			cbe = &cpl_batch->pkt_entry[i];
 			cbe->cntrl = hflit[0];
 			cbe->len = hflit[1];
 			cbe->addr = htobe64(segs[i].ds_addr);
 		}
 
 		wr_hi = htonl(F_WR_SOP | F_WR_EOP | V_WR_DATATYPE(1) |
 		    V_WR_SGLSFLT(flits)) |
 		    htonl(V_WR_OP(FW_WROPCODE_TUNNEL_TX_PKT) | txqs.compl);
 		wr_lo = htonl(V_WR_LEN(flits) |
 		    V_WR_GEN(txqs.gen)) | htonl(V_WR_TID(txq->token));
 		set_wr_hdr(wrp, wr_hi, wr_lo);
 		wmb();
 		ETHER_BPF_MTAP(pi->ifp, m0);
 		wr_gen2(txd, txqs.gen);
 		check_ring_tx_db(sc, txq, 0);
 		return (0);		
 	} else if (tso_info) {
 		uint16_t eth_type;
 		struct cpl_tx_pkt_lso *hdr = (struct cpl_tx_pkt_lso *)txd;
 		struct ether_header *eh;
 		void *l3hdr;
 		struct tcphdr *tcp;
 
 		txd->flit[2] = 0;
 		GET_VTAG(cntrl, m0);
 		cntrl |= V_TXPKT_OPCODE(CPL_TX_PKT_LSO);
 		hdr->cntrl = htonl(cntrl);
 		hdr->len = htonl(mlen | 0x80000000);
 
 		if (__predict_false(mlen < TCPPKTHDRSIZE)) {
 			printf("mbuf=%p,len=%d,tso_segsz=%d,csum_flags=%b,flags=%#x",
 			    m0, mlen, m0->m_pkthdr.tso_segsz,
 			    (int)m0->m_pkthdr.csum_flags, CSUM_BITS, m0->m_flags);
 			panic("tx tso packet too small");
 		}
 
 		/* Make sure that ether, ip, tcp headers are all in m0 */
 		if (__predict_false(m0->m_len < TCPPKTHDRSIZE)) {
 			m0 = m_pullup(m0, TCPPKTHDRSIZE);
 			if (__predict_false(m0 == NULL)) {
 				/* XXX panic probably an overreaction */
 				panic("couldn't fit header into mbuf");
 			}
 		}
 
 		eh = mtod(m0, struct ether_header *);
 		eth_type = eh->ether_type;
 		if (eth_type == htons(ETHERTYPE_VLAN)) {
 			struct ether_vlan_header *evh = (void *)eh;
 
 			tso_info |= V_LSO_ETH_TYPE(CPL_ETH_II_VLAN);
 			l3hdr = evh + 1;
 			eth_type = evh->evl_proto;
 		} else {
 			tso_info |= V_LSO_ETH_TYPE(CPL_ETH_II);
 			l3hdr = eh + 1;
 		}
 
 		if (eth_type == htons(ETHERTYPE_IP)) {
 			struct ip *ip = l3hdr;
 
 			tso_info |= V_LSO_IPHDR_WORDS(ip->ip_hl);
 			tcp = (struct tcphdr *)(ip + 1);
 		} else if (eth_type == htons(ETHERTYPE_IPV6)) {
 			struct ip6_hdr *ip6 = l3hdr;
 
 			KASSERT(ip6->ip6_nxt == IPPROTO_TCP,
 			    ("%s: CSUM_TSO with ip6_nxt %d",
 			    __func__, ip6->ip6_nxt));
 
 			tso_info |= F_LSO_IPV6;
 			tso_info |= V_LSO_IPHDR_WORDS(sizeof(*ip6) >> 2);
 			tcp = (struct tcphdr *)(ip6 + 1);
 		} else
 			panic("%s: CSUM_TSO but neither ip nor ip6", __func__);
 
 		tso_info |= V_LSO_TCPHDR_WORDS(tcp->th_off);
 		hdr->lso_info = htonl(tso_info);
 
 		if (__predict_false(mlen <= PIO_LEN)) {
 			/*
 			 * Packet is not undersized but fits in PIO_LEN.
 			 * This indicates a TSO bug at the higher levels.
 			 */
 			txsd->m = NULL;
 			m_copydata(m0, 0, mlen, (caddr_t)&txd->flit[3]);
 			flits = (mlen + 7) / 8 + 3;
 			wr_hi = htonl(V_WR_BCNTLFLT(mlen & 7) |
 					  V_WR_OP(FW_WROPCODE_TUNNEL_TX_PKT) |
 					  F_WR_SOP | F_WR_EOP | txqs.compl);
 			wr_lo = htonl(V_WR_LEN(flits) |
 			    V_WR_GEN(txqs.gen) | V_WR_TID(txq->token));
 			set_wr_hdr(&hdr->wr, wr_hi, wr_lo);
 			wmb();
 			ETHER_BPF_MTAP(pi->ifp, m0);
 			wr_gen2(txd, txqs.gen);
 			check_ring_tx_db(sc, txq, 0);
 			m_freem(m0);
 			return (0);
 		}
 		flits = 3;	
 	} else {
 		struct cpl_tx_pkt *cpl = (struct cpl_tx_pkt *)txd;
 		
 		GET_VTAG(cntrl, m0);
 		cntrl |= V_TXPKT_OPCODE(CPL_TX_PKT);
 		if (__predict_false(!(m0->m_pkthdr.csum_flags & CSUM_IP)))
 			cntrl |= F_TXPKT_IPCSUM_DIS;
 		if (__predict_false(!(m0->m_pkthdr.csum_flags & (CSUM_TCP |
 		    CSUM_UDP | CSUM_UDP_IPV6 | CSUM_TCP_IPV6))))
 			cntrl |= F_TXPKT_L4CSUM_DIS;
 		cpl->cntrl = htonl(cntrl);
 		cpl->len = htonl(mlen | 0x80000000);
 
 		if (mlen <= PIO_LEN) {
 			txsd->m = NULL;
 			m_copydata(m0, 0, mlen, (caddr_t)&txd->flit[2]);
 			flits = (mlen + 7) / 8 + 2;
 			
 			wr_hi = htonl(V_WR_BCNTLFLT(mlen & 7) |
 			    V_WR_OP(FW_WROPCODE_TUNNEL_TX_PKT) |
 					  F_WR_SOP | F_WR_EOP | txqs.compl);
 			wr_lo = htonl(V_WR_LEN(flits) |
 			    V_WR_GEN(txqs.gen) | V_WR_TID(txq->token));
 			set_wr_hdr(&cpl->wr, wr_hi, wr_lo);
 			wmb();
 			ETHER_BPF_MTAP(pi->ifp, m0);
 			wr_gen2(txd, txqs.gen);
 			check_ring_tx_db(sc, txq, 0);
 			m_freem(m0);
 			return (0);
 		}
 		flits = 2;
 	}
 	wrp = (struct work_request_hdr *)txd;
 	sgp = (ndesc == 1) ? (struct sg_ent *)&txd->flit[flits] : sgl;
 	make_sgl(sgp, segs, nsegs);
 
 	sgl_flits = sgl_len(nsegs);
 
 	ETHER_BPF_MTAP(pi->ifp, m0);
 
 	KASSERT(ndesc <= 4, ("ndesc too large %d", ndesc));
 	wr_hi = htonl(V_WR_OP(FW_WROPCODE_TUNNEL_TX_PKT) | txqs.compl);
 	wr_lo = htonl(V_WR_TID(txq->token));
 	write_wr_hdr_sgl(ndesc, txd, &txqs, txq, sgl, flits,
 	    sgl_flits, wr_hi, wr_lo);
 	check_ring_tx_db(sc, txq, 0);
 
 	return (0);
 }
 
 void
 cxgb_tx_watchdog(void *arg)
 {
 	struct sge_qset *qs = arg;
 	struct sge_txq *txq = &qs->txq[TXQ_ETH];
 
 	if (qs->coalescing != 0 &&
 	    (txq->in_use <= cxgb_tx_coalesce_enable_stop) &&
 	    TXQ_RING_EMPTY(qs))
 		qs->coalescing = 0;
 	else if (qs->coalescing == 0 &&
 	    (txq->in_use >= cxgb_tx_coalesce_enable_start))
 		qs->coalescing = 1;
 	if (TXQ_TRYLOCK(qs)) {
 		qs->qs_flags |= QS_FLUSHING;
 		cxgb_start_locked(qs);
 		qs->qs_flags &= ~QS_FLUSHING;
 		TXQ_UNLOCK(qs);
 	}
 	if (qs->port->ifp->if_drv_flags & IFF_DRV_RUNNING)
 		callout_reset_on(&txq->txq_watchdog, hz/4, cxgb_tx_watchdog,
 		    qs, txq->txq_watchdog.c_cpu);
 }
 
 static void
 cxgb_tx_timeout(void *arg)
 {
 	struct sge_qset *qs = arg;
 	struct sge_txq *txq = &qs->txq[TXQ_ETH];
 
 	if (qs->coalescing == 0 && (txq->in_use >= (txq->size>>3)))
 		qs->coalescing = 1;
 	if (TXQ_TRYLOCK(qs)) {
 		qs->qs_flags |= QS_TIMEOUT;
 		cxgb_start_locked(qs);
 		qs->qs_flags &= ~QS_TIMEOUT;
 		TXQ_UNLOCK(qs);
 	}
 }
 
 static void
 cxgb_start_locked(struct sge_qset *qs)
 {
 	struct mbuf *m_head = NULL;
 	struct sge_txq *txq = &qs->txq[TXQ_ETH];
 	struct port_info *pi = qs->port;
 	struct ifnet *ifp = pi->ifp;
 
 	if (qs->qs_flags & (QS_FLUSHING|QS_TIMEOUT))
 		reclaim_completed_tx(qs, 0, TXQ_ETH);
 
 	if (!pi->link_config.link_ok) {
 		TXQ_RING_FLUSH(qs);
 		return;
 	}
 	TXQ_LOCK_ASSERT(qs);
 	while (!TXQ_RING_EMPTY(qs) && (ifp->if_drv_flags & IFF_DRV_RUNNING) &&
 	    pi->link_config.link_ok) {
 		reclaim_completed_tx(qs, cxgb_tx_reclaim_threshold, TXQ_ETH);
 
 		if (txq->size - txq->in_use <= TX_MAX_DESC)
 			break;
 
 		if ((m_head = cxgb_dequeue(qs)) == NULL)
 			break;
 		/*
 		 *  Encapsulation can modify our pointer, and/or make it
 		 *  NULL on failure.  In that event, we can't requeue.
 		 */
 		if (t3_encap(qs, &m_head) || m_head == NULL)
 			break;
 
 		m_head = NULL;
 	}
 
 	if (txq->db_pending)
 		check_ring_tx_db(pi->adapter, txq, 1);
 
 	if (!TXQ_RING_EMPTY(qs) && callout_pending(&txq->txq_timer) == 0 &&
 	    pi->link_config.link_ok)
 		callout_reset_on(&txq->txq_timer, 1, cxgb_tx_timeout,
 		    qs, txq->txq_timer.c_cpu);
 	if (m_head != NULL)
 		m_freem(m_head);
 }
 
 static int
 cxgb_transmit_locked(struct ifnet *ifp, struct sge_qset *qs, struct mbuf *m)
 {
 	struct port_info *pi = qs->port;
 	struct sge_txq *txq = &qs->txq[TXQ_ETH];
 	struct buf_ring *br = txq->txq_mr;
 	int error, avail;
 
 	avail = txq->size - txq->in_use;
 	TXQ_LOCK_ASSERT(qs);
 
 	/*
 	 * We can only do a direct transmit if the following are true:
 	 * - we aren't coalescing (ring < 3/4 full)
 	 * - the link is up -- checked in caller
 	 * - there are no packets enqueued already
 	 * - there is space in hardware transmit queue 
 	 */
 	if (check_pkt_coalesce(qs) == 0 &&
 	    !TXQ_RING_NEEDS_ENQUEUE(qs) && avail > TX_MAX_DESC) {
 		if (t3_encap(qs, &m)) {
 			if (m != NULL &&
 			    (error = drbr_enqueue(ifp, br, m)) != 0) 
 				return (error);
 		} else {
 			if (txq->db_pending)
 				check_ring_tx_db(pi->adapter, txq, 1);
 
 			/*
 			 * We've bypassed the buf ring so we need to update
 			 * the stats directly
 			 */
 			txq->txq_direct_packets++;
 			txq->txq_direct_bytes += m->m_pkthdr.len;
 		}
 	} else if ((error = drbr_enqueue(ifp, br, m)) != 0)
 		return (error);
 
 	reclaim_completed_tx(qs, cxgb_tx_reclaim_threshold, TXQ_ETH);
 	if (!TXQ_RING_EMPTY(qs) && pi->link_config.link_ok &&
 	    (!check_pkt_coalesce(qs) || (drbr_inuse(ifp, br) >= 7)))
 		cxgb_start_locked(qs);
 	else if (!TXQ_RING_EMPTY(qs) && !callout_pending(&txq->txq_timer))
 		callout_reset_on(&txq->txq_timer, 1, cxgb_tx_timeout,
 		    qs, txq->txq_timer.c_cpu);
 	return (0);
 }
 
 int
 cxgb_transmit(struct ifnet *ifp, struct mbuf *m)
 {
 	struct sge_qset *qs;
 	struct port_info *pi = ifp->if_softc;
 	int error, qidx = pi->first_qset;
 
 	if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0
 	    ||(!pi->link_config.link_ok)) {
 		m_freem(m);
 		return (0);
 	}
 
 	/* check if flowid is set */
 	if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE)	
 		qidx = (m->m_pkthdr.flowid % pi->nqsets) + pi->first_qset;
 
 	qs = &pi->adapter->sge.qs[qidx];
 	
 	if (TXQ_TRYLOCK(qs)) {
 		/* XXX running */
 		error = cxgb_transmit_locked(ifp, qs, m);
 		TXQ_UNLOCK(qs);
 	} else
 		error = drbr_enqueue(ifp, qs->txq[TXQ_ETH].txq_mr, m);
 	return (error);
 }
 
 void
 cxgb_qflush(struct ifnet *ifp)
 {
 	/*
 	 * Flush any enqueued mbufs in the buf_rings
 	 * and in the transmit queues.
 	 * No-op for now.
 	 */
 	return;
 }
 
 /**
  *	write_imm - write a packet into a Tx descriptor as immediate data
  *	@d: the Tx descriptor to write
  *	@src: the packet data, beginning with a work request header
  *	@len: the length of packet data to write as immediate data
  *	@gen: the generation bit value to write
  *
  *	Writes a packet as immediate data into a Tx descriptor.  The packet
  *	contains a work request at its beginning.  We must write the packet
  *	carefully so the SGE doesn't accidentally read it before it's written
  *	in its entirety.
  */
 static __inline void
 write_imm(struct tx_desc *d, caddr_t src,
 	  unsigned int len, unsigned int gen)
 {
 	struct work_request_hdr *from = (struct work_request_hdr *)src;
 	struct work_request_hdr *to = (struct work_request_hdr *)d;
 	uint32_t wr_hi, wr_lo;
 
 	KASSERT(len <= WR_LEN && len >= sizeof(*from),
 	    ("%s: invalid len %d", __func__, len));
 	
 	memcpy(&to[1], &from[1], len - sizeof(*from));
 	wr_hi = from->wrh_hi | htonl(F_WR_SOP | F_WR_EOP |
 	    V_WR_BCNTLFLT(len & 7));
 	wr_lo = from->wrh_lo | htonl(V_WR_GEN(gen) | V_WR_LEN((len + 7) / 8));
 	set_wr_hdr(to, wr_hi, wr_lo);
 	wmb();
 	wr_gen2(d, gen);
 }
 
 /**
  *	check_desc_avail - check descriptor availability on a send queue
  *	@adap: the adapter
  *	@q: the TX queue
  *	@m: the packet needing the descriptors
  *	@ndesc: the number of Tx descriptors needed
  *	@qid: the Tx queue number in its queue set (TXQ_OFLD or TXQ_CTRL)
  *
  *	Checks if the requested number of Tx descriptors is available on an
  *	SGE send queue.  If the queue is already suspended or not enough
  *	descriptors are available the packet is queued for later transmission.
  *	Must be called with the Tx queue locked.
  *
  *	Returns 0 if enough descriptors are available, 1 if there aren't
  *	enough descriptors and the packet has been queued, and 2 if the caller
  *	needs to retry because there weren't enough descriptors at the
  *	beginning of the call but some freed up in the meantime.
  */
 static __inline int
 check_desc_avail(adapter_t *adap, struct sge_txq *q,
 		 struct mbuf *m, unsigned int ndesc,
 		 unsigned int qid)
 {
 	/* 
 	 * XXX We currently only use this for checking the control queue.
 	 * The control queue is only used for binding qsets, which happens
 	 * at init time, so we are guaranteed enough descriptors.
 	 */
 	if (__predict_false(mbufq_len(&q->sendq))) {
 addq_exit:	(void)mbufq_enqueue(&q->sendq, m);
 		return 1;
 	}
 	if (__predict_false(q->size - q->in_use < ndesc)) {
 
 		struct sge_qset *qs = txq_to_qset(q, qid);
 
 		setbit(&qs->txq_stopped, qid);
 		if (should_restart_tx(q) &&
 		    test_and_clear_bit(qid, &qs->txq_stopped))
 			return 2;
 
 		q->stops++;
 		goto addq_exit;
 	}
 	return 0;
 }
 
 
 /**
  *	reclaim_completed_tx_imm - reclaim completed control-queue Tx descs
  *	@q: the SGE control Tx queue
  *
  *	This is a variant of reclaim_completed_tx() that is used for Tx queues
  *	that send only immediate data (presently just the control queues) and
  *	thus do not have any mbufs.
  */
 static __inline void
 reclaim_completed_tx_imm(struct sge_txq *q)
 {
 	unsigned int reclaim = q->processed - q->cleaned;
 
 	q->in_use -= reclaim;
 	q->cleaned += reclaim;
 }
 
 /**
  *	ctrl_xmit - send a packet through an SGE control Tx queue
  *	@adap: the adapter
  *	@qs: the queue set containing the control queue
  *	@m: the packet
  *
  *	Send a packet through an SGE control Tx queue.  Packets sent through
  *	a control queue must fit entirely as immediate data in a single Tx
  *	descriptor and have no page fragments.
  */
 static int
 ctrl_xmit(adapter_t *adap, struct sge_qset *qs, struct mbuf *m)
 {
 	int ret;
 	struct work_request_hdr *wrp = mtod(m, struct work_request_hdr *);
 	struct sge_txq *q = &qs->txq[TXQ_CTRL];
 	
 	KASSERT(m->m_len <= WR_LEN, ("%s: bad tx data", __func__));
 
 	wrp->wrh_hi |= htonl(F_WR_SOP | F_WR_EOP);
 	wrp->wrh_lo = htonl(V_WR_TID(q->token));
 
 	TXQ_LOCK(qs);
 again:	reclaim_completed_tx_imm(q);
 
 	ret = check_desc_avail(adap, q, m, 1, TXQ_CTRL);
 	if (__predict_false(ret)) {
 		if (ret == 1) {
 			TXQ_UNLOCK(qs);
 			return (ENOSPC);
 		}
 		goto again;
 	}
 	write_imm(&q->desc[q->pidx], m->m_data, m->m_len, q->gen);
 	
 	q->in_use++;
 	if (++q->pidx >= q->size) {
 		q->pidx = 0;
 		q->gen ^= 1;
 	}
 	TXQ_UNLOCK(qs);
 	wmb();
 	t3_write_reg(adap, A_SG_KDOORBELL,
 	    F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
 
 	m_free(m);
 	return (0);
 }
 
 
 /**
  *	restart_ctrlq - restart a suspended control queue
  *	@qs: the queue set containing the control queue
  *
  *	Resumes transmission on a suspended Tx control queue.
  */
 static void
 restart_ctrlq(void *data, int npending)
 {
 	struct mbuf *m;
 	struct sge_qset *qs = (struct sge_qset *)data;
 	struct sge_txq *q = &qs->txq[TXQ_CTRL];
 	adapter_t *adap = qs->port->adapter;
 
 	TXQ_LOCK(qs);
 again:	reclaim_completed_tx_imm(q);
 
 	while (q->in_use < q->size &&
 	       (m = mbufq_dequeue(&q->sendq)) != NULL) {
 
 		write_imm(&q->desc[q->pidx], m->m_data, m->m_len, q->gen);
 		m_free(m);
 
 		if (++q->pidx >= q->size) {
 			q->pidx = 0;
 			q->gen ^= 1;
 		}
 		q->in_use++;
 	}
 	if (mbufq_len(&q->sendq)) {
 		setbit(&qs->txq_stopped, TXQ_CTRL);
 
 		if (should_restart_tx(q) &&
 		    test_and_clear_bit(TXQ_CTRL, &qs->txq_stopped))
 			goto again;
 		q->stops++;
 	}
 	TXQ_UNLOCK(qs);
 	t3_write_reg(adap, A_SG_KDOORBELL,
 		     F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
 }
 
 
 /*
  * Send a management message through control queue 0
  */
 int
 t3_mgmt_tx(struct adapter *adap, struct mbuf *m)
 {
 	return ctrl_xmit(adap, &adap->sge.qs[0], m);
 }
 
 /**
  *	t3_free_qset - free the resources of an SGE queue set
  *	@sc: the controller owning the queue set
  *	@q: the queue set
  *
  *	Release the HW and SW resources associated with an SGE queue set, such
  *	as HW contexts, packet buffers, and descriptor rings.  Traffic to the
  *	queue set must be quiesced prior to calling this.
  */
 static void
 t3_free_qset(adapter_t *sc, struct sge_qset *q)
 {
 	int i;
 	
 	reclaim_completed_tx(q, 0, TXQ_ETH);
 	if (q->txq[TXQ_ETH].txq_mr != NULL) 
 		buf_ring_free(q->txq[TXQ_ETH].txq_mr, M_DEVBUF);
 	if (q->txq[TXQ_ETH].txq_ifq != NULL) {
 		ifq_delete(q->txq[TXQ_ETH].txq_ifq);
 		free(q->txq[TXQ_ETH].txq_ifq, M_DEVBUF);
 	}
 
 	for (i = 0; i < SGE_RXQ_PER_SET; ++i) {
 		if (q->fl[i].desc) {
 			mtx_lock_spin(&sc->sge.reg_lock);
 			t3_sge_disable_fl(sc, q->fl[i].cntxt_id);
 			mtx_unlock_spin(&sc->sge.reg_lock);
 			bus_dmamap_unload(q->fl[i].desc_tag, q->fl[i].desc_map);
 			bus_dmamem_free(q->fl[i].desc_tag, q->fl[i].desc,
 					q->fl[i].desc_map);
 			bus_dma_tag_destroy(q->fl[i].desc_tag);
 			bus_dma_tag_destroy(q->fl[i].entry_tag);
 		}
 		if (q->fl[i].sdesc) {
 			free_rx_bufs(sc, &q->fl[i]);
 			free(q->fl[i].sdesc, M_DEVBUF);
 		}
 	}
 
 	mtx_unlock(&q->lock);
 	MTX_DESTROY(&q->lock);
 	for (i = 0; i < SGE_TXQ_PER_SET; i++) {
 		if (q->txq[i].desc) {
 			mtx_lock_spin(&sc->sge.reg_lock);
 			t3_sge_enable_ecntxt(sc, q->txq[i].cntxt_id, 0);
 			mtx_unlock_spin(&sc->sge.reg_lock);
 			bus_dmamap_unload(q->txq[i].desc_tag,
 					q->txq[i].desc_map);
 			bus_dmamem_free(q->txq[i].desc_tag, q->txq[i].desc,
 					q->txq[i].desc_map);
 			bus_dma_tag_destroy(q->txq[i].desc_tag);
 			bus_dma_tag_destroy(q->txq[i].entry_tag);
 		}
 		if (q->txq[i].sdesc) {
 			free(q->txq[i].sdesc, M_DEVBUF);
 		}
 	}
 
 	if (q->rspq.desc) {
 		mtx_lock_spin(&sc->sge.reg_lock);
 		t3_sge_disable_rspcntxt(sc, q->rspq.cntxt_id);
 		mtx_unlock_spin(&sc->sge.reg_lock);
 		
 		bus_dmamap_unload(q->rspq.desc_tag, q->rspq.desc_map);
 		bus_dmamem_free(q->rspq.desc_tag, q->rspq.desc,
 			        q->rspq.desc_map);
 		bus_dma_tag_destroy(q->rspq.desc_tag);
 		MTX_DESTROY(&q->rspq.lock);
 	}
 
 #if defined(INET6) || defined(INET)
 	tcp_lro_free(&q->lro.ctrl);
 #endif
 
 	bzero(q, sizeof(*q));
 }
 
 /**
  *	t3_free_sge_resources - free SGE resources
  *	@sc: the adapter softc
  *
  *	Frees resources used by the SGE queue sets.
  */
 void
 t3_free_sge_resources(adapter_t *sc, int nqsets)
 {
 	int i;
 
 	for (i = 0; i < nqsets; ++i) {
 		TXQ_LOCK(&sc->sge.qs[i]);
 		t3_free_qset(sc, &sc->sge.qs[i]);
 	}
 }
 
 /**
  *	t3_sge_start - enable SGE
  *	@sc: the controller softc
  *
  *	Enables the SGE for DMAs.  This is the last step in starting packet
  *	transfers.
  */
 void
 t3_sge_start(adapter_t *sc)
 {
 	t3_set_reg_field(sc, A_SG_CONTROL, F_GLOBALENABLE, F_GLOBALENABLE);
 }
 
 /**
  *	t3_sge_stop - disable SGE operation
  *	@sc: the adapter
  *
  *	Disables the DMA engine.  This can be called in emergencies (e.g.,
  *	from error interrupts) or from normal process context.  In the latter
  *	case it also disables any pending queue restart tasklets.  Note that
  *	if it is called in interrupt context it cannot disable the restart
  *	tasklets as it cannot wait, however the tasklets will have no effect
  *	since the doorbells are disabled and the driver will call this again
  *	later from process context, at which time the tasklets will be stopped
  *	if they are still running.
  */
 void
 t3_sge_stop(adapter_t *sc)
 {
 	int i, nqsets;
 	
 	t3_set_reg_field(sc, A_SG_CONTROL, F_GLOBALENABLE, 0);
 
 	if (sc->tq == NULL)
 		return;
 	
 	for (nqsets = i = 0; i < (sc)->params.nports; i++) 
 		nqsets += sc->port[i].nqsets;
 #ifdef notyet
 	/*
 	 * 
 	 * XXX
 	 */
 	for (i = 0; i < nqsets; ++i) {
 		struct sge_qset *qs = &sc->sge.qs[i];
 		
 		taskqueue_drain(sc->tq, &qs->txq[TXQ_OFLD].qresume_task);
 		taskqueue_drain(sc->tq, &qs->txq[TXQ_CTRL].qresume_task);
 	}
 #endif
 }
 
 /**
  *	t3_free_tx_desc - reclaims Tx descriptors and their buffers
  *	@qs: the queue set containing the Tx queue
  *	@reclaimable: the number of descriptors to reclaim
  *	@queue: index of the Tx queue within the queue set
  *
  *	Reclaims Tx descriptors from an SGE Tx queue and frees the associated
  *	Tx buffers.  Called with the Tx queue lock held.
  */
 void
 t3_free_tx_desc(struct sge_qset *qs, int reclaimable, int queue)
 {
 	struct tx_sw_desc *txsd;
 	unsigned int cidx, mask;
 	struct sge_txq *q = &qs->txq[queue];
 
 #ifdef T3_TRACE
 	T3_TRACE2(sc->tb[q->cntxt_id & 7],
 		  "reclaiming %u Tx descriptors at cidx %u", reclaimable, cidx);
 #endif
 	cidx = q->cidx;
 	mask = q->size - 1;
 	txsd = &q->sdesc[cidx];
 
 	mtx_assert(&qs->lock, MA_OWNED);
 	while (reclaimable--) {
 		prefetch(q->sdesc[(cidx + 1) & mask].m);
 		prefetch(q->sdesc[(cidx + 2) & mask].m);
 
 		if (txsd->m != NULL) {
 			if (txsd->flags & TX_SW_DESC_MAPPED) {
 				bus_dmamap_unload(q->entry_tag, txsd->map);
 				txsd->flags &= ~TX_SW_DESC_MAPPED;
 			}
 			m_freem_list(txsd->m);
 			txsd->m = NULL;
 		} else
 			q->txq_skipped++;
 		
 		++txsd;
 		if (++cidx == q->size) {
 			cidx = 0;
 			txsd = q->sdesc;
 		}
 	}
 	q->cidx = cidx;
 
 }
 
 /**
  *	is_new_response - check if a response is newly written
  *	@r: the response descriptor
  *	@q: the response queue
  *
  *	Returns true if a response descriptor contains a yet unprocessed
  *	response.
  */
 static __inline int
 is_new_response(const struct rsp_desc *r,
     const struct sge_rspq *q)
 {
 	return (r->intr_gen & F_RSPD_GEN2) == q->gen;
 }
 
 #define RSPD_GTS_MASK  (F_RSPD_TXQ0_GTS | F_RSPD_TXQ1_GTS)
 #define RSPD_CTRL_MASK (RSPD_GTS_MASK | \
 			V_RSPD_TXQ0_CR(M_RSPD_TXQ0_CR) | \
 			V_RSPD_TXQ1_CR(M_RSPD_TXQ1_CR) | \
 			V_RSPD_TXQ2_CR(M_RSPD_TXQ2_CR))
 
 /* How long to delay the next interrupt in case of memory shortage, in 0.1us. */
 #define NOMEM_INTR_DELAY 2500
 
 #ifdef TCP_OFFLOAD
 /**
  *	write_ofld_wr - write an offload work request
  *	@adap: the adapter
  *	@m: the packet to send
  *	@q: the Tx queue
  *	@pidx: index of the first Tx descriptor to write
  *	@gen: the generation value to use
  *	@ndesc: number of descriptors the packet will occupy
  *
  *	Write an offload work request to send the supplied packet.  The packet
  *	data already carry the work request with most fields populated.
  */
 static void
 write_ofld_wr(adapter_t *adap, struct mbuf *m, struct sge_txq *q,
     unsigned int pidx, unsigned int gen, unsigned int ndesc)
 {
 	unsigned int sgl_flits, flits;
 	int i, idx, nsegs, wrlen;
 	struct work_request_hdr *from;
 	struct sg_ent *sgp, t3sgl[TX_MAX_SEGS / 2 + 1];
 	struct tx_desc *d = &q->desc[pidx];
 	struct txq_state txqs;
 	struct sglist_seg *segs;
 	struct ofld_hdr *oh = mtod(m, struct ofld_hdr *);
 	struct sglist *sgl;
 
 	from = (void *)(oh + 1);	/* Start of WR within mbuf */
 	wrlen = m->m_len - sizeof(*oh);
 
 	if (!(oh->flags & F_HDR_SGL)) {
 		write_imm(d, (caddr_t)from, wrlen, gen);
 
 		/*
 		 * mbuf with "real" immediate tx data will be enqueue_wr'd by
 		 * t3_push_frames and freed in wr_ack.  Others, like those sent
 		 * down by close_conn, t3_send_reset, etc. should be freed here.
 		 */
 		if (!(oh->flags & F_HDR_DF))
 			m_free(m);
 		return;
 	}
 
 	memcpy(&d->flit[1], &from[1], wrlen - sizeof(*from));
 
 	sgl = oh->sgl;
 	flits = wrlen / 8;
 	sgp = (ndesc == 1) ? (struct sg_ent *)&d->flit[flits] : t3sgl;
 
 	nsegs = sgl->sg_nseg;
 	segs = sgl->sg_segs;
 	for (idx = 0, i = 0; i < nsegs; i++) {
 		KASSERT(segs[i].ss_len, ("%s: 0 len in sgl", __func__));
 		if (i && idx == 0) 
 			++sgp;
 		sgp->len[idx] = htobe32(segs[i].ss_len);
 		sgp->addr[idx] = htobe64(segs[i].ss_paddr);
 		idx ^= 1;
 	}
 	if (idx) {
 		sgp->len[idx] = 0;
 		sgp->addr[idx] = 0;
 	}
 
 	sgl_flits = sgl_len(nsegs);
 	txqs.gen = gen;
 	txqs.pidx = pidx;
 	txqs.compl = 0;
 
 	write_wr_hdr_sgl(ndesc, d, &txqs, q, t3sgl, flits, sgl_flits,
 	    from->wrh_hi, from->wrh_lo);
 }
 
 /**
  *	ofld_xmit - send a packet through an offload queue
  *	@adap: the adapter
  *	@qs: the queue set containing the Tx offload queue
  *	@m: the packet
  *
  *	Send an offload packet through an SGE offload queue.
  */
 static int
 ofld_xmit(adapter_t *adap, struct sge_qset *qs, struct mbuf *m)
 {
 	int ret;
 	unsigned int ndesc;
 	unsigned int pidx, gen;
 	struct sge_txq *q = &qs->txq[TXQ_OFLD];
 	struct ofld_hdr *oh = mtod(m, struct ofld_hdr *);
 
 	ndesc = G_HDR_NDESC(oh->flags);
 
 	TXQ_LOCK(qs);
 again:	reclaim_completed_tx(qs, 16, TXQ_OFLD);
 	ret = check_desc_avail(adap, q, m, ndesc, TXQ_OFLD);
 	if (__predict_false(ret)) {
 		if (ret == 1) {
 			TXQ_UNLOCK(qs);
 			return (EINTR);
 		}
 		goto again;
 	}
 
 	gen = q->gen;
 	q->in_use += ndesc;
 	pidx = q->pidx;
 	q->pidx += ndesc;
 	if (q->pidx >= q->size) {
 		q->pidx -= q->size;
 		q->gen ^= 1;
 	}
 
 	write_ofld_wr(adap, m, q, pidx, gen, ndesc);
 	check_ring_tx_db(adap, q, 1);
 	TXQ_UNLOCK(qs);
 
 	return (0);
 }
 
 /**
  *	restart_offloadq - restart a suspended offload queue
  *	@qs: the queue set containing the offload queue
  *
  *	Resumes transmission on a suspended Tx offload queue.
  */
 static void
 restart_offloadq(void *data, int npending)
 {
 	struct mbuf *m;
 	struct sge_qset *qs = data;
 	struct sge_txq *q = &qs->txq[TXQ_OFLD];
 	adapter_t *adap = qs->port->adapter;
 	int cleaned;
 		
 	TXQ_LOCK(qs);
 again:	cleaned = reclaim_completed_tx(qs, 16, TXQ_OFLD);
 
 	while ((m = mbufq_first(&q->sendq)) != NULL) {
 		unsigned int gen, pidx;
 		struct ofld_hdr *oh = mtod(m, struct ofld_hdr *);
 		unsigned int ndesc = G_HDR_NDESC(oh->flags);
 
 		if (__predict_false(q->size - q->in_use < ndesc)) {
 			setbit(&qs->txq_stopped, TXQ_OFLD);
 			if (should_restart_tx(q) &&
 			    test_and_clear_bit(TXQ_OFLD, &qs->txq_stopped))
 				goto again;
 			q->stops++;
 			break;
 		}
 
 		gen = q->gen;
 		q->in_use += ndesc;
 		pidx = q->pidx;
 		q->pidx += ndesc;
 		if (q->pidx >= q->size) {
 			q->pidx -= q->size;
 			q->gen ^= 1;
 		}
 		
 		(void)mbufq_dequeue(&q->sendq);
 		TXQ_UNLOCK(qs);
 		write_ofld_wr(adap, m, q, pidx, gen, ndesc);
 		TXQ_LOCK(qs);
 	}
 #if USE_GTS
 	set_bit(TXQ_RUNNING, &q->flags);
 	set_bit(TXQ_LAST_PKT_DB, &q->flags);
 #endif
 	TXQ_UNLOCK(qs);
 	wmb();
 	t3_write_reg(adap, A_SG_KDOORBELL,
 		     F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
 }
 
 /**
  *	t3_offload_tx - send an offload packet
  *	@m: the packet
  *
  *	Sends an offload packet.  We use the packet priority to select the
  *	appropriate Tx queue as follows: bit 0 indicates whether the packet
  *	should be sent as regular or control, bits 1-3 select the queue set.
  */
 int
 t3_offload_tx(struct adapter *sc, struct mbuf *m)
 {
 	struct ofld_hdr *oh = mtod(m, struct ofld_hdr *);
 	struct sge_qset *qs = &sc->sge.qs[G_HDR_QSET(oh->flags)];
 
 	if (oh->flags & F_HDR_CTRL) {
 		m_adj(m, sizeof (*oh));	/* trim ofld_hdr off */
 		return (ctrl_xmit(sc, qs, m));
 	} else
 		return (ofld_xmit(sc, qs, m));
 }
 #endif
 
 static void
 restart_tx(struct sge_qset *qs)
 {
 	struct adapter *sc = qs->port->adapter;
 
 	if (isset(&qs->txq_stopped, TXQ_OFLD) &&
 	    should_restart_tx(&qs->txq[TXQ_OFLD]) &&
 	    test_and_clear_bit(TXQ_OFLD, &qs->txq_stopped)) {
 		qs->txq[TXQ_OFLD].restarts++;
 		taskqueue_enqueue(sc->tq, &qs->txq[TXQ_OFLD].qresume_task);
 	}
 
 	if (isset(&qs->txq_stopped, TXQ_CTRL) &&
 	    should_restart_tx(&qs->txq[TXQ_CTRL]) &&
 	    test_and_clear_bit(TXQ_CTRL, &qs->txq_stopped)) {
 		qs->txq[TXQ_CTRL].restarts++;
 		taskqueue_enqueue(sc->tq, &qs->txq[TXQ_CTRL].qresume_task);
 	}
 }
 
 /**
  *	t3_sge_alloc_qset - initialize an SGE queue set
  *	@sc: the controller softc
  *	@id: the queue set id
  *	@nports: how many Ethernet ports will be using this queue set
  *	@irq_vec_idx: the IRQ vector index for response queue interrupts
  *	@p: configuration parameters for this queue set
  *	@ntxq: number of Tx queues for the queue set
  *	@pi: port info for queue set
  *
  *	Allocate resources and initialize an SGE queue set.  A queue set
  *	comprises a response queue, two Rx free-buffer queues, and up to 3
  *	Tx queues.  The Tx queues are assigned roles in the order Ethernet
  *	queue, offload queue, and control queue.
  */
 int
 t3_sge_alloc_qset(adapter_t *sc, u_int id, int nports, int irq_vec_idx,
 		  const struct qset_params *p, int ntxq, struct port_info *pi)
 {
 	struct sge_qset *q = &sc->sge.qs[id];
 	int i, ret = 0;
 
 	MTX_INIT(&q->lock, q->namebuf, NULL, MTX_DEF);
 	q->port = pi;
 	q->adap = sc;
 
 	if ((q->txq[TXQ_ETH].txq_mr = buf_ring_alloc(cxgb_txq_buf_ring_size,
 	    M_DEVBUF, M_WAITOK, &q->lock)) == NULL) {
 		device_printf(sc->dev, "failed to allocate mbuf ring\n");
 		goto err;
 	}
 	if ((q->txq[TXQ_ETH].txq_ifq = malloc(sizeof(struct ifaltq), M_DEVBUF,
 	    M_NOWAIT | M_ZERO)) == NULL) {
 		device_printf(sc->dev, "failed to allocate ifq\n");
 		goto err;
 	}
 	ifq_init(q->txq[TXQ_ETH].txq_ifq, pi->ifp);	
 	callout_init(&q->txq[TXQ_ETH].txq_timer, 1);
 	callout_init(&q->txq[TXQ_ETH].txq_watchdog, 1);
 	q->txq[TXQ_ETH].txq_timer.c_cpu = id % mp_ncpus;
 	q->txq[TXQ_ETH].txq_watchdog.c_cpu = id % mp_ncpus;
 
 	init_qset_cntxt(q, id);
 	q->idx = id;
 	if ((ret = alloc_ring(sc, p->fl_size, sizeof(struct rx_desc),
 		    sizeof(struct rx_sw_desc), &q->fl[0].phys_addr,
 		    &q->fl[0].desc, &q->fl[0].sdesc,
 		    &q->fl[0].desc_tag, &q->fl[0].desc_map,
 		    sc->rx_dmat, &q->fl[0].entry_tag)) != 0) {
 		printf("error %d from alloc ring fl0\n", ret);
 		goto err;
 	}
 
 	if ((ret = alloc_ring(sc, p->jumbo_size, sizeof(struct rx_desc),
 		    sizeof(struct rx_sw_desc), &q->fl[1].phys_addr,
 		    &q->fl[1].desc, &q->fl[1].sdesc,
 		    &q->fl[1].desc_tag, &q->fl[1].desc_map,
 		    sc->rx_jumbo_dmat, &q->fl[1].entry_tag)) != 0) {
 		printf("error %d from alloc ring fl1\n", ret);
 		goto err;
 	}
 
 	if ((ret = alloc_ring(sc, p->rspq_size, sizeof(struct rsp_desc), 0,
 		    &q->rspq.phys_addr, &q->rspq.desc, NULL,
 		    &q->rspq.desc_tag, &q->rspq.desc_map,
 		    NULL, NULL)) != 0) {
 		printf("error %d from alloc ring rspq\n", ret);
 		goto err;
 	}
 
 	snprintf(q->rspq.lockbuf, RSPQ_NAME_LEN, "t3 rspq lock %d:%d",
 	    device_get_unit(sc->dev), irq_vec_idx);
 	MTX_INIT(&q->rspq.lock, q->rspq.lockbuf, NULL, MTX_DEF);
 
 	for (i = 0; i < ntxq; ++i) {
 		size_t sz = i == TXQ_CTRL ? 0 : sizeof(struct tx_sw_desc);
 
 		if ((ret = alloc_ring(sc, p->txq_size[i],
 			    sizeof(struct tx_desc), sz,
 			    &q->txq[i].phys_addr, &q->txq[i].desc,
 			    &q->txq[i].sdesc, &q->txq[i].desc_tag,
 			    &q->txq[i].desc_map,
 			    sc->tx_dmat, &q->txq[i].entry_tag)) != 0) {
 			printf("error %d from alloc ring tx %i\n", ret, i);
 			goto err;
 		}
 		mbufq_init(&q->txq[i].sendq, INT_MAX);
 		q->txq[i].gen = 1;
 		q->txq[i].size = p->txq_size[i];
 	}
 
 #ifdef TCP_OFFLOAD
 	TASK_INIT(&q->txq[TXQ_OFLD].qresume_task, 0, restart_offloadq, q);
 #endif
 	TASK_INIT(&q->txq[TXQ_CTRL].qresume_task, 0, restart_ctrlq, q);
 	TASK_INIT(&q->txq[TXQ_ETH].qreclaim_task, 0, sge_txq_reclaim_handler, q);
 	TASK_INIT(&q->txq[TXQ_OFLD].qreclaim_task, 0, sge_txq_reclaim_handler, q);
 
 	q->fl[0].gen = q->fl[1].gen = 1;
 	q->fl[0].size = p->fl_size;
 	q->fl[1].size = p->jumbo_size;
 
 	q->rspq.gen = 1;
 	q->rspq.cidx = 0;
 	q->rspq.size = p->rspq_size;
 
 	q->txq[TXQ_ETH].stop_thres = nports *
 	    flits_to_desc(sgl_len(TX_MAX_SEGS + 1) + 3);
 
 	q->fl[0].buf_size = MCLBYTES;
 	q->fl[0].zone = zone_pack;
 	q->fl[0].type = EXT_PACKET;
 
 	if (p->jumbo_buf_size ==  MJUM16BYTES) {
 		q->fl[1].zone = zone_jumbo16;
 		q->fl[1].type = EXT_JUMBO16;
 	} else if (p->jumbo_buf_size ==  MJUM9BYTES) {
 		q->fl[1].zone = zone_jumbo9;
 		q->fl[1].type = EXT_JUMBO9;		
 	} else if (p->jumbo_buf_size ==  MJUMPAGESIZE) {
 		q->fl[1].zone = zone_jumbop;
 		q->fl[1].type = EXT_JUMBOP;
 	} else {
 		KASSERT(0, ("can't deal with jumbo_buf_size %d.", p->jumbo_buf_size));
 		ret = EDOOFUS;
 		goto err;
 	}
 	q->fl[1].buf_size = p->jumbo_buf_size;
 
 	/* Allocate and setup the lro_ctrl structure */
 	q->lro.enabled = !!(pi->ifp->if_capenable & IFCAP_LRO);
 #if defined(INET6) || defined(INET)
 	ret = tcp_lro_init(&q->lro.ctrl);
 	if (ret) {
 		printf("error %d from tcp_lro_init\n", ret);
 		goto err;
 	}
 #endif
 	q->lro.ctrl.ifp = pi->ifp;
 
 	mtx_lock_spin(&sc->sge.reg_lock);
 	ret = -t3_sge_init_rspcntxt(sc, q->rspq.cntxt_id, irq_vec_idx,
 				   q->rspq.phys_addr, q->rspq.size,
 				   q->fl[0].buf_size, 1, 0);
 	if (ret) {
 		printf("error %d from t3_sge_init_rspcntxt\n", ret);
 		goto err_unlock;
 	}
 
 	for (i = 0; i < SGE_RXQ_PER_SET; ++i) {
 		ret = -t3_sge_init_flcntxt(sc, q->fl[i].cntxt_id, 0,
 					  q->fl[i].phys_addr, q->fl[i].size,
 					  q->fl[i].buf_size, p->cong_thres, 1,
 					  0);
 		if (ret) {
 			printf("error %d from t3_sge_init_flcntxt for index i=%d\n", ret, i);
 			goto err_unlock;
 		}
 	}
 
 	ret = -t3_sge_init_ecntxt(sc, q->txq[TXQ_ETH].cntxt_id, USE_GTS,
 				 SGE_CNTXT_ETH, id, q->txq[TXQ_ETH].phys_addr,
 				 q->txq[TXQ_ETH].size, q->txq[TXQ_ETH].token,
 				 1, 0);
 	if (ret) {
 		printf("error %d from t3_sge_init_ecntxt\n", ret);
 		goto err_unlock;
 	}
 
 	if (ntxq > 1) {
 		ret = -t3_sge_init_ecntxt(sc, q->txq[TXQ_OFLD].cntxt_id,
 					 USE_GTS, SGE_CNTXT_OFLD, id,
 					 q->txq[TXQ_OFLD].phys_addr,
 					 q->txq[TXQ_OFLD].size, 0, 1, 0);
 		if (ret) {
 			printf("error %d from t3_sge_init_ecntxt\n", ret);
 			goto err_unlock;
 		}
 	}
 
 	if (ntxq > 2) {
 		ret = -t3_sge_init_ecntxt(sc, q->txq[TXQ_CTRL].cntxt_id, 0,
 					 SGE_CNTXT_CTRL, id,
 					 q->txq[TXQ_CTRL].phys_addr,
 					 q->txq[TXQ_CTRL].size,
 					 q->txq[TXQ_CTRL].token, 1, 0);
 		if (ret) {
 			printf("error %d from t3_sge_init_ecntxt\n", ret);
 			goto err_unlock;
 		}
 	}
 
 	mtx_unlock_spin(&sc->sge.reg_lock);
 	t3_update_qset_coalesce(q, p);
 
 	refill_fl(sc, &q->fl[0], q->fl[0].size);
 	refill_fl(sc, &q->fl[1], q->fl[1].size);
 	refill_rspq(sc, &q->rspq, q->rspq.size - 1);
 
 	t3_write_reg(sc, A_SG_GTS, V_RSPQ(q->rspq.cntxt_id) |
 		     V_NEWTIMER(q->rspq.holdoff_tmr));
 
 	return (0);
 
 err_unlock:
 	mtx_unlock_spin(&sc->sge.reg_lock);
 err:	
 	TXQ_LOCK(q);
 	t3_free_qset(sc, q);
 
 	return (ret);
 }
 
 /*
  * Remove CPL_RX_PKT headers from the mbuf and reduce it to a regular mbuf with
  * ethernet data.  Hardware assistance with various checksums and any vlan tag
  * will also be taken into account here.
  */
 void
 t3_rx_eth(struct adapter *adap, struct mbuf *m, int ethpad)
 {
 	struct cpl_rx_pkt *cpl = (struct cpl_rx_pkt *)(mtod(m, uint8_t *) + ethpad);
 	struct port_info *pi = &adap->port[adap->rxpkt_map[cpl->iff]];
 	struct ifnet *ifp = pi->ifp;
 	
 	if (cpl->vlan_valid) {
 		m->m_pkthdr.ether_vtag = ntohs(cpl->vlan);
 		m->m_flags |= M_VLANTAG;
 	} 
 
 	m->m_pkthdr.rcvif = ifp;
 	/*
 	 * adjust after conversion to mbuf chain
 	 */
 	m->m_pkthdr.len -= (sizeof(*cpl) + ethpad);
 	m->m_len -= (sizeof(*cpl) + ethpad);
 	m->m_data += (sizeof(*cpl) + ethpad);
 
 	if (!cpl->fragment && cpl->csum_valid && cpl->csum == 0xffff) {
 		struct ether_header *eh = mtod(m, void *);
 		uint16_t eh_type;
 
 		if (eh->ether_type == htons(ETHERTYPE_VLAN)) {
 			struct ether_vlan_header *evh = mtod(m, void *);
 
 			eh_type = evh->evl_proto;
 		} else
 			eh_type = eh->ether_type;
 
 		if (ifp->if_capenable & IFCAP_RXCSUM &&
 		    eh_type == htons(ETHERTYPE_IP)) {
 			m->m_pkthdr.csum_flags = (CSUM_IP_CHECKED |
 			    CSUM_IP_VALID | CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
 			m->m_pkthdr.csum_data = 0xffff;
 		} else if (ifp->if_capenable & IFCAP_RXCSUM_IPV6 &&
 		    eh_type == htons(ETHERTYPE_IPV6)) {
 			m->m_pkthdr.csum_flags = (CSUM_DATA_VALID_IPV6 |
 			    CSUM_PSEUDO_HDR);
 			m->m_pkthdr.csum_data = 0xffff;
 		}
 	}
 }
 
 /**
  *	get_packet - return the next ingress packet buffer from a free list
  *	@adap: the adapter that received the packet
  *	@drop_thres: # of remaining buffers before we start dropping packets
  *	@qs: the qset that the SGE free list holding the packet belongs to
  *      @mh: the mbuf header, contains a pointer to the head and tail of the mbuf chain
  *      @r: response descriptor 
  *
  *	Get the next packet from a free list and complete setup of the
  *	mbuf.  If the packet is small we make a copy and recycle the
  *	original buffer, otherwise we use the original buffer itself.  If a
  *	positive drop threshold is supplied packets are dropped and their
  *	buffers recycled if (a) the number of remaining buffers is under the
  *	threshold and the packet is too big to copy, or (b) the packet should
  *	be copied but there is no memory for the copy.
  */
 static int
 get_packet(adapter_t *adap, unsigned int drop_thres, struct sge_qset *qs,
     struct t3_mbuf_hdr *mh, struct rsp_desc *r)
 {
 
 	unsigned int len_cq =  ntohl(r->len_cq);
 	struct sge_fl *fl = (len_cq & F_RSPD_FLQ) ? &qs->fl[1] : &qs->fl[0];
 	int mask, cidx = fl->cidx;
 	struct rx_sw_desc *sd = &fl->sdesc[cidx];
 	uint32_t len = G_RSPD_LEN(len_cq);
 	uint32_t flags = M_EXT;
 	uint8_t sopeop = G_RSPD_SOP_EOP(ntohl(r->flags));
 	caddr_t cl;
 	struct mbuf *m;
 	int ret = 0;
 
 	mask = fl->size - 1;
 	prefetch(fl->sdesc[(cidx + 1) & mask].m);
 	prefetch(fl->sdesc[(cidx + 2) & mask].m);
 	prefetch(fl->sdesc[(cidx + 1) & mask].rxsd_cl);
 	prefetch(fl->sdesc[(cidx + 2) & mask].rxsd_cl);	
 
 	fl->credits--;
 	bus_dmamap_sync(fl->entry_tag, sd->map, BUS_DMASYNC_POSTREAD);
 	
 	if (recycle_enable && len <= SGE_RX_COPY_THRES &&
 	    sopeop == RSPQ_SOP_EOP) {
 		if ((m = m_gethdr(M_NOWAIT, MT_DATA)) == NULL)
 			goto skip_recycle;
 		cl = mtod(m, void *);
 		memcpy(cl, sd->rxsd_cl, len);
 		recycle_rx_buf(adap, fl, fl->cidx);
 		m->m_pkthdr.len = m->m_len = len;
 		m->m_flags = 0;
 		mh->mh_head = mh->mh_tail = m;
 		ret = 1;
 		goto done;
 	} else {
 	skip_recycle:
 		bus_dmamap_unload(fl->entry_tag, sd->map);
 		cl = sd->rxsd_cl;
 		m = sd->m;
 
 		if ((sopeop == RSPQ_SOP_EOP) ||
 		    (sopeop == RSPQ_SOP))
 			flags |= M_PKTHDR;
 		m_init(m, M_NOWAIT, MT_DATA, flags);
 		if (fl->zone == zone_pack) {
 			/*
 			 * restore clobbered data pointer
 			 */
 			m->m_data = m->m_ext.ext_buf;
 		} else {
 			m_cljset(m, cl, fl->type);
 		}
 		m->m_len = len;
 	}		
 	switch(sopeop) {
 	case RSPQ_SOP_EOP:
 		ret = 1;
 		/* FALLTHROUGH */
 	case RSPQ_SOP:
 		mh->mh_head = mh->mh_tail = m;
 		m->m_pkthdr.len = len;
 		break;
 	case RSPQ_EOP:
 		ret = 1;
 		/* FALLTHROUGH */
 	case RSPQ_NSOP_NEOP:
 		if (mh->mh_tail == NULL) {
 			log(LOG_ERR, "discarding intermediate descriptor entry\n");
 			m_freem(m);
 			break;
 		}
 		mh->mh_tail->m_next = m;
 		mh->mh_tail = m;
 		mh->mh_head->m_pkthdr.len += len;
 		break;
 	}
 	if (cxgb_debug)
 		printf("len=%d pktlen=%d\n", m->m_len, m->m_pkthdr.len);
 done:
 	if (++fl->cidx == fl->size)
 		fl->cidx = 0;
 
 	return (ret);
 }
 
 /**
  *	handle_rsp_cntrl_info - handles control information in a response
  *	@qs: the queue set corresponding to the response
  *	@flags: the response control flags
  *
  *	Handles the control information of an SGE response, such as GTS
  *	indications and completion credits for the queue set's Tx queues.
  *	HW coalesces credits, we don't do any extra SW coalescing.
  */
 static __inline void
 handle_rsp_cntrl_info(struct sge_qset *qs, uint32_t flags)
 {
 	unsigned int credits;
 
 #if USE_GTS
 	if (flags & F_RSPD_TXQ0_GTS)
 		clear_bit(TXQ_RUNNING, &qs->txq[TXQ_ETH].flags);
 #endif
 	credits = G_RSPD_TXQ0_CR(flags);
 	if (credits) 
 		qs->txq[TXQ_ETH].processed += credits;
 
 	credits = G_RSPD_TXQ2_CR(flags);
 	if (credits)
 		qs->txq[TXQ_CTRL].processed += credits;
 
 # if USE_GTS
 	if (flags & F_RSPD_TXQ1_GTS)
 		clear_bit(TXQ_RUNNING, &qs->txq[TXQ_OFLD].flags);
 # endif
 	credits = G_RSPD_TXQ1_CR(flags);
 	if (credits)
 		qs->txq[TXQ_OFLD].processed += credits;
 
 }
 
 static void
 check_ring_db(adapter_t *adap, struct sge_qset *qs,
     unsigned int sleeping)
 {
 	;
 }
 
 /**
  *	process_responses - process responses from an SGE response queue
  *	@adap: the adapter
  *	@qs: the queue set to which the response queue belongs
  *	@budget: how many responses can be processed in this round
  *
  *	Process responses from an SGE response queue up to the supplied budget.
  *	Responses include received packets as well as credits and other events
  *	for the queues that belong to the response queue's queue set.
  *	A negative budget is effectively unlimited.
  *
  *	Additionally choose the interrupt holdoff time for the next interrupt
  *	on this queue.  If the system is under memory shortage use a fairly
  *	long delay to help recovery.
  */
 static int
 process_responses(adapter_t *adap, struct sge_qset *qs, int budget)
 {
 	struct sge_rspq *rspq = &qs->rspq;
 	struct rsp_desc *r = &rspq->desc[rspq->cidx];
 	int budget_left = budget;
 	unsigned int sleeping = 0;
 #if defined(INET6) || defined(INET)
 	int lro_enabled = qs->lro.enabled;
 	int skip_lro;
 	struct lro_ctrl *lro_ctrl = &qs->lro.ctrl;
 #endif
 	struct t3_mbuf_hdr *mh = &rspq->rspq_mh;
 #ifdef DEBUG	
 	static int last_holdoff = 0;
 	if (cxgb_debug && rspq->holdoff_tmr != last_holdoff) {
 		printf("next_holdoff=%d\n", rspq->holdoff_tmr);
 		last_holdoff = rspq->holdoff_tmr;
 	}
 #endif
 	rspq->next_holdoff = rspq->holdoff_tmr;
 
 	while (__predict_true(budget_left && is_new_response(r, rspq))) {
 		int eth, eop = 0, ethpad = 0;
 		uint32_t flags = ntohl(r->flags);
 		uint32_t rss_hash = be32toh(r->rss_hdr.rss_hash_val);
 		uint8_t opcode = r->rss_hdr.opcode;
 		
 		eth = (opcode == CPL_RX_PKT);
 		
 		if (__predict_false(flags & F_RSPD_ASYNC_NOTIF)) {
 			struct mbuf *m;
 
 			if (cxgb_debug)
 				printf("async notification\n");
 
 			if (mh->mh_head == NULL) {
 				mh->mh_head = m_gethdr(M_NOWAIT, MT_DATA);
 				m = mh->mh_head;
 			} else {
 				m = m_gethdr(M_NOWAIT, MT_DATA);
 			}
 			if (m == NULL)
 				goto no_mem;
 
                         memcpy(mtod(m, char *), r, AN_PKT_SIZE);
 			m->m_len = m->m_pkthdr.len = AN_PKT_SIZE;
                         *mtod(m, char *) = CPL_ASYNC_NOTIF;
 			opcode = CPL_ASYNC_NOTIF;
 			eop = 1;
                         rspq->async_notif++;
 			goto skip;
 		} else if  (flags & F_RSPD_IMM_DATA_VALID) {
 			struct mbuf *m = m_gethdr(M_NOWAIT, MT_DATA);
 
 			if (m == NULL) {	
 		no_mem:
 				rspq->next_holdoff = NOMEM_INTR_DELAY;
 				budget_left--;
 				break;
 			}
 			if (mh->mh_head == NULL)
 				mh->mh_head = m;
                         else 
 				mh->mh_tail->m_next = m;
 			mh->mh_tail = m;
 
 			get_imm_packet(adap, r, m);
 			mh->mh_head->m_pkthdr.len += m->m_len;
 			eop = 1;
 			rspq->imm_data++;
 		} else if (r->len_cq) {
 			int drop_thresh = eth ? SGE_RX_DROP_THRES : 0;
 			
 			eop = get_packet(adap, drop_thresh, qs, mh, r);
 			if (eop) {
 				if (r->rss_hdr.hash_type && !adap->timestamp) {
-					M_HASHTYPE_SET(mh->mh_head, M_HASHTYPE_OPAQUE);
+					M_HASHTYPE_SET(mh->mh_head,
+					    M_HASHTYPE_OPAQUE_HASH);
 					mh->mh_head->m_pkthdr.flowid = rss_hash;
 				}
 			}
 			
 			ethpad = 2;
 		} else {
 			rspq->pure_rsps++;
 		}
 	skip:
 		if (flags & RSPD_CTRL_MASK) {
 			sleeping |= flags & RSPD_GTS_MASK;
 			handle_rsp_cntrl_info(qs, flags);
 		}
 
 		if (!eth && eop) {
 			rspq->offload_pkts++;
 #ifdef TCP_OFFLOAD
 			adap->cpl_handler[opcode](qs, r, mh->mh_head);
 #else
 			m_freem(mh->mh_head);
 #endif
 			mh->mh_head = NULL;
 		} else if (eth && eop) {
 			struct mbuf *m = mh->mh_head;
 
 			t3_rx_eth(adap, m, ethpad);
 
 			/*
 			 * The T304 sends incoming packets on any qset.  If LRO
 			 * is also enabled, we could end up sending packet up
 			 * lro_ctrl->ifp's input.  That is incorrect.
 			 *
 			 * The mbuf's rcvif was derived from the cpl header and
 			 * is accurate.  Skip LRO and just use that.
 			 */
 #if defined(INET6) || defined(INET)
 			skip_lro = __predict_false(qs->port->ifp != m->m_pkthdr.rcvif);
 
 			if (lro_enabled && lro_ctrl->lro_cnt && !skip_lro
 			    && (tcp_lro_rx(lro_ctrl, m, 0) == 0)
 			    ) {
 				/* successfully queued for LRO */
 			} else
 #endif
 			{
 				/*
 				 * LRO not enabled, packet unsuitable for LRO,
 				 * or unable to queue.  Pass it up right now in
 				 * any case.
 				 */
 				struct ifnet *ifp = m->m_pkthdr.rcvif;
 				(*ifp->if_input)(ifp, m);
 			}
 			mh->mh_head = NULL;
 
 		}
 
 		r++;
 		if (__predict_false(++rspq->cidx == rspq->size)) {
 			rspq->cidx = 0;
 			rspq->gen ^= 1;
 			r = rspq->desc;
 		}
 
 		if (++rspq->credits >= 64) {
 			refill_rspq(adap, rspq, rspq->credits);
 			rspq->credits = 0;
 		}
 		__refill_fl_lt(adap, &qs->fl[0], 32);
 		__refill_fl_lt(adap, &qs->fl[1], 32);
 		--budget_left;
 	}
 
 #if defined(INET6) || defined(INET)
 	/* Flush LRO */
 	tcp_lro_flush_all(lro_ctrl);
 #endif
 
 	if (sleeping)
 		check_ring_db(adap, qs, sleeping);
 
 	mb();  /* commit Tx queue processed updates */
 	if (__predict_false(qs->txq_stopped > 1))
 		restart_tx(qs);
 
 	__refill_fl_lt(adap, &qs->fl[0], 512);
 	__refill_fl_lt(adap, &qs->fl[1], 512);
 	budget -= budget_left;
 	return (budget);
 }
 
 /*
  * A helper function that processes responses and issues GTS.
  */
 static __inline int
 process_responses_gts(adapter_t *adap, struct sge_rspq *rq)
 {
 	int work;
 	static int last_holdoff = 0;
 	
 	work = process_responses(adap, rspq_to_qset(rq), -1);
 
 	if (cxgb_debug && (rq->next_holdoff != last_holdoff)) {
 		printf("next_holdoff=%d\n", rq->next_holdoff);
 		last_holdoff = rq->next_holdoff;
 	}
 	t3_write_reg(adap, A_SG_GTS, V_RSPQ(rq->cntxt_id) |
 	    V_NEWTIMER(rq->next_holdoff) | V_NEWINDEX(rq->cidx));
 	
 	return (work);
 }
 
 
 /*
  * Interrupt handler for legacy INTx interrupts for T3B-based cards.
  * Handles data events from SGE response queues as well as error and other
  * async events as they all use the same interrupt pin.  We use one SGE
  * response queue per port in this mode and protect all response queues with
  * queue 0's lock.
  */
 void
 t3b_intr(void *data)
 {
 	uint32_t i, map;
 	adapter_t *adap = data;
 	struct sge_rspq *q0 = &adap->sge.qs[0].rspq;
 	
 	t3_write_reg(adap, A_PL_CLI, 0);
 	map = t3_read_reg(adap, A_SG_DATA_INTR);
 
 	if (!map) 
 		return;
 
 	if (__predict_false(map & F_ERRINTR)) {
 		t3_write_reg(adap, A_PL_INT_ENABLE0, 0);
 		(void) t3_read_reg(adap, A_PL_INT_ENABLE0);
 		taskqueue_enqueue(adap->tq, &adap->slow_intr_task);
 	}
 
 	mtx_lock(&q0->lock);
 	for_each_port(adap, i)
 	    if (map & (1 << i))
 			process_responses_gts(adap, &adap->sge.qs[i].rspq);
 	mtx_unlock(&q0->lock);
 }
 
 /*
  * The MSI interrupt handler.  This needs to handle data events from SGE
  * response queues as well as error and other async events as they all use
  * the same MSI vector.  We use one SGE response queue per port in this mode
  * and protect all response queues with queue 0's lock.
  */
 void
 t3_intr_msi(void *data)
 {
 	adapter_t *adap = data;
 	struct sge_rspq *q0 = &adap->sge.qs[0].rspq;
 	int i, new_packets = 0;
 
 	mtx_lock(&q0->lock);
 
 	for_each_port(adap, i)
 	    if (process_responses_gts(adap, &adap->sge.qs[i].rspq)) 
 		    new_packets = 1;
 	mtx_unlock(&q0->lock);
 	if (new_packets == 0) {
 		t3_write_reg(adap, A_PL_INT_ENABLE0, 0);
 		(void) t3_read_reg(adap, A_PL_INT_ENABLE0);
 		taskqueue_enqueue(adap->tq, &adap->slow_intr_task);
 	}
 }
 
 void
 t3_intr_msix(void *data)
 {
 	struct sge_qset *qs = data;
 	adapter_t *adap = qs->port->adapter;
 	struct sge_rspq *rspq = &qs->rspq;
 
 	if (process_responses_gts(adap, rspq) == 0)
 		rspq->unhandled_irqs++;
 }
 
 #define QDUMP_SBUF_SIZE		(32 * 400)
 static int
 t3_dump_rspq(SYSCTL_HANDLER_ARGS)
 {
 	struct sge_rspq *rspq;
 	struct sge_qset *qs;
 	int i, err, dump_end, idx;
 	struct sbuf *sb;
 	struct rsp_desc *rspd;
 	uint32_t data[4];
 	
 	rspq = arg1;
 	qs = rspq_to_qset(rspq);
 	if (rspq->rspq_dump_count == 0) 
 		return (0);
 	if (rspq->rspq_dump_count > RSPQ_Q_SIZE) {
 		log(LOG_WARNING,
 		    "dump count is too large %d\n", rspq->rspq_dump_count);
 		rspq->rspq_dump_count = 0;
 		return (EINVAL);
 	}
 	if (rspq->rspq_dump_start > (RSPQ_Q_SIZE-1)) {
 		log(LOG_WARNING,
 		    "dump start of %d is greater than queue size\n",
 		    rspq->rspq_dump_start);
 		rspq->rspq_dump_start = 0;
 		return (EINVAL);
 	}
 	err = t3_sge_read_rspq(qs->port->adapter, rspq->cntxt_id, data);
 	if (err)
 		return (err);
 	err = sysctl_wire_old_buffer(req, 0);
 	if (err)
 		return (err);
 	sb = sbuf_new_for_sysctl(NULL, NULL, QDUMP_SBUF_SIZE, req);
 
 	sbuf_printf(sb, " \n index=%u size=%u MSI-X/RspQ=%u intr enable=%u intr armed=%u\n",
 	    (data[0] & 0xffff), data[0] >> 16, ((data[2] >> 20) & 0x3f),
 	    ((data[2] >> 26) & 1), ((data[2] >> 27) & 1));
 	sbuf_printf(sb, " generation=%u CQ mode=%u FL threshold=%u\n",
 	    ((data[2] >> 28) & 1), ((data[2] >> 31) & 1), data[3]);
 	
 	sbuf_printf(sb, " start=%d -> end=%d\n", rspq->rspq_dump_start,
 	    (rspq->rspq_dump_start + rspq->rspq_dump_count) & (RSPQ_Q_SIZE-1));
 	
 	dump_end = rspq->rspq_dump_start + rspq->rspq_dump_count;
 	for (i = rspq->rspq_dump_start; i < dump_end; i++) {
 		idx = i & (RSPQ_Q_SIZE-1);
 		
 		rspd = &rspq->desc[idx];
 		sbuf_printf(sb, "\tidx=%04d opcode=%02x cpu_idx=%x hash_type=%x cq_idx=%x\n",
 		    idx, rspd->rss_hdr.opcode, rspd->rss_hdr.cpu_idx,
 		    rspd->rss_hdr.hash_type, be16toh(rspd->rss_hdr.cq_idx));
 		sbuf_printf(sb, "\trss_hash_val=%x flags=%08x len_cq=%x intr_gen=%x\n",
 		    rspd->rss_hdr.rss_hash_val, be32toh(rspd->flags),
 		    be32toh(rspd->len_cq), rspd->intr_gen);
 	}
 
 	err = sbuf_finish(sb);
 	sbuf_delete(sb);
 	return (err);
 }	
 
 static int
 t3_dump_txq_eth(SYSCTL_HANDLER_ARGS)
 {
 	struct sge_txq *txq;
 	struct sge_qset *qs;
 	int i, j, err, dump_end;
 	struct sbuf *sb;
 	struct tx_desc *txd;
 	uint32_t *WR, wr_hi, wr_lo, gen;
 	uint32_t data[4];
 	
 	txq = arg1;
 	qs = txq_to_qset(txq, TXQ_ETH);
 	if (txq->txq_dump_count == 0) {
 		return (0);
 	}
 	if (txq->txq_dump_count > TX_ETH_Q_SIZE) {
 		log(LOG_WARNING,
 		    "dump count is too large %d\n", txq->txq_dump_count);
 		txq->txq_dump_count = 1;
 		return (EINVAL);
 	}
 	if (txq->txq_dump_start > (TX_ETH_Q_SIZE-1)) {
 		log(LOG_WARNING,
 		    "dump start of %d is greater than queue size\n",
 		    txq->txq_dump_start);
 		txq->txq_dump_start = 0;
 		return (EINVAL);
 	}
 	err = t3_sge_read_ecntxt(qs->port->adapter, qs->rspq.cntxt_id, data);
 	if (err)
 		return (err);
 	err = sysctl_wire_old_buffer(req, 0);
 	if (err)
 		return (err);
 	sb = sbuf_new_for_sysctl(NULL, NULL, QDUMP_SBUF_SIZE, req);
 
 	sbuf_printf(sb, " \n credits=%u GTS=%u index=%u size=%u rspq#=%u cmdq#=%u\n",
 	    (data[0] & 0x7fff), ((data[0] >> 15) & 1), (data[0] >> 16), 
 	    (data[1] & 0xffff), ((data[3] >> 4) & 7), ((data[3] >> 7) & 1));
 	sbuf_printf(sb, " TUN=%u TOE=%u generation=%u uP token=%u valid=%u\n",
 	    ((data[3] >> 8) & 1), ((data[3] >> 9) & 1), ((data[3] >> 10) & 1),
 	    ((data[3] >> 11) & 0xfffff), ((data[3] >> 31) & 1));
 	sbuf_printf(sb, " qid=%d start=%d -> end=%d\n", qs->idx,
 	    txq->txq_dump_start,
 	    (txq->txq_dump_start + txq->txq_dump_count) & (TX_ETH_Q_SIZE-1));
 
 	dump_end = txq->txq_dump_start + txq->txq_dump_count;
 	for (i = txq->txq_dump_start; i < dump_end; i++) {
 		txd = &txq->desc[i & (TX_ETH_Q_SIZE-1)];
 		WR = (uint32_t *)txd->flit;
 		wr_hi = ntohl(WR[0]);
 		wr_lo = ntohl(WR[1]);		
 		gen = G_WR_GEN(wr_lo);
 		
 		sbuf_printf(sb," wr_hi %08x wr_lo %08x gen %d\n",
 		    wr_hi, wr_lo, gen);
 		for (j = 2; j < 30; j += 4) 
 			sbuf_printf(sb, "\t%08x %08x %08x %08x \n",
 			    WR[j], WR[j + 1], WR[j + 2], WR[j + 3]);
 
 	}
 	err = sbuf_finish(sb);
 	sbuf_delete(sb);
 	return (err);
 }
 
 static int
 t3_dump_txq_ctrl(SYSCTL_HANDLER_ARGS)
 {
 	struct sge_txq *txq;
 	struct sge_qset *qs;
 	int i, j, err, dump_end;
 	struct sbuf *sb;
 	struct tx_desc *txd;
 	uint32_t *WR, wr_hi, wr_lo, gen;
 	
 	txq = arg1;
 	qs = txq_to_qset(txq, TXQ_CTRL);
 	if (txq->txq_dump_count == 0) {
 		return (0);
 	}
 	if (txq->txq_dump_count > 256) {
 		log(LOG_WARNING,
 		    "dump count is too large %d\n", txq->txq_dump_count);
 		txq->txq_dump_count = 1;
 		return (EINVAL);
 	}
 	if (txq->txq_dump_start > 255) {
 		log(LOG_WARNING,
 		    "dump start of %d is greater than queue size\n",
 		    txq->txq_dump_start);
 		txq->txq_dump_start = 0;
 		return (EINVAL);
 	}
 
 	err = sysctl_wire_old_buffer(req, 0);
 	if (err != 0)
 		return (err);
 	sb = sbuf_new_for_sysctl(NULL, NULL, QDUMP_SBUF_SIZE, req);
 	sbuf_printf(sb, " qid=%d start=%d -> end=%d\n", qs->idx,
 	    txq->txq_dump_start,
 	    (txq->txq_dump_start + txq->txq_dump_count) & 255);
 
 	dump_end = txq->txq_dump_start + txq->txq_dump_count;
 	for (i = txq->txq_dump_start; i < dump_end; i++) {
 		txd = &txq->desc[i & (255)];
 		WR = (uint32_t *)txd->flit;
 		wr_hi = ntohl(WR[0]);
 		wr_lo = ntohl(WR[1]);		
 		gen = G_WR_GEN(wr_lo);
 		
 		sbuf_printf(sb," wr_hi %08x wr_lo %08x gen %d\n",
 		    wr_hi, wr_lo, gen);
 		for (j = 2; j < 30; j += 4) 
 			sbuf_printf(sb, "\t%08x %08x %08x %08x \n",
 			    WR[j], WR[j + 1], WR[j + 2], WR[j + 3]);
 
 	}
 	err = sbuf_finish(sb);
 	sbuf_delete(sb);
 	return (err);
 }
 
 static int
 t3_set_coalesce_usecs(SYSCTL_HANDLER_ARGS)
 {
 	adapter_t *sc = arg1;
 	struct qset_params *qsp = &sc->params.sge.qset[0]; 
 	int coalesce_usecs;	
 	struct sge_qset *qs;
 	int i, j, err, nqsets = 0;
 	struct mtx *lock;
 
 	if ((sc->flags & FULL_INIT_DONE) == 0)
 		return (ENXIO);
 		
 	coalesce_usecs = qsp->coalesce_usecs;
         err = sysctl_handle_int(oidp, &coalesce_usecs, arg2, req);
 
 	if (err != 0) {
 		return (err);
 	}
 	if (coalesce_usecs == qsp->coalesce_usecs)
 		return (0);
 
 	for (i = 0; i < sc->params.nports; i++) 
 		for (j = 0; j < sc->port[i].nqsets; j++)
 			nqsets++;
 
 	coalesce_usecs = max(1, coalesce_usecs);
 
 	for (i = 0; i < nqsets; i++) {
 		qs = &sc->sge.qs[i];
 		qsp = &sc->params.sge.qset[i];
 		qsp->coalesce_usecs = coalesce_usecs;
 		
 		lock = (sc->flags & USING_MSIX) ? &qs->rspq.lock :
 			    &sc->sge.qs[0].rspq.lock;
 
 		mtx_lock(lock);
 		t3_update_qset_coalesce(qs, qsp);
 		t3_write_reg(sc, A_SG_GTS, V_RSPQ(qs->rspq.cntxt_id) |
 		    V_NEWTIMER(qs->rspq.holdoff_tmr));
 		mtx_unlock(lock);
 	}
 
 	return (0);
 }
 
 static int
 t3_pkt_timestamp(SYSCTL_HANDLER_ARGS)
 {
 	adapter_t *sc = arg1;
 	int rc, timestamp;
 
 	if ((sc->flags & FULL_INIT_DONE) == 0)
 		return (ENXIO);
 
 	timestamp = sc->timestamp;
 	rc = sysctl_handle_int(oidp, &timestamp, arg2, req);
 
 	if (rc != 0)
 		return (rc);
 
 	if (timestamp != sc->timestamp) {
 		t3_set_reg_field(sc, A_TP_PC_CONFIG2, F_ENABLERXPKTTMSTPRSS,
 		    timestamp ? F_ENABLERXPKTTMSTPRSS : 0);
 		sc->timestamp = timestamp;
 	}
 
 	return (0);
 }
 
 void
 t3_add_attach_sysctls(adapter_t *sc)
 {
 	struct sysctl_ctx_list *ctx;
 	struct sysctl_oid_list *children;
 
 	ctx = device_get_sysctl_ctx(sc->dev);
 	children = SYSCTL_CHILDREN(device_get_sysctl_tree(sc->dev));
 
 	/* random information */
 	SYSCTL_ADD_STRING(ctx, children, OID_AUTO, 
 	    "firmware_version",
 	    CTLFLAG_RD, sc->fw_version,
 	    0, "firmware version");
 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO,
 	    "hw_revision",
 	    CTLFLAG_RD, &sc->params.rev,
 	    0, "chip model");
 	SYSCTL_ADD_STRING(ctx, children, OID_AUTO, 
 	    "port_types",
 	    CTLFLAG_RD, sc->port_types,
 	    0, "type of ports");
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, 
 	    "enable_debug",
 	    CTLFLAG_RW, &cxgb_debug,
 	    0, "enable verbose debugging output");
 	SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "tunq_coalesce",
 	    CTLFLAG_RD, &sc->tunq_coalesce,
 	    "#tunneled packets freed");
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, 
 	    "txq_overrun",
 	    CTLFLAG_RD, &txq_fills,
 	    0, "#times txq overrun");
 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO,
 	    "core_clock",
 	    CTLFLAG_RD, &sc->params.vpd.cclk,
 	    0, "core clock frequency (in KHz)");
 }
 
 
 static const char *rspq_name = "rspq";
 static const char *txq_names[] =
 {
 	"txq_eth",
 	"txq_ofld",
 	"txq_ctrl"	
 };
 
 static int
 sysctl_handle_macstat(SYSCTL_HANDLER_ARGS)
 {
 	struct port_info *p = arg1;
 	uint64_t *parg;
 
 	if (!p)
 		return (EINVAL);
 
 	cxgb_refresh_stats(p);
 	parg = (uint64_t *) ((uint8_t *)&p->mac.stats + arg2);
 
 	return (sysctl_handle_64(oidp, parg, 0, req));
 }
 
 void
 t3_add_configured_sysctls(adapter_t *sc)
 {
 	struct sysctl_ctx_list *ctx;
 	struct sysctl_oid_list *children;
 	int i, j;
 	
 	ctx = device_get_sysctl_ctx(sc->dev);
 	children = SYSCTL_CHILDREN(device_get_sysctl_tree(sc->dev));
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, 
 	    "intr_coal",
 	    CTLTYPE_INT|CTLFLAG_RW, sc,
 	    0, t3_set_coalesce_usecs,
 	    "I", "interrupt coalescing timer (us)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, 
 	    "pkt_timestamp",
 	    CTLTYPE_INT | CTLFLAG_RW, sc,
 	    0, t3_pkt_timestamp,
 	    "I", "provide packet timestamp instead of connection hash");
 
 	for (i = 0; i < sc->params.nports; i++) {
 		struct port_info *pi = &sc->port[i];
 		struct sysctl_oid *poid;
 		struct sysctl_oid_list *poidlist;
 		struct mac_stats *mstats = &pi->mac.stats;
 		
 		snprintf(pi->namebuf, PORT_NAME_LEN, "port%d", i);
 		poid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, 
 		    pi->namebuf, CTLFLAG_RD, NULL, "port statistics");
 		poidlist = SYSCTL_CHILDREN(poid);
 		SYSCTL_ADD_UINT(ctx, poidlist, OID_AUTO,
 		    "nqsets", CTLFLAG_RD, &pi->nqsets,
 		    0, "#queue sets");
 
 		for (j = 0; j < pi->nqsets; j++) {
 			struct sge_qset *qs = &sc->sge.qs[pi->first_qset + j];
 			struct sysctl_oid *qspoid, *rspqpoid, *txqpoid,
 					  *ctrlqpoid, *lropoid;
 			struct sysctl_oid_list *qspoidlist, *rspqpoidlist,
 					       *txqpoidlist, *ctrlqpoidlist,
 					       *lropoidlist;
 			struct sge_txq *txq = &qs->txq[TXQ_ETH];
 			
 			snprintf(qs->namebuf, QS_NAME_LEN, "qs%d", j);
 			
 			qspoid = SYSCTL_ADD_NODE(ctx, poidlist, OID_AUTO, 
 			    qs->namebuf, CTLFLAG_RD, NULL, "qset statistics");
 			qspoidlist = SYSCTL_CHILDREN(qspoid);
 
 			SYSCTL_ADD_UINT(ctx, qspoidlist, OID_AUTO, "fl0_empty",
 					CTLFLAG_RD, &qs->fl[0].empty, 0,
 					"freelist #0 empty");
 			SYSCTL_ADD_UINT(ctx, qspoidlist, OID_AUTO, "fl1_empty",
 					CTLFLAG_RD, &qs->fl[1].empty, 0,
 					"freelist #1 empty");
 
 			rspqpoid = SYSCTL_ADD_NODE(ctx, qspoidlist, OID_AUTO, 
 			    rspq_name, CTLFLAG_RD, NULL, "rspq statistics");
 			rspqpoidlist = SYSCTL_CHILDREN(rspqpoid);
 
 			txqpoid = SYSCTL_ADD_NODE(ctx, qspoidlist, OID_AUTO, 
 			    txq_names[0], CTLFLAG_RD, NULL, "txq statistics");
 			txqpoidlist = SYSCTL_CHILDREN(txqpoid);
 
 			ctrlqpoid = SYSCTL_ADD_NODE(ctx, qspoidlist, OID_AUTO, 
 			    txq_names[2], CTLFLAG_RD, NULL, "ctrlq statistics");
 			ctrlqpoidlist = SYSCTL_CHILDREN(ctrlqpoid);
 
 			lropoid = SYSCTL_ADD_NODE(ctx, qspoidlist, OID_AUTO, 
 			    "lro_stats", CTLFLAG_RD, NULL, "LRO statistics");
 			lropoidlist = SYSCTL_CHILDREN(lropoid);
 
 			SYSCTL_ADD_UINT(ctx, rspqpoidlist, OID_AUTO, "size",
 			    CTLFLAG_RD, &qs->rspq.size,
 			    0, "#entries in response queue");
 			SYSCTL_ADD_UINT(ctx, rspqpoidlist, OID_AUTO, "cidx",
 			    CTLFLAG_RD, &qs->rspq.cidx,
 			    0, "consumer index");
 			SYSCTL_ADD_UINT(ctx, rspqpoidlist, OID_AUTO, "credits",
 			    CTLFLAG_RD, &qs->rspq.credits,
 			    0, "#credits");
 			SYSCTL_ADD_UINT(ctx, rspqpoidlist, OID_AUTO, "starved",
 			    CTLFLAG_RD, &qs->rspq.starved,
 			    0, "#times starved");
 			SYSCTL_ADD_UAUTO(ctx, rspqpoidlist, OID_AUTO, "phys_addr",
 			    CTLFLAG_RD, &qs->rspq.phys_addr,
 			    "physical address of the queue");
 			SYSCTL_ADD_UINT(ctx, rspqpoidlist, OID_AUTO, "dump_start",
 			    CTLFLAG_RW, &qs->rspq.rspq_dump_start,
 			    0, "start rspq dump entry");
 			SYSCTL_ADD_UINT(ctx, rspqpoidlist, OID_AUTO, "dump_count",
 			    CTLFLAG_RW, &qs->rspq.rspq_dump_count,
 			    0, "#rspq entries to dump");
 			SYSCTL_ADD_PROC(ctx, rspqpoidlist, OID_AUTO, "qdump",
 			    CTLTYPE_STRING | CTLFLAG_RD, &qs->rspq,
 			    0, t3_dump_rspq, "A", "dump of the response queue");
 
 			SYSCTL_ADD_UQUAD(ctx, txqpoidlist, OID_AUTO, "dropped",
 			    CTLFLAG_RD, &qs->txq[TXQ_ETH].txq_mr->br_drops,
 			    "#tunneled packets dropped");
 			SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "sendqlen",
 			    CTLFLAG_RD, &qs->txq[TXQ_ETH].sendq.mq_len,
 			    0, "#tunneled packets waiting to be sent");
 #if 0			
 			SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "queue_pidx",
 			    CTLFLAG_RD, (uint32_t *)(uintptr_t)&qs->txq[TXQ_ETH].txq_mr.br_prod,
 			    0, "#tunneled packets queue producer index");
 			SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "queue_cidx",
 			    CTLFLAG_RD, (uint32_t *)(uintptr_t)&qs->txq[TXQ_ETH].txq_mr.br_cons,
 			    0, "#tunneled packets queue consumer index");
 #endif			
 			SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "processed",
 			    CTLFLAG_RD, &qs->txq[TXQ_ETH].processed,
 			    0, "#tunneled packets processed by the card");
 			SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "cleaned",
 			    CTLFLAG_RD, &txq->cleaned,
 			    0, "#tunneled packets cleaned");
 			SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "in_use",
 			    CTLFLAG_RD, &txq->in_use,
 			    0, "#tunneled packet slots in use");
 			SYSCTL_ADD_UQUAD(ctx, txqpoidlist, OID_AUTO, "frees",
 			    CTLFLAG_RD, &txq->txq_frees,
 			    "#tunneled packets freed");
 			SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "skipped",
 			    CTLFLAG_RD, &txq->txq_skipped,
 			    0, "#tunneled packet descriptors skipped");
 			SYSCTL_ADD_UQUAD(ctx, txqpoidlist, OID_AUTO, "coalesced",
 			    CTLFLAG_RD, &txq->txq_coalesced,
 			    "#tunneled packets coalesced");
 			SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "enqueued",
 			    CTLFLAG_RD, &txq->txq_enqueued,
 			    0, "#tunneled packets enqueued to hardware");
 			SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "stopped_flags",
 			    CTLFLAG_RD, &qs->txq_stopped,
 			    0, "tx queues stopped");
 			SYSCTL_ADD_UAUTO(ctx, txqpoidlist, OID_AUTO, "phys_addr",
 			    CTLFLAG_RD, &txq->phys_addr,
 			    "physical address of the queue");
 			SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "qgen",
 			    CTLFLAG_RW, &qs->txq[TXQ_ETH].gen,
 			    0, "txq generation");
 			SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "hw_cidx",
 			    CTLFLAG_RD, &txq->cidx,
 			    0, "hardware queue cidx");			
 			SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "hw_pidx",
 			    CTLFLAG_RD, &txq->pidx,
 			    0, "hardware queue pidx");
 			SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "dump_start",
 			    CTLFLAG_RW, &qs->txq[TXQ_ETH].txq_dump_start,
 			    0, "txq start idx for dump");
 			SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "dump_count",
 			    CTLFLAG_RW, &qs->txq[TXQ_ETH].txq_dump_count,
 			    0, "txq #entries to dump");			
 			SYSCTL_ADD_PROC(ctx, txqpoidlist, OID_AUTO, "qdump",
 			    CTLTYPE_STRING | CTLFLAG_RD, &qs->txq[TXQ_ETH],
 			    0, t3_dump_txq_eth, "A", "dump of the transmit queue");
 
 			SYSCTL_ADD_UINT(ctx, ctrlqpoidlist, OID_AUTO, "dump_start",
 			    CTLFLAG_RW, &qs->txq[TXQ_CTRL].txq_dump_start,
 			    0, "ctrlq start idx for dump");
 			SYSCTL_ADD_UINT(ctx, ctrlqpoidlist, OID_AUTO, "dump_count",
 			    CTLFLAG_RW, &qs->txq[TXQ_CTRL].txq_dump_count,
 			    0, "ctrl #entries to dump");			
 			SYSCTL_ADD_PROC(ctx, ctrlqpoidlist, OID_AUTO, "qdump",
 			    CTLTYPE_STRING | CTLFLAG_RD, &qs->txq[TXQ_CTRL],
 			    0, t3_dump_txq_ctrl, "A", "dump of the transmit queue");
 
 			SYSCTL_ADD_U64(ctx, lropoidlist, OID_AUTO, "lro_queued",
 			    CTLFLAG_RD, &qs->lro.ctrl.lro_queued, 0, NULL);
 			SYSCTL_ADD_U64(ctx, lropoidlist, OID_AUTO, "lro_flushed",
 			    CTLFLAG_RD, &qs->lro.ctrl.lro_flushed, 0, NULL);
 			SYSCTL_ADD_U64(ctx, lropoidlist, OID_AUTO, "lro_bad_csum",
 			    CTLFLAG_RD, &qs->lro.ctrl.lro_bad_csum, 0, NULL);
 			SYSCTL_ADD_INT(ctx, lropoidlist, OID_AUTO, "lro_cnt",
 			    CTLFLAG_RD, &qs->lro.ctrl.lro_cnt, 0, NULL);
 		}
 
 		/* Now add a node for mac stats. */
 		poid = SYSCTL_ADD_NODE(ctx, poidlist, OID_AUTO, "mac_stats",
 		    CTLFLAG_RD, NULL, "MAC statistics");
 		poidlist = SYSCTL_CHILDREN(poid);
 
 		/*
 		 * We (ab)use the length argument (arg2) to pass on the offset
 		 * of the data that we are interested in.  This is only required
 		 * for the quad counters that are updated from the hardware (we
 		 * make sure that we return the latest value).
 		 * sysctl_handle_macstat first updates *all* the counters from
 		 * the hardware, and then returns the latest value of the
 		 * requested counter.  Best would be to update only the
 		 * requested counter from hardware, but t3_mac_update_stats()
 		 * hides all the register details and we don't want to dive into
 		 * all that here.
 		 */
 #define CXGB_SYSCTL_ADD_QUAD(a)	SYSCTL_ADD_OID(ctx, poidlist, OID_AUTO, #a, \
     (CTLTYPE_U64 | CTLFLAG_RD), pi, offsetof(struct mac_stats, a), \
     sysctl_handle_macstat, "QU", 0)
 		CXGB_SYSCTL_ADD_QUAD(tx_octets);
 		CXGB_SYSCTL_ADD_QUAD(tx_octets_bad);
 		CXGB_SYSCTL_ADD_QUAD(tx_frames);
 		CXGB_SYSCTL_ADD_QUAD(tx_mcast_frames);
 		CXGB_SYSCTL_ADD_QUAD(tx_bcast_frames);
 		CXGB_SYSCTL_ADD_QUAD(tx_pause);
 		CXGB_SYSCTL_ADD_QUAD(tx_deferred);
 		CXGB_SYSCTL_ADD_QUAD(tx_late_collisions);
 		CXGB_SYSCTL_ADD_QUAD(tx_total_collisions);
 		CXGB_SYSCTL_ADD_QUAD(tx_excess_collisions);
 		CXGB_SYSCTL_ADD_QUAD(tx_underrun);
 		CXGB_SYSCTL_ADD_QUAD(tx_len_errs);
 		CXGB_SYSCTL_ADD_QUAD(tx_mac_internal_errs);
 		CXGB_SYSCTL_ADD_QUAD(tx_excess_deferral);
 		CXGB_SYSCTL_ADD_QUAD(tx_fcs_errs);
 		CXGB_SYSCTL_ADD_QUAD(tx_frames_64);
 		CXGB_SYSCTL_ADD_QUAD(tx_frames_65_127);
 		CXGB_SYSCTL_ADD_QUAD(tx_frames_128_255);
 		CXGB_SYSCTL_ADD_QUAD(tx_frames_256_511);
 		CXGB_SYSCTL_ADD_QUAD(tx_frames_512_1023);
 		CXGB_SYSCTL_ADD_QUAD(tx_frames_1024_1518);
 		CXGB_SYSCTL_ADD_QUAD(tx_frames_1519_max);
 		CXGB_SYSCTL_ADD_QUAD(rx_octets);
 		CXGB_SYSCTL_ADD_QUAD(rx_octets_bad);
 		CXGB_SYSCTL_ADD_QUAD(rx_frames);
 		CXGB_SYSCTL_ADD_QUAD(rx_mcast_frames);
 		CXGB_SYSCTL_ADD_QUAD(rx_bcast_frames);
 		CXGB_SYSCTL_ADD_QUAD(rx_pause);
 		CXGB_SYSCTL_ADD_QUAD(rx_fcs_errs);
 		CXGB_SYSCTL_ADD_QUAD(rx_align_errs);
 		CXGB_SYSCTL_ADD_QUAD(rx_symbol_errs);
 		CXGB_SYSCTL_ADD_QUAD(rx_data_errs);
 		CXGB_SYSCTL_ADD_QUAD(rx_sequence_errs);
 		CXGB_SYSCTL_ADD_QUAD(rx_runt);
 		CXGB_SYSCTL_ADD_QUAD(rx_jabber);
 		CXGB_SYSCTL_ADD_QUAD(rx_short);
 		CXGB_SYSCTL_ADD_QUAD(rx_too_long);
 		CXGB_SYSCTL_ADD_QUAD(rx_mac_internal_errs);
 		CXGB_SYSCTL_ADD_QUAD(rx_cong_drops);
 		CXGB_SYSCTL_ADD_QUAD(rx_frames_64);
 		CXGB_SYSCTL_ADD_QUAD(rx_frames_65_127);
 		CXGB_SYSCTL_ADD_QUAD(rx_frames_128_255);
 		CXGB_SYSCTL_ADD_QUAD(rx_frames_256_511);
 		CXGB_SYSCTL_ADD_QUAD(rx_frames_512_1023);
 		CXGB_SYSCTL_ADD_QUAD(rx_frames_1024_1518);
 		CXGB_SYSCTL_ADD_QUAD(rx_frames_1519_max);
 #undef CXGB_SYSCTL_ADD_QUAD
 
 #define CXGB_SYSCTL_ADD_ULONG(a) SYSCTL_ADD_ULONG(ctx, poidlist, OID_AUTO, #a, \
     CTLFLAG_RD, &mstats->a, 0)
 		CXGB_SYSCTL_ADD_ULONG(tx_fifo_parity_err);
 		CXGB_SYSCTL_ADD_ULONG(rx_fifo_parity_err);
 		CXGB_SYSCTL_ADD_ULONG(tx_fifo_urun);
 		CXGB_SYSCTL_ADD_ULONG(rx_fifo_ovfl);
 		CXGB_SYSCTL_ADD_ULONG(serdes_signal_loss);
 		CXGB_SYSCTL_ADD_ULONG(xaui_pcs_ctc_err);
 		CXGB_SYSCTL_ADD_ULONG(xaui_pcs_align_change);
 		CXGB_SYSCTL_ADD_ULONG(num_toggled);
 		CXGB_SYSCTL_ADD_ULONG(num_resets);
 		CXGB_SYSCTL_ADD_ULONG(link_faults);
 #undef CXGB_SYSCTL_ADD_ULONG
 	}
 }
 	
 /**
  *	t3_get_desc - dump an SGE descriptor for debugging purposes
  *	@qs: the queue set
  *	@qnum: identifies the specific queue (0..2: Tx, 3:response, 4..5: Rx)
  *	@idx: the descriptor index in the queue
  *	@data: where to dump the descriptor contents
  *
  *	Dumps the contents of a HW descriptor of an SGE queue.  Returns the
  *	size of the descriptor.
  */
 int
 t3_get_desc(const struct sge_qset *qs, unsigned int qnum, unsigned int idx,
 		unsigned char *data)
 {
 	if (qnum >= 6)
 		return (EINVAL);
 
 	if (qnum < 3) {
 		if (!qs->txq[qnum].desc || idx >= qs->txq[qnum].size)
 			return -EINVAL;
 		memcpy(data, &qs->txq[qnum].desc[idx], sizeof(struct tx_desc));
 		return sizeof(struct tx_desc);
 	}
 
 	if (qnum == 3) {
 		if (!qs->rspq.desc || idx >= qs->rspq.size)
 			return (EINVAL);
 		memcpy(data, &qs->rspq.desc[idx], sizeof(struct rsp_desc));
 		return sizeof(struct rsp_desc);
 	}
 
 	qnum -= 4;
 	if (!qs->fl[qnum].desc || idx >= qs->fl[qnum].size)
 		return (EINVAL);
 	memcpy(data, &qs->fl[qnum].desc[idx], sizeof(struct rx_desc));
 	return sizeof(struct rx_desc);
 }
Index: projects/vnet/sys/dev/cxgbe/adapter.h
===================================================================
--- projects/vnet/sys/dev/cxgbe/adapter.h	(revision 301546)
+++ projects/vnet/sys/dev/cxgbe/adapter.h	(revision 301547)
@@ -1,1152 +1,1166 @@
 /*-
  * Copyright (c) 2011 Chelsio Communications, Inc.
  * All rights reserved.
  * Written by: Navdeep Parhar <np@FreeBSD.org>
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  *
  */
 
 #ifndef __T4_ADAPTER_H__
 #define __T4_ADAPTER_H__
 
 #include <sys/kernel.h>
 #include <sys/bus.h>
 #include <sys/rman.h>
 #include <sys/types.h>
 #include <sys/lock.h>
 #include <sys/malloc.h>
 #include <sys/rwlock.h>
 #include <sys/sx.h>
 #include <vm/uma.h>
 
 #include <dev/pci/pcivar.h>
 #include <dev/pci/pcireg.h>
 #include <machine/bus.h>
 #include <sys/socket.h>
 #include <sys/sysctl.h>
 #include <net/ethernet.h>
 #include <net/if.h>
 #include <net/if_var.h>
 #include <net/if_media.h>
 #include <netinet/in.h>
 #include <netinet/tcp_lro.h>
 
 #include "offload.h"
+#include "t4_ioctl.h"
 #include "common/t4_msg.h"
 #include "firmware/t4fw_interface.h"
 
 #define KTR_CXGBE	KTR_SPARE3
 MALLOC_DECLARE(M_CXGBE);
 #define CXGBE_UNIMPLEMENTED(s) \
     panic("%s (%s, line %d) not implemented yet.", s, __FILE__, __LINE__)
 
 #if defined(__i386__) || defined(__amd64__)
 static __inline void
 prefetch(void *x)
 {
 	__asm volatile("prefetcht0 %0" :: "m" (*(unsigned long *)x));
 }
 #else
 #define prefetch(x)
 #endif
 
 #ifndef SYSCTL_ADD_UQUAD
 #define SYSCTL_ADD_UQUAD SYSCTL_ADD_QUAD
 #define sysctl_handle_64 sysctl_handle_quad
 #define CTLTYPE_U64 CTLTYPE_QUAD
 #endif
 
 #if (__FreeBSD_version >= 900030) || \
     ((__FreeBSD_version >= 802507) && (__FreeBSD_version < 900000))
 #define SBUF_DRAIN 1
 #endif
 
 #ifdef __amd64__
 /* XXX: need systemwide bus_space_read_8/bus_space_write_8 */
 static __inline uint64_t
 t4_bus_space_read_8(bus_space_tag_t tag, bus_space_handle_t handle,
     bus_size_t offset)
 {
 	KASSERT(tag == X86_BUS_SPACE_MEM,
 	    ("%s: can only handle mem space", __func__));
 
 	return (*(volatile uint64_t *)(handle + offset));
 }
 
 static __inline void
 t4_bus_space_write_8(bus_space_tag_t tag, bus_space_handle_t bsh,
     bus_size_t offset, uint64_t value)
 {
 	KASSERT(tag == X86_BUS_SPACE_MEM,
 	    ("%s: can only handle mem space", __func__));
 
 	*(volatile uint64_t *)(bsh + offset) = value;
 }
 #else
 static __inline uint64_t
 t4_bus_space_read_8(bus_space_tag_t tag, bus_space_handle_t handle,
     bus_size_t offset)
 {
 	return (uint64_t)bus_space_read_4(tag, handle, offset) +
 	    ((uint64_t)bus_space_read_4(tag, handle, offset + 4) << 32);
 }
 
 static __inline void
 t4_bus_space_write_8(bus_space_tag_t tag, bus_space_handle_t bsh,
     bus_size_t offset, uint64_t value)
 {
 	bus_space_write_4(tag, bsh, offset, value);
 	bus_space_write_4(tag, bsh, offset + 4, value >> 32);
 }
 #endif
 
 struct adapter;
 typedef struct adapter adapter_t;
 
 enum {
 	/*
 	 * All ingress queues use this entry size.  Note that the firmware event
 	 * queue and any iq expecting CPL_RX_PKT in the descriptor needs this to
 	 * be at least 64.
 	 */
 	IQ_ESIZE = 64,
 
 	/* Default queue sizes for all kinds of ingress queues */
 	FW_IQ_QSIZE = 256,
 	RX_IQ_QSIZE = 1024,
 
 	/* All egress queues use this entry size */
 	EQ_ESIZE = 64,
 
 	/* Default queue sizes for all kinds of egress queues */
 	CTRL_EQ_QSIZE = 128,
 	TX_EQ_QSIZE = 1024,
 
 #if MJUMPAGESIZE != MCLBYTES
 	SW_ZONE_SIZES = 4,	/* cluster, jumbop, jumbo9k, jumbo16k */
 #else
 	SW_ZONE_SIZES = 3,	/* cluster, jumbo9k, jumbo16k */
 #endif
 	CL_METADATA_SIZE = CACHE_LINE_SIZE,
 
 	SGE_MAX_WR_NDESC = SGE_MAX_WR_LEN / EQ_ESIZE, /* max WR size in desc */
 	TX_SGL_SEGS = 39,
 	TX_SGL_SEGS_TSO = 38,
 	TX_WR_FLITS = SGE_MAX_WR_LEN / 8
 };
 
 enum {
 	/* adapter intr_type */
 	INTR_INTX	= (1 << 0),
 	INTR_MSI 	= (1 << 1),
 	INTR_MSIX	= (1 << 2)
 };
 
 enum {
 	XGMAC_MTU	= (1 << 0),
 	XGMAC_PROMISC	= (1 << 1),
 	XGMAC_ALLMULTI	= (1 << 2),
 	XGMAC_VLANEX	= (1 << 3),
 	XGMAC_UCADDR	= (1 << 4),
 	XGMAC_MCADDRS	= (1 << 5),
 
 	XGMAC_ALL	= 0xffff
 };
 
 enum {
 	/* flags understood by begin_synchronized_op */
 	HOLD_LOCK	= (1 << 0),
 	SLEEP_OK	= (1 << 1),
 	INTR_OK		= (1 << 2),
 
 	/* flags understood by end_synchronized_op */
 	LOCK_HELD	= HOLD_LOCK,
 };
 
 enum {
 	/* adapter flags */
 	FULL_INIT_DONE	= (1 << 0),
 	FW_OK		= (1 << 1),
 	/* INTR_DIRECT	= (1 << 2),	No longer used. */
 	MASTER_PF	= (1 << 3),
 	ADAP_SYSCTL_CTX	= (1 << 4),
 	/* TOM_INIT_DONE= (1 << 5),	No longer used */
 	BUF_PACKING_OK	= (1 << 6),
 
 	CXGBE_BUSY	= (1 << 9),
 
 	/* port flags */
 	HAS_TRACEQ	= (1 << 3),
 
 	/* VI flags */
 	DOOMED		= (1 << 0),
 	VI_INIT_DONE	= (1 << 1),
 	VI_SYSCTL_CTX	= (1 << 2),
 	INTR_RXQ	= (1 << 4),	/* All NIC rxq's take interrupts */
 	INTR_OFLD_RXQ	= (1 << 5),	/* All TOE rxq's take interrupts */
 	INTR_ALL	= (INTR_RXQ | INTR_OFLD_RXQ),
 	VI_NETMAP	= (1 << 6),
 
 	/* adapter debug_flags */
 	DF_DUMP_MBOX	= (1 << 0),
 };
 
 #define IS_DOOMED(vi)	((vi)->flags & DOOMED)
 #define SET_DOOMED(vi)	do {(vi)->flags |= DOOMED;} while (0)
 #define IS_BUSY(sc)	((sc)->flags & CXGBE_BUSY)
 #define SET_BUSY(sc)	do {(sc)->flags |= CXGBE_BUSY;} while (0)
 #define CLR_BUSY(sc)	do {(sc)->flags &= ~CXGBE_BUSY;} while (0)
 
 struct vi_info {
 	device_t dev;
 	struct port_info *pi;
 
 	struct ifnet *ifp;
 	struct ifmedia media;
 
 	unsigned long flags;
 	int if_flags;
 
 	uint16_t *rss;
 	uint16_t viid;
 	int16_t  xact_addr_filt;/* index of exact MAC address filter */
 	uint16_t rss_size;	/* size of VI's RSS table slice */
 	uint16_t rss_base;	/* start of VI's RSS table slice */
 
 	eventhandler_tag vlan_c;
 
 	int nintr;
 	int first_intr;
 
 	/* These need to be int as they are used in sysctl */
 	int ntxq;	/* # of tx queues */
 	int first_txq;	/* index of first tx queue */
 	int rsrv_noflowq; /* Reserve queue 0 for non-flowid packets */
 	int nrxq;	/* # of rx queues */
 	int first_rxq;	/* index of first rx queue */
 	int nofldtxq;		/* # of offload tx queues */
 	int first_ofld_txq;	/* index of first offload tx queue */
 	int nofldrxq;		/* # of offload rx queues */
 	int first_ofld_rxq;	/* index of first offload rx queue */
 	int tmr_idx;
 	int pktc_idx;
 	int qsize_rxq;
 	int qsize_txq;
 
 	struct timeval last_refreshed;
 	struct fw_vi_stats_vf stats;
 
 	struct callout tick;
 	struct sysctl_ctx_list ctx;	/* from ifconfig up to driver detach */
 
 	uint8_t hw_addr[ETHER_ADDR_LEN]; /* factory MAC address, won't change */
 };
 
+enum {
+	/* tx_sched_class flags */
+	TX_SC_OK	= (1 << 0),	/* Set up in hardware, active. */
+};
+
+struct tx_sched_class {
+	int refcount;
+	int flags;
+	struct t4_sched_class_params params;
+};
+
 struct port_info {
 	device_t dev;
 	struct adapter *adapter;
 
 	struct vi_info *vi;
 	int nvi;
 	int up_vis;
 	int uld_vis;
+
+	struct tx_sched_class *tc;	/* traffic classes for this channel */
 
 	struct mtx pi_lock;
 	char lockname[16];
 	unsigned long flags;
 
 	uint8_t  lport;		/* associated offload logical port */
 	int8_t   mdio_addr;
 	uint8_t  port_type;
 	uint8_t  mod_type;
 	uint8_t  port_id;
 	uint8_t  tx_chan;
 	uint8_t  rx_chan_map;	/* rx MPS channel bitmap */
 
 	int linkdnrc;
 	struct link_config link_cfg;
 
 	struct timeval last_refreshed;
  	struct port_stats stats;
 	u_int tnl_cong_drops;
 	u_int tx_parse_error;
 
 	struct callout tick;
 };
 
 #define	IS_MAIN_VI(vi)		((vi) == &((vi)->pi->vi[0]))
 
 /* Where the cluster came from, how it has been carved up. */
 struct cluster_layout {
 	int8_t zidx;
 	int8_t hwidx;
 	uint16_t region1;	/* mbufs laid out within this region */
 				/* region2 is the DMA region */
 	uint16_t region3;	/* cluster_metadata within this region */
 };
 
 struct cluster_metadata {
 	u_int refcount;
 	struct fl_sdesc *sd;	/* For debug only.  Could easily be stale */
 };
 
 struct fl_sdesc {
 	caddr_t cl;
 	uint16_t nmbuf;	/* # of driver originated mbufs with ref on cluster */
 	struct cluster_layout cll;
 };
 
 struct tx_desc {
 	__be64 flit[8];
 };
 
 struct tx_sdesc {
 	struct mbuf *m;		/* m_nextpkt linked chain of frames */
 	uint8_t desc_used;	/* # of hardware descriptors used by the WR */
 };
 
 
 #define IQ_PAD (IQ_ESIZE - sizeof(struct rsp_ctrl) - sizeof(struct rss_header))
 struct iq_desc {
 	struct rss_header rss;
 	uint8_t cpl[IQ_PAD];
 	struct rsp_ctrl rsp;
 };
 #undef IQ_PAD
 CTASSERT(sizeof(struct iq_desc) == IQ_ESIZE);
 
 enum {
 	/* iq flags */
 	IQ_ALLOCATED	= (1 << 0),	/* firmware resources allocated */
 	IQ_HAS_FL	= (1 << 1),	/* iq associated with a freelist */
 	IQ_INTR		= (1 << 2),	/* iq takes direct interrupt */
 	IQ_LRO_ENABLED	= (1 << 3),	/* iq is an eth rxq with LRO enabled */
 
 	/* iq state */
 	IQS_DISABLED	= 0,
 	IQS_BUSY	= 1,
 	IQS_IDLE	= 2,
 };
 
 /*
  * Ingress Queue: T4 is producer, driver is consumer.
  */
 struct sge_iq {
 	uint32_t flags;
 	volatile int state;
 	struct adapter *adapter;
 	struct iq_desc  *desc;	/* KVA of descriptor ring */
 	int8_t   intr_pktc_idx;	/* packet count threshold index */
 	uint8_t  gen;		/* generation bit */
 	uint8_t  intr_params;	/* interrupt holdoff parameters */
 	uint8_t  intr_next;	/* XXX: holdoff for next interrupt */
 	uint16_t qsize;		/* size (# of entries) of the queue */
 	uint16_t sidx;		/* index of the entry with the status page */
 	uint16_t cidx;		/* consumer index */
 	uint16_t cntxt_id;	/* SGE context id for the iq */
 	uint16_t abs_id;	/* absolute SGE id for the iq */
 
 	STAILQ_ENTRY(sge_iq) link;
 
 	bus_dma_tag_t desc_tag;
 	bus_dmamap_t desc_map;
 	bus_addr_t ba;		/* bus address of descriptor ring */
 };
 
 enum {
 	EQ_CTRL		= 1,
 	EQ_ETH		= 2,
 	EQ_OFLD		= 3,
 
 	/* eq flags */
 	EQ_TYPEMASK	= 0x3,		/* 2 lsbits hold the type (see above) */
 	EQ_ALLOCATED	= (1 << 2),	/* firmware resources allocated */
 	EQ_ENABLED	= (1 << 3),	/* open for business */
 };
 
 /* Listed in order of preference.  Update t4_sysctls too if you change these */
 enum {DOORBELL_UDB, DOORBELL_WCWR, DOORBELL_UDBWC, DOORBELL_KDB};
 
 /*
  * Egress Queue: driver is producer, T4 is consumer.
  *
  * Note: A free list is an egress queue (driver produces the buffers and T4
  * consumes them) but it's special enough to have its own struct (see sge_fl).
  */
 struct sge_eq {
 	unsigned int flags;	/* MUST be first */
 	unsigned int cntxt_id;	/* SGE context id for the eq */
 	struct mtx eq_lock;
 
 	struct tx_desc *desc;	/* KVA of descriptor ring */
 	uint16_t doorbells;
 	volatile uint32_t *udb;	/* KVA of doorbell (lies within BAR2) */
 	u_int udb_qid;		/* relative qid within the doorbell page */
 	uint16_t sidx;		/* index of the entry with the status page */
 	uint16_t cidx;		/* consumer idx (desc idx) */
 	uint16_t pidx;		/* producer idx (desc idx) */
 	uint16_t equeqidx;	/* EQUEQ last requested at this pidx */
 	uint16_t dbidx;		/* pidx of the most recent doorbell */
 	uint16_t iqid;		/* iq that gets egr_update for the eq */
 	uint8_t tx_chan;	/* tx channel used by the eq */
 	volatile u_int equiq;	/* EQUIQ outstanding */
 
 	bus_dma_tag_t desc_tag;
 	bus_dmamap_t desc_map;
 	bus_addr_t ba;		/* bus address of descriptor ring */
 	char lockname[16];
 };
 
 struct sw_zone_info {
 	uma_zone_t zone;	/* zone that this cluster comes from */
 	int size;		/* size of cluster: 2K, 4K, 9K, 16K, etc. */
 	int type;		/* EXT_xxx type of the cluster */
 	int8_t head_hwidx;
 	int8_t tail_hwidx;
 };
 
 struct hw_buf_info {
 	int8_t zidx;		/* backpointer to zone; -ve means unused */
 	int8_t next;		/* next hwidx for this zone; -1 means no more */
 	int size;
 };
 
 enum {
 	NUM_MEMWIN = 3,
 
 	MEMWIN0_APERTURE = 2048,
 	MEMWIN0_BASE     = 0x1b800,
 
 	MEMWIN1_APERTURE = 32768,
 	MEMWIN1_BASE     = 0x28000,
 
 	MEMWIN2_APERTURE_T4 = 65536,
 	MEMWIN2_BASE_T4     = 0x30000,
 
 	MEMWIN2_APERTURE_T5 = 128 * 1024,
 	MEMWIN2_BASE_T5     = 0x60000,
 };
 
 struct memwin {
 	struct rwlock mw_lock __aligned(CACHE_LINE_SIZE);
 	uint32_t mw_base;	/* constant after setup_memwin */
 	uint32_t mw_aperture;	/* ditto */
 	uint32_t mw_curpos;	/* protected by mw_lock */
 };
 
 enum {
 	FL_STARVING	= (1 << 0), /* on the adapter's list of starving fl's */
 	FL_DOOMED	= (1 << 1), /* about to be destroyed */
 	FL_BUF_PACKING	= (1 << 2), /* buffer packing enabled */
 	FL_BUF_RESUME	= (1 << 3), /* resume from the middle of the frame */
 };
 
 #define FL_RUNNING_LOW(fl) \
     (IDXDIFF(fl->dbidx * 8, fl->cidx, fl->sidx * 8) <= fl->lowat)
 #define FL_NOT_RUNNING_LOW(fl) \
     (IDXDIFF(fl->dbidx * 8, fl->cidx, fl->sidx * 8) >= 2 * fl->lowat)
 
 struct sge_fl {
 	struct mtx fl_lock;
 	__be64 *desc;		/* KVA of descriptor ring, ptr to addresses */
 	struct fl_sdesc *sdesc;	/* KVA of software descriptor ring */
 	struct cluster_layout cll_def;	/* default refill zone, layout */
 	uint16_t lowat;		/* # of buffers <= this means fl needs help */
 	int flags;
 	uint16_t buf_boundary;
 
 	/* The 16b idx all deal with hw descriptors */
 	uint16_t dbidx;		/* hw pidx after last doorbell */
 	uint16_t sidx;		/* index of status page */
 	volatile uint16_t hw_cidx;
 
 	/* The 32b idx are all buffer idx, not hardware descriptor idx */
 	uint32_t cidx;		/* consumer index */
 	uint32_t pidx;		/* producer index */
 
 	uint32_t dbval;
 	u_int rx_offset;	/* offset in fl buf (when buffer packing) */
 	volatile uint32_t *udb;
 
 	uint64_t mbuf_allocated;/* # of mbuf allocated from zone_mbuf */
 	uint64_t mbuf_inlined;	/* # of mbuf created within clusters */
 	uint64_t cl_allocated;	/* # of clusters allocated */
 	uint64_t cl_recycled;	/* # of clusters recycled */
 	uint64_t cl_fast_recycled; /* # of clusters recycled (fast) */
 
 	/* These 3 are valid when FL_BUF_RESUME is set, stale otherwise. */
 	struct mbuf *m0;
 	struct mbuf **pnext;
 	u_int remaining;
 
 	uint16_t qsize;		/* # of hw descriptors (status page included) */
 	uint16_t cntxt_id;	/* SGE context id for the freelist */
 	TAILQ_ENTRY(sge_fl) link; /* All starving freelists */
 	bus_dma_tag_t desc_tag;
 	bus_dmamap_t desc_map;
 	char lockname[16];
 	bus_addr_t ba;		/* bus address of descriptor ring */
 	struct cluster_layout cll_alt;	/* alternate refill zone, layout */
 };
 
 struct mp_ring;
 
 /* txq: SGE egress queue + what's needed for Ethernet NIC */
 struct sge_txq {
 	struct sge_eq eq;	/* MUST be first */
 
 	struct ifnet *ifp;	/* the interface this txq belongs to */
 	struct mp_ring *r;	/* tx software ring */
 	struct tx_sdesc *sdesc;	/* KVA of software descriptor ring */
 	struct sglist *gl;
 	__be32 cpl_ctrl0;	/* for convenience */
 
 	struct task tx_reclaim_task;
 	/* stats for common events first */
 
 	uint64_t txcsum;	/* # of times hardware assisted with checksum */
 	uint64_t tso_wrs;	/* # of TSO work requests */
 	uint64_t vlan_insertion;/* # of times VLAN tag was inserted */
 	uint64_t imm_wrs;	/* # of work requests with immediate data */
 	uint64_t sgl_wrs;	/* # of work requests with direct SGL */
 	uint64_t txpkt_wrs;	/* # of txpkt work requests (not coalesced) */
 	uint64_t txpkts0_wrs;	/* # of type0 coalesced tx work requests */
 	uint64_t txpkts1_wrs;	/* # of type1 coalesced tx work requests */
 	uint64_t txpkts0_pkts;	/* # of frames in type0 coalesced tx WRs */
 	uint64_t txpkts1_pkts;	/* # of frames in type1 coalesced tx WRs */
 
 	/* stats for not-that-common events */
 } __aligned(CACHE_LINE_SIZE);
 
 /* rxq: SGE ingress queue + SGE free list + miscellaneous items */
 struct sge_rxq {
 	struct sge_iq iq;	/* MUST be first */
 	struct sge_fl fl;	/* MUST follow iq */
 
 	struct ifnet *ifp;	/* the interface this rxq belongs to */
 #if defined(INET) || defined(INET6)
 	struct lro_ctrl lro;	/* LRO state */
 #endif
 
 	/* stats for common events first */
 
 	uint64_t rxcsum;	/* # of times hardware assisted with checksum */
 	uint64_t vlan_extraction;/* # of times VLAN tag was extracted */
 
 	/* stats for not-that-common events */
 
 } __aligned(CACHE_LINE_SIZE);
 
 static inline struct sge_rxq *
 iq_to_rxq(struct sge_iq *iq)
 {
 
 	return (__containerof(iq, struct sge_rxq, iq));
 }
 
 
 /* ofld_rxq: SGE ingress queue + SGE free list + miscellaneous items */
 struct sge_ofld_rxq {
 	struct sge_iq iq;	/* MUST be first */
 	struct sge_fl fl;	/* MUST follow iq */
 } __aligned(CACHE_LINE_SIZE);
 
 static inline struct sge_ofld_rxq *
 iq_to_ofld_rxq(struct sge_iq *iq)
 {
 
 	return (__containerof(iq, struct sge_ofld_rxq, iq));
 }
 
 struct wrqe {
 	STAILQ_ENTRY(wrqe) link;
 	struct sge_wrq *wrq;
 	int wr_len;
 	char wr[] __aligned(16);
 };
 
 struct wrq_cookie {
 	TAILQ_ENTRY(wrq_cookie) link;
 	int ndesc;
 	int pidx;
 };
 
 /*
  * wrq: SGE egress queue that is given prebuilt work requests.  Both the control
  * and offload tx queues are of this type.
  */
 struct sge_wrq {
 	struct sge_eq eq;	/* MUST be first */
 
 	struct adapter *adapter;
 	struct task wrq_tx_task;
 
 	/* Tx desc reserved but WR not "committed" yet. */
 	TAILQ_HEAD(wrq_incomplete_wrs , wrq_cookie) incomplete_wrs;
 
 	/* List of WRs ready to go out as soon as descriptors are available. */
 	STAILQ_HEAD(, wrqe) wr_list;
 	u_int nwr_pending;
 	u_int ndesc_needed;
 
 	/* stats for common events first */
 
 	uint64_t tx_wrs_direct;	/* # of WRs written directly to desc ring. */
 	uint64_t tx_wrs_ss;	/* # of WRs copied from scratch space. */
 	uint64_t tx_wrs_copied;	/* # of WRs queued and copied to desc ring. */
 
 	/* stats for not-that-common events */
 
 	/*
 	 * Scratch space for work requests that wrap around after reaching the
 	 * status page, and some information about the last WR that used it.
 	 */
 	uint16_t ss_pidx;
 	uint16_t ss_len;
 	uint8_t ss[SGE_MAX_WR_LEN];
 
 } __aligned(CACHE_LINE_SIZE);
 
 
 struct sge_nm_rxq {
 	struct vi_info *vi;
 
 	struct iq_desc *iq_desc;
 	uint16_t iq_abs_id;
 	uint16_t iq_cntxt_id;
 	uint16_t iq_cidx;
 	uint16_t iq_sidx;
 	uint8_t iq_gen;
 
 	__be64  *fl_desc;
 	uint16_t fl_cntxt_id;
 	uint32_t fl_cidx;
 	uint32_t fl_pidx;
 	uint32_t fl_sidx;
 	uint32_t fl_db_val;
 	u_int fl_hwidx:4;
 
 	u_int nid;		/* netmap ring # for this queue */
 
 	/* infrequently used items after this */
 
 	bus_dma_tag_t iq_desc_tag;
 	bus_dmamap_t iq_desc_map;
 	bus_addr_t iq_ba;
 	int intr_idx;
 
 	bus_dma_tag_t fl_desc_tag;
 	bus_dmamap_t fl_desc_map;
 	bus_addr_t fl_ba;
 } __aligned(CACHE_LINE_SIZE);
 
 struct sge_nm_txq {
 	struct tx_desc *desc;
 	uint16_t cidx;
 	uint16_t pidx;
 	uint16_t sidx;
 	uint16_t equiqidx;	/* EQUIQ last requested at this pidx */
 	uint16_t equeqidx;	/* EQUEQ last requested at this pidx */
 	uint16_t dbidx;		/* pidx of the most recent doorbell */
 	uint16_t doorbells;
 	volatile uint32_t *udb;
 	u_int udb_qid;
 	u_int cntxt_id;
 	__be32 cpl_ctrl0;	/* for convenience */
 	u_int nid;		/* netmap ring # for this queue */
 
 	/* infrequently used items after this */
 
 	bus_dma_tag_t desc_tag;
 	bus_dmamap_t desc_map;
 	bus_addr_t ba;
 	int iqidx;
 } __aligned(CACHE_LINE_SIZE);
 
 struct sge {
 	int nrxq;	/* total # of Ethernet rx queues */
 	int ntxq;	/* total # of Ethernet tx tx queues */
 	int nofldrxq;	/* total # of TOE rx queues */
 	int nofldtxq;	/* total # of TOE tx queues */
 	int nnmrxq;	/* total # of netmap rx queues */
 	int nnmtxq;	/* total # of netmap tx queues */
 	int niq;	/* total # of ingress queues */
 	int neq;	/* total # of egress queues */
 
 	struct sge_iq fwq;	/* Firmware event queue */
 	struct sge_wrq mgmtq;	/* Management queue (control queue) */
 	struct sge_wrq *ctrlq;	/* Control queues */
 	struct sge_txq *txq;	/* NIC tx queues */
 	struct sge_rxq *rxq;	/* NIC rx queues */
 	struct sge_wrq *ofld_txq;	/* TOE tx queues */
 	struct sge_ofld_rxq *ofld_rxq;	/* TOE rx queues */
 	struct sge_nm_txq *nm_txq;	/* netmap tx queues */
 	struct sge_nm_rxq *nm_rxq;	/* netmap rx queues */
 
 	uint16_t iq_start;
 	int eq_start;
 	struct sge_iq **iqmap;	/* iq->cntxt_id to iq mapping */
 	struct sge_eq **eqmap;	/* eq->cntxt_id to eq mapping */
 
 	int8_t safe_hwidx1;	/* may not have room for metadata */
 	int8_t safe_hwidx2;	/* with room for metadata and maybe more */
 	struct sw_zone_info sw_zone_info[SW_ZONE_SIZES];
 	struct hw_buf_info hw_buf_info[SGE_FLBUF_SIZES];
 };
 
 struct rss_header;
 typedef int (*cpl_handler_t)(struct sge_iq *, const struct rss_header *,
     struct mbuf *);
 typedef int (*an_handler_t)(struct sge_iq *, const struct rsp_ctrl *);
 typedef int (*fw_msg_handler_t)(struct adapter *, const __be64 *);
 
 struct adapter {
 	SLIST_ENTRY(adapter) link;
 	device_t dev;
 	struct cdev *cdev;
 
 	/* PCIe register resources */
 	int regs_rid;
 	struct resource *regs_res;
 	int msix_rid;
 	struct resource *msix_res;
 	bus_space_handle_t bh;
 	bus_space_tag_t bt;
 	bus_size_t mmio_len;
 	int udbs_rid;
 	struct resource *udbs_res;
 	volatile uint8_t *udbs_base;
 
 	unsigned int pf;
 	unsigned int mbox;
 	unsigned int vpd_busy;
 	unsigned int vpd_flag;
 
 	/* Interrupt information */
 	int intr_type;
 	int intr_count;
 	struct irq {
 		struct resource *res;
 		int rid;
 		void *tag;
 	} *irq;
 
 	bus_dma_tag_t dmat;	/* Parent DMA tag */
 
 	struct sge sge;
 	int lro_timeout;
 
 	struct taskqueue *tq[MAX_NCHAN];	/* General purpose taskqueues */
 	struct port_info *port[MAX_NPORTS];
 	uint8_t chan_map[MAX_NCHAN];
 
 	void *tom_softc;	/* (struct tom_data *) */
 	struct tom_tunables tt;
 	void *iwarp_softc;	/* (struct c4iw_dev *) */
 	void *iscsi_ulp_softc;	/* (struct cxgbei_data *) */
 	struct l2t_data *l2t;	/* L2 table */
 	struct tid_info tids;
 
 	uint16_t doorbells;
 	int offload_map;	/* ports with IFCAP_TOE enabled */
 	int active_ulds;	/* ULDs activated on this adapter */
 	int flags;
 	int debug_flags;
 
 	char ifp_lockname[16];
 	struct mtx ifp_lock;
 	struct ifnet *ifp;	/* tracer ifp */
 	struct ifmedia media;
 	int traceq;		/* iq used by all tracers, -1 if none */
 	int tracer_valid;	/* bitmap of valid tracers */
 	int tracer_enabled;	/* bitmap of enabled tracers */
 
 	char fw_version[16];
 	char tp_version[16];
 	char exprom_version[16];
 	char cfg_file[32];
 	u_int cfcsum;
 	struct adapter_params params;
 	const struct chip_params *chip_params;
 	struct t4_virt_res vres;
 
 	uint16_t nbmcaps;
 	uint16_t linkcaps;
 	uint16_t switchcaps;
 	uint16_t niccaps;
 	uint16_t toecaps;
 	uint16_t rdmacaps;
 	uint16_t tlscaps;
 	uint16_t iscsicaps;
 	uint16_t fcoecaps;
 
 	struct sysctl_ctx_list ctx; /* from adapter_full_init to full_uninit */
 
 	struct mtx sc_lock;
 	char lockname[16];
 
 	/* Starving free lists */
 	struct mtx sfl_lock;	/* same cache-line as sc_lock? but that's ok */
 	TAILQ_HEAD(, sge_fl) sfl;
 	struct callout sfl_callout;
 
 	struct mtx reg_lock;	/* for indirect register access */
 
 	struct memwin memwin[NUM_MEMWIN];	/* memory windows */
 
 	an_handler_t an_handler __aligned(CACHE_LINE_SIZE);
 	fw_msg_handler_t fw_msg_handler[7];	/* NUM_FW6_TYPES */
 	cpl_handler_t cpl_handler[0xef];	/* NUM_CPL_CMDS */
 
 	const char *last_op;
 	const void *last_op_thr;
 	int last_op_flags;
 
 	int sc_do_rxcopy;
 };
 
 #define ADAPTER_LOCK(sc)		mtx_lock(&(sc)->sc_lock)
 #define ADAPTER_UNLOCK(sc)		mtx_unlock(&(sc)->sc_lock)
 #define ADAPTER_LOCK_ASSERT_OWNED(sc)	mtx_assert(&(sc)->sc_lock, MA_OWNED)
 #define ADAPTER_LOCK_ASSERT_NOTOWNED(sc) mtx_assert(&(sc)->sc_lock, MA_NOTOWNED)
 
 #define ASSERT_SYNCHRONIZED_OP(sc)	\
     KASSERT(IS_BUSY(sc) && \
 	(mtx_owned(&(sc)->sc_lock) || sc->last_op_thr == curthread), \
 	("%s: operation not synchronized.", __func__))
 
 #define PORT_LOCK(pi)			mtx_lock(&(pi)->pi_lock)
 #define PORT_UNLOCK(pi)			mtx_unlock(&(pi)->pi_lock)
 #define PORT_LOCK_ASSERT_OWNED(pi)	mtx_assert(&(pi)->pi_lock, MA_OWNED)
 #define PORT_LOCK_ASSERT_NOTOWNED(pi)	mtx_assert(&(pi)->pi_lock, MA_NOTOWNED)
 
 #define FL_LOCK(fl)			mtx_lock(&(fl)->fl_lock)
 #define FL_TRYLOCK(fl)			mtx_trylock(&(fl)->fl_lock)
 #define FL_UNLOCK(fl)			mtx_unlock(&(fl)->fl_lock)
 #define FL_LOCK_ASSERT_OWNED(fl)	mtx_assert(&(fl)->fl_lock, MA_OWNED)
 #define FL_LOCK_ASSERT_NOTOWNED(fl)	mtx_assert(&(fl)->fl_lock, MA_NOTOWNED)
 
 #define RXQ_FL_LOCK(rxq)		FL_LOCK(&(rxq)->fl)
 #define RXQ_FL_UNLOCK(rxq)		FL_UNLOCK(&(rxq)->fl)
 #define RXQ_FL_LOCK_ASSERT_OWNED(rxq)	FL_LOCK_ASSERT_OWNED(&(rxq)->fl)
 #define RXQ_FL_LOCK_ASSERT_NOTOWNED(rxq) FL_LOCK_ASSERT_NOTOWNED(&(rxq)->fl)
 
 #define EQ_LOCK(eq)			mtx_lock(&(eq)->eq_lock)
 #define EQ_TRYLOCK(eq)			mtx_trylock(&(eq)->eq_lock)
 #define EQ_UNLOCK(eq)			mtx_unlock(&(eq)->eq_lock)
 #define EQ_LOCK_ASSERT_OWNED(eq)	mtx_assert(&(eq)->eq_lock, MA_OWNED)
 #define EQ_LOCK_ASSERT_NOTOWNED(eq)	mtx_assert(&(eq)->eq_lock, MA_NOTOWNED)
 
 #define TXQ_LOCK(txq)			EQ_LOCK(&(txq)->eq)
 #define TXQ_TRYLOCK(txq)		EQ_TRYLOCK(&(txq)->eq)
 #define TXQ_UNLOCK(txq)			EQ_UNLOCK(&(txq)->eq)
 #define TXQ_LOCK_ASSERT_OWNED(txq)	EQ_LOCK_ASSERT_OWNED(&(txq)->eq)
 #define TXQ_LOCK_ASSERT_NOTOWNED(txq)	EQ_LOCK_ASSERT_NOTOWNED(&(txq)->eq)
 
 #define CH_DUMP_MBOX(sc, mbox, data_reg) \
 	do { \
 		if (sc->debug_flags & DF_DUMP_MBOX) { \
 			log(LOG_NOTICE, \
 			    "%s mbox %u: %016llx %016llx %016llx %016llx " \
 			    "%016llx %016llx %016llx %016llx\n", \
 			    device_get_nameunit(sc->dev), mbox, \
 			    (unsigned long long)t4_read_reg64(sc, data_reg), \
 			    (unsigned long long)t4_read_reg64(sc, data_reg + 8), \
 			    (unsigned long long)t4_read_reg64(sc, data_reg + 16), \
 			    (unsigned long long)t4_read_reg64(sc, data_reg + 24), \
 			    (unsigned long long)t4_read_reg64(sc, data_reg + 32), \
 			    (unsigned long long)t4_read_reg64(sc, data_reg + 40), \
 			    (unsigned long long)t4_read_reg64(sc, data_reg + 48), \
 			    (unsigned long long)t4_read_reg64(sc, data_reg + 56)); \
 		} \
 	} while (0)
 
 #define for_each_txq(vi, iter, q) \
 	for (q = &vi->pi->adapter->sge.txq[vi->first_txq], iter = 0; \
 	    iter < vi->ntxq; ++iter, ++q)
 #define for_each_rxq(vi, iter, q) \
 	for (q = &vi->pi->adapter->sge.rxq[vi->first_rxq], iter = 0; \
 	    iter < vi->nrxq; ++iter, ++q)
 #define for_each_ofld_txq(vi, iter, q) \
 	for (q = &vi->pi->adapter->sge.ofld_txq[vi->first_ofld_txq], iter = 0; \
 	    iter < vi->nofldtxq; ++iter, ++q)
 #define for_each_ofld_rxq(vi, iter, q) \
 	for (q = &vi->pi->adapter->sge.ofld_rxq[vi->first_ofld_rxq], iter = 0; \
 	    iter < vi->nofldrxq; ++iter, ++q)
 #define for_each_nm_txq(vi, iter, q) \
 	for (q = &vi->pi->adapter->sge.nm_txq[vi->first_txq], iter = 0; \
 	    iter < vi->ntxq; ++iter, ++q)
 #define for_each_nm_rxq(vi, iter, q) \
 	for (q = &vi->pi->adapter->sge.nm_rxq[vi->first_rxq], iter = 0; \
 	    iter < vi->nrxq; ++iter, ++q)
 #define for_each_vi(_pi, _iter, _vi) \
 	for ((_vi) = (_pi)->vi, (_iter) = 0; (_iter) < (_pi)->nvi; \
 	     ++(_iter), ++(_vi))
 
 #define IDXINCR(idx, incr, wrap) do { \
 	idx = wrap - idx > incr ? idx + incr : incr - (wrap - idx); \
 } while (0)
 #define IDXDIFF(head, tail, wrap) \
 	((head) >= (tail) ? (head) - (tail) : (wrap) - (tail) + (head))
 
 /* One for errors, one for firmware events */
 #define T4_EXTRA_INTR 2
 
 static inline uint32_t
 t4_read_reg(struct adapter *sc, uint32_t reg)
 {
 
 	return bus_space_read_4(sc->bt, sc->bh, reg);
 }
 
 static inline void
 t4_write_reg(struct adapter *sc, uint32_t reg, uint32_t val)
 {
 
 	bus_space_write_4(sc->bt, sc->bh, reg, val);
 }
 
 static inline uint64_t
 t4_read_reg64(struct adapter *sc, uint32_t reg)
 {
 
 	return t4_bus_space_read_8(sc->bt, sc->bh, reg);
 }
 
 static inline void
 t4_write_reg64(struct adapter *sc, uint32_t reg, uint64_t val)
 {
 
 	t4_bus_space_write_8(sc->bt, sc->bh, reg, val);
 }
 
 static inline void
 t4_os_pci_read_cfg1(struct adapter *sc, int reg, uint8_t *val)
 {
 
 	*val = pci_read_config(sc->dev, reg, 1);
 }
 
 static inline void
 t4_os_pci_write_cfg1(struct adapter *sc, int reg, uint8_t val)
 {
 
 	pci_write_config(sc->dev, reg, val, 1);
 }
 
 static inline void
 t4_os_pci_read_cfg2(struct adapter *sc, int reg, uint16_t *val)
 {
 
 	*val = pci_read_config(sc->dev, reg, 2);
 }
 
 static inline void
 t4_os_pci_write_cfg2(struct adapter *sc, int reg, uint16_t val)
 {
 
 	pci_write_config(sc->dev, reg, val, 2);
 }
 
 static inline void
 t4_os_pci_read_cfg4(struct adapter *sc, int reg, uint32_t *val)
 {
 
 	*val = pci_read_config(sc->dev, reg, 4);
 }
 
 static inline void
 t4_os_pci_write_cfg4(struct adapter *sc, int reg, uint32_t val)
 {
 
 	pci_write_config(sc->dev, reg, val, 4);
 }
 
 static inline struct port_info *
 adap2pinfo(struct adapter *sc, int idx)
 {
 
 	return (sc->port[idx]);
 }
 
 static inline void
 t4_os_set_hw_addr(struct adapter *sc, int idx, uint8_t hw_addr[])
 {
 
 	bcopy(hw_addr, sc->port[idx]->vi[0].hw_addr, ETHER_ADDR_LEN);
 }
 
 static inline bool
 is_10G_port(const struct port_info *pi)
 {
 
 	return ((pi->link_cfg.supported & FW_PORT_CAP_SPEED_10G) != 0);
 }
 
 static inline bool
 is_40G_port(const struct port_info *pi)
 {
 
 	return ((pi->link_cfg.supported & FW_PORT_CAP_SPEED_40G) != 0);
 }
 
 static inline int
 port_top_speed(const struct port_info *pi)
 {
 
 	if (pi->link_cfg.supported & FW_PORT_CAP_SPEED_100G)
 		return (100);
 	if (pi->link_cfg.supported & FW_PORT_CAP_SPEED_40G)
 		return (40);
 	if (pi->link_cfg.supported & FW_PORT_CAP_SPEED_10G)
 		return (10);
 	if (pi->link_cfg.supported & FW_PORT_CAP_SPEED_1G)
 		return (1);
 
 	return (0);
 }
 
 static inline int
 tx_resume_threshold(struct sge_eq *eq)
 {
 
 	/* not quite the same as qsize / 4, but this will do. */
 	return (eq->sidx / 4);
 }
 
 static inline int
 t4_use_ldst(struct adapter *sc)
 {
 
 #ifdef notyet
 	return (sc->flags & FW_OK || !sc->use_bd);
 #else
 	return (0);
 #endif
 }
 
 /* t4_main.c */
 int t4_os_find_pci_capability(struct adapter *, int);
 int t4_os_pci_save_state(struct adapter *);
 int t4_os_pci_restore_state(struct adapter *);
 void t4_os_portmod_changed(const struct adapter *, int);
 void t4_os_link_changed(struct adapter *, int, int, int);
 void t4_iterate(void (*)(struct adapter *, void *), void *);
 int t4_register_cpl_handler(struct adapter *, int, cpl_handler_t);
 int t4_register_an_handler(struct adapter *, an_handler_t);
 int t4_register_fw_msg_handler(struct adapter *, int, fw_msg_handler_t);
 int t4_filter_rpl(struct sge_iq *, const struct rss_header *, struct mbuf *);
 int begin_synchronized_op(struct adapter *, struct vi_info *, int, char *);
 void doom_vi(struct adapter *, struct vi_info *);
 void end_synchronized_op(struct adapter *, int);
 int update_mac_settings(struct ifnet *, int);
 int adapter_full_init(struct adapter *);
 int adapter_full_uninit(struct adapter *);
 uint64_t cxgbe_get_counter(struct ifnet *, ift_counter);
 int vi_full_init(struct vi_info *);
 int vi_full_uninit(struct vi_info *);
 void vi_sysctls(struct vi_info *);
 void vi_tick(void *);
 
 #ifdef DEV_NETMAP
 /* t4_netmap.c */
 int create_netmap_ifnet(struct port_info *);
 int destroy_netmap_ifnet(struct port_info *);
 void t4_nm_intr(void *);
 #endif
 
 /* t4_sge.c */
 void t4_sge_modload(void);
 void t4_sge_modunload(void);
 uint64_t t4_sge_extfree_refs(void);
 void t4_init_sge_cpl_handlers(struct adapter *);
 void t4_tweak_chip_settings(struct adapter *);
 int t4_read_chip_settings(struct adapter *);
 int t4_create_dma_tag(struct adapter *);
 void t4_sge_sysctls(struct adapter *, struct sysctl_ctx_list *,
     struct sysctl_oid_list *);
 int t4_destroy_dma_tag(struct adapter *);
 int t4_setup_adapter_queues(struct adapter *);
 int t4_teardown_adapter_queues(struct adapter *);
 int t4_setup_vi_queues(struct vi_info *);
 int t4_teardown_vi_queues(struct vi_info *);
 void t4_intr_all(void *);
 void t4_intr(void *);
 void t4_intr_err(void *);
 void t4_intr_evt(void *);
 void t4_wrq_tx_locked(struct adapter *, struct sge_wrq *, struct wrqe *);
 void t4_update_fl_bufsize(struct ifnet *);
 int parse_pkt(struct mbuf **);
 void *start_wrq_wr(struct sge_wrq *, int, struct wrq_cookie *);
 void commit_wrq_wr(struct sge_wrq *, void *, struct wrq_cookie *);
 int tnl_cong(struct port_info *, int);
 
 /* t4_tracer.c */
 struct t4_tracer;
 void t4_tracer_modload(void);
 void t4_tracer_modunload(void);
 void t4_tracer_port_detach(struct adapter *);
 int t4_get_tracer(struct adapter *, struct t4_tracer *);
 int t4_set_tracer(struct adapter *, struct t4_tracer *);
 int t4_trace_pkt(struct sge_iq *, const struct rss_header *, struct mbuf *);
 int t5_trace_pkt(struct sge_iq *, const struct rss_header *, struct mbuf *);
 
 static inline struct wrqe *
 alloc_wrqe(int wr_len, struct sge_wrq *wrq)
 {
 	int len = offsetof(struct wrqe, wr) + wr_len;
 	struct wrqe *wr;
 
 	wr = malloc(len, M_CXGBE, M_NOWAIT);
 	if (__predict_false(wr == NULL))
 		return (NULL);
 	wr->wr_len = wr_len;
 	wr->wrq = wrq;
 	return (wr);
 }
 
 static inline void *
 wrtod(struct wrqe *wr)
 {
 	return (&wr->wr[0]);
 }
 
 static inline void
 free_wrqe(struct wrqe *wr)
 {
 	free(wr, M_CXGBE);
 }
 
 static inline void
 t4_wrq_tx(struct adapter *sc, struct wrqe *wr)
 {
 	struct sge_wrq *wrq = wr->wrq;
 
 	TXQ_LOCK(wrq);
 	t4_wrq_tx_locked(sc, wrq, wr);
 	TXQ_UNLOCK(wrq);
 }
 
 #endif
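
Review note on the ring-index helpers in the adapter.h context above:
IDXINCR advances an index by incr slots and wraps it at wrap, and IDXDIFF
gives the number of slots from tail forward to head. The sketch below is a
standalone illustration, not driver code. The two macros are copied from the
header; ring_advance() and ring_distance() are hypothetical wrappers added
only for this note, and the precondition in the assert (idx below wrap, incr
no larger than wrap) is an assumption about how callers use the macro.

```c
#include <assert.h>

/* Copied verbatim from the adapter.h context above. */
#define IDXINCR(idx, incr, wrap) do { \
	idx = wrap - idx > incr ? idx + incr : incr - (wrap - idx); \
} while (0)
#define IDXDIFF(head, tail, wrap) \
	((head) >= (tail) ? (head) - (tail) : (wrap) - (tail) + (head))

/*
 * Hypothetical wrapper (not in the driver): advance idx by incr slots in a
 * ring of wrap slots and return the new index.
 */
static inline int
ring_advance(int idx, int incr, int wrap)
{

	/* Assumed precondition: a single wrap-around is enough. */
	assert(idx >= 0 && idx < wrap && incr >= 0 && incr <= wrap);
	IDXINCR(idx, incr, wrap);
	return (idx);
}

/*
 * Hypothetical wrapper (not in the driver): number of slots from tail
 * forward to head, counting around the end of the ring if needed.
 */
static inline int
ring_distance(int head, int tail, int wrap)
{

	return (IDXDIFF(head, tail, wrap));
}
```

With a ring of 8 slots, ring_advance(6, 3, 8) wraps to 1 and
ring_distance(1, 6, 8) reports 3 slots. Because IDXINCR subtracts wrap at
most once, an increment larger than the assumed bound can yield an index
outside the ring.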
Index: projects/vnet/sys/dev/cxgbe/t4_main.c
===================================================================
--- projects/vnet/sys/dev/cxgbe/t4_main.c	(revision 301546)
+++ projects/vnet/sys/dev/cxgbe/t4_main.c	(revision 301547)
@@ -1,9442 +1,9561 @@
 /*-
  * Copyright (c) 2011 Chelsio Communications, Inc.
  * All rights reserved.
  * Written by: Navdeep Parhar <np@FreeBSD.org>
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include "opt_ddb.h"
 #include "opt_inet.h"
 #include "opt_inet6.h"
 #include "opt_rss.h"
 
 #include <sys/param.h>
 #include <sys/conf.h>
 #include <sys/priv.h>
 #include <sys/kernel.h>
 #include <sys/bus.h>
 #include <sys/module.h>
 #include <sys/malloc.h>
 #include <sys/queue.h>
 #include <sys/taskqueue.h>
 #include <sys/pciio.h>
 #include <dev/pci/pcireg.h>
 #include <dev/pci/pcivar.h>
 #include <dev/pci/pci_private.h>
 #include <sys/firmware.h>
 #include <sys/sbuf.h>
 #include <sys/smp.h>
 #include <sys/socket.h>
 #include <sys/sockio.h>
 #include <sys/sysctl.h>
 #include <net/ethernet.h>
 #include <net/if.h>
 #include <net/if_types.h>
 #include <net/if_dl.h>
 #include <net/if_vlan_var.h>
 #ifdef RSS
 #include <net/rss_config.h>
 #endif
 #if defined(__i386__) || defined(__amd64__)
 #include <vm/vm.h>
 #include <vm/pmap.h>
 #endif
 #ifdef DDB
 #include <ddb/ddb.h>
 #include <ddb/db_lex.h>
 #endif
 
 #include "common/common.h"
 #include "common/t4_msg.h"
 #include "common/t4_regs.h"
 #include "common/t4_regs_values.h"
 #include "t4_ioctl.h"
 #include "t4_l2t.h"
 #include "t4_mp_ring.h"
 
 /* T4 bus driver interface */
 static int t4_probe(device_t);
 static int t4_attach(device_t);
 static int t4_detach(device_t);
 static device_method_t t4_methods[] = {
 	DEVMETHOD(device_probe,		t4_probe),
 	DEVMETHOD(device_attach,	t4_attach),
 	DEVMETHOD(device_detach,	t4_detach),
 
 	DEVMETHOD_END
 };
 static driver_t t4_driver = {
 	"t4nex",
 	t4_methods,
 	sizeof(struct adapter)
 };
 
 
 /* T4 port (cxgbe) interface */
 static int cxgbe_probe(device_t);
 static int cxgbe_attach(device_t);
 static int cxgbe_detach(device_t);
 static device_method_t cxgbe_methods[] = {
 	DEVMETHOD(device_probe,		cxgbe_probe),
 	DEVMETHOD(device_attach,	cxgbe_attach),
 	DEVMETHOD(device_detach,	cxgbe_detach),
 	{ 0, 0 }
 };
 static driver_t cxgbe_driver = {
 	"cxgbe",
 	cxgbe_methods,
 	sizeof(struct port_info)
 };
 
 /* T4 VI (vcxgbe) interface */
 static int vcxgbe_probe(device_t);
 static int vcxgbe_attach(device_t);
 static int vcxgbe_detach(device_t);
 static device_method_t vcxgbe_methods[] = {
 	DEVMETHOD(device_probe,		vcxgbe_probe),
 	DEVMETHOD(device_attach,	vcxgbe_attach),
 	DEVMETHOD(device_detach,	vcxgbe_detach),
 	{ 0, 0 }
 };
 static driver_t vcxgbe_driver = {
 	"vcxgbe",
 	vcxgbe_methods,
 	sizeof(struct vi_info)
 };
 
 static d_ioctl_t t4_ioctl;
 static d_open_t t4_open;
 static d_close_t t4_close;
 
 static struct cdevsw t4_cdevsw = {
        .d_version = D_VERSION,
        .d_flags = 0,
        .d_open = t4_open,
        .d_close = t4_close,
        .d_ioctl = t4_ioctl,
        .d_name = "t4nex",
 };
 
 /* T5 bus driver interface */
 static int t5_probe(device_t);
 static device_method_t t5_methods[] = {
 	DEVMETHOD(device_probe,		t5_probe),
 	DEVMETHOD(device_attach,	t4_attach),
 	DEVMETHOD(device_detach,	t4_detach),
 
 	DEVMETHOD_END
 };
 static driver_t t5_driver = {
 	"t5nex",
 	t5_methods,
 	sizeof(struct adapter)
 };
 
 
 /* T5 port (cxl) interface */
 static driver_t cxl_driver = {
 	"cxl",
 	cxgbe_methods,
 	sizeof(struct port_info)
 };
 
 /* T5 VI (vcxl) interface */
 static driver_t vcxl_driver = {
 	"vcxl",
 	vcxgbe_methods,
 	sizeof(struct vi_info)
 };
 
 static struct cdevsw t5_cdevsw = {
        .d_version = D_VERSION,
        .d_flags = 0,
        .d_open = t4_open,
        .d_close = t4_close,
        .d_ioctl = t4_ioctl,
        .d_name = "t5nex",
 };
 
 /* ifnet + media interface */
 static void cxgbe_init(void *);
 static int cxgbe_ioctl(struct ifnet *, unsigned long, caddr_t);
 static int cxgbe_transmit(struct ifnet *, struct mbuf *);
 static void cxgbe_qflush(struct ifnet *);
 static int cxgbe_media_change(struct ifnet *);
 static void cxgbe_media_status(struct ifnet *, struct ifmediareq *);
 
 MALLOC_DEFINE(M_CXGBE, "cxgbe", "Chelsio T4/T5 Ethernet driver and services");
 
 /*
  * Correct lock order when you need to acquire multiple locks is t4_list_lock,
  * then ADAPTER_LOCK, then t4_uld_list_lock.
  */
 static struct sx t4_list_lock;
 SLIST_HEAD(, adapter) t4_list;
 #ifdef TCP_OFFLOAD
 static struct sx t4_uld_list_lock;
 SLIST_HEAD(, uld_info) t4_uld_list;
 #endif
 
 /*
  * Tunables.  See tweak_tunables() too.
  *
  * Each tunable is set to a default value here if it's known at compile-time.
  * Otherwise it is set to -1 as an indication to tweak_tunables() that it should
  * provide a reasonable default when the driver is loaded.
  *
  * Tunables applicable to both T4 and T5 are under hw.cxgbe.  Those specific to
  * T5 are under hw.cxl.
  */
 
 /*
  * Number of queues for tx and rx, 10G and 1G, NIC and offload.
  */
 #define NTXQ_10G 16
 static int t4_ntxq10g = -1;
 TUNABLE_INT("hw.cxgbe.ntxq10g", &t4_ntxq10g);
 
 #define NRXQ_10G 8
 static int t4_nrxq10g = -1;
 TUNABLE_INT("hw.cxgbe.nrxq10g", &t4_nrxq10g);
 
 #define NTXQ_1G 4
 static int t4_ntxq1g = -1;
 TUNABLE_INT("hw.cxgbe.ntxq1g", &t4_ntxq1g);
 
 #define NRXQ_1G 2
 static int t4_nrxq1g = -1;
 TUNABLE_INT("hw.cxgbe.nrxq1g", &t4_nrxq1g);
 
 static int t4_rsrv_noflowq = 0;
 TUNABLE_INT("hw.cxgbe.rsrv_noflowq", &t4_rsrv_noflowq);
 
 #ifdef TCP_OFFLOAD
 #define NOFLDTXQ_10G 8
 static int t4_nofldtxq10g = -1;
 TUNABLE_INT("hw.cxgbe.nofldtxq10g", &t4_nofldtxq10g);
 
 #define NOFLDRXQ_10G 2
 static int t4_nofldrxq10g = -1;
 TUNABLE_INT("hw.cxgbe.nofldrxq10g", &t4_nofldrxq10g);
 
 #define NOFLDTXQ_1G 2
 static int t4_nofldtxq1g = -1;
 TUNABLE_INT("hw.cxgbe.nofldtxq1g", &t4_nofldtxq1g);
 
 #define NOFLDRXQ_1G 1
 static int t4_nofldrxq1g = -1;
 TUNABLE_INT("hw.cxgbe.nofldrxq1g", &t4_nofldrxq1g);
 #endif
 
 #ifdef DEV_NETMAP
 #define NNMTXQ_10G 2
 static int t4_nnmtxq10g = -1;
 TUNABLE_INT("hw.cxgbe.nnmtxq10g", &t4_nnmtxq10g);
 
 #define NNMRXQ_10G 2
 static int t4_nnmrxq10g = -1;
 TUNABLE_INT("hw.cxgbe.nnmrxq10g", &t4_nnmrxq10g);
 
 #define NNMTXQ_1G 1
 static int t4_nnmtxq1g = -1;
 TUNABLE_INT("hw.cxgbe.nnmtxq1g", &t4_nnmtxq1g);
 
 #define NNMRXQ_1G 1
 static int t4_nnmrxq1g = -1;
 TUNABLE_INT("hw.cxgbe.nnmrxq1g", &t4_nnmrxq1g);
 #endif
 
 /*
  * Holdoff parameters for 10G and 1G ports.
  */
 #define TMR_IDX_10G 1
 static int t4_tmr_idx_10g = TMR_IDX_10G;
 TUNABLE_INT("hw.cxgbe.holdoff_timer_idx_10G", &t4_tmr_idx_10g);
 
 #define PKTC_IDX_10G (-1)
 static int t4_pktc_idx_10g = PKTC_IDX_10G;
 TUNABLE_INT("hw.cxgbe.holdoff_pktc_idx_10G", &t4_pktc_idx_10g);
 
 #define TMR_IDX_1G 1
 static int t4_tmr_idx_1g = TMR_IDX_1G;
 TUNABLE_INT("hw.cxgbe.holdoff_timer_idx_1G", &t4_tmr_idx_1g);
 
 #define PKTC_IDX_1G (-1)
 static int t4_pktc_idx_1g = PKTC_IDX_1G;
 TUNABLE_INT("hw.cxgbe.holdoff_pktc_idx_1G", &t4_pktc_idx_1g);
 
 /*
  * Size (# of entries) of each tx and rx queue.
  */
 static unsigned int t4_qsize_txq = TX_EQ_QSIZE;
 TUNABLE_INT("hw.cxgbe.qsize_txq", &t4_qsize_txq);
 
 static unsigned int t4_qsize_rxq = RX_IQ_QSIZE;
 TUNABLE_INT("hw.cxgbe.qsize_rxq", &t4_qsize_rxq);
 
 /*
  * Interrupt types allowed (bits 0, 1, 2 = INTx, MSI, MSI-X respectively).
  */
 static int t4_intr_types = INTR_MSIX | INTR_MSI | INTR_INTX;
 TUNABLE_INT("hw.cxgbe.interrupt_types", &t4_intr_types);
 
 /*
  * Configuration file.
  */
 #define DEFAULT_CF	"default"
 #define FLASH_CF	"flash"
 #define UWIRE_CF	"uwire"
 #define FPGA_CF		"fpga"
 static char t4_cfg_file[32] = DEFAULT_CF;
 TUNABLE_STR("hw.cxgbe.config_file", t4_cfg_file, sizeof(t4_cfg_file));
 
 /*
  * PAUSE settings (bit 0, 1 = rx_pause, tx_pause respectively).
  * rx_pause = 1 to heed incoming PAUSE frames, 0 to ignore them.
  * tx_pause = 1 to emit PAUSE frames when the rx FIFO reaches its high water
  *            mark or when signalled to do so, 0 to never emit PAUSE.
  */
 static int t4_pause_settings = PAUSE_TX | PAUSE_RX;
 TUNABLE_INT("hw.cxgbe.pause_settings", &t4_pause_settings);
 
 /*
  * Firmware auto-install by driver during attach (0, 1, 2 = prohibited, allowed,
  * encouraged respectively).
  */
 static unsigned int t4_fw_install = 1;
 TUNABLE_INT("hw.cxgbe.fw_install", &t4_fw_install);
 
 /*
  * ASIC features that will be used.  Disable the ones you don't want so that the
  * chip resources aren't wasted on features that will not be used.
  */
 static int t4_nbmcaps_allowed = 0;
 TUNABLE_INT("hw.cxgbe.nbmcaps_allowed", &t4_nbmcaps_allowed);
 
 static int t4_linkcaps_allowed = 0;	/* No DCBX, PPP, etc. by default */
 TUNABLE_INT("hw.cxgbe.linkcaps_allowed", &t4_linkcaps_allowed);
 
 static int t4_switchcaps_allowed = FW_CAPS_CONFIG_SWITCH_INGRESS |
     FW_CAPS_CONFIG_SWITCH_EGRESS;
 TUNABLE_INT("hw.cxgbe.switchcaps_allowed", &t4_switchcaps_allowed);
 
 static int t4_niccaps_allowed = FW_CAPS_CONFIG_NIC;
 TUNABLE_INT("hw.cxgbe.niccaps_allowed", &t4_niccaps_allowed);
 
 static int t4_toecaps_allowed = -1;
 TUNABLE_INT("hw.cxgbe.toecaps_allowed", &t4_toecaps_allowed);
 
 static int t4_rdmacaps_allowed = -1;
 TUNABLE_INT("hw.cxgbe.rdmacaps_allowed", &t4_rdmacaps_allowed);
 
 static int t4_tlscaps_allowed = 0;
 TUNABLE_INT("hw.cxgbe.tlscaps_allowed", &t4_tlscaps_allowed);
 
 static int t4_iscsicaps_allowed = -1;
 TUNABLE_INT("hw.cxgbe.iscsicaps_allowed", &t4_iscsicaps_allowed);
 
 static int t4_fcoecaps_allowed = 0;
 TUNABLE_INT("hw.cxgbe.fcoecaps_allowed", &t4_fcoecaps_allowed);
 
 static int t5_write_combine = 0;
 TUNABLE_INT("hw.cxl.write_combine", &t5_write_combine);
 
 static int t4_num_vis = 1;
 TUNABLE_INT("hw.cxgbe.num_vis", &t4_num_vis);
 
 /* Functions used by extra VIs to obtain unique MAC addresses for each VI. */
 static int vi_mac_funcs[] = {
 	FW_VI_FUNC_OFLD,
 	FW_VI_FUNC_IWARP,
 	FW_VI_FUNC_OPENISCSI,
 	FW_VI_FUNC_OPENFCOE,
 	FW_VI_FUNC_FOISCSI,
 	FW_VI_FUNC_FOFCOE,
 };
 
 struct intrs_and_queues {
 	uint16_t intr_type;	/* INTx, MSI, or MSI-X */
 	uint16_t nirq;		/* Total # of vectors */
 	uint16_t intr_flags_10g;/* Interrupt flags for each 10G port */
 	uint16_t intr_flags_1g;	/* Interrupt flags for each 1G port */
 	uint16_t ntxq10g;	/* # of NIC txq's for each 10G port */
 	uint16_t nrxq10g;	/* # of NIC rxq's for each 10G port */
 	uint16_t ntxq1g;	/* # of NIC txq's for each 1G port */
 	uint16_t nrxq1g;	/* # of NIC rxq's for each 1G port */
 	uint16_t rsrv_noflowq;	/* Flag whether to reserve queue 0 */
 #ifdef TCP_OFFLOAD
 	uint16_t nofldtxq10g;	/* # of TOE txq's for each 10G port */
 	uint16_t nofldrxq10g;	/* # of TOE rxq's for each 10G port */
 	uint16_t nofldtxq1g;	/* # of TOE txq's for each 1G port */
 	uint16_t nofldrxq1g;	/* # of TOE rxq's for each 1G port */
 #endif
 #ifdef DEV_NETMAP
 	uint16_t nnmtxq10g;	/* # of netmap txq's for each 10G port */
 	uint16_t nnmrxq10g;	/* # of netmap rxq's for each 10G port */
 	uint16_t nnmtxq1g;	/* # of netmap txq's for each 1G port */
 	uint16_t nnmrxq1g;	/* # of netmap rxq's for each 1G port */
 #endif
 };
 
 struct filter_entry {
         uint32_t valid:1;	/* filter allocated and valid */
         uint32_t locked:1;	/* filter is administratively locked */
         uint32_t pending:1;	/* filter action is pending firmware reply */
 	uint32_t smtidx:8;	/* Source MAC Table index for smac */
 	struct l2t_entry *l2t;	/* Layer Two Table entry for dmac */
 
         struct t4_filter_specification fs;
 };
 
 static int map_bars_0_and_4(struct adapter *);
 static int map_bar_2(struct adapter *);
 static void setup_memwin(struct adapter *);
 static void position_memwin(struct adapter *, int, uint32_t);
 static int rw_via_memwin(struct adapter *, int, uint32_t, uint32_t *, int, int);
 static inline int read_via_memwin(struct adapter *, int, uint32_t, uint32_t *,
     int);
 static inline int write_via_memwin(struct adapter *, int, uint32_t,
     const uint32_t *, int);
 static int validate_mem_range(struct adapter *, uint32_t, int);
 static int fwmtype_to_hwmtype(int);
 static int validate_mt_off_len(struct adapter *, int, uint32_t, int,
     uint32_t *);
 static int fixup_devlog_params(struct adapter *);
 static int cfg_itype_and_nqueues(struct adapter *, int, int, int,
     struct intrs_and_queues *);
 static int prep_firmware(struct adapter *);
 static int partition_resources(struct adapter *, const struct firmware *,
     const char *);
 static int get_params__pre_init(struct adapter *);
 static int get_params__post_init(struct adapter *);
 static int set_params__post_init(struct adapter *);
 static void t4_set_desc(struct adapter *);
 static void build_medialist(struct port_info *, struct ifmedia *);
 static int cxgbe_init_synchronized(struct vi_info *);
 static int cxgbe_uninit_synchronized(struct vi_info *);
 static int setup_intr_handlers(struct adapter *);
 static void quiesce_txq(struct adapter *, struct sge_txq *);
 static void quiesce_wrq(struct adapter *, struct sge_wrq *);
 static void quiesce_iq(struct adapter *, struct sge_iq *);
 static void quiesce_fl(struct adapter *, struct sge_fl *);
 static int t4_alloc_irq(struct adapter *, struct irq *, int rid,
     driver_intr_t *, void *, char *);
 static int t4_free_irq(struct adapter *, struct irq *);
 static void get_regs(struct adapter *, struct t4_regdump *, uint8_t *);
 static void vi_refresh_stats(struct adapter *, struct vi_info *);
 static void cxgbe_refresh_stats(struct adapter *, struct port_info *);
 static void cxgbe_tick(void *);
 static void cxgbe_vlan_config(void *, struct ifnet *, uint16_t);
 static int cpl_not_handled(struct sge_iq *, const struct rss_header *,
     struct mbuf *);
 static int an_not_handled(struct sge_iq *, const struct rsp_ctrl *);
 static int fw_msg_not_handled(struct adapter *, const __be64 *);
 static void t4_sysctls(struct adapter *);
 static void cxgbe_sysctls(struct port_info *);
 static int sysctl_int_array(SYSCTL_HANDLER_ARGS);
 static int sysctl_bitfield(SYSCTL_HANDLER_ARGS);
 static int sysctl_btphy(SYSCTL_HANDLER_ARGS);
 static int sysctl_noflowq(SYSCTL_HANDLER_ARGS);
 static int sysctl_holdoff_tmr_idx(SYSCTL_HANDLER_ARGS);
 static int sysctl_holdoff_pktc_idx(SYSCTL_HANDLER_ARGS);
 static int sysctl_qsize_rxq(SYSCTL_HANDLER_ARGS);
 static int sysctl_qsize_txq(SYSCTL_HANDLER_ARGS);
 static int sysctl_pause_settings(SYSCTL_HANDLER_ARGS);
 static int sysctl_handle_t4_reg64(SYSCTL_HANDLER_ARGS);
 static int sysctl_temperature(SYSCTL_HANDLER_ARGS);
 #ifdef SBUF_DRAIN
 static int sysctl_cctrl(SYSCTL_HANDLER_ARGS);
 static int sysctl_cim_ibq_obq(SYSCTL_HANDLER_ARGS);
 static int sysctl_cim_la(SYSCTL_HANDLER_ARGS);
 static int sysctl_cim_la_t6(SYSCTL_HANDLER_ARGS);
 static int sysctl_cim_ma_la(SYSCTL_HANDLER_ARGS);
 static int sysctl_cim_pif_la(SYSCTL_HANDLER_ARGS);
 static int sysctl_cim_qcfg(SYSCTL_HANDLER_ARGS);
 static int sysctl_cpl_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_ddp_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_devlog(SYSCTL_HANDLER_ARGS);
 static int sysctl_fcoe_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_hw_sched(SYSCTL_HANDLER_ARGS);
 static int sysctl_lb_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_linkdnrc(SYSCTL_HANDLER_ARGS);
 static int sysctl_meminfo(SYSCTL_HANDLER_ARGS);
 static int sysctl_mps_tcam(SYSCTL_HANDLER_ARGS);
 static int sysctl_mps_tcam_t6(SYSCTL_HANDLER_ARGS);
 static int sysctl_path_mtus(SYSCTL_HANDLER_ARGS);
 static int sysctl_pm_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_rdma_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_tcp_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_tids(SYSCTL_HANDLER_ARGS);
 static int sysctl_tp_err_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_tp_la_mask(SYSCTL_HANDLER_ARGS);
 static int sysctl_tp_la(SYSCTL_HANDLER_ARGS);
 static int sysctl_tx_rate(SYSCTL_HANDLER_ARGS);
 static int sysctl_ulprx_la(SYSCTL_HANDLER_ARGS);
 static int sysctl_wcwr_stats(SYSCTL_HANDLER_ARGS);
+static int sysctl_tc_params(SYSCTL_HANDLER_ARGS);
 #endif
 #ifdef TCP_OFFLOAD
 static int sysctl_tp_tick(SYSCTL_HANDLER_ARGS);
 static int sysctl_tp_dack_timer(SYSCTL_HANDLER_ARGS);
 static int sysctl_tp_timer(SYSCTL_HANDLER_ARGS);
 #endif
 static uint32_t fconf_iconf_to_mode(uint32_t, uint32_t);
 static uint32_t mode_to_fconf(uint32_t);
 static uint32_t mode_to_iconf(uint32_t);
 static int check_fspec_against_fconf_iconf(struct adapter *,
     struct t4_filter_specification *);
 static int get_filter_mode(struct adapter *, uint32_t *);
 static int set_filter_mode(struct adapter *, uint32_t);
 static inline uint64_t get_filter_hits(struct adapter *, uint32_t);
 static int get_filter(struct adapter *, struct t4_filter *);
 static int set_filter(struct adapter *, struct t4_filter *);
 static int del_filter(struct adapter *, struct t4_filter *);
 static void clear_filter(struct filter_entry *);
 static int set_filter_wr(struct adapter *, int);
 static int del_filter_wr(struct adapter *, int);
 static int get_sge_context(struct adapter *, struct t4_sge_context *);
 static int load_fw(struct adapter *, struct t4_data *);
 static int read_card_mem(struct adapter *, int, struct t4_mem_range *);
 static int read_i2c(struct adapter *, struct t4_i2c_data *);
 static int set_sched_class(struct adapter *, struct t4_sched_params *);
 static int set_sched_queue(struct adapter *, struct t4_sched_queue *);
 #ifdef TCP_OFFLOAD
 static int toe_capability(struct vi_info *, int);
 #endif
 static int mod_event(module_t, int, void *);
 
 struct {
 	uint16_t device;
 	char *desc;
 } t4_pciids[] = {
 	{0xa000, "Chelsio Terminator 4 FPGA"},
 	{0x4400, "Chelsio T440-dbg"},
 	{0x4401, "Chelsio T420-CR"},
 	{0x4402, "Chelsio T422-CR"},
 	{0x4403, "Chelsio T440-CR"},
 	{0x4404, "Chelsio T420-BCH"},
 	{0x4405, "Chelsio T440-BCH"},
 	{0x4406, "Chelsio T440-CH"},
 	{0x4407, "Chelsio T420-SO"},
 	{0x4408, "Chelsio T420-CX"},
 	{0x4409, "Chelsio T420-BT"},
 	{0x440a, "Chelsio T404-BT"},
 	{0x440e, "Chelsio T440-LP-CR"},
 }, t5_pciids[] = {
 	{0xb000, "Chelsio Terminator 5 FPGA"},
 	{0x5400, "Chelsio T580-dbg"},
 	{0x5401,  "Chelsio T520-CR"},		/* 2 x 10G */
 	{0x5402,  "Chelsio T522-CR"},		/* 2 x 10G, 2 X 1G */
 	{0x5403,  "Chelsio T540-CR"},		/* 4 x 10G */
 	{0x5407,  "Chelsio T520-SO"},		/* 2 x 10G, nomem */
 	{0x5409,  "Chelsio T520-BT"},		/* 2 x 10GBaseT */
 	{0x540a,  "Chelsio T504-BT"},		/* 4 x 1G */
 	{0x540d,  "Chelsio T580-CR"},		/* 2 x 40G */
 	{0x540e,  "Chelsio T540-LP-CR"},	/* 4 x 10G */
 	{0x5410,  "Chelsio T580-LP-CR"},	/* 2 x 40G */
 	{0x5411,  "Chelsio T520-LL-CR"},	/* 2 x 10G */
 	{0x5412,  "Chelsio T560-CR"},		/* 1 x 40G, 2 x 10G */
 	{0x5414,  "Chelsio T580-LP-SO-CR"},	/* 2 x 40G, nomem */
 	{0x5415,  "Chelsio T502-BT"},		/* 2 x 1G */
 #ifdef notyet
 	{0x5404,  "Chelsio T520-BCH"},
 	{0x5405,  "Chelsio T540-BCH"},
 	{0x5406,  "Chelsio T540-CH"},
 	{0x5408,  "Chelsio T520-CX"},
 	{0x540b,  "Chelsio B520-SR"},
 	{0x540c,  "Chelsio B504-BT"},
 	{0x540f,  "Chelsio Amsterdam"},
 	{0x5413,  "Chelsio T580-CHR"},
 #endif
 };
 
 #ifdef TCP_OFFLOAD
 /*
  * service_iq() has an iq and needs the fl.  Offset of fl from the iq should be
  * exactly the same for both rxq and ofld_rxq.
  */
 CTASSERT(offsetof(struct sge_ofld_rxq, iq) == offsetof(struct sge_rxq, iq));
 CTASSERT(offsetof(struct sge_ofld_rxq, fl) == offsetof(struct sge_rxq, fl));
 #endif
 
 /* No easy way to include t4_msg.h before adapter.h so we check this way */
 CTASSERT(nitems(((struct adapter *)0)->cpl_handler) == NUM_CPL_CMDS);
 CTASSERT(nitems(((struct adapter *)0)->fw_msg_handler) == NUM_FW6_TYPES);
 
 CTASSERT(sizeof(struct cluster_metadata) <= CL_METADATA_SIZE);
 
 static int
 t4_probe(device_t dev)
 {
 	int i;
 	uint16_t v = pci_get_vendor(dev);
 	uint16_t d = pci_get_device(dev);
 	uint8_t f = pci_get_function(dev);
 
 	if (v != PCI_VENDOR_ID_CHELSIO)
 		return (ENXIO);
 
 	/* Attach only to PF0 of the FPGA */
 	if (d == 0xa000 && f != 0)
 		return (ENXIO);
 
 	for (i = 0; i < nitems(t4_pciids); i++) {
 		if (d == t4_pciids[i].device) {
 			device_set_desc(dev, t4_pciids[i].desc);
 			return (BUS_PROBE_DEFAULT);
 		}
 	}
 
 	return (ENXIO);
 }
 
 static int
 t5_probe(device_t dev)
 {
 	int i;
 	uint16_t v = pci_get_vendor(dev);
 	uint16_t d = pci_get_device(dev);
 	uint8_t f = pci_get_function(dev);
 
 	if (v != PCI_VENDOR_ID_CHELSIO)
 		return (ENXIO);
 
 	/* Attach only to PF0 of the FPGA */
 	if (d == 0xb000 && f != 0)
 		return (ENXIO);
 
 	for (i = 0; i < nitems(t5_pciids); i++) {
 		if (d == t5_pciids[i].device) {
 			device_set_desc(dev, t5_pciids[i].desc);
 			return (BUS_PROBE_DEFAULT);
 		}
 	}
 
 	return (ENXIO);
 }
 
 static void
 t5_attribute_workaround(device_t dev)
 {
 	device_t root_port;
 	uint32_t v;
 
 	/*
 	 * The T5 chips do not properly echo the No Snoop and Relaxed
 	 * Ordering attributes when replying to a TLP from a Root
 	 * Port.  As a workaround, find the parent Root Port and
 	 * disable No Snoop and Relaxed Ordering.  Note that this
 	 * affects all devices under this root port.
 	 */
 	root_port = pci_find_pcie_root_port(dev);
 	if (root_port == NULL) {
 		device_printf(dev, "Unable to find parent root port\n");
 		return;
 	}
 
 	v = pcie_adjust_config(root_port, PCIER_DEVICE_CTL,
 	    PCIEM_CTL_RELAXED_ORD_ENABLE | PCIEM_CTL_NOSNOOP_ENABLE, 0, 2);
 	if ((v & (PCIEM_CTL_RELAXED_ORD_ENABLE | PCIEM_CTL_NOSNOOP_ENABLE)) !=
 	    0)
 		device_printf(dev, "Disabled No Snoop/Relaxed Ordering on %s\n",
 		    device_get_nameunit(root_port));
 }
 
 static int
 t4_attach(device_t dev)
 {
 	struct adapter *sc;
 	int rc = 0, i, j, n10g, n1g, rqidx, tqidx;
 	struct intrs_and_queues iaq;
 	struct sge *s;
 	uint8_t *buf;
 #ifdef TCP_OFFLOAD
 	int ofld_rqidx, ofld_tqidx;
 #endif
 #ifdef DEV_NETMAP
 	int nm_rqidx, nm_tqidx;
 #endif
 	int num_vis;
 
 	sc = device_get_softc(dev);
 	sc->dev = dev;
 	TUNABLE_INT_FETCH("hw.cxgbe.debug_flags", &sc->debug_flags);
 
 	if ((pci_get_device(dev) & 0xff00) == 0x5400)
 		t5_attribute_workaround(dev);
 	pci_enable_busmaster(dev);
 	if (pci_find_cap(dev, PCIY_EXPRESS, &i) == 0) {
 		uint32_t v;
 
 		pci_set_max_read_req(dev, 4096);
 		v = pci_read_config(dev, i + PCIER_DEVICE_CTL, 2);
 		v |= PCIEM_CTL_RELAXED_ORD_ENABLE;
 		pci_write_config(dev, i + PCIER_DEVICE_CTL, v, 2);
 
 		sc->params.pci.mps = 128 << ((v & PCIEM_CTL_MAX_PAYLOAD) >> 5);
 	}
 
 	sc->traceq = -1;
 	mtx_init(&sc->ifp_lock, sc->ifp_lockname, 0, MTX_DEF);
 	snprintf(sc->ifp_lockname, sizeof(sc->ifp_lockname), "%s tracer",
 	    device_get_nameunit(dev));
 
 	snprintf(sc->lockname, sizeof(sc->lockname), "%s",
 	    device_get_nameunit(dev));
 	mtx_init(&sc->sc_lock, sc->lockname, 0, MTX_DEF);
 	sx_xlock(&t4_list_lock);
 	SLIST_INSERT_HEAD(&t4_list, sc, link);
 	sx_xunlock(&t4_list_lock);
 
 	mtx_init(&sc->sfl_lock, "starving freelists", 0, MTX_DEF);
 	TAILQ_INIT(&sc->sfl);
 	callout_init_mtx(&sc->sfl_callout, &sc->sfl_lock, 0);
 
 	mtx_init(&sc->reg_lock, "indirect register access", 0, MTX_DEF);
 
 	rc = map_bars_0_and_4(sc);
 	if (rc != 0)
 		goto done; /* error message displayed already */
 
 	/*
 	 * This is the real PF# to which we're attaching.  Works from within PCI
 	 * passthrough environments too, where pci_get_function() could return a
 	 * different PF# depending on the passthrough configuration.  We need to
 	 * use the real PF# in all our communication with the firmware.
 	 */
 	sc->pf = G_SOURCEPF(t4_read_reg(sc, A_PL_WHOAMI));
 	sc->mbox = sc->pf;
 
 	memset(sc->chan_map, 0xff, sizeof(sc->chan_map));
 	sc->an_handler = an_not_handled;
 	for (i = 0; i < nitems(sc->cpl_handler); i++)
 		sc->cpl_handler[i] = cpl_not_handled;
 	for (i = 0; i < nitems(sc->fw_msg_handler); i++)
 		sc->fw_msg_handler[i] = fw_msg_not_handled;
 	t4_register_cpl_handler(sc, CPL_SET_TCB_RPL, t4_filter_rpl);
 	t4_register_cpl_handler(sc, CPL_TRACE_PKT, t4_trace_pkt);
 	t4_register_cpl_handler(sc, CPL_T5_TRACE_PKT, t5_trace_pkt);
 	t4_init_sge_cpl_handlers(sc);
 
 	/* Prepare the adapter for operation. */
 	buf = malloc(PAGE_SIZE, M_CXGBE, M_ZERO | M_WAITOK);
 	rc = -t4_prep_adapter(sc, buf);
 	free(buf, M_CXGBE);
 	if (rc != 0) {
 		device_printf(dev, "failed to prepare adapter: %d.\n", rc);
 		goto done;
 	}
 
 	/*
 	 * Do this really early, with the memory windows set up even before the
 	 * character device.  The userland tool's register i/o and mem read
 	 * will work even in "recovery mode".
 	 */
 	setup_memwin(sc);
 	if (t4_init_devlog_params(sc, 0) == 0)
 		fixup_devlog_params(sc);
 	sc->cdev = make_dev(is_t4(sc) ? &t4_cdevsw : &t5_cdevsw,
 	    device_get_unit(dev), UID_ROOT, GID_WHEEL, 0600, "%s",
 	    device_get_nameunit(dev));
 	if (sc->cdev == NULL)
 		device_printf(dev, "failed to create nexus char device.\n");
 	else
 		sc->cdev->si_drv1 = sc;
 
 	/* Go no further if recovery mode has been requested. */
 	if (TUNABLE_INT_FETCH("hw.cxgbe.sos", &i) && i != 0) {
 		device_printf(dev, "recovery mode.\n");
 		goto done;
 	}
 
 #if defined(__i386__)
 	if ((cpu_feature & CPUID_CX8) == 0) {
 		device_printf(dev, "64 bit atomics not available.\n");
 		rc = ENOTSUP;
 		goto done;
 	}
 #endif
 
 	/* Prepare the firmware for operation */
 	rc = prep_firmware(sc);
 	if (rc != 0)
 		goto done; /* error message displayed already */
 
 	rc = get_params__post_init(sc);
 	if (rc != 0)
 		goto done; /* error message displayed already */
 
 	rc = set_params__post_init(sc);
 	if (rc != 0)
 		goto done; /* error message displayed already */
 
 	rc = map_bar_2(sc);
 	if (rc != 0)
 		goto done; /* error message displayed already */
 
 	rc = t4_create_dma_tag(sc);
 	if (rc != 0)
 		goto done; /* error message displayed already */
 
 	/*
 	 * Number of VIs to create per-port.  The first VI is the
 	 * "main" regular VI for the port.  The second VI is used for
 	 * netmap if present, and any remaining VIs are used for
 	 * additional virtual interfaces.
 	 *
 	 * Limit the number of VIs per port to the number of available
 	 * MAC addresses per port.
 	 */
 	if (t4_num_vis >= 1)
 		num_vis = t4_num_vis;
 	else
 		num_vis = 1;
 #ifdef DEV_NETMAP
 	num_vis++;
 #endif
 	if (num_vis > nitems(vi_mac_funcs)) {
 		num_vis = nitems(vi_mac_funcs);
 		device_printf(dev, "Number of VIs limited to %d\n", num_vis);
 	}
 
 	/*
 	 * First pass over all the ports - allocate VIs and initialize some
 	 * basic parameters like mac address, port type, etc.  We also figure
 	 * out whether a port is 10G or 1G and use that information when
 	 * calculating how many interrupts to attempt to allocate.
 	 */
 	n10g = n1g = 0;
 	for_each_port(sc, i) {
 		struct port_info *pi;
 		struct vi_info *vi;
 
 		pi = malloc(sizeof(*pi), M_CXGBE, M_ZERO | M_WAITOK);
 		sc->port[i] = pi;
 
 		/* These must be set before t4_port_init */
 		pi->adapter = sc;
 		pi->port_id = i;
 		pi->nvi = num_vis;
 		pi->vi = malloc(sizeof(struct vi_info) * num_vis, M_CXGBE,
 		    M_ZERO | M_WAITOK);
 
 		/*
 		 * Allocate the "main" VI and initialize parameters
 		 * like mac addr.
 		 */
 		rc = -t4_port_init(sc, sc->mbox, sc->pf, 0, i);
 		if (rc != 0) {
 			device_printf(dev, "unable to initialize port %d: %d\n",
 			    i, rc);
 			free(pi->vi, M_CXGBE);
 			free(pi, M_CXGBE);
 			sc->port[i] = NULL;
 			goto done;
 		}
 
 		pi->link_cfg.requested_fc &= ~(PAUSE_TX | PAUSE_RX);
 		pi->link_cfg.requested_fc |= t4_pause_settings;
 		pi->link_cfg.fc &= ~(PAUSE_TX | PAUSE_RX);
 		pi->link_cfg.fc |= t4_pause_settings;
 
 		rc = -t4_link_l1cfg(sc, sc->mbox, pi->tx_chan, &pi->link_cfg);
 		if (rc != 0) {
 			device_printf(dev, "port %d l1cfg failed: %d\n", i, rc);
 			free(pi->vi, M_CXGBE);
 			free(pi, M_CXGBE);
 			sc->port[i] = NULL;
 			goto done;
 		}
 
 		snprintf(pi->lockname, sizeof(pi->lockname), "%sp%d",
 		    device_get_nameunit(dev), i);
 		mtx_init(&pi->pi_lock, pi->lockname, 0, MTX_DEF);
 		sc->chan_map[pi->tx_chan] = i;
 
+		pi->tc = malloc(sizeof(struct tx_sched_class) *
+		    sc->chip_params->nsched_cls, M_CXGBE, M_ZERO | M_WAITOK);
+
 		if (is_10G_port(pi) || is_40G_port(pi)) {
 			n10g++;
 			for_each_vi(pi, j, vi) {
 				vi->tmr_idx = t4_tmr_idx_10g;
 				vi->pktc_idx = t4_pktc_idx_10g;
 			}
 		} else {
 			n1g++;
 			for_each_vi(pi, j, vi) {
 				vi->tmr_idx = t4_tmr_idx_1g;
 				vi->pktc_idx = t4_pktc_idx_1g;
 			}
 		}
 
 		pi->linkdnrc = -1;
 
 		for_each_vi(pi, j, vi) {
 			vi->qsize_rxq = t4_qsize_rxq;
 			vi->qsize_txq = t4_qsize_txq;
 			vi->pi = pi;
 		}
 
 		pi->dev = device_add_child(dev, is_t4(sc) ? "cxgbe" : "cxl", -1);
 		if (pi->dev == NULL) {
 			device_printf(dev,
 			    "failed to add device for port %d.\n", i);
 			rc = ENXIO;
 			goto done;
 		}
 		pi->vi[0].dev = pi->dev;
 		device_set_softc(pi->dev, pi);
 	}
 
 	/*
 	 * Interrupt type, # of interrupts, # of rx/tx queues, etc.
 	 */
 #ifdef DEV_NETMAP
 	num_vis--;
 #endif
 	rc = cfg_itype_and_nqueues(sc, n10g, n1g, num_vis, &iaq);
 	if (rc != 0)
 		goto done; /* error message displayed already */
 
 	sc->intr_type = iaq.intr_type;
 	sc->intr_count = iaq.nirq;
 
 	s = &sc->sge;
 	s->nrxq = n10g * iaq.nrxq10g + n1g * iaq.nrxq1g;
 	s->ntxq = n10g * iaq.ntxq10g + n1g * iaq.ntxq1g;
 	if (num_vis > 1) {
 		s->nrxq += (n10g + n1g) * (num_vis - 1);
 		s->ntxq += (n10g + n1g) * (num_vis - 1);
 	}
 	s->neq = s->ntxq + s->nrxq;	/* the free list in an rxq is an eq */
 	s->neq += sc->params.nports + 1;/* ctrl queues: 1 per port + 1 mgmt */
 	s->niq = s->nrxq + 1;		/* 1 extra for firmware event queue */
 #ifdef TCP_OFFLOAD
 	if (is_offload(sc)) {
 		s->nofldrxq = n10g * iaq.nofldrxq10g + n1g * iaq.nofldrxq1g;
 		s->nofldtxq = n10g * iaq.nofldtxq10g + n1g * iaq.nofldtxq1g;
 		if (num_vis > 1) {
 			s->nofldrxq += (n10g + n1g) * (num_vis - 1);
 			s->nofldtxq += (n10g + n1g) * (num_vis - 1);
 		}
 		s->neq += s->nofldtxq + s->nofldrxq;
 		s->niq += s->nofldrxq;
 
 		s->ofld_rxq = malloc(s->nofldrxq * sizeof(struct sge_ofld_rxq),
 		    M_CXGBE, M_ZERO | M_WAITOK);
 		s->ofld_txq = malloc(s->nofldtxq * sizeof(struct sge_wrq),
 		    M_CXGBE, M_ZERO | M_WAITOK);
 	}
 #endif
 #ifdef DEV_NETMAP
 	s->nnmrxq = n10g * iaq.nnmrxq10g + n1g * iaq.nnmrxq1g;
 	s->nnmtxq = n10g * iaq.nnmtxq10g + n1g * iaq.nnmtxq1g;
 	s->neq += s->nnmtxq + s->nnmrxq;
 	s->niq += s->nnmrxq;
 
 	s->nm_rxq = malloc(s->nnmrxq * sizeof(struct sge_nm_rxq),
 	    M_CXGBE, M_ZERO | M_WAITOK);
 	s->nm_txq = malloc(s->nnmtxq * sizeof(struct sge_nm_txq),
 	    M_CXGBE, M_ZERO | M_WAITOK);
 #endif
 
 	s->ctrlq = malloc(sc->params.nports * sizeof(struct sge_wrq), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 	s->rxq = malloc(s->nrxq * sizeof(struct sge_rxq), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 	s->txq = malloc(s->ntxq * sizeof(struct sge_txq), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 	s->iqmap = malloc(s->niq * sizeof(struct sge_iq *), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 	s->eqmap = malloc(s->neq * sizeof(struct sge_eq *), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 
 	sc->irq = malloc(sc->intr_count * sizeof(struct irq), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 
 	t4_init_l2t(sc, M_WAITOK);
 
 	/*
 	 * Second pass over the ports.  This time we know the number of rx and
 	 * tx queues that each port should get.
 	 */
 	rqidx = tqidx = 0;
 #ifdef TCP_OFFLOAD
 	ofld_rqidx = ofld_tqidx = 0;
 #endif
 #ifdef DEV_NETMAP
 	nm_rqidx = nm_tqidx = 0;
 #endif
 	for_each_port(sc, i) {
 		struct port_info *pi = sc->port[i];
 		struct vi_info *vi;
 
 		if (pi == NULL)
 			continue;
 
 		for_each_vi(pi, j, vi) {
 #ifdef DEV_NETMAP
 			if (j == 1) {
 				vi->flags |= VI_NETMAP | INTR_RXQ;
 				vi->first_rxq = nm_rqidx;
 				vi->first_txq = nm_tqidx;
 				if (is_10G_port(pi) || is_40G_port(pi)) {
 					vi->nrxq = iaq.nnmrxq10g;
 					vi->ntxq = iaq.nnmtxq10g;
 				} else {
 					vi->nrxq = iaq.nnmrxq1g;
 					vi->ntxq = iaq.nnmtxq1g;
 				}
 				nm_rqidx += vi->nrxq;
 				nm_tqidx += vi->ntxq;
 				continue;
 			}
 #endif
 
 			vi->first_rxq = rqidx;
 			vi->first_txq = tqidx;
 			if (is_10G_port(pi) || is_40G_port(pi)) {
 				vi->flags |= iaq.intr_flags_10g & INTR_RXQ;
 				vi->nrxq = j == 0 ? iaq.nrxq10g : 1;
 				vi->ntxq = j == 0 ? iaq.ntxq10g : 1;
 			} else {
 				vi->flags |= iaq.intr_flags_1g & INTR_RXQ;
 				vi->nrxq = j == 0 ? iaq.nrxq1g : 1;
 				vi->ntxq = j == 0 ? iaq.ntxq1g : 1;
 			}
 
 			if (vi->ntxq > 1)
 				vi->rsrv_noflowq = iaq.rsrv_noflowq ? 1 : 0;
 			else
 				vi->rsrv_noflowq = 0;
 
 			rqidx += vi->nrxq;
 			tqidx += vi->ntxq;
 
 #ifdef TCP_OFFLOAD
 			if (!is_offload(sc))
 				continue;
 			vi->first_ofld_rxq = ofld_rqidx;
 			vi->first_ofld_txq = ofld_tqidx;
 			if (is_10G_port(pi) || is_40G_port(pi)) {
 				vi->flags |= iaq.intr_flags_10g & INTR_OFLD_RXQ;
 				vi->nofldrxq = j == 0 ? iaq.nofldrxq10g : 1;
 				vi->nofldtxq = j == 0 ? iaq.nofldtxq10g : 1;
 			} else {
 				vi->flags |= iaq.intr_flags_1g & INTR_OFLD_RXQ;
 				vi->nofldrxq = j == 0 ? iaq.nofldrxq1g : 1;
 				vi->nofldtxq = j == 0 ? iaq.nofldtxq1g : 1;
 			}
 			ofld_rqidx += vi->nofldrxq;
 			ofld_tqidx += vi->nofldtxq;
 #endif
 		}
 	}
 
 	rc = setup_intr_handlers(sc);
 	if (rc != 0) {
 		device_printf(dev,
 		    "failed to setup interrupt handlers: %d\n", rc);
 		goto done;
 	}
 
 	rc = bus_generic_attach(dev);
 	if (rc != 0) {
 		device_printf(dev,
 		    "failed to attach all child ports: %d\n", rc);
 		goto done;
 	}
 
 	device_printf(dev,
 	    "PCIe gen%d x%d, %d ports, %d %s interrupt%s, %d eq, %d iq\n",
 	    sc->params.pci.speed, sc->params.pci.width, sc->params.nports,
 	    sc->intr_count, sc->intr_type == INTR_MSIX ? "MSI-X" :
 	    (sc->intr_type == INTR_MSI ? "MSI" : "INTx"),
 	    sc->intr_count > 1 ? "s" : "", sc->sge.neq, sc->sge.niq);
 
 	t4_set_desc(sc);
 
 done:
 	if (rc != 0 && sc->cdev) {
 		/* cdev was created and so cxgbetool works; recover that way. */
 		device_printf(dev,
 		    "error during attach, adapter is now in recovery mode.\n");
 		rc = 0;
 	}
 
 	if (rc != 0)
 		t4_detach(dev);
 	else
 		t4_sysctls(sc);
 
 	return (rc);
 }
 
 /*
  * Idempotent
  */
 static int
 t4_detach(device_t dev)
 {
 	struct adapter *sc;
 	struct port_info *pi;
 	int i, rc;
 
 	sc = device_get_softc(dev);
 
 	if (sc->flags & FULL_INIT_DONE)
 		t4_intr_disable(sc);
 
 	if (sc->cdev) {
 		destroy_dev(sc->cdev);
 		sc->cdev = NULL;
 	}
 
 	rc = bus_generic_detach(dev);
 	if (rc) {
 		device_printf(dev,
 		    "failed to detach child devices: %d\n", rc);
 		return (rc);
 	}
 
 	for (i = 0; i < sc->intr_count; i++)
 		t4_free_irq(sc, &sc->irq[i]);
 
 	for (i = 0; i < MAX_NPORTS; i++) {
 		pi = sc->port[i];
 		if (pi) {
 			t4_free_vi(sc, sc->mbox, sc->pf, 0, pi->vi[0].viid);
 			if (pi->dev)
 				device_delete_child(dev, pi->dev);
 
 			mtx_destroy(&pi->pi_lock);
 			free(pi->vi, M_CXGBE);
+			free(pi->tc, M_CXGBE);
 			free(pi, M_CXGBE);
 		}
 	}
 
 	if (sc->flags & FULL_INIT_DONE)
 		adapter_full_uninit(sc);
 
 	if (sc->flags & FW_OK)
 		t4_fw_bye(sc, sc->mbox);
 
 	if (sc->intr_type == INTR_MSI || sc->intr_type == INTR_MSIX)
 		pci_release_msi(dev);
 
 	if (sc->regs_res)
 		bus_release_resource(dev, SYS_RES_MEMORY, sc->regs_rid,
 		    sc->regs_res);
 
 	if (sc->udbs_res)
 		bus_release_resource(dev, SYS_RES_MEMORY, sc->udbs_rid,
 		    sc->udbs_res);
 
 	if (sc->msix_res)
 		bus_release_resource(dev, SYS_RES_MEMORY, sc->msix_rid,
 		    sc->msix_res);
 
 	if (sc->l2t)
 		t4_free_l2t(sc->l2t);
 
 #ifdef TCP_OFFLOAD
 	free(sc->sge.ofld_rxq, M_CXGBE);
 	free(sc->sge.ofld_txq, M_CXGBE);
 #endif
 #ifdef DEV_NETMAP
 	free(sc->sge.nm_rxq, M_CXGBE);
 	free(sc->sge.nm_txq, M_CXGBE);
 #endif
 	free(sc->irq, M_CXGBE);
 	free(sc->sge.rxq, M_CXGBE);
 	free(sc->sge.txq, M_CXGBE);
 	free(sc->sge.ctrlq, M_CXGBE);
 	free(sc->sge.iqmap, M_CXGBE);
 	free(sc->sge.eqmap, M_CXGBE);
 	free(sc->tids.ftid_tab, M_CXGBE);
 	t4_destroy_dma_tag(sc);
 	if (mtx_initialized(&sc->sc_lock)) {
 		sx_xlock(&t4_list_lock);
 		SLIST_REMOVE(&t4_list, sc, adapter, link);
 		sx_xunlock(&t4_list_lock);
 		mtx_destroy(&sc->sc_lock);
 	}
 
 	callout_drain(&sc->sfl_callout);
 	if (mtx_initialized(&sc->tids.ftid_lock))
 		mtx_destroy(&sc->tids.ftid_lock);
 	if (mtx_initialized(&sc->sfl_lock))
 		mtx_destroy(&sc->sfl_lock);
 	if (mtx_initialized(&sc->ifp_lock))
 		mtx_destroy(&sc->ifp_lock);
 	if (mtx_initialized(&sc->reg_lock))
 		mtx_destroy(&sc->reg_lock);
 
 	for (i = 0; i < NUM_MEMWIN; i++) {
 		struct memwin *mw = &sc->memwin[i];
 
 		if (rw_initialized(&mw->mw_lock))
 			rw_destroy(&mw->mw_lock);
 	}
 
 	bzero(sc, sizeof(*sc));
 
 	return (0);
 }
 
 static int
 cxgbe_probe(device_t dev)
 {
 	char buf[128];
 	struct port_info *pi = device_get_softc(dev);
 
 	snprintf(buf, sizeof(buf), "port %d", pi->port_id);
 	device_set_desc_copy(dev, buf);
 
 	return (BUS_PROBE_DEFAULT);
 }
 
 #define T4_CAP (IFCAP_VLAN_HWTAGGING | IFCAP_VLAN_MTU | IFCAP_HWCSUM | \
     IFCAP_VLAN_HWCSUM | IFCAP_TSO | IFCAP_JUMBO_MTU | IFCAP_LRO | \
     IFCAP_VLAN_HWTSO | IFCAP_LINKSTATE | IFCAP_HWCSUM_IPV6 | IFCAP_HWSTATS)
 #define T4_CAP_ENABLE (T4_CAP)
 
 static int
 cxgbe_vi_attach(device_t dev, struct vi_info *vi)
 {
 	struct ifnet *ifp;
 	struct sbuf *sb;
 
 	vi->xact_addr_filt = -1;
 	callout_init(&vi->tick, 1);
 
 	/* Allocate an ifnet and set it up */
 	ifp = if_alloc(IFT_ETHER);
 	if (ifp == NULL) {
 		device_printf(dev, "Cannot allocate ifnet\n");
 		return (ENOMEM);
 	}
 	vi->ifp = ifp;
 	ifp->if_softc = vi;
 
 	if_initname(ifp, device_get_name(dev), device_get_unit(dev));
 	ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
 
 	ifp->if_init = cxgbe_init;
 	ifp->if_ioctl = cxgbe_ioctl;
 	ifp->if_transmit = cxgbe_transmit;
 	ifp->if_qflush = cxgbe_qflush;
 	ifp->if_get_counter = cxgbe_get_counter;
 
 	ifp->if_capabilities = T4_CAP;
 #ifdef TCP_OFFLOAD
 	if (vi->nofldrxq != 0)
 		ifp->if_capabilities |= IFCAP_TOE;
 #endif
 	ifp->if_capenable = T4_CAP_ENABLE;
 	ifp->if_hwassist = CSUM_TCP | CSUM_UDP | CSUM_IP | CSUM_TSO |
 	    CSUM_UDP_IPV6 | CSUM_TCP_IPV6;
 
 	ifp->if_hw_tsomax = 65536 - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);
 	ifp->if_hw_tsomaxsegcount = TX_SGL_SEGS;
 	ifp->if_hw_tsomaxsegsize = 65536;
 
 	/* Initialize ifmedia for this VI */
 	ifmedia_init(&vi->media, IFM_IMASK, cxgbe_media_change,
 	    cxgbe_media_status);
 	build_medialist(vi->pi, &vi->media);
 
 	vi->vlan_c = EVENTHANDLER_REGISTER(vlan_config, cxgbe_vlan_config, ifp,
 	    EVENTHANDLER_PRI_ANY);
 
 	ether_ifattach(ifp, vi->hw_addr);
 
 	sb = sbuf_new_auto();
 	sbuf_printf(sb, "%d txq, %d rxq (NIC)", vi->ntxq, vi->nrxq);
 #ifdef TCP_OFFLOAD
 	if (ifp->if_capabilities & IFCAP_TOE)
 		sbuf_printf(sb, "; %d txq, %d rxq (TOE)",
 		    vi->nofldtxq, vi->nofldrxq);
 #endif
 	sbuf_finish(sb);
 	device_printf(dev, "%s\n", sbuf_data(sb));
 	sbuf_delete(sb);
 
 	vi_sysctls(vi);
 
 	return (0);
 }
 
 static int
 cxgbe_attach(device_t dev)
 {
 	struct port_info *pi = device_get_softc(dev);
 	struct vi_info *vi;
 	int i, rc;
 
 	callout_init_mtx(&pi->tick, &pi->pi_lock, 0);
 
 	rc = cxgbe_vi_attach(dev, &pi->vi[0]);
 	if (rc)
 		return (rc);
 
 	for_each_vi(pi, i, vi) {
 		if (i == 0)
 			continue;
 #ifdef DEV_NETMAP
 		if (vi->flags & VI_NETMAP) {
 			/*
 			 * media handled here to keep
 			 * implementation private to this file
 			 */
 			ifmedia_init(&vi->media, IFM_IMASK, cxgbe_media_change,
 			    cxgbe_media_status);
 			build_medialist(pi, &vi->media);
 			vi->dev = device_add_child(dev, is_t4(pi->adapter) ?
 			    "ncxgbe" : "ncxl", device_get_unit(dev));
 		} else
 #endif
 			vi->dev = device_add_child(dev, is_t4(pi->adapter) ?
 			    "vcxgbe" : "vcxl", -1);
 		if (vi->dev == NULL) {
 			device_printf(dev, "failed to add VI %d\n", i);
 			continue;
 		}
 		device_set_softc(vi->dev, vi);
 	}
 
 	cxgbe_sysctls(pi);
 
 	bus_generic_attach(dev);
 
 	return (0);
 }
 
 static void
 cxgbe_vi_detach(struct vi_info *vi)
 {
 	struct ifnet *ifp = vi->ifp;
 
 	ether_ifdetach(ifp);
 
 	if (vi->vlan_c)
 		EVENTHANDLER_DEREGISTER(vlan_config, vi->vlan_c);
 
 	/* Let detach proceed even if these fail. */
 	cxgbe_uninit_synchronized(vi);
 	callout_drain(&vi->tick);
 	vi_full_uninit(vi);
 
 	ifmedia_removeall(&vi->media);
 	if_free(vi->ifp);
 	vi->ifp = NULL;
 }
 
 static int
 cxgbe_detach(device_t dev)
 {
 	struct port_info *pi = device_get_softc(dev);
 	struct adapter *sc = pi->adapter;
 	int rc;
 
 	/* Detach the extra VIs first. */
 	rc = bus_generic_detach(dev);
 	if (rc)
 		return (rc);
 	device_delete_children(dev);
 
 	doom_vi(sc, &pi->vi[0]);
 
 	if (pi->flags & HAS_TRACEQ) {
 		sc->traceq = -1;	/* cloner should not create ifnet */
 		t4_tracer_port_detach(sc);
 	}
 
 	cxgbe_vi_detach(&pi->vi[0]);
 	callout_drain(&pi->tick);
 
 	end_synchronized_op(sc, 0);
 
 	return (0);
 }
 
 static void
 cxgbe_init(void *arg)
 {
 	struct vi_info *vi = arg;
 	struct adapter *sc = vi->pi->adapter;
 
 	if (begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4init") != 0)
 		return;
 	cxgbe_init_synchronized(vi);
 	end_synchronized_op(sc, 0);
 }
 
 static int
 cxgbe_ioctl(struct ifnet *ifp, unsigned long cmd, caddr_t data)
 {
 	int rc = 0, mtu, flags, can_sleep;
 	struct vi_info *vi = ifp->if_softc;
 	struct adapter *sc = vi->pi->adapter;
 	struct ifreq *ifr = (struct ifreq *)data;
 	uint32_t mask;
 
 	switch (cmd) {
 	case SIOCSIFMTU:
 		mtu = ifr->ifr_mtu;
 		if ((mtu < ETHERMIN) || (mtu > ETHERMTU_JUMBO))
 			return (EINVAL);
 
 		rc = begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4mtu");
 		if (rc)
 			return (rc);
 		ifp->if_mtu = mtu;
 		if (vi->flags & VI_INIT_DONE) {
 			t4_update_fl_bufsize(ifp);
 			if (ifp->if_drv_flags & IFF_DRV_RUNNING)
 				rc = update_mac_settings(ifp, XGMAC_MTU);
 		}
 		end_synchronized_op(sc, 0);
 		break;
 
 	case SIOCSIFFLAGS:
 		can_sleep = 0;
 redo_sifflags:
 		rc = begin_synchronized_op(sc, vi,
 		    can_sleep ? (SLEEP_OK | INTR_OK) : HOLD_LOCK, "t4flg");
 		if (rc)
 			return (rc);
 
 		if (ifp->if_flags & IFF_UP) {
 			if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
 				flags = vi->if_flags;
 				if ((ifp->if_flags ^ flags) &
 				    (IFF_PROMISC | IFF_ALLMULTI)) {
 					if (can_sleep == 1) {
 						end_synchronized_op(sc, 0);
 						can_sleep = 0;
 						goto redo_sifflags;
 					}
 					rc = update_mac_settings(ifp,
 					    XGMAC_PROMISC | XGMAC_ALLMULTI);
 				}
 			} else {
 				if (can_sleep == 0) {
 					end_synchronized_op(sc, LOCK_HELD);
 					can_sleep = 1;
 					goto redo_sifflags;
 				}
 				rc = cxgbe_init_synchronized(vi);
 			}
 			vi->if_flags = ifp->if_flags;
 		} else if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
 			if (can_sleep == 0) {
 				end_synchronized_op(sc, LOCK_HELD);
 				can_sleep = 1;
 				goto redo_sifflags;
 			}
 			rc = cxgbe_uninit_synchronized(vi);
 		}
 		end_synchronized_op(sc, can_sleep ? 0 : LOCK_HELD);
 		break;
 
 	case SIOCADDMULTI:
 	case SIOCDELMULTI: /* these two are called with a mutex held :-( */
 		rc = begin_synchronized_op(sc, vi, HOLD_LOCK, "t4multi");
 		if (rc)
 			return (rc);
 		if (ifp->if_drv_flags & IFF_DRV_RUNNING)
 			rc = update_mac_settings(ifp, XGMAC_MCADDRS);
 		end_synchronized_op(sc, LOCK_HELD);
 		break;
 
 	case SIOCSIFCAP:
 		rc = begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4cap");
 		if (rc)
 			return (rc);
 
 		mask = ifr->ifr_reqcap ^ ifp->if_capenable;
 		if (mask & IFCAP_TXCSUM) {
 			ifp->if_capenable ^= IFCAP_TXCSUM;
 			ifp->if_hwassist ^= (CSUM_TCP | CSUM_UDP | CSUM_IP);
 
 			if (IFCAP_TSO4 & ifp->if_capenable &&
 			    !(IFCAP_TXCSUM & ifp->if_capenable)) {
 				ifp->if_capenable &= ~IFCAP_TSO4;
 				if_printf(ifp,
 				    "tso4 disabled due to -txcsum.\n");
 			}
 		}
 		if (mask & IFCAP_TXCSUM_IPV6) {
 			ifp->if_capenable ^= IFCAP_TXCSUM_IPV6;
 			ifp->if_hwassist ^= (CSUM_UDP_IPV6 | CSUM_TCP_IPV6);
 
 			if (IFCAP_TSO6 & ifp->if_capenable &&
 			    !(IFCAP_TXCSUM_IPV6 & ifp->if_capenable)) {
 				ifp->if_capenable &= ~IFCAP_TSO6;
 				if_printf(ifp,
 				    "tso6 disabled due to -txcsum6.\n");
 			}
 		}
 		if (mask & IFCAP_RXCSUM)
 			ifp->if_capenable ^= IFCAP_RXCSUM;
 		if (mask & IFCAP_RXCSUM_IPV6)
 			ifp->if_capenable ^= IFCAP_RXCSUM_IPV6;
 
 		/*
 		 * Note that we leave CSUM_TSO alone (it is always set).  The
 		 * kernel takes both IFCAP_TSOx and CSUM_TSO into account before
 		 * sending a TSO request our way, so it's sufficient to toggle
 		 * IFCAP_TSOx only.
 		 */
 		if (mask & IFCAP_TSO4) {
 			if (!(IFCAP_TSO4 & ifp->if_capenable) &&
 			    !(IFCAP_TXCSUM & ifp->if_capenable)) {
 				if_printf(ifp, "enable txcsum first.\n");
 				rc = EAGAIN;
 				goto fail;
 			}
 			ifp->if_capenable ^= IFCAP_TSO4;
 		}
 		if (mask & IFCAP_TSO6) {
 			if (!(IFCAP_TSO6 & ifp->if_capenable) &&
 			    !(IFCAP_TXCSUM_IPV6 & ifp->if_capenable)) {
 				if_printf(ifp, "enable txcsum6 first.\n");
 				rc = EAGAIN;
 				goto fail;
 			}
 			ifp->if_capenable ^= IFCAP_TSO6;
 		}
 		if (mask & IFCAP_LRO) {
 #if defined(INET) || defined(INET6)
 			int i;
 			struct sge_rxq *rxq;
 
 			ifp->if_capenable ^= IFCAP_LRO;
 			for_each_rxq(vi, i, rxq) {
 				if (ifp->if_capenable & IFCAP_LRO)
 					rxq->iq.flags |= IQ_LRO_ENABLED;
 				else
 					rxq->iq.flags &= ~IQ_LRO_ENABLED;
 			}
 #endif
 		}
 #ifdef TCP_OFFLOAD
 		if (mask & IFCAP_TOE) {
 			int enable = (ifp->if_capenable ^ mask) & IFCAP_TOE;
 
 			rc = toe_capability(vi, enable);
 			if (rc != 0)
 				goto fail;
 
 			ifp->if_capenable ^= mask;
 		}
 #endif
 		if (mask & IFCAP_VLAN_HWTAGGING) {
 			ifp->if_capenable ^= IFCAP_VLAN_HWTAGGING;
 			if (ifp->if_drv_flags & IFF_DRV_RUNNING)
 				rc = update_mac_settings(ifp, XGMAC_VLANEX);
 		}
 		if (mask & IFCAP_VLAN_MTU) {
 			ifp->if_capenable ^= IFCAP_VLAN_MTU;
 
 			/* Need to find out how to disable auto-mtu-inflation */
 		}
 		if (mask & IFCAP_VLAN_HWTSO)
 			ifp->if_capenable ^= IFCAP_VLAN_HWTSO;
 		if (mask & IFCAP_VLAN_HWCSUM)
 			ifp->if_capenable ^= IFCAP_VLAN_HWCSUM;
 
 #ifdef VLAN_CAPABILITIES
 		VLAN_CAPABILITIES(ifp);
 #endif
 fail:
 		end_synchronized_op(sc, 0);
 		break;
 
 	case SIOCSIFMEDIA:
 	case SIOCGIFMEDIA:
 		ifmedia_ioctl(ifp, ifr, &vi->media, cmd);
 		break;
 
 	case SIOCGI2C: {
 		struct ifi2creq i2c;
 
 		rc = copyin(ifr->ifr_data, &i2c, sizeof(i2c));
 		if (rc != 0)
 			break;
 		if (i2c.dev_addr != 0xA0 && i2c.dev_addr != 0xA2) {
 			rc = EPERM;
 			break;
 		}
 		if (i2c.len > sizeof(i2c.data)) {
 			rc = EINVAL;
 			break;
 		}
 		rc = begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4i2c");
 		if (rc)
 			return (rc);
 		rc = -t4_i2c_rd(sc, sc->mbox, vi->pi->port_id, i2c.dev_addr,
 		    i2c.offset, i2c.len, &i2c.data[0]);
 		end_synchronized_op(sc, 0);
 		if (rc == 0)
 			rc = copyout(&i2c, ifr->ifr_data, sizeof(i2c));
 		break;
 	}
 
 	default:
 		rc = ether_ioctl(ifp, cmd, data);
 	}
 
 	return (rc);
 }
 
 static int
 cxgbe_transmit(struct ifnet *ifp, struct mbuf *m)
 {
 	struct vi_info *vi = ifp->if_softc;
 	struct port_info *pi = vi->pi;
 	struct adapter *sc = pi->adapter;
 	struct sge_txq *txq;
 	void *items[1];
 	int rc;
 
 	M_ASSERTPKTHDR(m);
 	MPASS(m->m_nextpkt == NULL);	/* not quite ready for this yet */
 
 	if (__predict_false(pi->link_cfg.link_ok == 0)) {
 		m_freem(m);
 		return (ENETDOWN);
 	}
 
 	rc = parse_pkt(&m);
 	if (__predict_false(rc != 0)) {
 		MPASS(m == NULL);			/* was freed already */
 		atomic_add_int(&pi->tx_parse_error, 1);	/* rare, atomic is ok */
 		return (rc);
 	}
 
 	/* Select a txq. */
 	txq = &sc->sge.txq[vi->first_txq];
 	if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE)
 		txq += ((m->m_pkthdr.flowid % (vi->ntxq - vi->rsrv_noflowq)) +
 		    vi->rsrv_noflowq);
 
 	items[0] = m;
 	rc = mp_ring_enqueue(txq->r, items, 1, 4096);
 	if (__predict_false(rc != 0))
 		m_freem(m);
 
 	return (rc);
 }
 
 static void
 cxgbe_qflush(struct ifnet *ifp)
 {
 	struct vi_info *vi = ifp->if_softc;
 	struct sge_txq *txq;
 	int i;
 
 	/* queues do not exist if !VI_INIT_DONE. */
 	if (vi->flags & VI_INIT_DONE) {
 		for_each_txq(vi, i, txq) {
 			TXQ_LOCK(txq);
 			txq->eq.flags &= ~EQ_ENABLED;
 			TXQ_UNLOCK(txq);
 			while (!mp_ring_is_idle(txq->r)) {
 				mp_ring_check_drainage(txq->r, 0);
 				pause("qflush", 1);
 			}
 		}
 	}
 	if_qflush(ifp);
 }
 
 static uint64_t
 vi_get_counter(struct ifnet *ifp, ift_counter c)
 {
 	struct vi_info *vi = ifp->if_softc;
 	struct fw_vi_stats_vf *s = &vi->stats;
 
 	vi_refresh_stats(vi->pi->adapter, vi);
 
 	switch (c) {
 	case IFCOUNTER_IPACKETS:
 		return (s->rx_bcast_frames + s->rx_mcast_frames +
 		    s->rx_ucast_frames);
 	case IFCOUNTER_IERRORS:
 		return (s->rx_err_frames);
 	case IFCOUNTER_OPACKETS:
 		return (s->tx_bcast_frames + s->tx_mcast_frames +
 		    s->tx_ucast_frames + s->tx_offload_frames);
 	case IFCOUNTER_OERRORS:
 		return (s->tx_drop_frames);
 	case IFCOUNTER_IBYTES:
 		return (s->rx_bcast_bytes + s->rx_mcast_bytes +
 		    s->rx_ucast_bytes);
 	case IFCOUNTER_OBYTES:
 		return (s->tx_bcast_bytes + s->tx_mcast_bytes +
 		    s->tx_ucast_bytes + s->tx_offload_bytes);
 	case IFCOUNTER_IMCASTS:
 		return (s->rx_mcast_frames);
 	case IFCOUNTER_OMCASTS:
 		return (s->tx_mcast_frames);
 	case IFCOUNTER_OQDROPS: {
 		uint64_t drops;
 
 		drops = 0;
 		if ((vi->flags & (VI_INIT_DONE | VI_NETMAP)) == VI_INIT_DONE) {
 			int i;
 			struct sge_txq *txq;
 
 			for_each_txq(vi, i, txq)
 				drops += counter_u64_fetch(txq->r->drops);
 		}
 
 		return (drops);
 
 	}
 
 	default:
 		return (if_get_counter_default(ifp, c));
 	}
 }
 
 uint64_t
 cxgbe_get_counter(struct ifnet *ifp, ift_counter c)
 {
 	struct vi_info *vi = ifp->if_softc;
 	struct port_info *pi = vi->pi;
 	struct adapter *sc = pi->adapter;
 	struct port_stats *s = &pi->stats;
 
 	if (pi->nvi > 1)
 		return (vi_get_counter(ifp, c));
 
 	cxgbe_refresh_stats(sc, pi);
 
 	switch (c) {
 	case IFCOUNTER_IPACKETS:
 		return (s->rx_frames);
 
 	case IFCOUNTER_IERRORS:
 		return (s->rx_jabber + s->rx_runt + s->rx_too_long +
 		    s->rx_fcs_err + s->rx_len_err);
 
 	case IFCOUNTER_OPACKETS:
 		return (s->tx_frames);
 
 	case IFCOUNTER_OERRORS:
 		return (s->tx_error_frames);
 
 	case IFCOUNTER_IBYTES:
 		return (s->rx_octets);
 
 	case IFCOUNTER_OBYTES:
 		return (s->tx_octets);
 
 	case IFCOUNTER_IMCASTS:
 		return (s->rx_mcast_frames);
 
 	case IFCOUNTER_OMCASTS:
 		return (s->tx_mcast_frames);
 
 	case IFCOUNTER_IQDROPS:
 		return (s->rx_ovflow0 + s->rx_ovflow1 + s->rx_ovflow2 +
 		    s->rx_ovflow3 + s->rx_trunc0 + s->rx_trunc1 + s->rx_trunc2 +
 		    s->rx_trunc3 + pi->tnl_cong_drops);
 
 	case IFCOUNTER_OQDROPS: {
 		uint64_t drops;
 
 		drops = s->tx_drop;
 		if (vi->flags & VI_INIT_DONE) {
 			int i;
 			struct sge_txq *txq;
 
 			for_each_txq(vi, i, txq)
 				drops += counter_u64_fetch(txq->r->drops);
 		}
 
 		return (drops);
 
 	}
 
 	default:
 		return (if_get_counter_default(ifp, c));
 	}
 }
 
 static int
 cxgbe_media_change(struct ifnet *ifp)
 {
 	struct vi_info *vi = ifp->if_softc;
 
 	device_printf(vi->dev, "%s unimplemented.\n", __func__);
 
 	return (EOPNOTSUPP);
 }
 
 static void
 cxgbe_media_status(struct ifnet *ifp, struct ifmediareq *ifmr)
 {
 	struct vi_info *vi = ifp->if_softc;
 	struct port_info *pi = vi->pi;
 	struct ifmedia_entry *cur;
 	int speed = pi->link_cfg.speed;
 
 	cur = vi->media.ifm_cur;
 
 	ifmr->ifm_status = IFM_AVALID;
 	if (!pi->link_cfg.link_ok)
 		return;
 
 	ifmr->ifm_status |= IFM_ACTIVE;
 
 	/* active and current will differ iff current media is autoselect. */
 	if (IFM_SUBTYPE(cur->ifm_media) != IFM_AUTO)
 		return;
 
 	ifmr->ifm_active = IFM_ETHER | IFM_FDX;
 	if (speed == 10000)
 		ifmr->ifm_active |= IFM_10G_T;
 	else if (speed == 1000)
 		ifmr->ifm_active |= IFM_1000_T;
 	else if (speed == 100)
 		ifmr->ifm_active |= IFM_100_TX;
 	else if (speed == 10)
 		ifmr->ifm_active |= IFM_10_T;
 	else
 		KASSERT(0, ("%s: link up but speed unknown (%u)", __func__,
 			    speed));
 }
 
 static int
 vcxgbe_probe(device_t dev)
 {
 	char buf[128];
 	struct vi_info *vi = device_get_softc(dev);
 
 	snprintf(buf, sizeof(buf), "port %d vi %td", vi->pi->port_id,
 	    vi - vi->pi->vi);
 	device_set_desc_copy(dev, buf);
 
 	return (BUS_PROBE_DEFAULT);
 }
 
 static int
 vcxgbe_attach(device_t dev)
 {
 	struct vi_info *vi;
 	struct port_info *pi;
 	struct adapter *sc;
 	int func, index, rc;
 	u32 param, val;
 
 	vi = device_get_softc(dev);
 	pi = vi->pi;
 	sc = pi->adapter;
 
 	index = vi - pi->vi;
 	KASSERT(index < nitems(vi_mac_funcs),
 	    ("%s: VI %s doesn't have a MAC func", __func__,
 	    device_get_nameunit(dev)));
 	func = vi_mac_funcs[index];
 	rc = t4_alloc_vi_func(sc, sc->mbox, pi->tx_chan, sc->pf, 0, 1,
 	    vi->hw_addr, &vi->rss_size, func, 0);
 	if (rc < 0) {
 		device_printf(dev, "Failed to allocate virtual interface "
 		    "for port %d: %d\n", pi->port_id, -rc);
 		return (-rc);
 	}
 	vi->viid = rc;
 
 	param = V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DEV) |
 	    V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DEV_RSSINFO) |
 	    V_FW_PARAMS_PARAM_YZ(vi->viid);
 	rc = t4_query_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val);
 	if (rc)
 		vi->rss_base = 0xffff;
 	else {
 		/* MPASS((val >> 16) == rss_size); */
 		vi->rss_base = val & 0xffff;
 	}
 
 	rc = cxgbe_vi_attach(dev, vi);
 	if (rc) {
 		t4_free_vi(sc, sc->mbox, sc->pf, 0, vi->viid);
 		return (rc);
 	}
 	return (0);
 }
 
 static int
 vcxgbe_detach(device_t dev)
 {
 	struct vi_info *vi;
 	struct adapter *sc;
 
 	vi = device_get_softc(dev);
 	sc = vi->pi->adapter;
 
 	doom_vi(sc, vi);
 
 	cxgbe_vi_detach(vi);
 	t4_free_vi(sc, sc->mbox, sc->pf, 0, vi->viid);
 
 	end_synchronized_op(sc, 0);
 
 	return (0);
 }
 
 void
 t4_fatal_err(struct adapter *sc)
 {
 	t4_set_reg_field(sc, A_SGE_CONTROL, F_GLOBALENABLE, 0);
 	t4_intr_disable(sc);
 	log(LOG_EMERG, "%s: encountered fatal error, adapter stopped.\n",
 	    device_get_nameunit(sc->dev));
 }
 
 static int
 map_bars_0_and_4(struct adapter *sc)
 {
 	sc->regs_rid = PCIR_BAR(0);
 	sc->regs_res = bus_alloc_resource_any(sc->dev, SYS_RES_MEMORY,
 	    &sc->regs_rid, RF_ACTIVE);
 	if (sc->regs_res == NULL) {
 		device_printf(sc->dev, "cannot map registers.\n");
 		return (ENXIO);
 	}
 	sc->bt = rman_get_bustag(sc->regs_res);
 	sc->bh = rman_get_bushandle(sc->regs_res);
 	sc->mmio_len = rman_get_size(sc->regs_res);
 	setbit(&sc->doorbells, DOORBELL_KDB);
 
 	sc->msix_rid = PCIR_BAR(4);
 	sc->msix_res = bus_alloc_resource_any(sc->dev, SYS_RES_MEMORY,
 	    &sc->msix_rid, RF_ACTIVE);
 	if (sc->msix_res == NULL) {
 		device_printf(sc->dev, "cannot map MSI-X BAR.\n");
 		return (ENXIO);
 	}
 
 	return (0);
 }
 
 static int
 map_bar_2(struct adapter *sc)
 {
 
 	/*
 	 * T4: only iWARP driver uses the userspace doorbells.  There is no need
 	 * to map it if RDMA is disabled.
 	 */
 	if (is_t4(sc) && sc->rdmacaps == 0)
 		return (0);
 
 	sc->udbs_rid = PCIR_BAR(2);
 	sc->udbs_res = bus_alloc_resource_any(sc->dev, SYS_RES_MEMORY,
 	    &sc->udbs_rid, RF_ACTIVE);
 	if (sc->udbs_res == NULL) {
 		device_printf(sc->dev, "cannot map doorbell BAR.\n");
 		return (ENXIO);
 	}
 	sc->udbs_base = rman_get_virtual(sc->udbs_res);
 
 	if (is_t5(sc)) {
 		setbit(&sc->doorbells, DOORBELL_UDB);
 #if defined(__i386__) || defined(__amd64__)
 		if (t5_write_combine) {
 			int rc;
 
 			/*
 			 * Enable write combining on BAR2.  This is the
 			 * userspace doorbell BAR and is split into 128B
 			 * (UDBS_SEG_SIZE) doorbell regions, each associated
 			 * with an egress queue.  The first 64B has the doorbell
 			 * and the second 64B can be used to submit a tx work
 			 * request with an implicit doorbell.
 			 */
 
 			rc = pmap_change_attr((vm_offset_t)sc->udbs_base,
 			    rman_get_size(sc->udbs_res), PAT_WRITE_COMBINING);
 			if (rc == 0) {
 				clrbit(&sc->doorbells, DOORBELL_UDB);
 				setbit(&sc->doorbells, DOORBELL_WCWR);
 				setbit(&sc->doorbells, DOORBELL_UDBWC);
 			} else {
 				device_printf(sc->dev,
 				    "couldn't enable write combining: %d\n",
 				    rc);
 			}
 
 			t4_write_reg(sc, A_SGE_STAT_CFG,
 			    V_STATSOURCE_T5(7) | V_STATMODE(0));
 		}
 #endif
 	}
 
 	return (0);
 }
 
 struct memwin_init {
 	uint32_t base;
 	uint32_t aperture;
 };
 
 static const struct memwin_init t4_memwin[NUM_MEMWIN] = {
 	{ MEMWIN0_BASE, MEMWIN0_APERTURE },
 	{ MEMWIN1_BASE, MEMWIN1_APERTURE },
 	{ MEMWIN2_BASE_T4, MEMWIN2_APERTURE_T4 }
 };
 
 static const struct memwin_init t5_memwin[NUM_MEMWIN] = {
 	{ MEMWIN0_BASE, MEMWIN0_APERTURE },
 	{ MEMWIN1_BASE, MEMWIN1_APERTURE },
 	{ MEMWIN2_BASE_T5, MEMWIN2_APERTURE_T5 },
 };
 
 static void
 setup_memwin(struct adapter *sc)
 {
 	const struct memwin_init *mw_init;
 	struct memwin *mw;
 	int i;
 	uint32_t bar0;
 
 	if (is_t4(sc)) {
 		/*
 		 * Read low 32b of bar0 indirectly via the hardware backdoor
 		 * mechanism.  Works from within PCI passthrough environments
 		 * too, where rman_get_start() can return a different value.  We
 		 * need to program the T4 memory window decoders with the actual
 		 * addresses that will be coming across the PCIe link.
 		 */
 		bar0 = t4_hw_pci_read_cfg4(sc, PCIR_BAR(0));
 		bar0 &= (uint32_t) PCIM_BAR_MEM_BASE;
 
 		mw_init = &t4_memwin[0];
 	} else {
 		/* T5+ use the relative offset inside the PCIe BAR */
 		bar0 = 0;
 
 		mw_init = &t5_memwin[0];
 	}
 
 	for (i = 0, mw = &sc->memwin[0]; i < NUM_MEMWIN; i++, mw_init++, mw++) {
 		rw_init(&mw->mw_lock, "memory window access");
 		mw->mw_base = mw_init->base;
 		mw->mw_aperture = mw_init->aperture;
 		mw->mw_curpos = 0;
 		t4_write_reg(sc,
 		    PCIE_MEM_ACCESS_REG(A_PCIE_MEM_ACCESS_BASE_WIN, i),
 		    (mw->mw_base + bar0) | V_BIR(0) |
 		    V_WINDOW(ilog2(mw->mw_aperture) - 10));
 		rw_wlock(&mw->mw_lock);
 		position_memwin(sc, i, 0);
 		rw_wunlock(&mw->mw_lock);
 	}
 
 	/* flush */
 	t4_read_reg(sc, PCIE_MEM_ACCESS_REG(A_PCIE_MEM_ACCESS_BASE_WIN, 2));
 }
 
 /*
  * Positions the memory window at the given address in the card's address space.
  * There are some alignment requirements and the actual position may be at an
  * address prior to the requested address.  mw->mw_curpos always has the actual
  * position of the window.
  */
 static void
 position_memwin(struct adapter *sc, int idx, uint32_t addr)
 {
 	struct memwin *mw;
 	uint32_t pf;
 	uint32_t reg;
 
 	MPASS(idx >= 0 && idx < NUM_MEMWIN);
 	mw = &sc->memwin[idx];
 	rw_assert(&mw->mw_lock, RA_WLOCKED);
 
 	if (is_t4(sc)) {
 		pf = 0;
 		mw->mw_curpos = addr & ~0xf;	/* start must be 16B aligned */
 	} else {
 		pf = V_PFNUM(sc->pf);
 		mw->mw_curpos = addr & ~0x7f;	/* start must be 128B aligned */
 	}
 	reg = PCIE_MEM_ACCESS_REG(A_PCIE_MEM_ACCESS_OFFSET, idx);
 	t4_write_reg(sc, reg, mw->mw_curpos | pf);
 	t4_read_reg(sc, reg);	/* flush */
 }
 
 static int
 rw_via_memwin(struct adapter *sc, int idx, uint32_t addr, uint32_t *val,
     int len, int rw)
 {
 	struct memwin *mw;
 	uint32_t mw_end, v;
 
 	MPASS(idx >= 0 && idx < NUM_MEMWIN);
 
 	/* Memory can only be accessed in naturally aligned 4 byte units */
 	if (addr & 3 || len & 3 || len <= 0)
 		return (EINVAL);
 
 	mw = &sc->memwin[idx];
 	while (len > 0) {
 		rw_rlock(&mw->mw_lock);
 		mw_end = mw->mw_curpos + mw->mw_aperture;
 		if (addr >= mw_end || addr < mw->mw_curpos) {
 			/* Will need to reposition the window */
 			if (!rw_try_upgrade(&mw->mw_lock)) {
 				rw_runlock(&mw->mw_lock);
 				rw_wlock(&mw->mw_lock);
 			}
 			rw_assert(&mw->mw_lock, RA_WLOCKED);
 			position_memwin(sc, idx, addr);
 			rw_downgrade(&mw->mw_lock);
 			mw_end = mw->mw_curpos + mw->mw_aperture;
 		}
 		rw_assert(&mw->mw_lock, RA_RLOCKED);
 		while (addr < mw_end && len > 0) {
 			if (rw == 0) {
 				v = t4_read_reg(sc, mw->mw_base + addr -
 				    mw->mw_curpos);
 				*val++ = le32toh(v);
 			} else {
 				v = *val++;
 				t4_write_reg(sc, mw->mw_base + addr -
 				    mw->mw_curpos, htole32(v));
 			}
 			addr += 4;
 			len -= 4;
 		}
 		rw_runlock(&mw->mw_lock);
 	}
 
 	return (0);
 }
 
 static inline int
 read_via_memwin(struct adapter *sc, int idx, uint32_t addr, uint32_t *val,
     int len)
 {
 
 	return (rw_via_memwin(sc, idx, addr, val, len, 0));
 }
 
 static inline int
 write_via_memwin(struct adapter *sc, int idx, uint32_t addr,
     const uint32_t *val, int len)
 {
 
 	return (rw_via_memwin(sc, idx, addr, (void *)(uintptr_t)val, len, 1));
 }
 
 static int
 t4_range_cmp(const void *a, const void *b)
 {
 	return ((const struct t4_range *)a)->start -
 	       ((const struct t4_range *)b)->start;
 }
 
 /*
  * Verify that the memory range specified by the addr/len pair is valid within
  * the card's address space.
  */
 static int
 validate_mem_range(struct adapter *sc, uint32_t addr, int len)
 {
 	struct t4_range mem_ranges[4], *r, *next;
 	uint32_t em, addr_len;
 	int i, n, remaining;
 
 	/* Memory can only be accessed in naturally aligned 4 byte units */
 	if (addr & 3 || len & 3 || len <= 0)
 		return (EINVAL);
 
 	/* Enabled memories */
 	em = t4_read_reg(sc, A_MA_TARGET_MEM_ENABLE);
 
 	r = &mem_ranges[0];
 	n = 0;
 	bzero(r, sizeof(mem_ranges));
 	if (em & F_EDRAM0_ENABLE) {
 		addr_len = t4_read_reg(sc, A_MA_EDRAM0_BAR);
 		r->size = G_EDRAM0_SIZE(addr_len) << 20;
 		if (r->size > 0) {
 			r->start = G_EDRAM0_BASE(addr_len) << 20;
 			if (addr >= r->start &&
 			    addr + len <= r->start + r->size)
 				return (0);
 			r++;
 			n++;
 		}
 	}
 	if (em & F_EDRAM1_ENABLE) {
 		addr_len = t4_read_reg(sc, A_MA_EDRAM1_BAR);
 		r->size = G_EDRAM1_SIZE(addr_len) << 20;
 		if (r->size > 0) {
 			r->start = G_EDRAM1_BASE(addr_len) << 20;
 			if (addr >= r->start &&
 			    addr + len <= r->start + r->size)
 				return (0);
 			r++;
 			n++;
 		}
 	}
 	if (em & F_EXT_MEM_ENABLE) {
 		addr_len = t4_read_reg(sc, A_MA_EXT_MEMORY_BAR);
 		r->size = G_EXT_MEM_SIZE(addr_len) << 20;
 		if (r->size > 0) {
 			r->start = G_EXT_MEM_BASE(addr_len) << 20;
 			if (addr >= r->start &&
 			    addr + len <= r->start + r->size)
 				return (0);
 			r++;
 			n++;
 		}
 	}
 	if (is_t5(sc) && em & F_EXT_MEM1_ENABLE) {
 		addr_len = t4_read_reg(sc, A_MA_EXT_MEMORY1_BAR);
 		r->size = G_EXT_MEM1_SIZE(addr_len) << 20;
 		if (r->size > 0) {
 			r->start = G_EXT_MEM1_BASE(addr_len) << 20;
 			if (addr >= r->start &&
 			    addr + len <= r->start + r->size)
 				return (0);
 			r++;
 			n++;
 		}
 	}
 	MPASS(n <= nitems(mem_ranges));
 
 	if (n > 1) {
 		/* Sort and merge the ranges. */
 		qsort(mem_ranges, n, sizeof(struct t4_range), t4_range_cmp);
 
 		/* Start from index 0 and examine the next n - 1 entries. */
 		r = &mem_ranges[0];
 		for (remaining = n - 1; remaining > 0; remaining--, r++) {
 
 			MPASS(r->size > 0);	/* r is a valid entry. */
 			next = r + 1;
 			MPASS(next->size > 0);	/* and so is the next one. */
 
 			while (r->start + r->size >= next->start) {
 				/* Merge the next one into the current entry. */
 				r->size = max(r->start + r->size,
 				    next->start + next->size) - r->start;
 				n--;	/* One fewer entry in total. */
 				if (--remaining == 0)
 					goto done;	/* short circuit */
 				next++;
 			}
 			if (next != r + 1) {
 				/*
 				 * Some entries were merged into r and next
 				 * points to the first valid entry that couldn't
 				 * be merged.
 				 */
 				MPASS(next->size > 0);	/* must be valid */
 				memmove(r + 1, next, remaining * sizeof(*r));
 #ifdef INVARIANTS
 				/*
 				 * This is so that the foo->size assertions in
 				 * the next iteration of the loop do the right
 				 * thing for entries that were pulled up and are
 				 * no longer valid.
 				 */
 				MPASS(n < nitems(mem_ranges));
 				bzero(&mem_ranges[n], (nitems(mem_ranges) - n) *
 				    sizeof(struct t4_range));
 #endif
 			}
 		}
 done:
 		/* Done merging the ranges. */
 		MPASS(n > 0);
 		r = &mem_ranges[0];
 		for (i = 0; i < n; i++, r++) {
 			if (addr >= r->start &&
 			    addr + len <= r->start + r->size)
 				return (0);
 		}
 	}
 
 	return (EFAULT);
 }
 
 static int
 fwmtype_to_hwmtype(int mtype)
 {
 
 	switch (mtype) {
 	case FW_MEMTYPE_EDC0:
 		return (MEM_EDC0);
 	case FW_MEMTYPE_EDC1:
 		return (MEM_EDC1);
 	case FW_MEMTYPE_EXTMEM:
 		return (MEM_MC0);
 	case FW_MEMTYPE_EXTMEM1:
 		return (MEM_MC1);
 	default:
 		panic("%s: cannot translate fw mtype %d.", __func__, mtype);
 	}
 }
 
 /*
  * Verify that the memory range specified by the memtype/offset/len pair is
  * valid and lies entirely within the memtype specified.  The global address of
  * the start of the range is returned in addr.
  */
 static int
 validate_mt_off_len(struct adapter *sc, int mtype, uint32_t off, int len,
     uint32_t *addr)
 {
 	uint32_t em, addr_len, maddr;
 
 	/* Memory can only be accessed in naturally aligned 4 byte units */
 	if (off & 3 || len & 3 || len == 0)
 		return (EINVAL);
 
 	em = t4_read_reg(sc, A_MA_TARGET_MEM_ENABLE);
 	switch (fwmtype_to_hwmtype(mtype)) {
 	case MEM_EDC0:
 		if (!(em & F_EDRAM0_ENABLE))
 			return (EINVAL);
 		addr_len = t4_read_reg(sc, A_MA_EDRAM0_BAR);
 		maddr = G_EDRAM0_BASE(addr_len) << 20;
 		break;
 	case MEM_EDC1:
 		if (!(em & F_EDRAM1_ENABLE))
 			return (EINVAL);
 		addr_len = t4_read_reg(sc, A_MA_EDRAM1_BAR);
 		maddr = G_EDRAM1_BASE(addr_len) << 20;
 		break;
 	case MEM_MC:
 		if (!(em & F_EXT_MEM_ENABLE))
 			return (EINVAL);
 		addr_len = t4_read_reg(sc, A_MA_EXT_MEMORY_BAR);
 		maddr = G_EXT_MEM_BASE(addr_len) << 20;
 		break;
 	case MEM_MC1:
 		if (!is_t5(sc) || !(em & F_EXT_MEM1_ENABLE))
 			return (EINVAL);
 		addr_len = t4_read_reg(sc, A_MA_EXT_MEMORY1_BAR);
 		maddr = G_EXT_MEM1_BASE(addr_len) << 20;
 		break;
 	default:
 		return (EINVAL);
 	}
 
 	*addr = maddr + off;	/* global address */
 	return (validate_mem_range(sc, *addr, len));
 }
 
 static int
 fixup_devlog_params(struct adapter *sc)
 {
 	struct devlog_params *dparams = &sc->params.devlog;
 	int rc;
 
 	rc = validate_mt_off_len(sc, dparams->memtype, dparams->start,
 	    dparams->size, &dparams->addr);
 
 	return (rc);
 }
 
 static int
 cfg_itype_and_nqueues(struct adapter *sc, int n10g, int n1g, int num_vis,
     struct intrs_and_queues *iaq)
 {
 	int rc, itype, navail, nrxq10g, nrxq1g, n;
 	int nofldrxq10g = 0, nofldrxq1g = 0;
 	int nnmrxq10g = 0, nnmrxq1g = 0;
 
 	bzero(iaq, sizeof(*iaq));
 
 	iaq->ntxq10g = t4_ntxq10g;
 	iaq->ntxq1g = t4_ntxq1g;
 	iaq->nrxq10g = nrxq10g = t4_nrxq10g;
 	iaq->nrxq1g = nrxq1g = t4_nrxq1g;
 	iaq->rsrv_noflowq = t4_rsrv_noflowq;
 #ifdef TCP_OFFLOAD
 	if (is_offload(sc)) {
 		iaq->nofldtxq10g = t4_nofldtxq10g;
 		iaq->nofldtxq1g = t4_nofldtxq1g;
 		iaq->nofldrxq10g = nofldrxq10g = t4_nofldrxq10g;
 		iaq->nofldrxq1g = nofldrxq1g = t4_nofldrxq1g;
 	}
 #endif
 #ifdef DEV_NETMAP
 	iaq->nnmtxq10g = t4_nnmtxq10g;
 	iaq->nnmtxq1g = t4_nnmtxq1g;
 	iaq->nnmrxq10g = nnmrxq10g = t4_nnmrxq10g;
 	iaq->nnmrxq1g = nnmrxq1g = t4_nnmrxq1g;
 #endif
 
 	for (itype = INTR_MSIX; itype; itype >>= 1) {
 
 		if ((itype & t4_intr_types) == 0)
 			continue;	/* not allowed */
 
 		if (itype == INTR_MSIX)
 			navail = pci_msix_count(sc->dev);
 		else if (itype == INTR_MSI)
 			navail = pci_msi_count(sc->dev);
 		else
 			navail = 1;
 restart:
 		if (navail == 0)
 			continue;
 
 		iaq->intr_type = itype;
 		iaq->intr_flags_10g = 0;
 		iaq->intr_flags_1g = 0;
 
 		/*
 		 * Best option: an interrupt vector for errors, one for the
 		 * firmware event queue, and one for every rxq (NIC, TOE, and
 		 * netmap).
 		 */
 		iaq->nirq = T4_EXTRA_INTR;
 		iaq->nirq += n10g * (nrxq10g + nofldrxq10g + nnmrxq10g);
 		iaq->nirq += n10g * 2 * (num_vis - 1);
 		iaq->nirq += n1g * (nrxq1g + nofldrxq1g + nnmrxq1g);
 		iaq->nirq += n1g * 2 * (num_vis - 1);
 		if (iaq->nirq <= navail &&
 		    (itype != INTR_MSI || powerof2(iaq->nirq))) {
 			iaq->intr_flags_10g = INTR_ALL;
 			iaq->intr_flags_1g = INTR_ALL;
 			goto allocate;
 		}
 
 		/*
 		 * Second best option: a vector for errors, one for the firmware
 		 * event queue, and vectors for either all the NIC rx queues or
 		 * all the TOE rx queues.  The queues that don't get vectors
 		 * will forward their interrupts to those that do.
 		 *
 		 * Note: netmap rx queues cannot be created early and so they
 		 * can't be set up to receive forwarded interrupts for others.
 		 */
 		iaq->nirq = T4_EXTRA_INTR;
 		if (nrxq10g >= nofldrxq10g) {
 			iaq->intr_flags_10g = INTR_RXQ;
 			iaq->nirq += n10g * nrxq10g;
 			iaq->nirq += n10g * (num_vis - 1);
 #ifdef DEV_NETMAP
 			iaq->nnmrxq10g = min(nnmrxq10g, nrxq10g);
 #endif
 		} else {
 			iaq->intr_flags_10g = INTR_OFLD_RXQ;
 			iaq->nirq += n10g * nofldrxq10g;
 #ifdef DEV_NETMAP
 			iaq->nnmrxq10g = min(nnmrxq10g, nofldrxq10g);
 #endif
 		}
 		if (nrxq1g >= nofldrxq1g) {
 			iaq->intr_flags_1g = INTR_RXQ;
 			iaq->nirq += n1g * nrxq1g;
 			iaq->nirq += n1g * (num_vis - 1);
 #ifdef DEV_NETMAP
 			iaq->nnmrxq1g = min(nnmrxq1g, nrxq1g);
 #endif
 		} else {
 			iaq->intr_flags_1g = INTR_OFLD_RXQ;
 			iaq->nirq += n1g * nofldrxq1g;
 #ifdef DEV_NETMAP
 			iaq->nnmrxq1g = min(nnmrxq1g, nofldrxq1g);
 #endif
 		}
 		if (iaq->nirq <= navail &&
 		    (itype != INTR_MSI || powerof2(iaq->nirq)))
 			goto allocate;
 
 		/*
 		 * Next best option: an interrupt vector for errors, one for the
 		 * firmware event queue, and at least one per VI.  At this
 		 * point we know we'll have to downsize nrxq and/or nofldrxq
 		 * and/or nnmrxq to fit what's available to us.
 		 */
 		iaq->nirq = T4_EXTRA_INTR;
 		iaq->nirq += (n10g + n1g) * num_vis;
 		if (iaq->nirq <= navail) {
 			int leftover = navail - iaq->nirq;
 
 			if (n10g > 0) {
 				int target = max(nrxq10g, nofldrxq10g);
 
 				iaq->intr_flags_10g = nrxq10g >= nofldrxq10g ?
 				    INTR_RXQ : INTR_OFLD_RXQ;
 
 				n = 1;
 				while (n < target && leftover >= n10g) {
 					leftover -= n10g;
 					iaq->nirq += n10g;
 					n++;
 				}
 				iaq->nrxq10g = min(n, nrxq10g);
 #ifdef TCP_OFFLOAD
 				iaq->nofldrxq10g = min(n, nofldrxq10g);
 #endif
 #ifdef DEV_NETMAP
 				iaq->nnmrxq10g = min(n, nnmrxq10g);
 #endif
 			}
 
 			if (n1g > 0) {
 				int target = max(nrxq1g, nofldrxq1g);
 
 				iaq->intr_flags_1g = nrxq1g >= nofldrxq1g ?
 				    INTR_RXQ : INTR_OFLD_RXQ;
 
 				n = 1;
 				while (n < target && leftover >= n1g) {
 					leftover -= n1g;
 					iaq->nirq += n1g;
 					n++;
 				}
 				iaq->nrxq1g = min(n, nrxq1g);
 #ifdef TCP_OFFLOAD
 				iaq->nofldrxq1g = min(n, nofldrxq1g);
 #endif
 #ifdef DEV_NETMAP
 				iaq->nnmrxq1g = min(n, nnmrxq1g);
 #endif
 			}
 
 			if (itype != INTR_MSI || powerof2(iaq->nirq))
 				goto allocate;
 		}
 
 		/*
 		 * Least desirable option: one interrupt vector for everything.
 		 */
 		iaq->nirq = iaq->nrxq10g = iaq->nrxq1g = 1;
 		iaq->intr_flags_10g = iaq->intr_flags_1g = 0;
 #ifdef TCP_OFFLOAD
 		if (is_offload(sc))
 			iaq->nofldrxq10g = iaq->nofldrxq1g = 1;
 #endif
 #ifdef DEV_NETMAP
 		iaq->nnmrxq10g = iaq->nnmrxq1g = 1;
 #endif
 
 allocate:
 		navail = iaq->nirq;
 		rc = 0;
 		if (itype == INTR_MSIX)
 			rc = pci_alloc_msix(sc->dev, &navail);
 		else if (itype == INTR_MSI)
 			rc = pci_alloc_msi(sc->dev, &navail);
 
 		if (rc == 0) {
 			if (navail == iaq->nirq)
 				return (0);
 
 			/*
 			 * Didn't get the number requested.  Use whatever number
 			 * the kernel is willing to allocate (it's in navail).
 			 */
 			device_printf(sc->dev, "fewer vectors than requested, "
 			    "type=%d, req=%d, rcvd=%d; will downshift req.\n",
 			    itype, iaq->nirq, navail);
 			pci_release_msi(sc->dev);
 			goto restart;
 		}
 
 		device_printf(sc->dev,
 		    "failed to allocate vectors:%d, type=%d, req=%d, rcvd=%d\n",
 		    rc, itype, iaq->nirq, navail);
 	}
 
 	device_printf(sc->dev,
 	    "failed to find a usable interrupt type.  "
 	    "allowed=%d, msi-x=%d, msi=%d, intx=1\n", t4_intr_types,
 	    pci_msix_count(sc->dev), pci_msi_count(sc->dev));
 
 	return (ENXIO);
 }
 
 #define FW_VERSION(chip) ( \
     V_FW_HDR_FW_VER_MAJOR(chip##FW_VERSION_MAJOR) | \
     V_FW_HDR_FW_VER_MINOR(chip##FW_VERSION_MINOR) | \
     V_FW_HDR_FW_VER_MICRO(chip##FW_VERSION_MICRO) | \
     V_FW_HDR_FW_VER_BUILD(chip##FW_VERSION_BUILD))
 #define FW_INTFVER(chip, intf) (chip##FW_HDR_INTFVER_##intf)
 
 struct fw_info {
 	uint8_t chip;
 	char *kld_name;
 	char *fw_mod_name;
 	struct fw_hdr fw_hdr;	/* XXX: waste of space, need a sparse struct */
 } fw_info[] = {
 	{
 		.chip = CHELSIO_T4,
 		.kld_name = "t4fw_cfg",
 		.fw_mod_name = "t4fw",
 		.fw_hdr = {
 			.chip = FW_HDR_CHIP_T4,
 			.fw_ver = htobe32_const(FW_VERSION(T4)),
 			.intfver_nic = FW_INTFVER(T4, NIC),
 			.intfver_vnic = FW_INTFVER(T4, VNIC),
 			.intfver_ofld = FW_INTFVER(T4, OFLD),
 			.intfver_ri = FW_INTFVER(T4, RI),
 			.intfver_iscsipdu = FW_INTFVER(T4, ISCSIPDU),
 			.intfver_iscsi = FW_INTFVER(T4, ISCSI),
 			.intfver_fcoepdu = FW_INTFVER(T4, FCOEPDU),
 			.intfver_fcoe = FW_INTFVER(T4, FCOE),
 		},
 	}, {
 		.chip = CHELSIO_T5,
 		.kld_name = "t5fw_cfg",
 		.fw_mod_name = "t5fw",
 		.fw_hdr = {
 			.chip = FW_HDR_CHIP_T5,
 			.fw_ver = htobe32_const(FW_VERSION(T5)),
 			.intfver_nic = FW_INTFVER(T5, NIC),
 			.intfver_vnic = FW_INTFVER(T5, VNIC),
 			.intfver_ofld = FW_INTFVER(T5, OFLD),
 			.intfver_ri = FW_INTFVER(T5, RI),
 			.intfver_iscsipdu = FW_INTFVER(T5, ISCSIPDU),
 			.intfver_iscsi = FW_INTFVER(T5, ISCSI),
 			.intfver_fcoepdu = FW_INTFVER(T5, FCOEPDU),
 			.intfver_fcoe = FW_INTFVER(T5, FCOE),
 		},
 	}
 };
 
 static struct fw_info *
 find_fw_info(int chip)
 {
 	int i;
 
 	for (i = 0; i < nitems(fw_info); i++) {
 		if (fw_info[i].chip == chip)
 			return (&fw_info[i]);
 	}
 	return (NULL);
 }
 
 /*
  * Is the given firmware API compatible with the one the driver was compiled
  * with?
  */
 static int
 fw_compatible(const struct fw_hdr *hdr1, const struct fw_hdr *hdr2)
 {
 
 	/* short circuit if it's the exact same firmware version */
 	if (hdr1->chip == hdr2->chip && hdr1->fw_ver == hdr2->fw_ver)
 		return (1);
 
 	/*
 	 * XXX: Is this too conservative?  Perhaps I should limit this to the
 	 * features that are supported in the driver.
 	 */
 #define SAME_INTF(x) (hdr1->intfver_##x == hdr2->intfver_##x)
 	if (hdr1->chip == hdr2->chip && SAME_INTF(nic) && SAME_INTF(vnic) &&
 	    SAME_INTF(ofld) && SAME_INTF(ri) && SAME_INTF(iscsipdu) &&
 	    SAME_INTF(iscsi) && SAME_INTF(fcoepdu) && SAME_INTF(fcoe))
 		return (1);
 #undef SAME_INTF
 
 	return (0);
 }
 
 /*
  * The firmware in the KLD is usable, but should it be installed?  This routine
  * explains itself in detail if it indicates the KLD firmware should be
  * installed.
  */
 static int
 should_install_kld_fw(struct adapter *sc, int card_fw_usable, int k, int c)
 {
 	const char *reason;
 
 	if (!card_fw_usable) {
 		reason = "incompatible or unusable";
 		goto install;
 	}
 
 	if (k > c) {
 		reason = "older than the version bundled with this driver";
 		goto install;
 	}
 
 	if (t4_fw_install == 2 && k != c) {
 		reason = "different than the version bundled with this driver";
 		goto install;
 	}
 
 	return (0);
 
 install:
 	if (t4_fw_install == 0) {
 		device_printf(sc->dev, "firmware on card (%u.%u.%u.%u) is %s, "
 		    "but the driver is prohibited from installing a different "
 		    "firmware on the card.\n",
 		    G_FW_HDR_FW_VER_MAJOR(c), G_FW_HDR_FW_VER_MINOR(c),
 		    G_FW_HDR_FW_VER_MICRO(c), G_FW_HDR_FW_VER_BUILD(c), reason);
 
 		return (0);
 	}
 
 	device_printf(sc->dev, "firmware on card (%u.%u.%u.%u) is %s, "
 	    "installing firmware %u.%u.%u.%u on card.\n",
 	    G_FW_HDR_FW_VER_MAJOR(c), G_FW_HDR_FW_VER_MINOR(c),
 	    G_FW_HDR_FW_VER_MICRO(c), G_FW_HDR_FW_VER_BUILD(c), reason,
 	    G_FW_HDR_FW_VER_MAJOR(k), G_FW_HDR_FW_VER_MINOR(k),
 	    G_FW_HDR_FW_VER_MICRO(k), G_FW_HDR_FW_VER_BUILD(k));
 
 	return (1);
 }
 /*
  * Establish contact with the firmware and determine if we are the master driver
  * or not, and whether we are responsible for chip initialization.
  */
 static int
 prep_firmware(struct adapter *sc)
 {
 	const struct firmware *fw = NULL, *default_cfg;
 	int rc, pf, card_fw_usable, kld_fw_usable, need_fw_reset = 1;
 	enum dev_state state;
 	struct fw_info *fw_info;
 	struct fw_hdr *card_fw;		/* fw on the card */
 	const struct fw_hdr *kld_fw;	/* fw in the KLD */
 	const struct fw_hdr *drv_fw;	/* fw header the driver was compiled
 					   against */
 
 	/* Contact firmware. */
 	rc = t4_fw_hello(sc, sc->mbox, sc->mbox, MASTER_MAY, &state);
 	if (rc < 0 || state == DEV_STATE_ERR) {
 		rc = -rc;
 		device_printf(sc->dev,
 		    "failed to connect to the firmware: %d, %d.\n", rc, state);
 		return (rc);
 	}
 	pf = rc;
 	if (pf == sc->mbox)
 		sc->flags |= MASTER_PF;
 	else if (state == DEV_STATE_UNINIT) {
 		/*
 		 * We didn't get to be the master so we definitely won't be
 		 * configuring the chip.  It's a bug if someone else hasn't
 		 * configured it already.
 		 */
 		device_printf(sc->dev, "couldn't be master(%d), "
 		    "device not already initialized either(%d).\n", rc, state);
 		return (EDOOFUS);
 	}
 
 	/* This is the firmware whose headers the driver was compiled against */
 	fw_info = find_fw_info(chip_id(sc));
 	if (fw_info == NULL) {
 		device_printf(sc->dev,
 		    "unable to look up firmware information for chip %d.\n",
 		    chip_id(sc));
 		return (EINVAL);
 	}
 	drv_fw = &fw_info->fw_hdr;
 
 	/*
 	 * The firmware KLD contains many modules.  The KLD name is also the
 	 * name of the module that contains the default config file.
 	 */
 	default_cfg = firmware_get(fw_info->kld_name);
 
 	/* Read the header of the firmware on the card */
 	card_fw = malloc(sizeof(*card_fw), M_CXGBE, M_ZERO | M_WAITOK);
 	rc = -t4_read_flash(sc, FLASH_FW_START,
 	    sizeof (*card_fw) / sizeof (uint32_t), (uint32_t *)card_fw, 1);
 	if (rc == 0)
 		card_fw_usable = fw_compatible(drv_fw, (const void*)card_fw);
 	else {
 		device_printf(sc->dev,
 		    "Unable to read card's firmware header: %d\n", rc);
 		card_fw_usable = 0;
 	}
 
 	/* This is the firmware in the KLD */
 	fw = firmware_get(fw_info->fw_mod_name);
 	if (fw != NULL) {
 		kld_fw = (const void *)fw->data;
 		kld_fw_usable = fw_compatible(drv_fw, kld_fw);
 	} else {
 		kld_fw = NULL;
 		kld_fw_usable = 0;
 	}
 
 	if (card_fw_usable && card_fw->fw_ver == drv_fw->fw_ver &&
 	    (!kld_fw_usable || kld_fw->fw_ver == drv_fw->fw_ver)) {
 		/*
 		 * Common case: the firmware on the card is an exact match and
 		 * the KLD is an exact match too, or the KLD is
 		 * absent/incompatible.  Note that t4_fw_install = 2 is ignored
 		 * here -- use cxgbetool loadfw if you want to reinstall the
 		 * same firmware as the one on the card.
 		 */
 	} else if (kld_fw_usable && state == DEV_STATE_UNINIT &&
 	    should_install_kld_fw(sc, card_fw_usable, be32toh(kld_fw->fw_ver),
 	    be32toh(card_fw->fw_ver))) {
 
 		rc = -t4_fw_upgrade(sc, sc->mbox, fw->data, fw->datasize, 0);
 		if (rc != 0) {
 			device_printf(sc->dev,
 			    "failed to install firmware: %d\n", rc);
 			goto done;
 		}
 
 		/* Installed successfully, update the cached header too. */
 		memcpy(card_fw, kld_fw, sizeof(*card_fw));
 		card_fw_usable = 1;
 		need_fw_reset = 0;	/* already reset as part of load_fw */
 	}
 
 	if (!card_fw_usable) {
 		uint32_t d, c, k;
 
 		d = ntohl(drv_fw->fw_ver);
 		c = ntohl(card_fw->fw_ver);
 		k = kld_fw ? ntohl(kld_fw->fw_ver) : 0;
 
 		device_printf(sc->dev, "Cannot find a usable firmware: "
 		    "fw_install %d, chip state %d, "
 		    "driver compiled with %d.%d.%d.%d, "
 		    "card has %d.%d.%d.%d, KLD has %d.%d.%d.%d\n",
 		    t4_fw_install, state,
 		    G_FW_HDR_FW_VER_MAJOR(d), G_FW_HDR_FW_VER_MINOR(d),
 		    G_FW_HDR_FW_VER_MICRO(d), G_FW_HDR_FW_VER_BUILD(d),
 		    G_FW_HDR_FW_VER_MAJOR(c), G_FW_HDR_FW_VER_MINOR(c),
 		    G_FW_HDR_FW_VER_MICRO(c), G_FW_HDR_FW_VER_BUILD(c),
 		    G_FW_HDR_FW_VER_MAJOR(k), G_FW_HDR_FW_VER_MINOR(k),
 		    G_FW_HDR_FW_VER_MICRO(k), G_FW_HDR_FW_VER_BUILD(k));
 		rc = EINVAL;
 		goto done;
 	}
 
 	/* We're using whatever's on the card and it's known to be good. */
 	sc->params.fw_vers = ntohl(card_fw->fw_ver);
 	snprintf(sc->fw_version, sizeof(sc->fw_version), "%u.%u.%u.%u",
 	    G_FW_HDR_FW_VER_MAJOR(sc->params.fw_vers),
 	    G_FW_HDR_FW_VER_MINOR(sc->params.fw_vers),
 	    G_FW_HDR_FW_VER_MICRO(sc->params.fw_vers),
 	    G_FW_HDR_FW_VER_BUILD(sc->params.fw_vers));
 
 	t4_get_tp_version(sc, &sc->params.tp_vers);
 	snprintf(sc->tp_version, sizeof(sc->tp_version), "%u.%u.%u.%u",
 	    G_FW_HDR_FW_VER_MAJOR(sc->params.tp_vers),
 	    G_FW_HDR_FW_VER_MINOR(sc->params.tp_vers),
 	    G_FW_HDR_FW_VER_MICRO(sc->params.tp_vers),
 	    G_FW_HDR_FW_VER_BUILD(sc->params.tp_vers));
 
 	if (t4_get_exprom_version(sc, &sc->params.exprom_vers) != 0)
 		sc->params.exprom_vers = 0;
 	else {
 		snprintf(sc->exprom_version, sizeof(sc->exprom_version),
 		    "%u.%u.%u.%u",
 		    G_FW_HDR_FW_VER_MAJOR(sc->params.exprom_vers),
 		    G_FW_HDR_FW_VER_MINOR(sc->params.exprom_vers),
 		    G_FW_HDR_FW_VER_MICRO(sc->params.exprom_vers),
 		    G_FW_HDR_FW_VER_BUILD(sc->params.exprom_vers));
 	}
 
 	/* Reset device */
 	if (need_fw_reset &&
 	    (rc = -t4_fw_reset(sc, sc->mbox, F_PIORSTMODE | F_PIORST)) != 0) {
 		device_printf(sc->dev, "firmware reset failed: %d.\n", rc);
 		if (rc != ETIMEDOUT && rc != EIO)
 			t4_fw_bye(sc, sc->mbox);
 		goto done;
 	}
 	sc->flags |= FW_OK;
 
 	rc = get_params__pre_init(sc);
 	if (rc != 0)
 		goto done; /* error message displayed already */
 
 	/* Partition adapter resources as specified in the config file. */
 	if (state == DEV_STATE_UNINIT) {
 
 		KASSERT(sc->flags & MASTER_PF,
 		    ("%s: trying to change chip settings when not master.",
 		    __func__));
 
 		rc = partition_resources(sc, default_cfg, fw_info->kld_name);
 		if (rc != 0)
 			goto done;	/* error message displayed already */
 
 		t4_tweak_chip_settings(sc);
 
 		/* get basic stuff going */
 		rc = -t4_fw_initialize(sc, sc->mbox);
 		if (rc != 0) {
 			device_printf(sc->dev, "fw init failed: %d.\n", rc);
 			goto done;
 		}
 	} else {
 		snprintf(sc->cfg_file, sizeof(sc->cfg_file), "pf%d", pf);
 		sc->cfcsum = 0;
 	}
 
 done:
 	free(card_fw, M_CXGBE);
 	if (fw != NULL)
 		firmware_put(fw, FIRMWARE_UNLOAD);
 	if (default_cfg != NULL)
 		firmware_put(default_cfg, FIRMWARE_UNLOAD);
 
 	return (rc);
 }
 
 #define FW_PARAM_DEV(param) \
 	(V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DEV) | \
 	 V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DEV_##param))
 #define FW_PARAM_PFVF(param) \
 	(V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_PFVF) | \
 	 V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_PFVF_##param))
 
 /*
  * Partition chip resources for use between various PFs, VFs, etc.
  */
 static int
 partition_resources(struct adapter *sc, const struct firmware *default_cfg,
     const char *name_prefix)
 {
 	const struct firmware *cfg = NULL;
 	int rc = 0;
 	struct fw_caps_config_cmd caps;
 	uint32_t mtype, moff, finicsum, cfcsum;
 
 	/*
 	 * Figure out what configuration file to use.  Pick the default config
 	 * file for the card if the user hasn't specified one explicitly.
 	 */
 	snprintf(sc->cfg_file, sizeof(sc->cfg_file), "%s", t4_cfg_file);
 	if (strncmp(t4_cfg_file, DEFAULT_CF, sizeof(t4_cfg_file)) == 0) {
 		/* Card specific overrides go here. */
 		if (pci_get_device(sc->dev) == 0x440a)
 			snprintf(sc->cfg_file, sizeof(sc->cfg_file), UWIRE_CF);
 		if (is_fpga(sc))
 			snprintf(sc->cfg_file, sizeof(sc->cfg_file), FPGA_CF);
 	}
 
 	/*
 	 * We need to load another module if the profile is anything except
 	 * "default" or "flash".
 	 */
 	if (strncmp(sc->cfg_file, DEFAULT_CF, sizeof(sc->cfg_file)) != 0 &&
 	    strncmp(sc->cfg_file, FLASH_CF, sizeof(sc->cfg_file)) != 0) {
 		char s[32];
 
 		snprintf(s, sizeof(s), "%s_%s", name_prefix, sc->cfg_file);
 		cfg = firmware_get(s);
 		if (cfg == NULL) {
 			if (default_cfg != NULL) {
 				device_printf(sc->dev,
 				    "unable to load module \"%s\" for "
 				    "configuration profile \"%s\", will use "
 				    "the default config file instead.\n",
 				    s, sc->cfg_file);
 				snprintf(sc->cfg_file, sizeof(sc->cfg_file),
 				    "%s", DEFAULT_CF);
 			} else {
 				device_printf(sc->dev,
 				    "unable to load module \"%s\" for "
 				    "configuration profile \"%s\", will use "
 				    "the config file on the card's flash "
 				    "instead.\n", s, sc->cfg_file);
 				snprintf(sc->cfg_file, sizeof(sc->cfg_file),
 				    "%s", FLASH_CF);
 			}
 		}
 	}
 
 	if (strncmp(sc->cfg_file, DEFAULT_CF, sizeof(sc->cfg_file)) == 0 &&
 	    default_cfg == NULL) {
 		device_printf(sc->dev,
 		    "default config file not available, will use the config "
 		    "file on the card's flash instead.\n");
 		snprintf(sc->cfg_file, sizeof(sc->cfg_file), "%s", FLASH_CF);
 	}
 
 	if (strncmp(sc->cfg_file, FLASH_CF, sizeof(sc->cfg_file)) != 0) {
 		u_int cflen;
 		const uint32_t *cfdata;
 		uint32_t param, val, addr;
 
 		KASSERT(cfg != NULL || default_cfg != NULL,
 		    ("%s: no config to upload", __func__));
 
 		/*
 		 * Ask the firmware where it wants us to upload the config file.
 		 */
 		param = FW_PARAM_DEV(CF);
 		rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val);
 		if (rc != 0) {
 			/* No support for config file?  Shouldn't happen. */
 			device_printf(sc->dev,
 			    "failed to query config file location: %d.\n", rc);
 			goto done;
 		}
 		mtype = G_FW_PARAMS_PARAM_Y(val);
 		moff = G_FW_PARAMS_PARAM_Z(val) << 16;
 
 		/*
 		 * XXX: sheer laziness.  We deliberately added 4 bytes of
 		 * useless stuffing/comments at the end of the config file so
 		 * it's ok to simply throw away the last remaining bytes when
 		 * the config file is not an exact multiple of 4.  This also
 		 * helps with the validate_mt_off_len check.
 		 */
 		if (cfg != NULL) {
 			cflen = cfg->datasize & ~3;
 			cfdata = cfg->data;
 		} else {
 			cflen = default_cfg->datasize & ~3;
 			cfdata = default_cfg->data;
 		}
 
 		if (cflen > FLASH_CFG_MAX_SIZE) {
 			device_printf(sc->dev,
 			    "config file too long (%d, max allowed is %d).  "
 			    "Will try to use the config on the card, if any.\n",
 			    cflen, FLASH_CFG_MAX_SIZE);
 			goto use_config_on_flash;
 		}
 
 		rc = validate_mt_off_len(sc, mtype, moff, cflen, &addr);
 		if (rc != 0) {
 			device_printf(sc->dev,
 			    "%s: addr (%d/0x%x) or len %d is not valid: %d.  "
 			    "Will try to use the config on the card, if any.\n",
 			    __func__, mtype, moff, cflen, rc);
 			goto use_config_on_flash;
 		}
 		write_via_memwin(sc, 2, addr, cfdata, cflen);
 	} else {
 use_config_on_flash:
 		mtype = FW_MEMTYPE_FLASH;
 		moff = t4_flash_cfg_addr(sc);
 	}
 
 	bzero(&caps, sizeof(caps));
 	caps.op_to_write = htobe32(V_FW_CMD_OP(FW_CAPS_CONFIG_CMD) |
 	    F_FW_CMD_REQUEST | F_FW_CMD_READ);
 	caps.cfvalid_to_len16 = htobe32(F_FW_CAPS_CONFIG_CMD_CFVALID |
 	    V_FW_CAPS_CONFIG_CMD_MEMTYPE_CF(mtype) |
 	    V_FW_CAPS_CONFIG_CMD_MEMADDR64K_CF(moff >> 16) | FW_LEN16(caps));
 	rc = -t4_wr_mbox(sc, sc->mbox, &caps, sizeof(caps), &caps);
 	if (rc != 0) {
 		device_printf(sc->dev,
 		    "failed to pre-process config file: %d "
 		    "(mtype %d, moff 0x%x).\n", rc, mtype, moff);
 		goto done;
 	}
 
 	finicsum = be32toh(caps.finicsum);
 	cfcsum = be32toh(caps.cfcsum);
 	if (finicsum != cfcsum) {
 		device_printf(sc->dev,
 		    "WARNING: config file checksum mismatch: %08x %08x\n",
 		    finicsum, cfcsum);
 	}
 	sc->cfcsum = cfcsum;
 
 #define LIMIT_CAPS(x) do { \
 	caps.x &= htobe16(t4_##x##_allowed); \
 } while (0)
 
 	/*
 	 * Let the firmware know what features will (not) be used so it can tune
 	 * things accordingly.
 	 */
 	LIMIT_CAPS(nbmcaps);
 	LIMIT_CAPS(linkcaps);
 	LIMIT_CAPS(switchcaps);
 	LIMIT_CAPS(niccaps);
 	LIMIT_CAPS(toecaps);
 	LIMIT_CAPS(rdmacaps);
 	LIMIT_CAPS(tlscaps);
 	LIMIT_CAPS(iscsicaps);
 	LIMIT_CAPS(fcoecaps);
 #undef LIMIT_CAPS
 
 	caps.op_to_write = htobe32(V_FW_CMD_OP(FW_CAPS_CONFIG_CMD) |
 	    F_FW_CMD_REQUEST | F_FW_CMD_WRITE);
 	caps.cfvalid_to_len16 = htobe32(FW_LEN16(caps));
 	rc = -t4_wr_mbox(sc, sc->mbox, &caps, sizeof(caps), NULL);
 	if (rc != 0) {
 		device_printf(sc->dev,
 		    "failed to process config file: %d.\n", rc);
 	}
 done:
 	if (cfg != NULL)
 		firmware_put(cfg, FIRMWARE_UNLOAD);
 	return (rc);
 }
 
 /*
  * Retrieve parameters that are needed (or nice to have) very early.
  */
 static int
 get_params__pre_init(struct adapter *sc)
 {
 	int rc;
 	uint32_t param[2], val[2];
 
 	param[0] = FW_PARAM_DEV(PORTVEC);
 	param[1] = FW_PARAM_DEV(CCLK);
 	rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 2, param, val);
 	if (rc != 0) {
 		device_printf(sc->dev,
 		    "failed to query parameters (pre_init): %d.\n", rc);
 		return (rc);
 	}
 
 	sc->params.portvec = val[0];
 	sc->params.nports = bitcount32(val[0]);
 	sc->params.vpd.cclk = val[1];
 
 	/* Read device log parameters. */
 	rc = -t4_init_devlog_params(sc, 1);
 	if (rc == 0)
 		fixup_devlog_params(sc);
 	else {
 		device_printf(sc->dev,
 		    "failed to get devlog parameters: %d.\n", rc);
 		rc = 0;	/* devlog isn't critical for device operation */
 	}
 
 	return (rc);
 }
 
 /*
  * Retrieve various parameters that are of interest to the driver.  The device
  * has been initialized by the firmware at this point.
  */
 static int
 get_params__post_init(struct adapter *sc)
 {
 	int rc;
 	uint32_t param[7], val[7];
 	struct fw_caps_config_cmd caps;
 
 	param[0] = FW_PARAM_PFVF(IQFLINT_START);
 	param[1] = FW_PARAM_PFVF(EQ_START);
 	param[2] = FW_PARAM_PFVF(FILTER_START);
 	param[3] = FW_PARAM_PFVF(FILTER_END);
 	param[4] = FW_PARAM_PFVF(L2T_START);
 	param[5] = FW_PARAM_PFVF(L2T_END);
 	rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 6, param, val);
 	if (rc != 0) {
 		device_printf(sc->dev,
 		    "failed to query parameters (post_init): %d.\n", rc);
 		return (rc);
 	}
 
 	sc->sge.iq_start = val[0];
 	sc->sge.eq_start = val[1];
 	sc->tids.ftid_base = val[2];
 	sc->tids.nftids = val[3] - val[2] + 1;
 	sc->params.ftid_min = val[2];
 	sc->params.ftid_max = val[3];
 	sc->vres.l2t.start = val[4];
 	sc->vres.l2t.size = val[5] - val[4] + 1;
 	KASSERT(sc->vres.l2t.size <= L2T_SIZE,
 	    ("%s: L2 table size (%u) larger than expected (%u)",
 	    __func__, sc->vres.l2t.size, L2T_SIZE));
 
 	/* get capabilities */
 	bzero(&caps, sizeof(caps));
 	caps.op_to_write = htobe32(V_FW_CMD_OP(FW_CAPS_CONFIG_CMD) |
 	    F_FW_CMD_REQUEST | F_FW_CMD_READ);
 	caps.cfvalid_to_len16 = htobe32(FW_LEN16(caps));
 	rc = -t4_wr_mbox(sc, sc->mbox, &caps, sizeof(caps), &caps);
 	if (rc != 0) {
 		device_printf(sc->dev,
 		    "failed to get card capabilities: %d.\n", rc);
 		return (rc);
 	}
 
 #define READ_CAPS(x) do { \
 	sc->x = be16toh(caps.x); \
 } while (0)
 	READ_CAPS(nbmcaps);
 	READ_CAPS(linkcaps);
 	READ_CAPS(switchcaps);
 	READ_CAPS(niccaps);
 	READ_CAPS(toecaps);
 	READ_CAPS(rdmacaps);
 	READ_CAPS(tlscaps);
 	READ_CAPS(iscsicaps);
 	READ_CAPS(fcoecaps);
 
 	if (sc->niccaps & FW_CAPS_CONFIG_NIC_ETHOFLD) {
 		param[0] = FW_PARAM_PFVF(ETHOFLD_START);
 		param[1] = FW_PARAM_PFVF(ETHOFLD_END);
 		param[2] = FW_PARAM_DEV(FLOWC_BUFFIFO_SZ);
 		rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 3, param, val);
 		if (rc != 0) {
 			device_printf(sc->dev,
 			    "failed to query NIC parameters: %d.\n", rc);
 			return (rc);
 		}
 		sc->tids.etid_base = val[0];
 		sc->params.etid_min = val[0];
 		sc->tids.netids = val[1] - val[0] + 1;
 		sc->params.netids = sc->tids.netids;
 		sc->params.eo_wr_cred = val[2];
 		sc->params.ethoffload = 1;
 	}
 
 	if (sc->toecaps) {
 		/* query offload-related parameters */
 		param[0] = FW_PARAM_DEV(NTID);
 		param[1] = FW_PARAM_PFVF(SERVER_START);
 		param[2] = FW_PARAM_PFVF(SERVER_END);
 		param[3] = FW_PARAM_PFVF(TDDP_START);
 		param[4] = FW_PARAM_PFVF(TDDP_END);
 		param[5] = FW_PARAM_DEV(FLOWC_BUFFIFO_SZ);
 		rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 6, param, val);
 		if (rc != 0) {
 			device_printf(sc->dev,
 			    "failed to query TOE parameters: %d.\n", rc);
 			return (rc);
 		}
 		sc->tids.ntids = val[0];
 		sc->tids.natids = min(sc->tids.ntids / 2, MAX_ATIDS);
 		sc->tids.stid_base = val[1];
 		sc->tids.nstids = val[2] - val[1] + 1;
 		sc->vres.ddp.start = val[3];
 		sc->vres.ddp.size = val[4] - val[3] + 1;
 		sc->params.ofldq_wr_cred = val[5];
 		sc->params.offload = 1;
 	}
 	if (sc->rdmacaps) {
 		param[0] = FW_PARAM_PFVF(STAG_START);
 		param[1] = FW_PARAM_PFVF(STAG_END);
 		param[2] = FW_PARAM_PFVF(RQ_START);
 		param[3] = FW_PARAM_PFVF(RQ_END);
 		param[4] = FW_PARAM_PFVF(PBL_START);
 		param[5] = FW_PARAM_PFVF(PBL_END);
 		rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 6, param, val);
 		if (rc != 0) {
 			device_printf(sc->dev,
 			    "failed to query RDMA parameters(1): %d.\n", rc);
 			return (rc);
 		}
 		sc->vres.stag.start = val[0];
 		sc->vres.stag.size = val[1] - val[0] + 1;
 		sc->vres.rq.start = val[2];
 		sc->vres.rq.size = val[3] - val[2] + 1;
 		sc->vres.pbl.start = val[4];
 		sc->vres.pbl.size = val[5] - val[4] + 1;
 
 		param[0] = FW_PARAM_PFVF(SQRQ_START);
 		param[1] = FW_PARAM_PFVF(SQRQ_END);
 		param[2] = FW_PARAM_PFVF(CQ_START);
 		param[3] = FW_PARAM_PFVF(CQ_END);
 		param[4] = FW_PARAM_PFVF(OCQ_START);
 		param[5] = FW_PARAM_PFVF(OCQ_END);
 		rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 6, param, val);
 		if (rc != 0) {
 			device_printf(sc->dev,
 			    "failed to query RDMA parameters(2): %d.\n", rc);
 			return (rc);
 		}
 		sc->vres.qp.start = val[0];
 		sc->vres.qp.size = val[1] - val[0] + 1;
 		sc->vres.cq.start = val[2];
 		sc->vres.cq.size = val[3] - val[2] + 1;
 		sc->vres.ocq.start = val[4];
 		sc->vres.ocq.size = val[5] - val[4] + 1;
 	}
 	if (sc->iscsicaps) {
 		param[0] = FW_PARAM_PFVF(ISCSI_START);
 		param[1] = FW_PARAM_PFVF(ISCSI_END);
 		rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 2, param, val);
 		if (rc != 0) {
 			device_printf(sc->dev,
 			    "failed to query iSCSI parameters: %d.\n", rc);
 			return (rc);
 		}
 		sc->vres.iscsi.start = val[0];
 		sc->vres.iscsi.size = val[1] - val[0] + 1;
 	}
 
 	/*
 	 * We've got the params we wanted to query via the firmware.  Now grab
 	 * some others directly from the chip.
 	 */
 	rc = t4_read_chip_settings(sc);
 
 	return (rc);
 }
 
 static int
 set_params__post_init(struct adapter *sc)
 {
 	uint32_t param, val;
 
 	/* ask for encapsulated CPLs */
 	param = FW_PARAM_PFVF(CPLFW4MSG_ENCAP);
 	val = 1;
 	(void)t4_set_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val);
 
 	return (0);
 }
 
 #undef FW_PARAM_PFVF
 #undef FW_PARAM_DEV
 
 static void
 t4_set_desc(struct adapter *sc)
 {
 	char buf[128];
 	struct adapter_params *p = &sc->params;
 
 	snprintf(buf, sizeof(buf), "Chelsio %s %sNIC (rev %d), S/N:%s, "
 	    "P/N:%s, E/C:%s", p->vpd.id, is_offload(sc) ? "R" : "",
 	    chip_rev(sc), p->vpd.sn, p->vpd.pn, p->vpd.ec);
 
 	device_set_desc_copy(sc->dev, buf);
 }
 
 static void
 build_medialist(struct port_info *pi, struct ifmedia *media)
 {
 	int m;
 
 	PORT_LOCK(pi);
 
 	ifmedia_removeall(media);
 
 	m = IFM_ETHER | IFM_FDX;
 
 	switch (pi->port_type) {
 	case FW_PORT_TYPE_BT_XFI:
 	case FW_PORT_TYPE_BT_XAUI:
 		ifmedia_add(media, m | IFM_10G_T, 0, NULL);
 		/* fall through */
 
 	case FW_PORT_TYPE_BT_SGMII:
 		ifmedia_add(media, m | IFM_1000_T, 0, NULL);
 		ifmedia_add(media, m | IFM_100_TX, 0, NULL);
 		ifmedia_add(media, IFM_ETHER | IFM_AUTO, 0, NULL);
 		ifmedia_set(media, IFM_ETHER | IFM_AUTO);
 		break;
 
 	case FW_PORT_TYPE_CX4:
 		ifmedia_add(media, m | IFM_10G_CX4, 0, NULL);
 		ifmedia_set(media, m | IFM_10G_CX4);
 		break;
 
 	case FW_PORT_TYPE_QSFP_10G:
 	case FW_PORT_TYPE_SFP:
 	case FW_PORT_TYPE_FIBER_XFI:
 	case FW_PORT_TYPE_FIBER_XAUI:
 		switch (pi->mod_type) {
 
 		case FW_PORT_MOD_TYPE_LR:
 			ifmedia_add(media, m | IFM_10G_LR, 0, NULL);
 			ifmedia_set(media, m | IFM_10G_LR);
 			break;
 
 		case FW_PORT_MOD_TYPE_SR:
 			ifmedia_add(media, m | IFM_10G_SR, 0, NULL);
 			ifmedia_set(media, m | IFM_10G_SR);
 			break;
 
 		case FW_PORT_MOD_TYPE_LRM:
 			ifmedia_add(media, m | IFM_10G_LRM, 0, NULL);
 			ifmedia_set(media, m | IFM_10G_LRM);
 			break;
 
 		case FW_PORT_MOD_TYPE_TWINAX_PASSIVE:
 		case FW_PORT_MOD_TYPE_TWINAX_ACTIVE:
 			ifmedia_add(media, m | IFM_10G_TWINAX, 0, NULL);
 			ifmedia_set(media, m | IFM_10G_TWINAX);
 			break;
 
 		case FW_PORT_MOD_TYPE_NONE:
 			m &= ~IFM_FDX;
 			ifmedia_add(media, m | IFM_NONE, 0, NULL);
 			ifmedia_set(media, m | IFM_NONE);
 			break;
 
 		case FW_PORT_MOD_TYPE_NA:
 		case FW_PORT_MOD_TYPE_ER:
 		default:
 			device_printf(pi->dev,
 			    "unknown port_type (%d), mod_type (%d)\n",
 			    pi->port_type, pi->mod_type);
 			ifmedia_add(media, m | IFM_UNKNOWN, 0, NULL);
 			ifmedia_set(media, m | IFM_UNKNOWN);
 			break;
 		}
 		break;
 
 	case FW_PORT_TYPE_QSFP:
 		switch (pi->mod_type) {
 
 		case FW_PORT_MOD_TYPE_LR:
 			ifmedia_add(media, m | IFM_40G_LR4, 0, NULL);
 			ifmedia_set(media, m | IFM_40G_LR4);
 			break;
 
 		case FW_PORT_MOD_TYPE_SR:
 			ifmedia_add(media, m | IFM_40G_SR4, 0, NULL);
 			ifmedia_set(media, m | IFM_40G_SR4);
 			break;
 
 		case FW_PORT_MOD_TYPE_TWINAX_PASSIVE:
 		case FW_PORT_MOD_TYPE_TWINAX_ACTIVE:
 			ifmedia_add(media, m | IFM_40G_CR4, 0, NULL);
 			ifmedia_set(media, m | IFM_40G_CR4);
 			break;
 
 		case FW_PORT_MOD_TYPE_NONE:
 			m &= ~IFM_FDX;
 			ifmedia_add(media, m | IFM_NONE, 0, NULL);
 			ifmedia_set(media, m | IFM_NONE);
 			break;
 
 		default:
 			device_printf(pi->dev,
 			    "unknown port_type (%d), mod_type (%d)\n",
 			    pi->port_type, pi->mod_type);
 			ifmedia_add(media, m | IFM_UNKNOWN, 0, NULL);
 			ifmedia_set(media, m | IFM_UNKNOWN);
 			break;
 		}
 		break;
 
 	default:
 		device_printf(pi->dev,
 		    "unknown port_type (%d), mod_type (%d)\n", pi->port_type,
 		    pi->mod_type);
 		ifmedia_add(media, m | IFM_UNKNOWN, 0, NULL);
 		ifmedia_set(media, m | IFM_UNKNOWN);
 		break;
 	}
 
 	PORT_UNLOCK(pi);
 }
 
 #define FW_MAC_EXACT_CHUNK	7
 
 /*
  * Program the port's XGMAC based on parameters in ifnet.  The caller also
  * indicates which parameters should be programmed (the rest are left alone).
  */
 int
 update_mac_settings(struct ifnet *ifp, int flags)
 {
 	int rc = 0;
 	struct vi_info *vi = ifp->if_softc;
 	struct port_info *pi = vi->pi;
 	struct adapter *sc = pi->adapter;
 	int mtu = -1, promisc = -1, allmulti = -1, vlanex = -1;
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 	KASSERT(flags, ("%s: not told what to update.", __func__));
 
 	if (flags & XGMAC_MTU)
 		mtu = ifp->if_mtu;
 
 	if (flags & XGMAC_PROMISC)
 		promisc = ifp->if_flags & IFF_PROMISC ? 1 : 0;
 
 	if (flags & XGMAC_ALLMULTI)
 		allmulti = ifp->if_flags & IFF_ALLMULTI ? 1 : 0;
 
 	if (flags & XGMAC_VLANEX)
 		vlanex = ifp->if_capenable & IFCAP_VLAN_HWTAGGING ? 1 : 0;
 
 	if (flags & (XGMAC_MTU|XGMAC_PROMISC|XGMAC_ALLMULTI|XGMAC_VLANEX)) {
 		rc = -t4_set_rxmode(sc, sc->mbox, vi->viid, mtu, promisc,
 		    allmulti, 1, vlanex, false);
 		if (rc) {
 			if_printf(ifp, "set_rxmode (%x) failed: %d\n", flags,
 			    rc);
 			return (rc);
 		}
 	}
 
 	if (flags & XGMAC_UCADDR) {
 		uint8_t ucaddr[ETHER_ADDR_LEN];
 
 		bcopy(IF_LLADDR(ifp), ucaddr, sizeof(ucaddr));
 		rc = t4_change_mac(sc, sc->mbox, vi->viid, vi->xact_addr_filt,
 		    ucaddr, true, true);
 		if (rc < 0) {
 			rc = -rc;
 			if_printf(ifp, "change_mac failed: %d\n", rc);
 			return (rc);
 		} else {
 			vi->xact_addr_filt = rc;
 			rc = 0;
 		}
 	}
 
 	if (flags & XGMAC_MCADDRS) {
 		const uint8_t *mcaddr[FW_MAC_EXACT_CHUNK];
 		int del = 1;
 		uint64_t hash = 0;
 		struct ifmultiaddr *ifma;
 		int i = 0, j;
 
 		if_maddr_rlock(ifp);
 		TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) {
 			if (ifma->ifma_addr->sa_family != AF_LINK)
 				continue;
 			mcaddr[i] =
 			    LLADDR((struct sockaddr_dl *)ifma->ifma_addr);
 			MPASS(ETHER_IS_MULTICAST(mcaddr[i]));
 			i++;
 
 			if (i == FW_MAC_EXACT_CHUNK) {
 				rc = t4_alloc_mac_filt(sc, sc->mbox, vi->viid,
 				    del, i, mcaddr, NULL, &hash, 0);
 				if (rc < 0) {
 					rc = -rc;
 					for (j = 0; j < i; j++) {
 						if_printf(ifp,
 						    "failed to add mc address"
 						    " %02x:%02x:%02x:"
 						    "%02x:%02x:%02x rc=%d\n",
 						    mcaddr[j][0], mcaddr[j][1],
 						    mcaddr[j][2], mcaddr[j][3],
 						    mcaddr[j][4], mcaddr[j][5],
 						    rc);
 					}
 					goto mcfail;
 				}
 				del = 0;
 				i = 0;
 			}
 		}
 		if (i > 0) {
 			rc = t4_alloc_mac_filt(sc, sc->mbox, vi->viid, del, i,
 			    mcaddr, NULL, &hash, 0);
 			if (rc < 0) {
 				rc = -rc;
 				for (j = 0; j < i; j++) {
 					if_printf(ifp,
 					    "failed to add mc address"
 					    " %02x:%02x:%02x:"
 					    "%02x:%02x:%02x rc=%d\n",
 					    mcaddr[j][0], mcaddr[j][1],
 					    mcaddr[j][2], mcaddr[j][3],
 					    mcaddr[j][4], mcaddr[j][5],
 					    rc);
 				}
 				goto mcfail;
 			}
 		}
 
 		rc = -t4_set_addr_hash(sc, sc->mbox, vi->viid, 0, hash, 0);
 		if (rc != 0)
 			if_printf(ifp, "failed to set mc address hash: %d\n",
 			    rc);
 mcfail:
 		if_maddr_runlock(ifp);
 	}
 
 	return (rc);
 }
 
 /*
  * {begin|end}_synchronized_op must be called from the same thread.
  */
 int
 begin_synchronized_op(struct adapter *sc, struct vi_info *vi, int flags,
     char *wmesg)
 {
 	int rc, pri;
 
 #ifdef WITNESS
 	/* the caller thinks it's ok to sleep, but is it really? */
 	if (flags & SLEEP_OK)
 		WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL,
 		    "begin_synchronized_op");
 #endif
 
 	if (flags & INTR_OK)
 		pri = PCATCH;
 	else
 		pri = 0;
 
 	ADAPTER_LOCK(sc);
 	for (;;) {
 
 		if (vi && IS_DOOMED(vi)) {
 			rc = ENXIO;
 			goto done;
 		}
 
 		if (!IS_BUSY(sc)) {
 			rc = 0;
 			break;
 		}
 
 		if (!(flags & SLEEP_OK)) {
 			rc = EBUSY;
 			goto done;
 		}
 
 		if (mtx_sleep(&sc->flags, &sc->sc_lock, pri, wmesg, 0)) {
 			rc = EINTR;
 			goto done;
 		}
 	}
 
 	KASSERT(!IS_BUSY(sc), ("%s: controller busy.", __func__));
 	SET_BUSY(sc);
 #ifdef INVARIANTS
 	sc->last_op = wmesg;
 	sc->last_op_thr = curthread;
 	sc->last_op_flags = flags;
 #endif
 
 done:
 	if (!(flags & HOLD_LOCK) || rc)
 		ADAPTER_UNLOCK(sc);
 
 	return (rc);
 }
 
 /*
  * Tell if_ioctl and if_init that the VI is going away.  This is a
  * special variant of begin_synchronized_op and must be paired with a
  * call to end_synchronized_op.
  */
 void
 doom_vi(struct adapter *sc, struct vi_info *vi)
 {
 
 	ADAPTER_LOCK(sc);
 	SET_DOOMED(vi);
 	wakeup(&sc->flags);
 	while (IS_BUSY(sc))
 		mtx_sleep(&sc->flags, &sc->sc_lock, 0, "t4detach", 0);
 	SET_BUSY(sc);
 #ifdef INVARIANTS
 	sc->last_op = "t4detach";
 	sc->last_op_thr = curthread;
 	sc->last_op_flags = 0;
 #endif
 	ADAPTER_UNLOCK(sc);
 }
 
 /*
  * {begin|end}_synchronized_op must be called from the same thread.
  */
 void
 end_synchronized_op(struct adapter *sc, int flags)
 {
 
 	if (flags & LOCK_HELD)
 		ADAPTER_LOCK_ASSERT_OWNED(sc);
 	else
 		ADAPTER_LOCK(sc);
 
 	KASSERT(IS_BUSY(sc), ("%s: controller not busy.", __func__));
 	CLR_BUSY(sc);
 	wakeup(&sc->flags);
 	ADAPTER_UNLOCK(sc);
 }
 
 static int
 cxgbe_init_synchronized(struct vi_info *vi)
 {
 	struct port_info *pi = vi->pi;
 	struct adapter *sc = pi->adapter;
 	struct ifnet *ifp = vi->ifp;
 	int rc = 0, i;
 	struct sge_txq *txq;
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 
 	if (ifp->if_drv_flags & IFF_DRV_RUNNING)
 		return (0);	/* already running */
 
 	if (!(sc->flags & FULL_INIT_DONE) &&
 	    ((rc = adapter_full_init(sc)) != 0))
 		return (rc);	/* error message displayed already */
 
 	if (!(vi->flags & VI_INIT_DONE) &&
 	    ((rc = vi_full_init(vi)) != 0))
 		return (rc);	/* error message displayed already */
 
 	rc = update_mac_settings(ifp, XGMAC_ALL);
 	if (rc)
 		goto done;	/* error message displayed already */
 
 	rc = -t4_enable_vi(sc, sc->mbox, vi->viid, true, true);
 	if (rc != 0) {
 		if_printf(ifp, "enable_vi failed: %d\n", rc);
 		goto done;
 	}
 
 	/*
 	 * Can't fail from this point onwards.  Review cxgbe_uninit_synchronized
 	 * if this changes.
 	 */
 
 	for_each_txq(vi, i, txq) {
 		TXQ_LOCK(txq);
 		txq->eq.flags |= EQ_ENABLED;
 		TXQ_UNLOCK(txq);
 	}
 
 	/*
 	 * The first iq of the first port to come up is used for tracing.
 	 */
 	if (sc->traceq < 0 && IS_MAIN_VI(vi)) {
 		sc->traceq = sc->sge.rxq[vi->first_rxq].iq.abs_id;
 		t4_write_reg(sc, is_t4(sc) ? A_MPS_TRC_RSS_CONTROL :
 		    A_MPS_T5_TRC_RSS_CONTROL, V_RSSCONTROL(pi->tx_chan) |
 		    V_QUEUENUMBER(sc->traceq));
 		pi->flags |= HAS_TRACEQ;
 	}
 
 	/* all ok */
 	PORT_LOCK(pi);
 	ifp->if_drv_flags |= IFF_DRV_RUNNING;
 	pi->up_vis++;
 
 	if (pi->nvi > 1)
 		callout_reset(&vi->tick, hz, vi_tick, vi);
 	else
 		callout_reset(&pi->tick, hz, cxgbe_tick, pi);
 	PORT_UNLOCK(pi);
 done:
 	if (rc != 0)
 		cxgbe_uninit_synchronized(vi);
 
 	return (rc);
 }
 
 /*
  * Idempotent.
  */
 static int
 cxgbe_uninit_synchronized(struct vi_info *vi)
 {
 	struct port_info *pi = vi->pi;
 	struct adapter *sc = pi->adapter;
 	struct ifnet *ifp = vi->ifp;
 	int rc, i;
 	struct sge_txq *txq;
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 
 	if (!(vi->flags & VI_INIT_DONE)) {
 		KASSERT(!(ifp->if_drv_flags & IFF_DRV_RUNNING),
 		    ("uninited VI is running"));
 		return (0);
 	}
 
 	/*
 	 * Disable the VI so that all its data in either direction is discarded
 	 * by the MPS.  Leave everything else (the queues, interrupts, and 1Hz
 	 * tick) intact as the TP can deliver negative advice or data that it's
 	 * holding in its RAM (for an offloaded connection) even after the VI is
 	 * disabled.
 	 */
 	rc = -t4_enable_vi(sc, sc->mbox, vi->viid, false, false);
 	if (rc) {
 		if_printf(ifp, "disable_vi failed: %d\n", rc);
 		return (rc);
 	}
 
 	for_each_txq(vi, i, txq) {
 		TXQ_LOCK(txq);
 		txq->eq.flags &= ~EQ_ENABLED;
 		TXQ_UNLOCK(txq);
 	}
 
 	PORT_LOCK(pi);
 	if (pi->nvi == 1)
 		callout_stop(&pi->tick);
 	else
 		callout_stop(&vi->tick);
 	if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
 		PORT_UNLOCK(pi);
 		return (0);
 	}
 	ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
 	pi->up_vis--;
 	if (pi->up_vis > 0) {
 		PORT_UNLOCK(pi);
 		return (0);
 	}
 	PORT_UNLOCK(pi);
 
 	pi->link_cfg.link_ok = 0;
 	pi->link_cfg.speed = 0;
 	pi->linkdnrc = -1;
 	t4_os_link_changed(sc, pi->port_id, 0, -1);
 
 	return (0);
 }
 
 /*
  * It is ok for this function to fail midway and return right away.  t4_detach
  * will walk the entire sc->irq list and clean up whatever is valid.
  */
 static int
 setup_intr_handlers(struct adapter *sc)
 {
 	int rc, rid, p, q, v;
 	char s[8];
 	struct irq *irq;
 	struct port_info *pi;
 	struct vi_info *vi;
 	struct sge_rxq *rxq;
 #ifdef TCP_OFFLOAD
 	struct sge_ofld_rxq *ofld_rxq;
 #endif
 #ifdef DEV_NETMAP
 	struct sge_nm_rxq *nm_rxq;
 #endif
 #ifdef RSS
 	int nbuckets = rss_getnumbuckets();
 #endif
 
 	/*
 	 * Set up interrupts.
 	 */
 	irq = &sc->irq[0];
 	rid = sc->intr_type == INTR_INTX ? 0 : 1;
 	if (sc->intr_count == 1)
 		return (t4_alloc_irq(sc, irq, rid, t4_intr_all, sc, "all"));
 
 	/* Multiple interrupts. */
 	KASSERT(sc->intr_count >= T4_EXTRA_INTR + sc->params.nports,
 	    ("%s: too few intr.", __func__));
 
 	/* The first one is always the error interrupt. */
 	rc = t4_alloc_irq(sc, irq, rid, t4_intr_err, sc, "err");
 	if (rc != 0)
 		return (rc);
 	irq++;
 	rid++;
 
 	/* The second one is always the firmware event queue */
 	rc = t4_alloc_irq(sc, irq, rid, t4_intr_evt, &sc->sge.fwq, "evt");
 	if (rc != 0)
 		return (rc);
 	irq++;
 	rid++;
 
 	for_each_port(sc, p) {
 		pi = sc->port[p];
 		for_each_vi(pi, v, vi) {
 			vi->first_intr = rid - 1;
 #ifdef DEV_NETMAP
 			if (vi->flags & VI_NETMAP) {
 				for_each_nm_rxq(vi, q, nm_rxq) {
 					snprintf(s, sizeof(s), "%d-%d", p, q);
 					rc = t4_alloc_irq(sc, irq, rid,
 					    t4_nm_intr, nm_rxq, s);
 					if (rc != 0)
 						return (rc);
 					irq++;
 					rid++;
 					vi->nintr++;
 				}
 				continue;
 			}
 #endif
 			if (vi->flags & INTR_RXQ) {
 				for_each_rxq(vi, q, rxq) {
 					if (v == 0)
 						snprintf(s, sizeof(s), "%d.%d",
 						    p, q);
 					else
 						snprintf(s, sizeof(s),
 						    "%d(%d).%d", p, v, q);
 					rc = t4_alloc_irq(sc, irq, rid,
 					    t4_intr, rxq, s);
 					if (rc != 0)
 						return (rc);
 #ifdef RSS
 					bus_bind_intr(sc->dev, irq->res,
 					    rss_getcpu(q % nbuckets));
 #endif
 					irq++;
 					rid++;
 					vi->nintr++;
 				}
 			}
 #ifdef TCP_OFFLOAD
 			if (vi->flags & INTR_OFLD_RXQ) {
 				for_each_ofld_rxq(vi, q, ofld_rxq) {
 					snprintf(s, sizeof(s), "%d,%d", p, q);
 					rc = t4_alloc_irq(sc, irq, rid,
 					    t4_intr, ofld_rxq, s);
 					if (rc != 0)
 						return (rc);
 					irq++;
 					rid++;
 					vi->nintr++;
 				}
 			}
 #endif
 		}
 	}
 	MPASS(irq == &sc->irq[sc->intr_count]);
 
 	return (0);
 }
 
 int
 adapter_full_init(struct adapter *sc)
 {
 	int rc, i;
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 	ADAPTER_LOCK_ASSERT_NOTOWNED(sc);
 	KASSERT((sc->flags & FULL_INIT_DONE) == 0,
 	    ("%s: FULL_INIT_DONE already", __func__));
 
 	/*
 	 * Set up the queues that belong to the adapter (not to any particular
 	 * port).
 	 */
 	rc = t4_setup_adapter_queues(sc);
 	if (rc != 0)
 		goto done;
 
 	for (i = 0; i < nitems(sc->tq); i++) {
 		sc->tq[i] = taskqueue_create("t4 taskq", M_NOWAIT,
 		    taskqueue_thread_enqueue, &sc->tq[i]);
 		if (sc->tq[i] == NULL) {
 			device_printf(sc->dev,
 			    "failed to allocate task queue %d\n", i);
 			rc = ENOMEM;
 			goto done;
 		}
 		taskqueue_start_threads(&sc->tq[i], 1, PI_NET, "%s tq%d",
 		    device_get_nameunit(sc->dev), i);
 	}
 
 	t4_intr_enable(sc);
 	sc->flags |= FULL_INIT_DONE;
 done:
 	if (rc != 0)
 		adapter_full_uninit(sc);
 
 	return (rc);
 }
 
 int
 adapter_full_uninit(struct adapter *sc)
 {
 	int i;
 
 	ADAPTER_LOCK_ASSERT_NOTOWNED(sc);
 
 	t4_teardown_adapter_queues(sc);
 
 	for (i = 0; i < nitems(sc->tq) && sc->tq[i]; i++) {
 		taskqueue_free(sc->tq[i]);
 		sc->tq[i] = NULL;
 	}
 
 	sc->flags &= ~FULL_INIT_DONE;
 
 	return (0);
 }
 
 #ifdef RSS
 #define SUPPORTED_RSS_HASHTYPES (RSS_HASHTYPE_RSS_IPV4 | \
     RSS_HASHTYPE_RSS_TCP_IPV4 | RSS_HASHTYPE_RSS_IPV6 | \
     RSS_HASHTYPE_RSS_TCP_IPV6 | RSS_HASHTYPE_RSS_UDP_IPV4 | \
     RSS_HASHTYPE_RSS_UDP_IPV6)
 
 /* Translates kernel hash types to hardware. */
 static int
 hashconfig_to_hashen(int hashconfig)
 {
 	int hashen = 0;
 
 	if (hashconfig & RSS_HASHTYPE_RSS_IPV4)
 		hashen |= F_FW_RSS_VI_CONFIG_CMD_IP4TWOTUPEN;
 	if (hashconfig & RSS_HASHTYPE_RSS_IPV6)
 		hashen |= F_FW_RSS_VI_CONFIG_CMD_IP6TWOTUPEN;
 	if (hashconfig & RSS_HASHTYPE_RSS_UDP_IPV4) {
 		hashen |= F_FW_RSS_VI_CONFIG_CMD_UDPEN |
 		    F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN;
 	}
 	if (hashconfig & RSS_HASHTYPE_RSS_UDP_IPV6) {
 		hashen |= F_FW_RSS_VI_CONFIG_CMD_UDPEN |
 		    F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN;
 	}
 	if (hashconfig & RSS_HASHTYPE_RSS_TCP_IPV4)
 		hashen |= F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN;
 	if (hashconfig & RSS_HASHTYPE_RSS_TCP_IPV6)
 		hashen |= F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN;
 
 	return (hashen);
 }
 
 /* Translates hardware hash types to kernel. */
 static int
 hashen_to_hashconfig(int hashen)
 {
 	int hashconfig = 0;
 
 	if (hashen & F_FW_RSS_VI_CONFIG_CMD_UDPEN) {
 		/*
 		 * If UDP hashing was enabled it must have been enabled for
 		 * either IPv4 or IPv6 (inclusive or).  Enabling UDP without
 		 * enabling any 4-tuple hash is a nonsensical configuration.
 		 */
 		MPASS(hashen & (F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN |
 		    F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN));
 
 		if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN)
 			hashconfig |= RSS_HASHTYPE_RSS_UDP_IPV4;
 		if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN)
 			hashconfig |= RSS_HASHTYPE_RSS_UDP_IPV6;
 	}
 	if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN)
 		hashconfig |= RSS_HASHTYPE_RSS_TCP_IPV4;
 	if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN)
 		hashconfig |= RSS_HASHTYPE_RSS_TCP_IPV6;
 	if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP4TWOTUPEN)
 		hashconfig |= RSS_HASHTYPE_RSS_IPV4;
 	if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP6TWOTUPEN)
 		hashconfig |= RSS_HASHTYPE_RSS_IPV6;
 
 	return (hashconfig);
 }
 #endif
 
 int
 vi_full_init(struct vi_info *vi)
 {
 	struct adapter *sc = vi->pi->adapter;
 	struct ifnet *ifp = vi->ifp;
 	uint16_t *rss;
 	struct sge_rxq *rxq;
 	int rc, i, j, hashen;
 #ifdef RSS
 	int nbuckets = rss_getnumbuckets();
 	int hashconfig = rss_gethashconfig();
 	int extra;
 	uint32_t raw_rss_key[RSS_KEYSIZE / sizeof(uint32_t)];
 	uint32_t rss_key[RSS_KEYSIZE / sizeof(uint32_t)];
 #endif
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 	KASSERT((vi->flags & VI_INIT_DONE) == 0,
 	    ("%s: VI_INIT_DONE already", __func__));
 
 	sysctl_ctx_init(&vi->ctx);
 	vi->flags |= VI_SYSCTL_CTX;
 
 	/*
 	 * Allocate tx/rx/fl queues for this VI.
 	 */
 	rc = t4_setup_vi_queues(vi);
 	if (rc != 0)
 		goto done;	/* error message displayed already */
 
 #ifdef DEV_NETMAP
 	/* Netmap VIs configure RSS when netmap is enabled. */
 	if (vi->flags & VI_NETMAP) {
 		vi->flags |= VI_INIT_DONE;
 		return (0);
 	}
 #endif
 
 	/*
 	 * Set up RSS for this VI.  Save a copy of the RSS table for later use.
 	 */
 	if (vi->nrxq > vi->rss_size) {
 		if_printf(ifp, "nrxq (%d) > hw RSS table size (%d); "
 		    "some queues will never receive traffic.\n", vi->nrxq,
 		    vi->rss_size);
 	} else if (vi->rss_size % vi->nrxq) {
 		if_printf(ifp, "nrxq (%d), hw RSS table size (%d); "
 		    "expect uneven traffic distribution.\n", vi->nrxq,
 		    vi->rss_size);
 	}
 #ifdef RSS
 	MPASS(RSS_KEYSIZE == 40);
 	if (vi->nrxq != nbuckets) {
 		if_printf(ifp, "nrxq (%d) != kernel RSS buckets (%d); "
 		    "performance will be impacted.\n", vi->nrxq, nbuckets);
 	}
 
 	rss_getkey((void *)&raw_rss_key[0]);
 	for (i = 0; i < nitems(rss_key); i++) {
 		rss_key[i] = htobe32(raw_rss_key[nitems(rss_key) - 1 - i]);
 	}
 	t4_write_rss_key(sc, &rss_key[0], -1);
 #endif
 	rss = malloc(vi->rss_size * sizeof (*rss), M_CXGBE, M_ZERO | M_WAITOK);
 	for (i = 0; i < vi->rss_size;) {
 #ifdef RSS
 		j = rss_get_indirection_to_bucket(i);
 		j %= vi->nrxq;
 		rxq = &sc->sge.rxq[vi->first_rxq + j];
 		rss[i++] = rxq->iq.abs_id;
 #else
 		for_each_rxq(vi, j, rxq) {
 			rss[i++] = rxq->iq.abs_id;
 			if (i == vi->rss_size)
 				break;
 		}
 #endif
 	}
 
 	rc = -t4_config_rss_range(sc, sc->mbox, vi->viid, 0, vi->rss_size, rss,
 	    vi->rss_size);
 	if (rc != 0) {
 		if_printf(ifp, "rss_config failed: %d\n", rc);
 		/* vi->rss is not set yet, so vi_full_uninit won't free this. */
 		free(rss, M_CXGBE);
 		goto done;
 	}
 
 #ifdef RSS
 	hashen = hashconfig_to_hashen(hashconfig);
 
 	/*
 	 * We may have had to enable some hashes even though the global config
 	 * wants them disabled.  This is a potential problem that must be
 	 * reported to the user.
 	 */
 	extra = hashen_to_hashconfig(hashen) ^ hashconfig;
 
 	/*
 	 * If we consider only the supported hash types, then the enabled hashes
 	 * are a superset of the requested hashes.  In other words, there cannot
 	 * be any supported hash that was requested but not enabled, but there
 	 * can be hashes that were not requested but had to be enabled.
 	 */
 	extra &= SUPPORTED_RSS_HASHTYPES;
 	MPASS((extra & hashconfig) == 0);
 
 	if (extra) {
 		if_printf(ifp,
 		    "global RSS config (0x%x) cannot be accommodated.\n",
 		    hashconfig);
 	}
 	if (extra & RSS_HASHTYPE_RSS_IPV4)
 		if_printf(ifp, "IPv4 2-tuple hashing forced on.\n");
 	if (extra & RSS_HASHTYPE_RSS_TCP_IPV4)
 		if_printf(ifp, "TCP/IPv4 4-tuple hashing forced on.\n");
 	if (extra & RSS_HASHTYPE_RSS_IPV6)
 		if_printf(ifp, "IPv6 2-tuple hashing forced on.\n");
 	if (extra & RSS_HASHTYPE_RSS_TCP_IPV6)
 		if_printf(ifp, "TCP/IPv6 4-tuple hashing forced on.\n");
 	if (extra & RSS_HASHTYPE_RSS_UDP_IPV4)
 		if_printf(ifp, "UDP/IPv4 4-tuple hashing forced on.\n");
 	if (extra & RSS_HASHTYPE_RSS_UDP_IPV6)
 		if_printf(ifp, "UDP/IPv6 4-tuple hashing forced on.\n");
 #else
 	hashen = F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN |
 	    F_FW_RSS_VI_CONFIG_CMD_IP6TWOTUPEN |
 	    F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN |
 	    F_FW_RSS_VI_CONFIG_CMD_IP4TWOTUPEN | F_FW_RSS_VI_CONFIG_CMD_UDPEN;
 #endif
 	rc = -t4_config_vi_rss(sc, sc->mbox, vi->viid, hashen, rss[0]);
 	if (rc != 0) {
 		if_printf(ifp, "rss hash/defaultq config failed: %d\n", rc);
 		/* vi->rss is not set yet, so vi_full_uninit won't free this. */
 		free(rss, M_CXGBE);
 		goto done;
 	}
 
 	vi->rss = rss;
 	vi->flags |= VI_INIT_DONE;
 done:
 	if (rc != 0)
 		vi_full_uninit(vi);
 
 	return (rc);
 }
 
 /*
  * Idempotent.
  */
 int
 vi_full_uninit(struct vi_info *vi)
 {
 	struct port_info *pi = vi->pi;
 	struct adapter *sc = pi->adapter;
 	int i;
 	struct sge_rxq *rxq;
 	struct sge_txq *txq;
 #ifdef TCP_OFFLOAD
 	struct sge_ofld_rxq *ofld_rxq;
 	struct sge_wrq *ofld_txq;
 #endif
 
 	if (vi->flags & VI_INIT_DONE) {
 
 		/* Need to quiesce queues.  */
 #ifdef DEV_NETMAP
 		if (vi->flags & VI_NETMAP)
 			goto skip;
 #endif
 
 		/* XXX: Only for the first VI? */
 		if (IS_MAIN_VI(vi))
 			quiesce_wrq(sc, &sc->sge.ctrlq[pi->port_id]);
 
 		for_each_txq(vi, i, txq) {
 			quiesce_txq(sc, txq);
 		}
 
 #ifdef TCP_OFFLOAD
 		for_each_ofld_txq(vi, i, ofld_txq) {
 			quiesce_wrq(sc, ofld_txq);
 		}
 #endif
 
 		for_each_rxq(vi, i, rxq) {
 			quiesce_iq(sc, &rxq->iq);
 			quiesce_fl(sc, &rxq->fl);
 		}
 
 #ifdef TCP_OFFLOAD
 		for_each_ofld_rxq(vi, i, ofld_rxq) {
 			quiesce_iq(sc, &ofld_rxq->iq);
 			quiesce_fl(sc, &ofld_rxq->fl);
 		}
 #endif
 		free(vi->rss, M_CXGBE);
 	}
 #ifdef DEV_NETMAP
 skip:
 #endif
 
 	t4_teardown_vi_queues(vi);
 	vi->flags &= ~VI_INIT_DONE;
 
 	return (0);
 }
 
 static void
 quiesce_txq(struct adapter *sc, struct sge_txq *txq)
 {
 	struct sge_eq *eq = &txq->eq;
 	struct sge_qstat *spg = (void *)&eq->desc[eq->sidx];
 
 	(void) sc;	/* unused */
 
 #ifdef INVARIANTS
 	TXQ_LOCK(txq);
 	MPASS((eq->flags & EQ_ENABLED) == 0);
 	TXQ_UNLOCK(txq);
 #endif
 
 	/* Wait for the mp_ring to empty. */
 	while (!mp_ring_is_idle(txq->r)) {
 		mp_ring_check_drainage(txq->r, 0);
 		pause("rquiesce", 1);
 	}
 
 	/* Then wait for the hardware to finish. */
 	while (spg->cidx != htobe16(eq->pidx))
 		pause("equiesce", 1);
 
 	/* Finally, wait for the driver to reclaim all descriptors. */
 	while (eq->cidx != eq->pidx)
 		pause("dquiesce", 1);
 }
 
 static void
 quiesce_wrq(struct adapter *sc, struct sge_wrq *wrq)
 {
 
 	/* XXXTX */
 }
 
 static void
 quiesce_iq(struct adapter *sc, struct sge_iq *iq)
 {
 	(void) sc;	/* unused */
 
 	/* Synchronize with the interrupt handler */
 	while (!atomic_cmpset_int(&iq->state, IQS_IDLE, IQS_DISABLED))
 		pause("iqfree", 1);
 }
 
 static void
 quiesce_fl(struct adapter *sc, struct sge_fl *fl)
 {
 	mtx_lock(&sc->sfl_lock);
 	FL_LOCK(fl);
 	fl->flags |= FL_DOOMED;
 	FL_UNLOCK(fl);
 	callout_stop(&sc->sfl_callout);
 	mtx_unlock(&sc->sfl_lock);
 
 	KASSERT((fl->flags & FL_STARVING) == 0,
 	    ("%s: still starving", __func__));
 }
 
 static int
 t4_alloc_irq(struct adapter *sc, struct irq *irq, int rid,
     driver_intr_t *handler, void *arg, char *name)
 {
 	int rc;
 
 	irq->rid = rid;
 	irq->res = bus_alloc_resource_any(sc->dev, SYS_RES_IRQ, &irq->rid,
 	    RF_SHAREABLE | RF_ACTIVE);
 	if (irq->res == NULL) {
 		device_printf(sc->dev,
 		    "failed to allocate IRQ for rid %d, name %s.\n", rid, name);
 		return (ENOMEM);
 	}
 
 	rc = bus_setup_intr(sc->dev, irq->res, INTR_MPSAFE | INTR_TYPE_NET,
 	    NULL, handler, arg, &irq->tag);
 	if (rc != 0) {
 		device_printf(sc->dev,
 		    "failed to setup interrupt for rid %d, name %s: %d\n",
 		    rid, name, rc);
 	} else if (name)
 		bus_describe_intr(sc->dev, irq->res, irq->tag, name);
 
 	return (rc);
 }
 
 static int
 t4_free_irq(struct adapter *sc, struct irq *irq)
 {
 	if (irq->tag)
 		bus_teardown_intr(sc->dev, irq->res, irq->tag);
 	if (irq->res)
 		bus_release_resource(sc->dev, SYS_RES_IRQ, irq->rid, irq->res);
 
 	bzero(irq, sizeof(*irq));
 
 	return (0);
 }
 
 static void
 get_regs(struct adapter *sc, struct t4_regdump *regs, uint8_t *buf)
 {
 
 	regs->version = chip_id(sc) | chip_rev(sc) << 10;
 	t4_get_regs(sc, buf, regs->len);
 }
 
 #define	A_PL_INDIR_CMD	0x1f8
 
 #define	S_PL_AUTOINC	31
 #define	M_PL_AUTOINC	0x1U
 #define	V_PL_AUTOINC(x)	((x) << S_PL_AUTOINC)
 #define	G_PL_AUTOINC(x)	(((x) >> S_PL_AUTOINC) & M_PL_AUTOINC)
 
 #define	S_PL_VFID	20
 #define	M_PL_VFID	0xffU
 #define	V_PL_VFID(x)	((x) << S_PL_VFID)
 #define	G_PL_VFID(x)	(((x) >> S_PL_VFID) & M_PL_VFID)
 
 #define	S_PL_ADDR	0
 #define	M_PL_ADDR	0xfffffU
 #define	V_PL_ADDR(x)	((x) << S_PL_ADDR)
 #define	G_PL_ADDR(x)	(((x) >> S_PL_ADDR) & M_PL_ADDR)
 
 #define	A_PL_INDIR_DATA	0x1fc
 
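 /*
  * Read a 64-bit VF statistic through the PL indirect register window.  The
  * window auto-increments, so the low word is read first and the high word
  * second.  The caller must hold sc->reg_lock.
  */
 static uint64_t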
 read_vf_stat(struct adapter *sc, unsigned int viid, int reg)
 {
 	u32 stats[2];
 
 	mtx_assert(&sc->reg_lock, MA_OWNED);
 	t4_write_reg(sc, A_PL_INDIR_CMD, V_PL_AUTOINC(1) |
 	    V_PL_VFID(G_FW_VIID_VIN(viid)) | V_PL_ADDR(VF_MPS_REG(reg)));
 	stats[0] = t4_read_reg(sc, A_PL_INDIR_DATA);
 	stats[1] = t4_read_reg(sc, A_PL_INDIR_DATA);
 	return (((uint64_t)stats[1]) << 32 | stats[0]);
 }
 
 static void
 t4_get_vi_stats(struct adapter *sc, unsigned int viid,
     struct fw_vi_stats_vf *stats)
 {
 
 #define GET_STAT(name) \
 	read_vf_stat(sc, viid, A_MPS_VF_STAT_##name##_L)
 
 	stats->tx_bcast_bytes    = GET_STAT(TX_VF_BCAST_BYTES);
 	stats->tx_bcast_frames   = GET_STAT(TX_VF_BCAST_FRAMES);
 	stats->tx_mcast_bytes    = GET_STAT(TX_VF_MCAST_BYTES);
 	stats->tx_mcast_frames   = GET_STAT(TX_VF_MCAST_FRAMES);
 	stats->tx_ucast_bytes    = GET_STAT(TX_VF_UCAST_BYTES);
 	stats->tx_ucast_frames   = GET_STAT(TX_VF_UCAST_FRAMES);
 	stats->tx_drop_frames    = GET_STAT(TX_VF_DROP_FRAMES);
 	stats->tx_offload_bytes  = GET_STAT(TX_VF_OFFLOAD_BYTES);
 	stats->tx_offload_frames = GET_STAT(TX_VF_OFFLOAD_FRAMES);
 	stats->rx_bcast_bytes    = GET_STAT(RX_VF_BCAST_BYTES);
 	stats->rx_bcast_frames   = GET_STAT(RX_VF_BCAST_FRAMES);
 	stats->rx_mcast_bytes    = GET_STAT(RX_VF_MCAST_BYTES);
 	stats->rx_mcast_frames   = GET_STAT(RX_VF_MCAST_FRAMES);
 	stats->rx_ucast_bytes    = GET_STAT(RX_VF_UCAST_BYTES);
 	stats->rx_ucast_frames   = GET_STAT(RX_VF_UCAST_FRAMES);
 	stats->rx_err_frames     = GET_STAT(RX_VF_ERR_FRAMES);
 
 #undef GET_STAT
 }
 
 static void
 t4_clr_vi_stats(struct adapter *sc, unsigned int viid)
 {
 	int reg;
 
 	t4_write_reg(sc, A_PL_INDIR_CMD, V_PL_AUTOINC(1) |
 	    V_PL_VFID(G_FW_VIID_VIN(viid)) |
 	    V_PL_ADDR(VF_MPS_REG(A_MPS_VF_STAT_TX_VF_BCAST_BYTES_L)));
 	for (reg = A_MPS_VF_STAT_TX_VF_BCAST_BYTES_L;
 	     reg <= A_MPS_VF_STAT_RX_VF_ERR_FRAMES_H; reg += 4)
 		t4_write_reg(sc, A_PL_INDIR_DATA, 0);
 }
 
 static void
 vi_refresh_stats(struct adapter *sc, struct vi_info *vi)
 {
 	struct timeval tv;
 	const struct timeval interval = {0, 250000};	/* 250ms */
 
 	if (!(vi->flags & VI_INIT_DONE))
 		return;
 
 	getmicrotime(&tv);
 	timevalsub(&tv, &interval);
 	if (timevalcmp(&tv, &vi->last_refreshed, <))
 		return;
 
 	mtx_lock(&sc->reg_lock);
 	t4_get_vi_stats(sc, vi->viid, &vi->stats);
 	getmicrotime(&vi->last_refreshed);
 	mtx_unlock(&sc->reg_lock);
 }
 
 static void
 cxgbe_refresh_stats(struct adapter *sc, struct port_info *pi)
 {
 	int i;
 	u_int v, tnl_cong_drops;
 	struct timeval tv;
 	const struct timeval interval = {0, 250000};	/* 250ms */
 
 	getmicrotime(&tv);
 	timevalsub(&tv, &interval);
 	if (timevalcmp(&tv, &pi->last_refreshed, <))
 		return;
 
 	tnl_cong_drops = 0;
 	t4_get_port_stats(sc, pi->tx_chan, &pi->stats);
 	for (i = 0; i < sc->chip_params->nchan; i++) {
 		if (pi->rx_chan_map & (1 << i)) {
 			mtx_lock(&sc->reg_lock);
 			t4_read_indirect(sc, A_TP_MIB_INDEX, A_TP_MIB_DATA, &v,
 			    1, A_TP_MIB_TNL_CNG_DROP_0 + i);
 			mtx_unlock(&sc->reg_lock);
 			tnl_cong_drops += v;
 		}
 	}
 	pi->tnl_cong_drops = tnl_cong_drops;
 	getmicrotime(&pi->last_refreshed);
 }
 
 static void
 cxgbe_tick(void *arg)
 {
 	struct port_info *pi = arg;
 	struct adapter *sc = pi->adapter;
 
 	PORT_LOCK_ASSERT_OWNED(pi);
 	cxgbe_refresh_stats(sc, pi);
 
 	callout_schedule(&pi->tick, hz);
 }
 
 void
 vi_tick(void *arg)
 {
 	struct vi_info *vi = arg;
 	struct adapter *sc = vi->pi->adapter;
 
 	vi_refresh_stats(sc, vi);
 
 	callout_schedule(&vi->tick, hz);
 }
 
 static void
 cxgbe_vlan_config(void *arg, struct ifnet *ifp, uint16_t vid)
 {
 	struct ifnet *vlan;
 
 	if (arg != ifp || ifp->if_type != IFT_ETHER)
 		return;
 
 	vlan = VLAN_DEVAT(ifp, vid);
 	VLAN_SETCOOKIE(vlan, ifp);
 }
 
 static int
 cpl_not_handled(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m)
 {
 
 #ifdef INVARIANTS
 	panic("%s: opcode 0x%02x on iq %p with payload %p",
 	    __func__, rss->opcode, iq, m);
 #else
 	log(LOG_ERR, "%s: opcode 0x%02x on iq %p with payload %p\n",
 	    __func__, rss->opcode, iq, m);
 	m_freem(m);
 #endif
 	return (EDOOFUS);
 }
 
 int
 t4_register_cpl_handler(struct adapter *sc, int opcode, cpl_handler_t h)
 {
 	uintptr_t *loc, new;
 
 	if (opcode >= nitems(sc->cpl_handler))
 		return (EINVAL);
 
 	new = h ? (uintptr_t)h : (uintptr_t)cpl_not_handled;
 	loc = (uintptr_t *) &sc->cpl_handler[opcode];
 	atomic_store_rel_ptr(loc, new);
 
 	return (0);
 }
 
 static int
 an_not_handled(struct sge_iq *iq, const struct rsp_ctrl *ctrl)
 {
 
 #ifdef INVARIANTS
 	panic("%s: async notification on iq %p (ctrl %p)", __func__, iq, ctrl);
 #else
 	log(LOG_ERR, "%s: async notification on iq %p (ctrl %p)\n",
 	    __func__, iq, ctrl);
 #endif
 	return (EDOOFUS);
 }
 
 int
 t4_register_an_handler(struct adapter *sc, an_handler_t h)
 {
 	uintptr_t *loc, new;
 
 	new = h ? (uintptr_t)h : (uintptr_t)an_not_handled;
 	loc = (uintptr_t *) &sc->an_handler;
 	atomic_store_rel_ptr(loc, new);
 
 	return (0);
 }
 
 static int
 fw_msg_not_handled(struct adapter *sc, const __be64 *rpl)
 {
 	const struct cpl_fw6_msg *cpl =
 	    __containerof(rpl, struct cpl_fw6_msg, data[0]);
 
 #ifdef INVARIANTS
 	panic("%s: fw_msg type %d", __func__, cpl->type);
 #else
 	log(LOG_ERR, "%s: fw_msg type %d\n", __func__, cpl->type);
 #endif
 	return (EDOOFUS);
 }
 
 int
 t4_register_fw_msg_handler(struct adapter *sc, int type, fw_msg_handler_t h)
 {
 	uintptr_t *loc, new;
 
 	if (type >= nitems(sc->fw_msg_handler))
 		return (EINVAL);
 
 	/*
 	 * These are dispatched by the handler for FW{4|6}_CPL_MSG using the CPL
 	 * handler dispatch table.  Reject any attempt to install a handler for
 	 * this subtype.
 	 */
 	if (type == FW_TYPE_RSSCPL || type == FW6_TYPE_RSSCPL)
 		return (EINVAL);
 
 	new = h ? (uintptr_t)h : (uintptr_t)fw_msg_not_handled;
 	loc = (uintptr_t *) &sc->fw_msg_handler[type];
 	atomic_store_rel_ptr(loc, new);
 
 	return (0);
 }
 
 /*
  * Should match fw_caps_config_<foo> enums in t4fw_interface.h
  */
 static char *caps_decoder[] = {
 	"\20\001IPMI\002NCSI",				/* 0: NBM */
 	"\20\001PPP\002QFC\003DCBX",			/* 1: link */
 	"\20\001INGRESS\002EGRESS",			/* 2: switch */
 	"\20\001NIC\002VM\003IDS\004UM\005UM_ISGL"	/* 3: NIC */
 	    "\006HASHFILTER\007ETHOFLD",
 	"\20\001TOE",					/* 4: TOE */
 	"\20\001RDDP\002RDMAC",				/* 5: RDMA */
 	"\20\001INITIATOR_PDU\002TARGET_PDU"		/* 6: iSCSI */
 	    "\003INITIATOR_CNXOFLD\004TARGET_CNXOFLD"
 	    "\005INITIATOR_SSNOFLD\006TARGET_SSNOFLD"
 	    "\007T10DIF"
 	    "\010INITIATOR_CMDOFLD\011TARGET_CMDOFLD",
 	"\20\001KEYS",					/* 7: TLS */
 	"\20\001INITIATOR\002TARGET\003CTRL_OFLD"	/* 8: FCoE */
 	    "\004PO_INITIATOR\005PO_TARGET",
 };
 
 static void
 t4_sysctls(struct adapter *sc)
 {
 	struct sysctl_ctx_list *ctx;
 	struct sysctl_oid *oid;
 	struct sysctl_oid_list *children, *c0;
 	static char *doorbells = {"\20\1UDB\2WCWR\3UDBWC\4KDB"};
 
 	ctx = device_get_sysctl_ctx(sc->dev);
 
 	/*
 	 * dev.t4nex.X.
 	 */
 	oid = device_get_sysctl_tree(sc->dev);
 	c0 = children = SYSCTL_CHILDREN(oid);
 
 	sc->sc_do_rxcopy = 1;
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "do_rx_copy", CTLFLAG_RW,
 	    &sc->sc_do_rxcopy, 1, "Do RX copy of small frames");
 
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nports", CTLFLAG_RD, NULL,
 	    sc->params.nports, "# of ports");
 
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "hw_revision", CTLFLAG_RD,
 	    NULL, chip_rev(sc), "chip hardware revision");
 
 	SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "tp_version",
 	    CTLFLAG_RD, sc->tp_version, 0, "TP microcode version");
 
 	if (sc->params.exprom_vers != 0) {
 		SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "exprom_version",
 		    CTLFLAG_RD, sc->exprom_version, 0, "expansion ROM version");
 	}
 
 	SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "firmware_version",
 	    CTLFLAG_RD, sc->fw_version, 0, "firmware version");
 
 	SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "cf",
 	    CTLFLAG_RD, sc->cfg_file, 0, "configuration file");
 
 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "cfcsum", CTLFLAG_RD, NULL,
 	    sc->cfcsum, "config file checksum");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "doorbells",
 	    CTLTYPE_STRING | CTLFLAG_RD, doorbells, sc->doorbells,
 	    sysctl_bitfield, "A", "available doorbells");
 
 #define SYSCTL_CAP(name, n, text) \
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, #name, \
 	    CTLTYPE_STRING | CTLFLAG_RD, caps_decoder[n], sc->name, \
 	    sysctl_bitfield, "A", "available " text " capabilities")
 
 	SYSCTL_CAP(nbmcaps, 0, "NBM");
 	SYSCTL_CAP(linkcaps, 1, "link");
 	SYSCTL_CAP(switchcaps, 2, "switch");
 	SYSCTL_CAP(niccaps, 3, "NIC");
 	SYSCTL_CAP(toecaps, 4, "TCP offload");
 	SYSCTL_CAP(rdmacaps, 5, "RDMA");
 	SYSCTL_CAP(iscsicaps, 6, "iSCSI");
 	SYSCTL_CAP(tlscaps, 7, "TLS");
 	SYSCTL_CAP(fcoecaps, 8, "FCoE");
 #undef SYSCTL_CAP
 
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "core_clock", CTLFLAG_RD, NULL,
 	    sc->params.vpd.cclk, "core clock frequency (in kHz)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "holdoff_timers",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc->params.sge.timer_val,
 	    sizeof(sc->params.sge.timer_val), sysctl_int_array, "A",
 	    "interrupt holdoff timer values (us)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "holdoff_pkt_counts",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc->params.sge.counter_val,
 	    sizeof(sc->params.sge.counter_val), sysctl_int_array, "A",
 	    "interrupt holdoff packet counter values");
 
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nfilters", CTLFLAG_RD,
 	    NULL, sc->tids.nftids, "number of filters");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "temperature", CTLTYPE_INT |
 	    CTLFLAG_RD, sc, 0, sysctl_temperature, "I",
 	    "chip temperature (in Celsius)");
 
 	t4_sge_sysctls(sc, ctx, children);
 
 	sc->lro_timeout = 100;
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "lro_timeout", CTLFLAG_RW,
 	    &sc->lro_timeout, 0, "lro inactive-flush timeout (in us)");
 
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "debug_flags", CTLFLAG_RW,
 	    &sc->debug_flags, 0, "flags to enable runtime debugging");
 
 #ifdef SBUF_DRAIN
 	/*
 	 * dev.t4nex.X.misc.  Marked CTLFLAG_SKIP to avoid information overload.
 	 */
 	oid = SYSCTL_ADD_NODE(ctx, c0, OID_AUTO, "misc",
 	    CTLFLAG_RD | CTLFLAG_SKIP, NULL,
 	    "logs and miscellaneous information");
 	children = SYSCTL_CHILDREN(oid);
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cctrl",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_cctrl, "A", "congestion control");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_tp0",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_cim_ibq_obq, "A", "CIM IBQ 0 (TP0)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_tp1",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 1,
 	    sysctl_cim_ibq_obq, "A", "CIM IBQ 1 (TP1)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_ulp",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 2,
 	    sysctl_cim_ibq_obq, "A", "CIM IBQ 2 (ULP)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_sge0",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 3,
 	    sysctl_cim_ibq_obq, "A", "CIM IBQ 3 (SGE0)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_sge1",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 4,
 	    sysctl_cim_ibq_obq, "A", "CIM IBQ 4 (SGE1)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_ncsi",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 5,
 	    sysctl_cim_ibq_obq, "A", "CIM IBQ 5 (NCSI)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_la",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    chip_id(sc) <= CHELSIO_T5 ? sysctl_cim_la : sysctl_cim_la_t6,
 	    "A", "CIM logic analyzer");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ma_la",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_cim_ma_la, "A", "CIM MA logic analyzer");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ulp0",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0 + CIM_NUM_IBQ,
 	    sysctl_cim_ibq_obq, "A", "CIM OBQ 0 (ULP0)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ulp1",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 1 + CIM_NUM_IBQ,
 	    sysctl_cim_ibq_obq, "A", "CIM OBQ 1 (ULP1)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ulp2",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 2 + CIM_NUM_IBQ,
 	    sysctl_cim_ibq_obq, "A", "CIM OBQ 2 (ULP2)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ulp3",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 3 + CIM_NUM_IBQ,
 	    sysctl_cim_ibq_obq, "A", "CIM OBQ 3 (ULP3)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_sge",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 4 + CIM_NUM_IBQ,
 	    sysctl_cim_ibq_obq, "A", "CIM OBQ 4 (SGE)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ncsi",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 5 + CIM_NUM_IBQ,
 	    sysctl_cim_ibq_obq, "A", "CIM OBQ 5 (NCSI)");
 
 	if (chip_id(sc) > CHELSIO_T4) {
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_sge0_rx",
 		    CTLTYPE_STRING | CTLFLAG_RD, sc, 6 + CIM_NUM_IBQ,
 		    sysctl_cim_ibq_obq, "A", "CIM OBQ 6 (SGE0-RX)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_sge1_rx",
 		    CTLTYPE_STRING | CTLFLAG_RD, sc, 7 + CIM_NUM_IBQ,
 		    sysctl_cim_ibq_obq, "A", "CIM OBQ 7 (SGE1-RX)");
 	}
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_pif_la",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_cim_pif_la, "A", "CIM PIF logic analyzer");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_qcfg",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_cim_qcfg, "A", "CIM queue configuration");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cpl_stats",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_cpl_stats, "A", "CPL statistics");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "ddp_stats",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_ddp_stats, "A", "non-TCP DDP statistics");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "devlog",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_devlog, "A", "firmware's device log");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "fcoe_stats",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_fcoe_stats, "A", "FCoE statistics");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "hw_sched",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_hw_sched, "A", "hardware scheduler");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "l2t",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_l2t, "A", "hardware L2 table");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "lb_stats",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_lb_stats, "A", "loopback statistics");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "meminfo",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_meminfo, "A", "memory regions");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "mps_tcam",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    chip_id(sc) <= CHELSIO_T5 ? sysctl_mps_tcam : sysctl_mps_tcam_t6,
 	    "A", "MPS TCAM entries");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "path_mtus",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_path_mtus, "A", "path MTUs");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "pm_stats",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_pm_stats, "A", "PM statistics");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "rdma_stats",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_rdma_stats, "A", "RDMA statistics");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tcp_stats",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_tcp_stats, "A", "TCP statistics");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tids",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_tids, "A", "TID information");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tp_err_stats",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_tp_err_stats, "A", "TP error statistics");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tp_la_mask",
 	    CTLTYPE_INT | CTLFLAG_RW, sc, 0, sysctl_tp_la_mask, "I",
 	    "TP logic analyzer event capture mask");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tp_la",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_tp_la, "A", "TP logic analyzer");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tx_rate",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_tx_rate, "A", "Tx rate");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "ulprx_la",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_ulprx_la, "A", "ULPRX logic analyzer");
 
 	if (is_t5(sc)) {
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "wcwr_stats",
 		    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 		    sysctl_wcwr_stats, "A", "write combined work requests");
 	}
 #endif
 
 #ifdef TCP_OFFLOAD
 	if (is_offload(sc)) {
 		/*
 		 * dev.t4nex.X.toe.
 		 */
 		oid = SYSCTL_ADD_NODE(ctx, c0, OID_AUTO, "toe", CTLFLAG_RD,
 		    NULL, "TOE parameters");
 		children = SYSCTL_CHILDREN(oid);
 
 		sc->tt.sndbuf = 256 * 1024;
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "sndbuf", CTLFLAG_RW,
 		    &sc->tt.sndbuf, 0, "max hardware send buffer size");
 
 		sc->tt.ddp = 0;
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "ddp", CTLFLAG_RW,
 		    &sc->tt.ddp, 0, "DDP allowed");
 
 		sc->tt.rx_coalesce = 1;
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "rx_coalesce",
 		    CTLFLAG_RW, &sc->tt.rx_coalesce, 0, "receive coalescing");
 
 		sc->tt.tx_align = 1;
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "tx_align",
 		    CTLFLAG_RW, &sc->tt.tx_align, 0, "chop and align payload");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "timer_tick",
 		    CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_tp_tick, "A",
 		    "TP timer tick (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "timestamp_tick",
 		    CTLTYPE_STRING | CTLFLAG_RD, sc, 1, sysctl_tp_tick, "A",
 		    "TCP timestamp tick (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "dack_tick",
 		    CTLTYPE_STRING | CTLFLAG_RD, sc, 2, sysctl_tp_tick, "A",
 		    "DACK tick (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "dack_timer",
 		    CTLTYPE_UINT | CTLFLAG_RD, sc, 0, sysctl_tp_dack_timer,
 		    "IU", "DACK timer (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "rexmt_min",
 		    CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_RXT_MIN,
 		    sysctl_tp_timer, "LU", "Retransmit min (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "rexmt_max",
 		    CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_RXT_MAX,
 		    sysctl_tp_timer, "LU", "Retransmit max (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "persist_min",
 		    CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_PERS_MIN,
 		    sysctl_tp_timer, "LU", "Persist timer min (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "persist_max",
 		    CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_PERS_MAX,
 		    sysctl_tp_timer, "LU", "Persist timer max (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "keepalive_idle",
 		    CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_KEEP_IDLE,
 		    sysctl_tp_timer, "LU", "Keepalive idle timer (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "keepalive_intvl",
 		    CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_KEEP_INTVL,
 		    sysctl_tp_timer, "LU", "Keepalive interval (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "initial_srtt",
 		    CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_INIT_SRTT,
 		    sysctl_tp_timer, "LU", "Initial SRTT (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "finwait2_timer",
 		    CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_FINWAIT2_TIMER,
 		    sysctl_tp_timer, "LU", "FINWAIT2 timer (us)");
 	}
 #endif
 }
 
 void
 vi_sysctls(struct vi_info *vi)
 {
 	struct sysctl_ctx_list *ctx;
 	struct sysctl_oid *oid;
 	struct sysctl_oid_list *children;
 
 	ctx = device_get_sysctl_ctx(vi->dev);
 
 	/*
 	 * dev.[nv](cxgbe|cxl).X.
 	 */
 	oid = device_get_sysctl_tree(vi->dev);
 	children = SYSCTL_CHILDREN(oid);
 
 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "viid", CTLFLAG_RD, NULL,
 	    vi->viid, "VI identifier");
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nrxq", CTLFLAG_RD,
 	    &vi->nrxq, 0, "# of rx queues");
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "ntxq", CTLFLAG_RD,
 	    &vi->ntxq, 0, "# of tx queues");
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_rxq", CTLFLAG_RD,
 	    &vi->first_rxq, 0, "index of first rx queue");
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_txq", CTLFLAG_RD,
 	    &vi->first_txq, 0, "index of first tx queue");
 
 	if (vi->flags & VI_NETMAP)
 		return;
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "rsrv_noflowq", CTLTYPE_INT |
 	    CTLFLAG_RW, vi, 0, sysctl_noflowq, "IU",
 	    "Reserve queue 0 for non-flowid packets");
 
 #ifdef TCP_OFFLOAD
 	if (vi->nofldrxq != 0) {
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nofldrxq", CTLFLAG_RD,
 		    &vi->nofldrxq, 0,
 		    "# of rx queues for offloaded TCP connections");
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nofldtxq", CTLFLAG_RD,
 		    &vi->nofldtxq, 0,
 		    "# of tx queues for offloaded TCP connections");
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_ofld_rxq",
 		    CTLFLAG_RD, &vi->first_ofld_rxq, 0,
 		    "index of first TOE rx queue");
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_ofld_txq",
 		    CTLFLAG_RD, &vi->first_ofld_txq, 0,
 		    "index of first TOE tx queue");
 	}
 #endif
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "holdoff_tmr_idx",
 	    CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_holdoff_tmr_idx, "I",
 	    "holdoff timer index");
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "holdoff_pktc_idx",
 	    CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_holdoff_pktc_idx, "I",
 	    "holdoff packet counter index");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "qsize_rxq",
 	    CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_qsize_rxq, "I",
 	    "rx queue size");
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "qsize_txq",
 	    CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_qsize_txq, "I",
 	    "tx queue size");
 }
 
 static void
 cxgbe_sysctls(struct port_info *pi)
 {
 	struct sysctl_ctx_list *ctx;
 	struct sysctl_oid *oid;
-	struct sysctl_oid_list *children;
+	struct sysctl_oid_list *children, *children2;
 	struct adapter *sc = pi->adapter;
+	int i;
+	char name[16];
 
 	ctx = device_get_sysctl_ctx(pi->dev);
 
 	/*
 	 * dev.cxgbe.X.
 	 */
 	oid = device_get_sysctl_tree(pi->dev);
 	children = SYSCTL_CHILDREN(oid);
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "linkdnrc", CTLTYPE_STRING |
 	    CTLFLAG_RD, pi, 0, sysctl_linkdnrc, "A", "reason why link is down");
 	if (pi->port_type == FW_PORT_TYPE_BT_XAUI) {
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "temperature",
 		    CTLTYPE_INT | CTLFLAG_RD, pi, 0, sysctl_btphy, "I",
 		    "PHY temperature (in Celsius)");
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "fw_version",
 		    CTLTYPE_INT | CTLFLAG_RD, pi, 1, sysctl_btphy, "I",
 		    "PHY firmware version");
 	}
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "pause_settings",
 	    CTLTYPE_STRING | CTLFLAG_RW, pi, PAUSE_TX, sysctl_pause_settings,
 	    "A", "PAUSE settings (bit 0 = rx_pause, bit 1 = tx_pause)");
 
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "max_speed", CTLFLAG_RD, NULL,
 	    port_top_speed(pi), "max speed (in Gbps)");
 
 	/*
+	 * dev.(cxgbe|cxl).X.tc.
+	 */
+	oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, "tc", CTLFLAG_RD, NULL,
+	    "Tx scheduler traffic classes");
+	for (i = 0; i < sc->chip_params->nsched_cls; i++) {
+		struct tx_sched_class *tc = &pi->tc[i];
+
+		snprintf(name, sizeof(name), "%d", i);
+		children2 = SYSCTL_CHILDREN(SYSCTL_ADD_NODE(ctx,
+		    SYSCTL_CHILDREN(oid), OID_AUTO, name, CTLFLAG_RD, NULL,
+		    "traffic class"));
+		SYSCTL_ADD_UINT(ctx, children2, OID_AUTO, "flags", CTLFLAG_RD,
+		    &tc->flags, 0, "flags");
+		SYSCTL_ADD_UINT(ctx, children2, OID_AUTO, "refcount",
+		    CTLFLAG_RD, &tc->refcount, 0, "references to this class");
+#ifdef SBUF_DRAIN
+		SYSCTL_ADD_PROC(ctx, children2, OID_AUTO, "params",
+		    CTLTYPE_STRING | CTLFLAG_RD, sc, (pi->port_id << 16) | i,
+		    sysctl_tc_params, "A", "traffic class parameters");
+#endif
+	}
+
+	/*
 	 * dev.cxgbe.X.stats.
 	 */
 	oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, "stats", CTLFLAG_RD,
 	    NULL, "port statistics");
 	children = SYSCTL_CHILDREN(oid);
 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_parse_error", CTLFLAG_RD,
 	    &pi->tx_parse_error, 0,
 	    "# of tx packets with invalid length or # of segments");
 
 #define SYSCTL_ADD_T4_REG64(pi, name, desc, reg) \
 	SYSCTL_ADD_OID(ctx, children, OID_AUTO, name, \
 	    CTLTYPE_U64 | CTLFLAG_RD, sc, reg, \
 	    sysctl_handle_t4_reg64, "QU", desc)
 
 	SYSCTL_ADD_T4_REG64(pi, "tx_octets", "# of octets in good frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_BYTES_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_frames", "total # of good frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_FRAMES_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_bcast_frames", "# of broadcast frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_BCAST_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_mcast_frames", "# of multicast frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_MCAST_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ucast_frames", "# of unicast frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_UCAST_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_error_frames", "# of error frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_ERROR_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_frames_64",
 	    "# of tx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_64B_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_frames_65_127",
 	    "# of tx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_65B_127B_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_frames_128_255",
 	    "# of tx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_128B_255B_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_frames_256_511",
 	    "# of tx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_256B_511B_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_frames_512_1023",
 	    "# of tx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_512B_1023B_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_frames_1024_1518",
 	    "# of tx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_1024B_1518B_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_frames_1519_max",
 	    "# of tx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_1519B_MAX_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_drop", "# of dropped tx frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_DROP_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_pause", "# of pause frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PAUSE_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ppp0", "# of PPP prio 0 frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP0_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ppp1", "# of PPP prio 1 frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP1_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ppp2", "# of PPP prio 2 frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP2_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ppp3", "# of PPP prio 3 frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP3_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ppp4", "# of PPP prio 4 frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP4_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ppp5", "# of PPP prio 5 frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP5_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ppp6", "# of PPP prio 6 frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP6_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ppp7", "# of PPP prio 7 frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP7_L));
 
 	SYSCTL_ADD_T4_REG64(pi, "rx_octets", "# of octets in good frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_BYTES_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_frames", "total # of good frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_FRAMES_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_bcast_frames", "# of broadcast frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_BCAST_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_mcast_frames", "# of multicast frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_MCAST_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ucast_frames", "# of unicast frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_UCAST_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_too_long", "# of frames exceeding MTU",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_MTU_ERROR_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_jabber", "# of jabber frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_MTU_CRC_ERROR_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_fcs_err",
 	    "# of frames received with bad FCS",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_CRC_ERROR_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_len_err",
 	    "# of frames received with length error",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_LEN_ERROR_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_symbol_err", "symbol errors",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_SYM_ERROR_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_runt", "# of short frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_LESS_64B_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_frames_64",
 	    "# of rx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_64B_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_frames_65_127",
 	    "# of rx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_65B_127B_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_frames_128_255",
 	    "# of rx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_128B_255B_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_frames_256_511",
 	    "# of rx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_256B_511B_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_frames_512_1023",
 	    "# of rx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_512B_1023B_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_frames_1024_1518",
 	    "# of rx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_1024B_1518B_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_frames_1519_max",
 	    "# of rx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_1519B_MAX_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_pause", "# of pause frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PAUSE_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ppp0", "# of PPP prio 0 frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP0_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ppp1", "# of PPP prio 1 frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP1_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ppp2", "# of PPP prio 2 frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP2_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ppp3", "# of PPP prio 3 frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP3_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ppp4", "# of PPP prio 4 frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP4_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ppp5", "# of PPP prio 5 frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP5_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ppp6", "# of PPP prio 6 frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP6_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ppp7", "# of PPP prio 7 frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP7_L));
 
 #undef SYSCTL_ADD_T4_REG64
 
 #define SYSCTL_ADD_T4_PORTSTAT(name, desc) \
 	SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, #name, CTLFLAG_RD, \
 	    &pi->stats.name, desc)
 
 	/* We get these from port_stats and they may be stale by up to 1s */
 	SYSCTL_ADD_T4_PORTSTAT(rx_ovflow0,
 	    "# drops due to buffer-group 0 overflows");
 	SYSCTL_ADD_T4_PORTSTAT(rx_ovflow1,
 	    "# drops due to buffer-group 1 overflows");
 	SYSCTL_ADD_T4_PORTSTAT(rx_ovflow2,
 	    "# drops due to buffer-group 2 overflows");
 	SYSCTL_ADD_T4_PORTSTAT(rx_ovflow3,
 	    "# drops due to buffer-group 3 overflows");
 	SYSCTL_ADD_T4_PORTSTAT(rx_trunc0,
 	    "# of buffer-group 0 truncated packets");
 	SYSCTL_ADD_T4_PORTSTAT(rx_trunc1,
 	    "# of buffer-group 1 truncated packets");
 	SYSCTL_ADD_T4_PORTSTAT(rx_trunc2,
 	    "# of buffer-group 2 truncated packets");
 	SYSCTL_ADD_T4_PORTSTAT(rx_trunc3,
 	    "# of buffer-group 3 truncated packets");
 
 #undef SYSCTL_ADD_T4_PORTSTAT
 }
 
 static int
 sysctl_int_array(SYSCTL_HANDLER_ARGS)
 {
 	int rc, *i, space = 0;
 	struct sbuf sb;
 
 	sbuf_new_for_sysctl(&sb, NULL, 64, req);
 	for (i = arg1; arg2; arg2 -= sizeof(int), i++) {
 		if (space)
 			sbuf_printf(&sb, " ");
 		sbuf_printf(&sb, "%d", *i);
 		space = 1;
 	}
 	rc = sbuf_finish(&sb);
 	sbuf_delete(&sb);
 	return (rc);
 }
 
 static int
 sysctl_bitfield(SYSCTL_HANDLER_ARGS)
 {
 	int rc;
 	struct sbuf *sb;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 128, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	sbuf_printf(sb, "%b", (int)arg2, (char *)arg1);
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_btphy(SYSCTL_HANDLER_ARGS)
 {
 	struct port_info *pi = arg1;
 	int op = arg2;
 	struct adapter *sc = pi->adapter;
 	u_int v;
 	int rc;
 
 	rc = begin_synchronized_op(sc, &pi->vi[0], SLEEP_OK | INTR_OK, "t4btt");
 	if (rc)
 		return (rc);
 	/* XXX: magic numbers */
 	rc = -t4_mdio_rd(sc, sc->mbox, pi->mdio_addr, 0x1e, op ? 0x20 : 0xc820,
 	    &v);
 	end_synchronized_op(sc, 0);
 	if (rc)
 		return (rc);
 	if (op == 0)
 		v /= 256;
 
 	rc = sysctl_handle_int(oidp, &v, 0, req);
 	return (rc);
 }
 
 static int
 sysctl_noflowq(SYSCTL_HANDLER_ARGS)
 {
 	struct vi_info *vi = arg1;
 	int rc, val;
 
 	val = vi->rsrv_noflowq;
 	rc = sysctl_handle_int(oidp, &val, 0, req);
 	if (rc != 0 || req->newptr == NULL)
 		return (rc);
 
 	if ((val >= 1) && (vi->ntxq > 1))
 		vi->rsrv_noflowq = 1;
 	else
 		vi->rsrv_noflowq = 0;
 
 	return (rc);
 }
 
 static int
 sysctl_holdoff_tmr_idx(SYSCTL_HANDLER_ARGS)
 {
 	struct vi_info *vi = arg1;
 	struct adapter *sc = vi->pi->adapter;
 	int idx, rc, i;
 	struct sge_rxq *rxq;
 #ifdef TCP_OFFLOAD
 	struct sge_ofld_rxq *ofld_rxq;
 #endif
 	uint8_t v;
 
 	idx = vi->tmr_idx;
 
 	rc = sysctl_handle_int(oidp, &idx, 0, req);
 	if (rc != 0 || req->newptr == NULL)
 		return (rc);
 
 	if (idx < 0 || idx >= SGE_NTIMERS)
 		return (EINVAL);
 
 	rc = begin_synchronized_op(sc, vi, HOLD_LOCK | SLEEP_OK | INTR_OK,
 	    "t4tmr");
 	if (rc)
 		return (rc);
 
 	v = V_QINTR_TIMER_IDX(idx) | V_QINTR_CNT_EN(vi->pktc_idx != -1);
 	for_each_rxq(vi, i, rxq) {
 #ifdef atomic_store_rel_8
 		atomic_store_rel_8(&rxq->iq.intr_params, v);
 #else
 		rxq->iq.intr_params = v;
 #endif
 	}
 #ifdef TCP_OFFLOAD
 	for_each_ofld_rxq(vi, i, ofld_rxq) {
 #ifdef atomic_store_rel_8
 		atomic_store_rel_8(&ofld_rxq->iq.intr_params, v);
 #else
 		ofld_rxq->iq.intr_params = v;
 #endif
 	}
 #endif
 	vi->tmr_idx = idx;
 
 	end_synchronized_op(sc, LOCK_HELD);
 	return (0);
 }
 
 static int
 sysctl_holdoff_pktc_idx(SYSCTL_HANDLER_ARGS)
 {
 	struct vi_info *vi = arg1;
 	struct adapter *sc = vi->pi->adapter;
 	int idx, rc;
 
 	idx = vi->pktc_idx;
 
 	rc = sysctl_handle_int(oidp, &idx, 0, req);
 	if (rc != 0 || req->newptr == NULL)
 		return (rc);
 
 	if (idx < -1 || idx >= SGE_NCOUNTERS)
 		return (EINVAL);
 
 	rc = begin_synchronized_op(sc, vi, HOLD_LOCK | SLEEP_OK | INTR_OK,
 	    "t4pktc");
 	if (rc)
 		return (rc);
 
 	if (vi->flags & VI_INIT_DONE)
 		rc = EBUSY; /* cannot be changed once the queues are created */
 	else
 		vi->pktc_idx = idx;
 
 	end_synchronized_op(sc, LOCK_HELD);
 	return (rc);
 }
 
 static int
 sysctl_qsize_rxq(SYSCTL_HANDLER_ARGS)
 {
 	struct vi_info *vi = arg1;
 	struct adapter *sc = vi->pi->adapter;
 	int qsize, rc;
 
 	qsize = vi->qsize_rxq;
 
 	rc = sysctl_handle_int(oidp, &qsize, 0, req);
 	if (rc != 0 || req->newptr == NULL)
 		return (rc);
 
 	if (qsize < 128 || (qsize & 7))
 		return (EINVAL);
 
 	rc = begin_synchronized_op(sc, vi, HOLD_LOCK | SLEEP_OK | INTR_OK,
 	    "t4rxqs");
 	if (rc)
 		return (rc);
 
 	if (vi->flags & VI_INIT_DONE)
 		rc = EBUSY; /* cannot be changed once the queues are created */
 	else
 		vi->qsize_rxq = qsize;
 
 	end_synchronized_op(sc, LOCK_HELD);
 	return (rc);
 }
 
 static int
 sysctl_qsize_txq(SYSCTL_HANDLER_ARGS)
 {
 	struct vi_info *vi = arg1;
 	struct adapter *sc = vi->pi->adapter;
 	int qsize, rc;
 
 	qsize = vi->qsize_txq;
 
 	rc = sysctl_handle_int(oidp, &qsize, 0, req);
 	if (rc != 0 || req->newptr == NULL)
 		return (rc);
 
 	if (qsize < 128 || qsize > 65536)
 		return (EINVAL);
 
 	rc = begin_synchronized_op(sc, vi, HOLD_LOCK | SLEEP_OK | INTR_OK,
 	    "t4txqs");
 	if (rc)
 		return (rc);
 
 	if (vi->flags & VI_INIT_DONE)
 		rc = EBUSY; /* cannot be changed once the queues are created */
 	else
 		vi->qsize_txq = qsize;
 
 	end_synchronized_op(sc, LOCK_HELD);
 	return (rc);
 }
 
 static int
 sysctl_pause_settings(SYSCTL_HANDLER_ARGS)
 {
 	struct port_info *pi = arg1;
 	struct adapter *sc = pi->adapter;
 	struct link_config *lc = &pi->link_cfg;
 	int rc;
 
 	if (req->newptr == NULL) {
 		struct sbuf *sb;
 		static char *bits = "\20\1PAUSE_RX\2PAUSE_TX";
 
 		rc = sysctl_wire_old_buffer(req, 0);
 		if (rc != 0)
 			return (rc);
 
 		sb = sbuf_new_for_sysctl(NULL, NULL, 128, req);
 		if (sb == NULL)
 			return (ENOMEM);
 
 		sbuf_printf(sb, "%b", lc->fc & (PAUSE_TX | PAUSE_RX), bits);
 		rc = sbuf_finish(sb);
 		sbuf_delete(sb);
 	} else {
 		char s[2];
 		int n;
 
 		s[0] = '0' + (lc->requested_fc & (PAUSE_TX | PAUSE_RX));
 		s[1] = 0;
 
 		rc = sysctl_handle_string(oidp, s, sizeof(s), req);
 		if (rc != 0)
 			return (rc);
 
 		if (s[1] != 0)
 			return (EINVAL);
 		if (s[0] < '0' || s[0] > '9')
 			return (EINVAL);	/* not a number */
 		n = s[0] - '0';
 		if (n & ~(PAUSE_TX | PAUSE_RX))
 			return (EINVAL);	/* some other bit is set too */
 
 		rc = begin_synchronized_op(sc, &pi->vi[0], SLEEP_OK | INTR_OK,
 		    "t4PAUSE");
 		if (rc)
 			return (rc);
 		if ((lc->requested_fc & (PAUSE_TX | PAUSE_RX)) != n) {
 			int link_ok = lc->link_ok;
 
 			lc->requested_fc &= ~(PAUSE_TX | PAUSE_RX);
 			lc->requested_fc |= n;
 			rc = -t4_link_l1cfg(sc, sc->mbox, pi->tx_chan, lc);
 			lc->link_ok = link_ok;	/* restore */
 		}
 		end_synchronized_op(sc, 0);
 	}
 
 	return (rc);
 }
 
 static int
 sysctl_handle_t4_reg64(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	int reg = arg2;
 	uint64_t val;
 
 	val = t4_read_reg64(sc, reg);
 
 	return (sysctl_handle_64(oidp, &val, 0, req));
 }
 
 static int
 sysctl_temperature(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	int rc, t;
 	uint32_t param, val;
 
 	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4temp");
 	if (rc)
 		return (rc);
 	param = V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DEV) |
 	    V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DEV_DIAG) |
 	    V_FW_PARAMS_PARAM_Y(FW_PARAM_DEV_DIAG_TMP);
 	rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val);
 	end_synchronized_op(sc, 0);
 	if (rc)
 		return (rc);
 
 	/* unknown is returned as 0 but we display -1 in that case */
 	t = val == 0 ? -1 : val;
 
 	rc = sysctl_handle_int(oidp, &t, 0, req);
 	return (rc);
 }
 
 #ifdef SBUF_DRAIN
 static int
 sysctl_cctrl(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i;
 	uint16_t incr[NMTUS][NCCTRL_WIN];
 	static const char *dec_fac[] = {
 		"0.5", "0.5625", "0.625", "0.6875", "0.75", "0.8125", "0.875",
 		"0.9375"
 	};
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	t4_read_cong_tbl(sc, incr);
 
 	for (i = 0; i < NCCTRL_WIN; ++i) {
 		sbuf_printf(sb, "%2d: %4u %4u %4u %4u %4u %4u %4u %4u\n", i,
 		    incr[0][i], incr[1][i], incr[2][i], incr[3][i], incr[4][i],
 		    incr[5][i], incr[6][i], incr[7][i]);
 		sbuf_printf(sb, "%8u %4u %4u %4u %4u %4u %4u %4u %5u %s\n",
 		    incr[8][i], incr[9][i], incr[10][i], incr[11][i],
 		    incr[12][i], incr[13][i], incr[14][i], incr[15][i],
 		    sc->params.a_wnd[i], dec_fac[sc->params.b_wnd[i]]);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static const char *qname[CIM_NUM_IBQ + CIM_NUM_OBQ_T5] = {
 	"TP0", "TP1", "ULP", "SGE0", "SGE1", "NC-SI",	/* ibq's */
 	"ULP0", "ULP1", "ULP2", "ULP3", "SGE", "NC-SI",	/* obq's */
 	"SGE0-RX", "SGE1-RX"	/* additional obq's (T5 onwards) */
 };
 
 static int
 sysctl_cim_ibq_obq(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i, n, qid = arg2;
 	uint32_t *buf, *p;
 	char *qtype;
 	u_int cim_num_obq = sc->chip_params->cim_num_obq;
 
 	KASSERT(qid >= 0 && qid < CIM_NUM_IBQ + cim_num_obq,
 	    ("%s: bad qid %d\n", __func__, qid));
 
 	if (qid < CIM_NUM_IBQ) {
 		/* inbound queue */
 		qtype = "IBQ";
 		n = 4 * CIM_IBQ_SIZE;
 		buf = malloc(n * sizeof(uint32_t), M_CXGBE, M_ZERO | M_WAITOK);
 		rc = t4_read_cim_ibq(sc, qid, buf, n);
 	} else {
 		/* outbound queue */
 		qtype = "OBQ";
 		qid -= CIM_NUM_IBQ;
 		n = 4 * cim_num_obq * CIM_OBQ_SIZE;
 		buf = malloc(n * sizeof(uint32_t), M_CXGBE, M_ZERO | M_WAITOK);
 		rc = t4_read_cim_obq(sc, qid, buf, n);
 	}
 
 	if (rc < 0) {
 		rc = -rc;
 		goto done;
 	}
 	n = rc * sizeof(uint32_t);	/* rc has # of words actually read */
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		goto done;
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, PAGE_SIZE, req);
 	if (sb == NULL) {
 		rc = ENOMEM;
 		goto done;
 	}
 
 	sbuf_printf(sb, "%s%d %s", qtype, qid, qname[arg2]);
 	for (i = 0, p = buf; i < n; i += 16, p += 4)
 		sbuf_printf(sb, "\n%#06x: %08x %08x %08x %08x", i, p[0], p[1],
 		    p[2], p[3]);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 done:
 	free(buf, M_CXGBE);
 	return (rc);
 }
 
 static int
 sysctl_cim_la(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	u_int cfg;
 	struct sbuf *sb;
 	uint32_t *buf, *p;
 	int rc;
 
 	MPASS(chip_id(sc) <= CHELSIO_T5);
 
 	rc = -t4_cim_read(sc, A_UP_UP_DBG_LA_CFG, 1, &cfg);
 	if (rc != 0)
 		return (rc);
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	buf = malloc(sc->params.cim_la_size * sizeof(uint32_t), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 
 	rc = -t4_cim_read_la(sc, buf, NULL);
 	if (rc != 0)
 		goto done;
 
 	sbuf_printf(sb, "Status   Data      PC%s",
 	    cfg & F_UPDBGLACAPTPCONLY ? "" :
 	    "     LS0Stat  LS0Addr             LS0Data");
 
 	for (p = buf; p <= &buf[sc->params.cim_la_size - 8]; p += 8) {
 		if (cfg & F_UPDBGLACAPTPCONLY) {
 			sbuf_printf(sb, "\n  %02x   %08x %08x", p[5] & 0xff,
 			    p[6], p[7]);
 			sbuf_printf(sb, "\n  %02x   %02x%06x %02x%06x",
 			    (p[3] >> 8) & 0xff, p[3] & 0xff, p[4] >> 8,
 			    p[4] & 0xff, p[5] >> 8);
 			sbuf_printf(sb, "\n  %02x   %x%07x %x%07x",
 			    (p[0] >> 4) & 0xff, p[0] & 0xf, p[1] >> 4,
 			    p[1] & 0xf, p[2] >> 4);
 		} else {
 			sbuf_printf(sb,
 			    "\n  %02x   %x%07x %x%07x %08x %08x "
 			    "%08x%08x%08x%08x",
 			    (p[0] >> 4) & 0xff, p[0] & 0xf, p[1] >> 4,
 			    p[1] & 0xf, p[2] >> 4, p[2] & 0xf, p[3], p[4], p[5],
 			    p[6], p[7]);
 		}
 	}
 
 	rc = sbuf_finish(sb);
 done:
 	sbuf_delete(sb);
 	free(buf, M_CXGBE);
 	return (rc);
 }
 
 static int
 sysctl_cim_la_t6(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	u_int cfg;
 	struct sbuf *sb;
 	uint32_t *buf, *p;
 	int rc;
 
 	MPASS(chip_id(sc) > CHELSIO_T5);
 
 	rc = -t4_cim_read(sc, A_UP_UP_DBG_LA_CFG, 1, &cfg);
 	if (rc != 0)
 		return (rc);
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	buf = malloc(sc->params.cim_la_size * sizeof(uint32_t), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 
 	rc = -t4_cim_read_la(sc, buf, NULL);
 	if (rc != 0)
 		goto done;
 
 	sbuf_printf(sb, "Status   Inst    Data      PC%s",
 	    cfg & F_UPDBGLACAPTPCONLY ? "" :
 	    "     LS0Stat  LS0Addr  LS0Data  LS1Stat  LS1Addr  LS1Data");
 
 	for (p = buf; p <= &buf[sc->params.cim_la_size - 10]; p += 10) {
 		if (cfg & F_UPDBGLACAPTPCONLY) {
 			sbuf_printf(sb, "\n  %02x   %08x %08x %08x",
 			    p[3] & 0xff, p[2], p[1], p[0]);
 			sbuf_printf(sb, "\n  %02x   %02x%06x %02x%06x %02x%06x",
 			    (p[6] >> 8) & 0xff, p[6] & 0xff, p[5] >> 8,
 			    p[5] & 0xff, p[4] >> 8, p[4] & 0xff, p[3] >> 8);
 			sbuf_printf(sb, "\n  %02x   %04x%04x %04x%04x %04x%04x",
 			    (p[9] >> 16) & 0xff, p[9] & 0xffff, p[8] >> 16,
 			    p[8] & 0xffff, p[7] >> 16, p[7] & 0xffff,
 			    p[6] >> 16);
 		} else {
 			sbuf_printf(sb, "\n  %02x   %04x%04x %04x%04x %04x%04x "
 			    "%08x %08x %08x %08x %08x %08x",
 			    (p[9] >> 16) & 0xff,
 			    p[9] & 0xffff, p[8] >> 16,
 			    p[8] & 0xffff, p[7] >> 16,
 			    p[7] & 0xffff, p[6] >> 16,
 			    p[2], p[1], p[0], p[5], p[4], p[3]);
 		}
 	}
 
 	rc = sbuf_finish(sb);
 done:
 	sbuf_delete(sb);
 	free(buf, M_CXGBE);
 	return (rc);
 }
 
 static int
 sysctl_cim_ma_la(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	u_int i;
 	struct sbuf *sb;
 	uint32_t *buf, *p;
 	int rc;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	buf = malloc(2 * CIM_MALA_SIZE * 5 * sizeof(uint32_t), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 
 	t4_cim_read_ma_la(sc, buf, buf + 5 * CIM_MALA_SIZE);
 	p = buf;
 
 	for (i = 0; i < CIM_MALA_SIZE; i++, p += 5) {
 		sbuf_printf(sb, "\n%02x%08x%08x%08x%08x", p[4], p[3], p[2],
 		    p[1], p[0]);
 	}
 
 	sbuf_printf(sb, "\n\nCnt ID Tag UE       Data       RDY VLD");
 	for (i = 0; i < CIM_MALA_SIZE; i++, p += 5) {
 		sbuf_printf(sb, "\n%3u %2u  %x   %u %08x%08x  %u   %u",
 		    (p[2] >> 10) & 0xff, (p[2] >> 7) & 7,
 		    (p[2] >> 3) & 0xf, (p[2] >> 2) & 1,
 		    (p[1] >> 2) | ((p[2] & 3) << 30),
 		    (p[0] >> 2) | ((p[1] & 3) << 30), (p[0] >> 1) & 1,
 		    p[0] & 1);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 	free(buf, M_CXGBE);
 	return (rc);
 }
 
 static int
 sysctl_cim_pif_la(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	u_int i;
 	struct sbuf *sb;
 	uint32_t *buf, *p;
 	int rc;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	buf = malloc(2 * CIM_PIFLA_SIZE * 6 * sizeof(uint32_t), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 
 	t4_cim_read_pif_la(sc, buf, buf + 6 * CIM_PIFLA_SIZE, NULL, NULL);
 	p = buf;
 
 	sbuf_printf(sb, "Cntl ID DataBE   Addr                 Data");
 	for (i = 0; i < CIM_PIFLA_SIZE; i++, p += 6) {
 		sbuf_printf(sb, "\n %02x  %02x  %04x  %08x %08x%08x%08x%08x",
 		    (p[5] >> 22) & 0xff, (p[5] >> 16) & 0x3f, p[5] & 0xffff,
 		    p[4], p[3], p[2], p[1], p[0]);
 	}
 
 	sbuf_printf(sb, "\n\nCntl ID               Data");
 	for (i = 0; i < CIM_PIFLA_SIZE; i++, p += 6) {
 		sbuf_printf(sb, "\n %02x  %02x %08x%08x%08x%08x",
 		    (p[4] >> 6) & 0xff, p[4] & 0x3f, p[3], p[2], p[1], p[0]);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 	free(buf, M_CXGBE);
 	return (rc);
 }
 
 static int
 sysctl_cim_qcfg(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i;
 	uint16_t base[CIM_NUM_IBQ + CIM_NUM_OBQ_T5];
 	uint16_t size[CIM_NUM_IBQ + CIM_NUM_OBQ_T5];
 	uint16_t thres[CIM_NUM_IBQ];
 	uint32_t obq_wr[2 * CIM_NUM_OBQ_T5], *wr = obq_wr;
 	uint32_t stat[4 * (CIM_NUM_IBQ + CIM_NUM_OBQ_T5)], *p = stat;
 	u_int cim_num_obq, ibq_rdaddr, obq_rdaddr, nq;
 
 	cim_num_obq = sc->chip_params->cim_num_obq;
 	if (is_t4(sc)) {
 		ibq_rdaddr = A_UP_IBQ_0_RDADDR;
 		obq_rdaddr = A_UP_OBQ_0_REALADDR;
 	} else {
 		ibq_rdaddr = A_UP_IBQ_0_SHADOW_RDADDR;
 		obq_rdaddr = A_UP_OBQ_0_SHADOW_REALADDR;
 	}
 	nq = CIM_NUM_IBQ + cim_num_obq;
 
 	rc = -t4_cim_read(sc, ibq_rdaddr, 4 * nq, stat);
 	if (rc == 0)
 		rc = -t4_cim_read(sc, obq_rdaddr, 2 * cim_num_obq, obq_wr);
 	if (rc != 0)
 		return (rc);
 
 	t4_read_cimq_cfg(sc, base, size, thres);
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, PAGE_SIZE, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	sbuf_printf(sb, "Queue  Base  Size Thres RdPtr WrPtr  SOP  EOP Avail");
 
 	for (i = 0; i < CIM_NUM_IBQ; i++, p += 4)
 		sbuf_printf(sb, "\n%7s %5x %5u %5u %6x  %4x %4u %4u %5u",
 		    qname[i], base[i], size[i], thres[i], G_IBQRDADDR(p[0]),
 		    G_IBQWRADDR(p[1]), G_QUESOPCNT(p[3]), G_QUEEOPCNT(p[3]),
 		    G_QUEREMFLITS(p[2]) * 16);
 	for ( ; i < nq; i++, p += 4, wr += 2)
 		sbuf_printf(sb, "\n%7s %5x %5u %12x  %4x %4u %4u %5u", qname[i],
 		    base[i], size[i], G_QUERDADDR(p[0]) & 0x3fff,
 		    wr[0] - base[i], G_QUESOPCNT(p[3]), G_QUEEOPCNT(p[3]),
 		    G_QUEREMFLITS(p[2]) * 16);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_cpl_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	struct tp_cpl_stats stats;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	mtx_lock(&sc->reg_lock);
 	t4_tp_get_cpl_stats(sc, &stats);
 	mtx_unlock(&sc->reg_lock);
 
 	if (sc->chip_params->nchan > 2) {
 		sbuf_printf(sb, "                 channel 0  channel 1"
 		    "  channel 2  channel 3");
 		sbuf_printf(sb, "\nCPL requests:   %10u %10u %10u %10u",
 		    stats.req[0], stats.req[1], stats.req[2], stats.req[3]);
 		sbuf_printf(sb, "\nCPL responses:   %10u %10u %10u %10u",
 		    stats.rsp[0], stats.rsp[1], stats.rsp[2], stats.rsp[3]);
 	} else {
 		sbuf_printf(sb, "                 channel 0  channel 1");
 		sbuf_printf(sb, "\nCPL requests:   %10u %10u",
 		    stats.req[0], stats.req[1]);
 		sbuf_printf(sb, "\nCPL responses:   %10u %10u",
 		    stats.rsp[0], stats.rsp[1]);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_ddp_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	struct tp_usm_stats stats;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	t4_get_usm_stats(sc, &stats);
 
 	sbuf_printf(sb, "Frames: %u\n", stats.frames);
 	sbuf_printf(sb, "Octets: %ju\n", stats.octets);
 	sbuf_printf(sb, "Drops:  %u", stats.drops);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static const char * const devlog_level_strings[] = {
 	[FW_DEVLOG_LEVEL_EMERG]		= "EMERG",
 	[FW_DEVLOG_LEVEL_CRIT]		= "CRIT",
 	[FW_DEVLOG_LEVEL_ERR]		= "ERR",
 	[FW_DEVLOG_LEVEL_NOTICE]	= "NOTICE",
 	[FW_DEVLOG_LEVEL_INFO]		= "INFO",
 	[FW_DEVLOG_LEVEL_DEBUG]		= "DEBUG"
 };
 
 static const char * const devlog_facility_strings[] = {
 	[FW_DEVLOG_FACILITY_CORE]	= "CORE",
 	[FW_DEVLOG_FACILITY_CF]		= "CF",
 	[FW_DEVLOG_FACILITY_SCHED]	= "SCHED",
 	[FW_DEVLOG_FACILITY_TIMER]	= "TIMER",
 	[FW_DEVLOG_FACILITY_RES]	= "RES",
 	[FW_DEVLOG_FACILITY_HW]		= "HW",
 	[FW_DEVLOG_FACILITY_FLR]	= "FLR",
 	[FW_DEVLOG_FACILITY_DMAQ]	= "DMAQ",
 	[FW_DEVLOG_FACILITY_PHY]	= "PHY",
 	[FW_DEVLOG_FACILITY_MAC]	= "MAC",
 	[FW_DEVLOG_FACILITY_PORT]	= "PORT",
 	[FW_DEVLOG_FACILITY_VI]		= "VI",
 	[FW_DEVLOG_FACILITY_FILTER]	= "FILTER",
 	[FW_DEVLOG_FACILITY_ACL]	= "ACL",
 	[FW_DEVLOG_FACILITY_TM]		= "TM",
 	[FW_DEVLOG_FACILITY_QFC]	= "QFC",
 	[FW_DEVLOG_FACILITY_DCB]	= "DCB",
 	[FW_DEVLOG_FACILITY_ETH]	= "ETH",
 	[FW_DEVLOG_FACILITY_OFLD]	= "OFLD",
 	[FW_DEVLOG_FACILITY_RI]		= "RI",
 	[FW_DEVLOG_FACILITY_ISCSI]	= "ISCSI",
 	[FW_DEVLOG_FACILITY_FCOE]	= "FCOE",
 	[FW_DEVLOG_FACILITY_FOISCSI]	= "FOISCSI",
 	[FW_DEVLOG_FACILITY_FOFCOE]	= "FOFCOE",
 	[FW_DEVLOG_FACILITY_CHNET]	= "CHNET",
 };
 
 static int
 sysctl_devlog(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct devlog_params *dparams = &sc->params.devlog;
 	struct fw_devlog_e *buf, *e;
 	int i, j, rc, nentries, first = 0;
 	struct sbuf *sb;
 	uint64_t ftstamp = UINT64_MAX;
 
 	if (dparams->addr == 0)
 		return (ENXIO);
 
 	buf = malloc(dparams->size, M_CXGBE, M_NOWAIT);
 	if (buf == NULL)
 		return (ENOMEM);
 
 	rc = read_via_memwin(sc, 1, dparams->addr, (void *)buf, dparams->size);
 	if (rc != 0)
 		goto done;
 
 	nentries = dparams->size / sizeof(struct fw_devlog_e);
 	for (i = 0; i < nentries; i++) {
 		e = &buf[i];
 
 		if (e->timestamp == 0)
 			break;	/* end */
 
 		e->timestamp = be64toh(e->timestamp);
 		e->seqno = be32toh(e->seqno);
 		for (j = 0; j < 8; j++)
 			e->params[j] = be32toh(e->params[j]);
 
 		if (e->timestamp < ftstamp) {
 			ftstamp = e->timestamp;
 			first = i;
 		}
 	}
 
 	if (buf[first].timestamp == 0)
 		goto done;	/* nothing in the log */
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		goto done;
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL) {
 		rc = ENOMEM;
 		goto done;
 	}
 	sbuf_printf(sb, "%10s  %15s  %8s  %8s  %s\n",
 	    "Seq#", "Tstamp", "Level", "Facility", "Message");
 
 	i = first;
 	do {
 		e = &buf[i];
 		if (e->timestamp == 0)
 			break;	/* end */
 
 		sbuf_printf(sb, "%10d  %15ju  %8s  %8s  ",
 		    e->seqno, e->timestamp,
 		    (e->level < nitems(devlog_level_strings) ?
 			devlog_level_strings[e->level] : "UNKNOWN"),
 		    (e->facility < nitems(devlog_facility_strings) ?
 			devlog_facility_strings[e->facility] : "UNKNOWN"));
 		sbuf_printf(sb, e->fmt, e->params[0], e->params[1],
 		    e->params[2], e->params[3], e->params[4],
 		    e->params[5], e->params[6], e->params[7]);
 
 		if (++i == nentries)
 			i = 0;
 	} while (i != first);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 done:
 	free(buf, M_CXGBE);
 	return (rc);
 }
 
 static int
 sysctl_fcoe_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	struct tp_fcoe_stats stats[MAX_NCHAN];
 	int i, nchan = sc->chip_params->nchan;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	for (i = 0; i < nchan; i++)
 		t4_get_fcoe_stats(sc, i, &stats[i]);
 
 	if (nchan > 2) {
 		sbuf_printf(sb, "                   channel 0        channel 1"
 		    "        channel 2        channel 3");
 		sbuf_printf(sb, "\noctetsDDP:  %16ju %16ju %16ju %16ju",
 		    stats[0].octets_ddp, stats[1].octets_ddp,
 		    stats[2].octets_ddp, stats[3].octets_ddp);
 		sbuf_printf(sb, "\nframesDDP:  %16u %16u %16u %16u",
 		    stats[0].frames_ddp, stats[1].frames_ddp,
 		    stats[2].frames_ddp, stats[3].frames_ddp);
 		sbuf_printf(sb, "\nframesDrop: %16u %16u %16u %16u",
 		    stats[0].frames_drop, stats[1].frames_drop,
 		    stats[2].frames_drop, stats[3].frames_drop);
 	} else {
 		sbuf_printf(sb, "                   channel 0        channel 1");
 		sbuf_printf(sb, "\noctetsDDP:  %16ju %16ju",
 		    stats[0].octets_ddp, stats[1].octets_ddp);
 		sbuf_printf(sb, "\nframesDDP:  %16u %16u",
 		    stats[0].frames_ddp, stats[1].frames_ddp);
 		sbuf_printf(sb, "\nframesDrop: %16u %16u",
 		    stats[0].frames_drop, stats[1].frames_drop);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_hw_sched(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i;
 	unsigned int map, kbps, ipg, mode;
 	unsigned int pace_tab[NTX_SCHED];
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	map = t4_read_reg(sc, A_TP_TX_MOD_QUEUE_REQ_MAP);
 	mode = G_TIMERMODE(t4_read_reg(sc, A_TP_MOD_CONFIG));
 	t4_read_pace_tbl(sc, pace_tab);
 
 	sbuf_printf(sb, "Scheduler  Mode   Channel  Rate (Kbps)   "
 	    "Class IPG (0.1 ns)   Flow IPG (us)");
 
 	for (i = 0; i < NTX_SCHED; ++i, map >>= 2) {
 		t4_get_tx_sched(sc, i, &kbps, &ipg);
 		sbuf_printf(sb, "\n    %u      %-5s     %u     ", i,
 		    (mode & (1 << i)) ? "flow" : "class", map & 3);
 		if (kbps)
 			sbuf_printf(sb, "%9u     ", kbps);
 		else
 			sbuf_printf(sb, " disabled     ");
 
 		if (ipg)
 			sbuf_printf(sb, "%13u        ", ipg);
 		else
 			sbuf_printf(sb, "     disabled        ");
 
 		if (pace_tab[i])
 			sbuf_printf(sb, "%10u", pace_tab[i]);
 		else
 			sbuf_printf(sb, "  disabled");
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_lb_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i, j;
 	uint64_t *p0, *p1;
 	struct lb_port_stats s[2];
 	static const char *stat_name[] = {
 		"OctetsOK:", "FramesOK:", "BcastFrames:", "McastFrames:",
 		"UcastFrames:", "ErrorFrames:", "Frames64:", "Frames65To127:",
 		"Frames128To255:", "Frames256To511:", "Frames512To1023:",
 		"Frames1024To1518:", "Frames1519ToMax:", "FramesDropped:",
 		"BG0FramesDropped:", "BG1FramesDropped:", "BG2FramesDropped:",
 		"BG3FramesDropped:", "BG0FramesTrunc:", "BG1FramesTrunc:",
 		"BG2FramesTrunc:", "BG3FramesTrunc:"
 	};
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	memset(s, 0, sizeof(s));
 
 	for (i = 0; i < sc->chip_params->nchan; i += 2) {
 		t4_get_lb_stats(sc, i, &s[0]);
 		t4_get_lb_stats(sc, i + 1, &s[1]);
 
 		p0 = &s[0].octets;
 		p1 = &s[1].octets;
 		sbuf_printf(sb, "%s                       Loopback %u"
 		    "           Loopback %u", i == 0 ? "" : "\n", i, i + 1);
 
 		for (j = 0; j < nitems(stat_name); j++)
 			sbuf_printf(sb, "\n%-17s %20ju %20ju", stat_name[j],
 				   *p0++, *p1++);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_linkdnrc(SYSCTL_HANDLER_ARGS)
 {
 	int rc = 0;
 	struct port_info *pi = arg1;
 	struct sbuf *sb;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 	sb = sbuf_new_for_sysctl(NULL, NULL, 64, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	if (pi->linkdnrc < 0)
 		sbuf_printf(sb, "n/a");
 	else
 		sbuf_printf(sb, "%s", t4_link_down_rc_str(pi->linkdnrc));
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 struct mem_desc {
 	unsigned int base;
 	unsigned int limit;
 	unsigned int idx;
 };
 
 static int
 mem_desc_cmp(const void *a, const void *b)
 {
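 	unsigned int base_a = ((const struct mem_desc *)a)->base;
 	unsigned int base_b = ((const struct mem_desc *)b)->base;
 
 	/* Compare rather than subtract: an unsigned difference can wrap. */
 	return ((base_a > base_b) - (base_a < base_b));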
 }
 
 static void
 mem_region_show(struct sbuf *sb, const char *name, unsigned int from,
     unsigned int to)
 {
 	unsigned int size;
 
 	if (from == to)
 		return;
 
 	size = to - from + 1;
 	if (size == 0)
 		return;
 
 	/* XXX: need humanize_number(3) in libkern for a more readable 'size' */
 	sbuf_printf(sb, "%-15s %#x-%#x [%u]\n", name, from, to, size);
 }
 
 static int
 sysctl_meminfo(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i, n;
 	uint32_t lo, hi, used, alloc;
 	static const char *memory[] = {"EDC0:", "EDC1:", "MC:", "MC0:", "MC1:"};
 	static const char *region[] = {
 		"DBQ contexts:", "IMSG contexts:", "FLM cache:", "TCBs:",
 		"Pstructs:", "Timers:", "Rx FL:", "Tx FL:", "Pstruct FL:",
 		"Tx payload:", "Rx payload:", "LE hash:", "iSCSI region:",
 		"TDDP region:", "TPT region:", "STAG region:", "RQ region:",
 		"RQUDP region:", "PBL region:", "TXPBL region:",
 		"DBVFIFO region:", "ULPRX state:", "ULPTX state:",
 		"On-chip queues:"
 	};
 	struct mem_desc avail[4];
 	struct mem_desc mem[nitems(region) + 3];	/* up to 3 holes */
 	struct mem_desc *md = mem;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	for (i = 0; i < nitems(mem); i++) {
 		mem[i].limit = 0;
 		mem[i].idx = i;
 	}
 
 	/* Find and sort the populated memory ranges */
 	i = 0;
 	lo = t4_read_reg(sc, A_MA_TARGET_MEM_ENABLE);
 	if (lo & F_EDRAM0_ENABLE) {
 		hi = t4_read_reg(sc, A_MA_EDRAM0_BAR);
 		avail[i].base = G_EDRAM0_BASE(hi) << 20;
 		avail[i].limit = avail[i].base + (G_EDRAM0_SIZE(hi) << 20);
 		avail[i].idx = 0;
 		i++;
 	}
 	if (lo & F_EDRAM1_ENABLE) {
 		hi = t4_read_reg(sc, A_MA_EDRAM1_BAR);
 		avail[i].base = G_EDRAM1_BASE(hi) << 20;
 		avail[i].limit = avail[i].base + (G_EDRAM1_SIZE(hi) << 20);
 		avail[i].idx = 1;
 		i++;
 	}
 	if (lo & F_EXT_MEM_ENABLE) {
 		hi = t4_read_reg(sc, A_MA_EXT_MEMORY_BAR);
 		avail[i].base = G_EXT_MEM_BASE(hi) << 20;
 		avail[i].limit = avail[i].base +
 		    (G_EXT_MEM_SIZE(hi) << 20);
 		avail[i].idx = is_t5(sc) ? 3 : 2;	/* Call it MC0 for T5 */
 		i++;
 	}
 	if (is_t5(sc) && lo & F_EXT_MEM1_ENABLE) {
 		hi = t4_read_reg(sc, A_MA_EXT_MEMORY1_BAR);
 		avail[i].base = G_EXT_MEM1_BASE(hi) << 20;
 		avail[i].limit = avail[i].base +
 		    (G_EXT_MEM1_SIZE(hi) << 20);
 		avail[i].idx = 4;
 		i++;
 	}
 	if (!i) {                                  /* no memory available */
 		sbuf_delete(sb);
 		return (0);
 	}
 	qsort(avail, i, sizeof(struct mem_desc), mem_desc_cmp);
 
 	(md++)->base = t4_read_reg(sc, A_SGE_DBQ_CTXT_BADDR);
 	(md++)->base = t4_read_reg(sc, A_SGE_IMSG_CTXT_BADDR);
 	(md++)->base = t4_read_reg(sc, A_SGE_FLM_CACHE_BADDR);
 	(md++)->base = t4_read_reg(sc, A_TP_CMM_TCB_BASE);
 	(md++)->base = t4_read_reg(sc, A_TP_CMM_MM_BASE);
 	(md++)->base = t4_read_reg(sc, A_TP_CMM_TIMER_BASE);
 	(md++)->base = t4_read_reg(sc, A_TP_CMM_MM_RX_FLST_BASE);
 	(md++)->base = t4_read_reg(sc, A_TP_CMM_MM_TX_FLST_BASE);
 	(md++)->base = t4_read_reg(sc, A_TP_CMM_MM_PS_FLST_BASE);
 
 	/* the next few have explicit upper bounds */
 	md->base = t4_read_reg(sc, A_TP_PMM_TX_BASE);
 	md->limit = md->base - 1 +
 		    t4_read_reg(sc, A_TP_PMM_TX_PAGE_SIZE) *
 		    G_PMTXMAXPAGE(t4_read_reg(sc, A_TP_PMM_TX_MAX_PAGE));
 	md++;
 
 	md->base = t4_read_reg(sc, A_TP_PMM_RX_BASE);
 	md->limit = md->base - 1 +
 		    t4_read_reg(sc, A_TP_PMM_RX_PAGE_SIZE) *
 		    G_PMRXMAXPAGE(t4_read_reg(sc, A_TP_PMM_RX_MAX_PAGE));
 	md++;
 
 	if (t4_read_reg(sc, A_LE_DB_CONFIG) & F_HASHEN) {
 		if (chip_id(sc) <= CHELSIO_T5)
 			md->base = t4_read_reg(sc, A_LE_DB_HASH_TID_BASE);
 		else
 			md->base = t4_read_reg(sc, A_LE_DB_HASH_TBL_BASE_ADDR);
 		md->limit = 0;
 	} else {
 		md->base = 0;
 		md->idx = nitems(region);  /* hide it */
 	}
 	md++;
 
 #define ulp_region(reg) \
 	md->base = t4_read_reg(sc, A_ULP_ ## reg ## _LLIMIT);\
 	(md++)->limit = t4_read_reg(sc, A_ULP_ ## reg ## _ULIMIT)
 
 	ulp_region(RX_ISCSI);
 	ulp_region(RX_TDDP);
 	ulp_region(TX_TPT);
 	ulp_region(RX_STAG);
 	ulp_region(RX_RQ);
 	ulp_region(RX_RQUDP);
 	ulp_region(RX_PBL);
 	ulp_region(TX_PBL);
 #undef ulp_region
 
 	md->base = 0;
 	md->idx = nitems(region);
 	if (!is_t4(sc)) {
 		uint32_t size = 0;
 		uint32_t sge_ctrl = t4_read_reg(sc, A_SGE_CONTROL2);
 		uint32_t fifo_size = t4_read_reg(sc, A_SGE_DBVFIFO_SIZE);
 
 		if (is_t5(sc)) {
 			if (sge_ctrl & F_VFIFO_ENABLE)
 				size = G_DBVFIFO_SIZE(fifo_size);
 		} else
 			size = G_T6_DBVFIFO_SIZE(fifo_size);
 
 		if (size) {
 			md->base = G_BASEADDR(t4_read_reg(sc,
 			    A_SGE_DBVFIFO_BADDR));
 			md->limit = md->base + (size << 2) - 1;
 		}
 	}
 	md++;
 
 	md->base = t4_read_reg(sc, A_ULP_RX_CTX_BASE);
 	md->limit = 0;
 	md++;
 	md->base = t4_read_reg(sc, A_ULP_TX_ERR_TABLE_BASE);
 	md->limit = 0;
 	md++;
 
 	md->base = sc->vres.ocq.start;
 	if (sc->vres.ocq.size)
 		md->limit = md->base + sc->vres.ocq.size - 1;
 	else
 		md->idx = nitems(region);  /* hide it */
 	md++;
 
 	/* add any address-space holes, there can be up to 3 */
 	for (n = 0; n < i - 1; n++)
 		if (avail[n].limit < avail[n + 1].base)
 			(md++)->base = avail[n].limit;
 	if (avail[n].limit)
 		(md++)->base = avail[n].limit;
 
 	n = md - mem;
 	qsort(mem, n, sizeof(struct mem_desc), mem_desc_cmp);
 
 	for (lo = 0; lo < i; lo++)
 		mem_region_show(sb, memory[avail[lo].idx], avail[lo].base,
 				avail[lo].limit - 1);
 
 	sbuf_printf(sb, "\n");
 	for (i = 0; i < n; i++) {
 		if (mem[i].idx >= nitems(region))
 			continue;                        /* skip holes */
 		if (!mem[i].limit)
 			mem[i].limit = i < n - 1 ? mem[i + 1].base - 1 : ~0;
 		mem_region_show(sb, region[mem[i].idx], mem[i].base,
 				mem[i].limit);
 	}
 
 	sbuf_printf(sb, "\n");
 	lo = t4_read_reg(sc, A_CIM_SDRAM_BASE_ADDR);
 	hi = t4_read_reg(sc, A_CIM_SDRAM_ADDR_SIZE) + lo - 1;
 	mem_region_show(sb, "uP RAM:", lo, hi);
 
 	lo = t4_read_reg(sc, A_CIM_EXTMEM2_BASE_ADDR);
 	hi = t4_read_reg(sc, A_CIM_EXTMEM2_ADDR_SIZE) + lo - 1;
 	mem_region_show(sb, "uP Extmem2:", lo, hi);
 
 	lo = t4_read_reg(sc, A_TP_PMM_RX_MAX_PAGE);
 	sbuf_printf(sb, "\n%u Rx pages of size %uKiB for %u channels\n",
 		   G_PMRXMAXPAGE(lo),
 		   t4_read_reg(sc, A_TP_PMM_RX_PAGE_SIZE) >> 10,
 		   (lo & F_PMRXNUMCHN) ? 2 : 1);
 
 	lo = t4_read_reg(sc, A_TP_PMM_TX_MAX_PAGE);
 	hi = t4_read_reg(sc, A_TP_PMM_TX_PAGE_SIZE);
 	sbuf_printf(sb, "%u Tx pages of size %u%ciB for %u channels\n",
 		   G_PMTXMAXPAGE(lo),
 		   hi >= (1 << 20) ? (hi >> 20) : (hi >> 10),
 		   hi >= (1 << 20) ? 'M' : 'K', 1 << G_PMTXNUMCHN(lo));
 	sbuf_printf(sb, "%u p-structs\n",
 		   t4_read_reg(sc, A_TP_CMM_MM_MAX_PSTRUCT));
 
 	for (i = 0; i < 4; i++) {
 		if (chip_id(sc) > CHELSIO_T5)
 			lo = t4_read_reg(sc, A_MPS_RX_MAC_BG_PG_CNT0 + i * 4);
 		else
 			lo = t4_read_reg(sc, A_MPS_RX_PG_RSV0 + i * 4);
 		if (is_t5(sc)) {
 			used = G_T5_USED(lo);
 			alloc = G_T5_ALLOC(lo);
 		} else {
 			used = G_USED(lo);
 			alloc = G_ALLOC(lo);
 		}
 		/* For T6 these are MAC buffer groups */
 		sbuf_printf(sb, "\nPort %d using %u pages out of %u allocated",
 		    i, used, alloc);
 	}
 	for (i = 0; i < sc->chip_params->nchan; i++) {
 		if (chip_id(sc) > CHELSIO_T5)
 			lo = t4_read_reg(sc, A_MPS_RX_LPBK_BG_PG_CNT0 + i * 4);
 		else
 			lo = t4_read_reg(sc, A_MPS_RX_PG_RSV4 + i * 4);
 		if (is_t5(sc)) {
 			used = G_T5_USED(lo);
 			alloc = G_T5_ALLOC(lo);
 		} else {
 			used = G_USED(lo);
 			alloc = G_ALLOC(lo);
 		}
 		/* For T6 these are MAC buffer groups */
 		sbuf_printf(sb,
 		    "\nLoopback %d using %u pages out of %u allocated",
 		    i, used, alloc);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static inline void
 tcamxy2valmask(uint64_t x, uint64_t y, uint8_t *addr, uint64_t *mask)
 {
 	*mask = x | y;
 	y = htobe64(y);
 	memcpy(addr, (char *)&y + 2, ETHER_ADDR_LEN);
 }
 
 static int
 sysctl_mps_tcam(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i;
 
 	MPASS(chip_id(sc) <= CHELSIO_T5);
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	sbuf_printf(sb,
 	    "Idx  Ethernet address     Mask     Vld Ports PF"
 	    "  VF              Replication             P0 P1 P2 P3  ML");
 	for (i = 0; i < sc->chip_params->mps_tcam_size; i++) {
 		uint64_t tcamx, tcamy, mask;
 		uint32_t cls_lo, cls_hi;
 		uint8_t addr[ETHER_ADDR_LEN];
 
 		tcamy = t4_read_reg64(sc, MPS_CLS_TCAM_Y_L(i));
 		tcamx = t4_read_reg64(sc, MPS_CLS_TCAM_X_L(i));
 		if (tcamx & tcamy)
 			continue;
 		tcamxy2valmask(tcamx, tcamy, addr, &mask);
 		cls_lo = t4_read_reg(sc, MPS_CLS_SRAM_L(i));
 		cls_hi = t4_read_reg(sc, MPS_CLS_SRAM_H(i));
 		sbuf_printf(sb, "\n%3u %02x:%02x:%02x:%02x:%02x:%02x %012jx"
 			   "  %c   %#x%4u%4d", i, addr[0], addr[1], addr[2],
 			   addr[3], addr[4], addr[5], (uintmax_t)mask,
 			   (cls_lo & F_SRAM_VLD) ? 'Y' : 'N',
 			   G_PORTMAP(cls_hi), G_PF(cls_lo),
 			   (cls_lo & F_VF_VALID) ? G_VF(cls_lo) : -1);
 
 		if (cls_lo & F_REPLICATE) {
 			struct fw_ldst_cmd ldst_cmd;
 
 			memset(&ldst_cmd, 0, sizeof(ldst_cmd));
 			ldst_cmd.op_to_addrspace =
 			    htobe32(V_FW_CMD_OP(FW_LDST_CMD) |
 				F_FW_CMD_REQUEST | F_FW_CMD_READ |
 				V_FW_LDST_CMD_ADDRSPACE(FW_LDST_ADDRSPC_MPS));
 			ldst_cmd.cycles_to_len16 = htobe32(FW_LEN16(ldst_cmd));
 			ldst_cmd.u.mps.rplc.fid_idx =
 			    htobe16(V_FW_LDST_CMD_FID(FW_LDST_MPS_RPLC) |
 				V_FW_LDST_CMD_IDX(i));
 
 			rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK,
 			    "t4mps");
 			if (rc)
 				break;
 			rc = -t4_wr_mbox(sc, sc->mbox, &ldst_cmd,
 			    sizeof(ldst_cmd), &ldst_cmd);
 			end_synchronized_op(sc, 0);
 
 			if (rc != 0) {
 				sbuf_printf(sb, "%36d", rc);
 				rc = 0;
 			} else {
 				sbuf_printf(sb, " %08x %08x %08x %08x",
 				    be32toh(ldst_cmd.u.mps.rplc.rplc127_96),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc95_64),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc63_32),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc31_0));
 			}
 		} else
 			sbuf_printf(sb, "%36s", "");
 
 		sbuf_printf(sb, "%4u%3u%3u%3u %#3x", G_SRAM_PRIO0(cls_lo),
 		    G_SRAM_PRIO1(cls_lo), G_SRAM_PRIO2(cls_lo),
 		    G_SRAM_PRIO3(cls_lo), (cls_lo >> S_MULTILISTEN0) & 0xf);
 	}
 
 	if (rc)
 		(void) sbuf_finish(sb);
 	else
 		rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_mps_tcam_t6(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i;
 
 	MPASS(chip_id(sc) > CHELSIO_T5);
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	sbuf_printf(sb, "Idx  Ethernet address     Mask       VNI   Mask"
 	    "   IVLAN Vld DIP_Hit   Lookup  Port Vld Ports PF  VF"
 	    "                           Replication"
 	    "                                    P0 P1 P2 P3  ML\n");
 
 	for (i = 0; i < sc->chip_params->mps_tcam_size; i++) {
 		uint8_t dip_hit, vlan_vld, lookup_type, port_num;
 		uint16_t ivlan;
 		uint64_t tcamx, tcamy, val, mask;
 		uint32_t cls_lo, cls_hi, ctl, data2, vnix, vniy;
 		uint8_t addr[ETHER_ADDR_LEN];
 
 		ctl = V_CTLREQID(1) | V_CTLCMDTYPE(0) | V_CTLXYBITSEL(0);
 		if (i < 256)
 			ctl |= V_CTLTCAMINDEX(i) | V_CTLTCAMSEL(0);
 		else
 			ctl |= V_CTLTCAMINDEX(i - 256) | V_CTLTCAMSEL(1);
 		t4_write_reg(sc, A_MPS_CLS_TCAM_DATA2_CTL, ctl);
 		val = t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA1_REQ_ID1);
 		tcamy = G_DMACH(val) << 32;
 		tcamy |= t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA0_REQ_ID1);
 		data2 = t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA2_REQ_ID1);
 		lookup_type = G_DATALKPTYPE(data2);
 		port_num = G_DATAPORTNUM(data2);
 		if (lookup_type && lookup_type != M_DATALKPTYPE) {
 			/* Inner header VNI */
 			vniy = ((data2 & F_DATAVIDH2) << 23) |
 				       (G_DATAVIDH1(data2) << 16) | G_VIDL(val);
 			dip_hit = data2 & F_DATADIPHIT;
 			vlan_vld = 0;
 		} else {
 			vniy = 0;
 			dip_hit = 0;
 			vlan_vld = data2 & F_DATAVIDH2;
 			ivlan = G_VIDL(val);
 		}
 
 		ctl |= V_CTLXYBITSEL(1);
 		t4_write_reg(sc, A_MPS_CLS_TCAM_DATA2_CTL, ctl);
 		val = t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA1_REQ_ID1);
 		tcamx = G_DMACH(val) << 32;
 		tcamx |= t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA0_REQ_ID1);
 		data2 = t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA2_REQ_ID1);
 		if (lookup_type && lookup_type != M_DATALKPTYPE) {
 			/* Inner header VNI mask */
 			vnix = ((data2 & F_DATAVIDH2) << 23) |
 			       (G_DATAVIDH1(data2) << 16) | G_VIDL(val);
 		} else
 			vnix = 0;
 
 		if (tcamx & tcamy)
 			continue;
 		tcamxy2valmask(tcamx, tcamy, addr, &mask);
 
 		cls_lo = t4_read_reg(sc, MPS_CLS_SRAM_L(i));
 		cls_hi = t4_read_reg(sc, MPS_CLS_SRAM_H(i));
 
 		if (lookup_type && lookup_type != M_DATALKPTYPE) {
 			sbuf_printf(sb, "\n%3u %02x:%02x:%02x:%02x:%02x:%02x "
 			    "%012jx %06x %06x    -    -   %3c"
 			    "      'I'  %4x   %3c   %#x%4u%4d", i, addr[0],
 			    addr[1], addr[2], addr[3], addr[4], addr[5],
 			    (uintmax_t)mask, vniy, vnix, dip_hit ? 'Y' : 'N',
 			    port_num, cls_lo & F_T6_SRAM_VLD ? 'Y' : 'N',
 			    G_PORTMAP(cls_hi), G_T6_PF(cls_lo),
 			    cls_lo & F_T6_VF_VALID ? G_T6_VF(cls_lo) : -1);
 		} else {
 			sbuf_printf(sb, "\n%3u %02x:%02x:%02x:%02x:%02x:%02x "
 			    "%012jx    -       -   ", i, addr[0], addr[1],
 			    addr[2], addr[3], addr[4], addr[5],
 			    (uintmax_t)mask);
 
 			if (vlan_vld)
 				sbuf_printf(sb, "%4u   Y     ", ivlan);
 			else
 				sbuf_printf(sb, "  -    N     ");
 
 			sbuf_printf(sb, "-      %3c  %4x   %3c   %#x%4u%4d",
 			    lookup_type ? 'I' : 'O', port_num,
 			    cls_lo & F_T6_SRAM_VLD ? 'Y' : 'N',
 			    G_PORTMAP(cls_hi), G_T6_PF(cls_lo),
 			    cls_lo & F_T6_VF_VALID ? G_T6_VF(cls_lo) : -1);
 		}
 
 
 		if (cls_lo & F_T6_REPLICATE) {
 			struct fw_ldst_cmd ldst_cmd;
 
 			memset(&ldst_cmd, 0, sizeof(ldst_cmd));
 			ldst_cmd.op_to_addrspace =
 			    htobe32(V_FW_CMD_OP(FW_LDST_CMD) |
 				F_FW_CMD_REQUEST | F_FW_CMD_READ |
 				V_FW_LDST_CMD_ADDRSPACE(FW_LDST_ADDRSPC_MPS));
 			ldst_cmd.cycles_to_len16 = htobe32(FW_LEN16(ldst_cmd));
 			ldst_cmd.u.mps.rplc.fid_idx =
 			    htobe16(V_FW_LDST_CMD_FID(FW_LDST_MPS_RPLC) |
 				V_FW_LDST_CMD_IDX(i));
 
 			rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK,
 			    "t6mps");
 			if (rc)
 				break;
 			rc = -t4_wr_mbox(sc, sc->mbox, &ldst_cmd,
 			    sizeof(ldst_cmd), &ldst_cmd);
 			end_synchronized_op(sc, 0);
 
 			if (rc != 0) {
 				sbuf_printf(sb, "%72d", rc);
 				rc = 0;
 			} else {
 				sbuf_printf(sb, " %08x %08x %08x %08x"
 				    " %08x %08x %08x %08x",
 				    be32toh(ldst_cmd.u.mps.rplc.rplc255_224),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc223_192),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc191_160),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc159_128),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc127_96),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc95_64),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc63_32),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc31_0));
 			}
 		} else
 			sbuf_printf(sb, "%72s", "");
 
 		sbuf_printf(sb, "%4u%3u%3u%3u %#x",
 		    G_T6_SRAM_PRIO0(cls_lo), G_T6_SRAM_PRIO1(cls_lo),
 		    G_T6_SRAM_PRIO2(cls_lo), G_T6_SRAM_PRIO3(cls_lo),
 		    (cls_lo >> S_T6_MULTILISTEN0) & 0xf);
 	}
 
 	if (rc)
 		(void) sbuf_finish(sb);
 	else
 		rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_path_mtus(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	uint16_t mtus[NMTUS];
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	t4_read_mtu_tbl(sc, mtus, NULL);
 
 	sbuf_printf(sb, "%u %u %u %u %u %u %u %u %u %u %u %u %u %u %u %u",
 	    mtus[0], mtus[1], mtus[2], mtus[3], mtus[4], mtus[5], mtus[6],
 	    mtus[7], mtus[8], mtus[9], mtus[10], mtus[11], mtus[12], mtus[13],
 	    mtus[14], mtus[15]);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_pm_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i;
 	uint32_t tx_cnt[MAX_PM_NSTATS], rx_cnt[MAX_PM_NSTATS];
 	uint64_t tx_cyc[MAX_PM_NSTATS], rx_cyc[MAX_PM_NSTATS];
 	static const char *tx_stats[MAX_PM_NSTATS] = {
 		"Read:", "Write bypass:", "Write mem:", "Bypass + mem:",
 		"Tx FIFO wait", NULL, "Tx latency"
 	};
 	static const char *rx_stats[MAX_PM_NSTATS] = {
 		"Read:", "Write bypass:", "Write mem:", "Flush:",
 		" Rx FIFO wait", NULL, "Rx latency"
 	};
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	t4_pmtx_get_stats(sc, tx_cnt, tx_cyc);
 	t4_pmrx_get_stats(sc, rx_cnt, rx_cyc);
 
 	sbuf_printf(sb, "                Tx pcmds             Tx bytes");
 	for (i = 0; i < 4; i++) {
 		sbuf_printf(sb, "\n%-13s %10u %20ju", tx_stats[i], tx_cnt[i],
 		    tx_cyc[i]);
 	}
 
 	sbuf_printf(sb, "\n                Rx pcmds             Rx bytes");
 	for (i = 0; i < 4; i++) {
 		sbuf_printf(sb, "\n%-13s %10u %20ju", rx_stats[i], rx_cnt[i],
 		    rx_cyc[i]);
 	}
 
 	if (chip_id(sc) > CHELSIO_T5) {
 		sbuf_printf(sb,
 		    "\n              Total wait      Total occupancy");
 		sbuf_printf(sb, "\n%-13s %10u %20ju", tx_stats[i], tx_cnt[i],
 		    tx_cyc[i]);
 		sbuf_printf(sb, "\n%-13s %10u %20ju", rx_stats[i], rx_cnt[i],
 		    rx_cyc[i]);
 
 		i += 2;
 		MPASS(i < nitems(tx_stats));
 
 		sbuf_printf(sb,
 		    "\n                   Reads           Total wait");
 		sbuf_printf(sb, "\n%-13s %10u %20ju", tx_stats[i], tx_cnt[i],
 		    tx_cyc[i]);
 		sbuf_printf(sb, "\n%-13s %10u %20ju", rx_stats[i], rx_cnt[i],
 		    rx_cyc[i]);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_rdma_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	struct tp_rdma_stats stats;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	mtx_lock(&sc->reg_lock);
 	t4_tp_get_rdma_stats(sc, &stats);
 	mtx_unlock(&sc->reg_lock);
 
 	sbuf_printf(sb, "NoRQEModDeferrals: %u\n", stats.rqe_dfr_mod);
 	sbuf_printf(sb, "NoRQEPktDeferrals: %u", stats.rqe_dfr_pkt);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_tcp_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	struct tp_tcp_stats v4, v6;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	mtx_lock(&sc->reg_lock);
 	t4_tp_get_tcp_stats(sc, &v4, &v6);
 	mtx_unlock(&sc->reg_lock);
 
 	sbuf_printf(sb,
 	    "                                IP                 IPv6\n");
 	sbuf_printf(sb, "OutRsts:      %20u %20u\n",
 	    v4.tcp_out_rsts, v6.tcp_out_rsts);
 	sbuf_printf(sb, "InSegs:       %20ju %20ju\n",
 	    v4.tcp_in_segs, v6.tcp_in_segs);
 	sbuf_printf(sb, "OutSegs:      %20ju %20ju\n",
 	    v4.tcp_out_segs, v6.tcp_out_segs);
 	sbuf_printf(sb, "RetransSegs:  %20ju %20ju",
 	    v4.tcp_retrans_segs, v6.tcp_retrans_segs);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_tids(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	struct tid_info *t = &sc->tids;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	if (t->natids) {
 		sbuf_printf(sb, "ATID range: 0-%u, in use: %u\n", t->natids - 1,
 		    t->atids_in_use);
 	}
 
 	if (t->ntids) {
 		if (t4_read_reg(sc, A_LE_DB_CONFIG) & F_HASHEN) {
 			uint32_t b = t4_read_reg(sc, A_LE_DB_SERVER_INDEX) / 4;
 
 			if (b) {
 				sbuf_printf(sb, "TID range: 0-%u, %u-%u", b - 1,
 				    t4_read_reg(sc, A_LE_DB_TID_HASHBASE) / 4,
 				    t->ntids - 1);
 			} else {
 				sbuf_printf(sb, "TID range: %u-%u",
 				    t4_read_reg(sc, A_LE_DB_TID_HASHBASE) / 4,
 				    t->ntids - 1);
 			}
 		} else
 			sbuf_printf(sb, "TID range: 0-%u", t->ntids - 1);
 		sbuf_printf(sb, ", in use: %u\n",
 		    atomic_load_acq_int(&t->tids_in_use));
 	}
 
 	if (t->nstids) {
 		sbuf_printf(sb, "STID range: %u-%u, in use: %u\n", t->stid_base,
 		    t->stid_base + t->nstids - 1, t->stids_in_use);
 	}
 
 	if (t->nftids) {
 		sbuf_printf(sb, "FTID range: %u-%u\n", t->ftid_base,
 		    t->ftid_base + t->nftids - 1);
 	}
 
 	if (t->netids) {
 		sbuf_printf(sb, "ETID range: %u-%u\n", t->etid_base,
 		    t->etid_base + t->netids - 1);
 	}
 
 	sbuf_printf(sb, "HW TID usage: %u IP users, %u IPv6 users",
 	    t4_read_reg(sc, A_LE_DB_ACT_CNT_IPV4),
 	    t4_read_reg(sc, A_LE_DB_ACT_CNT_IPV6));
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_tp_err_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	struct tp_err_stats stats;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	mtx_lock(&sc->reg_lock);
 	t4_tp_get_err_stats(sc, &stats);
 	mtx_unlock(&sc->reg_lock);
 
 	if (sc->chip_params->nchan > 2) {
 		sbuf_printf(sb, "                 channel 0  channel 1"
 		    "  channel 2  channel 3\n");
 		sbuf_printf(sb, "macInErrs:      %10u %10u %10u %10u\n",
 		    stats.mac_in_errs[0], stats.mac_in_errs[1],
 		    stats.mac_in_errs[2], stats.mac_in_errs[3]);
 		sbuf_printf(sb, "hdrInErrs:      %10u %10u %10u %10u\n",
 		    stats.hdr_in_errs[0], stats.hdr_in_errs[1],
 		    stats.hdr_in_errs[2], stats.hdr_in_errs[3]);
 		sbuf_printf(sb, "tcpInErrs:      %10u %10u %10u %10u\n",
 		    stats.tcp_in_errs[0], stats.tcp_in_errs[1],
 		    stats.tcp_in_errs[2], stats.tcp_in_errs[3]);
 		sbuf_printf(sb, "tcp6InErrs:     %10u %10u %10u %10u\n",
 		    stats.tcp6_in_errs[0], stats.tcp6_in_errs[1],
 		    stats.tcp6_in_errs[2], stats.tcp6_in_errs[3]);
 		sbuf_printf(sb, "tnlCongDrops:   %10u %10u %10u %10u\n",
 		    stats.tnl_cong_drops[0], stats.tnl_cong_drops[1],
 		    stats.tnl_cong_drops[2], stats.tnl_cong_drops[3]);
 		sbuf_printf(sb, "tnlTxDrops:     %10u %10u %10u %10u\n",
 		    stats.tnl_tx_drops[0], stats.tnl_tx_drops[1],
 		    stats.tnl_tx_drops[2], stats.tnl_tx_drops[3]);
 		sbuf_printf(sb, "ofldVlanDrops:  %10u %10u %10u %10u\n",
 		    stats.ofld_vlan_drops[0], stats.ofld_vlan_drops[1],
 		    stats.ofld_vlan_drops[2], stats.ofld_vlan_drops[3]);
 		sbuf_printf(sb, "ofldChanDrops:  %10u %10u %10u %10u\n\n",
 		    stats.ofld_chan_drops[0], stats.ofld_chan_drops[1],
 		    stats.ofld_chan_drops[2], stats.ofld_chan_drops[3]);
 	} else {
 		sbuf_printf(sb, "                 channel 0  channel 1\n");
 		sbuf_printf(sb, "macInErrs:      %10u %10u\n",
 		    stats.mac_in_errs[0], stats.mac_in_errs[1]);
 		sbuf_printf(sb, "hdrInErrs:      %10u %10u\n",
 		    stats.hdr_in_errs[0], stats.hdr_in_errs[1]);
 		sbuf_printf(sb, "tcpInErrs:      %10u %10u\n",
 		    stats.tcp_in_errs[0], stats.tcp_in_errs[1]);
 		sbuf_printf(sb, "tcp6InErrs:     %10u %10u\n",
 		    stats.tcp6_in_errs[0], stats.tcp6_in_errs[1]);
 		sbuf_printf(sb, "tnlCongDrops:   %10u %10u\n",
 		    stats.tnl_cong_drops[0], stats.tnl_cong_drops[1]);
 		sbuf_printf(sb, "tnlTxDrops:     %10u %10u\n",
 		    stats.tnl_tx_drops[0], stats.tnl_tx_drops[1]);
 		sbuf_printf(sb, "ofldVlanDrops:  %10u %10u\n",
 		    stats.ofld_vlan_drops[0], stats.ofld_vlan_drops[1]);
 		sbuf_printf(sb, "ofldChanDrops:  %10u %10u\n\n",
 		    stats.ofld_chan_drops[0], stats.ofld_chan_drops[1]);
 	}
 
 	sbuf_printf(sb, "ofldNoNeigh:    %u\nofldCongDefer:  %u",
 	    stats.ofld_no_neigh, stats.ofld_cong_defer);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_tp_la_mask(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct tp_params *tpp = &sc->params.tp;
 	u_int mask;
 	int rc;
 
 	mask = tpp->la_mask >> 16;
 	rc = sysctl_handle_int(oidp, &mask, 0, req);
 	if (rc != 0 || req->newptr == NULL)
 		return (rc);
 	if (mask > 0xffff)
 		return (EINVAL);
 	tpp->la_mask = mask << 16;
 	t4_set_reg_field(sc, A_TP_DBG_LA_CONFIG, 0xffff0000U, tpp->la_mask);
 
 	return (0);
 }
 
 struct field_desc {
 	const char *name;
 	u_int start;
 	u_int width;
 };
 
 static void
 field_desc_show(struct sbuf *sb, uint64_t v, const struct field_desc *f)
 {
 	char buf[32];
 	int line_size = 0;
 
 	while (f->name) {
 		uint64_t mask = (1ULL << f->width) - 1;
 		int len = snprintf(buf, sizeof(buf), "%s: %ju", f->name,
 		    ((uintmax_t)v >> f->start) & mask);
 
 		if (line_size + len >= 79) {
 			line_size = 8;
 			sbuf_printf(sb, "\n        ");
 		}
 		sbuf_printf(sb, "%s ", buf);
 		line_size += len + 1;
 		f++;
 	}
 	sbuf_printf(sb, "\n");
 }
 
 static const struct field_desc tp_la0[] = {
 	{ "RcfOpCodeOut", 60, 4 },
 	{ "State", 56, 4 },
 	{ "WcfState", 52, 4 },
 	{ "RcfOpcSrcOut", 50, 2 },
 	{ "CRxError", 49, 1 },
 	{ "ERxError", 48, 1 },
 	{ "SanityFailed", 47, 1 },
 	{ "SpuriousMsg", 46, 1 },
 	{ "FlushInputMsg", 45, 1 },
 	{ "FlushInputCpl", 44, 1 },
 	{ "RssUpBit", 43, 1 },
 	{ "RssFilterHit", 42, 1 },
 	{ "Tid", 32, 10 },
 	{ "InitTcb", 31, 1 },
 	{ "LineNumber", 24, 7 },
 	{ "Emsg", 23, 1 },
 	{ "EdataOut", 22, 1 },
 	{ "Cmsg", 21, 1 },
 	{ "CdataOut", 20, 1 },
 	{ "EreadPdu", 19, 1 },
 	{ "CreadPdu", 18, 1 },
 	{ "TunnelPkt", 17, 1 },
 	{ "RcfPeerFin", 16, 1 },
 	{ "RcfReasonOut", 12, 4 },
 	{ "TxCchannel", 10, 2 },
 	{ "RcfTxChannel", 8, 2 },
 	{ "RxEchannel", 6, 2 },
 	{ "RcfRxChannel", 5, 1 },
 	{ "RcfDataOutSrdy", 4, 1 },
 	{ "RxDvld", 3, 1 },
 	{ "RxOoDvld", 2, 1 },
 	{ "RxCongestion", 1, 1 },
 	{ "TxCongestion", 0, 1 },
 	{ NULL }
 };
 
 static const struct field_desc tp_la1[] = {
 	{ "CplCmdIn", 56, 8 },
 	{ "CplCmdOut", 48, 8 },
 	{ "ESynOut", 47, 1 },
 	{ "EAckOut", 46, 1 },
 	{ "EFinOut", 45, 1 },
 	{ "ERstOut", 44, 1 },
 	{ "SynIn", 43, 1 },
 	{ "AckIn", 42, 1 },
 	{ "FinIn", 41, 1 },
 	{ "RstIn", 40, 1 },
 	{ "DataIn", 39, 1 },
 	{ "DataInVld", 38, 1 },
 	{ "PadIn", 37, 1 },
 	{ "RxBufEmpty", 36, 1 },
 	{ "RxDdp", 35, 1 },
 	{ "RxFbCongestion", 34, 1 },
 	{ "TxFbCongestion", 33, 1 },
 	{ "TxPktSumSrdy", 32, 1 },
 	{ "RcfUlpType", 28, 4 },
 	{ "Eread", 27, 1 },
 	{ "Ebypass", 26, 1 },
 	{ "Esave", 25, 1 },
 	{ "Static0", 24, 1 },
 	{ "Cread", 23, 1 },
 	{ "Cbypass", 22, 1 },
 	{ "Csave", 21, 1 },
 	{ "CPktOut", 20, 1 },
 	{ "RxPagePoolFull", 18, 2 },
 	{ "RxLpbkPkt", 17, 1 },
 	{ "TxLpbkPkt", 16, 1 },
 	{ "RxVfValid", 15, 1 },
 	{ "SynLearned", 14, 1 },
 	{ "SetDelEntry", 13, 1 },
 	{ "SetInvEntry", 12, 1 },
 	{ "CpcmdDvld", 11, 1 },
 	{ "CpcmdSave", 10, 1 },
 	{ "RxPstructsFull", 8, 2 },
 	{ "EpcmdDvld", 7, 1 },
 	{ "EpcmdFlush", 6, 1 },
 	{ "EpcmdTrimPrefix", 5, 1 },
 	{ "EpcmdTrimPostfix", 4, 1 },
 	{ "ERssIp4Pkt", 3, 1 },
 	{ "ERssIp6Pkt", 2, 1 },
 	{ "ERssTcpUdpPkt", 1, 1 },
 	{ "ERssFceFipPkt", 0, 1 },
 	{ NULL }
 };
 
 static const struct field_desc tp_la2[] = {
 	{ "CplCmdIn", 56, 8 },
 	{ "MpsVfVld", 55, 1 },
 	{ "MpsPf", 52, 3 },
 	{ "MpsVf", 44, 8 },
 	{ "SynIn", 43, 1 },
 	{ "AckIn", 42, 1 },
 	{ "FinIn", 41, 1 },
 	{ "RstIn", 40, 1 },
 	{ "DataIn", 39, 1 },
 	{ "DataInVld", 38, 1 },
 	{ "PadIn", 37, 1 },
 	{ "RxBufEmpty", 36, 1 },
 	{ "RxDdp", 35, 1 },
 	{ "RxFbCongestion", 34, 1 },
 	{ "TxFbCongestion", 33, 1 },
 	{ "TxPktSumSrdy", 32, 1 },
 	{ "RcfUlpType", 28, 4 },
 	{ "Eread", 27, 1 },
 	{ "Ebypass", 26, 1 },
 	{ "Esave", 25, 1 },
 	{ "Static0", 24, 1 },
 	{ "Cread", 23, 1 },
 	{ "Cbypass", 22, 1 },
 	{ "Csave", 21, 1 },
 	{ "CPktOut", 20, 1 },
 	{ "RxPagePoolFull", 18, 2 },
 	{ "RxLpbkPkt", 17, 1 },
 	{ "TxLpbkPkt", 16, 1 },
 	{ "RxVfValid", 15, 1 },
 	{ "SynLearned", 14, 1 },
 	{ "SetDelEntry", 13, 1 },
 	{ "SetInvEntry", 12, 1 },
 	{ "CpcmdDvld", 11, 1 },
 	{ "CpcmdSave", 10, 1 },
 	{ "RxPstructsFull", 8, 2 },
 	{ "EpcmdDvld", 7, 1 },
 	{ "EpcmdFlush", 6, 1 },
 	{ "EpcmdTrimPrefix", 5, 1 },
 	{ "EpcmdTrimPostfix", 4, 1 },
 	{ "ERssIp4Pkt", 3, 1 },
 	{ "ERssIp6Pkt", 2, 1 },
 	{ "ERssTcpUdpPkt", 1, 1 },
 	{ "ERssFceFipPkt", 0, 1 },
 	{ NULL }
 };
 
 static void
 tp_la_show(struct sbuf *sb, uint64_t *p, int idx)
 {
 
 	field_desc_show(sb, *p, tp_la0);
 }
 
 static void
 tp_la_show2(struct sbuf *sb, uint64_t *p, int idx)
 {
 
 	if (idx)
 		sbuf_printf(sb, "\n");
 	field_desc_show(sb, p[0], tp_la0);
 	if (idx < (TPLA_SIZE / 2 - 1) || p[1] != ~0ULL)
 		field_desc_show(sb, p[1], tp_la0);
 }
 
 static void
 tp_la_show3(struct sbuf *sb, uint64_t *p, int idx)
 {
 
 	if (idx)
 		sbuf_printf(sb, "\n");
 	field_desc_show(sb, p[0], tp_la0);
 	if (idx < (TPLA_SIZE / 2 - 1) || p[1] != ~0ULL)
 		field_desc_show(sb, p[1], (p[0] & (1 << 17)) ? tp_la2 : tp_la1);
 }
 
 static int
 sysctl_tp_la(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	uint64_t *buf, *p;
 	int rc;
 	u_int i, inc;
 	void (*show_func)(struct sbuf *, uint64_t *, int);
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	buf = malloc(TPLA_SIZE * sizeof(uint64_t), M_CXGBE, M_ZERO | M_WAITOK);
 
 	t4_tp_read_la(sc, buf, NULL);
 	p = buf;
 
 	switch (G_DBGLAMODE(t4_read_reg(sc, A_TP_DBG_LA_CONFIG))) {
 	case 2:
 		inc = 2;
 		show_func = tp_la_show2;
 		break;
 	case 3:
 		inc = 2;
 		show_func = tp_la_show3;
 		break;
 	default:
 		inc = 1;
 		show_func = tp_la_show;
 	}
 
 	for (i = 0; i < TPLA_SIZE / inc; i++, p += inc)
 		(*show_func)(sb, p, i);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 	free(buf, M_CXGBE);
 	return (rc);
 }
 
 static int
 sysctl_tx_rate(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	u64 nrate[MAX_NCHAN], orate[MAX_NCHAN];
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	t4_get_chan_txrate(sc, nrate, orate);
 
 	if (sc->chip_params->nchan > 2) {
 		sbuf_printf(sb, "              channel 0   channel 1"
 		    "   channel 2   channel 3\n");
 		sbuf_printf(sb, "NIC B/s:     %10ju  %10ju  %10ju  %10ju\n",
 		    nrate[0], nrate[1], nrate[2], nrate[3]);
 		sbuf_printf(sb, "Offload B/s: %10ju  %10ju  %10ju  %10ju",
 		    orate[0], orate[1], orate[2], orate[3]);
 	} else {
 		sbuf_printf(sb, "              channel 0   channel 1\n");
 		sbuf_printf(sb, "NIC B/s:     %10ju  %10ju\n",
 		    nrate[0], nrate[1]);
 		sbuf_printf(sb, "Offload B/s: %10ju  %10ju",
 		    orate[0], orate[1]);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_ulprx_la(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	uint32_t *buf, *p;
 	int rc, i;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	buf = malloc(ULPRX_LA_SIZE * 8 * sizeof(uint32_t), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 
 	t4_ulprx_read_la(sc, buf);
 	p = buf;
 
 	sbuf_printf(sb, "      Pcmd        Type   Message"
 	    "                Data");
 	for (i = 0; i < ULPRX_LA_SIZE; i++, p += 8) {
 		sbuf_printf(sb, "\n%08x%08x  %4x  %08x  %08x%08x%08x%08x",
 		    p[1], p[0], p[2], p[3], p[7], p[6], p[5], p[4]);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 	free(buf, M_CXGBE);
 	return (rc);
 }
 
 static int
 sysctl_wcwr_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, v;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	v = t4_read_reg(sc, A_SGE_STAT_CFG);
 	if (G_STATSOURCE_T5(v) == 7) {
 		if (G_STATMODE(v) == 0) {
 			sbuf_printf(sb, "total %d, incomplete %d",
 			    t4_read_reg(sc, A_SGE_STAT_TOTAL),
 			    t4_read_reg(sc, A_SGE_STAT_MATCH));
 		} else if (G_STATMODE(v) == 1) {
 			sbuf_printf(sb, "total %d, data overflow %d",
 			    t4_read_reg(sc, A_SGE_STAT_TOTAL),
 			    t4_read_reg(sc, A_SGE_STAT_MATCH));
 		}
 	}
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
+
+static int
+sysctl_tc_params(SYSCTL_HANDLER_ARGS)
+{
+	struct adapter *sc = arg1;
+	struct tx_sched_class *tc;
+	struct t4_sched_class_params p;
+	struct sbuf *sb;
+	int i, rc, port_id, flags, mbps, gbps;
+
+	rc = sysctl_wire_old_buffer(req, 0);
+	if (rc != 0)
+		return (rc);
+
+	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
+	if (sb == NULL)
+		return (ENOMEM);
+
+	port_id = arg2 >> 16;
+	MPASS(port_id < sc->params.nports);
+	MPASS(sc->port[port_id] != NULL);
+	i = arg2 & 0xffff;
+	MPASS(i < sc->chip_params->nsched_cls);
+	tc = &sc->port[port_id]->tc[i];
+
+	rc = begin_synchronized_op(sc, NULL, HOLD_LOCK | SLEEP_OK | INTR_OK,
+	    "t4tc_p");
+	if (rc)
+		goto done;
+	flags = tc->flags;
+	p = tc->params;
+	end_synchronized_op(sc, LOCK_HELD);
+
+	if ((flags & TX_SC_OK) == 0) {
+		sbuf_printf(sb, "none");
+		goto done;
+	}
+
+	if (p.level == SCHED_CLASS_LEVEL_CL_WRR) {
+		sbuf_printf(sb, "cl-wrr weight %u", p.weight);
+		goto done;
+	} else if (p.level == SCHED_CLASS_LEVEL_CL_RL)
+		sbuf_printf(sb, "cl-rl");
+	else if (p.level == SCHED_CLASS_LEVEL_CH_RL)
+		sbuf_printf(sb, "ch-rl");
+	else {
+		rc = ENXIO;
+		goto done;
+	}
+
+	if (p.ratemode == SCHED_CLASS_RATEMODE_REL) {
+		/* XXX: top speed or actual link speed? */
+		gbps = port_top_speed(sc->port[port_id]);
+		sbuf_printf(sb, " %u%% of %uGbps", p.maxrate, gbps);
+	}
+	else if (p.ratemode == SCHED_CLASS_RATEMODE_ABS) {
+		switch (p.rateunit) {
+		case SCHED_CLASS_RATEUNIT_BITS:
+			mbps = p.maxrate / 1000;
+			gbps = p.maxrate / 1000000;
+			if (p.maxrate == gbps * 1000000)
+				sbuf_printf(sb, " %uGbps", gbps);
+			else if (p.maxrate == mbps * 1000)
+				sbuf_printf(sb, " %uMbps", mbps);
+			else
+				sbuf_printf(sb, " %uKbps", p.maxrate);
+			break;
+		case SCHED_CLASS_RATEUNIT_PKTS:
+			sbuf_printf(sb, " %upps", p.maxrate);
+			break;
+		default:
+			rc = ENXIO;
+			goto done;
+		}
+	}
+
+	switch (p.mode) {
+	case SCHED_CLASS_MODE_CLASS:
+		sbuf_printf(sb, " aggregate");
+		break;
+	case SCHED_CLASS_MODE_FLOW:
+		sbuf_printf(sb, " per-flow");
+		break;
+	default:
+		rc = ENXIO;
+		goto done;
+	}
+
+done:
+	if (rc == 0)
+		rc = sbuf_finish(sb);
+	sbuf_delete(sb);
+
+	return (rc);
+}
 #endif
 
 #ifdef TCP_OFFLOAD
 static void
 unit_conv(char *buf, size_t len, u_int val, u_int factor)
 {
 	u_int rem = val % factor;
 
 	if (rem == 0)
 		snprintf(buf, len, "%u", val / factor);
 	else {
 		while (rem % 10 == 0)
 			rem /= 10;
 		snprintf(buf, len, "%u.%u", val / factor, rem);
 	}
 }
 
 static int
 sysctl_tp_tick(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	char buf[16];
 	u_int res, re;
 	u_int cclk_ps = 1000000000 / sc->params.vpd.cclk;
 
 	res = t4_read_reg(sc, A_TP_TIMER_RESOLUTION);
 	switch (arg2) {
 	case 0:
 		/* timer_tick */
 		re = G_TIMERRESOLUTION(res);
 		break;
 	case 1:
 		/* TCP timestamp tick */
 		re = G_TIMESTAMPRESOLUTION(res);
 		break;
 	case 2:
 		/* DACK tick */
 		re = G_DELAYEDACKRESOLUTION(res);
 		break;
 	default:
 		return (EDOOFUS);
 	}
 
 	unit_conv(buf, sizeof(buf), (cclk_ps << re), 1000000);
 
 	return (sysctl_handle_string(oidp, buf, sizeof(buf), req));
 }
 
 static int
 sysctl_tp_dack_timer(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	u_int res, dack_re, v;
 	u_int cclk_ps = 1000000000 / sc->params.vpd.cclk;
 
 	res = t4_read_reg(sc, A_TP_TIMER_RESOLUTION);
 	dack_re = G_DELAYEDACKRESOLUTION(res);
 	v = ((cclk_ps << dack_re) / 1000000) * t4_read_reg(sc, A_TP_DACK_TIMER);
 
 	return (sysctl_handle_int(oidp, &v, 0, req));
 }
 
 static int
 sysctl_tp_timer(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	int reg = arg2;
 	u_int tre;
 	u_long tp_tick_us, v;
 	u_int cclk_ps = 1000000000 / sc->params.vpd.cclk;
 
 	MPASS(reg == A_TP_RXT_MIN || reg == A_TP_RXT_MAX ||
 	    reg == A_TP_PERS_MIN || reg == A_TP_PERS_MAX ||
 	    reg == A_TP_KEEP_IDLE || reg == A_TP_KEEP_INTVL ||
 	    reg == A_TP_INIT_SRTT || reg == A_TP_FINWAIT2_TIMER);
 
 	tre = G_TIMERRESOLUTION(t4_read_reg(sc, A_TP_TIMER_RESOLUTION));
 	tp_tick_us = (cclk_ps << tre) / 1000000;
 
 	if (reg == A_TP_INIT_SRTT)
 		v = tp_tick_us * G_INITSRTT(t4_read_reg(sc, reg));
 	else
 		v = tp_tick_us * t4_read_reg(sc, reg);
 
 	return (sysctl_handle_long(oidp, &v, 0, req));
 }
 #endif
 
 static uint32_t
 fconf_iconf_to_mode(uint32_t fconf, uint32_t iconf)
 {
 	uint32_t mode;
 
 	mode = T4_FILTER_IPv4 | T4_FILTER_IPv6 | T4_FILTER_IP_SADDR |
 	    T4_FILTER_IP_DADDR | T4_FILTER_IP_SPORT | T4_FILTER_IP_DPORT;
 
 	if (fconf & F_FRAGMENTATION)
 		mode |= T4_FILTER_IP_FRAGMENT;
 
 	if (fconf & F_MPSHITTYPE)
 		mode |= T4_FILTER_MPS_HIT_TYPE;
 
 	if (fconf & F_MACMATCH)
 		mode |= T4_FILTER_MAC_IDX;
 
 	if (fconf & F_ETHERTYPE)
 		mode |= T4_FILTER_ETH_TYPE;
 
 	if (fconf & F_PROTOCOL)
 		mode |= T4_FILTER_IP_PROTO;
 
 	if (fconf & F_TOS)
 		mode |= T4_FILTER_IP_TOS;
 
 	if (fconf & F_VLAN)
 		mode |= T4_FILTER_VLAN;
 
 	if (fconf & F_VNIC_ID) {
 		mode |= T4_FILTER_VNIC;
 		if (iconf & F_VNIC)
 			mode |= T4_FILTER_IC_VNIC;
 	}
 
 	if (fconf & F_PORT)
 		mode |= T4_FILTER_PORT;
 
 	if (fconf & F_FCOE)
 		mode |= T4_FILTER_FCoE;
 
 	return (mode);
 }
 
 static uint32_t
 mode_to_fconf(uint32_t mode)
 {
 	uint32_t fconf = 0;
 
 	if (mode & T4_FILTER_IP_FRAGMENT)
 		fconf |= F_FRAGMENTATION;
 
 	if (mode & T4_FILTER_MPS_HIT_TYPE)
 		fconf |= F_MPSHITTYPE;
 
 	if (mode & T4_FILTER_MAC_IDX)
 		fconf |= F_MACMATCH;
 
 	if (mode & T4_FILTER_ETH_TYPE)
 		fconf |= F_ETHERTYPE;
 
 	if (mode & T4_FILTER_IP_PROTO)
 		fconf |= F_PROTOCOL;
 
 	if (mode & T4_FILTER_IP_TOS)
 		fconf |= F_TOS;
 
 	if (mode & T4_FILTER_VLAN)
 		fconf |= F_VLAN;
 
 	if (mode & T4_FILTER_VNIC)
 		fconf |= F_VNIC_ID;
 
 	if (mode & T4_FILTER_PORT)
 		fconf |= F_PORT;
 
 	if (mode & T4_FILTER_FCoE)
 		fconf |= F_FCOE;
 
 	return (fconf);
 }
 
 static uint32_t
 mode_to_iconf(uint32_t mode)
 {
 
 	if (mode & T4_FILTER_IC_VNIC)
 		return (F_VNIC);
 	return (0);
 }
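
The three helpers above translate between hardware configuration bits and the user-visible filter mode one flag at a time. A table-driven form of the same idea might look like the sketch below. All names and bit values here (`HW_*`, `SW_*`, `bitmap`, `hw_to_sw`, `sw_to_hw`) are hypothetical and chosen only for illustration; the real `F_*` and `T4_FILTER_*` constants differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values; the real F_* and T4_FILTER_* constants differ. */
#define HW_FRAG		(1u << 0)
#define HW_PROTO	(1u << 1)
#define HW_VLAN		(1u << 2)

#define SW_ALWAYS	(1u << 0)	/* stands in for the always-set IP bits */
#define SW_FRAG		(1u << 8)
#define SW_PROTO	(1u << 9)
#define SW_VLAN		(1u << 10)

/* One row per optional field: hardware bit and matching mode bit. */
static const struct {
	uint32_t hw;
	uint32_t sw;
} bitmap[] = {
	{ HW_FRAG, SW_FRAG },
	{ HW_PROTO, SW_PROTO },
	{ HW_VLAN, SW_VLAN },
};

#define NBITMAP	(sizeof(bitmap) / sizeof(bitmap[0]))

/* Hardware bits -> mode bits; the fixed bits are always reported. */
uint32_t
hw_to_sw(uint32_t hw)
{
	uint32_t sw = SW_ALWAYS;
	size_t i;

	for (i = 0; i < NBITMAP; i++)
		if (hw & bitmap[i].hw)
			sw |= bitmap[i].sw;
	return (sw);
}

/* Mode bits -> hardware bits; the fixed bits have no hardware flag. */
uint32_t
sw_to_hw(uint32_t sw)
{
	uint32_t hw = 0;
	size_t i;

	for (i = 0; i < NBITMAP; i++)
		if (sw & bitmap[i].sw)
			hw |= bitmap[i].hw;
	return (hw);
}
```

With a single table the two directions cannot drift apart, at the cost of a loop in place of straight-line tests. Nested cases such as the VNIC bit, which also depends on the ingress configuration, would still need separate handling.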
 
 static int
 check_fspec_against_fconf_iconf(struct adapter *sc,
     struct t4_filter_specification *fs)
 {
 	struct tp_params *tpp = &sc->params.tp;
 	uint32_t fconf = 0;
 
 	if (fs->val.frag || fs->mask.frag)
 		fconf |= F_FRAGMENTATION;
 
 	if (fs->val.matchtype || fs->mask.matchtype)
 		fconf |= F_MPSHITTYPE;
 
 	if (fs->val.macidx || fs->mask.macidx)
 		fconf |= F_MACMATCH;
 
 	if (fs->val.ethtype || fs->mask.ethtype)
 		fconf |= F_ETHERTYPE;
 
 	if (fs->val.proto || fs->mask.proto)
 		fconf |= F_PROTOCOL;
 
 	if (fs->val.tos || fs->mask.tos)
 		fconf |= F_TOS;
 
 	if (fs->val.vlan_vld || fs->mask.vlan_vld)
 		fconf |= F_VLAN;
 
 	if (fs->val.ovlan_vld || fs->mask.ovlan_vld) {
 		fconf |= F_VNIC_ID;
 		if (tpp->ingress_config & F_VNIC)
 			return (EINVAL);
 	}
 
 	if (fs->val.pfvf_vld || fs->mask.pfvf_vld) {
 		fconf |= F_VNIC_ID;
 		if ((tpp->ingress_config & F_VNIC) == 0)
 			return (EINVAL);
 	}
 
 	if (fs->val.iport || fs->mask.iport)
 		fconf |= F_PORT;
 
 	if (fs->val.fcoe || fs->mask.fcoe)
 		fconf |= F_FCOE;
 
 	if ((tpp->vlan_pri_map | fconf) != tpp->vlan_pri_map)
 		return (E2BIG);
 
 	return (0);
 }
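
The final test in `check_fspec_against_fconf_iconf` is a subset check: the filter may only use fields that the global filter mode already enables. A minimal sketch of that test, using a hypothetical helper name and plain bit masks in place of the driver's structures:

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 when every bit in 'requested' is already
 * set in 'enabled', 0 otherwise.  Equivalent to (requested & ~enabled) == 0.
 */
int
fields_enabled(uint32_t enabled, uint32_t requested)
{
	return ((enabled | requested) == enabled);
}
```

A request for no optional fields always passes, which matches the function above returning 0 when `fconf` stays 0.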
 
 static int
 get_filter_mode(struct adapter *sc, uint32_t *mode)
 {
 	struct tp_params *tpp = &sc->params.tp;
 
 	/*
 	 * We trust the cached values of the relevant TP registers.  This means
 	 * things work reliably only if writes to those registers are always via
 	 * t4_set_filter_mode.
 	 */
 	*mode = fconf_iconf_to_mode(tpp->vlan_pri_map, tpp->ingress_config);
 
 	return (0);
 }
 
 static int
 set_filter_mode(struct adapter *sc, uint32_t mode)
 {
 	struct tp_params *tpp = &sc->params.tp;
 	uint32_t fconf, iconf;
 	int rc;
 
 	iconf = mode_to_iconf(mode);
 	if ((iconf ^ tpp->ingress_config) & F_VNIC) {
 		/*
 		 * For now we just complain if A_TP_INGRESS_CONFIG is not
 		 * already set to the correct value for the requested filter
 		 * mode.  It's not clear if it's safe to write to this register
 		 * on the fly.  (And we trust the cached value of the register.)
 		 */
 		return (EBUSY);
 	}
 
 	fconf = mode_to_fconf(mode);
 
 	rc = begin_synchronized_op(sc, NULL, HOLD_LOCK | SLEEP_OK | INTR_OK,
 	    "t4setfm");
 	if (rc)
 		return (rc);
 
 	if (sc->tids.ftids_in_use > 0) {
 		rc = EBUSY;
 		goto done;
 	}
 
 #ifdef TCP_OFFLOAD
 	if (uld_active(sc, ULD_TOM)) {
 		rc = EBUSY;
 		goto done;
 	}
 #endif
 
 	rc = -t4_set_filter_mode(sc, fconf);
 done:
 	end_synchronized_op(sc, LOCK_HELD);
 	return (rc);
 }
 
 static inline uint64_t
 get_filter_hits(struct adapter *sc, uint32_t fid)
 {
 	uint32_t tcb_addr;
 
 	tcb_addr = t4_read_reg(sc, A_TP_CMM_TCB_BASE) +
 	    (fid + sc->tids.ftid_base) * TCB_SIZE;
 
 	if (is_t4(sc)) {
 		uint64_t hits;
 
 		read_via_memwin(sc, 0, tcb_addr + 16, (uint32_t *)&hits, 8);
 		return (be64toh(hits));
 	} else {
 		uint32_t hits;
 
 		read_via_memwin(sc, 0, tcb_addr + 24, &hits, 4);
 		return (be32toh(hits));
 	}
 }
 
 static int
 get_filter(struct adapter *sc, struct t4_filter *t)
 {
 	int i, rc, nfilters = sc->tids.nftids;
 	struct filter_entry *f;
 
 	rc = begin_synchronized_op(sc, NULL, HOLD_LOCK | SLEEP_OK | INTR_OK,
 	    "t4getf");
 	if (rc)
 		return (rc);
 
 	if (sc->tids.ftids_in_use == 0 || sc->tids.ftid_tab == NULL ||
 	    t->idx >= nfilters) {
 		t->idx = 0xffffffff;
 		goto done;
 	}
 
 	f = &sc->tids.ftid_tab[t->idx];
 	for (i = t->idx; i < nfilters; i++, f++) {
 		if (f->valid) {
 			t->idx = i;
 			t->l2tidx = f->l2t ? f->l2t->idx : 0;
 			t->smtidx = f->smtidx;
 			if (f->fs.hitcnts)
 				t->hits = get_filter_hits(sc, t->idx);
 			else
 				t->hits = UINT64_MAX;
 			t->fs = f->fs;
 
 			goto done;
 		}
 	}
 
 	t->idx = 0xffffffff;
 done:
 	end_synchronized_op(sc, LOCK_HELD);
 	return (0);
 }
 
 static int
 set_filter(struct adapter *sc, struct t4_filter *t)
 {
 	unsigned int nfilters, nports;
 	struct filter_entry *f;
 	int i, rc;
 
 	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4setf");
 	if (rc)
 		return (rc);
 
 	nfilters = sc->tids.nftids;
 	nports = sc->params.nports;
 
 	if (nfilters == 0) {
 		rc = ENOTSUP;
 		goto done;
 	}
 
 	if (!(sc->flags & FULL_INIT_DONE)) {
 		rc = EAGAIN;
 		goto done;
 	}
 
 	if (t->idx >= nfilters) {
 		rc = EINVAL;
 		goto done;
 	}
 
 	/* Validate against the global filter mode and ingress config */
 	rc = check_fspec_against_fconf_iconf(sc, &t->fs);
 	if (rc != 0)
 		goto done;
 
 	if (t->fs.action == FILTER_SWITCH && t->fs.eport >= nports) {
 		rc = EINVAL;
 		goto done;
 	}
 
 	if (t->fs.val.iport >= nports) {
 		rc = EINVAL;
 		goto done;
 	}
 
 	/* Can't specify an iq if not steering to it */
 	if (!t->fs.dirsteer && t->fs.iq) {
 		rc = EINVAL;
 		goto done;
 	}
 
 	/* IPv6 filter idx must be 4 aligned */
 	if (t->fs.type == 1 &&
 	    ((t->idx & 0x3) || t->idx + 4 >= nfilters)) {
 		rc = EINVAL;
 		goto done;
 	}
 
 	if (sc->tids.ftid_tab == NULL) {
 		KASSERT(sc->tids.ftids_in_use == 0,
 		    ("%s: no memory allocated but ftids_in_use > 0",
 		    __func__));
 
 		sc->tids.ftid_tab = malloc(sizeof (struct filter_entry) *
 		    nfilters, M_CXGBE, M_NOWAIT | M_ZERO);
 		if (sc->tids.ftid_tab == NULL) {
 			rc = ENOMEM;
 			goto done;
 		}
 		mtx_init(&sc->tids.ftid_lock, "T4 filters", 0, MTX_DEF);
 	}
 
 	for (i = 0; i < 4; i++) {
 		f = &sc->tids.ftid_tab[t->idx + i];
 
 		if (f->pending || f->valid) {
 			rc = EBUSY;
 			goto done;
 		}
 		if (f->locked) {
 			rc = EPERM;
 			goto done;
 		}
 
 		if (t->fs.type == 0)
 			break;
 	}
 
 	f = &sc->tids.ftid_tab[t->idx];
 	f->fs = t->fs;
 
 	rc = set_filter_wr(sc, t->idx);
 done:
 	end_synchronized_op(sc, 0);
 
 	if (rc == 0) {
 		mtx_lock(&sc->tids.ftid_lock);
 		for (;;) {
 			if (f->pending == 0) {
 				rc = f->valid ? 0 : EIO;
 				break;
 			}
 
 			if (mtx_sleep(&sc->tids.ftid_tab, &sc->tids.ftid_lock,
 			    PCATCH, "t4setfw", 0)) {
 				rc = EINPROGRESS;
 				break;
 			}
 		}
 		mtx_unlock(&sc->tids.ftid_lock);
 	}
 	return (rc);
 }
 
 static int
 del_filter(struct adapter *sc, struct t4_filter *t)
 {
 	unsigned int nfilters;
 	struct filter_entry *f;
 	int rc;
 
 	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4delf");
 	if (rc)
 		return (rc);
 
 	nfilters = sc->tids.nftids;
 
 	if (nfilters == 0) {
 		rc = ENOTSUP;
 		goto done;
 	}
 
 	if (sc->tids.ftid_tab == NULL || sc->tids.ftids_in_use == 0 ||
 	    t->idx >= nfilters) {
 		rc = EINVAL;
 		goto done;
 	}
 
 	if (!(sc->flags & FULL_INIT_DONE)) {
 		rc = EAGAIN;
 		goto done;
 	}
 
 	f = &sc->tids.ftid_tab[t->idx];
 
 	if (f->pending) {
 		rc = EBUSY;
 		goto done;
 	}
 	if (f->locked) {
 		rc = EPERM;
 		goto done;
 	}
 
 	if (f->valid) {
 		t->fs = f->fs;	/* extra info for the caller */
 		rc = del_filter_wr(sc, t->idx);
 	}
 
 done:
 	end_synchronized_op(sc, 0);
 
 	if (rc == 0) {
 		mtx_lock(&sc->tids.ftid_lock);
 		for (;;) {
 			if (f->pending == 0) {
 				rc = f->valid ? EIO : 0;
 				break;
 			}
 
 			if (mtx_sleep(&sc->tids.ftid_tab, &sc->tids.ftid_lock,
 			    PCATCH, "t4delfw", 0)) {
 				rc = EINPROGRESS;
 				break;
 			}
 		}
 		mtx_unlock(&sc->tids.ftid_lock);
 	}
 
 	return (rc);
 }
 
 static void
 clear_filter(struct filter_entry *f)
 {
 	if (f->l2t)
 		t4_l2t_release(f->l2t);
 
 	bzero(f, sizeof (*f));
 }
 
 static int
 set_filter_wr(struct adapter *sc, int fidx)
 {
 	struct filter_entry *f = &sc->tids.ftid_tab[fidx];
 	struct fw_filter_wr *fwr;
 	unsigned int ftid, vnic_vld, vnic_vld_mask;
 	struct wrq_cookie cookie;
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 
 	if (f->fs.newdmac || f->fs.newvlan) {
 		/* This filter needs an L2T entry; allocate one. */
 		f->l2t = t4_l2t_alloc_switching(sc->l2t);
 		if (f->l2t == NULL)
 			return (EAGAIN);
 		if (t4_l2t_set_switching(sc, f->l2t, f->fs.vlan, f->fs.eport,
 		    f->fs.dmac)) {
 			t4_l2t_release(f->l2t);
 			f->l2t = NULL;
 			return (ENOMEM);
 		}
 	}
 
 	/* Already validated against fconf, iconf */
 	MPASS((f->fs.val.pfvf_vld & f->fs.val.ovlan_vld) == 0);
 	MPASS((f->fs.mask.pfvf_vld & f->fs.mask.ovlan_vld) == 0);
 	if (f->fs.val.pfvf_vld || f->fs.val.ovlan_vld)
 		vnic_vld = 1;
 	else
 		vnic_vld = 0;
 	if (f->fs.mask.pfvf_vld || f->fs.mask.ovlan_vld)
 		vnic_vld_mask = 1;
 	else
 		vnic_vld_mask = 0;
 
 	ftid = sc->tids.ftid_base + fidx;
 
 	fwr = start_wrq_wr(&sc->sge.mgmtq, howmany(sizeof(*fwr), 16), &cookie);
 	if (fwr == NULL)
 		return (ENOMEM);
 	bzero(fwr, sizeof(*fwr));
 
 	fwr->op_pkd = htobe32(V_FW_WR_OP(FW_FILTER_WR));
 	fwr->len16_pkd = htobe32(FW_LEN16(*fwr));
 	fwr->tid_to_iq =
 	    htobe32(V_FW_FILTER_WR_TID(ftid) |
 		V_FW_FILTER_WR_RQTYPE(f->fs.type) |
 		V_FW_FILTER_WR_NOREPLY(0) |
 		V_FW_FILTER_WR_IQ(f->fs.iq));
 	fwr->del_filter_to_l2tix =
 	    htobe32(V_FW_FILTER_WR_RPTTID(f->fs.rpttid) |
 		V_FW_FILTER_WR_DROP(f->fs.action == FILTER_DROP) |
 		V_FW_FILTER_WR_DIRSTEER(f->fs.dirsteer) |
 		V_FW_FILTER_WR_MASKHASH(f->fs.maskhash) |
 		V_FW_FILTER_WR_DIRSTEERHASH(f->fs.dirsteerhash) |
 		V_FW_FILTER_WR_LPBK(f->fs.action == FILTER_SWITCH) |
 		V_FW_FILTER_WR_DMAC(f->fs.newdmac) |
 		V_FW_FILTER_WR_SMAC(f->fs.newsmac) |
 		V_FW_FILTER_WR_INSVLAN(f->fs.newvlan == VLAN_INSERT ||
 		    f->fs.newvlan == VLAN_REWRITE) |
 		V_FW_FILTER_WR_RMVLAN(f->fs.newvlan == VLAN_REMOVE ||
 		    f->fs.newvlan == VLAN_REWRITE) |
 		V_FW_FILTER_WR_HITCNTS(f->fs.hitcnts) |
 		V_FW_FILTER_WR_TXCHAN(f->fs.eport) |
 		V_FW_FILTER_WR_PRIO(f->fs.prio) |
 		V_FW_FILTER_WR_L2TIX(f->l2t ? f->l2t->idx : 0));
 	fwr->ethtype = htobe16(f->fs.val.ethtype);
 	fwr->ethtypem = htobe16(f->fs.mask.ethtype);
 	fwr->frag_to_ovlan_vldm =
 	    (V_FW_FILTER_WR_FRAG(f->fs.val.frag) |
 		V_FW_FILTER_WR_FRAGM(f->fs.mask.frag) |
 		V_FW_FILTER_WR_IVLAN_VLD(f->fs.val.vlan_vld) |
 		V_FW_FILTER_WR_OVLAN_VLD(vnic_vld) |
 		V_FW_FILTER_WR_IVLAN_VLDM(f->fs.mask.vlan_vld) |
 		V_FW_FILTER_WR_OVLAN_VLDM(vnic_vld_mask));
 	fwr->smac_sel = 0;
 	fwr->rx_chan_rx_rpl_iq = htobe16(V_FW_FILTER_WR_RX_CHAN(0) |
 	    V_FW_FILTER_WR_RX_RPL_IQ(sc->sge.fwq.abs_id));
 	fwr->maci_to_matchtypem =
 	    htobe32(V_FW_FILTER_WR_MACI(f->fs.val.macidx) |
 		V_FW_FILTER_WR_MACIM(f->fs.mask.macidx) |
 		V_FW_FILTER_WR_FCOE(f->fs.val.fcoe) |
 		V_FW_FILTER_WR_FCOEM(f->fs.mask.fcoe) |
 		V_FW_FILTER_WR_PORT(f->fs.val.iport) |
 		V_FW_FILTER_WR_PORTM(f->fs.mask.iport) |
 		V_FW_FILTER_WR_MATCHTYPE(f->fs.val.matchtype) |
 		V_FW_FILTER_WR_MATCHTYPEM(f->fs.mask.matchtype));
 	fwr->ptcl = f->fs.val.proto;
 	fwr->ptclm = f->fs.mask.proto;
 	fwr->ttyp = f->fs.val.tos;
 	fwr->ttypm = f->fs.mask.tos;
 	fwr->ivlan = htobe16(f->fs.val.vlan);
 	fwr->ivlanm = htobe16(f->fs.mask.vlan);
 	fwr->ovlan = htobe16(f->fs.val.vnic);
 	fwr->ovlanm = htobe16(f->fs.mask.vnic);
 	bcopy(f->fs.val.dip, fwr->lip, sizeof (fwr->lip));
 	bcopy(f->fs.mask.dip, fwr->lipm, sizeof (fwr->lipm));
 	bcopy(f->fs.val.sip, fwr->fip, sizeof (fwr->fip));
 	bcopy(f->fs.mask.sip, fwr->fipm, sizeof (fwr->fipm));
 	fwr->lp = htobe16(f->fs.val.dport);
 	fwr->lpm = htobe16(f->fs.mask.dport);
 	fwr->fp = htobe16(f->fs.val.sport);
 	fwr->fpm = htobe16(f->fs.mask.sport);
 	if (f->fs.newsmac)
 		bcopy(f->fs.smac, fwr->sma, sizeof (fwr->sma));
 
 	f->pending = 1;
 	sc->tids.ftids_in_use++;
 
 	commit_wrq_wr(&sc->sge.mgmtq, fwr, &cookie);
 	return (0);
 }
 
 static int
 del_filter_wr(struct adapter *sc, int fidx)
 {
 	struct filter_entry *f = &sc->tids.ftid_tab[fidx];
 	struct fw_filter_wr *fwr;
 	unsigned int ftid;
 	struct wrq_cookie cookie;
 
 	ftid = sc->tids.ftid_base + fidx;
 
 	fwr = start_wrq_wr(&sc->sge.mgmtq, howmany(sizeof(*fwr), 16), &cookie);
 	if (fwr == NULL)
 		return (ENOMEM);
 	bzero(fwr, sizeof (*fwr));
 
 	t4_mk_filtdelwr(ftid, fwr, sc->sge.fwq.abs_id);
 
 	f->pending = 1;
 	commit_wrq_wr(&sc->sge.mgmtq, fwr, &cookie);
 	return (0);
 }
 
 int
 t4_filter_rpl(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m)
 {
 	struct adapter *sc = iq->adapter;
 	const struct cpl_set_tcb_rpl *rpl = (const void *)(rss + 1);
 	unsigned int idx = GET_TID(rpl);
 	unsigned int rc;
 	struct filter_entry *f;
 
 	KASSERT(m == NULL, ("%s: payload with opcode %02x", __func__,
 	    rss->opcode));
 
 	if (is_ftid(sc, idx)) {
 
 		idx -= sc->tids.ftid_base;
 		f = &sc->tids.ftid_tab[idx];
 		rc = G_COOKIE(rpl->cookie);
 
 		mtx_lock(&sc->tids.ftid_lock);
 		if (rc == FW_FILTER_WR_FLT_ADDED) {
 			KASSERT(f->pending, ("%s: filter[%u] isn't pending.",
 			    __func__, idx));
 			f->smtidx = (be64toh(rpl->oldval) >> 24) & 0xff;
 			f->pending = 0;  /* asynchronous setup completed */
 			f->valid = 1;
 		} else {
 			if (rc != FW_FILTER_WR_FLT_DELETED) {
 				/* Add or delete failed, display an error */
 				log(LOG_ERR,
 				    "filter %u setup failed with error %u\n",
 				    idx, rc);
 			}
 
 			clear_filter(f);
 			sc->tids.ftids_in_use--;
 		}
 		wakeup(&sc->tids.ftid_tab);
 		mtx_unlock(&sc->tids.ftid_lock);
 	}
 
 	return (0);
 }
 
 static int
 get_sge_context(struct adapter *sc, struct t4_sge_context *cntxt)
 {
 	int rc;
 
 	if (cntxt->cid > M_CTXTQID)
 		return (EINVAL);
 
 	if (cntxt->mem_id != CTXT_EGRESS && cntxt->mem_id != CTXT_INGRESS &&
 	    cntxt->mem_id != CTXT_FLM && cntxt->mem_id != CTXT_CNM)
 		return (EINVAL);
 
 	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4ctxt");
 	if (rc)
 		return (rc);
 
 	if (sc->flags & FW_OK) {
 		rc = -t4_sge_ctxt_rd(sc, sc->mbox, cntxt->cid, cntxt->mem_id,
 		    &cntxt->data[0]);
 		if (rc == 0)
 			goto done;
 	}
 
 	/*
 	 * Read via firmware failed or wasn't even attempted.  Read directly via
 	 * the backdoor.
 	 */
 	rc = -t4_sge_ctxt_rd_bd(sc, cntxt->cid, cntxt->mem_id, &cntxt->data[0]);
 done:
 	end_synchronized_op(sc, 0);
 	return (rc);
 }
 
 static int
 load_fw(struct adapter *sc, struct t4_data *fw)
 {
 	int rc;
 	uint8_t *fw_data;
 
 	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4ldfw");
 	if (rc)
 		return (rc);
 
 	if (sc->flags & FULL_INIT_DONE) {
 		rc = EBUSY;
 		goto done;
 	}
 
 	fw_data = malloc(fw->len, M_CXGBE, M_WAITOK);
 	if (fw_data == NULL) {
 		rc = ENOMEM;
 		goto done;
 	}
 
 	rc = copyin(fw->data, fw_data, fw->len);
 	if (rc == 0)
 		rc = -t4_load_fw(sc, fw_data, fw->len);
 
 	free(fw_data, M_CXGBE);
 done:
 	end_synchronized_op(sc, 0);
 	return (rc);
 }
 
 #define MAX_READ_BUF_SIZE (128 * 1024)
 static int
 read_card_mem(struct adapter *sc, int win, struct t4_mem_range *mr)
 {
 	uint32_t addr, remaining, n;
 	uint32_t *buf;
 	int rc;
 	uint8_t *dst;
 
 	rc = validate_mem_range(sc, mr->addr, mr->len);
 	if (rc != 0)
 		return (rc);
 
 	buf = malloc(min(mr->len, MAX_READ_BUF_SIZE), M_CXGBE, M_WAITOK);
 	addr = mr->addr;
 	remaining = mr->len;
 	dst = (void *)mr->data;
 
 	while (remaining) {
 		n = min(remaining, MAX_READ_BUF_SIZE);
 		read_via_memwin(sc, 2, addr, buf, n);
 
 		rc = copyout(buf, dst, n);
 		if (rc != 0)
 			break;
 
 		dst += n;
 		remaining -= n;
 		addr += n;
 	}
 
 	free(buf, M_CXGBE);
 	return (rc);
 }
 #undef MAX_READ_BUF_SIZE
 
 static int
 read_i2c(struct adapter *sc, struct t4_i2c_data *i2cd)
 {
 	int rc;
 
 	if (i2cd->len == 0 || i2cd->port_id >= sc->params.nports)
 		return (EINVAL);
 
 	if (i2cd->len > sizeof(i2cd->data))
 		return (EFBIG);
 
 	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4i2crd");
 	if (rc)
 		return (rc);
 	rc = -t4_i2c_rd(sc, sc->mbox, i2cd->port_id, i2cd->dev_addr,
 	    i2cd->offset, i2cd->len, &i2cd->data[0]);
 	end_synchronized_op(sc, 0);
 
 	return (rc);
 }
 
 static int
 in_range(int val, int lo, int hi)
 {
 
 	return (val < 0 || (val <= hi && val >= lo));
 }
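
Note that `in_range` accepts any negative value, which lets callers pass a negative number to mean "unset". The sketch below repeats the same expression under a hypothetical name so that the behavior can be exercised outside the kernel:

```c
/*
 * Same expression as in_range() above, renamed to mark it as a sketch.
 * Negative values pass unconditionally; non-negative values must lie
 * within [lo, hi].
 */
int
in_range_sketch(int val, int lo, int hi)
{
	return (val < 0 || (val <= hi && val >= lo));
}
```

Because negative values pass, a caller that goes on to use the value as an array index or as a required parameter has to reject negatives separately.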
 
 static int
-set_sched_class(struct adapter *sc, struct t4_sched_params *p)
+set_sched_class_config(struct adapter *sc, int minmax)
 {
-	int fw_subcmd, fw_type, rc;
+	int rc;
 
-	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4setsc");
+	if (minmax < 0)
+		return (EINVAL);
+
+	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4sscc");
 	if (rc)
 		return (rc);
+	rc = -t4_sched_config(sc, FW_SCHED_TYPE_PKTSCHED, minmax, 1);
+	end_synchronized_op(sc, 0);
 
-	if (!(sc->flags & FULL_INIT_DONE)) {
-		rc = EAGAIN;
-		goto done;
-	}
+	return (rc);
+}
 
-	/*
-	 * Translate the cxgbetool parameters into T4 firmware parameters.  (The
-	 * sub-command and type are in common locations.)
-	 */
-	if (p->subcmd == SCHED_CLASS_SUBCMD_CONFIG)
-		fw_subcmd = FW_SCHED_SC_CONFIG;
-	else if (p->subcmd == SCHED_CLASS_SUBCMD_PARAMS)
-		fw_subcmd = FW_SCHED_SC_PARAMS;
-	else {
-		rc = EINVAL;
-		goto done;
-	}
-	if (p->type == SCHED_CLASS_TYPE_PACKET)
-		fw_type = FW_SCHED_TYPE_PKTSCHED;
-	else {
-		rc = EINVAL;
-		goto done;
-	}
+static int
+set_sched_class_params(struct adapter *sc, struct t4_sched_class_params *p,
+    int sleep_ok)
+{
+	int rc, top_speed, fw_level, fw_mode, fw_rateunit, fw_ratemode;
+	struct port_info *pi;
+	struct tx_sched_class *tc;
 
-	if (fw_subcmd == FW_SCHED_SC_CONFIG) {
-		/* Vet our parameters ..*/
-		if (p->u.config.minmax < 0) {
-			rc = EINVAL;
-			goto done;
-		}
+	if (p->level == SCHED_CLASS_LEVEL_CL_RL)
+		fw_level = FW_SCHED_PARAMS_LEVEL_CL_RL;
+	else if (p->level == SCHED_CLASS_LEVEL_CL_WRR)
+		fw_level = FW_SCHED_PARAMS_LEVEL_CL_WRR;
+	else if (p->level == SCHED_CLASS_LEVEL_CH_RL)
+		fw_level = FW_SCHED_PARAMS_LEVEL_CH_RL;
+	else
+		return (EINVAL);
 
-		/* And pass the request to the firmware ...*/
-		rc = -t4_sched_config(sc, fw_type, p->u.config.minmax, 1);
-		goto done;
-	}
+	if (p->mode == SCHED_CLASS_MODE_CLASS)
+		fw_mode = FW_SCHED_PARAMS_MODE_CLASS;
+	else if (p->mode == SCHED_CLASS_MODE_FLOW)
+		fw_mode = FW_SCHED_PARAMS_MODE_FLOW;
+	else
+		return (EINVAL);
 
-	if (fw_subcmd == FW_SCHED_SC_PARAMS) {
-		int fw_level;
-		int fw_mode;
-		int fw_rateunit;
-		int fw_ratemode;
+	if (p->rateunit == SCHED_CLASS_RATEUNIT_BITS)
+		fw_rateunit = FW_SCHED_PARAMS_UNIT_BITRATE;
+	else if (p->rateunit == SCHED_CLASS_RATEUNIT_PKTS)
+		fw_rateunit = FW_SCHED_PARAMS_UNIT_PKTRATE;
+	else
+		return (EINVAL);
 
-		if (p->u.params.level == SCHED_CLASS_LEVEL_CL_RL)
-			fw_level = FW_SCHED_PARAMS_LEVEL_CL_RL;
-		else if (p->u.params.level == SCHED_CLASS_LEVEL_CL_WRR)
-			fw_level = FW_SCHED_PARAMS_LEVEL_CL_WRR;
-		else if (p->u.params.level == SCHED_CLASS_LEVEL_CH_RL)
-			fw_level = FW_SCHED_PARAMS_LEVEL_CH_RL;
-		else {
-			rc = EINVAL;
-			goto done;
-		}
+	if (p->ratemode == SCHED_CLASS_RATEMODE_REL)
+		fw_ratemode = FW_SCHED_PARAMS_RATE_REL;
+	else if (p->ratemode == SCHED_CLASS_RATEMODE_ABS)
+		fw_ratemode = FW_SCHED_PARAMS_RATE_ABS;
+	else
+		return (EINVAL);
 
-		if (p->u.params.mode == SCHED_CLASS_MODE_CLASS)
-			fw_mode = FW_SCHED_PARAMS_MODE_CLASS;
-		else if (p->u.params.mode == SCHED_CLASS_MODE_FLOW)
-			fw_mode = FW_SCHED_PARAMS_MODE_FLOW;
-		else {
-			rc = EINVAL;
-			goto done;
-		}
+	/* Vet our parameters ... */
+	if (p->channel < 0)
+		return (EINVAL);	/* required; used as an index below */
+	if (!in_range(p->channel, 0, sc->chip_params->nchan - 1))
+		return (ERANGE);
 
-		if (p->u.params.rateunit == SCHED_CLASS_RATEUNIT_BITS)
-			fw_rateunit = FW_SCHED_PARAMS_UNIT_BITRATE;
-		else if (p->u.params.rateunit == SCHED_CLASS_RATEUNIT_PKTS)
-			fw_rateunit = FW_SCHED_PARAMS_UNIT_PKTRATE;
-		else {
-			rc = EINVAL;
-			goto done;
-		}
+	pi = sc->port[sc->chan_map[p->channel]];
+	if (pi == NULL)
+		return (ENXIO);
+	MPASS(pi->tx_chan == p->channel);
+	top_speed = port_top_speed(pi) * 1000000; /* Gbps -> Kbps */
 
-		if (p->u.params.ratemode == SCHED_CLASS_RATEMODE_REL)
-			fw_ratemode = FW_SCHED_PARAMS_RATE_REL;
-		else if (p->u.params.ratemode == SCHED_CLASS_RATEMODE_ABS)
-			fw_ratemode = FW_SCHED_PARAMS_RATE_ABS;
-		else {
-			rc = EINVAL;
-			goto done;
-		}
+	if (!in_range(p->cl, 0, sc->chip_params->nsched_cls - 1) ||
+	    !in_range(p->minrate, 0, top_speed) ||
+	    !in_range(p->maxrate, 0, top_speed) ||
+	    !in_range(p->weight, 0, 100))
+		return (ERANGE);
 
-		/* Vet our parameters ... */
-		if (!in_range(p->u.params.channel, 0, 3) ||
-		    !in_range(p->u.params.cl, 0, sc->chip_params->nsched_cls) ||
-		    !in_range(p->u.params.minrate, 0, 10000000) ||
-		    !in_range(p->u.params.maxrate, 0, 10000000) ||
-		    !in_range(p->u.params.weight, 0, 100)) {
-			rc = ERANGE;
-			goto done;
-		}
+	/*
+	 * Translate any unset parameters into the firmware's
+	 * nomenclature and/or fail the call if the parameters
+	 * are required ...
+	 */
+	if (p->rateunit < 0 || p->ratemode < 0 || p->channel < 0 || p->cl < 0)
+		return (EINVAL);
 
+	if (p->minrate < 0)
+		p->minrate = 0;
+	if (p->maxrate < 0) {
+		if (p->level == SCHED_CLASS_LEVEL_CL_RL ||
+		    p->level == SCHED_CLASS_LEVEL_CH_RL)
+			return (EINVAL);
+		else
+			p->maxrate = 0;
+	}
+	if (p->weight < 0) {
+		if (p->level == SCHED_CLASS_LEVEL_CL_WRR)
+			return (EINVAL);
+		else
+			p->weight = 0;
+	}
+	if (p->pktsize < 0) {
+		if (p->level == SCHED_CLASS_LEVEL_CL_RL ||
+		    p->level == SCHED_CLASS_LEVEL_CH_RL)
+			return (EINVAL);
+		else
+			p->pktsize = 0;
+	}
+
+	rc = begin_synchronized_op(sc, NULL,
+	    sleep_ok ? (SLEEP_OK | INTR_OK) : HOLD_LOCK, "t4sscp");
+	if (rc)
+		return (rc);
+	tc = &pi->tc[p->cl];
+	tc->params = *p;
+	rc = -t4_sched_params(sc, FW_SCHED_TYPE_PKTSCHED, fw_level, fw_mode,
+	    fw_rateunit, fw_ratemode, p->channel, p->cl, p->minrate, p->maxrate,
+	    p->weight, p->pktsize, sleep_ok);
+	if (rc == 0)
+		tc->flags |= TX_SC_OK;
+	else {
 		/*
-		 * Translate any unset parameters into the firmware's
-		 * nomenclature and/or fail the call if the parameters
-		 * are required ...
+		 * Unknown state at this point, see tc->params for what was
+		 * attempted.
 		 */
-		if (p->u.params.rateunit < 0 || p->u.params.ratemode < 0 ||
-		    p->u.params.channel < 0 || p->u.params.cl < 0) {
-			rc = EINVAL;
-			goto done;
-		}
-		if (p->u.params.minrate < 0)
-			p->u.params.minrate = 0;
-		if (p->u.params.maxrate < 0) {
-			if (p->u.params.level == SCHED_CLASS_LEVEL_CL_RL ||
-			    p->u.params.level == SCHED_CLASS_LEVEL_CH_RL) {
-				rc = EINVAL;
-				goto done;
-			} else
-				p->u.params.maxrate = 0;
-		}
-		if (p->u.params.weight < 0) {
-			if (p->u.params.level == SCHED_CLASS_LEVEL_CL_WRR) {
-				rc = EINVAL;
-				goto done;
-			} else
-				p->u.params.weight = 0;
-		}
-		if (p->u.params.pktsize < 0) {
-			if (p->u.params.level == SCHED_CLASS_LEVEL_CL_RL ||
-			    p->u.params.level == SCHED_CLASS_LEVEL_CH_RL) {
-				rc = EINVAL;
-				goto done;
-			} else
-				p->u.params.pktsize = 0;
-		}
-
-		/* See what the firmware thinks of the request ... */
-		rc = -t4_sched_params(sc, fw_type, fw_level, fw_mode,
-		    fw_rateunit, fw_ratemode, p->u.params.channel,
-		    p->u.params.cl, p->u.params.minrate, p->u.params.maxrate,
-		    p->u.params.weight, p->u.params.pktsize, 1);
-		goto done;
+		tc->flags &= ~TX_SC_OK;
 	}
+	end_synchronized_op(sc, sleep_ok ? 0 : LOCK_HELD);
 
-	rc = EINVAL;
-done:
-	end_synchronized_op(sc, 0);
 	return (rc);
 }
 
 static int
+set_sched_class(struct adapter *sc, struct t4_sched_params *p)
+{
+
+	if (p->type != SCHED_CLASS_TYPE_PACKET)
+		return (EINVAL);
+
+	if (p->subcmd == SCHED_CLASS_SUBCMD_CONFIG)
+		return (set_sched_class_config(sc, p->u.config.minmax));
+
+	if (p->subcmd == SCHED_CLASS_SUBCMD_PARAMS)
+		return (set_sched_class_params(sc, &p->u.params, 1));
+
+	return (EINVAL);
+}
+
+static int
 set_sched_queue(struct adapter *sc, struct t4_sched_queue *p)
 {
 	struct port_info *pi = NULL;
 	struct vi_info *vi;
 	struct sge_txq *txq;
 	uint32_t fw_mnem, fw_queue, fw_class;
 	int i, rc;
 
 	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4setsq");
 	if (rc)
 		return (rc);
 
-	if (!(sc->flags & FULL_INIT_DONE)) {
-		rc = EAGAIN;
-		goto done;
-	}
-
 	if (p->port >= sc->params.nports) {
 		rc = EINVAL;
 		goto done;
 	}
 
 	/* XXX: Only supported for the main VI. */
 	pi = sc->port[p->port];
 	vi = &pi->vi[0];
-	if (!in_range(p->queue, 0, vi->ntxq - 1) || !in_range(p->cl, 0, 7)) {
+	if (!(vi->flags & VI_INIT_DONE)) {
+		/* tx queues not set up yet */
+		rc = EAGAIN;
+		goto done;
+	}
+
+	if (!in_range(p->queue, 0, vi->ntxq - 1) ||
+	    !in_range(p->cl, 0, sc->chip_params->nsched_cls - 1)) {
 		rc = EINVAL;
 		goto done;
 	}
 
 	/*
 	 * Create a template for the FW_PARAMS_CMD mnemonic and value (TX
 	 * Scheduling Class in this case).
 	 */
 	fw_mnem = (V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DMAQ) |
 	    V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DMAQ_EQ_SCHEDCLASS_ETH));
 	fw_class = p->cl < 0 ? 0xffffffff : p->cl;
 
 	/*
 	 * If p->queue is non-negative, then we're only changing the scheduling
 	 * on a single specified TX queue.
 	 */
 	if (p->queue >= 0) {
 		txq = &sc->sge.txq[vi->first_txq + p->queue];
 		fw_queue = (fw_mnem | V_FW_PARAMS_PARAM_YZ(txq->eq.cntxt_id));
 		rc = -t4_set_params(sc, sc->mbox, sc->pf, 0, 1, &fw_queue,
 		    &fw_class);
 		goto done;
 	}
 
 	/*
 	 * Change the scheduling on all the TX queues for the
 	 * interface.
 	 */
 	for_each_txq(vi, i, txq) {
 		fw_queue = (fw_mnem | V_FW_PARAMS_PARAM_YZ(txq->eq.cntxt_id));
 		rc = -t4_set_params(sc, sc->mbox, sc->pf, 0, 1, &fw_queue,
 		    &fw_class);
 		if (rc)
 			goto done;
 	}
 
 	rc = 0;
 done:
 	end_synchronized_op(sc, 0);
 	return (rc);
 }
 
 int
 t4_os_find_pci_capability(struct adapter *sc, int cap)
 {
 	int i;
 
 	return (pci_find_cap(sc->dev, cap, &i) == 0 ? i : 0);
 }
 
 int
 t4_os_pci_save_state(struct adapter *sc)
 {
 	device_t dev;
 	struct pci_devinfo *dinfo;
 
 	dev = sc->dev;
 	dinfo = device_get_ivars(dev);
 
 	pci_cfg_save(dev, dinfo, 0);
 	return (0);
 }
 
 int
 t4_os_pci_restore_state(struct adapter *sc)
 {
 	device_t dev;
 	struct pci_devinfo *dinfo;
 
 	dev = sc->dev;
 	dinfo = device_get_ivars(dev);
 
 	pci_cfg_restore(dev, dinfo);
 	return (0);
 }
 
 void
 t4_os_portmod_changed(const struct adapter *sc, int idx)
 {
 	struct port_info *pi = sc->port[idx];
 	struct vi_info *vi;
 	struct ifnet *ifp;
 	int v;
 	static const char *mod_str[] = {
 		NULL, "LR", "SR", "ER", "TWINAX", "active TWINAX", "LRM"
 	};
 
 	for_each_vi(pi, v, vi) {
 		build_medialist(pi, &vi->media);
 	}
 
 	ifp = pi->vi[0].ifp;
 	if (pi->mod_type == FW_PORT_MOD_TYPE_NONE)
 		if_printf(ifp, "transceiver unplugged.\n");
 	else if (pi->mod_type == FW_PORT_MOD_TYPE_UNKNOWN)
 		if_printf(ifp, "unknown transceiver inserted.\n");
 	else if (pi->mod_type == FW_PORT_MOD_TYPE_NOTSUPPORTED)
 		if_printf(ifp, "unsupported transceiver inserted.\n");
 	else if (pi->mod_type > 0 && pi->mod_type < nitems(mod_str)) {
 		if_printf(ifp, "%s transceiver inserted.\n",
 		    mod_str[pi->mod_type]);
 	} else {
 		if_printf(ifp, "transceiver (type %d) inserted.\n",
 		    pi->mod_type);
 	}
 }
 
 void
 t4_os_link_changed(struct adapter *sc, int idx, int link_stat, int reason)
 {
 	struct port_info *pi = sc->port[idx];
 	struct vi_info *vi;
 	struct ifnet *ifp;
 	int v;
 
 	if (link_stat)
 		pi->linkdnrc = -1;
 	else {
 		if (reason >= 0)
 			pi->linkdnrc = reason;
 	}
 	for_each_vi(pi, v, vi) {
 		ifp = vi->ifp;
 		if (ifp == NULL)
 			continue;
 
 		if (link_stat) {
 			ifp->if_baudrate = IF_Mbps(pi->link_cfg.speed);
 			if_link_state_change(ifp, LINK_STATE_UP);
 		} else {
 			if_link_state_change(ifp, LINK_STATE_DOWN);
 		}
 	}
 }
 
 void
 t4_iterate(void (*func)(struct adapter *, void *), void *arg)
 {
 	struct adapter *sc;
 
 	sx_slock(&t4_list_lock);
 	SLIST_FOREACH(sc, &t4_list, link) {
 		/*
 		 * func should not make any assumptions about what state sc is
 		 * in - the only guarantee is that sc->sc_lock is a valid lock.
 		 */
 		func(sc, arg);
 	}
 	sx_sunlock(&t4_list_lock);
 }
 
 static int
 t4_open(struct cdev *dev, int flags, int type, struct thread *td)
 {
 	return (0);
 }
 
 static int
 t4_close(struct cdev *dev, int flags, int type, struct thread *td)
 {
 	return (0);
 }
 
 static int
 t4_ioctl(struct cdev *dev, unsigned long cmd, caddr_t data, int fflag,
     struct thread *td)
 {
 	int rc;
 	struct adapter *sc = dev->si_drv1;
 
 	rc = priv_check(td, PRIV_DRIVER);
 	if (rc != 0)
 		return (rc);
 
 	switch (cmd) {
 	case CHELSIO_T4_GETREG: {
 		struct t4_reg *edata = (struct t4_reg *)data;
 
 		if ((edata->addr & 0x3) != 0 || edata->addr >= sc->mmio_len)
 			return (EFAULT);
 
 		if (edata->size == 4)
 			edata->val = t4_read_reg(sc, edata->addr);
 		else if (edata->size == 8)
 			edata->val = t4_read_reg64(sc, edata->addr);
 		else
 			return (EINVAL);
 
 		break;
 	}
 	case CHELSIO_T4_SETREG: {
 		struct t4_reg *edata = (struct t4_reg *)data;
 
 		if ((edata->addr & 0x3) != 0 || edata->addr >= sc->mmio_len)
 			return (EFAULT);
 
 		if (edata->size == 4) {
 			if (edata->val & 0xffffffff00000000)
 				return (EINVAL);
 			t4_write_reg(sc, edata->addr, (uint32_t) edata->val);
 		} else if (edata->size == 8)
 			t4_write_reg64(sc, edata->addr, edata->val);
 		else
 			return (EINVAL);
 		break;
 	}
 	case CHELSIO_T4_REGDUMP: {
 		struct t4_regdump *regs = (struct t4_regdump *)data;
 		int reglen = is_t4(sc) ? T4_REGDUMP_SIZE : T5_REGDUMP_SIZE;
 		uint8_t *buf;
 
 		if (regs->len < reglen) {
 			regs->len = reglen; /* hint to the caller */
 			return (ENOBUFS);
 		}
 
 		regs->len = reglen;
 		buf = malloc(reglen, M_CXGBE, M_WAITOK | M_ZERO);
 		get_regs(sc, regs, buf);
 		rc = copyout(buf, regs->data, reglen);
 		free(buf, M_CXGBE);
 		break;
 	}
 	case CHELSIO_T4_GET_FILTER_MODE:
 		rc = get_filter_mode(sc, (uint32_t *)data);
 		break;
 	case CHELSIO_T4_SET_FILTER_MODE:
 		rc = set_filter_mode(sc, *(uint32_t *)data);
 		break;
 	case CHELSIO_T4_GET_FILTER:
 		rc = get_filter(sc, (struct t4_filter *)data);
 		break;
 	case CHELSIO_T4_SET_FILTER:
 		rc = set_filter(sc, (struct t4_filter *)data);
 		break;
 	case CHELSIO_T4_DEL_FILTER:
 		rc = del_filter(sc, (struct t4_filter *)data);
 		break;
 	case CHELSIO_T4_GET_SGE_CONTEXT:
 		rc = get_sge_context(sc, (struct t4_sge_context *)data);
 		break;
 	case CHELSIO_T4_LOAD_FW:
 		rc = load_fw(sc, (struct t4_data *)data);
 		break;
 	case CHELSIO_T4_GET_MEM:
 		rc = read_card_mem(sc, 2, (struct t4_mem_range *)data);
 		break;
 	case CHELSIO_T4_GET_I2C:
 		rc = read_i2c(sc, (struct t4_i2c_data *)data);
 		break;
 	case CHELSIO_T4_CLEAR_STATS: {
 		int i, v;
 		u_int port_id = *(uint32_t *)data;
 		struct port_info *pi;
 		struct vi_info *vi;
 
 		if (port_id >= sc->params.nports)
 			return (EINVAL);
 		pi = sc->port[port_id];
 
 		/* MAC stats */
 		t4_clr_port_stats(sc, pi->tx_chan);
 		pi->tx_parse_error = 0;
 		mtx_lock(&sc->reg_lock);
 		for_each_vi(pi, v, vi) {
 			if (vi->flags & VI_INIT_DONE)
 				t4_clr_vi_stats(sc, vi->viid);
 		}
 		mtx_unlock(&sc->reg_lock);
 
 		/*
 		 * Since this command accepts a port, clear stats for
 		 * all VIs on this port.
 		 */
 		for_each_vi(pi, v, vi) {
 			if (vi->flags & VI_INIT_DONE) {
 				struct sge_rxq *rxq;
 				struct sge_txq *txq;
 				struct sge_wrq *wrq;
 
 				if (vi->flags & VI_NETMAP)
 					continue;
 
 				for_each_rxq(vi, i, rxq) {
 #if defined(INET) || defined(INET6)
 					rxq->lro.lro_queued = 0;
 					rxq->lro.lro_flushed = 0;
 #endif
 					rxq->rxcsum = 0;
 					rxq->vlan_extraction = 0;
 				}
 
 				for_each_txq(vi, i, txq) {
 					txq->txcsum = 0;
 					txq->tso_wrs = 0;
 					txq->vlan_insertion = 0;
 					txq->imm_wrs = 0;
 					txq->sgl_wrs = 0;
 					txq->txpkt_wrs = 0;
 					txq->txpkts0_wrs = 0;
 					txq->txpkts1_wrs = 0;
 					txq->txpkts0_pkts = 0;
 					txq->txpkts1_pkts = 0;
 					mp_ring_reset_stats(txq->r);
 				}
 
 #ifdef TCP_OFFLOAD
 				/* nothing to clear for each ofld_rxq */
 
 				for_each_ofld_txq(vi, i, wrq) {
 					wrq->tx_wrs_direct = 0;
 					wrq->tx_wrs_copied = 0;
 				}
 #endif
 
 				if (IS_MAIN_VI(vi)) {
 					wrq = &sc->sge.ctrlq[pi->port_id];
 					wrq->tx_wrs_direct = 0;
 					wrq->tx_wrs_copied = 0;
 				}
 			}
 		}
 		break;
 	}
 	case CHELSIO_T4_SCHED_CLASS:
 		rc = set_sched_class(sc, (struct t4_sched_params *)data);
 		break;
 	case CHELSIO_T4_SCHED_QUEUE:
 		rc = set_sched_queue(sc, (struct t4_sched_queue *)data);
 		break;
 	case CHELSIO_T4_GET_TRACER:
 		rc = t4_get_tracer(sc, (struct t4_tracer *)data);
 		break;
 	case CHELSIO_T4_SET_TRACER:
 		rc = t4_set_tracer(sc, (struct t4_tracer *)data);
 		break;
 	default:
 		rc = EINVAL;
 	}
 
 	return (rc);
 }
 
 void
 t4_db_full(struct adapter *sc)
 {
 
 	CXGBE_UNIMPLEMENTED(__func__);
 }
 
 void
 t4_db_dropped(struct adapter *sc)
 {
 
 	CXGBE_UNIMPLEMENTED(__func__);
 }
 
 #ifdef TCP_OFFLOAD
 void
 t4_iscsi_init(struct adapter *sc, u_int tag_mask, const u_int *pgsz_order)
 {
 
 	t4_write_reg(sc, A_ULP_RX_ISCSI_TAGMASK, tag_mask);
 	t4_write_reg(sc, A_ULP_RX_ISCSI_PSZ, V_HPZ0(pgsz_order[0]) |
 		V_HPZ1(pgsz_order[1]) | V_HPZ2(pgsz_order[2]) |
 		V_HPZ3(pgsz_order[3]));
 }
 
 static int
 toe_capability(struct vi_info *vi, int enable)
 {
 	int rc;
 	struct port_info *pi = vi->pi;
 	struct adapter *sc = pi->adapter;
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 
 	if (!is_offload(sc))
 		return (ENODEV);
 
 	if (enable) {
 		if ((vi->ifp->if_capenable & IFCAP_TOE) != 0) {
 			/* TOE is already enabled. */
 			return (0);
 		}
 
 		/*
 		 * We need the port's queues around so that we're able to send
 		 * and receive CPLs to/from the TOE even if the ifnet for this
 		 * port has never been UP'd administratively.
 		 */
 		if (!(vi->flags & VI_INIT_DONE)) {
 			rc = cxgbe_init_synchronized(vi);
 			if (rc)
 				return (rc);
 		}
 		if (!(pi->vi[0].flags & VI_INIT_DONE)) {
 			rc = cxgbe_init_synchronized(&pi->vi[0]);
 			if (rc)
 				return (rc);
 		}
 
 		if (isset(&sc->offload_map, pi->port_id)) {
 			/* TOE is enabled on another VI of this port. */
 			pi->uld_vis++;
 			return (0);
 		}
 
 		if (!uld_active(sc, ULD_TOM)) {
 			rc = t4_activate_uld(sc, ULD_TOM);
 			if (rc == EAGAIN) {
 				log(LOG_WARNING,
 				    "You must kldload t4_tom.ko before trying "
 				    "to enable TOE on a cxgbe interface.\n");
 			}
 			if (rc != 0)
 				return (rc);
 			KASSERT(sc->tom_softc != NULL,
 			    ("%s: TOM activated but softc NULL", __func__));
 			KASSERT(uld_active(sc, ULD_TOM),
 			    ("%s: TOM activated but flag not set", __func__));
 		}
 
 		/* Activate iWARP and iSCSI too, if the modules are loaded. */
 		if (!uld_active(sc, ULD_IWARP))
 			(void) t4_activate_uld(sc, ULD_IWARP);
 		if (!uld_active(sc, ULD_ISCSI))
 			(void) t4_activate_uld(sc, ULD_ISCSI);
 
 		pi->uld_vis++;
 		setbit(&sc->offload_map, pi->port_id);
 	} else {
 		pi->uld_vis--;
 
 		if (!isset(&sc->offload_map, pi->port_id) || pi->uld_vis > 0)
 			return (0);
 
 		KASSERT(uld_active(sc, ULD_TOM),
 		    ("%s: TOM never initialized?", __func__));
 		clrbit(&sc->offload_map, pi->port_id);
 	}
 
 	return (0);
 }
 
 /*
  * Add an upper layer driver to the global list.
  */
 int
 t4_register_uld(struct uld_info *ui)
 {
 	int rc = 0;
 	struct uld_info *u;
 
 	sx_xlock(&t4_uld_list_lock);
 	SLIST_FOREACH(u, &t4_uld_list, link) {
 	    if (u->uld_id == ui->uld_id) {
 		    rc = EEXIST;
 		    goto done;
 	    }
 	}
 
 	SLIST_INSERT_HEAD(&t4_uld_list, ui, link);
 	ui->refcount = 0;
 done:
 	sx_xunlock(&t4_uld_list_lock);
 	return (rc);
 }
 
 int
 t4_unregister_uld(struct uld_info *ui)
 {
 	int rc = EINVAL;
 	struct uld_info *u;
 
 	sx_xlock(&t4_uld_list_lock);
 
 	SLIST_FOREACH(u, &t4_uld_list, link) {
 	    if (u == ui) {
 		    if (ui->refcount > 0) {
 			    rc = EBUSY;
 			    goto done;
 		    }
 
 		    SLIST_REMOVE(&t4_uld_list, ui, uld_info, link);
 		    rc = 0;
 		    goto done;
 	    }
 	}
 done:
 	sx_xunlock(&t4_uld_list_lock);
 	return (rc);
 }
 
 int
 t4_activate_uld(struct adapter *sc, int id)
 {
 	int rc;
 	struct uld_info *ui;
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 
 	if (id < 0 || id > ULD_MAX)
 		return (EINVAL);
 	rc = EAGAIN;	/* kldload the module with this ULD and try again. */
 
 	sx_slock(&t4_uld_list_lock);
 
 	SLIST_FOREACH(ui, &t4_uld_list, link) {
 		if (ui->uld_id == id) {
 			if (!(sc->flags & FULL_INIT_DONE)) {
 				rc = adapter_full_init(sc);
 				if (rc != 0)
 					break;
 			}
 
 			rc = ui->activate(sc);
 			if (rc == 0) {
 				setbit(&sc->active_ulds, id);
 				ui->refcount++;
 			}
 			break;
 		}
 	}
 
 	sx_sunlock(&t4_uld_list_lock);
 
 	return (rc);
 }
 
 int
 t4_deactivate_uld(struct adapter *sc, int id)
 {
 	int rc;
 	struct uld_info *ui;
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 
 	if (id < 0 || id > ULD_MAX)
 		return (EINVAL);
 	rc = ENXIO;
 
 	sx_slock(&t4_uld_list_lock);
 
 	SLIST_FOREACH(ui, &t4_uld_list, link) {
 		if (ui->uld_id == id) {
 			rc = ui->deactivate(sc);
 			if (rc == 0) {
 				clrbit(&sc->active_ulds, id);
 				ui->refcount--;
 			}
 			break;
 		}
 	}
 
 	sx_sunlock(&t4_uld_list_lock);
 
 	return (rc);
 }
 
 int
 uld_active(struct adapter *sc, int uld_id)
 {
 
 	MPASS(uld_id >= 0 && uld_id <= ULD_MAX);
 
 	return (isset(&sc->active_ulds, uld_id));
 }
 #endif
 
 /*
  * Come up with reasonable defaults for some of the tunables, provided they're
  * not set by the user (in which case we'll use the values as is).
  */
 static void
 tweak_tunables(void)
 {
 	int nc = mp_ncpus;	/* our snapshot of the number of CPUs */
 
 	if (t4_ntxq10g < 1) {
 #ifdef RSS
 		t4_ntxq10g = rss_getnumbuckets();
 #else
 		t4_ntxq10g = min(nc, NTXQ_10G);
 #endif
 	}
 
 	if (t4_ntxq1g < 1) {
 #ifdef RSS
 		/* XXX: way too many for 1GbE? */
 		t4_ntxq1g = rss_getnumbuckets();
 #else
 		t4_ntxq1g = min(nc, NTXQ_1G);
 #endif
 	}
 
 	if (t4_nrxq10g < 1) {
 #ifdef RSS
 		t4_nrxq10g = rss_getnumbuckets();
 #else
 		t4_nrxq10g = min(nc, NRXQ_10G);
 #endif
 	}
 
 	if (t4_nrxq1g < 1) {
 #ifdef RSS
 		/* XXX: way too many for 1GbE? */
 		t4_nrxq1g = rss_getnumbuckets();
 #else
 		t4_nrxq1g = min(nc, NRXQ_1G);
 #endif
 	}
 
 #ifdef TCP_OFFLOAD
 	if (t4_nofldtxq10g < 1)
 		t4_nofldtxq10g = min(nc, NOFLDTXQ_10G);
 
 	if (t4_nofldtxq1g < 1)
 		t4_nofldtxq1g = min(nc, NOFLDTXQ_1G);
 
 	if (t4_nofldrxq10g < 1)
 		t4_nofldrxq10g = min(nc, NOFLDRXQ_10G);
 
 	if (t4_nofldrxq1g < 1)
 		t4_nofldrxq1g = min(nc, NOFLDRXQ_1G);
 
 	if (t4_toecaps_allowed == -1)
 		t4_toecaps_allowed = FW_CAPS_CONFIG_TOE;
 
 	if (t4_rdmacaps_allowed == -1) {
 		t4_rdmacaps_allowed = FW_CAPS_CONFIG_RDMA_RDDP |
 		    FW_CAPS_CONFIG_RDMA_RDMAC;
 	}
 
 	if (t4_iscsicaps_allowed == -1) {
 		t4_iscsicaps_allowed = FW_CAPS_CONFIG_ISCSI_INITIATOR_PDU |
 		    FW_CAPS_CONFIG_ISCSI_TARGET_PDU |
 		    FW_CAPS_CONFIG_ISCSI_T10DIF;
 	}
 #else
 	if (t4_toecaps_allowed == -1)
 		t4_toecaps_allowed = 0;
 
 	if (t4_rdmacaps_allowed == -1)
 		t4_rdmacaps_allowed = 0;
 
 	if (t4_iscsicaps_allowed == -1)
 		t4_iscsicaps_allowed = 0;
 #endif
 
 #ifdef DEV_NETMAP
 	if (t4_nnmtxq10g < 1)
 		t4_nnmtxq10g = min(nc, NNMTXQ_10G);
 
 	if (t4_nnmtxq1g < 1)
 		t4_nnmtxq1g = min(nc, NNMTXQ_1G);
 
 	if (t4_nnmrxq10g < 1)
 		t4_nnmrxq10g = min(nc, NNMRXQ_10G);
 
 	if (t4_nnmrxq1g < 1)
 		t4_nnmrxq1g = min(nc, NNMRXQ_1G);
 #endif
 
 	if (t4_tmr_idx_10g < 0 || t4_tmr_idx_10g >= SGE_NTIMERS)
 		t4_tmr_idx_10g = TMR_IDX_10G;
 
 	if (t4_pktc_idx_10g < -1 || t4_pktc_idx_10g >= SGE_NCOUNTERS)
 		t4_pktc_idx_10g = PKTC_IDX_10G;
 
 	if (t4_tmr_idx_1g < 0 || t4_tmr_idx_1g >= SGE_NTIMERS)
 		t4_tmr_idx_1g = TMR_IDX_1G;
 
 	if (t4_pktc_idx_1g < -1 || t4_pktc_idx_1g >= SGE_NCOUNTERS)
 		t4_pktc_idx_1g = PKTC_IDX_1G;
 
 	if (t4_qsize_txq < 128)
 		t4_qsize_txq = 128;
 
 	if (t4_qsize_rxq < 128)
 		t4_qsize_rxq = 128;
 	while (t4_qsize_rxq & 7)
 		t4_qsize_rxq++;
 
 	t4_intr_types &= INTR_MSIX | INTR_MSI | INTR_INTX;
 }
 
 #ifdef DDB
 static void
 t4_dump_tcb(struct adapter *sc, int tid)
 {
 	uint32_t base, i, j, off, pf, reg, save, tcb_addr, win_pos;
 
 	reg = PCIE_MEM_ACCESS_REG(A_PCIE_MEM_ACCESS_OFFSET, 2);
 	save = t4_read_reg(sc, reg);
 	base = sc->memwin[2].mw_base;
 
 	/* Dump TCB for the tid */
 	tcb_addr = t4_read_reg(sc, A_TP_CMM_TCB_BASE);
 	tcb_addr += tid * TCB_SIZE;
 
 	if (is_t4(sc)) {
 		pf = 0;
 		win_pos = tcb_addr & ~0xf;	/* start must be 16B aligned */
 	} else {
 		pf = V_PFNUM(sc->pf);
 		win_pos = tcb_addr & ~0x7f;	/* start must be 128B aligned */
 	}
 	t4_write_reg(sc, reg, win_pos | pf);
 	t4_read_reg(sc, reg);
 
 	off = tcb_addr - win_pos;
 	for (i = 0; i < 4; i++) {
 		uint32_t buf[8];
 		for (j = 0; j < 8; j++, off += 4)
 			buf[j] = htonl(t4_read_reg(sc, base + off));
 
 		db_printf("%08x %08x %08x %08x %08x %08x %08x %08x\n",
 		    buf[0], buf[1], buf[2], buf[3], buf[4], buf[5], buf[6],
 		    buf[7]);
 	}
 
 	t4_write_reg(sc, reg, save);
 	t4_read_reg(sc, reg);
 }
 
 static void
 t4_dump_devlog(struct adapter *sc)
 {
 	struct devlog_params *dparams = &sc->params.devlog;
 	struct fw_devlog_e e;
 	int i, first, j, m, nentries, rc;
 	uint64_t ftstamp = UINT64_MAX;
 
 	if (dparams->start == 0) {
 		db_printf("devlog params not valid\n");
 		return;
 	}
 
 	nentries = dparams->size / sizeof(struct fw_devlog_e);
 	m = fwmtype_to_hwmtype(dparams->memtype);
 
 	/* Find the first entry. */
 	first = -1;
 	for (i = 0; i < nentries && !db_pager_quit; i++) {
 		rc = -t4_mem_read(sc, m, dparams->start + i * sizeof(e),
 		    sizeof(e), (void *)&e);
 		if (rc != 0)
 			break;
 
 		if (e.timestamp == 0)
 			break;
 
 		e.timestamp = be64toh(e.timestamp);
 		if (e.timestamp < ftstamp) {
 			ftstamp = e.timestamp;
 			first = i;
 		}
 	}
 
 	if (first == -1)
 		return;
 
 	i = first;
 	do {
 		rc = -t4_mem_read(sc, m, dparams->start + i * sizeof(e),
 		    sizeof(e), (void *)&e);
 		if (rc != 0)
 			return;
 
 		if (e.timestamp == 0)
 			return;
 
 		e.timestamp = be64toh(e.timestamp);
 		e.seqno = be32toh(e.seqno);
 		for (j = 0; j < 8; j++)
 			e.params[j] = be32toh(e.params[j]);
 
 		db_printf("%10d  %15ju  %8s  %8s  ",
 		    e.seqno, e.timestamp,
 		    (e.level < nitems(devlog_level_strings) ?
 			devlog_level_strings[e.level] : "UNKNOWN"),
 		    (e.facility < nitems(devlog_facility_strings) ?
 			devlog_facility_strings[e.facility] : "UNKNOWN"));
 		db_printf(e.fmt, e.params[0], e.params[1], e.params[2],
 		    e.params[3], e.params[4], e.params[5], e.params[6],
 		    e.params[7]);
 
 		if (++i == nentries)
 			i = 0;
 	} while (i != first && !db_pager_quit);
 }
 
 static struct command_table db_t4_table = LIST_HEAD_INITIALIZER(db_t4_table);
 _DB_SET(_show, t4, NULL, db_show_table, 0, &db_t4_table);
 
 DB_FUNC(devlog, db_show_devlog, db_t4_table, CS_OWN, NULL)
 {
 	device_t dev;
 	int t;
 	bool valid;
 
 	valid = false;
 	t = db_read_token();
 	if (t == tIDENT) {
 		dev = device_lookup_by_name(db_tok_string);
 		valid = true;
 	}
 	db_skip_to_eol();
 	if (!valid) {
 		db_printf("usage: show t4 devlog <nexus>\n");
 		return;
 	}
 
 	if (dev == NULL) {
 		db_printf("device not found\n");
 		return;
 	}
 
 	t4_dump_devlog(device_get_softc(dev));
 }
 
 DB_FUNC(tcb, db_show_t4tcb, db_t4_table, CS_OWN, NULL)
 {
 	device_t dev;
 	int radix, tid, t;
 	bool valid;
 
 	valid = false;
 	radix = db_radix;
 	db_radix = 10;
 	t = db_read_token();
 	if (t == tIDENT) {
 		dev = device_lookup_by_name(db_tok_string);
 		t = db_read_token();
 		if (t == tNUMBER) {
 			tid = db_tok_number;
 			valid = true;
 		}
 	}
 	db_radix = radix;
 	db_skip_to_eol();
 	if (!valid) {
 		db_printf("usage: show t4 tcb <nexus> <tid>\n");
 		return;
 	}
 
 	if (dev == NULL) {
 		db_printf("device not found\n");
 		return;
 	}
 	if (tid < 0) {
 		db_printf("invalid tid\n");
 		return;
 	}
 
 	t4_dump_tcb(device_get_softc(dev), tid);
 }
 #endif
 
 static struct sx mlu;	/* mod load unload */
 SX_SYSINIT(cxgbe_mlu, &mlu, "cxgbe mod load/unload");
 
 static int
 mod_event(module_t mod, int cmd, void *arg)
 {
 	int rc = 0;
 	static int loaded = 0;
 
 	switch (cmd) {
 	case MOD_LOAD:
 		sx_xlock(&mlu);
 		if (loaded++ == 0) {
 			t4_sge_modload();
 			sx_init(&t4_list_lock, "T4/T5 adapters");
 			SLIST_INIT(&t4_list);
 #ifdef TCP_OFFLOAD
 			sx_init(&t4_uld_list_lock, "T4/T5 ULDs");
 			SLIST_INIT(&t4_uld_list);
 #endif
 			t4_tracer_modload();
 			tweak_tunables();
 		}
 		sx_xunlock(&mlu);
 		break;
 
 	case MOD_UNLOAD:
 		sx_xlock(&mlu);
 		if (--loaded == 0) {
 			int tries;
 
 			sx_slock(&t4_list_lock);
 			if (!SLIST_EMPTY(&t4_list)) {
 				rc = EBUSY;
 				sx_sunlock(&t4_list_lock);
 				goto done_unload;
 			}
 #ifdef TCP_OFFLOAD
 			sx_slock(&t4_uld_list_lock);
 			if (!SLIST_EMPTY(&t4_uld_list)) {
 				rc = EBUSY;
 				sx_sunlock(&t4_uld_list_lock);
 				sx_sunlock(&t4_list_lock);
 				goto done_unload;
 			}
 #endif
 			tries = 0;
 			while (tries++ < 5 && t4_sge_extfree_refs() != 0) {
 				uprintf("%ju clusters with custom free routine "
 				    "still in use.\n", t4_sge_extfree_refs());
 				pause("t4unload", 2 * hz);
 			}
 #ifdef TCP_OFFLOAD
 			sx_sunlock(&t4_uld_list_lock);
 #endif
 			sx_sunlock(&t4_list_lock);
 
 			if (t4_sge_extfree_refs() == 0) {
 				t4_tracer_modunload();
 #ifdef TCP_OFFLOAD
 				sx_destroy(&t4_uld_list_lock);
 #endif
 				sx_destroy(&t4_list_lock);
 				t4_sge_modunload();
 				loaded = 0;
 			} else {
 				rc = EBUSY;
 				loaded++;	/* undo earlier decrement */
 			}
 		}
 done_unload:
 		sx_xunlock(&mlu);
 		break;
 	}
 
 	return (rc);
 }
 
 static devclass_t t4_devclass, t5_devclass;
 static devclass_t cxgbe_devclass, cxl_devclass;
 static devclass_t vcxgbe_devclass, vcxl_devclass;
 
 DRIVER_MODULE(t4nex, pci, t4_driver, t4_devclass, mod_event, 0);
 MODULE_VERSION(t4nex, 1);
 MODULE_DEPEND(t4nex, firmware, 1, 1, 1);
 #ifdef DEV_NETMAP
 MODULE_DEPEND(t4nex, netmap, 1, 1, 1);
 #endif /* DEV_NETMAP */
 
 
 DRIVER_MODULE(t5nex, pci, t5_driver, t5_devclass, mod_event, 0);
 MODULE_VERSION(t5nex, 1);
 MODULE_DEPEND(t5nex, firmware, 1, 1, 1);
 #ifdef DEV_NETMAP
 MODULE_DEPEND(t5nex, netmap, 1, 1, 1);
 #endif /* DEV_NETMAP */
 
 DRIVER_MODULE(cxgbe, t4nex, cxgbe_driver, cxgbe_devclass, 0, 0);
 MODULE_VERSION(cxgbe, 1);
 
 DRIVER_MODULE(cxl, t5nex, cxl_driver, cxl_devclass, 0, 0);
 MODULE_VERSION(cxl, 1);
 
 DRIVER_MODULE(vcxgbe, cxgbe, vcxgbe_driver, vcxgbe_devclass, 0, 0);
 MODULE_VERSION(vcxgbe, 1);
 
 DRIVER_MODULE(vcxl, cxl, vcxl_driver, vcxl_devclass, 0, 0);
 MODULE_VERSION(vcxl, 1);
Index: projects/vnet/sys/dev/e1000/if_igb.c
===================================================================
--- projects/vnet/sys/dev/e1000/if_igb.c	(revision 301546)
+++ projects/vnet/sys/dev/e1000/if_igb.c	(revision 301547)
@@ -1,6440 +1,6440 @@
 /******************************************************************************
 
   Copyright (c) 2001-2015, Intel Corporation 
   All rights reserved.
   
   Redistribution and use in source and binary forms, with or without 
   modification, are permitted provided that the following conditions are met:
   
    1. Redistributions of source code must retain the above copyright notice, 
       this list of conditions and the following disclaimer.
   
    2. Redistributions in binary form must reproduce the above copyright 
       notice, this list of conditions and the following disclaimer in the 
       documentation and/or other materials provided with the distribution.
   
    3. Neither the name of the Intel Corporation nor the names of its 
       contributors may be used to endorse or promote products derived from 
       this software without specific prior written permission.
   
   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
   AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 
   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 
   ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE 
   LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 
   CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 
   SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 
   INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 
   CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 
   ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
   POSSIBILITY OF SUCH DAMAGE.
 
 ******************************************************************************/
 /*$FreeBSD$*/
 
 
 #include "opt_inet.h"
 #include "opt_inet6.h"
 #include "opt_rss.h"
 
 #ifdef HAVE_KERNEL_OPTION_HEADERS
 #include "opt_device_polling.h"
 #include "opt_altq.h"
 #endif
 
 #include "if_igb.h"
 
 /*********************************************************************
  *  Driver version:
  *********************************************************************/
 char igb_driver_version[] = "2.5.3-k";
 
 
 /*********************************************************************
  *  PCI Device ID Table
  *
  *  Used by probe to select devices to load on
  *  Last field stores an index into e1000_strings
  *  Last entry must be all 0s
  *
  *  { Vendor ID, Device ID, SubVendor ID, SubDevice ID, String Index }
  *********************************************************************/
 
 static igb_vendor_info_t igb_vendor_info_array[] =
 {
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82575EB_COPPER, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82575EB_FIBER_SERDES, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82575GB_QUAD_COPPER, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576_NS, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576_NS_SERDES, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576_FIBER,	0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576_SERDES, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576_SERDES_QUAD, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576_QUAD_COPPER, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576_QUAD_COPPER_ET2, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576_VF, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82580_COPPER, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82580_FIBER,	0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82580_SERDES, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82580_SGMII,	0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82580_COPPER_DUAL, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82580_QUAD_FIBER, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_DH89XXCC_SERDES, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_DH89XXCC_SGMII, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_DH89XXCC_SFP, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_DH89XXCC_BACKPLANE, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I350_COPPER,	0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I350_FIBER,	0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I350_SERDES,	0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I350_SGMII,	0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I350_VF, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I210_COPPER,	0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I210_COPPER_IT, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I210_COPPER_OEM1, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I210_COPPER_FLASHLESS, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I210_SERDES_FLASHLESS, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I210_FIBER,	0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I210_SERDES,	0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I210_SGMII,	0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I211_COPPER,	0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I354_BACKPLANE_1GBPS, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I354_BACKPLANE_2_5GBPS, 0, 0, 0},
 	{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I354_SGMII,	0, 0, 0},
 	/* required last entry */
 	{0, 0, 0, 0, 0}
 };
 
 /*********************************************************************
  *  Table of branding strings for all supported NICs.
  *********************************************************************/
 
 static char *igb_strings[] = {
 	"Intel(R) PRO/1000 Network Connection"
 };
 
 /*********************************************************************
  *  Function prototypes
  *********************************************************************/
 static int	igb_probe(device_t);
 static int	igb_attach(device_t);
 static int	igb_detach(device_t);
 static int	igb_shutdown(device_t);
 static int	igb_suspend(device_t);
 static int	igb_resume(device_t);
 #ifndef IGB_LEGACY_TX
 static int	igb_mq_start(struct ifnet *, struct mbuf *);
 static int	igb_mq_start_locked(struct ifnet *, struct tx_ring *);
 static void	igb_qflush(struct ifnet *);
 static void	igb_deferred_mq_start(void *, int);
 #else
 static void	igb_start(struct ifnet *);
 static void	igb_start_locked(struct tx_ring *, struct ifnet *ifp);
 #endif
 static int	igb_ioctl(struct ifnet *, u_long, caddr_t);
 static uint64_t	igb_get_counter(if_t, ift_counter);
 static void	igb_init(void *);
 static void	igb_init_locked(struct adapter *);
 static void	igb_stop(void *);
 static void	igb_media_status(struct ifnet *, struct ifmediareq *);
 static int	igb_media_change(struct ifnet *);
 static void	igb_identify_hardware(struct adapter *);
 static int	igb_allocate_pci_resources(struct adapter *);
 static int	igb_allocate_msix(struct adapter *);
 static int	igb_allocate_legacy(struct adapter *);
 static int	igb_setup_msix(struct adapter *);
 static void	igb_free_pci_resources(struct adapter *);
 static void	igb_local_timer(void *);
 static void	igb_reset(struct adapter *);
 static int	igb_setup_interface(device_t, struct adapter *);
 static int	igb_allocate_queues(struct adapter *);
 static void	igb_configure_queues(struct adapter *);
 
 static int	igb_allocate_transmit_buffers(struct tx_ring *);
 static void	igb_setup_transmit_structures(struct adapter *);
 static void	igb_setup_transmit_ring(struct tx_ring *);
 static void	igb_initialize_transmit_units(struct adapter *);
 static void	igb_free_transmit_structures(struct adapter *);
 static void	igb_free_transmit_buffers(struct tx_ring *);
 
 static int	igb_allocate_receive_buffers(struct rx_ring *);
 static int	igb_setup_receive_structures(struct adapter *);
 static int	igb_setup_receive_ring(struct rx_ring *);
 static void	igb_initialize_receive_units(struct adapter *);
 static void	igb_free_receive_structures(struct adapter *);
 static void	igb_free_receive_buffers(struct rx_ring *);
 static void	igb_free_receive_ring(struct rx_ring *);
 
 static void	igb_enable_intr(struct adapter *);
 static void	igb_disable_intr(struct adapter *);
 static void	igb_update_stats_counters(struct adapter *);
 static bool	igb_txeof(struct tx_ring *);
 
 static __inline	void igb_rx_discard(struct rx_ring *, int);
 static __inline void igb_rx_input(struct rx_ring *,
 		    struct ifnet *, struct mbuf *, u32);
 
 static bool	igb_rxeof(struct igb_queue *, int, int *);
 static void	igb_rx_checksum(u32, struct mbuf *, u32);
 static int	igb_tx_ctx_setup(struct tx_ring *,
 		    struct mbuf *, u32 *, u32 *);
 static int	igb_tso_setup(struct tx_ring *,
 		    struct mbuf *, u32 *, u32 *);
 static void	igb_set_promisc(struct adapter *);
 static void	igb_disable_promisc(struct adapter *);
 static void	igb_set_multi(struct adapter *);
 static void	igb_update_link_status(struct adapter *);
 static void	igb_refresh_mbufs(struct rx_ring *, int);
 
 static void	igb_register_vlan(void *, struct ifnet *, u16);
 static void	igb_unregister_vlan(void *, struct ifnet *, u16);
 static void	igb_setup_vlan_hw_support(struct adapter *);
 
 static int	igb_xmit(struct tx_ring *, struct mbuf **);
 static int	igb_dma_malloc(struct adapter *, bus_size_t,
 		    struct igb_dma_alloc *, int);
 static void	igb_dma_free(struct adapter *, struct igb_dma_alloc *);
 static int	igb_sysctl_nvm_info(SYSCTL_HANDLER_ARGS);
 static void	igb_print_nvm_info(struct adapter *);
 static int 	igb_is_valid_ether_addr(u8 *);
 static void     igb_add_hw_stats(struct adapter *);
 
 static void	igb_vf_init_stats(struct adapter *);
 static void	igb_update_vf_stats_counters(struct adapter *);
 
 /* Management and WOL Support */
 static void	igb_init_manageability(struct adapter *);
 static void	igb_release_manageability(struct adapter *);
 static void     igb_get_hw_control(struct adapter *);
 static void     igb_release_hw_control(struct adapter *);
 static void     igb_enable_wakeup(device_t);
 static void     igb_led_func(void *, int);
 
 static int	igb_irq_fast(void *);
 static void	igb_msix_que(void *);
 static void	igb_msix_link(void *);
 static void	igb_handle_que(void *context, int pending);
 static void	igb_handle_link(void *context, int pending);
 static void	igb_handle_link_locked(struct adapter *);
 
 static void	igb_set_sysctl_value(struct adapter *, const char *,
 		    const char *, int *, int);
 static int	igb_set_flowcntl(SYSCTL_HANDLER_ARGS);
 static int	igb_sysctl_dmac(SYSCTL_HANDLER_ARGS);
 static int	igb_sysctl_eee(SYSCTL_HANDLER_ARGS);
 
 #ifdef DEVICE_POLLING
 static poll_handler_t igb_poll;
 #endif /* DEVICE_POLLING */
 
 /*********************************************************************
  *  FreeBSD Device Interface Entry Points
  *********************************************************************/
 
 static device_method_t igb_methods[] = {
 	/* Device interface */
 	DEVMETHOD(device_probe, igb_probe),
 	DEVMETHOD(device_attach, igb_attach),
 	DEVMETHOD(device_detach, igb_detach),
 	DEVMETHOD(device_shutdown, igb_shutdown),
 	DEVMETHOD(device_suspend, igb_suspend),
 	DEVMETHOD(device_resume, igb_resume),
 	DEVMETHOD_END
 };
 
 static driver_t igb_driver = {
 	"igb", igb_methods, sizeof(struct adapter),
 };
 
 static devclass_t igb_devclass;
 DRIVER_MODULE(igb, pci, igb_driver, igb_devclass, 0, 0);
 MODULE_DEPEND(igb, pci, 1, 1, 1);
 MODULE_DEPEND(igb, ether, 1, 1, 1);
 #ifdef DEV_NETMAP
 MODULE_DEPEND(igb, netmap, 1, 1, 1);
 #endif /* DEV_NETMAP */
 
 /*********************************************************************
  *  Tunable default values.
  *********************************************************************/
 
 static SYSCTL_NODE(_hw, OID_AUTO, igb, CTLFLAG_RD, 0, "IGB driver parameters");
 
 /* Descriptor defaults */
 static int igb_rxd = IGB_DEFAULT_RXD;
 static int igb_txd = IGB_DEFAULT_TXD;
 SYSCTL_INT(_hw_igb, OID_AUTO, rxd, CTLFLAG_RDTUN, &igb_rxd, 0,
     "Number of receive descriptors per queue");
 SYSCTL_INT(_hw_igb, OID_AUTO, txd, CTLFLAG_RDTUN, &igb_txd, 0,
     "Number of transmit descriptors per queue");
 
 /*
 ** AIM: Adaptive Interrupt Moderation
 ** which means that the interrupt rate
 ** is varied over time based on the
 ** traffic for that interrupt vector
 */
 static int igb_enable_aim = TRUE;
 SYSCTL_INT(_hw_igb, OID_AUTO, enable_aim, CTLFLAG_RWTUN, &igb_enable_aim, 0,
     "Enable adaptive interrupt moderation");
 
 /*
  * MSIX should be the default for best performance,
  * but this allows it to be forced off for testing.
  */         
 static int igb_enable_msix = 1;
 SYSCTL_INT(_hw_igb, OID_AUTO, enable_msix, CTLFLAG_RDTUN, &igb_enable_msix, 0,
     "Enable MSI-X interrupts");
 
 /*
 ** Tunable interrupt rate
 */
 static int igb_max_interrupt_rate = 8000;
 SYSCTL_INT(_hw_igb, OID_AUTO, max_interrupt_rate, CTLFLAG_RDTUN,
     &igb_max_interrupt_rate, 0, "Maximum interrupts per second");
 
 #ifndef IGB_LEGACY_TX
 /*
 ** Tunable number of buffers in the buf-ring (drbr_xxx)
 */
 static int igb_buf_ring_size = IGB_BR_SIZE;
 SYSCTL_INT(_hw_igb, OID_AUTO, buf_ring_size, CTLFLAG_RDTUN,
     &igb_buf_ring_size, 0, "Size of the bufring");
 #endif
 
 /*
 ** Header split causes the packet header to
 ** be DMA'd to a separate mbuf from the payload.
 ** This can have memory alignment benefits. Another
 ** plus is that small packets often fit into the
 ** header mbuf and thus use no cluster. How much this
 ** helps is very workload dependent.
 */
 static int igb_header_split = FALSE;
 SYSCTL_INT(_hw_igb, OID_AUTO, header_split, CTLFLAG_RDTUN, &igb_header_split, 0,
     "Enable receive mbuf header split");
 
 /*
 ** This will autoconfigure based on the
 ** number of CPUs and max supported
 ** MSIX messages if left at 0.
 */
 static int igb_num_queues = 0;
 SYSCTL_INT(_hw_igb, OID_AUTO, num_queues, CTLFLAG_RDTUN, &igb_num_queues, 0,
     "Number of queues to configure, 0 indicates autoconfigure");
 
 /*
 ** Global variable to store last used CPU when binding queues
 ** to CPUs in igb_allocate_msix.  Starts at CPU_FIRST and increments when a
 ** queue is bound to a cpu.
 */
 static int igb_last_bind_cpu = -1;
 
 /* How many packets rxeof tries to clean at a time */
 static int igb_rx_process_limit = 100;
 SYSCTL_INT(_hw_igb, OID_AUTO, rx_process_limit, CTLFLAG_RDTUN,
     &igb_rx_process_limit, 0,
     "Maximum number of received packets to process at a time, -1 means unlimited");
 
 /* How many packets txeof tries to clean at a time */
 static int igb_tx_process_limit = -1;
 SYSCTL_INT(_hw_igb, OID_AUTO, tx_process_limit, CTLFLAG_RDTUN,
     &igb_tx_process_limit, 0,
     "Maximum number of sent packets to process at a time, -1 means unlimited");
 
 #ifdef DEV_NETMAP	/* see ixgbe.c for details */
 #include <dev/netmap/if_igb_netmap.h>
 #endif /* DEV_NETMAP */
 /*********************************************************************
  *  Device identification routine
  *
  *  igb_probe determines if the driver should be loaded on
  *  an adapter based on the adapter's PCI vendor/device ID.
  *
  *  return BUS_PROBE_DEFAULT on success, positive on failure
  *********************************************************************/
 
 static int
 igb_probe(device_t dev)
 {
 	char		adapter_name[256];
 	uint16_t	pci_vendor_id = 0;
 	uint16_t	pci_device_id = 0;
 	uint16_t	pci_subvendor_id = 0;
 	uint16_t	pci_subdevice_id = 0;
 	igb_vendor_info_t *ent;
 
 	INIT_DEBUGOUT("igb_probe: begin");
 
 	pci_vendor_id = pci_get_vendor(dev);
 	if (pci_vendor_id != IGB_INTEL_VENDOR_ID)
 		return (ENXIO);
 
 	pci_device_id = pci_get_device(dev);
 	pci_subvendor_id = pci_get_subvendor(dev);
 	pci_subdevice_id = pci_get_subdevice(dev);
 
 	ent = igb_vendor_info_array;
 	while (ent->vendor_id != 0) {
 		if ((pci_vendor_id == ent->vendor_id) &&
 		    (pci_device_id == ent->device_id) &&
 
 		    ((pci_subvendor_id == ent->subvendor_id) ||
 		    (ent->subvendor_id == 0)) &&
 
 		    ((pci_subdevice_id == ent->subdevice_id) ||
 		    (ent->subdevice_id == 0))) {
 			sprintf(adapter_name, "%s, Version - %s",
 				igb_strings[ent->index],
 				igb_driver_version);
 			device_set_desc_copy(dev, adapter_name);
 			return (BUS_PROBE_DEFAULT);
 		}
 		ent++;
 	}
 	return (ENXIO);
 }
 
 /*********************************************************************
  *  Device initialization routine
  *
  *  The attach entry point is called when the driver is being loaded.
  *  This routine identifies the type of hardware, allocates all resources
  *  and initializes the hardware.
  *
  *  return 0 on success, positive on failure
  *********************************************************************/
 
 static int
 igb_attach(device_t dev)
 {
 	struct adapter	*adapter;
 	int		error = 0;
 	u16		eeprom_data;
 
 	INIT_DEBUGOUT("igb_attach: begin");
 
 	if (resource_disabled("igb", device_get_unit(dev))) {
 		device_printf(dev, "Disabled by device hint\n");
 		return (ENXIO);
 	}
 
 	adapter = device_get_softc(dev);
 	adapter->dev = adapter->osdep.dev = dev;
 	IGB_CORE_LOCK_INIT(adapter, device_get_nameunit(dev));
 
 	/* SYSCTLs */
 	SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev),
 	    SYSCTL_CHILDREN(device_get_sysctl_tree(dev)),
 	    OID_AUTO, "nvm", CTLTYPE_INT|CTLFLAG_RW, adapter, 0,
 	    igb_sysctl_nvm_info, "I", "NVM Information");
 
 	igb_set_sysctl_value(adapter, "enable_aim",
 	    "Interrupt Moderation", &adapter->enable_aim,
 	    igb_enable_aim);
 
 	SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev),
 	    SYSCTL_CHILDREN(device_get_sysctl_tree(dev)),
 	    OID_AUTO, "fc", CTLTYPE_INT|CTLFLAG_RW,
 	    adapter, 0, igb_set_flowcntl, "I", "Flow Control");
 
 	callout_init_mtx(&adapter->timer, &adapter->core_mtx, 0);
 
 	/* Determine hardware and mac info */
 	igb_identify_hardware(adapter);
 
 	/* Setup PCI resources */
 	if (igb_allocate_pci_resources(adapter)) {
 		device_printf(dev, "Allocation of PCI resources failed\n");
 		error = ENXIO;
 		goto err_pci;
 	}
 
 	/* Do Shared Code initialization */
 	if (e1000_setup_init_funcs(&adapter->hw, TRUE)) {
 		device_printf(dev, "Setup of Shared code failed\n");
 		error = ENXIO;
 		goto err_pci;
 	}
 
 	e1000_get_bus_info(&adapter->hw);
 
 	/* Sysctls for limiting the amount of work done in the taskqueues */
 	igb_set_sysctl_value(adapter, "rx_processing_limit",
 	    "max number of rx packets to process",
 	    &adapter->rx_process_limit, igb_rx_process_limit);
 
 	igb_set_sysctl_value(adapter, "tx_processing_limit",
 	    "max number of tx packets to process",
 	    &adapter->tx_process_limit, igb_tx_process_limit);
 
 	/*
 	 * Validate the number of transmit and receive descriptors.  Each
 	 * count must lie within the hardware limits, and the resulting
 	 * ring size in bytes must be a multiple of IGB_DBA_ALIGN.
 	 */
 	if (((igb_txd * sizeof(struct e1000_tx_desc)) % IGB_DBA_ALIGN) != 0 ||
 	    (igb_txd > IGB_MAX_TXD) || (igb_txd < IGB_MIN_TXD)) {
 		device_printf(dev, "Using %d TX descriptors instead of %d!\n",
 		    IGB_DEFAULT_TXD, igb_txd);
 		adapter->num_tx_desc = IGB_DEFAULT_TXD;
 	} else
 		adapter->num_tx_desc = igb_txd;
 	if (((igb_rxd * sizeof(struct e1000_rx_desc)) % IGB_DBA_ALIGN) != 0 ||
 	    (igb_rxd > IGB_MAX_RXD) || (igb_rxd < IGB_MIN_RXD)) {
 		device_printf(dev, "Using %d RX descriptors instead of %d!\n",
 		    IGB_DEFAULT_RXD, igb_rxd);
 		adapter->num_rx_desc = IGB_DEFAULT_RXD;
 	} else
 		adapter->num_rx_desc = igb_rxd;
 
 	adapter->hw.mac.autoneg = DO_AUTO_NEG;
 	adapter->hw.phy.autoneg_wait_to_complete = FALSE;
 	adapter->hw.phy.autoneg_advertised = AUTONEG_ADV_DEFAULT;
 
 	/* Copper options */
 	if (adapter->hw.phy.media_type == e1000_media_type_copper) {
 		adapter->hw.phy.mdix = AUTO_ALL_MODES;
 		adapter->hw.phy.disable_polarity_correction = FALSE;
 		adapter->hw.phy.ms_type = IGB_MASTER_SLAVE;
 	}
 
 	/*
 	 * Set the frame limits assuming
 	 * standard ethernet sized frames.
 	 */
 	adapter->max_frame_size = ETHERMTU + ETHER_HDR_LEN + ETHERNET_FCS_SIZE;
 
 	/*
 	** Allocate and Setup Queues
 	*/
 	if (igb_allocate_queues(adapter)) {
 		error = ENOMEM;
 		goto err_pci;
 	}
 
 	/* Allocate the appropriate stats memory */
 	if (adapter->vf_ifp) {
 		adapter->stats =
 		    (struct e1000_vf_stats *)malloc(sizeof \
 		    (struct e1000_vf_stats), M_DEVBUF, M_NOWAIT | M_ZERO);
 		igb_vf_init_stats(adapter);
 	} else
 		adapter->stats =
 		    (struct e1000_hw_stats *)malloc(sizeof \
 		    (struct e1000_hw_stats), M_DEVBUF, M_NOWAIT | M_ZERO);
 	if (adapter->stats == NULL) {
 		device_printf(dev, "Can not allocate stats memory\n");
 		error = ENOMEM;
 		goto err_late;
 	}
 
 	/* Allocate multicast array memory. */
 	adapter->mta = malloc(sizeof(u8) * ETH_ADDR_LEN *
 	    MAX_NUM_MULTICAST_ADDRESSES, M_DEVBUF, M_NOWAIT);
 	if (adapter->mta == NULL) {
 		device_printf(dev, "Can not allocate multicast setup array\n");
 		error = ENOMEM;
 		goto err_late;
 	}
 
 	/* Some adapter-specific advanced features */
 	if (adapter->hw.mac.type >= e1000_i350) {
 		SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev),
 		    SYSCTL_CHILDREN(device_get_sysctl_tree(dev)),
 		    OID_AUTO, "dmac", CTLTYPE_INT|CTLFLAG_RW,
 		    adapter, 0, igb_sysctl_dmac, "I", "DMA Coalesce");
 		SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev),
 		    SYSCTL_CHILDREN(device_get_sysctl_tree(dev)),
 		    OID_AUTO, "eee_disabled", CTLTYPE_INT|CTLFLAG_RW,
 		    adapter, 0, igb_sysctl_eee, "I",
 		    "Disable Energy Efficient Ethernet");
 		if (adapter->hw.phy.media_type == e1000_media_type_copper) {
 			if (adapter->hw.mac.type == e1000_i354)
 				e1000_set_eee_i354(&adapter->hw, TRUE, TRUE);
 			else
 				e1000_set_eee_i350(&adapter->hw, TRUE, TRUE);
 		}
 	}
 
 	/*
 	** Start from a known state; this is
 	** important before reading the NVM
 	** and the MAC address from it.
 	*/
 	e1000_reset_hw(&adapter->hw);
 
 	/* Make sure we have a good EEPROM before we read from it */
 	if (((adapter->hw.mac.type != e1000_i210) &&
 	    (adapter->hw.mac.type != e1000_i211)) &&
 	    (e1000_validate_nvm_checksum(&adapter->hw) < 0)) {
 		/*
 		** Some PCI-E parts fail the first check because
 		** the link is in a sleep state, so call it again;
 		** if it fails a second time it's a real issue.
 		*/
 		if (e1000_validate_nvm_checksum(&adapter->hw) < 0) {
 			device_printf(dev,
 			    "The EEPROM Checksum Is Not Valid\n");
 			error = EIO;
 			goto err_late;
 		}
 	}
 
 	/*
 	** Copy the permanent MAC address out of the EEPROM
 	*/
 	if (e1000_read_mac_addr(&adapter->hw) < 0) {
 		device_printf(dev, "EEPROM read error while reading MAC"
 		    " address\n");
 		error = EIO;
 		goto err_late;
 	}
 	/* Check its sanity */
 	if (!igb_is_valid_ether_addr(adapter->hw.mac.addr)) {
 		device_printf(dev, "Invalid MAC address\n");
 		error = EIO;
 		goto err_late;
 	}
 
 	/* Setup OS specific network interface */
 	if (igb_setup_interface(dev, adapter) != 0) {
 		/* Without this, attach would return 0 on failure */
 		error = ENOMEM;
 		goto err_late;
 	}
 
 	/* Now get a good starting state */
 	igb_reset(adapter);
 
 	/* Initialize statistics */
 	igb_update_stats_counters(adapter);
 
 	adapter->hw.mac.get_link_status = 1;
 	igb_update_link_status(adapter);
 
 	/* Indicate SOL/IDER usage */
 	if (e1000_check_reset_block(&adapter->hw))
 		device_printf(dev,
 		    "PHY reset is blocked due to SOL/IDER session.\n");
 
 	/* Determine if we have to control management hardware */
 	adapter->has_manage = e1000_enable_mng_pass_thru(&adapter->hw);
 
 	/*
 	 * Setup Wake-on-Lan
 	 */
 	/* APME bit in EEPROM is mapped to WUC.APME */
 	eeprom_data = E1000_READ_REG(&adapter->hw, E1000_WUC) & E1000_WUC_APME;
 	if (eeprom_data)
 		adapter->wol = E1000_WUFC_MAG;
 
 	/* Register for VLAN events */
 	adapter->vlan_attach = EVENTHANDLER_REGISTER(vlan_config,
 	     igb_register_vlan, adapter, EVENTHANDLER_PRI_FIRST);
 	adapter->vlan_detach = EVENTHANDLER_REGISTER(vlan_unconfig,
 	     igb_unregister_vlan, adapter, EVENTHANDLER_PRI_FIRST);
 
 	igb_add_hw_stats(adapter);
 
 	/* Tell the stack that the interface is not active */
 	adapter->ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
 	adapter->ifp->if_drv_flags |=  IFF_DRV_OACTIVE;
 
 	adapter->led_dev = led_create(igb_led_func, adapter,
 	    device_get_nameunit(dev));
 
 	/* 
 	** Configure Interrupts
 	*/
 	if ((adapter->msix > 1) && (igb_enable_msix))
 		error = igb_allocate_msix(adapter);
 	else /* MSI or Legacy */
 		error = igb_allocate_legacy(adapter);
 	if (error)
 		goto err_late;
 
 #ifdef DEV_NETMAP
 	igb_netmap_attach(adapter);
 #endif /* DEV_NETMAP */
 	INIT_DEBUGOUT("igb_attach: end");
 
 	return (0);
 
 err_late:
 	if (igb_detach(dev) == 0) /* igb_detach() already did the cleanup */
 		return (error);
 	igb_free_transmit_structures(adapter);
 	igb_free_receive_structures(adapter);
 	igb_release_hw_control(adapter);
 err_pci:
 	igb_free_pci_resources(adapter);
 	if (adapter->ifp != NULL)
 		if_free(adapter->ifp);
 	free(adapter->mta, M_DEVBUF);
 	IGB_CORE_LOCK_DESTROY(adapter);
 
 	return (error);
 }
 
 /*********************************************************************
  *  Device removal routine
  *
  *  The detach entry point is called when the driver is being removed.
  *  This routine stops the adapter and deallocates all the resources
  *  that were allocated for driver operation.
  *
  *  return 0 on success, positive on failure
  *********************************************************************/
 
 static int
 igb_detach(device_t dev)
 {
 	struct adapter	*adapter = device_get_softc(dev);
 	struct ifnet	*ifp = adapter->ifp;
 
 	INIT_DEBUGOUT("igb_detach: begin");
 
 	/* Make sure VLANS are not using driver */
 	if (adapter->ifp->if_vlantrunk != NULL) {
 		device_printf(dev, "Vlan in use, detach first\n");
 		return (EBUSY);
 	}
 
 	ether_ifdetach(adapter->ifp);
 
 	if (adapter->led_dev != NULL)
 		led_destroy(adapter->led_dev);
 
 #ifdef DEVICE_POLLING
 	if (ifp->if_capenable & IFCAP_POLLING)
 		ether_poll_deregister(ifp);
 #endif
 
 	IGB_CORE_LOCK(adapter);
 	adapter->in_detach = 1;
 	igb_stop(adapter);
 	IGB_CORE_UNLOCK(adapter);
 
 	e1000_phy_hw_reset(&adapter->hw);
 
 	/* Give control back to firmware */
 	igb_release_manageability(adapter);
 	igb_release_hw_control(adapter);
 
 	if (adapter->wol) {
 		E1000_WRITE_REG(&adapter->hw, E1000_WUC, E1000_WUC_PME_EN);
 		E1000_WRITE_REG(&adapter->hw, E1000_WUFC, adapter->wol);
 		igb_enable_wakeup(dev);
 	}
 
 	/* Unregister VLAN events */
 	if (adapter->vlan_attach != NULL)
 		EVENTHANDLER_DEREGISTER(vlan_config, adapter->vlan_attach);
 	if (adapter->vlan_detach != NULL)
 		EVENTHANDLER_DEREGISTER(vlan_unconfig, adapter->vlan_detach);
 
 	callout_drain(&adapter->timer);
 
 #ifdef DEV_NETMAP
 	netmap_detach(adapter->ifp);
 #endif /* DEV_NETMAP */
 	igb_free_pci_resources(adapter);
 	bus_generic_detach(dev);
 	if_free(ifp);
 
 	igb_free_transmit_structures(adapter);
 	igb_free_receive_structures(adapter);
 	if (adapter->mta != NULL)
 		free(adapter->mta, M_DEVBUF);
 
 	IGB_CORE_LOCK_DESTROY(adapter);
 
 	return (0);
 }
 
 /*********************************************************************
  *
  *  Shutdown entry point
  *
  **********************************************************************/
 
 static int
 igb_shutdown(device_t dev)
 {
 	return igb_suspend(dev);
 }
 
 /*
  * Suspend/resume device methods.
  */
 static int
 igb_suspend(device_t dev)
 {
 	struct adapter *adapter = device_get_softc(dev);
 
 	IGB_CORE_LOCK(adapter);
 
 	igb_stop(adapter);
 
 	igb_release_manageability(adapter);
 	igb_release_hw_control(adapter);
 
 	if (adapter->wol) {
 		E1000_WRITE_REG(&adapter->hw, E1000_WUC, E1000_WUC_PME_EN);
 		E1000_WRITE_REG(&adapter->hw, E1000_WUFC, adapter->wol);
 		igb_enable_wakeup(dev);
 	}
 
 	IGB_CORE_UNLOCK(adapter);
 
 	return bus_generic_suspend(dev);
 }
 
 static int
 igb_resume(device_t dev)
 {
 	struct adapter *adapter = device_get_softc(dev);
 	struct tx_ring	*txr = adapter->tx_rings;
 	struct ifnet *ifp = adapter->ifp;
 
 	IGB_CORE_LOCK(adapter);
 	igb_init_locked(adapter);
 	igb_init_manageability(adapter);
 
 	if ((ifp->if_flags & IFF_UP) &&
 	    (ifp->if_drv_flags & IFF_DRV_RUNNING) && adapter->link_active) {
 		for (int i = 0; i < adapter->num_queues; i++, txr++) {
 			IGB_TX_LOCK(txr);
 #ifndef IGB_LEGACY_TX
 			/* Process the stack queue only if not depleted */
 			if (((txr->queue_status & IGB_QUEUE_DEPLETED) == 0) &&
 			    !drbr_empty(ifp, txr->br))
 				igb_mq_start_locked(ifp, txr);
 #else
 			if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
 				igb_start_locked(txr, ifp);
 #endif
 			IGB_TX_UNLOCK(txr);
 		}
 	}
 	IGB_CORE_UNLOCK(adapter);
 
 	return bus_generic_resume(dev);
 }
 
 
 #ifdef IGB_LEGACY_TX
 
 /*********************************************************************
  *  Transmit entry point
  *
  *  igb_start is called by the stack to initiate a transmit.
  *  The driver will remain in this routine as long as there are
  *  packets to transmit and transmit resources are available.
  *  If resources are not available, the stack is notified and
  *  the packet is requeued.
  **********************************************************************/
 
 static void
 igb_start_locked(struct tx_ring *txr, struct ifnet *ifp)
 {
 	struct adapter	*adapter = ifp->if_softc;
 	struct mbuf	*m_head;
 
 	IGB_TX_LOCK_ASSERT(txr);
 
 	if ((ifp->if_drv_flags & (IFF_DRV_RUNNING|IFF_DRV_OACTIVE)) !=
 	    IFF_DRV_RUNNING)
 		return;
 	if (!adapter->link_active)
 		return;
 
 	/* Call cleanup if number of TX descriptors low */
 	if (txr->tx_avail <= IGB_TX_CLEANUP_THRESHOLD)
 		igb_txeof(txr);
 
 	while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) {
 		if (txr->tx_avail <= IGB_MAX_SCATTER) {
 			txr->queue_status |= IGB_QUEUE_DEPLETED;
 			break;
 		}
 		IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head);
 		if (m_head == NULL)
 			break;
 		/*
 		 *  Encapsulation can modify our pointer, and/or make it
 		 *  NULL on failure.  In that event, we can't requeue.
 		 */
 		if (igb_xmit(txr, &m_head)) {
 			if (m_head != NULL)
 				IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
 			if (txr->tx_avail <= IGB_MAX_SCATTER)
 				txr->queue_status |= IGB_QUEUE_DEPLETED;
 			break;
 		}
 
 		/* Send a copy of the frame to the BPF listener */
 		ETHER_BPF_MTAP(ifp, m_head);
 
 		/* Set watchdog on */
 		txr->watchdog_time = ticks;
 		txr->queue_status |= IGB_QUEUE_WORKING;
 	}
 }
  
 /*
  * Legacy TX driver routine, called from the
  * stack, always uses tx[0], and spins for it.
  * Should not be used with multiqueue tx
  */
 static void
 igb_start(struct ifnet *ifp)
 {
 	struct adapter	*adapter = ifp->if_softc;
 	struct tx_ring	*txr = adapter->tx_rings;
 
 	if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
 		IGB_TX_LOCK(txr);
 		igb_start_locked(txr, ifp);
 		IGB_TX_UNLOCK(txr);
 	}
 	return;
 }
 
 #else /* ~IGB_LEGACY_TX */
 
 /*
 ** Multiqueue Transmit Entry:
 **  quick turnaround to the stack
 **
 */
 static int
 igb_mq_start(struct ifnet *ifp, struct mbuf *m)
 {
 	struct adapter		*adapter = ifp->if_softc;
 	struct igb_queue	*que;
 	struct tx_ring		*txr;
 	int 			i, err = 0;
 #ifdef	RSS
 	uint32_t		bucket_id;
 #endif
 
 	/* Which queue to use */
 	/*
 	 * When doing RSS, map it to the same outbound queue
 	 * as the incoming flow would be mapped to.
 	 *
 	 * If everything is set up correctly, it should be the
 	 * same bucket as the one for the CPU we are running on.
 	 */
 	if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) {
 #ifdef	RSS
 		if (rss_hash2bucket(m->m_pkthdr.flowid,
 		    M_HASHTYPE_GET(m), &bucket_id) == 0) {
 			/* XXX TODO: spit out something if bucket_id > num_queues? */
 			i = bucket_id % adapter->num_queues;
 		} else {
 #endif
 			i = m->m_pkthdr.flowid % adapter->num_queues;
 #ifdef	RSS
 		}
 #endif
 	} else {
 		i = curcpu % adapter->num_queues;
 	}
 	txr = &adapter->tx_rings[i];
 	que = &adapter->queues[i];
 
 	err = drbr_enqueue(ifp, txr->br, m);
 	if (err)
 		return (err);
 	if (IGB_TX_TRYLOCK(txr)) {
 		igb_mq_start_locked(ifp, txr);
 		IGB_TX_UNLOCK(txr);
 	} else
 		taskqueue_enqueue(que->tq, &txr->txq_task);
 
 	return (0);
 }
 
 static int
 igb_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr)
 {
 	struct adapter	*adapter = txr->adapter;
 	struct mbuf	*next;
 	int		err = 0, enq = 0;
 
 	IGB_TX_LOCK_ASSERT(txr);
 
 	if (((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) ||
 	    adapter->link_active == 0)
 		return (ENETDOWN);
 
 	/* Process the queue */
 	while ((next = drbr_peek(ifp, txr->br)) != NULL) {
 		if ((err = igb_xmit(txr, &next)) != 0) {
 			if (next == NULL) {
 				/* It was freed, move forward */
 				drbr_advance(ifp, txr->br);
 			} else {
 				/* 
 				 * Still have one left, it may not be
 				 * the same since the transmit function
 				 * may have changed it.
 				 */
 				drbr_putback(ifp, txr->br, next);
 			}
 			break;
 		}
 		drbr_advance(ifp, txr->br);
 		enq++;
 		if (next->m_flags & M_MCAST && adapter->vf_ifp)
 			if_inc_counter(ifp, IFCOUNTER_OMCASTS, 1);
 		ETHER_BPF_MTAP(ifp, next);
 		if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
 			break;
 	}
 	if (enq > 0) {
 		/* Set the watchdog */
 		txr->queue_status |= IGB_QUEUE_WORKING;
 		txr->watchdog_time = ticks;
 	}
 	if (txr->tx_avail <= IGB_TX_CLEANUP_THRESHOLD)
 		igb_txeof(txr);
 	if (txr->tx_avail <= IGB_MAX_SCATTER)
 		txr->queue_status |= IGB_QUEUE_DEPLETED;
 	return (err);
 }
 
 /*
  * Called from a taskqueue to drain queued transmit packets.
  */
 static void
 igb_deferred_mq_start(void *arg, int pending)
 {
 	struct tx_ring *txr = arg;
 	struct adapter *adapter = txr->adapter;
 	struct ifnet *ifp = adapter->ifp;
 
 	IGB_TX_LOCK(txr);
 	if (!drbr_empty(ifp, txr->br))
 		igb_mq_start_locked(ifp, txr);
 	IGB_TX_UNLOCK(txr);
 }
 
 /*
 ** Flush all ring buffers
 */
 static void
 igb_qflush(struct ifnet *ifp)
 {
 	struct adapter	*adapter = ifp->if_softc;
 	struct tx_ring	*txr = adapter->tx_rings;
 	struct mbuf	*m;
 
 	for (int i = 0; i < adapter->num_queues; i++, txr++) {
 		IGB_TX_LOCK(txr);
 		while ((m = buf_ring_dequeue_sc(txr->br)) != NULL)
 			m_freem(m);
 		IGB_TX_UNLOCK(txr);
 	}
 	if_qflush(ifp);
 }
 #endif /* ~IGB_LEGACY_TX */
 
 /*********************************************************************
  *  Ioctl entry point
  *
  *  igb_ioctl is called when the user wants to configure the
  *  interface.
  *
  *  return 0 on success, positive on failure
  **********************************************************************/
 
 static int
 igb_ioctl(struct ifnet *ifp, u_long command, caddr_t data)
 {
 	struct adapter	*adapter = ifp->if_softc;
 	struct ifreq	*ifr = (struct ifreq *)data;
 #if defined(INET) || defined(INET6)
 	struct ifaddr	*ifa = (struct ifaddr *)data;
 #endif
 	bool		avoid_reset = FALSE;
 	int		error = 0;
 
 	if (adapter->in_detach)
 		return (error);
 
 	switch (command) {
 	case SIOCSIFADDR:
 #ifdef INET
 		if (ifa->ifa_addr->sa_family == AF_INET)
 			avoid_reset = TRUE;
 #endif
 #ifdef INET6
 		if (ifa->ifa_addr->sa_family == AF_INET6)
 			avoid_reset = TRUE;
 #endif
 		/*
 		** Calling init results in link renegotiation,
 		** so we avoid doing it when possible.
 		*/
 		if (avoid_reset) {
 			ifp->if_flags |= IFF_UP;
 			if (!(ifp->if_drv_flags & IFF_DRV_RUNNING))
 				igb_init(adapter);
 #ifdef INET
 			if (!(ifp->if_flags & IFF_NOARP))
 				arp_ifinit(ifp, ifa);
 #endif
 		} else
 			error = ether_ioctl(ifp, command, data);
 		break;
 	case SIOCSIFMTU:
 	    {
 		int max_frame_size;
 
 		IOCTL_DEBUGOUT("ioctl rcv'd: SIOCSIFMTU (Set Interface MTU)");
 
 		IGB_CORE_LOCK(adapter);
 		max_frame_size = 9234;
 		if (ifr->ifr_mtu > max_frame_size - ETHER_HDR_LEN -
 		    ETHER_CRC_LEN) {
 			IGB_CORE_UNLOCK(adapter);
 			error = EINVAL;
 			break;
 		}
 
 		ifp->if_mtu = ifr->ifr_mtu;
 		adapter->max_frame_size =
 		    ifp->if_mtu + ETHER_HDR_LEN + ETHER_CRC_LEN;
 		igb_init_locked(adapter);
 		IGB_CORE_UNLOCK(adapter);
 		break;
 	    }
 	case SIOCSIFFLAGS:
 		IOCTL_DEBUGOUT("ioctl rcv'd: "
 		    "SIOCSIFFLAGS (Set Interface Flags)");
 		IGB_CORE_LOCK(adapter);
 		if (ifp->if_flags & IFF_UP) {
 			if ((ifp->if_drv_flags & IFF_DRV_RUNNING)) {
 				if ((ifp->if_flags ^ adapter->if_flags) &
 				    (IFF_PROMISC | IFF_ALLMULTI)) {
 					igb_disable_promisc(adapter);
 					igb_set_promisc(adapter);
 				}
 			} else
 				igb_init_locked(adapter);
 		} else
 			if (ifp->if_drv_flags & IFF_DRV_RUNNING)
 				igb_stop(adapter);
 		adapter->if_flags = ifp->if_flags;
 		IGB_CORE_UNLOCK(adapter);
 		break;
 	case SIOCADDMULTI:
 	case SIOCDELMULTI:
 		IOCTL_DEBUGOUT("ioctl rcv'd: SIOC(ADD|DEL)MULTI");
 		if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
 			IGB_CORE_LOCK(adapter);
 			igb_disable_intr(adapter);
 			igb_set_multi(adapter);
 #ifdef DEVICE_POLLING
 			if (!(ifp->if_capenable & IFCAP_POLLING))
 #endif
 				igb_enable_intr(adapter);
 			IGB_CORE_UNLOCK(adapter);
 		}
 		break;
 	case SIOCSIFMEDIA:
 		/* Check SOL/IDER usage */
 		IGB_CORE_LOCK(adapter);
 		if (e1000_check_reset_block(&adapter->hw)) {
 			IGB_CORE_UNLOCK(adapter);
 			device_printf(adapter->dev, "Media change is"
 			    " blocked due to SOL/IDER session.\n");
 			break;
 		}
 		IGB_CORE_UNLOCK(adapter);
 		/* FALLTHROUGH */
 	case SIOCGIFMEDIA:
 		IOCTL_DEBUGOUT("ioctl rcv'd: "
 		    "SIOCxIFMEDIA (Get/Set Interface Media)");
 		error = ifmedia_ioctl(ifp, ifr, &adapter->media, command);
 		break;
 	case SIOCSIFCAP:
 	    {
 		int mask, reinit;
 
 		IOCTL_DEBUGOUT("ioctl rcv'd: SIOCSIFCAP (Set Capabilities)");
 		reinit = 0;
 		mask = ifr->ifr_reqcap ^ ifp->if_capenable;
 #ifdef DEVICE_POLLING
 		if (mask & IFCAP_POLLING) {
 			if (ifr->ifr_reqcap & IFCAP_POLLING) {
 				error = ether_poll_register(igb_poll, ifp);
 				if (error)
 					return (error);
 				IGB_CORE_LOCK(adapter);
 				igb_disable_intr(adapter);
 				ifp->if_capenable |= IFCAP_POLLING;
 				IGB_CORE_UNLOCK(adapter);
 			} else {
 				error = ether_poll_deregister(ifp);
 				/* Enable interrupt even in error case */
 				IGB_CORE_LOCK(adapter);
 				igb_enable_intr(adapter);
 				ifp->if_capenable &= ~IFCAP_POLLING;
 				IGB_CORE_UNLOCK(adapter);
 			}
 		}
 #endif
 #if __FreeBSD_version >= 1000000
 		/* HW cannot turn these on/off separately */
 		if (mask & (IFCAP_RXCSUM | IFCAP_RXCSUM_IPV6)) {
 			ifp->if_capenable ^= IFCAP_RXCSUM;
 			ifp->if_capenable ^= IFCAP_RXCSUM_IPV6;
 			reinit = 1;
 		}
 		if (mask & IFCAP_TXCSUM) {
 			ifp->if_capenable ^= IFCAP_TXCSUM;
 			reinit = 1;
 		}
 		if (mask & IFCAP_TXCSUM_IPV6) {
 			ifp->if_capenable ^= IFCAP_TXCSUM_IPV6;
 			reinit = 1;
 		}
 #else
 		if (mask & IFCAP_HWCSUM) {
 			ifp->if_capenable ^= IFCAP_HWCSUM;
 			reinit = 1;
 		}
 #endif
 		if (mask & IFCAP_TSO4) {
 			ifp->if_capenable ^= IFCAP_TSO4;
 			reinit = 1;
 		}
 		if (mask & IFCAP_TSO6) {
 			ifp->if_capenable ^= IFCAP_TSO6;
 			reinit = 1;
 		}
 		if (mask & IFCAP_VLAN_HWTAGGING) {
 			ifp->if_capenable ^= IFCAP_VLAN_HWTAGGING;
 			reinit = 1;
 		}
 		if (mask & IFCAP_VLAN_HWFILTER) {
 			ifp->if_capenable ^= IFCAP_VLAN_HWFILTER;
 			reinit = 1;
 		}
 		if (mask & IFCAP_VLAN_HWTSO) {
 			ifp->if_capenable ^= IFCAP_VLAN_HWTSO;
 			reinit = 1;
 		}
 		if (mask & IFCAP_LRO) {
 			ifp->if_capenable ^= IFCAP_LRO;
 			reinit = 1;
 		}
 		if (reinit && (ifp->if_drv_flags & IFF_DRV_RUNNING))
 			igb_init(adapter);
 		VLAN_CAPABILITIES(ifp);
 		break;
 	    }
 
 	default:
 		error = ether_ioctl(ifp, command, data);
 		break;
 	}
 
 	return (error);
 }
 
 
 /*********************************************************************
  *  Init entry point
  *
  *  This routine is used in two ways. It is used by the stack as
  *  the init entry point in the network interface structure. It is
  *  also used by the driver as a hw/sw initialization routine to get
  *  to a consistent state.
  *
  *  Returns nothing; failures are reported via device_printf().
  **********************************************************************/
 
 static void
 igb_init_locked(struct adapter *adapter)
 {
 	struct ifnet	*ifp = adapter->ifp;
 	device_t	dev = adapter->dev;
 
 	INIT_DEBUGOUT("igb_init: begin");
 
 	IGB_CORE_LOCK_ASSERT(adapter);
 
 	igb_disable_intr(adapter);
 	callout_stop(&adapter->timer);
 
 	/* Get the latest MAC address; the user may have set an LAA */
 	bcopy(IF_LLADDR(adapter->ifp), adapter->hw.mac.addr,
 	    ETHER_ADDR_LEN);
 
 	/* Put the address into the Receive Address Array */
 	e1000_rar_set(&adapter->hw, adapter->hw.mac.addr, 0);
 
 	igb_reset(adapter);
 	igb_update_link_status(adapter);
 
 	E1000_WRITE_REG(&adapter->hw, E1000_VET, ETHERTYPE_VLAN);
 
 	/* Set hardware offload abilities */
 	ifp->if_hwassist = 0;
 	if (ifp->if_capenable & IFCAP_TXCSUM) {
 #if __FreeBSD_version >= 1000000
 		ifp->if_hwassist |= (CSUM_IP_TCP | CSUM_IP_UDP);
 		if (adapter->hw.mac.type != e1000_82575)
 			ifp->if_hwassist |= CSUM_IP_SCTP;
 #else
 		ifp->if_hwassist |= (CSUM_TCP | CSUM_UDP);
 #if __FreeBSD_version >= 800000
 		if (adapter->hw.mac.type != e1000_82575)
 			ifp->if_hwassist |= CSUM_SCTP;
 #endif
 #endif
 	}
 
 #if __FreeBSD_version >= 1000000
 	if (ifp->if_capenable & IFCAP_TXCSUM_IPV6) {
 		ifp->if_hwassist |= (CSUM_IP6_TCP | CSUM_IP6_UDP);
 		if (adapter->hw.mac.type != e1000_82575)
 			ifp->if_hwassist |= CSUM_IP6_SCTP;
 	}
 #endif
 	if (ifp->if_capenable & IFCAP_TSO)
 		ifp->if_hwassist |= CSUM_TSO;
 
 	/* Clear bad data from Rx FIFOs */
 	e1000_rx_fifo_flush_82575(&adapter->hw);
 
 	/* Configure for OS presence */
 	igb_init_manageability(adapter);
 
 	/* Prepare transmit descriptors and buffers */
 	igb_setup_transmit_structures(adapter);
 	igb_initialize_transmit_units(adapter);
 
 	/* Setup Multicast table */
 	igb_set_multi(adapter);
 
 	/*
 	** Figure out the desired mbuf pool
 	** for doing jumbo/packetsplit
 	*/
 	if (adapter->max_frame_size <= 2048)
 		adapter->rx_mbuf_sz = MCLBYTES;
 	else if (adapter->max_frame_size <= 4096)
 		adapter->rx_mbuf_sz = MJUMPAGESIZE;
 	else
 		adapter->rx_mbuf_sz = MJUM9BYTES;
 
 	/* Prepare receive descriptors and buffers */
 	if (igb_setup_receive_structures(adapter)) {
 		device_printf(dev, "Could not setup receive structures\n");
 		return;
 	}
 	igb_initialize_receive_units(adapter);
 
 	/* Enable VLAN support */
 	if (ifp->if_capenable & IFCAP_VLAN_HWTAGGING)
 		igb_setup_vlan_hw_support(adapter);
 
 	/* Don't lose promiscuous settings */
 	igb_set_promisc(adapter);
 
 	ifp->if_drv_flags |= IFF_DRV_RUNNING;
 	ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
 
 	callout_reset(&adapter->timer, hz, igb_local_timer, adapter);
 	e1000_clear_hw_cntrs_base_generic(&adapter->hw);
 
 	if (adapter->msix > 1) /* Set up queue routing */
 		igb_configure_queues(adapter);
 
 	/* this clears any pending interrupts */
 	E1000_READ_REG(&adapter->hw, E1000_ICR);
 #ifdef DEVICE_POLLING
 	/*
 	 * Only enable interrupts if we are not polling, make sure
 	 * they are off otherwise.
 	 */
 	if (ifp->if_capenable & IFCAP_POLLING)
 		igb_disable_intr(adapter);
 	else
 #endif /* DEVICE_POLLING */
 	{
 		igb_enable_intr(adapter);
 		E1000_WRITE_REG(&adapter->hw, E1000_ICS, E1000_ICS_LSC);
 	}
 
 	/* Set Energy Efficient Ethernet */
 	if (adapter->hw.phy.media_type == e1000_media_type_copper) {
 		if (adapter->hw.mac.type == e1000_i354)
 			e1000_set_eee_i354(&adapter->hw, TRUE, TRUE);
 		else
 			e1000_set_eee_i350(&adapter->hw, TRUE, TRUE);
 	}
 }
 
 static void
 igb_init(void *arg)
 {
 	struct adapter *adapter = arg;
 
 	IGB_CORE_LOCK(adapter);
 	igb_init_locked(adapter);
 	IGB_CORE_UNLOCK(adapter);
 }
 
 
 static void
 igb_handle_que(void *context, int pending)
 {
 	struct igb_queue *que = context;
 	struct adapter *adapter = que->adapter;
 	struct tx_ring *txr = que->txr;
 	struct ifnet	*ifp = adapter->ifp;
 
 	if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
 		bool	more;
 
 		more = igb_rxeof(que, adapter->rx_process_limit, NULL);
 
 		IGB_TX_LOCK(txr);
 		igb_txeof(txr);
 #ifndef IGB_LEGACY_TX
 		/* Process the stack queue only if not depleted */
 		if (((txr->queue_status & IGB_QUEUE_DEPLETED) == 0) &&
 		    !drbr_empty(ifp, txr->br))
 			igb_mq_start_locked(ifp, txr);
 #else
 		if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
 			igb_start_locked(txr, ifp);
 #endif
 		IGB_TX_UNLOCK(txr);
 		/* Do we need another? */
 		if (more) {
 			taskqueue_enqueue(que->tq, &que->que_task);
 			return;
 		}
 	}
 
 #ifdef DEVICE_POLLING
 	if (ifp->if_capenable & IFCAP_POLLING)
 		return;
 #endif
 	/* Reenable this interrupt */
 	if (que->eims)
 		E1000_WRITE_REG(&adapter->hw, E1000_EIMS, que->eims);
 	else
 		igb_enable_intr(adapter);
 }
 
 /* Deal with link in a sleepable context */
 static void
 igb_handle_link(void *context, int pending)
 {
 	struct adapter *adapter = context;
 
 	IGB_CORE_LOCK(adapter);
 	igb_handle_link_locked(adapter);
 	IGB_CORE_UNLOCK(adapter);
 }
 
 static void
 igb_handle_link_locked(struct adapter *adapter)
 {
 	struct tx_ring	*txr = adapter->tx_rings;
 	struct ifnet *ifp = adapter->ifp;
 
 	IGB_CORE_LOCK_ASSERT(adapter);
 	adapter->hw.mac.get_link_status = 1;
 	igb_update_link_status(adapter);
 	if ((ifp->if_drv_flags & IFF_DRV_RUNNING) && adapter->link_active) {
 		for (int i = 0; i < adapter->num_queues; i++, txr++) {
 			IGB_TX_LOCK(txr);
 #ifndef IGB_LEGACY_TX
 			/* Process the stack queue only if not depleted */
 			if (((txr->queue_status & IGB_QUEUE_DEPLETED) == 0) &&
 			    !drbr_empty(ifp, txr->br))
 				igb_mq_start_locked(ifp, txr);
 #else
 			if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
 				igb_start_locked(txr, ifp);
 #endif
 			IGB_TX_UNLOCK(txr);
 		}
 	}
 }
 
 /*********************************************************************
  *
  *  MSI/Legacy Deferred
  *  Interrupt Service routine
  *
  *********************************************************************/
 static int
 igb_irq_fast(void *arg)
 {
 	struct adapter		*adapter = arg;
 	struct igb_queue	*que = adapter->queues;
 	u32			reg_icr;
 
 
 	reg_icr = E1000_READ_REG(&adapter->hw, E1000_ICR);
 
 	/* Hot eject?  */
 	if (reg_icr == 0xffffffff)
 		return FILTER_STRAY;
 
 	/* Definitely not our interrupt.  */
 	if (reg_icr == 0x0)
 		return FILTER_STRAY;
 
 	if ((reg_icr & E1000_ICR_INT_ASSERTED) == 0)
 		return FILTER_STRAY;
 
 	/*
 	 * Mask interrupts until the taskqueue is finished running.  This is
 	 * cheap, just assume that it is needed.  This also works around the
 	 * MSI message reordering errata on certain systems.
 	 */
 	igb_disable_intr(adapter);
 	taskqueue_enqueue(que->tq, &que->que_task);
 
 	/* Link status change */
 	if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))
 		taskqueue_enqueue(que->tq, &adapter->link_task);
 
 	if (reg_icr & E1000_ICR_RXO)
 		adapter->rx_overruns++;
 	return FILTER_HANDLED;
 }
 
 #ifdef DEVICE_POLLING
 #if __FreeBSD_version >= 800000
 #define POLL_RETURN_COUNT(a) (a)
 static int
 #else
 #define POLL_RETURN_COUNT(a)
 static void
 #endif
 igb_poll(struct ifnet *ifp, enum poll_cmd cmd, int count)
 {
 	struct adapter		*adapter = ifp->if_softc;
 	struct igb_queue	*que;
 	struct tx_ring		*txr;
 	u32			reg_icr, rx_done = 0;
 	u32			loop = IGB_MAX_LOOP;
 	bool			more;
 
 	IGB_CORE_LOCK(adapter);
 	if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) {
 		IGB_CORE_UNLOCK(adapter);
 		return POLL_RETURN_COUNT(rx_done);
 	}
 
 	if (cmd == POLL_AND_CHECK_STATUS) {
 		reg_icr = E1000_READ_REG(&adapter->hw, E1000_ICR);
 		/* Link status change */
 		if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))
 			igb_handle_link_locked(adapter);
 
 		if (reg_icr & E1000_ICR_RXO)
 			adapter->rx_overruns++;
 	}
 	IGB_CORE_UNLOCK(adapter);
 
 	for (int i = 0; i < adapter->num_queues; i++) {
 		que = &adapter->queues[i];
 		txr = que->txr;
 
 		igb_rxeof(que, count, &rx_done);
 
 		IGB_TX_LOCK(txr);
 		do {
 			more = igb_txeof(txr);
 		} while (loop-- && more);
 #ifndef IGB_LEGACY_TX
 		if (!drbr_empty(ifp, txr->br))
 			igb_mq_start_locked(ifp, txr);
 #else
 		if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
 			igb_start_locked(txr, ifp);
 #endif
 		IGB_TX_UNLOCK(txr);
 	}
 
 	return POLL_RETURN_COUNT(rx_done);
 }
 #endif /* DEVICE_POLLING */
 
 /*********************************************************************
  *
  *  MSIX Que Interrupt Service routine
  *
  **********************************************************************/
 static void
 igb_msix_que(void *arg)
 {
 	struct igb_queue *que = arg;
 	struct adapter *adapter = que->adapter;
 	struct ifnet   *ifp = adapter->ifp;
 	struct tx_ring *txr = que->txr;
 	struct rx_ring *rxr = que->rxr;
 	u32		newitr = 0;
 	bool		more_rx;
 
 	/* Ignore spurious interrupts */
 	if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
 		return;
 
 	E1000_WRITE_REG(&adapter->hw, E1000_EIMC, que->eims);
 	++que->irqs;
 
 	IGB_TX_LOCK(txr);
 	igb_txeof(txr);
 #ifndef IGB_LEGACY_TX
 	/* Process the stack queue only if not depleted */
 	if (((txr->queue_status & IGB_QUEUE_DEPLETED) == 0) &&
 	    !drbr_empty(ifp, txr->br))
 		igb_mq_start_locked(ifp, txr);
 #else
 	if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
 		igb_start_locked(txr, ifp);
 #endif
 	IGB_TX_UNLOCK(txr);
 
 	more_rx = igb_rxeof(que, adapter->rx_process_limit, NULL);
 
 	if (adapter->enable_aim == FALSE)
 		goto no_calc;
 	/*
 	** Do Adaptive Interrupt Moderation:
 	**  - Write out last calculated setting
 	**  - Calculate based on average size over
 	**    the last interval.
 	*/
 	if (que->eitr_setting)
 		E1000_WRITE_REG(&adapter->hw,
 		    E1000_EITR(que->msix), que->eitr_setting);
 
 	que->eitr_setting = 0;
 
 	/* Idle, do nothing */
 	if ((txr->bytes == 0) && (rxr->bytes == 0))
 		goto no_calc;
 
 	/* Use half the default if sub-gig */
 	if (adapter->link_speed != 1000)
 		newitr = IGB_DEFAULT_ITR / 2;
 	else {
 		if ((txr->bytes) && (txr->packets))
 			newitr = txr->bytes/txr->packets;
 		if ((rxr->bytes) && (rxr->packets))
 			newitr = max(newitr,
 			    (rxr->bytes / rxr->packets));
 		newitr += 24; /* account for hardware frame, crc */
 		/* set an upper boundary */
 		newitr = min(newitr, 3000);
 		/* Be nice to the mid range */
 		if ((newitr > 300) && (newitr < 1200))
 			newitr = (newitr / 3);
 		else
 			newitr = (newitr / 2);
 	}
 	newitr &= 0x7FFC;  /* Mask invalid bits */
 	if (adapter->hw.mac.type == e1000_82575)
 		newitr |= newitr << 16;
 	else
 		newitr |= E1000_EITR_CNT_IGNR;
 
 	/* save for next interrupt */
 	que->eitr_setting = newitr;
 
 	/* Reset state */
 	txr->bytes = 0;
 	txr->packets = 0;
 	rxr->bytes = 0;
 	rxr->packets = 0;
 
 no_calc:
 	/* Schedule a clean task if needed */
 	if (more_rx)
 		taskqueue_enqueue(que->tq, &que->que_task);
 	else
 		/* Reenable this interrupt */
 		E1000_WRITE_REG(&adapter->hw, E1000_EIMS, que->eims);
 	return;
 }
 
 
 /*********************************************************************
  *
  *  MSIX Link Interrupt Service routine
  *
  **********************************************************************/
 
 static void
 igb_msix_link(void *arg)
 {
 	struct adapter	*adapter = arg;
 	u32       	icr;
 
 	++adapter->link_irq;
 	icr = E1000_READ_REG(&adapter->hw, E1000_ICR);
 	if (!(icr & E1000_ICR_LSC))
 		goto spurious;
 	igb_handle_link(adapter, 0);
 
 spurious:
 	/* Rearm */
 	E1000_WRITE_REG(&adapter->hw, E1000_IMS, E1000_IMS_LSC);
 	E1000_WRITE_REG(&adapter->hw, E1000_EIMS, adapter->link_mask);
 	return;
 }
 
 
 /*********************************************************************
  *
  *  Media Ioctl callback
  *
  *  This routine is called whenever the user queries the status of
  *  the interface using ifconfig.
  *
  **********************************************************************/
 static void
 igb_media_status(struct ifnet *ifp, struct ifmediareq *ifmr)
 {
 	struct adapter *adapter = ifp->if_softc;
 
 	INIT_DEBUGOUT("igb_media_status: begin");
 
 	IGB_CORE_LOCK(adapter);
 	igb_update_link_status(adapter);
 
 	ifmr->ifm_status = IFM_AVALID;
 	ifmr->ifm_active = IFM_ETHER;
 
 	if (!adapter->link_active) {
 		IGB_CORE_UNLOCK(adapter);
 		return;
 	}
 
 	ifmr->ifm_status |= IFM_ACTIVE;
 
 	switch (adapter->link_speed) {
 	case 10:
 		ifmr->ifm_active |= IFM_10_T;
 		break;
 	case 100:
 		/*
 		** Support for 100Mb SFP - these are Fiber 
 		** but the media type appears as serdes
 		*/
 		if (adapter->hw.phy.media_type ==
 		    e1000_media_type_internal_serdes)
 			ifmr->ifm_active |= IFM_100_FX;
 		else
 			ifmr->ifm_active |= IFM_100_TX;
 		break;
 	case 1000:
 		ifmr->ifm_active |= IFM_1000_T;
 		break;
 	case 2500:
 		ifmr->ifm_active |= IFM_2500_SX;
 		break;
 	}
 
 	if (adapter->link_duplex == FULL_DUPLEX)
 		ifmr->ifm_active |= IFM_FDX;
 	else
 		ifmr->ifm_active |= IFM_HDX;
 
 	IGB_CORE_UNLOCK(adapter);
 }
 
 /*********************************************************************
  *
  *  Media Ioctl callback
  *
  *  This routine is called when the user changes speed/duplex using
  *  media/mediaopt options with ifconfig.
  *
  **********************************************************************/
 static int
 igb_media_change(struct ifnet *ifp)
 {
 	struct adapter *adapter = ifp->if_softc;
 	struct ifmedia  *ifm = &adapter->media;
 
 	INIT_DEBUGOUT("igb_media_change: begin");
 
 	if (IFM_TYPE(ifm->ifm_media) != IFM_ETHER)
 		return (EINVAL);
 
 	IGB_CORE_LOCK(adapter);
 	switch (IFM_SUBTYPE(ifm->ifm_media)) {
 	case IFM_AUTO:
 		adapter->hw.mac.autoneg = DO_AUTO_NEG;
 		adapter->hw.phy.autoneg_advertised = AUTONEG_ADV_DEFAULT;
 		break;
 	case IFM_1000_LX:
 	case IFM_1000_SX:
 	case IFM_1000_T:
 		adapter->hw.mac.autoneg = DO_AUTO_NEG;
 		adapter->hw.phy.autoneg_advertised = ADVERTISE_1000_FULL;
 		break;
 	case IFM_100_TX:
 		adapter->hw.mac.autoneg = FALSE;
 		adapter->hw.phy.autoneg_advertised = 0;
 		if ((ifm->ifm_media & IFM_GMASK) == IFM_FDX)
 			adapter->hw.mac.forced_speed_duplex = ADVERTISE_100_FULL;
 		else
 			adapter->hw.mac.forced_speed_duplex = ADVERTISE_100_HALF;
 		break;
 	case IFM_10_T:
 		adapter->hw.mac.autoneg = FALSE;
 		adapter->hw.phy.autoneg_advertised = 0;
 		if ((ifm->ifm_media & IFM_GMASK) == IFM_FDX)
 			adapter->hw.mac.forced_speed_duplex = ADVERTISE_10_FULL;
 		else
 			adapter->hw.mac.forced_speed_duplex = ADVERTISE_10_HALF;
 		break;
 	default:
 		device_printf(adapter->dev, "Unsupported media type\n");
 	}
 
 	igb_init_locked(adapter);
 	IGB_CORE_UNLOCK(adapter);
 
 	return (0);
 }
 
 
 /*********************************************************************
  *
  *  This routine maps the mbufs to Advanced TX descriptors.
  *  
  **********************************************************************/
 static int
 igb_xmit(struct tx_ring *txr, struct mbuf **m_headp)
 {
 	struct adapter  *adapter = txr->adapter;
 	u32		olinfo_status = 0, cmd_type_len;
 	int             i, j, error, nsegs;
 	int		first;
 	bool		remap = TRUE;
 	struct mbuf	*m_head;
 	bus_dma_segment_t segs[IGB_MAX_SCATTER];
 	bus_dmamap_t	map;
 	struct igb_tx_buf *txbuf;
 	union e1000_adv_tx_desc *txd = NULL;
 
 	m_head = *m_headp;
 
 	/* Basic descriptor defines */
         cmd_type_len = (E1000_ADVTXD_DTYP_DATA |
 	    E1000_ADVTXD_DCMD_IFCS | E1000_ADVTXD_DCMD_DEXT);
 
 	if (m_head->m_flags & M_VLANTAG)
         	cmd_type_len |= E1000_ADVTXD_DCMD_VLE;
 
         /*
          * Important to capture the first descriptor
          * used because it will contain the index of
          * the one we tell the hardware to report back
          */
         first = txr->next_avail_desc;
 	txbuf = &txr->tx_buffers[first];
 	map = txbuf->map;
 
 	/*
 	 * Map the packet for DMA.
 	 */
 retry:
 	error = bus_dmamap_load_mbuf_sg(txr->txtag, map,
 	    *m_headp, segs, &nsegs, BUS_DMA_NOWAIT);
 
 	if (__predict_false(error)) {
 		struct mbuf *m;
 
 		switch (error) {
 		case EFBIG:
 			/* Try it again? - one try */
 			if (remap == TRUE) {
 				remap = FALSE;
 				m = m_collapse(*m_headp, M_NOWAIT,
 				    IGB_MAX_SCATTER);
 				if (m == NULL) {
 					adapter->mbuf_defrag_failed++;
 					m_freem(*m_headp);
 					*m_headp = NULL;
 					return (ENOBUFS);
 				}
 				*m_headp = m;
 				goto retry;
 			} else
 				return (error);
 		default:
 			txr->no_tx_dma_setup++;
 			m_freem(*m_headp);
 			*m_headp = NULL;
 			return (error);
 		}
 	}
 
 	/* Make certain there are enough descriptors */
 	if (txr->tx_avail < (nsegs + 2)) {
 		txr->no_desc_avail++;
 		bus_dmamap_unload(txr->txtag, map);
 		return (ENOBUFS);
 	}
 	m_head = *m_headp;
 
 	/*
 	** Set up the appropriate offload context
 	** this will consume the first descriptor
 	*/
 	error = igb_tx_ctx_setup(txr, m_head, &cmd_type_len, &olinfo_status);
 	if (__predict_false(error)) {
 		m_freem(*m_headp);
 		*m_headp = NULL;
 		return (error);
 	}
 
 	/* 82575 needs the queue index added */
 	if (adapter->hw.mac.type == e1000_82575)
 		olinfo_status |= txr->me << 4;
 
 	i = txr->next_avail_desc;
 	for (j = 0; j < nsegs; j++) {
 		bus_size_t seglen;
 		bus_addr_t segaddr;
 
 		txbuf = &txr->tx_buffers[i];
 		txd = &txr->tx_base[i];
 		seglen = segs[j].ds_len;
 		segaddr = htole64(segs[j].ds_addr);
 
 		txd->read.buffer_addr = segaddr;
 		txd->read.cmd_type_len = htole32(E1000_TXD_CMD_IFCS |
 		    cmd_type_len | seglen);
 		txd->read.olinfo_status = htole32(olinfo_status);
 
 		if (++i == txr->num_desc)
 			i = 0;
 	}
 
 	txd->read.cmd_type_len |=
 	    htole32(E1000_TXD_CMD_EOP | E1000_TXD_CMD_RS);
 	txr->tx_avail -= nsegs;
 	txr->next_avail_desc = i;
 
 	txbuf->m_head = m_head;
 	/*
 	** Here we swap the map so the last descriptor,
 	** which gets the completion interrupt has the
 	** real map, and the first descriptor gets the
 	** unused map from this descriptor.
 	*/
 	txr->tx_buffers[first].map = txbuf->map;
 	txbuf->map = map;
 	bus_dmamap_sync(txr->txtag, map, BUS_DMASYNC_PREWRITE);
 
         /* Set the EOP descriptor that will be marked done */
         txbuf = &txr->tx_buffers[first];
 	txbuf->eop = txd;
 
         bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
             BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 	/*
 	 * Advance the Transmit Descriptor Tail (TDT); this tells the
 	 * hardware that this frame is available to transmit.
 	 */
 	++txr->total_packets;
 	E1000_WRITE_REG(&adapter->hw, E1000_TDT(txr->me), i);
 
 	return (0);
 }
 static void
 igb_set_promisc(struct adapter *adapter)
 {
 	struct ifnet	*ifp = adapter->ifp;
 	struct e1000_hw *hw = &adapter->hw;
 	u32		reg;
 
 	if (adapter->vf_ifp) {
 		e1000_promisc_set_vf(hw, e1000_promisc_enabled);
 		return;
 	}
 
 	reg = E1000_READ_REG(hw, E1000_RCTL);
 	if (ifp->if_flags & IFF_PROMISC) {
 		reg |= (E1000_RCTL_UPE | E1000_RCTL_MPE);
 		E1000_WRITE_REG(hw, E1000_RCTL, reg);
 	} else if (ifp->if_flags & IFF_ALLMULTI) {
 		reg |= E1000_RCTL_MPE;
 		reg &= ~E1000_RCTL_UPE;
 		E1000_WRITE_REG(hw, E1000_RCTL, reg);
 	}
 }
 
 static void
 igb_disable_promisc(struct adapter *adapter)
 {
 	struct e1000_hw *hw = &adapter->hw;
 	struct ifnet	*ifp = adapter->ifp;
 	u32		reg;
 	int		mcnt = 0;
 
 	if (adapter->vf_ifp) {
 		e1000_promisc_set_vf(hw, e1000_promisc_disabled);
 		return;
 	}
 	reg = E1000_READ_REG(hw, E1000_RCTL);
 	reg &=  (~E1000_RCTL_UPE);
 	if (ifp->if_flags & IFF_ALLMULTI)
 		mcnt = MAX_NUM_MULTICAST_ADDRESSES;
 	else {
 		struct  ifmultiaddr *ifma;
 #if __FreeBSD_version < 800000
 		IF_ADDR_LOCK(ifp);
 #else   
 		if_maddr_rlock(ifp);
 #endif
 		TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) {
 			if (ifma->ifma_addr->sa_family != AF_LINK)
 				continue;
 			if (mcnt == MAX_NUM_MULTICAST_ADDRESSES)
 				break;
 			mcnt++;
 		}
 #if __FreeBSD_version < 800000
 		IF_ADDR_UNLOCK(ifp);
 #else
 		if_maddr_runlock(ifp);
 #endif
 	}
 	/* Don't disable if in MAX groups */
 	if (mcnt < MAX_NUM_MULTICAST_ADDRESSES)
 		reg &=  (~E1000_RCTL_MPE);
 	E1000_WRITE_REG(hw, E1000_RCTL, reg);
 }
 
 
 /*********************************************************************
  *  Multicast Update
  *
  *  This routine is called whenever the multicast address list is updated.
  *
  **********************************************************************/
 
 static void
 igb_set_multi(struct adapter *adapter)
 {
 	struct ifnet	*ifp = adapter->ifp;
 	struct ifmultiaddr *ifma;
 	u32 reg_rctl = 0;
 	u8  *mta;
 
 	int mcnt = 0;
 
 	IOCTL_DEBUGOUT("igb_set_multi: begin");
 
 	mta = adapter->mta;
 	bzero(mta, sizeof(uint8_t) * ETH_ADDR_LEN *
 	    MAX_NUM_MULTICAST_ADDRESSES);
 
 #if __FreeBSD_version < 800000
 	IF_ADDR_LOCK(ifp);
 #else
 	if_maddr_rlock(ifp);
 #endif
 	TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) {
 		if (ifma->ifma_addr->sa_family != AF_LINK)
 			continue;
 
 		if (mcnt == MAX_NUM_MULTICAST_ADDRESSES)
 			break;
 
 		bcopy(LLADDR((struct sockaddr_dl *)ifma->ifma_addr),
 		    &mta[mcnt * ETH_ADDR_LEN], ETH_ADDR_LEN);
 		mcnt++;
 	}
 #if __FreeBSD_version < 800000
 	IF_ADDR_UNLOCK(ifp);
 #else
 	if_maddr_runlock(ifp);
 #endif
 
 	if (mcnt >= MAX_NUM_MULTICAST_ADDRESSES) {
 		reg_rctl = E1000_READ_REG(&adapter->hw, E1000_RCTL);
 		reg_rctl |= E1000_RCTL_MPE;
 		E1000_WRITE_REG(&adapter->hw, E1000_RCTL, reg_rctl);
 	} else
 		e1000_update_mc_addr_list(&adapter->hw, mta, mcnt);
 }
 
 
 /*********************************************************************
  *  Timer routine:
  *  	This routine checks for link status,
  *	updates statistics, and does the watchdog.
  *
  **********************************************************************/
 
 static void
 igb_local_timer(void *arg)
 {
 	struct adapter		*adapter = arg;
 	device_t		dev = adapter->dev;
 	struct ifnet		*ifp = adapter->ifp;
 	struct tx_ring		*txr = adapter->tx_rings;
 	struct igb_queue	*que = adapter->queues;
 	int			hung = 0, busy = 0;
 
 
 	IGB_CORE_LOCK_ASSERT(adapter);
 
 	igb_update_link_status(adapter);
 	igb_update_stats_counters(adapter);
 
         /*
         ** Check the TX queues status
 	**	- central locked handling of OACTIVE
 	**	- watchdog only if all queues show hung
         */
 	for (int i = 0; i < adapter->num_queues; i++, que++, txr++) {
 		if ((txr->queue_status & IGB_QUEUE_HUNG) &&
 		    (adapter->pause_frames == 0))
 			++hung;
 		if (txr->queue_status & IGB_QUEUE_DEPLETED)
 			++busy;
 		if ((txr->queue_status & IGB_QUEUE_IDLE) == 0)
 			taskqueue_enqueue(que->tq, &que->que_task);
 	}
 	if (hung == adapter->num_queues)
 		goto timeout;
 	if (busy == adapter->num_queues)
 		ifp->if_drv_flags |= IFF_DRV_OACTIVE;
 	else if ((ifp->if_drv_flags & IFF_DRV_OACTIVE) &&
 	    (busy < adapter->num_queues))
 		ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
 
 	adapter->pause_frames = 0;
 	callout_reset(&adapter->timer, hz, igb_local_timer, adapter);
 #ifndef DEVICE_POLLING
 	/* Schedule all queue interrupts - deadlock protection */
 	E1000_WRITE_REG(&adapter->hw, E1000_EICS, adapter->que_mask);
 #endif
 	return;
 
 timeout:
 	device_printf(adapter->dev, "Watchdog timeout -- resetting\n");
 	device_printf(dev,"Queue(%d) tdh = %d, hw tdt = %d\n", txr->me,
             E1000_READ_REG(&adapter->hw, E1000_TDH(txr->me)),
             E1000_READ_REG(&adapter->hw, E1000_TDT(txr->me)));
 	device_printf(dev,"TX(%d) desc avail = %d, "
             "Next TX to Clean = %d\n",
             txr->me, txr->tx_avail, txr->next_to_clean);
 	adapter->ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
 	adapter->watchdog_events++;
 	igb_init_locked(adapter);
 }
 
 static void
 igb_update_link_status(struct adapter *adapter)
 {
 	struct e1000_hw		*hw = &adapter->hw;
 	struct e1000_fc_info	*fc = &hw->fc;
 	struct ifnet		*ifp = adapter->ifp;
 	device_t		dev = adapter->dev;
 	struct tx_ring		*txr = adapter->tx_rings;
 	u32			link_check, thstat, ctrl;
 	char			*flowctl = NULL;
 
 	link_check = thstat = ctrl = 0;
 
 	/* Get the cached link value or read for real */
         switch (hw->phy.media_type) {
         case e1000_media_type_copper:
                 if (hw->mac.get_link_status) {
 			/* Do the work to read phy */
                         e1000_check_for_link(hw);
                         link_check = !hw->mac.get_link_status;
                 } else
                         link_check = TRUE;
                 break;
         case e1000_media_type_fiber:
                 e1000_check_for_link(hw);
                 link_check = (E1000_READ_REG(hw, E1000_STATUS) &
                                  E1000_STATUS_LU);
                 break;
         case e1000_media_type_internal_serdes:
                 e1000_check_for_link(hw);
                 link_check = adapter->hw.mac.serdes_has_link;
                 break;
 	/* VF device is type_unknown */
         case e1000_media_type_unknown:
                 e1000_check_for_link(hw);
 		link_check = !hw->mac.get_link_status;
 		/* FALLTHROUGH */
         default:
                 break;
         }
 
 	/* Check for thermal downshift or shutdown */
 	if (hw->mac.type == e1000_i350) {
 		thstat = E1000_READ_REG(hw, E1000_THSTAT);
 		ctrl = E1000_READ_REG(hw, E1000_CTRL_EXT);
 	}
 
 	/* Get the flow control for display */
 	switch (fc->current_mode) {
 	case e1000_fc_rx_pause:
 		flowctl = "RX";
 		break;	
 	case e1000_fc_tx_pause:
 		flowctl = "TX";
 		break;	
 	case e1000_fc_full:
 		flowctl = "Full";
 		break;	
 	case e1000_fc_none:
 	default:
 		flowctl = "None";
 		break;	
 	}
 
 	/* Now we check if a transition has happened */
 	if (link_check && (adapter->link_active == 0)) {
 		e1000_get_speed_and_duplex(&adapter->hw, 
 		    &adapter->link_speed, &adapter->link_duplex);
 		if (bootverbose)
 			device_printf(dev, "Link is up %d Mbps %s,"
 			    " Flow Control: %s\n",
 			    adapter->link_speed,
 			    ((adapter->link_duplex == FULL_DUPLEX) ?
 			    "Full Duplex" : "Half Duplex"), flowctl);
 		adapter->link_active = 1;
 		ifp->if_baudrate = adapter->link_speed * 1000000;
 		if ((ctrl & E1000_CTRL_EXT_LINK_MODE_GMII) &&
 		    (thstat & E1000_THSTAT_LINK_THROTTLE))
 			device_printf(dev, "Link: thermal downshift\n");
 		/* Delay Link Up for Phy update */
 		if (((hw->mac.type == e1000_i210) ||
 		    (hw->mac.type == e1000_i211)) &&
 		    (hw->phy.id == I210_I_PHY_ID))
 			msec_delay(I210_LINK_DELAY);
 		/* Reset if the media type changed. */
 		if (hw->dev_spec._82575.media_changed) {
 			hw->dev_spec._82575.media_changed = false;
 			adapter->flags |= IGB_MEDIA_RESET;
 			igb_reset(adapter);
 		}	
 		/* This can sleep */
 		if_link_state_change(ifp, LINK_STATE_UP);
 	} else if (!link_check && (adapter->link_active == 1)) {
 		ifp->if_baudrate = adapter->link_speed = 0;
 		adapter->link_duplex = 0;
 		if (bootverbose)
 			device_printf(dev, "Link is Down\n");
 		if ((ctrl & E1000_CTRL_EXT_LINK_MODE_GMII) &&
 		    (thstat & E1000_THSTAT_PWR_DOWN))
 			device_printf(dev, "Link: thermal shutdown\n");
 		adapter->link_active = 0;
 		/* This can sleep */
 		if_link_state_change(ifp, LINK_STATE_DOWN);
 		/* Reset queue state */
 		for (int i = 0; i < adapter->num_queues; i++, txr++)
 			txr->queue_status = IGB_QUEUE_IDLE;
 	}
 }
 
 /*********************************************************************
  *
  *  This routine disables all traffic on the adapter by issuing a
  *  global reset on the MAC and deallocates TX/RX buffers.
  *
  **********************************************************************/
 
 static void
 igb_stop(void *arg)
 {
 	struct adapter	*adapter = arg;
 	struct ifnet	*ifp = adapter->ifp;
 	struct tx_ring *txr = adapter->tx_rings;
 
 	IGB_CORE_LOCK_ASSERT(adapter);
 
 	INIT_DEBUGOUT("igb_stop: begin");
 
 	igb_disable_intr(adapter);
 
 	callout_stop(&adapter->timer);
 
 	/* Tell the stack that the interface is no longer active */
 	ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
 	ifp->if_drv_flags |= IFF_DRV_OACTIVE;
 
 	/* Disarm watchdog timer. */
 	for (int i = 0; i < adapter->num_queues; i++, txr++) {
 		IGB_TX_LOCK(txr);
 		txr->queue_status = IGB_QUEUE_IDLE;
 		IGB_TX_UNLOCK(txr);
 	}
 
 	e1000_reset_hw(&adapter->hw);
 	E1000_WRITE_REG(&adapter->hw, E1000_WUC, 0);
 
 	e1000_led_off(&adapter->hw);
 	e1000_cleanup_led(&adapter->hw);
 }
 
 
 /*********************************************************************
  *
  *  Determine hardware revision.
  *
  **********************************************************************/
 static void
 igb_identify_hardware(struct adapter *adapter)
 {
 	device_t dev = adapter->dev;
 
 	/* Make sure our PCI config space has the necessary stuff set */
 	pci_enable_busmaster(dev);
 	adapter->hw.bus.pci_cmd_word = pci_read_config(dev, PCIR_COMMAND, 2);
 
 	/* Save off the information about this board */
 	adapter->hw.vendor_id = pci_get_vendor(dev);
 	adapter->hw.device_id = pci_get_device(dev);
 	adapter->hw.revision_id = pci_read_config(dev, PCIR_REVID, 1);
 	adapter->hw.subsystem_vendor_id =
 	    pci_read_config(dev, PCIR_SUBVEND_0, 2);
 	adapter->hw.subsystem_device_id =
 	    pci_read_config(dev, PCIR_SUBDEV_0, 2);
 
 	/* Set MAC type early for PCI setup */
 	e1000_set_mac_type(&adapter->hw);
 
 	/* Are we a VF device? */
 	if ((adapter->hw.mac.type == e1000_vfadapt) ||
 	    (adapter->hw.mac.type == e1000_vfadapt_i350))
 		adapter->vf_ifp = 1;
 	else
 		adapter->vf_ifp = 0;
 }
 
 static int
 igb_allocate_pci_resources(struct adapter *adapter)
 {
 	device_t	dev = adapter->dev;
 	int		rid;
 
 	rid = PCIR_BAR(0);
 	adapter->pci_mem = bus_alloc_resource_any(dev, SYS_RES_MEMORY,
 	    &rid, RF_ACTIVE);
 	if (adapter->pci_mem == NULL) {
 		device_printf(dev, "Unable to allocate bus resource: memory\n");
 		return (ENXIO);
 	}
 	adapter->osdep.mem_bus_space_tag =
 	    rman_get_bustag(adapter->pci_mem);
 	adapter->osdep.mem_bus_space_handle =
 	    rman_get_bushandle(adapter->pci_mem);
 	adapter->hw.hw_addr = (u8 *)&adapter->osdep.mem_bus_space_handle;
 
 	adapter->num_queues = 1; /* Defaults for Legacy or MSI */
 
 	/* This will set up either MSI-X or MSI */
 	adapter->msix = igb_setup_msix(adapter);
 	adapter->hw.back = &adapter->osdep;
 
 	return (0);
 }
 
 /*********************************************************************
  *
  *  Set up the Legacy or MSI Interrupt handler
  *
  **********************************************************************/
 static int
 igb_allocate_legacy(struct adapter *adapter)
 {
 	device_t		dev = adapter->dev;
 	struct igb_queue	*que = adapter->queues;
 #ifndef IGB_LEGACY_TX
 	struct tx_ring		*txr = adapter->tx_rings;
 #endif
 	int			error, rid = 0;
 
 	/* Turn off all interrupts */
 	E1000_WRITE_REG(&adapter->hw, E1000_IMC, 0xffffffff);
 
 	/* MSI RID is 1 */
 	if (adapter->msix == 1)
 		rid = 1;
 
 	/* We allocate a single interrupt resource */
 	adapter->res = bus_alloc_resource_any(dev,
 	    SYS_RES_IRQ, &rid, RF_SHAREABLE | RF_ACTIVE);
 	if (adapter->res == NULL) {
 		device_printf(dev, "Unable to allocate bus resource: "
 		    "interrupt\n");
 		return (ENXIO);
 	}
 
 #ifndef IGB_LEGACY_TX
 	TASK_INIT(&txr->txq_task, 0, igb_deferred_mq_start, txr);
 #endif
 
 	/*
 	 * Try allocating a fast interrupt and the associated deferred
 	 * processing contexts.
 	 */
 	TASK_INIT(&que->que_task, 0, igb_handle_que, que);
 	/* Make tasklet for deferred link handling */
 	TASK_INIT(&adapter->link_task, 0, igb_handle_link, adapter);
 	que->tq = taskqueue_create_fast("igb_taskq", M_NOWAIT,
 	    taskqueue_thread_enqueue, &que->tq);
 	taskqueue_start_threads(&que->tq, 1, PI_NET, "%s taskq",
 	    device_get_nameunit(adapter->dev));
 	if ((error = bus_setup_intr(dev, adapter->res,
 	    INTR_TYPE_NET | INTR_MPSAFE, igb_irq_fast, NULL,
 	    adapter, &adapter->tag)) != 0) {
 		device_printf(dev, "Failed to register fast interrupt "
 			    "handler: %d\n", error);
 		taskqueue_free(que->tq);
 		que->tq = NULL;
 		return (error);
 	}
 
 	return (0);
 }
 
 
 /*********************************************************************
  *
  *  Set up the MSIX Queue Interrupt handlers
  *
  **********************************************************************/
 static int
 igb_allocate_msix(struct adapter *adapter)
 {
 	device_t		dev = adapter->dev;
 	struct igb_queue	*que = adapter->queues;
 	int			error, rid, vector = 0;
 	int			cpu_id = 0;
 #ifdef	RSS
 	cpuset_t cpu_mask;
 #endif
 
 	/* Be sure to start with all interrupts disabled */
 	E1000_WRITE_REG(&adapter->hw, E1000_IMC, ~0);
 	E1000_WRITE_FLUSH(&adapter->hw);
 
 #ifdef	RSS
 	/*
 	 * If we're doing RSS, the number of queues needs to
 	 * match the number of RSS buckets that are configured.
 	 *
 	 * + If there are more queues than RSS buckets, we'll end
 	 *   up with queues that get no traffic.
 	 *
 	 * + If there are more RSS buckets than queues, we'll end
 	 *   up having multiple RSS buckets map to the same queue,
 	 *   so there'll be some contention.
 	 */
 	if (adapter->num_queues != rss_getnumbuckets()) {
 		device_printf(dev,
 		    "%s: number of queues (%d) != number of RSS buckets (%d)"
 		    "; performance will be impacted.\n",
 		    __func__,
 		    adapter->num_queues,
 		    rss_getnumbuckets());
 	}
 #endif
 
 	for (int i = 0; i < adapter->num_queues; i++, vector++, que++) {
 		rid = vector +1;
 		que->res = bus_alloc_resource_any(dev,
 		    SYS_RES_IRQ, &rid, RF_SHAREABLE | RF_ACTIVE);
 		if (que->res == NULL) {
 			device_printf(dev,
 			    "Unable to allocate bus resource: "
 			    "MSIX Queue Interrupt\n");
 			return (ENXIO);
 		}
 		error = bus_setup_intr(dev, que->res,
 	    	    INTR_TYPE_NET | INTR_MPSAFE, NULL,
 		    igb_msix_que, que, &que->tag);
 		if (error) {
 			que->res = NULL;
 			device_printf(dev, "Failed to register Queue handler\n");
 			return (error);
 		}
 #if __FreeBSD_version >= 800504
 		bus_describe_intr(dev, que->res, que->tag, "que %d", i);
 #endif
 		que->msix = vector;
 		if (adapter->hw.mac.type == e1000_82575)
 			que->eims = E1000_EICR_TX_QUEUE0 << i;
 		else
 			que->eims = 1 << vector;
 
 #ifdef	RSS
 		/*
 		 * The queue ID is used as the RSS layer bucket ID.
 		 * We look up the queue ID -> RSS CPU ID and select
 		 * that.
 		 */
 		cpu_id = rss_getcpu(i % rss_getnumbuckets());
 #else
 		/*
 		 * Bind the msix vector, and thus the
 		 * rings to the corresponding cpu.
 		 *
 		 * This just happens to match the default RSS round-robin
 		 * bucket -> queue -> CPU allocation.
 		 */
 		if (adapter->num_queues > 1) {
 			if (igb_last_bind_cpu < 0)
 				igb_last_bind_cpu = CPU_FIRST();
 			cpu_id = igb_last_bind_cpu;
 		}
 #endif
 
 		if (adapter->num_queues > 1) {
 			bus_bind_intr(dev, que->res, cpu_id);
 #ifdef	RSS
 			device_printf(dev,
 				"Bound queue %d to RSS bucket %d\n",
 				i, cpu_id);
 #else
 			device_printf(dev,
 				"Bound queue %d to cpu %d\n",
 				i, cpu_id);
 #endif
 		}
 
 #ifndef IGB_LEGACY_TX
 		TASK_INIT(&que->txr->txq_task, 0, igb_deferred_mq_start,
 		    que->txr);
 #endif
 		/* Make tasklet for deferred handling */
 		TASK_INIT(&que->que_task, 0, igb_handle_que, que);
 		que->tq = taskqueue_create("igb_que", M_NOWAIT,
 		    taskqueue_thread_enqueue, &que->tq);
 		if (adapter->num_queues > 1) {
 			/*
 			 * Only pin the taskqueue thread to a CPU if
 			 * RSS is in use.
 			 *
 			 * This again just happens to match the default RSS
 			 * round-robin bucket -> queue -> CPU allocation.
 			 */
 #ifdef	RSS
 			CPU_SETOF(cpu_id, &cpu_mask);
 			taskqueue_start_threads_cpuset(&que->tq, 1, PI_NET,
 			    &cpu_mask,
 			    "%s que (bucket %d)",
 			    device_get_nameunit(adapter->dev),
 			    cpu_id);
 #else
 			taskqueue_start_threads(&que->tq, 1, PI_NET,
 			    "%s que (qid %d)",
 			    device_get_nameunit(adapter->dev),
 			    cpu_id);
 #endif
 		} else {
 			taskqueue_start_threads(&que->tq, 1, PI_NET, "%s que",
 			    device_get_nameunit(adapter->dev));
 		}
 
 		/* Finally update the last bound CPU id */
 		if (adapter->num_queues > 1)
 			igb_last_bind_cpu = CPU_NEXT(igb_last_bind_cpu);
 	}
 
 	/* And Link */
 	rid = vector + 1;
 	adapter->res = bus_alloc_resource_any(dev,
 	    SYS_RES_IRQ, &rid, RF_SHAREABLE | RF_ACTIVE);
 	if (adapter->res == NULL) {
 		device_printf(dev,
 		    "Unable to allocate bus resource: "
 		    "MSIX Link Interrupt\n");
 		return (ENXIO);
 	}
 	if ((error = bus_setup_intr(dev, adapter->res,
 	    INTR_TYPE_NET | INTR_MPSAFE, NULL,
 	    igb_msix_link, adapter, &adapter->tag)) != 0) {
 		device_printf(dev, "Failed to register Link handler\n");
 		return (error);
 	}
 #if __FreeBSD_version >= 800504
 	bus_describe_intr(dev, adapter->res, adapter->tag, "link");
 #endif
 	adapter->linkvec = vector;
 
 	return (0);
 }
 
 
 static void
 igb_configure_queues(struct adapter *adapter)
 {
 	struct	e1000_hw	*hw = &adapter->hw;
 	struct	igb_queue	*que;
 	u32			tmp, ivar = 0, newitr = 0;
 
 	/* First turn on RSS capability */
 	if (adapter->hw.mac.type != e1000_82575)
 		E1000_WRITE_REG(hw, E1000_GPIE,
 		    E1000_GPIE_MSIX_MODE | E1000_GPIE_EIAME |
 		    E1000_GPIE_PBA | E1000_GPIE_NSICR);
 
 	/* Turn on MSIX */
 	switch (adapter->hw.mac.type) {
 	case e1000_82580:
 	case e1000_i350:
 	case e1000_i354:
 	case e1000_i210:
 	case e1000_i211:
 	case e1000_vfadapt:
 	case e1000_vfadapt_i350:
 		/* RX entries */
 		for (int i = 0; i < adapter->num_queues; i++) {
 			u32 index = i >> 1;
 			ivar = E1000_READ_REG_ARRAY(hw, E1000_IVAR0, index);
 			que = &adapter->queues[i];
 			if (i & 1) {
 				ivar &= 0xFF00FFFF;
 				ivar |= (que->msix | E1000_IVAR_VALID) << 16;
 			} else {
 				ivar &= 0xFFFFFF00;
 				ivar |= que->msix | E1000_IVAR_VALID;
 			}
 			E1000_WRITE_REG_ARRAY(hw, E1000_IVAR0, index, ivar);
 		}
 		/* TX entries */
 		for (int i = 0; i < adapter->num_queues; i++) {
 			u32 index = i >> 1;
 			ivar = E1000_READ_REG_ARRAY(hw, E1000_IVAR0, index);
 			que = &adapter->queues[i];
 			if (i & 1) {
 				ivar &= 0x00FFFFFF;
 				ivar |= (que->msix | E1000_IVAR_VALID) << 24;
 			} else {
 				ivar &= 0xFFFF00FF;
 				ivar |= (que->msix | E1000_IVAR_VALID) << 8;
 			}
 			E1000_WRITE_REG_ARRAY(hw, E1000_IVAR0, index, ivar);
 			adapter->que_mask |= que->eims;
 		}
 
 		/* And for the link interrupt */
 		ivar = (adapter->linkvec | E1000_IVAR_VALID) << 8;
 		adapter->link_mask = 1 << adapter->linkvec;
 		E1000_WRITE_REG(hw, E1000_IVAR_MISC, ivar);
 		break;
 	case e1000_82576:
 		/* RX entries */
 		for (int i = 0; i < adapter->num_queues; i++) {
 			u32 index = i & 0x7; /* Each IVAR has two entries */
 			ivar = E1000_READ_REG_ARRAY(hw, E1000_IVAR0, index);
 			que = &adapter->queues[i];
 			if (i < 8) {
 				ivar &= 0xFFFFFF00;
 				ivar |= que->msix | E1000_IVAR_VALID;
 			} else {
 				ivar &= 0xFF00FFFF;
 				ivar |= (que->msix | E1000_IVAR_VALID) << 16;
 			}
 			E1000_WRITE_REG_ARRAY(hw, E1000_IVAR0, index, ivar);
 			adapter->que_mask |= que->eims;
 		}
 		/* TX entries */
 		for (int i = 0; i < adapter->num_queues; i++) {
 			u32 index = i & 0x7; /* Each IVAR has two entries */
 			ivar = E1000_READ_REG_ARRAY(hw, E1000_IVAR0, index);
 			que = &adapter->queues[i];
 			if (i < 8) {
 				ivar &= 0xFFFF00FF;
 				ivar |= (que->msix | E1000_IVAR_VALID) << 8;
 			} else {
 				ivar &= 0x00FFFFFF;
 				ivar |= (que->msix | E1000_IVAR_VALID) << 24;
 			}
 			E1000_WRITE_REG_ARRAY(hw, E1000_IVAR0, index, ivar);
 			adapter->que_mask |= que->eims;
 		}
 
 		/* And for the link interrupt */
 		ivar = (adapter->linkvec | E1000_IVAR_VALID) << 8;
 		adapter->link_mask = 1 << adapter->linkvec;
 		E1000_WRITE_REG(hw, E1000_IVAR_MISC, ivar);
 		break;
 
 	case e1000_82575:
                 /* enable MSI-X support */
 		tmp = E1000_READ_REG(hw, E1000_CTRL_EXT);
                 tmp |= E1000_CTRL_EXT_PBA_CLR;
                 /* Auto-Mask interrupts upon ICR read. */
                 tmp |= E1000_CTRL_EXT_EIAME;
                 tmp |= E1000_CTRL_EXT_IRCA;
                 E1000_WRITE_REG(hw, E1000_CTRL_EXT, tmp);
 
 		/* Queues */
 		for (int i = 0; i < adapter->num_queues; i++) {
 			que = &adapter->queues[i];
 			tmp = E1000_EICR_RX_QUEUE0 << i;
 			tmp |= E1000_EICR_TX_QUEUE0 << i;
 			que->eims = tmp;
 			E1000_WRITE_REG_ARRAY(hw, E1000_MSIXBM(0),
 			    i, que->eims);
 			adapter->que_mask |= que->eims;
 		}
 
 		/* Link */
 		E1000_WRITE_REG(hw, E1000_MSIXBM(adapter->linkvec),
 		    E1000_EIMS_OTHER);
 		adapter->link_mask |= E1000_EIMS_OTHER;
 	default:
 		break;
 	}
 
 	/* Set the starting interrupt rate */
 	if (igb_max_interrupt_rate > 0)
 		newitr = (4000000 / igb_max_interrupt_rate) & 0x7FFC;
 
         if (hw->mac.type == e1000_82575)
                 newitr |= newitr << 16;
         else
                 newitr |= E1000_EITR_CNT_IGNR;
 
 	for (int i = 0; i < adapter->num_queues; i++) {
 		que = &adapter->queues[i];
 		E1000_WRITE_REG(hw, E1000_EITR(que->msix), newitr);
 	}
 
 	return;
 }
 
 
 static void
 igb_free_pci_resources(struct adapter *adapter)
 {
 	struct		igb_queue *que = adapter->queues;
 	device_t	dev = adapter->dev;
 	int		rid;
 
 	/*
 	** There is a slight possibility of a failure mode
 	** in attach that will result in entering this function
 	** before interrupt resources have been initialized, and
 	** in that case we do not want to execute the loops below
 	** We can detect this reliably by the state of the adapter
 	** res pointer.
 	*/
 	if (adapter->res == NULL)
 		goto mem;
 
 	/*
 	 * First release all the interrupt resources:
 	 */
 	for (int i = 0; i < adapter->num_queues; i++, que++) {
 		rid = que->msix + 1;
 		if (que->tag != NULL) {
 			bus_teardown_intr(dev, que->res, que->tag);
 			que->tag = NULL;
 		}
 		if (que->res != NULL)
 			bus_release_resource(dev,
 			    SYS_RES_IRQ, rid, que->res);
 	}
 
 	/* Clean the Legacy or Link interrupt last */
 	if (adapter->linkvec) /* we are doing MSIX */
 		rid = adapter->linkvec + 1;
 	else
 		rid = (adapter->msix != 0) ? 1 : 0;
 
 	que = adapter->queues;
 	if (adapter->tag != NULL) {
 		taskqueue_drain(que->tq, &adapter->link_task);
 		bus_teardown_intr(dev, adapter->res, adapter->tag);
 		adapter->tag = NULL;
 	}
 	if (adapter->res != NULL)
 		bus_release_resource(dev, SYS_RES_IRQ, rid, adapter->res);
 
 	for (int i = 0; i < adapter->num_queues; i++, que++) {
 		if (que->tq != NULL) {
 #ifndef IGB_LEGACY_TX
 			taskqueue_drain(que->tq, &que->txr->txq_task);
 #endif
 			taskqueue_drain(que->tq, &que->que_task);
 			taskqueue_free(que->tq);
 		}
 	}
 mem:
 	if (adapter->msix)
 		pci_release_msi(dev);
 
 	if (adapter->msix_mem != NULL)
 		bus_release_resource(dev, SYS_RES_MEMORY,
 		    adapter->memrid, adapter->msix_mem);
 
 	if (adapter->pci_mem != NULL)
 		bus_release_resource(dev, SYS_RES_MEMORY,
 		    PCIR_BAR(0), adapter->pci_mem);
 
 }
 
 /*
  * Set up either MSI-X or MSI
  */
 static int
 igb_setup_msix(struct adapter *adapter)
 {
 	device_t	dev = adapter->dev;
 	int		bar, want, queues, msgs, maxqueues;
 
 	/* Tunable override */
 	if (igb_enable_msix == 0)
 		goto msi;
 
 	/* First try MSI-X */
 	msgs = pci_msix_count(dev);
 	if (msgs == 0)
 		goto msi;
 	/*
 	** Some new devices, as with ixgbe, now may
 	** use a different BAR, so we need to keep
 	** track of which is used.
 	*/
 	adapter->memrid = PCIR_BAR(IGB_MSIX_BAR);
 	bar = pci_read_config(dev, adapter->memrid, 4);
 	if (bar == 0) /* use next bar */
 		adapter->memrid += 4;
 	adapter->msix_mem = bus_alloc_resource_any(dev,
 	    SYS_RES_MEMORY, &adapter->memrid, RF_ACTIVE);
 	if (adapter->msix_mem == NULL) {
 		/* May not be enabled */
 		device_printf(adapter->dev,
 		    "Unable to map MSIX table\n");
 		goto msi;
 	}
 
 	queues = (mp_ncpus > (msgs-1)) ? (msgs-1) : mp_ncpus;
 
 	/* Override via tunable */
 	if (igb_num_queues != 0)
 		queues = igb_num_queues;
 
 #ifdef	RSS
 	/* If we're doing RSS, clamp at the number of RSS buckets */
 	if (queues > rss_getnumbuckets())
 		queues = rss_getnumbuckets();
 #endif
 
 
 	/* Sanity check based on HW */
 	switch (adapter->hw.mac.type) {
 		case e1000_82575:
 			maxqueues = 4;
 			break;
 		case e1000_82576:
 		case e1000_82580:
 		case e1000_i350:
 		case e1000_i354:
 			maxqueues = 8;
 			break;
 		case e1000_i210:
 			maxqueues = 4;
 			break;
 		case e1000_i211:
 			maxqueues = 2;
 			break;
 		default:  /* VF interfaces */
 			maxqueues = 1;
 			break;
 	}
 
 	/* Final clamp on the actual hardware capability */
 	if (queues > maxqueues)
 		queues = maxqueues;
 
 	/*
 	** One vector (RX/TX pair) per queue
 	** plus one additional vector for the link interrupt
 	*/
 	want = queues + 1;
 	if (msgs >= want)
 		msgs = want;
 	else {
 		device_printf(adapter->dev,
 		    "MSIX Configuration Problem, "
 		    "%d vectors configured, but %d queues wanted!\n",
 		    msgs, want);
 		goto msi;
 	}
 	if ((pci_alloc_msix(dev, &msgs) == 0) && (msgs == want)) {
 		device_printf(adapter->dev,
 		    "Using MSIX interrupts with %d vectors\n", msgs);
 		adapter->num_queues = queues;
 		return (msgs);
 	}
 	/*
 	** If the MSIX allocation failed or provided fewer vectors
 	** than needed, free them and fall through to MSI
 	*/
 	pci_release_msi(dev);
 
 msi:
 	if (adapter->msix_mem != NULL) {
 		/* Release with the rid actually used; it may be the next BAR */
 		bus_release_resource(dev, SYS_RES_MEMORY,
 		    adapter->memrid, adapter->msix_mem);
 		adapter->msix_mem = NULL;
 	}
 	msgs = 1;
 	if (pci_alloc_msi(dev, &msgs) == 0) {
 		device_printf(adapter->dev, "Using an MSI interrupt\n");
 		return (msgs);
 	}
 	device_printf(adapter->dev, "Using a Legacy interrupt\n");
 	return (0);
 }
 
 /*********************************************************************
  *
  *  Initialize the DMA Coalescing feature
  *
  **********************************************************************/
 static void
 igb_init_dmac(struct adapter *adapter, u32 pba)
 {
 	device_t	dev = adapter->dev;
 	struct e1000_hw *hw = &adapter->hw;
 	u32 		dmac, reg = ~E1000_DMACR_DMAC_EN;
 	u16		hwm;
 
 	if (hw->mac.type == e1000_i211)
 		return;
 
 	if (hw->mac.type > e1000_82580) {
 
 		if (adapter->dmac == 0) { /* Disabling it */
 			E1000_WRITE_REG(hw, E1000_DMACR, reg);
 			return;
 		} else
 			device_printf(dev, "DMA Coalescing enabled\n");
 
 		/* Set starting threshold */
 		E1000_WRITE_REG(hw, E1000_DMCTXTH, 0);
 
 		hwm = 64 * pba - adapter->max_frame_size / 16;
 		if (hwm < 64 * (pba - 6))
 			hwm = 64 * (pba - 6);
 		reg = E1000_READ_REG(hw, E1000_FCRTC);
 		reg &= ~E1000_FCRTC_RTH_COAL_MASK;
 		reg |= ((hwm << E1000_FCRTC_RTH_COAL_SHIFT)
 		    & E1000_FCRTC_RTH_COAL_MASK);
 		E1000_WRITE_REG(hw, E1000_FCRTC, reg);
 
 
 		dmac = pba - adapter->max_frame_size / 512;
 		if (dmac < pba - 10)
 			dmac = pba - 10;
 		reg = E1000_READ_REG(hw, E1000_DMACR);
 		reg &= ~E1000_DMACR_DMACTHR_MASK;
 		reg |= ((dmac << E1000_DMACR_DMACTHR_SHIFT)
 		    & E1000_DMACR_DMACTHR_MASK);
 
 		/* transition to L0s or L1 if available */
 		reg |= (E1000_DMACR_DMAC_EN | E1000_DMACR_DMAC_LX_MASK);
 
 		/*
 		 * Check whether this is a 2.5Gb backplane connection
 		 * before configuring the watchdog timer: the timer takes
 		 * msec values in 12.8 usec intervals on a 2.5Gb
 		 * connection and in 32 usec intervals otherwise.
 		 */
 		if (hw->mac.type == e1000_i354) {
 			int status = E1000_READ_REG(hw, E1000_STATUS);
 			if ((status & E1000_STATUS_2P5_SKU) &&
 			    (!(status & E1000_STATUS_2P5_SKU_OVER)))
 				reg |= ((adapter->dmac * 5) >> 6);
 			else
 				reg |= (adapter->dmac >> 5);
 		} else {
 			reg |= (adapter->dmac >> 5);
 		}
 
 		E1000_WRITE_REG(hw, E1000_DMACR, reg);
 
 		E1000_WRITE_REG(hw, E1000_DMCRTRH, 0);
 
 		/* Set the interval before transition */
 		reg = E1000_READ_REG(hw, E1000_DMCTLX);
 		if (hw->mac.type == e1000_i350)
 			reg |= IGB_DMCTLX_DCFLUSH_DIS;
 		/*
 		** In a 2.5Gb connection the TTLX unit is 0.4 usec,
 		** so 0x4 * 2.5 = 0xA. The delay is still 4 usec.
 		*/
 		if (hw->mac.type == e1000_i354) {
 			int status = E1000_READ_REG(hw, E1000_STATUS);
 			if ((status & E1000_STATUS_2P5_SKU) &&
 			    (!(status & E1000_STATUS_2P5_SKU_OVER)))
 				reg |= 0xA;
 			else
 				reg |= 0x4;
 		} else {
 			reg |= 0x4;
 		}
 
 		E1000_WRITE_REG(hw, E1000_DMCTLX, reg);
 
 		/* free space in tx packet buffer to wake from DMA coal */
 		E1000_WRITE_REG(hw, E1000_DMCTXTH, (IGB_TXPBSIZE -
 		    (2 * adapter->max_frame_size)) >> 6);
 
 		/* make low power state decision controlled by DMA coal */
 		reg = E1000_READ_REG(hw, E1000_PCIEMISC);
 		reg &= ~E1000_PCIEMISC_LX_DECISION;
 		E1000_WRITE_REG(hw, E1000_PCIEMISC, reg);
 
 	} else if (hw->mac.type == e1000_82580) {
 		reg = E1000_READ_REG(hw, E1000_PCIEMISC);
 		E1000_WRITE_REG(hw, E1000_PCIEMISC,
 		    reg & ~E1000_PCIEMISC_LX_DECISION);
 		E1000_WRITE_REG(hw, E1000_DMACR, 0);
 	}
 }
 
 
 /*********************************************************************
  *
  *  Set up a fresh starting state
  *
  **********************************************************************/
 static void
 igb_reset(struct adapter *adapter)
 {
 	device_t	dev = adapter->dev;
 	struct e1000_hw *hw = &adapter->hw;
 	struct e1000_fc_info *fc = &hw->fc;
 	struct ifnet	*ifp = adapter->ifp;
 	u32		pba = 0;
 	u16		hwm;
 
 	INIT_DEBUGOUT("igb_reset: begin");
 
 	/* Let the firmware know the OS is in control */
 	igb_get_hw_control(adapter);
 
 	/*
 	 * Packet Buffer Allocation (PBA)
 	 * Writing PBA sets the receive portion of the buffer;
 	 * the remainder is used for the transmit buffer.
 	 */
 	switch (hw->mac.type) {
 	case e1000_82575:
 		pba = E1000_PBA_32K;
 		break;
 	case e1000_82576:
 	case e1000_vfadapt:
 		pba = E1000_READ_REG(hw, E1000_RXPBS);
 		pba &= E1000_RXPBS_SIZE_MASK_82576;
 		break;
 	case e1000_82580:
 	case e1000_i350:
 	case e1000_i354:
 	case e1000_vfadapt_i350:
 		pba = E1000_READ_REG(hw, E1000_RXPBS);
 		pba = e1000_rxpbs_adjust_82580(pba);
 		break;
 	case e1000_i210:
 	case e1000_i211:
 		pba = E1000_PBA_34K;
 		break;
 	default:
 		break;
 	}
 
 	/* Special needs in case of Jumbo frames */
 	if ((hw->mac.type == e1000_82575) && (ifp->if_mtu > ETHERMTU)) {
 		u32 tx_space, min_tx, min_rx;
 		pba = E1000_READ_REG(hw, E1000_PBA);
 		tx_space = pba >> 16;
 		pba &= 0xffff;
 		min_tx = (adapter->max_frame_size +
 		    sizeof(struct e1000_tx_desc) - ETHERNET_FCS_SIZE) * 2;
 		min_tx = roundup2(min_tx, 1024);
 		min_tx >>= 10;
 		min_rx = adapter->max_frame_size;
 		min_rx = roundup2(min_rx, 1024);
 		min_rx >>= 10;
 		if (tx_space < min_tx &&
 		    ((min_tx - tx_space) < pba)) {
 			pba = pba - (min_tx - tx_space);
 			/*
 			 * If short on rx space, rx wins
 			 * and must trump the tx adjustment.
 			 */
 			if (pba < min_rx)
 				pba = min_rx;
 		}
 		E1000_WRITE_REG(hw, E1000_PBA, pba);
 	}
 
 	INIT_DEBUGOUT1("igb_reset: pba=%dK", pba);
 
 	/*
 	 * These parameters control the automatic generation (Tx) and
 	 * response (Rx) to Ethernet PAUSE frames.
 	 * - High water mark should allow for at least two frames to be
 	 *   received after sending an XOFF.
 	 * - Low water mark works best when it is very near the high water mark.
 	 *   This allows the receiver to restart by sending XON when it has
 	 *   drained a bit.
 	 */
 	hwm = min(((pba << 10) * 9 / 10),
 	    ((pba << 10) - 2 * adapter->max_frame_size));
 
 	if (hw->mac.type < e1000_82576) {
 		fc->high_water = hwm & 0xFFF8;  /* 8-byte granularity */
 		fc->low_water = fc->high_water - 8;
 	} else {
 		fc->high_water = hwm & 0xFFF0;  /* 16-byte granularity */
 		fc->low_water = fc->high_water - 16;
 	}
 
 	fc->pause_time = IGB_FC_PAUSE_TIME;
 	fc->send_xon = TRUE;
 	if (adapter->fc)
 		fc->requested_mode = adapter->fc;
 	else
 		fc->requested_mode = e1000_fc_default;
 
 	/* Issue a global reset */
 	e1000_reset_hw(hw);
 	E1000_WRITE_REG(hw, E1000_WUC, 0);
 
 	/* Reset for AutoMediaDetect */
 	if (adapter->flags & IGB_MEDIA_RESET) {
 		e1000_setup_init_funcs(hw, TRUE);
 		e1000_get_bus_info(hw);
 		adapter->flags &= ~IGB_MEDIA_RESET;
 	}
 
 	if (e1000_init_hw(hw) < 0)
 		device_printf(dev, "Hardware Initialization Failed\n");
 
 	/* Setup DMA Coalescing */
 	igb_init_dmac(adapter, pba);
 
 	E1000_WRITE_REG(&adapter->hw, E1000_VET, ETHERTYPE_VLAN);
 	e1000_get_phy_info(hw);
 	e1000_check_for_link(hw);
 	return;
 }
 
 /*********************************************************************
  *
  *  Setup networking device structure and register an interface.
  *
  **********************************************************************/
 static int
 igb_setup_interface(device_t dev, struct adapter *adapter)
 {
 	struct ifnet   *ifp;
 
 	INIT_DEBUGOUT("igb_setup_interface: begin");
 
 	ifp = adapter->ifp = if_alloc(IFT_ETHER);
 	if (ifp == NULL) {
 		device_printf(dev, "cannot allocate ifnet structure\n");
 		return (-1);
 	}
 	if_initname(ifp, device_get_name(dev), device_get_unit(dev));
 	ifp->if_init =  igb_init;
 	ifp->if_softc = adapter;
 	ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
 	ifp->if_ioctl = igb_ioctl;
 	ifp->if_get_counter = igb_get_counter;
 
 	/* TSO parameters */
 	ifp->if_hw_tsomax = IP_MAXPACKET;
 	ifp->if_hw_tsomaxsegcount = IGB_MAX_SCATTER;
 	ifp->if_hw_tsomaxsegsize = IGB_TSO_SEG_SIZE;
 
 #ifndef IGB_LEGACY_TX
 	ifp->if_transmit = igb_mq_start;
 	ifp->if_qflush = igb_qflush;
 #else
 	ifp->if_start = igb_start;
 	IFQ_SET_MAXLEN(&ifp->if_snd, adapter->num_tx_desc - 1);
 	ifp->if_snd.ifq_drv_maxlen = adapter->num_tx_desc - 1;
 	IFQ_SET_READY(&ifp->if_snd);
 #endif
 
 	ether_ifattach(ifp, adapter->hw.mac.addr);
 
 	ifp->if_capabilities = ifp->if_capenable = 0;
 
 	ifp->if_capabilities = IFCAP_HWCSUM | IFCAP_VLAN_HWCSUM;
 #if __FreeBSD_version >= 1000000
 	ifp->if_capabilities |= IFCAP_HWCSUM_IPV6;
 #endif
 	ifp->if_capabilities |= IFCAP_TSO;
 	ifp->if_capabilities |= IFCAP_JUMBO_MTU;
 	ifp->if_capenable = ifp->if_capabilities;
 
 	/* Don't enable LRO by default */
 	ifp->if_capabilities |= IFCAP_LRO;
 
 #ifdef DEVICE_POLLING
 	ifp->if_capabilities |= IFCAP_POLLING;
 #endif
 
 	/*
 	 * Tell the upper layer(s) we
 	 * support full VLAN capability.
 	 */
 	ifp->if_hdrlen = sizeof(struct ether_vlan_header);
 	ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING
 			     |  IFCAP_VLAN_HWTSO
 			     |  IFCAP_VLAN_MTU;
 	ifp->if_capenable |= IFCAP_VLAN_HWTAGGING
 			  |  IFCAP_VLAN_HWTSO
 			  |  IFCAP_VLAN_MTU;
 
 	/*
 	** Don't turn this on by default. If vlans are
 	** created on another pseudo device (e.g. lagg),
 	** vlan events are not passed through, breaking
 	** operation, but with HW FILTER off it works. If
 	** using vlans directly on the igb driver you can
 	** enable this and get full hardware tag filtering.
 	*/
 	ifp->if_capabilities |= IFCAP_VLAN_HWFILTER;
 
 	/*
 	 * Specify the media types supported by this adapter and register
 	 * callbacks to update media and link information
 	 */
 	ifmedia_init(&adapter->media, IFM_IMASK,
 	    igb_media_change, igb_media_status);
 	if ((adapter->hw.phy.media_type == e1000_media_type_fiber) ||
 	    (adapter->hw.phy.media_type == e1000_media_type_internal_serdes)) {
 		ifmedia_add(&adapter->media, IFM_ETHER | IFM_1000_SX | IFM_FDX, 
 			    0, NULL);
 		ifmedia_add(&adapter->media, IFM_ETHER | IFM_1000_SX, 0, NULL);
 	} else {
 		ifmedia_add(&adapter->media, IFM_ETHER | IFM_10_T, 0, NULL);
 		ifmedia_add(&adapter->media, IFM_ETHER | IFM_10_T | IFM_FDX,
 			    0, NULL);
 		ifmedia_add(&adapter->media, IFM_ETHER | IFM_100_TX,
 			    0, NULL);
 		ifmedia_add(&adapter->media, IFM_ETHER | IFM_100_TX | IFM_FDX,
 			    0, NULL);
 		if (adapter->hw.phy.type != e1000_phy_ife) {
 			ifmedia_add(&adapter->media,
 				IFM_ETHER | IFM_1000_T | IFM_FDX, 0, NULL);
 			ifmedia_add(&adapter->media,
 				IFM_ETHER | IFM_1000_T, 0, NULL);
 		}
 	}
 	ifmedia_add(&adapter->media, IFM_ETHER | IFM_AUTO, 0, NULL);
 	ifmedia_set(&adapter->media, IFM_ETHER | IFM_AUTO);
 	return (0);
 }
 
 
 /*
  * Manage DMA'able memory.
  */
 static void
 igb_dmamap_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
 {
 	if (error)
 		return;
 	*(bus_addr_t *) arg = segs[0].ds_addr;
 }
 
 static int
 igb_dma_malloc(struct adapter *adapter, bus_size_t size,
         struct igb_dma_alloc *dma, int mapflags)
 {
 	int error;
 
 	error = bus_dma_tag_create(bus_get_dma_tag(adapter->dev), /* parent */
 				IGB_DBA_ALIGN, 0,	/* alignment, bounds */
 				BUS_SPACE_MAXADDR,	/* lowaddr */
 				BUS_SPACE_MAXADDR,	/* highaddr */
 				NULL, NULL,		/* filter, filterarg */
 				size,			/* maxsize */
 				1,			/* nsegments */
 				size,			/* maxsegsize */
 				0,			/* flags */
 				NULL,			/* lockfunc */
 				NULL,			/* lockarg */
 				&dma->dma_tag);
 	if (error) {
 		device_printf(adapter->dev,
 		    "%s: bus_dma_tag_create failed: %d\n",
 		    __func__, error);
 		goto fail_0;
 	}
 
 	error = bus_dmamem_alloc(dma->dma_tag, (void**) &dma->dma_vaddr,
 	    BUS_DMA_NOWAIT | BUS_DMA_COHERENT, &dma->dma_map);
 	if (error) {
 		device_printf(adapter->dev,
 		    "%s: bus_dmamem_alloc(%ju) failed: %d\n",
 		    __func__, (uintmax_t)size, error);
 		goto fail_2;
 	}
 
 	dma->dma_paddr = 0;
 	error = bus_dmamap_load(dma->dma_tag, dma->dma_map, dma->dma_vaddr,
 	    size, igb_dmamap_cb, &dma->dma_paddr, mapflags | BUS_DMA_NOWAIT);
 	if (error || dma->dma_paddr == 0) {
 		device_printf(adapter->dev,
 		    "%s: bus_dmamap_load failed: %d\n",
 		    __func__, error);
 		goto fail_3;
 	}
 
 	return (0);
 
 fail_3:
 	bus_dmamap_unload(dma->dma_tag, dma->dma_map);
 	bus_dmamem_free(dma->dma_tag, dma->dma_vaddr, dma->dma_map);
 fail_2:
 	/* bus_dmamem_alloc failed: only the tag exists at this point */
 	bus_dma_tag_destroy(dma->dma_tag);
 fail_0:
 	dma->dma_tag = NULL;
 
 	return (error);
 }
 
 static void
 igb_dma_free(struct adapter *adapter, struct igb_dma_alloc *dma)
 {
 	if (dma->dma_tag == NULL)
 		return;
 	if (dma->dma_paddr != 0) {
 		bus_dmamap_sync(dma->dma_tag, dma->dma_map,
 		    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
 		bus_dmamap_unload(dma->dma_tag, dma->dma_map);
 		dma->dma_paddr = 0;
 	}
 	if (dma->dma_vaddr != NULL) {
 		bus_dmamem_free(dma->dma_tag, dma->dma_vaddr, dma->dma_map);
 		dma->dma_vaddr = NULL;
 	}
 	bus_dma_tag_destroy(dma->dma_tag);
 	dma->dma_tag = NULL;
 }
 
 
 /*********************************************************************
  *
  *  Allocate memory for the transmit and receive rings, and then
  *  the descriptors associated with each, called only once at attach.
  *
  **********************************************************************/
 static int
 igb_allocate_queues(struct adapter *adapter)
 {
 	device_t dev = adapter->dev;
 	struct igb_queue	*que = NULL;
 	struct tx_ring		*txr = NULL;
 	struct rx_ring		*rxr = NULL;
 	int rsize, tsize, error = E1000_SUCCESS;
 	int txconf = 0, rxconf = 0;
 
 	/* First allocate the top level queue structs */
 	if (!(adapter->queues =
 	    (struct igb_queue *) malloc(sizeof(struct igb_queue) *
 	    adapter->num_queues, M_DEVBUF, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate queue memory\n");
 		error = ENOMEM;
 		goto fail;
 	}
 
 	/* Next allocate the TX ring struct memory */
 	if (!(adapter->tx_rings =
 	    (struct tx_ring *) malloc(sizeof(struct tx_ring) *
 	    adapter->num_queues, M_DEVBUF, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate TX ring memory\n");
 		error = ENOMEM;
 		goto tx_fail;
 	}
 
 	/* Now allocate the RX */
 	if (!(adapter->rx_rings =
 	    (struct rx_ring *) malloc(sizeof(struct rx_ring) *
 	    adapter->num_queues, M_DEVBUF, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate RX ring memory\n");
 		error = ENOMEM;
 		goto rx_fail;
 	}
 
 	tsize = roundup2(adapter->num_tx_desc *
 	    sizeof(union e1000_adv_tx_desc), IGB_DBA_ALIGN);
 	/*
 	 * Now set up the TX queues, txconf is needed to handle the
 	 * possibility that things fail midcourse and we need to
 	 * undo memory gracefully
 	 */ 
 	for (int i = 0; i < adapter->num_queues; i++, txconf++) {
 		/* Set up some basics */
 		txr = &adapter->tx_rings[i];
 		txr->adapter = adapter;
 		txr->me = i;
 		txr->num_desc = adapter->num_tx_desc;
 
 		/* Initialize the TX lock */
 		snprintf(txr->mtx_name, sizeof(txr->mtx_name), "%s:tx(%d)",
 		    device_get_nameunit(dev), txr->me);
 		mtx_init(&txr->tx_mtx, txr->mtx_name, NULL, MTX_DEF);
 
 		if (igb_dma_malloc(adapter, tsize,
 			&txr->txdma, BUS_DMA_NOWAIT)) {
 			device_printf(dev,
 			    "Unable to allocate TX Descriptor memory\n");
 			error = ENOMEM;
 			goto err_tx_desc;
 		}
 		txr->tx_base = (union e1000_adv_tx_desc *)txr->txdma.dma_vaddr;
 		bzero((void *)txr->tx_base, tsize);
 
 		/* Now allocate transmit buffers for the ring */
 		if (igb_allocate_transmit_buffers(txr)) {
 			device_printf(dev,
 			    "Critical Failure setting up transmit buffers\n");
 			error = ENOMEM;
 			goto err_tx_desc;
 		}
 #ifndef IGB_LEGACY_TX
 		/* Allocate a buf ring */
 		txr->br = buf_ring_alloc(igb_buf_ring_size, M_DEVBUF,
 		    M_WAITOK, &txr->tx_mtx);
 #endif
 	}
 
 	/*
 	 * Next the RX queues...
 	 */ 
 	rsize = roundup2(adapter->num_rx_desc *
 	    sizeof(union e1000_adv_rx_desc), IGB_DBA_ALIGN);
 	for (int i = 0; i < adapter->num_queues; i++, rxconf++) {
 		rxr = &adapter->rx_rings[i];
 		rxr->adapter = adapter;
 		rxr->me = i;
 
 		/* Initialize the RX lock */
 		snprintf(rxr->mtx_name, sizeof(rxr->mtx_name), "%s:rx(%d)",
 		    device_get_nameunit(dev), rxr->me);
 		mtx_init(&rxr->rx_mtx, rxr->mtx_name, NULL, MTX_DEF);
 
 		if (igb_dma_malloc(adapter, rsize,
 			&rxr->rxdma, BUS_DMA_NOWAIT)) {
 			device_printf(dev,
 			    "Unable to allocate RX Descriptor memory\n");
 			error = ENOMEM;
 			goto err_rx_desc;
 		}
 		rxr->rx_base = (union e1000_adv_rx_desc *)rxr->rxdma.dma_vaddr;
 		bzero((void *)rxr->rx_base, rsize);
 
 		/* Allocate receive buffers for the ring */
 		if (igb_allocate_receive_buffers(rxr)) {
 			device_printf(dev,
 			    "Critical Failure setting up receive buffers\n");
 			error = ENOMEM;
 			goto err_rx_desc;
 		}
 	}
 
 	/*
 	** Finally set up the queue holding structs
 	*/
 	for (int i = 0; i < adapter->num_queues; i++) {
 		que = &adapter->queues[i];
 		que->adapter = adapter;
 		que->txr = &adapter->tx_rings[i];
 		que->rxr = &adapter->rx_rings[i];
 	}
 
 	return (0);
 
 err_rx_desc:
 	for (rxr = adapter->rx_rings; rxconf > 0; rxr++, rxconf--)
 		igb_dma_free(adapter, &rxr->rxdma);
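 err_tx_desc:
 	for (txr = adapter->tx_rings; txconf > 0; txr++, txconf--) {
 		igb_dma_free(adapter, &txr->txdma);
 #ifndef IGB_LEGACY_TX
 		/* Only fully configured rings own a buf ring */
 		if (txr->br != NULL)
 			buf_ring_free(txr->br, M_DEVBUF);
 #endif
 	}
 	free(adapter->rx_rings, M_DEVBUF);
 rx_fail:
 	/* txr is still NULL when we arrive here directly, so do not touch it */
 	free(adapter->tx_rings, M_DEVBUF);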
 tx_fail:
 	free(adapter->queues, M_DEVBUF);
 fail:
 	return (error);
 }
 
 /*********************************************************************
  *
  *  Allocate memory for tx_buffer structures. The tx_buffer stores all
  *  the information needed to transmit a packet on the wire. This is
  *  called only once at attach, setup is done every reset.
  *
  **********************************************************************/
 static int
 igb_allocate_transmit_buffers(struct tx_ring *txr)
 {
 	struct adapter *adapter = txr->adapter;
 	device_t dev = adapter->dev;
 	struct igb_tx_buf *txbuf;
 	int error, i;
 
 	/*
 	 * Setup DMA descriptor areas.
 	 */
 	if ((error = bus_dma_tag_create(bus_get_dma_tag(dev),
 			       1, 0,			/* alignment, bounds */
 			       BUS_SPACE_MAXADDR,	/* lowaddr */
 			       BUS_SPACE_MAXADDR,	/* highaddr */
 			       NULL, NULL,		/* filter, filterarg */
 			       IGB_TSO_SIZE,		/* maxsize */
 			       IGB_MAX_SCATTER,		/* nsegments */
 			       PAGE_SIZE,		/* maxsegsize */
 			       0,			/* flags */
 			       NULL,			/* lockfunc */
 			       NULL,			/* lockfuncarg */
 			       &txr->txtag))) {
 		device_printf(dev, "Unable to allocate TX DMA tag\n");
 		goto fail;
 	}
 
 	if (!(txr->tx_buffers =
 	    (struct igb_tx_buf *) malloc(sizeof(struct igb_tx_buf) *
 	    adapter->num_tx_desc, M_DEVBUF, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate tx_buffer memory\n");
 		error = ENOMEM;
 		goto fail;
 	}
 
 	/* Create the descriptor buffer DMA maps */
 	txbuf = txr->tx_buffers;
 	for (i = 0; i < adapter->num_tx_desc; i++, txbuf++) {
 		error = bus_dmamap_create(txr->txtag, 0, &txbuf->map);
 		if (error != 0) {
 			device_printf(dev, "Unable to create TX DMA map\n");
 			goto fail;
 		}
 	}
 
 	return (0);
 fail:
 	/* Free everything; this also handles a partially completed setup */
 	igb_free_transmit_structures(adapter);
 	return (error);
 }
 
 /*********************************************************************
  *
  *  Initialize a transmit ring.
  *
  **********************************************************************/
 static void
 igb_setup_transmit_ring(struct tx_ring *txr)
 {
 	struct adapter *adapter = txr->adapter;
 	struct igb_tx_buf *txbuf;
 	int i;
 #ifdef DEV_NETMAP
 	struct netmap_adapter *na = NA(adapter->ifp);
 	struct netmap_slot *slot;
 #endif /* DEV_NETMAP */
 
 	/* Clear the old descriptor contents */
 	IGB_TX_LOCK(txr);
 #ifdef DEV_NETMAP
 	slot = netmap_reset(na, NR_TX, txr->me, 0);
 #endif /* DEV_NETMAP */
 	bzero((void *)txr->tx_base,
 	      (sizeof(union e1000_adv_tx_desc)) * adapter->num_tx_desc);
 	/* Reset indices */
 	txr->next_avail_desc = 0;
 	txr->next_to_clean = 0;
 
 	/* Free any existing tx buffers. */
 	txbuf = txr->tx_buffers;
 	for (i = 0; i < adapter->num_tx_desc; i++, txbuf++) {
 		if (txbuf->m_head != NULL) {
 			bus_dmamap_sync(txr->txtag, txbuf->map,
 			    BUS_DMASYNC_POSTWRITE);
 			bus_dmamap_unload(txr->txtag, txbuf->map);
 			m_freem(txbuf->m_head);
 			txbuf->m_head = NULL;
 		}
 #ifdef DEV_NETMAP
 		if (slot) {
 			int si = netmap_idx_n2k(&na->tx_rings[txr->me], i);
 			/* no need to set the address */
 			netmap_load_map(na, txr->txtag, txbuf->map, NMB(na, slot + si));
 		}
 #endif /* DEV_NETMAP */
 		/* clear the watch index */
 		txbuf->eop = NULL;
 	}
 
 	/* Set number of descriptors available */
 	txr->tx_avail = adapter->num_tx_desc;
 
 	bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
 	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 	IGB_TX_UNLOCK(txr);
 }
 
 /*********************************************************************
  *
  *  Initialize all transmit rings.
  *
  **********************************************************************/
 static void
 igb_setup_transmit_structures(struct adapter *adapter)
 {
 	struct tx_ring *txr = adapter->tx_rings;
 
 	for (int i = 0; i < adapter->num_queues; i++, txr++)
 		igb_setup_transmit_ring(txr);
 
 	return;
 }
 
 /*********************************************************************
  *
  *  Enable transmit unit.
  *
  **********************************************************************/
 static void
 igb_initialize_transmit_units(struct adapter *adapter)
 {
 	struct tx_ring	*txr = adapter->tx_rings;
 	struct e1000_hw *hw = &adapter->hw;
 	u32		tctl, txdctl;
 
 	INIT_DEBUGOUT("igb_initialize_transmit_units: begin");
 	tctl = txdctl = 0;
 
 	/* Setup the Tx Descriptor Rings */
 	for (int i = 0; i < adapter->num_queues; i++, txr++) {
 		u64 bus_addr = txr->txdma.dma_paddr;
 
 		E1000_WRITE_REG(hw, E1000_TDLEN(i),
 		    adapter->num_tx_desc * sizeof(struct e1000_tx_desc));
 		E1000_WRITE_REG(hw, E1000_TDBAH(i),
 		    (uint32_t)(bus_addr >> 32));
 		E1000_WRITE_REG(hw, E1000_TDBAL(i),
 		    (uint32_t)bus_addr);
 
 		/* Setup the HW Tx Head and Tail descriptor pointers */
 		E1000_WRITE_REG(hw, E1000_TDT(i), 0);
 		E1000_WRITE_REG(hw, E1000_TDH(i), 0);
 
 		HW_DEBUGOUT2("Base = %x, Length = %x\n",
 		    E1000_READ_REG(hw, E1000_TDBAL(i)),
 		    E1000_READ_REG(hw, E1000_TDLEN(i)));
 
 		txr->queue_status = IGB_QUEUE_IDLE;
 
 		txdctl |= IGB_TX_PTHRESH;
 		txdctl |= IGB_TX_HTHRESH << 8;
 		txdctl |= IGB_TX_WTHRESH << 16;
 		txdctl |= E1000_TXDCTL_QUEUE_ENABLE;
 		E1000_WRITE_REG(hw, E1000_TXDCTL(i), txdctl);
 	}
 
 	if (adapter->vf_ifp)
 		return;
 
 	e1000_config_collision_dist(hw);
 
 	/* Program the Transmit Control Register */
 	tctl = E1000_READ_REG(hw, E1000_TCTL);
 	tctl &= ~E1000_TCTL_CT;
 	tctl |= (E1000_TCTL_PSP | E1000_TCTL_RTLC | E1000_TCTL_EN |
 		   (E1000_COLLISION_THRESHOLD << E1000_CT_SHIFT));
 
 	/* This write will effectively turn on the transmit unit. */
 	E1000_WRITE_REG(hw, E1000_TCTL, tctl);
 }
 
 /*********************************************************************
  *
  *  Free all transmit rings.
  *
  **********************************************************************/
 static void
 igb_free_transmit_structures(struct adapter *adapter)
 {
 	struct tx_ring *txr = adapter->tx_rings;
 
 	for (int i = 0; i < adapter->num_queues; i++, txr++) {
 		IGB_TX_LOCK(txr);
 		igb_free_transmit_buffers(txr);
 		igb_dma_free(adapter, &txr->txdma);
 		IGB_TX_UNLOCK(txr);
 		IGB_TX_LOCK_DESTROY(txr);
 	}
 	free(adapter->tx_rings, M_DEVBUF);
 }
 
 /*********************************************************************
  *
  *  Free transmit ring related data structures.
  *
  **********************************************************************/
 static void
 igb_free_transmit_buffers(struct tx_ring *txr)
 {
 	struct adapter *adapter = txr->adapter;
 	struct igb_tx_buf *tx_buffer;
 	int             i;
 
 	INIT_DEBUGOUT("free_transmit_ring: begin");
 
 	if (txr->tx_buffers == NULL)
 		return;
 
 	tx_buffer = txr->tx_buffers;
 	for (i = 0; i < adapter->num_tx_desc; i++, tx_buffer++) {
 		if (tx_buffer->m_head != NULL) {
 			bus_dmamap_sync(txr->txtag, tx_buffer->map,
 			    BUS_DMASYNC_POSTWRITE);
 			bus_dmamap_unload(txr->txtag,
 			    tx_buffer->map);
 			m_freem(tx_buffer->m_head);
 			tx_buffer->m_head = NULL;
 			if (tx_buffer->map != NULL) {
 				bus_dmamap_destroy(txr->txtag,
 				    tx_buffer->map);
 				tx_buffer->map = NULL;
 			}
 		} else if (tx_buffer->map != NULL) {
 			bus_dmamap_unload(txr->txtag,
 			    tx_buffer->map);
 			bus_dmamap_destroy(txr->txtag,
 			    tx_buffer->map);
 			tx_buffer->map = NULL;
 		}
 	}
 #ifndef IGB_LEGACY_TX
 	if (txr->br != NULL)
 		buf_ring_free(txr->br, M_DEVBUF);
 #endif
 	if (txr->tx_buffers != NULL) {
 		free(txr->tx_buffers, M_DEVBUF);
 		txr->tx_buffers = NULL;
 	}
 	if (txr->txtag != NULL) {
 		bus_dma_tag_destroy(txr->txtag);
 		txr->txtag = NULL;
 	}
 	return;
 }
 
 /**********************************************************************
  *
  *  Setup work for hardware segmentation offload (TSO) on
  *  adapters using advanced tx descriptors
  *
  **********************************************************************/
 static int
 igb_tso_setup(struct tx_ring *txr, struct mbuf *mp,
     u32 *cmd_type_len, u32 *olinfo_status)
 {
 	struct adapter *adapter = txr->adapter;
 	struct e1000_adv_tx_context_desc *TXD;
 	u32 vlan_macip_lens = 0, type_tucmd_mlhl = 0;
 	u32 mss_l4len_idx = 0, paylen;
 	u16 vtag = 0, eh_type;
 	int ctxd, ehdrlen, ip_hlen, tcp_hlen;
 	struct ether_vlan_header *eh;
 #ifdef INET6
 	struct ip6_hdr *ip6;
 #endif
 #ifdef INET
 	struct ip *ip;
 #endif
 	struct tcphdr *th;
 
 
 	/*
 	 * Determine where frame payload starts.
 	 * Jump over vlan headers if already present
 	 */
 	eh = mtod(mp, struct ether_vlan_header *);
 	if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
 		ehdrlen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
 		eh_type = eh->evl_proto;
 	} else {
 		ehdrlen = ETHER_HDR_LEN;
 		eh_type = eh->evl_encap_proto;
 	}
 
 	switch (ntohs(eh_type)) {
 #ifdef INET6
 	case ETHERTYPE_IPV6:
 		ip6 = (struct ip6_hdr *)(mp->m_data + ehdrlen);
 		/* XXX-BZ For now we do not pretend to support ext. hdrs. */
 		if (ip6->ip6_nxt != IPPROTO_TCP)
 			return (ENXIO);
 		ip_hlen = sizeof(struct ip6_hdr);
 		th = (struct tcphdr *)((caddr_t)ip6 + ip_hlen);
 		th->th_sum = in6_cksum_pseudo(ip6, 0, IPPROTO_TCP, 0);
 		type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_IPV6;
 		break;
 #endif
 #ifdef INET
 	case ETHERTYPE_IP:
 		ip = (struct ip *)(mp->m_data + ehdrlen);
 		if (ip->ip_p != IPPROTO_TCP)
 			return (ENXIO);
 		ip->ip_sum = 0;
 		ip_hlen = ip->ip_hl << 2;
 		th = (struct tcphdr *)((caddr_t)ip + ip_hlen);
 		th->th_sum = in_pseudo(ip->ip_src.s_addr,
 		    ip->ip_dst.s_addr, htons(IPPROTO_TCP));
 		type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_IPV4;
 		/* Tell transmit desc to also do IPv4 checksum. */
 		*olinfo_status |= E1000_TXD_POPTS_IXSM << 8;
 		break;
 #endif
 	default:
 		panic("%s: CSUM_TSO but no supported IP version (0x%04x)",
 		    __func__, ntohs(eh_type));
 		break;
 	}
 
 	ctxd = txr->next_avail_desc;
 	TXD = (struct e1000_adv_tx_context_desc *) &txr->tx_base[ctxd];
 
 	tcp_hlen = th->th_off << 2;
 
 	/* This is used in the transmit desc in encap */
 	paylen = mp->m_pkthdr.len - ehdrlen - ip_hlen - tcp_hlen;
 
 	/* VLAN MACLEN IPLEN */
 	if (mp->m_flags & M_VLANTAG) {
 		vtag = htole16(mp->m_pkthdr.ether_vtag);
 		vlan_macip_lens |= (vtag << E1000_ADVTXD_VLAN_SHIFT);
 	}
 
 	vlan_macip_lens |= ehdrlen << E1000_ADVTXD_MACLEN_SHIFT;
 	vlan_macip_lens |= ip_hlen;
 	TXD->vlan_macip_lens = htole32(vlan_macip_lens);
 
 	/* ADV DTYPE TUCMD */
 	type_tucmd_mlhl |= E1000_ADVTXD_DCMD_DEXT | E1000_ADVTXD_DTYP_CTXT;
 	type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_L4T_TCP;
 	TXD->type_tucmd_mlhl = htole32(type_tucmd_mlhl);
 
 	/* MSS L4LEN IDX */
 	mss_l4len_idx |= (mp->m_pkthdr.tso_segsz << E1000_ADVTXD_MSS_SHIFT);
 	mss_l4len_idx |= (tcp_hlen << E1000_ADVTXD_L4LEN_SHIFT);
 	/* 82575 needs the queue index added */
 	if (adapter->hw.mac.type == e1000_82575)
 		mss_l4len_idx |= txr->me << 4;
 	TXD->mss_l4len_idx = htole32(mss_l4len_idx);
 
 	TXD->seqnum_seed = htole32(0);
 
 	if (++ctxd == txr->num_desc)
 		ctxd = 0;
 
 	txr->tx_avail--;
 	txr->next_avail_desc = ctxd;
 	*cmd_type_len |= E1000_ADVTXD_DCMD_TSE;
 	*olinfo_status |= E1000_TXD_POPTS_TXSM << 8;
 	*olinfo_status |= paylen << E1000_ADVTXD_PAYLEN_SHIFT;
 	++txr->tso_tx;
 	return (0);
 }
 
 /*********************************************************************
  *
  *  Advanced Context Descriptor setup for VLAN, CSUM or TSO
  *
  **********************************************************************/
 
 static int
 igb_tx_ctx_setup(struct tx_ring *txr, struct mbuf *mp,
     u32 *cmd_type_len, u32 *olinfo_status)
 {
 	struct e1000_adv_tx_context_desc *TXD;
 	struct adapter *adapter = txr->adapter;
 	struct ether_vlan_header *eh;
 	struct ip *ip;
 	struct ip6_hdr *ip6;
 	u32 vlan_macip_lens = 0, type_tucmd_mlhl = 0, mss_l4len_idx = 0;
 	int	ehdrlen, ip_hlen = 0;
 	u16	etype;
 	u8	ipproto = 0;
 	int	offload = TRUE;
 	int	ctxd = txr->next_avail_desc;
 	u16	vtag = 0;
 
 	/* First check if TSO is to be used */
 	if (mp->m_pkthdr.csum_flags & CSUM_TSO)
 		return (igb_tso_setup(txr, mp, cmd_type_len, olinfo_status));
 
 	if ((mp->m_pkthdr.csum_flags & CSUM_OFFLOAD) == 0)
 		offload = FALSE;
 
 	/* Indicate the whole packet as payload when not doing TSO */
 	*olinfo_status |= mp->m_pkthdr.len << E1000_ADVTXD_PAYLEN_SHIFT;
 
 	/* Now ready a context descriptor */
 	TXD = (struct e1000_adv_tx_context_desc *) &txr->tx_base[ctxd];
 
 	/*
 	** In advanced descriptors the vlan tag must 
 	** be placed into the context descriptor. Hence
 	** we need to make one even if not doing offloads.
 	*/
 	if (mp->m_flags & M_VLANTAG) {
 		vtag = htole16(mp->m_pkthdr.ether_vtag);
 		vlan_macip_lens |= (vtag << E1000_ADVTXD_VLAN_SHIFT);
 	} else if (offload == FALSE) /* ... no offload to do */
 		return (0);
 
 	/*
 	 * Determine where frame payload starts.
 	 * Jump over vlan headers if already present,
 	 * helpful for QinQ too.
 	 */
 	eh = mtod(mp, struct ether_vlan_header *);
 	if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
 		etype = ntohs(eh->evl_proto);
 		ehdrlen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
 	} else {
 		etype = ntohs(eh->evl_encap_proto);
 		ehdrlen = ETHER_HDR_LEN;
 	}
 
 	/* Set the ether header length */
 	vlan_macip_lens |= ehdrlen << E1000_ADVTXD_MACLEN_SHIFT;
 
 	switch (etype) {
 		case ETHERTYPE_IP:
 			ip = (struct ip *)(mp->m_data + ehdrlen);
 			ip_hlen = ip->ip_hl << 2;
 			ipproto = ip->ip_p;
 			type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_IPV4;
 			break;
 		case ETHERTYPE_IPV6:
 			ip6 = (struct ip6_hdr *)(mp->m_data + ehdrlen);
 			ip_hlen = sizeof(struct ip6_hdr);
 			/* XXX-BZ this will go badly in case of ext hdrs. */
 			ipproto = ip6->ip6_nxt;
 			type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_IPV6;
 			break;
 		default:
 			offload = FALSE;
 			break;
 	}
 
 	vlan_macip_lens |= ip_hlen;
 	type_tucmd_mlhl |= E1000_ADVTXD_DCMD_DEXT | E1000_ADVTXD_DTYP_CTXT;
 
 	switch (ipproto) {
 		case IPPROTO_TCP:
 #if __FreeBSD_version >= 1000000
 			if (mp->m_pkthdr.csum_flags & (CSUM_IP_TCP | CSUM_IP6_TCP))
 #else
 			if (mp->m_pkthdr.csum_flags & CSUM_TCP)
 #endif
 				type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_L4T_TCP;
 			break;
 		case IPPROTO_UDP:
 #if __FreeBSD_version >= 1000000
 			if (mp->m_pkthdr.csum_flags & (CSUM_IP_UDP | CSUM_IP6_UDP))
 #else
 			if (mp->m_pkthdr.csum_flags & CSUM_UDP)
 #endif
 				type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_L4T_UDP;
 			break;
 
 #if __FreeBSD_version >= 800000
 		case IPPROTO_SCTP:
 #if __FreeBSD_version >= 1000000
 			if (mp->m_pkthdr.csum_flags & (CSUM_IP_SCTP | CSUM_IP6_SCTP))
 #else
 			if (mp->m_pkthdr.csum_flags & CSUM_SCTP)
 #endif
 				type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_L4T_SCTP;
 			break;
 #endif
 		default:
 			offload = FALSE;
 			break;
 	}
 
 	if (offload) /* For the TX descriptor setup */
 		*olinfo_status |= E1000_TXD_POPTS_TXSM << 8;
 
 	/* 82575 needs the queue index added */
 	if (adapter->hw.mac.type == e1000_82575)
 		mss_l4len_idx = txr->me << 4;
 
 	/* Now copy bits into descriptor */
 	TXD->vlan_macip_lens = htole32(vlan_macip_lens);
 	TXD->type_tucmd_mlhl = htole32(type_tucmd_mlhl);
 	TXD->seqnum_seed = htole32(0);
 	TXD->mss_l4len_idx = htole32(mss_l4len_idx);
 
 	/* We've consumed the first desc, adjust counters */
 	if (++ctxd == txr->num_desc)
 		ctxd = 0;
 	txr->next_avail_desc = ctxd;
 	--txr->tx_avail;
 
 	return (0);
 }
 
 /**********************************************************************
  *
  *  Examine each tx_buffer in the used queue. If the hardware is done
  *  processing the packet then free associated resources. The
  *  tx_buffer is put back on the free queue.
  *
  *  TRUE return means there's work in the ring to clean, FALSE means it's empty.
  **********************************************************************/
 static bool
 igb_txeof(struct tx_ring *txr)
 {
 	struct adapter		*adapter = txr->adapter;
 #ifdef DEV_NETMAP
 	struct ifnet		*ifp = adapter->ifp;
 #endif /* DEV_NETMAP */
 	u32			work, processed = 0;
 	int			limit = adapter->tx_process_limit;
 	struct igb_tx_buf	*buf;
 	union e1000_adv_tx_desc *txd;
 
 	mtx_assert(&txr->tx_mtx, MA_OWNED);
 
 #ifdef DEV_NETMAP
 	if (netmap_tx_irq(ifp, txr->me))
 		return (FALSE);
 #endif /* DEV_NETMAP */
 
 	if (txr->tx_avail == txr->num_desc) {
 		txr->queue_status = IGB_QUEUE_IDLE;
 		return (FALSE);
 	}
 
 	/* Get work starting point */
 	work = txr->next_to_clean;
 	buf = &txr->tx_buffers[work];
 	txd = &txr->tx_base[work];
 	work -= txr->num_desc; /* The distance to ring end */
         bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
             BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
 	do {
 		union e1000_adv_tx_desc *eop = buf->eop;
 		if (eop == NULL) /* No work */
 			break;
 
 		if ((eop->wb.status & E1000_TXD_STAT_DD) == 0)
 			break;	/* I/O not complete */
 
 		if (buf->m_head) {
 			txr->bytes +=
 			    buf->m_head->m_pkthdr.len;
 			bus_dmamap_sync(txr->txtag,
 			    buf->map,
 			    BUS_DMASYNC_POSTWRITE);
 			bus_dmamap_unload(txr->txtag,
 			    buf->map);
 			m_freem(buf->m_head);
 			buf->m_head = NULL;
 		}
 		buf->eop = NULL;
 		++txr->tx_avail;
 
 		/* We clean the range if multi segment */
 		while (txd != eop) {
 			++txd;
 			++buf;
 			++work;
 			/* wrap the ring? */
 			if (__predict_false(!work)) {
 				work -= txr->num_desc;
 				buf = txr->tx_buffers;
 				txd = txr->tx_base;
 			}
 			if (buf->m_head) {
 				txr->bytes +=
 				    buf->m_head->m_pkthdr.len;
 				bus_dmamap_sync(txr->txtag,
 				    buf->map,
 				    BUS_DMASYNC_POSTWRITE);
 				bus_dmamap_unload(txr->txtag,
 				    buf->map);
 				m_freem(buf->m_head);
 				buf->m_head = NULL;
 			}
 			++txr->tx_avail;
 			buf->eop = NULL;
 
 		}
 		++txr->packets;
 		++processed;
 		txr->watchdog_time = ticks;
 
 		/* Try the next packet */
 		++txd;
 		++buf;
 		++work;
 		/* reset with a wrap */
 		if (__predict_false(!work)) {
 			work -= txr->num_desc;
 			buf = txr->tx_buffers;
 			txd = txr->tx_base;
 		}
 		prefetch(txd);
 	} while (__predict_true(--limit));
 
 	bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
 	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
 	work += txr->num_desc;
 	txr->next_to_clean = work;
 
 	/*
 	** Watchdog calculation, we know there's
 	** work outstanding or the first return
 	** would have been taken, so none processed
 	** for too long indicates a hang.
 	*/
 	if ((!processed) && ((ticks - txr->watchdog_time) > IGB_WATCHDOG))
 		txr->queue_status |= IGB_QUEUE_HUNG;
 
 	if (txr->tx_avail >= IGB_QUEUE_THRESHOLD)
 		txr->queue_status &= ~IGB_QUEUE_DEPLETED;	
 
 	if (txr->tx_avail == txr->num_desc) {
 		txr->queue_status = IGB_QUEUE_IDLE;
 		return (FALSE);
 	}
 
 	return (TRUE);
 }
 
 /*********************************************************************
  *
  *  Refresh mbuf buffers for RX descriptor rings
  *   - now keeps its own state, so discards due to resource
  *     exhaustion are unnecessary. If an mbuf cannot be obtained,
  *     it just returns, keeping its placeholder, so it can simply
  *     be called again to retry.
  *
  **********************************************************************/
 static void
 igb_refresh_mbufs(struct rx_ring *rxr, int limit)
 {
 	struct adapter		*adapter = rxr->adapter;
 	bus_dma_segment_t	hseg[1];
 	bus_dma_segment_t	pseg[1];
 	struct igb_rx_buf	*rxbuf;
 	struct mbuf		*mh, *mp;
 	int			i, j, nsegs, error;
 	bool			refreshed = FALSE;
 
 	i = j = rxr->next_to_refresh;
 	/*
 	** Get one descriptor beyond
 	** our work mark to control
 	** the loop.
         */
 	if (++j == adapter->num_rx_desc)
 		j = 0;
 
 	while (j != limit) {
 		rxbuf = &rxr->rx_buffers[i];
 		/* No hdr mbuf used with header split off */
 		if (rxr->hdr_split == FALSE)
 			goto no_split;
 		if (rxbuf->m_head == NULL) {
 			mh = m_gethdr(M_NOWAIT, MT_DATA);
 			if (mh == NULL)
 				goto update;
 		} else
 			mh = rxbuf->m_head;
 
 		mh->m_pkthdr.len = MHLEN;
 		mh->m_len = MHLEN;
 		mh->m_flags |= M_PKTHDR;
 		/* Get the memory mapping */
 		error = bus_dmamap_load_mbuf_sg(rxr->htag,
 		    rxbuf->hmap, mh, hseg, &nsegs, BUS_DMA_NOWAIT);
 		if (error != 0) {
 			printf("Refresh mbufs: hdr dmamap load"
 			    " failure - %d\n", error);
 			m_free(mh);
 			rxbuf->m_head = NULL;
 			goto update;
 		}
 		rxbuf->m_head = mh;
 		bus_dmamap_sync(rxr->htag, rxbuf->hmap,
 		    BUS_DMASYNC_PREREAD);
 		rxr->rx_base[i].read.hdr_addr =
 		    htole64(hseg[0].ds_addr);
 no_split:
 		if (rxbuf->m_pack == NULL) {
 			mp = m_getjcl(M_NOWAIT, MT_DATA,
 			    M_PKTHDR, adapter->rx_mbuf_sz);
 			if (mp == NULL)
 				goto update;
 		} else
 			mp = rxbuf->m_pack;
 
 		mp->m_pkthdr.len = mp->m_len = adapter->rx_mbuf_sz;
 		/* Get the memory mapping */
 		error = bus_dmamap_load_mbuf_sg(rxr->ptag,
 		    rxbuf->pmap, mp, pseg, &nsegs, BUS_DMA_NOWAIT);
 		if (error != 0) {
 			printf("Refresh mbufs: payload dmamap load"
 			    " failure - %d\n", error);
 			m_free(mp);
 			rxbuf->m_pack = NULL;
 			goto update;
 		}
 		rxbuf->m_pack = mp;
 		bus_dmamap_sync(rxr->ptag, rxbuf->pmap,
 		    BUS_DMASYNC_PREREAD);
 		rxr->rx_base[i].read.pkt_addr =
 		    htole64(pseg[0].ds_addr);
 		refreshed = TRUE; /* I feel wefreshed :) */
 
 		i = j; /* our next is precalculated */
 		rxr->next_to_refresh = i;
 		if (++j == adapter->num_rx_desc)
 			j = 0;
 	}
 update:
 	if (refreshed) /* update tail */
 		E1000_WRITE_REG(&adapter->hw,
 		    E1000_RDT(rxr->me), rxr->next_to_refresh);
 	return;
 }
 
 
 /*********************************************************************
  *
  *  Allocate memory for rx_buffer structures. Since we use one
  *  rx_buffer per received packet, the maximum number of rx_buffers
  *  that we'll need is equal to the number of receive descriptors
  *  that we've allocated.
  *
  **********************************************************************/
 static int
 igb_allocate_receive_buffers(struct rx_ring *rxr)
 {
 	struct	adapter 	*adapter = rxr->adapter;
 	device_t 		dev = adapter->dev;
 	struct igb_rx_buf	*rxbuf;
 	int             	i, bsize, error;
 
 	bsize = sizeof(struct igb_rx_buf) * adapter->num_rx_desc;
 	if (!(rxr->rx_buffers =
 	    (struct igb_rx_buf *) malloc(bsize,
 	    M_DEVBUF, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate rx_buffer memory\n");
 		error = ENOMEM;
 		goto fail;
 	}
 
 	if ((error = bus_dma_tag_create(bus_get_dma_tag(dev),
 				   1, 0,		/* alignment, bounds */
 				   BUS_SPACE_MAXADDR,	/* lowaddr */
 				   BUS_SPACE_MAXADDR,	/* highaddr */
 				   NULL, NULL,		/* filter, filterarg */
 				   MSIZE,		/* maxsize */
 				   1,			/* nsegments */
 				   MSIZE,		/* maxsegsize */
 				   0,			/* flags */
 				   NULL,		/* lockfunc */
 				   NULL,		/* lockfuncarg */
 				   &rxr->htag))) {
 		device_printf(dev, "Unable to create RX DMA tag\n");
 		goto fail;
 	}
 
 	if ((error = bus_dma_tag_create(bus_get_dma_tag(dev),
 				   1, 0,		/* alignment, bounds */
 				   BUS_SPACE_MAXADDR,	/* lowaddr */
 				   BUS_SPACE_MAXADDR,	/* highaddr */
 				   NULL, NULL,		/* filter, filterarg */
 				   MJUM9BYTES,		/* maxsize */
 				   1,			/* nsegments */
 				   MJUM9BYTES,		/* maxsegsize */
 				   0,			/* flags */
 				   NULL,		/* lockfunc */
 				   NULL,		/* lockfuncarg */
 				   &rxr->ptag))) {
 		device_printf(dev, "Unable to create RX payload DMA tag\n");
 		goto fail;
 	}
 
 	for (i = 0; i < adapter->num_rx_desc; i++) {
 		rxbuf = &rxr->rx_buffers[i];
 		error = bus_dmamap_create(rxr->htag, 0, &rxbuf->hmap);
 		if (error) {
 			device_printf(dev,
 			    "Unable to create RX head DMA maps\n");
 			goto fail;
 		}
 		error = bus_dmamap_create(rxr->ptag, 0, &rxbuf->pmap);
 		if (error) {
 			device_printf(dev,
 			    "Unable to create RX packet DMA maps\n");
 			goto fail;
 		}
 	}
 
 	return (0);
 
 fail:
 	/* Frees all, but can handle partial completion */
 	igb_free_receive_structures(adapter);
 	return (error);
 }
 
 
 static void
 igb_free_receive_ring(struct rx_ring *rxr)
 {
 	struct	adapter		*adapter = rxr->adapter;
 	struct igb_rx_buf	*rxbuf;
 
 
 	for (int i = 0; i < adapter->num_rx_desc; i++) {
 		rxbuf = &rxr->rx_buffers[i];
 		if (rxbuf->m_head != NULL) {
 			bus_dmamap_sync(rxr->htag, rxbuf->hmap,
 			    BUS_DMASYNC_POSTREAD);
 			bus_dmamap_unload(rxr->htag, rxbuf->hmap);
 			rxbuf->m_head->m_flags |= M_PKTHDR;
 			m_freem(rxbuf->m_head);
 		}
 		if (rxbuf->m_pack != NULL) {
 			bus_dmamap_sync(rxr->ptag, rxbuf->pmap,
 			    BUS_DMASYNC_POSTREAD);
 			bus_dmamap_unload(rxr->ptag, rxbuf->pmap);
 			rxbuf->m_pack->m_flags |= M_PKTHDR;
 			m_freem(rxbuf->m_pack);
 		}
 		rxbuf->m_head = NULL;
 		rxbuf->m_pack = NULL;
 	}
 }
 
 
 /*********************************************************************
  *
  *  Initialize a receive ring and its buffers.
  *
  **********************************************************************/
 static int
 igb_setup_receive_ring(struct rx_ring *rxr)
 {
 	struct	adapter		*adapter;
 	struct  ifnet		*ifp;
 	device_t		dev;
 	struct igb_rx_buf	*rxbuf;
 	bus_dma_segment_t	pseg[1], hseg[1];
 	struct lro_ctrl		*lro = &rxr->lro;
 	int			rsize, nsegs, error = 0;
 #ifdef DEV_NETMAP
 	struct netmap_adapter *na = NA(rxr->adapter->ifp);
 	struct netmap_slot *slot;
 #endif /* DEV_NETMAP */
 
 	adapter = rxr->adapter;
 	dev = adapter->dev;
 	ifp = adapter->ifp;
 
 	/* Clear the ring contents */
 	IGB_RX_LOCK(rxr);
 #ifdef DEV_NETMAP
 	slot = netmap_reset(na, NR_RX, rxr->me, 0);
 #endif /* DEV_NETMAP */
 	rsize = roundup2(adapter->num_rx_desc *
 	    sizeof(union e1000_adv_rx_desc), IGB_DBA_ALIGN);
 	bzero((void *)rxr->rx_base, rsize);
 
 	/*
 	** Free current RX buffer structures and their mbufs
 	*/
 	igb_free_receive_ring(rxr);
 
 	/* Configure for header split? */
 	if (igb_header_split)
 		rxr->hdr_split = TRUE;
 
         /* Now replenish the ring mbufs */
 	for (int j = 0; j < adapter->num_rx_desc; ++j) {
 		struct mbuf	*mh, *mp;
 
 		rxbuf = &rxr->rx_buffers[j];
 #ifdef DEV_NETMAP
 		if (slot) {
 			/* slot sj is mapped to the j-th NIC-ring entry */
 			int sj = netmap_idx_n2k(&na->rx_rings[rxr->me], j);
 			uint64_t paddr;
 			void *addr;
 
 			addr = PNMB(na, slot + sj, &paddr);
 			netmap_load_map(na, rxr->ptag, rxbuf->pmap, addr);
 			/* Update descriptor */
 			rxr->rx_base[j].read.pkt_addr = htole64(paddr);
 			continue;
 		}
 #endif /* DEV_NETMAP */
 		if (rxr->hdr_split == FALSE)
 			goto skip_head;
 
 		/* First the header */
 		rxbuf->m_head = m_gethdr(M_NOWAIT, MT_DATA);
 		if (rxbuf->m_head == NULL) {
 			error = ENOBUFS;
                         goto fail;
 		}
 		m_adj(rxbuf->m_head, ETHER_ALIGN);
 		mh = rxbuf->m_head;
 		mh->m_len = mh->m_pkthdr.len = MHLEN;
 		mh->m_flags |= M_PKTHDR;
 		/* Get the memory mapping */
 		error = bus_dmamap_load_mbuf_sg(rxr->htag,
 		    rxbuf->hmap, rxbuf->m_head, hseg,
 		    &nsegs, BUS_DMA_NOWAIT);
 		if (error != 0) /* Nothing elegant to do here */
                         goto fail;
 		bus_dmamap_sync(rxr->htag,
 		    rxbuf->hmap, BUS_DMASYNC_PREREAD);
 		/* Update descriptor */
 		rxr->rx_base[j].read.hdr_addr = htole64(hseg[0].ds_addr);
 
 skip_head:
 		/* Now the payload cluster */
 		rxbuf->m_pack = m_getjcl(M_NOWAIT, MT_DATA,
 		    M_PKTHDR, adapter->rx_mbuf_sz);
 		if (rxbuf->m_pack == NULL) {
 			error = ENOBUFS;
                         goto fail;
 		}
 		mp = rxbuf->m_pack;
 		mp->m_pkthdr.len = mp->m_len = adapter->rx_mbuf_sz;
 		/* Get the memory mapping */
 		error = bus_dmamap_load_mbuf_sg(rxr->ptag,
 		    rxbuf->pmap, mp, pseg,
 		    &nsegs, BUS_DMA_NOWAIT);
 		if (error != 0)
                         goto fail;
 		bus_dmamap_sync(rxr->ptag,
 		    rxbuf->pmap, BUS_DMASYNC_PREREAD);
 		/* Update descriptor */
 		rxr->rx_base[j].read.pkt_addr = htole64(pseg[0].ds_addr);
         }
 
 	/* Setup our descriptor indices */
 	rxr->next_to_check = 0;
 	rxr->next_to_refresh = adapter->num_rx_desc - 1;
 	rxr->lro_enabled = FALSE;
 	rxr->rx_split_packets = 0;
 	rxr->rx_bytes = 0;
 
 	rxr->fmp = NULL;
 	rxr->lmp = NULL;
 
 	bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
 	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
 	/*
 	** Now set up the LRO interface. We
 	** also only do header split when LRO
 	** is enabled, since both are often
 	** undesirable in similar setups.
 	*/
 	if (ifp->if_capenable & IFCAP_LRO) {
 		error = tcp_lro_init(lro);
 		if (error) {
 			device_printf(dev, "LRO Initialization failed!\n");
 			goto fail;
 		}
 		INIT_DEBUGOUT("RX LRO Initialized\n");
 		rxr->lro_enabled = TRUE;
 		lro->ifp = adapter->ifp;
 	}
 
 	IGB_RX_UNLOCK(rxr);
 	return (0);
 
 fail:
 	igb_free_receive_ring(rxr);
 	IGB_RX_UNLOCK(rxr);
 	return (error);
 }
 
 
 /*********************************************************************
  *
  *  Initialize all receive rings.
  *
  **********************************************************************/
 static int
 igb_setup_receive_structures(struct adapter *adapter)
 {
 	struct rx_ring *rxr = adapter->rx_rings;
 	int i;
 
 	for (i = 0; i < adapter->num_queues; i++, rxr++)
 		if (igb_setup_receive_ring(rxr))
 			goto fail;
 
 	return (0);
 fail:
 	/*
 	 * Free RX buffers allocated so far. We only handle
 	 * the rings that completed; the failing ring will have
 	 * cleaned up after itself. 'i' is the endpoint.
 	 */
 	for (int j = 0; j < i; ++j) {
 		rxr = &adapter->rx_rings[j];
 		IGB_RX_LOCK(rxr);
 		igb_free_receive_ring(rxr);
 		IGB_RX_UNLOCK(rxr);
 	}
 
 	return (ENOBUFS);
 }
 
 /*
  * Initialise the RSS mapping for NICs that support multiple transmit/
  * receive rings.
  */
 static void
 igb_initialise_rss_mapping(struct adapter *adapter)
 {
 	struct e1000_hw *hw = &adapter->hw;
 	int i;
 	int queue_id;
 	u32 reta;
 	u32 rss_key[10], mrqc, shift = 0;
 
 	/* XXX? */
 	if (adapter->hw.mac.type == e1000_82575)
 		shift = 6;
 
 	/*
 	 * The redirection table controls which destination
 	 * queue each bucket redirects traffic to.
 	 * Each DWORD represents four queues, with the LSB
 	 * being the first queue in the DWORD.
 	 *
 	 * This just allocates buckets to queues using round-robin
 	 * allocation.
 	 *
 	 * NOTE: It Just Happens to line up with the default
 	 * RSS allocation method.
 	 */
 
 	/* Warning FM follows */
 	reta = 0;
 	for (i = 0; i < 128; i++) {
 #ifdef	RSS
 		queue_id = rss_get_indirection_to_bucket(i);
 		/*
 		 * If we have more queues than buckets, we'll
 		 * end up mapping buckets to a subset of the
 		 * queues.
 		 *
 		 * If we have more buckets than queues, we'll
 		 * end up instead assigning multiple buckets
 		 * to queues.
 		 *
 		 * Both are suboptimal, but we need to handle
 		 * the case so we don't go out of bounds
 		 * indexing arrays and such.
 		 */
 		queue_id = queue_id % adapter->num_queues;
 #else
 		queue_id = (i % adapter->num_queues);
 #endif
 		/* Adjust if required */
 		queue_id = queue_id << shift;
 
 		/*
 		 * The low 8 bits are for hash value (n+0);
 		 * The next 8 bits are for hash value (n+1), etc.
 		 */
 		reta = reta >> 8;
 		reta = reta | ( ((uint32_t) queue_id) << 24);
 		if ((i & 3) == 3) {
 			E1000_WRITE_REG(hw, E1000_RETA(i >> 2), reta);
 			reta = 0;
 		}
 	}
 
 	/* Now fill in hash table */
 
 	/*
 	 * MRQC: Multiple Receive Queues Command
 	 * Set queuing to RSS control, number depends on the device.
 	 */
 	mrqc = E1000_MRQC_ENABLE_RSS_8Q;
 
 #ifdef	RSS
 	/* XXX ew typecasting */
 	rss_getkey((uint8_t *) &rss_key);
 #else
 	arc4rand(&rss_key, sizeof(rss_key), 0);
 #endif
 	for (i = 0; i < 10; i++)
 		E1000_WRITE_REG_ARRAY(hw,
 		    E1000_RSSRK(0), i, rss_key[i]);
 
 	/*
 	 * Configure the RSS fields to hash upon.
 	 */
 	mrqc |= (E1000_MRQC_RSS_FIELD_IPV4 |
 	    E1000_MRQC_RSS_FIELD_IPV4_TCP);
 	mrqc |= (E1000_MRQC_RSS_FIELD_IPV6 |
 	    E1000_MRQC_RSS_FIELD_IPV6_TCP);
 	mrqc |= (E1000_MRQC_RSS_FIELD_IPV4_UDP |
 	    E1000_MRQC_RSS_FIELD_IPV6_UDP);
 	mrqc |= (E1000_MRQC_RSS_FIELD_IPV6_UDP_EX |
 	    E1000_MRQC_RSS_FIELD_IPV6_TCP_EX);
 
 	E1000_WRITE_REG(hw, E1000_MRQC, mrqc);
 }
 
 /*********************************************************************
  *
  *  Enable receive unit.
  *
  **********************************************************************/
 static void
 igb_initialize_receive_units(struct adapter *adapter)
 {
 	struct rx_ring	*rxr = adapter->rx_rings;
 	struct ifnet	*ifp = adapter->ifp;
 	struct e1000_hw *hw = &adapter->hw;
 	u32		rctl, rxcsum, psize, srrctl = 0;
 
 	INIT_DEBUGOUT("igb_initialize_receive_unit: begin");
 
 	/*
 	 * Make sure receives are disabled while setting
 	 * up the descriptor ring
 	 */
 	rctl = E1000_READ_REG(hw, E1000_RCTL);
 	E1000_WRITE_REG(hw, E1000_RCTL, rctl & ~E1000_RCTL_EN);
 
 	/*
 	** Set up for header split
 	*/
 	if (igb_header_split) {
 		/* Use a standard mbuf for the header */
 		srrctl |= IGB_HDR_BUF << E1000_SRRCTL_BSIZEHDRSIZE_SHIFT;
 		srrctl |= E1000_SRRCTL_DESCTYPE_HDR_SPLIT_ALWAYS;
 	} else
 		srrctl |= E1000_SRRCTL_DESCTYPE_ADV_ONEBUF;
 
 	/*
 	** Set up for jumbo frames
 	*/
 	if (ifp->if_mtu > ETHERMTU) {
 		rctl |= E1000_RCTL_LPE;
 		if (adapter->rx_mbuf_sz == MJUMPAGESIZE) {
 			srrctl |= 4096 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
 			rctl |= E1000_RCTL_SZ_4096 | E1000_RCTL_BSEX;
 		} else if (adapter->rx_mbuf_sz > MJUMPAGESIZE) {
 			srrctl |= 8192 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
 			rctl |= E1000_RCTL_SZ_8192 | E1000_RCTL_BSEX;
 		}
 		/* Set maximum packet len */
 		psize = adapter->max_frame_size;
 		/* are we on a vlan? */
 		if (adapter->ifp->if_vlantrunk != NULL)
 			psize += VLAN_TAG_SIZE;
 		E1000_WRITE_REG(&adapter->hw, E1000_RLPML, psize);
 	} else {
 		rctl &= ~E1000_RCTL_LPE;
 		srrctl |= 2048 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
 		rctl |= E1000_RCTL_SZ_2048;
 	}
 
 	/*
 	 * If TX flow control is disabled and there's >1 queue defined,
 	 * enable DROP.
 	 *
 	 * This drops frames rather than hanging the RX MAC for all queues.
 	 */
 	if ((adapter->num_queues > 1) &&
 	    (adapter->fc == e1000_fc_none ||
 	     adapter->fc == e1000_fc_rx_pause)) {
 		srrctl |= E1000_SRRCTL_DROP_EN;
 	}
 
 	/* Setup the Base and Length of the Rx Descriptor Rings */
 	for (int i = 0; i < adapter->num_queues; i++, rxr++) {
 		u64 bus_addr = rxr->rxdma.dma_paddr;
 		u32 rxdctl;
 
 		E1000_WRITE_REG(hw, E1000_RDLEN(i),
 		    adapter->num_rx_desc * sizeof(struct e1000_rx_desc));
 		E1000_WRITE_REG(hw, E1000_RDBAH(i),
 		    (uint32_t)(bus_addr >> 32));
 		E1000_WRITE_REG(hw, E1000_RDBAL(i),
 		    (uint32_t)bus_addr);
 		E1000_WRITE_REG(hw, E1000_SRRCTL(i), srrctl);
 		/* Enable this Queue */
 		rxdctl = E1000_READ_REG(hw, E1000_RXDCTL(i));
 		rxdctl |= E1000_RXDCTL_QUEUE_ENABLE;
 		rxdctl &= 0xFFF00000;
 		rxdctl |= IGB_RX_PTHRESH;
 		rxdctl |= IGB_RX_HTHRESH << 8;
 		rxdctl |= IGB_RX_WTHRESH << 16;
 		E1000_WRITE_REG(hw, E1000_RXDCTL(i), rxdctl);
 	}
 
 	/*
 	** Setup for RX MultiQueue
 	*/
 	rxcsum = E1000_READ_REG(hw, E1000_RXCSUM);
 	if (adapter->num_queues > 1) {
 
 		/* rss setup */
 		igb_initialise_rss_mapping(adapter);
 
 		/*
 		** NOTE: Receive Full-Packet Checksum Offload 
 		** is mutually exclusive with Multiqueue. However
 		** this is not the same as TCP/IP checksums which
 		** still work.
 		*/
 		rxcsum |= E1000_RXCSUM_PCSD;
 #if __FreeBSD_version >= 800000
 		/* For SCTP Offload */
 		if ((hw->mac.type != e1000_82575) &&
 		    (ifp->if_capenable & IFCAP_RXCSUM))
 			rxcsum |= E1000_RXCSUM_CRCOFL;
 #endif
 	} else {
 		/* Non RSS setup */
 		if (ifp->if_capenable & IFCAP_RXCSUM) {
 			rxcsum |= E1000_RXCSUM_IPPCSE;
 #if __FreeBSD_version >= 800000
 			if (adapter->hw.mac.type != e1000_82575)
 				rxcsum |= E1000_RXCSUM_CRCOFL;
 #endif
 		} else
 			rxcsum &= ~E1000_RXCSUM_TUOFL;
 	}
 	E1000_WRITE_REG(hw, E1000_RXCSUM, rxcsum);
 
 	/* Setup the Receive Control Register */
 	rctl &= ~(3 << E1000_RCTL_MO_SHIFT);
 	rctl |= E1000_RCTL_EN | E1000_RCTL_BAM | E1000_RCTL_LBM_NO |
 		   E1000_RCTL_RDMTS_HALF |
 		   (hw->mac.mc_filter_type << E1000_RCTL_MO_SHIFT);
 	/* Strip CRC bytes. */
 	rctl |= E1000_RCTL_SECRC;
 	/* Make sure VLAN Filters are off */
 	rctl &= ~E1000_RCTL_VFE;
 	/* Don't store bad packets */
 	rctl &= ~E1000_RCTL_SBP;
 
 	/* Enable Receives */
 	E1000_WRITE_REG(hw, E1000_RCTL, rctl);
 
 	/*
 	 * Setup the HW Rx Head and Tail Descriptor Pointers
 	 *   - needs to be after enable
 	 */
 	for (int i = 0; i < adapter->num_queues; i++) {
 		rxr = &adapter->rx_rings[i];
 		E1000_WRITE_REG(hw, E1000_RDH(i), rxr->next_to_check);
 #ifdef DEV_NETMAP
 		/*
 		 * an init() while a netmap client is active must
 		 * preserve the rx buffers passed to userspace.
 		 * In this driver it means we adjust RDT to
 		 * something different from next_to_refresh
 		 * (which is not used in netmap mode).
 		 */
 		if (ifp->if_capenable & IFCAP_NETMAP) {
 			struct netmap_adapter *na = NA(adapter->ifp);
 			struct netmap_kring *kring = &na->rx_rings[i];
 			int t = rxr->next_to_refresh - nm_kr_rxspace(kring);
 
 			if (t >= adapter->num_rx_desc)
 				t -= adapter->num_rx_desc;
 			else if (t < 0)
 				t += adapter->num_rx_desc;
 			E1000_WRITE_REG(hw, E1000_RDT(i), t);
 		} else
 #endif /* DEV_NETMAP */
 		E1000_WRITE_REG(hw, E1000_RDT(i), rxr->next_to_refresh);
 	}
 	return;
 }
 
 /*********************************************************************
  *
  *  Free receive rings.
  *
  **********************************************************************/
 static void
 igb_free_receive_structures(struct adapter *adapter)
 {
 	struct rx_ring *rxr = adapter->rx_rings;
 
 	for (int i = 0; i < adapter->num_queues; i++, rxr++) {
 		struct lro_ctrl	*lro = &rxr->lro;
 		igb_free_receive_buffers(rxr);
 		tcp_lro_free(lro);
 		igb_dma_free(adapter, &rxr->rxdma);
 	}
 
 	free(adapter->rx_rings, M_DEVBUF);
 }
 
 /*********************************************************************
  *
  *  Free receive ring data structures.
  *
  **********************************************************************/
 static void
 igb_free_receive_buffers(struct rx_ring *rxr)
 {
 	struct adapter		*adapter = rxr->adapter;
 	struct igb_rx_buf	*rxbuf;
 	int i;
 
 	INIT_DEBUGOUT("free_receive_structures: begin");
 
 	/* Cleanup any existing buffers */
 	if (rxr->rx_buffers != NULL) {
 		for (i = 0; i < adapter->num_rx_desc; i++) {
 			rxbuf = &rxr->rx_buffers[i];
 			if (rxbuf->m_head != NULL) {
 				bus_dmamap_sync(rxr->htag, rxbuf->hmap,
 				    BUS_DMASYNC_POSTREAD);
 				bus_dmamap_unload(rxr->htag, rxbuf->hmap);
 				rxbuf->m_head->m_flags |= M_PKTHDR;
 				m_freem(rxbuf->m_head);
 			}
 			if (rxbuf->m_pack != NULL) {
 				bus_dmamap_sync(rxr->ptag, rxbuf->pmap,
 				    BUS_DMASYNC_POSTREAD);
 				bus_dmamap_unload(rxr->ptag, rxbuf->pmap);
 				rxbuf->m_pack->m_flags |= M_PKTHDR;
 				m_freem(rxbuf->m_pack);
 			}
 			rxbuf->m_head = NULL;
 			rxbuf->m_pack = NULL;
 			if (rxbuf->hmap != NULL) {
 				bus_dmamap_destroy(rxr->htag, rxbuf->hmap);
 				rxbuf->hmap = NULL;
 			}
 			if (rxbuf->pmap != NULL) {
 				bus_dmamap_destroy(rxr->ptag, rxbuf->pmap);
 				rxbuf->pmap = NULL;
 			}
 		}
 		if (rxr->rx_buffers != NULL) {
 			free(rxr->rx_buffers, M_DEVBUF);
 			rxr->rx_buffers = NULL;
 		}
 	}
 
 	if (rxr->htag != NULL) {
 		bus_dma_tag_destroy(rxr->htag);
 		rxr->htag = NULL;
 	}
 	if (rxr->ptag != NULL) {
 		bus_dma_tag_destroy(rxr->ptag);
 		rxr->ptag = NULL;
 	}
 }
 
 static __inline void
 igb_rx_discard(struct rx_ring *rxr, int i)
 {
 	struct igb_rx_buf	*rbuf;
 
 	rbuf = &rxr->rx_buffers[i];
 
 	/* Partially received? Free the chain */
 	if (rxr->fmp != NULL) {
 		rxr->fmp->m_flags |= M_PKTHDR;
 		m_freem(rxr->fmp);
 		rxr->fmp = NULL;
 		rxr->lmp = NULL;
 	}
 
 	/*
 	** With advanced descriptors the writeback
 	** clobbers the buffer addrs, so it's easier
 	** to just free the existing mbufs and take
 	** the normal refresh path to get new buffers
 	** and mapping.
 	*/
 	if (rbuf->m_head) {
 		m_free(rbuf->m_head);
 		rbuf->m_head = NULL;
 		bus_dmamap_unload(rxr->htag, rbuf->hmap);
 	}
 
 	if (rbuf->m_pack) {
 		m_free(rbuf->m_pack);
 		rbuf->m_pack = NULL;
 		bus_dmamap_unload(rxr->ptag, rbuf->pmap);
 	}
 
 	return;
 }
 
 static __inline void
 igb_rx_input(struct rx_ring *rxr, struct ifnet *ifp, struct mbuf *m, u32 ptype)
 {
 
 	/*
 	 * At the moment LRO is only for IPv4/TCP packets, and the TCP checksum
 	 * of the packet should be computed by hardware. Also it should not have
 	 * a VLAN tag in the ethernet header.
 	 */
 	if (rxr->lro_enabled &&
 	    (ifp->if_capenable & IFCAP_VLAN_HWTAGGING) != 0 &&
 	    (ptype & E1000_RXDADV_PKTTYPE_ETQF) == 0 &&
 	    (ptype & (E1000_RXDADV_PKTTYPE_IPV4 | E1000_RXDADV_PKTTYPE_TCP)) ==
 	    (E1000_RXDADV_PKTTYPE_IPV4 | E1000_RXDADV_PKTTYPE_TCP) &&
 	    (m->m_pkthdr.csum_flags & (CSUM_DATA_VALID | CSUM_PSEUDO_HDR)) == 
 	    (CSUM_DATA_VALID | CSUM_PSEUDO_HDR)) {
 		/*
 		 * Send to the stack if:
 		 **  - LRO not enabled, or
 		 **  - no LRO resources, or
 		 **  - lro enqueue fails
 		 */
 		if (rxr->lro.lro_cnt != 0)
 			if (tcp_lro_rx(&rxr->lro, m, 0) == 0)
 				return;
 	}
 	IGB_RX_UNLOCK(rxr);
 	(*ifp->if_input)(ifp, m);
 	IGB_RX_LOCK(rxr);
 }
 
 /*********************************************************************
  *
  *  This routine executes in interrupt context. It replenishes
  *  the mbufs in the descriptor and sends data which has been
  *  dma'ed into host memory to the upper layer.
  *
  *  We loop at most count times if count is > 0, or until done if
  *  count < 0.
  *
  *  Return TRUE if more to clean, FALSE otherwise
  *********************************************************************/
 static bool
 igb_rxeof(struct igb_queue *que, int count, int *done)
 {
 	struct adapter		*adapter = que->adapter;
 	struct rx_ring		*rxr = que->rxr;
 	struct ifnet		*ifp = adapter->ifp;
 	struct lro_ctrl		*lro = &rxr->lro;
 	int			i, processed = 0, rxdone = 0;
 	u32			ptype, staterr = 0;
 	union e1000_adv_rx_desc	*cur;
 
 	IGB_RX_LOCK(rxr);
 	/* Sync the ring. */
 	bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
 	    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
 
 #ifdef DEV_NETMAP
 	if (netmap_rx_irq(ifp, rxr->me, &processed)) {
 		IGB_RX_UNLOCK(rxr);
 		return (FALSE);
 	}
 #endif /* DEV_NETMAP */
 
 	/* Main clean loop */
 	for (i = rxr->next_to_check; count != 0;) {
 		struct mbuf		*sendmp, *mh, *mp;
 		struct igb_rx_buf	*rxbuf;
 		u16			hlen, plen, hdr, vtag, pkt_info;
 		bool			eop = FALSE;
  
 		cur = &rxr->rx_base[i];
 		staterr = le32toh(cur->wb.upper.status_error);
 		if ((staterr & E1000_RXD_STAT_DD) == 0)
 			break;
 		if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
 			break;
 		count--;
 		sendmp = mh = mp = NULL;
 		cur->wb.upper.status_error = 0;
 		rxbuf = &rxr->rx_buffers[i];
 		plen = le16toh(cur->wb.upper.length);
 		ptype = le32toh(cur->wb.lower.lo_dword.data) & IGB_PKTTYPE_MASK;
 		if (((adapter->hw.mac.type == e1000_i350) ||
 		    (adapter->hw.mac.type == e1000_i354)) &&
 		    (staterr & E1000_RXDEXT_STATERR_LB))
 			vtag = be16toh(cur->wb.upper.vlan);
 		else
 			vtag = le16toh(cur->wb.upper.vlan);
 		hdr = le16toh(cur->wb.lower.lo_dword.hs_rss.hdr_info);
 		pkt_info = le16toh(cur->wb.lower.lo_dword.hs_rss.pkt_info);
 		eop = ((staterr & E1000_RXD_STAT_EOP) == E1000_RXD_STAT_EOP);
 
 		/*
 		 * Free the frame (all segments) if we're at EOP and
 		 * it's an error.
 		 *
 		 * The datasheet states that EOP + status is only valid for
 		 * the final segment in a multi-segment frame.
 		 */
 		if (eop && ((staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) != 0)) {
 			adapter->dropped_pkts++;
 			++rxr->rx_discarded;
 			igb_rx_discard(rxr, i);
 			goto next_desc;
 		}
 
 		/*
 		** The way the hardware is configured to
 		** split, it will ONLY use the header buffer
 		** when header split is enabled, otherwise we
 		** get normal behavior, ie, both header and
 		** payload are DMA'd into the payload buffer.
 		**
 		** The fmp test is to catch the case where a
 		** packet spans multiple descriptors, in that
 		** case only the first header is valid.
 		*/
 		if (rxr->hdr_split && rxr->fmp == NULL) {
 			bus_dmamap_unload(rxr->htag, rxbuf->hmap);
 			hlen = (hdr & E1000_RXDADV_HDRBUFLEN_MASK) >>
 			    E1000_RXDADV_HDRBUFLEN_SHIFT;
 			if (hlen > IGB_HDR_BUF)
 				hlen = IGB_HDR_BUF;
 			mh = rxr->rx_buffers[i].m_head;
 			mh->m_len = hlen;
 			/* clear buf pointer for refresh */
 			rxbuf->m_head = NULL;
 			/*
 			** Get the payload length, this
 			** could be zero if it's a small
 			** packet.
 			*/
 			if (plen > 0) {
 				mp = rxr->rx_buffers[i].m_pack;
 				mp->m_len = plen;
 				mh->m_next = mp;
 				/* clear buf pointer */
 				rxbuf->m_pack = NULL;
 				rxr->rx_split_packets++;
 			}
 		} else {
 			/*
 			** Either no header split, or a
 			** secondary piece of a fragmented
 			** split packet.
 			*/
 			mh = rxr->rx_buffers[i].m_pack;
 			mh->m_len = plen;
 			/* clear buf info for refresh */
 			rxbuf->m_pack = NULL;
 		}
 		bus_dmamap_unload(rxr->ptag, rxbuf->pmap);
 
 		++processed; /* So we know when to refresh */
 
 		/* Initial frame - setup */
 		if (rxr->fmp == NULL) {
 			mh->m_pkthdr.len = mh->m_len;
 			/* Save the head of the chain */
 			rxr->fmp = mh;
 			rxr->lmp = mh;
 			if (mp != NULL) {
 				/* Add payload if split */
 				mh->m_pkthdr.len += mp->m_len;
 				rxr->lmp = mh->m_next;
 			}
 		} else {
 			/* Chain mbufs together */
 			rxr->lmp->m_next = mh;
 			rxr->lmp = rxr->lmp->m_next;
 			rxr->fmp->m_pkthdr.len += mh->m_len;
 		}
 
 		if (eop) {
 			rxr->fmp->m_pkthdr.rcvif = ifp;
 			rxr->rx_packets++;
 			/* capture data for AIM */
 			rxr->packets++;
 			rxr->bytes += rxr->fmp->m_pkthdr.len;
 			rxr->rx_bytes += rxr->fmp->m_pkthdr.len;
 
 			if ((ifp->if_capenable & IFCAP_RXCSUM) != 0)
 				igb_rx_checksum(staterr, rxr->fmp, ptype);
 
 			if ((ifp->if_capenable & IFCAP_VLAN_HWTAGGING) != 0 &&
 			    (staterr & E1000_RXD_STAT_VP) != 0) {
 				rxr->fmp->m_pkthdr.ether_vtag = vtag;
 				rxr->fmp->m_flags |= M_VLANTAG;
 			}
 
 			/*
 			 * In case of multiqueue, we have RXCSUM.PCSD bit set
 			 * and never cleared. This means we have RSS hash
 			 * available to be used.
 			 */
 			if (adapter->num_queues > 1) {
 				rxr->fmp->m_pkthdr.flowid = 
 				    le32toh(cur->wb.lower.hi_dword.rss);
 				switch (pkt_info & E1000_RXDADV_RSSTYPE_MASK) {
 					case E1000_RXDADV_RSSTYPE_IPV4_TCP:
 						M_HASHTYPE_SET(rxr->fmp,
 						    M_HASHTYPE_RSS_TCP_IPV4);
 					break;
 					case E1000_RXDADV_RSSTYPE_IPV4:
 						M_HASHTYPE_SET(rxr->fmp,
 						    M_HASHTYPE_RSS_IPV4);
 					break;
 					case E1000_RXDADV_RSSTYPE_IPV6_TCP:
 						M_HASHTYPE_SET(rxr->fmp,
 						    M_HASHTYPE_RSS_TCP_IPV6);
 					break;
 					case E1000_RXDADV_RSSTYPE_IPV6_EX:
 						M_HASHTYPE_SET(rxr->fmp,
 						    M_HASHTYPE_RSS_IPV6_EX);
 					break;
 					case E1000_RXDADV_RSSTYPE_IPV6:
 						M_HASHTYPE_SET(rxr->fmp,
 						    M_HASHTYPE_RSS_IPV6);
 					break;
 					case E1000_RXDADV_RSSTYPE_IPV6_TCP_EX:
 						M_HASHTYPE_SET(rxr->fmp,
 						    M_HASHTYPE_RSS_TCP_IPV6_EX);
 					break;
 					default:
 						/* XXX fallthrough */
 						M_HASHTYPE_SET(rxr->fmp,
-						    M_HASHTYPE_OPAQUE);
+						    M_HASHTYPE_OPAQUE_HASH);
 				}
 			} else {
 #ifndef IGB_LEGACY_TX
 				rxr->fmp->m_pkthdr.flowid = que->msix;
 				M_HASHTYPE_SET(rxr->fmp, M_HASHTYPE_OPAQUE);
 #endif
 			}
 			sendmp = rxr->fmp;
 			/* Make sure to set M_PKTHDR. */
 			sendmp->m_flags |= M_PKTHDR;
 			rxr->fmp = NULL;
 			rxr->lmp = NULL;
 		}
 
 next_desc:
 		bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
 		    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
 		/* Advance our pointers to the next descriptor. */
 		if (++i == adapter->num_rx_desc)
 			i = 0;
 		/*
 		** Send to the stack or LRO
 		*/
 		if (sendmp != NULL) {
 			rxr->next_to_check = i;
 			igb_rx_input(rxr, ifp, sendmp, ptype);
 			i = rxr->next_to_check;
 			rxdone++;
 		}
 
 		/* Every 8 descriptors we go to refresh mbufs */
 		if (processed == 8) {
 			igb_refresh_mbufs(rxr, i);
 			processed = 0;
 		}
 	}
 
 	/* Catch any remainders */
 	if (igb_rx_unrefreshed(rxr))
 		igb_refresh_mbufs(rxr, i);
 
 	rxr->next_to_check = i;
 
 	/*
 	 * Flush any outstanding LRO work
 	 */
 	tcp_lro_flush_all(lro);
 
 	if (done != NULL)
 		*done += rxdone;
 
 	IGB_RX_UNLOCK(rxr);
 	return ((staterr & E1000_RXD_STAT_DD) ? TRUE : FALSE);
 }
 
 /*********************************************************************
  *
  *  Verify that the hardware indicated that the checksum is valid.
  *  Inform the stack about the status of checksum so that stack
  *  doesn't spend time verifying the checksum.
  *
  *********************************************************************/
 static void
 igb_rx_checksum(u32 staterr, struct mbuf *mp, u32 ptype)
 {
 	u16 status = (u16)staterr;
 	u8  errors = (u8) (staterr >> 24);
 	int sctp;
 
 	/* Ignore Checksum bit is set */
 	if (status & E1000_RXD_STAT_IXSM) {
 		mp->m_pkthdr.csum_flags = 0;
 		return;
 	}
 
 	if ((ptype & E1000_RXDADV_PKTTYPE_ETQF) == 0 &&
 	    (ptype & E1000_RXDADV_PKTTYPE_SCTP) != 0)
 		sctp = 1;
 	else
 		sctp = 0;
 	if (status & E1000_RXD_STAT_IPCS) {
 		/* Did it pass? */
 		if (!(errors & E1000_RXD_ERR_IPE)) {
 			/* IP Checksum Good */
 			mp->m_pkthdr.csum_flags = CSUM_IP_CHECKED;
 			mp->m_pkthdr.csum_flags |= CSUM_IP_VALID;
 		} else
 			mp->m_pkthdr.csum_flags = 0;
 	}
 
 	if (status & (E1000_RXD_STAT_TCPCS | E1000_RXD_STAT_UDPCS)) {
 		u64 type = (CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
 #if __FreeBSD_version >= 800000
 		if (sctp) /* reassign */
 			type = CSUM_SCTP_VALID;
 #endif
 		/* Did it pass? */
 		if (!(errors & E1000_RXD_ERR_TCPE)) {
 			mp->m_pkthdr.csum_flags |= type;
 			if (sctp == 0)
 				mp->m_pkthdr.csum_data = htons(0xffff);
 		}
 	}
 	return;
 }
 
 /*
  * This routine is run via a vlan
  * config EVENT
  */
 static void
 igb_register_vlan(void *arg, struct ifnet *ifp, u16 vtag)
 {
 	struct adapter	*adapter = ifp->if_softc;
 	u32		index, bit;
 
 	if (ifp->if_softc !=  arg)   /* Not our event */
 		return;
 
 	if ((vtag == 0) || (vtag > 4095))       /* Invalid */
 		return;
 
 	IGB_CORE_LOCK(adapter);
 	index = (vtag >> 5) & 0x7F;
 	bit = vtag & 0x1F;
 	adapter->shadow_vfta[index] |= (1 << bit);
 	++adapter->num_vlans;
 	/* Change hw filter setting */
 	if (ifp->if_capenable & IFCAP_VLAN_HWFILTER)
 		igb_setup_vlan_hw_support(adapter);
 	IGB_CORE_UNLOCK(adapter);
 }
 
 /*
  * This routine is run via a vlan
  * unconfig EVENT
  */
 static void
 igb_unregister_vlan(void *arg, struct ifnet *ifp, u16 vtag)
 {
 	struct adapter	*adapter = ifp->if_softc;
 	u32		index, bit;
 
 	if (ifp->if_softc !=  arg)
 		return;
 
 	if ((vtag == 0) || (vtag > 4095))       /* Invalid */
 		return;
 
 	IGB_CORE_LOCK(adapter);
 	index = (vtag >> 5) & 0x7F;
 	bit = vtag & 0x1F;
 	adapter->shadow_vfta[index] &= ~(1 << bit);
 	--adapter->num_vlans;
 	/* Change hw filter setting */
 	if (ifp->if_capenable & IFCAP_VLAN_HWFILTER)
 		igb_setup_vlan_hw_support(adapter);
 	IGB_CORE_UNLOCK(adapter);
 }
 
 static void
 igb_setup_vlan_hw_support(struct adapter *adapter)
 {
 	struct e1000_hw *hw = &adapter->hw;
 	struct ifnet	*ifp = adapter->ifp;
 	u32             reg;
 
 	if (adapter->vf_ifp) {
 		e1000_rlpml_set_vf(hw,
 		    adapter->max_frame_size + VLAN_TAG_SIZE);
 		return;
 	}
 
 	reg = E1000_READ_REG(hw, E1000_CTRL);
 	reg |= E1000_CTRL_VME;
 	E1000_WRITE_REG(hw, E1000_CTRL, reg);
 
 	/* Enable the Filter Table */
 	if (ifp->if_capenable & IFCAP_VLAN_HWFILTER) {
 		reg = E1000_READ_REG(hw, E1000_RCTL);
 		reg &= ~E1000_RCTL_CFIEN;
 		reg |= E1000_RCTL_VFE;
 		E1000_WRITE_REG(hw, E1000_RCTL, reg);
 	}
 
 	/* Update the frame size */
 	E1000_WRITE_REG(&adapter->hw, E1000_RLPML,
 	    adapter->max_frame_size + VLAN_TAG_SIZE);
 
 	/* Don't bother with table if no vlans */
 	if ((adapter->num_vlans == 0) ||
 	    ((ifp->if_capenable & IFCAP_VLAN_HWFILTER) == 0))
 		return;
 	/*
 	** A soft reset zeroes out the VFTA, so
 	** we need to repopulate it now.
 	*/
 	for (int i = 0; i < IGB_VFTA_SIZE; i++)
 		if (adapter->shadow_vfta[i] != 0) {
 			if (adapter->vf_ifp)
 				e1000_vfta_set_vf(hw,
 				    adapter->shadow_vfta[i], TRUE);
 			else
 				e1000_write_vfta(hw,
 				    i, adapter->shadow_vfta[i]);
 		}
 }
 
 static void
 igb_enable_intr(struct adapter *adapter)
 {
 	/* With RSS set up what to auto clear */
 	if (adapter->msix_mem) {
 		u32 mask = (adapter->que_mask | adapter->link_mask);
 		E1000_WRITE_REG(&adapter->hw, E1000_EIAC, mask);
 		E1000_WRITE_REG(&adapter->hw, E1000_EIAM, mask);
 		E1000_WRITE_REG(&adapter->hw, E1000_EIMS, mask);
 		E1000_WRITE_REG(&adapter->hw, E1000_IMS,
 		    E1000_IMS_LSC);
 	} else {
 		E1000_WRITE_REG(&adapter->hw, E1000_IMS,
 		    IMS_ENABLE_MASK);
 	}
 	E1000_WRITE_FLUSH(&adapter->hw);
 
 	return;
 }
 
 static void
 igb_disable_intr(struct adapter *adapter)
 {
 	if (adapter->msix_mem) {
 		E1000_WRITE_REG(&adapter->hw, E1000_EIMC, ~0);
 		E1000_WRITE_REG(&adapter->hw, E1000_EIAC, 0);
 	} 
 	E1000_WRITE_REG(&adapter->hw, E1000_IMC, ~0);
 	E1000_WRITE_FLUSH(&adapter->hw);
 	return;
 }
 
 /*
  * Bit of a misnomer, what this really means is
  * to enable OS management of the system... aka
  * to disable special hardware management features 
  */
 static void
 igb_init_manageability(struct adapter *adapter)
 {
 	if (adapter->has_manage) {
 		int manc2h = E1000_READ_REG(&adapter->hw, E1000_MANC2H);
 		int manc = E1000_READ_REG(&adapter->hw, E1000_MANC);
 
 		/* disable hardware interception of ARP */
 		manc &= ~(E1000_MANC_ARP_EN);
 
 		/* enable receiving management packets to the host */
 		manc |= E1000_MANC_EN_MNG2HOST;
 		manc2h |= 1 << 5;  /* Mng Port 623 */
 		manc2h |= 1 << 6;  /* Mng Port 664 */
 		E1000_WRITE_REG(&adapter->hw, E1000_MANC2H, manc2h);
 		E1000_WRITE_REG(&adapter->hw, E1000_MANC, manc);
 	}
 }
 
 /*
  * Give control back to hardware management
  * controller if there is one.
  */
 static void
 igb_release_manageability(struct adapter *adapter)
 {
 	if (adapter->has_manage) {
 		int manc = E1000_READ_REG(&adapter->hw, E1000_MANC);
 
 		/* re-enable hardware interception of ARP */
 		manc |= E1000_MANC_ARP_EN;
 		manc &= ~E1000_MANC_EN_MNG2HOST;
 
 		E1000_WRITE_REG(&adapter->hw, E1000_MANC, manc);
 	}
 }
 
 /*
  * igb_get_hw_control sets CTRL_EXT:DRV_LOAD bit.
  * For ASF and Pass Through versions of f/w this means that
  * the driver is loaded. 
  *
  */
 static void
 igb_get_hw_control(struct adapter *adapter)
 {
 	u32 ctrl_ext;
 
 	if (adapter->vf_ifp)
 		return;
 
 	/* Let firmware know the driver has taken over */
 	ctrl_ext = E1000_READ_REG(&adapter->hw, E1000_CTRL_EXT);
 	E1000_WRITE_REG(&adapter->hw, E1000_CTRL_EXT,
 	    ctrl_ext | E1000_CTRL_EXT_DRV_LOAD);
 }
 
 /*
  * igb_release_hw_control resets CTRL_EXT:DRV_LOAD bit.
  * For ASF and Pass Through versions of f/w this means that the
  * driver is no longer loaded.
  *
  */
 static void
 igb_release_hw_control(struct adapter *adapter)
 {
 	u32 ctrl_ext;
 
 	if (adapter->vf_ifp)
 		return;
 
 	/* Let firmware take over control of h/w */
 	ctrl_ext = E1000_READ_REG(&adapter->hw, E1000_CTRL_EXT);
 	E1000_WRITE_REG(&adapter->hw, E1000_CTRL_EXT,
 	    ctrl_ext & ~E1000_CTRL_EXT_DRV_LOAD);
 }
 
 static int
 igb_is_valid_ether_addr(uint8_t *addr)
 {
 	char zero_addr[6] = { 0, 0, 0, 0, 0, 0 };
 
 	if ((addr[0] & 1) || (!bcmp(addr, zero_addr, ETHER_ADDR_LEN))) {
 		return (FALSE);
 	}
 
 	return (TRUE);
 }
 
 
 /*
  * Enable PCI Wake On Lan capability
  */
 static void
 igb_enable_wakeup(device_t dev)
 {
 	u16     cap, status;
 	u8      id;
 
 	/* First find the capabilities pointer */
 	cap = pci_read_config(dev, PCIR_CAP_PTR, 2);
 	/* Read the PM Capabilities */
 	id = pci_read_config(dev, cap, 1);
 	if (id != PCIY_PMG)     /* Something wrong */
 		return;
 	/* OK, we have the power capabilities, so
 	   now get the status register */
 	cap += PCIR_POWER_STATUS;
 	status = pci_read_config(dev, cap, 2);
 	status |= PCIM_PSTAT_PME | PCIM_PSTAT_PMEENABLE;
 	pci_write_config(dev, cap, status, 2);
 	return;
 }
 
 static void
 igb_led_func(void *arg, int onoff)
 {
 	struct adapter	*adapter = arg;
 
 	IGB_CORE_LOCK(adapter);
 	if (onoff) {
 		e1000_setup_led(&adapter->hw);
 		e1000_led_on(&adapter->hw);
 	} else {
 		e1000_led_off(&adapter->hw);
 		e1000_cleanup_led(&adapter->hw);
 	}
 	IGB_CORE_UNLOCK(adapter);
 }
 
 static uint64_t
 igb_get_vf_counter(if_t ifp, ift_counter cnt)
 {
 	struct adapter *adapter;
 	struct e1000_vf_stats *stats;
 #ifndef IGB_LEGACY_TX
 	struct tx_ring *txr;
 	uint64_t rv;
 #endif
 
 	adapter = if_getsoftc(ifp);
 	stats = (struct e1000_vf_stats *)adapter->stats;
 
 	switch (cnt) {
 	case IFCOUNTER_IPACKETS:
 		return (stats->gprc);
 	case IFCOUNTER_OPACKETS:
 		return (stats->gptc);
 	case IFCOUNTER_IBYTES:
 		return (stats->gorc);
 	case IFCOUNTER_OBYTES:
 		return (stats->gotc);
 	case IFCOUNTER_IMCASTS:
 		return (stats->mprc);
 	case IFCOUNTER_IERRORS:
 		return (adapter->dropped_pkts);
 	case IFCOUNTER_OERRORS:
 		return (adapter->watchdog_events);
 #ifndef IGB_LEGACY_TX
 	case IFCOUNTER_OQDROPS:
 		rv = 0;
 		txr = adapter->tx_rings;
 		for (int i = 0; i < adapter->num_queues; i++, txr++)
 			rv += txr->br->br_drops;
 		return (rv);
 #endif
 	default:
 		return (if_get_counter_default(ifp, cnt));
 	}
 }
 
 static uint64_t
 igb_get_counter(if_t ifp, ift_counter cnt)
 {
 	struct adapter *adapter;
 	struct e1000_hw_stats *stats;
 #ifndef IGB_LEGACY_TX
 	struct tx_ring *txr;
 	uint64_t rv;
 #endif
 
 	adapter = if_getsoftc(ifp);
 	if (adapter->vf_ifp)
 		return (igb_get_vf_counter(ifp, cnt));
 
 	stats = (struct e1000_hw_stats *)adapter->stats;
 
 	switch (cnt) {
 	case IFCOUNTER_IPACKETS:
 		return (stats->gprc);
 	case IFCOUNTER_OPACKETS:
 		return (stats->gptc);
 	case IFCOUNTER_IBYTES:
 		return (stats->gorc);
 	case IFCOUNTER_OBYTES:
 		return (stats->gotc);
 	case IFCOUNTER_IMCASTS:
 		return (stats->mprc);
 	case IFCOUNTER_OMCASTS:
 		return (stats->mptc);
 	case IFCOUNTER_IERRORS:
 		return (adapter->dropped_pkts + stats->rxerrc +
 		    stats->crcerrs + stats->algnerrc +
 		    stats->ruc + stats->roc + stats->cexterr);
 	case IFCOUNTER_OERRORS:
 		return (stats->ecol + stats->latecol +
 		    adapter->watchdog_events);
 	case IFCOUNTER_COLLISIONS:
 		return (stats->colc);
 	case IFCOUNTER_IQDROPS:
 		return (stats->mpc);
 #ifndef IGB_LEGACY_TX
 	case IFCOUNTER_OQDROPS:
 		rv = 0;
 		txr = adapter->tx_rings;
 		for (int i = 0; i < adapter->num_queues; i++, txr++)
 			rv += txr->br->br_drops;
 		return (rv);
 #endif
 	default:
 		return (if_get_counter_default(ifp, cnt));
 	}
 }
 
 /**********************************************************************
  *
  *  Update the board statistics counters.
  *
  **********************************************************************/
 static void
 igb_update_stats_counters(struct adapter *adapter)
 {
 	struct e1000_hw		*hw = &adapter->hw;
 	struct e1000_hw_stats	*stats;
 
 	/* 
 	** The virtual function adapter has only a
 	** small controlled set of stats, do only 
 	** those and return.
 	*/
 	if (adapter->vf_ifp) {
 		igb_update_vf_stats_counters(adapter);
 		return;
 	}
 
 	stats = (struct e1000_hw_stats	*)adapter->stats;
 
 	if (adapter->hw.phy.media_type == e1000_media_type_copper ||
 	   (E1000_READ_REG(hw, E1000_STATUS) & E1000_STATUS_LU)) {
 		stats->symerrs +=
 		    E1000_READ_REG(hw,E1000_SYMERRS);
 		stats->sec += E1000_READ_REG(hw, E1000_SEC);
 	}
 
 	stats->crcerrs += E1000_READ_REG(hw, E1000_CRCERRS);
 	stats->mpc += E1000_READ_REG(hw, E1000_MPC);
 	stats->scc += E1000_READ_REG(hw, E1000_SCC);
 	stats->ecol += E1000_READ_REG(hw, E1000_ECOL);
 
 	stats->mcc += E1000_READ_REG(hw, E1000_MCC);
 	stats->latecol += E1000_READ_REG(hw, E1000_LATECOL);
 	stats->colc += E1000_READ_REG(hw, E1000_COLC);
 	stats->dc += E1000_READ_REG(hw, E1000_DC);
 	stats->rlec += E1000_READ_REG(hw, E1000_RLEC);
 	stats->xonrxc += E1000_READ_REG(hw, E1000_XONRXC);
 	stats->xontxc += E1000_READ_REG(hw, E1000_XONTXC);
 	/*
 	** For watchdog management we need to know if we have been
 	** paused during the last interval, so capture that here.
 	*/ 
 	adapter->pause_frames = E1000_READ_REG(&adapter->hw, E1000_XOFFRXC);
 	stats->xoffrxc += adapter->pause_frames;
 	stats->xofftxc += E1000_READ_REG(hw, E1000_XOFFTXC);
 	stats->fcruc += E1000_READ_REG(hw, E1000_FCRUC);
 	stats->prc64 += E1000_READ_REG(hw, E1000_PRC64);
 	stats->prc127 += E1000_READ_REG(hw, E1000_PRC127);
 	stats->prc255 += E1000_READ_REG(hw, E1000_PRC255);
 	stats->prc511 += E1000_READ_REG(hw, E1000_PRC511);
 	stats->prc1023 += E1000_READ_REG(hw, E1000_PRC1023);
 	stats->prc1522 += E1000_READ_REG(hw, E1000_PRC1522);
 	stats->gprc += E1000_READ_REG(hw, E1000_GPRC);
 	stats->bprc += E1000_READ_REG(hw, E1000_BPRC);
 	stats->mprc += E1000_READ_REG(hw, E1000_MPRC);
 	stats->gptc += E1000_READ_REG(hw, E1000_GPTC);
 
 	/* For the 64-bit byte counters the low dword must be read first. */
 	/* Both registers clear on the read of the high dword */
 
 	stats->gorc += E1000_READ_REG(hw, E1000_GORCL) +
 	    ((u64)E1000_READ_REG(hw, E1000_GORCH) << 32);
 	stats->gotc += E1000_READ_REG(hw, E1000_GOTCL) +
 	    ((u64)E1000_READ_REG(hw, E1000_GOTCH) << 32);
 
 	stats->rnbc += E1000_READ_REG(hw, E1000_RNBC);
 	stats->ruc += E1000_READ_REG(hw, E1000_RUC);
 	stats->rfc += E1000_READ_REG(hw, E1000_RFC);
 	stats->roc += E1000_READ_REG(hw, E1000_ROC);
 	stats->rjc += E1000_READ_REG(hw, E1000_RJC);
 
 	stats->mgprc += E1000_READ_REG(hw, E1000_MGTPRC);
 	stats->mgpdc += E1000_READ_REG(hw, E1000_MGTPDC);
 	stats->mgptc += E1000_READ_REG(hw, E1000_MGTPTC);
 
 	stats->tor += E1000_READ_REG(hw, E1000_TORL) +
 	    ((u64)E1000_READ_REG(hw, E1000_TORH) << 32);
 	stats->tot += E1000_READ_REG(hw, E1000_TOTL) +
 	    ((u64)E1000_READ_REG(hw, E1000_TOTH) << 32);
 
 	stats->tpr += E1000_READ_REG(hw, E1000_TPR);
 	stats->tpt += E1000_READ_REG(hw, E1000_TPT);
 	stats->ptc64 += E1000_READ_REG(hw, E1000_PTC64);
 	stats->ptc127 += E1000_READ_REG(hw, E1000_PTC127);
 	stats->ptc255 += E1000_READ_REG(hw, E1000_PTC255);
 	stats->ptc511 += E1000_READ_REG(hw, E1000_PTC511);
 	stats->ptc1023 += E1000_READ_REG(hw, E1000_PTC1023);
 	stats->ptc1522 += E1000_READ_REG(hw, E1000_PTC1522);
 	stats->mptc += E1000_READ_REG(hw, E1000_MPTC);
 	stats->bptc += E1000_READ_REG(hw, E1000_BPTC);
 
 	/* Interrupt Counts */
 
 	stats->iac += E1000_READ_REG(hw, E1000_IAC);
 	stats->icrxptc += E1000_READ_REG(hw, E1000_ICRXPTC);
 	stats->icrxatc += E1000_READ_REG(hw, E1000_ICRXATC);
 	stats->ictxptc += E1000_READ_REG(hw, E1000_ICTXPTC);
 	stats->ictxatc += E1000_READ_REG(hw, E1000_ICTXATC);
 	stats->ictxqec += E1000_READ_REG(hw, E1000_ICTXQEC);
 	stats->ictxqmtc += E1000_READ_REG(hw, E1000_ICTXQMTC);
 	stats->icrxdmtc += E1000_READ_REG(hw, E1000_ICRXDMTC);
 	stats->icrxoc += E1000_READ_REG(hw, E1000_ICRXOC);
 
 	/* Host to Card Statistics */
 
 	stats->cbtmpc += E1000_READ_REG(hw, E1000_CBTMPC);
 	stats->htdpmc += E1000_READ_REG(hw, E1000_HTDPMC);
 	stats->cbrdpc += E1000_READ_REG(hw, E1000_CBRDPC);
 	stats->cbrmpc += E1000_READ_REG(hw, E1000_CBRMPC);
 	stats->rpthc += E1000_READ_REG(hw, E1000_RPTHC);
 	stats->hgptc += E1000_READ_REG(hw, E1000_HGPTC);
 	stats->htcbdpc += E1000_READ_REG(hw, E1000_HTCBDPC);
 	stats->hgorc += (E1000_READ_REG(hw, E1000_HGORCL) +
 	    ((u64)E1000_READ_REG(hw, E1000_HGORCH) << 32));
 	stats->hgotc += (E1000_READ_REG(hw, E1000_HGOTCL) +
 	    ((u64)E1000_READ_REG(hw, E1000_HGOTCH) << 32));
 	stats->lenerrs += E1000_READ_REG(hw, E1000_LENERRS);
 	stats->scvpc += E1000_READ_REG(hw, E1000_SCVPC);
 	stats->hrmpc += E1000_READ_REG(hw, E1000_HRMPC);
 
 	stats->algnerrc += E1000_READ_REG(hw, E1000_ALGNERRC);
 	stats->rxerrc += E1000_READ_REG(hw, E1000_RXERRC);
 	stats->tncrs += E1000_READ_REG(hw, E1000_TNCRS);
 	stats->cexterr += E1000_READ_REG(hw, E1000_CEXTERR);
 	stats->tsctc += E1000_READ_REG(hw, E1000_TSCTC);
 	stats->tsctfc += E1000_READ_REG(hw, E1000_TSCTFC);
 
 	/* Driver specific counters */
 	adapter->device_control = E1000_READ_REG(hw, E1000_CTRL);
 	adapter->rx_control = E1000_READ_REG(hw, E1000_RCTL);
 	adapter->int_mask = E1000_READ_REG(hw, E1000_IMS);
 	adapter->eint_mask = E1000_READ_REG(hw, E1000_EIMS);
 	adapter->packet_buf_alloc_tx =
 	    ((E1000_READ_REG(hw, E1000_PBA) & 0xffff0000) >> 16);
 	adapter->packet_buf_alloc_rx =
 	    (E1000_READ_REG(hw, E1000_PBA) & 0xffff);
 }
 
 
 /**********************************************************************
  *
  *  Initialize the VF board statistics counters.
  *
  **********************************************************************/
 static void
 igb_vf_init_stats(struct adapter *adapter)
 {
 	struct e1000_hw *hw = &adapter->hw;
 	struct e1000_vf_stats	*stats;
 
 	stats = (struct e1000_vf_stats	*)adapter->stats;
 	if (stats == NULL)
 		return;
 	stats->last_gprc = E1000_READ_REG(hw, E1000_VFGPRC);
 	stats->last_gorc = E1000_READ_REG(hw, E1000_VFGORC);
 	stats->last_gptc = E1000_READ_REG(hw, E1000_VFGPTC);
 	stats->last_gotc = E1000_READ_REG(hw, E1000_VFGOTC);
 	stats->last_mprc = E1000_READ_REG(hw, E1000_VFMPRC);
 }
  
 /**********************************************************************
  *
  *  Update the VF board statistics counters.
  *
  **********************************************************************/
 static void
 igb_update_vf_stats_counters(struct adapter *adapter)
 {
 	struct e1000_hw *hw = &adapter->hw;
 	struct e1000_vf_stats	*stats;
 
 	if (adapter->link_speed == 0)
 		return;
 
 	stats = (struct e1000_vf_stats	*)adapter->stats;
 
 	UPDATE_VF_REG(E1000_VFGPRC,
 	    stats->last_gprc, stats->gprc);
 	UPDATE_VF_REG(E1000_VFGORC,
 	    stats->last_gorc, stats->gorc);
 	UPDATE_VF_REG(E1000_VFGPTC,
 	    stats->last_gptc, stats->gptc);
 	UPDATE_VF_REG(E1000_VFGOTC,
 	    stats->last_gotc, stats->gotc);
 	UPDATE_VF_REG(E1000_VFMPRC,
 	    stats->last_mprc, stats->mprc);
 }
 
 /* Export a single 32-bit register via a read-only sysctl. */
 static int
 igb_sysctl_reg_handler(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *adapter;
 	u_int val;
 
 	adapter = oidp->oid_arg1;
 	val = E1000_READ_REG(&adapter->hw, oidp->oid_arg2);
 	return (sysctl_handle_int(oidp, &val, 0, req));
 }
 
 /*
 **  Tuneable interrupt rate handler
 */
 static int
 igb_sysctl_interrupt_rate_handler(SYSCTL_HANDLER_ARGS)
 {
 	struct igb_queue	*que = ((struct igb_queue *)oidp->oid_arg1);
 	int			error;
 	u32			reg, usec, rate;
                         
 	reg = E1000_READ_REG(&que->adapter->hw, E1000_EITR(que->msix));
 	usec = ((reg & 0x7FFC) >> 2);
 	if (usec > 0)
 		rate = 1000000 / usec;
 	else
 		rate = 0;
 	error = sysctl_handle_int(oidp, &rate, 0, req);
 	if (error || !req->newptr)
 		return error;
 	return 0;
 }
 
 /*
  * Add sysctl variables, one per statistic, to the system.
  */
 static void
 igb_add_hw_stats(struct adapter *adapter)
 {
 	device_t dev = adapter->dev;
 
 	struct tx_ring *txr = adapter->tx_rings;
 	struct rx_ring *rxr = adapter->rx_rings;
 
 	struct sysctl_ctx_list *ctx = device_get_sysctl_ctx(dev);
 	struct sysctl_oid *tree = device_get_sysctl_tree(dev);
 	struct sysctl_oid_list *child = SYSCTL_CHILDREN(tree);
 	struct e1000_hw_stats *stats = adapter->stats;
 
 	struct sysctl_oid *stat_node, *queue_node, *int_node, *host_node;
 	struct sysctl_oid_list *stat_list, *queue_list, *int_list, *host_list;
 
 #define QUEUE_NAME_LEN 32
 	char namebuf[QUEUE_NAME_LEN];
 
 	/* Driver Statistics */
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "dropped", 
 			CTLFLAG_RD, &adapter->dropped_pkts,
 			"Driver dropped packets");
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "link_irq", 
 			CTLFLAG_RD, &adapter->link_irq,
 			"Link MSIX IRQ Handled");
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "mbuf_defrag_fail",
 			CTLFLAG_RD, &adapter->mbuf_defrag_failed,
 			"Defragmenting mbuf chain failed");
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "tx_dma_fail", 
 			CTLFLAG_RD, &adapter->no_tx_dma_setup,
 			"Driver tx dma failure in xmit");
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "rx_overruns",
 			CTLFLAG_RD, &adapter->rx_overruns,
 			"RX overruns");
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "watchdog_timeouts",
 			CTLFLAG_RD, &adapter->watchdog_events,
 			"Watchdog timeouts");
 
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "device_control", 
 			CTLFLAG_RD, &adapter->device_control,
 			"Device Control Register");
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "rx_control", 
 			CTLFLAG_RD, &adapter->rx_control,
 			"Receiver Control Register");
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "interrupt_mask", 
 			CTLFLAG_RD, &adapter->int_mask,
 			"Interrupt Mask");
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "extended_int_mask", 
 			CTLFLAG_RD, &adapter->eint_mask,
 			"Extended Interrupt Mask");
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "tx_buf_alloc", 
 			CTLFLAG_RD, &adapter->packet_buf_alloc_tx,
 			"Transmit Buffer Packet Allocation");
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "rx_buf_alloc", 
 			CTLFLAG_RD, &adapter->packet_buf_alloc_rx,
 			"Receive Buffer Packet Allocation");
 	SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "fc_high_water",
 			CTLFLAG_RD, &adapter->hw.fc.high_water, 0,
 			"Flow Control High Watermark");
 	SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "fc_low_water", 
 			CTLFLAG_RD, &adapter->hw.fc.low_water, 0,
 			"Flow Control Low Watermark");
 
 	for (int i = 0; i < adapter->num_queues; i++, rxr++, txr++) {
 		struct lro_ctrl *lro = &rxr->lro;
 
 		snprintf(namebuf, QUEUE_NAME_LEN, "queue%d", i);
 		queue_node = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, namebuf,
 					    CTLFLAG_RD, NULL, "Queue Name");
 		queue_list = SYSCTL_CHILDREN(queue_node);
 
 		SYSCTL_ADD_PROC(ctx, queue_list, OID_AUTO, "interrupt_rate", 
 				CTLTYPE_UINT | CTLFLAG_RD, &adapter->queues[i],
 				sizeof(&adapter->queues[i]),
 				igb_sysctl_interrupt_rate_handler,
 				"IU", "Interrupt Rate");
 
 		SYSCTL_ADD_PROC(ctx, queue_list, OID_AUTO, "txd_head", 
 				CTLTYPE_UINT | CTLFLAG_RD, adapter, E1000_TDH(txr->me),
 				igb_sysctl_reg_handler, "IU",
  				"Transmit Descriptor Head");
 		SYSCTL_ADD_PROC(ctx, queue_list, OID_AUTO, "txd_tail", 
 				CTLTYPE_UINT | CTLFLAG_RD, adapter, E1000_TDT(txr->me),
 				igb_sysctl_reg_handler, "IU",
  				"Transmit Descriptor Tail");
 		SYSCTL_ADD_QUAD(ctx, queue_list, OID_AUTO, "no_desc_avail", 
 				CTLFLAG_RD, &txr->no_desc_avail,
 				"Queue Descriptors Unavailable");
 		SYSCTL_ADD_UQUAD(ctx, queue_list, OID_AUTO, "tx_packets",
 				CTLFLAG_RD, &txr->total_packets,
 				"Queue Packets Transmitted");
 
 		SYSCTL_ADD_PROC(ctx, queue_list, OID_AUTO, "rxd_head", 
 				CTLTYPE_UINT | CTLFLAG_RD, adapter, E1000_RDH(rxr->me),
 				igb_sysctl_reg_handler, "IU",
 				"Receive Descriptor Head");
 		SYSCTL_ADD_PROC(ctx, queue_list, OID_AUTO, "rxd_tail", 
 				CTLTYPE_UINT | CTLFLAG_RD, adapter, E1000_RDT(rxr->me),
 				igb_sysctl_reg_handler, "IU",
 				"Receive Descriptor Tail");
 		SYSCTL_ADD_QUAD(ctx, queue_list, OID_AUTO, "rx_packets",
 				CTLFLAG_RD, &rxr->rx_packets,
 				"Queue Packets Received");
 		SYSCTL_ADD_QUAD(ctx, queue_list, OID_AUTO, "rx_bytes",
 				CTLFLAG_RD, &rxr->rx_bytes,
 				"Queue Bytes Received");
 		SYSCTL_ADD_U64(ctx, queue_list, OID_AUTO, "lro_queued",
 				CTLFLAG_RD, &lro->lro_queued, 0,
 				"LRO Queued");
 		SYSCTL_ADD_U64(ctx, queue_list, OID_AUTO, "lro_flushed",
 				CTLFLAG_RD, &lro->lro_flushed, 0,
 				"LRO Flushed");
 	}
 
 	/* MAC stats get their own sub node */
 
 	stat_node = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "mac_stats", 
 				    CTLFLAG_RD, NULL, "MAC Statistics");
 	stat_list = SYSCTL_CHILDREN(stat_node);
 
 	/*
 	** VF adapter has a very limited set of stats
 	** since it's not managing the metal, so to speak.
 	*/
 	if (adapter->vf_ifp) {
 		SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "good_pkts_recvd",
 				CTLFLAG_RD, &stats->gprc,
 				"Good Packets Received");
 		SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "good_pkts_txd",
 				CTLFLAG_RD, &stats->gptc,
 				"Good Packets Transmitted");
 		SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "good_octets_recvd",
 				CTLFLAG_RD, &stats->gorc,
 				"Good Octets Received");
 		SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "good_octets_txd",
 				CTLFLAG_RD, &stats->gotc,
 				"Good Octets Transmitted");
 		SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "mcast_pkts_recvd",
 				CTLFLAG_RD, &stats->mprc,
 				"Multicast Packets Received");
 		return;
 	}
 
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "excess_coll", 
 			CTLFLAG_RD, &stats->ecol,
 			"Excessive collisions");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "single_coll", 
 			CTLFLAG_RD, &stats->scc,
 			"Single collisions");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "multiple_coll", 
 			CTLFLAG_RD, &stats->mcc,
 			"Multiple collisions");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "late_coll", 
 			CTLFLAG_RD, &stats->latecol,
 			"Late collisions");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "collision_count", 
 			CTLFLAG_RD, &stats->colc,
 			"Collision Count");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "symbol_errors",
 			CTLFLAG_RD, &stats->symerrs,
 			"Symbol Errors");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "sequence_errors",
 			CTLFLAG_RD, &stats->sec,
 			"Sequence Errors");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "defer_count",
 			CTLFLAG_RD, &stats->dc,
 			"Defer Count");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "missed_packets",
 			CTLFLAG_RD, &stats->mpc,
 			"Missed Packets");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "recv_length_errors",
 			CTLFLAG_RD, &stats->rlec,
 			"Receive Length Errors");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "recv_no_buff",
 			CTLFLAG_RD, &stats->rnbc,
 			"Receive No Buffers");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "recv_undersize",
 			CTLFLAG_RD, &stats->ruc,
 			"Receive Undersize");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "recv_fragmented",
 			CTLFLAG_RD, &stats->rfc,
 			"Fragmented Packets Received");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "recv_oversize",
 			CTLFLAG_RD, &stats->roc,
 			"Oversized Packets Received");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "recv_jabber",
 			CTLFLAG_RD, &stats->rjc,
 			"Received Jabber");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "recv_errs",
 			CTLFLAG_RD, &stats->rxerrc,
 			"Receive Errors");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "crc_errs",
 			CTLFLAG_RD, &stats->crcerrs,
 			"CRC errors");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "alignment_errs",
 			CTLFLAG_RD, &stats->algnerrc,
 			"Alignment Errors");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tx_no_crs",
 			CTLFLAG_RD, &stats->tncrs,
 			"Transmit with No CRS");
 	/* On 82575 these are collision counts */
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "coll_ext_errs",
 			CTLFLAG_RD, &stats->cexterr,
 			"Collision/Carrier extension errors");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "xon_recvd",
 			CTLFLAG_RD, &stats->xonrxc,
 			"XON Received");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "xon_txd",
 			CTLFLAG_RD, &stats->xontxc,
 			"XON Transmitted");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "xoff_recvd",
 			CTLFLAG_RD, &stats->xoffrxc,
 			"XOFF Received");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "xoff_txd",
 			CTLFLAG_RD, &stats->xofftxc,
 			"XOFF Transmitted");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "unsupported_fc_recvd",
 			CTLFLAG_RD, &stats->fcruc,
 			"Unsupported Flow Control Received");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "mgmt_pkts_recvd",
 			CTLFLAG_RD, &stats->mgprc,
 			"Management Packets Received");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "mgmt_pkts_drop",
 			CTLFLAG_RD, &stats->mgpdc,
 			"Management Packets Dropped");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "mgmt_pkts_txd",
 			CTLFLAG_RD, &stats->mgptc,
 			"Management Packets Transmitted");
 	/* Packet Reception Stats */
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "total_pkts_recvd",
 			CTLFLAG_RD, &stats->tpr,
 			"Total Packets Received");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "good_pkts_recvd",
 			CTLFLAG_RD, &stats->gprc,
 			"Good Packets Received");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "bcast_pkts_recvd",
 			CTLFLAG_RD, &stats->bprc,
 			"Broadcast Packets Received");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "mcast_pkts_recvd",
 			CTLFLAG_RD, &stats->mprc,
 			"Multicast Packets Received");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "rx_frames_64",
 			CTLFLAG_RD, &stats->prc64,
 			"64 byte frames received");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "rx_frames_65_127",
 			CTLFLAG_RD, &stats->prc127,
 			"65-127 byte frames received");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "rx_frames_128_255",
 			CTLFLAG_RD, &stats->prc255,
 			"128-255 byte frames received");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "rx_frames_256_511",
 			CTLFLAG_RD, &stats->prc511,
 			"256-511 byte frames received");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "rx_frames_512_1023",
 			CTLFLAG_RD, &stats->prc1023,
 			"512-1023 byte frames received");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "rx_frames_1024_1522",
 			CTLFLAG_RD, &stats->prc1522,
 			"1024-1522 byte frames received");
  	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "good_octets_recvd", 
  			CTLFLAG_RD, &stats->gorc, 
 			"Good Octets Received");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "total_octets_recvd", 
 			CTLFLAG_RD, &stats->tor, 
 			"Total Octets Received");
 
 	/* Packet Transmission Stats */
  	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "good_octets_txd", 
  			CTLFLAG_RD, &stats->gotc, 
  			"Good Octets Transmitted"); 
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "total_octets_txd", 
 			CTLFLAG_RD, &stats->tot, 
 			"Total Octets Transmitted");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "total_pkts_txd",
 			CTLFLAG_RD, &stats->tpt,
 			"Total Packets Transmitted");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "good_pkts_txd",
 			CTLFLAG_RD, &stats->gptc,
 			"Good Packets Transmitted");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "bcast_pkts_txd",
 			CTLFLAG_RD, &stats->bptc,
 			"Broadcast Packets Transmitted");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "mcast_pkts_txd",
 			CTLFLAG_RD, &stats->mptc,
 			"Multicast Packets Transmitted");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tx_frames_64",
 			CTLFLAG_RD, &stats->ptc64,
 			"64 byte frames transmitted");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tx_frames_65_127",
 			CTLFLAG_RD, &stats->ptc127,
 			"65-127 byte frames transmitted");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tx_frames_128_255",
 			CTLFLAG_RD, &stats->ptc255,
 			"128-255 byte frames transmitted");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tx_frames_256_511",
 			CTLFLAG_RD, &stats->ptc511,
 			"256-511 byte frames transmitted");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tx_frames_512_1023",
 			CTLFLAG_RD, &stats->ptc1023,
 			"512-1023 byte frames transmitted");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tx_frames_1024_1522",
 			CTLFLAG_RD, &stats->ptc1522,
 			"1024-1522 byte frames transmitted");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tso_txd",
 			CTLFLAG_RD, &stats->tsctc,
 			"TSO Contexts Transmitted");
 	SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tso_ctx_fail",
 			CTLFLAG_RD, &stats->tsctfc,
 			"TSO Contexts Failed");
 
 
 	/* Interrupt Stats */
 
 	int_node = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "interrupts", 
 				    CTLFLAG_RD, NULL, "Interrupt Statistics");
 	int_list = SYSCTL_CHILDREN(int_node);
 
 	SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "asserts",
 			CTLFLAG_RD, &stats->iac,
 			"Interrupt Assertion Count");
 
 	SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "rx_pkt_timer",
 			CTLFLAG_RD, &stats->icrxptc,
 			"Interrupt Cause Rx Pkt Timer Expire Count");
 
 	SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "rx_abs_timer",
 			CTLFLAG_RD, &stats->icrxatc,
 			"Interrupt Cause Rx Abs Timer Expire Count");
 
 	SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "tx_pkt_timer",
 			CTLFLAG_RD, &stats->ictxptc,
 			"Interrupt Cause Tx Pkt Timer Expire Count");
 
 	SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "tx_abs_timer",
 			CTLFLAG_RD, &stats->ictxatc,
 			"Interrupt Cause Tx Abs Timer Expire Count");
 
 	SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "tx_queue_empty",
 			CTLFLAG_RD, &stats->ictxqec,
 			"Interrupt Cause Tx Queue Empty Count");
 
 	SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "tx_queue_min_thresh",
 			CTLFLAG_RD, &stats->ictxqmtc,
 			"Interrupt Cause Tx Queue Min Thresh Count");
 
 	SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "rx_desc_min_thresh",
 			CTLFLAG_RD, &stats->icrxdmtc,
 			"Interrupt Cause Rx Desc Min Thresh Count");
 
 	SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "rx_overrun",
 			CTLFLAG_RD, &stats->icrxoc,
 			"Interrupt Cause Receiver Overrun Count");
 
 	/* Host to Card Stats */
 
 	host_node = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "host", 
 				    CTLFLAG_RD, NULL, 
 				    "Host to Card Statistics");
 
 	host_list = SYSCTL_CHILDREN(host_node);
 
 	SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "breaker_tx_pkt",
 			CTLFLAG_RD, &stats->cbtmpc,
 			"Circuit Breaker Tx Packet Count");
 
 	SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "host_tx_pkt_discard",
 			CTLFLAG_RD, &stats->htdpmc,
 			"Host Transmit Discarded Packets");
 
 	SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "rx_pkt",
 			CTLFLAG_RD, &stats->rpthc,
 			"Rx Packets To Host");
 
 	SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "breaker_rx_pkts",
 			CTLFLAG_RD, &stats->cbrmpc,
 			"Circuit Breaker Rx Packet Count");
 
 	SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "breaker_rx_pkt_drop",
 			CTLFLAG_RD, &stats->cbrdpc,
 			"Circuit Breaker Rx Dropped Count");
 
 	SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "tx_good_pkt",
 			CTLFLAG_RD, &stats->hgptc,
 			"Host Good Packets Tx Count");
 
 	SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "breaker_tx_pkt_drop",
 			CTLFLAG_RD, &stats->htcbdpc,
 			"Host Tx Circuit Breaker Dropped Count");
 
 	SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "rx_good_bytes",
 			CTLFLAG_RD, &stats->hgorc,
 			"Host Good Octets Received Count");
 
 	SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "tx_good_bytes",
 			CTLFLAG_RD, &stats->hgotc,
 			"Host Good Octets Transmit Count");
 
 	SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "length_errors",
 			CTLFLAG_RD, &stats->lenerrs,
 			"Length Errors");
 
 	SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "serdes_violation_pkt",
 			CTLFLAG_RD, &stats->scvpc,
 			"SerDes/SGMII Code Violation Pkt Count");
 
 	SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "header_redir_missed",
 			CTLFLAG_RD, &stats->hrmpc,
 			"Header Redirection Missed Packet Count");
 }
 
 
 /**********************************************************************
  *
  *  This routine provides a way to dump out the adapter EEPROM,
  *  often a useful debug/service tool. It dumps only the first
  *  32 words; everything that matters is within that range.
  *
  **********************************************************************/
 static int
 igb_sysctl_nvm_info(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *adapter;
 	int error;
 	int result;
 
 	result = -1;
 	error = sysctl_handle_int(oidp, &result, 0, req);
 
 	if (error || !req->newptr)
 		return (error);
 
 	/*
 	 * This value will cause a hex dump of the
 	 * first 32 16-bit words of the EEPROM to
 	 * the screen.
 	 */
 	if (result == 1) {
 		adapter = (struct adapter *)arg1;
 		igb_print_nvm_info(adapter);
 	}
 
 	return (error);
 }
 
 static void
 igb_print_nvm_info(struct adapter *adapter)
 {
 	u16	eeprom_data;
 	int	i, j, row = 0;
 
 	/* It's a bit crude, but it gets the job done */
 	printf("\nInterface EEPROM Dump:\n");
 	printf("Offset\n0x0000  ");
 	for (i = 0, j = 0; i < 32; i++, j++) {
 		if (j == 8) { /* Make the offset block */
 			j = 0; ++row;
 			printf("\n0x00%x0  ",row);
 		}
 		e1000_read_nvm(&adapter->hw, i, 1, &eeprom_data);
 		printf("%04x ", eeprom_data);
 	}
 	printf("\n");
 }
 
 static void
 igb_set_sysctl_value(struct adapter *adapter, const char *name,
 	const char *description, int *limit, int value)
 {
 	*limit = value;
 	SYSCTL_ADD_INT(device_get_sysctl_ctx(adapter->dev),
 	    SYSCTL_CHILDREN(device_get_sysctl_tree(adapter->dev)),
 	    OID_AUTO, name, CTLFLAG_RW, limit, value, description);
 }
 
 /*
 ** Set flow control using sysctl:
 ** Flow control values:
 ** 	0 - off
 **	1 - rx pause
 **	2 - tx pause
 **	3 - full
 */
 static int
 igb_set_flowcntl(SYSCTL_HANDLER_ARGS)
 {
 	int		error;
 	static int	input = 3; /* default is full */
 	struct adapter	*adapter = (struct adapter *) arg1;
 
 	error = sysctl_handle_int(oidp, &input, 0, req);
 
 	if ((error) || (req->newptr == NULL))
 		return (error);
 
 	switch (input) {
 		case e1000_fc_rx_pause:
 		case e1000_fc_tx_pause:
 		case e1000_fc_full:
 		case e1000_fc_none:
 			adapter->hw.fc.requested_mode = input;
 			adapter->fc = input;
 			break;
 		default:
 			/* Do nothing */
 			return (error);
 	}
 
 	adapter->hw.fc.current_mode = adapter->hw.fc.requested_mode;
 	e1000_force_mac_fc(&adapter->hw);
 	/* XXX TODO: update DROP_EN on each RX queue if appropriate */
 	return (error);
 }
 
 /*
 ** Manage DMA Coalesce:
 ** Control values:
 ** 	0/1 - off/on
 **	Legal timer values are:
 **	250, 500, and 1000-10000 in steps of 1000
 */
 static int
 igb_sysctl_dmac(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *adapter = (struct adapter *) arg1;
 	int		error;
 
 	error = sysctl_handle_int(oidp, &adapter->dmac, 0, req);
 
 	if ((error) || (req->newptr == NULL))
 		return (error);
 
 	switch (adapter->dmac) {
 		case 0:
 			/* Disabling */
 			break;
 		case 1: /* Just enable and use default */
 			adapter->dmac = 1000;
 			break;
 		case 250:
 		case 500:
 		case 1000:
 		case 2000:
 		case 3000:
 		case 4000:
 		case 5000:
 		case 6000:
 		case 7000:
 		case 8000:
 		case 9000:
 		case 10000:
 			/* Legal values - allow */
 			break;
 		default:
 			/* Do nothing, illegal value */
 			adapter->dmac = 0;
 			return (EINVAL);
 	}
 	/* Reinit the interface */
 	igb_init(adapter);
 	return (error);
 }
 
 /*
 ** Manage Energy Efficient Ethernet:
 ** Control values:
 **     0/1 - enabled/disabled
 */
 static int
 igb_sysctl_eee(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter	*adapter = (struct adapter *) arg1;
 	int		error, value;
 
 	value = adapter->hw.dev_spec._82575.eee_disable;
 	error = sysctl_handle_int(oidp, &value, 0, req);
 	if (error || req->newptr == NULL)
 		return (error);
 	IGB_CORE_LOCK(adapter);
 	adapter->hw.dev_spec._82575.eee_disable = (value != 0);
 	igb_init_locked(adapter);
 	IGB_CORE_UNLOCK(adapter);
 	return (0);
 }
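
The legal-value check in igb_sysctl_dmac() can be expressed more compactly.
What follows is only a minimal userland sketch, not the driver's code:
dmac_normalize() is a hypothetical helper name, and the sketch assumes the
same rules as the switch statement in the handler (0 disables, 1 selects the
default of 1000, and 250, 500 and 1000-10000 in steps of 1000 are accepted).

```c
#include <errno.h>

/*
 * Hypothetical helper (not part of the igb driver): normalise a requested
 * DMA coalescing value.  Stores the resulting timer value in *timer and
 * returns 0 on success, or stores 0 and returns EINVAL for an illegal value.
 */
static int
dmac_normalize(int requested, int *timer)
{
	switch (requested) {
	case 0:		/* Disabled. */
		*timer = 0;
		return (0);
	case 1:		/* Enable and use the default timer. */
		*timer = 1000;
		return (0);
	case 250:
	case 500:
		*timer = requested;
		return (0);
	default:
		break;
	}

	/* 1000..10000 in steps of 1000. */
	if (requested >= 1000 && requested <= 10000 &&
	    requested % 1000 == 0) {
		*timer = requested;
		return (0);
	}

	/* Illegal value: fall back to disabled, as the handler does. */
	*timer = 0;
	return (EINVAL);
}
```

A range test like this accepts exactly the values listed case by case in the
handler; whether it reads better than the explicit list is a matter of taste.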
Index: projects/vnet/sys/dev/gpio/gpiobus.c
===================================================================
--- projects/vnet/sys/dev/gpio/gpiobus.c	(revision 301546)
+++ projects/vnet/sys/dev/gpio/gpiobus.c	(revision 301547)
@@ -1,849 +1,874 @@
 /*-
  * Copyright (c) 2009 Oleksandr Tymoshenko <gonzo@freebsd.org>
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/bus.h>
 #include <sys/gpio.h>
 #include <sys/intr.h>
 #include <sys/kernel.h>
 #include <sys/malloc.h>
 #include <sys/module.h>
 
 #include <dev/gpio/gpiobusvar.h>
 
 #include "gpiobus_if.h"
 
 #undef GPIOBUS_DEBUG
 #ifdef GPIOBUS_DEBUG
 #define	dprintf printf
 #else
 #define	dprintf(x, arg...)
 #endif
 
 static void gpiobus_print_pins(struct gpiobus_ivar *, char *, size_t);
 static int gpiobus_parse_pins(struct gpiobus_softc *, device_t, int);
 static int gpiobus_probe(device_t);
 static int gpiobus_attach(device_t);
 static int gpiobus_detach(device_t);
 static int gpiobus_suspend(device_t);
 static int gpiobus_resume(device_t);
 static void gpiobus_probe_nomatch(device_t, device_t);
 static int gpiobus_print_child(device_t, device_t);
 static int gpiobus_child_location_str(device_t, device_t, char *, size_t);
 static int gpiobus_child_pnpinfo_str(device_t, device_t, char *, size_t);
 static device_t gpiobus_add_child(device_t, u_int, const char *, int);
 static void gpiobus_hinted_child(device_t, const char *, int);
 
 /*
  * GPIOBUS interface
  */
 static int gpiobus_acquire_bus(device_t, device_t, int);
 static void gpiobus_release_bus(device_t, device_t);
 static int gpiobus_pin_setflags(device_t, device_t, uint32_t, uint32_t);
 static int gpiobus_pin_getflags(device_t, device_t, uint32_t, uint32_t*);
 static int gpiobus_pin_getcaps(device_t, device_t, uint32_t, uint32_t*);
 static int gpiobus_pin_set(device_t, device_t, uint32_t, unsigned int);
 static int gpiobus_pin_get(device_t, device_t, uint32_t, unsigned int*);
 static int gpiobus_pin_toggle(device_t, device_t, uint32_t);
 
 /*
  * XXX -> Move me to a better place - gpio_subr.c?
  * Also, this function must be changed when the interrupt configuration
  * data is moved into struct resource.
  */
 #ifdef INTRNG
+static void
+gpio_destruct_map_data(struct intr_map_data *map_data)
+{
+
+	KASSERT(map_data->type == INTR_MAP_DATA_GPIO,
+	    ("%s: bad map_data type %d", __func__, map_data->type));
+
+	free(map_data, M_DEVBUF);
+}
+
 struct resource *
 gpio_alloc_intr_resource(device_t consumer_dev, int *rid, u_int alloc_flags,
     gpio_pin_t pin, uint32_t intr_mode)
 {
-	u_int irqnum;
+	int rv;
+	u_int irq;
+	struct intr_map_data_gpio *gpio_data;
+	struct resource *res;
 
-	/*
-	 * Allocate new fictitious interrupt number and store configuration
-	 * into it.
-	 */
-	irqnum = intr_gpio_map_irq(pin->dev, pin->pin, pin->flags, intr_mode);
-	if (irqnum == INTR_IRQ_INVALID)
+	gpio_data = malloc(sizeof(*gpio_data), M_DEVBUF, M_WAITOK | M_ZERO);
+	gpio_data->hdr.type = INTR_MAP_DATA_GPIO;
+	gpio_data->hdr.destruct = gpio_destruct_map_data;
+	gpio_data->gpio_pin_num = pin->pin;
+	gpio_data->gpio_pin_flags = pin->flags;
+	gpio_data->gpio_intr_mode = intr_mode;
+
+	rv = intr_map_irq(pin->dev, 0, (struct intr_map_data *)gpio_data,
+	    &irq);
+	if (rv != 0) {
+		gpio_destruct_map_data((struct intr_map_data *)gpio_data);
 		return (NULL);
+	}
 
-	return (bus_alloc_resource(consumer_dev, SYS_RES_IRQ, rid,
-	    irqnum, irqnum, 1, alloc_flags));
+	res = bus_alloc_resource(consumer_dev, SYS_RES_IRQ, rid, irq, irq, 1,
+	    alloc_flags);
+	if (res == NULL) {
+		gpio_destruct_map_data((struct intr_map_data *)gpio_data);
+		return (NULL);
+	}
+	rman_set_virtual(res, gpio_data);
+	return (res);
 }
 #else
 struct resource *
 gpio_alloc_intr_resource(device_t consumer_dev, int *rid, u_int alloc_flags,
     gpio_pin_t pin, uint32_t intr_mode)
 {
 
 	return (NULL);
 }
 #endif
 
 int
 gpio_check_flags(uint32_t caps, uint32_t flags)
 {
 
 	/* Check for unwanted flags. */
 	if ((flags & caps) == 0 || (flags & caps) != flags)
 		return (EINVAL);
 	/* Cannot mix input/output together. */
 	if (flags & GPIO_PIN_INPUT && flags & GPIO_PIN_OUTPUT)
 		return (EINVAL);
 	/* Cannot mix pull-up/pull-down together. */
 	if (flags & GPIO_PIN_PULLUP && flags & GPIO_PIN_PULLDOWN)
 		return (EINVAL);
 
 	return (0);
 }
 
 static void
 gpiobus_print_pins(struct gpiobus_ivar *devi, char *buf, size_t buflen)
 {
 	char tmp[128];
 	int i, range_start, range_stop, need_coma;
 
 	if (devi->npins == 0)
 		return;
 
 	need_coma = 0;
 	range_start = range_stop = devi->pins[0];
 	for (i = 1; i < devi->npins; i++) {
 		if (devi->pins[i] != (range_stop + 1)) {
 			if (need_coma)
 				strlcat(buf, ",", buflen);
 			memset(tmp, 0, sizeof(tmp));
 			if (range_start != range_stop)
 				snprintf(tmp, sizeof(tmp) - 1, "%d-%d",
 				    range_start, range_stop);
 			else
 				snprintf(tmp, sizeof(tmp) - 1, "%d",
 				    range_start);
 			strlcat(buf, tmp, buflen);
 
 			range_start = range_stop = devi->pins[i];
 			need_coma = 1;
 		}
 		else
 			range_stop++;
 	}
 
 	if (need_coma)
 		strlcat(buf, ",", buflen);
 	memset(tmp, 0, sizeof(tmp));
 	if (range_start != range_stop)
 		snprintf(tmp, sizeof(tmp) - 1, "%d-%d",
 		    range_start, range_stop);
 	else
 		snprintf(tmp, sizeof(tmp) - 1, "%d",
 		    range_start);
 	strlcat(buf, tmp, buflen);
 }
 
 device_t
 gpiobus_attach_bus(device_t dev)
 {
 	device_t busdev;
 
 	busdev = device_add_child(dev, "gpiobus", -1);
 	if (busdev == NULL)
 		return (NULL);
 	if (device_add_child(dev, "gpioc", -1) == NULL) {
 		device_delete_child(dev, busdev);
 		return (NULL);
 	}
 #ifdef FDT
 	ofw_gpiobus_register_provider(dev);
 #endif
 	bus_generic_attach(dev);
 
 	return (busdev);
 }
 
 int
 gpiobus_detach_bus(device_t dev)
 {
 	int err;
 
 #ifdef FDT
 	ofw_gpiobus_unregister_provider(dev);
 #endif
 	err = bus_generic_detach(dev);
 	if (err != 0)
 		return (err);
 
 	return (device_delete_children(dev));
 }
 
 int
 gpiobus_init_softc(device_t dev)
 {
 	struct gpiobus_softc *sc;
 
 	sc = GPIOBUS_SOFTC(dev);
 	sc->sc_busdev = dev;
 	sc->sc_dev = device_get_parent(dev);
 	sc->sc_intr_rman.rm_type = RMAN_ARRAY;
 	sc->sc_intr_rman.rm_descr = "GPIO Interrupts";
 	if (rman_init(&sc->sc_intr_rman) != 0 ||
 	    rman_manage_region(&sc->sc_intr_rman, 0, ~0) != 0)
 		panic("%s: failed to set up rman.", __func__);
 
 	if (GPIO_PIN_MAX(sc->sc_dev, &sc->sc_npins) != 0)
 		return (ENXIO);
 
 	KASSERT(sc->sc_npins >= 0, ("GPIO device with no pins"));
 
 	/* Pins = GPIO_PIN_MAX() + 1 */
 	sc->sc_npins++;
 
 	sc->sc_pins = malloc(sizeof(*sc->sc_pins) * sc->sc_npins, M_DEVBUF,
 	    M_NOWAIT | M_ZERO);
 	if (sc->sc_pins == NULL)
 		return (ENOMEM);
 
 	/* Initialize the bus lock. */
 	GPIOBUS_LOCK_INIT(sc);
 
 	return (0);
 }
 
 int
 gpiobus_alloc_ivars(struct gpiobus_ivar *devi)
 {
 
 	/* Allocate pins and flags memory. */
 	devi->pins = malloc(sizeof(uint32_t) * devi->npins, M_DEVBUF,
 	    M_NOWAIT | M_ZERO);
 	if (devi->pins == NULL)
 		return (ENOMEM);
 	devi->flags = malloc(sizeof(uint32_t) * devi->npins, M_DEVBUF,
 	    M_NOWAIT | M_ZERO);
 	if (devi->flags == NULL) {
 		free(devi->pins, M_DEVBUF);
 		return (ENOMEM);
 	}
 
 	return (0);
 }
 
 void
 gpiobus_free_ivars(struct gpiobus_ivar *devi)
 {
 
 	if (devi->flags) {
 		free(devi->flags, M_DEVBUF);
 		devi->flags = NULL;
 	}
 	if (devi->pins) {
 		free(devi->pins, M_DEVBUF);
 		devi->pins = NULL;
 	}
 }
 
 int
 gpiobus_acquire_pin(device_t bus, uint32_t pin)
 {
 	struct gpiobus_softc *sc;
 
 	sc = device_get_softc(bus);
 	/* Consistency check. */
 	if (pin >= sc->sc_npins) {
 		device_printf(bus,
 		    "invalid pin %d, max: %d\n", pin, sc->sc_npins - 1);
 		return (-1);
 	}
 	/* Mark pin as mapped and give warning if it's already mapped. */
 	if (sc->sc_pins[pin].mapped) {
 		device_printf(bus, "warning: pin %d is already mapped\n", pin);
 		return (-1);
 	}
 	sc->sc_pins[pin].mapped = 1;
 
 	return (0);
 }
 
 /* Release mapped pin */
 int
 gpiobus_release_pin(device_t bus, uint32_t pin)
 {
 	struct gpiobus_softc *sc;
 
 	sc = device_get_softc(bus);
 	/* Consistency check. */
 	if (pin >= sc->sc_npins) {
 		device_printf(bus,
 		    "gpiobus_release_pin: invalid pin %d, max=%d\n",
 		    pin, sc->sc_npins - 1);
 		return (-1);
 	}
 
 	if (!sc->sc_pins[pin].mapped) {
 		device_printf(bus, "gpiobus_release_pin: pin %d is not mapped\n", pin);
 		return (-1);
 	}
 	sc->sc_pins[pin].mapped = 0;
 
 	return (0);
 }
 
 static int
 gpiobus_parse_pins(struct gpiobus_softc *sc, device_t child, int mask)
 {
 	struct gpiobus_ivar *devi = GPIOBUS_IVAR(child);
 	int i, npins;
 
 	npins = 0;
 	for (i = 0; i < 32; i++) {
 		if (mask & (1 << i))
 			npins++;
 	}
 	if (npins == 0) {
 		device_printf(child, "empty pin mask\n");
 		return (EINVAL);
 	}
 	devi->npins = npins;
 	if (gpiobus_alloc_ivars(devi) != 0) {
 		device_printf(child, "cannot allocate device ivars\n");
 		return (EINVAL);
 	}
 	npins = 0;
 	for (i = 0; i < 32; i++) {
 		if ((mask & (1 << i)) == 0)
 			continue;
 		/* Reserve the GPIO pin. */
 		if (gpiobus_acquire_pin(sc->sc_busdev, i) != 0) {
 			gpiobus_free_ivars(devi);
 			return (EINVAL);
 		}
 		devi->pins[npins++] = i;
 		/* Use the child name as pin name. */
 		GPIOBUS_PIN_SETNAME(sc->sc_busdev, i,
 		    device_get_nameunit(child));
 	}
 
 	return (0);
 }
 
 static int
 gpiobus_probe(device_t dev)
 {
 	device_set_desc(dev, "GPIO bus");
 
 	return (BUS_PROBE_GENERIC);
 }
 
 static int
 gpiobus_attach(device_t dev)
 {
 	int err;
 
 	err = gpiobus_init_softc(dev);
 	if (err != 0)
 		return (err);
 
 	/*
 	 * Get parent's pins and mark them as unmapped
 	 */
 	bus_generic_probe(dev);
 	bus_enumerate_hinted_children(dev);
 
 	return (bus_generic_attach(dev));
 }
 
 /*
  * Since this is not a self-enumerating bus, and since we always add
  * children in attach, we have to always delete children here.
  */
 static int
 gpiobus_detach(device_t dev)
 {
 	struct gpiobus_softc *sc;
 	struct gpiobus_ivar *devi;
 	device_t *devlist;
 	int i, err, ndevs;
 
 	sc = GPIOBUS_SOFTC(dev);
 	KASSERT(mtx_initialized(&sc->sc_mtx),
 	    ("gpiobus mutex not initialized"));
 	GPIOBUS_LOCK_DESTROY(sc);
 
 	if ((err = bus_generic_detach(dev)) != 0)
 		return (err);
 
 	if ((err = device_get_children(dev, &devlist, &ndevs)) != 0)
 		return (err);
 	for (i = 0; i < ndevs; i++) {
 		devi = GPIOBUS_IVAR(devlist[i]);
 		gpiobus_free_ivars(devi);
 		resource_list_free(&devi->rl);
 		free(devi, M_DEVBUF);
 		device_delete_child(dev, devlist[i]);
 	}
 	free(devlist, M_TEMP);
 	rman_fini(&sc->sc_intr_rman);
 	if (sc->sc_pins) {
 		for (i = 0; i < sc->sc_npins; i++) {
 			if (sc->sc_pins[i].name != NULL)
 				free(sc->sc_pins[i].name, M_DEVBUF);
 			sc->sc_pins[i].name = NULL;
 		}
 		free(sc->sc_pins, M_DEVBUF);
 		sc->sc_pins = NULL;
 	}
 
 	return (0);
 }
 
 static int
 gpiobus_suspend(device_t dev)
 {
 
 	return (bus_generic_suspend(dev));
 }
 
 static int
 gpiobus_resume(device_t dev)
 {
 
 	return (bus_generic_resume(dev));
 }
 
 static void
 gpiobus_probe_nomatch(device_t dev, device_t child)
 {
 	char pins[128];
 	struct gpiobus_ivar *devi;
 
 	devi = GPIOBUS_IVAR(child);
 	memset(pins, 0, sizeof(pins));
 	gpiobus_print_pins(devi, pins, sizeof(pins));
 	if (devi->npins > 1)
 		device_printf(dev, "<unknown device> at pins %s", pins);
 	else
 		device_printf(dev, "<unknown device> at pin %s", pins);
 	resource_list_print_type(&devi->rl, "irq", SYS_RES_IRQ, "%jd");
 	printf("\n");
 }
 
 static int
 gpiobus_print_child(device_t dev, device_t child)
 {
 	char pins[128];
 	int retval = 0;
 	struct gpiobus_ivar *devi;
 
 	devi = GPIOBUS_IVAR(child);
 	memset(pins, 0, sizeof(pins));
 	retval += bus_print_child_header(dev, child);
 	if (devi->npins > 0) {
 		if (devi->npins > 1)
 			retval += printf(" at pins ");
 		else
 			retval += printf(" at pin ");
 		gpiobus_print_pins(devi, pins, sizeof(pins));
 		retval += printf("%s", pins);
 	}
 	resource_list_print_type(&devi->rl, "irq", SYS_RES_IRQ, "%jd");
 	retval += bus_print_child_footer(dev, child);
 
 	return (retval);
 }
 
 static int
 gpiobus_child_location_str(device_t bus, device_t child, char *buf,
     size_t buflen)
 {
 	struct gpiobus_ivar *devi;
 
 	devi = GPIOBUS_IVAR(child);
 	if (devi->npins > 1)
 		strlcpy(buf, "pins=", buflen);
 	else
 		strlcpy(buf, "pin=", buflen);
 	gpiobus_print_pins(devi, buf, buflen);
 
 	return (0);
 }
 
 static int
 gpiobus_child_pnpinfo_str(device_t bus, device_t child, char *buf,
     size_t buflen)
 {
 
 	*buf = '\0';
 	return (0);
 }
 
 static device_t
 gpiobus_add_child(device_t dev, u_int order, const char *name, int unit)
 {
 	device_t child;
 	struct gpiobus_ivar *devi;
 
 	child = device_add_child_ordered(dev, order, name, unit);
 	if (child == NULL) 
 		return (child);
 	devi = malloc(sizeof(struct gpiobus_ivar), M_DEVBUF, M_NOWAIT | M_ZERO);
 	if (devi == NULL) {
 		device_delete_child(dev, child);
 		return (NULL);
 	}
 	resource_list_init(&devi->rl);
 	device_set_ivars(child, devi);
 
 	return (child);
 }
 
 static void
 gpiobus_hinted_child(device_t bus, const char *dname, int dunit)
 {
 	struct gpiobus_softc *sc = GPIOBUS_SOFTC(bus);
 	struct gpiobus_ivar *devi;
 	device_t child;
 	int irq, pins;
 
 	child = BUS_ADD_CHILD(bus, 0, dname, dunit);
 	devi = GPIOBUS_IVAR(child);
 	resource_int_value(dname, dunit, "pins", &pins);
 	if (gpiobus_parse_pins(sc, child, pins)) {
 		resource_list_free(&devi->rl);
 		free(devi, M_DEVBUF);
 		device_delete_child(bus, child);
 	}
 	if (resource_int_value(dname, dunit, "irq", &irq) == 0) {
 		if (bus_set_resource(child, SYS_RES_IRQ, 0, irq, 1) != 0)
 			device_printf(bus,
 			    "warning: bus_set_resource() failed\n");
 	}
 }
 
 static int
 gpiobus_set_resource(device_t dev, device_t child, int type, int rid,
     rman_res_t start, rman_res_t count)
 {
 	struct gpiobus_ivar *devi;
 	struct resource_list_entry *rle;
 
 	dprintf("%s: entry (%p, %p, %d, %d, %p, %ld)\n",
 	    __func__, dev, child, type, rid, (void *)(intptr_t)start, count);
 	devi = GPIOBUS_IVAR(child);
 	rle = resource_list_add(&devi->rl, type, rid, start,
 	    start + count - 1, count);
 	if (rle == NULL)
 		return (ENXIO);
 
 	return (0);
 }
 
 static struct resource *
 gpiobus_alloc_resource(device_t bus, device_t child, int type, int *rid,
     rman_res_t start, rman_res_t end, rman_res_t count, u_int flags)
 {
 	struct gpiobus_softc *sc;
 	struct resource *rv;
 	struct resource_list *rl;
 	struct resource_list_entry *rle;
 	int isdefault;
 
 	if (type != SYS_RES_IRQ)
 		return (NULL);
 	isdefault = (RMAN_IS_DEFAULT_RANGE(start, end) && count == 1);
 	rle = NULL;
 	if (isdefault) {
 		rl = BUS_GET_RESOURCE_LIST(bus, child);
 		if (rl == NULL)
 			return (NULL);
 		rle = resource_list_find(rl, type, *rid);
 		if (rle == NULL)
 			return (NULL);
 		if (rle->res != NULL)
 			panic("%s: resource entry is busy", __func__);
 		start = rle->start;
 		count = rle->count;
 		end = rle->end;
 	}
 	sc = device_get_softc(bus);
 	rv = rman_reserve_resource(&sc->sc_intr_rman, start, end, count, flags,
 	    child);
 	if (rv == NULL)
 		return (NULL);
 	rman_set_rid(rv, *rid);
 	if ((flags & RF_ACTIVE) != 0 &&
 	    bus_activate_resource(child, type, *rid, rv) != 0) {
 		rman_release_resource(rv);
 		return (NULL);
 	}
 
 	return (rv);
 }
 
 static int
 gpiobus_release_resource(device_t bus __unused, device_t child, int type,
     int rid, struct resource *r)
 {
 	int error;
 
 	if (rman_get_flags(r) & RF_ACTIVE) {
 		error = bus_deactivate_resource(child, type, rid, r);
 		if (error)
 			return (error);
 	}
 
 	return (rman_release_resource(r));
 }
 
 static struct resource_list *
 gpiobus_get_resource_list(device_t bus __unused, device_t child)
 {
 	struct gpiobus_ivar *ivar;
 
 	ivar = GPIOBUS_IVAR(child);
 
 	return (&ivar->rl);
 }
 
 static int
 gpiobus_acquire_bus(device_t busdev, device_t child, int how)
 {
 	struct gpiobus_softc *sc;
 
 	sc = device_get_softc(busdev);
 	GPIOBUS_ASSERT_UNLOCKED(sc);
 	GPIOBUS_LOCK(sc);
 	if (sc->sc_owner != NULL) {
 		if (sc->sc_owner == child)
 			panic("%s: %s still owns the bus.",
 			    device_get_nameunit(busdev),
 			    device_get_nameunit(child));
 		if (how == GPIOBUS_DONTWAIT) {
 			GPIOBUS_UNLOCK(sc);
 			return (EWOULDBLOCK);
 		}
 		while (sc->sc_owner != NULL)
 			mtx_sleep(sc, &sc->sc_mtx, 0, "gpiobuswait", 0);
 	}
 	sc->sc_owner = child;
 	GPIOBUS_UNLOCK(sc);
 
 	return (0);
 }
 
 static void
 gpiobus_release_bus(device_t busdev, device_t child)
 {
 	struct gpiobus_softc *sc;
 
 	sc = device_get_softc(busdev);
 	GPIOBUS_ASSERT_UNLOCKED(sc);
 	GPIOBUS_LOCK(sc);
 	if (sc->sc_owner == NULL)
 		panic("%s: %s releasing unowned bus.",
 		    device_get_nameunit(busdev),
 		    device_get_nameunit(child));
 	if (sc->sc_owner != child)
 		panic("%s: %s trying to release bus owned by %s",
 		    device_get_nameunit(busdev),
 		    device_get_nameunit(child),
 		    device_get_nameunit(sc->sc_owner));
 	sc->sc_owner = NULL;
 	wakeup(sc);
 	GPIOBUS_UNLOCK(sc);
 }
 
 static int
 gpiobus_pin_setflags(device_t dev, device_t child, uint32_t pin, 
     uint32_t flags)
 {
 	struct gpiobus_softc *sc = GPIOBUS_SOFTC(dev);
 	struct gpiobus_ivar *devi = GPIOBUS_IVAR(child);
 	uint32_t caps;
 
 	if (pin >= devi->npins)
 		return (EINVAL);
 	if (GPIO_PIN_GETCAPS(sc->sc_dev, devi->pins[pin], &caps) != 0)
 		return (EINVAL);
 	if (gpio_check_flags(caps, flags) != 0)
 		return (EINVAL);
 
 	return (GPIO_PIN_SETFLAGS(sc->sc_dev, devi->pins[pin], flags));
 }
 
 static int
 gpiobus_pin_getflags(device_t dev, device_t child, uint32_t pin, 
     uint32_t *flags)
 {
 	struct gpiobus_softc *sc = GPIOBUS_SOFTC(dev);
 	struct gpiobus_ivar *devi = GPIOBUS_IVAR(child);
 
 	if (pin >= devi->npins)
 		return (EINVAL);
 
 	return GPIO_PIN_GETFLAGS(sc->sc_dev, devi->pins[pin], flags);
 }
 
 static int
 gpiobus_pin_getcaps(device_t dev, device_t child, uint32_t pin, 
     uint32_t *caps)
 {
 	struct gpiobus_softc *sc = GPIOBUS_SOFTC(dev);
 	struct gpiobus_ivar *devi = GPIOBUS_IVAR(child);
 
 	if (pin >= devi->npins)
 		return (EINVAL);
 
 	return GPIO_PIN_GETCAPS(sc->sc_dev, devi->pins[pin], caps);
 }
 
 static int
 gpiobus_pin_set(device_t dev, device_t child, uint32_t pin, 
     unsigned int value)
 {
 	struct gpiobus_softc *sc = GPIOBUS_SOFTC(dev);
 	struct gpiobus_ivar *devi = GPIOBUS_IVAR(child);
 
 	if (pin >= devi->npins)
 		return (EINVAL);
 
 	return GPIO_PIN_SET(sc->sc_dev, devi->pins[pin], value);
 }
 
 static int
 gpiobus_pin_get(device_t dev, device_t child, uint32_t pin, 
     unsigned int *value)
 {
 	struct gpiobus_softc *sc = GPIOBUS_SOFTC(dev);
 	struct gpiobus_ivar *devi = GPIOBUS_IVAR(child);
 
 	if (pin >= devi->npins)
 		return (EINVAL);
 
 	return GPIO_PIN_GET(sc->sc_dev, devi->pins[pin], value);
 }
 
 static int
 gpiobus_pin_toggle(device_t dev, device_t child, uint32_t pin)
 {
 	struct gpiobus_softc *sc = GPIOBUS_SOFTC(dev);
 	struct gpiobus_ivar *devi = GPIOBUS_IVAR(child);
 
 	if (pin >= devi->npins)
 		return (EINVAL);
 
 	return GPIO_PIN_TOGGLE(sc->sc_dev, devi->pins[pin]);
 }
 
 static int
 gpiobus_pin_getname(device_t dev, uint32_t pin, char *name)
 {
 	struct gpiobus_softc *sc;
 
 	sc = GPIOBUS_SOFTC(dev);
 	if (pin >= sc->sc_npins)
 		return (EINVAL);
 	/* Do we have a name for this pin? */
 	if (sc->sc_pins[pin].name != NULL) {
 		memcpy(name, sc->sc_pins[pin].name, GPIOMAXNAME);
 		return (0);
 	}
 
 	/* Return the default pin name. */
 	return (GPIO_PIN_GETNAME(device_get_parent(dev), pin, name));
 }
 
 static int
 gpiobus_pin_setname(device_t dev, uint32_t pin, const char *name)
 {
 	struct gpiobus_softc *sc;
 
 	sc = GPIOBUS_SOFTC(dev);
 	if (pin >= sc->sc_npins)
 		return (EINVAL);
 	if (name == NULL)
 		return (EINVAL);
 	/* Save the pin name. */
 	if (sc->sc_pins[pin].name == NULL)
 		sc->sc_pins[pin].name = malloc(GPIOMAXNAME, M_DEVBUF,
 		    M_WAITOK | M_ZERO);
 	strlcpy(sc->sc_pins[pin].name, name, GPIOMAXNAME);
 
 	return (0);
 }
 
 static device_method_t gpiobus_methods[] = {
 	/* Device interface */
 	DEVMETHOD(device_probe,		gpiobus_probe),
 	DEVMETHOD(device_attach,	gpiobus_attach),
 	DEVMETHOD(device_detach,	gpiobus_detach),
 	DEVMETHOD(device_shutdown,	bus_generic_shutdown),
 	DEVMETHOD(device_suspend,	gpiobus_suspend),
 	DEVMETHOD(device_resume,	gpiobus_resume),
 
 	/* Bus interface */
 	DEVMETHOD(bus_setup_intr,	bus_generic_setup_intr),
 	DEVMETHOD(bus_config_intr,	bus_generic_config_intr),
 	DEVMETHOD(bus_teardown_intr,	bus_generic_teardown_intr),
 	DEVMETHOD(bus_set_resource,	gpiobus_set_resource),
 	DEVMETHOD(bus_alloc_resource,	gpiobus_alloc_resource),
 	DEVMETHOD(bus_release_resource,	gpiobus_release_resource),
 	DEVMETHOD(bus_activate_resource,	bus_generic_activate_resource),
 	DEVMETHOD(bus_deactivate_resource,	bus_generic_deactivate_resource),
 	DEVMETHOD(bus_get_resource_list,	gpiobus_get_resource_list),
 	DEVMETHOD(bus_add_child,	gpiobus_add_child),
 	DEVMETHOD(bus_probe_nomatch,	gpiobus_probe_nomatch),
 	DEVMETHOD(bus_print_child,	gpiobus_print_child),
 	DEVMETHOD(bus_child_pnpinfo_str, gpiobus_child_pnpinfo_str),
 	DEVMETHOD(bus_child_location_str, gpiobus_child_location_str),
 	DEVMETHOD(bus_hinted_child,	gpiobus_hinted_child),
 
 	/* GPIO protocol */
 	DEVMETHOD(gpiobus_acquire_bus,	gpiobus_acquire_bus),
 	DEVMETHOD(gpiobus_release_bus,	gpiobus_release_bus),
 	DEVMETHOD(gpiobus_pin_getflags,	gpiobus_pin_getflags),
 	DEVMETHOD(gpiobus_pin_getcaps,	gpiobus_pin_getcaps),
 	DEVMETHOD(gpiobus_pin_setflags,	gpiobus_pin_setflags),
 	DEVMETHOD(gpiobus_pin_get,	gpiobus_pin_get),
 	DEVMETHOD(gpiobus_pin_set,	gpiobus_pin_set),
 	DEVMETHOD(gpiobus_pin_toggle,	gpiobus_pin_toggle),
 	DEVMETHOD(gpiobus_pin_getname,	gpiobus_pin_getname),
 	DEVMETHOD(gpiobus_pin_setname,	gpiobus_pin_setname),
 
 	DEVMETHOD_END
 };
 
 driver_t gpiobus_driver = {
 	"gpiobus",
 	gpiobus_methods,
 	sizeof(struct gpiobus_softc)
 };
 
 devclass_t	gpiobus_devclass;
 
 DRIVER_MODULE(gpiobus, gpio, gpiobus_driver, gpiobus_devclass, 0, 0);
 MODULE_VERSION(gpiobus, 1);
Index: projects/vnet/sys/dev/gpio/gpiobusvar.h
===================================================================
--- projects/vnet/sys/dev/gpio/gpiobusvar.h	(revision 301546)
+++ projects/vnet/sys/dev/gpio/gpiobusvar.h	(revision 301547)
@@ -1,144 +1,151 @@
 /*-
  * Copyright (c) 2009 Oleksandr Tymoshenko <gonzo@freebsd.org>
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  *
  */
 
 #ifndef	__GPIOBUS_H__
 #define	__GPIOBUS_H__
 
 #include "opt_platform.h"
 
 #include <sys/lock.h>
 #include <sys/mutex.h>
 #include <sys/rman.h>
 
 #ifdef FDT
 #include <dev/ofw/ofw_bus_subr.h>
 #include <gnu/dts/include/dt-bindings/gpio/gpio.h>
 #endif
 
 #include "gpio_if.h"
 
 #ifdef FDT
 #define	GPIOBUS_IVAR(d) (struct gpiobus_ivar *)				\
 	&((struct ofw_gpiobus_devinfo *)device_get_ivars(d))->opd_dinfo
 #else
 #define	GPIOBUS_IVAR(d) (struct gpiobus_ivar *) device_get_ivars(d)
 #endif
 #define	GPIOBUS_SOFTC(d) (struct gpiobus_softc *) device_get_softc(d)
 #define	GPIOBUS_LOCK(_sc) mtx_lock(&(_sc)->sc_mtx)
 #define	GPIOBUS_UNLOCK(_sc) mtx_unlock(&(_sc)->sc_mtx)
 #define	GPIOBUS_LOCK_INIT(_sc) mtx_init(&_sc->sc_mtx,			\
 	    device_get_nameunit(_sc->sc_dev), "gpiobus", MTX_DEF)
 #define	GPIOBUS_LOCK_DESTROY(_sc) mtx_destroy(&_sc->sc_mtx)
 #define	GPIOBUS_ASSERT_LOCKED(_sc) mtx_assert(&_sc->sc_mtx, MA_OWNED)
 #define	GPIOBUS_ASSERT_UNLOCKED(_sc) mtx_assert(&_sc->sc_mtx, MA_NOTOWNED)
 
 #define	GPIOBUS_WAIT		1
 #define	GPIOBUS_DONTWAIT	2
 
 /* Use default interrupt mode - for gpio_alloc_intr_resource */
 #define GPIO_INTR_CONFORM	GPIO_INTR_NONE
 
 struct gpiobus_pin_data
 {
 	int		mapped;		/* pin is mapped/reserved. */
 	char		*name;		/* pin name. */
 };
 
+struct intr_map_data_gpio {
+	struct intr_map_data	hdr;
+	u_int			gpio_pin_num;
+	u_int			gpio_pin_flags;
+	u_int			gpio_intr_mode;
+};
+
 struct gpiobus_softc
 {
 	struct mtx	sc_mtx;		/* bus mutex */
 	struct rman	sc_intr_rman;	/* isr resources */
 	device_t	sc_busdev;	/* bus device */
 	device_t	sc_owner;	/* bus owner */
 	device_t	sc_dev;		/* driver device */
 	int		sc_npins;	/* total pins on bus */
 	struct gpiobus_pin_data	*sc_pins; /* pin data */
 };
 
 struct gpiobus_pin
 {
 	device_t	dev;	/* gpio device */
 	uint32_t	flags;	/* pin flags */
 	uint32_t	pin;	/* pin number */
 };
 typedef struct gpiobus_pin *gpio_pin_t;
 
 struct gpiobus_ivar
 {
 	struct resource_list	rl;	/* isr resource list */
 	uint32_t	npins;	/* pins total */
 	uint32_t	*flags;	/* pins flags */
 	uint32_t	*pins;	/* pins map */
 };
 
 #ifdef FDT
 struct ofw_gpiobus_devinfo {
 	struct gpiobus_ivar	opd_dinfo;
 	struct ofw_bus_devinfo	opd_obdinfo;
 };
 
 static __inline int
 gpio_map_gpios(device_t bus, phandle_t dev, phandle_t gparent, int gcells,
     pcell_t *gpios, uint32_t *pin, uint32_t *flags)
 {
 	return (GPIO_MAP_GPIOS(bus, dev, gparent, gcells, gpios, pin, flags));
 }
 
 device_t ofw_gpiobus_add_fdt_child(device_t, const char *, phandle_t);
 int ofw_gpiobus_parse_gpios(device_t, char *, struct gpiobus_pin **);
 void ofw_gpiobus_register_provider(device_t);
 void ofw_gpiobus_unregister_provider(device_t);
 
 /* Consumers interface. */
 int gpio_pin_get_by_ofw_name(device_t consumer, phandle_t node,
     char *name, gpio_pin_t *gpio);
 int gpio_pin_get_by_ofw_idx(device_t consumer, phandle_t node,
     int idx, gpio_pin_t *gpio);
 int gpio_pin_get_by_ofw_property(device_t consumer, phandle_t node,
     char *name, gpio_pin_t *gpio);
 void gpio_pin_release(gpio_pin_t gpio);
 int gpio_pin_getcaps(gpio_pin_t pin, uint32_t *caps);
 int gpio_pin_is_active(gpio_pin_t pin, bool *active);
 int gpio_pin_set_active(gpio_pin_t pin, bool active);
 int gpio_pin_setflags(gpio_pin_t pin, uint32_t flags);
 #endif
 struct resource *gpio_alloc_intr_resource(device_t consumer_dev, int *rid,
     u_int alloc_flags, gpio_pin_t pin, uint32_t intr_mode);
 int gpio_check_flags(uint32_t, uint32_t);
 device_t gpiobus_attach_bus(device_t);
 int gpiobus_detach_bus(device_t);
 int gpiobus_init_softc(device_t);
 int gpiobus_alloc_ivars(struct gpiobus_ivar *);
 void gpiobus_free_ivars(struct gpiobus_ivar *);
 int gpiobus_acquire_pin(device_t, uint32_t);
 int gpiobus_release_pin(device_t, uint32_t);
 
 extern driver_t gpiobus_driver;
 
 #endif	/* __GPIOBUS_H__ */
Index: projects/vnet/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
===================================================================
--- projects/vnet/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c	(revision 301546)
+++ projects/vnet/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c	(revision 301547)
@@ -1,3003 +1,3003 @@
 /*-
  * Copyright (c) 2010-2012 Citrix Inc.
  * Copyright (c) 2009-2012,2016 Microsoft Corp.
  * Copyright (c) 2012 NetApp Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice unmodified, this list of conditions, and the following
  *    disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
  * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
  * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
  * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
  * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
  * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
  * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
  * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
  * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
 /*-
  * Copyright (c) 2004-2006 Kip Macy
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include "opt_inet6.h"
 #include "opt_inet.h"
 
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/sockio.h>
 #include <sys/mbuf.h>
 #include <sys/malloc.h>
 #include <sys/module.h>
 #include <sys/kernel.h>
 #include <sys/socket.h>
 #include <sys/queue.h>
 #include <sys/lock.h>
 #include <sys/sx.h>
 #include <sys/sysctl.h>
 #include <sys/buf_ring.h>
 
 #include <net/if.h>
 #include <net/if_arp.h>
 #include <net/ethernet.h>
 #include <net/if_dl.h>
 #include <net/if_media.h>
 
 #include <net/bpf.h>
 
 #include <net/if_var.h>
 #include <net/if_types.h>
 #include <net/if_vlan_var.h>
 
 #include <netinet/in_systm.h>
 #include <netinet/in.h>
 #include <netinet/ip.h>
 #include <netinet/if_ether.h>
 #include <netinet/tcp.h>
 #include <netinet/udp.h>
 #include <netinet/ip6.h>
 
 #include <vm/vm.h>
 #include <vm/vm_param.h>
 #include <vm/vm_kern.h>
 #include <vm/pmap.h>
 
 #include <machine/bus.h>
 #include <machine/resource.h>
 #include <machine/frame.h>
 
 #include <sys/bus.h>
 #include <sys/rman.h>
 #include <sys/mutex.h>
 #include <sys/errno.h>
 #include <sys/types.h>
 #include <machine/atomic.h>
 
 #include <machine/intr_machdep.h>
 
 #include <machine/in_cksum.h>
 
 #include <dev/hyperv/include/hyperv.h>
 #include <dev/hyperv/include/hyperv_busdma.h>
 #include "hv_net_vsc.h"
 #include "hv_rndis.h"
 #include "hv_rndis_filter.h"
 
 #define hv_chan_rxr	hv_chan_priv1
 #define hv_chan_txr	hv_chan_priv2
 
 /* Short for Hyper-V network interface */
 #define NETVSC_DEVNAME    "hn"
 
 /*
  * It looks like offset 0 of buf is reserved to hold the softc pointer.
  * The sc pointer is evidently not needed, and is not presently populated.
  * The packet offset is where the netvsc_packet starts in the buffer.
  */
 #define HV_NV_SC_PTR_OFFSET_IN_BUF         0
 #define HV_NV_PACKET_OFFSET_IN_BUF         16
 
 /* YYY should get it from the underlying channel */
 #define HN_TX_DESC_CNT			512
 
 #define HN_LROENT_CNT_DEF		128
 
 #define HN_RING_CNT_DEF_MAX		8
 
 #define HN_RNDIS_MSG_LEN		\
     (sizeof(rndis_msg) +		\
      RNDIS_HASHVAL_PPI_SIZE +		\
      RNDIS_VLAN_PPI_SIZE +		\
      RNDIS_TSO_PPI_SIZE +		\
      RNDIS_CSUM_PPI_SIZE)
 #define HN_RNDIS_MSG_BOUNDARY		PAGE_SIZE
 #define HN_RNDIS_MSG_ALIGN		CACHE_LINE_SIZE
 
 #define HN_TX_DATA_BOUNDARY		PAGE_SIZE
 #define HN_TX_DATA_MAXSIZE		IP_MAXPACKET
 #define HN_TX_DATA_SEGSIZE		PAGE_SIZE
 #define HN_TX_DATA_SEGCNT_MAX		\
     (NETVSC_PACKET_MAXPAGE - HV_RF_NUM_TX_RESERVED_PAGE_BUFS)
 
 #define HN_DIRECT_TX_SIZE_DEF		128
 
 #define HN_EARLY_TXEOF_THRESH		8
 
 struct hn_txdesc {
 #ifndef HN_USE_TXDESC_BUFRING
 	SLIST_ENTRY(hn_txdesc) link;
 #endif
 	struct mbuf	*m;
 	struct hn_tx_ring *txr;
 	int		refs;
 	uint32_t	flags;		/* HN_TXD_FLAG_ */
 	netvsc_packet	netvsc_pkt;	/* XXX to be removed */
 
 	bus_dmamap_t	data_dmap;
 
 	bus_addr_t	rndis_msg_paddr;
 	rndis_msg	*rndis_msg;
 	bus_dmamap_t	rndis_msg_dmap;
 };
 
 #define HN_TXD_FLAG_ONLIST	0x1
 #define HN_TXD_FLAG_DMAMAP	0x2
 
 /*
  * Only enable UDP checksum offloading when the host is Windows Server
  * 2012 R2 or later.  UDP checksum offloading doesn't work on earlier
  * Windows releases.
  */
 #define HN_CSUM_ASSIST_WIN8	(CSUM_IP | CSUM_TCP)
 #define HN_CSUM_ASSIST		(CSUM_IP | CSUM_UDP | CSUM_TCP)
 
 #define HN_LRO_LENLIM_MULTIRX_DEF	(12 * ETHERMTU)
 #define HN_LRO_LENLIM_DEF		(25 * ETHERMTU)
 /* YYY 2*MTU is a bit rough, but should be good enough. */
 #define HN_LRO_LENLIM_MIN(ifp)		(2 * (ifp)->if_mtu)
 
 #define HN_LRO_ACKCNT_DEF		1
 
 /*
  * Be aware that this sleepable mutex will exhibit WITNESS errors when
  * certain TCP and ARP code paths are taken.  This appears to be a
  * well-known condition, as all other drivers checked use a sleeping
  * mutex to protect their transmit paths.
  * Also be aware that mutexes do not play well with semaphores, and there
  * is a conflicting semaphore in a certain channel code path.
  */
 #define NV_LOCK_INIT(_sc, _name) \
 	    mtx_init(&(_sc)->hn_lock, _name, MTX_NETWORK_LOCK, MTX_DEF)
 #define NV_LOCK(_sc)		mtx_lock(&(_sc)->hn_lock)
 #define NV_LOCK_ASSERT(_sc)	mtx_assert(&(_sc)->hn_lock, MA_OWNED)
 #define NV_UNLOCK(_sc)		mtx_unlock(&(_sc)->hn_lock)
 #define NV_LOCK_DESTROY(_sc)	mtx_destroy(&(_sc)->hn_lock)
 
 
 /*
  * Globals
  */
 
 int hv_promisc_mode = 0;    /* normal mode by default */
 
 SYSCTL_NODE(_hw, OID_AUTO, hn, CTLFLAG_RD | CTLFLAG_MPSAFE, NULL,
     "Hyper-V network interface");
 
 /* Trust TCP segment verification on host side. */
 static int hn_trust_hosttcp = 1;
 SYSCTL_INT(_hw_hn, OID_AUTO, trust_hosttcp, CTLFLAG_RDTUN,
     &hn_trust_hosttcp, 0,
     "Trust tcp segment verification on host side, "
     "when csum info is missing (global setting)");
 
 /* Trust UDP datagram verification on host side. */
 static int hn_trust_hostudp = 1;
 SYSCTL_INT(_hw_hn, OID_AUTO, trust_hostudp, CTLFLAG_RDTUN,
     &hn_trust_hostudp, 0,
     "Trust udp datagram verification on host side, "
     "when csum info is missing (global setting)");
 
 /* Trust IP packet verification on host side. */
 static int hn_trust_hostip = 1;
 SYSCTL_INT(_hw_hn, OID_AUTO, trust_hostip, CTLFLAG_RDTUN,
     &hn_trust_hostip, 0,
     "Trust ip packet verification on host side, "
     "when csum info is missing (global setting)");
 
 #if __FreeBSD_version >= 1100045
 /* Limit TSO burst size */
 static int hn_tso_maxlen = 0;
 SYSCTL_INT(_hw_hn, OID_AUTO, tso_maxlen, CTLFLAG_RDTUN,
     &hn_tso_maxlen, 0, "TSO burst limit");
 #endif
 
 /* Limit chimney send size */
 static int hn_tx_chimney_size = 0;
 SYSCTL_INT(_hw_hn, OID_AUTO, tx_chimney_size, CTLFLAG_RDTUN,
     &hn_tx_chimney_size, 0, "Chimney send packet size limit");
 
 /* Limit the size of packet for direct transmission */
 static int hn_direct_tx_size = HN_DIRECT_TX_SIZE_DEF;
 SYSCTL_INT(_hw_hn, OID_AUTO, direct_tx_size, CTLFLAG_RDTUN,
     &hn_direct_tx_size, 0, "Size of the packet for direct transmission");
 
 #if defined(INET) || defined(INET6)
 #if __FreeBSD_version >= 1100095
 static int hn_lro_entry_count = HN_LROENT_CNT_DEF;
 SYSCTL_INT(_hw_hn, OID_AUTO, lro_entry_count, CTLFLAG_RDTUN,
     &hn_lro_entry_count, 0, "LRO entry count");
 #endif
 #endif
 
 static int hn_share_tx_taskq = 0;
 SYSCTL_INT(_hw_hn, OID_AUTO, share_tx_taskq, CTLFLAG_RDTUN,
     &hn_share_tx_taskq, 0, "Enable shared TX taskqueue");
 
 static struct taskqueue	*hn_tx_taskq;
 
 #ifndef HN_USE_TXDESC_BUFRING
 static int hn_use_txdesc_bufring = 0;
 #else
 static int hn_use_txdesc_bufring = 1;
 #endif
 SYSCTL_INT(_hw_hn, OID_AUTO, use_txdesc_bufring, CTLFLAG_RD,
     &hn_use_txdesc_bufring, 0, "Use buf_ring for TX descriptors");
 
 static int hn_bind_tx_taskq = -1;
 SYSCTL_INT(_hw_hn, OID_AUTO, bind_tx_taskq, CTLFLAG_RDTUN,
     &hn_bind_tx_taskq, 0, "Bind TX taskqueue to the specified cpu");
 
 static int hn_use_if_start = 0;
 SYSCTL_INT(_hw_hn, OID_AUTO, use_if_start, CTLFLAG_RDTUN,
     &hn_use_if_start, 0, "Use if_start TX method");
 
 static int hn_chan_cnt = 0;
 SYSCTL_INT(_hw_hn, OID_AUTO, chan_cnt, CTLFLAG_RDTUN,
     &hn_chan_cnt, 0,
     "# of channels to use; each channel has one RX ring and one TX ring");
 
 static int hn_tx_ring_cnt = 0;
 SYSCTL_INT(_hw_hn, OID_AUTO, tx_ring_cnt, CTLFLAG_RDTUN,
     &hn_tx_ring_cnt, 0, "# of TX rings to use");
 
 static int hn_tx_swq_depth = 0;
 SYSCTL_INT(_hw_hn, OID_AUTO, tx_swq_depth, CTLFLAG_RDTUN,
     &hn_tx_swq_depth, 0, "Depth of IFQ or BUFRING");
 
 static u_int hn_cpu_index;
 
 /*
  * Forward declarations
  */
 static void hn_stop(hn_softc_t *sc);
 static void hn_ifinit_locked(hn_softc_t *sc);
 static void hn_ifinit(void *xsc);
 static int  hn_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data);
 static int hn_start_locked(struct hn_tx_ring *txr, int len);
 static void hn_start(struct ifnet *ifp);
 static void hn_start_txeof(struct hn_tx_ring *);
 static int hn_ifmedia_upd(struct ifnet *ifp);
 static void hn_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr);
 #if __FreeBSD_version >= 1100099
 static int hn_lro_lenlim_sysctl(SYSCTL_HANDLER_ARGS);
 static int hn_lro_ackcnt_sysctl(SYSCTL_HANDLER_ARGS);
 #endif
 static int hn_trust_hcsum_sysctl(SYSCTL_HANDLER_ARGS);
 static int hn_tx_chimney_size_sysctl(SYSCTL_HANDLER_ARGS);
 static int hn_rx_stat_ulong_sysctl(SYSCTL_HANDLER_ARGS);
 static int hn_rx_stat_u64_sysctl(SYSCTL_HANDLER_ARGS);
 static int hn_tx_stat_ulong_sysctl(SYSCTL_HANDLER_ARGS);
 static int hn_tx_conf_int_sysctl(SYSCTL_HANDLER_ARGS);
 static int hn_check_iplen(const struct mbuf *, int);
 static int hn_create_tx_ring(struct hn_softc *, int);
 static void hn_destroy_tx_ring(struct hn_tx_ring *);
 static int hn_create_tx_data(struct hn_softc *, int);
 static void hn_destroy_tx_data(struct hn_softc *);
 static void hn_start_taskfunc(void *, int);
 static void hn_start_txeof_taskfunc(void *, int);
 static void hn_stop_tx_tasks(struct hn_softc *);
 static int hn_encap(struct hn_tx_ring *, struct hn_txdesc *, struct mbuf **);
 static void hn_create_rx_data(struct hn_softc *sc, int);
 static void hn_destroy_rx_data(struct hn_softc *sc);
 static void hn_set_tx_chimney_size(struct hn_softc *, int);
 static void hn_channel_attach(struct hn_softc *, struct hv_vmbus_channel *);
 static void hn_subchan_attach(struct hn_softc *, struct hv_vmbus_channel *);
 
 static int hn_transmit(struct ifnet *, struct mbuf *);
 static void hn_xmit_qflush(struct ifnet *);
 static int hn_xmit(struct hn_tx_ring *, int);
 static void hn_xmit_txeof(struct hn_tx_ring *);
 static void hn_xmit_taskfunc(void *, int);
 static void hn_xmit_txeof_taskfunc(void *, int);
 
 #if __FreeBSD_version >= 1100099
 static void
 hn_set_lro_lenlim(struct hn_softc *sc, int lenlim)
 {
 	int i;
 
 	for (i = 0; i < sc->hn_rx_ring_inuse; ++i)
 		sc->hn_rx_ring[i].hn_lro.lro_length_lim = lenlim;
 }
 #endif
 
 static int
 hn_get_txswq_depth(const struct hn_tx_ring *txr)
 {
 
 	KASSERT(txr->hn_txdesc_cnt > 0, ("tx ring is not setup yet"));
 	if (hn_tx_swq_depth < txr->hn_txdesc_cnt)
 		return txr->hn_txdesc_cnt;
 	return hn_tx_swq_depth;
 }
 
 static int
 hn_ifmedia_upd(struct ifnet *ifp __unused)
 {
 
 	return EOPNOTSUPP;
 }
 
 static void
 hn_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr)
 {
 	struct hn_softc *sc = ifp->if_softc;
 
 	ifmr->ifm_status = IFM_AVALID;
 	ifmr->ifm_active = IFM_ETHER;
 
 	if (!sc->hn_carrier) {
 		ifmr->ifm_active |= IFM_NONE;
 		return;
 	}
 	ifmr->ifm_status |= IFM_ACTIVE;
 	ifmr->ifm_active |= IFM_10G_T | IFM_FDX;
 }
 
 /* {F8615163-DF3E-46c5-913F-F2D2F965ED0E} */
 static const hv_guid g_net_vsc_device_type = {
 	.data = {0x63, 0x51, 0x61, 0xF8, 0x3E, 0xDF, 0xc5, 0x46,
 		0x91, 0x3F, 0xF2, 0xD2, 0xF9, 0x65, 0xED, 0x0E}
 };
 
 /*
  * Standard probe entry point.
  *
  */
 static int
 netvsc_probe(device_t dev)
 {
 	const char *p;
 
 	p = vmbus_get_type(dev);
 	if (!memcmp(p, &g_net_vsc_device_type.data, sizeof(hv_guid))) {
 		device_set_desc(dev, "Hyper-V Network Interface");
 		if (bootverbose)
 			printf("Netvsc probe... DONE \n");
 
 		return (BUS_PROBE_DEFAULT);
 	}
 
 	return (ENXIO);
 }
 
 /*
  * Standard attach entry point.
  *
  * Called when the driver is loaded.  It allocates needed resources,
  * and initializes the "hardware" and software.
  */
 static int
 netvsc_attach(device_t dev)
 {
 	struct hv_device *device_ctx = vmbus_get_devctx(dev);
 	struct hv_vmbus_channel *pri_chan;
 	netvsc_device_info device_info;
 	hn_softc_t *sc;
 	int unit = device_get_unit(dev);
 	struct ifnet *ifp = NULL;
 	int error, ring_cnt, tx_ring_cnt;
 #if __FreeBSD_version >= 1100045
 	int tso_maxlen;
 #endif
 
 	sc = device_get_softc(dev);
 
 	sc->hn_unit = unit;
 	sc->hn_dev = dev;
 
 	if (hn_tx_taskq == NULL) {
 		sc->hn_tx_taskq = taskqueue_create("hn_tx", M_WAITOK,
 		    taskqueue_thread_enqueue, &sc->hn_tx_taskq);
 		if (hn_bind_tx_taskq >= 0) {
 			int cpu = hn_bind_tx_taskq;
 			cpuset_t cpu_set;
 
 			if (cpu > mp_ncpus - 1)
 				cpu = mp_ncpus - 1;
 			CPU_SETOF(cpu, &cpu_set);
 			taskqueue_start_threads_cpuset(&sc->hn_tx_taskq, 1,
 			    PI_NET, &cpu_set, "%s tx",
 			    device_get_nameunit(dev));
 		} else {
 			taskqueue_start_threads(&sc->hn_tx_taskq, 1, PI_NET,
 			    "%s tx", device_get_nameunit(dev));
 		}
 	} else {
 		sc->hn_tx_taskq = hn_tx_taskq;
 	}
 	NV_LOCK_INIT(sc, "NetVSCLock");
 
 	sc->hn_dev_obj = device_ctx;
 
 	ifp = sc->hn_ifp = if_alloc(IFT_ETHER);
 	ifp->if_softc = sc;
 	if_initname(ifp, device_get_name(dev), device_get_unit(dev));
 
 	/*
 	 * Figure out the # of RX rings (ring_cnt) and the # of TX rings
 	 * to use (tx_ring_cnt).
 	 *
 	 * NOTE:
 	 * The # of RX rings to use is the same as the # of channels to use.
 	 */
 	ring_cnt = hn_chan_cnt;
 	if (ring_cnt <= 0) {
 		/* Default */
 		ring_cnt = mp_ncpus;
 		if (ring_cnt > HN_RING_CNT_DEF_MAX)
 			ring_cnt = HN_RING_CNT_DEF_MAX;
 	} else if (ring_cnt > mp_ncpus) {
 		ring_cnt = mp_ncpus;
 	}
 
 	tx_ring_cnt = hn_tx_ring_cnt;
 	if (tx_ring_cnt <= 0 || tx_ring_cnt > ring_cnt)
 		tx_ring_cnt = ring_cnt;
 	if (hn_use_if_start) {
 		/* ifnet.if_start only needs one TX ring. */
 		tx_ring_cnt = 1;
 	}
 
 	/*
 	 * Set the leader CPU for channels.
 	 */
 	sc->hn_cpu = atomic_fetchadd_int(&hn_cpu_index, ring_cnt) % mp_ncpus;
 
 	error = hn_create_tx_data(sc, tx_ring_cnt);
 	if (error)
 		goto failed;
 	hn_create_rx_data(sc, ring_cnt);
 
 	/*
 	 * Associate the first TX/RX ring w/ the primary channel.
 	 */
 	pri_chan = device_ctx->channel;
 	KASSERT(HV_VMBUS_CHAN_ISPRIMARY(pri_chan), ("not primary channel"));
 	KASSERT(pri_chan->offer_msg.offer.sub_channel_index == 0,
 	    ("primary channel subidx %u",
 	     pri_chan->offer_msg.offer.sub_channel_index));
 	hn_channel_attach(sc, pri_chan);
 
 	ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
 	ifp->if_ioctl = hn_ioctl;
 	ifp->if_init = hn_ifinit;
 	/* needed by hv_rf_on_device_add() code */
 	ifp->if_mtu = ETHERMTU;
 	if (hn_use_if_start) {
 		int qdepth = hn_get_txswq_depth(&sc->hn_tx_ring[0]);
 
 		ifp->if_start = hn_start;
 		IFQ_SET_MAXLEN(&ifp->if_snd, qdepth);
 		ifp->if_snd.ifq_drv_maxlen = qdepth - 1;
 		IFQ_SET_READY(&ifp->if_snd);
 	} else {
 		ifp->if_transmit = hn_transmit;
 		ifp->if_qflush = hn_xmit_qflush;
 	}
 
 	ifmedia_init(&sc->hn_media, 0, hn_ifmedia_upd, hn_ifmedia_sts);
 	ifmedia_add(&sc->hn_media, IFM_ETHER | IFM_AUTO, 0, NULL);
 	ifmedia_set(&sc->hn_media, IFM_ETHER | IFM_AUTO);
 	/* XXX ifmedia_set really should do this for us */
 	sc->hn_media.ifm_media = sc->hn_media.ifm_cur->ifm_media;
 
 	/*
 	 * Tell upper layers that we support full VLAN capability.
 	 */
 	ifp->if_hdrlen = sizeof(struct ether_vlan_header);
 	ifp->if_capabilities |=
 	    IFCAP_VLAN_HWTAGGING | IFCAP_VLAN_MTU | IFCAP_HWCSUM | IFCAP_TSO |
 	    IFCAP_LRO;
 	ifp->if_capenable |=
 	    IFCAP_VLAN_HWTAGGING | IFCAP_VLAN_MTU | IFCAP_HWCSUM | IFCAP_TSO |
 	    IFCAP_LRO;
 	ifp->if_hwassist = sc->hn_tx_ring[0].hn_csum_assist | CSUM_TSO;
 
 	error = hv_rf_on_device_add(device_ctx, &device_info, ring_cnt);
 	if (error)
 		goto failed;
 	KASSERT(sc->net_dev->num_channel > 0 &&
 	    sc->net_dev->num_channel <= sc->hn_rx_ring_inuse,
 	    ("invalid channel count %u, should be at most %d",
 	     sc->net_dev->num_channel, sc->hn_rx_ring_inuse));
 
 	/*
 	 * Set the # of TX/RX rings that could be used according to
 	 * the # of channels that the host offered.
 	 */
 	if (sc->hn_tx_ring_inuse > sc->net_dev->num_channel)
 		sc->hn_tx_ring_inuse = sc->net_dev->num_channel;
 	sc->hn_rx_ring_inuse = sc->net_dev->num_channel;
 	device_printf(dev, "%d TX ring, %d RX ring\n",
 	    sc->hn_tx_ring_inuse, sc->hn_rx_ring_inuse);
 
 	if (sc->net_dev->num_channel > 1) {
 		struct hv_vmbus_channel **subchan;
 		int subchan_cnt = sc->net_dev->num_channel - 1;
 		int i;
 
 		/* Wait for sub-channels setup to complete. */
 		subchan = vmbus_get_subchan(pri_chan, subchan_cnt);
 
 		/* Attach the sub-channels. */
 		for (i = 0; i < subchan_cnt; ++i) {
 			/* NOTE: Calling order is critical. */
 			hn_subchan_attach(sc, subchan[i]);
 			hv_nv_subchan_attach(subchan[i]);
 		}
 
 		/* Release the sub-channels */
 		vmbus_rel_subchan(subchan, subchan_cnt);
 		device_printf(dev, "%d sub-channels setup done\n", subchan_cnt);
 	}
 
 #if __FreeBSD_version >= 1100099
 	if (sc->hn_rx_ring_inuse > 1) {
 		/*
 		 * Reduce TCP segment aggregation limit for multiple
 		 * RX rings to increase ACK timeliness.
 		 */
 		hn_set_lro_lenlim(sc, HN_LRO_LENLIM_MULTIRX_DEF);
 	}
 #endif
 
 	if (device_info.link_state == 0) {
 		sc->hn_carrier = 1;
 	}
 
 #if __FreeBSD_version >= 1100045
 	tso_maxlen = hn_tso_maxlen;
 	if (tso_maxlen <= 0 || tso_maxlen > IP_MAXPACKET)
 		tso_maxlen = IP_MAXPACKET;
 
 	ifp->if_hw_tsomaxsegcount = HN_TX_DATA_SEGCNT_MAX;
 	ifp->if_hw_tsomaxsegsize = PAGE_SIZE;
 	ifp->if_hw_tsomax = tso_maxlen -
 	    (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);
 #endif
 
 	ether_ifattach(ifp, device_info.mac_addr);
 
 #if __FreeBSD_version >= 1100045
 	if_printf(ifp, "TSO: %u/%u/%u\n", ifp->if_hw_tsomax,
 	    ifp->if_hw_tsomaxsegcount, ifp->if_hw_tsomaxsegsize);
 #endif
 
 	sc->hn_tx_chimney_max = sc->net_dev->send_section_size;
 	hn_set_tx_chimney_size(sc, sc->hn_tx_chimney_max);
 	if (hn_tx_chimney_size > 0 &&
 	    hn_tx_chimney_size < sc->hn_tx_chimney_max)
 		hn_set_tx_chimney_size(sc, hn_tx_chimney_size);
 
 	return (0);
 failed:
 	hn_destroy_tx_data(sc);
 	if (ifp != NULL)
 		if_free(ifp);
 	return (error);
 }
 
 /*
  * Standard detach entry point
  */
 static int
 netvsc_detach(device_t dev)
 {
 	struct hn_softc *sc = device_get_softc(dev);
 	struct hv_device *hv_device = vmbus_get_devctx(dev);
 
 	if (bootverbose)
 		printf("netvsc_detach\n");
 
 	/*
 	 * XXXKYS:  Need to clean up all our
 	 * driver state; this is the driver
 	 * unloading.
 	 */
 
 	/*
 	 * XXXKYS:  Need to stop outgoing traffic and unregister
 	 * the netdevice.
 	 */
 
 	hv_rf_on_device_remove(hv_device, HV_RF_NV_DESTROY_CHANNEL);
 
 	hn_stop_tx_tasks(sc);
 
 	ifmedia_removeall(&sc->hn_media);
 	hn_destroy_rx_data(sc);
 	hn_destroy_tx_data(sc);
 
 	if (sc->hn_tx_taskq != hn_tx_taskq)
 		taskqueue_free(sc->hn_tx_taskq);
 
 	return (0);
 }
 
 /*
  * Standard shutdown entry point
  */
 static int
 netvsc_shutdown(device_t dev)
 {
 	return (0);
 }
 
 static __inline int
 hn_txdesc_dmamap_load(struct hn_tx_ring *txr, struct hn_txdesc *txd,
     struct mbuf **m_head, bus_dma_segment_t *segs, int *nsegs)
 {
 	struct mbuf *m = *m_head;
 	int error;
 
 	error = bus_dmamap_load_mbuf_sg(txr->hn_tx_data_dtag, txd->data_dmap,
 	    m, segs, nsegs, BUS_DMA_NOWAIT);
 	if (error == EFBIG) {
 		struct mbuf *m_new;
 
 		m_new = m_collapse(m, M_NOWAIT, HN_TX_DATA_SEGCNT_MAX);
 		if (m_new == NULL)
 			return ENOBUFS;
 		else
 			*m_head = m = m_new;
 		txr->hn_tx_collapsed++;
 
 		error = bus_dmamap_load_mbuf_sg(txr->hn_tx_data_dtag,
 		    txd->data_dmap, m, segs, nsegs, BUS_DMA_NOWAIT);
 	}
 	if (!error) {
 		bus_dmamap_sync(txr->hn_tx_data_dtag, txd->data_dmap,
 		    BUS_DMASYNC_PREWRITE);
 		txd->flags |= HN_TXD_FLAG_DMAMAP;
 	}
 	return error;
 }
 
 static __inline void
 hn_txdesc_dmamap_unload(struct hn_tx_ring *txr, struct hn_txdesc *txd)
 {
 
 	if (txd->flags & HN_TXD_FLAG_DMAMAP) {
 		bus_dmamap_sync(txr->hn_tx_data_dtag,
 		    txd->data_dmap, BUS_DMASYNC_POSTWRITE);
 		bus_dmamap_unload(txr->hn_tx_data_dtag,
 		    txd->data_dmap);
 		txd->flags &= ~HN_TXD_FLAG_DMAMAP;
 	}
 }
 
 static __inline int
 hn_txdesc_put(struct hn_tx_ring *txr, struct hn_txdesc *txd)
 {
 
 	KASSERT((txd->flags & HN_TXD_FLAG_ONLIST) == 0,
 	    ("put an onlist txd %#x", txd->flags));
 
 	KASSERT(txd->refs > 0, ("invalid txd refs %d", txd->refs));
 	if (atomic_fetchadd_int(&txd->refs, -1) != 1)
 		return 0;
 
 	hn_txdesc_dmamap_unload(txr, txd);
 	if (txd->m != NULL) {
 		m_freem(txd->m);
 		txd->m = NULL;
 	}
 
 	txd->flags |= HN_TXD_FLAG_ONLIST;
 
 #ifndef HN_USE_TXDESC_BUFRING
 	mtx_lock_spin(&txr->hn_txlist_spin);
 	KASSERT(txr->hn_txdesc_avail >= 0 &&
 	    txr->hn_txdesc_avail < txr->hn_txdesc_cnt,
 	    ("txdesc_put: invalid txd avail %d", txr->hn_txdesc_avail));
 	txr->hn_txdesc_avail++;
 	SLIST_INSERT_HEAD(&txr->hn_txlist, txd, link);
 	mtx_unlock_spin(&txr->hn_txlist_spin);
 #else
 	atomic_add_int(&txr->hn_txdesc_avail, 1);
 	buf_ring_enqueue(txr->hn_txdesc_br, txd);
 #endif
 
 	return 1;
 }
 
 static __inline struct hn_txdesc *
 hn_txdesc_get(struct hn_tx_ring *txr)
 {
 	struct hn_txdesc *txd;
 
 #ifndef HN_USE_TXDESC_BUFRING
 	mtx_lock_spin(&txr->hn_txlist_spin);
 	txd = SLIST_FIRST(&txr->hn_txlist);
 	if (txd != NULL) {
 		KASSERT(txr->hn_txdesc_avail > 0,
 		    ("txdesc_get: invalid txd avail %d", txr->hn_txdesc_avail));
 		txr->hn_txdesc_avail--;
 		SLIST_REMOVE_HEAD(&txr->hn_txlist, link);
 	}
 	mtx_unlock_spin(&txr->hn_txlist_spin);
 #else
 	txd = buf_ring_dequeue_sc(txr->hn_txdesc_br);
 #endif
 
 	if (txd != NULL) {
 #ifdef HN_USE_TXDESC_BUFRING
 		atomic_subtract_int(&txr->hn_txdesc_avail, 1);
 #endif
 		KASSERT(txd->m == NULL && txd->refs == 0 &&
 		    (txd->flags & HN_TXD_FLAG_ONLIST), ("invalid txd"));
 		txd->flags &= ~HN_TXD_FLAG_ONLIST;
 		txd->refs = 1;
 	}
 	return txd;
 }
 
 static __inline void
 hn_txdesc_hold(struct hn_txdesc *txd)
 {
 
 	/* 0->1 transition will never work */
 	KASSERT(txd->refs > 0, ("invalid refs %d", txd->refs));
 	atomic_add_int(&txd->refs, 1);
 }
 
 static __inline void
 hn_txeof(struct hn_tx_ring *txr)
 {
 	txr->hn_has_txeof = 0;
 	txr->hn_txeof(txr);
 }
 
 static void
 hn_tx_done(struct hv_vmbus_channel *chan, void *xpkt)
 {
 	netvsc_packet *packet = xpkt;
 	struct hn_txdesc *txd;
 	struct hn_tx_ring *txr;
 
 	txd = (struct hn_txdesc *)(uintptr_t)
 	    packet->compl.send.send_completion_tid;
 
 	txr = txd->txr;
 	KASSERT(txr->hn_chan == chan,
 	    ("channel mismatch, on channel%u, should be channel%u",
 	     chan->offer_msg.offer.sub_channel_index,
 	     txr->hn_chan->offer_msg.offer.sub_channel_index));
 
 	txr->hn_has_txeof = 1;
 	hn_txdesc_put(txr, txd);
 
 	++txr->hn_txdone_cnt;
 	if (txr->hn_txdone_cnt >= HN_EARLY_TXEOF_THRESH) {
 		txr->hn_txdone_cnt = 0;
 		if (txr->hn_oactive)
 			hn_txeof(txr);
 	}
 }
 
 void
 netvsc_channel_rollup(struct hv_vmbus_channel *chan)
 {
 	struct hn_tx_ring *txr = chan->hv_chan_txr;
 #if defined(INET) || defined(INET6)
 	struct hn_rx_ring *rxr = chan->hv_chan_rxr;
 
 	tcp_lro_flush_all(&rxr->hn_lro);
 #endif
 
 	/*
 	 * NOTE:
 	 * 'txr' could be NULL, if multiple channels and
 	 * ifnet.if_start method are enabled.
 	 */
 	if (txr == NULL || !txr->hn_has_txeof)
 		return;
 
 	txr->hn_txdone_cnt = 0;
 	hn_txeof(txr);
 }
 
 /*
  * NOTE:
  * If this function fails, then both txd and m_head0 will be freed.
  */
 static int
 hn_encap(struct hn_tx_ring *txr, struct hn_txdesc *txd, struct mbuf **m_head0)
 {
 	bus_dma_segment_t segs[HN_TX_DATA_SEGCNT_MAX];
 	int error, nsegs, i;
 	struct mbuf *m_head = *m_head0;
 	netvsc_packet *packet;
 	rndis_msg *rndis_mesg;
 	rndis_packet *rndis_pkt;
 	rndis_per_packet_info *rppi;
 	struct rndis_hash_value *hash_value;
 	uint32_t rndis_msg_size;
 
 	packet = &txd->netvsc_pkt;
 	packet->is_data_pkt = TRUE;
 	packet->tot_data_buf_len = m_head->m_pkthdr.len;
 
 	/*
 	 * extension points to the area reserved for the
 	 * rndis_filter_packet, which is placed just after
 	 * the netvsc_packet (and rppi struct, if present;
 	 * length is updated later).
 	 */
 	rndis_mesg = txd->rndis_msg;
 	/* XXX not necessary */
 	memset(rndis_mesg, 0, HN_RNDIS_MSG_LEN);
 	rndis_mesg->ndis_msg_type = REMOTE_NDIS_PACKET_MSG;
 
 	rndis_pkt = &rndis_mesg->msg.packet;
 	rndis_pkt->data_offset = sizeof(rndis_packet);
 	rndis_pkt->data_length = packet->tot_data_buf_len;
 	rndis_pkt->per_pkt_info_offset = sizeof(rndis_packet);
 
 	rndis_msg_size = RNDIS_MESSAGE_SIZE(rndis_packet);
 
 	/*
 	 * Set the hash value for this packet, so that the host could
 	 * dispatch the TX done event for this packet back to this TX
 	 * ring's channel.
 	 */
 	rndis_msg_size += RNDIS_HASHVAL_PPI_SIZE;
 	rppi = hv_set_rppi_data(rndis_mesg, RNDIS_HASHVAL_PPI_SIZE,
 	    nbl_hash_value);
 	hash_value = (struct rndis_hash_value *)((uint8_t *)rppi +
 	    rppi->per_packet_info_offset);
 	hash_value->hash_value = txr->hn_tx_idx;
 
 	if (m_head->m_flags & M_VLANTAG) {
 		ndis_8021q_info *rppi_vlan_info;
 
 		rndis_msg_size += RNDIS_VLAN_PPI_SIZE;
 		rppi = hv_set_rppi_data(rndis_mesg, RNDIS_VLAN_PPI_SIZE,
 		    ieee_8021q_info);
 
 		rppi_vlan_info = (ndis_8021q_info *)((uint8_t *)rppi +
 		    rppi->per_packet_info_offset);
 		rppi_vlan_info->u1.s1.vlan_id =
 		    m_head->m_pkthdr.ether_vtag & 0xfff;
 	}
 
 	if (m_head->m_pkthdr.csum_flags & CSUM_TSO) {
 		rndis_tcp_tso_info *tso_info;	
 		struct ether_vlan_header *eh;
 		int ether_len;
 
 		/*
 		 * XXX need m_pullup and use mtodo
 		 */
 		eh = mtod(m_head, struct ether_vlan_header*);
 		if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN))
 			ether_len = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
 		else
 			ether_len = ETHER_HDR_LEN;
 
 		rndis_msg_size += RNDIS_TSO_PPI_SIZE;
 		rppi = hv_set_rppi_data(rndis_mesg, RNDIS_TSO_PPI_SIZE,
 		    tcp_large_send_info);
 
 		tso_info = (rndis_tcp_tso_info *)((uint8_t *)rppi +
 		    rppi->per_packet_info_offset);
 		tso_info->lso_v2_xmit.type =
 		    RNDIS_TCP_LARGE_SEND_OFFLOAD_V2_TYPE;
 
 #ifdef INET
 		if (m_head->m_pkthdr.csum_flags & CSUM_IP_TSO) {
 			struct ip *ip =
 			    (struct ip *)(m_head->m_data + ether_len);
 			unsigned long iph_len = ip->ip_hl << 2;
 			struct tcphdr *th =
 			    (struct tcphdr *)((caddr_t)ip + iph_len);
 
 			tso_info->lso_v2_xmit.ip_version =
 			    RNDIS_TCP_LARGE_SEND_OFFLOAD_IPV4;
 			ip->ip_len = 0;
 			ip->ip_sum = 0;
 
 			th->th_sum = in_pseudo(ip->ip_src.s_addr,
 			    ip->ip_dst.s_addr, htons(IPPROTO_TCP));
 		}
 #endif
 #if defined(INET6) && defined(INET)
 		else
 #endif
 #ifdef INET6
 		{
 			struct ip6_hdr *ip6 = (struct ip6_hdr *)
 			    (m_head->m_data + ether_len);
 			struct tcphdr *th = (struct tcphdr *)(ip6 + 1);
 
 			tso_info->lso_v2_xmit.ip_version =
 			    RNDIS_TCP_LARGE_SEND_OFFLOAD_IPV6;
 			ip6->ip6_plen = 0;
 			th->th_sum = in6_cksum_pseudo(ip6, 0, IPPROTO_TCP, 0);
 		}
 #endif
 		tso_info->lso_v2_xmit.tcp_header_offset = 0;
 		tso_info->lso_v2_xmit.mss = m_head->m_pkthdr.tso_segsz;
 	} else if (m_head->m_pkthdr.csum_flags & txr->hn_csum_assist) {
 		rndis_tcp_ip_csum_info *csum_info;
 
 		rndis_msg_size += RNDIS_CSUM_PPI_SIZE;
 		rppi = hv_set_rppi_data(rndis_mesg, RNDIS_CSUM_PPI_SIZE,
 		    tcpip_chksum_info);
 		csum_info = (rndis_tcp_ip_csum_info *)((uint8_t *)rppi +
 		    rppi->per_packet_info_offset);
 
 		csum_info->xmit.is_ipv4 = 1;
 		if (m_head->m_pkthdr.csum_flags & CSUM_IP)
 			csum_info->xmit.ip_header_csum = 1;
 
 		if (m_head->m_pkthdr.csum_flags & CSUM_TCP) {
 			csum_info->xmit.tcp_csum = 1;
 			csum_info->xmit.tcp_header_offset = 0;
 		} else if (m_head->m_pkthdr.csum_flags & CSUM_UDP) {
 			csum_info->xmit.udp_csum = 1;
 		}
 	}
 
 	rndis_mesg->msg_len = packet->tot_data_buf_len + rndis_msg_size;
 	packet->tot_data_buf_len = rndis_mesg->msg_len;
 
 	/*
 	 * Chimney send, if the packet could fit into one chimney buffer.
 	 */
 	if (packet->tot_data_buf_len < txr->hn_tx_chimney_size) {
 		netvsc_dev *net_dev = txr->hn_sc->net_dev;
 		uint32_t send_buf_section_idx;
 
 		txr->hn_tx_chimney_tried++;
 		send_buf_section_idx =
 		    hv_nv_get_next_send_section(net_dev);
 		if (send_buf_section_idx !=
 		    NVSP_1_CHIMNEY_SEND_INVALID_SECTION_INDEX) {
 			uint8_t *dest = ((uint8_t *)net_dev->send_buf +
 			    (send_buf_section_idx *
 			     net_dev->send_section_size));
 
 			memcpy(dest, rndis_mesg, rndis_msg_size);
 			dest += rndis_msg_size;
 			m_copydata(m_head, 0, m_head->m_pkthdr.len, dest);
 
 			packet->send_buf_section_idx = send_buf_section_idx;
 			packet->send_buf_section_size =
 			    packet->tot_data_buf_len;
 			packet->page_buf_count = 0;
 			txr->hn_tx_chimney++;
 			goto done;
 		}
 	}
 
 	error = hn_txdesc_dmamap_load(txr, txd, &m_head, segs, &nsegs);
 	if (error) {
 		int freed;
 
 		/*
 		 * This mbuf is not linked w/ the txd yet, so free it now.
 		 */
 		m_freem(m_head);
 		*m_head0 = NULL;
 
 		freed = hn_txdesc_put(txr, txd);
 		KASSERT(freed != 0,
 		    ("fail to free txd upon txdma error"));
 
 		txr->hn_txdma_failed++;
 		if_inc_counter(txr->hn_sc->hn_ifp, IFCOUNTER_OERRORS, 1);
 		return error;
 	}
 	*m_head0 = m_head;
 
 	packet->page_buf_count = nsegs + HV_RF_NUM_TX_RESERVED_PAGE_BUFS;
 
 	/* send packet with page buffer */
 	packet->page_buffers[0].pfn = atop(txd->rndis_msg_paddr);
 	packet->page_buffers[0].offset = txd->rndis_msg_paddr & PAGE_MASK;
 	packet->page_buffers[0].length = rndis_msg_size;
 
 	/*
 	 * Fill the page buffers with mbuf info starting at index
 	 * HV_RF_NUM_TX_RESERVED_PAGE_BUFS.
 	 */
 	for (i = 0; i < nsegs; ++i) {
 		hv_vmbus_page_buffer *pb = &packet->page_buffers[
 		    i + HV_RF_NUM_TX_RESERVED_PAGE_BUFS];
 
 		pb->pfn = atop(segs[i].ds_addr);
 		pb->offset = segs[i].ds_addr & PAGE_MASK;
 		pb->length = segs[i].ds_len;
 	}
 
 	packet->send_buf_section_idx =
 	    NVSP_1_CHIMNEY_SEND_INVALID_SECTION_INDEX;
 	packet->send_buf_section_size = 0;
 done:
 	txd->m = m_head;
 
 	/* Set the completion routine */
 	packet->compl.send.on_send_completion = hn_tx_done;
 	packet->compl.send.send_completion_context = packet;
 	packet->compl.send.send_completion_tid = (uint64_t)(uintptr_t)txd;
 
 	return 0;
 }
 
 /*
  * NOTE:
  * If this function fails, then txd will be freed, but the mbuf
  * associated w/ the txd will _not_ be freed.
  */
 static int
 hn_send_pkt(struct ifnet *ifp, struct hn_tx_ring *txr, struct hn_txdesc *txd)
 {
 	int error, send_failed = 0;
 
 again:
 	/*
 	 * Make sure that txd is not freed before ETHER_BPF_MTAP.
 	 */
 	hn_txdesc_hold(txd);
 	error = hv_nv_on_send(txr->hn_chan, &txd->netvsc_pkt);
 	if (!error) {
 		ETHER_BPF_MTAP(ifp, txd->m);
 		if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1);
 		if (!hn_use_if_start) {
 			if_inc_counter(ifp, IFCOUNTER_OBYTES,
 			    txd->m->m_pkthdr.len);
 			if (txd->m->m_flags & M_MCAST)
 				if_inc_counter(ifp, IFCOUNTER_OMCASTS, 1);
 		}
 		txr->hn_pkts++;
 	}
 	hn_txdesc_put(txr, txd);
 
 	if (__predict_false(error)) {
 		int freed;
 
 		/*
 		 * This should "really rarely" happen.
 		 *
 		 * XXX Too many RX to be acked or too many sideband
 		 * commands to run?  Ask netvsc_channel_rollup()
 		 * to kick start later.
 		 */
 		txr->hn_has_txeof = 1;
 		if (!send_failed) {
 			txr->hn_send_failed++;
 			send_failed = 1;
 			/*
 			 * Try sending again after setting hn_has_txeof,
 			 * in case we missed the last
 			 * netvsc_channel_rollup().
 			 */
 			goto again;
 		}
 		if_printf(ifp, "send failed\n");
 
 		/*
 		 * Caller will perform further processing on the
 		 * associated mbuf, so don't free it in hn_txdesc_put();
 		 * only unload it from the DMA map in hn_txdesc_put(),
 		 * if it was loaded.
 		 */
 		txd->m = NULL;
 		freed = hn_txdesc_put(txr, txd);
 		KASSERT(freed != 0,
 		    ("fail to free txd upon send error"));
 
 		txr->hn_send_failed++;
 	}
 	return error;
 }
 
 /*
  * Start a transmit of one or more packets
  */
 static int
 hn_start_locked(struct hn_tx_ring *txr, int len)
 {
 	struct hn_softc *sc = txr->hn_sc;
 	struct ifnet *ifp = sc->hn_ifp;
 
 	KASSERT(hn_use_if_start,
 	    ("hn_start_locked is called, when if_start is disabled"));
 	KASSERT(txr == &sc->hn_tx_ring[0], ("not the first TX ring"));
 	mtx_assert(&txr->hn_tx_lock, MA_OWNED);
 
 	if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) !=
 	    IFF_DRV_RUNNING)
 		return 0;
 
 	while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) {
 		struct hn_txdesc *txd;
 		struct mbuf *m_head;
 		int error;
 
 		IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head);
 		if (m_head == NULL)
 			break;
 
 		if (len > 0 && m_head->m_pkthdr.len > len) {
 			/*
 			 * This sending could be time consuming; let callers
 			 * dispatch this packet sending (and sending of any
 			 * following up packets) to tx taskqueue.
 			 */
 			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
 			return 1;
 		}
 
 		txd = hn_txdesc_get(txr);
 		if (txd == NULL) {
 			txr->hn_no_txdescs++;
 			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
 			atomic_set_int(&ifp->if_drv_flags, IFF_DRV_OACTIVE);
 			break;
 		}
 
 		error = hn_encap(txr, txd, &m_head);
 		if (error) {
 			/* Both txd and m_head are freed */
 			continue;
 		}
 
 		error = hn_send_pkt(ifp, txr, txd);
 		if (__predict_false(error)) {
 			/* txd is freed, but m_head is not */
 			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
 			atomic_set_int(&ifp->if_drv_flags, IFF_DRV_OACTIVE);
 			break;
 		}
 	}
 	return 0;
 }
 
 /*
  * Link up/down notification
  */
 void
 netvsc_linkstatus_callback(struct hv_device *device_obj, uint32_t status)
 {
 	hn_softc_t *sc = device_get_softc(device_obj->device);
 
 	if (status == 1) {
 		sc->hn_carrier = 1;
 	} else {
 		sc->hn_carrier = 0;
 	}
 }
 
 /*
  * Append the specified data to the indicated mbuf chain.
  * Extend the mbuf chain if the new data does not fit in
  * existing space.
  *
  * This is a minor rewrite of m_append() from sys/kern/uipc_mbuf.c.
  * There should be an equivalent in the kernel mbuf code,
  * but there does not appear to be one yet.
  *
  * Differs from m_append() in that additional mbufs are
  * allocated with cluster size MJUMPAGESIZE, and filled
  * accordingly.
  *
  * Return 1 if able to complete the job; otherwise 0.
  */
 static int
 hv_m_append(struct mbuf *m0, int len, c_caddr_t cp)
 {
 	struct mbuf *m, *n;
 	int remainder, space;
 
 	for (m = m0; m->m_next != NULL; m = m->m_next)
 		;
 	remainder = len;
 	space = M_TRAILINGSPACE(m);
 	if (space > 0) {
 		/*
 		 * Copy into available space.
 		 */
 		if (space > remainder)
 			space = remainder;
 		bcopy(cp, mtod(m, caddr_t) + m->m_len, space);
 		m->m_len += space;
 		cp += space;
 		remainder -= space;
 	}
 	while (remainder > 0) {
 		/*
 		 * Allocate a new mbuf; could check space
 		 * and allocate a cluster instead.
 		 */
 		n = m_getjcl(M_NOWAIT, m->m_type, 0, MJUMPAGESIZE);
 		if (n == NULL)
 			break;
 		n->m_len = min(MJUMPAGESIZE, remainder);
 		bcopy(cp, mtod(n, caddr_t), n->m_len);
 		cp += n->m_len;
 		remainder -= n->m_len;
 		m->m_next = n;
 		m = n;
 	}
 	if (m0->m_flags & M_PKTHDR)
 		m0->m_pkthdr.len += len - remainder;
 
 	return (remainder == 0);
 }
 
 
 /*
  * Called when we receive a data packet from the "wire" on the
  * specified device
  *
  * Note:  This is no longer used as a callback
  */
 int
 netvsc_recv(struct hv_vmbus_channel *chan, netvsc_packet *packet,
     const rndis_tcp_ip_csum_info *csum_info,
     const struct rndis_hash_info *hash_info,
     const struct rndis_hash_value *hash_value)
 {
 	struct hn_rx_ring *rxr = chan->hv_chan_rxr;
 	struct ifnet *ifp = rxr->hn_ifp;
 	struct mbuf *m_new;
 	int size, do_lro = 0, do_csum = 1;
+	int hash_type = M_HASHTYPE_OPAQUE_HASH;
 
 	if (!(ifp->if_drv_flags & IFF_DRV_RUNNING))
 		return (0);
 
 	/*
 	 * Bail out if packet contains more data than configured MTU.
 	 */
 	if (packet->tot_data_buf_len > (ifp->if_mtu + ETHER_HDR_LEN)) {
 		return (0);
 	} else if (packet->tot_data_buf_len <= MHLEN) {
 		m_new = m_gethdr(M_NOWAIT, MT_DATA);
 		if (m_new == NULL) {
 			if_inc_counter(ifp, IFCOUNTER_IQDROPS, 1);
 			return (0);
 		}
 		memcpy(mtod(m_new, void *), packet->data,
 		    packet->tot_data_buf_len);
 		m_new->m_pkthdr.len = m_new->m_len = packet->tot_data_buf_len;
 		rxr->hn_small_pkts++;
 	} else {
 		/*
 		 * Get an mbuf with a cluster.  For packets 2K or less,
 		 * get a standard 2K cluster.  For anything larger, get a
 		 * 4K cluster.  Any buffers larger than 4K can cause problems
 		 * if looped around to the Hyper-V TX channel, so avoid them.
 		 */
 		size = MCLBYTES;
 		if (packet->tot_data_buf_len > MCLBYTES) {
 			/* 4096 */
 			size = MJUMPAGESIZE;
 		}
 
 		m_new = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, size);
 		if (m_new == NULL) {
 			if_inc_counter(ifp, IFCOUNTER_IQDROPS, 1);
 			return (0);
 		}
 
 		hv_m_append(m_new, packet->tot_data_buf_len, packet->data);
 	}
 	m_new->m_pkthdr.rcvif = ifp;
 
 	if (__predict_false((ifp->if_capenable & IFCAP_RXCSUM) == 0))
 		do_csum = 0;
 
 	/* receive side checksum offload */
 	if (csum_info != NULL) {
 		/* IP csum offload */
 		if (csum_info->receive.ip_csum_succeeded && do_csum) {
 			m_new->m_pkthdr.csum_flags |=
 			    (CSUM_IP_CHECKED | CSUM_IP_VALID);
 			rxr->hn_csum_ip++;
 		}
 
 		/* TCP/UDP csum offload */
 		if ((csum_info->receive.tcp_csum_succeeded ||
 		     csum_info->receive.udp_csum_succeeded) && do_csum) {
 			m_new->m_pkthdr.csum_flags |=
 			    (CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
 			m_new->m_pkthdr.csum_data = 0xffff;
 			if (csum_info->receive.tcp_csum_succeeded)
 				rxr->hn_csum_tcp++;
 			else
 				rxr->hn_csum_udp++;
 		}
 
 		if (csum_info->receive.ip_csum_succeeded &&
 		    csum_info->receive.tcp_csum_succeeded)
 			do_lro = 1;
 	} else {
 		const struct ether_header *eh;
 		uint16_t etype;
 		int hoff;
 
 		hoff = sizeof(*eh);
 		if (m_new->m_len < hoff)
 			goto skip;
 		eh = mtod(m_new, struct ether_header *);
 		etype = ntohs(eh->ether_type);
 		if (etype == ETHERTYPE_VLAN) {
 			const struct ether_vlan_header *evl;
 
 			hoff = sizeof(*evl);
 			if (m_new->m_len < hoff)
 				goto skip;
 			evl = mtod(m_new, struct ether_vlan_header *);
 			etype = ntohs(evl->evl_proto);
 		}
 
 		if (etype == ETHERTYPE_IP) {
 			int pr;
 
 			pr = hn_check_iplen(m_new, hoff);
 			if (pr == IPPROTO_TCP) {
 				if (do_csum &&
 				    (rxr->hn_trust_hcsum &
 				     HN_TRUST_HCSUM_TCP)) {
 					rxr->hn_csum_trusted++;
 					m_new->m_pkthdr.csum_flags |=
 					   (CSUM_IP_CHECKED | CSUM_IP_VALID |
 					    CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
 					m_new->m_pkthdr.csum_data = 0xffff;
 				}
 				do_lro = 1;
 			} else if (pr == IPPROTO_UDP) {
 				if (do_csum &&
 				    (rxr->hn_trust_hcsum &
 				     HN_TRUST_HCSUM_UDP)) {
 					rxr->hn_csum_trusted++;
 					m_new->m_pkthdr.csum_flags |=
 					   (CSUM_IP_CHECKED | CSUM_IP_VALID |
 					    CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
 					m_new->m_pkthdr.csum_data = 0xffff;
 				}
 			} else if (pr != IPPROTO_DONE && do_csum &&
 			    (rxr->hn_trust_hcsum & HN_TRUST_HCSUM_IP)) {
 				rxr->hn_csum_trusted++;
 				m_new->m_pkthdr.csum_flags |=
 				    (CSUM_IP_CHECKED | CSUM_IP_VALID);
 			}
 		}
 	}
 skip:
 	if ((packet->vlan_tci != 0) &&
 	    (ifp->if_capenable & IFCAP_VLAN_HWTAGGING) != 0) {
 		m_new->m_pkthdr.ether_vtag = packet->vlan_tci;
 		m_new->m_flags |= M_VLANTAG;
 	}
 
 	if (hash_info != NULL && hash_value != NULL) {
-		int hash_type = M_HASHTYPE_OPAQUE;
-
 		rxr->hn_rss_pkts++;
 		m_new->m_pkthdr.flowid = hash_value->hash_value;
 		if ((hash_info->hash_info & NDIS_HASH_FUNCTION_MASK) ==
 		    NDIS_HASH_FUNCTION_TOEPLITZ) {
 			uint32_t type =
 			    (hash_info->hash_info & NDIS_HASH_TYPE_MASK);
 
 			switch (type) {
 			case NDIS_HASH_IPV4:
 				hash_type = M_HASHTYPE_RSS_IPV4;
 				break;
 
 			case NDIS_HASH_TCP_IPV4:
 				hash_type = M_HASHTYPE_RSS_TCP_IPV4;
 				break;
 
 			case NDIS_HASH_IPV6:
 				hash_type = M_HASHTYPE_RSS_IPV6;
 				break;
 
 			case NDIS_HASH_IPV6_EX:
 				hash_type = M_HASHTYPE_RSS_IPV6_EX;
 				break;
 
 			case NDIS_HASH_TCP_IPV6:
 				hash_type = M_HASHTYPE_RSS_TCP_IPV6;
 				break;
 
 			case NDIS_HASH_TCP_IPV6_EX:
 				hash_type = M_HASHTYPE_RSS_TCP_IPV6_EX;
 				break;
 			}
 		}
-		M_HASHTYPE_SET(m_new, hash_type);
 	} else {
-		if (hash_value != NULL)
+		if (hash_value != NULL) {
 			m_new->m_pkthdr.flowid = hash_value->hash_value;
-		else
+		} else {
 			m_new->m_pkthdr.flowid = rxr->hn_rx_idx;
-		M_HASHTYPE_SET(m_new, M_HASHTYPE_OPAQUE);
+			hash_type = M_HASHTYPE_OPAQUE;
+		}
 	}
+	M_HASHTYPE_SET(m_new, hash_type);
 
 	/*
 	 * Note:  Moved RX completion back to hv_nv_on_receive() so all
 	 * messages (not just data messages) will trigger a response.
 	 */
 
 	if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1);
 	rxr->hn_pkts++;
 
 	if ((ifp->if_capenable & IFCAP_LRO) && do_lro) {
 #if defined(INET) || defined(INET6)
 		struct lro_ctrl *lro = &rxr->hn_lro;
 
 		if (lro->lro_cnt) {
 			rxr->hn_lro_tried++;
 			if (tcp_lro_rx(lro, m_new, 0) == 0) {
 				/* DONE! */
 				return 0;
 			}
 		}
 #endif
 	}
 
 	/* We're not holding the lock here, so don't release it */
 	(*ifp->if_input)(ifp, m_new);
 
 	return (0);
 }
 
 /*
  * Rules for using sc->temp_unusable:
  * 1.  sc->temp_unusable can only be read or written while holding NV_LOCK()
  * 2.  code reading sc->temp_unusable under NV_LOCK(), and finding 
  *     sc->temp_unusable set, must release NV_LOCK() and exit
  * 3.  to retain exclusive control of the interface,
  *     sc->temp_unusable must be set by code before releasing NV_LOCK()
  * 4.  only code setting sc->temp_unusable can clear sc->temp_unusable
  * 5.  code setting sc->temp_unusable must eventually clear sc->temp_unusable
  */
 
 /*
  * Standard ioctl entry point.  Called when the user wants to configure
  * the interface.
  */
 static int
 hn_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data)
 {
 	hn_softc_t *sc = ifp->if_softc;
 	struct ifreq *ifr = (struct ifreq *)data;
 #ifdef INET
 	struct ifaddr *ifa = (struct ifaddr *)data;
 #endif
 	netvsc_device_info device_info;
 	struct hv_device *hn_dev;
 	int mask, error = 0;
 	int retry_cnt = 500;
 	
 	switch(cmd) {
 
 	case SIOCSIFADDR:
 #ifdef INET
 		if (ifa->ifa_addr->sa_family == AF_INET) {
 			ifp->if_flags |= IFF_UP;
 			if (!(ifp->if_drv_flags & IFF_DRV_RUNNING))
 				hn_ifinit(sc);
 			arp_ifinit(ifp, ifa);
 		} else
 #endif
 		error = ether_ioctl(ifp, cmd, data);
 		break;
 	case SIOCSIFMTU:
 		hn_dev = vmbus_get_devctx(sc->hn_dev);
 
 		/* Check MTU value change */
 		if (ifp->if_mtu == ifr->ifr_mtu)
 			break;
 
 		if (ifr->ifr_mtu > NETVSC_MAX_CONFIGURABLE_MTU) {
 			error = EINVAL;
 			break;
 		}
 
 		/* Obtain and record requested MTU */
 		ifp->if_mtu = ifr->ifr_mtu;
 
 #if __FreeBSD_version >= 1100099
 		/*
 		 * Make sure that LRO aggregation length limit is still
 		 * valid, after the MTU change.
 		 */
 		NV_LOCK(sc);
 		if (sc->hn_rx_ring[0].hn_lro.lro_length_lim <
 		    HN_LRO_LENLIM_MIN(ifp))
 			hn_set_lro_lenlim(sc, HN_LRO_LENLIM_MIN(ifp));
 		NV_UNLOCK(sc);
 #endif
 
 		do {
 			NV_LOCK(sc);
 			if (!sc->temp_unusable) {
 				sc->temp_unusable = TRUE;
 				retry_cnt = -1;
 			}
 			NV_UNLOCK(sc);
 			if (retry_cnt > 0) {
 				retry_cnt--;
 				DELAY(5 * 1000);
 			}
 		} while (retry_cnt > 0);
 
 		if (retry_cnt == 0) {
 			error = EINVAL;
 			break;
 		}
 
 		/* We must remove and add back the device to cause the new
 		 * MTU to take effect.  This includes tearing down, but not
 		 * deleting the channel, then bringing it back up.
 		 */
 		error = hv_rf_on_device_remove(hn_dev, HV_RF_NV_RETAIN_CHANNEL);
 		if (error) {
 			NV_LOCK(sc);
 			sc->temp_unusable = FALSE;
 			NV_UNLOCK(sc);
 			break;
 		}
 		error = hv_rf_on_device_add(hn_dev, &device_info,
 		    sc->hn_rx_ring_inuse);
 		if (error) {
 			NV_LOCK(sc);
 			sc->temp_unusable = FALSE;
 			NV_UNLOCK(sc);
 			break;
 		}
 
 		sc->hn_tx_chimney_max = sc->net_dev->send_section_size;
 		if (sc->hn_tx_ring[0].hn_tx_chimney_size >
 		    sc->hn_tx_chimney_max)
 			hn_set_tx_chimney_size(sc, sc->hn_tx_chimney_max);
 
 		hn_ifinit_locked(sc);
 
 		NV_LOCK(sc);
 		sc->temp_unusable = FALSE;
 		NV_UNLOCK(sc);
 		break;
 	case SIOCSIFFLAGS:
 		do {
                        NV_LOCK(sc);
                        if (!sc->temp_unusable) {
                                sc->temp_unusable = TRUE;
                                retry_cnt = -1;
                        }
                        NV_UNLOCK(sc);
                        if (retry_cnt > 0) {
                       	        retry_cnt--;
                         	DELAY(5 * 1000);
                        }
                 } while (retry_cnt > 0);
 
                 if (retry_cnt == 0) {
                        error = EINVAL;
                        break;
                 }
 
 		if (ifp->if_flags & IFF_UP) {
 			/*
 			 * If only the state of the PROMISC flag changed,
 			 * then just use the 'set promisc mode' command
 			 * instead of reinitializing the entire NIC. Doing
 			 * a full re-init means reloading the firmware and
 			 * waiting for it to start up, which may take a
 			 * second or two.
 			 */
 #ifdef notyet
 			/* Fixme:  Promiscuous mode? */
 			if (ifp->if_drv_flags & IFF_DRV_RUNNING &&
 			    ifp->if_flags & IFF_PROMISC &&
 			    !(sc->hn_if_flags & IFF_PROMISC)) {
 				/* do something here for Hyper-V */
 			} else if (ifp->if_drv_flags & IFF_DRV_RUNNING &&
 			    !(ifp->if_flags & IFF_PROMISC) &&
 			    sc->hn_if_flags & IFF_PROMISC) {
 				/* do something here for Hyper-V */
 			} else
 #endif
 				hn_ifinit_locked(sc);
 		} else {
 			if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
 				hn_stop(sc);
 			}
 		}
 		NV_LOCK(sc);
 		sc->temp_unusable = FALSE;
 		NV_UNLOCK(sc);
 		sc->hn_if_flags = ifp->if_flags;
 		error = 0;
 		break;
 	case SIOCSIFCAP:
 		NV_LOCK(sc);
 
 		mask = ifr->ifr_reqcap ^ ifp->if_capenable;
 		if (mask & IFCAP_TXCSUM) {
 			ifp->if_capenable ^= IFCAP_TXCSUM;
 			if (ifp->if_capenable & IFCAP_TXCSUM) {
 				ifp->if_hwassist |=
 				    sc->hn_tx_ring[0].hn_csum_assist;
 			} else {
 				ifp->if_hwassist &=
 				    ~sc->hn_tx_ring[0].hn_csum_assist;
 			}
 		}
 
 		if (mask & IFCAP_RXCSUM)
 			ifp->if_capenable ^= IFCAP_RXCSUM;
 
 		if (mask & IFCAP_LRO)
 			ifp->if_capenable ^= IFCAP_LRO;
 
 		if (mask & IFCAP_TSO4) {
 			ifp->if_capenable ^= IFCAP_TSO4;
 			if (ifp->if_capenable & IFCAP_TSO4)
 				ifp->if_hwassist |= CSUM_IP_TSO;
 			else
 				ifp->if_hwassist &= ~CSUM_IP_TSO;
 		}
 
 		if (mask & IFCAP_TSO6) {
 			ifp->if_capenable ^= IFCAP_TSO6;
 			if (ifp->if_capenable & IFCAP_TSO6)
 				ifp->if_hwassist |= CSUM_IP6_TSO;
 			else
 				ifp->if_hwassist &= ~CSUM_IP6_TSO;
 		}
 
 		NV_UNLOCK(sc);
 		error = 0;
 		break;
 	case SIOCADDMULTI:
 	case SIOCDELMULTI:
 #ifdef notyet
 		/* Fixme:  Multicast mode? */
 		if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
 			NV_LOCK(sc);
 			netvsc_setmulti(sc);
 			NV_UNLOCK(sc);
 			error = 0;
 		}
 #endif
 		error = EINVAL;
 		break;
 	case SIOCSIFMEDIA:
 	case SIOCGIFMEDIA:
 		error = ifmedia_ioctl(ifp, ifr, &sc->hn_media, cmd);
 		break;
 	default:
 		error = ether_ioctl(ifp, cmd, data);
 		break;
 	}
 
 	return (error);
 }
 
 /*
  *
  */
 static void
 hn_stop(hn_softc_t *sc)
 {
 	struct ifnet *ifp;
 	int ret, i;
 	struct hv_device *device_ctx = vmbus_get_devctx(sc->hn_dev);
 
 	ifp = sc->hn_ifp;
 
 	if (bootverbose)
 		printf(" Closing Device ...\n");
 
 	atomic_clear_int(&ifp->if_drv_flags,
 	    (IFF_DRV_RUNNING | IFF_DRV_OACTIVE));
 	for (i = 0; i < sc->hn_tx_ring_inuse; ++i)
 		sc->hn_tx_ring[i].hn_oactive = 0;
 
 	if_link_state_change(ifp, LINK_STATE_DOWN);
 	sc->hn_initdone = 0;
 
 	ret = hv_rf_on_close(device_ctx);
 }
 
 /*
  * FreeBSD transmit entry point
  */
 static void
 hn_start(struct ifnet *ifp)
 {
 	struct hn_softc *sc = ifp->if_softc;
 	struct hn_tx_ring *txr = &sc->hn_tx_ring[0];
 
 	if (txr->hn_sched_tx)
 		goto do_sched;
 
 	if (mtx_trylock(&txr->hn_tx_lock)) {
 		int sched;
 
 		sched = hn_start_locked(txr, txr->hn_direct_tx_size);
 		mtx_unlock(&txr->hn_tx_lock);
 		if (!sched)
 			return;
 	}
 do_sched:
 	taskqueue_enqueue(txr->hn_tx_taskq, &txr->hn_tx_task);
 }
 
 static void
 hn_start_txeof(struct hn_tx_ring *txr)
 {
 	struct hn_softc *sc = txr->hn_sc;
 	struct ifnet *ifp = sc->hn_ifp;
 
 	KASSERT(txr == &sc->hn_tx_ring[0], ("not the first TX ring"));
 
 	if (txr->hn_sched_tx)
 		goto do_sched;
 
 	if (mtx_trylock(&txr->hn_tx_lock)) {
 		int sched;
 
 		atomic_clear_int(&ifp->if_drv_flags, IFF_DRV_OACTIVE);
 		sched = hn_start_locked(txr, txr->hn_direct_tx_size);
 		mtx_unlock(&txr->hn_tx_lock);
 		if (sched) {
 			taskqueue_enqueue(txr->hn_tx_taskq,
 			    &txr->hn_tx_task);
 		}
 	} else {
 do_sched:
 		/*
 		 * Release the OACTIVE earlier, with the hope, that
 		 * others could catch up.  The task will clear the
 		 * flag again with the hn_tx_lock to avoid possible
 		 * races.
 		 */
 		atomic_clear_int(&ifp->if_drv_flags, IFF_DRV_OACTIVE);
 		taskqueue_enqueue(txr->hn_tx_taskq, &txr->hn_txeof_task);
 	}
 }
 
 /*
  *
  */
 static void
 hn_ifinit_locked(hn_softc_t *sc)
 {
 	struct ifnet *ifp;
 	struct hv_device *device_ctx = vmbus_get_devctx(sc->hn_dev);
 	int ret, i;
 
 	ifp = sc->hn_ifp;
 
 	if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
 		return;
 	}
 
 	hv_promisc_mode = 1;
 
 	ret = hv_rf_on_open(device_ctx);
 	if (ret != 0) {
 		return;
 	} else {
 		sc->hn_initdone = 1;
 	}
 
 	atomic_clear_int(&ifp->if_drv_flags, IFF_DRV_OACTIVE);
 	for (i = 0; i < sc->hn_tx_ring_inuse; ++i)
 		sc->hn_tx_ring[i].hn_oactive = 0;
 
 	atomic_set_int(&ifp->if_drv_flags, IFF_DRV_RUNNING);
 	if_link_state_change(ifp, LINK_STATE_UP);
 }
 
 /*
  *
  */
 static void
 hn_ifinit(void *xsc)
 {
 	hn_softc_t *sc = xsc;
 
 	NV_LOCK(sc);
 	if (sc->temp_unusable) {
 		NV_UNLOCK(sc);
 		return;
 	}
 	sc->temp_unusable = TRUE;
 	NV_UNLOCK(sc);
 
 	hn_ifinit_locked(sc);
 
 	NV_LOCK(sc);
 	sc->temp_unusable = FALSE;
 	NV_UNLOCK(sc);
 }
 
 #ifdef LATER
 /*
  *
  */
 static void
 hn_watchdog(struct ifnet *ifp)
 {
 	hn_softc_t *sc;
 	sc = ifp->if_softc;
 
 	printf("hn%d: watchdog timeout -- resetting\n", sc->hn_unit);
 	hn_ifinit(sc);    /*???*/
 	if_inc_counter(ifp, IFCOUNTER_OERRORS, 1);
 }
 #endif
 
 #if __FreeBSD_version >= 1100099
 
 static int
 hn_lro_lenlim_sysctl(SYSCTL_HANDLER_ARGS)
 {
 	struct hn_softc *sc = arg1;
 	unsigned int lenlim;
 	int error;
 
 	lenlim = sc->hn_rx_ring[0].hn_lro.lro_length_lim;
 	error = sysctl_handle_int(oidp, &lenlim, 0, req);
 	if (error || req->newptr == NULL)
 		return error;
 
 	if (lenlim < HN_LRO_LENLIM_MIN(sc->hn_ifp) ||
 	    lenlim > TCP_LRO_LENGTH_MAX)
 		return EINVAL;
 
 	NV_LOCK(sc);
 	hn_set_lro_lenlim(sc, lenlim);
 	NV_UNLOCK(sc);
 	return 0;
 }
 
 static int
 hn_lro_ackcnt_sysctl(SYSCTL_HANDLER_ARGS)
 {
 	struct hn_softc *sc = arg1;
 	int ackcnt, error, i;
 
 	/*
 	 * lro_ackcnt_lim is append count limit,
 	 * +1 to turn it into aggregation limit.
 	 */
 	ackcnt = sc->hn_rx_ring[0].hn_lro.lro_ackcnt_lim + 1;
 	error = sysctl_handle_int(oidp, &ackcnt, 0, req);
 	if (error || req->newptr == NULL)
 		return error;
 
 	if (ackcnt < 2 || ackcnt > (TCP_LRO_ACKCNT_MAX + 1))
 		return EINVAL;
 
 	/*
 	 * Convert aggregation limit back to append
 	 * count limit.
 	 */
 	--ackcnt;
 	NV_LOCK(sc);
 	for (i = 0; i < sc->hn_rx_ring_inuse; ++i)
 		sc->hn_rx_ring[i].hn_lro.lro_ackcnt_lim = ackcnt;
 	NV_UNLOCK(sc);
 	return 0;
 }
 
 #endif
 
 static int
 hn_trust_hcsum_sysctl(SYSCTL_HANDLER_ARGS)
 {
 	struct hn_softc *sc = arg1;
 	int hcsum = arg2;
 	int on, error, i;
 
 	on = 0;
 	if (sc->hn_rx_ring[0].hn_trust_hcsum & hcsum)
 		on = 1;
 
 	error = sysctl_handle_int(oidp, &on, 0, req);
 	if (error || req->newptr == NULL)
 		return error;
 
 	NV_LOCK(sc);
 	for (i = 0; i < sc->hn_rx_ring_inuse; ++i) {
 		struct hn_rx_ring *rxr = &sc->hn_rx_ring[i];
 
 		if (on)
 			rxr->hn_trust_hcsum |= hcsum;
 		else
 			rxr->hn_trust_hcsum &= ~hcsum;
 	}
 	NV_UNLOCK(sc);
 	return 0;
 }
 
 static int
 hn_tx_chimney_size_sysctl(SYSCTL_HANDLER_ARGS)
 {
 	struct hn_softc *sc = arg1;
 	int chimney_size, error;
 
 	chimney_size = sc->hn_tx_ring[0].hn_tx_chimney_size;
 	error = sysctl_handle_int(oidp, &chimney_size, 0, req);
 	if (error || req->newptr == NULL)
 		return error;
 
 	if (chimney_size > sc->hn_tx_chimney_max || chimney_size <= 0)
 		return EINVAL;
 
 	hn_set_tx_chimney_size(sc, chimney_size);
 	return 0;
 }
 
 static int
 hn_rx_stat_ulong_sysctl(SYSCTL_HANDLER_ARGS)
 {
 	struct hn_softc *sc = arg1;
 	int ofs = arg2, i, error;
 	struct hn_rx_ring *rxr;
 	u_long stat;
 
 	stat = 0;
 	for (i = 0; i < sc->hn_rx_ring_inuse; ++i) {
 		rxr = &sc->hn_rx_ring[i];
 		stat += *((u_long *)((uint8_t *)rxr + ofs));
 	}
 
 	error = sysctl_handle_long(oidp, &stat, 0, req);
 	if (error || req->newptr == NULL)
 		return error;
 
 	/* Zero out this stat. */
 	for (i = 0; i < sc->hn_rx_ring_inuse; ++i) {
 		rxr = &sc->hn_rx_ring[i];
 		*((u_long *)((uint8_t *)rxr + ofs)) = 0;
 	}
 	return 0;
 }
 
 static int
 hn_rx_stat_u64_sysctl(SYSCTL_HANDLER_ARGS)
 {
 	struct hn_softc *sc = arg1;
 	int ofs = arg2, i, error;
 	struct hn_rx_ring *rxr;
 	uint64_t stat;
 
 	stat = 0;
 	for (i = 0; i < sc->hn_rx_ring_inuse; ++i) {
 		rxr = &sc->hn_rx_ring[i];
 		stat += *((uint64_t *)((uint8_t *)rxr + ofs));
 	}
 
 	error = sysctl_handle_64(oidp, &stat, 0, req);
 	if (error || req->newptr == NULL)
 		return error;
 
 	/* Zero out this stat. */
 	for (i = 0; i < sc->hn_rx_ring_inuse; ++i) {
 		rxr = &sc->hn_rx_ring[i];
 		*((uint64_t *)((uint8_t *)rxr + ofs)) = 0;
 	}
 	return 0;
 }
 
 static int
 hn_tx_stat_ulong_sysctl(SYSCTL_HANDLER_ARGS)
 {
 	struct hn_softc *sc = arg1;
 	int ofs = arg2, i, error;
 	struct hn_tx_ring *txr;
 	u_long stat;
 
 	stat = 0;
 	for (i = 0; i < sc->hn_tx_ring_inuse; ++i) {
 		txr = &sc->hn_tx_ring[i];
 		stat += *((u_long *)((uint8_t *)txr + ofs));
 	}
 
 	error = sysctl_handle_long(oidp, &stat, 0, req);
 	if (error || req->newptr == NULL)
 		return error;
 
 	/* Zero out this stat. */
 	for (i = 0; i < sc->hn_tx_ring_inuse; ++i) {
 		txr = &sc->hn_tx_ring[i];
 		*((u_long *)((uint8_t *)txr + ofs)) = 0;
 	}
 	return 0;
 }
 
 static int
 hn_tx_conf_int_sysctl(SYSCTL_HANDLER_ARGS)
 {
 	struct hn_softc *sc = arg1;
 	int ofs = arg2, i, error, conf;
 	struct hn_tx_ring *txr;
 
 	txr = &sc->hn_tx_ring[0];
 	conf = *((int *)((uint8_t *)txr + ofs));
 
 	error = sysctl_handle_int(oidp, &conf, 0, req);
 	if (error || req->newptr == NULL)
 		return error;
 
 	NV_LOCK(sc);
 	for (i = 0; i < sc->hn_tx_ring_inuse; ++i) {
 		txr = &sc->hn_tx_ring[i];
 		*((int *)((uint8_t *)txr + ofs)) = conf;
 	}
 	NV_UNLOCK(sc);
 
 	return 0;
 }
 
 static int
 hn_check_iplen(const struct mbuf *m, int hoff)
 {
 	const struct ip *ip;
 	int len, iphlen, iplen;
 	const struct tcphdr *th;
 	int thoff;				/* TCP data offset */
 
 	len = hoff + sizeof(struct ip);
 
 	/* The packet must be at least the size of an IP header. */
 	if (m->m_pkthdr.len < len)
 		return IPPROTO_DONE;
 
 	/* The fixed IP header must reside completely in the first mbuf. */
 	if (m->m_len < len)
 		return IPPROTO_DONE;
 
 	ip = mtodo(m, hoff);
 
 	/* Bound check the packet's stated IP header length. */
 	iphlen = ip->ip_hl << 2;
 	if (iphlen < sizeof(struct ip))		/* minimum header length */
 		return IPPROTO_DONE;
 
 	/* The full IP header must reside completely in the one mbuf. */
 	if (m->m_len < hoff + iphlen)
 		return IPPROTO_DONE;
 
 	iplen = ntohs(ip->ip_len);
 
 	/*
 	 * Check that the amount of data in the buffers is at
 	 * least as much as the IP header would have us expect.
 	 */
 	if (m->m_pkthdr.len < hoff + iplen)
 		return IPPROTO_DONE;
 
 	/*
 	 * Ignore IP fragments.
 	 */
 	if (ntohs(ip->ip_off) & (IP_OFFMASK | IP_MF))
 		return IPPROTO_DONE;
 
 	/*
 	 * The TCP/IP or UDP/IP header must be entirely contained within
 	 * the first fragment of a packet.
 	 */
 	switch (ip->ip_p) {
 	case IPPROTO_TCP:
 		if (iplen < iphlen + sizeof(struct tcphdr))
 			return IPPROTO_DONE;
 		if (m->m_len < hoff + iphlen + sizeof(struct tcphdr))
 			return IPPROTO_DONE;
 		th = (const struct tcphdr *)((const uint8_t *)ip + iphlen);
 		thoff = th->th_off << 2;
 		if (thoff < sizeof(struct tcphdr) || thoff + iphlen > iplen)
 			return IPPROTO_DONE;
 		if (m->m_len < hoff + iphlen + thoff)
 			return IPPROTO_DONE;
 		break;
 	case IPPROTO_UDP:
 		if (iplen < iphlen + sizeof(struct udphdr))
 			return IPPROTO_DONE;
 		if (m->m_len < hoff + iphlen + sizeof(struct udphdr))
 			return IPPROTO_DONE;
 		break;
 	default:
 		if (iplen < iphlen)
 			return IPPROTO_DONE;
 		break;
 	}
 	return ip->ip_p;
 }
 
 static void
 hn_create_rx_data(struct hn_softc *sc, int ring_cnt)
 {
 	struct sysctl_oid_list *child;
 	struct sysctl_ctx_list *ctx;
 	device_t dev = sc->hn_dev;
 #if defined(INET) || defined(INET6)
 #if __FreeBSD_version >= 1100095
 	int lroent_cnt;
 #endif
 #endif
 	int i;
 
 	sc->hn_rx_ring_cnt = ring_cnt;
 	sc->hn_rx_ring_inuse = sc->hn_rx_ring_cnt;
 
 	sc->hn_rx_ring = malloc(sizeof(struct hn_rx_ring) * sc->hn_rx_ring_cnt,
 	    M_NETVSC, M_WAITOK | M_ZERO);
 
 #if defined(INET) || defined(INET6)
 #if __FreeBSD_version >= 1100095
 	lroent_cnt = hn_lro_entry_count;
 	if (lroent_cnt < TCP_LRO_ENTRIES)
 		lroent_cnt = TCP_LRO_ENTRIES;
 	device_printf(dev, "LRO: entry count %d\n", lroent_cnt);
 #endif
 #endif	/* INET || INET6 */
 
 	ctx = device_get_sysctl_ctx(dev);
 	child = SYSCTL_CHILDREN(device_get_sysctl_tree(dev));
 
 	/* Create dev.hn.UNIT.rx sysctl tree */
 	sc->hn_rx_sysctl_tree = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "rx",
 	    CTLFLAG_RD | CTLFLAG_MPSAFE, 0, "");
 
 	for (i = 0; i < sc->hn_rx_ring_cnt; ++i) {
 		struct hn_rx_ring *rxr = &sc->hn_rx_ring[i];
 
 		if (hn_trust_hosttcp)
 			rxr->hn_trust_hcsum |= HN_TRUST_HCSUM_TCP;
 		if (hn_trust_hostudp)
 			rxr->hn_trust_hcsum |= HN_TRUST_HCSUM_UDP;
 		if (hn_trust_hostip)
 			rxr->hn_trust_hcsum |= HN_TRUST_HCSUM_IP;
 		rxr->hn_ifp = sc->hn_ifp;
 		rxr->hn_rx_idx = i;
 
 		/*
 		 * Initialize LRO.
 		 */
 #if defined(INET) || defined(INET6)
 #if __FreeBSD_version >= 1100095
 		tcp_lro_init_args(&rxr->hn_lro, sc->hn_ifp, lroent_cnt, 0);
 #else
 		tcp_lro_init(&rxr->hn_lro);
 		rxr->hn_lro.ifp = sc->hn_ifp;
 #endif
 #if __FreeBSD_version >= 1100099
 		rxr->hn_lro.lro_length_lim = HN_LRO_LENLIM_DEF;
 		rxr->hn_lro.lro_ackcnt_lim = HN_LRO_ACKCNT_DEF;
 #endif
 #endif	/* INET || INET6 */
 
 		if (sc->hn_rx_sysctl_tree != NULL) {
 			char name[16];
 
 			/*
 			 * Create per RX ring sysctl tree:
 			 * dev.hn.UNIT.rx.RINGID
 			 */
 			snprintf(name, sizeof(name), "%d", i);
 			rxr->hn_rx_sysctl_tree = SYSCTL_ADD_NODE(ctx,
 			    SYSCTL_CHILDREN(sc->hn_rx_sysctl_tree),
 			    OID_AUTO, name, CTLFLAG_RD | CTLFLAG_MPSAFE, 0, "");
 
 			if (rxr->hn_rx_sysctl_tree != NULL) {
 				SYSCTL_ADD_ULONG(ctx,
 				    SYSCTL_CHILDREN(rxr->hn_rx_sysctl_tree),
 				    OID_AUTO, "packets", CTLFLAG_RW,
 				    &rxr->hn_pkts, "# of packets received");
 				SYSCTL_ADD_ULONG(ctx,
 				    SYSCTL_CHILDREN(rxr->hn_rx_sysctl_tree),
 				    OID_AUTO, "rss_pkts", CTLFLAG_RW,
 				    &rxr->hn_rss_pkts,
 				    "# of packets w/ RSS info received");
 			}
 		}
 	}
 
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "lro_queued",
 	    CTLTYPE_U64 | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
 	    __offsetof(struct hn_rx_ring, hn_lro.lro_queued),
 	    hn_rx_stat_u64_sysctl, "LU", "LRO queued");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "lro_flushed",
 	    CTLTYPE_U64 | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
 	    __offsetof(struct hn_rx_ring, hn_lro.lro_flushed),
 	    hn_rx_stat_u64_sysctl, "LU", "LRO flushed");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "lro_tried",
 	    CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
 	    __offsetof(struct hn_rx_ring, hn_lro_tried),
 	    hn_rx_stat_ulong_sysctl, "LU", "# of LRO tries");
 #if __FreeBSD_version >= 1100099
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "lro_length_lim",
 	    CTLTYPE_UINT | CTLFLAG_RW | CTLFLAG_MPSAFE, sc, 0,
 	    hn_lro_lenlim_sysctl, "IU",
 	    "Max # of data bytes to be aggregated by LRO");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "lro_ackcnt_lim",
 	    CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, sc, 0,
 	    hn_lro_ackcnt_sysctl, "I",
 	    "Max # of ACKs to be aggregated by LRO");
 #endif
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "trust_hosttcp",
 	    CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, sc, HN_TRUST_HCSUM_TCP,
 	    hn_trust_hcsum_sysctl, "I",
 	    "Trust tcp segment verification on host side, "
 	    "when csum info is missing");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "trust_hostudp",
 	    CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, sc, HN_TRUST_HCSUM_UDP,
 	    hn_trust_hcsum_sysctl, "I",
 	    "Trust udp datagram verification on host side, "
 	    "when csum info is missing");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "trust_hostip",
 	    CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, sc, HN_TRUST_HCSUM_IP,
 	    hn_trust_hcsum_sysctl, "I",
 	    "Trust ip packet verification on host side, "
 	    "when csum info is missing");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "csum_ip",
 	    CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
 	    __offsetof(struct hn_rx_ring, hn_csum_ip),
 	    hn_rx_stat_ulong_sysctl, "LU", "RXCSUM IP");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "csum_tcp",
 	    CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
 	    __offsetof(struct hn_rx_ring, hn_csum_tcp),
 	    hn_rx_stat_ulong_sysctl, "LU", "RXCSUM TCP");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "csum_udp",
 	    CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
 	    __offsetof(struct hn_rx_ring, hn_csum_udp),
 	    hn_rx_stat_ulong_sysctl, "LU", "RXCSUM UDP");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "csum_trusted",
 	    CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
 	    __offsetof(struct hn_rx_ring, hn_csum_trusted),
 	    hn_rx_stat_ulong_sysctl, "LU",
 	    "# of packets that we trust host's csum verification");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "small_pkts",
 	    CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
 	    __offsetof(struct hn_rx_ring, hn_small_pkts),
 	    hn_rx_stat_ulong_sysctl, "LU", "# of small packets received");
 	SYSCTL_ADD_INT(ctx, child, OID_AUTO, "rx_ring_cnt",
 	    CTLFLAG_RD, &sc->hn_rx_ring_cnt, 0, "# created RX rings");
 	SYSCTL_ADD_INT(ctx, child, OID_AUTO, "rx_ring_inuse",
 	    CTLFLAG_RD, &sc->hn_rx_ring_inuse, 0, "# used RX rings");
 }
 
 static void
 hn_destroy_rx_data(struct hn_softc *sc)
 {
 #if defined(INET) || defined(INET6)
 	int i;
 #endif
 
 	if (sc->hn_rx_ring_cnt == 0)
 		return;
 
 #if defined(INET) || defined(INET6)
 	for (i = 0; i < sc->hn_rx_ring_cnt; ++i)
 		tcp_lro_free(&sc->hn_rx_ring[i].hn_lro);
 #endif
 	free(sc->hn_rx_ring, M_NETVSC);
 	sc->hn_rx_ring = NULL;
 
 	sc->hn_rx_ring_cnt = 0;
 	sc->hn_rx_ring_inuse = 0;
 }
 
 static int
 hn_create_tx_ring(struct hn_softc *sc, int id)
 {
 	struct hn_tx_ring *txr = &sc->hn_tx_ring[id];
 	bus_dma_tag_t parent_dtag;
 	int error, i;
 
 	txr->hn_sc = sc;
 	txr->hn_tx_idx = id;
 
 #ifndef HN_USE_TXDESC_BUFRING
 	mtx_init(&txr->hn_txlist_spin, "hn txlist", NULL, MTX_SPIN);
 #endif
 	mtx_init(&txr->hn_tx_lock, "hn tx", NULL, MTX_DEF);
 
 	txr->hn_txdesc_cnt = HN_TX_DESC_CNT;
 	txr->hn_txdesc = malloc(sizeof(struct hn_txdesc) * txr->hn_txdesc_cnt,
 	    M_NETVSC, M_WAITOK | M_ZERO);
 #ifndef HN_USE_TXDESC_BUFRING
 	SLIST_INIT(&txr->hn_txlist);
 #else
 	txr->hn_txdesc_br = buf_ring_alloc(txr->hn_txdesc_cnt, M_NETVSC,
 	    M_WAITOK, &txr->hn_tx_lock);
 #endif
 
 	txr->hn_tx_taskq = sc->hn_tx_taskq;
 
 	if (hn_use_if_start) {
 		txr->hn_txeof = hn_start_txeof;
 		TASK_INIT(&txr->hn_tx_task, 0, hn_start_taskfunc, txr);
 		TASK_INIT(&txr->hn_txeof_task, 0, hn_start_txeof_taskfunc, txr);
 	} else {
 		int br_depth;
 
 		txr->hn_txeof = hn_xmit_txeof;
 		TASK_INIT(&txr->hn_tx_task, 0, hn_xmit_taskfunc, txr);
 		TASK_INIT(&txr->hn_txeof_task, 0, hn_xmit_txeof_taskfunc, txr);
 
 		br_depth = hn_get_txswq_depth(txr);
 		txr->hn_mbuf_br = buf_ring_alloc(br_depth, M_NETVSC,
 		    M_WAITOK, &txr->hn_tx_lock);
 	}
 
 	txr->hn_direct_tx_size = hn_direct_tx_size;
 	if (hv_vmbus_protocal_version >= HV_VMBUS_VERSION_WIN8_1)
 		txr->hn_csum_assist = HN_CSUM_ASSIST;
 	else
 		txr->hn_csum_assist = HN_CSUM_ASSIST_WIN8;
 
 	/*
 	 * Always schedule transmission instead of trying to do direct
 	 * transmission.  This one gives the best performance so far.
 	 */
 	txr->hn_sched_tx = 1;
 
 	parent_dtag = bus_get_dma_tag(sc->hn_dev);
 
 	/* DMA tag for RNDIS messages. */
 	error = bus_dma_tag_create(parent_dtag, /* parent */
 	    HN_RNDIS_MSG_ALIGN,		/* alignment */
 	    HN_RNDIS_MSG_BOUNDARY,	/* boundary */
 	    BUS_SPACE_MAXADDR,		/* lowaddr */
 	    BUS_SPACE_MAXADDR,		/* highaddr */
 	    NULL, NULL,			/* filter, filterarg */
 	    HN_RNDIS_MSG_LEN,		/* maxsize */
 	    1,				/* nsegments */
 	    HN_RNDIS_MSG_LEN,		/* maxsegsize */
 	    0,				/* flags */
 	    NULL,			/* lockfunc */
 	    NULL,			/* lockfuncarg */
 	    &txr->hn_tx_rndis_dtag);
 	if (error) {
 		device_printf(sc->hn_dev, "failed to create rndis dmatag\n");
 		return error;
 	}
 
 	/* DMA tag for data. */
 	error = bus_dma_tag_create(parent_dtag, /* parent */
 	    1,				/* alignment */
 	    HN_TX_DATA_BOUNDARY,	/* boundary */
 	    BUS_SPACE_MAXADDR,		/* lowaddr */
 	    BUS_SPACE_MAXADDR,		/* highaddr */
 	    NULL, NULL,			/* filter, filterarg */
 	    HN_TX_DATA_MAXSIZE,		/* maxsize */
 	    HN_TX_DATA_SEGCNT_MAX,	/* nsegments */
 	    HN_TX_DATA_SEGSIZE,		/* maxsegsize */
 	    0,				/* flags */
 	    NULL,			/* lockfunc */
 	    NULL,			/* lockfuncarg */
 	    &txr->hn_tx_data_dtag);
 	if (error) {
 		device_printf(sc->hn_dev, "failed to create data dmatag\n");
 		return error;
 	}
 
 	for (i = 0; i < txr->hn_txdesc_cnt; ++i) {
 		struct hn_txdesc *txd = &txr->hn_txdesc[i];
 
 		txd->txr = txr;
 
 		/*
 		 * Allocate and load RNDIS messages.
 		 */
 		error = bus_dmamem_alloc(txr->hn_tx_rndis_dtag,
 		    (void **)&txd->rndis_msg,
 		    BUS_DMA_WAITOK | BUS_DMA_COHERENT,
 		    &txd->rndis_msg_dmap);
 		if (error) {
 			device_printf(sc->hn_dev,
 			    "failed to allocate rndis_msg, %d\n", i);
 			return error;
 		}
 
 		error = bus_dmamap_load(txr->hn_tx_rndis_dtag,
 		    txd->rndis_msg_dmap,
 		    txd->rndis_msg, HN_RNDIS_MSG_LEN,
 		    hyperv_dma_map_paddr, &txd->rndis_msg_paddr,
 		    BUS_DMA_NOWAIT);
 		if (error) {
 			device_printf(sc->hn_dev,
 			    "failed to load rndis_msg, %d\n", i);
 			bus_dmamem_free(txr->hn_tx_rndis_dtag,
 			    txd->rndis_msg, txd->rndis_msg_dmap);
 			return error;
 		}
 
 		/* DMA map for TX data. */
 		error = bus_dmamap_create(txr->hn_tx_data_dtag, 0,
 		    &txd->data_dmap);
 		if (error) {
 			device_printf(sc->hn_dev,
 			    "failed to allocate tx data dmamap\n");
 			bus_dmamap_unload(txr->hn_tx_rndis_dtag,
 			    txd->rndis_msg_dmap);
 			bus_dmamem_free(txr->hn_tx_rndis_dtag,
 			    txd->rndis_msg, txd->rndis_msg_dmap);
 			return error;
 		}
 
 		/* All set, put it to list */
 		txd->flags |= HN_TXD_FLAG_ONLIST;
 #ifndef HN_USE_TXDESC_BUFRING
 		SLIST_INSERT_HEAD(&txr->hn_txlist, txd, link);
 #else
 		buf_ring_enqueue(txr->hn_txdesc_br, txd);
 #endif
 	}
 	txr->hn_txdesc_avail = txr->hn_txdesc_cnt;
 
 	if (sc->hn_tx_sysctl_tree != NULL) {
 		struct sysctl_oid_list *child;
 		struct sysctl_ctx_list *ctx;
 		char name[16];
 
 		/*
 		 * Create per TX ring sysctl tree:
 		 * dev.hn.UNIT.tx.RINGID
 		 */
 		ctx = device_get_sysctl_ctx(sc->hn_dev);
 		child = SYSCTL_CHILDREN(sc->hn_tx_sysctl_tree);
 
 		snprintf(name, sizeof(name), "%d", id);
 		txr->hn_tx_sysctl_tree = SYSCTL_ADD_NODE(ctx, child, OID_AUTO,
 		    name, CTLFLAG_RD | CTLFLAG_MPSAFE, 0, "");
 
 		if (txr->hn_tx_sysctl_tree != NULL) {
 			child = SYSCTL_CHILDREN(txr->hn_tx_sysctl_tree);
 
 			SYSCTL_ADD_INT(ctx, child, OID_AUTO, "txdesc_avail",
 			    CTLFLAG_RD, &txr->hn_txdesc_avail, 0,
 			    "# of available TX descs");
 			if (!hn_use_if_start) {
 				SYSCTL_ADD_INT(ctx, child, OID_AUTO, "oactive",
 				    CTLFLAG_RD, &txr->hn_oactive, 0,
 				    "over active");
 			}
 			SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "packets",
 			    CTLFLAG_RW, &txr->hn_pkts,
 			    "# of packets transmitted");
 		}
 	}
 
 	return 0;
 }
 
 static void
 hn_txdesc_dmamap_destroy(struct hn_txdesc *txd)
 {
 	struct hn_tx_ring *txr = txd->txr;
 
 	KASSERT(txd->m == NULL, ("still has mbuf installed"));
 	KASSERT((txd->flags & HN_TXD_FLAG_DMAMAP) == 0, ("still dma mapped"));
 
 	bus_dmamap_unload(txr->hn_tx_rndis_dtag, txd->rndis_msg_dmap);
 	bus_dmamem_free(txr->hn_tx_rndis_dtag, txd->rndis_msg,
 	    txd->rndis_msg_dmap);
 	bus_dmamap_destroy(txr->hn_tx_data_dtag, txd->data_dmap);
 }
 
 static void
 hn_destroy_tx_ring(struct hn_tx_ring *txr)
 {
 	struct hn_txdesc *txd;
 
 	if (txr->hn_txdesc == NULL)
 		return;
 
 #ifndef HN_USE_TXDESC_BUFRING
 	while ((txd = SLIST_FIRST(&txr->hn_txlist)) != NULL) {
 		SLIST_REMOVE_HEAD(&txr->hn_txlist, link);
 		hn_txdesc_dmamap_destroy(txd);
 	}
 #else
 	mtx_lock(&txr->hn_tx_lock);
 	while ((txd = buf_ring_dequeue_sc(txr->hn_txdesc_br)) != NULL)
 		hn_txdesc_dmamap_destroy(txd);
 	mtx_unlock(&txr->hn_tx_lock);
 #endif
 
 	if (txr->hn_tx_data_dtag != NULL)
 		bus_dma_tag_destroy(txr->hn_tx_data_dtag);
 	if (txr->hn_tx_rndis_dtag != NULL)
 		bus_dma_tag_destroy(txr->hn_tx_rndis_dtag);
 
 #ifdef HN_USE_TXDESC_BUFRING
 	buf_ring_free(txr->hn_txdesc_br, M_NETVSC);
 #endif
 
 	free(txr->hn_txdesc, M_NETVSC);
 	txr->hn_txdesc = NULL;
 
 	if (txr->hn_mbuf_br != NULL)
 		buf_ring_free(txr->hn_mbuf_br, M_NETVSC);
 
 #ifndef HN_USE_TXDESC_BUFRING
 	mtx_destroy(&txr->hn_txlist_spin);
 #endif
 	mtx_destroy(&txr->hn_tx_lock);
 }
 
 static int
 hn_create_tx_data(struct hn_softc *sc, int ring_cnt)
 {
 	struct sysctl_oid_list *child;
 	struct sysctl_ctx_list *ctx;
 	int i;
 
 	sc->hn_tx_ring_cnt = ring_cnt;
 	sc->hn_tx_ring_inuse = sc->hn_tx_ring_cnt;
 
 	sc->hn_tx_ring = malloc(sizeof(struct hn_tx_ring) * sc->hn_tx_ring_cnt,
 	    M_NETVSC, M_WAITOK | M_ZERO);
 
 	ctx = device_get_sysctl_ctx(sc->hn_dev);
 	child = SYSCTL_CHILDREN(device_get_sysctl_tree(sc->hn_dev));
 
 	/* Create dev.hn.UNIT.tx sysctl tree */
 	sc->hn_tx_sysctl_tree = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "tx",
 	    CTLFLAG_RD | CTLFLAG_MPSAFE, 0, "");
 
 	for (i = 0; i < sc->hn_tx_ring_cnt; ++i) {
 		int error;
 
 		error = hn_create_tx_ring(sc, i);
 		if (error)
 			return error;
 	}
 
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "no_txdescs",
 	    CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
 	    __offsetof(struct hn_tx_ring, hn_no_txdescs),
 	    hn_tx_stat_ulong_sysctl, "LU", "# of times short of TX descs");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "send_failed",
 	    CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
 	    __offsetof(struct hn_tx_ring, hn_send_failed),
 	    hn_tx_stat_ulong_sysctl, "LU", "# of hyper-v sending failure");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "txdma_failed",
 	    CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
 	    __offsetof(struct hn_tx_ring, hn_txdma_failed),
 	    hn_tx_stat_ulong_sysctl, "LU", "# of TX DMA failure");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "tx_collapsed",
 	    CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
 	    __offsetof(struct hn_tx_ring, hn_tx_collapsed),
 	    hn_tx_stat_ulong_sysctl, "LU", "# of TX mbuf collapsed");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "tx_chimney",
 	    CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
 	    __offsetof(struct hn_tx_ring, hn_tx_chimney),
 	    hn_tx_stat_ulong_sysctl, "LU", "# of chimney send");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "tx_chimney_tried",
 	    CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
 	    __offsetof(struct hn_tx_ring, hn_tx_chimney_tried),
 	    hn_tx_stat_ulong_sysctl, "LU", "# of chimney send tries");
 	SYSCTL_ADD_INT(ctx, child, OID_AUTO, "txdesc_cnt",
 	    CTLFLAG_RD, &sc->hn_tx_ring[0].hn_txdesc_cnt, 0,
 	    "# of total TX descs");
 	SYSCTL_ADD_INT(ctx, child, OID_AUTO, "tx_chimney_max",
 	    CTLFLAG_RD, &sc->hn_tx_chimney_max, 0,
 	    "Chimney send packet size upper boundary");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "tx_chimney_size",
 	    CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, sc, 0,
 	    hn_tx_chimney_size_sysctl,
 	    "I", "Chimney send packet size limit");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "direct_tx_size",
 	    CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
 	    __offsetof(struct hn_tx_ring, hn_direct_tx_size),
 	    hn_tx_conf_int_sysctl, "I",
 	    "Size of the packet for direct transmission");
 	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "sched_tx",
 	    CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
 	    __offsetof(struct hn_tx_ring, hn_sched_tx),
 	    hn_tx_conf_int_sysctl, "I",
 	    "Always schedule transmission "
 	    "instead of doing direct transmission");
 	SYSCTL_ADD_INT(ctx, child, OID_AUTO, "tx_ring_cnt",
 	    CTLFLAG_RD, &sc->hn_tx_ring_cnt, 0, "# created TX rings");
 	SYSCTL_ADD_INT(ctx, child, OID_AUTO, "tx_ring_inuse",
 	    CTLFLAG_RD, &sc->hn_tx_ring_inuse, 0, "# used TX rings");
 
 	return 0;
 }
 
 static void
 hn_set_tx_chimney_size(struct hn_softc *sc, int chimney_size)
 {
 	int i;
 
 	NV_LOCK(sc);
 	for (i = 0; i < sc->hn_tx_ring_inuse; ++i)
 		sc->hn_tx_ring[i].hn_tx_chimney_size = chimney_size;
 	NV_UNLOCK(sc);
 }
 
 static void
 hn_destroy_tx_data(struct hn_softc *sc)
 {
 	int i;
 
 	if (sc->hn_tx_ring_cnt == 0)
 		return;
 
 	for (i = 0; i < sc->hn_tx_ring_cnt; ++i)
 		hn_destroy_tx_ring(&sc->hn_tx_ring[i]);
 
 	free(sc->hn_tx_ring, M_NETVSC);
 	sc->hn_tx_ring = NULL;
 
 	sc->hn_tx_ring_cnt = 0;
 	sc->hn_tx_ring_inuse = 0;
 }
 
 static void
 hn_start_taskfunc(void *xtxr, int pending __unused)
 {
 	struct hn_tx_ring *txr = xtxr;
 
 	mtx_lock(&txr->hn_tx_lock);
 	hn_start_locked(txr, 0);
 	mtx_unlock(&txr->hn_tx_lock);
 }
 
 static void
 hn_start_txeof_taskfunc(void *xtxr, int pending __unused)
 {
 	struct hn_tx_ring *txr = xtxr;
 
 	mtx_lock(&txr->hn_tx_lock);
 	atomic_clear_int(&txr->hn_sc->hn_ifp->if_drv_flags, IFF_DRV_OACTIVE);
 	hn_start_locked(txr, 0);
 	mtx_unlock(&txr->hn_tx_lock);
 }
 
 static void
 hn_stop_tx_tasks(struct hn_softc *sc)
 {
 	int i;
 
 	for (i = 0; i < sc->hn_tx_ring_inuse; ++i) {
 		struct hn_tx_ring *txr = &sc->hn_tx_ring[i];
 
 		taskqueue_drain(txr->hn_tx_taskq, &txr->hn_tx_task);
 		taskqueue_drain(txr->hn_tx_taskq, &txr->hn_txeof_task);
 	}
 }
 
 static int
 hn_xmit(struct hn_tx_ring *txr, int len)
 {
 	struct hn_softc *sc = txr->hn_sc;
 	struct ifnet *ifp = sc->hn_ifp;
 	struct mbuf *m_head;
 
 	mtx_assert(&txr->hn_tx_lock, MA_OWNED);
 	KASSERT(hn_use_if_start == 0,
 	    ("hn_xmit is called, when if_start is enabled"));
 
 	if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0 || txr->hn_oactive)
 		return 0;
 
 	while ((m_head = drbr_peek(ifp, txr->hn_mbuf_br)) != NULL) {
 		struct hn_txdesc *txd;
 		int error;
 
 		if (len > 0 && m_head->m_pkthdr.len > len) {
 			/*
 			 * Sending this packet could be time consuming; let the
 			 * caller dispatch it (and any packets that follow) to
 			 * the tx taskqueue.
 			 */
 			drbr_putback(ifp, txr->hn_mbuf_br, m_head);
 			return 1;
 		}
 
 		txd = hn_txdesc_get(txr);
 		if (txd == NULL) {
 			txr->hn_no_txdescs++;
 			drbr_putback(ifp, txr->hn_mbuf_br, m_head);
 			txr->hn_oactive = 1;
 			break;
 		}
 
 		error = hn_encap(txr, txd, &m_head);
 		if (error) {
 			/* Both txd and m_head are freed; discard */
 			drbr_advance(ifp, txr->hn_mbuf_br);
 			continue;
 		}
 
 		error = hn_send_pkt(ifp, txr, txd);
 		if (__predict_false(error)) {
 			/* txd is freed, but m_head is not */
 			drbr_putback(ifp, txr->hn_mbuf_br, m_head);
 			txr->hn_oactive = 1;
 			break;
 		}
 
 		/* Sent */
 		drbr_advance(ifp, txr->hn_mbuf_br);
 	}
 	return 0;
 }
 
 static int
 hn_transmit(struct ifnet *ifp, struct mbuf *m)
 {
 	struct hn_softc *sc = ifp->if_softc;
 	struct hn_tx_ring *txr;
 	int error, idx = 0;
 
 	/*
 	 * Select the TX ring based on flowid
 	 */
 	if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE)
 		idx = m->m_pkthdr.flowid % sc->hn_tx_ring_inuse;
 	txr = &sc->hn_tx_ring[idx];
 
 	error = drbr_enqueue(ifp, txr->hn_mbuf_br, m);
 	if (error) {
 		if_inc_counter(ifp, IFCOUNTER_OQDROPS, 1);
 		return error;
 	}
 
 	if (txr->hn_oactive)
 		return 0;
 
 	if (txr->hn_sched_tx)
 		goto do_sched;
 
 	if (mtx_trylock(&txr->hn_tx_lock)) {
 		int sched;
 
 		sched = hn_xmit(txr, txr->hn_direct_tx_size);
 		mtx_unlock(&txr->hn_tx_lock);
 		if (!sched)
 			return 0;
 	}
 do_sched:
 	taskqueue_enqueue(txr->hn_tx_taskq, &txr->hn_tx_task);
 	return 0;
 }
 
 static void
 hn_xmit_qflush(struct ifnet *ifp)
 {
 	struct hn_softc *sc = ifp->if_softc;
 	int i;
 
 	for (i = 0; i < sc->hn_tx_ring_inuse; ++i) {
 		struct hn_tx_ring *txr = &sc->hn_tx_ring[i];
 		struct mbuf *m;
 
 		mtx_lock(&txr->hn_tx_lock);
 		while ((m = buf_ring_dequeue_sc(txr->hn_mbuf_br)) != NULL)
 			m_freem(m);
 		mtx_unlock(&txr->hn_tx_lock);
 	}
 	if_qflush(ifp);
 }
 
 static void
 hn_xmit_txeof(struct hn_tx_ring *txr)
 {
 
 	if (txr->hn_sched_tx)
 		goto do_sched;
 
 	if (mtx_trylock(&txr->hn_tx_lock)) {
 		int sched;
 
 		txr->hn_oactive = 0;
 		sched = hn_xmit(txr, txr->hn_direct_tx_size);
 		mtx_unlock(&txr->hn_tx_lock);
 		if (sched) {
 			taskqueue_enqueue(txr->hn_tx_taskq,
 			    &txr->hn_tx_task);
 		}
 	} else {
 do_sched:
 		/*
 		 * Release oactive early, in the hope that others
 		 * can catch up.  The task will clear oactive again
 		 * while holding hn_tx_lock to avoid possible
 		 * races.
 		 */
 		txr->hn_oactive = 0;
 		taskqueue_enqueue(txr->hn_tx_taskq, &txr->hn_txeof_task);
 	}
 }
 
 static void
 hn_xmit_taskfunc(void *xtxr, int pending __unused)
 {
 	struct hn_tx_ring *txr = xtxr;
 
 	mtx_lock(&txr->hn_tx_lock);
 	hn_xmit(txr, 0);
 	mtx_unlock(&txr->hn_tx_lock);
 }
 
 static void
 hn_xmit_txeof_taskfunc(void *xtxr, int pending __unused)
 {
 	struct hn_tx_ring *txr = xtxr;
 
 	mtx_lock(&txr->hn_tx_lock);
 	txr->hn_oactive = 0;
 	hn_xmit(txr, 0);
 	mtx_unlock(&txr->hn_tx_lock);
 }
 
 static void
 hn_channel_attach(struct hn_softc *sc, struct hv_vmbus_channel *chan)
 {
 	struct hn_rx_ring *rxr;
 	int idx;
 
 	idx = chan->offer_msg.offer.sub_channel_index;
 
 	KASSERT(idx >= 0 && idx < sc->hn_rx_ring_inuse,
 	    ("invalid channel index %d, should >= 0 && < %d",
 	     idx, sc->hn_rx_ring_inuse));
 	rxr = &sc->hn_rx_ring[idx];
 	KASSERT((rxr->hn_rx_flags & HN_RX_FLAG_ATTACHED) == 0,
 	    ("RX ring %d already attached", idx));
 	rxr->hn_rx_flags |= HN_RX_FLAG_ATTACHED;
 
 	chan->hv_chan_rxr = rxr;
 	if (bootverbose) {
 		if_printf(sc->hn_ifp, "link RX ring %d to channel%u\n",
 		    idx, chan->offer_msg.child_rel_id);
 	}
 
 	if (idx < sc->hn_tx_ring_inuse) {
 		struct hn_tx_ring *txr = &sc->hn_tx_ring[idx];
 
 		KASSERT((txr->hn_tx_flags & HN_TX_FLAG_ATTACHED) == 0,
 		    ("TX ring %d already attached", idx));
 		txr->hn_tx_flags |= HN_TX_FLAG_ATTACHED;
 
 		chan->hv_chan_txr = txr;
 		txr->hn_chan = chan;
 		if (bootverbose) {
 			if_printf(sc->hn_ifp, "link TX ring %d to channel%u\n",
 			    idx, chan->offer_msg.child_rel_id);
 		}
 	}
 
 	/* Bind channel to a proper CPU */
 	vmbus_channel_cpu_set(chan, (sc->hn_cpu + idx) % mp_ncpus);
 }
 
 static void
 hn_subchan_attach(struct hn_softc *sc, struct hv_vmbus_channel *chan)
 {
 
 	KASSERT(!HV_VMBUS_CHAN_ISPRIMARY(chan),
 	    ("subchannel callback on primary channel"));
 	KASSERT(chan->offer_msg.offer.sub_channel_index > 0,
 	    ("invalid channel subidx %u",
 	     chan->offer_msg.offer.sub_channel_index));
 	hn_channel_attach(sc, chan);
 }
 
 static void
 hn_tx_taskq_create(void *arg __unused)
 {
 	if (!hn_share_tx_taskq)
 		return;
 
 	hn_tx_taskq = taskqueue_create("hn_tx", M_WAITOK,
 	    taskqueue_thread_enqueue, &hn_tx_taskq);
 	if (hn_bind_tx_taskq >= 0) {
 		int cpu = hn_bind_tx_taskq;
 		cpuset_t cpu_set;
 
 		if (cpu > mp_ncpus - 1)
 			cpu = mp_ncpus - 1;
 		CPU_SETOF(cpu, &cpu_set);
 		taskqueue_start_threads_cpuset(&hn_tx_taskq, 1, PI_NET,
 		    &cpu_set, "hn tx");
 	} else {
 		taskqueue_start_threads(&hn_tx_taskq, 1, PI_NET, "hn tx");
 	}
 }
 SYSINIT(hn_txtq_create, SI_SUB_DRIVERS, SI_ORDER_FIRST,
     hn_tx_taskq_create, NULL);
 
 static void
 hn_tx_taskq_destroy(void *arg __unused)
 {
 	if (hn_tx_taskq != NULL)
 		taskqueue_free(hn_tx_taskq);
 }
 SYSUNINIT(hn_txtq_destroy, SI_SUB_DRIVERS, SI_ORDER_FIRST,
     hn_tx_taskq_destroy, NULL);
 
 static device_method_t netvsc_methods[] = {
         /* Device interface */
         DEVMETHOD(device_probe,         netvsc_probe),
         DEVMETHOD(device_attach,        netvsc_attach),
         DEVMETHOD(device_detach,        netvsc_detach),
         DEVMETHOD(device_shutdown,      netvsc_shutdown),
 
         { 0, 0 }
 };
 
 static driver_t netvsc_driver = {
         NETVSC_DEVNAME,
         netvsc_methods,
         sizeof(hn_softc_t)
 };
 
 static devclass_t netvsc_devclass;
 
 DRIVER_MODULE(hn, vmbus, netvsc_driver, netvsc_devclass, 0, 0);
 MODULE_VERSION(hn, 1);
 MODULE_DEPEND(hn, vmbus, 1, 1, 1);
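The statistics handlers in the file above (for example hn_tx_stat_ulong_sysctl) receive a byte offset in arg2 and use it to sum, and on write to zero, the same counter in every ring. The userland sketch below illustrates only that offset-based pattern. The names struct demo_ring, demo_stat_sum, and demo_stat_clear are hypothetical and are not part of the driver, and the sysctl plumbing is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-ring statistics structure. */
struct demo_ring {
	unsigned long pkts;
	unsigned long small_pkts;
};

/*
 * Sum the unsigned long counter found at byte offset `ofs` inside
 * each element of `rings`.  The caller is assumed to pass an offset
 * obtained with offsetof() for an unsigned long member.
 */
static unsigned long
demo_stat_sum(const struct demo_ring *rings, int nrings, size_t ofs)
{
	unsigned long sum = 0;
	int i;

	for (i = 0; i < nrings; ++i) {
		const uint8_t *base = (const uint8_t *)&rings[i];

		sum += *(const unsigned long *)(base + ofs);
	}
	return sum;
}

/* Zero the counter at byte offset `ofs` in every ring. */
static void
demo_stat_clear(struct demo_ring *rings, int nrings, size_t ofs)
{
	int i;

	for (i = 0; i < nrings; ++i) {
		uint8_t *base = (uint8_t *)&rings[i];

		*(unsigned long *)(base + ofs) = 0;
	}
}
```

A caller would pass offsetof(struct demo_ring, small_pkts) as ofs, so one pair of helpers can serve every counter in the structure, much as a single handler serves several sysctl nodes in the driver.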
Index: projects/vnet/sys/dev/ixgbe/ix_txrx.c
===================================================================
--- projects/vnet/sys/dev/ixgbe/ix_txrx.c	(revision 301546)
+++ projects/vnet/sys/dev/ixgbe/ix_txrx.c	(revision 301547)
@@ -1,2303 +1,2303 @@
 /******************************************************************************
 
   Copyright (c) 2001-2015, Intel Corporation 
   All rights reserved.
   
   Redistribution and use in source and binary forms, with or without 
   modification, are permitted provided that the following conditions are met:
   
    1. Redistributions of source code must retain the above copyright notice, 
       this list of conditions and the following disclaimer.
   
    2. Redistributions in binary form must reproduce the above copyright 
       notice, this list of conditions and the following disclaimer in the 
       documentation and/or other materials provided with the distribution.
   
    3. Neither the name of the Intel Corporation nor the names of its 
       contributors may be used to endorse or promote products derived from 
       this software without specific prior written permission.
   
   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
   AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 
   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 
   ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE 
   LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 
   CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 
   SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 
   INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 
   CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 
   ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
   POSSIBILITY OF SUCH DAMAGE.
 
 ******************************************************************************/
 /*$FreeBSD$*/
 
 
 #ifndef IXGBE_STANDALONE_BUILD
 #include "opt_inet.h"
 #include "opt_inet6.h"
 #include "opt_rss.h"
 #endif
 
 #include "ixgbe.h"
 
 #ifdef	RSS
 #include <net/rss_config.h>
 #include <netinet/in_rss.h>
 #endif
 
 #ifdef DEV_NETMAP
 #include <net/netmap.h>
 #include <sys/selinfo.h>
 #include <dev/netmap/netmap_kern.h>
 
 extern int ix_crcstrip;
 #endif
 
 /*
 ** HW RSC control:
 **  this feature only works with
 **  IPv4, and only on 82599 and later.
 **  Also this will cause IP forwarding to
 **  fail and that can't be controlled by
 **  the stack as LRO can. For all these
 **  reasons I've deemed it best to leave
 **  this off and not bother with a tuneable
 **  interface, this would need to be compiled
 **  to enable.
 */
 static bool ixgbe_rsc_enable = FALSE;
 
 #ifdef IXGBE_FDIR
 /*
 ** For Flow Director: this is the
 ** number of TX packets we sample
 ** for the filter pool, this means
 ** every 20th packet will be probed.
 **
 ** This feature can be disabled by
 ** setting this to 0.
 */
 static int atr_sample_rate = 20;
 #endif
 
 /*********************************************************************
  *  Local Function prototypes
  *********************************************************************/
 static void	ixgbe_setup_transmit_ring(struct tx_ring *);
 static void     ixgbe_free_transmit_buffers(struct tx_ring *);
 static int	ixgbe_setup_receive_ring(struct rx_ring *);
 static void     ixgbe_free_receive_buffers(struct rx_ring *);
 
 static void	ixgbe_rx_checksum(u32, struct mbuf *, u32);
 static void	ixgbe_refresh_mbufs(struct rx_ring *, int);
 static int      ixgbe_xmit(struct tx_ring *, struct mbuf **);
 static int	ixgbe_tx_ctx_setup(struct tx_ring *,
 		    struct mbuf *, u32 *, u32 *);
 static int	ixgbe_tso_setup(struct tx_ring *,
 		    struct mbuf *, u32 *, u32 *);
 #ifdef IXGBE_FDIR
 static void	ixgbe_atr(struct tx_ring *, struct mbuf *);
 #endif
 static __inline void ixgbe_rx_discard(struct rx_ring *, int);
 static __inline void ixgbe_rx_input(struct rx_ring *, struct ifnet *,
 		    struct mbuf *, u32);
 
 #ifdef IXGBE_LEGACY_TX
 /*********************************************************************
  *  Transmit entry point
  *
  *  ixgbe_start is called by the stack to initiate a transmit.
  *  The driver will remain in this routine as long as there are
  *  packets to transmit and transmit resources are available.
  *  In case resources are not available stack is notified and
  *  the packet is requeued.
  **********************************************************************/
 
 void
 ixgbe_start_locked(struct tx_ring *txr, struct ifnet * ifp)
 {
 	struct mbuf    *m_head;
 	struct adapter *adapter = txr->adapter;
 
 	IXGBE_TX_LOCK_ASSERT(txr);
 
 	if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
 		return;
 	if (!adapter->link_active)
 		return;
 
 	while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) {
 		if (txr->tx_avail <= IXGBE_QUEUE_MIN_FREE)
 			break;
 
 		IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head);
 		if (m_head == NULL)
 			break;
 
 		if (ixgbe_xmit(txr, &m_head)) {
 			if (m_head != NULL)
 				IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
 			break;
 		}
 		/* Send a copy of the frame to the BPF listener */
 		ETHER_BPF_MTAP(ifp, m_head);
 	}
 	return;
 }
 
 /*
  * Legacy TX start - called by the stack, this
  * always uses the first tx ring, and should
  * not be used with multiqueue tx enabled.
  */
 void
 ixgbe_start(struct ifnet *ifp)
 {
 	struct adapter *adapter = ifp->if_softc;
 	struct tx_ring	*txr = adapter->tx_rings;
 
 	if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
 		IXGBE_TX_LOCK(txr);
 		ixgbe_start_locked(txr, ifp);
 		IXGBE_TX_UNLOCK(txr);
 	}
 	return;
 }
 
 #else /* ! IXGBE_LEGACY_TX */
 
 /*
 ** Multiqueue Transmit Entry Point
 ** (if_transmit function)
 */
 int
 ixgbe_mq_start(struct ifnet *ifp, struct mbuf *m)
 {
 	struct adapter	*adapter = ifp->if_softc;
 	struct ix_queue	*que;
 	struct tx_ring	*txr;
 	int 		i, err = 0;
 #ifdef	RSS
 	uint32_t bucket_id;
 #endif
 
 	/*
 	 * When doing RSS, map it to the same outbound queue
 	 * as the incoming flow would be mapped to.
 	 *
 	 * If everything is set up correctly, it should be the
 	 * same bucket that the current CPU we're on is.
 	 */
 	if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) {
 #ifdef	RSS
 		if (rss_hash2bucket(m->m_pkthdr.flowid,
 		    M_HASHTYPE_GET(m), &bucket_id) == 0) {
 			i = bucket_id % adapter->num_queues;
 #ifdef IXGBE_DEBUG
 			if (bucket_id >= adapter->num_queues)
 				if_printf(ifp, "bucket_id (%u) >= num_queues "
 				    "(%d)\n", bucket_id, adapter->num_queues);
 #endif
 		} else 
 #endif
 			i = m->m_pkthdr.flowid % adapter->num_queues;
 	} else
 		i = curcpu % adapter->num_queues;
 
 	/* Check for a hung queue and pick alternative */
 	if (((1 << i) & adapter->active_queues) == 0 &&
 	    adapter->active_queues != 0)
 		i = ffsl(adapter->active_queues) - 1;	/* ffsl() is 1-based */
 
 	txr = &adapter->tx_rings[i];
 	que = &adapter->queues[i];
 
 	err = drbr_enqueue(ifp, txr->br, m);
 	if (err)
 		return (err);
 	if (IXGBE_TX_TRYLOCK(txr)) {
 		ixgbe_mq_start_locked(ifp, txr);
 		IXGBE_TX_UNLOCK(txr);
 	} else
 		taskqueue_enqueue(que->tq, &txr->txq_task);
 
 	return (0);
 }
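
 /*
  * Illustrative sketch only, not part of the driver: the queue choice made
  * in ixgbe_mq_start() reduced to a pure function.  sketch_pick_queue() and
  * sketch_first_active() are hypothetical names.  ffsl() reports bit
  * positions starting at 1, so the sketch uses a 0-based helper to keep the
  * result usable as an array index; that reading of the intent is an
  * assumption.
  */
```c
#include <stdint.h>

/* Hypothetical helper: 0-based index of the lowest set bit, -1 if none. */
static int
sketch_first_active(uint64_t mask)
{
	for (int bit = 0; bit < 64; bit++)
		if (mask & (UINT64_C(1) << bit))
			return (bit);
	return (-1);
}

/*
 * Map a flow id onto a queue, falling back to the first active queue
 * when the hashed queue is not marked active in the bitmask.
 */
static int
sketch_pick_queue(uint32_t flowid, int num_queues, uint64_t active_queues)
{
	int i = (int)(flowid % (uint32_t)num_queues);

	if ((active_queues & (UINT64_C(1) << i)) == 0) {
		int alt = sketch_first_active(active_queues);

		if (alt >= 0)	/* keep the hashed queue if none is active */
			i = alt;
	}
	return (i);
}
```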
 
 int
 ixgbe_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr)
 {
 	struct adapter  *adapter = txr->adapter;
         struct mbuf     *next;
         int             enqueued = 0, err = 0;
 
 	if (((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) ||
 	    adapter->link_active == 0)
 		return (ENETDOWN);
 
 	/* Process the queue */
 #if __FreeBSD_version < 901504
 	next = drbr_dequeue(ifp, txr->br);
 	while (next != NULL) {
 		if ((err = ixgbe_xmit(txr, &next)) != 0) {
 			if (next != NULL)
 				err = drbr_enqueue(ifp, txr->br, next);
 #else
 	while ((next = drbr_peek(ifp, txr->br)) != NULL) {
 		if ((err = ixgbe_xmit(txr, &next)) != 0) {
 			if (next == NULL) {
 				drbr_advance(ifp, txr->br);
 			} else {
 				drbr_putback(ifp, txr->br, next);
 			}
 #endif
 			break;
 		}
 #if __FreeBSD_version >= 901504
 		drbr_advance(ifp, txr->br);
 #endif
 		enqueued++;
 #if 0 // this is VF-only
 #if __FreeBSD_version >= 1100036
 		/*
 		 * Since we're looking at the tx ring, we can check
 		 * to see if we're a VF by examining our tail register
 		 * address.
 		 */
 		if (txr->tail < IXGBE_TDT(0) && next->m_flags & M_MCAST)
 			if_inc_counter(ifp, IFCOUNTER_OMCASTS, 1);
 #endif
 #endif
 		/* Send a copy of the frame to the BPF listener */
 		ETHER_BPF_MTAP(ifp, next);
 		if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
 			break;
 #if __FreeBSD_version < 901504
 		next = drbr_dequeue(ifp, txr->br);
 #endif
 	}
 
 	if (txr->tx_avail < IXGBE_TX_CLEANUP_THRESHOLD)
 		ixgbe_txeof(txr);
 
 	return (err);
 }
 
 /*
  * Called from a taskqueue to drain queued transmit packets.
  */
 void
 ixgbe_deferred_mq_start(void *arg, int pending)
 {
 	struct tx_ring *txr = arg;
 	struct adapter *adapter = txr->adapter;
 	struct ifnet *ifp = adapter->ifp;
 
 	IXGBE_TX_LOCK(txr);
 	if (!drbr_empty(ifp, txr->br))
 		ixgbe_mq_start_locked(ifp, txr);
 	IXGBE_TX_UNLOCK(txr);
 }
 
 /*
  * Flush all ring buffers
  */
 void
 ixgbe_qflush(struct ifnet *ifp)
 {
 	struct adapter	*adapter = ifp->if_softc;
 	struct tx_ring	*txr = adapter->tx_rings;
 	struct mbuf	*m;
 
 	for (int i = 0; i < adapter->num_queues; i++, txr++) {
 		IXGBE_TX_LOCK(txr);
 		while ((m = buf_ring_dequeue_sc(txr->br)) != NULL)
 			m_freem(m);
 		IXGBE_TX_UNLOCK(txr);
 	}
 	if_qflush(ifp);
 }
 #endif /* IXGBE_LEGACY_TX */
 
 
 /*********************************************************************
  *
  *  This routine maps the mbufs to tx descriptors, allowing the
  *  TX engine to transmit the packets. 
  *  	- return 0 on success, positive on failure
  *
  **********************************************************************/
 
 static int
 ixgbe_xmit(struct tx_ring *txr, struct mbuf **m_headp)
 {
 	struct adapter  *adapter = txr->adapter;
 	u32		olinfo_status = 0, cmd_type_len;
 	int             i, j, error, nsegs;
 	int		first;
 	bool		remap = TRUE;
 	struct mbuf	*m_head;
 	bus_dma_segment_t segs[adapter->num_segs];
 	bus_dmamap_t	map;
 	struct ixgbe_tx_buf *txbuf;
 	union ixgbe_adv_tx_desc *txd = NULL;
 
 	m_head = *m_headp;
 
 	/* Basic descriptor defines */
         cmd_type_len = (IXGBE_ADVTXD_DTYP_DATA |
 	    IXGBE_ADVTXD_DCMD_IFCS | IXGBE_ADVTXD_DCMD_DEXT);
 
 	if (m_head->m_flags & M_VLANTAG)
         	cmd_type_len |= IXGBE_ADVTXD_DCMD_VLE;
 
         /*
          * Important to capture the first descriptor
          * used because it will contain the index of
          * the one we tell the hardware to report back
          */
         first = txr->next_avail_desc;
 	txbuf = &txr->tx_buffers[first];
 	map = txbuf->map;
 
 	/*
 	 * Map the packet for DMA.
 	 */
 retry:
 	error = bus_dmamap_load_mbuf_sg(txr->txtag, map,
 	    *m_headp, segs, &nsegs, BUS_DMA_NOWAIT);
 
 	if (__predict_false(error)) {
 		struct mbuf *m;
 
 		switch (error) {
 		case EFBIG:
 			/* Defragment the chain and retry the load once */
 			if (remap == TRUE) {
 				remap = FALSE;
 				/*
 				 * XXX: m_defrag will choke on
 				 * non-MCLBYTES-sized clusters
 				 */
 				m = m_defrag(*m_headp, M_NOWAIT);
 				if (m == NULL) {
 					adapter->mbuf_defrag_failed++;
 					m_freem(*m_headp);
 					*m_headp = NULL;
 					return (ENOBUFS);
 				}
 				*m_headp = m;
 				goto retry;
 			} else
 				return (error);
 		case ENOMEM:
 			txr->no_tx_dma_setup++;
 			return (error);
 		default:
 			txr->no_tx_dma_setup++;
 			m_freem(*m_headp);
 			*m_headp = NULL;
 			return (error);
 		}
 	}
 
 	/* Make certain there are enough descriptors */
 	if (txr->tx_avail < (nsegs + 2)) {
 		txr->no_desc_avail++;
 		bus_dmamap_unload(txr->txtag, map);
 		return (ENOBUFS);
 	}
 	m_head = *m_headp;
 
 	/*
 	 * Set up the appropriate offload context
 	 * this will consume the first descriptor
 	 */
 	error = ixgbe_tx_ctx_setup(txr, m_head, &cmd_type_len, &olinfo_status);
 	if (__predict_false(error)) {
 		if (error == ENOBUFS)
 			*m_headp = NULL;
 		return (error);
 	}
 
 #ifdef IXGBE_FDIR
 	/* Do the flow director magic */
 	if ((txr->atr_sample) && (!adapter->fdir_reinit)) {
 		++txr->atr_count;
 		if (txr->atr_count >= atr_sample_rate) {
 			ixgbe_atr(txr, m_head);
 			txr->atr_count = 0;
 		}
 	}
 #endif
 
 	olinfo_status |= IXGBE_ADVTXD_CC;
 	i = txr->next_avail_desc;
 	for (j = 0; j < nsegs; j++) {
 		bus_size_t seglen;
 		bus_addr_t segaddr;
 
 		txbuf = &txr->tx_buffers[i];
 		txd = &txr->tx_base[i];
 		seglen = segs[j].ds_len;
 		segaddr = htole64(segs[j].ds_addr);
 
 		txd->read.buffer_addr = segaddr;
 		txd->read.cmd_type_len = htole32(txr->txd_cmd |
 		    cmd_type_len |seglen);
 		txd->read.olinfo_status = htole32(olinfo_status);
 
 		if (++i == txr->num_desc)
 			i = 0;
 	}
 
 	txd->read.cmd_type_len |=
 	    htole32(IXGBE_TXD_CMD_EOP | IXGBE_TXD_CMD_RS);
 	txr->tx_avail -= nsegs;
 	txr->next_avail_desc = i;
 
 	txbuf->m_head = m_head;
 	/*
 	 * Here we swap the map so the last descriptor,
 	 * which gets the completion interrupt has the
 	 * real map, and the first descriptor gets the
 	 * unused map from this descriptor.
 	 */
 	txr->tx_buffers[first].map = txbuf->map;
 	txbuf->map = map;
 	bus_dmamap_sync(txr->txtag, map, BUS_DMASYNC_PREWRITE);
 
         /* Set the EOP descriptor that will be marked done */
         txbuf = &txr->tx_buffers[first];
 	txbuf->eop = txd;
 
         bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
             BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 	/*
 	 * Advance the Transmit Descriptor Tail (TDT); this tells the
 	 * hardware that this frame is available to transmit.
 	 */
 	++txr->total_packets;
 	IXGBE_WRITE_REG(&adapter->hw, txr->tail, i);
 
 	/* Mark queue as having work */
 	if (txr->busy == 0)
 		txr->busy = 1;
 
 	return (0);
 }
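
 /*
  * Illustrative sketch only, not part of the driver: the descriptor
  * accounting done across ixgbe_xmit() and the context setup.  A packet
  * is refused unless nsegs + 2 descriptors are free; it then consumes
  * nsegs data descriptors plus one context descriptor when one is
  * written.  struct sketch_ring and sketch_reserve() are hypothetical
  * names, and folding both steps into one function is a simplification.
  */
```c
#include <errno.h>

struct sketch_ring {
	int tx_avail;		/* free descriptors */
	int next_avail_desc;	/* next slot to fill */
	int num_desc;		/* ring size */
};

static int
sketch_reserve(struct sketch_ring *r, int nsegs, int need_ctx)
{
	int used;

	/* Same headroom test as the driver: data segments plus two spare */
	if (r->tx_avail < nsegs + 2)
		return (ENOBUFS);

	used = nsegs + (need_ctx ? 1 : 0);
	r->next_avail_desc = (r->next_avail_desc + used) % r->num_desc;
	r->tx_avail -= used;
	return (0);
}
```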
 
 
 /*********************************************************************
  *
  *  Allocate memory for tx_buffer structures. The tx_buffer stores all
  *  the information needed to transmit a packet on the wire. This is
  *  called only once at attach, setup is done every reset.
  *
  **********************************************************************/
 int
 ixgbe_allocate_transmit_buffers(struct tx_ring *txr)
 {
 	struct adapter *adapter = txr->adapter;
 	device_t dev = adapter->dev;
 	struct ixgbe_tx_buf *txbuf;
 	int error, i;
 
 	/*
 	 * Setup DMA descriptor areas.
 	 */
 	if ((error = bus_dma_tag_create(
 			       bus_get_dma_tag(adapter->dev),	/* parent */
 			       1, 0,		/* alignment, bounds */
 			       BUS_SPACE_MAXADDR,	/* lowaddr */
 			       BUS_SPACE_MAXADDR,	/* highaddr */
 			       NULL, NULL,		/* filter, filterarg */
 			       IXGBE_TSO_SIZE,		/* maxsize */
 			       adapter->num_segs,	/* nsegments */
 			       PAGE_SIZE,		/* maxsegsize */
 			       0,			/* flags */
 			       NULL,			/* lockfunc */
 			       NULL,			/* lockfuncarg */
 			       &txr->txtag))) {
 		device_printf(dev,"Unable to allocate TX DMA tag\n");
 		goto fail;
 	}
 
 	if (!(txr->tx_buffers =
 	    (struct ixgbe_tx_buf *) malloc(sizeof(struct ixgbe_tx_buf) *
 	    adapter->num_tx_desc, M_DEVBUF, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate tx_buffer memory\n");
 		error = ENOMEM;
 		goto fail;
 	}
 
         /* Create the descriptor buffer dma maps */
 	txbuf = txr->tx_buffers;
 	for (i = 0; i < adapter->num_tx_desc; i++, txbuf++) {
 		error = bus_dmamap_create(txr->txtag, 0, &txbuf->map);
 		if (error != 0) {
 			device_printf(dev, "Unable to create TX DMA map\n");
 			goto fail;
 		}
 	}
 
 	return (0);
 fail:
 	/* Free everything; this also handles a partially completed setup */
 	ixgbe_free_transmit_structures(adapter);
 	return (error);
 }
 
 /*********************************************************************
  *
  *  Initialize a transmit ring.
  *
  **********************************************************************/
 static void
 ixgbe_setup_transmit_ring(struct tx_ring *txr)
 {
 	struct adapter *adapter = txr->adapter;
 	struct ixgbe_tx_buf *txbuf;
 #ifdef DEV_NETMAP
 	struct netmap_adapter *na = NA(adapter->ifp);
 	struct netmap_slot *slot;
 #endif /* DEV_NETMAP */
 
 	/* Clear the old ring contents */
 	IXGBE_TX_LOCK(txr);
 #ifdef DEV_NETMAP
 	/*
 	 * (under lock): if in netmap mode, do some consistency
 	 * checks and set slot to entry 0 of the netmap ring.
 	 */
 	slot = netmap_reset(na, NR_TX, txr->me, 0);
 #endif /* DEV_NETMAP */
 	bzero((void *)txr->tx_base,
 	      (sizeof(union ixgbe_adv_tx_desc)) * adapter->num_tx_desc);
 	/* Reset indices */
 	txr->next_avail_desc = 0;
 	txr->next_to_clean = 0;
 
 	/* Free any existing tx buffers. */
         txbuf = txr->tx_buffers;
 	for (int i = 0; i < txr->num_desc; i++, txbuf++) {
 		if (txbuf->m_head != NULL) {
 			bus_dmamap_sync(txr->txtag, txbuf->map,
 			    BUS_DMASYNC_POSTWRITE);
 			bus_dmamap_unload(txr->txtag, txbuf->map);
 			m_freem(txbuf->m_head);
 			txbuf->m_head = NULL;
 		}
 #ifdef DEV_NETMAP
 		/*
 		 * In netmap mode, set the map for the packet buffer.
 		 * NOTE: Some drivers (not this one) also need to set
 		 * the physical buffer address in the NIC ring.
 		 * Slots in the netmap ring (indexed by "si") are
 		 * kring->nkr_hwofs positions "ahead" wrt the
 		 * corresponding slot in the NIC ring. In some drivers
 		 * (not here) nkr_hwofs can be negative. Function
 		 * netmap_idx_n2k() handles wraparounds properly.
 		 */
 		if (slot) {
 			int si = netmap_idx_n2k(&na->tx_rings[txr->me], i);
 			netmap_load_map(na, txr->txtag,
 			    txbuf->map, NMB(na, slot + si));
 		}
 #endif /* DEV_NETMAP */
 		/* Clear the EOP descriptor pointer */
 		txbuf->eop = NULL;
         }
 
 #ifdef IXGBE_FDIR
 	/* Set the rate at which we sample packets */
 	if (adapter->hw.mac.type != ixgbe_mac_82598EB)
 		txr->atr_sample = atr_sample_rate;
 #endif
 
 	/* Set number of descriptors available */
 	txr->tx_avail = adapter->num_tx_desc;
 
 	bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
 	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 	IXGBE_TX_UNLOCK(txr);
 }
 
 /*********************************************************************
  *
  *  Initialize all transmit rings.
  *
  **********************************************************************/
 int
 ixgbe_setup_transmit_structures(struct adapter *adapter)
 {
 	struct tx_ring *txr = adapter->tx_rings;
 
 	for (int i = 0; i < adapter->num_queues; i++, txr++)
 		ixgbe_setup_transmit_ring(txr);
 
 	return (0);
 }
 
 /*********************************************************************
  *
  *  Free all transmit rings.
  *
  **********************************************************************/
 void
 ixgbe_free_transmit_structures(struct adapter *adapter)
 {
 	struct tx_ring *txr = adapter->tx_rings;
 
 	for (int i = 0; i < adapter->num_queues; i++, txr++) {
 		IXGBE_TX_LOCK(txr);
 		ixgbe_free_transmit_buffers(txr);
 		ixgbe_dma_free(adapter, &txr->txdma);
 		IXGBE_TX_UNLOCK(txr);
 		IXGBE_TX_LOCK_DESTROY(txr);
 	}
 	free(adapter->tx_rings, M_DEVBUF);
 }
 
 /*********************************************************************
  *
  *  Free transmit ring related data structures.
  *
  **********************************************************************/
 static void
 ixgbe_free_transmit_buffers(struct tx_ring *txr)
 {
 	struct adapter *adapter = txr->adapter;
 	struct ixgbe_tx_buf *tx_buffer;
 	int             i;
 
 	INIT_DEBUGOUT("ixgbe_free_transmit_buffers: begin");
 
 	if (txr->tx_buffers == NULL)
 		return;
 
 	tx_buffer = txr->tx_buffers;
 	for (i = 0; i < adapter->num_tx_desc; i++, tx_buffer++) {
 		if (tx_buffer->m_head != NULL) {
 			bus_dmamap_sync(txr->txtag, tx_buffer->map,
 			    BUS_DMASYNC_POSTWRITE);
 			bus_dmamap_unload(txr->txtag,
 			    tx_buffer->map);
 			m_freem(tx_buffer->m_head);
 			tx_buffer->m_head = NULL;
 			if (tx_buffer->map != NULL) {
 				bus_dmamap_destroy(txr->txtag,
 				    tx_buffer->map);
 				tx_buffer->map = NULL;
 			}
 		} else if (tx_buffer->map != NULL) {
 			bus_dmamap_unload(txr->txtag,
 			    tx_buffer->map);
 			bus_dmamap_destroy(txr->txtag,
 			    tx_buffer->map);
 			tx_buffer->map = NULL;
 		}
 	}
 #ifndef IXGBE_LEGACY_TX
 	if (txr->br != NULL)
 		buf_ring_free(txr->br, M_DEVBUF);
 #endif
 	if (txr->tx_buffers != NULL) {
 		free(txr->tx_buffers, M_DEVBUF);
 		txr->tx_buffers = NULL;
 	}
 	if (txr->txtag != NULL) {
 		bus_dma_tag_destroy(txr->txtag);
 		txr->txtag = NULL;
 	}
 	return;
 }
 
 /*********************************************************************
  *
  *  Advanced Context Descriptor setup for VLAN, CSUM or TSO
  *
  **********************************************************************/
 
 static int
 ixgbe_tx_ctx_setup(struct tx_ring *txr, struct mbuf *mp,
     u32 *cmd_type_len, u32 *olinfo_status)
 {
 	struct adapter *adapter = txr->adapter;
 	struct ixgbe_adv_tx_context_desc *TXD;
 	struct ether_vlan_header *eh;
 #ifdef INET
 	struct ip *ip;
 #endif
 #ifdef INET6
 	struct ip6_hdr *ip6;
 #endif
 	u32 vlan_macip_lens = 0, type_tucmd_mlhl = 0;
 	int	ehdrlen, ip_hlen = 0;
 	u16	etype;
 	u8	ipproto = 0;
 	int	offload = TRUE;
 	int	ctxd = txr->next_avail_desc;
 	u16	vtag = 0;
 	caddr_t l3d;
 
 
 	/* First check if TSO is to be used */
 	if (mp->m_pkthdr.csum_flags & (CSUM_IP_TSO|CSUM_IP6_TSO))
 		return (ixgbe_tso_setup(txr, mp, cmd_type_len, olinfo_status));
 
 	if ((mp->m_pkthdr.csum_flags & CSUM_OFFLOAD) == 0)
 		offload = FALSE;
 
 	/* Indicate the whole packet as payload when not doing TSO */
        	*olinfo_status |= mp->m_pkthdr.len << IXGBE_ADVTXD_PAYLEN_SHIFT;
 
 	/* Now ready a context descriptor */
 	TXD = (struct ixgbe_adv_tx_context_desc *) &txr->tx_base[ctxd];
 
 	/*
 	** In advanced descriptors the vlan tag must 
 	** be placed into the context descriptor. Hence
 	** we need to make one even if not doing offloads.
 	*/
 	if (mp->m_flags & M_VLANTAG) {
 		vtag = htole16(mp->m_pkthdr.ether_vtag);
 		vlan_macip_lens |= (vtag << IXGBE_ADVTXD_VLAN_SHIFT);
 	} else if (!IXGBE_IS_X550VF(adapter) && (offload == FALSE))
 		return (0);
 
 	/*
 	 * Determine where frame payload starts.
 	 * Jump over vlan headers if already present,
 	 * helpful for QinQ too.
 	 */
 	eh = mtod(mp, struct ether_vlan_header *);
 	if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
 		etype = ntohs(eh->evl_proto);
 		ehdrlen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
 	} else {
 		etype = ntohs(eh->evl_encap_proto);
 		ehdrlen = ETHER_HDR_LEN;
 	}
 
 	/* Set the ether header length */
 	vlan_macip_lens |= ehdrlen << IXGBE_ADVTXD_MACLEN_SHIFT;
 
 	if (offload == FALSE)
 		goto no_offloads;
 
 	/*
 	 * If the first mbuf only includes the ethernet header, jump to the
 	 * next one.
 	 * XXX: This assumes the stack splits mbufs containing headers on
 	 *      header boundaries.
 	 * XXX: And assumes the entire IP header is contained in one mbuf.
 	 */
 	if (mp->m_len == ehdrlen && mp->m_next)
 		l3d = mtod(mp->m_next, caddr_t);
 	else
 		l3d = mtod(mp, caddr_t) + ehdrlen;
 
 	switch (etype) {
 #ifdef INET
 		case ETHERTYPE_IP:
 			ip = (struct ip *)(l3d);
 			ip_hlen = ip->ip_hl << 2;
 			ipproto = ip->ip_p;
 			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV4;
 			/* Insert IPv4 checksum into data descriptors */
 			if (mp->m_pkthdr.csum_flags & CSUM_IP) {
 				ip->ip_sum = 0;
 				*olinfo_status |= IXGBE_TXD_POPTS_IXSM << 8;
 			}
 			break;
 #endif
 #ifdef INET6
 		case ETHERTYPE_IPV6:
 			ip6 = (struct ip6_hdr *)(l3d);
 			ip_hlen = sizeof(struct ip6_hdr);
 			ipproto = ip6->ip6_nxt;
 			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV6;
 			break;
 #endif
 		default:
 			offload = FALSE;
 			break;
 	}
 
 	vlan_macip_lens |= ip_hlen;
 
 	/* No support for offloads for non-L4 next headers */
 	switch (ipproto) {
 		case IPPROTO_TCP:
 			if (mp->m_pkthdr.csum_flags & (CSUM_IP_TCP | CSUM_IP6_TCP))
 				type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_TCP;
 			else
 				offload = false;
 			break;
 		case IPPROTO_UDP:
 			if (mp->m_pkthdr.csum_flags & (CSUM_IP_UDP | CSUM_IP6_UDP))
 				type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_UDP;
 			else
 				offload = false;
 			break;
 		case IPPROTO_SCTP:
 			if (mp->m_pkthdr.csum_flags & (CSUM_IP_SCTP | CSUM_IP6_SCTP))
 				type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_SCTP;
 			else
 				offload = false;
 			break;
 		default:
 			offload = false;
 			break;
 	}
 
 	if (offload) /* Insert L4 checksum into data descriptors */
 		*olinfo_status |= IXGBE_TXD_POPTS_TXSM << 8;
 
 no_offloads:
 	type_tucmd_mlhl |= IXGBE_ADVTXD_DCMD_DEXT | IXGBE_ADVTXD_DTYP_CTXT;
 
 	/* Now copy bits into descriptor */
 	TXD->vlan_macip_lens = htole32(vlan_macip_lens);
 	TXD->type_tucmd_mlhl = htole32(type_tucmd_mlhl);
 	TXD->seqnum_seed = htole32(0);
 	TXD->mss_l4len_idx = htole32(0);
 
 	/* We've consumed the first desc, adjust counters */
 	if (++ctxd == txr->num_desc)
 		ctxd = 0;
 	txr->next_avail_desc = ctxd;
 	--txr->tx_avail;
 
         return (0);
 }
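
 /*
  * Illustrative sketch only, not part of the driver: how the
  * vlan_macip_lens word of the context descriptor is assembled from the
  * VLAN tag, the MAC header length and the IP header length.  The shift
  * values below are assumptions standing in for IXGBE_ADVTXD_VLAN_SHIFT
  * and IXGBE_ADVTXD_MACLEN_SHIFT; check ixgbe_type.h before relying on
  * them.  sketch_pack_vlan_macip_lens() is a hypothetical name.
  */
```c
#include <stdint.h>

#define SKETCH_VLAN_SHIFT	16	/* assumed value */
#define SKETCH_MACLEN_SHIFT	9	/* assumed value */

static uint32_t
sketch_pack_vlan_macip_lens(uint16_t vtag, uint32_t ehdrlen, uint32_t ip_hlen)
{
	uint32_t v = 0;

	v |= (uint32_t)vtag << SKETCH_VLAN_SHIFT;	/* VLAN tag, top half */
	v |= ehdrlen << SKETCH_MACLEN_SHIFT;		/* MAC header length */
	v |= ip_hlen;					/* IP header length */
	return (v);
}
```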
 
 /**********************************************************************
  *
  *  Setup work for hardware segmentation offload (TSO) on
  *  adapters using advanced tx descriptors
  *
  **********************************************************************/
 static int
 ixgbe_tso_setup(struct tx_ring *txr, struct mbuf *mp,
     u32 *cmd_type_len, u32 *olinfo_status)
 {
 	struct ixgbe_adv_tx_context_desc *TXD;
 	u32 vlan_macip_lens = 0, type_tucmd_mlhl = 0;
 	u32 mss_l4len_idx = 0, paylen;
 	u16 vtag = 0, eh_type;
 	int ctxd, ehdrlen, ip_hlen, tcp_hlen;
 	struct ether_vlan_header *eh;
 #ifdef INET6
 	struct ip6_hdr *ip6;
 #endif
 #ifdef INET
 	struct ip *ip;
 #endif
 	struct tcphdr *th;
 
 	/*
 	 * Determine where frame payload starts.
 	 * Jump over vlan headers if already present
 	 */
 	eh = mtod(mp, struct ether_vlan_header *);
 	if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
 		ehdrlen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
 		eh_type = eh->evl_proto;
 	} else {
 		ehdrlen = ETHER_HDR_LEN;
 		eh_type = eh->evl_encap_proto;
 	}
 
 	switch (ntohs(eh_type)) {
 #ifdef INET6
 	case ETHERTYPE_IPV6:
 		ip6 = (struct ip6_hdr *)(mp->m_data + ehdrlen);
 		/* XXX-BZ For now we do not pretend to support ext. hdrs. */
 		if (ip6->ip6_nxt != IPPROTO_TCP)
 			return (ENXIO);
 		ip_hlen = sizeof(struct ip6_hdr);
 		ip6 = (struct ip6_hdr *)(mp->m_data + ehdrlen);
 		th = (struct tcphdr *)((caddr_t)ip6 + ip_hlen);
 		th->th_sum = in6_cksum_pseudo(ip6, 0, IPPROTO_TCP, 0);
 		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV6;
 		break;
 #endif
 #ifdef INET
 	case ETHERTYPE_IP:
 		ip = (struct ip *)(mp->m_data + ehdrlen);
 		if (ip->ip_p != IPPROTO_TCP)
 			return (ENXIO);
 		ip->ip_sum = 0;
 		ip_hlen = ip->ip_hl << 2;
 		th = (struct tcphdr *)((caddr_t)ip + ip_hlen);
 		th->th_sum = in_pseudo(ip->ip_src.s_addr,
 		    ip->ip_dst.s_addr, htons(IPPROTO_TCP));
 		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV4;
 		/* Tell transmit desc to also do IPv4 checksum. */
 		*olinfo_status |= IXGBE_TXD_POPTS_IXSM << 8;
 		break;
 #endif
 	default:
 		panic("%s: CSUM_TSO but no supported IP version (0x%04x)",
 		    __func__, ntohs(eh_type));
 		break;
 	}
 
 	ctxd = txr->next_avail_desc;
 	TXD = (struct ixgbe_adv_tx_context_desc *) &txr->tx_base[ctxd];
 
 	tcp_hlen = th->th_off << 2;
 
 	/* This is used in the transmit desc in encap */
 	paylen = mp->m_pkthdr.len - ehdrlen - ip_hlen - tcp_hlen;
 
 	/* VLAN MACLEN IPLEN */
 	if (mp->m_flags & M_VLANTAG) {
 		vtag = htole16(mp->m_pkthdr.ether_vtag);
                 vlan_macip_lens |= (vtag << IXGBE_ADVTXD_VLAN_SHIFT);
 	}
 
 	vlan_macip_lens |= ehdrlen << IXGBE_ADVTXD_MACLEN_SHIFT;
 	vlan_macip_lens |= ip_hlen;
 	TXD->vlan_macip_lens = htole32(vlan_macip_lens);
 
 	/* ADV DTYPE TUCMD */
 	type_tucmd_mlhl |= IXGBE_ADVTXD_DCMD_DEXT | IXGBE_ADVTXD_DTYP_CTXT;
 	type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_TCP;
 	TXD->type_tucmd_mlhl = htole32(type_tucmd_mlhl);
 
 	/* MSS L4LEN IDX */
 	mss_l4len_idx |= (mp->m_pkthdr.tso_segsz << IXGBE_ADVTXD_MSS_SHIFT);
 	mss_l4len_idx |= (tcp_hlen << IXGBE_ADVTXD_L4LEN_SHIFT);
 	TXD->mss_l4len_idx = htole32(mss_l4len_idx);
 
 	TXD->seqnum_seed = htole32(0);
 
 	if (++ctxd == txr->num_desc)
 		ctxd = 0;
 
 	txr->tx_avail--;
 	txr->next_avail_desc = ctxd;
 	*cmd_type_len |= IXGBE_ADVTXD_DCMD_TSE;
 	*olinfo_status |= IXGBE_TXD_POPTS_TXSM << 8;
 	*olinfo_status |= paylen << IXGBE_ADVTXD_PAYLEN_SHIFT;
 	++txr->tso_tx;
 	return (0);
 }
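
 /*
  * Illustrative sketch only, not part of the driver: the two values that
  * ixgbe_tso_setup() derives from the header lengths.  The payload length
  * excludes the Ethernet, IP and TCP headers, and the MSS and TCP header
  * length share one descriptor word.  The shift values are assumptions
  * standing in for IXGBE_ADVTXD_MSS_SHIFT and IXGBE_ADVTXD_L4LEN_SHIFT;
  * the sketch_ names are hypothetical.
  */
```c
#include <stdint.h>

#define SKETCH_MSS_SHIFT	16	/* assumed value */
#define SKETCH_L4LEN_SHIFT	8	/* assumed value */

/* TCP payload bytes carried by the whole TSO request */
static uint32_t
sketch_tso_paylen(uint32_t pkt_len, uint32_t ehdrlen, uint32_t ip_hlen,
    uint32_t tcp_hlen)
{
	return (pkt_len - ehdrlen - ip_hlen - tcp_hlen);
}

/* Pack the segment size and TCP header length into one 32-bit word */
static uint32_t
sketch_mss_l4len_idx(uint32_t mss, uint32_t tcp_hlen)
{
	return ((mss << SKETCH_MSS_SHIFT) | (tcp_hlen << SKETCH_L4LEN_SHIFT));
}
```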
 
 
 /**********************************************************************
  *
  *  Examine each tx_buffer in the used queue. If the hardware is done
  *  processing the packet then free associated resources. The
  *  tx_buffer is put back on the free queue.
  *
  **********************************************************************/
 void
 ixgbe_txeof(struct tx_ring *txr)
 {
 	struct adapter		*adapter = txr->adapter;
 #ifdef DEV_NETMAP
 	struct ifnet		*ifp = adapter->ifp;
 #endif
 	u32			work, processed = 0;
 	u32			limit = adapter->tx_process_limit;
 	struct ixgbe_tx_buf	*buf;
 	union ixgbe_adv_tx_desc *txd;
 
 	mtx_assert(&txr->tx_mtx, MA_OWNED);
 
 #ifdef DEV_NETMAP
 	if (ifp->if_capenable & IFCAP_NETMAP) {
 		struct netmap_adapter *na = NA(ifp);
 		struct netmap_kring *kring = &na->tx_rings[txr->me];
 		txd = txr->tx_base;
 		bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
 		    BUS_DMASYNC_POSTREAD);
 		/*
 		 * In netmap mode, all the work is done in the context
 		 * of the client thread. Interrupt handlers only wake up
 		 * clients, which may be sleeping on individual rings
 		 * or on a global resource for all rings.
 		 * To implement tx interrupt mitigation, we wake up the client
 		 * thread roughly every half ring, even if the NIC interrupts
 		 * more frequently. This is implemented as follows:
 		 * - ixgbe_txsync() sets kring->nr_kflags with the index of
 		 *   the slot that should wake up the thread (nkr_num_slots
 		 *   means the user thread should not be woken up);
 		 * - the driver ignores tx interrupts unless netmap_mitigate=0
 		 *   or the slot has the DD bit set.
 		 */
 		if (!netmap_mitigate ||
 		    (kring->nr_kflags < kring->nkr_num_slots &&
 		    txd[kring->nr_kflags].wb.status & IXGBE_TXD_STAT_DD)) {
 			netmap_tx_irq(ifp, txr->me);
 		}
 		return;
 	}
 #endif /* DEV_NETMAP */
 
 	if (txr->tx_avail == txr->num_desc) {
 		txr->busy = 0;
 		return;
 	}
 
 	/* Get work starting point */
 	work = txr->next_to_clean;
 	buf = &txr->tx_buffers[work];
 	txd = &txr->tx_base[work];
 	work -= txr->num_desc; /* The distance to ring end */
         bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
             BUS_DMASYNC_POSTREAD);
 
 	do {
 		union ixgbe_adv_tx_desc *eop = buf->eop;
 		if (eop == NULL) /* No work */
 			break;
 
 		if ((eop->wb.status & IXGBE_TXD_STAT_DD) == 0)
 			break;	/* I/O not complete */
 
 		if (buf->m_head) {
 			txr->bytes +=
 			    buf->m_head->m_pkthdr.len;
 			bus_dmamap_sync(txr->txtag,
 			    buf->map,
 			    BUS_DMASYNC_POSTWRITE);
 			bus_dmamap_unload(txr->txtag,
 			    buf->map);
 			m_freem(buf->m_head);
 			buf->m_head = NULL;
 		}
 		buf->eop = NULL;
 		++txr->tx_avail;
 
 		/* We clean the range if multi segment */
 		while (txd != eop) {
 			++txd;
 			++buf;
 			++work;
 			/* wrap the ring? */
 			if (__predict_false(!work)) {
 				work -= txr->num_desc;
 				buf = txr->tx_buffers;
 				txd = txr->tx_base;
 			}
 			if (buf->m_head) {
 				txr->bytes +=
 				    buf->m_head->m_pkthdr.len;
 				bus_dmamap_sync(txr->txtag,
 				    buf->map,
 				    BUS_DMASYNC_POSTWRITE);
 				bus_dmamap_unload(txr->txtag,
 				    buf->map);
 				m_freem(buf->m_head);
 				buf->m_head = NULL;
 			}
 			++txr->tx_avail;
 			buf->eop = NULL;
 
 		}
 		++txr->packets;
 		++processed;
 
 		/* Try the next packet */
 		++txd;
 		++buf;
 		++work;
 		/* reset with a wrap */
 		if (__predict_false(!work)) {
 			work -= txr->num_desc;
 			buf = txr->tx_buffers;
 			txd = txr->tx_base;
 		}
 		prefetch(txd);
 	} while (__predict_true(--limit));
 
 	bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
 	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
 	work += txr->num_desc;
 	txr->next_to_clean = work;
 
 	/*
 	** Queue hang detection: we know there is work
 	** outstanding, or the first return would have been
 	** taken.  If nothing was cleaned, increment busy;
 	** local_timer checks it and marks the queue HUNG
 	** once it exceeds the maximum number of attempts.
 	*/
 	if ((processed == 0) && (txr->busy != IXGBE_QUEUE_HUNG))
 		++txr->busy;
 	/*
 	** If anything was cleaned, reset the state to 1;
 	** note this will turn off HUNG if it's set.
 	*/
 	if (processed)
 		txr->busy = 1;
 
 	if (txr->tx_avail == txr->num_desc)
 		txr->busy = 0;
 
 	return;
 }
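
 /*
  * Illustrative sketch only, not part of the driver: the biased index
  * used by ixgbe_txeof().  The cleanup loop subtracts the ring size from
  * the unsigned index so that reaching the end of the ring shows up as
  * the counter hitting zero, which is a cheap test.
  * sketch_ring_advance() is a hypothetical name.
  */
```c
#include <stdint.h>

/* Advance a ring position by `steps` slots and return the new index. */
static uint32_t
sketch_ring_advance(uint32_t start, uint32_t steps, uint32_t num_desc)
{
	uint32_t work = start;

	work -= num_desc;		/* bias: 0 now means "ring end" */
	while (steps-- > 0) {
		++work;
		if (work == 0)		/* walked off the end: wrap */
			work -= num_desc;
	}
	work += num_desc;		/* undo the bias */
	return (work);
}
```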
 
 
 #ifdef IXGBE_FDIR
 /*
 ** This routine parses packet headers so that Flow
 ** Director can make a hashed filter table entry,
 ** allowing traffic flows to be identified and kept
 ** on the same CPU.  Parsing every packet would be a
 ** performance hit, so only one packet in every
 ** atr_sample_rate is sampled.
 */
 static void
 ixgbe_atr(struct tx_ring *txr, struct mbuf *mp)
 {
 	struct adapter			*adapter = txr->adapter;
 	struct ix_queue			*que;
 	struct ip			*ip;
 	struct tcphdr			*th;
 	struct udphdr			*uh;
 	struct ether_vlan_header	*eh;
 	union ixgbe_atr_hash_dword	input = {.dword = 0}; 
 	union ixgbe_atr_hash_dword	common = {.dword = 0}; 
 	int  				ehdrlen, ip_hlen;
 	u16				etype;
 
 	eh = mtod(mp, struct ether_vlan_header *);
 	if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
 		ehdrlen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
 		etype = eh->evl_proto;
 	} else {
 		ehdrlen = ETHER_HDR_LEN;
 		etype = eh->evl_encap_proto;
 	}
 
 	/* Only handling IPv4 */
 	if (etype != htons(ETHERTYPE_IP))
 		return;
 
 	ip = (struct ip *)(mp->m_data + ehdrlen);
 	ip_hlen = ip->ip_hl << 2;
 
 	/* check if we're UDP or TCP */
 	switch (ip->ip_p) {
 	case IPPROTO_TCP:
 		th = (struct tcphdr *)((caddr_t)ip + ip_hlen);
 		/* src and dst are inverted */
 		common.port.dst ^= th->th_sport;
 		common.port.src ^= th->th_dport;
 		input.formatted.flow_type ^= IXGBE_ATR_FLOW_TYPE_TCPV4;
 		break;
 	case IPPROTO_UDP:
 		uh = (struct udphdr *)((caddr_t)ip + ip_hlen);
 		/* src and dst are inverted */
 		common.port.dst ^= uh->uh_sport;
 		common.port.src ^= uh->uh_dport;
 		input.formatted.flow_type ^= IXGBE_ATR_FLOW_TYPE_UDPV4;
 		break;
 	default:
 		return;
 	}
 
 	input.formatted.vlan_id = htobe16(mp->m_pkthdr.ether_vtag);
 	if (mp->m_pkthdr.ether_vtag)
 		common.flex_bytes ^= htons(ETHERTYPE_VLAN);
 	else
 		common.flex_bytes ^= etype;
 	common.ip ^= ip->ip_src.s_addr ^ ip->ip_dst.s_addr;
 
 	que = &adapter->queues[txr->me];
 	/*
 	** This assumes the Rx queue and Tx
 	** queue are bound to the same CPU
 	*/
 	ixgbe_fdir_add_signature_filter_82599(&adapter->hw,
 	    input, common, que->msix);
 }
 #endif /* IXGBE_FDIR */
 
 /*
 ** Used to detect a descriptor that has
 ** been merged by Hardware RSC.
 */
 static inline u32
 ixgbe_rsc_count(union ixgbe_adv_rx_desc *rx)
 {
 	return (le32toh(rx->wb.lower.lo_dword.data) &
 	    IXGBE_RXDADV_RSCCNT_MASK) >> IXGBE_RXDADV_RSCCNT_SHIFT;
 }
 
 /*********************************************************************
  *
  *  Initialize Hardware RSC (LRO) feature on 82599
  *  for an RX ring, this is toggled by the LRO capability
  *  even though it is transparent to the stack.
  *
  *  NOTE: since this HW feature only works with IPv4 and
  *        testing has shown soft LRO to be as effective,
  *        it is disabled by default.
  *
  **********************************************************************/
 static void
 ixgbe_setup_hw_rsc(struct rx_ring *rxr)
 {
 	struct	adapter 	*adapter = rxr->adapter;
 	struct	ixgbe_hw	*hw = &adapter->hw;
 	u32			rscctrl, rdrxctl;
 
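 	/* If turning LRO/RSC off we need to disable it */
 	if ((adapter->ifp->if_capenable & IFCAP_LRO) == 0) {
 		rscctrl = IXGBE_READ_REG(hw, IXGBE_RSCCTL(rxr->me));
 		rscctrl &= ~IXGBE_RSCCTL_RSCEN;
 		/* Write the cleared enable bit back to the hardware */
 		IXGBE_WRITE_REG(hw, IXGBE_RSCCTL(rxr->me), rscctrl);
 		return;
 	}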
 
 	rdrxctl = IXGBE_READ_REG(hw, IXGBE_RDRXCTL);
 	rdrxctl &= ~IXGBE_RDRXCTL_RSCFRSTSIZE;
 #ifdef DEV_NETMAP /* crcstrip is optional in netmap */
 	if (adapter->ifp->if_capenable & IFCAP_NETMAP && !ix_crcstrip)
 #endif /* DEV_NETMAP */
 	rdrxctl |= IXGBE_RDRXCTL_CRCSTRIP;
 	rdrxctl |= IXGBE_RDRXCTL_RSCACKC;
 	IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl);
 
 	rscctrl = IXGBE_READ_REG(hw, IXGBE_RSCCTL(rxr->me));
 	rscctrl |= IXGBE_RSCCTL_RSCEN;
 	/*
 	** Limit the total number of descriptors that
 	** can be combined, so it does not exceed 64K
 	*/
 	if (rxr->mbuf_sz == MCLBYTES)
 		rscctrl |= IXGBE_RSCCTL_MAXDESC_16;
 	else if (rxr->mbuf_sz == MJUMPAGESIZE)
 		rscctrl |= IXGBE_RSCCTL_MAXDESC_8;
 	else if (rxr->mbuf_sz == MJUM9BYTES)
 		rscctrl |= IXGBE_RSCCTL_MAXDESC_4;
 	else  /* Using 16K cluster */
 		rscctrl |= IXGBE_RSCCTL_MAXDESC_1;
 
 	IXGBE_WRITE_REG(hw, IXGBE_RSCCTL(rxr->me), rscctrl);
 
 	/* Enable TCP header recognition */
 	IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(0),
 	    (IXGBE_READ_REG(hw, IXGBE_PSRTYPE(0)) |
 	    IXGBE_PSRTYPE_TCPHDR));
 
 	/* Disable RSC for ACK packets */
 	IXGBE_WRITE_REG(hw, IXGBE_RSCDBU,
 	    (IXGBE_RSCDBU_RSCACKDIS | IXGBE_READ_REG(hw, IXGBE_RSCDBU)));
 
 	rxr->hw_rsc = TRUE;
 }
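
 /*
  * Illustrative sketch only, not part of the driver: the choice of the
  * RSC descriptor limit in ixgbe_setup_hw_rsc().  Larger receive buffers
  * get a smaller limit so that buffers times descriptors stays within
  * 64K.  The byte sizes are assumed typical values of MCLBYTES,
  * MJUMPAGESIZE and MJUM9BYTES (2048, 4096 and 9216); they may differ by
  * platform.  sketch_rsc_maxdesc() is a hypothetical name.
  */
```c
#include <stddef.h>

/* Return the maximum number of descriptors merged for a buffer size. */
static unsigned int
sketch_rsc_maxdesc(size_t mbuf_sz)
{
	if (mbuf_sz == 2048)		/* assumed MCLBYTES */
		return (16);
	else if (mbuf_sz == 4096)	/* assumed MJUMPAGESIZE */
		return (8);
	else if (mbuf_sz == 9216)	/* assumed MJUM9BYTES */
		return (4);
	else				/* 16K cluster */
		return (1);
}
```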
 
 /*********************************************************************
  *
  *  Refresh mbuf buffers for RX descriptor rings
  *   - the ring now keeps its own state, so discards due to resource
  *     exhaustion are unnecessary; if an mbuf cannot be obtained the
  *     routine just returns, keeping its placeholder, and can simply
  *     be called again to retry.
  *
  **********************************************************************/
 static void
 ixgbe_refresh_mbufs(struct rx_ring *rxr, int limit)
 {
 	struct adapter		*adapter = rxr->adapter;
 	bus_dma_segment_t	seg[1];
 	struct ixgbe_rx_buf	*rxbuf;
 	struct mbuf		*mp;
 	int			i, j, nsegs, error;
 	bool			refreshed = FALSE;
 
 	i = j = rxr->next_to_refresh;
 	/* Control the loop with one beyond */
 	if (++j == rxr->num_desc)
 		j = 0;
 
 	while (j != limit) {
 		rxbuf = &rxr->rx_buffers[i];
 		if (rxbuf->buf == NULL) {
 			mp = m_getjcl(M_NOWAIT, MT_DATA,
 			    M_PKTHDR, rxr->mbuf_sz);
 			if (mp == NULL)
 				goto update;
 			if (adapter->max_frame_size <= (MCLBYTES - ETHER_ALIGN))
 				m_adj(mp, ETHER_ALIGN);
 		} else
 			mp = rxbuf->buf;
 
 		mp->m_pkthdr.len = mp->m_len = rxr->mbuf_sz;
 
 		/* If we're dealing with an mbuf that was copied rather
 		 * than replaced, there's no need to go through busdma.
 		 */
 		if ((rxbuf->flags & IXGBE_RX_COPY) == 0) {
 			/* Get the memory mapping */
 			bus_dmamap_unload(rxr->ptag, rxbuf->pmap);
 			error = bus_dmamap_load_mbuf_sg(rxr->ptag,
 			    rxbuf->pmap, mp, seg, &nsegs, BUS_DMA_NOWAIT);
 			if (error != 0) {
 				printf("Refresh mbufs: payload dmamap load"
 				    " failure - %d\n", error);
 				m_free(mp);
 				rxbuf->buf = NULL;
 				goto update;
 			}
 			rxbuf->buf = mp;
 			bus_dmamap_sync(rxr->ptag, rxbuf->pmap,
 			    BUS_DMASYNC_PREREAD);
 			rxbuf->addr = rxr->rx_base[i].read.pkt_addr =
 			    htole64(seg[0].ds_addr);
 		} else {
 			rxr->rx_base[i].read.pkt_addr = rxbuf->addr;
 			rxbuf->flags &= ~IXGBE_RX_COPY;
 		}
 
 		refreshed = TRUE;
 		/* Next is precalculated */
 		i = j;
 		rxr->next_to_refresh = i;
 		if (++j == rxr->num_desc)
 			j = 0;
 	}
 update:
 	if (refreshed) /* Update hardware tail index */
 		IXGBE_WRITE_REG(&adapter->hw,
 		    rxr->tail, rxr->next_to_refresh);
 	return;
 }
 
 /*********************************************************************
  *
  *  Allocate memory for rx_buffer structures. Since we use one
  *  rx_buffer per received packet, the maximum number of rx_buffer's
  *  that we'll need is equal to the number of receive descriptors
  *  that we've allocated.
  *
  **********************************************************************/
 int
 ixgbe_allocate_receive_buffers(struct rx_ring *rxr)
 {
 	struct	adapter 	*adapter = rxr->adapter;
 	device_t 		dev = adapter->dev;
 	struct ixgbe_rx_buf 	*rxbuf;
 	int             	bsize, error;
 
 	bsize = sizeof(struct ixgbe_rx_buf) * rxr->num_desc;
 	if (!(rxr->rx_buffers =
 	    (struct ixgbe_rx_buf *) malloc(bsize,
 	    M_DEVBUF, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate rx_buffer memory\n");
 		error = ENOMEM;
 		goto fail;
 	}
 
 	if ((error = bus_dma_tag_create(bus_get_dma_tag(dev),	/* parent */
 				   1, 0,	/* alignment, bounds */
 				   BUS_SPACE_MAXADDR,	/* lowaddr */
 				   BUS_SPACE_MAXADDR,	/* highaddr */
 				   NULL, NULL,		/* filter, filterarg */
 				   MJUM16BYTES,		/* maxsize */
 				   1,			/* nsegments */
 				   MJUM16BYTES,		/* maxsegsize */
 				   0,			/* flags */
 				   NULL,		/* lockfunc */
 				   NULL,		/* lockfuncarg */
 				   &rxr->ptag))) {
 		device_printf(dev, "Unable to create RX DMA tag\n");
 		goto fail;
 	}
 
 	for (int i = 0; i < rxr->num_desc; i++, rxbuf++) {
 		rxbuf = &rxr->rx_buffers[i];
 		error = bus_dmamap_create(rxr->ptag, 0, &rxbuf->pmap);
 		if (error) {
 			device_printf(dev, "Unable to create RX dma map\n");
 			goto fail;
 		}
 	}
 
 	return (0);
 
 fail:
 	/* Frees all, but can handle partial completion */
 	ixgbe_free_receive_structures(adapter);
 	return (error);
 }
 
 static void     
 ixgbe_free_receive_ring(struct rx_ring *rxr)
 { 
 	struct ixgbe_rx_buf       *rxbuf;
 
 	for (int i = 0; i < rxr->num_desc; i++) {
 		rxbuf = &rxr->rx_buffers[i];
 		if (rxbuf->buf != NULL) {
 			bus_dmamap_sync(rxr->ptag, rxbuf->pmap,
 			    BUS_DMASYNC_POSTREAD);
 			bus_dmamap_unload(rxr->ptag, rxbuf->pmap);
 			rxbuf->buf->m_flags |= M_PKTHDR;
 			m_freem(rxbuf->buf);
 			rxbuf->buf = NULL;
 			rxbuf->flags = 0;
 		}
 	}
 }
 
 /*********************************************************************
  *
  *  Initialize a receive ring and its buffers.
  *
  **********************************************************************/
 static int
 ixgbe_setup_receive_ring(struct rx_ring *rxr)
 {
 	struct	adapter 	*adapter;
 	struct ifnet		*ifp;
 	device_t		dev;
 	struct ixgbe_rx_buf	*rxbuf;
 	bus_dma_segment_t	seg[1];
 	struct lro_ctrl		*lro = &rxr->lro;
 	int			rsize, nsegs, error = 0;
 #ifdef DEV_NETMAP
 	struct netmap_adapter *na = NA(rxr->adapter->ifp);
 	struct netmap_slot *slot;
 #endif /* DEV_NETMAP */
 
 	adapter = rxr->adapter;
 	ifp = adapter->ifp;
 	dev = adapter->dev;
 
 	/* Clear the ring contents */
 	IXGBE_RX_LOCK(rxr);
 #ifdef DEV_NETMAP
 	/* same as in ixgbe_setup_transmit_ring() */
 	slot = netmap_reset(na, NR_RX, rxr->me, 0);
 #endif /* DEV_NETMAP */
 	rsize = roundup2(adapter->num_rx_desc *
 	    sizeof(union ixgbe_adv_rx_desc), DBA_ALIGN);
 	bzero((void *)rxr->rx_base, rsize);
 	/* Cache the size */
 	rxr->mbuf_sz = adapter->rx_mbuf_sz;
 
 	/* Free current RX buffer structs and their mbufs */
 	ixgbe_free_receive_ring(rxr);
 
 	/* Now replenish the mbufs */
 	for (int j = 0; j != rxr->num_desc; ++j) {
 		struct mbuf	*mp;
 
 		rxbuf = &rxr->rx_buffers[j];
 #ifdef DEV_NETMAP
 		/*
 		 * In netmap mode, fill the map and set the buffer
 		 * address in the NIC ring, considering the offset
 		 * between the netmap and NIC rings (see comment in
 		 * ixgbe_setup_transmit_ring() ). No need to allocate
 		 * an mbuf, so end the block with a continue;
 		 */
 		if (slot) {
 			int sj = netmap_idx_n2k(&na->rx_rings[rxr->me], j);
 			uint64_t paddr;
 			void *addr;
 
 			addr = PNMB(na, slot + sj, &paddr);
 			netmap_load_map(na, rxr->ptag, rxbuf->pmap, addr);
 			/* Update descriptor and the cached value */
 			rxr->rx_base[j].read.pkt_addr = htole64(paddr);
 			rxbuf->addr = htole64(paddr);
 			continue;
 		}
 #endif /* DEV_NETMAP */
 		rxbuf->flags = 0; 
 		rxbuf->buf = m_getjcl(M_NOWAIT, MT_DATA,
 		    M_PKTHDR, adapter->rx_mbuf_sz);
 		if (rxbuf->buf == NULL) {
 			error = ENOBUFS;
                         goto fail;
 		}
 		mp = rxbuf->buf;
 		mp->m_pkthdr.len = mp->m_len = rxr->mbuf_sz;
 		/* Get the memory mapping */
 		error = bus_dmamap_load_mbuf_sg(rxr->ptag,
 		    rxbuf->pmap, mp, seg,
 		    &nsegs, BUS_DMA_NOWAIT);
 		if (error != 0)
                         goto fail;
 		bus_dmamap_sync(rxr->ptag,
 		    rxbuf->pmap, BUS_DMASYNC_PREREAD);
 		/* Update the descriptor and the cached value */
 		rxr->rx_base[j].read.pkt_addr = htole64(seg[0].ds_addr);
 		rxbuf->addr = htole64(seg[0].ds_addr);
 	}
 
 
 	/* Setup our descriptor indices */
 	rxr->next_to_check = 0;
 	rxr->next_to_refresh = 0;
 	rxr->lro_enabled = FALSE;
 	rxr->rx_copies = 0;
 	rxr->rx_bytes = 0;
 	rxr->vtag_strip = FALSE;
 
 	bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
 	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
 	/*
 	** Now set up the LRO interface:
 	*/
 	if (ixgbe_rsc_enable)
 		ixgbe_setup_hw_rsc(rxr);
 	else if (ifp->if_capenable & IFCAP_LRO) {
 		int err = tcp_lro_init(lro);
 		if (err) {
 			device_printf(dev, "LRO Initialization failed!\n");
 			goto fail;
 		}
 		INIT_DEBUGOUT("RX Soft LRO Initialized\n");
 		rxr->lro_enabled = TRUE;
 		lro->ifp = adapter->ifp;
 	}
 
 	IXGBE_RX_UNLOCK(rxr);
 	return (0);
 
 fail:
 	ixgbe_free_receive_ring(rxr);
 	IXGBE_RX_UNLOCK(rxr);
 	return (error);
 }
 
 /*********************************************************************
  *
  *  Initialize all receive rings.
  *
  **********************************************************************/
 int
 ixgbe_setup_receive_structures(struct adapter *adapter)
 {
 	struct rx_ring *rxr = adapter->rx_rings;
 	int j;
 
 	for (j = 0; j < adapter->num_queues; j++, rxr++)
 		if (ixgbe_setup_receive_ring(rxr))
 			goto fail;
 
 	return (0);
 fail:
 	/*
 	 * Free RX buffers allocated so far, we will only handle
 	 * the rings that completed, the failing case will have
 	 * cleaned up for itself. 'j' failed, so it's the terminus.
 	 */
 	for (int i = 0; i < j; ++i) {
 		rxr = &adapter->rx_rings[i];
 		ixgbe_free_receive_ring(rxr);
 	}
 
 	return (ENOBUFS);
 }
 
 
 /*********************************************************************
  *
  *  Free all receive rings.
  *
  **********************************************************************/
 void
 ixgbe_free_receive_structures(struct adapter *adapter)
 {
 	struct rx_ring *rxr = adapter->rx_rings;
 
 	INIT_DEBUGOUT("ixgbe_free_receive_structures: begin");
 
 	for (int i = 0; i < adapter->num_queues; i++, rxr++) {
 		struct lro_ctrl		*lro = &rxr->lro;
 		ixgbe_free_receive_buffers(rxr);
 		/* Free LRO memory */
 		tcp_lro_free(lro);
 		/* Free the ring memory as well */
 		ixgbe_dma_free(adapter, &rxr->rxdma);
 	}
 
 	free(adapter->rx_rings, M_DEVBUF);
 }
 
 
 /*********************************************************************
  *
  *  Free receive ring data structures
  *
  **********************************************************************/
 void
 ixgbe_free_receive_buffers(struct rx_ring *rxr)
 {
 	struct adapter		*adapter = rxr->adapter;
 	struct ixgbe_rx_buf	*rxbuf;
 
 	INIT_DEBUGOUT("ixgbe_free_receive_buffers: begin");
 
 	/* Cleanup any existing buffers */
 	if (rxr->rx_buffers != NULL) {
 		for (int i = 0; i < adapter->num_rx_desc; i++) {
 			rxbuf = &rxr->rx_buffers[i];
 			if (rxbuf->buf != NULL) {
 				bus_dmamap_sync(rxr->ptag, rxbuf->pmap,
 				    BUS_DMASYNC_POSTREAD);
 				bus_dmamap_unload(rxr->ptag, rxbuf->pmap);
 				rxbuf->buf->m_flags |= M_PKTHDR;
 				m_freem(rxbuf->buf);
 			}
 			rxbuf->buf = NULL;
 			if (rxbuf->pmap != NULL) {
 				bus_dmamap_destroy(rxr->ptag, rxbuf->pmap);
 				rxbuf->pmap = NULL;
 			}
 		}
 		if (rxr->rx_buffers != NULL) {
 			free(rxr->rx_buffers, M_DEVBUF);
 			rxr->rx_buffers = NULL;
 		}
 	}
 
 	if (rxr->ptag != NULL) {
 		bus_dma_tag_destroy(rxr->ptag);
 		rxr->ptag = NULL;
 	}
 
 	return;
 }
 
 static __inline void
 ixgbe_rx_input(struct rx_ring *rxr, struct ifnet *ifp, struct mbuf *m, u32 ptype)
 {
                  
         /*
          * ATM LRO is only for IP/TCP packets and TCP checksum of the packet
          * should be computed by hardware. Also it should not have VLAN tag in
          * ethernet header.  In case of IPv6 we do not yet support ext. hdrs.
          */
         if (rxr->lro_enabled &&
             (ifp->if_capenable & IFCAP_VLAN_HWTAGGING) != 0 &&
             (ptype & IXGBE_RXDADV_PKTTYPE_ETQF) == 0 &&
             ((ptype & (IXGBE_RXDADV_PKTTYPE_IPV4 | IXGBE_RXDADV_PKTTYPE_TCP)) ==
             (IXGBE_RXDADV_PKTTYPE_IPV4 | IXGBE_RXDADV_PKTTYPE_TCP) ||
             (ptype & (IXGBE_RXDADV_PKTTYPE_IPV6 | IXGBE_RXDADV_PKTTYPE_TCP)) ==
             (IXGBE_RXDADV_PKTTYPE_IPV6 | IXGBE_RXDADV_PKTTYPE_TCP)) &&
             (m->m_pkthdr.csum_flags & (CSUM_DATA_VALID | CSUM_PSEUDO_HDR)) ==
             (CSUM_DATA_VALID | CSUM_PSEUDO_HDR)) {
                 /*
                  * Send to the stack if:
                  **  - LRO not enabled, or
                  **  - no LRO resources, or
                  **  - lro enqueue fails
                  */
                 if (rxr->lro.lro_cnt != 0)
                         if (tcp_lro_rx(&rxr->lro, m, 0) == 0)
                                 return;
         }
 	IXGBE_RX_UNLOCK(rxr);
         (*ifp->if_input)(ifp, m);
 	IXGBE_RX_LOCK(rxr);
 }
 
 static __inline void
 ixgbe_rx_discard(struct rx_ring *rxr, int i)
 {
 	struct ixgbe_rx_buf	*rbuf;
 
 	rbuf = &rxr->rx_buffers[i];
 
 
 	/*
 	** With advanced descriptors the writeback
 	** clobbers the buffer addrs, so it's easier
 	** to just free the existing mbufs and take
 	** the normal refresh path to get new buffers
 	** and mapping.
 	*/
 
 	if (rbuf->fmp != NULL) {/* Partial chain ? */
 		rbuf->fmp->m_flags |= M_PKTHDR;
 		m_freem(rbuf->fmp);
 		rbuf->fmp = NULL;
 		rbuf->buf = NULL; /* rbuf->buf is part of fmp's chain */
 	} else if (rbuf->buf) {
 		m_free(rbuf->buf);
 		rbuf->buf = NULL;
 	}
 	bus_dmamap_unload(rxr->ptag, rbuf->pmap);
 
 	rbuf->flags = 0;
  
 	return;
 }
 
 
 /*********************************************************************
  *
  *  This routine executes in interrupt context. It replenishes
  *  the mbufs in the descriptor and sends data which has been
  *  dma'ed into host memory to upper layer.
  *
  *  Return TRUE for more work, FALSE for all clean.
  *********************************************************************/
 bool
 ixgbe_rxeof(struct ix_queue *que)
 {
 	struct adapter		*adapter = que->adapter;
 	struct rx_ring		*rxr = que->rxr;
 	struct ifnet		*ifp = adapter->ifp;
 	struct lro_ctrl		*lro = &rxr->lro;
 	int			i, nextp, processed = 0;
 	u32			staterr = 0;
 	u32			count = adapter->rx_process_limit;
 	union ixgbe_adv_rx_desc	*cur;
 	struct ixgbe_rx_buf	*rbuf, *nbuf;
 	u16			pkt_info;
 
 	IXGBE_RX_LOCK(rxr);
 
 #ifdef DEV_NETMAP
 	/* Same as the txeof routine: wakeup clients on intr. */
 	if (netmap_rx_irq(ifp, rxr->me, &processed)) {
 		IXGBE_RX_UNLOCK(rxr);
 		return (FALSE);
 	}
 #endif /* DEV_NETMAP */
 
 	for (i = rxr->next_to_check; count != 0;) {
 		struct mbuf	*sendmp, *mp;
 		u32		rsc, ptype;
 		u16		len;
 		u16		vtag = 0;
 		bool		eop;
  
 		/* Sync the ring. */
 		bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
 		    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
 
 		cur = &rxr->rx_base[i];
 		staterr = le32toh(cur->wb.upper.status_error);
 		pkt_info = le16toh(cur->wb.lower.lo_dword.hs_rss.pkt_info);
 
 		if ((staterr & IXGBE_RXD_STAT_DD) == 0)
 			break;
 		if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
 			break;
 
 		count--;
 		sendmp = NULL;
 		nbuf = NULL;
 		rsc = 0;
 		cur->wb.upper.status_error = 0;
 		rbuf = &rxr->rx_buffers[i];
 		mp = rbuf->buf;
 
 		len = le16toh(cur->wb.upper.length);
 		ptype = le32toh(cur->wb.lower.lo_dword.data) &
 		    IXGBE_RXDADV_PKTTYPE_MASK;
 		eop = ((staterr & IXGBE_RXD_STAT_EOP) != 0);
 
 		/* Make sure bad packets are discarded */
 		if (eop && (staterr & IXGBE_RXDADV_ERR_FRAME_ERR_MASK) != 0) {
 #if __FreeBSD_version >= 1100036
 			if (IXGBE_IS_VF(adapter))
 				if_inc_counter(ifp, IFCOUNTER_IERRORS, 1);
 #endif
 			rxr->rx_discarded++;
 			ixgbe_rx_discard(rxr, i);
 			goto next_desc;
 		}
 
 		/*
 		** On 82599 which supports a hardware
 		** LRO (called HW RSC), packets need
 		** not be fragmented across sequential
 		** descriptors, rather the next descriptor
 		** is indicated in bits of the descriptor.
 		** This also means that we might process
 		** more than one packet at a time, something
 		** that has never been true before, it
 		** required eliminating global chain pointers
 		** in favor of what we are doing here.  -jfv
 		*/
 		if (!eop) {
 			/*
 			** Figure out the next descriptor
 			** of this frame.
 			*/
 			if (rxr->hw_rsc == TRUE) {
 				rsc = ixgbe_rsc_count(cur);
 				rxr->rsc_num += (rsc - 1);
 			}
 			if (rsc) { /* Get hardware index */
 				nextp = ((staterr &
 				    IXGBE_RXDADV_NEXTP_MASK) >>
 				    IXGBE_RXDADV_NEXTP_SHIFT);
 			} else { /* Just sequential */
 				nextp = i + 1;
 				if (nextp == adapter->num_rx_desc)
 					nextp = 0;
 			}
 			nbuf = &rxr->rx_buffers[nextp];
 			prefetch(nbuf);
 		}
 		/*
 		** Rather than using the fmp/lmp global pointers
 		** we now keep the head of a packet chain in the
 		** buffer struct and pass this along from one
 		** descriptor to the next, until we get EOP.
 		*/
 		mp->m_len = len;
 		/*
 		** See if there is a stored head
 		** that determines what we are
 		*/
 		sendmp = rbuf->fmp;
 		if (sendmp != NULL) {  /* secondary frag */
 			rbuf->buf = rbuf->fmp = NULL;
 			mp->m_flags &= ~M_PKTHDR;
 			sendmp->m_pkthdr.len += mp->m_len;
 		} else {
 			/*
 			 * Optimize.  This might be a small packet,
 			 * maybe just a TCP ACK.  Do a fast copy that
 			 * is cache aligned into a new mbuf, and
 			 * leave the old mbuf+cluster for re-use.
 			 */
 			if (eop && len <= IXGBE_RX_COPY_LEN) {
 				sendmp = m_gethdr(M_NOWAIT, MT_DATA);
 				if (sendmp != NULL) {
 					sendmp->m_data +=
 					    IXGBE_RX_COPY_ALIGN;
 					ixgbe_bcopy(mp->m_data,
 					    sendmp->m_data, len);
 					sendmp->m_len = len;
 					rxr->rx_copies++;
 					rbuf->flags |= IXGBE_RX_COPY;
 				}
 			}
 			if (sendmp == NULL) {
 				rbuf->buf = rbuf->fmp = NULL;
 				sendmp = mp;
 			}
 
 			/* first desc of a non-ps chain */
 			sendmp->m_flags |= M_PKTHDR;
 			sendmp->m_pkthdr.len = mp->m_len;
 		}
 		++processed;
 
 		/* Pass the head pointer on */
 		if (eop == 0) {
 			nbuf->fmp = sendmp;
 			sendmp = NULL;
 			mp->m_next = nbuf->buf;
 		} else { /* Sending this frame */
 			sendmp->m_pkthdr.rcvif = ifp;
 			rxr->rx_packets++;
 			/* capture data for AIM */
 			rxr->bytes += sendmp->m_pkthdr.len;
 			rxr->rx_bytes += sendmp->m_pkthdr.len;
 			/* Process vlan info */
 			if ((rxr->vtag_strip) &&
 			    (staterr & IXGBE_RXD_STAT_VP))
 				vtag = le16toh(cur->wb.upper.vlan);
 			if (vtag) {
 				sendmp->m_pkthdr.ether_vtag = vtag;
 				sendmp->m_flags |= M_VLANTAG;
 			}
 			if ((ifp->if_capenable & IFCAP_RXCSUM) != 0)
 				ixgbe_rx_checksum(staterr, sendmp, ptype);
 
                         /*
                          * In case of multiqueue, we have RXCSUM.PCSD bit set
                          * and never cleared. This means we have RSS hash
                          * available to be used.   
                          */
                         if (adapter->num_queues > 1) {
                                 sendmp->m_pkthdr.flowid =
                                     le32toh(cur->wb.lower.hi_dword.rss);
                                 switch (pkt_info & IXGBE_RXDADV_RSSTYPE_MASK) {  
                                     case IXGBE_RXDADV_RSSTYPE_IPV4:
                                         M_HASHTYPE_SET(sendmp,
                                             M_HASHTYPE_RSS_IPV4);
                                         break;
                                     case IXGBE_RXDADV_RSSTYPE_IPV4_TCP:
                                         M_HASHTYPE_SET(sendmp,
                                             M_HASHTYPE_RSS_TCP_IPV4);
                                         break;
                                     case IXGBE_RXDADV_RSSTYPE_IPV6:
                                         M_HASHTYPE_SET(sendmp,
                                             M_HASHTYPE_RSS_IPV6);
                                         break;
                                     case IXGBE_RXDADV_RSSTYPE_IPV6_TCP:
                                         M_HASHTYPE_SET(sendmp,
                                             M_HASHTYPE_RSS_TCP_IPV6);
                                         break;
                                     case IXGBE_RXDADV_RSSTYPE_IPV6_EX:
                                         M_HASHTYPE_SET(sendmp,
                                             M_HASHTYPE_RSS_IPV6_EX);
                                         break;
                                     case IXGBE_RXDADV_RSSTYPE_IPV6_TCP_EX:
                                         M_HASHTYPE_SET(sendmp,
                                             M_HASHTYPE_RSS_TCP_IPV6_EX);
                                         break;
 #if __FreeBSD_version > 1100000
                                     case IXGBE_RXDADV_RSSTYPE_IPV4_UDP:
                                         M_HASHTYPE_SET(sendmp,
                                             M_HASHTYPE_RSS_UDP_IPV4);
                                         break;
                                     case IXGBE_RXDADV_RSSTYPE_IPV6_UDP:
                                         M_HASHTYPE_SET(sendmp,
                                             M_HASHTYPE_RSS_UDP_IPV6);
                                         break;
                                     case IXGBE_RXDADV_RSSTYPE_IPV6_UDP_EX:
                                         M_HASHTYPE_SET(sendmp,
                                             M_HASHTYPE_RSS_UDP_IPV6_EX);
                                         break;
 #endif
                                     default:
                                         M_HASHTYPE_SET(sendmp,
-                                            M_HASHTYPE_OPAQUE);
+                                            M_HASHTYPE_OPAQUE_HASH);
                                 }
                         } else {
                                 sendmp->m_pkthdr.flowid = que->msix;
 				M_HASHTYPE_SET(sendmp, M_HASHTYPE_OPAQUE);
 			}
 		}
 next_desc:
 		bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
 		    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
 		/* Advance our pointers to the next descriptor. */
 		if (++i == rxr->num_desc)
 			i = 0;
 
 		/* Now send to the stack or do LRO */
 		if (sendmp != NULL) {
 			rxr->next_to_check = i;
 			ixgbe_rx_input(rxr, ifp, sendmp, ptype);
 			i = rxr->next_to_check;
 		}
 
                /* Every 8 descriptors we go to refresh mbufs */
 		if (processed == 8) {
 			ixgbe_refresh_mbufs(rxr, i);
 			processed = 0;
 		}
 	}
 
 	/* Refresh any remaining buf structs */
 	if (ixgbe_rx_unrefreshed(rxr))
 		ixgbe_refresh_mbufs(rxr, i);
 
 	rxr->next_to_check = i;
 
 	/*
 	 * Flush any outstanding LRO work
 	 */
 	tcp_lro_flush_all(lro);
 
 	IXGBE_RX_UNLOCK(rxr);
 
 	/*
 	** Still have cleaning to do?
 	*/
 	if ((staterr & IXGBE_RXD_STAT_DD) != 0)
 		return (TRUE);
 	else
 		return (FALSE);
 }
 
 
 /*********************************************************************
  *
  *  Verify that the hardware indicated that the checksum is valid.
  *  Inform the stack about the status of checksum so that stack
  *  doesn't spend time verifying the checksum.
  *
  *********************************************************************/
 static void
 ixgbe_rx_checksum(u32 staterr, struct mbuf * mp, u32 ptype)
 {
 	u16	status = (u16) staterr;
 	u8	errors = (u8) (staterr >> 24);
 	bool	sctp = false;
 
 	if ((ptype & IXGBE_RXDADV_PKTTYPE_ETQF) == 0 &&
 	    (ptype & IXGBE_RXDADV_PKTTYPE_SCTP) != 0)
 		sctp = true;
 
 	/* IPv4 checksum */
 	if (status & IXGBE_RXD_STAT_IPCS) {
 		mp->m_pkthdr.csum_flags |= CSUM_L3_CALC;
 		/* IP Checksum Good */
 		if (!(errors & IXGBE_RXD_ERR_IPE))
 			mp->m_pkthdr.csum_flags |= CSUM_L3_VALID;
 	}
 	/* TCP/UDP/SCTP checksum */
 	if (status & IXGBE_RXD_STAT_L4CS) {
 		mp->m_pkthdr.csum_flags |= CSUM_L4_CALC;
 		if (!(errors & IXGBE_RXD_ERR_TCPE)) {
 			mp->m_pkthdr.csum_flags |= CSUM_L4_VALID;
 			if (!sctp)
 				mp->m_pkthdr.csum_data = htons(0xffff);
 		}
 	}
 }
 
 /********************************************************************
  * Manage DMA'able memory.
  *******************************************************************/
 static void
 ixgbe_dmamap_cb(void *arg, bus_dma_segment_t * segs, int nseg, int error)
 {
 	if (error)
 		return;
 	*(bus_addr_t *) arg = segs->ds_addr;
 	return;
 }
 
 int
 ixgbe_dma_malloc(struct adapter *adapter, bus_size_t size,
 		struct ixgbe_dma_alloc *dma, int mapflags)
 {
 	device_t dev = adapter->dev;
 	int             r;
 
 	r = bus_dma_tag_create(bus_get_dma_tag(adapter->dev),	/* parent */
 			       DBA_ALIGN, 0,	/* alignment, bounds */
 			       BUS_SPACE_MAXADDR,	/* lowaddr */
 			       BUS_SPACE_MAXADDR,	/* highaddr */
 			       NULL, NULL,	/* filter, filterarg */
 			       size,	/* maxsize */
 			       1,	/* nsegments */
 			       size,	/* maxsegsize */
 			       BUS_DMA_ALLOCNOW,	/* flags */
 			       NULL,	/* lockfunc */
 			       NULL,	/* lockfuncarg */
 			       &dma->dma_tag);
 	if (r != 0) {
 		device_printf(dev,"ixgbe_dma_malloc: bus_dma_tag_create failed; "
 		       "error %u\n", r);
 		goto fail_0;
 	}
 	r = bus_dmamem_alloc(dma->dma_tag, (void **)&dma->dma_vaddr,
 			     BUS_DMA_NOWAIT, &dma->dma_map);
 	if (r != 0) {
 		device_printf(dev,"ixgbe_dma_malloc: bus_dmamem_alloc failed; "
 		       "error %u\n", r);
 		goto fail_1;
 	}
 	r = bus_dmamap_load(dma->dma_tag, dma->dma_map, dma->dma_vaddr,
 			    size,
 			    ixgbe_dmamap_cb,
 			    &dma->dma_paddr,
 			    mapflags | BUS_DMA_NOWAIT);
 	if (r != 0) {
 		device_printf(dev,"ixgbe_dma_malloc: bus_dmamap_load failed; "
 		       "error %u\n", r);
 		goto fail_2;
 	}
 	dma->dma_size = size;
 	return (0);
 fail_2:
 	bus_dmamem_free(dma->dma_tag, dma->dma_vaddr, dma->dma_map);
 fail_1:
 	bus_dma_tag_destroy(dma->dma_tag);
 fail_0:
 	dma->dma_tag = NULL;
 	return (r);
 }
 
 void
 ixgbe_dma_free(struct adapter *adapter, struct ixgbe_dma_alloc *dma)
 {
 	bus_dmamap_sync(dma->dma_tag, dma->dma_map,
 	    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
 	bus_dmamap_unload(dma->dma_tag, dma->dma_map);
 	bus_dmamem_free(dma->dma_tag, dma->dma_vaddr, dma->dma_map);
 	bus_dma_tag_destroy(dma->dma_tag);
 }
 
 
 /*********************************************************************
  *
  *  Allocate memory for the transmit and receive rings, and then
  *  the descriptors associated with each, called only once at attach.
  *
  **********************************************************************/
 int
 ixgbe_allocate_queues(struct adapter *adapter)
 {
 	device_t	dev = adapter->dev;
 	struct ix_queue	*que;
 	struct tx_ring	*txr;
 	struct rx_ring	*rxr;
 	int rsize, tsize, error = IXGBE_SUCCESS;
 	int txconf = 0, rxconf = 0;
 #ifdef PCI_IOV
 	enum ixgbe_iov_mode iov_mode;
 #endif
 
         /* First allocate the top level queue structs */
         if (!(adapter->queues =
             (struct ix_queue *) malloc(sizeof(struct ix_queue) *
             adapter->num_queues, M_DEVBUF, M_NOWAIT | M_ZERO))) {
                 device_printf(dev, "Unable to allocate queue memory\n");
                 error = ENOMEM;
                 goto fail;
         }
 
 	/* First allocate the TX ring struct memory */
 	if (!(adapter->tx_rings =
 	    (struct tx_ring *) malloc(sizeof(struct tx_ring) *
 	    adapter->num_queues, M_DEVBUF, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate TX ring memory\n");
 		error = ENOMEM;
 		goto tx_fail;
 	}
 
 	/* Next allocate the RX */
 	if (!(adapter->rx_rings =
 	    (struct rx_ring *) malloc(sizeof(struct rx_ring) *
 	    adapter->num_queues, M_DEVBUF, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate RX ring memory\n");
 		error = ENOMEM;
 		goto rx_fail;
 	}
 
 	/* For the ring itself */
 	tsize = roundup2(adapter->num_tx_desc *
 	    sizeof(union ixgbe_adv_tx_desc), DBA_ALIGN);
 
 #ifdef PCI_IOV
 	iov_mode = ixgbe_get_iov_mode(adapter);
 	adapter->pool = ixgbe_max_vfs(iov_mode);
 #else
 	adapter->pool = 0;
 #endif
 	/*
 	 * Now set up the TX queues, txconf is needed to handle the
 	 * possibility that things fail midcourse and we need to
 	 * undo memory gracefully
 	 */ 
 	for (int i = 0; i < adapter->num_queues; i++, txconf++) {
 		/* Set up some basics */
 		txr = &adapter->tx_rings[i];
 		txr->adapter = adapter;
 #ifdef PCI_IOV
 		txr->me = ixgbe_pf_que_index(iov_mode, i);
 #else
 		txr->me = i;
 #endif
 		txr->num_desc = adapter->num_tx_desc;
 
 		/* Initialize the TX side lock */
 		snprintf(txr->mtx_name, sizeof(txr->mtx_name), "%s:tx(%d)",
 		    device_get_nameunit(dev), txr->me);
 		mtx_init(&txr->tx_mtx, txr->mtx_name, NULL, MTX_DEF);
 
 		if (ixgbe_dma_malloc(adapter, tsize,
 			&txr->txdma, BUS_DMA_NOWAIT)) {
 			device_printf(dev,
 			    "Unable to allocate TX Descriptor memory\n");
 			error = ENOMEM;
 			goto err_tx_desc;
 		}
 		txr->tx_base = (union ixgbe_adv_tx_desc *)txr->txdma.dma_vaddr;
 		bzero((void *)txr->tx_base, tsize);
 
         	/* Now allocate transmit buffers for the ring */
         	if (ixgbe_allocate_transmit_buffers(txr)) {
 			device_printf(dev,
 			    "Critical Failure setting up transmit buffers\n");
 			error = ENOMEM;
 			goto err_tx_desc;
         	}
 #ifndef IXGBE_LEGACY_TX
 		/* Allocate a buf ring */
 		txr->br = buf_ring_alloc(IXGBE_BR_SIZE, M_DEVBUF,
 		    M_WAITOK, &txr->tx_mtx);
 		if (txr->br == NULL) {
 			device_printf(dev,
 			    "Critical Failure setting up buf ring\n");
 			error = ENOMEM;
 			goto err_tx_desc;
         	}
 #endif
 	}
 
 	/*
 	 * Next the RX queues...
 	 */ 
 	rsize = roundup2(adapter->num_rx_desc *
 	    sizeof(union ixgbe_adv_rx_desc), DBA_ALIGN);
 	for (int i = 0; i < adapter->num_queues; i++, rxconf++) {
 		rxr = &adapter->rx_rings[i];
 		/* Set up some basics */
 		rxr->adapter = adapter;
 #ifdef PCI_IOV
 		rxr->me = ixgbe_pf_que_index(iov_mode, i);
 #else
 		rxr->me = i;
 #endif
 		rxr->num_desc = adapter->num_rx_desc;
 
 		/* Initialize the RX side lock */
 		snprintf(rxr->mtx_name, sizeof(rxr->mtx_name), "%s:rx(%d)",
 		    device_get_nameunit(dev), rxr->me);
 		mtx_init(&rxr->rx_mtx, rxr->mtx_name, NULL, MTX_DEF);
 
 		if (ixgbe_dma_malloc(adapter, rsize,
 			&rxr->rxdma, BUS_DMA_NOWAIT)) {
 			device_printf(dev,
 			    "Unable to allocate RxDescriptor memory\n");
 			error = ENOMEM;
 			goto err_rx_desc;
 		}
 		rxr->rx_base = (union ixgbe_adv_rx_desc *)rxr->rxdma.dma_vaddr;
 		bzero((void *)rxr->rx_base, rsize);
 
         	/* Allocate receive buffers for the ring*/
 		if (ixgbe_allocate_receive_buffers(rxr)) {
 			device_printf(dev,
 			    "Critical Failure setting up receive buffers\n");
 			error = ENOMEM;
 			goto err_rx_desc;
 		}
 	}
 
 	/*
 	** Finally set up the queue holding structs
 	*/
 	for (int i = 0; i < adapter->num_queues; i++) {
 		que = &adapter->queues[i];
 		que->adapter = adapter;
 		que->me = i;
 		que->txr = &adapter->tx_rings[i];
 		que->rxr = &adapter->rx_rings[i];
 	}
 
 	return (0);
 
 err_rx_desc:
 	for (rxr = adapter->rx_rings; rxconf > 0; rxr++, rxconf--)
 		ixgbe_dma_free(adapter, &rxr->rxdma);
 err_tx_desc:
 	for (txr = adapter->tx_rings; txconf > 0; txr++, txconf--)
 		ixgbe_dma_free(adapter, &txr->txdma);
 	free(adapter->rx_rings, M_DEVBUF);
 rx_fail:
 	free(adapter->tx_rings, M_DEVBUF);
 tx_fail:
 	free(adapter->queues, M_DEVBUF);
 fail:
 	return (error);
 }
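Note on the refresh loop in ixgbe_refresh_mbufs() above: the loop is driven by an index `j` that runs one slot ahead of the slot `i` being refilled, and it stops as soon as `j` reaches `limit`. The following is a minimal standalone sketch of that index arithmetic only. It assumes a hypothetical 8-slot ring and uses made-up names (`toy_ring`, `toy_refresh`, `NUM_DESC`); there are no mbufs, no busdma and no hardware tail register, so it is an illustration rather than the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_DESC 8			/* hypothetical ring size */

struct toy_ring {
	bool	filled[NUM_DESC];	/* stands in for rxbuf->buf != NULL */
	int	next_to_refresh;	/* first slot still needing a buffer */
};

/*
 * Refill slots starting at next_to_refresh and stop when the
 * look-ahead index j reaches limit.  Returns the number of slots
 * refilled.  The slot just before limit is left untouched, which
 * mirrors the "one beyond" loop control in the driver.
 */
static int
toy_refresh(struct toy_ring *r, int limit)
{
	int i, j, refreshed = 0;

	assert(limit >= 0 && limit < NUM_DESC);

	i = j = r->next_to_refresh;
	/* Control the loop with one beyond */
	if (++j == NUM_DESC)
		j = 0;

	while (j != limit) {
		r->filled[i] = true;	/* the driver allocates and maps here */
		refreshed++;

		/* Next is precalculated */
		i = j;
		r->next_to_refresh = i;
		if (++j == NUM_DESC)
			j = 0;
	}
	return (refreshed);
}
```

With `next_to_refresh` = 6 and `limit` = 2, the sketch refills slots 6, 7 and 0 and leaves `next_to_refresh` at 1, so the refill position always stays at least one slot behind `limit`. In the driver, an allocation or mapping failure jumps out of the loop early and the saved `next_to_refresh` lets a later call resume from the same slot.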
Index: projects/vnet/sys/dev/ixl/ixl_txrx.c
===================================================================
--- projects/vnet/sys/dev/ixl/ixl_txrx.c	(revision 301546)
+++ projects/vnet/sys/dev/ixl/ixl_txrx.c	(revision 301547)
@@ -1,1831 +1,1831 @@
 /******************************************************************************
 
   Copyright (c) 2013-2015, Intel Corporation 
   All rights reserved.
   
   Redistribution and use in source and binary forms, with or without 
   modification, are permitted provided that the following conditions are met:
   
    1. Redistributions of source code must retain the above copyright notice, 
       this list of conditions and the following disclaimer.
   
    2. Redistributions in binary form must reproduce the above copyright 
       notice, this list of conditions and the following disclaimer in the 
       documentation and/or other materials provided with the distribution.
   
    3. Neither the name of the Intel Corporation nor the names of its 
       contributors may be used to endorse or promote products derived from 
       this software without specific prior written permission.
   
   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
   AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 
   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 
   ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE 
   LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 
   CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 
   SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 
   INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 
   CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 
   ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
   POSSIBILITY OF SUCH DAMAGE.
 
 ******************************************************************************/
 /*$FreeBSD$*/
 
 /*
 **	IXL driver TX/RX Routines:
 **	    This was separated to allow usage by
 ** 	    both the BASE and the VF drivers.
 */
 
 #ifndef IXL_STANDALONE_BUILD
 #include "opt_inet.h"
 #include "opt_inet6.h"
 #include "opt_rss.h"
 #endif
 
 #include "ixl.h"
 
 #ifdef RSS
 #include <net/rss_config.h>
 #endif
 
 /* Local Prototypes */
 static void	ixl_rx_checksum(struct mbuf *, u32, u32, u8);
 static void	ixl_refresh_mbufs(struct ixl_queue *, int);
 static int      ixl_xmit(struct ixl_queue *, struct mbuf **);
 static int	ixl_tx_setup_offload(struct ixl_queue *,
 		    struct mbuf *, u32 *, u32 *);
 static bool	ixl_tso_setup(struct ixl_queue *, struct mbuf *);
 
 static __inline void ixl_rx_discard(struct rx_ring *, int);
 static __inline void ixl_rx_input(struct rx_ring *, struct ifnet *,
 		    struct mbuf *, u8);
 
 #ifdef DEV_NETMAP
 #include <dev/netmap/if_ixl_netmap.h>
 #endif /* DEV_NETMAP */
 
 /*
 ** Multiqueue Transmit driver
 */
 int
 ixl_mq_start(struct ifnet *ifp, struct mbuf *m)
 {
 	struct ixl_vsi		*vsi = ifp->if_softc;
 	struct ixl_queue	*que;
 	struct tx_ring		*txr;
 	int 			err, i;
 #ifdef RSS
 	u32			bucket_id;
 #endif
 
 	/*
 	** Which queue to use:
 	**
 	** When doing RSS, map it to the same outbound
 	** queue as the incoming flow would be mapped to.
 	** If everything is set up correctly, it should be
 	** the same bucket as the one for the current CPU.
 	*/
 	if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) {
 #ifdef  RSS
 		if (rss_hash2bucket(m->m_pkthdr.flowid,
 		    M_HASHTYPE_GET(m), &bucket_id) == 0) {
 			i = bucket_id % vsi->num_queues;
 		} else
 #endif
 			i = m->m_pkthdr.flowid % vsi->num_queues;
 	} else
 		i = curcpu % vsi->num_queues;
 	/*
 	** This may not be perfect, but until something
 	** better comes along it will keep from scheduling
 	** on stalled queues.
 	*/
 	if (((1 << i) & vsi->active_queues) == 0) {
 		/* ffsl() is 1-based and returns 0 when no bit is set */
 		int bit = ffsl(vsi->active_queues);
 		if (bit != 0)
 			i = bit - 1;
 	}
 
 	que = &vsi->queues[i];
 	txr = &que->txr;
 
 	err = drbr_enqueue(ifp, txr->br, m);
 	if (err)
 		return (err);
 	if (IXL_TX_TRYLOCK(txr)) {
 		ixl_mq_start_locked(ifp, txr);
 		IXL_TX_UNLOCK(txr);
 	} else
 		taskqueue_enqueue(que->tq, &que->tx_task);
 
 	return (0);
 }
 
 int
 ixl_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr)
 {
 	struct ixl_queue	*que = txr->que;
 	struct ixl_vsi		*vsi = que->vsi;
 	struct mbuf		*next;
 	int			err = 0;
 
 
 	if (((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) ||
 	    vsi->link_active == 0)
 		return (ENETDOWN);
 
 	/* Process the transmit queue */
 	while ((next = drbr_peek(ifp, txr->br)) != NULL) {
 		if ((err = ixl_xmit(que, &next)) != 0) {
 			if (next == NULL)
 				drbr_advance(ifp, txr->br);
 			else
 				drbr_putback(ifp, txr->br, next);
 			break;
 		}
 		drbr_advance(ifp, txr->br);
 		/* Send a copy of the frame to the BPF listener */
 		ETHER_BPF_MTAP(ifp, next);
 		if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
 			break;
 	}
 
 	if (txr->avail < IXL_TX_CLEANUP_THRESHOLD)
 		ixl_txeof(que);
 
 	return (err);
 }
 
 /*
  * Called from a taskqueue to drain queued transmit packets.
  */
 void
 ixl_deferred_mq_start(void *arg, int pending)
 {
 	struct ixl_queue	*que = arg;
 	struct tx_ring		*txr = &que->txr;
 	struct ixl_vsi		*vsi = que->vsi;
 	struct ifnet		*ifp = vsi->ifp;
 
 	IXL_TX_LOCK(txr);
 	if (!drbr_empty(ifp, txr->br))
 		ixl_mq_start_locked(ifp, txr);
 	IXL_TX_UNLOCK(txr);
 }
 
 /*
 ** Flush all queue ring buffers
 */
 void
 ixl_qflush(struct ifnet *ifp)
 {
 	struct ixl_vsi	*vsi = ifp->if_softc;
 
 	for (int i = 0; i < vsi->num_queues; i++) {
 		struct ixl_queue *que = &vsi->queues[i];
 		struct tx_ring	*txr = &que->txr;
 		struct mbuf	*m;
 		IXL_TX_LOCK(txr);
 		while ((m = buf_ring_dequeue_sc(txr->br)) != NULL)
 			m_freem(m);
 		IXL_TX_UNLOCK(txr);
 	}
 	if_qflush(ifp);
 }
 
 /*
 ** Find mbuf chains passed to the driver 
 ** that are 'sparse', using more than 8
 ** mbufs to deliver an mss-size chunk of data
 */
 static inline bool
 ixl_tso_detect_sparse(struct mbuf *mp)
 {
 	struct mbuf	*m;
 	int		num = 0, mss;
 	bool		ret = FALSE;
 
 	mss = mp->m_pkthdr.tso_segsz;
 	for (m = mp->m_next; m != NULL; m = m->m_next) {
 		num++;
 		mss -= m->m_len;
 		if (mss < 1)
 			break;
 		if (m->m_next == NULL)
 			break;
 	}
 	if (num > IXL_SPARSE_CHAIN)
 		ret = TRUE;
 
 	return (ret);
 }
 
 
 /*********************************************************************
  *
  *  This routine maps the mbufs to tx descriptors, allowing the
  *  TX engine to transmit the packets. 
  *  	- return 0 on success, positive on failure
  *
  **********************************************************************/
 #define IXL_TXD_CMD (I40E_TX_DESC_CMD_EOP | I40E_TX_DESC_CMD_RS)
 
 static int
 ixl_xmit(struct ixl_queue *que, struct mbuf **m_headp)
 {
 	struct ixl_vsi		*vsi = que->vsi;
 	struct i40e_hw		*hw = vsi->hw;
 	struct tx_ring		*txr = &que->txr;
 	struct ixl_tx_buf	*buf;
 	struct i40e_tx_desc	*txd = NULL;
 	struct mbuf		*m_head, *m;
 	int             	i, j, error, nsegs, maxsegs;
 	int			first, last = 0;
 	u16			vtag = 0;
 	u32			cmd, off;
 	bus_dmamap_t		map;
 	bus_dma_tag_t		tag;
 	bus_dma_segment_t	segs[IXL_MAX_TSO_SEGS];
 
 	cmd = off = 0;
 	m_head = *m_headp;
 
 	/*
 	 * It is important to capture the first descriptor
 	 * used, because it will contain the index of
 	 * the one we tell the hardware to report back.
 	 */
 	first = txr->next_avail;
 	buf = &txr->buffers[first];
 	map = buf->map;
 	tag = txr->tx_tag;
 	maxsegs = IXL_MAX_TX_SEGS;
 
 	if (m_head->m_pkthdr.csum_flags & CSUM_TSO) {
 		/* Use larger mapping for TSO */
 		tag = txr->tso_tag;
 		maxsegs = IXL_MAX_TSO_SEGS;
 		if (ixl_tso_detect_sparse(m_head)) {
 			m = m_defrag(m_head, M_NOWAIT);
 			if (m == NULL) {
 				m_freem(*m_headp);
 				*m_headp = NULL;
 				return (ENOBUFS);
 			}
 			*m_headp = m;
 		}
 	}
 
 	/*
 	 * Map the packet for DMA.
 	 */
 	error = bus_dmamap_load_mbuf_sg(tag, map,
 	    *m_headp, segs, &nsegs, BUS_DMA_NOWAIT);
 
 	if (error == EFBIG) {
 		/* Too many segments: compact the chain and retry once */
 		m = m_defrag(*m_headp, M_NOWAIT);
 		if (m == NULL) {
 			que->mbuf_defrag_failed++;
 			m_freem(*m_headp);
 			*m_headp = NULL;
 			return (ENOBUFS);
 		}
 		*m_headp = m;
 
 		/* Try it again */
 		error = bus_dmamap_load_mbuf_sg(tag, map,
 		    *m_headp, segs, &nsegs, BUS_DMA_NOWAIT);
 
 		if (error == ENOMEM) {
 			que->tx_dma_setup++;
 			return (error);
 		} else if (error != 0) {
 			que->tx_dma_setup++;
 			m_freem(*m_headp);
 			*m_headp = NULL;
 			return (error);
 		}
 	} else if (error == ENOMEM) {
 		que->tx_dma_setup++;
 		return (error);
 	} else if (error != 0) {
 		que->tx_dma_setup++;
 		m_freem(*m_headp);
 		*m_headp = NULL;
 		return (error);
 	}
 
 	/* Make certain there are enough descriptors */
 	if (nsegs > txr->avail - 2) {
 		txr->no_desc++;
 		error = ENOBUFS;
 		goto xmit_fail;
 	}
 	m_head = *m_headp;
 
 	/* Set up the TSO/CSUM offload */
 	if (m_head->m_pkthdr.csum_flags & CSUM_OFFLOAD) {
 		error = ixl_tx_setup_offload(que, m_head, &cmd, &off);
 		if (error)
 			goto xmit_fail;
 	}
 
 	cmd |= I40E_TX_DESC_CMD_ICRC;
 	/* Grab the VLAN tag */
 	if (m_head->m_flags & M_VLANTAG) {
 		cmd |= I40E_TX_DESC_CMD_IL2TAG1;
 		vtag = htole16(m_head->m_pkthdr.ether_vtag);
 	}
 
 	i = txr->next_avail;
 	for (j = 0; j < nsegs; j++) {
 		bus_size_t seglen;
 
 		buf = &txr->buffers[i];
 		buf->tag = tag; /* Keep track of the type tag */
 		txd = &txr->base[i];
 		seglen = segs[j].ds_len;
 
 		txd->buffer_addr = htole64(segs[j].ds_addr);
 		txd->cmd_type_offset_bsz =
 		    htole64(I40E_TX_DESC_DTYPE_DATA
 		    | ((u64)cmd  << I40E_TXD_QW1_CMD_SHIFT)
 		    | ((u64)off << I40E_TXD_QW1_OFFSET_SHIFT)
 		    | ((u64)seglen  << I40E_TXD_QW1_TX_BUF_SZ_SHIFT)
 		    | ((u64)vtag  << I40E_TXD_QW1_L2TAG1_SHIFT));
 
 		last = i; /* descriptor that will get completion IRQ */
 
 		if (++i == que->num_desc)
 			i = 0;
 
 		buf->m_head = NULL;
 		buf->eop_index = -1;
 	}
 	/* Set the last descriptor for report */
 	txd->cmd_type_offset_bsz |=
 	    htole64(((u64)IXL_TXD_CMD << I40E_TXD_QW1_CMD_SHIFT));
 	txr->avail -= nsegs;
 	txr->next_avail = i;
 
 	buf->m_head = m_head;
 	/* Swap the dma map between the first and last descriptor */
 	txr->buffers[first].map = buf->map;
 	buf->map = map;
 	bus_dmamap_sync(tag, map, BUS_DMASYNC_PREWRITE);
 
 	/* Set the index of the descriptor that will be marked done */
 	buf = &txr->buffers[first];
 	buf->eop_index = last;
 
 	bus_dmamap_sync(txr->dma.tag, txr->dma.map,
 	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 	/*
 	 * Advance the Transmit Descriptor Tail (Tdt); this tells the
 	 * hardware that this frame is available to transmit.
 	 */
 	++txr->total_packets;
 	wr32(hw, txr->tail, i);
 
 	/* Mark outstanding work */
 	if (que->busy == 0)
 		que->busy = 1;
 	return (0);
 
 xmit_fail:
 	bus_dmamap_unload(tag, buf->map);
 	return (error);
 }
 
 
 /*********************************************************************
  *
  *  Allocate memory for tx_buffer structures. The tx_buffer stores all
  *  the information needed to transmit a packet on the wire. This is
  *  called only once at attach; setup is done on every reset.
  *
  **********************************************************************/
 int
 ixl_allocate_tx_data(struct ixl_queue *que)
 {
 	struct tx_ring		*txr = &que->txr;
 	struct ixl_vsi		*vsi = que->vsi;
 	device_t		dev = vsi->dev;
 	struct ixl_tx_buf	*buf;
 	int			error = 0;
 
 	/*
 	 * Setup DMA descriptor areas.
 	 */
 	if ((error = bus_dma_tag_create(NULL,		/* parent */
 			       1, 0,			/* alignment, bounds */
 			       BUS_SPACE_MAXADDR,	/* lowaddr */
 			       BUS_SPACE_MAXADDR,	/* highaddr */
 			       NULL, NULL,		/* filter, filterarg */
 			       IXL_TSO_SIZE,		/* maxsize */
 			       IXL_MAX_TX_SEGS,		/* nsegments */
 			       PAGE_SIZE,		/* maxsegsize */
 			       0,			/* flags */
 			       NULL,			/* lockfunc */
 			       NULL,			/* lockfuncarg */
 			       &txr->tx_tag))) {
 		device_printf(dev, "Unable to allocate TX DMA tag\n");
 		goto fail;
 	}
 
 	/* Make a special tag for TSO */
 	if ((error = bus_dma_tag_create(NULL,		/* parent */
 			       1, 0,			/* alignment, bounds */
 			       BUS_SPACE_MAXADDR,	/* lowaddr */
 			       BUS_SPACE_MAXADDR,	/* highaddr */
 			       NULL, NULL,		/* filter, filterarg */
 			       IXL_TSO_SIZE,		/* maxsize */
 			       IXL_MAX_TSO_SEGS,	/* nsegments */
 			       PAGE_SIZE,		/* maxsegsize */
 			       0,			/* flags */
 			       NULL,			/* lockfunc */
 			       NULL,			/* lockfuncarg */
 			       &txr->tso_tag))) {
 		device_printf(dev, "Unable to allocate TX TSO DMA tag\n");
 		goto fail;
 	}
 
 	if (!(txr->buffers =
 	    (struct ixl_tx_buf *) malloc(sizeof(struct ixl_tx_buf) *
 	    que->num_desc, M_DEVBUF, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate tx_buffer memory\n");
 		error = ENOMEM;
 		goto fail;
 	}
 
 	/* Create the descriptor buffer default dma maps */
 	buf = txr->buffers;
 	for (int i = 0; i < que->num_desc; i++, buf++) {
 		buf->tag = txr->tx_tag;
 		error = bus_dmamap_create(buf->tag, 0, &buf->map);
 		if (error != 0) {
 			device_printf(dev, "Unable to create TX DMA map\n");
 			goto fail;
 		}
 	}
 fail:
 	return (error);
 }
 
 
 /*********************************************************************
  *
  *  (Re)Initialize a queue transmit ring.
  *	- called by init, it clears the descriptor ring,
  *	  and frees any stale mbufs 
  *
  **********************************************************************/
 void
 ixl_init_tx_ring(struct ixl_queue *que)
 {
 #ifdef DEV_NETMAP
 	struct netmap_adapter *na = NA(que->vsi->ifp);
 	struct netmap_slot *slot;
 #endif /* DEV_NETMAP */
 	struct tx_ring		*txr = &que->txr;
 	struct ixl_tx_buf	*buf;
 
 	/* Clear the old ring contents */
 	IXL_TX_LOCK(txr);
 
 #ifdef DEV_NETMAP
 	/*
 	 * (under lock): if in netmap mode, do some consistency
 	 * checks and set slot to entry 0 of the netmap ring.
 	 */
 	slot = netmap_reset(na, NR_TX, que->me, 0);
 #endif /* DEV_NETMAP */
 
 	bzero((void *)txr->base,
 	      (sizeof(struct i40e_tx_desc)) * que->num_desc);
 
 	/* Reset indices */
 	txr->next_avail = 0;
 	txr->next_to_clean = 0;
 
 #ifdef IXL_FDIR
 	/* Initialize flow director */
 	txr->atr_rate = ixl_atr_rate;
 	txr->atr_count = 0;
 #endif
 
 	/* Free any existing tx mbufs. */
 	buf = txr->buffers;
 	for (int i = 0; i < que->num_desc; i++, buf++) {
 		if (buf->m_head != NULL) {
 			bus_dmamap_sync(buf->tag, buf->map,
 			    BUS_DMASYNC_POSTWRITE);
 			bus_dmamap_unload(buf->tag, buf->map);
 			m_freem(buf->m_head);
 			buf->m_head = NULL;
 		}
 #ifdef DEV_NETMAP
 		/*
 		 * In netmap mode, set the map for the packet buffer.
 		 * NOTE: Some drivers (not this one) also need to set
 		 * the physical buffer address in the NIC ring.
 		 * netmap_idx_n2k() maps a nic index, i, into the corresponding
 		 * netmap slot index, si
 		 */
 		if (slot) {
 			int si = netmap_idx_n2k(&na->tx_rings[que->me], i);
 			netmap_load_map(na, buf->tag, buf->map, NMB(na, slot + si));
 		}
 #endif /* DEV_NETMAP */
 		/* Clear the EOP index */
 		buf->eop_index = -1;
 	}
 
 	/* Set number of descriptors available */
 	txr->avail = que->num_desc;
 
 	bus_dmamap_sync(txr->dma.tag, txr->dma.map,
 	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 	IXL_TX_UNLOCK(txr);
 }
 
 
 /*********************************************************************
  *
  *  Free transmit ring related data structures.
  *
  **********************************************************************/
 void
 ixl_free_que_tx(struct ixl_queue *que)
 {
 	struct tx_ring *txr = &que->txr;
 	struct ixl_tx_buf *buf;
 
 	INIT_DBG_IF(que->vsi->ifp, "queue %d: begin", que->me);
 
 	for (int i = 0; i < que->num_desc; i++) {
 		buf = &txr->buffers[i];
 		if (buf->m_head != NULL) {
 			bus_dmamap_sync(buf->tag, buf->map,
 			    BUS_DMASYNC_POSTWRITE);
 			bus_dmamap_unload(buf->tag,
 			    buf->map);
 			m_freem(buf->m_head);
 			buf->m_head = NULL;
 			if (buf->map != NULL) {
 				bus_dmamap_destroy(buf->tag,
 				    buf->map);
 				buf->map = NULL;
 			}
 		} else if (buf->map != NULL) {
 			bus_dmamap_unload(buf->tag,
 			    buf->map);
 			bus_dmamap_destroy(buf->tag,
 			    buf->map);
 			buf->map = NULL;
 		}
 	}
 	if (txr->br != NULL)
 		buf_ring_free(txr->br, M_DEVBUF);
 	if (txr->buffers != NULL) {
 		free(txr->buffers, M_DEVBUF);
 		txr->buffers = NULL;
 	}
 	if (txr->tx_tag != NULL) {
 		bus_dma_tag_destroy(txr->tx_tag);
 		txr->tx_tag = NULL;
 	}
 	if (txr->tso_tag != NULL) {
 		bus_dma_tag_destroy(txr->tso_tag);
 		txr->tso_tag = NULL;
 	}
 
 	INIT_DBG_IF(que->vsi->ifp, "queue %d: end", que->me);
 	return;
 }
 
 /*********************************************************************
  *
  *  Setup descriptor for hw offloads 
  *
  **********************************************************************/
 
 static int
 ixl_tx_setup_offload(struct ixl_queue *que,
     struct mbuf *mp, u32 *cmd, u32 *off)
 {
 	struct ether_vlan_header	*eh;
 #ifdef INET
 	struct ip			*ip = NULL;
 #endif
 	struct tcphdr			*th = NULL;
 #ifdef INET6
 	struct ip6_hdr			*ip6;
 #endif
 	int				elen, ip_hlen = 0, tcp_hlen;
 	u16				etype;
 	u8				ipproto = 0;
 	bool				tso = FALSE;
 
 	/* Set up the TSO context descriptor if required */
 	if (mp->m_pkthdr.csum_flags & CSUM_TSO) {
 		tso = ixl_tso_setup(que, mp);
 		if (tso)
 			++que->tso;
 		else
 			return (ENXIO);
 	}
 
 	/*
 	 * Determine where frame payload starts.
 	 * Jump over vlan headers if already present,
 	 * helpful for QinQ too.
 	 */
 	eh = mtod(mp, struct ether_vlan_header *);
 	if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
 		etype = ntohs(eh->evl_proto);
 		elen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
 	} else {
 		etype = ntohs(eh->evl_encap_proto);
 		elen = ETHER_HDR_LEN;
 	}
 
 	switch (etype) {
 #ifdef INET
 		case ETHERTYPE_IP:
 			ip = (struct ip *)(mp->m_data + elen);
 			ip_hlen = ip->ip_hl << 2;
 			ipproto = ip->ip_p;
 			th = (struct tcphdr *)((caddr_t)ip + ip_hlen);
 			/* The IP checksum must be recalculated with TSO */
 			if (tso)
 				*cmd |= I40E_TX_DESC_CMD_IIPT_IPV4_CSUM;
 			else
 				*cmd |= I40E_TX_DESC_CMD_IIPT_IPV4;
 			break;
 #endif
 #ifdef INET6
 		case ETHERTYPE_IPV6:
 			ip6 = (struct ip6_hdr *)(mp->m_data + elen);
 			ip_hlen = sizeof(struct ip6_hdr);
 			ipproto = ip6->ip6_nxt;
 			th = (struct tcphdr *)((caddr_t)ip6 + ip_hlen);
 			*cmd |= I40E_TX_DESC_CMD_IIPT_IPV6;
 			break;
 #endif
 		default:
 			break;
 	}
 
 	*off |= (elen >> 1) << I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
 	*off |= (ip_hlen >> 2) << I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
 
 	switch (ipproto) {
 		case IPPROTO_TCP:
 			tcp_hlen = th->th_off << 2;
 			if (mp->m_pkthdr.csum_flags & (CSUM_TCP|CSUM_TCP_IPV6)) {
 				*cmd |= I40E_TX_DESC_CMD_L4T_EOFT_TCP;
 				*off |= (tcp_hlen >> 2) <<
 				    I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
 			}
 #ifdef IXL_FDIR
 			ixl_atr(que, th, etype);
 #endif
 			break;
 		case IPPROTO_UDP:
 			if (mp->m_pkthdr.csum_flags & (CSUM_UDP|CSUM_UDP_IPV6)) {
 				*cmd |= I40E_TX_DESC_CMD_L4T_EOFT_UDP;
 				*off |= (sizeof(struct udphdr) >> 2) <<
 				    I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
 			}
 			break;
 
 		case IPPROTO_SCTP:
 			if (mp->m_pkthdr.csum_flags & (CSUM_SCTP|CSUM_SCTP_IPV6)) {
 				*cmd |= I40E_TX_DESC_CMD_L4T_EOFT_SCTP;
 				*off |= (sizeof(struct sctphdr) >> 2) <<
 				    I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
 			}
 			/* Fall Thru */
 		default:
 			break;
 	}
 
 	return (0);
 }
 
 
 /**********************************************************************
  *
  *  Setup context for hardware segmentation offload (TSO)
  *
  **********************************************************************/
 static bool
 ixl_tso_setup(struct ixl_queue *que, struct mbuf *mp)
 {
 	struct tx_ring			*txr = &que->txr;
 	struct i40e_tx_context_desc	*TXD;
 	struct ixl_tx_buf		*buf;
 	u32				cmd, mss, type, tsolen;
 	u16				etype;
 	int				idx, elen, ip_hlen, tcp_hlen;
 	struct ether_vlan_header	*eh;
 #ifdef INET
 	struct ip			*ip;
 #endif
 #ifdef INET6
 	struct ip6_hdr			*ip6;
 #endif
 #if defined(INET6) || defined(INET)
 	struct tcphdr			*th;
 #endif
 	u64				type_cmd_tso_mss;
 
 	/*
 	 * Determine where frame payload starts.
 	 * Jump over vlan headers if already present
 	 */
 	eh = mtod(mp, struct ether_vlan_header *);
 	if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
 		elen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
 		etype = eh->evl_proto;
 	} else {
 		elen = ETHER_HDR_LEN;
 		etype = eh->evl_encap_proto;
 	}
 
 	switch (ntohs(etype)) {
 #ifdef INET6
 	case ETHERTYPE_IPV6:
 		ip6 = (struct ip6_hdr *)(mp->m_data + elen);
 		if (ip6->ip6_nxt != IPPROTO_TCP)
 			return (FALSE);
 		ip_hlen = sizeof(struct ip6_hdr);
 		th = (struct tcphdr *)((caddr_t)ip6 + ip_hlen);
 		th->th_sum = in6_cksum_pseudo(ip6, 0, IPPROTO_TCP, 0);
 		tcp_hlen = th->th_off << 2;
 		/*
 		 * The corresponding flag is set by the stack in the IPv4
 		 * TSO case, but not in IPv6 (at least in FreeBSD 10.2).
 		 * So, set it here because the rest of the flow requires it.
 		 */
 		mp->m_pkthdr.csum_flags |= CSUM_TCP_IPV6;
 		break;
 #endif
 #ifdef INET
 	case ETHERTYPE_IP:
 		ip = (struct ip *)(mp->m_data + elen);
 		if (ip->ip_p != IPPROTO_TCP)
 			return (FALSE);
 		ip->ip_sum = 0;
 		ip_hlen = ip->ip_hl << 2;
 		th = (struct tcphdr *)((caddr_t)ip + ip_hlen);
 		th->th_sum = in_pseudo(ip->ip_src.s_addr,
 		    ip->ip_dst.s_addr, htons(IPPROTO_TCP));
 		tcp_hlen = th->th_off << 2;
 		break;
 #endif
 	default:
 		printf("%s: CSUM_TSO but no supported IP version (0x%04x)\n",
 		    __func__, ntohs(etype));
 		return FALSE;
 	}
 
 	/* Ensure we have at least the IP+TCP header in the first mbuf. */
 	if (mp->m_len < elen + ip_hlen + sizeof(struct tcphdr))
 		return FALSE;
 
 	idx = txr->next_avail;
 	buf = &txr->buffers[idx];
 	TXD = (struct i40e_tx_context_desc *) &txr->base[idx];
 	tsolen = mp->m_pkthdr.len - (elen + ip_hlen + tcp_hlen);
 
 	type = I40E_TX_DESC_DTYPE_CONTEXT;
 	cmd = I40E_TX_CTX_DESC_TSO;
 	mss = mp->m_pkthdr.tso_segsz;
 
 	type_cmd_tso_mss = ((u64)type << I40E_TXD_CTX_QW1_DTYPE_SHIFT) |
 	    ((u64)cmd << I40E_TXD_CTX_QW1_CMD_SHIFT) |
 	    ((u64)tsolen << I40E_TXD_CTX_QW1_TSO_LEN_SHIFT) |
 	    ((u64)mss << I40E_TXD_CTX_QW1_MSS_SHIFT);
 	TXD->type_cmd_tso_mss = htole64(type_cmd_tso_mss);
 
 	TXD->tunneling_params = htole32(0);
 	buf->m_head = NULL;
 	buf->eop_index = -1;
 
 	if (++idx == que->num_desc)
 		idx = 0;
 
 	txr->avail--;
 	txr->next_avail = idx;
 
 	return TRUE;
 }
 
 /*             
 ** ixl_get_tx_head - Retrieve the value from the 
 **    location the HW records its HEAD index
 */
 static inline u32
 ixl_get_tx_head(struct ixl_queue *que)
 {
 	struct tx_ring  *txr = &que->txr;
 	void *head = &txr->base[que->num_desc];
 	return LE32_TO_CPU(*(volatile __le32 *)head);
 }
 
 /**********************************************************************
  *
  *  Examine each tx_buffer in the used queue. If the hardware is done
  *  processing the packet then free associated resources. The
  *  tx_buffer is put back on the free queue.
  *
  **********************************************************************/
 bool
 ixl_txeof(struct ixl_queue *que)
 {
 	struct tx_ring		*txr = &que->txr;
 	u32			first, last, head, done, processed;
 	struct ixl_tx_buf	*buf;
 	struct i40e_tx_desc	*tx_desc, *eop_desc;
 
 
 	mtx_assert(&txr->mtx, MA_OWNED);
 
 #ifdef DEV_NETMAP
 	// XXX todo: implement moderation
 	if (netmap_tx_irq(que->vsi->ifp, que->me))
 		return FALSE;
 #endif /* DEV_NETMAP */
 
 	/* These are not the descriptors you seek, move along :) */
 	if (txr->avail == que->num_desc) {
 		que->busy = 0;
 		return FALSE;
 	}
 
 	processed = 0;
 	first = txr->next_to_clean;
 	buf = &txr->buffers[first];
 	tx_desc = (struct i40e_tx_desc *)&txr->base[first];
 	last = buf->eop_index;
 	if (last == -1)
 		return FALSE;
 	eop_desc = (struct i40e_tx_desc *)&txr->base[last];
 
 	/* Get the Head WB value */
 	head = ixl_get_tx_head(que);
 
 	/*
 	** Get the index of the first descriptor
 	** BEYOND the EOP and call that 'done'.
 	** I do this so the comparison in the
 	** inner while loop below can be simple
 	*/
 	if (++last == que->num_desc) last = 0;
 	done = last;
 
 	bus_dmamap_sync(txr->dma.tag, txr->dma.map,
 	    BUS_DMASYNC_POSTREAD);
 	/*
 	** The hardware writes the HEAD index of the ring
 	** to a defined location; this value, rather than
 	** a done bit, is used to keep track of what must
 	** be 'cleaned'.
 	*/
 	while (first != head) {
 		/* We clean the range of the packet */
 		while (first != done) {
 			++txr->avail;
 			++processed;
 
 			if (buf->m_head) {
 				txr->bytes += /* for ITR adjustment */
 				    buf->m_head->m_pkthdr.len;
 				txr->tx_bytes += /* for TX stats */
 				    buf->m_head->m_pkthdr.len;
 				bus_dmamap_sync(buf->tag,
 				    buf->map,
 				    BUS_DMASYNC_POSTWRITE);
 				bus_dmamap_unload(buf->tag,
 				    buf->map);
 				m_freem(buf->m_head);
 				buf->m_head = NULL;
 				buf->map = NULL;
 			}
 			buf->eop_index = -1;
 
 			if (++first == que->num_desc)
 				first = 0;
 
 			buf = &txr->buffers[first];
 			tx_desc = &txr->base[first];
 		}
 		++txr->packets;
 		/* See if there is more work now */
 		last = buf->eop_index;
 		if (last != -1) {
 			eop_desc = &txr->base[last];
 			/* Get next done point */
 			if (++last == que->num_desc) last = 0;
 			done = last;
 		} else
 			break;
 	}
 	bus_dmamap_sync(txr->dma.tag, txr->dma.map,
 	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
 	txr->next_to_clean = first;
 
 
 	/*
 	** Hang detection: we know there's
 	** work outstanding or the first return
 	** would have been taken, so indicate an
 	** unsuccessful pass. In local_timer, if
 	** the value is too great the queue will
 	** be considered hung. If anything has been
 	** cleaned then reset the state.
 	*/
 	if ((processed == 0) && (que->busy != IXL_QUEUE_HUNG))
 		++que->busy;
 
 	if (processed)
 		que->busy = 1; /* Note this turns off HUNG */
 
 	/*
 	 * If there are no pending descriptors, clear the timeout.
 	 */
 	if (txr->avail == que->num_desc) {
 		que->busy = 0;
 		return FALSE;
 	}
 
 	return TRUE;
 }
 
 /*********************************************************************
  *
  *  Refresh mbuf buffers for RX descriptor rings
  *   - now keeps its own state, so discards due to resource
  *     exhaustion are unnecessary. If an mbuf cannot be obtained
  *     it just returns, keeping its placeholder, so it can simply
  *     be called again to retry.
  *
  **********************************************************************/
 static void
 ixl_refresh_mbufs(struct ixl_queue *que, int limit)
 {
 	struct ixl_vsi		*vsi = que->vsi;
 	struct rx_ring		*rxr = &que->rxr;
 	bus_dma_segment_t	hseg[1];
 	bus_dma_segment_t	pseg[1];
 	struct ixl_rx_buf	*buf;
 	struct mbuf		*mh, *mp;
 	int			i, j, nsegs, error;
 	bool			refreshed = FALSE;
 
 	i = j = rxr->next_refresh;
 	/* Control the loop with one beyond */
 	if (++j == que->num_desc)
 		j = 0;
 
 	while (j != limit) {
 		buf = &rxr->buffers[i];
 		if (rxr->hdr_split == FALSE)
 			goto no_split;
 
 		if (buf->m_head == NULL) {
 			mh = m_gethdr(M_NOWAIT, MT_DATA);
 			if (mh == NULL)
 				goto update;
 		} else
 			mh = buf->m_head;
 
 		mh->m_pkthdr.len = mh->m_len = MHLEN;
 		mh->m_flags |= M_PKTHDR;
 		/* Get the memory mapping */
 		error = bus_dmamap_load_mbuf_sg(rxr->htag,
 		    buf->hmap, mh, hseg, &nsegs, BUS_DMA_NOWAIT);
 		if (error != 0) {
 			printf("Refresh mbufs: hdr dmamap load"
 			    " failure - %d\n", error);
 			m_free(mh);
 			buf->m_head = NULL;
 			goto update;
 		}
 		buf->m_head = mh;
 		bus_dmamap_sync(rxr->htag, buf->hmap,
 		    BUS_DMASYNC_PREREAD);
 		rxr->base[i].read.hdr_addr =
 		   htole64(hseg[0].ds_addr);
 
 no_split:
 		if (buf->m_pack == NULL) {
 			mp = m_getjcl(M_NOWAIT, MT_DATA,
 			    M_PKTHDR, rxr->mbuf_sz);
 			if (mp == NULL)
 				goto update;
 		} else
 			mp = buf->m_pack;
 
 		mp->m_pkthdr.len = mp->m_len = rxr->mbuf_sz;
 		/* Get the memory mapping */
 		error = bus_dmamap_load_mbuf_sg(rxr->ptag,
 		    buf->pmap, mp, pseg, &nsegs, BUS_DMA_NOWAIT);
 		if (error != 0) {
 			printf("Refresh mbufs: payload dmamap load"
 			    " failure - %d\n", error);
 			m_free(mp);
 			buf->m_pack = NULL;
 			goto update;
 		}
 		buf->m_pack = mp;
 		bus_dmamap_sync(rxr->ptag, buf->pmap,
 		    BUS_DMASYNC_PREREAD);
 		rxr->base[i].read.pkt_addr =
 		   htole64(pseg[0].ds_addr);
 		/* Used only when doing header split */
 		rxr->base[i].read.hdr_addr = 0;
 
 		refreshed = TRUE;
 		/* Next is precalculated */
 		i = j;
 		rxr->next_refresh = i;
 		if (++j == que->num_desc)
 			j = 0;
 	}
 update:
 	if (refreshed) /* Update hardware tail index */
 		wr32(vsi->hw, rxr->tail, rxr->next_refresh);
 	return;
 }
 
 
 /*********************************************************************
  *
  *  Allocate memory for rx_buffer structures. Since we use one
  *  rx_buffer per descriptor, the maximum number of rx_buffers
  *  that we'll need is equal to the number of receive descriptors
  *  that we've defined.
  *
  **********************************************************************/
 int
 ixl_allocate_rx_data(struct ixl_queue *que)
 {
 	struct rx_ring		*rxr = &que->rxr;
 	struct ixl_vsi		*vsi = que->vsi;
 	device_t 		dev = vsi->dev;
 	struct ixl_rx_buf 	*buf;
 	int             	i, bsize, error;
 
 	bsize = sizeof(struct ixl_rx_buf) * que->num_desc;
 	if (!(rxr->buffers =
 	    (struct ixl_rx_buf *) malloc(bsize,
 	    M_DEVBUF, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate rx_buffer memory\n");
 		error = ENOMEM;
 		return (error);
 	}
 
 	if ((error = bus_dma_tag_create(NULL,	/* parent */
 				   1, 0,	/* alignment, bounds */
 				   BUS_SPACE_MAXADDR,	/* lowaddr */
 				   BUS_SPACE_MAXADDR,	/* highaddr */
 				   NULL, NULL,		/* filter, filterarg */
 				   MSIZE,		/* maxsize */
 				   1,			/* nsegments */
 				   MSIZE,		/* maxsegsize */
 				   0,			/* flags */
 				   NULL,		/* lockfunc */
 				   NULL,		/* lockfuncarg */
 				   &rxr->htag))) {
 		device_printf(dev, "Unable to create RX DMA htag\n");
 		return (error);
 	}
 
 	if ((error = bus_dma_tag_create(NULL,	/* parent */
 				   1, 0,	/* alignment, bounds */
 				   BUS_SPACE_MAXADDR,	/* lowaddr */
 				   BUS_SPACE_MAXADDR,	/* highaddr */
 				   NULL, NULL,		/* filter, filterarg */
 				   MJUM16BYTES,		/* maxsize */
 				   1,			/* nsegments */
 				   MJUM16BYTES,		/* maxsegsize */
 				   0,			/* flags */
 				   NULL,		/* lockfunc */
 				   NULL,		/* lockfuncarg */
 				   &rxr->ptag))) {
 		device_printf(dev, "Unable to create RX DMA ptag\n");
 		return (error);
 	}
 
 	for (i = 0; i < que->num_desc; i++) {
 		buf = &rxr->buffers[i];
 		error = bus_dmamap_create(rxr->htag,
 		    BUS_DMA_NOWAIT, &buf->hmap);
 		if (error) {
 			device_printf(dev, "Unable to create RX head map\n");
 			break;
 		}
 		error = bus_dmamap_create(rxr->ptag,
 		    BUS_DMA_NOWAIT, &buf->pmap);
 		if (error) {
 			device_printf(dev, "Unable to create RX pkt map\n");
 			break;
 		}
 	}
 
 	return (error);
 }
 
 
 /*********************************************************************
  *
  *  (Re)Initialize the queue receive ring and its buffers.
  *
  **********************************************************************/
 int
 ixl_init_rx_ring(struct ixl_queue *que)
 {
 	struct	rx_ring 	*rxr = &que->rxr;
 	struct ixl_vsi		*vsi = que->vsi;
 #if defined(INET6) || defined(INET)
 	struct ifnet		*ifp = vsi->ifp;
 	struct lro_ctrl		*lro = &rxr->lro;
 #endif
 	struct ixl_rx_buf	*buf;
 	bus_dma_segment_t	pseg[1], hseg[1];
 	int			rsize, nsegs, error = 0;
 #ifdef DEV_NETMAP
 	struct netmap_adapter *na = NA(que->vsi->ifp);
 	struct netmap_slot *slot;
 #endif /* DEV_NETMAP */
 
 	IXL_RX_LOCK(rxr);
 #ifdef DEV_NETMAP
 	/* same as in ixl_init_tx_ring() */
 	slot = netmap_reset(na, NR_RX, que->me, 0);
 #endif /* DEV_NETMAP */
 	/* Clear the ring contents */
 	rsize = roundup2(que->num_desc *
 	    sizeof(union i40e_rx_desc), DBA_ALIGN);
 	bzero((void *)rxr->base, rsize);
 	/* Cleanup any existing buffers */
 	for (int i = 0; i < que->num_desc; i++) {
 		buf = &rxr->buffers[i];
 		if (buf->m_head != NULL) {
 			bus_dmamap_sync(rxr->htag, buf->hmap,
 			    BUS_DMASYNC_POSTREAD);
 			bus_dmamap_unload(rxr->htag, buf->hmap);
 			buf->m_head->m_flags |= M_PKTHDR;
 			m_freem(buf->m_head);
 		}
 		if (buf->m_pack != NULL) {
 			bus_dmamap_sync(rxr->ptag, buf->pmap,
 			    BUS_DMASYNC_POSTREAD);
 			bus_dmamap_unload(rxr->ptag, buf->pmap);
 			buf->m_pack->m_flags |= M_PKTHDR;
 			m_freem(buf->m_pack);
 		}
 		buf->m_head = NULL;
 		buf->m_pack = NULL;
 	}
 
 	/* header split is off */
 	rxr->hdr_split = FALSE;
 
 	/* Now replenish the mbufs */
 	for (int j = 0; j != que->num_desc; ++j) {
 		struct mbuf	*mh, *mp;
 
 		buf = &rxr->buffers[j];
 #ifdef DEV_NETMAP
 		/*
 		 * In netmap mode, fill the map and set the buffer
 		 * address in the NIC ring, considering the offset
 		 * between the netmap and NIC rings (see comment in
 		 * ixgbe_setup_transmit_ring() ). No need to allocate
 		 * an mbuf, so end the block with a continue;
 		 */
 		if (slot) {
 			int sj = netmap_idx_n2k(&na->rx_rings[que->me], j);
 			uint64_t paddr;
 			void *addr;
 
 			addr = PNMB(na, slot + sj, &paddr);
 			netmap_load_map(na, rxr->dma.tag, buf->pmap, addr);
 			/* Update descriptor and the cached value */
 			rxr->base[j].read.pkt_addr = htole64(paddr);
 			rxr->base[j].read.hdr_addr = 0;
 			continue;
 		}
 #endif /* DEV_NETMAP */
 		/*
 		** Don't allocate mbufs if not
 		** doing header split; it's wasteful
 		*/
 		if (rxr->hdr_split == FALSE)
 			goto skip_head;
 
 		/* First the header */
 		buf->m_head = m_gethdr(M_NOWAIT, MT_DATA);
 		if (buf->m_head == NULL) {
 			error = ENOBUFS;
 			goto fail;
 		}
 		m_adj(buf->m_head, ETHER_ALIGN);
 		mh = buf->m_head;
 		mh->m_len = mh->m_pkthdr.len = MHLEN;
 		mh->m_flags |= M_PKTHDR;
 		/* Get the memory mapping */
 		error = bus_dmamap_load_mbuf_sg(rxr->htag,
 		    buf->hmap, buf->m_head, hseg,
 		    &nsegs, BUS_DMA_NOWAIT);
 		if (error != 0) /* Nothing elegant to do here */
 			goto fail;
 		bus_dmamap_sync(rxr->htag,
 		    buf->hmap, BUS_DMASYNC_PREREAD);
 		/* Update descriptor */
 		rxr->base[j].read.hdr_addr = htole64(hseg[0].ds_addr);
 
 skip_head:
 		/* Now the payload cluster */
 		buf->m_pack = m_getjcl(M_NOWAIT, MT_DATA,
 		    M_PKTHDR, rxr->mbuf_sz);
 		if (buf->m_pack == NULL) {
 			error = ENOBUFS;
 			goto fail;
 		}
 		mp = buf->m_pack;
 		mp->m_pkthdr.len = mp->m_len = rxr->mbuf_sz;
 		/* Get the memory mapping */
 		error = bus_dmamap_load_mbuf_sg(rxr->ptag,
 		    buf->pmap, mp, pseg,
 		    &nsegs, BUS_DMA_NOWAIT);
 		if (error != 0)
                         goto fail;
 		bus_dmamap_sync(rxr->ptag,
 		    buf->pmap, BUS_DMASYNC_PREREAD);
 		/* Update descriptor */
 		rxr->base[j].read.pkt_addr = htole64(pseg[0].ds_addr);
 		rxr->base[j].read.hdr_addr = 0;
 	}
 
 
 	/* Setup our descriptor indices */
 	rxr->next_check = 0;
 	rxr->next_refresh = 0;
 	rxr->lro_enabled = FALSE;
 	rxr->split = 0;
 	rxr->bytes = 0;
 	rxr->discard = FALSE;
 
 	wr32(vsi->hw, rxr->tail, que->num_desc - 1);
 	ixl_flush(vsi->hw);
 
 #if defined(INET6) || defined(INET)
 	/*
 	** Now set up the LRO interface:
 	*/
 	if (ifp->if_capenable & IFCAP_LRO) {
 		int err = tcp_lro_init(lro);
 		if (err) {
 			if_printf(ifp, "queue %d: LRO Initialization failed!\n", que->me);
 			goto fail;
 		}
 		INIT_DBG_IF(ifp, "queue %d: RX Soft LRO Initialized", que->me);
 		rxr->lro_enabled = TRUE;
 		lro->ifp = vsi->ifp;
 	}
 #endif
 
 	bus_dmamap_sync(rxr->dma.tag, rxr->dma.map,
 	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
 fail:
 	IXL_RX_UNLOCK(rxr);
 	return (error);
 }
 
 
 /*********************************************************************
  *
  *  Free station receive ring data structures
  *
  **********************************************************************/
 void
 ixl_free_que_rx(struct ixl_queue *que)
 {
 	struct rx_ring		*rxr = &que->rxr;
 	struct ixl_rx_buf	*buf;
 
 	INIT_DBG_IF(que->vsi->ifp, "queue %d: begin", que->me);
 
 	/* Cleanup any existing buffers */
 	if (rxr->buffers != NULL) {
 		for (int i = 0; i < que->num_desc; i++) {
 			buf = &rxr->buffers[i];
 			if (buf->m_head != NULL) {
 				bus_dmamap_sync(rxr->htag, buf->hmap,
 				    BUS_DMASYNC_POSTREAD);
 				bus_dmamap_unload(rxr->htag, buf->hmap);
 				buf->m_head->m_flags |= M_PKTHDR;
 				m_freem(buf->m_head);
 			}
 			if (buf->m_pack != NULL) {
 				bus_dmamap_sync(rxr->ptag, buf->pmap,
 				    BUS_DMASYNC_POSTREAD);
 				bus_dmamap_unload(rxr->ptag, buf->pmap);
 				buf->m_pack->m_flags |= M_PKTHDR;
 				m_freem(buf->m_pack);
 			}
 			buf->m_head = NULL;
 			buf->m_pack = NULL;
 			if (buf->hmap != NULL) {
 				bus_dmamap_destroy(rxr->htag, buf->hmap);
 				buf->hmap = NULL;
 			}
 			if (buf->pmap != NULL) {
 				bus_dmamap_destroy(rxr->ptag, buf->pmap);
 				buf->pmap = NULL;
 			}
 		}
 		if (rxr->buffers != NULL) {
 			free(rxr->buffers, M_DEVBUF);
 			rxr->buffers = NULL;
 		}
 	}
 
 	if (rxr->htag != NULL) {
 		bus_dma_tag_destroy(rxr->htag);
 		rxr->htag = NULL;
 	}
 	if (rxr->ptag != NULL) {
 		bus_dma_tag_destroy(rxr->ptag);
 		rxr->ptag = NULL;
 	}
 
 	INIT_DBG_IF(que->vsi->ifp, "queue %d: end", que->me);
 	return;
 }
 
 static __inline void
 ixl_rx_input(struct rx_ring *rxr, struct ifnet *ifp, struct mbuf *m, u8 ptype)
 {
 
 #if defined(INET6) || defined(INET)
         /*
          * ATM LRO is only for IPv4/TCP packets and TCP checksum of the packet
          * should be computed by hardware. Also it should not have VLAN tag in
          * ethernet header.
          */
         if (rxr->lro_enabled &&
             (ifp->if_capenable & IFCAP_VLAN_HWTAGGING) != 0 &&
             (m->m_pkthdr.csum_flags & (CSUM_DATA_VALID | CSUM_PSEUDO_HDR)) ==
             (CSUM_DATA_VALID | CSUM_PSEUDO_HDR)) {
                 /*
                  * Send to the stack if:
                  **  - LRO not enabled, or
                  **  - no LRO resources, or
                  **  - lro enqueue fails
                  */
                 if (rxr->lro.lro_cnt != 0)
                         if (tcp_lro_rx(&rxr->lro, m, 0) == 0)
                                 return;
         }
 #endif
 	IXL_RX_UNLOCK(rxr);
         (*ifp->if_input)(ifp, m);
 	IXL_RX_LOCK(rxr);
 }
 
 
 static __inline void
 ixl_rx_discard(struct rx_ring *rxr, int i)
 {
 	struct ixl_rx_buf	*rbuf;
 
 	rbuf = &rxr->buffers[i];
 
         if (rbuf->fmp != NULL) {/* Partial chain ? */
 		rbuf->fmp->m_flags |= M_PKTHDR;
                 m_freem(rbuf->fmp);
                 rbuf->fmp = NULL;
 	}
 
 	/*
 	** With advanced descriptors the writeback
 	** clobbers the buffer addrs, so it's easier
 	** to just free the existing mbufs and take
 	** the normal refresh path to get new buffers
 	** and mapping.
 	*/
 	if (rbuf->m_head) {
 		m_free(rbuf->m_head);
 		rbuf->m_head = NULL;
 	}
  
 	if (rbuf->m_pack) {
 		m_free(rbuf->m_pack);
 		rbuf->m_pack = NULL;
 	}
 
 	return;
 }
 
 #ifdef RSS
 /*
 ** ixl_ptype_to_hash: parse the packet type
 ** to determine the appropriate hash.
 */
 static inline int
 ixl_ptype_to_hash(u8 ptype)
 {
         struct i40e_rx_ptype_decoded	decoded;
 	u8				ex = 0;
 
 	decoded = decode_rx_desc_ptype(ptype);
 	ex = decoded.outer_frag;
 
 	if (!decoded.known)
-		return M_HASHTYPE_OPAQUE;
+		return M_HASHTYPE_OPAQUE_HASH;
 
 	if (decoded.outer_ip == I40E_RX_PTYPE_OUTER_L2) 
-		return M_HASHTYPE_OPAQUE;
+		return M_HASHTYPE_OPAQUE_HASH;
 
 	/* Note: anything that gets to this point is IP */
         if (decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV6) { 
 		switch (decoded.inner_prot) {
 			case I40E_RX_PTYPE_INNER_PROT_TCP:
 				if (ex)
 					return M_HASHTYPE_RSS_TCP_IPV6_EX;
 				else
 					return M_HASHTYPE_RSS_TCP_IPV6;
 			case I40E_RX_PTYPE_INNER_PROT_UDP:
 				if (ex)
 					return M_HASHTYPE_RSS_UDP_IPV6_EX;
 				else
 					return M_HASHTYPE_RSS_UDP_IPV6;
 			default:
 				if (ex)
 					return M_HASHTYPE_RSS_IPV6_EX;
 				else
 					return M_HASHTYPE_RSS_IPV6;
 		}
 	}
         if (decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV4) { 
 		switch (decoded.inner_prot) {
 			case I40E_RX_PTYPE_INNER_PROT_TCP:
 					return M_HASHTYPE_RSS_TCP_IPV4;
 			case I40E_RX_PTYPE_INNER_PROT_UDP:
 				if (ex)
 					return M_HASHTYPE_RSS_UDP_IPV4_EX;
 				else
 					return M_HASHTYPE_RSS_UDP_IPV4;
 			default:
 					return M_HASHTYPE_RSS_IPV4;
 		}
 	}
 	/* We should never get here!! */
-	return M_HASHTYPE_OPAQUE;
+	return M_HASHTYPE_OPAQUE_HASH;
 }
 #endif /* RSS */
 
 /*********************************************************************
  *
  *  This routine executes in interrupt context. It replenishes
  *  the mbufs in the descriptor and sends data which has been
  *  dma'ed into host memory to upper layer.
  *
  *  We loop at most count times if count is > 0, or until done if
  *  count < 0.
  *
  *  Return TRUE for more work, FALSE for all clean.
  *********************************************************************/
 bool
 ixl_rxeof(struct ixl_queue *que, int count)
 {
 	struct ixl_vsi		*vsi = que->vsi;
 	struct rx_ring		*rxr = &que->rxr;
 	struct ifnet		*ifp = vsi->ifp;
 #if defined(INET6) || defined(INET)
 	struct lro_ctrl		*lro = &rxr->lro;
 #endif
 	int			i, nextp, processed = 0;
 	union i40e_rx_desc	*cur;
 	struct ixl_rx_buf	*rbuf, *nbuf;
 
 
 	IXL_RX_LOCK(rxr);
 
 #ifdef DEV_NETMAP
 	if (netmap_rx_irq(ifp, que->me, &count)) {
 		IXL_RX_UNLOCK(rxr);
 		return (FALSE);
 	}
 #endif /* DEV_NETMAP */
 
 	for (i = rxr->next_check; count != 0;) {
 		struct mbuf	*sendmp, *mh, *mp;
 		u32		rsc, status, error;
 		u16		hlen, plen, vtag;
 		u64		qword;
 		u8		ptype;
 		bool		eop;
  
 		/* Sync the ring. */
 		bus_dmamap_sync(rxr->dma.tag, rxr->dma.map,
 		    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
 
 		cur = &rxr->base[i];
 		qword = le64toh(cur->wb.qword1.status_error_len);
 		status = (qword & I40E_RXD_QW1_STATUS_MASK)
 		    >> I40E_RXD_QW1_STATUS_SHIFT;
 		error = (qword & I40E_RXD_QW1_ERROR_MASK)
 		    >> I40E_RXD_QW1_ERROR_SHIFT;
 		plen = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK)
 		    >> I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
 		hlen = (qword & I40E_RXD_QW1_LENGTH_HBUF_MASK)
 		    >> I40E_RXD_QW1_LENGTH_HBUF_SHIFT;
 		ptype = (qword & I40E_RXD_QW1_PTYPE_MASK)
 		    >> I40E_RXD_QW1_PTYPE_SHIFT;
 
 		if ((status & (1 << I40E_RX_DESC_STATUS_DD_SHIFT)) == 0) {
 			++rxr->not_done;
 			break;
 		}
 		if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
 			break;
 
 		count--;
 		sendmp = NULL;
 		nbuf = NULL;
 		rsc = 0;
 		cur->wb.qword1.status_error_len = 0;
 		rbuf = &rxr->buffers[i];
 		mh = rbuf->m_head;
 		mp = rbuf->m_pack;
 		eop = (status & (1 << I40E_RX_DESC_STATUS_EOF_SHIFT));
 		if (status & (1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT))
 			vtag = le16toh(cur->wb.qword0.lo_dword.l2tag1);
 		else
 			vtag = 0;
 
 		/*
 		** Make sure bad packets are discarded,
 		** note that only EOP descriptor has valid
 		** error results.
 		*/
                 if (eop && (error & (1 << I40E_RX_DESC_ERROR_RXE_SHIFT))) {
 			rxr->desc_errs++;
 			ixl_rx_discard(rxr, i);
 			goto next_desc;
 		}
 
 		/* Prefetch the next buffer */
 		if (!eop) {
 			nextp = i + 1;
 			if (nextp == que->num_desc)
 				nextp = 0;
 			nbuf = &rxr->buffers[nextp];
 			prefetch(nbuf);
 		}
 
 		/*
 		** The header mbuf is ONLY used when header 
 		** split is enabled, otherwise we get normal 
 		** behavior, ie, both header and payload
 		** are DMA'd into the payload buffer.
 		**
 		** Rather than using the fmp/lmp global pointers
 		** we now keep the head of a packet chain in the
 		** buffer struct and pass this along from one
 		** descriptor to the next, until we get EOP.
 		*/
 		if (rxr->hdr_split && (rbuf->fmp == NULL)) {
 			if (hlen > IXL_RX_HDR)
 				hlen = IXL_RX_HDR;
 			mh->m_len = hlen;
 			mh->m_flags |= M_PKTHDR;
 			mh->m_next = NULL;
 			mh->m_pkthdr.len = mh->m_len;
 			/* Null buf pointer so it is refreshed */
 			rbuf->m_head = NULL;
 			/*
 			** Check the payload length, this
 			** could be zero if it's a small
 			** packet.
 			*/
 			if (plen > 0) {
 				mp->m_len = plen;
 				mp->m_next = NULL;
 				mp->m_flags &= ~M_PKTHDR;
 				mh->m_next = mp;
 				mh->m_pkthdr.len += mp->m_len;
 				/* Null buf pointer so it is refreshed */
 				rbuf->m_pack = NULL;
 				rxr->split++;
 			}
 			/*
 			** Now create the forward
 			** chain so when complete 
 			** we won't have to.
 			*/
                         if (eop == 0) {
 				/* stash the chain head */
                                 nbuf->fmp = mh;
 				/* Make forward chain */
                                 if (plen)
                                         mp->m_next = nbuf->m_pack;
                                 else
                                         mh->m_next = nbuf->m_pack;
                         } else {
 				/* Singlet, prepare to send */
                                 sendmp = mh;
                                 if (vtag) {
                                         sendmp->m_pkthdr.ether_vtag = vtag;
                                         sendmp->m_flags |= M_VLANTAG;
                                 }
                         }
 		} else {
 			/*
 			** Either no header split, or a
 			** secondary piece of a fragmented
 			** split packet.
 			*/
 			mp->m_len = plen;
 			/*
 			** See if there is a stored head
 			** that determines what we are
 			*/
 			sendmp = rbuf->fmp;
 			rbuf->m_pack = rbuf->fmp = NULL;
 
 			if (sendmp != NULL) /* secondary frag */
 				sendmp->m_pkthdr.len += mp->m_len;
 			else {
 				/* first desc of a non-ps chain */
 				sendmp = mp;
 				sendmp->m_flags |= M_PKTHDR;
 				sendmp->m_pkthdr.len = mp->m_len;
 				if (vtag) {
 					sendmp->m_pkthdr.ether_vtag = vtag;
 					sendmp->m_flags |= M_VLANTAG;
 				}
                         }
 			/* Pass the head pointer on */
 			if (eop == 0) {
 				nbuf->fmp = sendmp;
 				sendmp = NULL;
 				mp->m_next = nbuf->m_pack;
 			}
 		}
 		++processed;
 		/* Sending this frame? */
 		if (eop) {
 			sendmp->m_pkthdr.rcvif = ifp;
 			/* gather stats */
 			rxr->rx_packets++;
 			rxr->rx_bytes += sendmp->m_pkthdr.len;
 			/* capture data for dynamic ITR adjustment */
 			rxr->packets++;
 			rxr->bytes += sendmp->m_pkthdr.len;
 			if ((ifp->if_capenable & IFCAP_RXCSUM) != 0)
 				ixl_rx_checksum(sendmp, status, error, ptype);
 #ifdef RSS
 			sendmp->m_pkthdr.flowid =
 			    le32toh(cur->wb.qword0.hi_dword.rss);
 			M_HASHTYPE_SET(sendmp, ixl_ptype_to_hash(ptype));
 #else
 			sendmp->m_pkthdr.flowid = que->msix;
 			M_HASHTYPE_SET(sendmp, M_HASHTYPE_OPAQUE);
 #endif
 		}
 next_desc:
 		bus_dmamap_sync(rxr->dma.tag, rxr->dma.map,
 		    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
 		/* Advance our pointers to the next descriptor. */
 		if (++i == que->num_desc)
 			i = 0;
 
 		/* Now send to the stack or do LRO */
 		if (sendmp != NULL) {
 			rxr->next_check = i;
 			ixl_rx_input(rxr, ifp, sendmp, ptype);
 			i = rxr->next_check;
 		}
 
                /* Every 8 descriptors we go to refresh mbufs */
 		if (processed == 8) {
 			ixl_refresh_mbufs(que, i);
 			processed = 0;
 		}
 	}
 
 	/* Refresh any remaining buf structs */
 	if (ixl_rx_unrefreshed(que))
 		ixl_refresh_mbufs(que, i);
 
 	rxr->next_check = i;
 
 #if defined(INET6) || defined(INET)
 	/*
 	 * Flush any outstanding LRO work
 	 */
 	tcp_lro_flush_all(lro);
 #endif
 
 	IXL_RX_UNLOCK(rxr);
 	return (FALSE);
 }
 
 
 /*********************************************************************
  *
  *  Verify that the hardware indicated that the checksum is valid.
  *  Inform the stack about the status of checksum so that stack
  *  doesn't spend time verifying the checksum.
  *
  *********************************************************************/
 static void
 ixl_rx_checksum(struct mbuf * mp, u32 status, u32 error, u8 ptype)
 {
 	struct i40e_rx_ptype_decoded decoded;
 
 	decoded = decode_rx_desc_ptype(ptype);
 
 	/* Errors? */
  	if (error & ((1 << I40E_RX_DESC_ERROR_IPE_SHIFT) |
 	    (1 << I40E_RX_DESC_ERROR_L4E_SHIFT))) {
 		mp->m_pkthdr.csum_flags = 0;
 		return;
 	}
 
 	/* IPv6 with extension headers likely have bad csum */
 	if (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP &&
 	    decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV6)
 		if (status &
 		    (1 << I40E_RX_DESC_STATUS_IPV6EXADD_SHIFT)) {
 			mp->m_pkthdr.csum_flags = 0;
 			return;
 		}
 
  
 	/* IP Checksum Good */
 	mp->m_pkthdr.csum_flags = CSUM_IP_CHECKED;
 	mp->m_pkthdr.csum_flags |= CSUM_IP_VALID;
 
 	if (status & (1 << I40E_RX_DESC_STATUS_L3L4P_SHIFT)) {
 		mp->m_pkthdr.csum_flags |= 
 		    (CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
 		mp->m_pkthdr.csum_data |= htons(0xffff);
 	}
 	return;
 }
 
 #if __FreeBSD_version >= 1100000
 uint64_t
 ixl_get_counter(if_t ifp, ift_counter cnt)
 {
 	struct ixl_vsi *vsi;
 
 	vsi = if_getsoftc(ifp);
 
 	switch (cnt) {
 	case IFCOUNTER_IPACKETS:
 		return (vsi->ipackets);
 	case IFCOUNTER_IERRORS:
 		return (vsi->ierrors);
 	case IFCOUNTER_OPACKETS:
 		return (vsi->opackets);
 	case IFCOUNTER_OERRORS:
 		return (vsi->oerrors);
 	case IFCOUNTER_COLLISIONS:
 		/* Collisions are by standard impossible in 40G/10G Ethernet */
 		return (0);
 	case IFCOUNTER_IBYTES:
 		return (vsi->ibytes);
 	case IFCOUNTER_OBYTES:
 		return (vsi->obytes);
 	case IFCOUNTER_IMCASTS:
 		return (vsi->imcasts);
 	case IFCOUNTER_OMCASTS:
 		return (vsi->omcasts);
 	case IFCOUNTER_IQDROPS:
 		return (vsi->iqdrops);
 	case IFCOUNTER_OQDROPS:
 		return (vsi->oqdrops);
 	case IFCOUNTER_NOPROTO:
 		return (vsi->noproto);
 	default:
 		return (if_get_counter_default(ifp, cnt));
 	}
 }
 #endif
 
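Note on the receive path above: ixl_rxeof() pulls the status, error, packet-type and length fields out of a single 64-bit writeback word using mask-and-shift pairs, and advances the ring index with wraparound. The following is a minimal standalone sketch of those two steps, not the driver's code. The EX_* masks and shifts, struct ex_rx_fields, ex_decode_qword() and ex_next_index() are hypothetical names and values chosen for illustration; the real I40E_RXD_QW1_* definitions live in the driver's headers and are not shown in this diff.

```c
#include <stdint.h>

/*
 * Hypothetical field layout, for illustration only. The driver's
 * actual I40E_RXD_QW1_* masks and shifts may differ.
 */
#define EX_STATUS_SHIFT	0
#define EX_STATUS_MASK	(0x7FFFFULL << EX_STATUS_SHIFT)
#define EX_ERROR_SHIFT	19
#define EX_ERROR_MASK	(0xFFULL << EX_ERROR_SHIFT)
#define EX_PTYPE_SHIFT	30
#define EX_PTYPE_MASK	(0xFFULL << EX_PTYPE_SHIFT)
#define EX_PLEN_SHIFT	38
#define EX_PLEN_MASK	(0x3FFFULL << EX_PLEN_SHIFT)

/* Decoded view of one writeback word (hypothetical helper type). */
struct ex_rx_fields {
	uint32_t	status;
	uint32_t	error;
	uint8_t		ptype;
	uint16_t	plen;
};

/* Split a host-order 64-bit descriptor word into its fields. */
static struct ex_rx_fields
ex_decode_qword(uint64_t qword)
{
	struct ex_rx_fields f;

	f.status = (uint32_t)((qword & EX_STATUS_MASK) >> EX_STATUS_SHIFT);
	f.error = (uint32_t)((qword & EX_ERROR_MASK) >> EX_ERROR_SHIFT);
	f.ptype = (uint8_t)((qword & EX_PTYPE_MASK) >> EX_PTYPE_SHIFT);
	f.plen = (uint16_t)((qword & EX_PLEN_MASK) >> EX_PLEN_SHIFT);
	return (f);
}

/* Advance a ring index, wrapping to 0 at the end of the ring. */
static int
ex_next_index(int i, int num_desc)
{
	if (++i == num_desc)
		i = 0;
	return (i);
}
```

Under these assumed positions, a word built from a payload length of 1500, packet type 26 and status bits 0x3 decodes back to the same three values, and ex_next_index(7, 8) wraps to 0.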
Index: projects/vnet/sys/dev/mlx5/driver.h
===================================================================
--- projects/vnet/sys/dev/mlx5/driver.h	(revision 301546)
+++ projects/vnet/sys/dev/mlx5/driver.h	(revision 301547)
@@ -1,943 +1,944 @@
 /*-
  * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS `AS IS' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 
 #ifndef MLX5_DRIVER_H
 #define MLX5_DRIVER_H
 
 #include <linux/kernel.h>
 #include <linux/completion.h>
 #include <linux/pci.h>
 #include <linux/cache.h>
 #include <linux/rbtree.h>
+#include <linux/if_ether.h>
 #include <linux/semaphore.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/radix-tree.h>
 
 #include <dev/mlx5/device.h>
 #include <dev/mlx5/doorbell.h>
 
 enum {
 	MLX5_BOARD_ID_LEN = 64,
 	MLX5_MAX_NAME_LEN = 16,
 };
 
 enum {
 	/* two hours for the sake of bringup. Generally, commands must always
 	 * complete and we may need to increase this timeout value
 	 */
 	MLX5_CMD_TIMEOUT_MSEC	= 7200 * 1000,
 	MLX5_CMD_WQ_MAX_NAME	= 32,
 };
 
 enum {
 	CMD_OWNER_SW		= 0x0,
 	CMD_OWNER_HW		= 0x1,
 	CMD_STATUS_SUCCESS	= 0,
 };
 
 enum mlx5_sqp_t {
 	MLX5_SQP_SMI		= 0,
 	MLX5_SQP_GSI		= 1,
 	MLX5_SQP_IEEE_1588	= 2,
 	MLX5_SQP_SNIFFER	= 3,
 	MLX5_SQP_SYNC_UMR	= 4,
 };
 
 enum {
 	MLX5_MAX_PORTS	= 2,
 };
 
 enum {
 	MLX5_EQ_VEC_PAGES	 = 0,
 	MLX5_EQ_VEC_CMD		 = 1,
 	MLX5_EQ_VEC_ASYNC	 = 2,
 	MLX5_EQ_VEC_COMP_BASE,
 };
 
 enum {
 	MLX5_MAX_IRQ_NAME	= 32
 };
 
 enum {
 	MLX5_ATOMIC_MODE_IB_COMP	= 1 << 16,
 	MLX5_ATOMIC_MODE_CX		= 2 << 16,
 	MLX5_ATOMIC_MODE_8B		= 3 << 16,
 	MLX5_ATOMIC_MODE_16B		= 4 << 16,
 	MLX5_ATOMIC_MODE_32B		= 5 << 16,
 	MLX5_ATOMIC_MODE_64B		= 6 << 16,
 	MLX5_ATOMIC_MODE_128B		= 7 << 16,
 	MLX5_ATOMIC_MODE_256B		= 8 << 16,
 };
 
 enum {
 	MLX5_REG_QETCR		 = 0x4005,
 	MLX5_REG_QPDP		 = 0x4007,
 	MLX5_REG_QTCT		 = 0x400A,
 	MLX5_REG_PCAP		 = 0x5001,
 	MLX5_REG_PMTU		 = 0x5003,
 	MLX5_REG_PTYS		 = 0x5004,
 	MLX5_REG_PAOS		 = 0x5006,
 	MLX5_REG_PFCC		 = 0x5007,
 	MLX5_REG_PPCNT		 = 0x5008,
 	MLX5_REG_PMAOS		 = 0x5012,
 	MLX5_REG_PUDE		 = 0x5009,
 	MLX5_REG_PPTB		 = 0x500B,
 	MLX5_REG_PBMC		 = 0x500C,
 	MLX5_REG_PMPE		 = 0x5010,
 	MLX5_REG_PELC		 = 0x500e,
 	MLX5_REG_PVLC		 = 0x500f,
 	MLX5_REG_PMLP		 = 0x5002,
 	MLX5_REG_NODE_DESC	 = 0x6001,
 	MLX5_REG_HOST_ENDIANNESS = 0x7004,
 	MLX5_REG_MCIA		 = 0x9014,
 };
 
 enum dbg_rsc_type {
 	MLX5_DBG_RSC_QP,
 	MLX5_DBG_RSC_EQ,
 	MLX5_DBG_RSC_CQ,
 };
 
 struct mlx5_field_desc {
 	struct dentry	       *dent;
 	int			i;
 };
 
 struct mlx5_rsc_debug {
 	struct mlx5_core_dev   *dev;
 	void		       *object;
 	enum dbg_rsc_type	type;
 	struct dentry	       *root;
 	struct mlx5_field_desc	fields[0];
 };
 
 enum mlx5_dev_event {
 	MLX5_DEV_EVENT_SYS_ERROR,
 	MLX5_DEV_EVENT_PORT_UP,
 	MLX5_DEV_EVENT_PORT_DOWN,
 	MLX5_DEV_EVENT_PORT_INITIALIZED,
 	MLX5_DEV_EVENT_LID_CHANGE,
 	MLX5_DEV_EVENT_PKEY_CHANGE,
 	MLX5_DEV_EVENT_GUID_CHANGE,
 	MLX5_DEV_EVENT_CLIENT_REREG,
 	MLX5_DEV_EVENT_VPORT_CHANGE,
 };
 
 enum mlx5_port_status {
 	MLX5_PORT_UP        = 1 << 0,
 	MLX5_PORT_DOWN      = 1 << 1,
 };
 
 enum mlx5_link_mode {
 	MLX5_1000BASE_CX_SGMII	= 0,
 	MLX5_1000BASE_KX	= 1,
 	MLX5_10GBASE_CX4	= 2,
 	MLX5_10GBASE_KX4	= 3,
 	MLX5_10GBASE_KR		= 4,
 	MLX5_20GBASE_KR2	= 5,
 	MLX5_40GBASE_CR4	= 6,
 	MLX5_40GBASE_KR4	= 7,
 	MLX5_56GBASE_R4		= 8,
 	MLX5_10GBASE_CR		= 12,
 	MLX5_10GBASE_SR		= 13,
 	MLX5_10GBASE_ER		= 14,
 	MLX5_40GBASE_SR4	= 15,
 	MLX5_40GBASE_LR4	= 16,
 	MLX5_100GBASE_CR4	= 20,
 	MLX5_100GBASE_SR4	= 21,
 	MLX5_100GBASE_KR4	= 22,
 	MLX5_100GBASE_LR4	= 23,
 	MLX5_100BASE_TX		= 24,
 	MLX5_1000BASE_T		= 25,
 	MLX5_10GBASE_T		= 26,
 	MLX5_25GBASE_CR		= 27,
 	MLX5_25GBASE_KR		= 28,
 	MLX5_25GBASE_SR		= 29,
 	MLX5_50GBASE_CR2	= 30,
 	MLX5_50GBASE_KR2	= 31,
 	MLX5_LINK_MODES_NUMBER,
 };
 
 #define MLX5_PROT_MASK(link_mode) (1 << link_mode)
 
 struct mlx5_uuar_info {
 	struct mlx5_uar	       *uars;
 	int			num_uars;
 	int			num_low_latency_uuars;
 	unsigned long	       *bitmap;
 	unsigned int	       *count;
 	struct mlx5_bf	       *bfs;
 
 	/*
 	 * protect uuar allocation data structs
 	 */
 	struct mutex		lock;
 	u32			ver;
 };
 
 struct mlx5_bf {
 	void __iomem	       *reg;
 	void __iomem	       *regreg;
 	int			buf_size;
 	struct mlx5_uar	       *uar;
 	unsigned long		offset;
 	int			need_lock;
 	/* protect blue flame buffer selection when needed
 	 */
 	spinlock_t		lock;
 
 	/* serialize 64 bit writes when done as two 32 bit accesses
 	 */
 	spinlock_t		lock32;
 	int			uuarn;
 };
 
 struct mlx5_cmd_first {
 	__be32		data[4];
 };
 
 struct mlx5_cmd_msg {
 	struct list_head		list;
 	struct cache_ent	       *cache;
 	u32				len;
 	struct mlx5_cmd_first		first;
 	struct mlx5_cmd_mailbox	       *next;
 };
 
 struct mlx5_cmd_debug {
 	struct dentry	       *dbg_root;
 	struct dentry	       *dbg_in;
 	struct dentry	       *dbg_out;
 	struct dentry	       *dbg_outlen;
 	struct dentry	       *dbg_status;
 	struct dentry	       *dbg_run;
 	void		       *in_msg;
 	void		       *out_msg;
 	u8			status;
 	u16			inlen;
 	u16			outlen;
 };
 
 struct cache_ent {
 	/* protect block chain allocations
 	 */
 	spinlock_t		lock;
 	struct list_head	head;
 };
 
 struct cmd_msg_cache {
 	struct cache_ent	large;
 	struct cache_ent	med;
 
 };
 
 struct mlx5_cmd_stats {
 	u64		sum;
 	u64		n;
 	struct dentry  *root;
 	struct dentry  *avg;
 	struct dentry  *count;
 	/* protect command average calculations */
 	spinlock_t	lock;
 };
 
 struct mlx5_cmd {
 	void	       *cmd_alloc_buf;
 	dma_addr_t	alloc_dma;
 	int		alloc_size;
 	void	       *cmd_buf;
 	dma_addr_t	dma;
 	u16		cmdif_rev;
 	u8		log_sz;
 	u8		log_stride;
 	int		max_reg_cmds;
 	int		events;
 	u32 __iomem    *vector;
 
 	/* protect command queue allocations
 	 */
 	spinlock_t	alloc_lock;
 
 	/* protect token allocations
 	 */
 	spinlock_t	token_lock;
 	u8		token;
 	unsigned long	bitmask;
 	char		wq_name[MLX5_CMD_WQ_MAX_NAME];
 	struct workqueue_struct *wq;
 	struct semaphore sem;
 	struct semaphore pages_sem;
 	int	mode;
 	struct mlx5_cmd_work_ent *ent_arr[MLX5_MAX_COMMANDS];
 	struct pci_pool *pool;
 	struct mlx5_cmd_debug dbg;
 	struct cmd_msg_cache cache;
 	int checksum_disabled;
 	struct mlx5_cmd_stats stats[MLX5_CMD_OP_MAX];
 	int moving_to_polling;
 };
 
 struct mlx5_port_caps {
 	int	gid_table_len;
 	int	pkey_table_len;
 	u8	ext_port_cap;
 };
 
 struct mlx5_cmd_mailbox {
 	void	       *buf;
 	dma_addr_t	dma;
 	struct mlx5_cmd_mailbox *next;
 };
 
 struct mlx5_buf_list {
 	void		       *buf;
 	dma_addr_t		map;
 };
 
 struct mlx5_buf {
 	struct mlx5_buf_list	direct;
 	struct mlx5_buf_list   *page_list;
 	int			nbufs;
 	int			npages;
 	int			size;
 	u8			page_shift;
 };
 
 struct mlx5_eq {
 	struct mlx5_core_dev   *dev;
 	__be32 __iomem	       *doorbell;
 	u32			cons_index;
 	struct mlx5_buf		buf;
 	int			size;
 	u8			irqn;
 	u8			eqn;
 	int			nent;
 	u64			mask;
 	struct list_head	list;
 	int			index;
 	struct mlx5_rsc_debug	*dbg;
 };
 
 struct mlx5_core_psv {
 	u32	psv_idx;
 	struct psv_layout {
 		u32	pd;
 		u16	syndrome;
 		u16	reserved;
 		u16	bg;
 		u16	app_tag;
 		u32	ref_tag;
 	} psv;
 };
 
 struct mlx5_core_sig_ctx {
 	struct mlx5_core_psv	psv_memory;
 	struct mlx5_core_psv	psv_wire;
 #if (__FreeBSD_version >= 1100000)
 	struct ib_sig_err       err_item;
 #endif
 	bool			sig_status_checked;
 	bool			sig_err_exists;
 	u32			sigerr_count;
 };
 
 struct mlx5_core_mr {
 	u64			iova;
 	u64			size;
 	u32			key;
 	u32			pd;
 };
 
 enum mlx5_res_type {
 	MLX5_RES_QP,
 	MLX5_RES_SRQ,
 	MLX5_RES_XSRQ,
 };
 
 struct mlx5_core_rsc_common {
 	enum mlx5_res_type	res;
 	atomic_t		refcount;
 	struct completion	free;
 };
 
 struct mlx5_core_srq {
 	struct mlx5_core_rsc_common	common; /* must be first */
 	u32				srqn;
 	int				max;
 	int				max_gs;
 	int				max_avail_gather;
 	int				wqe_shift;
 	void				(*event)(struct mlx5_core_srq *, int);
 	atomic_t			refcount;
 	struct completion		free;
 };
 
 struct mlx5_eq_table {
 	void __iomem	       *update_ci;
 	void __iomem	       *update_arm_ci;
 	struct list_head	comp_eqs_list;
 	struct mlx5_eq		pages_eq;
 	struct mlx5_eq		async_eq;
 	struct mlx5_eq		cmd_eq;
 	int			num_comp_vectors;
 	/* protect EQs list
 	 */
 	spinlock_t		lock;
 };
 
 struct mlx5_uar {
 	u32			index;
 	struct list_head	bf_list;
 	unsigned		free_bf_bmap;
 	void __iomem	       *bf_map;
 	void __iomem	       *map;
 };
 
 
 struct mlx5_core_health {
 	struct mlx5_health_buffer __iomem	*health;
 	__be32 __iomem		       *health_counter;
 	struct timer_list		timer;
 	struct list_head		list;
 	u32				prev;
 	int				miss_counter;
 };
 
 #define	MLX5_CQ_LINEAR_ARRAY_SIZE	1024
 
 struct mlx5_cq_linear_array_entry {
 	spinlock_t	lock;
 	struct mlx5_core_cq * volatile cq;
 };
 
 struct mlx5_cq_table {
 	/* protect radix tree
 	 */
 	spinlock_t		lock;
 	struct radix_tree_root	tree;
 	struct mlx5_cq_linear_array_entry linear_array[MLX5_CQ_LINEAR_ARRAY_SIZE];
 };
 
 struct mlx5_qp_table {
 	/* protect radix tree
 	 */
 	spinlock_t		lock;
 	struct radix_tree_root	tree;
 };
 
 struct mlx5_srq_table {
 	/* protect radix tree
 	 */
 	spinlock_t		lock;
 	struct radix_tree_root	tree;
 };
 
 struct mlx5_mr_table {
 	/* protect radix tree
 	 */
 	rwlock_t		lock;
 	struct radix_tree_root	tree;
 };
 
 struct mlx5_irq_info {
 	char name[MLX5_MAX_IRQ_NAME];
 };
 
 struct mlx5_priv {
 	char			name[MLX5_MAX_NAME_LEN];
 	struct mlx5_eq_table	eq_table;
 	struct msix_entry	*msix_arr;
 	struct mlx5_irq_info	*irq_info;
 	struct mlx5_uuar_info	uuari;
 	MLX5_DECLARE_DOORBELL_LOCK(cq_uar_lock);
 
 	struct io_mapping	*bf_mapping;
 
 	/* pages stuff */
 	struct workqueue_struct *pg_wq;
 	struct rb_root		page_root;
 	int			fw_pages;
 	int			reg_pages;
 	struct list_head	free_list;
 
 	struct mlx5_core_health health;
 
 	struct mlx5_srq_table	srq_table;
 
 	/* start: qp stuff */
 	struct mlx5_qp_table	qp_table;
 	struct dentry	       *qp_debugfs;
 	struct dentry	       *eq_debugfs;
 	struct dentry	       *cq_debugfs;
 	struct dentry	       *cmdif_debugfs;
 	/* end: qp stuff */
 
 	/* start: cq stuff */
 	struct mlx5_cq_table	cq_table;
 	/* end: cq stuff */
 
 	/* start: mr stuff */
 	struct mlx5_mr_table	mr_table;
 	/* end: mr stuff */
 
 	/* start: alloc stuff */
 	int			numa_node;
 
 	struct mutex   pgdir_mutex;
 	struct list_head        pgdir_list;
 	/* end: alloc stuff */
 	struct dentry	       *dbg_root;
 
 	/* protect mkey key part */
 	spinlock_t		mkey_lock;
 	u8			mkey_key;
 
 	struct list_head        dev_list;
 	struct list_head        ctx_list;
 	spinlock_t              ctx_lock;
 };
 
 struct mlx5_special_contexts {
 	int resd_lkey;
 };
 
 struct mlx5_core_dev {
 	struct pci_dev	       *pdev;
 	char			board_id[MLX5_BOARD_ID_LEN];
 	struct mlx5_cmd		cmd;
 	struct mlx5_port_caps	port_caps[MLX5_MAX_PORTS];
 	u32 hca_caps_cur[MLX5_CAP_NUM][MLX5_UN_SZ_DW(hca_cap_union)];
 	u32 hca_caps_max[MLX5_CAP_NUM][MLX5_UN_SZ_DW(hca_cap_union)];
 	struct mlx5_init_seg __iomem *iseg;
 	void			(*event) (struct mlx5_core_dev *dev,
 					  enum mlx5_dev_event event,
 					  unsigned long param);
 	struct mlx5_priv	priv;
 	struct mlx5_profile	*profile;
 	atomic_t		num_qps;
 	u32			issi;
 	struct mlx5_special_contexts special_contexts;
 	unsigned int module_status[MLX5_MAX_PORTS];
 };
 
 enum {
 	MLX5_WOL_DISABLE       = 0,
 	MLX5_WOL_SECURED_MAGIC = 1 << 1,
 	MLX5_WOL_MAGIC         = 1 << 2,
 	MLX5_WOL_ARP           = 1 << 3,
 	MLX5_WOL_BROADCAST     = 1 << 4,
 	MLX5_WOL_MULTICAST     = 1 << 5,
 	MLX5_WOL_UNICAST       = 1 << 6,
 	MLX5_WOL_PHY_ACTIVITY  = 1 << 7,
 };
 
 struct mlx5_db {
 	__be32			*db;
 	union {
 		struct mlx5_db_pgdir		*pgdir;
 		struct mlx5_ib_user_db_page	*user_page;
 	}			u;
 	dma_addr_t		dma;
 	int			index;
 };
 
 struct mlx5_net_counters {
 	u64	packets;
 	u64	octets;
 };
 
 struct mlx5_ptys_reg {
 	u8	local_port;
 	u8	proto_mask;
 	u32	eth_proto_cap;
 	u16	ib_link_width_cap;
 	u16	ib_proto_cap;
 	u32	eth_proto_admin;
 	u16	ib_link_width_admin;
 	u16	ib_proto_admin;
 	u32	eth_proto_oper;
 	u16	ib_link_width_oper;
 	u16	ib_proto_oper;
 	u32	eth_proto_lp_advertise;
 };
 
 struct mlx5_pvlc_reg {
 	u8	local_port;
 	u8	vl_hw_cap;
 	u8	vl_admin;
 	u8	vl_operational;
 };
 
 struct mlx5_pmtu_reg {
 	u8	local_port;
 	u16	max_mtu;
 	u16	admin_mtu;
 	u16	oper_mtu;
 };
 
 struct mlx5_vport_counters {
 	struct mlx5_net_counters	received_errors;
 	struct mlx5_net_counters	transmit_errors;
 	struct mlx5_net_counters	received_ib_unicast;
 	struct mlx5_net_counters	transmitted_ib_unicast;
 	struct mlx5_net_counters	received_ib_multicast;
 	struct mlx5_net_counters	transmitted_ib_multicast;
 	struct mlx5_net_counters	received_eth_broadcast;
 	struct mlx5_net_counters	transmitted_eth_broadcast;
 	struct mlx5_net_counters	received_eth_unicast;
 	struct mlx5_net_counters	transmitted_eth_unicast;
 	struct mlx5_net_counters	received_eth_multicast;
 	struct mlx5_net_counters	transmitted_eth_multicast;
 };
 
 enum {
 	MLX5_DB_PER_PAGE = PAGE_SIZE / L1_CACHE_BYTES,
 };
 
 enum {
 	MLX5_COMP_EQ_SIZE = 1024,
 };
 
 enum {
 	MLX5_PTYS_IB = 1 << 0,
 	MLX5_PTYS_EN = 1 << 2,
 };
 
 struct mlx5_db_pgdir {
 	struct list_head	list;
 	DECLARE_BITMAP(bitmap, MLX5_DB_PER_PAGE);
 	__be32		       *db_page;
 	dma_addr_t		db_dma;
 };
 
 typedef void (*mlx5_cmd_cbk_t)(int status, void *context);
 
 struct mlx5_cmd_work_ent {
 	struct mlx5_cmd_msg    *in;
 	struct mlx5_cmd_msg    *out;
 	void		       *uout;
 	int			uout_size;
 	mlx5_cmd_cbk_t		callback;
 	void		       *context;
 	int			idx;
 	struct completion	done;
 	struct mlx5_cmd        *cmd;
 	struct work_struct	work;
 	struct mlx5_cmd_layout *lay;
 	int			ret;
 	int			page_queue;
 	u8			status;
 	u8			token;
 	u64			ts1;
 	u64			ts2;
 	u16			op;
 };
 
 struct mlx5_pas {
 	u64	pa;
 	u8	log_sz;
 };
 
 static inline void *mlx5_buf_offset(struct mlx5_buf *buf, int offset)
 {
 	if (likely(BITS_PER_LONG == 64 || buf->nbufs == 1))
 		return buf->direct.buf + offset;
 	else
 		return buf->page_list[offset >> PAGE_SHIFT].buf +
 			(offset & (PAGE_SIZE - 1));
 }
 
 
 extern struct workqueue_struct *mlx5_core_wq;
 
 #define STRUCT_FIELD(header, field) \
 	.struct_offset_bytes = offsetof(struct ib_unpacked_ ## header, field),      \
 	.struct_size_bytes   = sizeof((struct ib_unpacked_ ## header *)0)->field
 
 static inline struct mlx5_core_dev *pci2mlx5_core_dev(struct pci_dev *pdev)
 {
 	return pci_get_drvdata(pdev);
 }
 
 extern struct dentry *mlx5_debugfs_root;
 
 static inline u16 fw_rev_maj(struct mlx5_core_dev *dev)
 {
 	return ioread32be(&dev->iseg->fw_rev) & 0xffff;
 }
 
 static inline u16 fw_rev_min(struct mlx5_core_dev *dev)
 {
 	return ioread32be(&dev->iseg->fw_rev) >> 16;
 }
 
 static inline u16 fw_rev_sub(struct mlx5_core_dev *dev)
 {
 	return ioread32be(&dev->iseg->cmdif_rev_fw_sub) & 0xffff;
 }
 
 static inline u16 cmdif_rev_get(struct mlx5_core_dev *dev)
 {
 	return ioread32be(&dev->iseg->cmdif_rev_fw_sub) >> 16;
 }
 
 static inline int mlx5_get_gid_table_len(u16 param)
 {
 	if (param > 4) {
 		printf("M4_CORE_DRV_NAME: WARN: ""gid table length is zero\n");
 		return 0;
 	}
 
 	return 8 * (1 << param);
 }
 
 static inline void *mlx5_vzalloc(unsigned long size)
 {
 	void *rtn;
 
 	rtn = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
 	return rtn;
 }
 
 static inline u32 mlx5_base_mkey(const u32 key)
 {
 	return key & 0xffffff00u;
 }
 
 int mlx5_cmd_init(struct mlx5_core_dev *dev);
 void mlx5_cmd_cleanup(struct mlx5_core_dev *dev);
 void mlx5_cmd_use_events(struct mlx5_core_dev *dev);
 void mlx5_cmd_use_polling(struct mlx5_core_dev *dev);
 int mlx5_cmd_status_to_err(struct mlx5_outbox_hdr *hdr);
 int mlx5_cmd_status_to_err_v2(void *ptr);
 int mlx5_core_get_caps(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type,
 		       enum mlx5_cap_mode cap_mode);
 int mlx5_cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out,
 		  int out_size);
 int mlx5_cmd_exec_cb(struct mlx5_core_dev *dev, void *in, int in_size,
 		     void *out, int out_size, mlx5_cmd_cbk_t callback,
 		     void *context);
 int mlx5_cmd_alloc_uar(struct mlx5_core_dev *dev, u32 *uarn);
 int mlx5_cmd_free_uar(struct mlx5_core_dev *dev, u32 uarn);
 int mlx5_alloc_uuars(struct mlx5_core_dev *dev, struct mlx5_uuar_info *uuari);
 int mlx5_free_uuars(struct mlx5_core_dev *dev, struct mlx5_uuar_info *uuari);
 int mlx5_alloc_map_uar(struct mlx5_core_dev *mdev, struct mlx5_uar *uar);
 void mlx5_unmap_free_uar(struct mlx5_core_dev *mdev, struct mlx5_uar *uar);
 void mlx5_health_cleanup(void);
 void  __init mlx5_health_init(void);
 void mlx5_start_health_poll(struct mlx5_core_dev *dev);
 void mlx5_stop_health_poll(struct mlx5_core_dev *dev);
 int mlx5_buf_alloc_node(struct mlx5_core_dev *dev, int size, int max_direct,
 			struct mlx5_buf *buf, int node);
 int mlx5_buf_alloc(struct mlx5_core_dev *dev, int size, int max_direct,
 		   struct mlx5_buf *buf);
 void mlx5_buf_free(struct mlx5_core_dev *dev, struct mlx5_buf *buf);
 int mlx5_core_create_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
 			 struct mlx5_create_srq_mbox_in *in, int inlen,
 			 int is_xrc);
 int mlx5_core_destroy_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq);
 int mlx5_core_query_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
 			struct mlx5_query_srq_mbox_out *out);
 int mlx5_core_query_vendor_id(struct mlx5_core_dev *mdev, u32 *vendor_id);
 int mlx5_core_arm_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
 		      u16 lwm, int is_srq);
 void mlx5_init_mr_table(struct mlx5_core_dev *dev);
 void mlx5_cleanup_mr_table(struct mlx5_core_dev *dev);
 int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
 			  struct mlx5_create_mkey_mbox_in *in, int inlen,
 			  mlx5_cmd_cbk_t callback, void *context,
 			  struct mlx5_create_mkey_mbox_out *out);
 int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr);
 int mlx5_core_query_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
 			 struct mlx5_query_mkey_mbox_out *out, int outlen);
 int mlx5_core_dump_fill_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
 			     u32 *mkey);
 int mlx5_core_alloc_pd(struct mlx5_core_dev *dev, u32 *pdn);
 int mlx5_core_dealloc_pd(struct mlx5_core_dev *dev, u32 pdn);
 int mlx5_core_mad_ifc(struct mlx5_core_dev *dev, void *inb, void *outb,
 		      u16 opmod, u8 port);
 void mlx5_pagealloc_init(struct mlx5_core_dev *dev);
 void mlx5_pagealloc_cleanup(struct mlx5_core_dev *dev);
 int mlx5_pagealloc_start(struct mlx5_core_dev *dev);
 void mlx5_pagealloc_stop(struct mlx5_core_dev *dev);
 void mlx5_core_req_pages_handler(struct mlx5_core_dev *dev, u16 func_id,
 				 s32 npages);
 int mlx5_satisfy_startup_pages(struct mlx5_core_dev *dev, int boot);
 int mlx5_reclaim_startup_pages(struct mlx5_core_dev *dev);
 void mlx5_register_debugfs(void);
 void mlx5_unregister_debugfs(void);
 int mlx5_eq_init(struct mlx5_core_dev *dev);
 void mlx5_eq_cleanup(struct mlx5_core_dev *dev);
 void mlx5_fill_page_array(struct mlx5_buf *buf, __be64 *pas);
 void mlx5_cq_completion(struct mlx5_core_dev *dev, u32 cqn);
 void mlx5_rsc_event(struct mlx5_core_dev *dev, u32 rsn, int event_type);
 void mlx5_srq_event(struct mlx5_core_dev *dev, u32 srqn, int event_type);
 struct mlx5_core_srq *mlx5_core_get_srq(struct mlx5_core_dev *dev, u32 srqn);
 void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, unsigned long vector);
 void mlx5_cq_event(struct mlx5_core_dev *dev, u32 cqn, int event_type);
 int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
 		       int nent, u64 mask, const char *name, struct mlx5_uar *uar);
 int mlx5_destroy_unmap_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq);
 int mlx5_start_eqs(struct mlx5_core_dev *dev);
 int mlx5_stop_eqs(struct mlx5_core_dev *dev);
 int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn, int *irqn);
 int mlx5_core_attach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, u32 qpn);
 int mlx5_core_detach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, u32 qpn);
 
 int mlx5_qp_debugfs_init(struct mlx5_core_dev *dev);
 void mlx5_qp_debugfs_cleanup(struct mlx5_core_dev *dev);
 int mlx5_core_access_reg(struct mlx5_core_dev *dev, void *data_in,
 			 int size_in, void *data_out, int size_out,
 			 u16 reg_num, int arg, int write);
 
 int mlx5_set_port_caps(struct mlx5_core_dev *dev, u8 port_num, u32 caps);
 int mlx5_query_port_ptys(struct mlx5_core_dev *dev, u32 *ptys,
 			 int ptys_size, int proto_mask);
 int mlx5_query_port_proto_cap(struct mlx5_core_dev *dev,
 			      u32 *proto_cap, int proto_mask);
 int mlx5_query_port_proto_admin(struct mlx5_core_dev *dev,
 				u32 *proto_admin, int proto_mask);
 int mlx5_set_port_proto(struct mlx5_core_dev *dev, u32 proto_admin,
 			int proto_mask);
 int mlx5_set_port_status(struct mlx5_core_dev *dev,
 			 enum mlx5_port_status status);
 int mlx5_query_port_status(struct mlx5_core_dev *dev, u8 *status);
 int mlx5_set_port_pause(struct mlx5_core_dev *dev, u32 port,
 			u32 rx_pause, u32 tx_pause);
 int mlx5_query_port_pause(struct mlx5_core_dev *dev, u32 port,
 			  u32 *rx_pause, u32 *tx_pause);
 
 int mlx5_set_port_mtu(struct mlx5_core_dev *dev, int mtu);
 int mlx5_query_port_max_mtu(struct mlx5_core_dev *dev, int *max_mtu);
 int mlx5_query_port_oper_mtu(struct mlx5_core_dev *dev, int *oper_mtu);
 
 unsigned int mlx5_query_module_status(struct mlx5_core_dev *dev, int module_num);
 int mlx5_query_module_num(struct mlx5_core_dev *dev, int *module_num);
 int mlx5_query_eeprom(struct mlx5_core_dev *dev, int i2c_addr, int page_num,
 		      int device_addr, int size, int module_num, u32 *data,
 		      int *size_read);
 
 int mlx5_debug_eq_add(struct mlx5_core_dev *dev, struct mlx5_eq *eq);
 void mlx5_debug_eq_remove(struct mlx5_core_dev *dev, struct mlx5_eq *eq);
 int mlx5_core_eq_query(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
 		       struct mlx5_query_eq_mbox_out *out, int outlen);
 int mlx5_eq_debugfs_init(struct mlx5_core_dev *dev);
 void mlx5_eq_debugfs_cleanup(struct mlx5_core_dev *dev);
 int mlx5_cq_debugfs_init(struct mlx5_core_dev *dev);
 void mlx5_cq_debugfs_cleanup(struct mlx5_core_dev *dev);
 int mlx5_db_alloc(struct mlx5_core_dev *dev, struct mlx5_db *db);
 int mlx5_db_alloc_node(struct mlx5_core_dev *dev, struct mlx5_db *db,
 		       int node);
 void mlx5_db_free(struct mlx5_core_dev *dev, struct mlx5_db *db);
 
 const char *mlx5_command_str(int command);
 int mlx5_cmdif_debugfs_init(struct mlx5_core_dev *dev);
 void mlx5_cmdif_debugfs_cleanup(struct mlx5_core_dev *dev);
 int mlx5_core_create_psv(struct mlx5_core_dev *dev, u32 pdn,
 			 int npsvs, u32 *sig_index);
 int mlx5_core_destroy_psv(struct mlx5_core_dev *dev, int psv_num);
 void mlx5_core_put_rsc(struct mlx5_core_rsc_common *common);
 u8 mlx5_is_wol_supported(struct mlx5_core_dev *dev);
 int mlx5_set_wol(struct mlx5_core_dev *dev, u8 wol_mode);
 int mlx5_query_wol(struct mlx5_core_dev *dev, u8 *wol_mode);
 int mlx5_core_access_pvlc(struct mlx5_core_dev *dev,
 			  struct mlx5_pvlc_reg *pvlc, int write);
 int mlx5_core_access_ptys(struct mlx5_core_dev *dev,
 			  struct mlx5_ptys_reg *ptys, int write);
 int mlx5_core_access_pmtu(struct mlx5_core_dev *dev,
 			  struct mlx5_pmtu_reg *pmtu, int write);
 int mlx5_vxlan_udp_port_add(struct mlx5_core_dev *dev, u16 port);
 int mlx5_vxlan_udp_port_delete(struct mlx5_core_dev *dev, u16 port);
 int mlx5_query_port_cong_status(struct mlx5_core_dev *mdev, int protocol,
 				int priority, int *is_enable);
 int mlx5_modify_port_cong_status(struct mlx5_core_dev *mdev, int protocol,
 				 int priority, int enable);
 int mlx5_query_port_cong_params(struct mlx5_core_dev *mdev, int protocol,
 				void *out, int out_size);
 int mlx5_modify_port_cong_params(struct mlx5_core_dev *mdev,
 				 void *in, int in_size);
 int mlx5_query_port_cong_statistics(struct mlx5_core_dev *mdev, int clear,
 				    void *out, int out_size);
 static inline u32 mlx5_mkey_to_idx(u32 mkey)
 {
 	return mkey >> 8;
 }
 
 static inline u32 mlx5_idx_to_mkey(u32 mkey_idx)
 {
 	return mkey_idx << 8;
 }
 
 static inline u8 mlx5_mkey_variant(u32 mkey)
 {
 	return mkey & 0xff;
 }
 
 enum {
 	MLX5_PROF_MASK_QP_SIZE		= (u64)1 << 0,
 	MLX5_PROF_MASK_MR_CACHE		= (u64)1 << 1,
 };
 
 enum {
 	MAX_MR_CACHE_ENTRIES    = 16,
 };
 
 enum {
 	MLX5_INTERFACE_PROTOCOL_IB  = 0,
 	MLX5_INTERFACE_PROTOCOL_ETH = 1,
 };
 
 struct mlx5_interface {
 	void *			(*add)(struct mlx5_core_dev *dev);
 	void			(*remove)(struct mlx5_core_dev *dev, void *context);
 	void			(*event)(struct mlx5_core_dev *dev, void *context,
 					 enum mlx5_dev_event event, unsigned long param);
 	void *                  (*get_dev)(void *context);
 	int			protocol;
 	struct list_head	list;
 };
 
 void *mlx5_get_protocol_dev(struct mlx5_core_dev *mdev, int protocol);
 int mlx5_register_interface(struct mlx5_interface *intf);
 void mlx5_unregister_interface(struct mlx5_interface *intf);
 
 struct mlx5_profile {
 	u64	mask;
 	u8	log_max_qp;
 	struct {
 		int	size;
 		int	limit;
 	} mr_cache[MAX_MR_CACHE_ENTRIES];
 };
 
 
 #define MLX5_EEPROM_MAX_BYTES			32
 #define MLX5_EEPROM_IDENTIFIER_BYTE_MASK	0x000000ff
 #define MLX5_EEPROM_REVISION_ID_BYTE_MASK	0x0000ff00
 #define MLX5_EEPROM_PAGE_3_VALID_BIT_MASK	0x00040000
 #endif /* MLX5_DRIVER_H */
Index: projects/vnet/sys/dev/mlx5/mlx5_core/mlx5_vport.c
===================================================================
--- projects/vnet/sys/dev/mlx5/mlx5_core/mlx5_vport.c	(revision 301546)
+++ projects/vnet/sys/dev/mlx5/mlx5_core/mlx5_vport.c	(revision 301547)
@@ -1,922 +1,1280 @@
 /*-
  * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS `AS IS' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 
 #include <linux/etherdevice.h>
 #include <dev/mlx5/driver.h>
 #include <dev/mlx5/vport.h>
 #include "mlx5_core.h"
 
 u8 mlx5_query_vport_state(struct mlx5_core_dev *mdev, u8 opmod)
 {
 	u32 in[MLX5_ST_SZ_DW(query_vport_state_in)];
 	u32 out[MLX5_ST_SZ_DW(query_vport_state_out)];
 	int err;
 
 	memset(in, 0, sizeof(in));
 
 	MLX5_SET(query_vport_state_in, in, opcode,
 		 MLX5_CMD_OP_QUERY_VPORT_STATE);
 	MLX5_SET(query_vport_state_in, in, op_mod, opmod);
 
 	err = mlx5_cmd_exec_check_status(mdev, in, sizeof(in), out,
 					 sizeof(out));
 	if (err)
 		mlx5_core_warn(mdev, "MLX5_CMD_OP_QUERY_VPORT_STATE failed\n");
 
 	return MLX5_GET(query_vport_state_out, out, state);
 }
 EXPORT_SYMBOL_GPL(mlx5_query_vport_state);
 
 static int mlx5_query_nic_vport_context(struct mlx5_core_dev *mdev, u32 vport,
 					u32 *out, int outlen)
 {
 	u32 in[MLX5_ST_SZ_DW(query_nic_vport_context_in)];
 
 	memset(in, 0, sizeof(in));
 
 	MLX5_SET(query_nic_vport_context_in, in, opcode,
 		 MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
 
 	MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
 	if (vport)
 		MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
 
 	return mlx5_cmd_exec_check_status(mdev, in, sizeof(in), out, outlen);
 }
 
 int mlx5_vport_alloc_q_counter(struct mlx5_core_dev *mdev, int *counter_set_id)
 {
 	u32 in[MLX5_ST_SZ_DW(alloc_q_counter_in)];
 	u32 out[MLX5_ST_SZ_DW(alloc_q_counter_in)];
 	int err;
 
 	memset(in, 0, sizeof(in));
 	memset(out, 0, sizeof(out));
 
 	MLX5_SET(alloc_q_counter_in, in, opcode,
 		 MLX5_CMD_OP_ALLOC_Q_COUNTER);
 
 	err = mlx5_cmd_exec_check_status(mdev, in, sizeof(in),
 					 out, sizeof(out));
 
 	if (err)
 		return err;
 
 	*counter_set_id = MLX5_GET(alloc_q_counter_out, out,
 				   counter_set_id);
 	return err;
 }
 
 int mlx5_vport_dealloc_q_counter(struct mlx5_core_dev *mdev,
 				 int counter_set_id)
 {
 	u32 in[MLX5_ST_SZ_DW(dealloc_q_counter_in)];
 	u32 out[MLX5_ST_SZ_DW(dealloc_q_counter_out)];
 
 	memset(in, 0, sizeof(in));
 	memset(out, 0, sizeof(out));
 
 	MLX5_SET(dealloc_q_counter_in, in, opcode,
 		 MLX5_CMD_OP_DEALLOC_Q_COUNTER);
 	MLX5_SET(dealloc_q_counter_in, in, counter_set_id,
 		 counter_set_id);
 
 	return mlx5_cmd_exec_check_status(mdev, in, sizeof(in),
 					  out, sizeof(out));
 }
 
 static int mlx5_vport_query_q_counter(struct mlx5_core_dev *mdev,
 				      int counter_set_id,
 				      int reset,
 				      void *out,
 				      int out_size)
 {
 	u32 in[MLX5_ST_SZ_DW(query_q_counter_in)];
 
 	memset(in, 0, sizeof(in));
 
 	MLX5_SET(query_q_counter_in, in, opcode, MLX5_CMD_OP_QUERY_Q_COUNTER);
 	MLX5_SET(query_q_counter_in, in, clear, reset);
 	MLX5_SET(query_q_counter_in, in, counter_set_id, counter_set_id);
 
 	return mlx5_cmd_exec_check_status(mdev, in, sizeof(in),
 					  out, out_size);
 }
 
 int mlx5_vport_query_out_of_rx_buffer(struct mlx5_core_dev *mdev,
 				      int counter_set_id,
 				      u32 *out_of_rx_buffer)
 {
 	u32 out[MLX5_ST_SZ_DW(query_q_counter_out)];
 	int err;
 
 	memset(out, 0, sizeof(out));
 
 	err = mlx5_vport_query_q_counter(mdev, counter_set_id, 0, out,
 					 sizeof(out));
 
 	if (err)
 		return err;
 
 	*out_of_rx_buffer = MLX5_GET(query_q_counter_out, out,
 				     out_of_buffer);
 	return err;
 }
 
 int mlx5_query_nic_vport_mac_address(struct mlx5_core_dev *mdev,
 				     u32 vport, u8 *addr)
 {
 	u32 *out;
 	int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
 	u8 *out_addr;
 	int err;
 
 	out = mlx5_vzalloc(outlen);
 	if (!out)
 		return -ENOMEM;
 
 	out_addr = MLX5_ADDR_OF(query_nic_vport_context_out, out,
 				nic_vport_context.permanent_address);
 
 	err = mlx5_query_nic_vport_context(mdev, vport, out, outlen);
 	if (err)
 		goto out;
 
 	ether_addr_copy(addr, &out_addr[2]);
 
 out:
 	kvfree(out);
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_mac_address);
 
 int mlx5_query_nic_vport_system_image_guid(struct mlx5_core_dev *mdev,
 					   u64 *system_image_guid)
 {
 	u32 *out;
 	int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
 	int err;
 
 	out = mlx5_vzalloc(outlen);
 	if (!out)
 		return -ENOMEM;
 
 	err = mlx5_query_nic_vport_context(mdev, 0, out, outlen);
 	if (err)
 		goto out;
 
 	*system_image_guid = MLX5_GET64(query_nic_vport_context_out, out,
 					nic_vport_context.system_image_guid);
 out:
 	kvfree(out);
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_system_image_guid);
 
 int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid)
 {
 	u32 *out;
 	int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
 	int err;
 
 	out = mlx5_vzalloc(outlen);
 	if (!out)
 		return -ENOMEM;
 
 	err = mlx5_query_nic_vport_context(mdev, 0, out, outlen);
 	if (err)
 		goto out;
 
 	*node_guid = MLX5_GET64(query_nic_vport_context_out, out,
 				nic_vport_context.node_guid);
 
 out:
 	kvfree(out);
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_node_guid);
 
 int mlx5_query_nic_vport_port_guid(struct mlx5_core_dev *mdev, u64 *port_guid)
 {
 	u32 *out;
 	int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
 	int err;
 
 	out = mlx5_vzalloc(outlen);
 	if (!out)
 		return -ENOMEM;
 
 	err = mlx5_query_nic_vport_context(mdev, 0, out, outlen);
 	if (err)
 		goto out;
 
 	*port_guid = MLX5_GET64(query_nic_vport_context_out, out,
 				nic_vport_context.port_guid);
 
 out:
 	kvfree(out);
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_port_guid);
 
 int mlx5_query_nic_vport_qkey_viol_cntr(struct mlx5_core_dev *mdev,
 					u16 *qkey_viol_cntr)
 {
 	u32 *out;
 	int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
 	int err;
 
 	out = mlx5_vzalloc(outlen);
 	if (!out)
 		return -ENOMEM;
 
 	err = mlx5_query_nic_vport_context(mdev, 0, out, outlen);
 	if (err)
 		goto out;
 
 	*qkey_viol_cntr = MLX5_GET(query_nic_vport_context_out, out,
 				nic_vport_context.qkey_violation_counter);
 
 out:
 	kvfree(out);
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_qkey_viol_cntr);
 
 static int mlx5_modify_nic_vport_context(struct mlx5_core_dev *mdev, void *in,
 					 int inlen)
 {
 	u32 out[MLX5_ST_SZ_DW(modify_nic_vport_context_out)];
 
 	MLX5_SET(modify_nic_vport_context_in, in, opcode,
 		 MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
 
 	memset(out, 0, sizeof(out));
 	return mlx5_cmd_exec_check_status(mdev, in, inlen, out, sizeof(out));
 }
 
 static int mlx5_nic_vport_enable_disable_roce(struct mlx5_core_dev *mdev,
 					      int enable_disable)
 {
 	void *in;
 	int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
 	int err;
 
 	in = mlx5_vzalloc(inlen);
 	if (!in) {
 		mlx5_core_warn(mdev, "failed to allocate inbox\n");
 		return -ENOMEM;
 	}
 
 	MLX5_SET(modify_nic_vport_context_in, in, field_select.roce_en, 1);
 	MLX5_SET(modify_nic_vport_context_in, in, nic_vport_context.roce_en,
 		 enable_disable);
 
 	err = mlx5_modify_nic_vport_context(mdev, in, inlen);
 
 	kvfree(in);
 
 	return err;
 }
 
 int mlx5_set_nic_vport_current_mac(struct mlx5_core_dev *mdev, int vport,
 				   bool other_vport, u8 *addr)
 {
 	void *in;
 	int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in)
 		  + MLX5_ST_SZ_BYTES(mac_address_layout);
 	u8  *mac_layout;
 	u8  *mac_ptr;
 	int err;
 
 	in = mlx5_vzalloc(inlen);
 	if (!in) {
 		mlx5_core_warn(mdev, "failed to allocate inbox\n");
 		return -ENOMEM;
 	}
 
 	MLX5_SET(modify_nic_vport_context_in, in,
 		 opcode, MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
 	MLX5_SET(modify_nic_vport_context_in, in,
 		 vport_number, vport);
 	MLX5_SET(modify_nic_vport_context_in, in,
 		 other_vport, other_vport);
 	MLX5_SET(modify_nic_vport_context_in, in,
 		 field_select.addresses_list, 1);
 	MLX5_SET(modify_nic_vport_context_in, in,
 		 nic_vport_context.allowed_list_type,
 		 MLX5_NIC_VPORT_LIST_TYPE_UC);
 	MLX5_SET(modify_nic_vport_context_in, in,
 		 nic_vport_context.allowed_list_size, 1);
 
 	mac_layout = (u8 *)MLX5_ADDR_OF(modify_nic_vport_context_in, in,
 		nic_vport_context.current_uc_mac_address);
 	mac_ptr = (u8 *)MLX5_ADDR_OF(mac_address_layout, mac_layout,
 		mac_addr_47_32);
 	ether_addr_copy(mac_ptr, addr);
 
 	err = mlx5_modify_nic_vport_context(mdev, in, inlen);
 
 	kvfree(in);
 
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_set_nic_vport_current_mac);
 
 int mlx5_set_nic_vport_vlan_list(struct mlx5_core_dev *dev, u32 vport,
 				 u16 *vlan_list, int list_len)
 {
 	void *in, *ctx;
 	int i, err;
 	int  inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in)
 		+ MLX5_ST_SZ_BYTES(vlan_layout) * (int)list_len;
 
 	int max_list_size = 1 << MLX5_CAP_GEN_MAX(dev, log_max_vlan_list);
 
 	if (list_len > max_list_size) {
 		mlx5_core_warn(dev, "Requested list size (%d) > (%d) max_list_size\n",
 			       list_len, max_list_size);
 		return -ENOSPC;
 	}
 
 	in = mlx5_vzalloc(inlen);
 	if (!in) {
 		mlx5_core_warn(dev, "failed to allocate inbox\n");
 		return -ENOMEM;
 	}
 
 	MLX5_SET(modify_nic_vport_context_in, in, vport_number, vport);
 	if (vport)
 		MLX5_SET(modify_nic_vport_context_in, in,
 			 other_vport, 1);
 	MLX5_SET(modify_nic_vport_context_in, in,
 		 field_select.addresses_list, 1);
 
 	ctx = MLX5_ADDR_OF(modify_nic_vport_context_in, in, nic_vport_context);
 
 	MLX5_SET(nic_vport_context, ctx, allowed_list_type,
 		 MLX5_NIC_VPORT_LIST_TYPE_VLAN);
 	MLX5_SET(nic_vport_context, ctx, allowed_list_size, list_len);
 
 	for (i = 0; i < list_len; i++) {
 		u8 *vlan_lout = MLX5_ADDR_OF(nic_vport_context, ctx,
 					 current_uc_mac_address[i]);
 		MLX5_SET(vlan_layout, vlan_lout, vlan, vlan_list[i]);
 	}
 
 	err = mlx5_modify_nic_vport_context(dev, in, inlen);
 
 	kvfree(in);
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_set_nic_vport_vlan_list);
 
 int mlx5_set_nic_vport_mc_list(struct mlx5_core_dev *mdev, int vport,
 			       u64 *addr_list, size_t addr_list_len)
 {
 	void *in, *ctx;
 	int  inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in)
 		  + MLX5_ST_SZ_BYTES(mac_address_layout) * (int)addr_list_len;
 	int err;
 	size_t i;
 	int max_list_sz = 1 << MLX5_CAP_GEN_MAX(mdev, log_max_current_mc_list);
 
 	if ((int)addr_list_len > max_list_sz) {
 		mlx5_core_warn(mdev, "Requested list size (%d) > (%d) max_list_size\n",
 			       (int)addr_list_len, max_list_sz);
 		return -ENOSPC;
 	}
 
 	in = mlx5_vzalloc(inlen);
 	if (!in) {
 		mlx5_core_warn(mdev, "failed to allocate inbox\n");
 		return -ENOMEM;
 	}
 
 	MLX5_SET(modify_nic_vport_context_in, in, vport_number, vport);
 	if (vport)
 		MLX5_SET(modify_nic_vport_context_in, in,
 			 other_vport, 1);
 	MLX5_SET(modify_nic_vport_context_in, in,
 		 field_select.addresses_list, 1);
 
 	ctx = MLX5_ADDR_OF(modify_nic_vport_context_in, in, nic_vport_context);
 
 	MLX5_SET(nic_vport_context, ctx, allowed_list_type,
 		 MLX5_NIC_VPORT_LIST_TYPE_MC);
 	MLX5_SET(nic_vport_context, ctx, allowed_list_size, addr_list_len);
 
 	for (i = 0; i < addr_list_len; i++) {
 		u8 *mac_lout = (u8 *)MLX5_ADDR_OF(nic_vport_context, ctx,
 						  current_uc_mac_address[i]);
 		u8 *mac_ptr = (u8 *)MLX5_ADDR_OF(mac_address_layout, mac_lout,
 						 mac_addr_47_32);
 		ether_addr_copy(mac_ptr, (u8 *)&addr_list[i]);
 	}
 
 	err = mlx5_modify_nic_vport_context(mdev, in, inlen);
 
 	kvfree(in);
 
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_set_nic_vport_mc_list);
 
 int mlx5_set_nic_vport_promisc(struct mlx5_core_dev *mdev, int vport,
 			       bool promisc_mc, bool promisc_uc,
 			       bool promisc_all)
 {
 	u8  in[MLX5_ST_SZ_BYTES(modify_nic_vport_context_in)];
 	u8 *ctx = MLX5_ADDR_OF(modify_nic_vport_context_in, in,
 			       nic_vport_context);
 
 	memset(in, 0, MLX5_ST_SZ_BYTES(modify_nic_vport_context_in));
 
 	MLX5_SET(modify_nic_vport_context_in, in, vport_number, vport);
 	if (vport)
 		MLX5_SET(modify_nic_vport_context_in, in,
 			 other_vport, 1);
 	MLX5_SET(modify_nic_vport_context_in, in, field_select.promisc, 1);
 	if (promisc_mc)
 		MLX5_SET(nic_vport_context, ctx, promisc_mc, 1);
 	if (promisc_uc)
 		MLX5_SET(nic_vport_context, ctx, promisc_uc, 1);
 	if (promisc_all)
 		MLX5_SET(nic_vport_context, ctx, promisc_all, 1);
 
 	return mlx5_modify_nic_vport_context(mdev, in, sizeof(in));
 }
 EXPORT_SYMBOL_GPL(mlx5_set_nic_vport_promisc);
+
+int mlx5_query_nic_vport_mac_list(struct mlx5_core_dev *dev,
+				  u32 vport,
+				  enum mlx5_list_type list_type,
+				  u8 addr_list[][ETH_ALEN],
+				  int *list_size)
+{
+	u32 in[MLX5_ST_SZ_DW(query_nic_vport_context_in)];
+	void *nic_vport_ctx;
+	int max_list_size;
+	int req_list_size;
+	u8 *mac_addr;
+	int out_sz;
+	void *out;
+	int err;
+	int i;
+
+	req_list_size = *list_size;
+
+	max_list_size = (list_type == MLX5_NIC_VPORT_LIST_TYPE_UC) ?
+			1 << MLX5_CAP_GEN_MAX(dev, log_max_current_uc_list) :
+			1 << MLX5_CAP_GEN_MAX(dev, log_max_current_mc_list);
+
+	if (req_list_size > max_list_size) {
+		mlx5_core_warn(dev, "Requested list size (%d) > (%d) max_list_size\n",
+			       req_list_size, max_list_size);
+		req_list_size = max_list_size;
+	}
+
+	out_sz = MLX5_ST_SZ_BYTES(query_nic_vport_context_out) +
+		 req_list_size * MLX5_ST_SZ_BYTES(mac_address_layout);
+
+	memset(in, 0, sizeof(in));
+	out = kzalloc(out_sz, GFP_KERNEL);
+	if (!out)
+		return -ENOMEM;
+
+	MLX5_SET(query_nic_vport_context_in, in, opcode,
+		 MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
+	MLX5_SET(query_nic_vport_context_in, in, allowed_list_type, list_type);
+	MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
+
+	if (vport)
+		MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
+
+	err = mlx5_cmd_exec_check_status(dev, in, sizeof(in), out, out_sz);
+	if (err)
+		goto out;
+
+	nic_vport_ctx = MLX5_ADDR_OF(query_nic_vport_context_out, out,
+				     nic_vport_context);
+	req_list_size = MLX5_GET(nic_vport_context, nic_vport_ctx,
+				 allowed_list_size);
+
+	*list_size = req_list_size;
+	for (i = 0; i < req_list_size; i++) {
+		mac_addr = MLX5_ADDR_OF(nic_vport_context,
+					nic_vport_ctx,
+					current_uc_mac_address[i]) + 2;
+		ether_addr_copy(addr_list[i], mac_addr);
+	}
+out:
+	kfree(out);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_mac_list);
+
+int mlx5_modify_nic_vport_mac_list(struct mlx5_core_dev *dev,
+				   enum mlx5_list_type list_type,
+				   u8 addr_list[][ETH_ALEN],
+				   int list_size)
+{
+	u32 out[MLX5_ST_SZ_DW(modify_nic_vport_context_out)];
+	void *nic_vport_ctx;
+	int max_list_size;
+	int in_sz;
+	void *in;
+	int err;
+	int i;
+
+	max_list_size = list_type == MLX5_NIC_VPORT_LIST_TYPE_UC ?
+		 1 << MLX5_CAP_GEN(dev, log_max_current_uc_list) :
+		 1 << MLX5_CAP_GEN(dev, log_max_current_mc_list);
+
+	if (list_size > max_list_size)
+		return -ENOSPC;
+
+	in_sz = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in) +
+		list_size * MLX5_ST_SZ_BYTES(mac_address_layout);
+
+	memset(out, 0, sizeof(out));
+	in = kzalloc(in_sz, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(modify_nic_vport_context_in, in, opcode,
+		 MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
+	MLX5_SET(modify_nic_vport_context_in, in,
+		 field_select.addresses_list, 1);
+
+	nic_vport_ctx = MLX5_ADDR_OF(modify_nic_vport_context_in, in,
+				     nic_vport_context);
+
+	MLX5_SET(nic_vport_context, nic_vport_ctx,
+		 allowed_list_type, list_type);
+	MLX5_SET(nic_vport_context, nic_vport_ctx,
+		 allowed_list_size, list_size);
+
+	for (i = 0; i < list_size; i++) {
+		u8 *curr_mac = MLX5_ADDR_OF(nic_vport_context,
+					    nic_vport_ctx,
+					    current_uc_mac_address[i]) + 2;
+		ether_addr_copy(curr_mac, addr_list[i]);
+	}
+
+	err = mlx5_cmd_exec_check_status(dev, in, in_sz, out, sizeof(out));
+	kfree(in);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_modify_nic_vport_mac_list);
+
+int mlx5_query_nic_vport_vlan_list(struct mlx5_core_dev *dev,
+				   u32 vport,
+				   u16 *vlan_list,
+				   int *list_size)
+{
+	u32 in[MLX5_ST_SZ_DW(query_nic_vport_context_in)];
+	void *nic_vport_ctx;
+	int max_list_size;
+	int req_list_size;
+	int out_sz;
+	void *out;
+	void *vlan_addr;
+	int err;
+	int i;
+
+	req_list_size = *list_size;
+
+	max_list_size = 1 << MLX5_CAP_GEN_MAX(dev, log_max_vlan_list);
+
+	if (req_list_size > max_list_size) {
+		mlx5_core_warn(dev, "Requested list size (%d) > (%d) max_list_size\n",
+			       req_list_size, max_list_size);
+		req_list_size = max_list_size;
+	}
+
+	out_sz = MLX5_ST_SZ_BYTES(query_nic_vport_context_out) +
+		 req_list_size * MLX5_ST_SZ_BYTES(vlan_layout);
+
+	memset(in, 0, sizeof(in));
+	out = kzalloc(out_sz, GFP_KERNEL);
+	if (!out)
+		return -ENOMEM;
+
+	MLX5_SET(query_nic_vport_context_in, in, opcode,
+		 MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
+	MLX5_SET(query_nic_vport_context_in, in, allowed_list_type,
+		 MLX5_NIC_VPORT_CONTEXT_ALLOWED_LIST_TYPE_VLAN_LIST);
+	MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
+
+	if (vport)
+		MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
+
+	err = mlx5_cmd_exec_check_status(dev, in, sizeof(in), out, out_sz);
+	if (err)
+		goto out;
+
+	nic_vport_ctx = MLX5_ADDR_OF(query_nic_vport_context_out, out,
+				     nic_vport_context);
+	req_list_size = MLX5_GET(nic_vport_context, nic_vport_ctx,
+				 allowed_list_size);
+
+	*list_size = req_list_size;
+	for (i = 0; i < req_list_size; i++) {
+		vlan_addr = MLX5_ADDR_OF(nic_vport_context, nic_vport_ctx,
+					 current_uc_mac_address[i]);
+		vlan_list[i] = MLX5_GET(vlan_layout, vlan_addr, vlan);
+	}
+out:
+	kfree(out);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_vlan_list);
+
+int mlx5_modify_nic_vport_vlans(struct mlx5_core_dev *dev,
+				u16 vlans[],
+				int list_size)
+{
+	u32 out[MLX5_ST_SZ_DW(modify_nic_vport_context_out)];
+	void *nic_vport_ctx;
+	int max_list_size;
+	int in_sz;
+	void *in;
+	int err;
+	int i;
+
+	max_list_size = 1 << MLX5_CAP_GEN(dev, log_max_vlan_list);
+
+	if (list_size > max_list_size)
+		return -ENOSPC;
+
+	in_sz = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in) +
+		list_size * MLX5_ST_SZ_BYTES(vlan_layout);
+
+	memset(out, 0, sizeof(out));
+	in = kzalloc(in_sz, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(modify_nic_vport_context_in, in, opcode,
+		 MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
+	MLX5_SET(modify_nic_vport_context_in, in,
+		 field_select.addresses_list, 1);
+
+	nic_vport_ctx = MLX5_ADDR_OF(modify_nic_vport_context_in, in,
+				     nic_vport_context);
+
+	MLX5_SET(nic_vport_context, nic_vport_ctx,
+		 allowed_list_type, MLX5_NIC_VPORT_LIST_TYPE_VLAN);
+	MLX5_SET(nic_vport_context, nic_vport_ctx,
+		 allowed_list_size, list_size);
+
+	for (i = 0; i < list_size; i++) {
+		void *vlan_addr = MLX5_ADDR_OF(nic_vport_context,
+					       nic_vport_ctx,
+					       current_uc_mac_address[i]);
+		MLX5_SET(vlan_layout, vlan_addr, vlan, vlans[i]);
+	}
+
+	err = mlx5_cmd_exec_check_status(dev, in, in_sz, out, sizeof(out));
+	kfree(in);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_modify_nic_vport_vlans);
+
 int mlx5_set_nic_vport_permanent_mac(struct mlx5_core_dev *mdev, int vport,
 				     u8 *addr)
 {
 	void *in;
 	int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
 	u8  *mac_ptr;
 	int err;
 
 	in = mlx5_vzalloc(inlen);
 	if (!in) {
 		mlx5_core_warn(mdev, "failed to allocate inbox\n");
 		return -ENOMEM;
 	}
 
 	MLX5_SET(modify_nic_vport_context_in, in,
 		 opcode, MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
 	MLX5_SET(modify_nic_vport_context_in, in, vport_number, vport);
 	MLX5_SET(modify_nic_vport_context_in, in, other_vport, 1);
 	MLX5_SET(modify_nic_vport_context_in, in,
 		 field_select.permanent_address, 1);
 	mac_ptr = (u8 *)MLX5_ADDR_OF(modify_nic_vport_context_in, in,
 		nic_vport_context.permanent_address.mac_addr_47_32);
 	ether_addr_copy(mac_ptr, addr);
 
 	err = mlx5_modify_nic_vport_context(mdev, in, inlen);
 
 	kvfree(in);
 
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_set_nic_vport_permanent_mac);
 
 int mlx5_nic_vport_enable_roce(struct mlx5_core_dev *mdev)
 {
 	return mlx5_nic_vport_enable_disable_roce(mdev, 1);
 }
 EXPORT_SYMBOL_GPL(mlx5_nic_vport_enable_roce);
 
 int mlx5_nic_vport_disable_roce(struct mlx5_core_dev *mdev)
 {
 	return mlx5_nic_vport_enable_disable_roce(mdev, 0);
 }
 EXPORT_SYMBOL_GPL(mlx5_nic_vport_disable_roce);
 
 int mlx5_query_hca_vport_context(struct mlx5_core_dev *mdev,
 				 u8 port_num, u8 vport_num, u32 *out,
 				 int outlen)
 {
 	u32 in[MLX5_ST_SZ_DW(query_hca_vport_context_in)];
 	int is_group_manager;
 
 	is_group_manager = MLX5_CAP_GEN(mdev, vport_group_manager);
 
 	memset(in, 0, sizeof(in));
 
 	MLX5_SET(query_hca_vport_context_in, in, opcode,
 		 MLX5_CMD_OP_QUERY_HCA_VPORT_CONTEXT);
 
 	if (vport_num) {
 		if (is_group_manager) {
 			MLX5_SET(query_hca_vport_context_in, in, other_vport,
 				 1);
 			MLX5_SET(query_hca_vport_context_in, in, vport_number,
 				 vport_num);
 		} else {
 			return -EPERM;
 		}
 	}
 
 	if (MLX5_CAP_GEN(mdev, num_ports) == 2)
 		MLX5_SET(query_hca_vport_context_in, in, port_num, port_num);
 
 	return mlx5_cmd_exec_check_status(mdev, in, sizeof(in), out, outlen);
 }
 
 int mlx5_query_hca_vport_system_image_guid(struct mlx5_core_dev *mdev,
 					   u64 *system_image_guid)
 {
 	u32 *out;
 	int outlen = MLX5_ST_SZ_BYTES(query_hca_vport_context_out);
 	int err;
 
 	out = mlx5_vzalloc(outlen);
 	if (!out)
 		return -ENOMEM;
 
 	err = mlx5_query_hca_vport_context(mdev, 1, 0, out, outlen);
 	if (err)
 		goto out;
 
 	*system_image_guid = MLX5_GET64(query_hca_vport_context_out, out,
 					hca_vport_context.system_image_guid);
 
 out:
 	kvfree(out);
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_query_hca_vport_system_image_guid);
 
 int mlx5_query_hca_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid)
 {
 	u32 *out;
 	int outlen = MLX5_ST_SZ_BYTES(query_hca_vport_context_out);
 	int err;
 
 	out = mlx5_vzalloc(outlen);
 	if (!out)
 		return -ENOMEM;
 
 	err = mlx5_query_hca_vport_context(mdev, 1, 0, out, outlen);
 	if (err)
 		goto out;
 
 	*node_guid = MLX5_GET64(query_hca_vport_context_out, out,
 				hca_vport_context.node_guid);
 
 out:
 	kvfree(out);
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_query_hca_vport_node_guid);
 
 int mlx5_query_hca_vport_gid(struct mlx5_core_dev *dev, u8 port_num,
 			     u16 vport_num, u16 gid_index, union ib_gid *gid)
 {
 	int in_sz = MLX5_ST_SZ_BYTES(query_hca_vport_gid_in);
 	int out_sz = MLX5_ST_SZ_BYTES(query_hca_vport_gid_out);
 	int is_group_manager;
 	void *out = NULL;
 	void *in = NULL;
 	union ib_gid *tmp;
 	int tbsz;
 	int nout;
 	int err;
 
 	is_group_manager = MLX5_CAP_GEN(dev, vport_group_manager);
 	tbsz = mlx5_get_gid_table_len(MLX5_CAP_GEN(dev, gid_table_size));
 
 	if (gid_index > tbsz && gid_index != 0xffff)
 		return -EINVAL;
 
 	if (gid_index == 0xffff)
 		nout = tbsz;
 	else
 		nout = 1;
 
 	out_sz += nout * sizeof(*gid);
 
 	in = mlx5_vzalloc(in_sz);
 	out = mlx5_vzalloc(out_sz);
 	if (!in || !out) {
 		err = -ENOMEM;
 		goto out;
 	}
 
 	MLX5_SET(query_hca_vport_gid_in, in, opcode,
 		 MLX5_CMD_OP_QUERY_HCA_VPORT_GID);
 	if (vport_num) {
 		if (is_group_manager) {
 			MLX5_SET(query_hca_vport_gid_in, in, vport_number,
 				 vport_num);
 			MLX5_SET(query_hca_vport_gid_in, in, other_vport, 1);
 		} else {
 			err = -EPERM;
 			goto out;
 		}
 	}
 
 	MLX5_SET(query_hca_vport_gid_in, in, gid_index, gid_index);
 
 	if (MLX5_CAP_GEN(dev, num_ports) == 2)
 		MLX5_SET(query_hca_vport_gid_in, in, port_num, port_num);
 
 	err = mlx5_cmd_exec(dev, in, in_sz, out, out_sz);
 	if (err)
 		goto out;
 
 	err = mlx5_cmd_status_to_err_v2(out);
 	if (err)
 		goto out;
 
 	tmp = (union ib_gid *)MLX5_ADDR_OF(query_hca_vport_gid_out, out, gid);
 	gid->global.subnet_prefix = tmp->global.subnet_prefix;
 	gid->global.interface_id = tmp->global.interface_id;
 
 out:
 	kvfree(in);
 	kvfree(out);
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_query_hca_vport_gid);
 
 int mlx5_query_hca_vport_pkey(struct mlx5_core_dev *dev, u8 other_vport,
 			      u8 port_num, u16 vf_num, u16 pkey_index,
 			      u16 *pkey)
 {
 	int in_sz = MLX5_ST_SZ_BYTES(query_hca_vport_pkey_in);
 	int out_sz = MLX5_ST_SZ_BYTES(query_hca_vport_pkey_out);
 	int is_group_manager;
 	void *out = NULL;
 	void *in = NULL;
 	void *pkarr;
 	int nout;
 	int tbsz;
 	int err;
 	int i;
 
 	is_group_manager = MLX5_CAP_GEN(dev, vport_group_manager);
 
 	tbsz = mlx5_to_sw_pkey_sz(MLX5_CAP_GEN(dev, pkey_table_size));
 	if (pkey_index > tbsz && pkey_index != 0xffff)
 		return -EINVAL;
 
 	if (pkey_index == 0xffff)
 		nout = tbsz;
 	else
 		nout = 1;
 
 	out_sz += nout * MLX5_ST_SZ_BYTES(pkey);
 
 	in = kzalloc(in_sz, GFP_KERNEL);
 	out = kzalloc(out_sz, GFP_KERNEL);
 
 	MLX5_SET(query_hca_vport_pkey_in, in, opcode,
 		 MLX5_CMD_OP_QUERY_HCA_VPORT_PKEY);
 	if (other_vport) {
 		if (is_group_manager) {
 			MLX5_SET(query_hca_vport_pkey_in, in, vport_number,
 				 vf_num);
 			MLX5_SET(query_hca_vport_pkey_in, in, other_vport, 1);
 		} else {
 			err = -EPERM;
 			goto out;
 		}
 	}
 	MLX5_SET(query_hca_vport_pkey_in, in, pkey_index, pkey_index);
 
 	if (MLX5_CAP_GEN(dev, num_ports) == 2)
 		MLX5_SET(query_hca_vport_pkey_in, in, port_num, port_num);
 
 	err = mlx5_cmd_exec(dev, in, in_sz, out, out_sz);
 	if (err)
 		goto out;
 
 	err = mlx5_cmd_status_to_err_v2(out);
 	if (err)
 		goto out;
 
 	pkarr = MLX5_ADDR_OF(query_hca_vport_pkey_out, out, pkey);
 	for (i = 0; i < nout; i++, pkey++,
 	     pkarr += MLX5_ST_SZ_BYTES(pkey))
 		*pkey = MLX5_GET_PR(pkey, pkarr, pkey);
 
 out:
 	kfree(in);
 	kfree(out);
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_query_hca_vport_pkey);
 
 static int mlx5_modify_eswitch_vport_context(struct mlx5_core_dev *mdev,
 					     u16 vport, void *in, int inlen)
 {
 	u32 out[MLX5_ST_SZ_DW(modify_esw_vport_context_out)];
 	int err;
 
 	memset(out, 0, sizeof(out));
 
 	MLX5_SET(modify_esw_vport_context_in, in, vport_number, vport);
 	if (vport)
 		MLX5_SET(modify_esw_vport_context_in, in, other_vport, 1);
 
 	MLX5_SET(modify_esw_vport_context_in, in, opcode,
 		 MLX5_CMD_OP_MODIFY_ESW_VPORT_CONTEXT);
 
 	err = mlx5_cmd_exec_check_status(mdev, in, inlen,
 					 out, sizeof(out));
 	if (err)
 		mlx5_core_warn(mdev, "MLX5_CMD_OP_MODIFY_ESW_VPORT_CONTEXT failed\n");
 
 	return err;
 }
 
 int mlx5_set_eswitch_cvlan_info(struct mlx5_core_dev *mdev, u8 vport,
 				u8 insert_mode, u8 strip_mode,
 				u16 vlan, u8 cfi, u8 pcp)
 {
 	u32 in[MLX5_ST_SZ_DW(modify_esw_vport_context_in)];
 
 	memset(in, 0, sizeof(in));
 
 	if (insert_mode != MLX5_MODIFY_ESW_VPORT_CONTEXT_CVLAN_INSERT_NONE) {
 		MLX5_SET(modify_esw_vport_context_in, in,
 			 esw_vport_context.cvlan_cfi, cfi);
 		MLX5_SET(modify_esw_vport_context_in, in,
 			 esw_vport_context.cvlan_pcp, pcp);
 		MLX5_SET(modify_esw_vport_context_in, in,
 			 esw_vport_context.cvlan_id, vlan);
 	}
 
 	MLX5_SET(modify_esw_vport_context_in, in,
 		 esw_vport_context.vport_cvlan_insert, insert_mode);
 
 	MLX5_SET(modify_esw_vport_context_in, in,
 		 esw_vport_context.vport_cvlan_strip, strip_mode);
 
 	MLX5_SET(modify_esw_vport_context_in, in, field_select,
 		 MLX5_MODIFY_ESW_VPORT_CONTEXT_FIELD_SELECT_CVLAN_STRIP |
 		 MLX5_MODIFY_ESW_VPORT_CONTEXT_FIELD_SELECT_CVLAN_INSERT);
 
 	return mlx5_modify_eswitch_vport_context(mdev, vport, in, sizeof(in));
 }
 EXPORT_SYMBOL_GPL(mlx5_set_eswitch_cvlan_info);
+
+int mlx5_arm_vport_context_events(struct mlx5_core_dev *mdev,
+				  u8 vport,
+				  u32 events_mask)
+{
+	u32 *in;
+	u32 inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
+	void *nic_vport_ctx;
+	int err;
+
+	in = mlx5_vzalloc(inlen);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(modify_nic_vport_context_in,
+		 in,
+		 opcode,
+		 MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
+	MLX5_SET(modify_nic_vport_context_in,
+		 in,
+		 field_select.change_event,
+		 1);
+	MLX5_SET(modify_nic_vport_context_in, in, vport_number, vport);
+	if (vport)
+		MLX5_SET(modify_nic_vport_context_in, in, other_vport, 1);
+	nic_vport_ctx = MLX5_ADDR_OF(modify_nic_vport_context_in,
+				     in,
+				     nic_vport_context);
+
+	MLX5_SET(nic_vport_context, nic_vport_ctx, arm_change_event, 1);
+
+	if (events_mask & MLX5_UC_ADDR_CHANGE)
+		MLX5_SET(nic_vport_context,
+			 nic_vport_ctx,
+			 event_on_uc_address_change,
+			 1);
+	if (events_mask & MLX5_MC_ADDR_CHANGE)
+		MLX5_SET(nic_vport_context,
+			 nic_vport_ctx,
+			 event_on_mc_address_change,
+			 1);
+	if (events_mask & MLX5_VLAN_CHANGE)
+		MLX5_SET(nic_vport_context,
+			 nic_vport_ctx,
+			 event_on_vlan_change,
+			 1);
+	if (events_mask & MLX5_PROMISC_CHANGE)
+		MLX5_SET(nic_vport_context,
+			 nic_vport_ctx,
+			 event_on_promisc_change,
+			 1);
+	if (events_mask & MLX5_MTU_CHANGE)
+		MLX5_SET(nic_vport_context,
+			 nic_vport_ctx,
+			 event_on_mtu,
+			 1);
+
+	err = mlx5_modify_nic_vport_context(mdev, in, inlen);
+
+	kvfree(in);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_arm_vport_context_events);
+
+int mlx5_query_vport_promisc(struct mlx5_core_dev *mdev,
+			     u32 vport,
+			     u8 *promisc_uc,
+			     u8 *promisc_mc,
+			     u8 *promisc_all)
+{
+	u32 *out;
+	int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
+	int err;
+
+	out = kzalloc(outlen, GFP_KERNEL);
+	if (!out)
+		return -ENOMEM;
+
+	err = mlx5_query_nic_vport_context(mdev, vport, out, outlen);
+	if (err)
+		goto out;
+
+	*promisc_uc = MLX5_GET(query_nic_vport_context_out, out,
+			       nic_vport_context.promisc_uc);
+	*promisc_mc = MLX5_GET(query_nic_vport_context_out, out,
+			       nic_vport_context.promisc_mc);
+	*promisc_all = MLX5_GET(query_nic_vport_context_out, out,
+				nic_vport_context.promisc_all);
+
+out:
+	kfree(out);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_query_vport_promisc);
+
+int mlx5_modify_nic_vport_promisc(struct mlx5_core_dev *mdev,
+				  int promisc_uc,
+				  int promisc_mc,
+				  int promisc_all)
+{
+	void *in;
+	int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
+	int err;
+
+	in = mlx5_vzalloc(inlen);
+	if (!in) {
+		mlx5_core_err(mdev, "failed to allocate inbox\n");
+		return -ENOMEM;
+	}
+
+	MLX5_SET(modify_nic_vport_context_in, in, field_select.promisc, 1);
+	MLX5_SET(modify_nic_vport_context_in, in,
+		 nic_vport_context.promisc_uc, promisc_uc);
+	MLX5_SET(modify_nic_vport_context_in, in,
+		 nic_vport_context.promisc_mc, promisc_mc);
+	MLX5_SET(modify_nic_vport_context_in, in,
+		 nic_vport_context.promisc_all, promisc_all);
+
+	err = mlx5_modify_nic_vport_context(mdev, in, inlen);
+	kvfree(in);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_modify_nic_vport_promisc);
 
 int mlx5_query_vport_counter(struct mlx5_core_dev *dev,
 			     u8 port_num, u16 vport_num,
 			     void *out, int out_size)
 {
 	int in_sz = MLX5_ST_SZ_BYTES(query_vport_counter_in);
 	int is_group_manager;
 	void *in;
 	int err;
 
 	is_group_manager = MLX5_CAP_GEN(dev, vport_group_manager);
 
 	in = mlx5_vzalloc(in_sz);
 	if (!in)
 		return -ENOMEM;
 
 	MLX5_SET(query_vport_counter_in, in, opcode,
 		 MLX5_CMD_OP_QUERY_VPORT_COUNTER);
 	if (vport_num) {
 		if (is_group_manager) {
 			MLX5_SET(query_vport_counter_in, in, other_vport, 1);
 			MLX5_SET(query_vport_counter_in, in, vport_number,
 				 vport_num);
 		} else {
 			err = -EPERM;
 			goto ex;
 		}
 	}
 	if (MLX5_CAP_GEN(dev, num_ports) == 2)
 		MLX5_SET(query_vport_counter_in, in, port_num, port_num);
 
 	err = mlx5_cmd_exec(dev, in, in_sz, out,  out_size);
 	if (err)
 		goto ex;
 	err = mlx5_cmd_status_to_err_v2(out);
 	if (err)
 		goto ex;
 
 ex:
 	kvfree(in);
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_query_vport_counter);
 
 int mlx5_get_vport_counters(struct mlx5_core_dev *dev, u8 port_num,
 			    struct mlx5_vport_counters *vc)
 {
 	int out_sz = MLX5_ST_SZ_BYTES(query_vport_counter_out);
 	void *out;
 	int err;
 
 	out = mlx5_vzalloc(out_sz);
 	if (!out)
 		return -ENOMEM;
 
 	err = mlx5_query_vport_counter(dev, port_num, 0, out, out_sz);
 	if (err)
 		goto ex;
 
 	vc->received_errors.packets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, received_errors.packets);
 	vc->received_errors.octets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, received_errors.octets);
 	vc->transmit_errors.packets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, transmit_errors.packets);
 	vc->transmit_errors.octets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, transmit_errors.octets);
 	vc->received_ib_unicast.packets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, received_ib_unicast.packets);
 	vc->received_ib_unicast.octets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, received_ib_unicast.octets);
 	vc->transmitted_ib_unicast.packets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, transmitted_ib_unicast.packets);
 	vc->transmitted_ib_unicast.octets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, transmitted_ib_unicast.octets);
 	vc->received_ib_multicast.packets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, received_ib_multicast.packets);
 	vc->received_ib_multicast.octets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, received_ib_multicast.octets);
 	vc->transmitted_ib_multicast.packets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, transmitted_ib_multicast.packets);
 	vc->transmitted_ib_multicast.octets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, transmitted_ib_multicast.octets);
 	vc->received_eth_broadcast.packets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, received_eth_broadcast.packets);
 	vc->received_eth_broadcast.octets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, received_eth_broadcast.octets);
 	vc->transmitted_eth_broadcast.packets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, transmitted_eth_broadcast.packets);
 	vc->transmitted_eth_broadcast.octets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, transmitted_eth_broadcast.octets);
 	vc->received_eth_unicast.octets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, received_eth_unicast.octets);
 	vc->received_eth_unicast.packets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, received_eth_unicast.packets);
 	vc->transmitted_eth_unicast.octets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, transmitted_eth_unicast.octets);
 	vc->transmitted_eth_unicast.packets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, transmitted_eth_unicast.packets);
 	vc->received_eth_multicast.octets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, received_eth_multicast.octets);
 	vc->received_eth_multicast.packets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, received_eth_multicast.packets);
 	vc->transmitted_eth_multicast.octets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, transmitted_eth_multicast.octets);
 	vc->transmitted_eth_multicast.packets =
 		MLX5_GET64(query_vport_counter_out,
 			   out, transmitted_eth_multicast.packets);
 
 ex:
 	kvfree(out);
 	return err;
 }
Index: projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_flow_table.c
===================================================================
--- projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_flow_table.c	(revision 301546)
+++ projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_flow_table.c	(revision 301547)
@@ -1,867 +1,999 @@
 /*-
  * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS `AS IS' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 
 #include "en.h"
 
 #include <linux/list.h>
 #include <dev/mlx5/flow_table.h>
 
 enum {
 	MLX5E_FULLMATCH = 0,
 	MLX5E_ALLMULTI = 1,
 	MLX5E_PROMISC = 2,
 };
 
 enum {
 	MLX5E_UC = 0,
 	MLX5E_MC_IPV4 = 1,
 	MLX5E_MC_IPV6 = 2,
 	MLX5E_MC_OTHER = 3,
 };
 
 enum {
 	MLX5E_ACTION_NONE = 0,
 	MLX5E_ACTION_ADD = 1,
 	MLX5E_ACTION_DEL = 2,
 };
 
 struct mlx5e_eth_addr_hash_node {
 	LIST_ENTRY(mlx5e_eth_addr_hash_node) hlist;
 	u8	action;
 	struct mlx5e_eth_addr_info ai;
 };
 
 static inline int
 mlx5e_hash_eth_addr(const u8 * addr)
 {
 	return (addr[5]);
 }
 
 static void
 mlx5e_add_eth_addr_to_hash(struct mlx5e_eth_addr_hash_head *hash,
     const u8 * addr)
 {
 	struct mlx5e_eth_addr_hash_node *hn;
 	int ix = mlx5e_hash_eth_addr(addr);
 
 	LIST_FOREACH(hn, &hash[ix], hlist) {
 		if (bcmp(hn->ai.addr, addr, ETHER_ADDR_LEN) == 0) {
 			if (hn->action == MLX5E_ACTION_DEL)
 				hn->action = MLX5E_ACTION_NONE;
 			return;
 		}
 	}
 
 	hn = malloc(sizeof(*hn), M_MLX5EN, M_NOWAIT | M_ZERO);
 	if (hn == NULL)
 		return;
 
 	ether_addr_copy(hn->ai.addr, addr);
 	hn->action = MLX5E_ACTION_ADD;
 
 	LIST_INSERT_HEAD(&hash[ix], hn, hlist);
 }
 
 static void
 mlx5e_del_eth_addr_from_hash(struct mlx5e_eth_addr_hash_node *hn)
 {
 	LIST_REMOVE(hn, hlist);
 	free(hn, M_MLX5EN);
 }
 
 static void
 mlx5e_del_eth_addr_from_flow_table(struct mlx5e_priv *priv,
     struct mlx5e_eth_addr_info *ai)
 {
 	void *ft = priv->ft.main;
 
 	if (ai->tt_vec & (1 << MLX5E_TT_IPV6_TCP))
 		mlx5_del_flow_table_entry(ft, ai->ft_ix[MLX5E_TT_IPV6_TCP]);
 
 	if (ai->tt_vec & (1 << MLX5E_TT_IPV4_TCP))
 		mlx5_del_flow_table_entry(ft, ai->ft_ix[MLX5E_TT_IPV4_TCP]);
 
 	if (ai->tt_vec & (1 << MLX5E_TT_IPV6_UDP))
 		mlx5_del_flow_table_entry(ft, ai->ft_ix[MLX5E_TT_IPV6_UDP]);
 
 	if (ai->tt_vec & (1 << MLX5E_TT_IPV4_UDP))
 		mlx5_del_flow_table_entry(ft, ai->ft_ix[MLX5E_TT_IPV4_UDP]);
 
 	if (ai->tt_vec & (1 << MLX5E_TT_IPV6))
 		mlx5_del_flow_table_entry(ft, ai->ft_ix[MLX5E_TT_IPV6]);
 
 	if (ai->tt_vec & (1 << MLX5E_TT_IPV4))
 		mlx5_del_flow_table_entry(ft, ai->ft_ix[MLX5E_TT_IPV4]);
 
 	if (ai->tt_vec & (1 << MLX5E_TT_ANY))
 		mlx5_del_flow_table_entry(ft, ai->ft_ix[MLX5E_TT_ANY]);
 }
 
 static int
 mlx5e_get_eth_addr_type(const u8 * addr)
 {
 	if (ETHER_IS_MULTICAST(addr) == 0)
 		return (MLX5E_UC);
 
 	if ((addr[0] == 0x01) &&
 	    (addr[1] == 0x00) &&
 	    (addr[2] == 0x5e) &&
 	    !(addr[3] & 0x80))
 		return (MLX5E_MC_IPV4);
 
 	if ((addr[0] == 0x33) &&
 	    (addr[1] == 0x33))
 		return (MLX5E_MC_IPV6);
 
 	return (MLX5E_MC_OTHER);
 }
 
 static	u32
 mlx5e_get_tt_vec(struct mlx5e_eth_addr_info *ai, int type)
 {
 	int eth_addr_type;
 	u32 ret;
 
 	switch (type) {
 	case MLX5E_FULLMATCH:
 		eth_addr_type = mlx5e_get_eth_addr_type(ai->addr);
 		switch (eth_addr_type) {
 		case MLX5E_UC:
 			ret =
 			    (1 << MLX5E_TT_IPV4_TCP) |
 			    (1 << MLX5E_TT_IPV6_TCP) |
 			    (1 << MLX5E_TT_IPV4_UDP) |
 			    (1 << MLX5E_TT_IPV6_UDP) |
 			    (1 << MLX5E_TT_IPV4) |
 			    (1 << MLX5E_TT_IPV6) |
 			    (1 << MLX5E_TT_ANY) |
 			    0;
 			break;
 
 		case MLX5E_MC_IPV4:
 			ret =
 			    (1 << MLX5E_TT_IPV4_UDP) |
 			    (1 << MLX5E_TT_IPV4) |
 			    0;
 			break;
 
 		case MLX5E_MC_IPV6:
 			ret =
 			    (1 << MLX5E_TT_IPV6_UDP) |
 			    (1 << MLX5E_TT_IPV6) |
 			    0;
 			break;
 
 		default:
 			ret =
 			    (1 << MLX5E_TT_ANY) |
 			    0;
 			break;
 		}
 		break;
 
 	case MLX5E_ALLMULTI:
 		ret =
 		    (1 << MLX5E_TT_IPV4_UDP) |
 		    (1 << MLX5E_TT_IPV6_UDP) |
 		    (1 << MLX5E_TT_IPV4) |
 		    (1 << MLX5E_TT_IPV6) |
 		    (1 << MLX5E_TT_ANY) |
 		    0;
 		break;
 
 	default:			/* MLX5E_PROMISC */
 		ret =
 		    (1 << MLX5E_TT_IPV4_TCP) |
 		    (1 << MLX5E_TT_IPV6_TCP) |
 		    (1 << MLX5E_TT_IPV4_UDP) |
 		    (1 << MLX5E_TT_IPV6_UDP) |
 		    (1 << MLX5E_TT_IPV4) |
 		    (1 << MLX5E_TT_IPV6) |
 		    (1 << MLX5E_TT_ANY) |
 		    0;
 		break;
 	}
 
 	return (ret);
 }
 
 static int
 mlx5e_add_eth_addr_rule_sub(struct mlx5e_priv *priv,
     struct mlx5e_eth_addr_info *ai, int type,
     void *flow_context, void *match_criteria)
 {
 	u8 match_criteria_enable = 0;
 	void *match_value;
 	void *dest;
 	u8 *dmac;
 	u8 *match_criteria_dmac;
 	void *ft = priv->ft.main;
 	u32 *tirn = priv->tirn;
 	u32 tt_vec;
 	int err;
 
 	match_value = MLX5_ADDR_OF(flow_context, flow_context, match_value);
 	dmac = MLX5_ADDR_OF(fte_match_param, match_value,
 	    outer_headers.dmac_47_16);
 	match_criteria_dmac = MLX5_ADDR_OF(fte_match_param, match_criteria,
 	    outer_headers.dmac_47_16);
 	dest = MLX5_ADDR_OF(flow_context, flow_context, destination);
 
 	MLX5_SET(flow_context, flow_context, action,
 	    MLX5_FLOW_CONTEXT_ACTION_FWD_DEST);
 	MLX5_SET(flow_context, flow_context, destination_list_size, 1);
 	MLX5_SET(dest_format_struct, dest, destination_type,
 	    MLX5_FLOW_CONTEXT_DEST_TYPE_TIR);
 
 	switch (type) {
 	case MLX5E_FULLMATCH:
 		match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 		memset(match_criteria_dmac, 0xff, ETH_ALEN);
 		ether_addr_copy(dmac, ai->addr);
 		break;
 
 	case MLX5E_ALLMULTI:
 		match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 		match_criteria_dmac[0] = 0x01;
 		dmac[0] = 0x01;
 		break;
 
 	case MLX5E_PROMISC:
 		break;
 	default:
 		break;
 	}
 
 	tt_vec = mlx5e_get_tt_vec(ai, type);
 
 	if (tt_vec & (1 << MLX5E_TT_ANY)) {
 		MLX5_SET(dest_format_struct, dest, destination_id,
 		    tirn[MLX5E_TT_ANY]);
 		err = mlx5_add_flow_table_entry(ft, match_criteria_enable,
 		    match_criteria, flow_context, &ai->ft_ix[MLX5E_TT_ANY]);
 		if (err) {
 			mlx5e_del_eth_addr_from_flow_table(priv, ai);
 			return (err);
 		}
 		ai->tt_vec |= (1 << MLX5E_TT_ANY);
 	}
 	match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 	MLX5_SET_TO_ONES(fte_match_param, match_criteria,
 	    outer_headers.ethertype);
 
 	if (tt_vec & (1 << MLX5E_TT_IPV4)) {
 		MLX5_SET(fte_match_param, match_value, outer_headers.ethertype,
 		    ETHERTYPE_IP);
 		MLX5_SET(dest_format_struct, dest, destination_id,
 		    tirn[MLX5E_TT_IPV4]);
 		err = mlx5_add_flow_table_entry(ft, match_criteria_enable,
 		    match_criteria, flow_context, &ai->ft_ix[MLX5E_TT_IPV4]);
 		if (err) {
 			mlx5e_del_eth_addr_from_flow_table(priv, ai);
 			return (err);
 		}
 		ai->tt_vec |= (1 << MLX5E_TT_IPV4);
 	}
 	if (tt_vec & (1 << MLX5E_TT_IPV6)) {
 		MLX5_SET(fte_match_param, match_value, outer_headers.ethertype,
 		    ETHERTYPE_IPV6);
 		MLX5_SET(dest_format_struct, dest, destination_id,
 		    tirn[MLX5E_TT_IPV6]);
 		err = mlx5_add_flow_table_entry(ft, match_criteria_enable,
 		    match_criteria, flow_context, &ai->ft_ix[MLX5E_TT_IPV6]);
 		if (err) {
 			mlx5e_del_eth_addr_from_flow_table(priv, ai);
 			return (err);
 		}
 		ai->tt_vec |= (1 << MLX5E_TT_IPV6);
 	}
 	MLX5_SET_TO_ONES(fte_match_param, match_criteria,
 	    outer_headers.ip_protocol);
 	MLX5_SET(fte_match_param, match_value, outer_headers.ip_protocol,
 	    IPPROTO_UDP);
 
 	if (tt_vec & (1 << MLX5E_TT_IPV4_UDP)) {
 		MLX5_SET(fte_match_param, match_value, outer_headers.ethertype,
 		    ETHERTYPE_IP);
 		MLX5_SET(dest_format_struct, dest, destination_id,
 		    tirn[MLX5E_TT_IPV4_UDP]);
 		err = mlx5_add_flow_table_entry(ft, match_criteria_enable,
 		    match_criteria, flow_context, &ai->ft_ix[MLX5E_TT_IPV4_UDP]);
 		if (err) {
 			mlx5e_del_eth_addr_from_flow_table(priv, ai);
 			return (err);
 		}
 		ai->tt_vec |= (1 << MLX5E_TT_IPV4_UDP);
 	}
 	if (tt_vec & (1 << MLX5E_TT_IPV6_UDP)) {
 		MLX5_SET(fte_match_param, match_value, outer_headers.ethertype,
 		    ETHERTYPE_IPV6);
 		MLX5_SET(dest_format_struct, dest, destination_id,
 		    tirn[MLX5E_TT_IPV6_UDP]);
 		err = mlx5_add_flow_table_entry(ft, match_criteria_enable,
 		    match_criteria, flow_context, &ai->ft_ix[MLX5E_TT_IPV6_UDP]);
 		if (err) {
 			mlx5e_del_eth_addr_from_flow_table(priv, ai);
 			return (err);
 		}
 		ai->tt_vec |= (1 << MLX5E_TT_IPV6_UDP);
 	}
 	MLX5_SET(fte_match_param, match_value, outer_headers.ip_protocol,
 	    IPPROTO_TCP);
 
 	if (tt_vec & (1 << MLX5E_TT_IPV4_TCP)) {
 		MLX5_SET(fte_match_param, match_value, outer_headers.ethertype,
 		    ETHERTYPE_IP);
 		MLX5_SET(dest_format_struct, dest, destination_id,
 		    tirn[MLX5E_TT_IPV4_TCP]);
 		err = mlx5_add_flow_table_entry(ft, match_criteria_enable,
 		    match_criteria, flow_context, &ai->ft_ix[MLX5E_TT_IPV4_TCP]);
 		if (err) {
 			mlx5e_del_eth_addr_from_flow_table(priv, ai);
 			return (err);
 		}
 		ai->tt_vec |= (1 << MLX5E_TT_IPV4_TCP);
 	}
 	if (tt_vec & (1 << MLX5E_TT_IPV6_TCP)) {
 		MLX5_SET(fte_match_param, match_value, outer_headers.ethertype,
 		    ETHERTYPE_IPV6);
 		MLX5_SET(dest_format_struct, dest, destination_id,
 		    tirn[MLX5E_TT_IPV6_TCP]);
 		err = mlx5_add_flow_table_entry(ft, match_criteria_enable,
 		    match_criteria, flow_context, &ai->ft_ix[MLX5E_TT_IPV6_TCP]);
 		if (err) {
 			mlx5e_del_eth_addr_from_flow_table(priv, ai);
 			return (err);
 		}
 		ai->tt_vec |= (1 << MLX5E_TT_IPV6_TCP);
 	}
 	return (0);
 }
 
 static int
 mlx5e_add_eth_addr_rule(struct mlx5e_priv *priv,
     struct mlx5e_eth_addr_info *ai, int type)
 {
 	u32 *flow_context;
 	u32 *match_criteria;
 	int err;
 
 	flow_context = mlx5_vzalloc(MLX5_ST_SZ_BYTES(flow_context) +
 	    MLX5_ST_SZ_BYTES(dest_format_struct));
 	match_criteria = mlx5_vzalloc(MLX5_ST_SZ_BYTES(fte_match_param));
 	if (!flow_context || !match_criteria) {
 		if_printf(priv->ifp, "%s: alloc failed\n", __func__);
 		err = -ENOMEM;
 		goto add_eth_addr_rule_out;
 	}
 	err = mlx5e_add_eth_addr_rule_sub(priv, ai, type, flow_context,
 	    match_criteria);
 	if (err)
 		if_printf(priv->ifp, "%s: failed\n", __func__);
 
 add_eth_addr_rule_out:
 	kvfree(match_criteria);
 	kvfree(flow_context);
 	return (err);
 }
 
+static int mlx5e_vport_context_update_vlans(struct mlx5e_priv *priv)
+{
+	struct ifnet *ifp = priv->ifp;
+	int max_list_size;
+	int list_size;
+	u16 *vlans;
+	int vlan;
+	int err;
+	int i;
+
+	list_size = 0;
+	for_each_set_bit(vlan, priv->vlan.active_vlans, VLAN_N_VID)
+		list_size++;
+
+	max_list_size = 1 << MLX5_CAP_GEN(priv->mdev, log_max_vlan_list);
+
+	if (list_size > max_list_size) {
+		if_printf(ifp,
+			    "ifnet vlans list size (%d) > (%d) max vport list size, some vlans will be dropped\n",
+			    list_size, max_list_size);
+		list_size = max_list_size;
+	}
+
+	vlans = kcalloc(list_size, sizeof(*vlans), GFP_KERNEL);
+	if (!vlans)
+		return -ENOMEM;
+
+	i = 0;
+	for_each_set_bit(vlan, priv->vlan.active_vlans, VLAN_N_VID) {
+		if (i >= list_size)
+			break;
+		vlans[i++] = vlan;
+	}
+
+	err = mlx5_modify_nic_vport_vlans(priv->mdev, vlans, list_size);
+	if (err)
+		if_printf(ifp, "Failed to modify vport vlans list err(%d)\n",
+			   err);
+
+	kfree(vlans);
+	return err;
+}
+
 enum mlx5e_vlan_rule_type {
 	MLX5E_VLAN_RULE_TYPE_UNTAGGED,
 	MLX5E_VLAN_RULE_TYPE_ANY_VID,
 	MLX5E_VLAN_RULE_TYPE_MATCH_VID,
 };
 
 static int
 mlx5e_add_vlan_rule(struct mlx5e_priv *priv,
     enum mlx5e_vlan_rule_type rule_type, u16 vid)
 {
 	u8 match_criteria_enable = 0;
 	u32 *flow_context;
 	void *match_value;
 	void *dest;
 	u32 *match_criteria;
 	u32 *ft_ix;
 	int err;
 
 	flow_context = mlx5_vzalloc(MLX5_ST_SZ_BYTES(flow_context) +
 	    MLX5_ST_SZ_BYTES(dest_format_struct));
 	match_criteria = mlx5_vzalloc(MLX5_ST_SZ_BYTES(fte_match_param));
 	if (!flow_context || !match_criteria) {
 		if_printf(priv->ifp, "%s: alloc failed\n", __func__);
 		err = -ENOMEM;
 		goto add_vlan_rule_out;
 	}
 	match_value = MLX5_ADDR_OF(flow_context, flow_context, match_value);
 	dest = MLX5_ADDR_OF(flow_context, flow_context, destination);
 
 	MLX5_SET(flow_context, flow_context, action,
 	    MLX5_FLOW_CONTEXT_ACTION_FWD_DEST);
 	MLX5_SET(flow_context, flow_context, destination_list_size, 1);
 	MLX5_SET(dest_format_struct, dest, destination_type,
 	    MLX5_FLOW_CONTEXT_DEST_TYPE_FLOW_TABLE);
 	MLX5_SET(dest_format_struct, dest, destination_id,
 	    mlx5_get_flow_table_id(priv->ft.main));
 
 	match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 	MLX5_SET_TO_ONES(fte_match_param, match_criteria,
 	    outer_headers.vlan_tag);
 
 	switch (rule_type) {
 	case MLX5E_VLAN_RULE_TYPE_UNTAGGED:
 		ft_ix = &priv->vlan.untagged_rule_ft_ix;
 		break;
 	case MLX5E_VLAN_RULE_TYPE_ANY_VID:
 		ft_ix = &priv->vlan.any_vlan_rule_ft_ix;
 		MLX5_SET(fte_match_param, match_value, outer_headers.vlan_tag,
 		    1);
 		break;
 	default:			/* MLX5E_VLAN_RULE_TYPE_MATCH_VID */
 		ft_ix = &priv->vlan.active_vlans_ft_ix[vid];
 		MLX5_SET(fte_match_param, match_value, outer_headers.vlan_tag,
 		    1);
 		MLX5_SET_TO_ONES(fte_match_param, match_criteria,
 		    outer_headers.first_vid);
 		MLX5_SET(fte_match_param, match_value, outer_headers.first_vid,
 		    vid);
+		mlx5e_vport_context_update_vlans(priv);
 		break;
 	}
 
 	err = mlx5_add_flow_table_entry(priv->ft.vlan, match_criteria_enable,
 	    match_criteria, flow_context, ft_ix);
 	if (err)
 		if_printf(priv->ifp, "%s: failed\n", __func__);
 
 add_vlan_rule_out:
 	kvfree(match_criteria);
 	kvfree(flow_context);
 	return (err);
 }
 
 static void
 mlx5e_del_vlan_rule(struct mlx5e_priv *priv,
     enum mlx5e_vlan_rule_type rule_type, u16 vid)
 {
 	switch (rule_type) {
 	case MLX5E_VLAN_RULE_TYPE_UNTAGGED:
 		mlx5_del_flow_table_entry(priv->ft.vlan,
 		    priv->vlan.untagged_rule_ft_ix);
 		break;
 	case MLX5E_VLAN_RULE_TYPE_ANY_VID:
 		mlx5_del_flow_table_entry(priv->ft.vlan,
 		    priv->vlan.any_vlan_rule_ft_ix);
 		break;
 	case MLX5E_VLAN_RULE_TYPE_MATCH_VID:
 		mlx5_del_flow_table_entry(priv->ft.vlan,
 		    priv->vlan.active_vlans_ft_ix[vid]);
+		mlx5e_vport_context_update_vlans(priv);
 		break;
 	}
 }
 
 void
 mlx5e_enable_vlan_filter(struct mlx5e_priv *priv)
 {
 	if (priv->vlan.filter_disabled) {
 		priv->vlan.filter_disabled = false;
 		if (test_bit(MLX5E_STATE_OPENED, &priv->state))
 			mlx5e_del_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_ANY_VID,
 			    0);
 	}
 }
 
 void
 mlx5e_disable_vlan_filter(struct mlx5e_priv *priv)
 {
 	if (!priv->vlan.filter_disabled) {
 		priv->vlan.filter_disabled = true;
 		if (test_bit(MLX5E_STATE_OPENED, &priv->state))
 			mlx5e_add_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_ANY_VID,
 			    0);
 	}
 }
 
 void
 mlx5e_vlan_rx_add_vid(void *arg, struct ifnet *ifp, u16 vid)
 {
 	struct mlx5e_priv *priv = arg;
 
 	if (ifp != priv->ifp)
 		return;
 
 	PRIV_LOCK(priv);
 	set_bit(vid, priv->vlan.active_vlans);
 	if (test_bit(MLX5E_STATE_OPENED, &priv->state))
 		mlx5e_add_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_VID, vid);
 	PRIV_UNLOCK(priv);
 }
 
 void
 mlx5e_vlan_rx_kill_vid(void *arg, struct ifnet *ifp, u16 vid)
 {
 	struct mlx5e_priv *priv = arg;
 
 	if (ifp != priv->ifp)
 		return;
 
 	PRIV_LOCK(priv);
 	clear_bit(vid, priv->vlan.active_vlans);
 	if (test_bit(MLX5E_STATE_OPENED, &priv->state))
 		mlx5e_del_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_VID, vid);
 	PRIV_UNLOCK(priv);
 }
 
 int
 mlx5e_add_all_vlan_rules(struct mlx5e_priv *priv)
 {
 	u16 vid;
 	int err;
 
 	for_each_set_bit(vid, priv->vlan.active_vlans, VLAN_N_VID) {
 		err = mlx5e_add_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_VID,
 		    vid);
 		if (err)
 			return (err);
 	}
 
 	err = mlx5e_add_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_UNTAGGED, 0);
 	if (err)
 		return (err);
 
 	if (priv->vlan.filter_disabled) {
 		err = mlx5e_add_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_ANY_VID,
 		    0);
 		if (err)
 			return (err);
 	}
 	return (0);
 }
 
 void
 mlx5e_del_all_vlan_rules(struct mlx5e_priv *priv)
 {
 	u16 vid;
 
 	if (priv->vlan.filter_disabled)
 		mlx5e_del_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_ANY_VID, 0);
 
 	mlx5e_del_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_UNTAGGED, 0);
 
 	for_each_set_bit(vid, priv->vlan.active_vlans, VLAN_N_VID)
 	    mlx5e_del_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_VID, vid);
 }
 
 #define	mlx5e_for_each_hash_node(hn, tmp, hash, i) \
 	for (i = 0; i < MLX5E_ETH_ADDR_HASH_SIZE; i++) \
 		LIST_FOREACH_SAFE(hn, &(hash)[i], hlist, tmp)
 
 static void
 mlx5e_execute_action(struct mlx5e_priv *priv,
     struct mlx5e_eth_addr_hash_node *hn)
 {
 	switch (hn->action) {
 	case MLX5E_ACTION_ADD:
 		mlx5e_add_eth_addr_rule(priv, &hn->ai, MLX5E_FULLMATCH);
 		hn->action = MLX5E_ACTION_NONE;
 		break;
 
 	case MLX5E_ACTION_DEL:
 		mlx5e_del_eth_addr_from_flow_table(priv, &hn->ai);
 		mlx5e_del_eth_addr_from_hash(hn);
 		break;
 
 	default:
 		break;
 	}
 }
 
 static void
 mlx5e_sync_ifp_addr(struct mlx5e_priv *priv)
 {
 	struct ifnet *ifp = priv->ifp;
 	struct ifaddr *ifa;
 	struct ifmultiaddr *ifma;
 
 	/* XXX adding this entry might not be needed */
 	mlx5e_add_eth_addr_to_hash(priv->eth_addr.if_uc,
 	    LLADDR((struct sockaddr_dl *)(ifp->if_addr->ifa_addr)));
 
 	if_addr_rlock(ifp);
 	TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
 		if (ifa->ifa_addr->sa_family != AF_LINK)
 			continue;
 		mlx5e_add_eth_addr_to_hash(priv->eth_addr.if_uc,
 		    LLADDR((struct sockaddr_dl *)ifa->ifa_addr));
 	}
 	if_addr_runlock(ifp);
 
 	if_maddr_rlock(ifp);
 	TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) {
 		if (ifma->ifma_addr->sa_family != AF_LINK)
 			continue;
 		mlx5e_add_eth_addr_to_hash(priv->eth_addr.if_mc,
 		    LLADDR((struct sockaddr_dl *)ifma->ifma_addr));
 	}
 	if_maddr_runlock(ifp);
 }
 
+static void mlx5e_fill_addr_array(struct mlx5e_priv *priv, int list_type,
+				  u8 addr_array[][ETH_ALEN], int size)
+{
+	bool is_uc = (list_type == MLX5_NIC_VPORT_LIST_TYPE_UC);
+	struct ifnet *ifp = priv->ifp;
+	struct mlx5e_eth_addr_hash_node *hn;
+	struct mlx5e_eth_addr_hash_head *addr_list;
+	struct mlx5e_eth_addr_hash_node *tmp;
+	int i = 0;
+	int hi;
+
+	addr_list = is_uc ? priv->eth_addr.if_uc : priv->eth_addr.if_mc;
+
+	if (is_uc) /* Make sure our own address is pushed first */
+		ether_addr_copy(addr_array[i++], IF_LLADDR(ifp));
+	else if (priv->eth_addr.broadcast_enabled)
+		ether_addr_copy(addr_array[i++], ifp->if_broadcastaddr);
+
+	mlx5e_for_each_hash_node(hn, tmp, addr_list, hi) {
+		if (ether_addr_equal(IF_LLADDR(ifp), hn->ai.addr))
+			continue;
+		if (i >= size)
+			break;
+		ether_addr_copy(addr_array[i++], hn->ai.addr);
+	}
+}
+
+static void mlx5e_vport_context_update_addr_list(struct mlx5e_priv *priv,
+						 int list_type)
+{
+	bool is_uc = (list_type == MLX5_NIC_VPORT_LIST_TYPE_UC);
+	struct mlx5e_eth_addr_hash_node *hn;
+	u8 (*addr_array)[ETH_ALEN] = NULL;
+	struct mlx5e_eth_addr_hash_head *addr_list;
+	struct mlx5e_eth_addr_hash_node *tmp;
+	int max_size;
+	int size;
+	int err;
+	int hi;
+
+	size = is_uc ? 0 : (priv->eth_addr.broadcast_enabled ? 1 : 0);
+	max_size = is_uc ?
+		1 << MLX5_CAP_GEN(priv->mdev, log_max_current_uc_list) :
+		1 << MLX5_CAP_GEN(priv->mdev, log_max_current_mc_list);
+
+	addr_list = is_uc ? priv->eth_addr.if_uc : priv->eth_addr.if_mc;
+	mlx5e_for_each_hash_node(hn, tmp, addr_list, hi)
+		size++;
+
+	if (size > max_size) {
+		if_printf(priv->ifp,
+			    "ifp %s list size (%d) > (%d) max vport list size, some addresses will be dropped\n",
+			    is_uc ? "UC" : "MC", size, max_size);
+		size = max_size;
+	}
+
+	if (size) {
+		addr_array = kcalloc(size, ETH_ALEN, GFP_KERNEL);
+		if (!addr_array) {
+			err = -ENOMEM;
+			goto out;
+		}
+		mlx5e_fill_addr_array(priv, list_type, addr_array, size);
+	}
+
+	err = mlx5_modify_nic_vport_mac_list(priv->mdev, list_type, addr_array, size);
+out:
+	if (err)
+		if_printf(priv->ifp,
+			   "Failed to modify vport %s list err(%d)\n",
+			   is_uc ? "UC" : "MC", err);
+	kfree(addr_array);
+}
+
+static void mlx5e_vport_context_update(struct mlx5e_priv *priv)
+{
+	struct mlx5e_eth_addr_db *ea = &priv->eth_addr;
+
+	mlx5e_vport_context_update_addr_list(priv, MLX5_NIC_VPORT_LIST_TYPE_UC);
+	mlx5e_vport_context_update_addr_list(priv, MLX5_NIC_VPORT_LIST_TYPE_MC);
+	mlx5_modify_nic_vport_promisc(priv->mdev, 0,
+				      ea->allmulti_enabled,
+				      ea->promisc_enabled);
+}
+
 static void
 mlx5e_apply_ifp_addr(struct mlx5e_priv *priv)
 {
 	struct mlx5e_eth_addr_hash_node *hn;
 	struct mlx5e_eth_addr_hash_node *tmp;
 	int i;
 
 	mlx5e_for_each_hash_node(hn, tmp, priv->eth_addr.if_uc, i)
 	    mlx5e_execute_action(priv, hn);
 
 	mlx5e_for_each_hash_node(hn, tmp, priv->eth_addr.if_mc, i)
 	    mlx5e_execute_action(priv, hn);
 }
 
 static void
 mlx5e_handle_ifp_addr(struct mlx5e_priv *priv)
 {
 	struct mlx5e_eth_addr_hash_node *hn;
 	struct mlx5e_eth_addr_hash_node *tmp;
 	int i;
 
 	mlx5e_for_each_hash_node(hn, tmp, priv->eth_addr.if_uc, i)
 	    hn->action = MLX5E_ACTION_DEL;
 	mlx5e_for_each_hash_node(hn, tmp, priv->eth_addr.if_mc, i)
 	    hn->action = MLX5E_ACTION_DEL;
 
 	if (test_bit(MLX5E_STATE_OPENED, &priv->state))
 		mlx5e_sync_ifp_addr(priv);
 
 	mlx5e_apply_ifp_addr(priv);
 }
 
 void
 mlx5e_set_rx_mode_core(struct mlx5e_priv *priv)
 {
 	struct mlx5e_eth_addr_db *ea = &priv->eth_addr;
 	struct ifnet *ndev = priv->ifp;
 
 	bool rx_mode_enable = test_bit(MLX5E_STATE_OPENED, &priv->state);
 	bool promisc_enabled = rx_mode_enable && (ndev->if_flags & IFF_PROMISC);
 	bool allmulti_enabled = rx_mode_enable && (ndev->if_flags & IFF_ALLMULTI);
 	bool broadcast_enabled = rx_mode_enable;
 
 	bool enable_promisc = !ea->promisc_enabled && promisc_enabled;
 	bool disable_promisc = ea->promisc_enabled && !promisc_enabled;
 	bool enable_allmulti = !ea->allmulti_enabled && allmulti_enabled;
 	bool disable_allmulti = ea->allmulti_enabled && !allmulti_enabled;
 	bool enable_broadcast = !ea->broadcast_enabled && broadcast_enabled;
 	bool disable_broadcast = ea->broadcast_enabled && !broadcast_enabled;
 
 	/* update broadcast address */
 	ether_addr_copy(priv->eth_addr.broadcast.addr,
 	    priv->ifp->if_broadcastaddr);
 
 	if (enable_promisc)
 		mlx5e_add_eth_addr_rule(priv, &ea->promisc, MLX5E_PROMISC);
 	if (enable_allmulti)
 		mlx5e_add_eth_addr_rule(priv, &ea->allmulti, MLX5E_ALLMULTI);
 	if (enable_broadcast)
 		mlx5e_add_eth_addr_rule(priv, &ea->broadcast, MLX5E_FULLMATCH);
 
 	mlx5e_handle_ifp_addr(priv);
 
 	if (disable_broadcast)
 		mlx5e_del_eth_addr_from_flow_table(priv, &ea->broadcast);
 	if (disable_allmulti)
 		mlx5e_del_eth_addr_from_flow_table(priv, &ea->allmulti);
 	if (disable_promisc)
 		mlx5e_del_eth_addr_from_flow_table(priv, &ea->promisc);
 
 	ea->promisc_enabled = promisc_enabled;
 	ea->allmulti_enabled = allmulti_enabled;
 	ea->broadcast_enabled = broadcast_enabled;
+
+	mlx5e_vport_context_update(priv);
 }
 
 void
 mlx5e_set_rx_mode_work(struct work_struct *work)
 {
 	struct mlx5e_priv *priv =
 	    container_of(work, struct mlx5e_priv, set_rx_mode_work);
 
 	PRIV_LOCK(priv);
 	if (test_bit(MLX5E_STATE_OPENED, &priv->state))
 		mlx5e_set_rx_mode_core(priv);
 	PRIV_UNLOCK(priv);
 }
 
 static int
 mlx5e_create_main_flow_table(struct mlx5e_priv *priv)
 {
 	struct mlx5_flow_table_group *g;
 	u8 *dmac;
 
 	g = malloc(9 * sizeof(*g), M_MLX5EN, M_WAITOK | M_ZERO);
 	if (g == NULL)
 		return (-ENOMEM);
 
 	g[0].log_sz = 2;
 	g[0].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 	MLX5_SET_TO_ONES(fte_match_param, g[0].match_criteria,
 	    outer_headers.ethertype);
 	MLX5_SET_TO_ONES(fte_match_param, g[0].match_criteria,
 	    outer_headers.ip_protocol);
 
 	g[1].log_sz = 1;
 	g[1].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 	MLX5_SET_TO_ONES(fte_match_param, g[1].match_criteria,
 	    outer_headers.ethertype);
 
 	g[2].log_sz = 0;
 
 	g[3].log_sz = 14;
 	g[3].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 	dmac = MLX5_ADDR_OF(fte_match_param, g[3].match_criteria,
 	    outer_headers.dmac_47_16);
 	memset(dmac, 0xff, ETH_ALEN);
 	MLX5_SET_TO_ONES(fte_match_param, g[3].match_criteria,
 	    outer_headers.ethertype);
 	MLX5_SET_TO_ONES(fte_match_param, g[3].match_criteria,
 	    outer_headers.ip_protocol);
 
 	g[4].log_sz = 13;
 	g[4].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 	dmac = MLX5_ADDR_OF(fte_match_param, g[4].match_criteria,
 	    outer_headers.dmac_47_16);
 	memset(dmac, 0xff, ETH_ALEN);
 	MLX5_SET_TO_ONES(fte_match_param, g[4].match_criteria,
 	    outer_headers.ethertype);
 
 	g[5].log_sz = 11;
 	g[5].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 	dmac = MLX5_ADDR_OF(fte_match_param, g[5].match_criteria,
 	    outer_headers.dmac_47_16);
 	memset(dmac, 0xff, ETH_ALEN);
 
 	g[6].log_sz = 2;
 	g[6].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 	dmac = MLX5_ADDR_OF(fte_match_param, g[6].match_criteria,
 	    outer_headers.dmac_47_16);
 	dmac[0] = 0x01;
 	MLX5_SET_TO_ONES(fte_match_param, g[6].match_criteria,
 	    outer_headers.ethertype);
 	MLX5_SET_TO_ONES(fte_match_param, g[6].match_criteria,
 	    outer_headers.ip_protocol);
 
 	g[7].log_sz = 1;
 	g[7].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 	dmac = MLX5_ADDR_OF(fte_match_param, g[7].match_criteria,
 	    outer_headers.dmac_47_16);
 	dmac[0] = 0x01;
 	MLX5_SET_TO_ONES(fte_match_param, g[7].match_criteria,
 	    outer_headers.ethertype);
 
 	g[8].log_sz = 0;
 	g[8].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 	dmac = MLX5_ADDR_OF(fte_match_param, g[8].match_criteria,
 	    outer_headers.dmac_47_16);
 	dmac[0] = 0x01;
 	priv->ft.main = mlx5_create_flow_table(priv->mdev, 1,
 	    MLX5_FLOW_TABLE_TYPE_NIC_RCV,
 	    0, 9, g);
 	free(g, M_MLX5EN);
 
 	return (priv->ft.main ? 0 : -ENOMEM);
 }
 
 static void
 mlx5e_destroy_main_flow_table(struct mlx5e_priv *priv)
 {
 	mlx5_destroy_flow_table(priv->ft.main);
 	priv->ft.main = NULL;
 }
 
 static int
 mlx5e_create_vlan_flow_table(struct mlx5e_priv *priv)
 {
 	struct mlx5_flow_table_group *g;
 
 	g = malloc(2 * sizeof(*g), M_MLX5EN, M_WAITOK | M_ZERO);
 	if (g == NULL)
 		return (-ENOMEM);
 
 	g[0].log_sz = 12;
 	g[0].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 	MLX5_SET_TO_ONES(fte_match_param, g[0].match_criteria,
 	    outer_headers.vlan_tag);
 	MLX5_SET_TO_ONES(fte_match_param, g[0].match_criteria,
 	    outer_headers.first_vid);
 
 	/* untagged + any vlan id */
 	g[1].log_sz = 1;
 	g[1].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 	MLX5_SET_TO_ONES(fte_match_param, g[1].match_criteria,
 	    outer_headers.vlan_tag);
 
 	priv->ft.vlan = mlx5_create_flow_table(priv->mdev, 0,
 	    MLX5_FLOW_TABLE_TYPE_NIC_RCV,
 	    0, 2, g);
 	free(g, M_MLX5EN);
 
 	return (priv->ft.vlan ? 0 : -ENOMEM);
 }
 
 static void
 mlx5e_destroy_vlan_flow_table(struct mlx5e_priv *priv)
 {
 	mlx5_destroy_flow_table(priv->ft.vlan);
 	priv->ft.vlan = NULL;
 }
 
 int
 mlx5e_open_flow_table(struct mlx5e_priv *priv)
 {
 	int err;
 
 	err = mlx5e_create_main_flow_table(priv);
 	if (err)
 		return (err);
 
 	err = mlx5e_create_vlan_flow_table(priv);
 	if (err)
 		goto err_destroy_main_flow_table;
 
 	return (0);
 
 err_destroy_main_flow_table:
 	mlx5e_destroy_main_flow_table(priv);
 
 	return (err);
 }
 
 void
 mlx5e_close_flow_table(struct mlx5e_priv *priv)
 {
 	mlx5e_destroy_vlan_flow_table(priv);
 	mlx5e_destroy_main_flow_table(priv);
 }
Index: projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_main.c
===================================================================
--- projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_main.c	(revision 301546)
+++ projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_main.c	(revision 301547)
@@ -1,3183 +1,3190 @@
 /*-
  * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS `AS IS' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 
 #include "en.h"
 
 #include <sys/sockio.h>
 #include <machine/atomic.h>
 
 #define	ETH_DRIVER_VERSION	"3.1.0-dev"
 char mlx5e_version[] = "Mellanox Ethernet driver"
     " (" ETH_DRIVER_VERSION ")";
 
 struct mlx5e_rq_param {
 	u32	rqc [MLX5_ST_SZ_DW(rqc)];
 	struct mlx5_wq_param wq;
 };
 
 struct mlx5e_sq_param {
 	u32	sqc [MLX5_ST_SZ_DW(sqc)];
 	struct mlx5_wq_param wq;
 };
 
 struct mlx5e_cq_param {
 	u32	cqc [MLX5_ST_SZ_DW(cqc)];
 	struct mlx5_wq_param wq;
 	u16	eq_ix;
 };
 
 struct mlx5e_channel_param {
 	struct mlx5e_rq_param rq;
 	struct mlx5e_sq_param sq;
 	struct mlx5e_cq_param rx_cq;
 	struct mlx5e_cq_param tx_cq;
 };
 
 static const struct {
 	u32	subtype;
 	u64	baudrate;
 }	mlx5e_mode_table[MLX5E_LINK_MODES_NUMBER] = {
 
 	[MLX5E_1000BASE_CX_SGMII] = {
 		.subtype = IFM_1000_CX_SGMII,
 		.baudrate = IF_Mbps(1000ULL),
 	},
 	[MLX5E_1000BASE_KX] = {
 		.subtype = IFM_1000_KX,
 		.baudrate = IF_Mbps(1000ULL),
 	},
 	[MLX5E_10GBASE_CX4] = {
 		.subtype = IFM_10G_CX4,
 		.baudrate = IF_Gbps(10ULL),
 	},
 	[MLX5E_10GBASE_KX4] = {
 		.subtype = IFM_10G_KX4,
 		.baudrate = IF_Gbps(10ULL),
 	},
 	[MLX5E_10GBASE_KR] = {
 		.subtype = IFM_10G_KR,
 		.baudrate = IF_Gbps(10ULL),
 	},
 	[MLX5E_20GBASE_KR2] = {
 		.subtype = IFM_20G_KR2,
 		.baudrate = IF_Gbps(20ULL),
 	},
 	[MLX5E_40GBASE_CR4] = {
 		.subtype = IFM_40G_CR4,
 		.baudrate = IF_Gbps(40ULL),
 	},
 	[MLX5E_40GBASE_KR4] = {
 		.subtype = IFM_40G_KR4,
 		.baudrate = IF_Gbps(40ULL),
 	},
 	[MLX5E_56GBASE_R4] = {
 		.subtype = IFM_56G_R4,
 		.baudrate = IF_Gbps(56ULL),
 	},
 	[MLX5E_10GBASE_CR] = {
 		.subtype = IFM_10G_CR1,
 		.baudrate = IF_Gbps(10ULL),
 	},
 	[MLX5E_10GBASE_SR] = {
 		.subtype = IFM_10G_SR,
 		.baudrate = IF_Gbps(10ULL),
 	},
 	[MLX5E_10GBASE_LR] = {
 		.subtype = IFM_10G_LR,
 		.baudrate = IF_Gbps(10ULL),
 	},
 	[MLX5E_40GBASE_SR4] = {
 		.subtype = IFM_40G_SR4,
 		.baudrate = IF_Gbps(40ULL),
 	},
 	[MLX5E_40GBASE_LR4] = {
 		.subtype = IFM_40G_LR4,
 		.baudrate = IF_Gbps(40ULL),
 	},
 	[MLX5E_100GBASE_CR4] = {
 		.subtype = IFM_100G_CR4,
 		.baudrate = IF_Gbps(100ULL),
 	},
 	[MLX5E_100GBASE_SR4] = {
 		.subtype = IFM_100G_SR4,
 		.baudrate = IF_Gbps(100ULL),
 	},
 	[MLX5E_100GBASE_KR4] = {
 		.subtype = IFM_100G_KR4,
 		.baudrate = IF_Gbps(100ULL),
 	},
 	[MLX5E_100GBASE_LR4] = {
 		.subtype = IFM_100G_LR4,
 		.baudrate = IF_Gbps(100ULL),
 	},
 	[MLX5E_100BASE_TX] = {
 		.subtype = IFM_100_TX,
 		.baudrate = IF_Mbps(100ULL),
 	},
 	[MLX5E_100BASE_T] = {
 		.subtype = IFM_100_T,
 		.baudrate = IF_Mbps(100ULL),
 	},
 	[MLX5E_10GBASE_T] = {
 		.subtype = IFM_10G_T,
 		.baudrate = IF_Gbps(10ULL),
 	},
 	[MLX5E_25GBASE_CR] = {
 		.subtype = IFM_25G_CR,
 		.baudrate = IF_Gbps(25ULL),
 	},
 	[MLX5E_25GBASE_KR] = {
 		.subtype = IFM_25G_KR,
 		.baudrate = IF_Gbps(25ULL),
 	},
 	[MLX5E_25GBASE_SR] = {
 		.subtype = IFM_25G_SR,
 		.baudrate = IF_Gbps(25ULL),
 	},
 	[MLX5E_50GBASE_CR2] = {
 		.subtype = IFM_50G_CR2,
 		.baudrate = IF_Gbps(50ULL),
 	},
 	[MLX5E_50GBASE_KR2] = {
 		.subtype = IFM_50G_KR2,
 		.baudrate = IF_Gbps(50ULL),
 	},
 };
 
 MALLOC_DEFINE(M_MLX5EN, "MLX5EN", "MLX5 Ethernet");
 
 static void
 mlx5e_update_carrier(struct mlx5e_priv *priv)
 {
 	struct mlx5_core_dev *mdev = priv->mdev;
 	u32 out[MLX5_ST_SZ_DW(ptys_reg)];
 	u32 eth_proto_oper;
 	int error;
 	u8 port_state;
 	u8 i;
 
 	port_state = mlx5_query_vport_state(mdev,
 	    MLX5_QUERY_VPORT_STATE_IN_OP_MOD_VNIC_VPORT);
 
 	if (port_state == VPORT_STATE_UP) {
 		priv->media_status_last |= IFM_ACTIVE;
 	} else {
 		priv->media_status_last &= ~IFM_ACTIVE;
 		priv->media_active_last = IFM_ETHER;
 		if_link_state_change(priv->ifp, LINK_STATE_DOWN);
 		return;
 	}
 
 	error = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_EN);
 	if (error) {
 		priv->media_active_last = IFM_ETHER;
 		priv->ifp->if_baudrate = 1;
 		if_printf(priv->ifp, "%s: query port ptys failed: 0x%x\n",
 		    __func__, error);
 		return;
 	}
 	eth_proto_oper = MLX5_GET(ptys_reg, out, eth_proto_oper);
 
 	for (i = 0; i != MLX5E_LINK_MODES_NUMBER; i++) {
 		if (mlx5e_mode_table[i].baudrate == 0)
 			continue;
 		if (MLX5E_PROT_MASK(i) & eth_proto_oper) {
 			priv->ifp->if_baudrate =
 			    mlx5e_mode_table[i].baudrate;
 			priv->media_active_last =
 			    mlx5e_mode_table[i].subtype | IFM_ETHER | IFM_FDX;
 		}
 	}
 	if_link_state_change(priv->ifp, LINK_STATE_UP);
 }
 
 static void
 mlx5e_media_status(struct ifnet *dev, struct ifmediareq *ifmr)
 {
 	struct mlx5e_priv *priv = dev->if_softc;
 
 	ifmr->ifm_status = priv->media_status_last;
 	ifmr->ifm_active = priv->media_active_last |
 	    (priv->params.rx_pauseframe_control ? IFM_ETH_RXPAUSE : 0) |
 	    (priv->params.tx_pauseframe_control ? IFM_ETH_TXPAUSE : 0);
 
 }
 
 static u32
 mlx5e_find_link_mode(u32 subtype)
 {
 	u32 i;
 	u32 link_mode = 0;
 
 	for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i) {
 		if (mlx5e_mode_table[i].baudrate == 0)
 			continue;
 		if (mlx5e_mode_table[i].subtype == subtype)
 			link_mode |= MLX5E_PROT_MASK(i);
 	}
 
 	return (link_mode);
 }
 
 static int
 mlx5e_media_change(struct ifnet *dev)
 {
 	struct mlx5e_priv *priv = dev->if_softc;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	u32 eth_proto_cap;
 	u32 link_mode;
 	int was_opened;
 	int locked;
 	int error;
 
 	locked = PRIV_LOCKED(priv);
 	if (!locked)
 		PRIV_LOCK(priv);
 
 	if (IFM_TYPE(priv->media.ifm_media) != IFM_ETHER) {
 		error = EINVAL;
 		goto done;
 	}
 	link_mode = mlx5e_find_link_mode(IFM_SUBTYPE(priv->media.ifm_media));
 
 	/* query supported capabilities */
 	error = mlx5_query_port_proto_cap(mdev, &eth_proto_cap, MLX5_PTYS_EN);
 	if (error != 0) {
 		if_printf(dev, "Query port media capability failed\n");
 		goto done;
 	}
 	/* check for autoselect */
 	if (IFM_SUBTYPE(priv->media.ifm_media) == IFM_AUTO) {
 		link_mode = eth_proto_cap;
 		if (link_mode == 0) {
 			if_printf(dev, "Port media capability is zero\n");
 			error = EINVAL;
 			goto done;
 		}
 	} else {
 		link_mode = link_mode & eth_proto_cap;
 		if (link_mode == 0) {
 			if_printf(dev, "Not supported link mode requested\n");
 			error = EINVAL;
 			goto done;
 		}
 	}
 	/* update pauseframe control bits */
 	priv->params.rx_pauseframe_control =
 	    (priv->media.ifm_media & IFM_ETH_RXPAUSE) ? 1 : 0;
 	priv->params.tx_pauseframe_control =
 	    (priv->media.ifm_media & IFM_ETH_TXPAUSE) ? 1 : 0;
 
 	/* check if device is opened */
 	was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 
 	/* reconfigure the hardware */
 	mlx5_set_port_status(mdev, MLX5_PORT_DOWN);
 	mlx5_set_port_proto(mdev, link_mode, MLX5_PTYS_EN);
 	mlx5_set_port_pause(mdev, 1,
 	    priv->params.rx_pauseframe_control,
 	    priv->params.tx_pauseframe_control);
 	if (was_opened)
 		mlx5_set_port_status(mdev, MLX5_PORT_UP);
 
 done:
 	if (!locked)
 		PRIV_UNLOCK(priv);
 	return (error);
 }
 
 static void
 mlx5e_update_carrier_work(struct work_struct *work)
 {
 	struct mlx5e_priv *priv = container_of(work, struct mlx5e_priv,
 	    update_carrier_work);
 
 	PRIV_LOCK(priv);
 	if (test_bit(MLX5E_STATE_OPENED, &priv->state))
 		mlx5e_update_carrier(priv);
 	PRIV_UNLOCK(priv);
 }
 
 static void
 mlx5e_update_pport_counters(struct mlx5e_priv *priv)
 {
 	struct mlx5_core_dev *mdev = priv->mdev;
 	struct mlx5e_pport_stats *s = &priv->stats.pport;
 	struct mlx5e_port_stats_debug *s_debug = &priv->stats.port_stats_debug;
 	u32 *in;
 	u32 *out;
 	u64 *ptr;
 	unsigned sz = MLX5_ST_SZ_BYTES(ppcnt_reg);
 	unsigned x;
 	unsigned y;
 
 	in = mlx5_vzalloc(sz);
 	out = mlx5_vzalloc(sz);
 	if (in == NULL || out == NULL)
 		goto free_out;
 
 	ptr = (uint64_t *)MLX5_ADDR_OF(ppcnt_reg, out, counter_set);
 
 	MLX5_SET(ppcnt_reg, in, local_port, 1);
 
 	MLX5_SET(ppcnt_reg, in, grp, MLX5_IEEE_802_3_COUNTERS_GROUP);
 	mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPCNT, 0, 0);
 	for (x = y = 0; x != MLX5E_PPORT_IEEE802_3_STATS_NUM; x++, y++)
 		s->arg[y] = be64toh(ptr[x]);
 
 	MLX5_SET(ppcnt_reg, in, grp, MLX5_RFC_2819_COUNTERS_GROUP);
 	mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPCNT, 0, 0);
 	for (x = 0; x != MLX5E_PPORT_RFC2819_STATS_NUM; x++, y++)
 		s->arg[y] = be64toh(ptr[x]);
 	for (y = 0; x != MLX5E_PPORT_RFC2819_STATS_NUM +
 	    MLX5E_PPORT_RFC2819_STATS_DEBUG_NUM; x++, y++)
 		s_debug->arg[y] = be64toh(ptr[x]);
 
 	MLX5_SET(ppcnt_reg, in, grp, MLX5_RFC_2863_COUNTERS_GROUP);
 	mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPCNT, 0, 0);
 	for (x = 0; x != MLX5E_PPORT_RFC2863_STATS_DEBUG_NUM; x++, y++)
 		s_debug->arg[y] = be64toh(ptr[x]);
 
 	MLX5_SET(ppcnt_reg, in, grp, MLX5_PHYSICAL_LAYER_COUNTERS_GROUP);
 	mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPCNT, 0, 0);
 	for (x = 0; x != MLX5E_PPORT_PHYSICAL_LAYER_STATS_DEBUG_NUM; x++, y++)
 		s_debug->arg[y] = be64toh(ptr[x]);
 free_out:
 	kvfree(in);
 	kvfree(out);
 }
 
 static void
 mlx5e_update_stats_work(struct work_struct *work)
 {
 	struct mlx5e_priv *priv = container_of(work, struct mlx5e_priv,
 	    update_stats_work);
 	struct mlx5_core_dev *mdev = priv->mdev;
 	struct mlx5e_vport_stats *s = &priv->stats.vport;
 	struct mlx5e_rq_stats *rq_stats;
 	struct mlx5e_sq_stats *sq_stats;
 	struct buf_ring *sq_br;
 #if (__FreeBSD_version < 1100000)
 	struct ifnet *ifp = priv->ifp;
 #endif
 
 	u32 in[MLX5_ST_SZ_DW(query_vport_counter_in)];
 	u32 *out;
 	int outlen = MLX5_ST_SZ_BYTES(query_vport_counter_out);
 	u64 tso_packets = 0;
 	u64 tso_bytes = 0;
 	u64 tx_queue_dropped = 0;
 	u64 tx_defragged = 0;
 	u64 tx_offload_none = 0;
 	u64 lro_packets = 0;
 	u64 lro_bytes = 0;
 	u64 sw_lro_queued = 0;
 	u64 sw_lro_flushed = 0;
 	u64 rx_csum_none = 0;
 	u64 rx_wqe_err = 0;
 	u32 rx_out_of_buffer = 0;
 	int i;
 	int j;
 
 	PRIV_LOCK(priv);
 	out = mlx5_vzalloc(outlen);
 	if (out == NULL)
 		goto free_out;
 	if (test_bit(MLX5E_STATE_OPENED, &priv->state) == 0)
 		goto free_out;
 
 	/* Collect the SW counters first and then HW for consistency */
 	for (i = 0; i < priv->params.num_channels; i++) {
 		struct mlx5e_rq *rq = &priv->channel[i]->rq;
 
 		rq_stats = &priv->channel[i]->rq.stats;
 
 		/* collect stats from LRO */
 		rq_stats->sw_lro_queued = rq->lro.lro_queued;
 		rq_stats->sw_lro_flushed = rq->lro.lro_flushed;
 		sw_lro_queued += rq_stats->sw_lro_queued;
 		sw_lro_flushed += rq_stats->sw_lro_flushed;
 		lro_packets += rq_stats->lro_packets;
 		lro_bytes += rq_stats->lro_bytes;
 		rx_csum_none += rq_stats->csum_none;
 		rx_wqe_err += rq_stats->wqe_err;
 
 		for (j = 0; j < priv->num_tc; j++) {
 			sq_stats = &priv->channel[i]->sq[j].stats;
 			sq_br = priv->channel[i]->sq[j].br;
 
 			tso_packets += sq_stats->tso_packets;
 			tso_bytes += sq_stats->tso_bytes;
 			tx_queue_dropped += sq_stats->dropped;
 			tx_queue_dropped += sq_br->br_drops;
 			tx_defragged += sq_stats->defragged;
 			tx_offload_none += sq_stats->csum_offload_none;
 		}
 	}
 
 	/* update counters */
 	s->tso_packets = tso_packets;
 	s->tso_bytes = tso_bytes;
 	s->tx_queue_dropped = tx_queue_dropped;
 	s->tx_defragged = tx_defragged;
 	s->lro_packets = lro_packets;
 	s->lro_bytes = lro_bytes;
 	s->sw_lro_queued = sw_lro_queued;
 	s->sw_lro_flushed = sw_lro_flushed;
 	s->rx_csum_none = rx_csum_none;
 	s->rx_wqe_err = rx_wqe_err;
 
 	/* HW counters */
 	memset(in, 0, sizeof(in));
 
 	MLX5_SET(query_vport_counter_in, in, opcode,
 	    MLX5_CMD_OP_QUERY_VPORT_COUNTER);
 	MLX5_SET(query_vport_counter_in, in, op_mod, 0);
 	MLX5_SET(query_vport_counter_in, in, other_vport, 0);
 
 	memset(out, 0, outlen);
 
 	/* get number of out-of-buffer drops first */
 	if (mlx5_vport_query_out_of_rx_buffer(mdev, priv->counter_set_id,
 	    &rx_out_of_buffer))
 		goto free_out;
 
 	/* accumulate difference into a 64-bit counter */
 	s->rx_out_of_buffer += (u64)(u32)(rx_out_of_buffer - s->rx_out_of_buffer_prev);
 	s->rx_out_of_buffer_prev = rx_out_of_buffer;
 
 	/* get port statistics */
 	if (mlx5_cmd_exec(mdev, in, sizeof(in), out, outlen))
 		goto free_out;
 
 #define	MLX5_GET_CTR(out, x) \
 	MLX5_GET64(query_vport_counter_out, out, x)
 
 	s->rx_error_packets =
 	    MLX5_GET_CTR(out, received_errors.packets);
 	s->rx_error_bytes =
 	    MLX5_GET_CTR(out, received_errors.octets);
 	s->tx_error_packets =
 	    MLX5_GET_CTR(out, transmit_errors.packets);
 	s->tx_error_bytes =
 	    MLX5_GET_CTR(out, transmit_errors.octets);
 
 	s->rx_unicast_packets =
 	    MLX5_GET_CTR(out, received_eth_unicast.packets);
 	s->rx_unicast_bytes =
 	    MLX5_GET_CTR(out, received_eth_unicast.octets);
 	s->tx_unicast_packets =
 	    MLX5_GET_CTR(out, transmitted_eth_unicast.packets);
 	s->tx_unicast_bytes =
 	    MLX5_GET_CTR(out, transmitted_eth_unicast.octets);
 
 	s->rx_multicast_packets =
 	    MLX5_GET_CTR(out, received_eth_multicast.packets);
 	s->rx_multicast_bytes =
 	    MLX5_GET_CTR(out, received_eth_multicast.octets);
 	s->tx_multicast_packets =
 	    MLX5_GET_CTR(out, transmitted_eth_multicast.packets);
 	s->tx_multicast_bytes =
 	    MLX5_GET_CTR(out, transmitted_eth_multicast.octets);
 
 	s->rx_broadcast_packets =
 	    MLX5_GET_CTR(out, received_eth_broadcast.packets);
 	s->rx_broadcast_bytes =
 	    MLX5_GET_CTR(out, received_eth_broadcast.octets);
 	s->tx_broadcast_packets =
 	    MLX5_GET_CTR(out, transmitted_eth_broadcast.packets);
 	s->tx_broadcast_bytes =
 	    MLX5_GET_CTR(out, transmitted_eth_broadcast.octets);
 
 	s->rx_packets =
 	    s->rx_unicast_packets +
 	    s->rx_multicast_packets +
 	    s->rx_broadcast_packets -
 	    s->rx_out_of_buffer;
 	s->rx_bytes =
 	    s->rx_unicast_bytes +
 	    s->rx_multicast_bytes +
 	    s->rx_broadcast_bytes;
 	s->tx_packets =
 	    s->tx_unicast_packets +
 	    s->tx_multicast_packets +
 	    s->tx_broadcast_packets;
 	s->tx_bytes =
 	    s->tx_unicast_bytes +
 	    s->tx_multicast_bytes +
 	    s->tx_broadcast_bytes;
 
 	/* Update calculated offload counters */
 	s->tx_csum_offload = s->tx_packets - tx_offload_none;
 	s->rx_csum_good = s->rx_packets - s->rx_csum_none;
 
 	/* Update per port counters */
 	mlx5e_update_pport_counters(priv);
 
 #if (__FreeBSD_version < 1100000)
 	/* no get_counters interface in FreeBSD 10 */
 	ifp->if_ipackets = s->rx_packets;
 	ifp->if_ierrors = s->rx_error_packets;
 	ifp->if_iqdrops = s->rx_out_of_buffer;
 	ifp->if_opackets = s->tx_packets;
 	ifp->if_oerrors = s->tx_error_packets;
 	ifp->if_snd.ifq_drops = s->tx_queue_dropped;
 	ifp->if_ibytes = s->rx_bytes;
 	ifp->if_obytes = s->tx_bytes;
 #endif
 
 free_out:
 	kvfree(out);
 	PRIV_UNLOCK(priv);
 }
 
 static void
 mlx5e_update_stats(void *arg)
 {
 	struct mlx5e_priv *priv = arg;
 
 	schedule_work(&priv->update_stats_work);
 
 	callout_reset(&priv->watchdog, hz, &mlx5e_update_stats, priv);
 }
 
 static void
 mlx5e_async_event_sub(struct mlx5e_priv *priv,
     enum mlx5_dev_event event)
 {
 	switch (event) {
 	case MLX5_DEV_EVENT_PORT_UP:
 	case MLX5_DEV_EVENT_PORT_DOWN:
 		schedule_work(&priv->update_carrier_work);
 		break;
 
 	default:
 		break;
 	}
 }
 
 static void
 mlx5e_async_event(struct mlx5_core_dev *mdev, void *vpriv,
     enum mlx5_dev_event event, unsigned long param)
 {
 	struct mlx5e_priv *priv = vpriv;
 
 	mtx_lock(&priv->async_events_mtx);
 	if (test_bit(MLX5E_STATE_ASYNC_EVENTS_ENABLE, &priv->state))
 		mlx5e_async_event_sub(priv, event);
 	mtx_unlock(&priv->async_events_mtx);
 }
 
 static void
 mlx5e_enable_async_events(struct mlx5e_priv *priv)
 {
 	set_bit(MLX5E_STATE_ASYNC_EVENTS_ENABLE, &priv->state);
 }
 
 static void
 mlx5e_disable_async_events(struct mlx5e_priv *priv)
 {
 	mtx_lock(&priv->async_events_mtx);
 	clear_bit(MLX5E_STATE_ASYNC_EVENTS_ENABLE, &priv->state);
 	mtx_unlock(&priv->async_events_mtx);
 }
 
 static const char *mlx5e_rq_stats_desc[] = {
 	MLX5E_RQ_STATS(MLX5E_STATS_DESC)
 };
 
 static int
 mlx5e_create_rq(struct mlx5e_channel *c,
     struct mlx5e_rq_param *param,
     struct mlx5e_rq *rq)
 {
 	struct mlx5e_priv *priv = c->priv;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	char buffer[16];
 	void *rqc = param->rqc;
 	void *rqc_wq = MLX5_ADDR_OF(rqc, rqc, wq);
 	int wq_sz;
 	int err;
 	int i;
 
 	/* Create DMA descriptor TAG */
 	if ((err = -bus_dma_tag_create(
 	    bus_get_dma_tag(mdev->pdev->dev.bsddev),
 	    1,				/* any alignment */
 	    0,				/* no boundary */
 	    BUS_SPACE_MAXADDR,		/* lowaddr */
 	    BUS_SPACE_MAXADDR,		/* highaddr */
 	    NULL, NULL,			/* filter, filterarg */
 	    MJUM16BYTES,		/* maxsize */
 	    1,				/* nsegments */
 	    MJUM16BYTES,		/* maxsegsize */
 	    0,				/* flags */
 	    NULL, NULL,			/* lockfunc, lockfuncarg */
 	    &rq->dma_tag)))
 		goto done;
 
 	err = mlx5_wq_ll_create(mdev, &param->wq, rqc_wq, &rq->wq,
 	    &rq->wq_ctrl);
 	if (err)
 		goto err_free_dma_tag;
 
 	rq->wq.db = &rq->wq.db[MLX5_RCV_DBR];
 
 	if (priv->params.hw_lro_en) {
 		rq->wqe_sz = priv->params.lro_wqe_sz;
 	} else {
 		rq->wqe_sz = MLX5E_SW2MB_MTU(priv->ifp->if_mtu);
 	}
 	if (rq->wqe_sz > MJUM16BYTES) {
 		err = -ENOMEM;
 		goto err_rq_wq_destroy;
 	} else if (rq->wqe_sz > MJUM9BYTES) {
 		rq->wqe_sz = MJUM16BYTES;
 	} else if (rq->wqe_sz > MJUMPAGESIZE) {
 		rq->wqe_sz = MJUM9BYTES;
 	} else if (rq->wqe_sz > MCLBYTES) {
 		rq->wqe_sz = MJUMPAGESIZE;
 	} else {
 		rq->wqe_sz = MCLBYTES;
 	}
 
 	wq_sz = mlx5_wq_ll_get_size(&rq->wq);
 	rq->mbuf = malloc(wq_sz * sizeof(rq->mbuf[0]), M_MLX5EN, M_WAITOK | M_ZERO);
 	if (rq->mbuf == NULL) {
 		err = -ENOMEM;
 		goto err_rq_wq_destroy;
 	}
 	for (i = 0; i != wq_sz; i++) {
 		struct mlx5e_rx_wqe *wqe = mlx5_wq_ll_get_wqe(&rq->wq, i);
 		uint32_t byte_count = rq->wqe_sz - MLX5E_NET_IP_ALIGN;
 
 		err = -bus_dmamap_create(rq->dma_tag, 0, &rq->mbuf[i].dma_map);
 		if (err != 0) {
 			while (i--)
 				bus_dmamap_destroy(rq->dma_tag, rq->mbuf[i].dma_map);
 			goto err_rq_mbuf_free;
 		}
 		wqe->data.lkey = c->mkey_be;
 		wqe->data.byte_count = cpu_to_be32(byte_count | MLX5_HW_START_PADDING);
 	}
 
 	rq->pdev = c->pdev;
 	rq->ifp = c->ifp;
 	rq->channel = c;
 	rq->ix = c->ix;
 
 	snprintf(buffer, sizeof(buffer), "rxstat%d", c->ix);
 	mlx5e_create_stats(&rq->stats.ctx, SYSCTL_CHILDREN(priv->sysctl_ifnet),
 	    buffer, mlx5e_rq_stats_desc, MLX5E_RQ_STATS_NUM,
 	    rq->stats.arg);
 
 #ifdef HAVE_TURBO_LRO
 	if (tcp_tlro_init(&rq->lro, c->ifp, MLX5E_BUDGET_MAX) != 0)
 		rq->lro.mbuf = NULL;
 #else
 	if (tcp_lro_init(&rq->lro))
 		rq->lro.lro_cnt = 0;
 	else
 		rq->lro.ifp = c->ifp;
 #endif
 	return (0);
 
 err_rq_mbuf_free:
 	free(rq->mbuf, M_MLX5EN);
 err_rq_wq_destroy:
 	mlx5_wq_destroy(&rq->wq_ctrl);
 err_free_dma_tag:
 	bus_dma_tag_destroy(rq->dma_tag);
 done:
 	return (err);
 }
 
 static void
 mlx5e_destroy_rq(struct mlx5e_rq *rq)
 {
 	int wq_sz;
 	int i;
 
 	/* destroy all sysctl nodes */
 	sysctl_ctx_free(&rq->stats.ctx);
 
 	/* free leftover LRO packets, if any */
 #ifdef HAVE_TURBO_LRO
 	tcp_tlro_free(&rq->lro);
 #else
 	tcp_lro_free(&rq->lro);
 #endif
 	wq_sz = mlx5_wq_ll_get_size(&rq->wq);
 	for (i = 0; i != wq_sz; i++) {
 		if (rq->mbuf[i].mbuf != NULL) {
 			bus_dmamap_unload(rq->dma_tag,
 			    rq->mbuf[i].dma_map);
 			m_freem(rq->mbuf[i].mbuf);
 		}
 		bus_dmamap_destroy(rq->dma_tag, rq->mbuf[i].dma_map);
 	}
 	free(rq->mbuf, M_MLX5EN);
 	mlx5_wq_destroy(&rq->wq_ctrl);
 }
 
 static int
 mlx5e_enable_rq(struct mlx5e_rq *rq, struct mlx5e_rq_param *param)
 {
 	struct mlx5e_channel *c = rq->channel;
 	struct mlx5e_priv *priv = c->priv;
 	struct mlx5_core_dev *mdev = priv->mdev;
 
 	void *in;
 	void *rqc;
 	void *wq;
 	int inlen;
 	int err;
 
 	inlen = MLX5_ST_SZ_BYTES(create_rq_in) +
 	    sizeof(u64) * rq->wq_ctrl.buf.npages;
 	in = mlx5_vzalloc(inlen);
 	if (in == NULL)
 		return (-ENOMEM);
 
 	rqc = MLX5_ADDR_OF(create_rq_in, in, ctx);
 	wq = MLX5_ADDR_OF(rqc, rqc, wq);
 
 	memcpy(rqc, param->rqc, sizeof(param->rqc));
 
 	MLX5_SET(rqc, rqc, cqn, c->rq.cq.mcq.cqn);
 	MLX5_SET(rqc, rqc, state, MLX5_RQC_STATE_RST);
 	MLX5_SET(rqc, rqc, flush_in_error_en, 1);
 	if (priv->counter_set_id >= 0)
 		MLX5_SET(rqc, rqc, counter_set_id, priv->counter_set_id);
 	MLX5_SET(wq, wq, log_wq_pg_sz, rq->wq_ctrl.buf.page_shift -
 	    PAGE_SHIFT);
 	MLX5_SET64(wq, wq, dbr_addr, rq->wq_ctrl.db.dma);
 
 	mlx5_fill_page_array(&rq->wq_ctrl.buf,
 	    (__be64 *) MLX5_ADDR_OF(wq, wq, pas));
 
 	err = mlx5_core_create_rq(mdev, in, inlen, &rq->rqn);
 
 	kvfree(in);
 
 	return (err);
 }
 
 static int
 mlx5e_modify_rq(struct mlx5e_rq *rq, int curr_state, int next_state)
 {
 	struct mlx5e_channel *c = rq->channel;
 	struct mlx5e_priv *priv = c->priv;
 	struct mlx5_core_dev *mdev = priv->mdev;
 
 	void *in;
 	void *rqc;
 	int inlen;
 	int err;
 
 	inlen = MLX5_ST_SZ_BYTES(modify_rq_in);
 	in = mlx5_vzalloc(inlen);
 	if (in == NULL)
 		return (-ENOMEM);
 
 	rqc = MLX5_ADDR_OF(modify_rq_in, in, ctx);
 
 	MLX5_SET(modify_rq_in, in, rqn, rq->rqn);
 	MLX5_SET(modify_rq_in, in, rq_state, curr_state);
 	MLX5_SET(rqc, rqc, state, next_state);
 
 	err = mlx5_core_modify_rq(mdev, in, inlen);
 
 	kvfree(in);
 
 	return (err);
 }
 
 static void
 mlx5e_disable_rq(struct mlx5e_rq *rq)
 {
 	struct mlx5e_channel *c = rq->channel;
 	struct mlx5e_priv *priv = c->priv;
 	struct mlx5_core_dev *mdev = priv->mdev;
 
 	mlx5_core_destroy_rq(mdev, rq->rqn);
 }
 
 static int
 mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq)
 {
 	struct mlx5e_channel *c = rq->channel;
 	struct mlx5e_priv *priv = c->priv;
 	struct mlx5_wq_ll *wq = &rq->wq;
 	int i;
 
 	for (i = 0; i < 1000; i++) {
 		if (wq->cur_sz >= priv->params.min_rx_wqes)
 			return (0);
 
 		msleep(4);
 	}
 	return (-ETIMEDOUT);
 }
 
 static int
 mlx5e_open_rq(struct mlx5e_channel *c,
     struct mlx5e_rq_param *param,
     struct mlx5e_rq *rq)
 {
 	int err;
 
 	err = mlx5e_create_rq(c, param, rq);
 	if (err)
 		return (err);
 
 	err = mlx5e_enable_rq(rq, param);
 	if (err)
 		goto err_destroy_rq;
 
 	err = mlx5e_modify_rq(rq, MLX5_RQC_STATE_RST, MLX5_RQC_STATE_RDY);
 	if (err)
 		goto err_disable_rq;
 
 	c->rq.enabled = 1;
 
 	return (0);
 
 err_disable_rq:
 	mlx5e_disable_rq(rq);
 err_destroy_rq:
 	mlx5e_destroy_rq(rq);
 
 	return (err);
 }
 
 static void
 mlx5e_close_rq(struct mlx5e_rq *rq)
 {
 	rq->enabled = 0;
 	mlx5e_modify_rq(rq, MLX5_RQC_STATE_RDY, MLX5_RQC_STATE_ERR);
 }
 
 static void
 mlx5e_close_rq_wait(struct mlx5e_rq *rq)
 {
 	/* wait till RQ is empty */
 	while (!mlx5_wq_ll_is_empty(&rq->wq)) {
 		msleep(4);
 		rq->cq.mcq.comp(&rq->cq.mcq);
 	}
 
 	mlx5e_disable_rq(rq);
 	mlx5e_destroy_rq(rq);
 }
 
 static void
 mlx5e_free_sq_db(struct mlx5e_sq *sq)
 {
 	int wq_sz = mlx5_wq_cyc_get_size(&sq->wq);
 	int x;
 
 	for (x = 0; x != wq_sz; x++)
 		bus_dmamap_destroy(sq->dma_tag, sq->mbuf[x].dma_map);
 	free(sq->mbuf, M_MLX5EN);
 }
 
 static int
 mlx5e_alloc_sq_db(struct mlx5e_sq *sq)
 {
 	int wq_sz = mlx5_wq_cyc_get_size(&sq->wq);
 	int err;
 	int x;
 
 	sq->mbuf = malloc(wq_sz * sizeof(sq->mbuf[0]), M_MLX5EN, M_WAITOK | M_ZERO);
 	if (sq->mbuf == NULL)
 		return (-ENOMEM);
 
 	/* Create DMA descriptor MAPs */
 	for (x = 0; x != wq_sz; x++) {
 		err = -bus_dmamap_create(sq->dma_tag, 0, &sq->mbuf[x].dma_map);
 		if (err != 0) {
 			while (x--)
 				bus_dmamap_destroy(sq->dma_tag, sq->mbuf[x].dma_map);
 			free(sq->mbuf, M_MLX5EN);
 			return (err);
 		}
 	}
 	return (0);
 }
 
 static const char *mlx5e_sq_stats_desc[] = {
 	MLX5E_SQ_STATS(MLX5E_STATS_DESC)
 };
 
 static int
 mlx5e_create_sq(struct mlx5e_channel *c,
     int tc,
     struct mlx5e_sq_param *param,
     struct mlx5e_sq *sq)
 {
 	struct mlx5e_priv *priv = c->priv;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	char buffer[16];
 
 	void *sqc = param->sqc;
 	void *sqc_wq = MLX5_ADDR_OF(sqc, sqc, wq);
 #ifdef RSS
 	cpuset_t cpu_mask;
 	int cpu_id;
 #endif
 	int err;
 
 	/* Create DMA descriptor TAG */
 	if ((err = -bus_dma_tag_create(
 	    bus_get_dma_tag(mdev->pdev->dev.bsddev),
 	    1,				/* any alignment */
 	    0,				/* no boundary */
 	    BUS_SPACE_MAXADDR,		/* lowaddr */
 	    BUS_SPACE_MAXADDR,		/* highaddr */
 	    NULL, NULL,			/* filter, filterarg */
 	    MLX5E_MAX_TX_PAYLOAD_SIZE,	/* maxsize */
 	    MLX5E_MAX_TX_MBUF_FRAGS,	/* nsegments */
 	    MLX5E_MAX_TX_MBUF_SIZE,	/* maxsegsize */
 	    0,				/* flags */
 	    NULL, NULL,			/* lockfunc, lockfuncarg */
 	    &sq->dma_tag)))
 		goto done;
 
 	err = mlx5_alloc_map_uar(mdev, &sq->uar);
 	if (err)
 		goto err_free_dma_tag;
 
 	err = mlx5_wq_cyc_create(mdev, &param->wq, sqc_wq, &sq->wq,
 	    &sq->wq_ctrl);
 	if (err)
 		goto err_unmap_free_uar;
 
 	sq->wq.db = &sq->wq.db[MLX5_SND_DBR];
 	sq->uar_map = sq->uar.map;
 	sq->uar_bf_map = sq->uar.bf_map;
 	sq->bf_buf_size = (1 << MLX5_CAP_GEN(mdev, log_bf_reg_size)) / 2;
 
 	err = mlx5e_alloc_sq_db(sq);
 	if (err)
 		goto err_sq_wq_destroy;
 
 	sq->pdev = c->pdev;
 	sq->mkey_be = c->mkey_be;
 	sq->channel = c;
 	sq->tc = tc;
 
 	sq->br = buf_ring_alloc(MLX5E_SQ_TX_QUEUE_SIZE, M_MLX5EN,
 	    M_WAITOK, &sq->lock);
 	if (sq->br == NULL) {
 		if_printf(c->ifp, "%s: Failed allocating sq drbr buffer\n",
 		    __func__);
 		err = -ENOMEM;
 		goto err_free_sq_db;
 	}
 
 	sq->sq_tq = taskqueue_create_fast("mlx5e_que", M_WAITOK,
 	    taskqueue_thread_enqueue, &sq->sq_tq);
 	if (sq->sq_tq == NULL) {
 		if_printf(c->ifp, "%s: Failed allocating taskqueue\n",
 		    __func__);
 		err = -ENOMEM;
 		goto err_free_drbr;
 	}
 
 	TASK_INIT(&sq->sq_task, 0, mlx5e_tx_que, sq);
 #ifdef RSS
 	cpu_id = rss_getcpu(c->ix % rss_getnumbuckets());
 	CPU_SETOF(cpu_id, &cpu_mask);
 	taskqueue_start_threads_cpuset(&sq->sq_tq, 1, PI_NET, &cpu_mask,
 	    "%s TX SQ%d.%d CPU%d", c->ifp->if_xname, c->ix, tc, cpu_id);
 #else
 	taskqueue_start_threads(&sq->sq_tq, 1, PI_NET,
 	    "%s TX SQ%d.%d", c->ifp->if_xname, c->ix, tc);
 #endif
 	snprintf(buffer, sizeof(buffer), "txstat%dtc%d", c->ix, tc);
 	mlx5e_create_stats(&sq->stats.ctx, SYSCTL_CHILDREN(priv->sysctl_ifnet),
 	    buffer, mlx5e_sq_stats_desc, MLX5E_SQ_STATS_NUM,
 	    sq->stats.arg);
 
 	return (0);
 
 err_free_drbr:
 	buf_ring_free(sq->br, M_MLX5EN);
 err_free_sq_db:
 	mlx5e_free_sq_db(sq);
 err_sq_wq_destroy:
 	mlx5_wq_destroy(&sq->wq_ctrl);
 
 err_unmap_free_uar:
 	mlx5_unmap_free_uar(mdev, &sq->uar);
 
 err_free_dma_tag:
 	bus_dma_tag_destroy(sq->dma_tag);
 done:
 	return (err);
 }
 
 static void
 mlx5e_destroy_sq(struct mlx5e_sq *sq)
 {
 	struct mlx5e_channel *c = sq->channel;
 	struct mlx5e_priv *priv = c->priv;
 
 	/* destroy all sysctl nodes */
 	sysctl_ctx_free(&sq->stats.ctx);
 
 	mlx5e_free_sq_db(sq);
 	mlx5_wq_destroy(&sq->wq_ctrl);
 	mlx5_unmap_free_uar(priv->mdev, &sq->uar);
 	taskqueue_drain(sq->sq_tq, &sq->sq_task);
 	taskqueue_free(sq->sq_tq);
 	buf_ring_free(sq->br, M_MLX5EN);
 }
 
 static int
 mlx5e_enable_sq(struct mlx5e_sq *sq, struct mlx5e_sq_param *param)
 {
 	struct mlx5e_channel *c = sq->channel;
 	struct mlx5e_priv *priv = c->priv;
 	struct mlx5_core_dev *mdev = priv->mdev;
 
 	void *in;
 	void *sqc;
 	void *wq;
 	int inlen;
 	int err;
 
 	inlen = MLX5_ST_SZ_BYTES(create_sq_in) +
 	    sizeof(u64) * sq->wq_ctrl.buf.npages;
 	in = mlx5_vzalloc(inlen);
 	if (in == NULL)
 		return (-ENOMEM);
 
 	sqc = MLX5_ADDR_OF(create_sq_in, in, ctx);
 	wq = MLX5_ADDR_OF(sqc, sqc, wq);
 
 	memcpy(sqc, param->sqc, sizeof(param->sqc));
 
 	MLX5_SET(sqc, sqc, tis_num_0, priv->tisn[sq->tc]);
 	MLX5_SET(sqc, sqc, cqn, c->sq[sq->tc].cq.mcq.cqn);
 	MLX5_SET(sqc, sqc, state, MLX5_SQC_STATE_RST);
 	MLX5_SET(sqc, sqc, tis_lst_sz, 1);
 	MLX5_SET(sqc, sqc, flush_in_error_en, 1);
 
 	MLX5_SET(wq, wq, wq_type, MLX5_WQ_TYPE_CYCLIC);
 	MLX5_SET(wq, wq, uar_page, sq->uar.index);
 	MLX5_SET(wq, wq, log_wq_pg_sz, sq->wq_ctrl.buf.page_shift -
 	    PAGE_SHIFT);
 	MLX5_SET64(wq, wq, dbr_addr, sq->wq_ctrl.db.dma);
 
 	mlx5_fill_page_array(&sq->wq_ctrl.buf,
 	    (__be64 *) MLX5_ADDR_OF(wq, wq, pas));
 
 	err = mlx5_core_create_sq(mdev, in, inlen, &sq->sqn);
 
 	kvfree(in);
 
 	return (err);
 }
 
 static int
 mlx5e_modify_sq(struct mlx5e_sq *sq, int curr_state, int next_state)
 {
 	struct mlx5e_channel *c = sq->channel;
 	struct mlx5e_priv *priv = c->priv;
 	struct mlx5_core_dev *mdev = priv->mdev;
 
 	void *in;
 	void *sqc;
 	int inlen;
 	int err;
 
 	inlen = MLX5_ST_SZ_BYTES(modify_sq_in);
 	in = mlx5_vzalloc(inlen);
 	if (in == NULL)
 		return (-ENOMEM);
 
 	sqc = MLX5_ADDR_OF(modify_sq_in, in, ctx);
 
 	MLX5_SET(modify_sq_in, in, sqn, sq->sqn);
 	MLX5_SET(modify_sq_in, in, sq_state, curr_state);
 	MLX5_SET(sqc, sqc, state, next_state);
 
 	err = mlx5_core_modify_sq(mdev, in, inlen);
 
 	kvfree(in);
 
 	return (err);
 }
 
 static void
 mlx5e_disable_sq(struct mlx5e_sq *sq)
 {
 	struct mlx5e_channel *c = sq->channel;
 	struct mlx5e_priv *priv = c->priv;
 	struct mlx5_core_dev *mdev = priv->mdev;
 
 	mlx5_core_destroy_sq(mdev, sq->sqn);
 }
 
 static int
 mlx5e_open_sq(struct mlx5e_channel *c,
     int tc,
     struct mlx5e_sq_param *param,
     struct mlx5e_sq *sq)
 {
 	int err;
 
 	err = mlx5e_create_sq(c, tc, param, sq);
 	if (err)
 		return (err);
 
 	err = mlx5e_enable_sq(sq, param);
 	if (err)
 		goto err_destroy_sq;
 
 	err = mlx5e_modify_sq(sq, MLX5_SQC_STATE_RST, MLX5_SQC_STATE_RDY);
 	if (err)
 		goto err_disable_sq;
 
 	atomic_store_rel_int(&sq->queue_state, MLX5E_SQ_READY);
 
 	return (0);
 
 err_disable_sq:
 	mlx5e_disable_sq(sq);
 err_destroy_sq:
 	mlx5e_destroy_sq(sq);
 
 	return (err);
 }
 
 static void
 mlx5e_sq_send_nops_locked(struct mlx5e_sq *sq, int can_sleep)
 {
 	/* fill up remainder with NOPs */
 	while (sq->cev_counter != 0) {
 		while (!mlx5e_sq_has_room_for(sq, 1)) {
 			if (can_sleep != 0) {
 				mtx_unlock(&sq->lock);
 				msleep(4);
 				mtx_lock(&sq->lock);
 			} else {
 				goto done;
 			}
 		}
 		/* send a single NOP */
 		mlx5e_send_nop(sq, 1);
 		wmb();
 	}
 done:
 	/* Check if we need to write the doorbell */
 	if (likely(sq->doorbell.d64 != 0)) {
 		mlx5e_tx_notify_hw(sq, sq->doorbell.d32, 0);
 		sq->doorbell.d64 = 0;
 	}
 	return;
 }
 
 void
 mlx5e_sq_cev_timeout(void *arg)
 {
 	struct mlx5e_sq *sq = arg;
 
 	mtx_assert(&sq->lock, MA_OWNED);
 
 	/* check next state */
 	switch (sq->cev_next_state) {
 	case MLX5E_CEV_STATE_SEND_NOPS:
 		/* fill TX ring with NOPs, if any */
 		mlx5e_sq_send_nops_locked(sq, 0);
 
 		/* check if completed */
 		if (sq->cev_counter == 0) {
 			sq->cev_next_state = MLX5E_CEV_STATE_INITIAL;
 			return;
 		}
 		break;
 	default:
 		/* send NOPs on next timeout */
 		sq->cev_next_state = MLX5E_CEV_STATE_SEND_NOPS;
 		break;
 	}
 
 	/* restart timer */
 	callout_reset_curcpu(&sq->cev_callout, hz, mlx5e_sq_cev_timeout, sq);
 }
 
 static void
 mlx5e_close_sq_wait(struct mlx5e_sq *sq)
 {
 
 	mtx_lock(&sq->lock);
 	/* tear down the event factor timer, if any */
 	sq->cev_next_state = MLX5E_CEV_STATE_HOLD_NOPS;
 	callout_stop(&sq->cev_callout);
 
 	/* send dummy NOPs in order to flush the transmit ring */
 	mlx5e_sq_send_nops_locked(sq, 1);
 	mtx_unlock(&sq->lock);
 
 	/* make sure it is safe to free the callout */
 	callout_drain(&sq->cev_callout);
 
 	/* error out remaining requests */
 	mlx5e_modify_sq(sq, MLX5_SQC_STATE_RDY, MLX5_SQC_STATE_ERR);
 
 	/* wait till SQ is empty */
 	mtx_lock(&sq->lock);
 	while (sq->cc != sq->pc) {
 		mtx_unlock(&sq->lock);
 		msleep(4);
 		sq->cq.mcq.comp(&sq->cq.mcq);
 		mtx_lock(&sq->lock);
 	}
 	mtx_unlock(&sq->lock);
 
 	mlx5e_disable_sq(sq);
 	mlx5e_destroy_sq(sq);
 }
 
 static int
 mlx5e_create_cq(struct mlx5e_channel *c,
     struct mlx5e_cq_param *param,
     struct mlx5e_cq *cq,
     mlx5e_cq_comp_t *comp)
 {
 	struct mlx5e_priv *priv = c->priv;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	struct mlx5_core_cq *mcq = &cq->mcq;
 	int eqn_not_used;
 	int irqn;
 	int err;
 	u32 i;
 
 	param->wq.buf_numa_node = 0;
 	param->wq.db_numa_node = 0;
 	param->eq_ix = c->ix;
 
 	err = mlx5_cqwq_create(mdev, &param->wq, param->cqc, &cq->wq,
 	    &cq->wq_ctrl);
 	if (err)
 		return (err);
 
 	mlx5_vector2eqn(mdev, param->eq_ix, &eqn_not_used, &irqn);
 
 	mcq->cqe_sz = 64;
 	mcq->set_ci_db = cq->wq_ctrl.db.db;
 	mcq->arm_db = cq->wq_ctrl.db.db + 1;
 	*mcq->set_ci_db = 0;
 	*mcq->arm_db = 0;
 	mcq->vector = param->eq_ix;
 	mcq->comp = comp;
 	mcq->event = mlx5e_cq_error_event;
 	mcq->irqn = irqn;
 	mcq->uar = &priv->cq_uar;
 
 	for (i = 0; i < mlx5_cqwq_get_size(&cq->wq); i++) {
 		struct mlx5_cqe64 *cqe = mlx5_cqwq_get_wqe(&cq->wq, i);
 
 		cqe->op_own = 0xf1;
 	}
 
 	cq->channel = c;
 
 	return (0);
 }
 
 static void
 mlx5e_destroy_cq(struct mlx5e_cq *cq)
 {
 	mlx5_wq_destroy(&cq->wq_ctrl);
 }
 
 static int
 mlx5e_enable_cq(struct mlx5e_cq *cq, struct mlx5e_cq_param *param,
     u8 moderation_mode)
 {
 	struct mlx5e_channel *c = cq->channel;
 	struct mlx5e_priv *priv = c->priv;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	struct mlx5_core_cq *mcq = &cq->mcq;
 	void *in;
 	void *cqc;
 	int inlen;
 	int irqn_not_used;
 	int eqn;
 	int err;
 
 	inlen = MLX5_ST_SZ_BYTES(create_cq_in) +
 	    sizeof(u64) * cq->wq_ctrl.buf.npages;
 	in = mlx5_vzalloc(inlen);
 	if (in == NULL)
 		return (-ENOMEM);
 
 	cqc = MLX5_ADDR_OF(create_cq_in, in, cq_context);
 
 	memcpy(cqc, param->cqc, sizeof(param->cqc));
 
 	mlx5_fill_page_array(&cq->wq_ctrl.buf,
 	    (__be64 *) MLX5_ADDR_OF(create_cq_in, in, pas));
 
 	mlx5_vector2eqn(mdev, param->eq_ix, &eqn, &irqn_not_used);
 
 	MLX5_SET(cqc, cqc, cq_period_mode, moderation_mode);
 	MLX5_SET(cqc, cqc, c_eqn, eqn);
 	MLX5_SET(cqc, cqc, uar_page, mcq->uar->index);
 	MLX5_SET(cqc, cqc, log_page_size, cq->wq_ctrl.buf.page_shift -
 	    PAGE_SHIFT);
 	MLX5_SET64(cqc, cqc, dbr_addr, cq->wq_ctrl.db.dma);
 
 	err = mlx5_core_create_cq(mdev, mcq, in, inlen);
 
 	kvfree(in);
 
 	if (err)
 		return (err);
 
 	mlx5e_cq_arm(cq);
 
 	return (0);
 }
 
 static void
 mlx5e_disable_cq(struct mlx5e_cq *cq)
 {
 	struct mlx5e_channel *c = cq->channel;
 	struct mlx5e_priv *priv = c->priv;
 	struct mlx5_core_dev *mdev = priv->mdev;
 
 	mlx5_core_destroy_cq(mdev, &cq->mcq);
 }
 
 static int
 mlx5e_open_cq(struct mlx5e_channel *c,
     struct mlx5e_cq_param *param,
     struct mlx5e_cq *cq,
     mlx5e_cq_comp_t *comp,
     u8 moderation_mode)
 {
 	int err;
 
 	err = mlx5e_create_cq(c, param, cq, comp);
 	if (err)
 		return (err);
 
 	err = mlx5e_enable_cq(cq, param, moderation_mode);
 	if (err)
 		goto err_destroy_cq;
 
 	return (0);
 
 err_destroy_cq:
 	mlx5e_destroy_cq(cq);
 
 	return (err);
 }
 
 static void
 mlx5e_close_cq(struct mlx5e_cq *cq)
 {
 	mlx5e_disable_cq(cq);
 	mlx5e_destroy_cq(cq);
 }
 
 static int
 mlx5e_open_tx_cqs(struct mlx5e_channel *c,
     struct mlx5e_channel_param *cparam)
 {
 	u8 tx_moderation_mode;
 	int err;
 	int tc;
 
 	switch (c->priv->params.tx_cq_moderation_mode) {
 	case 0:
 		tx_moderation_mode = MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
 		break;
 	default:
 		if (MLX5_CAP_GEN(c->priv->mdev, cq_period_start_from_cqe))
 			tx_moderation_mode = MLX5_CQ_PERIOD_MODE_START_FROM_CQE;
 		else
 			tx_moderation_mode = MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
 		break;
 	}
 	for (tc = 0; tc < c->num_tc; tc++) {
 		/* open completion queue */
 		err = mlx5e_open_cq(c, &cparam->tx_cq, &c->sq[tc].cq,
 		    &mlx5e_tx_cq_comp, tx_moderation_mode);
 		if (err)
 			goto err_close_tx_cqs;
 	}
 	return (0);
 
 err_close_tx_cqs:
 	for (tc--; tc >= 0; tc--)
 		mlx5e_close_cq(&c->sq[tc].cq);
 
 	return (err);
 }
 
 static void
 mlx5e_close_tx_cqs(struct mlx5e_channel *c)
 {
 	int tc;
 
 	for (tc = 0; tc < c->num_tc; tc++)
 		mlx5e_close_cq(&c->sq[tc].cq);
 }
 
 static int
 mlx5e_open_sqs(struct mlx5e_channel *c,
     struct mlx5e_channel_param *cparam)
 {
 	int err;
 	int tc;
 
 	for (tc = 0; tc < c->num_tc; tc++) {
 		err = mlx5e_open_sq(c, tc, &cparam->sq, &c->sq[tc]);
 		if (err)
 			goto err_close_sqs;
 	}
 
 	return (0);
 
 err_close_sqs:
 	for (tc--; tc >= 0; tc--)
 		mlx5e_close_sq_wait(&c->sq[tc]);
 
 	return (err);
 }
 
 static void
 mlx5e_close_sqs_wait(struct mlx5e_channel *c)
 {
 	int tc;
 
 	for (tc = 0; tc < c->num_tc; tc++)
 		mlx5e_close_sq_wait(&c->sq[tc]);
 }
 
 static void
 mlx5e_chan_mtx_init(struct mlx5e_channel *c)
 {
 	int tc;
 
 	mtx_init(&c->rq.mtx, "mlx5rx", MTX_NETWORK_LOCK, MTX_DEF);
 
 	for (tc = 0; tc < c->num_tc; tc++) {
 		struct mlx5e_sq *sq = c->sq + tc;
 
 		mtx_init(&sq->lock, "mlx5tx", MTX_NETWORK_LOCK, MTX_DEF);
 		mtx_init(&sq->comp_lock, "mlx5comp", MTX_NETWORK_LOCK,
 		    MTX_DEF);
 
 		callout_init_mtx(&sq->cev_callout, &sq->lock, 0);
 
 		sq->cev_factor = c->priv->params_ethtool.tx_completion_fact;
 
 		/* ensure the TX completion event factor is not zero */
 		if (sq->cev_factor == 0)
 			sq->cev_factor = 1;
 	}
 }
 
 static void
 mlx5e_chan_mtx_destroy(struct mlx5e_channel *c)
 {
 	int tc;
 
 	mtx_destroy(&c->rq.mtx);
 
 	for (tc = 0; tc < c->num_tc; tc++) {
 		mtx_destroy(&c->sq[tc].lock);
 		mtx_destroy(&c->sq[tc].comp_lock);
 	}
 }
 
 static int
 mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
     struct mlx5e_channel_param *cparam,
     struct mlx5e_channel *volatile *cp)
 {
 	struct mlx5e_channel *c;
 	u8 rx_moderation_mode;
 	int err;
 
 	c = malloc(sizeof(*c), M_MLX5EN, M_WAITOK | M_ZERO);
 	if (c == NULL)
 		return (-ENOMEM);
 
 	c->priv = priv;
 	c->ix = ix;
 	c->cpu = 0;
 	c->pdev = &priv->mdev->pdev->dev;
 	c->ifp = priv->ifp;
 	c->mkey_be = cpu_to_be32(priv->mr.key);
 	c->num_tc = priv->num_tc;
 
 	/* init mutexes */
 	mlx5e_chan_mtx_init(c);
 
 	/* open transmit completion queue */
 	err = mlx5e_open_tx_cqs(c, cparam);
 	if (err)
 		goto err_free;
 
 	switch (priv->params.rx_cq_moderation_mode) {
 	case 0:
 		rx_moderation_mode = MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
 		break;
 	default:
 		if (MLX5_CAP_GEN(priv->mdev, cq_period_start_from_cqe))
 			rx_moderation_mode = MLX5_CQ_PERIOD_MODE_START_FROM_CQE;
 		else
 			rx_moderation_mode = MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
 		break;
 	}
 
 	/* open receive completion queue */
 	err = mlx5e_open_cq(c, &cparam->rx_cq, &c->rq.cq,
 	    &mlx5e_rx_cq_comp, rx_moderation_mode);
 	if (err)
 		goto err_close_tx_cqs;
 
 	err = mlx5e_open_sqs(c, cparam);
 	if (err)
 		goto err_close_rx_cq;
 
 	err = mlx5e_open_rq(c, &cparam->rq, &c->rq);
 	if (err)
 		goto err_close_sqs;
 
 	/* store channel pointer */
 	*cp = c;
 
 	/* poll receive queue initially */
 	c->rq.cq.mcq.comp(&c->rq.cq.mcq);
 
 	return (0);
 
 err_close_sqs:
 	mlx5e_close_sqs_wait(c);
 
 err_close_rx_cq:
 	mlx5e_close_cq(&c->rq.cq);
 
 err_close_tx_cqs:
 	mlx5e_close_tx_cqs(c);
 
 err_free:
 	/* destroy mutexes */
 	mlx5e_chan_mtx_destroy(c);
 	free(c, M_MLX5EN);
 	return (err);
 }
 
 static void
 mlx5e_close_channel(struct mlx5e_channel *volatile *pp)
 {
 	struct mlx5e_channel *c = *pp;
 
 	/* check if channel is already closed */
 	if (c == NULL)
 		return;
 	mlx5e_close_rq(&c->rq);
 }
 
 static void
 mlx5e_close_channel_wait(struct mlx5e_channel *volatile *pp)
 {
 	struct mlx5e_channel *c = *pp;
 
 	/* check if channel is already closed */
 	if (c == NULL)
 		return;
 	/* ensure channel pointer is no longer used */
 	*pp = NULL;
 
 	mlx5e_close_rq_wait(&c->rq);
 	mlx5e_close_sqs_wait(c);
 	mlx5e_close_cq(&c->rq.cq);
 	mlx5e_close_tx_cqs(c);
 	/* destroy mutexes */
 	mlx5e_chan_mtx_destroy(c);
 	free(c, M_MLX5EN);
 }
 
 static void
 mlx5e_build_rq_param(struct mlx5e_priv *priv,
     struct mlx5e_rq_param *param)
 {
 	void *rqc = param->rqc;
 	void *wq = MLX5_ADDR_OF(rqc, rqc, wq);
 
 	MLX5_SET(wq, wq, wq_type, MLX5_WQ_TYPE_LINKED_LIST);
 	MLX5_SET(wq, wq, end_padding_mode, MLX5_WQ_END_PAD_MODE_ALIGN);
 	MLX5_SET(wq, wq, log_wq_stride, ilog2(sizeof(struct mlx5e_rx_wqe)));
 	MLX5_SET(wq, wq, log_wq_sz, priv->params.log_rq_size);
 	MLX5_SET(wq, wq, pd, priv->pdn);
 
 	param->wq.buf_numa_node = 0;
 	param->wq.db_numa_node = 0;
 	param->wq.linear = 1;
 }
 
 static void
 mlx5e_build_sq_param(struct mlx5e_priv *priv,
     struct mlx5e_sq_param *param)
 {
 	void *sqc = param->sqc;
 	void *wq = MLX5_ADDR_OF(sqc, sqc, wq);
 
 	MLX5_SET(wq, wq, log_wq_sz, priv->params.log_sq_size);
 	MLX5_SET(wq, wq, log_wq_stride, ilog2(MLX5_SEND_WQE_BB));
 	MLX5_SET(wq, wq, pd, priv->pdn);
 
 	param->wq.buf_numa_node = 0;
 	param->wq.db_numa_node = 0;
 	param->wq.linear = 1;
 }
 
 static void
 mlx5e_build_common_cq_param(struct mlx5e_priv *priv,
     struct mlx5e_cq_param *param)
 {
 	void *cqc = param->cqc;
 
 	MLX5_SET(cqc, cqc, uar_page, priv->cq_uar.index);
 }
 
 static void
 mlx5e_build_rx_cq_param(struct mlx5e_priv *priv,
     struct mlx5e_cq_param *param)
 {
 	void *cqc = param->cqc;
 
 
 	/*
 	 * TODO: The sysctl that controls on/off is a bool for now, which means
 	 * we only support CSUM. Revisit this once HASH is implemented.
 	 */
 	if (priv->params.cqe_zipping_en) {
 		MLX5_SET(cqc, cqc, mini_cqe_res_format, MLX5_CQE_FORMAT_CSUM);
 		MLX5_SET(cqc, cqc, cqe_compression_en, 1);
 	}
 
 	MLX5_SET(cqc, cqc, log_cq_size, priv->params.log_rq_size);
 	MLX5_SET(cqc, cqc, cq_period, priv->params.rx_cq_moderation_usec);
 	MLX5_SET(cqc, cqc, cq_max_count, priv->params.rx_cq_moderation_pkts);
 
 	mlx5e_build_common_cq_param(priv, param);
 }
 
 static void
 mlx5e_build_tx_cq_param(struct mlx5e_priv *priv,
     struct mlx5e_cq_param *param)
 {
 	void *cqc = param->cqc;
 
 	MLX5_SET(cqc, cqc, log_cq_size, priv->params.log_sq_size);
 	MLX5_SET(cqc, cqc, cq_period, priv->params.tx_cq_moderation_usec);
 	MLX5_SET(cqc, cqc, cq_max_count, priv->params.tx_cq_moderation_pkts);
 
 	mlx5e_build_common_cq_param(priv, param);
 }
 
 static void
 mlx5e_build_channel_param(struct mlx5e_priv *priv,
     struct mlx5e_channel_param *cparam)
 {
 	memset(cparam, 0, sizeof(*cparam));
 
 	mlx5e_build_rq_param(priv, &cparam->rq);
 	mlx5e_build_sq_param(priv, &cparam->sq);
 	mlx5e_build_rx_cq_param(priv, &cparam->rx_cq);
 	mlx5e_build_tx_cq_param(priv, &cparam->tx_cq);
 }
 
 static int
 mlx5e_open_channels(struct mlx5e_priv *priv)
 {
 	struct mlx5e_channel_param cparam;
 	void *ptr;
 	int err;
 	int i;
 	int j;
 
 	priv->channel = malloc(priv->params.num_channels *
 	    sizeof(struct mlx5e_channel *), M_MLX5EN, M_WAITOK | M_ZERO);
 	if (priv->channel == NULL)
 		return (-ENOMEM);
 
 	mlx5e_build_channel_param(priv, &cparam);
 	for (i = 0; i < priv->params.num_channels; i++) {
 		err = mlx5e_open_channel(priv, i, &cparam, &priv->channel[i]);
 		if (err)
 			goto err_close_channels;
 	}
 
 	for (j = 0; j < priv->params.num_channels; j++) {
 		err = mlx5e_wait_for_min_rx_wqes(&priv->channel[j]->rq);
 		if (err)
 			goto err_close_channels;
 	}
 
 	return (0);
 
 err_close_channels:
 	for (i--; i >= 0; i--) {
 		mlx5e_close_channel(&priv->channel[i]);
 		mlx5e_close_channel_wait(&priv->channel[i]);
 	}
 
 	/* remove "volatile" attribute from "channel" pointer */
 	ptr = __DECONST(void *, priv->channel);
 	priv->channel = NULL;
 
 	free(ptr, M_MLX5EN);
 
 	return (err);
 }
 
 static void
 mlx5e_close_channels(struct mlx5e_priv *priv)
 {
 	void *ptr;
 	int i;
 
 	if (priv->channel == NULL)
 		return;
 
 	for (i = 0; i < priv->params.num_channels; i++)
 		mlx5e_close_channel(&priv->channel[i]);
 	for (i = 0; i < priv->params.num_channels; i++)
 		mlx5e_close_channel_wait(&priv->channel[i]);
 
 	/* remove "volatile" attribute from "channel" pointer */
 	ptr = __DECONST(void *, priv->channel);
 	priv->channel = NULL;
 
 	free(ptr, M_MLX5EN);
 }
 
 static int
 mlx5e_refresh_sq_params(struct mlx5e_priv *priv, struct mlx5e_sq *sq)
 {
 	return (mlx5_core_modify_cq_moderation(priv->mdev, &sq->cq.mcq,
 	    priv->params.tx_cq_moderation_usec,
 	    priv->params.tx_cq_moderation_pkts));
 }
 
 static int
 mlx5e_refresh_rq_params(struct mlx5e_priv *priv, struct mlx5e_rq *rq)
 {
 	return (mlx5_core_modify_cq_moderation(priv->mdev, &rq->cq.mcq,
 	    priv->params.rx_cq_moderation_usec,
 	    priv->params.rx_cq_moderation_pkts));
 }
 
 static int
 mlx5e_refresh_channel_params_sub(struct mlx5e_priv *priv, struct mlx5e_channel *c)
 {
 	int err;
 	int i;
 
 	if (c == NULL)
 		return (EINVAL);
 
 	err = mlx5e_refresh_rq_params(priv, &c->rq);
 	if (err)
 		goto done;
 
 	for (i = 0; i != c->num_tc; i++) {
 		err = mlx5e_refresh_sq_params(priv, &c->sq[i]);
 		if (err)
 			goto done;
 	}
 done:
 	return (err);
 }
 
 int
 mlx5e_refresh_channel_params(struct mlx5e_priv *priv)
 {
 	int i;
 
 	if (priv->channel == NULL)
 		return (EINVAL);
 
 	for (i = 0; i < priv->params.num_channels; i++) {
 		int err;
 
 		err = mlx5e_refresh_channel_params_sub(priv, priv->channel[i]);
 		if (err)
 			return (err);
 	}
 	return (0);
 }
 
 static int
 mlx5e_open_tis(struct mlx5e_priv *priv, int tc)
 {
 	struct mlx5_core_dev *mdev = priv->mdev;
 	u32 in[MLX5_ST_SZ_DW(create_tis_in)];
 	void *tisc = MLX5_ADDR_OF(create_tis_in, in, ctx);
 
 	memset(in, 0, sizeof(in));
 
 	MLX5_SET(tisc, tisc, prio, tc);
 	MLX5_SET(tisc, tisc, transport_domain, priv->tdn);
 
 	return (mlx5_core_create_tis(mdev, in, sizeof(in), &priv->tisn[tc]));
 }
 
 static void
 mlx5e_close_tis(struct mlx5e_priv *priv, int tc)
 {
 	mlx5_core_destroy_tis(priv->mdev, priv->tisn[tc]);
 }
 
 static int
 mlx5e_open_tises(struct mlx5e_priv *priv)
 {
 	int num_tc = priv->num_tc;
 	int err;
 	int tc;
 
 	for (tc = 0; tc < num_tc; tc++) {
 		err = mlx5e_open_tis(priv, tc);
 		if (err)
 			goto err_close_tises;
 	}
 
 	return (0);
 
 err_close_tises:
 	for (tc--; tc >= 0; tc--)
 		mlx5e_close_tis(priv, tc);
 
 	return (err);
 }
 
 static void
 mlx5e_close_tises(struct mlx5e_priv *priv)
 {
 	int num_tc = priv->num_tc;
 	int tc;
 
 	for (tc = 0; tc < num_tc; tc++)
 		mlx5e_close_tis(priv, tc);
 }
 
 static int
 mlx5e_open_rqt(struct mlx5e_priv *priv)
 {
 	struct mlx5_core_dev *mdev = priv->mdev;
 	u32 *in;
 	u32 out[MLX5_ST_SZ_DW(create_rqt_out)];
 	void *rqtc;
 	int inlen;
 	int err;
 	int sz;
 	int i;
 
 	sz = 1 << priv->params.rx_hash_log_tbl_sz;
 
 	inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + sizeof(u32) * sz;
 	in = mlx5_vzalloc(inlen);
 	if (in == NULL)
 		return (-ENOMEM);
 	rqtc = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
 
 	MLX5_SET(rqtc, rqtc, rqt_actual_size, sz);
 	MLX5_SET(rqtc, rqtc, rqt_max_size, sz);
 
 	for (i = 0; i < sz; i++) {
 		int ix;
 #ifdef RSS
 		ix = rss_get_indirection_to_bucket(i);
 #else
 		ix = i;
 #endif
 		/* ensure we don't overflow */
 		ix %= priv->params.num_channels;
 		MLX5_SET(rqtc, rqtc, rq_num[i], priv->channel[ix]->rq.rqn);
 	}
 
 	MLX5_SET(create_rqt_in, in, opcode, MLX5_CMD_OP_CREATE_RQT);
 
 	memset(out, 0, sizeof(out));
 	err = mlx5_cmd_exec_check_status(mdev, in, inlen, out, sizeof(out));
 	if (!err)
 		priv->rqtn = MLX5_GET(create_rqt_out, out, rqtn);
 
 	kvfree(in);
 
 	return (err);
 }
 
 static void
 mlx5e_close_rqt(struct mlx5e_priv *priv)
 {
 	u32 in[MLX5_ST_SZ_DW(destroy_rqt_in)];
 	u32 out[MLX5_ST_SZ_DW(destroy_rqt_out)];
 
 	memset(in, 0, sizeof(in));
 
 	MLX5_SET(destroy_rqt_in, in, opcode, MLX5_CMD_OP_DESTROY_RQT);
 	MLX5_SET(destroy_rqt_in, in, rqtn, priv->rqtn);
 
 	mlx5_cmd_exec_check_status(priv->mdev, in, sizeof(in), out,
 	    sizeof(out));
 }
 
 static void
 mlx5e_build_tir_ctx(struct mlx5e_priv *priv, u32 * tirc, int tt)
 {
 	void *hfso = MLX5_ADDR_OF(tirc, tirc, rx_hash_field_selector_outer);
 	__be32 *hkey;
 
 	MLX5_SET(tirc, tirc, transport_domain, priv->tdn);
 
 #define	ROUGH_MAX_L2_L3_HDR_SZ 256
 
 #define	MLX5_HASH_IP     (MLX5_HASH_FIELD_SEL_SRC_IP   |\
 			  MLX5_HASH_FIELD_SEL_DST_IP)
 
 #define	MLX5_HASH_ALL    (MLX5_HASH_FIELD_SEL_SRC_IP   |\
 			  MLX5_HASH_FIELD_SEL_DST_IP   |\
 			  MLX5_HASH_FIELD_SEL_L4_SPORT |\
 			  MLX5_HASH_FIELD_SEL_L4_DPORT)
 
 #define	MLX5_HASH_IP_IPSEC_SPI	(MLX5_HASH_FIELD_SEL_SRC_IP   |\
 				 MLX5_HASH_FIELD_SEL_DST_IP   |\
 				 MLX5_HASH_FIELD_SEL_IPSEC_SPI)
 
 	if (priv->params.hw_lro_en) {
 		MLX5_SET(tirc, tirc, lro_enable_mask,
 		    MLX5_TIRC_LRO_ENABLE_MASK_IPV4_LRO |
 		    MLX5_TIRC_LRO_ENABLE_MASK_IPV6_LRO);
 		MLX5_SET(tirc, tirc, lro_max_msg_sz,
 		    (priv->params.lro_wqe_sz -
 		    ROUGH_MAX_L2_L3_HDR_SZ) >> 8);
 		/* TODO: add the option to choose timer value dynamically */
 		MLX5_SET(tirc, tirc, lro_timeout_period_usecs,
 		    MLX5_CAP_ETH(priv->mdev,
 		    lro_timer_supported_periods[2]));
 	}
 
 	/* setup parameters for hashing TIR type, if any */
 	switch (tt) {
 	case MLX5E_TT_ANY:
 		MLX5_SET(tirc, tirc, disp_type,
 		    MLX5_TIRC_DISP_TYPE_DIRECT);
 		MLX5_SET(tirc, tirc, inline_rqn,
 		    priv->channel[0]->rq.rqn);
 		break;
 	default:
 		MLX5_SET(tirc, tirc, disp_type,
 		    MLX5_TIRC_DISP_TYPE_INDIRECT);
 		MLX5_SET(tirc, tirc, indirect_table,
 		    priv->rqtn);
 		MLX5_SET(tirc, tirc, rx_hash_fn,
 		    MLX5_TIRC_RX_HASH_FN_HASH_TOEPLITZ);
 		hkey = (__be32 *) MLX5_ADDR_OF(tirc, tirc, rx_hash_toeplitz_key);
 #ifdef RSS
 		/*
 		 * The FreeBSD RSS implementation does not currently
 		 * support symmetric Toeplitz hashes:
 		 */
 		MLX5_SET(tirc, tirc, rx_hash_symmetric, 0);
 		rss_getkey((uint8_t *)hkey);
 #else
 		MLX5_SET(tirc, tirc, rx_hash_symmetric, 1);
 		hkey[0] = cpu_to_be32(0xD181C62C);
 		hkey[1] = cpu_to_be32(0xF7F4DB5B);
 		hkey[2] = cpu_to_be32(0x1983A2FC);
 		hkey[3] = cpu_to_be32(0x943E1ADB);
 		hkey[4] = cpu_to_be32(0xD9389E6B);
 		hkey[5] = cpu_to_be32(0xD1039C2C);
 		hkey[6] = cpu_to_be32(0xA74499AD);
 		hkey[7] = cpu_to_be32(0x593D56D9);
 		hkey[8] = cpu_to_be32(0xF3253C06);
 		hkey[9] = cpu_to_be32(0x2ADC1FFC);
 #endif
 		break;
 	}
 
 	switch (tt) {
 	case MLX5E_TT_IPV4_TCP:
 		MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
 		    MLX5_L3_PROT_TYPE_IPV4);
 		MLX5_SET(rx_hash_field_select, hfso, l4_prot_type,
 		    MLX5_L4_PROT_TYPE_TCP);
 #ifdef RSS
 		if (!(rss_gethashconfig() & RSS_HASHTYPE_RSS_TCP_IPV4)) {
 			MLX5_SET(rx_hash_field_select, hfso, selected_fields,
 			    MLX5_HASH_IP);
 		} else
 #endif
 		MLX5_SET(rx_hash_field_select, hfso, selected_fields,
 		    MLX5_HASH_ALL);
 		break;
 
 	case MLX5E_TT_IPV6_TCP:
 		MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
 		    MLX5_L3_PROT_TYPE_IPV6);
 		MLX5_SET(rx_hash_field_select, hfso, l4_prot_type,
 		    MLX5_L4_PROT_TYPE_TCP);
 #ifdef RSS
 		if (!(rss_gethashconfig() & RSS_HASHTYPE_RSS_TCP_IPV6)) {
 			MLX5_SET(rx_hash_field_select, hfso, selected_fields,
 			    MLX5_HASH_IP);
 		} else
 #endif
 		MLX5_SET(rx_hash_field_select, hfso, selected_fields,
 		    MLX5_HASH_ALL);
 		break;
 
 	case MLX5E_TT_IPV4_UDP:
 		MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
 		    MLX5_L3_PROT_TYPE_IPV4);
 		MLX5_SET(rx_hash_field_select, hfso, l4_prot_type,
 		    MLX5_L4_PROT_TYPE_UDP);
 #ifdef RSS
 		if (!(rss_gethashconfig() & RSS_HASHTYPE_RSS_UDP_IPV4)) {
 			MLX5_SET(rx_hash_field_select, hfso, selected_fields,
 			    MLX5_HASH_IP);
 		} else
 #endif
 		MLX5_SET(rx_hash_field_select, hfso, selected_fields,
 		    MLX5_HASH_ALL);
 		break;
 
 	case MLX5E_TT_IPV6_UDP:
 		MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
 		    MLX5_L3_PROT_TYPE_IPV6);
 		MLX5_SET(rx_hash_field_select, hfso, l4_prot_type,
 		    MLX5_L4_PROT_TYPE_UDP);
 #ifdef RSS
 		if (!(rss_gethashconfig() & RSS_HASHTYPE_RSS_UDP_IPV6)) {
 			MLX5_SET(rx_hash_field_select, hfso, selected_fields,
 			    MLX5_HASH_IP);
 		} else
 #endif
 		MLX5_SET(rx_hash_field_select, hfso, selected_fields,
 		    MLX5_HASH_ALL);
 		break;
 
 	case MLX5E_TT_IPV4_IPSEC_AH:
 		MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
 		    MLX5_L3_PROT_TYPE_IPV4);
 		MLX5_SET(rx_hash_field_select, hfso, selected_fields,
 		    MLX5_HASH_IP_IPSEC_SPI);
 		break;
 
 	case MLX5E_TT_IPV6_IPSEC_AH:
 		MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
 		    MLX5_L3_PROT_TYPE_IPV6);
 		MLX5_SET(rx_hash_field_select, hfso, selected_fields,
 		    MLX5_HASH_IP_IPSEC_SPI);
 		break;
 
 	case MLX5E_TT_IPV4_IPSEC_ESP:
 		MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
 		    MLX5_L3_PROT_TYPE_IPV4);
 		MLX5_SET(rx_hash_field_select, hfso, selected_fields,
 		    MLX5_HASH_IP_IPSEC_SPI);
 		break;
 
 	case MLX5E_TT_IPV6_IPSEC_ESP:
 		MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
 		    MLX5_L3_PROT_TYPE_IPV6);
 		MLX5_SET(rx_hash_field_select, hfso, selected_fields,
 		    MLX5_HASH_IP_IPSEC_SPI);
 		break;
 
 	case MLX5E_TT_IPV4:
 		MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
 		    MLX5_L3_PROT_TYPE_IPV4);
 		MLX5_SET(rx_hash_field_select, hfso, selected_fields,
 		    MLX5_HASH_IP);
 		break;
 
 	case MLX5E_TT_IPV6:
 		MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
 		    MLX5_L3_PROT_TYPE_IPV6);
 		MLX5_SET(rx_hash_field_select, hfso, selected_fields,
 		    MLX5_HASH_IP);
 		break;
 
 	default:
 		break;
 	}
 }
 
 static int
 mlx5e_open_tir(struct mlx5e_priv *priv, int tt)
 {
 	struct mlx5_core_dev *mdev = priv->mdev;
 	u32 *in;
 	void *tirc;
 	int inlen;
 	int err;
 
 	inlen = MLX5_ST_SZ_BYTES(create_tir_in);
 	in = mlx5_vzalloc(inlen);
 	if (in == NULL)
 		return (-ENOMEM);
 	tirc = MLX5_ADDR_OF(create_tir_in, in, tir_context);
 
 	mlx5e_build_tir_ctx(priv, tirc, tt);
 
 	err = mlx5_core_create_tir(mdev, in, inlen, &priv->tirn[tt]);
 
 	kvfree(in);
 
 	return (err);
 }
 
 static void
 mlx5e_close_tir(struct mlx5e_priv *priv, int tt)
 {
 	mlx5_core_destroy_tir(priv->mdev, priv->tirn[tt]);
 }
 
 static int
 mlx5e_open_tirs(struct mlx5e_priv *priv)
 {
 	int err;
 	int i;
 
 	for (i = 0; i < MLX5E_NUM_TT; i++) {
 		err = mlx5e_open_tir(priv, i);
 		if (err)
 			goto err_close_tirs;
 	}
 
 	return (0);
 
 err_close_tirs:
 	for (i--; i >= 0; i--)
 		mlx5e_close_tir(priv, i);
 
 	return (err);
 }
 
 static void
 mlx5e_close_tirs(struct mlx5e_priv *priv)
 {
 	int i;
 
 	for (i = 0; i < MLX5E_NUM_TT; i++)
 		mlx5e_close_tir(priv, i);
 }
 
 /*
  * SW MTU does not include headers,
  * HW MTU includes all headers and checksums.
  */
 static int
 mlx5e_set_dev_port_mtu(struct ifnet *ifp, int sw_mtu)
 {
 	struct mlx5e_priv *priv = ifp->if_softc;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	int hw_mtu;
 	int err;
 
 
 	err = mlx5_set_port_mtu(mdev, MLX5E_SW2HW_MTU(sw_mtu));
 	if (err) {
 		if_printf(ifp, "%s: mlx5_set_port_mtu failed setting %d, err=%d\n",
 		    __func__, sw_mtu, err);
 		return (err);
 	}
 	err = mlx5_query_port_oper_mtu(mdev, &hw_mtu);
 	if (!err) {
 		ifp->if_mtu = MLX5E_HW2SW_MTU(hw_mtu);
 
 		if (ifp->if_mtu != sw_mtu) {
 			if_printf(ifp, "Port MTU %d is different than "
 			    "ifp mtu %d\n", sw_mtu, (int)ifp->if_mtu);
 		}
 	} else {
 		if_printf(ifp, "Query port MTU, after setting new "
 		    "MTU value, failed\n");
 		ifp->if_mtu = sw_mtu;
 	}
 	return (0);
 }
 
 int
 mlx5e_open_locked(struct ifnet *ifp)
 {
 	struct mlx5e_priv *priv = ifp->if_softc;
 	int err;
 
 	/* check if already opened */
 	if (test_bit(MLX5E_STATE_OPENED, &priv->state) != 0)
 		return (0);
 
 #ifdef RSS
 	if (rss_getnumbuckets() > priv->params.num_channels) {
 		if_printf(ifp, "NOTE: There are more RSS buckets(%u) than "
 		    "channels(%u) available\n", rss_getnumbuckets(),
 		    priv->params.num_channels);
 	}
 #endif
 	err = mlx5e_open_tises(priv);
 	if (err) {
 		if_printf(ifp, "%s: mlx5e_open_tises failed, %d\n",
 		    __func__, err);
 		return (err);
 	}
 	err = mlx5_vport_alloc_q_counter(priv->mdev, &priv->counter_set_id);
 	if (err) {
 		if_printf(priv->ifp,
 		    "%s: mlx5_vport_alloc_q_counter failed: %d\n",
 		    __func__, err);
 		goto err_close_tises;
 	}
 	err = mlx5e_open_channels(priv);
 	if (err) {
 		if_printf(ifp, "%s: mlx5e_open_channels failed, %d\n",
 		    __func__, err);
 		goto err_dalloc_q_counter;
 	}
 	err = mlx5e_open_rqt(priv);
 	if (err) {
 		if_printf(ifp, "%s: mlx5e_open_rqt failed, %d\n",
 		    __func__, err);
 		goto err_close_channels;
 	}
 	err = mlx5e_open_tirs(priv);
 	if (err) {
 		if_printf(ifp, "%s: mlx5e_open_tir failed, %d\n",
 		    __func__, err);
 		goto err_close_rqls;
 	}
 	err = mlx5e_open_flow_table(priv);
 	if (err) {
 		if_printf(ifp, "%s: mlx5e_open_flow_table failed, %d\n",
 		    __func__, err);
 		goto err_close_tirs;
 	}
 	err = mlx5e_add_all_vlan_rules(priv);
 	if (err) {
 		if_printf(ifp, "%s: mlx5e_add_all_vlan_rules failed, %d\n",
 		    __func__, err);
 		goto err_close_flow_table;
 	}
 	set_bit(MLX5E_STATE_OPENED, &priv->state);
 
 	mlx5e_update_carrier(priv);
 	mlx5e_set_rx_mode_core(priv);
 
 	return (0);
 
 err_close_flow_table:
 	mlx5e_close_flow_table(priv);
 
 err_close_tirs:
 	mlx5e_close_tirs(priv);
 
 err_close_rqls:
 	mlx5e_close_rqt(priv);
 
 err_close_channels:
 	mlx5e_close_channels(priv);
 
 err_dalloc_q_counter:
 	mlx5_vport_dealloc_q_counter(priv->mdev, priv->counter_set_id);
 
 err_close_tises:
 	mlx5e_close_tises(priv);
 
 	return (err);
 }
 
 static void
 mlx5e_open(void *arg)
 {
 	struct mlx5e_priv *priv = arg;
 
 	PRIV_LOCK(priv);
 	if (mlx5_set_port_status(priv->mdev, MLX5_PORT_UP))
 		if_printf(priv->ifp,
 		    "%s: Setting port status to up failed\n",
 		    __func__);
 
 	mlx5e_open_locked(priv->ifp);
 	priv->ifp->if_drv_flags |= IFF_DRV_RUNNING;
 	PRIV_UNLOCK(priv);
 }
 
 int
 mlx5e_close_locked(struct ifnet *ifp)
 {
 	struct mlx5e_priv *priv = ifp->if_softc;
 
 	/* check if already closed */
 	if (test_bit(MLX5E_STATE_OPENED, &priv->state) == 0)
 		return (0);
 
 	clear_bit(MLX5E_STATE_OPENED, &priv->state);
 
 	mlx5e_set_rx_mode_core(priv);
 	mlx5e_del_all_vlan_rules(priv);
 	if_link_state_change(priv->ifp, LINK_STATE_DOWN);
 	mlx5e_close_flow_table(priv);
 	mlx5e_close_tirs(priv);
 	mlx5e_close_rqt(priv);
 	mlx5e_close_channels(priv);
 	mlx5_vport_dealloc_q_counter(priv->mdev, priv->counter_set_id);
 	mlx5e_close_tises(priv);
 
 	return (0);
 }
 
 #if (__FreeBSD_version >= 1100000)
 static uint64_t
 mlx5e_get_counter(struct ifnet *ifp, ift_counter cnt)
 {
 	struct mlx5e_priv *priv = ifp->if_softc;
 	u64 retval;
 
 	/* PRIV_LOCK(priv); XXX not allowed */
 	switch (cnt) {
 	case IFCOUNTER_IPACKETS:
 		retval = priv->stats.vport.rx_packets;
 		break;
 	case IFCOUNTER_IERRORS:
 		retval = priv->stats.vport.rx_error_packets;
 		break;
 	case IFCOUNTER_IQDROPS:
 		retval = priv->stats.vport.rx_out_of_buffer;
 		break;
 	case IFCOUNTER_OPACKETS:
 		retval = priv->stats.vport.tx_packets;
 		break;
 	case IFCOUNTER_OERRORS:
 		retval = priv->stats.vport.tx_error_packets;
 		break;
 	case IFCOUNTER_IBYTES:
 		retval = priv->stats.vport.rx_bytes;
 		break;
 	case IFCOUNTER_OBYTES:
 		retval = priv->stats.vport.tx_bytes;
 		break;
 	case IFCOUNTER_IMCASTS:
 		retval = priv->stats.vport.rx_multicast_packets;
 		break;
 	case IFCOUNTER_OMCASTS:
 		retval = priv->stats.vport.tx_multicast_packets;
 		break;
 	case IFCOUNTER_OQDROPS:
 		retval = priv->stats.vport.tx_queue_dropped;
 		break;
 	default:
 		retval = if_get_counter_default(ifp, cnt);
 		break;
 	}
 	/* PRIV_UNLOCK(priv); XXX not allowed */
 	return (retval);
 }
 #endif
 
 static void
 mlx5e_set_rx_mode(struct ifnet *ifp)
 {
 	struct mlx5e_priv *priv = ifp->if_softc;
 
 	schedule_work(&priv->set_rx_mode_work);
 }
 
 static int
 mlx5e_ioctl(struct ifnet *ifp, u_long command, caddr_t data)
 {
 	struct mlx5e_priv *priv;
 	struct ifreq *ifr;
 	struct ifi2creq i2c;
 	int error = 0;
 	int mask = 0;
 	int size_read = 0;
 	int module_num;
 	int max_mtu;
 	uint8_t read_addr;
 
 	priv = ifp->if_softc;
 
 	/* check if detaching */
 	if (priv == NULL || priv->gone != 0)
 		return (ENXIO);
 
 	switch (command) {
 	case SIOCSIFMTU:
 		ifr = (struct ifreq *)data;
 
 		PRIV_LOCK(priv);
 		mlx5_query_port_max_mtu(priv->mdev, &max_mtu);
 
 		if (ifr->ifr_mtu >= MLX5E_MTU_MIN &&
 		    ifr->ifr_mtu <= MIN(MLX5E_MTU_MAX, max_mtu)) {
 			int was_opened;
 
 			was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 			if (was_opened)
 				mlx5e_close_locked(ifp);
 
 			/* set new MTU */
 			mlx5e_set_dev_port_mtu(ifp, ifr->ifr_mtu);
 
 			if (was_opened)
 				mlx5e_open_locked(ifp);
 		} else {
 			error = EINVAL;
 			if_printf(ifp, "Invalid MTU value. Min val: %d, Max val: %d\n",
 			    MLX5E_MTU_MIN, MIN(MLX5E_MTU_MAX, max_mtu));
 		}
 		PRIV_UNLOCK(priv);
 		break;
 	case SIOCSIFFLAGS:
 		if ((ifp->if_flags & IFF_UP) &&
 		    (ifp->if_drv_flags & IFF_DRV_RUNNING)) {
 			mlx5e_set_rx_mode(ifp);
 			break;
 		}
 		PRIV_LOCK(priv);
 		if (ifp->if_flags & IFF_UP) {
 			if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) {
 				if (test_bit(MLX5E_STATE_OPENED, &priv->state) == 0)
 					mlx5e_open_locked(ifp);
 				ifp->if_drv_flags |= IFF_DRV_RUNNING;
 				mlx5_set_port_status(priv->mdev, MLX5_PORT_UP);
 			}
 		} else {
 			if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
 				mlx5_set_port_status(priv->mdev,
 				    MLX5_PORT_DOWN);
 				if (test_bit(MLX5E_STATE_OPENED, &priv->state) != 0)
 					mlx5e_close_locked(ifp);
 				mlx5e_update_carrier(priv);
 				ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
 			}
 		}
 		PRIV_UNLOCK(priv);
 		break;
 	case SIOCADDMULTI:
 	case SIOCDELMULTI:
 		mlx5e_set_rx_mode(ifp);
 		break;
 	case SIOCSIFMEDIA:
 	case SIOCGIFMEDIA:
 	case SIOCGIFXMEDIA:
 		ifr = (struct ifreq *)data;
 		error = ifmedia_ioctl(ifp, ifr, &priv->media, command);
 		break;
 	case SIOCSIFCAP:
 		ifr = (struct ifreq *)data;
 		PRIV_LOCK(priv);
 		mask = ifr->ifr_reqcap ^ ifp->if_capenable;
 
 		if (mask & IFCAP_TXCSUM) {
 			ifp->if_capenable ^= IFCAP_TXCSUM;
 			ifp->if_hwassist ^= (CSUM_TCP | CSUM_UDP | CSUM_IP);
 
 			if (IFCAP_TSO4 & ifp->if_capenable &&
 			    !(IFCAP_TXCSUM & ifp->if_capenable)) {
 				ifp->if_capenable &= ~IFCAP_TSO4;
 				ifp->if_hwassist &= ~CSUM_IP_TSO;
 				if_printf(ifp,
 				    "tso4 disabled due to -txcsum.\n");
 			}
 		}
 		if (mask & IFCAP_TXCSUM_IPV6) {
 			ifp->if_capenable ^= IFCAP_TXCSUM_IPV6;
 			ifp->if_hwassist ^= (CSUM_UDP_IPV6 | CSUM_TCP_IPV6);
 
 			if (IFCAP_TSO6 & ifp->if_capenable &&
 			    !(IFCAP_TXCSUM_IPV6 & ifp->if_capenable)) {
 				ifp->if_capenable &= ~IFCAP_TSO6;
 				ifp->if_hwassist &= ~CSUM_IP6_TSO;
 				if_printf(ifp,
 				    "tso6 disabled due to -txcsum6.\n");
 			}
 		}
 		if (mask & IFCAP_RXCSUM)
 			ifp->if_capenable ^= IFCAP_RXCSUM;
 		if (mask & IFCAP_RXCSUM_IPV6)
 			ifp->if_capenable ^= IFCAP_RXCSUM_IPV6;
 		if (mask & IFCAP_TSO4) {
 			if (!(IFCAP_TSO4 & ifp->if_capenable) &&
 			    !(IFCAP_TXCSUM & ifp->if_capenable)) {
 				if_printf(ifp, "enable txcsum first.\n");
 				error = EAGAIN;
 				goto out;
 			}
 			ifp->if_capenable ^= IFCAP_TSO4;
 			ifp->if_hwassist ^= CSUM_IP_TSO;
 		}
 		if (mask & IFCAP_TSO6) {
 			if (!(IFCAP_TSO6 & ifp->if_capenable) &&
 			    !(IFCAP_TXCSUM_IPV6 & ifp->if_capenable)) {
 				if_printf(ifp, "enable txcsum6 first.\n");
 				error = EAGAIN;
 				goto out;
 			}
 			ifp->if_capenable ^= IFCAP_TSO6;
 			ifp->if_hwassist ^= CSUM_IP6_TSO;
 		}
 		if (mask & IFCAP_VLAN_HWFILTER) {
 			if (ifp->if_capenable & IFCAP_VLAN_HWFILTER)
 				mlx5e_disable_vlan_filter(priv);
 			else
 				mlx5e_enable_vlan_filter(priv);
 
 			ifp->if_capenable ^= IFCAP_VLAN_HWFILTER;
 		}
 		if (mask & IFCAP_VLAN_HWTAGGING)
 			ifp->if_capenable ^= IFCAP_VLAN_HWTAGGING;
 		if (mask & IFCAP_WOL_MAGIC)
 			ifp->if_capenable ^= IFCAP_WOL_MAGIC;
 
 		VLAN_CAPABILITIES(ifp);
 		/* turning off LRO also turns off HW LRO, if it is on */
 		if (mask & IFCAP_LRO) {
 			int was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 			bool need_restart = false;
 
 			ifp->if_capenable ^= IFCAP_LRO;
 			if (!(ifp->if_capenable & IFCAP_LRO)) {
 				if (priv->params.hw_lro_en) {
 					priv->params.hw_lro_en = false;
 					need_restart = true;
 					/* Not sure this is the correct way */
 					priv->params_ethtool.hw_lro = priv->params.hw_lro_en;
 				}
 			}
 			if (was_opened && need_restart) {
 				mlx5e_close_locked(ifp);
 				mlx5e_open_locked(ifp);
 			}
 		}
 out:
 		PRIV_UNLOCK(priv);
 		break;
 
 	case SIOCGI2C:
 		ifr = (struct ifreq *)data;
 
 		/*
 		 * Copy from the user-space address ifr_data to the
 		 * kernel-space address i2c
 		 */
 		error = copyin(ifr->ifr_data, &i2c, sizeof(i2c));
 		if (error)
 			break;
 
 		if (i2c.len > sizeof(i2c.data)) {
 			error = EINVAL;
 			break;
 		}
 
 		PRIV_LOCK(priv);
 		/* Get module_num which is required for the query_eeprom */
 		error = mlx5_query_module_num(priv->mdev, &module_num);
 		if (error) {
 			if_printf(ifp, "Query module num failed, eeprom "
 			    "reading is not supported\n");
 			error = EINVAL;
 			goto err_i2c;
 		}
 		/* Check if module is present before doing an access */
 		if (mlx5_query_module_status(priv->mdev, module_num) !=
 		    MLX5_MODULE_STATUS_PLUGGED) {
 			error = EINVAL;
 			goto err_i2c;
 		}
 		/*
 		 * Currently 0xA0 and 0xA2 are the only addresses permitted.
 		 * The internal conversion is as follows:
 		 */
 		if (i2c.dev_addr == 0xA0)
 			read_addr = MLX5E_I2C_ADDR_LOW;
 		else if (i2c.dev_addr == 0xA2)
 			read_addr = MLX5E_I2C_ADDR_HIGH;
 		else {
 			if_printf(ifp, "Query eeprom failed, "
 			    "Invalid Address: %X\n", i2c.dev_addr);
 			error = EINVAL;
 			goto err_i2c;
 		}
 		error = mlx5_query_eeprom(priv->mdev,
 		    read_addr, MLX5E_EEPROM_LOW_PAGE,
 		    (uint32_t)i2c.offset, (uint32_t)i2c.len, module_num,
 		    (uint32_t *)i2c.data, &size_read);
 		if (error) {
 			if_printf(ifp, "Query eeprom failed, eeprom "
 			    "reading is not supported\n");
 			error = EINVAL;
 			goto err_i2c;
 		}
 
 		if (i2c.len > MLX5_EEPROM_MAX_BYTES) {
 			error = mlx5_query_eeprom(priv->mdev,
 			    read_addr, MLX5E_EEPROM_LOW_PAGE,
 			    (uint32_t)(i2c.offset + size_read),
 			    (uint32_t)(i2c.len - size_read), module_num,
 			    (uint32_t *)(i2c.data + size_read), &size_read);
 		}
 		if (error) {
 			if_printf(ifp, "Query eeprom failed, eeprom "
 			    "reading is not supported\n");
 			error = EINVAL;
 			goto err_i2c;
 		}
 
 		error = copyout(&i2c, ifr->ifr_data, sizeof(i2c));
 err_i2c:
 		PRIV_UNLOCK(priv);
 		break;
 
 	default:
 		error = ether_ioctl(ifp, command, data);
 		break;
 	}
 	return (error);
 }
 
 static int
 mlx5e_check_required_hca_cap(struct mlx5_core_dev *mdev)
 {
 	/*
 	 * TODO: uncomment once FW really sets all these bits if
 	 * (!mdev->caps.eth.rss_ind_tbl_cap || !mdev->caps.eth.csum_cap ||
 	 * !mdev->caps.eth.max_lso_cap || !mdev->caps.eth.vlan_cap ||
 	 * !(mdev->caps.gen.flags & MLX5_DEV_CAP_FLAG_SCQE_BRK_MOD)) return
 	 * -ENOTSUPP;
 	 */
 
 	/* TODO: add more must-have features */
 
 	return (0);
 }
 
 static void
 mlx5e_build_ifp_priv(struct mlx5_core_dev *mdev,
     struct mlx5e_priv *priv,
     int num_comp_vectors)
 {
 	/*
 	 * TODO: Consider link speed for setting "log_sq_size",
 	 * "log_rq_size" and "cq_moderation_xxx":
 	 */
 	priv->params.log_sq_size =
 	    MLX5E_PARAMS_DEFAULT_LOG_SQ_SIZE;
 	priv->params.log_rq_size =
 	    MLX5E_PARAMS_DEFAULT_LOG_RQ_SIZE;
 	priv->params.rx_cq_moderation_usec =
 	    MLX5_CAP_GEN(mdev, cq_period_start_from_cqe) ?
 	    MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC_FROM_CQE :
 	    MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC;
 	priv->params.rx_cq_moderation_mode =
 	    MLX5_CAP_GEN(mdev, cq_period_start_from_cqe) ? 1 : 0;
 	priv->params.rx_cq_moderation_pkts =
 	    MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_PKTS;
 	priv->params.tx_cq_moderation_usec =
 	    MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC;
 	priv->params.tx_cq_moderation_pkts =
 	    MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_PKTS;
 	priv->params.min_rx_wqes =
 	    MLX5E_PARAMS_DEFAULT_MIN_RX_WQES;
 	priv->params.rx_hash_log_tbl_sz =
 	    (order_base_2(num_comp_vectors) >
 	    MLX5E_PARAMS_DEFAULT_RX_HASH_LOG_TBL_SZ) ?
 	    order_base_2(num_comp_vectors) :
 	    MLX5E_PARAMS_DEFAULT_RX_HASH_LOG_TBL_SZ;
 	priv->params.num_tc = 1;
 	priv->params.default_vlan_prio = 0;
 	priv->counter_set_id = -1;
 
 	/*
 	 * HW LRO currently defaults to off. Once that changes, consider
 	 * the HW capability: "!!MLX5_CAP_ETH(mdev, lro_cap)"
 	 */
 	priv->params.hw_lro_en = false;
 	priv->params.lro_wqe_sz = MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ;
 
 	priv->params.cqe_zipping_en = !!MLX5_CAP_GEN(mdev, cqe_compression);
 
 	priv->mdev = mdev;
 	priv->params.num_channels = num_comp_vectors;
 	priv->order_base_2_num_channels = order_base_2(num_comp_vectors);
 	priv->queue_mapping_channel_mask =
 	    roundup_pow_of_two(num_comp_vectors) - 1;
 	priv->num_tc = priv->params.num_tc;
 	priv->default_vlan_prio = priv->params.default_vlan_prio;
 
 	INIT_WORK(&priv->update_stats_work, mlx5e_update_stats_work);
 	INIT_WORK(&priv->update_carrier_work, mlx5e_update_carrier_work);
 	INIT_WORK(&priv->set_rx_mode_work, mlx5e_set_rx_mode_work);
 }
 
 static int
 mlx5e_create_mkey(struct mlx5e_priv *priv, u32 pdn,
     struct mlx5_core_mr *mr)
 {
 	struct ifnet *ifp = priv->ifp;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	struct mlx5_create_mkey_mbox_in *in;
 	int err;
 
 	in = mlx5_vzalloc(sizeof(*in));
 	if (in == NULL) {
 		if_printf(ifp, "%s: failed to allocate inbox\n", __func__);
 		return (-ENOMEM);
 	}
 	in->seg.flags = MLX5_PERM_LOCAL_WRITE |
 	    MLX5_PERM_LOCAL_READ |
 	    MLX5_ACCESS_MODE_PA;
 	in->seg.flags_pd = cpu_to_be32(pdn | MLX5_MKEY_LEN64);
 	in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8);
 
 	err = mlx5_core_create_mkey(mdev, mr, in, sizeof(*in), NULL, NULL,
 	    NULL);
 	if (err)
 		if_printf(ifp, "%s: mlx5_core_create_mkey failed, %d\n",
 		    __func__, err);
 
 	kvfree(in);
 
 	return (err);
 }
 
 static const char *mlx5e_vport_stats_desc[] = {
 	MLX5E_VPORT_STATS(MLX5E_STATS_DESC)
 };
 
 static const char *mlx5e_pport_stats_desc[] = {
 	MLX5E_PPORT_STATS(MLX5E_STATS_DESC)
 };
 
 static void
 mlx5e_priv_mtx_init(struct mlx5e_priv *priv)
 {
 	mtx_init(&priv->async_events_mtx, "mlx5async", MTX_NETWORK_LOCK, MTX_DEF);
 	sx_init(&priv->state_lock, "mlx5state");
 	callout_init_mtx(&priv->watchdog, &priv->async_events_mtx, 0);
 }
 
 static void
 mlx5e_priv_mtx_destroy(struct mlx5e_priv *priv)
 {
 	mtx_destroy(&priv->async_events_mtx);
 	sx_destroy(&priv->state_lock);
 }
 
 static int
 sysctl_firmware(SYSCTL_HANDLER_ARGS)
 {
 	/*
 	 * "%d.%d.%d" is the string format.
 	 * fw_rev_{maj,min,sub} return u16, 2^16 = 65536.
 	 * We need at most 5 chars to store that.
 	 * It also has: two "." and NUL at the end, which means we need 18
 	 * (5*3 + 3) chars at most.
 	 */
 	char fw[18];
 	struct mlx5e_priv *priv = arg1;
 	int error;
 
 	snprintf(fw, sizeof(fw), "%d.%d.%d", fw_rev_maj(priv->mdev), fw_rev_min(priv->mdev),
 	    fw_rev_sub(priv->mdev));
 	error = sysctl_handle_string(oidp, fw, sizeof(fw), req);
 	return (error);
 }
 
 static void
 mlx5e_add_hw_stats(struct mlx5e_priv *priv)
 {
 	SYSCTL_ADD_PROC(&priv->sysctl_ctx, SYSCTL_CHILDREN(priv->sysctl_hw),
 	    OID_AUTO, "fw_version", CTLTYPE_STRING | CTLFLAG_RD, priv, 0,
 	    sysctl_firmware, "A", "HCA firmware version");
 
 	SYSCTL_ADD_STRING(&priv->sysctl_ctx, SYSCTL_CHILDREN(priv->sysctl_hw),
 	    OID_AUTO, "board_id", CTLFLAG_RD, priv->mdev->board_id, 0,
 	    "Board ID");
 }
 
 static void
 mlx5e_setup_pauseframes(struct mlx5e_priv *priv)
 {
 #if (__FreeBSD_version < 1100000)
 	char path[64];
 
 #endif
 	/* Only receiving pauseframes is enabled by default */
 	priv->params.tx_pauseframe_control = 0;
 	priv->params.rx_pauseframe_control = 1;
 
 #if (__FreeBSD_version < 1100000)
 	/* compute path for sysctl */
 	snprintf(path, sizeof(path), "dev.mce.%d.tx_pauseframe_control",
 	    device_get_unit(priv->mdev->pdev->dev.bsddev));
 
 	/* try to fetch tunable, if any */
 	TUNABLE_INT_FETCH(path, &priv->params.tx_pauseframe_control);
 
 	/* compute path for sysctl */
 	snprintf(path, sizeof(path), "dev.mce.%d.rx_pauseframe_control",
 	    device_get_unit(priv->mdev->pdev->dev.bsddev));
 
 	/* try to fetch tunable, if any */
 	TUNABLE_INT_FETCH(path, &priv->params.rx_pauseframe_control);
 #endif
 
 	/* register pauseframe SYSCTLs */
 	SYSCTL_ADD_INT(&priv->sysctl_ctx, SYSCTL_CHILDREN(priv->sysctl_ifnet),
 	    OID_AUTO, "tx_pauseframe_control", CTLFLAG_RDTUN,
 	    &priv->params.tx_pauseframe_control, 0,
 	    "Set to enable TX pause frames. Clear to disable.");
 
 	SYSCTL_ADD_INT(&priv->sysctl_ctx, SYSCTL_CHILDREN(priv->sysctl_ifnet),
 	    OID_AUTO, "rx_pauseframe_control", CTLFLAG_RDTUN,
 	    &priv->params.rx_pauseframe_control, 0,
 	    "Set to enable RX pause frames. Clear to disable.");
 
 	/* range check */
 	priv->params.tx_pauseframe_control =
 	    priv->params.tx_pauseframe_control ? 1 : 0;
 	priv->params.rx_pauseframe_control =
 	    priv->params.rx_pauseframe_control ? 1 : 0;
 
 	/* update firmware */
 	mlx5_set_port_pause(priv->mdev, 1,
 	    priv->params.rx_pauseframe_control,
 	    priv->params.tx_pauseframe_control);
 }
 
 static void *
 mlx5e_create_ifp(struct mlx5_core_dev *mdev)
 {
 	static volatile int mlx5_en_unit;
 	struct ifnet *ifp;
 	struct mlx5e_priv *priv;
 	u8 dev_addr[ETHER_ADDR_LEN] __aligned(4);
 	struct sysctl_oid_list *child;
 	int ncv = mdev->priv.eq_table.num_comp_vectors;
 	char unit[16];
 	int err;
 	int i;
 	u32 eth_proto_cap;
 
 	if (mlx5e_check_required_hca_cap(mdev)) {
 		mlx5_core_dbg(mdev, "mlx5e_check_required_hca_cap() failed\n");
 		return (NULL);
 	}
 	priv = malloc(sizeof(*priv), M_MLX5EN, M_WAITOK | M_ZERO);
 	if (priv == NULL) {
 		mlx5_core_err(mdev, "malloc() failed\n");
 		return (NULL);
 	}
 	mlx5e_priv_mtx_init(priv);
 
 	ifp = priv->ifp = if_alloc(IFT_ETHER);
 	if (ifp == NULL) {
 		mlx5_core_err(mdev, "if_alloc() failed\n");
 		goto err_free_priv;
 	}
 	ifp->if_softc = priv;
 	if_initname(ifp, "mce", atomic_fetchadd_int(&mlx5_en_unit, 1));
 	ifp->if_mtu = ETHERMTU;
 	ifp->if_init = mlx5e_open;
 	ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
 	ifp->if_ioctl = mlx5e_ioctl;
 	ifp->if_transmit = mlx5e_xmit;
 	ifp->if_qflush = if_qflush;
 #if (__FreeBSD_version >= 1100000)
 	ifp->if_get_counter = mlx5e_get_counter;
 #endif
 	ifp->if_snd.ifq_maxlen = ifqmaxlen;
 	/*
 	 * Set driver features
 	 */
 	ifp->if_capabilities |= IFCAP_HWCSUM | IFCAP_HWCSUM_IPV6;
 	ifp->if_capabilities |= IFCAP_VLAN_MTU | IFCAP_VLAN_HWTAGGING;
 	ifp->if_capabilities |= IFCAP_VLAN_HWCSUM | IFCAP_VLAN_HWFILTER;
 	ifp->if_capabilities |= IFCAP_LINKSTATE | IFCAP_JUMBO_MTU;
 	ifp->if_capabilities |= IFCAP_LRO;
 	ifp->if_capabilities |= IFCAP_TSO | IFCAP_VLAN_HWTSO;
 
 	/* set TSO limits so that we don't have to drop TX packets */
 	ifp->if_hw_tsomax = MLX5E_MAX_TX_PAYLOAD_SIZE - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);
 	ifp->if_hw_tsomaxsegcount = MLX5E_MAX_TX_MBUF_FRAGS - 1 /* hdr */;
 	ifp->if_hw_tsomaxsegsize = MLX5E_MAX_TX_MBUF_SIZE;
 
 	ifp->if_capenable = ifp->if_capabilities;
 	ifp->if_hwassist = 0;
 	if (ifp->if_capenable & IFCAP_TSO)
 		ifp->if_hwassist |= CSUM_TSO;
 	if (ifp->if_capenable & IFCAP_TXCSUM)
 		ifp->if_hwassist |= (CSUM_TCP | CSUM_UDP | CSUM_IP);
 	if (ifp->if_capenable & IFCAP_TXCSUM_IPV6)
 		ifp->if_hwassist |= (CSUM_UDP_IPV6 | CSUM_TCP_IPV6);
 
 	/* ifnet sysctl tree */
 	sysctl_ctx_init(&priv->sysctl_ctx);
 	priv->sysctl_ifnet = SYSCTL_ADD_NODE(&priv->sysctl_ctx, SYSCTL_STATIC_CHILDREN(_dev),
 	    OID_AUTO, ifp->if_dname, CTLFLAG_RD, 0, "MLX5 ethernet - interface name");
 	if (priv->sysctl_ifnet == NULL) {
 		mlx5_core_err(mdev, "SYSCTL_ADD_NODE() failed\n");
 		goto err_free_sysctl;
 	}
 	snprintf(unit, sizeof(unit), "%d", ifp->if_dunit);
 	priv->sysctl_ifnet = SYSCTL_ADD_NODE(&priv->sysctl_ctx, SYSCTL_CHILDREN(priv->sysctl_ifnet),
 	    OID_AUTO, unit, CTLFLAG_RD, 0, "MLX5 ethernet - interface unit");
 	if (priv->sysctl_ifnet == NULL) {
 		mlx5_core_err(mdev, "SYSCTL_ADD_NODE() failed\n");
 		goto err_free_sysctl;
 	}
 
 	/* HW sysctl tree */
 	child = SYSCTL_CHILDREN(device_get_sysctl_tree(mdev->pdev->dev.bsddev));
 	priv->sysctl_hw = SYSCTL_ADD_NODE(&priv->sysctl_ctx, child,
 	    OID_AUTO, "hw", CTLFLAG_RD, 0, "MLX5 ethernet dev hw");
 	if (priv->sysctl_hw == NULL) {
 		mlx5_core_err(mdev, "SYSCTL_ADD_NODE() failed\n");
 		goto err_free_sysctl;
 	}
 	mlx5e_build_ifp_priv(mdev, priv, ncv);
 	err = mlx5_alloc_map_uar(mdev, &priv->cq_uar);
 	if (err) {
 		if_printf(ifp, "%s: mlx5_alloc_map_uar failed, %d\n",
 		    __func__, err);
 		goto err_free_sysctl;
 	}
 	err = mlx5_core_alloc_pd(mdev, &priv->pdn);
 	if (err) {
 		if_printf(ifp, "%s: mlx5_core_alloc_pd failed, %d\n",
 		    __func__, err);
 		goto err_unmap_free_uar;
 	}
 	err = mlx5_alloc_transport_domain(mdev, &priv->tdn);
 	if (err) {
 		if_printf(ifp, "%s: mlx5_alloc_transport_domain failed, %d\n",
 		    __func__, err);
 		goto err_dealloc_pd;
 	}
 	err = mlx5e_create_mkey(priv, priv->pdn, &priv->mr);
 	if (err) {
 		if_printf(ifp, "%s: mlx5e_create_mkey failed, %d\n",
 		    __func__, err);
 		goto err_dealloc_transport_domain;
 	}
 	mlx5_query_nic_vport_mac_address(priv->mdev, 0, dev_addr);
 
+	/* check if we should generate a random MAC address */
+	if (MLX5_CAP_GEN(priv->mdev, vport_group_manager) == 0 &&
+	    is_zero_ether_addr(dev_addr)) {
+		random_ether_addr(dev_addr);
+		if_printf(ifp, "Assigned random MAC address\n");
+	}
+
 	/* set default MTU */
 	mlx5e_set_dev_port_mtu(ifp, ifp->if_mtu);
 
 	/* Set desc */
 	device_set_desc(mdev->pdev->dev.bsddev, mlx5e_version);
 
 	/* Set default media status */
 	priv->media_status_last = IFM_AVALID;
 	priv->media_active_last = IFM_ETHER | IFM_AUTO |
 	    IFM_ETH_RXPAUSE | IFM_FDX;
 
 	/* setup default pauseframes configuration */
 	mlx5e_setup_pauseframes(priv);
 
 	err = mlx5_query_port_proto_cap(mdev, &eth_proto_cap, MLX5_PTYS_EN);
 	if (err) {
 		eth_proto_cap = 0;
 		if_printf(ifp, "%s: Query port media capability failed, %d\n",
 		    __func__, err);
 	}
 
 	/* Setup supported medias */
 	ifmedia_init(&priv->media, IFM_IMASK | IFM_ETH_FMASK,
 	    mlx5e_media_change, mlx5e_media_status);
 
 	for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i) {
 		if (mlx5e_mode_table[i].baudrate == 0)
 			continue;
 		if (MLX5E_PROT_MASK(i) & eth_proto_cap) {
 			ifmedia_add(&priv->media,
 			    mlx5e_mode_table[i].subtype |
 			    IFM_ETHER, 0, NULL);
 			ifmedia_add(&priv->media,
 			    mlx5e_mode_table[i].subtype |
 			    IFM_ETHER | IFM_FDX |
 			    IFM_ETH_RXPAUSE | IFM_ETH_TXPAUSE, 0, NULL);
 		}
 	}
 
 	ifmedia_add(&priv->media, IFM_ETHER | IFM_AUTO, 0, NULL);
 	ifmedia_add(&priv->media, IFM_ETHER | IFM_AUTO | IFM_FDX |
 	    IFM_ETH_RXPAUSE | IFM_ETH_TXPAUSE, 0, NULL);
 
 	/* Set autoselect by default */
 	ifmedia_set(&priv->media, IFM_ETHER | IFM_AUTO | IFM_FDX |
 	    IFM_ETH_RXPAUSE | IFM_ETH_TXPAUSE);
 	ether_ifattach(ifp, dev_addr);
 
 	/* Register for VLAN events */
 	priv->vlan_attach = EVENTHANDLER_REGISTER(vlan_config,
 	    mlx5e_vlan_rx_add_vid, priv, EVENTHANDLER_PRI_FIRST);
 	priv->vlan_detach = EVENTHANDLER_REGISTER(vlan_unconfig,
 	    mlx5e_vlan_rx_kill_vid, priv, EVENTHANDLER_PRI_FIRST);
 
 	/* Link is down by default */
 	if_link_state_change(ifp, LINK_STATE_DOWN);
 
 	mlx5e_enable_async_events(priv);
 
 	mlx5e_add_hw_stats(priv);
 
 	mlx5e_create_stats(&priv->stats.vport.ctx, SYSCTL_CHILDREN(priv->sysctl_ifnet),
 	    "vstats", mlx5e_vport_stats_desc, MLX5E_VPORT_STATS_NUM,
 	    priv->stats.vport.arg);
 
 	mlx5e_create_stats(&priv->stats.pport.ctx, SYSCTL_CHILDREN(priv->sysctl_ifnet),
 	    "pstats", mlx5e_pport_stats_desc, MLX5E_PPORT_STATS_NUM,
 	    priv->stats.pport.arg);
 
 	mlx5e_create_ethtool(priv);
 
 	mtx_lock(&priv->async_events_mtx);
 	mlx5e_update_stats(priv);
 	mtx_unlock(&priv->async_events_mtx);
 
 	return (priv);
 
 err_dealloc_transport_domain:
 	mlx5_dealloc_transport_domain(mdev, priv->tdn);
 
 err_dealloc_pd:
 	mlx5_core_dealloc_pd(mdev, priv->pdn);
 
 err_unmap_free_uar:
 	mlx5_unmap_free_uar(mdev, &priv->cq_uar);
 
 err_free_sysctl:
 	sysctl_ctx_free(&priv->sysctl_ctx);
 
 	if_free(ifp);
 
 err_free_priv:
 	mlx5e_priv_mtx_destroy(priv);
 	free(priv, M_MLX5EN);
 	return (NULL);
 }
 
 static void
 mlx5e_destroy_ifp(struct mlx5_core_dev *mdev, void *vpriv)
 {
 	struct mlx5e_priv *priv = vpriv;
 	struct ifnet *ifp = priv->ifp;
 
 	/* don't allow more IOCTLs */
 	priv->gone = 1;
 
 	/* XXX wait a bit to allow IOCTL handlers to complete */
 	pause("W", hz);
 
 	/* stop watchdog timer */
 	callout_drain(&priv->watchdog);
 
 	if (priv->vlan_attach != NULL)
 		EVENTHANDLER_DEREGISTER(vlan_config, priv->vlan_attach);
 	if (priv->vlan_detach != NULL)
 		EVENTHANDLER_DEREGISTER(vlan_unconfig, priv->vlan_detach);
 
 	/* make sure device gets closed */
 	PRIV_LOCK(priv);
 	mlx5e_close_locked(ifp);
 	PRIV_UNLOCK(priv);
 
 	/* unregister device */
 	ifmedia_removeall(&priv->media);
 	ether_ifdetach(ifp);
 	if_free(ifp);
 
 	/* destroy all remaining sysctl nodes */
 	if (priv->sysctl_debug)
 		sysctl_ctx_free(&priv->stats.port_stats_debug.ctx);
 	sysctl_ctx_free(&priv->stats.vport.ctx);
 	sysctl_ctx_free(&priv->stats.pport.ctx);
 	sysctl_ctx_free(&priv->sysctl_ctx);
 
 	mlx5_core_destroy_mkey(priv->mdev, &priv->mr);
 	mlx5_dealloc_transport_domain(priv->mdev, priv->tdn);
 	mlx5_core_dealloc_pd(priv->mdev, priv->pdn);
 	mlx5_unmap_free_uar(priv->mdev, &priv->cq_uar);
 	mlx5e_disable_async_events(priv);
 	flush_scheduled_work();
 	mlx5e_priv_mtx_destroy(priv);
 	free(priv, M_MLX5EN);
 }
 
 static void *
 mlx5e_get_ifp(void *vpriv)
 {
 	struct mlx5e_priv *priv = vpriv;
 
 	return (priv->ifp);
 }
 
 static struct mlx5_interface mlx5e_interface = {
 	.add = mlx5e_create_ifp,
 	.remove = mlx5e_destroy_ifp,
 	.event = mlx5e_async_event,
 	.protocol = MLX5_INTERFACE_PROTOCOL_ETH,
 	.get_dev = mlx5e_get_ifp,
 };
 
 void
 mlx5e_init(void)
 {
 	mlx5_register_interface(&mlx5e_interface);
 }
 
 void
 mlx5e_cleanup(void)
 {
 	mlx5_unregister_interface(&mlx5e_interface);
 }
 
 module_init_order(mlx5e_init, SI_ORDER_THIRD);
 module_exit_order(mlx5e_cleanup, SI_ORDER_THIRD);
 
 #if (__FreeBSD_version >= 1100000)
 MODULE_DEPEND(mlx5en, linuxkpi, 1, 1, 1);
 #endif
 MODULE_DEPEND(mlx5en, mlx5, 1, 1, 1);
 MODULE_VERSION(mlx5en, 1);
Index: projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_rx.c
===================================================================
--- projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_rx.c	(revision 301546)
+++ projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_rx.c	(revision 301547)
@@ -1,444 +1,444 @@
 /*-
  * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS `AS IS' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 
 #include "en.h"
 #include <machine/in_cksum.h>
 
 static inline int
 mlx5e_alloc_rx_wqe(struct mlx5e_rq *rq,
     struct mlx5e_rx_wqe *wqe, u16 ix)
 {
 	bus_dma_segment_t segs[1];
 	struct mbuf *mb;
 	int nsegs;
 	int err;
 
 	if (rq->mbuf[ix].mbuf != NULL)
 		return (0);
 
 	mb = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, rq->wqe_sz);
 	if (unlikely(!mb))
 		return (-ENOMEM);
 
 	/* set initial mbuf length */
 	mb->m_pkthdr.len = mb->m_len = rq->wqe_sz;
 
 	/* get IP header aligned */
 	m_adj(mb, MLX5E_NET_IP_ALIGN);
 
 	err = -bus_dmamap_load_mbuf_sg(rq->dma_tag, rq->mbuf[ix].dma_map,
 	    mb, segs, &nsegs, BUS_DMA_NOWAIT);
 	if (err != 0)
 		goto err_free_mbuf;
 	if (unlikely(nsegs != 1)) {
 		bus_dmamap_unload(rq->dma_tag, rq->mbuf[ix].dma_map);
 		err = -ENOMEM;
 		goto err_free_mbuf;
 	}
 	wqe->data.addr = cpu_to_be64(segs[0].ds_addr);
 
 	rq->mbuf[ix].mbuf = mb;
 	rq->mbuf[ix].data = mb->m_data;
 
 	bus_dmamap_sync(rq->dma_tag, rq->mbuf[ix].dma_map,
 	    BUS_DMASYNC_PREREAD);
 	return (0);
 
 err_free_mbuf:
 	m_freem(mb);
 	return (err);
 }
 
 static void
 mlx5e_post_rx_wqes(struct mlx5e_rq *rq)
 {
 	if (unlikely(rq->enabled == 0))
 		return;
 
 	while (!mlx5_wq_ll_is_full(&rq->wq)) {
 		struct mlx5e_rx_wqe *wqe = mlx5_wq_ll_get_wqe(&rq->wq, rq->wq.head);
 
 		if (unlikely(mlx5e_alloc_rx_wqe(rq, wqe, rq->wq.head)))
 			break;
 
 		mlx5_wq_ll_push(&rq->wq, be16_to_cpu(wqe->next.next_wqe_index));
 	}
 
 	/* ensure wqes are visible to device before updating doorbell record */
 	wmb();
 
 	mlx5_wq_ll_update_db_record(&rq->wq);
 }
 
 static void
 mlx5e_lro_update_hdr(struct mbuf *mb, struct mlx5_cqe64 *cqe)
 {
 	/* TODO: consider vlans, ip options, ... */
 	struct ether_header *eh;
 	uint16_t eh_type;
 	uint16_t tot_len;
 	struct ip6_hdr *ip6 = NULL;
 	struct ip *ip4 = NULL;
 	struct tcphdr *th;
 	uint32_t *ts_ptr;
 	uint8_t l4_hdr_type;
 	int tcp_ack;
 
 	eh = mtod(mb, struct ether_header *);
 	eh_type = ntohs(eh->ether_type);
 
 	l4_hdr_type = get_cqe_l4_hdr_type(cqe);
 	tcp_ack = ((CQE_L4_HDR_TYPE_TCP_ACK_NO_DATA == l4_hdr_type) ||
 	    (CQE_L4_HDR_TYPE_TCP_ACK_AND_DATA == l4_hdr_type));
 
 	/* TODO: consider vlan */
 	tot_len = be32_to_cpu(cqe->byte_cnt) - ETHER_HDR_LEN;
 
 	switch (eh_type) {
 	case ETHERTYPE_IP:
 		ip4 = (struct ip *)(eh + 1);
 		th = (struct tcphdr *)(ip4 + 1);
 		break;
 	case ETHERTYPE_IPV6:
 		ip6 = (struct ip6_hdr *)(eh + 1);
 		th = (struct tcphdr *)(ip6 + 1);
 		break;
 	default:
 		return;
 	}
 
 	ts_ptr = (uint32_t *)(th + 1);
 
 	if (get_cqe_lro_tcppsh(cqe))
 		th->th_flags |= TH_PUSH;
 
 	if (tcp_ack) {
 		th->th_flags |= TH_ACK;
 		th->th_ack = cqe->lro_ack_seq_num;
 		th->th_win = cqe->lro_tcp_win;
 
 		/*
 		 * FreeBSD handles only 32bit aligned timestamp right after
 		 * the TCP hdr
 		 * +--------+--------+--------+--------+
 		 * |   NOP  |  NOP   |  TSopt |   10   |
 		 * +--------+--------+--------+--------+
 		 * |          TSval   timestamp        |
 		 * +--------+--------+--------+--------+
 		 * |          TSecr   timestamp        |
 		 * +--------+--------+--------+--------+
 		 */
 		if (get_cqe_lro_timestamp_valid(cqe) &&
 		    (__predict_true(*ts_ptr) == ntohl(TCPOPT_NOP << 24 |
 		    TCPOPT_NOP << 16 | TCPOPT_TIMESTAMP << 8 |
 		    TCPOLEN_TIMESTAMP))) {
 			/*
 			 * cqe->timestamp is 64bit long.
 			 * [0-31] - timestamp.
 			 * [32-63] - timestamp echo reply.
 			 */
 			ts_ptr[1] = *(uint32_t *)&cqe->timestamp;
 			ts_ptr[2] = *((uint32_t *)&cqe->timestamp + 1);
 		}
 	}
 	if (ip4) {
 		ip4->ip_ttl = cqe->lro_min_ttl;
 		ip4->ip_len = cpu_to_be16(tot_len);
 		ip4->ip_sum = 0;
 		ip4->ip_sum = in_cksum(mb, ip4->ip_hl << 2);
 	} else {
 		ip6->ip6_hlim = cqe->lro_min_ttl;
 		ip6->ip6_plen = cpu_to_be16(tot_len -
 		    sizeof(struct ip6_hdr));
 	}
 	/* TODO: handle tcp checksum */
 }
 
 static inline void
 mlx5e_build_rx_mbuf(struct mlx5_cqe64 *cqe,
     struct mlx5e_rq *rq, struct mbuf *mb,
     u32 cqe_bcnt)
 {
 	struct ifnet *ifp = rq->ifp;
 	int lro_num_seg;	/* HW LRO session aggregated packets counter */
 
 	lro_num_seg = be32_to_cpu(cqe->srqn) >> 24;
 	if (lro_num_seg > 1) {
 		mlx5e_lro_update_hdr(mb, cqe);
 		rq->stats.lro_packets++;
 		rq->stats.lro_bytes += cqe_bcnt;
 	}
 
 	mb->m_pkthdr.len = mb->m_len = cqe_bcnt;
 	/* check if a Toeplitz hash was computed */
 	if (cqe->rss_hash_type != 0) {
 		mb->m_pkthdr.flowid = be32_to_cpu(cqe->rss_hash_result);
 #ifdef RSS
 		/* decode the RSS hash type */
 		switch (cqe->rss_hash_type &
 		    (CQE_RSS_DST_HTYPE_L4 | CQE_RSS_DST_HTYPE_IP)) {
 		/* IPv4 */
 		case (CQE_RSS_DST_HTYPE_TCP | CQE_RSS_DST_HTYPE_IPV4):
 			M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_TCP_IPV4);
 			break;
 		case (CQE_RSS_DST_HTYPE_UDP | CQE_RSS_DST_HTYPE_IPV4):
 			M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_UDP_IPV4);
 			break;
 		case CQE_RSS_DST_HTYPE_IPV4:
 			M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_IPV4);
 			break;
 		/* IPv6 */
 		case (CQE_RSS_DST_HTYPE_TCP | CQE_RSS_DST_HTYPE_IPV6):
 			M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_TCP_IPV6);
 			break;
 		case (CQE_RSS_DST_HTYPE_UDP | CQE_RSS_DST_HTYPE_IPV6):
 			M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_UDP_IPV6);
 			break;
 		case CQE_RSS_DST_HTYPE_IPV6:
 			M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_IPV6);
 			break;
 		default:	/* Other */
-			M_HASHTYPE_SET(mb, M_HASHTYPE_OPAQUE);
+			M_HASHTYPE_SET(mb, M_HASHTYPE_OPAQUE_HASH);
 			break;
 		}
 #else
-		M_HASHTYPE_SET(mb, M_HASHTYPE_OPAQUE);
+		M_HASHTYPE_SET(mb, M_HASHTYPE_OPAQUE_HASH);
 #endif
 	} else {
 		mb->m_pkthdr.flowid = rq->ix;
 		M_HASHTYPE_SET(mb, M_HASHTYPE_OPAQUE);
 	}
 	mb->m_pkthdr.rcvif = ifp;
 
 	if (likely(ifp->if_capenable & (IFCAP_RXCSUM | IFCAP_RXCSUM_IPV6)) &&
 	    ((cqe->hds_ip_ext & (CQE_L2_OK | CQE_L3_OK | CQE_L4_OK)) ==
 	    (CQE_L2_OK | CQE_L3_OK | CQE_L4_OK))) {
 		mb->m_pkthdr.csum_flags =
 		    CSUM_IP_CHECKED | CSUM_IP_VALID |
 		    CSUM_DATA_VALID | CSUM_PSEUDO_HDR;
 		mb->m_pkthdr.csum_data = htons(0xffff);
 	} else {
 		rq->stats.csum_none++;
 	}
 
 	if (cqe_has_vlan(cqe)) {
 		mb->m_pkthdr.ether_vtag = be16_to_cpu(cqe->vlan_info);
 		mb->m_flags |= M_VLANTAG;
 	}
 }
 
 static inline void
 mlx5e_read_cqe_slot(struct mlx5e_cq *cq, u32 cc, void *data)
 {
 	memcpy(data, mlx5_cqwq_get_wqe(&cq->wq, (cc & cq->wq.sz_m1)),
 	    sizeof(struct mlx5_cqe64));
 }
 
 static inline void
 mlx5e_write_cqe_slot(struct mlx5e_cq *cq, u32 cc, void *data)
 {
 	memcpy(mlx5_cqwq_get_wqe(&cq->wq, cc & cq->wq.sz_m1),
 	    data, sizeof(struct mlx5_cqe64));
 }
 
 static inline void
 mlx5e_decompress_cqe(struct mlx5e_cq *cq, struct mlx5_cqe64 *title,
     struct mlx5_mini_cqe8 *mini,
     u16 wqe_counter, int i)
 {
 	/*
 	 * NOTE: The fields which are not set here are copied from the
 	 * initial and common title. See memcpy() in
 	 * mlx5e_write_cqe_slot().
 	 */
 	title->byte_cnt = mini->byte_cnt;
 	title->wqe_counter = cpu_to_be16((wqe_counter + i) & cq->wq.sz_m1);
 	title->check_sum = mini->checksum;
 	title->op_own = (title->op_own & 0xf0) |
 	    (((cq->wq.cc + i) >> cq->wq.log_sz) & 1);
 }
 
 #define MLX5E_MINI_ARRAY_SZ 8
 /* Make sure structs are not packed differently */
 CTASSERT(sizeof(struct mlx5_cqe64) ==
     sizeof(struct mlx5_mini_cqe8) * MLX5E_MINI_ARRAY_SZ);
 static void
 mlx5e_decompress_cqes(struct mlx5e_cq *cq)
 {
 	struct mlx5_mini_cqe8 mini_array[MLX5E_MINI_ARRAY_SZ];
 	struct mlx5_cqe64 title;
 	u32 cqe_count;
 	u32 i = 0;
 	u16 title_wqe_counter;
 
 	mlx5e_read_cqe_slot(cq, cq->wq.cc, &title);
 	title_wqe_counter = be16_to_cpu(title.wqe_counter);
 	cqe_count = be32_to_cpu(title.byte_cnt);
 
 	/* Make sure we won't overflow */
 	KASSERT(cqe_count <= cq->wq.sz_m1,
 	    ("%s: cqe_count %u > cq->wq.sz_m1 %u", __func__,
 	    cqe_count, cq->wq.sz_m1));
 
 	mlx5e_read_cqe_slot(cq, cq->wq.cc + 1, mini_array);
 	while (true) {
 		mlx5e_decompress_cqe(cq, &title,
 		    &mini_array[i % MLX5E_MINI_ARRAY_SZ],
 		    title_wqe_counter, i);
 		mlx5e_write_cqe_slot(cq, cq->wq.cc + i, &title);
 		i++;
 
 		if (i == cqe_count)
 			break;
 		if (i % MLX5E_MINI_ARRAY_SZ == 0)
 			mlx5e_read_cqe_slot(cq, cq->wq.cc + i, mini_array);
 	}
 }
 
 static int
 mlx5e_poll_rx_cq(struct mlx5e_rq *rq, int budget)
 {
 	int i;
 
 	for (i = 0; i < budget; i++) {
 		struct mlx5e_rx_wqe *wqe;
 		struct mlx5_cqe64 *cqe;
 		struct mbuf *mb;
 		__be16 wqe_counter_be;
 		u16 wqe_counter;
 		u32 byte_cnt;
 
 		cqe = mlx5e_get_cqe(&rq->cq);
 		if (!cqe)
 			break;
 
 		if (mlx5_get_cqe_format(cqe) == MLX5_COMPRESSED)
 			mlx5e_decompress_cqes(&rq->cq);
 
 		mlx5_cqwq_pop(&rq->cq.wq);
 
 		wqe_counter_be = cqe->wqe_counter;
 		wqe_counter = be16_to_cpu(wqe_counter_be);
 		wqe = mlx5_wq_ll_get_wqe(&rq->wq, wqe_counter);
 		byte_cnt = be32_to_cpu(cqe->byte_cnt);
 
 		bus_dmamap_sync(rq->dma_tag,
 		    rq->mbuf[wqe_counter].dma_map,
 		    BUS_DMASYNC_POSTREAD);
 
 		if (unlikely((cqe->op_own >> 4) != MLX5_CQE_RESP_SEND)) {
 			rq->stats.wqe_err++;
 			goto wq_ll_pop;
 		}
 
 		if (MHLEN >= byte_cnt &&
 		    (mb = m_gethdr(M_NOWAIT, MT_DATA)) != NULL) {
 			bcopy(rq->mbuf[wqe_counter].data, mtod(mb, caddr_t),
 			    byte_cnt);
 		} else {
 			mb = rq->mbuf[wqe_counter].mbuf;
 			rq->mbuf[wqe_counter].mbuf = NULL;	/* safety clear */
 
 			bus_dmamap_unload(rq->dma_tag,
 			    rq->mbuf[wqe_counter].dma_map);
 		}
 
 		mlx5e_build_rx_mbuf(cqe, rq, mb, byte_cnt);
 		rq->stats.packets++;
 #ifdef HAVE_TURBO_LRO
 		if (mb->m_pkthdr.csum_flags == 0 ||
 		    (rq->ifp->if_capenable & IFCAP_LRO) == 0 ||
 		    rq->lro.mbuf == NULL) {
 			/* normal input */
 			rq->ifp->if_input(rq->ifp, mb);
 		} else {
 			tcp_tlro_rx(&rq->lro, mb);
 		}
 #else
 		if (mb->m_pkthdr.csum_flags == 0 ||
 		    (rq->ifp->if_capenable & IFCAP_LRO) == 0 ||
 		    rq->lro.lro_cnt == 0 ||
 		    tcp_lro_rx(&rq->lro, mb, 0) != 0) {
 			rq->ifp->if_input(rq->ifp, mb);
 		}
 #endif
 wq_ll_pop:
 		mlx5_wq_ll_pop(&rq->wq, wqe_counter_be,
 		    &wqe->next.next_wqe_index);
 	}
 
 	mlx5_cqwq_update_db_record(&rq->cq.wq);
 
 	/* ensure cq space is freed before enabling more cqes */
 	wmb();
 #ifndef HAVE_TURBO_LRO
 	tcp_lro_flush_all(&rq->lro);
 #endif
 	return (i);
 }
 
 void
 mlx5e_rx_cq_comp(struct mlx5_core_cq *mcq)
 {
 	struct mlx5e_rq *rq = container_of(mcq, struct mlx5e_rq, cq.mcq);
 	int i = 0;
 
 #ifdef HAVE_PER_CQ_EVENT_PACKET
 	struct mbuf *mb = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, rq->wqe_sz);
 
 	if (mb != NULL) {
 		/* this code is used for debugging purpose only */
 		mb->m_pkthdr.len = mb->m_len = 15;
 		memset(mb->m_data, 255, 14);
 		mb->m_data[14] = rq->ix;
 		mb->m_pkthdr.rcvif = rq->ifp;
 		rq->ifp->if_input(rq->ifp, mb);
 	}
 #endif
 
 	mtx_lock(&rq->mtx);
 
 	/*
 	 * Polling the entire CQ without posting new WQEs results in
 	 * lack of receive WQEs during heavy traffic scenarios.
 	 */
 	while (1) {
 		if (mlx5e_poll_rx_cq(rq, MLX5E_RX_BUDGET_MAX) !=
 		    MLX5E_RX_BUDGET_MAX)
 			break;
 		i += MLX5E_RX_BUDGET_MAX;
 		if (i >= MLX5E_BUDGET_MAX)
 			break;
 		mlx5e_post_rx_wqes(rq);
 	}
 	mlx5e_post_rx_wqes(rq);
 	mlx5e_cq_arm(&rq->cq);
 #ifdef HAVE_TURBO_LRO
 	tcp_tlro_flush(&rq->lro, 1);
 #endif
 	mtx_unlock(&rq->mtx);
 }
Index: projects/vnet/sys/dev/mlx5/vport.h
===================================================================
--- projects/vnet/sys/dev/mlx5/vport.h	(revision 301546)
+++ projects/vnet/sys/dev/mlx5/vport.h	(revision 301547)
@@ -1,81 +1,109 @@
 /*-
  * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS `AS IS' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 
 #ifndef __MLX5_VPORT_H__
 #define __MLX5_VPORT_H__
 
 #include <dev/mlx5/driver.h>
 int mlx5_vport_alloc_q_counter(struct mlx5_core_dev *mdev,
 			       int *counter_set_id);
 int mlx5_vport_dealloc_q_counter(struct mlx5_core_dev *mdev,
 				 int counter_set_id);
 int mlx5_vport_query_out_of_rx_buffer(struct mlx5_core_dev *mdev,
 				      int counter_set_id,
 				      u32 *out_of_rx_buffer);
 
 u8 mlx5_query_vport_state(struct mlx5_core_dev *mdev, u8 opmod);
+int mlx5_arm_vport_context_events(struct mlx5_core_dev *mdev,
+				  u8 vport,
+				  u32 events_mask);
+int mlx5_query_vport_promisc(struct mlx5_core_dev *mdev,
+			     u32 vport,
+			     u8 *promisc_uc,
+			     u8 *promisc_mc,
+			     u8 *promisc_all);
+int mlx5_modify_nic_vport_promisc(struct mlx5_core_dev *mdev,
+				  int promisc_uc,
+				  int promisc_mc,
+				  int promisc_all);
 int mlx5_query_nic_vport_mac_address(struct mlx5_core_dev *mdev,
 				     u32 vport, u8 *addr);
 int mlx5_set_nic_vport_current_mac(struct mlx5_core_dev *mdev, int vport,
 				   bool other_vport, u8 *addr);
 int mlx5_set_nic_vport_vlan_list(struct mlx5_core_dev *dev, u32 vport,
 				 u16 *vlan_list, int list_len);
 int mlx5_set_nic_vport_mc_list(struct mlx5_core_dev *mdev, int vport,
 			       u64 *addr_list, size_t addr_list_len);
 int mlx5_set_nic_vport_promisc(struct mlx5_core_dev *mdev, int vport,
 			       bool promisc_mc, bool promisc_uc,
 			       bool promisc_all);
+int mlx5_query_nic_vport_mac_list(struct mlx5_core_dev *dev,
+				  u32 vport,
+				  enum mlx5_list_type list_type,
+				  u8 addr_list[][ETH_ALEN],
+				  int *list_size);
+int mlx5_modify_nic_vport_mac_list(struct mlx5_core_dev *dev,
+				   enum mlx5_list_type list_type,
+				   u8 addr_list[][ETH_ALEN],
+				   int list_size);
+int mlx5_query_nic_vport_vlan_list(struct mlx5_core_dev *dev,
+				   u32 vport,
+				   u16 *vlan_list,
+				   int *list_size);
+int mlx5_modify_nic_vport_vlans(struct mlx5_core_dev *dev,
+				u16 vlans[],
+				int list_size);
 int mlx5_set_nic_vport_permanent_mac(struct mlx5_core_dev *mdev, int vport,
 				     u8 *addr);
 int mlx5_nic_vport_enable_roce(struct mlx5_core_dev *mdev);
 int mlx5_nic_vport_disable_roce(struct mlx5_core_dev *mdev);
 int mlx5_query_nic_vport_system_image_guid(struct mlx5_core_dev *mdev,
 					   u64 *system_image_guid);
 int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid);
 int mlx5_query_nic_vport_port_guid(struct mlx5_core_dev *mdev, u64 *port_guid);
 int mlx5_query_nic_vport_qkey_viol_cntr(struct mlx5_core_dev *mdev,
 					u16 *qkey_viol_cntr);
 int mlx5_query_hca_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid);
 int mlx5_query_hca_vport_system_image_guid(struct mlx5_core_dev *mdev,
 					   u64 *system_image_guid);
 int mlx5_query_hca_vport_context(struct mlx5_core_dev *mdev,
 				 u8 port_num, u8 vport_num, u32 *out,
 				 int outlen);
 int mlx5_query_hca_vport_pkey(struct mlx5_core_dev *dev, u8 other_vport,
 			      u8 port_num, u16 vf_num, u16 pkey_index,
 			      u16 *pkey);
 int mlx5_query_hca_vport_gid(struct mlx5_core_dev *dev, u8 port_num,
 			     u16 vport_num, u16 gid_index, union ib_gid *gid);
 int mlx5_set_eswitch_cvlan_info(struct mlx5_core_dev *mdev, u8 vport,
 				u8 insert_mode, u8 strip_mode,
 				u16 vlan, u8 cfi, u8 pcp);
 int mlx5_query_vport_counter(struct mlx5_core_dev *dev,
 			     u8 port_num, u16 vport_num,
 			     void *out, int out_size);
 int mlx5_get_vport_counters(struct mlx5_core_dev *dev, u8 port_num,
 			    struct mlx5_vport_counters *vc);
 #endif /* __MLX5_VPORT_H__ */
Index: projects/vnet/sys/dev/qlxgbe/ql_isr.c
===================================================================
--- projects/vnet/sys/dev/qlxgbe/ql_isr.c	(revision 301546)
+++ projects/vnet/sys/dev/qlxgbe/ql_isr.c	(revision 301547)
@@ -1,924 +1,924 @@
 /*
  * Copyright (c) 2013-2016 Qlogic Corporation
  * All rights reserved.
  *
  *  Redistribution and use in source and binary forms, with or without
  *  modification, are permitted provided that the following conditions
  *  are met:
  *
  *  1. Redistributions of source code must retain the above copyright
  *     notice, this list of conditions and the following disclaimer.
  *  2. Redistributions in binary form must reproduce the above copyright
  *     notice, this list of conditions and the following disclaimer in the
  *     documentation and/or other materials provided with the distribution.
  *
  *  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  *  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  *  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  *  ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  *  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  *  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  *  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  *  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  *  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  *  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  *  POSSIBILITY OF SUCH DAMAGE.
  */
 
 /*
  * File: ql_isr.c
  * Author : David C Somayajulu, Qlogic Corporation, Aliso Viejo, CA 92656.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 
 #include "ql_os.h"
 #include "ql_hw.h"
 #include "ql_def.h"
 #include "ql_inline.h"
 #include "ql_ver.h"
 #include "ql_glbl.h"
 #include "ql_dbg.h"
 
 static void qla_replenish_normal_rx(qla_host_t *ha, qla_sds_t *sdsp,
 		uint32_t r_idx);
 
 static void
 qla_rcv_error(qla_host_t *ha)
 {
 	ha->flags.stop_rcv = 1;
 	ha->qla_initiate_recovery = 1;
 }
 
 
 /*
  * Name: qla_rx_intr
  * Function: Handles normal ethernet frames received
  */
 static void
 qla_rx_intr(qla_host_t *ha, qla_sgl_rcv_t *sgc, uint32_t sds_idx)
 {
 	qla_rx_buf_t		*rxb;
 	struct mbuf		*mp = NULL, *mpf = NULL, *mpl = NULL;
 	struct ifnet		*ifp = ha->ifp;
 	qla_sds_t		*sdsp;
 	struct ether_vlan_header *eh;
 	uint32_t		i, rem_len = 0;
 	uint32_t		r_idx = 0;
 	qla_rx_ring_t		*rx_ring;
 
 	if (ha->hw.num_rds_rings > 1)
 		r_idx = sds_idx;
 	
 	ha->hw.rds[r_idx].count++;
 
 	sdsp = &ha->hw.sds[sds_idx];
 	rx_ring = &ha->rx_ring[r_idx];
 	
 	for (i = 0; i < sgc->num_handles; i++) {
 		rxb = &rx_ring->rx_buf[sgc->handle[i] & 0x7FFF];
 
 		QL_ASSERT(ha, (rxb != NULL),
 			("%s: [sds_idx]=[%d] rxb != NULL\n", __func__,\
 			sds_idx));
 
 		if ((rxb == NULL) || QL_ERR_INJECT(ha, INJCT_RX_RXB_INVAL)) {
 			/* log the error */
 			device_printf(ha->pci_dev,
 				"%s invalid rxb[%d, %d, 0x%04x]\n",
 				__func__, sds_idx, i, sgc->handle[i]);
 			qla_rcv_error(ha);
 			return;
 		}
 
 		mp = rxb->m_head;
 		if (i == 0) 
 			mpf = mp;
 
 		QL_ASSERT(ha, (mp != NULL),
 			("%s: [sds_idx]=[%d] mp != NULL\n", __func__,\
 			sds_idx));
 
 		bus_dmamap_sync(ha->rx_tag, rxb->map, BUS_DMASYNC_POSTREAD);
 
 		rxb->m_head = NULL;
 		rxb->next = sdsp->rxb_free;
 		sdsp->rxb_free = rxb;
 		sdsp->rx_free++;
 	
 		if ((mp == NULL) || QL_ERR_INJECT(ha, INJCT_RX_MP_NULL)) {
 			/* log the error */
 			device_printf(ha->pci_dev,
 				"%s mp  == NULL [%d, %d, 0x%04x]\n",
 				__func__, sds_idx, i, sgc->handle[i]);
 			qla_rcv_error(ha);
 			return;
 		}
 
 		if (i == 0) {
 			mpl = mpf = mp;
 			mp->m_flags |= M_PKTHDR;
 			mp->m_pkthdr.len = sgc->pkt_length;
 			mp->m_pkthdr.rcvif = ifp;
 			rem_len = mp->m_pkthdr.len;
 		} else {
 			mp->m_flags &= ~M_PKTHDR;
 			mpl->m_next = mp;
 			mpl = mp;
 			rem_len = rem_len - mp->m_len;
 		}
 	}
 
 	mpl->m_len = rem_len;
 
 	eh = mtod(mpf, struct ether_vlan_header *);
 
 	if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
 		uint32_t *data = (uint32_t *)eh;
 
 		mpf->m_pkthdr.ether_vtag = ntohs(eh->evl_tag);
 		mpf->m_flags |= M_VLANTAG;
 
 		*(data + 3) = *(data + 2);
 		*(data + 2) = *(data + 1);
 		*(data + 1) = *data;
 
 		m_adj(mpf, ETHER_VLAN_ENCAP_LEN);
 	}
 
 	if (sgc->chksum_status == Q8_STAT_DESC_STATUS_CHKSUM_OK) {
 		mpf->m_pkthdr.csum_flags = CSUM_IP_CHECKED | CSUM_IP_VALID |
 			CSUM_DATA_VALID | CSUM_PSEUDO_HDR;
 		mpf->m_pkthdr.csum_data = 0xFFFF;
 	} else {
 		mpf->m_pkthdr.csum_flags = 0;
 	}
 
 	if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1);
 
 	mpf->m_pkthdr.flowid = sgc->rss_hash;
-	M_HASHTYPE_SET(mpf, M_HASHTYPE_OPAQUE);
+	M_HASHTYPE_SET(mpf, M_HASHTYPE_OPAQUE_HASH);
 
 	(*ifp->if_input)(ifp, mpf);
 
 	if (sdsp->rx_free > ha->std_replenish)
 		qla_replenish_normal_rx(ha, sdsp, r_idx);
 
 	return;
 }
 
 #define QLA_TCP_HDR_SIZE        20
 #define QLA_TCP_TS_OPTION_SIZE  12
 
 /*
  * Name: qla_lro_intr
  * Function: Handles LRO (large receive offload) frames received
  */
 static int
 qla_lro_intr(qla_host_t *ha, qla_sgl_lro_t *sgc, uint32_t sds_idx)
 {
 	qla_rx_buf_t *rxb;
 	struct mbuf *mp = NULL, *mpf = NULL, *mpl = NULL;
 	struct ifnet *ifp = ha->ifp;
 	qla_sds_t *sdsp;
 	struct ether_vlan_header *eh;
 	uint32_t i, rem_len = 0, pkt_length, iplen;
 	struct tcphdr *th;
 	struct ip *ip = NULL;
 	struct ip6_hdr *ip6 = NULL;
 	uint16_t etype;
 	uint32_t r_idx = 0;
 	qla_rx_ring_t *rx_ring;
 
 	if (ha->hw.num_rds_rings > 1)
 		r_idx = sds_idx;
 
 	ha->hw.rds[r_idx].count++;
 
 	rx_ring = &ha->rx_ring[r_idx];
 	
 	ha->lro_pkt_count++;
 
 	sdsp = &ha->hw.sds[sds_idx];
 	
 	pkt_length = sgc->payload_length + sgc->l4_offset;
 
 	if (sgc->flags & Q8_LRO_COMP_TS) {
 		pkt_length += QLA_TCP_HDR_SIZE + QLA_TCP_TS_OPTION_SIZE;
 	} else {
 		pkt_length += QLA_TCP_HDR_SIZE;
 	}
 	ha->lro_bytes += pkt_length;
 
 	for (i = 0; i < sgc->num_handles; i++) {
 		rxb = &rx_ring->rx_buf[sgc->handle[i] & 0x7FFF];
 
 		QL_ASSERT(ha, (rxb != NULL),
 			("%s: [sds_idx]=[%d] rxb != NULL\n", __func__,\
 			sds_idx));
 
 		if ((rxb == NULL) || QL_ERR_INJECT(ha, INJCT_LRO_RXB_INVAL)) {
 			/* log the error */
 			device_printf(ha->pci_dev,
 				"%s invalid rxb[%d, %d, 0x%04x]\n",
 				__func__, sds_idx, i, sgc->handle[i]);
 			qla_rcv_error(ha);
 			return (0);
 		}
 
 		mp = rxb->m_head;
 		if (i == 0) 
 			mpf = mp;
 
 		QL_ASSERT(ha, (mp != NULL),
 			("%s: [sds_idx]=[%d] mp != NULL\n", __func__,\
 			sds_idx));
 
 		bus_dmamap_sync(ha->rx_tag, rxb->map, BUS_DMASYNC_POSTREAD);
 
 		rxb->m_head = NULL;
 		rxb->next = sdsp->rxb_free;
 		sdsp->rxb_free = rxb;
 		sdsp->rx_free++;
 	
 		if ((mp == NULL) || QL_ERR_INJECT(ha, INJCT_LRO_MP_NULL)) {
 			/* log the error */
 			device_printf(ha->pci_dev,
 				"%s mp  == NULL [%d, %d, 0x%04x]\n",
 				__func__, sds_idx, i, sgc->handle[i]);
 			qla_rcv_error(ha);
 			return (0);
 		}
 
 		if (i == 0) {
 			mpl = mpf = mp;
 			mp->m_flags |= M_PKTHDR;
 			mp->m_pkthdr.len = pkt_length;
 			mp->m_pkthdr.rcvif = ifp;
 			rem_len = mp->m_pkthdr.len;
 		} else {
 			mp->m_flags &= ~M_PKTHDR;
 			mpl->m_next = mp;
 			mpl = mp;
 			rem_len = rem_len - mp->m_len;
 		}
 	}
 
 	mpl->m_len = rem_len;
 
 	th = (struct tcphdr *)(mpf->m_data + sgc->l4_offset);
 
 	if (sgc->flags & Q8_LRO_COMP_PUSH_BIT)
 		th->th_flags |= TH_PUSH;
 
 	m_adj(mpf, sgc->l2_offset);
 
 	eh = mtod(mpf, struct ether_vlan_header *);
 
 	if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
 		uint32_t *data = (uint32_t *)eh;
 
 		mpf->m_pkthdr.ether_vtag = ntohs(eh->evl_tag);
 		mpf->m_flags |= M_VLANTAG;
 
 		*(data + 3) = *(data + 2);
 		*(data + 2) = *(data + 1);
 		*(data + 1) = *data;
 
 		m_adj(mpf, ETHER_VLAN_ENCAP_LEN);
 
 		etype = ntohs(eh->evl_proto);
 	} else {
 		etype = ntohs(eh->evl_encap_proto);
 	}
 
 	if (etype == ETHERTYPE_IP) {
 		ip = (struct ip *)(mpf->m_data + ETHER_HDR_LEN);
 	
 		iplen = (ip->ip_hl << 2) + (th->th_off << 2) +
 				sgc->payload_length;
 
                 ip->ip_len = htons(iplen);
 
 		ha->ipv4_lro++;
 	} else if (etype == ETHERTYPE_IPV6) {
 		ip6 = (struct ip6_hdr *)(mpf->m_data + ETHER_HDR_LEN);
 
 		iplen = (th->th_off << 2) + sgc->payload_length;
 
 		ip6->ip6_plen = htons(iplen);
 
 		ha->ipv6_lro++;
 	} else {
 		m_freem(mpf);
 
 		if (sdsp->rx_free > ha->std_replenish)
 			qla_replenish_normal_rx(ha, sdsp, r_idx);
 		return 0;
 	}
 
 	mpf->m_pkthdr.csum_flags = CSUM_IP_CHECKED | CSUM_IP_VALID |
 					CSUM_DATA_VALID | CSUM_PSEUDO_HDR;
 	mpf->m_pkthdr.csum_data = 0xFFFF;
 
 	mpf->m_pkthdr.flowid = sgc->rss_hash;
-	M_HASHTYPE_SET(mpf, M_HASHTYPE_OPAQUE);
+	M_HASHTYPE_SET(mpf, M_HASHTYPE_OPAQUE_HASH);
 
 	if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1);
 
 	(*ifp->if_input)(ifp, mpf);
 
 	if (sdsp->rx_free > ha->std_replenish)
 		qla_replenish_normal_rx(ha, sdsp, r_idx);
 
 	return (0);
 }
 
 static int
 qla_rcv_cont_sds(qla_host_t *ha, uint32_t sds_idx, uint32_t comp_idx,
 	uint32_t dcount, uint16_t *handle, uint16_t *nhandles)
 {
 	uint32_t i;
 	uint16_t num_handles;
 	q80_stat_desc_t *sdesc;
 	uint32_t opcode;
 
 	*nhandles = 0;
 	dcount--;
 
 	for (i = 0; i < dcount; i++) {
 		comp_idx = (comp_idx + 1) & (NUM_STATUS_DESCRIPTORS-1);
 		sdesc = (q80_stat_desc_t *)
 				&ha->hw.sds[sds_idx].sds_ring_base[comp_idx];
 
 		opcode = Q8_STAT_DESC_OPCODE((sdesc->data[1]));
 
 		if (!opcode) {
 			device_printf(ha->pci_dev, "%s: opcode=0 %p %p\n",
 				__func__, (void *)sdesc->data[0],
 				(void *)sdesc->data[1]);
 			return -1;
 		}
 
 		num_handles = Q8_SGL_STAT_DESC_NUM_HANDLES((sdesc->data[1]));
 		if (!num_handles) {
 			device_printf(ha->pci_dev, "%s: num_handles=0 %p %p\n",
 				__func__, (void *)sdesc->data[0],
 				(void *)sdesc->data[1]);
 			return -1;
 		}
 
 		if (QL_ERR_INJECT(ha, INJCT_NUM_HNDLE_INVALID))
 			num_handles = -1;
 
 		switch (num_handles) {
 
 		case 1:
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE1((sdesc->data[0]));
 			break;
 
 		case 2:
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE1((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE2((sdesc->data[0]));
 			break;
 
 		case 3:
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE1((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE2((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE3((sdesc->data[0]));
 			break;
 
 		case 4:
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE1((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE2((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE3((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE4((sdesc->data[0]));
 			break;
 
 		case 5:
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE1((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE2((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE3((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE4((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE5((sdesc->data[1]));
 			break;
 
 		case 6:
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE1((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE2((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE3((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE4((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE5((sdesc->data[1]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE6((sdesc->data[1]));
 			break;
 
 		case 7:
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE1((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE2((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE3((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE4((sdesc->data[0]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE5((sdesc->data[1]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE6((sdesc->data[1]));
 			*handle++ = Q8_SGL_STAT_DESC_HANDLE7((sdesc->data[1]));
 			break;
 
 		default:
 			device_printf(ha->pci_dev,
 				"%s: invalid num handles %p %p\n",
 				__func__, (void *)sdesc->data[0],
 				(void *)sdesc->data[1]);
 
 			QL_ASSERT(ha, (0),\
 			("%s: %s [nh, sds, d0, d1]=[%d, %d, %p, %p]\n",
 			__func__, "invalid num handles", sds_idx, num_handles,
 			(void *)sdesc->data[0],(void *)sdesc->data[1]));
 
 			qla_rcv_error(ha);
 			return 0;
 		}
 		*nhandles = *nhandles + num_handles;
 	}
 	return 0;
 }
 
 /*
  * Name: qla_rcv_isr
  * Function: Main Interrupt Service Routine
  */
 static uint32_t
 qla_rcv_isr(qla_host_t *ha, uint32_t sds_idx, uint32_t count)
 {
 	device_t dev;
 	qla_hw_t *hw;
 	uint32_t comp_idx, c_idx = 0, desc_count = 0, opcode;
 	volatile q80_stat_desc_t *sdesc, *sdesc0 = NULL;
 	uint32_t ret = 0;
 	qla_sgl_comp_t sgc;
 	uint16_t nhandles;
 	uint32_t sds_replenish_threshold = 0;
 
 	dev = ha->pci_dev;
 	hw = &ha->hw;
 
 	hw->sds[sds_idx].rcv_active = 1;
 	if (ha->flags.stop_rcv) {
 		hw->sds[sds_idx].rcv_active = 0;
 		return 0;
 	}
 
 	QL_DPRINT2(ha, (dev, "%s: [%d]enter\n", __func__, sds_idx));
 
 	/*
 	 * receive interrupts
 	 */
 	comp_idx = hw->sds[sds_idx].sdsr_next;
 
 	while (count-- && !ha->flags.stop_rcv) {
 
 		sdesc = (q80_stat_desc_t *)
 				&hw->sds[sds_idx].sds_ring_base[comp_idx];
 
 		opcode = Q8_STAT_DESC_OPCODE((sdesc->data[1]));
 
 		if (!opcode)
 			break;
 
 		hw->sds[sds_idx].intr_count++;
 		switch (opcode) {
 
 		case Q8_STAT_DESC_OPCODE_RCV_PKT:
 
 			desc_count = 1;
 
 			bzero(&sgc, sizeof(qla_sgl_comp_t));
 
 			sgc.rcv.pkt_length =
 				Q8_STAT_DESC_TOTAL_LENGTH((sdesc->data[0]));
 			sgc.rcv.num_handles = 1;
 			sgc.rcv.handle[0] =
 				Q8_STAT_DESC_HANDLE((sdesc->data[0]));
 			sgc.rcv.chksum_status =
 				Q8_STAT_DESC_STATUS((sdesc->data[1]));
 
 			sgc.rcv.rss_hash =
 				Q8_STAT_DESC_RSS_HASH((sdesc->data[0]));
 
 			if (Q8_STAT_DESC_VLAN((sdesc->data[1]))) {
 				sgc.rcv.vlan_tag =
 					Q8_STAT_DESC_VLAN_ID((sdesc->data[1]));
 			}
 			qla_rx_intr(ha, &sgc.rcv, sds_idx);
 			break;
 
 		case Q8_STAT_DESC_OPCODE_SGL_RCV:
 
 			desc_count =
 				Q8_STAT_DESC_COUNT_SGL_RCV((sdesc->data[1]));
 
 			if (desc_count > 1) {
 				c_idx = (comp_idx + desc_count -1) &
 						(NUM_STATUS_DESCRIPTORS-1);
 				sdesc0 = (q80_stat_desc_t *)
 					&hw->sds[sds_idx].sds_ring_base[c_idx];
 
 				if (Q8_STAT_DESC_OPCODE((sdesc0->data[1])) !=
 						Q8_STAT_DESC_OPCODE_CONT) {
 					desc_count = 0;
 					break;
 				}
 			}
 
 			bzero(&sgc, sizeof(qla_sgl_comp_t));
 
 			sgc.rcv.pkt_length =
 				Q8_STAT_DESC_TOTAL_LENGTH_SGL_RCV(\
 					(sdesc->data[0]));
 			sgc.rcv.chksum_status =
 				Q8_STAT_DESC_STATUS((sdesc->data[1]));
 
 			sgc.rcv.rss_hash =
 				Q8_STAT_DESC_RSS_HASH((sdesc->data[0]));
 
 			if (Q8_STAT_DESC_VLAN((sdesc->data[1]))) {
 				sgc.rcv.vlan_tag =
 					Q8_STAT_DESC_VLAN_ID((sdesc->data[1]));
 			}
 
 			QL_ASSERT(ha, (desc_count <= 2) ,\
 				("%s: [sds_idx, data0, data1]="\
 				"[%d, %p, %p]\n", __func__, sds_idx,\
 				(void *)sdesc->data[0],\
 				(void *)sdesc->data[1]));
 
 			sgc.rcv.num_handles = 1;
 			sgc.rcv.handle[0] = 
 				Q8_STAT_DESC_HANDLE((sdesc->data[0]));
 			
 			if (qla_rcv_cont_sds(ha, sds_idx, comp_idx, desc_count,
 				&sgc.rcv.handle[1], &nhandles)) {
 				device_printf(dev,
 					"%s: [sds_idx, dcount, data0, data1]="
 					 "[%d, %d, 0x%llx, 0x%llx]\n",
 					__func__, sds_idx, desc_count,
 					(long long unsigned int)sdesc->data[0],
 					(long long unsigned int)sdesc->data[1]);
 				desc_count = 0;
 				break;	
 			}
 
 			sgc.rcv.num_handles += nhandles;
 
 			qla_rx_intr(ha, &sgc.rcv, sds_idx);
 			
 			break;
 
 		case Q8_STAT_DESC_OPCODE_SGL_LRO:
 
 			desc_count =
 				Q8_STAT_DESC_COUNT_SGL_LRO((sdesc->data[1]));
 
 			if (desc_count > 1) {
 				c_idx = (comp_idx + desc_count -1) &
 						(NUM_STATUS_DESCRIPTORS-1);
 				sdesc0 = (q80_stat_desc_t *)
 					&hw->sds[sds_idx].sds_ring_base[c_idx];
 
 				if (Q8_STAT_DESC_OPCODE((sdesc0->data[1])) !=
 						Q8_STAT_DESC_OPCODE_CONT) {
 					desc_count = 0;
 					break;
 				}
 			}
 			bzero(&sgc, sizeof(qla_sgl_comp_t));
 
 			sgc.lro.payload_length =
 			Q8_STAT_DESC_TOTAL_LENGTH_SGL_RCV((sdesc->data[0]));
 				
 			sgc.lro.rss_hash =
 				Q8_STAT_DESC_RSS_HASH((sdesc->data[0]));
 			
 			sgc.lro.num_handles = 1;
 			sgc.lro.handle[0] =
 				Q8_STAT_DESC_HANDLE((sdesc->data[0]));
 
 			if (Q8_SGL_LRO_STAT_TS((sdesc->data[1])))
 				sgc.lro.flags |= Q8_LRO_COMP_TS;
 
 			if (Q8_SGL_LRO_STAT_PUSH_BIT((sdesc->data[1])))
 				sgc.lro.flags |= Q8_LRO_COMP_PUSH_BIT;
 
 			sgc.lro.l2_offset =
 				Q8_SGL_LRO_STAT_L2_OFFSET((sdesc->data[1]));
 			sgc.lro.l4_offset =
 				Q8_SGL_LRO_STAT_L4_OFFSET((sdesc->data[1]));
 
 			if (Q8_STAT_DESC_VLAN((sdesc->data[1]))) {
 				sgc.lro.vlan_tag =
 					Q8_STAT_DESC_VLAN_ID((sdesc->data[1]));
 			}
 
 			QL_ASSERT(ha, (desc_count <= 7) ,\
 				("%s: [sds_idx, data0, data1]="\
 				 "[%d, 0x%llx, 0x%llx]\n",\
 				__func__, sds_idx,\
 				(long long unsigned int)sdesc->data[0],\
 				(long long unsigned int)sdesc->data[1]));
 				
 			if (qla_rcv_cont_sds(ha, sds_idx, comp_idx, 
 				desc_count, &sgc.lro.handle[1], &nhandles)) {
 				device_printf(dev,
 				"%s: [sds_idx, data0, data1]="\
 				 "[%d, 0x%llx, 0x%llx]\n",\
 				__func__, sds_idx,\
 				(long long unsigned int)sdesc->data[0],\
 				(long long unsigned int)sdesc->data[1]);
 
 				desc_count = 0;
 				break;	
 			}
 
 			sgc.lro.num_handles += nhandles;
 
 			if (qla_lro_intr(ha, &sgc.lro, sds_idx)) {
 				device_printf(dev,
 				"%s: [sds_idx, data0, data1]="\
 				 "[%d, 0x%llx, 0x%llx]\n",\
 				__func__, sds_idx,\
 				(long long unsigned int)sdesc->data[0],\
 				(long long unsigned int)sdesc->data[1]);
 				device_printf(dev,
 				"%s: [comp_idx, c_idx, dcount, nhndls]="\
 				 "[%d, %d, %d, %d]\n",\
 				__func__, comp_idx, c_idx, desc_count,
 				sgc.lro.num_handles);
 				if (desc_count > 1) {
 				device_printf(dev,
 				"%s: [sds_idx, data0, data1]="\
 				 "[%d, 0x%llx, 0x%llx]\n",\
 				__func__, sds_idx,\
 				(long long unsigned int)sdesc0->data[0],\
 				(long long unsigned int)sdesc0->data[1]);
 				}
 			}
 			
 			break;
 
 		default:
 			device_printf(dev, "%s: default 0x%llx!\n", __func__,
 					(long long unsigned int)sdesc->data[0]);
 			break;
 		}
 
 		if (desc_count == 0)
 			break;
 
 		sds_replenish_threshold += desc_count;
 
 
 		while (desc_count--) {
 			sdesc->data[0] = 0ULL;
 			sdesc->data[1] = 0ULL;
 			comp_idx = (comp_idx + 1) & (NUM_STATUS_DESCRIPTORS-1);
 			sdesc = (q80_stat_desc_t *)
 				&hw->sds[sds_idx].sds_ring_base[comp_idx];
 		}
 
 		if (sds_replenish_threshold > ha->hw.sds_cidx_thres) {
 			sds_replenish_threshold = 0;
 			if (hw->sds[sds_idx].sdsr_next != comp_idx) {
 				QL_UPDATE_SDS_CONSUMER_INDEX(ha, sds_idx,\
 					comp_idx);
 			}
 			hw->sds[sds_idx].sdsr_next = comp_idx;
 		}
 	}
 
 	if (ha->flags.stop_rcv)
 		goto qla_rcv_isr_exit;
 
 	if (hw->sds[sds_idx].sdsr_next != comp_idx) {
 		QL_UPDATE_SDS_CONSUMER_INDEX(ha, sds_idx, comp_idx);
 	}
 	hw->sds[sds_idx].sdsr_next = comp_idx;
 
 	sdesc = (q80_stat_desc_t *)&hw->sds[sds_idx].sds_ring_base[comp_idx];
 	opcode = Q8_STAT_DESC_OPCODE((sdesc->data[1]));
 
 	if (opcode)
 		ret = -1;
 
 qla_rcv_isr_exit:
 	hw->sds[sds_idx].rcv_active = 0;
 
 	return (ret);
 }
 
 void
 ql_mbx_isr(void *arg)
 {
 	qla_host_t *ha;
 	uint32_t data;
 	uint32_t prev_link_state;
 
 	ha = arg;
 
 	if (ha == NULL) {
 		printf("%s: arg == NULL\n", __func__);
 		return;
 	}
 
 	data = READ_REG32(ha, Q8_FW_MBOX_CNTRL);
 	if ((data & 0x3) != 0x1) {
 		WRITE_REG32(ha, ha->hw.mbx_intr_mask_offset, 0);
 		return;
 	}
 
 	data = READ_REG32(ha, Q8_FW_MBOX0);
 
 	if ((data & 0xF000) != 0x8000)
 		return;
 
 	data = data & 0xFFFF;
 
 	switch (data) {
 
 	case 0x8001:  /* It's an AEN */
 		
 		ha->hw.cable_oui = READ_REG32(ha, (Q8_FW_MBOX0 + 4));
 
 		data = READ_REG32(ha, (Q8_FW_MBOX0 + 8));
 		ha->hw.cable_length = data & 0xFFFF;
 
 		data = data >> 16;
 		ha->hw.link_speed = data & 0xFFF;
 
 		data = READ_REG32(ha, (Q8_FW_MBOX0 + 12));
 
 		prev_link_state =  ha->hw.link_up;
 		ha->hw.link_up = (((data & 0xFF) == 0) ? 0 : 1);
 
 		if (prev_link_state !=  ha->hw.link_up) {
 			if (ha->hw.link_up)
 				if_link_state_change(ha->ifp, LINK_STATE_UP);
 			else
 				if_link_state_change(ha->ifp, LINK_STATE_DOWN);
 		}
 
 
 		ha->hw.module_type = ((data >> 8) & 0xFF);
 		ha->hw.flags.fduplex = (((data & 0xFF0000) == 0) ? 0 : 1);
 		ha->hw.flags.autoneg = (((data & 0xFF000000) == 0) ? 0 : 1);
 		
 		data = READ_REG32(ha, (Q8_FW_MBOX0 + 16));
 		ha->hw.flags.loopback_mode = data & 0x03;
 
 		ha->hw.link_faults = (data >> 3) & 0xFF;
 
 		break;
 
         case 0x8100:
 		ha->hw.imd_compl=1;
 		break;
 
         case 0x8101:
                 ha->async_event = 1;
                 ha->hw.aen_mb0 = 0x8101;
                 ha->hw.aen_mb1 = READ_REG32(ha, (Q8_FW_MBOX0 + 4));
                 ha->hw.aen_mb2 = READ_REG32(ha, (Q8_FW_MBOX0 + 8));
                 ha->hw.aen_mb3 = READ_REG32(ha, (Q8_FW_MBOX0 + 12));
                 ha->hw.aen_mb4 = READ_REG32(ha, (Q8_FW_MBOX0 + 16));
                 break;
 
         case 0x8110:
                 /* for now just dump the registers */
                 {
                         uint32_t ombx[5];
 
                         ombx[0] = READ_REG32(ha, (Q8_FW_MBOX0 + 4));
                         ombx[1] = READ_REG32(ha, (Q8_FW_MBOX0 + 8));
                         ombx[2] = READ_REG32(ha, (Q8_FW_MBOX0 + 12));
                         ombx[3] = READ_REG32(ha, (Q8_FW_MBOX0 + 16));
                         ombx[4] = READ_REG32(ha, (Q8_FW_MBOX0 + 20));
 
                         device_printf(ha->pci_dev, "%s: "
                                 "0x%08x 0x%08x 0x%08x 0x%08x 0x%08x 0x%08x\n",
                                 __func__, data, ombx[0], ombx[1], ombx[2],
                                 ombx[3], ombx[4]);
                 }
 
                 break;
 
         case 0x8130:
                 /* sfp insertion aen */
                 device_printf(ha->pci_dev, "%s: sfp inserted [0x%08x]\n",
                         __func__, READ_REG32(ha, (Q8_FW_MBOX0 + 4)));
                 break;
 
         case 0x8131:
                 /* sfp removal aen */
                 device_printf(ha->pci_dev, "%s: sfp removed\n", __func__);
                 break;
 
 	default:
 		device_printf(ha->pci_dev, "%s: AEN[0x%08x]\n", __func__, data);
 		break;
 	}
 	WRITE_REG32(ha, Q8_FW_MBOX_CNTRL, 0x0);
 	WRITE_REG32(ha, ha->hw.mbx_intr_mask_offset, 0x0);
 	return;
 }
 
 
 static void
 qla_replenish_normal_rx(qla_host_t *ha, qla_sds_t *sdsp, uint32_t r_idx)
 {
 	qla_rx_buf_t *rxb;
 	int count = sdsp->rx_free;
 	uint32_t rx_next;
 	qla_rdesc_t *rdesc;
 
 	/* we can play with this value via a sysctl */
 	uint32_t replenish_thresh = ha->hw.rds_pidx_thres;
 
 	rdesc = &ha->hw.rds[r_idx];
 
 	rx_next = rdesc->rx_next;
 
 	while (count--) {
 		rxb = sdsp->rxb_free;
 
 		if (rxb == NULL)
 			break;
 
 		sdsp->rxb_free = rxb->next;
 		sdsp->rx_free--;
 
 		if (ql_get_mbuf(ha, rxb, NULL) == 0) {
 			qla_set_hw_rcv_desc(ha, r_idx, rdesc->rx_in,
 				rxb->handle,
 				rxb->paddr, (rxb->m_head)->m_pkthdr.len);
 			rdesc->rx_in++;
 			if (rdesc->rx_in == NUM_RX_DESCRIPTORS)
 				rdesc->rx_in = 0;
 			rdesc->rx_next++;
 			if (rdesc->rx_next == NUM_RX_DESCRIPTORS)
 				rdesc->rx_next = 0;
 		} else {
 			device_printf(ha->pci_dev,
 				"%s: ql_get_mbuf [0,(%d),(%d)] failed\n",
 				__func__, rdesc->rx_in, rxb->handle);
 
 			rxb->m_head = NULL;
 			rxb->next = sdsp->rxb_free;
 			sdsp->rxb_free = rxb;
 			sdsp->rx_free++;
 
 			break;
 		}
 		if (replenish_thresh-- == 0) {
 			QL_UPDATE_RDS_PRODUCER_INDEX(ha, rdesc->prod_std,
 				rdesc->rx_next);
 			rx_next = rdesc->rx_next;
 			replenish_thresh = ha->hw.rds_pidx_thres;
 		}
 	}
 
 	if (rx_next != rdesc->rx_next) {
 		QL_UPDATE_RDS_PRODUCER_INDEX(ha, rdesc->prod_std,
 			rdesc->rx_next);
 	}
 }
 
 void
 ql_isr(void *arg)
 {
 	qla_ivec_t *ivec = arg;
 	qla_host_t *ha ;
 	int idx;
 	qla_hw_t *hw;
 	struct ifnet *ifp;
 	uint32_t ret = 0;
 
 	ha = ivec->ha;
 	hw = &ha->hw;
 	ifp = ha->ifp;
 
 	if ((idx = ivec->sds_idx) >= ha->hw.num_sds_rings)
 		return;
 
 	if (idx == 0)
 		taskqueue_enqueue(ha->tx_tq, &ha->tx_task);
 	
 	ret = qla_rcv_isr(ha, idx, -1);
 
 	if (idx == 0)
 		taskqueue_enqueue(ha->tx_tq, &ha->tx_task);
 
 	if (!ha->flags.stop_rcv) {
 		QL_ENABLE_INTERRUPTS(ha, idx);
 	}
 	return;
 }
 
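Note on the AEN 0x8001 handling in ql_mbx_isr() above: the word read from
(Q8_FW_MBOX0 + 12) packs four link attributes into one 32-bit register. The
sketch below restates that unpacking as a standalone userspace function. The
names struct link_aen and decode_link_word are hypothetical and do not exist
in the driver; the masks and shifts are copied from the handler, and the
field meanings are inferred from the driver's field names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields the handler extracts. */
struct link_aen {
	uint8_t link_up;	/* bits  7:0  nonzero -> link is up */
	uint8_t module_type;	/* bits 15:8  */
	uint8_t fduplex;	/* bits 23:16 nonzero -> full duplex */
	uint8_t autoneg;	/* bits 31:24 nonzero -> autonegotiation */
};

/* Unpack one mailbox word using the same masks as the 0x8001 case. */
static void
decode_link_word(uint32_t data, struct link_aen *out)
{
	assert(out != NULL);

	out->link_up = ((data & 0xFF) == 0) ? 0 : 1;
	out->module_type = (data >> 8) & 0xFF;
	out->fduplex = ((data & 0xFF0000) == 0) ? 0 : 1;
	out->autoneg = ((data & 0xFF000000) == 0) ? 0 : 1;
}
```

Keeping the decode separate from the register read would make it possible to
unit-test the bit layout without hardware; the driver itself does the decode
inline.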
Index: projects/vnet/sys/dev/qlxge/qls_isr.c
===================================================================
--- projects/vnet/sys/dev/qlxge/qls_isr.c	(revision 301546)
+++ projects/vnet/sys/dev/qlxge/qls_isr.c	(revision 301547)
@@ -1,396 +1,396 @@
 /*
  * Copyright (c) 2013-2014 Qlogic Corporation
  * All rights reserved.
  *
  *  Redistribution and use in source and binary forms, with or without
  *  modification, are permitted provided that the following conditions
  *  are met:
  *
  *  1. Redistributions of source code must retain the above copyright
  *     notice, this list of conditions and the following disclaimer.
  *  2. Redistributions in binary form must reproduce the above copyright
  *     notice, this list of conditions and the following disclaimer in the
  *     documentation and/or other materials provided with the distribution.
  *
  *  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  *  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  *  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  *  ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  *  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  *  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  *  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  *  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  *  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  *  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  *  POSSIBILITY OF SUCH DAMAGE.
  */
 
 /*
  * File: qls_isr.c
  * Author : David C Somayajulu, Qlogic Corporation, Aliso Viejo, CA 92656.
  */
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 
 
 #include "qls_os.h"
 #include "qls_hw.h"
 #include "qls_def.h"
 #include "qls_inline.h"
 #include "qls_ver.h"
 #include "qls_glbl.h"
 #include "qls_dbg.h"
 
 
 static void
 qls_tx_comp(qla_host_t *ha, uint32_t txr_idx, q81_tx_mac_comp_t *tx_comp)
 {
 	qla_tx_buf_t *txb;
 	uint32_t tx_idx = tx_comp->tid_lo;
 
 	if (tx_idx >= NUM_TX_DESCRIPTORS) {
 		ha->qla_initiate_recovery = 1;
 		return;
 	}
 
 	txb = &ha->tx_ring[txr_idx].tx_buf[tx_idx];
 
 	if (txb->m_head) {
 		if_inc_counter(ha->ifp, IFCOUNTER_OPACKETS, 1);
 		bus_dmamap_sync(ha->tx_tag, txb->map,
 		        BUS_DMASYNC_POSTWRITE);
 		bus_dmamap_unload(ha->tx_tag, txb->map);
 		m_freem(txb->m_head);
 
 		txb->m_head = NULL;
 	}
 
         ha->tx_ring[txr_idx].txr_done++;
 
 	if (ha->tx_ring[txr_idx].txr_done == NUM_TX_DESCRIPTORS)
 		ha->tx_ring[txr_idx].txr_done = 0;
 }
 
 static void
 qls_replenish_rx(qla_host_t *ha, uint32_t r_idx)
 {
         qla_rx_buf_t			*rxb;
 	qla_rx_ring_t			*rxr;
         int				count;
 	volatile q81_bq_addr_e_t	*sbq_e;
 
 	rxr = &ha->rx_ring[r_idx];
 
 	count = rxr->rx_free;
 	sbq_e = rxr->sbq_vaddr;
 
         while (count--) {
 
 		rxb = &rxr->rx_buf[rxr->sbq_next];
 
 		if (rxb->m_head == NULL) {
                 	if (qls_get_mbuf(ha, rxb, NULL) != 0) {
                         	device_printf(ha->pci_dev,
 					"%s: qls_get_mbuf [0,%d,%d] failed\n",
 					__func__, rxr->sbq_next, r_idx);
 				rxb->m_head = NULL;
 				break;
 			}
 		}
 
 		if (rxb->m_head != NULL) {
 			sbq_e[rxr->sbq_next].addr_lo = (uint32_t)rxb->paddr;
 			sbq_e[rxr->sbq_next].addr_hi =
 				(uint32_t)(rxb->paddr >> 32);
 
                         rxr->sbq_next++;
                         if (rxr->sbq_next == NUM_RX_DESCRIPTORS)
                                 rxr->sbq_next = 0;
 
 			rxr->sbq_free++;
                 	rxr->rx_free--;
 		}
 
                 if (rxr->sbq_free == 16) {
 
 			rxr->sbq_in += 16;
 			rxr->sbq_in = rxr->sbq_in & (NUM_RX_DESCRIPTORS - 1);
 			rxr->sbq_free = 0;
 
 			Q81_WR_SBQ_PROD_IDX(r_idx, (rxr->sbq_in));
                 }
         }
 }
 
 static int
 qls_rx_comp(qla_host_t *ha, uint32_t rxr_idx, uint32_t cq_idx, q81_rx_t *cq_e)
 {
 	qla_rx_buf_t	*rxb;
 	qla_rx_ring_t	*rxr;
 	device_t	dev = ha->pci_dev;
 	struct mbuf     *mp = NULL;
 	struct ifnet	*ifp = ha->ifp;
 	struct lro_ctrl	*lro;
 	struct ether_vlan_header *eh;
 
 	rxr = &ha->rx_ring[rxr_idx];
 
 	lro = &rxr->lro;
 
 	rxb = &rxr->rx_buf[rxr->rx_next];
 
 	if (!(cq_e->flags1 & Q81_RX_FLAGS1_DS)) {
 		device_printf(dev, "%s: DS bit not set \n", __func__);
 		return -1;
 	}
 	if (rxb->paddr != cq_e->b_paddr) {
 
 		device_printf(dev,
 			"%s: (rxb->paddr != cq_e->b_paddr)[%p, %p] \n",
 			__func__, (void *)rxb->paddr, (void *)cq_e->b_paddr);
 
 		Q81_SET_CQ_INVALID(cq_idx);
 
 		ha->qla_initiate_recovery = 1;
 
 		return(-1);
 	}
 
 	rxr->rx_int++;
 
 	if ((cq_e->flags1 & Q81_RX_FLAGS1_ERR_MASK) == 0) {
 
 		mp = rxb->m_head;
 		rxb->m_head = NULL;
 
 		if (mp == NULL) {
 			device_printf(dev, "%s: mp == NULL\n", __func__);
 		} else {
 			mp->m_flags |= M_PKTHDR;
 			mp->m_pkthdr.len = cq_e->length;
 			mp->m_pkthdr.rcvif = ifp;
 			mp->m_len = cq_e->length;
 
 			eh = mtod(mp, struct ether_vlan_header *);
 
 			if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
 				uint32_t *data = (uint32_t *)eh;
 
 				mp->m_pkthdr.ether_vtag = ntohs(eh->evl_tag);
 				mp->m_flags |= M_VLANTAG;
 
 				*(data + 3) = *(data + 2);
 				*(data + 2) = *(data + 1);
 				*(data + 1) = *data;
 
 				m_adj(mp, ETHER_VLAN_ENCAP_LEN);
 			}
 
 			if ((cq_e->flags1 & Q81_RX_FLAGS1_RSS_MATCH_MASK)) {
 				rxr->rss_int++;
 				mp->m_pkthdr.flowid = cq_e->rss;
-				M_HASHTYPE_SET(mp, M_HASHTYPE_OPAQUE);
+				M_HASHTYPE_SET(mp, M_HASHTYPE_OPAQUE_HASH);
 			}
 			if (cq_e->flags0 & (Q81_RX_FLAGS0_TE |
 				Q81_RX_FLAGS0_NU | Q81_RX_FLAGS0_IE)) {
 				mp->m_pkthdr.csum_flags = 0;
 			} else {
 				mp->m_pkthdr.csum_flags = CSUM_IP_CHECKED |
 					CSUM_IP_VALID | CSUM_DATA_VALID |
 					CSUM_PSEUDO_HDR;
 				mp->m_pkthdr.csum_data = 0xFFFF;
 			}
 			if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1);
 
 			if (lro->lro_cnt && (tcp_lro_rx(lro, mp, 0) == 0)) {
 				/* LRO packet has been successfully queued */
 			} else {
 				(*ifp->if_input)(ifp, mp);
 			}
 		}
 	} else {
 		device_printf(dev, "%s: err [0x%08x]\n", __func__,
 		    cq_e->flags1);
 	}
 
 	rxr->rx_free++;
 	rxr->rx_next++;
 
 	if (rxr->rx_next == NUM_RX_DESCRIPTORS)
 		rxr->rx_next = 0;
 
 	if ((rxr->rx_free + rxr->sbq_free) >= 16)
                 qls_replenish_rx(ha, rxr_idx);
 
 	return 0;
 }
 
 static void
 qls_cq_isr(qla_host_t *ha, uint32_t cq_idx)
 {
 	q81_cq_e_t *cq_e, *cq_b;
 	uint32_t i, cq_comp_idx;
 	int ret = 0, tx_comp_done = 0;
 	struct lro_ctrl	*lro;
 
 	cq_b = ha->rx_ring[cq_idx].cq_base_vaddr;
 	lro = &ha->rx_ring[cq_idx].lro;
 
 	cq_comp_idx = *(ha->rx_ring[cq_idx].cqi_vaddr);
 
 	i = ha->rx_ring[cq_idx].cq_next;
 
 	while (i != cq_comp_idx) {
 
 		cq_e = &cq_b[i];
 
 		switch (cq_e->opcode) {
 
                 case Q81_IOCB_TX_MAC:
                 case Q81_IOCB_TX_TSO:
                         qls_tx_comp(ha, cq_idx, (q81_tx_mac_comp_t *)cq_e);
                         tx_comp_done++;
                         break;
 
 		case Q81_IOCB_RX:
 			ret = qls_rx_comp(ha, cq_idx, i, (q81_rx_t *)cq_e);
 	
 			break;
 
 		case Q81_IOCB_MPI:
 		case Q81_IOCB_SYS:
 		default:
 			device_printf(ha->pci_dev, "%s[%d %d 0x%x]: illegal \n",
 				__func__, i, (*(ha->rx_ring[cq_idx].cqi_vaddr)),
 				cq_e->opcode);
 			qls_dump_buf32(ha, __func__, cq_e,
 				(sizeof (q81_cq_e_t) >> 2));
 			break;
 		}
 
 		i++;
 		if (i == NUM_CQ_ENTRIES)
 			i = 0;
 
 		if (ret) {
 			break;
 		}
 
 		if (i == cq_comp_idx) {
 			cq_comp_idx = *(ha->rx_ring[cq_idx].cqi_vaddr);
 		}
 
                 if (tx_comp_done) {
                         taskqueue_enqueue(ha->tx_tq, &ha->tx_task);
                         tx_comp_done = 0;
                 }
 	}
 
 	tcp_lro_flush_all(lro);
 
 	ha->rx_ring[cq_idx].cq_next = cq_comp_idx;
 
 	if (!ret) {
 		Q81_WR_CQ_CONS_IDX(cq_idx, (ha->rx_ring[cq_idx].cq_next));
 	}
         if (tx_comp_done)
                 taskqueue_enqueue(ha->tx_tq, &ha->tx_task);
 
 	return;
 }
 
 static void
 qls_mbx_isr(qla_host_t *ha)
 {
 	uint32_t data;
 	int i;
 	device_t dev = ha->pci_dev;
 
 	if (qls_mbx_rd_reg(ha, 0, &data) == 0) {
 
 		if ((data & 0xF000) == 0x4000) {
 			ha->mbox[0] = data;
 			for (i = 1; i < Q81_NUM_MBX_REGISTERS; i++) {
 				if (qls_mbx_rd_reg(ha, i, &data))
 					break; 
 				ha->mbox[i] = data;
 			}
 			ha->mbx_done = 1;
 		} else if ((data & 0xF000) == 0x8000) {
 
 			/* we have an AEN */
 	
 			ha->aen[0] = data;
 			for (i = 1; i < Q81_NUM_AEN_REGISTERS; i++) {
 				if (qls_mbx_rd_reg(ha, i, &data))
 					break; 
 				ha->aen[i] = data;
 			}
 			device_printf(dev,"%s: AEN "
 				"[0x%08x 0x%08x 0x%08x 0x%08x 0x%08x"
 				" 0x%08x 0x%08x 0x%08x 0x%08x]\n",
 				__func__,
 				ha->aen[0], ha->aen[1], ha->aen[2],
 				ha->aen[3], ha->aen[4], ha->aen[5],
 				ha->aen[6], ha->aen[7], ha->aen[8]);
 
 			switch ((ha->aen[0] & 0xFFFF)) {
 
 			case 0x8011:
 				ha->link_up = 1;
 				break;
 
 			case 0x8012:
 				ha->link_up = 0;
 				break;
 
 			case 0x8130:
 				ha->link_hw_info = ha->aen[1];
 				break;
 
 			case 0x8131:
 				ha->link_hw_info = 0;
 				break;
 
 			}
 		} 
 	}
 	WRITE_REG32(ha, Q81_CTL_HOST_CMD_STATUS, Q81_CTL_HCS_CMD_CLR_RTH_INTR);
 
 	return;
 }
 
 void
 qls_isr(void *arg)
 {
 	qla_ivec_t *ivec = arg;
 	qla_host_t *ha;
 	uint32_t status;
 	uint32_t cq_idx;
 	device_t dev;
 
 	ha = ivec->ha;
 	cq_idx = ivec->cq_idx;
 	dev = ha->pci_dev;
 
 	status = READ_REG32(ha, Q81_CTL_STATUS);
 
 	if (status & Q81_CTL_STATUS_FE) {
 		device_printf(dev, "%s fatal error\n", __func__);
 		return;
 	}
 
 	if ((cq_idx == 0) && (status & Q81_CTL_STATUS_PI)) {
 		qls_mbx_isr(ha);
 	}
 
 	status = READ_REG32(ha, Q81_CTL_INTR_STATUS1);
 
 	if (status & ( 0x1 << cq_idx))
 		qls_cq_isr(ha, cq_idx);
 
 	Q81_ENABLE_INTR(ha, cq_idx);
 
 	return;
 }
 
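Note on the VLAN handling in qls_rx_comp() above: when the frame carries an
802.1Q tag, the driver saves the tag, copies the first three 32-bit words
(the 12 bytes of destination and source MAC address) forward by one word so
that they overwrite the tag, and then trims 4 bytes from the front with
m_adj(). The sketch below shows the same idea on a plain byte buffer in
userspace. The function strip_vlan and the local constants are hypothetical
stand-ins, and memmove() replaces the driver's three word copies; this is an
illustration under those assumptions, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_ETHER_ADDR_LEN	6	/* bytes per MAC address */
#define SK_VLAN_ENCAP_LEN	4	/* TPID (2) + TCI (2) */
#define SK_ETHERTYPE_VLAN	0x8100

/*
 * If buf holds an 802.1Q-tagged frame, store the tag in *vtag, slide the
 * two MAC addresses over the tag, and return the number of bytes the
 * caller should drop from the front (4). Otherwise return 0.
 */
static size_t
strip_vlan(uint8_t *buf, size_t len, uint16_t *vtag)
{
	uint16_t proto;

	assert(len >= 2 * SK_ETHER_ADDR_LEN + SK_VLAN_ENCAP_LEN + 2);

	/* Ethertype/TPID sits right after the two addresses. */
	proto = (uint16_t)((buf[12] << 8) | buf[13]);
	if (proto != SK_ETHERTYPE_VLAN)
		return (0);

	*vtag = (uint16_t)((buf[14] << 8) | buf[15]);

	/* Move bytes 0..11 to 4..15; regions overlap, hence memmove. */
	memmove(buf + SK_VLAN_ENCAP_LEN, buf, 2 * SK_ETHER_ADDR_LEN);
	return (SK_VLAN_ENCAP_LEN);
}
```

After the call, the untagged frame starts at buf + 4, which corresponds to
the m_adj(mp, ETHER_VLAN_ENCAP_LEN) step in the driver.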
Index: projects/vnet/sys/kern/subr_intr.c
===================================================================
--- projects/vnet/sys/kern/subr_intr.c	(revision 301546)
+++ projects/vnet/sys/kern/subr_intr.c	(revision 301547)
@@ -1,1544 +1,1392 @@
 /*-
  * Copyright (c) 2015-2016 Svatopluk Kraus
  * Copyright (c) 2015-2016 Michal Meloun
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 /*
  *	New-style Interrupt Framework
  *
  *  TODO: - to support IPI (PPI) enabling on other CPUs if already started
  *        - to complete things for removable PICs
  */
 
-#include "opt_acpi.h"
 #include "opt_ddb.h"
 #include "opt_hwpmc_hooks.h"
-#include "opt_platform.h"
 
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/kernel.h>
 #include <sys/syslog.h>
 #include <sys/malloc.h>
 #include <sys/proc.h>
 #include <sys/queue.h>
 #include <sys/bus.h>
 #include <sys/interrupt.h>
 #include <sys/conf.h>
 #include <sys/cpuset.h>
 #include <sys/rman.h>
 #include <sys/sched.h>
 #include <sys/smp.h>
 #ifdef HWPMC_HOOKS
 #include <sys/pmckern.h>
 #endif
 
 #include <machine/atomic.h>
 #include <machine/intr.h>
 #include <machine/cpu.h>
 #include <machine/smp.h>
 #include <machine/stdarg.h>
 
 #ifdef DDB
 #include <ddb/ddb.h>
 #endif
 
 #include "pic_if.h"
 #include "msi_if.h"
 
 #define	INTRNAME_LEN	(2*MAXCOMLEN + 1)
 
 #ifdef DEBUG
 #define debugf(fmt, args...) do { printf("%s(): ", __func__);	\
     printf(fmt,##args); } while (0)
 #else
 #define debugf(fmt, args...)
 #endif
 
 MALLOC_DECLARE(M_INTRNG);
 MALLOC_DEFINE(M_INTRNG, "intr", "intr interrupt handling");
 
 /* Main interrupt handler called from assembler -> 'hidden' for C code. */
 void intr_irq_handler(struct trapframe *tf);
 
 /* Root interrupt controller stuff. */
 device_t intr_irq_root_dev;
 static intr_irq_filter_t *irq_root_filter;
 static void *irq_root_arg;
 static u_int irq_root_ipicount;
 
 struct intr_pic_child {
 	SLIST_ENTRY(intr_pic_child)	 pc_next;
 	struct intr_pic			*pc_pic;
 	intr_child_irq_filter_t		*pc_filter;
 	void				*pc_filter_arg;
 	uintptr_t			 pc_start;
 	uintptr_t			 pc_length;
 };
 
 /* Interrupt controller definition. */
 struct intr_pic {
 	SLIST_ENTRY(intr_pic)	pic_next;
 	intptr_t		pic_xref;	/* hardware identification */
 	device_t		pic_dev;
 #define	FLAG_PIC	(1 << 0)
 #define	FLAG_MSI	(1 << 1)
 	u_int			pic_flags;
 	struct mtx		pic_child_lock;
 	SLIST_HEAD(, intr_pic_child) pic_children;
 };
 
 static struct mtx pic_list_lock;
 static SLIST_HEAD(, intr_pic) pic_list;
 
 static struct intr_pic *pic_lookup(device_t dev, intptr_t xref);
 
 /* Interrupt source definition. */
 static struct mtx isrc_table_lock;
 static struct intr_irqsrc *irq_sources[NIRQ];
 u_int irq_next_free;
 
-/*
- *  XXX - All stuff around struct intr_dev_data is considered as temporary
- *  until better place for storing struct intr_map_data will be find.
- *
- *  For now, there are two global interrupt numbers spaces:
- *  <0, NIRQ)                      ... interrupts without config data
- *                                     managed in irq_sources[]
- *  IRQ_DDATA_BASE + <0, 2 * NIRQ) ... interrupts with config data
- *                                     managed in intr_ddata_tab[]
- *
- *  Read intr_ddata_lookup() to see how these spaces are worked with.
- *  Note that each interrupt number from second space duplicates some number
- *  from first space at this moment. An interrupt number from first space can
- *  be duplicated even multiple times in second space.
- */
-struct intr_dev_data {
-	device_t		idd_dev;
-	intptr_t		idd_xref;
-	u_int			idd_irq;
-	struct intr_map_data *	idd_data;
-	struct intr_irqsrc *	idd_isrc;
-};
-
-static struct intr_dev_data *intr_ddata_tab[2 * NIRQ];
-static u_int intr_ddata_first_unused;
-
-#define IRQ_DDATA_BASE	10000
-CTASSERT(IRQ_DDATA_BASE > nitems(irq_sources));
-
 #ifdef SMP
 static boolean_t irq_assign_cpu = FALSE;
 #endif
 
 /*
  * - 2 counters for each I/O interrupt.
  * - MAXCPU counters for each IPI counters for SMP.
  */
 #ifdef SMP
 #define INTRCNT_COUNT   (NIRQ * 2 + INTR_IPI_COUNT * MAXCPU)
 #else
 #define INTRCNT_COUNT   (NIRQ * 2)
 #endif
 
 /* Data for MI statistics reporting. */
 u_long intrcnt[INTRCNT_COUNT];
 char intrnames[INTRCNT_COUNT * INTRNAME_LEN];
 size_t sintrcnt = sizeof(intrcnt);
 size_t sintrnames = sizeof(intrnames);
 static u_int intrcnt_index;
 
 /*
  *  Interrupt framework initialization routine.
  */
 static void
 intr_irq_init(void *dummy __unused)
 {
 
 	SLIST_INIT(&pic_list);
 	mtx_init(&pic_list_lock, "intr pic list", NULL, MTX_DEF);
 
 	mtx_init(&isrc_table_lock, "intr isrc table", NULL, MTX_DEF);
 }
 SYSINIT(intr_irq_init, SI_SUB_INTR, SI_ORDER_FIRST, intr_irq_init, NULL);
 
 static void
 intrcnt_setname(const char *name, int index)
 {
 
 	snprintf(intrnames + INTRNAME_LEN * index, INTRNAME_LEN, "%-*s",
 	    INTRNAME_LEN - 1, name);
 }
 
 /*
  *  Update name for interrupt source with interrupt event.
  */
 static void
 intrcnt_updatename(struct intr_irqsrc *isrc)
 {
 
 	/* QQQ: What about stray counter name? */
 	mtx_assert(&isrc_table_lock, MA_OWNED);
 	intrcnt_setname(isrc->isrc_event->ie_fullname, isrc->isrc_index);
 }
 
 /*
  *  Virtualization for interrupt source interrupt counter increment.
  */
 static inline void
 isrc_increment_count(struct intr_irqsrc *isrc)
 {
 
 	if (isrc->isrc_flags & INTR_ISRCF_PPI)
 		atomic_add_long(&isrc->isrc_count[0], 1);
 	else
 		isrc->isrc_count[0]++;
 }
 
 /*
  *  Virtualization for interrupt source interrupt stray counter increment.
  */
 static inline void
 isrc_increment_straycount(struct intr_irqsrc *isrc)
 {
 
 	isrc->isrc_count[1]++;
 }
 
 /*
  *  Virtualization for interrupt source interrupt name update.
  */
 static void
 isrc_update_name(struct intr_irqsrc *isrc, const char *name)
 {
 	char str[INTRNAME_LEN];
 
 	mtx_assert(&isrc_table_lock, MA_OWNED);
 
 	if (name != NULL) {
 		snprintf(str, INTRNAME_LEN, "%s: %s", isrc->isrc_name, name);
 		intrcnt_setname(str, isrc->isrc_index);
 		snprintf(str, INTRNAME_LEN, "stray %s: %s", isrc->isrc_name,
 		    name);
 		intrcnt_setname(str, isrc->isrc_index + 1);
 	} else {
 		snprintf(str, INTRNAME_LEN, "%s:", isrc->isrc_name);
 		intrcnt_setname(str, isrc->isrc_index);
 		snprintf(str, INTRNAME_LEN, "stray %s:", isrc->isrc_name);
 		intrcnt_setname(str, isrc->isrc_index + 1);
 	}
 }
 
 /*
  *  Virtualization for interrupt source interrupt counters setup.
  */
 static void
 isrc_setup_counters(struct intr_irqsrc *isrc)
 {
 	u_int index;
 
 	/*
 	 *  XXX - it does not work well with removable controllers and
 	 *        interrupt sources !!!
 	 */
 	index = atomic_fetchadd_int(&intrcnt_index, 2);
 	isrc->isrc_index = index;
 	isrc->isrc_count = &intrcnt[index];
 	isrc_update_name(isrc, NULL);
 }
 
 /*
  *  Virtualization for interrupt source interrupt counters release.
  */
 static void
 isrc_release_counters(struct intr_irqsrc *isrc)
 {
 
 	panic("%s: not implemented", __func__);
 }
 
 #ifdef SMP
 /*
  *  Virtualization for interrupt source IPI counters setup.
  */
 u_long *
 intr_ipi_setup_counters(const char *name)
 {
 	u_int index, i;
 	char str[INTRNAME_LEN];
 
 	index = atomic_fetchadd_int(&intrcnt_index, MAXCPU);
 	for (i = 0; i < MAXCPU; i++) {
 		snprintf(str, INTRNAME_LEN, "cpu%d:%s", i, name);
 		intrcnt_setname(str, index + i);
 	}
 	return (&intrcnt[index]);
 }
 #endif
 
 /*
  *  Main interrupt dispatch handler. It's called straight
  *  from the assembler, where CPU interrupt is served.
  */
 void
 intr_irq_handler(struct trapframe *tf)
 {
 	struct trapframe * oldframe;
 	struct thread * td;
 
 	KASSERT(irq_root_filter != NULL, ("%s: no filter", __func__));
 
 	PCPU_INC(cnt.v_intr);
 	critical_enter();
 	td = curthread;
 	oldframe = td->td_intr_frame;
 	td->td_intr_frame = tf;
 	irq_root_filter(irq_root_arg);
 	td->td_intr_frame = oldframe;
 	critical_exit();
 #ifdef HWPMC_HOOKS
 	if (pmc_hook && TRAPF_USERMODE(tf) &&
 	    (PCPU_GET(curthread)->td_pflags & TDP_CALLCHAIN))
 		pmc_hook(PCPU_GET(curthread), PMC_FN_USER_CALLCHAIN, tf);
 #endif
 }
 
 int
 intr_child_irq_handler(struct intr_pic *parent, uintptr_t irq)
 {
 	struct intr_pic_child *child;
 	bool found;
 
 	found = false;
 	mtx_lock_spin(&parent->pic_child_lock);
 	SLIST_FOREACH(child, &parent->pic_children, pc_next) {
 		if (child->pc_start <= irq &&
 		    irq < (child->pc_start + child->pc_length)) {
 			found = true;
 			break;
 		}
 	}
 	mtx_unlock_spin(&parent->pic_child_lock);
 
 	if (found)
 		return (child->pc_filter(child->pc_filter_arg, irq));
 
 	return (FILTER_STRAY);
 }
 
 /*
  *  interrupt controller dispatch function for interrupts. It should
  *  be called straight from the interrupt controller, when associated interrupt
  *  source is learned.
  */
 int
 intr_isrc_dispatch(struct intr_irqsrc *isrc, struct trapframe *tf)
 {
 
 	KASSERT(isrc != NULL, ("%s: no source", __func__));
 
 	isrc_increment_count(isrc);
 
 #ifdef INTR_SOLO
 	if (isrc->isrc_filter != NULL) {
 		int error;
 		error = isrc->isrc_filter(isrc->isrc_arg, tf);
 		PIC_POST_FILTER(isrc->isrc_dev, isrc);
 		if (error == FILTER_HANDLED)
 			return (0);
 	} else
 #endif
 	if (isrc->isrc_event != NULL) {
 		if (intr_event_handle(isrc->isrc_event, tf) == 0)
 			return (0);
 	}
 
 	isrc_increment_straycount(isrc);
 	return (EINVAL);
 }
 
 /*
  *  Alloc unique interrupt number (resource handle) for interrupt source.
  *
  *  There could be various strategies how to allocate free interrupt number
  *  (resource handle) for new interrupt source.
  *
  *  1. Handles are always allocated forward, so handles are not recycled
  *     immediately. However, if only one free handle is left, it is reused
  *     constantly...
  */
 static inline int
 isrc_alloc_irq(struct intr_irqsrc *isrc)
 {
 	u_int maxirqs, irq;
 
 	mtx_assert(&isrc_table_lock, MA_OWNED);
 
 	maxirqs = nitems(irq_sources);
 	if (irq_next_free >= maxirqs)
 		return (ENOSPC);
 
 	for (irq = irq_next_free; irq < maxirqs; irq++) {
 		if (irq_sources[irq] == NULL)
 			goto found;
 	}
 	for (irq = 0; irq < irq_next_free; irq++) {
 		if (irq_sources[irq] == NULL)
 			goto found;
 	}
 
 	irq_next_free = maxirqs;
 	return (ENOSPC);
 
 found:
 	isrc->isrc_irq = irq;
 	irq_sources[irq] = isrc;
 
 	irq_next_free = irq + 1;
 	if (irq_next_free >= maxirqs)
 		irq_next_free = 0;
 	return (0);
 }
 
 /*
  *  Free unique interrupt number (resource handle) from interrupt source.
  */
 static inline int
 isrc_free_irq(struct intr_irqsrc *isrc)
 {
 
 	mtx_assert(&isrc_table_lock, MA_OWNED);
 
 	if (isrc->isrc_irq >= nitems(irq_sources))
 		return (EINVAL);
 	if (irq_sources[isrc->isrc_irq] != isrc)
 		return (EINVAL);
 
 	irq_sources[isrc->isrc_irq] = NULL;
 	isrc->isrc_irq = INTR_IRQ_INVALID;	/* just to be safe */
 	return (0);
 }
 
 /*
  *  Lookup interrupt source by interrupt number (resource handle).
  */
 static inline struct intr_irqsrc *
 isrc_lookup(u_int irq)
 {
 
 	if (irq < nitems(irq_sources))
 		return (irq_sources[irq]);
 	return (NULL);
 }
 
 /*
  *  Initialize interrupt source and register it into global interrupt table.
  */
 int
 intr_isrc_register(struct intr_irqsrc *isrc, device_t dev, u_int flags,
     const char *fmt, ...)
 {
 	int error;
 	va_list ap;
 
 	bzero(isrc, sizeof(struct intr_irqsrc));
 	isrc->isrc_dev = dev;
 	isrc->isrc_irq = INTR_IRQ_INVALID;	/* just to be safe */
 	isrc->isrc_flags = flags;
 
 	va_start(ap, fmt);
 	vsnprintf(isrc->isrc_name, INTR_ISRC_NAMELEN, fmt, ap);
 	va_end(ap);
 
 	mtx_lock(&isrc_table_lock);
 	error = isrc_alloc_irq(isrc);
 	if (error != 0) {
 		mtx_unlock(&isrc_table_lock);
 		return (error);
 	}
 	/*
 	 * Setup interrupt counters, but not for IPI sources. Those are setup
 	 * later and only for used ones (up to INTR_IPI_COUNT) to not exhaust
 	 * our counter pool.
 	 */
 	if ((isrc->isrc_flags & INTR_ISRCF_IPI) == 0)
 		isrc_setup_counters(isrc);
 	mtx_unlock(&isrc_table_lock);
 	return (0);
 }
 
 /*
  *  Deregister interrupt source from global interrupt table.
  */
 int
 intr_isrc_deregister(struct intr_irqsrc *isrc)
 {
 	int error;
 
 	mtx_lock(&isrc_table_lock);
 	if ((isrc->isrc_flags & INTR_ISRCF_IPI) == 0)
 		isrc_release_counters(isrc);
 	error = isrc_free_irq(isrc);
 	mtx_unlock(&isrc_table_lock);
 	return (error);
 }
 
 #ifdef SMP
 /*
  *  A support function for a PIC to decide if provided ISRC should be inited
  *  on given cpu. The logic of INTR_ISRCF_BOUND flag and isrc_cpu member of
  *  struct intr_irqsrc is the following:
  *
  *     If INTR_ISRCF_BOUND is set, the ISRC should be inited only on cpus
  *     set in isrc_cpu. If not, the ISRC should be inited on every cpu and
  *     isrc_cpu is kept consistent with it. Thus isrc_cpu is always correct.
  */
 bool
 intr_isrc_init_on_cpu(struct intr_irqsrc *isrc, u_int cpu)
 {
 
 	if (isrc->isrc_handlers == 0)
 		return (false);
 	if ((isrc->isrc_flags & (INTR_ISRCF_PPI | INTR_ISRCF_IPI)) == 0)
 		return (false);
 	if (isrc->isrc_flags & INTR_ISRCF_BOUND)
 		return (CPU_ISSET(cpu, &isrc->isrc_cpu));
 
 	CPU_SET(cpu, &isrc->isrc_cpu);
 	return (true);
 }
 #endif
 
-static struct intr_dev_data *
-intr_ddata_alloc(u_int extsize)
-{
-	struct intr_dev_data *ddata;
-	size_t size;
-
-	size = sizeof(*ddata);
-	ddata = malloc(size + extsize, M_INTRNG, M_WAITOK | M_ZERO);
-
-	mtx_lock(&isrc_table_lock);
-	if (intr_ddata_first_unused >= nitems(intr_ddata_tab)) {
-		mtx_unlock(&isrc_table_lock);
-		free(ddata, M_INTRNG);
-		return (NULL);
-	}
-	intr_ddata_tab[intr_ddata_first_unused] = ddata;
-	ddata->idd_irq = IRQ_DDATA_BASE + intr_ddata_first_unused++;
-	mtx_unlock(&isrc_table_lock);
-
-	ddata->idd_data = (struct intr_map_data *)((uintptr_t)ddata + size);
-	return (ddata);
-}
-
-static struct intr_irqsrc *
-intr_ddata_lookup(u_int irq, struct intr_map_data **datap)
-{
-	int error;
-	struct intr_irqsrc *isrc;
-	struct intr_dev_data *ddata;
-
-	isrc = isrc_lookup(irq);
-	if (isrc != NULL) {
-		if (datap != NULL)
-			*datap = NULL;
-		return (isrc);
-	}
-
-	if (irq < IRQ_DDATA_BASE)
-		return (NULL);
-
-	irq -= IRQ_DDATA_BASE;
-	if (irq >= nitems(intr_ddata_tab))
-		return (NULL);
-
-	ddata = intr_ddata_tab[irq];
-	if (ddata->idd_isrc == NULL) {
-		error = intr_map_irq(ddata->idd_dev, ddata->idd_xref,
-		    ddata->idd_data, &irq);
-		if (error != 0)
-			return (NULL);
-		ddata->idd_isrc = isrc_lookup(irq);
-	}
-	if (datap != NULL)
-		*datap = ddata->idd_data;
-	return (ddata->idd_isrc);
-}
-
-#ifdef DEV_ACPI
-/*
- *  Map interrupt source according to ACPI info into framework. If such mapping
- *  does not exist, create it. Return unique interrupt number (resource handle)
- *  associated with mapped interrupt source.
- */
-u_int
-intr_acpi_map_irq(device_t dev, u_int irq, enum intr_polarity pol,
-    enum intr_trigger trig)
-{
-	struct intr_map_data_acpi *daa;
-	struct intr_dev_data *ddata;
-
-	ddata = intr_ddata_alloc(sizeof(struct intr_map_data_acpi));
-	if (ddata == NULL)
-		return (INTR_IRQ_INVALID);	/* no space left */
-
-	ddata->idd_dev = dev;
-	ddata->idd_data->type = INTR_MAP_DATA_ACPI;
-
-	daa = (struct intr_map_data_acpi *)ddata->idd_data;
-	daa->irq = irq;
-	daa->pol = pol;
-	daa->trig = trig;
-
-	return (ddata->idd_irq);
-}
-#endif
-
-/*
- *  Store GPIO interrupt decription in framework and return unique interrupt
- *  number (resource handle) associated with it.
- */
-u_int
-intr_gpio_map_irq(device_t dev, u_int pin_num, u_int pin_flags, u_int intr_mode)
-{
-	struct intr_dev_data *ddata;
-	struct intr_map_data_gpio *dag;
-
-	ddata = intr_ddata_alloc(sizeof(struct intr_map_data_gpio));
-	if (ddata == NULL)
-		return (INTR_IRQ_INVALID);	/* no space left */
-
-	ddata->idd_dev = dev;
-	ddata->idd_data->type = INTR_MAP_DATA_GPIO;
-
-	dag = (struct intr_map_data_gpio *)ddata->idd_data;
-	dag->gpio_pin_num = pin_num;
-	dag->gpio_pin_flags = pin_flags;
-	dag->gpio_intr_mode = intr_mode;
-	return (ddata->idd_irq);
-}
-
 #ifdef INTR_SOLO
 /*
  *  Setup filter into interrupt source.
  */
 static int
 iscr_setup_filter(struct intr_irqsrc *isrc, const char *name,
     intr_irq_filter_t *filter, void *arg, void **cookiep)
 {
 
 	if (filter == NULL)
 		return (EINVAL);
 
 	mtx_lock(&isrc_table_lock);
 	/*
 	 * Make sure that we do not mix the two ways
 	 * of handling interrupt sources.
 	 */
 	if (isrc->isrc_filter != NULL || isrc->isrc_event != NULL) {
 		mtx_unlock(&isrc_table_lock);
 		return (EBUSY);
 	}
 	isrc->isrc_filter = filter;
 	isrc->isrc_arg = arg;
 	isrc_update_name(isrc, name);
 	mtx_unlock(&isrc_table_lock);
 
 	*cookiep = isrc;
 	return (0);
 }
 #endif
 
 /*
  *  Interrupt source pre_ithread method for MI interrupt framework.
  */
 static void
 intr_isrc_pre_ithread(void *arg)
 {
 	struct intr_irqsrc *isrc = arg;
 
 	PIC_PRE_ITHREAD(isrc->isrc_dev, isrc);
 }
 
 /*
  *  Interrupt source post_ithread method for MI interrupt framework.
  */
 static void
 intr_isrc_post_ithread(void *arg)
 {
 	struct intr_irqsrc *isrc = arg;
 
 	PIC_POST_ITHREAD(isrc->isrc_dev, isrc);
 }
 
 /*
  *  Interrupt source post_filter method for MI interrupt framework.
  */
 static void
 intr_isrc_post_filter(void *arg)
 {
 	struct intr_irqsrc *isrc = arg;
 
 	PIC_POST_FILTER(isrc->isrc_dev, isrc);
 }
 
 /*
  *  Interrupt source assign_cpu method for MI interrupt framework.
  */
 static int
 intr_isrc_assign_cpu(void *arg, int cpu)
 {
 #ifdef SMP
 	struct intr_irqsrc *isrc = arg;
 	int error;
 
 	if (isrc->isrc_dev != intr_irq_root_dev)
 		return (EINVAL);
 
 	mtx_lock(&isrc_table_lock);
 	if (cpu == NOCPU) {
 		CPU_ZERO(&isrc->isrc_cpu);
 		isrc->isrc_flags &= ~INTR_ISRCF_BOUND;
 	} else {
 		CPU_SETOF(cpu, &isrc->isrc_cpu);
 		isrc->isrc_flags |= INTR_ISRCF_BOUND;
 	}
 
 	/*
 	 * In NOCPU case, it's up to PIC to either leave ISRC on same CPU or
 	 * re-balance it to another CPU or enable it on more CPUs. However,
 	 * PIC is expected to change isrc_cpu appropriately to keep us well
 	 * informed if the call is successful.
 	 */
 	if (irq_assign_cpu) {
 		error = PIC_BIND_INTR(isrc->isrc_dev, isrc);
 		if (error) {
 			CPU_ZERO(&isrc->isrc_cpu);
 			mtx_unlock(&isrc_table_lock);
 			return (error);
 		}
 	}
 	mtx_unlock(&isrc_table_lock);
 	return (0);
 #else
 	return (EOPNOTSUPP);
 #endif
 }
 
 /*
  *  Create interrupt event for interrupt source.
  */
 static int
 isrc_event_create(struct intr_irqsrc *isrc)
 {
 	struct intr_event *ie;
 	int error;
 
 	error = intr_event_create(&ie, isrc, 0, isrc->isrc_irq,
 	    intr_isrc_pre_ithread, intr_isrc_post_ithread, intr_isrc_post_filter,
 	    intr_isrc_assign_cpu, "%s:", isrc->isrc_name);
 	if (error)
 		return (error);
 
 	mtx_lock(&isrc_table_lock);
 	/*
 	 * Make sure that we do not mix the two ways
 	 * of handling interrupt sources. Let the contested event win.
 	 */
 #ifdef INTR_SOLO
 	if (isrc->isrc_filter != NULL || isrc->isrc_event != NULL) {
 #else
 	if (isrc->isrc_event != NULL) {
 #endif
 		mtx_unlock(&isrc_table_lock);
 		intr_event_destroy(ie);
 		return (isrc->isrc_event != NULL ? EBUSY : 0);
 	}
 	isrc->isrc_event = ie;
 	mtx_unlock(&isrc_table_lock);
 
 	return (0);
 }
 #ifdef notyet
 /*
  *  Destroy interrupt event for interrupt source.
  */
 static void
 isrc_event_destroy(struct intr_irqsrc *isrc)
 {
 	struct intr_event *ie;
 
 	mtx_lock(&isrc_table_lock);
 	ie = isrc->isrc_event;
 	isrc->isrc_event = NULL;
 	mtx_unlock(&isrc_table_lock);
 
 	if (ie != NULL)
 		intr_event_destroy(ie);
 }
 #endif
 /*
  *  Add handler to interrupt source.
  */
 static int
 isrc_add_handler(struct intr_irqsrc *isrc, const char *name,
     driver_filter_t filter, driver_intr_t handler, void *arg,
     enum intr_type flags, void **cookiep)
 {
 	int error;
 
 	if (isrc->isrc_event == NULL) {
 		error = isrc_event_create(isrc);
 		if (error)
 			return (error);
 	}
 
 	error = intr_event_add_handler(isrc->isrc_event, name, filter, handler,
 	    arg, intr_priority(flags), flags, cookiep);
 	if (error == 0) {
 		mtx_lock(&isrc_table_lock);
 		intrcnt_updatename(isrc);
 		mtx_unlock(&isrc_table_lock);
 	}
 
 	return (error);
 }
 
 /*
  *  Lookup interrupt controller locked.
  */
 static inline struct intr_pic *
 pic_lookup_locked(device_t dev, intptr_t xref)
 {
 	struct intr_pic *pic;
 
 	mtx_assert(&pic_list_lock, MA_OWNED);
 
 	if (dev == NULL && xref == 0)
 		return (NULL);
 
 	/* Note that pic->pic_dev is never NULL on registered PIC. */
 	SLIST_FOREACH(pic, &pic_list, pic_next) {
 		if (dev == NULL) {
 			if (xref == pic->pic_xref)
 				return (pic);
 		} else if (xref == 0 || pic->pic_xref == 0) {
 			if (dev == pic->pic_dev)
 				return (pic);
 		} else if (xref == pic->pic_xref && dev == pic->pic_dev)
 				return (pic);
 	}
 	return (NULL);
 }
 
 /*
  *  Lookup interrupt controller.
  */
 static struct intr_pic *
 pic_lookup(device_t dev, intptr_t xref)
 {
 	struct intr_pic *pic;
 
 	mtx_lock(&pic_list_lock);
 	pic = pic_lookup_locked(dev, xref);
 	mtx_unlock(&pic_list_lock);
 	return (pic);
 }
 
 /*
  *  Create interrupt controller.
  */
 static struct intr_pic *
 pic_create(device_t dev, intptr_t xref)
 {
 	struct intr_pic *pic;
 
 	mtx_lock(&pic_list_lock);
 	pic = pic_lookup_locked(dev, xref);
 	if (pic != NULL) {
 		mtx_unlock(&pic_list_lock);
 		return (pic);
 	}
 	pic = malloc(sizeof(*pic), M_INTRNG, M_NOWAIT | M_ZERO);
 	if (pic == NULL) {
 		mtx_unlock(&pic_list_lock);
 		return (NULL);
 	}
 	pic->pic_xref = xref;
 	pic->pic_dev = dev;
 	mtx_init(&pic->pic_child_lock, "pic child lock", NULL, MTX_SPIN);
 	SLIST_INSERT_HEAD(&pic_list, pic, pic_next);
 	mtx_unlock(&pic_list_lock);
 
 	return (pic);
 }
 #ifdef notyet
 /*
  *  Destroy interrupt controller.
  */
 static void
 pic_destroy(device_t dev, intptr_t xref)
 {
 	struct intr_pic *pic;
 
 	mtx_lock(&pic_list_lock);
 	pic = pic_lookup_locked(dev, xref);
 	if (pic == NULL) {
 		mtx_unlock(&pic_list_lock);
 		return;
 	}
 	SLIST_REMOVE(&pic_list, pic, intr_pic, pic_next);
 	mtx_unlock(&pic_list_lock);
 
 	free(pic, M_INTRNG);
 }
 #endif
 /*
  *  Register interrupt controller.
  */
 struct intr_pic *
 intr_pic_register(device_t dev, intptr_t xref)
 {
 	struct intr_pic *pic;
 
 	if (dev == NULL)
 		return (NULL);
 	pic = pic_create(dev, xref);
 	if (pic == NULL)
 		return (NULL);
 
 	pic->pic_flags |= FLAG_PIC;
 
 	debugf("PIC %p registered for %s <dev %p, xref %jx>\n", pic,
 	    device_get_nameunit(dev), dev, (uintmax_t)xref);
 	return (pic);
 }
 
 /*
  *  Unregister interrupt controller.
  */
 int
 intr_pic_deregister(device_t dev, intptr_t xref)
 {
 
 	panic("%s: not implemented", __func__);
 }
 
 /*
  *  Mark interrupt controller (itself) as a root one.
  *
  *  Note that only an interrupt controller can really know its position
  *  in interrupt controller's tree. So root PIC must claim itself as a root.
  *
  *  In FDT case, according to ePAPR approved version 1.1 from 08 April 2011,
  *  page 30:
  *    "The root of the interrupt tree is determined when traversal
  *     of the interrupt tree reaches an interrupt controller node without
  *     an interrupts property and thus no explicit interrupt parent."
  */
 int
 intr_pic_claim_root(device_t dev, intptr_t xref, intr_irq_filter_t *filter,
     void *arg, u_int ipicount)
 {
 	struct intr_pic *pic;
 
 	pic = pic_lookup(dev, xref);
 	if (pic == NULL) {
 		device_printf(dev, "not registered\n");
 		return (EINVAL);
 	}
 
 	KASSERT((pic->pic_flags & FLAG_PIC) != 0,
 	    ("%s: Found a non-PIC controller: %s", __func__,
 	     device_get_name(pic->pic_dev)));
 
 	if (filter == NULL) {
 		device_printf(dev, "filter missing\n");
 		return (EINVAL);
 	}
 
 	/*
 	 * Only one interrupt controller can be on the root for now.
 	 * Note that we further suppose that there is no threaded interrupt
 	 * routine (handler) on the root. See intr_irq_handler().
 	 */
 	if (intr_irq_root_dev != NULL) {
 		device_printf(dev, "another root already set\n");
 		return (EBUSY);
 	}
 
 	intr_irq_root_dev = dev;
 	irq_root_filter = filter;
 	irq_root_arg = arg;
 	irq_root_ipicount = ipicount;
 
 	debugf("irq root set to %s\n", device_get_nameunit(dev));
 	return (0);
 }
 
 /*
  * Add a handler to manage a sub-range of a parent's interrupts.
  */
 struct intr_pic *
 intr_pic_add_handler(device_t parent, struct intr_pic *pic,
     intr_child_irq_filter_t *filter, void *arg, uintptr_t start,
     uintptr_t length)
 {
 	struct intr_pic *parent_pic;
 	struct intr_pic_child *newchild;
 #ifdef INVARIANTS
 	struct intr_pic_child *child;
 #endif
 
 	parent_pic = pic_lookup(parent, 0);
 	if (parent_pic == NULL)
 		return (NULL);
 
 	newchild = malloc(sizeof(*newchild), M_INTRNG, M_WAITOK | M_ZERO);
 	newchild->pc_pic = pic;
 	newchild->pc_filter = filter;
 	newchild->pc_filter_arg = arg;
 	newchild->pc_start = start;
 	newchild->pc_length = length;
 
 	mtx_lock_spin(&parent_pic->pic_child_lock);
 #ifdef INVARIANTS
 	SLIST_FOREACH(child, &parent_pic->pic_children, pc_next) {
 		KASSERT(child->pc_pic != pic, ("%s: Adding a child PIC twice",
 		    __func__));
 	}
 #endif
 	SLIST_INSERT_HEAD(&parent_pic->pic_children, newchild, pc_next);
 	mtx_unlock_spin(&parent_pic->pic_child_lock);
 
 	return (pic);
 }
 
 int
 intr_map_irq(device_t dev, intptr_t xref, struct intr_map_data *data,
     u_int *irqp)
 {
 	int error;
 	struct intr_irqsrc *isrc;
 	struct intr_pic *pic;
 
 	if (data == NULL)
 		return (EINVAL);
 
 	pic = pic_lookup(dev, xref);
 	if (pic == NULL)
 		return (ESRCH);
 
 	KASSERT((pic->pic_flags & FLAG_PIC) != 0,
 	    ("%s: Found a non-PIC controller: %s", __func__,
 	     device_get_name(pic->pic_dev)));
 
 	error = PIC_MAP_INTR(pic->pic_dev, data, &isrc);
 	if (error == 0)
 		*irqp = isrc->isrc_irq;
 	return (error);
 }
 
 int
 intr_alloc_irq(device_t dev, struct resource *res)
 {
 	struct intr_map_data *data;
 	struct intr_irqsrc *isrc;
 
 	KASSERT(rman_get_start(res) == rman_get_end(res),
 	    ("%s: more interrupts in resource", __func__));
 
-	data = rman_get_virtual(res);
-	if (data == NULL)
-		isrc = intr_ddata_lookup(rman_get_start(res), &data);
-	else
-		isrc = isrc_lookup(rman_get_start(res));
+	isrc = isrc_lookup(rman_get_start(res));
 	if (isrc == NULL)
 		return (EINVAL);
 
+	data = rman_get_virtual(res);
 	return (PIC_ALLOC_INTR(isrc->isrc_dev, isrc, res, data));
 }
 
 int
 intr_release_irq(device_t dev, struct resource *res)
 {
 	struct intr_map_data *data;
 	struct intr_irqsrc *isrc;
 
 	KASSERT(rman_get_start(res) == rman_get_end(res),
 	    ("%s: more interrupts in resource", __func__));
 
-	data = rman_get_virtual(res);
-	if (data == NULL)
-		isrc = intr_ddata_lookup(rman_get_start(res), &data);
-	else
-		isrc = isrc_lookup(rman_get_start(res));
+	isrc = isrc_lookup(rman_get_start(res));
 	if (isrc == NULL)
 		return (EINVAL);
 
+	data = rman_get_virtual(res);
 	return (PIC_RELEASE_INTR(isrc->isrc_dev, isrc, res, data));
 }
 
 int
 intr_setup_irq(device_t dev, struct resource *res, driver_filter_t filt,
     driver_intr_t hand, void *arg, int flags, void **cookiep)
 {
 	int error;
 	struct intr_map_data *data;
 	struct intr_irqsrc *isrc;
 	const char *name;
 
 	KASSERT(rman_get_start(res) == rman_get_end(res),
 	    ("%s: more interrupts in resource", __func__));
 
-	data = rman_get_virtual(res);
-	if (data == NULL)
-		isrc = intr_ddata_lookup(rman_get_start(res), &data);
-	else
-		isrc = isrc_lookup(rman_get_start(res));
+	isrc = isrc_lookup(rman_get_start(res));
 	if (isrc == NULL)
 		return (EINVAL);
 
+	data = rman_get_virtual(res);
 	name = device_get_nameunit(dev);
 
 #ifdef INTR_SOLO
 	/*
 	 * Standard handling is done through the MI interrupt framework.
 	 * However, some interrupts may request their own special (solo)
 	 * handling. This non-standard handling can be used for interrupt
 	 * controllers without a handler (filter only), so when interrupt
 	 * controllers are chained, the MI interrupt framework is called only
 	 * in the leaf controller.
 	 *
 	 * Note that the root interrupt controller routine is served as well,
 	 * but in intr_irq_handler(), i.e. the main system dispatch routine.
 	 */
 	if (flags & INTR_SOLO && hand != NULL) {
 		debugf("irq %u cannot solo on %s\n", isrc->isrc_irq, name);
 		return (EINVAL);
 	}
 
 	if (flags & INTR_SOLO) {
 		error = iscr_setup_filter(isrc, name, (intr_irq_filter_t *)filt,
 		    arg, cookiep);
 		debugf("irq %u setup filter error %d on %s\n", isrc->isrc_irq,
 		    error, name);
 	} else
 #endif
 		{
 		error = isrc_add_handler(isrc, name, filt, hand, arg, flags,
 		    cookiep);
 		debugf("irq %u add handler error %d on %s\n", isrc->isrc_irq,
 		    error, name);
 	}
 	if (error != 0)
 		return (error);
 
 	mtx_lock(&isrc_table_lock);
 	error = PIC_SETUP_INTR(isrc->isrc_dev, isrc, res, data);
 	if (error == 0) {
 		isrc->isrc_handlers++;
 		if (isrc->isrc_handlers == 1)
 			PIC_ENABLE_INTR(isrc->isrc_dev, isrc);
 	}
 	mtx_unlock(&isrc_table_lock);
 	if (error != 0)
 		intr_event_remove_handler(*cookiep);
 	return (error);
 }
 
 int
 intr_teardown_irq(device_t dev, struct resource *res, void *cookie)
 {
 	int error;
 	struct intr_map_data *data;
 	struct intr_irqsrc *isrc;
 
 	KASSERT(rman_get_start(res) == rman_get_end(res),
 	    ("%s: more interrupts in resource", __func__));
 
-	data = rman_get_virtual(res);
-	if (data == NULL)
-		isrc = intr_ddata_lookup(rman_get_start(res), &data);
-	else
-		isrc = isrc_lookup(rman_get_start(res));
+	isrc = isrc_lookup(rman_get_start(res));
 	if (isrc == NULL || isrc->isrc_handlers == 0)
 		return (EINVAL);
 
+	data = rman_get_virtual(res);
+
 #ifdef INTR_SOLO
 	if (isrc->isrc_filter != NULL) {
 		if (isrc != cookie)
 			return (EINVAL);
 
 		mtx_lock(&isrc_table_lock);
 		isrc->isrc_filter = NULL;
 		isrc->isrc_arg = NULL;
 		isrc->isrc_handlers = 0;
 		PIC_DISABLE_INTR(isrc->isrc_dev, isrc);
 		PIC_TEARDOWN_INTR(isrc->isrc_dev, isrc, res, data);
 		isrc_update_name(isrc, NULL);
 		mtx_unlock(&isrc_table_lock);
 		return (0);
 	}
 #endif
 	if (isrc != intr_handler_source(cookie))
 		return (EINVAL);
 
 	error = intr_event_remove_handler(cookie);
 	if (error == 0) {
 		mtx_lock(&isrc_table_lock);
 		isrc->isrc_handlers--;
 		if (isrc->isrc_handlers == 0)
 			PIC_DISABLE_INTR(isrc->isrc_dev, isrc);
 		PIC_TEARDOWN_INTR(isrc->isrc_dev, isrc, res, data);
 		intrcnt_updatename(isrc);
 		mtx_unlock(&isrc_table_lock);
 	}
 	return (error);
 }
 
 int
 intr_describe_irq(device_t dev, struct resource *res, void *cookie,
     const char *descr)
 {
 	int error;
 	struct intr_irqsrc *isrc;
 
 	KASSERT(rman_get_start(res) == rman_get_end(res),
 	    ("%s: more interrupts in resource", __func__));
 
-	isrc = intr_ddata_lookup(rman_get_start(res), NULL);
+	isrc = isrc_lookup(rman_get_start(res));
 	if (isrc == NULL || isrc->isrc_handlers == 0)
 		return (EINVAL);
 #ifdef INTR_SOLO
 	if (isrc->isrc_filter != NULL) {
 		if (isrc != cookie)
 			return (EINVAL);
 
 		mtx_lock(&isrc_table_lock);
 		isrc_update_name(isrc, descr);
 		mtx_unlock(&isrc_table_lock);
 		return (0);
 	}
 #endif
 	error = intr_event_describe_handler(isrc->isrc_event, cookie, descr);
 	if (error == 0) {
 		mtx_lock(&isrc_table_lock);
 		intrcnt_updatename(isrc);
 		mtx_unlock(&isrc_table_lock);
 	}
 	return (error);
 }
 
 #ifdef SMP
 int
 intr_bind_irq(device_t dev, struct resource *res, int cpu)
 {
 	struct intr_irqsrc *isrc;
 
 	KASSERT(rman_get_start(res) == rman_get_end(res),
 	    ("%s: more interrupts in resource", __func__));
 
-	isrc = intr_ddata_lookup(rman_get_start(res), NULL);
+	isrc = isrc_lookup(rman_get_start(res));
 	if (isrc == NULL || isrc->isrc_handlers == 0)
 		return (EINVAL);
 #ifdef INTR_SOLO
 	if (isrc->isrc_filter != NULL)
 		return (intr_isrc_assign_cpu(isrc, cpu));
 #endif
 	return (intr_event_bind(isrc->isrc_event, cpu));
 }
 
 /*
  * Return the CPU that the next interrupt source should use.
  * For now just returns the next CPU according to round-robin.
  */
 u_int
 intr_irq_next_cpu(u_int last_cpu, cpuset_t *cpumask)
 {
 
 	if (!irq_assign_cpu || mp_ncpus == 1)
 		return (PCPU_GET(cpuid));
 
 	do {
 		last_cpu++;
 		if (last_cpu > mp_maxid)
 			last_cpu = 0;
 	} while (!CPU_ISSET(last_cpu, cpumask));
 	return (last_cpu);
 }
 
 /*
  *  Distribute all the interrupt sources among the available
  *  CPUs once the AP's have been launched.
  */
 static void
 intr_irq_shuffle(void *arg __unused)
 {
 	struct intr_irqsrc *isrc;
 	u_int i;
 
 	if (mp_ncpus == 1)
 		return;
 
 	mtx_lock(&isrc_table_lock);
 	irq_assign_cpu = TRUE;
 	for (i = 0; i < NIRQ; i++) {
 		isrc = irq_sources[i];
 		if (isrc == NULL || isrc->isrc_handlers == 0 ||
 		    isrc->isrc_flags & (INTR_ISRCF_PPI | INTR_ISRCF_IPI))
 			continue;
 
 		if (isrc->isrc_event != NULL &&
 		    isrc->isrc_flags & INTR_ISRCF_BOUND &&
 		    isrc->isrc_event->ie_cpu != CPU_FFS(&isrc->isrc_cpu) - 1)
 			panic("%s: CPU inconsistency", __func__);
 
 		if ((isrc->isrc_flags & INTR_ISRCF_BOUND) == 0)
 			CPU_ZERO(&isrc->isrc_cpu); /* start again */
 
 		/*
 		 * We are in a wicked position here if the following call fails
 		 * for a bound ISRC. The best thing we can do is to clear
 		 * isrc_cpu so inconsistency with ie_cpu will be detectable.
 		 */
 		if (PIC_BIND_INTR(isrc->isrc_dev, isrc) != 0)
 			CPU_ZERO(&isrc->isrc_cpu);
 	}
 	mtx_unlock(&isrc_table_lock);
 }
 SYSINIT(intr_irq_shuffle, SI_SUB_SMP, SI_ORDER_SECOND, intr_irq_shuffle, NULL);
 
 #else
 u_int
 intr_irq_next_cpu(u_int current_cpu, cpuset_t *cpumask)
 {
 
 	return (PCPU_GET(cpuid));
 }
 #endif
 
 /*
  *  Register an MSI/MSI-X interrupt controller
  */
 int
 intr_msi_register(device_t dev, intptr_t xref)
 {
 	struct intr_pic *pic;
 
 	if (dev == NULL)
 		return (EINVAL);
 	pic = pic_create(dev, xref);
 	if (pic == NULL)
 		return (ENOMEM);
 
 	pic->pic_flags |= FLAG_MSI;
 
 	debugf("PIC %p registered for %s <dev %p, xref %jx>\n", pic,
 	    device_get_nameunit(dev), dev, (uintmax_t)xref);
 	return (0);
 }
 
 int
 intr_alloc_msi(device_t pci, device_t child, intptr_t xref, int count,
     int maxcount, int *irqs)
 {
 	struct intr_irqsrc **isrc;
 	struct intr_pic *pic;
 	device_t pdev;
 	int err, i;
 
 	pic = pic_lookup(NULL, xref);
 	if (pic == NULL)
 		return (ESRCH);
 
 	KASSERT((pic->pic_flags & FLAG_MSI) != 0,
 	    ("%s: Found a non-MSI controller: %s", __func__,
 	     device_get_name(pic->pic_dev)));
 
 	isrc = malloc(sizeof(*isrc) * count, M_INTRNG, M_WAITOK);
 	err = MSI_ALLOC_MSI(pic->pic_dev, child, count, maxcount, &pdev, isrc);
 	if (err == 0) {
 		for (i = 0; i < count; i++) {
 			irqs[i] = isrc[i]->isrc_irq;
 		}
 	}
 
 	free(isrc, M_INTRNG);
 
 	return (err);
 }
 
 int
 intr_release_msi(device_t pci, device_t child, intptr_t xref, int count,
     int *irqs)
 {
 	struct intr_irqsrc **isrc;
 	struct intr_pic *pic;
 	int i, err;
 
 	pic = pic_lookup(NULL, xref);
 	if (pic == NULL)
 		return (ESRCH);
 
 	KASSERT((pic->pic_flags & FLAG_MSI) != 0,
 	    ("%s: Found a non-MSI controller: %s", __func__,
 	     device_get_name(pic->pic_dev)));
 
 	isrc = malloc(sizeof(*isrc) * count, M_INTRNG, M_WAITOK);
 
 	for (i = 0; i < count; i++) {
 		isrc[i] = isrc_lookup(irqs[i]);
 		if (isrc[i] == NULL) {
 			free(isrc, M_INTRNG);
 			return (EINVAL);
 		}
 	}
 
 	err = MSI_RELEASE_MSI(pic->pic_dev, child, count, isrc);
 	free(isrc, M_INTRNG);
 	return (err);
 }
 
 int
 intr_alloc_msix(device_t pci, device_t child, intptr_t xref, int *irq)
 {
 	struct intr_irqsrc *isrc;
 	struct intr_pic *pic;
 	device_t pdev;
 	int err;
 
 	pic = pic_lookup(NULL, xref);
 	if (pic == NULL)
 		return (ESRCH);
 
 	KASSERT((pic->pic_flags & FLAG_MSI) != 0,
 	    ("%s: Found a non-MSI controller: %s", __func__,
 	     device_get_name(pic->pic_dev)));
 
 	err = MSI_ALLOC_MSIX(pic->pic_dev, child, &pdev, &isrc);
 	if (err != 0)
 		return (err);
 
 	*irq = isrc->isrc_irq;
 	return (0);
 }
 
 int
 intr_release_msix(device_t pci, device_t child, intptr_t xref, int irq)
 {
 	struct intr_irqsrc *isrc;
 	struct intr_pic *pic;
 	int err;
 
 	pic = pic_lookup(NULL, xref);
 	if (pic == NULL)
 		return (ESRCH);
 
 	KASSERT((pic->pic_flags & FLAG_MSI) != 0,
 	    ("%s: Found a non-MSI controller: %s", __func__,
 	     device_get_name(pic->pic_dev)));
 
 	isrc = isrc_lookup(irq);
 	if (isrc == NULL)
 		return (EINVAL);
 
 	err = MSI_RELEASE_MSIX(pic->pic_dev, child, isrc);
 	return (err);
 }
 
 int
 intr_map_msi(device_t pci, device_t child, intptr_t xref, int irq,
     uint64_t *addr, uint32_t *data)
 {
 	struct intr_irqsrc *isrc;
 	struct intr_pic *pic;
 	int err;
 
 	pic = pic_lookup(NULL, xref);
 	if (pic == NULL)
 		return (ESRCH);
 
 	KASSERT((pic->pic_flags & FLAG_MSI) != 0,
 	    ("%s: Found a non-MSI controller: %s", __func__,
 	     device_get_name(pic->pic_dev)));
 
 	isrc = isrc_lookup(irq);
 	if (isrc == NULL)
 		return (EINVAL);
 
 	err = MSI_MAP_MSI(pic->pic_dev, child, isrc, addr, data);
 	return (err);
 }
 
 
 void dosoftints(void);
 void
 dosoftints(void)
 {
 }
 
 #ifdef SMP
 /*
  *  Init interrupt controller on another CPU.
  */
 void
 intr_pic_init_secondary(void)
 {
 
 	/*
 	 * QQQ: Only root PIC is aware of other CPUs ???
 	 */
 	KASSERT(intr_irq_root_dev != NULL, ("%s: no root attached", __func__));
 
 	//mtx_lock(&isrc_table_lock);
 	PIC_INIT_SECONDARY(intr_irq_root_dev);
 	//mtx_unlock(&isrc_table_lock);
 }
 #endif
 
 #ifdef DDB
 DB_SHOW_COMMAND(irqs, db_show_irqs)
 {
 	u_int i, irqsum;
 	u_long num;
 	struct intr_irqsrc *isrc;
 
 	for (irqsum = 0, i = 0; i < NIRQ; i++) {
 		isrc = irq_sources[i];
 		if (isrc == NULL)
 			continue;
 
 		num = isrc->isrc_count != NULL ? isrc->isrc_count[0] : 0;
 		db_printf("irq%-3u <%s>: cpu %02lx%s cnt %lu\n", i,
 		    isrc->isrc_name, isrc->isrc_cpu.__bits[0],
 		    isrc->isrc_flags & INTR_ISRCF_BOUND ? " (bound)" : "", num);
 		irqsum += num;
 	}
 	db_printf("irq total %u\n", irqsum);
 }
 #endif
Index: projects/vnet/sys/net/flowtable.c
===================================================================
--- projects/vnet/sys/net/flowtable.c	(revision 301546)
+++ projects/vnet/sys/net/flowtable.c	(revision 301547)
@@ -1,1184 +1,1184 @@
 /*-
  * Copyright (c) 2014 Gleb Smirnoff <glebius@FreeBSD.org>
  * Copyright (c) 2008-2010, BitGravity Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
  *
  *  1. Redistributions of source code must retain the above copyright notice,
  *     this list of conditions and the following disclaimer.
  *
  *  2. Neither the name of the BitGravity Corporation nor the names of its
  *     contributors may be used to endorse or promote products derived from
  *     this software without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
 #include "opt_route.h"
 #include "opt_mpath.h"
 #include "opt_ddb.h"
 #include "opt_inet.h"
 #include "opt_inet6.h"
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include <sys/param.h>
 #include <sys/types.h>
 #include <sys/bitstring.h>
 #include <sys/condvar.h>
 #include <sys/callout.h>
 #include <sys/hash.h>
 #include <sys/kernel.h>
 #include <sys/kthread.h>
 #include <sys/limits.h>
 #include <sys/malloc.h>
 #include <sys/mbuf.h>
 #include <sys/pcpu.h>
 #include <sys/proc.h>
 #include <sys/queue.h>
 #include <sys/sbuf.h>
 #include <sys/sched.h>
 #include <sys/smp.h>
 #include <sys/socket.h>
 #include <sys/syslog.h>
 #include <sys/sysctl.h>
 #include <vm/uma.h>
 
 #include <net/if.h>
 #include <net/if_llatbl.h>
 #include <net/if_var.h>
 #include <net/route.h>
 #include <net/flowtable.h>
 #include <net/vnet.h>
 
 #include <netinet/in.h>
 #include <netinet/in_systm.h>
 #include <netinet/in_var.h>
 #include <netinet/if_ether.h>
 #include <netinet/ip.h>
 #ifdef INET6
 #include <netinet/ip6.h>
 #endif
 #ifdef FLOWTABLE_HASH_ALL
 #include <netinet/tcp.h>
 #include <netinet/udp.h>
 #include <netinet/sctp.h>
 #endif
 
 #include <ddb/ddb.h>
 
 #ifdef	FLOWTABLE_HASH_ALL
 #define	KEY_PORTS	(sizeof(uint16_t) * 2)
 #define	KEY_ADDRS	2
 #else
 #define	KEY_PORTS	0
 #define	KEY_ADDRS	1
 #endif
 
 #ifdef	INET6
 #define	KEY_ADDR_LEN	sizeof(struct in6_addr)
 #else
 #define	KEY_ADDR_LEN	sizeof(struct in_addr)
 #endif
 
 #define	KEYLEN	((KEY_ADDR_LEN * KEY_ADDRS + KEY_PORTS) / sizeof(uint32_t))
 
 struct flentry {
 	uint32_t		f_hash;		/* hash flowing forward */
 	uint32_t		f_key[KEYLEN];	/* address(es and ports) */
 	uint32_t		f_uptime;	/* uptime at last access */
 	uint16_t		f_fibnum;	/* fib index */
 #ifdef FLOWTABLE_HASH_ALL
 	uint8_t			f_proto;	/* protocol */
 	uint8_t			f_flags;	/* stale? */
 #define FL_STALE 		1
 #endif
 	SLIST_ENTRY(flentry)	f_next;		/* pointer to collision entry */
 	struct rtentry		*f_rt;		/* rtentry for flow */
 	struct llentry		*f_lle;		/* llentry for flow */
 };
 #undef KEYLEN
 
 SLIST_HEAD(flist, flentry);
 /* Make sure we can use pcpu_zone_ptr for struct flist. */
 CTASSERT(sizeof(struct flist) == sizeof(void *));
 
 struct flowtable {
 	counter_u64_t	*ft_stat;
 	int 		ft_size;
 	/*
 	 * ft_table is a malloc(9)ed array of pointers.  Pointers point to
 	 * memory from UMA_ZONE_PCPU zone.
 	 * ft_masks is per-cpu pointer itself.  Each instance points
 	 * to a malloc(9)ed bitset, that is private to corresponding CPU.
 	 */
 	struct flist	**ft_table;
 	bitstr_t 	**ft_masks;
 	bitstr_t	*ft_tmpmask;
 };
 
 #define	FLOWSTAT_ADD(ft, name, v)	\
 	counter_u64_add((ft)->ft_stat[offsetof(struct flowtable_stat, name) / sizeof(uint64_t)], (v))
 #define	FLOWSTAT_INC(ft, name)	FLOWSTAT_ADD(ft, name, 1)
 
 static struct proc *flowcleanerproc;
 static uint32_t flow_hashjitter;
 
 static struct cv 	flowclean_f_cv;
 static struct cv 	flowclean_c_cv;
 static struct mtx	flowclean_lock;
 static uint32_t		flowclean_cycles;
 
 /*
  * TODO:
  * - add sysctls to resize && flush flow tables
  * - Add per flowtable sysctls for statistics and configuring timeouts
  * - add saturation counter to rtentry to support per-packet load-balancing
  *   add flag to indicate round-robin flow, add list lookup from head
  *   for flows
  * - add sysctl / device node / syscall to support exporting and importing
  *   of flows with flag to indicate that a flow was imported so should
  *   not be considered for auto-cleaning
  * - support explicit connection state (currently only ad-hoc for DSR)
  * - idetach() cleanup for options VIMAGE builds.
  */
 #ifdef INET
 static VNET_DEFINE(struct flowtable, ip4_ft);
 #define	V_ip4_ft	VNET(ip4_ft)
 #endif
 #ifdef INET6
 static VNET_DEFINE(struct flowtable, ip6_ft);
 #define	V_ip6_ft	VNET(ip6_ft)
 #endif
 
 static uma_zone_t flow_zone;
 
 static VNET_DEFINE(int, flowtable_enable) = 1;
 #define	V_flowtable_enable		VNET(flowtable_enable)
 
 static SYSCTL_NODE(_net, OID_AUTO, flowtable, CTLFLAG_RD, NULL,
     "flowtable");
 SYSCTL_INT(_net_flowtable, OID_AUTO, enable, CTLFLAG_VNET | CTLFLAG_RW,
     &VNET_NAME(flowtable_enable), 0, "enable flowtable caching.");
 SYSCTL_UMA_MAX(_net_flowtable, OID_AUTO, maxflows, CTLFLAG_RW,
     &flow_zone, "Maximum number of flows allowed");
 
 static MALLOC_DEFINE(M_FTABLE, "flowtable", "flowtable hashes and bitstrings");
 
 static struct flentry *
 flowtable_lookup_common(struct flowtable *, uint32_t *, int, uint32_t);
 
 #ifdef INET
 static struct flentry *
 flowtable_lookup_ipv4(struct mbuf *m, struct route *ro)
 {
 	struct flentry *fle;
 	struct sockaddr_in *sin;
 	struct ip *ip;
 	uint32_t fibnum;
 #ifdef FLOWTABLE_HASH_ALL
 	uint32_t key[3];
 	int iphlen;
 	uint16_t sport, dport;
 	uint8_t proto;
 #endif
 
 	ip = mtod(m, struct ip *);
 
 	if (ip->ip_src.s_addr == ip->ip_dst.s_addr ||
 	    (ntohl(ip->ip_dst.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET ||
 	    (ntohl(ip->ip_src.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET)
 		return (NULL);
 
 	fibnum = M_GETFIB(m);
 
 #ifdef FLOWTABLE_HASH_ALL
 	iphlen = ip->ip_hl << 2;
 	proto = ip->ip_p;
 
 	switch (proto) {
 	case IPPROTO_TCP: {
 		struct tcphdr *th;
 
 		th = (struct tcphdr *)((char *)ip + iphlen);
 		sport = th->th_sport;
 		dport = th->th_dport;
 		if (th->th_flags & (TH_RST|TH_FIN))
 			fibnum |= (FL_STALE << 24);
 		break;
 	}
 	case IPPROTO_UDP: {
 		struct udphdr *uh;
 
 		uh = (struct udphdr *)((char *)ip + iphlen);
 		sport = uh->uh_sport;
 		dport = uh->uh_dport;
 		break;
 	}
 	case IPPROTO_SCTP: {
 		struct sctphdr *sh;
 
 		sh = (struct sctphdr *)((char *)ip + iphlen);
 		sport = sh->src_port;
 		dport = sh->dest_port;
 		/* XXXGL: handle stale? */
 		break;
 	}
 	default:
 		sport = dport = 0;
 		break;
 	}
 
 	key[0] = ip->ip_dst.s_addr;
 	key[1] = ip->ip_src.s_addr;
 	key[2] = (dport << 16) | sport;
 	fibnum |= proto << 16;
 
 	fle = flowtable_lookup_common(&V_ip4_ft, key, 3 * sizeof(uint32_t),
 	    fibnum);
 
 #else	/* !FLOWTABLE_HASH_ALL */
 
 	fle = flowtable_lookup_common(&V_ip4_ft, (uint32_t *)&ip->ip_dst,
 	    sizeof(struct in_addr), fibnum);
 
 #endif	/* FLOWTABLE_HASH_ALL */
 
 	if (fle == NULL)
 		return (NULL);
 
 	sin = (struct sockaddr_in *)&ro->ro_dst;
 	sin->sin_family = AF_INET;
 	sin->sin_len = sizeof(*sin);
 	sin->sin_addr = ip->ip_dst;
 
 	return (fle);
 }
 #endif /* INET */
 
 #ifdef INET6
 /*
  * PULLUP_TO(len, p, T) makes sure that len + sizeof(T) is contiguous,
  * then it sets p to point at the offset "len" in the mbuf. WARNING: the
  * pointer might become stale after other pullups (but we never use it
  * this way).
  */
 #define PULLUP_TO(_len, p, T)						\
 do {									\
 	int x = (_len) + sizeof(T);					\
 	if ((m)->m_len < x)						\
 		return (NULL);						\
 	p = (mtod(m, char *) + (_len));					\
 } while (0)
 
 #define	TCP(p)		((struct tcphdr *)(p))
 #define	SCTP(p)		((struct sctphdr *)(p))
 #define	UDP(p)		((struct udphdr *)(p))
 
 static struct flentry *
 flowtable_lookup_ipv6(struct mbuf *m, struct route *ro)
 {
 	struct flentry *fle;
 	struct sockaddr_in6 *sin6;
 	struct ip6_hdr *ip6;
 	uint32_t fibnum;
 #ifdef FLOWTABLE_HASH_ALL
 	uint32_t key[9];
 	void *ulp;
 	int hlen;
 	uint16_t sport, dport;
 	u_short offset;
 	uint8_t proto;
 #else
 	uint32_t key[4];
 #endif
 
 	ip6 = mtod(m, struct ip6_hdr *);
 	if (in6_localaddr(&ip6->ip6_dst))
 		return (NULL);
 
 	fibnum = M_GETFIB(m);
 
 #ifdef	FLOWTABLE_HASH_ALL
 	hlen = sizeof(struct ip6_hdr);
 	proto = ip6->ip6_nxt;
 	offset = sport = dport = 0;
 	ulp = NULL;
 	while (ulp == NULL) {
 		switch (proto) {
 		case IPPROTO_ICMPV6:
 		case IPPROTO_OSPFIGP:
 		case IPPROTO_PIM:
 		case IPPROTO_CARP:
 		case IPPROTO_ESP:
 		case IPPROTO_NONE:
 			ulp = ip6;
 			break;
 		case IPPROTO_TCP:
 			PULLUP_TO(hlen, ulp, struct tcphdr);
 			dport = TCP(ulp)->th_dport;
 			sport = TCP(ulp)->th_sport;
 			if (TCP(ulp)->th_flags & (TH_RST|TH_FIN))
 				fibnum |= (FL_STALE << 24);
 			break;
 		case IPPROTO_SCTP:
 			PULLUP_TO(hlen, ulp, struct sctphdr);
 			dport = SCTP(ulp)->src_port;
 			sport = SCTP(ulp)->dest_port;
 			/* XXXGL: handle stale? */
 			break;
 		case IPPROTO_UDP:
 			PULLUP_TO(hlen, ulp, struct udphdr);
 			dport = UDP(ulp)->uh_dport;
 			sport = UDP(ulp)->uh_sport;
 			break;
 		case IPPROTO_HOPOPTS:	/* RFC 2460 */
 			PULLUP_TO(hlen, ulp, struct ip6_hbh);
 			hlen += (((struct ip6_hbh *)ulp)->ip6h_len + 1) << 3;
 			proto = ((struct ip6_hbh *)ulp)->ip6h_nxt;
 			ulp = NULL;
 			break;
 		case IPPROTO_ROUTING:	/* RFC 2460 */
 			PULLUP_TO(hlen, ulp, struct ip6_rthdr);
 			hlen += (((struct ip6_rthdr *)ulp)->ip6r_len + 1) << 3;
 			proto = ((struct ip6_rthdr *)ulp)->ip6r_nxt;
 			ulp = NULL;
 			break;
 		case IPPROTO_FRAGMENT:	/* RFC 2460 */
 			PULLUP_TO(hlen, ulp, struct ip6_frag);
 			hlen += sizeof (struct ip6_frag);
 			proto = ((struct ip6_frag *)ulp)->ip6f_nxt;
 			offset = ((struct ip6_frag *)ulp)->ip6f_offlg &
 			    IP6F_OFF_MASK;
 			ulp = NULL;
 			break;
 		case IPPROTO_DSTOPTS:	/* RFC 2460 */
 			PULLUP_TO(hlen, ulp, struct ip6_hbh);
 			hlen += (((struct ip6_hbh *)ulp)->ip6h_len + 1) << 3;
 			proto = ((struct ip6_hbh *)ulp)->ip6h_nxt;
 			ulp = NULL;
 			break;
 		case IPPROTO_AH:	/* RFC 2402 */
 			PULLUP_TO(hlen, ulp, struct ip6_ext);
 			hlen += (((struct ip6_ext *)ulp)->ip6e_len + 2) << 2;
 			proto = ((struct ip6_ext *)ulp)->ip6e_nxt;
 			ulp = NULL;
 			break;
 		default:
 			PULLUP_TO(hlen, ulp, struct ip6_ext);
 			break;
 		}
 	}
 
 	bcopy(&ip6->ip6_dst, &key[0], sizeof(struct in6_addr));
 	bcopy(&ip6->ip6_src, &key[4], sizeof(struct in6_addr));
 	key[8] = (dport << 16) | sport;
 	fibnum |= proto << 16;
 
 	fle = flowtable_lookup_common(&V_ip6_ft, key, 9 * sizeof(uint32_t),
 	    fibnum);
 #else	/* !FLOWTABLE_HASH_ALL */
 	bcopy(&ip6->ip6_dst, &key[0], sizeof(struct in6_addr));
 	fle = flowtable_lookup_common(&V_ip6_ft, key, sizeof(struct in6_addr),
 	    fibnum);
 #endif	/* FLOWTABLE_HASH_ALL */
 
 	if (fle == NULL)
 		return (NULL);
 
 	sin6 = (struct sockaddr_in6 *)&ro->ro_dst;
 	sin6->sin6_family = AF_INET6;
 	sin6->sin6_len = sizeof(*sin6);
 	bcopy(&ip6->ip6_dst, &sin6->sin6_addr, sizeof(struct in6_addr));
 
 	return (fle);
 }
 #endif /* INET6 */
 
 static bitstr_t *
 flowtable_mask(struct flowtable *ft)
 {
 
 	/*
 	 * flowtable_free_stale() calls w/o critical section, but
 	 * with sched_bind(). Since pointer is stable throughout
 	 * ft lifetime, it is safe, otherwise...
 	 *
 	 * CRITICAL_ASSERT(curthread);
 	 */
 
 	return (*(bitstr_t **)zpcpu_get(ft->ft_masks));
 }
 
 static struct flist *
 flowtable_list(struct flowtable *ft, uint32_t hash)
 {
 
 	CRITICAL_ASSERT(curthread);
 	return (zpcpu_get(ft->ft_table[hash % ft->ft_size]));
 }
 
 static int
 flow_stale(struct flowtable *ft, struct flentry *fle, int maxidle)
 {
 
 	if (((fle->f_rt->rt_flags & RTF_UP) == 0) ||
 	    (fle->f_rt->rt_ifp == NULL) ||
 	    !RT_LINK_IS_UP(fle->f_rt->rt_ifp) ||
 	    (fle->f_lle->la_flags & LLE_VALID) == 0)
 		return (1);
 
 	if (time_uptime - fle->f_uptime > maxidle)
 		return (1);
 
 #ifdef FLOWTABLE_HASH_ALL
 	if (fle->f_flags & FL_STALE)
 		return (1);
 #endif
 
 	return (0);
 }
 
 static int
 flow_full(void)
 {
 	int count, max;
 
 	count = uma_zone_get_cur(flow_zone);
 	max = uma_zone_get_max(flow_zone);
 
 	return (count > (max - (max >> 3)));
 }
 
 static int
 flow_matches(struct flentry *fle, uint32_t *key, int keylen, uint32_t fibnum)
 {
 #ifdef FLOWTABLE_HASH_ALL
 	uint8_t proto;
 
 	proto = (fibnum >> 16) & 0xff;
 	fibnum &= 0xffff;
 #endif
 
 	CRITICAL_ASSERT(curthread);
 
 	/* Microoptimization for IPv4: don't use bcmp(). */
 	if (((keylen == sizeof(uint32_t) && (fle->f_key[0] == key[0])) ||
 	    (bcmp(fle->f_key, key, keylen) == 0)) &&
 	    fibnum == fle->f_fibnum &&
 #ifdef FLOWTABLE_HASH_ALL
 	    proto == fle->f_proto &&
 #endif
 	    (fle->f_rt->rt_flags & RTF_UP) &&
 	    fle->f_rt->rt_ifp != NULL &&
 	    (fle->f_lle->la_flags & LLE_VALID))
 		return (1);
 
 	return (0);
 }
 
 static struct flentry *
 flowtable_insert(struct flowtable *ft, uint32_t hash, uint32_t *key,
     int keylen, uint32_t fibnum0)
 {
 #ifdef INET6
 	struct route_in6 sro6;
 #endif
 #ifdef INET
 	struct route sro;
 #endif
 	struct route *ro = NULL;
 	struct rtentry *rt;
 	struct lltable *lt = NULL;
 	struct llentry *lle;
 	struct sockaddr_storage *l3addr;
 	struct ifnet *ifp;
 	struct flist *flist;
 	struct flentry *fle, *iter;
 	bitstr_t *mask;
 	uint16_t fibnum = fibnum0;
 #ifdef FLOWTABLE_HASH_ALL
 	uint8_t proto;
 
 	proto = (fibnum0 >> 16) & 0xff;
 	fibnum = fibnum0 & 0xffff;
 #endif
 
 	/*
 	 * This bit of code ends up locking the
 	 * same route 3 times (just like ip_output + ether_output)
 	 * - at lookup
 	 * - in rt_check when called by arpresolve
 	 * - dropping the refcount for the rtentry
 	 *
 	 * This could be consolidated to one if we wrote a variant
 	 * of arpresolve with an rt_check variant that expected to
 	 * receive the route locked
 	 */
 #ifdef INET
 	if (ft == &V_ip4_ft) {
 		struct sockaddr_in *sin;
 
 		ro = &sro;
 		bzero(&sro.ro_dst, sizeof(sro.ro_dst));
 
 		sin = (struct sockaddr_in *)&sro.ro_dst;
 		sin->sin_family = AF_INET;
 		sin->sin_len = sizeof(*sin);
 		sin->sin_addr.s_addr = key[0];
 	}
 #endif
 #ifdef INET6
 	if (ft == &V_ip6_ft) {
 		struct sockaddr_in6 *sin6;
 
 		ro = (struct route *)&sro6;
 		sin6 = &sro6.ro_dst;
 
 		bzero(sin6, sizeof(*sin6));
 		sin6->sin6_family = AF_INET6;
 		sin6->sin6_len = sizeof(*sin6);
 		bcopy(key, &sin6->sin6_addr, sizeof(struct in6_addr));
 	}
 #endif
 
 	ro->ro_rt = NULL;
 #ifdef RADIX_MPATH
 	rtalloc_mpath_fib(ro, hash, fibnum);
 #else
 	rtalloc_ign_fib(ro, 0, fibnum);
 #endif
 	if (ro->ro_rt == NULL)
 		return (NULL);
 
 	rt = ro->ro_rt;
 	ifp = rt->rt_ifp;
 
 	if (ifp->if_flags & (IFF_POINTOPOINT | IFF_LOOPBACK)) {
 		RTFREE(rt);
 		return (NULL);
 	}
 
 #ifdef INET
 	if (ft == &V_ip4_ft)
 		lt = LLTABLE(ifp);
 #endif
 #ifdef INET6
 	if (ft == &V_ip6_ft)
 		lt = LLTABLE6(ifp);
 #endif
 
 	if (rt->rt_flags & RTF_GATEWAY)
 		l3addr = (struct sockaddr_storage *)rt->rt_gateway;
 	else
 		l3addr = (struct sockaddr_storage *)&ro->ro_dst;
 	lle = llentry_alloc(ifp, lt, l3addr);
 
 	if (lle == NULL) {
 		RTFREE(rt);
 		return (NULL);
 	}
 
 	/* Don't insert the entry if the ARP hasn't yet finished resolving. */
 	if ((lle->la_flags & LLE_VALID) == 0) {
 		RTFREE(rt);
 		LLE_FREE(lle);
 		FLOWSTAT_INC(ft, ft_fail_lle_invalid);
 		return (NULL);
 	}
 
 	fle = uma_zalloc(flow_zone, M_NOWAIT | M_ZERO);
 	if (fle == NULL) {
 		RTFREE(rt);
 		LLE_FREE(lle);
 		return (NULL);
 	}
 
 	fle->f_hash = hash;
 	bcopy(key, &fle->f_key, keylen);
 	fle->f_rt = rt;
 	fle->f_lle = lle;
 	fle->f_fibnum = fibnum;
 	fle->f_uptime = time_uptime;
 #ifdef FLOWTABLE_HASH_ALL
 	fle->f_proto = proto;
 	fle->f_flags = fibnum0 >> 24;
 #endif
 
 	critical_enter();
 	mask = flowtable_mask(ft);
 	flist = flowtable_list(ft, hash);
 
 	if (SLIST_EMPTY(flist)) {
 		bit_set(mask, (hash % ft->ft_size));
 		SLIST_INSERT_HEAD(flist, fle, f_next);
 		goto skip;
 	}
 
 	/*
 	 * find end of list and make sure that we were not
 	 * preempted by another thread handling this flow
 	 */
 	SLIST_FOREACH(iter, flist, f_next) {
 		KASSERT(iter->f_hash % ft->ft_size == hash % ft->ft_size,
 		    ("%s: wrong hash", __func__));
 		if (flow_matches(iter, key, keylen, fibnum)) {
 			/*
 			 * We probably migrated to an other CPU after
 			 * lookup in flowtable_lookup_common() failed.
 			 * It appeared that this CPU already has flow
 			 * entry.
 			 */
 			iter->f_uptime = time_uptime;
 #ifdef FLOWTABLE_HASH_ALL
 			iter->f_flags |= fibnum >> 24;
 #endif
 			critical_exit();
 			FLOWSTAT_INC(ft, ft_collisions);
 			uma_zfree(flow_zone, fle);
 			return (iter);
 		}
 	}
 
 	SLIST_INSERT_HEAD(flist, fle, f_next);
 skip:
 	critical_exit();
 	FLOWSTAT_INC(ft, ft_inserts);
 
 	return (fle);
 }
 
 int
 flowtable_lookup(sa_family_t sa, struct mbuf *m, struct route *ro)
 {
 	struct flentry *fle;
 	struct llentry *lle;
 
 	if (V_flowtable_enable == 0)
 		return (ENXIO);
 
 	switch (sa) {
 #ifdef INET
 	case AF_INET:
 		fle = flowtable_lookup_ipv4(m, ro);
 		break;
 #endif
 #ifdef INET6
 	case AF_INET6:
 		fle = flowtable_lookup_ipv6(m, ro);
 		break;
 #endif
 	default:
 		panic("%s: sa %d", __func__, sa);
 	}
 
 	if (fle == NULL)
 		return (EHOSTUNREACH);
 
 	if (M_HASHTYPE_GET(m) == M_HASHTYPE_NONE) {
-		M_HASHTYPE_SET(m, M_HASHTYPE_OPAQUE);
+		M_HASHTYPE_SET(m, M_HASHTYPE_OPAQUE_HASH);
 		m->m_pkthdr.flowid = fle->f_hash;
 	}
 
 	ro->ro_rt = fle->f_rt;
 	ro->ro_flags |= RT_NORTREF;
 	lle = fle->f_lle;
 	if (lle != NULL && (lle->la_flags & LLE_VALID))
 		ro->ro_lle = lle;	/* share ref with fle->f_lle */
 
 	return (0);
 }
 
 static struct flentry *
 flowtable_lookup_common(struct flowtable *ft, uint32_t *key, int keylen,
     uint32_t fibnum)
 {
 	struct flist *flist;
 	struct flentry *fle;
 	uint32_t hash;
 
 	FLOWSTAT_INC(ft, ft_lookups);
 
 	hash = jenkins_hash32(key, keylen / sizeof(uint32_t), flow_hashjitter);
 
 	critical_enter();
 	flist = flowtable_list(ft, hash);
 	SLIST_FOREACH(fle, flist, f_next) {
 		KASSERT(fle->f_hash % ft->ft_size == hash % ft->ft_size,
 		    ("%s: wrong hash", __func__));
 		if (flow_matches(fle, key, keylen, fibnum)) {
 			fle->f_uptime = time_uptime;
 #ifdef FLOWTABLE_HASH_ALL
 			fle->f_flags |= fibnum >> 24;
 #endif
 			critical_exit();
 			FLOWSTAT_INC(ft, ft_hits);
 			return (fle);
 		}
 	}
 	critical_exit();
 
 	FLOWSTAT_INC(ft, ft_misses);
 
 	return (flowtable_insert(ft, hash, key, keylen, fibnum));
 }
 
 static void
 flowtable_alloc(struct flowtable *ft)
 {
 
 	ft->ft_table = malloc(ft->ft_size * sizeof(struct flist),
 	    M_FTABLE, M_WAITOK);
 	for (int i = 0; i < ft->ft_size; i++)
 		ft->ft_table[i] = uma_zalloc(pcpu_zone_ptr, M_WAITOK | M_ZERO);
 
 	ft->ft_masks = uma_zalloc(pcpu_zone_ptr, M_WAITOK);
 	for (int i = 0; i < mp_ncpus; i++) {
 		bitstr_t **b;
 
 		b = zpcpu_get_cpu(ft->ft_masks, i);
 		*b = bit_alloc(ft->ft_size, M_FTABLE, M_WAITOK);
 	}
 	ft->ft_tmpmask = bit_alloc(ft->ft_size, M_FTABLE, M_WAITOK);
 }
 
 static void
 flowtable_free_stale(struct flowtable *ft, struct rtentry *rt, int maxidle)
 {
 	struct flist *flist, freelist;
 	struct flentry *fle, *fle1, *fleprev;
 	bitstr_t *mask, *tmpmask;
 	int curbit, tmpsize;
 
 	SLIST_INIT(&freelist);
 	mask = flowtable_mask(ft);
 	tmpmask = ft->ft_tmpmask;
 	tmpsize = ft->ft_size;
 	memcpy(tmpmask, mask, ft->ft_size/8);
 	curbit = 0;
 	fleprev = NULL; /* pacify gcc */
 	/*
 	 * XXX Note to self, bit_ffs operates at the byte level
 	 * and thus adds gratuitous overhead
 	 */
 	bit_ffs(tmpmask, ft->ft_size, &curbit);
 	while (curbit != -1) {
 		if (curbit >= ft->ft_size || curbit < -1) {
 			log(LOG_ALERT,
 			    "warning: bad curbit value %d \n",
 			    curbit);
 			break;
 		}
 
 		FLOWSTAT_INC(ft, ft_free_checks);
 
 		critical_enter();
 		flist = flowtable_list(ft, curbit);
 #ifdef DIAGNOSTIC
 		if (SLIST_EMPTY(flist) && curbit > 0) {
 			log(LOG_ALERT,
 			    "warning bit=%d set, but no fle found\n",
 			    curbit);
 		}
 #endif
 		SLIST_FOREACH_SAFE(fle, flist, f_next, fle1) {
 			if (rt != NULL && fle->f_rt != rt) {
 				fleprev = fle;
 				continue;
 			}
 			if (!flow_stale(ft, fle, maxidle)) {
 				fleprev = fle;
 				continue;
 			}
 
 			if (fle == SLIST_FIRST(flist))
 				SLIST_REMOVE_HEAD(flist, f_next);
 			else
 				SLIST_REMOVE_AFTER(fleprev, f_next);
 			SLIST_INSERT_HEAD(&freelist, fle, f_next);
 		}
 		if (SLIST_EMPTY(flist))
 			bit_clear(mask, curbit);
 		critical_exit();
 
 		bit_clear(tmpmask, curbit);
 		bit_ffs(tmpmask, tmpsize, &curbit);
 	}
 
 	SLIST_FOREACH_SAFE(fle, &freelist, f_next, fle1) {
 		FLOWSTAT_INC(ft, ft_frees);
 		if (fle->f_rt != NULL)
 			RTFREE(fle->f_rt);
 		if (fle->f_lle != NULL)
 			LLE_FREE(fle->f_lle);
 		uma_zfree(flow_zone, fle);
 	}
 }
 
 static void
 flowtable_clean_vnet(struct flowtable *ft, struct rtentry *rt, int maxidle)
 {
 	int i;
 
 	CPU_FOREACH(i) {
 		if (smp_started == 1) {
 			thread_lock(curthread);
 			sched_bind(curthread, i);
 			thread_unlock(curthread);
 		}
 
 		flowtable_free_stale(ft, rt, maxidle);
 
 		if (smp_started == 1) {
 			thread_lock(curthread);
 			sched_unbind(curthread);
 			thread_unlock(curthread);
 		}
 	}
 }
 
 void
 flowtable_route_flush(sa_family_t sa, struct rtentry *rt)
 {
 	struct flowtable *ft;
 
 	switch (sa) {
 #ifdef INET
 	case AF_INET:
 		ft = &V_ip4_ft;
 		break;
 #endif
 #ifdef INET6
 	case AF_INET6:
 		ft = &V_ip6_ft;
 		break;
 #endif
 	default:
 		panic("%s: sa %d", __func__, sa);
 	}
 
 	flowtable_clean_vnet(ft, rt, 0);
 }
 
 static void
 flowtable_cleaner(void)
 {
 	VNET_ITERATOR_DECL(vnet_iter);
 	struct thread *td;
 
 	if (bootverbose)
 		log(LOG_INFO, "flowtable cleaner started\n");
 	td = curthread;
 	while (1) {
 		uint32_t flowclean_freq, maxidle;
 
 		/*
 		 * The maximum idle time, as well as frequency are arbitrary.
 		 */
 		if (flow_full())
 			maxidle = 5;
 		else
 			maxidle = 30;
 
 		VNET_LIST_RLOCK();
 		VNET_FOREACH(vnet_iter) {
 			CURVNET_SET(vnet_iter);
 #ifdef INET
 			flowtable_clean_vnet(&V_ip4_ft, NULL, maxidle);
 #endif
 #ifdef INET6
 			flowtable_clean_vnet(&V_ip6_ft, NULL, maxidle);
 #endif
 			CURVNET_RESTORE();
 		}
 		VNET_LIST_RUNLOCK();
 
 		if (flow_full())
 			flowclean_freq = 4*hz;
 		else
 			flowclean_freq = 20*hz;
 		mtx_lock(&flowclean_lock);
 		thread_lock(td);
 		sched_prio(td, PPAUSE);
 		thread_unlock(td);
 		flowclean_cycles++;
 		cv_broadcast(&flowclean_f_cv);
 		cv_timedwait(&flowclean_c_cv, &flowclean_lock, flowclean_freq);
 		mtx_unlock(&flowclean_lock);
 	}
 }
 
 static void
 flowtable_flush(void *unused __unused)
 {
 	uint64_t start;
 
 	mtx_lock(&flowclean_lock);
 	start = flowclean_cycles;
 	while (start == flowclean_cycles) {
 		cv_broadcast(&flowclean_c_cv);
 		cv_wait(&flowclean_f_cv, &flowclean_lock);
 	}
 	mtx_unlock(&flowclean_lock);
 }
 
 static struct kproc_desc flow_kp = {
 	"flowcleaner",
 	flowtable_cleaner,
 	&flowcleanerproc
 };
 SYSINIT(flowcleaner, SI_SUB_KTHREAD_IDLE, SI_ORDER_ANY, kproc_start, &flow_kp);
 
 static int
 flowtable_get_size(char *name)
 {
 	int size;
 
 	if (TUNABLE_INT_FETCH(name, &size)) {
 		if (size < 256)
 			size = 256;
 		if (!powerof2(size)) {
 			printf("%s must be power of 2\n", name);
 			size = 2048;
 		}
 	} else {
 		/*
 		 * round up to the next power of 2
 		 */
 		size = 1 << fls((1024 + maxusers * 64) - 1);
 	}
 
 	return (size);
 }
 
 static void
 flowtable_init(const void *unused __unused)
 {
 
 	flow_hashjitter = arc4random();
 
 	flow_zone = uma_zcreate("flows", sizeof(struct flentry),
 	    NULL, NULL, NULL, NULL, (64-1), UMA_ZONE_MAXBUCKET);
 	uma_zone_set_max(flow_zone, 1024 + maxusers * 64 * mp_ncpus);
 
 	cv_init(&flowclean_c_cv, "c_flowcleanwait");
 	cv_init(&flowclean_f_cv, "f_flowcleanwait");
 	mtx_init(&flowclean_lock, "flowclean lock", NULL, MTX_DEF);
 	EVENTHANDLER_REGISTER(ifnet_departure_event, flowtable_flush, NULL,
 	    EVENTHANDLER_PRI_ANY);
 }
 SYSINIT(flowtable_init, SI_SUB_PROTO_BEGIN, SI_ORDER_FIRST,
     flowtable_init, NULL);
 
 #ifdef INET
 static SYSCTL_NODE(_net_flowtable, OID_AUTO, ip4, CTLFLAG_RD, NULL,
     "Flowtable for IPv4");
 
 static VNET_PCPUSTAT_DEFINE(struct flowtable_stat, ip4_ftstat);
 VNET_PCPUSTAT_SYSINIT(ip4_ftstat);
 VNET_PCPUSTAT_SYSUNINIT(ip4_ftstat);
 SYSCTL_VNET_PCPUSTAT(_net_flowtable_ip4, OID_AUTO, stat, struct flowtable_stat,
     ip4_ftstat, "Flowtable statistics for IPv4 "
     "(struct flowtable_stat, net/flowtable.h)");
 
 static void
 flowtable_init_vnet_v4(const void *unused __unused)
 {
 
 	V_ip4_ft.ft_size = flowtable_get_size("net.flowtable.ip4.size");
 	V_ip4_ft.ft_stat = VNET(ip4_ftstat);
 	flowtable_alloc(&V_ip4_ft);
 }
 VNET_SYSINIT(ft_vnet_v4, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_ANY,
     flowtable_init_vnet_v4, NULL);
 #endif /* INET */
 
 #ifdef INET6
 static SYSCTL_NODE(_net_flowtable, OID_AUTO, ip6, CTLFLAG_RD, NULL,
     "Flowtable for IPv6");
 
 static VNET_PCPUSTAT_DEFINE(struct flowtable_stat, ip6_ftstat);
 VNET_PCPUSTAT_SYSINIT(ip6_ftstat);
 VNET_PCPUSTAT_SYSUNINIT(ip6_ftstat);
 SYSCTL_VNET_PCPUSTAT(_net_flowtable_ip6, OID_AUTO, stat, struct flowtable_stat,
     ip6_ftstat, "Flowtable statistics for IPv6 "
     "(struct flowtable_stat, net/flowtable.h)");
 
 static void
 flowtable_init_vnet_v6(const void *unused __unused)
 {
 
 	V_ip6_ft.ft_size = flowtable_get_size("net.flowtable.ip6.size");
 	V_ip6_ft.ft_stat = VNET(ip6_ftstat);
 	flowtable_alloc(&V_ip6_ft);
 }
 VNET_SYSINIT(flowtable_init_vnet_v6, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_ANY,
     flowtable_init_vnet_v6, NULL);
 #endif /* INET6 */
 
 #ifdef DDB
 static bitstr_t *
 flowtable_mask_pcpu(struct flowtable *ft, int cpuid)
 {
 
 	return (zpcpu_get_cpu(*ft->ft_masks, cpuid));
 }
 
 static struct flist *
 flowtable_list_pcpu(struct flowtable *ft, uint32_t hash, int cpuid)
 {
 
 	return (zpcpu_get_cpu(&ft->ft_table[hash % ft->ft_size], cpuid));
 }
 
 static void
 flow_show(struct flowtable *ft, struct flentry *fle)
 {
 	int idle_time;
 	int rt_valid, ifp_valid;
 	volatile struct rtentry *rt;
 	struct ifnet *ifp = NULL;
 	uint32_t *hashkey = fle->f_key;
 
 	idle_time = (int)(time_uptime - fle->f_uptime);
 	rt = fle->f_rt;
 	rt_valid = rt != NULL;
 	if (rt_valid)
 		ifp = rt->rt_ifp;
 	ifp_valid = ifp != NULL;
 
 #ifdef INET
 	if (ft == &V_ip4_ft) {
 		char daddr[4*sizeof "123"];
 #ifdef FLOWTABLE_HASH_ALL
 		char saddr[4*sizeof "123"];
 		uint16_t sport, dport;
 #endif
 
 		inet_ntoa_r(*(struct in_addr *) &hashkey[0], daddr);
 #ifdef FLOWTABLE_HASH_ALL
 		inet_ntoa_r(*(struct in_addr *) &hashkey[1], saddr);
 		dport = ntohs((uint16_t)(hashkey[2] >> 16));
 		sport = ntohs((uint16_t)(hashkey[2] & 0xffff));
 		db_printf("%s:%d->%s:%d", saddr, sport, daddr, dport);
 #else
 		db_printf("%s ", daddr);
 #endif
 	}
 #endif /* INET */
 #ifdef INET6
 	if (ft == &V_ip6_ft) {
 #ifdef FLOWTABLE_HASH_ALL
 		db_printf("\n\tkey=%08x:%08x:%08x%08x:%08x:%08x%08x:%08x:%08x",
 		    hashkey[0], hashkey[1], hashkey[2],
 		    hashkey[3], hashkey[4], hashkey[5],
 		    hashkey[6], hashkey[7], hashkey[8]);
 #else
 		db_printf("\n\tkey=%08x:%08x:%08x ",
 		    hashkey[0], hashkey[1], hashkey[2]);
 #endif
 	}
 #endif /* INET6 */
 
 	db_printf("hash=%08x idle_time=%03d"
 	    "\n\tfibnum=%02d rt=%p",
 	    fle->f_hash, idle_time, fle->f_fibnum, fle->f_rt);
 
 #ifdef FLOWTABLE_HASH_ALL
 	if (fle->f_flags & FL_STALE)
 		db_printf(" FL_STALE ");
 #endif
 	if (rt_valid) {
 		if (rt->rt_flags & RTF_UP)
 			db_printf(" RTF_UP ");
 	}
 	if (ifp_valid) {
 		if (ifp->if_flags & IFF_LOOPBACK)
 			db_printf(" IFF_LOOPBACK ");
 		if (ifp->if_flags & IFF_UP)
 			db_printf(" IFF_UP ");
 		if (ifp->if_flags & IFF_POINTOPOINT)
 			db_printf(" IFF_POINTOPOINT ");
 	}
 	db_printf("\n");
 }
 
 static void
 flowtable_show(struct flowtable *ft, int cpuid)
 {
 	int curbit = 0;
 	bitstr_t *mask, *tmpmask;
 
 	if (cpuid != -1)
 		db_printf("cpu: %d\n", cpuid);
 	mask = flowtable_mask_pcpu(ft, cpuid);
 	tmpmask = ft->ft_tmpmask;
 	memcpy(tmpmask, mask, ft->ft_size/8);
 	/*
 	 * XXX Note to self, bit_ffs operates at the byte level
 	 * and thus adds gratuitous overhead
 	 */
 	bit_ffs(tmpmask, ft->ft_size, &curbit);
 	while (curbit != -1) {
 		struct flist *flist;
 		struct flentry *fle;
 
 		if (curbit >= ft->ft_size || curbit < -1) {
 			db_printf("warning: bad curbit value %d \n",
 			    curbit);
 			break;
 		}
 
 		flist = flowtable_list_pcpu(ft, curbit, cpuid);
 
 		SLIST_FOREACH(fle, flist, f_next)
 			flow_show(ft, fle);
 		bit_clear(tmpmask, curbit);
 		bit_ffs(tmpmask, ft->ft_size, &curbit);
 	}
 }
 
 static void
 flowtable_show_vnet(struct flowtable *ft)
 {
 
 	int i;
 
 	CPU_FOREACH(i)
 		flowtable_show(ft, i);
 }
 
 DB_SHOW_COMMAND(flowtables, db_show_flowtables)
 {
 	VNET_ITERATOR_DECL(vnet_iter);
 
 	VNET_FOREACH(vnet_iter) {
 		CURVNET_SET(vnet_iter);
 #ifdef VIMAGE
 		db_printf("vnet %p\n", vnet_iter);
 #endif
 #ifdef INET
 		printf("IPv4:\n");
 		flowtable_show_vnet(&V_ip4_ft);
 #endif
 #ifdef INET6
 		printf("IPv6:\n");
 		flowtable_show_vnet(&V_ip6_ft);
 #endif
 		CURVNET_RESTORE();
 	}
 }
 #endif
Index: projects/vnet/sys/net/if_vxlan.c
===================================================================
--- projects/vnet/sys/net/if_vxlan.c	(revision 301546)
+++ projects/vnet/sys/net/if_vxlan.c	(revision 301547)
@@ -1,3090 +1,3089 @@
 /*-
  * Copyright (c) 2014, Bryan Venteicher <bryanv@FreeBSD.org>
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice unmodified, this list of conditions, and the following
  *    disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
  * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
  * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
  * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
  * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
  * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
  * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
  * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
  * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
 #include "opt_inet.h"
 #include "opt_inet6.h"
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include <sys/param.h>
 #include <sys/eventhandler.h>
 #include <sys/kernel.h>
 #include <sys/lock.h>
 #include <sys/hash.h>
 #include <sys/malloc.h>
 #include <sys/mbuf.h>
 #include <sys/module.h>
 #include <sys/refcount.h>
 #include <sys/rmlock.h>
 #include <sys/priv.h>
 #include <sys/proc.h>
 #include <sys/queue.h>
 #include <sys/sbuf.h>
 #include <sys/socket.h>
 #include <sys/socketvar.h>
 #include <sys/sockio.h>
 #include <sys/sysctl.h>
 #include <sys/systm.h>
 
 #include <net/bpf.h>
 #include <net/ethernet.h>
 #include <net/if.h>
 #include <net/if_var.h>
 #include <net/if_clone.h>
 #include <net/if_dl.h>
 #include <net/if_types.h>
 #include <net/if_vxlan.h>
 #include <net/netisr.h>
 
 #include <netinet/in.h>
 #include <netinet/in_systm.h>
 #include <netinet/in_var.h>
 #include <netinet/in_pcb.h>
 #include <netinet/ip.h>
 #include <netinet/ip6.h>
 #include <netinet/ip_var.h>
 #include <netinet6/ip6_var.h>
 #include <netinet/udp.h>
 #include <netinet/udp_var.h>
 
 struct vxlan_softc;
 LIST_HEAD(vxlan_softc_head, vxlan_softc);
 
 struct vxlan_socket_mc_info {
 	union vxlan_sockaddr		 vxlsomc_saddr;
 	union vxlan_sockaddr		 vxlsomc_gaddr;
 	int				 vxlsomc_ifidx;
 	int				 vxlsomc_users;
 };
 
 #define VXLAN_SO_MC_MAX_GROUPS		32
 
 #define VXLAN_SO_VNI_HASH_SHIFT		6
 #define VXLAN_SO_VNI_HASH_SIZE		(1 << VXLAN_SO_VNI_HASH_SHIFT)
 #define VXLAN_SO_VNI_HASH(_vni)		((_vni) % VXLAN_SO_VNI_HASH_SIZE)
 
 struct vxlan_socket {
 	struct socket			*vxlso_sock;
 	struct rmlock			 vxlso_lock;
 	u_int				 vxlso_refcnt;
 	union vxlan_sockaddr		 vxlso_laddr;
 	LIST_ENTRY(vxlan_socket)	 vxlso_entry;
 	struct vxlan_softc_head		 vxlso_vni_hash[VXLAN_SO_VNI_HASH_SIZE];
 	struct vxlan_socket_mc_info	 vxlso_mc[VXLAN_SO_MC_MAX_GROUPS];
 };
 
 #define VXLAN_SO_RLOCK(_vso, _p)	rm_rlock(&(_vso)->vxlso_lock, (_p))
 #define VXLAN_SO_RUNLOCK(_vso, _p)	rm_runlock(&(_vso)->vxlso_lock, (_p))
 #define VXLAN_SO_WLOCK(_vso)		rm_wlock(&(_vso)->vxlso_lock)
 #define VXLAN_SO_WUNLOCK(_vso)		rm_wunlock(&(_vso)->vxlso_lock)
 #define VXLAN_SO_LOCK_ASSERT(_vso) \
     rm_assert(&(_vso)->vxlso_lock, RA_LOCKED)
 #define VXLAN_SO_LOCK_WASSERT(_vso) \
     rm_assert(&(_vso)->vxlso_lock, RA_WLOCKED)
 
 #define VXLAN_SO_ACQUIRE(_vso)		refcount_acquire(&(_vso)->vxlso_refcnt)
 #define VXLAN_SO_RELEASE(_vso)		refcount_release(&(_vso)->vxlso_refcnt)
 
 struct vxlan_ftable_entry {
 	LIST_ENTRY(vxlan_ftable_entry)	 vxlfe_hash;
 	uint16_t			 vxlfe_flags;
 	uint8_t				 vxlfe_mac[ETHER_ADDR_LEN];
 	union vxlan_sockaddr		 vxlfe_raddr;
 	time_t				 vxlfe_expire;
 };
 
 #define VXLAN_FE_FLAG_DYNAMIC		0x01
 #define VXLAN_FE_FLAG_STATIC		0x02
 
 #define VXLAN_FE_IS_DYNAMIC(_fe) \
     ((_fe)->vxlfe_flags & VXLAN_FE_FLAG_DYNAMIC)
 
 #define VXLAN_SC_FTABLE_SHIFT		9
 #define VXLAN_SC_FTABLE_SIZE		(1 << VXLAN_SC_FTABLE_SHIFT)
 #define VXLAN_SC_FTABLE_MASK		(VXLAN_SC_FTABLE_SIZE - 1)
 #define VXLAN_SC_FTABLE_HASH(_sc, _mac)	\
     (vxlan_mac_hash(_sc, _mac) % VXLAN_SC_FTABLE_SIZE)
 
 LIST_HEAD(vxlan_ftable_head, vxlan_ftable_entry);
 
 struct vxlan_statistics {
 	uint32_t	ftable_nospace;
 	uint32_t	ftable_lock_upgrade_failed;
 };
 
 struct vxlan_softc {
 	struct ifnet			*vxl_ifp;
 	struct vxlan_socket		*vxl_sock;
 	uint32_t			 vxl_vni;
 	union vxlan_sockaddr		 vxl_src_addr;
 	union vxlan_sockaddr		 vxl_dst_addr;
 	uint32_t			 vxl_flags;
 #define VXLAN_FLAG_INIT		0x0001
 #define VXLAN_FLAG_TEARDOWN	0x0002
 #define VXLAN_FLAG_LEARN	0x0004
 
 	uint32_t			 vxl_port_hash_key;
 	uint16_t			 vxl_min_port;
 	uint16_t			 vxl_max_port;
 	uint8_t				 vxl_ttl;
 
 	/* Lookup table from MAC address to forwarding entry. */
 	uint32_t			 vxl_ftable_cnt;
 	uint32_t			 vxl_ftable_max;
 	uint32_t			 vxl_ftable_timeout;
 	uint32_t			 vxl_ftable_hash_key;
 	struct vxlan_ftable_head	*vxl_ftable;
 
 	/* Derived from vxl_dst_addr. */
 	struct vxlan_ftable_entry	 vxl_default_fe;
 
 	struct ip_moptions		*vxl_im4o;
 	struct ip6_moptions		*vxl_im6o;
 
 	struct rmlock			 vxl_lock;
 	volatile u_int			 vxl_refcnt;
 
 	int				 vxl_unit;
 	int				 vxl_vso_mc_index;
 	struct vxlan_statistics		 vxl_stats;
 	struct sysctl_oid		*vxl_sysctl_node;
 	struct sysctl_ctx_list		 vxl_sysctl_ctx;
 	struct callout			 vxl_callout;
 	uint8_t				 vxl_hwaddr[ETHER_ADDR_LEN];
 	int				 vxl_mc_ifindex;
 	struct ifnet			*vxl_mc_ifp;
 	char				 vxl_mc_ifname[IFNAMSIZ];
 	LIST_ENTRY(vxlan_softc)		 vxl_entry;
 	LIST_ENTRY(vxlan_softc)		 vxl_ifdetach_list;
 };
 
 #define VXLAN_RLOCK(_sc, _p)	rm_rlock(&(_sc)->vxl_lock, (_p))
 #define VXLAN_RUNLOCK(_sc, _p)	rm_runlock(&(_sc)->vxl_lock, (_p))
 #define VXLAN_WLOCK(_sc)	rm_wlock(&(_sc)->vxl_lock)
 #define VXLAN_WUNLOCK(_sc)	rm_wunlock(&(_sc)->vxl_lock)
 #define VXLAN_LOCK_WOWNED(_sc)	rm_wowned(&(_sc)->vxl_lock)
 #define VXLAN_LOCK_ASSERT(_sc)	rm_assert(&(_sc)->vxl_lock, RA_LOCKED)
 #define VXLAN_LOCK_WASSERT(_sc) rm_assert(&(_sc)->vxl_lock, RA_WLOCKED)
 #define VXLAN_UNLOCK(_sc, _p) do {		\
     if (VXLAN_LOCK_WOWNED(_sc))			\
 	VXLAN_WUNLOCK(_sc);			\
     else					\
 	VXLAN_RUNLOCK(_sc, _p);			\
 } while (0)
 
 #define VXLAN_ACQUIRE(_sc)	refcount_acquire(&(_sc)->vxl_refcnt)
 #define VXLAN_RELEASE(_sc)	refcount_release(&(_sc)->vxl_refcnt)
 
 #define	satoconstsin(sa)	((const struct sockaddr_in *)(sa))
 #define	satoconstsin6(sa)	((const struct sockaddr_in6 *)(sa))
 
 struct vxlanudphdr {
 	struct udphdr		vxlh_udp;
 	struct vxlan_header	vxlh_hdr;
 } __packed;
 
 static int	vxlan_ftable_addr_cmp(const uint8_t *, const uint8_t *);
 static void	vxlan_ftable_init(struct vxlan_softc *);
 static void	vxlan_ftable_fini(struct vxlan_softc *);
 static void	vxlan_ftable_flush(struct vxlan_softc *, int);
 static void	vxlan_ftable_expire(struct vxlan_softc *);
 static int	vxlan_ftable_update_locked(struct vxlan_softc *,
 		    const struct sockaddr *, const uint8_t *,
 		    struct rm_priotracker *);
 static int	vxlan_ftable_update(struct vxlan_softc *,
 		    const struct sockaddr *, const uint8_t *);
 static int	vxlan_ftable_sysctl_dump(SYSCTL_HANDLER_ARGS);
 
 static struct vxlan_ftable_entry *
 		vxlan_ftable_entry_alloc(void);
 static void	vxlan_ftable_entry_free(struct vxlan_ftable_entry *);
 static void	vxlan_ftable_entry_init(struct vxlan_softc *,
 		    struct vxlan_ftable_entry *, const uint8_t *,
 		    const struct sockaddr *, uint32_t);
 static void	vxlan_ftable_entry_destroy(struct vxlan_softc *,
 		    struct vxlan_ftable_entry *);
 static int	vxlan_ftable_entry_insert(struct vxlan_softc *,
 		    struct vxlan_ftable_entry *);
 static struct vxlan_ftable_entry *
 		vxlan_ftable_entry_lookup(struct vxlan_softc *,
 		    const uint8_t *);
 static void	vxlan_ftable_entry_dump(struct vxlan_ftable_entry *,
 		    struct sbuf *);
 
 static struct vxlan_socket *
 		vxlan_socket_alloc(const union vxlan_sockaddr *);
 static void	vxlan_socket_destroy(struct vxlan_socket *);
 static void	vxlan_socket_release(struct vxlan_socket *);
 static struct vxlan_socket *
 		vxlan_socket_lookup(union vxlan_sockaddr *vxlsa);
 static void	vxlan_socket_insert(struct vxlan_socket *);
 static int	vxlan_socket_init(struct vxlan_socket *, struct ifnet *);
 static int	vxlan_socket_bind(struct vxlan_socket *, struct ifnet *);
 static int	vxlan_socket_create(struct ifnet *, int,
 		    const union vxlan_sockaddr *, struct vxlan_socket **);
 static void	vxlan_socket_ifdetach(struct vxlan_socket *,
 		    struct ifnet *, struct vxlan_softc_head *);
 
 static struct vxlan_socket *
 		vxlan_socket_mc_lookup(const union vxlan_sockaddr *);
 static int	vxlan_sockaddr_mc_info_match(
 		    const struct vxlan_socket_mc_info *,
 		    const union vxlan_sockaddr *,
 		    const union vxlan_sockaddr *, int);
 static int	vxlan_socket_mc_join_group(struct vxlan_socket *,
 		    const union vxlan_sockaddr *, const union vxlan_sockaddr *,
 		    int *, union vxlan_sockaddr *);
 static int	vxlan_socket_mc_leave_group(struct vxlan_socket *,
 		    const union vxlan_sockaddr *,
 		    const union vxlan_sockaddr *, int);
 static int	vxlan_socket_mc_add_group(struct vxlan_socket *,
 		    const union vxlan_sockaddr *, const union vxlan_sockaddr *,
 		    int, int *);
 static void	vxlan_socket_mc_release_group_by_idx(struct vxlan_socket *,
 		    int);
 
 static struct vxlan_softc *
 		vxlan_socket_lookup_softc_locked(struct vxlan_socket *,
 		    uint32_t);
 static struct vxlan_softc *
 		vxlan_socket_lookup_softc(struct vxlan_socket *, uint32_t);
 static int	vxlan_socket_insert_softc(struct vxlan_socket *,
 		    struct vxlan_softc *);
 static void	vxlan_socket_remove_softc(struct vxlan_socket *,
 		    struct vxlan_softc *);
 
 static struct ifnet *
 		vxlan_multicast_if_ref(struct vxlan_softc *, int);
 static void	vxlan_free_multicast(struct vxlan_softc *);
 static int	vxlan_setup_multicast_interface(struct vxlan_softc *);
 
 static int	vxlan_setup_multicast(struct vxlan_softc *);
 static int	vxlan_setup_socket(struct vxlan_softc *);
 static void	vxlan_setup_interface(struct vxlan_softc *);
 static int	vxlan_valid_init_config(struct vxlan_softc *);
 static void	vxlan_init_wait(struct vxlan_softc *);
 static void	vxlan_init_complete(struct vxlan_softc *);
 static void	vxlan_init(void *);
 static void	vxlan_release(struct vxlan_softc *);
 static void	vxlan_teardown_wait(struct vxlan_softc *);
 static void	vxlan_teardown_complete(struct vxlan_softc *);
 static void	vxlan_teardown_locked(struct vxlan_softc *);
 static void	vxlan_teardown(struct vxlan_softc *);
 static void	vxlan_ifdetach(struct vxlan_softc *, struct ifnet *,
 		    struct vxlan_softc_head *);
 static void	vxlan_timer(void *);
 
 static int	vxlan_ctrl_get_config(struct vxlan_softc *, void *);
 static int	vxlan_ctrl_set_vni(struct vxlan_softc *, void *);
 static int	vxlan_ctrl_set_local_addr(struct vxlan_softc *, void *);
 static int	vxlan_ctrl_set_remote_addr(struct vxlan_softc *, void *);
 static int	vxlan_ctrl_set_local_port(struct vxlan_softc *, void *);
 static int	vxlan_ctrl_set_remote_port(struct vxlan_softc *, void *);
 static int	vxlan_ctrl_set_port_range(struct vxlan_softc *, void *);
 static int	vxlan_ctrl_set_ftable_timeout(struct vxlan_softc *, void *);
 static int	vxlan_ctrl_set_ftable_max(struct vxlan_softc *, void *);
 static int	vxlan_ctrl_set_multicast_if(struct vxlan_softc *, void *);
 static int	vxlan_ctrl_set_ttl(struct vxlan_softc *, void *);
 static int	vxlan_ctrl_set_learn(struct vxlan_softc *, void *);
 static int	vxlan_ctrl_ftable_entry_add(struct vxlan_softc *, void *);
 static int	vxlan_ctrl_ftable_entry_rem(struct vxlan_softc *, void *);
 static int	vxlan_ctrl_flush(struct vxlan_softc *, void *);
 static int	vxlan_ioctl_drvspec(struct vxlan_softc *,
 		    struct ifdrv *, int);
 static int	vxlan_ioctl_ifflags(struct vxlan_softc *);
 static int	vxlan_ioctl(struct ifnet *, u_long, caddr_t);
 
 #if defined(INET) || defined(INET6)
 static uint16_t vxlan_pick_source_port(struct vxlan_softc *, struct mbuf *);
 static void	vxlan_encap_header(struct vxlan_softc *, struct mbuf *,
 		    int, uint16_t, uint16_t);
 #endif
 static int	vxlan_encap4(struct vxlan_softc *,
 		    const union vxlan_sockaddr *, struct mbuf *);
 static int	vxlan_encap6(struct vxlan_softc *,
 		    const union vxlan_sockaddr *, struct mbuf *);
 static int	vxlan_transmit(struct ifnet *, struct mbuf *);
 static void	vxlan_qflush(struct ifnet *);
 static void	vxlan_rcv_udp_packet(struct mbuf *, int, struct inpcb *,
 		    const struct sockaddr *, void *);
 static int	vxlan_input(struct vxlan_socket *, uint32_t, struct mbuf **,
 		    const struct sockaddr *);
 
 static void	vxlan_set_default_config(struct vxlan_softc *);
 static int	vxlan_set_user_config(struct vxlan_softc *,
 		     struct ifvxlanparam *);
 static int	vxlan_clone_create(struct if_clone *, int, caddr_t);
 static void	vxlan_clone_destroy(struct ifnet *);
 
 static uint32_t vxlan_mac_hash(struct vxlan_softc *, const uint8_t *);
 static void	vxlan_fakeaddr(struct vxlan_softc *);
 
 static int	vxlan_sockaddr_cmp(const union vxlan_sockaddr *,
 		    const struct sockaddr *);
 static void	vxlan_sockaddr_copy(union vxlan_sockaddr *,
 		    const struct sockaddr *);
 static int	vxlan_sockaddr_in_equal(const union vxlan_sockaddr *,
 		    const struct sockaddr *);
 static void	vxlan_sockaddr_in_copy(union vxlan_sockaddr *,
 		    const struct sockaddr *);
 static int	vxlan_sockaddr_supported(const union vxlan_sockaddr *, int);
 static int	vxlan_sockaddr_in_any(const union vxlan_sockaddr *);
 static int	vxlan_sockaddr_in_multicast(const union vxlan_sockaddr *);
 
 static int	vxlan_can_change_config(struct vxlan_softc *);
 static int	vxlan_check_vni(uint32_t);
 static int	vxlan_check_ttl(int);
 static int	vxlan_check_ftable_timeout(uint32_t);
 static int	vxlan_check_ftable_max(uint32_t);
 
 static void	vxlan_sysctl_setup(struct vxlan_softc *);
 static void	vxlan_sysctl_destroy(struct vxlan_softc *);
 static int	vxlan_tunable_int(struct vxlan_softc *, const char *, int);
 
 static void	vxlan_ifdetach_event(void *, struct ifnet *);
 static void	vxlan_load(void);
 static void	vxlan_unload(void);
 static int	vxlan_modevent(module_t, int, void *);
 
 static const char vxlan_name[] = "vxlan";
 static MALLOC_DEFINE(M_VXLAN, vxlan_name,
     "Virtual eXtensible LAN Interface");
 static struct if_clone *vxlan_cloner;
 static struct mtx vxlan_list_mtx;
 static LIST_HEAD(, vxlan_socket) vxlan_socket_list;
 
 static eventhandler_tag vxlan_ifdetach_event_tag;
 
 SYSCTL_DECL(_net_link);
 SYSCTL_NODE(_net_link, OID_AUTO, vxlan, CTLFLAG_RW, 0,
     "Virtual eXtensible Local Area Network");
 
 static int vxlan_legacy_port = 0;
 TUNABLE_INT("net.link.vxlan.legacy_port", &vxlan_legacy_port);
 static int vxlan_reuse_port = 0;
 TUNABLE_INT("net.link.vxlan.reuse_port", &vxlan_reuse_port);
 
 /* Default maximum number of addresses in the forwarding table. */
 #ifndef VXLAN_FTABLE_MAX
 #define VXLAN_FTABLE_MAX	2000
 #endif
 
 /* Timeout (in seconds) of addresses learned in the forwarding table. */
 #ifndef VXLAN_FTABLE_TIMEOUT
 #define VXLAN_FTABLE_TIMEOUT	(20 * 60)
 #endif
 
 /*
  * Maximum timeout (in seconds) of addresses learned in the forwarding
  * table.
  */
 #ifndef VXLAN_FTABLE_MAX_TIMEOUT
 #define VXLAN_FTABLE_MAX_TIMEOUT	(60 * 60 * 24)
 #endif
 
 /* Number of seconds between pruning attempts of the forwarding table. */
 #ifndef VXLAN_FTABLE_PRUNE
 #define VXLAN_FTABLE_PRUNE	(5 * 60)
 #endif
 
 static int vxlan_ftable_prune_period = VXLAN_FTABLE_PRUNE;
 
 struct vxlan_control {
 	int	(*vxlc_func)(struct vxlan_softc *, void *);
 	int	vxlc_argsize;
 	int	vxlc_flags;
 #define VXLAN_CTRL_FLAG_COPYIN	0x01
 #define VXLAN_CTRL_FLAG_COPYOUT	0x02
 #define VXLAN_CTRL_FLAG_SUSER	0x04
 };
 
 static const struct vxlan_control vxlan_control_table[] = {
 	[VXLAN_CMD_GET_CONFIG] =
 	    {	vxlan_ctrl_get_config, sizeof(struct ifvxlancfg),
 		VXLAN_CTRL_FLAG_COPYOUT
 	    },
 
 	[VXLAN_CMD_SET_VNI] =
 	    {   vxlan_ctrl_set_vni, sizeof(struct ifvxlancmd),
 		VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER,
 	    },
 
 	[VXLAN_CMD_SET_LOCAL_ADDR] =
 	    {   vxlan_ctrl_set_local_addr, sizeof(struct ifvxlancmd),
 		VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER,
 	    },
 
 	[VXLAN_CMD_SET_REMOTE_ADDR] =
 	    {   vxlan_ctrl_set_remote_addr, sizeof(struct ifvxlancmd),
 		VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER,
 	    },
 
 	[VXLAN_CMD_SET_LOCAL_PORT] =
 	    {   vxlan_ctrl_set_local_port, sizeof(struct ifvxlancmd),
 		VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER,
 	    },
 
 	[VXLAN_CMD_SET_REMOTE_PORT] =
 	    {   vxlan_ctrl_set_remote_port, sizeof(struct ifvxlancmd),
 		VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER,
 	    },
 
 	[VXLAN_CMD_SET_PORT_RANGE] =
 	    {   vxlan_ctrl_set_port_range, sizeof(struct ifvxlancmd),
 		VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER,
 	    },
 
 	[VXLAN_CMD_SET_FTABLE_TIMEOUT] =
 	    {	vxlan_ctrl_set_ftable_timeout, sizeof(struct ifvxlancmd),
 		VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER,
 	    },
 
 	[VXLAN_CMD_SET_FTABLE_MAX] =
 	    {	vxlan_ctrl_set_ftable_max, sizeof(struct ifvxlancmd),
 		VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER,
 	    },
 
 	[VXLAN_CMD_SET_MULTICAST_IF] =
 	    {	vxlan_ctrl_set_multicast_if, sizeof(struct ifvxlancmd),
 		VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER,
 	    },
 
 	[VXLAN_CMD_SET_TTL] =
 	    {	vxlan_ctrl_set_ttl, sizeof(struct ifvxlancmd),
 		VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER,
 	    },
 
 	[VXLAN_CMD_SET_LEARN] =
 	    {	vxlan_ctrl_set_learn, sizeof(struct ifvxlancmd),
 		VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER,
 	    },
 
 	[VXLAN_CMD_FTABLE_ENTRY_ADD] =
 	    {	vxlan_ctrl_ftable_entry_add, sizeof(struct ifvxlancmd),
 		VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER,
 	    },
 
 	[VXLAN_CMD_FTABLE_ENTRY_REM] =
 	    {	vxlan_ctrl_ftable_entry_rem, sizeof(struct ifvxlancmd),
 		VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER,
 	    },
 
 	[VXLAN_CMD_FLUSH] =
 	    {   vxlan_ctrl_flush, sizeof(struct ifvxlancmd),
 		VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER,
 	    },
 };
 
 static const int vxlan_control_table_size = nitems(vxlan_control_table);
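
 /*
  * Illustrative userland sketch, not part of the driver.  The table
  * above maps each command number to a handler, an argument size and
  * a set of flags.  The consumer of that table is outside this hunk,
  * so the check order below (bounds, hole, privilege, size) is an
  * assumption, and every name in the sketch is hypothetical.
  */
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bits, modelled on VXLAN_CTRL_FLAG_*. */
#define CTRL_FLAG_COPYIN	0x01
#define CTRL_FLAG_COPYOUT	0x02
#define CTRL_FLAG_SUSER		0x04

/* Hypothetical command numbers; index 1 is deliberately left unused. */
enum { CMD_GET = 0, CMD_SET = 2 };

struct demo_state {
	int	value;
};

struct control {
	int	(*func)(struct demo_state *, void *);
	size_t	argsize;
	int	flags;
};

static int
ctrl_get(struct demo_state *st, void *arg)
{
	*(int *)arg = st->value;
	return (0);
}

static int
ctrl_set(struct demo_state *st, void *arg)
{
	st->value = *(int *)arg;
	return (0);
}

/* Designated initializers leave unlisted slots zeroed (func == NULL). */
static const struct control control_table[] = {
	[CMD_GET] = { ctrl_get, sizeof(int), CTRL_FLAG_COPYOUT },
	[CMD_SET] = { ctrl_set, sizeof(int),
	    CTRL_FLAG_COPYIN | CTRL_FLAG_SUSER },
};

#define NITEMS(a)	(sizeof(a) / sizeof((a)[0]))

/* Validate command, privilege and argument size, then dispatch. */
static int
ctrl_dispatch(struct demo_state *st, unsigned int cmd, void *arg,
    size_t len, int privileged)
{
	const struct control *c;

	assert(st != NULL);
	if (cmd >= NITEMS(control_table))
		return (EINVAL);
	c = &control_table[cmd];
	if (c->func == NULL)		/* hole in the table */
		return (EINVAL);
	if ((c->flags & CTRL_FLAG_SUSER) != 0 && !privileged)
		return (EPERM);
	if (len != c->argsize)
		return (EINVAL);
	return (c->func(st, arg));
}
```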
 
 static int
 vxlan_ftable_addr_cmp(const uint8_t *a, const uint8_t *b)
 {
 	int i, d;
 
 	for (i = 0, d = 0; i < ETHER_ADDR_LEN && d == 0; i++)
 		d = ((int)a[i]) - ((int)b[i]);
 
 	return (d);
 }
 
 static void
 vxlan_ftable_init(struct vxlan_softc *sc)
 {
 	int i;
 
 	sc->vxl_ftable = malloc(sizeof(struct vxlan_ftable_head) *
 	    VXLAN_SC_FTABLE_SIZE, M_VXLAN, M_ZERO | M_WAITOK);
 
 	for (i = 0; i < VXLAN_SC_FTABLE_SIZE; i++)
 		LIST_INIT(&sc->vxl_ftable[i]);
 	sc->vxl_ftable_hash_key = arc4random();
 }
 
 static void
 vxlan_ftable_fini(struct vxlan_softc *sc)
 {
 	int i;
 
 	for (i = 0; i < VXLAN_SC_FTABLE_SIZE; i++) {
 		KASSERT(LIST_EMPTY(&sc->vxl_ftable[i]),
 		    ("%s: vxlan %p ftable[%d] not empty", __func__, sc, i));
 	}
 	MPASS(sc->vxl_ftable_cnt == 0);
 
 	free(sc->vxl_ftable, M_VXLAN);
 	sc->vxl_ftable = NULL;
 }
 
 static void
 vxlan_ftable_flush(struct vxlan_softc *sc, int all)
 {
 	struct vxlan_ftable_entry *fe, *tfe;
 	int i;
 
 	for (i = 0; i < VXLAN_SC_FTABLE_SIZE; i++) {
 		LIST_FOREACH_SAFE(fe, &sc->vxl_ftable[i], vxlfe_hash, tfe) {
 			if (all || VXLAN_FE_IS_DYNAMIC(fe))
 				vxlan_ftable_entry_destroy(sc, fe);
 		}
 	}
 }
 
 static void
 vxlan_ftable_expire(struct vxlan_softc *sc)
 {
 	struct vxlan_ftable_entry *fe, *tfe;
 	int i;
 
 	VXLAN_LOCK_WASSERT(sc);
 
 	for (i = 0; i < VXLAN_SC_FTABLE_SIZE; i++) {
 		LIST_FOREACH_SAFE(fe, &sc->vxl_ftable[i], vxlfe_hash, tfe) {
 			if (VXLAN_FE_IS_DYNAMIC(fe) &&
 			    time_uptime >= fe->vxlfe_expire)
 				vxlan_ftable_entry_destroy(sc, fe);
 		}
 	}
 }
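
 /*
  * Illustrative userland sketch, not part of the driver.  It shows the
  * same idea as the expiry pass above: walk a list, and unlink and free
  * only the dynamic entries whose deadline has passed, without losing
  * the iterator.  A plain singly linked list stands in for the queue(3)
  * macros, and all names are hypothetical.
  */
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define FE_DYNAMIC	0x01	/* hypothetical, like VXLAN_FE_FLAG_DYNAMIC */

struct fentry {
	int		flags;
	time_t		expire;		/* absolute deadline */
	struct fentry	*next;
};

/* Prepend a heap-allocated entry; returns NULL if malloc() fails. */
static struct fentry *
fentry_add(struct fentry **head, int flags, time_t expire)
{
	struct fentry *e;

	e = malloc(sizeof(*e));
	if (e == NULL)
		return (NULL);
	e->flags = flags;
	e->expire = expire;
	e->next = *head;
	*head = e;
	return (e);
}

/* Free dynamic entries whose deadline has passed; return the count. */
static int
fentry_expire(struct fentry **head, time_t now)
{
	struct fentry **pp, *e;
	int freed;

	assert(head != NULL);
	freed = 0;
	pp = head;
	while ((e = *pp) != NULL) {
		if ((e->flags & FE_DYNAMIC) != 0 && now >= e->expire) {
			*pp = e->next;	/* unlink before freeing */
			free(e);
			freed++;
		} else
			pp = &e->next;	/* static or still fresh: keep */
	}
	return (freed);
}
```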
 
 static int
 vxlan_ftable_update_locked(struct vxlan_softc *sc, const struct sockaddr *sa,
     const uint8_t *mac, struct rm_priotracker *tracker)
 {
 	union vxlan_sockaddr vxlsa;
 	struct vxlan_ftable_entry *fe;
 	int error;
 
 	VXLAN_LOCK_ASSERT(sc);
 
 again:
 	/*
 	 * A forwarding entry for this MAC address might already exist. If
 	 * so, update it, otherwise create a new one. We may have to upgrade
 	 * the lock if we have to change or create an entry.
 	 */
 	fe = vxlan_ftable_entry_lookup(sc, mac);
 	if (fe != NULL) {
 		fe->vxlfe_expire = time_uptime + sc->vxl_ftable_timeout;
 
 		if (!VXLAN_FE_IS_DYNAMIC(fe) ||
 		    vxlan_sockaddr_in_equal(&fe->vxlfe_raddr, sa))
 			return (0);
 		if (!VXLAN_LOCK_WOWNED(sc)) {
 			VXLAN_RUNLOCK(sc, tracker);
 			VXLAN_WLOCK(sc);
 			sc->vxl_stats.ftable_lock_upgrade_failed++;
 			goto again;
 		}
 		vxlan_sockaddr_in_copy(&fe->vxlfe_raddr, sa);
 		return (0);
 	}
 
 	if (!VXLAN_LOCK_WOWNED(sc)) {
 		VXLAN_RUNLOCK(sc, tracker);
 		VXLAN_WLOCK(sc);
 		sc->vxl_stats.ftable_lock_upgrade_failed++;
 		goto again;
 	}
 
 	if (sc->vxl_ftable_cnt >= sc->vxl_ftable_max) {
 		sc->vxl_stats.ftable_nospace++;
 		return (ENOSPC);
 	}
 
 	fe = vxlan_ftable_entry_alloc();
 	if (fe == NULL)
 		return (ENOMEM);
 
 	/*
 	 * The source port may be randomly selected by the remote host, so
 	 * use the port of the default destination address.
 	 */
 	vxlan_sockaddr_copy(&vxlsa, sa);
 	vxlsa.in4.sin_port = sc->vxl_dst_addr.in4.sin_port;
 
 	vxlan_ftable_entry_init(sc, fe, mac, &vxlsa.sa,
 	    VXLAN_FE_FLAG_DYNAMIC);
 
 	/* The prior lookup failed, so the insert should not. */
 	error = vxlan_ftable_entry_insert(sc, fe);
 	MPASS(error == 0);
 
 	return (0);
 }
 
 static int
 vxlan_ftable_update(struct vxlan_softc *sc, const struct sockaddr *sa,
     const uint8_t *mac)
 {
 	struct rm_priotracker tracker;
 	int error;
 
 	VXLAN_RLOCK(sc, &tracker);
 	error = vxlan_ftable_update_locked(sc, sa, mac, &tracker);
 	VXLAN_UNLOCK(sc, &tracker);
 
 	return (error);
 }
 
 static int
 vxlan_ftable_sysctl_dump(SYSCTL_HANDLER_ARGS)
 {
 	struct rm_priotracker tracker;
 	struct sbuf sb;
 	struct vxlan_softc *sc;
 	struct vxlan_ftable_entry *fe;
 	size_t size;
 	int i, error;
 
 	/*
 	 * This is mostly intended for debugging during development. It is
 	 * not practical to dump an entire large table this way.
 	 */
 
 	sc = arg1;
 	size = PAGE_SIZE;	/* Calculate later. */
 
 	sbuf_new(&sb, NULL, size, SBUF_FIXEDLEN);
 	sbuf_putc(&sb, '\n');
 
 	VXLAN_RLOCK(sc, &tracker);
 	for (i = 0; i < VXLAN_SC_FTABLE_SIZE; i++) {
 		LIST_FOREACH(fe, &sc->vxl_ftable[i], vxlfe_hash) {
 			if (sbuf_error(&sb) != 0)
 				break;
 			vxlan_ftable_entry_dump(fe, &sb);
 		}
 	}
 	VXLAN_RUNLOCK(sc, &tracker);
 
 	if (sbuf_len(&sb) == 1)
 		sbuf_setpos(&sb, 0);
 
 	sbuf_finish(&sb);
 	error = sysctl_handle_string(oidp, sbuf_data(&sb), sbuf_len(&sb), req);
 	sbuf_delete(&sb);
 
 	return (error);
 }
 
 static struct vxlan_ftable_entry *
 vxlan_ftable_entry_alloc(void)
 {
 	struct vxlan_ftable_entry *fe;
 
 	fe = malloc(sizeof(*fe), M_VXLAN, M_ZERO | M_NOWAIT);
 
 	return (fe);
 }
 
 static void
 vxlan_ftable_entry_free(struct vxlan_ftable_entry *fe)
 {
 
 	free(fe, M_VXLAN);
 }
 
 static void
 vxlan_ftable_entry_init(struct vxlan_softc *sc, struct vxlan_ftable_entry *fe,
     const uint8_t *mac, const struct sockaddr *sa, uint32_t flags)
 {
 
 	fe->vxlfe_flags = flags;
 	fe->vxlfe_expire = time_uptime + sc->vxl_ftable_timeout;
 	memcpy(fe->vxlfe_mac, mac, ETHER_ADDR_LEN);
 	vxlan_sockaddr_copy(&fe->vxlfe_raddr, sa);
 }
 
 static void
 vxlan_ftable_entry_destroy(struct vxlan_softc *sc,
     struct vxlan_ftable_entry *fe)
 {
 
 	sc->vxl_ftable_cnt--;
 	LIST_REMOVE(fe, vxlfe_hash);
 	vxlan_ftable_entry_free(fe);
 }
 
 static int
 vxlan_ftable_entry_insert(struct vxlan_softc *sc,
     struct vxlan_ftable_entry *fe)
 {
 	struct vxlan_ftable_entry *lfe;
 	uint32_t hash;
 	int dir;
 
 	VXLAN_LOCK_WASSERT(sc);
 	hash = VXLAN_SC_FTABLE_HASH(sc, fe->vxlfe_mac);
 
 	lfe = LIST_FIRST(&sc->vxl_ftable[hash]);
 	if (lfe == NULL) {
 		LIST_INSERT_HEAD(&sc->vxl_ftable[hash], fe, vxlfe_hash);
 		goto out;
 	}
 
 	do {
 		dir = vxlan_ftable_addr_cmp(fe->vxlfe_mac, lfe->vxlfe_mac);
 		if (dir == 0)
 			return (EEXIST);
 		if (dir > 0) {
 			LIST_INSERT_BEFORE(lfe, fe, vxlfe_hash);
 			goto out;
 		} else if (LIST_NEXT(lfe, vxlfe_hash) == NULL) {
 			LIST_INSERT_AFTER(lfe, fe, vxlfe_hash);
 			goto out;
 		} else
 			lfe = LIST_NEXT(lfe, vxlfe_hash);
 	} while (lfe != NULL);
 
 out:
 	sc->vxl_ftable_cnt++;
 
 	return (0);
 }
 
 static struct vxlan_ftable_entry *
 vxlan_ftable_entry_lookup(struct vxlan_softc *sc, const uint8_t *mac)
 {
 	struct vxlan_ftable_entry *fe;
 	uint32_t hash;
 	int dir;
 
 	VXLAN_LOCK_ASSERT(sc);
 	hash = VXLAN_SC_FTABLE_HASH(sc, mac);
 
 	LIST_FOREACH(fe, &sc->vxl_ftable[hash], vxlfe_hash) {
 		dir = vxlan_ftable_addr_cmp(mac, fe->vxlfe_mac);
 		if (dir == 0)
 			return (fe);
 		if (dir > 0)
 			break;
 	}
 
 	return (NULL);
 }
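
 /*
  * Illustrative userland sketch, not part of the driver.  Each hash
  * bucket is kept sorted so that a lookup can stop early.  That only
  * works if insert and lookup pass the new key and the list entry to
  * the comparator in the same order.  The sketch uses memcmp() and a
  * hand-rolled list; all names are hypothetical.
  */
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ADDR_LEN	6

struct entry {
	uint8_t		mac[ADDR_LEN];
	struct entry	*next;
};

/* Byte-wise compare: negative, zero or positive, like memcmp(). */
static int
addr_cmp(const uint8_t *a, const uint8_t *b)
{
	return (memcmp(a, b, ADDR_LEN));
}

/* Insert keeping the bucket in descending order; -1 on a duplicate. */
static int
bucket_insert(struct entry **head, struct entry *e)
{
	struct entry **pp;
	int dir;

	assert(e != NULL);
	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		dir = addr_cmp(e->mac, (*pp)->mac);
		if (dir == 0)
			return (-1);
		if (dir > 0)
			break;		/* e sorts before *pp */
	}
	e->next = *pp;
	*pp = e;
	return (0);
}

/*
 * The key is the first comparator argument, as in bucket_insert().
 * Once the key is greater than the current entry, every later entry
 * is smaller still, so the walk can stop.
 */
static struct entry *
bucket_lookup(struct entry *head, const uint8_t *mac)
{
	struct entry *e;
	int dir;

	for (e = head; e != NULL; e = e->next) {
		dir = addr_cmp(mac, e->mac);
		if (dir == 0)
			return (e);
		if (dir > 0)
			break;
	}
	return (NULL);
}
```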
 
 static void
 vxlan_ftable_entry_dump(struct vxlan_ftable_entry *fe, struct sbuf *sb)
 {
 	char buf[64];
 	const union vxlan_sockaddr *sa;
 	const void *addr;
 	int i, len, af, width;
 
 	sa = &fe->vxlfe_raddr;
 	af = sa->sa.sa_family;
 	len = sbuf_len(sb);
 
 	sbuf_printf(sb, "%c 0x%02X ", VXLAN_FE_IS_DYNAMIC(fe) ? 'D' : 'S',
 	    fe->vxlfe_flags);
 
 	for (i = 0; i < ETHER_ADDR_LEN - 1; i++)
 		sbuf_printf(sb, "%02X:", fe->vxlfe_mac[i]);
 	sbuf_printf(sb, "%02X ", fe->vxlfe_mac[i]);
 
 	if (af == AF_INET) {
 		addr = &sa->in4.sin_addr;
 		width = INET_ADDRSTRLEN - 1;
 	} else {
 		addr = &sa->in6.sin6_addr;
 		width = INET6_ADDRSTRLEN - 1;
 	}
 	inet_ntop(af, addr, buf, sizeof(buf));
 	sbuf_printf(sb, "%*s ", width, buf);
 
 	sbuf_printf(sb, "%08jd", (intmax_t)fe->vxlfe_expire);
 
 	sbuf_putc(sb, '\n');
 
 	/* Truncate a partial line. */
 	if (sbuf_error(sb) != 0)
 		sbuf_setpos(sb, len);
 }
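
 /*
  * Illustrative userland sketch, not part of the driver.  The dump
  * routine above remembers the buffer length before writing a record
  * and rewinds to it if the fixed-size buffer overflowed, so a partial
  * line is never emitted.  The sketch does the same with vsnprintf();
  * struct fixbuf and its helpers are hypothetical stand-ins for sbuf(9).
  */
```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct fixbuf {
	char	data[32];
	size_t	len;
	int	error;		/* set once an append did not fit */
};

/* Append formatted text; latch an error instead of growing. */
static void
fixbuf_printf(struct fixbuf *b, const char *fmt, ...)
{
	va_list ap;
	size_t room;
	int n;

	if (b->error)
		return;
	room = sizeof(b->data) - b->len;
	va_start(ap, fmt);
	n = vsnprintf(b->data + b->len, room, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= room)
		b->error = 1;
	else
		b->len += (size_t)n;
}

/* Append one record; on overflow roll back to the saved position. */
static void
dump_record(struct fixbuf *b, int id)
{
	size_t mark;

	mark = b->len;
	assert(mark < sizeof(b->data));

	fixbuf_printf(b, "record %d ", id);
	fixbuf_printf(b, "ok\n");

	if (b->error) {
		/* Truncate the partial line. */
		b->len = mark;
		b->data[mark] = '\0';
	}
}
```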
 
 static struct vxlan_socket *
 vxlan_socket_alloc(const union vxlan_sockaddr *sa)
 {
 	struct vxlan_socket *vso;
 	int i;
 
 	vso = malloc(sizeof(*vso), M_VXLAN, M_WAITOK | M_ZERO);
 	rm_init(&vso->vxlso_lock, "vxlansorm");
 	refcount_init(&vso->vxlso_refcnt, 0);
 	for (i = 0; i < VXLAN_SO_VNI_HASH_SIZE; i++)
 		LIST_INIT(&vso->vxlso_vni_hash[i]);
 	vso->vxlso_laddr = *sa;
 
 	return (vso);
 }
 
 static void
 vxlan_socket_destroy(struct vxlan_socket *vso)
 {
 	struct socket *so;
 	struct vxlan_socket_mc_info *mc;
 	int i;
 
 	for (i = 0; i < VXLAN_SO_MC_MAX_GROUPS; i++) {
 		mc = &vso->vxlso_mc[i];
 		KASSERT(mc->vxlsomc_gaddr.sa.sa_family == AF_UNSPEC,
 		    ("%s: socket %p mc[%d] still has address",
 		     __func__, vso, i));
 	}
 
 	for (i = 0; i < VXLAN_SO_VNI_HASH_SIZE; i++) {
 		KASSERT(LIST_EMPTY(&vso->vxlso_vni_hash[i]),
 		    ("%s: socket %p vni_hash[%d] not empty",
 		     __func__, vso, i));
 	}
 
 	so = vso->vxlso_sock;
 	if (so != NULL) {
 		vso->vxlso_sock = NULL;
 		soclose(so);
 	}
 
 	rm_destroy(&vso->vxlso_lock);
 	free(vso, M_VXLAN);
 }
 
 static void
 vxlan_socket_release(struct vxlan_socket *vso)
 {
 	int destroy;
 
 	mtx_lock(&vxlan_list_mtx);
 	destroy = VXLAN_SO_RELEASE(vso);
 	if (destroy != 0)
 		LIST_REMOVE(vso, vxlso_entry);
 	mtx_unlock(&vxlan_list_mtx);
 
 	if (destroy != 0)
 		vxlan_socket_destroy(vso);
 }
 
 static struct vxlan_socket *
 vxlan_socket_lookup(union vxlan_sockaddr *vxlsa)
 {
 	struct vxlan_socket *vso;
 
 	mtx_lock(&vxlan_list_mtx);
 	LIST_FOREACH(vso, &vxlan_socket_list, vxlso_entry) {
 		if (vxlan_sockaddr_cmp(&vso->vxlso_laddr, &vxlsa->sa) == 0) {
 			VXLAN_SO_ACQUIRE(vso);
 			break;
 		}
 	}
 	mtx_unlock(&vxlan_list_mtx);
 
 	return (vso);
 }
 
 static void
 vxlan_socket_insert(struct vxlan_socket *vso)
 {
 
 	mtx_lock(&vxlan_list_mtx);
 	VXLAN_SO_ACQUIRE(vso);
 	LIST_INSERT_HEAD(&vxlan_socket_list, vso, vxlso_entry);
 	mtx_unlock(&vxlan_list_mtx);
 }
 
 static int
 vxlan_socket_init(struct vxlan_socket *vso, struct ifnet *ifp)
 {
 	struct thread *td;
 	int error;
 
 	td = curthread;
 
 	error = socreate(vso->vxlso_laddr.sa.sa_family, &vso->vxlso_sock,
 	    SOCK_DGRAM, IPPROTO_UDP, td->td_ucred, td);
 	if (error) {
 		if_printf(ifp, "cannot create socket: %d\n", error);
 		return (error);
 	}
 
 	error = udp_set_kernel_tunneling(vso->vxlso_sock,
 	    vxlan_rcv_udp_packet, NULL, vso);
 	if (error) {
 		if_printf(ifp, "cannot set tunneling function: %d\n", error);
 		return (error);
 	}
 
 	if (vxlan_reuse_port != 0) {
 		struct sockopt sopt;
 		int val = 1;
 
 		bzero(&sopt, sizeof(sopt));
 		sopt.sopt_dir = SOPT_SET;
 		sopt.sopt_level = SOL_SOCKET;
 		sopt.sopt_name = SO_REUSEPORT;
 		sopt.sopt_val = &val;
 		sopt.sopt_valsize = sizeof(val);
 		error = sosetopt(vso->vxlso_sock, &sopt);
 		if (error) {
 			if_printf(ifp,
 			    "cannot set REUSEPORT socket opt: %d\n", error);
 			return (error);
 		}
 	}
 
 	return (0);
 }
 
 static int
 vxlan_socket_bind(struct vxlan_socket *vso, struct ifnet *ifp)
 {
 	union vxlan_sockaddr laddr;
 	struct thread *td;
 	int error;
 
 	td = curthread;
 	laddr = vso->vxlso_laddr;
 
 	error = sobind(vso->vxlso_sock, &laddr.sa, td);
 	if (error) {
 		if (error != EADDRINUSE)
 			if_printf(ifp, "cannot bind socket: %d\n", error);
 		return (error);
 	}
 
 	return (0);
 }
 
 static int
 vxlan_socket_create(struct ifnet *ifp, int multicast,
     const union vxlan_sockaddr *saddr, struct vxlan_socket **vsop)
 {
 	union vxlan_sockaddr laddr;
 	struct vxlan_socket *vso;
 	int error;
 
 	laddr = *saddr;
 
 	/*
 	 * If this socket will be multicast, then only the local port
 	 * must be specified when binding.
 	 */
 	if (multicast != 0) {
 		if (VXLAN_SOCKADDR_IS_IPV4(&laddr))
 			laddr.in4.sin_addr.s_addr = INADDR_ANY;
 #ifdef INET6
 		else
 			laddr.in6.sin6_addr = in6addr_any;
 #endif
 	}
 
 	vso = vxlan_socket_alloc(&laddr);
 	if (vso == NULL)
 		return (ENOMEM);
 
 	error = vxlan_socket_init(vso, ifp);
 	if (error)
 		goto fail;
 
 	error = vxlan_socket_bind(vso, ifp);
 	if (error)
 		goto fail;
 
 	/*
 	 * There is a small window between the bind completing and
 	 * inserting the socket, so that a concurrent create may fail.
 	 * Let's not worry about that for now.
 	 */
 	vxlan_socket_insert(vso);
 	*vsop = vso;
 
 	return (0);
 
 fail:
 	vxlan_socket_destroy(vso);
 
 	return (error);
 }
 
 static void
 vxlan_socket_ifdetach(struct vxlan_socket *vso, struct ifnet *ifp,
     struct vxlan_softc_head *list)
 {
 	struct rm_priotracker tracker;
 	struct vxlan_softc *sc;
 	int i;
 
 	VXLAN_SO_RLOCK(vso, &tracker);
 	for (i = 0; i < VXLAN_SO_VNI_HASH_SIZE; i++) {
 		LIST_FOREACH(sc, &vso->vxlso_vni_hash[i], vxl_entry)
 			vxlan_ifdetach(sc, ifp, list);
 	}
 	VXLAN_SO_RUNLOCK(vso, &tracker);
 }
 
 static struct vxlan_socket *
 vxlan_socket_mc_lookup(const union vxlan_sockaddr *vxlsa)
 {
 	struct vxlan_socket *vso;
 	union vxlan_sockaddr laddr;
 
 	laddr = *vxlsa;
 
 	if (VXLAN_SOCKADDR_IS_IPV4(&laddr))
 		laddr.in4.sin_addr.s_addr = INADDR_ANY;
 #ifdef INET6
 	else
 		laddr.in6.sin6_addr = in6addr_any;
 #endif
 
 	vso = vxlan_socket_lookup(&laddr);
 
 	return (vso);
 }
 
 static int
 vxlan_sockaddr_mc_info_match(const struct vxlan_socket_mc_info *mc,
     const union vxlan_sockaddr *group, const union vxlan_sockaddr *local,
     int ifidx)
 {
 
 	if (!vxlan_sockaddr_in_any(local) &&
 	    !vxlan_sockaddr_in_equal(&mc->vxlsomc_saddr, &local->sa))
 		return (0);
 	if (!vxlan_sockaddr_in_equal(&mc->vxlsomc_gaddr, &group->sa))
 		return (0);
 	if (ifidx != 0 && ifidx != mc->vxlsomc_ifidx)
 		return (0);
 
 	return (1);
 }
 
 static int
 vxlan_socket_mc_join_group(struct vxlan_socket *vso,
     const union vxlan_sockaddr *group, const union vxlan_sockaddr *local,
     int *ifidx, union vxlan_sockaddr *source)
 {
 	struct sockopt sopt;
 	int error;
 
 	*source = *local;
 
 	if (VXLAN_SOCKADDR_IS_IPV4(group)) {
 		struct ip_mreq mreq;
 
 		mreq.imr_multiaddr = group->in4.sin_addr;
 		mreq.imr_interface = local->in4.sin_addr;
 
 		bzero(&sopt, sizeof(sopt));
 		sopt.sopt_dir = SOPT_SET;
 		sopt.sopt_level = IPPROTO_IP;
 		sopt.sopt_name = IP_ADD_MEMBERSHIP;
 		sopt.sopt_val = &mreq;
 		sopt.sopt_valsize = sizeof(mreq);
 		error = sosetopt(vso->vxlso_sock, &sopt);
 		if (error)
 			return (error);
 
 		/*
 		 * BMV: Ideally, there would be a formal way for us to get
 		 * the local interface that was selected based on the
 		 * imr_interface address. We could then update *ifidx so
 		 * vxlan_sockaddr_mc_info_match() would return a match for
 		 * later creates that explicitly set the multicast interface.
 		 *
 		 * If we really need to, we can of course look in the INP's
 		 * membership list:
 		 *     sotoinpcb(vso->vxlso_sock)->inp_moptions->
 		 *         imo_membership[]->inm_ifp
 		 * similarly to imo_match_group().
 		 */
 		source->in4.sin_addr = local->in4.sin_addr;
 
 	} else if (VXLAN_SOCKADDR_IS_IPV6(group)) {
 		struct ipv6_mreq mreq;
 
 		mreq.ipv6mr_multiaddr = group->in6.sin6_addr;
 		mreq.ipv6mr_interface = *ifidx;
 
 		bzero(&sopt, sizeof(sopt));
 		sopt.sopt_dir = SOPT_SET;
 		sopt.sopt_level = IPPROTO_IPV6;
 		sopt.sopt_name = IPV6_JOIN_GROUP;
 		sopt.sopt_val = &mreq;
 		sopt.sopt_valsize = sizeof(mreq);
 		error = sosetopt(vso->vxlso_sock, &sopt);
 		if (error)
 			return (error);
 
 		/*
 		 * BMV: As with IPv4, we would really like to know what
 		 * interface in6p_lookup_mcast_ifp() selected.
 		 */
 	} else
 		error = EAFNOSUPPORT;
 
 	return (error);
 }
 
 static int
 vxlan_socket_mc_leave_group(struct vxlan_socket *vso,
     const union vxlan_sockaddr *group, const union vxlan_sockaddr *source,
     int ifidx)
 {
 	struct sockopt sopt;
 	int error;
 
 	bzero(&sopt, sizeof(sopt));
 	sopt.sopt_dir = SOPT_SET;
 
 	if (VXLAN_SOCKADDR_IS_IPV4(group)) {
 		struct ip_mreq mreq;
 
 		mreq.imr_multiaddr = group->in4.sin_addr;
 		mreq.imr_interface = source->in4.sin_addr;
 
 		sopt.sopt_level = IPPROTO_IP;
 		sopt.sopt_name = IP_DROP_MEMBERSHIP;
 		sopt.sopt_val = &mreq;
 		sopt.sopt_valsize = sizeof(mreq);
 		error = sosetopt(vso->vxlso_sock, &sopt);
 
 	} else if (VXLAN_SOCKADDR_IS_IPV6(group)) {
 		struct ipv6_mreq mreq;
 
 		mreq.ipv6mr_multiaddr = group->in6.sin6_addr;
 		mreq.ipv6mr_interface = ifidx;
 
 		sopt.sopt_level = IPPROTO_IPV6;
 		sopt.sopt_name = IPV6_LEAVE_GROUP;
 		sopt.sopt_val = &mreq;
 		sopt.sopt_valsize = sizeof(mreq);
 		error = sosetopt(vso->vxlso_sock, &sopt);
 
 	} else
 		error = EAFNOSUPPORT;
 
 	return (error);
 }
 
 static int
 vxlan_socket_mc_add_group(struct vxlan_socket *vso,
     const union vxlan_sockaddr *group, const union vxlan_sockaddr *local,
     int ifidx, int *idx)
 {
 	union vxlan_sockaddr source;
 	struct vxlan_socket_mc_info *mc;
 	int i, empty, error;
 
 	/*
 	 * Within a socket, the same multicast group may be used by multiple
 	 * interfaces, each with a different network identifier. But a socket
 	 * may only join a multicast group once, so keep track of the users
 	 * here.
 	 */
 
 	VXLAN_SO_WLOCK(vso);
 	for (empty = 0, i = 0; i < VXLAN_SO_MC_MAX_GROUPS; i++) {
 		mc = &vso->vxlso_mc[i];
 
 		if (mc->vxlsomc_gaddr.sa.sa_family == AF_UNSPEC) {
 			empty++;
 			continue;
 		}
 
 		if (vxlan_sockaddr_mc_info_match(mc, group, local, ifidx))
 			goto out;
 	}
 	VXLAN_SO_WUNLOCK(vso);
 
 	if (empty == 0)
 		return (ENOSPC);
 
 	error = vxlan_socket_mc_join_group(vso, group, local, &ifidx, &source);
 	if (error)
 		return (error);
 
 	VXLAN_SO_WLOCK(vso);
 	for (i = 0; i < VXLAN_SO_MC_MAX_GROUPS; i++) {
 		mc = &vso->vxlso_mc[i];
 
 		if (mc->vxlsomc_gaddr.sa.sa_family == AF_UNSPEC) {
 			vxlan_sockaddr_copy(&mc->vxlsomc_gaddr, &group->sa);
 			vxlan_sockaddr_copy(&mc->vxlsomc_saddr, &source.sa);
 			mc->vxlsomc_ifidx = ifidx;
 			goto out;
 		}
 	}
 	VXLAN_SO_WUNLOCK(vso);
 
 	error = vxlan_socket_mc_leave_group(vso, group, &source, ifidx);
 	MPASS(error == 0);
 
 	return (ENOSPC);
 
 out:
 	mc->vxlsomc_users++;
 	VXLAN_SO_WUNLOCK(vso);
 
 	*idx = i;
 
 	return (0);
 }
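
 /*
  * Illustrative userland sketch, not part of the driver.  It shows the
  * bookkeeping idea from the comment in the function above: a socket
  * joins a group only once, and a per-slot user count tracks how many
  * interfaces share that membership.  Locking, the source address and
  * the interface index are left out, the join and leave are simulated
  * by a counter, and all names are hypothetical.
  */
```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_GROUPS	4	/* stands in for VXLAN_SO_MC_MAX_GROUPS */

struct group_slot {
	uint32_t	group;	/* 0 means the slot is free */
	int		users;
};

struct group_table {
	struct group_slot	slots[MAX_GROUPS];
	int			joins;	/* simulated memberships held */
};

/* Share an existing membership or claim a free slot and join once. */
static int
group_add(struct group_table *t, uint32_t group, int *idx)
{
	int i, empty;

	assert(group != 0);
	empty = -1;
	for (i = 0; i < MAX_GROUPS; i++) {
		if (t->slots[i].group == 0) {
			if (empty == -1)
				empty = i;
			continue;
		}
		if (t->slots[i].group == group)
			goto out;	/* already joined: just add a user */
	}
	if (empty == -1)
		return (ENOSPC);

	i = empty;
	t->slots[i].group = group;
	t->joins++;			/* the real join would happen here */
out:
	t->slots[i].users++;
	*idx = i;
	return (0);
}

/* Drop one user; the last user out releases the membership. */
static void
group_release(struct group_table *t, int idx)
{
	assert(idx >= 0 && idx < MAX_GROUPS);
	assert(t->slots[idx].users > 0);

	if (--t->slots[idx].users == 0) {
		t->slots[idx].group = 0;
		t->joins--;		/* the real leave would happen here */
	}
}
```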
 
 static void
 vxlan_socket_mc_release_group_by_idx(struct vxlan_socket *vso, int idx)
 {
 	union vxlan_sockaddr group, source;
 	struct vxlan_socket_mc_info *mc;
 	int ifidx, leave;
 
 	KASSERT(idx >= 0 && idx < VXLAN_SO_MC_MAX_GROUPS,
 	    ("%s: vso %p idx %d out of bounds", __func__, vso, idx));
 
 	leave = 0;
 	mc = &vso->vxlso_mc[idx];
 
 	VXLAN_SO_WLOCK(vso);
 	mc->vxlsomc_users--;
 	if (mc->vxlsomc_users == 0) {
 		group = mc->vxlsomc_gaddr;
 		source = mc->vxlsomc_saddr;
 		ifidx = mc->vxlsomc_ifidx;
 		bzero(mc, sizeof(*mc));
 		leave = 1;
 	}
 	VXLAN_SO_WUNLOCK(vso);
 
 	if (leave != 0) {
 		/*
 		 * Our socket's membership in this group may have already
 		 * been removed if we joined through an interface that's
 		 * been detached.
 		 */
 		vxlan_socket_mc_leave_group(vso, &group, &source, ifidx);
 	}
 }
 
 static struct vxlan_softc *
 vxlan_socket_lookup_softc_locked(struct vxlan_socket *vso, uint32_t vni)
 {
 	struct vxlan_softc *sc;
 	uint32_t hash;
 
 	VXLAN_SO_LOCK_ASSERT(vso);
 	hash = VXLAN_SO_VNI_HASH(vni);
 
 	LIST_FOREACH(sc, &vso->vxlso_vni_hash[hash], vxl_entry) {
 		if (sc->vxl_vni == vni) {
 			VXLAN_ACQUIRE(sc);
 			break;
 		}
 	}
 
 	return (sc);
 }
 
 static struct vxlan_softc *
 vxlan_socket_lookup_softc(struct vxlan_socket *vso, uint32_t vni)
 {
 	struct rm_priotracker tracker;
 	struct vxlan_softc *sc;
 
 	VXLAN_SO_RLOCK(vso, &tracker);
 	sc = vxlan_socket_lookup_softc_locked(vso, vni);
 	VXLAN_SO_RUNLOCK(vso, &tracker);
 
 	return (sc);
 }
 
 static int
 vxlan_socket_insert_softc(struct vxlan_socket *vso, struct vxlan_softc *sc)
 {
 	struct vxlan_softc *tsc;
 	uint32_t vni, hash;
 
 	vni = sc->vxl_vni;
 	hash = VXLAN_SO_VNI_HASH(vni);
 
 	VXLAN_SO_WLOCK(vso);
 	tsc = vxlan_socket_lookup_softc_locked(vso, vni);
 	if (tsc != NULL) {
 		VXLAN_SO_WUNLOCK(vso);
 		vxlan_release(tsc);
 		return (EEXIST);
 	}
 
 	VXLAN_ACQUIRE(sc);
 	LIST_INSERT_HEAD(&vso->vxlso_vni_hash[hash], sc, vxl_entry);
 	VXLAN_SO_WUNLOCK(vso);
 
 	return (0);
 }
 
 static void
 vxlan_socket_remove_softc(struct vxlan_socket *vso, struct vxlan_softc *sc)
 {
 
 	VXLAN_SO_WLOCK(vso);
 	LIST_REMOVE(sc, vxl_entry);
 	VXLAN_SO_WUNLOCK(vso);
 
 	vxlan_release(sc);
 }
 
 static struct ifnet *
 vxlan_multicast_if_ref(struct vxlan_softc *sc, int ipv4)
 {
 	struct ifnet *ifp;
 
 	VXLAN_LOCK_ASSERT(sc);
 
 	if (ipv4 && sc->vxl_im4o != NULL)
 		ifp = sc->vxl_im4o->imo_multicast_ifp;
 	else if (!ipv4 && sc->vxl_im6o != NULL)
 		ifp = sc->vxl_im6o->im6o_multicast_ifp;
 	else
 		ifp = NULL;
 
 	if (ifp != NULL)
 		if_ref(ifp);
 
 	return (ifp);
 }
 
 static void
 vxlan_free_multicast(struct vxlan_softc *sc)
 {
 
 	if (sc->vxl_mc_ifp != NULL) {
 		if_rele(sc->vxl_mc_ifp);
 		sc->vxl_mc_ifp = NULL;
 		sc->vxl_mc_ifindex = 0;
 	}
 
 	if (sc->vxl_im4o != NULL) {
 		free(sc->vxl_im4o, M_VXLAN);
 		sc->vxl_im4o = NULL;
 	}
 
 	if (sc->vxl_im6o != NULL) {
 		free(sc->vxl_im6o, M_VXLAN);
 		sc->vxl_im6o = NULL;
 	}
 }
 
 static int
 vxlan_setup_multicast_interface(struct vxlan_softc *sc)
 {
 	struct ifnet *ifp;
 
 	ifp = ifunit_ref(sc->vxl_mc_ifname);
 	if (ifp == NULL) {
 		if_printf(sc->vxl_ifp, "multicast interface %s does "
 		    "not exist\n", sc->vxl_mc_ifname);
 		return (ENOENT);
 	}
 
 	if ((ifp->if_flags & IFF_MULTICAST) == 0) {
 		if_printf(sc->vxl_ifp, "interface %s does not support "
 		     "multicast\n", sc->vxl_mc_ifname);
 		if_rele(ifp);
 		return (ENOTSUP);
 	}
 
 	sc->vxl_mc_ifp = ifp;
 	sc->vxl_mc_ifindex = ifp->if_index;
 
 	return (0);
 }
 
 static int
 vxlan_setup_multicast(struct vxlan_softc *sc)
 {
 	const union vxlan_sockaddr *group;
 	int error;
 
 	group = &sc->vxl_dst_addr;
 	error = 0;
 
 	if (sc->vxl_mc_ifname[0] != '\0') {
 		error = vxlan_setup_multicast_interface(sc);
 		if (error)
 			return (error);
 	}
 
 	/*
 	 * Initialize a multicast options structure that is sufficiently
 	 * populated for use in the respective IP output routine. This
 	 * structure is typically stored in the socket, but our sockets
 	 * may be shared among multiple interfaces.
 	 */
 	if (VXLAN_SOCKADDR_IS_IPV4(group)) {
 		sc->vxl_im4o = malloc(sizeof(struct ip_moptions), M_VXLAN,
 		    M_ZERO | M_WAITOK);
 		sc->vxl_im4o->imo_multicast_ifp = sc->vxl_mc_ifp;
 		sc->vxl_im4o->imo_multicast_ttl = sc->vxl_ttl;
 		sc->vxl_im4o->imo_multicast_vif = -1;
 	} else if (VXLAN_SOCKADDR_IS_IPV6(group)) {
 		sc->vxl_im6o = malloc(sizeof(struct ip6_moptions), M_VXLAN,
 		    M_ZERO | M_WAITOK);
 		sc->vxl_im6o->im6o_multicast_ifp = sc->vxl_mc_ifp;
 		sc->vxl_im6o->im6o_multicast_hlim = sc->vxl_ttl;
 	}
 
 	return (error);
 }
 
 static int
 vxlan_setup_socket(struct vxlan_softc *sc)
 {
 	struct vxlan_socket *vso;
 	struct ifnet *ifp;
 	union vxlan_sockaddr *saddr, *daddr;
 	int multicast, error;
 
 	vso = NULL;
 	ifp = sc->vxl_ifp;
 	saddr = &sc->vxl_src_addr;
 	daddr = &sc->vxl_dst_addr;
 
 	multicast = vxlan_sockaddr_in_multicast(daddr);
 	MPASS(multicast != -1);
 	sc->vxl_vso_mc_index = -1;
 
 	/*
 	 * Try to create the socket. If that fails, attempt to use an
 	 * existing socket.
 	 */
 	error = vxlan_socket_create(ifp, multicast, saddr, &vso);
 	if (error) {
 		if (multicast != 0)
 			vso = vxlan_socket_mc_lookup(saddr);
 		else
 			vso = vxlan_socket_lookup(saddr);
 
 		if (vso == NULL) {
 			if_printf(ifp, "cannot create socket (error: %d), "
 			    "and no existing socket found\n", error);
 			goto out;
 		}
 	}
 
 	if (multicast != 0) {
 		error = vxlan_setup_multicast(sc);
 		if (error)
 			goto out;
 
 		error = vxlan_socket_mc_add_group(vso, daddr, saddr,
 		    sc->vxl_mc_ifindex, &sc->vxl_vso_mc_index);
 		if (error)
 			goto out;
 	}
 
 	sc->vxl_sock = vso;
 	error = vxlan_socket_insert_softc(vso, sc);
 	if (error) {
 		sc->vxl_sock = NULL;
 		if_printf(ifp, "network identifier %d already exists in "
 		    "this socket\n", sc->vxl_vni);
 		goto out;
 	}
 
 	return (0);
 
 out:
 	if (vso != NULL) {
 		if (sc->vxl_vso_mc_index != -1) {
 			vxlan_socket_mc_release_group_by_idx(vso,
 			    sc->vxl_vso_mc_index);
 			sc->vxl_vso_mc_index = -1;
 		}
 		if (multicast != 0)
 			vxlan_free_multicast(sc);
 		vxlan_socket_release(vso);
 	}
 
 	return (error);
 }
 
 static void
 vxlan_setup_interface(struct vxlan_softc *sc)
 {
 	struct ifnet *ifp;
 
 	ifp = sc->vxl_ifp;
 	ifp->if_hdrlen = ETHER_HDR_LEN + sizeof(struct vxlanudphdr);
 
 	if (VXLAN_SOCKADDR_IS_IPV4(&sc->vxl_dst_addr) != 0)
 		ifp->if_hdrlen += sizeof(struct ip);
 	else if (VXLAN_SOCKADDR_IS_IPV6(&sc->vxl_dst_addr) != 0)
 		ifp->if_hdrlen += sizeof(struct ip6_hdr);
 }
 
 static int
 vxlan_valid_init_config(struct vxlan_softc *sc)
 {
 	const char *reason;
 
 	if (vxlan_check_vni(sc->vxl_vni) != 0) {
 		reason = "invalid virtual network identifier specified";
 		goto fail;
 	}
 
 	if (vxlan_sockaddr_supported(&sc->vxl_src_addr, 1) == 0) {
 		reason = "source address type is not supported";
 		goto fail;
 	}
 
 	if (vxlan_sockaddr_supported(&sc->vxl_dst_addr, 0) == 0) {
 		reason = "destination address type is not supported";
 		goto fail;
 	}
 
 	if (vxlan_sockaddr_in_any(&sc->vxl_dst_addr) != 0) {
 		reason = "no valid destination address specified";
 		goto fail;
 	}
 
 	if (vxlan_sockaddr_in_multicast(&sc->vxl_dst_addr) == 0 &&
 	    sc->vxl_mc_ifname[0] != '\0') {
 		reason = "can only specify interface with a group address";
 		goto fail;
 	}
 
 	if (vxlan_sockaddr_in_any(&sc->vxl_src_addr) == 0) {
 		if (VXLAN_SOCKADDR_IS_IPV4(&sc->vxl_src_addr) ^
 		    VXLAN_SOCKADDR_IS_IPV4(&sc->vxl_dst_addr)) {
 			reason = "source and destination address must both "
 			    "be either IPv4 or IPv6";
 			goto fail;
 		}
 	}
 
 	if (sc->vxl_src_addr.in4.sin_port == 0) {
 		reason = "local port not specified";
 		goto fail;
 	}
 
 	if (sc->vxl_dst_addr.in4.sin_port == 0) {
 		reason = "remote port not specified";
 		goto fail;
 	}
 
 	return (0);
 
 fail:
 	if_printf(sc->vxl_ifp, "cannot initialize interface: %s\n", reason);
 	return (EINVAL);
 }
 
 static void
 vxlan_init_wait(struct vxlan_softc *sc)
 {
 
 	VXLAN_LOCK_WASSERT(sc);
 	while (sc->vxl_flags & VXLAN_FLAG_INIT)
 		rm_sleep(sc, &sc->vxl_lock, 0, "vxlint", hz);
 }
 
 static void
 vxlan_init_complete(struct vxlan_softc *sc)
 {
 
 	VXLAN_WLOCK(sc);
 	sc->vxl_flags &= ~VXLAN_FLAG_INIT;
 	wakeup(sc);
 	VXLAN_WUNLOCK(sc);
 }
 
 static void
 vxlan_init(void *xsc)
 {
 	static const uint8_t empty_mac[ETHER_ADDR_LEN];
 	struct vxlan_softc *sc;
 	struct ifnet *ifp;
 
 	sc = xsc;
 	ifp = sc->vxl_ifp;
 
 	VXLAN_WLOCK(sc);
 	if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
 		VXLAN_WUNLOCK(sc);
 		return;
 	}
 	sc->vxl_flags |= VXLAN_FLAG_INIT;
 	VXLAN_WUNLOCK(sc);
 
 	if (vxlan_valid_init_config(sc) != 0)
 		goto out;
 
 	vxlan_setup_interface(sc);
 
 	if (vxlan_setup_socket(sc) != 0)
 		goto out;
 
 	/* Initialize the default forwarding entry. */
 	vxlan_ftable_entry_init(sc, &sc->vxl_default_fe, empty_mac,
 	    &sc->vxl_dst_addr.sa, VXLAN_FE_FLAG_STATIC);
 
 	VXLAN_WLOCK(sc);
 	ifp->if_drv_flags |= IFF_DRV_RUNNING;
 	callout_reset(&sc->vxl_callout, vxlan_ftable_prune_period * hz,
 	    vxlan_timer, sc);
 	VXLAN_WUNLOCK(sc);
 
 out:
 	vxlan_init_complete(sc);
 }
 
 static void
 vxlan_release(struct vxlan_softc *sc)
 {
 
 	/*
 	 * The softc may be destroyed as soon as we release our reference,
 	 * so we cannot serialize the wakeup with the softc lock. We use a
 	 * timeout in our sleeps so a missed wakeup is unfortunate but not
 	 * fatal.
 	 */
 	if (VXLAN_RELEASE(sc) != 0)
 		wakeup(sc);
 }
 
 static void
 vxlan_teardown_wait(struct vxlan_softc *sc)
 {
 
 	VXLAN_LOCK_WASSERT(sc);
 	while (sc->vxl_flags & VXLAN_FLAG_TEARDOWN)
 		rm_sleep(sc, &sc->vxl_lock, 0, "vxltrn", hz);
 }
 
 static void
 vxlan_teardown_complete(struct vxlan_softc *sc)
 {
 
 	VXLAN_WLOCK(sc);
 	sc->vxl_flags &= ~VXLAN_FLAG_TEARDOWN;
 	wakeup(sc);
 	VXLAN_WUNLOCK(sc);
 }
 
 static void
 vxlan_teardown_locked(struct vxlan_softc *sc)
 {
 	struct ifnet *ifp;
 	struct vxlan_socket *vso;
 
 	ifp = sc->vxl_ifp;
 
 	VXLAN_LOCK_WASSERT(sc);
 	MPASS(sc->vxl_flags & VXLAN_FLAG_TEARDOWN);
 
 	ifp->if_flags &= ~IFF_UP;
 	ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
 	callout_stop(&sc->vxl_callout);
 	vso = sc->vxl_sock;
 	sc->vxl_sock = NULL;
 
 	VXLAN_WUNLOCK(sc);
 
 	if (vso != NULL) {
 		vxlan_socket_remove_softc(vso, sc);
 
 		if (sc->vxl_vso_mc_index != -1) {
 			vxlan_socket_mc_release_group_by_idx(vso,
 			    sc->vxl_vso_mc_index);
 			sc->vxl_vso_mc_index = -1;
 		}
 	}
 
 	VXLAN_WLOCK(sc);
 	while (sc->vxl_refcnt != 0)
 		rm_sleep(sc, &sc->vxl_lock, 0, "vxldrn", hz);
 	VXLAN_WUNLOCK(sc);
 
 	callout_drain(&sc->vxl_callout);
 
 	vxlan_free_multicast(sc);
 	if (vso != NULL)
 		vxlan_socket_release(vso);
 
 	vxlan_teardown_complete(sc);
 }
 
 static void
 vxlan_teardown(struct vxlan_softc *sc)
 {
 
 	VXLAN_WLOCK(sc);
 	if (sc->vxl_flags & VXLAN_FLAG_TEARDOWN) {
 		vxlan_teardown_wait(sc);
 		VXLAN_WUNLOCK(sc);
 		return;
 	}
 
 	sc->vxl_flags |= VXLAN_FLAG_TEARDOWN;
 	vxlan_teardown_locked(sc);
 }
 
 static void
 vxlan_ifdetach(struct vxlan_softc *sc, struct ifnet *ifp,
     struct vxlan_softc_head *list)
 {
 
 	VXLAN_WLOCK(sc);
 
 	if (sc->vxl_mc_ifp != ifp)
 		goto out;
 	if (sc->vxl_flags & VXLAN_FLAG_TEARDOWN)
 		goto out;
 
 	sc->vxl_flags |= VXLAN_FLAG_TEARDOWN;
 	LIST_INSERT_HEAD(list, sc, vxl_ifdetach_list);
 
 out:
 	VXLAN_WUNLOCK(sc);
 }
 
 static void
 vxlan_timer(void *xsc)
 {
 	struct vxlan_softc *sc;
 
 	sc = xsc;
 	VXLAN_LOCK_WASSERT(sc);
 
 	vxlan_ftable_expire(sc);
 	callout_schedule(&sc->vxl_callout, vxlan_ftable_prune_period * hz);
 }
 
 static int
 vxlan_ioctl_ifflags(struct vxlan_softc *sc)
 {
 	struct ifnet *ifp;
 
 	ifp = sc->vxl_ifp;
 
 	if (ifp->if_flags & IFF_UP) {
 		if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
 			vxlan_init(sc);
 	} else {
 		if (ifp->if_drv_flags & IFF_DRV_RUNNING)
 			vxlan_teardown(sc);
 	}
 
 	return (0);
 }
 
 static int
 vxlan_ctrl_get_config(struct vxlan_softc *sc, void *arg)
 {
 	struct rm_priotracker tracker;
 	struct ifvxlancfg *cfg;
 
 	cfg = arg;
 	bzero(cfg, sizeof(*cfg));
 
 	VXLAN_RLOCK(sc, &tracker);
 	cfg->vxlc_vni = sc->vxl_vni;
 	memcpy(&cfg->vxlc_local_sa, &sc->vxl_src_addr,
 	    sizeof(union vxlan_sockaddr));
 	memcpy(&cfg->vxlc_remote_sa, &sc->vxl_dst_addr,
 	    sizeof(union vxlan_sockaddr));
 	cfg->vxlc_mc_ifindex = sc->vxl_mc_ifindex;
 	cfg->vxlc_ftable_cnt = sc->vxl_ftable_cnt;
 	cfg->vxlc_ftable_max = sc->vxl_ftable_max;
 	cfg->vxlc_ftable_timeout = sc->vxl_ftable_timeout;
 	cfg->vxlc_port_min = sc->vxl_min_port;
 	cfg->vxlc_port_max = sc->vxl_max_port;
 	cfg->vxlc_learn = (sc->vxl_flags & VXLAN_FLAG_LEARN) != 0;
 	cfg->vxlc_ttl = sc->vxl_ttl;
 	VXLAN_RUNLOCK(sc, &tracker);
 
 	return (0);
 }
 
 static int
 vxlan_ctrl_set_vni(struct vxlan_softc *sc, void *arg)
 {
 	struct ifvxlancmd *cmd;
 	int error;
 
 	cmd = arg;
 
 	if (vxlan_check_vni(cmd->vxlcmd_vni) != 0)
 		return (EINVAL);
 
 	VXLAN_WLOCK(sc);
 	if (vxlan_can_change_config(sc)) {
 		sc->vxl_vni = cmd->vxlcmd_vni;
 		error = 0;
 	} else
 		error = EBUSY;
 	VXLAN_WUNLOCK(sc);
 
 	return (error);
 }
 
 static int
 vxlan_ctrl_set_local_addr(struct vxlan_softc *sc, void *arg)
 {
 	struct ifvxlancmd *cmd;
 	union vxlan_sockaddr *vxlsa;
 	int error;
 
 	cmd = arg;
 	vxlsa = &cmd->vxlcmd_sa;
 
 	if (!VXLAN_SOCKADDR_IS_IPV46(vxlsa))
 		return (EINVAL);
 	if (vxlan_sockaddr_in_multicast(vxlsa) != 0)
 		return (EINVAL);
 
 	VXLAN_WLOCK(sc);
 	if (vxlan_can_change_config(sc)) {
 		vxlan_sockaddr_in_copy(&sc->vxl_src_addr, &vxlsa->sa);
 		error = 0;
 	} else
 		error = EBUSY;
 	VXLAN_WUNLOCK(sc);
 
 	return (error);
 }
 
 static int
 vxlan_ctrl_set_remote_addr(struct vxlan_softc *sc, void *arg)
 {
 	struct ifvxlancmd *cmd;
 	union vxlan_sockaddr *vxlsa;
 	int error;
 
 	cmd = arg;
 	vxlsa = &cmd->vxlcmd_sa;
 
 	if (!VXLAN_SOCKADDR_IS_IPV46(vxlsa))
 		return (EINVAL);
 
 	VXLAN_WLOCK(sc);
 	if (vxlan_can_change_config(sc)) {
 		vxlan_sockaddr_in_copy(&sc->vxl_dst_addr, &vxlsa->sa);
 		error = 0;
 	} else
 		error = EBUSY;
 	VXLAN_WUNLOCK(sc);
 
 	return (error);
 }
 
 static int
 vxlan_ctrl_set_local_port(struct vxlan_softc *sc, void *arg)
 {
 	struct ifvxlancmd *cmd;
 	int error;
 
 	cmd = arg;
 
 	if (cmd->vxlcmd_port == 0)
 		return (EINVAL);
 
 	VXLAN_WLOCK(sc);
 	if (vxlan_can_change_config(sc)) {
 		sc->vxl_src_addr.in4.sin_port = htons(cmd->vxlcmd_port);
 		error = 0;
 	} else
 		error = EBUSY;
 	VXLAN_WUNLOCK(sc);
 
 	return (error);
 }
 
 static int
 vxlan_ctrl_set_remote_port(struct vxlan_softc *sc, void *arg)
 {
 	struct ifvxlancmd *cmd;
 	int error;
 
 	cmd = arg;
 
 	if (cmd->vxlcmd_port == 0)
 		return (EINVAL);
 
 	VXLAN_WLOCK(sc);
 	if (vxlan_can_change_config(sc)) {
 		sc->vxl_dst_addr.in4.sin_port = htons(cmd->vxlcmd_port);
 		error = 0;
 	} else
 		error = EBUSY;
 	VXLAN_WUNLOCK(sc);
 
 	return (error);
 }
 
 static int
 vxlan_ctrl_set_port_range(struct vxlan_softc *sc, void *arg)
 {
 	struct ifvxlancmd *cmd;
 	uint16_t min, max;
 	int error;
 
 	cmd = arg;
 	min = cmd->vxlcmd_port_min;
 	max = cmd->vxlcmd_port_max;
 
 	if (max < min)
 		return (EINVAL);
 
 	VXLAN_WLOCK(sc);
 	if (vxlan_can_change_config(sc)) {
 		sc->vxl_min_port = min;
 		sc->vxl_max_port = max;
 		error = 0;
 	} else
 		error = EBUSY;
 	VXLAN_WUNLOCK(sc);
 
 	return (error);
 }
 
 static int
 vxlan_ctrl_set_ftable_timeout(struct vxlan_softc *sc, void *arg)
 {
 	struct ifvxlancmd *cmd;
 	int error;
 
 	cmd = arg;
 
 	VXLAN_WLOCK(sc);
 	if (vxlan_check_ftable_timeout(cmd->vxlcmd_ftable_timeout) == 0) {
 		sc->vxl_ftable_timeout = cmd->vxlcmd_ftable_timeout;
 		error = 0;
 	} else
 		error = EINVAL;
 	VXLAN_WUNLOCK(sc);
 
 	return (error);
 }
 
 static int
 vxlan_ctrl_set_ftable_max(struct vxlan_softc *sc, void *arg)
 {
 	struct ifvxlancmd *cmd;
 	int error;
 
 	cmd = arg;
 
 	VXLAN_WLOCK(sc);
 	if (vxlan_check_ftable_max(cmd->vxlcmd_ftable_max) == 0) {
 		sc->vxl_ftable_max = cmd->vxlcmd_ftable_max;
 		error = 0;
 	} else
 		error = EINVAL;
 	VXLAN_WUNLOCK(sc);
 
 	return (error);
 }
 
 static int
 vxlan_ctrl_set_multicast_if(struct vxlan_softc * sc, void *arg)
 {
 	struct ifvxlancmd *cmd;
 	int error;
 
 	cmd = arg;
 
 	VXLAN_WLOCK(sc);
 	if (vxlan_can_change_config(sc)) {
 		strlcpy(sc->vxl_mc_ifname, cmd->vxlcmd_ifname, IFNAMSIZ);
 		error = 0;
 	} else
 		error = EBUSY;
 	VXLAN_WUNLOCK(sc);
 
 	return (error);
 }
 
 static int
 vxlan_ctrl_set_ttl(struct vxlan_softc *sc, void *arg)
 {
 	struct ifvxlancmd *cmd;
 	int error;
 
 	cmd = arg;
 
 	VXLAN_WLOCK(sc);
 	if (vxlan_check_ttl(cmd->vxlcmd_ttl) == 0) {
 		sc->vxl_ttl = cmd->vxlcmd_ttl;
 		if (sc->vxl_im4o != NULL)
 			sc->vxl_im4o->imo_multicast_ttl = sc->vxl_ttl;
 		if (sc->vxl_im6o != NULL)
 			sc->vxl_im6o->im6o_multicast_hlim = sc->vxl_ttl;
 		error = 0;
 	} else
 		error = EINVAL;
 	VXLAN_WUNLOCK(sc);
 
 	return (error);
 }
 
 static int
 vxlan_ctrl_set_learn(struct vxlan_softc *sc, void *arg)
 {
 	struct ifvxlancmd *cmd;
 
 	cmd = arg;
 
 	VXLAN_WLOCK(sc);
 	if (cmd->vxlcmd_flags & VXLAN_CMD_FLAG_LEARN)
 		sc->vxl_flags |= VXLAN_FLAG_LEARN;
 	else
 		sc->vxl_flags &= ~VXLAN_FLAG_LEARN;
 	VXLAN_WUNLOCK(sc);
 
 	return (0);
 }
 
 static int
 vxlan_ctrl_ftable_entry_add(struct vxlan_softc *sc, void *arg)
 {
 	union vxlan_sockaddr vxlsa;
 	struct ifvxlancmd *cmd;
 	struct vxlan_ftable_entry *fe;
 	int error;
 
 	cmd = arg;
 	vxlsa = cmd->vxlcmd_sa;
 
 	if (!VXLAN_SOCKADDR_IS_IPV46(&vxlsa))
 		return (EINVAL);
 	if (vxlan_sockaddr_in_any(&vxlsa) != 0)
 		return (EINVAL);
 	if (vxlan_sockaddr_in_multicast(&vxlsa) != 0)
 		return (EINVAL);
 	/* BMV: We could support both IPv4 and IPv6 later. */
 	if (vxlsa.sa.sa_family != sc->vxl_dst_addr.sa.sa_family)
 		return (EAFNOSUPPORT);
 
 	fe = vxlan_ftable_entry_alloc();
 	if (fe == NULL)
 		return (ENOMEM);
 
 	if (vxlsa.in4.sin_port == 0)
 		vxlsa.in4.sin_port = sc->vxl_dst_addr.in4.sin_port;
 
 	vxlan_ftable_entry_init(sc, fe, cmd->vxlcmd_mac, &vxlsa.sa,
 	    VXLAN_FE_FLAG_STATIC);
 
 	VXLAN_WLOCK(sc);
 	error = vxlan_ftable_entry_insert(sc, fe);
 	VXLAN_WUNLOCK(sc);
 
 	if (error)
 		vxlan_ftable_entry_free(fe);
 
 	return (error);
 }
 
 static int
 vxlan_ctrl_ftable_entry_rem(struct vxlan_softc *sc, void *arg)
 {
 	struct ifvxlancmd *cmd;
 	struct vxlan_ftable_entry *fe;
 	int error;
 
 	cmd = arg;
 
 	VXLAN_WLOCK(sc);
 	fe = vxlan_ftable_entry_lookup(sc, cmd->vxlcmd_mac);
 	if (fe != NULL) {
 		vxlan_ftable_entry_destroy(sc, fe);
 		error = 0;
 	} else
 		error = ENOENT;
 	VXLAN_WUNLOCK(sc);
 
 	return (error);
 }
 
 static int
 vxlan_ctrl_flush(struct vxlan_softc *sc, void *arg)
 {
 	struct ifvxlancmd *cmd;
 	int all;
 
 	cmd = arg;
 	all = cmd->vxlcmd_flags & VXLAN_CMD_FLAG_FLUSH_ALL;
 
 	VXLAN_WLOCK(sc);
 	vxlan_ftable_flush(sc, all);
 	VXLAN_WUNLOCK(sc);
 
 	return (0);
 }
 
 static int
 vxlan_ioctl_drvspec(struct vxlan_softc *sc, struct ifdrv *ifd, int get)
 {
 	const struct vxlan_control *vc;
 	union {
 		struct ifvxlancfg	cfg;
 		struct ifvxlancmd	cmd;
 	} args;
 	int out, error;
 
 	if (ifd->ifd_cmd >= vxlan_control_table_size)
 		return (EINVAL);
 
 	bzero(&args, sizeof(args));
 	vc = &vxlan_control_table[ifd->ifd_cmd];
 	out = (vc->vxlc_flags & VXLAN_CTRL_FLAG_COPYOUT) != 0;
 
 	if ((get != 0 && out == 0) || (get == 0 && out != 0))
 		return (EINVAL);
 
 	if (vc->vxlc_flags & VXLAN_CTRL_FLAG_SUSER) {
 		error = priv_check(curthread, PRIV_NET_VXLAN);
 		if (error)
 			return (error);
 	}
 
 	if (ifd->ifd_len != vc->vxlc_argsize ||
 	    ifd->ifd_len > sizeof(args))
 		return (EINVAL);
 
 	if (vc->vxlc_flags & VXLAN_CTRL_FLAG_COPYIN) {
 		error = copyin(ifd->ifd_data, &args, ifd->ifd_len);
 		if (error)
 			return (error);
 	}
 
 	error = vc->vxlc_func(sc, &args);
 	if (error)
 		return (error);
 
 	if (vc->vxlc_flags & VXLAN_CTRL_FLAG_COPYOUT) {
 		error = copyout(&args, ifd->ifd_data, ifd->ifd_len);
 		if (error)
 			return (error);
 	}
 
 	return (0);
 }
 
 static int
 vxlan_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data)
 {
 	struct vxlan_softc *sc;
 	struct ifreq *ifr;
 	struct ifdrv *ifd;
 	int error;
 
 	sc = ifp->if_softc;
 	ifr = (struct ifreq *) data;
 	ifd = (struct ifdrv *) data;
 
 	switch (cmd) {
 	case SIOCADDMULTI:
 	case SIOCDELMULTI:
 		error = 0;
 		break;
 
 	case SIOCGDRVSPEC:
 	case SIOCSDRVSPEC:
 		error = vxlan_ioctl_drvspec(sc, ifd, cmd == SIOCGDRVSPEC);
 		break;
 
 	case SIOCSIFFLAGS:
 		error = vxlan_ioctl_ifflags(sc);
 		break;
 	default:
 		error = ether_ioctl(ifp, cmd, data);
 		break;
 	}
 
 	return (error);
 }
 
 #if defined(INET) || defined(INET6)
 static uint16_t
 vxlan_pick_source_port(struct vxlan_softc *sc, struct mbuf *m)
 {
 	int range;
 	uint32_t hash;
 
 	range = sc->vxl_max_port - sc->vxl_min_port + 1;
 
 	/* check if flowid is set and not opaque */
-	if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE &&
-	    M_HASHTYPE_GET(m) != M_HASHTYPE_OPAQUE)
+	if (M_HASHTYPE_ISHASH(m))
 		hash = m->m_pkthdr.flowid;
 	else
 		hash = jenkins_hash(m->m_data, ETHER_HDR_LEN,
 		    sc->vxl_port_hash_key);
 
 	return (sc->vxl_min_port + (hash % range));
 }
 
 static void
 vxlan_encap_header(struct vxlan_softc *sc, struct mbuf *m, int ipoff,
     uint16_t srcport, uint16_t dstport)
 {
 	struct vxlanudphdr *hdr;
 	struct udphdr *udph;
 	struct vxlan_header *vxh;
 	int len;
 
 	len = m->m_pkthdr.len - ipoff;
 	MPASS(len >= sizeof(struct vxlanudphdr));
 	hdr = mtodo(m, ipoff);
 
 	udph = &hdr->vxlh_udp;
 	udph->uh_sport = srcport;
 	udph->uh_dport = dstport;
 	udph->uh_ulen = htons(len);
 	udph->uh_sum = 0;
 
 	vxh = &hdr->vxlh_hdr;
 	vxh->vxlh_flags = htonl(VXLAN_HDR_FLAGS_VALID_VNI);
 	vxh->vxlh_vni = htonl(sc->vxl_vni << VXLAN_HDR_VNI_SHIFT);
 }
 #endif
 
 static int
 vxlan_encap4(struct vxlan_softc *sc, const union vxlan_sockaddr *fvxlsa,
     struct mbuf *m)
 {
 #ifdef INET
 	struct ifnet *ifp;
 	struct ip *ip;
 	struct in_addr srcaddr, dstaddr;
 	uint16_t srcport, dstport;
 	int len, mcast, error;
 
 	ifp = sc->vxl_ifp;
 	srcaddr = sc->vxl_src_addr.in4.sin_addr;
 	srcport = vxlan_pick_source_port(sc, m);
 	dstaddr = fvxlsa->in4.sin_addr;
 	dstport = fvxlsa->in4.sin_port;
 
 	M_PREPEND(m, sizeof(struct ip) + sizeof(struct vxlanudphdr),
 	    M_NOWAIT);
 	if (m == NULL) {
 		if_inc_counter(ifp, IFCOUNTER_OERRORS, 1);
 		return (ENOBUFS);
 	}
 
 	len = m->m_pkthdr.len;
 
 	ip = mtod(m, struct ip *);
 	ip->ip_tos = 0;
 	ip->ip_len = htons(len);
 	ip->ip_off = 0;
 	ip->ip_ttl = sc->vxl_ttl;
 	ip->ip_p = IPPROTO_UDP;
 	ip->ip_sum = 0;
 	ip->ip_src = srcaddr;
 	ip->ip_dst = dstaddr;
 
 	vxlan_encap_header(sc, m, sizeof(struct ip), srcport, dstport);
 
 	mcast = (m->m_flags & (M_MCAST | M_BCAST)) ? 1 : 0;
 	m->m_flags &= ~(M_MCAST | M_BCAST);
 
 	error = ip_output(m, NULL, NULL, 0, sc->vxl_im4o, NULL);
 	if (error == 0) {
 		if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1);
 		if_inc_counter(ifp, IFCOUNTER_OBYTES, len);
 		if (mcast != 0)
 			if_inc_counter(ifp, IFCOUNTER_OMCASTS, 1);
 	} else
 		if_inc_counter(ifp, IFCOUNTER_OERRORS, 1);
 
 	return (error);
 #else
 	m_freem(m);
 	return (ENOTSUP);
 #endif
 }
 
 static int
 vxlan_encap6(struct vxlan_softc *sc, const union vxlan_sockaddr *fvxlsa,
     struct mbuf *m)
 {
 #ifdef INET6
 	struct ifnet *ifp;
 	struct ip6_hdr *ip6;
 	const struct in6_addr *srcaddr, *dstaddr;
 	uint16_t srcport, dstport;
 	int len, mcast, error;
 
 	ifp = sc->vxl_ifp;
 	srcaddr = &sc->vxl_src_addr.in6.sin6_addr;
 	srcport = vxlan_pick_source_port(sc, m);
 	dstaddr = &fvxlsa->in6.sin6_addr;
 	dstport = fvxlsa->in6.sin6_port;
 
 	M_PREPEND(m, sizeof(struct ip6_hdr) + sizeof(struct vxlanudphdr),
 	    M_NOWAIT);
 	if (m == NULL) {
 		if_inc_counter(ifp, IFCOUNTER_OERRORS, 1);
 		return (ENOBUFS);
 	}
 
 	len = m->m_pkthdr.len;
 
 	ip6 = mtod(m, struct ip6_hdr *);
 	ip6->ip6_flow = 0;		/* BMV: Keep in forwarding entry? */
 	ip6->ip6_vfc = IPV6_VERSION;
 	ip6->ip6_plen = 0;
 	ip6->ip6_nxt = IPPROTO_UDP;
 	ip6->ip6_hlim = sc->vxl_ttl;
 	ip6->ip6_src = *srcaddr;
 	ip6->ip6_dst = *dstaddr;
 
 	vxlan_encap_header(sc, m, sizeof(struct ip6_hdr), srcport, dstport);
 
 	/*
 	 * XXX BMV We need support for RFC6935 before we can send and
 	 * receive IPv6 UDP packets with a zero checksum.
 	 */
 	{
 		struct udphdr *hdr = mtodo(m, sizeof(struct ip6_hdr));
 		hdr->uh_sum = in6_cksum_pseudo(ip6,
 		    m->m_pkthdr.len - sizeof(struct ip6_hdr), IPPROTO_UDP, 0);
 		m->m_pkthdr.csum_flags = CSUM_UDP_IPV6;
 		m->m_pkthdr.csum_data = offsetof(struct udphdr, uh_sum);
 	}
 
 	mcast = (m->m_flags & (M_MCAST | M_BCAST)) ? 1 : 0;
 	m->m_flags &= ~(M_MCAST | M_BCAST);
 
 	error = ip6_output(m, NULL, NULL, 0, sc->vxl_im6o, NULL, NULL);
 	if (error == 0) {
 		if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1);
 		if_inc_counter(ifp, IFCOUNTER_OBYTES, len);
 		if (mcast != 0)
 			if_inc_counter(ifp, IFCOUNTER_OMCASTS, 1);
 	} else
 		if_inc_counter(ifp, IFCOUNTER_OERRORS, 1);
 
 	return (error);
 #else
 	m_freem(m);
 	return (ENOTSUP);
 #endif
 }
 
 static int
 vxlan_transmit(struct ifnet *ifp, struct mbuf *m)
 {
 	struct rm_priotracker tracker;
 	union vxlan_sockaddr vxlsa;
 	struct vxlan_softc *sc;
 	struct vxlan_ftable_entry *fe;
 	struct ifnet *mcifp;
 	struct ether_header *eh;
 	int ipv4, error;
 
 	sc = ifp->if_softc;
 	eh = mtod(m, struct ether_header *);
 	fe = NULL;
 	mcifp = NULL;
 
 	ETHER_BPF_MTAP(ifp, m);
 
 	VXLAN_RLOCK(sc, &tracker);
 	if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) {
 		VXLAN_RUNLOCK(sc, &tracker);
 		m_freem(m);
 		return (ENETDOWN);
 	}
 
 	if ((m->m_flags & (M_BCAST | M_MCAST)) == 0)
 		fe = vxlan_ftable_entry_lookup(sc, eh->ether_dhost);
 	if (fe == NULL)
 		fe = &sc->vxl_default_fe;
 	vxlan_sockaddr_copy(&vxlsa, &fe->vxlfe_raddr.sa);
 
 	ipv4 = VXLAN_SOCKADDR_IS_IPV4(&vxlsa) != 0;
 	if (vxlan_sockaddr_in_multicast(&vxlsa) != 0)
 		mcifp = vxlan_multicast_if_ref(sc, ipv4);
 
 	VXLAN_ACQUIRE(sc);
 	VXLAN_RUNLOCK(sc, &tracker);
 
 	if (ipv4 != 0)
 		error = vxlan_encap4(sc, &vxlsa, m);
 	else
 		error = vxlan_encap6(sc, &vxlsa, m);
 
 	vxlan_release(sc);
 	if (mcifp != NULL)
 		if_rele(mcifp);
 
 	return (error);
 }
 
 static void
 vxlan_qflush(struct ifnet *ifp __unused)
 {
 }
 
 static void
 vxlan_rcv_udp_packet(struct mbuf *m, int offset, struct inpcb *inpcb,
     const struct sockaddr *srcsa, void *xvso)
 {
 	struct vxlan_socket *vso;
 	struct vxlan_header *vxh, vxlanhdr;
 	uint32_t vni;
 	int error;
 
 	M_ASSERTPKTHDR(m);
 	vso = xvso;
 	offset += sizeof(struct udphdr);
 
 	if (m->m_pkthdr.len < offset + sizeof(struct vxlan_header))
 		goto out;
 
 	if (__predict_false(m->m_len < offset + sizeof(struct vxlan_header))) {
 		m_copydata(m, offset, sizeof(struct vxlan_header),
 		    (caddr_t) &vxlanhdr);
 		vxh = &vxlanhdr;
 	} else
 		vxh = mtodo(m, offset);
 
 	/*
 	 * Drop if there is a reserved bit set in either the flags or VNI
 	 * fields of the header. This goes against the specification, but
 	 * a bit set may indicate an unsupported new feature. This matches
 	 * the behavior of the Linux implementation.
 	 */
 	if (vxh->vxlh_flags != htonl(VXLAN_HDR_FLAGS_VALID_VNI) ||
 	    vxh->vxlh_vni & ~htonl(VXLAN_VNI_MASK))
 		goto out;
 
 	vni = ntohl(vxh->vxlh_vni) >> VXLAN_HDR_VNI_SHIFT;
 	/* Adjust to the start of the inner Ethernet frame. */
 	m_adj(m, offset + sizeof(struct vxlan_header));
 
 	error = vxlan_input(vso, vni, &m, srcsa);
 	MPASS(error != 0 || m == NULL);
 
 out:
 	if (m != NULL)
 		m_freem(m);
 }
 
 static int
 vxlan_input(struct vxlan_socket *vso, uint32_t vni, struct mbuf **m0,
     const struct sockaddr *sa)
 {
 	struct vxlan_softc *sc;
 	struct ifnet *ifp;
 	struct mbuf *m;
 	struct ether_header *eh;
 	int error;
 
 	sc = vxlan_socket_lookup_softc(vso, vni);
 	if (sc == NULL)
 		return (ENOENT);
 
 	ifp = sc->vxl_ifp;
 	m = *m0;
 	eh = mtod(m, struct ether_header *);
 
 	if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) {
 		error = ENETDOWN;
 		goto out;
 	} else if (ifp == m->m_pkthdr.rcvif) {
 		/* XXX Does not catch more complex loops. */
 		error = EDEADLK;
 		goto out;
 	}
 
 	if (sc->vxl_flags & VXLAN_FLAG_LEARN)
 		vxlan_ftable_update(sc, sa, eh->ether_shost);
 
 	m_clrprotoflags(m);
 	m->m_pkthdr.rcvif = ifp;
 	M_SETFIB(m, ifp->if_fib);
 
 	error = netisr_queue_src(NETISR_ETHER, 0, m);
 	*m0 = NULL;
 
 out:
 	vxlan_release(sc);
 	return (error);
 }
 
 static void
 vxlan_set_default_config(struct vxlan_softc *sc)
 {
 
 	sc->vxl_flags |= VXLAN_FLAG_LEARN;
 
 	sc->vxl_vni = VXLAN_VNI_MAX;
 	sc->vxl_ttl = IPDEFTTL;
 
 	if (!vxlan_tunable_int(sc, "legacy_port", vxlan_legacy_port)) {
 		sc->vxl_src_addr.in4.sin_port = htons(VXLAN_PORT);
 		sc->vxl_dst_addr.in4.sin_port = htons(VXLAN_PORT);
 	} else {
 		sc->vxl_src_addr.in4.sin_port = htons(VXLAN_LEGACY_PORT);
 		sc->vxl_dst_addr.in4.sin_port = htons(VXLAN_LEGACY_PORT);
 	}
 
 	sc->vxl_min_port = V_ipport_firstauto;
 	sc->vxl_max_port = V_ipport_lastauto;
 
 	sc->vxl_ftable_max = VXLAN_FTABLE_MAX;
 	sc->vxl_ftable_timeout = VXLAN_FTABLE_TIMEOUT;
 }
 
 static int
 vxlan_set_user_config(struct vxlan_softc *sc, struct ifvxlanparam *vxlp)
 {
 
 #ifndef INET
 	if (vxlp->vxlp_with & (VXLAN_PARAM_WITH_LOCAL_ADDR4 |
 	    VXLAN_PARAM_WITH_REMOTE_ADDR4))
 		return (EAFNOSUPPORT);
 #endif
 
 #ifndef INET6
 	if (vxlp->vxlp_with & (VXLAN_PARAM_WITH_LOCAL_ADDR6 |
 	    VXLAN_PARAM_WITH_REMOTE_ADDR6))
 		return (EAFNOSUPPORT);
 #endif
 
 	if (vxlp->vxlp_with & VXLAN_PARAM_WITH_VNI) {
 		if (vxlan_check_vni(vxlp->vxlp_vni) == 0)
 			sc->vxl_vni = vxlp->vxlp_vni;
 	}
 
 	if (vxlp->vxlp_with & VXLAN_PARAM_WITH_LOCAL_ADDR4) {
 		sc->vxl_src_addr.in4.sin_len = sizeof(struct sockaddr_in);
 		sc->vxl_src_addr.in4.sin_family = AF_INET;
 		sc->vxl_src_addr.in4.sin_addr = vxlp->vxlp_local_in4;
 	} else if (vxlp->vxlp_with & VXLAN_PARAM_WITH_LOCAL_ADDR6) {
 		sc->vxl_src_addr.in6.sin6_len = sizeof(struct sockaddr_in6);
 		sc->vxl_src_addr.in6.sin6_family = AF_INET6;
 		sc->vxl_src_addr.in6.sin6_addr = vxlp->vxlp_local_in6;
 	}
 
 	if (vxlp->vxlp_with & VXLAN_PARAM_WITH_REMOTE_ADDR4) {
 		sc->vxl_dst_addr.in4.sin_len = sizeof(struct sockaddr_in);
 		sc->vxl_dst_addr.in4.sin_family = AF_INET;
 		sc->vxl_dst_addr.in4.sin_addr = vxlp->vxlp_remote_in4;
 	} else if (vxlp->vxlp_with & VXLAN_PARAM_WITH_REMOTE_ADDR6) {
 		sc->vxl_dst_addr.in6.sin6_len = sizeof(struct sockaddr_in6);
 		sc->vxl_dst_addr.in6.sin6_family = AF_INET6;
 		sc->vxl_dst_addr.in6.sin6_addr = vxlp->vxlp_remote_in6;
 	}
 
 	if (vxlp->vxlp_with & VXLAN_PARAM_WITH_LOCAL_PORT)
 		sc->vxl_src_addr.in4.sin_port = htons(vxlp->vxlp_local_port);
 	if (vxlp->vxlp_with & VXLAN_PARAM_WITH_REMOTE_PORT)
 		sc->vxl_dst_addr.in4.sin_port = htons(vxlp->vxlp_remote_port);
 
 	if (vxlp->vxlp_with & VXLAN_PARAM_WITH_PORT_RANGE) {
 		if (vxlp->vxlp_min_port <= vxlp->vxlp_max_port) {
 			sc->vxl_min_port = vxlp->vxlp_min_port;
 			sc->vxl_max_port = vxlp->vxlp_max_port;
 		}
 	}
 
 	if (vxlp->vxlp_with & VXLAN_PARAM_WITH_MULTICAST_IF)
 		strlcpy(sc->vxl_mc_ifname, vxlp->vxlp_mc_ifname, IFNAMSIZ);
 
 	if (vxlp->vxlp_with & VXLAN_PARAM_WITH_FTABLE_TIMEOUT) {
 		if (vxlan_check_ftable_timeout(vxlp->vxlp_ftable_timeout) == 0)
 			sc->vxl_ftable_timeout = vxlp->vxlp_ftable_timeout;
 	}
 
 	if (vxlp->vxlp_with & VXLAN_PARAM_WITH_FTABLE_MAX) {
 		if (vxlan_check_ftable_max(vxlp->vxlp_ftable_max) == 0)
 			sc->vxl_ftable_max = vxlp->vxlp_ftable_max;
 	}
 
 	if (vxlp->vxlp_with & VXLAN_PARAM_WITH_TTL) {
 		if (vxlan_check_ttl(vxlp->vxlp_ttl) == 0)
 			sc->vxl_ttl = vxlp->vxlp_ttl;
 	}
 
 	if (vxlp->vxlp_with & VXLAN_PARAM_WITH_LEARN) {
 		if (vxlp->vxlp_learn == 0)
 			sc->vxl_flags &= ~VXLAN_FLAG_LEARN;
 	}
 
 	return (0);
 }
 
 static int
 vxlan_clone_create(struct if_clone *ifc, int unit, caddr_t params)
 {
 	struct vxlan_softc *sc;
 	struct ifnet *ifp;
 	struct ifvxlanparam vxlp;
 	int error;
 
 	sc = malloc(sizeof(struct vxlan_softc), M_VXLAN, M_WAITOK | M_ZERO);
 	sc->vxl_unit = unit;
 	vxlan_set_default_config(sc);
 
 	if (params != 0) {
 		error = copyin(params, &vxlp, sizeof(vxlp));
 		if (error)
 			goto fail;
 
 		error = vxlan_set_user_config(sc, &vxlp);
 		if (error)
 			goto fail;
 	}
 
 	ifp = if_alloc(IFT_ETHER);
 	if (ifp == NULL) {
 		error = ENOSPC;
 		goto fail;
 	}
 
 	sc->vxl_ifp = ifp;
 	rm_init(&sc->vxl_lock, "vxlanrm");
 	callout_init_rw(&sc->vxl_callout, &sc->vxl_lock, 0);
 	sc->vxl_port_hash_key = arc4random();
 	vxlan_ftable_init(sc);
 
 	vxlan_sysctl_setup(sc);
 
 	ifp->if_softc = sc;
 	if_initname(ifp, vxlan_name, unit);
 	ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
 	ifp->if_init = vxlan_init;
 	ifp->if_ioctl = vxlan_ioctl;
 	ifp->if_transmit = vxlan_transmit;
 	ifp->if_qflush = vxlan_qflush;
 
 	vxlan_fakeaddr(sc);
 	ether_ifattach(ifp, sc->vxl_hwaddr);
 
 	ifp->if_baudrate = 0;
 	ifp->if_hdrlen = 0;
 
 	return (0);
 
 fail:
 	free(sc, M_VXLAN);
 	return (error);
 }
 
 static void
 vxlan_clone_destroy(struct ifnet *ifp)
 {
 	struct vxlan_softc *sc;
 
 	sc = ifp->if_softc;
 
 	vxlan_teardown(sc);
 
 	vxlan_ftable_flush(sc, 1);
 
 	ether_ifdetach(ifp);
 	if_free(ifp);
 
 	vxlan_ftable_fini(sc);
 
 	vxlan_sysctl_destroy(sc);
 	rm_destroy(&sc->vxl_lock);
 	free(sc, M_VXLAN);
 }
 
 /* BMV: Taken from if_bridge. */
 static uint32_t
 vxlan_mac_hash(struct vxlan_softc *sc, const uint8_t *addr)
 {
 	uint32_t a = 0x9e3779b9, b = 0x9e3779b9, c = sc->vxl_ftable_hash_key;
 
 	b += addr[5] << 8;
 	b += addr[4];
 	a += addr[3] << 24;
 	a += addr[2] << 16;
 	a += addr[1] << 8;
 	a += addr[0];
 
 /*
  * The following hash function is adapted from "Hash Functions" by Bob Jenkins
  * ("Algorithm Alley", Dr. Dobbs Journal, September 1997).
  */
 #define	mix(a, b, c)							\
 do {									\
 	a -= b; a -= c; a ^= (c >> 13);					\
 	b -= c; b -= a; b ^= (a << 8);					\
 	c -= a; c -= b; c ^= (b >> 13);					\
 	a -= b; a -= c; a ^= (c >> 12);					\
 	b -= c; b -= a; b ^= (a << 16);					\
 	c -= a; c -= b; c ^= (b >> 5);					\
 	a -= b; a -= c; a ^= (c >> 3);					\
 	b -= c; b -= a; b ^= (a << 10);					\
 	c -= a; c -= b; c ^= (b >> 15);					\
 } while (0)
 
 	mix(a, b, c);
 
 #undef mix
 
 	return (c);
 }
 
 static void
 vxlan_fakeaddr(struct vxlan_softc *sc)
 {
 
 	/*
 	 * Generate a non-multicast, locally administered address.
 	 *
 	 * BMV: Should we use the FreeBSD OUI range instead?
 	 */
 	arc4rand(sc->vxl_hwaddr, ETHER_ADDR_LEN, 1);
 	sc->vxl_hwaddr[0] &= ~1;
 	sc->vxl_hwaddr[0] |= 2;
 }
 
 static int
 vxlan_sockaddr_cmp(const union vxlan_sockaddr *vxladdr,
     const struct sockaddr *sa)
 {
 
 	return (bcmp(&vxladdr->sa, sa, vxladdr->sa.sa_len));
 }
 
 static void
 vxlan_sockaddr_copy(union vxlan_sockaddr *vxladdr,
     const struct sockaddr *sa)
 {
 
 	MPASS(sa->sa_family == AF_INET || sa->sa_family == AF_INET6);
 	bzero(vxladdr, sizeof(*vxladdr));
 
 	if (sa->sa_family == AF_INET) {
 		vxladdr->in4 = *satoconstsin(sa);
 		vxladdr->in4.sin_len = sizeof(struct sockaddr_in);
 	} else if (sa->sa_family == AF_INET6) {
 		vxladdr->in6 = *satoconstsin6(sa);
 		vxladdr->in6.sin6_len = sizeof(struct sockaddr_in6);
 	}
 }
 
 static int
 vxlan_sockaddr_in_equal(const union vxlan_sockaddr *vxladdr,
     const struct sockaddr *sa)
 {
 	int equal;
 
 	if (sa->sa_family == AF_INET) {
 		const struct in_addr *in4 = &satoconstsin(sa)->sin_addr;
 		equal = in4->s_addr == vxladdr->in4.sin_addr.s_addr;
 	} else if (sa->sa_family == AF_INET6) {
 		const struct in6_addr *in6 = &satoconstsin6(sa)->sin6_addr;
 		equal = IN6_ARE_ADDR_EQUAL(in6, &vxladdr->in6.sin6_addr);
 	} else
 		equal = 0;
 
 	return (equal);
 }
 
 static void
 vxlan_sockaddr_in_copy(union vxlan_sockaddr *vxladdr,
     const struct sockaddr *sa)
 {
 
 	MPASS(sa->sa_family == AF_INET || sa->sa_family == AF_INET6);
 
 	if (sa->sa_family == AF_INET) {
 		const struct in_addr *in4 = &satoconstsin(sa)->sin_addr;
 		vxladdr->in4.sin_family = AF_INET;
 		vxladdr->in4.sin_len = sizeof(struct sockaddr_in);
 		vxladdr->in4.sin_addr = *in4;
 	} else if (sa->sa_family == AF_INET6) {
 		const struct in6_addr *in6 = &satoconstsin6(sa)->sin6_addr;
 		vxladdr->in6.sin6_family = AF_INET6;
 		vxladdr->in6.sin6_len = sizeof(struct sockaddr_in6);
 		vxladdr->in6.sin6_addr = *in6;
 	}
 }
 
 static int
 vxlan_sockaddr_supported(const union vxlan_sockaddr *vxladdr, int unspec)
 {
 	const struct sockaddr *sa;
 	int supported;
 
 	sa = &vxladdr->sa;
 	supported = 0;
 
 	if (sa->sa_family == AF_UNSPEC && unspec != 0) {
 		supported = 1;
 	} else if (sa->sa_family == AF_INET) {
 #ifdef INET
 		supported = 1;
 #endif
 	} else if (sa->sa_family == AF_INET6) {
 #ifdef INET6
 		supported = 1;
 #endif
 	}
 
 	return (supported);
 }
 
 static int
 vxlan_sockaddr_in_any(const union vxlan_sockaddr *vxladdr)
 {
 	const struct sockaddr *sa;
 	int any;
 
 	sa = &vxladdr->sa;
 
 	if (sa->sa_family == AF_INET) {
 		const struct in_addr *in4 = &satoconstsin(sa)->sin_addr;
 		any = in4->s_addr == INADDR_ANY;
 	} else if (sa->sa_family == AF_INET6) {
 		const struct in6_addr *in6 = &satoconstsin6(sa)->sin6_addr;
 		any = IN6_IS_ADDR_UNSPECIFIED(in6);
 	} else
 		any = -1;
 
 	return (any);
 }
 
 static int
 vxlan_sockaddr_in_multicast(const union vxlan_sockaddr *vxladdr)
 {
 	const struct sockaddr *sa;
 	int mc;
 
 	sa = &vxladdr->sa;
 
 	if (sa->sa_family == AF_INET) {
 		const struct in_addr *in4 = &satoconstsin(sa)->sin_addr;
 		mc = IN_MULTICAST(ntohl(in4->s_addr));
 	} else if (sa->sa_family == AF_INET6) {
 		const struct in6_addr *in6 = &satoconstsin6(sa)->sin6_addr;
 		mc = IN6_IS_ADDR_MULTICAST(in6);
 	} else
 		mc = -1;
 
 	return (mc);
 }
 
 static int
 vxlan_can_change_config(struct vxlan_softc *sc)
 {
 	struct ifnet *ifp;
 
 	ifp = sc->vxl_ifp;
 	VXLAN_LOCK_ASSERT(sc);
 
 	if (ifp->if_drv_flags & IFF_DRV_RUNNING)
 		return (0);
 	if (sc->vxl_flags & (VXLAN_FLAG_INIT | VXLAN_FLAG_TEARDOWN))
 		return (0);
 
 	return (1);
 }
 
 static int
 vxlan_check_vni(uint32_t vni)
 {
 
 	return (vni >= VXLAN_VNI_MAX);
 }
 
 static int
 vxlan_check_ttl(int ttl)
 {
 
 	return (ttl > MAXTTL);
 }
 
 static int
 vxlan_check_ftable_timeout(uint32_t timeout)
 {
 
 	return (timeout > VXLAN_FTABLE_MAX_TIMEOUT);
 }
 
 static int
 vxlan_check_ftable_max(uint32_t max)
 {
 
 	return (max > VXLAN_FTABLE_MAX);
 }
 
 static void
 vxlan_sysctl_setup(struct vxlan_softc *sc)
 {
 	struct sysctl_ctx_list *ctx;
 	struct sysctl_oid *node;
 	struct vxlan_statistics *stats;
 	char namebuf[8];
 
 	ctx = &sc->vxl_sysctl_ctx;
 	stats = &sc->vxl_stats;
 	snprintf(namebuf, sizeof(namebuf), "%d", sc->vxl_unit);
 
 	sysctl_ctx_init(ctx);
 	sc->vxl_sysctl_node = SYSCTL_ADD_NODE(ctx,
 	    SYSCTL_STATIC_CHILDREN(_net_link_vxlan), OID_AUTO, namebuf,
 	    CTLFLAG_RD, NULL, "");
 
 	node = SYSCTL_ADD_NODE(ctx, SYSCTL_CHILDREN(sc->vxl_sysctl_node),
 	    OID_AUTO, "ftable", CTLFLAG_RD, NULL, "");
 	SYSCTL_ADD_UINT(ctx, SYSCTL_CHILDREN(node), OID_AUTO, "count",
 	    CTLFLAG_RD, &sc->vxl_ftable_cnt, 0,
 	    "Number of entries in forwarding table");
 	SYSCTL_ADD_UINT(ctx, SYSCTL_CHILDREN(node), OID_AUTO, "max",
 	    CTLFLAG_RD, &sc->vxl_ftable_max, 0,
 	    "Maximum number of entries allowed in forwarding table");
 	SYSCTL_ADD_UINT(ctx, SYSCTL_CHILDREN(node), OID_AUTO, "timeout",
 	    CTLFLAG_RD, &sc->vxl_ftable_timeout, 0,
 	    "Number of seconds between prunes of the forwarding table");
 	SYSCTL_ADD_PROC(ctx, SYSCTL_CHILDREN(node), OID_AUTO, "dump",
 	    CTLTYPE_STRING | CTLFLAG_RD | CTLFLAG_MPSAFE | CTLFLAG_SKIP,
 	    sc, 0, vxlan_ftable_sysctl_dump, "A",
 	    "Dump the forwarding table entries");
 
 	node = SYSCTL_ADD_NODE(ctx, SYSCTL_CHILDREN(sc->vxl_sysctl_node),
 	    OID_AUTO, "stats", CTLFLAG_RD, NULL, "");
 	SYSCTL_ADD_UINT(ctx, SYSCTL_CHILDREN(node), OID_AUTO,
 	    "ftable_nospace", CTLFLAG_RD, &stats->ftable_nospace, 0,
 	    "Forwarding table reached maximum entries");
 	SYSCTL_ADD_UINT(ctx, SYSCTL_CHILDREN(node), OID_AUTO,
 	    "ftable_lock_upgrade_failed", CTLFLAG_RD,
 	    &stats->ftable_lock_upgrade_failed, 0,
 	    "Forwarding table update required lock upgrade");
 }
 
 static void
 vxlan_sysctl_destroy(struct vxlan_softc *sc)
 {
 
 	sysctl_ctx_free(&sc->vxl_sysctl_ctx);
 	sc->vxl_sysctl_node = NULL;
 }
 
 static int
 vxlan_tunable_int(struct vxlan_softc *sc, const char *knob, int def)
 {
 	char path[64];
 
 	snprintf(path, sizeof(path), "net.link.vxlan.%d.%s",
 	    sc->vxl_unit, knob);
 	TUNABLE_INT_FETCH(path, &def);
 
 	return (def);
 }
 
 static void
 vxlan_ifdetach_event(void *arg __unused, struct ifnet *ifp)
 {
 	struct vxlan_softc_head list;
 	struct vxlan_socket *vso;
 	struct vxlan_softc *sc, *tsc;
 
 	LIST_INIT(&list);
 
 	if (ifp->if_flags & IFF_RENAMING)
 		return;
 	if ((ifp->if_flags & IFF_MULTICAST) == 0)
 		return;
 
 	mtx_lock(&vxlan_list_mtx);
 	LIST_FOREACH(vso, &vxlan_socket_list, vxlso_entry)
 		vxlan_socket_ifdetach(vso, ifp, &list);
 	mtx_unlock(&vxlan_list_mtx);
 
 	LIST_FOREACH_SAFE(sc, &list, vxl_ifdetach_list, tsc) {
 		LIST_REMOVE(sc, vxl_ifdetach_list);
 
 		VXLAN_WLOCK(sc);
 		if (sc->vxl_flags & VXLAN_FLAG_INIT)
 			vxlan_init_wait(sc);
 		vxlan_teardown_locked(sc);
 	}
 }
 
 static void
 vxlan_load(void)
 {
 
 	mtx_init(&vxlan_list_mtx, "vxlan list", NULL, MTX_DEF);
 	LIST_INIT(&vxlan_socket_list);
 	vxlan_ifdetach_event_tag = EVENTHANDLER_REGISTER(ifnet_departure_event,
 	    vxlan_ifdetach_event, NULL, EVENTHANDLER_PRI_ANY);
 	vxlan_cloner = if_clone_simple(vxlan_name, vxlan_clone_create,
 	    vxlan_clone_destroy, 0);
 }
 
 static void
 vxlan_unload(void)
 {
 
 	EVENTHANDLER_DEREGISTER(ifnet_departure_event,
 	    vxlan_ifdetach_event_tag);
 	if_clone_detach(vxlan_cloner);
 	mtx_destroy(&vxlan_list_mtx);
 	MPASS(LIST_EMPTY(&vxlan_socket_list));
 }
 
 static int
 vxlan_modevent(module_t mod, int type, void *unused)
 {
 	int error;
 
 	error = 0;
 
 	switch (type) {
 	case MOD_LOAD:
 		vxlan_load();
 		break;
 	case MOD_UNLOAD:
 		vxlan_unload();
 		break;
 	default:
 		error = ENOTSUP;
 		break;
 	}
 
 	return (error);
 }
 
 static moduledata_t vxlan_mod = {
 	"if_vxlan",
 	vxlan_modevent,
 	0
 };
 
 DECLARE_MODULE(if_vxlan, vxlan_mod, SI_SUB_PSEUDO, SI_ORDER_ANY);
 MODULE_VERSION(if_vxlan, 1);
Index: projects/vnet/sys/netinet/sctp_input.c
===================================================================
--- projects/vnet/sys/netinet/sctp_input.c	(revision 301546)
+++ projects/vnet/sys/netinet/sctp_input.c	(revision 301547)
@@ -1,6250 +1,6250 @@
 /*-
  * Copyright (c) 2001-2008, by Cisco Systems, Inc. All rights reserved.
  * Copyright (c) 2008-2012, by Randall Stewart. All rights reserved.
  * Copyright (c) 2008-2012, by Michael Tuexen. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
  *
  * a) Redistributions of source code must retain the above copyright notice,
  *    this list of conditions and the following disclaimer.
  *
  * b) Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in
  *    the documentation and/or other materials provided with the distribution.
  *
  * c) Neither the name of Cisco Systems, Inc. nor the names of its
  *    contributors may be used to endorse or promote products derived
  *    from this software without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
  * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
  * THE POSSIBILITY OF SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include <netinet/sctp_os.h>
 #include <netinet/sctp_var.h>
 #include <netinet/sctp_sysctl.h>
 #include <netinet/sctp_pcb.h>
 #include <netinet/sctp_header.h>
 #include <netinet/sctputil.h>
 #include <netinet/sctp_output.h>
 #include <netinet/sctp_input.h>
 #include <netinet/sctp_auth.h>
 #include <netinet/sctp_indata.h>
 #include <netinet/sctp_asconf.h>
 #include <netinet/sctp_bsd_addr.h>
 #include <netinet/sctp_timer.h>
 #include <netinet/sctp_crc32.h>
 #if defined(INET) || defined(INET6)
 #include <netinet/udp.h>
 #endif
 #include <sys/smp.h>
 
 
 
 static void
 sctp_stop_all_cookie_timers(struct sctp_tcb *stcb)
 {
 	struct sctp_nets *net;
 
 	/*
 	 * This stops not only all cookie timers but also any INIT timers.
 	 * This makes sure that the timers are stopped in all collision
 	 * cases.
 	 */
 	SCTP_TCB_LOCK_ASSERT(stcb);
 	TAILQ_FOREACH(net, &stcb->asoc.nets, sctp_next) {
 		if (net->rxt_timer.type == SCTP_TIMER_TYPE_COOKIE) {
 			sctp_timer_stop(SCTP_TIMER_TYPE_COOKIE,
 			    stcb->sctp_ep,
 			    stcb,
 			    net, SCTP_FROM_SCTP_INPUT + SCTP_LOC_1);
 		} else if (net->rxt_timer.type == SCTP_TIMER_TYPE_INIT) {
 			sctp_timer_stop(SCTP_TIMER_TYPE_INIT,
 			    stcb->sctp_ep,
 			    stcb,
 			    net, SCTP_FROM_SCTP_INPUT + SCTP_LOC_2);
 		}
 	}
 }
 
 /* INIT handler */
 static void
 sctp_handle_init(struct mbuf *m, int iphlen, int offset,
     struct sockaddr *src, struct sockaddr *dst, struct sctphdr *sh,
     struct sctp_init_chunk *cp, struct sctp_inpcb *inp,
     struct sctp_tcb *stcb, struct sctp_nets *net, int *abort_no_unlock,
     uint8_t mflowtype, uint32_t mflowid,
     uint32_t vrf_id, uint16_t port)
 {
 	struct sctp_init *init;
 	struct mbuf *op_err;
 
 	SCTPDBG(SCTP_DEBUG_INPUT2, "sctp_handle_init: handling INIT tcb:%p\n",
 	    (void *)stcb);
 	if (stcb == NULL) {
 		SCTP_INP_RLOCK(inp);
 	}
 	/* validate length */
 	if (ntohs(cp->ch.chunk_length) < sizeof(struct sctp_init_chunk)) {
 		op_err = sctp_generate_cause(SCTP_CAUSE_INVALID_PARAM, "");
 		sctp_abort_association(inp, stcb, m, iphlen, src, dst, sh, op_err,
 		    mflowtype, mflowid,
 		    vrf_id, port);
 		if (stcb)
 			*abort_no_unlock = 1;
 		goto outnow;
 	}
 	/* validate parameters */
 	init = &cp->init;
 	if (init->initiate_tag == 0) {
 		/* protocol error... send abort */
 		op_err = sctp_generate_cause(SCTP_CAUSE_INVALID_PARAM, "");
 		sctp_abort_association(inp, stcb, m, iphlen, src, dst, sh, op_err,
 		    mflowtype, mflowid,
 		    vrf_id, port);
 		if (stcb)
 			*abort_no_unlock = 1;
 		goto outnow;
 	}
 	if (ntohl(init->a_rwnd) < SCTP_MIN_RWND) {
 		/* invalid parameter... send abort */
 		op_err = sctp_generate_cause(SCTP_CAUSE_INVALID_PARAM, "");
 		sctp_abort_association(inp, stcb, m, iphlen, src, dst, sh, op_err,
 		    mflowtype, mflowid,
 		    vrf_id, port);
 		if (stcb)
 			*abort_no_unlock = 1;
 		goto outnow;
 	}
 	if (init->num_inbound_streams == 0) {
 		/* protocol error... send abort */
 		op_err = sctp_generate_cause(SCTP_CAUSE_INVALID_PARAM, "");
 		sctp_abort_association(inp, stcb, m, iphlen, src, dst, sh, op_err,
 		    mflowtype, mflowid,
 		    vrf_id, port);
 		if (stcb)
 			*abort_no_unlock = 1;
 		goto outnow;
 	}
 	if (init->num_outbound_streams == 0) {
 		/* protocol error... send abort */
 		op_err = sctp_generate_cause(SCTP_CAUSE_INVALID_PARAM, "");
 		sctp_abort_association(inp, stcb, m, iphlen, src, dst, sh, op_err,
 		    mflowtype, mflowid,
 		    vrf_id, port);
 		if (stcb)
 			*abort_no_unlock = 1;
 		goto outnow;
 	}
 	if (sctp_validate_init_auth_params(m, offset + sizeof(*cp),
 	    offset + ntohs(cp->ch.chunk_length))) {
 		/* auth parameter(s) error... send abort */
 		op_err = sctp_generate_cause(SCTP_BASE_SYSCTL(sctp_diag_info_code),
 		    "Problem with AUTH parameters");
 		sctp_abort_association(inp, stcb, m, iphlen, src, dst, sh, op_err,
 		    mflowtype, mflowid,
 		    vrf_id, port);
 		if (stcb)
 			*abort_no_unlock = 1;
 		goto outnow;
 	}
 	/*
 	 * We are only accepting if we have a socket with positive
 	 * so_qlimit.
 	 */
 	if ((stcb == NULL) &&
 	    ((inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE) ||
 	    (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) ||
 	    (inp->sctp_socket == NULL) ||
 	    (inp->sctp_socket->so_qlimit == 0))) {
 		/*
 		 * FIX ME ?? What about TCP model and we have a
 		 * match/restart case? Actually no fix is needed. The lookup
 		 * will always find the existing assoc so stcb would not be
 		 * NULL. It may be questionable to do this since we COULD
 		 * just send back the INIT-ACK and hope that the app called
 		 * accept() by the time the COOKIE was sent. But there is
 		 * a price to pay for COOKIE generation and I don't want to
 		 * pay it on the chance that the app will actually call
 		 * accept(). The app just loses and should NOT be in this
 		 * state :-)
 		 */
 		if (SCTP_BASE_SYSCTL(sctp_blackhole) == 0) {
 			op_err = sctp_generate_cause(SCTP_BASE_SYSCTL(sctp_diag_info_code),
 			    "No listener");
 			sctp_send_abort(m, iphlen, src, dst, sh, 0, op_err,
 			    mflowtype, mflowid, inp->fibnum,
 			    vrf_id, port);
 		}
 		goto outnow;
 	}
 	if ((stcb != NULL) &&
 	    (SCTP_GET_STATE(&stcb->asoc) == SCTP_STATE_SHUTDOWN_ACK_SENT)) {
 		SCTPDBG(SCTP_DEBUG_INPUT3, "sctp_handle_init: sending SHUTDOWN-ACK\n");
 		sctp_send_shutdown_ack(stcb, NULL);
 		sctp_chunk_output(inp, stcb, SCTP_OUTPUT_FROM_CONTROL_PROC, SCTP_SO_NOT_LOCKED);
 	} else {
 		SCTPDBG(SCTP_DEBUG_INPUT3, "sctp_handle_init: sending INIT-ACK\n");
 		sctp_send_initiate_ack(inp, stcb, net, m, iphlen, offset,
 		    src, dst, sh, cp,
 		    mflowtype, mflowid,
 		    vrf_id, port,
 		    ((stcb == NULL) ? SCTP_HOLDS_LOCK : SCTP_NOT_LOCKED));
 	}
 outnow:
 	if (stcb == NULL) {
 		SCTP_INP_RUNLOCK(inp);
 	}
 }
 
 /*
  * process peer "INIT/INIT-ACK" chunk returns value < 0 on error
  */
 
 int
 sctp_is_there_unsent_data(struct sctp_tcb *stcb, int so_locked
 #if !defined(__APPLE__) && !defined(SCTP_SO_LOCK_TESTING)
     SCTP_UNUSED
 #endif
 )
 {
 	int unsent_data = 0;
 	unsigned int i;
 	struct sctp_stream_queue_pending *sp;
 	struct sctp_association *asoc;
 
 	/*
 	 * This function returns the number of streams that have true unsent
 	 * data on them. Note that as it looks through it will clean up any
 	 * places that have old data that has been sent but left at top of
 	 * stream queue.
 	 */
 	asoc = &stcb->asoc;
 	SCTP_TCB_SEND_LOCK(stcb);
 	if (!stcb->asoc.ss_functions.sctp_ss_is_empty(stcb, asoc)) {
 		/* Check to see if some data queued */
 		for (i = 0; i < stcb->asoc.streamoutcnt; i++) {
 			/* sa_ignore FREED_MEMORY */
 			sp = TAILQ_FIRST(&stcb->asoc.strmout[i].outqueue);
 			if (sp == NULL) {
 				continue;
 			}
 			if ((sp->msg_is_complete) &&
 			    (sp->length == 0) &&
 			    (sp->sender_all_done)) {
 				/*
 				 * We are doing deferred cleanup. Last time
 				 * through, when we took all the data, the
 				 * sender_all_done was not set.
 				 */
 				if (sp->put_last_out == 0) {
 					SCTP_PRINTF("Gak, put out entire msg with NO end!-1\n");
 					SCTP_PRINTF("sender_done:%d len:%d msg_comp:%d put_last_out:%d\n",
 					    sp->sender_all_done,
 					    sp->length,
 					    sp->msg_is_complete,
 					    sp->put_last_out);
 				}
 				atomic_subtract_int(&stcb->asoc.stream_queue_cnt, 1);
 				TAILQ_REMOVE(&stcb->asoc.strmout[i].outqueue, sp, next);
 				if (sp->net) {
 					sctp_free_remote_addr(sp->net);
 					sp->net = NULL;
 				}
 				if (sp->data) {
 					sctp_m_freem(sp->data);
 					sp->data = NULL;
 				}
 				sctp_free_a_strmoq(stcb, sp, so_locked);
 			} else {
 				unsent_data++;
 				break;
 			}
 		}
 	}
 	SCTP_TCB_SEND_UNLOCK(stcb);
 	return (unsent_data);
 }
 
 static int
 sctp_process_init(struct sctp_init_chunk *cp, struct sctp_tcb *stcb)
 {
 	struct sctp_init *init;
 	struct sctp_association *asoc;
 	struct sctp_nets *lnet;
 	unsigned int i;
 
 	init = &cp->init;
 	asoc = &stcb->asoc;
 	/* save off parameters */
 	asoc->peer_vtag = ntohl(init->initiate_tag);
 	asoc->peers_rwnd = ntohl(init->a_rwnd);
 	/* init tsn's */
 	asoc->highest_tsn_inside_map = asoc->asconf_seq_in = ntohl(init->initial_tsn) - 1;
 
 	if (!TAILQ_EMPTY(&asoc->nets)) {
 		/* update any ssthresh's that may have a default */
 		TAILQ_FOREACH(lnet, &asoc->nets, sctp_next) {
 			lnet->ssthresh = asoc->peers_rwnd;
 			if (SCTP_BASE_SYSCTL(sctp_logging_level) & (SCTP_CWND_MONITOR_ENABLE | SCTP_CWND_LOGGING_ENABLE)) {
 				sctp_log_cwnd(stcb, lnet, 0, SCTP_CWND_INITIALIZATION);
 			}
 		}
 	}
 	SCTP_TCB_SEND_LOCK(stcb);
 	if (asoc->pre_open_streams > ntohs(init->num_inbound_streams)) {
 		unsigned int newcnt;
 		struct sctp_stream_out *outs;
 		struct sctp_stream_queue_pending *sp, *nsp;
 		struct sctp_tmit_chunk *chk, *nchk;
 
 		/* abandon the upper streams */
 		newcnt = ntohs(init->num_inbound_streams);
 		TAILQ_FOREACH_SAFE(chk, &asoc->send_queue, sctp_next, nchk) {
 			if (chk->rec.data.stream_number >= newcnt) {
 				TAILQ_REMOVE(&asoc->send_queue, chk, sctp_next);
 				asoc->send_queue_cnt--;
 				if (asoc->strmout[chk->rec.data.stream_number].chunks_on_queues > 0) {
 					asoc->strmout[chk->rec.data.stream_number].chunks_on_queues--;
 #ifdef INVARIANTS
 				} else {
 					panic("No chunks on the queues for sid %u.", chk->rec.data.stream_number);
 #endif
 				}
 				if (chk->data != NULL) {
 					sctp_free_bufspace(stcb, asoc, chk, 1);
 					sctp_ulp_notify(SCTP_NOTIFY_UNSENT_DG_FAIL, stcb,
 					    0, chk, SCTP_SO_NOT_LOCKED);
 					if (chk->data) {
 						sctp_m_freem(chk->data);
 						chk->data = NULL;
 					}
 				}
 				sctp_free_a_chunk(stcb, chk, SCTP_SO_NOT_LOCKED);
 				/* sa_ignore FREED_MEMORY */
 			}
 		}
 		if (asoc->strmout) {
 			for (i = newcnt; i < asoc->pre_open_streams; i++) {
 				outs = &asoc->strmout[i];
 				TAILQ_FOREACH_SAFE(sp, &outs->outqueue, next, nsp) {
 					TAILQ_REMOVE(&outs->outqueue, sp, next);
 					asoc->stream_queue_cnt--;
 					sctp_ulp_notify(SCTP_NOTIFY_SPECIAL_SP_FAIL,
 					    stcb, 0, sp, SCTP_SO_NOT_LOCKED);
 					if (sp->data) {
 						sctp_m_freem(sp->data);
 						sp->data = NULL;
 					}
 					if (sp->net) {
 						sctp_free_remote_addr(sp->net);
 						sp->net = NULL;
 					}
 					/* Free the chunk */
 					sctp_free_a_strmoq(stcb, sp, SCTP_SO_NOT_LOCKED);
 					/* sa_ignore FREED_MEMORY */
 				}
 				outs->state = SCTP_STREAM_CLOSED;
 			}
 		}
 		/* cut back the count */
 		asoc->pre_open_streams = newcnt;
 	}
 	SCTP_TCB_SEND_UNLOCK(stcb);
 	asoc->streamoutcnt = asoc->pre_open_streams;
 	if (asoc->strmout) {
 		for (i = 0; i < asoc->streamoutcnt; i++) {
 			asoc->strmout[i].state = SCTP_STREAM_OPEN;
 		}
 	}
 	/* EY - nr_sack: initialize highest tsn in nr_mapping_array */
 	asoc->highest_tsn_inside_nr_map = asoc->highest_tsn_inside_map;
 	if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_MAP_LOGGING_ENABLE) {
 		sctp_log_map(0, 5, asoc->highest_tsn_inside_map, SCTP_MAP_SLIDE_RESULT);
 	}
 	/* This is the next one we expect */
 	asoc->str_reset_seq_in = asoc->asconf_seq_in + 1;
 
 	asoc->mapping_array_base_tsn = ntohl(init->initial_tsn);
 	asoc->tsn_last_delivered = asoc->cumulative_tsn = asoc->asconf_seq_in;
 
 	asoc->advanced_peer_ack_point = asoc->last_acked_seq;
 	/* open the requested streams */
 
 	if (asoc->strmin != NULL) {
 		/* Free the old ones */
 		for (i = 0; i < asoc->streamincnt; i++) {
 			sctp_clean_up_stream(stcb, &asoc->strmin[i].inqueue);
 			sctp_clean_up_stream(stcb, &asoc->strmin[i].uno_inqueue);
 		}
 		SCTP_FREE(asoc->strmin, SCTP_M_STRMI);
 	}
 	if (asoc->max_inbound_streams > ntohs(init->num_outbound_streams)) {
 		asoc->streamincnt = ntohs(init->num_outbound_streams);
 	} else {
 		asoc->streamincnt = asoc->max_inbound_streams;
 	}
 	SCTP_MALLOC(asoc->strmin, struct sctp_stream_in *, asoc->streamincnt *
 	    sizeof(struct sctp_stream_in), SCTP_M_STRMI);
 	if (asoc->strmin == NULL) {
 		/* we didn't get memory for the streams! */
 		SCTPDBG(SCTP_DEBUG_INPUT2, "process_init: couldn't get memory for the streams!\n");
 		return (-1);
 	}
 	for (i = 0; i < asoc->streamincnt; i++) {
 		asoc->strmin[i].stream_no = i;
 		asoc->strmin[i].last_sequence_delivered = 0xffffffff;
 		TAILQ_INIT(&asoc->strmin[i].inqueue);
 		TAILQ_INIT(&asoc->strmin[i].uno_inqueue);
 		asoc->strmin[i].pd_api_started = 0;
 		asoc->strmin[i].delivery_started = 0;
 	}
 	/*
 	 * load_address_from_init will put the addresses into the
 	 * association when the COOKIE is processed or the INIT-ACK is
 	 * processed. Both types of COOKIE's existing and new call this
 	 * routine. It will remove addresses that are no longer in the
 	 * association (for the restarting case where addresses are
 	 * removed). Up front when the INIT arrives we will discard it if it
 	 * is a restart and new addresses have been added.
 	 */
 	/* sa_ignore MEMLEAK */
 	return (0);
 }
 
 /*
  * INIT-ACK message processing/consumption returns value < 0 on error
  */
 static int
 sctp_process_init_ack(struct mbuf *m, int iphlen, int offset,
     struct sockaddr *src, struct sockaddr *dst, struct sctphdr *sh,
     struct sctp_init_ack_chunk *cp, struct sctp_tcb *stcb,
     struct sctp_nets *net, int *abort_no_unlock,
     uint8_t mflowtype, uint32_t mflowid,
     uint32_t vrf_id)
 {
 	struct sctp_association *asoc;
 	struct mbuf *op_err;
 	int retval, abort_flag;
 	uint32_t initack_limit;
 	int nat_friendly = 0;
 
 	/* First verify that we have no illegal param's */
 	abort_flag = 0;
 
 	op_err = sctp_arethere_unrecognized_parameters(m,
 	    (offset + sizeof(struct sctp_init_chunk)),
 	    &abort_flag, (struct sctp_chunkhdr *)cp, &nat_friendly);
 	if (abort_flag) {
 		/* Send an abort and notify peer */
 		sctp_abort_an_association(stcb->sctp_ep, stcb, op_err, SCTP_SO_NOT_LOCKED);
 		*abort_no_unlock = 1;
 		return (-1);
 	}
 	asoc = &stcb->asoc;
 	asoc->peer_supports_nat = (uint8_t) nat_friendly;
 	/* process the peer's parameters in the INIT-ACK */
 	retval = sctp_process_init((struct sctp_init_chunk *)cp, stcb);
 	if (retval < 0) {
 		return (retval);
 	}
 	initack_limit = offset + ntohs(cp->ch.chunk_length);
 	/* load all addresses */
 	if ((retval = sctp_load_addresses_from_init(stcb, m,
 	    (offset + sizeof(struct sctp_init_chunk)), initack_limit,
 	    src, dst, NULL, stcb->asoc.port))) {
 		op_err = sctp_generate_cause(SCTP_BASE_SYSCTL(sctp_diag_info_code),
 		    "Problem with address parameters");
 		SCTPDBG(SCTP_DEBUG_INPUT1,
 		    "Load addresses from INIT causes an abort %d\n",
 		    retval);
 		sctp_abort_association(stcb->sctp_ep, stcb, m, iphlen,
 		    src, dst, sh, op_err,
 		    mflowtype, mflowid,
 		    vrf_id, net->port);
 		*abort_no_unlock = 1;
 		return (-1);
 	}
 	/* if the peer doesn't support asconf, flush the asconf queue */
 	if (asoc->asconf_supported == 0) {
 		struct sctp_asconf_addr *param, *nparam;
 
 		TAILQ_FOREACH_SAFE(param, &asoc->asconf_queue, next, nparam) {
 			TAILQ_REMOVE(&asoc->asconf_queue, param, next);
 			SCTP_FREE(param, SCTP_M_ASC_ADDR);
 		}
 	}
 	stcb->asoc.peer_hmac_id = sctp_negotiate_hmacid(stcb->asoc.peer_hmacs,
 	    stcb->asoc.local_hmacs);
 	if (op_err) {
 		sctp_queue_op_err(stcb, op_err);
 		/* queuing will steal away the mbuf chain to the out queue */
 		op_err = NULL;
 	}
 	/* extract the cookie and queue it to "echo" it back... */
 	if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_THRESHOLD_LOGGING) {
 		sctp_misc_ints(SCTP_THRESHOLD_CLEAR,
 		    stcb->asoc.overall_error_count,
 		    0,
 		    SCTP_FROM_SCTP_INPUT,
 		    __LINE__);
 	}
 	stcb->asoc.overall_error_count = 0;
 	net->error_count = 0;
 
 	/*
 	 * Cancel the INIT timer. We do this first, before queueing the
 	 * cookie. We always cancel at the primary to assure that we are
 	 * canceling the timer started by the INIT, which always goes to the
 	 * primary.
 	 */
 	sctp_timer_stop(SCTP_TIMER_TYPE_INIT, stcb->sctp_ep, stcb,
 	    asoc->primary_destination, SCTP_FROM_SCTP_INPUT + SCTP_LOC_3);
 
 	/* calculate the RTO */
 	net->RTO = sctp_calculate_rto(stcb, asoc, net, &asoc->time_entered, sctp_align_safe_nocopy,
 	    SCTP_RTT_FROM_NON_DATA);
 	retval = sctp_send_cookie_echo(m, offset, stcb, net);
 	if (retval < 0) {
 		/*
 		 * No cookie, we probably should send an op error. But in any
 		 * case if there is no cookie in the INIT-ACK, we can
 		 * abandon the peer, it's broken.
 		 */
 		if (retval == -3) {
 			uint16_t len;
 
 			len = (uint16_t) (sizeof(struct sctp_error_missing_param) + sizeof(uint16_t));
 			/* We abort with an error of missing mandatory param */
 			op_err = sctp_get_mbuf_for_msg(len, 0, M_NOWAIT, 1, MT_DATA);
 			if (op_err != NULL) {
 				struct sctp_error_missing_param *cause;
 
 				SCTP_BUF_LEN(op_err) = len;
 				cause = mtod(op_err, struct sctp_error_missing_param *);
 				/* Subtract the reserved param */
 				cause->cause.code = htons(SCTP_CAUSE_MISSING_PARAM);
 				cause->cause.length = htons(len);
 				cause->num_missing_params = htonl(1);
 				cause->type[0] = htons(SCTP_STATE_COOKIE);
 			}
 			sctp_abort_association(stcb->sctp_ep, stcb, m, iphlen,
 			    src, dst, sh, op_err,
 			    mflowtype, mflowid,
 			    vrf_id, net->port);
 			*abort_no_unlock = 1;
 		}
 		return (retval);
 	}
 	return (0);
 }
 
 static void
 sctp_handle_heartbeat_ack(struct sctp_heartbeat_chunk *cp,
     struct sctp_tcb *stcb, struct sctp_nets *net)
 {
 	union sctp_sockstore store;
 	struct sctp_nets *r_net, *f_net;
 	struct timeval tv;
 	int req_prim = 0;
 	uint16_t old_error_counter;
 
 	if (ntohs(cp->ch.chunk_length) != sizeof(struct sctp_heartbeat_chunk)) {
 		/* Invalid length */
 		return;
 	}
 	memset(&store, 0, sizeof(store));
 	switch (cp->heartbeat.hb_info.addr_family) {
 #ifdef INET
 	case AF_INET:
 		if (cp->heartbeat.hb_info.addr_len == sizeof(struct sockaddr_in)) {
 			store.sin.sin_family = cp->heartbeat.hb_info.addr_family;
 			store.sin.sin_len = cp->heartbeat.hb_info.addr_len;
 			store.sin.sin_port = stcb->rport;
 			memcpy(&store.sin.sin_addr, cp->heartbeat.hb_info.address,
 			    sizeof(store.sin.sin_addr));
 		} else {
 			return;
 		}
 		break;
 #endif
 #ifdef INET6
 	case AF_INET6:
 		if (cp->heartbeat.hb_info.addr_len == sizeof(struct sockaddr_in6)) {
 			store.sin6.sin6_family = cp->heartbeat.hb_info.addr_family;
 			store.sin6.sin6_len = cp->heartbeat.hb_info.addr_len;
 			store.sin6.sin6_port = stcb->rport;
 			memcpy(&store.sin6.sin6_addr, cp->heartbeat.hb_info.address, sizeof(struct in6_addr));
 		} else {
 			return;
 		}
 		break;
 #endif
 	default:
 		return;
 	}
 	r_net = sctp_findnet(stcb, &store.sa);
 	if (r_net == NULL) {
 		SCTPDBG(SCTP_DEBUG_INPUT1, "Huh? I can't find the address I sent it to, discard\n");
 		return;
 	}
 	if ((r_net && (r_net->dest_state & SCTP_ADDR_UNCONFIRMED)) &&
 	    (r_net->heartbeat_random1 == cp->heartbeat.hb_info.random_value1) &&
 	    (r_net->heartbeat_random2 == cp->heartbeat.hb_info.random_value2)) {
 		/*
 		 * If it is a HB and its random value is correct, we can
 		 * confirm the destination.
 		 */
 		r_net->dest_state &= ~SCTP_ADDR_UNCONFIRMED;
 		if (r_net->dest_state & SCTP_ADDR_REQ_PRIMARY) {
 			stcb->asoc.primary_destination = r_net;
 			r_net->dest_state &= ~SCTP_ADDR_REQ_PRIMARY;
 			f_net = TAILQ_FIRST(&stcb->asoc.nets);
 			if (f_net != r_net) {
 				/*
 				 * First one on the list is NOT the primary.
 				 * sctp_cmpaddr() is much more efficient if
 				 * the primary is the first on the list,
 				 * make it so.
 				 */
 				TAILQ_REMOVE(&stcb->asoc.nets, r_net, sctp_next);
 				TAILQ_INSERT_HEAD(&stcb->asoc.nets, r_net, sctp_next);
 			}
 			req_prim = 1;
 		}
 		sctp_ulp_notify(SCTP_NOTIFY_INTERFACE_CONFIRMED,
 		    stcb, 0, (void *)r_net, SCTP_SO_NOT_LOCKED);
 		sctp_timer_stop(SCTP_TIMER_TYPE_HEARTBEAT, stcb->sctp_ep, stcb,
 		    r_net, SCTP_FROM_SCTP_INPUT + SCTP_LOC_4);
 		sctp_timer_start(SCTP_TIMER_TYPE_HEARTBEAT, stcb->sctp_ep, stcb, r_net);
 	}
 	old_error_counter = r_net->error_count;
 	r_net->error_count = 0;
 	r_net->hb_responded = 1;
 	tv.tv_sec = cp->heartbeat.hb_info.time_value_1;
 	tv.tv_usec = cp->heartbeat.hb_info.time_value_2;
 	/* Now let's calculate an RTO with this */
 	r_net->RTO = sctp_calculate_rto(stcb, &stcb->asoc, r_net, &tv, sctp_align_safe_nocopy,
 	    SCTP_RTT_FROM_NON_DATA);
 	if (!(r_net->dest_state & SCTP_ADDR_REACHABLE)) {
 		r_net->dest_state |= SCTP_ADDR_REACHABLE;
 		sctp_ulp_notify(SCTP_NOTIFY_INTERFACE_UP, stcb,
 		    0, (void *)r_net, SCTP_SO_NOT_LOCKED);
 	}
 	if (r_net->dest_state & SCTP_ADDR_PF) {
 		r_net->dest_state &= ~SCTP_ADDR_PF;
 		stcb->asoc.cc_functions.sctp_cwnd_update_exit_pf(stcb, net);
 	}
 	if (old_error_counter > 0) {
 		sctp_timer_stop(SCTP_TIMER_TYPE_HEARTBEAT, stcb->sctp_ep,
 		    stcb, r_net, SCTP_FROM_SCTP_INPUT + SCTP_LOC_5);
 		sctp_timer_start(SCTP_TIMER_TYPE_HEARTBEAT, stcb->sctp_ep, stcb, r_net);
 	}
 	if (r_net == stcb->asoc.primary_destination) {
 		if (stcb->asoc.alternate) {
 			/* release the alternate, primary is good */
 			sctp_free_remote_addr(stcb->asoc.alternate);
 			stcb->asoc.alternate = NULL;
 		}
 	}
 	/* Mobility adaptation */
 	if (req_prim) {
 		if ((sctp_is_mobility_feature_on(stcb->sctp_ep,
 		    SCTP_MOBILITY_BASE) ||
 		    sctp_is_mobility_feature_on(stcb->sctp_ep,
 		    SCTP_MOBILITY_FASTHANDOFF)) &&
 		    sctp_is_mobility_feature_on(stcb->sctp_ep,
 		    SCTP_MOBILITY_PRIM_DELETED)) {
 
 			sctp_timer_stop(SCTP_TIMER_TYPE_PRIM_DELETED,
 			    stcb->sctp_ep, stcb, NULL,
 			    SCTP_FROM_SCTP_INPUT + SCTP_LOC_6);
 			if (sctp_is_mobility_feature_on(stcb->sctp_ep,
 			    SCTP_MOBILITY_FASTHANDOFF)) {
 				sctp_assoc_immediate_retrans(stcb,
 				    stcb->asoc.primary_destination);
 			}
 			if (sctp_is_mobility_feature_on(stcb->sctp_ep,
 			    SCTP_MOBILITY_BASE)) {
 				sctp_move_chunks_from_net(stcb,
 				    stcb->asoc.deleted_primary);
 			}
 			sctp_delete_prim_timer(stcb->sctp_ep, stcb,
 			    stcb->asoc.deleted_primary);
 		}
 	}
 }
 
 static int
 sctp_handle_nat_colliding_state(struct sctp_tcb *stcb)
 {
 	/*
 	 * Returning 0 means we want you to proceed with the abort;
 	 * non-zero means no abort processing.
 	 */
 	struct sctpasochead *head;
 
 	if (SCTP_GET_STATE(&stcb->asoc) == SCTP_STATE_COOKIE_WAIT) {
 		/* generate a new vtag and send init */
 		LIST_REMOVE(stcb, sctp_asocs);
 		stcb->asoc.my_vtag = sctp_select_a_tag(stcb->sctp_ep, stcb->sctp_ep->sctp_lport, stcb->rport, 1);
 		head = &SCTP_BASE_INFO(sctp_asochash)[SCTP_PCBHASH_ASOC(stcb->asoc.my_vtag, SCTP_BASE_INFO(hashasocmark))];
 		/*
 		 * put it in the bucket in the vtag hash of assoc's for the
 		 * system
 		 */
 		LIST_INSERT_HEAD(head, stcb, sctp_asocs);
 		sctp_send_initiate(stcb->sctp_ep, stcb, SCTP_SO_NOT_LOCKED);
 		return (1);
 	}
 	if (SCTP_GET_STATE(&stcb->asoc) == SCTP_STATE_COOKIE_ECHOED) {
 		/*
 		 * treat like a case where the cookie expired i.e.: - dump
 		 * current cookie. - generate a new vtag. - resend init.
 		 */
 		/* generate a new vtag and send init */
 		LIST_REMOVE(stcb, sctp_asocs);
 		stcb->asoc.state &= ~SCTP_STATE_COOKIE_ECHOED;
 		stcb->asoc.state |= SCTP_STATE_COOKIE_WAIT;
 		sctp_stop_all_cookie_timers(stcb);
 		sctp_toss_old_cookies(stcb, &stcb->asoc);
 		stcb->asoc.my_vtag = sctp_select_a_tag(stcb->sctp_ep, stcb->sctp_ep->sctp_lport, stcb->rport, 1);
 		head = &SCTP_BASE_INFO(sctp_asochash)[SCTP_PCBHASH_ASOC(stcb->asoc.my_vtag, SCTP_BASE_INFO(hashasocmark))];
 		/*
 		 * put it in the bucket in the vtag hash of assoc's for the
 		 * system
 		 */
 		LIST_INSERT_HEAD(head, stcb, sctp_asocs);
 		sctp_send_initiate(stcb->sctp_ep, stcb, SCTP_SO_NOT_LOCKED);
 		return (1);
 	}
 	return (0);
 }
 
 static int
 sctp_handle_nat_missing_state(struct sctp_tcb *stcb,
     struct sctp_nets *net)
 {
 	/*
 	 * Returning 0 means we want you to proceed with the abort;
 	 * non-zero means no abort processing.
 	 */
 	if (stcb->asoc.auth_supported == 0) {
 		SCTPDBG(SCTP_DEBUG_INPUT2, "sctp_handle_nat_missing_state: Peer does not support AUTH, cannot send an asconf\n");
 		return (0);
 	}
 	sctp_asconf_send_nat_state_update(stcb, net);
 	return (1);
 }
 
 
 static void
 sctp_handle_abort(struct sctp_abort_chunk *abort,
     struct sctp_tcb *stcb, struct sctp_nets *net)
 {
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 	struct socket *so;
 
 #endif
 	uint16_t len;
 	uint16_t error;
 
 	SCTPDBG(SCTP_DEBUG_INPUT2, "sctp_handle_abort: handling ABORT\n");
 	if (stcb == NULL)
 		return;
 
 	len = ntohs(abort->ch.chunk_length);
 	if (len > sizeof(struct sctp_chunkhdr)) {
 		/*
 		 * Need to check the cause codes for our two magic nat
 		 * aborts which don't kill the assoc necessarily.
 		 */
 		struct sctp_gen_error_cause *cause;
 
 		cause = (struct sctp_gen_error_cause *)(abort + 1);
 		error = ntohs(cause->code);
 		if (error == SCTP_CAUSE_NAT_COLLIDING_STATE) {
 			SCTPDBG(SCTP_DEBUG_INPUT2, "Received Colliding state abort flags:%x\n",
 			    abort->ch.chunk_flags);
 			if (sctp_handle_nat_colliding_state(stcb)) {
 				return;
 			}
 		} else if (error == SCTP_CAUSE_NAT_MISSING_STATE) {
 			SCTPDBG(SCTP_DEBUG_INPUT2, "Received missing state abort flags:%x\n",
 			    abort->ch.chunk_flags);
 			if (sctp_handle_nat_missing_state(stcb, net)) {
 				return;
 			}
 		}
 	} else {
 		error = 0;
 	}
 	/* stop any receive timers */
 	sctp_timer_stop(SCTP_TIMER_TYPE_RECV, stcb->sctp_ep, stcb, net,
 	    SCTP_FROM_SCTP_INPUT + SCTP_LOC_7);
 	/* notify user of the abort and clean up... */
 	sctp_abort_notification(stcb, 1, error, abort, SCTP_SO_NOT_LOCKED);
 	/* free the tcb */
 	SCTP_STAT_INCR_COUNTER32(sctps_aborted);
 	if ((SCTP_GET_STATE(&stcb->asoc) == SCTP_STATE_OPEN) ||
 	    (SCTP_GET_STATE(&stcb->asoc) == SCTP_STATE_SHUTDOWN_RECEIVED)) {
 		SCTP_STAT_DECR_GAUGE32(sctps_currestab);
 	}
 #ifdef SCTP_ASOCLOG_OF_TSNS
 	sctp_print_out_track_log(stcb);
 #endif
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 	so = SCTP_INP_SO(stcb->sctp_ep);
 	atomic_add_int(&stcb->asoc.refcnt, 1);
 	SCTP_TCB_UNLOCK(stcb);
 	SCTP_SOCKET_LOCK(so, 1);
 	SCTP_TCB_LOCK(stcb);
 	atomic_subtract_int(&stcb->asoc.refcnt, 1);
 #endif
 	stcb->asoc.state |= SCTP_STATE_WAS_ABORTED;
 	(void)sctp_free_assoc(stcb->sctp_ep, stcb, SCTP_NORMAL_PROC,
 	    SCTP_FROM_SCTP_INPUT + SCTP_LOC_8);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 	SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 	SCTPDBG(SCTP_DEBUG_INPUT2, "sctp_handle_abort: finished\n");
 }
 
 static void
 sctp_start_net_timers(struct sctp_tcb *stcb)
 {
 	uint32_t cnt_hb_sent;
 	struct sctp_nets *net;
 
 	cnt_hb_sent = 0;
 	TAILQ_FOREACH(net, &stcb->asoc.nets, sctp_next) {
 		/*
 		 * For each network start: 1) A pmtu timer. 2) A HB timer. 3)
 		 * If the dest is unconfirmed, send a HB as well, as long as
 		 * fewer than max_hb_burst have been sent.
 		 */
 		sctp_timer_start(SCTP_TIMER_TYPE_PATHMTURAISE, stcb->sctp_ep, stcb, net);
 		sctp_timer_start(SCTP_TIMER_TYPE_HEARTBEAT, stcb->sctp_ep, stcb, net);
 		if ((net->dest_state & SCTP_ADDR_UNCONFIRMED) &&
 		    (cnt_hb_sent < SCTP_BASE_SYSCTL(sctp_hb_maxburst))) {
 			sctp_send_hb(stcb, net, SCTP_SO_NOT_LOCKED);
 			cnt_hb_sent++;
 		}
 	}
 	if (cnt_hb_sent) {
 		sctp_chunk_output(stcb->sctp_ep, stcb,
 		    SCTP_OUTPUT_FROM_COOKIE_ACK,
 		    SCTP_SO_NOT_LOCKED);
 	}
 }
 
 
 static void
 sctp_handle_shutdown(struct sctp_shutdown_chunk *cp,
     struct sctp_tcb *stcb, struct sctp_nets *net, int *abort_flag)
 {
 	struct sctp_association *asoc;
 	int some_on_streamwheel;
 	int old_state;
 
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 	struct socket *so;
 
 #endif
 
 	SCTPDBG(SCTP_DEBUG_INPUT2,
 	    "sctp_handle_shutdown: handling SHUTDOWN\n");
 	if (stcb == NULL)
 		return;
 	asoc = &stcb->asoc;
 	if ((SCTP_GET_STATE(asoc) == SCTP_STATE_COOKIE_WAIT) ||
 	    (SCTP_GET_STATE(asoc) == SCTP_STATE_COOKIE_ECHOED)) {
 		return;
 	}
 	if (ntohs(cp->ch.chunk_length) != sizeof(struct sctp_shutdown_chunk)) {
 		/* Shutdown NOT the expected size */
 		return;
 	}
 	old_state = SCTP_GET_STATE(asoc);
 	sctp_update_acked(stcb, cp, abort_flag);
 	if (*abort_flag) {
 		return;
 	}
 	if (asoc->control_pdapi) {
 		/*
 		 * With a normal shutdown we assume the end of last record.
 		 */
 		SCTP_INP_READ_LOCK(stcb->sctp_ep);
 		if (asoc->control_pdapi->on_strm_q) {
 			struct sctp_stream_in *strm;
 
 			strm = &asoc->strmin[asoc->control_pdapi->sinfo_stream];
 			if (asoc->control_pdapi->on_strm_q == SCTP_ON_UNORDERED) {
 				/* Unordered */
 				TAILQ_REMOVE(&strm->uno_inqueue, asoc->control_pdapi, next_instrm);
 				asoc->control_pdapi->on_strm_q = 0;
 			} else if (asoc->control_pdapi->on_strm_q == SCTP_ON_ORDERED) {
 				/* Ordered */
 				TAILQ_REMOVE(&strm->inqueue, asoc->control_pdapi, next_instrm);
 				asoc->control_pdapi->on_strm_q = 0;
 #ifdef INVARIANTS
 			} else {
 				panic("Unknown state on ctrl:%p on_strm_q:%d",
 				    asoc->control_pdapi,
 				    asoc->control_pdapi->on_strm_q);
 #endif
 			}
 		}
 		asoc->control_pdapi->end_added = 1;
 		asoc->control_pdapi->pdapi_aborted = 1;
 		asoc->control_pdapi = NULL;
 		SCTP_INP_READ_UNLOCK(stcb->sctp_ep);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		so = SCTP_INP_SO(stcb->sctp_ep);
 		atomic_add_int(&stcb->asoc.refcnt, 1);
 		SCTP_TCB_UNLOCK(stcb);
 		SCTP_SOCKET_LOCK(so, 1);
 		SCTP_TCB_LOCK(stcb);
 		atomic_subtract_int(&stcb->asoc.refcnt, 1);
 		if (stcb->asoc.state & SCTP_STATE_CLOSED_SOCKET) {
 			/* assoc was freed while we were unlocked */
 			SCTP_SOCKET_UNLOCK(so, 1);
 			return;
 		}
 #endif
 		if (stcb->sctp_socket) {
 			sctp_sorwakeup(stcb->sctp_ep, stcb->sctp_socket);
 		}
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 	}
 	/* goto SHUTDOWN_RECEIVED state to block new requests */
 	if (stcb->sctp_socket) {
 		if ((SCTP_GET_STATE(asoc) != SCTP_STATE_SHUTDOWN_RECEIVED) &&
 		    (SCTP_GET_STATE(asoc) != SCTP_STATE_SHUTDOWN_ACK_SENT) &&
 		    (SCTP_GET_STATE(asoc) != SCTP_STATE_SHUTDOWN_SENT)) {
 			SCTP_SET_STATE(asoc, SCTP_STATE_SHUTDOWN_RECEIVED);
 			SCTP_CLEAR_SUBSTATE(asoc, SCTP_STATE_SHUTDOWN_PENDING);
 			/*
 			 * notify upper layer that peer has initiated a
 			 * shutdown
 			 */
 			sctp_ulp_notify(SCTP_NOTIFY_PEER_SHUTDOWN, stcb, 0, NULL, SCTP_SO_NOT_LOCKED);
 
 			/* reset time */
 			(void)SCTP_GETTIME_TIMEVAL(&asoc->time_entered);
 		}
 	}
 	if (SCTP_GET_STATE(asoc) == SCTP_STATE_SHUTDOWN_SENT) {
 		/*
 		 * stop the shutdown timer, since we WILL move to
 		 * SHUTDOWN-ACK-SENT.
 		 */
 		sctp_timer_stop(SCTP_TIMER_TYPE_SHUTDOWN, stcb->sctp_ep, stcb,
 		    net, SCTP_FROM_SCTP_INPUT + SCTP_LOC_9);
 	}
 	/* Now is there unsent data on a stream somewhere? */
 	some_on_streamwheel = sctp_is_there_unsent_data(stcb, SCTP_SO_NOT_LOCKED);
 
 	if (!TAILQ_EMPTY(&asoc->send_queue) ||
 	    !TAILQ_EMPTY(&asoc->sent_queue) ||
 	    some_on_streamwheel) {
 		/* By returning we will push more data out */
 		return;
 	} else {
 		/* no outstanding data to send, so move on... */
 		/* send SHUTDOWN-ACK */
 		/* move to SHUTDOWN-ACK-SENT state */
 		if ((SCTP_GET_STATE(asoc) == SCTP_STATE_OPEN) ||
 		    (SCTP_GET_STATE(asoc) == SCTP_STATE_SHUTDOWN_RECEIVED)) {
 			SCTP_STAT_DECR_GAUGE32(sctps_currestab);
 		}
 		SCTP_CLEAR_SUBSTATE(asoc, SCTP_STATE_SHUTDOWN_PENDING);
 		if (SCTP_GET_STATE(asoc) != SCTP_STATE_SHUTDOWN_ACK_SENT) {
 			SCTP_SET_STATE(asoc, SCTP_STATE_SHUTDOWN_ACK_SENT);
 			sctp_stop_timers_for_shutdown(stcb);
 			sctp_send_shutdown_ack(stcb, net);
 			sctp_timer_start(SCTP_TIMER_TYPE_SHUTDOWNACK,
 			    stcb->sctp_ep, stcb, net);
 		} else if (old_state == SCTP_STATE_SHUTDOWN_ACK_SENT) {
 			sctp_send_shutdown_ack(stcb, net);
 		}
 	}
 }
 
 static void
 sctp_handle_shutdown_ack(struct sctp_shutdown_ack_chunk *cp SCTP_UNUSED,
     struct sctp_tcb *stcb,
     struct sctp_nets *net)
 {
 	struct sctp_association *asoc;
 
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 	struct socket *so;
 
 	so = SCTP_INP_SO(stcb->sctp_ep);
 #endif
 	SCTPDBG(SCTP_DEBUG_INPUT2,
 	    "sctp_handle_shutdown_ack: handling SHUTDOWN ACK\n");
 	if (stcb == NULL)
 		return;
 
 	asoc = &stcb->asoc;
 	/* process according to association state */
 	if ((SCTP_GET_STATE(asoc) == SCTP_STATE_COOKIE_WAIT) ||
 	    (SCTP_GET_STATE(asoc) == SCTP_STATE_COOKIE_ECHOED)) {
 		/* unexpected SHUTDOWN-ACK... do OOTB handling... */
 		sctp_send_shutdown_complete(stcb, net, 1);
 		SCTP_TCB_UNLOCK(stcb);
 		return;
 	}
 	if ((SCTP_GET_STATE(asoc) != SCTP_STATE_SHUTDOWN_SENT) &&
 	    (SCTP_GET_STATE(asoc) != SCTP_STATE_SHUTDOWN_ACK_SENT)) {
 		/* unexpected SHUTDOWN-ACK... so ignore... */
 		SCTP_TCB_UNLOCK(stcb);
 		return;
 	}
 	if (asoc->control_pdapi) {
 		/*
 		 * With a normal shutdown we assume the end of last record.
 		 */
 		SCTP_INP_READ_LOCK(stcb->sctp_ep);
 		asoc->control_pdapi->end_added = 1;
 		asoc->control_pdapi->pdapi_aborted = 1;
 		asoc->control_pdapi = NULL;
 		SCTP_INP_READ_UNLOCK(stcb->sctp_ep);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		atomic_add_int(&stcb->asoc.refcnt, 1);
 		SCTP_TCB_UNLOCK(stcb);
 		SCTP_SOCKET_LOCK(so, 1);
 		SCTP_TCB_LOCK(stcb);
 		atomic_subtract_int(&stcb->asoc.refcnt, 1);
 		if (stcb->asoc.state & SCTP_STATE_CLOSED_SOCKET) {
 			/* assoc was freed while we were unlocked */
 			SCTP_SOCKET_UNLOCK(so, 1);
 			return;
 		}
 #endif
 		sctp_sorwakeup(stcb->sctp_ep, stcb->sctp_socket);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 	}
 #ifdef INVARIANTS
 	if (!TAILQ_EMPTY(&asoc->send_queue) ||
 	    !TAILQ_EMPTY(&asoc->sent_queue) ||
 	    !stcb->asoc.ss_functions.sctp_ss_is_empty(stcb, asoc)) {
 		panic("Queues are not empty when handling SHUTDOWN-ACK");
 	}
 #endif
 	/* stop the timer */
 	sctp_timer_stop(SCTP_TIMER_TYPE_SHUTDOWN, stcb->sctp_ep, stcb, net,
 	    SCTP_FROM_SCTP_INPUT + SCTP_LOC_10);
 	/* send SHUTDOWN-COMPLETE */
 	sctp_send_shutdown_complete(stcb, net, 0);
 	/* notify upper layer protocol */
 	if (stcb->sctp_socket) {
 		if ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) ||
 		    (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) {
 			stcb->sctp_socket->so_snd.sb_cc = 0;
 		}
 		sctp_ulp_notify(SCTP_NOTIFY_ASSOC_DOWN, stcb, 0, NULL, SCTP_SO_NOT_LOCKED);
 	}
 	SCTP_STAT_INCR_COUNTER32(sctps_shutdown);
 	/* free the TCB but first save off the ep */
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 	atomic_add_int(&stcb->asoc.refcnt, 1);
 	SCTP_TCB_UNLOCK(stcb);
 	SCTP_SOCKET_LOCK(so, 1);
 	SCTP_TCB_LOCK(stcb);
 	atomic_subtract_int(&stcb->asoc.refcnt, 1);
 #endif
 	(void)sctp_free_assoc(stcb->sctp_ep, stcb, SCTP_NORMAL_PROC,
 	    SCTP_FROM_SCTP_INPUT + SCTP_LOC_11);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 	SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 }
 
 /*
  * Skip past the param header and then we will find the chunk that caused the
  * problem. There are two possibilities, ASCONF or FWD-TSN; anything else and
  * our peer must be broken.
  */
 static void
 sctp_process_unrecog_chunk(struct sctp_tcb *stcb, struct sctp_paramhdr *phdr,
     struct sctp_nets *net)
 {
 	struct sctp_chunkhdr *chk;
 
 	chk = (struct sctp_chunkhdr *)((caddr_t)phdr + sizeof(*phdr));
 	switch (chk->chunk_type) {
 	case SCTP_ASCONF_ACK:
 	case SCTP_ASCONF:
 		sctp_asconf_cleanup(stcb, net);
 		break;
 	case SCTP_IFORWARD_CUM_TSN:
 	case SCTP_FORWARD_CUM_TSN:
 		stcb->asoc.prsctp_supported = 0;
 		break;
 	default:
 		SCTPDBG(SCTP_DEBUG_INPUT2,
 		    "Peer does not support chunk type %d(%x)??\n",
 		    chk->chunk_type, (uint32_t) chk->chunk_type);
 		break;
 	}
 }
 
 /*
  * Skip past the param header and then we will find the param that caused the
  * problem.  There are a number of params in an ASCONF, or the prsctp param;
  * these will turn off specific features.
  * XXX: Is this the right thing to do?
  */
 static void
 sctp_process_unrecog_param(struct sctp_tcb *stcb, struct sctp_paramhdr *phdr)
 {
 	struct sctp_paramhdr *pbad;
 
 	pbad = phdr + 1;
 	switch (ntohs(pbad->param_type)) {
 		/* pr-sctp draft */
 	case SCTP_PRSCTP_SUPPORTED:
 		stcb->asoc.prsctp_supported = 0;
 		break;
 	case SCTP_SUPPORTED_CHUNK_EXT:
 		break;
 		/* draft-ietf-tsvwg-addip-sctp */
 	case SCTP_HAS_NAT_SUPPORT:
 		stcb->asoc.peer_supports_nat = 0;
 		break;
 	case SCTP_ADD_IP_ADDRESS:
 	case SCTP_DEL_IP_ADDRESS:
 	case SCTP_SET_PRIM_ADDR:
 		stcb->asoc.asconf_supported = 0;
 		break;
 	case SCTP_SUCCESS_REPORT:
 	case SCTP_ERROR_CAUSE_IND:
 		SCTPDBG(SCTP_DEBUG_INPUT2, "Huh, the peer does not support success? or error cause?\n");
 		SCTPDBG(SCTP_DEBUG_INPUT2,
 		    "Turning off ASCONF to this strange peer\n");
 		stcb->asoc.asconf_supported = 0;
 		break;
 	default:
 		SCTPDBG(SCTP_DEBUG_INPUT2,
 		    "Peer does not support param type %d(%x)??\n",
 		    pbad->param_type, (uint32_t) pbad->param_type);
 		break;
 	}
 }
 
 static int
 sctp_handle_error(struct sctp_chunkhdr *ch,
     struct sctp_tcb *stcb, struct sctp_nets *net)
 {
 	int chklen;
 	struct sctp_paramhdr *phdr;
 	uint16_t error, error_type;
 	uint16_t error_len;
 	struct sctp_association *asoc;
 	int adjust;
 
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 	struct socket *so;
 
 #endif
 
 	/* parse through all of the errors and process */
 	asoc = &stcb->asoc;
 	phdr = (struct sctp_paramhdr *)((caddr_t)ch +
 	    sizeof(struct sctp_chunkhdr));
 	chklen = ntohs(ch->chunk_length) - sizeof(struct sctp_chunkhdr);
 	error = 0;
 	while ((size_t)chklen >= sizeof(struct sctp_paramhdr)) {
 		/* Process an Error Cause */
 		error_type = ntohs(phdr->param_type);
 		error_len = ntohs(phdr->param_length);
 		if ((error_len > chklen) || (error_len == 0)) {
 			/* invalid param length for this param */
 			SCTPDBG(SCTP_DEBUG_INPUT1, "Bogus length in error param- chunk left:%d errorlen:%d\n",
 			    chklen, error_len);
 			return (0);
 		}
 		if (error == 0) {
 			/* report the first error cause */
 			error = error_type;
 		}
 		switch (error_type) {
 		case SCTP_CAUSE_INVALID_STREAM:
 		case SCTP_CAUSE_MISSING_PARAM:
 		case SCTP_CAUSE_INVALID_PARAM:
 		case SCTP_CAUSE_NO_USER_DATA:
 			SCTPDBG(SCTP_DEBUG_INPUT1, "Software error we got a %d back? We have a bug :/ (or do they?)\n",
 			    error_type);
 			break;
 		case SCTP_CAUSE_NAT_COLLIDING_STATE:
 			SCTPDBG(SCTP_DEBUG_INPUT2, "Received Colliding state abort flags:%x\n",
 			    ch->chunk_flags);
 			if (sctp_handle_nat_colliding_state(stcb)) {
 				return (0);
 			}
 			break;
 		case SCTP_CAUSE_NAT_MISSING_STATE:
 			SCTPDBG(SCTP_DEBUG_INPUT2, "Received missing state abort flags:%x\n",
 			    ch->chunk_flags);
 			if (sctp_handle_nat_missing_state(stcb, net)) {
 				return (0);
 			}
 			break;
 		case SCTP_CAUSE_STALE_COOKIE:
 			/*
 			 * We only act if we have echoed a cookie and are
 			 * waiting.
 			 */
 			if (SCTP_GET_STATE(asoc) == SCTP_STATE_COOKIE_ECHOED) {
 				int *p;
 
 				p = (int *)((caddr_t)phdr + sizeof(*phdr));
 				/* Save the time doubled */
 				asoc->cookie_preserve_req = ntohl(*p) << 1;
 				asoc->stale_cookie_count++;
 				if (asoc->stale_cookie_count >
 				    asoc->max_init_times) {
 					sctp_abort_notification(stcb, 0, 0, NULL, SCTP_SO_NOT_LOCKED);
 					/* now free the asoc */
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 					so = SCTP_INP_SO(stcb->sctp_ep);
 					atomic_add_int(&stcb->asoc.refcnt, 1);
 					SCTP_TCB_UNLOCK(stcb);
 					SCTP_SOCKET_LOCK(so, 1);
 					SCTP_TCB_LOCK(stcb);
 					atomic_subtract_int(&stcb->asoc.refcnt, 1);
 #endif
 					(void)sctp_free_assoc(stcb->sctp_ep, stcb, SCTP_NORMAL_PROC,
 					    SCTP_FROM_SCTP_INPUT + SCTP_LOC_12);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 					SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 					return (-1);
 				}
 				/* blast back to INIT state */
 				sctp_toss_old_cookies(stcb, &stcb->asoc);
 				asoc->state &= ~SCTP_STATE_COOKIE_ECHOED;
 				asoc->state |= SCTP_STATE_COOKIE_WAIT;
 				sctp_stop_all_cookie_timers(stcb);
 				sctp_send_initiate(stcb->sctp_ep, stcb, SCTP_SO_NOT_LOCKED);
 			}
 			break;
 		case SCTP_CAUSE_UNRESOLVABLE_ADDR:
 			/*
 			 * Nothing we can do here, we don't do hostname
 			 * addresses so if the peer does not like my IPv6
 			 * (or IPv4 for that matter) it does not matter. If
 			 * they don't support that type of address, they can
 			 * NOT possibly get that packet type... i.e. with no
 			 * IPv6 you can't receive an IPv6 packet, so we can
 			 * safely ignore this one. If we ever added support
 			 * for HOSTNAME Addresses, then we would need to do
 			 * something here.
 			 */
 			break;
 		case SCTP_CAUSE_UNRECOG_CHUNK:
 			sctp_process_unrecog_chunk(stcb, phdr, net);
 			break;
 		case SCTP_CAUSE_UNRECOG_PARAM:
 			sctp_process_unrecog_param(stcb, phdr);
 			break;
 		case SCTP_CAUSE_COOKIE_IN_SHUTDOWN:
 			/*
 			 * We ignore this since the timer will drive out a
 			 * new cookie anyway and their timer will drive us
 			 * to send a SHUTDOWN_COMPLETE. We can't send one
 			 * here since we don't have their tag.
 			 */
 			break;
 		case SCTP_CAUSE_DELETING_LAST_ADDR:
 		case SCTP_CAUSE_RESOURCE_SHORTAGE:
 		case SCTP_CAUSE_DELETING_SRC_ADDR:
 			/*
 			 * We should NOT get these here, but in an
 			 * ASCONF-ACK.
 			 */
 			SCTPDBG(SCTP_DEBUG_INPUT2, "Peer sends ASCONF errors in a Operational Error?<%d>?\n",
 			    error_type);
 			break;
 		case SCTP_CAUSE_OUT_OF_RESC:
 			/*
 			 * And what, pray tell do we do with the fact that
 			 * the peer is out of resources? Not really sure we
 			 * could do anything but abort. I suspect this
 			 * should have come WITH an abort instead of in an
 			 * OP-ERROR.
 			 */
 			break;
 		default:
 			SCTPDBG(SCTP_DEBUG_INPUT1, "sctp_handle_error: unknown error type = 0x%xh\n",
 			    error_type);
 			break;
 		}
 		adjust = SCTP_SIZE32(error_len);
 		chklen -= adjust;
 		phdr = (struct sctp_paramhdr *)((caddr_t)phdr + adjust);
 	}
 	sctp_ulp_notify(SCTP_NOTIFY_REMOTE_ERROR, stcb, error, ch, SCTP_SO_NOT_LOCKED);
 	return (0);
 }
 
 static int
 sctp_handle_init_ack(struct mbuf *m, int iphlen, int offset,
     struct sockaddr *src, struct sockaddr *dst, struct sctphdr *sh,
     struct sctp_init_ack_chunk *cp, struct sctp_tcb *stcb,
     struct sctp_nets *net, int *abort_no_unlock,
     uint8_t mflowtype, uint32_t mflowid,
     uint32_t vrf_id)
 {
 	struct sctp_init_ack *init_ack;
 	struct mbuf *op_err;
 
 	SCTPDBG(SCTP_DEBUG_INPUT2,
 	    "sctp_handle_init_ack: handling INIT-ACK\n");
 
 	if (stcb == NULL) {
 		SCTPDBG(SCTP_DEBUG_INPUT2,
 		    "sctp_handle_init_ack: TCB is null\n");
 		return (-1);
 	}
 	if (ntohs(cp->ch.chunk_length) < sizeof(struct sctp_init_ack_chunk)) {
 		/* Invalid length */
 		op_err = sctp_generate_cause(SCTP_CAUSE_INVALID_PARAM, "");
 		sctp_abort_association(stcb->sctp_ep, stcb, m, iphlen,
 		    src, dst, sh, op_err,
 		    mflowtype, mflowid,
 		    vrf_id, net->port);
 		*abort_no_unlock = 1;
 		return (-1);
 	}
 	init_ack = &cp->init;
 	/* validate parameters */
 	if (init_ack->initiate_tag == 0) {
 		/* protocol error... send an abort */
 		op_err = sctp_generate_cause(SCTP_CAUSE_INVALID_PARAM, "");
 		sctp_abort_association(stcb->sctp_ep, stcb, m, iphlen,
 		    src, dst, sh, op_err,
 		    mflowtype, mflowid,
 		    vrf_id, net->port);
 		*abort_no_unlock = 1;
 		return (-1);
 	}
 	if (ntohl(init_ack->a_rwnd) < SCTP_MIN_RWND) {
 		/* protocol error... send an abort */
 		op_err = sctp_generate_cause(SCTP_CAUSE_INVALID_PARAM, "");
 		sctp_abort_association(stcb->sctp_ep, stcb, m, iphlen,
 		    src, dst, sh, op_err,
 		    mflowtype, mflowid,
 		    vrf_id, net->port);
 		*abort_no_unlock = 1;
 		return (-1);
 	}
 	if (init_ack->num_inbound_streams == 0) {
 		/* protocol error... send an abort */
 		op_err = sctp_generate_cause(SCTP_CAUSE_INVALID_PARAM, "");
 		sctp_abort_association(stcb->sctp_ep, stcb, m, iphlen,
 		    src, dst, sh, op_err,
 		    mflowtype, mflowid,
 		    vrf_id, net->port);
 		*abort_no_unlock = 1;
 		return (-1);
 	}
 	if (init_ack->num_outbound_streams == 0) {
 		/* protocol error... send an abort */
 		op_err = sctp_generate_cause(SCTP_CAUSE_INVALID_PARAM, "");
 		sctp_abort_association(stcb->sctp_ep, stcb, m, iphlen,
 		    src, dst, sh, op_err,
 		    mflowtype, mflowid,
 		    vrf_id, net->port);
 		*abort_no_unlock = 1;
 		return (-1);
 	}
 	/* process according to association state... */
 	switch (stcb->asoc.state & SCTP_STATE_MASK) {
 	case SCTP_STATE_COOKIE_WAIT:
 		/* this is the expected state for this chunk */
 		/* process the INIT-ACK parameters */
 		if (stcb->asoc.primary_destination->dest_state &
 		    SCTP_ADDR_UNCONFIRMED) {
 			/*
 			 * The primary is where we sent the INIT, we can
 			 * always consider it confirmed when the INIT-ACK is
 			 * returned. Do this before we load addresses
 			 * though.
 			 */
 			stcb->asoc.primary_destination->dest_state &=
 			    ~SCTP_ADDR_UNCONFIRMED;
 			sctp_ulp_notify(SCTP_NOTIFY_INTERFACE_CONFIRMED,
 			    stcb, 0, (void *)stcb->asoc.primary_destination, SCTP_SO_NOT_LOCKED);
 		}
 		if (sctp_process_init_ack(m, iphlen, offset, src, dst, sh, cp, stcb,
 		    net, abort_no_unlock,
 		    mflowtype, mflowid,
 		    vrf_id) < 0) {
 			/* error in parsing parameters */
 			return (-1);
 		}
 		/* update our state */
 		SCTPDBG(SCTP_DEBUG_INPUT2, "moving to COOKIE-ECHOED state\n");
 		SCTP_SET_STATE(&stcb->asoc, SCTP_STATE_COOKIE_ECHOED);
 
 		/* reset the RTO calc */
 		if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_THRESHOLD_LOGGING) {
 			sctp_misc_ints(SCTP_THRESHOLD_CLEAR,
 			    stcb->asoc.overall_error_count,
 			    0,
 			    SCTP_FROM_SCTP_INPUT,
 			    __LINE__);
 		}
 		stcb->asoc.overall_error_count = 0;
 		(void)SCTP_GETTIME_TIMEVAL(&stcb->asoc.time_entered);
 		/*
 		 * collapse the init timer back in case of an exponential
 		 * backoff
 		 */
 		sctp_timer_start(SCTP_TIMER_TYPE_COOKIE, stcb->sctp_ep,
 		    stcb, net);
 		/*
 		 * the send at the end of the inbound data processing will
 		 * cause the cookie to be sent
 		 */
 		break;
 	case SCTP_STATE_SHUTDOWN_SENT:
 		/* incorrect state... discard */
 		break;
 	case SCTP_STATE_COOKIE_ECHOED:
 		/* incorrect state... discard */
 		break;
 	case SCTP_STATE_OPEN:
 		/* incorrect state... discard */
 		break;
 	case SCTP_STATE_EMPTY:
 	case SCTP_STATE_INUSE:
 	default:
 		/* incorrect state... discard */
 		return (-1);
 		break;
 	}
 	SCTPDBG(SCTP_DEBUG_INPUT1, "Leaving handle-init-ack end\n");
 	return (0);
 }
 
 static struct sctp_tcb *
 sctp_process_cookie_new(struct mbuf *m, int iphlen, int offset,
     struct sockaddr *src, struct sockaddr *dst,
     struct sctphdr *sh, struct sctp_state_cookie *cookie, int cookie_len,
     struct sctp_inpcb *inp, struct sctp_nets **netp,
     struct sockaddr *init_src, int *notification,
     int auth_skipped, uint32_t auth_offset, uint32_t auth_len,
     uint8_t mflowtype, uint32_t mflowid,
     uint32_t vrf_id, uint16_t port);
 
 
 /*
  * Handle a state cookie for an existing association.
  * m: input packet mbuf chain; assumes a pullup on the IP/SCTP/COOKIE-ECHO
  * chunk. Note: this is a "split" mbuf and the cookie signature does not exist.
  * offset: offset into the mbuf to the cookie-echo chunk.
  */
 static struct sctp_tcb *
 sctp_process_cookie_existing(struct mbuf *m, int iphlen, int offset,
     struct sockaddr *src, struct sockaddr *dst,
     struct sctphdr *sh, struct sctp_state_cookie *cookie, int cookie_len,
     struct sctp_inpcb *inp, struct sctp_tcb *stcb, struct sctp_nets **netp,
     struct sockaddr *init_src, int *notification,
     int auth_skipped, uint32_t auth_offset, uint32_t auth_len,
     uint8_t mflowtype, uint32_t mflowid,
     uint32_t vrf_id, uint16_t port)
 {
 	struct sctp_association *asoc;
 	struct sctp_init_chunk *init_cp, init_buf;
 	struct sctp_init_ack_chunk *initack_cp, initack_buf;
 	struct sctp_nets *net;
 	struct mbuf *op_err;
 	int init_offset, initack_offset, i;
 	int retval;
 	int spec_flag = 0;
 	uint32_t how_indx;
 
 #if defined(SCTP_DETAILED_STR_STATS)
 	int j;
 
 #endif
 
 	net = *netp;
 	/* I know that the TCB is non-NULL from the caller */
 	asoc = &stcb->asoc;
 	for (how_indx = 0; how_indx < sizeof(asoc->cookie_how); how_indx++) {
 		if (asoc->cookie_how[how_indx] == 0)
 			break;
 	}
 	if (how_indx < sizeof(asoc->cookie_how)) {
 		asoc->cookie_how[how_indx] = 1;
 	}
 	if (SCTP_GET_STATE(asoc) == SCTP_STATE_SHUTDOWN_ACK_SENT) {
 		/* SHUTDOWN came in after sending INIT-ACK */
 		sctp_send_shutdown_ack(stcb, stcb->asoc.primary_destination);
 		op_err = sctp_generate_cause(SCTP_CAUSE_COOKIE_IN_SHUTDOWN, "");
 		sctp_send_operr_to(src, dst, sh, cookie->peers_vtag, op_err,
 		    mflowtype, mflowid, inp->fibnum,
 		    vrf_id, net->port);
 		if (how_indx < sizeof(asoc->cookie_how))
 			asoc->cookie_how[how_indx] = 2;
 		return (NULL);
 	}
 	/*
 	 * find and validate the INIT chunk in the cookie (peer's info) the
 	 * INIT should start after the cookie-echo header struct (chunk
 	 * header, state cookie header struct)
 	 */
 	init_offset = offset += sizeof(struct sctp_cookie_echo_chunk);
 
 	init_cp = (struct sctp_init_chunk *)
 	    sctp_m_getptr(m, init_offset, sizeof(struct sctp_init_chunk),
 	    (uint8_t *) & init_buf);
 	if (init_cp == NULL) {
 		/* could not pull an INIT chunk in cookie */
 		return (NULL);
 	}
 	if (init_cp->ch.chunk_type != SCTP_INITIATION) {
 		return (NULL);
 	}
 	/*
 	 * find and validate the INIT-ACK chunk in the cookie (my info) the
 	 * INIT-ACK follows the INIT chunk
 	 */
 	initack_offset = init_offset + SCTP_SIZE32(ntohs(init_cp->ch.chunk_length));
 	initack_cp = (struct sctp_init_ack_chunk *)
 	    sctp_m_getptr(m, initack_offset, sizeof(struct sctp_init_ack_chunk),
 	    (uint8_t *) & initack_buf);
 	if (initack_cp == NULL) {
 		/* could not pull INIT-ACK chunk in cookie */
 		return (NULL);
 	}
 	if (initack_cp->ch.chunk_type != SCTP_INITIATION_ACK) {
 		return (NULL);
 	}
 	if ((ntohl(initack_cp->init.initiate_tag) == asoc->my_vtag) &&
 	    (ntohl(init_cp->init.initiate_tag) == asoc->peer_vtag)) {
 		/*
 		 * case D in Section 5.2.4 Table 2: MMAA process accordingly
 		 * to get into the OPEN state
 		 */
 		if (ntohl(initack_cp->init.initial_tsn) != asoc->init_seq_number) {
 			/*-
 			 * Oops, this means that we somehow generated two vtag's
 			 * the same. I.e. we did:
 			 *  Us               Peer
 			 *   <---INIT(tag=a)------
 			 *   ----INIT-ACK(tag=t)-->
 			 *   ----INIT(tag=t)------> *1
 			 *   <---INIT-ACK(tag=a)---
 			 *   <----CE(tag=t)------------- *2
 			 *
 			 * At point *1 we should be generating a different
 			 * tag t'. Which means we would throw away the CE and send
 			 * ours instead. Basically this is case C (throw away side).
 			 */
 			if (how_indx < sizeof(asoc->cookie_how))
 				asoc->cookie_how[how_indx] = 17;
 			return (NULL);
 
 		}
 		switch (SCTP_GET_STATE(asoc)) {
 		case SCTP_STATE_COOKIE_WAIT:
 		case SCTP_STATE_COOKIE_ECHOED:
 			/*
 			 * INIT was sent but got a COOKIE_ECHO with the
 			 * correct tags... just accept it...but we must
 			 * process the init so that we can make sure we have
 			 * the right seq no's.
 			 */
 			/* First we must process the INIT !! */
 			retval = sctp_process_init(init_cp, stcb);
 			if (retval < 0) {
 				if (how_indx < sizeof(asoc->cookie_how))
 					asoc->cookie_how[how_indx] = 3;
 				return (NULL);
 			}
 			/* we have already processed the INIT so no problem */
 			sctp_timer_stop(SCTP_TIMER_TYPE_HEARTBEAT, inp,
 			    stcb, net,
 			    SCTP_FROM_SCTP_INPUT + SCTP_LOC_13);
 			sctp_timer_stop(SCTP_TIMER_TYPE_INIT, inp,
 			    stcb, net,
 			    SCTP_FROM_SCTP_INPUT + SCTP_LOC_14);
 			/* update current state */
 			if (SCTP_GET_STATE(asoc) == SCTP_STATE_COOKIE_ECHOED)
 				SCTP_STAT_INCR_COUNTER32(sctps_activeestab);
 			else
 				SCTP_STAT_INCR_COUNTER32(sctps_collisionestab);
 
 			SCTP_SET_STATE(asoc, SCTP_STATE_OPEN);
 			if (asoc->state & SCTP_STATE_SHUTDOWN_PENDING) {
 				sctp_timer_start(SCTP_TIMER_TYPE_SHUTDOWNGUARD,
 				    stcb->sctp_ep, stcb, asoc->primary_destination);
 			}
 			SCTP_STAT_INCR_GAUGE32(sctps_currestab);
 			sctp_stop_all_cookie_timers(stcb);
 			if (((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) ||
 			    (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) &&
 			    (inp->sctp_socket->so_qlimit == 0)
 			    ) {
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 				struct socket *so;
 
 #endif
 				/*
 				 * Here is where collision would go if we
 				 * did a connect() and instead got a
 				 * init/init-ack/cookie done before the
 				 * init-ack came back..
 				 */
 				stcb->sctp_ep->sctp_flags |=
 				    SCTP_PCB_FLAGS_CONNECTED;
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 				so = SCTP_INP_SO(stcb->sctp_ep);
 				atomic_add_int(&stcb->asoc.refcnt, 1);
 				SCTP_TCB_UNLOCK(stcb);
 				SCTP_SOCKET_LOCK(so, 1);
 				SCTP_TCB_LOCK(stcb);
 				atomic_add_int(&stcb->asoc.refcnt, -1);
 				if (stcb->asoc.state & SCTP_STATE_CLOSED_SOCKET) {
 					SCTP_SOCKET_UNLOCK(so, 1);
 					return (NULL);
 				}
 #endif
 				soisconnected(stcb->sctp_socket);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 				SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 			}
 			/* notify upper layer */
 			*notification = SCTP_NOTIFY_ASSOC_UP;
 			/*
 			 * since we did not send a HB make sure we don't
 			 * double things
 			 */
 			net->hb_responded = 1;
 			net->RTO = sctp_calculate_rto(stcb, asoc, net,
 			    &cookie->time_entered,
 			    sctp_align_unsafe_makecopy,
 			    SCTP_RTT_FROM_NON_DATA);
 
 			if (stcb->asoc.sctp_autoclose_ticks &&
 			    (sctp_is_feature_on(inp, SCTP_PCB_FLAGS_AUTOCLOSE))) {
 				sctp_timer_start(SCTP_TIMER_TYPE_AUTOCLOSE,
 				    inp, stcb, NULL);
 			}
 			break;
 		default:
 			/*
 			 * we're in the OPEN state (or beyond), so peer must
 			 * have simply lost the COOKIE-ACK
 			 */
 			break;
 		}		/* end switch */
 		sctp_stop_all_cookie_timers(stcb);
 		/*
 		 * We ignore the return code here.. not sure if we should
 		 * somehow abort.. but we do have an existing asoc. This
 		 * really should not fail.
 		 */
 		if (sctp_load_addresses_from_init(stcb, m,
 		    init_offset + sizeof(struct sctp_init_chunk),
 		    initack_offset, src, dst, init_src, stcb->asoc.port)) {
 			if (how_indx < sizeof(asoc->cookie_how))
 				asoc->cookie_how[how_indx] = 4;
 			return (NULL);
 		}
 		/* respond with a COOKIE-ACK */
 		sctp_toss_old_cookies(stcb, asoc);
 		sctp_send_cookie_ack(stcb);
 		if (how_indx < sizeof(asoc->cookie_how))
 			asoc->cookie_how[how_indx] = 5;
 		return (stcb);
 	}
 	if (ntohl(initack_cp->init.initiate_tag) != asoc->my_vtag &&
 	    ntohl(init_cp->init.initiate_tag) == asoc->peer_vtag &&
 	    cookie->tie_tag_my_vtag == 0 &&
 	    cookie->tie_tag_peer_vtag == 0) {
 		/*
 		 * case C in Section 5.2.4 Table 2: XMOO silently discard
 		 */
 		if (how_indx < sizeof(asoc->cookie_how))
 			asoc->cookie_how[how_indx] = 6;
 		return (NULL);
 	}
 	/*
 	 * If the peer supports NAT, the condition below holds, and the
 	 * association is established, send back an ABORT (colliding state).
 	 */
 	if ((SCTP_GET_STATE(asoc) == SCTP_STATE_OPEN) &&
 	    (asoc->peer_supports_nat) &&
 	    ((ntohl(initack_cp->init.initiate_tag) == asoc->my_vtag) &&
 	    ((ntohl(init_cp->init.initiate_tag) != asoc->peer_vtag) ||
 	    (asoc->peer_vtag == 0)))) {
 		/*
 		 * Special case - Peer supports NAT. We may have two INITs
 		 * that we gave out the same tag on since one was not
 		 * established.. i.e. we get INIT from host-1 behind the nat
 		 * and we respond tag-a, we get an INIT from host-2 behind
 		 * the nat and we get tag-a again. Then we bring up host-1
 		 * (or 2's) assoc, Then comes the cookie from host-2 (or 1).
 		 * Now we have colliding state. We must send an abort here
 		 * with colliding state indication.
 		 */
 		op_err = sctp_generate_cause(SCTP_CAUSE_NAT_COLLIDING_STATE, "");
 		sctp_send_abort(m, iphlen, src, dst, sh, 0, op_err,
 		    mflowtype, mflowid, inp->fibnum,
 		    vrf_id, port);
 		return (NULL);
 	}
 	if ((ntohl(initack_cp->init.initiate_tag) == asoc->my_vtag) &&
 	    ((ntohl(init_cp->init.initiate_tag) != asoc->peer_vtag) ||
 	    (asoc->peer_vtag == 0))) {
 		/*
 		 * case B in Section 5.2.4 Table 2: MXAA or MOAA my info
 		 * should be ok, re-accept peer info
 		 */
 		if (ntohl(initack_cp->init.initial_tsn) != asoc->init_seq_number) {
 			/*
 			 * Extension of case C. If we hit this, then the
 			 * random number generator returned the same vtag
 			 * when we first sent our INIT-ACK and when we later
 			 * sent our INIT. The side with the seq numbers that
 			 * are different will be the one that normally
 			 * would have hit case C. This in effect "extends"
 			 * our vtags in this collision case to be 64 bits.
 			 * The same collision could occur aka you get both
 			 * vtag and seq number the same twice in a row.. but
 			 * is much less likely. If it did happen then we
 			 * would proceed through and bring up the assoc.. we
 			 * may end up with the wrong stream setup however..
 			 * which would be bad.. but there is no way to
 			 * tell.. until we send on a stream that does not
 			 * exist :-)
 			 */
 			if (how_indx < sizeof(asoc->cookie_how))
 				asoc->cookie_how[how_indx] = 7;
 
 			return (NULL);
 		}
 		if (how_indx < sizeof(asoc->cookie_how))
 			asoc->cookie_how[how_indx] = 8;
 		sctp_timer_stop(SCTP_TIMER_TYPE_HEARTBEAT, inp, stcb, net,
 		    SCTP_FROM_SCTP_INPUT + SCTP_LOC_15);
 		sctp_stop_all_cookie_timers(stcb);
 		/*
 		 * since we did not send a HB make sure we don't double
 		 * things
 		 */
 		net->hb_responded = 1;
 		if (stcb->asoc.sctp_autoclose_ticks &&
 		    sctp_is_feature_on(inp, SCTP_PCB_FLAGS_AUTOCLOSE)) {
 			sctp_timer_start(SCTP_TIMER_TYPE_AUTOCLOSE, inp, stcb,
 			    NULL);
 		}
 		asoc->my_rwnd = ntohl(initack_cp->init.a_rwnd);
 		asoc->pre_open_streams = ntohs(initack_cp->init.num_outbound_streams);
 
 		if (ntohl(init_cp->init.initiate_tag) != asoc->peer_vtag) {
 			/*
 			 * Ok the peer probably discarded our data (if we
 			 * echoed a cookie+data). So anything on the
 			 * sent_queue should be marked for retransmit, we
 			 * may not get something to kick us so it COULD
 			 * still take a timeout to move these.. but it can't
 			 * hurt to mark them.
 			 */
 			struct sctp_tmit_chunk *chk;
 
 			TAILQ_FOREACH(chk, &stcb->asoc.sent_queue, sctp_next) {
 				if (chk->sent < SCTP_DATAGRAM_RESEND) {
 					chk->sent = SCTP_DATAGRAM_RESEND;
 					sctp_flight_size_decrease(chk);
 					sctp_total_flight_decrease(stcb, chk);
 					sctp_ucount_incr(stcb->asoc.sent_queue_retran_cnt);
 					spec_flag++;
 				}
 			}
 
 		}
 		/* process the INIT info (peer's info) */
 		retval = sctp_process_init(init_cp, stcb);
 		if (retval < 0) {
 			if (how_indx < sizeof(asoc->cookie_how))
 				asoc->cookie_how[how_indx] = 9;
 			return (NULL);
 		}
 		if (sctp_load_addresses_from_init(stcb, m,
 		    init_offset + sizeof(struct sctp_init_chunk),
 		    initack_offset, src, dst, init_src, stcb->asoc.port)) {
 			if (how_indx < sizeof(asoc->cookie_how))
 				asoc->cookie_how[how_indx] = 10;
 			return (NULL);
 		}
 		if ((asoc->state & SCTP_STATE_COOKIE_WAIT) ||
 		    (asoc->state & SCTP_STATE_COOKIE_ECHOED)) {
 			*notification = SCTP_NOTIFY_ASSOC_UP;
 
 			if (((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) ||
 			    (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) &&
 			    (inp->sctp_socket->so_qlimit == 0)) {
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 				struct socket *so;
 
 #endif
 				stcb->sctp_ep->sctp_flags |=
 				    SCTP_PCB_FLAGS_CONNECTED;
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 				so = SCTP_INP_SO(stcb->sctp_ep);
 				atomic_add_int(&stcb->asoc.refcnt, 1);
 				SCTP_TCB_UNLOCK(stcb);
 				SCTP_SOCKET_LOCK(so, 1);
 				SCTP_TCB_LOCK(stcb);
 				atomic_add_int(&stcb->asoc.refcnt, -1);
 				if (stcb->asoc.state & SCTP_STATE_CLOSED_SOCKET) {
 					SCTP_SOCKET_UNLOCK(so, 1);
 					return (NULL);
 				}
 #endif
 				soisconnected(stcb->sctp_socket);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 				SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 			}
 			if (SCTP_GET_STATE(asoc) == SCTP_STATE_COOKIE_ECHOED)
 				SCTP_STAT_INCR_COUNTER32(sctps_activeestab);
 			else
 				SCTP_STAT_INCR_COUNTER32(sctps_collisionestab);
 			SCTP_STAT_INCR_GAUGE32(sctps_currestab);
 		} else if (SCTP_GET_STATE(asoc) == SCTP_STATE_OPEN) {
 			SCTP_STAT_INCR_COUNTER32(sctps_restartestab);
 		} else {
 			SCTP_STAT_INCR_COUNTER32(sctps_collisionestab);
 		}
 		SCTP_SET_STATE(asoc, SCTP_STATE_OPEN);
 		if (asoc->state & SCTP_STATE_SHUTDOWN_PENDING) {
 			sctp_timer_start(SCTP_TIMER_TYPE_SHUTDOWNGUARD,
 			    stcb->sctp_ep, stcb, asoc->primary_destination);
 		}
 		sctp_stop_all_cookie_timers(stcb);
 		sctp_toss_old_cookies(stcb, asoc);
 		sctp_send_cookie_ack(stcb);
 		if (spec_flag) {
 			/*
 			 * We only do this if we have marked chunks for
 			 * retransmission. This call gets only the
 			 * COOKIE-ACK out; when we return, the normal call
 			 * to sctp_chunk_output will send the
 			 * retransmissions behind it.
 			 */
 			sctp_chunk_output(inp, stcb, SCTP_OUTPUT_FROM_COOKIE_ACK, SCTP_SO_NOT_LOCKED);
 		}
 		if (how_indx < sizeof(asoc->cookie_how))
 			asoc->cookie_how[how_indx] = 11;
 
 		return (stcb);
 	}
 	if ((ntohl(initack_cp->init.initiate_tag) != asoc->my_vtag &&
 	    ntohl(init_cp->init.initiate_tag) != asoc->peer_vtag) &&
 	    cookie->tie_tag_my_vtag == asoc->my_vtag_nonce &&
 	    cookie->tie_tag_peer_vtag == asoc->peer_vtag_nonce &&
 	    cookie->tie_tag_peer_vtag != 0) {
 		struct sctpasochead *head;
 
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		struct socket *so;
 
 #endif
 
 		if (asoc->peer_supports_nat) {
 			/*
 			 * This is a gross gross hack. Just call the
 			 * cookie_new code since we are allowing a duplicate
 			 * association. I hope this works...
 			 */
 			return (sctp_process_cookie_new(m, iphlen, offset, src, dst,
 			    sh, cookie, cookie_len,
 			    inp, netp, init_src, notification,
 			    auth_skipped, auth_offset, auth_len,
 			    mflowtype, mflowid,
 			    vrf_id, port));
 		}
 		/*
 		 * case A in Section 5.2.4 Table 2: XXMM (peer restarted)
 		 */
 		/* temp code */
 		if (how_indx < sizeof(asoc->cookie_how))
 			asoc->cookie_how[how_indx] = 12;
 		sctp_timer_stop(SCTP_TIMER_TYPE_INIT, inp, stcb, net,
 		    SCTP_FROM_SCTP_INPUT + SCTP_LOC_16);
 		sctp_timer_stop(SCTP_TIMER_TYPE_HEARTBEAT, inp, stcb, net,
 		    SCTP_FROM_SCTP_INPUT + SCTP_LOC_17);
 
 		/* notify upper layer */
 		*notification = SCTP_NOTIFY_ASSOC_RESTART;
 		atomic_add_int(&stcb->asoc.refcnt, 1);
 		if ((SCTP_GET_STATE(asoc) != SCTP_STATE_OPEN) &&
 		    (SCTP_GET_STATE(asoc) != SCTP_STATE_SHUTDOWN_RECEIVED) &&
 		    (SCTP_GET_STATE(asoc) != SCTP_STATE_SHUTDOWN_SENT)) {
 			SCTP_STAT_INCR_GAUGE32(sctps_currestab);
 		}
 		if (SCTP_GET_STATE(asoc) == SCTP_STATE_OPEN) {
 			SCTP_STAT_INCR_GAUGE32(sctps_restartestab);
 		} else if (SCTP_GET_STATE(asoc) != SCTP_STATE_SHUTDOWN_SENT) {
 			SCTP_STAT_INCR_GAUGE32(sctps_collisionestab);
 		}
 		if (asoc->state & SCTP_STATE_SHUTDOWN_PENDING) {
 			SCTP_SET_STATE(asoc, SCTP_STATE_OPEN);
 			sctp_timer_start(SCTP_TIMER_TYPE_SHUTDOWNGUARD,
 			    stcb->sctp_ep, stcb, asoc->primary_destination);
 
 		} else if (!(asoc->state & SCTP_STATE_SHUTDOWN_SENT)) {
 			/* move to OPEN state, if not in SHUTDOWN_SENT */
 			SCTP_SET_STATE(asoc, SCTP_STATE_OPEN);
 		}
 		asoc->pre_open_streams =
 		    ntohs(initack_cp->init.num_outbound_streams);
 		asoc->init_seq_number = ntohl(initack_cp->init.initial_tsn);
 		asoc->sending_seq = asoc->asconf_seq_out = asoc->str_reset_seq_out = asoc->init_seq_number;
 		asoc->asconf_seq_out_acked = asoc->asconf_seq_out - 1;
 
 		asoc->asconf_seq_in = asoc->last_acked_seq = asoc->init_seq_number - 1;
 
 		asoc->str_reset_seq_in = asoc->init_seq_number;
 
 		asoc->advanced_peer_ack_point = asoc->last_acked_seq;
 		if (asoc->mapping_array) {
 			memset(asoc->mapping_array, 0,
 			    asoc->mapping_array_size);
 		}
 		if (asoc->nr_mapping_array) {
 			memset(asoc->nr_mapping_array, 0,
 			    asoc->mapping_array_size);
 		}
 		SCTP_TCB_UNLOCK(stcb);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		so = SCTP_INP_SO(stcb->sctp_ep);
 		SCTP_SOCKET_LOCK(so, 1);
 #endif
 		SCTP_INP_INFO_WLOCK();
 		SCTP_INP_WLOCK(stcb->sctp_ep);
 		SCTP_TCB_LOCK(stcb);
 		atomic_add_int(&stcb->asoc.refcnt, -1);
 		/* send up all the data */
 		SCTP_TCB_SEND_LOCK(stcb);
 
 		sctp_report_all_outbound(stcb, 0, 1, SCTP_SO_LOCKED);
 		for (i = 0; i < stcb->asoc.streamoutcnt; i++) {
 			stcb->asoc.strmout[i].chunks_on_queues = 0;
 #if defined(SCTP_DETAILED_STR_STATS)
 			for (j = 0; j < SCTP_PR_SCTP_MAX + 1; j++) {
 				asoc->strmout[i].abandoned_sent[j] = 0;
 				asoc->strmout[i].abandoned_unsent[j] = 0;
 			}
 #else
 			asoc->strmout[i].abandoned_sent[0] = 0;
 			asoc->strmout[i].abandoned_unsent[0] = 0;
 #endif
 			stcb->asoc.strmout[i].stream_no = i;
 			stcb->asoc.strmout[i].next_sequence_send = 0;
 			stcb->asoc.strmout[i].last_msg_incomplete = 0;
 		}
 		/* process the INIT-ACK info (my info) */
 		asoc->my_vtag = ntohl(initack_cp->init.initiate_tag);
 		asoc->my_rwnd = ntohl(initack_cp->init.a_rwnd);
 
 		/* pull from vtag hash */
 		LIST_REMOVE(stcb, sctp_asocs);
 		/* re-insert to new vtag position */
 		head = &SCTP_BASE_INFO(sctp_asochash)[SCTP_PCBHASH_ASOC(stcb->asoc.my_vtag,
 		    SCTP_BASE_INFO(hashasocmark))];
 		/*
 		 * put it in the bucket in the vtag hash of assoc's for the
 		 * system
 		 */
 		LIST_INSERT_HEAD(head, stcb, sctp_asocs);
 
 		SCTP_TCB_SEND_UNLOCK(stcb);
 		SCTP_INP_WUNLOCK(stcb->sctp_ep);
 		SCTP_INP_INFO_WUNLOCK();
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 		asoc->total_flight = 0;
 		asoc->total_flight_count = 0;
 		/* process the INIT info (peer's info) */
 		retval = sctp_process_init(init_cp, stcb);
 		if (retval < 0) {
 			if (how_indx < sizeof(asoc->cookie_how))
 				asoc->cookie_how[how_indx] = 13;
 
 			return (NULL);
 		}
 		/*
 		 * since we did not send a HB make sure we don't double
 		 * things
 		 */
 		net->hb_responded = 1;
 
 		if (sctp_load_addresses_from_init(stcb, m,
 		    init_offset + sizeof(struct sctp_init_chunk),
 		    initack_offset, src, dst, init_src, stcb->asoc.port)) {
 			if (how_indx < sizeof(asoc->cookie_how))
 				asoc->cookie_how[how_indx] = 14;
 
 			return (NULL);
 		}
 		/* respond with a COOKIE-ACK */
 		sctp_stop_all_cookie_timers(stcb);
 		sctp_toss_old_cookies(stcb, asoc);
 		sctp_send_cookie_ack(stcb);
 		if (how_indx < sizeof(asoc->cookie_how))
 			asoc->cookie_how[how_indx] = 15;
 
 		return (stcb);
 	}
 	if (how_indx < sizeof(asoc->cookie_how))
 		asoc->cookie_how[how_indx] = 16;
 	/* all other cases... */
 	return (NULL);
 }
 
 
 /*
  * Handle a state cookie for a new association.
  *   m: input packet mbuf chain; assumes a pullup on the
  *      IP/SCTP/COOKIE-ECHO chunk. Note: this is a "split" mbuf and the
  *      cookie signature does not exist.
  *   offset: offset into the mbuf to the cookie-echo chunk.
  *   cookie_len: length of the cookie chunk.
  *   init_src: where the INIT was from.
  * Returns a new TCB.
  */
 static struct sctp_tcb *
 sctp_process_cookie_new(struct mbuf *m, int iphlen, int offset,
     struct sockaddr *src, struct sockaddr *dst,
     struct sctphdr *sh, struct sctp_state_cookie *cookie, int cookie_len,
     struct sctp_inpcb *inp, struct sctp_nets **netp,
     struct sockaddr *init_src, int *notification,
     int auth_skipped, uint32_t auth_offset, uint32_t auth_len,
     uint8_t mflowtype, uint32_t mflowid,
     uint32_t vrf_id, uint16_t port)
 {
 	struct sctp_tcb *stcb;
 	struct sctp_init_chunk *init_cp, init_buf;
 	struct sctp_init_ack_chunk *initack_cp, initack_buf;
 	union sctp_sockstore store;
 	struct sctp_association *asoc;
 	int init_offset, initack_offset, initack_limit;
 	int retval;
 	int error = 0;
 	uint8_t auth_chunk_buf[SCTP_PARAM_BUFFER_SIZE];
 
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 	struct socket *so;
 
 	so = SCTP_INP_SO(inp);
 #endif
 
 	/*
 	 * Find and validate the INIT chunk in the cookie (peer's info).
 	 * The INIT should start after the cookie-echo header struct (chunk
 	 * header, state cookie header struct).
 	 */
 	init_offset = offset + sizeof(struct sctp_cookie_echo_chunk);
 	init_cp = (struct sctp_init_chunk *)
 	    sctp_m_getptr(m, init_offset, sizeof(struct sctp_init_chunk),
 	    (uint8_t *) & init_buf);
 	if (init_cp == NULL) {
 		/* could not pull an INIT chunk in cookie */
 		SCTPDBG(SCTP_DEBUG_INPUT1,
 		    "process_cookie_new: could not pull INIT chunk hdr\n");
 		return (NULL);
 	}
 	if (init_cp->ch.chunk_type != SCTP_INITIATION) {
 		SCTPDBG(SCTP_DEBUG_INPUT1, "HUH? process_cookie_new: could not find INIT chunk!\n");
 		return (NULL);
 	}
 	initack_offset = init_offset + SCTP_SIZE32(ntohs(init_cp->ch.chunk_length));
 	/*
 	 * Find and validate the INIT-ACK chunk in the cookie (my info).
 	 * The INIT-ACK follows the INIT chunk.
 	 */
 	initack_cp = (struct sctp_init_ack_chunk *)
 	    sctp_m_getptr(m, initack_offset, sizeof(struct sctp_init_ack_chunk),
 	    (uint8_t *) & initack_buf);
 	if (initack_cp == NULL) {
 		/* could not pull INIT-ACK chunk in cookie */
 		SCTPDBG(SCTP_DEBUG_INPUT1, "process_cookie_new: could not pull INIT-ACK chunk hdr\n");
 		return (NULL);
 	}
 	if (initack_cp->ch.chunk_type != SCTP_INITIATION_ACK) {
 		return (NULL);
 	}
 	/*
 	 * NOTE: We can't use the INIT_ACK's chk_length to determine the
 	 * "initack_limit" value.  This is because the chk_length field
 	 * includes the length of the cookie, but the cookie is omitted when
 	 * the INIT and INIT_ACK are tacked onto the cookie...
 	 */
 	initack_limit = offset + cookie_len;
 
 	/*
 	 * Now that we know the INIT/INIT-ACK are in place, create a new TCB
 	 * and populate it.
 	 */
 
 	/*
 	 * Here we do a trick: we pass NULL for the proc/thread argument.
 	 * We do this since, in effect, we only use the p argument when the
 	 * socket is unbound and we must do an implicit bind. Since we are
 	 * getting a cookie, we cannot be unbound.
 	 */
 	stcb = sctp_aloc_assoc(inp, init_src, &error,
 	    ntohl(initack_cp->init.initiate_tag), vrf_id,
 	    ntohs(initack_cp->init.num_outbound_streams),
 	    port,
 	    (struct thread *)NULL
 	    );
 	if (stcb == NULL) {
 		struct mbuf *op_err;
 
 		/* memory problem? */
 		SCTPDBG(SCTP_DEBUG_INPUT1,
 		    "process_cookie_new: no room for another TCB!\n");
 		op_err = sctp_generate_cause(SCTP_CAUSE_OUT_OF_RESC, "");
 		sctp_abort_association(inp, (struct sctp_tcb *)NULL, m, iphlen,
 		    src, dst, sh, op_err,
 		    mflowtype, mflowid,
 		    vrf_id, port);
 		return (NULL);
 	}
 	/* get the correct sctp_nets */
 	if (netp)
 		*netp = sctp_findnet(stcb, init_src);
 
 	asoc = &stcb->asoc;
 	/* get scope variables out of cookie */
 	asoc->scope.ipv4_local_scope = cookie->ipv4_scope;
 	asoc->scope.site_scope = cookie->site_scope;
 	asoc->scope.local_scope = cookie->local_scope;
 	asoc->scope.loopback_scope = cookie->loopback_scope;
 
 	if ((asoc->scope.ipv4_addr_legal != cookie->ipv4_addr_legal) ||
 	    (asoc->scope.ipv6_addr_legal != cookie->ipv6_addr_legal)) {
 		struct mbuf *op_err;
 
 		/*
 		 * Houston we have a problem. The EP changed while the
 		 * cookie was in flight. Only recourse is to abort the
 		 * association.
 		 */
 		atomic_add_int(&stcb->asoc.refcnt, 1);
 		op_err = sctp_generate_cause(SCTP_CAUSE_OUT_OF_RESC, "");
 		sctp_abort_association(inp, (struct sctp_tcb *)NULL, m, iphlen,
 		    src, dst, sh, op_err,
 		    mflowtype, mflowid,
 		    vrf_id, port);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		SCTP_TCB_UNLOCK(stcb);
 		SCTP_SOCKET_LOCK(so, 1);
 		SCTP_TCB_LOCK(stcb);
 #endif
 		(void)sctp_free_assoc(inp, stcb, SCTP_NORMAL_PROC,
 		    SCTP_FROM_SCTP_INPUT + SCTP_LOC_18);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 		atomic_subtract_int(&stcb->asoc.refcnt, 1);
 		return (NULL);
 	}
 	/* process the INIT-ACK info (my info) */
 	asoc->my_vtag = ntohl(initack_cp->init.initiate_tag);
 	asoc->my_rwnd = ntohl(initack_cp->init.a_rwnd);
 	asoc->pre_open_streams = ntohs(initack_cp->init.num_outbound_streams);
 	asoc->init_seq_number = ntohl(initack_cp->init.initial_tsn);
 	asoc->sending_seq = asoc->asconf_seq_out = asoc->str_reset_seq_out = asoc->init_seq_number;
 	asoc->asconf_seq_out_acked = asoc->asconf_seq_out - 1;
 	asoc->asconf_seq_in = asoc->last_acked_seq = asoc->init_seq_number - 1;
 	asoc->str_reset_seq_in = asoc->init_seq_number;
 
 	asoc->advanced_peer_ack_point = asoc->last_acked_seq;
 
 	/* process the INIT info (peer's info) */
 	if (netp)
 		retval = sctp_process_init(init_cp, stcb);
 	else
 		retval = 0;
 	if (retval < 0) {
 		atomic_add_int(&stcb->asoc.refcnt, 1);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		SCTP_TCB_UNLOCK(stcb);
 		SCTP_SOCKET_LOCK(so, 1);
 		SCTP_TCB_LOCK(stcb);
 #endif
 		(void)sctp_free_assoc(inp, stcb, SCTP_NORMAL_PROC,
 		    SCTP_FROM_SCTP_INPUT + SCTP_LOC_19);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 		atomic_subtract_int(&stcb->asoc.refcnt, 1);
 		return (NULL);
 	}
 	/* load all addresses */
 	if (sctp_load_addresses_from_init(stcb, m,
 	    init_offset + sizeof(struct sctp_init_chunk), initack_offset,
 	    src, dst, init_src, port)) {
 		atomic_add_int(&stcb->asoc.refcnt, 1);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		SCTP_TCB_UNLOCK(stcb);
 		SCTP_SOCKET_LOCK(so, 1);
 		SCTP_TCB_LOCK(stcb);
 #endif
 		(void)sctp_free_assoc(inp, stcb, SCTP_NORMAL_PROC,
 		    SCTP_FROM_SCTP_INPUT + SCTP_LOC_20);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 		atomic_subtract_int(&stcb->asoc.refcnt, 1);
 		return (NULL);
 	}
 	/*
 	 * verify any preceding AUTH chunk that was skipped
 	 */
 	/* pull the local authentication parameters from the cookie/init-ack */
 	sctp_auth_get_cookie_params(stcb, m,
 	    initack_offset + sizeof(struct sctp_init_ack_chunk),
 	    initack_limit - (initack_offset + sizeof(struct sctp_init_ack_chunk)));
 	if (auth_skipped) {
 		struct sctp_auth_chunk *auth;
 
 		auth = (struct sctp_auth_chunk *)
 		    sctp_m_getptr(m, auth_offset, auth_len, auth_chunk_buf);
 		if ((auth == NULL) || sctp_handle_auth(stcb, auth, m, auth_offset)) {
 			/* auth HMAC failed, dump the assoc and packet */
 			SCTPDBG(SCTP_DEBUG_AUTH1,
 			    "COOKIE-ECHO: AUTH failed\n");
 			atomic_add_int(&stcb->asoc.refcnt, 1);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 			SCTP_TCB_UNLOCK(stcb);
 			SCTP_SOCKET_LOCK(so, 1);
 			SCTP_TCB_LOCK(stcb);
 #endif
 			(void)sctp_free_assoc(inp, stcb, SCTP_NORMAL_PROC,
 			    SCTP_FROM_SCTP_INPUT + SCTP_LOC_21);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 			SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 			atomic_subtract_int(&stcb->asoc.refcnt, 1);
 			return (NULL);
 		} else {
 			/* remaining chunks checked... good to go */
 			stcb->asoc.authenticated = 1;
 		}
 	}
 	/* update current state */
 	SCTPDBG(SCTP_DEBUG_INPUT2, "moving to OPEN state\n");
 	SCTP_SET_STATE(asoc, SCTP_STATE_OPEN);
 	if (asoc->state & SCTP_STATE_SHUTDOWN_PENDING) {
 		sctp_timer_start(SCTP_TIMER_TYPE_SHUTDOWNGUARD,
 		    stcb->sctp_ep, stcb, asoc->primary_destination);
 	}
 	sctp_stop_all_cookie_timers(stcb);
 	SCTP_STAT_INCR_COUNTER32(sctps_passiveestab);
 	SCTP_STAT_INCR_GAUGE32(sctps_currestab);
 
 	/*
 	 * If we're doing ASCONFs, check to see if we have any new local
 	 * addresses that need to get added to the peer (e.g. addresses
 	 * changed while the cookie echo was in flight). This needs to be
 	 * done after we go to the OPEN state to do the correct ASCONF
 	 * processing. Otherwise, make sure we have the correct addresses
 	 * in our lists.
 	 */
 
 	/* warning, we re-use store (store.sin, store.sin6, store.sa) here! */
 	/* pull in local_address (our "from" address) */
 	switch (cookie->laddr_type) {
 #ifdef INET
 	case SCTP_IPV4_ADDRESS:
 		/* source addr is IPv4 */
 		memset(&store.sin, 0, sizeof(struct sockaddr_in));
 		store.sin.sin_family = AF_INET;
 		store.sin.sin_len = sizeof(struct sockaddr_in);
 		store.sin.sin_addr.s_addr = cookie->laddress[0];
 		break;
 #endif
 #ifdef INET6
 	case SCTP_IPV6_ADDRESS:
 		/* source addr is IPv6 */
 		memset(&store.sin6, 0, sizeof(struct sockaddr_in6));
 		store.sin6.sin6_family = AF_INET6;
 		store.sin6.sin6_len = sizeof(struct sockaddr_in6);
 		store.sin6.sin6_scope_id = cookie->scope_id;
 		memcpy(&store.sin6.sin6_addr, cookie->laddress, sizeof(struct in6_addr));
 		break;
 #endif
 	default:
 		atomic_add_int(&stcb->asoc.refcnt, 1);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		SCTP_TCB_UNLOCK(stcb);
 		SCTP_SOCKET_LOCK(so, 1);
 		SCTP_TCB_LOCK(stcb);
 #endif
 		(void)sctp_free_assoc(inp, stcb, SCTP_NORMAL_PROC,
 		    SCTP_FROM_SCTP_INPUT + SCTP_LOC_22);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 		atomic_subtract_int(&stcb->asoc.refcnt, 1);
 		return (NULL);
 	}
 
 	/* set up to notify upper layer */
 	*notification = SCTP_NOTIFY_ASSOC_UP;
 	if (((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) ||
 	    (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) &&
 	    (inp->sctp_socket->so_qlimit == 0)) {
 		/*
 		 * This is an endpoint that called connect(). How it got a
 		 * cookie that is NEW is a bit of a mystery. It must be that
 		 * the INIT was sent, but before it got there a complete
 		 * INIT/INIT-ACK/COOKIE arrived. Of course, it should then
 		 * have gone to the other code, not here. Oh well, a bit of
 		 * protection is worth having.
 		 */
 		stcb->sctp_ep->sctp_flags |= SCTP_PCB_FLAGS_CONNECTED;
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		atomic_add_int(&stcb->asoc.refcnt, 1);
 		SCTP_TCB_UNLOCK(stcb);
 		SCTP_SOCKET_LOCK(so, 1);
 		SCTP_TCB_LOCK(stcb);
 		atomic_subtract_int(&stcb->asoc.refcnt, 1);
 		if (stcb->asoc.state & SCTP_STATE_CLOSED_SOCKET) {
 			SCTP_SOCKET_UNLOCK(so, 1);
 			return (NULL);
 		}
 #endif
 		soisconnected(stcb->sctp_socket);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 		SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 	} else if ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) &&
 	    (inp->sctp_socket->so_qlimit)) {
 		/*
 		 * We don't want to do anything with this one, since it is
 		 * the listening socket. The timer will get started for
 		 * accepted connections in the caller.
 		 */
 		;
 	}
 	/* since we did not send a HB make sure we don't double things */
 	if ((netp) && (*netp))
 		(*netp)->hb_responded = 1;
 
 	if (stcb->asoc.sctp_autoclose_ticks &&
 	    sctp_is_feature_on(inp, SCTP_PCB_FLAGS_AUTOCLOSE)) {
 		sctp_timer_start(SCTP_TIMER_TYPE_AUTOCLOSE, inp, stcb, NULL);
 	}
 	(void)SCTP_GETTIME_TIMEVAL(&stcb->asoc.time_entered);
 	if ((netp != NULL) && (*netp != NULL)) {
 		/* calculate the RTT and set the encaps port */
 		(*netp)->RTO = sctp_calculate_rto(stcb, asoc, *netp,
 		    &cookie->time_entered, sctp_align_unsafe_makecopy,
 		    SCTP_RTT_FROM_NON_DATA);
 	}
 	/* respond with a COOKIE-ACK */
 	sctp_send_cookie_ack(stcb);
 
 	/*
 	 * check the address lists for any ASCONFs that need to be sent
 	 * AFTER the cookie-ack is sent
 	 */
 	sctp_check_address_list(stcb, m,
 	    initack_offset + sizeof(struct sctp_init_ack_chunk),
 	    initack_limit - (initack_offset + sizeof(struct sctp_init_ack_chunk)),
 	    &store.sa, cookie->local_scope, cookie->site_scope,
 	    cookie->ipv4_scope, cookie->loopback_scope);
 
 
 	return (stcb);
 }
 
 /*
  * CODE LIKE THIS NEEDS TO RUN IF the peer supports the NAT extension, i.e.,
  * we NEED to make sure we are not already using the vtag. If so, we
  * need to send back an ABORT-TRY-AGAIN-WITH-NEW-TAG. No middle box bit!
 	head = &SCTP_BASE_INFO(sctp_asochash)[SCTP_PCBHASH_ASOC(tag,
 							    SCTP_BASE_INFO(hashasocmark))];
 	LIST_FOREACH(stcb, head, sctp_asocs) {
 	        if ((stcb->asoc.my_vtag == tag) && (stcb->rport == rport) && (inp == stcb->sctp_ep))  {
 		       -- SEND ABORT - TRY AGAIN --
 		}
 	}
 */
 
 /*
  * Handles a COOKIE-ECHO message. stcb: modified to point to either a new
  * TCB or left as the existing (non-NULL) TCB.
  */
 static struct mbuf *
 sctp_handle_cookie_echo(struct mbuf *m, int iphlen, int offset,
     struct sockaddr *src, struct sockaddr *dst,
     struct sctphdr *sh, struct sctp_cookie_echo_chunk *cp,
     struct sctp_inpcb **inp_p, struct sctp_tcb **stcb, struct sctp_nets **netp,
     int auth_skipped, uint32_t auth_offset, uint32_t auth_len,
     struct sctp_tcb **locked_tcb,
     uint8_t mflowtype, uint32_t mflowid,
     uint32_t vrf_id, uint16_t port)
 {
 	struct sctp_state_cookie *cookie;
 	struct sctp_tcb *l_stcb = *stcb;
 	struct sctp_inpcb *l_inp;
 	struct sockaddr *to;
 	struct sctp_pcb *ep;
 	struct mbuf *m_sig;
 	uint8_t calc_sig[SCTP_SIGNATURE_SIZE], tmp_sig[SCTP_SIGNATURE_SIZE];
 	uint8_t *sig;
 	uint8_t cookie_ok = 0;
 	unsigned int sig_offset, cookie_offset;
 	unsigned int cookie_len;
 	struct timeval now;
 	struct timeval time_expires;
 	int notification = 0;
 	struct sctp_nets *netl;
 	int had_a_existing_tcb = 0;
 	int send_int_conf = 0;
 
 #ifdef INET
 	struct sockaddr_in sin;
 
 #endif
 #ifdef INET6
 	struct sockaddr_in6 sin6;
 
 #endif
 
 	SCTPDBG(SCTP_DEBUG_INPUT2,
 	    "sctp_handle_cookie: handling COOKIE-ECHO\n");
 
 	if (inp_p == NULL) {
 		return (NULL);
 	}
 	cookie = &cp->cookie;
 	cookie_offset = offset + sizeof(struct sctp_chunkhdr);
 	cookie_len = ntohs(cp->ch.chunk_length);
 
 	if ((cookie->peerport != sh->src_port) ||
 	    (cookie->myport != sh->dest_port) ||
 	    (cookie->my_vtag != sh->v_tag)) {
 		/*
 		 * invalid ports or bad tag.  Note that we always leave the
 		 * v_tag in the header in network order and when we stored
 		 * it in the my_vtag slot we also left it in network order.
 		 * This maintains the match even though it may be in the
 		 * opposite byte order of the machine :->
 		 */
 		return (NULL);
 	}
 	if (cookie_len < sizeof(struct sctp_cookie_echo_chunk) +
 	    sizeof(struct sctp_init_chunk) +
 	    sizeof(struct sctp_init_ack_chunk) + SCTP_SIGNATURE_SIZE) {
 		/* cookie too small */
 		return (NULL);
 	}
 	/*
 	 * split off the signature into its own mbuf (since it should not be
 	 * calculated in the sctp_hmac_m() call).
 	 */
 	sig_offset = offset + cookie_len - SCTP_SIGNATURE_SIZE;
 	m_sig = m_split(m, sig_offset, M_NOWAIT);
 	if (m_sig == NULL) {
 		/* out of memory or ?? */
 		return (NULL);
 	}
 #ifdef SCTP_MBUF_LOGGING
 	if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_MBUF_LOGGING_ENABLE) {
 		sctp_log_mbc(m_sig, SCTP_MBUF_SPLIT);
 	}
 #endif
 
 	/*
 	 * compute the signature/digest for the cookie
 	 */
 	ep = &(*inp_p)->sctp_ep;
 	l_inp = *inp_p;
 	if (l_stcb) {
 		SCTP_TCB_UNLOCK(l_stcb);
 	}
 	SCTP_INP_RLOCK(l_inp);
 	if (l_stcb) {
 		SCTP_TCB_LOCK(l_stcb);
 	}
 	/* which cookie is it? */
 	if ((cookie->time_entered.tv_sec < (long)ep->time_of_secret_change) &&
 	    (ep->current_secret_number != ep->last_secret_number)) {
 		/* it's the old cookie */
 		(void)sctp_hmac_m(SCTP_HMAC,
 		    (uint8_t *) ep->secret_key[(int)ep->last_secret_number],
 		    SCTP_SECRET_SIZE, m, cookie_offset, calc_sig, 0);
 	} else {
 		/* it's the current cookie */
 		(void)sctp_hmac_m(SCTP_HMAC,
 		    (uint8_t *) ep->secret_key[(int)ep->current_secret_number],
 		    SCTP_SECRET_SIZE, m, cookie_offset, calc_sig, 0);
 	}
 	/* get the signature */
 	SCTP_INP_RUNLOCK(l_inp);
 	sig = (uint8_t *) sctp_m_getptr(m_sig, 0, SCTP_SIGNATURE_SIZE, (uint8_t *) & tmp_sig);
 	if (sig == NULL) {
 		/* couldn't find signature */
 		sctp_m_freem(m_sig);
 		return (NULL);
 	}
 	/* compare the received digest with the computed digest */
 	if (memcmp(calc_sig, sig, SCTP_SIGNATURE_SIZE) != 0) {
 		/* try the old cookie? */
 		if ((cookie->time_entered.tv_sec == (long)ep->time_of_secret_change) &&
 		    (ep->current_secret_number != ep->last_secret_number)) {
 			/* compute digest with old */
 			(void)sctp_hmac_m(SCTP_HMAC,
 			    (uint8_t *) ep->secret_key[(int)ep->last_secret_number],
 			    SCTP_SECRET_SIZE, m, cookie_offset, calc_sig, 0);
 			/* compare */
 			if (memcmp(calc_sig, sig, SCTP_SIGNATURE_SIZE) == 0)
 				cookie_ok = 1;
 		}
 	} else {
 		cookie_ok = 1;
 	}
 
 	/*
 	 * Now before we continue we must reconstruct our mbuf so that
 	 * normal processing of any other chunks will work.
 	 */
 	{
 		struct mbuf *m_at;
 
 		m_at = m;
 		while (SCTP_BUF_NEXT(m_at) != NULL) {
 			m_at = SCTP_BUF_NEXT(m_at);
 		}
 		SCTP_BUF_NEXT(m_at) = m_sig;
 	}
 
 	if (cookie_ok == 0) {
 		SCTPDBG(SCTP_DEBUG_INPUT2, "handle_cookie_echo: cookie signature validation failed!\n");
 		SCTPDBG(SCTP_DEBUG_INPUT2,
 		    "offset = %u, cookie_offset = %u, sig_offset = %u\n",
 		    (uint32_t) offset, cookie_offset, sig_offset);
 		return (NULL);
 	}
 	/*
 	 * check the cookie timestamps to be sure it's not stale
 	 */
 	(void)SCTP_GETTIME_TIMEVAL(&now);
 	/* Expire time is in Ticks, so we convert to seconds */
 	time_expires.tv_sec = cookie->time_entered.tv_sec + TICKS_TO_SEC(cookie->cookie_life);
 	time_expires.tv_usec = cookie->time_entered.tv_usec;
 	/*
 	 * TODO sctp_constants.h needs alternative time macros when _KERNEL
 	 * is undefined.
 	 */
 	if (timevalcmp(&now, &time_expires, >)) {
 		/* cookie is stale! */
 		struct mbuf *op_err;
 		struct sctp_error_stale_cookie *cause;
 		uint32_t tim;
 
 		op_err = sctp_get_mbuf_for_msg(sizeof(struct sctp_error_stale_cookie),
 		    0, M_NOWAIT, 1, MT_DATA);
 		if (op_err == NULL) {
 			/* FOOBAR */
 			return (NULL);
 		}
 		/* Set the len */
 		SCTP_BUF_LEN(op_err) = sizeof(struct sctp_error_stale_cookie);
 		cause = mtod(op_err, struct sctp_error_stale_cookie *);
 		cause->cause.code = htons(SCTP_CAUSE_STALE_COOKIE);
 		cause->cause.length = htons((sizeof(struct sctp_paramhdr) +
 		    (sizeof(uint32_t))));
 		/* seconds to usec */
 		tim = (now.tv_sec - time_expires.tv_sec) * 1000000;
 		/* add in usec */
 		if (tim == 0)
 			tim = now.tv_usec - cookie->time_entered.tv_usec;
 		cause->stale_time = htonl(tim);
 		sctp_send_operr_to(src, dst, sh, cookie->peers_vtag, op_err,
 		    mflowtype, mflowid, l_inp->fibnum,
 		    vrf_id, port);
 		return (NULL);
 	}
 	/*
 	 * Now we must check, using the lookup address, whether we have an
 	 * existing asoc. This will only happen if we were in the COOKIE-WAIT
 	 * state, an INIT collided with us, and the peer sent the cookie on
 	 * an address other than the single address our assoc had for it. In
 	 * this case we will have at least one of the tie-tags set AND the
 	 * address field in the cookie can be used to look it up.
 	 */
 	to = NULL;
 	switch (cookie->addr_type) {
 #ifdef INET6
 	case SCTP_IPV6_ADDRESS:
 		memset(&sin6, 0, sizeof(sin6));
 		sin6.sin6_family = AF_INET6;
 		sin6.sin6_len = sizeof(sin6);
 		sin6.sin6_port = sh->src_port;
 		sin6.sin6_scope_id = cookie->scope_id;
 		memcpy(&sin6.sin6_addr.s6_addr, cookie->address,
 		    sizeof(sin6.sin6_addr.s6_addr));
 		to = (struct sockaddr *)&sin6;
 		break;
 #endif
 #ifdef INET
 	case SCTP_IPV4_ADDRESS:
 		memset(&sin, 0, sizeof(sin));
 		sin.sin_family = AF_INET;
 		sin.sin_len = sizeof(sin);
 		sin.sin_port = sh->src_port;
 		sin.sin_addr.s_addr = cookie->address[0];
 		to = (struct sockaddr *)&sin;
 		break;
 #endif
 	default:
 		/* This should not happen */
 		return (NULL);
 	}
 	if (*stcb == NULL) {
 		/* Yep, let's check */
 		*stcb = sctp_findassociation_ep_addr(inp_p, to, netp, dst, NULL);
 		if (*stcb == NULL) {
 			/*
 			 * We should have only got back the same inp. If we
 			 * got back a different ep we have a problem. The
 			 * original findep got back l_inp and now *inp_p
 			 * differs.
 			 */
 			if (l_inp != *inp_p) {
 				SCTP_PRINTF("Bad problem find_ep got a diff inp then special_locate?\n");
 			}
 		} else {
 			if (*locked_tcb == NULL) {
 				/*
 				 * In this case we found the assoc only
 				 * after we locked the create lock. This
 				 * means we are in a colliding case and we
 				 * must make sure that we unlock the tcb if
 				 * it's one of the cases where we throw away
 				 * the incoming packets.
 				 */
 				*locked_tcb = *stcb;
 
 				/*
 				 * We must also increment the inp ref count,
 				 * since the ref_count flag was set when we
 				 * did not find the TCB. Now that we found it,
 				 * the refcount is reduced, so we must raise
 				 * it back up to balance it all :-)
 				 */
 				SCTP_INP_INCR_REF((*stcb)->sctp_ep);
 				if ((*stcb)->sctp_ep != l_inp) {
 					SCTP_PRINTF("Huh? ep:%p diff then l_inp:%p?\n",
 					    (void *)(*stcb)->sctp_ep, (void *)l_inp);
 				}
 			}
 		}
 	}
 	cookie_len -= SCTP_SIGNATURE_SIZE;
 	if (*stcb == NULL) {
 		/* this is the "normal" case... get a new TCB */
 		*stcb = sctp_process_cookie_new(m, iphlen, offset, src, dst, sh,
 		    cookie, cookie_len, *inp_p,
 		    netp, to, &notification,
 		    auth_skipped, auth_offset, auth_len,
 		    mflowtype, mflowid,
 		    vrf_id, port);
 	} else {
 		/* this is abnormal... cookie-echo on existing TCB */
 		had_a_existing_tcb = 1;
 		*stcb = sctp_process_cookie_existing(m, iphlen, offset,
 		    src, dst, sh,
 		    cookie, cookie_len, *inp_p, *stcb, netp, to,
 		    &notification, auth_skipped, auth_offset, auth_len,
 		    mflowtype, mflowid,
 		    vrf_id, port);
 	}
 
 	if (*stcb == NULL) {
 		/* still no TCB... must be bad cookie-echo */
 		return (NULL);
 	}
 	if (*netp != NULL) {
 		(*netp)->flowtype = mflowtype;
 		(*netp)->flowid = mflowid;
 	}
 	/*
 	 * Ok, we built an association so confirm the address we sent the
 	 * INIT-ACK to.
 	 */
 	netl = sctp_findnet(*stcb, to);
 	/*
 	 * This code should in theory NOT run, but we handle it anyway.
 	 */
 	if (netl == NULL) {
 		/* TSNH! Huh, why do I need to add this address here? */
 		if (sctp_add_remote_addr(*stcb, to, NULL, port,
 		    SCTP_DONOT_SETSCOPE, SCTP_IN_COOKIE_PROC)) {
 			return (NULL);
 		}
 		netl = sctp_findnet(*stcb, to);
 	}
 	if (netl) {
 		if (netl->dest_state & SCTP_ADDR_UNCONFIRMED) {
 			netl->dest_state &= ~SCTP_ADDR_UNCONFIRMED;
 			(void)sctp_set_primary_addr((*stcb), (struct sockaddr *)NULL,
 			    netl);
 			send_int_conf = 1;
 		}
 	}
 	sctp_start_net_timers(*stcb);
 	if ((*inp_p)->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) {
 		if (!had_a_existing_tcb ||
 		    (((*inp_p)->sctp_flags & SCTP_PCB_FLAGS_CONNECTED) == 0)) {
 			/*
 			 * If we have a NEW cookie or the connect never
 			 * reached the connected state during collision we
 			 * must do the TCP accept thing.
 			 */
 			struct socket *so, *oso;
 			struct sctp_inpcb *inp;
 
 			if (notification == SCTP_NOTIFY_ASSOC_RESTART) {
 				/*
 				 * For a restart we will keep the same
 				 * socket, no need to do anything. I THINK!!
 				 */
 				sctp_ulp_notify(notification, *stcb, 0, NULL, SCTP_SO_NOT_LOCKED);
 				if (send_int_conf) {
 					sctp_ulp_notify(SCTP_NOTIFY_INTERFACE_CONFIRMED,
 					    (*stcb), 0, (void *)netl, SCTP_SO_NOT_LOCKED);
 				}
 				return (m);
 			}
 			oso = (*inp_p)->sctp_socket;
 			atomic_add_int(&(*stcb)->asoc.refcnt, 1);
 			SCTP_TCB_UNLOCK((*stcb));
 			CURVNET_SET(oso->so_vnet);
 			so = sonewconn(oso, 0
 			    );
 			CURVNET_RESTORE();
 			SCTP_TCB_LOCK((*stcb));
 			atomic_subtract_int(&(*stcb)->asoc.refcnt, 1);
 
 			if (so == NULL) {
 				struct mbuf *op_err;
 
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 				struct socket *pcb_so;
 
 #endif
 				/* Too many sockets */
 				SCTPDBG(SCTP_DEBUG_INPUT1, "process_cookie_new: no room for another socket!\n");
 				op_err = sctp_generate_cause(SCTP_CAUSE_OUT_OF_RESC, "");
 				sctp_abort_association(*inp_p, NULL, m, iphlen,
 				    src, dst, sh, op_err,
 				    mflowtype, mflowid,
 				    vrf_id, port);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 				pcb_so = SCTP_INP_SO(*inp_p);
 				atomic_add_int(&(*stcb)->asoc.refcnt, 1);
 				SCTP_TCB_UNLOCK((*stcb));
 				SCTP_SOCKET_LOCK(pcb_so, 1);
 				SCTP_TCB_LOCK((*stcb));
 				atomic_subtract_int(&(*stcb)->asoc.refcnt, 1);
 #endif
 				(void)sctp_free_assoc(*inp_p, *stcb, SCTP_NORMAL_PROC,
 				    SCTP_FROM_SCTP_INPUT + SCTP_LOC_23);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 				SCTP_SOCKET_UNLOCK(pcb_so, 1);
 #endif
 				return (NULL);
 			}
 			inp = (struct sctp_inpcb *)so->so_pcb;
 			SCTP_INP_INCR_REF(inp);
 			/*
 			 * We add the unbound flag here so that if we get an
 			 * soabort() before we get the move_pcb done, we
 			 * will properly cleanup.
 			 */
 			inp->sctp_flags = (SCTP_PCB_FLAGS_TCPTYPE |
 			    SCTP_PCB_FLAGS_CONNECTED |
 			    SCTP_PCB_FLAGS_IN_TCPPOOL |
 			    SCTP_PCB_FLAGS_UNBOUND |
 			    (SCTP_PCB_COPY_FLAGS & (*inp_p)->sctp_flags) |
 			    SCTP_PCB_FLAGS_DONT_WAKE);
 			inp->sctp_features = (*inp_p)->sctp_features;
 			inp->sctp_mobility_features = (*inp_p)->sctp_mobility_features;
 			inp->sctp_socket = so;
 			inp->sctp_frag_point = (*inp_p)->sctp_frag_point;
 			inp->max_cwnd = (*inp_p)->max_cwnd;
 			inp->sctp_cmt_on_off = (*inp_p)->sctp_cmt_on_off;
 			inp->ecn_supported = (*inp_p)->ecn_supported;
 			inp->prsctp_supported = (*inp_p)->prsctp_supported;
 			inp->auth_supported = (*inp_p)->auth_supported;
 			inp->asconf_supported = (*inp_p)->asconf_supported;
 			inp->reconfig_supported = (*inp_p)->reconfig_supported;
 			inp->nrsack_supported = (*inp_p)->nrsack_supported;
 			inp->pktdrop_supported = (*inp_p)->pktdrop_supported;
 			inp->partial_delivery_point = (*inp_p)->partial_delivery_point;
 			inp->sctp_context = (*inp_p)->sctp_context;
 			inp->local_strreset_support = (*inp_p)->local_strreset_support;
 			inp->fibnum = (*inp_p)->fibnum;
 			inp->inp_starting_point_for_iterator = NULL;
 			/*
 			 * copy in the authentication parameters from the
 			 * original endpoint
 			 */
 			if (inp->sctp_ep.local_hmacs)
 				sctp_free_hmaclist(inp->sctp_ep.local_hmacs);
 			inp->sctp_ep.local_hmacs =
 			    sctp_copy_hmaclist((*inp_p)->sctp_ep.local_hmacs);
 			if (inp->sctp_ep.local_auth_chunks)
 				sctp_free_chunklist(inp->sctp_ep.local_auth_chunks);
 			inp->sctp_ep.local_auth_chunks =
 			    sctp_copy_chunklist((*inp_p)->sctp_ep.local_auth_chunks);
 
 			/*
 			 * Now we must move it from one hash table to
 			 * another and get the tcb in the right place.
 			 */
 
 			/*
 			 * This is where the one-2-one socket is put into
 			 * the accept state waiting for the accept!
 			 */
 			if (*stcb) {
 				(*stcb)->asoc.state |= SCTP_STATE_IN_ACCEPT_QUEUE;
 			}
 			sctp_move_pcb_and_assoc(*inp_p, inp, *stcb);
 
 			atomic_add_int(&(*stcb)->asoc.refcnt, 1);
 			SCTP_TCB_UNLOCK((*stcb));
 
 			sctp_pull_off_control_to_new_inp((*inp_p), inp, *stcb,
 			    0);
 			SCTP_TCB_LOCK((*stcb));
 			atomic_subtract_int(&(*stcb)->asoc.refcnt, 1);
 
 
 			/*
 			 * now we must check to see if we were aborted while
 			 * the move was going on and the lock/unlock
 			 * happened.
 			 */
 			if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE) {
 				/*
 				 * yep it was, we leave the assoc attached
 				 * to the socket since the sctp_inpcb_free()
 				 * call will send an abort for us.
 				 */
 				SCTP_INP_DECR_REF(inp);
 				return (NULL);
 			}
 			SCTP_INP_DECR_REF(inp);
 			/* Switch over to the new guy */
 			*inp_p = inp;
 			sctp_ulp_notify(notification, *stcb, 0, NULL, SCTP_SO_NOT_LOCKED);
 			if (send_int_conf) {
 				sctp_ulp_notify(SCTP_NOTIFY_INTERFACE_CONFIRMED,
 				    (*stcb), 0, (void *)netl, SCTP_SO_NOT_LOCKED);
 			}
 			/*
 			 * Pull it from the incomplete queue and wake the
 			 * guy
 			 */
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 			atomic_add_int(&(*stcb)->asoc.refcnt, 1);
 			SCTP_TCB_UNLOCK((*stcb));
 			SCTP_SOCKET_LOCK(so, 1);
 #endif
 			soisconnected(so);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 			SCTP_TCB_LOCK((*stcb));
 			atomic_subtract_int(&(*stcb)->asoc.refcnt, 1);
 			SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 			return (m);
 		}
 	}
 	if (notification) {
 		sctp_ulp_notify(notification, *stcb, 0, NULL, SCTP_SO_NOT_LOCKED);
 	}
 	if (send_int_conf) {
 		sctp_ulp_notify(SCTP_NOTIFY_INTERFACE_CONFIRMED,
 		    (*stcb), 0, (void *)netl, SCTP_SO_NOT_LOCKED);
 	}
 	return (m);
 }
 
 static void
 sctp_handle_cookie_ack(struct sctp_cookie_ack_chunk *cp SCTP_UNUSED,
     struct sctp_tcb *stcb, struct sctp_nets *net)
 {
 	/* cp must not be used, others call this without a c-ack :-) */
 	struct sctp_association *asoc;
 
 	SCTPDBG(SCTP_DEBUG_INPUT2,
 	    "sctp_handle_cookie_ack: handling COOKIE-ACK\n");
 	if ((stcb == NULL) || (net == NULL)) {
 		return;
 	}
 	asoc = &stcb->asoc;
 
 	sctp_stop_all_cookie_timers(stcb);
 	/* process according to association state */
 	if (SCTP_GET_STATE(asoc) == SCTP_STATE_COOKIE_ECHOED) {
 		/* state change only needed when I am in right state */
 		SCTPDBG(SCTP_DEBUG_INPUT2, "moving to OPEN state\n");
 		SCTP_SET_STATE(asoc, SCTP_STATE_OPEN);
 		sctp_start_net_timers(stcb);
 		if (asoc->state & SCTP_STATE_SHUTDOWN_PENDING) {
 			sctp_timer_start(SCTP_TIMER_TYPE_SHUTDOWNGUARD,
 			    stcb->sctp_ep, stcb, asoc->primary_destination);
 
 		}
 		/* update RTO */
 		SCTP_STAT_INCR_COUNTER32(sctps_activeestab);
 		SCTP_STAT_INCR_GAUGE32(sctps_currestab);
 		if (asoc->overall_error_count == 0) {
 			net->RTO = sctp_calculate_rto(stcb, asoc, net,
 			    &asoc->time_entered, sctp_align_safe_nocopy,
 			    SCTP_RTT_FROM_NON_DATA);
 		}
 		(void)SCTP_GETTIME_TIMEVAL(&asoc->time_entered);
 		sctp_ulp_notify(SCTP_NOTIFY_ASSOC_UP, stcb, 0, NULL, SCTP_SO_NOT_LOCKED);
 		if ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) ||
 		    (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) {
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 			struct socket *so;
 
 #endif
 			stcb->sctp_ep->sctp_flags |= SCTP_PCB_FLAGS_CONNECTED;
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 			so = SCTP_INP_SO(stcb->sctp_ep);
 			atomic_add_int(&stcb->asoc.refcnt, 1);
 			SCTP_TCB_UNLOCK(stcb);
 			SCTP_SOCKET_LOCK(so, 1);
 			SCTP_TCB_LOCK(stcb);
 			atomic_subtract_int(&stcb->asoc.refcnt, 1);
 #endif
 			if ((stcb->asoc.state & SCTP_STATE_CLOSED_SOCKET) == 0) {
 				soisconnected(stcb->sctp_socket);
 			}
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 			SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 		}
 		/*
 		 * since we did not send a HB make sure we don't double
 		 * things
 		 */
 		net->hb_responded = 1;
 
 		if (stcb->asoc.state & SCTP_STATE_CLOSED_SOCKET) {
 			/*
 			 * We don't need to do the asconf thing, nor hb or
 			 * autoclose if the socket is closed.
 			 */
 			goto closed_socket;
 		}
 		sctp_timer_start(SCTP_TIMER_TYPE_HEARTBEAT, stcb->sctp_ep,
 		    stcb, net);
 
 
 		if (stcb->asoc.sctp_autoclose_ticks &&
 		    sctp_is_feature_on(stcb->sctp_ep, SCTP_PCB_FLAGS_AUTOCLOSE)) {
 			sctp_timer_start(SCTP_TIMER_TYPE_AUTOCLOSE,
 			    stcb->sctp_ep, stcb, NULL);
 		}
 		/*
 		 * send ASCONF if parameters are pending and ASCONFs are
 		 * allowed (e.g. addresses changed when init/cookie echo were
 		 * in flight)
 		 */
 		if ((sctp_is_feature_on(stcb->sctp_ep, SCTP_PCB_FLAGS_DO_ASCONF)) &&
 		    (stcb->asoc.asconf_supported == 1) &&
 		    (!TAILQ_EMPTY(&stcb->asoc.asconf_queue))) {
 #ifdef SCTP_TIMER_BASED_ASCONF
 			sctp_timer_start(SCTP_TIMER_TYPE_ASCONF,
 			    stcb->sctp_ep, stcb,
 			    stcb->asoc.primary_destination);
 #else
 			sctp_send_asconf(stcb, stcb->asoc.primary_destination,
 			    SCTP_ADDR_NOT_LOCKED);
 #endif
 		}
 	}
 closed_socket:
 	/* Toss the cookie if I can */
 	sctp_toss_old_cookies(stcb, asoc);
 	if (!TAILQ_EMPTY(&asoc->sent_queue)) {
 		/* Restart the timer if we have pending data */
 		struct sctp_tmit_chunk *chk;
 
 		chk = TAILQ_FIRST(&asoc->sent_queue);
 		sctp_timer_start(SCTP_TIMER_TYPE_SEND, stcb->sctp_ep, stcb, chk->whoTo);
 	}
 }
 
 static void
 sctp_handle_ecn_echo(struct sctp_ecne_chunk *cp,
     struct sctp_tcb *stcb)
 {
 	struct sctp_nets *net;
 	struct sctp_tmit_chunk *lchk;
 	struct sctp_ecne_chunk bkup;
 	uint8_t override_bit;
 	uint32_t tsn, window_data_tsn;
 	int len;
 	unsigned int pkt_cnt;
 
 	len = ntohs(cp->ch.chunk_length);
 	if ((len != sizeof(struct sctp_ecne_chunk)) &&
 	    (len != sizeof(struct old_sctp_ecne_chunk))) {
 		return;
 	}
 	if (len == sizeof(struct old_sctp_ecne_chunk)) {
 		/* It's the old format */
 		memcpy(&bkup, cp, sizeof(struct old_sctp_ecne_chunk));
 		bkup.num_pkts_since_cwr = htonl(1);
 		cp = &bkup;
 	}
 	SCTP_STAT_INCR(sctps_recvecne);
 	tsn = ntohl(cp->tsn);
 	pkt_cnt = ntohl(cp->num_pkts_since_cwr);
 	lchk = TAILQ_LAST(&stcb->asoc.send_queue, sctpchunk_listhead);
 	if (lchk == NULL) {
 		window_data_tsn = stcb->asoc.sending_seq - 1;
 	} else {
 		window_data_tsn = lchk->rec.data.TSN_seq;
 	}
 
 	/* Find where it was sent to if possible. */
 	net = NULL;
 	TAILQ_FOREACH(lchk, &stcb->asoc.sent_queue, sctp_next) {
 		if (lchk->rec.data.TSN_seq == tsn) {
 			net = lchk->whoTo;
 			net->ecn_prev_cwnd = lchk->rec.data.cwnd_at_send;
 			break;
 		}
 		if (SCTP_TSN_GT(lchk->rec.data.TSN_seq, tsn)) {
 			break;
 		}
 	}
 	if (net == NULL) {
 		/*
 		 * What to do. A previous send of a CWR was possibly lost.
 		 * See how old it is, we may have it marked on the actual
 		 * net.
 		 */
 		TAILQ_FOREACH(net, &stcb->asoc.nets, sctp_next) {
 			if (tsn == net->last_cwr_tsn) {
 				/* Found him, send it off */
 				break;
 			}
 		}
 		if (net == NULL) {
 			/*
 			 * If we reach here, we need to send a special CWR
 			 * that says hey, we did this a long time ago and
 			 * you lost the response.
 			 */
 			net = TAILQ_FIRST(&stcb->asoc.nets);
 			if (net == NULL) {
 				/* TSNH */
 				return;
 			}
 			override_bit = SCTP_CWR_REDUCE_OVERRIDE;
 		} else {
 			override_bit = 0;
 		}
 	} else {
 		override_bit = 0;
 	}
 	if (SCTP_TSN_GT(tsn, net->cwr_window_tsn) &&
 	    ((override_bit & SCTP_CWR_REDUCE_OVERRIDE) == 0)) {
 		/*
 		 * JRS - Use the congestion control given in the pluggable
 		 * CC module
 		 */
 		stcb->asoc.cc_functions.sctp_cwnd_update_after_ecn_echo(stcb, net, 0, pkt_cnt);
 		/*
 		 * We reduce once every RTT. So we will only lower cwnd at
 		 * the next sending seq i.e. the window_data_tsn
 		 */
 		net->cwr_window_tsn = window_data_tsn;
 		net->ecn_ce_pkt_cnt += pkt_cnt;
 		net->lost_cnt = pkt_cnt;
 		net->last_cwr_tsn = tsn;
 	} else {
 		override_bit |= SCTP_CWR_IN_SAME_WINDOW;
 		if (SCTP_TSN_GT(tsn, net->last_cwr_tsn) &&
 		    ((override_bit & SCTP_CWR_REDUCE_OVERRIDE) == 0)) {
 			/*
 			 * Another loss in the same window; update how many
 			 * marks/packets lost we have had.
 			 */
 			int cnt = 1;
 
 			if (pkt_cnt > net->lost_cnt) {
 				/* Should be the case */
 				cnt = (pkt_cnt - net->lost_cnt);
 				net->ecn_ce_pkt_cnt += cnt;
 			}
 			net->lost_cnt = pkt_cnt;
 			net->last_cwr_tsn = tsn;
 			/*
 			 * Most CC functions will ignore this call, since we are
 			 * still in the window of the initial CE the peer saw.
 			 */
 			stcb->asoc.cc_functions.sctp_cwnd_update_after_ecn_echo(stcb, net, 1, cnt);
 		}
 	}
 	/*
 	 * We always send a CWR. This way, if our previous one was lost, our
 	 * peer will get an update, and if it is not yet time to reduce again
 	 * we still get the CWR to the peer. Note we set the override when we
 	 * could not find the TSN on the chunk or the destination network.
 	 */
 	sctp_send_cwr(stcb, net, net->last_cwr_tsn, override_bit);
 }
 
 static void
 sctp_handle_ecn_cwr(struct sctp_cwr_chunk *cp, struct sctp_tcb *stcb, struct sctp_nets *net)
 {
 	/*
 	 * Here we get a CWR from the peer. We must look in the outqueue and
 	 * make sure that we have a covered ECNE in the control chunk part.
 	 * If so remove it.
 	 */
 	struct sctp_tmit_chunk *chk;
 	struct sctp_ecne_chunk *ecne;
 	int override;
 	uint32_t cwr_tsn;
 
 	cwr_tsn = ntohl(cp->tsn);
 	override = cp->ch.chunk_flags & SCTP_CWR_REDUCE_OVERRIDE;
 	TAILQ_FOREACH(chk, &stcb->asoc.control_send_queue, sctp_next) {
 		if (chk->rec.chunk_id.id != SCTP_ECN_ECHO) {
 			continue;
 		}
 		if ((override == 0) && (chk->whoTo != net)) {
 			/* Must be from the right src unless override is set */
 			continue;
 		}
 		ecne = mtod(chk->data, struct sctp_ecne_chunk *);
 		if (SCTP_TSN_GE(cwr_tsn, ntohl(ecne->tsn))) {
 			/* this covers this ECNE, we can remove it */
 			stcb->asoc.ecn_echo_cnt_onq--;
 			TAILQ_REMOVE(&stcb->asoc.control_send_queue, chk,
 			    sctp_next);
 			sctp_m_freem(chk->data);
 			chk->data = NULL;
 			stcb->asoc.ctrl_queue_cnt--;
 			sctp_free_a_chunk(stcb, chk, SCTP_SO_NOT_LOCKED);
 			if (override == 0) {
 				break;
 			}
 		}
 	}
 }
 
 static void
 sctp_handle_shutdown_complete(struct sctp_shutdown_complete_chunk *cp SCTP_UNUSED,
     struct sctp_tcb *stcb, struct sctp_nets *net)
 {
 	struct sctp_association *asoc;
 
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 	struct socket *so;
 
 #endif
 
 	SCTPDBG(SCTP_DEBUG_INPUT2,
 	    "sctp_handle_shutdown_complete: handling SHUTDOWN-COMPLETE\n");
 	if (stcb == NULL)
 		return;
 
 	asoc = &stcb->asoc;
 	/* process according to association state */
 	if (SCTP_GET_STATE(asoc) != SCTP_STATE_SHUTDOWN_ACK_SENT) {
 		/* unexpected SHUTDOWN-COMPLETE... so ignore... */
 		SCTPDBG(SCTP_DEBUG_INPUT2,
 		    "sctp_handle_shutdown_complete: not in SCTP_STATE_SHUTDOWN_ACK_SENT --- ignore\n");
 		SCTP_TCB_UNLOCK(stcb);
 		return;
 	}
 	/* notify upper layer protocol */
 	if (stcb->sctp_socket) {
 		sctp_ulp_notify(SCTP_NOTIFY_ASSOC_DOWN, stcb, 0, NULL, SCTP_SO_NOT_LOCKED);
 	}
 #ifdef INVARIANTS
 	if (!TAILQ_EMPTY(&asoc->send_queue) ||
 	    !TAILQ_EMPTY(&asoc->sent_queue) ||
 	    !stcb->asoc.ss_functions.sctp_ss_is_empty(stcb, asoc)) {
 		panic("Queues are not empty when handling SHUTDOWN-COMPLETE");
 	}
 #endif
 	/* stop the timer */
 	sctp_timer_stop(SCTP_TIMER_TYPE_SHUTDOWNACK, stcb->sctp_ep, stcb, net,
 	    SCTP_FROM_SCTP_INPUT + SCTP_LOC_24);
 	SCTP_STAT_INCR_COUNTER32(sctps_shutdown);
 	/* free the TCB */
 	SCTPDBG(SCTP_DEBUG_INPUT2,
 	    "sctp_handle_shutdown_complete: calls free-asoc\n");
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 	so = SCTP_INP_SO(stcb->sctp_ep);
 	atomic_add_int(&stcb->asoc.refcnt, 1);
 	SCTP_TCB_UNLOCK(stcb);
 	SCTP_SOCKET_LOCK(so, 1);
 	SCTP_TCB_LOCK(stcb);
 	atomic_subtract_int(&stcb->asoc.refcnt, 1);
 #endif
 	(void)sctp_free_assoc(stcb->sctp_ep, stcb, SCTP_NORMAL_PROC,
 	    SCTP_FROM_SCTP_INPUT + SCTP_LOC_25);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 	SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 	return;
 }
 
 static int
 process_chunk_drop(struct sctp_tcb *stcb, struct sctp_chunk_desc *desc,
     struct sctp_nets *net, uint8_t flg)
 {
 	switch (desc->chunk_type) {
 	case SCTP_DATA:
 		/* find the tsn to resend (possibly) */
 		{
 			uint32_t tsn;
 			struct sctp_tmit_chunk *tp1;
 
 			tsn = ntohl(desc->tsn_ifany);
 			TAILQ_FOREACH(tp1, &stcb->asoc.sent_queue, sctp_next) {
 				if (tp1->rec.data.TSN_seq == tsn) {
 					/* found it */
 					break;
 				}
 				if (SCTP_TSN_GT(tp1->rec.data.TSN_seq, tsn)) {
 					/* not found */
 					tp1 = NULL;
 					break;
 				}
 			}
 			if (tp1 == NULL) {
 				/*
 				 * Do it the other way, aka without paying
 				 * attention to queue seq order.
 				 */
 				SCTP_STAT_INCR(sctps_pdrpdnfnd);
 				TAILQ_FOREACH(tp1, &stcb->asoc.sent_queue, sctp_next) {
 					if (tp1->rec.data.TSN_seq == tsn) {
 						/* found it */
 						break;
 					}
 				}
 			}
 			if (tp1 == NULL) {
 				SCTP_STAT_INCR(sctps_pdrptsnnf);
 			}
 			if ((tp1) && (tp1->sent < SCTP_DATAGRAM_ACKED)) {
 				uint8_t *ddp;
 
 				if (((flg & SCTP_BADCRC) == 0) &&
 				    ((flg & SCTP_FROM_MIDDLE_BOX) == 0)) {
 					return (0);
 				}
 				if ((stcb->asoc.peers_rwnd == 0) &&
 				    ((flg & SCTP_FROM_MIDDLE_BOX) == 0)) {
 					SCTP_STAT_INCR(sctps_pdrpdiwnp);
 					return (0);
 				}
 				if (stcb->asoc.peers_rwnd == 0 &&
 				    (flg & SCTP_FROM_MIDDLE_BOX)) {
 					SCTP_STAT_INCR(sctps_pdrpdizrw);
 					return (0);
 				}
 				ddp = (uint8_t *) (mtod(tp1->data, caddr_t)+
 				    sizeof(struct sctp_data_chunk));
 				{
 					unsigned int iii;
 
 					for (iii = 0; iii < sizeof(desc->data_bytes);
 					    iii++) {
 						if (ddp[iii] != desc->data_bytes[iii]) {
 							SCTP_STAT_INCR(sctps_pdrpbadd);
 							return (-1);
 						}
 					}
 				}
 
 				if (tp1->do_rtt) {
 					/*
 					 * this guy had a RTO calculation
 					 * pending on it, cancel it
 					 */
 					if (tp1->whoTo->rto_needed == 0) {
 						tp1->whoTo->rto_needed = 1;
 					}
 					tp1->do_rtt = 0;
 				}
 				SCTP_STAT_INCR(sctps_pdrpmark);
 				if (tp1->sent != SCTP_DATAGRAM_RESEND)
 					sctp_ucount_incr(stcb->asoc.sent_queue_retran_cnt);
 				/*
 				 * mark it as if we were doing a FR, since
 				 * we will be getting gap ack reports behind
 				 * the info from the router.
 				 */
 				tp1->rec.data.doing_fast_retransmit = 1;
 				/*
 				 * mark the tsn with what sequences can
 				 * cause a new FR.
 				 */
 				if (TAILQ_EMPTY(&stcb->asoc.send_queue)) {
 					tp1->rec.data.fast_retran_tsn = stcb->asoc.sending_seq;
 				} else {
 					tp1->rec.data.fast_retran_tsn = (TAILQ_FIRST(&stcb->asoc.send_queue))->rec.data.TSN_seq;
 				}
 
 				/* restart the timer */
 				sctp_timer_stop(SCTP_TIMER_TYPE_SEND, stcb->sctp_ep,
 				    stcb, tp1->whoTo,
 				    SCTP_FROM_SCTP_INPUT + SCTP_LOC_26);
 				sctp_timer_start(SCTP_TIMER_TYPE_SEND, stcb->sctp_ep,
 				    stcb, tp1->whoTo);
 
 				/* fix counts and things */
 				if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_FLIGHT_LOGGING_ENABLE) {
 					sctp_misc_ints(SCTP_FLIGHT_LOG_DOWN_PDRP,
 					    tp1->whoTo->flight_size,
 					    tp1->book_size,
 					    (uint32_t) (uintptr_t) stcb,
 					    tp1->rec.data.TSN_seq);
 				}
 				if (tp1->sent < SCTP_DATAGRAM_RESEND) {
 					sctp_flight_size_decrease(tp1);
 					sctp_total_flight_decrease(stcb, tp1);
 				}
 				tp1->sent = SCTP_DATAGRAM_RESEND;
 			} {
 				/* audit code */
 				unsigned int audit;
 
 				audit = 0;
 				TAILQ_FOREACH(tp1, &stcb->asoc.sent_queue, sctp_next) {
 					if (tp1->sent == SCTP_DATAGRAM_RESEND)
 						audit++;
 				}
 				TAILQ_FOREACH(tp1, &stcb->asoc.control_send_queue,
 				    sctp_next) {
 					if (tp1->sent == SCTP_DATAGRAM_RESEND)
 						audit++;
 				}
 				if (audit != stcb->asoc.sent_queue_retran_cnt) {
 					SCTP_PRINTF("**Local Audit finds cnt:%d asoc cnt:%d\n",
 					    audit, stcb->asoc.sent_queue_retran_cnt);
 #ifndef SCTP_AUDITING_ENABLED
 					stcb->asoc.sent_queue_retran_cnt = audit;
 #endif
 				}
 			}
 		}
 		break;
 	case SCTP_ASCONF:
 		{
 			struct sctp_tmit_chunk *asconf;
 
 			TAILQ_FOREACH(asconf, &stcb->asoc.control_send_queue,
 			    sctp_next) {
 				if (asconf->rec.chunk_id.id == SCTP_ASCONF) {
 					break;
 				}
 			}
 			if (asconf) {
 				if (asconf->sent != SCTP_DATAGRAM_RESEND)
 					sctp_ucount_incr(stcb->asoc.sent_queue_retran_cnt);
 				asconf->sent = SCTP_DATAGRAM_RESEND;
 				asconf->snd_count--;
 			}
 		}
 		break;
 	case SCTP_INITIATION:
 		/* resend the INIT */
 		stcb->asoc.dropped_special_cnt++;
 		if (stcb->asoc.dropped_special_cnt < SCTP_RETRY_DROPPED_THRESH) {
 			/*
 			 * If we can get it in, in a few attempts we do
 			 * this, otherwise we let the timer fire.
 			 */
 			sctp_timer_stop(SCTP_TIMER_TYPE_INIT, stcb->sctp_ep,
 			    stcb, net,
 			    SCTP_FROM_SCTP_INPUT + SCTP_LOC_27);
 			sctp_send_initiate(stcb->sctp_ep, stcb, SCTP_SO_NOT_LOCKED);
 		}
 		break;
 	case SCTP_SELECTIVE_ACK:
 	case SCTP_NR_SELECTIVE_ACK:
 		/* resend the sack */
 		sctp_send_sack(stcb, SCTP_SO_NOT_LOCKED);
 		break;
 	case SCTP_HEARTBEAT_REQUEST:
 		/* resend a demand HB */
 		if ((stcb->asoc.overall_error_count + 3) < stcb->asoc.max_send_times) {
 			/*
 			 * Only retransmit if we KNOW we won't destroy the
 			 * tcb
 			 */
 			sctp_send_hb(stcb, net, SCTP_SO_NOT_LOCKED);
 		}
 		break;
 	case SCTP_SHUTDOWN:
 		sctp_send_shutdown(stcb, net);
 		break;
 	case SCTP_SHUTDOWN_ACK:
 		sctp_send_shutdown_ack(stcb, net);
 		break;
 	case SCTP_COOKIE_ECHO:
 		{
 			struct sctp_tmit_chunk *cookie;
 
 			cookie = NULL;
 			TAILQ_FOREACH(cookie, &stcb->asoc.control_send_queue,
 			    sctp_next) {
 				if (cookie->rec.chunk_id.id == SCTP_COOKIE_ECHO) {
 					break;
 				}
 			}
 			if (cookie) {
 				if (cookie->sent != SCTP_DATAGRAM_RESEND)
 					sctp_ucount_incr(stcb->asoc.sent_queue_retran_cnt);
 				cookie->sent = SCTP_DATAGRAM_RESEND;
 				sctp_stop_all_cookie_timers(stcb);
 			}
 		}
 		break;
 	case SCTP_COOKIE_ACK:
 		sctp_send_cookie_ack(stcb);
 		break;
 	case SCTP_ASCONF_ACK:
 		/* resend last asconf ack */
 		sctp_send_asconf_ack(stcb);
 		break;
 	case SCTP_IFORWARD_CUM_TSN:
 	case SCTP_FORWARD_CUM_TSN:
 		send_forward_tsn(stcb, &stcb->asoc);
 		break;
 		/* can't do anything with these */
 	case SCTP_PACKET_DROPPED:
 	case SCTP_INITIATION_ACK:	/* this should not happen */
 	case SCTP_HEARTBEAT_ACK:
 	case SCTP_ABORT_ASSOCIATION:
 	case SCTP_OPERATION_ERROR:
 	case SCTP_SHUTDOWN_COMPLETE:
 	case SCTP_ECN_ECHO:
 	case SCTP_ECN_CWR:
 	default:
 		break;
 	}
 	return (0);
 }
 
 void
 sctp_reset_in_stream(struct sctp_tcb *stcb, uint32_t number_entries, uint16_t * list)
 {
 	uint32_t i;
 	uint16_t temp;
 
 	/*
 	 * We set things to 0xffffffff since this is the last delivered
 	 * sequence and we will be sending in 0 after the reset.
 	 */
 
 	if (number_entries) {
 		for (i = 0; i < number_entries; i++) {
 			temp = ntohs(list[i]);
 			if (temp >= stcb->asoc.streamincnt) {
 				continue;
 			}
 			stcb->asoc.strmin[temp].last_sequence_delivered = 0xffffffff;
 		}
 	} else {
 		list = NULL;
 		for (i = 0; i < stcb->asoc.streamincnt; i++) {
 			stcb->asoc.strmin[i].last_sequence_delivered = 0xffffffff;
 		}
 	}
 	sctp_ulp_notify(SCTP_NOTIFY_STR_RESET_RECV, stcb, number_entries, (void *)list, SCTP_SO_NOT_LOCKED);
 }
 
 static void
 sctp_reset_out_streams(struct sctp_tcb *stcb, uint32_t number_entries, uint16_t * list)
 {
 	uint32_t i;
 	uint16_t temp;
 
 	if (number_entries > 0) {
 		for (i = 0; i < number_entries; i++) {
 			temp = ntohs(list[i]);
 			if (temp >= stcb->asoc.streamoutcnt) {
 				/* no such stream */
 				continue;
 			}
 			stcb->asoc.strmout[temp].next_sequence_send = 0;
 		}
 	} else {
 		for (i = 0; i < stcb->asoc.streamoutcnt; i++) {
 			stcb->asoc.strmout[i].next_sequence_send = 0;
 		}
 	}
 	sctp_ulp_notify(SCTP_NOTIFY_STR_RESET_SEND, stcb, number_entries, (void *)list, SCTP_SO_NOT_LOCKED);
 }
 
 static void
 sctp_reset_clear_pending(struct sctp_tcb *stcb, uint32_t number_entries, uint16_t * list)
 {
 	uint32_t i;
 	uint16_t temp;
 
 	if (number_entries > 0) {
 		for (i = 0; i < number_entries; i++) {
 			temp = ntohs(list[i]);
 			if (temp >= stcb->asoc.streamoutcnt) {
 				/* no such stream */
 				continue;
 			}
 			stcb->asoc.strmout[temp].state = SCTP_STREAM_OPEN;
 		}
 	} else {
 		for (i = 0; i < stcb->asoc.streamoutcnt; i++) {
 			stcb->asoc.strmout[i].state = SCTP_STREAM_OPEN;
 		}
 	}
 }
 
 
 struct sctp_stream_reset_request *
 sctp_find_stream_reset(struct sctp_tcb *stcb, uint32_t seq, struct sctp_tmit_chunk **bchk)
 {
 	struct sctp_association *asoc;
 	struct sctp_chunkhdr *ch;
 	struct sctp_stream_reset_request *r;
 	struct sctp_tmit_chunk *chk;
 	int len, clen;
 
 	asoc = &stcb->asoc;
 	if (TAILQ_EMPTY(&stcb->asoc.control_send_queue)) {
 		asoc->stream_reset_outstanding = 0;
 		return (NULL);
 	}
 	if (stcb->asoc.str_reset == NULL) {
 		asoc->stream_reset_outstanding = 0;
 		return (NULL);
 	}
 	chk = stcb->asoc.str_reset;
 	if (chk->data == NULL) {
 		return (NULL);
 	}
 	if (bchk) {
 		/* he wants a copy of the chk pointer */
 		*bchk = chk;
 	}
 	clen = chk->send_size;
 	ch = mtod(chk->data, struct sctp_chunkhdr *);
 	r = (struct sctp_stream_reset_request *)(ch + 1);
 	if (ntohl(r->request_seq) == seq) {
 		/* found it */
 		return (r);
 	}
 	len = SCTP_SIZE32(ntohs(r->ph.param_length));
 	if (clen > (len + (int)sizeof(struct sctp_chunkhdr))) {
 		/* move to the next one, there can only be a max of two */
 		r = (struct sctp_stream_reset_request *)((caddr_t)r + len);
 		if (ntohl(r->request_seq) == seq) {
 			return (r);
 		}
 	}
 	/* that seq is not here */
 	return (NULL);
 }
 
 static void
 sctp_clean_up_stream_reset(struct sctp_tcb *stcb)
 {
 	struct sctp_association *asoc;
 	struct sctp_tmit_chunk *chk = stcb->asoc.str_reset;
 
 	if (stcb->asoc.str_reset == NULL) {
 		return;
 	}
 	asoc = &stcb->asoc;
 
 	sctp_timer_stop(SCTP_TIMER_TYPE_STRRESET, stcb->sctp_ep, stcb,
 	    chk->whoTo, SCTP_FROM_SCTP_INPUT + SCTP_LOC_28);
 	TAILQ_REMOVE(&asoc->control_send_queue,
 	    chk,
 	    sctp_next);
 	if (chk->data) {
 		sctp_m_freem(chk->data);
 		chk->data = NULL;
 	}
 	asoc->ctrl_queue_cnt--;
 	sctp_free_a_chunk(stcb, chk, SCTP_SO_NOT_LOCKED);
 	/* sa_ignore NO_NULL_CHK */
 	stcb->asoc.str_reset = NULL;
 }
 
 
 static int
 sctp_handle_stream_reset_response(struct sctp_tcb *stcb,
     uint32_t seq, uint32_t action,
     struct sctp_stream_reset_response *respin)
 {
 	uint16_t type;
 	int lparm_len;
 	struct sctp_association *asoc = &stcb->asoc;
 	struct sctp_tmit_chunk *chk;
 	struct sctp_stream_reset_request *req_param;
 	struct sctp_stream_reset_out_request *req_out_param;
 	struct sctp_stream_reset_in_request *req_in_param;
 	uint32_t number_entries;
 
 	if (asoc->stream_reset_outstanding == 0) {
 		/* duplicate */
 		return (0);
 	}
 	if (seq == stcb->asoc.str_reset_seq_out) {
 		req_param = sctp_find_stream_reset(stcb, seq, &chk);
 		if (req_param != NULL) {
 			stcb->asoc.str_reset_seq_out++;
 			type = ntohs(req_param->ph.param_type);
 			lparm_len = ntohs(req_param->ph.param_length);
 			if (type == SCTP_STR_RESET_OUT_REQUEST) {
 				int no_clear = 0;
 
 				req_out_param = (struct sctp_stream_reset_out_request *)req_param;
 				number_entries = (lparm_len - sizeof(struct sctp_stream_reset_out_request)) / sizeof(uint16_t);
 				asoc->stream_reset_out_is_outstanding = 0;
 				if (asoc->stream_reset_outstanding)
 					asoc->stream_reset_outstanding--;
 				if (action == SCTP_STREAM_RESET_RESULT_PERFORMED) {
 					/* do it */
 					sctp_reset_out_streams(stcb, number_entries, req_out_param->list_of_streams);
 				} else if (action == SCTP_STREAM_RESET_RESULT_DENIED) {
 					sctp_ulp_notify(SCTP_NOTIFY_STR_RESET_DENIED_OUT, stcb, number_entries, req_out_param->list_of_streams, SCTP_SO_NOT_LOCKED);
 				} else if (action == SCTP_STREAM_RESET_RESULT_IN_PROGRESS) {
 					/*
 					 * Set it up so we don't stop
 					 * retransmitting
 					 */
 					asoc->stream_reset_outstanding++;
 					stcb->asoc.str_reset_seq_out--;
 					asoc->stream_reset_out_is_outstanding = 1;
 					no_clear = 1;
 				} else {
 					sctp_ulp_notify(SCTP_NOTIFY_STR_RESET_FAILED_OUT, stcb, number_entries, req_out_param->list_of_streams, SCTP_SO_NOT_LOCKED);
 				}
 				if (no_clear == 0) {
 					sctp_reset_clear_pending(stcb, number_entries, req_out_param->list_of_streams);
 				}
 			} else if (type == SCTP_STR_RESET_IN_REQUEST) {
 				req_in_param = (struct sctp_stream_reset_in_request *)req_param;
 				number_entries = (lparm_len - sizeof(struct sctp_stream_reset_in_request)) / sizeof(uint16_t);
 				if (asoc->stream_reset_outstanding)
 					asoc->stream_reset_outstanding--;
 				if (action == SCTP_STREAM_RESET_RESULT_DENIED) {
 					sctp_ulp_notify(SCTP_NOTIFY_STR_RESET_DENIED_IN, stcb,
 					    number_entries, req_in_param->list_of_streams, SCTP_SO_NOT_LOCKED);
 				} else if (action != SCTP_STREAM_RESET_RESULT_PERFORMED) {
 					sctp_ulp_notify(SCTP_NOTIFY_STR_RESET_FAILED_IN, stcb,
 					    number_entries, req_in_param->list_of_streams, SCTP_SO_NOT_LOCKED);
 				}
 			} else if (type == SCTP_STR_RESET_ADD_OUT_STREAMS) {
 				/* Ok we now may have more streams */
 				int num_stream;
 
 				num_stream = stcb->asoc.strm_pending_add_size;
 				if (num_stream > (stcb->asoc.strm_realoutsize - stcb->asoc.streamoutcnt)) {
 					/* TSNH */
 					num_stream = stcb->asoc.strm_realoutsize - stcb->asoc.streamoutcnt;
 				}
 				stcb->asoc.strm_pending_add_size = 0;
 				if (asoc->stream_reset_outstanding)
 					asoc->stream_reset_outstanding--;
 				if (action == SCTP_STREAM_RESET_RESULT_PERFORMED) {
 					/* Put the new streams into effect */
 					int i;
 
 					for (i = asoc->streamoutcnt; i < (asoc->streamoutcnt + num_stream); i++) {
 						asoc->strmout[i].state = SCTP_STREAM_OPEN;
 					}
 					asoc->streamoutcnt += num_stream;
 					sctp_notify_stream_reset_add(stcb, stcb->asoc.streamincnt, stcb->asoc.streamoutcnt, 0);
 				} else if (action == SCTP_STREAM_RESET_RESULT_DENIED) {
 					sctp_notify_stream_reset_add(stcb, stcb->asoc.streamincnt, stcb->asoc.streamoutcnt,
 					    SCTP_STREAM_CHANGE_DENIED);
 				} else {
 					sctp_notify_stream_reset_add(stcb, stcb->asoc.streamincnt, stcb->asoc.streamoutcnt,
 					    SCTP_STREAM_CHANGE_FAILED);
 				}
 			} else if (type == SCTP_STR_RESET_ADD_IN_STREAMS) {
 				if (asoc->stream_reset_outstanding)
 					asoc->stream_reset_outstanding--;
 				if (action == SCTP_STREAM_RESET_RESULT_DENIED) {
 					sctp_notify_stream_reset_add(stcb, stcb->asoc.streamincnt, stcb->asoc.streamoutcnt,
 					    SCTP_STREAM_CHANGE_DENIED);
 				} else if (action != SCTP_STREAM_RESET_RESULT_PERFORMED) {
 					sctp_notify_stream_reset_add(stcb, stcb->asoc.streamincnt, stcb->asoc.streamoutcnt,
 					    SCTP_STREAM_CHANGE_FAILED);
 				}
 			} else if (type == SCTP_STR_RESET_TSN_REQUEST) {
 				/**
 				 * a) Adopt the new in tsn.
 				 * b) reset the map
 				 * c) Adopt the new out-tsn
 				 */
 				struct sctp_stream_reset_response_tsn *resp;
 				struct sctp_forward_tsn_chunk fwdtsn;
 				int abort_flag = 0;
 
 				if (respin == NULL) {
 					/* huh ? */
 					return (0);
 				}
 				if (ntohs(respin->ph.param_length) < sizeof(struct sctp_stream_reset_response_tsn)) {
 					return (0);
 				}
 				if (action == SCTP_STREAM_RESET_RESULT_PERFORMED) {
 					resp = (struct sctp_stream_reset_response_tsn *)respin;
 					asoc->stream_reset_outstanding--;
 					fwdtsn.ch.chunk_length = htons(sizeof(struct sctp_forward_tsn_chunk));
 					fwdtsn.ch.chunk_type = SCTP_FORWARD_CUM_TSN;
 					fwdtsn.new_cumulative_tsn = htonl(ntohl(resp->senders_next_tsn) - 1);
 					sctp_handle_forward_tsn(stcb, &fwdtsn, &abort_flag, NULL, 0);
 					if (abort_flag) {
 						return (1);
 					}
 					stcb->asoc.highest_tsn_inside_map = (ntohl(resp->senders_next_tsn) - 1);
 					if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_MAP_LOGGING_ENABLE) {
 						sctp_log_map(0, 7, asoc->highest_tsn_inside_map, SCTP_MAP_SLIDE_RESULT);
 					}
 					stcb->asoc.tsn_last_delivered = stcb->asoc.cumulative_tsn = stcb->asoc.highest_tsn_inside_map;
 					stcb->asoc.mapping_array_base_tsn = ntohl(resp->senders_next_tsn);
 					memset(stcb->asoc.mapping_array, 0, stcb->asoc.mapping_array_size);
 
 					stcb->asoc.highest_tsn_inside_nr_map = stcb->asoc.highest_tsn_inside_map;
 					memset(stcb->asoc.nr_mapping_array, 0, stcb->asoc.mapping_array_size);
 
 					stcb->asoc.sending_seq = ntohl(resp->receivers_next_tsn);
 					stcb->asoc.last_acked_seq = stcb->asoc.cumulative_tsn;
 
 					sctp_reset_out_streams(stcb, 0, (uint16_t *) NULL);
 					sctp_reset_in_stream(stcb, 0, (uint16_t *) NULL);
 					sctp_notify_stream_reset_tsn(stcb, stcb->asoc.sending_seq, (stcb->asoc.mapping_array_base_tsn + 1), 0);
 				} else if (action == SCTP_STREAM_RESET_RESULT_DENIED) {
 					sctp_notify_stream_reset_tsn(stcb, stcb->asoc.sending_seq, (stcb->asoc.mapping_array_base_tsn + 1),
 					    SCTP_ASSOC_RESET_DENIED);
 				} else {
 					sctp_notify_stream_reset_tsn(stcb, stcb->asoc.sending_seq, (stcb->asoc.mapping_array_base_tsn + 1),
 					    SCTP_ASSOC_RESET_FAILED);
 				}
 			}
 			/* get rid of the request and get the request flags */
 			if (asoc->stream_reset_outstanding == 0) {
 				sctp_clean_up_stream_reset(stcb);
 			}
 		}
 	}
 	if (asoc->stream_reset_outstanding == 0) {
 		sctp_send_stream_reset_out_if_possible(stcb, SCTP_SO_NOT_LOCKED);
 	}
 	return (0);
 }
 
 static void
 sctp_handle_str_reset_request_in(struct sctp_tcb *stcb,
     struct sctp_tmit_chunk *chk,
     struct sctp_stream_reset_in_request *req, int trunc)
 {
 	uint32_t seq;
 	int len, i;
 	int number_entries;
 	uint16_t temp;
 
 	/*
 	 * peer wants me to send a str-reset to him for my outgoing seq's if
 	 * seq_in is right.
 	 */
 	struct sctp_association *asoc = &stcb->asoc;
 
 	seq = ntohl(req->request_seq);
 	if (asoc->str_reset_seq_in == seq) {
 		asoc->last_reset_action[1] = asoc->last_reset_action[0];
 		if (!(asoc->local_strreset_support & SCTP_ENABLE_RESET_STREAM_REQ)) {
 			asoc->last_reset_action[0] = SCTP_STREAM_RESET_RESULT_DENIED;
 		} else if (trunc) {
 			/* Can't do it, since they exceeded our buffer size */
 			asoc->last_reset_action[0] = SCTP_STREAM_RESET_RESULT_DENIED;
 		} else if (stcb->asoc.stream_reset_out_is_outstanding == 0) {
 			len = ntohs(req->ph.param_length);
 			number_entries = ((len - sizeof(struct sctp_stream_reset_in_request)) / sizeof(uint16_t));
 			if (number_entries) {
 				for (i = 0; i < number_entries; i++) {
 					temp = ntohs(req->list_of_streams[i]);
 					if (temp >= stcb->asoc.streamoutcnt) {
 						asoc->last_reset_action[0] = SCTP_STREAM_RESET_RESULT_DENIED;
 						goto bad_boy;
 					}
 					req->list_of_streams[i] = temp;
 				}
 				for (i = 0; i < number_entries; i++) {
 					if (stcb->asoc.strmout[req->list_of_streams[i]].state == SCTP_STREAM_OPEN) {
 						stcb->asoc.strmout[req->list_of_streams[i]].state = SCTP_STREAM_RESET_PENDING;
 					}
 				}
 			} else {
 				/* It's all of them */
 				for (i = 0; i < stcb->asoc.streamoutcnt; i++) {
 					if (stcb->asoc.strmout[i].state == SCTP_STREAM_OPEN)
 						stcb->asoc.strmout[i].state = SCTP_STREAM_RESET_PENDING;
 				}
 			}
 			asoc->last_reset_action[0] = SCTP_STREAM_RESET_RESULT_PERFORMED;
 		} else {
 			/* Can't do it, since we have sent one out */
 			asoc->last_reset_action[0] = SCTP_STREAM_RESET_RESULT_ERR_IN_PROGRESS;
 		}
 bad_boy:
 		sctp_add_stream_reset_result(chk, seq, asoc->last_reset_action[0]);
 		asoc->str_reset_seq_in++;
 	} else if (asoc->str_reset_seq_in - 1 == seq) {
 		sctp_add_stream_reset_result(chk, seq, asoc->last_reset_action[0]);
 	} else if (asoc->str_reset_seq_in - 2 == seq) {
 		sctp_add_stream_reset_result(chk, seq, asoc->last_reset_action[1]);
 	} else {
 		sctp_add_stream_reset_result(chk, seq, SCTP_STREAM_RESET_RESULT_ERR_BAD_SEQNO);
 	}
 	sctp_send_stream_reset_out_if_possible(stcb, SCTP_SO_NOT_LOCKED);
 }
 
 static int
 sctp_handle_str_reset_request_tsn(struct sctp_tcb *stcb,
     struct sctp_tmit_chunk *chk,
     struct sctp_stream_reset_tsn_request *req)
 {
 	/* reset all in and out and update the tsn */
 	/*
 	 * A) Reset my str-seq's on in and out.
 	 * B) Select a receive next, and set cum-ack to it. Also process
 	 *    this selected number as a fwd-tsn.
 	 * C) Set in the response my next sending seq.
 	 */
 	struct sctp_forward_tsn_chunk fwdtsn;
 	struct sctp_association *asoc = &stcb->asoc;
 	int abort_flag = 0;
 	uint32_t seq;
 
 	seq = ntohl(req->request_seq);
 	if (asoc->str_reset_seq_in == seq) {
 		asoc->last_reset_action[1] = stcb->asoc.last_reset_action[0];
 		if (!(asoc->local_strreset_support & SCTP_ENABLE_CHANGE_ASSOC_REQ)) {
 			asoc->last_reset_action[0] = SCTP_STREAM_RESET_RESULT_DENIED;
 		} else {
 			fwdtsn.ch.chunk_length = htons(sizeof(struct sctp_forward_tsn_chunk));
 			fwdtsn.ch.chunk_type = SCTP_FORWARD_CUM_TSN;
 			fwdtsn.ch.chunk_flags = 0;
 			fwdtsn.new_cumulative_tsn = htonl(stcb->asoc.highest_tsn_inside_map + 1);
 			sctp_handle_forward_tsn(stcb, &fwdtsn, &abort_flag, NULL, 0);
 			if (abort_flag) {
 				return (1);
 			}
 			asoc->highest_tsn_inside_map += SCTP_STREAM_RESET_TSN_DELTA;
 			if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_MAP_LOGGING_ENABLE) {
 				sctp_log_map(0, 10, asoc->highest_tsn_inside_map, SCTP_MAP_SLIDE_RESULT);
 			}
 			asoc->tsn_last_delivered = asoc->cumulative_tsn = asoc->highest_tsn_inside_map;
 			asoc->mapping_array_base_tsn = asoc->highest_tsn_inside_map + 1;
 			memset(asoc->mapping_array, 0, asoc->mapping_array_size);
 			asoc->highest_tsn_inside_nr_map = asoc->highest_tsn_inside_map;
 			memset(asoc->nr_mapping_array, 0, asoc->mapping_array_size);
 			atomic_add_int(&asoc->sending_seq, 1);
 			/* save off historical data for retrans */
 			asoc->last_sending_seq[1] = asoc->last_sending_seq[0];
 			asoc->last_sending_seq[0] = asoc->sending_seq;
 			asoc->last_base_tsnsent[1] = asoc->last_base_tsnsent[0];
 			asoc->last_base_tsnsent[0] = asoc->mapping_array_base_tsn;
 			sctp_reset_out_streams(stcb, 0, (uint16_t *) NULL);
 			sctp_reset_in_stream(stcb, 0, (uint16_t *) NULL);
 			asoc->last_reset_action[0] = SCTP_STREAM_RESET_RESULT_PERFORMED;
 			sctp_notify_stream_reset_tsn(stcb, asoc->sending_seq, (asoc->mapping_array_base_tsn + 1), 0);
 		}
 		sctp_add_stream_reset_result_tsn(chk, seq, asoc->last_reset_action[0],
 		    asoc->last_sending_seq[0], asoc->last_base_tsnsent[0]);
 		asoc->str_reset_seq_in++;
 	} else if (asoc->str_reset_seq_in - 1 == seq) {
 		sctp_add_stream_reset_result_tsn(chk, seq, asoc->last_reset_action[0],
 		    asoc->last_sending_seq[0], asoc->last_base_tsnsent[0]);
 	} else if (asoc->str_reset_seq_in - 2 == seq) {
 		sctp_add_stream_reset_result_tsn(chk, seq, asoc->last_reset_action[1],
 		    asoc->last_sending_seq[1], asoc->last_base_tsnsent[1]);
 	} else {
 		sctp_add_stream_reset_result(chk, seq, SCTP_STREAM_RESET_RESULT_ERR_BAD_SEQNO);
 	}
 	return (0);
 }
 
 static void
 sctp_handle_str_reset_request_out(struct sctp_tcb *stcb,
     struct sctp_tmit_chunk *chk,
     struct sctp_stream_reset_out_request *req, int trunc)
 {
 	uint32_t seq, tsn;
 	int number_entries, len;
 	struct sctp_association *asoc = &stcb->asoc;
 
 	seq = ntohl(req->request_seq);
 
 	/* now if it's not a duplicate we process it */
 	if (asoc->str_reset_seq_in == seq) {
 		len = ntohs(req->ph.param_length);
 		number_entries = ((len - sizeof(struct sctp_stream_reset_out_request)) / sizeof(uint16_t));
 		/*
 		 * The sender is resetting, handle the list issue. We must:
 		 * a) verify if we can do the reset, if so no problem;
 		 * b) if we can't do the reset, copy the request;
 		 * c) queue it, and set up the data in processor to trigger
 		 *    it off when needed and dequeue all the queued data.
 		 */
 		tsn = ntohl(req->send_reset_at_tsn);
 
 		/* move the reset action back one */
 		asoc->last_reset_action[1] = asoc->last_reset_action[0];
 		if (!(asoc->local_strreset_support & SCTP_ENABLE_RESET_STREAM_REQ)) {
 			asoc->last_reset_action[0] = SCTP_STREAM_RESET_RESULT_DENIED;
 		} else if (trunc) {
 			asoc->last_reset_action[0] = SCTP_STREAM_RESET_RESULT_DENIED;
 		} else if (SCTP_TSN_GE(asoc->cumulative_tsn, tsn)) {
 			/* we can do it now */
 			sctp_reset_in_stream(stcb, number_entries, req->list_of_streams);
 			asoc->last_reset_action[0] = SCTP_STREAM_RESET_RESULT_PERFORMED;
 		} else {
 			/*
 			 * we must queue it up and thus wait for the TSN's
 			 * to arrive that are at or before tsn
 			 */
 			struct sctp_stream_reset_list *liste;
 			int siz;
 
 			siz = sizeof(struct sctp_stream_reset_list) + (number_entries * sizeof(uint16_t));
 			SCTP_MALLOC(liste, struct sctp_stream_reset_list *,
 			    siz, SCTP_M_STRESET);
 			if (liste == NULL) {
 				/* gak out of memory */
 				asoc->last_reset_action[0] = SCTP_STREAM_RESET_RESULT_DENIED;
 				sctp_add_stream_reset_result(chk, seq, asoc->last_reset_action[0]);
 				return;
 			}
 			liste->seq = seq;
 			liste->tsn = tsn;
 			liste->number_entries = number_entries;
 			memcpy(&liste->list_of_streams, req->list_of_streams, number_entries * sizeof(uint16_t));
 			TAILQ_INSERT_TAIL(&asoc->resetHead, liste, next_resp);
 			asoc->last_reset_action[0] = SCTP_STREAM_RESET_RESULT_IN_PROGRESS;
 		}
 		sctp_add_stream_reset_result(chk, seq, asoc->last_reset_action[0]);
 		asoc->str_reset_seq_in++;
 	} else if ((asoc->str_reset_seq_in - 1) == seq) {
 		/*
 		 * one seq back, just echo back last action since my
 		 * response was lost.
 		 */
 		sctp_add_stream_reset_result(chk, seq, asoc->last_reset_action[0]);
 	} else if ((asoc->str_reset_seq_in - 2) == seq) {
 		/*
 		 * two seq back, just echo back the action before last since
 		 * my response was lost.
 		 */
 		sctp_add_stream_reset_result(chk, seq, asoc->last_reset_action[1]);
 	} else {
 		sctp_add_stream_reset_result(chk, seq, SCTP_STREAM_RESET_RESULT_ERR_BAD_SEQNO);
 	}
 }
 
 static void
 sctp_handle_str_reset_add_strm(struct sctp_tcb *stcb, struct sctp_tmit_chunk *chk,
     struct sctp_stream_reset_add_strm *str_add)
 {
 	/*
 	 * Peer is requesting to add more streams. If it's within our
 	 * max-streams we will allow it.
 	 */
 	uint32_t num_stream, i;
 	uint32_t seq;
 	struct sctp_association *asoc = &stcb->asoc;
 	struct sctp_queued_to_read *ctl, *nctl;
 
 	/* Get the number. */
 	seq = ntohl(str_add->request_seq);
 	num_stream = ntohs(str_add->number_of_streams);
 	/* Now what would be the new total? */
 	if (asoc->str_reset_seq_in == seq) {
 		num_stream += stcb->asoc.streamincnt;
 		stcb->asoc.last_reset_action[1] = stcb->asoc.last_reset_action[0];
 		if (!(asoc->local_strreset_support & SCTP_ENABLE_CHANGE_ASSOC_REQ)) {
 			asoc->last_reset_action[0] = SCTP_STREAM_RESET_RESULT_DENIED;
 		} else if ((num_stream > stcb->asoc.max_inbound_streams) ||
 		    (num_stream > 0xffff)) {
 			/* We must reject it, they ask for too many */
 	denied:
 			stcb->asoc.last_reset_action[0] = SCTP_STREAM_RESET_RESULT_DENIED;
 		} else {
 			/* Ok, we can do that :-) */
 			struct sctp_stream_in *oldstrm;
 
 			/* save off the old */
 			oldstrm = stcb->asoc.strmin;
 			SCTP_MALLOC(stcb->asoc.strmin, struct sctp_stream_in *,
 			    (num_stream * sizeof(struct sctp_stream_in)),
 			    SCTP_M_STRMI);
 			if (stcb->asoc.strmin == NULL) {
 				stcb->asoc.strmin = oldstrm;
 				goto denied;
 			}
 			/* copy off the old data */
 			for (i = 0; i < stcb->asoc.streamincnt; i++) {
 				TAILQ_INIT(&stcb->asoc.strmin[i].inqueue);
 				TAILQ_INIT(&stcb->asoc.strmin[i].uno_inqueue);
 				stcb->asoc.strmin[i].stream_no = i;
 				stcb->asoc.strmin[i].last_sequence_delivered = oldstrm[i].last_sequence_delivered;
 				stcb->asoc.strmin[i].delivery_started = oldstrm[i].delivery_started;
 				stcb->asoc.strmin[i].pd_api_started = oldstrm[i].pd_api_started;
 				/* now anything on those queues? */
 				TAILQ_FOREACH_SAFE(ctl, &oldstrm[i].inqueue, next_instrm, nctl) {
 					TAILQ_REMOVE(&oldstrm[i].inqueue, ctl, next_instrm);
 					TAILQ_INSERT_TAIL(&stcb->asoc.strmin[i].inqueue, ctl, next_instrm);
 				}
 				TAILQ_FOREACH_SAFE(ctl, &oldstrm[i].uno_inqueue, next_instrm, nctl) {
 					TAILQ_REMOVE(&oldstrm[i].uno_inqueue, ctl, next_instrm);
 					TAILQ_INSERT_TAIL(&stcb->asoc.strmin[i].uno_inqueue, ctl, next_instrm);
 				}
 			}
 			/* Init the new streams */
 			for (i = stcb->asoc.streamincnt; i < num_stream; i++) {
 				TAILQ_INIT(&stcb->asoc.strmin[i].inqueue);
 				TAILQ_INIT(&stcb->asoc.strmin[i].uno_inqueue);
 				stcb->asoc.strmin[i].stream_no = i;
 				stcb->asoc.strmin[i].last_sequence_delivered = 0xffffffff;
 				stcb->asoc.strmin[i].pd_api_started = 0;
 				stcb->asoc.strmin[i].delivery_started = 0;
 			}
 			SCTP_FREE(oldstrm, SCTP_M_STRMI);
 			/* update the size */
 			stcb->asoc.streamincnt = num_stream;
 			stcb->asoc.last_reset_action[0] = SCTP_STREAM_RESET_RESULT_PERFORMED;
 			sctp_notify_stream_reset_add(stcb, stcb->asoc.streamincnt, stcb->asoc.streamoutcnt, 0);
 		}
 		sctp_add_stream_reset_result(chk, seq, asoc->last_reset_action[0]);
 		asoc->str_reset_seq_in++;
 	} else if ((asoc->str_reset_seq_in - 1) == seq) {
 		/*
 		 * one seq back, just echo back last action since my
 		 * response was lost.
 		 */
 		sctp_add_stream_reset_result(chk, seq, asoc->last_reset_action[0]);
 	} else if ((asoc->str_reset_seq_in - 2) == seq) {
 		/*
 		 * two seq back, just echo back the action before last since
 		 * my response was lost.
 		 */
 		sctp_add_stream_reset_result(chk, seq, asoc->last_reset_action[1]);
 	} else {
 		sctp_add_stream_reset_result(chk, seq, SCTP_STREAM_RESET_RESULT_ERR_BAD_SEQNO);
 
 	}
 }
 
 static void
 sctp_handle_str_reset_add_out_strm(struct sctp_tcb *stcb, struct sctp_tmit_chunk *chk,
     struct sctp_stream_reset_add_strm *str_add)
 {
 	/*
 	 * Peer is requesting to add more streams. If it's within our
 	 * max-streams we will allow it.
 	 */
 	uint16_t num_stream;
 	uint32_t seq;
 	struct sctp_association *asoc = &stcb->asoc;
 
 	/* Get the number. */
 	seq = ntohl(str_add->request_seq);
 	num_stream = ntohs(str_add->number_of_streams);
 	/* Now what would be the new total? */
 	if (asoc->str_reset_seq_in == seq) {
 		stcb->asoc.last_reset_action[1] = stcb->asoc.last_reset_action[0];
 		if (!(asoc->local_strreset_support & SCTP_ENABLE_CHANGE_ASSOC_REQ)) {
 			asoc->last_reset_action[0] = SCTP_STREAM_RESET_RESULT_DENIED;
 		} else if (stcb->asoc.stream_reset_outstanding) {
 			/* We must reject it, we have something pending */
 			stcb->asoc.last_reset_action[0] = SCTP_STREAM_RESET_RESULT_ERR_IN_PROGRESS;
 		} else {
 			/* Ok, we can do that :-) */
 			int mychk;
 
 			mychk = stcb->asoc.streamoutcnt;
 			mychk += num_stream;
 			if (mychk < 0x10000) {
 				stcb->asoc.last_reset_action[0] = SCTP_STREAM_RESET_RESULT_PERFORMED;
 				if (sctp_send_str_reset_req(stcb, 0, NULL, 0, 0, 1, num_stream, 0, 1)) {
 					stcb->asoc.last_reset_action[0] = SCTP_STREAM_RESET_RESULT_DENIED;
 				}
 			} else {
 				stcb->asoc.last_reset_action[0] = SCTP_STREAM_RESET_RESULT_DENIED;
 			}
 		}
 		sctp_add_stream_reset_result(chk, seq, stcb->asoc.last_reset_action[0]);
 		asoc->str_reset_seq_in++;
 	} else if ((asoc->str_reset_seq_in - 1) == seq) {
 		/*
 		 * one seq back, just echo back last action since my
 		 * response was lost.
 		 */
 		sctp_add_stream_reset_result(chk, seq, asoc->last_reset_action[0]);
 	} else if ((asoc->str_reset_seq_in - 2) == seq) {
 		/*
 		 * two seq back, just echo back the action before last since
 		 * my response was lost.
 		 */
 		sctp_add_stream_reset_result(chk, seq, asoc->last_reset_action[1]);
 	} else {
 		sctp_add_stream_reset_result(chk, seq, SCTP_STREAM_RESET_RESULT_ERR_BAD_SEQNO);
 	}
 }
 
 #ifdef __GNUC__
 __attribute__((noinline))
 #endif
 static int
 sctp_handle_stream_reset(struct sctp_tcb *stcb, struct mbuf *m, int offset,
     struct sctp_chunkhdr *ch_req)
 {
 	uint16_t remaining_length, param_len, ptype;
 	struct sctp_paramhdr pstore;
 	uint8_t cstore[SCTP_CHUNK_BUFFER_SIZE];
 	uint32_t seq = 0;
 	int num_req = 0;
 	int trunc = 0;
 	struct sctp_tmit_chunk *chk;
 	struct sctp_chunkhdr *ch;
 	struct sctp_paramhdr *ph;
 	int ret_code = 0;
 	int num_param = 0;
 
 	/* now it may be a reset or a reset-response */
 	remaining_length = ntohs(ch_req->chunk_length) - sizeof(struct sctp_chunkhdr);
 
 	/* setup for adding the response */
 	sctp_alloc_a_chunk(stcb, chk);
 	if (chk == NULL) {
 		return (ret_code);
 	}
 	chk->copy_by_ref = 0;
 	chk->rec.chunk_id.id = SCTP_STREAM_RESET;
 	chk->rec.chunk_id.can_take_data = 0;
 	chk->flags = 0;
 	chk->asoc = &stcb->asoc;
 	chk->no_fr_allowed = 0;
 	chk->book_size = chk->send_size = sizeof(struct sctp_chunkhdr);
 	chk->book_size_scale = 0;
 	chk->data = sctp_get_mbuf_for_msg(MCLBYTES, 0, M_NOWAIT, 1, MT_DATA);
 	if (chk->data == NULL) {
 strres_nochunk:
 		if (chk->data) {
 			sctp_m_freem(chk->data);
 			chk->data = NULL;
 		}
 		sctp_free_a_chunk(stcb, chk, SCTP_SO_NOT_LOCKED);
 		return (ret_code);
 	}
 	SCTP_BUF_RESV_UF(chk->data, SCTP_MIN_OVERHEAD);
 
 	/* setup chunk parameters */
 	chk->sent = SCTP_DATAGRAM_UNSENT;
 	chk->snd_count = 0;
 	chk->whoTo = NULL;
 
 	ch = mtod(chk->data, struct sctp_chunkhdr *);
 	ch->chunk_type = SCTP_STREAM_RESET;
 	ch->chunk_flags = 0;
 	ch->chunk_length = htons(chk->send_size);
 	SCTP_BUF_LEN(chk->data) = SCTP_SIZE32(chk->send_size);
 	offset += sizeof(struct sctp_chunkhdr);
 	while (remaining_length >= sizeof(struct sctp_paramhdr)) {
 		ph = (struct sctp_paramhdr *)sctp_m_getptr(m, offset, sizeof(pstore), (uint8_t *) & pstore);
 		if (ph == NULL) {
 			/* TSNH */
 			break;
 		}
 		param_len = ntohs(ph->param_length);
 		if ((param_len > remaining_length) ||
 		    (param_len < (sizeof(struct sctp_paramhdr) + sizeof(uint32_t)))) {
 			/* bad parameter length */
 			break;
 		}
 		ph = (struct sctp_paramhdr *)sctp_m_getptr(m, offset, min(param_len, sizeof(cstore)),
 		    (uint8_t *) & cstore);
 		if (ph == NULL) {
 			/* TSNH */
 			break;
 		}
 		ptype = ntohs(ph->param_type);
 		num_param++;
 		if (param_len > sizeof(cstore)) {
 			trunc = 1;
 		} else {
 			trunc = 0;
 		}
 		if (num_param > SCTP_MAX_RESET_PARAMS) {
 			/* hit the max of parameters already sorry.. */
 			break;
 		}
 		if (ptype == SCTP_STR_RESET_OUT_REQUEST) {
 			struct sctp_stream_reset_out_request *req_out;
 
 			if (param_len < sizeof(struct sctp_stream_reset_out_request)) {
 				break;
 			}
 			req_out = (struct sctp_stream_reset_out_request *)ph;
 			num_req++;
 			if (stcb->asoc.stream_reset_outstanding) {
 				seq = ntohl(req_out->response_seq);
 				if (seq == stcb->asoc.str_reset_seq_out) {
 					/* implicit ack */
 					(void)sctp_handle_stream_reset_response(stcb, seq, SCTP_STREAM_RESET_RESULT_PERFORMED, NULL);
 				}
 			}
 			sctp_handle_str_reset_request_out(stcb, chk, req_out, trunc);
 		} else if (ptype == SCTP_STR_RESET_ADD_OUT_STREAMS) {
 			struct sctp_stream_reset_add_strm *str_add;
 
 			if (param_len < sizeof(struct sctp_stream_reset_add_strm)) {
 				break;
 			}
 			str_add = (struct sctp_stream_reset_add_strm *)ph;
 			num_req++;
 			sctp_handle_str_reset_add_strm(stcb, chk, str_add);
 		} else if (ptype == SCTP_STR_RESET_ADD_IN_STREAMS) {
 			struct sctp_stream_reset_add_strm *str_add;
 
 			if (param_len < sizeof(struct sctp_stream_reset_add_strm)) {
 				break;
 			}
 			str_add = (struct sctp_stream_reset_add_strm *)ph;
 			num_req++;
 			sctp_handle_str_reset_add_out_strm(stcb, chk, str_add);
 		} else if (ptype == SCTP_STR_RESET_IN_REQUEST) {
 			struct sctp_stream_reset_in_request *req_in;
 
 			num_req++;
 			req_in = (struct sctp_stream_reset_in_request *)ph;
 			sctp_handle_str_reset_request_in(stcb, chk, req_in, trunc);
 		} else if (ptype == SCTP_STR_RESET_TSN_REQUEST) {
 			struct sctp_stream_reset_tsn_request *req_tsn;
 
 			num_req++;
 			req_tsn = (struct sctp_stream_reset_tsn_request *)ph;
 			if (sctp_handle_str_reset_request_tsn(stcb, chk, req_tsn)) {
 				ret_code = 1;
 				goto strres_nochunk;
 			}
 			/* no more */
 			break;
 		} else if (ptype == SCTP_STR_RESET_RESPONSE) {
 			struct sctp_stream_reset_response *resp;
 			uint32_t result;
 
 			if (param_len < sizeof(struct sctp_stream_reset_response)) {
 				break;
 			}
 			resp = (struct sctp_stream_reset_response *)ph;
 			seq = ntohl(resp->response_seq);
 			result = ntohl(resp->result);
 			if (sctp_handle_stream_reset_response(stcb, seq, result, resp)) {
 				ret_code = 1;
 				goto strres_nochunk;
 			}
 		} else {
 			break;
 		}
 		offset += SCTP_SIZE32(param_len);
 		if (remaining_length >= SCTP_SIZE32(param_len)) {
 			remaining_length -= SCTP_SIZE32(param_len);
 		} else {
 			remaining_length = 0;
 		}
 	}
 	if (num_req == 0) {
 		/* we have no response, free the stuff */
 		goto strres_nochunk;
 	}
 	/* ok we have a chunk to link in */
 	TAILQ_INSERT_TAIL(&stcb->asoc.control_send_queue,
 	    chk,
 	    sctp_next);
 	stcb->asoc.ctrl_queue_cnt++;
 	return (ret_code);
 }
 
 /*
  * Handle a router's or endpoint's report of a packet loss. There are two
  * ways to handle this: either we get the whole packet and must dissect it
  * ourselves (possibly with truncation and/or corruption), or it is a summary
  * from a middle box that did the dissecting for us.
  */
 static void
 sctp_handle_packet_dropped(struct sctp_pktdrop_chunk *cp,
     struct sctp_tcb *stcb, struct sctp_nets *net, uint32_t limit)
 {
 	uint32_t bottle_bw, on_queue;
 	uint16_t trunc_len;
 	unsigned int chlen;
 	unsigned int at;
 	struct sctp_chunk_desc desc;
 	struct sctp_chunkhdr *ch;
 
 	chlen = ntohs(cp->ch.chunk_length);
 	chlen -= sizeof(struct sctp_pktdrop_chunk);
 	/* XXX possible chlen underflow */
 	if (chlen == 0) {
 		ch = NULL;
 		if (cp->ch.chunk_flags & SCTP_FROM_MIDDLE_BOX)
 			SCTP_STAT_INCR(sctps_pdrpbwrpt);
 	} else {
 		ch = (struct sctp_chunkhdr *)(cp->data + sizeof(struct sctphdr));
 		chlen -= sizeof(struct sctphdr);
 		/* XXX possible chlen underflow */
 		memset(&desc, 0, sizeof(desc));
 	}
 	trunc_len = (uint16_t) ntohs(cp->trunc_len);
 	if (trunc_len > limit) {
 		trunc_len = limit;
 	}
 	/* now the chunks themselves */
 	while ((ch != NULL) && (chlen >= sizeof(struct sctp_chunkhdr))) {
 		desc.chunk_type = ch->chunk_type;
 		/* get amount we need to move */
 		at = ntohs(ch->chunk_length);
 		if (at < sizeof(struct sctp_chunkhdr)) {
 			/* corrupt chunk, maybe at the end? */
 			SCTP_STAT_INCR(sctps_pdrpcrupt);
 			break;
 		}
 		if (trunc_len == 0) {
 			/* we are supposed to have all of it */
 			if (at > chlen) {
 				/* corrupt, skip it */
 				SCTP_STAT_INCR(sctps_pdrpcrupt);
 				break;
 			}
 		} else {
 			/* is there enough of it left ? */
 			if (desc.chunk_type == SCTP_DATA) {
 				if (chlen < (sizeof(struct sctp_data_chunk) +
 				    sizeof(desc.data_bytes))) {
 					break;
 				}
 			} else {
 				if (chlen < sizeof(struct sctp_chunkhdr)) {
 					break;
 				}
 			}
 		}
 		if (desc.chunk_type == SCTP_DATA) {
 			/* can we get out the tsn? */
 			if ((cp->ch.chunk_flags & SCTP_FROM_MIDDLE_BOX))
 				SCTP_STAT_INCR(sctps_pdrpmbda);
 
 			if (chlen >= (sizeof(struct sctp_data_chunk) + sizeof(uint32_t))) {
 				/* yep */
 				struct sctp_data_chunk *dcp;
 				uint8_t *ddp;
 				unsigned int iii;
 
 				dcp = (struct sctp_data_chunk *)ch;
 				ddp = (uint8_t *) (dcp + 1);
 				for (iii = 0; iii < sizeof(desc.data_bytes); iii++) {
 					desc.data_bytes[iii] = ddp[iii];
 				}
 				desc.tsn_ifany = dcp->dp.tsn;
 			} else {
 				/* nope we are done. */
 				SCTP_STAT_INCR(sctps_pdrpnedat);
 				break;
 			}
 		} else {
 			if ((cp->ch.chunk_flags & SCTP_FROM_MIDDLE_BOX))
 				SCTP_STAT_INCR(sctps_pdrpmbct);
 		}
 
 		if (process_chunk_drop(stcb, &desc, net, cp->ch.chunk_flags)) {
 			SCTP_STAT_INCR(sctps_pdrppdbrk);
 			break;
 		}
 		if (SCTP_SIZE32(at) > chlen) {
 			break;
 		}
 		chlen -= SCTP_SIZE32(at);
 		if (chlen < sizeof(struct sctp_chunkhdr)) {
 			/* done, none left */
 			break;
 		}
 		ch = (struct sctp_chunkhdr *)((caddr_t)ch + SCTP_SIZE32(at));
 	}
 	/* Now update any rwnd --- possibly */
 	if ((cp->ch.chunk_flags & SCTP_FROM_MIDDLE_BOX) == 0) {
 		/* From a peer, we get a rwnd report */
 		uint32_t a_rwnd;
 
 		SCTP_STAT_INCR(sctps_pdrpfehos);
 
 		bottle_bw = ntohl(cp->bottle_bw);
 		on_queue = ntohl(cp->current_onq);
 		if (bottle_bw && on_queue) {
 			/* a rwnd report is in here */
 			if (bottle_bw > on_queue)
 				a_rwnd = bottle_bw - on_queue;
 			else
 				a_rwnd = 0;
 
 			if (a_rwnd == 0)
 				stcb->asoc.peers_rwnd = 0;
 			else {
 				if (a_rwnd > stcb->asoc.total_flight) {
 					stcb->asoc.peers_rwnd =
 					    a_rwnd - stcb->asoc.total_flight;
 				} else {
 					stcb->asoc.peers_rwnd = 0;
 				}
 				if (stcb->asoc.peers_rwnd <
 				    stcb->sctp_ep->sctp_ep.sctp_sws_sender) {
 					/* SWS sender side engages */
 					stcb->asoc.peers_rwnd = 0;
 				}
 			}
 		}
 	} else {
 		SCTP_STAT_INCR(sctps_pdrpfmbox);
 	}
 
 	/* now middle boxes in sat networks get a cwnd bump */
 	if ((cp->ch.chunk_flags & SCTP_FROM_MIDDLE_BOX) &&
 	    (stcb->asoc.sat_t3_loss_recovery == 0) &&
 	    (stcb->asoc.sat_network)) {
 		/*
 		 * This is debatable, but for sat networks it makes sense.
 		 * Note that if a T3 timer has gone off, we will prohibit any
 		 * changes to cwnd until we exit the t3 loss recovery.
 		 */
 		stcb->asoc.cc_functions.sctp_cwnd_update_after_packet_dropped(stcb,
 		    net, cp, &bottle_bw, &on_queue);
 	}
 }
 
 /*
  * Handles all control chunks in a packet.
  * inputs:
  * - m: mbuf chain, assumed to still contain IP/SCTP header
  * - stcb: is the tcb found for this packet
  * - offset: offset into the mbuf chain to first chunkhdr
  * - length: is the length of the complete packet
  * outputs:
  * - length: modified to remaining length after control processing
  * - netp: modified to new sctp_nets after cookie-echo processing
  * - return NULL to discard the packet (ie. no asoc, bad packet,...)
  *   otherwise return the tcb for this packet
  */
 #ifdef __GNUC__
 __attribute__((noinline))
 #endif
 static struct sctp_tcb *
 sctp_process_control(struct mbuf *m, int iphlen, int *offset, int length,
     struct sockaddr *src, struct sockaddr *dst,
     struct sctphdr *sh, struct sctp_chunkhdr *ch, struct sctp_inpcb *inp,
     struct sctp_tcb *stcb, struct sctp_nets **netp, int *fwd_tsn_seen,
     uint8_t mflowtype, uint32_t mflowid, uint16_t fibnum,
     uint32_t vrf_id, uint16_t port)
 {
 	struct sctp_association *asoc;
 	struct mbuf *op_err;
 	char msg[SCTP_DIAG_INFO_LEN];
 	uint32_t vtag_in;
 	int num_chunks = 0;	/* number of control chunks processed */
 	uint32_t chk_length;
 	int ret;
 	int abort_no_unlock = 0;
 	int ecne_seen = 0;
 
 	/*
 	 * How big should this be, and should it be alloc'd? Let's try the
 	 * d-mtu-ceiling for now (2k) and that should hopefully work ...
 	 * until we get into jumbo grams and such..
 	 */
 	uint8_t chunk_buf[SCTP_CHUNK_BUFFER_SIZE];
 	struct sctp_tcb *locked_tcb = stcb;
 	int got_auth = 0;
 	uint32_t auth_offset = 0, auth_len = 0;
 	int auth_skipped = 0;
 	int asconf_cnt = 0;
 
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 	struct socket *so;
 
 #endif
 
 	SCTPDBG(SCTP_DEBUG_INPUT1, "sctp_process_control: iphlen=%u, offset=%u, length=%u stcb:%p\n",
 	    iphlen, *offset, length, (void *)stcb);
 
 	/* validate chunk header length... */
 	if (ntohs(ch->chunk_length) < sizeof(*ch)) {
 		SCTPDBG(SCTP_DEBUG_INPUT1, "Invalid header length %d\n",
 		    ntohs(ch->chunk_length));
 		if (locked_tcb) {
 			SCTP_TCB_UNLOCK(locked_tcb);
 		}
 		return (NULL);
 	}
 	/*
 	 * validate the verification tag
 	 */
 	vtag_in = ntohl(sh->v_tag);
 
 	if (locked_tcb) {
 		SCTP_TCB_LOCK_ASSERT(locked_tcb);
 	}
 	if (ch->chunk_type == SCTP_INITIATION) {
 		SCTPDBG(SCTP_DEBUG_INPUT1, "Its an INIT of len:%d vtag:%x\n",
 		    ntohs(ch->chunk_length), vtag_in);
 		if (vtag_in != 0) {
 			/* protocol error- silently discard... */
 			SCTP_STAT_INCR(sctps_badvtag);
 			if (locked_tcb) {
 				SCTP_TCB_UNLOCK(locked_tcb);
 			}
 			return (NULL);
 		}
 	} else if (ch->chunk_type != SCTP_COOKIE_ECHO) {
 		/*
 		 * If there is no stcb, skip the AUTH chunk and process
 		 * later after a stcb is found (to validate the lookup was
 		 * valid).
 		 */
 		if ((ch->chunk_type == SCTP_AUTHENTICATION) &&
 		    (stcb == NULL) &&
 		    (inp->auth_supported == 1)) {
 			/* save this chunk for later processing */
 			auth_skipped = 1;
 			auth_offset = *offset;
 			auth_len = ntohs(ch->chunk_length);
 
 			/* (temporarily) move past this chunk */
 			*offset += SCTP_SIZE32(auth_len);
 			if (*offset >= length) {
 				/* no more data left in the mbuf chain */
 				*offset = length;
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				return (NULL);
 			}
 			ch = (struct sctp_chunkhdr *)sctp_m_getptr(m, *offset,
 			    sizeof(struct sctp_chunkhdr), chunk_buf);
 		}
 		if (ch == NULL) {
 			/* Help */
 			*offset = length;
 			if (locked_tcb) {
 				SCTP_TCB_UNLOCK(locked_tcb);
 			}
 			return (NULL);
 		}
 		if (ch->chunk_type == SCTP_COOKIE_ECHO) {
 			goto process_control_chunks;
 		}
 		/*
 		 * first check if it's an ASCONF with an unknown src addr; we
 		 * need to look inside to find the association
 		 */
 		if (ch->chunk_type == SCTP_ASCONF && stcb == NULL) {
 			struct sctp_chunkhdr *asconf_ch = ch;
 			uint32_t asconf_offset = 0, asconf_len = 0;
 
 			/* inp's refcount may be reduced */
 			SCTP_INP_INCR_REF(inp);
 
 			asconf_offset = *offset;
 			do {
 				asconf_len = ntohs(asconf_ch->chunk_length);
 				if (asconf_len < sizeof(struct sctp_asconf_paramhdr))
 					break;
 				stcb = sctp_findassociation_ep_asconf(m,
 				    *offset,
 				    dst,
 				    sh, &inp, netp, vrf_id);
 				if (stcb != NULL)
 					break;
 				asconf_offset += SCTP_SIZE32(asconf_len);
 				asconf_ch = (struct sctp_chunkhdr *)sctp_m_getptr(m, asconf_offset,
 				    sizeof(struct sctp_chunkhdr), chunk_buf);
 			} while (asconf_ch != NULL && asconf_ch->chunk_type == SCTP_ASCONF);
 			if (stcb == NULL) {
 				/*
 				 * reduce inp's refcount if not reduced in
 				 * sctp_findassociation_ep_asconf().
 				 */
 				SCTP_INP_DECR_REF(inp);
 			} else {
 				locked_tcb = stcb;
 			}
 
 			/* now go back and verify any auth chunk to be sure */
 			if (auth_skipped && (stcb != NULL)) {
 				struct sctp_auth_chunk *auth;
 
 				auth = (struct sctp_auth_chunk *)
 				    sctp_m_getptr(m, auth_offset,
 				    auth_len, chunk_buf);
 				got_auth = 1;
 				auth_skipped = 0;
 				if ((auth == NULL) || sctp_handle_auth(stcb, auth, m,
 				    auth_offset)) {
 					/* auth HMAC failed so dump it */
 					*offset = length;
 					if (locked_tcb) {
 						SCTP_TCB_UNLOCK(locked_tcb);
 					}
 					return (NULL);
 				} else {
 					/* remaining chunks are HMAC checked */
 					stcb->asoc.authenticated = 1;
 				}
 			}
 		}
 		if (stcb == NULL) {
 			snprintf(msg, sizeof(msg), "OOTB, %s:%d at %s", __FILE__, __LINE__, __func__);
 			op_err = sctp_generate_cause(SCTP_BASE_SYSCTL(sctp_diag_info_code),
 			    msg);
 			/* no association, so it's out of the blue... */
 			sctp_handle_ootb(m, iphlen, *offset, src, dst, sh, inp, op_err,
 			    mflowtype, mflowid, inp->fibnum,
 			    vrf_id, port);
 			*offset = length;
 			if (locked_tcb) {
 				SCTP_TCB_UNLOCK(locked_tcb);
 			}
 			return (NULL);
 		}
 		asoc = &stcb->asoc;
 		/*
 		 * ABORT, SHUTDOWN-COMPLETE and PACKET-DROPPED can use
 		 * either v_tag...
 		 */
 		if ((ch->chunk_type == SCTP_ABORT_ASSOCIATION) ||
 		    (ch->chunk_type == SCTP_SHUTDOWN_COMPLETE) ||
 		    (ch->chunk_type == SCTP_PACKET_DROPPED)) {
 			/* Take the T-bit always into account. */
 			if ((((ch->chunk_flags & SCTP_HAD_NO_TCB) == 0) &&
 			    (vtag_in == asoc->my_vtag)) ||
 			    (((ch->chunk_flags & SCTP_HAD_NO_TCB) == SCTP_HAD_NO_TCB) &&
 			    (vtag_in == asoc->peer_vtag))) {
 				/* this is valid */
 			} else {
 				/* drop this packet... */
 				SCTP_STAT_INCR(sctps_badvtag);
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				return (NULL);
 			}
 		} else if (ch->chunk_type == SCTP_SHUTDOWN_ACK) {
 			if (vtag_in != asoc->my_vtag) {
 				/*
 				 * this could be a stale SHUTDOWN-ACK or the
 				 * peer never got the SHUTDOWN-COMPLETE and
 				 * is still hung; we have started a new asoc
 				 * but it won't complete until the shutdown
 				 * is completed
 				 */
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				snprintf(msg, sizeof(msg), "OOTB, %s:%d at %s", __FILE__, __LINE__, __func__);
 				op_err = sctp_generate_cause(SCTP_BASE_SYSCTL(sctp_diag_info_code),
 				    msg);
 				sctp_handle_ootb(m, iphlen, *offset, src, dst,
 				    sh, inp, op_err,
 				    mflowtype, mflowid, fibnum,
 				    vrf_id, port);
 				return (NULL);
 			}
 		} else {
 			/* for all other chunks, vtag must match */
 			if (vtag_in != asoc->my_vtag) {
 				/* invalid vtag... */
 				SCTPDBG(SCTP_DEBUG_INPUT3,
 				    "invalid vtag: %xh, expect %xh\n",
 				    vtag_in, asoc->my_vtag);
 				SCTP_STAT_INCR(sctps_badvtag);
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				*offset = length;
 				return (NULL);
 			}
 		}
 	}			/* end if !SCTP_COOKIE_ECHO */
 	/*
 	 * process all control chunks...
 	 */
 	if (((ch->chunk_type == SCTP_SELECTIVE_ACK) ||
 	    (ch->chunk_type == SCTP_NR_SELECTIVE_ACK) ||
 	    (ch->chunk_type == SCTP_HEARTBEAT_REQUEST)) &&
 	    (SCTP_GET_STATE(&stcb->asoc) == SCTP_STATE_COOKIE_ECHOED)) {
 		/* implied cookie-ack.. we must have lost the ack */
 		if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_THRESHOLD_LOGGING) {
 			sctp_misc_ints(SCTP_THRESHOLD_CLEAR,
 			    stcb->asoc.overall_error_count,
 			    0,
 			    SCTP_FROM_SCTP_INPUT,
 			    __LINE__);
 		}
 		stcb->asoc.overall_error_count = 0;
 		sctp_handle_cookie_ack((struct sctp_cookie_ack_chunk *)ch, stcb,
 		    *netp);
 	}
 process_control_chunks:
 	while (IS_SCTP_CONTROL(ch)) {
 		/* validate chunk length */
 		chk_length = ntohs(ch->chunk_length);
 		SCTPDBG(SCTP_DEBUG_INPUT2, "sctp_process_control: processing a chunk type=%u, len=%u\n",
 		    ch->chunk_type, chk_length);
 		SCTP_LTRACE_CHK(inp, stcb, ch->chunk_type, chk_length);
 		if (chk_length < sizeof(*ch) ||
 		    (*offset + (int)chk_length) > length) {
 			*offset = length;
 			if (locked_tcb) {
 				SCTP_TCB_UNLOCK(locked_tcb);
 			}
 			return (NULL);
 		}
 		SCTP_STAT_INCR_COUNTER64(sctps_incontrolchunks);
 		/*
 		 * INIT-ACK gets only the init ack "header" portion
 		 * because we don't have to process the peer's COOKIE. All
 		 * others get a complete chunk.
 		 */
 		if ((ch->chunk_type == SCTP_INITIATION_ACK) ||
 		    (ch->chunk_type == SCTP_INITIATION)) {
 			/* get an init-ack chunk */
 			ch = (struct sctp_chunkhdr *)sctp_m_getptr(m, *offset,
 			    sizeof(struct sctp_init_ack_chunk), chunk_buf);
 			if (ch == NULL) {
 				*offset = length;
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				return (NULL);
 			}
 		} else {
 			/* For cookies and all other chunks. */
 			if (chk_length > sizeof(chunk_buf)) {
 				/*
 				 * use just the size of the chunk buffer so
 				 * the front part of our chunks fit in
 				 * contiguous space up to the chunk buffer
 				 * size (508 bytes). For chunks that need to
 				 * get more than that they must use the
 				 * sctp_m_getptr() function or other means
 				 * (e.g. know how to parse mbuf chains).
 				 * Cookies do this already.
 				 */
 				ch = (struct sctp_chunkhdr *)sctp_m_getptr(m, *offset,
 				    (sizeof(chunk_buf) - 4),
 				    chunk_buf);
 				if (ch == NULL) {
 					*offset = length;
 					if (locked_tcb) {
 						SCTP_TCB_UNLOCK(locked_tcb);
 					}
 					return (NULL);
 				}
 			} else {
 				/* We can fit it all */
 				ch = (struct sctp_chunkhdr *)sctp_m_getptr(m, *offset,
 				    chk_length, chunk_buf);
 				if (ch == NULL) {
 					SCTP_PRINTF("sctp_process_control: Can't get all the data....\n");
 					*offset = length;
 					if (locked_tcb) {
 						SCTP_TCB_UNLOCK(locked_tcb);
 					}
 					return (NULL);
 				}
 			}
 		}
 		num_chunks++;
 		/* Save off the last place we got a control from */
 		if (stcb != NULL) {
 			if (((netp != NULL) && (*netp != NULL)) || (ch->chunk_type == SCTP_ASCONF)) {
 				/*
 				 * allow last_control to be NULL if
 				 * ASCONF... ASCONF processing will find the
 				 * right net later
 				 */
 				if ((netp != NULL) && (*netp != NULL))
 					stcb->asoc.last_control_chunk_from = *netp;
 			}
 		}
 #ifdef SCTP_AUDITING_ENABLED
 		sctp_audit_log(0xB0, ch->chunk_type);
 #endif
 
 		/* check to see if this chunk requires auth but isn't authenticated */
 		if ((stcb != NULL) &&
 		    (stcb->asoc.auth_supported == 1) &&
 		    sctp_auth_is_required_chunk(ch->chunk_type, stcb->asoc.local_auth_chunks) &&
 		    !stcb->asoc.authenticated) {
 			/* "silently" ignore */
 			SCTP_STAT_INCR(sctps_recvauthmissing);
 			goto next_chunk;
 		}
 		switch (ch->chunk_type) {
 		case SCTP_INITIATION:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_INIT\n");
 			/* The INIT chunk must be the only chunk. */
 			if ((num_chunks > 1) ||
 			    (length - *offset > (int)SCTP_SIZE32(chk_length))) {
 				/* RFC 4960 requires that no ABORT is sent */
 				*offset = length;
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				return (NULL);
 			}
 			/* Honor our resource limit. */
 			if (chk_length > SCTP_LARGEST_INIT_ACCEPTED) {
 				op_err = sctp_generate_cause(SCTP_CAUSE_OUT_OF_RESC, "");
 				sctp_abort_association(inp, stcb, m, iphlen,
 				    src, dst, sh, op_err,
 				    mflowtype, mflowid,
 				    vrf_id, port);
 				*offset = length;
 				return (NULL);
 			}
 			sctp_handle_init(m, iphlen, *offset, src, dst, sh,
 			    (struct sctp_init_chunk *)ch, inp,
 			    stcb, *netp, &abort_no_unlock,
 			    mflowtype, mflowid,
 			    vrf_id, port);
 			*offset = length;
 			if ((!abort_no_unlock) && (locked_tcb)) {
 				SCTP_TCB_UNLOCK(locked_tcb);
 			}
 			return (NULL);
 			break;
 		case SCTP_PAD_CHUNK:
 			break;
 		case SCTP_INITIATION_ACK:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_INIT-ACK\n");
 			if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE) {
 				/* We are not interested anymore */
 				if ((stcb) && (stcb->asoc.total_output_queue_size)) {
 					;
 				} else {
 					if ((locked_tcb != NULL) && (locked_tcb != stcb)) {
 						/* Very unlikely */
 						SCTP_TCB_UNLOCK(locked_tcb);
 					}
 					*offset = length;
 					if (stcb) {
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 						so = SCTP_INP_SO(inp);
 						atomic_add_int(&stcb->asoc.refcnt, 1);
 						SCTP_TCB_UNLOCK(stcb);
 						SCTP_SOCKET_LOCK(so, 1);
 						SCTP_TCB_LOCK(stcb);
 						atomic_subtract_int(&stcb->asoc.refcnt, 1);
 #endif
 						(void)sctp_free_assoc(inp, stcb, SCTP_NORMAL_PROC,
 						    SCTP_FROM_SCTP_INPUT + SCTP_LOC_29);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 						SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 					}
 					return (NULL);
 				}
 			}
 			/* The INIT-ACK chunk must be the only chunk. */
 			if ((num_chunks > 1) ||
 			    (length - *offset > (int)SCTP_SIZE32(chk_length))) {
 				*offset = length;
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				return (NULL);
 			}
 			if ((netp) && (*netp)) {
 				ret = sctp_handle_init_ack(m, iphlen, *offset,
 				    src, dst, sh,
 				    (struct sctp_init_ack_chunk *)ch,
 				    stcb, *netp,
 				    &abort_no_unlock,
 				    mflowtype, mflowid,
 				    vrf_id);
 			} else {
 				ret = -1;
 			}
 			*offset = length;
 			if (abort_no_unlock) {
 				return (NULL);
 			}
 			/*
 			 * Special case, I must call the output routine to
 			 * get the cookie echoed
 			 */
 			if ((stcb != NULL) && (ret == 0)) {
 				sctp_chunk_output(stcb->sctp_ep, stcb, SCTP_OUTPUT_FROM_CONTROL_PROC, SCTP_SO_NOT_LOCKED);
 			}
 			if (locked_tcb) {
 				SCTP_TCB_UNLOCK(locked_tcb);
 			}
 			return (NULL);
 			break;
 		case SCTP_SELECTIVE_ACK:
 			{
 				struct sctp_sack_chunk *sack;
 				int abort_now = 0;
 				uint32_t a_rwnd, cum_ack;
 				uint16_t num_seg, num_dup;
 				uint8_t flags;
 				int offset_seg, offset_dup;
 
 				SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_SACK\n");
 				SCTP_STAT_INCR(sctps_recvsacks);
 				if (stcb == NULL) {
 					SCTPDBG(SCTP_DEBUG_INDATA1, "No stcb when processing SACK chunk\n");
 					break;
 				}
 				if (chk_length < sizeof(struct sctp_sack_chunk)) {
 					SCTPDBG(SCTP_DEBUG_INDATA1, "Bad size on SACK chunk, too small\n");
 					break;
 				}
 				if (SCTP_GET_STATE(&stcb->asoc) == SCTP_STATE_SHUTDOWN_ACK_SENT) {
 					/*-
 					 * If we have sent a shutdown-ack, we will pay no
 					 * attention to a sack sent in to us since
 					 * we don't care anymore.
 					 */
 					break;
 				}
 				sack = (struct sctp_sack_chunk *)ch;
 				flags = ch->chunk_flags;
 				cum_ack = ntohl(sack->sack.cum_tsn_ack);
 				num_seg = ntohs(sack->sack.num_gap_ack_blks);
 				num_dup = ntohs(sack->sack.num_dup_tsns);
 				a_rwnd = (uint32_t) ntohl(sack->sack.a_rwnd);
 				if (sizeof(struct sctp_sack_chunk) +
 				    num_seg * sizeof(struct sctp_gap_ack_block) +
 				    num_dup * sizeof(uint32_t) != chk_length) {
 					SCTPDBG(SCTP_DEBUG_INDATA1, "Bad size of SACK chunk\n");
 					break;
 				}
 				offset_seg = *offset + sizeof(struct sctp_sack_chunk);
 				offset_dup = offset_seg + num_seg * sizeof(struct sctp_gap_ack_block);
 				SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_SACK process cum_ack:%x num_seg:%d a_rwnd:%d\n",
 				    cum_ack, num_seg, a_rwnd);
 				stcb->asoc.seen_a_sack_this_pkt = 1;
 				if ((stcb->asoc.pr_sctp_cnt == 0) &&
 				    (num_seg == 0) &&
 				    SCTP_TSN_GE(cum_ack, stcb->asoc.last_acked_seq) &&
 				    (stcb->asoc.saw_sack_with_frags == 0) &&
 				    (stcb->asoc.saw_sack_with_nr_frags == 0) &&
 				    (!TAILQ_EMPTY(&stcb->asoc.sent_queue))
 				    ) {
 					/*
 					 * We have a SIMPLE sack having no
 					 * prior segments and data on sent
 					 * queue to be acked. Use the
 					 * faster path sack processing. We
 					 * also allow window update sacks
 					 * with no missing segments to go
 					 * this way too.
 					 */
 					sctp_express_handle_sack(stcb, cum_ack, a_rwnd, &abort_now, ecne_seen);
 				} else {
 					if (netp && *netp)
 						sctp_handle_sack(m, offset_seg, offset_dup, stcb,
 						    num_seg, 0, num_dup, &abort_now, flags,
 						    cum_ack, a_rwnd, ecne_seen);
 				}
 				if (abort_now) {
 					/* ABORT signal from sack processing */
 					*offset = length;
 					return (NULL);
 				}
 				if (TAILQ_EMPTY(&stcb->asoc.send_queue) &&
 				    TAILQ_EMPTY(&stcb->asoc.sent_queue) &&
 				    (stcb->asoc.stream_queue_cnt == 0)) {
 					sctp_ulp_notify(SCTP_NOTIFY_SENDER_DRY, stcb, 0, NULL, SCTP_SO_NOT_LOCKED);
 				}
 			}
 			break;
 			/*
 			 * EY - nr_sack:  If the received chunk is an
 			 * nr_sack chunk
 			 */
 		case SCTP_NR_SELECTIVE_ACK:
 			{
 				struct sctp_nr_sack_chunk *nr_sack;
 				int abort_now = 0;
 				uint32_t a_rwnd, cum_ack;
 				uint16_t num_seg, num_nr_seg, num_dup;
 				uint8_t flags;
 				int offset_seg, offset_dup;
 
 				SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_NR_SACK\n");
 				SCTP_STAT_INCR(sctps_recvsacks);
 				if (stcb == NULL) {
 					SCTPDBG(SCTP_DEBUG_INDATA1, "No stcb when processing NR-SACK chunk\n");
 					break;
 				}
 				if (stcb->asoc.nrsack_supported == 0) {
 					goto unknown_chunk;
 				}
 				if (chk_length < sizeof(struct sctp_nr_sack_chunk)) {
 					SCTPDBG(SCTP_DEBUG_INDATA1, "Bad size on NR-SACK chunk, too small\n");
 					break;
 				}
 				if (SCTP_GET_STATE(&stcb->asoc) == SCTP_STATE_SHUTDOWN_ACK_SENT) {
 					/*-
 					 * If we have sent a shutdown-ack, we will pay no
 					 * attention to a sack sent in to us since
 					 * we don't care anymore.
 					 */
 					break;
 				}
 				nr_sack = (struct sctp_nr_sack_chunk *)ch;
 				flags = ch->chunk_flags;
 				cum_ack = ntohl(nr_sack->nr_sack.cum_tsn_ack);
 				num_seg = ntohs(nr_sack->nr_sack.num_gap_ack_blks);
 				num_nr_seg = ntohs(nr_sack->nr_sack.num_nr_gap_ack_blks);
 				num_dup = ntohs(nr_sack->nr_sack.num_dup_tsns);
 				a_rwnd = (uint32_t) ntohl(nr_sack->nr_sack.a_rwnd);
 				if (sizeof(struct sctp_nr_sack_chunk) +
 				    (num_seg + num_nr_seg) * sizeof(struct sctp_gap_ack_block) +
 				    num_dup * sizeof(uint32_t) != chk_length) {
 					SCTPDBG(SCTP_DEBUG_INDATA1, "Bad size of NR_SACK chunk\n");
 					break;
 				}
 				offset_seg = *offset + sizeof(struct sctp_nr_sack_chunk);
 				offset_dup = offset_seg + num_seg * sizeof(struct sctp_gap_ack_block);
 				SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_NR_SACK process cum_ack:%x num_seg:%d a_rwnd:%d\n",
 				    cum_ack, num_seg, a_rwnd);
 				stcb->asoc.seen_a_sack_this_pkt = 1;
 				if ((stcb->asoc.pr_sctp_cnt == 0) &&
 				    (num_seg == 0) && (num_nr_seg == 0) &&
 				    SCTP_TSN_GE(cum_ack, stcb->asoc.last_acked_seq) &&
 				    (stcb->asoc.saw_sack_with_frags == 0) &&
 				    (stcb->asoc.saw_sack_with_nr_frags == 0) &&
 				    (!TAILQ_EMPTY(&stcb->asoc.sent_queue))) {
 					/*
 					 * We have a SIMPLE sack having no
 					 * prior segments and data on sent
 					 * queue to be acked. Use the faster
 					 * path sack processing. We also
 					 * allow window update sacks with no
 					 * missing segments to go this way
 					 * too.
 					 */
 					sctp_express_handle_sack(stcb, cum_ack, a_rwnd,
 					    &abort_now, ecne_seen);
 				} else {
 					if (netp && *netp)
 						sctp_handle_sack(m, offset_seg, offset_dup, stcb,
 						    num_seg, num_nr_seg, num_dup, &abort_now, flags,
 						    cum_ack, a_rwnd, ecne_seen);
 				}
 				if (abort_now) {
 					/* ABORT signal from sack processing */
 					*offset = length;
 					return (NULL);
 				}
 				if (TAILQ_EMPTY(&stcb->asoc.send_queue) &&
 				    TAILQ_EMPTY(&stcb->asoc.sent_queue) &&
 				    (stcb->asoc.stream_queue_cnt == 0)) {
 					sctp_ulp_notify(SCTP_NOTIFY_SENDER_DRY, stcb, 0, NULL, SCTP_SO_NOT_LOCKED);
 				}
 			}
 			break;
 
 		case SCTP_HEARTBEAT_REQUEST:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_HEARTBEAT\n");
 			if ((stcb) && netp && *netp) {
 				SCTP_STAT_INCR(sctps_recvheartbeat);
 				sctp_send_heartbeat_ack(stcb, m, *offset,
 				    chk_length, *netp);
 
 				/* He's alive so give him credit */
 				if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_THRESHOLD_LOGGING) {
 					sctp_misc_ints(SCTP_THRESHOLD_CLEAR,
 					    stcb->asoc.overall_error_count,
 					    0,
 					    SCTP_FROM_SCTP_INPUT,
 					    __LINE__);
 				}
 				stcb->asoc.overall_error_count = 0;
 			}
 			break;
 		case SCTP_HEARTBEAT_ACK:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_HEARTBEAT-ACK\n");
 			if ((stcb == NULL) || (chk_length != sizeof(struct sctp_heartbeat_chunk))) {
 				/* It's not ours */
 				*offset = length;
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				return (NULL);
 			}
 			/* He's alive so give him credit */
 			if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_THRESHOLD_LOGGING) {
 				sctp_misc_ints(SCTP_THRESHOLD_CLEAR,
 				    stcb->asoc.overall_error_count,
 				    0,
 				    SCTP_FROM_SCTP_INPUT,
 				    __LINE__);
 			}
 			stcb->asoc.overall_error_count = 0;
 			SCTP_STAT_INCR(sctps_recvheartbeatack);
 			if (netp && *netp)
 				sctp_handle_heartbeat_ack((struct sctp_heartbeat_chunk *)ch,
 				    stcb, *netp);
 			break;
 		case SCTP_ABORT_ASSOCIATION:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_ABORT, stcb %p\n",
 			    (void *)stcb);
 			if ((stcb) && netp && *netp)
 				sctp_handle_abort((struct sctp_abort_chunk *)ch,
 				    stcb, *netp);
 			*offset = length;
 			return (NULL);
 			break;
 		case SCTP_SHUTDOWN:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_SHUTDOWN, stcb %p\n",
 			    (void *)stcb);
 			if ((stcb == NULL) || (chk_length != sizeof(struct sctp_shutdown_chunk))) {
 				*offset = length;
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				return (NULL);
 			}
 			if (netp && *netp) {
 				int abort_flag = 0;
 
 				sctp_handle_shutdown((struct sctp_shutdown_chunk *)ch,
 				    stcb, *netp, &abort_flag);
 				if (abort_flag) {
 					*offset = length;
 					return (NULL);
 				}
 			}
 			break;
 		case SCTP_SHUTDOWN_ACK:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_SHUTDOWN-ACK, stcb %p\n", (void *)stcb);
 			if ((stcb) && (netp) && (*netp))
 				sctp_handle_shutdown_ack((struct sctp_shutdown_ack_chunk *)ch, stcb, *netp);
 			*offset = length;
 			return (NULL);
 			break;
 
 		case SCTP_OPERATION_ERROR:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_OP-ERR\n");
 			if ((stcb) && netp && *netp && sctp_handle_error(ch, stcb, *netp) < 0) {
 				*offset = length;
 				return (NULL);
 			}
 			break;
 		case SCTP_COOKIE_ECHO:
 			SCTPDBG(SCTP_DEBUG_INPUT3,
 			    "SCTP_COOKIE-ECHO, stcb %p\n", (void *)stcb);
 			if ((stcb) && (stcb->asoc.total_output_queue_size)) {
 				;
 			} else {
 				if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE) {
 					/* We are not interested anymore */
 			abend:
 					if (stcb) {
 						SCTP_TCB_UNLOCK(stcb);
 					}
 					*offset = length;
 					return (NULL);
 				}
 			}
 			/*
 			 * First, are we accepting? We check this again here
 			 * since it is possible that a previous endpoint WAS
 			 * listening, responded with an INIT-ACK, and then
 			 * closed. We opened and bound, and are now no
 			 * longer listening.
 			 */
 
 			if ((stcb == NULL) && (inp->sctp_socket->so_qlen >= inp->sctp_socket->so_qlimit)) {
 				if ((inp->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) &&
 				    (SCTP_BASE_SYSCTL(sctp_abort_if_one_2_one_hits_limit))) {
 					op_err = sctp_generate_cause(SCTP_CAUSE_OUT_OF_RESC, "");
 					sctp_abort_association(inp, stcb, m, iphlen,
 					    src, dst, sh, op_err,
 					    mflowtype, mflowid,
 					    vrf_id, port);
 				}
 				*offset = length;
 				return (NULL);
 			} else {
 				struct mbuf *ret_buf;
 				struct sctp_inpcb *linp;
 
 				if (stcb) {
 					linp = NULL;
 				} else {
 					linp = inp;
 				}
 
 				if (linp) {
 					SCTP_ASOC_CREATE_LOCK(linp);
 					if ((inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE) ||
 					    (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE)) {
 						SCTP_ASOC_CREATE_UNLOCK(linp);
 						goto abend;
 					}
 				}
 				if (netp) {
 					ret_buf =
 					    sctp_handle_cookie_echo(m, iphlen,
 					    *offset,
 					    src, dst,
 					    sh,
 					    (struct sctp_cookie_echo_chunk *)ch,
 					    &inp, &stcb, netp,
 					    auth_skipped,
 					    auth_offset,
 					    auth_len,
 					    &locked_tcb,
 					    mflowtype,
 					    mflowid,
 					    vrf_id,
 					    port);
 				} else {
 					ret_buf = NULL;
 				}
 				if (linp) {
 					SCTP_ASOC_CREATE_UNLOCK(linp);
 				}
 				if (ret_buf == NULL) {
 					if (locked_tcb) {
 						SCTP_TCB_UNLOCK(locked_tcb);
 					}
 					SCTPDBG(SCTP_DEBUG_INPUT3,
 					    "GAK, null buffer\n");
 					*offset = length;
 					return (NULL);
 				}
 				/* if AUTH skipped, see if it verified... */
 				if (auth_skipped) {
 					got_auth = 1;
 					auth_skipped = 0;
 				}
 				if (!TAILQ_EMPTY(&stcb->asoc.sent_queue)) {
 					/*
 					 * Restart the timer if we have
 					 * pending data
 					 */
 					struct sctp_tmit_chunk *chk;
 
 					chk = TAILQ_FIRST(&stcb->asoc.sent_queue);
 					sctp_timer_start(SCTP_TIMER_TYPE_SEND, stcb->sctp_ep, stcb, chk->whoTo);
 				}
 			}
 			break;
 		case SCTP_COOKIE_ACK:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_COOKIE-ACK, stcb %p\n", (void *)stcb);
 			if ((stcb == NULL) || chk_length != sizeof(struct sctp_cookie_ack_chunk)) {
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				return (NULL);
 			}
 			if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE) {
 				/* We are not interested anymore */
 				if ((stcb) && (stcb->asoc.total_output_queue_size)) {
 					;
 				} else if (stcb) {
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 					so = SCTP_INP_SO(inp);
 					atomic_add_int(&stcb->asoc.refcnt, 1);
 					SCTP_TCB_UNLOCK(stcb);
 					SCTP_SOCKET_LOCK(so, 1);
 					SCTP_TCB_LOCK(stcb);
 					atomic_subtract_int(&stcb->asoc.refcnt, 1);
 #endif
 					(void)sctp_free_assoc(inp, stcb, SCTP_NORMAL_PROC,
 					    SCTP_FROM_SCTP_INPUT + SCTP_LOC_30);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 					SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 					*offset = length;
 					return (NULL);
 				}
 			}
 			/* He's alive so give him credit */
 			if ((stcb) && netp && *netp) {
 				if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_THRESHOLD_LOGGING) {
 					sctp_misc_ints(SCTP_THRESHOLD_CLEAR,
 					    stcb->asoc.overall_error_count,
 					    0,
 					    SCTP_FROM_SCTP_INPUT,
 					    __LINE__);
 				}
 				stcb->asoc.overall_error_count = 0;
 				sctp_handle_cookie_ack((struct sctp_cookie_ack_chunk *)ch, stcb, *netp);
 			}
 			break;
 		case SCTP_ECN_ECHO:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_ECN-ECHO\n");
 			/* He's alive so give him credit */
 			if ((stcb == NULL) || (chk_length != sizeof(struct sctp_ecne_chunk))) {
 				/* It's not ours */
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				*offset = length;
 				return (NULL);
 			}
 			if (stcb) {
 				if (stcb->asoc.ecn_supported == 0) {
 					goto unknown_chunk;
 				}
 				if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_THRESHOLD_LOGGING) {
 					sctp_misc_ints(SCTP_THRESHOLD_CLEAR,
 					    stcb->asoc.overall_error_count,
 					    0,
 					    SCTP_FROM_SCTP_INPUT,
 					    __LINE__);
 				}
 				stcb->asoc.overall_error_count = 0;
 				sctp_handle_ecn_echo((struct sctp_ecne_chunk *)ch,
 				    stcb);
 				ecne_seen = 1;
 			}
 			break;
 		case SCTP_ECN_CWR:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_ECN-CWR\n");
 			/* He's alive so give him credit */
 			if ((stcb == NULL) || (chk_length != sizeof(struct sctp_cwr_chunk))) {
 				/* It's not ours */
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				*offset = length;
 				return (NULL);
 			}
 			if (stcb) {
 				if (stcb->asoc.ecn_supported == 0) {
 					goto unknown_chunk;
 				}
 				if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_THRESHOLD_LOGGING) {
 					sctp_misc_ints(SCTP_THRESHOLD_CLEAR,
 					    stcb->asoc.overall_error_count,
 					    0,
 					    SCTP_FROM_SCTP_INPUT,
 					    __LINE__);
 				}
 				stcb->asoc.overall_error_count = 0;
 				sctp_handle_ecn_cwr((struct sctp_cwr_chunk *)ch, stcb, *netp);
 			}
 			break;
 		case SCTP_SHUTDOWN_COMPLETE:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_SHUTDOWN-COMPLETE, stcb %p\n", (void *)stcb);
 			/* must be first and only chunk */
 			if ((num_chunks > 1) ||
 			    (length - *offset > (int)SCTP_SIZE32(chk_length))) {
 				*offset = length;
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				return (NULL);
 			}
 			if ((stcb) && netp && *netp) {
 				sctp_handle_shutdown_complete((struct sctp_shutdown_complete_chunk *)ch,
 				    stcb, *netp);
 			}
 			*offset = length;
 			return (NULL);
 			break;
 		case SCTP_ASCONF:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_ASCONF\n");
 			/* He's alive so give him credit */
 			if (stcb) {
 				if (stcb->asoc.asconf_supported == 0) {
 					goto unknown_chunk;
 				}
 				if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_THRESHOLD_LOGGING) {
 					sctp_misc_ints(SCTP_THRESHOLD_CLEAR,
 					    stcb->asoc.overall_error_count,
 					    0,
 					    SCTP_FROM_SCTP_INPUT,
 					    __LINE__);
 				}
 				stcb->asoc.overall_error_count = 0;
 				sctp_handle_asconf(m, *offset, src,
 				    (struct sctp_asconf_chunk *)ch, stcb, asconf_cnt == 0);
 				asconf_cnt++;
 			}
 			break;
 		case SCTP_ASCONF_ACK:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_ASCONF-ACK\n");
 			if (chk_length < sizeof(struct sctp_asconf_ack_chunk)) {
 				/* It's not ours */
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				*offset = length;
 				return (NULL);
 			}
 			if ((stcb) && netp && *netp) {
 				if (stcb->asoc.asconf_supported == 0) {
 					goto unknown_chunk;
 				}
 				/* He's alive so give him credit */
 				if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_THRESHOLD_LOGGING) {
 					sctp_misc_ints(SCTP_THRESHOLD_CLEAR,
 					    stcb->asoc.overall_error_count,
 					    0,
 					    SCTP_FROM_SCTP_INPUT,
 					    __LINE__);
 				}
 				stcb->asoc.overall_error_count = 0;
 				sctp_handle_asconf_ack(m, *offset,
 				    (struct sctp_asconf_ack_chunk *)ch, stcb, *netp, &abort_no_unlock);
 				if (abort_no_unlock)
 					return (NULL);
 			}
 			break;
 		case SCTP_FORWARD_CUM_TSN:
 		case SCTP_IFORWARD_CUM_TSN:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_FWD-TSN\n");
 			if (chk_length < sizeof(struct sctp_forward_tsn_chunk)) {
 				/* It's not ours */
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				*offset = length;
 				return (NULL);
 			}
 			/* He's alive so give him credit */
 			if (stcb) {
 				int abort_flag = 0;
 
 				if (stcb->asoc.prsctp_supported == 0) {
 					goto unknown_chunk;
 				}
 				stcb->asoc.overall_error_count = 0;
 				if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_THRESHOLD_LOGGING) {
 					sctp_misc_ints(SCTP_THRESHOLD_CLEAR,
 					    stcb->asoc.overall_error_count,
 					    0,
 					    SCTP_FROM_SCTP_INPUT,
 					    __LINE__);
 				}
 				*fwd_tsn_seen = 1;
 				if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE) {
 					/* We are not interested anymore */
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 					so = SCTP_INP_SO(inp);
 					atomic_add_int(&stcb->asoc.refcnt, 1);
 					SCTP_TCB_UNLOCK(stcb);
 					SCTP_SOCKET_LOCK(so, 1);
 					SCTP_TCB_LOCK(stcb);
 					atomic_subtract_int(&stcb->asoc.refcnt, 1);
 #endif
 					(void)sctp_free_assoc(inp, stcb, SCTP_NORMAL_PROC,
 					    SCTP_FROM_SCTP_INPUT + SCTP_LOC_31);
 #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING)
 					SCTP_SOCKET_UNLOCK(so, 1);
 #endif
 					*offset = length;
 					return (NULL);
 				}
 				sctp_handle_forward_tsn(stcb,
 				    (struct sctp_forward_tsn_chunk *)ch, &abort_flag, m, *offset);
 				if (abort_flag) {
 					*offset = length;
 					return (NULL);
 				} else {
 					if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_THRESHOLD_LOGGING) {
 						sctp_misc_ints(SCTP_THRESHOLD_CLEAR,
 						    stcb->asoc.overall_error_count,
 						    0,
 						    SCTP_FROM_SCTP_INPUT,
 						    __LINE__);
 					}
 					stcb->asoc.overall_error_count = 0;
 				}
 
 			}
 			break;
 		case SCTP_STREAM_RESET:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_STREAM_RESET\n");
 			if (((stcb == NULL) || (ch == NULL) || (chk_length < sizeof(struct sctp_stream_reset_tsn_req)))) {
 				/* It's not ours */
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				*offset = length;
 				return (NULL);
 			}
 			if (stcb->asoc.reconfig_supported == 0) {
 				goto unknown_chunk;
 			}
 			if (sctp_handle_stream_reset(stcb, m, *offset, ch)) {
 				/* stop processing */
 				*offset = length;
 				return (NULL);
 			}
 			break;
 		case SCTP_PACKET_DROPPED:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_PACKET_DROPPED\n");
 			/* re-get it all please */
 			if (chk_length < sizeof(struct sctp_pktdrop_chunk)) {
 				/* It's not ours */
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				*offset = length;
 				return (NULL);
 			}
 			if (ch && (stcb) && netp && (*netp)) {
 				if (stcb->asoc.pktdrop_supported == 0) {
 					goto unknown_chunk;
 				}
 				sctp_handle_packet_dropped((struct sctp_pktdrop_chunk *)ch,
 				    stcb, *netp,
 				    min(chk_length, (sizeof(chunk_buf) - 4)));
 
 			}
 			break;
 		case SCTP_AUTHENTICATION:
 			SCTPDBG(SCTP_DEBUG_INPUT3, "SCTP_AUTHENTICATION\n");
 			if (stcb == NULL) {
 				/* save the first AUTH for later processing */
 				if (auth_skipped == 0) {
 					auth_offset = *offset;
 					auth_len = chk_length;
 					auth_skipped = 1;
 				}
 				/* skip this chunk (temporarily) */
 				goto next_chunk;
 			}
 			if (stcb->asoc.auth_supported == 0) {
 				goto unknown_chunk;
 			}
 			if ((chk_length < (sizeof(struct sctp_auth_chunk))) ||
 			    (chk_length > (sizeof(struct sctp_auth_chunk) +
 			    SCTP_AUTH_DIGEST_LEN_MAX))) {
 				/* It's not ours */
 				if (locked_tcb) {
 					SCTP_TCB_UNLOCK(locked_tcb);
 				}
 				*offset = length;
 				return (NULL);
 			}
 			if (got_auth == 1) {
 				/* skip this chunk... it's already auth'd */
 				goto next_chunk;
 			}
 			got_auth = 1;
 			if ((ch == NULL) || sctp_handle_auth(stcb, (struct sctp_auth_chunk *)ch,
 			    m, *offset)) {
 				/* auth HMAC failed so dump the packet */
 				*offset = length;
 				return (stcb);
 			} else {
 				/* remaining chunks are HMAC checked */
 				stcb->asoc.authenticated = 1;
 			}
 			break;
 
 		default:
 	unknown_chunk:
 			/* it's an unknown chunk! */
 			if ((ch->chunk_type & 0x40) && (stcb != NULL)) {
 				struct sctp_gen_error_cause *cause;
 				int len;
 
 				op_err = sctp_get_mbuf_for_msg(sizeof(struct sctp_gen_error_cause),
 				    0, M_NOWAIT, 1, MT_DATA);
 				if (op_err != NULL) {
 					len = min(SCTP_SIZE32(chk_length), (uint32_t) (length - *offset));
 					cause = mtod(op_err, struct sctp_gen_error_cause *);
 					cause->code = htons(SCTP_CAUSE_UNRECOG_CHUNK);
 					cause->length = htons((uint16_t) (len + sizeof(struct sctp_gen_error_cause)));
 					SCTP_BUF_LEN(op_err) = sizeof(struct sctp_gen_error_cause);
 					SCTP_BUF_NEXT(op_err) = SCTP_M_COPYM(m, *offset, len, M_NOWAIT);
 					if (SCTP_BUF_NEXT(op_err) != NULL) {
 #ifdef SCTP_MBUF_LOGGING
 						if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_MBUF_LOGGING_ENABLE) {
 							sctp_log_mbc(SCTP_BUF_NEXT(op_err), SCTP_MBUF_ICOPY);
 						}
 #endif
 						sctp_queue_op_err(stcb, op_err);
 					} else {
 						sctp_m_freem(op_err);
 					}
 				}
 			}
 			if ((ch->chunk_type & 0x80) == 0) {
 				/* discard this packet */
 				*offset = length;
 				return (stcb);
 			}	/* else skip this bad chunk and continue... */
 			break;
 		}		/* switch (ch->chunk_type) */
 
 
 next_chunk:
 		/* get the next chunk */
 		*offset += SCTP_SIZE32(chk_length);
 		if (*offset >= length) {
 			/* no more data left in the mbuf chain */
 			break;
 		}
 		ch = (struct sctp_chunkhdr *)sctp_m_getptr(m, *offset,
 		    sizeof(struct sctp_chunkhdr), chunk_buf);
 		if (ch == NULL) {
 			if (locked_tcb) {
 				SCTP_TCB_UNLOCK(locked_tcb);
 			}
 			*offset = length;
 			return (NULL);
 		}
 	}			/* while */
 
 	if (asconf_cnt > 0 && stcb != NULL) {
 		sctp_send_asconf_ack(stcb);
 	}
 	return (stcb);
 }
 
 
 /*
  * common input chunk processing (v4 and v6)
  */
 void
 sctp_common_input_processing(struct mbuf **mm, int iphlen, int offset, int length,
     struct sockaddr *src, struct sockaddr *dst,
     struct sctphdr *sh, struct sctp_chunkhdr *ch,
 #if !defined(SCTP_WITH_NO_CSUM)
     uint8_t compute_crc,
 #endif
     uint8_t ecn_bits,
     uint8_t mflowtype, uint32_t mflowid, uint16_t fibnum,
     uint32_t vrf_id, uint16_t port)
 {
 	uint32_t high_tsn;
 	int fwd_tsn_seen = 0, data_processed = 0;
 	struct mbuf *m = *mm, *op_err;
 	char msg[SCTP_DIAG_INFO_LEN];
 	int un_sent;
 	int cnt_ctrl_ready = 0;
 	struct sctp_inpcb *inp = NULL, *inp_decr = NULL;
 	struct sctp_tcb *stcb = NULL;
 	struct sctp_nets *net = NULL;
 
 	SCTP_STAT_INCR(sctps_recvdatagrams);
 #ifdef SCTP_AUDITING_ENABLED
 	sctp_audit_log(0xE0, 1);
 	sctp_auditing(0, inp, stcb, net);
 #endif
 #if !defined(SCTP_WITH_NO_CSUM)
 	if (compute_crc != 0) {
 		uint32_t check, calc_check;
 
 		check = sh->checksum;
 		sh->checksum = 0;
 		calc_check = sctp_calculate_cksum(m, iphlen);
 		sh->checksum = check;
 		if (calc_check != check) {
 			SCTPDBG(SCTP_DEBUG_INPUT1, "Bad CSUM on SCTP packet calc_check:%x check:%x  m:%p mlen:%d iphlen:%d\n",
 			    calc_check, check, (void *)m, length, iphlen);
 			stcb = sctp_findassociation_addr(m, offset, src, dst,
 			    sh, ch, &inp, &net, vrf_id);
 #if defined(INET) || defined(INET6)
 			if ((ch->chunk_type != SCTP_INITIATION) &&
 			    (net != NULL) && (net->port != port)) {
 				if (net->port == 0) {
 					/* UDP encapsulation turned on. */
 					net->mtu -= sizeof(struct udphdr);
 					if (stcb->asoc.smallest_mtu > net->mtu) {
 						sctp_pathmtu_adjustment(stcb, net->mtu);
 					}
 				} else if (port == 0) {
 					/* UDP encapsulation turned off. */
 					net->mtu += sizeof(struct udphdr);
 					/* XXX Update smallest_mtu */
 				}
 				net->port = port;
 			}
 #endif
 			if (net != NULL) {
 				net->flowtype = mflowtype;
 				net->flowid = mflowid;
 			}
 			if ((inp != NULL) && (stcb != NULL)) {
 				sctp_send_packet_dropped(stcb, net, m, length, iphlen, 1);
 				sctp_chunk_output(inp, stcb, SCTP_OUTPUT_FROM_INPUT_ERROR, SCTP_SO_NOT_LOCKED);
 			} else if ((inp != NULL) && (stcb == NULL)) {
 				inp_decr = inp;
 			}
 			SCTP_STAT_INCR(sctps_badsum);
 			SCTP_STAT_INCR_COUNTER32(sctps_checksumerrors);
 			goto out;
 		}
 	}
 #endif
 	/* Destination port of 0 is illegal, based on RFC4960. */
 	if (sh->dest_port == 0) {
 		SCTP_STAT_INCR(sctps_hdrops);
 		goto out;
 	}
 	stcb = sctp_findassociation_addr(m, offset, src, dst,
 	    sh, ch, &inp, &net, vrf_id);
 #if defined(INET) || defined(INET6)
 	if ((ch->chunk_type != SCTP_INITIATION) &&
 	    (net != NULL) && (net->port != port)) {
 		if (net->port == 0) {
 			/* UDP encapsulation turned on. */
 			net->mtu -= sizeof(struct udphdr);
 			if (stcb->asoc.smallest_mtu > net->mtu) {
 				sctp_pathmtu_adjustment(stcb, net->mtu);
 			}
 		} else if (port == 0) {
 			/* UDP encapsulation turned off. */
 			net->mtu += sizeof(struct udphdr);
 			/* XXX Update smallest_mtu */
 		}
 		net->port = port;
 	}
 #endif
 	if (net != NULL) {
 		net->flowtype = mflowtype;
 		net->flowid = mflowid;
 	}
 	if (inp == NULL) {
 		SCTP_STAT_INCR(sctps_noport);
 		if (badport_bandlim(BANDLIM_SCTP_OOTB) < 0) {
 			goto out;
 		}
 		if (ch->chunk_type == SCTP_SHUTDOWN_ACK) {
 			sctp_send_shutdown_complete2(src, dst, sh,
 			    mflowtype, mflowid, fibnum,
 			    vrf_id, port);
 			goto out;
 		}
 		if (ch->chunk_type == SCTP_SHUTDOWN_COMPLETE) {
 			goto out;
 		}
 		if (ch->chunk_type != SCTP_ABORT_ASSOCIATION) {
 			if ((SCTP_BASE_SYSCTL(sctp_blackhole) == 0) ||
 			    ((SCTP_BASE_SYSCTL(sctp_blackhole) == 1) &&
 			    (ch->chunk_type != SCTP_INIT))) {
 				op_err = sctp_generate_cause(SCTP_BASE_SYSCTL(sctp_diag_info_code),
 				    "Out of the blue");
 				sctp_send_abort(m, iphlen, src, dst,
 				    sh, 0, op_err,
 				    mflowtype, mflowid, fibnum,
 				    vrf_id, port);
 			}
 		}
 		goto out;
 	} else if (stcb == NULL) {
 		inp_decr = inp;
 	}
 #ifdef IPSEC
 	/*-
 	 * I very much doubt any of the IPSEC stuff will work but I have no
 	 * idea, so I will leave it in place.
 	 */
 	if (inp != NULL) {
 		switch (dst->sa_family) {
 #ifdef INET
 		case AF_INET:
 			if (ipsec4_in_reject(m, &inp->ip_inp.inp)) {
 				SCTP_STAT_INCR(sctps_hdrops);
 				goto out;
 			}
 			break;
 #endif
 #ifdef INET6
 		case AF_INET6:
 			if (ipsec6_in_reject(m, &inp->ip_inp.inp)) {
 				SCTP_STAT_INCR(sctps_hdrops);
 				goto out;
 			}
 			break;
 #endif
 		default:
 			break;
 		}
 	}
 #endif
 	SCTPDBG(SCTP_DEBUG_INPUT1, "Ok, Common input processing called, m:%p iphlen:%d offset:%d length:%d stcb:%p\n",
 	    (void *)m, iphlen, offset, length, (void *)stcb);
 	if (stcb) {
 		/* always clear this before beginning a packet */
 		stcb->asoc.authenticated = 0;
 		stcb->asoc.seen_a_sack_this_pkt = 0;
 		SCTPDBG(SCTP_DEBUG_INPUT1, "stcb:%p state:%x\n",
 		    (void *)stcb, stcb->asoc.state);
 
 		if ((stcb->asoc.state & SCTP_STATE_WAS_ABORTED) ||
 		    (stcb->asoc.state & SCTP_STATE_ABOUT_TO_BE_FREED)) {
 			/*-
 			 * If we hit here, we had a ref count
 			 * up when the assoc was aborted and the
 			 * timer is clearing out the assoc, we should
 			 * NOT respond to any packet; it's OOTB.
 			 */
 			SCTP_TCB_UNLOCK(stcb);
 			stcb = NULL;
 			snprintf(msg, sizeof(msg), "OOTB, %s:%d at %s", __FILE__, __LINE__, __func__);
 			op_err = sctp_generate_cause(SCTP_BASE_SYSCTL(sctp_diag_info_code),
 			    msg);
 			sctp_handle_ootb(m, iphlen, offset, src, dst, sh, inp, op_err,
 			    mflowtype, mflowid, inp->fibnum,
 			    vrf_id, port);
 			goto out;
 		}
 	}
 	if (IS_SCTP_CONTROL(ch)) {
 		/* process the control portion of the SCTP packet */
 		/* sa_ignore NO_NULL_CHK */
 		stcb = sctp_process_control(m, iphlen, &offset, length,
 		    src, dst, sh, ch,
 		    inp, stcb, &net, &fwd_tsn_seen,
 		    mflowtype, mflowid, fibnum,
 		    vrf_id, port);
 		if (stcb) {
 			/*
 			 * This covers us if the cookie-echo was there and
 			 * it changes our INP.
 			 */
 			inp = stcb->sctp_ep;
 #if defined(INET) || defined(INET6)
 			if ((ch->chunk_type != SCTP_INITIATION) &&
 			    (net != NULL) && (net->port != port)) {
 				if (net->port == 0) {
 					/* UDP encapsulation turned on. */
 					net->mtu -= sizeof(struct udphdr);
 					if (stcb->asoc.smallest_mtu > net->mtu) {
 						sctp_pathmtu_adjustment(stcb, net->mtu);
 					}
 				} else if (port == 0) {
 					/* UDP encapsulation turned off. */
 					net->mtu += sizeof(struct udphdr);
 					/* XXX Update smallest_mtu */
 				}
 				net->port = port;
 			}
 #endif
 		}
 	} else {
 		/*
 		 * no control chunks, so pre-process DATA chunks (these
 		 * checks are taken care of by control processing)
 		 */
 
 		/*
 		 * if DATA only packet, and auth is required, then punt...
 		 * can't have authenticated without any AUTH (control)
 		 * chunks
 		 */
 		if ((stcb != NULL) &&
 		    (stcb->asoc.auth_supported == 1) &&
 		    sctp_auth_is_required_chunk(SCTP_DATA, stcb->asoc.local_auth_chunks)) {
 			/* "silently" ignore */
 			SCTP_STAT_INCR(sctps_recvauthmissing);
 			goto out;
 		}
 		if (stcb == NULL) {
 			/* out of the blue DATA chunk */
 			snprintf(msg, sizeof(msg), "OOTB, %s:%d at %s", __FILE__, __LINE__, __func__);
 			op_err = sctp_generate_cause(SCTP_BASE_SYSCTL(sctp_diag_info_code),
 			    msg);
 			sctp_handle_ootb(m, iphlen, offset, src, dst, sh, inp, op_err,
 			    mflowtype, mflowid, fibnum,
 			    vrf_id, port);
 			goto out;
 		}
 		if (stcb->asoc.my_vtag != ntohl(sh->v_tag)) {
 			/* v_tag mismatch! */
 			SCTP_STAT_INCR(sctps_badvtag);
 			goto out;
 		}
 	}
 
 	if (stcb == NULL) {
 		/*
 		 * no valid TCB for this packet, or we found it's a bad
 		 * packet while processing control, or we're done with this
 		 * packet (done or skip rest of data), so we drop it...
 		 */
 		goto out;
 	}
 	/*
 	 * DATA chunk processing
 	 */
 	/* plow through the data chunks while length > offset */
 
 	/*
 	 * Rest should be DATA only.  Check authentication state if AUTH for
 	 * DATA is required.
 	 */
 	if ((length > offset) &&
 	    (stcb != NULL) &&
 	    (stcb->asoc.auth_supported == 1) &&
 	    sctp_auth_is_required_chunk(SCTP_DATA, stcb->asoc.local_auth_chunks) &&
 	    !stcb->asoc.authenticated) {
 		/* "silently" ignore */
 		SCTP_STAT_INCR(sctps_recvauthmissing);
 		SCTPDBG(SCTP_DEBUG_AUTH1,
 		    "Data chunk requires AUTH, skipped\n");
 		goto trigger_send;
 	}
 	if (length > offset) {
 		int retval;
 
 		/*
 		 * First check to make sure our state is correct. We would
 		 * not get here unless we really did have a tag, so we don't
 		 * abort if this happens, just dump the chunk silently.
 		 */
 		switch (SCTP_GET_STATE(&stcb->asoc)) {
 		case SCTP_STATE_COOKIE_ECHOED:
 			/*
 			 * We consider data with valid tags in this state
 			 * to show that the cookie-ack was lost. Imply it
 			 * was there.
 			 */
 			if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_THRESHOLD_LOGGING) {
 				sctp_misc_ints(SCTP_THRESHOLD_CLEAR,
 				    stcb->asoc.overall_error_count,
 				    0,
 				    SCTP_FROM_SCTP_INPUT,
 				    __LINE__);
 			}
 			stcb->asoc.overall_error_count = 0;
 			sctp_handle_cookie_ack((struct sctp_cookie_ack_chunk *)ch, stcb, net);
 			break;
 		case SCTP_STATE_COOKIE_WAIT:
 			/*
 			 * We consider OOTB any data sent during asoc setup.
 			 */
 			snprintf(msg, sizeof(msg), "OOTB, %s:%d at %s", __FILE__, __LINE__, __func__);
 			op_err = sctp_generate_cause(SCTP_BASE_SYSCTL(sctp_diag_info_code),
 			    msg);
 			sctp_handle_ootb(m, iphlen, offset, src, dst, sh, inp, op_err,
 			    mflowtype, mflowid, inp->fibnum,
 			    vrf_id, port);
 			goto out;
 			/* sa_ignore NOTREACHED */
 			break;
 		case SCTP_STATE_EMPTY:	/* should not happen */
 		case SCTP_STATE_INUSE:	/* should not happen */
 		case SCTP_STATE_SHUTDOWN_RECEIVED:	/* This is a peer error */
 		case SCTP_STATE_SHUTDOWN_ACK_SENT:
 		default:
 			goto out;
 			/* sa_ignore NOTREACHED */
 			break;
 		case SCTP_STATE_OPEN:
 		case SCTP_STATE_SHUTDOWN_SENT:
 			break;
 		}
 		/* plow through the data chunks while length > offset */
 		retval = sctp_process_data(mm, iphlen, &offset, length,
 		    inp, stcb, net, &high_tsn);
 		if (retval == 2) {
 			/*
 			 * The association aborted, NO UNLOCK needed since
 			 * the association is destroyed.
 			 */
 			stcb = NULL;
 			goto out;
 		}
 		data_processed = 1;
 		/*
 		 * Anything important needs to have been m_copy'ed in
 		 * process_data
 		 */
 	}
 	/* take care of ecn */
 	if ((data_processed == 1) &&
 	    (stcb->asoc.ecn_supported == 1) &&
 	    ((ecn_bits & SCTP_CE_BITS) == SCTP_CE_BITS)) {
 		/* Yep, we need to add an ECNE */
 		sctp_send_ecn_echo(stcb, net, high_tsn);
 	}
 	if ((data_processed == 0) && (fwd_tsn_seen)) {
 		int was_a_gap;
 		uint32_t highest_tsn;
 
 		if (SCTP_TSN_GT(stcb->asoc.highest_tsn_inside_nr_map, stcb->asoc.highest_tsn_inside_map)) {
 			highest_tsn = stcb->asoc.highest_tsn_inside_nr_map;
 		} else {
 			highest_tsn = stcb->asoc.highest_tsn_inside_map;
 		}
 		was_a_gap = SCTP_TSN_GT(highest_tsn, stcb->asoc.cumulative_tsn);
 		stcb->asoc.send_sack = 1;
 		sctp_sack_check(stcb, was_a_gap);
 	} else if (fwd_tsn_seen) {
 		stcb->asoc.send_sack = 1;
 	}
 	/* trigger send of any chunks in queue... */
 trigger_send:
 #ifdef SCTP_AUDITING_ENABLED
 	sctp_audit_log(0xE0, 2);
 	sctp_auditing(1, inp, stcb, net);
 #endif
 	SCTPDBG(SCTP_DEBUG_INPUT1,
 	    "Check for chunk output prw:%d tqe:%d tf=%d\n",
 	    stcb->asoc.peers_rwnd,
 	    TAILQ_EMPTY(&stcb->asoc.control_send_queue),
 	    stcb->asoc.total_flight);
 	un_sent = (stcb->asoc.total_output_queue_size - stcb->asoc.total_flight);
 	if (!TAILQ_EMPTY(&stcb->asoc.control_send_queue)) {
 		cnt_ctrl_ready = stcb->asoc.ctrl_queue_cnt - stcb->asoc.ecn_echo_cnt_onq;
 	}
 	if (!TAILQ_EMPTY(&stcb->asoc.asconf_send_queue) ||
 	    cnt_ctrl_ready ||
 	    stcb->asoc.trigger_reset ||
 	    ((un_sent) &&
 	    (stcb->asoc.peers_rwnd > 0 ||
 	    (stcb->asoc.peers_rwnd <= 0 && stcb->asoc.total_flight == 0)))) {
 		SCTPDBG(SCTP_DEBUG_INPUT3, "Calling chunk OUTPUT\n");
 		sctp_chunk_output(inp, stcb, SCTP_OUTPUT_FROM_CONTROL_PROC, SCTP_SO_NOT_LOCKED);
 		SCTPDBG(SCTP_DEBUG_INPUT3, "chunk OUTPUT returns\n");
 	}
 #ifdef SCTP_AUDITING_ENABLED
 	sctp_audit_log(0xE0, 3);
 	sctp_auditing(2, inp, stcb, net);
 #endif
 out:
 	if (stcb != NULL) {
 		SCTP_TCB_UNLOCK(stcb);
 	}
 	if (inp_decr != NULL) {
 		/* reduce ref-count */
 		SCTP_INP_WLOCK(inp_decr);
 		SCTP_INP_DECR_REF(inp_decr);
 		SCTP_INP_WUNLOCK(inp_decr);
 	}
 	return;
 }
 
 #ifdef INET
 void
 sctp_input_with_port(struct mbuf *i_pak, int off, uint16_t port)
 {
 	struct mbuf *m;
 	int iphlen;
 	uint32_t vrf_id = 0;
 	uint8_t ecn_bits;
 	struct sockaddr_in src, dst;
 	struct ip *ip;
 	struct sctphdr *sh;
 	struct sctp_chunkhdr *ch;
 	int length, offset;
 
 #if !defined(SCTP_WITH_NO_CSUM)
 	uint8_t compute_crc;
 
 #endif
 	uint32_t mflowid;
 	uint8_t mflowtype;
 	uint16_t fibnum;
 
 	iphlen = off;
 	if (SCTP_GET_PKT_VRFID(i_pak, vrf_id)) {
 		SCTP_RELEASE_PKT(i_pak);
 		return;
 	}
 	m = SCTP_HEADER_TO_CHAIN(i_pak);
 #ifdef SCTP_MBUF_LOGGING
 	/* Log in any input mbufs */
 	if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_MBUF_LOGGING_ENABLE) {
 		sctp_log_mbc(m, SCTP_MBUF_INPUT);
 	}
 #endif
 #ifdef SCTP_PACKET_LOGGING
 	if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_LAST_PACKET_TRACING) {
 		sctp_packet_log(m);
 	}
 #endif
 	SCTPDBG(SCTP_DEBUG_CRCOFFLOAD,
 	    "sctp_input(): Packet of length %d received on %s with csum_flags 0x%b.\n",
 	    m->m_pkthdr.len,
 	    if_name(m->m_pkthdr.rcvif),
 	    (int)m->m_pkthdr.csum_flags, CSUM_BITS);
 	mflowid = m->m_pkthdr.flowid;
 	mflowtype = M_HASHTYPE_GET(m);
 	fibnum = M_GETFIB(m);
 	SCTP_STAT_INCR(sctps_recvpackets);
 	SCTP_STAT_INCR_COUNTER64(sctps_inpackets);
 	/* Get IP, SCTP, and first chunk header together in the first mbuf. */
 	offset = iphlen + sizeof(struct sctphdr) + sizeof(struct sctp_chunkhdr);
 	if (SCTP_BUF_LEN(m) < offset) {
 		if ((m = m_pullup(m, offset)) == NULL) {
 			SCTP_STAT_INCR(sctps_hdrops);
 			return;
 		}
 	}
 	ip = mtod(m, struct ip *);
 	sh = (struct sctphdr *)((caddr_t)ip + iphlen);
 	ch = (struct sctp_chunkhdr *)((caddr_t)sh + sizeof(struct sctphdr));
 	offset -= sizeof(struct sctp_chunkhdr);
 	memset(&src, 0, sizeof(struct sockaddr_in));
 	src.sin_family = AF_INET;
 	src.sin_len = sizeof(struct sockaddr_in);
 	src.sin_port = sh->src_port;
 	src.sin_addr = ip->ip_src;
 	memset(&dst, 0, sizeof(struct sockaddr_in));
 	dst.sin_family = AF_INET;
 	dst.sin_len = sizeof(struct sockaddr_in);
 	dst.sin_port = sh->dest_port;
 	dst.sin_addr = ip->ip_dst;
 	length = ntohs(ip->ip_len);
 	/* Validate mbuf chain length with IP payload length. */
 	if (SCTP_HEADER_LEN(m) != length) {
 		SCTPDBG(SCTP_DEBUG_INPUT1,
 		    "sctp_input() length:%d reported length:%d\n", length, SCTP_HEADER_LEN(m));
 		SCTP_STAT_INCR(sctps_hdrops);
 		goto out;
 	}
 	/* SCTP does not allow broadcasts or multicasts */
 	if (IN_MULTICAST(ntohl(dst.sin_addr.s_addr))) {
 		goto out;
 	}
 	if (SCTP_IS_IT_BROADCAST(dst.sin_addr, m)) {
 		goto out;
 	}
 	ecn_bits = ip->ip_tos;
 #if defined(SCTP_WITH_NO_CSUM)
 	SCTP_STAT_INCR(sctps_recvnocrc);
 #else
 	if (m->m_pkthdr.csum_flags & CSUM_SCTP_VALID) {
 		SCTP_STAT_INCR(sctps_recvhwcrc);
 		compute_crc = 0;
 	} else {
 		SCTP_STAT_INCR(sctps_recvswcrc);
 		compute_crc = 1;
 	}
 #endif
 	sctp_common_input_processing(&m, iphlen, offset, length,
 	    (struct sockaddr *)&src,
 	    (struct sockaddr *)&dst,
 	    sh, ch,
 #if !defined(SCTP_WITH_NO_CSUM)
 	    compute_crc,
 #endif
 	    ecn_bits,
 	    mflowtype, mflowid, fibnum,
 	    vrf_id, port);
 out:
 	if (m) {
 		sctp_m_freem(m);
 	}
 	return;
 }
 
 #if defined(__FreeBSD__) && defined(SCTP_MCORE_INPUT) && defined(SMP)
 extern int *sctp_cpuarry;
 
 #endif
 
 int
 sctp_input(struct mbuf **mp, int *offp, int proto SCTP_UNUSED)
 {
 	struct mbuf *m;
 	int off;
 
 	m = *mp;
 	off = *offp;
 #if defined(__FreeBSD__) && defined(SCTP_MCORE_INPUT) && defined(SMP)
 	if (mp_ncpus > 1) {
 		struct ip *ip;
 		struct sctphdr *sh;
 		int offset;
 		int cpu_to_use;
 		uint32_t flowid, tag;
 
 		if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) {
 			flowid = m->m_pkthdr.flowid;
 		} else {
 			/*
 			 * No flow id built by lower layers; fix it so we
 			 * create one.
 			 */
 			offset = off + sizeof(struct sctphdr);
 			if (SCTP_BUF_LEN(m) < offset) {
 				if ((m = m_pullup(m, offset)) == NULL) {
 					SCTP_STAT_INCR(sctps_hdrops);
 					return (IPPROTO_DONE);
 				}
 			}
 			ip = mtod(m, struct ip *);
 			sh = (struct sctphdr *)((caddr_t)ip + off);
 			tag = htonl(sh->v_tag);
 			flowid = tag ^ ntohs(sh->dest_port) ^ ntohs(sh->src_port);
 			m->m_pkthdr.flowid = flowid;
-			M_HASHTYPE_SET(m, M_HASHTYPE_OPAQUE);
+			M_HASHTYPE_SET(m, M_HASHTYPE_OPAQUE_HASH);
 		}
 		cpu_to_use = sctp_cpuarry[flowid % mp_ncpus];
 		sctp_queue_to_mcore(m, off, cpu_to_use);
 		return (IPPROTO_DONE);
 	}
 #endif
 	sctp_input_with_port(m, off, 0);
 	return (IPPROTO_DONE);
 }
 
 #endif
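
The multi-core path in sctp_input() above derives a flow id from the SCTP common header when the lower layers did not supply one, then uses it to pick a CPU. The following is a minimal userland sketch of that arithmetic only. The names example_sctp_flowid() and example_pick_cpu() are hypothetical and are not part of the SCTP sources; the CPU table stands in for sctp_cpuarry and the count for mp_ncpus.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* ntohl(), ntohs() */

/*
 * Hypothetical helper: combine the verification tag and both ports
 * (all given in network byte order) into a 32-bit flow id, mirroring
 * "tag ^ ntohs(dest_port) ^ ntohs(src_port)" in the code above.
 * The kernel code byte-swaps the tag with htonl(); ntohl() performs
 * the same swap and is used here because the value comes off the wire.
 */
static uint32_t
example_sctp_flowid(uint32_t v_tag_net, uint16_t src_port_net,
    uint16_t dest_port_net)
{
	uint32_t tag = ntohl(v_tag_net);

	return (tag ^ ntohs(dest_port_net) ^ ntohs(src_port_net));
}

/*
 * Hypothetical helper: map a flow id onto one entry of a CPU table,
 * as "sctp_cpuarry[flowid % mp_ncpus]" does. ncpus must be non-zero.
 */
static int
example_pick_cpu(uint32_t flowid, const int *cpu_map, unsigned int ncpus)
{
	return (cpu_map[flowid % ncpus]);
}
```

Because the id depends only on the tag and the port pair, every packet of one association should land on the same CPU under this scheme. The changed line in the hunk above additionally marks the mbuf with M_HASHTYPE_OPAQUE_HASH after the id is stored.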
Index: projects/vnet/sys/netinet/sctp_pcb.c
===================================================================
--- projects/vnet/sys/netinet/sctp_pcb.c	(revision 301546)
+++ projects/vnet/sys/netinet/sctp_pcb.c	(revision 301547)
@@ -1,7117 +1,7117 @@
 /*-
  * Copyright (c) 2001-2008, by Cisco Systems, Inc. All rights reserved.
  * Copyright (c) 2008-2012, by Randall Stewart. All rights reserved.
  * Copyright (c) 2008-2012, by Michael Tuexen. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
  *
  * a) Redistributions of source code must retain the above copyright notice,
  *    this list of conditions and the following disclaimer.
  *
  * b) Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in
  *    the documentation and/or other materials provided with the distribution.
  *
  * c) Neither the name of Cisco Systems, Inc. nor the names of its
  *    contributors may be used to endorse or promote products derived
  *    from this software without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
  * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
  * THE POSSIBILITY OF SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include <netinet/sctp_os.h>
 #include <sys/proc.h>
 #include <netinet/sctp_var.h>
 #include <netinet/sctp_sysctl.h>
 #include <netinet/sctp_pcb.h>
 #include <netinet/sctputil.h>
 #include <netinet/sctp.h>
 #include <netinet/sctp_header.h>
 #include <netinet/sctp_asconf.h>
 #include <netinet/sctp_output.h>
 #include <netinet/sctp_timer.h>
 #include <netinet/sctp_bsd_addr.h>
 #include <netinet/sctp_dtrace_define.h>
 #if defined(INET) || defined(INET6)
 #include <netinet/udp.h>
 #endif
 #ifdef INET6
 #include <netinet6/ip6_var.h>
 #endif
 #include <sys/sched.h>
 #include <sys/smp.h>
 #include <sys/unistd.h>
 
 
 VNET_DEFINE(struct sctp_base_info, system_base_info);
 
 /* FIX: we don't handle multiple link local scopes */
 /* "scopeless" replacement IN6_ARE_ADDR_EQUAL */
 #ifdef INET6
 int
 SCTP6_ARE_ADDR_EQUAL(struct sockaddr_in6 *a, struct sockaddr_in6 *b)
 {
 	struct sockaddr_in6 tmp_a, tmp_b;
 
 	memcpy(&tmp_a, a, sizeof(struct sockaddr_in6));
 	if (sa6_embedscope(&tmp_a, MODULE_GLOBAL(ip6_use_defzone)) != 0) {
 		return (0);
 	}
 	memcpy(&tmp_b, b, sizeof(struct sockaddr_in6));
 	if (sa6_embedscope(&tmp_b, MODULE_GLOBAL(ip6_use_defzone)) != 0) {
 		return (0);
 	}
 	return (IN6_ARE_ADDR_EQUAL(&tmp_a.sin6_addr, &tmp_b.sin6_addr));
 }
 
 #endif
 
 void
 sctp_fill_pcbinfo(struct sctp_pcbinfo *spcb)
 {
 	/*
 	 * We really don't need to lock this, but I will just because it
 	 * does not hurt.
 	 */
 	SCTP_INP_INFO_RLOCK();
 	spcb->ep_count = SCTP_BASE_INFO(ipi_count_ep);
 	spcb->asoc_count = SCTP_BASE_INFO(ipi_count_asoc);
 	spcb->laddr_count = SCTP_BASE_INFO(ipi_count_laddr);
 	spcb->raddr_count = SCTP_BASE_INFO(ipi_count_raddr);
 	spcb->chk_count = SCTP_BASE_INFO(ipi_count_chunk);
 	spcb->readq_count = SCTP_BASE_INFO(ipi_count_readq);
 	spcb->stream_oque = SCTP_BASE_INFO(ipi_count_strmoq);
 	spcb->free_chunks = SCTP_BASE_INFO(ipi_free_chunks);
 	SCTP_INP_INFO_RUNLOCK();
 }
 
 /*-
  * Addresses are added to VRF's (Virtual Router's). For BSD we
  * have only the default VRF 0. We maintain a hash list of
  * VRF's. Each VRF has its own list of sctp_ifn's. Each of
  * these has a list of addresses. When we add a new address
  * to a VRF we lookup the ifn/ifn_index, if the ifn does
  * not exist we create it and add it to the list of IFN's
  * within the VRF. Once we have the sctp_ifn, we add the
  * address to the list. So we look something like:
  *
  * hash-vrf-table
  *   vrf-> ifn-> ifn -> ifn
  *   vrf    |
  *    ...   +--ifa-> ifa -> ifa
  *   vrf
  *
  * We keep these separate lists since the SCTP subsystem will
  * point to these from its source address selection nets structure.
  * When an address is deleted it does not happen right away on
  * the SCTP side, it gets scheduled. What we do when a
  * delete happens is immediately remove the address from
  * the master list and decrement the refcount. As our
  * addip iterator works through and frees the src address
  * selection pointing to the sctp_ifa, eventually the refcount
  * will reach 0 and we will delete it. Note that it is assumed
  * that any locking on system level ifn/ifa is done at the
  * caller of these functions and these routines will only
  * lock the SCTP structures as they add or delete things.
  *
  * Other notes on VRF concepts.
  *  - An endpoint can be in multiple VRF's
  *  - An association lives within a VRF and only one VRF.
  *  - Any incoming packet we can deduce the VRF for by
  *    looking at the mbuf/pak inbound (for BSD it's VRF=0 :D)
  *  - Any downward send call or connect call must supply the
  *    VRF via ancillary data or via some sort of set default
  *    VRF socket option call (again for BSD no brainer since
  *    the VRF is always 0).
  *  - An endpoint may add multiple VRF's to it.
  *  - Listening sockets can accept associations in any
  *    of the VRF's they are in but the assoc will end up
  *    in only one VRF (gotten from the packet or connect/send).
  *
  */
 
 struct sctp_vrf *
 sctp_allocate_vrf(int vrf_id)
 {
 	struct sctp_vrf *vrf = NULL;
 	struct sctp_vrflist *bucket;
 
 	/* First allocate the VRF structure */
 	vrf = sctp_find_vrf(vrf_id);
 	if (vrf) {
 		/* Already allocated */
 		return (vrf);
 	}
 	SCTP_MALLOC(vrf, struct sctp_vrf *, sizeof(struct sctp_vrf),
 	    SCTP_M_VRF);
 	if (vrf == NULL) {
 		/* No memory */
 #ifdef INVARIANTS
 		panic("No memory for VRF:%d", vrf_id);
 #endif
 		return (NULL);
 	}
 	/* setup the VRF */
 	memset(vrf, 0, sizeof(struct sctp_vrf));
 	vrf->vrf_id = vrf_id;
 	LIST_INIT(&vrf->ifnlist);
 	vrf->total_ifa_count = 0;
 	vrf->refcount = 0;
 	/* now also setup table ids */
 	SCTP_INIT_VRF_TABLEID(vrf);
 	/* Init the HASH of addresses */
 	vrf->vrf_addr_hash = SCTP_HASH_INIT(SCTP_VRF_ADDR_HASH_SIZE,
 	    &vrf->vrf_addr_hashmark);
 	if (vrf->vrf_addr_hash == NULL) {
 		/* No memory */
 #ifdef INVARIANTS
 		panic("No memory for VRF:%d", vrf_id);
 #endif
 		SCTP_FREE(vrf, SCTP_M_VRF);
 		return (NULL);
 	}
 	/* Add it to the hash table */
 	bucket = &SCTP_BASE_INFO(sctp_vrfhash)[(vrf_id & SCTP_BASE_INFO(hashvrfmark))];
 	LIST_INSERT_HEAD(bucket, vrf, next_vrf);
 	atomic_add_int(&SCTP_BASE_INFO(ipi_count_vrfs), 1);
 	return (vrf);
 }
 
 
 struct sctp_ifn *
 sctp_find_ifn(void *ifn, uint32_t ifn_index)
 {
 	struct sctp_ifn *sctp_ifnp;
 	struct sctp_ifnlist *hash_ifn_head;
 
 	/*
 	 * We assume the lock is held for the addresses; if that's wrong,
 	 * problems could occur :-)
 	 */
 	hash_ifn_head = &SCTP_BASE_INFO(vrf_ifn_hash)[(ifn_index & SCTP_BASE_INFO(vrf_ifn_hashmark))];
 	LIST_FOREACH(sctp_ifnp, hash_ifn_head, next_bucket) {
 		if (sctp_ifnp->ifn_index == ifn_index) {
 			return (sctp_ifnp);
 		}
 		if (sctp_ifnp->ifn_p && ifn && (sctp_ifnp->ifn_p == ifn)) {
 			return (sctp_ifnp);
 		}
 	}
 	return (NULL);
 }
 
 
 struct sctp_vrf *
 sctp_find_vrf(uint32_t vrf_id)
 {
 	struct sctp_vrflist *bucket;
 	struct sctp_vrf *liste;
 
 	bucket = &SCTP_BASE_INFO(sctp_vrfhash)[(vrf_id & SCTP_BASE_INFO(hashvrfmark))];
 	LIST_FOREACH(liste, bucket, next_vrf) {
 		if (vrf_id == liste->vrf_id) {
 			return (liste);
 		}
 	}
 	return (NULL);
 }
 
 
 void
 sctp_free_vrf(struct sctp_vrf *vrf)
 {
 	if (SCTP_DECREMENT_AND_CHECK_REFCOUNT(&vrf->refcount)) {
 		if (vrf->vrf_addr_hash) {
 			SCTP_HASH_FREE(vrf->vrf_addr_hash, vrf->vrf_addr_hashmark);
 			vrf->vrf_addr_hash = NULL;
 		}
 		/* We zero'd the count */
 		LIST_REMOVE(vrf, next_vrf);
 		SCTP_FREE(vrf, SCTP_M_VRF);
 		atomic_subtract_int(&SCTP_BASE_INFO(ipi_count_vrfs), 1);
 	}
 }
 
 
 void
 sctp_free_ifn(struct sctp_ifn *sctp_ifnp)
 {
 	if (SCTP_DECREMENT_AND_CHECK_REFCOUNT(&sctp_ifnp->refcount)) {
 		/* We zero'd the count */
 		if (sctp_ifnp->vrf) {
 			sctp_free_vrf(sctp_ifnp->vrf);
 		}
 		SCTP_FREE(sctp_ifnp, SCTP_M_IFN);
 		atomic_subtract_int(&SCTP_BASE_INFO(ipi_count_ifns), 1);
 	}
 }
 
 
 void
 sctp_update_ifn_mtu(uint32_t ifn_index, uint32_t mtu)
 {
 	struct sctp_ifn *sctp_ifnp;
 
 	sctp_ifnp = sctp_find_ifn((void *)NULL, ifn_index);
 	if (sctp_ifnp != NULL) {
 		sctp_ifnp->ifn_mtu = mtu;
 	}
 }
 
 
 void
 sctp_free_ifa(struct sctp_ifa *sctp_ifap)
 {
 	if (SCTP_DECREMENT_AND_CHECK_REFCOUNT(&sctp_ifap->refcount)) {
 		/* We zero'd the count */
 		if (sctp_ifap->ifn_p) {
 			sctp_free_ifn(sctp_ifap->ifn_p);
 		}
 		SCTP_FREE(sctp_ifap, SCTP_M_IFA);
 		atomic_subtract_int(&SCTP_BASE_INFO(ipi_count_ifas), 1);
 	}
 }
 
 
 static void
 sctp_delete_ifn(struct sctp_ifn *sctp_ifnp, int hold_addr_lock)
 {
 	struct sctp_ifn *found;
 
 	found = sctp_find_ifn(sctp_ifnp->ifn_p, sctp_ifnp->ifn_index);
 	if (found == NULL) {
 		/* Not in the list.. sorry */
 		return;
 	}
 	if (hold_addr_lock == 0)
 		SCTP_IPI_ADDR_WLOCK();
 	LIST_REMOVE(sctp_ifnp, next_bucket);
 	LIST_REMOVE(sctp_ifnp, next_ifn);
 	SCTP_DEREGISTER_INTERFACE(sctp_ifnp->ifn_index,
 	    sctp_ifnp->registered_af);
 	if (hold_addr_lock == 0)
 		SCTP_IPI_ADDR_WUNLOCK();
 	/* Take away the reference, and possibly free it */
 	sctp_free_ifn(sctp_ifnp);
 }
 
 
 void
 sctp_mark_ifa_addr_down(uint32_t vrf_id, struct sockaddr *addr,
     const char *if_name, uint32_t ifn_index)
 {
 	struct sctp_vrf *vrf;
 	struct sctp_ifa *sctp_ifap;
 
 	SCTP_IPI_ADDR_RLOCK();
 	vrf = sctp_find_vrf(vrf_id);
 	if (vrf == NULL) {
 		SCTPDBG(SCTP_DEBUG_PCB4, "Can't find vrf_id 0x%x\n", vrf_id);
 		goto out;
 
 	}
 	sctp_ifap = sctp_find_ifa_by_addr(addr, vrf->vrf_id, SCTP_ADDR_LOCKED);
 	if (sctp_ifap == NULL) {
 		SCTPDBG(SCTP_DEBUG_PCB4, "Can't find sctp_ifap for address\n");
 		goto out;
 	}
 	if (sctp_ifap->ifn_p == NULL) {
 		SCTPDBG(SCTP_DEBUG_PCB4, "IFA has no IFN - can't mark unusable\n");
 		goto out;
 	}
 	if (if_name) {
 		if (strncmp(if_name, sctp_ifap->ifn_p->ifn_name, SCTP_IFNAMSIZ) != 0) {
 			SCTPDBG(SCTP_DEBUG_PCB4, "IFN %s of IFA not the same as %s\n",
 			    sctp_ifap->ifn_p->ifn_name, if_name);
 			goto out;
 		}
 	} else {
 		if (sctp_ifap->ifn_p->ifn_index != ifn_index) {
 			SCTPDBG(SCTP_DEBUG_PCB4, "IFA owned by ifn_index:%d down command for ifn_index:%d - ignored\n",
 			    sctp_ifap->ifn_p->ifn_index, ifn_index);
 			goto out;
 		}
 	}
 
 	sctp_ifap->localifa_flags &= (~SCTP_ADDR_VALID);
 	sctp_ifap->localifa_flags |= SCTP_ADDR_IFA_UNUSEABLE;
 out:
 	SCTP_IPI_ADDR_RUNLOCK();
 }
 
 
 void
 sctp_mark_ifa_addr_up(uint32_t vrf_id, struct sockaddr *addr,
     const char *if_name, uint32_t ifn_index)
 {
 	struct sctp_vrf *vrf;
 	struct sctp_ifa *sctp_ifap;
 
 	SCTP_IPI_ADDR_RLOCK();
 	vrf = sctp_find_vrf(vrf_id);
 	if (vrf == NULL) {
 		SCTPDBG(SCTP_DEBUG_PCB4, "Can't find vrf_id 0x%x\n", vrf_id);
 		goto out;
 
 	}
 	sctp_ifap = sctp_find_ifa_by_addr(addr, vrf->vrf_id, SCTP_ADDR_LOCKED);
 	if (sctp_ifap == NULL) {
 		SCTPDBG(SCTP_DEBUG_PCB4, "Can't find sctp_ifap for address\n");
 		goto out;
 	}
 	if (sctp_ifap->ifn_p == NULL) {
 		SCTPDBG(SCTP_DEBUG_PCB4, "IFA has no IFN - can't mark usable\n");
 		goto out;
 	}
 	if (if_name) {
 		if (strncmp(if_name, sctp_ifap->ifn_p->ifn_name, SCTP_IFNAMSIZ) != 0) {
 			SCTPDBG(SCTP_DEBUG_PCB4, "IFN %s of IFA not the same as %s\n",
 			    sctp_ifap->ifn_p->ifn_name, if_name);
 			goto out;
 		}
 	} else {
 		if (sctp_ifap->ifn_p->ifn_index != ifn_index) {
 			SCTPDBG(SCTP_DEBUG_PCB4, "IFA owned by ifn_index:%d up command for ifn_index:%d - ignored\n",
 			    sctp_ifap->ifn_p->ifn_index, ifn_index);
 			goto out;
 		}
 	}
 
 	sctp_ifap->localifa_flags &= (~SCTP_ADDR_IFA_UNUSEABLE);
 	sctp_ifap->localifa_flags |= SCTP_ADDR_VALID;
 out:
 	SCTP_IPI_ADDR_RUNLOCK();
 }
 
 
 /*-
  * Add an ifa to an ifn.
  * Register the interface as necessary.
  * NOTE: ADDR write lock MUST be held.
  */
 static void
 sctp_add_ifa_to_ifn(struct sctp_ifn *sctp_ifnp, struct sctp_ifa *sctp_ifap)
 {
 	int ifa_af;
 
 	LIST_INSERT_HEAD(&sctp_ifnp->ifalist, sctp_ifap, next_ifa);
 	sctp_ifap->ifn_p = sctp_ifnp;
 	atomic_add_int(&sctp_ifap->ifn_p->refcount, 1);
 	/* update address counts */
 	sctp_ifnp->ifa_count++;
 	ifa_af = sctp_ifap->address.sa.sa_family;
 	switch (ifa_af) {
 #ifdef INET
 	case AF_INET:
 		sctp_ifnp->num_v4++;
 		break;
 #endif
 #ifdef INET6
 	case AF_INET6:
 		sctp_ifnp->num_v6++;
 		break;
 #endif
 	default:
 		break;
 	}
 	if (sctp_ifnp->ifa_count == 1) {
 		/* register the new interface */
 		SCTP_REGISTER_INTERFACE(sctp_ifnp->ifn_index, ifa_af);
 		sctp_ifnp->registered_af = ifa_af;
 	}
 }
 
 
 /*-
  * Remove an ifa from its ifn.
  * If no more addresses exist, remove the ifn too. Otherwise, re-register
  * the interface based on the remaining address families left.
  * NOTE: ADDR write lock MUST be held.
  */
 static void
 sctp_remove_ifa_from_ifn(struct sctp_ifa *sctp_ifap)
 {
 	LIST_REMOVE(sctp_ifap, next_ifa);
 	if (sctp_ifap->ifn_p) {
 		/* update address counts */
 		sctp_ifap->ifn_p->ifa_count--;
 		switch (sctp_ifap->address.sa.sa_family) {
 #ifdef INET
 		case AF_INET:
 			sctp_ifap->ifn_p->num_v4--;
 			break;
 #endif
 #ifdef INET6
 		case AF_INET6:
 			sctp_ifap->ifn_p->num_v6--;
 			break;
 #endif
 		default:
 			break;
 		}
 
 		if (LIST_EMPTY(&sctp_ifap->ifn_p->ifalist)) {
 			/* remove the ifn, possibly freeing it */
 			sctp_delete_ifn(sctp_ifap->ifn_p, SCTP_ADDR_LOCKED);
 		} else {
 			/* re-register address family type, if needed */
 			if ((sctp_ifap->ifn_p->num_v6 == 0) &&
 			    (sctp_ifap->ifn_p->registered_af == AF_INET6)) {
 				SCTP_DEREGISTER_INTERFACE(sctp_ifap->ifn_p->ifn_index, AF_INET6);
 				SCTP_REGISTER_INTERFACE(sctp_ifap->ifn_p->ifn_index, AF_INET);
 				sctp_ifap->ifn_p->registered_af = AF_INET;
 			} else if ((sctp_ifap->ifn_p->num_v4 == 0) &&
 			    (sctp_ifap->ifn_p->registered_af == AF_INET)) {
 				SCTP_DEREGISTER_INTERFACE(sctp_ifap->ifn_p->ifn_index, AF_INET);
 				SCTP_REGISTER_INTERFACE(sctp_ifap->ifn_p->ifn_index, AF_INET6);
 				sctp_ifap->ifn_p->registered_af = AF_INET6;
 			}
 			/* free the ifn refcount */
 			sctp_free_ifn(sctp_ifap->ifn_p);
 		}
 		sctp_ifap->ifn_p = NULL;
 	}
 }
 
 
 struct sctp_ifa *
 sctp_add_addr_to_vrf(uint32_t vrf_id, void *ifn, uint32_t ifn_index,
     uint32_t ifn_type, const char *if_name, void *ifa,
     struct sockaddr *addr, uint32_t ifa_flags,
     int dynamic_add)
 {
 	struct sctp_vrf *vrf;
 	struct sctp_ifn *sctp_ifnp = NULL;
 	struct sctp_ifa *sctp_ifap = NULL;
 	struct sctp_ifalist *hash_addr_head;
 	struct sctp_ifnlist *hash_ifn_head;
 	uint32_t hash_of_addr;
 	int new_ifn_af = 0;
 
 #ifdef SCTP_DEBUG
 	SCTPDBG(SCTP_DEBUG_PCB4, "vrf_id 0x%x: adding address: ", vrf_id);
 	SCTPDBG_ADDR(SCTP_DEBUG_PCB4, addr);
 #endif
 	SCTP_IPI_ADDR_WLOCK();
 	sctp_ifnp = sctp_find_ifn(ifn, ifn_index);
 	if (sctp_ifnp) {
 		vrf = sctp_ifnp->vrf;
 	} else {
 		vrf = sctp_find_vrf(vrf_id);
 		if (vrf == NULL) {
 			vrf = sctp_allocate_vrf(vrf_id);
 			if (vrf == NULL) {
 				SCTP_IPI_ADDR_WUNLOCK();
 				return (NULL);
 			}
 		}
 	}
 	if (sctp_ifnp == NULL) {
 		/*
 		 * build one and add it, can't hold lock until after malloc
 		 * done though.
 		 */
 		SCTP_IPI_ADDR_WUNLOCK();
 		SCTP_MALLOC(sctp_ifnp, struct sctp_ifn *,
 		    sizeof(struct sctp_ifn), SCTP_M_IFN);
 		if (sctp_ifnp == NULL) {
 #ifdef INVARIANTS
 			panic("No memory for IFN");
 #endif
 			return (NULL);
 		}
 		memset(sctp_ifnp, 0, sizeof(struct sctp_ifn));
 		sctp_ifnp->ifn_index = ifn_index;
 		sctp_ifnp->ifn_p = ifn;
 		sctp_ifnp->ifn_type = ifn_type;
 		sctp_ifnp->refcount = 0;
 		sctp_ifnp->vrf = vrf;
 		atomic_add_int(&vrf->refcount, 1);
 		sctp_ifnp->ifn_mtu = SCTP_GATHER_MTU_FROM_IFN_INFO(ifn, ifn_index, addr->sa_family);
 		if (if_name != NULL) {
 			snprintf(sctp_ifnp->ifn_name, SCTP_IFNAMSIZ, "%s", if_name);
 		} else {
 			snprintf(sctp_ifnp->ifn_name, SCTP_IFNAMSIZ, "%s", "unknown");
 		}
 		hash_ifn_head = &SCTP_BASE_INFO(vrf_ifn_hash)[(ifn_index & SCTP_BASE_INFO(vrf_ifn_hashmark))];
 		LIST_INIT(&sctp_ifnp->ifalist);
 		SCTP_IPI_ADDR_WLOCK();
 		LIST_INSERT_HEAD(hash_ifn_head, sctp_ifnp, next_bucket);
 		LIST_INSERT_HEAD(&vrf->ifnlist, sctp_ifnp, next_ifn);
 		atomic_add_int(&SCTP_BASE_INFO(ipi_count_ifns), 1);
 		new_ifn_af = 1;
 	}
 	sctp_ifap = sctp_find_ifa_by_addr(addr, vrf->vrf_id, SCTP_ADDR_LOCKED);
 	if (sctp_ifap) {
 		/* Hmm, it already exists? */
 		if ((sctp_ifap->ifn_p) &&
 		    (sctp_ifap->ifn_p->ifn_index == ifn_index)) {
 			SCTPDBG(SCTP_DEBUG_PCB4, "Using existing ifn %s (0x%x) for ifa %p\n",
 			    sctp_ifap->ifn_p->ifn_name, ifn_index,
 			    (void *)sctp_ifap);
 			if (new_ifn_af) {
 				/* Remove the created one that we don't want */
 				sctp_delete_ifn(sctp_ifnp, SCTP_ADDR_LOCKED);
 			}
 			if (sctp_ifap->localifa_flags & SCTP_BEING_DELETED) {
 				/* easy to solve, just switch back to active */
 				SCTPDBG(SCTP_DEBUG_PCB4, "Clearing deleted ifa flag\n");
 				sctp_ifap->localifa_flags = SCTP_ADDR_VALID;
 				sctp_ifap->ifn_p = sctp_ifnp;
 				atomic_add_int(&sctp_ifap->ifn_p->refcount, 1);
 			}
 	exit_stage_left:
 			SCTP_IPI_ADDR_WUNLOCK();
 			return (sctp_ifap);
 		} else {
 			if (sctp_ifap->ifn_p) {
 				/*
 				 * The last IFN gets the address, remove the
 				 * old one
 				 */
 				SCTPDBG(SCTP_DEBUG_PCB4, "Moving ifa %p from %s (0x%x) to %s (0x%x)\n",
 				    (void *)sctp_ifap, sctp_ifap->ifn_p->ifn_name,
 				    sctp_ifap->ifn_p->ifn_index, if_name,
 				    ifn_index);
 				/* remove the address from the old ifn */
 				sctp_remove_ifa_from_ifn(sctp_ifap);
 				/* move the address over to the new ifn */
 				sctp_add_ifa_to_ifn(sctp_ifnp, sctp_ifap);
 				goto exit_stage_left;
 			} else {
 				/* repair ifnp which was NULL ? */
 				sctp_ifap->localifa_flags = SCTP_ADDR_VALID;
 				SCTPDBG(SCTP_DEBUG_PCB4, "Repairing ifn %p for ifa %p\n",
 				    (void *)sctp_ifnp, (void *)sctp_ifap);
 				sctp_add_ifa_to_ifn(sctp_ifnp, sctp_ifap);
 			}
 			goto exit_stage_left;
 		}
 	}
 	SCTP_IPI_ADDR_WUNLOCK();
 	SCTP_MALLOC(sctp_ifap, struct sctp_ifa *, sizeof(struct sctp_ifa), SCTP_M_IFA);
 	if (sctp_ifap == NULL) {
 #ifdef INVARIANTS
 		panic("No memory for IFA");
 #endif
 		return (NULL);
 	}
 	memset(sctp_ifap, 0, sizeof(struct sctp_ifa));
 	sctp_ifap->ifn_p = sctp_ifnp;
 	atomic_add_int(&sctp_ifnp->refcount, 1);
 	sctp_ifap->vrf_id = vrf_id;
 	sctp_ifap->ifa = ifa;
 	memcpy(&sctp_ifap->address, addr, addr->sa_len);
 	sctp_ifap->localifa_flags = SCTP_ADDR_VALID | SCTP_ADDR_DEFER_USE;
 	sctp_ifap->flags = ifa_flags;
 	/* Set scope */
 	switch (sctp_ifap->address.sa.sa_family) {
 #ifdef INET
 	case AF_INET:
 		{
 			struct sockaddr_in *sin;
 
 			sin = &sctp_ifap->address.sin;
 			if (SCTP_IFN_IS_IFT_LOOP(sctp_ifap->ifn_p) ||
 			    (IN4_ISLOOPBACK_ADDRESS(&sin->sin_addr))) {
 				sctp_ifap->src_is_loop = 1;
 			}
 			if ((IN4_ISPRIVATE_ADDRESS(&sin->sin_addr))) {
 				sctp_ifap->src_is_priv = 1;
 			}
 			sctp_ifnp->num_v4++;
 			if (new_ifn_af)
 				new_ifn_af = AF_INET;
 			break;
 		}
 #endif
 #ifdef INET6
 	case AF_INET6:
 		{
 			/* ok to use deprecated addresses? */
 			struct sockaddr_in6 *sin6;
 
 			sin6 = &sctp_ifap->address.sin6;
 			if (SCTP_IFN_IS_IFT_LOOP(sctp_ifap->ifn_p) ||
 			    (IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr))) {
 				sctp_ifap->src_is_loop = 1;
 			}
 			if (IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr)) {
 				sctp_ifap->src_is_priv = 1;
 			}
 			sctp_ifnp->num_v6++;
 			if (new_ifn_af)
 				new_ifn_af = AF_INET6;
 			break;
 		}
 #endif
 	default:
 		new_ifn_af = 0;
 		break;
 	}
 	hash_of_addr = sctp_get_ifa_hash_val(&sctp_ifap->address.sa);
 
 	if ((sctp_ifap->src_is_priv == 0) &&
 	    (sctp_ifap->src_is_loop == 0)) {
 		sctp_ifap->src_is_glob = 1;
 	}
 	SCTP_IPI_ADDR_WLOCK();
 	hash_addr_head = &vrf->vrf_addr_hash[(hash_of_addr & vrf->vrf_addr_hashmark)];
 	LIST_INSERT_HEAD(hash_addr_head, sctp_ifap, next_bucket);
 	sctp_ifap->refcount = 1;
 	LIST_INSERT_HEAD(&sctp_ifnp->ifalist, sctp_ifap, next_ifa);
 	sctp_ifnp->ifa_count++;
 	vrf->total_ifa_count++;
 	atomic_add_int(&SCTP_BASE_INFO(ipi_count_ifas), 1);
 	if (new_ifn_af) {
 		SCTP_REGISTER_INTERFACE(ifn_index, new_ifn_af);
 		sctp_ifnp->registered_af = new_ifn_af;
 	}
 	SCTP_IPI_ADDR_WUNLOCK();
 	if (dynamic_add) {
 		/*
 		 * Bump up the refcount so that when the timer completes it
 		 * will drop back down.
 		 */
 		struct sctp_laddr *wi;
 
 		atomic_add_int(&sctp_ifap->refcount, 1);
 		wi = SCTP_ZONE_GET(SCTP_BASE_INFO(ipi_zone_laddr), struct sctp_laddr);
 		if (wi == NULL) {
 			/*
 			 * Gak, what can we do? We have lost an address
 			 * change can you say HOSED?
 			 */
 			SCTPDBG(SCTP_DEBUG_PCB4, "Lost an address change?\n");
 			/* Oops, must decrement the count */
 			sctp_del_addr_from_vrf(vrf_id, addr, ifn_index,
 			    if_name);
 			return (NULL);
 		}
 		SCTP_INCR_LADDR_COUNT();
 		bzero(wi, sizeof(*wi));
 		(void)SCTP_GETTIME_TIMEVAL(&wi->start_time);
 		wi->ifa = sctp_ifap;
 		wi->action = SCTP_ADD_IP_ADDRESS;
 
 		SCTP_WQ_ADDR_LOCK();
 		LIST_INSERT_HEAD(&SCTP_BASE_INFO(addr_wq), wi, sctp_nxt_addr);
 		SCTP_WQ_ADDR_UNLOCK();
 
 		sctp_timer_start(SCTP_TIMER_TYPE_ADDR_WQ,
 		    (struct sctp_inpcb *)NULL,
 		    (struct sctp_tcb *)NULL,
 		    (struct sctp_nets *)NULL);
 	} else {
 		/* it's ready for use */
 		sctp_ifap->localifa_flags &= ~SCTP_ADDR_DEFER_USE;
 	}
 	return (sctp_ifap);
 }
 
 void
 sctp_del_addr_from_vrf(uint32_t vrf_id, struct sockaddr *addr,
     uint32_t ifn_index, const char *if_name)
 {
 	struct sctp_vrf *vrf;
 	struct sctp_ifa *sctp_ifap = NULL;
 
 	SCTP_IPI_ADDR_WLOCK();
 	vrf = sctp_find_vrf(vrf_id);
 	if (vrf == NULL) {
 		SCTPDBG(SCTP_DEBUG_PCB4, "Can't find vrf_id 0x%x\n", vrf_id);
 		goto out_now;
 	}
 #ifdef SCTP_DEBUG
 	SCTPDBG(SCTP_DEBUG_PCB4, "vrf_id 0x%x: deleting address:", vrf_id);
 	SCTPDBG_ADDR(SCTP_DEBUG_PCB4, addr);
 #endif
 	sctp_ifap = sctp_find_ifa_by_addr(addr, vrf->vrf_id, SCTP_ADDR_LOCKED);
 	if (sctp_ifap) {
 		/* Validate the delete */
 		if (sctp_ifap->ifn_p) {
 			int valid = 0;
 
 			/*-
 			 * The name has priority over the ifn_index
 			 * if it's given. We do this especially for
 			 * panda who might recycle indexes fast.
 			 */
 			if (if_name) {
 				if (strncmp(if_name, sctp_ifap->ifn_p->ifn_name, SCTP_IFNAMSIZ) == 0) {
 					/* They match, so it's a correct delete */
 					valid = 1;
 				}
 			}
 			if (!valid) {
 				/* last ditch check ifn_index */
 				if (ifn_index == sctp_ifap->ifn_p->ifn_index) {
 					valid = 1;
 				}
 			}
 			if (!valid) {
 				SCTPDBG(SCTP_DEBUG_PCB4, "ifn:%d ifname:%s does not match addresses\n",
 				    ifn_index, ((if_name == NULL) ? "NULL" : if_name));
 				SCTPDBG(SCTP_DEBUG_PCB4, "ifn:%d ifname:%s - ignoring delete\n",
 				    sctp_ifap->ifn_p->ifn_index, sctp_ifap->ifn_p->ifn_name);
 				SCTP_IPI_ADDR_WUNLOCK();
 				return;
 			}
 		}
 		SCTPDBG(SCTP_DEBUG_PCB4, "Deleting ifa %p\n", (void *)sctp_ifap);
 		sctp_ifap->localifa_flags &= SCTP_ADDR_VALID;
 		/*
 		 * We don't set the flag. This means that the structure will
 		 * hang around in EP's that have bound specific to it until
 		 * they close. This gives us TCP like behavior if someone
 		 * removes an address (or for that matter adds it right
 		 * back).
 		 */
 		/* sctp_ifap->localifa_flags |= SCTP_BEING_DELETED; */
 		vrf->total_ifa_count--;
 		LIST_REMOVE(sctp_ifap, next_bucket);
 		sctp_remove_ifa_from_ifn(sctp_ifap);
 	}
 #ifdef SCTP_DEBUG
 	else {
 		SCTPDBG(SCTP_DEBUG_PCB4, "Del Addr-ifn:%d Could not find address:",
 		    ifn_index);
 		SCTPDBG_ADDR(SCTP_DEBUG_PCB1, addr);
 	}
 #endif
 
 out_now:
 	SCTP_IPI_ADDR_WUNLOCK();
 	if (sctp_ifap) {
 		struct sctp_laddr *wi;
 
 		wi = SCTP_ZONE_GET(SCTP_BASE_INFO(ipi_zone_laddr), struct sctp_laddr);
 		if (wi == NULL) {
 			/*
 			 * Gak, what can we do? We have lost an address
 			 * change can you say HOSED?
 			 */
 			SCTPDBG(SCTP_DEBUG_PCB4, "Lost an address change?\n");
 
 			/* Oops, must decrement the count */
 			sctp_free_ifa(sctp_ifap);
 			return;
 		}
 		SCTP_INCR_LADDR_COUNT();
 		bzero(wi, sizeof(*wi));
 		(void)SCTP_GETTIME_TIMEVAL(&wi->start_time);
 		wi->ifa = sctp_ifap;
 		wi->action = SCTP_DEL_IP_ADDRESS;
 		SCTP_WQ_ADDR_LOCK();
 		/*
 		 * Should this really be a tailq? As it is we will process
 		 * the newest first :-0
 		 */
 		LIST_INSERT_HEAD(&SCTP_BASE_INFO(addr_wq), wi, sctp_nxt_addr);
 		SCTP_WQ_ADDR_UNLOCK();
 
 		sctp_timer_start(SCTP_TIMER_TYPE_ADDR_WQ,
 		    (struct sctp_inpcb *)NULL,
 		    (struct sctp_tcb *)NULL,
 		    (struct sctp_nets *)NULL);
 	}
 	return;
 }
 
 
 static int
 sctp_does_stcb_own_this_addr(struct sctp_tcb *stcb, struct sockaddr *to)
 {
 	int loopback_scope;
 
 #if defined(INET)
 	int ipv4_local_scope, ipv4_addr_legal;
 
 #endif
 #if defined(INET6)
 	int local_scope, site_scope, ipv6_addr_legal;
 
 #endif
 	struct sctp_vrf *vrf;
 	struct sctp_ifn *sctp_ifn;
 	struct sctp_ifa *sctp_ifa;
 
 	loopback_scope = stcb->asoc.scope.loopback_scope;
 #if defined(INET)
 	ipv4_local_scope = stcb->asoc.scope.ipv4_local_scope;
 	ipv4_addr_legal = stcb->asoc.scope.ipv4_addr_legal;
 #endif
 #if defined(INET6)
 	local_scope = stcb->asoc.scope.local_scope;
 	site_scope = stcb->asoc.scope.site_scope;
 	ipv6_addr_legal = stcb->asoc.scope.ipv6_addr_legal;
 #endif
 
 	SCTP_IPI_ADDR_RLOCK();
 	vrf = sctp_find_vrf(stcb->asoc.vrf_id);
 	if (vrf == NULL) {
 		/* no vrf, no addresses */
 		SCTP_IPI_ADDR_RUNLOCK();
 		return (0);
 	}
 	if (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_BOUNDALL) {
 		LIST_FOREACH(sctp_ifn, &vrf->ifnlist, next_ifn) {
 			if ((loopback_scope == 0) &&
 			    SCTP_IFN_IS_IFT_LOOP(sctp_ifn)) {
 				continue;
 			}
 			LIST_FOREACH(sctp_ifa, &sctp_ifn->ifalist, next_ifa) {
 				if (sctp_is_addr_restricted(stcb, sctp_ifa) &&
 				    (!sctp_is_addr_pending(stcb, sctp_ifa))) {
 					/*
 					 * We allow pending addresses, where
 					 * we have sent an asconf-add to be
 					 * considered valid.
 					 */
 					continue;
 				}
 				if (sctp_ifa->address.sa.sa_family != to->sa_family) {
 					continue;
 				}
 				switch (sctp_ifa->address.sa.sa_family) {
 #ifdef INET
 				case AF_INET:
 					if (ipv4_addr_legal) {
 						struct sockaddr_in *sin,
 						           *rsin;
 
 						sin = &sctp_ifa->address.sin;
 						rsin = (struct sockaddr_in *)to;
 						if ((ipv4_local_scope == 0) &&
 						    IN4_ISPRIVATE_ADDRESS(&sin->sin_addr)) {
 							continue;
 						}
 						if (prison_check_ip4(stcb->sctp_ep->ip_inp.inp.inp_cred,
 						    &sin->sin_addr) != 0) {
 							continue;
 						}
 						if (sin->sin_addr.s_addr == rsin->sin_addr.s_addr) {
 							SCTP_IPI_ADDR_RUNLOCK();
 							return (1);
 						}
 					}
 					break;
 #endif
 #ifdef INET6
 				case AF_INET6:
 					if (ipv6_addr_legal) {
 						struct sockaddr_in6 *sin6,
 						            *rsin6;
 
 						sin6 = &sctp_ifa->address.sin6;
 						rsin6 = (struct sockaddr_in6 *)to;
 						if (prison_check_ip6(stcb->sctp_ep->ip_inp.inp.inp_cred,
 						    &sin6->sin6_addr) != 0) {
 							continue;
 						}
 						if (IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr)) {
 							if (local_scope == 0)
 								continue;
 							if (sin6->sin6_scope_id == 0) {
 								if (sa6_recoverscope(sin6) != 0)
 									continue;
 							}
 						}
 						if ((site_scope == 0) &&
 						    (IN6_IS_ADDR_SITELOCAL(&sin6->sin6_addr))) {
 							continue;
 						}
 						if (SCTP6_ARE_ADDR_EQUAL(sin6, rsin6)) {
 							SCTP_IPI_ADDR_RUNLOCK();
 							return (1);
 						}
 					}
 					break;
 #endif
 				default:
 					/* TSNH */
 					break;
 				}
 			}
 		}
 	} else {
 		struct sctp_laddr *laddr;
 
 		LIST_FOREACH(laddr, &stcb->sctp_ep->sctp_addr_list, sctp_nxt_addr) {
 			if (laddr->ifa->localifa_flags & SCTP_BEING_DELETED) {
 				SCTPDBG(SCTP_DEBUG_PCB1, "ifa being deleted\n");
 				continue;
 			}
 			if (sctp_is_addr_restricted(stcb, laddr->ifa) &&
 			    (!sctp_is_addr_pending(stcb, laddr->ifa))) {
 				/*
 				 * We allow pending addresses, where we have
 				 * sent an asconf-add to be considered
 				 * valid.
 				 */
 				continue;
 			}
 			if (laddr->ifa->address.sa.sa_family != to->sa_family) {
 				continue;
 			}
 			switch (to->sa_family) {
 #ifdef INET
 			case AF_INET:
 				{
 					struct sockaddr_in *sin, *rsin;
 
 					sin = &laddr->ifa->address.sin;
 					rsin = (struct sockaddr_in *)to;
 					if (sin->sin_addr.s_addr == rsin->sin_addr.s_addr) {
 						SCTP_IPI_ADDR_RUNLOCK();
 						return (1);
 					}
 					break;
 				}
 #endif
 #ifdef INET6
 			case AF_INET6:
 				{
 					struct sockaddr_in6 *sin6, *rsin6;
 
 					sin6 = &laddr->ifa->address.sin6;
 					rsin6 = (struct sockaddr_in6 *)to;
 					if (SCTP6_ARE_ADDR_EQUAL(sin6, rsin6)) {
 						SCTP_IPI_ADDR_RUNLOCK();
 						return (1);
 					}
 					break;
 				}
 
 #endif
 			default:
 				/* TSNH */
 				break;
 			}
 
 		}
 	}
 	SCTP_IPI_ADDR_RUNLOCK();
 	return (0);
 }
 
 
 static struct sctp_tcb *
 sctp_tcb_special_locate(struct sctp_inpcb **inp_p, struct sockaddr *from,
     struct sockaddr *to, struct sctp_nets **netp, uint32_t vrf_id)
 {
 	/**** ASSUMES THE CALLER holds the INP_INFO_RLOCK */
 	/*
 	 * If we support the TCP model, then we must now dig through to see
 	 * if we can find our endpoint in the list of tcp ep's.
 	 */
 	uint16_t lport, rport;
 	struct sctppcbhead *ephead;
 	struct sctp_inpcb *inp;
 	struct sctp_laddr *laddr;
 	struct sctp_tcb *stcb;
 	struct sctp_nets *net;
 
 	if ((to == NULL) || (from == NULL)) {
 		return (NULL);
 	}
 	switch (to->sa_family) {
 #ifdef INET
 	case AF_INET:
 		if (from->sa_family == AF_INET) {
 			lport = ((struct sockaddr_in *)to)->sin_port;
 			rport = ((struct sockaddr_in *)from)->sin_port;
 		} else {
 			return (NULL);
 		}
 		break;
 #endif
 #ifdef INET6
 	case AF_INET6:
 		if (from->sa_family == AF_INET6) {
 			lport = ((struct sockaddr_in6 *)to)->sin6_port;
 			rport = ((struct sockaddr_in6 *)from)->sin6_port;
 		} else {
 			return (NULL);
 		}
 		break;
 #endif
 	default:
 		return (NULL);
 	}
 	ephead = &SCTP_BASE_INFO(sctp_tcpephash)[SCTP_PCBHASH_ALLADDR((lport | rport), SCTP_BASE_INFO(hashtcpmark))];
 	/*
 	 * Ok now for each of the guys in this bucket we must look and see:
 	 * - Does the remote port match? - Do its single association's
 	 * addresses match this address (to)? If so we update inp_p to point
 	 * to this ep and return the tcb from it.
 	 */
 	LIST_FOREACH(inp, ephead, sctp_hash) {
 		SCTP_INP_RLOCK(inp);
 		if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) {
 			SCTP_INP_RUNLOCK(inp);
 			continue;
 		}
 		if (lport != inp->sctp_lport) {
 			SCTP_INP_RUNLOCK(inp);
 			continue;
 		}
 		switch (to->sa_family) {
 #ifdef INET
 		case AF_INET:
 			{
 				struct sockaddr_in *sin;
 
 				sin = (struct sockaddr_in *)to;
 				if (prison_check_ip4(inp->ip_inp.inp.inp_cred,
 				    &sin->sin_addr) != 0) {
 					SCTP_INP_RUNLOCK(inp);
 					continue;
 				}
 				break;
 			}
 #endif
 #ifdef INET6
 		case AF_INET6:
 			{
 				struct sockaddr_in6 *sin6;
 
 				sin6 = (struct sockaddr_in6 *)to;
 				if (prison_check_ip6(inp->ip_inp.inp.inp_cred,
 				    &sin6->sin6_addr) != 0) {
 					SCTP_INP_RUNLOCK(inp);
 					continue;
 				}
 				break;
 			}
 #endif
 		default:
 			SCTP_INP_RUNLOCK(inp);
 			continue;
 		}
 		if (inp->def_vrf_id != vrf_id) {
 			SCTP_INP_RUNLOCK(inp);
 			continue;
 		}
 		/* check to see if the ep has one of the addresses */
 		if ((inp->sctp_flags & SCTP_PCB_FLAGS_BOUNDALL) == 0) {
 			/* We are NOT bound all, so look further */
 			int match = 0;
 
 			LIST_FOREACH(laddr, &inp->sctp_addr_list, sctp_nxt_addr) {
 
 				if (laddr->ifa == NULL) {
 					SCTPDBG(SCTP_DEBUG_PCB1, "%s: NULL ifa\n", __func__);
 					continue;
 				}
 				if (laddr->ifa->localifa_flags & SCTP_BEING_DELETED) {
 					SCTPDBG(SCTP_DEBUG_PCB1, "ifa being deleted\n");
 					continue;
 				}
 				if (laddr->ifa->address.sa.sa_family ==
 				    to->sa_family) {
 					/* see if it matches */
 #ifdef INET
 					if (from->sa_family == AF_INET) {
 						struct sockaddr_in *intf_addr,
 						           *sin;
 
 						intf_addr = &laddr->ifa->address.sin;
 						sin = (struct sockaddr_in *)to;
 						if (sin->sin_addr.s_addr ==
 						    intf_addr->sin_addr.s_addr) {
 							match = 1;
 							break;
 						}
 					}
 #endif
 #ifdef INET6
 					if (from->sa_family == AF_INET6) {
 						struct sockaddr_in6 *intf_addr6;
 						struct sockaddr_in6 *sin6;
 
 						sin6 = (struct sockaddr_in6 *)
 						    to;
 						intf_addr6 = &laddr->ifa->address.sin6;
 
 						if (SCTP6_ARE_ADDR_EQUAL(sin6,
 						    intf_addr6)) {
 							match = 1;
 							break;
 						}
 					}
 #endif
 				}
 			}
 			if (match == 0) {
 				/* This endpoint does not have this address */
 				SCTP_INP_RUNLOCK(inp);
 				continue;
 			}
 		}
 		/*
 		 * Ok if we hit here the ep has the address, does it hold
 		 * the tcb?
 		 */
 		/* XXX: Why don't we TAILQ_FOREACH through sctp_asoc_list? */
 		stcb = LIST_FIRST(&inp->sctp_asoc_list);
 		if (stcb == NULL) {
 			SCTP_INP_RUNLOCK(inp);
 			continue;
 		}
 		SCTP_TCB_LOCK(stcb);
 		if (!sctp_does_stcb_own_this_addr(stcb, to)) {
 			SCTP_TCB_UNLOCK(stcb);
 			SCTP_INP_RUNLOCK(inp);
 			continue;
 		}
 		if (stcb->rport != rport) {
 			/* remote port does not match. */
 			SCTP_TCB_UNLOCK(stcb);
 			SCTP_INP_RUNLOCK(inp);
 			continue;
 		}
 		if (stcb->asoc.state & SCTP_STATE_ABOUT_TO_BE_FREED) {
 			SCTP_TCB_UNLOCK(stcb);
 			SCTP_INP_RUNLOCK(inp);
 			continue;
 		}
 		if (!sctp_does_stcb_own_this_addr(stcb, to)) {
 			SCTP_TCB_UNLOCK(stcb);
 			SCTP_INP_RUNLOCK(inp);
 			continue;
 		}
 		/* Does this TCB have a matching address? */
 		TAILQ_FOREACH(net, &stcb->asoc.nets, sctp_next) {
 
 			if (net->ro._l_addr.sa.sa_family != from->sa_family) {
 				/* not the same family, can't be a match */
 				continue;
 			}
 			switch (from->sa_family) {
 #ifdef INET
 			case AF_INET:
 				{
 					struct sockaddr_in *sin, *rsin;
 
 					sin = (struct sockaddr_in *)&net->ro._l_addr;
 					rsin = (struct sockaddr_in *)from;
 					if (sin->sin_addr.s_addr ==
 					    rsin->sin_addr.s_addr) {
 						/* found it */
 						if (netp != NULL) {
 							*netp = net;
 						}
 						/*
 						 * Update the endpoint
 						 * pointer
 						 */
 						*inp_p = inp;
 						SCTP_INP_RUNLOCK(inp);
 						return (stcb);
 					}
 					break;
 				}
 #endif
 #ifdef INET6
 			case AF_INET6:
 				{
 					struct sockaddr_in6 *sin6, *rsin6;
 
 					sin6 = (struct sockaddr_in6 *)&net->ro._l_addr;
 					rsin6 = (struct sockaddr_in6 *)from;
 					if (SCTP6_ARE_ADDR_EQUAL(sin6,
 					    rsin6)) {
 						/* found it */
 						if (netp != NULL) {
 							*netp = net;
 						}
 						/*
 						 * Update the endpoint
 						 * pointer
 						 */
 						*inp_p = inp;
 						SCTP_INP_RUNLOCK(inp);
 						return (stcb);
 					}
 					break;
 				}
 #endif
 			default:
 				/* TSNH */
 				break;
 			}
 		}
 		SCTP_TCB_UNLOCK(stcb);
 		SCTP_INP_RUNLOCK(inp);
 	}
 	return (NULL);
 }
 
 
 /*
  * rules for use
  *
  * 1) If I return a NULL you must decrement any INP ref cnt.
  * 2) If I find an stcb, both will be locked (locked_tcb and stcb) but
  *    decrement will be done (if locked == NULL).
  * 3) Decrement happens on return ONLY if locked == NULL.
  */
 
 struct sctp_tcb *
 sctp_findassociation_ep_addr(struct sctp_inpcb **inp_p, struct sockaddr *remote,
     struct sctp_nets **netp, struct sockaddr *local, struct sctp_tcb *locked_tcb)
 {
 	struct sctpasochead *head;
 	struct sctp_inpcb *inp;
 	struct sctp_tcb *stcb = NULL;
 	struct sctp_nets *net;
 	uint16_t rport;
 
 	inp = *inp_p;
 	switch (remote->sa_family) {
 #ifdef INET
 	case AF_INET:
 		rport = (((struct sockaddr_in *)remote)->sin_port);
 		break;
 #endif
 #ifdef INET6
 	case AF_INET6:
 		rport = (((struct sockaddr_in6 *)remote)->sin6_port);
 		break;
 #endif
 	default:
 		return (NULL);
 	}
 	if (locked_tcb) {
 		/*
 		 * Unlock so we can do proper locking here; this occurs when
 		 * called from load_addresses_from_init.
 		 */
 		atomic_add_int(&locked_tcb->asoc.refcnt, 1);
 		SCTP_TCB_UNLOCK(locked_tcb);
 	}
 	SCTP_INP_INFO_RLOCK();
 	if ((inp->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) ||
 	    (inp->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) {
 		/*-
 		 * Now either this guy is our listener or it's the
 		 * connector. If it is the one that issued the connect, then
 		 * its only chance is to be the first TCB in the list. If
 		 * it is the acceptor, then do the special_lookup to hash
 		 * and find the real inp.
 		 */
 		if ((inp->sctp_socket) && (inp->sctp_socket->so_qlimit)) {
 			/* from is peer addr, to is my addr */
 			stcb = sctp_tcb_special_locate(inp_p, remote, local,
 			    netp, inp->def_vrf_id);
 			if ((stcb != NULL) && (locked_tcb == NULL)) {
 				/* we have a locked tcb, lower refcount */
 				SCTP_INP_DECR_REF(inp);
 			}
 			if ((locked_tcb != NULL) && (locked_tcb != stcb)) {
 				SCTP_INP_RLOCK(locked_tcb->sctp_ep);
 				SCTP_TCB_LOCK(locked_tcb);
 				atomic_subtract_int(&locked_tcb->asoc.refcnt, 1);
 				SCTP_INP_RUNLOCK(locked_tcb->sctp_ep);
 			}
 			SCTP_INP_INFO_RUNLOCK();
 			return (stcb);
 		} else {
 			SCTP_INP_WLOCK(inp);
 			if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) {
 				goto null_return;
 			}
 			stcb = LIST_FIRST(&inp->sctp_asoc_list);
 			if (stcb == NULL) {
 				goto null_return;
 			}
 			SCTP_TCB_LOCK(stcb);
 
 			if (stcb->rport != rport) {
 				/* remote port does not match. */
 				SCTP_TCB_UNLOCK(stcb);
 				goto null_return;
 			}
 			if (stcb->asoc.state & SCTP_STATE_ABOUT_TO_BE_FREED) {
 				SCTP_TCB_UNLOCK(stcb);
 				goto null_return;
 			}
 			if (local && !sctp_does_stcb_own_this_addr(stcb, local)) {
 				SCTP_TCB_UNLOCK(stcb);
 				goto null_return;
 			}
 			/* now look at the list of remote addresses */
 			TAILQ_FOREACH(net, &stcb->asoc.nets, sctp_next) {
 #ifdef INVARIANTS
 				if (net == (TAILQ_NEXT(net, sctp_next))) {
 					panic("Corrupt net list");
 				}
 #endif
 				if (net->ro._l_addr.sa.sa_family !=
 				    remote->sa_family) {
 					/* not the same family */
 					continue;
 				}
 				switch (remote->sa_family) {
 #ifdef INET
 				case AF_INET:
 					{
 						struct sockaddr_in *sin,
 						           *rsin;
 
 						sin = (struct sockaddr_in *)
 						    &net->ro._l_addr;
 						rsin = (struct sockaddr_in *)remote;
 						if (sin->sin_addr.s_addr ==
 						    rsin->sin_addr.s_addr) {
 							/* found it */
 							if (netp != NULL) {
 								*netp = net;
 							}
 							if (locked_tcb == NULL) {
 								SCTP_INP_DECR_REF(inp);
 							} else if (locked_tcb != stcb) {
 								SCTP_TCB_LOCK(locked_tcb);
 							}
 							if (locked_tcb) {
 								atomic_subtract_int(&locked_tcb->asoc.refcnt, 1);
 							}
 							SCTP_INP_WUNLOCK(inp);
 							SCTP_INP_INFO_RUNLOCK();
 							return (stcb);
 						}
 						break;
 					}
 #endif
 #ifdef INET6
 				case AF_INET6:
 					{
 						struct sockaddr_in6 *sin6,
 						            *rsin6;
 
 						sin6 = (struct sockaddr_in6 *)&net->ro._l_addr;
 						rsin6 = (struct sockaddr_in6 *)remote;
 						if (SCTP6_ARE_ADDR_EQUAL(sin6,
 						    rsin6)) {
 							/* found it */
 							if (netp != NULL) {
 								*netp = net;
 							}
 							if (locked_tcb == NULL) {
 								SCTP_INP_DECR_REF(inp);
 							} else if (locked_tcb != stcb) {
 								SCTP_TCB_LOCK(locked_tcb);
 							}
 							if (locked_tcb) {
 								atomic_subtract_int(&locked_tcb->asoc.refcnt, 1);
 							}
 							SCTP_INP_WUNLOCK(inp);
 							SCTP_INP_INFO_RUNLOCK();
 							return (stcb);
 						}
 						break;
 					}
 #endif
 				default:
 					/* TSNH */
 					break;
 				}
 			}
 			SCTP_TCB_UNLOCK(stcb);
 		}
 	} else {
 		SCTP_INP_WLOCK(inp);
 		if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) {
 			goto null_return;
 		}
 		head = &inp->sctp_tcbhash[SCTP_PCBHASH_ALLADDR(rport,
 		    inp->sctp_hashmark)];
 		LIST_FOREACH(stcb, head, sctp_tcbhash) {
 			if (stcb->rport != rport) {
 				/* remote port does not match */
 				continue;
 			}
 			SCTP_TCB_LOCK(stcb);
 			if (stcb->asoc.state & SCTP_STATE_ABOUT_TO_BE_FREED) {
 				SCTP_TCB_UNLOCK(stcb);
 				continue;
 			}
 			if (local && !sctp_does_stcb_own_this_addr(stcb, local)) {
 				SCTP_TCB_UNLOCK(stcb);
 				continue;
 			}
 			/* now look at the list of remote addresses */
 			TAILQ_FOREACH(net, &stcb->asoc.nets, sctp_next) {
 #ifdef INVARIANTS
 				if (net == (TAILQ_NEXT(net, sctp_next))) {
 					panic("Corrupt net list");
 				}
 #endif
 				if (net->ro._l_addr.sa.sa_family !=
 				    remote->sa_family) {
 					/* not the same family */
 					continue;
 				}
 				switch (remote->sa_family) {
 #ifdef INET
 				case AF_INET:
 					{
 						struct sockaddr_in *sin,
 						           *rsin;
 
 						sin = (struct sockaddr_in *)
 						    &net->ro._l_addr;
 						rsin = (struct sockaddr_in *)remote;
 						if (sin->sin_addr.s_addr ==
 						    rsin->sin_addr.s_addr) {
 							/* found it */
 							if (netp != NULL) {
 								*netp = net;
 							}
 							if (locked_tcb == NULL) {
 								SCTP_INP_DECR_REF(inp);
 							} else if (locked_tcb != stcb) {
 								SCTP_TCB_LOCK(locked_tcb);
 							}
 							if (locked_tcb) {
 								atomic_subtract_int(&locked_tcb->asoc.refcnt, 1);
 							}
 							SCTP_INP_WUNLOCK(inp);
 							SCTP_INP_INFO_RUNLOCK();
 							return (stcb);
 						}
 						break;
 					}
 #endif
 #ifdef INET6
 				case AF_INET6:
 					{
 						struct sockaddr_in6 *sin6,
 						            *rsin6;
 
 						sin6 = (struct sockaddr_in6 *)
 						    &net->ro._l_addr;
 						rsin6 = (struct sockaddr_in6 *)remote;
 						if (SCTP6_ARE_ADDR_EQUAL(sin6,
 						    rsin6)) {
 							/* found it */
 							if (netp != NULL) {
 								*netp = net;
 							}
 							if (locked_tcb == NULL) {
 								SCTP_INP_DECR_REF(inp);
 							} else if (locked_tcb != stcb) {
 								SCTP_TCB_LOCK(locked_tcb);
 							}
 							if (locked_tcb) {
 								atomic_subtract_int(&locked_tcb->asoc.refcnt, 1);
 							}
 							SCTP_INP_WUNLOCK(inp);
 							SCTP_INP_INFO_RUNLOCK();
 							return (stcb);
 						}
 						break;
 					}
 #endif
 				default:
 					/* TSNH */
 					break;
 				}
 			}
 			SCTP_TCB_UNLOCK(stcb);
 		}
 	}
 null_return:
 	/* clean up for returning null */
 	if (locked_tcb) {
 		SCTP_TCB_LOCK(locked_tcb);
 		atomic_subtract_int(&locked_tcb->asoc.refcnt, 1);
 	}
 	SCTP_INP_WUNLOCK(inp);
 	SCTP_INP_INFO_RUNLOCK();
 	/* not found */
 	return (NULL);
 }
 
 
 /*
  * Find an association for a specific endpoint using the association id given
  * out in the COMM_UP notification
  */
 struct sctp_tcb *
 sctp_findasoc_ep_asocid_locked(struct sctp_inpcb *inp, sctp_assoc_t asoc_id, int want_lock)
 {
 	/*
 	 * Use the assoc_id to find the association (TCB) on this endpoint
 	 */
 	struct sctpasochead *head;
 	struct sctp_tcb *stcb;
 	uint32_t id;
 
 	if (inp == NULL) {
 		SCTP_PRINTF("TSNH ep_associd\n");
 		return (NULL);
 	}
 	if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) {
 		SCTP_PRINTF("TSNH ep_associd0\n");
 		return (NULL);
 	}
 	id = (uint32_t) asoc_id;
 	head = &inp->sctp_asocidhash[SCTP_PCBHASH_ASOC(id, inp->hashasocidmark)];
 	if (head == NULL) {
 		/* invalid id TSNH */
 		SCTP_PRINTF("TSNH ep_associd1\n");
 		return (NULL);
 	}
 	LIST_FOREACH(stcb, head, sctp_tcbasocidhash) {
 		if (stcb->asoc.assoc_id == id) {
 			if (inp != stcb->sctp_ep) {
 				/*
 				 * some other guy has the same id active (id
 				 * collision ??).
 				 */
 				SCTP_PRINTF("TSNH ep_associd2\n");
 				continue;
 			}
 			if (stcb->asoc.state & SCTP_STATE_ABOUT_TO_BE_FREED) {
 				continue;
 			}
 			if (want_lock) {
 				SCTP_TCB_LOCK(stcb);
 			}
 			return (stcb);
 		}
 	}
 	return (NULL);
 }
 
 
 struct sctp_tcb *
 sctp_findassociation_ep_asocid(struct sctp_inpcb *inp, sctp_assoc_t asoc_id, int want_lock)
 {
 	struct sctp_tcb *stcb;
 
 	SCTP_INP_RLOCK(inp);
 	stcb = sctp_findasoc_ep_asocid_locked(inp, asoc_id, want_lock);
 	SCTP_INP_RUNLOCK(inp);
 	return (stcb);
 }
 
 
 /*
  * Endpoint probe expects that the INP_INFO is locked.
  */
 static struct sctp_inpcb *
 sctp_endpoint_probe(struct sockaddr *nam, struct sctppcbhead *head,
     uint16_t lport, uint32_t vrf_id)
 {
 	struct sctp_inpcb *inp;
 	struct sctp_laddr *laddr;
 
 #ifdef INET
 	struct sockaddr_in *sin;
 
 #endif
 #ifdef INET6
 	struct sockaddr_in6 *sin6;
 	struct sockaddr_in6 *intf_addr6;
 
 #endif
 	int fnd;
 
 #ifdef INET
 	sin = NULL;
 #endif
 #ifdef INET6
 	sin6 = NULL;
 #endif
 	switch (nam->sa_family) {
 #ifdef INET
 	case AF_INET:
 		sin = (struct sockaddr_in *)nam;
 		break;
 #endif
 #ifdef INET6
 	case AF_INET6:
 		sin6 = (struct sockaddr_in6 *)nam;
 		break;
 #endif
 	default:
 		/* unsupported family */
 		return (NULL);
 	}
 
 	if (head == NULL)
 		return (NULL);
 
 	LIST_FOREACH(inp, head, sctp_hash) {
 		SCTP_INP_RLOCK(inp);
 		if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) {
 			SCTP_INP_RUNLOCK(inp);
 			continue;
 		}
 		if ((inp->sctp_flags & SCTP_PCB_FLAGS_BOUNDALL) &&
 		    (inp->sctp_lport == lport)) {
 			/* got it */
 			switch (nam->sa_family) {
 #ifdef INET
 			case AF_INET:
 				if ((inp->sctp_flags & SCTP_PCB_FLAGS_BOUND_V6) &&
 				    SCTP_IPV6_V6ONLY(inp)) {
 					/*
 					 * IPv4 on an IPv6 socket with ONLY
 					 * IPv6 set
 					 */
 					SCTP_INP_RUNLOCK(inp);
 					continue;
 				}
 				if (prison_check_ip4(inp->ip_inp.inp.inp_cred,
 				    &sin->sin_addr) != 0) {
 					SCTP_INP_RUNLOCK(inp);
 					continue;
 				}
 				break;
 #endif
 #ifdef INET6
 			case AF_INET6:
 				/*
 				 * A V6 address and the endpoint is NOT
 				 * bound V6
 				 */
 				if ((inp->sctp_flags & SCTP_PCB_FLAGS_BOUND_V6) == 0) {
 					SCTP_INP_RUNLOCK(inp);
 					continue;
 				}
 				if (prison_check_ip6(inp->ip_inp.inp.inp_cred,
 				    &sin6->sin6_addr) != 0) {
 					SCTP_INP_RUNLOCK(inp);
 					continue;
 				}
 				break;
 #endif
 			default:
 				break;
 			}
 			/* does a VRF id match? */
 			fnd = 0;
 			if (inp->def_vrf_id == vrf_id)
 				fnd = 1;
 
 			SCTP_INP_RUNLOCK(inp);
 			if (!fnd)
 				continue;
 			return (inp);
 		}
 		SCTP_INP_RUNLOCK(inp);
 	}
 	switch (nam->sa_family) {
 #ifdef INET
 	case AF_INET:
 		if (sin->sin_addr.s_addr == INADDR_ANY) {
 			/* Can't hunt for one that has no address specified */
 			return (NULL);
 		}
 		break;
 #endif
 #ifdef INET6
 	case AF_INET6:
 		if (IN6_IS_ADDR_UNSPECIFIED(&sin6->sin6_addr)) {
 			/* Can't hunt for one that has no address specified */
 			return (NULL);
 		}
 		break;
 #endif
 	default:
 		break;
 	}
 	/*
 	 * ok, not bound to all so see if we can find an EP bound to this
 	 * address.
 	 */
 	LIST_FOREACH(inp, head, sctp_hash) {
 		SCTP_INP_RLOCK(inp);
 		if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) {
 			SCTP_INP_RUNLOCK(inp);
 			continue;
 		}
 		if ((inp->sctp_flags & SCTP_PCB_FLAGS_BOUNDALL)) {
 			SCTP_INP_RUNLOCK(inp);
 			continue;
 		}
 		/*
 		 * Ok this could be a likely candidate, look at all of its
 		 * addresses
 		 */
 		if (inp->sctp_lport != lport) {
 			SCTP_INP_RUNLOCK(inp);
 			continue;
 		}
 		/* does a VRF id match? */
 		fnd = 0;
 		if (inp->def_vrf_id == vrf_id)
 			fnd = 1;
 
 		if (!fnd) {
 			SCTP_INP_RUNLOCK(inp);
 			continue;
 		}
 		LIST_FOREACH(laddr, &inp->sctp_addr_list, sctp_nxt_addr) {
 			if (laddr->ifa == NULL) {
 				SCTPDBG(SCTP_DEBUG_PCB1, "%s: NULL ifa\n",
 				    __func__);
 				continue;
 			}
 			SCTPDBG(SCTP_DEBUG_PCB1, "Ok laddr->ifa:%p is possible, ",
 			    (void *)laddr->ifa);
 			if (laddr->ifa->localifa_flags & SCTP_BEING_DELETED) {
 				SCTPDBG(SCTP_DEBUG_PCB1, "Huh IFA being deleted\n");
 				continue;
 			}
 			if (laddr->ifa->address.sa.sa_family == nam->sa_family) {
 				/* possible, see if it matches */
 				switch (nam->sa_family) {
 #ifdef INET
 				case AF_INET:
 					if (sin->sin_addr.s_addr ==
 					    laddr->ifa->address.sin.sin_addr.s_addr) {
 						SCTP_INP_RUNLOCK(inp);
 						return (inp);
 					}
 					break;
 #endif
 #ifdef INET6
 				case AF_INET6:
 					intf_addr6 = &laddr->ifa->address.sin6;
 					if (SCTP6_ARE_ADDR_EQUAL(sin6,
 					    intf_addr6)) {
 						SCTP_INP_RUNLOCK(inp);
 						return (inp);
 					}
 					break;
 #endif
 				}
 			}
 		}
 		SCTP_INP_RUNLOCK(inp);
 	}
 	return (NULL);
 }
 
 
 static struct sctp_inpcb *
 sctp_isport_inuse(struct sctp_inpcb *inp, uint16_t lport, uint32_t vrf_id)
 {
 	struct sctppcbhead *head;
 	struct sctp_inpcb *t_inp;
 	int fnd;
 
 	head = &SCTP_BASE_INFO(sctp_ephash)[SCTP_PCBHASH_ALLADDR(lport,
 	    SCTP_BASE_INFO(hashmark))];
 	LIST_FOREACH(t_inp, head, sctp_hash) {
 		if (t_inp->sctp_lport != lport) {
 			continue;
 		}
 		/* is it in the VRF in question */
 		fnd = 0;
 		if (t_inp->def_vrf_id == vrf_id)
 			fnd = 1;
 		if (!fnd)
 			continue;
 
 		/* This one is in use. */
 		/* check the v6/v4 binding issue */
 		if ((t_inp->sctp_flags & SCTP_PCB_FLAGS_BOUND_V6) &&
 		    SCTP_IPV6_V6ONLY(t_inp)) {
 			if (inp->sctp_flags & SCTP_PCB_FLAGS_BOUND_V6) {
 				/* collision in V6 space */
 				return (t_inp);
 			} else {
 				/* inp is BOUND_V4 no conflict */
 				continue;
 			}
 		} else if (t_inp->sctp_flags & SCTP_PCB_FLAGS_BOUND_V6) {
 			/* t_inp is bound v4 and v6, conflict always */
 			return (t_inp);
 		} else {
 			/* t_inp is bound only V4 */
 			if ((inp->sctp_flags & SCTP_PCB_FLAGS_BOUND_V6) &&
 			    SCTP_IPV6_V6ONLY(inp)) {
 				/* no conflict */
 				continue;
 			}
 			/* else fall through to conflict */
 		}
 		return (t_inp);
 	}
 	return (NULL);
 }
 
 
 int
 sctp_swap_inpcb_for_listen(struct sctp_inpcb *inp)
 {
 	/* For 1-2-1 with port reuse */
 	struct sctppcbhead *head;
 	struct sctp_inpcb *tinp, *ninp;
 
 	if (sctp_is_feature_off(inp, SCTP_PCB_FLAGS_PORTREUSE)) {
 		/* only works with port reuse on */
 		return (-1);
 	}
 	if ((inp->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL) == 0) {
 		return (0);
 	}
 	SCTP_INP_RUNLOCK(inp);
 	SCTP_INP_INFO_WLOCK();
 	head = &SCTP_BASE_INFO(sctp_ephash)[SCTP_PCBHASH_ALLADDR(inp->sctp_lport,
 	    SCTP_BASE_INFO(hashmark))];
 	/* Kick out all non-listeners to the TCP hash */
 	LIST_FOREACH_SAFE(tinp, head, sctp_hash, ninp) {
 		if (tinp->sctp_lport != inp->sctp_lport) {
 			continue;
 		}
 		if (tinp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) {
 			continue;
 		}
 		if (tinp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE) {
 			continue;
 		}
 		if (tinp->sctp_socket->so_qlimit) {
 			continue;
 		}
 		SCTP_INP_WLOCK(tinp);
 		LIST_REMOVE(tinp, sctp_hash);
 		head = &SCTP_BASE_INFO(sctp_tcpephash)[SCTP_PCBHASH_ALLADDR(tinp->sctp_lport, SCTP_BASE_INFO(hashtcpmark))];
 		tinp->sctp_flags |= SCTP_PCB_FLAGS_IN_TCPPOOL;
 		LIST_INSERT_HEAD(head, tinp, sctp_hash);
 		SCTP_INP_WUNLOCK(tinp);
 	}
 	SCTP_INP_WLOCK(inp);
 	/* Pull from where he was */
 	LIST_REMOVE(inp, sctp_hash);
 	inp->sctp_flags &= ~SCTP_PCB_FLAGS_IN_TCPPOOL;
 	head = &SCTP_BASE_INFO(sctp_ephash)[SCTP_PCBHASH_ALLADDR(inp->sctp_lport, SCTP_BASE_INFO(hashmark))];
 	LIST_INSERT_HEAD(head, inp, sctp_hash);
 	SCTP_INP_WUNLOCK(inp);
 	SCTP_INP_RLOCK(inp);
 	SCTP_INP_INFO_WUNLOCK();
 	return (0);
 }
 
 
 struct sctp_inpcb *
 sctp_pcb_findep(struct sockaddr *nam, int find_tcp_pool, int have_lock,
     uint32_t vrf_id)
 {
 	/*
 	 * First we check the hash table to see if someone has this port
 	 * bound with just the port.
 	 */
 	struct sctp_inpcb *inp;
 	struct sctppcbhead *head;
 	int lport;
 	unsigned int i;
 
 #ifdef INET
 	struct sockaddr_in *sin;
 
 #endif
 #ifdef INET6
 	struct sockaddr_in6 *sin6;
 
 #endif
 
 	switch (nam->sa_family) {
 #ifdef INET
 	case AF_INET:
 		sin = (struct sockaddr_in *)nam;
 		lport = sin->sin_port;
 		break;
 #endif
 #ifdef INET6
 	case AF_INET6:
 		sin6 = (struct sockaddr_in6 *)nam;
 		lport = sin6->sin6_port;
 		break;
 #endif
 	default:
 		return (NULL);
 	}
 	/*
 	 * I could cheat here and just cast to one of the types but we will
 	 * do it right. It also provides the check against an Unsupported
 	 * type too.
 	 */
 	/* Find the head of the ALLADDR chain */
 	if (have_lock == 0) {
 		SCTP_INP_INFO_RLOCK();
 	}
 	head = &SCTP_BASE_INFO(sctp_ephash)[SCTP_PCBHASH_ALLADDR(lport,
 	    SCTP_BASE_INFO(hashmark))];
 	inp = sctp_endpoint_probe(nam, head, lport, vrf_id);
 
 	/*
 	 * If the TCP model exists it could be that the main listening
 	 * endpoint is gone but there still exists a connected socket for
 	 * this guy. If so we can return the first one that we find. This
 	 * may NOT be the correct one so the caller should be wary on the
 	 * returned INP. Currently the only caller that sets find_tcp_pool
 	 * is in bindx where we are verifying that a user CAN bind the
 	 * address. He either has bound it already, or someone else has, or
 	 * it's open to bind, so this is good enough.
 	 */
 	if (inp == NULL && find_tcp_pool) {
 		for (i = 0; i < SCTP_BASE_INFO(hashtcpmark) + 1; i++) {
 			head = &SCTP_BASE_INFO(sctp_tcpephash)[i];
 			inp = sctp_endpoint_probe(nam, head, lport, vrf_id);
 			if (inp) {
 				break;
 			}
 		}
 	}
 	if (inp) {
 		SCTP_INP_INCR_REF(inp);
 	}
 	if (have_lock == 0) {
 		SCTP_INP_INFO_RUNLOCK();
 	}
 	return (inp);
 }
 
 
 /*
  * Find an association for an endpoint with the pointer to whom you want to
  * send to and the endpoint pointer. The address can be IPv4 or IPv6. We may
  * need to change the *to to some other struct like a mbuf...
  */
 struct sctp_tcb *
 sctp_findassociation_addr_sa(struct sockaddr *from, struct sockaddr *to,
     struct sctp_inpcb **inp_p, struct sctp_nets **netp, int find_tcp_pool,
     uint32_t vrf_id)
 {
 	struct sctp_inpcb *inp = NULL;
 	struct sctp_tcb *stcb;
 
 	SCTP_INP_INFO_RLOCK();
 	if (find_tcp_pool) {
 		if (inp_p != NULL) {
 			stcb = sctp_tcb_special_locate(inp_p, from, to, netp,
 			    vrf_id);
 		} else {
 			stcb = sctp_tcb_special_locate(&inp, from, to, netp,
 			    vrf_id);
 		}
 		if (stcb != NULL) {
 			SCTP_INP_INFO_RUNLOCK();
 			return (stcb);
 		}
 	}
 	inp = sctp_pcb_findep(to, 0, 1, vrf_id);
 	if (inp_p != NULL) {
 		*inp_p = inp;
 	}
 	SCTP_INP_INFO_RUNLOCK();
 	if (inp == NULL) {
 		return (NULL);
 	}
 	/*
 	 * ok, we have an endpoint, now lets find the assoc for it (if any)
 	 * we now place the source address or from in the to of the find
 	 * endpoint call. Since in reality this chain is used from the
 	 * inbound packet side.
 	 */
 	if (inp_p != NULL) {
 		stcb = sctp_findassociation_ep_addr(inp_p, from, netp, to,
 		    NULL);
 	} else {
 		stcb = sctp_findassociation_ep_addr(&inp, from, netp, to,
 		    NULL);
 	}
 	return (stcb);
 }
 
 
 /*
  * This routine will grub through the mbuf that is an INIT or INIT-ACK and
  * find all addresses that the sender has specified in any address list. Each
  * address will be used to look up the TCB and see if one exists.
  */
 static struct sctp_tcb *
 sctp_findassociation_special_addr(struct mbuf *m, int offset,
     struct sctphdr *sh, struct sctp_inpcb **inp_p, struct sctp_nets **netp,
     struct sockaddr *dst)
 {
 	struct sctp_paramhdr *phdr, parm_buf;
 
 #if defined(INET) || defined(INET6)
 	struct sctp_tcb *stcb;
 	uint16_t ptype;
 
 #endif
 	uint16_t plen;
 
 #ifdef INET
 	struct sockaddr_in sin4;
 
 #endif
 #ifdef INET6
 	struct sockaddr_in6 sin6;
 
 #endif
 
 #ifdef INET
 	memset(&sin4, 0, sizeof(sin4));
 	sin4.sin_len = sizeof(sin4);
 	sin4.sin_family = AF_INET;
 	sin4.sin_port = sh->src_port;
 #endif
 #ifdef INET6
 	memset(&sin6, 0, sizeof(sin6));
 	sin6.sin6_len = sizeof(sin6);
 	sin6.sin6_family = AF_INET6;
 	sin6.sin6_port = sh->src_port;
 #endif
 
 	offset += sizeof(struct sctp_init_chunk);
 
 	phdr = sctp_get_next_param(m, offset, &parm_buf, sizeof(parm_buf));
 	while (phdr != NULL) {
 		/* now we must see if we want the parameter */
 #if defined(INET) || defined(INET6)
 		ptype = ntohs(phdr->param_type);
 #endif
 		plen = ntohs(phdr->param_length);
 		if (plen == 0) {
 			break;
 		}
 #ifdef INET
 		if (ptype == SCTP_IPV4_ADDRESS &&
 		    plen == sizeof(struct sctp_ipv4addr_param)) {
 			/* Get the rest of the address */
 			struct sctp_ipv4addr_param ip4_parm, *p4;
 
 			phdr = sctp_get_next_param(m, offset,
 			    (struct sctp_paramhdr *)&ip4_parm, min(plen, sizeof(ip4_parm)));
 			if (phdr == NULL) {
 				return (NULL);
 			}
 			p4 = (struct sctp_ipv4addr_param *)phdr;
 			memcpy(&sin4.sin_addr, &p4->addr, sizeof(p4->addr));
 			/* look it up */
 			stcb = sctp_findassociation_ep_addr(inp_p,
 			    (struct sockaddr *)&sin4, netp, dst, NULL);
 			if (stcb != NULL) {
 				return (stcb);
 			}
 		}
 #endif
 #ifdef INET6
 		if (ptype == SCTP_IPV6_ADDRESS &&
 		    plen == sizeof(struct sctp_ipv6addr_param)) {
 			/* Get the rest of the address */
 			struct sctp_ipv6addr_param ip6_parm, *p6;
 
 			phdr = sctp_get_next_param(m, offset,
 			    (struct sctp_paramhdr *)&ip6_parm, min(plen, sizeof(ip6_parm)));
 			if (phdr == NULL) {
 				return (NULL);
 			}
 			p6 = (struct sctp_ipv6addr_param *)phdr;
 			memcpy(&sin6.sin6_addr, &p6->addr, sizeof(p6->addr));
 			/* look it up */
 			stcb = sctp_findassociation_ep_addr(inp_p,
 			    (struct sockaddr *)&sin6, netp, dst, NULL);
 			if (stcb != NULL) {
 				return (stcb);
 			}
 		}
 #endif
 		offset += SCTP_SIZE32(plen);
 		phdr = sctp_get_next_param(m, offset, &parm_buf,
 		    sizeof(parm_buf));
 	}
 	return (NULL);
 }
 
 static struct sctp_tcb *
 sctp_findassoc_by_vtag(struct sockaddr *from, struct sockaddr *to, uint32_t vtag,
     struct sctp_inpcb **inp_p, struct sctp_nets **netp, uint16_t rport,
     uint16_t lport, int skip_src_check, uint32_t vrf_id, uint32_t remote_tag)
 {
 	/*
 	 * Use my vtag to hash. If we find it we then verify the source addr
 	 * is in the assoc. If all goes well we save a bit on rec of a
 	 * packet.
 	 */
 	struct sctpasochead *head;
 	struct sctp_nets *net;
 	struct sctp_tcb *stcb;
 
 	SCTP_INP_INFO_RLOCK();
 	head = &SCTP_BASE_INFO(sctp_asochash)[SCTP_PCBHASH_ASOC(vtag,
 	    SCTP_BASE_INFO(hashasocmark))];
 	LIST_FOREACH(stcb, head, sctp_asocs) {
 		SCTP_INP_RLOCK(stcb->sctp_ep);
 		if (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) {
 			SCTP_INP_RUNLOCK(stcb->sctp_ep);
 			continue;
 		}
 		if (stcb->sctp_ep->def_vrf_id != vrf_id) {
 			SCTP_INP_RUNLOCK(stcb->sctp_ep);
 			continue;
 		}
 		SCTP_TCB_LOCK(stcb);
 		SCTP_INP_RUNLOCK(stcb->sctp_ep);
 		if (stcb->asoc.my_vtag == vtag) {
 			/* candidate */
 			if (stcb->rport != rport) {
 				SCTP_TCB_UNLOCK(stcb);
 				continue;
 			}
 			if (stcb->sctp_ep->sctp_lport != lport) {
 				SCTP_TCB_UNLOCK(stcb);
 				continue;
 			}
 			if (stcb->asoc.state & SCTP_STATE_ABOUT_TO_BE_FREED) {
 				SCTP_TCB_UNLOCK(stcb);
 				continue;
 			}
 			/* RRS:Need toaddr check here */
 			if (sctp_does_stcb_own_this_addr(stcb, to) == 0) {
 				/* Endpoint does not own this address */
 				SCTP_TCB_UNLOCK(stcb);
 				continue;
 			}
 			if (remote_tag) {
 				/*
 				 * If we have both vtags that's all we match
 				 * on
 				 */
 				if (stcb->asoc.peer_vtag == remote_tag) {
 					/*
 					 * If both tags match we consider it
 					 * conclusive and check NO
 					 * source/destination addresses
 					 */
 					goto conclusive;
 				}
 			}
 			if (skip_src_check) {
 		conclusive:
 				if (from) {
 					*netp = sctp_findnet(stcb, from);
 				} else {
 					*netp = NULL;	/* unknown */
 				}
 				if (inp_p)
 					*inp_p = stcb->sctp_ep;
 				SCTP_INP_INFO_RUNLOCK();
 				return (stcb);
 			}
 			net = sctp_findnet(stcb, from);
 			if (net) {
 				/* yep it's him. */
 				*netp = net;
 				SCTP_STAT_INCR(sctps_vtagexpress);
 				*inp_p = stcb->sctp_ep;
 				SCTP_INP_INFO_RUNLOCK();
 				return (stcb);
 			} else {
 				/*
 				 * not him, this should only happen in rare
 				 * cases so I peg it.
 				 */
 				SCTP_STAT_INCR(sctps_vtagbogus);
 			}
 		}
 		SCTP_TCB_UNLOCK(stcb);
 	}
 	SCTP_INP_INFO_RUNLOCK();
 	return (NULL);
 }
 
 
 /*
  * Find an association with the pointer to the inbound IP packet. This can be
  * an IPv4 or IPv6 packet.
  */
 struct sctp_tcb *
 sctp_findassociation_addr(struct mbuf *m, int offset,
     struct sockaddr *src, struct sockaddr *dst,
     struct sctphdr *sh, struct sctp_chunkhdr *ch,
     struct sctp_inpcb **inp_p, struct sctp_nets **netp, uint32_t vrf_id)
 {
 	struct sctp_tcb *stcb;
 	struct sctp_inpcb *inp;
 
 	if (sh->v_tag) {
 		/* we only go down this path if vtag is non-zero */
 		stcb = sctp_findassoc_by_vtag(src, dst, ntohl(sh->v_tag),
 		    inp_p, netp, sh->src_port, sh->dest_port, 0, vrf_id, 0);
 		if (stcb) {
 			return (stcb);
 		}
 	}
 	if (inp_p) {
 		stcb = sctp_findassociation_addr_sa(src, dst, inp_p, netp,
 		    1, vrf_id);
 		inp = *inp_p;
 	} else {
 		stcb = sctp_findassociation_addr_sa(src, dst, &inp, netp,
 		    1, vrf_id);
 	}
 	SCTPDBG(SCTP_DEBUG_PCB1, "stcb:%p inp:%p\n", (void *)stcb, (void *)inp);
 	if (stcb == NULL && inp) {
 		/* Found an EP but not this address */
 		if ((ch->chunk_type == SCTP_INITIATION) ||
 		    (ch->chunk_type == SCTP_INITIATION_ACK)) {
 			/*-
 			 * special hook, we do NOT return linp or an
 			 * association that is linked to an existing
 			 * association that is under the TCP pool (i.e. no
 			 * listener exists). The endpoint finding routine
 			 * will always find a listener before examining the
 			 * TCP pool.
 			 */
 			if (inp->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL) {
 				if (inp_p) {
 					*inp_p = NULL;
 				}
 				return (NULL);
 			}
 			stcb = sctp_findassociation_special_addr(m,
 			    offset, sh, &inp, netp, dst);
 			if (inp_p != NULL) {
 				*inp_p = inp;
 			}
 		}
 	}
 	SCTPDBG(SCTP_DEBUG_PCB1, "stcb is %p\n", (void *)stcb);
 	return (stcb);
 }
 
 /*
  * lookup an association by an ASCONF lookup address.
  * if the lookup address is 0.0.0.0 or ::0, use the vtag to do the lookup
  */
 struct sctp_tcb *
 sctp_findassociation_ep_asconf(struct mbuf *m, int offset,
     struct sockaddr *dst, struct sctphdr *sh,
     struct sctp_inpcb **inp_p, struct sctp_nets **netp, uint32_t vrf_id)
 {
 	struct sctp_tcb *stcb;
 	union sctp_sockstore remote_store;
 	struct sctp_paramhdr parm_buf, *phdr;
 	int ptype;
 	int zero_address = 0;
 
 #ifdef INET
 	struct sockaddr_in *sin;
 
 #endif
 #ifdef INET6
 	struct sockaddr_in6 *sin6;
 
 #endif
 
 	memset(&remote_store, 0, sizeof(remote_store));
 	phdr = sctp_get_next_param(m, offset + sizeof(struct sctp_asconf_chunk),
 	    &parm_buf, sizeof(struct sctp_paramhdr));
 	if (phdr == NULL) {
 		SCTPDBG(SCTP_DEBUG_INPUT3, "%s: failed to get asconf lookup addr\n",
 		    __func__);
 		return NULL;
 	}
 	ptype = (int)((uint32_t) ntohs(phdr->param_type));
 	/* get the correlation address */
 	switch (ptype) {
 #ifdef INET6
 	case SCTP_IPV6_ADDRESS:
 		{
 			/* ipv6 address param */
 			struct sctp_ipv6addr_param *p6, p6_buf;
 
 			if (ntohs(phdr->param_length) != sizeof(struct sctp_ipv6addr_param)) {
 				return NULL;
 			}
 			p6 = (struct sctp_ipv6addr_param *)sctp_get_next_param(m,
 			    offset + sizeof(struct sctp_asconf_chunk),
 			    &p6_buf.ph, sizeof(*p6));
 			if (p6 == NULL) {
 				SCTPDBG(SCTP_DEBUG_INPUT3, "%s: failed to get asconf v6 lookup addr\n",
 				    __func__);
 				return (NULL);
 			}
 			sin6 = &remote_store.sin6;
 			sin6->sin6_family = AF_INET6;
 			sin6->sin6_len = sizeof(*sin6);
 			sin6->sin6_port = sh->src_port;
 			memcpy(&sin6->sin6_addr, &p6->addr, sizeof(struct in6_addr));
 			if (IN6_IS_ADDR_UNSPECIFIED(&sin6->sin6_addr))
 				zero_address = 1;
 			break;
 		}
 #endif
 #ifdef INET
 	case SCTP_IPV4_ADDRESS:
 		{
 			/* ipv4 address param */
 			struct sctp_ipv4addr_param *p4, p4_buf;
 
 			if (ntohs(phdr->param_length) != sizeof(struct sctp_ipv4addr_param)) {
 				return NULL;
 			}
 			p4 = (struct sctp_ipv4addr_param *)sctp_get_next_param(m,
 			    offset + sizeof(struct sctp_asconf_chunk),
 			    &p4_buf.ph, sizeof(*p4));
 			if (p4 == NULL) {
 				SCTPDBG(SCTP_DEBUG_INPUT3, "%s: failed to get asconf v4 lookup addr\n",
 				    __func__);
 				return (NULL);
 			}
 			sin = &remote_store.sin;
 			sin->sin_family = AF_INET;
 			sin->sin_len = sizeof(*sin);
 			sin->sin_port = sh->src_port;
 			memcpy(&sin->sin_addr, &p4->addr, sizeof(struct in_addr));
 			if (sin->sin_addr.s_addr == INADDR_ANY)
 				zero_address = 1;
 			break;
 		}
 #endif
 	default:
 		/* invalid address param type */
 		return NULL;
 	}
 
 	if (zero_address) {
 		stcb = sctp_findassoc_by_vtag(NULL, dst, ntohl(sh->v_tag), inp_p,
 		    netp, sh->src_port, sh->dest_port, 1, vrf_id, 0);
 		if (stcb != NULL) {
 			SCTP_INP_DECR_REF(*inp_p);
 		}
 	} else {
 		stcb = sctp_findassociation_ep_addr(inp_p,
 		    &remote_store.sa, netp,
 		    dst, NULL);
 	}
 	return (stcb);
 }
 
 
 /*
  * allocate an sctp_inpcb and set up a temporary binding to a port/all
  * addresses. This way if we don't get a bind we by default pick an ephemeral
  * port with all addresses bound.
  */
 int
 sctp_inpcb_alloc(struct socket *so, uint32_t vrf_id)
 {
 	/*
 	 * we get called when a new endpoint starts up. We need to allocate
 	 * the sctp_inpcb structure from the zone and init it. Mark it as
 	 * unbound and find a port that we can use as an ephemeral with
 	 * INADDR_ANY. If the user binds later no problem we can then add in
 	 * the specific addresses. And setup the default parameters for the
 	 * EP.
 	 */
 	int i, error;
 	struct sctp_inpcb *inp;
 	struct sctp_pcb *m;
 	struct timeval time;
 	sctp_sharedkey_t *null_key;
 
 	error = 0;
 
 	SCTP_INP_INFO_WLOCK();
 	inp = SCTP_ZONE_GET(SCTP_BASE_INFO(ipi_zone_ep), struct sctp_inpcb);
 	if (inp == NULL) {
 		SCTP_PRINTF("Out of SCTP-INPCB structures - no resources\n");
 		SCTP_INP_INFO_WUNLOCK();
 		SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, ENOBUFS);
 		return (ENOBUFS);
 	}
 	/* zap it */
 	bzero(inp, sizeof(*inp));
 
 	/* bump generations */
 	/* setup socket pointers */
 	inp->sctp_socket = so;
 	inp->ip_inp.inp.inp_socket = so;
 	inp->ip_inp.inp.inp_cred = crhold(so->so_cred);
 #ifdef INET6
 	if (INP_SOCKAF(so) == AF_INET6) {
 		if (MODULE_GLOBAL(ip6_auto_flowlabel)) {
 			inp->ip_inp.inp.inp_flags |= IN6P_AUTOFLOWLABEL;
 		}
 		if (MODULE_GLOBAL(ip6_v6only)) {
 			inp->ip_inp.inp.inp_flags |= IN6P_IPV6_V6ONLY;
 		}
 	}
 #endif
 	inp->sctp_associd_counter = 1;
 	inp->partial_delivery_point = SCTP_SB_LIMIT_RCV(so) >> SCTP_PARTIAL_DELIVERY_SHIFT;
 	inp->sctp_frag_point = SCTP_DEFAULT_MAXSEGMENT;
 	inp->max_cwnd = 0;
 	inp->sctp_cmt_on_off = SCTP_BASE_SYSCTL(sctp_cmt_on_off);
 	inp->ecn_supported = (uint8_t) SCTP_BASE_SYSCTL(sctp_ecn_enable);
 	inp->prsctp_supported = (uint8_t) SCTP_BASE_SYSCTL(sctp_pr_enable);
 	inp->auth_supported = (uint8_t) SCTP_BASE_SYSCTL(sctp_auth_enable);
 	inp->asconf_supported = (uint8_t) SCTP_BASE_SYSCTL(sctp_asconf_enable);
 	inp->reconfig_supported = (uint8_t) SCTP_BASE_SYSCTL(sctp_reconfig_enable);
 	inp->nrsack_supported = (uint8_t) SCTP_BASE_SYSCTL(sctp_nrsack_enable);
 	inp->pktdrop_supported = (uint8_t) SCTP_BASE_SYSCTL(sctp_pktdrop_enable);
 	inp->idata_supported = 0;
 
 	inp->fibnum = so->so_fibnum;
 	/* init the small hash table we use to track asocid <-> tcb */
 	inp->sctp_asocidhash = SCTP_HASH_INIT(SCTP_STACK_VTAG_HASH_SIZE, &inp->hashasocidmark);
 	if (inp->sctp_asocidhash == NULL) {
 		crfree(inp->ip_inp.inp.inp_cred);
 		SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_ep), inp);
 		SCTP_INP_INFO_WUNLOCK();
 		return (ENOBUFS);
 	}
 #ifdef IPSEC
 	error = ipsec_init_policy(so, &inp->ip_inp.inp.inp_sp);
 	if (error != 0) {
 		crfree(inp->ip_inp.inp.inp_cred);
 		SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_ep), inp);
 		SCTP_INP_INFO_WUNLOCK();
 		return error;
 	}
 #endif				/* IPSEC */
 	SCTP_INCR_EP_COUNT();
 	inp->ip_inp.inp.inp_ip_ttl = MODULE_GLOBAL(ip_defttl);
 	SCTP_INP_INFO_WUNLOCK();
 
 	so->so_pcb = (caddr_t)inp;
 
 	if (SCTP_SO_TYPE(so) == SOCK_SEQPACKET) {
 		/* UDP style socket */
 		inp->sctp_flags = (SCTP_PCB_FLAGS_UDPTYPE |
 		    SCTP_PCB_FLAGS_UNBOUND);
 		/* Be sure it is NON-BLOCKING IO for UDP */
 		/* SCTP_SET_SO_NBIO(so); */
 	} else if (SCTP_SO_TYPE(so) == SOCK_STREAM) {
 		/* TCP style socket */
 		inp->sctp_flags = (SCTP_PCB_FLAGS_TCPTYPE |
 		    SCTP_PCB_FLAGS_UNBOUND);
 		/* Be sure we have blocking IO by default */
 		SCTP_CLEAR_SO_NBIO(so);
 	} else {
 		/*
 		 * unsupported socket type (RAW, etc)- in case we missed it
 		 * in protosw
 		 */
 		SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EOPNOTSUPP);
 		so->so_pcb = NULL;
 		crfree(inp->ip_inp.inp.inp_cred);
 #ifdef IPSEC
 		ipsec_delete_pcbpolicy(&inp->ip_inp.inp);
 #endif
 		SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_ep), inp);
 		return (EOPNOTSUPP);
 	}
 	if (SCTP_BASE_SYSCTL(sctp_default_frag_interleave) == SCTP_FRAG_LEVEL_1) {
 		sctp_feature_on(inp, SCTP_PCB_FLAGS_FRAG_INTERLEAVE);
 		sctp_feature_off(inp, SCTP_PCB_FLAGS_INTERLEAVE_STRMS);
 	} else if (SCTP_BASE_SYSCTL(sctp_default_frag_interleave) == SCTP_FRAG_LEVEL_2) {
 		sctp_feature_on(inp, SCTP_PCB_FLAGS_FRAG_INTERLEAVE);
 		sctp_feature_on(inp, SCTP_PCB_FLAGS_INTERLEAVE_STRMS);
 	} else if (SCTP_BASE_SYSCTL(sctp_default_frag_interleave) == SCTP_FRAG_LEVEL_0) {
 		sctp_feature_off(inp, SCTP_PCB_FLAGS_FRAG_INTERLEAVE);
 		sctp_feature_off(inp, SCTP_PCB_FLAGS_INTERLEAVE_STRMS);
 	}
 	inp->sctp_tcbhash = SCTP_HASH_INIT(SCTP_BASE_SYSCTL(sctp_pcbtblsize),
 	    &inp->sctp_hashmark);
 	if (inp->sctp_tcbhash == NULL) {
 		SCTP_PRINTF("Out of SCTP-INPCB->hashinit - no resources\n");
 		SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, ENOBUFS);
 		so->so_pcb = NULL;
 		crfree(inp->ip_inp.inp.inp_cred);
 #ifdef IPSEC
 		ipsec_delete_pcbpolicy(&inp->ip_inp.inp);
 #endif
 		SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_ep), inp);
 		return (ENOBUFS);
 	}
 	inp->def_vrf_id = vrf_id;
 
 	SCTP_INP_INFO_WLOCK();
 	SCTP_INP_LOCK_INIT(inp);
 	INP_LOCK_INIT(&inp->ip_inp.inp, "inp", "sctpinp");
 	SCTP_INP_READ_INIT(inp);
 	SCTP_ASOC_CREATE_LOCK_INIT(inp);
 	/* lock the new ep */
 	SCTP_INP_WLOCK(inp);
 
 	/* add it to the info area */
 	LIST_INSERT_HEAD(&SCTP_BASE_INFO(listhead), inp, sctp_list);
 	SCTP_INP_INFO_WUNLOCK();
 
 	TAILQ_INIT(&inp->read_queue);
 	LIST_INIT(&inp->sctp_addr_list);
 
 	LIST_INIT(&inp->sctp_asoc_list);
 
 #ifdef SCTP_TRACK_FREED_ASOCS
 	/* TEMP CODE */
 	LIST_INIT(&inp->sctp_asoc_free_list);
 #endif
 	/* Init the timer structure for signature change */
 	SCTP_OS_TIMER_INIT(&inp->sctp_ep.signature_change.timer);
 	inp->sctp_ep.signature_change.type = SCTP_TIMER_TYPE_NEWCOOKIE;
 
 	/* now init the actual endpoint default data */
 	m = &inp->sctp_ep;
 
 	/* setup the base timeout information */
 	m->sctp_timeoutticks[SCTP_TIMER_SEND] = SEC_TO_TICKS(SCTP_SEND_SEC);	/* needed ? */
 	m->sctp_timeoutticks[SCTP_TIMER_INIT] = SEC_TO_TICKS(SCTP_INIT_SEC);	/* needed ? */
 	m->sctp_timeoutticks[SCTP_TIMER_RECV] = MSEC_TO_TICKS(SCTP_BASE_SYSCTL(sctp_delayed_sack_time_default));
 	m->sctp_timeoutticks[SCTP_TIMER_HEARTBEAT] = MSEC_TO_TICKS(SCTP_BASE_SYSCTL(sctp_heartbeat_interval_default));
 	m->sctp_timeoutticks[SCTP_TIMER_PMTU] = SEC_TO_TICKS(SCTP_BASE_SYSCTL(sctp_pmtu_raise_time_default));
 	m->sctp_timeoutticks[SCTP_TIMER_MAXSHUTDOWN] = SEC_TO_TICKS(SCTP_BASE_SYSCTL(sctp_shutdown_guard_time_default));
 	m->sctp_timeoutticks[SCTP_TIMER_SIGNATURE] = SEC_TO_TICKS(SCTP_BASE_SYSCTL(sctp_secret_lifetime_default));
 	/* all max/min max are in ms */
 	m->sctp_maxrto = SCTP_BASE_SYSCTL(sctp_rto_max_default);
 	m->sctp_minrto = SCTP_BASE_SYSCTL(sctp_rto_min_default);
 	m->initial_rto = SCTP_BASE_SYSCTL(sctp_rto_initial_default);
 	m->initial_init_rto_max = SCTP_BASE_SYSCTL(sctp_init_rto_max_default);
 	m->sctp_sack_freq = SCTP_BASE_SYSCTL(sctp_sack_freq_default);
 	m->max_init_times = SCTP_BASE_SYSCTL(sctp_init_rtx_max_default);
 	m->max_send_times = SCTP_BASE_SYSCTL(sctp_assoc_rtx_max_default);
 	m->def_net_failure = SCTP_BASE_SYSCTL(sctp_path_rtx_max_default);
 	m->def_net_pf_threshold = SCTP_BASE_SYSCTL(sctp_path_pf_threshold);
 	m->sctp_sws_sender = SCTP_SWS_SENDER_DEF;
 	m->sctp_sws_receiver = SCTP_SWS_RECEIVER_DEF;
 	m->max_burst = SCTP_BASE_SYSCTL(sctp_max_burst_default);
 	m->fr_max_burst = SCTP_BASE_SYSCTL(sctp_fr_max_burst_default);
 
 	m->sctp_default_cc_module = SCTP_BASE_SYSCTL(sctp_default_cc_module);
 	m->sctp_default_ss_module = SCTP_BASE_SYSCTL(sctp_default_ss_module);
 	m->max_open_streams_intome = SCTP_BASE_SYSCTL(sctp_nr_incoming_streams_default);
 	/* number of streams to pre-open on an association */
 	m->pre_open_stream_count = SCTP_BASE_SYSCTL(sctp_nr_outgoing_streams_default);
 
 	/* Add adaptation cookie */
 	m->adaptation_layer_indicator = 0;
 	m->adaptation_layer_indicator_provided = 0;
 
 	/* seed random number generator */
 	m->random_counter = 1;
 	m->store_at = SCTP_SIGNATURE_SIZE;
 	SCTP_READ_RANDOM(m->random_numbers, sizeof(m->random_numbers));
 	sctp_fill_random_store(m);
 
 	/* Minimum cookie size */
 	m->size_of_a_cookie = (sizeof(struct sctp_init_msg) * 2) +
 	    sizeof(struct sctp_state_cookie);
 	m->size_of_a_cookie += SCTP_SIGNATURE_SIZE;
 
 	/* Setup the initial secret */
 	(void)SCTP_GETTIME_TIMEVAL(&time);
 	m->time_of_secret_change = time.tv_sec;
 
 	for (i = 0; i < SCTP_NUMBER_OF_SECRETS; i++) {
 		m->secret_key[0][i] = sctp_select_initial_TSN(m);
 	}
 	sctp_timer_start(SCTP_TIMER_TYPE_NEWCOOKIE, inp, NULL, NULL);
 
 	/* How long is a cookie good for ? */
 	m->def_cookie_life = MSEC_TO_TICKS(SCTP_BASE_SYSCTL(sctp_valid_cookie_life_default));
 	/*
 	 * Initialize authentication parameters
 	 */
 	m->local_hmacs = sctp_default_supported_hmaclist();
 	m->local_auth_chunks = sctp_alloc_chunklist();
 	if (inp->asconf_supported) {
 		sctp_auth_add_chunk(SCTP_ASCONF, m->local_auth_chunks);
 		sctp_auth_add_chunk(SCTP_ASCONF_ACK, m->local_auth_chunks);
 	}
 	m->default_dscp = 0;
 #ifdef INET6
 	m->default_flowlabel = 0;
 #endif
 	m->port = 0;		/* encapsulation disabled by default */
 	LIST_INIT(&m->shared_keys);
 	/* add default NULL key as key id 0 */
 	null_key = sctp_alloc_sharedkey();
 	sctp_insert_sharedkey(&m->shared_keys, null_key);
 	SCTP_INP_WUNLOCK(inp);
 #ifdef SCTP_LOG_CLOSING
 	sctp_log_closing(inp, NULL, 12);
 #endif
 	return (error);
 }
 
 
 void
 sctp_move_pcb_and_assoc(struct sctp_inpcb *old_inp, struct sctp_inpcb *new_inp,
     struct sctp_tcb *stcb)
 {
 	struct sctp_nets *net;
 	uint16_t lport, rport;
 	struct sctppcbhead *head;
 	struct sctp_laddr *laddr, *oladdr;
 
 	atomic_add_int(&stcb->asoc.refcnt, 1);
 	SCTP_TCB_UNLOCK(stcb);
 	SCTP_INP_INFO_WLOCK();
 	SCTP_INP_WLOCK(old_inp);
 	SCTP_INP_WLOCK(new_inp);
 	SCTP_TCB_LOCK(stcb);
 	atomic_subtract_int(&stcb->asoc.refcnt, 1);
 
 	new_inp->sctp_ep.time_of_secret_change =
 	    old_inp->sctp_ep.time_of_secret_change;
 	memcpy(new_inp->sctp_ep.secret_key, old_inp->sctp_ep.secret_key,
 	    sizeof(old_inp->sctp_ep.secret_key));
 	new_inp->sctp_ep.current_secret_number =
 	    old_inp->sctp_ep.current_secret_number;
 	new_inp->sctp_ep.last_secret_number =
 	    old_inp->sctp_ep.last_secret_number;
 	new_inp->sctp_ep.size_of_a_cookie = old_inp->sctp_ep.size_of_a_cookie;
 
 	/* make it so new data pours into the new socket */
 	stcb->sctp_socket = new_inp->sctp_socket;
 	stcb->sctp_ep = new_inp;
 
 	/* Copy the port across */
 	lport = new_inp->sctp_lport = old_inp->sctp_lport;
 	rport = stcb->rport;
 	/* Pull the tcb from the old association */
 	LIST_REMOVE(stcb, sctp_tcbhash);
 	LIST_REMOVE(stcb, sctp_tcblist);
 	if (stcb->asoc.in_asocid_hash) {
 		LIST_REMOVE(stcb, sctp_tcbasocidhash);
 	}
 	/* Now insert the new_inp into the TCP connected hash */
 	head = &SCTP_BASE_INFO(sctp_tcpephash)[SCTP_PCBHASH_ALLADDR((lport | rport), SCTP_BASE_INFO(hashtcpmark))];
 
 	LIST_INSERT_HEAD(head, new_inp, sctp_hash);
 	/* It's safe to access */
 	new_inp->sctp_flags &= ~SCTP_PCB_FLAGS_UNBOUND;
 
 	/* Now move the tcb into the endpoint list */
 	LIST_INSERT_HEAD(&new_inp->sctp_asoc_list, stcb, sctp_tcblist);
 	/*
 	 * Question, do we even need to worry about the ep-hash since we
 	 * only have one connection? Probably not :> so lets get rid of it
 	 * and not suck up any kernel memory in that.
 	 */
 	if (stcb->asoc.in_asocid_hash) {
 		struct sctpasochead *lhd;
 
 		lhd = &new_inp->sctp_asocidhash[SCTP_PCBHASH_ASOC(stcb->asoc.assoc_id,
 		    new_inp->hashasocidmark)];
 		LIST_INSERT_HEAD(lhd, stcb, sctp_tcbasocidhash);
 	}
 	/* Ok. Let's restart timer. */
 	TAILQ_FOREACH(net, &stcb->asoc.nets, sctp_next) {
 		sctp_timer_start(SCTP_TIMER_TYPE_PATHMTURAISE, new_inp,
 		    stcb, net);
 	}
 
 	SCTP_INP_INFO_WUNLOCK();
 	if (new_inp->sctp_tcbhash != NULL) {
 		SCTP_HASH_FREE(new_inp->sctp_tcbhash, new_inp->sctp_hashmark);
 		new_inp->sctp_tcbhash = NULL;
 	}
 	if ((new_inp->sctp_flags & SCTP_PCB_FLAGS_BOUNDALL) == 0) {
 		/* Subset bound, so copy in the laddr list from the old_inp */
 		LIST_FOREACH(oladdr, &old_inp->sctp_addr_list, sctp_nxt_addr) {
 			laddr = SCTP_ZONE_GET(SCTP_BASE_INFO(ipi_zone_laddr), struct sctp_laddr);
 			if (laddr == NULL) {
 				/*
 				 * Gak, what can we do? This assoc is really
 				 * HOSED. We probably should send an abort
 				 * here.
 				 */
 				SCTPDBG(SCTP_DEBUG_PCB1, "Association hosed in TCP model, out of laddr memory\n");
 				continue;
 			}
 			SCTP_INCR_LADDR_COUNT();
 			bzero(laddr, sizeof(*laddr));
 			(void)SCTP_GETTIME_TIMEVAL(&laddr->start_time);
 			laddr->ifa = oladdr->ifa;
 			atomic_add_int(&laddr->ifa->refcount, 1);
 			LIST_INSERT_HEAD(&new_inp->sctp_addr_list, laddr,
 			    sctp_nxt_addr);
 			new_inp->laddr_count++;
 			if (oladdr == stcb->asoc.last_used_address) {
 				stcb->asoc.last_used_address = laddr;
 			}
 		}
 	}
 	/*
 	 * Now any running timers need to be adjusted since we really don't
 	 * care if they are running or not just blast in the new_inp into
 	 * all of them.
 	 */
 
 	stcb->asoc.dack_timer.ep = (void *)new_inp;
 	stcb->asoc.asconf_timer.ep = (void *)new_inp;
 	stcb->asoc.strreset_timer.ep = (void *)new_inp;
 	stcb->asoc.shut_guard_timer.ep = (void *)new_inp;
 	stcb->asoc.autoclose_timer.ep = (void *)new_inp;
 	stcb->asoc.delayed_event_timer.ep = (void *)new_inp;
 	stcb->asoc.delete_prim_timer.ep = (void *)new_inp;
 	/* now what about the nets? */
 	TAILQ_FOREACH(net, &stcb->asoc.nets, sctp_next) {
 		net->pmtu_timer.ep = (void *)new_inp;
 		net->hb_timer.ep = (void *)new_inp;
 		net->rxt_timer.ep = (void *)new_inp;
 	}
 	SCTP_INP_WUNLOCK(new_inp);
 	SCTP_INP_WUNLOCK(old_inp);
 }
 
 /*
  * insert an laddr entry with the given ifa for the desired list
  */
 static int
 sctp_insert_laddr(struct sctpladdr *list, struct sctp_ifa *ifa, uint32_t act)
 {
 	struct sctp_laddr *laddr;
 
 	laddr = SCTP_ZONE_GET(SCTP_BASE_INFO(ipi_zone_laddr), struct sctp_laddr);
 	if (laddr == NULL) {
 		/* out of memory? */
 		SCTP_LTRACE_ERR_RET(NULL, NULL, NULL, SCTP_FROM_SCTP_PCB, EINVAL);
 		return (EINVAL);
 	}
 	SCTP_INCR_LADDR_COUNT();
 	bzero(laddr, sizeof(*laddr));
 	(void)SCTP_GETTIME_TIMEVAL(&laddr->start_time);
 	laddr->ifa = ifa;
 	laddr->action = act;
 	atomic_add_int(&ifa->refcount, 1);
 	/* insert it */
 	LIST_INSERT_HEAD(list, laddr, sctp_nxt_addr);
 
 	return (0);
 }
 
 /*
  * Remove an laddr entry from the local address list (on an assoc)
  */
 static void
 sctp_remove_laddr(struct sctp_laddr *laddr)
 {
 
 	/* remove from the list */
 	LIST_REMOVE(laddr, sctp_nxt_addr);
 	sctp_free_ifa(laddr->ifa);
 	SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_laddr), laddr);
 	SCTP_DECR_LADDR_COUNT();
 }
 
 
 
 /* sctp_ifap is used to bypass normal local address validation checks */
 int
 sctp_inpcb_bind(struct socket *so, struct sockaddr *addr,
     struct sctp_ifa *sctp_ifap, struct thread *p)
 {
 	/* bind an ep to a socket address */
 	struct sctppcbhead *head;
 	struct sctp_inpcb *inp, *inp_tmp;
 	struct inpcb *ip_inp;
 	int port_reuse_active = 0;
 	int bindall;
 	uint16_t lport;
 	int error;
 	uint32_t vrf_id;
 
 	lport = 0;
 	bindall = 1;
 	inp = (struct sctp_inpcb *)so->so_pcb;
 	ip_inp = (struct inpcb *)so->so_pcb;
 #ifdef SCTP_DEBUG
 	if (addr) {
 		SCTPDBG(SCTP_DEBUG_PCB1, "Bind called port: %d\n",
 		    ntohs(((struct sockaddr_in *)addr)->sin_port));
 		SCTPDBG(SCTP_DEBUG_PCB1, "Addr: ");
 		SCTPDBG_ADDR(SCTP_DEBUG_PCB1, addr);
 	}
 #endif
 	if ((inp->sctp_flags & SCTP_PCB_FLAGS_UNBOUND) == 0) {
 		/* already did a bind, subsequent binds NOT allowed ! */
 		SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EINVAL);
 		return (EINVAL);
 	}
 #ifdef INVARIANTS
 	if (p == NULL)
 		panic("null proc/thread");
 #endif
 	if (addr != NULL) {
 		switch (addr->sa_family) {
 #ifdef INET
 		case AF_INET:
 			{
 				struct sockaddr_in *sin;
 
 				/* IPV6_V6ONLY socket? */
 				if (SCTP_IPV6_V6ONLY(ip_inp)) {
 					SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EINVAL);
 					return (EINVAL);
 				}
 				if (addr->sa_len != sizeof(*sin)) {
 					SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EINVAL);
 					return (EINVAL);
 				}
 				sin = (struct sockaddr_in *)addr;
 				lport = sin->sin_port;
 				/*
 				 * For LOOPBACK the prison_local_ip4() call
 				 * will transmute the ip address to the
 				 * proper value.
 				 */
 				if (p && (error = prison_local_ip4(p->td_ucred, &sin->sin_addr)) != 0) {
 					SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, error);
 					return (error);
 				}
 				if (sin->sin_addr.s_addr != INADDR_ANY) {
 					bindall = 0;
 				}
 				break;
 			}
 #endif
 #ifdef INET6
 		case AF_INET6:
 			{
 				/*
 				 * Only for pure IPv6 Address. (No IPv4
 				 * Mapped!)
 				 */
 				struct sockaddr_in6 *sin6;
 
 				sin6 = (struct sockaddr_in6 *)addr;
 
 				if (addr->sa_len != sizeof(*sin6)) {
 					SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EINVAL);
 					return (EINVAL);
 				}
 				lport = sin6->sin6_port;
 				/*
 				 * For LOOPBACK the prison_local_ip6() call
 				 * will transmute the ipv6 address to the
 				 * proper value.
 				 */
 				if (p && (error = prison_local_ip6(p->td_ucred, &sin6->sin6_addr,
 				    (SCTP_IPV6_V6ONLY(inp) != 0))) != 0) {
 					SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, error);
 					return (error);
 				}
 				if (!IN6_IS_ADDR_UNSPECIFIED(&sin6->sin6_addr)) {
 					bindall = 0;
 					/* KAME hack: embed scopeid */
 					if (sa6_embedscope(sin6, MODULE_GLOBAL(ip6_use_defzone)) != 0) {
 						SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EINVAL);
 						return (EINVAL);
 					}
 				}
 				/* this must be cleared for ifa_ifwithaddr() */
 				sin6->sin6_scope_id = 0;
 				break;
 			}
 #endif
 		default:
 			SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EAFNOSUPPORT);
 			return (EAFNOSUPPORT);
 		}
 	}
 	SCTP_INP_INFO_WLOCK();
 	SCTP_INP_WLOCK(inp);
 	/* Setup a vrf_id to be the default for the non-bind-all case. */
 	vrf_id = inp->def_vrf_id;
 
 	/* increase our count due to the unlock we do */
 	SCTP_INP_INCR_REF(inp);
 	if (lport) {
 		/*
 		 * Did the caller specify a port? if so we must see if an ep
 		 * already has this one bound.
 		 */
 		/* got to be root to get at low ports */
 		if (ntohs(lport) < IPPORT_RESERVED) {
 			if (p && (error =
 			    priv_check(p, PRIV_NETINET_RESERVEDPORT)
 			    )) {
 				SCTP_INP_DECR_REF(inp);
 				SCTP_INP_WUNLOCK(inp);
 				SCTP_INP_INFO_WUNLOCK();
 				return (error);
 			}
 		}
 		SCTP_INP_WUNLOCK(inp);
 		if (bindall) {
 			vrf_id = inp->def_vrf_id;
 			inp_tmp = sctp_pcb_findep(addr, 0, 1, vrf_id);
 			if (inp_tmp != NULL) {
 				/*
 				 * Lower the count on the inp returned. Note
 				 * that we are not bound, so inp_tmp should
 				 * NEVER be inp. It is this inp (inp_tmp)
 				 * that gets the reference bump, so we must
 				 * lower it.
 				 */
 				SCTP_INP_DECR_REF(inp_tmp);
 				/* unlock info */
 				if ((sctp_is_feature_on(inp, SCTP_PCB_FLAGS_PORTREUSE)) &&
 				    (sctp_is_feature_on(inp_tmp, SCTP_PCB_FLAGS_PORTREUSE))) {
 					/*
 					 * Ok, must be one-2-one and
 					 * allowing port re-use
 					 */
 					port_reuse_active = 1;
 					goto continue_anyway;
 				}
 				SCTP_INP_DECR_REF(inp);
 				SCTP_INP_INFO_WUNLOCK();
 				SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EADDRINUSE);
 				return (EADDRINUSE);
 			}
 		} else {
 			inp_tmp = sctp_pcb_findep(addr, 0, 1, vrf_id);
 			if (inp_tmp != NULL) {
 				/*
 				 * Lower the count on the inp returned. Note
 				 * that we are not bound, so inp_tmp should
 				 * NEVER be inp. It is this inp (inp_tmp)
 				 * that gets the reference bump, so we must
 				 * lower it.
 				 */
 				SCTP_INP_DECR_REF(inp_tmp);
 				/* unlock info */
 				if ((sctp_is_feature_on(inp, SCTP_PCB_FLAGS_PORTREUSE)) &&
 				    (sctp_is_feature_on(inp_tmp, SCTP_PCB_FLAGS_PORTREUSE))) {
 					/*
 					 * Ok, must be one-2-one and
 					 * allowing port re-use
 					 */
 					port_reuse_active = 1;
 					goto continue_anyway;
 				}
 				SCTP_INP_DECR_REF(inp);
 				SCTP_INP_INFO_WUNLOCK();
 				SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EADDRINUSE);
 				return (EADDRINUSE);
 			}
 		}
 continue_anyway:
 		SCTP_INP_WLOCK(inp);
 		if (bindall) {
 			/* verify that the lport is not used by a singleton */
 			if ((port_reuse_active == 0) &&
 			    (inp_tmp = sctp_isport_inuse(inp, lport, vrf_id))) {
 				/* Sorry someone already has this one bound */
 				if ((sctp_is_feature_on(inp, SCTP_PCB_FLAGS_PORTREUSE)) &&
 				    (sctp_is_feature_on(inp_tmp, SCTP_PCB_FLAGS_PORTREUSE))) {
 					port_reuse_active = 1;
 				} else {
 					SCTP_INP_DECR_REF(inp);
 					SCTP_INP_WUNLOCK(inp);
 					SCTP_INP_INFO_WUNLOCK();
 					SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EADDRINUSE);
 					return (EADDRINUSE);
 				}
 			}
 		}
 	} else {
 		uint16_t first, last, candidate;
 		uint16_t count;
 		int done;
 
 		if (ip_inp->inp_flags & INP_HIGHPORT) {
 			first = MODULE_GLOBAL(ipport_hifirstauto);
 			last = MODULE_GLOBAL(ipport_hilastauto);
 		} else if (ip_inp->inp_flags & INP_LOWPORT) {
 			if (p && (error =
 			    priv_check(p, PRIV_NETINET_RESERVEDPORT)
 			    )) {
 				SCTP_INP_DECR_REF(inp);
 				SCTP_INP_WUNLOCK(inp);
 				SCTP_INP_INFO_WUNLOCK();
 				SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, error);
 				return (error);
 			}
 			first = MODULE_GLOBAL(ipport_lowfirstauto);
 			last = MODULE_GLOBAL(ipport_lowlastauto);
 		} else {
 			first = MODULE_GLOBAL(ipport_firstauto);
 			last = MODULE_GLOBAL(ipport_lastauto);
 		}
 		if (first > last) {
 			uint16_t temp;
 
 			temp = first;
 			first = last;
 			last = temp;
 		}
 		count = last - first + 1;	/* number of candidates */
 		candidate = first + sctp_select_initial_TSN(&inp->sctp_ep) % (count);
 
 		done = 0;
 		while (!done) {
 			if (sctp_isport_inuse(inp, htons(candidate), inp->def_vrf_id) == NULL) {
 				done = 1;
 			}
 			if (!done) {
 				if (--count == 0) {
 					SCTP_INP_DECR_REF(inp);
 					SCTP_INP_WUNLOCK(inp);
 					SCTP_INP_INFO_WUNLOCK();
 					SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EADDRINUSE);
 					return (EADDRINUSE);
 				}
 				if (candidate == last)
 					candidate = first;
 				else
 					candidate = candidate + 1;
 			}
 		}
 		lport = htons(candidate);
 	}
 	SCTP_INP_DECR_REF(inp);
 	if (inp->sctp_flags & (SCTP_PCB_FLAGS_SOCKET_GONE |
 	    SCTP_PCB_FLAGS_SOCKET_ALLGONE)) {
 		/*
 		 * this really should not happen. The guy did a non-blocking
 		 * bind and then did a close at the same time.
 		 */
 		SCTP_INP_WUNLOCK(inp);
 		SCTP_INP_INFO_WUNLOCK();
 		SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EINVAL);
 		return (EINVAL);
 	}
 	/* OK, we look clear to give out this port, so let's set up the binding */
 	if (bindall) {
 		/* binding to all addresses, so just set in the proper flags */
 		inp->sctp_flags |= SCTP_PCB_FLAGS_BOUNDALL;
 		/* set the automatic addr changes from kernel flag */
 		if (SCTP_BASE_SYSCTL(sctp_auto_asconf) == 0) {
 			sctp_feature_off(inp, SCTP_PCB_FLAGS_DO_ASCONF);
 			sctp_feature_off(inp, SCTP_PCB_FLAGS_AUTO_ASCONF);
 		} else {
 			sctp_feature_on(inp, SCTP_PCB_FLAGS_DO_ASCONF);
 			sctp_feature_on(inp, SCTP_PCB_FLAGS_AUTO_ASCONF);
 		}
 		if (SCTP_BASE_SYSCTL(sctp_multiple_asconfs) == 0) {
 			sctp_feature_off(inp, SCTP_PCB_FLAGS_MULTIPLE_ASCONFS);
 		} else {
 			sctp_feature_on(inp, SCTP_PCB_FLAGS_MULTIPLE_ASCONFS);
 		}
 		/*
 		 * set the automatic mobility_base from kernel flag (by
 		 * micchie)
 		 */
 		if (SCTP_BASE_SYSCTL(sctp_mobility_base) == 0) {
 			sctp_mobility_feature_off(inp, SCTP_MOBILITY_BASE);
 			sctp_mobility_feature_off(inp, SCTP_MOBILITY_PRIM_DELETED);
 		} else {
 			sctp_mobility_feature_on(inp, SCTP_MOBILITY_BASE);
 			sctp_mobility_feature_off(inp, SCTP_MOBILITY_PRIM_DELETED);
 		}
 		/*
 		 * set the automatic mobility_fasthandoff from kernel flag
 		 * (by micchie)
 		 */
 		if (SCTP_BASE_SYSCTL(sctp_mobility_fasthandoff) == 0) {
 			sctp_mobility_feature_off(inp, SCTP_MOBILITY_FASTHANDOFF);
 			sctp_mobility_feature_off(inp, SCTP_MOBILITY_PRIM_DELETED);
 		} else {
 			sctp_mobility_feature_on(inp, SCTP_MOBILITY_FASTHANDOFF);
 			sctp_mobility_feature_off(inp, SCTP_MOBILITY_PRIM_DELETED);
 		}
 	} else {
 		/*
 		 * bind specific, make sure flags is off and add a new
 		 * address structure to the sctp_addr_list inside the ep
 		 * structure.
 		 * 
 		 * We will need to allocate one and insert it at the head. The
 		 * socketopt call can just insert new addresses in there as
 		 * well. It will also have to do the embed scope kame hack
 		 * too (before adding).
 		 */
 		struct sctp_ifa *ifa;
 		union sctp_sockstore store;
 
 		memset(&store, 0, sizeof(store));
 		switch (addr->sa_family) {
 #ifdef INET
 		case AF_INET:
 			memcpy(&store.sin, addr, sizeof(struct sockaddr_in));
 			store.sin.sin_port = 0;
 			break;
 #endif
 #ifdef INET6
 		case AF_INET6:
 			memcpy(&store.sin6, addr, sizeof(struct sockaddr_in6));
 			store.sin6.sin6_port = 0;
 			break;
 #endif
 		default:
 			break;
 		}
 		/*
 		 * First find the interface with the bound address. We need
 		 * to zero out the port to find the address! Yuck! Can't do
 		 * this earlier since we need the port for sctp_pcb_findep().
 		 */
 		if (sctp_ifap != NULL) {
 			ifa = sctp_ifap;
 		} else {
 			/*
 			 * Note: for BSD we always hit here; other OSes will
 			 * pass things in via the sctp_ifap argument
 			 * (Panda).
 			 */
 			ifa = sctp_find_ifa_by_addr(&store.sa,
 			    vrf_id, SCTP_ADDR_NOT_LOCKED);
 		}
 		if (ifa == NULL) {
 			/* Can't find an interface with that address */
 			SCTP_INP_WUNLOCK(inp);
 			SCTP_INP_INFO_WUNLOCK();
 			SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EADDRNOTAVAIL);
 			return (EADDRNOTAVAIL);
 		}
 #ifdef INET6
 		if (addr->sa_family == AF_INET6) {
 			/* GAK, more FIXME IFA lock? */
 			if (ifa->localifa_flags & SCTP_ADDR_IFA_UNUSEABLE) {
 				/* Can't bind a non-existent addr. */
 				SCTP_INP_WUNLOCK(inp);
 				SCTP_INP_INFO_WUNLOCK();
 				SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EINVAL);
 				return (EINVAL);
 			}
 		}
 #endif
 		/* we're not bound all */
 		inp->sctp_flags &= ~SCTP_PCB_FLAGS_BOUNDALL;
 		/* allow bindx() to send ASCONF's for binding changes */
 		sctp_feature_on(inp, SCTP_PCB_FLAGS_DO_ASCONF);
 		/* clear automatic addr changes from kernel flag */
 		sctp_feature_off(inp, SCTP_PCB_FLAGS_AUTO_ASCONF);
 
 		/* add this address to the endpoint list */
 		error = sctp_insert_laddr(&inp->sctp_addr_list, ifa, 0);
 		if (error != 0) {
 			SCTP_INP_WUNLOCK(inp);
 			SCTP_INP_INFO_WUNLOCK();
 			return (error);
 		}
 		inp->laddr_count++;
 	}
 	/* find the bucket */
 	if (port_reuse_active) {
 		/* Put it into tcp 1-2-1 hash */
 		head = &SCTP_BASE_INFO(sctp_tcpephash)[SCTP_PCBHASH_ALLADDR(lport, SCTP_BASE_INFO(hashtcpmark))];
 		inp->sctp_flags |= SCTP_PCB_FLAGS_IN_TCPPOOL;
 	} else {
 		head = &SCTP_BASE_INFO(sctp_ephash)[SCTP_PCBHASH_ALLADDR(lport, SCTP_BASE_INFO(hashmark))];
 	}
 	/* put it in the bucket */
 	LIST_INSERT_HEAD(head, inp, sctp_hash);
 	SCTPDBG(SCTP_DEBUG_PCB1, "Main hash to bind at head:%p, bound port:%d - in tcp_pool=%d\n",
 	    (void *)head, ntohs(lport), port_reuse_active);
 	/* set in the port */
 	inp->sctp_lport = lport;
 
 	/* turn off just the unbound flag */
 	inp->sctp_flags &= ~SCTP_PCB_FLAGS_UNBOUND;
 	SCTP_INP_WUNLOCK(inp);
 	SCTP_INP_INFO_WUNLOCK();
 	return (0);
 }
 
 
 static void
 sctp_iterator_inp_being_freed(struct sctp_inpcb *inp)
 {
 	struct sctp_iterator *it, *nit;
 
 	/*
 	 * We enter with only the ITERATOR_LOCK in place and a write
 	 * lock on the inp_info stuff.
 	 */
 	it = sctp_it_ctl.cur_it;
 	if (it && (it->vn != curvnet)) {
 		/* It's not looking at our VNET */
 		return;
 	}
 	if (it && (it->inp == inp)) {
 		/*
 		 * This is tricky and we hold the iterator lock, but when it
 		 * returns and gets the lock (when we release it) the
 		 * iterator will try to operate on inp. We need to stop that
 		 * from happening. But of course the iterator has a
 		 * reference on the stcb and inp. We can mark it and it will
 		 * stop.
 		 * 
 		 * If it's a single iterator situation, we set the end iterator
 		 * flag. Otherwise we set the iterator to go to the next
 		 * inp.
 		 * 
 		 */
 		if (it->iterator_flags & SCTP_ITERATOR_DO_SINGLE_INP) {
 			sctp_it_ctl.iterator_flags |= SCTP_ITERATOR_STOP_CUR_IT;
 		} else {
 			sctp_it_ctl.iterator_flags |= SCTP_ITERATOR_STOP_CUR_INP;
 		}
 	}
 	/*
 	 * Now go through and remove any single reference to our inp that
 	 * may be still pending on the list
 	 */
 	SCTP_IPI_ITERATOR_WQ_LOCK();
 	TAILQ_FOREACH_SAFE(it, &sctp_it_ctl.iteratorhead, sctp_nxt_itr, nit) {
 		if (it->vn != curvnet) {
 			continue;
 		}
 		if (it->inp == inp) {
 			/* This one points to me; is it inp specific? */
 			if (it->iterator_flags & SCTP_ITERATOR_DO_SINGLE_INP) {
 				/* Remove and free this one */
 				TAILQ_REMOVE(&sctp_it_ctl.iteratorhead,
 				    it, sctp_nxt_itr);
 				if (it->function_atend != NULL) {
 					(*it->function_atend) (it->pointer, it->val);
 				}
 				SCTP_FREE(it, SCTP_M_ITER);
 			} else {
 				it->inp = LIST_NEXT(it->inp, sctp_list);
 				if (it->inp) {
 					SCTP_INP_INCR_REF(it->inp);
 				}
 			}
 			/*
 			 * When it's put in, the refcnt is incremented, so
 			 * decrement it.
 			 */
 			SCTP_INP_DECR_REF(inp);
 		}
 	}
 	SCTP_IPI_ITERATOR_WQ_UNLOCK();
 }
 
 /* release sctp_inpcb unbind the port */
 void
 sctp_inpcb_free(struct sctp_inpcb *inp, int immediate, int from)
 {
 	/*
 	 * Here we free an endpoint. We must find it (if it is in the Hash
 	 * table) and remove it from there. Then we must also find it in the
 	 * overall list and remove it from there. After all removals are
 	 * complete then any timer has to be stopped. Then start the actual
 	 * freeing. a) Any local lists. b) Any associations. c) The hash of
 	 * all associations. d) finally the ep itself.
 	 */
 	struct sctp_tcb *asoc, *nasoc;
 	struct sctp_laddr *laddr, *nladdr;
 	struct inpcb *ip_pcb;
 	struct socket *so;
 	int being_refed = 0;
 	struct sctp_queued_to_read *sq, *nsq;
 	int cnt;
 	sctp_sharedkey_t *shared_key, *nshared_key;
 
 
 #ifdef SCTP_LOG_CLOSING
 	sctp_log_closing(inp, NULL, 0);
 #endif
 	SCTP_ITERATOR_LOCK();
 	/* mark any iterators on the list or being processed */
 	sctp_iterator_inp_being_freed(inp);
 	SCTP_ITERATOR_UNLOCK();
 	so = inp->sctp_socket;
 	if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) {
 		/* been here before.. eeks.. get out of here */
 		SCTP_PRINTF("This conflict in free SHOULD not be happening! from %d, imm %d\n", from, immediate);
 #ifdef SCTP_LOG_CLOSING
 		sctp_log_closing(inp, NULL, 1);
 #endif
 		return;
 	}
 	SCTP_ASOC_CREATE_LOCK(inp);
 	SCTP_INP_INFO_WLOCK();
 
 	SCTP_INP_WLOCK(inp);
 	if (from == SCTP_CALLED_AFTER_CMPSET_OFCLOSE) {
 		inp->sctp_flags &= ~SCTP_PCB_FLAGS_CLOSE_IP;
 		/* socket is gone, so no more wakeups allowed */
 		inp->sctp_flags |= SCTP_PCB_FLAGS_DONT_WAKE;
 		inp->sctp_flags &= ~SCTP_PCB_FLAGS_WAKEINPUT;
 		inp->sctp_flags &= ~SCTP_PCB_FLAGS_WAKEOUTPUT;
 
 	}
 	/* First time through we have the socket lock, after that no more. */
 	sctp_timer_stop(SCTP_TIMER_TYPE_NEWCOOKIE, inp, NULL, NULL,
 	    SCTP_FROM_SCTP_PCB + SCTP_LOC_1);
 
 	if (inp->control) {
 		sctp_m_freem(inp->control);
 		inp->control = NULL;
 	}
 	if (inp->pkt) {
 		sctp_m_freem(inp->pkt);
 		inp->pkt = NULL;
 	}
 	ip_pcb = &inp->ip_inp.inp;	/* we could just cast the main pointer
 					 * here but I will be nice :> (i.e.
 					 * ip_pcb = ep;) */
 	if (immediate == SCTP_FREE_SHOULD_USE_GRACEFUL_CLOSE) {
 		int cnt_in_sd;
 
 		cnt_in_sd = 0;
 		LIST_FOREACH_SAFE(asoc, &inp->sctp_asoc_list, sctp_tcblist, nasoc) {
 			SCTP_TCB_LOCK(asoc);
 			if (asoc->asoc.state & SCTP_STATE_ABOUT_TO_BE_FREED) {
 				/* Skip guys being freed */
 				cnt_in_sd++;
 				if (asoc->asoc.state & SCTP_STATE_IN_ACCEPT_QUEUE) {
 					/*
 					 * Special case - we did not start a
 					 * kill timer on the asoc because it
 					 * was not closed. So go ahead and
 					 * start it now.
 					 */
 					asoc->asoc.state &= ~SCTP_STATE_IN_ACCEPT_QUEUE;
 					sctp_timer_start(SCTP_TIMER_TYPE_ASOCKILL, inp, asoc, NULL);
 				}
 				SCTP_TCB_UNLOCK(asoc);
 				continue;
 			}
 			if (((SCTP_GET_STATE(&asoc->asoc) == SCTP_STATE_COOKIE_WAIT) ||
 			    (SCTP_GET_STATE(&asoc->asoc) == SCTP_STATE_COOKIE_ECHOED)) &&
 			    (asoc->asoc.total_output_queue_size == 0)) {
 				/*
 				 * If we have data in queue, we don't want
 				 * to just free since the app may have done,
 				 * send()/close or connect/send/close. And
 				 * it wants the data to get across first.
 				 */
 				/* Just abandon things in the front states */
 				if (sctp_free_assoc(inp, asoc, SCTP_PCBFREE_NOFORCE,
 				    SCTP_FROM_SCTP_PCB + SCTP_LOC_2) == 0) {
 					cnt_in_sd++;
 				}
 				continue;
 			}
 			/* Disconnect the socket please */
 			asoc->sctp_socket = NULL;
 			asoc->asoc.state |= SCTP_STATE_CLOSED_SOCKET;
 			if ((asoc->asoc.size_on_reasm_queue > 0) ||
 			    (asoc->asoc.control_pdapi) ||
 			    (asoc->asoc.size_on_all_streams > 0) ||
 			    (so && (so->so_rcv.sb_cc > 0))) {
 				/* Left with Data unread */
 				struct mbuf *op_err;
 
 				op_err = sctp_generate_cause(SCTP_CAUSE_USER_INITIATED_ABT, "");
 				asoc->sctp_ep->last_abort_code = SCTP_FROM_SCTP_PCB + SCTP_LOC_3;
 				sctp_send_abort_tcb(asoc, op_err, SCTP_SO_LOCKED);
 				SCTP_STAT_INCR_COUNTER32(sctps_aborted);
 				if ((SCTP_GET_STATE(&asoc->asoc) == SCTP_STATE_OPEN) ||
 				    (SCTP_GET_STATE(&asoc->asoc) == SCTP_STATE_SHUTDOWN_RECEIVED)) {
 					SCTP_STAT_DECR_GAUGE32(sctps_currestab);
 				}
 				if (sctp_free_assoc(inp, asoc,
 				    SCTP_PCBFREE_NOFORCE, SCTP_FROM_SCTP_PCB + SCTP_LOC_4) == 0) {
 					cnt_in_sd++;
 				}
 				continue;
 			} else if (TAILQ_EMPTY(&asoc->asoc.send_queue) &&
 			    TAILQ_EMPTY(&asoc->asoc.sent_queue) &&
 			    (asoc->asoc.stream_queue_cnt == 0)) {
 				if (asoc->asoc.locked_on_sending) {
 					goto abort_anyway;
 				}
 				if ((SCTP_GET_STATE(&asoc->asoc) != SCTP_STATE_SHUTDOWN_SENT) &&
 				    (SCTP_GET_STATE(&asoc->asoc) != SCTP_STATE_SHUTDOWN_ACK_SENT)) {
 					struct sctp_nets *netp;
 
 					/*
 					 * there is nothing queued to send,
 					 * so I send shutdown
 					 */
 					if ((SCTP_GET_STATE(&asoc->asoc) == SCTP_STATE_OPEN) ||
 					    (SCTP_GET_STATE(&asoc->asoc) == SCTP_STATE_SHUTDOWN_RECEIVED)) {
 						SCTP_STAT_DECR_GAUGE32(sctps_currestab);
 					}
 					SCTP_SET_STATE(&asoc->asoc, SCTP_STATE_SHUTDOWN_SENT);
 					SCTP_CLEAR_SUBSTATE(&asoc->asoc, SCTP_STATE_SHUTDOWN_PENDING);
 					sctp_stop_timers_for_shutdown(asoc);
 					if (asoc->asoc.alternate) {
 						netp = asoc->asoc.alternate;
 					} else {
 						netp = asoc->asoc.primary_destination;
 					}
 					sctp_send_shutdown(asoc, netp);
 					sctp_timer_start(SCTP_TIMER_TYPE_SHUTDOWN, asoc->sctp_ep, asoc,
 					    netp);
 					sctp_timer_start(SCTP_TIMER_TYPE_SHUTDOWNGUARD, asoc->sctp_ep, asoc,
 					    asoc->asoc.primary_destination);
 					sctp_chunk_output(inp, asoc, SCTP_OUTPUT_FROM_SHUT_TMR, SCTP_SO_LOCKED);
 				}
 			} else {
 				/* mark into shutdown pending */
 				struct sctp_stream_queue_pending *sp;
 
 				asoc->asoc.state |= SCTP_STATE_SHUTDOWN_PENDING;
 				sctp_timer_start(SCTP_TIMER_TYPE_SHUTDOWNGUARD, asoc->sctp_ep, asoc,
 				    asoc->asoc.primary_destination);
 				if (asoc->asoc.locked_on_sending) {
 					sp = TAILQ_LAST(&((asoc->asoc.locked_on_sending)->outqueue),
 					    sctp_streamhead);
 					if (sp == NULL) {
 						SCTP_PRINTF("Error, sp is NULL, locked on sending is %p strm:%d\n",
 						    (void *)asoc->asoc.locked_on_sending,
 						    asoc->asoc.locked_on_sending->stream_no);
 					} else {
 						if ((sp->length == 0) && (sp->msg_is_complete == 0))
 							asoc->asoc.state |= SCTP_STATE_PARTIAL_MSG_LEFT;
 					}
 				}
 				if (TAILQ_EMPTY(&asoc->asoc.send_queue) &&
 				    TAILQ_EMPTY(&asoc->asoc.sent_queue) &&
 				    (asoc->asoc.state & SCTP_STATE_PARTIAL_MSG_LEFT)) {
 					struct mbuf *op_err;
 
 			abort_anyway:
 					op_err = sctp_generate_cause(SCTP_CAUSE_USER_INITIATED_ABT, "");
 					asoc->sctp_ep->last_abort_code = SCTP_FROM_SCTP_PCB + SCTP_LOC_5;
 					sctp_send_abort_tcb(asoc, op_err, SCTP_SO_LOCKED);
 					SCTP_STAT_INCR_COUNTER32(sctps_aborted);
 					if ((SCTP_GET_STATE(&asoc->asoc) == SCTP_STATE_OPEN) ||
 					    (SCTP_GET_STATE(&asoc->asoc) == SCTP_STATE_SHUTDOWN_RECEIVED)) {
 						SCTP_STAT_DECR_GAUGE32(sctps_currestab);
 					}
 					if (sctp_free_assoc(inp, asoc,
 					    SCTP_PCBFREE_NOFORCE,
 					    SCTP_FROM_SCTP_PCB + SCTP_LOC_6) == 0) {
 						cnt_in_sd++;
 					}
 					continue;
 				} else {
 					sctp_chunk_output(inp, asoc, SCTP_OUTPUT_FROM_CLOSING, SCTP_SO_LOCKED);
 				}
 			}
 			cnt_in_sd++;
 			SCTP_TCB_UNLOCK(asoc);
 		}
 		/* now is there some left in our SHUTDOWN state? */
 		if (cnt_in_sd) {
 #ifdef SCTP_LOG_CLOSING
 			sctp_log_closing(inp, NULL, 2);
 #endif
 			inp->sctp_socket = NULL;
 			SCTP_INP_WUNLOCK(inp);
 			SCTP_ASOC_CREATE_UNLOCK(inp);
 			SCTP_INP_INFO_WUNLOCK();
 			return;
 		}
 	}
 	inp->sctp_socket = NULL;
 	if ((inp->sctp_flags & SCTP_PCB_FLAGS_UNBOUND) !=
 	    SCTP_PCB_FLAGS_UNBOUND) {
 		/*
 		 * ok, this guy has been bound. Its port is somewhere in
 		 * the SCTP_BASE_INFO(hash table). Remove it!
 		 */
 		LIST_REMOVE(inp, sctp_hash);
 		inp->sctp_flags |= SCTP_PCB_FLAGS_UNBOUND;
 	}
 	/*
 	 * If there is a timer running to kill us, forget it, since it may
 	 * have a contest on the INP lock.. which would cause us to die ...
 	 */
 	cnt = 0;
 	LIST_FOREACH_SAFE(asoc, &inp->sctp_asoc_list, sctp_tcblist, nasoc) {
 		SCTP_TCB_LOCK(asoc);
 		if (asoc->asoc.state & SCTP_STATE_ABOUT_TO_BE_FREED) {
 			if (asoc->asoc.state & SCTP_STATE_IN_ACCEPT_QUEUE) {
 				asoc->asoc.state &= ~SCTP_STATE_IN_ACCEPT_QUEUE;
 				sctp_timer_start(SCTP_TIMER_TYPE_ASOCKILL, inp, asoc, NULL);
 			}
 			cnt++;
 			SCTP_TCB_UNLOCK(asoc);
 			continue;
 		}
 		/* Free associations that are NOT killing us */
 		if ((SCTP_GET_STATE(&asoc->asoc) != SCTP_STATE_COOKIE_WAIT) &&
 		    ((asoc->asoc.state & SCTP_STATE_ABOUT_TO_BE_FREED) == 0)) {
 			struct mbuf *op_err;
 
 			op_err = sctp_generate_cause(SCTP_CAUSE_USER_INITIATED_ABT, "");
 			asoc->sctp_ep->last_abort_code = SCTP_FROM_SCTP_PCB + SCTP_LOC_7;
 			sctp_send_abort_tcb(asoc, op_err, SCTP_SO_LOCKED);
 			SCTP_STAT_INCR_COUNTER32(sctps_aborted);
 		} else if (asoc->asoc.state & SCTP_STATE_ABOUT_TO_BE_FREED) {
 			cnt++;
 			SCTP_TCB_UNLOCK(asoc);
 			continue;
 		}
 		if ((SCTP_GET_STATE(&asoc->asoc) == SCTP_STATE_OPEN) ||
 		    (SCTP_GET_STATE(&asoc->asoc) == SCTP_STATE_SHUTDOWN_RECEIVED)) {
 			SCTP_STAT_DECR_GAUGE32(sctps_currestab);
 		}
 		if (sctp_free_assoc(inp, asoc, SCTP_PCBFREE_FORCE,
 		    SCTP_FROM_SCTP_PCB + SCTP_LOC_8) == 0) {
 			cnt++;
 		}
 	}
 	if (cnt) {
 		/* Ok we have someone out there that will kill us */
 		(void)SCTP_OS_TIMER_STOP(&inp->sctp_ep.signature_change.timer);
 #ifdef SCTP_LOG_CLOSING
 		sctp_log_closing(inp, NULL, 3);
 #endif
 		SCTP_INP_WUNLOCK(inp);
 		SCTP_ASOC_CREATE_UNLOCK(inp);
 		SCTP_INP_INFO_WUNLOCK();
 		return;
 	}
 	if (SCTP_INP_LOCK_CONTENDED(inp))
 		being_refed++;
 	if (SCTP_INP_READ_CONTENDED(inp))
 		being_refed++;
 	if (SCTP_ASOC_CREATE_LOCK_CONTENDED(inp))
 		being_refed++;
 
 	if ((inp->refcount) ||
 	    (being_refed) ||
 	    (inp->sctp_flags & SCTP_PCB_FLAGS_CLOSE_IP)) {
 		(void)SCTP_OS_TIMER_STOP(&inp->sctp_ep.signature_change.timer);
 #ifdef SCTP_LOG_CLOSING
 		sctp_log_closing(inp, NULL, 4);
 #endif
 		sctp_timer_start(SCTP_TIMER_TYPE_INPKILL, inp, NULL, NULL);
 		SCTP_INP_WUNLOCK(inp);
 		SCTP_ASOC_CREATE_UNLOCK(inp);
 		SCTP_INP_INFO_WUNLOCK();
 		return;
 	}
 	inp->sctp_ep.signature_change.type = 0;
 	inp->sctp_flags |= SCTP_PCB_FLAGS_SOCKET_ALLGONE;
 	/*
 	 * Remove it from the list .. last thing we need a lock for.
 	 */
 	LIST_REMOVE(inp, sctp_list);
 	SCTP_INP_WUNLOCK(inp);
 	SCTP_ASOC_CREATE_UNLOCK(inp);
 	SCTP_INP_INFO_WUNLOCK();
 	/*
 	 * Now we release all locks. Since this INP cannot be found anymore
 	 * except possibly by the kill timer that might be running. We call
 	 * the drain function here. It should hit the case where it sees the
 	 * ACTIVE flag cleared and exit out freeing us to proceed and
 	 * destroy everything.
 	 */
 	if (from != SCTP_CALLED_FROM_INPKILL_TIMER) {
 		(void)SCTP_OS_TIMER_STOP_DRAIN(&inp->sctp_ep.signature_change.timer);
 	} else {
 		/* Probably un-needed */
 		(void)SCTP_OS_TIMER_STOP(&inp->sctp_ep.signature_change.timer);
 	}
 
 #ifdef SCTP_LOG_CLOSING
 	sctp_log_closing(inp, NULL, 5);
 #endif
 
 
 	if ((inp->sctp_asocidhash) != NULL) {
 		SCTP_HASH_FREE(inp->sctp_asocidhash, inp->hashasocidmark);
 		inp->sctp_asocidhash = NULL;
 	}
 	/* sa_ignore FREED_MEMORY */
 	TAILQ_FOREACH_SAFE(sq, &inp->read_queue, next, nsq) {
 		/* It's only abandoned if it had data left */
 		if (sq->length)
 			SCTP_STAT_INCR(sctps_left_abandon);
 
 		TAILQ_REMOVE(&inp->read_queue, sq, next);
 		sctp_free_remote_addr(sq->whoFrom);
 		if (so)
 			so->so_rcv.sb_cc -= sq->length;
 		if (sq->data) {
 			sctp_m_freem(sq->data);
 			sq->data = NULL;
 		}
 		/*
 		 * no need to free the net count, since at this point all
 		 * assoc's are gone.
 		 */
 		sctp_free_a_readq(NULL, sq);
 	}
 	/* Now the sctp_pcb things */
 	/*
 	 * free each asoc if it is not already closed/free. we can't use the
 	 * macro here since le_next will get freed as part of the
 	 * sctp_free_assoc() call.
 	 */
 #ifdef IPSEC
 	ipsec_delete_pcbpolicy(ip_pcb);
 #endif
 	if (ip_pcb->inp_options) {
 		(void)sctp_m_free(ip_pcb->inp_options);
 		ip_pcb->inp_options = 0;
 	}
 #ifdef INET6
 	if (ip_pcb->inp_vflag & INP_IPV6) {
 		struct in6pcb *in6p;
 
 		in6p = (struct in6pcb *)inp;
 		ip6_freepcbopts(in6p->in6p_outputopts);
 	}
 #endif				/* INET6 */
 	ip_pcb->inp_vflag = 0;
 	/* free up authentication fields */
 	if (inp->sctp_ep.local_auth_chunks != NULL)
 		sctp_free_chunklist(inp->sctp_ep.local_auth_chunks);
 	if (inp->sctp_ep.local_hmacs != NULL)
 		sctp_free_hmaclist(inp->sctp_ep.local_hmacs);
 
 	LIST_FOREACH_SAFE(shared_key, &inp->sctp_ep.shared_keys, next, nshared_key) {
 		LIST_REMOVE(shared_key, next);
 		sctp_free_sharedkey(shared_key);
 		/* sa_ignore FREED_MEMORY */
 	}
 
 	/*
 	 * if we have an address list the following will free the list of
 	 * ifaddr's that are set into this ep. Again macro limitations here,
 	 * since the LIST_FOREACH could be a bad idea.
 	 */
 	LIST_FOREACH_SAFE(laddr, &inp->sctp_addr_list, sctp_nxt_addr, nladdr) {
 		sctp_remove_laddr(laddr);
 	}
 
 #ifdef SCTP_TRACK_FREED_ASOCS
 	/* TEMP CODE */
 	LIST_FOREACH_SAFE(asoc, &inp->sctp_asoc_free_list, sctp_tcblist, nasoc) {
 		LIST_REMOVE(asoc, sctp_tcblist);
 		SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_asoc), asoc);
 		SCTP_DECR_ASOC_COUNT();
 	}
 	/* *** END TEMP CODE *** */
 #endif
 	/* Now let's see about freeing the EP hash table. */
 	if (inp->sctp_tcbhash != NULL) {
 		SCTP_HASH_FREE(inp->sctp_tcbhash, inp->sctp_hashmark);
 		inp->sctp_tcbhash = NULL;
 	}
 	/* Now we must put the ep memory back into the zone pool */
 	crfree(inp->ip_inp.inp.inp_cred);
 	INP_LOCK_DESTROY(&inp->ip_inp.inp);
 	SCTP_INP_LOCK_DESTROY(inp);
 	SCTP_INP_READ_DESTROY(inp);
 	SCTP_ASOC_CREATE_LOCK_DESTROY(inp);
 	SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_ep), inp);
 	SCTP_DECR_EP_COUNT();
 }
 
 
 struct sctp_nets *
 sctp_findnet(struct sctp_tcb *stcb, struct sockaddr *addr)
 {
 	struct sctp_nets *net;
 
 	/* locate the address */
 	TAILQ_FOREACH(net, &stcb->asoc.nets, sctp_next) {
 		if (sctp_cmpaddr(addr, (struct sockaddr *)&net->ro._l_addr))
 			return (net);
 	}
 	return (NULL);
 }
 
 
 int
 sctp_is_address_on_local_host(struct sockaddr *addr, uint32_t vrf_id)
 {
 	struct sctp_ifa *sctp_ifa;
 
 	sctp_ifa = sctp_find_ifa_by_addr(addr, vrf_id, SCTP_ADDR_NOT_LOCKED);
 	if (sctp_ifa) {
 		return (1);
 	} else {
 		return (0);
 	}
 }
 
 /*
  * Adds a remote endpoint address; done with the INIT/INIT-ACK as well as
  * when an ASCONF arrives that adds it. It also initializes all the cwnd
  * stats.
  */
 int
 sctp_add_remote_addr(struct sctp_tcb *stcb, struct sockaddr *newaddr,
     struct sctp_nets **netp, uint16_t port, int set_scope, int from)
 {
 	/*
 	 * The following is redundant to the same lines in the
 	 * sctp_aloc_assoc() but is needed since others call the add address
 	 * function
 	 */
 	struct sctp_nets *net, *netfirst;
 	int addr_inscope;
 
 	SCTPDBG(SCTP_DEBUG_PCB1, "Adding an address (from:%d) to the peer: ",
 	    from);
 	SCTPDBG_ADDR(SCTP_DEBUG_PCB1, newaddr);
 
 	netfirst = sctp_findnet(stcb, newaddr);
 	if (netfirst) {
 		/*
 		 * Lie and return ok, we don't want to make the association
 		 * go away for this behavior. It will happen in the TCP
 		 * model in a connected socket. It does not reach the hash
 		 * table until after the association is built so it can't be
 		 * found. Mark as reachable, since the initial creation will
 		 * have been cleared and the NOT_IN_ASSOC flag will have
 		 * been added... and we don't want to end up removing it
 		 * back out.
 		 */
 		if (netfirst->dest_state & SCTP_ADDR_UNCONFIRMED) {
 			netfirst->dest_state = (SCTP_ADDR_REACHABLE |
 			    SCTP_ADDR_UNCONFIRMED);
 		} else {
 			netfirst->dest_state = SCTP_ADDR_REACHABLE;
 		}
 
 		return (0);
 	}
 	addr_inscope = 1;
 	switch (newaddr->sa_family) {
 #ifdef INET
 	case AF_INET:
 		{
 			struct sockaddr_in *sin;
 
 			sin = (struct sockaddr_in *)newaddr;
 			if (sin->sin_addr.s_addr == 0) {
 				/* Invalid address */
 				return (-1);
 			}
 			/* zero out the sin_zero area */
 			memset(&sin->sin_zero, 0, sizeof(sin->sin_zero));
 
 			/* assure len is set */
 			sin->sin_len = sizeof(struct sockaddr_in);
 			if (set_scope) {
 				if (IN4_ISPRIVATE_ADDRESS(&sin->sin_addr)) {
 					stcb->asoc.scope.ipv4_local_scope = 1;
 				}
 			} else {
 				/* Validate the address is in scope */
 				if ((IN4_ISPRIVATE_ADDRESS(&sin->sin_addr)) &&
 				    (stcb->asoc.scope.ipv4_local_scope == 0)) {
 					addr_inscope = 0;
 				}
 			}
 			break;
 		}
 #endif
 #ifdef INET6
 	case AF_INET6:
 		{
 			struct sockaddr_in6 *sin6;
 
 			sin6 = (struct sockaddr_in6 *)newaddr;
 			if (IN6_IS_ADDR_UNSPECIFIED(&sin6->sin6_addr)) {
 				/* Invalid address */
 				return (-1);
 			}
 			/* assure len is set */
 			sin6->sin6_len = sizeof(struct sockaddr_in6);
 			if (set_scope) {
 				if (sctp_is_address_on_local_host(newaddr, stcb->asoc.vrf_id)) {
 					stcb->asoc.scope.loopback_scope = 1;
 					stcb->asoc.scope.local_scope = 0;
 					stcb->asoc.scope.ipv4_local_scope = 1;
 					stcb->asoc.scope.site_scope = 1;
 				} else if (IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr)) {
 					/*
 					 * If the new destination is a
 					 * LINK_LOCAL we must have common
 					 * site scope. Don't set the local
 					 * scope since we may not share all
 					 * links, only loopback can do this.
 					 * Links on the local network would
 					 * also be on our private network
 					 * for v4 too.
 					 */
 					stcb->asoc.scope.ipv4_local_scope = 1;
 					stcb->asoc.scope.site_scope = 1;
 				} else if (IN6_IS_ADDR_SITELOCAL(&sin6->sin6_addr)) {
 					/*
 					 * If the new destination is
 					 * SITE_LOCAL then we must have site
 					 * scope in common.
 					 */
 					stcb->asoc.scope.site_scope = 1;
 				}
 			} else {
 				/* Validate the address is in scope */
 				if (IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr) &&
 				    (stcb->asoc.scope.loopback_scope == 0)) {
 					addr_inscope = 0;
 				} else if (IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr) &&
 				    (stcb->asoc.scope.local_scope == 0)) {
 					addr_inscope = 0;
 				} else if (IN6_IS_ADDR_SITELOCAL(&sin6->sin6_addr) &&
 				    (stcb->asoc.scope.site_scope == 0)) {
 					addr_inscope = 0;
 				}
 			}
 			break;
 		}
 #endif
 	default:
 		/* not supported family type */
 		return (-1);
 	}
 	net = SCTP_ZONE_GET(SCTP_BASE_INFO(ipi_zone_net), struct sctp_nets);
 	if (net == NULL) {
 		return (-1);
 	}
 	SCTP_INCR_RADDR_COUNT();
 	bzero(net, sizeof(struct sctp_nets));
 	(void)SCTP_GETTIME_TIMEVAL(&net->start_time);
 	memcpy(&net->ro._l_addr, newaddr, newaddr->sa_len);
 	switch (newaddr->sa_family) {
 #ifdef INET
 	case AF_INET:
 		((struct sockaddr_in *)&net->ro._l_addr)->sin_port = stcb->rport;
 		break;
 #endif
 #ifdef INET6
 	case AF_INET6:
 		((struct sockaddr_in6 *)&net->ro._l_addr)->sin6_port = stcb->rport;
 		break;
 #endif
 	default:
 		break;
 	}
 	net->addr_is_local = sctp_is_address_on_local_host(newaddr, stcb->asoc.vrf_id);
 	if (net->addr_is_local && ((set_scope || (from == SCTP_ADDR_IS_CONFIRMED)))) {
 		stcb->asoc.scope.loopback_scope = 1;
 		stcb->asoc.scope.ipv4_local_scope = 1;
 		stcb->asoc.scope.local_scope = 0;
 		stcb->asoc.scope.site_scope = 1;
 		addr_inscope = 1;
 	}
 	net->failure_threshold = stcb->asoc.def_net_failure;
 	net->pf_threshold = stcb->asoc.def_net_pf_threshold;
 	if (addr_inscope == 0) {
 		net->dest_state = (SCTP_ADDR_REACHABLE |
 		    SCTP_ADDR_OUT_OF_SCOPE);
 	} else {
 		if (from == SCTP_ADDR_IS_CONFIRMED)
 			/* SCTP_ADDR_IS_CONFIRMED is passed by connect_x */
 			net->dest_state = SCTP_ADDR_REACHABLE;
 		else
 			net->dest_state = SCTP_ADDR_REACHABLE |
 			    SCTP_ADDR_UNCONFIRMED;
 	}
 	/*
 	 * We set this to 0; the timer code knows that this means it's an
 	 * initial value.
 	 */
 	net->rto_needed = 1;
 	net->RTO = 0;
 	net->RTO_measured = 0;
 	stcb->asoc.numnets++;
 	net->ref_count = 1;
 	net->cwr_window_tsn = net->last_cwr_tsn = stcb->asoc.sending_seq - 1;
 	net->port = port;
 	net->dscp = stcb->asoc.default_dscp;
 #ifdef INET6
 	net->flowlabel = stcb->asoc.default_flowlabel;
 #endif
 	if (sctp_stcb_is_feature_on(stcb->sctp_ep, stcb, SCTP_PCB_FLAGS_DONOT_HEARTBEAT)) {
 		net->dest_state |= SCTP_ADDR_NOHB;
 	} else {
 		net->dest_state &= ~SCTP_ADDR_NOHB;
 	}
 	if (sctp_stcb_is_feature_on(stcb->sctp_ep, stcb, SCTP_PCB_FLAGS_DO_NOT_PMTUD)) {
 		net->dest_state |= SCTP_ADDR_NO_PMTUD;
 	} else {
 		net->dest_state &= ~SCTP_ADDR_NO_PMTUD;
 	}
 	net->heart_beat_delay = stcb->asoc.heart_beat_delay;
 	/* Init the timer structure */
 	SCTP_OS_TIMER_INIT(&net->rxt_timer.timer);
 	SCTP_OS_TIMER_INIT(&net->pmtu_timer.timer);
 	SCTP_OS_TIMER_INIT(&net->hb_timer.timer);
 
 	/* Now generate a route for this guy */
 #ifdef INET6
 	/* KAME hack: embed scopeid */
 	if (newaddr->sa_family == AF_INET6) {
 		struct sockaddr_in6 *sin6;
 
 		sin6 = (struct sockaddr_in6 *)&net->ro._l_addr;
 		(void)sa6_embedscope(sin6, MODULE_GLOBAL(ip6_use_defzone));
 		sin6->sin6_scope_id = 0;
 	}
 #endif
 	SCTP_RTALLOC((sctp_route_t *) & net->ro,
 	    stcb->asoc.vrf_id,
 	    stcb->sctp_ep->fibnum);
 
 	if (SCTP_ROUTE_HAS_VALID_IFN(&net->ro)) {
 		/* Get source address */
 		net->ro._s_addr = sctp_source_address_selection(stcb->sctp_ep,
 		    stcb,
 		    (sctp_route_t *) & net->ro,
 		    net,
 		    0,
 		    stcb->asoc.vrf_id);
 		if (net->ro._s_addr != NULL) {
 			net->src_addr_selected = 1;
 			/* Now get the interface MTU */
 			if (net->ro._s_addr->ifn_p != NULL) {
 				net->mtu = SCTP_GATHER_MTU_FROM_INTFC(net->ro._s_addr->ifn_p);
 			}
 		} else {
 			net->src_addr_selected = 0;
 		}
 		if (net->mtu > 0) {
 			uint32_t rmtu;
 
 			rmtu = SCTP_GATHER_MTU_FROM_ROUTE(net->ro._s_addr, &net->ro._l_addr.sa, net->ro.ro_rt);
 			if (rmtu == 0) {
 				/*
 				 * Start things off to match mtu of
 				 * interface please.
 				 */
 				SCTP_SET_MTU_OF_ROUTE(&net->ro._l_addr.sa,
 				    net->ro.ro_rt, net->mtu);
 			} else {
 				/*
 				 * we take the route mtu over the interface,
 				 * since the route may be leading out the
 				 * loopback, or a different interface.
 				 */
 				net->mtu = rmtu;
 			}
 		}
 	} else {
 		net->src_addr_selected = 0;
 	}
 	if (net->mtu == 0) {
 		switch (newaddr->sa_family) {
 #ifdef INET
 		case AF_INET:
 			net->mtu = SCTP_DEFAULT_MTU;
 			break;
 #endif
 #ifdef INET6
 		case AF_INET6:
 			net->mtu = 1280;
 			break;
 #endif
 		default:
 			break;
 		}
 	}
 #if defined(INET) || defined(INET6)
 	if (net->port) {
 		net->mtu -= (uint32_t) sizeof(struct udphdr);
 	}
 #endif
 	if (from == SCTP_ALLOC_ASOC) {
 		stcb->asoc.smallest_mtu = net->mtu;
 	}
 	if (stcb->asoc.smallest_mtu > net->mtu) {
 		sctp_pathmtu_adjustment(stcb, net->mtu);
 	}
 #ifdef INET6
 	if (newaddr->sa_family == AF_INET6) {
 		struct sockaddr_in6 *sin6;
 
 		sin6 = (struct sockaddr_in6 *)&net->ro._l_addr;
 		(void)sa6_recoverscope(sin6);
 	}
 #endif
 
 	/* JRS - Use the congestion control given in the CC module */
 	if (stcb->asoc.cc_functions.sctp_set_initial_cc_param != NULL)
 		(*stcb->asoc.cc_functions.sctp_set_initial_cc_param) (stcb, net);
 
 	/*
 	 * CMT: CUC algo - set find_pseudo_cumack to TRUE (1) at beginning
 	 * of assoc (2005/06/27, iyengar@cis.udel.edu)
 	 */
 	net->find_pseudo_cumack = 1;
 	net->find_rtx_pseudo_cumack = 1;
 	/* Choose an initial flowid. */
 	net->flowid = stcb->asoc.my_vtag ^
 	    ntohs(stcb->rport) ^
 	    ntohs(stcb->sctp_ep->sctp_lport);
-	net->flowtype = M_HASHTYPE_OPAQUE;
+	net->flowtype = M_HASHTYPE_OPAQUE_HASH;
 	if (netp) {
 		*netp = net;
 	}
 	netfirst = TAILQ_FIRST(&stcb->asoc.nets);
 	if (net->ro.ro_rt == NULL) {
 		/* Since we have no route put it at the back */
 		TAILQ_INSERT_TAIL(&stcb->asoc.nets, net, sctp_next);
 	} else if (netfirst == NULL) {
 		/* We are the first one in the pool. */
 		TAILQ_INSERT_HEAD(&stcb->asoc.nets, net, sctp_next);
 	} else if (netfirst->ro.ro_rt == NULL) {
 		/*
 		 * First one has NO route. Place this one ahead of the first
 		 * one.
 		 */
 		TAILQ_INSERT_HEAD(&stcb->asoc.nets, net, sctp_next);
 	} else if (net->ro.ro_rt->rt_ifp != netfirst->ro.ro_rt->rt_ifp) {
 		/*
 		 * This one has a different interface than the one at the
 		 * top of the list. Place it ahead.
 		 */
 		TAILQ_INSERT_HEAD(&stcb->asoc.nets, net, sctp_next);
 	} else {
 		/*
 		 * Ok we have the same interface as the first one. Move
 		 * forward until we find either a) one with a NULL route...
 		 * insert ahead of that b) one with a different ifp.. insert
 		 * after that. c) end of the list.. insert at the tail.
 		 */
 		struct sctp_nets *netlook;
 
 		do {
 			netlook = TAILQ_NEXT(netfirst, sctp_next);
 			if (netlook == NULL) {
 				/* End of the list */
 				TAILQ_INSERT_TAIL(&stcb->asoc.nets, net, sctp_next);
 				break;
 			} else if (netlook->ro.ro_rt == NULL) {
 				/* next one has NO route */
 				TAILQ_INSERT_BEFORE(netfirst, net, sctp_next);
 				break;
 			} else if (netlook->ro.ro_rt->rt_ifp != net->ro.ro_rt->rt_ifp) {
 				TAILQ_INSERT_AFTER(&stcb->asoc.nets, netlook,
 				    net, sctp_next);
 				break;
 			}
 			/* Shift forward */
 			netfirst = netlook;
 		} while (netlook != NULL);
 	}
 
 	/* got to have a primary set */
 	if (stcb->asoc.primary_destination == 0) {
 		stcb->asoc.primary_destination = net;
 	} else if ((stcb->asoc.primary_destination->ro.ro_rt == NULL) &&
 		    (net->ro.ro_rt) &&
 	    ((net->dest_state & SCTP_ADDR_UNCONFIRMED) == 0)) {
 		/* No route to current primary adopt new primary */
 		stcb->asoc.primary_destination = net;
 	}
 	/* Validate primary is first */
 	net = TAILQ_FIRST(&stcb->asoc.nets);
 	if ((net != stcb->asoc.primary_destination) &&
 	    (stcb->asoc.primary_destination)) {
 		/*
 		 * First one on the list is NOT the primary. sctp_cmpaddr()
 		 * is much more efficient if the primary is the first on the
 		 * list, so make it so.
 		 */
 		TAILQ_REMOVE(&stcb->asoc.nets,
 		    stcb->asoc.primary_destination, sctp_next);
 		TAILQ_INSERT_HEAD(&stcb->asoc.nets,
 		    stcb->asoc.primary_destination, sctp_next);
 	}
 	return (0);
 }
 
 
 static uint32_t
 sctp_aloc_a_assoc_id(struct sctp_inpcb *inp, struct sctp_tcb *stcb)
 {
 	uint32_t id;
 	struct sctpasochead *head;
 	struct sctp_tcb *lstcb;
 
 	SCTP_INP_WLOCK(inp);
 try_again:
 	if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) {
 		/* TSNH */
 		SCTP_INP_WUNLOCK(inp);
 		return (0);
 	}
 	/*
 	 * We don't allow assoc id to be one of SCTP_FUTURE_ASSOC,
 	 * SCTP_CURRENT_ASSOC and SCTP_ALL_ASSOC.
 	 */
 	if (inp->sctp_associd_counter <= SCTP_ALL_ASSOC) {
 		inp->sctp_associd_counter = SCTP_ALL_ASSOC + 1;
 	}
 	id = inp->sctp_associd_counter;
 	inp->sctp_associd_counter++;
 	lstcb = sctp_findasoc_ep_asocid_locked(inp, (sctp_assoc_t) id, 0);
 	if (lstcb) {
 		goto try_again;
 	}
 	head = &inp->sctp_asocidhash[SCTP_PCBHASH_ASOC(id, inp->hashasocidmark)];
 	LIST_INSERT_HEAD(head, stcb, sctp_tcbasocidhash);
 	stcb->asoc.in_asocid_hash = 1;
 	SCTP_INP_WUNLOCK(inp);
 	return id;
 }
 
 /*
  * Allocate an association and add it to the endpoint. The caller must be
  * careful to add all additional addresses as soon as they are known, or
  * else the assoc may experience a blackout scenario.
  */
 struct sctp_tcb *
 sctp_aloc_assoc(struct sctp_inpcb *inp, struct sockaddr *firstaddr,
     int *error, uint32_t override_tag, uint32_t vrf_id,
     uint16_t o_streams, uint16_t port,
     struct thread *p
 )
 {
 	/* note the p argument is only valid in unbound sockets */
 
 	struct sctp_tcb *stcb;
 	struct sctp_association *asoc;
 	struct sctpasochead *head;
 	uint16_t rport;
 	int err;
 
 	/*
 	 * Assumption made here: Caller has done a
 	 * sctp_findassociation_ep_addr(ep, addr's); to make sure the
 	 * address does not exist already.
 	 */
 	if (SCTP_BASE_INFO(ipi_count_asoc) >= SCTP_MAX_NUM_OF_ASOC) {
 		/* Hit max assoc, sorry no more */
 		SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, ENOBUFS);
 		*error = ENOBUFS;
 		return (NULL);
 	}
 	if (firstaddr == NULL) {
 		SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EINVAL);
 		*error = EINVAL;
 		return (NULL);
 	}
 	SCTP_INP_RLOCK(inp);
 	if ((inp->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL) &&
 	    ((sctp_is_feature_off(inp, SCTP_PCB_FLAGS_PORTREUSE)) ||
 	    (inp->sctp_flags & SCTP_PCB_FLAGS_CONNECTED))) {
 		/*
 		 * If it's in the TCP pool, it's NOT allowed to create an
 		 * association. The parent listener (or the one-to-many socket)
 		 * needs to call sctp_aloc_assoc(). If a peeled-off or
 		 * connected one does this, it's an error.
 		 */
 		SCTP_INP_RUNLOCK(inp);
 		SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EINVAL);
 		*error = EINVAL;
 		return (NULL);
 	}
 	if ((inp->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL) ||
 	    (inp->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE)) {
 		if ((inp->sctp_flags & SCTP_PCB_FLAGS_WAS_CONNECTED) ||
 		    (inp->sctp_flags & SCTP_PCB_FLAGS_WAS_ABORTED)) {
 			SCTP_INP_RUNLOCK(inp);
 			SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EINVAL);
 			*error = EINVAL;
 			return (NULL);
 		}
 	}
 	SCTPDBG(SCTP_DEBUG_PCB3, "Allocate an association for peer:");
 #ifdef SCTP_DEBUG
 	if (firstaddr) {
 		SCTPDBG_ADDR(SCTP_DEBUG_PCB3, firstaddr);
 		switch (firstaddr->sa_family) {
 #ifdef INET
 		case AF_INET:
 			SCTPDBG(SCTP_DEBUG_PCB3, "Port:%d\n",
 			    ntohs(((struct sockaddr_in *)firstaddr)->sin_port));
 			break;
 #endif
 #ifdef INET6
 		case AF_INET6:
 			SCTPDBG(SCTP_DEBUG_PCB3, "Port:%d\n",
 			    ntohs(((struct sockaddr_in6 *)firstaddr)->sin6_port));
 			break;
 #endif
 		default:
 			break;
 		}
 	} else {
 		SCTPDBG(SCTP_DEBUG_PCB3, "None\n");
 	}
 #endif				/* SCTP_DEBUG */
 	switch (firstaddr->sa_family) {
 #ifdef INET
 	case AF_INET:
 		{
 			struct sockaddr_in *sin;
 
 			sin = (struct sockaddr_in *)firstaddr;
 			if ((ntohs(sin->sin_port) == 0) ||
 			    (sin->sin_addr.s_addr == INADDR_ANY) ||
 			    (sin->sin_addr.s_addr == INADDR_BROADCAST) ||
 			    IN_MULTICAST(ntohl(sin->sin_addr.s_addr))) {
 				/* Invalid address */
 				SCTP_INP_RUNLOCK(inp);
 				SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EINVAL);
 				*error = EINVAL;
 				return (NULL);
 			}
 			rport = sin->sin_port;
 			break;
 		}
 #endif
 #ifdef INET6
 	case AF_INET6:
 		{
 			struct sockaddr_in6 *sin6;
 
 			sin6 = (struct sockaddr_in6 *)firstaddr;
 			if ((ntohs(sin6->sin6_port) == 0) ||
 			    IN6_IS_ADDR_UNSPECIFIED(&sin6->sin6_addr) ||
 			    IN6_IS_ADDR_MULTICAST(&sin6->sin6_addr)) {
 				/* Invalid address */
 				SCTP_INP_RUNLOCK(inp);
 				SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EINVAL);
 				*error = EINVAL;
 				return (NULL);
 			}
 			rport = sin6->sin6_port;
 			break;
 		}
 #endif
 	default:
 		/* not supported family type */
 		SCTP_INP_RUNLOCK(inp);
 		SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EINVAL);
 		*error = EINVAL;
 		return (NULL);
 	}
 	SCTP_INP_RUNLOCK(inp);
 	if (inp->sctp_flags & SCTP_PCB_FLAGS_UNBOUND) {
 		/*
 		 * If you have not performed a bind, then we need to do the
 		 * ephemeral bind for you.
 		 */
 		if ((err = sctp_inpcb_bind(inp->sctp_socket,
 		    (struct sockaddr *)NULL,
 		    (struct sctp_ifa *)NULL,
 		    p
 		    ))) {
 			/* bind error, probably perm */
 			*error = err;
 			return (NULL);
 		}
 	}
 	stcb = SCTP_ZONE_GET(SCTP_BASE_INFO(ipi_zone_asoc), struct sctp_tcb);
 	if (stcb == NULL) {
 		/* out of memory? */
 		SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, ENOMEM);
 		*error = ENOMEM;
 		return (NULL);
 	}
 	SCTP_INCR_ASOC_COUNT();
 
 	bzero(stcb, sizeof(*stcb));
 	asoc = &stcb->asoc;
 
 	asoc->assoc_id = sctp_aloc_a_assoc_id(inp, stcb);
 	SCTP_TCB_LOCK_INIT(stcb);
 	SCTP_TCB_SEND_LOCK_INIT(stcb);
 	stcb->rport = rport;
 	/* set up back pointers */
 	stcb->sctp_ep = inp;
 	stcb->sctp_socket = inp->sctp_socket;
 	if ((err = sctp_init_asoc(inp, stcb, override_tag, vrf_id, o_streams))) {
 		/* failed */
 		SCTP_TCB_LOCK_DESTROY(stcb);
 		SCTP_TCB_SEND_LOCK_DESTROY(stcb);
 		LIST_REMOVE(stcb, sctp_tcbasocidhash);
 		SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_asoc), stcb);
 		SCTP_DECR_ASOC_COUNT();
 		*error = err;
 		return (NULL);
 	}
 	/* and the port */
 	SCTP_INP_INFO_WLOCK();
 	SCTP_INP_WLOCK(inp);
 	if (inp->sctp_flags & (SCTP_PCB_FLAGS_SOCKET_GONE | SCTP_PCB_FLAGS_SOCKET_ALLGONE)) {
 		/* inpcb freed while alloc going on */
 		SCTP_TCB_LOCK_DESTROY(stcb);
 		SCTP_TCB_SEND_LOCK_DESTROY(stcb);
 		LIST_REMOVE(stcb, sctp_tcbasocidhash);
 		SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_asoc), stcb);
 		SCTP_INP_WUNLOCK(inp);
 		SCTP_INP_INFO_WUNLOCK();
 		SCTP_DECR_ASOC_COUNT();
 		SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EINVAL);
 		*error = EINVAL;
 		return (NULL);
 	}
 	SCTP_TCB_LOCK(stcb);
 
 	/* now that my_vtag is set, add it to the hash */
 	head = &SCTP_BASE_INFO(sctp_asochash)[SCTP_PCBHASH_ASOC(stcb->asoc.my_vtag, SCTP_BASE_INFO(hashasocmark))];
 	/* put it in the bucket in the vtag hash of assoc's for the system */
 	LIST_INSERT_HEAD(head, stcb, sctp_asocs);
 	SCTP_INP_INFO_WUNLOCK();
 
 	if ((err = sctp_add_remote_addr(stcb, firstaddr, NULL, port, SCTP_DO_SETSCOPE, SCTP_ALLOC_ASOC))) {
 		/* failure.. memory error? */
 		if (asoc->strmout) {
 			SCTP_FREE(asoc->strmout, SCTP_M_STRMO);
 			asoc->strmout = NULL;
 		}
 		if (asoc->mapping_array) {
 			SCTP_FREE(asoc->mapping_array, SCTP_M_MAP);
 			asoc->mapping_array = NULL;
 		}
 		if (asoc->nr_mapping_array) {
 			SCTP_FREE(asoc->nr_mapping_array, SCTP_M_MAP);
 			asoc->nr_mapping_array = NULL;
 		}
 		SCTP_DECR_ASOC_COUNT();
 		SCTP_TCB_UNLOCK(stcb);
 		SCTP_TCB_LOCK_DESTROY(stcb);
 		SCTP_TCB_SEND_LOCK_DESTROY(stcb);
 		LIST_REMOVE(stcb, sctp_tcbasocidhash);
 		SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_asoc), stcb);
 		SCTP_INP_WUNLOCK(inp);
 		SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, ENOBUFS);
 		*error = ENOBUFS;
 		return (NULL);
 	}
 	/* Init all the timers */
 	SCTP_OS_TIMER_INIT(&asoc->dack_timer.timer);
 	SCTP_OS_TIMER_INIT(&asoc->strreset_timer.timer);
 	SCTP_OS_TIMER_INIT(&asoc->asconf_timer.timer);
 	SCTP_OS_TIMER_INIT(&asoc->shut_guard_timer.timer);
 	SCTP_OS_TIMER_INIT(&asoc->autoclose_timer.timer);
 	SCTP_OS_TIMER_INIT(&asoc->delayed_event_timer.timer);
 	SCTP_OS_TIMER_INIT(&asoc->delete_prim_timer.timer);
 
 	LIST_INSERT_HEAD(&inp->sctp_asoc_list, stcb, sctp_tcblist);
 	/* now file the port under the hash as well */
 	if (inp->sctp_tcbhash != NULL) {
 		head = &inp->sctp_tcbhash[SCTP_PCBHASH_ALLADDR(stcb->rport,
 		    inp->sctp_hashmark)];
 		LIST_INSERT_HEAD(head, stcb, sctp_tcbhash);
 	}
 	SCTP_INP_WUNLOCK(inp);
 	SCTPDBG(SCTP_DEBUG_PCB1, "Association %p now allocated\n", (void *)stcb);
 	return (stcb);
 }
 
 
 void
 sctp_remove_net(struct sctp_tcb *stcb, struct sctp_nets *net)
 {
 	struct sctp_association *asoc;
 
 	asoc = &stcb->asoc;
 	asoc->numnets--;
 	TAILQ_REMOVE(&asoc->nets, net, sctp_next);
 	if (net == asoc->primary_destination) {
 		/* Reset primary */
 		struct sctp_nets *lnet;
 
 		lnet = TAILQ_FIRST(&asoc->nets);
 		/*
 		 * Mobility adaptation Ideally, if deleted destination is
 		 * the primary, it becomes a fast retransmission trigger by
 		 * the subsequent SET PRIMARY. (by micchie)
 		 */
 		if (sctp_is_mobility_feature_on(stcb->sctp_ep,
 		    SCTP_MOBILITY_BASE) ||
 		    sctp_is_mobility_feature_on(stcb->sctp_ep,
 		    SCTP_MOBILITY_FASTHANDOFF)) {
 			SCTPDBG(SCTP_DEBUG_ASCONF1, "remove_net: primary dst is deleting\n");
 			if (asoc->deleted_primary != NULL) {
 				SCTPDBG(SCTP_DEBUG_ASCONF1, "remove_net: deleted primary may be already stored\n");
 				goto out;
 			}
 			asoc->deleted_primary = net;
 			atomic_add_int(&net->ref_count, 1);
 			memset(&net->lastsa, 0, sizeof(net->lastsa));
 			memset(&net->lastsv, 0, sizeof(net->lastsv));
 			sctp_mobility_feature_on(stcb->sctp_ep,
 			    SCTP_MOBILITY_PRIM_DELETED);
 			sctp_timer_start(SCTP_TIMER_TYPE_PRIM_DELETED,
 			    stcb->sctp_ep, stcb, NULL);
 		}
 out:
 		/* Try to find a confirmed primary */
 		asoc->primary_destination = sctp_find_alternate_net(stcb, lnet, 0);
 	}
 	if (net == asoc->last_data_chunk_from) {
 		/* Reset last data chunk source */
 		asoc->last_data_chunk_from = TAILQ_FIRST(&asoc->nets);
 	}
 	if (net == asoc->last_control_chunk_from) {
 		/* Clear net */
 		asoc->last_control_chunk_from = NULL;
 	}
 	if (net == stcb->asoc.alternate) {
 		sctp_free_remote_addr(stcb->asoc.alternate);
 		stcb->asoc.alternate = NULL;
 	}
 	sctp_free_remote_addr(net);
 }
 
 /*
  * remove a remote endpoint address from an association, it will fail if the
  * address does not exist.
  */
 int
 sctp_del_remote_addr(struct sctp_tcb *stcb, struct sockaddr *remaddr)
 {
 	/*
 	 * Here we need to remove a remote address. This is quite simple: we
 	 * first find it in the list of addresses for the association
 	 * (stcb->asoc.nets) and then, if it is there, we TAILQ_REMOVE that
 	 * item via sctp_remove_net(). Note we do not allow it to be removed
 	 * if there are no other addresses.
 	 */
 	struct sctp_association *asoc;
 	struct sctp_nets *net, *nnet;
 
 	asoc = &stcb->asoc;
 
 	/* locate the address */
 	TAILQ_FOREACH_SAFE(net, &asoc->nets, sctp_next, nnet) {
 		if (net->ro._l_addr.sa.sa_family != remaddr->sa_family) {
 			continue;
 		}
 		if (sctp_cmpaddr((struct sockaddr *)&net->ro._l_addr,
 		    remaddr)) {
 			/* we found the guy */
 			if (asoc->numnets < 2) {
 				/* Must have at LEAST two remote addresses */
 				return (-1);
 			} else {
 				sctp_remove_net(stcb, net);
 				return (0);
 			}
 		}
 	}
 	/* not found. */
 	return (-2);
 }
 
 void
 sctp_delete_from_timewait(uint32_t tag, uint16_t lport, uint16_t rport)
 {
 	struct sctpvtaghead *chain;
 	struct sctp_tagblock *twait_block;
 	int found = 0;
 	int i;
 
 	chain = &SCTP_BASE_INFO(vtag_timewait)[(tag % SCTP_STACK_VTAG_HASH_SIZE)];
 	LIST_FOREACH(twait_block, chain, sctp_nxt_tagblock) {
 		for (i = 0; i < SCTP_NUMBER_IN_VTAG_BLOCK; i++) {
 			if ((twait_block->vtag_block[i].v_tag == tag) &&
 			    (twait_block->vtag_block[i].lport == lport) &&
 			    (twait_block->vtag_block[i].rport == rport)) {
 				twait_block->vtag_block[i].tv_sec_at_expire = 0;
 				twait_block->vtag_block[i].v_tag = 0;
 				twait_block->vtag_block[i].lport = 0;
 				twait_block->vtag_block[i].rport = 0;
 				found = 1;
 				break;
 			}
 		}
 		if (found)
 			break;
 	}
 }
 
 int
 sctp_is_in_timewait(uint32_t tag, uint16_t lport, uint16_t rport)
 {
 	struct sctpvtaghead *chain;
 	struct sctp_tagblock *twait_block;
 	int found = 0;
 	int i;
 
 	SCTP_INP_INFO_WLOCK();
 	chain = &SCTP_BASE_INFO(vtag_timewait)[(tag % SCTP_STACK_VTAG_HASH_SIZE)];
 	LIST_FOREACH(twait_block, chain, sctp_nxt_tagblock) {
 		for (i = 0; i < SCTP_NUMBER_IN_VTAG_BLOCK; i++) {
 			if ((twait_block->vtag_block[i].v_tag == tag) &&
 			    (twait_block->vtag_block[i].lport == lport) &&
 			    (twait_block->vtag_block[i].rport == rport)) {
 				found = 1;
 				break;
 			}
 		}
 		if (found)
 			break;
 	}
 	SCTP_INP_INFO_WUNLOCK();
 	return (found);
 }
 
 
 void
 sctp_add_vtag_to_timewait(uint32_t tag, uint32_t time, uint16_t lport, uint16_t rport)
 {
 	struct sctpvtaghead *chain;
 	struct sctp_tagblock *twait_block;
 	struct timeval now;
 	int set, i;
 
 	if (time == 0) {
 		/* It's disabled */
 		return;
 	}
 	(void)SCTP_GETTIME_TIMEVAL(&now);
 	chain = &SCTP_BASE_INFO(vtag_timewait)[(tag % SCTP_STACK_VTAG_HASH_SIZE)];
 	set = 0;
 	LIST_FOREACH(twait_block, chain, sctp_nxt_tagblock) {
 		/* Block(s) present, lets find space, and expire on the fly */
 		for (i = 0; i < SCTP_NUMBER_IN_VTAG_BLOCK; i++) {
 			if ((twait_block->vtag_block[i].v_tag == 0) &&
 			    !set) {
 				twait_block->vtag_block[i].tv_sec_at_expire =
 				    now.tv_sec + time;
 				twait_block->vtag_block[i].v_tag = tag;
 				twait_block->vtag_block[i].lport = lport;
 				twait_block->vtag_block[i].rport = rport;
 				set = 1;
 			} else if ((twait_block->vtag_block[i].v_tag) &&
 			    ((long)twait_block->vtag_block[i].tv_sec_at_expire < now.tv_sec)) {
 				/* Audit expires this guy */
 				twait_block->vtag_block[i].tv_sec_at_expire = 0;
 				twait_block->vtag_block[i].v_tag = 0;
 				twait_block->vtag_block[i].lport = 0;
 				twait_block->vtag_block[i].rport = 0;
 				if (set == 0) {
 					/* Reuse it for my new tag */
 					twait_block->vtag_block[i].tv_sec_at_expire = now.tv_sec + time;
 					twait_block->vtag_block[i].v_tag = tag;
 					twait_block->vtag_block[i].lport = lport;
 					twait_block->vtag_block[i].rport = rport;
 					set = 1;
 				}
 			}
 		}
 		if (set) {
 			/*
 			 * We only do up to the block where we can place our
 			 * tag for audits
 			 */
 			break;
 		}
 	}
 	/* Need to add a new block to chain */
 	if (!set) {
 		SCTP_MALLOC(twait_block, struct sctp_tagblock *,
 		    sizeof(struct sctp_tagblock), SCTP_M_TIMW);
 		if (twait_block == NULL) {
 #ifdef INVARIANTS
 			panic("Can not alloc tagblock");
 #endif
 			return;
 		}
 		memset(twait_block, 0, sizeof(struct sctp_tagblock));
 		LIST_INSERT_HEAD(chain, twait_block, sctp_nxt_tagblock);
 		twait_block->vtag_block[0].tv_sec_at_expire = now.tv_sec + time;
 		twait_block->vtag_block[0].v_tag = tag;
 		twait_block->vtag_block[0].lport = lport;
 		twait_block->vtag_block[0].rport = rport;
 	}
 }
 
 void
 sctp_clean_up_stream(struct sctp_tcb *stcb, struct sctp_readhead *rh)
 {
 	struct sctp_tmit_chunk *chk, *nchk;
 	struct sctp_queued_to_read *ctl, *nctl;
 
 	TAILQ_FOREACH_SAFE(ctl, rh, next_instrm, nctl) {
 		TAILQ_REMOVE(rh, ctl, next_instrm);
 		ctl->on_strm_q = 0;
 		if (ctl->on_read_q == 0) {
 			sctp_free_remote_addr(ctl->whoFrom);
 			if (ctl->data) {
 				sctp_m_freem(ctl->data);
 				ctl->data = NULL;
 			}
 		}
 		/* Reassembly free? */
 		TAILQ_FOREACH_SAFE(chk, &ctl->reasm, sctp_next, nchk) {
 			TAILQ_REMOVE(&ctl->reasm, chk, sctp_next);
 			if (chk->data) {
 				sctp_m_freem(chk->data);
 				chk->data = NULL;
 			}
 			if (chk->holds_key_ref)
 				sctp_auth_key_release(stcb, chk->auth_keyid, SCTP_SO_LOCKED);
 			sctp_free_remote_addr(chk->whoTo);
 			SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_chunk), chk);
 			SCTP_DECR_CHK_COUNT();
 			/* sa_ignore FREED_MEMORY */
 		}
 		/*
 		 * We don't free the address here since all the net's were
 		 * freed above.
 		 */
 		if (ctl->on_read_q == 0) {
 			sctp_free_a_readq(stcb, ctl);
 		}
 	}
 }
 
 
 /*-
  * Free the association after un-hashing the remote port. This
  * function ALWAYS returns holding NO LOCK on the stcb. It DOES
  * expect that the input to this function IS a locked TCB.
  * It will return 0 if it did NOT destroy the association (instead
  * it unlocks it). It will return NON-zero if it either destroyed the
  * association OR the association is already destroyed.
  */
 int
 sctp_free_assoc(struct sctp_inpcb *inp, struct sctp_tcb *stcb, int from_inpcbfree, int from_location)
 {
 	int i;
 	struct sctp_association *asoc;
 	struct sctp_nets *net, *nnet;
 	struct sctp_laddr *laddr, *naddr;
 	struct sctp_tmit_chunk *chk, *nchk;
 	struct sctp_asconf_addr *aparam, *naparam;
 	struct sctp_asconf_ack *aack, *naack;
 	struct sctp_stream_reset_list *strrst, *nstrrst;
 	struct sctp_queued_to_read *sq, *nsq;
 	struct sctp_stream_queue_pending *sp, *nsp;
 	sctp_sharedkey_t *shared_key, *nshared_key;
 	struct socket *so;
 
 	/* first, lets purge the entry from the hash table. */
 
 #ifdef SCTP_LOG_CLOSING
 	sctp_log_closing(inp, stcb, 6);
 #endif
 	if (stcb->asoc.state == 0) {
 #ifdef SCTP_LOG_CLOSING
 		sctp_log_closing(inp, NULL, 7);
 #endif
 		/* there is no asoc, really TSNH :-0 */
 		return (1);
 	}
 	if (stcb->asoc.alternate) {
 		sctp_free_remote_addr(stcb->asoc.alternate);
 		stcb->asoc.alternate = NULL;
 	}
 	/* TEMP CODE */
 	if (stcb->freed_from_where == 0) {
 		/* Only record the first place free happened from */
 		stcb->freed_from_where = from_location;
 	}
 	/* TEMP CODE */
 
 	asoc = &stcb->asoc;
 	if ((inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) ||
 	    (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE))
 		/* nothing around */
 		so = NULL;
 	else
 		so = inp->sctp_socket;
 
 	/*
 	 * We use timer-based freeing if a reader or writer is in the way.
 	 * So we first check if we are actually being called from a timer;
 	 * if so, we abort early if a reader or writer is still in the way.
 	 */
 	if ((stcb->asoc.state & SCTP_STATE_ABOUT_TO_BE_FREED) &&
 	    (from_inpcbfree == SCTP_NORMAL_PROC)) {
 		/*
 		 * is it the timer driving us? if so are the reader/writers
 		 * gone?
 		 */
 		if (stcb->asoc.refcnt) {
 			/* nope, reader or writer in the way */
 			sctp_timer_start(SCTP_TIMER_TYPE_ASOCKILL, inp, stcb, NULL);
 			/* no asoc destroyed */
 			SCTP_TCB_UNLOCK(stcb);
 #ifdef SCTP_LOG_CLOSING
 			sctp_log_closing(inp, stcb, 8);
 #endif
 			return (0);
 		}
 	}
 	/* now clean up any other timers */
 	(void)SCTP_OS_TIMER_STOP(&asoc->dack_timer.timer);
 	asoc->dack_timer.self = NULL;
 	(void)SCTP_OS_TIMER_STOP(&asoc->strreset_timer.timer);
 	/*-
 	 * For stream reset we don't blast this unless
 	 * it is a str-reset timer, it might be the
 	 * free-asoc timer which we DON'T want to
 	 * disturb.
 	 */
 	if (asoc->strreset_timer.type == SCTP_TIMER_TYPE_STRRESET)
 		asoc->strreset_timer.self = NULL;
 	(void)SCTP_OS_TIMER_STOP(&asoc->asconf_timer.timer);
 	asoc->asconf_timer.self = NULL;
 	(void)SCTP_OS_TIMER_STOP(&asoc->autoclose_timer.timer);
 	asoc->autoclose_timer.self = NULL;
 	(void)SCTP_OS_TIMER_STOP(&asoc->shut_guard_timer.timer);
 	asoc->shut_guard_timer.self = NULL;
 	(void)SCTP_OS_TIMER_STOP(&asoc->delayed_event_timer.timer);
 	asoc->delayed_event_timer.self = NULL;
 	/* Mobility adaptation */
 	(void)SCTP_OS_TIMER_STOP(&asoc->delete_prim_timer.timer);
 	asoc->delete_prim_timer.self = NULL;
 	TAILQ_FOREACH(net, &asoc->nets, sctp_next) {
 		(void)SCTP_OS_TIMER_STOP(&net->rxt_timer.timer);
 		net->rxt_timer.self = NULL;
 		(void)SCTP_OS_TIMER_STOP(&net->pmtu_timer.timer);
 		net->pmtu_timer.self = NULL;
 		(void)SCTP_OS_TIMER_STOP(&net->hb_timer.timer);
 		net->hb_timer.self = NULL;
 	}
 	/* Now the read queue needs to be cleaned up (only once) */
 	if ((stcb->asoc.state & SCTP_STATE_ABOUT_TO_BE_FREED) == 0) {
 		stcb->asoc.state |= SCTP_STATE_ABOUT_TO_BE_FREED;
 		SCTP_INP_READ_LOCK(inp);
 		TAILQ_FOREACH(sq, &inp->read_queue, next) {
 			if (sq->stcb == stcb) {
 				sq->do_not_ref_stcb = 1;
 				sq->sinfo_cumtsn = stcb->asoc.cumulative_tsn;
 				/*
 				 * If there is no end, there never will be
 				 * now.
 				 */
 				if (sq->end_added == 0) {
 					/* Held for PD-API clear that. */
 					sq->pdapi_aborted = 1;
 					sq->held_length = 0;
 					if (sctp_stcb_is_feature_on(inp, stcb, SCTP_PCB_FLAGS_PDAPIEVNT) && (so != NULL)) {
 						/*
 						 * Need to add a PD-API
 						 * aborted indication.
 						 * Setting the control_pdapi
 						 * assures that it will be
 						 * added right after this
 						 * msg.
 						 */
 						uint32_t strseq;
 
 						stcb->asoc.control_pdapi = sq;
 						strseq = (sq->sinfo_stream << 16) | sq->sinfo_ssn;
 						sctp_ulp_notify(SCTP_NOTIFY_PARTIAL_DELVIERY_INDICATION,
 						    stcb,
 						    SCTP_PARTIAL_DELIVERY_ABORTED,
 						    (void *)&strseq,
 						    SCTP_SO_LOCKED);
 						stcb->asoc.control_pdapi = NULL;
 					}
 				}
 				/* Add an end to wake them */
 				sq->end_added = 1;
 			}
 		}
 		SCTP_INP_READ_UNLOCK(inp);
 		if (stcb->block_entry) {
 			SCTP_LTRACE_ERR_RET(inp, stcb, NULL, SCTP_FROM_SCTP_PCB, ECONNRESET);
 			stcb->block_entry->error = ECONNRESET;
 			stcb->block_entry = NULL;
 		}
 	}
 	if ((stcb->asoc.refcnt) || (stcb->asoc.state & SCTP_STATE_IN_ACCEPT_QUEUE)) {
 		/*
 		 * Someone holds a reference OR the socket has not been
 		 * accepted yet.
 		 */
 		if ((stcb->asoc.refcnt) ||
 		    (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) ||
 		    (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE)) {
 			stcb->asoc.state &= ~SCTP_STATE_IN_ACCEPT_QUEUE;
 			sctp_timer_start(SCTP_TIMER_TYPE_ASOCKILL, inp, stcb, NULL);
 		}
 		SCTP_TCB_UNLOCK(stcb);
 		if ((inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) ||
 		    (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE))
 			/* nothing around */
 			so = NULL;
 		if (so) {
 			/* Wake any reader/writers */
 			sctp_sorwakeup(inp, so);
 			sctp_sowwakeup(inp, so);
 		}
 #ifdef SCTP_LOG_CLOSING
 		sctp_log_closing(inp, stcb, 9);
 #endif
 		/* no asoc destroyed */
 		return (0);
 	}
 #ifdef SCTP_LOG_CLOSING
 	sctp_log_closing(inp, stcb, 10);
 #endif
 	/*
 	 * When I reach here, no others want to kill the assoc yet, and I
 	 * own the lock. Now it's possible an abort comes in when I do the
 	 * lock exchange below to grab all the locks to do the final take
 	 * out. To prevent this we increment the count, which will start a
 	 * timer and blow out above, thus assuring us that we hold exclusive
 	 * killing of the asoc. Note that after getting back the TCB lock we
 	 * will go ahead and decrement the counter back down and stop any
 	 * timer a passing stranger may have started :-S
 	 */
 	if (from_inpcbfree == SCTP_NORMAL_PROC) {
 		atomic_add_int(&stcb->asoc.refcnt, 1);
 
 		SCTP_TCB_UNLOCK(stcb);
 		SCTP_INP_INFO_WLOCK();
 		SCTP_INP_WLOCK(inp);
 		SCTP_TCB_LOCK(stcb);
 	}
 	/* Double check the GONE flag */
 	if ((inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) ||
 	    (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE))
 		/* nothing around */
 		so = NULL;
 
 	if ((inp->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) ||
 	    (inp->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) {
 		/*
 		 * For TCP type we need special handling when we are
 		 * connected. We also include the peeled-off ones too.
 		 */
 		if (inp->sctp_flags & SCTP_PCB_FLAGS_CONNECTED) {
 			inp->sctp_flags &= ~SCTP_PCB_FLAGS_CONNECTED;
 			inp->sctp_flags |= SCTP_PCB_FLAGS_WAS_CONNECTED;
 			if (so) {
 				SOCK_LOCK(so);
 				if (so->so_rcv.sb_cc == 0) {
 					so->so_state &= ~(SS_ISCONNECTING |
 					    SS_ISDISCONNECTING |
 					    SS_ISCONFIRMING |
 					    SS_ISCONNECTED);
 				}
 				socantrcvmore_locked(so);
 				sctp_sowwakeup(inp, so);
 				sctp_sorwakeup(inp, so);
 				SCTP_SOWAKEUP(so);
 			}
 		}
 	}
 	/*
 	 * Make it invalid too; that way if it's about to run it will abort
 	 * and return.
 	 */
 	/* drop the extra reference taken above */
 	if (from_inpcbfree == SCTP_NORMAL_PROC) {
 		atomic_add_int(&stcb->asoc.refcnt, -1);
 	}
 	if (stcb->asoc.refcnt) {
 		stcb->asoc.state &= ~SCTP_STATE_IN_ACCEPT_QUEUE;
 		sctp_timer_start(SCTP_TIMER_TYPE_ASOCKILL, inp, stcb, NULL);
 		if (from_inpcbfree == SCTP_NORMAL_PROC) {
 			SCTP_INP_INFO_WUNLOCK();
 			SCTP_INP_WUNLOCK(inp);
 		}
 		SCTP_TCB_UNLOCK(stcb);
 		return (0);
 	}
 	asoc->state = 0;
 	if (inp->sctp_tcbhash) {
 		LIST_REMOVE(stcb, sctp_tcbhash);
 	}
 	if (stcb->asoc.in_asocid_hash) {
 		LIST_REMOVE(stcb, sctp_tcbasocidhash);
 	}
 	/* Now let's remove it from the list of ALL associations in the EP */
 	LIST_REMOVE(stcb, sctp_tcblist);
 	if (from_inpcbfree == SCTP_NORMAL_PROC) {
 		SCTP_INP_INCR_REF(inp);
 		SCTP_INP_WUNLOCK(inp);
 	}
 	/* pull from vtag hash */
 	LIST_REMOVE(stcb, sctp_asocs);
 	sctp_add_vtag_to_timewait(asoc->my_vtag, SCTP_BASE_SYSCTL(sctp_vtag_time_wait),
 	    inp->sctp_lport, stcb->rport);
 
 	/*
 	 * Now re-stop the timers to be sure; this is paranoia at its finest!
 	 */
 	(void)SCTP_OS_TIMER_STOP(&asoc->strreset_timer.timer);
 	(void)SCTP_OS_TIMER_STOP(&asoc->dack_timer.timer);
 	(void)SCTP_OS_TIMER_STOP(&asoc->strreset_timer.timer);
 	(void)SCTP_OS_TIMER_STOP(&asoc->asconf_timer.timer);
 	(void)SCTP_OS_TIMER_STOP(&asoc->shut_guard_timer.timer);
 	(void)SCTP_OS_TIMER_STOP(&asoc->autoclose_timer.timer);
 	(void)SCTP_OS_TIMER_STOP(&asoc->delayed_event_timer.timer);
 	TAILQ_FOREACH(net, &asoc->nets, sctp_next) {
 		(void)SCTP_OS_TIMER_STOP(&net->rxt_timer.timer);
 		(void)SCTP_OS_TIMER_STOP(&net->pmtu_timer.timer);
 		(void)SCTP_OS_TIMER_STOP(&net->hb_timer.timer);
 	}
 
 	asoc->strreset_timer.type = SCTP_TIMER_TYPE_NONE;
 	/*
 	 * The chunk lists and such SHOULD be empty but we check them just
 	 * in case.
 	 */
 	/* anything on the wheel needs to be removed */
 	for (i = 0; i < asoc->streamoutcnt; i++) {
 		struct sctp_stream_out *outs;
 
 		outs = &asoc->strmout[i];
 		/* now clean up any chunks here */
 		TAILQ_FOREACH_SAFE(sp, &outs->outqueue, next, nsp) {
 			TAILQ_REMOVE(&outs->outqueue, sp, next);
 			sctp_free_spbufspace(stcb, asoc, sp);
 			if (sp->data) {
 				if (so) {
 					/* Still an open socket - report */
 					sctp_ulp_notify(SCTP_NOTIFY_SPECIAL_SP_FAIL, stcb,
 					    0, (void *)sp, SCTP_SO_LOCKED);
 				}
 				if (sp->data) {
 					sctp_m_freem(sp->data);
 					sp->data = NULL;
 					sp->tail_mbuf = NULL;
 					sp->length = 0;
 				}
 			}
 			if (sp->net) {
 				sctp_free_remote_addr(sp->net);
 				sp->net = NULL;
 			}
 			sctp_free_a_strmoq(stcb, sp, SCTP_SO_LOCKED);
 		}
 	}
 	/* sa_ignore FREED_MEMORY */
 	TAILQ_FOREACH_SAFE(strrst, &asoc->resetHead, next_resp, nstrrst) {
 		TAILQ_REMOVE(&asoc->resetHead, strrst, next_resp);
 		SCTP_FREE(strrst, SCTP_M_STRESET);
 	}
 	TAILQ_FOREACH_SAFE(sq, &asoc->pending_reply_queue, next, nsq) {
 		TAILQ_REMOVE(&asoc->pending_reply_queue, sq, next);
 		if (sq->data) {
 			sctp_m_freem(sq->data);
 			sq->data = NULL;
 		}
 		sctp_free_remote_addr(sq->whoFrom);
 		sq->whoFrom = NULL;
 		sq->stcb = NULL;
 		/* Free the ctl entry */
 		sctp_free_a_readq(stcb, sq);
 		/* sa_ignore FREED_MEMORY */
 	}
 	TAILQ_FOREACH_SAFE(chk, &asoc->free_chunks, sctp_next, nchk) {
 		TAILQ_REMOVE(&asoc->free_chunks, chk, sctp_next);
 		if (chk->data) {
 			sctp_m_freem(chk->data);
 			chk->data = NULL;
 		}
 		if (chk->holds_key_ref)
 			sctp_auth_key_release(stcb, chk->auth_keyid, SCTP_SO_LOCKED);
 		SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_chunk), chk);
 		SCTP_DECR_CHK_COUNT();
 		atomic_subtract_int(&SCTP_BASE_INFO(ipi_free_chunks), 1);
 		asoc->free_chunk_cnt--;
 		/* sa_ignore FREED_MEMORY */
 	}
 	/* pending send queue SHOULD be empty */
 	TAILQ_FOREACH_SAFE(chk, &asoc->send_queue, sctp_next, nchk) {
 		if (asoc->strmout[chk->rec.data.stream_number].chunks_on_queues > 0) {
 			asoc->strmout[chk->rec.data.stream_number].chunks_on_queues--;
 #ifdef INVARIANTS
 		} else {
 			panic("No chunks on the queues for sid %u.", chk->rec.data.stream_number);
 #endif
 		}
 		TAILQ_REMOVE(&asoc->send_queue, chk, sctp_next);
 		if (chk->data) {
 			if (so) {
 				/* Still a socket? */
 				sctp_ulp_notify(SCTP_NOTIFY_UNSENT_DG_FAIL, stcb,
 				    0, chk, SCTP_SO_LOCKED);
 			}
 			if (chk->data) {
 				sctp_m_freem(chk->data);
 				chk->data = NULL;
 			}
 		}
 		if (chk->holds_key_ref)
 			sctp_auth_key_release(stcb, chk->auth_keyid, SCTP_SO_LOCKED);
 		if (chk->whoTo) {
 			sctp_free_remote_addr(chk->whoTo);
 			chk->whoTo = NULL;
 		}
 		SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_chunk), chk);
 		SCTP_DECR_CHK_COUNT();
 		/* sa_ignore FREED_MEMORY */
 	}
 	/* sent queue SHOULD be empty */
 	TAILQ_FOREACH_SAFE(chk, &asoc->sent_queue, sctp_next, nchk) {
 		if (chk->sent != SCTP_DATAGRAM_NR_ACKED) {
 			if (asoc->strmout[chk->rec.data.stream_number].chunks_on_queues > 0) {
 				asoc->strmout[chk->rec.data.stream_number].chunks_on_queues--;
 #ifdef INVARIANTS
 			} else {
 				panic("No chunks on the queues for sid %u.", chk->rec.data.stream_number);
 #endif
 			}
 		}
 		TAILQ_REMOVE(&asoc->sent_queue, chk, sctp_next);
 		if (chk->data) {
 			if (so) {
 				/* Still a socket? */
 				sctp_ulp_notify(SCTP_NOTIFY_SENT_DG_FAIL, stcb,
 				    0, chk, SCTP_SO_LOCKED);
 			}
 			if (chk->data) {
 				sctp_m_freem(chk->data);
 				chk->data = NULL;
 			}
 		}
 		if (chk->holds_key_ref)
 			sctp_auth_key_release(stcb, chk->auth_keyid, SCTP_SO_LOCKED);
 		sctp_free_remote_addr(chk->whoTo);
 		SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_chunk), chk);
 		SCTP_DECR_CHK_COUNT();
 		/* sa_ignore FREED_MEMORY */
 	}
 #ifdef INVARIANTS
 	for (i = 0; i < stcb->asoc.streamoutcnt; i++) {
 		if (stcb->asoc.strmout[i].chunks_on_queues > 0) {
 			panic("%u chunks left for stream %u.", stcb->asoc.strmout[i].chunks_on_queues, i);
 		}
 	}
 #endif
 	/* control queue MAY not be empty */
 	TAILQ_FOREACH_SAFE(chk, &asoc->control_send_queue, sctp_next, nchk) {
 		TAILQ_REMOVE(&asoc->control_send_queue, chk, sctp_next);
 		if (chk->data) {
 			sctp_m_freem(chk->data);
 			chk->data = NULL;
 		}
 		if (chk->holds_key_ref)
 			sctp_auth_key_release(stcb, chk->auth_keyid, SCTP_SO_LOCKED);
 		sctp_free_remote_addr(chk->whoTo);
 		SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_chunk), chk);
 		SCTP_DECR_CHK_COUNT();
 		/* sa_ignore FREED_MEMORY */
 	}
 	/* ASCONF queue MAY not be empty */
 	TAILQ_FOREACH_SAFE(chk, &asoc->asconf_send_queue, sctp_next, nchk) {
 		TAILQ_REMOVE(&asoc->asconf_send_queue, chk, sctp_next);
 		if (chk->data) {
 			sctp_m_freem(chk->data);
 			chk->data = NULL;
 		}
 		if (chk->holds_key_ref)
 			sctp_auth_key_release(stcb, chk->auth_keyid, SCTP_SO_LOCKED);
 		sctp_free_remote_addr(chk->whoTo);
 		SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_chunk), chk);
 		SCTP_DECR_CHK_COUNT();
 		/* sa_ignore FREED_MEMORY */
 	}
 	if (asoc->mapping_array) {
 		SCTP_FREE(asoc->mapping_array, SCTP_M_MAP);
 		asoc->mapping_array = NULL;
 	}
 	if (asoc->nr_mapping_array) {
 		SCTP_FREE(asoc->nr_mapping_array, SCTP_M_MAP);
 		asoc->nr_mapping_array = NULL;
 	}
 	/* the stream outs */
 	if (asoc->strmout) {
 		SCTP_FREE(asoc->strmout, SCTP_M_STRMO);
 		asoc->strmout = NULL;
 	}
 	asoc->strm_realoutsize = asoc->streamoutcnt = 0;
 	if (asoc->strmin) {
 		for (i = 0; i < asoc->streamincnt; i++) {
 			sctp_clean_up_stream(stcb, &asoc->strmin[i].inqueue);
 			sctp_clean_up_stream(stcb, &asoc->strmin[i].uno_inqueue);
 		}
 		SCTP_FREE(asoc->strmin, SCTP_M_STRMI);
 		asoc->strmin = NULL;
 	}
 	asoc->streamincnt = 0;
 	TAILQ_FOREACH_SAFE(net, &asoc->nets, sctp_next, nnet) {
 #ifdef INVARIANTS
 		if (SCTP_BASE_INFO(ipi_count_raddr) == 0) {
 			panic("no net's left alloc'ed, or list points to itself");
 		}
 #endif
 		TAILQ_REMOVE(&asoc->nets, net, sctp_next);
 		sctp_free_remote_addr(net);
 	}
 	LIST_FOREACH_SAFE(laddr, &asoc->sctp_restricted_addrs, sctp_nxt_addr, naddr) {
 		/* sa_ignore FREED_MEMORY */
 		sctp_remove_laddr(laddr);
 	}
 
 	/* pending asconf (address) parameters */
 	TAILQ_FOREACH_SAFE(aparam, &asoc->asconf_queue, next, naparam) {
 		/* sa_ignore FREED_MEMORY */
 		TAILQ_REMOVE(&asoc->asconf_queue, aparam, next);
 		SCTP_FREE(aparam, SCTP_M_ASC_ADDR);
 	}
 	TAILQ_FOREACH_SAFE(aack, &asoc->asconf_ack_sent, next, naack) {
 		/* sa_ignore FREED_MEMORY */
 		TAILQ_REMOVE(&asoc->asconf_ack_sent, aack, next);
 		if (aack->data != NULL) {
 			sctp_m_freem(aack->data);
 		}
 		SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_asconf_ack), aack);
 	}
 	/* clean up auth stuff */
 	if (asoc->local_hmacs)
 		sctp_free_hmaclist(asoc->local_hmacs);
 	if (asoc->peer_hmacs)
 		sctp_free_hmaclist(asoc->peer_hmacs);
 
 	if (asoc->local_auth_chunks)
 		sctp_free_chunklist(asoc->local_auth_chunks);
 	if (asoc->peer_auth_chunks)
 		sctp_free_chunklist(asoc->peer_auth_chunks);
 
 	sctp_free_authinfo(&asoc->authinfo);
 
 	LIST_FOREACH_SAFE(shared_key, &asoc->shared_keys, next, nshared_key) {
 		LIST_REMOVE(shared_key, next);
 		sctp_free_sharedkey(shared_key);
 		/* sa_ignore FREED_MEMORY */
 	}
 
 	/* Insert new items here :> */
 
 	/* Get rid of LOCK */
 	SCTP_TCB_UNLOCK(stcb);
 	SCTP_TCB_LOCK_DESTROY(stcb);
 	SCTP_TCB_SEND_LOCK_DESTROY(stcb);
 	if (from_inpcbfree == SCTP_NORMAL_PROC) {
 		SCTP_INP_INFO_WUNLOCK();
 		SCTP_INP_RLOCK(inp);
 	}
 #ifdef SCTP_TRACK_FREED_ASOCS
 	if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE) {
 		/* now clean up the tasoc itself */
 		SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_asoc), stcb);
 		SCTP_DECR_ASOC_COUNT();
 	} else {
 		LIST_INSERT_HEAD(&inp->sctp_asoc_free_list, stcb, sctp_tcblist);
 	}
 #else
 	SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_asoc), stcb);
 	SCTP_DECR_ASOC_COUNT();
 #endif
 	if (from_inpcbfree == SCTP_NORMAL_PROC) {
 		if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE) {
 			/*
 			 * If it's NOT the inp_free calling us AND sctp_close
 			 * has been called, we call back...
 			 */
 			SCTP_INP_RUNLOCK(inp);
 			/*
 			 * This will start the kill timer (if we are the
 			 * last one) since we hold an increment yet. But
 			 * this is the only safe way to do this since
 			 * otherwise if the socket closes at the same time
 			 * we are here we might collide in the cleanup.
 			 */
 			sctp_inpcb_free(inp,
 			    SCTP_FREE_SHOULD_USE_GRACEFUL_CLOSE,
 			    SCTP_CALLED_DIRECTLY_NOCMPSET);
 			SCTP_INP_DECR_REF(inp);
 			goto out_of;
 		} else {
 			/* The socket is still open. */
 			SCTP_INP_DECR_REF(inp);
 		}
 	}
 	if (from_inpcbfree == SCTP_NORMAL_PROC) {
 		SCTP_INP_RUNLOCK(inp);
 	}
 out_of:
 	/* destroyed the asoc */
 #ifdef SCTP_LOG_CLOSING
 	sctp_log_closing(inp, NULL, 11);
 #endif
 	return (1);
 }
 
 
 
 /*
  * determine if a destination is "reachable" based upon the addresses bound
  * to the current endpoint (e.g. only v4 or v6 currently bound)
  */
 /*
  * FIX: if we allow assoc-level bindx(), then this needs to be fixed to use
  * assoc level v4/v6 flags, as the assoc *may* not have the same address
  * types bound as its endpoint
  */
 int
 sctp_destination_is_reachable(struct sctp_tcb *stcb, struct sockaddr *destaddr)
 {
 	struct sctp_inpcb *inp;
 	int answer;
 
 	/*
 	 * No locks here, the TCB, in all cases is already locked and an
 	 * assoc is up. There is either an INP lock by the caller applied (in
 	 * asconf case when deleting an address) or NOT in the HB case,
 	 * however if HB then the INP increment is up and the INP will not
 	 * be removed (on top of the fact that we have a TCB lock). So we
 	 * only want to read the sctp_flags, which is either bound-all or
 	 * not.. no protection needed since once an assoc is up you can't be
 	 * changing your binding.
 	 */
 	inp = stcb->sctp_ep;
 	if (inp->sctp_flags & SCTP_PCB_FLAGS_BOUNDALL) {
 		/* if bound all, destination is not restricted */
 		/*
 		 * RRS: Question during lock work: Is this correct? If you
 		 * are bound-all you still might need to obey the V4--V6
 		 * flags??? IMO this bound-all stuff needs to be removed!
 		 */
 		return (1);
 	}
 	/* NOTE: all "scope" checks are done when local addresses are added */
 	switch (destaddr->sa_family) {
 #ifdef INET6
 	case AF_INET6:
 		answer = inp->ip_inp.inp.inp_vflag & INP_IPV6;
 		break;
 #endif
 #ifdef INET
 	case AF_INET:
 		answer = inp->ip_inp.inp.inp_vflag & INP_IPV4;
 		break;
 #endif
 	default:
 		/* invalid family, so it's unreachable */
 		answer = 0;
 		break;
 	}
 	return (answer);
 }
 
 /*
  * update the inp_vflags on an endpoint
  */
 static void
 sctp_update_ep_vflag(struct sctp_inpcb *inp)
 {
 	struct sctp_laddr *laddr;
 
 	/* first clear the flag */
 	inp->ip_inp.inp.inp_vflag = 0;
 	/* set the flag based on addresses on the ep list */
 	LIST_FOREACH(laddr, &inp->sctp_addr_list, sctp_nxt_addr) {
 		if (laddr->ifa == NULL) {
 			SCTPDBG(SCTP_DEBUG_PCB1, "%s: NULL ifa\n",
 			    __func__);
 			continue;
 		}
 		if (laddr->ifa->localifa_flags & SCTP_BEING_DELETED) {
 			continue;
 		}
 		switch (laddr->ifa->address.sa.sa_family) {
 #ifdef INET6
 		case AF_INET6:
 			inp->ip_inp.inp.inp_vflag |= INP_IPV6;
 			break;
 #endif
 #ifdef INET
 		case AF_INET:
 			inp->ip_inp.inp.inp_vflag |= INP_IPV4;
 			break;
 #endif
 		default:
 			break;
 		}
 	}
 }
 
 /*
  * Add the address to the endpoint local address list. There is nothing to be
  * done if we are bound to all addresses
  */
 void
 sctp_add_local_addr_ep(struct sctp_inpcb *inp, struct sctp_ifa *ifa, uint32_t action)
 {
 	struct sctp_laddr *laddr;
 	struct sctp_tcb *stcb;
 	int fnd, error = 0;
 
 	fnd = 0;
 
 	if (inp->sctp_flags & SCTP_PCB_FLAGS_BOUNDALL) {
 		/* You are already bound to all. You have it already */
 		return;
 	}
 #ifdef INET6
 	if (ifa->address.sa.sa_family == AF_INET6) {
 		if (ifa->localifa_flags & SCTP_ADDR_IFA_UNUSEABLE) {
 			/* Can't bind a non-useable addr. */
 			return;
 		}
 	}
 #endif
 	/* first, is it already present? */
 	LIST_FOREACH(laddr, &inp->sctp_addr_list, sctp_nxt_addr) {
 		if (laddr->ifa == ifa) {
 			fnd = 1;
 			break;
 		}
 	}
 
 	if (fnd == 0) {
 		/* Not in the ep list */
 		error = sctp_insert_laddr(&inp->sctp_addr_list, ifa, action);
 		if (error != 0)
 			return;
 		inp->laddr_count++;
 		/* update inp_vflag flags */
 		switch (ifa->address.sa.sa_family) {
 #ifdef INET6
 		case AF_INET6:
 			inp->ip_inp.inp.inp_vflag |= INP_IPV6;
 			break;
 #endif
 #ifdef INET
 		case AF_INET:
 			inp->ip_inp.inp.inp_vflag |= INP_IPV4;
 			break;
 #endif
 		default:
 			break;
 		}
 		LIST_FOREACH(stcb, &inp->sctp_asoc_list, sctp_tcblist) {
 			sctp_add_local_addr_restricted(stcb, ifa);
 		}
 	}
 	return;
 }
 
 
 /*
  * select a new (hopefully reachable) destination net (should only be used
  * when we deleted an ep addr that is the only usable source address to reach
  * the destination net)
  */
 static void
 sctp_select_primary_destination(struct sctp_tcb *stcb)
 {
 	struct sctp_nets *net;
 
 	TAILQ_FOREACH(net, &stcb->asoc.nets, sctp_next) {
 		/* for now, we'll just pick the first reachable one we find */
 		if (net->dest_state & SCTP_ADDR_UNCONFIRMED)
 			continue;
 		if (sctp_destination_is_reachable(stcb,
 		    (struct sockaddr *)&net->ro._l_addr)) {
 			/* found a reachable destination */
 			stcb->asoc.primary_destination = net;
 		}
 	}
 	/* I can't get there from here! ...we're gonna die shortly... */
 }
 
 
 /*
  * Delete the address from the endpoint local address list. There is nothing
  * to be done if we are bound to all addresses
  */
 void
 sctp_del_local_addr_ep(struct sctp_inpcb *inp, struct sctp_ifa *ifa)
 {
 	struct sctp_laddr *laddr;
 	int fnd;
 
 	fnd = 0;
 	if (inp->sctp_flags & SCTP_PCB_FLAGS_BOUNDALL) {
 		/* Bound to all addresses, so there is no list entry to delete */
 		return;
 	}
 	LIST_FOREACH(laddr, &inp->sctp_addr_list, sctp_nxt_addr) {
 		if (laddr->ifa == ifa) {
 			fnd = 1;
 			break;
 		}
 	}
 	if (fnd && (inp->laddr_count < 2)) {
 		/* can't delete unless there are at LEAST 2 addresses */
 		return;
 	}
 	if (fnd) {
 		/*
 		 * Clean up any use of this address: go through our
 		 * associations and clear any last_used_address that
 		 * matches this one. For each assoc, see if a new
 		 * primary_destination is needed.
 		 */
 		struct sctp_tcb *stcb;
 
 		/* clean up "next_addr_touse" */
 		if (inp->next_addr_touse == laddr)
 			/* delete this address */
 			inp->next_addr_touse = NULL;
 
 		/* clean up "last_used_address" */
 		LIST_FOREACH(stcb, &inp->sctp_asoc_list, sctp_tcblist) {
 			struct sctp_nets *net;
 
 			SCTP_TCB_LOCK(stcb);
 			if (stcb->asoc.last_used_address == laddr)
 				/* delete this address */
 				stcb->asoc.last_used_address = NULL;
 			/*
 			 * Now spin through all the nets and purge any ref
 			 * to laddr
 			 */
 			TAILQ_FOREACH(net, &stcb->asoc.nets, sctp_next) {
 				if (net->ro._s_addr == laddr->ifa) {
 					/* Yep, purge src address selected */
 					sctp_rtentry_t *rt;
 
 					/* delete this address if cached */
 					rt = net->ro.ro_rt;
 					if (rt != NULL) {
 						RTFREE(rt);
 						net->ro.ro_rt = NULL;
 					}
 					sctp_free_ifa(net->ro._s_addr);
 					net->ro._s_addr = NULL;
 					net->src_addr_selected = 0;
 				}
 			}
 			SCTP_TCB_UNLOCK(stcb);
 		}		/* for each tcb */
 		/* remove it from the ep list */
 		sctp_remove_laddr(laddr);
 		inp->laddr_count--;
 		/* update inp_vflag flags */
 		sctp_update_ep_vflag(inp);
 	}
 	return;
 }
 
 /*
  * Add the address to the TCB local address restricted list.
  * This is a "pending" address list (e.g. addresses waiting for an
  * ASCONF-ACK response) and cannot be used as a valid source address.
  */
 void
 sctp_add_local_addr_restricted(struct sctp_tcb *stcb, struct sctp_ifa *ifa)
 {
 	struct sctp_laddr *laddr;
 	struct sctpladdr *list;
 
 	/*
 	 * Assumes TCB is locked.. and possibly the INP. May need to
 	 * confirm/fix that if we need it and is not the case.
 	 */
 	list = &stcb->asoc.sctp_restricted_addrs;
 
 #ifdef INET6
 	if (ifa->address.sa.sa_family == AF_INET6) {
 		if (ifa->localifa_flags & SCTP_ADDR_IFA_UNUSEABLE) {
 			/* Can't bind a non-existent addr. */
 			return;
 		}
 	}
 #endif
 	/* does the address already exist? */
 	LIST_FOREACH(laddr, list, sctp_nxt_addr) {
 		if (laddr->ifa == ifa) {
 			return;
 		}
 	}
 
 	/* add to the list */
 	(void)sctp_insert_laddr(list, ifa, 0);
 	return;
 }
 
 /*
  * Remove a local address from the TCB local address restricted list
  */
 void
 sctp_del_local_addr_restricted(struct sctp_tcb *stcb, struct sctp_ifa *ifa)
 {
 	struct sctp_inpcb *inp;
 	struct sctp_laddr *laddr;
 
 	/*
 	 * This is called by asconf work. It is assumed that a) The TCB is
 	 * locked and b) The INP is locked. This is true in as much as I can
 	 * trace through the entry asconf code where I did these locks.
 	 * Again, the ASCONF code is a bit different in that it does lock
 	 * the INP during its work often times. This must be since we don't
 	 * want other proc's looking up things while what they are looking
 	 * up is changing :-D
 	 */
 
 	inp = stcb->sctp_ep;
 	/* if subset bound and don't allow ASCONF's, can't delete last */
 	if (((inp->sctp_flags & SCTP_PCB_FLAGS_BOUNDALL) == 0) &&
 	    sctp_is_feature_off(inp, SCTP_PCB_FLAGS_DO_ASCONF)) {
 		if (stcb->sctp_ep->laddr_count < 2) {
 			/* can't delete last address */
 			return;
 		}
 	}
 	LIST_FOREACH(laddr, &stcb->asoc.sctp_restricted_addrs, sctp_nxt_addr) {
 		/* remove the address if it exists */
 		if (laddr->ifa == NULL)
 			continue;
 		if (laddr->ifa == ifa) {
 			sctp_remove_laddr(laddr);
 			return;
 		}
 	}
 
 	/* address not found! */
 	return;
 }
 
 /*
  * Temporarily remove for __APPLE__ until we use the Tiger equivalents
  */
 /* sysctl */
 static int sctp_max_number_of_assoc = SCTP_MAX_NUM_OF_ASOC;
 static int sctp_scale_up_for_address = SCTP_SCALE_FOR_ADDR;
 
 
 
 #if defined(__FreeBSD__) && defined(SCTP_MCORE_INPUT) && defined(SMP)
 struct sctp_mcore_ctrl *sctp_mcore_workers = NULL;
 int *sctp_cpuarry = NULL;
 void
 sctp_queue_to_mcore(struct mbuf *m, int off, int cpu_to_use)
 {
 	/* Queue a packet to a processor for the specified core */
 	struct sctp_mcore_queue *qent;
 	struct sctp_mcore_ctrl *wkq;
 	int need_wake = 0;
 
 	if (sctp_mcore_workers == NULL) {
 		/* Something went way bad during setup */
 		sctp_input_with_port(m, off, 0);
 		return;
 	}
 	SCTP_MALLOC(qent, struct sctp_mcore_queue *,
 	    (sizeof(struct sctp_mcore_queue)),
 	    SCTP_M_MCORE);
 	if (qent == NULL) {
 		/* This is trouble  */
 		sctp_input_with_port(m, off, 0);
 		return;
 	}
 	qent->vn = curvnet;
 	qent->m = m;
 	qent->off = off;
 	qent->v6 = 0;
 	wkq = &sctp_mcore_workers[cpu_to_use];
 	SCTP_MCORE_QLOCK(wkq);
 
 	TAILQ_INSERT_TAIL(&wkq->que, qent, next);
 	if (wkq->running == 0) {
 		need_wake = 1;
 	}
 	SCTP_MCORE_QUNLOCK(wkq);
 	if (need_wake) {
 		wakeup(&wkq->running);
 	}
 }
 
 static void
 sctp_mcore_thread(void *arg)
 {
 
 	struct sctp_mcore_ctrl *wkq;
 	struct sctp_mcore_queue *qent;
 
 	wkq = (struct sctp_mcore_ctrl *)arg;
 	struct mbuf *m;
 	int off, v6;
 
 	/* Wait for first tickle */
 	SCTP_MCORE_LOCK(wkq);
 	wkq->running = 0;
 	msleep(&wkq->running,
 	    &wkq->core_mtx,
 	    0, "wait for pkt", 0);
 	SCTP_MCORE_UNLOCK(wkq);
 
 	/* Bind to our cpu */
 	thread_lock(curthread);
 	sched_bind(curthread, wkq->cpuid);
 	thread_unlock(curthread);
 
 	/* Now let's start working */
 	SCTP_MCORE_LOCK(wkq);
 	/* Now grab lock and go */
 	for (;;) {
 		SCTP_MCORE_QLOCK(wkq);
 skip_sleep:
 		wkq->running = 1;
 		qent = TAILQ_FIRST(&wkq->que);
 		if (qent) {
 			TAILQ_REMOVE(&wkq->que, qent, next);
 			SCTP_MCORE_QUNLOCK(wkq);
 			CURVNET_SET(qent->vn);
 			m = qent->m;
 			off = qent->off;
 			v6 = qent->v6;
 			SCTP_FREE(qent, SCTP_M_MCORE);
 			if (v6 == 0) {
 				sctp_input_with_port(m, off, 0);
 			} else {
 				SCTP_PRINTF("V6 not yet supported\n");
 				sctp_m_freem(m);
 			}
 			CURVNET_RESTORE();
 			SCTP_MCORE_QLOCK(wkq);
 		}
 		wkq->running = 0;
 		if (!TAILQ_EMPTY(&wkq->que)) {
 			goto skip_sleep;
 		}
 		SCTP_MCORE_QUNLOCK(wkq);
 		msleep(&wkq->running,
 		    &wkq->core_mtx,
 		    0, "wait for pkt", 0);
 	}
 }
 
 static void
 sctp_startup_mcore_threads(void)
 {
 	int i, cpu;
 
 	if (mp_ncpus == 1)
 		return;
 
 	if (sctp_mcore_workers != NULL) {
 		/*
 		 * Already been here in some previous vnet?
 		 */
 		return;
 	}
 	SCTP_MALLOC(sctp_mcore_workers, struct sctp_mcore_ctrl *,
 	    ((mp_maxid + 1) * sizeof(struct sctp_mcore_ctrl)),
 	    SCTP_M_MCORE);
 	if (sctp_mcore_workers == NULL) {
 		/* TSNH I hope */
 		return;
 	}
 	memset(sctp_mcore_workers, 0, ((mp_maxid + 1) *
 	    sizeof(struct sctp_mcore_ctrl)));
 	/* Init the structures */
 	for (i = 0; i <= mp_maxid; i++) {
 		TAILQ_INIT(&sctp_mcore_workers[i].que);
 		SCTP_MCORE_LOCK_INIT(&sctp_mcore_workers[i]);
 		SCTP_MCORE_QLOCK_INIT(&sctp_mcore_workers[i]);
 		sctp_mcore_workers[i].cpuid = i;
 	}
 	if (sctp_cpuarry == NULL) {
 		SCTP_MALLOC(sctp_cpuarry, int *,
 		    (mp_ncpus * sizeof(int)),
 		    SCTP_M_MCORE);
 		i = 0;
 		CPU_FOREACH(cpu) {
 			sctp_cpuarry[i] = cpu;
 			i++;
 		}
 	}
 	/* Now start them all */
 	CPU_FOREACH(cpu) {
 		(void)kproc_create(sctp_mcore_thread,
 		    (void *)&sctp_mcore_workers[cpu],
 		    &sctp_mcore_workers[cpu].thread_proc,
 		    RFPROC,
 		    SCTP_KTHREAD_PAGES,
 		    SCTP_MCORE_NAME);
 
 	}
 }
 
 #endif
 
 void
 sctp_pcb_init()
 {
 	/*
 	 * SCTP initialization for the PCB structures should be called by
 	 * the sctp_init() function.
 	 */
 	int i;
 	struct timeval tv;
 
 	if (SCTP_BASE_VAR(sctp_pcb_initialized) != 0) {
 		/* error I was called twice */
 		return;
 	}
 	SCTP_BASE_VAR(sctp_pcb_initialized) = 1;
 
 #if defined(SCTP_LOCAL_TRACE_BUF)
 	bzero(&SCTP_BASE_SYSCTL(sctp_log), sizeof(struct sctp_log));
 #endif
 #if defined(__FreeBSD__) && defined(SMP) && defined(SCTP_USE_PERCPU_STAT)
 	SCTP_MALLOC(SCTP_BASE_STATS, struct sctpstat *,
 	    ((mp_maxid + 1) * sizeof(struct sctpstat)),
 	    SCTP_M_MCORE);
 #endif
 	(void)SCTP_GETTIME_TIMEVAL(&tv);
 #if defined(__FreeBSD__) && defined(SMP) && defined(SCTP_USE_PERCPU_STAT)
 	bzero(SCTP_BASE_STATS, (sizeof(struct sctpstat) * (mp_maxid + 1)));
 	SCTP_BASE_STATS[PCPU_GET(cpuid)].sctps_discontinuitytime.tv_sec = (uint32_t) tv.tv_sec;
 	SCTP_BASE_STATS[PCPU_GET(cpuid)].sctps_discontinuitytime.tv_usec = (uint32_t) tv.tv_usec;
 #else
 	bzero(&SCTP_BASE_STATS, sizeof(struct sctpstat));
 	SCTP_BASE_STAT(sctps_discontinuitytime).tv_sec = (uint32_t) tv.tv_sec;
 	SCTP_BASE_STAT(sctps_discontinuitytime).tv_usec = (uint32_t) tv.tv_usec;
 #endif
 	/* init the empty list of (All) Endpoints */
 	LIST_INIT(&SCTP_BASE_INFO(listhead));
 
 
 	/* init the hash table of endpoints */
 	TUNABLE_INT_FETCH("net.inet.sctp.tcbhashsize", &SCTP_BASE_SYSCTL(sctp_hashtblsize));
 	TUNABLE_INT_FETCH("net.inet.sctp.pcbhashsize", &SCTP_BASE_SYSCTL(sctp_pcbtblsize));
 	TUNABLE_INT_FETCH("net.inet.sctp.chunkscale", &SCTP_BASE_SYSCTL(sctp_chunkscale));
 	SCTP_BASE_INFO(sctp_asochash) = SCTP_HASH_INIT((SCTP_BASE_SYSCTL(sctp_hashtblsize) * 31),
 	    &SCTP_BASE_INFO(hashasocmark));
 	SCTP_BASE_INFO(sctp_ephash) = SCTP_HASH_INIT(SCTP_BASE_SYSCTL(sctp_hashtblsize),
 	    &SCTP_BASE_INFO(hashmark));
 	SCTP_BASE_INFO(sctp_tcpephash) = SCTP_HASH_INIT(SCTP_BASE_SYSCTL(sctp_hashtblsize),
 	    &SCTP_BASE_INFO(hashtcpmark));
 	SCTP_BASE_INFO(hashtblsize) = SCTP_BASE_SYSCTL(sctp_hashtblsize);
 
 
 	SCTP_BASE_INFO(sctp_vrfhash) = SCTP_HASH_INIT(SCTP_SIZE_OF_VRF_HASH,
 	    &SCTP_BASE_INFO(hashvrfmark));
 
 	SCTP_BASE_INFO(vrf_ifn_hash) = SCTP_HASH_INIT(SCTP_VRF_IFN_HASH_SIZE,
 	    &SCTP_BASE_INFO(vrf_ifn_hashmark));
 	/* init the zones */
 	/*
 	 * FIX ME: Should check for NULL returns, but if it does fail we are
 	 * doomed to panic anyways... add later maybe.
 	 */
 	SCTP_ZONE_INIT(SCTP_BASE_INFO(ipi_zone_ep), "sctp_ep",
 	    sizeof(struct sctp_inpcb), maxsockets);
 
 	SCTP_ZONE_INIT(SCTP_BASE_INFO(ipi_zone_asoc), "sctp_asoc",
 	    sizeof(struct sctp_tcb), sctp_max_number_of_assoc);
 
 	SCTP_ZONE_INIT(SCTP_BASE_INFO(ipi_zone_laddr), "sctp_laddr",
 	    sizeof(struct sctp_laddr),
 	    (sctp_max_number_of_assoc * sctp_scale_up_for_address));
 
 	SCTP_ZONE_INIT(SCTP_BASE_INFO(ipi_zone_net), "sctp_raddr",
 	    sizeof(struct sctp_nets),
 	    (sctp_max_number_of_assoc * sctp_scale_up_for_address));
 
 	SCTP_ZONE_INIT(SCTP_BASE_INFO(ipi_zone_chunk), "sctp_chunk",
 	    sizeof(struct sctp_tmit_chunk),
 	    (sctp_max_number_of_assoc * SCTP_BASE_SYSCTL(sctp_chunkscale)));
 
 	SCTP_ZONE_INIT(SCTP_BASE_INFO(ipi_zone_readq), "sctp_readq",
 	    sizeof(struct sctp_queued_to_read),
 	    (sctp_max_number_of_assoc * SCTP_BASE_SYSCTL(sctp_chunkscale)));
 
 	SCTP_ZONE_INIT(SCTP_BASE_INFO(ipi_zone_strmoq), "sctp_stream_msg_out",
 	    sizeof(struct sctp_stream_queue_pending),
 	    (sctp_max_number_of_assoc * SCTP_BASE_SYSCTL(sctp_chunkscale)));
 
 	SCTP_ZONE_INIT(SCTP_BASE_INFO(ipi_zone_asconf), "sctp_asconf",
 	    sizeof(struct sctp_asconf),
 	    (sctp_max_number_of_assoc * SCTP_BASE_SYSCTL(sctp_chunkscale)));
 
 	SCTP_ZONE_INIT(SCTP_BASE_INFO(ipi_zone_asconf_ack), "sctp_asconf_ack",
 	    sizeof(struct sctp_asconf_ack),
 	    (sctp_max_number_of_assoc * SCTP_BASE_SYSCTL(sctp_chunkscale)));
 
 
 	/* Master Lock INIT for info structure */
 	SCTP_INP_INFO_LOCK_INIT();
 	SCTP_STATLOG_INIT_LOCK();
 
 	SCTP_IPI_COUNT_INIT();
 	SCTP_IPI_ADDR_INIT();
 #ifdef SCTP_PACKET_LOGGING
 	SCTP_IP_PKTLOG_INIT();
 #endif
 	LIST_INIT(&SCTP_BASE_INFO(addr_wq));
 
 	SCTP_WQ_ADDR_INIT();
 	/* not sure if we need all the counts */
 	SCTP_BASE_INFO(ipi_count_ep) = 0;
 	/* assoc/tcb zone info */
 	SCTP_BASE_INFO(ipi_count_asoc) = 0;
 	/* local addrlist zone info */
 	SCTP_BASE_INFO(ipi_count_laddr) = 0;
 	/* remote addrlist zone info */
 	SCTP_BASE_INFO(ipi_count_raddr) = 0;
 	/* chunk info */
 	SCTP_BASE_INFO(ipi_count_chunk) = 0;
 
 	/* socket queue zone info */
 	SCTP_BASE_INFO(ipi_count_readq) = 0;
 
 	/* stream out queue count */
 	SCTP_BASE_INFO(ipi_count_strmoq) = 0;
 
 	SCTP_BASE_INFO(ipi_free_strmoq) = 0;
 	SCTP_BASE_INFO(ipi_free_chunks) = 0;
 
 	SCTP_OS_TIMER_INIT(&SCTP_BASE_INFO(addr_wq_timer.timer));
 
 	/* Init the TIMEWAIT list */
 	for (i = 0; i < SCTP_STACK_VTAG_HASH_SIZE; i++) {
 		LIST_INIT(&SCTP_BASE_INFO(vtag_timewait)[i]);
 	}
 	sctp_startup_iterator();
 
 #if defined(__FreeBSD__) && defined(SCTP_MCORE_INPUT) && defined(SMP)
 	sctp_startup_mcore_threads();
 #endif
 
 	/*
 	 * INIT the default VRF which for BSD is the only one, other O/S's
 	 * may have more. But initially they must start with one and then
 	 * add the VRF's as addresses are added.
 	 */
 	sctp_init_vrf_list(SCTP_DEFAULT_VRF);
 }
 
 /*
  * Assumes that the SCTP_BASE_INFO() lock is NOT held.
  */
 void
 sctp_pcb_finish(void)
 {
 	struct sctp_vrflist *vrf_bucket;
 	struct sctp_vrf *vrf, *nvrf;
 	struct sctp_ifn *ifn, *nifn;
 	struct sctp_ifa *ifa, *nifa;
 	struct sctpvtaghead *chain;
 	struct sctp_tagblock *twait_block, *prev_twait_block;
 	struct sctp_laddr *wi, *nwi;
 	int i;
 	struct sctp_iterator *it, *nit;
 
 	if (SCTP_BASE_VAR(sctp_pcb_initialized) == 0) {
 		SCTP_PRINTF("%s: race condition on teardown.\n", __func__);
 		return;
 	}
 	SCTP_BASE_VAR(sctp_pcb_initialized) = 0;
 	/*
 	 * In FreeBSD the iterator thread never exits but we do clean up.
 	 * The only way FreeBSD reaches here is if we have VRF's but we
 	 * still add the ifdef to make it compile on old versions.
 	 */
 retry:
 	SCTP_IPI_ITERATOR_WQ_LOCK();
 	/*
 	 * sctp_iterator_worker() might be working on an it entry without
 	 * holding the lock.  We won't find it on the list either and
 	 * continue and free/destroy it.  While holding the lock, spin, to
 	 * avoid the race condition as sctp_iterator_worker() will have to
 	 * wait to re-acquire the lock.
 	 */
 	if (sctp_it_ctl.iterator_running != 0 || sctp_it_ctl.cur_it != NULL) {
 		SCTP_IPI_ITERATOR_WQ_UNLOCK();
 		SCTP_PRINTF("%s: Iterator running while we held the lock. Retry. "
 		    "cur_it=%p\n", __func__, sctp_it_ctl.cur_it);
 		DELAY(10);
 		goto retry;
 	}
 	TAILQ_FOREACH_SAFE(it, &sctp_it_ctl.iteratorhead, sctp_nxt_itr, nit) {
 		if (it->vn != curvnet) {
 			continue;
 		}
 		TAILQ_REMOVE(&sctp_it_ctl.iteratorhead, it, sctp_nxt_itr);
 		if (it->function_atend != NULL) {
 			(*it->function_atend) (it->pointer, it->val);
 		}
 		SCTP_FREE(it, SCTP_M_ITER);
 	}
 	SCTP_IPI_ITERATOR_WQ_UNLOCK();
 	SCTP_ITERATOR_LOCK();
 	if ((sctp_it_ctl.cur_it) &&
 	    (sctp_it_ctl.cur_it->vn == curvnet)) {
 		sctp_it_ctl.iterator_flags |= SCTP_ITERATOR_STOP_CUR_IT;
 	}
 	SCTP_ITERATOR_UNLOCK();
 	SCTP_OS_TIMER_STOP_DRAIN(&SCTP_BASE_INFO(addr_wq_timer.timer));
 	SCTP_WQ_ADDR_LOCK();
 	LIST_FOREACH_SAFE(wi, &SCTP_BASE_INFO(addr_wq), sctp_nxt_addr, nwi) {
 		LIST_REMOVE(wi, sctp_nxt_addr);
 		SCTP_DECR_LADDR_COUNT();
 		if (wi->action == SCTP_DEL_IP_ADDRESS) {
 			SCTP_FREE(wi->ifa, SCTP_M_IFA);
 		}
 		SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_laddr), wi);
 	}
 	SCTP_WQ_ADDR_UNLOCK();
 
 	/*
 	 * free the vrf/ifn/ifa lists and hashes (be sure address monitor is
 	 * destroyed first).
 	 */
 	vrf_bucket = &SCTP_BASE_INFO(sctp_vrfhash)[(SCTP_DEFAULT_VRFID & SCTP_BASE_INFO(hashvrfmark))];
 	LIST_FOREACH_SAFE(vrf, vrf_bucket, next_vrf, nvrf) {
 		LIST_FOREACH_SAFE(ifn, &vrf->ifnlist, next_ifn, nifn) {
 			LIST_FOREACH_SAFE(ifa, &ifn->ifalist, next_ifa, nifa) {
 				/* free the ifa */
 				LIST_REMOVE(ifa, next_bucket);
 				LIST_REMOVE(ifa, next_ifa);
 				SCTP_FREE(ifa, SCTP_M_IFA);
 			}
 			/* free the ifn */
 			LIST_REMOVE(ifn, next_bucket);
 			LIST_REMOVE(ifn, next_ifn);
 			SCTP_FREE(ifn, SCTP_M_IFN);
 		}
 		SCTP_HASH_FREE(vrf->vrf_addr_hash, vrf->vrf_addr_hashmark);
 		/* free the vrf */
 		LIST_REMOVE(vrf, next_vrf);
 		SCTP_FREE(vrf, SCTP_M_VRF);
 	}
 	/* free the vrf hashes */
 	SCTP_HASH_FREE(SCTP_BASE_INFO(sctp_vrfhash), SCTP_BASE_INFO(hashvrfmark));
 	SCTP_HASH_FREE(SCTP_BASE_INFO(vrf_ifn_hash), SCTP_BASE_INFO(vrf_ifn_hashmark));
 
 	/*
 	 * free the TIMEWAIT list elements malloc'd in the function
 	 * sctp_add_vtag_to_timewait()...
 	 */
 	for (i = 0; i < SCTP_STACK_VTAG_HASH_SIZE; i++) {
 		chain = &SCTP_BASE_INFO(vtag_timewait)[i];
 		if (!LIST_EMPTY(chain)) {
 			prev_twait_block = NULL;
 			LIST_FOREACH(twait_block, chain, sctp_nxt_tagblock) {
 				if (prev_twait_block) {
 					SCTP_FREE(prev_twait_block, SCTP_M_TIMW);
 				}
 				prev_twait_block = twait_block;
 			}
 			SCTP_FREE(prev_twait_block, SCTP_M_TIMW);
 		}
 	}
 
 	/* free the locks and mutexes */
 #ifdef SCTP_PACKET_LOGGING
 	SCTP_IP_PKTLOG_DESTROY();
 #endif
 	SCTP_IPI_ADDR_DESTROY();
 	SCTP_STATLOG_DESTROY();
 	SCTP_INP_INFO_LOCK_DESTROY();
 
 	SCTP_WQ_ADDR_DESTROY();
 
 	/* Get rid of other stuff too. */
 	if (SCTP_BASE_INFO(sctp_asochash) != NULL)
 		SCTP_HASH_FREE(SCTP_BASE_INFO(sctp_asochash), SCTP_BASE_INFO(hashasocmark));
 	if (SCTP_BASE_INFO(sctp_ephash) != NULL)
 		SCTP_HASH_FREE(SCTP_BASE_INFO(sctp_ephash), SCTP_BASE_INFO(hashmark));
 	if (SCTP_BASE_INFO(sctp_tcpephash) != NULL)
 		SCTP_HASH_FREE(SCTP_BASE_INFO(sctp_tcpephash), SCTP_BASE_INFO(hashtcpmark));
 
 	SCTP_ZONE_DESTROY(SCTP_BASE_INFO(ipi_zone_ep));
 	SCTP_ZONE_DESTROY(SCTP_BASE_INFO(ipi_zone_asoc));
 	SCTP_ZONE_DESTROY(SCTP_BASE_INFO(ipi_zone_laddr));
 	SCTP_ZONE_DESTROY(SCTP_BASE_INFO(ipi_zone_net));
 	SCTP_ZONE_DESTROY(SCTP_BASE_INFO(ipi_zone_chunk));
 	SCTP_ZONE_DESTROY(SCTP_BASE_INFO(ipi_zone_readq));
 	SCTP_ZONE_DESTROY(SCTP_BASE_INFO(ipi_zone_strmoq));
 	SCTP_ZONE_DESTROY(SCTP_BASE_INFO(ipi_zone_asconf));
 	SCTP_ZONE_DESTROY(SCTP_BASE_INFO(ipi_zone_asconf_ack));
 #if defined(__FreeBSD__) && defined(SMP) && defined(SCTP_USE_PERCPU_STAT)
 	SCTP_FREE(SCTP_BASE_STATS, SCTP_M_MCORE);
 #endif
 }
 
 
 int
 sctp_load_addresses_from_init(struct sctp_tcb *stcb, struct mbuf *m,
     int offset, int limit,
     struct sockaddr *src, struct sockaddr *dst,
     struct sockaddr *altsa, uint16_t port)
 {
 	/*
 	 * grub through the INIT pulling addresses and loading them to the
 	 * nets structure in the asoc. The from address in the mbuf should
 	 * also be loaded (if it is not already). This routine can be called
 	 * with either INIT or INIT-ACK's as long as the m points to the IP
 	 * packet and the offset points to the beginning of the parameters.
 	 */
 	struct sctp_inpcb *inp;
 	struct sctp_nets *net, *nnet, *net_tmp;
 	struct sctp_paramhdr *phdr, parm_buf;
 	struct sctp_tcb *stcb_tmp;
 	uint16_t ptype, plen;
 	struct sockaddr *sa;
 	uint8_t random_store[SCTP_PARAM_BUFFER_SIZE];
 	struct sctp_auth_random *p_random = NULL;
 	uint16_t random_len = 0;
 	uint8_t hmacs_store[SCTP_PARAM_BUFFER_SIZE];
 	struct sctp_auth_hmac_algo *hmacs = NULL;
 	uint16_t hmacs_len = 0;
 	uint8_t saw_asconf = 0;
 	uint8_t saw_asconf_ack = 0;
 	uint8_t chunks_store[SCTP_PARAM_BUFFER_SIZE];
 	struct sctp_auth_chunk_list *chunks = NULL;
 	uint16_t num_chunks = 0;
 	sctp_key_t *new_key;
 	uint32_t keylen;
 	int got_random = 0, got_hmacs = 0, got_chklist = 0;
 	uint8_t peer_supports_ecn;
 	uint8_t peer_supports_prsctp;
 	uint8_t peer_supports_auth;
 	uint8_t peer_supports_asconf;
 	uint8_t peer_supports_asconf_ack;
 	uint8_t peer_supports_reconfig;
 	uint8_t peer_supports_nrsack;
 	uint8_t peer_supports_pktdrop;
 	uint8_t peer_supports_idata;
 
 #ifdef INET
 	struct sockaddr_in sin;
 
 #endif
 #ifdef INET6
 	struct sockaddr_in6 sin6;
 
 #endif
 
 	/* First get the destination address setup too. */
 #ifdef INET
 	memset(&sin, 0, sizeof(sin));
 	sin.sin_family = AF_INET;
 	sin.sin_len = sizeof(sin);
 	sin.sin_port = stcb->rport;
 #endif
 #ifdef INET6
 	memset(&sin6, 0, sizeof(sin6));
 	sin6.sin6_family = AF_INET6;
 	sin6.sin6_len = sizeof(struct sockaddr_in6);
 	sin6.sin6_port = stcb->rport;
 #endif
 	if (altsa) {
 		sa = altsa;
 	} else {
 		sa = src;
 	}
 	peer_supports_idata = 0;
 	peer_supports_ecn = 0;
 	peer_supports_prsctp = 0;
 	peer_supports_auth = 0;
 	peer_supports_asconf = 0;
 	peer_supports_reconfig = 0;
 	peer_supports_nrsack = 0;
 	peer_supports_pktdrop = 0;
 	TAILQ_FOREACH(net, &stcb->asoc.nets, sctp_next) {
 		/* mark all addresses that we have currently on the list */
 		net->dest_state |= SCTP_ADDR_NOT_IN_ASSOC;
 	}
 	/* does the source address already exist? if so skip it */
 	inp = stcb->sctp_ep;
 	atomic_add_int(&stcb->asoc.refcnt, 1);
 	stcb_tmp = sctp_findassociation_ep_addr(&inp, sa, &net_tmp, dst, stcb);
 	atomic_add_int(&stcb->asoc.refcnt, -1);
 
 	if ((stcb_tmp == NULL && inp == stcb->sctp_ep) || inp == NULL) {
 		/* we must add the source address */
 		/* no scope set here since we have a tcb already. */
 		switch (sa->sa_family) {
 #ifdef INET
 		case AF_INET:
 			if (stcb->asoc.scope.ipv4_addr_legal) {
 				if (sctp_add_remote_addr(stcb, sa, NULL, port, SCTP_DONOT_SETSCOPE, SCTP_LOAD_ADDR_2)) {
 					return (-1);
 				}
 			}
 			break;
 #endif
 #ifdef INET6
 		case AF_INET6:
 			if (stcb->asoc.scope.ipv6_addr_legal) {
 				if (sctp_add_remote_addr(stcb, sa, NULL, port, SCTP_DONOT_SETSCOPE, SCTP_LOAD_ADDR_3)) {
 					return (-2);
 				}
 			}
 			break;
 #endif
 		default:
 			break;
 		}
 	} else {
 		if (net_tmp != NULL && stcb_tmp == stcb) {
 			net_tmp->dest_state &= ~SCTP_ADDR_NOT_IN_ASSOC;
 		} else if (stcb_tmp != stcb) {
 			/* It belongs to another association? */
 			if (stcb_tmp)
 				SCTP_TCB_UNLOCK(stcb_tmp);
 			return (-3);
 		}
 	}
 	if (stcb->asoc.state == 0) {
 		/* the assoc was freed? */
 		return (-4);
 	}
 	/* now we must go through each of the params. */
 	phdr = sctp_get_next_param(m, offset, &parm_buf, sizeof(parm_buf));
 	while (phdr) {
 		ptype = ntohs(phdr->param_type);
 		plen = ntohs(phdr->param_length);
 		/*
 		 * SCTP_PRINTF("ptype => %0x, plen => %d\n",
 		 * (uint32_t)ptype, (int)plen);
 		 */
 		if (offset + plen > limit) {
 			break;
 		}
 		if (plen == 0) {
 			break;
 		}
 #ifdef INET
 		if (ptype == SCTP_IPV4_ADDRESS) {
 			if (stcb->asoc.scope.ipv4_addr_legal) {
 				struct sctp_ipv4addr_param *p4, p4_buf;
 
 				/* ok get the v4 address and check/add */
 				phdr = sctp_get_next_param(m, offset,
 				    (struct sctp_paramhdr *)&p4_buf,
 				    sizeof(p4_buf));
 				if (plen != sizeof(struct sctp_ipv4addr_param) ||
 				    phdr == NULL) {
 					return (-5);
 				}
 				p4 = (struct sctp_ipv4addr_param *)phdr;
 				sin.sin_addr.s_addr = p4->addr;
 				if (IN_MULTICAST(ntohl(sin.sin_addr.s_addr))) {
 					/* Skip multi-cast addresses */
 					goto next_param;
 				}
 				if ((sin.sin_addr.s_addr == INADDR_BROADCAST) ||
 				    (sin.sin_addr.s_addr == INADDR_ANY)) {
 					goto next_param;
 				}
 				sa = (struct sockaddr *)&sin;
 				inp = stcb->sctp_ep;
 				atomic_add_int(&stcb->asoc.refcnt, 1);
 				stcb_tmp = sctp_findassociation_ep_addr(&inp, sa, &net,
 				    dst, stcb);
 				atomic_add_int(&stcb->asoc.refcnt, -1);
 
 				if ((stcb_tmp == NULL && inp == stcb->sctp_ep) ||
 				    inp == NULL) {
 					/* we must add the source address */
 					/*
 					 * no scope set since we have a tcb
 					 * already
 					 */
 
 					/*
 					 * we must validate the state again
 					 * here
 					 */
 			add_it_now:
 					if (stcb->asoc.state == 0) {
 						/* the assoc was freed? */
 						return (-7);
 					}
 					if (sctp_add_remote_addr(stcb, sa, NULL, port, SCTP_DONOT_SETSCOPE, SCTP_LOAD_ADDR_4)) {
 						return (-8);
 					}
 				} else if (stcb_tmp == stcb) {
 					if (stcb->asoc.state == 0) {
 						/* the assoc was freed? */
 						return (-10);
 					}
 					if (net != NULL) {
 						/* clear flag */
 						net->dest_state &=
 						    ~SCTP_ADDR_NOT_IN_ASSOC;
 					}
 				} else {
 					/*
 					 * strange, address is in another
 					 * assoc? straighten out locks.
 					 */
 					if (stcb_tmp) {
 						if (SCTP_GET_STATE(&stcb_tmp->asoc) & SCTP_STATE_COOKIE_WAIT) {
 							struct mbuf *op_err;
 							char msg[SCTP_DIAG_INFO_LEN];
 
 							/*
 							 * in setup state we
 							 * abort this guy
 							 */
 							snprintf(msg, sizeof(msg),
 							    "%s:%d at %s", __FILE__, __LINE__, __func__);
 							op_err = sctp_generate_cause(SCTP_BASE_SYSCTL(sctp_diag_info_code),
 							    msg);
 							sctp_abort_an_association(stcb_tmp->sctp_ep,
 							    stcb_tmp, op_err,
 							    SCTP_SO_NOT_LOCKED);
 							goto add_it_now;
 						}
 						SCTP_TCB_UNLOCK(stcb_tmp);
 					}
 					if (stcb->asoc.state == 0) {
 						/* the assoc was freed? */
 						return (-12);
 					}
 					return (-13);
 				}
 			}
 		} else
 #endif
 #ifdef INET6
 		if (ptype == SCTP_IPV6_ADDRESS) {
 			if (stcb->asoc.scope.ipv6_addr_legal) {
 				/* ok get the v6 address and check/add */
 				struct sctp_ipv6addr_param *p6, p6_buf;
 
 				phdr = sctp_get_next_param(m, offset,
 				    (struct sctp_paramhdr *)&p6_buf,
 				    sizeof(p6_buf));
 				if (plen != sizeof(struct sctp_ipv6addr_param) ||
 				    phdr == NULL) {
 					return (-14);
 				}
 				p6 = (struct sctp_ipv6addr_param *)phdr;
 				memcpy((caddr_t)&sin6.sin6_addr, p6->addr,
 				    sizeof(p6->addr));
 				if (IN6_IS_ADDR_MULTICAST(&sin6.sin6_addr)) {
 					/* Skip multi-cast addresses */
 					goto next_param;
 				}
 				if (IN6_IS_ADDR_LINKLOCAL(&sin6.sin6_addr)) {
 					/*
 					 * Link-local makes no sense without
 					 * scope
 					 */
 					goto next_param;
 				}
 				sa = (struct sockaddr *)&sin6;
 				inp = stcb->sctp_ep;
 				atomic_add_int(&stcb->asoc.refcnt, 1);
 				stcb_tmp = sctp_findassociation_ep_addr(&inp, sa, &net,
 				    dst, stcb);
 				atomic_add_int(&stcb->asoc.refcnt, -1);
 				if (stcb_tmp == NULL &&
 				    (inp == stcb->sctp_ep || inp == NULL)) {
 					/*
 					 * we must validate the state again
 					 * here
 					 */
 			add_it_now6:
 					if (stcb->asoc.state == 0) {
 						/* the assoc was freed? */
 						return (-16);
 					}
 					/*
 					 * we must add the address, no scope
 					 * set
 					 */
 					if (sctp_add_remote_addr(stcb, sa, NULL, port, SCTP_DONOT_SETSCOPE, SCTP_LOAD_ADDR_5)) {
 						return (-17);
 					}
 				} else if (stcb_tmp == stcb) {
 					/*
 					 * we must validate the state again
 					 * here
 					 */
 					if (stcb->asoc.state == 0) {
 						/* the assoc was freed? */
 						return (-19);
 					}
 					if (net != NULL) {
 						/* clear flag */
 						net->dest_state &=
 						    ~SCTP_ADDR_NOT_IN_ASSOC;
 					}
 				} else {
 					/*
 					 * strange, address is in another
 					 * assoc? straighten out locks.
 					 */
 					if (stcb_tmp) {
 						if (SCTP_GET_STATE(&stcb_tmp->asoc) & SCTP_STATE_COOKIE_WAIT) {
 							struct mbuf *op_err;
 							char msg[SCTP_DIAG_INFO_LEN];
 
 							/*
 							 * in setup state we
 							 * abort this guy
 							 */
 							snprintf(msg, sizeof(msg),
 							    "%s:%d at %s", __FILE__, __LINE__, __func__);
 							op_err = sctp_generate_cause(SCTP_BASE_SYSCTL(sctp_diag_info_code),
 							    msg);
 							sctp_abort_an_association(stcb_tmp->sctp_ep,
 							    stcb_tmp, op_err,
 							    SCTP_SO_NOT_LOCKED);
 							goto add_it_now6;
 						}
 						SCTP_TCB_UNLOCK(stcb_tmp);
 					}
 					if (stcb->asoc.state == 0) {
 						/* the assoc was freed? */
 						return (-21);
 					}
 					return (-22);
 				}
 			}
 		} else
 #endif
 		if (ptype == SCTP_ECN_CAPABLE) {
 			peer_supports_ecn = 1;
 		} else if (ptype == SCTP_ULP_ADAPTATION) {
 			if (stcb->asoc.state != SCTP_STATE_OPEN) {
 				struct sctp_adaptation_layer_indication ai,
 				                                *aip;
 
 				phdr = sctp_get_next_param(m, offset,
 				    (struct sctp_paramhdr *)&ai, sizeof(ai));
 				aip = (struct sctp_adaptation_layer_indication *)phdr;
 				if (aip) {
 					stcb->asoc.peers_adaptation = ntohl(aip->indication);
 					stcb->asoc.adaptation_needed = 1;
 				}
 			}
 		} else if (ptype == SCTP_SET_PRIM_ADDR) {
 			struct sctp_asconf_addr_param lstore, *fee;
 			int lptype;
 			struct sockaddr *lsa = NULL;
 
 #ifdef INET
 			struct sctp_asconf_addrv4_param *fii;
 
 #endif
 
 			if (stcb->asoc.asconf_supported == 0) {
 				return (-100);
 			}
 			if (plen > sizeof(lstore)) {
 				return (-23);
 			}
 			phdr = sctp_get_next_param(m, offset,
 			    (struct sctp_paramhdr *)&lstore,
 			    min(plen, sizeof(lstore)));
 			if (phdr == NULL) {
 				return (-24);
 			}
 			fee = (struct sctp_asconf_addr_param *)phdr;
 			lptype = ntohs(fee->addrp.ph.param_type);
 			switch (lptype) {
 #ifdef INET
 			case SCTP_IPV4_ADDRESS:
 				if (plen !=
 				    sizeof(struct sctp_asconf_addrv4_param)) {
 					SCTP_PRINTF("Sizeof setprim in init/init ack not %d but %d - ignored\n",
 					    (int)sizeof(struct sctp_asconf_addrv4_param),
 					    plen);
 				} else {
 					fii = (struct sctp_asconf_addrv4_param *)fee;
 					sin.sin_addr.s_addr = fii->addrp.addr;
 					lsa = (struct sockaddr *)&sin;
 				}
 				break;
 #endif
 #ifdef INET6
 			case SCTP_IPV6_ADDRESS:
 				if (plen !=
 				    sizeof(struct sctp_asconf_addr_param)) {
 					SCTP_PRINTF("Sizeof setprim (v6) in init/init ack not %d but %d - ignored\n",
 					    (int)sizeof(struct sctp_asconf_addr_param),
 					    plen);
 				} else {
 					memcpy(sin6.sin6_addr.s6_addr,
 					    fee->addrp.addr,
 					    sizeof(fee->addrp.addr));
 					lsa = (struct sockaddr *)&sin6;
 				}
 				break;
 #endif
 			default:
 				break;
 			}
 			if (lsa) {
 				(void)sctp_set_primary_addr(stcb, sa, NULL);
 			}
 		} else if (ptype == SCTP_HAS_NAT_SUPPORT) {
 			stcb->asoc.peer_supports_nat = 1;
 		} else if (ptype == SCTP_PRSCTP_SUPPORTED) {
 			/* Peer supports pr-sctp */
 			peer_supports_prsctp = 1;
 		} else if (ptype == SCTP_SUPPORTED_CHUNK_EXT) {
 			/* A supported extension chunk */
 			struct sctp_supported_chunk_types_param *pr_supported;
 			uint8_t local_store[SCTP_PARAM_BUFFER_SIZE];
 			int num_ent, i;
 
 			phdr = sctp_get_next_param(m, offset,
 			    (struct sctp_paramhdr *)&local_store, min(sizeof(local_store), plen));
 			if (phdr == NULL) {
 				return (-25);
 			}
 			pr_supported = (struct sctp_supported_chunk_types_param *)phdr;
 			num_ent = plen - sizeof(struct sctp_paramhdr);
 			for (i = 0; i < num_ent; i++) {
 				switch (pr_supported->chunk_types[i]) {
 				case SCTP_ASCONF:
 					peer_supports_asconf = 1;
 					break;
 				case SCTP_ASCONF_ACK:
 					peer_supports_asconf_ack = 1;
 					break;
 				case SCTP_FORWARD_CUM_TSN:
 					peer_supports_prsctp = 1;
 					break;
 				case SCTP_PACKET_DROPPED:
 					peer_supports_pktdrop = 1;
 					break;
 				case SCTP_NR_SELECTIVE_ACK:
 					peer_supports_nrsack = 1;
 					break;
 				case SCTP_STREAM_RESET:
 					peer_supports_reconfig = 1;
 					break;
 				case SCTP_AUTHENTICATION:
 					peer_supports_auth = 1;
 					break;
 				case SCTP_IDATA:
 					peer_supports_idata = 1;
 					break;
 				default:
 					/* one I have not learned yet */
 					break;
 
 				}
 			}
 		} else if (ptype == SCTP_RANDOM) {
 			if (plen > sizeof(random_store))
 				break;
 			if (got_random) {
 				/* already processed a RANDOM */
 				goto next_param;
 			}
 			phdr = sctp_get_next_param(m, offset,
 			    (struct sctp_paramhdr *)random_store,
 			    min(sizeof(random_store), plen));
 			if (phdr == NULL)
 				return (-26);
 			p_random = (struct sctp_auth_random *)phdr;
 			random_len = plen - sizeof(*p_random);
 			/* enforce the random length */
 			if (random_len != SCTP_AUTH_RANDOM_SIZE_REQUIRED) {
 				SCTPDBG(SCTP_DEBUG_AUTH1, "SCTP: invalid RANDOM len\n");
 				return (-27);
 			}
 			got_random = 1;
 		} else if (ptype == SCTP_HMAC_LIST) {
 			uint16_t num_hmacs;
 			uint16_t i;
 
 			if (plen > sizeof(hmacs_store))
 				break;
 			if (got_hmacs) {
 				/* already processed a HMAC list */
 				goto next_param;
 			}
 			phdr = sctp_get_next_param(m, offset,
 			    (struct sctp_paramhdr *)hmacs_store,
 			    min(plen, sizeof(hmacs_store)));
 			if (phdr == NULL)
 				return (-28);
 			hmacs = (struct sctp_auth_hmac_algo *)phdr;
 			hmacs_len = plen - sizeof(*hmacs);
 			num_hmacs = hmacs_len / sizeof(hmacs->hmac_ids[0]);
 			/* validate the hmac list */
 			if (sctp_verify_hmac_param(hmacs, num_hmacs)) {
 				return (-29);
 			}
 			if (stcb->asoc.peer_hmacs != NULL)
 				sctp_free_hmaclist(stcb->asoc.peer_hmacs);
 			stcb->asoc.peer_hmacs = sctp_alloc_hmaclist(num_hmacs);
 			if (stcb->asoc.peer_hmacs != NULL) {
 				for (i = 0; i < num_hmacs; i++) {
 					(void)sctp_auth_add_hmacid(stcb->asoc.peer_hmacs,
 					    ntohs(hmacs->hmac_ids[i]));
 				}
 			}
 			got_hmacs = 1;
 		} else if (ptype == SCTP_CHUNK_LIST) {
 			int i;
 
 			if (plen > sizeof(chunks_store))
 				break;
 			if (got_chklist) {
 				/* already processed a Chunks list */
 				goto next_param;
 			}
 			phdr = sctp_get_next_param(m, offset,
 			    (struct sctp_paramhdr *)chunks_store,
 			    min(plen, sizeof(chunks_store)));
 			if (phdr == NULL)
 				return (-30);
 			chunks = (struct sctp_auth_chunk_list *)phdr;
 			num_chunks = plen - sizeof(*chunks);
 			if (stcb->asoc.peer_auth_chunks != NULL)
 				sctp_clear_chunklist(stcb->asoc.peer_auth_chunks);
 			else
 				stcb->asoc.peer_auth_chunks = sctp_alloc_chunklist();
 			for (i = 0; i < num_chunks; i++) {
 				(void)sctp_auth_add_chunk(chunks->chunk_types[i],
 				    stcb->asoc.peer_auth_chunks);
 				/* record asconf/asconf-ack if listed */
 				if (chunks->chunk_types[i] == SCTP_ASCONF)
 					saw_asconf = 1;
 				if (chunks->chunk_types[i] == SCTP_ASCONF_ACK)
 					saw_asconf_ack = 1;
 
 			}
 			got_chklist = 1;
 		} else if ((ptype == SCTP_HEARTBEAT_INFO) ||
 			    (ptype == SCTP_STATE_COOKIE) ||
 			    (ptype == SCTP_UNRECOG_PARAM) ||
 			    (ptype == SCTP_COOKIE_PRESERVE) ||
 			    (ptype == SCTP_SUPPORTED_ADDRTYPE) ||
 			    (ptype == SCTP_ADD_IP_ADDRESS) ||
 			    (ptype == SCTP_DEL_IP_ADDRESS) ||
 			    (ptype == SCTP_ERROR_CAUSE_IND) ||
 		    (ptype == SCTP_SUCCESS_REPORT)) {
 			 /* don't care */ ;
 		} else {
 			if ((ptype & 0x8000) == 0x0000) {
 				/*
 				 * must stop processing the rest of the
 				 * param's. Any report bits were handled
 				 * with the call to
 				 * sctp_arethere_unrecognized_parameters()
 				 * when the INIT or INIT-ACK was first seen.
 				 */
 				break;
 			}
 		}
 
 next_param:
 		offset += SCTP_SIZE32(plen);
 		if (offset >= limit) {
 			break;
 		}
 		phdr = sctp_get_next_param(m, offset, &parm_buf,
 		    sizeof(parm_buf));
 	}
 	/* Now check to see if we need to purge any addresses */
 	TAILQ_FOREACH_SAFE(net, &stcb->asoc.nets, sctp_next, nnet) {
 		if ((net->dest_state & SCTP_ADDR_NOT_IN_ASSOC) ==
 		    SCTP_ADDR_NOT_IN_ASSOC) {
 			/* This address has been removed from the asoc */
 			/* remove and free it */
 			stcb->asoc.numnets--;
 			TAILQ_REMOVE(&stcb->asoc.nets, net, sctp_next);
 			sctp_free_remote_addr(net);
 			if (net == stcb->asoc.primary_destination) {
 				stcb->asoc.primary_destination = NULL;
 				sctp_select_primary_destination(stcb);
 			}
 		}
 	}
 	if ((stcb->asoc.ecn_supported == 1) &&
 	    (peer_supports_ecn == 0)) {
 		stcb->asoc.ecn_supported = 0;
 	}
 	if ((stcb->asoc.prsctp_supported == 1) &&
 	    (peer_supports_prsctp == 0)) {
 		stcb->asoc.prsctp_supported = 0;
 	}
 	if ((stcb->asoc.auth_supported == 1) &&
 	    ((peer_supports_auth == 0) ||
 	    (got_random == 0) || (got_hmacs == 0))) {
 		stcb->asoc.auth_supported = 0;
 	}
 	if ((stcb->asoc.asconf_supported == 1) &&
 	    ((peer_supports_asconf == 0) || (peer_supports_asconf_ack == 0) ||
 	    (stcb->asoc.auth_supported == 0) ||
 	    (saw_asconf == 0) || (saw_asconf_ack == 0))) {
 		stcb->asoc.asconf_supported = 0;
 	}
 	if ((stcb->asoc.reconfig_supported == 1) &&
 	    (peer_supports_reconfig == 0)) {
 		stcb->asoc.reconfig_supported = 0;
 	}
 	if ((stcb->asoc.idata_supported == 1) &&
 	    (peer_supports_idata == 0)) {
 		stcb->asoc.idata_supported = 0;
 	}
 	if ((stcb->asoc.nrsack_supported == 1) &&
 	    (peer_supports_nrsack == 0)) {
 		stcb->asoc.nrsack_supported = 0;
 	}
 	if ((stcb->asoc.pktdrop_supported == 1) &&
 	    (peer_supports_pktdrop == 0)) {
 		stcb->asoc.pktdrop_supported = 0;
 	}
 	/* validate authentication required parameters */
 	if ((peer_supports_auth == 0) && (got_chklist == 1)) {
 		/* peer does not support auth but sent a chunks list? */
 		return (-31);
 	}
 	if ((peer_supports_asconf == 1) && (peer_supports_auth == 0)) {
 		/* peer supports asconf but not auth? */
 		return (-32);
 	} else if ((peer_supports_asconf == 1) &&
 		    (peer_supports_auth == 1) &&
 	    ((saw_asconf == 0) || (saw_asconf_ack == 0))) {
 		return (-33);
 	}
 	/* concatenate the full random key */
 	keylen = sizeof(*p_random) + random_len + sizeof(*hmacs) + hmacs_len;
 	if (chunks != NULL) {
 		keylen += sizeof(*chunks) + num_chunks;
 	}
 	new_key = sctp_alloc_key(keylen);
 	if (new_key != NULL) {
 		/* copy in the RANDOM */
 		if (p_random != NULL) {
 			keylen = sizeof(*p_random) + random_len;
 			bcopy(p_random, new_key->key, keylen);
 		}
 		/* append in the AUTH chunks */
 		if (chunks != NULL) {
 			bcopy(chunks, new_key->key + keylen,
 			    sizeof(*chunks) + num_chunks);
 			keylen += sizeof(*chunks) + num_chunks;
 		}
 		/* append in the HMACs */
 		if (hmacs != NULL) {
 			bcopy(hmacs, new_key->key + keylen,
 			    sizeof(*hmacs) + hmacs_len);
 		}
 	} else {
 		/* failed to get memory for the key */
 		return (-34);
 	}
 	if (stcb->asoc.authinfo.peer_random != NULL)
 		sctp_free_key(stcb->asoc.authinfo.peer_random);
 	stcb->asoc.authinfo.peer_random = new_key;
 	sctp_clear_cachedkeys(stcb, stcb->asoc.authinfo.assoc_keyid);
 	sctp_clear_cachedkeys(stcb, stcb->asoc.authinfo.recv_keyid);
 
 	return (0);
 }
 
 int
 sctp_set_primary_addr(struct sctp_tcb *stcb, struct sockaddr *sa,
     struct sctp_nets *net)
 {
 	/* make sure the requested primary address exists in the assoc */
 	if (net == NULL && sa)
 		net = sctp_findnet(stcb, sa);
 
 	if (net == NULL) {
 		/* didn't find the requested primary address! */
 		return (-1);
 	} else {
 		/* set the primary address */
 		if (net->dest_state & SCTP_ADDR_UNCONFIRMED) {
 			/* Must be confirmed, so queue to set */
 			net->dest_state |= SCTP_ADDR_REQ_PRIMARY;
 			return (0);
 		}
 		stcb->asoc.primary_destination = net;
 		if (!(net->dest_state & SCTP_ADDR_PF) && (stcb->asoc.alternate)) {
 			sctp_free_remote_addr(stcb->asoc.alternate);
 			stcb->asoc.alternate = NULL;
 		}
 		net = TAILQ_FIRST(&stcb->asoc.nets);
 		if (net != stcb->asoc.primary_destination) {
 			/*
 			 * first one on the list is NOT the primary
 			 * sctp_cmpaddr() is much more efficient if the
 			 * primary is the first on the list, make it so.
 			 */
 			TAILQ_REMOVE(&stcb->asoc.nets, stcb->asoc.primary_destination, sctp_next);
 			TAILQ_INSERT_HEAD(&stcb->asoc.nets, stcb->asoc.primary_destination, sctp_next);
 		}
 		return (0);
 	}
 }
 
 int
 sctp_is_vtag_good(uint32_t tag, uint16_t lport, uint16_t rport, struct timeval *now)
 {
 	/*
 	 * This function serves two purposes. It will see if a TAG can be
 	 * re-used and return 1 for yes it is ok and 0 for don't use that
 	 * tag. A secondary function it will do is purge out old tags that
 	 * can be removed.
 	 */
 	struct sctpvtaghead *chain;
 	struct sctp_tagblock *twait_block;
 	struct sctpasochead *head;
 	struct sctp_tcb *stcb;
 	int i;
 
 	SCTP_INP_INFO_RLOCK();
 	head = &SCTP_BASE_INFO(sctp_asochash)[SCTP_PCBHASH_ASOC(tag,
 	    SCTP_BASE_INFO(hashasocmark))];
 	LIST_FOREACH(stcb, head, sctp_asocs) {
 		/*
 		 * We choose not to lock anything here. TCB's can't be
 		 * removed since we have the read lock, so they can't be
 		 * freed on us, same thing for the INP. I may be wrong with
 		 * this assumption, but we will go with it for now :-)
 		 */
 		if (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) {
 			continue;
 		}
 		if (stcb->asoc.my_vtag == tag) {
 			/* candidate */
 			if (stcb->rport != rport) {
 				continue;
 			}
 			if (stcb->sctp_ep->sctp_lport != lport) {
 				continue;
 			}
 			/* It's a used tag set */
 			SCTP_INP_INFO_RUNLOCK();
 			return (0);
 		}
 	}
 	chain = &SCTP_BASE_INFO(vtag_timewait)[(tag % SCTP_STACK_VTAG_HASH_SIZE)];
 	/* Now what about timed wait ? */
 	LIST_FOREACH(twait_block, chain, sctp_nxt_tagblock) {
 		/*
 		 * Block(s) are present, let's see if we have this tag in the
 		 * list
 		 */
 		for (i = 0; i < SCTP_NUMBER_IN_VTAG_BLOCK; i++) {
 			if (twait_block->vtag_block[i].v_tag == 0) {
 				/* not used */
 				continue;
 			} else if ((long)twait_block->vtag_block[i].tv_sec_at_expire <
 			    now->tv_sec) {
 				/* Audit expires this guy */
 				twait_block->vtag_block[i].tv_sec_at_expire = 0;
 				twait_block->vtag_block[i].v_tag = 0;
 				twait_block->vtag_block[i].lport = 0;
 				twait_block->vtag_block[i].rport = 0;
 			} else if ((twait_block->vtag_block[i].v_tag == tag) &&
 				    (twait_block->vtag_block[i].lport == lport) &&
 			    (twait_block->vtag_block[i].rport == rport)) {
 				/* Bad tag, sorry :< */
 				SCTP_INP_INFO_RUNLOCK();
 				return (0);
 			}
 		}
 	}
 	SCTP_INP_INFO_RUNLOCK();
 	return (1);
 }
 
 static void
 sctp_drain_mbufs(struct sctp_tcb *stcb)
 {
 	/*
 	 * We must hunt this association for MBUF's past the cumack (i.e.
 	 * out of order data that we can renege on).
 	 */
 	struct sctp_association *asoc;
 	struct sctp_tmit_chunk *chk, *nchk;
 	uint32_t cumulative_tsn_p1;
 	struct sctp_queued_to_read *ctl, *nctl;
 	int cnt, strmat;
 	uint32_t gap, i;
 	int fnd = 0;
 
 	/* We look for anything larger than the cum-ack + 1 */
 
 	asoc = &stcb->asoc;
 	if (asoc->cumulative_tsn == asoc->highest_tsn_inside_map) {
 		/* none we can renege on. */
 		return;
 	}
 	SCTP_STAT_INCR(sctps_protocol_drains_done);
 	cumulative_tsn_p1 = asoc->cumulative_tsn + 1;
 	cnt = 0;
 	/* Ok that was fun, now we will drain all the inbound streams? */
 	for (strmat = 0; strmat < asoc->streamincnt; strmat++) {
 		TAILQ_FOREACH_SAFE(ctl, &asoc->strmin[strmat].inqueue, next_instrm, nctl) {
 			if (SCTP_TSN_GT(ctl->sinfo_tsn, cumulative_tsn_p1)) {
 				/* Yep it is above cum-ack */
 				cnt++;
 				SCTP_CALC_TSN_TO_GAP(gap, ctl->sinfo_tsn, asoc->mapping_array_base_tsn);
 				asoc->size_on_all_streams = sctp_sbspace_sub(asoc->size_on_all_streams, ctl->length);
 				sctp_ucount_decr(asoc->cnt_on_all_streams);
 				SCTP_UNSET_TSN_PRESENT(asoc->mapping_array, gap);
 				TAILQ_REMOVE(&asoc->strmin[strmat].inqueue, ctl, next_instrm);
 				if (ctl->data) {
 					sctp_m_freem(ctl->data);
 					ctl->data = NULL;
 				}
 				sctp_free_remote_addr(ctl->whoFrom);
 				/* Now its reasm? */
 				TAILQ_FOREACH_SAFE(chk, &ctl->reasm, sctp_next, nchk) {
 					cnt++;
 					SCTP_CALC_TSN_TO_GAP(gap, chk->rec.data.TSN_seq, asoc->mapping_array_base_tsn);
 					asoc->size_on_reasm_queue = sctp_sbspace_sub(asoc->size_on_reasm_queue, chk->send_size);
 					sctp_ucount_decr(asoc->cnt_on_reasm_queue);
 					SCTP_UNSET_TSN_PRESENT(asoc->mapping_array, gap);
 					TAILQ_REMOVE(&ctl->reasm, chk, sctp_next);
 					if (chk->data) {
 						sctp_m_freem(chk->data);
 						chk->data = NULL;
 					}
 					sctp_free_a_chunk(stcb, chk, SCTP_SO_NOT_LOCKED);
 				}
 				sctp_free_a_readq(stcb, ctl);
 			}
 		}
 		TAILQ_FOREACH_SAFE(ctl, &asoc->strmin[strmat].uno_inqueue, next_instrm, nctl) {
 			if (SCTP_TSN_GT(ctl->sinfo_tsn, cumulative_tsn_p1)) {
 				/* Yep it is above cum-ack */
 				cnt++;
 				SCTP_CALC_TSN_TO_GAP(gap, ctl->sinfo_tsn, asoc->mapping_array_base_tsn);
 				asoc->size_on_all_streams = sctp_sbspace_sub(asoc->size_on_all_streams, ctl->length);
 				sctp_ucount_decr(asoc->cnt_on_all_streams);
 				SCTP_UNSET_TSN_PRESENT(asoc->mapping_array, gap);
 				TAILQ_REMOVE(&asoc->strmin[strmat].uno_inqueue, ctl, next_instrm);
 				if (ctl->data) {
 					sctp_m_freem(ctl->data);
 					ctl->data = NULL;
 				}
 				sctp_free_remote_addr(ctl->whoFrom);
 				/* Now its reasm? */
 				TAILQ_FOREACH_SAFE(chk, &ctl->reasm, sctp_next, nchk) {
 					cnt++;
 					SCTP_CALC_TSN_TO_GAP(gap, chk->rec.data.TSN_seq, asoc->mapping_array_base_tsn);
 					asoc->size_on_reasm_queue = sctp_sbspace_sub(asoc->size_on_reasm_queue, chk->send_size);
 					sctp_ucount_decr(asoc->cnt_on_reasm_queue);
 					SCTP_UNSET_TSN_PRESENT(asoc->mapping_array, gap);
 					TAILQ_REMOVE(&ctl->reasm, chk, sctp_next);
 					if (chk->data) {
 						sctp_m_freem(chk->data);
 						chk->data = NULL;
 					}
 					sctp_free_a_chunk(stcb, chk, SCTP_SO_NOT_LOCKED);
 				}
 				sctp_free_a_readq(stcb, ctl);
 			}
 		}
 	}
 	if (cnt) {
 		/* We must back down to see what the new highest is */
 		for (i = asoc->highest_tsn_inside_map; SCTP_TSN_GE(i, asoc->mapping_array_base_tsn); i--) {
 			SCTP_CALC_TSN_TO_GAP(gap, i, asoc->mapping_array_base_tsn);
 			if (SCTP_IS_TSN_PRESENT(asoc->mapping_array, gap)) {
 				asoc->highest_tsn_inside_map = i;
 				fnd = 1;
 				break;
 			}
 		}
 		if (!fnd) {
 			asoc->highest_tsn_inside_map = asoc->mapping_array_base_tsn - 1;
 		}
 		/*
 		 * Question, should we go through the delivery queue? The
 		 * only reason things are on here is the app not reading OR
 		 * a p-d-api up. An attacker COULD send enough in to
 		 * initiate the PD-API and then send a bunch of stuff to
 		 * other streams... these would wind up on the delivery
 		 * queue.. and then we would not get to them. But in order
 		 * to do this I then have to back-track and un-deliver
 		 * sequence numbers in streams.. el-yucko. I think for now
 		 * we will NOT look at the delivery queue and leave it to be
 		 * something to consider later. An alternative would be to
 		 * abort the P-D-API with a notification and then deliver
 		 * the data.... Or another method might be to keep track of
 		 * how many times the situation occurs and if we see a
 		 * possible attack underway just abort the association.
 		 */
 #ifdef SCTP_DEBUG
 		SCTPDBG(SCTP_DEBUG_PCB1, "Freed %d chunks from reneg harvest\n", cnt);
 #endif
 		/*
 		 * Now do we need to find a new
 		 * asoc->highest_tsn_inside_map?
 		 */
 		asoc->last_revoke_count = cnt;
 		(void)SCTP_OS_TIMER_STOP(&stcb->asoc.dack_timer.timer);
 		/* sa_ignore NO_NULL_CHK */
 		sctp_send_sack(stcb, SCTP_SO_NOT_LOCKED);
 		sctp_chunk_output(stcb->sctp_ep, stcb, SCTP_OUTPUT_FROM_DRAIN, SCTP_SO_NOT_LOCKED);
 	}
 	/*
 	 * Another issue, in un-setting the TSN's in the mapping array we
 	 * DID NOT adjust the highest_tsn marker.  This will cause one of
 	 * two things to occur. It may cause us to do extra work in checking
 	 * for our mapping array movement. More importantly it may cause us
 	 * to SACK every datagram. This may not be a bad thing though since
 	 * we will recover once we get our cum-ack above and all this stuff
 	 * we dumped recovered.
 	 */
 }
 
 void
 sctp_drain()
 {
 	/*
 	 * We must walk the PCB lists for ALL associations here. The system
 	 * is LOW on MBUF's and needs help. This is where reneging will
 	 * occur. We really hope this does NOT happen!
 	 */
 	VNET_ITERATOR_DECL(vnet_iter);
 	VNET_LIST_RLOCK_NOSLEEP();
 	VNET_FOREACH(vnet_iter) {
 		CURVNET_SET(vnet_iter);
 		struct sctp_inpcb *inp;
 		struct sctp_tcb *stcb;
 
 		SCTP_STAT_INCR(sctps_protocol_drain_calls);
 		if (SCTP_BASE_SYSCTL(sctp_do_drain) == 0) {
 #ifdef VIMAGE
 			continue;
 #else
 			return;
 #endif
 		}
 		SCTP_INP_INFO_RLOCK();
 		LIST_FOREACH(inp, &SCTP_BASE_INFO(listhead), sctp_list) {
 			/* For each endpoint */
 			SCTP_INP_RLOCK(inp);
 			LIST_FOREACH(stcb, &inp->sctp_asoc_list, sctp_tcblist) {
 				/* For each association */
 				SCTP_TCB_LOCK(stcb);
 				sctp_drain_mbufs(stcb);
 				SCTP_TCB_UNLOCK(stcb);
 			}
 			SCTP_INP_RUNLOCK(inp);
 		}
 		SCTP_INP_INFO_RUNLOCK();
 		CURVNET_RESTORE();
 	}
 	VNET_LIST_RUNLOCK_NOSLEEP();
 }
 
 /*
  * start a new iterator
  * iterates through all endpoints and associations based on the pcb_state
  * flags and asoc_state.  "af" (mandatory) is executed for all matching
  * assocs and "ef" (optional) is executed when the iterator completes.
  * "inpf" (optional) is executed for each new endpoint as it is being
  * iterated through. "inpe" (optional) is called when the inp completes
  * its way through all the stcbs.
  */
 int
 sctp_initiate_iterator(inp_func inpf,
     asoc_func af,
     inp_func inpe,
     uint32_t pcb_state,
     uint32_t pcb_features,
     uint32_t asoc_state,
     void *argp,
     uint32_t argi,
     end_func ef,
     struct sctp_inpcb *s_inp,
     uint8_t chunk_output_off)
 {
 	struct sctp_iterator *it = NULL;
 
 	if (af == NULL) {
 		return (-1);
 	}
 	if (SCTP_BASE_VAR(sctp_pcb_initialized) == 0) {
 		SCTP_PRINTF("%s: abort on initialize being %d\n", __func__,
 		    SCTP_BASE_VAR(sctp_pcb_initialized));
 		return (-1);
 	}
 	SCTP_MALLOC(it, struct sctp_iterator *, sizeof(struct sctp_iterator),
 	    SCTP_M_ITER);
 	if (it == NULL) {
 		SCTP_LTRACE_ERR_RET(NULL, NULL, NULL, SCTP_FROM_SCTP_PCB, ENOMEM);
 		return (ENOMEM);
 	}
 	memset(it, 0, sizeof(*it));
 	it->function_assoc = af;
 	it->function_inp = inpf;
 	if (inpf)
 		it->done_current_ep = 0;
 	else
 		it->done_current_ep = 1;
 	it->function_atend = ef;
 	it->pointer = argp;
 	it->val = argi;
 	it->pcb_flags = pcb_state;
 	it->pcb_features = pcb_features;
 	it->asoc_state = asoc_state;
 	it->function_inp_end = inpe;
 	it->no_chunk_output = chunk_output_off;
 	it->vn = curvnet;
 	if (s_inp) {
 		/* Assume lock is held here */
 		it->inp = s_inp;
 		SCTP_INP_INCR_REF(it->inp);
 		it->iterator_flags = SCTP_ITERATOR_DO_SINGLE_INP;
 	} else {
 		SCTP_INP_INFO_RLOCK();
 		it->inp = LIST_FIRST(&SCTP_BASE_INFO(listhead));
 		if (it->inp) {
 			SCTP_INP_INCR_REF(it->inp);
 		}
 		SCTP_INP_INFO_RUNLOCK();
 		it->iterator_flags = SCTP_ITERATOR_DO_ALL_INP;
 
 	}
 	SCTP_IPI_ITERATOR_WQ_LOCK();
 	if (SCTP_BASE_VAR(sctp_pcb_initialized) == 0) {
 		SCTP_IPI_ITERATOR_WQ_UNLOCK();
 		SCTP_PRINTF("%s: rollback on initialize being %d it=%p\n", __func__,
 		    SCTP_BASE_VAR(sctp_pcb_initialized), it);
 		SCTP_FREE(it, SCTP_M_ITER);
 		return (-1);
 	}
 	TAILQ_INSERT_TAIL(&sctp_it_ctl.iteratorhead, it, sctp_nxt_itr);
 	if (sctp_it_ctl.iterator_running == 0) {
 		sctp_wakeup_iterator();
 	}
 	SCTP_IPI_ITERATOR_WQ_UNLOCK();
 	/* sa_ignore MEMLEAK {memory is put on the tailq for the iterator} */
 	return (0);
 }
Index: projects/vnet/sys/ofed/drivers/net/mlx4/en_rx.c
===================================================================
--- projects/vnet/sys/ofed/drivers/net/mlx4/en_rx.c	(revision 301546)
+++ projects/vnet/sys/ofed/drivers/net/mlx4/en_rx.c	(revision 301547)
@@ -1,922 +1,922 @@
 /*
  * Copyright (c) 2007, 2014 Mellanox Technologies. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
  * General Public License (GPL) Version 2, available from the file
  * COPYING in the main directory of this source tree, or the
  * OpenIB.org BSD license below:
  *
  *     Redistribution and use in source and binary forms, with or
  *     without modification, are permitted provided that the following
  *     conditions are met:
  *
  *      - Redistributions of source code must retain the above
  *        copyright notice, this list of conditions and the following
  *        disclaimer.
  *
  *      - Redistributions in binary form must reproduce the above
  *        copyright notice, this list of conditions and the following
  *        disclaimer in the documentation and/or other materials
  *        provided with the distribution.
  *
  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
  * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
  * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
  * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
  * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
  * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
  * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  *
  */
 #include "opt_inet.h"
 #include <linux/mlx4/cq.h>
 #include <linux/slab.h>
 #include <linux/mlx4/qp.h>
 #include <linux/if_ether.h>
 #include <linux/if_vlan.h>
 #include <linux/vmalloc.h>
 #include <linux/mlx4/driver.h>
 #ifdef CONFIG_NET_RX_BUSY_POLL
 #include <net/busy_poll.h>
 #endif
 
 #include "mlx4_en.h"
 
 
 static void mlx4_en_init_rx_desc(struct mlx4_en_priv *priv,
 				 struct mlx4_en_rx_ring *ring,
 				 int index)
 {
 	struct mlx4_en_rx_desc *rx_desc = (struct mlx4_en_rx_desc *)
 	    (ring->buf + (ring->stride * index));
 	int possible_frags;
 	int i;
 
 	/* Set size and memtype fields */
 	rx_desc->data[0].byte_count = cpu_to_be32(priv->rx_mb_size - MLX4_NET_IP_ALIGN);
 	rx_desc->data[0].lkey = cpu_to_be32(priv->mdev->mr.key);
 
 	/*
 	 * If the number of used fragments does not fill up the ring
 	 * stride, remaining (unused) fragments must be padded with
 	 * null address/size and a special memory key:
 	 */
 	possible_frags = (ring->stride - sizeof(struct mlx4_en_rx_desc)) / DS_SIZE;
 	for (i = 1; i < possible_frags; i++) {
 		rx_desc->data[i].byte_count = 0;
 		rx_desc->data[i].lkey = cpu_to_be32(MLX4_EN_MEMTYPE_PAD);
 		rx_desc->data[i].addr = 0;
 	}
 }
 
 static int
 mlx4_en_alloc_buf(struct mlx4_en_rx_ring *ring,
      __be64 *pdma, struct mlx4_en_rx_mbuf *mb_list)
 {
 	bus_dma_segment_t segs[1];
 	bus_dmamap_t map;
 	struct mbuf *mb;
 	int nsegs;
 	int err;
 
 	/* try to allocate a new spare mbuf */
 	if (unlikely(ring->spare.mbuf == NULL)) {
 		mb = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, ring->rx_mb_size);
 		if (unlikely(mb == NULL))
 			return (-ENOMEM);
 		/* setup correct length */
 		mb->m_pkthdr.len = mb->m_len = ring->rx_mb_size;
 
 		/* make sure IP header gets aligned */
 		m_adj(mb, MLX4_NET_IP_ALIGN);
 
 		/* load spare mbuf into BUSDMA */
 		err = -bus_dmamap_load_mbuf_sg(ring->dma_tag, ring->spare.dma_map,
 		    mb, segs, &nsegs, BUS_DMA_NOWAIT);
 		if (unlikely(err != 0)) {
 			m_freem(mb);
 			return (err);
 		}
 
 		/* store spare info */
 		ring->spare.mbuf = mb;
 		ring->spare.paddr_be = cpu_to_be64(segs[0].ds_addr);
 
 		bus_dmamap_sync(ring->dma_tag, ring->spare.dma_map,
 		    BUS_DMASYNC_PREREAD);
 	}
 
 	/* synchronize and unload the current mbuf, if any */
 	if (likely(mb_list->mbuf != NULL)) {
 		bus_dmamap_sync(ring->dma_tag, mb_list->dma_map,
 		    BUS_DMASYNC_POSTREAD);
 		bus_dmamap_unload(ring->dma_tag, mb_list->dma_map);
 	}
 
 	mb = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, ring->rx_mb_size);
 	if (unlikely(mb == NULL))
 		goto use_spare;
 
 	/* setup correct length */
 	mb->m_pkthdr.len = mb->m_len = ring->rx_mb_size;
 
 	/* make sure IP header gets aligned */
 	m_adj(mb, MLX4_NET_IP_ALIGN);
 
 	err = -bus_dmamap_load_mbuf_sg(ring->dma_tag, mb_list->dma_map,
 	    mb, segs, &nsegs, BUS_DMA_NOWAIT);
 	if (unlikely(err != 0)) {
 		m_freem(mb);
 		goto use_spare;
 	}
 
 	*pdma = cpu_to_be64(segs[0].ds_addr);
 	mb_list->mbuf = mb;
 
 	bus_dmamap_sync(ring->dma_tag, mb_list->dma_map, BUS_DMASYNC_PREREAD);
 	return (0);
 
 use_spare:
 	/* swap DMA maps */
 	map = mb_list->dma_map;
 	mb_list->dma_map = ring->spare.dma_map;
 	ring->spare.dma_map = map;
 
 	/* swap MBUFs */
 	mb_list->mbuf = ring->spare.mbuf;
 	ring->spare.mbuf = NULL;
 
 	/* store physical address */
 	*pdma = ring->spare.paddr_be;
 	return (0);
 }
 
 static void
 mlx4_en_free_buf(struct mlx4_en_rx_ring *ring, struct mlx4_en_rx_mbuf *mb_list)
 {
 	bus_dmamap_t map = mb_list->dma_map;
 	bus_dmamap_sync(ring->dma_tag, map, BUS_DMASYNC_POSTREAD);
 	bus_dmamap_unload(ring->dma_tag, map);
 	m_freem(mb_list->mbuf);
 	mb_list->mbuf = NULL;	/* safety clearing */
 }
 
 static int
 mlx4_en_prepare_rx_desc(struct mlx4_en_priv *priv,
     struct mlx4_en_rx_ring *ring, int index)
 {
 	struct mlx4_en_rx_desc *rx_desc = (struct mlx4_en_rx_desc *)
 	    (ring->buf + (index * ring->stride));
 	struct mlx4_en_rx_mbuf *mb_list = ring->mbuf + index;
 
 	mb_list->mbuf = NULL;
 
 	if (mlx4_en_alloc_buf(ring, &rx_desc->data[0].addr, mb_list)) {
 		priv->port_stats.rx_alloc_failed++;
 		return (-ENOMEM);
 	}
 	return (0);
 }
 
 static inline void
 mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring)
 {
 	*ring->wqres.db.db = cpu_to_be32(ring->prod & 0xffff);
 }
 
 static int mlx4_en_fill_rx_buffers(struct mlx4_en_priv *priv)
 {
 	struct mlx4_en_rx_ring *ring;
 	int ring_ind;
 	int buf_ind;
 	int new_size;
 	int err;
 
 	for (buf_ind = 0; buf_ind < priv->prof->rx_ring_size; buf_ind++) {
 		for (ring_ind = 0; ring_ind < priv->rx_ring_num; ring_ind++) {
 			ring = priv->rx_ring[ring_ind];
 
 			err = mlx4_en_prepare_rx_desc(priv, ring,
 						      ring->actual_size);
 			if (err) {
 				if (ring->actual_size == 0) {
 					en_err(priv, "Failed to allocate "
 						     "enough rx buffers\n");
 					return -ENOMEM;
 				} else {
 					new_size =
 						rounddown_pow_of_two(ring->actual_size);
 					en_warn(priv, "Only %d buffers allocated "
 						      "reducing ring size to %d\n",
 						ring->actual_size, new_size);
 					goto reduce_rings;
 				}
 			}
 			ring->actual_size++;
 			ring->prod++;
 		}
 	}
 	return 0;
 
 reduce_rings:
 	for (ring_ind = 0; ring_ind < priv->rx_ring_num; ring_ind++) {
 		ring = priv->rx_ring[ring_ind];
 		while (ring->actual_size > new_size) {
 			ring->actual_size--;
 			ring->prod--;
 			mlx4_en_free_buf(ring,
 			    ring->mbuf + ring->actual_size);
 		}
 	}
 
 	return 0;
 }
 
 static void mlx4_en_free_rx_buf(struct mlx4_en_priv *priv,
 				struct mlx4_en_rx_ring *ring)
 {
 	int index;
 
 	en_dbg(DRV, priv, "Freeing Rx buf - cons:%d prod:%d\n",
 	       ring->cons, ring->prod);
 
 	/* Unmap and free Rx buffers */
 	BUG_ON((u32) (ring->prod - ring->cons) > ring->actual_size);
 	while (ring->cons != ring->prod) {
 		index = ring->cons & ring->size_mask;
 		en_dbg(DRV, priv, "Processing descriptor:%d\n", index);
 		mlx4_en_free_buf(ring, ring->mbuf + index);
 		++ring->cons;
 	}
 }
 
 void mlx4_en_calc_rx_buf(struct net_device *dev)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	int eff_mtu = dev->if_mtu + ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN +
 	    MLX4_NET_IP_ALIGN;
 
 	if (eff_mtu > MJUM16BYTES) {
 		en_err(priv, "MTU(%d) is too big\n", dev->if_mtu);
                 eff_mtu = MJUM16BYTES;
         } else if (eff_mtu > MJUM9BYTES) {
                 eff_mtu = MJUM16BYTES;
         } else if (eff_mtu > MJUMPAGESIZE) {
                 eff_mtu = MJUM9BYTES;
         } else if (eff_mtu > MCLBYTES) {
                 eff_mtu = MJUMPAGESIZE;
         } else {
                 eff_mtu = MCLBYTES;
         }
 
 	priv->rx_mb_size = eff_mtu;
 
 	en_dbg(DRV, priv, "Effective RX MTU: %d bytes\n", eff_mtu);
 }
 
 int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
 			   struct mlx4_en_rx_ring **pring,
 			   u32 size, int node)
 {
 	struct mlx4_en_dev *mdev = priv->mdev;
 	struct mlx4_en_rx_ring *ring;
 	int err;
 	int tmp;
 	uint32_t x;
 
         ring = kzalloc(sizeof(struct mlx4_en_rx_ring), GFP_KERNEL);
         if (!ring) {
                 en_err(priv, "Failed to allocate RX ring structure\n");
                 return -ENOMEM;
         }
 
 	/* Create DMA descriptor TAG */
 	if ((err = -bus_dma_tag_create(
 	    bus_get_dma_tag(mdev->pdev->dev.bsddev),
 	    1,				/* any alignment */
 	    0,				/* no boundary */
 	    BUS_SPACE_MAXADDR,		/* lowaddr */
 	    BUS_SPACE_MAXADDR,		/* highaddr */
 	    NULL, NULL,			/* filter, filterarg */
 	    MJUM16BYTES,		/* maxsize */
 	    1,				/* nsegments */
 	    MJUM16BYTES,		/* maxsegsize */
 	    0,				/* flags */
 	    NULL, NULL,			/* lockfunc, lockfuncarg */
 	    &ring->dma_tag))) {
 		en_err(priv, "Failed to create DMA tag\n");
 		goto err_ring;
 	}
 
 	ring->prod = 0;
 	ring->cons = 0;
 	ring->size = size;
 	ring->size_mask = size - 1;
 	ring->stride = roundup_pow_of_two(
 	    sizeof(struct mlx4_en_rx_desc) + DS_SIZE);
 	ring->log_stride = ffs(ring->stride) - 1;
 	ring->buf_size = ring->size * ring->stride + TXBB_SIZE;
 
 	tmp = size * sizeof(struct mlx4_en_rx_mbuf);
 
         ring->mbuf = kzalloc(tmp, GFP_KERNEL);
         if (ring->mbuf == NULL) {
                 err = -ENOMEM;
                 goto err_dma_tag;
         }
 
 	err = -bus_dmamap_create(ring->dma_tag, 0, &ring->spare.dma_map);
 	if (err != 0)
 		goto err_info;
 
 	for (x = 0; x != size; x++) {
 		err = -bus_dmamap_create(ring->dma_tag, 0,
 		    &ring->mbuf[x].dma_map);
 		if (err != 0) {
 			while (x--)
 				bus_dmamap_destroy(ring->dma_tag,
 				    ring->mbuf[x].dma_map);
 			goto err_info;
 		}
 	}
 	en_dbg(DRV, priv, "Allocated MBUF ring at addr:%p size:%d\n",
 		 ring->mbuf, tmp);
 
 	err = mlx4_alloc_hwq_res(mdev->dev, &ring->wqres,
 				 ring->buf_size, 2 * PAGE_SIZE);
 	if (err)
 		goto err_dma_map;
 
 	err = mlx4_en_map_buffer(&ring->wqres.buf);
 	if (err) {
 		en_err(priv, "Failed to map RX buffer\n");
 		goto err_hwq;
 	}
 	ring->buf = ring->wqres.buf.direct.buf;
 	*pring = ring;
 	return 0;
 
 err_hwq:
 	mlx4_free_hwq_res(mdev->dev, &ring->wqres, ring->buf_size);
 err_dma_map:
 	for (x = 0; x != size; x++) {
 		bus_dmamap_destroy(ring->dma_tag,
 		    ring->mbuf[x].dma_map);
 	}
 	bus_dmamap_destroy(ring->dma_tag, ring->spare.dma_map);
 err_info:
 	vfree(ring->mbuf);
 err_dma_tag:
 	bus_dma_tag_destroy(ring->dma_tag);
 err_ring:
 	kfree(ring);
 	return (err);
 }
 
 int mlx4_en_activate_rx_rings(struct mlx4_en_priv *priv)
 {
 	struct mlx4_en_rx_ring *ring;
 	int i;
 	int ring_ind;
 	int err;
 	int stride = roundup_pow_of_two(
 	    sizeof(struct mlx4_en_rx_desc) + DS_SIZE);
 
 	for (ring_ind = 0; ring_ind < priv->rx_ring_num; ring_ind++) {
 		ring = priv->rx_ring[ring_ind];
 
 		ring->prod = 0;
 		ring->cons = 0;
 		ring->actual_size = 0;
 		ring->cqn = priv->rx_cq[ring_ind]->mcq.cqn;
                 ring->rx_mb_size = priv->rx_mb_size;
 
 		ring->stride = stride;
 		if (ring->stride <= TXBB_SIZE)
 			ring->buf += TXBB_SIZE;
 
 		ring->log_stride = ffs(ring->stride) - 1;
 		ring->buf_size = ring->size * ring->stride;
 
 		memset(ring->buf, 0, ring->buf_size);
 		mlx4_en_update_rx_prod_db(ring);
 
 		/* Initialize all descriptors */
 		for (i = 0; i < ring->size; i++)
 			mlx4_en_init_rx_desc(priv, ring, i);
 
 #ifdef INET
 		/* Configure lro mngr */
 		if (priv->dev->if_capenable & IFCAP_LRO) {
 			if (tcp_lro_init(&ring->lro))
 				priv->dev->if_capenable &= ~IFCAP_LRO;
 			else
 				ring->lro.ifp = priv->dev;
 		}
 #endif
 	}
 
 
 	err = mlx4_en_fill_rx_buffers(priv);
 	if (err)
 		goto err_buffers;
 
 	for (ring_ind = 0; ring_ind < priv->rx_ring_num; ring_ind++) {
 		ring = priv->rx_ring[ring_ind];
 
 		ring->size_mask = ring->actual_size - 1;
 		mlx4_en_update_rx_prod_db(ring);
 	}
 
 	return 0;
 
 err_buffers:
 	for (ring_ind = 0; ring_ind < priv->rx_ring_num; ring_ind++)
 		mlx4_en_free_rx_buf(priv, priv->rx_ring[ring_ind]);
 
 	ring_ind = priv->rx_ring_num - 1;
 
 	while (ring_ind >= 0) {
 		ring = priv->rx_ring[ring_ind];
 		if (ring->stride <= TXBB_SIZE)
 			ring->buf -= TXBB_SIZE;
 		ring_ind--;
 	}
 
 	return err;
 }
 
 
 void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
 			     struct mlx4_en_rx_ring **pring,
 			     u32 size, u16 stride)
 {
 	struct mlx4_en_dev *mdev = priv->mdev;
 	struct mlx4_en_rx_ring *ring = *pring;
 	uint32_t x;
 
 	mlx4_en_unmap_buffer(&ring->wqres.buf);
 	mlx4_free_hwq_res(mdev->dev, &ring->wqres, size * stride + TXBB_SIZE);
 	for (x = 0; x != size; x++)
 		bus_dmamap_destroy(ring->dma_tag, ring->mbuf[x].dma_map);
 	/* free spare mbuf, if any */
 	if (ring->spare.mbuf != NULL) {
 		bus_dmamap_sync(ring->dma_tag, ring->spare.dma_map,
 		    BUS_DMASYNC_POSTREAD);
 		bus_dmamap_unload(ring->dma_tag, ring->spare.dma_map);
 		m_freem(ring->spare.mbuf);
 	}
 	bus_dmamap_destroy(ring->dma_tag, ring->spare.dma_map);
 	vfree(ring->mbuf);
 	bus_dma_tag_destroy(ring->dma_tag);
 	kfree(ring);
 	*pring = NULL;
 #ifdef CONFIG_RFS_ACCEL
 	mlx4_en_cleanup_filters(priv, ring);
 #endif
 }
 
 void mlx4_en_deactivate_rx_ring(struct mlx4_en_priv *priv,
 				struct mlx4_en_rx_ring *ring)
 {
 #ifdef INET
 	tcp_lro_free(&ring->lro);
 #endif
 	mlx4_en_free_rx_buf(priv, ring);
 	if (ring->stride <= TXBB_SIZE)
 		ring->buf -= TXBB_SIZE;
 }
 
 
 static void validate_loopback(struct mlx4_en_priv *priv, struct mbuf *mb)
 {
 	int i;
 	int offset = ETHER_HDR_LEN;
 
 	for (i = 0; i < MLX4_LOOPBACK_TEST_PAYLOAD; i++, offset++) {
 		if (*(mb->m_data + offset) != (unsigned char) (i & 0xff))
 			goto out_loopback;
 	}
 	/* Loopback found */
 	priv->loopback_ok = 1;
 
 out_loopback:
 	m_freem(mb);
 }
 
 
 static inline int invalid_cqe(struct mlx4_en_priv *priv,
 			      struct mlx4_cqe *cqe)
 {
 	/* Drop packet on bad receive or bad checksum */
 	if (unlikely((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) ==
 		     MLX4_CQE_OPCODE_ERROR)) {
 		en_err(priv, "CQE completed in error - vendor syndrom:%d syndrom:%d\n",
 		       ((struct mlx4_err_cqe *)cqe)->vendor_err_syndrome,
 		       ((struct mlx4_err_cqe *)cqe)->syndrome);
 		return 1;
 	}
 	if (unlikely(cqe->badfcs_enc & MLX4_CQE_BAD_FCS)) {
 		en_dbg(RX_ERR, priv, "Accepted frame with bad FCS\n");
 		return 1;
 	}
 
 	return 0;
 }
 
 static struct mbuf *
 mlx4_en_rx_mb(struct mlx4_en_priv *priv, struct mlx4_en_rx_ring *ring,
     struct mlx4_en_rx_desc *rx_desc, struct mlx4_en_rx_mbuf *mb_list,
     int length)
 {
 	struct mbuf *mb;
 
 	/* get mbuf */
 	mb = mb_list->mbuf;
 
 	/* collect used fragment while atomically replacing it */
 	if (mlx4_en_alloc_buf(ring, &rx_desc->data[0].addr, mb_list))
 		return (NULL);
 
 	/* range check hardware computed value */
 	if (unlikely(length > mb->m_len))
 		length = mb->m_len;
 
 	/* update total packet length in packet header */
 	mb->m_len = mb->m_pkthdr.len = length;
 	return (mb);
 }
 
 /* For cpu arch with cache line of 64B the performance is better when cqe size==64B
  * To enlarge cqe size from 32B to 64B --> 32B of garbage (i.e. 0xccccccc)
  * was added in the beginning of each cqe (the real data is in the corresponding 32B).
  * The following calc ensures that when factor==1, it means we are aligned to 64B
  * and we get the real cqe data*/
 #define CQE_FACTOR_INDEX(index, factor) ((index << factor) + factor)
 int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_cqe *cqe;
 	struct mlx4_en_rx_ring *ring = priv->rx_ring[cq->ring];
 	struct mlx4_en_rx_mbuf *mb_list;
 	struct mlx4_en_rx_desc *rx_desc;
 	struct mbuf *mb;
 	struct mlx4_cq *mcq = &cq->mcq;
 	struct mlx4_cqe *buf = cq->buf;
 	int index;
 	unsigned int length;
 	int polled = 0;
 	u32 cons_index = mcq->cons_index;
 	u32 size_mask = ring->size_mask;
 	int size = cq->size;
 	int factor = priv->cqe_factor;
 
 	if (!priv->port_up)
 		return 0;
 
 	/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
 	 * descriptor offset can be deducted from the CQE index instead of
 	 * reading 'cqe->index' */
 	index = cons_index & size_mask;
 	cqe = &buf[CQE_FACTOR_INDEX(index, factor)];
 
 	/* Process all completed CQEs */
 	while (XNOR(cqe->owner_sr_opcode & MLX4_CQE_OWNER_MASK,
 		    cons_index & size)) {
 		mb_list = ring->mbuf + index;
 		rx_desc = (struct mlx4_en_rx_desc *)
 		    (ring->buf + (index << ring->log_stride));
 
 		/*
 		 * make sure we read the CQE after we read the ownership bit
 		 */
 		rmb();
 
 		if (invalid_cqe(priv, cqe)) {
 			goto next;
 		}
 		/*
 		 * Packet is OK - process it.
 		 */
 		length = be32_to_cpu(cqe->byte_cnt);
 		length -= ring->fcs_del;
 
 		mb = mlx4_en_rx_mb(priv, ring, rx_desc, mb_list, length);
 		if (unlikely(!mb)) {
 			ring->errors++;
 			goto next;
 		}
 
 		ring->bytes += length;
 		ring->packets++;
 
 		if (unlikely(priv->validate_loopback)) {
 			validate_loopback(priv, mb);
 			goto next;
 		}
 
 		/* forward Toeplitz compatible hash value */
 		mb->m_pkthdr.flowid = be32_to_cpu(cqe->immed_rss_invalid);
-		M_HASHTYPE_SET(mb, M_HASHTYPE_OPAQUE);
+		M_HASHTYPE_SET(mb, M_HASHTYPE_OPAQUE_HASH);
 		mb->m_pkthdr.rcvif = dev;
 		if (be32_to_cpu(cqe->vlan_my_qpn) &
 		    MLX4_CQE_VLAN_PRESENT_MASK) {
 			mb->m_pkthdr.ether_vtag = be16_to_cpu(cqe->sl_vid);
 			mb->m_flags |= M_VLANTAG;
 		}
 		if (likely(dev->if_capenable &
 		    (IFCAP_RXCSUM | IFCAP_RXCSUM_IPV6)) &&
 		    (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
 		    (cqe->checksum == cpu_to_be16(0xffff))) {
 			priv->port_stats.rx_chksum_good++;
 			mb->m_pkthdr.csum_flags =
 			    CSUM_IP_CHECKED | CSUM_IP_VALID |
 			    CSUM_DATA_VALID | CSUM_PSEUDO_HDR;
 			mb->m_pkthdr.csum_data = htons(0xffff);
 			/* This packet is eligible for LRO if it is:
 			 * - DIX Ethernet (type interpretation)
 			 * - TCP/IP (v4)
 			 * - without IP options
 			 * - not an IP fragment
 			 */
 #ifdef INET
 			if (mlx4_en_can_lro(cqe->status) &&
 					(dev->if_capenable & IFCAP_LRO)) {
 				if (ring->lro.lro_cnt != 0 &&
 						tcp_lro_rx(&ring->lro, mb, 0) == 0)
 					goto next;
 			}
 
 #endif
 			/* LRO not possible, complete processing here */
 			INC_PERF_COUNTER(priv->pstats.lro_misses);
 		} else {
 			mb->m_pkthdr.csum_flags = 0;
 			priv->port_stats.rx_chksum_none++;
 		}
 
 		/* Push it up the stack */
 		dev->if_input(dev, mb);
 
 next:
 		++cons_index;
 		index = cons_index & size_mask;
 		cqe = &buf[CQE_FACTOR_INDEX(index, factor)];
 		if (++polled == budget)
 			goto out;
 	}
 	/* Flush all pending IP reassembly sessions */
 out:
 #ifdef INET
 	tcp_lro_flush_all(&ring->lro);
 #endif
 	AVG_PERF_COUNTER(priv->pstats.rx_coal_avg, polled);
 	mcq->cons_index = cons_index;
 	mlx4_cq_set_ci(mcq);
 	wmb(); /* ensure HW sees CQ consumer before we post new buffers */
 	ring->cons = mcq->cons_index;
 	ring->prod += polled; /* Polled descriptors were realocated in place */
 	mlx4_en_update_rx_prod_db(ring);
 	return polled;
 
 }
 
 /* Rx CQ polling - called by NAPI */
 static int mlx4_en_poll_rx_cq(struct mlx4_en_cq *cq, int budget)
 {
         struct net_device *dev = cq->dev;
         int done;
 
         done = mlx4_en_process_rx_cq(dev, cq, budget);
         cq->tot_rx += done;
 
         return done;
 
 }
 void mlx4_en_rx_irq(struct mlx4_cq *mcq)
 {
 	struct mlx4_en_cq *cq = container_of(mcq, struct mlx4_en_cq, mcq);
 	struct mlx4_en_priv *priv = netdev_priv(cq->dev);
         int done;
 
         // Shoot one within the irq context 
         // Because there is no NAPI in freeBSD
         done = mlx4_en_poll_rx_cq(cq, MLX4_EN_RX_BUDGET);
 	if (priv->port_up  && (done == MLX4_EN_RX_BUDGET) ) {
 		cq->curr_poll_rx_cpu_id = curcpu;
 		taskqueue_enqueue(cq->tq, &cq->cq_task);
         }
 	else {
 		mlx4_en_arm_cq(priv, cq);
 	}
 }
 
 void mlx4_en_rx_que(void *context, int pending)
 {
         struct mlx4_en_cq *cq;
 	struct thread *td;
 
         cq = context;
 	td = curthread;
 
 	thread_lock(td);
 	sched_bind(td, cq->curr_poll_rx_cpu_id);
 	thread_unlock(td);
 
         while (mlx4_en_poll_rx_cq(cq, MLX4_EN_RX_BUDGET)
                         == MLX4_EN_RX_BUDGET);
         mlx4_en_arm_cq(cq->dev->if_softc, cq);
 }
 
 
 /* RSS related functions */
 
 static int mlx4_en_config_rss_qp(struct mlx4_en_priv *priv, int qpn,
 				 struct mlx4_en_rx_ring *ring,
 				 enum mlx4_qp_state *state,
 				 struct mlx4_qp *qp)
 {
 	struct mlx4_en_dev *mdev = priv->mdev;
 	struct mlx4_qp_context *context;
 	int err = 0;
 
 	context = kmalloc(sizeof *context , GFP_KERNEL);
 	if (!context) {
 		en_err(priv, "Failed to allocate qp context\n");
 		return -ENOMEM;
 	}
 
 	err = mlx4_qp_alloc(mdev->dev, qpn, qp);
 	if (err) {
 		en_err(priv, "Failed to allocate qp #%x\n", qpn);
 		goto out;
 	}
 	qp->event = mlx4_en_sqp_event;
 
 	memset(context, 0, sizeof *context);
 	mlx4_en_fill_qp_context(priv, ring->actual_size, ring->stride, 0, 0,
 				qpn, ring->cqn, -1, context);
 	context->db_rec_addr = cpu_to_be64(ring->wqres.db.dma);
 
 	/* Cancel FCS removal if FW allows */
 	if (mdev->dev->caps.flags & MLX4_DEV_CAP_FLAG_FCS_KEEP) {
 		context->param3 |= cpu_to_be32(1 << 29);
 		ring->fcs_del = ETH_FCS_LEN;
 	} else
 		ring->fcs_del = 0;
 
 	err = mlx4_qp_to_ready(mdev->dev, &ring->wqres.mtt, context, qp, state);
 	if (err) {
 		mlx4_qp_remove(mdev->dev, qp);
 		mlx4_qp_free(mdev->dev, qp);
 	}
 	mlx4_en_update_rx_prod_db(ring);
 out:
 	kfree(context);
 	return err;
 }
 
 int mlx4_en_create_drop_qp(struct mlx4_en_priv *priv)
 {
 	int err;
 	u32 qpn;
 
 	err = mlx4_qp_reserve_range(priv->mdev->dev, 1, 1, &qpn, 0);
 	if (err) {
 		en_err(priv, "Failed reserving drop qpn\n");
 		return err;
 	}
 	err = mlx4_qp_alloc(priv->mdev->dev, qpn, &priv->drop_qp);
 	if (err) {
 		en_err(priv, "Failed allocating drop qp\n");
 		mlx4_qp_release_range(priv->mdev->dev, qpn, 1);
 		return err;
 	}
 
 	return 0;
 }
 
 void mlx4_en_destroy_drop_qp(struct mlx4_en_priv *priv)
 {
 	u32 qpn;
 
 	qpn = priv->drop_qp.qpn;
 	mlx4_qp_remove(priv->mdev->dev, &priv->drop_qp);
 	mlx4_qp_free(priv->mdev->dev, &priv->drop_qp);
 	mlx4_qp_release_range(priv->mdev->dev, qpn, 1);
 }
 
 /* Allocate rx qp's and configure them according to rss map */
 int mlx4_en_config_rss_steer(struct mlx4_en_priv *priv)
 {
 	struct mlx4_en_dev *mdev = priv->mdev;
 	struct mlx4_en_rss_map *rss_map = &priv->rss_map;
 	struct mlx4_qp_context context;
 	struct mlx4_rss_context *rss_context;
 	int rss_rings;
 	void *ptr;
 	u8 rss_mask = (MLX4_RSS_IPV4 | MLX4_RSS_TCP_IPV4 | MLX4_RSS_IPV6 |
 			MLX4_RSS_TCP_IPV6);
 	int i;
 	int err = 0;
 	int good_qps = 0;
 	static const u32 rsskey[10] = { 0xD181C62C, 0xF7F4DB5B, 0x1983A2FC,
 				0x943E1ADB, 0xD9389E6B, 0xD1039C2C, 0xA74499AD,
 				0x593D56D9, 0xF3253C06, 0x2ADC1FFC};
 
 	en_dbg(DRV, priv, "Configuring rss steering\n");
 	err = mlx4_qp_reserve_range(mdev->dev, priv->rx_ring_num,
 				    priv->rx_ring_num,
 				    &rss_map->base_qpn, 0);
 	if (err) {
 		en_err(priv, "Failed reserving %d qps\n", priv->rx_ring_num);
 		return err;
 	}
 
 	for (i = 0; i < priv->rx_ring_num; i++) {
 		priv->rx_ring[i]->qpn = rss_map->base_qpn + i;
 		err = mlx4_en_config_rss_qp(priv, priv->rx_ring[i]->qpn,
 					    priv->rx_ring[i],
 					    &rss_map->state[i],
 					    &rss_map->qps[i]);
 		if (err)
 			goto rss_err;
 
 		++good_qps;
 	}
 
 	/* Configure RSS indirection qp */
 	err = mlx4_qp_alloc(mdev->dev, priv->base_qpn, &rss_map->indir_qp);
 	if (err) {
 		en_err(priv, "Failed to allocate RSS indirection QP\n");
 		goto rss_err;
 	}
 	rss_map->indir_qp.event = mlx4_en_sqp_event;
 	mlx4_en_fill_qp_context(priv, 0, 0, 0, 1, priv->base_qpn,
 				priv->rx_ring[0]->cqn, -1, &context);
 
 	if (!priv->prof->rss_rings || priv->prof->rss_rings > priv->rx_ring_num)
 		rss_rings = priv->rx_ring_num;
 	else
 		rss_rings = priv->prof->rss_rings;
 
 	ptr = ((u8 *)&context) + offsetof(struct mlx4_qp_context, pri_path) +
 	    MLX4_RSS_OFFSET_IN_QPC_PRI_PATH;
 	rss_context = ptr;
 	rss_context->base_qpn = cpu_to_be32(ilog2(rss_rings) << 24 |
 					    (rss_map->base_qpn));
 	rss_context->default_qpn = cpu_to_be32(rss_map->base_qpn);
 	if (priv->mdev->profile.udp_rss) {
 		rss_mask |=  MLX4_RSS_UDP_IPV4 | MLX4_RSS_UDP_IPV6;
 		rss_context->base_qpn_udp = rss_context->default_qpn;
 	}
 	rss_context->flags = rss_mask;
 	rss_context->hash_fn = MLX4_RSS_HASH_TOP;
 	for (i = 0; i < 10; i++)
 		rss_context->rss_key[i] = cpu_to_be32(rsskey[i]);
 
 	err = mlx4_qp_to_ready(mdev->dev, &priv->res.mtt, &context,
 			       &rss_map->indir_qp, &rss_map->indir_state);
 	if (err)
 		goto indir_err;
 
 	return 0;
 
 indir_err:
 	mlx4_qp_modify(mdev->dev, NULL, rss_map->indir_state,
 		       MLX4_QP_STATE_RST, NULL, 0, 0, &rss_map->indir_qp);
 	mlx4_qp_remove(mdev->dev, &rss_map->indir_qp);
 	mlx4_qp_free(mdev->dev, &rss_map->indir_qp);
 rss_err:
 	for (i = 0; i < good_qps; i++) {
 		mlx4_qp_modify(mdev->dev, NULL, rss_map->state[i],
 			       MLX4_QP_STATE_RST, NULL, 0, 0, &rss_map->qps[i]);
 		mlx4_qp_remove(mdev->dev, &rss_map->qps[i]);
 		mlx4_qp_free(mdev->dev, &rss_map->qps[i]);
 	}
 	mlx4_qp_release_range(mdev->dev, rss_map->base_qpn, priv->rx_ring_num);
 	return err;
 }
 
 void mlx4_en_release_rss_steer(struct mlx4_en_priv *priv)
 {
 	struct mlx4_en_dev *mdev = priv->mdev;
 	struct mlx4_en_rss_map *rss_map = &priv->rss_map;
 	int i;
 
 	mlx4_qp_modify(mdev->dev, NULL, rss_map->indir_state,
 		       MLX4_QP_STATE_RST, NULL, 0, 0, &rss_map->indir_qp);
 	mlx4_qp_remove(mdev->dev, &rss_map->indir_qp);
 	mlx4_qp_free(mdev->dev, &rss_map->indir_qp);
 
 	for (i = 0; i < priv->rx_ring_num; i++) {
 		mlx4_qp_modify(mdev->dev, NULL, rss_map->state[i],
 			       MLX4_QP_STATE_RST, NULL, 0, 0, &rss_map->qps[i]);
 		mlx4_qp_remove(mdev->dev, &rss_map->qps[i]);
 		mlx4_qp_free(mdev->dev, &rss_map->qps[i]);
 	}
 	mlx4_qp_release_range(mdev->dev, rss_map->base_qpn, priv->rx_ring_num);
 }
 
Index: projects/vnet/sys/sys/intr.h
===================================================================
--- projects/vnet/sys/sys/intr.h	(revision 301546)
+++ projects/vnet/sys/sys/intr.h	(revision 301547)
@@ -1,153 +1,129 @@
 /*-
  * Copyright (c) 2015-2016 Svatopluk Kraus
  * Copyright (c) 2015-2016 Michal Meloun
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 
 #ifndef _SYS_INTR_H_
 #define _SYS_INTR_H_
 
 #include <sys/systm.h>
 
 #define	INTR_IRQ_INVALID	0xFFFFFFFF
 
-#ifdef DEV_ACPI
-struct intr_map_data_acpi {
-	struct intr_map_data	hdr;
-	u_int			irq;
-	enum intr_polarity	pol;
-	enum intr_trigger	trig;
-};
-#endif
-
-struct intr_map_data_gpio {
-	struct intr_map_data	hdr;
-	u_int			gpio_pin_num;
-	u_int			gpio_pin_flags;
-	u_int		 	gpio_intr_mode;
-};
-
 #ifdef notyet
 #define	INTR_SOLO	INTR_MD1
 typedef int intr_irq_filter_t(void *arg, struct trapframe *tf);
 #else
 typedef int intr_irq_filter_t(void *arg);
 #endif
 typedef int intr_child_irq_filter_t(void *arg, uintptr_t irq);
 
 #define INTR_ISRC_NAMELEN	(MAXCOMLEN + 1)
 
 #define INTR_ISRCF_IPI		0x01	/* IPI interrupt */
 #define INTR_ISRCF_PPI		0x02	/* PPI interrupt */
 #define INTR_ISRCF_BOUND	0x04	/* bound to a CPU */
 
 struct intr_pic;
 
 /* Interrupt source definition. */
 struct intr_irqsrc {
 	device_t		isrc_dev;	/* where isrc is mapped */
 	u_int			isrc_irq;	/* unique identificator */
 	u_int			isrc_flags;
 	char			isrc_name[INTR_ISRC_NAMELEN];
 	cpuset_t		isrc_cpu;	/* on which CPUs is enabled */
 	u_int			isrc_index;
 	u_long *		isrc_count;
 	u_int			isrc_handlers;
 	struct intr_event *	isrc_event;
 #ifdef INTR_SOLO
 	intr_irq_filter_t *	isrc_filter;
 	void *			isrc_arg;
 #endif
 };
 
 /* Intr interface for PIC. */
 int intr_isrc_deregister(struct intr_irqsrc *);
 int intr_isrc_register(struct intr_irqsrc *, device_t, u_int, const char *, ...)
     __printflike(4, 5);
 
 #ifdef SMP
 bool intr_isrc_init_on_cpu(struct intr_irqsrc *isrc, u_int cpu);
 #endif
 
 int intr_isrc_dispatch(struct intr_irqsrc *, struct trapframe *);
 u_int intr_irq_next_cpu(u_int current_cpu, cpuset_t *cpumask);
 
 struct intr_pic *intr_pic_register(device_t, intptr_t);
 int intr_pic_deregister(device_t, intptr_t);
 int intr_pic_claim_root(device_t, intptr_t, intr_irq_filter_t *, void *, u_int);
 struct intr_pic *intr_pic_add_handler(device_t, struct intr_pic *,
     intr_child_irq_filter_t *, void *, uintptr_t, uintptr_t);
 
 extern device_t intr_irq_root_dev;
 
 /* Intr interface for BUS. */
 int intr_map_irq(device_t, intptr_t, struct intr_map_data *, u_int *);
 
 int intr_alloc_irq(device_t, struct resource *);
 int intr_release_irq(device_t, struct resource *);
 
 int intr_setup_irq(device_t, struct resource *, driver_filter_t, driver_intr_t,
     void *, int, void **);
 int intr_teardown_irq(device_t, struct resource *, void *);
 
 int intr_describe_irq(device_t, struct resource *, void *, const char *);
 int intr_child_irq_handler(struct intr_pic *, uintptr_t);
 
 /* MSI/MSI-X handling */
 int intr_msi_register(device_t, intptr_t);
 int intr_alloc_msi(device_t, device_t, intptr_t, int, int, int *);
 int intr_release_msi(device_t, device_t, intptr_t, int, int *);
 int intr_map_msi(device_t, device_t, intptr_t, int, uint64_t *, uint32_t *);
 int intr_alloc_msix(device_t, device_t, intptr_t, int *);
 int intr_release_msix(device_t, device_t, intptr_t, int);
-
-#ifdef DEV_ACPI
-u_int intr_acpi_map_irq(device_t, u_int, enum intr_polarity,
-    enum intr_trigger);
-#endif
-
-u_int intr_gpio_map_irq(device_t dev, u_int pin_num, u_int pin_flags,
-    u_int intr_mode);
 
 #ifdef SMP
 int intr_bind_irq(device_t, struct resource *, int);
 
 void intr_pic_init_secondary(void);
 
 /* Virtualization for interrupt source IPI counter increment. */
 static inline void
 intr_ipi_increment_count(u_long *counter, u_int cpu)
 {
 
 	KASSERT(cpu < MAXCPU, ("%s: too big cpu %u", __func__, cpu));
 	counter[cpu]++;
 }
 
 /* Virtualization for interrupt source IPI counters setup. */
 u_long * intr_ipi_setup_counters(const char *name);
 
 #endif
 #endif	/* _SYS_INTR_H */
Index: projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3kfw.c
===================================================================
--- projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3kfw.c	(revision 301546)
+++ projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3kfw.c	(nonexistent)
@@ -1,297 +0,0 @@
-/*
- * ath3kfw.c
- */
-
-/*-
- * Copyright (c) 2010 Maksim Yevmenkin <m_evmenkin@yahoo.com>
- * All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- * 1. Redistributions of source code must retain the above copyright
- *    notice, this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright
- *    notice, this list of conditions and the following disclaimer in the
- *    documentation and/or other materials provided with the distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
- * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
- * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
- * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
- * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
- * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- * SUCH DAMAGE.
- *
- * $FreeBSD$
- */
-
-#include <sys/types.h>
-#include <errno.h>
-#include <fcntl.h>
-#include <libusb20_desc.h>
-#include <libusb20.h>
-#include <stdarg.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <syslog.h>
-#include <unistd.h>
-
-#define ATH3KFW			"ath3kfw"
-#define ATH3KFW_VENDOR_ID	0x0cf3
-#define ATH3KFW_PRODUCT_ID	0x3000
-#define ATH3KFW_FW		"/usr/local/etc/ath3k-1.fw"
-#define ATH3KFW_BULK_EP		0x02
-#define	ATH3KFW_REQ_DFU_DNLOAD	1
-#define	ATH3KFW_MAX_BSIZE	4096
-
-static int	parse_ugen_name		(char const *ugen, uint8_t *bus,
-					 uint8_t *addr);
-static int	find_device		(struct libusb20_backend *be,
-					 uint8_t bus, uint8_t addr,
-					 struct libusb20_device **dev);
-static int	download_firmware	(struct libusb20_device *dev,
-					 char const *firmware);
-static void	usage			(void);
-
-static int			vendor_id = ATH3KFW_VENDOR_ID;
-static int			product_id = ATH3KFW_PRODUCT_ID;
-
-/*
- * Firmware downloader for Atheros AR3011 based USB Bluetooth devices
- */
-
-int
-main(int argc, char **argv)
-{
-	uint8_t			bus, addr;
-	char const		*firmware;
-	struct libusb20_backend	*be;
-	struct libusb20_device	*dev;
-	int			n;
-
-	openlog(ATH3KFW, LOG_NDELAY|LOG_PERROR|LOG_PID, LOG_USER);
-
-	bus = 0;
-	addr = 0;
-	firmware = ATH3KFW_FW;
-
-	while ((n = getopt(argc, argv, "d:f:hp:v:")) != -1) {
-		switch (n) {
-		case 'd': /* ugen device name */
-			if (parse_ugen_name(optarg, &bus, &addr) < 0)
-				usage();
-			break;
-
-		case 'f': /* firmware file */
-			firmware = optarg;
-			break;
-		case 'p': /* product id */
-			product_id = strtol(optarg, NULL, 0);
-			break;
-		case 'v': /* vendor id */
-			vendor_id = strtol(optarg, NULL, 0);
-			break;
-		case 'h':
-		default:
-			usage();
-			break;
-			/* NOT REACHED */
-		}
-	}
-
-	be = libusb20_be_alloc_default();
-	if (be == NULL) {
-		syslog(LOG_ERR, "libusb20_be_alloc_default() failed");
-		return (-1);
-	}
-
-	if (find_device(be, bus, addr, &dev) < 0) {
-		syslog(LOG_ERR, "ugen%d.%d is not recognized as " \
-			"Atheros AR3011 based device " \
-			"(possibly caused by lack of permissions)", bus, addr);
-		return (-1);
-	}
-
-	if (download_firmware(dev, firmware) < 0) {
-		syslog(LOG_ERR, "could not download %s firmare to ugen%d.%d",
-			firmware, bus, addr);
-		return (-1);
-	}
-
-	libusb20_be_free(be);
-	closelog();
-	
-	return (0);
-}
-
-/*
- * Parse ugen name and extract device's bus and address
- */
-
-static int
-parse_ugen_name(char const *ugen, uint8_t *bus, uint8_t *addr)
-{
-	char	*ep;
-
-	if (strncmp(ugen, "ugen", 4) != 0)
-		return (-1);
-
-	*bus = (uint8_t) strtoul(ugen + 4, &ep, 10);
-	if (*ep != '.')
-		return (-1);
-
-	*addr = (uint8_t) strtoul(ep + 1, &ep, 10);
-	if (*ep != '\0')
-		return (-1);
-
-	return (0);
-}
-
-/*
- * Find USB device
- */
-
-static int
-find_device(struct libusb20_backend *be, uint8_t bus, uint8_t addr,
-		struct libusb20_device **dev)
-{
-	struct LIBUSB20_DEVICE_DESC_DECODED	*desc;
-
-	*dev = NULL;
-
-	while ((*dev = libusb20_be_device_foreach(be, *dev)) != NULL) {
-		if (libusb20_dev_get_bus_number(*dev) != bus ||
-		    libusb20_dev_get_address(*dev) != addr)
-			continue;
-
-		desc = libusb20_dev_get_device_desc(*dev);
-		if (desc == NULL)
-			continue;
-
-		if (desc->idVendor != vendor_id ||
-		    desc->idProduct != product_id)
-			continue;
-
-		break;
-	}
-
-	return ((*dev == NULL)? -1 : 0);
-}
-
-/*
- * Download firmware
- */
-
-static int
-download_firmware(struct libusb20_device *dev, char const *firmware)
-{
-	struct libusb20_transfer		*bulk;
-	struct LIBUSB20_CONTROL_SETUP_DECODED	req;
-	int					fd, n, error;
-	uint8_t					buf[ATH3KFW_MAX_BSIZE];
-
-	error = -1;
-
-	if (libusb20_dev_open(dev, 1) != 0) {
-		syslog(LOG_ERR, "libusb20_dev_open() failed");
-		return (error);
-	}
-
-	if ((bulk = libusb20_tr_get_pointer(dev, 0)) == NULL) {
-		syslog(LOG_ERR, "libusb20_tr_get_pointer() failed");
-		goto out;
-	}
-
-	if (libusb20_tr_open(bulk, ATH3KFW_MAX_BSIZE, 1, ATH3KFW_BULK_EP) != 0) {
-		syslog(LOG_ERR, "libusb20_tr_open(%d, 1, %d) failed",
-			ATH3KFW_MAX_BSIZE, ATH3KFW_BULK_EP);
-		goto out;
-	}
-
-	if ((fd = open(firmware, O_RDONLY)) < 0) {
-		syslog(LOG_ERR, "open(%s) failed. %s",
-			firmware, strerror(errno));
-		goto out1;
-	}
-
-	n = read(fd, buf, 20);
-	if (n != 20) {
-		syslog(LOG_ERR, "read(%s, 20) failed. %s",
-			firmware, strerror(errno));
-		goto out2;
-	}
-
-	LIBUSB20_INIT(LIBUSB20_CONTROL_SETUP, &req);
-	req.bmRequestType = LIBUSB20_REQUEST_TYPE_VENDOR;
-	req.bRequest = ATH3KFW_REQ_DFU_DNLOAD;
-	req.wLength = 20;
-
-	if (libusb20_dev_request_sync(dev, &req, buf, NULL, 5000, 0) != 0) {
-		syslog(LOG_ERR, "libusb20_dev_request_sync() failed");
-		goto out2;
-	}
-
-	for (;;) {
-		n = read(fd, buf, sizeof(buf));
-		if (n < 0) {
-			syslog(LOG_ERR, "read(%s, %d) failed. %s",
-				firmware, (int) sizeof(buf), strerror(errno));
-			goto out2;
-		}
-		if (n == 0)
-			break;
-
-		libusb20_tr_setup_bulk(bulk, buf, n, 3000);
-		libusb20_tr_start(bulk);
-
-		while (libusb20_dev_process(dev) == 0) {
-			if (libusb20_tr_pending(bulk) == 0)
-				break;
-
-			libusb20_dev_wait_process(dev, -1);
-		}
-
-		if (libusb20_tr_get_status(bulk) != 0) {
-			syslog(LOG_ERR, "bulk transfer failed with status %d",
-				libusb20_tr_get_status(bulk));
-			goto out2;
-		}
-	}
-
-	error = 0;
-out2:
-	close(fd);
-out1:
-	libusb20_tr_close(bulk);
-out:
-	libusb20_dev_close(dev);
-
-	return (error);
-}
-
-/*
- * Display usage and exit
- */
-
-static void
-usage(void)
-{
-	printf(
-"Usage: %s -d ugenX.Y -f firmware_file\n"
-"Usage: %s -h\n" \
-"Where:\n" \
-"\t-d ugenX.Y           ugen device name\n" \
-"\t-f firmware image    firmware image file name for download\n" \
-"\t-v vendor_id         vendor id\n" \
-"\t-p vendor_id         product id\n" \
-"\t-h                   display this message\n", ATH3KFW, ATH3KFW);
-
-        exit(255);
-}
-

Property changes on: projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3kfw.c
___________________________________________________________________
Deleted: svn:keywords
## -1 +0,0 ##
-FreeBSD=%H
\ No newline at end of property
Index: projects/vnet/usr.sbin/bluetooth/ath3kfw/Makefile
===================================================================
--- projects/vnet/usr.sbin/bluetooth/ath3kfw/Makefile	(revision 301546)
+++ projects/vnet/usr.sbin/bluetooth/ath3kfw/Makefile	(revision 301547)
@@ -1,7 +1,8 @@
 # $FreeBSD$
 
 PROG=		ath3kfw
 MAN=		ath3kfw.8
 LIBADD+=	usb
+SRCS=		main.c ath3k_fw.c ath3k_hw.c
 
 .include <bsd.prog.mk>
Index: projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_dbg.h
===================================================================
--- projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_dbg.h	(nonexistent)
+++ projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_dbg.h	(revision 301547)
@@ -0,0 +1,41 @@
+/*-
+ * Copyright (c) 2013 Adrian Chadd <adrian@freebsd.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    similar to the "NO WARRANTY" disclaimer below ("Disclaimer") and any
+ *    redistribution must be conditioned upon including a substantially
+ *    similar Disclaimer requirement for further binary redistribution.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTIBILITY
+ * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY,
+ * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
+ * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGES.
+ *
+ * $FreeBSD$
+ */
+#ifndef	__ATH3K_DEBUG_H__
+#define	__ATH3K_DEBUG_H__
+
+extern	int ath3k_do_debug;
+extern	int ath3k_do_info;
+
+#define	ath3k_debug(...)	do { if (ath3k_do_debug) fprintf(stderr, __VA_ARGS__); } while (0)
+#define	ath3k_err(...)		fprintf(stderr, __VA_ARGS__)
+#define	ath3k_info(...)		do { if (ath3k_do_info) fprintf(stdout, __VA_ARGS__); } while (0)
+
+#endif

Property changes on: projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_dbg.h
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_fw.c
===================================================================
--- projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_fw.c	(nonexistent)
+++ projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_fw.c	(revision 301547)
@@ -0,0 +1,114 @@
+/*-
+ * Copyright (c) 2013 Adrian Chadd <adrian@freebsd.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    similar to the "NO WARRANTY" disclaimer below ("Disclaimer") and any
+ *    redistribution must be conditioned upon including a substantially
+ *    similar Disclaimer requirement for further binary redistribution.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTIBILITY
+ * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY,
+ * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
+ * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGES.
+ *
+ * $FreeBSD$
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <string.h>
+#include <err.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "ath3k_fw.h"
+#include "ath3k_dbg.h"
+
+int
+ath3k_fw_read(struct ath3k_firmware *fw, const char *fwname)
+{
+	int fd;
+	struct stat sb;
+	unsigned char *buf;
+	ssize_t r;
+	int i;
+
+	fd = open(fwname, O_RDONLY);
+	if (fd < 0) {
+		warn("%s: open: %s", __func__, fwname);
+		return (0);
+	}
+
+	if (fstat(fd, &sb) != 0) {
+		warn("%s: stat: %s", __func__, fwname);
+		close(fd);
+		return (0);
+	}
+	
+	buf = calloc(1, sb.st_size);
+	if (buf == NULL) {
+		warn("%s: calloc", __func__);
+		close(fd);
+		return (0);
+	}
+
+	i = 0;
+	/* XXX handle partial reads */
+	r = read(fd, buf, sb.st_size);
+	if (r < 0) {
+		warn("%s: read", __func__);
+		free(buf);
+		close(fd);
+		return (0);
+	}
+
+	if (r != sb.st_size) {
+		fprintf(stderr, "%s: read len %d != file size %d\n",
+		    __func__,
+		    (int) r,
+		    (int) sb.st_size);
+		free(buf);
+		close(fd);
+		return (0);
+	}
+
+	/* We have everything, so! */
+
+	bzero(fw, sizeof(*fw));
+
+	fw->fwname = strdup(fwname);
+	fw->len = sb.st_size;
+	fw->size = sb.st_size;
+	fw->buf = buf;
+
+	close(fd);
+	return (1);
+}
+
+void
+ath3k_fw_free(struct ath3k_firmware *fw)
+{
+	if (fw->fwname)
+		free(fw->fwname);
+	if (fw->buf)
+		free(fw->buf);
+	bzero(fw, sizeof(*fw));
+}

Property changes on: projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_fw.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_fw.h
===================================================================
--- projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_fw.h	(nonexistent)
+++ projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_fw.h	(revision 301547)
@@ -0,0 +1,56 @@
+/*-
+ * Copyright (c) 2013 Adrian Chadd <adrian@freebsd.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    similar to the "NO WARRANTY" disclaimer below ("Disclaimer") and any
+ *    redistribution must be conditioned upon including a substantially
+ *    similar Disclaimer requirement for further binary redistribution.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTIBILITY
+ * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY,
+ * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
+ * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGES.
+ *
+ * $FreeBSD$
+ */
+#ifndef	__ATH3K_FW_H__
+#define	__ATH3K_FW_H__
+
+/*
+ * XXX TODO: ensure that the endian-ness of this stuff is
+ * correct!
+ */
+struct ath3k_version {
+	unsigned int	rom_version;
+	unsigned int	build_version;
+	unsigned int	ram_version;
+	unsigned char	ref_clock;
+	unsigned char	reserved[0x07];
+};
+
+struct ath3k_firmware {
+	char *fwname;
+	int len;		/* firmware length */
+	int size;		/* buffer size */
+	unsigned char *buf;
+};
+
+extern	int ath3k_fw_read(struct ath3k_firmware *fw, const char *fwname);
+extern	void ath3k_fw_free(struct ath3k_firmware *fw);
+
+#endif

Property changes on: projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_fw.h
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_hw.c
===================================================================
--- projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_hw.c	(nonexistent)
+++ projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_hw.c	(revision 301547)
@@ -0,0 +1,358 @@
+/*-
+ * Copyright (c) 2013 Adrian Chadd <adrian@freebsd.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    similar to the "NO WARRANTY" disclaimer below ("Disclaimer") and any
+ *    redistribution must be conditioned upon including a substantially
+ *    similar Disclaimer requirement for further binary redistribution.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTIBILITY
+ * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY,
+ * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
+ * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGES.
+ *
+ * $FreeBSD$
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <string.h>
+#include <err.h>
+#include <fcntl.h>
+#include <sys/endian.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <libusb.h>
+
+#include "ath3k_fw.h"
+#include "ath3k_hw.h"
+#include "ath3k_dbg.h"
+
+#define	XMIN(x, y)	((x) < (y) ? (x) : (y))
+
+int
+ath3k_load_fwfile(struct libusb_device_handle *hdl,
+    const struct ath3k_firmware *fw)
+{
+	int size, count, sent = 0;
+	int ret, r;
+
+	count = fw->len;
+
+	size = XMIN(count, FW_HDR_SIZE);
+
+	ath3k_debug("%s: file=%s, size=%d\n",
+	    __func__, fw->fwname, count);
+
+	/*
+	 * Flip the device over to configuration mode.
+	 */
+	ret = libusb_control_transfer(hdl,
+	    LIBUSB_REQUEST_TYPE_VENDOR | LIBUSB_ENDPOINT_OUT,
+	    ATH3K_DNLOAD,
+	    0,
+	    0,
+	    fw->buf + sent,
+	    size,
+	    1000);	/* XXX timeout */
+
+	if (ret != size) {
+		fprintf(stderr, "Can't switch to config mode; ret=%d\n",
+		    ret);
+		return (-1);
+	}
+
+	sent += size;
+	count -= size;
+
+	/* Load in the rest of the data */
+	while (count) {
+		size = XMIN(count, BULK_SIZE);
+		ath3k_debug("%s: transferring %d bytes, offset %d\n",
+		    __func__,
+		    size,
+		    sent);
+
+		ret = libusb_bulk_transfer(hdl,
+		    0x2,
+		    fw->buf + sent,
+		    size,
+		    &r,
+		    1000);
+
+		if (ret < 0 || r != size) {
+			fprintf(stderr, "Can't load firmware: err=%s, size=%d\n",
+			    libusb_strerror(ret),
+			    size);
+			return (-1);
+		}
+		sent  += size;
+		count -= size;
+	}
+	return (0);
+}
+
+int
+ath3k_get_state(struct libusb_device_handle *hdl, unsigned char *state)
+{
+	int ret;
+
+	ret = libusb_control_transfer(hdl,
+	    LIBUSB_REQUEST_TYPE_VENDOR | LIBUSB_ENDPOINT_IN,
+	    ATH3K_GETSTATE,
+	    0,
+	    0,
+	    state,
+	    1,
+	    1000);	/* XXX timeout */
+
+	if (ret < 0) {
+		fprintf(stderr,
+		    "%s: libusb_control_transfer() failed: code=%d\n",
+		    __func__,
+		    ret);
+		return (0);
+	}
+
+	return (ret == 1);
+}
+
+int
+ath3k_get_version(struct libusb_device_handle *hdl,
+    struct ath3k_version *version)
+{
+	int ret;
+
+	ret = libusb_control_transfer(hdl,
+	    LIBUSB_REQUEST_TYPE_VENDOR | LIBUSB_ENDPOINT_IN,
+	    ATH3K_GETVERSION,
+	    0,
+	    0,
+	    (unsigned char *) version,
+	    sizeof(struct ath3k_version),
+	    1000);	/* XXX timeout */
+
+	if (ret < 0) {
+		fprintf(stderr,
+		    "%s: libusb_control_transfer() failed: code=%d\n",
+		    __func__,
+		    ret);
+		return (0);
+	}
+
+	/* XXX endian fix! */
+
+	return (ret == sizeof(struct ath3k_version));
+}
+
+int
+ath3k_load_patch(libusb_device_handle *hdl, const char *fw_path)
+{
+	int ret;
+	unsigned char fw_state;
+	struct ath3k_version fw_ver, pt_ver;
+	char fwname[FILENAME_MAX];
+	struct ath3k_firmware fw;
+	uint32_t tmp;
+
+	ret = ath3k_get_state(hdl, &fw_state);
+	if (ret <= 0) {
+		ath3k_err("%s: Can't get state\n", __func__);
+		return (-1);
+	}
+
+	if (fw_state & ATH3K_PATCH_UPDATE) {
+		ath3k_info("%s: Patch already downloaded\n",
+		    __func__);
+		return (0);
+	}
+
+	ret = ath3k_get_version(hdl, &fw_ver);
+	if (ret <= 0) {
+		ath3k_debug("%s: Can't get version\n", __func__);
+		return (-1);
+	}
+
+	/* XXX path info? */
+	snprintf(fwname, FILENAME_MAX, "%s/ar3k/AthrBT_0x%08x.dfu",
+	    fw_path,
+	    fw_ver.rom_version);
+
+	/* Read in the firmware */
+	if (ath3k_fw_read(&fw, fwname) <= 0) {
+		ath3k_debug("%s: ath3k_fw_read() failed\n",
+		    __func__);
+		return (-1);
+	}
+
+	/*
+	 * Extract the ROM/build version from the patch file.
+	 */
+	memcpy(&tmp, fw.buf + fw.len - 8, sizeof(tmp));
+	pt_ver.rom_version = le32toh(tmp);
+	memcpy(&tmp, fw.buf + fw.len - 4, sizeof(tmp));
+	pt_ver.build_version = le32toh(tmp);
+
+	ath3k_info("%s: file %s: rom_ver=%d, build_ver=%d\n",
+	    __func__,
+	    fwname,
+	    (int) pt_ver.rom_version,
+	    (int) pt_ver.build_version);
+
+	/* Check the ROM/build version against the firmware */
+	if ((pt_ver.rom_version != fw_ver.rom_version) ||
+	    (pt_ver.build_version <= fw_ver.build_version)) {
+		ath3k_debug("Patch file version mismatch!\n");
+		ath3k_fw_free(&fw);
+		return (-1);
+	}
+
+	/* Load in the firmware */
+	ret = ath3k_load_fwfile(hdl, &fw);
+
+	/* free it */
+	ath3k_fw_free(&fw);
+
+	return (ret);
+}
+
+int
+ath3k_load_syscfg(libusb_device_handle *hdl, const char *fw_path)
+{
+	unsigned char fw_state;
+	char filename[FILENAME_MAX];
+	struct ath3k_firmware fw;
+	struct ath3k_version fw_ver;
+	int clk_value, ret;
+
+	ret = ath3k_get_state(hdl, &fw_state);
+	if (ret <= 0) {
+		ath3k_err("Can't get state before loading configuration\n");
+		return (-EBUSY);
+	}
+
+	ret = ath3k_get_version(hdl, &fw_ver);
+	if (ret <= 0) {
+		ath3k_err("Can't get version before loading configuration\n");
+		return (-1);
+	}
+
+	switch (fw_ver.ref_clock) {
+	case ATH3K_XTAL_FREQ_26M:
+		clk_value = 26;
+		break;
+	case ATH3K_XTAL_FREQ_40M:
+		clk_value = 40;
+		break;
+	case ATH3K_XTAL_FREQ_19P2:
+		clk_value = 19;
+		break;
+	default:
+		clk_value = 0;
+		break;
+	}
+
+	snprintf(filename, FILENAME_MAX, "%s/ar3k/ramps_0x%08x_%d%s",
+	    fw_path,
+	    fw_ver.rom_version,
+	    clk_value,
+	    ".dfu");
+
+	ath3k_info("%s: syscfg file = %s\n",
+	    __func__,
+	    filename);
+
+	/* Read in the firmware */
+	if (ath3k_fw_read(&fw, filename) <= 0) {
+		ath3k_err("%s: ath3k_fw_read() failed\n",
+		    __func__);
+		return (-1);
+	}
+
+	ret = ath3k_load_fwfile(hdl, &fw);
+
+	ath3k_fw_free(&fw);
+	return (ret);
+}
+
+int
+ath3k_set_normal_mode(libusb_device_handle *hdl)
+{
+	int ret;
+	unsigned char fw_state;
+
+	ret = ath3k_get_state(hdl, &fw_state);
+	if (ret < 0) {
+		ath3k_err("%s: can't get state\n", __func__);
+		return (ret);
+	}
+
+	/*
+	 * This isn't a fatal error - the device may have detached
+	 * already.
+	 */
+	if ((fw_state & ATH3K_MODE_MASK) == ATH3K_NORMAL_MODE) {
+		ath3k_debug("%s: firmware is already in normal mode\n",
+		    __func__);
+		return (0);
+	}
+
+	ret = libusb_control_transfer(hdl,
+	    LIBUSB_REQUEST_TYPE_VENDOR,		/* XXX out direction? */
+	    ATH3K_SET_NORMAL_MODE,
+	    0,
+	    0,
+	    NULL,
+	    0,
+	    1000);	/* XXX timeout */
+
+	if (ret < 0) {
+		ath3k_err("%s: libusb_control_transfer() failed: code=%d\n",
+		    __func__,
+		    ret);
+		return (0);
+	}
+
+	return (ret == 0);
+}
+
+int
+ath3k_switch_pid(libusb_device_handle *hdl)
+{
+	int ret;
+	ret = libusb_control_transfer(hdl,
+	    LIBUSB_REQUEST_TYPE_VENDOR,		/* XXX set an out flag? */
+	    USB_REG_SWITCH_VID_PID,
+	    0,
+	    0,
+	    NULL,
+	    0,
+	    1000);	/* XXX timeout */
+
+	if (ret < 0) {
+		ath3k_debug("%s: libusb_control_transfer() failed: code=%d\n",
+		    __func__,
+		    ret);
+		return (0);
+	}
+
+	return (ret == 0);
+}

Property changes on: projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_hw.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_hw.h
===================================================================
--- projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_hw.h	(nonexistent)
+++ projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_hw.h	(revision 301547)
@@ -0,0 +1,66 @@
+/*-
+ * Copyright (c) 2013 Adrian Chadd <adrian@freebsd.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    similar to the "NO WARRANTY" disclaimer below ("Disclaimer") and any
+ *    redistribution must be conditioned upon including a substantially
+ *    similar Disclaimer requirement for further binary redistribution.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTIBILITY
+ * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY,
+ * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
+ * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGES.
+ *
+ * $FreeBSD$
+ */
+#ifndef	__ATH3K_HW_H__
+#define	__ATH3K_HW_H__
+
+#define	ATH3K_DNLOAD			0x01
+#define	ATH3K_GETSTATE			0x05
+#define	ATH3K_SET_NORMAL_MODE		0x07
+#define	ATH3K_GETVERSION		0x09
+#define	USB_REG_SWITCH_VID_PID		0x0a
+
+#define	ATH3K_MODE_MASK			0x3F
+#define	ATH3K_NORMAL_MODE		0x0E
+
+#define	ATH3K_PATCH_UPDATE		0x80
+#define	ATH3K_SYSCFG_UPDATE		0x40
+
+#define	ATH3K_XTAL_FREQ_26M		0x00
+#define	ATH3K_XTAL_FREQ_40M		0x01
+#define	ATH3K_XTAL_FREQ_19P2		0x02
+#define	ATH3K_NAME_LEN			0xFF
+
+#define	USB_REQ_DFU_DNLOAD		1
+#define	BULK_SIZE			4096
+#define	FW_HDR_SIZE			20
+
+extern	int ath3k_load_fwfile(struct libusb_device_handle *hdl,
+	    const struct ath3k_firmware *fw);
+extern	int ath3k_get_state(struct libusb_device_handle *hdl,
+	    unsigned char *state);
+extern	int ath3k_get_version(struct libusb_device_handle *hdl,
+	    struct ath3k_version *version);
+extern	int ath3k_load_patch(libusb_device_handle *hdl, const char *fw_path);
+extern	int ath3k_load_syscfg(libusb_device_handle *hdl, const char *fw_path);
+extern	int ath3k_set_normal_mode(libusb_device_handle *hdl);
+extern	int ath3k_switch_pid(libusb_device_handle *hdl);
+
+#endif

Property changes on: projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3k_hw.h
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3kfw.8
===================================================================
--- projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3kfw.8	(revision 301546)
+++ projects/vnet/usr.sbin/bluetooth/ath3kfw/ath3kfw.8	(revision 301547)
@@ -1,78 +1,90 @@
 .\" Copyright (c) 2010 Maksim Yevmenkin <m_evmenkin@yahoo.com>
+.\" Copyright (c) 2013, 2016 Adrian Chadd <adrian@freebsd.org>
 .\" All rights reserved.
 .\"
 .\" Redistribution and use in source and binary forms, with or without
 .\" modification, are permitted provided that the following conditions
 .\" are met:
 .\" 1. Redistributions of source code must retain the above copyright
 .\"    notice, this list of conditions and the following disclaimer.
 .\" 2. Redistributions in binary form must reproduce the above copyright
 .\"    notice, this list of conditions and the following disclaimer in the
 .\"    documentation and/or other materials provided with the distribution.
 .\"
 .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 .\" SUCH DAMAGE.
 .\"
 .\" $FreeBSD$
 .\"
-.Dd November 9, 2010
+.Dd June 4, 2016
 .Dt ATH3KFW 8
 .Os
 .Sh NAME
 .Nm ath3kfw
-.Nd firmware download utility for Atheros AR3011 chip based Bluetooth USB devices
+.Nd firmware download utility for Atheros AR3011/AR3012 chip based Bluetooth USB devices
 .Sh SYNOPSIS
 .Nm
 .Fl d Ar device_name
-.Fl f Ar firmware_file_name
+.Fl f Ar firmware_path
 .Nm
 .Fl h
 .Sh DESCRIPTION
 The
 .Nm
 utility downloads the specified firmware file to the specified
 .Xr ugen 4
 device.
 .Pp
 This utility will
 .Em only
-work with Atheros AR3011 chip based Bluetooth USB devices.
+work with Atheros AR3011 and AR3012 chip based Bluetooth USB devices.
 The identification is currently based on USB vendor ID/product ID pair.
 The vendor ID should be 0x0cf3
 .Pq Dv USB_VENDOR_ATHEROS2
-and the product ID should be 0x3000.
+and the product ID should match one of the supported devices.
 .Pp
-Firmware files ath3k-1.fw and ath3k-2.fw can be obtained from the
-linux-firmware RPM.
+Firmware files are available in the linux-firmware RPM.
 .Pp
+The
+.Nm
+utility queries the device at runtime to determine which firmware image
+and board configuration to load.
+.Pp
 The options are as follows:
 .Bl -tag -width indent
+.It Fl D
+Enable verbose debugging.
 .It Fl d Ar device_name
 Specify
 .Xr ugen 4
 device name.
-.It Fl f Ar firmware_file_name
-Specify firmware file name for download.
+.It Fl I
+Enable informational output.
+.It Fl f Ar firmware_path
+Specify the directory containing the firmware files to search and upload.
 .It Fl h
 Display usage message and exit.
 .El
 .Sh EXIT STATUS
 .Ex -std
 .Sh SEE ALSO
 .Xr libusb 3 ,
 .Xr ugen 4 ,
 .Xr devd 8
 .Sh AUTHORS
-.An Maksim Yevmenkin Aq Mt m_evmenkin@yahoo.com
+The original utility was written by
+.An Maksim Yevmenkin Aq Mt m_evmenkin@yahoo.com .
+The current version is based on the Linux ath3k driver and was written by
+.An Adrian Chadd Aq Mt adrian@freebsd.org .
 .Sh BUGS
 Most likely.
 Please report if found.
Index: projects/vnet/usr.sbin/bluetooth/ath3kfw/main.c
===================================================================
--- projects/vnet/usr.sbin/bluetooth/ath3kfw/main.c	(nonexistent)
+++ projects/vnet/usr.sbin/bluetooth/ath3kfw/main.c	(revision 301547)
@@ -0,0 +1,394 @@
+/*-
+ * Copyright (c) 2013 Adrian Chadd <adrian@freebsd.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce at minimum a disclaimer
+ *    similar to the "NO WARRANTY" disclaimer below ("Disclaimer") and any
+ *    redistribution must be conditioned upon including a substantially
+ *    similar Disclaimer requirement for further binary redistribution.
+ *
+ * NO WARRANTY
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTIBILITY
+ * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY,
+ * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
+ * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGES.
+ *
+ * $FreeBSD$
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <string.h>
+#include <err.h>
+#include <fcntl.h>
+#include <libgen.h>
+#include <sys/stat.h>
+#include <sys/param.h>
+
+#include <libusb.h>
+
+#include "ath3k_fw.h"
+#include "ath3k_hw.h"
+#include "ath3k_dbg.h"
+
+#define	_DEFAULT_ATH3K_FIRMWARE_PATH	"/usr/share/firmware/ath3k/"
+
+int	ath3k_do_debug = 0;
+int	ath3k_do_info = 0;
+
+struct ath3k_devid {
+	uint16_t product_id;
+	uint16_t vendor_id;
+	int is_3012;
+};
+
+static struct ath3k_devid ath3k_list[] = {
+
+	/* Atheros AR3012 with sflash firmware */
+	{ .vendor_id = 0x0489, .product_id = 0xe04e, .is_3012 = 1 },
+	{ .vendor_id = 0x0489, .product_id = 0xe04d, .is_3012 = 1 },
+	{ .vendor_id = 0x0489, .product_id = 0xe056, .is_3012 = 1 },
+	{ .vendor_id = 0x0489, .product_id = 0xe057, .is_3012 = 1 },
+	{ .vendor_id = 0x0489, .product_id = 0xe05f, .is_3012 = 1 },
+	{ .vendor_id = 0x04c5, .product_id = 0x1330, .is_3012 = 1 },
+	{ .vendor_id = 0x04ca, .product_id = 0x3004, .is_3012 = 1 },
+	{ .vendor_id = 0x04ca, .product_id = 0x3005, .is_3012 = 1 },
+	{ .vendor_id = 0x04ca, .product_id = 0x3006, .is_3012 = 1 },
+	{ .vendor_id = 0x04ca, .product_id = 0x3008, .is_3012 = 1 },
+	{ .vendor_id = 0x04ca, .product_id = 0x300b, .is_3012 = 1 },
+	{ .vendor_id = 0x0930, .product_id = 0x0219, .is_3012 = 1 },
+	{ .vendor_id = 0x0930, .product_id = 0x0220, .is_3012 = 1 },
+	{ .vendor_id = 0x0b05, .product_id = 0x17d0, .is_3012 = 1 },
+	{ .vendor_id = 0x0CF3, .product_id = 0x0036, .is_3012 = 1 },
+	{ .vendor_id = 0x0cf3, .product_id = 0x3004, .is_3012 = 1 },
+	{ .vendor_id = 0x0cf3, .product_id = 0x3005, .is_3012 = 1 },
+	{ .vendor_id = 0x0cf3, .product_id = 0x3008, .is_3012 = 1 },
+	{ .vendor_id = 0x0cf3, .product_id = 0x311D, .is_3012 = 1 },
+	{ .vendor_id = 0x0cf3, .product_id = 0x311E, .is_3012 = 1 },
+	{ .vendor_id = 0x0cf3, .product_id = 0x311F, .is_3012 = 1 },
+	{ .vendor_id = 0x0cf3, .product_id = 0x3121, .is_3012 = 1 },
+	{ .vendor_id = 0x0CF3, .product_id = 0x817a, .is_3012 = 1 },
+	{ .vendor_id = 0x0cf3, .product_id = 0xe004, .is_3012 = 1 },
+	{ .vendor_id = 0x0cf3, .product_id = 0xe005, .is_3012 = 1 },
+	{ .vendor_id = 0x0cf3, .product_id = 0xe003, .is_3012 = 1 },
+	{ .vendor_id = 0x13d3, .product_id = 0x3362, .is_3012 = 1 },
+	{ .vendor_id = 0x13d3, .product_id = 0x3375, .is_3012 = 1 },
+	{ .vendor_id = 0x13d3, .product_id = 0x3393, .is_3012 = 1 },
+	{ .vendor_id = 0x13d3, .product_id = 0x3402, .is_3012 = 1 },
+
+	/* Atheros AR5BBU22 with sflash firmware */
+	{ .vendor_id = 0x0489, .product_id = 0xE036, .is_3012 = 1 },
+	{ .vendor_id = 0x0489, .product_id = 0xE03C, .is_3012 = 1 },
+};
+
+static int
+ath3k_is_3012(struct libusb_device_descriptor *d)
+{
+	int i;
+
+	/* Search looking for whether it's an AR3012 */
+	for (i = 0; i < (int) nitems(ath3k_list); i++) {
+		if ((ath3k_list[i].product_id == d->idProduct) &&
+		    (ath3k_list[i].vendor_id == d->idVendor)) {
+			fprintf(stderr, "%s: found AR3012\n", __func__);
+			return (ath3k_list[i].is_3012);
+		}
+	}
+
+	/* Not found */
+	return (0);
+}
+
+static libusb_device *
+ath3k_find_device(libusb_context *ctx, int bus_id, int dev_id)
+{
+	libusb_device **list, *dev = NULL, *found = NULL;
+	ssize_t cnt, i;
+
+	cnt = libusb_get_device_list(ctx, &list);
+	if (cnt < 0) {
+		ath3k_err("%s: libusb_get_device_list() failed: code %lld\n",
+		    __func__,
+		    (long long int) cnt);
+		return (NULL);
+	}
+
+	/*
+	 * XXX TODO: match on the vendor/product id too!
+	 */
+	for (i = 0; i < cnt; i++) {
+		dev = list[i];
+		if (bus_id == libusb_get_bus_number(dev) &&
+		    dev_id == libusb_get_device_address(dev)) {
+			/*
+			 * Take a reference so it's not freed later on.
+			 */
+			found = libusb_ref_device(dev);
+			break;
+		}
+	}
+
+	libusb_free_device_list(list, 1);
+	return (found);
+}
+
+static int
+ath3k_init_ar3012(libusb_device_handle *hdl, const char *fw_path)
+{
+	int ret;
+
+	ret = ath3k_load_patch(hdl, fw_path);
+	if (ret < 0) {
+		ath3k_err("Loading patch file failed\n");
+		return (ret);
+	}
+
+	ret = ath3k_load_syscfg(hdl, fw_path);
+	if (ret < 0) {
+		ath3k_err("Loading sysconfig file failed\n");
+		return (ret);
+	}
+
+	ret = ath3k_set_normal_mode(hdl);
+	if (ret < 0) {
+		ath3k_err("Set normal mode failed\n");
+		return (ret);
+	}
+
+	ath3k_switch_pid(hdl);
+	return (0);
+}
+
+static int
+ath3k_init_firmware(libusb_device_handle *hdl, const char *file_prefix)
+{
+	struct ath3k_firmware fw;
+	char fwname[FILENAME_MAX];
+	int ret;
+
+	/* XXX path info? */
+	snprintf(fwname, FILENAME_MAX, "%s/ath3k-1.fw", file_prefix);
+
+	ath3k_debug("%s: loading ath3k-1.fw\n", __func__);
+
+	/* Read in the firmware */
+	if (ath3k_fw_read(&fw, fwname) <= 0) {
+		fprintf(stderr, "%s: ath3k_fw_read() failed\n",
+		    __func__);
+		return (-1);
+	}
+
+	/* Load in the firmware */
+	ret = ath3k_load_fwfile(hdl, &fw);
+
+	/* free it */
+	ath3k_fw_free(&fw);
+
+	return (0);
+}
+
+/*
+ * Parse ugen name and extract device's bus and address
+ */
+
+static int
+parse_ugen_name(char const *ugen, uint8_t *bus, uint8_t *addr)
+{
+	char *ep;
+
+	if (strncmp(ugen, "ugen", 4) != 0)
+		return (-1);
+
+	*bus = (uint8_t) strtoul(ugen + 4, &ep, 10);
+	if (*ep != '.')
+		return (-1);
+
+	*addr = (uint8_t) strtoul(ep + 1, &ep, 10);
+	if (*ep != '\0')
+		return (-1);
+
+	return (0);
+}
+
+static void
+usage(void)
+{
+	fprintf(stderr,
+	    "Usage: ath3kfw (-D) -d ugenX.Y (-f firmware path) (-I)\n");
+	fprintf(stderr, "    -D: enable debugging\n");
+	fprintf(stderr, "    -d: device to operate upon\n");
+	fprintf(stderr, "    -f: firmware path, if not default\n");
+	fprintf(stderr, "    -I: enable informational output\n");
+	exit(127);
+}
+
+int
+main(int argc, char *argv[])
+{
+	struct libusb_device_descriptor d;
+	libusb_context *ctx;
+	libusb_device *dev;
+	libusb_device_handle *hdl;
+	unsigned char state;
+	struct ath3k_version ver;
+	int r;
+	uint8_t bus_id = 0, dev_id = 0;
+	int devid_set = 0;
+	int n;
+	char *firmware_path = NULL;
+	int is_3012 = 0;
+
+	/* libusb setup */
+	r = libusb_init(&ctx);
+	if (r != 0) {
+		ath3k_err("%s: libusb_init failed: code %d\n",
+		    argv[0],
+		    r);
+		exit(127);
+	}
+
+	/* Enable debugging, just because */
+	libusb_set_debug(ctx, 3);
+
+	/* Parse command line arguments */
+	while ((n = getopt(argc, argv, "Dd:f:hI")) != -1) {
+		switch (n) {
+		case 'd': /* ugen device name */
+			devid_set = 1;
+			if (parse_ugen_name(optarg, &bus_id, &dev_id) < 0)
+				usage();
+			break;
+		case 'D':
+			ath3k_do_debug = 1;
+			break;
+		case 'f': /* firmware path */
+			if (firmware_path)
+				free(firmware_path);
+			firmware_path = strdup(optarg);
+			break;
+		case 'I':
+			ath3k_do_info = 1;
+			break;
+		case 'h':
+		default:
+			usage();
+			break;
+			/* NOT REACHED */
+		}
+	}
+
+	/* Ensure the devid was given! */
+	if (devid_set == 0) {
+		usage();
+		/* NOTREACHED */
+	}
+
+	ath3k_debug("%s: opening dev %d.%d\n",
+	    basename(argv[0]),
+	    (int) bus_id,
+	    (int) dev_id);
+
+	/* Find a device based on the bus/dev id */
+	dev = ath3k_find_device(ctx, bus_id, dev_id);
+	if (dev == NULL) {
+		ath3k_err("%s: device not found\n", __func__);
+		/* XXX cleanup? */
+		exit(1);
+	}
+
+	/* Get the device descriptor for this device entry */
+	r = libusb_get_device_descriptor(dev, &d);
+	if (r != 0) {
+		warnx("%s: libusb_get_device_descriptor: %s",
+		    __func__,
+		    libusb_strerror(r));
+		exit(1);
+	}
+
+	/* See if it's an AR3012 */
+	if (ath3k_is_3012(&d)) {
+		is_3012 = 1;
+
+		/* If it's bcdDevice > 1, don't attach */
+		if (d.bcdDevice > 0x0001) {
+			ath3k_debug("%s: AR3012; bcdDevice=%d, exiting\n",
+			    __func__,
+			    d.bcdDevice);
+			exit(0);
+		}
+	}
+
+	/* XXX enforce that bInterfaceNumber is 0 */
+
+	/* XXX enforce the device/product id if they're non-zero */
+
+	/* Grab device handle */
+	r = libusb_open(dev, &hdl);
+	if (r != 0) {
+		ath3k_err("%s: libusb_open() failed: code %d\n", __func__, r);
+		/* XXX cleanup? */
+		exit(1);
+	}
+
+	/*
+	 * Get the initial NIC state.
+	 */
+	r = ath3k_get_state(hdl, &state);
+	if (r == 0) {
+		ath3k_err("%s: ath3k_get_state() failed!\n", __func__);
+		/* XXX cleanup? */
+		exit(1);
+	}
+	ath3k_debug("%s: state=0x%02x\n",
+	    __func__,
+	    (int) state);
+
+	/* And the version */
+	r = ath3k_get_version(hdl, &ver);
+	if (r == 0) {
+		ath3k_err("%s: ath3k_get_version() failed!\n", __func__);
+		/* XXX cleanup? */
+		exit(1);
+	}
+	ath3k_info("ROM version: %d, build version: %d, ram version: %d, "
+	    "ref clock=%d\n",
+	    ver.rom_version,
+	    ver.build_version,
+	    ver.ram_version,
+	    ver.ref_clock);
+
+	/* Default the firmware path */
+	if (firmware_path == NULL)
+		firmware_path = strdup(_DEFAULT_ATH3K_FIRMWARE_PATH);
+
+	if (is_3012) {
+		(void) ath3k_init_ar3012(hdl, firmware_path);
+	} else {
+		(void) ath3k_init_firmware(hdl, firmware_path);
+	}
+
+	/* Shutdown */
+	libusb_close(hdl);
+	hdl = NULL;
+
+	libusb_unref_device(dev);
+	dev = NULL;
+
+	libusb_exit(ctx);
+	ctx = NULL;
+}

Property changes on: projects/vnet/usr.sbin/bluetooth/ath3kfw/main.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: projects/vnet/usr.sbin/newsyslog/newsyslog.c
===================================================================
--- projects/vnet/usr.sbin/newsyslog/newsyslog.c	(revision 301546)
+++ projects/vnet/usr.sbin/newsyslog/newsyslog.c	(revision 301547)
@@ -1,2673 +1,2675 @@
 /*-
  * ------+---------+---------+-------- + --------+---------+---------+---------*
  * This file includes significant modifications done by:
  * Copyright (c) 2003, 2004  - Garance Alistair Drosehn <gad@FreeBSD.org>.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  *   1. Redistributions of source code must retain the above copyright
  *      notice, this list of conditions and the following disclaimer.
  *   2. Redistributions in binary form must reproduce the above copyright
  *      notice, this list of conditions and the following disclaimer in the
  *      documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * ------+---------+---------+-------- + --------+---------+---------+---------*
  */
 
 /*
  * This file contains changes from the Open Software Foundation.
  */
 
 /*
  * Copyright 1988, 1989 by the Massachusetts Institute of Technology
  *
  * Permission to use, copy, modify, and distribute this software and its
  * documentation for any purpose and without fee is hereby granted, provided
  * that the above copyright notice appear in all copies and that both that
  * copyright notice and this permission notice appear in supporting
  * documentation, and that the names of M.I.T. and the M.I.T. S.I.P.B. not be
  * used in advertising or publicity pertaining to distribution of the
  * software without specific, written prior permission. M.I.T. and the M.I.T.
  * S.I.P.B. make no representations about the suitability of this software
  * for any purpose.  It is provided "as is" without express or implied
  * warranty.
  *
  */
 
 /*
  * newsyslog - roll over selected logs at the appropriate time, keeping the a
  * specified number of backup files around.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #define	OSF
 
 #include <sys/param.h>
 #include <sys/queue.h>
 #include <sys/stat.h>
 #include <sys/wait.h>
 
 #include <assert.h>
 #include <ctype.h>
 #include <err.h>
 #include <errno.h>
 #include <dirent.h>
 #include <fcntl.h>
 #include <fnmatch.h>
 #include <glob.h>
 #include <grp.h>
 #include <paths.h>
 #include <pwd.h>
 #include <signal.h>
 #include <stdio.h>
 #include <libgen.h>
 #include <stdlib.h>
 #include <string.h>
 #include <time.h>
 #include <unistd.h>
 
 #include "pathnames.h"
 #include "extern.h"
 
 /*
  * Compression suffixes
  */
 #ifndef	COMPRESS_SUFFIX_GZ
 #define	COMPRESS_SUFFIX_GZ	".gz"
 #endif
 
 #ifndef	COMPRESS_SUFFIX_BZ2
 #define	COMPRESS_SUFFIX_BZ2	".bz2"
 #endif
 
 #ifndef	COMPRESS_SUFFIX_XZ
 #define	COMPRESS_SUFFIX_XZ	".xz"
 #endif
 
 #define	COMPRESS_SUFFIX_MAXLEN	MAX(MAX(sizeof(COMPRESS_SUFFIX_GZ),sizeof(COMPRESS_SUFFIX_BZ2)),sizeof(COMPRESS_SUFFIX_XZ))
 
 /*
  * Compression types
  */
 #define	COMPRESS_TYPES  4	/* Number of supported compression types */
 
 #define	COMPRESS_NONE	0
 #define	COMPRESS_GZIP	1
 #define	COMPRESS_BZIP2	2
 #define	COMPRESS_XZ	3
 
 /*
  * Bit-values for the 'flags' parsed from a config-file entry.
  */
 #define	CE_BINARY	0x0008	/* Logfile is in binary, do not add status */
 				/*    messages to logfile(s) when rotating. */
 #define	CE_NOSIGNAL	0x0010	/* There is no process to signal when */
 				/*    trimming this file. */
 #define	CE_TRIMAT	0x0020	/* trim file at a specific time. */
 #define	CE_GLOB		0x0040	/* name of the log is file name pattern. */
 #define	CE_SIGNALGROUP	0x0080	/* Signal a process-group instead of a single */
 				/*    process when trimming this file. */
 #define	CE_CREATE	0x0100	/* Create the log file if it does not exist. */
 #define	CE_NODUMP	0x0200	/* Set 'nodump' on newly created log file. */
 #define	CE_PID2CMD	0x0400	/* Replace PID file with a shell command.*/
 
 #define	MIN_PID         5	/* Don't touch pids lower than this */
 #define	MAX_PID		99999	/* was lower, see /usr/include/sys/proc.h */
 
 #define	kbytes(size)  (((size) + 1023) >> 10)
 
 #define	DEFAULT_MARKER	"<default>"
 #define	DEBUG_MARKER	"<debug>"
 #define	INCLUDE_MARKER	"<include>"
 #define	DEFAULT_TIMEFNAME_FMT	"%Y%m%dT%H%M%S"
 
 #define	MAX_OLDLOGS 65536	/* Default maximum number of old logfiles */
 
 struct compress_types {
 	const char *flag;	/* Flag in configuration file */
 	const char *suffix;	/* Compression suffix */
 	const char *path;	/* Path to compression program */
 };
 
 static const struct compress_types compress_type[COMPRESS_TYPES] = {
 	{ "", "", "" },					/* no compression */
 	{ "Z", COMPRESS_SUFFIX_GZ, _PATH_GZIP },	/* gzip compression */
 	{ "J", COMPRESS_SUFFIX_BZ2, _PATH_BZIP2 },	/* bzip2 compression */
 	{ "X", COMPRESS_SUFFIX_XZ, _PATH_XZ }		/* xz compression */
 };
 
 struct conf_entry {
 	STAILQ_ENTRY(conf_entry) cf_nextp;
 	char *log;		/* Name of the log */
 	char *pid_cmd_file;		/* PID or command file */
 	char *r_reason;		/* The reason this file is being rotated */
 	int firstcreate;	/* Creating log for the first time (-C). */
 	int rotate;		/* Non-zero if this file should be rotated */
 	int fsize;		/* size found for the log file */
 	uid_t uid;		/* Owner of log */
 	gid_t gid;		/* Group of log */
 	int numlogs;		/* Number of logs to keep */
 	int trsize;		/* Size cutoff to trigger trimming the log */
 	int hours;		/* Hours between log trimming */
 	struct ptime_data *trim_at;	/* Specific time to do trimming */
 	unsigned int permissions;	/* File permissions on the log */
 	int flags;		/* CE_BINARY */
 	int compress;		/* Compression */
 	int sig;		/* Signal to send */
 	int def_cfg;		/* Using the <default> rule for this file */
 };
 
 struct sigwork_entry {
 	SLIST_ENTRY(sigwork_entry) sw_nextp;
 	int	 sw_signum;		/* the signal to send */
 	int	 sw_pidok;		/* true if pid value is valid */
 	pid_t	 sw_pid;		/* the process id from the PID file */
 	const char *sw_pidtype;		/* "daemon" or "process group" */
 	int	 sw_runcmd;		/* run command or send PID to signal */
 	char	 sw_fname[1];		/* file the PID was read from or shell cmd */
 };
 
 struct zipwork_entry {
 	SLIST_ENTRY(zipwork_entry) zw_nextp;
 	const struct conf_entry *zw_conf;	/* for chown/perm/flag info */
 	const struct sigwork_entry *zw_swork;	/* to know success of signal */
 	int	 zw_fsize;		/* size of the file to compress */
 	char	 zw_fname[1];		/* the file to compress */
 };
 
 struct include_entry {
 	STAILQ_ENTRY(include_entry) inc_nextp;
 	const char *file;	/* Name of file to process */
 };
 
 struct oldlog_entry {
 	char *fname;		/* Filename of the log file */
 	time_t t;		/* Parsed timestamp of the logfile */
 };
 
 typedef enum {
 	FREE_ENT, KEEP_ENT
 }	fk_entry;
 
 STAILQ_HEAD(cflist, conf_entry);
 static SLIST_HEAD(swlisthead, sigwork_entry) swhead =
     SLIST_HEAD_INITIALIZER(swhead);
 static SLIST_HEAD(zwlisthead, zipwork_entry) zwhead =
     SLIST_HEAD_INITIALIZER(zwhead);
 STAILQ_HEAD(ilist, include_entry);
 
 int dbg_at_times;		/* -D Show details of 'trim_at' code */
 
 static int archtodir = 0;	/* Archive old logfiles to other directory */
 static int createlogs;		/* Create (non-GLOB) logfiles which do not */
 				/*    already exist.  1=='for entries with */
 				/*    C flag', 2=='for all entries'. */
 int verbose = 0;		/* Print out what's going on */
 static int needroot = 1;	/* Root privs are necessary */
 int noaction = 0;		/* Don't do anything, just show it */
 static int norotate = 0;	/* Don't rotate */
 static int nosignal;		/* Do not send any signals */
 static int enforcepid = 0;	/* If PID file does not exist or empty, do nothing */
 static int force = 0;		/* Force the trim no matter what */
 static int rotatereq = 0;	/* -R = Always rotate the file(s) as given */
 				/*    on the command (this also requires   */
 				/*    that a list of files *are* given on  */
 				/*    the run command). */
 static char *requestor;		/* The name given on a -R request */
 static char *timefnamefmt = NULL;/* Use time based filenames instead of .0 */
 static char *archdirname;	/* Directory path to old logfiles archive */
 static char *destdir = NULL;	/* Directory to treat at root for logs */
 static const char *conf;	/* Configuration file to use */
 
 struct ptime_data *dbg_timenow;	/* A "timenow" value set via -D option */
 static struct ptime_data *timenow; /* The time to use for checking at-fields */
 
 #define	DAYTIME_LEN	16
 static char daytime[DAYTIME_LEN];/* The current time in human readable form,
 				  * used for rotation-tracking messages. */
 static char hostname[MAXHOSTNAMELEN]; /* hostname */
 
 static const char *path_syslogpid = _PATH_SYSLOGPID;
 
 static struct cflist *get_worklist(char **files);
 static void parse_file(FILE *cf, struct cflist *work_p, struct cflist *glob_p,
 		    struct conf_entry *defconf_p, struct ilist *inclist);
 static void add_to_queue(const char *fname, struct ilist *inclist);
 static char *sob(char *p);
 static char *son(char *p);
 static int isnumberstr(const char *);
 static int isglobstr(const char *);
 static char *missing_field(char *p, char *errline);
 static void	 change_attrs(const char *, const struct conf_entry *);
 static const char *get_logfile_suffix(const char *logfile);
 static fk_entry	 do_entry(struct conf_entry *);
 static fk_entry	 do_rotate(const struct conf_entry *);
 static void	 do_sigwork(struct sigwork_entry *);
 static void	 do_zipwork(struct zipwork_entry *);
 static struct sigwork_entry *
 		 save_sigwork(const struct conf_entry *);
 static struct zipwork_entry *
 		 save_zipwork(const struct conf_entry *, const struct
 		    sigwork_entry *, int, const char *);
 static void	 set_swpid(struct sigwork_entry *, const struct conf_entry *);
 static int	 sizefile(const char *);
 static void expand_globs(struct cflist *work_p, struct cflist *glob_p);
 static void free_clist(struct cflist *list);
 static void free_entry(struct conf_entry *ent);
 static struct conf_entry *init_entry(const char *fname,
 		struct conf_entry *src_entry);
 static void parse_args(int argc, char **argv);
 static int parse_doption(const char *doption);
 static void usage(void);
 static int log_trim(const char *logname, const struct conf_entry *log_ent);
 static int age_old_log(const char *file);
 static void savelog(char *from, char *to);
 static void createdir(const struct conf_entry *ent, char *dirpart);
 static void createlog(const struct conf_entry *ent);
 static int parse_signal(const char *str);
 
 /*
  * All the following take a parameter of 'int', but expect values in the
  * range of unsigned char.  Define wrappers which take values of type 'char',
  * whether signed or unsigned, and ensure they end up in the right range.
  */
 #define	isdigitch(Anychar) isdigit((u_char)(Anychar))
 #define	isprintch(Anychar) isprint((u_char)(Anychar))
 #define	isspacech(Anychar) isspace((u_char)(Anychar))
 #define	tolowerch(Anychar) tolower((u_char)(Anychar))
 
 int
 main(int argc, char **argv)
 {
 	struct cflist *worklist;
 	struct conf_entry *p;
 	struct sigwork_entry *stmp;
 	struct zipwork_entry *ztmp;
 
 	SLIST_INIT(&swhead);
 	SLIST_INIT(&zwhead);
 
 	parse_args(argc, argv);
 	argc -= optind;
 	argv += optind;
 
 	if (needroot && getuid() && geteuid())
 		errx(1, "must have root privs");
 	worklist = get_worklist(argv);
 
 	/*
 	 * Rotate all the files which need to be rotated.  Note that
 	 * some users have *hundreds* of entries in newsyslog.conf!
 	 */
 	while (!STAILQ_EMPTY(worklist)) {
 		p = STAILQ_FIRST(worklist);
 		STAILQ_REMOVE_HEAD(worklist, cf_nextp);
 		if (do_entry(p) == FREE_ENT)
 			free_entry(p);
 	}
 
 	/*
 	 * Send signals to any processes which need a signal to tell
 	 * them to close and re-open the log file(s) we have rotated.
 	 * Note that zipwork_entries include pointers to these
 	 * sigwork_entry's, so we can not free the entries here.
 	 */
 	if (!SLIST_EMPTY(&swhead)) {
 		if (noaction || verbose)
 			printf("Signal all daemon process(es)...\n");
 		SLIST_FOREACH(stmp, &swhead, sw_nextp)
 			do_sigwork(stmp);
-		if (noaction)
-			printf("\tsleep 10\n");
-		else {
-			if (verbose)
-				printf("Pause 10 seconds to allow daemon(s)"
-				    " to close log file(s)\n");
-			sleep(10);
+		if (!(rotatereq && nosignal)) {
+			if (noaction)
+				printf("\tsleep 10\n");
+			else {
+				if (verbose)
+					printf("Pause 10 seconds to allow "
+					    "daemon(s) to close log file(s)\n");
+				sleep(10);
+			}
 		}
 	}
 	/*
 	 * Compress all files that we're expected to compress, now
 	 * that all processes should have closed the files which
 	 * have been rotated.
 	 */
 	if (!SLIST_EMPTY(&zwhead)) {
 		if (noaction || verbose)
 			printf("Compress all rotated log file(s)...\n");
 		while (!SLIST_EMPTY(&zwhead)) {
 			ztmp = SLIST_FIRST(&zwhead);
 			do_zipwork(ztmp);
 			SLIST_REMOVE_HEAD(&zwhead, zw_nextp);
 			free(ztmp);
 		}
 	}
 	/* Now free all the sigwork entries. */
 	while (!SLIST_EMPTY(&swhead)) {
 		stmp = SLIST_FIRST(&swhead);
 		SLIST_REMOVE_HEAD(&swhead, sw_nextp);
 		free(stmp);
 	}
 
 	while (wait(NULL) > 0 || errno == EINTR)
 		;
 	return (0);
 }
 
 static struct conf_entry *
 init_entry(const char *fname, struct conf_entry *src_entry)
 {
 	struct conf_entry *tempwork;
 
 	if (verbose > 4)
 		printf("\t--> [creating entry for %s]\n", fname);
 
 	tempwork = malloc(sizeof(struct conf_entry));
 	if (tempwork == NULL)
 		err(1, "malloc of conf_entry for %s", fname);
 
 	if (destdir == NULL || fname[0] != '/')
 		tempwork->log = strdup(fname);
 	else
 		asprintf(&tempwork->log, "%s%s", destdir, fname);
 	if (tempwork->log == NULL)
 		err(1, "strdup for %s", fname);
 
 	if (src_entry != NULL) {
 		tempwork->pid_cmd_file = NULL;
 		if (src_entry->pid_cmd_file)
 			tempwork->pid_cmd_file = strdup(src_entry->pid_cmd_file);
 		tempwork->r_reason = NULL;
 		tempwork->firstcreate = 0;
 		tempwork->rotate = 0;
 		tempwork->fsize = -1;
 		tempwork->uid = src_entry->uid;
 		tempwork->gid = src_entry->gid;
 		tempwork->numlogs = src_entry->numlogs;
 		tempwork->trsize = src_entry->trsize;
 		tempwork->hours = src_entry->hours;
 		tempwork->trim_at = NULL;
 		if (src_entry->trim_at != NULL)
 			tempwork->trim_at = ptime_init(src_entry->trim_at);
 		tempwork->permissions = src_entry->permissions;
 		tempwork->flags = src_entry->flags;
 		tempwork->compress = src_entry->compress;
 		tempwork->sig = src_entry->sig;
 		tempwork->def_cfg = src_entry->def_cfg;
 	} else {
 		/* Initialize as a "do-nothing" entry */
 		tempwork->pid_cmd_file = NULL;
 		tempwork->r_reason = NULL;
 		tempwork->firstcreate = 0;
 		tempwork->rotate = 0;
 		tempwork->fsize = -1;
 		tempwork->uid = (uid_t)-1;
 		tempwork->gid = (gid_t)-1;
 		tempwork->numlogs = 1;
 		tempwork->trsize = -1;
 		tempwork->hours = -1;
 		tempwork->trim_at = NULL;
 		tempwork->permissions = 0;
 		tempwork->flags = 0;
 		tempwork->compress = COMPRESS_NONE;
 		tempwork->sig = SIGHUP;
 		tempwork->def_cfg = 0;
 	}
 
 	return (tempwork);
 }
 
 static void
 free_entry(struct conf_entry *ent)
 {
 
 	if (ent == NULL)
 		return;
 
 	if (ent->log != NULL) {
 		if (verbose > 4)
 			printf("\t--> [freeing entry for %s]\n", ent->log);
 		free(ent->log);
 		ent->log = NULL;
 	}
 
 	if (ent->pid_cmd_file != NULL) {
 		free(ent->pid_cmd_file);
 		ent->pid_cmd_file = NULL;
 	}
 
 	if (ent->r_reason != NULL) {
 		free(ent->r_reason);
 		ent->r_reason = NULL;
 	}
 
 	if (ent->trim_at != NULL) {
 		ptime_free(ent->trim_at);
 		ent->trim_at = NULL;
 	}
 
 	free(ent);
 }
 
 static void
 free_clist(struct cflist *list)
 {
 	struct conf_entry *ent;
 
 	while (!STAILQ_EMPTY(list)) {
 		ent = STAILQ_FIRST(list);
 		STAILQ_REMOVE_HEAD(list, cf_nextp);
 		free_entry(ent);
 	}
 
 	free(list);
 	list = NULL;
 }
 
 static fk_entry
 do_entry(struct conf_entry * ent)
 {
 #define	REASON_MAX	80
 	int modtime;
 	fk_entry free_or_keep;
 	double diffsecs;
 	char temp_reason[REASON_MAX];
 	int oversized;
 
 	free_or_keep = FREE_ENT;
 	if (verbose)
 		printf("%s <%d%s>: ", ent->log, ent->numlogs,
 		    compress_type[ent->compress].flag);
 	ent->fsize = sizefile(ent->log);
 	oversized = ((ent->trsize > 0) && (ent->fsize >= ent->trsize));
 	modtime = age_old_log(ent->log);
 	ent->rotate = 0;
 	ent->firstcreate = 0;
 	if (ent->fsize < 0) {
 		/*
 		 * If either the C flag or the -C option was specified,
 		 * and if we won't be creating the file, then have the
 		 * verbose message include a hint as to why the file
 		 * will not be created.
 		 */
 		temp_reason[0] = '\0';
 		if (createlogs > 1)
 			ent->firstcreate = 1;
 		else if ((ent->flags & CE_CREATE) && createlogs)
 			ent->firstcreate = 1;
 		else if (ent->flags & CE_CREATE)
 			strlcpy(temp_reason, " (no -C option)", REASON_MAX);
 		else if (createlogs)
 			strlcpy(temp_reason, " (no C flag)", REASON_MAX);
 
 		if (ent->firstcreate) {
 			if (verbose)
 				printf("does not exist -> will create.\n");
 			createlog(ent);
 		} else if (verbose) {
 			printf("does not exist, skipped%s.\n", temp_reason);
 		}
 	} else {
 		if (ent->flags & CE_TRIMAT && !force && !rotatereq &&
 		    !oversized) {
 			diffsecs = ptimeget_diff(timenow, ent->trim_at);
 			if (diffsecs < 0.0) {
 				/* trim_at is some time in the future. */
 				if (verbose) {
 					ptime_adjust4dst(ent->trim_at,
 					    timenow);
 					printf("--> will trim at %s",
 					    ptimeget_ctime(ent->trim_at));
 				}
 				return (free_or_keep);
 			} else if (diffsecs >= 3600.0) {
 				/*
 				 * trim_at is more than an hour in the past,
 				 * so find the next valid trim_at time, and
 				 * tell the user what that will be.
 				 */
 				if (verbose && dbg_at_times)
 					printf("\n\t--> prev trim at %s\t",
 					    ptimeget_ctime(ent->trim_at));
 				if (verbose) {
 					ptimeset_nxtime(ent->trim_at);
 					printf("--> will trim at %s",
 					    ptimeget_ctime(ent->trim_at));
 				}
 				return (free_or_keep);
 			} else if (verbose && noaction && dbg_at_times) {
 				/*
 				 * If we are just debugging at-times, then
 				 * a detailed message is helpful.  Also
 				 * skip "doing" any commands, since they
 				 * would all be turned off by no-action.
 				 */
 				printf("\n\t--> timematch at %s",
 				    ptimeget_ctime(ent->trim_at));
 				return (free_or_keep);
 			} else if (verbose && ent->hours <= 0) {
 				printf("--> time is up\n");
 			}
 		}
 		if (verbose && (ent->trsize > 0))
 			printf("size (Kb): %d [%d] ", ent->fsize, ent->trsize);
 		if (verbose && (ent->hours > 0))
 			printf(" age (hr): %d [%d] ", modtime, ent->hours);
 
 		/*
 		 * Figure out if this logfile needs to be rotated.
 		 */
 		temp_reason[0] = '\0';
 		if (rotatereq) {
 			ent->rotate = 1;
 			snprintf(temp_reason, REASON_MAX, " due to -R from %s",
 			    requestor);
 		} else if (force) {
 			ent->rotate = 1;
 			snprintf(temp_reason, REASON_MAX, " due to -F request");
 		} else if (oversized) {
 			ent->rotate = 1;
 			snprintf(temp_reason, REASON_MAX, " due to size>%dK",
 			    ent->trsize);
 		} else if (ent->hours <= 0 && (ent->flags & CE_TRIMAT)) {
 			ent->rotate = 1;
 		} else if ((ent->hours > 0) && ((modtime >= ent->hours) ||
 		    (modtime < 0))) {
 			ent->rotate = 1;
 		}
 
 		/*
 		 * If the file needs to be rotated, then rotate it.
 		 */
 		if (ent->rotate && !norotate) {
 			if (temp_reason[0] != '\0')
 				ent->r_reason = strdup(temp_reason);
 			if (verbose)
 				printf("--> trimming log....\n");
 			if (noaction && !verbose)
 				printf("%s <%d%s>: trimming\n", ent->log,
 				    ent->numlogs,
 				    compress_type[ent->compress].flag);
 			free_or_keep = do_rotate(ent);
 		} else {
 			if (verbose)
 				printf("--> skipping\n");
 		}
 	}
 	return (free_or_keep);
 #undef REASON_MAX
 }
 
 static void
 parse_args(int argc, char **argv)
 {
 	int ch;
 	char *p;
 
 	timenow = ptime_init(NULL);
 	ptimeset_time(timenow, time(NULL));
 	strlcpy(daytime, ptimeget_ctime(timenow) + 4, DAYTIME_LEN);
 
 	/* Let's get our hostname */
 	(void)gethostname(hostname, sizeof(hostname));
 
 	/* Truncate domain */
 	if ((p = strchr(hostname, '.')) != NULL)
 		*p = '\0';
 
 	/* Parse command line options. */
 	while ((ch = getopt(argc, argv, "a:d:f:nrst:vCD:FNPR:S:")) != -1)
 		switch (ch) {
 		case 'a':
 			archtodir++;
 			archdirname = optarg;
 			break;
 		case 'd':
 			destdir = optarg;
 			break;
 		case 'f':
 			conf = optarg;
 			break;
 		case 'n':
 			noaction++;
 			/* FALLTHROUGH */
 		case 'r':
 			needroot = 0;
 			break;
 		case 's':
 			nosignal = 1;
 			break;
 		case 't':
 			if (optarg[0] == '\0' ||
 			    strcmp(optarg, "DEFAULT") == 0)
 				timefnamefmt = strdup(DEFAULT_TIMEFNAME_FMT);
 			else
 				timefnamefmt = strdup(optarg);
 			break;
 		case 'v':
 			verbose++;
 			break;
 		case 'C':
 			/* Useful for things like rc.diskless... */
 			createlogs++;
 			break;
 		case 'D':
 			/*
 			 * Set some debugging option.  The specific option
 			 * depends on the value of optarg.  These options
 			 * may come and go without notice or documentation.
 			 */
 			if (parse_doption(optarg))
 				break;
 			usage();
 			/* NOTREACHED */
 		case 'F':
 			force++;
 			break;
 		case 'N':
 			norotate++;
 			break;
 		case 'P':
 			enforcepid++;
 			break;
 		case 'R':
 			rotatereq++;
 			requestor = strdup(optarg);
 			break;
 		case 'S':
 			path_syslogpid = optarg;
 			break;
 		case 'm':	/* Used by OpenBSD for "monitor mode" */
 		default:
 			usage();
 			/* NOTREACHED */
 		}
 
 	if (force && norotate) {
 		warnx("Only one of -F and -N may be specified.");
 		usage();
 		/* NOTREACHED */
 	}
 
 	if (rotatereq) {
 		if (optind == argc) {
 			warnx("At least one filename must be given when -R is specified.");
 			usage();
 			/* NOTREACHED */
 		}
 		/* Make sure "requestor" value is safe for a syslog message. */
 		for (p = requestor; *p != '\0'; p++) {
 			if (!isprintch(*p) && (*p != '\t'))
 				*p = '.';
 		}
 	}
 
 	if (dbg_timenow) {
 		/*
 		 * Note that the 'daytime' variable is not changed.
 		 * That is only used in messages that track when a
 		 * logfile is rotated, and if a file *is* rotated,
 		 * then it will still be rotated at the "real now" time.
 		 */
 		ptime_free(timenow);
 		timenow = dbg_timenow;
 		fprintf(stderr, "Debug: Running as if TimeNow is %s",
 		    ptimeget_ctime(dbg_timenow));
 	}
 
 }
 
 /*
  * These debugging options are mainly meant for developer use, such
  * as writing regression-tests.  They would not be needed by users
  * during normal operation of newsyslog...
  */
 static int
 parse_doption(const char *doption)
 {
 	const char TN[] = "TN=";
 	int res;
 
 	if (strncmp(doption, TN, sizeof(TN) - 1) == 0) {
 		/*
 		 * The "TimeNow" debugging option.  This might be off
 		 * by an hour when crossing a timezone change.
 		 */
 		dbg_timenow = ptime_init(NULL);
 		res = ptime_relparse(dbg_timenow, PTM_PARSE_ISO8601,
 		    time(NULL), doption + sizeof(TN) - 1);
 		if (res == -2) {
 			warnx("Non-existent time specified on -D %s", doption);
 			return (0);			/* failure */
 		} else if (res < 0) {
 			warnx("Malformed time given on -D %s", doption);
 			return (0);			/* failure */
 		}
 		return (1);			/* successfully parsed */
 
 	}
 
 	if (strcmp(doption, "ats") == 0) {
 		dbg_at_times++;
 		return (1);			/* successfully parsed */
 	}
 
 	/* XXX - This check could probably be dropped. */
 	if ((strcmp(doption, "neworder") == 0) ||
 	    (strcmp(doption, "oldorder") == 0)) {
 		warnx("NOTE: newsyslog always uses 'neworder'.");
 		return (1);			/* successfully parsed */
 	}
 
 	warnx("Unknown -D (debug) option: '%s'", doption);
 	return (0);				/* failure */
 }
 
 static void
 usage(void)
 {
 
 	fprintf(stderr,
 	    "usage: newsyslog [-CFNPnrsv] [-a directory] [-d directory] [-f config_file]\n"
 	    "                 [-S pidfile] [-t timefmt] [[-R tagname] file ...]\n");
 	exit(1);
 }
 
 /*
  * Parse a configuration file and return a linked list of all the logs
  * which should be processed.
  */
 static struct cflist *
 get_worklist(char **files)
 {
 	FILE *f;
 	char **given;
 	struct cflist *cmdlist, *filelist, *globlist;
 	struct conf_entry *defconf, *dupent, *ent;
 	struct ilist inclist;
 	struct include_entry *inc;
 	int gmatch, fnres;
 
 	defconf = NULL;
 	STAILQ_INIT(&inclist);
 
 	filelist = malloc(sizeof(struct cflist));
 	if (filelist == NULL)
 		err(1, "malloc of filelist");
 	STAILQ_INIT(filelist);
 	globlist = malloc(sizeof(struct cflist));
 	if (globlist == NULL)
 		err(1, "malloc of globlist");
 	STAILQ_INIT(globlist);
 
 	inc = malloc(sizeof(struct include_entry));
 	if (inc == NULL)
 		err(1, "malloc of inc");
 	inc->file = conf;
 	if (inc->file == NULL)
 		inc->file = _PATH_CONF;
 	STAILQ_INSERT_TAIL(&inclist, inc, inc_nextp);
 
 	STAILQ_FOREACH(inc, &inclist, inc_nextp) {
 		if (strcmp(inc->file, "-") != 0)
 			f = fopen(inc->file, "r");
 		else {
 			f = stdin;
 			inc->file = "<stdin>";
 		}
 		if (!f)
 			err(1, "%s", inc->file);
 
 		if (verbose)
 			printf("Processing %s\n", inc->file);
 		parse_file(f, filelist, globlist, defconf, &inclist);
 		(void) fclose(f);
 	}
 
 	/*
 	 * All config-file information has been read in and turned into
 	 * a filelist and a globlist.  If there were no specific files
 	 * given on the run command, then the only thing left to do is to
 	 * call a routine which finds all files matched by the globlist
 	 * and adds them to the filelist.  Then return the worklist.
 	 */
 	if (*files == NULL) {
 		expand_globs(filelist, globlist);
 		free_clist(globlist);
 		if (defconf != NULL)
 			free_entry(defconf);
 		return (filelist);
 		/* NOTREACHED */
 	}
 
 	/*
 	 * If newsyslog was given a specific list of files to process,
 	 * it may be that some of those files were not listed in any
 	 * config file.  Those unlisted files should get the default
 	 * rotation action.  First, create the default-rotation action
 	 * if none was found in a system config file.
 	 */
 	if (defconf == NULL) {
 		defconf = init_entry(DEFAULT_MARKER, NULL);
 		defconf->numlogs = 3;
 		defconf->trsize = 50;
 		defconf->permissions = S_IRUSR|S_IWUSR;
 	}
 
 	/*
 	 * If newsyslog was run with a list of specific filenames,
 	 * then create a new worklist which has only those files in
 	 * it, picking up the rotation-rules for those files from
 	 * the original filelist.
 	 *
 	 * XXX - Note that this will copy multiple rules for a single
 	 *	logfile, if multiple entries are an exact match for
 	 *	that file.  That matches the historic behavior, but do
 	 *	we want to continue to allow it?  If so, it should
 	 *	probably be handled more intelligently.
 	 */
 	cmdlist = malloc(sizeof(struct cflist));
 	if (cmdlist == NULL)
 		err(1, "malloc of cmdlist");
 	STAILQ_INIT(cmdlist);
 
 	for (given = files; *given; ++given) {
 		/*
 		 * First try to find exact-matches for this given file.
 		 */
 		gmatch = 0;
 		STAILQ_FOREACH(ent, filelist, cf_nextp) {
 			if (strcmp(ent->log, *given) == 0) {
 				gmatch++;
 				dupent = init_entry(*given, ent);
 				STAILQ_INSERT_TAIL(cmdlist, dupent, cf_nextp);
 			}
 		}
 		if (gmatch) {
 			if (verbose > 2)
 				printf("\t+ Matched entry %s\n", *given);
 			continue;
 		}
 
 		/*
 		 * There was no exact-match for this given file, so look
 		 * for a "glob" entry which does match.
 		 */
 		gmatch = 0;
 		if (verbose > 2 && globlist != NULL)
 			printf("\t+ Checking globs for %s\n", *given);
 		STAILQ_FOREACH(ent, globlist, cf_nextp) {
 			fnres = fnmatch(ent->log, *given, FNM_PATHNAME);
 			if (verbose > 2)
 				printf("\t+    = %d for pattern %s\n", fnres,
 				    ent->log);
 			if (fnres == 0) {
 				gmatch++;
 				dupent = init_entry(*given, ent);
 				/* This new entry is not a glob! */
 				dupent->flags &= ~CE_GLOB;
 				STAILQ_INSERT_TAIL(cmdlist, dupent, cf_nextp);
 				/* Only allow a match to one glob-entry */
 				break;
 			}
 		}
 		if (gmatch) {
 			if (verbose > 2)
 				printf("\t+ Matched %s via %s\n", *given,
 				    ent->log);
 			continue;
 		}
 
 		/*
 		 * This given file was not found in any config file, so
 		 * add a worklist item based on the default entry.
 		 */
 		if (verbose > 2)
 			printf("\t+ No entry matched %s  (will use %s)\n",
 			    *given, DEFAULT_MARKER);
 		dupent = init_entry(*given, defconf);
 		/* Mark that it was *not* found in a config file */
 		dupent->def_cfg = 1;
 		STAILQ_INSERT_TAIL(cmdlist, dupent, cf_nextp);
 	}
 
 	/*
 	 * Free all the entries in the original work list, the list of
 	 * glob entries, and the default entry.
 	 */
 	free_clist(filelist);
 	free_clist(globlist);
 	free_entry(defconf);
 
 	/* And finally, return a worklist which matches the given files. */
 	return (cmdlist);
 }
 
 /*
  * Expand the list of entries with filename patterns, and add all files
  * which match those glob-entries onto the worklist.
  */
 static void
 expand_globs(struct cflist *work_p, struct cflist *glob_p)
 {
 	int gmatch, gres;
 	size_t i;
 	char *mfname;
 	struct conf_entry *dupent, *ent, *globent;
 	glob_t pglob;
 	struct stat st_fm;
 
 	/*
 	 * The worklist contains all fully-specified (non-GLOB) names.
 	 *
 	 * Now expand the list of filename-pattern (GLOB) entries into
 	 * a second list, which (by definition) will only match files
 	 * that already exist.  Do not add a glob-related entry for any
 	 * file which already exists in the fully-specified list.
 	 */
 	STAILQ_FOREACH(globent, glob_p, cf_nextp) {
 		gres = glob(globent->log, GLOB_NOCHECK, NULL, &pglob);
 		if (gres != 0) {
 			warn("cannot expand pattern (%d): %s", gres,
 			    globent->log);
 			continue;
 		}
 
 		if (verbose > 2)
 			printf("\t+ Expanding pattern %s\n", globent->log);
 		for (i = 0; i < pglob.gl_matchc; i++) {
 			mfname = pglob.gl_pathv[i];
 
 			/* See if this file already has a specific entry. */
 			gmatch = 0;
 			STAILQ_FOREACH(ent, work_p, cf_nextp) {
 				if (strcmp(mfname, ent->log) == 0) {
 					gmatch++;
 					break;
 				}
 			}
 			if (gmatch)
 				continue;
 
 			/* Make sure the matched name is a regular file. */
 			gres = lstat(mfname, &st_fm);
 			if (gres != 0) {
 				/* Error on a file that glob() matched?!? */
 				warn("Skipping %s - lstat() error", mfname);
 				continue;
 			}
 			if (!S_ISREG(st_fm.st_mode)) {
 				/* We only rotate files! */
 				if (verbose > 2)
 					printf("\t+  . skipping %s (!file)\n",
 					    mfname);
 				continue;
 			}
 
 			if (verbose > 2)
 				printf("\t+  . add file %s\n", mfname);
 			dupent = init_entry(mfname, globent);
 			/* This new entry is not a glob! */
 			dupent->flags &= ~CE_GLOB;
 
 			/* Add to the worklist. */
 			STAILQ_INSERT_TAIL(work_p, dupent, cf_nextp);
 		}
 		globfree(&pglob);
 		if (verbose > 2)
 			printf("\t+ Done with pattern %s\n", globent->log);
 	}
 }
 
 /*
  * Parse a configuration file and update a linked list of all the logs to
  * process.
  */
 static void
 parse_file(FILE *cf, struct cflist *work_p, struct cflist *glob_p,
     struct conf_entry *defconf_p, struct ilist *inclist)
 {
 	char line[BUFSIZ], *parse, *q;
 	char *cp, *errline, *group;
 	struct conf_entry *working;
 	struct passwd *pwd;
 	struct group *grp;
 	glob_t pglob;
 	int eol, ptm_opts, res, special;
 	size_t i;
 
 	errline = NULL;
 	while (fgets(line, BUFSIZ, cf)) {
 		if ((line[0] == '\n') || (line[0] == '#') ||
 		    (strlen(line) == 0))
 			continue;
 		if (errline != NULL)
 			free(errline);
 		errline = strdup(line);
 		for (cp = line + 1; *cp != '\0'; cp++) {
 			if (*cp != '#')
 				continue;
 			if (*(cp - 1) == '\\') {
 				/* The regions overlap, so strcpy() is undefined. */
 				memmove(cp - 1, cp, strlen(cp) + 1);
 				cp--;
 				continue;
 			}
 			*cp = '\0';
 			break;
 		}
 
 		q = parse = missing_field(sob(line), errline);
 		parse = son(line);
 		if (!*parse)
 			errx(1, "malformed line (missing fields):\n%s",
 			    errline);
 		*parse = '\0';
 
 		/*
 		 * Allow people to set debug options via the config file.
 		 * (NOTE: debug options are undocumented, and may disappear
 		 * at any time, etc).
 		 */
 		if (strcasecmp(DEBUG_MARKER, q) == 0) {
 			q = parse = missing_field(sob(parse + 1), errline);
 			parse = son(parse);
 			if (!*parse)
 				warnx("debug line specifies no option:\n%s",
 				    errline);
 			else {
 				*parse = '\0';
 				parse_doption(q);
 			}
 			continue;
 		} else if (strcasecmp(INCLUDE_MARKER, q) == 0) {
 			if (verbose)
 				printf("Found: %s", errline);
 			q = parse = missing_field(sob(parse + 1), errline);
 			parse = son(parse);
 			if (!*parse) {
 				warnx("include line missing argument:\n%s",
 				    errline);
 				continue;
 			}
 
 			*parse = '\0';
 
 			if (isglobstr(q)) {
 				res = glob(q, GLOB_NOCHECK, NULL, &pglob);
 				if (res != 0) {
 					warn("cannot expand pattern (%d): %s",
 					    res, q);
 					continue;
 				}
 
 				if (verbose > 2)
 					printf("\t+ Expanding pattern %s\n", q);
 
 				for (i = 0; i < pglob.gl_matchc; i++)
 					add_to_queue(pglob.gl_pathv[i],
 					    inclist);
 				globfree(&pglob);
 			} else
 				add_to_queue(q, inclist);
 			continue;
 		}
 
 		special = 0;
 		working = init_entry(q, NULL);
 		if (strcasecmp(DEFAULT_MARKER, q) == 0) {
 			special = 1;
 			if (defconf_p != NULL) {
 				warnx("Ignoring duplicate entry for %s!", q);
 				free_entry(working);
 				continue;
 			}
 			defconf_p = working;
 		}
 
 		q = parse = missing_field(sob(parse + 1), errline);
 		parse = son(parse);
 		if (!*parse)
 			errx(1, "malformed line (missing fields):\n%s",
 			    errline);
 		*parse = '\0';
 		if ((group = strchr(q, ':')) != NULL ||
 		    (group = strrchr(q, '.')) != NULL) {
 			*group++ = '\0';
 			if (*q) {
 				if (!(isnumberstr(q))) {
 					if ((pwd = getpwnam(q)) == NULL)
 						errx(1,
 				     "error in config file; unknown user:\n%s",
 						    errline);
 					working->uid = pwd->pw_uid;
 				} else
 					working->uid = atoi(q);
 			} else
 				working->uid = (uid_t)-1;
 
 			q = group;
 			if (*q) {
 				if (!(isnumberstr(q))) {
 					if ((grp = getgrnam(q)) == NULL)
 						errx(1,
 				    "error in config file; unknown group:\n%s",
 						    errline);
 					working->gid = grp->gr_gid;
 				} else
 					working->gid = atoi(q);
 			} else
 				working->gid = (gid_t)-1;
 
 			q = parse = missing_field(sob(parse + 1), errline);
 			parse = son(parse);
 			if (!*parse)
 				errx(1, "malformed line (missing fields):\n%s",
 				    errline);
 			*parse = '\0';
 		} else {
 			working->uid = (uid_t)-1;
 			working->gid = (gid_t)-1;
 		}
 
 		if (!sscanf(q, "%o", &working->permissions))
 			errx(1, "error in config file; bad permissions:\n%s",
 			    errline);
 
 		q = parse = missing_field(sob(parse + 1), errline);
 		parse = son(parse);
 		if (!*parse)
 			errx(1, "malformed line (missing fields):\n%s",
 			    errline);
 		*parse = '\0';
 		if (!sscanf(q, "%d", &working->numlogs) || working->numlogs < 0)
 			errx(1, "error in config file; bad value for count of logs to save:\n%s",
 			    errline);
 
 		q = parse = missing_field(sob(parse + 1), errline);
 		parse = son(parse);
 		if (!*parse)
 			errx(1, "malformed line (missing fields):\n%s",
 			    errline);
 		*parse = '\0';
 		if (isdigitch(*q))
 			working->trsize = atoi(q);
 		else if (strcmp(q, "*") == 0)
 			working->trsize = -1;
 		else {
 			warnx("Invalid value of '%s' for 'size' in line:\n%s",
 			    q, errline);
 			working->trsize = -1;
 		}
 
 		working->flags = 0;
 		working->compress = COMPRESS_NONE;
 		q = parse = missing_field(sob(parse + 1), errline);
 		parse = son(parse);
 		eol = !*parse;
 		*parse = '\0';
 		{
 			char *ep;
 			u_long ul;
 
 			ul = strtoul(q, &ep, 10);
 			if (ep == q)
 				working->hours = 0;
 			else if (*ep == '*')
 				working->hours = -1;
 			else if (ul > INT_MAX)
 				errx(1, "interval is too large:\n%s", errline);
 			else
 				working->hours = ul;
 
 			if (*ep == '\0' || strcmp(ep, "*") == 0)
 				goto no_trimat;
 			if (*ep != '@' && *ep != '$')
 				errx(1, "malformed interval/at:\n%s", errline);
 
 			working->flags |= CE_TRIMAT;
 			working->trim_at = ptime_init(NULL);
 			ptm_opts = PTM_PARSE_ISO8601;
 			if (*ep == '$')
 				ptm_opts = PTM_PARSE_DWM;
 			ptm_opts |= PTM_PARSE_MATCHDOM;
 			res = ptime_relparse(working->trim_at, ptm_opts,
 			    ptimeget_secs(timenow), ep + 1);
 			if (res == -2)
 				errx(1, "nonexistent time for 'at' value:\n%s",
 				    errline);
 			else if (res < 0)
 				errx(1, "malformed 'at' value:\n%s", errline);
 		}
 no_trimat:
 
 		if (eol)
 			q = NULL;
 		else {
 			q = parse = sob(parse + 1);	/* Optional field */
 			parse = son(parse);
 			if (!*parse)
 				eol = 1;
 			*parse = '\0';
 		}
 
 		for (; q && *q && !isspacech(*q); q++) {
 			switch (tolowerch(*q)) {
 			case 'b':
 				working->flags |= CE_BINARY;
 				break;
 			case 'c':
 				working->flags |= CE_CREATE;
 				break;
 			case 'd':
 				working->flags |= CE_NODUMP;
 				break;
 			case 'g':
 				working->flags |= CE_GLOB;
 				break;
 			case 'j':
 				working->compress = COMPRESS_BZIP2;
 				break;
 			case 'n':
 				working->flags |= CE_NOSIGNAL;
 				break;
 			case 'r':
 				working->flags |= CE_PID2CMD;
 				break;
 			case 'u':
 				working->flags |= CE_SIGNALGROUP;
 				break;
 			case 'w':
 				/* Deprecated flag - kept for compatibility purposes */
 				break;
 			case 'x':
 				working->compress = COMPRESS_XZ;
 				break;
 			case 'z':
 				working->compress = COMPRESS_GZIP;
 				break;
 			case '-':
 				break;
 			case 'f':	/* Used by OpenBSD for "CE_FOLLOW" */
 			case 'm':	/* Used by OpenBSD for "CE_MONITOR" */
 			case 'p':	/* Used by NetBSD  for "CE_PLAIN0" */
 			default:
 				errx(1, "illegal flag in config file -- %c",
 				    *q);
 			}
 		}
 
 		if (eol)
 			q = NULL;
 		else {
 			q = parse = sob(parse + 1);	/* Optional field */
 			parse = son(parse);
 			if (!*parse)
 				eol = 1;
 			*parse = '\0';
 		}
 
 		working->pid_cmd_file = NULL;
 		if (q && *q) {
 			if (*q == '/')
 				working->pid_cmd_file = strdup(q);
 			else if (isalnum(*q))
 				goto got_sig;
 			else {
 				errx(1,
 			"illegal pid file or signal in config file:\n%s",
 				    errline);
 			}
 		}
 		if (eol)
 			q = NULL;
 		else {
 			q = parse = sob(parse + 1);	/* Optional field */
 			*(parse = son(parse)) = '\0';
 		}
 
 		working->sig = SIGHUP;
 		if (q && *q) {
 got_sig:
 			working->sig = parse_signal(q);
 			if (working->sig < 1 || working->sig >= sys_nsig) {
 				errx(1,
 				    "illegal signal in config file:\n%s",
 				    errline);
 			}
 		}
 
 		/*
 		 * Finish figuring out what pid-file to use (if any) in
 		 * later processing if this logfile needs to be rotated.
 		 */
 		if ((working->flags & CE_NOSIGNAL) == CE_NOSIGNAL) {
 			/*
 			 * This config-entry specified 'n' for nosignal,
 			 * see if it also specified an explicit pid_cmd_file.
 			 * This would be a pretty pointless combination.
 			 */
 			if (working->pid_cmd_file != NULL) {
 				warnx("Ignoring '%s' because flag 'n' was specified in line:\n%s",
 				    working->pid_cmd_file, errline);
 				free(working->pid_cmd_file);
 				working->pid_cmd_file = NULL;
 			}
 		} else if (working->pid_cmd_file == NULL) {
 			/*
 			 * This entry did not specify the 'n' flag, which
 			 * means it should signal syslogd unless it had
 			 * specified some other pid-file (and obviously the
 			 * syslog pid-file will not be for a process-group).
 			 * Also, we should only try to notify syslog if we
 			 * are root.
 			 */
 			if (working->flags & CE_SIGNALGROUP) {
 				warnx("Ignoring flag 'U' in line:\n%s",
 				    errline);
 				working->flags &= ~CE_SIGNALGROUP;
 			}
 			if (needroot)
 				working->pid_cmd_file = strdup(path_syslogpid);
 		}
 
 		/*
 		 * Add this entry to the appropriate list of entries, unless
 		 * it was some kind of special entry (eg: <default>).
 		 */
 		if (special) {
 			;			/* Do not add to any list */
 		} else if (working->flags & CE_GLOB) {
 			STAILQ_INSERT_TAIL(glob_p, working, cf_nextp);
 		} else {
 			STAILQ_INSERT_TAIL(work_p, working, cf_nextp);
 		}
 	}
 	if (errline != NULL)
 		free(errline);
 }
 
 static char *
 missing_field(char *p, char *errline)
 {
 
 	if (!p || !*p)
 		errx(1, "missing field in config file:\n%s", errline);
 	return (p);
 }
 
 /*
  * This comparator returns the reverse of what qsort() normally
  * expects, because we want the newest files first.  If two
  * entries have the same time, their relative order does not matter.
  *
  * Support function for qsort() in delete_oldest_timelog().
  */
 static int
 oldlog_entry_compare(const void *a, const void *b)
 {
 	const struct oldlog_entry *ola = a, *olb = b;
 
 	if (ola->t > olb->t)
 		return (-1);
 	else if (ola->t < olb->t)
 		return (1);
 	else
 		return (0);
 }
 
 /*
  * Check whether the file corresponding to dp is an archive of the logfile
  * logfname, based on the timefnamefmt format string. Return true and fill out
  * tm if this is the case; otherwise return false.
  */
 static int
 validate_old_timelog(int fd, const struct dirent *dp, const char *logfname,
     struct tm *tm)
 {
 	struct stat sb;
 	size_t logfname_len;
 	char *s;
 	int c;
 
 	logfname_len = strlen(logfname);
 
 	if (dp->d_type != DT_REG) {
 		/*
 		 * Some filesystems (e.g. NFS) don't fill out the d_type field
 		 * and leave it set to DT_UNKNOWN; in this case we must obtain
 		 * the file type ourselves.
 		 */
 		if (dp->d_type != DT_UNKNOWN ||
 		    fstatat(fd, dp->d_name, &sb, AT_SYMLINK_NOFOLLOW) != 0 ||
 		    !S_ISREG(sb.st_mode))
 			return (0);
 	}
 	/* Ignore everything but files with our logfile prefix. */
 	if (strncmp(dp->d_name, logfname, logfname_len) != 0)
 		return (0);
 	/* Ignore the actual non-rotated logfile. */
 	if (dp->d_namlen == logfname_len)
 		return (0);
 
 	/*
 	 * Make sure we have found a logfile that we created, i.e. that
 	 * the suffix is valid.  The format is: '.<time>(.[bgx]z)?'.
 	 */
 	if (dp->d_name[logfname_len] != '.') {
 		if (verbose)
 			printf("Ignoring %s which has unexpected "
 			    "extension '%s'\n", dp->d_name,
 			    &dp->d_name[logfname_len]);
 		return (0);
 	}
 	memset(tm, 0, sizeof(*tm));
 	if ((s = strptime(&dp->d_name[logfname_len + 1],
 	    timefnamefmt, tm)) == NULL) {
 		/*
 		 * We could special case "old" sequentially named logfiles here,
 		 * but we do not as that would require special handling to
 		 * decide which one was the oldest compared to "new" time based
 		 * logfiles.
 		 */
 		if (verbose)
 			printf("Ignoring %s which does not "
 			    "match time format\n", dp->d_name);
 		return (0);
 	}
 
 	for (c = 0; c < COMPRESS_TYPES; c++)
 		if (strcmp(s, compress_type[c].suffix) == 0)
 			/* We're done. */
 			return (1);
 
 	if (verbose)
 		printf("Ignoring %s which has unexpected extension '%s'\n",
 		    dp->d_name, s);
 
 	return (0);
 }
 
 /*
  * Delete the oldest logfiles, when using time based filenames.
  */
 static void
 delete_oldest_timelog(const struct conf_entry *ent, const char *archive_dir)
 {
 	char *logfname, *s, *dir, errbuf[80];
 	int dir_fd, i, logcnt, max_logcnt;
 	struct oldlog_entry *oldlogs;
 	struct dirent *dp;
 	const char *cdir;
 	struct tm tm;
 	DIR *dirp;
 
 	oldlogs = malloc(MAX_OLDLOGS * sizeof(struct oldlog_entry));
 	if (oldlogs == NULL)
 		err(1, "malloc()");
 	max_logcnt = MAX_OLDLOGS;
 	logcnt = 0;
 
 	if (archive_dir != NULL && archive_dir[0] != '\0')
 		cdir = archive_dir;
 	else
 		if ((cdir = dirname(ent->log)) == NULL)
 			err(1, "dirname()");
 	if ((dir = strdup(cdir)) == NULL)
 		err(1, "strdup()");
 
 	if ((s = basename(ent->log)) == NULL)
 		err(1, "basename()");
 	if ((logfname = strdup(s)) == NULL)
 		err(1, "strdup()");
 	if (strcmp(logfname, "/") == 0)
 		errx(1, "Invalid log filename - became '/'");
 
 	if (verbose > 2)
 		printf("Searching for old logs in %s\n", dir);
 
 	/* First we create a 'list' of all archived logfiles */
 	if ((dirp = opendir(dir)) == NULL)
 		err(1, "Cannot open log directory '%s'", dir);
 	dir_fd = dirfd(dirp);
 	while ((dp = readdir(dirp)) != NULL) {
 		if (validate_old_timelog(dir_fd, dp, logfname, &tm) == 0)
 			continue;
 
 		/*
 		 * We should now have an old rotated logfile, so
 		 * add it to the 'list'.
 		 */
 		if ((oldlogs[logcnt].t = timegm(&tm)) == -1)
 			err(1, "Could not convert time string to time value");
 		if ((oldlogs[logcnt].fname = strdup(dp->d_name)) == NULL)
 			err(1, "strdup()");
 		logcnt++;
 
 		/*
 		 * It is very unlikely that we will ever run out of space
 		 * in the logfile array at the default size, but let's
 		 * handle it anyway...
 		 */
 		if (logcnt >= max_logcnt) {
 			max_logcnt *= 4;
 			/* Detect integer overflow */
 			if (max_logcnt < logcnt)
 				errx(1, "Too many old logfiles found");
 			oldlogs = realloc(oldlogs,
 			    max_logcnt * sizeof(struct oldlog_entry));
 			if (oldlogs == NULL)
 				err(1, "realloc()");
 		}
 	}
 
 	/* Second, if needed, we delete the oldest archived logfiles */
 	if (logcnt > 0 && logcnt >= ent->numlogs && ent->numlogs > 1) {
 		oldlogs = realloc(oldlogs, logcnt *
 		    sizeof(struct oldlog_entry));
 		if (oldlogs == NULL)
 			err(1, "realloc()");
 
 		/*
 		 * We now sort the logs in the order of newest to
 		 * oldest.  That way we can simply skip over the
 		 * number of records we want to keep.
 		 */
 		qsort(oldlogs, logcnt, sizeof(struct oldlog_entry),
 		    oldlog_entry_compare);
 		for (i = ent->numlogs - 1; i < logcnt; i++) {
 			if (noaction)
 				printf("\trm -f %s/%s\n", dir,
 				    oldlogs[i].fname);
 			else if (unlinkat(dir_fd, oldlogs[i].fname, 0) != 0) {
 				snprintf(errbuf, sizeof(errbuf),
 				    "Could not delete old logfile '%s'",
 				    oldlogs[i].fname);
 				perror(errbuf);
 			}
 		}
 	} else if (verbose > 1)
 		printf("No old logs to delete for logfile %s\n", ent->log);
 
 	/* Third, clean up */
 	closedir(dirp);
 	for (i = 0; i < logcnt; i++) {
 		assert(oldlogs[i].fname != NULL);
 		free(oldlogs[i].fname);
 	}
 	free(oldlogs);
 	free(logfname);
 	free(dir);
 }
 
 /*
  * Generate a log filename, when using classic filenames.
  */
 static void
 gen_classiclog_fname(char *fname, size_t fname_sz, const char *archive_dir,
     const char *namepart, int numlogs_c)
 {
 
 	if (archive_dir[0] != '\0')
 		(void) snprintf(fname, fname_sz, "%s/%s.%d", archive_dir,
 		    namepart, numlogs_c);
 	else
 		(void) snprintf(fname, fname_sz, "%s.%d", namepart, numlogs_c);
 }
 
 /*
  * Delete a rotated logfile, when using classic filenames.
  */
 static void
 delete_classiclog(const char *archive_dir, const char *namepart, int numlog_c)
 {
 	char file1[MAXPATHLEN], zfile1[MAXPATHLEN];
 	int c;
 
 	gen_classiclog_fname(file1, sizeof(file1), archive_dir, namepart,
 	    numlog_c);
 
 	for (c = 0; c < COMPRESS_TYPES; c++) {
 		(void) snprintf(zfile1, sizeof(zfile1), "%s%s", file1,
 		    compress_type[c].suffix);
 		if (noaction)
 			printf("\trm -f %s\n", zfile1);
 		else
 			(void) unlink(zfile1);
 	}
 }
 
 /*
  * Only add to the queue if the file hasn't already been added. This is
  * done to prevent circular include loops.
  */
 static void
 add_to_queue(const char *fname, struct ilist *inclist)
 {
 	struct include_entry *inc;
 
 	STAILQ_FOREACH(inc, inclist, inc_nextp) {
 		if (strcmp(fname, inc->file) == 0) {
 			warnx("duplicate include detected: %s", fname);
 			return;
 		}
 	}
 
 	inc = malloc(sizeof(struct include_entry));
 	if (inc == NULL)
 		err(1, "malloc of inc");
 	inc->file = strdup(fname);
 
 	if (verbose > 2)
 		printf("\t+ Adding %s to the processing queue.\n", fname);
 
 	STAILQ_INSERT_TAIL(inclist, inc, inc_nextp);
 }
 
 /*
  * Search for logfile and return its compression suffix (if supported).
  * Suffix detection is first-match, in the order of compress_type[].
  *
  * Note: if the logfile exists without a suffix (uncompressed,
  * COMPRESS_NONE), a zero-length string is returned.  If no matching
  * file exists at all, NULL is returned.
  */
 static const char *
 get_logfile_suffix(const char *logfile)
 {
 	struct stat st;
 	char zfile[MAXPATHLEN];
 	int c;
 
 	for (c = 0; c < COMPRESS_TYPES; c++) {
 		(void) strlcpy(zfile, logfile, MAXPATHLEN);
 		(void) strlcat(zfile, compress_type[c].suffix, MAXPATHLEN);
 		if (lstat(zfile, &st) == 0)
 			return (compress_type[c].suffix);
 	}
 	return (NULL);
 }
 
 static fk_entry
 do_rotate(const struct conf_entry *ent)
 {
 	char dirpart[MAXPATHLEN], namepart[MAXPATHLEN];
 	char file1[MAXPATHLEN], file2[MAXPATHLEN];
 	char zfile1[MAXPATHLEN], zfile2[MAXPATHLEN];
 	const char *logfile_suffix;
 	char datetimestr[30];
 	int flags, numlogs_c;
 	fk_entry free_or_keep;
 	struct sigwork_entry *swork;
 	struct stat st;
 	struct tm tm;
 	time_t now;
 
 	flags = ent->flags;
 	free_or_keep = FREE_ENT;
 
 	if (archtodir) {
 		char *p;
 
 		/* build complete name of archive directory into dirpart */
 		if (*archdirname == '/') {	/* absolute */
 			strlcpy(dirpart, archdirname, sizeof(dirpart));
 		} else {	/* relative */
 			/* get directory part of logfile */
 			strlcpy(dirpart, ent->log, sizeof(dirpart));
 			if ((p = strrchr(dirpart, '/')) == NULL)
 				dirpart[0] = '\0';
 			else
 				*(p + 1) = '\0';
 			strlcat(dirpart, archdirname, sizeof(dirpart));
 		}
 
 		/* check if archive directory exists, if not, create it */
 		if (lstat(dirpart, &st))
 			createdir(ent, dirpart);
 
 		/* get filename part of logfile */
 		if ((p = strrchr(ent->log, '/')) == NULL)
 			strlcpy(namepart, ent->log, sizeof(namepart));
 		else
 			strlcpy(namepart, p + 1, sizeof(namepart));
 	} else {
 		/*
 		 * Tell utility functions we are not using an archive
 		 * dir.
 		 */
 		dirpart[0] = '\0';
 		strlcpy(namepart, ent->log, sizeof(namepart));
 	}
 
 	/* Delete old logs */
 	if (timefnamefmt != NULL)
 		delete_oldest_timelog(ent, dirpart);
 	else {
 		/*
 		 * Handle cleaning up after legacy newsyslog where we
 		 * kept ent->numlogs + 1 files.  This code can go away
 		 * at some point in the future.
 		 */
 		delete_classiclog(dirpart, namepart, ent->numlogs);
 
 		if (ent->numlogs > 0)
 			delete_classiclog(dirpart, namepart, ent->numlogs - 1);
 
 	}
 
 	if (timefnamefmt != NULL) {
 		/* If the time functions fail, fall back to a zeroed struct tm. */
 		if (time(&now) == (time_t)-1 ||
 		    localtime_r(&now, &tm) == NULL)
 			bzero(&tm, sizeof(tm));
 
 		strftime(datetimestr, sizeof(datetimestr), timefnamefmt, &tm);
 		if (archtodir)
 			(void) snprintf(file1, sizeof(file1), "%s/%s.%s",
 			    dirpart, namepart, datetimestr);
 		else
 			(void) snprintf(file1, sizeof(file1), "%s.%s",
 			    ent->log, datetimestr);
 
 		/* Don't run the code to move down logs */
 		numlogs_c = -1;
 	} else {
 		gen_classiclog_fname(file1, sizeof(file1), dirpart, namepart,
 		    ent->numlogs - 1);
 		numlogs_c = ent->numlogs - 2;		/* copy for countdown */
 	}
 
 	/* Move down log files */
 	for (; numlogs_c >= 0; numlogs_c--) {
 		(void) strlcpy(file2, file1, sizeof(file2));
 
 		gen_classiclog_fname(file1, sizeof(file1), dirpart, namepart,
 		    numlogs_c);
 
 		logfile_suffix = get_logfile_suffix(file1);
 		if (logfile_suffix == NULL)
 			continue;
 		(void) strlcpy(zfile1, file1, MAXPATHLEN);
 		(void) strlcpy(zfile2, file2, MAXPATHLEN);
 		(void) strlcat(zfile1, logfile_suffix, MAXPATHLEN);
 		(void) strlcat(zfile2, logfile_suffix, MAXPATHLEN);
 
 		if (noaction)
 			printf("\tmv %s %s\n", zfile1, zfile2);
 		else {
 			if (rename(zfile1, zfile2) != 0)
 				warn("can't rename %s to %s", zfile1,
 				    zfile2);
 		}
 		change_attrs(zfile2, ent);
 	}
 
 	if (ent->numlogs > 0) {
 		if (noaction) {
 			/*
 			 * Note that savelog() may succeed with using link()
 			 * for the archtodir case, but there is no good way
 			 * of knowing if it will when doing "noaction", so
 			 * here we claim that it will have to do a copy...
 			 */
 			if (archtodir)
 				printf("\tcp %s %s\n", ent->log, file1);
 			else
 				printf("\tln %s %s\n", ent->log, file1);
 			printf("\ttouch %s\t\t"
 			    "# Update mtime for 'when'-interval processing\n",
 			    file1);
 		} else {
 			if (!(flags & CE_BINARY)) {
 				/* Report the trimming to the old log */
 				log_trim(ent->log, ent);
 			}
 			savelog(ent->log, file1);
 			/*
 			 * Interval-based rotations are done using the mtime of
 			 * the most recently archived log, so make sure it gets
 			 * updated during a rotation.
 			 */
 			utimes(file1, NULL);
 		}
 		change_attrs(file1, ent);
 	}
 
 	/* Create the new log file and move it into place */
 	if (noaction)
 		printf("Start new log...\n");
 	createlog(ent);
 
 	/*
 	 * Save all signalling and file-compression to be done after log
 	 * files from all entries have been rotated.  This way any one
 	 * process will not be sent the same signal multiple times when
 	 * multiple log files had to be rotated.
 	 */
 	swork = NULL;
 	if (ent->pid_cmd_file != NULL)
 		swork = save_sigwork(ent);
 	if (ent->numlogs > 0 && ent->compress > COMPRESS_NONE) {
 		/*
 		 * The zipwork_entry will include a pointer to this
 		 * conf_entry, so the conf_entry should not be freed.
 		 */
 		free_or_keep = KEEP_ENT;
 		save_zipwork(ent, swork, ent->fsize, file1);
 	}
 
 	return (free_or_keep);
 }
 
 static void
 do_sigwork(struct sigwork_entry *swork)
 {
 	struct sigwork_entry *nextsig;
 	int kres, secs;
 	char *tmp;
 
 	if (swork->sw_runcmd == 0 && (!(swork->sw_pidok) || swork->sw_pid == 0))
 		return;			/* no work to do... */
 
 	/*
 	 * If nosignal (-s) was specified, then do not signal any process.
 	 * Note that a nosignal request triggers a warning message if the
 	 * rotated logfile needs to be compressed, *unless* -R was also
 	 * specified.  We assume that an `-sR' request came from a process
 	 * which writes to the logfile, and as such, we assume that process
 	 * has already made sure the logfile is not presently in use.  This
 	 * just sets swork->sw_pidok to a special value, and do_zipwork
 	 * will print any necessary warning(s).
 	 */
 	if (nosignal) {
 		if (!rotatereq)
 			swork->sw_pidok = -1;
 		return;
 	}
 
 	/*
 	 * Compute the pause between consecutive signals.  Use a longer
 	 * sleep time if we will be sending two signals to the same
 	 * daemon or process-group.
 	 */
 	secs = 0;
 	nextsig = SLIST_NEXT(swork, sw_nextp);
 	if (nextsig != NULL) {
 		if (swork->sw_pid == nextsig->sw_pid)
 			secs = 10;
 		else
 			secs = 1;
 	}
 
 	if (noaction) {
 		if (swork->sw_runcmd)
 			printf("\tsh -c '%s %d'\n", swork->sw_fname,
 			    swork->sw_signum);
 		else {
 			printf("\tkill -%d %d \t\t# %s\n", swork->sw_signum,
 			    (int)swork->sw_pid, swork->sw_fname);
 			if (secs > 0)
 				printf("\tsleep %d\n", secs);
 		}
 		return;
 	}
 
 	if (swork->sw_runcmd) {
 		asprintf(&tmp, "%s %d", swork->sw_fname, swork->sw_signum);
 		if (tmp == NULL) {
 			warn("can't allocate memory to run %s",
 			    swork->sw_fname);
 			return;
 		}
 		if (verbose)
 			printf("Run command: %s\n", tmp);
 		kres = system(tmp);
 		if (kres) {
 			warnx("%s: returned non-zero exit code: %d",
 			    tmp, kres);
 		}
 		free(tmp);
 		return;
 	}
 
 	kres = kill(swork->sw_pid, swork->sw_signum);
 	if (kres != 0) {
 		/*
 		 * Assume that "no such process" (ESRCH) is something
 		 * to warn about, but is not an error.  Presumably the
 		 * process which writes to the rotated log file(s) is
 		 * gone, in which case we should have no problem with
 		 * compressing the rotated log file(s).
 		 */
 		if (errno != ESRCH)
 			swork->sw_pidok = 0;
 		warn("can't notify %s, pid %d = %s", swork->sw_pidtype,
 		    (int)swork->sw_pid, swork->sw_fname);
 	} else {
 		if (verbose)
 			printf("Notified %s pid %d = %s\n", swork->sw_pidtype,
 			    (int)swork->sw_pid, swork->sw_fname);
 		if (secs > 0) {
 			if (verbose)
 				printf("Pause %d second(s) between signals\n",
 				    secs);
 			sleep(secs);
 		}
 	}
 }
 
 static void
 do_zipwork(struct zipwork_entry *zwork)
 {
 	const char *pgm_name, *pgm_path;
 	int errsav, fcount, zstatus;
 	pid_t pidzip, wpid;
 	char zresult[MAXPATHLEN];
 	int c;
 
 	assert(zwork != NULL);
 	pgm_path = NULL;
 	strlcpy(zresult, zwork->zw_fname, sizeof(zresult));
 	if (zwork->zw_conf != NULL &&
 	    zwork->zw_conf->compress > COMPRESS_NONE)
 		for (c = 1; c < COMPRESS_TYPES; c++) {
 			if (zwork->zw_conf->compress == c) {
 				pgm_path = compress_type[c].path;
 				(void) strlcat(zresult,
 				    compress_type[c].suffix, sizeof(zresult));
 				break;
 			}
 		}
 	if (pgm_path == NULL) {
 		warnx("invalid entry for %s in do_zipwork", zwork->zw_fname);
 		return;
 	}
 	pgm_name = strrchr(pgm_path, '/');
 	if (pgm_name == NULL)
 		pgm_name = pgm_path;
 	else
 		pgm_name++;
 
 	if (zwork->zw_swork != NULL && zwork->zw_swork->sw_runcmd == 0 &&
 	    zwork->zw_swork->sw_pidok <= 0) {
 		warnx(
 		    "log %s not compressed because daemon(s) not notified",
 		    zwork->zw_fname);
 		change_attrs(zwork->zw_fname, zwork->zw_conf);
 		return;
 	}
 
 	if (noaction) {
 		printf("\t%s %s\n", pgm_name, zwork->zw_fname);
 		change_attrs(zresult, zwork->zw_conf);
 		return;
 	}
 
 	fcount = 1;
 	pidzip = fork();
 	while (pidzip < 0) {
 		/*
 		 * The fork failed.  If the failure was due to a temporary
 		 * problem, then wait a short time and try it again.
 		 */
 		errsav = errno;
 		warn("fork() for `%s %s'", pgm_name, zwork->zw_fname);
 		if (errsav != EAGAIN || fcount > 5)
 			errx(1, "Exiting...");
 		sleep(fcount * 12);
 		fcount++;
 		pidzip = fork();
 	}
 	if (!pidzip) {
 		/* The child process executes the compression command */
 		execl(pgm_path, pgm_path, "-f", zwork->zw_fname, (char *)0);
 		err(1, "execl(`%s -f %s')", pgm_path, zwork->zw_fname);
 	}
 
 	wpid = waitpid(pidzip, &zstatus, 0);
 	if (wpid == -1) {
 		/* XXX - should this be a fatal error? */
 		warn("%s: waitpid(%d)", pgm_path, pidzip);
 		return;
 	}
 	if (!WIFEXITED(zstatus)) {
 		warnx("`%s -f %s' did not terminate normally", pgm_name,
 		    zwork->zw_fname);
 		return;
 	}
 	if (WEXITSTATUS(zstatus)) {
 		warnx("`%s -f %s' terminated with a non-zero status (%d)",
 		    pgm_name, zwork->zw_fname, WEXITSTATUS(zstatus));
 		return;
 	}
 
 	/* Compression was successful, set file attributes on the result. */
 	change_attrs(zresult, zwork->zw_conf);
 }
 
 /*
  * Save information on any process we need to signal.  Any single
  * process may need to be sent different signal-values for different
  * log files, but usually a single signal-value will cause the process
  * to close and re-open all of its log files.
  */
 static struct sigwork_entry *
 save_sigwork(const struct conf_entry *ent)
 {
 	struct sigwork_entry *sprev, *stmp;
 	int ndiff;
 	size_t tmpsiz;
 
 	sprev = NULL;
 	ndiff = 1;
 	SLIST_FOREACH(stmp, &swhead, sw_nextp) {
 		ndiff = strcmp(ent->pid_cmd_file, stmp->sw_fname);
 		if (ndiff > 0)
 			break;
 		if (ndiff == 0) {
 			if (ent->sig == stmp->sw_signum)
 				break;
 			if (ent->sig > stmp->sw_signum) {
 				ndiff = 1;
 				break;
 			}
 		}
 		sprev = stmp;
 	}
 	if (stmp != NULL && ndiff == 0)
 		return (stmp);
 
 	tmpsiz = sizeof(struct sigwork_entry) + strlen(ent->pid_cmd_file) + 1;
 	stmp = malloc(tmpsiz);
 	if (stmp == NULL)
 		err(1, "malloc of sigwork_entry");
 
 	stmp->sw_runcmd = 0;
 	/* If this is a command to run we just set the flag and run command */
 	if (ent->flags & CE_PID2CMD) {
 		stmp->sw_pid = -1;
 		stmp->sw_pidok = 0;
 		stmp->sw_runcmd = 1;
 	} else {
 		set_swpid(stmp, ent);
 	}
 	stmp->sw_signum = ent->sig;
 	strcpy(stmp->sw_fname, ent->pid_cmd_file);
 	if (sprev == NULL)
 		SLIST_INSERT_HEAD(&swhead, stmp, sw_nextp);
 	else
 		SLIST_INSERT_AFTER(sprev, stmp, sw_nextp);
 	return (stmp);
 }
 
 /*
  * Save information on any file we need to compress.  We may see the same
  * file multiple times, so check the full list to avoid duplicates.  The
  * list itself is sorted smallest-to-largest, because that's the order we
  * want to compress the files.  If the partition is very low on disk space,
  * then the smallest files are the most likely to compress, and compressing
  * them first will free up more space for the larger files.
  */
 static struct zipwork_entry *
 save_zipwork(const struct conf_entry *ent, const struct sigwork_entry *swork,
     int zsize, const char *zipfname)
 {
 	struct zipwork_entry *zprev, *ztmp;
 	int ndiff;
 	size_t tmpsiz;
 
 	/* Compute the size if the caller did not know it. */
 	if (zsize < 0)
 		zsize = sizefile(zipfname);
 
 	zprev = NULL;
 	ndiff = 1;
 	SLIST_FOREACH(ztmp, &zwhead, zw_nextp) {
 		ndiff = strcmp(zipfname, ztmp->zw_fname);
 		if (ndiff == 0)
 			break;
 		if (zsize > ztmp->zw_fsize)
 			zprev = ztmp;
 	}
 	if (ztmp != NULL && ndiff == 0)
 		return (ztmp);
 
 	tmpsiz = sizeof(struct zipwork_entry) + strlen(zipfname) + 1;
 	ztmp = malloc(tmpsiz);
 	if (ztmp == NULL)
 		err(1, "malloc of zipwork_entry");
 	ztmp->zw_conf = ent;
 	ztmp->zw_swork = swork;
 	ztmp->zw_fsize = zsize;
 	strcpy(ztmp->zw_fname, zipfname);
 	if (zprev == NULL)
 		SLIST_INSERT_HEAD(&zwhead, ztmp, zw_nextp);
 	else
 		SLIST_INSERT_AFTER(zprev, ztmp, zw_nextp);
 	return (ztmp);
 }
 
 /* Read the process or process-group ID to signal from the pid file */
 static void
 set_swpid(struct sigwork_entry *swork, const struct conf_entry *ent)
 {
 	FILE *f;
 	long minok, maxok, rval;
 	char *endp, *linep, line[BUFSIZ];
 
 	minok = MIN_PID;
 	maxok = MAX_PID;
 	swork->sw_pidok = 0;
 	swork->sw_pid = 0;
 	swork->sw_pidtype = "daemon";
 	if (ent->flags & CE_SIGNALGROUP) {
 		/*
 		 * If we are expected to signal a process-group when
 		 * rotating this logfile, then the value read in should
 		 * be the negative of a valid process ID.
 		 */
 		minok = -MAX_PID;
 		maxok = -MIN_PID;
 		swork->sw_pidtype = "process-group";
 	}
 
 	f = fopen(ent->pid_cmd_file, "r");
 	if (f == NULL) {
 		if (errno == ENOENT && enforcepid == 0) {
 			/*
 			 * Warn if the PID file doesn't exist, but do
 			 * not consider it an error.  Most likely it
 			 * means the process has been terminated,
 			 * so it should be safe to rotate any log
 			 * files that the process would have been using.
 			 */
 			swork->sw_pidok = 1;
 			warnx("pid file doesn't exist: %s", ent->pid_cmd_file);
 		} else
 			warn("can't open pid file: %s", ent->pid_cmd_file);
 		return;
 	}
 
 	if (fgets(line, BUFSIZ, f) == NULL) {
 		/*
 		 * Warn if the PID file is empty, but do not consider
 		 * it an error.  Most likely it means the process has
 		 * terminated, so it should be safe to rotate any
 		 * log files that the process would have been using.
 		 */
 		if (feof(f) && enforcepid == 0) {
 			swork->sw_pidok = 1;
 			warnx("pid/cmd file is empty: %s", ent->pid_cmd_file);
 		} else
 			warn("can't read from pid file: %s", ent->pid_cmd_file);
 		(void)fclose(f);
 		return;
 	}
 	(void)fclose(f);
 
 	errno = 0;
 	linep = line;
 	while (*linep == ' ')
 		linep++;
 	rval = strtol(linep, &endp, 10);
 	if (*endp != '\0' && !isspacech(*endp)) {
 		warnx("pid file does not start with a valid number: %s",
 		    ent->pid_cmd_file);
 	} else if (rval < minok || rval > maxok) {
 		warnx("bad value '%ld' for process number in %s",
 		    rval, ent->pid_cmd_file);
 		if (verbose)
 			warnx("\t(expecting value between %ld and %ld)",
 			    minok, maxok);
 	} else {
 		swork->sw_pidok = 1;
 		swork->sw_pid = rval;
 	}
 
 	return;
 }
 
 /* Log the fact that the logs were turned over */
 static int
 log_trim(const char *logname, const struct conf_entry *log_ent)
 {
 	FILE *f;
 	const char *xtra;
 
 	if ((f = fopen(logname, "a")) == NULL)
 		return (-1);
 	xtra = "";
 	if (log_ent->def_cfg)
 		xtra = " using <default> rule";
 	if (log_ent->firstcreate)
 		fprintf(f, "%s %s newsyslog[%d]: logfile first created%s\n",
 		    daytime, hostname, (int) getpid(), xtra);
 	else if (log_ent->r_reason != NULL)
 		fprintf(f, "%s %s newsyslog[%d]: logfile turned over%s%s\n",
 		    daytime, hostname, (int) getpid(), log_ent->r_reason, xtra);
 	else
 		fprintf(f, "%s %s newsyslog[%d]: logfile turned over%s\n",
 		    daytime, hostname, (int) getpid(), xtra);
 	if (fclose(f) == EOF)
 		err(1, "log_trim: fclose");
 	return (0);
 }
 
 /* Return size in kilobytes of a file */
 static int
 sizefile(const char *file)
 {
 	struct stat sb;
 
 	if (stat(file, &sb) < 0)
 		return (-1);
 	return (kbytes(sb.st_size));
 }
 
 /*
  * Return the mtime of the most recent archive of the logfile, using timestamp
  * based filenames.
  */
 static time_t
 mtime_old_timelog(const char *file)
 {
 	struct stat sb;
 	struct tm tm;
 	int dir_fd;
 	time_t t;
 	struct dirent *dp;
 	DIR *dirp;
 	char *s, *logfname, *dir;
 
 	t = -1;
 
 	if ((dir = dirname(file)) == NULL) {
 		warn("dirname() of '%s'", file);
 		return (t);
 	}
 	if ((s = basename(file)) == NULL) {
 		warn("basename() of '%s'", file);
 		return (t);
 	} else if (s[0] == '/') {
 		warnx("Invalid log filename '%s'", s);
 		return (t);
 	} else if ((logfname = strdup(s)) == NULL)
 		err(1, "strdup()");
 
 	if ((dirp = opendir(dir)) == NULL) {
 		warn("Cannot open log directory '%s'", dir);
 		free(logfname);
 		return (t);
 	}
 	dir_fd = dirfd(dirp);
 	/* Open the archive dir and find the most recent archive of logfname. */
 	while ((dp = readdir(dirp)) != NULL) {
 		if (validate_old_timelog(dir_fd, dp, logfname, &tm) == 0)
 			continue;
 
 		if (fstatat(dir_fd, dp->d_name, &sb, AT_SYMLINK_NOFOLLOW) == -1) {
 			warn("Cannot stat '%s/%s'", dir, dp->d_name);
 			continue;
 		}
 		if (t < sb.st_mtime)
 			t = sb.st_mtime;
 	}
 	closedir(dirp);
 	free(logfname);
 
 	return (t);
 }
 
 /* Return the age in hours of the most recent archive of the logfile. */
 static int
 age_old_log(const char *file)
 {
 	struct stat sb;
 	const char *logfile_suffix;
 	char tmp[MAXPATHLEN + sizeof(".0") + COMPRESS_SUFFIX_MAXLEN + 1];
 	time_t mtime;
 
 	if (archtodir) {
 		char *p;
 
 		/* build name of archive directory into tmp */
 		if (*archdirname == '/') {	/* absolute */
 			strlcpy(tmp, archdirname, sizeof(tmp));
 		} else {	/* relative */
 			/* get directory part of logfile */
 			strlcpy(tmp, file, sizeof(tmp));
 			if ((p = strrchr(tmp, '/')) == NULL)
 				tmp[0] = '\0';
 			else
 				*(p + 1) = '\0';
 			strlcat(tmp, archdirname, sizeof(tmp));
 		}
 
 		strlcat(tmp, "/", sizeof(tmp));
 
 		/* get filename part of logfile */
 		if ((p = strrchr(file, '/')) == NULL)
 			strlcat(tmp, file, sizeof(tmp));
 		else
 			strlcat(tmp, p + 1, sizeof(tmp));
 	} else {
 		(void) strlcpy(tmp, file, sizeof(tmp));
 	}
 
 	if (timefnamefmt != NULL) {
 		mtime = mtime_old_timelog(tmp);
 		if (mtime == -1)
 			return (-1);
 	} else {
 		strlcat(tmp, ".0", sizeof(tmp));
 		logfile_suffix = get_logfile_suffix(tmp);
 		if (logfile_suffix == NULL)
 			return (-1);
 		(void) strlcat(tmp, logfile_suffix, sizeof(tmp));
 		if (stat(tmp, &sb) < 0)
 			return (-1);
 		mtime = sb.st_mtime;
 	}
 
 	return ((int)(ptimeget_secs(timenow) - mtime + 1800) / 3600);
 }
 
 /* Skip Over Blanks */
 static char *
 sob(char *p)
 {
 	while (p && *p && isspacech(*p))
 		p++;
 	return (p);
 }
 
 /* Skip Over Non-Blanks */
 static char *
 son(char *p)
 {
 	while (p && *p && !isspacech(*p))
 		p++;
 	return (p);
 }
 
 /* Check if string is actually a number */
 static int
 isnumberstr(const char *string)
 {
 	while (*string) {
 		if (!isdigitch(*string++))
 			return (0);
 	}
 	return (1);
 }
 
 /* Check if string contains a glob */
 static int
 isglobstr(const char *string)
 {
 	char chr;
 
 	while ((chr = *string++)) {
 		if (chr == '*' || chr == '?' || chr == '[')
 			return (1);
 	}
 	return (0);
 }
 
 /*
  * Save the active log file under a new name.  A link to the new name
  * is the quick-and-easy way to do this.  If that fails (which it will
  * if the destination is on another partition), then make a copy of
  * the file to the new location.
  */
 static void
 savelog(char *from, char *to)
 {
 	FILE *src, *dst;
 	int c, res;
 
 	res = link(from, to);
 	if (res == 0)
 		return;
 
 	if ((src = fopen(from, "r")) == NULL)
 		err(1, "can't fopen %s for reading", from);
 	if ((dst = fopen(to, "w")) == NULL)
 		err(1, "can't fopen %s for writing", to);
 
 	while ((c = getc(src)) != EOF) {
 		if ((putc(c, dst)) == EOF)
 			err(1, "error writing to %s", to);
 	}
 
 	if (ferror(src))
 		err(1, "error reading from %s", from);
 	if ((fclose(src)) != 0)
 		err(1, "can't fclose %s", from);
 	if ((fclose(dst)) != 0)
 		err(1, "can't fclose %s", to);
 }
 
 /* create one or more directory components of a path */
 static void
 createdir(const struct conf_entry *ent, char *dirpart)
 {
 	int res;
 	char *s, *d;
 	char mkdirpath[MAXPATHLEN];
 	struct stat st;
 
 	s = dirpart;
 	d = mkdirpath;
 
 	for (;;) {
 		*d++ = *s++;
 		if (*s != '/' && *s != '\0')
 			continue;
 		*d = '\0';
 		res = lstat(mkdirpath, &st);
 		if (res != 0) {
 			if (noaction) {
 				printf("\tmkdir %s\n", mkdirpath);
 			} else {
 				res = mkdir(mkdirpath, 0755);
 				if (res != 0)
 					err(1, "Error on mkdir(\"%s\") for -a",
 					    mkdirpath);
 			}
 		}
 		if (*s == '\0')
 			break;
 	}
 	if (verbose) {
 		if (ent->firstcreate)
 			printf("Created directory '%s' for new %s\n",
 			    dirpart, ent->log);
 		else
 			printf("Created directory '%s' for -a\n", dirpart);
 	}
 }
 
 /*
  * Create a new log file, destroying any currently-existing version
  * of the log file in the process.  If the caller wants a backup copy
  * of the file to exist, they should call 'link(logfile,logbackup)'
  * before calling this routine.
  */
 void
 createlog(const struct conf_entry *ent)
 {
 	int fd, failed;
 	struct stat st;
 	char *realfile, *slash, tempfile[MAXPATHLEN];
 
 	fd = -1;
 	realfile = ent->log;
 
 	/*
 	 * If this log file is being created for the first time (-C option),
 	 * then it may also be true that the parent directory does not exist
 	 * yet.  Check, and create that directory if it is missing.
 	 */
 	if (ent->firstcreate) {
 		strlcpy(tempfile, realfile, sizeof(tempfile));
 		slash = strrchr(tempfile, '/');
 		if (slash != NULL) {
 			*slash = '\0';
 			failed = stat(tempfile, &st);
 			if (failed && errno != ENOENT)
 				err(1, "Error on stat(%s)", tempfile);
 			if (failed)
 				createdir(ent, tempfile);
 			else if (!S_ISDIR(st.st_mode))
 				errx(1, "%s exists but is not a directory",
 				    tempfile);
 		}
 	}
 
 	/*
 	 * First create an unused filename, so it can be chown'ed and
 	 * chmod'ed before it is moved into the real location.  mkstemp
 	 * will create the file mode=600 & owned by us.  Note that all
 	 * temp files will have a suffix of '.z<something>'.
 	 */
 	strlcpy(tempfile, realfile, sizeof(tempfile));
 	strlcat(tempfile, ".zXXXXXX", sizeof(tempfile));
 	if (noaction)
 		printf("\tmktemp %s\n", tempfile);
 	else {
 		fd = mkstemp(tempfile);
 		if (fd < 0)
 			err(1, "can't mkstemp logfile %s", tempfile);
 
 		/*
 		 * Add status message to what will become the new log file.
 		 */
 		if (!(ent->flags & CE_BINARY)) {
 			if (log_trim(tempfile, ent))
 				err(1, "can't add status message to log");
 		}
 	}
 
 	/* Change the owner/group, if we are supposed to */
 	if (ent->uid != (uid_t)-1 || ent->gid != (gid_t)-1) {
 		if (noaction)
 			printf("\tchown %u:%u %s\n", ent->uid, ent->gid,
 			    tempfile);
 		else {
 			failed = fchown(fd, ent->uid, ent->gid);
 			if (failed)
 				err(1, "can't fchown temp file %s", tempfile);
 		}
 	}
 
 	/* Turn on NODUMP if it was requested in the config-file. */
 	if (ent->flags & CE_NODUMP) {
 		if (noaction)
 			printf("\tchflags nodump %s\n", tempfile);
 		else {
 			failed = fchflags(fd, UF_NODUMP);
 			if (failed) {
 				warn("createlog: fchflags(NODUMP)");
 			}
 		}
 	}
 
 	/*
 	 * Note that if the real logfile still exists, and if the call
 	 * to rename() fails, then "neither the old file nor the new
 	 * file shall be changed or created" (to quote the standard).
 	 * If the call succeeds, then the file will be replaced without
 	 * any window where some other process might find that the file
 	 * did not exist.
 	 * XXX - ? It may be that for some error conditions, we could
 	 *	retry by first removing the realfile and then renaming.
 	 */
 	if (noaction) {
 		printf("\tchmod %o %s\n", ent->permissions, tempfile);
 		printf("\tmv %s %s\n", tempfile, realfile);
 	} else {
 		failed = fchmod(fd, ent->permissions);
 		if (failed)
 			err(1, "can't fchmod temp file '%s'", tempfile);
 		failed = rename(tempfile, realfile);
 		if (failed)
 			err(1, "can't mv %s to %s", tempfile, realfile);
 	}
 
 	if (fd >= 0)
 		close(fd);
 }
 
 /*
  * Change the attributes of a given filename to what was specified in
  * the newsyslog.conf entry.  This routine is only called for files
  * that newsyslog expects that it has created, and thus it is a fatal
  * error if this routine finds that the file does not exist.
  */
 static void
 change_attrs(const char *fname, const struct conf_entry *ent)
 {
 	int failed;
 
 	if (noaction) {
 		printf("\tchmod %o %s\n", ent->permissions, fname);
 
 		if (ent->uid != (uid_t)-1 || ent->gid != (gid_t)-1)
 			printf("\tchown %u:%u %s\n",
 			    ent->uid, ent->gid, fname);
 
 		if (ent->flags & CE_NODUMP)
 			printf("\tchflags nodump %s\n", fname);
 		return;
 	}
 
 	failed = chmod(fname, ent->permissions);
 	if (failed) {
 		if (errno != EPERM)
 			err(1, "chmod(%s) in change_attrs", fname);
 		warn("change_attrs couldn't chmod(%s)", fname);
 	}
 
 	if (ent->uid != (uid_t)-1 || ent->gid != (gid_t)-1) {
 		failed = chown(fname, ent->uid, ent->gid);
 		if (failed)
 			warn("can't chown %s", fname);
 	}
 
 	if (ent->flags & CE_NODUMP) {
 		failed = chflags(fname, UF_NODUMP);
 		if (failed)
 			warn("can't chflags %s NODUMP", fname);
 	}
 }
 
 /*
  * Parse a signal number or signal name. Returns the signal number parsed or -1
  * on failure.
  */
 static int
 parse_signal(const char *str)
 {
 	int sig, i;
 	const char *errstr;
 
 	sig = strtonum(str, 1, sys_nsig - 1, &errstr);
 
 	if (errstr == NULL)
 		return (sig);
 	if (strncasecmp(str, "SIG", 3) == 0)
 		str += 3;
 
 	for (i = 1; i < sys_nsig; i++) {
 		if (strcasecmp(str, sys_signame[i]) == 0)
 			return (i);
 	}
 
 	return (-1);
 }
Index: projects/vnet/usr.sbin/ypldap/ypldap.8
===================================================================
--- projects/vnet/usr.sbin/ypldap/ypldap.8	(revision 301546)
+++ projects/vnet/usr.sbin/ypldap/ypldap.8	(revision 301547)
@@ -1,82 +1,84 @@
 .\"	$OpenBSD: ypldap.8,v 1.10 2015/07/27 17:28:40 sobrado Exp $
 .\"	$FreeBSD$
 .\"
 .\" Copyright (c) 2008 Pierre-Yves Ritschard <pyr@openbsd.org>
 .\"
 .\" Permission to use, copy, modify, and distribute this software for any
 .\" purpose with or without fee is hereby granted, provided that the above
 .\" copyright notice and this permission notice appear in all copies.
 .\"
 .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 .\"
-.Dd $Mdocdate: July 27 2015 $
+.Dd $Mdocdate: June 8 2016 $
 .Dt YPLDAP 8
 .Os
 .Sh NAME
 .Nm ypldap
 .Nd YP map server using LDAP backend
 .Sh SYNOPSIS
 .Nm
 .Op Fl dnv
 .Op Fl D Ar macro Ns = Ns Ar value
 .Op Fl f Ar file
 .Sh DESCRIPTION
 .Nm
 is a daemon providing YP maps using LDAP as a backend.
 RFC 2307 or similar LDAP schemas can be tied to the different YP maps.
 .Nm
 has the same role as
 .Xr ypserv 8
 and the two daemons are mutually exclusive.
 .Pp
 The options are as follows:
 .Bl -tag -width Ds
 .It Fl D Ar macro Ns = Ns Ar value
 Define
 .Ar macro
 to be set to
 .Ar value
 on the command line.
 Overrides the definition of
 .Ar macro
 in the configuration file.
 .It Fl d
 Do not daemonize.
 If this option is specified,
 .Nm
 will run in the foreground and log to
 .Em stderr .
 .It Fl f Ar file
 Specify an alternative configuration file.
 .It Fl n
 Configtest mode.
 Only check the configuration file for validity.
 .It Fl v
 Produce more verbose output.
 .El
 .Sh FILES
 .Bl -tag -width "/etc/ypldap.confXX" -compact
 .It Pa /etc/ypldap.conf
 Default
 .Nm
 configuration file.
 .El
 .Sh SEE ALSO
 .Xr ypldap.conf 5 ,
 .Xr ypbind 8
 .Sh HISTORY
 The
 .Nm
 program first appeared in
-.Ox 4.4 .
+.Ox 4.4
+and then
+.Fx 11.0 .
 .Sh AUTHORS
 The
 .Nm
 program was written by
 .An Pierre-Yves Ritschard .
Index: projects/vnet
===================================================================
--- projects/vnet	(revision 301546)
+++ projects/vnet	(revision 301547)

Property changes on: projects/vnet
___________________________________________________________________
Modified: svn:mergeinfo
## -0,0 +0,1 ##
   Merged /head:r301529-301546