Index: head/UPDATING
===================================================================
--- head/UPDATING	(revision 195653)
+++ head/UPDATING	(revision 195654)
@@ -1,1651 +1,1658 @@

Updating Information for FreeBSD current users

This file is maintained and copyrighted by M. Warner Losh <imp@village.org>. See end of file for further details. For commonly done items, please see the COMMON ITEMS: section later in the file.

Items affecting the ports and packages system can be found in /usr/ports/UPDATING. Please read that file before running portupgrade.

NOTE TO PEOPLE WHO THINK THAT FreeBSD 8.x IS SLOW: FreeBSD 8.x has many debugging features turned on, in both the kernel and userland. These features attempt to detect incorrect use of system primitives, and encourage loud failure through extra sanity checking and fail-stop semantics. They also substantially impact system performance. If you want to do performance measurement, benchmarking, or optimization, you'll want to turn them off. This includes various WITNESS-related kernel options, INVARIANTS, malloc debugging flags in userland, and various verbose features in the kernel. Many developers choose to disable these features on build machines to maximize performance. (To disable malloc debugging, run ln -s aj /etc/malloc.conf.)

20090713:
    The TOE interface to the TCP syncache has been modified to remove struct tcpopt (<netinet/tcp_var.h>) from the ABI of the network stack. The cxgb driver is the only TOE consumer affected by this change, and needs to be recompiled along with the kernel. As this change breaks the ABI, bump __FreeBSD_version to 800103.

20090712:
    Padding has been added to struct tcpcb, sackhint and tcpstat in <netinet/tcp_var.h> to facilitate future MFCs and bug fixes whilst maintaining the ABI. However, this change breaks the ABI, so bump __FreeBSD_version to 800102. User space tools that rely on the size of any of these structs (e.g. sockstat) need to be recompiled.

20090630:
    The NFS_LEGACYRPC option has been removed along with the old kernel RPC implementation that this option selected. Kernel configurations may need to be adjusted.

20090629:
    The network interface device nodes at /dev/net/<interface> have been removed. All ioctl operations can be performed the normal way using routing sockets. The kqueue functionality can generally be replaced with routing sockets.

20090628:
    The documentation from the FreeBSD Documentation Project (Handbook, FAQ, etc.) is now installed via packages by sysinstall(8), under the /usr/local/share/doc/freebsd directory instead of /usr/share/doc.

20090624:
    The ABI of various structures related to the SYSV IPC API has been changed. As a result, the COMPAT_FREEBSD[456] kernel options now all require COMPAT_FREEBSD7. Bump __FreeBSD_version to 800100.

20090622:
    The layout of struct vnet has changed as routing related variables were moved to their own Vimage module. Modules need to be recompiled. Bump __FreeBSD_version to 800099.

20090619:
    NGROUPS_MAX and NGROUPS have been increased from 16 to 1023 and 1024 respectively. As long as no more than 16 groups per process are used, no changes should be visible. When more than 16 groups are used, old binaries may fail if they call getgroups() or getgrouplist() with statically sized storage. Recompiling will work around this, but applications should be modified to use dynamically allocated storage for group arrays, as POSIX.1-2008 does not cap an implementation's number of supported groups at NGROUPS_MAX+1 as previous versions did.
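    For example, a group array can be sized at run time instead of using a static NGROUPS_MAX-sized buffer. The following C sketch is only an illustration (the helper name and error handling are not part of this entry):

        #include <sys/types.h>
        #include <stdlib.h>
        #include <unistd.h>

        /* Fetch the caller's groups into a dynamically sized array. */
        int
        get_my_groups(gid_t **gidsp)
        {
                int n = getgroups(0, NULL);     /* query the count only */

                if (n < 0)
                        return (-1);
                *gidsp = malloc((size_t)n * sizeof(gid_t));
                if (*gidsp == NULL)
                        return (-1);
                return (getgroups(n, *gidsp));  /* fill the sized array */
        }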
    NFS and portalfs mounts may also be affected, as the list of groups is truncated to 16. Users of NFS who use more than 16 groups should take care that negative group permissions are not used on the exported file systems, as they will not be reliable unless a GSSAPI based authentication method is used.

20090616:
    The compile-time option ADAPTIVE_LOCKMGRS has been introduced. This option compiles in support for adaptive spinning for lockmgr locks which want to enable it. The lockinit() function now accepts the flag LK_ADAPTIVE, which makes the lock object subject to adaptive spinning when held in either write or read mode.

20090613:
    The layout of the structure returned by IEEE80211_IOC_STA_INFO has changed. User applications that use this ioctl need to be rebuilt.

20090611:
    The layout of struct thread has changed. Kernel and modules need to be rebuilt.

20090608:
    The layout of structs ifnet, domain, protosw and vnet_net has changed. Kernel modules need to be rebuilt. Bump __FreeBSD_version to 800097.

20090602:
    window(1) has been removed from the base system. It can now be installed from ports. The port is called misc/window.

20090601:
    The way we are storing and accessing `routing table' entries has changed. Programs reading the FIB, like netstat, need to be recompiled.

20090601:
    A new netisr implementation has been added for FreeBSD 8. Network protocol modules, such as igmp, ipdivert, and others, should be rebuilt. Bump __FreeBSD_version to 800096.

20090530:
    Remove the tunable/sysctl debug.mpsafevfs, as its initial purpose is no longer valid.

20090530:
    Add VOP_ACCESSX(9). File system modules need to be rebuilt. Bump __FreeBSD_version to 800094.

20090529:
    Add the mnt_xflag field to 'struct mount'. File system modules need to be rebuilt. Bump __FreeBSD_version to 800093.

20090528:
    The compile-time option ADAPTIVE_SX has been retired and the option NO_ADAPTIVE_SX, which handles the reversed logic, has been introduced. The KPI for sx_init_flags() changes accordingly: the SX_ADAPTIVESPIN flag has been retired and the SX_NOADAPTIVE flag has been introduced to handle the reversed logic. Bump __FreeBSD_version to 800092.

20090527:
    Add support for hierarchical jails. Remove the global securelevel. Bump __FreeBSD_version to 800091.

20090523:
    The layout of struct vnet_net has changed, therefore modules need to be rebuilt. Bump __FreeBSD_version to 800090.

20090523:
    The newly imported zic(8) produces output in a new format. Please run tzsetup(8) to install the newly created data to /etc/localtime.

20090520:
    The sysctl tree for the usb stack has been renamed from hw.usb2.* to hw.usb.* and is now consistent again with previous releases.

20090520:
    802.11 monitor mode support was revised and driver APIs were changed. Drivers dependent on net80211 now support DLT_IEEE802_11_RADIO instead of DLT_IEEE802_11. No user-visible data structures were changed, but applications that use DLT_IEEE802_11 may require changes. Bump __FreeBSD_version to 800088.

20090430:
    The layout of the following structs has changed: sysctl_oid, socket, ifnet, inpcbinfo, tcpcb, syncache_head, vnet_inet, vnet_inet6 and vnet_ipfw. Most modules need to be rebuilt or panics may be experienced. A world rebuild is required for correctly checking networking state from userland. Bump __FreeBSD_version to 800085.

20090429:
    MLDv2 and Source-Specific Multicast (SSM) have been merged to the IPv6 stack. VIMAGE hooks are in but not yet used. The implementation of SSM within FreeBSD's IPv6 stack closely follows the IPv4 implementation.
    For kernel developers:

    * The most important changes are that the ip6_output() and ip6_input() paths no longer take the IN6_MULTI_LOCK, and this lock has been downgraded to a non-recursive mutex.

    * As with the changes to the IPv4 stack to support SSM, filtering of inbound multicast traffic must now be performed by transport protocols within the IPv6 stack. This does not apply to TCP and SCTP; however, it does apply to UDP in IPv6 and raw IPv6.

    * The KPIs used by IPv6 multicast are similar to those used by the IPv4 stack, with the following differences:
      * im6o_mc_filter() is analogous to imo_multicast_filter().
      * The legacy KAME entry points in6_joingroup() and in6_leavegroup() are shimmed to in6_mc_join() and in6_mc_leave() respectively.
      * IN6_LOOKUP_MULTI() has been deprecated and removed.
      * IPv6 relies on MLD for the DAD mechanism. KAME's internal KPIs for MLDv1 have an additional 'timer' argument which is used to jitter the initial membership report for the solicited-node multicast membership on-link.
      * This is not strictly needed for MLDv2, which already jitters its report transmissions. However, the 'timer' argument is preserved in case MLDv1 is active on the interface.

    * The KAME linked-list based IPv6 membership implementation has been refactored to use a vector similar to that used by the IPv4 stack. Code which maintains a list of its own multicast memberships internally, e.g. carp, has been updated to reflect the new semantics.

    * There is a known Lock Order Reversal (LOR) due to in6_setscope() acquiring the IF_AFDATA_LOCK and being called within ip6_output(). Whilst MLDv2 tries to avoid this otherwise benign LOR, it is an implementation constraint which needs to be addressed in HEAD.

    For application developers:

    * The changes are broadly similar to those made for the IPv4 stack.

    * The use of IPv4 and IPv6 multicast socket options on the same socket, using mapped addresses, HAS NOT been tested or supported.

    * There are a number of issues with the implementation of various IPv6 multicast APIs which need to be resolved in the API surface before the implementation is fully compatible with KAME userland use, and these are mostly to do with interface index treatment.

    * The literature available discusses the use of either the delta / ASM API with setsockopt(2)/getsockopt(2), or the full-state / ASM API using setsourcefilter(3)/getsourcefilter(3). For more information please refer to RFC 3678, 'Socket Interface Extensions for Multicast Source Filters'.

    * Applications which use the published RFC 3678 APIs should be fine.

    For systems administrators:

    * The mtest(8) utility has been refactored to support IPv6, in addition to IPv4. Interface addresses are no longer accepted as arguments; their names must be used instead. The utility will map the interface name to its first IPv4 address as returned by getifaddrs(3).

    * The ifmcstat(8) utility has also been updated to print the MLDv2 endpoint state and source filter lists via sysctl(3).

    * The net.inet6.ip6.mcast.loop sysctl may be tuned to 0 to disable loopback of IPv6 multicast datagrams by default; it defaults to 1 to preserve the existing behaviour. Disabling multicast loopback is recommended for optimal system performance.

    * The IPv6 MROUTING code has been changed to examine this sysctl instead of attempting to perform a group lookup before looping back forwarded datagrams.

    Bump __FreeBSD_version to 800084.

20090422:
    Implement the low-level Bluetooth HCI API. Bump __FreeBSD_version to 800083.
20090419:
    The layout of struct malloc_type, used by modules to register new memory allocation types, has changed. Most modules will need to be rebuilt or panics may be experienced. Bump __FreeBSD_version to 800081.

20090415:
    Anticipate overflowing inp_flags - add inp_flags2. This changes most offsets in inpcb, so checking v4 connection state will require a world rebuild. Bump __FreeBSD_version to 800080.

20090415:
    Add an llentry to struct route and struct route_in6. Modules embedding a struct route will need to be recompiled. Bump __FreeBSD_version to 800079.

20090414:
    The size of rt_metrics_lite and by extension rtentry has changed. Networking administration apps will need to be recompiled. The route command now supports 'show' as an alias for 'get', weighting of routes, and sticky and nostick flags to alter the behavior of stateful load balancing. Bump __FreeBSD_version to 800078.

20090408:
    Do not use Giant for kbdmux(4) locking. This was wrong and apparently causing more problems than it solved. This will re-open the issue where interrupt handlers may race with kbdmux(4) in polling mode. Typical symptoms include (but are not limited to) duplicated and/or missing characters when low level console functions (such as gets) are used while interrupts are enabled (for example the geli password prompt, the mountroot prompt, etc.). Disabling kbdmux(4) may help.

20090407:
    The size of structs vnet_net, vnet_inet and vnet_ipfw has changed; kernel modules referencing any of the above need to be recompiled. Bump __FreeBSD_version to 800075.

20090320:
    GEOM_PART has become the default partition slicer for storage devices, replacing the GEOM_MBR, GEOM_BSD, GEOM_PC98 and GEOM_GPT slicers. It introduces some changes:

    MSDOS/EBR: the devices created from MSDOS extended partition entries (EBR) can be named differently than with GEOM_MBR and are now symlinks to devices with offset-based names. fstabs may need to be modified.

    BSD: the "geometry does not match label" warning is harmless in most cases but it points to problems in file system misalignment with disk geometry. The "c" partition is now implicit, covers the whole top-level drive and cannot be (mis)used by users.

    General: Kernel dumps are now not allowed to be written to devices whose partition types indicate they are meant to be used for file systems (or, in the case of MSDOS partitions, as something other than the "386BSD" type).

    Most of these changes date approximately from 200812.

20090319:
    The uscanner(4) driver has been removed from the kernel. This follows Linux removing theirs in 2.6 and making libusb the default interface (supported by sane).

20090319:
    The multicast forwarding code has been cleaned up. netstat(1) only relies on KVM now for printing bandwidth upcall meters. The IPv4 and IPv6 modules are split into ip_mroute_mod and ip6_mroute_mod respectively. The config(5) options for statically compiling this code remain the same, i.e. 'options MROUTING'.

20090315:
    Support for the IFF_NEEDSGIANT network interface flag has been removed, which means that non-MPSAFE network device drivers are no longer supported. In particular, if_ar, if_sr, and network device drivers from the old (legacy) USB stack can no longer be built or used.

20090313:
    POSIX.1 Native Language Support (NLS) has been enabled in libc and a bunch of new language catalog files have also been added. This means that some common libc messages are now localized and depend on the LC_MESSAGES environment variable.
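    For example, provided a catalog for the chosen language is installed (the locale below is only an illustration), the errno portion of a libc error message follows LC_MESSAGES:

        env LC_MESSAGES=es_ES.ISO8859-1 cat /nonexistent

    should print the "No such file or directory" part of the message in the selected language rather than in English.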
20090313:
    The k8temp(4) driver has been renamed to amdtemp(4) since support for K10 and K11 CPU families was added.

20090309:
    IGMPv3 and Source-Specific Multicast (SSM) have been merged to the IPv4 stack. VIMAGE hooks are in but not yet used.

    For kernel developers, the most important changes are that the ip_output() and ip_input() paths no longer take the IN_MULTI_LOCK(), and this lock has been downgraded to a non-recursive mutex.

    Transport protocols (UDP, Raw IP) are now responsible for filtering inbound multicast traffic according to group membership and source filters. The imo_multicast_filter() KPI exists for this purpose. Transports which do not use multicast (SCTP, TCP) already reject multicast by default. Forwarding and receive performance may improve as a mutex acquisition is no longer needed in the ip_input() low-level input path. in_addmulti() and in_delmulti() are shimmed to new KPIs which exist to support SSM in-kernel.

    For application developers, it is recommended that loopback of multicast datagrams be disabled for best performance, as this will still cause the lock to be taken for each looped-back datagram transmission. The net.inet.ip.mcast.loop sysctl may be tuned to 0 to disable loopback by default; it defaults to 1 to preserve the existing behaviour.

    For systems administrators, to obtain best performance with multicast reception and multiple groups, it is always recommended that a card with a suitably precise hash filter is used. Hash collisions will still result in the lock being taken within the transport protocol input path to check group membership. If deploying FreeBSD in an environment with IGMP snooping switches, it is recommended that the net.inet.igmp.sendlocal sysctl remain enabled; this forces 224.0.0.0/24 group membership to be announced via IGMP.

    The size of 'struct igmpstat' has changed; netstat needs to be recompiled to reflect this. Bump __FreeBSD_version to 800070.

20090309:
    libusb20.so.1 is now installed as libusb.so.1 and the ports system has been updated to use it. This requires a buildworld/installworld in order to update the library and dependencies (usbconfig, etc). It's advisable to rebuild all ports which use libusb. More specific directions are given in the ports collection UPDATING file. Any /etc/libmap.conf entries for libusb are no longer required and can be removed.

20090302:
    A workaround has been committed to allow the creation of System V shared memory segments of size > 2 GB on 64-bit architectures. Due to a limitation of the existing ABI, the shm_segsz member of struct shmid_ds, returned by the shmctl(IPC_STAT) call, is wrong for large segments. Note that limits must be explicitly raised to allow such segments to be created.

20090301:
    The layout of struct ifnet has changed, requiring a rebuild of all network device driver modules.

20090227:
    The /dev handling for the new USB stack has changed; a buildworld/installworld is required for libusb20.

20090223:
    The new USB2 stack has now been permanently moved in and all kernel and module names reverted to their previous values (e.g. usb, ehci, ohci, ums, ...). The old usb stack can be compiled in by prefixing the name with the letter 'o'; the old usb modules have been removed. Updating entry 20090216 for xorg and 20090215 for libmap may still apply.

20090217:
    The rc.conf(5) option if_up_delay has been renamed to defaultroute_delay to better reflect its purpose. If you have customized this setting in /etc/rc.conf you need to update it to use the new name.
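    For example, an /etc/rc.conf line such as (the value shown is only an illustration):

        if_up_delay="30"

    should now read:

        defaultroute_delay="30"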
20090216:
    xorg 7.4 wants to configure its input devices via hald, which does not yet work with USB2. If the keyboard/mouse does not work in xorg then add

        Option "AllowEmptyInput" "off"

    to your ServerLayout section. This will cause X to use the configured kbd and mouse sections from your xorg.conf.

20090215:
    The GENERIC kernels for all architectures now default to the new USB2 stack. No kernel config options or code have been removed, so if a problem arises please report it and optionally revert to the old USB stack. If you are loading USB kernel modules or have a custom kernel that includes GENERIC then ensure that usb names are also changed over, e.g. uftdi -> usb2_serial_ftdi.

    Older programs linked against the ports libusb 0.1 need to be redirected to the new stack's libusb20. /etc/libmap.conf can be used for this:

        # Map old usb library to new one for usb2 stack
        libusb-0.1.so.8 libusb20.so.1

20090203:
    The ichsmb(4) driver has been changed to require that SMBus slave addresses be left-justified (xxxxxxx0b) rather than right-justified. All of the other SMBus controller drivers require left-justified slave addresses, so this change makes all the drivers provide the same interface.

20090201:
    INET6 statistics (struct ip6stat) were updated. netstat(1) needs to be recompiled.

20090119:
    NTFS has been removed from the GENERIC kernel on amd64 to match GENERIC on i386. This should not cause any issues since mount_ntfs(8) will load the ntfs.ko module automatically when NTFS support is actually needed, unless ntfs.ko is not installed or the security level prohibits loading kernel modules. If either is the case, "options NTFS" has to be added to the kernel config.

20090115:
    TCP Appropriate Byte Counting (RFC 3465) support has been added to the kernel. The new field in struct tcpcb breaks the ABI, so bump __FreeBSD_version to 800061. User space tools that rely on the size of struct tcpcb in tcp_var.h (e.g. sockstat) need to be recompiled.

20081225:
    The ng_tty(4) module has been updated to match the new TTY subsystem. Due to the API change, user-level applications must be updated. New API support has been added to mpd5 CVS and is expected to be present in the next mpd5.3 release.

20081219:
    With __FreeBSD_version 800060 the makefs tool is part of the base system (it was a port).

20081216:
    The afdata and ifnet locks have been changed from mutexes to rwlocks; network modules will need to be re-compiled.

20081214:
    __FreeBSD_version 800059 incorporates the new arp-v2 rewrite. The RTF_CLONING, RTF_LLINFO and RTF_WASCLONED flags are eliminated. The new code reduces struct rtentry{} by 16 bytes on 32-bit architectures and 40 bytes on 64-bit architectures. The userland applications "arp" and "ndp" have been updated accordingly. The output from "netstat -r" shows only routing entries and none of the L2 information.

20081130:
    __FreeBSD_version 800057 marks the switchover from the binary ath hal to source code. Users must add the line:

        options AH_SUPPORT_AR5416

    to their kernel config files when specifying:

        device ath_hal

    The ath_hal module no longer exists; the code is now compiled together with the driver in the ath module. It is now possible to tailor chip support (i.e. reduce the set of chips and thereby the code size); consult ath_hal(4) for details.

20081121:
    __FreeBSD_version 800054 adds memory barriers to <machine/atomic.h>, new interfaces to ifnet to facilitate multiple hardware transmit queues for cards that support them, and a lock-less ring-buffer implementation to enable drivers to more efficiently manage queueing of packets.

20081117:
    A new version of ZFS (version 13) has been merged to -HEAD.
    This version has the zpool attribute "listsnapshots" off by default, which means "zfs list" does not show snapshots; this is the same as the Solaris behavior.

20081028:
    The dummynet(4) ABI has changed. ipfw(8) needs to be recompiled.

20081009:
    The uhci, ohci, ehci and slhci USB Host controller drivers have been put into separate modules. If you load the usb module separately through loader.conf you will need to load the appropriate *hci module as well. E.g. for a UHCI-based USB 2.0 controller add the following to loader.conf:

        uhci_load="YES"
        ehci_load="YES"

20081009:
    The ABI used by the PMC toolset has changed. Please keep userland (libpmc(3)) and the kernel module (hwpmc(4)) in sync.

20080820:
    The TTY subsystem of the kernel has been replaced by a new implementation, which provides better scalability and an improved driver model. Most common drivers have been migrated to the new TTY subsystem, while others have not. The following drivers have not yet been ported to the new TTY layer:

    PCI/ISA: cy, digi, rc, rp, sio
    USB: ubser, ucycom
    Line disciplines: ng_h4, ng_tty, ppp, sl, snp

    Adding these drivers to your kernel configuration file will cause compilation to fail.

20080818:
    ntpd has been upgraded to 4.2.4p5.

20080801:
    OpenSSH has been upgraded to 5.1p1.

    For many years, FreeBSD's version of OpenSSH preferred DSA over RSA for host and user authentication keys. With this upgrade, we've switched to the vendor's default of RSA over DSA. This may cause upgraded clients to warn about unknown host keys even for previously known hosts. Users should follow the usual procedure for verifying host keys before accepting the RSA key.

    This can be circumvented by setting the "HostKeyAlgorithms" option to "ssh-dss,ssh-rsa" in ~/.ssh/config or on the ssh command line.

    Please note that the sequence of keys offered for authentication has been changed as well. You may want to specify IdentityFile in a different order to revert this behavior.

20080713:
    The sio(4) driver has been removed from the i386 and amd64 kernel configuration files. This means uart(4) is now the default serial port driver on those platforms as well.

    To prevent collisions with the sio(4) driver, the uart(4) driver uses different names for its device nodes. This means the onboard serial port will now most likely be called "ttyu0" instead of "ttyd0". You may need to reconfigure applications to use the new device names.

    When using the serial port as a boot console, be sure to update /boot/device.hints and /etc/ttys before booting the new kernel. If you forget to do so, you can still manually specify the hints at the loader prompt:

        set hint.uart.0.at="isa"
        set hint.uart.0.port="0x3F8"
        set hint.uart.0.flags="0x10"
        set hint.uart.0.irq="4"
        boot -s

20080609:
    The gpt(8) utility has been removed. Use gpart(8) to partition disks instead.

20080603:
    The version that the Linuxulator emulates has been changed from 2.4.2 to 2.6.16. If you experience any problems with Linux binaries please try to set the sysctl compat.linux.osrelease to 2.4.2, and if that fixes the problem contact the emulation mailing list.

20080525:
    ISDN4BSD (I4B) was removed from the src tree. You may need to update your kernel configuration and remove relevant entries.

20080509:
    I have checked in code to support multiple routing tables. See the man pages setfib(1) and setfib(2). This is a hopefully backwards compatible version, but to make use of it you need to compile your kernel with options ROUTETABLES=2 (or more, up to 16).
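    For example (the FIB number below is only an illustration), a kernel built with the extra routing tables lets you run a command whose routing lookups use an alternate FIB:

        options ROUTETABLES=2   # in the kernel config file

        setfib 1 sh             # start a shell that uses FIB 1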
20080420:
    The 802.11 wireless support was redone to enable multi-bss operation on devices that are capable. The underlying device is no longer used directly; instead wlanX devices are cloned with ifconfig. This requires changes to rc.conf files. For example, change:

        ifconfig_ath0="WPA DHCP"

    to

        wlans_ath0=wlan0
        ifconfig_wlan0="WPA DHCP"

    see rc.conf(5) for more details. In addition, a mergemaster of /etc/rc.d is highly recommended. A simultaneous update of userland and kernel wouldn't hurt either.

    As part of the multi-bss changes the wlan_scan_ap and wlan_scan_sta modules were merged into the base wlan module. All references to these modules (e.g. in kernel config files) must be removed.

20080408:
    psm(4) has gained write(2) support in native operation level. Arbitrary commands can be written to /dev/psm%d and status can be read back from it. Therefore, an application is responsible for status validation and error recovery. It is a no-op in other operation levels.

20080312:
    Support for KSE threading has been removed from the kernel. To run legacy applications linked against KSE, libmap.conf may be used. The following libmap.conf may be used to ensure compatibility with any prior release:

        libpthread.so.1 libthr.so.1
        libpthread.so.2 libthr.so.2
        libkse.so.3 libthr.so.3

20080301:
    The layout of struct vmspace has changed. This affects libkvm and any executables that link against libkvm and use the kvm_getprocs() function. In particular, but not exclusively, it affects ps(1), fstat(1), pkill(1), systat(1), top(1) and w(1). The effects are minimal, but it's advisable to upgrade world nonetheless.

20080229:
    The latest em driver no longer supports the 82575 adapter; that support has moved to the igb driver. The split was done to make new features that are incompatible with older hardware easier to implement.

20080220:
    The new geom_lvm(4) geom class has been renamed to geom_linux_lvm(4); likewise the kernel option is now GEOM_LINUX_LVM.

20080211:
    The default NFS mount mode has changed from UDP to TCP for increased reliability. If you rely on (insecurely) NFS mounting across a firewall you may need to update your firewall rules.

20080208:
    Belatedly note the addition of m_collapse for compacting mbuf chains.

20080126:
    The fts(3) structures have been changed to use adequate integer types for their members and so to be able to cope with huge file trees. The old fts(3) ABI is preserved through symbol versioning in libc, so third-party binaries using fts(3) should still work, although they will not take advantage of the extended types. At the same time, some third-party software might fail to build after this change due to unportable assumptions made in its source code about fts(3) structure members. Such software should be fixed by its vendor or, in the worst case, in the ports tree. FreeBSD_version 800015 marks this change for the unlikely case that a portable fix is impossible.

20080123:
    To upgrade to -current after this date, you must be running FreeBSD not older than 6.0-RELEASE. Upgrading to -current from 5.x now requires a stopover at RELENG_6 or RELENG_7 systems.

20071128:
    The ADAPTIVE_GIANT kernel option has been retired because its functionality is the default now.

20071118:
    The AT keyboard emulation of sunkbd(4) has been turned on by default.
    In order to make the special symbols of the Sun keyboards driven by sunkbd(4) work under X, these now have to be configured the same way as Sun USB keyboards driven by ukbd(4) (which also does AT keyboard emulation), e.g.:

        Option "XkbLayout" "us"
        Option "XkbRules" "xorg"
        Option "XkbSymbols" "pc(pc105)+sun_vndr/usb(sun_usb)+us"

20071024:
    It has been decided that it is desirable to provide ABI backwards compatibility to the FreeBSD 4/5/6 versions of the PCIOCGETCONF, PCIOCREAD and PCIOCWRITE IOCTLs, which was broken with the introduction of PCI domain support (see the 20070930 entry). Unfortunately, this required the ABI of PCIOCGETCONF to be broken again in order to be able to provide backwards compatibility to the old version of that IOCTL. Thus consumers of PCIOCGETCONF have to be recompiled again. Among prominent ports, this time neither pciutils nor xorg-server is affected; the hal port, however, needs to be rebuilt.

20071020:
    The misnamed kthread_create() and friends have been renamed to kproc_create() etc. Many of the callers already used kproc_start(). I will return kthread_create() and friends in a while with implementations that actually create threads, not procs. The renaming corresponds with version 800002.

20071010:
    RELENG_7 branched.

20071009:
    Setting WITHOUT_LIBPTHREAD now means WITHOUT_LIBKSE and WITHOUT_LIBTHR are set.

20070930:
    The PCI code has been made aware of PCI domains. This means that the location strings as used by pciconf(8) etc. are now in the following format: pci<domain>:<bus>:<slot>[:<function>]. It also means that consumers of <sys/pciio.h> potentially need to be recompiled; this includes the hal and xorg-server ports.

20070928:
    The caching daemon (cached) was renamed to nscd. The nscd.conf configuration file should be used instead of cached.conf, and the nscd_enable, nscd_pidfile and nscd_flags options should be used instead of cached_enable, cached_pidfile and cached_flags in rc.conf.

20070921:
    The getfacl(1) utility now prints the owning user and group name instead of the owning uid and gid in the three line comment header. This is the same behavior as getfacl(1) on Solaris and Linux.

20070704:
    The new IPsec code is now compiled in using the IPSEC option. The IPSEC option now requires "device crypto" be defined in your kernel configuration. The FAST_IPSEC kernel option is now deprecated.

20070702:
    The packet filter (pf) code has been updated to OpenBSD 4.1. Please note the changed syntax - keep state is now on by default. Also note the fact that ftp-proxy(8) has been changed from the bottom up and has been moved from libexec to usr/sbin. Changes in the ALTQ handling also affect users of IPFW's ALTQ capabilities.

20070701:
    Remove KAME IPsec in favor of FAST_IPSEC, which is now the only IPsec supported by FreeBSD. The new IPsec stack supports both IPv4 and IPv6. The kernel option will change after the code changes have settled in. For now the kernel option IPSEC is deprecated and FAST_IPSEC is the only option; that will change after some settling time.

20070701:
    The wicontrol(8) utility has been removed from the base system. wi(4) cards should be configured using ifconfig(8); see the man page for more information.

20070612:
    The i386/amd64 GENERIC kernel now defaults to the nfe(4) driver instead of the nve(4) driver. Please update your configuration accordingly.

20070612:
    By default, /etc/rc.d/sendmail no longer rebuilds the aliases database if it is missing or older than the aliases file. If desired, set the new rc.conf option sendmail_rebuild_aliases to "YES" to restore that functionality.
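    For example, to restore the old behaviour add the following to /etc/rc.conf; alternatively, rebuild the database once by hand with newaliases(1):

        sendmail_rebuild_aliases="YES"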
20070612:
    The IPv4 multicast socket code has been considerably modified, and moved to the file sys/netinet/in_mcast.c. Initial support for the RFC 3678 Source-Specific Multicast Socket API has been added to the IPv4 network stack.

    Strict multicast and broadcast reception is now the default for UDP/IPv4 sockets; the net.inet.udp.strict_mcast_mship sysctl variable has now been removed.

    The RFC 1724 hack for interface selection has been removed; the use of the Linux-derived ip_mreqn structure with IP_MULTICAST_IF has been added to replace it. Consumers such as routed will soon be updated to reflect this. These changes affect users who are running routed(8) or rdisc(8) from the FreeBSD base system on point-to-point or unnumbered interfaces.

20070610:
    The net80211 layer has changed significantly and all wireless drivers that depend on it need to be recompiled. Further, these changes require that any program that interacts with the wireless support in the kernel be recompiled; this includes: ifconfig, wpa_supplicant, hostapd, and wlanstats.

    Users must also, for the moment, kldload the wlan_scan_sta and/or wlan_scan_ap modules if they use modules for wireless support. These modules implement scanning support for station and ap modes, respectively. Failure to load the appropriate module before marking a wireless interface up will result in a message to the console and the device not operating properly.

20070610:
    The pam_nologin(8) module ceases to provide an authentication function and starts providing an account management function. Consequent changes to /etc/pam.d should be brought in using mergemaster(8). Third-party files in /usr/local/etc/pam.d may need manual editing as follows. Locate this line (or similar):

        auth     required    pam_nologin.so    no_warn

    and change it according to this example:

        account  required    pam_nologin.so    no_warn

    That is, the first word needs to be changed from "auth" to "account". The new line can be moved to the account section within the file for clarity. Not updating pam.conf(5) files will result in nologin(5) being ignored by the respective services.

20070529:
    The ether_ioctl() function has been synchronized with ioctl(2) and ifnet.if_ioctl. Due to that, the size of one of its arguments has changed on 64-bit architectures. All kernel modules using ether_ioctl() need to be rebuilt on such architectures.

20070516:
    Improved INCLUDE_CONFIG_FILE support has been introduced to the config(8) utility. In order to take advantage of this new functionality, you are expected to recompile and install src/usr.sbin/config. If you don't rebuild config(8), and your kernel configuration depends on INCLUDE_CONFIG_FILE, the kernel build will be broken because of a missing "kernconfstring" symbol.

20070513:
    Symbol versioning is enabled by default. To disable it, use option WITHOUT_SYMVER. It is not advisable to attempt to disable symbol versioning once it is enabled; your installworld will break because a symbol-version-less libc will get installed before the install tools. As a result, the old install tools, which previously had symbol dependencies on FBSD_1.0, will fail because the freshly installed libc will not have them.

    The default threading library (providing "libpthread") has been changed to libthr. If you wish to have libkse as your default, use option DEFAULT_THREAD_LIB=libkse for the buildworld.

20070423:
    The ABI breakage in sendmail(8)'s libmilter has been repaired, so it is no longer necessary to recompile mail filters (aka milters).
    If you recompiled mail filters after the 20070408 note, it is not necessary to recompile them again.

20070417:
    The new trunk(4) driver has been renamed to lagg(4) as it better reflects its purpose. ifconfig will need to be recompiled.

20070408:
    sendmail(8) has been updated to version 8.14.1. Mail filters (aka milters) compiled against the libmilter included in the base operating system should be recompiled.

20070302:
    Firmwares for ipw(4) and iwi(4) are now included in the base tree. In order to use them one must agree to the respective LICENSE in share/doc/legal and define legal.intel_<driver>.license_ack=1 via loader.conf(5) or kenv(1). Make sure to deinstall the now deprecated modules from the respective firmware ports.

20070228:
    The name resolution/mapping functions addr2ascii(3) and ascii2addr(3) were removed from FreeBSD's libc. These originally came from INRIA IPv6. Nothing in FreeBSD ever used them. They may be regarded as deprecated in previous releases. The AF_LINK support for getnameinfo(3) was merged from NetBSD to replace it as a more portable (and re-entrant) API.

20070224:
    To support interrupt filtering a modification to the newbus API has occurred; the ABI was broken and __FreeBSD_version was bumped to 700031. Please make sure that your kernel and modules are in sync. For more info: http://docs.freebsd.org/cgi/mid.cgi?20070221233124.GA13941

20070224:
    The IPv6 multicast forwarding code may now be loaded into GENERIC kernels by loading the ip_mroute.ko module. This is built into the module unless the WITHOUT_INET6 or WITHOUT_INET6_SUPPORT options are set; see src.conf(5) for more information.

20070214:
    The output of netstat -r has changed. Without -n, we now only print a "network name" without the prefix length if the network address and mask exactly match a Class A/B/C network, and an entry exists in the nsswitch "networks" map. With -n, we print the full unabbreviated CIDR network prefix in the form "a.b.c.d/p". 0.0.0.0/0 is always printed as "default". This change is in preparation for changes such as equal-cost multipath, and to more generally assist operational deployment of FreeBSD as a modern IPv4 router.

20070210:
    PIM has been turned on by default in the IPv4 multicast routing code. The kernel option 'PIM' has now been removed. PIM is now built by default if option 'MROUTING' is specified. It may now be loaded into GENERIC kernels by loading the ip_mroute.ko module.

20070207:
    Support for IPIP tunnels (VIFF_TUNNEL) in IPv4 multicast routing has been removed. Its functionality may be achieved by explicitly configuring gif(4) interfaces and using the 'phyint' keyword in mrouted.conf. XORP does not support source-routed IPv4 multicast tunnels nor the integrated IPIP tunneling, therefore it is not affected by this change. The __FreeBSD_version macro has been bumped to 700030.

20061221:
    Support for PCI Message Signalled Interrupts has been re-enabled in the bge driver, only for those chips which are believed to support it properly. If there are any problems, MSI can be disabled completely by setting the 'hw.pci.enable_msi' and 'hw.pci.enable_msix' tunables to 0 in the loader.

20061214:
    Support for PCI Message Signalled Interrupts has been disabled again in the bge driver. Many revisions of the hardware fail to support it properly. Support can be re-enabled by removing the #define of BGE_DISABLE_MSI in "src/sys/dev/bge/if_bge.c".

20061214:
    Support for PCI Message Signalled Interrupts has been added to the bge driver.
    If there are any problems, MSI can be disabled completely by setting the 'hw.pci.enable_msi' and 'hw.pci.enable_msix' tunables to 0 in the loader.

20061205:
    The removal of several facets of the experimental Threading system from the kernel means that the proc and thread structures have changed quite a bit. I suggest all kernel modules that might reference these structures be recompiled, especially the linux module.

20061126:
    The sound infrastructure has been updated with various fixes and improvements. Most of the changes are pretty much transparent, with the exception of the following:

    1) All sound driver specific sysctls (hw.snd.pcm%d.*) have been moved to their own dev sysctl nodes, for example:

        hw.snd.pcm0.vchans -> dev.pcm.0.vchans

    2) /dev/dspr%d.%d has been deprecated. Each channel now has its own chardev in the form of "dsp%d.<p|r|v>%d", where <p|r|v> is p = playback, r = record and v = virtual, respectively. Users are encouraged to use these devs instead of the (old) "/dev/dsp%d.%d". This does not affect those who are using "/dev/dsp".

20061122:
    geom(4)'s gmirror(8) class metadata structure has been rev'd from v3 to v4. If you update across this point and your metadata is converted for you, you will not easily be able to downgrade since the /boot/kernel.old/geom_mirror.ko kernel module will be unable to read the v4 metadata. You can resolve this by doing, from the loader(8) prompt:

        set vfs.root.mountfrom="ufs:/dev/XXX"

    where XXX is the root slice of one of the disks that composed the mirror (i.e.: /dev/ad0s1a). You can then rebuild the array the same way you built it originally.

20061122:
    The following binaries have been disconnected from the build: mount_devfs, mount_ext2fs, mount_fdescfs, mount_procfs, mount_linprocfs, and mount_std. The functionality of these programs has been moved into the mount program. For example, to mount a devfs filesystem, instead of using mount_devfs, use: "mount -t devfs". This does not affect entries in /etc/fstab, since entries in /etc/fstab are always processed with "mount -t fstype".

20061113:
    Support for PCI Message Signalled Interrupts on i386 and amd64 has been added to the kernel, and various drivers will soon be updated to use MSI when it is available. If there are any problems, MSI can be disabled completely by setting the 'hw.pci.enable_msi' and 'hw.pci.enable_msix' tunables to 0 in the loader.

20061110:
    The MUTEX_PROFILING option has been renamed to LOCK_PROFILING. The lockmgr object layout has been changed as a result of having a lock_object embedded in it. As a consequence all file system kernel modules must be re-compiled. The mutex profiling man page has not yet been updated to reflect this change.

20061026:
    KSE in the kernel has now been made optional and is turned on by default. Use 'nooption KSE' in your kernel config to turn it off. All kernel modules *must* be recompiled after this change. Thereafter, modules from a KSE kernel should be compatible with modules from a NOKSE kernel due to the temporary padding fields added to 'struct proc'.

20060929:
    mrouted and its utilities have been removed from the base system.

20060927:
    Some ioctl(2) command codes have changed. Full backward ABI compatibility is provided if the "options COMPAT_FREEBSD6" is present in the kernel configuration file. Make sure to add this option to your kernel config file, or recompile X.Org and the rest of ports; otherwise they may refuse to work.

20060924:
    tcpslice has been removed from the base system.
20060913:
    The sizes of struct tcpcb (and struct xtcpcb) have changed due to the rewrite of TCP syncookies. Tools like netstat, sockstat, and systat need to be rebuilt.

20060903:
    libpcap updated to v0.9.4 and tcpdump to v3.9.4.

20060816:
    The IPFIREWALL_FORWARD_EXTENDED option is gone and the behaviour for IPFIREWALL_FORWARD is now as it was before when it was first committed and for years after. The behaviour is now ON.

20060725:
    The enigma(1)/crypt(1) utility has been changed on 64-bit architectures. Now it can decrypt files created from different architectures. Unfortunately, it is no longer able to decrypt a cipher text generated with an older version on 64-bit architectures. If you have such a file, you need the old utility to decrypt it.

20060709:
    The interface version of the i4b kernel part has changed. So after updating the kernel sources and compiling a new kernel, the i4b user space tools in "/usr/src/usr.sbin/i4b" must also be rebuilt, and vice versa.

20060627:
    The XBOX kernel now defaults to the nfe(4) driver instead of the nve(4) driver. Please update your configuration accordingly.

20060514:
    The i386-only lnc(4) driver for the AMD Am7900 LANCE and Am79C9xx PCnet family of NICs has been removed. The new le(4) driver serves as an equivalent but cross-platform replacement, with the pcn(4) driver still providing performance-optimized support for the subset of AMD Am79C971 PCnet-FAST and greater chips as before.

20060511:
    The machdep.* sysctls and the adjkerntz utility have been modified a bit. The new adjkerntz utility uses the new sysctl names and sysctlbyname() calls, so it may be impossible to run an old /sbin/adjkerntz utility in single-user mode with a new kernel. Replace the `adjkerntz -i' step before `make installworld' with:

        /usr/obj/usr/src/sbin/adjkerntz/adjkerntz -i

    and proceed as usual with the rest of the installworld-stage steps. Otherwise, you risk installing binaries with their timestamp set several hours in the future, especially if you are running with local time set to GMT+X hours.

20060428:
    The puc(4) driver has been overhauled. The ebus(4) and sbus(4) attachments have been removed. Make sure to configure scc(4) on sparc64. Note also that by default puc(4) will use uart(4) and not sio(4) for serial ports because interrupt handling has been optimized for multi-port serial cards and only uart(4) implements the interface to support it.

20060412:
    The ip6fw utility has been removed. The behavior provided by ip6fw has been in ipfw2 for a good while and the rc.d scripts have been updated to deal with it. There are some rules that might not migrate cleanly. Use rc.firewall6 as a template to rewrite rules.

20060330:
    The scc(4) driver replaces puc(4) for Serial Communications Controllers (SCCs) like the Siemens SAB82532 and the Zilog Z8530. On sparc64, it is advised to add scc(4) to the kernel configuration to make sure that the serial ports remain functional.

20060317:
    Most world/kernel related NO_* build options changed names. New knobs have the common prefixes WITHOUT_*/WITH_* (modelled after FreeBSD ports) and should be set in /etc/src.conf (the src.conf(5) manpage is provided). Full backwards compatibility is maintained for the time being, though it's highly recommended to start moving old options out of the system-wide /etc/make.conf file into the new /etc/src.conf while also properly renaming them. More conversions will likely follow.
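    For example, an old-style knob in /etc/make.conf such as (NO_PROFILE is only an illustration):

        NO_PROFILE=     true

    moves to /etc/src.conf under its new name:

        WITHOUT_PROFILE=yes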
    Posting to current@: http://lists.freebsd.org/pipermail/freebsd-current/2006-March/061725.html

20060305:
    The NETSMBCRYPTO kernel option has been retired because its functionality is always included in NETSMB and smbfs.ko now.

20060303:
    The TDFX_LINUX kernel option was retired and replaced by the tdfx_linux device. The latter can be loaded as the 3dfx_linux.ko kernel module. Loading it alone should suffice to get 3dfx support for Linux apps because it will pull in 3dfx.ko and linux.ko through its dependencies.

20060204:
    The 'audit' group was added to support the new auditing functionality in the base system. Be sure to follow the directions for updating, including the requirement to run mergemaster -p.

20060201:
    The kernel ABI to file system modules was changed on i386. Please make sure that your kernel and modules are in sync.

20060118:
    This actually occurred some time ago, but installing the kernel now also installs a bunch of symbol files for the kernel modules. This increases the size of /boot/kernel to about 67 Mbytes. You will need twice this if you will eventually back this up to kernel.old on your next install. If you have a shortage of room in your root partition, you should add -DINSTALL_NODEBUG to your make arguments or add INSTALL_NODEBUG="yes" to your /etc/make.conf.

20060113:
    libc's malloc implementation has been replaced. This change has the potential to uncover application bugs that previously went unnoticed. See the malloc(3) manual page for more details.

20060112:
    The generic netgraph(4) cookie has been changed. If you upgrade the kernel past this point, you also need to upgrade userland and netgraph(4) utilities like ports/net/mpd or ports/net/mpd4.

20060106:
    si(4)'s device files now contain the unit number. Uses of {cua,tty}A[0-9a-f] should be replaced by {cua,tty}A0[0-9a-f].

20060106:
    The kernel ABI was mostly destroyed due to a change in the size of struct lock_object, which is nested in other structures such as mutexes, which are nested in all sorts of other structures. Make sure your kernel and modules are in sync.

20051231:
    The page coloring algorithm in the VM subsystem was converted from tuning with kernel options to autotuning. Please remove any PQ_* option except PQ_NOOPT from your kernel config.

20051211:
    The net80211-related tools in the tools/tools/ath directory have been moved to tools/tools/net80211 and renamed with a "wlan" prefix. Scripts that use them should be adjusted accordingly.

20051202:
    Scripts in the local_startup directories (as defined in /etc/defaults/rc.conf) that have the new rc.d semantics will now be run as part of the base system rcorder. If there are errors or problems with one of these local scripts, it could cause boot problems. If you encounter such problems, boot in single user mode and remove that script from the */rc.d directory. Please report the problem to the port's maintainer, and the freebsd-ports@freebsd.org mailing list.

20051129:
    The nodev mount option was deprecated in RELENG_6 (where it was a no-op), and is now unsupported. If you have nodev or dev listed in /etc/fstab, remove it; otherwise it will result in a mount error.

20051129:
    The ABI between ipfw(4) and ipfw(8) has been changed. You need to rebuild ipfw(8) when rebuilding the kernel.

20051108:
    rp(4)'s device files now contain the unit number. Uses of {cua,tty}R[0-9a-f] should be replaced by {cua,tty}R0[0-9a-f].

20051029:
    /etc/rc.d/ppp-user has been renamed to /etc/rc.d/ppp.
    Its /etc/rc.conf.d configuration file has been `ppp' from the beginning, and hence there is no need to touch it.

20051014:
    Now most modules get their build-time options from the kernel configuration file. A few modules still have fixed options due to their non-conformant implementation, but they will be corrected eventually. You may need to review the options of the modules in use, explicitly specify the non-default options in the kernel configuration file, and rebuild the kernel and modules afterwards.

20051001:
    The kern.polling.enable sysctl MIB is now deprecated. Use ifconfig(8) to enable polling(4) on your interfaces.

20050927:
    The old bridge(4) implementation was retired. The new if_bridge(4) serves as a fully functional replacement.

20050722:
    The ai_addrlen of a struct addrinfo was changed to a socklen_t to conform to POSIX-2001. This change broke ABI compatibility on 64-bit architectures. You have to recompile userland programs that use getaddrinfo(3) on 64-bit architectures.

20050711:
    RELENG_6 branched here.

20050629:
    The pccard_ifconfig rc.conf variable has been removed and a new variable, ifconfig_DEFAULT, has been introduced. Unlike pccard_ifconfig, ifconfig_DEFAULT applies to ALL interfaces that do not have ifconfig_ifn entries, rather than just those in removable_interfaces.

20050616:
    Some previous versions of PAM have permitted the use of non-absolute paths in /etc/pam.conf or /etc/pam.d/* when referring to third party PAM modules in /usr/local/lib. A change has been made to require the use of absolute paths in order to avoid ambiguity and dependence on library path configuration, which may affect existing configurations.

20050610:
    Major changes to the network interface API. All drivers must be recompiled. Drivers not in the base system will need to be updated to the new APIs.

20050609:
    Changes were made to kinfo_proc in sys/user.h. Please recompile userland, or commands like `fstat', `pkill', `ps', `top' and `w' will not behave correctly.

    The API and ABI for hwpmc(4) have changed with the addition of sampling support. Please recompile lib/libpmc(3) and usr.sbin/{pmcstat,pmccontrol}.

20050606:
    The OpenBSD dhclient was imported in place of the ISC dhclient and the network interface configuration scripts were updated accordingly. If you use DHCP to configure your interfaces, you must now run devd. Also, DNS updating was lost so you will need to find a workaround if you use this feature.

    The '_dhcp' user was added to support the OpenBSD dhclient. Be sure to run mergemaster -p (like you are supposed to do every time anyway).

20050605:
    if_bridge was added to the tree. This has changed struct ifnet. Please recompile userland and all network related modules.

20050603:
    The n_net member of a struct netent was changed to a uint32_t, and the 1st argument of getnetbyaddr() was changed to a uint32_t, to conform to POSIX-2001. These changes broke ABI compatibility on 64-bit architectures. With these changes, the shlib major of libpcap was bumped. You have to recompile userland programs that use getnetbyaddr(3), getnetbyname(3), getnetent(3) and/or libpcap on 64-bit architectures.

20050528:
    Kernel parsing of extra options on '#!' first lines of shell scripts has changed. Lines with multiple options likely will fail after this date. For full details, please see http://people.freebsd.org/~gad/Updating-20050528.txt

20050503:
    The packet filter (pf) code has been updated to OpenBSD 3.7. Please note the changed anchor syntax and the fact that authpf(8) now needs a mounted fdescfs(5) to function.
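    For example, authpf's requirement can be satisfied by mounting fdescfs once by hand:

        mount -t fdescfs fdesc /dev/fd

    or at every boot via an /etc/fstab entry:

        fdesc   /dev/fd         fdescfs         rw      0       0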
20050415:
    The NO_MIXED_MODE kernel option has been removed from the i386 and amd64 platforms as its use has been superseded by the new local APIC timer code. Any kernel config files containing this option should be updated.

20050227:
    The on-disk format of LC_CTYPE files was changed to be machine independent. Please make sure NOT to use NO_CLEAN buildworld when crossing this point. Crossing this point also requires recompilation or reinstallation of all locale-dependent packages.

20050225:
    The ifi_epoch member of struct if_data has been changed to contain the uptime at which the interface was created or the statistics zeroed, rather than the wall clock time, because wall clock time may go backwards. This should have no impact unless an snmp implementation is using this value (I know of none at this point.)

20050224:
    The acpi_perf and acpi_throttle drivers are now part of the acpi(4) main module. They are no longer built separately.

20050223:
    The layout of struct image_params has changed. You have to recompile all compatibility modules (linux, svr4, etc) for use with the new kernel.

20050223:
    The p4tcc driver has been merged into cpufreq(4). This makes "options CPU_ENABLE_TCC" obsolete. Please load cpufreq.ko or compile in "device cpufreq" to restore this functionality.

20050220:
    The responsibility of recomputing the file system summary of a SoftUpdates-enabled dirty volume has been transferred to the background fsck. A rebuild of the fsck(8) utility is recommended if you have updated the kernel. To get the old behavior (recompute file system summary at mount time), you can set vfs.ffs.compute_summary_at_mount=1 before mounting the new volume.

20050206:
    The cpufreq import is complete. As part of this, the sysctls for acpi(4) throttling have been removed. The power_profile script has been updated, so you can use performance/economy_cpu_freq in rc.conf(5) to set AC on/offline cpu frequencies.

20050206:
    NG_VERSION has been increased. Recompiling the kernel (or ng_socket.ko) requires recompiling libnetgraph and the userland netgraph utilities.

20050114:
    Support for abbreviated forms of a number of ipfw options is now deprecated. Warnings are printed to stderr indicating the correct full form when a match occurs. Some abbreviations may be supported at a later date based on user feedback. To be considered for support, abbreviations must be in use prior to this commit and unlikely to be confused with current key words.

20041221:
    By popular demand, a lot of NOFOO options were renamed to NO_FOO (see bsd.compat.mk for a full list). The old spellings are still supported, but will cause annoying warnings on stderr. Make sure you upgrade properly (see the COMMON ITEMS: section later in this file).

20041219:
    Auto-loading of ancillary wlan modules such as wlan_wep has been temporarily disabled; you need to statically configure the modules you need into your kernel or explicitly load them prior to use. Specifically, if you intend to use WEP encryption with an 802.11 device load/configure wlan_wep; if you want to use WPA with the ath driver load/configure wlan_tkip, wlan_ccmp, and wlan_xauth as required.

20041213:
    The behaviour of ppp(8) has changed slightly. If lqr is enabled (``enable lqr''), older versions would revert to LCP ECHO mode on negotiation failure. Now, ``enable echo'' is required for this behaviour. The ppp version number has been bumped to 3.4.2 to reflect the change.

20041201:
    The wlan support has been updated to split the crypto support into separate modules.
    For static WEP you must configure the wlan_wep module in your system or build and install the module in a place where it can be loaded (the kernel will auto-load the module when a wep key is configured).

20041201:
    The ath driver has been updated to split the tx rate control algorithm into a separate module. You need to include either ath_rate_onoe or ath_rate_amrr when configuring the kernel.

20041116:
    Support for systems with an 80386 CPU has been removed. Please use FreeBSD 5.x or earlier on systems with an 80386.

20041110:
    We have had a hack which would mount the root filesystem R/W if the device were named 'md*'. As part of the vnode work I'm doing I have had to remove this hack. People building systems which use preloaded MD root filesystems may need to insert a "/sbin/mount -u -o rw /dev/md0 /" in their /etc/rc scripts.

20041104:
    FreeBSD 5.3 shipped here.

20041102:
    The size of struct tcpcb has changed again due to the removal of RFC1644 T/TCP. You have to recompile userland programs that read kmem for tcp sockets directly (netstat, sockstat, etc.)

20041022:
    The size of struct tcpcb has changed. You have to recompile userland programs that read kmem for tcp sockets directly (netstat, sockstat, etc.)

20041016:
    RELENG_5 branched here. For older entries, please see updating in the RELENG_5 branch.

COMMON ITEMS:

    General Notes
    -------------
    Avoid using make -j when upgrading. From time to time in the past there have been problems using -j with buildworld and/or installworld. This is especially true when upgrading between "distant" versions (eg one that crosses a major release boundary or several minor releases, or when several months have passed on the -current branch).

    Sometimes, obscure build problems are the result of environment poisoning. This can happen because the make utility reads its environment when searching for values for global variables. To run your build attempts in an "environmental clean room", prefix all make commands with 'env -i '. See the env(1) manual page for more details.

    When upgrading from one major version to another it is generally best to upgrade to the latest code in the currently installed branch first, then do an upgrade to the new branch. This is the best-tested upgrade path, and has the highest probability of being successful. Please try this approach before reporting problems with a major version upgrade.

    To build a kernel
    -----------------
    If you are updating from a prior version of FreeBSD (even one just a few days old), you should follow this procedure. It is the most failsafe as it uses a /usr/obj tree with a fresh mini-buildworld,

        make kernel-toolchain
        make -DALWAYS_CHECK_MAKE buildkernel KERNCONF=YOUR_KERNEL_HERE
        make -DALWAYS_CHECK_MAKE installkernel KERNCONF=YOUR_KERNEL_HERE

    To test a kernel once
    ---------------------
    If you just want to boot a kernel once (because you are not sure if it works, or if you want to boot a known bad kernel to provide debugging information) run

        make installkernel KERNCONF=YOUR_KERNEL_HERE KODIR=/boot/testkernel
        nextboot -k testkernel

    To just build a kernel when you know that it won't mess you up
    --------------------------------------------------------------
    This assumes you are already running a 5.X system. Replace ${arch} with the architecture of your machine (e.g. "i386", "alpha", "amd64", "ia64", "pc98", "sparc64", etc).

        cd src/sys/${arch}/conf
        config KERNEL_NAME_HERE
        cd ../compile/KERNEL_NAME_HERE
        make depend
        make
        make install

    If this fails, go to the "To build a kernel" section.
To rebuild everything and install it on the current system. ----------------------------------------------------------- # Note: sometimes if you are running current you gotta do more than # is listed here if you are upgrading from a really old current. make buildworld make kernel KERNCONF=YOUR_KERNEL_HERE [1] [3] mergemaster -p [5] make installworld make delete-old mergemaster [4] To cross-install current onto a separate partition -------------------------------------------------- # In this approach we use a separate partition to hold # current's root, 'usr', and 'var' directories. A partition # holding "/", "/usr" and "/var" should be about 2GB in # size. make buildworld make buildkernel KERNCONF=YOUR_KERNEL_HERE make installworld DESTDIR=${CURRENT_ROOT} make distribution DESTDIR=${CURRENT_ROOT} # if newfs'd make installkernel KERNCONF=YOUR_KERNEL_HERE DESTDIR=${CURRENT_ROOT} cp /etc/fstab ${CURRENT_ROOT}/etc/fstab # if newfs'd To upgrade in-place from 5.x-stable to current ---------------------------------------------- make buildworld [9] make kernel KERNCONF=YOUR_KERNEL_HERE [8] [1] [3] mergemaster -p [5] make installworld make delete-old mergemaster -i [4] Make sure that you've read the UPDATING file to understand the tweaks to various things you need. At this point in the life cycle of current, things change often and you are on your own to cope. The defaults can also change, so please read ALL of the UPDATING entries. Also, if you are tracking -current, you must be subscribed to freebsd-current@freebsd.org. Make sure that before you update your sources that you have read and understood all the recent messages there. If in doubt, please track -stable which has much fewer pitfalls. [1] If you have third party modules, such as vmware, you should disable them at this point so they don't crash your system on reboot. [3] From the bootblocks, boot -s, and then do fsck -p mount -u / mount -a cd src adjkerntz -i # if CMOS is wall time Also, when doing a major release upgrade, it is required that you boot into single user mode to do the installworld. [4] Note: This step is non-optional. Failure to do this step can result in a significant reduction in the functionality of the system. Attempting to do it by hand is not recommended and those that pursue this avenue should read this file carefully, as well as the archives of freebsd-current and freebsd-hackers mailing lists for potential gotchas. [5] Usually this step is a noop. However, from time to time you may need to do this if you get unknown user in the following step. It never hurts to do it all the time. You may need to install a new mergemaster (cd src/usr.sbin/mergemaster && make install) after the buildworld before this step if you last updated from current before 20020224 or from -stable before 20020408. [8] In order to have a kernel that can run the 4.x binaries needed to do an installworld, you must include the COMPAT_FREEBSD4 option in your kernel. Failure to do so may leave you with a system that is hard to boot to recover. A similar kernel option COMPAT_FREEBSD5 is required to run the 5.x binaries on more recent kernels. Make sure that you merge any new devices from GENERIC since the last time you updated your kernel config file. [9] When checking out sources, you must include the -P flag to have cvs prune empty directories. If CPUTYPE is defined in your /etc/make.conf, make sure to use the "?=" instead of the "=" assignment operator, so that buildworld can override the CPUTYPE if it needs to. 
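        For example, a minimal make.conf entry of this form (assuming
        a machine with a Pentium 4 class CPU; see
        /usr/share/examples/etc/make.conf for the values supported on
        your architecture):

        CPUTYPE?=pentium4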
        MAKEOBJDIRPREFIX must be defined in an environment variable,
        and not on the command line, or in /etc/make.conf. buildworld
        will warn if it is improperly defined.

FORMAT:

This file contains a list, in reverse chronological order, of major
breakages in tracking -current. Not all things will be listed here,
and it only starts on October 16, 2004. Updating files can be found in
previous releases if your system is older than this.

Copyright information:

Copyright 1998-2005 M. Warner Losh. All Rights Reserved.

Redistribution, publication, translation and use, with or without
modification, in full or in part, in any form or format of this
document are permitted without further permission from the author.

THIS DOCUMENT IS PROVIDED BY WARNER LOSH ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL WARNER LOSH BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

If you find this document useful, and you want to, you may buy the
author a beer.

Contact Warner Losh if you have any questions about your use of this
document.

$FreeBSD$

Index: head/sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c
===================================================================
--- head/sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c (revision 195653)
+++ head/sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c (revision 195654)
@@ -1,4468 +1,4469 @@
/**************************************************************************

Copyright (c) 2007-2008, Chelsio Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

 1. Redistributions of source code must retain the above copyright notice,
    this list of conditions and the following disclaimer.

 2. Neither the name of the Chelsio Corporation nor the names of its
    contributors may be used to endorse or promote products derived from
    this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
***************************************************************************/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #if __FreeBSD_version >= 800044 #include #else #define V_tcp_do_autosndbuf tcp_do_autosndbuf #define V_tcp_autosndbuf_max tcp_autosndbuf_max #define V_tcp_do_rfc1323 tcp_do_rfc1323 #define V_tcp_do_autorcvbuf tcp_do_autorcvbuf #define V_tcp_autorcvbuf_max tcp_autorcvbuf_max #define V_tcpstat tcpstat #endif #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #if __FreeBSD_version >= 800056 #include #endif #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* * For ULP connections HW may add headers, e.g., for digests, that aren't part * of the messages sent by the host but that are part of the TCP payload and * therefore consume TCP sequence space. Tx connection parameters that * operate in TCP sequence space are affected by the HW additions and need to * compensate for them to accurately track TCP sequence numbers. This array * contains the compensating extra lengths for ULP packets. It is indexed by * a packet's ULP submode. */ const unsigned int t3_ulp_extra_len[] = {0, 4, 4, 8}; #ifdef notyet /* * This sk_buff holds a fake header-only TCP segment that we use whenever we * need to exploit SW TCP functionality that expects TCP headers, such as * tcp_create_openreq_child(). It's a RO buffer that may be used by multiple * CPUs without locking. */ static struct mbuf *tcphdr_mbuf __read_mostly; #endif /* * Size of WRs in bytes. Note that we assume all devices we are handling have * the same WR size. */ static unsigned int wrlen __read_mostly; /* * The number of WRs needed for an skb depends on the number of page fragments * in the skb and whether it has any payload in its main body. This maps the * length of the gather list represented by an skb into the # of necessary WRs. */ static unsigned int mbuf_wrs[TX_MAX_SEGS + 1] __read_mostly; /* * Max receive window supported by HW in bytes. Only a small part of it can * be set through option0, the rest needs to be set through RX_DATA_ACK. */ #define MAX_RCV_WND ((1U << 27) - 1) /* * Min receive window. We want it to be large enough to accommodate receive * coalescing, handle jumbo frames, and not trigger sender SWS avoidance. 
 */
#define MIN_RCV_WND (24 * 1024U)
#define INP_TOS(inp) ((inp_ip_tos_get(inp) >> 2) & M_TOS)

#define VALIDATE_SEQ 0
#define VALIDATE_SOCK(so)
#define DEBUG_WR 0

#define TCP_TIMEWAIT    1
#define TCP_CLOSE       2
#define TCP_DROP        3

static void t3_send_reset(struct toepcb *toep);
static void send_abort_rpl(struct mbuf *m, struct toedev *tdev, int rst_status);
static inline void free_atid(struct t3cdev *cdev, unsigned int tid);
static void handle_syncache_event(int event, void *arg);

static inline void
SBAPPEND(struct sockbuf *sb, struct mbuf *n)
{
    struct mbuf *m;

    m = sb->sb_mb;
    while (m) {
        KASSERT(((m->m_flags & M_EXT) && (m->m_ext.ext_type == EXT_EXTREF)) ||
            !(m->m_flags & M_EXT),
            ("unexpected type M_EXT=%d ext_type=%d m_len=%d\n",
            !!(m->m_flags & M_EXT), m->m_ext.ext_type, m->m_len));
        KASSERT(m->m_next != (struct mbuf *)0xffffffff,
            ("bad next value m_next=%p m_nextpkt=%p m_flags=0x%x",
            m->m_next, m->m_nextpkt, m->m_flags));
        m = m->m_next;
    }
    m = n;
    while (m) {
        KASSERT(((m->m_flags & M_EXT) && (m->m_ext.ext_type == EXT_EXTREF)) ||
            !(m->m_flags & M_EXT),
            ("unexpected type M_EXT=%d ext_type=%d m_len=%d\n",
            !!(m->m_flags & M_EXT), m->m_ext.ext_type, m->m_len));
        KASSERT(m->m_next != (struct mbuf *)0xffffffff,
            ("bad next value m_next=%p m_nextpkt=%p m_flags=0x%x",
            m->m_next, m->m_nextpkt, m->m_flags));
        m = m->m_next;
    }
    KASSERT(sb->sb_flags & SB_NOCOALESCE, ("NOCOALESCE not set"));
    sbappendstream_locked(sb, n);
    m = sb->sb_mb;
    while (m) {
        KASSERT(m->m_next != (struct mbuf *)0xffffffff,
            ("bad next value m_next=%p m_nextpkt=%p m_flags=0x%x",
            m->m_next, m->m_nextpkt, m->m_flags));
        m = m->m_next;
    }
}

static inline int
is_t3a(const struct toedev *dev)
{
    return (dev->tod_ttid == TOE_ID_CHELSIO_T3);
}

static void
dump_toepcb(struct toepcb *toep)
{
    DPRINTF("qset_idx=%d qset=%d ulp_mode=%d mtu_idx=%d tid=%d\n",
        toep->tp_qset_idx, toep->tp_qset, toep->tp_ulp_mode,
        toep->tp_mtu_idx, toep->tp_tid);

    DPRINTF("wr_max=%d wr_avail=%d wr_unacked=%d mss_clamp=%d flags=0x%x\n",
        toep->tp_wr_max, toep->tp_wr_avail, toep->tp_wr_unacked,
        toep->tp_mss_clamp, toep->tp_flags);
}

#ifndef RTALLOC2_DEFINED
static struct rtentry *
rtalloc2(struct sockaddr *dst, int report, u_long ignflags)
{
    struct rtentry *rt = NULL;

    if ((rt = rtalloc1(dst, report, ignflags)) != NULL)
        RT_UNLOCK(rt);

    return (rt);
}
#endif

/*
 * Determine whether to send a CPL message now or defer it. A message is
 * deferred if the connection is in SYN_SENT since we don't know the TID yet.
 * For connections in other states the message is sent immediately.
 * If through_l2t is set the message is subject to ARP processing, otherwise
 * it is sent directly.
 */
static inline void
send_or_defer(struct toepcb *toep, struct mbuf *m, int through_l2t)
{
    struct tcpcb *tp = toep->tp_tp;

    if (__predict_false(tp->t_state == TCPS_SYN_SENT)) {
        inp_wlock(tp->t_inpcb);
        mbufq_tail(&toep->out_of_order_queue, m);   // defer
        inp_wunlock(tp->t_inpcb);
    } else if (through_l2t)
        l2t_send(TOEP_T3C_DEV(toep), m, toep->tp_l2t);  // send through L2T
    else
        cxgb_ofld_send(TOEP_T3C_DEV(toep), m);  // send directly
}

static inline unsigned int
mkprio(unsigned int cntrl, const struct toepcb *toep)
{
    return (cntrl);
}

/*
 * Populate a TID_RELEASE WR. The mbuf must already be properly sized.
*/ static inline void mk_tid_release(struct mbuf *m, const struct toepcb *toep, unsigned int tid) { struct cpl_tid_release *req; m_set_priority(m, mkprio(CPL_PRIORITY_SETUP, toep)); m->m_pkthdr.len = m->m_len = sizeof(*req); req = mtod(m, struct cpl_tid_release *); req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD)); req->wr.wr_lo = 0; OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_TID_RELEASE, tid)); } static inline void make_tx_data_wr(struct socket *so, struct mbuf *m, int len, struct mbuf *tail) { INIT_VNET_INET(so->so_vnet); struct tcpcb *tp = so_sototcpcb(so); struct toepcb *toep = tp->t_toe; struct tx_data_wr *req; struct sockbuf *snd; inp_lock_assert(tp->t_inpcb); snd = so_sockbuf_snd(so); req = mtod(m, struct tx_data_wr *); m->m_len = sizeof(*req); req->wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_TX_DATA)); req->wr_lo = htonl(V_WR_TID(toep->tp_tid)); /* len includes the length of any HW ULP additions */ req->len = htonl(len); req->param = htonl(V_TX_PORT(toep->tp_l2t->smt_idx)); /* V_TX_ULP_SUBMODE sets both the mode and submode */ req->flags = htonl(V_TX_ULP_SUBMODE(/*skb_ulp_mode(skb)*/ 0) | V_TX_URG(/* skb_urgent(skb) */ 0 ) | V_TX_SHOVE((!(tp->t_flags & TF_MORETOCOME) && (tail ? 0 : 1)))); req->sndseq = htonl(tp->snd_nxt); if (__predict_false((toep->tp_flags & TP_DATASENT) == 0)) { req->flags |= htonl(V_TX_ACK_PAGES(2) | F_TX_INIT | V_TX_CPU_IDX(toep->tp_qset)); /* Sendbuffer is in units of 32KB. */ if (V_tcp_do_autosndbuf && snd->sb_flags & SB_AUTOSIZE) req->param |= htonl(V_TX_SNDBUF(V_tcp_autosndbuf_max >> 15)); else { req->param |= htonl(V_TX_SNDBUF(snd->sb_hiwat >> 15)); } toep->tp_flags |= TP_DATASENT; } } #define IMM_LEN 64 /* XXX - see WR_LEN in the cxgb driver */ int t3_push_frames(struct socket *so, int req_completion) { struct tcpcb *tp = so_sototcpcb(so); struct toepcb *toep = tp->t_toe; struct mbuf *tail, *m0, *last; struct t3cdev *cdev; struct tom_data *d; int state, bytes, count, total_bytes; bus_dma_segment_t segs[TX_MAX_SEGS], *segp; struct sockbuf *snd; if (tp->t_state == TCPS_SYN_SENT || tp->t_state == TCPS_CLOSED) { DPRINTF("tcp state=%d\n", tp->t_state); return (0); } state = so_state_get(so); if (state & (SS_ISDISCONNECTING|SS_ISDISCONNECTED)) { DPRINTF("disconnecting\n"); return (0); } inp_lock_assert(tp->t_inpcb); snd = so_sockbuf_snd(so); sockbuf_lock(snd); d = TOM_DATA(toep->tp_toedev); cdev = d->cdev; last = tail = snd->sb_sndptr ? snd->sb_sndptr : snd->sb_mb; total_bytes = 0; DPRINTF("wr_avail=%d tail=%p snd.cc=%d tp_last=%p\n", toep->tp_wr_avail, tail, snd->sb_cc, toep->tp_m_last); if (last && toep->tp_m_last == last && snd->sb_sndptroff != 0) { KASSERT(tail, ("sbdrop error")); last = tail = tail->m_next; } if ((toep->tp_wr_avail == 0 ) || (tail == NULL)) { DPRINTF("wr_avail=%d tail=%p\n", toep->tp_wr_avail, tail); sockbuf_unlock(snd); return (0); } toep->tp_m_last = NULL; while (toep->tp_wr_avail && (tail != NULL)) { count = bytes = 0; segp = segs; if ((m0 = m_gethdr(M_NOWAIT, MT_DATA)) == NULL) { sockbuf_unlock(snd); return (0); } /* * If the data in tail fits as in-line, then * make an immediate data wr. 
*/ if (tail->m_len <= IMM_LEN) { count = 1; bytes = tail->m_len; last = tail; tail = tail->m_next; m_set_sgl(m0, NULL); m_set_sgllen(m0, 0); make_tx_data_wr(so, m0, bytes, tail); m_append(m0, bytes, mtod(last, caddr_t)); KASSERT(!m0->m_next, ("bad append")); } else { while ((mbuf_wrs[count + 1] <= toep->tp_wr_avail) && (tail != NULL) && (count < TX_MAX_SEGS-1)) { bytes += tail->m_len; last = tail; count++; /* * technically an abuse to be using this for a VA * but less gross than defining my own structure * or calling pmap_kextract from here :-| */ segp->ds_addr = (bus_addr_t)tail->m_data; segp->ds_len = tail->m_len; DPRINTF("count=%d wr_needed=%d ds_addr=%p ds_len=%d\n", count, mbuf_wrs[count], tail->m_data, tail->m_len); segp++; tail = tail->m_next; } DPRINTF("wr_avail=%d mbuf_wrs[%d]=%d tail=%p\n", toep->tp_wr_avail, count, mbuf_wrs[count], tail); m_set_sgl(m0, segs); m_set_sgllen(m0, count); make_tx_data_wr(so, m0, bytes, tail); } m_set_priority(m0, mkprio(CPL_PRIORITY_DATA, toep)); if (tail) { snd->sb_sndptr = tail; toep->tp_m_last = NULL; } else toep->tp_m_last = snd->sb_sndptr = last; DPRINTF("toep->tp_m_last=%p\n", toep->tp_m_last); snd->sb_sndptroff += bytes; total_bytes += bytes; toep->tp_write_seq += bytes; CTR6(KTR_TOM, "t3_push_frames: wr_avail=%d mbuf_wrs[%d]=%d" " tail=%p sndptr=%p sndptroff=%d", toep->tp_wr_avail, count, mbuf_wrs[count], tail, snd->sb_sndptr, snd->sb_sndptroff); if (tail) CTR4(KTR_TOM, "t3_push_frames: total_bytes=%d" " tp_m_last=%p tailbuf=%p snd_una=0x%08x", total_bytes, toep->tp_m_last, tail->m_data, tp->snd_una); else CTR3(KTR_TOM, "t3_push_frames: total_bytes=%d" " tp_m_last=%p snd_una=0x%08x", total_bytes, toep->tp_m_last, tp->snd_una); #ifdef KTR { int i; i = 0; while (i < count && m_get_sgllen(m0)) { if ((count - i) >= 3) { CTR6(KTR_TOM, "t3_push_frames: pa=0x%zx len=%d pa=0x%zx" " len=%d pa=0x%zx len=%d", segs[i].ds_addr, segs[i].ds_len, segs[i + 1].ds_addr, segs[i + 1].ds_len, segs[i + 2].ds_addr, segs[i + 2].ds_len); i += 3; } else if ((count - i) == 2) { CTR4(KTR_TOM, "t3_push_frames: pa=0x%zx len=%d pa=0x%zx" " len=%d", segs[i].ds_addr, segs[i].ds_len, segs[i + 1].ds_addr, segs[i + 1].ds_len); i += 2; } else { CTR2(KTR_TOM, "t3_push_frames: pa=0x%zx len=%d", segs[i].ds_addr, segs[i].ds_len); i++; } } } #endif /* * remember credits used */ m0->m_pkthdr.csum_data = mbuf_wrs[count]; m0->m_pkthdr.len = bytes; toep->tp_wr_avail -= mbuf_wrs[count]; toep->tp_wr_unacked += mbuf_wrs[count]; if ((req_completion && toep->tp_wr_unacked == mbuf_wrs[count]) || toep->tp_wr_unacked >= toep->tp_wr_max / 2) { struct work_request_hdr *wr = cplhdr(m0); wr->wr_hi |= htonl(F_WR_COMPL); toep->tp_wr_unacked = 0; } KASSERT((m0->m_pkthdr.csum_data > 0) && (m0->m_pkthdr.csum_data <= 4), ("bad credit count %d", m0->m_pkthdr.csum_data)); m0->m_type = MT_DONTFREE; enqueue_wr(toep, m0); DPRINTF("sending offload tx with %d bytes in %d segments\n", bytes, count); l2t_send(cdev, m0, toep->tp_l2t); } sockbuf_unlock(snd); return (total_bytes); } /* * Close a connection by sending a CPL_CLOSE_CON_REQ message. Cannot fail * under any circumstances. We take the easy way out and always queue the * message to the write_queue. We can optimize the case where the queue is * already empty though the optimization is probably not worth it. 
*/ static void close_conn(struct socket *so) { struct mbuf *m; struct cpl_close_con_req *req; struct tom_data *d; struct inpcb *inp = so_sotoinpcb(so); struct tcpcb *tp; struct toepcb *toep; unsigned int tid; inp_wlock(inp); tp = so_sototcpcb(so); toep = tp->t_toe; if (tp->t_state != TCPS_SYN_SENT) t3_push_frames(so, 1); if (toep->tp_flags & TP_FIN_SENT) { inp_wunlock(inp); return; } tid = toep->tp_tid; d = TOM_DATA(toep->tp_toedev); m = m_gethdr_nofail(sizeof(*req)); m_set_priority(m, CPL_PRIORITY_DATA); m_set_sgl(m, NULL); m_set_sgllen(m, 0); toep->tp_flags |= TP_FIN_SENT; req = mtod(m, struct cpl_close_con_req *); req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_CLOSE_CON)); req->wr.wr_lo = htonl(V_WR_TID(tid)); OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_CLOSE_CON_REQ, tid)); req->rsvd = 0; inp_wunlock(inp); /* * XXX - need to defer shutdown while there is still data in the queue * */ CTR4(KTR_TOM, "%s CLOSE_CON_REQ so %p tp %p tid=%u", __FUNCTION__, so, tp, tid); cxgb_ofld_send(d->cdev, m); } /* * Handle an ARP failure for a CPL_ABORT_REQ. Change it into a no RST variant * and send it along. */ static void abort_arp_failure(struct t3cdev *cdev, struct mbuf *m) { struct cpl_abort_req *req = cplhdr(m); req->cmd = CPL_ABORT_NO_RST; cxgb_ofld_send(cdev, m); } /* * Send RX credits through an RX_DATA_ACK CPL message. If nofail is 0 we are * permitted to return without sending the message in case we cannot allocate * an sk_buff. Returns the number of credits sent. */ uint32_t t3_send_rx_credits(struct tcpcb *tp, uint32_t credits, uint32_t dack, int nofail) { struct mbuf *m; struct cpl_rx_data_ack *req; struct toepcb *toep = tp->t_toe; struct toedev *tdev = toep->tp_toedev; m = m_gethdr_nofail(sizeof(*req)); DPRINTF("returning %u credits to HW\n", credits); req = mtod(m, struct cpl_rx_data_ack *); req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD)); req->wr.wr_lo = 0; OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_RX_DATA_ACK, toep->tp_tid)); req->credit_dack = htonl(dack | V_RX_CREDITS(credits)); m_set_priority(m, mkprio(CPL_PRIORITY_ACK, toep)); cxgb_ofld_send(TOM_DATA(tdev)->cdev, m); return (credits); } /* * Send RX_DATA_ACK CPL message to request a modulation timer to be scheduled. * This is only used in DDP mode, so we take the opportunity to also set the * DACK mode and flush any Rx credits. */ void t3_send_rx_modulate(struct toepcb *toep) { struct mbuf *m; struct cpl_rx_data_ack *req; m = m_gethdr_nofail(sizeof(*req)); req = mtod(m, struct cpl_rx_data_ack *); req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD)); req->wr.wr_lo = 0; m->m_pkthdr.len = m->m_len = sizeof(*req); OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_RX_DATA_ACK, toep->tp_tid)); req->credit_dack = htonl(F_RX_MODULATE | F_RX_DACK_CHANGE | V_RX_DACK_MODE(1) | V_RX_CREDITS(toep->tp_copied_seq - toep->tp_rcv_wup)); m_set_priority(m, mkprio(CPL_PRIORITY_CONTROL, toep)); cxgb_ofld_send(TOEP_T3C_DEV(toep), m); toep->tp_rcv_wup = toep->tp_copied_seq; } /* * Handle receipt of an urgent pointer. 
*/ static void handle_urg_ptr(struct socket *so, uint32_t urg_seq) { #ifdef URGENT_DATA_SUPPORTED struct tcpcb *tp = so_sototcpcb(so); urg_seq--; /* initially points past the urgent data, per BSD */ if (tp->urg_data && !after(urg_seq, tp->urg_seq)) return; /* duplicate pointer */ sk_send_sigurg(sk); if (tp->urg_seq == tp->copied_seq && tp->urg_data && !sock_flag(sk, SOCK_URGINLINE) && tp->copied_seq != tp->rcv_nxt) { struct sk_buff *skb = skb_peek(&sk->sk_receive_queue); tp->copied_seq++; if (skb && tp->copied_seq - TCP_SKB_CB(skb)->seq >= skb->len) tom_eat_skb(sk, skb, 0); } tp->urg_data = TCP_URG_NOTYET; tp->urg_seq = urg_seq; #endif } /* * Returns true if a socket cannot accept new Rx data. */ static inline int so_no_receive(const struct socket *so) { return (so_state_get(so) & (SS_ISDISCONNECTED|SS_ISDISCONNECTING)); } /* * Process an urgent data notification. */ static void rx_urg_notify(struct toepcb *toep, struct mbuf *m) { struct cpl_rx_urg_notify *hdr = cplhdr(m); struct socket *so = inp_inpcbtosocket(toep->tp_tp->t_inpcb); VALIDATE_SOCK(so); if (!so_no_receive(so)) handle_urg_ptr(so, ntohl(hdr->seq)); m_freem(m); } /* * Handler for RX_URG_NOTIFY CPL messages. */ static int do_rx_urg_notify(struct t3cdev *cdev, struct mbuf *m, void *ctx) { struct toepcb *toep = (struct toepcb *)ctx; rx_urg_notify(toep, m); return (0); } static __inline int is_delack_mode_valid(struct toedev *dev, struct toepcb *toep) { return (toep->tp_ulp_mode || (toep->tp_ulp_mode == ULP_MODE_TCPDDP && dev->tod_ttid >= TOE_ID_CHELSIO_T3)); } /* * Set of states for which we should return RX credits. */ #define CREDIT_RETURN_STATE (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2) /* * Called after some received data has been read. It returns RX credits * to the HW for the amount of data processed. */ void t3_cleanup_rbuf(struct tcpcb *tp, int copied) { struct toepcb *toep = tp->t_toe; struct socket *so; struct toedev *dev; int dack_mode, must_send, read; u32 thres, credits, dack = 0; struct sockbuf *rcv; so = inp_inpcbtosocket(tp->t_inpcb); rcv = so_sockbuf_rcv(so); if (!((tp->t_state == TCPS_ESTABLISHED) || (tp->t_state == TCPS_FIN_WAIT_1) || (tp->t_state == TCPS_FIN_WAIT_2))) { if (copied) { sockbuf_lock(rcv); toep->tp_copied_seq += copied; sockbuf_unlock(rcv); } return; } inp_lock_assert(tp->t_inpcb); sockbuf_lock(rcv); if (copied) toep->tp_copied_seq += copied; else { read = toep->tp_enqueued_bytes - rcv->sb_cc; toep->tp_copied_seq += read; } credits = toep->tp_copied_seq - toep->tp_rcv_wup; toep->tp_enqueued_bytes = rcv->sb_cc; sockbuf_unlock(rcv); if (credits > rcv->sb_mbmax) { log(LOG_ERR, "copied_seq=%u rcv_wup=%u credits=%u\n", toep->tp_copied_seq, toep->tp_rcv_wup, credits); credits = rcv->sb_mbmax; } /* * XXX this won't accurately reflect credit return - we need * to look at the difference between the amount that has been * put in the recv sockbuf and what is there now */ if (__predict_false(!credits)) return; dev = toep->tp_toedev; thres = TOM_TUNABLE(dev, rx_credit_thres); if (__predict_false(thres == 0)) return; if (is_delack_mode_valid(dev, toep)) { dack_mode = TOM_TUNABLE(dev, delack); if (__predict_false(dack_mode != toep->tp_delack_mode)) { u32 r = tp->rcv_nxt - toep->tp_delack_seq; if (r >= tp->rcv_wnd || r >= 16 * toep->tp_mss_clamp) dack = F_RX_DACK_CHANGE | V_RX_DACK_MODE(dack_mode); } } else dack = F_RX_DACK_CHANGE | V_RX_DACK_MODE(1); /* * For coalescing to work effectively ensure the receive window has * at least 16KB left. 
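 *
 * For example (a hypothetical sizing, using the check below): with a
 * 64KB receive window and 48KB of credits not yet returned to the HW,
 * 48KB + 16KB >= 64KB, so the credits are returned immediately even
 * when the rx_credit_thres tunable has not been reached.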
 */
    must_send = credits + 16384 >= tp->rcv_wnd;

    if (must_send || credits >= thres)
        toep->tp_rcv_wup += t3_send_rx_credits(tp, credits, dack, must_send);
}

static int
cxgb_toe_disconnect(struct tcpcb *tp)
{
    struct socket *so;

    DPRINTF("cxgb_toe_disconnect\n");

    so = inp_inpcbtosocket(tp->t_inpcb);
    close_conn(so);
    return (0);
}

static int
cxgb_toe_reset(struct tcpcb *tp)
{
    struct toepcb *toep = tp->t_toe;

    t3_send_reset(toep);

    /*
     * unhook from socket
     */
    tp->t_flags &= ~TF_TOE;
    toep->tp_tp = NULL;
    tp->t_toe = NULL;
    return (0);
}

static int
cxgb_toe_send(struct tcpcb *tp)
{
    struct socket *so;

    DPRINTF("cxgb_toe_send\n");
    dump_toepcb(tp->t_toe);

    so = inp_inpcbtosocket(tp->t_inpcb);
    t3_push_frames(so, 1);
    return (0);
}

static int
cxgb_toe_rcvd(struct tcpcb *tp)
{
    inp_lock_assert(tp->t_inpcb);

    t3_cleanup_rbuf(tp, 0);

    return (0);
}

static void
cxgb_toe_detach(struct tcpcb *tp)
{
    struct toepcb *toep;

    /*
     * XXX how do we handle teardown in the SYN_SENT state?
     */
    inp_lock_assert(tp->t_inpcb);
    toep = tp->t_toe;
    toep->tp_tp = NULL;

    /*
     * unhook from socket
     */
    tp->t_flags &= ~TF_TOE;
    tp->t_toe = NULL;
}

static struct toe_usrreqs cxgb_toe_usrreqs = {
    .tu_disconnect = cxgb_toe_disconnect,
    .tu_reset = cxgb_toe_reset,
    .tu_send = cxgb_toe_send,
    .tu_rcvd = cxgb_toe_rcvd,
    .tu_detach = cxgb_toe_detach,
    .tu_syncache_event = handle_syncache_event,
};

static void
__set_tcb_field(struct toepcb *toep, struct mbuf *m, uint16_t word,
    uint64_t mask, uint64_t val, int no_reply)
{
    struct cpl_set_tcb_field *req;

    CTR4(KTR_TCB, "__set_tcb_field_ulp(tid=%u word=0x%x mask=%jx val=%jx",
        toep->tp_tid, word, mask, val);

    req = mtod(m, struct cpl_set_tcb_field *);
    m->m_pkthdr.len = m->m_len = sizeof(*req);
    req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD));
    req->wr.wr_lo = 0;
    OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_SET_TCB_FIELD, toep->tp_tid));
    req->reply = V_NO_REPLY(no_reply);
    req->cpu_idx = 0;
    req->word = htons(word);
    req->mask = htobe64(mask);
    req->val = htobe64(val);

    m_set_priority(m, mkprio(CPL_PRIORITY_CONTROL, toep));

    send_or_defer(toep, m, 0);
}

static void
t3_set_tcb_field(struct toepcb *toep, uint16_t word, uint64_t mask, uint64_t val)
{
    struct mbuf *m;
    struct tcpcb *tp;

    if (toep == NULL)
        return;

    tp = toep->tp_tp;
    if (tp->t_state == TCPS_CLOSED || (toep->tp_flags & TP_ABORT_SHUTDOWN)) {
        printf("not setting field\n");
        return;
    }

    m = m_gethdr_nofail(sizeof(struct cpl_set_tcb_field));

    __set_tcb_field(toep, m, word, mask, val, 1);
}

/*
 * Set one of the t_flags bits in the TCB.
 */
static void
set_tcb_tflag(struct toepcb *toep, unsigned int bit_pos, int val)
{
    t3_set_tcb_field(toep, W_TCB_T_FLAGS1, 1ULL << bit_pos, val << bit_pos);
}

/*
 * Send a SET_TCB_FIELD CPL message to change a connection's Nagle setting.
 */
static void
t3_set_nagle(struct toepcb *toep)
{
    struct tcpcb *tp = toep->tp_tp;

    set_tcb_tflag(toep, S_TF_NAGLE, !(tp->t_flags & TF_NODELAY));
}

/*
 * Send a SET_TCB_FIELD CPL message to change a connection's keepalive setting.
 */
void
t3_set_keepalive(struct toepcb *toep, int on_off)
{
    set_tcb_tflag(toep, S_TF_KEEPALIVE, on_off);
}

void
t3_set_rcv_coalesce_enable(struct toepcb *toep, int on_off)
{
    set_tcb_tflag(toep, S_TF_RCV_COALESCE_ENABLE, on_off);
}

void
t3_set_dack_mss(struct toepcb *toep, int on_off)
{
    set_tcb_tflag(toep, S_TF_DACK_MSS, on_off);
}

/*
 * Send a SET_TCB_FIELD CPL message to change a connection's TOS setting.
 */
static void
t3_set_tos(struct toepcb *toep)
{
    int tos = inp_ip_tos_get(toep->tp_tp->t_inpcb);

    t3_set_tcb_field(toep, W_TCB_TOS, V_TCB_TOS(M_TCB_TOS),
        V_TCB_TOS(tos));
}

/*
 * In DDP mode, TP fails to schedule a timer to push RX data to the host when
 * DDP is disabled (data is delivered to freelist). [Note that the peer should
 * set the PSH bit in the last segment, which would trigger delivery.]
 * We work around the issue by setting a DDP buffer in a partially placed
 * state, which guarantees that TP will schedule a timer.
 */
#define TP_DDP_TIMER_WORKAROUND_MASK \
    (V_TF_DDP_BUF0_VALID(1) | V_TF_DDP_ACTIVE_BUF(1) | \
     ((V_TCB_RX_DDP_BUF0_OFFSET(M_TCB_RX_DDP_BUF0_OFFSET) | \
       V_TCB_RX_DDP_BUF0_LEN(3)) << 32))

#define TP_DDP_TIMER_WORKAROUND_VAL \
    (V_TF_DDP_BUF0_VALID(1) | V_TF_DDP_ACTIVE_BUF(0) | \
     ((V_TCB_RX_DDP_BUF0_OFFSET((uint64_t)1) | \
       V_TCB_RX_DDP_BUF0_LEN((uint64_t)2)) << 32))

static void
t3_enable_ddp(struct toepcb *toep, int on)
{
    if (on) {
        t3_set_tcb_field(toep, W_TCB_RX_DDP_FLAGS, V_TF_DDP_OFF(1),
            V_TF_DDP_OFF(0));
    } else
        t3_set_tcb_field(toep, W_TCB_RX_DDP_FLAGS,
            V_TF_DDP_OFF(1) | TP_DDP_TIMER_WORKAROUND_MASK,
            V_TF_DDP_OFF(1) | TP_DDP_TIMER_WORKAROUND_VAL);
}

void
t3_set_ddp_tag(struct toepcb *toep, int buf_idx, unsigned int tag_color)
{
    t3_set_tcb_field(toep, W_TCB_RX_DDP_BUF0_TAG + buf_idx,
        V_TCB_RX_DDP_BUF0_TAG(M_TCB_RX_DDP_BUF0_TAG),
        tag_color);
}

void
t3_set_ddp_buf(struct toepcb *toep, int buf_idx, unsigned int offset,
    unsigned int len)
{
    if (buf_idx == 0)
        t3_set_tcb_field(toep, W_TCB_RX_DDP_BUF0_OFFSET,
            V_TCB_RX_DDP_BUF0_OFFSET(M_TCB_RX_DDP_BUF0_OFFSET) |
            V_TCB_RX_DDP_BUF0_LEN(M_TCB_RX_DDP_BUF0_LEN),
            V_TCB_RX_DDP_BUF0_OFFSET((uint64_t)offset) |
            V_TCB_RX_DDP_BUF0_LEN((uint64_t)len));
    else
        t3_set_tcb_field(toep, W_TCB_RX_DDP_BUF1_OFFSET,
            V_TCB_RX_DDP_BUF1_OFFSET(M_TCB_RX_DDP_BUF1_OFFSET) |
            V_TCB_RX_DDP_BUF1_LEN(M_TCB_RX_DDP_BUF1_LEN << 32),
            V_TCB_RX_DDP_BUF1_OFFSET((uint64_t)offset) |
            V_TCB_RX_DDP_BUF1_LEN(((uint64_t)len) << 32));
}

static int
t3_set_cong_control(struct socket *so, const char *name)
{
#ifdef CONGESTION_CONTROL_SUPPORTED
    int cong_algo;

    for (cong_algo = 0; cong_algo < ARRAY_SIZE(t3_cong_ops); cong_algo++)
        if (!strcmp(name, t3_cong_ops[cong_algo].name))
            break;

    if (cong_algo >= ARRAY_SIZE(t3_cong_ops))
        return -EINVAL;
#endif
    return 0;
}

int
t3_get_tcb(struct toepcb *toep)
{
    struct cpl_get_tcb *req;
    struct tcpcb *tp = toep->tp_tp;
    struct mbuf *m = m_gethdr(M_NOWAIT, MT_DATA);

    if (!m)
        return (ENOMEM);

    inp_lock_assert(tp->t_inpcb);
    m_set_priority(m, mkprio(CPL_PRIORITY_CONTROL, toep));
    req = mtod(m, struct cpl_get_tcb *);
    m->m_pkthdr.len = m->m_len = sizeof(*req);
    req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD));
    req->wr.wr_lo = 0;
    OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_GET_TCB, toep->tp_tid));
    req->cpuno = htons(toep->tp_qset);
    req->rsvd = 0;
    if (tp->t_state == TCPS_SYN_SENT)
        mbufq_tail(&toep->out_of_order_queue, m);   // defer
    else
        cxgb_ofld_send(TOEP_T3C_DEV(toep), m);
    return 0;
}

static inline void
so_insert_tid(struct tom_data *d, struct toepcb *toep, unsigned int tid)
{
    toepcb_hold(toep);

    cxgb_insert_tid(d->cdev, d->client, toep, tid);
}

/**
 * find_best_mtu - find the entry in the MTU table closest to an MTU
 * @d: TOM state
 * @mtu: the target MTU
 *
 * Returns the index of the value in the MTU table that is closest to but
 * does not exceed the target MTU.
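 *
 * For example, given a hypothetical MTU table {576, 1500, 9000} and a
 * target MTU of 4000, the scan below stops at index 1 (1500), the
 * largest entry that does not exceed the target.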
 */
static unsigned int
find_best_mtu(const struct t3c_data *d, unsigned short mtu)
{
    int i = 0;

    while (i < d->nmtus - 1 && d->mtus[i + 1] <= mtu)
        ++i;
    return (i);
}

static unsigned int
select_mss(struct t3c_data *td, struct tcpcb *tp, unsigned int pmtu)
{
    unsigned int idx;

#ifdef notyet
    struct rtentry *dst = so_sotoinpcb(so)->inp_route.ro_rt;
#endif
    if (tp) {
        tp->t_maxseg = pmtu - 40;
        if (tp->t_maxseg < td->mtus[0] - 40)
            tp->t_maxseg = td->mtus[0] - 40;
        idx = find_best_mtu(td, tp->t_maxseg + 40);

        tp->t_maxseg = td->mtus[idx] - 40;
    } else
        idx = find_best_mtu(td, pmtu);

    return (idx);
}

static inline void
free_atid(struct t3cdev *cdev, unsigned int tid)
{
    struct toepcb *toep = cxgb_free_atid(cdev, tid);

    if (toep)
        toepcb_release(toep);
}

/*
 * Release resources held by an offload connection (TID, L2T entry, etc.)
 */
static void
t3_release_offload_resources(struct toepcb *toep)
{
    struct tcpcb *tp = toep->tp_tp;
    struct toedev *tdev = toep->tp_toedev;
    struct t3cdev *cdev;
    struct socket *so;
    unsigned int tid = toep->tp_tid;
    struct sockbuf *rcv;

    CTR0(KTR_TOM, "t3_release_offload_resources");

    if (!tdev)
        return;

    cdev = TOEP_T3C_DEV(toep);
    if (!cdev)
        return;

    toep->tp_qset = 0;
    t3_release_ddp_resources(toep);

#ifdef CTRL_SKB_CACHE
    kfree_skb(CTRL_SKB_CACHE(tp));
    CTRL_SKB_CACHE(tp) = NULL;
#endif

    if (toep->tp_wr_avail != toep->tp_wr_max) {
        purge_wr_queue(toep);
        reset_wr_list(toep);
    }

    if (toep->tp_l2t) {
        l2t_release(L2DATA(cdev), toep->tp_l2t);
        toep->tp_l2t = NULL;
    }
    toep->tp_tp = NULL;
    if (tp) {
        inp_lock_assert(tp->t_inpcb);
        so = inp_inpcbtosocket(tp->t_inpcb);
        rcv = so_sockbuf_rcv(so);
        /*
         * cancel any offloaded reads
         */
        sockbuf_lock(rcv);
        tp->t_toe = NULL;
        tp->t_flags &= ~TF_TOE;
        if (toep->tp_ddp_state.user_ddp_pending) {
            t3_cancel_ubuf(toep, rcv);
            toep->tp_ddp_state.user_ddp_pending = 0;
        }
        so_sorwakeup_locked(so);
    }

    if (toep->tp_state == TCPS_SYN_SENT) {
        free_atid(cdev, tid);
#ifdef notyet
        __skb_queue_purge(&tp->out_of_order_queue);
#endif
    } else {    // we have TID
        cxgb_remove_tid(cdev, toep, tid);
        toepcb_release(toep);
    }
#if 0
    log(LOG_INFO, "closing TID %u, state %u\n", tid, tp->t_state);
#endif
}

static void
install_offload_ops(struct socket *so)
{
    struct tcpcb *tp = so_sototcpcb(so);

    KASSERT(tp->t_toe != NULL, ("toepcb not set"));

    t3_install_socket_ops(so);
    tp->t_flags |= TF_TOE;
    tp->t_tu = &cxgb_toe_usrreqs;
}

/*
 * Determine the receive window scaling factor given a target max
 * receive window.
 */
static __inline int
select_rcv_wscale(int space, struct vnet *vnet)
{
    INIT_VNET_INET(vnet);
    int wscale = 0;

    if (space > MAX_RCV_WND)
        space = MAX_RCV_WND;

    if (V_tcp_do_rfc1323)
        for (; space > 65535 && wscale < 14; space >>= 1, ++wscale)
            ;

    return (wscale);
}

/*
 * Determine the receive window size for a socket.
 */
static unsigned long
select_rcv_wnd(struct toedev *dev, struct socket *so)
{
    INIT_VNET_INET(so->so_vnet);
    struct tom_data *d = TOM_DATA(dev);
    unsigned int wnd;
    unsigned int max_rcv_wnd;
    struct sockbuf *rcv;

    rcv = so_sockbuf_rcv(so);

    if (V_tcp_do_autorcvbuf)
        wnd = V_tcp_autorcvbuf_max;
    else
        wnd = rcv->sb_hiwat;

    /* XXX
     * For receive coalescing to work effectively we need a receive window
     * that can accommodate a coalesced segment.
     */
    if (wnd < MIN_RCV_WND)
        wnd = MIN_RCV_WND;

    /* PR 5138 */
    max_rcv_wnd = (dev->tod_ttid < TOE_ID_CHELSIO_T3C ?
        (uint32_t)d->rx_page_size * 23 : MAX_RCV_WND);

    return min(wnd, max_rcv_wnd);
}

/*
 * Assign offload parameters to some socket fields. This code is used by
 * both active and passive opens.
*/ static inline void init_offload_socket(struct socket *so, struct toedev *dev, unsigned int tid, struct l2t_entry *e, struct rtentry *dst, struct toepcb *toep) { struct tcpcb *tp = so_sototcpcb(so); struct t3c_data *td = T3C_DATA(TOM_DATA(dev)->cdev); struct sockbuf *snd, *rcv; #ifdef notyet SOCK_LOCK_ASSERT(so); #endif snd = so_sockbuf_snd(so); rcv = so_sockbuf_rcv(so); log(LOG_INFO, "initializing offload socket\n"); /* * We either need to fix push frames to work with sbcompress * or we need to add this */ snd->sb_flags |= SB_NOCOALESCE; rcv->sb_flags |= SB_NOCOALESCE; tp->t_toe = toep; toep->tp_tp = tp; toep->tp_toedev = dev; toep->tp_tid = tid; toep->tp_l2t = e; toep->tp_wr_max = toep->tp_wr_avail = TOM_TUNABLE(dev, max_wrs); toep->tp_wr_unacked = 0; toep->tp_delack_mode = 0; toep->tp_mtu_idx = select_mss(td, tp, dst->rt_ifp->if_mtu); /* * XXX broken * */ tp->rcv_wnd = select_rcv_wnd(dev, so); toep->tp_ulp_mode = TOM_TUNABLE(dev, ddp) && !(so_options_get(so) & SO_NO_DDP) && tp->rcv_wnd >= MIN_DDP_RCV_WIN ? ULP_MODE_TCPDDP : 0; toep->tp_qset_idx = 0; reset_wr_list(toep); DPRINTF("initialization done\n"); } /* * The next two functions calculate the option 0 value for a socket. */ static inline unsigned int calc_opt0h(struct socket *so, int mtu_idx) { struct tcpcb *tp = so_sototcpcb(so); int wscale = select_rcv_wscale(tp->rcv_wnd, so->so_vnet); return V_NAGLE((tp->t_flags & TF_NODELAY) == 0) | V_KEEP_ALIVE((so_options_get(so) & SO_KEEPALIVE) != 0) | F_TCAM_BYPASS | V_WND_SCALE(wscale) | V_MSS_IDX(mtu_idx); } static inline unsigned int calc_opt0l(struct socket *so, int ulp_mode) { struct tcpcb *tp = so_sototcpcb(so); unsigned int val; val = V_TOS(INP_TOS(tp->t_inpcb)) | V_ULP_MODE(ulp_mode) | V_RCV_BUFSIZ(min(tp->rcv_wnd >> 10, (u32)M_RCV_BUFSIZ)); DPRINTF("opt0l tos=%08x rcv_wnd=%ld opt0l=%08x\n", INP_TOS(tp->t_inpcb), tp->rcv_wnd, val); return (val); } static inline unsigned int calc_opt2(const struct socket *so, struct toedev *dev) { int flv_valid; flv_valid = (TOM_TUNABLE(dev, cong_alg) != -1); return (V_FLAVORS_VALID(flv_valid) | V_CONG_CONTROL_FLAVOR(flv_valid ? TOM_TUNABLE(dev, cong_alg) : 0)); } #if DEBUG_WR > 1 static int count_pending_wrs(const struct toepcb *toep) { const struct mbuf *m; int n = 0; wr_queue_walk(toep, m) n += m->m_pkthdr.csum_data; return (n); } #endif #if 0 (((*(struct tom_data **)&(dev)->l4opt)->conf.cong_alg) != -1) #endif static void mk_act_open_req(struct socket *so, struct mbuf *m, unsigned int atid, const struct l2t_entry *e) { struct cpl_act_open_req *req; struct inpcb *inp = so_sotoinpcb(so); struct tcpcb *tp = inp_inpcbtotcpcb(inp); struct toepcb *toep = tp->t_toe; struct toedev *tdev = toep->tp_toedev; m_set_priority((struct mbuf *)m, mkprio(CPL_PRIORITY_SETUP, toep)); req = mtod(m, struct cpl_act_open_req *); m->m_pkthdr.len = m->m_len = sizeof(*req); req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD)); req->wr.wr_lo = 0; OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_ACT_OPEN_REQ, atid)); inp_4tuple_get(inp, &req->local_ip, &req->local_port, &req->peer_ip, &req->peer_port); #if 0 req->local_port = inp->inp_lport; req->peer_port = inp->inp_fport; memcpy(&req->local_ip, &inp->inp_laddr, 4); memcpy(&req->peer_ip, &inp->inp_faddr, 4); #endif req->opt0h = htonl(calc_opt0h(so, toep->tp_mtu_idx) | V_L2T_IDX(e->idx) | V_TX_CHANNEL(e->smt_idx)); req->opt0l = htonl(calc_opt0l(so, toep->tp_ulp_mode)); req->params = 0; req->opt2 = htonl(calc_opt2(so, tdev)); } /* * Convert an ACT_OPEN_RPL status to an errno. 
*/ static int act_open_rpl_status_to_errno(int status) { switch (status) { case CPL_ERR_CONN_RESET: return (ECONNREFUSED); case CPL_ERR_ARP_MISS: return (EHOSTUNREACH); case CPL_ERR_CONN_TIMEDOUT: return (ETIMEDOUT); case CPL_ERR_TCAM_FULL: return (ENOMEM); case CPL_ERR_CONN_EXIST: log(LOG_ERR, "ACTIVE_OPEN_RPL: 4-tuple in use\n"); return (EADDRINUSE); default: return (EIO); } } static void fail_act_open(struct toepcb *toep, int errno) { struct tcpcb *tp = toep->tp_tp; t3_release_offload_resources(toep); if (tp) { inp_wunlock(tp->t_inpcb); tcp_offload_drop(tp, errno); } #ifdef notyet TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS); #endif } /* * Handle active open failures. */ static void active_open_failed(struct toepcb *toep, struct mbuf *m) { struct cpl_act_open_rpl *rpl = cplhdr(m); struct inpcb *inp; if (toep->tp_tp == NULL) goto done; inp = toep->tp_tp->t_inpcb; /* * Don't handle connection retry for now */ #ifdef notyet struct inet_connection_sock *icsk = inet_csk(sk); if (rpl->status == CPL_ERR_CONN_EXIST && icsk->icsk_retransmit_timer.function != act_open_retry_timer) { icsk->icsk_retransmit_timer.function = act_open_retry_timer; sk_reset_timer(so, &icsk->icsk_retransmit_timer, jiffies + HZ / 2); } else #endif { inp_wlock(inp); /* * drops the inpcb lock */ fail_act_open(toep, act_open_rpl_status_to_errno(rpl->status)); } done: m_free(m); } /* * Return whether a failed active open has allocated a TID */ static inline int act_open_has_tid(int status) { return status != CPL_ERR_TCAM_FULL && status != CPL_ERR_CONN_EXIST && status != CPL_ERR_ARP_MISS; } /* * Process an ACT_OPEN_RPL CPL message. */ static int do_act_open_rpl(struct t3cdev *cdev, struct mbuf *m, void *ctx) { struct toepcb *toep = (struct toepcb *)ctx; struct cpl_act_open_rpl *rpl = cplhdr(m); if (cdev->type != T3A && act_open_has_tid(rpl->status)) cxgb_queue_tid_release(cdev, GET_TID(rpl)); active_open_failed(toep, m); return (0); } /* * Handle an ARP failure for an active open. XXX purge ofo queue * * XXX badly broken for crossed SYNs as the ATID is no longer valid. * XXX crossed SYN errors should be generated by PASS_ACCEPT_RPL which should * check SOCK_DEAD or sk->sk_sock. Or maybe generate the error here but don't * free the atid. Hmm. */ #ifdef notyet static void act_open_req_arp_failure(struct t3cdev *dev, struct mbuf *m) { struct toepcb *toep = m_get_toep(m); struct tcpcb *tp = toep->tp_tp; struct inpcb *inp = tp->t_inpcb; struct socket *so; inp_wlock(inp); if (tp->t_state == TCPS_SYN_SENT || tp->t_state == TCPS_SYN_RECEIVED) { /* * drops the inpcb lock */ fail_act_open(so, EHOSTUNREACH); printf("freeing %p\n", m); m_free(m); } else inp_wunlock(inp); } #endif /* * Send an active open request. 
 */
int
t3_connect(struct toedev *tdev, struct socket *so,
    struct rtentry *rt, struct sockaddr *nam)
{
    struct mbuf *m;
    struct l2t_entry *e;
    struct tom_data *d = TOM_DATA(tdev);
    struct inpcb *inp = so_sotoinpcb(so);
    struct tcpcb *tp = intotcpcb(inp);
    struct toepcb *toep; /* allocated by init_offload_socket */
    int atid;

    toep = toepcb_alloc();
    if (toep == NULL)
        goto out_err;

    if ((atid = cxgb_alloc_atid(d->cdev, d->client, toep)) < 0)
        goto out_err;

    e = t3_l2t_get(d->cdev, rt, rt->rt_ifp, nam);
    if (!e)
        goto free_tid;

    inp_lock_assert(inp);
    m = m_gethdr(M_WAITOK, MT_DATA);

#if 0
    m->m_toe.mt_toepcb = tp->t_toe;
    set_arp_failure_handler((struct mbuf *)m, act_open_req_arp_failure);
#endif
    so_lock(so);

    init_offload_socket(so, tdev, atid, e, rt, toep);

    install_offload_ops(so);

    mk_act_open_req(so, m, atid, e);
    so_unlock(so);

    soisconnecting(so);
    toep = tp->t_toe;
    m_set_toep(m, tp->t_toe);

    toep->tp_state = TCPS_SYN_SENT;
    l2t_send(d->cdev, (struct mbuf *)m, e);

    if (toep->tp_ulp_mode)
        t3_enable_ddp(toep, 0);
    return (0);

free_tid:
    printf("failing connect - free atid\n");
    free_atid(d->cdev, atid);
out_err:
    printf("return ENOMEM\n");
    return (ENOMEM);
}

/*
 * Send an ABORT_REQ message. Cannot fail. This routine makes sure we do
 * not send multiple ABORT_REQs for the same connection and also that we do
 * not try to send a message after the connection has closed.
 */
static void
t3_send_reset(struct toepcb *toep)
{
    struct cpl_abort_req *req;
    unsigned int tid = toep->tp_tid;
    int mode = CPL_ABORT_SEND_RST;
    struct tcpcb *tp = toep->tp_tp;
    struct toedev *tdev = toep->tp_toedev;
    struct socket *so = NULL;
    struct mbuf *m;
    struct sockbuf *snd;

    if (tp) {
        inp_lock_assert(tp->t_inpcb);
        so = inp_inpcbtosocket(tp->t_inpcb);
    }

    if (__predict_false((toep->tp_flags & TP_ABORT_SHUTDOWN) ||
        tdev == NULL))
        return;
    toep->tp_flags |= (TP_ABORT_RPL_PENDING|TP_ABORT_SHUTDOWN);

    /* Purge the send queue so we don't send anything after an abort. */
    if (so) {
        snd = so_sockbuf_snd(so);
        sbflush(snd);
    }
    if ((toep->tp_flags & TP_CLOSE_CON_REQUESTED) && is_t3a(tdev))
        mode |= CPL_ABORT_POST_CLOSE_REQ;

    m = m_gethdr_nofail(sizeof(*req));
    m_set_priority(m, mkprio(CPL_PRIORITY_DATA, toep));
    set_arp_failure_handler(m, abort_arp_failure);

    req = mtod(m, struct cpl_abort_req *);
    req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_HOST_ABORT_CON_REQ));
    req->wr.wr_lo = htonl(V_WR_TID(tid));
    OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_ABORT_REQ, tid));
    req->rsvd0 = tp ?
        htonl(tp->snd_nxt) : 0;
    req->rsvd1 = !(toep->tp_flags & TP_DATASENT);
    req->cmd = mode;
    if (tp && (tp->t_state == TCPS_SYN_SENT))
        mbufq_tail(&toep->out_of_order_queue, m);   // defer
    else
        l2t_send(TOEP_T3C_DEV(toep), m, toep->tp_l2t);
}

static int
t3_ip_ctloutput(struct socket *so, struct sockopt *sopt)
{
    struct inpcb *inp;
    int error, optval;

    if (sopt->sopt_name == IP_OPTIONS)
        return (ENOPROTOOPT);

    if (sopt->sopt_name != IP_TOS)
        return (EOPNOTSUPP);

    error = sooptcopyin(sopt, &optval, sizeof optval, sizeof optval);
    if (error)
        return (error);

    if (optval > IPTOS_PREC_CRITIC_ECP)
        return (EINVAL);

    inp = so_sotoinpcb(so);
    inp_wlock(inp);
    inp_ip_tos_set(inp, optval);
#if 0
    inp->inp_ip_tos = optval;
#endif
    t3_set_tos(inp_inpcbtotcpcb(inp)->t_toe);
    inp_wunlock(inp);

    return (0);
}

static int
t3_tcp_ctloutput(struct socket *so, struct sockopt *sopt)
{
    int err = 0;
    size_t copied;

    if (sopt->sopt_name != TCP_CONGESTION &&
        sopt->sopt_name != TCP_NODELAY)
        return (EOPNOTSUPP);

    if (sopt->sopt_name == TCP_CONGESTION) {
        char name[TCP_CA_NAME_MAX];
        int optlen = sopt->sopt_valsize;
        struct tcpcb *tp;

        if (sopt->sopt_dir == SOPT_GET) {
            KASSERT(0, ("unimplemented"));
            return (EOPNOTSUPP);
        }

        if (optlen < 1)
            return (EINVAL);

        err = copyinstr(sopt->sopt_val, name,
            min(TCP_CA_NAME_MAX - 1, optlen), &copied);
        if (err)
            return (err);
        if (copied < 1)
            return (EINVAL);

        tp = so_sototcpcb(so);
        /*
         * XXX I need to revisit this
         */
        if ((err = t3_set_cong_control(so, name)) == 0) {
#ifdef CONGESTION_CONTROL_SUPPORTED
            tp->t_cong_control = strdup(name, M_CXGB);
#endif
        } else
            return (err);
    } else {
        int optval, oldval;
        struct inpcb *inp;
        struct tcpcb *tp;

        if (sopt->sopt_dir == SOPT_GET)
            return (EOPNOTSUPP);

        err = sooptcopyin(sopt, &optval, sizeof optval, sizeof optval);
        if (err)
            return (err);

        inp = so_sotoinpcb(so);
        inp_wlock(inp);
        tp = inp_inpcbtotcpcb(inp);

        oldval = tp->t_flags;
        if (optval)
            tp->t_flags |= TF_NODELAY;
        else
            tp->t_flags &= ~TF_NODELAY;
        inp_wunlock(inp);

        if (oldval != tp->t_flags && (tp->t_toe != NULL))
            t3_set_nagle(tp->t_toe);
    }

    return (0);
}

int
t3_ctloutput(struct socket *so, struct sockopt *sopt)
{
    int err;

    if (sopt->sopt_level != IPPROTO_TCP)
        err = t3_ip_ctloutput(so, sopt);
    else
        err = t3_tcp_ctloutput(so, sopt);

    if (err != EOPNOTSUPP)
        return (err);

    return (tcp_ctloutput(so, sopt));
}

/*
 * Returns true if we need to explicitly request RST when we receive new data
 * on an RX-closed connection.
 */
static inline int
need_rst_on_excess_rx(const struct toepcb *toep)
{
    return (1);
}

/*
 * Handles Rx data that arrives in a state where the socket isn't accepting
 * new data.
 */
static void
handle_excess_rx(struct toepcb *toep, struct mbuf *m)
{
    if (need_rst_on_excess_rx(toep) &&
        !(toep->tp_flags & TP_ABORT_SHUTDOWN))
        t3_send_reset(toep);
    m_freem(m);
}

/*
 * Process a get_tcb_rpl as a DDP completion (similar to RX_DDP_COMPLETE)
 * by getting the DDP offset from the TCB.
 */
static void
tcb_rpl_as_ddp_complete(struct toepcb *toep, struct mbuf *m)
{
    struct ddp_state *q = &toep->tp_ddp_state;
    struct ddp_buf_state *bsp;
    struct cpl_get_tcb_rpl *hdr;
    unsigned int ddp_offset;
    struct socket *so;
    struct tcpcb *tp;
    struct sockbuf *rcv;
    int state;

    uint64_t t;
    __be64 *tcb;

    tp = toep->tp_tp;
    so = inp_inpcbtosocket(tp->t_inpcb);

    inp_lock_assert(tp->t_inpcb);
    rcv = so_sockbuf_rcv(so);
    sockbuf_lock(rcv);

    /* Note that we only account for CPL_GET_TCB issued by the DDP code.
     * We really need a cookie in order to dispatch the RPLs.
     */
    q->get_tcb_count--;

    /* It is possible that a previous CPL already invalidated UBUF DDP
     * and moved the cur_buf idx and hence no further processing of this
     * skb is required. However, the app might be sleeping on
     * !q->get_tcb_count and we need to wake it up.
     */
    if (q->cancel_ubuf && !t3_ddp_ubuf_pending(toep)) {
        int state = so_state_get(so);

        m_freem(m);
        if (__predict_true((state & SS_NOFDREF) == 0))
            so_sorwakeup_locked(so);
        else
            sockbuf_unlock(rcv);

        return;
    }

    bsp = &q->buf_state[q->cur_buf];
    hdr = cplhdr(m);
    tcb = (__be64 *)(hdr + 1);
    if (q->cur_buf == 0) {
        t = be64toh(tcb[(31 - W_TCB_RX_DDP_BUF0_OFFSET) / 2]);
        ddp_offset = t >> (32 + S_TCB_RX_DDP_BUF0_OFFSET);
    } else {
        t = be64toh(tcb[(31 - W_TCB_RX_DDP_BUF1_OFFSET) / 2]);
        ddp_offset = t >> S_TCB_RX_DDP_BUF1_OFFSET;
    }
    ddp_offset &= M_TCB_RX_DDP_BUF0_OFFSET;
    m->m_cur_offset = bsp->cur_offset;
    bsp->cur_offset = ddp_offset;
    m->m_len = m->m_pkthdr.len = ddp_offset - m->m_cur_offset;

    CTR5(KTR_TOM,
        "tcb_rpl_as_ddp_complete: idx=%d seq=0x%x hwbuf=%u ddp_offset=%u cur_offset=%u",
        q->cur_buf, tp->rcv_nxt, q->cur_buf, ddp_offset, m->m_cur_offset);
    KASSERT(ddp_offset >= m->m_cur_offset,
        ("ddp_offset=%u less than cur_offset=%u",
        ddp_offset, m->m_cur_offset));

#if 0
    {
        unsigned int ddp_flags, rcv_nxt, rx_hdr_offset, buf_idx;

        t = be64toh(tcb[(31 - W_TCB_RX_DDP_FLAGS) / 2]);
        ddp_flags = (t >> S_TCB_RX_DDP_FLAGS) & M_TCB_RX_DDP_FLAGS;

        t = be64toh(tcb[(31 - W_TCB_RCV_NXT) / 2]);
        rcv_nxt = t >> S_TCB_RCV_NXT;
        rcv_nxt &= M_TCB_RCV_NXT;

        t = be64toh(tcb[(31 - W_TCB_RX_HDR_OFFSET) / 2]);
        rx_hdr_offset = t >> (32 + S_TCB_RX_HDR_OFFSET);
        rx_hdr_offset &= M_TCB_RX_HDR_OFFSET;

        T3_TRACE2(TIDTB(sk),
            "tcb_rpl_as_ddp_complete: DDP FLAGS 0x%x dma up to 0x%x",
            ddp_flags, rcv_nxt - rx_hdr_offset);
        T3_TRACE4(TB(q),
            "tcb_rpl_as_ddp_complete: rcvnxt 0x%x hwbuf %u cur_offset %u cancel %u",
            tp->rcv_nxt, q->cur_buf, bsp->cur_offset, q->cancel_ubuf);
        T3_TRACE3(TB(q),
            "tcb_rpl_as_ddp_complete: TCB rcvnxt 0x%x hwbuf 0x%x ddp_offset %u",
            rcv_nxt - rx_hdr_offset, ddp_flags, ddp_offset);
        T3_TRACE2(TB(q),
            "tcb_rpl_as_ddp_complete: flags0 0x%x flags1 0x%x",
            q->buf_state[0].flags, q->buf_state[1].flags);
    }
#endif
    if (__predict_false(so_no_receive(so) && m->m_pkthdr.len)) {
        handle_excess_rx(toep, m);
        return;
    }

#ifdef T3_TRACE
    if ((int)m->m_pkthdr.len < 0) {
        t3_ddp_error(so, "tcb_rpl_as_ddp_complete: neg len");
    }
#endif
    if (bsp->flags & DDP_BF_NOCOPY) {
#ifdef T3_TRACE
        T3_TRACE0(TB(q),
            "tcb_rpl_as_ddp_complete: CANCEL UBUF");

        if (!q->cancel_ubuf && !(sk->sk_shutdown & RCV_SHUTDOWN)) {
            printk("!cancel_ubuf");
            t3_ddp_error(sk, "tcb_rpl_as_ddp_complete: !cancel_ubuf");
        }
#endif
        m->m_ddp_flags = DDP_BF_PSH | DDP_BF_NOCOPY | 1;
        bsp->flags &= ~(DDP_BF_NOCOPY|DDP_BF_NODATA);
        q->cur_buf ^= 1;
    } else if (bsp->flags & DDP_BF_NOFLIP) {

        m->m_ddp_flags = 1;    /* always a kernel buffer */

        /* now HW buffer carries a user buffer */
        bsp->flags &= ~DDP_BF_NOFLIP;
        bsp->flags |= DDP_BF_NOCOPY;

        /* It is possible that the CPL_GET_TCB_RPL doesn't indicate
         * any new data in which case we're done. If in addition the
         * offset is 0, then there wasn't a completion for the kbuf
         * and we need to decrement the posted count.
         */
        if (m->m_pkthdr.len == 0) {
            if (ddp_offset == 0) {
                q->kbuf_posted--;
                bsp->flags |= DDP_BF_NODATA;
            }
            sockbuf_unlock(rcv);
            m_free(m);
            return;
        }
    } else {
        sockbuf_unlock(rcv);

        /* This reply is for a CPL_GET_TCB_RPL to cancel the UBUF DDP,
         * but it got here way late and nobody cares anymore.
         */
        m_free(m);
        return;
    }

    m->m_ddp_gl = (unsigned char *)bsp->gl;
    m->m_flags |= M_DDP;
    m->m_seq = tp->rcv_nxt;
    tp->rcv_nxt += m->m_pkthdr.len;
    tp->t_rcvtime = ticks;
    CTR3(KTR_TOM, "tcb_rpl_as_ddp_complete: seq 0x%x hwbuf %u m->m_pktlen %u",
        m->m_seq, q->cur_buf, m->m_pkthdr.len);
    if (m->m_pkthdr.len == 0) {
        q->user_ddp_pending = 0;
        m_free(m);
    } else
        SBAPPEND(rcv, m);

    state = so_state_get(so);
    if (__predict_true((state & SS_NOFDREF) == 0))
        so_sorwakeup_locked(so);
    else
        sockbuf_unlock(rcv);
}

/*
 * Process a CPL_GET_TCB_RPL. These can also be generated by the DDP code,
 * in that case they are similar to DDP completions.
 */
static int
do_get_tcb_rpl(struct t3cdev *cdev, struct mbuf *m, void *ctx)
{
    struct toepcb *toep = (struct toepcb *)ctx;

    /* OK if socket doesn't exist */
    if (toep == NULL) {
        printf("null toep in do_get_tcb_rpl\n");
        return (CPL_RET_BUF_DONE);
    }

    inp_wlock(toep->tp_tp->t_inpcb);
    tcb_rpl_as_ddp_complete(toep, m);
    inp_wunlock(toep->tp_tp->t_inpcb);

    return (0);
}

static void
handle_ddp_data(struct toepcb *toep, struct mbuf *m)
{
    struct tcpcb *tp = toep->tp_tp;
    struct socket *so;
    struct ddp_state *q;
    struct ddp_buf_state *bsp;
    struct cpl_rx_data *hdr = cplhdr(m);
    unsigned int rcv_nxt = ntohl(hdr->seq);
    struct sockbuf *rcv;

    if (tp->rcv_nxt == rcv_nxt)
        return;

    inp_lock_assert(tp->t_inpcb);
    so = inp_inpcbtosocket(tp->t_inpcb);
    rcv = so_sockbuf_rcv(so);
    sockbuf_lock(rcv);

    q = &toep->tp_ddp_state;
    bsp = &q->buf_state[q->cur_buf];
    KASSERT(SEQ_GT(rcv_nxt, tp->rcv_nxt),
        ("tp->rcv_nxt=0x%08x decreased rcv_nxt=0x%08x",
        tp->rcv_nxt, rcv_nxt));
    m->m_len = m->m_pkthdr.len = rcv_nxt - tp->rcv_nxt;
    KASSERT(m->m_len > 0, ("%s m_len=%d", __FUNCTION__, m->m_len));
    CTR3(KTR_TOM, "rcv_nxt=0x%x tp->rcv_nxt=0x%x len=%d",
        rcv_nxt, tp->rcv_nxt, m->m_pkthdr.len);

#ifdef T3_TRACE
    if ((int)m->m_pkthdr.len < 0) {
        t3_ddp_error(so, "handle_ddp_data: neg len");
    }
#endif
    m->m_ddp_gl = (unsigned char *)bsp->gl;
    m->m_flags |= M_DDP;
    m->m_cur_offset = bsp->cur_offset;
    m->m_ddp_flags = DDP_BF_PSH | (bsp->flags & DDP_BF_NOCOPY) | 1;
    if (bsp->flags & DDP_BF_NOCOPY)
        bsp->flags &= ~DDP_BF_NOCOPY;

    m->m_seq = tp->rcv_nxt;
    tp->rcv_nxt = rcv_nxt;
    bsp->cur_offset += m->m_pkthdr.len;
    if (!(bsp->flags & DDP_BF_NOFLIP))
        q->cur_buf ^= 1;
    /*
     * For now, don't re-enable DDP after a connection fell out of DDP
     * mode.
     */
    q->ubuf_ddp_ready = 0;
    sockbuf_unlock(rcv);
}

/*
 * Process new data received for a connection.
*/ static void new_rx_data(struct toepcb *toep, struct mbuf *m) { struct cpl_rx_data *hdr = cplhdr(m); struct tcpcb *tp = toep->tp_tp; struct socket *so; struct sockbuf *rcv; int state; int len = be16toh(hdr->len); inp_wlock(tp->t_inpcb); so = inp_inpcbtosocket(tp->t_inpcb); if (__predict_false(so_no_receive(so))) { handle_excess_rx(toep, m); inp_wunlock(tp->t_inpcb); TRACE_EXIT; return; } if (toep->tp_ulp_mode == ULP_MODE_TCPDDP) handle_ddp_data(toep, m); m->m_seq = ntohl(hdr->seq); m->m_ulp_mode = 0; /* for iSCSI */ #if VALIDATE_SEQ if (__predict_false(m->m_seq != tp->rcv_nxt)) { log(LOG_ERR, "%s: TID %u: Bad sequence number %u, expected %u\n", toep->tp_toedev->name, toep->tp_tid, m->m_seq, tp->rcv_nxt); m_freem(m); inp_wunlock(tp->t_inpcb); return; } #endif m_adj(m, sizeof(*hdr)); #ifdef URGENT_DATA_SUPPORTED /* * We don't handle urgent data yet */ if (__predict_false(hdr->urg)) handle_urg_ptr(so, tp->rcv_nxt + ntohs(hdr->urg)); if (__predict_false(tp->urg_data == TCP_URG_NOTYET && tp->urg_seq - tp->rcv_nxt < skb->len)) tp->urg_data = TCP_URG_VALID | skb->data[tp->urg_seq - tp->rcv_nxt]; #endif if (__predict_false(hdr->dack_mode != toep->tp_delack_mode)) { toep->tp_delack_mode = hdr->dack_mode; toep->tp_delack_seq = tp->rcv_nxt; } CTR6(KTR_TOM, "appending mbuf=%p pktlen=%d m_len=%d len=%d rcv_nxt=0x%x enqueued_bytes=%d", m, m->m_pkthdr.len, m->m_len, len, tp->rcv_nxt, toep->tp_enqueued_bytes); if (len < m->m_pkthdr.len) m->m_pkthdr.len = m->m_len = len; tp->rcv_nxt += m->m_pkthdr.len; tp->t_rcvtime = ticks; toep->tp_enqueued_bytes += m->m_pkthdr.len; CTR2(KTR_TOM, "new_rx_data: seq 0x%x len %u", m->m_seq, m->m_pkthdr.len); inp_wunlock(tp->t_inpcb); rcv = so_sockbuf_rcv(so); sockbuf_lock(rcv); #if 0 if (sb_notify(rcv)) DPRINTF("rx_data so=%p flags=0x%x len=%d\n", so, rcv->sb_flags, m->m_pkthdr.len); #endif SBAPPEND(rcv, m); #ifdef notyet /* * We're giving too many credits to the card - but disable this check so we can keep on moving :-| * */ KASSERT(rcv->sb_cc < (rcv->sb_mbmax << 1), ("so=%p, data contents exceed mbmax, sb_cc=%d sb_mbmax=%d", so, rcv->sb_cc, rcv->sb_mbmax)); #endif CTR2(KTR_TOM, "sb_cc=%d sb_mbcnt=%d", rcv->sb_cc, rcv->sb_mbcnt); state = so_state_get(so); if (__predict_true((state & SS_NOFDREF) == 0)) so_sorwakeup_locked(so); else sockbuf_unlock(rcv); } /* * Handler for RX_DATA CPL messages. 
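 *
 * Like the other CPL handlers registered by this module, it returns 0
 * once it has taken ownership of the mbuf; a handler that cannot
 * process the message returns CPL_RET_BUF_DONE instead so that the
 * dispatcher frees the buffer (compare do_get_tcb_rpl above).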
 */
static int
do_rx_data(struct t3cdev *cdev, struct mbuf *m, void *ctx)
{
    struct toepcb *toep = (struct toepcb *)ctx;

    DPRINTF("rx_data len=%d\n", m->m_pkthdr.len);

    new_rx_data(toep, m);

    return (0);
}

static void
new_rx_data_ddp(struct toepcb *toep, struct mbuf *m)
{
    struct tcpcb *tp;
    struct ddp_state *q;
    struct ddp_buf_state *bsp;
    struct cpl_rx_data_ddp *hdr;
    struct socket *so;
    unsigned int ddp_len, rcv_nxt, ddp_report, end_offset, buf_idx;
    int nomoredata = 0;
    unsigned int delack_mode;
    struct sockbuf *rcv;

    tp = toep->tp_tp;
    inp_wlock(tp->t_inpcb);
    so = inp_inpcbtosocket(tp->t_inpcb);

    if (__predict_false(so_no_receive(so))) {
        handle_excess_rx(toep, m);
        inp_wunlock(tp->t_inpcb);
        return;
    }

    q = &toep->tp_ddp_state;
    hdr = cplhdr(m);
    ddp_report = ntohl(hdr->u.ddp_report);
    buf_idx = (ddp_report >> S_DDP_BUF_IDX) & 1;
    bsp = &q->buf_state[buf_idx];

    CTR4(KTR_TOM,
        "new_rx_data_ddp: tp->rcv_nxt 0x%x cur_offset %u "
        "hdr seq 0x%x len %u",
        tp->rcv_nxt, bsp->cur_offset, ntohl(hdr->seq),
        ntohs(hdr->len));
    CTR3(KTR_TOM,
        "new_rx_data_ddp: offset %u ddp_report 0x%x buf_idx=%d",
        G_DDP_OFFSET(ddp_report), ddp_report, buf_idx);

    ddp_len = ntohs(hdr->len);
    rcv_nxt = ntohl(hdr->seq) + ddp_len;

    delack_mode = G_DDP_DACK_MODE(ddp_report);
    if (__predict_false(G_DDP_DACK_MODE(ddp_report) != toep->tp_delack_mode)) {
        toep->tp_delack_mode = delack_mode;
        toep->tp_delack_seq = tp->rcv_nxt;
    }
    m->m_seq = tp->rcv_nxt;
    tp->rcv_nxt = rcv_nxt;

    tp->t_rcvtime = ticks;
    /*
     * Store the length in m->m_len. We are changing the meaning of
     * m->m_len here, we need to be very careful that nothing from now on
     * interprets ->len of this packet the usual way.
     */
    m->m_len = m->m_pkthdr.len = rcv_nxt - m->m_seq;
    inp_wunlock(tp->t_inpcb);
    CTR3(KTR_TOM,
        "new_rx_data_ddp: m_len=%u rcv_nxt 0x%08x rcv_nxt_prev=0x%08x ",
        m->m_len, rcv_nxt, m->m_seq);
    /*
     * Figure out where the new data was placed in the buffer and store it
     * in m->m_cur_offset. Assumes the buffer offset starts at 0, consumer
     * needs to account for page pod's pg_offset.
     */
    end_offset = G_DDP_OFFSET(ddp_report) + ddp_len;
    m->m_cur_offset = end_offset - m->m_pkthdr.len;

    rcv = so_sockbuf_rcv(so);
    sockbuf_lock(rcv);

    m->m_ddp_gl = (unsigned char *)bsp->gl;
    m->m_flags |= M_DDP;
    bsp->cur_offset = end_offset;
    toep->tp_enqueued_bytes += m->m_pkthdr.len;

    /*
     * Length is only meaningful for kbuf
     */
    if (!(bsp->flags & DDP_BF_NOCOPY))
        KASSERT(m->m_len <= bsp->gl->dgl_length,
            ("length received exceeds ddp pages: len=%d dgl_length=%d",
            m->m_len, bsp->gl->dgl_length));

    KASSERT(m->m_len > 0, ("%s m_len=%d", __FUNCTION__, m->m_len));
    KASSERT(m->m_next == NULL, ("m_len=%p", m->m_next));
    /*
     * Bit 0 of flags stores whether the DDP buffer is completed.
     * Note that other parts of the code depend on this being in bit 0.
*/ if ((bsp->flags & DDP_BF_NOINVAL) && end_offset != bsp->gl->dgl_length) { panic("spurious ddp completion"); } else { m->m_ddp_flags = !!(ddp_report & F_DDP_BUF_COMPLETE); if (m->m_ddp_flags && !(bsp->flags & DDP_BF_NOFLIP)) q->cur_buf ^= 1; /* flip buffers */ } if (bsp->flags & DDP_BF_NOCOPY) { m->m_ddp_flags |= (bsp->flags & DDP_BF_NOCOPY); bsp->flags &= ~DDP_BF_NOCOPY; } if (ddp_report & F_DDP_PSH) m->m_ddp_flags |= DDP_BF_PSH; if (nomoredata) m->m_ddp_flags |= DDP_BF_NODATA; #ifdef notyet skb_reset_transport_header(skb); tcp_hdr(skb)->fin = 0; /* changes original hdr->ddp_report */ #endif SBAPPEND(rcv, m); if ((so_state_get(so) & SS_NOFDREF) == 0 && ((ddp_report & F_DDP_PSH) || (((m->m_ddp_flags & (DDP_BF_NOCOPY|1)) == (DDP_BF_NOCOPY|1)) || !(m->m_ddp_flags & DDP_BF_NOCOPY)))) so_sorwakeup_locked(so); else sockbuf_unlock(rcv); } #define DDP_ERR (F_DDP_PPOD_MISMATCH | F_DDP_LLIMIT_ERR | F_DDP_ULIMIT_ERR |\ F_DDP_PPOD_PARITY_ERR | F_DDP_PADDING_ERR | F_DDP_OFFSET_ERR |\ F_DDP_INVALID_TAG | F_DDP_COLOR_ERR | F_DDP_TID_MISMATCH |\ F_DDP_INVALID_PPOD) /* * Handler for RX_DATA_DDP CPL messages. */ static int do_rx_data_ddp(struct t3cdev *cdev, struct mbuf *m, void *ctx) { struct toepcb *toep = ctx; const struct cpl_rx_data_ddp *hdr = cplhdr(m); VALIDATE_SOCK(so); if (__predict_false(ntohl(hdr->ddpvld_status) & DDP_ERR)) { log(LOG_ERR, "RX_DATA_DDP for TID %u reported error 0x%x\n", GET_TID(hdr), G_DDP_VALID(ntohl(hdr->ddpvld_status))); return (CPL_RET_BUF_DONE); } #if 0 skb->h.th = tcphdr_skb->h.th; #endif new_rx_data_ddp(toep, m); return (0); } static void process_ddp_complete(struct toepcb *toep, struct mbuf *m) { struct tcpcb *tp = toep->tp_tp; struct socket *so; struct ddp_state *q; struct ddp_buf_state *bsp; struct cpl_rx_ddp_complete *hdr; unsigned int ddp_report, buf_idx, when, delack_mode; int nomoredata = 0; struct sockbuf *rcv; inp_wlock(tp->t_inpcb); so = inp_inpcbtosocket(tp->t_inpcb); if (__predict_false(so_no_receive(so))) { struct inpcb *inp = so_sotoinpcb(so); handle_excess_rx(toep, m); inp_wunlock(inp); return; } q = &toep->tp_ddp_state; hdr = cplhdr(m); ddp_report = ntohl(hdr->ddp_report); buf_idx = (ddp_report >> S_DDP_BUF_IDX) & 1; m->m_pkthdr.csum_data = tp->rcv_nxt; rcv = so_sockbuf_rcv(so); sockbuf_lock(rcv); bsp = &q->buf_state[buf_idx]; when = bsp->cur_offset; m->m_len = m->m_pkthdr.len = G_DDP_OFFSET(ddp_report) - when; tp->rcv_nxt += m->m_len; tp->t_rcvtime = ticks; delack_mode = G_DDP_DACK_MODE(ddp_report); if (__predict_false(G_DDP_DACK_MODE(ddp_report) != toep->tp_delack_mode)) { toep->tp_delack_mode = delack_mode; toep->tp_delack_seq = tp->rcv_nxt; } #ifdef notyet skb_reset_transport_header(skb); tcp_hdr(skb)->fin = 0; /* changes valid memory past CPL */ #endif inp_wunlock(tp->t_inpcb); KASSERT(m->m_len >= 0, ("%s m_len=%d", __FUNCTION__, m->m_len)); CTR5(KTR_TOM, "process_ddp_complete: tp->rcv_nxt 0x%x cur_offset %u " "ddp_report 0x%x offset %u, len %u", tp->rcv_nxt, bsp->cur_offset, ddp_report, G_DDP_OFFSET(ddp_report), m->m_len); m->m_cur_offset = bsp->cur_offset; bsp->cur_offset += m->m_len; if (!(bsp->flags & DDP_BF_NOFLIP)) { q->cur_buf ^= 1; /* flip buffers */ if (G_DDP_OFFSET(ddp_report) < q->kbuf[0]->dgl_length) nomoredata=1; } CTR4(KTR_TOM, "process_ddp_complete: tp->rcv_nxt 0x%x cur_offset %u " "ddp_report %u offset %u", tp->rcv_nxt, bsp->cur_offset, ddp_report, G_DDP_OFFSET(ddp_report)); m->m_ddp_gl = (unsigned char *)bsp->gl; m->m_flags |= M_DDP; m->m_ddp_flags = (bsp->flags & DDP_BF_NOCOPY) | 1; if (bsp->flags & DDP_BF_NOCOPY) bsp->flags &= 
~DDP_BF_NOCOPY; if (nomoredata) m->m_ddp_flags |= DDP_BF_NODATA; SBAPPEND(rcv, m); if ((so_state_get(so) & SS_NOFDREF) == 0) so_sorwakeup_locked(so); else sockbuf_unlock(rcv); } /* * Handler for RX_DDP_COMPLETE CPL messages. */ static int do_rx_ddp_complete(struct t3cdev *cdev, struct mbuf *m, void *ctx) { struct toepcb *toep = ctx; VALIDATE_SOCK(so); #if 0 skb->h.th = tcphdr_skb->h.th; #endif process_ddp_complete(toep, m); return (0); } /* * Move a socket to TIME_WAIT state. We need to make some adjustments to the * socket state before calling tcp_time_wait to comply with its expectations. */ static void enter_timewait(struct tcpcb *tp) { /* * Bump rcv_nxt for the peer FIN. We don't do this at the time we * process peer_close because we don't want to carry the peer FIN in * the socket's receive queue and if we increment rcv_nxt without * having the FIN in the receive queue we'll confuse facilities such * as SIOCINQ. */ inp_wlock(tp->t_inpcb); tp->rcv_nxt++; tp->ts_recent_age = 0; /* defeat recycling */ tp->t_srtt = 0; /* defeat tcp_update_metrics */ inp_wunlock(tp->t_inpcb); tcp_offload_twstart(tp); } /* * For TCP DDP a PEER_CLOSE may also be an implicit RX_DDP_COMPLETE. This * function deals with the data that may be reported along with the FIN. * Returns -1 if no further processing of the PEER_CLOSE is needed, >= 0 to * perform normal FIN-related processing. In the latter case 1 indicates that * there was an implicit RX_DDP_COMPLETE and the skb should not be freed, 0 the * skb can be freed. */ static int handle_peer_close_data(struct socket *so, struct mbuf *m) { struct tcpcb *tp = so_sototcpcb(so); struct toepcb *toep = tp->t_toe; struct ddp_state *q; struct ddp_buf_state *bsp; struct cpl_peer_close *req = cplhdr(m); unsigned int rcv_nxt = ntohl(req->rcv_nxt) - 1; /* exclude FIN */ struct sockbuf *rcv; if (tp->rcv_nxt == rcv_nxt) /* no data */ return (0); CTR0(KTR_TOM, "handle_peer_close_data"); if (__predict_false(so_no_receive(so))) { handle_excess_rx(toep, m); /* * Although we discard the data we want to process the FIN so * that PEER_CLOSE + data behaves the same as RX_DATA_DDP + * PEER_CLOSE without data. In particular this PEER_CLOSE * may be what will close the connection. We return 1 because * handle_excess_rx() already freed the packet. */ return (1); } inp_lock_assert(tp->t_inpcb); q = &toep->tp_ddp_state; rcv = so_sockbuf_rcv(so); sockbuf_lock(rcv); bsp = &q->buf_state[q->cur_buf]; m->m_len = m->m_pkthdr.len = rcv_nxt - tp->rcv_nxt; KASSERT(m->m_len > 0, ("%s m_len=%d", __FUNCTION__, m->m_len)); m->m_ddp_gl = (unsigned char *)bsp->gl; m->m_flags |= M_DDP; m->m_cur_offset = bsp->cur_offset; m->m_ddp_flags = DDP_BF_PSH | (bsp->flags & DDP_BF_NOCOPY) | 1; m->m_seq = tp->rcv_nxt; tp->rcv_nxt = rcv_nxt; bsp->cur_offset += m->m_pkthdr.len; if (!(bsp->flags & DDP_BF_NOFLIP)) q->cur_buf ^= 1; #ifdef notyet skb_reset_transport_header(skb); tcp_hdr(skb)->fin = 0; /* changes valid memory past CPL */ #endif tp->t_rcvtime = ticks; SBAPPEND(rcv, m); if (__predict_true((so_state_get(so) & SS_NOFDREF) == 0)) so_sorwakeup_locked(so); else sockbuf_unlock(rcv); return (1); } /* * Handle a peer FIN. 
*/ static void do_peer_fin(struct toepcb *toep, struct mbuf *m) { struct socket *so; struct tcpcb *tp = toep->tp_tp; int keep, action; action = keep = 0; CTR1(KTR_TOM, "do_peer_fin state=%d", tp->t_state); if (!is_t3a(toep->tp_toedev) && (toep->tp_flags & TP_ABORT_RPL_PENDING)) { printf("abort_pending set\n"); goto out; } inp_wlock(tp->t_inpcb); so = inp_inpcbtosocket(toep->tp_tp->t_inpcb); if (toep->tp_ulp_mode == ULP_MODE_TCPDDP) { keep = handle_peer_close_data(so, m); if (keep < 0) { inp_wunlock(tp->t_inpcb); return; } } if (TCPS_HAVERCVDFIN(tp->t_state) == 0) { CTR1(KTR_TOM, "waking up waiters for cantrcvmore on %p ", so); socantrcvmore(so); /* * If connection is half-synchronized * (ie NEEDSYN flag on) then delay ACK, * so it may be piggybacked when SYN is sent. * Otherwise, since we received a FIN then no * more input can be expected, send ACK now. */ if (tp->t_flags & TF_NEEDSYN) tp->t_flags |= TF_DELACK; else tp->t_flags |= TF_ACKNOW; tp->rcv_nxt++; } switch (tp->t_state) { case TCPS_SYN_RECEIVED: tp->t_starttime = ticks; /* FALLTHROUGH */ case TCPS_ESTABLISHED: tp->t_state = TCPS_CLOSE_WAIT; break; case TCPS_FIN_WAIT_1: tp->t_state = TCPS_CLOSING; break; case TCPS_FIN_WAIT_2: /* * If we've sent an abort_req we must have sent it too late, * HW will send us a reply telling us so, and this peer_close * is really the last message for this connection and needs to * be treated as an abort_rpl, i.e., transition the connection * to TCP_CLOSE (note that the host stack does this at the * time of generating the RST but we must wait for HW). * Otherwise we enter TIME_WAIT. */ t3_release_offload_resources(toep); if (toep->tp_flags & TP_ABORT_RPL_PENDING) { action = TCP_CLOSE; } else { action = TCP_TIMEWAIT; } break; default: log(LOG_ERR, "%s: TID %u received PEER_CLOSE in bad state %d\n", toep->tp_toedev->tod_name, toep->tp_tid, tp->t_state); } inp_wunlock(tp->t_inpcb); if (action == TCP_TIMEWAIT) { enter_timewait(tp); } else if (action == TCP_DROP) { tcp_offload_drop(tp, 0); } else if (action == TCP_CLOSE) { tcp_offload_close(tp); } #ifdef notyet /* Do not send POLL_HUP for half duplex close. */ if ((sk->sk_shutdown & SEND_SHUTDOWN) || sk->sk_state == TCP_CLOSE) sk_wake_async(so, 1, POLL_HUP); else sk_wake_async(so, 1, POLL_IN); #endif out: if (!keep) m_free(m); } /* * Handler for PEER_CLOSE CPL messages. */ static int do_peer_close(struct t3cdev *cdev, struct mbuf *m, void *ctx) { struct toepcb *toep = (struct toepcb *)ctx; VALIDATE_SOCK(so); do_peer_fin(toep, m); return (0); } static void process_close_con_rpl(struct toepcb *toep, struct mbuf *m) { struct cpl_close_con_rpl *rpl = cplhdr(m); struct tcpcb *tp = toep->tp_tp; struct socket *so; int action = 0; struct sockbuf *rcv; inp_wlock(tp->t_inpcb); so = inp_inpcbtosocket(tp->t_inpcb); tp->snd_una = ntohl(rpl->snd_nxt) - 1; /* exclude FIN */ if (!is_t3a(toep->tp_toedev) && (toep->tp_flags & TP_ABORT_RPL_PENDING)) { inp_wunlock(tp->t_inpcb); goto out; } CTR3(KTR_TOM, "process_close_con_rpl(%p) state=%d dead=%d", toep, tp->t_state, !!(so_state_get(so) & SS_NOFDREF)); switch (tp->t_state) { case TCPS_CLOSING: /* see FIN_WAIT2 case in do_peer_fin */ t3_release_offload_resources(toep); if (toep->tp_flags & TP_ABORT_RPL_PENDING) { action = TCP_CLOSE; } else { action = TCP_TIMEWAIT; } break; case TCPS_LAST_ACK: /* * In this state we don't care about pending abort_rpl. * If we've sent abort_req it was post-close and was sent too * late, this close_con_rpl is the actual last message. 
*/ t3_release_offload_resources(toep); action = TCP_CLOSE; break; case TCPS_FIN_WAIT_1: /* * If we can't receive any more * data, then closing user can proceed. * Starting the timer is contrary to the * specification, but if we don't get a FIN * we'll hang forever. * * XXXjl: * we should release the tp also, and use a * compressed state. */ if (so) rcv = so_sockbuf_rcv(so); else break; if (rcv->sb_state & SBS_CANTRCVMORE) { int timeout; if (so) soisdisconnected(so); timeout = (tcp_fast_finwait2_recycle) ? tcp_finwait2_timeout : tcp_maxidle; tcp_timer_activate(tp, TT_2MSL, timeout); } tp->t_state = TCPS_FIN_WAIT_2; if ((so_options_get(so) & SO_LINGER) && so_linger_get(so) == 0 && (toep->tp_flags & TP_ABORT_SHUTDOWN) == 0) { action = TCP_DROP; } break; default: log(LOG_ERR, "%s: TID %u received CLOSE_CON_RPL in bad state %d\n", toep->tp_toedev->tod_name, toep->tp_tid, tp->t_state); } inp_wunlock(tp->t_inpcb); if (action == TCP_TIMEWAIT) { enter_timewait(tp); } else if (action == TCP_DROP) { tcp_offload_drop(tp, 0); } else if (action == TCP_CLOSE) { tcp_offload_close(tp); } out: m_freem(m); } /* * Handler for CLOSE_CON_RPL CPL messages. */ static int do_close_con_rpl(struct t3cdev *cdev, struct mbuf *m, void *ctx) { struct toepcb *toep = (struct toepcb *)ctx; process_close_con_rpl(toep, m); return (0); } /* * Process abort replies. We only process these messages if we anticipate * them as the coordination between SW and HW in this area is somewhat lacking * and sometimes we get ABORT_RPLs after we are done with the connection that * originated the ABORT_REQ. */ static void process_abort_rpl(struct toepcb *toep, struct mbuf *m) { struct tcpcb *tp = toep->tp_tp; struct socket *so; int needclose = 0; #ifdef T3_TRACE T3_TRACE1(TIDTB(sk), "process_abort_rpl: GTS rpl pending %d", sock_flag(sk, ABORT_RPL_PENDING)); #endif inp_wlock(tp->t_inpcb); so = inp_inpcbtosocket(tp->t_inpcb); if (toep->tp_flags & TP_ABORT_RPL_PENDING) { /* * XXX panic on tcpdrop */ if (!(toep->tp_flags & TP_ABORT_RPL_RCVD) && !is_t3a(toep->tp_toedev)) toep->tp_flags |= TP_ABORT_RPL_RCVD; else { toep->tp_flags &= ~(TP_ABORT_RPL_RCVD|TP_ABORT_RPL_PENDING); if (!(toep->tp_flags & TP_ABORT_REQ_RCVD) || !is_t3a(toep->tp_toedev)) { if (toep->tp_flags & TP_ABORT_REQ_RCVD) panic("TP_ABORT_REQ_RCVD set"); t3_release_offload_resources(toep); needclose = 1; } } } inp_wunlock(tp->t_inpcb); if (needclose) tcp_offload_close(tp); m_free(m); } /* * Handle an ABORT_RPL_RSS CPL message. */ static int do_abort_rpl(struct t3cdev *cdev, struct mbuf *m, void *ctx) { struct cpl_abort_rpl_rss *rpl = cplhdr(m); struct toepcb *toep; /* * Ignore replies to post-close aborts indicating that the abort was * requested too late. These connections are terminated when we get * PEER_CLOSE or CLOSE_CON_RPL and by the time the abort_rpl_rss * arrives the TID is either no longer used or it has been recycled. */ if (rpl->status == CPL_ERR_ABORT_FAILED) { discard: m_free(m); return (0); } toep = (struct toepcb *)ctx; /* * Sometimes we've already closed the socket, e.g., a post-close * abort races with ABORT_REQ_RSS, the latter frees the socket * expecting the ABORT_REQ will fail with CPL_ERR_ABORT_FAILED, * but FW turns the ABORT_REQ into a regular one and so we get * ABORT_RPL_RSS with status 0 and no socket. Only on T3A. 
*/ if (!toep) goto discard; if (toep->tp_tp == NULL) { log(LOG_NOTICE, "removing tid for abort\n"); cxgb_remove_tid(cdev, toep, toep->tp_tid); if (toep->tp_l2t) l2t_release(L2DATA(cdev), toep->tp_l2t); toepcb_release(toep); goto discard; } log(LOG_NOTICE, "toep=%p\n", toep); log(LOG_NOTICE, "tp=%p\n", toep->tp_tp); toepcb_hold(toep); process_abort_rpl(toep, m); toepcb_release(toep); return (0); } /* * Convert the status code of an ABORT_REQ into a FreeBSD error code. Also * indicate whether RST should be sent in response. */ static int abort_status_to_errno(struct socket *so, int abort_reason, int *need_rst) { struct tcpcb *tp = so_sototcpcb(so); switch (abort_reason) { case CPL_ERR_BAD_SYN: #if 0 NET_INC_STATS_BH(LINUX_MIB_TCPABORTONSYN); // fall through #endif case CPL_ERR_CONN_RESET: // XXX need to handle SYN_RECV due to crossed SYNs return (tp->t_state == TCPS_CLOSE_WAIT ? EPIPE : ECONNRESET); case CPL_ERR_XMIT_TIMEDOUT: case CPL_ERR_PERSIST_TIMEDOUT: case CPL_ERR_FINWAIT2_TIMEDOUT: case CPL_ERR_KEEPALIVE_TIMEDOUT: #if 0 NET_INC_STATS_BH(LINUX_MIB_TCPABORTONTIMEOUT); #endif return (ETIMEDOUT); default: return (EIO); } } static inline void set_abort_rpl_wr(struct mbuf *m, unsigned int tid, int cmd) { struct cpl_abort_rpl *rpl = cplhdr(m); rpl->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_HOST_ABORT_CON_RPL)); rpl->wr.wr_lo = htonl(V_WR_TID(tid)); m->m_len = m->m_pkthdr.len = sizeof(*rpl); OPCODE_TID(rpl) = htonl(MK_OPCODE_TID(CPL_ABORT_RPL, tid)); rpl->cmd = cmd; } static void send_deferred_abort_rpl(struct toedev *tdev, struct mbuf *m) { struct mbuf *reply_mbuf; struct cpl_abort_req_rss *req = cplhdr(m); reply_mbuf = m_gethdr_nofail(sizeof(struct cpl_abort_rpl)); m_set_priority(m, CPL_PRIORITY_DATA); m->m_len = m->m_pkthdr.len = sizeof(struct cpl_abort_rpl); set_abort_rpl_wr(reply_mbuf, GET_TID(req), req->status); cxgb_ofld_send(TOM_DATA(tdev)->cdev, reply_mbuf); m_free(m); } /* * Returns whether an ABORT_REQ_RSS message is a negative advice. */ static inline int is_neg_adv_abort(unsigned int status) { return status == CPL_ERR_RTX_NEG_ADVICE || status == CPL_ERR_PERSIST_NEG_ADVICE; } static void send_abort_rpl(struct mbuf *m, struct toedev *tdev, int rst_status) { struct mbuf *reply_mbuf; struct cpl_abort_req_rss *req = cplhdr(m); reply_mbuf = m_gethdr(M_NOWAIT, MT_DATA); if (!reply_mbuf) { /* Defer the reply. Stick rst_status into req->cmd. */ req->status = rst_status; t3_defer_reply(m, tdev, send_deferred_abort_rpl); return; } m_set_priority(reply_mbuf, CPL_PRIORITY_DATA); set_abort_rpl_wr(reply_mbuf, GET_TID(req), rst_status); m_free(m); /* * XXX need to sync with ARP as for SYN_RECV connections we can send * these messages while ARP is pending. For other connection states * it's not a problem. */ cxgb_ofld_send(TOM_DATA(tdev)->cdev, reply_mbuf); } #ifdef notyet static void cleanup_syn_rcv_conn(struct socket *child, struct socket *parent) { CXGB_UNIMPLEMENTED(); #ifdef notyet struct request_sock *req = child->sk_user_data; inet_csk_reqsk_queue_removed(parent, req); synq_remove(tcp_sk(child)); __reqsk_free(req); child->sk_user_data = NULL; #endif } /* * Performs the actual work to abort a SYN_RECV connection. */ static void do_abort_syn_rcv(struct socket *child, struct socket *parent) { struct tcpcb *parenttp = so_sototcpcb(parent); struct tcpcb *childtp = so_sototcpcb(child); /* * If the server is still open we clean up the child connection, * otherwise the server already did the clean up as it was purging * its SYN queue and the skb was just sitting in its backlog. 
*/ if (__predict_false(parenttp->t_state == TCPS_LISTEN)) { cleanup_syn_rcv_conn(child, parent); inp_wlock(childtp->t_inpcb); t3_release_offload_resources(childtp->t_toe); inp_wunlock(childtp->t_inpcb); tcp_offload_close(childtp); } } #endif /* * Handle abort requests for a SYN_RECV connection. These need extra work * because the socket is on its parent's SYN queue. */ static int abort_syn_rcv(struct socket *so, struct mbuf *m) { CXGB_UNIMPLEMENTED(); #ifdef notyet struct socket *parent; struct toedev *tdev = toep->tp_toedev; struct t3cdev *cdev = TOM_DATA(tdev)->cdev; struct socket *oreq = so->so_incomp; struct t3c_tid_entry *t3c_stid; struct tid_info *t; if (!oreq) return -1; /* somehow we are not on the SYN queue */ t = &(T3C_DATA(cdev))->tid_maps; t3c_stid = lookup_stid(t, oreq->ts_recent); parent = ((struct listen_ctx *)t3c_stid->ctx)->lso; so_lock(parent); do_abort_syn_rcv(so, parent); send_abort_rpl(m, tdev, CPL_ABORT_NO_RST); so_unlock(parent); #endif return (0); } /* * Process abort requests. If we are waiting for an ABORT_RPL we ignore this * request except that we need to reply to it. */ static void process_abort_req(struct toepcb *toep, struct mbuf *m, struct toedev *tdev) { int rst_status = CPL_ABORT_NO_RST; const struct cpl_abort_req_rss *req = cplhdr(m); struct tcpcb *tp = toep->tp_tp; struct socket *so; int needclose = 0; inp_wlock(tp->t_inpcb); so = inp_inpcbtosocket(toep->tp_tp->t_inpcb); if ((toep->tp_flags & TP_ABORT_REQ_RCVD) == 0) { toep->tp_flags |= (TP_ABORT_REQ_RCVD|TP_ABORT_SHUTDOWN); m_free(m); goto skip; } toep->tp_flags &= ~TP_ABORT_REQ_RCVD; /* * Three cases to consider: * a) We haven't sent an abort_req; close the connection. * b) We have sent a post-close abort_req that will get to TP too late * and will generate a CPL_ERR_ABORT_FAILED reply. The reply will * be ignored and the connection should be closed now. * c) We have sent a regular abort_req that will get to TP too late. * That will generate an abort_rpl with status 0, wait for it. */ if (((toep->tp_flags & TP_ABORT_RPL_PENDING) == 0) || (is_t3a(toep->tp_toedev) && (toep->tp_flags & TP_CLOSE_CON_REQUESTED))) { int error; error = abort_status_to_errno(so, req->status, &rst_status); so_error_set(so, error); if (__predict_true((so_state_get(so) & SS_NOFDREF) == 0)) so_sorwakeup(so); /* * SYN_RECV needs special processing. If abort_syn_rcv() * returns 0 it has taken care of the abort. */ if ((tp->t_state == TCPS_SYN_RECEIVED) && !abort_syn_rcv(so, m)) goto skip; t3_release_offload_resources(toep); needclose = 1; } inp_wunlock(tp->t_inpcb); if (needclose) tcp_offload_close(tp); send_abort_rpl(m, tdev, rst_status); return; skip: inp_wunlock(tp->t_inpcb); } /* * Handle an ABORT_REQ_RSS CPL message.
*/ static int do_abort_req(struct t3cdev *cdev, struct mbuf *m, void *ctx) { const struct cpl_abort_req_rss *req = cplhdr(m); struct toepcb *toep = (struct toepcb *)ctx; if (is_neg_adv_abort(req->status)) { m_free(m); return (0); } log(LOG_NOTICE, "aborting tid=%d\n", toep->tp_tid); if ((toep->tp_flags & (TP_SYN_RCVD|TP_ABORT_REQ_RCVD)) == TP_SYN_RCVD) { cxgb_remove_tid(cdev, toep, toep->tp_tid); toep->tp_flags |= TP_ABORT_REQ_RCVD; send_abort_rpl(m, toep->tp_toedev, CPL_ABORT_NO_RST); if (toep->tp_l2t) l2t_release(L2DATA(cdev), toep->tp_l2t); /* * Unhook */ toep->tp_tp->t_toe = NULL; toep->tp_tp->t_flags &= ~TF_TOE; toep->tp_tp = NULL; /* * XXX need to call syncache_chkrst - but we don't * have a way of doing that yet */ toepcb_release(toep); log(LOG_ERR, "abort for unestablished connection :-(\n"); return (0); } if (toep->tp_tp == NULL) { log(LOG_NOTICE, "disconnected toepcb\n"); /* should be freed momentarily */ return (0); } toepcb_hold(toep); process_abort_req(toep, m, toep->tp_toedev); toepcb_release(toep); return (0); } #ifdef notyet static void pass_open_abort(struct socket *child, struct socket *parent, struct mbuf *m) { struct toedev *tdev = TOE_DEV(parent); do_abort_syn_rcv(child, parent); if (tdev->tod_ttid == TOE_ID_CHELSIO_T3) { struct cpl_pass_accept_rpl *rpl = cplhdr(m); rpl->opt0h = htonl(F_TCAM_BYPASS); rpl->opt0l_status = htonl(CPL_PASS_OPEN_REJECT); cxgb_ofld_send(TOM_DATA(tdev)->cdev, m); } else m_free(m); } #endif static void handle_pass_open_arp_failure(struct socket *so, struct mbuf *m) { CXGB_UNIMPLEMENTED(); #ifdef notyet struct t3cdev *cdev; struct socket *parent; struct socket *oreq; struct t3c_tid_entry *t3c_stid; struct tid_info *t; struct tcpcb *otp, *tp = so_sototcpcb(so); struct toepcb *toep = tp->t_toe; /* * If the connection is being aborted due to the parent listening * socket going away there's nothing to do, the ABORT_REQ will close * the connection. */ if (toep->tp_flags & TP_ABORT_RPL_PENDING) { m_free(m); return; } oreq = so->so_incomp; otp = so_sototcpcb(oreq); cdev = T3C_DEV(so); t = &(T3C_DATA(cdev))->tid_maps; t3c_stid = lookup_stid(t, otp->ts_recent); parent = ((struct listen_ctx *)t3c_stid->ctx)->lso; so_lock(parent); pass_open_abort(so, parent, m); so_unlock(parent); #endif } /* * Handle an ARP failure for a CPL_PASS_ACCEPT_RPL. This is treated similarly * to an ABORT_REQ_RSS in SYN_RECV as both events need to tear down a SYN_RECV * connection. */ static void pass_accept_rpl_arp_failure(struct t3cdev *cdev, struct mbuf *m) { #ifdef notyet TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS); BLOG_SKB_CB(skb)->dev = TOE_DEV(skb->sk); #endif handle_pass_open_arp_failure(m_get_socket(m), m); } /* * Populate a reject CPL_PASS_ACCEPT_RPL WR. */ static void mk_pass_accept_rpl(struct mbuf *reply_mbuf, struct mbuf *req_mbuf) { struct cpl_pass_accept_req *req = cplhdr(req_mbuf); struct cpl_pass_accept_rpl *rpl = cplhdr(reply_mbuf); unsigned int tid = GET_TID(req); m_set_priority(reply_mbuf, CPL_PRIORITY_SETUP); rpl->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD)); OPCODE_TID(rpl) = htonl(MK_OPCODE_TID(CPL_PASS_ACCEPT_RPL, tid)); rpl->peer_ip = req->peer_ip; // req->peer_ip not overwritten yet rpl->opt0h = htonl(F_TCAM_BYPASS); rpl->opt0l_status = htonl(CPL_PASS_OPEN_REJECT); rpl->opt2 = 0; rpl->rsvd = rpl->opt2; /* workaround for HW bug */ } /* * Send a deferred reject to an accept request. 
*/ static void reject_pass_request(struct toedev *tdev, struct mbuf *m) { struct mbuf *reply_mbuf; reply_mbuf = m_gethdr_nofail(sizeof(struct cpl_pass_accept_rpl)); mk_pass_accept_rpl(reply_mbuf, m); cxgb_ofld_send(TOM_DATA(tdev)->cdev, reply_mbuf); m_free(m); } static void handle_syncache_event(int event, void *arg) { struct toepcb *toep = arg; switch (event) { case TOE_SC_ENTRY_PRESENT: /* * entry already exists - free toepcb * and l2t */ printf("syncache entry present\n"); toepcb_release(toep); break; case TOE_SC_DROP: /* * The syncache has given up on this entry * either it timed out, or it was evicted * we need to explicitly release the tid */ printf("syncache entry dropped\n"); toepcb_release(toep); break; default: log(LOG_ERR, "unknown syncache event %d\n", event); break; } } static void syncache_add_accept_req(struct cpl_pass_accept_req *req, struct socket *lso, struct toepcb *toep) { struct in_conninfo inc; - struct tcpopt to; + struct toeopt toeo; struct tcphdr th; struct inpcb *inp; int mss, wsf, sack, ts; uint32_t rcv_isn = ntohl(req->rcv_isn); - bzero(&to, sizeof(struct tcpopt)); + bzero(&toeo, sizeof(struct toeopt)); inp = so_sotoinpcb(lso); /* * Fill out information for entering us into the syncache */ bzero(&inc, sizeof(inc)); inc.inc_fport = th.th_sport = req->peer_port; inc.inc_lport = th.th_dport = req->local_port; th.th_seq = req->rcv_isn; th.th_flags = TH_SYN; toep->tp_iss = toep->tp_delack_seq = toep->tp_rcv_wup = toep->tp_copied_seq = rcv_isn + 1; inc.inc_len = 0; inc.inc_faddr.s_addr = req->peer_ip; inc.inc_laddr.s_addr = req->local_ip; DPRINTF("syncache add of %d:%d %d:%d\n", ntohl(req->local_ip), ntohs(req->local_port), ntohl(req->peer_ip), ntohs(req->peer_port)); mss = req->tcp_options.mss; wsf = req->tcp_options.wsf; ts = req->tcp_options.tstamp; sack = req->tcp_options.sack; - to.to_mss = mss; - to.to_wscale = wsf; - to.to_flags = (mss ? TOF_MSS : 0) | (wsf ? TOF_SCALE : 0) | (ts ? TOF_TS : 0) | (sack ? TOF_SACKPERM : 0); - tcp_offload_syncache_add(&inc, &to, &th, inp, &lso, &cxgb_toe_usrreqs, toep); + toeo.to_mss = mss; + toeo.to_wscale = wsf; + toeo.to_flags = (mss ? TOF_MSS : 0) | (wsf ? TOF_SCALE : 0) | (ts ? TOF_TS : 0) | (sack ? TOF_SACKPERM : 0); + tcp_offload_syncache_add(&inc, &toeo, &th, inp, &lso, &cxgb_toe_usrreqs, +toep); } /* * Process a CPL_PASS_ACCEPT_REQ message. Does the part that needs the socket * lock held. Note that the sock here is a listening socket that is not owned * by the TOE. 
*/ static void process_pass_accept_req(struct socket *so, struct mbuf *m, struct toedev *tdev, struct listen_ctx *lctx) { int rt_flags; struct l2t_entry *e; struct iff_mac tim; struct mbuf *reply_mbuf, *ddp_mbuf = NULL; struct cpl_pass_accept_rpl *rpl; struct cpl_pass_accept_req *req = cplhdr(m); unsigned int tid = GET_TID(req); struct tom_data *d = TOM_DATA(tdev); struct t3cdev *cdev = d->cdev; struct tcpcb *tp = so_sototcpcb(so); struct toepcb *newtoep; struct rtentry *dst; struct sockaddr_in nam; struct t3c_data *td = T3C_DATA(cdev); reply_mbuf = m_gethdr(M_NOWAIT, MT_DATA); if (__predict_false(reply_mbuf == NULL)) { if (tdev->tod_ttid == TOE_ID_CHELSIO_T3) t3_defer_reply(m, tdev, reject_pass_request); else { cxgb_queue_tid_release(cdev, tid); m_free(m); } DPRINTF("failed to get reply_mbuf\n"); goto out; } if (tp->t_state != TCPS_LISTEN) { DPRINTF("socket not in listen state\n"); goto reject; } tim.mac_addr = req->dst_mac; tim.vlan_tag = ntohs(req->vlan_tag); if (cdev->ctl(cdev, GET_IFF_FROM_MAC, &tim) < 0 || !tim.dev) { DPRINTF("rejecting from failed GET_IFF_FROM_MAC\n"); goto reject; } #ifdef notyet /* * XXX do route lookup to confirm that we're still listening on this * address */ if (ip_route_input(skb, req->local_ip, req->peer_ip, G_PASS_OPEN_TOS(ntohl(req->tos_tid)), tim.dev)) goto reject; rt_flags = ((struct rtable *)skb->dst)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST | RTCF_LOCAL); dst_release(skb->dst); // done with the input route, release it skb->dst = NULL; if ((rt_flags & RTF_LOCAL) == 0) goto reject; #endif /* * XXX */ rt_flags = RTF_LOCAL; if ((rt_flags & RTF_LOCAL) == 0) goto reject; /* * Calculate values and add to syncache */ newtoep = toepcb_alloc(); if (newtoep == NULL) goto reject; bzero(&nam, sizeof(struct sockaddr_in)); nam.sin_len = sizeof(struct sockaddr_in); nam.sin_family = AF_INET; nam.sin_addr.s_addr =req->peer_ip; dst = rtalloc2((struct sockaddr *)&nam, 1, 0); if (dst == NULL) { printf("failed to find route\n"); goto reject; } e = newtoep->tp_l2t = t3_l2t_get(d->cdev, dst, tim.dev, (struct sockaddr *)&nam); if (e == NULL) { DPRINTF("failed to get l2t\n"); } /* * Point to our listen socket until accept */ newtoep->tp_tp = tp; newtoep->tp_flags = TP_SYN_RCVD; newtoep->tp_tid = tid; newtoep->tp_toedev = tdev; tp->rcv_wnd = select_rcv_wnd(tdev, so); cxgb_insert_tid(cdev, d->client, newtoep, tid); so_lock(so); LIST_INSERT_HEAD(&lctx->synq_head, newtoep, synq_entry); so_unlock(so); newtoep->tp_ulp_mode = TOM_TUNABLE(tdev, ddp) && !(so_options_get(so) & SO_NO_DDP) && tp->rcv_wnd >= MIN_DDP_RCV_WIN ? 
ULP_MODE_TCPDDP : 0; if (newtoep->tp_ulp_mode) { ddp_mbuf = m_gethdr(M_NOWAIT, MT_DATA); if (ddp_mbuf == NULL) newtoep->tp_ulp_mode = 0; } CTR4(KTR_TOM, "ddp=%d rcv_wnd=%ld min_win=%d ulp_mode=%d", TOM_TUNABLE(tdev, ddp), tp->rcv_wnd, MIN_DDP_RCV_WIN, newtoep->tp_ulp_mode); set_arp_failure_handler(reply_mbuf, pass_accept_rpl_arp_failure); /* * XXX workaround for lack of syncache drop */ toepcb_hold(newtoep); syncache_add_accept_req(req, so, newtoep); rpl = cplhdr(reply_mbuf); reply_mbuf->m_pkthdr.len = reply_mbuf->m_len = sizeof(*rpl); rpl->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD)); rpl->wr.wr_lo = 0; OPCODE_TID(rpl) = htonl(MK_OPCODE_TID(CPL_PASS_ACCEPT_RPL, tid)); rpl->opt2 = htonl(calc_opt2(so, tdev)); rpl->rsvd = rpl->opt2; /* workaround for HW bug */ rpl->peer_ip = req->peer_ip; // req->peer_ip is not overwritten rpl->opt0h = htonl(calc_opt0h(so, select_mss(td, NULL, dst->rt_ifp->if_mtu)) | V_L2T_IDX(e->idx) | V_TX_CHANNEL(e->smt_idx)); rpl->opt0l_status = htonl(calc_opt0l(so, newtoep->tp_ulp_mode) | CPL_PASS_OPEN_ACCEPT); DPRINTF("opt0l_status=%08x\n", rpl->opt0l_status); m_set_priority(reply_mbuf, mkprio(CPL_PRIORITY_SETUP, newtoep)); l2t_send(cdev, reply_mbuf, e); m_free(m); if (newtoep->tp_ulp_mode) { __set_tcb_field(newtoep, ddp_mbuf, W_TCB_RX_DDP_FLAGS, V_TF_DDP_OFF(1) | TP_DDP_TIMER_WORKAROUND_MASK, V_TF_DDP_OFF(1) | TP_DDP_TIMER_WORKAROUND_VAL, 1); } else DPRINTF("no DDP\n"); return; reject: if (tdev->tod_ttid == TOE_ID_CHELSIO_T3) mk_pass_accept_rpl(reply_mbuf, m); else mk_tid_release(reply_mbuf, newtoep, tid); cxgb_ofld_send(cdev, reply_mbuf); m_free(m); out: #if 0 TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS); #else return; #endif } /* * Handle a CPL_PASS_ACCEPT_REQ message. */ static int do_pass_accept_req(struct t3cdev *cdev, struct mbuf *m, void *ctx) { struct listen_ctx *listen_ctx = (struct listen_ctx *)ctx; struct socket *lso = listen_ctx->lso; /* XXX need an interlock against the listen socket going away */ struct tom_data *d = listen_ctx->tom_data; #if VALIDATE_TID struct cpl_pass_accept_req *req = cplhdr(m); unsigned int tid = GET_TID(req); struct tid_info *t = &(T3C_DATA(cdev))->tid_maps; if (unlikely(!lsk)) { printk(KERN_ERR "%s: PASS_ACCEPT_REQ had unknown STID %lu\n", cdev->name, (unsigned long)((union listen_entry *)ctx - t->stid_tab)); return CPL_RET_BUF_DONE; } if (unlikely(tid >= t->ntids)) { printk(KERN_ERR "%s: passive open TID %u too large\n", cdev->name, tid); return CPL_RET_BUF_DONE; } /* * For T3A the current user of the TID may have closed but its last * message(s) may have been backlogged so the TID appears to be still * in use. Just take the TID away, the connection can close at its * own leisure. For T3B this situation is a bug. */ if (!valid_new_tid(t, tid) && cdev->type != T3A) { printk(KERN_ERR "%s: passive open uses existing TID %u\n", cdev->name, tid); return CPL_RET_BUF_DONE; } #endif process_pass_accept_req(lso, m, &d->tdev, listen_ctx); return (0); } /* * Called when a connection is established to translate the TCP options * reported by HW to FreeBSD's native format. */ static void assign_rxopt(struct socket *so, unsigned int opt) { struct tcpcb *tp = so_sototcpcb(so); struct toepcb *toep = tp->t_toe; const struct t3c_data *td = T3C_DATA(TOEP_T3C_DEV(toep)); inp_lock_assert(tp->t_inpcb); toep->tp_mss_clamp = td->mtus[G_TCPOPT_MSS(opt)] - 40; tp->t_flags |= G_TCPOPT_TSTAMP(opt) ? TF_RCVD_TSTMP : 0; tp->t_flags |= G_TCPOPT_SACK(opt) ? TF_SACK_PERMIT : 0; tp->t_flags |= G_TCPOPT_WSCALE_OK(opt) ? 
TF_RCVD_SCALE : 0; if ((tp->t_flags & (TF_RCVD_SCALE|TF_REQ_SCALE)) == (TF_RCVD_SCALE|TF_REQ_SCALE)) tp->rcv_scale = tp->request_r_scale; } /* * Completes some final bits of initialization for just established connections * and changes their state to TCP_ESTABLISHED. * * snd_isn here is the ISN after the SYN, i.e., the true ISN + 1. */ static void make_established(struct socket *so, u32 snd_isn, unsigned int opt) { struct tcpcb *tp = so_sototcpcb(so); struct toepcb *toep = tp->t_toe; toep->tp_write_seq = tp->iss = tp->snd_max = tp->snd_nxt = tp->snd_una = snd_isn; assign_rxopt(so, opt); /* *XXXXXXXXXXX * */ #ifdef notyet so->so_proto->pr_ctloutput = t3_ctloutput; #endif #if 0 inet_sk(sk)->id = tp->write_seq ^ jiffies; #endif /* * XXX not clear what rcv_wup maps to */ /* * Causes the first RX_DATA_ACK to supply any Rx credits we couldn't * pass through opt0. */ if (tp->rcv_wnd > (M_RCV_BUFSIZ << 10)) toep->tp_rcv_wup -= tp->rcv_wnd - (M_RCV_BUFSIZ << 10); dump_toepcb(toep); #ifdef notyet /* * no clean interface for marking ARP up to date */ dst_confirm(sk->sk_dst_cache); #endif tp->t_starttime = ticks; tp->t_state = TCPS_ESTABLISHED; soisconnected(so); } static int syncache_expand_establish_req(struct cpl_pass_establish *req, struct socket **so, struct toepcb *toep) { struct in_conninfo inc; - struct tcpopt to; + struct toeopt toeo; struct tcphdr th; int mss, wsf, sack, ts; struct mbuf *m = NULL; const struct t3c_data *td = T3C_DATA(TOM_DATA(toep->tp_toedev)->cdev); unsigned int opt; #ifdef MAC #error "no MAC support" #endif opt = ntohs(req->tcp_opt); - bzero(&to, sizeof(struct tcpopt)); + bzero(&toeo, sizeof(struct toeopt)); /* * Fill out information for entering us into the syncache */ bzero(&inc, sizeof(inc)); inc.inc_fport = th.th_sport = req->peer_port; inc.inc_lport = th.th_dport = req->local_port; th.th_seq = req->rcv_isn; th.th_flags = TH_ACK; inc.inc_len = 0; inc.inc_faddr.s_addr = req->peer_ip; inc.inc_laddr.s_addr = req->local_ip; mss = td->mtus[G_TCPOPT_MSS(opt)] - 40; wsf = G_TCPOPT_WSCALE_OK(opt); ts = G_TCPOPT_TSTAMP(opt); sack = G_TCPOPT_SACK(opt); - to.to_mss = mss; - to.to_wscale = G_TCPOPT_SND_WSCALE(opt); - to.to_flags = (mss ? TOF_MSS : 0) | (wsf ? TOF_SCALE : 0) | (ts ? TOF_TS : 0) | (sack ? TOF_SACKPERM : 0); + toeo.to_mss = mss; + toeo.to_wscale = G_TCPOPT_SND_WSCALE(opt); + toeo.to_flags = (mss ? TOF_MSS : 0) | (wsf ? TOF_SCALE : 0) | (ts ? TOF_TS : 0) | (sack ? TOF_SACKPERM : 0); DPRINTF("syncache expand of %d:%d %d:%d mss:%d wsf:%d ts:%d sack:%d\n", ntohl(req->local_ip), ntohs(req->local_port), ntohl(req->peer_ip), ntohs(req->peer_port), mss, wsf, ts, sack); - return tcp_offload_syncache_expand(&inc, &to, &th, so, m); + return tcp_offload_syncache_expand(&inc, &toeo, &th, so, m); } /* * Process a CPL_PASS_ESTABLISH message.
XXX a lot of the locking doesn't work * if we are in TCP_SYN_RECV due to crossed SYNs */ static int do_pass_establish(struct t3cdev *cdev, struct mbuf *m, void *ctx) { struct cpl_pass_establish *req = cplhdr(m); struct toepcb *toep = (struct toepcb *)ctx; struct tcpcb *tp = toep->tp_tp; struct socket *so, *lso; struct t3c_data *td = T3C_DATA(cdev); struct sockbuf *snd, *rcv; // Complete socket initialization now that we have the SND_ISN struct toedev *tdev; tdev = toep->tp_toedev; inp_wlock(tp->t_inpcb); /* * * XXX need to add reference while we're manipulating */ so = lso = inp_inpcbtosocket(tp->t_inpcb); inp_wunlock(tp->t_inpcb); so_lock(so); LIST_REMOVE(toep, synq_entry); so_unlock(so); if (!syncache_expand_establish_req(req, &so, toep)) { /* * No entry */ CXGB_UNIMPLEMENTED(); } if (so == NULL) { /* * Couldn't create the socket */ CXGB_UNIMPLEMENTED(); } tp = so_sototcpcb(so); inp_wlock(tp->t_inpcb); snd = so_sockbuf_snd(so); rcv = so_sockbuf_rcv(so); snd->sb_flags |= SB_NOCOALESCE; rcv->sb_flags |= SB_NOCOALESCE; toep->tp_tp = tp; toep->tp_flags = 0; tp->t_toe = toep; reset_wr_list(toep); tp->rcv_wnd = select_rcv_wnd(tdev, so); tp->rcv_nxt = toep->tp_copied_seq; install_offload_ops(so); toep->tp_wr_max = toep->tp_wr_avail = TOM_TUNABLE(tdev, max_wrs); toep->tp_wr_unacked = 0; toep->tp_qset = G_QNUM(ntohl(m->m_pkthdr.csum_data)); toep->tp_qset_idx = 0; toep->tp_mtu_idx = select_mss(td, tp, toep->tp_l2t->neigh->rt_ifp->if_mtu); /* * XXX Cancel any keep alive timer */ make_established(so, ntohl(req->snd_isn), ntohs(req->tcp_opt)); /* * XXX workaround for lack of syncache drop */ toepcb_release(toep); inp_wunlock(tp->t_inpcb); CTR1(KTR_TOM, "do_pass_establish tid=%u", toep->tp_tid); cxgb_log_tcb(cdev->adapter, toep->tp_tid); #ifdef notyet /* * XXX not sure how these checks map to us */ if (unlikely(sk->sk_socket)) { // simultaneous opens only sk->sk_state_change(sk); sk_wake_async(so, 0, POLL_OUT); } /* * The state for the new connection is now up to date. * Next check if we should add the connection to the parent's * accept queue. When the parent closes it resets connections * on its SYN queue, so check if we are being reset. If so we * don't need to do anything more, the coming ABORT_RPL will * destroy this socket. Otherwise move the connection to the * accept queue. * * Note that we reset the synq before closing the server so if * we are not being reset the stid is still open. */ if (unlikely(!tp->forward_skb_hint)) { // removed from synq __kfree_skb(skb); goto unlock; } #endif m_free(m); return (0); } /* * Fill in the right TID for CPL messages waiting in the out-of-order queue * and send them to the TOE. */ static void fixup_and_send_ofo(struct toepcb *toep) { struct mbuf *m; struct toedev *tdev = toep->tp_toedev; struct tcpcb *tp = toep->tp_tp; unsigned int tid = toep->tp_tid; log(LOG_NOTICE, "fixup_and_send_ofo\n"); inp_lock_assert(tp->t_inpcb); while ((m = mbufq_dequeue(&toep->out_of_order_queue)) != NULL) { /* * A variety of messages can be waiting but the fields we'll * be touching are common to all so any message type will do. */ struct cpl_close_con_req *p = cplhdr(m); p->wr.wr_lo = htonl(V_WR_TID(tid)); OPCODE_TID(p) = htonl(MK_OPCODE_TID(p->ot.opcode, tid)); cxgb_ofld_send(TOM_DATA(tdev)->cdev, m); } } /* * Updates socket state from an active establish CPL message. Runs with the * socket lock held. 
*/ static void socket_act_establish(struct socket *so, struct mbuf *m) { INIT_VNET_INET(so->so_vnet); struct cpl_act_establish *req = cplhdr(m); u32 rcv_isn = ntohl(req->rcv_isn); /* real RCV_ISN + 1 */ struct tcpcb *tp = so_sototcpcb(so); struct toepcb *toep = tp->t_toe; if (__predict_false(tp->t_state != TCPS_SYN_SENT)) log(LOG_ERR, "TID %u expected SYN_SENT, found %d\n", toep->tp_tid, tp->t_state); tp->ts_recent_age = ticks; tp->irs = tp->rcv_wnd = tp->rcv_nxt = rcv_isn; toep->tp_delack_seq = toep->tp_rcv_wup = toep->tp_copied_seq = tp->irs; make_established(so, ntohl(req->snd_isn), ntohs(req->tcp_opt)); /* * Now that we finally have a TID send any CPL messages that we had to * defer for lack of a TID. */ if (mbufq_len(&toep->out_of_order_queue)) fixup_and_send_ofo(toep); if (__predict_false(so_state_get(so) & SS_NOFDREF)) { /* * XXX does this even make sense? */ so_sorwakeup(so); } m_free(m); #ifdef notyet /* * XXX assume no write requests permitted while socket connection is * incomplete */ /* * Currently the send queue must be empty at this point because the * socket layer does not send anything before a connection is * established. To be future proof though we handle the possibility * that there are pending buffers to send (either TX_DATA or * CLOSE_CON_REQ). First we need to adjust the sequence number of the * buffers according to the just learned write_seq, and then we send * them on their way. */ fixup_pending_writeq_buffers(sk); if (t3_push_frames(so, 1)) sk->sk_write_space(sk); #endif toep->tp_state = tp->t_state; TCPSTAT_INC(tcps_connects); } /* * Process a CPL_ACT_ESTABLISH message. */ static int do_act_establish(struct t3cdev *cdev, struct mbuf *m, void *ctx) { struct cpl_act_establish *req = cplhdr(m); unsigned int tid = GET_TID(req); unsigned int atid = G_PASS_OPEN_TID(ntohl(req->tos_tid)); struct toepcb *toep = (struct toepcb *)ctx; struct tcpcb *tp = toep->tp_tp; struct socket *so; struct toedev *tdev; struct tom_data *d; if (tp == NULL) { free_atid(cdev, atid); return (0); } inp_wlock(tp->t_inpcb); /* * XXX */ so = inp_inpcbtosocket(tp->t_inpcb); tdev = toep->tp_toedev; /* blow up here if link was down */ d = TOM_DATA(tdev); /* * It's OK if the TID is currently in use, the owning socket may have * backlogged its last CPL message(s). Just take it away. */ toep->tp_tid = tid; toep->tp_tp = tp; so_insert_tid(d, toep, tid); free_atid(cdev, atid); toep->tp_qset = G_QNUM(ntohl(m->m_pkthdr.csum_data)); socket_act_establish(so, m); inp_wunlock(tp->t_inpcb); CTR1(KTR_TOM, "do_act_establish tid=%u", toep->tp_tid); cxgb_log_tcb(cdev->adapter, toep->tp_tid); return (0); } /* * Process an acknowledgment of WR completion. Advance snd_una and send the * next batch of work requests from the write queue. 
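 * * An illustrative walk-through of the credit accounting below (the numbers * are invented): with WRs of 2, 3 and 1 credits pending and a WR_ACK * carrying 4 credits, the first WR (2 credits) is dequeued and its bytes * dropped from the send buffer; the second WR still needs 3 credits but * only 2 remain, so its outstanding count drops to 1 and the loop stops.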
*/ static void wr_ack(struct toepcb *toep, struct mbuf *m) { struct tcpcb *tp = toep->tp_tp; struct cpl_wr_ack *hdr = cplhdr(m); struct socket *so; unsigned int credits = ntohs(hdr->credits); u32 snd_una = ntohl(hdr->snd_una); int bytes = 0; struct sockbuf *snd; CTR2(KTR_SPARE2, "wr_ack: snd_una=%u credits=%d", snd_una, credits); inp_wlock(tp->t_inpcb); so = inp_inpcbtosocket(tp->t_inpcb); toep->tp_wr_avail += credits; if (toep->tp_wr_unacked > toep->tp_wr_max - toep->tp_wr_avail) toep->tp_wr_unacked = toep->tp_wr_max - toep->tp_wr_avail; while (credits) { struct mbuf *p = peek_wr(toep); if (__predict_false(!p)) { log(LOG_ERR, "%u WR_ACK credits for TID %u with " "nothing pending, state %u wr_avail=%u\n", credits, toep->tp_tid, tp->t_state, toep->tp_wr_avail); break; } CTR2(KTR_TOM, "wr_ack: p->credits=%d p->bytes=%d", p->m_pkthdr.csum_data, p->m_pkthdr.len); KASSERT(p->m_pkthdr.csum_data != 0, ("empty request still on list")); if (__predict_false(credits < p->m_pkthdr.csum_data)) { #if DEBUG_WR > 1 struct tx_data_wr *w = cplhdr(p); log(LOG_ERR, "TID %u got %u WR credits, need %u, len %u, " "main body %u, frags %u, seq # %u, ACK una %u," " ACK nxt %u, WR_AVAIL %u, WRs pending %u\n", toep->tp_tid, credits, p->csum, p->len, p->len - p->data_len, skb_shinfo(p)->nr_frags, ntohl(w->sndseq), snd_una, ntohl(hdr->snd_nxt), toep->tp_wr_avail, count_pending_wrs(tp) - credits); #endif p->m_pkthdr.csum_data -= credits; break; } else { dequeue_wr(toep); credits -= p->m_pkthdr.csum_data; bytes += p->m_pkthdr.len; CTR3(KTR_TOM, "wr_ack: done with wr of %d bytes remain credits=%d wr credits=%d", p->m_pkthdr.len, credits, p->m_pkthdr.csum_data); m_free(p); } } #if DEBUG_WR check_wr_invariants(tp); #endif if (__predict_false(SEQ_LT(snd_una, tp->snd_una))) { #if VALIDATE_SEQ struct tom_data *d = TOM_DATA(TOE_DEV(so)); log(LOG_ERR, "%s: unexpected sequence # %u in WR_ACK " "for TID %u, snd_una %u\n", (&d->tdev)->name, snd_una, toep->tp_tid, tp->snd_una); #endif goto out_free; } if (tp->snd_una != snd_una) { tp->snd_una = snd_una; tp->ts_recent_age = ticks; #ifdef notyet /* * Keep ARP entry "minty fresh" */ dst_confirm(sk->sk_dst_cache); #endif if (tp->snd_una == tp->snd_nxt) toep->tp_flags &= ~TP_TX_WAIT_IDLE; } snd = so_sockbuf_snd(so); if (bytes) { CTR1(KTR_SPARE2, "wr_ack: sbdrop(%d)", bytes); sockbuf_lock(snd); sbdrop_locked(snd, bytes); so_sowwakeup_locked(so); } if (snd->sb_sndptroff < snd->sb_cc) t3_push_frames(so, 0); out_free: inp_wunlock(tp->t_inpcb); m_free(m); } /* * Handler for TX_DATA_ACK CPL messages. */ static int do_wr_ack(struct t3cdev *dev, struct mbuf *m, void *ctx) { struct toepcb *toep = (struct toepcb *)ctx; VALIDATE_SOCK(so); wr_ack(toep, m); return 0; } /* * Handler for TRACE_PKT CPL messages. Just sink these packets. */ static int do_trace_pkt(struct t3cdev *dev, struct mbuf *m, void *ctx) { m_freem(m); return 0; } /* * Reset a connection that is on a listener's SYN queue or accept queue, * i.e., one that has not had a struct socket associated with it. * Must be called from process context. * * Modeled after code in inet_csk_listen_stop().
*/ static void t3_reset_listen_child(struct socket *child) { struct tcpcb *tp = so_sototcpcb(child); t3_send_reset(tp->t_toe); } static void t3_child_disconnect(struct socket *so, void *arg) { struct tcpcb *tp = so_sototcpcb(so); if (tp->t_flags & TF_TOE) { inp_wlock(tp->t_inpcb); t3_reset_listen_child(so); inp_wunlock(tp->t_inpcb); } } /* * Disconnect offloaded established but not yet accepted connections sitting * on a server's accept_queue. We just send an ABORT_REQ at this point and * finish off the disconnect later as we may need to wait for the ABORT_RPL. */ void t3_disconnect_acceptq(struct socket *listen_so) { so_lock(listen_so); so_listeners_apply_all(listen_so, t3_child_disconnect, NULL); so_unlock(listen_so); } /* * Reset offloaded connections sitting on a server's syn queue. As above * we send ABORT_REQ and finish off when we get ABORT_RPL. */ void t3_reset_synq(struct listen_ctx *lctx) { struct toepcb *toep; so_lock(lctx->lso); while (!LIST_EMPTY(&lctx->synq_head)) { toep = LIST_FIRST(&lctx->synq_head); LIST_REMOVE(toep, synq_entry); toep->tp_tp = NULL; t3_send_reset(toep); cxgb_remove_tid(TOEP_T3C_DEV(toep), toep, toep->tp_tid); toepcb_release(toep); } so_unlock(lctx->lso); } int t3_setup_ppods(struct toepcb *toep, const struct ddp_gather_list *gl, unsigned int nppods, unsigned int tag, unsigned int maxoff, unsigned int pg_off, unsigned int color) { unsigned int i, j, pidx; struct pagepod *p; struct mbuf *m; struct ulp_mem_io *req; unsigned int tid = toep->tp_tid; const struct tom_data *td = TOM_DATA(toep->tp_toedev); unsigned int ppod_addr = tag * PPOD_SIZE + td->ddp_llimit; CTR6(KTR_TOM, "t3_setup_ppods(gl=%p nppods=%u tag=%u maxoff=%u pg_off=%u color=%u)", gl, nppods, tag, maxoff, pg_off, color); for (i = 0; i < nppods; ++i) { m = m_gethdr_nofail(sizeof(*req) + PPOD_SIZE); m_set_priority(m, mkprio(CPL_PRIORITY_CONTROL, toep)); req = mtod(m, struct ulp_mem_io *); m->m_pkthdr.len = m->m_len = sizeof(*req) + PPOD_SIZE; req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_BYPASS)); req->wr.wr_lo = 0; req->cmd_lock_addr = htonl(V_ULP_MEMIO_ADDR(ppod_addr >> 5) | V_ULPTX_CMD(ULP_MEM_WRITE)); req->len = htonl(V_ULP_MEMIO_DATA_LEN(PPOD_SIZE / 32) | V_ULPTX_NFLITS(PPOD_SIZE / 8 + 1)); p = (struct pagepod *)(req + 1); if (__predict_false(i < nppods - NUM_SENTINEL_PPODS)) { p->pp_vld_tid = htonl(F_PPOD_VALID | V_PPOD_TID(tid)); p->pp_pgsz_tag_color = htonl(V_PPOD_TAG(tag) | V_PPOD_COLOR(color)); p->pp_max_offset = htonl(maxoff); p->pp_page_offset = htonl(pg_off); p->pp_rsvd = 0; for (pidx = 4 * i, j = 0; j < 5; ++j, ++pidx) p->pp_addr[j] = pidx < gl->dgl_nelem ? htobe64(VM_PAGE_TO_PHYS(gl->dgl_pages[pidx])) : 0; } else p->pp_vld_tid = 0; /* mark sentinel page pods invalid */ send_or_defer(toep, m, 0); ppod_addr += PPOD_SIZE; } return (0); } /* * Build a CPL_BARRIER message as payload of a ULP_TX_PKT command. */ static inline void mk_cpl_barrier_ulp(struct cpl_barrier *b) { struct ulp_txpkt *txpkt = (struct ulp_txpkt *)b; txpkt->cmd_dest = htonl(V_ULPTX_CMD(ULP_TXPKT)); txpkt->len = htonl(V_ULPTX_NFLITS(sizeof(*b) / 8)); b->opcode = CPL_BARRIER; } /* * Build a CPL_GET_TCB message as payload of a ULP_TX_PKT command. 
*/ static inline void mk_get_tcb_ulp(struct cpl_get_tcb *req, unsigned int tid, unsigned int cpuno) { struct ulp_txpkt *txpkt = (struct ulp_txpkt *)req; txpkt->cmd_dest = htonl(V_ULPTX_CMD(ULP_TXPKT)); txpkt->len = htonl(V_ULPTX_NFLITS(sizeof(*req) / 8)); OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_GET_TCB, tid)); req->cpuno = htons(cpuno); } /* * Build a CPL_SET_TCB_FIELD message as payload of a ULP_TX_PKT command. */ static inline void mk_set_tcb_field_ulp(struct cpl_set_tcb_field *req, unsigned int tid, unsigned int word, uint64_t mask, uint64_t val) { struct ulp_txpkt *txpkt = (struct ulp_txpkt *)req; CTR4(KTR_TCB, "mk_set_tcb_field_ulp(tid=%u word=0x%x mask=%jx val=%jx", tid, word, mask, val); txpkt->cmd_dest = htonl(V_ULPTX_CMD(ULP_TXPKT)); txpkt->len = htonl(V_ULPTX_NFLITS(sizeof(*req) / 8)); OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_SET_TCB_FIELD, tid)); req->reply = V_NO_REPLY(1); req->cpu_idx = 0; req->word = htons(word); req->mask = htobe64(mask); req->val = htobe64(val); } /* * Build a CPL_RX_DATA_ACK message as payload of a ULP_TX_PKT command. */ static void mk_rx_data_ack_ulp(struct toepcb *toep, struct cpl_rx_data_ack *ack, unsigned int tid, unsigned int credits) { struct ulp_txpkt *txpkt = (struct ulp_txpkt *)ack; txpkt->cmd_dest = htonl(V_ULPTX_CMD(ULP_TXPKT)); txpkt->len = htonl(V_ULPTX_NFLITS(sizeof(*ack) / 8)); OPCODE_TID(ack) = htonl(MK_OPCODE_TID(CPL_RX_DATA_ACK, tid)); ack->credit_dack = htonl(F_RX_MODULATE | F_RX_DACK_CHANGE | V_RX_DACK_MODE(TOM_TUNABLE(toep->tp_toedev, delack)) | V_RX_CREDITS(credits)); } void t3_cancel_ddpbuf(struct toepcb *toep, unsigned int bufidx) { unsigned int wrlen; struct mbuf *m; struct work_request_hdr *wr; struct cpl_barrier *lock; struct cpl_set_tcb_field *req; struct cpl_get_tcb *getreq; struct ddp_state *p = &toep->tp_ddp_state; #if 0 SOCKBUF_LOCK_ASSERT(&toeptoso(toep)->so_rcv); #endif wrlen = sizeof(*wr) + sizeof(*req) + 2 * sizeof(*lock) + sizeof(*getreq); m = m_gethdr_nofail(wrlen); m_set_priority(m, mkprio(CPL_PRIORITY_CONTROL, toep)); wr = mtod(m, struct work_request_hdr *); bzero(wr, wrlen); wr->wr_hi = htonl(V_WR_OP(FW_WROPCODE_BYPASS)); m->m_pkthdr.len = m->m_len = wrlen; lock = (struct cpl_barrier *)(wr + 1); mk_cpl_barrier_ulp(lock); req = (struct cpl_set_tcb_field *)(lock + 1); CTR1(KTR_TCB, "t3_cancel_ddpbuf(bufidx=%u)", bufidx); /* Hmm, not sure if this is actually a good thing: reactivating * the other buffer might be an issue if it has been completed * already. However, that is unlikely, since the fact that the UBUF * is not completed indicates that there is no outstanding data.
*/ if (bufidx == 0) mk_set_tcb_field_ulp(req, toep->tp_tid, W_TCB_RX_DDP_FLAGS, V_TF_DDP_ACTIVE_BUF(1) | V_TF_DDP_BUF0_VALID(1), V_TF_DDP_ACTIVE_BUF(1)); else mk_set_tcb_field_ulp(req, toep->tp_tid, W_TCB_RX_DDP_FLAGS, V_TF_DDP_ACTIVE_BUF(1) | V_TF_DDP_BUF1_VALID(1), 0); getreq = (struct cpl_get_tcb *)(req + 1); mk_get_tcb_ulp(getreq, toep->tp_tid, toep->tp_qset); mk_cpl_barrier_ulp((struct cpl_barrier *)(getreq + 1)); /* Keep track of the number of outstanding CPL_GET_TCB requests */ p->get_tcb_count++; #ifdef T3_TRACE T3_TRACE1(TIDTB(so), "t3_cancel_ddpbuf: bufidx %u", bufidx); #endif cxgb_ofld_send(TOEP_T3C_DEV(toep), m); } /** * t3_overlay_ddpbuf - overlay an existing DDP buffer with a new one * @toep: the toepcb associated with the buffers * @bufidx: index of HW DDP buffer (0 or 1) * @tag0: new tag for HW buffer 0 * @tag1: new tag for HW buffer 1 * @len: new length for HW buf @bufidx * * Sends a compound WR to overlay a new DDP buffer on top of an existing * buffer by changing the buffer tag and length and setting the valid and * active flag accordingly. The caller must ensure the new buffer is at * least as big as the existing one. Since we typically reprogram both HW * buffers this function sets both tags for convenience. Read the TCB to * determine how much data was written into the buffer before the overlay * took place. */ void t3_overlay_ddpbuf(struct toepcb *toep, unsigned int bufidx, unsigned int tag0, unsigned int tag1, unsigned int len) { unsigned int wrlen; struct mbuf *m; struct work_request_hdr *wr; struct cpl_get_tcb *getreq; struct cpl_set_tcb_field *req; struct ddp_state *p = &toep->tp_ddp_state; CTR4(KTR_TCB, "t3_overlay_ddpbuf(bufidx=%u tag0=%u tag1=%u len=%u)", bufidx, tag0, tag1, len); #if 0 SOCKBUF_LOCK_ASSERT(&toeptoso(toep)->so_rcv); #endif wrlen = sizeof(*wr) + 3 * sizeof(*req) + sizeof(*getreq); m = m_gethdr_nofail(wrlen); m_set_priority(m, mkprio(CPL_PRIORITY_CONTROL, toep)); wr = mtod(m, struct work_request_hdr *); m->m_pkthdr.len = m->m_len = wrlen; bzero(wr, wrlen); /* Set the ATOMIC flag to make sure that TP processes the following * CPLs in an atomic manner and no wire segments can be interleaved.
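 * * For reference, the compound WR assembled below is laid out as follows * (an illustrative summary matching the wrlen computation above): * *	work_request_hdr	(FW_WROPCODE_BYPASS | F_WR_ATOMIC) *	cpl_set_tcb_field	- both DDP buffer tags *	cpl_set_tcb_field	- new length for buffer @bufidx *	cpl_set_tcb_field	- DDP valid/active flag bits *	cpl_get_tcb		- reads the TCB back to learn how much data was placed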
*/ wr->wr_hi = htonl(V_WR_OP(FW_WROPCODE_BYPASS) | F_WR_ATOMIC); req = (struct cpl_set_tcb_field *)(wr + 1); mk_set_tcb_field_ulp(req, toep->tp_tid, W_TCB_RX_DDP_BUF0_TAG, V_TCB_RX_DDP_BUF0_TAG(M_TCB_RX_DDP_BUF0_TAG) | V_TCB_RX_DDP_BUF1_TAG(M_TCB_RX_DDP_BUF1_TAG) << 32, V_TCB_RX_DDP_BUF0_TAG(tag0) | V_TCB_RX_DDP_BUF1_TAG((uint64_t)tag1) << 32); req++; if (bufidx == 0) { mk_set_tcb_field_ulp(req, toep->tp_tid, W_TCB_RX_DDP_BUF0_LEN, V_TCB_RX_DDP_BUF0_LEN(M_TCB_RX_DDP_BUF0_LEN), V_TCB_RX_DDP_BUF0_LEN((uint64_t)len)); req++; mk_set_tcb_field_ulp(req, toep->tp_tid, W_TCB_RX_DDP_FLAGS, V_TF_DDP_PUSH_DISABLE_0(1) | V_TF_DDP_BUF0_VALID(1) | V_TF_DDP_ACTIVE_BUF(1), V_TF_DDP_PUSH_DISABLE_0(0) | V_TF_DDP_BUF0_VALID(1)); } else { mk_set_tcb_field_ulp(req, toep->tp_tid, W_TCB_RX_DDP_BUF1_LEN, V_TCB_RX_DDP_BUF1_LEN(M_TCB_RX_DDP_BUF1_LEN), V_TCB_RX_DDP_BUF1_LEN((uint64_t)len)); req++; mk_set_tcb_field_ulp(req, toep->tp_tid, W_TCB_RX_DDP_FLAGS, V_TF_DDP_PUSH_DISABLE_1(1) | V_TF_DDP_BUF1_VALID(1) | V_TF_DDP_ACTIVE_BUF(1), V_TF_DDP_PUSH_DISABLE_1(0) | V_TF_DDP_BUF1_VALID(1) | V_TF_DDP_ACTIVE_BUF(1)); } getreq = (struct cpl_get_tcb *)(req + 1); mk_get_tcb_ulp(getreq, toep->tp_tid, toep->tp_qset); /* Keep track of the number of outstanding CPL_GET_TCB requests */ p->get_tcb_count++; #ifdef T3_TRACE T3_TRACE4(TIDTB(sk), "t3_overlay_ddpbuf: bufidx %u tag0 %u tag1 %u " "len %d", bufidx, tag0, tag1, len); #endif cxgb_ofld_send(TOEP_T3C_DEV(toep), m); } /* * Sends a compound WR containing all the CPL messages needed to program the * two HW DDP buffers, namely optionally setting up the length and offset of * each buffer, programming the DDP flags, and optionally sending RX_DATA_ACK. */ void t3_setup_ddpbufs(struct toepcb *toep, unsigned int len0, unsigned int offset0, unsigned int len1, unsigned int offset1, uint64_t ddp_flags, uint64_t flag_mask, int modulate) { unsigned int wrlen; struct mbuf *m; struct work_request_hdr *wr; struct cpl_set_tcb_field *req; CTR6(KTR_TCB, "t3_setup_ddpbufs(len0=%u offset0=%u len1=%u offset1=%u ddp_flags=0x%08x%08x ", len0, offset0, len1, offset1, ddp_flags >> 32, ddp_flags & 0xffffffff); #if 0 SOCKBUF_LOCK_ASSERT(&toeptoso(toep)->so_rcv); #endif wrlen = sizeof(*wr) + sizeof(*req) + (len0 ? sizeof(*req) : 0) + (len1 ? sizeof(*req) : 0) + (modulate ?
sizeof(struct cpl_rx_data_ack) : 0); m = m_gethdr_nofail(wrlen); m_set_priority(m, mkprio(CPL_PRIORITY_CONTROL, toep)); wr = mtod(m, struct work_request_hdr *); bzero(wr, wrlen); wr->wr_hi = htonl(V_WR_OP(FW_WROPCODE_BYPASS)); m->m_pkthdr.len = m->m_len = wrlen; req = (struct cpl_set_tcb_field *)(wr + 1); if (len0) { /* program buffer 0 offset and length */ mk_set_tcb_field_ulp(req, toep->tp_tid, W_TCB_RX_DDP_BUF0_OFFSET, V_TCB_RX_DDP_BUF0_OFFSET(M_TCB_RX_DDP_BUF0_OFFSET) | V_TCB_RX_DDP_BUF0_LEN(M_TCB_RX_DDP_BUF0_LEN), V_TCB_RX_DDP_BUF0_OFFSET((uint64_t)offset0) | V_TCB_RX_DDP_BUF0_LEN((uint64_t)len0)); req++; } if (len1) { /* program buffer 1 offset and length */ mk_set_tcb_field_ulp(req, toep->tp_tid, W_TCB_RX_DDP_BUF1_OFFSET, V_TCB_RX_DDP_BUF1_OFFSET(M_TCB_RX_DDP_BUF1_OFFSET) | V_TCB_RX_DDP_BUF1_LEN(M_TCB_RX_DDP_BUF1_LEN) << 32, V_TCB_RX_DDP_BUF1_OFFSET((uint64_t)offset1) | V_TCB_RX_DDP_BUF1_LEN((uint64_t)len1) << 32); req++; } mk_set_tcb_field_ulp(req, toep->tp_tid, W_TCB_RX_DDP_FLAGS, flag_mask, ddp_flags); if (modulate) { mk_rx_data_ack_ulp(toep, (struct cpl_rx_data_ack *)(req + 1), toep->tp_tid, toep->tp_copied_seq - toep->tp_rcv_wup); toep->tp_rcv_wup = toep->tp_copied_seq; } #ifdef T3_TRACE T3_TRACE5(TIDTB(sk), "t3_setup_ddpbufs: len0 %u len1 %u ddp_flags 0x%08x%08x " "modulate %d", len0, len1, ddp_flags >> 32, ddp_flags & 0xffffffff, modulate); #endif cxgb_ofld_send(TOEP_T3C_DEV(toep), m); } void t3_init_wr_tab(unsigned int wr_len) { int i; if (mbuf_wrs[1]) /* already initialized */ return; for (i = 1; i < ARRAY_SIZE(mbuf_wrs); i++) { int sgl_len = (3 * i) / 2 + (i & 1); sgl_len += 3; mbuf_wrs[i] = sgl_len <= wr_len ? 1 : 1 + (sgl_len - 2) / (wr_len - 1); } wrlen = wr_len * 8; } int t3_init_cpl_io(void) { #ifdef notyet tcphdr_skb = alloc_skb(sizeof(struct tcphdr), GFP_KERNEL); if (!tcphdr_skb) { log(LOG_ERR, "Chelsio TCP offload: can't allocate sk_buff\n"); return -1; } skb_put(tcphdr_skb, sizeof(struct tcphdr)); tcphdr_skb->h.raw = tcphdr_skb->data; memset(tcphdr_skb->data, 0, tcphdr_skb->len); #endif t3tom_register_cpl_handler(CPL_ACT_ESTABLISH, do_act_establish); t3tom_register_cpl_handler(CPL_ACT_OPEN_RPL, do_act_open_rpl); t3tom_register_cpl_handler(CPL_TX_DMA_ACK, do_wr_ack); t3tom_register_cpl_handler(CPL_RX_DATA, do_rx_data); t3tom_register_cpl_handler(CPL_CLOSE_CON_RPL, do_close_con_rpl); t3tom_register_cpl_handler(CPL_PEER_CLOSE, do_peer_close); t3tom_register_cpl_handler(CPL_PASS_ESTABLISH, do_pass_establish); t3tom_register_cpl_handler(CPL_PASS_ACCEPT_REQ, do_pass_accept_req); t3tom_register_cpl_handler(CPL_ABORT_REQ_RSS, do_abort_req); t3tom_register_cpl_handler(CPL_ABORT_RPL_RSS, do_abort_rpl); t3tom_register_cpl_handler(CPL_RX_DATA_DDP, do_rx_data_ddp); t3tom_register_cpl_handler(CPL_RX_DDP_COMPLETE, do_rx_ddp_complete); t3tom_register_cpl_handler(CPL_RX_URG_NOTIFY, do_rx_urg_notify); t3tom_register_cpl_handler(CPL_TRACE_PKT, do_trace_pkt); t3tom_register_cpl_handler(CPL_GET_TCB_RPL, do_get_tcb_rpl); return (0); } Index: head/sys/netinet/tcp_offload.h =================================================================== --- head/sys/netinet/tcp_offload.h (revision 195653) +++ head/sys/netinet/tcp_offload.h (revision 195654) @@ -1,341 +1,354 @@ /*- * Copyright (c) 2007, Chelsio Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * 1. 
Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * 2. Neither the name of the Chelsio Corporation nor the names of its * contributors may be used to endorse or promote products derived from * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _NETINET_TCP_OFFLOAD_H_ #define _NETINET_TCP_OFFLOAD_H_ #ifndef _KERNEL #error "no user-serviceable parts inside" #endif /* * A driver publishes that it provides offload services * by setting IFCAP_TOE in the ifnet. The offload connect * will bypass any further work if the interface that a * connection would use does not support TCP offload. * * The TOE API assumes that the tcp offload engine can offload * the entire connection from set up to teardown, with some provision * being made to allow the software stack to handle time wait. If * the device does not meet these criteria, it is the driver's responsibility * to overload the functions that it needs to in tcp_usrreqs and make * its own calls to tcp_output if it needs to do so. * * There is currently no provision for the device advertising the congestion * control algorithms it supports as there is no API for querying * an operating system for the protocols that it has loaded. This is a desirable * future extension. * * * * It is assumed that individuals deploying TOE will want connections * to be offloaded without software changes so all connections on an * interface providing TOE are offloaded unless the SO_NO_OFFLOAD * flag is set on the socket. * * * The toe_usrreqs structure constitutes the TOE driver's * interface to the TCP stack for functionality that doesn't * interact directly with userspace. If one wants to provide * (optional) functionality to do zero-copy to/from * userspace, one still needs to override soreceive/sosend * with functions that fault in and pin the user buffers. * * + tu_send * - tells the driver that new data may have been added to the * socket's send buffer - the driver should not fail if the * buffer is in fact unchanged * - the driver is responsible for providing credits (bytes in the send window) * back to the socket by calling sbdrop() as segments are acknowledged. * - The driver expects the inpcb lock to be held - the driver is expected * not to drop the lock. Hence the driver is not allowed to acquire the * pcbinfo lock during this call.
* * + tu_rcvd * - returns credits to the driver and triggers window updates * to the peer (a credit as used here is a byte in the peer's receive window) * - the driver is expected to determine how many bytes have been * consumed and credit that back to the card so that it can grow * the window again by maintaining its own state between invocations. * - In principle this could be used to shrink the window as well as * grow the window, although it is not used for that now. * - this function needs to correctly handle being called any number of * times without any bytes being consumed from the receive buffer. * - The driver expects the inpcb lock to be held - the driver is expected * not to drop the lock. Hence the driver is not allowed to acquire the * pcbinfo lock during this call. * * + tu_disconnect * - tells the driver to send FIN to peer * - driver is expected to send the remaining data and then do a clean half close * - disconnect implies at least half-close so only send, reset, and detach * are legal * - the driver is expected to handle transition through the shutdown * state machine and allow the stack to support SO_LINGER. * - The driver expects the inpcb lock to be held - the driver is expected * not to drop the lock. Hence the driver is not allowed to acquire the * pcbinfo lock during this call. * * + tu_reset * - closes the connection and sends a RST to peer * - driver is expected to trigger an RST and detach the toepcb * - no further calls are legal after reset * - The driver expects the inpcb lock to be held - the driver is expected * not to drop the lock. Hence the driver is not allowed to acquire the * pcbinfo lock during this call. * * The following fields in the tcpcb are expected to be referenced by the driver: * + iss * + rcv_nxt * + rcv_wnd * + snd_isn * + snd_max * + snd_nxt * + snd_una * + t_flags * + t_inpcb * + t_maxseg * + t_toe * * The following fields in the inpcb are expected to be referenced by the driver: * + inp_lport * + inp_fport * + inp_laddr * + inp_faddr * + inp_socket * + inp_ip_tos * * The following fields in the socket are expected to be referenced by the * driver: * + so_comp * + so_error * + so_linger * + so_options * + so_rcv * + so_snd * + so_state * + so_timeo * * These functions all return 0 on success and can return the following errors * as appropriate: * + EPERM: * + ENOBUFS: memory allocation failed * + EMSGSIZE: MTU changed during the call * + EHOSTDOWN: * + EHOSTUNREACH: * + ENETDOWN: * + ENETUNREACH: the peer is no longer reachable * * + tu_detach * - tells driver that the socket is going away so disconnect * the toepcb and free appropriate resources * - allows the driver to cleanly handle the case of connection state * outliving the socket * - no further calls are legal after detach * - the driver is expected to provide its own synchronization between * detach and receiving new data. * * + tu_syncache_event * - even if it is not actually needed, the driver is expected to * call syncache_add for the initial SYN and then syncache_expand * for the SYN,ACK * - tells driver that a connection either has not been added or has * been dropped from the syncache * - the driver is expected to maintain state that lives outside the * software stack so the syncache needs to be able to notify the * toe driver that the software stack is not going to create a connection * for a received SYN * - The driver is responsible for any synchronization required between * the syncache dropping an entry and the driver processing the SYN,ACK.
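 *
 * As an illustration only (all names here are hypothetical, not part of
 * this API), a TOE driver would typically supply a statically
 * initialized instance of the structure declared below and install it
 * in tp->t_tu when it takes over a connection:
 *
 *	static struct toe_usrreqs example_usrreqs = {
 *		.tu_send = example_send,
 *		.tu_rcvd = example_rcvd,
 *		.tu_disconnect = example_disconnect,
 *		.tu_reset = example_reset,
 *		.tu_detach = example_detach,
 *		.tu_syncache_event = example_syncache_event,
 *	};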
* */ struct toe_usrreqs { int (*tu_send)(struct tcpcb *tp); int (*tu_rcvd)(struct tcpcb *tp); int (*tu_disconnect)(struct tcpcb *tp); int (*tu_reset)(struct tcpcb *tp); void (*tu_detach)(struct tcpcb *tp); void (*tu_syncache_event)(int event, void *toep); }; +/* + * Proxy for struct tcpopt between TOE drivers and TCP functions. + */ +struct toeopt { + u_int64_t to_flags; /* see tcpopt in tcp_var.h */ + u_int16_t to_mss; /* maximum segment size */ + u_int8_t to_wscale; /* window scaling */ + + u_int8_t _pad1; /* explicit pad for 64bit alignment */ + u_int32_t _pad2; /* explicit pad for 64bit alignment */ + u_int64_t _pad3[4]; /* TBD */ +}; + #define TOE_SC_ENTRY_PRESENT 1 /* 4-tuple already present */ #define TOE_SC_DROP 2 /* connection was timed out */ /* * Because listen is a one-to-many relationship (a socket can be listening * on all interfaces on a machine, some of which may be using different TCP * offload devices), listen uses a publish/subscribe mechanism. The TCP * offload driver registers a listen notification function with the stack. * When a listen socket is created, all TCP offload devices are notified * so that they can do the appropriate set up to offload connections on the * port to which the socket is bound. When the listen socket is closed, * the offload devices are notified so that they will stop listening on that * port and free any associated resources as well as sending RSTs on any * connections in the SYN_RCVD state. * */ typedef void (*tcp_offload_listen_start_fn)(void *, struct tcpcb *); typedef void (*tcp_offload_listen_stop_fn)(void *, struct tcpcb *); EVENTHANDLER_DECLARE(tcp_offload_listen_start, tcp_offload_listen_start_fn); EVENTHANDLER_DECLARE(tcp_offload_listen_stop, tcp_offload_listen_stop_fn); /* * Check if the socket can be offloaded by the following steps: * - determine the egress interface * - check the interface for TOE capability and that TOE is enabled * - check if the device has resources to offload the connection */ int tcp_offload_connect(struct socket *so, struct sockaddr *nam); /* * The tcp_output_* routines are wrappers around the toe_usrreqs calls * which trigger packet transmission. In the non-offloaded case they * translate to tcp_output. The tcp_offload_* routines notify TOE * of specific events. In the non-offloaded case they are no-ops. * * Listen is a special case because it is a one-to-many relationship * and there can be more than one offload driver in the system. */ /* * Connection is offloaded */ #define tp_offload(tp) ((tp)->t_flags & TF_TOE) /* * hackish way of allowing this file to also be included by TOE * which needs to be kept ignorant of socket implementation details */ #ifdef _SYS_SOCKETVAR_H_ /* * The socket has not been marked as "do not offload" */ #define SO_OFFLOADABLE(so) ((so->so_options & SO_NO_OFFLOAD) == 0) static __inline int tcp_output_connect(struct socket *so, struct sockaddr *nam) { struct tcpcb *tp = sototcpcb(so); int error; /* * If offload has been disabled for this socket or the * connection cannot be offloaded, just call tcp_output * to start the TCP state machine.
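 *
 * (As an aside, on the passive side a driver that completes the
 * handshake in hardware hands the SYN's parsed options back to the
 * stack through the struct toeopt declared above; a sketch, where
 * everything except the toeopt fields and TOF_* flags is hypothetical:
 *
 *	struct toeopt toeo;
 *
 *	bzero(&toeo, sizeof(toeo));
 *	toeo.to_flags = TOF_MSS | TOF_SCALE;
 *	toeo.to_mss = hw_parsed_mss;
 *	toeo.to_wscale = hw_parsed_wscale;
 *	tcp_offload_syncache_add(inc, &toeo, th, inp, &lso, tu, toepcb);)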
*/ #ifndef TCP_OFFLOAD_DISABLE if (!SO_OFFLOADABLE(so) || (error = tcp_offload_connect(so, nam)) != 0) #endif error = tcp_output(tp); return (error); } static __inline int tcp_output_send(struct tcpcb *tp) { #ifndef TCP_OFFLOAD_DISABLE if (tp_offload(tp)) return (tp->t_tu->tu_send(tp)); #endif return (tcp_output(tp)); } static __inline int tcp_output_rcvd(struct tcpcb *tp) { #ifndef TCP_OFFLOAD_DISABLE if (tp_offload(tp)) return (tp->t_tu->tu_rcvd(tp)); #endif return (tcp_output(tp)); } static __inline int tcp_output_disconnect(struct tcpcb *tp) { #ifndef TCP_OFFLOAD_DISABLE if (tp_offload(tp)) return (tp->t_tu->tu_disconnect(tp)); #endif return (tcp_output(tp)); } static __inline int tcp_output_reset(struct tcpcb *tp) { #ifndef TCP_OFFLOAD_DISABLE if (tp_offload(tp)) return (tp->t_tu->tu_reset(tp)); #endif return (tcp_output(tp)); } static __inline void tcp_offload_detach(struct tcpcb *tp) { #ifndef TCP_OFFLOAD_DISABLE if (tp_offload(tp)) tp->t_tu->tu_detach(tp); #endif } static __inline void tcp_offload_listen_open(struct tcpcb *tp) { #ifndef TCP_OFFLOAD_DISABLE if (SO_OFFLOADABLE(tp->t_inpcb->inp_socket)) EVENTHANDLER_INVOKE(tcp_offload_listen_start, tp); #endif } static __inline void tcp_offload_listen_close(struct tcpcb *tp) { #ifndef TCP_OFFLOAD_DISABLE EVENTHANDLER_INVOKE(tcp_offload_listen_stop, tp); #endif } #undef SO_OFFLOADABLE #endif /* _SYS_SOCKETVAR_H_ */ #undef tp_offload void tcp_offload_twstart(struct tcpcb *tp); struct tcpcb *tcp_offload_close(struct tcpcb *tp); struct tcpcb *tcp_offload_drop(struct tcpcb *tp, int error); #endif /* _NETINET_TCP_OFFLOAD_H_ */ Index: head/sys/netinet/tcp_syncache.c =================================================================== --- head/sys/netinet/tcp_syncache.c (revision 195653) +++ head/sys/netinet/tcp_syncache.c (revision 195654) @@ -1,1781 +1,1794 @@ /*- * Copyright (c) 2001 McAfee, Inc. * Copyright (c) 2006 Andre Oppermann, Internet Business Solutions AG * All rights reserved. * * This software was developed for the FreeBSD Project by Jonathan Lemon * and McAfee Research, the Security Research Division of McAfee, Inc. under * DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the * DARPA CHATS research program. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include "opt_ipsec.h" #include #include #include #include #include #include #include #include #include #include #include /* for proc0 declaration */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef INET6 #include #include #include #include #include #endif #include #include #include #include #include #include #include #ifdef INET6 #include #endif #include #ifdef IPSEC #include #ifdef INET6 #include #endif #include #endif /*IPSEC*/ #include #include #ifdef VIMAGE_GLOBALS static struct tcp_syncache tcp_syncache; static int tcp_syncookies; static int tcp_syncookiesonly; int tcp_sc_rst_sock_fail; #endif SYSCTL_V_INT(V_NET, vnet_inet, _net_inet_tcp, OID_AUTO, syncookies, CTLFLAG_RW, tcp_syncookies, 0, "Use TCP SYN cookies if the syncache overflows"); SYSCTL_V_INT(V_NET, vnet_inet, _net_inet_tcp, OID_AUTO, syncookies_only, CTLFLAG_RW, tcp_syncookiesonly, 0, "Use only TCP SYN cookies"); #ifdef TCP_OFFLOAD_DISABLE #define TOEPCB_ISSET(sc) (0) #else #define TOEPCB_ISSET(sc) ((sc)->sc_toepcb != NULL) #endif static void syncache_drop(struct syncache *, struct syncache_head *); static void syncache_free(struct syncache *); static void syncache_insert(struct syncache *, struct syncache_head *); struct syncache *syncache_lookup(struct in_conninfo *, struct syncache_head **); static int syncache_respond(struct syncache *); static struct socket *syncache_socket(struct syncache *, struct socket *, struct mbuf *m); static void syncache_timeout(struct syncache *sc, struct syncache_head *sch, int docallout); static void syncache_timer(void *); static void syncookie_generate(struct syncache_head *, struct syncache *, u_int32_t *); static struct syncache *syncookie_lookup(struct in_conninfo *, struct syncache_head *, struct syncache *, struct tcpopt *, struct tcphdr *, struct socket *); /* * Transmit the SYN,ACK fewer times than TCP_MAXRXTSHIFT specifies. * 3 retransmits corresponds to a timeout of 3 * (1 + 2 + 4 + 8) == 45 seconds, * the odds are that the user has given up attempting to connect by then. 
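 *
 * (Worked arithmetic, from syncache_timeout() below: transmission n is
 * given a timeout of TCPTV_RTOBASE * tcp_backoff[n], i.e. 3, 6, 12 and
 * 24 seconds for the initial SYN,ACK and its three retransmits, which
 * is where the 3 * (1 + 2 + 4 + 8) == 45 second figure comes from.)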
*/ #define SYNCACHE_MAXREXMTS 3 /* Arbitrary values */ #define TCP_SYNCACHE_HASHSIZE 512 #define TCP_SYNCACHE_BUCKETLIMIT 30 SYSCTL_NODE(_net_inet_tcp, OID_AUTO, syncache, CTLFLAG_RW, 0, "TCP SYN cache"); SYSCTL_V_INT(V_NET, vnet_inet, _net_inet_tcp_syncache, OID_AUTO, bucketlimit, CTLFLAG_RDTUN, tcp_syncache.bucket_limit, 0, "Per-bucket hash limit for syncache"); SYSCTL_V_INT(V_NET, vnet_inet, _net_inet_tcp_syncache, OID_AUTO, cachelimit, CTLFLAG_RDTUN, tcp_syncache.cache_limit, 0, "Overall entry limit for syncache"); SYSCTL_V_INT(V_NET, vnet_inet, _net_inet_tcp_syncache, OID_AUTO, count, CTLFLAG_RD, tcp_syncache.cache_count, 0, "Current number of entries in syncache"); SYSCTL_V_INT(V_NET, vnet_inet, _net_inet_tcp_syncache, OID_AUTO, hashsize, CTLFLAG_RDTUN, tcp_syncache.hashsize, 0, "Size of TCP syncache hashtable"); SYSCTL_V_INT(V_NET, vnet_inet, _net_inet_tcp_syncache, OID_AUTO, rexmtlimit, CTLFLAG_RW, tcp_syncache.rexmt_limit, 0, "Limit on SYN/ACK retransmissions"); SYSCTL_V_INT(V_NET, vnet_inet, _net_inet_tcp_syncache, OID_AUTO, rst_on_sock_fail, CTLFLAG_RW, tcp_sc_rst_sock_fail, 0, "Send reset on socket allocation failure"); static MALLOC_DEFINE(M_SYNCACHE, "syncache", "TCP syncache"); #define SYNCACHE_HASH(inc, mask) \ ((V_tcp_syncache.hash_secret ^ \ (inc)->inc_faddr.s_addr ^ \ ((inc)->inc_faddr.s_addr >> 16) ^ \ (inc)->inc_fport ^ (inc)->inc_lport) & mask) #define SYNCACHE_HASH6(inc, mask) \ ((V_tcp_syncache.hash_secret ^ \ (inc)->inc6_faddr.s6_addr32[0] ^ \ (inc)->inc6_faddr.s6_addr32[3] ^ \ (inc)->inc_fport ^ (inc)->inc_lport) & mask) #define ENDPTS_EQ(a, b) ( \ (a)->ie_fport == (b)->ie_fport && \ (a)->ie_lport == (b)->ie_lport && \ (a)->ie_faddr.s_addr == (b)->ie_faddr.s_addr && \ (a)->ie_laddr.s_addr == (b)->ie_laddr.s_addr \ ) #define ENDPTS6_EQ(a, b) (memcmp(a, b, sizeof(*a)) == 0) #define SCH_LOCK(sch) mtx_lock(&(sch)->sch_mtx) #define SCH_UNLOCK(sch) mtx_unlock(&(sch)->sch_mtx) #define SCH_LOCK_ASSERT(sch) mtx_assert(&(sch)->sch_mtx, MA_OWNED) /* * Requires the syncache entry to be already removed from the bucket list. */ static void syncache_free(struct syncache *sc) { INIT_VNET_INET(curvnet); if (sc->sc_ipopts) (void) m_free(sc->sc_ipopts); if (sc->sc_cred) crfree(sc->sc_cred); #ifdef MAC mac_syncache_destroy(&sc->sc_label); #endif uma_zfree(V_tcp_syncache.zone, sc); } void syncache_init(void) { INIT_VNET_INET(curvnet); int i; V_tcp_syncookies = 1; V_tcp_syncookiesonly = 0; V_tcp_sc_rst_sock_fail = 1; V_tcp_syncache.cache_count = 0; V_tcp_syncache.hashsize = TCP_SYNCACHE_HASHSIZE; V_tcp_syncache.bucket_limit = TCP_SYNCACHE_BUCKETLIMIT; V_tcp_syncache.rexmt_limit = SYNCACHE_MAXREXMTS; V_tcp_syncache.hash_secret = arc4random(); TUNABLE_INT_FETCH("net.inet.tcp.syncache.hashsize", &V_tcp_syncache.hashsize); TUNABLE_INT_FETCH("net.inet.tcp.syncache.bucketlimit", &V_tcp_syncache.bucket_limit); if (!powerof2(V_tcp_syncache.hashsize) || V_tcp_syncache.hashsize == 0) { printf("WARNING: syncache hash size is not a power of 2.\n"); V_tcp_syncache.hashsize = TCP_SYNCACHE_HASHSIZE; } V_tcp_syncache.hashmask = V_tcp_syncache.hashsize - 1; /* Set limits. */ V_tcp_syncache.cache_limit = V_tcp_syncache.hashsize * V_tcp_syncache.bucket_limit; TUNABLE_INT_FETCH("net.inet.tcp.syncache.cachelimit", &V_tcp_syncache.cache_limit); /* Allocate the hash table. */ V_tcp_syncache.hashbase = malloc(V_tcp_syncache.hashsize * sizeof(struct syncache_head), M_SYNCACHE, M_WAITOK | M_ZERO); /* Initialize the hash buckets. 
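 * Each row gets its own mutex and callout below, so lookups and timer
 * processing in different buckets can run concurrently.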
*/ for (i = 0; i < V_tcp_syncache.hashsize; i++) { #ifdef VIMAGE V_tcp_syncache.hashbase[i].sch_vnet = curvnet; #endif TAILQ_INIT(&V_tcp_syncache.hashbase[i].sch_bucket); mtx_init(&V_tcp_syncache.hashbase[i].sch_mtx, "tcp_sc_head", NULL, MTX_DEF); callout_init_mtx(&V_tcp_syncache.hashbase[i].sch_timer, &V_tcp_syncache.hashbase[i].sch_mtx, 0); V_tcp_syncache.hashbase[i].sch_length = 0; } /* Create the syncache entry zone. */ V_tcp_syncache.zone = uma_zcreate("syncache", sizeof(struct syncache), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0); uma_zone_set_max(V_tcp_syncache.zone, V_tcp_syncache.cache_limit); } #ifdef VIMAGE void syncache_destroy(void) { INIT_VNET_INET(curvnet); /* XXX walk the cache, free remaining objects, stop timers */ uma_zdestroy(V_tcp_syncache.zone); FREE(V_tcp_syncache.hashbase, M_SYNCACHE); } #endif /* * Inserts a syncache entry into the specified bucket row. * Locks and unlocks the syncache_head autonomously. */ static void syncache_insert(struct syncache *sc, struct syncache_head *sch) { INIT_VNET_INET(sch->sch_vnet); struct syncache *sc2; SCH_LOCK(sch); /* * Make sure that we don't overflow the per-bucket limit. * If the bucket is full, toss the oldest element. */ if (sch->sch_length >= V_tcp_syncache.bucket_limit) { KASSERT(!TAILQ_EMPTY(&sch->sch_bucket), ("sch->sch_length incorrect")); sc2 = TAILQ_LAST(&sch->sch_bucket, sch_head); syncache_drop(sc2, sch); TCPSTAT_INC(tcps_sc_bucketoverflow); } /* Put it into the bucket. */ TAILQ_INSERT_HEAD(&sch->sch_bucket, sc, sc_hash); sch->sch_length++; /* Reinitialize the bucket row's timer. */ if (sch->sch_length == 1) sch->sch_nextc = ticks + INT_MAX; syncache_timeout(sc, sch, 1); SCH_UNLOCK(sch); V_tcp_syncache.cache_count++; TCPSTAT_INC(tcps_sc_added); } /* * Remove and free entry from syncache bucket row. * Expects locked syncache head. */ static void syncache_drop(struct syncache *sc, struct syncache_head *sch) { INIT_VNET_INET(sch->sch_vnet); SCH_LOCK_ASSERT(sch); TAILQ_REMOVE(&sch->sch_bucket, sc, sc_hash); sch->sch_length--; #ifndef TCP_OFFLOAD_DISABLE if (sc->sc_tu) sc->sc_tu->tu_syncache_event(TOE_SC_DROP, sc->sc_toepcb); #endif syncache_free(sc); V_tcp_syncache.cache_count--; } /* * Engage/reengage timer on bucket row. */ static void syncache_timeout(struct syncache *sc, struct syncache_head *sch, int docallout) { sc->sc_rxttime = ticks + TCPTV_RTOBASE * (tcp_backoff[sc->sc_rxmits]); sc->sc_rxmits++; if (TSTMP_LT(sc->sc_rxttime, sch->sch_nextc)) { sch->sch_nextc = sc->sc_rxttime; if (docallout) callout_reset(&sch->sch_timer, sch->sch_nextc - ticks, syncache_timer, (void *)sch); } } /* * Walk the timer queues, looking for SYN,ACKs that need to be retransmitted. * If we have retransmitted an entry the maximum number of times, expire it. * One separate timer for each bucket row. */ static void syncache_timer(void *xsch) { struct syncache_head *sch = (struct syncache_head *)xsch; struct syncache *sc, *nsc; int tick = ticks; char *s; CURVNET_SET(sch->sch_vnet); INIT_VNET_INET(sch->sch_vnet); /* NB: syncache_head has already been locked by the callout. */ SCH_LOCK_ASSERT(sch); /* * In the following cycle we may remove some entries and/or * advance some timeouts, so re-initialize the bucket timer. */ sch->sch_nextc = tick + INT_MAX; TAILQ_FOREACH_SAFE(sc, &sch->sch_bucket, sc_hash, nsc) { /* * We do not check if the listen socket still exists * and accept the case where the listen socket may be * gone by the time we resend the SYN/ACK. We do * not expect this to happen often.
If it does, * then the RST will be sent by the time the remote * host does the SYN/ACK->ACK. */ if (TSTMP_GT(sc->sc_rxttime, tick)) { if (TSTMP_LT(sc->sc_rxttime, sch->sch_nextc)) sch->sch_nextc = sc->sc_rxttime; continue; } if (sc->sc_rxmits > V_tcp_syncache.rexmt_limit) { if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) { log(LOG_DEBUG, "%s; %s: Retransmits exhausted, " "giving up and removing syncache entry\n", s, __func__); free(s, M_TCPLOG); } syncache_drop(sc, sch); TCPSTAT_INC(tcps_sc_stale); continue; } if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) { log(LOG_DEBUG, "%s; %s: Response timeout, " "retransmitting (%u) SYN|ACK\n", s, __func__, sc->sc_rxmits); free(s, M_TCPLOG); } (void) syncache_respond(sc); TCPSTAT_INC(tcps_sc_retransmitted); syncache_timeout(sc, sch, 0); } if (!TAILQ_EMPTY(&(sch)->sch_bucket)) callout_reset(&(sch)->sch_timer, (sch)->sch_nextc - tick, syncache_timer, (void *)(sch)); CURVNET_RESTORE(); } /* * Find an entry in the syncache. * Always returns with a locked syncache_head plus a matching entry or NULL. */ struct syncache * syncache_lookup(struct in_conninfo *inc, struct syncache_head **schp) { INIT_VNET_INET(curvnet); struct syncache *sc; struct syncache_head *sch; #ifdef INET6 if (inc->inc_flags & INC_ISIPV6) { sch = &V_tcp_syncache.hashbase[ SYNCACHE_HASH6(inc, V_tcp_syncache.hashmask)]; *schp = sch; SCH_LOCK(sch); /* Circle through bucket row to find matching entry. */ TAILQ_FOREACH(sc, &sch->sch_bucket, sc_hash) { if (ENDPTS6_EQ(&inc->inc_ie, &sc->sc_inc.inc_ie)) return (sc); } } else #endif { sch = &V_tcp_syncache.hashbase[ SYNCACHE_HASH(inc, V_tcp_syncache.hashmask)]; *schp = sch; SCH_LOCK(sch); /* Circle through bucket row to find matching entry. */ TAILQ_FOREACH(sc, &sch->sch_bucket, sc_hash) { #ifdef INET6 if (sc->sc_inc.inc_flags & INC_ISIPV6) continue; #endif if (ENDPTS_EQ(&inc->inc_ie, &sc->sc_inc.inc_ie)) return (sc); } } SCH_LOCK_ASSERT(*schp); return (NULL); /* always returns with locked sch */ } /* * This function is called when we get a RST for a * non-existent connection, so that we can see if the * connection is in the syn cache. If it is, zap it. */ void syncache_chkrst(struct in_conninfo *inc, struct tcphdr *th) { INIT_VNET_INET(curvnet); struct syncache *sc; struct syncache_head *sch; char *s = NULL; sc = syncache_lookup(inc, &sch); /* returns locked sch */ SCH_LOCK_ASSERT(sch); /* * Any RST to our SYN|ACK must not carry ACK, SYN or FIN flags. * See RFC 793 page 65, section SEGMENT ARRIVES. */ if (th->th_flags & (TH_ACK|TH_SYN|TH_FIN)) { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Spurious RST with ACK, SYN or " "FIN flag set, segment ignored\n", s, __func__); TCPSTAT_INC(tcps_badrst); goto done; } /* * No corresponding connection was found in syncache. * If syncookies are enabled and possibly exclusively * used, or we are under memory pressure, a valid RST * may not find a syncache entry. In that case we're * done and no SYN|ACK retransmissions will happen. * Otherwise the RST was misdirected or spoofed. */ if (sc == NULL) { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Spurious RST without matching " "syncache entry (possibly syncookie only), " "segment ignored\n", s, __func__); TCPSTAT_INC(tcps_badrst); goto done; } /* * If the RST bit is set, check the sequence number to see * if this is a valid reset segment. * RFC 793 page 37: * In all states except SYN-SENT, all reset (RST) segments * are validated by checking their SEQ-fields.
A reset is * valid if its sequence number is in the window. * * The sequence number in the reset segment is normally an * echo of our outgoing acknowledgement numbers, but some hosts * send a reset with the sequence number at the rightmost edge * of our receive window, and we have to handle this case. */ if (SEQ_GEQ(th->th_seq, sc->sc_irs) && SEQ_LEQ(th->th_seq, sc->sc_irs + sc->sc_wnd)) { syncache_drop(sc, sch); if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Our SYN|ACK was rejected, " "connection attempt aborted by remote endpoint\n", s, __func__); TCPSTAT_INC(tcps_sc_reset); } else { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: RST with invalid SEQ %u != " "IRS %u (+WND %u), segment ignored\n", s, __func__, th->th_seq, sc->sc_irs, sc->sc_wnd); TCPSTAT_INC(tcps_badrst); } done: if (s != NULL) free(s, M_TCPLOG); SCH_UNLOCK(sch); } void syncache_badack(struct in_conninfo *inc) { INIT_VNET_INET(curvnet); struct syncache *sc; struct syncache_head *sch; sc = syncache_lookup(inc, &sch); /* returns locked sch */ SCH_LOCK_ASSERT(sch); if (sc != NULL) { syncache_drop(sc, sch); TCPSTAT_INC(tcps_sc_badack); } SCH_UNLOCK(sch); } void syncache_unreach(struct in_conninfo *inc, struct tcphdr *th) { INIT_VNET_INET(curvnet); struct syncache *sc; struct syncache_head *sch; sc = syncache_lookup(inc, &sch); /* returns locked sch */ SCH_LOCK_ASSERT(sch); if (sc == NULL) goto done; /* If the sequence number != sc_iss, then it's a bogus ICMP msg */ if (ntohl(th->th_seq) != sc->sc_iss) goto done; /* * If we've retransmitted 3 times and this is our second error, * we remove the entry. Otherwise, we allow it to continue on. * This prevents us from incorrectly nuking an entry during a * spurious network outage. * * See tcp_notify(). */ if ((sc->sc_flags & SCF_UNREACH) == 0 || sc->sc_rxmits < 3 + 1) { sc->sc_flags |= SCF_UNREACH; goto done; } syncache_drop(sc, sch); TCPSTAT_INC(tcps_sc_unreach); done: SCH_UNLOCK(sch); } /* * Build a new TCP socket structure from a syncache entry. */ static struct socket * syncache_socket(struct syncache *sc, struct socket *lso, struct mbuf *m) { INIT_VNET_INET(lso->so_vnet); struct inpcb *inp = NULL; struct socket *so; struct tcpcb *tp; char *s; INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* * Ok, create the full blown connection, and set things up * as they would have been set up if we had created the * connection when the SYN arrived. If we can't create * the connection, abort it. */ so = sonewconn(lso, SS_ISCONNECTED); if (so == NULL) { /* * Drop the connection; we will either send a RST or * have the peer retransmit its SYN again after its * RTO and try again. */ TCPSTAT_INC(tcps_listendrop); if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) { log(LOG_DEBUG, "%s; %s: Socket create failed " "due to limits or memory shortage\n", s, __func__); free(s, M_TCPLOG); } goto abort2; } #ifdef MAC mac_socketpeer_set_from_mbuf(m, so); #endif inp = sotoinpcb(so); inp->inp_inc.inc_fibnum = sc->sc_inc.inc_fibnum; so->so_fibnum = sc->sc_inc.inc_fibnum; INP_WLOCK(inp); /* Insert new socket into PCB hash list. */ inp->inp_inc.inc_flags = sc->sc_inc.inc_flags; #ifdef INET6 if (sc->sc_inc.inc_flags & INC_ISIPV6) { inp->in6p_laddr = sc->sc_inc.inc6_laddr; } else { inp->inp_vflag &= ~INP_IPV6; inp->inp_vflag |= INP_IPV4; #endif inp->inp_laddr = sc->sc_inc.inc_laddr; #ifdef INET6 } #endif inp->inp_lport = sc->sc_inc.inc_lport; if (in_pcbinshash(inp) != 0) { /* * Undo the assignments above if we failed to * put the PCB on the hash lists.
*/ #ifdef INET6 if (sc->sc_inc.inc_flags & INC_ISIPV6) inp->in6p_laddr = in6addr_any; else #endif inp->inp_laddr.s_addr = INADDR_ANY; inp->inp_lport = 0; goto abort; } #ifdef IPSEC /* Copy old policy into new socket's. */ if (ipsec_copy_policy(sotoinpcb(lso)->inp_sp, inp->inp_sp)) printf("syncache_socket: could not copy policy\n"); #endif #ifdef INET6 if (sc->sc_inc.inc_flags & INC_ISIPV6) { struct inpcb *oinp = sotoinpcb(lso); struct in6_addr laddr6; struct sockaddr_in6 sin6; /* * Inherit socket options from the listening socket. * Note that in6p_inputopts is not (and should not be) * copied, since it stores previously received options and is * used to detect if each new option is different from the * previous one and hence should be passed to a user. * If we copied in6p_inputopts, a user would not be able to * receive options just after calling the accept system call. */ inp->inp_flags |= oinp->inp_flags & INP_CONTROLOPTS; if (oinp->in6p_outputopts) inp->in6p_outputopts = ip6_copypktopts(oinp->in6p_outputopts, M_NOWAIT); sin6.sin6_family = AF_INET6; sin6.sin6_len = sizeof(sin6); sin6.sin6_addr = sc->sc_inc.inc6_faddr; sin6.sin6_port = sc->sc_inc.inc_fport; sin6.sin6_flowinfo = sin6.sin6_scope_id = 0; laddr6 = inp->in6p_laddr; if (IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_laddr)) inp->in6p_laddr = sc->sc_inc.inc6_laddr; if (in6_pcbconnect(inp, (struct sockaddr *)&sin6, thread0.td_ucred)) { inp->in6p_laddr = laddr6; goto abort; } /* Override flowlabel from in6_pcbconnect. */ inp->inp_flow &= ~IPV6_FLOWLABEL_MASK; inp->inp_flow |= sc->sc_flowlabel; } else #endif { struct in_addr laddr; struct sockaddr_in sin; inp->inp_options = (m) ? ip_srcroute(m) : NULL; if (inp->inp_options == NULL) { inp->inp_options = sc->sc_ipopts; sc->sc_ipopts = NULL; } sin.sin_family = AF_INET; sin.sin_len = sizeof(sin); sin.sin_addr = sc->sc_inc.inc_faddr; sin.sin_port = sc->sc_inc.inc_fport; bzero((caddr_t)sin.sin_zero, sizeof(sin.sin_zero)); laddr = inp->inp_laddr; if (inp->inp_laddr.s_addr == INADDR_ANY) inp->inp_laddr = sc->sc_inc.inc_laddr; if (in_pcbconnect(inp, (struct sockaddr *)&sin, thread0.td_ucred)) { inp->inp_laddr = laddr; goto abort; } } tp = intotcpcb(inp); tp->t_state = TCPS_SYN_RECEIVED; tp->iss = sc->sc_iss; tp->irs = sc->sc_irs; tcp_rcvseqinit(tp); tcp_sendseqinit(tp); tp->snd_wl1 = sc->sc_irs; tp->snd_max = tp->iss + 1; tp->snd_nxt = tp->iss + 1; tp->rcv_up = sc->sc_irs + 1; tp->rcv_wnd = sc->sc_wnd; tp->rcv_adv += tp->rcv_wnd; tp->last_ack_sent = tp->rcv_nxt; tp->t_flags = sototcpcb(lso)->t_flags & (TF_NOPUSH|TF_NODELAY); if (sc->sc_flags & SCF_NOOPT) tp->t_flags |= TF_NOOPT; else { if (sc->sc_flags & SCF_WINSCALE) { tp->t_flags |= TF_REQ_SCALE|TF_RCVD_SCALE; tp->snd_scale = sc->sc_requested_s_scale; tp->request_r_scale = sc->sc_requested_r_scale; } if (sc->sc_flags & SCF_TIMESTAMP) { tp->t_flags |= TF_REQ_TSTMP|TF_RCVD_TSTMP; tp->ts_recent = sc->sc_tsreflect; tp->ts_recent_age = ticks; tp->ts_offset = sc->sc_tsoff; } #ifdef TCP_SIGNATURE if (sc->sc_flags & SCF_SIGNATURE) tp->t_flags |= TF_SIGNATURE; #endif if (sc->sc_flags & SCF_SACK) tp->t_flags |= TF_SACK_PERMIT; } if (sc->sc_flags & SCF_ECN) tp->t_flags |= TF_ECN_PERMIT; /* * Set up MSS and get cached values from tcp_hostcache. * This might overwrite some of the defaults we just set. */ tcp_mss(tp, sc->sc_peer_mss); /* * If the SYN,ACK was retransmitted, reset cwnd to 1 segment.
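 * (A retransmitted SYN,ACK implies the handshake lost a segment, so the
 * congestion window is restarted at one segment, matching the RFC 3390
 * requirement for a lost SYN or SYN,ACK.)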
*/ if (sc->sc_rxmits) tp->snd_cwnd = tp->t_maxseg; tcp_timer_activate(tp, TT_KEEP, tcp_keepinit); INP_WUNLOCK(inp); TCPSTAT_INC(tcps_accepts); return (so); abort: INP_WUNLOCK(inp); abort2: if (so != NULL) soabort(so); return (NULL); } /* * This function gets called when we receive an ACK for a * socket in the LISTEN state. We look up the connection * in the syncache, and if it's there, we pull it out of * the cache and turn it into a full-blown connection in * the SYN-RECEIVED state. */ int syncache_expand(struct in_conninfo *inc, struct tcpopt *to, struct tcphdr *th, struct socket **lsop, struct mbuf *m) { INIT_VNET_INET(curvnet); struct syncache *sc; struct syncache_head *sch; struct syncache scs; char *s; /* * Global TCP locks are held because we manipulate the PCB lists * and create a new socket. */ INP_INFO_WLOCK_ASSERT(&V_tcbinfo); KASSERT((th->th_flags & (TH_RST|TH_ACK|TH_SYN)) == TH_ACK, ("%s: can handle only ACK", __func__)); sc = syncache_lookup(inc, &sch); /* returns locked sch */ SCH_LOCK_ASSERT(sch); if (sc == NULL) { /* * There is no syncache entry, so see if this ACK is * a returning syncookie. To do this, first: * A. See if this socket has had a syncache entry dropped in * the past. We don't want to accept a bogus syncookie * if we've never received a SYN. * B. Check that the syncookie is valid. If it is, then * cobble up a fake syncache entry, and return. */ if (!V_tcp_syncookies) { SCH_UNLOCK(sch); if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Spurious ACK, " "segment rejected (syncookies disabled)\n", s, __func__); goto failed; } bzero(&scs, sizeof(scs)); sc = syncookie_lookup(inc, sch, &scs, to, th, *lsop); SCH_UNLOCK(sch); if (sc == NULL) { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Segment failed " "SYNCOOKIE authentication, segment rejected " "(probably spoofed)\n", s, __func__); goto failed; } } else { /* Pull out the entry to unlock the bucket row. */ TAILQ_REMOVE(&sch->sch_bucket, sc, sc_hash); sch->sch_length--; V_tcp_syncache.cache_count--; SCH_UNLOCK(sch); } /* * Segment validation: * ACK must match our initial sequence number + 1 (the SYN|ACK). */ if (th->th_ack != sc->sc_iss + 1 && !TOEPCB_ISSET(sc)) { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: ACK %u != ISS+1 %u, segment " "rejected\n", s, __func__, th->th_ack, sc->sc_iss); goto failed; } /* * The SEQ must fall in the window starting at the received * initial receive sequence number + 1 (the SYN). */ if ((SEQ_LEQ(th->th_seq, sc->sc_irs) || SEQ_GT(th->th_seq, sc->sc_irs + sc->sc_wnd)) && !TOEPCB_ISSET(sc)) { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: SEQ %u != IRS+1 %u, segment " "rejected\n", s, __func__, th->th_seq, sc->sc_irs); goto failed; } if (!(sc->sc_flags & SCF_TIMESTAMP) && (to->to_flags & TOF_TS)) { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: Timestamp not expected, " "segment rejected\n", s, __func__); goto failed; } /* * If timestamps were negotiated the reflected timestamp * must be equal to what we actually sent in the SYN|ACK. */ if ((to->to_flags & TOF_TS) && to->to_tsecr != sc->sc_ts && !TOEPCB_ISSET(sc)) { if ((s = tcp_log_addrs(inc, th, NULL, NULL))) log(LOG_DEBUG, "%s; %s: TSECR %u != TS %u, " "segment rejected\n", s, __func__, to->to_tsecr, sc->sc_ts); goto failed; } *lsop = syncache_socket(sc, *lsop, m); if (*lsop == NULL) TCPSTAT_INC(tcps_sc_aborted); else TCPSTAT_INC(tcps_sc_completed); /* how do we find the inp for the new socket?
*/ if (sc != &scs) syncache_free(sc); return (1); failed: if (sc != NULL && sc != &scs) syncache_free(sc); if (s != NULL) free(s, M_TCPLOG); *lsop = NULL; return (0); } int -tcp_offload_syncache_expand(struct in_conninfo *inc, struct tcpopt *to, +tcp_offload_syncache_expand(struct in_conninfo *inc, struct toeopt *toeo, struct tcphdr *th, struct socket **lsop, struct mbuf *m) { INIT_VNET_INET(curvnet); + struct tcpopt to; int rc; + + bzero(&to, sizeof(struct tcpopt)); + to.to_mss = toeo->to_mss; + to.to_wscale = toeo->to_wscale; + to.to_flags = toeo->to_flags; INP_INFO_WLOCK(&V_tcbinfo); - rc = syncache_expand(inc, to, th, lsop, m); + rc = syncache_expand(inc, &to, th, lsop, m); INP_INFO_WUNLOCK(&V_tcbinfo); return (rc); } /* * Given a LISTEN socket and an inbound SYN request, add * this to the syn cache, and send back a segment: * <SEQ=ISS><ACK=RCV_NXT><CTL=SYN,ACK> * to the source. * * IMPORTANT NOTE: We do _NOT_ ACK data that might accompany the SYN. * Doing so would require that we hold onto the data and deliver it * to the application. However, if we are the target of a SYN-flood * DoS attack, an attacker could send data which would eventually * consume all available buffer space if it were ACKed. By not ACKing * the data, we avoid this DoS scenario. */ static void _syncache_add(struct in_conninfo *inc, struct tcpopt *to, struct tcphdr *th, struct inpcb *inp, struct socket **lsop, struct mbuf *m, struct toe_usrreqs *tu, void *toepcb) { INIT_VNET_INET(inp->inp_vnet); struct tcpcb *tp; struct socket *so; struct syncache *sc = NULL; struct syncache_head *sch; struct mbuf *ipopts = NULL; u_int32_t flowtmp; int win, sb_hiwat, ip_ttl, ip_tos, noopt; char *s; #ifdef INET6 int autoflowlabel = 0; #endif #ifdef MAC struct label *maclabel; #endif struct syncache scs; struct ucred *cred; INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); /* listen socket */ KASSERT((th->th_flags & (TH_RST|TH_ACK|TH_SYN)) == TH_SYN, ("%s: unexpected tcp flags", __func__)); /* * Combine all so/tp operations very early to drop the INP lock as * soon as possible. */ so = *lsop; tp = sototcpcb(so); cred = crhold(so->so_cred); #ifdef INET6 if ((inc->inc_flags & INC_ISIPV6) && (inp->inp_flags & IN6P_AUTOFLOWLABEL)) autoflowlabel = 1; #endif ip_ttl = inp->inp_ip_ttl; ip_tos = inp->inp_ip_tos; win = sbspace(&so->so_rcv); sb_hiwat = so->so_rcv.sb_hiwat; noopt = (tp->t_flags & TF_NOOPT); /* By the time we drop the lock these should no longer be used. */ so = NULL; tp = NULL; #ifdef MAC if (mac_syncache_init(&maclabel) != 0) { INP_WUNLOCK(inp); INP_INFO_WUNLOCK(&V_tcbinfo); goto done; } else mac_syncache_create(maclabel, inp); #endif INP_WUNLOCK(inp); INP_INFO_WUNLOCK(&V_tcbinfo); /* * Remember the IP options, if any. */ #ifdef INET6 if (!(inc->inc_flags & INC_ISIPV6)) #endif ipopts = (m) ? ip_srcroute(m) : NULL; /* * See if we already have an entry for this connection. * If we do, resend the SYN,ACK, and reset the retransmit timer. * * XXX: should the syncache be re-initialized with the contents * of the new SYN here (which may have different options?) * * XXX: We do not check the sequence number to see if this is a * real retransmit or a new connection attempt. The question is * how to handle such a case; either ignore it as spoofed, or * drop the current entry and create a new one?
sc = syncache_lookup(inc, &sch); /* returns locked entry */ SCH_LOCK_ASSERT(sch); if (sc != NULL) { #ifndef TCP_OFFLOAD_DISABLE if (sc->sc_tu) sc->sc_tu->tu_syncache_event(TOE_SC_ENTRY_PRESENT, sc->sc_toepcb); #endif TCPSTAT_INC(tcps_sc_dupsyn); if (ipopts) { /* * If we were remembering a previous source route, * forget it and use the new one we've been given. */ if (sc->sc_ipopts) (void) m_free(sc->sc_ipopts); sc->sc_ipopts = ipopts; } /* * Update timestamp if present. */ if ((sc->sc_flags & SCF_TIMESTAMP) && (to->to_flags & TOF_TS)) sc->sc_tsreflect = to->to_tsval; else sc->sc_flags &= ~SCF_TIMESTAMP; #ifdef MAC /* * Since we have already unconditionally allocated label * storage, free it up. The syncache entry will already * have an initialized label we can use. */ mac_syncache_destroy(&maclabel); #endif /* Retransmit SYN|ACK and reset retransmit count. */ if ((s = tcp_log_addrs(&sc->sc_inc, th, NULL, NULL))) { log(LOG_DEBUG, "%s; %s: Received duplicate SYN, " "resetting timer and retransmitting SYN|ACK\n", s, __func__); free(s, M_TCPLOG); } if (!TOEPCB_ISSET(sc) && syncache_respond(sc) == 0) { sc->sc_rxmits = 0; syncache_timeout(sc, sch, 1); TCPSTAT_INC(tcps_sndacks); TCPSTAT_INC(tcps_sndtotal); } SCH_UNLOCK(sch); goto done; } sc = uma_zalloc(V_tcp_syncache.zone, M_NOWAIT | M_ZERO); if (sc == NULL) { /* * The zone allocator couldn't provide more entries. * Treat this as if the cache was full; drop the oldest * entry and insert the new one. */ TCPSTAT_INC(tcps_sc_zonefail); if ((sc = TAILQ_LAST(&sch->sch_bucket, sch_head)) != NULL) syncache_drop(sc, sch); sc = uma_zalloc(V_tcp_syncache.zone, M_NOWAIT | M_ZERO); if (sc == NULL) { if (V_tcp_syncookies) { bzero(&scs, sizeof(scs)); sc = &scs; } else { SCH_UNLOCK(sch); if (ipopts) (void) m_free(ipopts); goto done; } } } /* * Fill in the syncache values. */ #ifdef MAC sc->sc_label = maclabel; #endif sc->sc_cred = cred; cred = NULL; sc->sc_ipopts = ipopts; /* XXX-BZ this fib assignment is just useless. */ sc->sc_inc.inc_fibnum = inp->inp_inc.inc_fibnum; bcopy(inc, &sc->sc_inc, sizeof(struct in_conninfo)); #ifdef INET6 if (!(inc->inc_flags & INC_ISIPV6)) #endif { sc->sc_ip_tos = ip_tos; sc->sc_ip_ttl = ip_ttl; } #ifndef TCP_OFFLOAD_DISABLE sc->sc_tu = tu; sc->sc_toepcb = toepcb; #endif sc->sc_irs = th->th_seq; sc->sc_iss = arc4random(); sc->sc_flags = 0; sc->sc_flowlabel = 0; /* * Initial receive window: clip sbspace to [0 .. TCP_MAXWIN]. * win was derived from socket earlier in the function. */ win = imax(win, 0); win = imin(win, TCP_MAXWIN); sc->sc_wnd = win; if (V_tcp_do_rfc1323) { /* * A timestamp received in a SYN makes * it ok to send timestamp requests and replies. */ if (to->to_flags & TOF_TS) { sc->sc_tsreflect = to->to_tsval; sc->sc_ts = ticks; sc->sc_flags |= SCF_TIMESTAMP; } if (to->to_flags & TOF_SCALE) { int wscale = 0; /* * Pick the smallest possible scaling factor that * will still allow us to scale up to sb_max, aka * kern.ipc.maxsockbuf. * * We do this because there are broken firewalls that * will corrupt the window scale option, leading to * the other endpoint believing that our advertised * window is unscaled. At scale factors larger than * 5 the unscaled window will drop below 1500 bytes, * leading to serious problems when traversing these * broken firewalls. * * With the default maxsockbuf of 256K, a scale factor * of 3 will be chosen by this algorithm. Those who * choose a larger maxsockbuf should watch out * for the compatibility problems mentioned above.
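 *
 * (Worked out for those defaults: with sb_max = 262144 and
 * TCP_MAXWIN = 65535, the loop below stops at wscale = 3, since
 * 65535 << 2 = 262140 is still smaller than sb_max while
 * 65535 << 3 is not.)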
* * RFC1323: The Window field in a SYN (i.e., a <SYN> * or <SYN,ACK>) segment itself is never scaled. */ while (wscale < TCP_MAX_WINSHIFT && (TCP_MAXWIN << wscale) < sb_max) wscale++; sc->sc_requested_r_scale = wscale; sc->sc_requested_s_scale = to->to_wscale; sc->sc_flags |= SCF_WINSCALE; } } #ifdef TCP_SIGNATURE /* * If listening socket requested TCP digests, and received SYN * contains the option, flag this in the syncache so that * syncache_respond() will do the right thing with the SYN+ACK. * XXX: Currently we always record the option by default and will * attempt to use it in syncache_respond(). */ if (to->to_flags & TOF_SIGNATURE) sc->sc_flags |= SCF_SIGNATURE; #endif if (to->to_flags & TOF_SACKPERM) sc->sc_flags |= SCF_SACK; if (to->to_flags & TOF_MSS) sc->sc_peer_mss = to->to_mss; /* peer mss may be zero */ if (noopt) sc->sc_flags |= SCF_NOOPT; if ((th->th_flags & (TH_ECE|TH_CWR)) && V_tcp_do_ecn) sc->sc_flags |= SCF_ECN; if (V_tcp_syncookies) { syncookie_generate(sch, sc, &flowtmp); #ifdef INET6 if (autoflowlabel) sc->sc_flowlabel = flowtmp; #endif } else { #ifdef INET6 if (autoflowlabel) sc->sc_flowlabel = (htonl(ip6_randomflowlabel()) & IPV6_FLOWLABEL_MASK); #endif } SCH_UNLOCK(sch); /* * Do a standard 3-way handshake. */ if (TOEPCB_ISSET(sc) || syncache_respond(sc) == 0) { if (V_tcp_syncookies && V_tcp_syncookiesonly && sc != &scs) syncache_free(sc); else if (sc != &scs) syncache_insert(sc, sch); /* locks and unlocks sch */ TCPSTAT_INC(tcps_sndacks); TCPSTAT_INC(tcps_sndtotal); } else { if (sc != &scs) syncache_free(sc); TCPSTAT_INC(tcps_sc_dropped); } done: if (cred != NULL) crfree(cred); #ifdef MAC if (sc == &scs) mac_syncache_destroy(&maclabel); #endif if (m) { *lsop = NULL; m_freem(m); } } static int syncache_respond(struct syncache *sc) { INIT_VNET_INET(curvnet); struct ip *ip = NULL; struct mbuf *m; struct tcphdr *th; int optlen, error; u_int16_t hlen, tlen, mssopt; struct tcpopt to; #ifdef INET6 struct ip6_hdr *ip6 = NULL; #endif hlen = #ifdef INET6 (sc->sc_inc.inc_flags & INC_ISIPV6) ? sizeof(struct ip6_hdr) : #endif sizeof(struct ip); tlen = hlen + sizeof(struct tcphdr); /* Determine MSS we advertise to other end of connection. */ mssopt = tcp_mssopt(&sc->sc_inc); if (sc->sc_peer_mss) mssopt = max( min(sc->sc_peer_mss, mssopt), V_tcp_minmss); /* XXX: Assume that the entire packet will fit in a header mbuf. */ KASSERT(max_linkhdr + tlen + TCP_MAXOLEN <= MHLEN, ("syncache: mbuf too small")); /* Create the IP+TCP header from scratch. */ m = m_gethdr(M_DONTWAIT, MT_DATA); if (m == NULL) return (ENOBUFS); #ifdef MAC mac_syncache_create_mbuf(sc->sc_label, m); #endif m->m_data += max_linkhdr; m->m_len = tlen; m->m_pkthdr.len = tlen; m->m_pkthdr.rcvif = NULL; #ifdef INET6 if (sc->sc_inc.inc_flags & INC_ISIPV6) { ip6 = mtod(m, struct ip6_hdr *); ip6->ip6_vfc = IPV6_VERSION; ip6->ip6_nxt = IPPROTO_TCP; ip6->ip6_src = sc->sc_inc.inc6_laddr; ip6->ip6_dst = sc->sc_inc.inc6_faddr; ip6->ip6_plen = htons(tlen - hlen); /* ip6_hlim is set after checksum */ ip6->ip6_flow &= ~IPV6_FLOWLABEL_MASK; ip6->ip6_flow |= sc->sc_flowlabel; th = (struct tcphdr *)(ip6 + 1); } else #endif { ip = mtod(m, struct ip *); ip->ip_v = IPVERSION; ip->ip_hl = sizeof(struct ip) >> 2; ip->ip_len = tlen; ip->ip_id = 0; ip->ip_off = 0; ip->ip_sum = 0; ip->ip_p = IPPROTO_TCP; ip->ip_src = sc->sc_inc.inc_laddr; ip->ip_dst = sc->sc_inc.inc_faddr; ip->ip_ttl = sc->sc_ip_ttl; ip->ip_tos = sc->sc_ip_tos; /* * See if we should do MTU discovery.
Route lookups are * expensive, so we will only unset the DF bit if either: * * 1) path_mtu_discovery is disabled, or * 2) the SCF_UNREACH flag has been set */ if (V_path_mtu_discovery && ((sc->sc_flags & SCF_UNREACH) == 0)) ip->ip_off |= IP_DF; th = (struct tcphdr *)(ip + 1); } th->th_sport = sc->sc_inc.inc_lport; th->th_dport = sc->sc_inc.inc_fport; th->th_seq = htonl(sc->sc_iss); th->th_ack = htonl(sc->sc_irs + 1); th->th_off = sizeof(struct tcphdr) >> 2; th->th_x2 = 0; th->th_flags = TH_SYN|TH_ACK; th->th_win = htons(sc->sc_wnd); th->th_urp = 0; if (sc->sc_flags & SCF_ECN) { th->th_flags |= TH_ECE; TCPSTAT_INC(tcps_ecn_shs); } /* Tack on the TCP options. */ if ((sc->sc_flags & SCF_NOOPT) == 0) { to.to_flags = 0; to.to_mss = mssopt; to.to_flags = TOF_MSS; if (sc->sc_flags & SCF_WINSCALE) { to.to_wscale = sc->sc_requested_r_scale; to.to_flags |= TOF_SCALE; } if (sc->sc_flags & SCF_TIMESTAMP) { /* Virgin timestamp or TCP cookie enhanced one. */ to.to_tsval = sc->sc_ts; to.to_tsecr = sc->sc_tsreflect; to.to_flags |= TOF_TS; } if (sc->sc_flags & SCF_SACK) to.to_flags |= TOF_SACKPERM; #ifdef TCP_SIGNATURE if (sc->sc_flags & SCF_SIGNATURE) to.to_flags |= TOF_SIGNATURE; #endif optlen = tcp_addoptions(&to, (u_char *)(th + 1)); /* Adjust headers by option size. */ th->th_off = (sizeof(struct tcphdr) + optlen) >> 2; m->m_len += optlen; m->m_pkthdr.len += optlen; #ifdef TCP_SIGNATURE if (sc->sc_flags & SCF_SIGNATURE) tcp_signature_compute(m, 0, 0, optlen, to.to_signature, IPSEC_DIR_OUTBOUND); #endif #ifdef INET6 if (sc->sc_inc.inc_flags & INC_ISIPV6) ip6->ip6_plen = htons(ntohs(ip6->ip6_plen) + optlen); else #endif ip->ip_len += optlen; } else optlen = 0; #ifdef INET6 if (sc->sc_inc.inc_flags & INC_ISIPV6) { th->th_sum = 0; th->th_sum = in6_cksum(m, IPPROTO_TCP, hlen, tlen + optlen - hlen); ip6->ip6_hlim = in6_selecthlim(NULL, NULL); error = ip6_output(m, NULL, NULL, 0, NULL, NULL, NULL); } else #endif { th->th_sum = in_pseudo(ip->ip_src.s_addr, ip->ip_dst.s_addr, htons(tlen + optlen - hlen + IPPROTO_TCP)); m->m_pkthdr.csum_flags = CSUM_TCP; m->m_pkthdr.csum_data = offsetof(struct tcphdr, th_sum); error = ip_output(m, sc->sc_ipopts, NULL, 0, NULL, NULL); } return (error); } void syncache_add(struct in_conninfo *inc, struct tcpopt *to, struct tcphdr *th, struct inpcb *inp, struct socket **lsop, struct mbuf *m) { _syncache_add(inc, to, th, inp, lsop, m, NULL, NULL); } void -tcp_offload_syncache_add(struct in_conninfo *inc, struct tcpopt *to, +tcp_offload_syncache_add(struct in_conninfo *inc, struct toeopt *toeo, struct tcphdr *th, struct inpcb *inp, struct socket **lsop, struct toe_usrreqs *tu, void *toepcb) { INIT_VNET_INET(curvnet); + struct tcpopt to; + bzero(&to, sizeof(struct tcpopt)); + to.to_mss = toeo->to_mss; + to.to_wscale = toeo->to_wscale; + to.to_flags = toeo->to_flags; + INP_INFO_WLOCK(&V_tcbinfo); INP_WLOCK(inp); - _syncache_add(inc, to, th, inp, lsop, NULL, tu, toepcb); + + _syncache_add(inc, &to, th, inp, lsop, NULL, tu, toepcb); } /* * The purpose of SYN cookies is to avoid keeping track of all SYN's we * receive and to be able to handle SYN floods from bogus source addresses * (where we will never receive any reply). SYN floods try to exhaust all * our memory and available slots in the SYN cache table to cause a denial * of service to legitimate users of the local host.
* * The idea of SYN cookies is to encode and include all necessary information * about the connection setup state within the SYN-ACK we send back and thus * to get along without keeping any local state until the ACK to the SYN-ACK * arrives (if ever). Everything we need to know should be available from * the information we encoded in the SYN-ACK. * * More information about the theory behind SYN cookies and their first * discussion and specification can be found at: * http://cr.yp.to/syncookies.html (overview) * http://cr.yp.to/syncookies/archive (gory details) * * This implementation extends the original idea and first implementation * of FreeBSD by using not only the initial sequence number field to store * information but also the timestamp field if present. This way we can * keep track of the entire state we need to know to recreate the session in * its original form. Almost all TCP speakers implement RFC1323 timestamps * these days. For those that do not we still have to live with the known * shortcomings of the ISN-only SYN cookies. * * Cookie layers: * * Initial sequence number we send: * 31|................................|0 * DDDDDDDDDDDDDDDDDDDDDDDDDMMMRRRP * D = MD5 Digest (first dword) * M = MSS index * R = Rotation of secret * P = Odd or Even secret * * The MD5 Digest is computed over the following parameters: * a) randomly rotated secret * b) struct in_conninfo containing the remote/local ip/port (IPv4&IPv6) * c) the received initial sequence number from remote host * d) the rotation offset and odd/even bit * * Timestamp we send: * 31|................................|0 * DDDDDDDDDDDDDDDDDDDDDDSSSSRRRRA5 * D = MD5 Digest (third dword) (only as filler) * S = Requested send window scale * R = Requested receive window scale * A = SACK allowed * 5 = TCP-MD5 enabled (not implemented yet) * XORed with MD5 Digest (fourth dword) * * The timestamp isn't cryptographically secure and doesn't need to be. * The double use of the MD5 digest dwords ties it to a specific remote/ * local host/port, remote initial sequence number and our local time * limited secret. A received timestamp is reverted (XORed) and then * the contained MD5 dword is compared to the computed one to ensure the * timestamp belongs to the SYN-ACK we sent. The other parameters may * have been tampered with but this isn't different from supplying bogus * values in the SYN in the first place. * * Some problems with SYN cookies remain however: * Consider the problem of a recreated (and retransmitted) cookie. If the * original SYN was accepted, the connection is established. The second * SYN is in flight, and if it arrives with an ISN that falls within the * receive window, the connection is killed. * * Notes: * A heuristic to determine when to accept syn cookies is not necessary. * An ACK flood would cause the syncookie verification to be attempted, * but a SYN flood causes syncookies to be generated. Both are of equal * cost, so there's no point in trying to optimize the ACK flood case. * Also, if you don't process certain ACKs for some reason, then all someone * would have to do is launch a SYN and ACK flood at the same time, which * would stop cookie verification and defeat the entire purpose of syncookies.
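 *
 * As a worked example of the ISN layout above (the values are made up;
 * the bit positions come from syncookie_generate() and
 * syncookie_lookup() below): with the even secret (P = 0), rotation
 * offset 5 (R = 101) and MSS index 6 (M = 110, i.e. 1460 from
 * tcp_sc_msstab), the low seven bits of the cookie are 1101010 (0x6a)
 * and the remaining bits are md5_buffer[0] << 7. The ACK side reverts
 * this, after subtracting 1 from th_ack, with off = (ack >> 1) & 0x7
 * and mss = (ack >> 4) & 0x7.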
*/ static int tcp_sc_msstab[] = { 0, 256, 468, 536, 996, 1452, 1460, 8960 }; static void syncookie_generate(struct syncache_head *sch, struct syncache *sc, u_int32_t *flowlabel) { INIT_VNET_INET(curvnet); MD5_CTX ctx; u_int32_t md5_buffer[MD5_DIGEST_LENGTH / sizeof(u_int32_t)]; u_int32_t data; u_int32_t *secbits; u_int off, pmss, mss; int i; SCH_LOCK_ASSERT(sch); /* Which of the two secrets to use. */ secbits = sch->sch_oddeven ? sch->sch_secbits_odd : sch->sch_secbits_even; /* Reseed secret if too old. */ if (sch->sch_reseed < time_uptime) { sch->sch_oddeven = sch->sch_oddeven ? 0 : 1; /* toggle */ secbits = sch->sch_oddeven ? sch->sch_secbits_odd : sch->sch_secbits_even; for (i = 0; i < SYNCOOKIE_SECRET_SIZE; i++) secbits[i] = arc4random(); sch->sch_reseed = time_uptime + SYNCOOKIE_LIFETIME; } /* Secret rotation offset. */ off = sc->sc_iss & 0x7; /* iss was randomized before */ /* Maximum segment size calculation. */ pmss = max( min(sc->sc_peer_mss, tcp_mssopt(&sc->sc_inc)), V_tcp_minmss); for (mss = sizeof(tcp_sc_msstab) / sizeof(int) - 1; mss > 0; mss--) if (tcp_sc_msstab[mss] <= pmss) break; /* Fold parameters and MD5 digest into the ISN we will send. */ data = sch->sch_oddeven;/* odd or even secret, 1 bit */ data |= off << 1; /* secret offset, derived from iss, 3 bits */ data |= mss << 4; /* mss, 3 bits */ MD5Init(&ctx); MD5Update(&ctx, ((u_int8_t *)secbits) + off, SYNCOOKIE_SECRET_SIZE * sizeof(*secbits) - off); MD5Update(&ctx, secbits, off); MD5Update(&ctx, &sc->sc_inc, sizeof(sc->sc_inc)); MD5Update(&ctx, &sc->sc_irs, sizeof(sc->sc_irs)); MD5Update(&ctx, &data, sizeof(data)); MD5Final((u_int8_t *)&md5_buffer, &ctx); data |= (md5_buffer[0] << 7); sc->sc_iss = data; #ifdef INET6 *flowlabel = md5_buffer[1] & IPV6_FLOWLABEL_MASK; #endif /* Additional parameters are stored in the timestamp if present. */ if (sc->sc_flags & SCF_TIMESTAMP) { data = ((sc->sc_flags & SCF_SIGNATURE) ? 1 : 0); /* TCP-MD5, 1 bit */ data |= ((sc->sc_flags & SCF_SACK) ? 1 : 0) << 1; /* SACK, 1 bit */ data |= sc->sc_requested_s_scale << 2; /* SWIN scale, 4 bits */ data |= sc->sc_requested_r_scale << 6; /* RWIN scale, 4 bits */ data |= md5_buffer[2] << 10; /* more digest bits */ data ^= md5_buffer[3]; sc->sc_ts = data; sc->sc_tsoff = data - ticks; /* after XOR */ } TCPSTAT_INC(tcps_sc_sendcookie); } static struct syncache * syncookie_lookup(struct in_conninfo *inc, struct syncache_head *sch, struct syncache *sc, struct tcpopt *to, struct tcphdr *th, struct socket *so) { INIT_VNET_INET(curvnet); MD5_CTX ctx; u_int32_t md5_buffer[MD5_DIGEST_LENGTH / sizeof(u_int32_t)]; u_int32_t data = 0; u_int32_t *secbits; tcp_seq ack, seq; int off, mss, wnd, flags; SCH_LOCK_ASSERT(sch); /* * Pull information out of SYN-ACK/ACK and * revert sequence number advances. */ ack = th->th_ack - 1; seq = th->th_seq - 1; off = (ack >> 1) & 0x7; mss = (ack >> 4) & 0x7; flags = ack & 0x7f; /* Which of the two secrets to use. */ secbits = (flags & 0x1) ? sch->sch_secbits_odd : sch->sch_secbits_even; /* * The secret wasn't updated for the lifetime of a syncookie, * so this SYN-ACK/ACK is either too old (replay) or totally bogus. */ if (sch->sch_reseed + SYNCOOKIE_LIFETIME < time_uptime) { return (NULL); } /* Recompute the digest so we can compare it. 
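 * The digest only matches if the same inputs are hashed in the same
 * order as in syncookie_generate(): the rotated secret, the
 * in_conninfo, the peer's original ISN and the recovered flags word.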
*/ MD5Init(&ctx); MD5Update(&ctx, ((u_int8_t *)secbits) + off, SYNCOOKIE_SECRET_SIZE * sizeof(*secbits) - off); MD5Update(&ctx, secbits, off); MD5Update(&ctx, inc, sizeof(*inc)); MD5Update(&ctx, &seq, sizeof(seq)); MD5Update(&ctx, &flags, sizeof(flags)); MD5Final((u_int8_t *)&md5_buffer, &ctx); /* Does the digest part of our ACK'ed ISS match? */ if ((ack & (~0x7f)) != (md5_buffer[0] << 7)) return (NULL); /* Does the digest part of our reflected timestamp match? */ if (to->to_flags & TOF_TS) { data = md5_buffer[3] ^ to->to_tsecr; if ((data & (~0x3ff)) != (md5_buffer[2] << 10)) return (NULL); } /* Fill in the syncache values. */ bcopy(inc, &sc->sc_inc, sizeof(struct in_conninfo)); sc->sc_ipopts = NULL; sc->sc_irs = seq; sc->sc_iss = ack; #ifdef INET6 if (inc->inc_flags & INC_ISIPV6) { if (sotoinpcb(so)->inp_flags & IN6P_AUTOFLOWLABEL) sc->sc_flowlabel = md5_buffer[1] & IPV6_FLOWLABEL_MASK; } else #endif { sc->sc_ip_ttl = sotoinpcb(so)->inp_ip_ttl; sc->sc_ip_tos = sotoinpcb(so)->inp_ip_tos; } /* Additional parameters that were encoded in the timestamp. */ if (data) { sc->sc_flags |= SCF_TIMESTAMP; sc->sc_tsreflect = to->to_tsval; sc->sc_ts = to->to_tsecr; sc->sc_tsoff = to->to_tsecr - ticks; sc->sc_flags |= (data & 0x1) ? SCF_SIGNATURE : 0; sc->sc_flags |= ((data >> 1) & 0x1) ? SCF_SACK : 0; sc->sc_requested_s_scale = min((data >> 2) & 0xf, TCP_MAX_WINSHIFT); sc->sc_requested_r_scale = min((data >> 6) & 0xf, TCP_MAX_WINSHIFT); if (sc->sc_requested_s_scale || sc->sc_requested_r_scale) sc->sc_flags |= SCF_WINSCALE; } else sc->sc_flags |= SCF_NOOPT; wnd = sbspace(&so->so_rcv); wnd = imax(wnd, 0); wnd = imin(wnd, TCP_MAXWIN); sc->sc_wnd = wnd; sc->sc_rxmits = 0; sc->sc_peer_mss = tcp_sc_msstab[mss]; TCPSTAT_INC(tcps_sc_recvcookie); return (sc); } /* * Returns the current number of syncache entries. This number * will probably change before you get around to calling * syncache_pcblist. */ int syncache_pcbcount(void) { INIT_VNET_INET(curvnet); struct syncache_head *sch; int count, i; for (count = 0, i = 0; i < V_tcp_syncache.hashsize; i++) { /* No need to lock for a read. */ sch = &V_tcp_syncache.hashbase[i]; count += sch->sch_length; } return count; } /* * Exports the syncache entries to userland so that netstat can display * them alongside the other sockets. This function is intended to be * called only from tcp_pcblist. * * Due to concurrency on an active system, the number of pcbs exported * may have no relation to max_pcbs. max_pcbs merely indicates the * amount of space the caller allocated for this function to use.
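 *
 * The calling pattern is roughly (as in tcp_pcblist(), error handling
 * omitted):
 *
 *	n = syncache_pcbcount();
 *	error = syncache_pcblist(req, n, &pcbs_exported);
 *
 * and *pcbs_exported, not n, is the number of entries actually
 * copied out.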
 */
int
syncache_pcblist(struct sysctl_req *req, int max_pcbs, int *pcbs_exported)
{
	INIT_VNET_INET(curvnet);
	struct xtcpcb xt;
	struct syncache *sc;
	struct syncache_head *sch;
	int count, error, i;

	for (count = 0, error = 0, i = 0; i < V_tcp_syncache.hashsize; i++) {
		sch = &V_tcp_syncache.hashbase[i];
		SCH_LOCK(sch);
		TAILQ_FOREACH(sc, &sch->sch_bucket, sc_hash) {
			if (count >= max_pcbs) {
				SCH_UNLOCK(sch);
				goto exit;
			}
			if (cr_cansee(req->td->td_ucred, sc->sc_cred) != 0)
				continue;
			bzero(&xt, sizeof(xt));
			xt.xt_len = sizeof(xt);
			if (sc->sc_inc.inc_flags & INC_ISIPV6)
				xt.xt_inp.inp_vflag = INP_IPV6;
			else
				xt.xt_inp.inp_vflag = INP_IPV4;
			bcopy(&sc->sc_inc, &xt.xt_inp.inp_inc,
			    sizeof (struct in_conninfo));
			xt.xt_tp.t_inpcb = &xt.xt_inp;
			xt.xt_tp.t_state = TCPS_SYN_RECEIVED;
			xt.xt_socket.xso_protocol = IPPROTO_TCP;
			xt.xt_socket.xso_len = sizeof (struct xsocket);
			xt.xt_socket.so_type = SOCK_STREAM;
			xt.xt_socket.so_state = SS_ISCONNECTING;
			error = SYSCTL_OUT(req, &xt, sizeof xt);
			if (error) {
				SCH_UNLOCK(sch);
				goto exit;
			}
			count++;
		}
		SCH_UNLOCK(sch);
	}
exit:
	*pcbs_exported = count;
	return error;
}

Index: head/sys/netinet/tcp_syncache.h
===================================================================
--- head/sys/netinet/tcp_syncache.h	(revision 195653)
+++ head/sys/netinet/tcp_syncache.h	(revision 195654)
@@ -1,125 +1,127 @@
/*-
 * Copyright (c) 1982, 1986, 1993, 1994, 1995
 *	The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)tcp_var.h	8.4 (Berkeley) 5/24/95
 * $FreeBSD$
 */

#ifndef _NETINET_TCP_SYNCACHE_H_
#define _NETINET_TCP_SYNCACHE_H_
#ifdef _KERNEL

+struct toeopt;
+
 void	 syncache_init(void);
 #ifdef VIMAGE
 void	 syncache_destroy(void);
 #endif
 void	 syncache_unreach(struct in_conninfo *, struct tcphdr *);
 int	 syncache_expand(struct in_conninfo *, struct tcpopt *,
	     struct tcphdr *, struct socket **, struct mbuf *);
-int	 tcp_offload_syncache_expand(struct in_conninfo *inc, struct tcpopt *to,
+int	 tcp_offload_syncache_expand(struct in_conninfo *inc, struct toeopt *toeo,
	     struct tcphdr *th, struct socket **lsop, struct mbuf *m);
 void	 syncache_add(struct in_conninfo *, struct tcpopt *,
	     struct tcphdr *, struct inpcb *, struct socket **, struct mbuf *);
-void	 tcp_offload_syncache_add(struct in_conninfo *, struct tcpopt *,
+void	 tcp_offload_syncache_add(struct in_conninfo *, struct toeopt *,
	     struct tcphdr *, struct inpcb *, struct socket **,
	     struct toe_usrreqs *tu, void *toepcb);
 void	 syncache_chkrst(struct in_conninfo *, struct tcphdr *);
 void	 syncache_badack(struct in_conninfo *);
 int	 syncache_pcbcount(void);
 int	 syncache_pcblist(struct sysctl_req *req, int max_pcbs,
	     int *pcbs_exported);

struct syncache {
	TAILQ_ENTRY(syncache)	sc_hash;
	struct		in_conninfo sc_inc;	/* addresses */
	int		sc_rxttime;		/* retransmit time */
	u_int16_t	sc_rxmits;		/* retransmit counter */
	u_int32_t	sc_tsreflect;		/* timestamp to reflect */
	u_int32_t	sc_ts;			/* our timestamp to send */
	u_int32_t	sc_tsoff;		/* ts offset w/ syncookies */
	u_int32_t	sc_flowlabel;		/* IPv6 flowlabel */
	tcp_seq		sc_irs;			/* seq from peer */
	tcp_seq		sc_iss;			/* our ISS */
	struct		mbuf *sc_ipopts;	/* source route */
	u_int16_t	sc_peer_mss;		/* peer's MSS */
	u_int16_t	sc_wnd;			/* advertised window */
	u_int8_t	sc_ip_ttl;		/* IPv4 TTL */
	u_int8_t	sc_ip_tos;		/* IPv4 TOS */
	u_int8_t	sc_requested_s_scale:4,
			sc_requested_r_scale:4;
	u_int16_t	sc_flags;
#ifndef TCP_OFFLOAD_DISABLE
	struct toe_usrreqs *sc_tu;		/* TOE operations */
	void		*sc_toepcb;		/* TOE protocol block */
#endif
	struct label	*sc_label;		/* MAC label reference */
	struct ucred	*sc_cred;		/* cred cache for jail checks */
};

/*
 * Flags for the sc_flags field.
 */
#define	SCF_NOOPT	0x01			/* no TCP options */
#define	SCF_WINSCALE	0x02			/* negotiated window scaling */
#define	SCF_TIMESTAMP	0x04			/* negotiated timestamps */
						/* MSS is implicit */
#define	SCF_UNREACH	0x10			/* icmp unreachable received */
#define	SCF_SIGNATURE	0x20			/* send MD5 digests */
#define	SCF_SACK	0x80			/* send SACK option */
#define	SCF_ECN		0x100			/* send ECN setup packet */

#define	SYNCOOKIE_SECRET_SIZE	8	/* dwords */
#define	SYNCOOKIE_LIFETIME	16	/* seconds */

struct syncache_head {
	struct vnet	*sch_vnet;
	struct mtx	sch_mtx;
	TAILQ_HEAD(sch_head, syncache)	sch_bucket;
	struct callout	sch_timer;
	int		sch_nextc;
	u_int		sch_length;
	u_int		sch_oddeven;
	u_int32_t	sch_secbits_odd[SYNCOOKIE_SECRET_SIZE];
	u_int32_t	sch_secbits_even[SYNCOOKIE_SECRET_SIZE];
	u_int		sch_reseed;		/* time_uptime, seconds */
};

struct tcp_syncache {
	struct	syncache_head *hashbase;
	uma_zone_t zone;
	u_int	hashsize;
	u_int	hashmask;
	u_int	bucket_limit;
	u_int	cache_count;		/* XXX: unprotected */
	u_int	cache_limit;
	u_int	rexmt_limit;
	u_int	hash_secret;
};

#endif /* _KERNEL */
#endif /* !_NETINET_TCP_SYNCACHE_H_ */

Index: head/sys/netinet/tcp_var.h
===================================================================
--- head/sys/netinet/tcp_var.h	(revision 195653)
+++ head/sys/netinet/tcp_var.h	(revision 195654)
@@ -1,668 +1,668 @@
/*-
 * Copyright (c) 1982, 1986, 1993, 1994, 1995
 *	The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)tcp_var.h	8.4 (Berkeley) 5/24/95
 * $FreeBSD$
 */

#ifndef _NETINET_TCP_VAR_H_
#define _NETINET_TCP_VAR_H_

#include

struct vnet;

/*
 * Kernel variables for tcp.
 */
#ifdef VIMAGE_GLOBALS
extern int	tcp_do_rfc1323;
#endif

/* TCP segment queue entry */
struct tseg_qent {
	LIST_ENTRY(tseg_qent) tqe_q;
	int	tqe_len;		/* TCP segment data length */
	struct	tcphdr *tqe_th;		/* a pointer to tcp header */
	struct	mbuf	*tqe_m;		/* mbuf contains packet */
};
LIST_HEAD(tsegqe_head, tseg_qent);
#ifdef VIMAGE_GLOBALS
extern int	tcp_reass_qsize;
#endif
extern struct uma_zone *tcp_reass_zone;

struct sackblk {
	tcp_seq start;		/* start seq no. of sack block */
	tcp_seq end;		/* end seq no. */
};

struct sackhole {
	tcp_seq start;		/* start seq no. of hole */
	tcp_seq end;		/* end seq no. */
	tcp_seq rxmit;		/* next seq. no in hole to be retransmitted */
	TAILQ_ENTRY(sackhole) scblink;	/* scoreboard linkage */
};

struct sackhint {
	struct sackhole	*nexthole;
	int		sack_bytes_rexmit;
	int		ispare;		/* explicit pad for 64bit alignment */

	uint64_t	_pad[2];	/* 1 sacked_bytes, 1 TBD */
};

struct tcptemp {
	u_char	tt_ipgen[40]; /* the size must be of max ip header, now IPv6 */
	struct	tcphdr tt_t;
};

#define	tcp6cb		tcpcb  /* for KAME src sync over BSD*'s */

/* Neighbor Discovery, Neighbor Unreachability Detection Upper layer hint. */
#ifdef INET6
#define	ND6_HINT(tp) \
do { \
	if ((tp) && (tp)->t_inpcb && \
	    ((tp)->t_inpcb->inp_vflag & INP_IPV6) != 0) \
		nd6_nud_hint(NULL, NULL, 0); \
} while (0)
#else
#define	ND6_HINT(tp)
#endif

/*
 * Tcp control block, one per tcp; fields:
 * Organized for 16 byte cacheline efficiency.
 */
struct tcpcb {
	struct	tsegqe_head t_segq;	/* segment reassembly queue */
	void	*t_pspare[2];		/* new reassembly queue */
	int	t_segqlen;		/* segment reassembly queue length */
	int	t_dupacks;		/* consecutive dup acks recd */

	struct	tcp_timer *t_timers;	/* All the TCP timers in one struct */

	struct	inpcb *t_inpcb;		/* back pointer to internet pcb */
	int	t_state;		/* state of this connection */
	u_int	t_flags;

	struct	vnet *t_vnet;		/* back pointer to parent vnet */

	tcp_seq	snd_una;		/* send unacknowledged */
	tcp_seq	snd_max;		/* highest sequence number sent;
					 * used to recognize retransmits */
	tcp_seq	snd_nxt;		/* send next */
	tcp_seq	snd_up;			/* send urgent pointer */

	tcp_seq	snd_wl1;		/* window update seg seq number */
	tcp_seq	snd_wl2;		/* window update seg ack number */
	tcp_seq	iss;			/* initial send sequence number */
	tcp_seq	irs;			/* initial receive sequence number */

	tcp_seq	rcv_nxt;		/* receive next */
	tcp_seq	rcv_adv;		/* advertised window */
	u_long	rcv_wnd;		/* receive window */
	tcp_seq	rcv_up;			/* receive urgent pointer */

	u_long	snd_wnd;		/* send window */
	u_long	snd_cwnd;		/* congestion-controlled window */
	u_long	snd_bwnd;		/* bandwidth-controlled window */
	u_long	snd_ssthresh;		/* snd_cwnd size threshold for
					 * slow start exponential to
					 * linear switch */
	u_long	snd_bandwidth;		/* calculated bandwidth or 0 */
	tcp_seq	snd_recover;		/* for use in NewReno Fast Recovery */

	u_int	t_maxopd;		/* mss plus options */

	u_int	t_rcvtime;		/* inactivity time */
	u_int	t_starttime;		/* time connection was established */
	u_int	t_rtttime;		/* RTT measurement start time */
	tcp_seq	t_rtseq;		/* sequence number being timed */

	u_int	t_bw_rtttime;		/* used for bandwidth calculation */
	tcp_seq	t_bw_rtseq;		/* used for bandwidth calculation */

	int	t_rxtcur;		/* current retransmit value (ticks) */
	u_int	t_maxseg;		/* maximum segment size */
	int	t_srtt;			/* smoothed round-trip time */
	int	t_rttvar;		/* variance in round-trip time */

	int	t_rxtshift;		/* log(2) of rexmt exp. backoff */
	u_int	t_rttmin;		/* minimum rtt allowed */
	u_int	t_rttbest;		/* best rtt we've seen */
	u_long	t_rttupdated;		/* number of times rtt sampled */
	u_long	max_sndwnd;		/* largest window peer has offered */

	int	t_softerror;		/* possible error not yet reported */
/* out-of-band data */
	char	t_oobflags;		/* have some */
	char	t_iobc;			/* input character */
/* RFC 1323 variables */
	u_char	snd_scale;		/* window scaling for send window */
	u_char	rcv_scale;		/* window scaling for recv window */
	u_char	request_r_scale;	/* pending window scaling */
	u_int32_t  ts_recent;		/* timestamp echo data */
	u_int	ts_recent_age;		/* when last updated */
	u_int32_t  ts_offset;		/* our timestamp offset */

	tcp_seq	last_ack_sent;
/* experimental */
	u_long	snd_cwnd_prev;		/* cwnd prior to retransmit */
	u_long	snd_ssthresh_prev;	/* ssthresh prior to retransmit */
	tcp_seq	snd_recover_prev;	/* snd_recover prior to retransmit */
	u_int	t_badrxtwin;		/* window for retransmit recovery */
	u_char	snd_limited;		/* segments limited transmitted */
/* SACK related state */
	int	snd_numholes;		/* number of holes seen by sender */
	TAILQ_HEAD(sackhole_head, sackhole) snd_holes;
					/* SACK scoreboard (sorted) */
	tcp_seq	snd_fack;		/* last seq number(+1) sack'd by rcv'r*/
	int	rcv_numsacks;		/* # distinct sack blks present */
	struct sackblk sackblks[MAX_SACK_BLKS]; /* seq nos. of sack blocks */
	tcp_seq	sack_newdata;		/* New data xmitted in this recovery
					   episode starts at this seq number */
	struct sackhint	sackhint;	/* SACK scoreboard hint */
	int	t_rttlow;		/* smallest observed RTT */
	u_int32_t	rfbuf_ts;	/* recv buffer autoscaling timestamp */
	int	rfbuf_cnt;		/* recv buffer autoscaling byte count */
	struct toe_usrreqs *t_tu;	/* offload operations vector */
	void	*t_toe;			/* TOE pcb pointer */
	int	t_bytes_acked;		/* # bytes acked during current RTT */

	int	t_ispare;		/* explicit pad for 64bit alignment */
	void	*t_pspare2[6];		/* 2 CC / 4 TBD */
	uint64_t _pad[12];		/* 7 UTO, 5 TBD (1-2 CC/RTT?) */
};

/*
 * Flags and utility macros for the t_flags field.
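 *
 * A minimal usage sketch (the call site is hypothetical, not part of
 * this change): congestion control code gates recovery entry on the
 * flag via the utility macros defined below, e.g.
 *
 *	if (!IN_FASTRECOVERY(tp))
 *		ENTER_FASTRECOVERY(tp);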
 */
#define	TF_ACKNOW	0x000001	/* ack peer immediately */
#define	TF_DELACK	0x000002	/* ack, but try to delay it */
#define	TF_NODELAY	0x000004	/* don't delay packets to coalesce */
#define	TF_NOOPT	0x000008	/* don't use tcp options */
#define	TF_SENTFIN	0x000010	/* have sent FIN */
#define	TF_REQ_SCALE	0x000020	/* have/will request window scaling */
#define	TF_RCVD_SCALE	0x000040	/* other side has requested scaling */
#define	TF_REQ_TSTMP	0x000080	/* have/will request timestamps */
#define	TF_RCVD_TSTMP	0x000100	/* a timestamp was received in SYN */
#define	TF_SACK_PERMIT	0x000200	/* other side said I could SACK */
#define	TF_NEEDSYN	0x000400	/* send SYN (implicit state) */
#define	TF_NEEDFIN	0x000800	/* send FIN (implicit state) */
#define	TF_NOPUSH	0x001000	/* don't push */
#define	TF_MORETOCOME	0x010000	/* More data to be appended to sock */
#define	TF_LQ_OVERFLOW	0x020000	/* listen queue overflow */
#define	TF_LASTIDLE	0x040000	/* connection was previously idle */
#define	TF_RXWIN0SENT	0x080000	/* sent a receiver win 0 in response */
#define	TF_FASTRECOVERY	0x100000	/* in NewReno Fast Recovery */
#define	TF_WASFRECOVERY	0x200000	/* was in NewReno Fast Recovery */
#define	TF_SIGNATURE	0x400000	/* require MD5 digests (RFC2385) */
#define	TF_FORCEDATA	0x800000	/* force out a byte */
#define	TF_TSO		0x1000000	/* TSO enabled on this connection */
#define	TF_TOE		0x2000000	/* this connection is offloaded */
#define	TF_ECN_PERMIT	0x4000000	/* connection ECN-ready */
#define	TF_ECN_SND_CWR	0x8000000	/* ECN CWR in queue */
#define	TF_ECN_SND_ECE	0x10000000	/* ECN ECE in queue */

#define	IN_FASTRECOVERY(tp)	(tp->t_flags & TF_FASTRECOVERY)
#define	ENTER_FASTRECOVERY(tp)	tp->t_flags |= TF_FASTRECOVERY
#define	EXIT_FASTRECOVERY(tp)	tp->t_flags &= ~TF_FASTRECOVERY

/*
 * Flags for the t_oobflags field.
 */
#define	TCPOOB_HAVEDATA	0x01
#define	TCPOOB_HADDATA	0x02

#ifdef TCP_SIGNATURE
/*
 * Defines which are needed by the xform_tcp module and tcp_[in|out]put
 * for SADB verification and lookup.
 */
#define	TCP_SIGLEN	16	/* length of computed digest in bytes */
#define	TCP_KEYLEN_MIN	1	/* minimum length of TCP-MD5 key */
#define	TCP_KEYLEN_MAX	80	/* maximum length of TCP-MD5 key */
/*
 * Only a single SA per host may be specified at this time. An SPI is
 * needed in order for the KEY_ALLOCSA() lookup to work.
 */
#define	TCP_SIG_SPI	0x1000
#endif /* TCP_SIGNATURE */

/*
 * Structure to hold TCP options that are only used during segment
 * processing (in tcp_input), but not held in the tcpcb.
 * It's basically used to reduce the number of parameters
 * to tcp_dooptions and tcp_addoptions.
 * The binary order of the to_flags is relevant for packing of the
 * options in tcp_addoptions.
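 *
 * A hedged sketch of typical consumption (the call site is
 * illustrative): a field below is only meaningful once its TOF_* bit
 * has been checked, e.g.
 *
 *	if (to->to_flags & TOF_TS)
 *		tsecr = to->to_tsecr;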
 */
struct tcpopt {
-	u_long		to_flags;	/* which options are present */
+	u_int64_t	to_flags;	/* which options are present */
#define	TOF_MSS		0x0001		/* maximum segment size */
#define	TOF_SCALE	0x0002		/* window scaling */
#define	TOF_SACKPERM	0x0004		/* SACK permitted */
#define	TOF_TS		0x0010		/* timestamp */
#define	TOF_SIGNATURE	0x0040		/* TCP-MD5 signature option (RFC2385) */
#define	TOF_SACK	0x0080		/* Peer sent SACK option */
#define	TOF_MAXOPT	0x0100
	u_int32_t	to_tsval;	/* new timestamp */
	u_int32_t	to_tsecr;	/* reflected timestamp */
+	u_char		*to_sacks;	/* pointer to the first SACK blocks */
+	u_char		*to_signature;	/* pointer to the TCP-MD5 signature */
	u_int16_t	to_mss;		/* maximum segment size */
	u_int8_t	to_wscale;	/* window scaling */
	u_int8_t	to_nsacks;	/* number of SACK blocks */
-	u_char		*to_sacks;	/* pointer to the first SACK blocks */
-	u_char		*to_signature;	/* pointer to the TCP-MD5 signature */
};

/*
 * Flags for tcp_dooptions.
 */
#define	TO_SYN		0x01		/* parse SYN-only options */

struct hc_metrics_lite {	/* must stay in sync with hc_metrics */
	u_long	rmx_mtu;	/* MTU for this path */
	u_long	rmx_ssthresh;	/* outbound gateway buffer limit */
	u_long	rmx_rtt;	/* estimated round trip time */
	u_long	rmx_rttvar;	/* estimated rtt variance */
	u_long	rmx_bandwidth;	/* estimated bandwidth */
	u_long	rmx_cwnd;	/* congestion window */
	u_long	rmx_sendpipe;	/* outbound delay-bandwidth product */
	u_long	rmx_recvpipe;	/* inbound delay-bandwidth product */
};

#ifndef _NETINET_IN_PCB_H_
struct in_conninfo;
#endif /* _NETINET_IN_PCB_H_ */

struct tcptw {
	struct inpcb	*tw_inpcb;	/* XXX back pointer to internet pcb */
	tcp_seq		snd_nxt;
	tcp_seq		rcv_nxt;
	tcp_seq		iss;
	tcp_seq		irs;
	u_short		last_win;	/* cached window value */
	u_short		tw_so_options;	/* copy of so_options */
	struct ucred	*tw_cred;	/* user credentials */
	u_int32_t	t_recent;
	u_int32_t	ts_offset;	/* our timestamp offset */
	u_int		t_starttime;
	int		tw_time;
	TAILQ_ENTRY(tcptw) tw_2msl;
};

#define	intotcpcb(ip)	((struct tcpcb *)(ip)->inp_ppcb)
#define	intotw(ip)	((struct tcptw *)(ip)->inp_ppcb)
#define	sototcpcb(so)	(intotcpcb(sotoinpcb(so)))

/*
 * The smoothed round-trip time and estimated variance
 * are stored as fixed point numbers scaled by the values below.
 * For convenience, these scales are also used in smoothing the average
 * (smoothed = (1/scale)sample + ((scale-1)/scale)smoothed).
 * With these scales, srtt has 3 bits to the right of the binary point,
 * and thus an "ALPHA" of 0.875.  rttvar has 2 bits to the right of the
 * binary point, and is smoothed with an ALPHA of 0.75.
 */
#define	TCP_RTT_SCALE		32	/* multiplier for srtt; 3 bits frac. */
#define	TCP_RTT_SHIFT		5	/* shift for srtt; 3 bits frac. */
#define	TCP_RTTVAR_SCALE	16	/* multiplier for rttvar; 2 bits */
#define	TCP_RTTVAR_SHIFT	4	/* shift for rttvar; 2 bits */
#define	TCP_DELTA_SHIFT		2	/* see tcp_input.c */

/*
 * The initial retransmission should happen at rtt + 4 * rttvar.
 * Because of the way we do the smoothing, srtt and rttvar
 * will each average +1/2 tick of bias.  When we compute
 * the retransmit timer, we want 1/2 tick of rounding and
 * 1 extra tick because of +-1/2 tick uncertainty in the
 * firing of the timer.  The bias will give us exactly the
 * 1.5 tick we need.  But, because the bias is
 * statistical, we have to test that we don't drop below
 * the minimum feasible timer (which is 2 ticks).
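 * As a numeric sketch (the values are chosen for illustration): with
 * t_srtt = 64 (2 ticks at scale TCP_RTT_SCALE) and t_rttvar = 16
 * (1 tick at scale TCP_RTTVAR_SCALE), the macro below computes
 * ((64 >> 3) + 16) >> 2 = 6 ticks, i.e. rtt + 4 * rttvar.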
 * This version of the macro adapted from a paper by Lawrence
 * Brakmo and Larry Peterson which outlines a problem caused
 * by insufficient precision in the original implementation,
 * which results in inappropriately large RTO values for very
 * fast networks.
 */
#define	TCP_REXMTVAL(tp) \
	max((tp)->t_rttmin, (((tp)->t_srtt >> (TCP_RTT_SHIFT - TCP_DELTA_SHIFT)) \
	  + (tp)->t_rttvar) >> TCP_DELTA_SHIFT)

/*
 * TCP statistics.
 * Many of these should be kept per connection,
 * but that's inconvenient at the moment.
 */
struct	tcpstat {
	u_long	tcps_connattempt;	/* connections initiated */
	u_long	tcps_accepts;		/* connections accepted */
	u_long	tcps_connects;		/* connections established */
	u_long	tcps_drops;		/* connections dropped */
	u_long	tcps_conndrops;		/* embryonic connections dropped */
	u_long	tcps_minmssdrops;	/* average minmss too low drops */
	u_long	tcps_closed;		/* conn. closed (includes drops) */
	u_long	tcps_segstimed;		/* segs where we tried to get rtt */
	u_long	tcps_rttupdated;	/* times we succeeded */
	u_long	tcps_delack;		/* delayed acks sent */
	u_long	tcps_timeoutdrop;	/* conn. dropped in rxmt timeout */
	u_long	tcps_rexmttimeo;	/* retransmit timeouts */
	u_long	tcps_persisttimeo;	/* persist timeouts */
	u_long	tcps_keeptimeo;		/* keepalive timeouts */
	u_long	tcps_keepprobe;		/* keepalive probes sent */
	u_long	tcps_keepdrops;		/* connections dropped in keepalive */

	u_long	tcps_sndtotal;		/* total packets sent */
	u_long	tcps_sndpack;		/* data packets sent */
	u_long	tcps_sndbyte;		/* data bytes sent */
	u_long	tcps_sndrexmitpack;	/* data packets retransmitted */
	u_long	tcps_sndrexmitbyte;	/* data bytes retransmitted */
	u_long	tcps_sndrexmitbad;	/* unnecessary packet retransmissions */
	u_long	tcps_sndacks;		/* ack-only packets sent */
	u_long	tcps_sndprobe;		/* window probes sent */
	u_long	tcps_sndurg;		/* packets sent with URG only */
	u_long	tcps_sndwinup;		/* window update-only packets sent */
	u_long	tcps_sndctrl;		/* control (SYN|FIN|RST) packets sent */

	u_long	tcps_rcvtotal;		/* total packets received */
	u_long	tcps_rcvpack;		/* packets received in sequence */
	u_long	tcps_rcvbyte;		/* bytes received in sequence */
	u_long	tcps_rcvbadsum;		/* packets received with cksum errs */
	u_long	tcps_rcvbadoff;		/* packets received with bad offset */
	u_long	tcps_rcvmemdrop;	/* packets dropped for lack of memory */
	u_long	tcps_rcvshort;		/* packets received too short */
	u_long	tcps_rcvduppack;	/* duplicate-only packets received */
	u_long	tcps_rcvdupbyte;	/* duplicate-only bytes received */
	u_long	tcps_rcvpartduppack;	/* packets with some duplicate data */
	u_long	tcps_rcvpartdupbyte;	/* dup. bytes in part-dup. packets */
	u_long	tcps_rcvoopack;		/* out-of-order packets received */
	u_long	tcps_rcvoobyte;		/* out-of-order bytes received */
	u_long	tcps_rcvpackafterwin;	/* packets with data after window */
	u_long	tcps_rcvbyteafterwin;	/* bytes rcvd after window */
	u_long	tcps_rcvafterclose;	/* packets rcvd after "close" */
	u_long	tcps_rcvwinprobe;	/* rcvd window probe packets */
	u_long	tcps_rcvdupack;		/* rcvd duplicate acks */
	u_long	tcps_rcvacktoomuch;	/* rcvd acks for unsent data */
	u_long	tcps_rcvackpack;	/* rcvd ack packets */
	u_long	tcps_rcvackbyte;	/* bytes acked by rcvd acks */
	u_long	tcps_rcvwinupd;		/* rcvd window update packets */
	u_long	tcps_pawsdrop;		/* segments dropped due to PAWS */
	u_long	tcps_predack;		/* times hdr predict ok for acks */
	u_long	tcps_preddat;		/* times hdr predict ok for data pkts */
	u_long	tcps_pcbcachemiss;
	u_long	tcps_cachedrtt;		/* times cached RTT in route updated */
	u_long	tcps_cachedrttvar;	/* times cached rttvar updated */
	u_long	tcps_cachedssthresh;	/* times cached ssthresh updated */
	u_long	tcps_usedrtt;		/* times RTT initialized from route */
	u_long	tcps_usedrttvar;	/* times RTTVAR initialized from rt */
	u_long	tcps_usedssthresh;	/* times ssthresh initialized from rt*/
	u_long	tcps_persistdrop;	/* timeout in persist state */
	u_long	tcps_badsyn;		/* bogus SYN, e.g. premature ACK */
	u_long	tcps_mturesent;		/* resends due to MTU discovery */
	u_long	tcps_listendrop;	/* listen queue overflows */
	u_long	tcps_badrst;		/* ignored RSTs in the window */

	u_long	tcps_sc_added;		/* entry added to syncache */
	u_long	tcps_sc_retransmitted;	/* syncache entry was retransmitted */
	u_long	tcps_sc_dupsyn;		/* duplicate SYN packet */
	u_long	tcps_sc_dropped;	/* could not reply to packet */
	u_long	tcps_sc_completed;	/* successful extraction of entry */
	u_long	tcps_sc_bucketoverflow;	/* syncache per-bucket limit hit */
	u_long	tcps_sc_cacheoverflow;	/* syncache cache limit hit */
	u_long	tcps_sc_reset;		/* RST removed entry from syncache */
	u_long	tcps_sc_stale;		/* timed out or listen socket gone */
	u_long	tcps_sc_aborted;	/* syncache entry aborted */
	u_long	tcps_sc_badack;		/* removed due to bad ACK */
	u_long	tcps_sc_unreach;	/* ICMP unreachable received */
	u_long	tcps_sc_zonefail;	/* zalloc() failed */
	u_long	tcps_sc_sendcookie;	/* SYN cookie sent */
	u_long	tcps_sc_recvcookie;	/* SYN cookie received */

	u_long	tcps_hc_added;		/* entry added to hostcache */
	u_long	tcps_hc_bucketoverflow;	/* hostcache per bucket limit hit */

	u_long	tcps_finwait2_drops;	/* Drop FIN_WAIT_2 connection after time limit */

	/* SACK related stats */
	u_long	tcps_sack_recovery_episode; /* SACK recovery episodes */
	u_long	tcps_sack_rexmits;	    /* SACK rexmit segments */
	u_long	tcps_sack_rexmit_bytes;	    /* SACK rexmit bytes */
	u_long	tcps_sack_rcv_blocks;	    /* SACK blocks (options) received */
	u_long	tcps_sack_send_blocks;	    /* SACK blocks (options) sent */
	u_long	tcps_sack_sboverflow;	    /* times scoreboard overflowed */

	/* ECN related stats */
	u_long	tcps_ecn_ce;		/* ECN Congestion Experienced */
	u_long	tcps_ecn_ect0;		/* ECN Capable Transport */
	u_long	tcps_ecn_ect1;		/* ECN Capable Transport */
	u_long	tcps_ecn_shs;		/* ECN successful handshakes */
	u_long	tcps_ecn_rcwnd;		/* # times ECN reduced the cwnd */

	u_long	_pad[12];		/* 6 UTO, 6 TBD */
};

#ifdef _KERNEL
#define	TCPSTAT_ADD(name, val)	V_tcpstat.name += (val)
#define	TCPSTAT_INC(name)	TCPSTAT_ADD(name, 1)
#endif

/*
 * TCB structure exported to user-land via sysctl(3).
 * Evil hack: declare only if in_pcb.h and sys/socketvar.h have been
 * included.  Not all of our clients do.
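 *
 * A hedged sketch of userland consumption (names are illustrative, and
 * the loop shape is not lifted from netstat): records are self-sizing
 * via xt_len, so a consumer can step through the sysctl buffer with
 * something like
 *
 *	for (p = buf; p < buf + len; p += ((struct xtcpcb *)p)->xt_len)
 *		...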
 */
#if defined(_NETINET_IN_PCB_H_) && defined(_SYS_SOCKETVAR_H_)
struct	xtcpcb {
	size_t	xt_len;
	struct	inpcb	xt_inp;
	struct	tcpcb	xt_tp;
	struct	xsocket	xt_socket;
	u_quad_t	xt_alignment_hack;
};
#endif

/*
 * Names for TCP sysctl objects
 */
#define	TCPCTL_DO_RFC1323	1	/* use RFC-1323 extensions */
#define	TCPCTL_MSSDFLT		3	/* MSS default */
#define	TCPCTL_STATS		4	/* statistics (read-only) */
#define	TCPCTL_RTTDFLT		5	/* default RTT estimate */
#define	TCPCTL_KEEPIDLE		6	/* keepalive idle timer */
#define	TCPCTL_KEEPINTVL	7	/* interval to send keepalives */
#define	TCPCTL_SENDSPACE	8	/* send buffer space */
#define	TCPCTL_RECVSPACE	9	/* receive buffer space */
#define	TCPCTL_KEEPINIT		10	/* timeout for establishing syn */
#define	TCPCTL_PCBLIST		11	/* list of all outstanding PCBs */
#define	TCPCTL_DELACKTIME	12	/* time before sending delayed ACK */
#define	TCPCTL_V6MSSDFLT	13	/* MSS default for IPv6 */
#define	TCPCTL_SACK		14	/* Selective Acknowledgement,rfc 2018 */
#define	TCPCTL_DROP		15	/* drop tcp connection */
#define	TCPCTL_MAXID		16
#define	TCPCTL_FINWAIT2_TIMEOUT	17

#define	TCPCTL_NAMES { \
	{ 0, 0 }, \
	{ "rfc1323", CTLTYPE_INT }, \
	{ "mssdflt", CTLTYPE_INT }, \
	{ "stats", CTLTYPE_STRUCT }, \
	{ "rttdflt", CTLTYPE_INT }, \
	{ "keepidle", CTLTYPE_INT }, \
	{ "keepintvl", CTLTYPE_INT }, \
	{ "sendspace", CTLTYPE_INT }, \
	{ "recvspace", CTLTYPE_INT }, \
	{ "keepinit", CTLTYPE_INT }, \
	{ "pcblist", CTLTYPE_STRUCT }, \
	{ "delacktime", CTLTYPE_INT }, \
	{ "v6mssdflt", CTLTYPE_INT }, \
	{ "maxid", CTLTYPE_INT }, \
}

#ifdef _KERNEL
#ifdef SYSCTL_DECL
SYSCTL_DECL(_net_inet_tcp);
SYSCTL_DECL(_net_inet_tcp_sack);
MALLOC_DECLARE(M_TCPLOG);
#endif

extern	int tcp_log_in_vain;

#ifdef VIMAGE_GLOBALS
extern	struct inpcbhead tcb;		/* head of queue of active tcpcb's */
extern	struct inpcbinfo tcbinfo;
extern	struct tcpstat tcpstat;		/* tcp statistics */
extern	int tcp_mssdflt;	/* XXX */
extern	int tcp_minmss;
extern	int tcp_delack_enabled;
extern	int tcp_do_newreno;
extern	int path_mtu_discovery;
extern	int ss_fltsz;
extern	int ss_fltsz_local;
extern	int blackhole;
extern	int drop_synfin;
extern	int tcp_do_rfc3042;
extern	int tcp_do_rfc3390;
extern	int tcp_insecure_rst;
extern	int tcp_do_autorcvbuf;
extern	int tcp_autorcvbuf_inc;
extern	int tcp_autorcvbuf_max;
extern	int tcp_do_rfc3465;
extern	int tcp_abc_l_var;
extern	int tcp_do_tso;
extern	int tcp_do_autosndbuf;
extern	int tcp_autosndbuf_inc;
extern	int tcp_autosndbuf_max;
extern	int nolocaltimewait;

extern	int tcp_do_sack;		/* SACK enabled/disabled */
extern	int tcp_sack_maxholes;
extern	int tcp_sack_globalmaxholes;
extern	int tcp_sack_globalholes;
extern	int tcp_sc_rst_sock_fail;	/* RST on sock alloc failure */
extern	int tcp_do_ecn;			/* TCP ECN enabled/disabled */
extern	int tcp_ecn_maxretries;
#endif /* VIMAGE_GLOBALS */

int	 tcp_addoptions(struct tcpopt *, u_char *);
struct tcpcb *
	 tcp_close(struct tcpcb *);
void	 tcp_discardcb(struct tcpcb *);
void	 tcp_twstart(struct tcpcb *);
#if 0
int	 tcp_twrecycleable(struct tcptw *tw);
#endif
void	 tcp_twclose(struct tcptw *_tw, int _reuse);
void	 tcp_ctlinput(int, struct sockaddr *, void *);
int	 tcp_ctloutput(struct socket *, struct sockopt *);
struct tcpcb *
	 tcp_drop(struct tcpcb *, int);
void	 tcp_drain(void);
void	 tcp_fasttimo(void);
void	 tcp_init(void);
#ifdef VIMAGE
void	 tcp_destroy(void);
#endif
void	 tcp_fini(void *);
char	*tcp_log_addrs(struct in_conninfo *, struct tcphdr *, void *,
	    const void *);
int	 tcp_reass(struct tcpcb *, struct tcphdr *, int *, struct mbuf *);
void	 tcp_reass_init(void);
void	 tcp_input(struct mbuf *, int);
u_long	 tcp_maxmtu(struct in_conninfo *, int *);
u_long	 tcp_maxmtu6(struct in_conninfo *, int *);
void	 tcp_mss_update(struct tcpcb *, int, struct hc_metrics_lite *, int *);
void	 tcp_mss(struct tcpcb *, int);
int	 tcp_mssopt(struct in_conninfo *);
struct inpcb *
	 tcp_drop_syn_sent(struct inpcb *, int);
struct inpcb *
	 tcp_mtudisc(struct inpcb *, int);
struct tcpcb *
	 tcp_newtcpcb(struct inpcb *);
int	 tcp_output(struct tcpcb *);
void	 tcp_respond(struct tcpcb *, void *,
	    struct tcphdr *, struct mbuf *, tcp_seq, tcp_seq, int);
void	 tcp_tw_init(void);
#ifdef VIMAGE
void	 tcp_tw_destroy(void);
#endif
void	 tcp_tw_zone_change(void);
int	 tcp_twcheck(struct inpcb *, struct tcpopt *, struct tcphdr *,
	    struct mbuf *, int);
int	 tcp_twrespond(struct tcptw *, int);
void	 tcp_setpersist(struct tcpcb *);
#ifdef TCP_SIGNATURE
int	 tcp_signature_compute(struct mbuf *, int, int, int, u_char *, u_int);
#endif
void	 tcp_slowtimo(void);
struct tcptemp *
	 tcpip_maketemplate(struct inpcb *);
void	 tcpip_fillheaders(struct inpcb *, void *, void *);
void	 tcp_timer_activate(struct tcpcb *, int, u_int);
int	 tcp_timer_active(struct tcpcb *, int);
void	 tcp_trace(short, short, struct tcpcb *, void *, struct tcphdr *, int);
void	 tcp_xmit_bandwidth_limit(struct tcpcb *tp, tcp_seq ack_seq);
/*
 * All tcp_hc_* functions are IPv4 and IPv6 (via in_conninfo)
 */
void	 tcp_hc_init(void);
#ifdef VIMAGE
void	 tcp_hc_destroy(void);
#endif
void	 tcp_hc_get(struct in_conninfo *, struct hc_metrics_lite *);
u_long	 tcp_hc_getmtu(struct in_conninfo *);
void	 tcp_hc_updatemtu(struct in_conninfo *, u_long);
void	 tcp_hc_update(struct in_conninfo *, struct hc_metrics_lite *);

extern	struct pr_usrreqs tcp_usrreqs;
extern	u_long tcp_sendspace;
extern	u_long tcp_recvspace;
tcp_seq	 tcp_new_isn(struct tcpcb *);

void	 tcp_sack_doack(struct tcpcb *, struct tcpopt *, tcp_seq);
void	 tcp_update_sack_list(struct tcpcb *tp,
	    tcp_seq rcv_laststart, tcp_seq rcv_lastend);
void	 tcp_clean_sackreport(struct tcpcb *tp);
void	 tcp_sack_adjust(struct tcpcb *tp);
struct sackhole *tcp_sack_output(struct tcpcb *tp, int *sack_bytes_rexmt);
void	 tcp_sack_partialack(struct tcpcb *, struct tcphdr *);
void	 tcp_free_sackholes(struct tcpcb *tp);
int	 tcp_newreno(struct tcpcb *, struct tcphdr *);
u_long	 tcp_seq_subtract(u_long, u_long);

#endif /* _KERNEL */

#endif /* _NETINET_TCP_VAR_H_ */

Index: head/sys/sys/param.h
===================================================================
--- head/sys/sys/param.h	(revision 195653)
+++ head/sys/sys/param.h	(revision 195654)
@@ -1,314 +1,314 @@
/*-
 * Copyright (c) 1982, 1986, 1989, 1993
 *	The Regents of the University of California.  All rights reserved.
 * (c) UNIX System Laboratories, Inc.
 * All or some portions of this file are derived from material licensed
 * to the University of California by American Telephone and Telegraph
 * Co. or Unix System Laboratories, Inc. and are reproduced herein with
 * the permission of UNIX System Laboratories, Inc.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)param.h	8.3 (Berkeley) 4/4/95
 * $FreeBSD$
 */

#ifndef _SYS_PARAM_H_
#define _SYS_PARAM_H_

#include

#define	BSD	199506		/* System version (year & month). */
#define	BSD4_3	1
#define	BSD4_4	1

/*
 * __FreeBSD_version numbers are documented in the Porter's Handbook.
 * If you bump the version for any reason, you should update the documentation
 * there.
 * Currently this lives here:
 *
 *	doc/en_US.ISO8859-1/books/porters-handbook/book.sgml
 *
 * scheme is:  <major><two digit minor>Rxx
 *		'R' is in the range 0 to 4 if this is a release branch or
 *		x.0-CURRENT before RELENG_*_0 is created, otherwise 'R' is
 *		in the range 5 to 9.
 */
#undef __FreeBSD_version
-#define __FreeBSD_version 800102	/* Master, propagated to newvers */
+#define __FreeBSD_version 800103	/* Master, propagated to newvers */

#ifndef	LOCORE
#include
#endif

/*
 * Machine-independent constants (some used in following include files).
 * Redefined constants are from POSIX 1003.1 limits file.
 *
 * MAXCOMLEN should be >= sizeof(ac_comm) (see )
 * MAXLOGNAME should be == UT_NAMESIZE+1 (see )
 */
#include

#define	MAXCOMLEN	19		/* max command name remembered */
#define	MAXINTERP	32		/* max interpreter file name length */
#define	MAXLOGNAME	17		/* max login name length (incl. NUL) */
#define	MAXUPRC		CHILD_MAX	/* max simultaneous processes */
#define	NCARGS		ARG_MAX		/* max bytes for an exec function */
#define	NGROUPS		(NGROUPS_MAX+1)	/* max number groups */
#define	NOFILE		OPEN_MAX	/* max open files per process */
#define	NOGROUP		65535		/* marker for empty group set member */
#define	MAXHOSTNAMELEN	256		/* max hostname size */
#define	SPECNAMELEN	63		/* max length of devicename */

/* More types and definitions used throughout the kernel. */
#ifdef _KERNEL
#include
#include
#ifndef LOCORE
#include
#include
#endif

#ifndef FALSE
#define	FALSE	0
#endif
#ifndef TRUE
#define	TRUE	1
#endif
#endif

#ifndef _KERNEL
/* Signals. */
#include
#endif

/* Machine type dependent parameters. */
#include
#ifndef _KERNEL
#include
#endif

#ifndef _NO_NAMESPACE_POLLUTION

#ifndef DEV_BSHIFT
#define	DEV_BSHIFT	9		/* log2(DEV_BSIZE) */
#endif
#define	DEV_BSIZE	(1<<DEV_BSHIFT)

#ifndef BLKDEV_IOSIZE
#define	BLKDEV_IOSIZE	PAGE_SIZE	/* default block device I/O size */
#endif
#ifndef DFLTPHYS
#define	DFLTPHYS	(64 * 1024)	/* default max raw I/O transfer size */
#endif
#ifndef MAXPHYS
#define	MAXPHYS		(128 * 1024)	/* max raw I/O transfer size */
#endif
#ifndef MAXDUMPPGS
#define	MAXDUMPPGS	(DFLTPHYS/PAGE_SIZE)
#endif

/*
 * Constants related to network buffer management.
 * MCLBYTES must be no larger than PAGE_SIZE.
 */
#ifndef	MSIZE
#define	MSIZE		256		/* size of an mbuf */
#endif

#ifndef	MCLSHIFT
#define	MCLSHIFT	11		/* convert bytes to mbuf clusters */
#endif

#define	MCLBYTES	(1 << MCLSHIFT)	/* size of an mbuf cluster */

#define	MJUMPAGESIZE	PAGE_SIZE	/* jumbo cluster 4k */
#define	MJUM9BYTES	(9 * 1024)	/* jumbo cluster 9k */
#define	MJUM16BYTES	(16 * 1024)	/* jumbo cluster 16k */

/*
 * Some macros for units conversion
 */

/* clicks to bytes */
#ifndef ctob
#define	ctob(x)	((x)<<PAGE_SHIFT)
#endif

/* bytes to clicks */
#ifndef btoc
#define	btoc(x)	(((vm_offset_t)(x)+PAGE_MASK)>>PAGE_SHIFT)
#endif

/*
 * btodb() is messy and perhaps slow because `bytes' may be an off_t.  We
 * want to shift an unsigned type to avoid sign extension and we don't
 * want to widen `bytes' unnecessarily.  Assume that the result fits in
 * a daddr_t.
 */
#ifndef btodb
#define	btodb(bytes)			/* calculates (bytes / DEV_BSIZE) */ \
	(sizeof (bytes) > sizeof(long) \
	 ? (daddr_t)((unsigned long long)(bytes) >> DEV_BSHIFT) \
	 : (daddr_t)((unsigned long)(bytes) >> DEV_BSHIFT))
#endif

#ifndef dbtob
#define	dbtob(db)			/* calculates (db * DEV_BSIZE) */ \
	((off_t)(db) << DEV_BSHIFT)
#endif

#endif /* _NO_NAMESPACE_POLLUTION */

#define	PRIMASK	0x0ff
#define	PCATCH	0x100		/* OR'd with pri for tsleep to check signals */
#define	PDROP	0x200	/* OR'd with pri to stop re-entry of interlock mutex */

#define	NZERO	0		/* default "nice" */

#define	NBBY	8		/* number of bits in a byte */
#define	NBPW	sizeof(int)	/* number of bytes per word (integer) */

#define	CMASK	022		/* default file mask: S_IWGRP|S_IWOTH */

#define	NODEV	(dev_t)(-1)	/* non-existent device */

/*
 * File system parameters and macros.
 *
 * MAXBSIZE -	Filesystems are made out of blocks of at most MAXBSIZE bytes
 *		per block.  MAXBSIZE may be made larger without effecting
 *		any existing filesystems as long as it does not exceed MAXPHYS,
 *		and may be made smaller at the risk of not being able to use
 *		filesystems which require a block size exceeding MAXBSIZE.
 *
 * BKVASIZE -	Nominal buffer space per buffer, in bytes.  BKVASIZE is the
 *		minimum KVM memory reservation the kernel is willing to make.
 *		Filesystems can of course request smaller chunks.  Actual
 *		backing memory uses a chunk size of a page (PAGE_SIZE).
 *
 *		If you make BKVASIZE too small you risk seriously fragmenting
 *		the buffer KVM map which may slow things down a bit.  If you
 *		make it too big the kernel will not be able to optimally use
 *		the KVM memory reserved for the buffer cache and will wind
 *		up with too-few buffers.
 *
 *		The default is 16384, roughly 2x the block size used by a
 *		normal UFS filesystem.
 */
#define	MAXBSIZE	65536	/* must be power of 2 */
#define	BKVASIZE	16384	/* must be power of 2 */
#define	BKVAMASK	(BKVASIZE-1)

/*
 * MAXPATHLEN defines the longest permissible path length after expanding
 * symbolic links. It is used to allocate a temporary buffer from the buffer
 * pool in which to do the name expansion, hence should be a power of two,
 * and must be less than or equal to MAXBSIZE.  MAXSYMLINKS defines the
 * maximum number of symbolic links that may be expanded in a path name.
 * It should be set high enough to allow all legitimate uses, but halt
 * infinite loops reasonably quickly.
 */
#define	MAXPATHLEN	PATH_MAX
#define	MAXSYMLINKS	32

/* Bit map related macros. */
#define	setbit(a,i)	(((unsigned char *)(a))[(i)/NBBY] |= 1<<((i)%NBBY))
#define	clrbit(a,i)	(((unsigned char *)(a))[(i)/NBBY] &= ~(1<<((i)%NBBY)))
#define	isset(a,i) \
	(((const unsigned char *)(a))[(i)/NBBY] & (1<<((i)%NBBY)))
#define	isclr(a,i) \
	((((const unsigned char *)(a))[(i)/NBBY] & (1<<((i)%NBBY))) == 0)

/* Macros for counting and rounding. */
#ifndef howmany
#define	howmany(x, y)	(((x)+((y)-1))/(y))
#endif
#define	rounddown(x, y)	(((x)/(y))*(y))
#define	roundup(x, y)	((((x)+((y)-1))/(y))*(y))  /* to any y */
#define	roundup2(x, y)	(((x)+((y)-1))&(~((y)-1))) /* if y is powers of two */
#define	powerof2(x)	((((x)-1)&(x))==0)

/* Macros for min/max. */
#define	MIN(a,b) (((a)<(b))?(a):(b))
#define	MAX(a,b) (((a)>(b))?(a):(b))

#ifdef _KERNEL
/*
 * Basic byte order function prototypes for non-inline functions.
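 *
 * A one-line usage sketch (illustrative, not part of this change): a
 * little-endian host sending TCP port 80 stores htons(80) == 0x5000 in
 * memory, which appears as the bytes 0x00 0x50 on the wire.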
 */
#ifndef LOCORE
#ifndef _BYTEORDER_PROTOTYPED
#define	_BYTEORDER_PROTOTYPED
__BEGIN_DECLS
__uint32_t	 htonl(__uint32_t);
__uint16_t	 htons(__uint16_t);
__uint32_t	 ntohl(__uint32_t);
__uint16_t	 ntohs(__uint16_t);
__END_DECLS
#endif
#endif

#ifndef lint
#ifndef _BYTEORDER_FUNC_DEFINED
#define	_BYTEORDER_FUNC_DEFINED
#define	htonl(x)	__htonl(x)
#define	htons(x)	__htons(x)
#define	ntohl(x)	__ntohl(x)
#define	ntohs(x)	__ntohs(x)
#endif /* !_BYTEORDER_FUNC_DEFINED */
#endif /* lint */
#endif /* _KERNEL */

/*
 * Scale factor for scaled integers used to count %cpu time and load avgs.
 *
 * The number of CPU `tick's that map to a unique `%age' can be expressed
 * by the formula (1 / (2 ^ (FSHIFT - 11))).  The maximum load average that
 * can be calculated (assuming 32 bits) can be closely approximated using
 * the formula (2 ^ (2 * (16 - FSHIFT))) for (FSHIFT < 15).
 *
 * For the scheduler to maintain a 1:1 mapping of CPU `tick' to `%age',
 * FSHIFT must be at least 11; this gives us a maximum load avg of ~1024.
 */
#define	FSHIFT	11		/* bits to right of fixed binary point */
#define	FSCALE	(1<<FSHIFT)

#define	dbtoc(db)			/* calculates devblks to pages */ \
	((db + (ctodb(1) - 1)) >> (PAGE_SHIFT - DEV_BSHIFT))

#define	ctodb(db)			/* calculates pages to devblks */ \
	((db) << (PAGE_SHIFT - DEV_BSHIFT))

/*
 * Given the pointer x to the member m of the struct s, return
 * a pointer to the containing structure.
 */
#define	member2struct(s, m, x) \
	((struct s *)(void *)((char *)(x) - offsetof(struct s, m)))

#endif	/* _SYS_PARAM_H_ */