Index: projects/vnet/release/doc/en_US.ISO8859-1/relnotes/article.xml
===================================================================
--- projects/vnet/release/doc/en_US.ISO8859-1/relnotes/article.xml (revision 301546)
+++ projects/vnet/release/doc/en_US.ISO8859-1/relnotes/article.xml (revision 301547)
@@ -1,1853 +1,1857 @@
%release;
%sponsor;
%vendor;
]>
&os; &release.current; Release Notes
The &os; Project
$FreeBSD$
2015, 2016 The &os; Documentation Project
&tm-attrib.freebsd;
&tm-attrib.ibm;
&tm-attrib.ieee;
&tm-attrib.intel;
&tm-attrib.sparc;
&tm-attrib.general;
The release notes for &os; &release.current; contain
a summary of the changes made to the &os; base system on the
&release.branch; development line. This document lists
applicable security advisories that were issued since the last
release, as well as significant changes to the &os; kernel and
userland. Some brief remarks on upgrading are also
presented.
Introduction
This document contains the release notes for &os;
&release.current;. It describes recently added, changed, or
deleted features of &os;. It also provides some notes on
upgrading from previous versions of &os;.The &release.type; distribution to
which these release notes apply represents the latest point
along the &release.branch; development branch since
&release.branch; was created. Information regarding pre-built,
binary &release.type; distributions along this branch can be
found at &release.url;.The &release.type; distribution to
which these release notes apply represents a point along the
&release.branch; development branch between &release.prev; and
the future &release.next;. Information regarding pre-built,
binary &release.type; distributions along this branch can be
found at &release.url;.This distribution of &os;
&release.current; is a &release.type; distribution. It can be
found at &release.url; or
any of its mirrors. More information on obtaining this (or
other) &release.type; distributions of &os; can be found in the
Obtaining
&os; appendix to the &os;
Handbook.All users are encouraged to consult the release errata
before installing &os;. The errata document is updated with
late-breaking information discovered late in the
release cycle or after the release. Typically, it contains
information on known bugs, security advisories, and corrections
to documentation. An up-to-date copy of the errata for &os;
&release.current; can be found on the &os; Web site.This document describes the most user-visible new or changed
features in &os; since &release.prev;. In general, changes
described here are unique to the &release.branch; branch unless
specifically marked as &merged; features.Typical release note items document recent security
advisories issued after &release.prev;, new drivers or hardware
support, new commands or options, major bug fixes, or
contributed software upgrades. They may also list changes to
major ports/packages or release engineering practices. Clearly
the release notes cannot list every single change made to &os;
between releases; this document focuses primarily on security
advisories, user-visible changes, and major architectural
improvements.
Upgrading from Previous Releases of &os;
Binary upgrades between RELEASE versions
(and snapshots of the various security branches) are supported
using the &man.freebsd-update.8; utility. The binary upgrade
procedure will update unmodified userland utilities, as well as
unmodified GENERIC kernels distributed as a part of an official
&os; release. The &man.freebsd-update.8; utility requires that
the host being upgraded have Internet connectivity.Source-based upgrades (those based on recompiling the &os;
base system from source code) from previous versions are
supported, according to the instructions in
/usr/src/UPDATING.Upgrading &os; should only be attempted after backing up
all data and configuration files.
Security and Errata
This section lists the various Security Advisories and
Errata Notices since &release.prev;.
Security Advisories
&security;
Errata Notices
&errata;
Userland
This section covers changes and additions to userland
applications, contributed software, and system utilities.
Userland Configuration Changes
The default &man.newsyslog.conf.5; now
includes files in the
/etc/newsyslog.conf.d/ and
/usr/local/etc/newsyslog.conf.d/
directories by default for &man.newsyslog.8;.The &man.mailwrapper.8; utility has been
updated to use &man.mailer.conf.5; from the
LOCALBASE environment variable, which
defaults to /usr/local
if unset.The MK_ARM_EABI
&man.src.conf.5; option has been removed.The ntp suite
has been updated to version 4.2.8p8.
Userland Application Changes
The &man.casperd.8; daemon has been
added, which provides access to functionality that is not
available in the capability mode
sandbox.
When &man.kldload.8; is unable to load a kernel module,
a message directing the user to the output of
&man.dmesg.8; is now printed, instead of the previous output,
Exec format error.
The &man.pciconf.8; utility has been updated to allow PCI
devices that are attached to a driver to be identified by
their device name instead of just the selector. Additionally,
an optional device argument to the -l flag has been added,
restricting the output to details about a single
device.
A new flag, onifconsole,
has been added to /etc/ttys. This allows
the system to provide a login prompt via serial console if the
device is an active kernel console, otherwise it is equivalent
to off.Support for displaying VPD for PCI
devices via &man.pciconf.8; has been added.
The &man.ping.8; utility now uses the Capsicum framework to drop
privileges, protecting against malicious network packets.
The &man.ps.1; utility has been
updated to include the -J flag, used to
filter output by matching &man.jail.8; IDs and names.
Additionally, 0 can be passed as the argument to
-J to list only processes running on the
host system.The &man.top.1; utility has been updated
to filter by &man.jail.8; ID or name, in followup to the
&man.ps.1; change in r265229.The &man.pmcstat.8; utility has been
updated to include a new flag, -l, which
ends event collection after the specified number of
seconds.The &man.ps.1; utility has been updated
to include a new keyword, tracer, which
displays the PID of the tracing
process.Support for adding empty partitions has
been added to the &man.mkimg.1; utility.The &man.primes.6; utility has been
updated to correctly enumerate prime numbers between
4295098369 and
3825123056546413050. Prior to this
change, values in this range could be
incorrectly identified as prime numbers.The &man.mkimg.1; utility has been
updated to include three options used to print information
about &man.mkimg.1; itself:
Option      Output
--version   The current version of the &man.mkimg.1; utility
--formats   The disk image file formats supported by &man.mkimg.1;
--schemes   The partition schemes supported by &man.mkimg.1;
Userland &man.ctf.5; support in
&man.dtrace.1; has been added. With this change,
&man.dtrace.1; is able to resolve type info for function and
USDT probe arguments, and function return
values.The &man.elfdump.1; utility has been
updated to support capability mode provided by
&man.capsicum.4;.The
&man.fstyp.8; utility has been added, which is used to
determine the filesystem on a specified device.The libedit library
has been updated to support UTF-8, which
additionally provides Unicode support to &man.sh.1;.
The
&man.mkimg.1; utility has been updated to support the
MBR EFI partition
type.The &man.ptrace.2; system
call has been updated to include support for Altivec registers on
&os;/&arch.powerpc;.A new device control utility,
&man.devctl.8;, has been added, which allows making
administrative changes to individual devices, such as
attaching and detaching drivers, and enabling and disabling
devices. The &man.devctl.8; utility uses the new
&man.devctl.3; library.The &man.netstat.1; utility has been
updated to link against the &man.libxo.3; shared
library.A new flag, -c, has
been added to the &man.mkimg.1; utility, which allows
specifying the capacity of the target disk image.The
&man.uefisign.8; utility has been added.The &man.freebsd-update.8; utility has
been updated to prevent fetching updated binary patches when
a previous upgrade has not been thoroughly completed.A regression in the &man.libarchive.3;
library that would prevent a directory from being included in
the archive when --one-file-system is used
has been fixed.The
&man.ar.1; utility has been updated to set
ARCHIVE_EXTRACT_SECURE_SYMLINKS and
ARCHIVE_EXTRACT_SECURE_NODOTDOT to disallow
directory traversal when extracting an archive, similar to
&man.tar.1;.A race condition in &man.wc.1; that
would cause final results to be sent to &man.stderr.4; when
receiving the SIGINFO signal has been
fixed.The &man.chflags.1;, &man.chgrp.1;,
&man.chmod.1;, and &man.chown.8; utilities now affect symbolic
links when the -R flag is specified, as
documented in &man.symlink.7;.The &man.date.1; utility has been
updated to print the modification time of the file passed as
an argument to the -r flag, improving
compatibility with the GNU &man.date.1;
utility behavior.The &man.pw.8; utility has been updated
with a new flag, -R, that sets the root
directory within which the utility will operate.The &man.lockstat.1; utility has been
updated with several improvements:
Spin locks are now reported as the amount of time spinning, instead of loop iterations.
Reader locks are now recognized as adaptive locks that can spin on &os;.
Lock acquisition events for successful reader try-lock events are now reported.
Spin and block events are now reported before lock acquisition events.
The &man.fstyp.8; utility has been
updated to be able to detect &man.zfs.8; and &man.geli.8;
filesystems.The &man.mkimg.1; utility has been
updated to include support for NTFS
filesystems in both MBR and
GPT partitioning schemes.The &man.quota.1; utility has been
updated to include support for IPv6.The &man.jexec.8; utility has been
updated to include a new flag, -l, which
ensures a clean environment in the target jail when used.
Additionally, &man.jexec.8; will run a shell within the target
jail when no commands are specified.
The &man.w.1; utility has been updated
to display the full IPv6 remote address of the host from which
a user is connected.The &man.jail.8; framework has been
updated to allow mounting &man.linprocfs.5; and
&man.linsysfs.5; within a jail.The &man.patch.1; utility has been
updated to include a new option to the -V
flag, none, which disables backup file
creation when applying a patch.The
&man.ar.1; utility now enables deterministic mode
(-D) by default. This behavior can be
disabled by specifying the -U flag.The &man.xargs.1; utility has been
updated to allow specifying 0 as an
argument to the -P (parallel mode) flag,
which allows creating as many concurrent processes as
possible.The &man.patch.1; utility has been
updated to remove the automatic checkout feature.A
new utility, &man.sesutil.8;, has been added, which is used
to manage &man.ses.4; devices.The &man.pciconf.8; utility has been
updated to use the PCI ID database from the misc/pciids package, if present,
falling back to the PCI ID database in the &os; base
system.The &man.ifconfig.8; utility has been
updated to always exit with an error code if an important
&man.ioctl.2; fails.
Contributed Software
&man.byacc.1; has been updated to
version 20140101.OpenSSH has
been updated to 7.2p2.mdocml has
been updated to version 1.12.3.The binutils
suite of utilities has been updated to include upstream
patches that add new relocations for &arch.powerpc;
support.The
ELF Tool Chain has been updated to
upstream revision r3136.The texinfo
utility and info pages were removed from
the base system. The print/texinfo port should be
installed on systems where info pages are
needed.The ELF
object manipulation tools
addr2line,
elfcopy (strip),
nm,
readelf,
size, and
strings were switched to the
versions from the ELF Tool Chain project.The libedit library
has been updated to include UTF-8 support,
adding UTF-8 support to the &man.sh.1;
shell.The &man.xz.1; utility has been updated
to support multi-threaded compression.The
elftoolchain utilities have been
updated to version 3179.The &man.xz.1; utility has been updated
to version 5.2.1.The &man.nvi.1; utility has been updated
to version 2.1.3.The &man.wpa.supplicant.8; and
&man.hostapd.8; utilities have been updated to version
2.4.The
&man.resolvconf.8; utility has been updated to version
3.7.3.bmake has
been updated to version 20150606.sendmail has
been updated to 8.15.2. Starting with &os; 11.0 and
sendmail 8.15, sendmail uses uncompressed IPv6 addresses by
default, i.e., they will not contain ::. For
example, instead of ::1, it will be
0:0:0:0:0:0:0:1. This permits a zero subnet to
have a more specific match, such as different map entries for
IPv6:0:0 versus IPv6:0. This change requires that
configuration data (including maps, files, classes, custom
rulesets, etc.) use the same format, so make certain such
configuration data is upgraded. As a very simple check,
search for patterns like 'IPv6:[0-9a-fA-F:]*::' and 'IPv6::'.
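The simple check mentioned above can be sketched as a small shell function. This is only an illustration: the function name find_compressed_ipv6 is hypothetical, and which files need scanning depends on the local sendmail setup.

```shell
# find_compressed_ipv6 FILE...
# Print, as file:line:text, every line that still contains a compressed
# IPv6 address in sendmail's "IPv6:" tagged form. The two patterns are
# the ones quoted in the release note. /dev/null is added so that grep
# prints the file name even when only one file is given.
# The exit status is grep's: 0 if anything matched, 1 if nothing did.
find_compressed_ipv6() {
    grep -nE 'IPv6:[0-9a-fA-F:]*::|IPv6::' "$@" /dev/null
}
```

For example, `find_compressed_ipv6 /etc/mail/access` would list candidate entries in an access map, assuming the map lives at that path. Each line it prints probably needs converting to the uncompressed form.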
To return to the old behavior, set the m4 option
confUSE_COMPRESSED_IPV6_ADDRESSES or the cf
option UseCompressedIPv6Addresses.The &man.tcpdump.1; utility has been
updated to version 4.7.4.OpenSSL has
been updated to version 1.0.2h.The
&man.ssh.1; utility has been updated to re-implement hostname
canonicalization before locating the host in
known_hosts.The &man.libarchive.3; library has been
updated to properly skip a sparse file entry in a &man.tar.1;
file, which would previously produce errors.The apr
library used by &man.svnlite.1; has been updated to version
1.5.2.The serf
library used by &man.svnlite.1; has been updated to version
1.3.8.The &man.svnlite.1; utility has been
updated to version 1.8.14.The sqlite3
library used by &man.svnlite.1; and &man.kerberos.8; has been
updated to version 3.12.1.Timezone data files have been updated to
version 2015f.The &man.acpi.4; subsystem has been
updated to version 20150818.The &man.unbound.8; utility has been
updated to version 1.5.4.&man.jemalloc.3; has been updated to
version 4.0.2.The &man.file.1; utility has been
updated to version 5.26.The &man.nc.1; utility has been updated
to the OpenBSD 5.8 version.Clang has
been updated to version 3.8.0.LLVM has
been updated to version 3.8.0.LLDB has
been updated to version 3.8.0.libc++ has
been updated to version 3.8.0.The
compiler_rt utility has been
updated to version 3.8.0.
Installation and Configuration Tools
The &man.bsdinstall.8; partition editor
and &man.sade.8; utility have been updated to include native
ZFS support.The &os; installation utility,
&man.bsdinstall.8;, has been updated to set the
canmount &man.zfs.8; property to
off for the /var dataset, preventing the
contents of directories within /var from conflicting when
using multiple boot environments, such as that provided by
sysutils/beadm.The &man.bsdconfig.8; utility has been
updated to skip the initial &man.tzsetup.8;
UTC versus wall-clock time prompt when run
in a virtual machine, determined when the
kern.vm_guest &man.sysctl.8; is set to
1.The &man.bsdinstall.8; utility has been
updated to use the new &man.dpv.3; library to display progress
when extracting the &os; distributions.
Support for
aligning partitions on 1MB boundaries has been added to
&man.bsdinstall.8;.Support for detecting and implementing
a workaround for various laptops and motherboards that do not
boot properly from GPT-partitioned disks
has been added to &man.bsdinstall.8;. Additionally, the
active flag will be set on the partition
when needed.Support for selecting the partitioning
scheme when installing on the UFS
filesystem has been added to &man.bsdinstall.8;.
/etc/rc.d Scripts
The &man.rc.8; subsystem has been
updated to allow configuring services in ${LOCALBASE}/etc/rc.conf.d/.
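As a rough sketch of how such a per-service path is composed, the shell function below builds the file name from LOCALBASE, using /usr/local when the variable is unset. The function name rc_conf_d_path and the service name in the usage note are illustrative assumptions, not part of &man.rc.8;.

```shell
# rc_conf_d_path SERVICE
# Print the per-service configuration file under
# ${LOCALBASE}/etc/rc.conf.d/, substituting /usr/local when
# LOCALBASE is unset or empty.
rc_conf_d_path() {
    printf '%s/etc/rc.conf.d/%s\n' "${LOCALBASE:-/usr/local}" "$1"
}
```

With LOCALBASE unset, `rc_conf_d_path sshd` prints /usr/local/etc/rc.conf.d/sshd.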
If LOCALBASE is unset, it defaults to
/usr/local.A new &man.rc.8; script,
growfs, has been added, which will resize
the root filesystem on boot if /firstboot
exists.The mrouted
&man.rc.8; script has been removed from the base system. An
equivalent script is available from the net/mrouted port.A new &man.rc.8; script,
iovctl, has been added, which allows
automatically starting the &man.iovctl.8; utility at
boot.The &man.service.8; utility has been
updated to honor entries within /etc/rc.conf.d/.
/etc/periodic Scripts
The daily &man.periodic.8; script
110.clean-tmps has been updated to avoid
crossing filesystem mount boundaries when cleaning files in
/tmp.A new
&man.periodic.8; script,
510.status-world-kernel, has been added,
which evaluates the running userland and kernel versions from
the &man.uname.1; -U and
-K arguments, and prints an error if the
system userland and kernel are not in sync.
Runtime Libraries and API
The Blowfish &man.crypt.3; default
format has been changed to
$2b$.The &man.readline.3; library is now
statically linked in software within the base system, and the
shared library is no longer installed, allowing the Ports
Collection to use a modern version of the library.The &man.strptime.3; library has been
updated to add support for POSIX-2001
features %U and
%W.The &man.dl.iterate.phdr.3; library has been
changed to always return the path name of the
ELF object in the
dlpi_name structure member.The &man.libxo.3; library has been
imported to the base system.A
userland library for Chelsio Terminator 5 based iWARP cards
has been added, allowing userland RDMA
applications to work over compatible
NICs.The &man.gpio.3; library has been added,
providing a wrapper around the &man.gpio.4; kernel
interface.The
&man.procctl.2; system call has been updated to include
a facility for non-&man.init.8; processes to be declared as
the reaper of child processes and their descendants.
The futimens() and
utimensat() system calls have been
added. See &man.utimensat.2; for more information.The &man.elf.3; compile-time dependency
has been removed from dtri.o, which
allows adding DTrace probes to
userland applications and libraries without also linking
against &man.elf.3;.The &man.setmode.3; function has been
updated to consistently set errno on
failure.The &man.qsort.3; functions have been
updated to be able to handle 32-bit aligned data on 64-bit
platforms, also providing a significant improvement in 32-bit
workloads.Several standard include headers have
been updated to make use of gcc
attributes, such as __result_use_check(),
__alloc_size(), and
__nonnull().Support for file verification in
MAC has been added.The
libgomp library is now only built when
building GCC from the base system. An
up-to-date version is available in the Ports Collection as
devel/libiomp5-devel.The stdlib.h and
malloc.h headers have been updated to
make use of the gcc alloc_align() attribute.
The Blowfish &man.crypt.3; library
has been updated to support $2y$ hashes.The &man.execl.3; and &man.execlp.3;
library functions have been updated to use the
__sentinel gcc
attribute.
ABI Compatibility
The &linux; compatibility version has
been updated to 2.6.18. The
compat.linux.osrelease &man.sysctl.8; is
evaluated when building the emulators/linux-c6 and related
ports.The stack protector has been upgraded to
the "strong" level, elevating the protection against buffer
overflows. While this significantly improves the security of
the system, extensive testing was done to ensure there are no
measurable side effects in performance or
functionality.
Kernel
This section covers changes to kernel configurations, system
tuning, and system control parameters that are not otherwise
categorized.
Kernel Bug Fixes
A kernel bug that inhibited proper
functionality of the dev.cpu.0.freq
&man.sysctl.8; on &intel; processors with Turbo
Boost™ enabled has been fixed.
Support for
&man.dtrace.1; stack tracing has been fixed for
&os;/&arch.powerpc;, using the trapexit()
and asttrapexit() functions instead of
checking within addressed kernel space.A kernel panic triggered when destroying
a &man.vnet.9; &man.jail.8; configured with &man.gif.4; has
been fixed.A kernel panic triggered when destroying
a &man.vnet.9; &man.jail.8; configured with &man.gre.4; has
been fixed.A bug in &man.ipfw.4; that could
potentially lead to a kernel panic when using &man.dummynet.4;
at layer 2 has been fixed.The
kernel RPC has been updated to include
several enhancements:
The 45 MiB limit on requests queued for &man.nfsd.8; threads has been removed.
Unnecessary throttling is avoided by not deferring accounting for completed requests.
An integer overflow and signedness bugs have been fixed.
Support for
&man.dtrace.1; has been added for the
Book-E™.
The &man.kqueue.2; system call has been
updated to handle write events to files larger than 2
gigabytes.
Kernel Configuration
The IMAGACT_BINMISC
kernel configuration option has been enabled by default,
which enables application execution through emulators, such
as Qemu.The VT kernel
configuration file has been removed, and the &man.vt.4;
driver is included in the GENERIC kernel.
To enable &man.vt.4;, enter set kern.vty=vt
at the &man.loader.8; prompt during boot, or add
kern.vty=vt to &man.loader.conf.5; and
reboot the system.The &man.config.8; utility has been
updated to allow using a non-standard src/ tree, specified as an
argument to the -s flag.The
&os;/&arch.powerpc64; kernel now builds as
a position-independent executable, allowing the kernel to be
loaded into and run from any physical or virtual
address.This change requires an update to &man.loader.8;.
The userland and kernel must be updated before rebooting the
system.A new module for creating
rpi.dtb has been added for the Raspberry
Pi.The
rpi.dtb module is now installed to
/boot/dtb/ by
default for the Raspberry Pi system.Kernel support for Vector-Scalar eXtension
(VSX) found on POWER7 and POWER8 hardware
has been added.The &man.pmap.9; implementation for 64-bit
&powerpc; processors has been overhauled to improve
concurrency.A new module for creating
the dtb module for AM335x systems has
been added.The
PAE_TABLES kernel configuration option has
been added for &os;/&arch.i386;, which instructs &man.pmap.9;
to use PAE format for page tables while
maintaining a 32-bit physical address size elsewhere in the
kernel. The use of this option can enhance application-level
security by enabling the creation of no-execute
mappings on modern &arch.i386; processors. Unlike the
PAE option, PAE_TABLES
preserves kernel binary interface (KBI)
compatibility with non-PAE kernels,
allowing non-PAE kernel modules and drivers
to work with a PAE_TABLES-enabled kernel.
Additionally, system limits are tuned for 4GB maximum
RAM, avoiding kernel virtual address space
(KVA) exhaustion.The SIFTR kernel
configuration has been added, allowing building &man.siftr.4;
statically into the kernel.The &arch.arm; boot loader,
ubldr, is now relocatable. In addition,
ubldr.bin is now created during build
time, which is a stripped binary with an entry point of
0, providing the ability to specify the
load address by running go
${loadaddr} in
u-boot.The &man.nvd.4; and &man.nvme.4; drivers are
now included in the GENERIC kernel
configuration by default.A new kernel configuration option,
EM_MULTIQUEUE, has been added which enables
multi-queue support in the &man.em.4; driver.Multi-queue support in the &man.em.4; driver is not
officially supported by &intel;.The GENERIC kernel
configuration has been updated to include the
IPSEC option by default.Initial NUMA
affinity and policy configuration has been added. See
&man.numactl.1; and &man.numa.getaffinity.2; for usage
details.The &man.pms.4; driver has been added
to the GENERIC kernel configuration for
supported architectures.The
CUBIEBOARD2 kernel configuration has been
renamed to A20.Kernel
debugging symbols are now installed to /usr/lib/debug/boot/kernel/.
To retain the previous behavior, add
KERN_DEBUGDIR="" to
&man.src.conf.5;.
System Tuning and Controls
The
&man.hwpmc.4; default and maximum callchain depths have been
increased. The default has been increased from 16 to 32, and
the maximum increased from 32 to 128.The kern.osrelease
and kern.osreldate are now configurable
&man.jail.8; parameters.The &man.devfs.5; device filesystem has
been changed to update timestamps for read/write operations
using seconds precision. A new &man.sysctl.8;,
vfs.devfs.dotimes, has been added, which
when set to a non-zero value, enables default precision
timestamps for these operations.A new
&man.sysctl.8;, kern.racct.enable, has been
added, which when set to a non-zero value allows using
&man.rctl.8; with the GENERIC kernel.
A new kernel configuration option,
RACCT_DISABLED, has also been added.
The
GENERIC kernel configuration now includes
RACCT and RCTL by
default.To enable RACCT and
RCTL on a system using the
GENERIC kernel configuration, add
kern.racct.enable=1 to
&man.loader.conf.5;, and reboot the system.A new &man.sysctl.8;,
net.inet.tcp.hostcache.purgenow, has
been added, which when set to 1 during
runtime will flush all
net.inet.tcp.hostcache entries.A new &man.sysctl.8;,
hw.model, has been added, which displays
CPU model information.The &man.uart.4; driver has been
updated to allow tuning pulses per second captured in the
CTS line during runtime, whereas previously only the DCD line
could be used without rebuilding the kernel.
Devices and Drivers
This section covers changes and additions to devices and
device drivers since &release.prev;.
Device Drivers
Support for GPS ports has been added to
&man.uhso.4;.The &man.full.4; device has been added,
and the lindev(4) device has been removed.
Prior to this change, lindev(4) provided
only the /dev/full character device,
returning ENOSPC on write attempts. As
this device is not specific to &linux;, a native &os; version
has been added.Hardware context support has been
added to the drm/i915 driver, adding
support for Mesa 9.2 and
later.The &man.vt.4; driver has been updated,
replacing the bitmapped kern.vt.spclkeys
&man.sysctl.8; with individual
kern.vt.kbd_* variants.The &man.hpet.4; driver has been updated
to create a
/dev/hpetN
device, providing access to HPET from
userspace.The drm code has
been updated to match &linux; version 3.8.13.The &man.psm.4; driver has been updated
to include improved support for newer Synaptics®
touchpads and the ClickPad® mouse on newer
Lenovo™ laptops.
Support for the Freescale
PCI Root Complex device has been
added.The &man.cyapa.4; driver has been added,
supporting the Cypress APA I2C trackpad.The &man.isl.4; driver has been added,
supporting the Intersil I2C ISL29018 digital ambient light
sensor.
Storage Drivers
The &man.mpr.4;
device has been added, providing support for LSI Fusion-MPT
3 12Gb SCSI/SATA controllers.The &man.mrsas.4; driver has been added,
providing support for LSI MegaRAID SAS controllers. The
&man.mfi.4; driver will attach to the controller, by default.
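Which of the two drivers claims a controller is governed by the hw.mfi.mrsas_enable loader tunable. A minimal sketch for checking what a loader.conf-style file currently sets is shown below. The helper name tunable_value is hypothetical, and the parsing handles only simple name="value" lines.

```shell
# tunable_value FILE NAME
# Print the last value assigned to NAME in a loader.conf-style FILE,
# with one pair of surrounding double quotes removed.
# Prints nothing if NAME is not assigned in FILE.
tunable_value() {
    awk -F= -v n="$2" '
        $1 == n {
            v = substr($0, length(n) + 2)   # text after "NAME="
            gsub(/^"|"$/, "", v)            # strip surrounding quotes
            last = v
            seen = 1
        }
        END { if (seen) print last }
    ' "$1"
}
```

For instance, `tunable_value /boot/loader.conf hw.mfi.mrsas_enable` prints 1 when the tunable has been set as described here, and prints nothing when the file does not mention it.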
To enable &man.mrsas.4;, add
hw.mfi.mrsas_enable=1 to
/boot/loader.conf, which turns off
&man.mfi.4; device probing.At this time, the &man.mfiutil.8; utility and the &os;
version of MegaCLI and
StorCli do not work with
&man.mrsas.4;.The
&man.ctl.4; subsystem has been updated, increasing the ports
limit from 128 to 256,
and LUN limit from 256
to 1024.The asr(4) driver has
been removed, and is no longer supported.The &man.hptnr.4; driver has been
updated to version 1.1.1.The &man.pms.4; driver has been added,
providing support for the PMC Sierra line of
SAS/SATA host bus
adapters.The &man.ioat.4; driver has been added,
providing support for the PSE (Platform
Storage Extension).The
CTL High Availability implementation has
been rewritten.The &man.ctl.4; driver has been updated
to support CD-ROM and removable devices.The &man.isp.4; driver has
been updated and improved: added support for 16Gbps FC cards,
improved target mode support, completed Multi-ID (NPIV)
functionality.
Network Drivers
Support for Broadcom chipsets BCM57764,
BCM57767, BCM57782, BCM57786 and BCM57787 has been added to
&man.bge.4;.Support for the &intel; Centrino™
Wireless-N 135 chipset has been added.Firmware for &intel; Centrino™
Wireless-N 105 devices has been added to the base
system.The deprecated nve(4) driver has been
removed. Users of NVIDIA nForce MCP network adapters are
advised to use the &man.nfe.4; driver instead, which has been
the default driver for this hardware since
&os; 7.0.The if_nf10bmac(4)
device has been added, providing support for NetFPGA-10G
Embedded CPU Ethernet Core.The if_nf10bmac(4) driver operates on
the FPGA, and is not suited for the PCI host
interface.The &man.ath.hal.4; driver has been
updated to support the Atheros AR1111 chipset.Support for the &intel; Centrino™
Wireless-N 105 chipset has been added.Support for the &man.cxgbe.4; Terminator
5 (T5) 10G/40G cards has been added to &man.netmap.4;.The &man.alc.4; driver has been updated
to support AR816x and AR817x ethernet controllers.The &man.pf.4; packet filter default
hash has been changed from Jenkins to
Murmur3, providing a 3-percent performance
increase in packets-per-second.The &man.vxlan.4; driver has been added,
which creates a virtual Layer 2 (Ethernet) network overlaid in
a Layer 3 (IP/UDP) network. The &man.vxlan.4; driver is
analogous to &man.vlan.4;, but is designed to be better suited
for large, multiple-tenant datacenter environments.The
&man.gre.4; driver has been significantly overhauled, and has
been split into two separate modules, &man.gre.4; and
&man.me.4;.The &man.ral.4; driver has been updated
to support the RT5390 and RT5392 chipsets.The &man.sfxge.4; driver has been
updated to support Solarflare Flareon Ultra 7000-series
chipsets.The &man.em.4; driver has been updated
with improved transmission queue hang detection.The &man.cdce.4; driver has been updated
to include support for the RTL8153 chipset.The &man.iwm.4; driver has been imported
from OpenBSD, providing support for &intel; 3160/7260/7265
wireless chipsets.The &man.em.4; driver has been updated
to allow disabling CRC stripping.The &man.pf.4; implementation has been
updated to remove support for the scrub fragment
crop|drop-ovl filtering rule. On systems with this
rule in &man.pf.conf.5;, it is implicitly converted to the
scrub fragment reassemble filtering rule,
without requiring intervention.
The &man.lagg.4; driver has been updated
to remove support for the fec
protocol.
Hardware Support
This section covers general hardware support for physical
machines, hypervisors, and virtualization environments, as well
as hardware changes and updates that do not otherwise fit in
other sections of this document.
Hardware Support
The &man.asmc.4; driver has been
updated to support the &apple; MacMini 3,1.Support for &os;/ia64 has been dropped
as of &os; 11.An issue that could cause a system to
hang when entering ACPI
S3 state (suspend to
RAM) has been corrected in the &man.acpi.4;
and &man.pci.4; drivers.The power management unit
subsystem has been updated to support power button events on
certain &arch.powerpc; hardware, such as the aluminum
PowerBook®.
The &man.hwpmc.4;
driver has been updated to correct performance counter
sampling on G4 (MPC74xxx) and G5 class processors.The
OpenCrypto framework has been
updated to include AES-ICM and
AES-GCM modes, both of which have also been
added to the &man.aesni.4; driver.The &man.hwpmc.4;
driver has been updated to support the Freescale e500
core.The &man.ig4.4; driver has been added,
providing support for the fourth generation &intel;
I2C SMBus.The &man.uart.4; driver has been updated to support
AMT devices on newer systems.Initial SMP support has been
added to the &os;/&arch.arm64; port.
Virtualization Support
Support for the Virtual Interrupt
Delivery feature of &intel; VT-x is enabled if
supported by the CPU. This feature can be disabled by running
sysctl hw.vmm.vmx.use_apic_vid=0.
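Keeping a setting like this across reboots means recording it in a sysctl.conf-style file. The sketch below does so idempotently. The helper name persist_setting is hypothetical, and it handles only plain name=value lines.

```shell
# persist_setting FILE NAME VALUE
# Record NAME=VALUE in a sysctl.conf-style FILE. Any existing
# assignment of NAME is dropped first, so repeated calls leave
# exactly one entry. Other lines, including comments, are kept.
persist_setting() {
    ps_file=$1 ps_name=$2 ps_value=$3
    ps_tmp="${ps_file}.tmp.$$"
    # Copy every line except existing assignments of NAME.
    if [ -f "$ps_file" ]; then
        awk -F= -v n="$ps_name" '$1 != n' "$ps_file" > "$ps_tmp"
    else
        : > "$ps_tmp"
    fi
    printf '%s=%s\n' "$ps_name" "$ps_value" >> "$ps_tmp"
    # Write back in place so an existing file keeps its owner and mode.
    cat "$ps_tmp" > "$ps_file" && rm -f "$ps_tmp"
}
```

Run as root, `persist_setting /etc/sysctl.conf hw.vmm.vmx.use_apic_vid 0` would record the setting named in this note. The same helper works for any other name and value.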
Additionally, to persist this setting across reboots, add
hw.vmm.vmx.use_apic_vid=0 to
/etc/sysctl.conf.Support for Posted Interrupt
Processing is enabled if supported by the CPU. This
feature can be disabled by running sysctl
hw.vmm.vmx.use_apic_pir=0. Additionally, to
persist this setting across reboots, add
hw.vmm.vmx.use_apic_pir=0 to
/etc/sysctl.conf.Unmapped IO support has been added to
&man.virtio_blk.4;.Unmapped IO support has been added to
&man.virtio_scsi.4;.The &man.virtio_random.4; driver has
been added to harvest entropy from the host system.&os;/&arch.i386; guests can be run under
bhyve.Support for running a &os;/&arch.amd64;
Xen guest instance as
PVH guest has been added.
PVH mode, short for Para-Virtualized
Hardware, uses para-virtualized drivers for boot and
I/O, and uses hardware virtualization extensions for all other
tasks, without the need for emulation.The &man.bhyve.8; hypervisor has been
updated to support &amd; processors with
SVM and AMD-V hardware
extensions.The &man.virtio.console.4; driver has
been added, which provides an interface to VirtIO console
devices through a &man.tty.4; device.The &man.bhyve.8; hypervisor has been
updated to support DSM TRIM commands for
virtual AHCI disks.Support for the
QEMUvirt system
has been added.The
Hyper-V™ drivers have been updated with several
enhancements:The &man.hv.vmbus.4; driver now has multi-channel
support.The &man.hv.storvsc.4; driver now has scatter/gather
support, in addition to performance improvements.The &man.hv.kvp.4; driver has received several bug
fixes.Support for &man.xen.4; para-virtualized
domU kernels has been removed.The
&man.hv.netvsc.4; driver has been updated to support checksum
offloading and TSO.The &man.xen.4; driver has been updated
to include support for blkif indirect
segment I/O.

ARM Support

The &man.nand.4; device is enabled for
ARM devices by default.

Support for the Exynos 5420
Octa system has been added.

The SMP
option has been enabled for all Exynos 5 systems supported by
&os;.

Support for the Toradex
Apalis i.MX6 development board has been added.

An issue that could cause
instability when detecting SD cards on the
Raspberry Pi SOC has been fixed.

The bcm2835_cpufreq
driver has been added, which supports CPU
frequency and voltage control on the Raspberry Pi
SOC.

Support to turn off the
BeagleBone Black system with the &man.shutdown.8;
-p flag or by invoking &man.poweroff.8; has
been added.

Audio transmission drivers
have been added for Digital Audio Multiplexer
(AUDMUXM), Smart Direct Memory Access
Controller (SDMA), and Synchronous Serial
Interface (SSI).

Initial support for the ARM AArch64 architecture has been
added.

Kernel support for Thumb-2
userland has been added.

Support for the hardware power button
on the BeagleBone Black system has been added.

Initial ACPI support has been added for
&os;/&arch.arm64;.

Support for 1-Wire devices has been
added, providing support for 1-Wire hardware through
&man.gpio.4;. See &man.ow.4;, &man.owc.4;, and
&man.ow.temp.4; for more information.

Support for the HiSilicon HI6220 SoC has been
added.

Storage

This section covers changes and additions to file systems
and other storage subsystems, both local and networked.

General Storage

The &man.ctl.4; LUN mapping has been rewritten,
replacing iSCSI-specific mapping mechanisms
with a new mechanism that works for any port.

The &man.ctld.8; utility has been updated to allow controlling
non-iSCSI &man.ctl.4; ports.

The &man.autofs.5; subsystem has been updated to include a new
&man.auto.master.5; map, -media, which
allows automatically mounting removable media, such as
CD drives or USB flash
drives.

The &man.autofs.5; subsystem has been updated to include a new
&man.auto.master.5; map, -noauto, which
handles &man.fstab.5; entries set to
noauto.

The GELI class has
been updated to support the BIO_DELETE
&man.g.bio.9; bio_cmd field, providing
TRIM/UNMAP support on
GELI-backed SSD storage
providers.

Networked Storage

The new filesystem automount facility, &man.autofs.5;, has been added.
The new &man.autofs.5; facility is similar to that found in
other &unix;-like operating systems, such as OS X™
and Solaris™. The &man.autofs.5; facility uses
a &sun;-compatible &man.auto.master.5; configuration file, and
is administered with the &man.automount.8; userland utility,
and the &man.automountd.8; and &man.autounmountd.8;
daemons.

Support for the timeo, actimeo,
noac, and proto options
has been added to &man.mount.nfs.8;.

ZFS

The arc_meta_limit
statistics are now visible through the
kstat &man.sysctl.8;. As a result of this
change, the vfs.zfs.arc_meta_used
&man.sysctl.8; has been removed, and replaced with the
kstat.zfs.misc.arcstats.arc_meta_used
&man.sysctl.8;.

The &man.zfs.8; l2arc
code has been updated to take ashift into
account when gathering buffers to be written to the
l2arc device.

The zfsd daemon has been added,
which manages hotspares and replacements in drive slots that publish
physical paths.

&man.geom.4;

Support for the
disklabel64 partitioning scheme has been
added to &man.gpart.8;.

Support for the
apple-boot, apple-hfs,
and apple-ufs MBR
partitioning schemes has been added to &man.gpart.8;.

The &man.gpart.8; utility has been
updated to include a new attribute for GPT
partitions, lenovofix, which, when set,
works around BIOS compatibility
issues reported on several Lenovo™ laptops.

Boot Loader Changes

This section covers the boot loader, boot menu, and other
boot-related changes.

Boot Loader Changes

The memory test run at boot time on &os;/&arch.amd64; platforms
has been disabled by default.

A new &man.ttys.5; class,
3wire, has been added. This is similar to
the existing terminal classes, but does not have a defined
baudrate.

The &man.vt.4; driver has been made the
default system console driver. The &man.syscons.4; driver is
still available, and can be enabled by adding
kern.vty=sc to &man.loader.conf.5;.
Alternatively, &man.syscons.4; can be enabled at boot time by
entering set kern.vty=sc at the
&man.loader.8; prompt.

Support for bzipfs
has been added to the EFI loader.

The boot loader has been updated to
support entering the GELI passphrase before
loading the kernel. To enable this behavior, add
geom_eli_passphrase_prompt="YES" to
&man.loader.conf.5;.

The &man.ttys.5; file for &os;/&arch.arm; has been
updated to enable ttyu1,
ttyu2, and ttyu3 by
default, if the callin port is an active console port.

Boot Menu Changes

Networking

This section describes changes that affect networking in
&os;.

Network Protocols

Support for the IPX network transport
protocol has been removed, and will not be supported in
&os; 11 and later releases.

Support for PLPMTUD
blackhole detection (RFC 4821) has been
added to the &man.tcp.4; stack, disabled by default. New
control tunables have been added:

Tunable                                   Description
net.inet.tcp.pmtud_blackhole_detection    Enables or disables PLPMTUD blackhole detection
net.inet.tcp.pmtud_blackhole_mss          MSS to try for IPv4
net.inet.tcp.v6pmtud_blackhole_mss        MSS to try for IPv6

New monitoring &man.sysctl.8;s have been added:

Tunable                                       Description
net.inet.tcp.pmtud_blackhole_activated        Number of times the code was activated to attempt downshifting the MSS
net.inet.tcp.pmtud_blackhole_min_activated    Number of times the blackhole MSS was used in an attempt to downshift
net.inet.tcp.pmtud_blackhole_failed           Number of times that the blackhole failed to connect after downshifting the MSS

Support for IP
identification for atomic datagrams (RFC
6864) has been added. Support for this feature can be toggled
with the net.inet.ip.rfc6864
&man.sysctl.8;, which is enabled by default.

IPSEC has been
updated to include support for AES modes on
both software-only and hardware-backed (&man.aesni.4;)
systems.

The network stack has been updated to fix handling of
IPv6 On-Link redirects.

The net.inet.tcp.ecn.enable sysctl MIB has been
changed from a binary off/on control to a three-way setting.

Value    Description
0        Totally disable ECN.
1        Enable ECN if incoming connections request it. Outgoing connections will request ECN.
2        Enable ECN if incoming connections request it. Outgoing connections will not request ECN.
+ Dummynet AQM, an independent implementation of
+ CoDel and FQ-CoDel for ipfw/dummynet, has been imported to the base
+ system.
+
Ports Collection and Package Infrastructure

This section covers changes to the &os; Ports
Collection, package infrastructure, and package maintenance and
installation tools.

Infrastructure Changes

Packaging Changes

Documentation

This section covers changes to the &os; Documentation
Project sources and toolchain.

Documentation Source Changes

Documentation Toolchain Changes

Release Engineering and Integration

This section covers changes that are specific to the
&os; Release Engineering processes.

Integration Changes

The Release Engineering build tools have been updated to include
support for producing virtual machine disk images for various
cloud hosting providers.

The Release Engineering build tools have
been updated to use multi-threaded &man.xz.1;. By default,
the number of &man.xz.1; threads is set to the number of cores
available.

The Release Engineering build tools have been updated to include
support for building &os;/&arch.arm64; virtual machine and
memory stick installation images.

The Release Engineering build tools have been updated to support
building &os;/&arch.arm; images without external utilities for
supported boards where a corresponding
u-boot port exists in the Ports
Collection.

The &os;/&arch.i386; memory stick installation images are now
created using the &man.mkimg.1; utility, matching the way
the &os;/&arch.amd64; images are created.
Index: projects/vnet/share/man/man5/rc.conf.5
===================================================================
--- projects/vnet/share/man/man5/rc.conf.5 (revision 301546)
+++ projects/vnet/share/man/man5/rc.conf.5 (revision 301547)
@@ -1,4663 +1,4679 @@
.\" Copyright (c) 1995
.\" Jordan K. Hubbard
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
-.Dd April 30, 2016
+.Dd June 8, 2016
.Dt RC.CONF 5
.Os
.Sh NAME
.Nm rc.conf
.Nd system configuration information
.Sh DESCRIPTION
The file
.Nm
contains descriptive information about the local host name, configuration
details for any potential network interfaces and which services should be
started up at system initial boot time.
In new installations, the
.Nm
file is generally initialized by the system installation utility.
.Pp
The purpose of
.Nm
is not to run commands or perform system startup actions
directly.
Instead, it is included by the
various generic startup scripts in
.Pa /etc
which conditionalize their
internal actions according to the settings found there.
.Pp
The
.Pa /etc/rc.conf
file is included from the file
.Pa /etc/defaults/rc.conf ,
which specifies the default settings for all the available options.
Options need only be specified in
.Pa /etc/rc.conf
when the system administrator wishes to override these defaults.
The file
.Pa /etc/rc.conf.local
is used to override settings in
.Pa /etc/rc.conf
for historical reasons.
.Pp
In addition to
.Pa /etc/rc.conf.local
you can also place smaller configuration files for each
.Xr rc 8
script in the
.Pa /etc/rc.conf.d
directory or
.Ao Ar dir Ac Ns Pa /rc.conf.d
directories specified in
.Va local_startup ,
which will be included by the
.Va load_rc_config
function.
For jail configurations you could use the file
.Pa /etc/rc.conf.d/jail
to store jail specific configuration options.
If
.Va local_startup
contains
.Pa /usr/local/etc/rc.d
and
.Pa /opt/conf ,
.Pa /usr/local/rc.conf.d/jail
and
.Pa /opt/conf/rc.conf.d/jail
will be loaded.
If
.Ao Ar dir Ac Ns Pa /rc.conf.d/ Ns Ao Ar name Ac
is a directory,
all of the files in the directory will be loaded.
Also see the
.Va rc_conf_files
variable below.
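As a rough sketch of the per-script layout described above, a file such as
/etc/rc.conf.d/jail might hold only the jail-related assignments.
The jail name www is a hypothetical placeholder chosen for illustration.

```sh
# /etc/rc.conf.d/jail: picked up by load_rc_config for the jail script
# "www" is a made-up jail name, not something this manual page defines
jail_enable="YES"
jail_list="www"
```

Keeping these assignments out of /etc/rc.conf lets the jail settings be managed as one unit.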
.Pp
Options are set with
.Dq Ar name Ns Li = Ns Ar value
assignments that use
.Xr sh 1
syntax.
The following list provides a name and short description for each
variable that can be set in the
.Nm
file:
.Bl -tag -width indent-two
.It Va rc_debug
.Pq Vt bool
If set to
.Dq Li YES ,
enable output of debug messages from rc scripts.
This variable can be helpful in diagnosing mistakes when
editing or integrating new scripts.
Beware that this produces copious output to the terminal and
.Xr syslog 3 .
.It Va rc_info
.Pq Vt bool
If set to
.Dq Li NO ,
disable informational messages from the rc scripts.
Informational messages are displayed when
a condition that is not serious enough to warrant a warning or
an error occurs.
.It Va rc_startmsgs
.Pq Vt bool
If set to
.Dq Li YES ,
show
.Dq Starting foo:
when faststart is used (e.g., at boot time).
.It Va early_late_divider
.Pq Vt str
The name of the script that should be used as the
delimiter between the
.Dq early
and
.Dq late
stages of the boot process.
The early stage should contain all the services needed to
get the disks (local or remote) mounted so that the late
stage can include scripts contained in the directories
listed in the
.Va local_startup
variable (see below).
Thus, the two likely candidates for this value are
.Pa mountcritlocal
for the typical system, and
.Pa mountcritremote
if the system needs remote file
systems mounted to get access to the
.Va local_startup
directories; for example when
.Pa /usr/local
is NFS mounted.
For
.Pa rc.conf
within a
.Xr jail 8 ,
.Pa NETWORKING
is likely to be an appropriate value.
Extreme care should be taken when changing this value,
and before changing it one should ensure that there are
adequate provisions to recover from a failed boot
(such as physical contact with the machine,
or reliable remote console access).
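For the NFS-mounted /usr/local case mentioned above, the override might look like the following sketch.
The scenario is assumed for illustration; most systems should keep the default.

```sh
# Delay the "late" stage until remote file systems are mounted,
# so scripts under an NFS-mounted /usr/local can be found
early_late_divider="mountcritremote"
```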
.It Va always_force_depends
.Pq Vt bool
Various
.Pa rc.d
scripts use the force_depend function to check whether required
services are already running, and to start them if necessary.
By default during boot time this check is bypassed if the
required service is enabled in
.Pa /etc/rc.conf[.local] .
Setting this option disables that bypass at boot time, so the scripts
always test whether or not the service is actually running.
Enabling this option is likely to increase your boot time if
services are enabled that utilize the force_depend check.
.It Ao Ar name Ac Ns Va _chroot
.Pq Vt str
.Xr chroot 8
to this directory before running the service.
.It Ao Ar name Ac Ns Va _user
.Pq Vt str
Run the service under this user account.
.It Ao Ar name Ac Ns Va _group
.Pq Vt str
Run the chrooted service under this system group.
Unlike the _user
setting, this setting has no effect if the service is not chrooted.
.It Ao Ar name Ac Ns Va _fib
.Pq Vt int
The
.Xr setfib 1
value to run the service under.
.It Ao Ar name Ac Ns Va _nice
.Pq Vt int
The
.Xr nice 1
value to run the service under.
.It Va apm_enable
.Pq Vt bool
If set to
.Dq Li YES ,
enable support for Advanced Power Management with
the
.Xr apm 8
command.
.It Va apmd_enable
.Pq Vt bool
Run
.Xr apmd 8
to handle APM events from userland.
This also enables support for APM.
.It Va apmd_flags
.Pq Vt str
If
.Va apmd_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr apmd 8
daemon.
.It Va devd_enable
.Pq Vt bool
Run
.Xr devd 8
to handle device added, removed or unknown events from the kernel.
.It Va ddb_enable
.Pq Vt bool
Run
.Xr ddb 8
to install
.Xr ddb 4
scripts at boot time.
.It Va ddb_config
.Pq Vt str
Configuration file for
.Xr ddb 8 .
Default
.Pa /etc/ddb.conf .
.It Va kld_list
.Pq Vt str
A list of kernel modules to load right after the local
disks are mounted.
Loading modules at this point in the boot process is
much faster than doing it via
.Pa /boot/loader.conf
for those modules not necessary for mounting local disks.
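A minimal sketch of such a list follows.
The module names are only examples and are assumed to exist on the system; substitute whichever modules are actually needed.

```sh
# Space-separated module names, loaded after the local disks are mounted
# if_bridge and if_tap are illustrative choices
kld_list="if_bridge if_tap"
```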
.It Va kldxref_enable
.Pq Vt bool
Set to
.Dq Li NO
by default.
Set to
.Dq Li YES
to automatically rebuild
.Pa linker.hints
files with
.Xr kldxref 8
at boot time.
.It Va kldxref_clobber
.Pq Vt bool
Set to
.Dq Li NO
by default.
If
.Va kldxref_enable
is true,
setting to
.Dq Li YES
will overwrite existing
.Pa linker.hints
files at boot time.
Otherwise,
only missing
.Pa linker.hints
files are generated.
.It Va kldxref_module_path
.Pq Vt str
Empty by default.
A semi-colon
.Pq Ql \&;
delimited list of paths containing
.Xr kld 4
modules.
If empty,
the contents of the
.Va kern.module_path
.Xr sysctl 8
are used.
.It Va powerd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
enable the system power control facility with the
.Xr powerd 8
daemon.
.It Va powerd_flags
.Pq Vt str
If
.Va powerd_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr powerd 8
daemon.
.It Va tmpmfs
Controls the creation of a
.Pa /tmp
memory file system.
Always happens if set to
.Dq Li YES
and never happens if set to
.Dq Li NO .
If set to anything else, a memory file system is created if
.Pa /tmp
is not writable.
.It Va tmpsize
Controls the size of a created
.Pa /tmp
memory file system.
.It Va tmpmfs_flags
Extra options passed to the
.Xr mdmfs 8
utility when the memory file system for
.Pa /tmp
is created.
The default is
.Dq Li "-S" ,
which inhibits the use of softupdates on
.Pa /tmp
so that file system space is freed without delay
after file truncation or deletion.
See
.Xr mdmfs 8
for other options you can use in
.Va tmpmfs_flags .
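Taken together, the three /tmp variables above might be set as in this sketch.
The size shown is an assumed value, not a recommendation.

```sh
tmpmfs="YES"        # always create a memory file system for /tmp
tmpsize="64m"       # assumed size; pick one that suits the system
tmpmfs_flags="-S"   # the documented default: no softupdates on /tmp
```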
.It Va varmfs
Controls the creation of a
.Pa /var
memory file system.
Always happens if set to
.Dq Li YES
and never happens if set to
.Dq Li NO .
If set to anything else, a memory file system is created if
.Pa /var
is not writable.
.It Va varsize
Controls the size of a created
.Pa /var
memory file system.
.It Va varmfs_flags
Extra options passed to the
.Xr mdmfs 8
utility when the memory file system for
.Pa /var
is created.
The default is
.Dq Li "-S" ,
which inhibits the use of softupdates on
.Pa /var
so that file system space is freed without delay
after file truncation or deletion.
See
.Xr mdmfs 8
for other options you can use in
.Va varmfs_flags .
.It Va populate_var
Controls the automatic population of the
.Pa /var
file system.
Always happens if set to
.Dq Li YES
and never happens if set to
.Dq Li NO .
If set to anything else, a memory file system is created if
.Pa /var
is not writable.
Note that this process requires access to certain commands in
.Pa /usr
before
.Pa /usr
is mounted on normal systems.
.It Va cleanvar_enable
.Pq Vt bool
Clean the
.Pa /var
directory.
.It Va local_startup
.Pq Vt str
List of directories to search for startup script files.
.It Va script_name_sep
.Pq Vt str
The field separator to use for breaking down the list of startup script files
into individual filenames.
The default is a space.
It is not necessary to change this unless there are startup scripts with names
containing spaces.
.It Va hostapd_enable
.Pq Vt bool
Set to
.Dq Li YES
to start
.Xr hostapd 8
at system boot time.
.It Va hostname
.Pq Vt str
The fully qualified domain name (FQDN) of this host on the network.
This should almost certainly be set to something meaningful, even if
there is no network connection.
If
.Xr dhclient 8
is used to set the hostname via DHCP,
this variable should be set to an empty string.
If this value remains unset when the system is done booting
your console login will display the default hostname of
.Dq Amnesiac .
.It Va nisdomainname
.Pq Vt str
The NIS domain name of this host, or
.Dq Li NO
if NIS is not used.
.It Va dhclient_program
.Pq Vt str
Path to the DHCP client program
.Pa ( /sbin/dhclient ,
the
.Ox
DHCP client,
is the default).
.It Va dhclient_flags
.Pq Vt str
Additional flags to pass to the DHCP client program.
For the
.Ox
DHCP client, see the
.Xr dhclient 8
manpage for a description of the command line options available.
.It Va dhclient_flags_ Ns Aq Ar iface
Additional flags to pass to the DHCP client program running on
.Ar iface
only.
When specified, this variable overrides
.Va dhclient_flags .
.It Va background_dhclient
.Pq Vt bool
Set to
.Dq Li YES
to start the DHCP client in background.
This can cause trouble with applications depending on
a working network, but it will provide a faster startup
in many cases.
.It Va background_dhclient_ Ns Aq Ar iface
When specified, this variable overrides the
.Va background_dhclient
variable for interface
.Ar iface
only.
.It Va synchronous_dhclient
.Pq Vt bool
Set to
.Dq Li YES
to start
.Xr dhclient 8
synchronously at startup.
This behavior can be overridden on a per-interface basis by replacing
the
.Dq Li DHCP
keyword in the
.Va ifconfig_ Ns Aq Ar interface
variable with
.Dq Li SYNCDHCP
or
.Dq Li NOSYNCDHCP .
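The per-interface override described above can be sketched as follows.
The interface names em0 and em1 are hypothetical.

```sh
synchronous_dhclient="NO"    # global setting for interfaces using "DHCP"
ifconfig_em0="SYNCDHCP"      # em0 starts dhclient synchronously regardless
ifconfig_em1="NOSYNCDHCP"    # em1 never starts dhclient synchronously
```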
.It Va defaultroute_delay
.Pq Vt int
When set to a positive value, wait up to this long after configuring
DHCP interfaces at startup to give the interfaces time to receive a lease.
.It Va firewall_enable
.Pq Vt bool
Set to
.Dq Li YES
to load firewall rules at startup.
If the kernel was not built with
.Cd "options IPFIREWALL" ,
the
.Pa ipfw.ko
kernel module will be loaded.
See also
.Va ipfilter_enable .
.It Va firewall_script
.Pq Vt str
This variable specifies the full path to the firewall script to run.
The default is
.Pa /etc/rc.firewall .
.It Va firewall_type
.Pq Vt str
Names the firewall type from the selection in
.Pa /etc/rc.firewall ,
or the file which contains the local firewall ruleset.
Valid selections from
.Pa /etc/rc.firewall
are:
.Pp
.Bl -tag -width ".Li simple" -compact
.It Li open
unrestricted IP access
.It Li closed
all IP services disabled, except via
.Dq Li lo0
.It Li client
basic protection for a workstation
.It Li simple
basic protection for a LAN
.El
.Pp
If a filename is specified, the full path
must be given.
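A workstation using one of the predefined rulesets might be configured as in this sketch.
The commented-out alternative uses a hypothetical path to a local ruleset file.

```sh
firewall_enable="YES"     # load firewall rules at startup
firewall_type="client"    # one of: open, closed, client, simple
# Alternatively, name a local ruleset by its full path (hypothetical file):
# firewall_type="/etc/ipfw.rules"
```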
.It Va firewall_quiet
.Pq Vt bool
Set to
.Dq Li YES
to disable the display of firewall rules on the console during boot.
.It Va firewall_logging
.Pq Vt bool
Set to
.Dq Li YES
to enable firewall event logging.
This is equivalent to the
.Dv IPFIREWALL_VERBOSE
kernel option.
.It Va firewall_logif
.Pq Vt bool
Set to
.Dq Li YES
to create pseudo interface
.Li ipfw0
for logging.
For more details, see the
.Xr ipfw 8
manual page.
.It Va firewall_flags
.Pq Vt str
Flags passed to
.Xr ipfw 8
if
.Va firewall_type
specifies a filename.
.It Va firewall_coscripts
.Pq Vt str
List of executables and/or rc scripts to run after firewall starts/stops.
Default is empty.
.\" ----- firewall_nat_enable setting --------------------------------
.It Va firewall_nat_enable
.Pq Vt bool
The
.Xr ipfw 8
equivalent of
.Va natd_enable .
Setting this to
.Dq Li YES
enables kernel NAT.
.Va firewall_enable
must also be set to
.Dq Li YES .
.It Va firewall_nat_interface
.Pq Vt str
The
.Xr ipfw 8
equivalent of
.Va natd_interface .
This is the name of the public interface or IP address on which
kernel NAT should run.
.It Va firewall_nat_flags
.Pq Vt str
Additional configuration parameters for kernel NAT should be placed here.
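The kernel NAT variables above combine roughly as follows.
The interface name em0 is an assumption standing in for the public interface.

```sh
firewall_enable="YES"          # required for kernel NAT
firewall_nat_enable="YES"      # enable kernel NAT
firewall_nat_interface="em0"   # hypothetical public interface (or an IP address)
```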
.It Va dummynet_enable
.Pq Vt bool
Setting this to
.Dq Li YES
will automatically load the
.Xr dummynet 4
module if
.Va firewall_enable
is also set to
.Dq Li YES .
.\" -------------------------------------------------------------------
.It Va natd_program
.Pq Vt str
Path to
.Xr natd 8 .
.It Va natd_enable
.Pq Vt bool
Set to
.Dq Li YES
to enable
.Xr natd 8 .
.Va firewall_enable
must also be set to
.Dq Li YES ,
and
.Xr divert 4
sockets must be enabled in the kernel.
If the kernel was not built with
.Cd "options IPDIVERT" ,
the
.Pa ipdivert.ko
kernel module will be loaded.
.It Va natd_interface
.Pq Vt str
This is the name of the public interface on which
.Xr natd 8
should run.
The interface may be given as an interface name or as an IP address.
.It Va natd_flags
.Pq Vt str
Additional
.Xr natd 8
flags should be placed here.
The
.Fl n
or
.Fl a
flag is automatically added with the above
.Va natd_interface
as an argument.
.\" ----- ipfilter_enable setting --------------------------------
.It Va ipfilter_enable
.Pq Vt bool
Set to
.Dq Li NO
by default.
Setting this to
.Dq Li YES
enables
.Xr ipf 8
packet filtering.
.Pp
Typical usage will require putting
.Bd -literal
ipfilter_enable="YES"
ipnat_enable="YES"
ipmon_enable="YES"
ipfs_enable="YES"
.Ed
.Pp
into
.Pa /etc/rc.conf
and editing
.Pa /etc/ipf.rules
and
.Pa /etc/ipnat.rules
appropriately.
.Pp
Note that
.Va ipfilter_enable
and
.Va ipnat_enable
can be enabled independently.
.Va ipmon_enable
and
.Va ipfs_enable
both require at least one of
.Va ipfilter_enable
and
.Va ipnat_enable
to be enabled.
.Pp
Having
.Bd -literal
options IPFILTER
options IPFILTER_LOG
options IPFILTER_DEFAULT_BLOCK
.Ed
.Pp
in the kernel configuration file is a good idea, too.
.\" ----- ipfilter_program setting ------------------------------
.It Va ipfilter_program
.Pq Vt str
Path to
.Xr ipf 8
(default
.Pa /sbin/ipf ) .
.\" ----- ipfilter_rules setting --------------------------------
.It Va ipfilter_rules
.Pq Vt str
Set to
.Pa /etc/ipf.rules
by default.
This variable contains the name of the filter rule definition file.
The file is expected to be readable for the
.Xr ipf 8
command to execute.
.\" ----- ipv6_ipfilter_rules setting ---------------------------
.It Va ipv6_ipfilter_rules
.Pq Vt str
Set to
.Pa /etc/ipf6.rules
by default.
This variable contains the IPv6 filter rule definition file.
The file is expected to be readable for the
.Xr ipf 8
command to execute.
.\" ----- ipfilter_flags setting --------------------------------
.It Va ipfilter_flags
.Pq Vt str
Empty by default.
This variable contains flags passed to the
.Xr ipf 8
program.
.\" ----- ipnat_enable setting ----------------------------------
.It Va ipnat_enable
.Pq Vt bool
Set to
.Dq Li NO
by default.
Set it to
.Dq Li YES
to enable
.Xr ipnat 8
network address translation.
See
.Va ipfilter_enable
for a detailed discussion.
.\" ----- ipnat_program setting ---------------------------------
.It Va ipnat_program
.Pq Vt str
Path to
.Xr ipnat 8
(default
.Pa /sbin/ipnat ) .
.\" ----- ipnat_rules setting -----------------------------------
.It Va ipnat_rules
.Pq Vt str
Set to
.Pa /etc/ipnat.rules
by default.
This variable contains the name of the file
holding the network address translation definition.
This file is expected to be readable for the
.Xr ipnat 8
command to execute.
.\" ----- ipnat_flags setting -----------------------------------
.It Va ipnat_flags
.Pq Vt str
Empty by default.
This variable contains flags passed to the
.Xr ipnat 8
program.
.\" ----- ipmon_enable setting ----------------------------------
.It Va ipmon_enable
.Pq Vt bool
Set to
.Dq Li NO
by default.
Set it to
.Dq Li YES
to enable
.Xr ipmon 8
monitoring (logging
.Xr ipf 8
and
.Xr ipnat 8
events).
Setting this variable requires setting
.Va ipfilter_enable
or
.Va ipnat_enable
too.
See
.Va ipfilter_enable
for a detailed discussion.
.\" ----- ipmon_program setting ---------------------------------
.It Va ipmon_program
.Pq Vt str
Path to
.Xr ipmon 8
(default
.Pa /sbin/ipmon ) .
.\" ----- ipmon_flags setting -----------------------------------
.It Va ipmon_flags
.Pq Vt str
Set to
.Dq Li -Ds
by default.
This variable contains flags passed to the
.Xr ipmon 8
program.
Another typical example would be
.Dq Fl D Pa /var/log/ipflog
to have
.Xr ipmon 8
log directly to a file bypassing
.Xr syslogd 8 .
Make sure to adjust
.Pa /etc/newsyslog.conf
in such case like this:
.Bd -literal
/var/log/ipflog 640 10 100 * Z /var/run/ipmon.pid
.Ed
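The matching rc.conf side of that file-logging setup could look like this sketch.
It assumes ipfilter is the component being monitored.

```sh
ipfilter_enable="YES"              # ipmon needs ipfilter or ipnat enabled
ipmon_enable="YES"
ipmon_flags="-D /var/log/ipflog"   # log to a file, bypassing syslogd
```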
.\" ----- ipfs_enable setting -----------------------------------
.It Va ipfs_enable
.Pq Vt bool
Set to
.Dq Li NO
by default.
Set it to
.Dq Li YES
to enable
.Xr ipfs 8
saving the filter and NAT state tables during shutdown
and reloading them during startup again.
Setting this variable requires setting
.Va ipfilter_enable
or
.Va ipnat_enable
to
.Dq Li YES
too.
See
.Va ipfilter_enable
for a detailed discussion.
Note that if
.Va kern_securelevel
is set to 3,
.Va ipfs_enable
cannot be used
because the raised securelevel will prevent
.Xr ipfs 8
from saving the state tables at shutdown time.
.\" ----- ipfs_program setting ----------------------------------
.It Va ipfs_program
.Pq Vt str
Path to
.Xr ipfs 8
(default
.Pa /sbin/ipfs ) .
.\" ----- ipfs_flags setting ------------------------------------
.It Va ipfs_flags
.Pq Vt str
Empty by default.
This variable contains flags passed to the
.Xr ipfs 8
program.
.\" ----- end of added ipf hook ---------------------------------
.It Va pf_enable
.Pq Vt bool
Set to
.Dq Li NO
by default.
Setting this to
.Dq Li YES
enables
.Xr pf 4
packet filtering.
.Pp
Typical usage will require putting
.Pp
.Dl pf_enable="YES"
.Pp
into
.Pa /etc/rc.conf
and editing
.Pa /etc/pf.conf
appropriately.
Adding
.Pp
.Dl "device pf"
.Pp
builds support for
.Xr pf 4
into the kernel, otherwise the
kernel module will be loaded.
.It Va pf_rules
.Pq Vt str
Path to
.Xr pf 4
ruleset configuration file
(default
.Pa /etc/pf.conf ) .
.It Va pf_program
.Pq Vt str
Path to
.Xr pfctl 8
(default
.Pa /sbin/pfctl ) .
.It Va pf_flags
.Pq Vt str
If
.Va pf_enable
is set to
.Dq Li YES ,
these flags are passed to the
.Xr pfctl 8
program when loading the ruleset.
.It Va pflog_enable
.Pq Vt bool
Set to
.Dq Li NO
by default.
Setting this to
.Dq Li YES
enables
.Xr pflogd 8
which logs packets from the
.Xr pf 4
packet filter.
.It Va pflog_logfile
.Pq Vt str
If
.Va pflog_enable
is set to
.Dq Li YES
this controls where
.Xr pflogd 8
stores the logfile
(default
.Pa /var/log/pflog ) .
Check
.Pa /etc/newsyslog.conf
to adjust logfile rotation for this.
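As an illustration, packet filtering with logging might be enabled as shown below.
The two paths simply restate the documented defaults.

```sh
pf_enable="YES"
pf_rules="/etc/pf.conf"          # documented default ruleset path
pflog_enable="YES"               # start pflogd to log packets from pf
pflog_logfile="/var/log/pflog"   # documented default log file
```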
.It Va pflog_program
.Pq Vt str
Path to
.Xr pflogd 8
(default
.Pa /sbin/pflogd ) .
.It Va pflog_flags
.Pq Vt str
Empty by default.
This variable contains additional flags passed to the
.Xr pflogd 8
program.
.It Va pflog_instances
.Pq Vt str
If logging to more than one
.Xr pflog 4
interface is desired,
.Va pflog_instances
is set to the list of
.Xr pflogd 8
instances that should be started at system boot time.
If
.Va pflog_instances
is set, for each whitespace-separated
.Ar element
in the list,
.Ao Ar element Ac Ns Va _dev
and
.Ao Ar element Ac Ns Va _logfile
elements are assumed to exist.
.Ao Ar element Ac Ns Va _dev
must contain the
.Xr pflog 4
interface to be watched by the named
.Xr pflogd 8
instance.
.Ao Ar element Ac Ns Va _logfile
must contain the name of the logfile that will be used by the
.Xr pflogd 8
instance.
.It Va ftpproxy_enable
.Pq Vt bool
Set to
.Dq Li NO
by default.
Setting this to
.Dq Li YES
enables
.Xr ftp-proxy 8
which supports the
.Xr pf 4
packet filter in translating FTP connections.
.It Va ftpproxy_flags
.Pq Vt str
Empty by default.
This variable contains additional flags passed to the
.Xr ftp-proxy 8
program.
.It Va ftpproxy_instances
.Pq Vt str
Empty by default.
If multiple instances of
.Xr ftp-proxy 8
are desired at boot time,
.Va ftpproxy_instances
should contain a whitespace-separated list of instance names.
For each
.Ar element
in the list, a variable named
.Ao Ar element Ac Ns Va _flags
should be defined, containing the command-line flags to be passed to the
.Xr ftp-proxy 8
instance.
.It Va pfsync_enable
.Pq Vt bool
Set to
.Dq Li NO
by default.
Setting this to
.Dq Li YES
enables exposing
.Xr pf 4
state changes to other hosts over the network by means of
.Xr pfsync 4 .
The
.Va pfsync_syncdev
variable
must also be set then.
.It Va pfsync_syncdev
.Pq Vt str
Empty by default.
This variable specifies the name of the network interface
.Xr pfsync 4
should operate through.
It must be set accordingly if
.Va pfsync_enable
is set to
.Dq Li YES .
.It Va pfsync_syncpeer
.Pq Vt str
Empty by default.
This variable is optional.
By default, state change messages are sent out on the synchronisation
interface using IP multicast packets.
The protocol is IP protocol 240, PFSYNC, and the multicast group used is
224.0.0.240.
When a peer address is specified using the
.Va pfsync_syncpeer
option, the peer address is used as a destination for the pfsync
traffic, and the traffic can then be protected using
.Xr ipsec 4 .
See the
.Xr pfsync 4
manpage for more details about using
.Xr ipsec 4
with
.Xr pfsync 4
interfaces.
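.Pp
As a minimal sketch, assuming a dedicated synchronisation interface named
.Li em1
and a peer at 192.0.2.2 (both placeholders), the
.Xr pfsync 4
variables might be combined like this:
.Bd -literal
pfsync_enable="YES"
pfsync_syncdev="em1"           # placeholder interface name
pfsync_syncpeer="192.0.2.2"    # optional; placeholder peer address
.Ed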
.It Va pfsync_ifconfig
.Pq Vt str
Empty by default.
This variable can contain additional options to be passed to the
.Xr ifconfig 8
command used to set up
.Xr pfsync 4 .
.It Va tcp_extensions
.Pq Vt bool
Set to
.Dq Li YES
by default.
Setting this to
.Dq Li NO
disables certain TCP options as described by
.Rs
.%T "RFC 1323"
.Re
Setting this to
.Dq Li NO
might help remedy connection problems such as random hangs
or other erratic behavior.
Some network devices are known
to be broken with respect to these options.
.It Va log_in_vain
.Pq Vt int
Set to 0 by default.
The
.Xr sysctl 8
variables,
.Va net.inet.tcp.log_in_vain
and
.Va net.inet.udp.log_in_vain ,
as described in
.Xr tcp 4
and
.Xr udp 4 ,
are set to the given value.
.It Va tcp_keepalive
.Pq Vt bool
Set to
.Dq Li YES
by default.
Setting to
.Dq Li NO
will disable probing idle TCP connections to verify that the
peer is still up and reachable.
.It Va tcp_drop_synfin
.Pq Vt bool
Set to
.Dq Li NO
by default.
Setting to
.Dq Li YES
will cause the kernel to ignore TCP frames that have both
the SYN and FIN flags set.
This prevents OS fingerprinting, but may
break some legitimate applications.
.It Va icmp_drop_redirect
.Pq Vt bool
Set to
.Dq Li NO
by default.
Setting to
.Dq Li YES
will cause the kernel to ignore ICMP REDIRECT packets.
Refer to
.Xr icmp 4
for more information.
.It Va icmp_log_redirect
.Pq Vt bool
Set to
.Dq Li NO
by default.
Setting to
.Dq Li YES
will cause the kernel to log ICMP REDIRECT packets.
Note that
the log messages are not rate-limited, so this option should only be used
for troubleshooting networks.
Refer to
.Xr icmp 4
for more information.
.It Va icmp_bmcastecho
.Pq Vt bool
Set to
.Dq Li YES
to respond to broadcast or multicast ICMP ping packets.
Refer to
.Xr icmp 4
for more information.
.It Va ip_portrange_first
.Pq Vt int
If not set to
.Dq Li NO ,
this is the first port in the default portrange.
Refer to
.Xr ip 4
for more information.
.It Va ip_portrange_last
.Pq Vt int
If not set to
.Dq Li NO ,
this is the last port in the default portrange.
Refer to
.Xr ip 4
for more information.
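.Pp
For illustration only, the first and last port might be set together
as follows; the port numbers are example values, not recommendations:
.Bd -literal
ip_portrange_first="49152"    # example value
ip_portrange_last="65535"     # example value
.Ed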
.It Va network_interfaces
.Pq Vt str
Set to the list of network interfaces to configure on this host or
.Dq Li AUTO
(the default) for all current interfaces.
Setting the
.Va network_interfaces
variable to anything other than the default is deprecated.
Interfaces that the administrator wishes to store configuration for,
but not start at boot should be configured with the
.Dq Li NOAUTO
keyword in their
.Va ifconfig_ Ns Aq Ar interface
variables as described below.
.Pp
An
.Va ifconfig_ Ns Aq Ar interface
variable is also assumed to exist for each value of
.Ar interface .
When an interface name contains any of the characters
.Dq Li .-/+
they are translated to
.Dq Li _
before lookup.
The variable can contain arguments to
.Xr ifconfig 8 ,
as well as special case-insensitive keywords described below.
Such keywords are removed before passing the value to
.Xr ifconfig 8
while the order of the other arguments is preserved.
.Pp
It is possible to add IP alias entries using
.Xr ifconfig 8
syntax with the address family keyword such as
.Li inet .
Assuming that the interface in question was
.Li ed0 ,
it might look something like this:
.Bd -literal
ifconfig_ed0_alias0="inet 127.0.0.253 netmask 0xffffffff"
ifconfig_ed0_alias1="inet 127.0.0.254 netmask 0xffffffff"
.Ed
.Pp
It is also possible to configure multiple IP addresses in Classless
Inter-Domain Routing
.Pq CIDR
address notation,
in which each address component can be a range like
.Li inet 192.0.2.5-23/24
or
.Li inet6 2001:db8:1-f::1/64 .
This notation allows only the address and prefix length parts,
not the other address modifiers.
Note that the maximum number of the generated addresses from a range
specification is limited to an integer value specified in
.Va netif_ipexpand_max
in
.Xr rc.conf 5
because a small typo can unexpectedly generate a large number of addresses.
The default value is
.Li 2048 .
It can be increased by adding the following line into
.Xr rc.conf 5 :
.Bd -literal
netif_ipexpand_max="4096"
.Ed
.Pp
In the case of
.Li 192.0.2.5-23/24 ,
the address 192.0.2.5 will be configured with the
netmask /24 and the addresses 192.0.2.6 to 192.0.2.23 with
the non-conflicting netmask /32 as explained in the
.Xr ifconfig 8
alias section.
Note that this special netmask handling is only for
.Li inet ,
not for the other address families such as
.Li inet6 .
.Pp
With the interface in question being
.Li ed0 ,
an example could look like:
.Bd -literal
ifconfig_ed0_alias2="inet 192.0.2.129/27"
ifconfig_ed0_alias3="inet 192.0.2.1-5/28"
.Ed
.Pp
and so on.
.Pp
Note that the
.Va ipv4_addrs_ Ns Aq Ar interface
variable was supported for IPv4 CIDR address notation.
It is now deprecated because the functionality was integrated into
.Va ifconfig_ Ns Ao Ar interface Ac Ns Va _alias Ns Aq Ar n
though
.Va ipv4_addrs_ Ns Aq Ar interface
is still supported for backward compatibility.
.Pp
For each
.Va ifconfig_ Ns Ao Ar interface Ac Ns Va _alias Ns Aq Ar n
entry with an address family keyword,
its contents are passed to
.Xr ifconfig 8 .
Execution stops at the first unsuccessful access, so if
something like this is present:
.Bd -literal
ifconfig_ed0_alias0="inet 127.0.0.251 netmask 0xffffffff"
ifconfig_ed0_alias1="inet 127.0.0.252 netmask 0xffffffff"
ifconfig_ed0_alias2="inet 127.0.0.253 netmask 0xffffffff"
ifconfig_ed0_alias4="inet 127.0.0.254 netmask 0xffffffff"
.Ed
.Pp
Then note that alias4 would
.Em not
be added since the search would
stop with the missing
.Dq Li alias3
entry.
Because this behavior is difficult to manage,
there is also an
.Va ifconfig_ Ns Ao Ar interface Ac Ns Va _aliases
variable, which has the same functionality as
.Va ifconfig_ Ns Ao Ar interface Ac Ns Va _alias Ns Aq Ar n
and can hold all of the entries in a single variable like the following:
.Bd -literal
ifconfig_ed0_aliases="\\
inet 127.0.0.251 netmask 0xffffffff \\
inet 127.0.0.252 netmask 0xffffffff \\
inet 127.0.0.253 netmask 0xffffffff \\
inet 127.0.0.254 netmask 0xffffffff"
.Ed
.Pp
It also supports CIDR notation.
.Pp
If the
.Pa /etc/start_if. Ns Aq Ar interface
file is present, it is read and executed by the
.Xr sh 1
interpreter
before configuring the interface as specified in the
.Va ifconfig_ Ns Aq Ar interface
and
.Va ifconfig_ Ns Ao Ar interface Ac Ns Va _alias Ns Aq Ar n
variables.
.Pp
If a
.Va vlans_ Ns Aq Ar interface
variable is set,
a
.Xr vlan 4
interface will be created for each item in the list with the
.Ar vlandev
argument set to
.Ar interface .
If a vlan interface's name is a number,
then that number is used as the vlan tag and the new vlan interface is
named
.Ar interface . Ns Ar tag .
Otherwise,
the vlan tag must be specified via a
.Va vlan
parameter in the
.Va create_args_ Ns Aq Ar interface
variable.
.Pp
To create a vlan device named
.Li em0.101
on
.Li em0
with the vlan tag 101 and the optional IPv4 address 192.0.2.1/24:
.Bd -literal
vlans_em0="101"
ifconfig_em0_101="inet 192.0.2.1/24"
.Ed
.Pp
To create a vlan device named
.Li myvlan
on
.Li em0
with the vlan tag 102:
.Bd -literal
vlans_em0="myvlan"
create_args_myvlan="vlan 102"
.Ed
.Pp
If a
.Va wlans_ Ns Aq Ar interface
variable is set,
an
.Xr wlan 4
interface will be created for each item in the list with the
.Ar wlandev
argument set to
.Ar interface .
Further wlan cloning arguments may be passed to the
.Xr ifconfig 8
.Cm create
command by setting the
.Va create_args_ Ns Aq Ar interface
variable.
One or more
.Xr wlan 4
devices must be created for each wireless device as of
.Fx 8.0 .
Debugging flags for
.Xr wlan 4
devices as set by
.Xr wlandebug 8
may be specified with a
.Va wlandebug_ Ns Aq Ar interface
variable.
The contents of this variable will be passed directly to
.Xr wlandebug 8 .
.Pp
If the
.Va ifconfig_ Ns Aq Ar interface
contains the keyword
.Dq Li NOAUTO
then the interface will not be configured
at boot or by
.Pa /etc/pccard_ether
when
.Va network_interfaces
is set to
.Dq Li AUTO .
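.Pp
As a sketch, assuming an interface named
.Li em1
and an example address, the following stores a configuration
without applying it at boot:
.Bd -literal
# em1 and the address are placeholders
ifconfig_em1="inet 192.0.2.10/24 NOAUTO"
.Ed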
.Pp
It is possible to bring up an interface with DHCP by adding
.Dq Li DHCP
to the
.Va ifconfig_ Ns Aq Ar interface
variable.
For instance, to initialize the
.Li ed0
device via DHCP,
it is possible to use something like:
.Bd -literal
ifconfig_ed0="DHCP"
.Ed
.Pp
If you want to configure your wireless interface with
.Xr wpa_supplicant 8
for use with WPA, EAP/LEAP or WEP, you need to add
.Dq Li WPA
to the
.Va ifconfig_ Ns Aq Ar interface
variable.
.Pp
On the other hand, if you want to configure your wireless interface with
.Xr hostapd 8 ,
you need to add
.Dq Li HOSTAP
to the
.Va ifconfig_ Ns Aq Ar interface
variable.
.Xr hostapd 8
will use the settings from
.Pa /etc/hostapd- Ns Ao Ar interface Ac Ns .conf .
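.Pp
A minimal access point sketch, assuming an
.Xr ath 4
device, an example address, and a
.Dq Li wlanmode hostap
cloning argument, might look like:
.Bd -literal
wlans_ath0="wlan0"
create_args_wlan0="wlanmode hostap"         # assumed cloning argument
ifconfig_wlan0="HOSTAP inet 192.0.2.1/24"   # example address
.Ed
.Pp
In this sketch,
.Xr hostapd 8
would be expected to read
.Pa /etc/hostapd-wlan0.conf .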
.Pp
Finally, you can add
.Xr ifconfig 8
options in this variable, in addition to the
.Pa /etc/start_if. Ns Aq Ar interface
file.
For instance, to configure an
.Xr ath 4
wireless device in station mode with an address obtained
via DHCP, using WPA authentication and 802.11b mode, it is
possible to use something like:
.Bd -literal
wlans_ath0="wlan0"
ifconfig_wlan0="DHCP WPA mode 11b"
.Ed
.Pp
In addition to the
.Va ifconfig_ Ns Aq Ar interface
form, a fallback variable
.Va ifconfig_DEFAULT
may be configured.
It will be used for all interfaces with no
.Va ifconfig_ Ns Aq Ar interface
variable.
This is intended to replace the no longer supported
.Va pccard_ifconfig
variable.
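.Pp
For example, the following sketch would make every interface without its own
.Va ifconfig_ Ns Aq Ar interface
variable attempt DHCP:
.Bd -literal
ifconfig_DEFAULT="DHCP"
.Ed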
.Pp
It is also possible to rename an interface by doing:
.Bd -literal
ifconfig_ed0_name="net0"
ifconfig_net0="inet 192.0.2.1 netmask 0xffffff00"
.Ed
.It Va ipv6_enable
.Pq Vt bool
This variable is deprecated.
Use
.Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
and
.Va ipv6_activate_all_interfaces
if necessary.
.Pp
If the variable is
.Dq Li YES ,
.Dq Li inet6 accept_rtadv
is added to all of
.Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
and the
.Va ipv6_activate_all_interfaces
is defined as
.Dq Li YES .
.It Va ipv6_prefer
.Pq Vt bool
This variable is deprecated.
Use
.Va ip6addrctl_policy
instead.
.Pp
If the variable is
.Dq Li YES ,
the default address selection policy table set by
.Xr ip6addrctl 8
will be IPv6-preferred.
.Pp
If the variable is
.Dq Li NO ,
the default address selection policy table set by
.Xr ip6addrctl 8
will be IPv4-preferred.
.It Va ipv6_activate_all_interfaces
.Pq Vt bool
This controls initial configuration on IPv6-capable
interfaces with no corresponding
.Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
variable.
Note that it is not always necessary to set this variable to
.Dq YES
to use IPv6 functionality on
.Fx .
In most cases, just configuring
.Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
variables works.
.Pp
If the variable is
.Dq Li NO ,
all interfaces which do not have a corresponding
.Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
variable will be marked as
.Dq Li IFDISABLED
at creation.
This means that all IPv6 functionality on that interface
is completely disabled to enforce a security policy.
If the variable is set to
.Dq YES ,
the flag will be cleared on all of the interfaces.
.Pp
In most cases, just defining an
.Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
for an IPv6-capable interface should be sufficient.
However, if an interface is added dynamically
.Pq by some tunneling protocols such as PPP, for example ,
it is often difficult to define the variable in advance.
In such a case, configuring the
.Dq Li IFDISABLED
flag can be disabled by setting this variable to
.Dq YES .
.Pp
For more details of the
.Dq Li IFDISABLED
flag and keywords
.Dq Li inet6 ifdisabled ,
see
.Xr ifconfig 8 .
.Pp
Default is
.Dq Li NO .
.It Va ipv6_privacy
.Pq Vt bool
If the variable is
.Dq Li YES
privacy addresses will be generated for each IPv6
interface as described in RFC 4941.
.It Va ipv6_network_interfaces
.Pq Vt str
This is the IPv6 equivalent of
.Va network_interfaces .
Normally manual configuration of this variable is not needed.
.It Va ipv6_cpe_wanif
.Pq Vt str
If the variable is set to an interface name,
the
.Xr ifconfig 8
options
.Dq inet6 -no_radr accept_rtadv
will be added to the specified interface automatically before evaluating
.Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6 ,
and two
.Xr sysctl 8
variables
.Va net.inet6.ip6.rfc6204w3
and
.Va net.inet6.ip6.no_radr
will be set to 1.
.Pp
This means the specified interface will accept ICMPv6 Router
Advertisement messages on that link and add the discovered
routers into the Default Router List.
While the other interfaces can still accept RA messages if the
.Dq inet6 accept_rtadv
option is specified, adding
routes into the Default Router List will be disabled by
.Dq inet6 no_radr
option by default.
See
.Xr ifconfig 8
for more details.
.Pp
Note that ICMPv6 Router Advertisement messages will be
accepted even when
.Va net.inet6.ip6.forwarding
is 1
.Pq packet forwarding is enabled
when
.Va net.inet6.ip6.rfc6204w3
is set to 1.
.Pp
Default is
.Dq Li NO .
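.Pp
As a sketch of a customer-edge router setup, assuming
.Li em0
is the upstream-facing interface (a placeholder name):
.Bd -literal
ipv6_gateway_enable="YES"    # forward IPv6 packets
ipv6_cpe_wanif="em0"         # placeholder WAN interface
.Ed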
.It Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
.Pq Vt str
IPv6 functionality on an interface should be configured by
.Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6 ,
instead of setting ifconfig parameters in
.Va ifconfig_ Ns Aq Ar interface .
If this variable is empty, all IPv6 configuration made for the
specified interface through other variables such as
.Va ipv6_prefix_ Ns Ao Ar interface Ac
will be ignored.
.Pp
Aliases should be set by
.Va ifconfig_ Ns Ao Ar interface Ac Ns Va _alias Ns Aq Ar n
with
.Dq Li inet6
keyword.
For example:
.Bd -literal
ifconfig_ed0_ipv6="inet6 2001:db8:1::1 prefixlen 64"
ifconfig_ed0_alias0="inet6 2001:db8:2::1 prefixlen 64"
.Ed
.Pp
Interfaces that have an
.Dq Li inet6 accept_rtadv
keyword in
.Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
setting will be automatically configured by SLAAC
.Pq StateLess Address AutoConfiguration
described in
.Rs
.%T "RFC 4862"
.Re
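.Pp
For instance, assuming the interface is
.Li ed0 ,
SLAAC could be requested with:
.Bd -literal
ifconfig_ed0_ipv6="inet6 accept_rtadv"
.Ed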
.Pp
Note that a link-local address will be automatically configured in
addition to the configured global-scope addresses because the IPv6
specifications require it on each link.
The address is calculated from the MAC address by using an algorithm
defined in
.Rs
.%T "RFC 4862"
.%O "Section 5.3"
.Re
.Pp
If only a link-local address is needed on the interface,
the following configuration can be used:
.Bd -literal
ifconfig_ed0_ipv6="inet6 auto_linklocal"
.Ed
.Pp
A link-local address can also be configured manually.
This is useful for the default router address of an IPv6 router
so that it does not change when the network interface
card is replaced.
For example:
.Bd -literal
ifconfig_ed0_ipv6="inet6 fe80::1 prefixlen 64"
.Ed
.It Va ipv6_prefix_ Ns Aq Ar interface
.Pq Vt str
If one or more prefixes are defined in
.Va ipv6_prefix_ Ns Aq Ar interface
addresses based on each prefix and the EUI-64 interface index will be
configured on that interface.
Note that this variable will be ignored when
.Va ifconfig_ Ns Ao Ar interface Ac Ns _ipv6
is empty.
.Pp
For example, the following configuration
.Bd -literal
ipv6_prefix_ed0="2001:db8:1:0 2001:db8:2:0"
.Ed
.Pp
is equivalent to the following:
.Bd -literal
ifconfig_ed0_alias0="inet6 2001:db8:1:: eui64 prefixlen 64"
ifconfig_ed0_alias1="inet6 2001:db8:1:: prefixlen 64 anycast"
ifconfig_ed0_alias2="inet6 2001:db8:2:: eui64 prefixlen 64"
ifconfig_ed0_alias3="inet6 2001:db8:2:: prefixlen 64 anycast"
.Ed
.Pp
These Subnet-Router anycast addresses will be added only when
.Va ipv6_gateway_enable
is YES.
.It Va ipv6_default_interface
.Pq Vt str
If not set to
.Dq Li NO ,
this is the default output interface for scoped addresses.
This works only with ipv6_gateway_enable="NO".
.It Va ip6addrctl_enable
.Pq Vt bool
This variable is to enable configuring default address selection policy table
.Pq RFC 3484 .
The table can be specified in another variable
.Va ip6addrctl_policy .
For
.Va ip6addrctl_policy
the following keywords can be specified:
.Dq Li ipv4_prefer ,
.Dq Li ipv6_prefer ,
or
.Dq Li AUTO .
.Pp
If
.Dq Li ipv4_prefer
or
.Dq Li ipv6_prefer
is specified,
.Xr ip6addrctl 8
installs a pre-defined policy table described in Section 2.1
.Pq IPv6-preferred
or 10.3
.Pq IPv4-preferred
of RFC 3484.
.Pp
If
.Dq Li AUTO
is specified, it attempts to read a file
.Pa /etc/ip6addrctl.conf
first.
If this file is found,
.Xr ip6addrctl 8
reads and installs it.
If not found, a policy is automatically set
according to
.Va ipv6_activate_all_interfaces
variable; if the variable is set to
.Dq Li YES
the IPv6-preferred one is used.
Otherwise, the IPv4-preferred one is used.
.Pp
The default values of
.Va ip6addrctl_enable
and
.Va ip6addrctl_policy
are
.Dq Li YES
and
.Dq Li AUTO ,
respectively.
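.Pp
As an illustration, a host that should prefer IPv4 regardless of
.Pa /etc/ip6addrctl.conf
might use:
.Bd -literal
ip6addrctl_enable="YES"
ip6addrctl_policy="ipv4_prefer"
.Ed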
.It Va cloned_interfaces
.Pq Vt str
Set to the list of clonable network interfaces to create on this host.
Further cloning arguments may be passed to the
.Xr ifconfig 8
.Cm create
command for each interface by setting the
.Va create_args_ Ns Aq Ar interface
variable.
If an interface name is specified with
.Dq :sticky
keyword,
the interface will not be destroyed even when
.Pa rc.d/netif
script is invoked with
.Dq stop
argument.
This is useful when reconfiguring the interface without destroying it.
Entries in
.Va cloned_interfaces
are automatically appended to
.Va network_interfaces
for configuration.
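.Pp
A short sketch, assuming a bridge interface that should survive a
.Dq stop
and a second loopback interface that should not
(both names are examples):
.Bd -literal
cloned_interfaces="bridge0:sticky lo1"
.Ed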
.It Va cloned_interfaces_sticky
.Pq Vt bool
This variable is to globally enable functionality of
.Dq :sticky
keyword in
.Va cloned_interfaces
for all interfaces.
The default value is
.Dq NO .
Even if this variable is set to
.Dq YES ,
the
.Dq :nosticky
keyword can be used to override it on a per-interface basis.
.It Va gif_interfaces
.Pq Vt str
This variable is deprecated in favor of
.Va cloned_interfaces .
Set to the list of
.Xr gif 4
tunnel interfaces to configure on this host.
A
.Va gifconfig_ Ns Aq Ar interface
variable is assumed to exist for each value of
.Ar interface .
The value of this variable is used to configure the link layer of the
tunnel according to the syntax of the
.Cm tunnel
option to
.Xr ifconfig 8 .
Additionally, this option ensures that each listed interface is created
via the
.Cm create
option to
.Xr ifconfig 8
before attempting to configure it.
.It Va sppp_interfaces
.Pq Vt str
Set to the list of
.Xr sppp 4
interfaces to configure on this host.
A
.Va spppconfig_ Ns Aq Ar interface
variable is assumed to exist for each value of
.Ar interface .
Each interface should also be configured by a general
.Va ifconfig_ Ns Aq Ar interface
setting.
Refer to
.Xr spppcontrol 8
for more information about available options.
.It Va ppp_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr ppp 8
daemon.
.It Va ppp_profile
.Pq Vt str
The name of the profile to use from
.Pa /etc/ppp/ppp.conf .
Also used for per-profile overrides of
.Va ppp_mode
and
.Va ppp_nat ,
and
.Va ppp_ Ns Ao Ar profile Ac Ns _unit .
When the profile name contains any of the characters
.Dq Li .-/+
they are translated to
.Dq Li _
for the purposes of the override variable names.
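.Pp
As a sketch, assuming a hypothetical profile named
.Dq Li my-isp
in
.Pa /etc/ppp/ppp.conf ,
the per-profile overrides would use the translated name
.Dq Li my_isp :
.Bd -literal
ppp_enable="YES"
ppp_profile="my-isp"       # hypothetical profile name
ppp_my_isp_mode="ddial"    # overrides ppp_mode for this profile
ppp_my_isp_nat="YES"       # overrides ppp_nat for this profile
.Ed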
.It Va ppp_mode
.Pq Vt str
Mode in which to run the
.Xr ppp 8
daemon.
.It Va ppp_ Ns Ao Ar profile Ac Ns _mode
.Pq Vt str
Overrides the global
.Va ppp_mode
for
.Ar profile .
Accepted modes are
.Dq Li auto ,
.Dq Li ddial ,
.Dq Li direct
and
.Dq Li dedicated .
See the manual for a full description.
.It Va ppp_nat
.Pq Vt bool
If set to
.Dq Li YES ,
enables network address translation.
Used in conjunction with
.Va gateway_enable
allows hosts on private network addresses access to the Internet using
this host as a network address translating router.
.It Va ppp_ Ns Ao Ar profile Ac Ns _nat
.Pq Vt str
Overrides the global
.Va ppp_nat
for
.Ar profile .
.It Va ppp_ Ns Ao Ar profile Ac Ns _unit
.Pq Vt int
Set the unit number to be used for this profile.
See the manual description of
.Fl unit Ns Ar N
for details.
.It Va ppp_user
.Pq Vt str
The name of the user under which
.Xr ppp 8
should be started.
By
default,
.Xr ppp 8
is started as
.Dq Li root .
.It Va rc_conf_files
.Pq Vt str
This option is used to specify a list of files that will override
the settings in
.Pa /etc/defaults/rc.conf .
The files will be read in the order in which they are specified and should
include the full path to the file.
By default, the files specified are
.Pa /etc/rc.conf
and
.Pa /etc/rc.conf.local .
.It Va zfs_enable
.Pq Vt bool
If set to
.Dq Li YES ,
.Pa /etc/rc.d/zfs
will attempt to automatically mount ZFS file systems and initialize ZFS volumes
(ZVOLs).
.It Va gptboot_enable
.Pq Vt bool
If set to
.Dq Li YES ,
.Pa /etc/rc.d/gptboot
will log whether the system booted successfully from a GPT partition
that had the
.Ar bootonce
attribute set using the
.Xr gpart 8
utility.
.It Va gbde_autoattach_all
.Pq Vt bool
If set to
.Dq Li YES ,
.Pa /etc/rc.d/gbde
will attempt to automatically initialize your .bde devices in
.Pa /etc/fstab .
.It Va gbde_devices
.Pq Vt str
List the devices that the script should try to attach,
or
.Dq Li AUTO .
.It Va gbde_lockdir
.Pq Vt str
The directory where the
.Xr gbde 4
lockfiles are located.
The default lockfile directory is
.Pa /etc .
.Pp
The lockfile for each individual
.Xr gbde 4
device can be overridden by setting the variable
.Va gbde_lock_ Ns Aq Ar device ,
where
.Ar device
is the encrypted device without the
.Dq Pa /dev/
and
.Dq Pa .bde
parts.
.It Va gbde_attach_attempts
.Pq Vt int
Number of times to attempt attaching to a
.Xr gbde 4
device, i.e., how many times the user is asked for the pass-phrase.
Default is 3.
.It Va geli_devices
.Pq Vt str
List of devices to automatically attach on boot.
Note that .eli devices from
.Pa /etc/fstab
are automatically appended to this list.
.It Va geli_tries
.Pq Vt int
Number of times the user is asked for the pass-phrase.
If empty, it will be taken from
.Va kern.geom.eli.tries
sysctl variable.
.It Va geli_default_flags
.Pq Vt str
Default flags used by
.Xr geli 8
when configuring disk encryption.
Flags can be configured for every device separately by defining
.Va geli_ Ns Ao Ar device Ac Ns Va _flags
variable.
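.Pp
For example, assuming a provider named
.Li da1
and a hypothetical key file location:
.Bd -literal
geli_devices="da1"
# the key file path is an assumption
geli_da1_flags="-p -k /etc/geli/da1.keys"
.Ed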
.It Va geli_autodetach
.Pq Vt str
Specifies if GELI devices should be marked for detach on last close after
file systems are mounted.
Default is
.Dq Li YES .
This can be changed for every device separately by defining
.Va geli_ Ns Ao Ar device Ac Ns Va _autodetach
variable.
.It Va root_rw_mount
.Pq Vt bool
Set to
.Dq Li YES
by default.
After the file systems are checked at boot time, the root file system
is remounted as read-write if this is set to
.Dq Li YES .
Diskless systems that mount their root file system from a read-only remote
NFS share should set this to
.Dq Li NO
in their
.Pa rc.conf .
.It Va fsck_y_enable
.Pq Vt bool
If set to
.Dq Li YES ,
.Xr fsck 8
will be run with the
.Fl y
flag if the initial preen
of the file systems fails.
.It Va background_fsck
.Pq Vt bool
If set to
.Dq Li YES ,
the system will attempt to run
.Xr fsck 8
in the background where possible.
.It Va background_fsck_delay
.Pq Vt int
The amount of time in seconds to sleep before starting a background
.Xr fsck 8 .
It defaults to sixty seconds to allow large applications such as
the X server to start before disk I/O bandwidth is monopolized by
.Xr fsck 8 .
If set to a negative number, the background file system check will be
delayed indefinitely to allow the administrator to run it at a more
convenient time.
For example it may be run from
.Xr cron 8
by adding a line like
.Pp
.Dl "0 4 * * * root /etc/rc.d/bgfsck forcestart"
.Pp
to
.Pa /etc/crontab .
.It Va netfs_types
.Pq Vt str
List of file system types that are network-based.
This list should generally not be modified by end users.
Use
.Va extra_netfs_types
instead.
.It Va extra_netfs_types
.Pq Vt str
If set to something other than
.Dq Li NO
(the default),
this variable extends the list of file system types
for which automatic mounting at startup by
.Xr rc 8
should be delayed until the network is initialized.
It should contain
a whitespace-separated list of network file system descriptor pairs,
each consisting of a file system type as passed to
.Xr mount 8
and a human-readable, one-word description,
joined with a colon
.Pq Ql \&: .
Extending the default list in this way is only necessary
when third party file system types are used.
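.Pp
As a sketch, with
.Dq Li examplefs
standing in for a hypothetical third-party file system type:
.Bd -literal
extra_netfs_types="examplefs:Example"    # type:description pair
.Ed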
.It Va syslogd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr syslogd 8
daemon.
.It Va syslogd_program
.Pq Vt str
Path to
.Xr syslogd 8
(default
.Pa /usr/sbin/syslogd ) .
.It Va syslogd_flags
.Pq Vt str
If
.Va syslogd_enable
is set to
.Dq Li YES ,
these are the flags to pass to
.Xr syslogd 8 .
.It Va inetd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr inetd 8
daemon.
.It Va inetd_program
.Pq Vt str
Path to
.Xr inetd 8
(default
.Pa /usr/sbin/inetd ) .
.It Va inetd_flags
.Pq Vt str
If
.Va inetd_enable
is set to
.Dq Li YES ,
these are the flags to pass to
.Xr inetd 8 .
.It Va hastd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr hastd 8
daemon.
.It Va hastd_program
.Pq Vt str
Path to
.Xr hastd 8
(default
.Pa /sbin/hastd ) .
.It Va hastd_flags
.Pq Vt str
If
.Va hastd_enable
is set to
.Dq Li YES ,
these are the flags to pass to
.Xr hastd 8 .
.It Va local_unbound_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr unbound 8
daemon as a local caching resolver.
.It Va kdc_enable
.Pq Vt bool
Set to
.Dq Li YES
to start a Kerberos 5 authentication server
at boot time.
.It Va kdc_program
.Pq Vt str
If
.Va kdc_enable
is set to
.Dq Li YES
this is the path to Kerberos 5 Authentication Server.
.It Va kdc_flags
.Pq Vt str
Empty by default.
This variable contains additional flags to be passed to the Kerberos 5
authentication server.
.It Va kadmind_enable
.Pq Vt bool
Set to
.Dq Li YES
to start
.Xr kadmind 8 ,
the Kerberos 5 Administration Daemon; set to
.Dq Li NO
on a slave server.
.It Va kadmind_program
.Pq Vt str
If
.Va kadmind_enable
is set to
.Dq Li YES
this is the path to Kerberos 5 Administration Daemon.
.It Va kpasswdd_enable
.Pq Vt bool
Set to
.Dq Li YES
to start
.Xr kpasswdd 8 ,
the Kerberos 5 Password-Changing Daemon; set to
.Dq Li NO
on a slave server.
.It Va kpasswdd_program
.Pq Vt str
If
.Va kpasswdd_enable
is set to
.Dq Li YES
this is the path to Kerberos 5 Password-Changing Daemon.
.It Va kfd_enable
.Pq Vt bool
Set to
.Dq Li YES
to start
.Xr kfd 8 ,
the Kerberos 5 ticket forwarding daemon, at boot time.
.It Va kfd_program
.Pq Vt str
Path to
.Xr kfd 8
(default
.Pa /usr/libexec/kfd ) .
.It Va rwhod_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr rwhod 8
daemon at boot time.
.It Va rwhod_flags
.Pq Vt str
If
.Va rwhod_enable
is set to
.Dq Li YES ,
these are the flags to pass to it.
.It Va amd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr amd 8
daemon at boot time.
.It Va amd_flags
.Pq Vt str
If
.Va amd_enable
is set to
.Dq Li YES ,
these are the flags to pass to it.
See the
.Xr amd 8
manpage for more information.
.It Va amd_map_program
.Pq Vt str
If set,
the specified program is run to get the list of
.Xr amd 8
maps.
For example, if the
.Xr amd 8
maps are stored in NIS, one can set this to
run
.Xr ypcat 1
to get a list of
.Xr amd 8
maps from the
.Pa amd.master
NIS map.
.It Va update_motd
.Pq Vt bool
If set to
.Dq Li YES ,
.Pa /etc/motd
will be updated at boot time to reflect the kernel release
being run.
If set to
.Dq Li NO ,
.Pa /etc/motd
will not be updated.
.It Va nfs_client_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the NFS client daemons at boot time.
.It Va nfs_access_cache
.Pq Vt int
If
.Va nfs_client_enable
is set to
.Dq Li YES ,
this can be set to
.Dq Li 0
to disable NFS ACCESS RPC caching, or to the number of seconds for which
NFS ACCESS
results should be cached.
A value of 2-10 seconds will substantially reduce network
traffic for many NFS operations.
.It Va nfs_server_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the NFS server daemons at boot time.
.It Va nfs_server_flags
.Pq Vt str
If
.Va nfs_server_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr nfsd 8
daemon.
.It Va nfsv4_server_enable
.Pq Vt bool
If both
.Va nfs_server_enable
and
.Va nfsv4_server_enable
are set to
.Dq Li YES ,
enable the server for NFSv4 as well as NFSv2 and NFSv3.
.It Va nfsuserd_enable
.Pq Vt bool
If
.Va nfsuserd_enable
is set to
.Dq Li YES ,
run the nfsuserd daemon, which is needed for NFSv4 in order
to map between user/group names and uid/gid numbers.
If
.Va nfsv4_server_enable
is set to
.Dq Li YES ,
this will be forced enabled.
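.Pp
A minimal NFSv4 server sketch using these variables might be:
.Bd -literal
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"    # forced on anyway when NFSv4 is enabled
.Ed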
.It Va nfsuserd_flags
.Pq Vt str
If
.Va nfsuserd_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr nfsuserd 8
daemon.
.It Va nfscbd_enable
.Pq Vt bool
If
.Va nfscbd_enable
is set to
.Dq Li YES ,
run the nfscbd daemon, which enables callbacks/delegations for the NFSv4 client.
.It Va nfscbd_flags
.Pq Vt str
If
.Va nfscbd_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr nfscbd 8
daemon.
.It Va mountd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
and
.Va nfs_server_enable
is not set, start
.Xr mountd 8 ,
but not the
.Xr nfsd 8
daemon.
This is commonly needed to run CFS without using real NFS.
.It Va mountd_flags
.Pq Vt str
If
.Va mountd_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr mountd 8
daemon.
.It Va weak_mountd_authentication
.Pq Vt bool
If set to
.Dq Li YES ,
allow services like PCNFSD to make non-privileged mount
requests.
.It Va nfs_reserved_port_only
.Pq Vt bool
If set to
.Dq Li YES ,
provide NFS services only on a secure port.
.It Va nfs_bufpackets
.Pq Vt int
If set to a number, indicates the number of packets worth of
socket buffer space to reserve on an NFS client.
The kernel default is typically 4.
Using a higher number may be
useful on gigabit networks to improve performance.
The minimum value is
2 and the maximum is 64.
.It Va rpc_lockd_enable
.Pq Vt bool
If set to
.Dq Li YES
and the host is also an NFS server or client, run
.Xr rpc.lockd 8
at boot time.
.It Va rpc_lockd_flags
.Pq Vt str
If
.Va rpc_lockd_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr rpc.lockd 8
daemon.
.It Va rpc_statd_enable
.Pq Vt bool
If set to
.Dq Li YES
and the host is also an NFS server or client, run
.Xr rpc.statd 8
at boot time.
.It Va rpc_statd_flags
.Pq Vt str
If
.Va rpc_statd_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr rpc.statd 8
daemon.
.It Va rpcbind_program
.Pq Vt str
Path to
.Xr rpcbind 8
(default
.Pa /usr/sbin/rpcbind ) .
.It Va rpcbind_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr rpcbind 8
service at boot time.
.It Va rpcbind_flags
.Pq Vt str
If
.Va rpcbind_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr rpcbind 8
daemon.
.It Va keyserv_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr keyserv 8
daemon on boot for running Secure RPC.
.It Va keyserv_flags
.Pq Vt str
If
.Va keyserv_enable
is set to
.Dq Li YES ,
these are the flags to pass to
.Xr keyserv 8
daemon.
.It Va pppoed_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr pppoed 8
daemon at boot time to provide PPP over Ethernet services.
.It Va pppoed_ Ns Aq Ar provider
.Pq Vt str
.Xr pppoed 8
listens to requests to this
.Ar provider
and ultimately runs
.Xr ppp 8
with a
.Ar system
argument of the same name.
.It Va pppoed_flags
.Pq Vt str
Additional flags to pass to
.Xr pppoed 8 .
.It Va pppoed_interface
.Pq Vt str
The network interface to run
.Xr pppoed 8
on.
This is mandatory when
.Va pppoed_enable
is set to
.Dq Li YES .
.It Va timed_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr timed 8
service at boot time.
This command is intended for networks of
machines where a consistent
.Dq "network time"
for all hosts must be established.
This is often useful in large NFS
environments where time stamps on files are expected to be consistent
network-wide.
.It Va timed_flags
.Pq Vt str
If
.Va timed_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr timed 8
service.
.It Va ntpdate_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run
.Xr ntpdate 8
at system startup.
This command is intended to
synchronize the system clock only
.Em once
from some standard reference.
.It Va ntpdate_config
.Pq Vt str
Configuration file for
.Xr ntpdate 8 .
Default
.Pa /etc/ntp.conf .
.It Va ntpdate_hosts
.Pq Vt str
A whitespace-separated list of NTP servers to synchronize with at startup.
The default is to use the servers listed in
.Va ntpdate_config ,
if that file exists.
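.Pp
A minimal sketch, assuming two hypothetical servers named
.Li ntp1.example.org
and
.Li ntp2.example.org :
.Bd -literal
ntpdate_enable="YES"
ntpdate_hosts="ntp1.example.org ntp2.example.org"
.Ed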
.It Va ntpdate_program
.Pq Vt str
Path to
.Xr ntpdate 8
(default
.Pa /usr/sbin/ntpdate ) .
.It Va ntpdate_flags
.Pq Vt str
If
.Va ntpdate_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr ntpdate 8
command (typically a hostname).
.It Va ntpd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr ntpd 8
command at boot time.
.It Va ntpd_program
.Pq Vt str
Path to
.Xr ntpd 8
(default
.Pa /usr/sbin/ntpd ) .
.It Va ntpd_config
.Pq Vt str
Path to
.Xr ntpd 8
configuration file.
Default
.Pa /etc/ntp.conf .
.It Va ntpd_flags
.Pq Vt str
If
.Va ntpd_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr ntpd 8
daemon.
.It Va ntpd_sync_on_start
.Pq Vt bool
If set to
.Dq Li YES ,
.Xr ntpd 8
is run with the
.Fl g
flag, which syncs the system's clock on startup.
See
.Xr ntpd 8
for more information regarding the
.Fl g
option.
This is a preferred alternative to using
.Xr ntpdate 8
or specifying the
.Va ntpdate_enable
variable.
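.Pp
For example, the following sketch enables
.Xr ntpd 8
and asks it to set the clock once at startup, without using
.Xr ntpdate 8 :
.Bd -literal
ntpd_enable="YES"
ntpd_sync_on_start="YES"
.Ed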
.It Va nis_client_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr ypbind 8
service at system boot time.
.It Va nis_client_flags
.Pq Vt str
If
.Va nis_client_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr ypbind 8
service.
+.It Va nis_ypldap_enable
+.Pq Vt bool
+If set to
+.Dq Li YES ,
+run the
+.Xr ypldap 8
+daemon at system boot time.
+.It Va nis_ypldap_flags
+.Pq Vt str
+If
+.Va nis_ypldap_enable
+is set to
+.Dq Li YES ,
+these are the flags to pass to the
+.Xr ypldap 8
+daemon.
.It Va nis_ypset_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr ypset 8
daemon at system boot time.
.It Va nis_ypset_flags
.Pq Vt str
If
.Va nis_ypset_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr ypset 8
daemon.
.It Va nis_server_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr ypserv 8
daemon at system boot time.
.It Va nis_server_flags
.Pq Vt str
If
.Va nis_server_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr ypserv 8
daemon.
.It Va nis_ypxfrd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr rpc.ypxfrd 8
daemon at system boot time.
.It Va nis_ypxfrd_flags
.Pq Vt str
If
.Va nis_ypxfrd_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr rpc.ypxfrd 8
daemon.
.It Va nis_yppasswdd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr rpc.yppasswdd 8
daemon at system boot time.
.It Va nis_yppasswdd_flags
.Pq Vt str
If
.Va nis_yppasswdd_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr rpc.yppasswdd 8
daemon.
.It Va rpc_ypupdated_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Nm rpc.ypupdated
daemon at system boot time.
.It Va bsnmpd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr bsnmpd 1
daemon at system boot time.
Be sure to understand the security implications of running an SNMP daemon
on your host.
.It Va bsnmpd_flags
.Pq Vt str
If
.Va bsnmpd_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr bsnmpd 1
daemon.
.It Va defaultrouter
.Pq Vt str
If not set to
.Dq Li NO ,
create a default route to this host name or IP address
(use an IP address if this router is also required to get to the
name server!).
.It Va ipv6_defaultrouter
.Pq Vt str
The IPv6 equivalent of
.Va defaultrouter .
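.Pp
As a sketch, assuming routers at addresses taken from the
documentation ranges (replace them with real addresses):
.Bd -literal
defaultrouter="192.0.2.1"
ipv6_defaultrouter="2001:db8::1"
.Ed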
.It Va static_arp_pairs
.Pq Vt str
Set to the list of static ARP pairs that are to be added at system
boot time.
For each whitespace separated
.Ar element
in the value, a
.Va static_arp_ Ns Aq Ar element
variable is assumed to exist whose contents will later be passed to a
.Dq Nm arp Cm -S
operation.
For example
.Bd -literal
static_arp_pairs="gw"
static_arp_gw="192.168.1.1 00:01:02:03:04:05"
.Ed
.It Va static_ndp_pairs
.Pq Vt str
Set to the list of static NDP pairs that are to be added at system
boot time.
For each whitespace separated
.Ar element
in the value, a
.Va static_ndp_ Ns Aq Ar element
variable is assumed to exist whose contents will later be passed to a
.Dq Nm ndp Cm -s
operation.
For example
.Bd -literal
static_ndp_pairs="gw"
static_ndp_gw="2001:db8:3::1 00:01:02:03:04:05"
.Ed
.It Va static_routes
.Pq Vt str
Set to the list of static routes that are to be added at system
boot time.
If not set to
.Dq Li NO
then for each whitespace separated
.Ar element
in the value, a
.Va route_ Ns Aq Ar element
variable is assumed to exist
whose contents will later be passed to a
.Dq Nm route Cm add
operation.
For example:
.Bd -literal
static_routes="ext mcast:gif0 gif0local:gif0"
route_ext="-net 10.0.0.0/24 -gateway 192.168.0.1"
route_mcast="-net 224.0.0.0/4 -iface gif0"
route_gif0local="-host 169.254.1.1 -iface lo0"
.Ed
.Pp
When an
.Ar element
is in the form of
.Li name:ifname ,
the route is specific to the interface
.Li ifname .
.It Va ipv6_static_routes
.Pq Vt str
The IPv6 equivalent of
.Va static_routes .
If not set to
.Dq Li NO
then for each whitespace separated
.Ar element
in the value, a
.Va ipv6_route_ Ns Aq Ar element
variable is assumed to exist
whose contents will later be passed to a
.Dq Nm route Cm add Fl inet6
operation.
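.Pp
For example, a sketch with one hypothetical element named
.Li ext
and addresses from the IPv6 documentation prefix:
.Bd -literal
ipv6_static_routes="ext"
ipv6_route_ext="2001:db8:1:: -prefixlen 48 2001:db8::1"
.Ed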
.It Va natm_static_routes
.Pq Vt str
The
.Xr natmip 4
equivalent of
.Va static_routes .
If not empty then for each whitespace separated
.Ar element
in the value, a
.Va route_ Ns Aq Ar element
variable is assumed to exist whose contents will later be passed to a
.Dq Nm atmconfig Cm natm Cm add
operation.
.It Va gateway_enable
.Pq Vt bool
If set to
.Dq Li YES ,
configure host to act as an IP router, e.g.\& to forward packets
between interfaces.
.It Va ipv6_gateway_enable
.Pq Vt bool
The IPv6 equivalent of
.Va gateway_enable .
.It Va routed_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run a routing daemon of some sort, based on the
settings of
.Va routed_program
and
.Va routed_flags .
.It Va route6d_enable
.Pq Vt bool
The IPv6 equivalent of
.Va routed_enable .
If set to
.Dq Li YES ,
run a routing daemon of some sort, based on the
settings of
.Va route6d_program
and
.Va route6d_flags .
.It Va routed_program
.Pq Vt str
If
.Va routed_enable
is set to
.Dq Li YES ,
this is the name of the routing daemon to use.
.It Va route6d_program
.Pq Vt str
The IPv6 equivalent of
.Va routed_program .
.It Va routed_flags
.Pq Vt str
If
.Va routed_enable
is set to
.Dq Li YES ,
these are the flags to pass to the routing daemon.
.It Va route6d_flags
.Pq Vt str
The IPv6 equivalent of
.Va routed_flags .
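.Pp
As a sketch, assuming
.Xr routed 8
is installed as
.Pa /sbin/routed
and should run in quiet mode
.Pq Fl q :
.Bd -literal
routed_enable="YES"
routed_program="/sbin/routed"
routed_flags="-q"
.Ed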
.It Va mroute6d_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the IPv6 multicast routing daemon.
.Pp
Note that multicast routing daemons are no longer included in the
.Fx
base system; however, both
.Xr mrouted 8
and
.Xr pim6dd 8
may be installed from the
.Fx
Ports Collection.
.It Va mroute6d_flags
.Pq Vt str
If
.Va mroute6d_enable
is set to
.Dq Li YES ,
these are the flags passed to the IPv6 multicast routing daemon.
.It Va mroute6d_program
.Pq Vt str
If
.Va mroute6d_enable
is set to
.Dq Li YES ,
this is the path to the IPv6 multicast routing daemon.
.It Va rtadvd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr rtadvd 8
daemon at boot time.
The
.Xr rtadvd 8
utility sends ICMPv6 Router Advertisement messages to
the interfaces specified in
.Va rtadvd_interfaces .
This should only be enabled with great care.
You may want to fine-tune
.Xr rtadvd.conf 5 .
.It Va rtadvd_interfaces
.Pq Vt str
If
.Va rtadvd_enable
is set to
.Dq Li YES ,
this is the list of interfaces to use.
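.Pp
A minimal sketch, assuming the host already forwards IPv6 packets and
advertises on a hypothetical interface named
.Li em0 :
.Bd -literal
ipv6_gateway_enable="YES"
rtadvd_enable="YES"
rtadvd_interfaces="em0"
.Ed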
.It Va arpproxy_all
.Pq Vt bool
If set to
.Dq Li YES ,
enable global proxy ARP.
.It Va forward_sourceroute
.Pq Vt bool
If set to
.Dq Li YES
and
.Va gateway_enable
is also set to
.Dq Li YES ,
source-routed packets are forwarded.
.It Va accept_sourceroute
.Pq Vt bool
If set to
.Dq Li YES ,
the system will accept source-routed packets directed at it.
.It Va rarpd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr rarpd 8
daemon at system boot time.
.It Va rarpd_flags
.Pq Vt str
If
.Va rarpd_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr rarpd 8
daemon.
.It Va bootparamd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr bootparamd 8
daemon at system boot time.
.It Va bootparamd_flags
.Pq Vt str
If
.Va bootparamd_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr bootparamd 8
daemon.
.It Va stf_interface_ipv4addr
.Pq Vt str
If not set to
.Dq Li NO ,
this is the local IPv4 address for 6to4 (IPv6 over IPv4 tunneling
interface).
Specify this entry to enable the 6to4 interface.
.It Va stf_interface_ipv4plen
.Pq Vt int
Prefix length for 6to4 IPv4 addresses, to limit peer address range.
Effective values are 0-31.
.It Va stf_interface_ipv6_ifid
.Pq Vt str
IPv6 interface ID for
.Xr stf 4 .
This can be set to
.Dq Li AUTO .
.It Va stf_interface_ipv6_slaid
.Pq Vt str
IPv6 Site Level Aggregator for
.Xr stf 4 .
.It Va ipv6_ipv4mapping
.Pq Vt bool
If set to
.Dq Li YES ,
this enables IPv4 mapped IPv6 address communication (like
.Li ::ffff:a.b.c.d ) .
.It Va rtsold_enable
.Pq Vt bool
Set to
.Dq Li YES
to enable the
.Xr rtsold 8
daemon to send ICMPv6 Router Solicitation messages.
.It Va rtsold_flags
.Pq Vt str
If
.Va rtsold_enable
is set to
.Dq Li YES ,
these are the flags to pass to
.Xr rtsold 8 .
.It Va rtsol_flags
.Pq Vt str
For interfaces configured with the
.Dq Li inet6 accept_rtadv
keyword, these are the flags to pass to
.Xr rtsol 8 .
.Pp
Note that
.Va rtsold_enable
is mutually exclusive with
.Va rtsol_flags ;
.Va rtsold_enable
takes precedence.
.It Va atm_enable
.Pq Vt bool
Set to
.Dq Li YES
to enable the configuration of ATM interfaces at system boot time.
For all of the ATM variables described below, please refer to the
.Xr atm 8
manual page for further details on the available command parameters.
Also refer to the files in
.Pa /usr/share/examples/atm
for more detailed configuration information.
.It Va atm_load
.Pq Vt str
This is a list of physical ATM interface drivers to load.
Typical values are
.Dq Li hfa_pci
and/or
.Dq Li hea_pci .
.It Va atm_netif_ Ns Aq Ar intf
.Pq Vt str
For the ATM physical interface
.Ar intf ,
this variable defines the name prefix and count for the ATM network
interfaces to be created.
The value will be passed as the parameters of an
.Dq Nm atm Cm "set netif" Ar intf
command.
.It Va atm_sigmgr_ Ns Aq Ar intf
.Pq Vt str
For the ATM physical interface
.Ar intf ,
this variable defines the ATM signalling manager to be used.
The value will be passed as the parameters of an
.Dq Nm atm Cm attach Ar intf
command.
.It Va atm_prefix_ Ns Aq Ar intf
.Pq Vt str
For the ATM physical interface
.Ar intf ,
this variable defines the NSAP prefix for interfaces using a UNI signalling
manager.
If set to
.Dq Li ILMI ,
the prefix will automatically be set via the
.Xr ilmid 8
daemon.
Otherwise, the value will be passed as the parameters of an
.Dq Nm atm Cm "set prefix" Ar intf
command.
.It Va atm_macaddr_ Ns Aq Ar intf
.Pq Vt str
For the ATM physical interface
.Ar intf ,
this variable defines the MAC address for interfaces using a UNI signalling
manager.
If set to
.Dq Li NO ,
the hardware MAC address contained in the ATM interface card will be used.
Otherwise, the value will be passed as the parameters of an
.Dq Nm atm Cm "set mac" Ar intf
command.
.It Va atm_arpserver_ Ns Aq Ar netif
.Pq Vt str
For the ATM network interface
.Ar netif ,
this variable defines the ATM address for a host which is to provide ATMARP
service.
This variable is only applicable to interfaces using a UNI signalling
manager.
If set to
.Dq Li local ,
this host will become an ATMARP server.
The value will be passed as the parameters of an
.Dq Nm atm Cm "set arpserver" Ar netif
command.
.It Va atm_scsparp_ Ns Aq Ar netif
.Pq Vt bool
If set to
.Dq Li YES ,
SCSP/ATMARP service for the network interface
.Ar netif
will be initiated using the
.Xr scspd 8
and
.Xr atmarpd 8
daemons.
This variable is only applicable if
.Va atm_arpserver_ Ns Aq Ar netif
is set to
.Dq Li local .
.It Va atm_pvcs
.Pq Vt str
Set to the list of ATM PVCs to be added at system
boot time.
For each whitespace separated
.Ar element
in the value, an
.Va atm_pvc_ Ns Aq Ar element
variable is assumed to exist.
The value of each of these variables
will be passed as the parameters of an
.Dq Nm atm Cm "add pvc"
command.
.It Va atm_arps
.Pq Vt str
Set to the list of permanent ATM ARP entries to be added
at system boot time.
For each whitespace separated
.Ar element
in the value, an
.Va atm_arp_ Ns Aq Ar element
variable is assumed to exist.
The value of each of these variables
will be passed as the parameters of an
.Dq Nm atm Cm "add arp"
command.
.It Va natm_interfaces
.Pq Vt str
Set to the list of
.Xr natm 4
interfaces that will also be used for HARP through
.Xr harp 4 .
If this list is not empty all interfaces in the list will be brought up
with
.Xr ifconfig 8
and
.Xr harp 4
will be loaded.
For this to work the interface drivers must be either compiled into the
kernel or must reside on the root partition.
.It Va keybell
.Pq Vt str
The keyboard bell sound.
Set to
.Dq Li normal ,
.Dq Li visual ,
.Dq Li off ,
or
.Dq Li NO
if the default behavior is desired.
For details, refer to the
.Xr kbdcontrol 1
manpage.
.It Va keyboard
.Pq Vt str
If set to a non-null string, the virtual console's keyboard input is
set to this device.
.It Va keymap
.Pq Vt str
If set to
.Dq Li NO ,
no keymap is installed, otherwise the value is used to install
the keymap file found in
.Pa /usr/share/syscons/keymaps/ Ns Ao Ar value Ac Ns Pa .kbd
(if using
.Xr syscons 4 ) or
.Pa /usr/share/vt/keymaps/ Ns Ao Ar value Ac Ns Pa .kbd
(if using
.Xr vt 4 ) .
.It Va keyrate
.Pq Vt str
The keyboard repeat speed.
Set to
.Dq Li slow ,
.Dq Li normal ,
.Dq Li fast ,
or
.Dq Li NO
if the default behavior is desired.
.It Va keychange
.Pq Vt str
If not set to
.Dq Li NO ,
attempt to program the function keys with the value.
The value should
be a single string of the form:
.Dq Ar funkey_number new_value Op Ar funkey_number new_value ... .
.It Va cursor
.Pq Vt str
Can be set to the value of
.Dq Li normal ,
.Dq Li blink ,
.Dq Li destructive ,
or
.Dq Li NO
to set the cursor behavior explicitly or choose the default behavior.
.It Va scrnmap
.Pq Vt str
If set to
.Dq Li NO ,
no screen map is installed, otherwise the value is used to install
the screen map file in
.Pa /usr/share/syscons/scrnmaps/ Ns Aq Ar value .
This parameter is ignored when using
.Xr vt 4
as the console driver.
.It Va font8x16
.Pq Vt str
If set to
.Dq Li NO ,
the default 8x16 font value is used for screen size requests, otherwise
the value in
.Pa /usr/share/syscons/fonts/ Ns Aq Ar value
or
.Pa /usr/share/vt/fonts/ Ns Aq Ar value
is used (depending on the console driver being used).
.It Va font8x14
.Pq Vt str
If set to
.Dq Li NO ,
the default 8x14 font value is used for screen size requests, otherwise
the value in
.Pa /usr/share/syscons/fonts/ Ns Aq Ar value
or
.Pa /usr/share/vt/fonts/ Ns Aq Ar value
is used (depending on the console driver being used).
.It Va font8x8
.Pq Vt str
If set to
.Dq Li NO ,
the default 8x8 font value is used for screen size requests, otherwise
the value in
.Pa /usr/share/syscons/fonts/ Ns Aq Ar value
or
.Pa /usr/share/vt/fonts/ Ns Aq Ar value
is used (depending on the console driver being used).
.It Va blanktime
.Pq Vt int
If set to
.Dq Li NO ,
the default screen blanking interval is used, otherwise it is set
to
.Ar value
seconds.
.It Va saver
.Pq Vt str
If not set to
.Dq Li NO ,
this is the actual screen saver to use
.Li ( blank , snake , daemon ,
etc).
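.Pp
For example, the following sketch blanks the screen after an
illustrative 300 seconds using the
.Li daemon
saver named above:
.Bd -literal
blanktime="300"
saver="daemon"
.Ed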
.It Va moused_nondefault_enable
.Pq Vt str
If set to
.Dq Li NO ,
the mouse device specified on
the command line is not automatically treated as enabled by the
.Pa /etc/rc.d/moused
script.
Having this variable set to
.Dq Li YES
allows a
.Xr usb 4
mouse,
for example,
to be enabled as soon as it is plugged in.
.It Va moused_enable
.Pq Vt str
If set to
.Dq Li YES ,
the
.Xr moused 8
daemon is started for doing cut/paste selection on the console.
.It Va moused_type
.Pq Vt str
This is the protocol type of the mouse connected to this host.
This variable must be set if
.Va moused_enable
is set to
.Dq Li YES .
The
.Xr moused 8
daemon
is able to detect the appropriate mouse type automatically in many cases.
Set this variable to
.Dq Li auto
to let the daemon detect it, or
select one from the following list if the automatic detection fails.
.Pp
If the mouse is attached to the PS/2 mouse port, choose
.Dq Li auto
or
.Dq Li ps/2 ,
regardless of the brand and model of the mouse.
Likewise, if the
mouse is attached to the bus mouse port, choose
.Dq Li auto
or
.Dq Li busmouse .
All other protocols are for serial mice and will not work with
the PS/2 and bus mice.
If this is a USB mouse,
.Dq Li auto
is the only protocol type which will work.
.Pp
.Bl -tag -width ".Li x10mouseremote" -compact
.It Li microsoft
Microsoft mouse (serial)
.It Li intellimouse
Microsoft IntelliMouse (serial)
.It Li mousesystems
Mouse Systems Corp.\& mouse (serial)
.It Li mmseries
MM Series mouse (serial)
.It Li logitech
Logitech mouse (serial)
.It Li busmouse
A bus mouse
.It Li mouseman
Logitech MouseMan and TrackMan (serial)
.It Li glidepoint
ALPS GlidePoint (serial)
.It Li thinkingmouse
Kensington ThinkingMouse (serial)
.It Li ps/2
PS/2 mouse
.It Li mmhittab
MM HitTablet (serial)
.It Li x10mouseremote
X10 MouseRemote (serial)
.It Li versapad
Interlink VersaPad (serial)
.El
.Pp
Even if the mouse is not in the above list, it may be compatible
with one in the list.
Refer to the manual page for
.Xr moused 8
for compatibility information.
.Pp
It should also be noted that while this is enabled, any
other client of the mouse (such as an X server) should access
the mouse through the virtual mouse device,
.Pa /dev/sysmouse ,
and configure it as a
.Dq Li sysmouse
type mouse, since all
mouse data is converted to this single canonical format when
using
.Xr moused 8 .
If the client program does not support the
.Dq Li sysmouse
type,
specify the
.Dq Li mousesystems
type.
It is the second preferred type.
.It Va moused_port
.Pq Vt str
If
.Va moused_enable
is set to
.Dq Li YES ,
this is the actual port the mouse is on.
It might be
.Pa /dev/cuau0
for a COM1 serial mouse,
.Pa /dev/psm0
for a PS/2 mouse or
.Pa /dev/mse0
for a bus mouse, for example.
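.Pp
As a sketch, assuming a PS/2 mouse whose protocol can be detected
automatically:
.Bd -literal
moused_enable="YES"
moused_type="auto"
moused_port="/dev/psm0"
.Ed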
.It Va moused_flags
.Pq Vt str
If
.Va moused_flags
is set, its value is used as an additional set of flags to pass to the
.Xr moused 8
daemon.
.It Va "moused_" Ns Ar XXX Ns Va "_flags"
When
.Va moused_nondefault_enable
is enabled, and a
.Xr moused 8
daemon is started for a non-default port, the
.Va "moused_" Ns Ar XXX Ns Va "_flags"
set of options has precedence over and replaces the default
.Va moused_flags
(where
.Ar XXX
is the name of the non-default port, e.g.,\&
.Ar ums0 ) .
By setting
.Va "moused_" Ns Ar XXX Ns Va "_flags"
it is possible to set up a different set of default flags for each
.Xr moused 8
instance.
For example, you can use
.Dq Li "-3"
for the default
.Va moused_flags
to make your laptop's touchpad more comfortable to use,
but an empty set of options for
.Va moused_ums0_flags
when your
.Xr usb 4
mouse has three or more buttons.
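.Pp
The laptop scenario above might be sketched as follows, assuming the
USB mouse attaches as
.Pa ums0 :
.Bd -literal
moused_nondefault_enable="YES"
moused_flags="-3"
moused_ums0_flags=""
.Ed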
.It Va mousechar_start
.Pq Vt int
If set to
.Dq Li NO ,
the default mouse cursor character range
.Li 0xd0 Ns - Ns Li 0xd3
is used,
otherwise the range start is set
to
.Ar value
character, see
.Xr vidcontrol 1 .
Use if the default range is occupied in the language code table.
.It Va allscreens_flags
.Pq Vt str
If set,
.Xr vidcontrol 1
is run with these options for each of the virtual terminals
.Pq Pa /dev/ttyv* .
For example,
.Dq Fl m Cm on
will enable the mouse pointer on all virtual terminals
if
.Va moused_enable
is set to
.Dq Li YES .
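.Pp
Written out as configuration, that example reads:
.Bd -literal
moused_enable="YES"
allscreens_flags="-m on"
.Ed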
.It Va allscreens_kbdflags
.Pq Vt str
If set,
.Xr kbdcontrol 1
is run with these options for each of the virtual terminals
.Pq Pa /dev/ttyv* .
For example,
.Dq Fl h Li 200
will set the
.Xr syscons 4
or
.Xr vt 4
scrollback (history) buffer to 200 lines.
.It Va cron_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr cron 8
daemon at system boot time.
.It Va cron_program
.Pq Vt str
Path to
.Xr cron 8
(default
.Pa /usr/sbin/cron ) .
.It Va cron_flags
.Pq Vt str
If
.Va cron_enable
is set to
.Dq Li YES ,
these are the flags to pass to
.Xr cron 8 .
.It Va cron_dst
.Pq Vt bool
If set to
.Dq Li YES ,
enable special handling of transitions to and from
Daylight Saving Time in
.Xr cron 8
(equivalent to using the flag
.Fl s ) .
.It Va lpd_program
.Pq Vt str
Path to
.Xr lpd 8
(default
.Pa /usr/sbin/lpd ) .
.It Va lpd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr lpd 8
daemon at system boot time.
.It Va lpd_flags
.Pq Vt str
If
.Va lpd_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr lpd 8
daemon.
.It Va chkprintcap_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run the
.Xr chkprintcap 8
command before starting the
.Xr lpd 8
daemon.
.It Va chkprintcap_flags
.Pq Vt str
If
.Va lpd_enable
and
.Va chkprintcap_enable
are set to
.Dq Li YES ,
these are the flags to pass to the
.Xr chkprintcap 8
program.
The default is
.Dq Li -d ,
which causes missing directories to be created.
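.Pp
As a sketch, a print server that checks its
.Xr printcap 5
before starting the spooler might set:
.Bd -literal
lpd_enable="YES"
chkprintcap_enable="YES"
chkprintcap_flags="-d"
.Ed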
.It Va mta_start_script
.Pq Vt str
This variable specifies the full path to the script to run to start
a mail transfer agent.
The default is
.Pa /etc/rc.sendmail .
The
.Va sendmail_*
variables which
.Pa /etc/rc.sendmail
uses are documented in the
.Xr rc.sendmail 8
manual page.
.It Va dumpdev
.Pq Vt str
Indicates the device (usually a swap partition) to which a crash dump
should be written in the event of a system crash.
If the value of this variable is
.Dq Li AUTO ,
the first suitable swap device listed in
.Pa /etc/fstab
will be used as dump device.
Otherwise, the value of this variable is passed as the argument to
.Xr dumpon 8 .
To disable crash dumps, set this variable to
.Dq Li NO .
.It Va dumpdir
.Pq Vt str
When the system reboots after a crash and a crash dump is found on the
device specified by the
.Va dumpdev
variable,
.Xr savecore 8
will save that crash dump and a copy of the kernel to the directory
specified by the
.Va dumpdir
variable.
The default value is
.Pa /var/crash .
Set to
.Dq Li NO
to not run
.Xr savecore 8
at boot time when
.Va dumpdir
is set.
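.Pp
For example, this sketch selects the dump device automatically and
keeps saved dumps in the default directory:
.Bd -literal
dumpdev="AUTO"
dumpdir="/var/crash"
.Ed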
.It Va savecore_enable
.Pq Vt bool
If set to
.Dq Li NO ,
disable automatic extraction of the crash dump from the
.Va dumpdev .
.It Va savecore_flags
.Pq Vt str
If crash dumps are enabled, these are the flags to pass to the
.Xr savecore 8
utility.
.It Va quota_enable
.Pq Vt bool
Set to
.Dq Li YES
to turn on user and group disk quotas on system startup via the
.Xr quotaon 8
command for all file systems marked as having quotas enabled in
.Pa /etc/fstab .
The kernel must be built with
.Cd "options QUOTA"
for disk quotas to function.
.It Va check_quotas
.Pq Vt bool
Set to
.Dq Li YES
to enable user and group disk quota checking via the
.Xr quotacheck 8
command.
.It Va quotacheck_flags
.Pq Vt str
If
.Va quota_enable
is set to
.Dq Li YES ,
and
.Va check_quotas
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr quotacheck 8
utility.
The default is
.Dq Li "-a" ,
which checks quotas for all file systems with quotas enabled in
.Pa /etc/fstab .
.It Va quotaon_flags
.Pq Vt str
If
.Va quota_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr quotaon 8
utility.
The default is
.Dq Li "-a" ,
which enables quotas for all file systems with quotas enabled in
.Pa /etc/fstab .
.It Va quotaoff_flags
.Pq Vt str
If
.Va quota_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr quotaoff 8
utility when shutting down the quota system.
The default is
.Dq Li "-a" ,
which disables quotas for all file systems with quotas enabled in
.Pa /etc/fstab .
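.Pp
A minimal sketch that enables and checks quotas at startup, spelling
out the default flags described above:
.Bd -literal
quota_enable="YES"
check_quotas="YES"
quotacheck_flags="-a"
quotaon_flags="-a"
quotaoff_flags="-a"
.Ed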
.It Va accounting_enable
.Pq Vt bool
Set to
.Dq Li YES
to enable system accounting through the
.Xr accton 8
facility.
.It Va ibcs2_enable
.Pq Vt bool
Set to
.Dq Li YES
to enable iBCS2 (SCO) binary emulation at system initial boot
time.
.It Va ibcs2_loaders
.Pq Vt str
If not set to
.Dq Li NO
and if
.Va ibcs2_enable
is set to
.Dq Li YES ,
this specifies a list of additional iBCS2 loaders to enable.
.It Va firstboot_sentinel
.Pq Vt str
This variable specifies the full path to a
.Dq first boot
sentinel file.
If a file exists with this path,
.Pa rc.d
scripts with the
.Dq firstboot
keyword will be run on startup and the sentinel file will be deleted
after the boot process completes.
The sentinel file must be located on a writable file system which is
mounted no later than
.Va early_late_divider
to function properly.
The default is
.Pa /firstboot .
.It Va linux_enable
.Pq Vt bool
Set to
.Dq Li YES
to enable Linux/ELF binary emulation at system initial
boot time.
.It Va svr4_enable
.Pq Vt bool
If set to
.Dq Li YES ,
enable SysVR4 emulation at boot time.
.It Va sysvipc_enable
.Pq Vt bool
If set to
.Dq Li YES ,
load System V IPC primitives at boot time.
.It Va clear_tmp_enable
.Pq Vt bool
Set to
.Dq Li YES
to have
.Pa /tmp
cleaned at startup.
.It Va clear_tmp_X
.Pq Vt bool
Set to
.Dq Li NO
to disable removing of X11 lock files,
and the removal and (secure) recreation
of the various socket directories for X11
related programs.
.It Va ldconfig_paths
.Pq Vt str
Set to the list of shared library paths to use with
.Xr ldconfig 8 .
NOTE:
.Pa /usr/lib
will always be added first, so it need not appear in this list.
.It Va ldconfig32_paths
.Pq Vt str
Set to the list of 32-bit compatibility shared library paths to
use with
.Xr ldconfig 8 .
.It Va ldconfig_paths_aout
.Pq Vt str
Set to the list of shared library paths to use with
.Xr ldconfig 8
legacy
.Xr a.out 5
support.
.It Va ldconfig_insecure
.Pq Vt bool
The
.Xr ldconfig 8
utility normally refuses to use directories
which are writable by anyone except root.
Set this variable to
.Dq Li YES
to disable that security check during system startup.
.It Va ldconfig_local_dirs
.Pq Vt str
Set to the list of local
.Xr ldconfig 8
directories.
The names of all files in the directories listed will be
passed as arguments to
.Xr ldconfig 8 .
.It Va ldconfig_local32_dirs
.Pq Vt str
Set to the list of local 32-bit compatibility
.Xr ldconfig 8
directories.
The names of all files in the directories listed will be
passed as arguments to
.Dq Nm ldconfig Fl 32 .
.It Va kern_securelevel_enable
.Pq Vt bool
Set to
.Dq Li YES
to set the kernel security level at system startup.
.It Va kern_securelevel
.Pq Vt int
The kernel security level to set at startup.
The allowed
.Ar value
ranges from \-1 (the compile time default) to 3 (the
most secure).
See
.Xr security 7
for the list of possible security levels and their effect
on system operation.
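.Pp
As a sketch, with level 1 chosen purely for illustration:
.Bd -literal
kern_securelevel_enable="YES"
kern_securelevel="1"
.Ed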
.It Va sshd_program
.Pq Vt str
Path to the SSH server program
.Pa ( /usr/sbin/sshd
is the default).
.It Va sshd_enable
.Pq Vt bool
Set to
.Dq Li YES
to start
.Xr sshd 8
at system boot time.
.It Va sshd_flags
.Pq Vt str
If
.Va sshd_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr sshd 8
daemon.
.It Va ftpd_program
.Pq Vt str
Path to the FTP server program
.Pa ( /usr/libexec/ftpd
is the default).
.It Va ftpd_enable
.Pq Vt bool
Set to
.Dq Li YES
to start
.Xr ftpd 8
as a stand-alone daemon at system boot time.
.It Va ftpd_flags
.Pq Vt str
If
.Va ftpd_enable
is set to
.Dq Li YES ,
these are the additional flags to pass to the
.Xr ftpd 8
daemon.
.It Va watchdogd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
start the
.Xr watchdogd 8
daemon at boot time.
This requires that the kernel have been compiled with a
.Xr watchdog 4
compatible device.
.It Va watchdogd_flags
.Pq Vt str
If
.Va watchdogd_enable
is set to
.Dq Li YES ,
these are the flags passed to the
.Xr watchdogd 8
daemon.
.It Va devfs_rulesets
.Pq Vt str
List of files containing sets of rules for
.Xr devfs 8 .
.It Va devfs_system_ruleset
.Pq Vt str
Rule name(s) to apply to the system
.Pa /dev
itself.
.It Va devfs_set_rulesets
.Pq Vt str
Pairs of already-mounted
.Pa dev
directories and rulesets that should be applied to them.
For example:
.Dq Li /mount/dev=ruleset_name .
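.Pp
As a sketch, assuming a hypothetical ruleset named
.Li localrules
is defined in one of the files listed in
.Va devfs_rulesets :
.Bd -literal
devfs_system_ruleset="localrules"
devfs_set_rulesets="/mount/dev=localrules"
.Ed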
.It Va devfs_load_rulesets
.Pq Vt bool
If set, always load the default rulesets listed in
.Va devfs_rulesets .
.It Va performance_cx_lowest
.Pq Vt str
CPU idle state to use while on AC power.
The string
.Dq Li LOW
indicates that
.Xr acpi 4
should use the lowest power state available while
.Dq Li HIGH
indicates that the lowest latency state (less power savings) should be used.
.It Va performance_cpu_freq
.Pq Vt str
CPU clock frequency to use while on AC power.
The string
.Dq Li LOW
indicates that
.Xr cpufreq 4
should use the lowest frequency available while
.Dq Li HIGH
indicates that the highest frequency (less power savings) should be used.
.It Va economy_cx_lowest
.Pq Vt str
CPU idle state to use when off AC power.
The string
.Dq Li LOW
indicates that
.Xr acpi 4
should use the lowest power state available while
.Dq Li HIGH
indicates that the lowest latency state (less power savings) should be used.
.It Va economy_cpu_freq
.Pq Vt str
CPU clock frequency to use when off AC power.
The string
.Dq Li LOW
indicates that
.Xr cpufreq 4
should use the lowest frequency available while
.Dq Li HIGH
indicates that the highest frequency (less power savings) should be used.
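.Pp
For example, a laptop might be sketched as favoring low latency on
AC power and power savings on battery:
.Bd -literal
performance_cx_lowest="HIGH"
performance_cpu_freq="HIGH"
economy_cx_lowest="LOW"
economy_cpu_freq="LOW"
.Ed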
.It Va jail_enable
.Pq Vt bool
If set to
.Dq Li NO ,
any configured jails will not be started.
.It Va jail_conf
.Pq Vt str
The configuration filename used by the
.Xr jail 8
utility.
The default value is
.Pa /etc/jail.conf .
.It Va jail_parallel_start
.Pq Vt bool
If set to
.Dq Li YES ,
all configured jails will be started in the background (in parallel).
.It Va jail_flags
.Pq Vt str
Unset by default.
When set, use as default value for
.Va jail_ Ns Ao Ar jname Ac Ns Va _flags
for every jail in
.Va jail_list .
.It Va jail_list
.Pq Vt str
A space-delimited list of jail names.
When left empty, all of the
.Xr jail 8
instances defined in the configuration file are started.
The names specified in this list control the jail startup order.
.Xr jail 8
instances missing from
.Va jail_list
must be started manually.
Note that a jail's
.Va depend
parameter in the configuration file may override this list.
.It Va jail_reverse_stop
.Pq Vt bool
When set to
.Dq Li YES ,
all configured jails in
.Va jail_list
are stopped in reverse order.
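.Pp
A minimal sketch, assuming two hypothetical jails named
.Li db
and
.Li www
are defined in
.Pa /etc/jail.conf ,
starts
.Li db
first and stops it last:
.Bd -literal
jail_enable="YES"
jail_list="db www"
jail_reverse_stop="YES"
.Ed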
.It Va jail_* variables
Note that older releases supported per-jail configuration via
.Xr rc.conf 5
variables.
For example,
the hostname of a jail named
.Li vjail
could be set with
.Li jail_vjail_hostname .
These per-jail configuration variables are now obsolete in favor of the
.Xr jail 8
configuration file.
For backward compatibility,
when per-jail configuration variables are defined,
.Xr jail 8
configuration files are created as
.Pa /var/run/jail. Ns Ao Ar jname Ac Ns Pa .conf
and used.
.Pp
The following per-jail parameters are set by the
.Pa rc.d/jail
script from their corresponding
.Nm
variables.
In addition to them, parameters in
.Va jail_ Ns Ao Ar jname Ac Ns Va _parameters
will be added to the configuration file.
They must be a semicolon
.Pq Ql \&;
delimited list of
.Dq key=value .
For more details,
see the
.Xr jail 8
manual page.
.Bl -tag -width "host.hostname" -offset indent
.It Li path
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _rootdir
.It Li host.hostname
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _hostname
.It Li exec.consolelog
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _consolelog .
The default value is
.Pa /var/log/jail_ Ao Ar jname Ac Pa _console.log .
.It Li interface
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _interface .
.It Li vnet.interface
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _vnet_interface .
This implies that the
.Li vnet
parameter will be enabled; it cannot be specified with
.Va jail_ Ns Ao Ar jname Ac Ns Va _interface ,
.Va jail_ Ns Ao Ar jname Ac Ns Va _ip
and/or
.Va jail_ Ns Ao Ar jname Ac Ns Va _ip_multi Ns Aq Ar n
at the same time.
.It Li fstab
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _fstab
.It Li mount
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _procfs_enable .
.It Li exec.fib
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _fib
.It Li exec.start
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _exec_start .
The parameter name was
.Li command
in some older releases.
.It Li exec.prestart
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _exec_prestart
.It Li exec.poststart
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _exec_poststart
.It Li exec.stop
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _exec_stop
.It Li exec.prestop
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _exec_prestop
.It Li exec.poststop
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _exec_poststop
.It Li ip4.addr
set if
.Va jail_ Ns Ao Ar jname Ac Ns Va _ip
or
.Va jail_ Ns Ao Ar jname Ac Ns Va _ip_multi Ns Aq Ar n
contain IPv4 addresses
.It Li ip6.addr
set if
.Va jail_ Ns Ao Ar jname Ac Ns Va _ip
or
.Va jail_ Ns Ao Ar jname Ac Ns Va _ip_multi Ns Aq Ar n
contain IPv6 addresses
.It Li allow.mount
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _mount_enable
.It Li mount.devfs
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _devfs_enable
.It Li devfs_ruleset
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _devfs_ruleset .
This must be an integer,
not a string.
.It Li mount.fdescfs
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _fdescfs_enable
.It Li allow.set_hostname
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _set_hostname_allow
.It Li allow.rawsocket
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _socket_unixiproute_only
.It Li allow.sysvipc
set from
.Va jail_ Ns Ao Ar jname Ac Ns Va _sysvipc_allow
.El
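.Pp
As a rough sketch of this translation,
assume a hypothetical jail named
.Li vjail
with made-up path, hostname, and address values.
Legacy variables such as the following:
.Bd -literal
jail_enable="YES"
jail_list="vjail"
jail_vjail_rootdir="/usr/jails/vjail"
jail_vjail_hostname="vjail.example.org"
jail_vjail_ip="192.0.2.10"
jail_vjail_devfs_enable="YES"
.Ed
.Pp
Going by the list above, they correspond approximately to this hand-written
.Xr jail.conf 5
fragment.
The file actually generated by
.Pa rc.d/jail
may contain additional parameters.
.Bd -literal
vjail {
	path = "/usr/jails/vjail";
	host.hostname = "vjail.example.org";
	ip4.addr = "192.0.2.10";
	mount.devfs;
}
.Ed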
.\" -----------------------------------------------------
.It Va harvest_mask
.Pq Vt int
Set to a bit-mask
representing the entropy sources
you wish to harvest.
Refer to
.Xr random 4
for more information.
.It Va entropy_dir
.Pq Vt str
Set to
.Dq Li NO
to disable caching entropy via
.Xr cron 8 .
Otherwise set to the directory
in which the entropy files are stored.
To be useful,
there must be
a system cron job
that regularly writes and rotates
files here.
All files found
will be used at boot time.
The default is
.Pa /var/db/entropy .
.It Va entropy_file
.Pq Vt str
Set to
.Dq Li NO
to disable caching entropy through reboots.
Otherwise set to the name
of a file used to store cached entropy.
This file should be located
on a file system that is readable
before all the volumes specified in
.Xr fstab 5
are mounted.
By default,
.Pa /entropy
is used,
but if
.Pa /var/db/entropy-file
is found it will also be used.
This will be of some use to
.Xr bsdinstall 8 .
.It Va entropy_boot_file
.Pq Vt str
Set to
.Dq Li NO
to disable
very early caching of entropy
across reboots.
Otherwise set to the filename
from which
very early cached entropy is read.
This file should be located where
.Xr loader 8
can read it.
See also
.Xr loader.conf 5 .
The default location is
.Pa /boot/entropy .
.It Va entropy_save_sz
.Pq Vt int
Size of the entropy cache files saved periodically by
.Nm save-entropy .
.It Va entropy_save_num
.Pq Vt int
Number of entropy cache files saved periodically by
.Nm save-entropy .
.It Va ipsec_enable
.Pq Vt bool
Set to
.Dq Li YES
to run
.Xr setkey 8
on
.Va ipsec_file
at boot time.
.It Va ipsec_file
.Pq Vt str
Configuration file for
.Xr setkey 8 .
.It Va dmesg_enable
.Pq Vt bool
Set to
.Dq Li YES
to save
.Xr dmesg 8
to
.Pa /var/run/dmesg.boot
on boot.
.It Va rcshutdown_timeout
.Pq Vt int
If set, start a watchdog timer in the background which will terminate
.Pa rc.shutdown
if
.Xr shutdown 8
has not completed within the specified time (in seconds).
Notice that in addition to this soft timeout,
.Xr init 8
also applies a hard timeout for the execution of
.Pa rc.shutdown .
This is configured via the
.Xr sysctl 8
variable
.Va kern.init_shutdown_timeout
and defaults to 120 seconds.
Setting the value of
.Va rcshutdown_timeout
to more than 120 seconds will have no effect until the
.Xr sysctl 8
variable
.Va kern.init_shutdown_timeout
is also increased.
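.Pp
As an illustrative sketch (the values are assumptions, not recommendations),
a five-minute soft timeout only takes effect if the hard timeout is raised
above it.
The soft timeout would be set in
.Pa /etc/rc.conf :
.Bd -literal
rcshutdown_timeout="300"
.Ed
.Pp
The matching hard timeout could then be set in
.Pa /etc/sysctl.conf :
.Bd -literal
kern.init_shutdown_timeout=330
.Ed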
.It Va virecover_enable
.Pq Vt bool
Set to
.Dq Li NO
to prevent the system from trying to
recover prematurely terminated
.Xr vi 1
sessions.
.It Va ugidfw_enable
.Pq Vt bool
Set to
.Dq Li YES
to load the
.Xr mac_bsdextended 4
module upon system initialization and load a default
ruleset file.
.It Va bsdextended_script
.Pq Vt str
The default
.Xr mac_bsdextended 4
ruleset file to load.
The default value of this variable is
.Pa /etc/rc.bsdextended .
.It Va newsyslog_enable
.Pq Vt bool
If set to
.Dq Li YES ,
run
.Xr newsyslog 8
command at startup.
.It Va newsyslog_flags
.Pq Vt str
If
.Va newsyslog_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr newsyslog 8
program.
The default is
.Dq Li -CN ,
which causes log files flagged with a
.Cm C
to be created.
.It Va mdconfig_md Ns Aq Ar X
.Pq Vt str
Arguments to
.Xr mdconfig 8
for
.Xr md 4
device
.Ar X .
At minimum a
.Fl t Ar type
must be specified and either a
.Fl s Ar size
for malloc or swap backed
.Xr md 4
devices or a
.Fl f Ar file
for vnode backed
.Xr md 4
devices.
Note that
.Va mdconfig_md Ns Aq Ar X
variables are evaluated until one variable is unset or null.
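.Pp
The following sketch illustrates both backing styles and the
stop-at-first-gap rule.
The size and the image path are hypothetical.
.Bd -literal
mdconfig_md0="-t swap -s 64m"
mdconfig_md1="-t vnode -f /var/images/data.img"
# mdconfig_md2 is unset, so evaluation stops here;
# a later mdconfig_md3 would not be considered.
.Ed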
.It Va mdconfig_md Ns Ao Ar X Ac Ns Va _newfs
.Pq Vt str
Optional arguments passed to
.Xr newfs 8
to initialize
.Xr md 4
device
.Ar X .
.It Va mdconfig_md Ns Ao Ar X Ac Ns Va _owner
.Pq Vt str
An ownership specification passed to
.Xr chown 8
after the specified
.Xr md 4
device
.Ar X
has been mounted.
Both the
.Xr md 4
device and the mount point will be changed.
.It Va mdconfig_md Ns Ao Ar X Ac Ns Va _perms
.Pq Vt str
A mode string passed to
.Xr chmod 1
after the specified
.Xr md 4
device
.Ar X
has been mounted.
Both the
.Xr md 4
device and the mount point will be changed.
.It Va mdconfig_md Ns Ao Ar X Ac Ns Va _files
.Pq Vt str
Files to be copied to the mount point of the
.Xr md 4
device
.Ar X
after it has been mounted.
.It Va mdconfig_md Ns Ao Ar X Ac Ns Va _cmd
.Pq Vt str
Command to execute after the specified
.Xr md 4
device
.Ar X
has been mounted.
Note that the command is passed to
.Ic eval
and that both
.Va _dev
and
.Va _mp
variables can be used to reference respectively the
.Xr md 4
device and the mount point.
Assuming that the
.Xr md 4
device is
.Li md0 ,
one could set the following:
.Bd -literal
mdconfig_md0_cmd="tar xfzC /var/file.tgz \e${_mp}"
.Ed
.It Va autobridge_interfaces
.Pq Vt str
Set to the list of bridge interfaces to which newly arriving interfaces
are automatically added when they match.
If not set to
.Dq Li NO ,
then for each whitespace-separated
.Ar element
in the value, a
.Va autobridge_ Ns Aq Ar element
variable is assumed to exist, containing a whitespace-separated list of
interface names to match.
These names can use wildcards.
For example:
.Bd -literal
autobridge_interfaces="bridge0"
autobridge_bridge0="tap* dc0 vlan[345]"
.Ed
.It Va mixer_enable
.Pq Vt bool
If set to
.Dq Li YES ,
enable support for sound mixer.
.It Va hcsecd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
enable Bluetooth security daemon.
.It Va hcsecd_config
.Pq Vt str
Configuration file for
.Xr hcsecd 8 .
Default
.Pa /etc/bluetooth/hcsecd.conf .
.It Va sdpd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
enable Bluetooth Service Discovery Protocol daemon.
.It Va sdpd_control
.Pq Vt str
Path to
.Xr sdpd 8
control socket.
Default
.Pa /var/run/sdp .
.It Va sdpd_groupname
.Pq Vt str
Sets
.Xr sdpd 8
group to run as after it initializes.
Default
.Dq Li nobody .
.It Va sdpd_username
.Pq Vt str
Sets
.Xr sdpd 8
user to run as after it initializes.
Default
.Dq Li nobody .
.It Va bthidd_enable
.Pq Vt bool
If set to
.Dq Li YES ,
enable Bluetooth Human Interface Device daemon.
.It Va bthidd_config
.Pq Vt str
Configuration file for
.Xr bthidd 8 .
Default
.Pa /etc/bluetooth/bthidd.conf .
.It Va bthidd_hids
.Pq Vt str
Path to the file in which
.Xr bthidd 8
will store information about known HID devices.
Default
.Pa /var/db/bthidd.hids .
.It Va rfcomm_pppd_server_enable
.Pq Vt bool
If set to
.Dq Li YES ,
enable Bluetooth RFCOMM PPP wrapper daemon.
.It Va rfcomm_pppd_server_profile
.Pq Vt str
The name of the profile to use from
.Pa /etc/ppp/ppp.conf .
Multiple profiles can be specified here.
Also used to specify per-profile overrides.
When the profile name contains any of the characters
.Dq Li .-/+
they are translated to
.Dq Li _
for the purposes of the override variable names.
.It Va rfcomm_pppd_server_ Ns Ao Ar profile Ac Ns _bdaddr
.Pq Vt str
Overrides local address to listen on.
By default
.Xr rfcomm_pppd 8
will listen on
.Dq Li ANY
address.
The address can be specified as BD_ADDR or name.
.It Va rfcomm_pppd_server_ Ns Ao Ar profile Ac Ns _channel
.Pq Vt str
Overrides local RFCOMM channel to listen on.
By default
.Xr rfcomm_pppd 8
will listen on RFCOMM channel 1.
This must be set properly if multiple profiles are used at the same time.
.It Va rfcomm_pppd_server_ Ns Ao Ar profile Ac Ns _register_sp
.Pq Vt bool
Tells
.Xr rfcomm_pppd 8
if it should register Serial Port service on the specified RFCOMM channel.
Default
.Dq Li NO .
.It Va rfcomm_pppd_server_ Ns Ao Ar profile Ac Ns _register_dun
.Pq Vt bool
Tells
.Xr rfcomm_pppd 8
if it should register Dial-Up Networking service on the specified
RFCOMM channel.
Default
.Dq Li NO .
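.Pp
As a sketch of the override naming, assume a hypothetical
.Xr ppp 8
profile called
.Li dun-server .
The
.Dq Li -
in the profile name becomes
.Dq Li _
in the override variable names.
The channel number is an arbitrary example value.
.Bd -literal
rfcomm_pppd_server_enable="YES"
rfcomm_pppd_server_profile="dun-server"
rfcomm_pppd_server_dun_server_channel="2"
rfcomm_pppd_server_dun_server_register_dun="YES"
.Ed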
.It Va ubthidhci_enable
.Pq Vt bool
If set to
.Dq Li YES ,
change the USB Bluetooth controller from HID mode to HCI mode.
You also need to specify the location of USB Bluetooth controller with the
.Va ubthidhci_busnum
and
.Va ubthidhci_addr
variables.
.It Va ubthidhci_busnum
Bus number where the USB Bluetooth controller is located.
Check the output of
.Xr usbconfig 8
on your system to find this information.
.It Va ubthidhci_addr
Bus address of the USB Bluetooth controller.
Check the output of
.Xr usbconfig 8
on your system to find this information.
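.Pp
For example, assuming
.Xr usbconfig 8
reported the controller as
.Li ugen3.2
(a hypothetical location, read as bus 3, address 2),
the three variables might be set as follows:
.Bd -literal
ubthidhci_enable="YES"
ubthidhci_busnum="3"
ubthidhci_addr="2"
.Ed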
.It Va netwait_enable
.Pq Vt bool
If set to
.Dq Li YES ,
delays the start of network-reliant services until
.Va netwait_if
is up and ICMP packets to a destination defined in
.Va netwait_ip
are flowing.
Link state is examined first, followed by
.Dq Li pinging
an IP address to verify network usability.
If no destination can be reached or timeouts are exceeded,
network services are started anyway with no guarantee that
the network is usable.
Use of this variable requires both
.Va netwait_ip
and
.Va netwait_if
to be set.
.It Va netwait_ip
.Pq Vt str
Empty by default.
This variable contains a space-delimited list of IP addresses to
.Xr ping 8 .
DNS hostnames should not be used as resolution is not guaranteed
to be functional at this point.
If multiple IP addresses are specified,
each will be tried until one is successful or the list is exhausted.
.It Va netwait_timeout
.Pq Vt int
Indicates the total number of seconds to perform a
.Dq Li ping
against each IP address in
.Va netwait_ip ,
at a rate of one ping per second.
If any of the pings are successful,
full network connectivity is considered reliable.
The default is 60.
.It Va netwait_if
.Pq Vt str
Empty by default.
Defines the name of the network interface on which to watch for link.
.Xr ifconfig 8
is used to monitor the interface, looking for
.Dq Li status: no carrier .
Once gone, the link is considered up.
This can be a
.Xr vlan 4
interface if desired.
.It Va netwait_if_timeout
.Pq Vt int
Defines the total number of seconds to wait for link to become usable,
polled at a 1-second interval.
The default is 30.
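.Pp
A minimal sketch combining the
.Va netwait_*
variables might look like this.
The interface name and the addresses are placeholders,
and the timeouts repeat the defaults stated above.
.Bd -literal
netwait_enable="YES"
netwait_if="em0"
netwait_ip="192.0.2.1 198.51.100.1"
netwait_timeout="60"
netwait_if_timeout="30"
.Ed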
.It Va rctl_enable
.Pq Vt bool
If set to
.Dq Li YES ,
load
.Xr rctl 8
rules from the defined ruleset.
The kernel must be built with
.Cd "options RACCT"
and
.Cd "options RCTL" .
.It Va rctl_rules
.Pq Vt str
Set to
.Pa /etc/rctl.conf
by default.
This variable contains the
.Xr rctl.conf 5
ruleset to load for
.Xr rctl 8 .
.It Va iovctl_files
.Pq Vt str
A space-separated list of configuration files used by
.Xr iovctl 8 .
The default value is an empty string.
.It Va autofs_enable
.Pq Vt bool
If set to
.Dq Li YES ,
start the
.Xr automount 8
utility and the
.Xr automountd 8
and
.Xr autounmountd 8
daemons at boot time.
.It Va automount_flags
.Pq Vt str
If
.Va autofs_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr automount 8
program.
By default no flags are passed.
.It Va automountd_flags
.Pq Vt str
If
.Va autofs_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr automountd 8
daemon.
By default no flags are passed.
.It Va autounmountd_flags
.Pq Vt str
If
.Va autofs_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr autounmountd 8
daemon.
By default no flags are passed.
.It Va ctld_enable
.Pq Vt bool
If set to
.Dq Li YES ,
start the
.Xr ctld 8
daemon at boot time.
.It Va iscsid_enable
.Pq Vt bool
If set to
.Dq Li YES ,
start the
.Xr iscsid 8
daemon at boot time.
.It Va iscsictl_enable
.Pq Vt bool
If set to
.Dq Li YES ,
start the
.Xr iscsictl 8
utility at boot time.
.It Va iscsictl_flags
.Pq Vt str
If
.Va iscsictl_enable
is set to
.Dq Li YES ,
these are the flags to pass to the
.Xr iscsictl 8
program.
The default is
.Dq Li -Aa ,
which configures sessions based on the
.Pa /etc/iscsi.conf
configuration file.
.El
.Sh FILES
.Bl -tag -width ".Pa /etc/defaults/rc.conf" -compact
.It Pa /etc/defaults/rc.conf
.It Pa /etc/rc.conf
.It Pa /etc/rc.conf.local
.El
.Sh SEE ALSO
.Xr catman 1 ,
.Xr chmod 1 ,
.Xr gdb 1 ,
.Xr info 1 ,
.Xr kbdcontrol 1 ,
.Xr makewhatis 1 ,
.Xr sh 1 ,
.Xr vi 1 ,
.Xr vidcontrol 1 ,
.Xr bridge 4 ,
.Xr dummynet 4 ,
.Xr ip 4 ,
.Xr ipf 4 ,
.Xr ipfw 4 ,
.Xr ipnat 4 ,
.Xr kld 4 ,
.Xr pf 4 ,
.Xr pflog 4 ,
.Xr pfsync 4 ,
.Xr tcp 4 ,
.Xr udp 4 ,
.Xr exports 5 ,
.Xr fstab 5 ,
.Xr ipf 5 ,
.Xr ipnat 5 ,
.Xr jail.conf 5 ,
.Xr loader.conf 5 ,
.Xr motd 5 ,
.Xr newsyslog.conf 5 ,
.Xr pf.conf 5 ,
.Xr security 7 ,
.Xr accton 8 ,
.Xr amd 8 ,
.Xr apm 8 ,
.Xr atm 8 ,
.Xr bsdinstall 8 ,
.Xr bthidd 8 ,
.Xr chkprintcap 8 ,
.Xr chown 8 ,
.Xr cron 8 ,
.Xr devfs 8 ,
.Xr dhclient 8 ,
.Xr ftpd 8 ,
.Xr geli 8 ,
.Xr hcsecd 8 ,
.Xr ifconfig 8 ,
.Xr inetd 8 ,
.Xr iovctl 8 ,
.Xr ipf 8 ,
.Xr ipfw 8 ,
.Xr ipnat 8 ,
.Xr jail 8 ,
.Xr kldxref 8 ,
.Xr loader 8 ,
.Xr lpd 8 ,
.Xr mdconfig 8 ,
.Xr mdmfs 8 ,
.Xr mixer 8 ,
.Xr mountd 8 ,
.Xr moused 8 ,
.Xr newfs 8 ,
.Xr newsyslog 8 ,
.Xr nfsd 8 ,
.Xr ntpd 8 ,
.Xr ntpdate 8 ,
.Xr pfctl 8 ,
.Xr pflogd 8 ,
.Xr ping 8 ,
.Xr powerd 8 ,
.Xr quotacheck 8 ,
.Xr quotaon 8 ,
.Xr rc 8 ,
.Xr rc.sendmail 8 ,
.Xr rfcomm_pppd 8 ,
.Xr route 8 ,
.Xr routed 8 ,
.Xr rpc.lockd 8 ,
.Xr rpc.statd 8 ,
.Xr rpcbind 8 ,
.Xr rwhod 8 ,
.Xr savecore 8 ,
.Xr sdpd 8 ,
.Xr sshd 8 ,
.Xr swapon 8 ,
.Xr sysctl 8 ,
.Xr syslogd 8 ,
.Xr timed 8 ,
.Xr unbound 8 ,
.Xr usbconfig 8 ,
.Xr wlandebug 8 ,
.Xr yp 8 ,
.Xr ypbind 8 ,
.Xr ypserv 8 ,
.Xr ypset 8
.Sh HISTORY
The
.Nm
file appeared in
.Fx 2.2.2 .
.Sh AUTHORS
.An Jordan K. Hubbard .
Index: projects/vnet/sys/compat/linuxkpi/common/include/linux/etherdevice.h
===================================================================
--- projects/vnet/sys/compat/linuxkpi/common/include/linux/etherdevice.h (revision 301546)
+++ projects/vnet/sys/compat/linuxkpi/common/include/linux/etherdevice.h (revision 301547)
@@ -1,111 +1,112 @@
/*-
- * Copyright (c) 2015 Mellanox Technologies, Ltd. All rights reserved.
+ * Copyright (c) 2015-2016 Mellanox Technologies, Ltd. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice unmodified, this list of conditions, and the following
* disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
* IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
* IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
* NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
* THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* $FreeBSD$
*/
#ifndef _LINUX_ETHERDEVICE
#define _LINUX_ETHERDEVICE
#include
#include
#include
#define ETH_MODULE_SFF_8079 1
#define ETH_MODULE_SFF_8079_LEN 256
#define ETH_MODULE_SFF_8472 2
#define ETH_MODULE_SFF_8472_LEN 512
#define ETH_MODULE_SFF_8636 3
#define ETH_MODULE_SFF_8636_LEN 256
#define ETH_MODULE_SFF_8436 4
#define ETH_MODULE_SFF_8436_LEN 256
struct ethtool_eeprom {
u32 offset;
u32 len;
};
struct ethtool_modinfo {
u32 type;
u32 eeprom_len;
};
static inline bool
is_zero_ether_addr(const u8 * addr)
{
return ((addr[0] + addr[1] + addr[2] + addr[3] + addr[4] + addr[5]) == 0x00);
}
static inline bool
is_multicast_ether_addr(const u8 * addr)
{
return (0x01 & addr[0]);
}
static inline bool
is_broadcast_ether_addr(const u8 * addr)
{
return ((addr[0] + addr[1] + addr[2] + addr[3] + addr[4] + addr[5]) == (6 * 0xff));
}
static inline bool
is_valid_ether_addr(const u8 * addr)
{
return !is_multicast_ether_addr(addr) && !is_zero_ether_addr(addr);
}
static inline void
ether_addr_copy(u8 * dst, const u8 * src)
{
memcpy(dst, src, 6);
}
static inline bool
ether_addr_equal(const u8 *pa, const u8 *pb)
{
return (memcmp(pa, pb, 6) == 0);
}
static inline bool
ether_addr_equal_64bits(const u8 *pa, const u8 *pb)
{
return (memcmp(pa, pb, 6) == 0);
}
static inline void
eth_broadcast_addr(u8 *pa)
{
memset(pa, 0xff, 6);
}
static inline void
random_ether_addr(u8 * dst)
{
- read_random(dst, 6);
+ if (read_random(dst, 6) == 0)
+ arc4rand(dst, 6, 0);
dst[0] &= 0xfe;
dst[0] |= 0x02;
}
#endif /* _LINUX_ETHERDEVICE */
Index: projects/vnet/sys/compat/linuxkpi/common/include/linux/random.h
===================================================================
--- projects/vnet/sys/compat/linuxkpi/common/include/linux/random.h (revision 301546)
+++ projects/vnet/sys/compat/linuxkpi/common/include/linux/random.h (revision 301547)
@@ -1,42 +1,44 @@
/*-
* Copyright (c) 2010 Isilon Systems, Inc.
* Copyright (c) 2010 iX Systems, Inc.
* Copyright (c) 2010 Panasas, Inc.
- * Copyright (c) 2013, 2014 Mellanox Technologies, Ltd.
+ * Copyright (c) 2013-2016 Mellanox Technologies, Ltd.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice unmodified, this list of conditions, and the following
* disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
* IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
* IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
* NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
* THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* $FreeBSD$
*/
#ifndef _LINUX_RANDOM_H_
#define _LINUX_RANDOM_H_
#include
+#include
static inline void
get_random_bytes(void *buf, int nbytes)
{
- read_random(buf, nbytes);
+ if (read_random(buf, nbytes) == 0)
+ arc4rand(buf, nbytes, 0);
}
#endif /* _LINUX_RANDOM_H_ */
Index: projects/vnet/sys/dev/cxgb/cxgb_sge.c
===================================================================
--- projects/vnet/sys/dev/cxgb/cxgb_sge.c (revision 301546)
+++ projects/vnet/sys/dev/cxgb/cxgb_sge.c (revision 301547)
@@ -1,3706 +1,3707 @@
/**************************************************************************
Copyright (c) 2007-2009, Chelsio Inc.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Neither the name of the Chelsio Corporation nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
***************************************************************************/
#include
__FBSDID("$FreeBSD$");
#include "opt_inet6.h"
#include "opt_inet.h"
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
int txq_fills = 0;
int multiq_tx_enable = 1;
#ifdef TCP_OFFLOAD
CTASSERT(NUM_CPL_HANDLERS >= NUM_CPL_CMDS);
#endif
extern struct sysctl_oid_list sysctl__hw_cxgb_children;
int cxgb_txq_buf_ring_size = TX_ETH_Q_SIZE;
SYSCTL_INT(_hw_cxgb, OID_AUTO, txq_mr_size, CTLFLAG_RDTUN, &cxgb_txq_buf_ring_size, 0,
"size of per-queue mbuf ring");
static int cxgb_tx_coalesce_force = 0;
SYSCTL_INT(_hw_cxgb, OID_AUTO, tx_coalesce_force, CTLFLAG_RWTUN,
&cxgb_tx_coalesce_force, 0,
"coalesce small packets into a single work request regardless of ring state");
#define COALESCE_START_DEFAULT TX_ETH_Q_SIZE>>1
#define COALESCE_START_MAX (TX_ETH_Q_SIZE-(TX_ETH_Q_SIZE>>3))
#define COALESCE_STOP_DEFAULT TX_ETH_Q_SIZE>>2
#define COALESCE_STOP_MIN TX_ETH_Q_SIZE>>5
#define TX_RECLAIM_DEFAULT TX_ETH_Q_SIZE>>5
#define TX_RECLAIM_MAX TX_ETH_Q_SIZE>>2
#define TX_RECLAIM_MIN TX_ETH_Q_SIZE>>6
static int cxgb_tx_coalesce_enable_start = COALESCE_START_DEFAULT;
SYSCTL_INT(_hw_cxgb, OID_AUTO, tx_coalesce_enable_start, CTLFLAG_RWTUN,
&cxgb_tx_coalesce_enable_start, 0,
"coalesce enable threshold");
static int cxgb_tx_coalesce_enable_stop = COALESCE_STOP_DEFAULT;
SYSCTL_INT(_hw_cxgb, OID_AUTO, tx_coalesce_enable_stop, CTLFLAG_RWTUN,
&cxgb_tx_coalesce_enable_stop, 0,
"coalesce disable threshold");
static int cxgb_tx_reclaim_threshold = TX_RECLAIM_DEFAULT;
SYSCTL_INT(_hw_cxgb, OID_AUTO, tx_reclaim_threshold, CTLFLAG_RWTUN,
&cxgb_tx_reclaim_threshold, 0,
"tx cleaning minimum threshold");
/*
* XXX don't re-enable this until TOE stops assuming
* we have an m_ext
*/
static int recycle_enable = 0;
extern int cxgb_use_16k_clusters;
extern int nmbjumbop;
extern int nmbjumbo9;
extern int nmbjumbo16;
#define USE_GTS 0
#define SGE_RX_SM_BUF_SIZE 1536
#define SGE_RX_DROP_THRES 16
#define SGE_RX_COPY_THRES 128
/*
* Period of the Tx buffer reclaim timer. This timer does not need to run
* frequently as Tx buffers are usually reclaimed by new Tx packets.
*/
#define TX_RECLAIM_PERIOD (hz >> 1)
/*
* Values for sge_txq.flags
*/
enum {
TXQ_RUNNING = 1 << 0, /* fetch engine is running */
TXQ_LAST_PKT_DB = 1 << 1, /* last packet rang the doorbell */
};
struct tx_desc {
uint64_t flit[TX_DESC_FLITS];
} __packed;
struct rx_desc {
uint32_t addr_lo;
uint32_t len_gen;
uint32_t gen2;
uint32_t addr_hi;
} __packed;
struct rsp_desc { /* response queue descriptor */
struct rss_header rss_hdr;
uint32_t flags;
uint32_t len_cq;
uint8_t imm_data[47];
uint8_t intr_gen;
} __packed;
#define RX_SW_DESC_MAP_CREATED (1 << 0)
#define TX_SW_DESC_MAP_CREATED (1 << 1)
#define RX_SW_DESC_INUSE (1 << 3)
#define TX_SW_DESC_MAPPED (1 << 4)
#define RSPQ_NSOP_NEOP G_RSPD_SOP_EOP(0)
#define RSPQ_EOP G_RSPD_SOP_EOP(F_RSPD_EOP)
#define RSPQ_SOP G_RSPD_SOP_EOP(F_RSPD_SOP)
#define RSPQ_SOP_EOP G_RSPD_SOP_EOP(F_RSPD_SOP|F_RSPD_EOP)
struct tx_sw_desc { /* SW state per Tx descriptor */
struct mbuf *m;
bus_dmamap_t map;
int flags;
};
struct rx_sw_desc { /* SW state per Rx descriptor */
caddr_t rxsd_cl;
struct mbuf *m;
bus_dmamap_t map;
int flags;
};
struct txq_state {
unsigned int compl;
unsigned int gen;
unsigned int pidx;
};
struct refill_fl_cb_arg {
int error;
bus_dma_segment_t seg;
int nseg;
};
/*
* Maps a number of flits to the number of Tx descriptors that can hold them.
* The formula is
*
* desc = 1 + (flits - 2) / (WR_FLITS - 1).
*
* HW allows up to 4 descriptors to be combined into a WR.
*/
static uint8_t flit_desc_map[] = {
0,
#if SGE_NUM_GENBITS == 1
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4
#elif SGE_NUM_GENBITS == 2
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
#else
# error "SGE_NUM_GENBITS must be 1 or 2"
#endif
};
#define TXQ_LOCK_ASSERT(qs) mtx_assert(&(qs)->lock, MA_OWNED)
#define TXQ_TRYLOCK(qs) mtx_trylock(&(qs)->lock)
#define TXQ_LOCK(qs) mtx_lock(&(qs)->lock)
#define TXQ_UNLOCK(qs) mtx_unlock(&(qs)->lock)
#define TXQ_RING_EMPTY(qs) drbr_empty((qs)->port->ifp, (qs)->txq[TXQ_ETH].txq_mr)
#define TXQ_RING_NEEDS_ENQUEUE(qs) \
drbr_needs_enqueue((qs)->port->ifp, (qs)->txq[TXQ_ETH].txq_mr)
#define TXQ_RING_FLUSH(qs) drbr_flush((qs)->port->ifp, (qs)->txq[TXQ_ETH].txq_mr)
#define TXQ_RING_DEQUEUE_COND(qs, func, arg) \
drbr_dequeue_cond((qs)->port->ifp, (qs)->txq[TXQ_ETH].txq_mr, func, arg)
#define TXQ_RING_DEQUEUE(qs) \
drbr_dequeue((qs)->port->ifp, (qs)->txq[TXQ_ETH].txq_mr)
int cxgb_debug = 0;
static void sge_timer_cb(void *arg);
static void sge_timer_reclaim(void *arg, int ncount);
static void sge_txq_reclaim_handler(void *arg, int ncount);
static void cxgb_start_locked(struct sge_qset *qs);
/*
* XXX need to cope with bursty scheduling by looking at a wider
* window than we are now for determining the need for coalescing
*
*/
static __inline uint64_t
check_pkt_coalesce(struct sge_qset *qs)
{
struct adapter *sc;
struct sge_txq *txq;
uint8_t *fill;
if (__predict_false(cxgb_tx_coalesce_force))
return (1);
txq = &qs->txq[TXQ_ETH];
sc = qs->port->adapter;
fill = &sc->tunq_fill[qs->idx];
if (cxgb_tx_coalesce_enable_start > COALESCE_START_MAX)
cxgb_tx_coalesce_enable_start = COALESCE_START_MAX;
if (cxgb_tx_coalesce_enable_stop < COALESCE_STOP_MIN)
cxgb_tx_coalesce_enable_start = COALESCE_STOP_MIN;
/*
* if the hardware transmit queue is more than 1/8 full
* we mark it as coalescing - we drop back from coalescing
* when we go below 1/32 full and there are no packets enqueued,
* this provides us with some degree of hysteresis
*/
if (*fill != 0 && (txq->in_use <= cxgb_tx_coalesce_enable_stop) &&
TXQ_RING_EMPTY(qs) && (qs->coalescing == 0))
*fill = 0;
else if (*fill == 0 && (txq->in_use >= cxgb_tx_coalesce_enable_start))
*fill = 1;
return (sc->tunq_coalesce);
}
#ifdef __LP64__
static void
set_wr_hdr(struct work_request_hdr *wrp, uint32_t wr_hi, uint32_t wr_lo)
{
uint64_t wr_hilo;
#if _BYTE_ORDER == _LITTLE_ENDIAN
wr_hilo = wr_hi;
wr_hilo |= (((uint64_t)wr_lo)<<32);
#else
wr_hilo = wr_lo;
wr_hilo |= (((uint64_t)wr_hi)<<32);
#endif
wrp->wrh_hilo = wr_hilo;
}
#else
static void
set_wr_hdr(struct work_request_hdr *wrp, uint32_t wr_hi, uint32_t wr_lo)
{
wrp->wrh_hi = wr_hi;
wmb();
wrp->wrh_lo = wr_lo;
}
#endif
struct coalesce_info {
int count;
int nbytes;
};
static int
coalesce_check(struct mbuf *m, void *arg)
{
struct coalesce_info *ci = arg;
int *count = &ci->count;
int *nbytes = &ci->nbytes;
if ((*nbytes == 0) || ((*nbytes + m->m_len <= 10500) &&
(*count < 7) && (m->m_next == NULL))) {
*count += 1;
*nbytes += m->m_len;
return (1);
}
return (0);
}
static struct mbuf *
cxgb_dequeue(struct sge_qset *qs)
{
struct mbuf *m, *m_head, *m_tail;
struct coalesce_info ci;
if (check_pkt_coalesce(qs) == 0)
return TXQ_RING_DEQUEUE(qs);
m_head = m_tail = NULL;
ci.count = ci.nbytes = 0;
do {
m = TXQ_RING_DEQUEUE_COND(qs, coalesce_check, &ci);
if (m_head == NULL) {
m_tail = m_head = m;
} else if (m != NULL) {
m_tail->m_nextpkt = m;
m_tail = m;
}
} while (m != NULL);
if (ci.count > 7)
panic("trying to coalesce %d packets in to one WR", ci.count);
return (m_head);
}
/**
* reclaim_completed_tx - reclaims completed Tx descriptors
* @adapter: the adapter
* @q: the Tx queue to reclaim completed descriptors from
*
* Reclaims Tx descriptors that the SGE has indicated it has processed,
* and frees the associated buffers if possible. Called with the Tx
* queue's lock held.
*/
static __inline int
reclaim_completed_tx(struct sge_qset *qs, int reclaim_min, int queue)
{
struct sge_txq *q = &qs->txq[queue];
int reclaim = desc_reclaimable(q);
if ((cxgb_tx_reclaim_threshold > TX_RECLAIM_MAX) ||
(cxgb_tx_reclaim_threshold < TX_RECLAIM_MIN))
cxgb_tx_reclaim_threshold = TX_RECLAIM_DEFAULT;
if (reclaim < reclaim_min)
return (0);
mtx_assert(&qs->lock, MA_OWNED);
if (reclaim > 0) {
t3_free_tx_desc(qs, reclaim, queue);
q->cleaned += reclaim;
q->in_use -= reclaim;
}
if (isset(&qs->txq_stopped, TXQ_ETH))
clrbit(&qs->txq_stopped, TXQ_ETH);
return (reclaim);
}
/**
* should_restart_tx - are there enough resources to restart a Tx queue?
* @q: the Tx queue
*
* Checks if there are enough descriptors to restart a suspended Tx queue.
*/
static __inline int
should_restart_tx(const struct sge_txq *q)
{
unsigned int r = q->processed - q->cleaned;
return q->in_use - r < (q->size >> 1);
}
/**
* t3_sge_init - initialize SGE
* @adap: the adapter
* @p: the SGE parameters
*
* Performs SGE initialization needed every time after a chip reset.
* We do not initialize any of the queue sets here, instead the driver
* top-level must request those individually. We also do not enable DMA
* here, that should be done after the queues have been set up.
*/
void
t3_sge_init(adapter_t *adap, struct sge_params *p)
{
u_int ctrl, ups;
ups = 0; /* = ffs(pci_resource_len(adap->pdev, 2) >> 12); */
ctrl = F_DROPPKT | V_PKTSHIFT(2) | F_FLMODE | F_AVOIDCQOVFL |
F_CQCRDTCTRL | F_CONGMODE | F_TNLFLMODE | F_FATLPERREN |
V_HOSTPAGESIZE(PAGE_SHIFT - 11) | F_BIGENDIANINGRESS |
V_USERSPACESIZE(ups ? ups - 1 : 0) | F_ISCSICOALESCING;
#if SGE_NUM_GENBITS == 1
ctrl |= F_EGRGENCTRL;
#endif
if (adap->params.rev > 0) {
if (!(adap->flags & (USING_MSIX | USING_MSI)))
ctrl |= F_ONEINTMULTQ | F_OPTONEINTMULTQ;
}
t3_write_reg(adap, A_SG_CONTROL, ctrl);
t3_write_reg(adap, A_SG_EGR_RCQ_DRB_THRSH, V_HIRCQDRBTHRSH(512) |
V_LORCQDRBTHRSH(512));
t3_write_reg(adap, A_SG_TIMER_TICK, core_ticks_per_usec(adap) / 10);
t3_write_reg(adap, A_SG_CMDQ_CREDIT_TH, V_THRESHOLD(32) |
V_TIMEOUT(200 * core_ticks_per_usec(adap)));
t3_write_reg(adap, A_SG_HI_DRB_HI_THRSH,
adap->params.rev < T3_REV_C ? 1000 : 500);
t3_write_reg(adap, A_SG_HI_DRB_LO_THRSH, 256);
t3_write_reg(adap, A_SG_LO_DRB_HI_THRSH, 1000);
t3_write_reg(adap, A_SG_LO_DRB_LO_THRSH, 256);
t3_write_reg(adap, A_SG_OCO_BASE, V_BASE1(0xfff));
t3_write_reg(adap, A_SG_DRB_PRI_THRESH, 63 * 1024);
}
/**
* sgl_len - calculates the size of an SGL of the given capacity
* @n: the number of SGL entries
*
* Calculates the number of flits needed for a scatter/gather list that
* can hold the given number of entries.
*/
static __inline unsigned int
sgl_len(unsigned int n)
{
return ((3 * n) / 2 + (n & 1));
}
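/*
 * Illustration only, not part of the driver: a standalone sketch of the
 * flit arithmetic behind sgl_len(). It assumes the entry layout that
 * make_sgl() writes elsewhere in this file, where one SGL entry carries
 * two 32-bit lengths (one 8-byte flit) and two 64-bit addresses (two
 * flits). Under that assumption a pair of segments costs three flits and
 * a trailing odd segment costs two. All example_* names are hypothetical.
 */

```c
#include <assert.h>

/* Closed form, copied from sgl_len(): flits needed for n SGL entries. */
unsigned int
example_sgl_len(unsigned int n)
{
	return ((3 * n) / 2 + (n & 1));
}

/* The same count derived by walking the entries pair by pair. */
unsigned int
example_sgl_len_by_pairs(unsigned int n)
{
	unsigned int flits = 0;

	for (; n >= 2; n -= 2)
		flits += 3;	/* one length flit + two address flits */
	if (n == 1)
		flits += 2;	/* half-used length flit + one address flit */
	return (flits);
}

/* Cross-check the closed form against the pairwise count for 0..max. */
void
example_sgl_len_check(unsigned int max)
{
	unsigned int n;

	for (n = 0; n <= max; n++)
		assert(example_sgl_len(n) == example_sgl_len_by_pairs(n));
}
```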
/**
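* get_imm_packet - copy the immediate data of a response into an mbuf
* @sc: the adapter
* @resp: the response descriptor containing the packet data
* @m: the mbuf that receives the immediate data
*
* Copies the immediate data of the given response into @m and sets the
* mbuf length accordingly. Always returns 0.
*/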
static int
get_imm_packet(adapter_t *sc, const struct rsp_desc *resp, struct mbuf *m)
{
if (resp->rss_hdr.opcode == CPL_RX_DATA) {
const struct cpl_rx_data *cpl = (const void *)&resp->imm_data[0];
m->m_len = sizeof(*cpl) + ntohs(cpl->len);
} else if (resp->rss_hdr.opcode == CPL_RX_PKT) {
const struct cpl_rx_pkt *cpl = (const void *)&resp->imm_data[0];
m->m_len = sizeof(*cpl) + ntohs(cpl->len);
} else
m->m_len = IMMED_PKT_SIZE;
m->m_ext.ext_buf = NULL;
m->m_ext.ext_type = 0;
memcpy(mtod(m, uint8_t *), resp->imm_data, m->m_len);
return (0);
}
static __inline u_int
flits_to_desc(u_int n)
{
return (flit_desc_map[n]);
}
#define SGE_PARERR (F_CPPARITYERROR | F_OCPARITYERROR | F_RCPARITYERROR | \
F_IRPARITYERROR | V_ITPARITYERROR(M_ITPARITYERROR) | \
V_FLPARITYERROR(M_FLPARITYERROR) | F_LODRBPARITYERROR | \
F_HIDRBPARITYERROR | F_LORCQPARITYERROR | \
F_HIRCQPARITYERROR)
#define SGE_FRAMINGERR (F_UC_REQ_FRAMINGERROR | F_R_REQ_FRAMINGERROR)
#define SGE_FATALERR (SGE_PARERR | SGE_FRAMINGERR | F_RSPQCREDITOVERFOW | \
F_RSPQDISABLED)
/**
* t3_sge_err_intr_handler - SGE async event interrupt handler
* @adapter: the adapter
*
* Interrupt handler for SGE asynchronous (non-data) events.
*/
void
t3_sge_err_intr_handler(adapter_t *adapter)
{
unsigned int v, status;
status = t3_read_reg(adapter, A_SG_INT_CAUSE);
if (status & SGE_PARERR)
CH_ALERT(adapter, "SGE parity error (0x%x)\n",
status & SGE_PARERR);
if (status & SGE_FRAMINGERR)
CH_ALERT(adapter, "SGE framing error (0x%x)\n",
status & SGE_FRAMINGERR);
if (status & F_RSPQCREDITOVERFOW)
CH_ALERT(adapter, "SGE response queue credit overflow\n");
if (status & F_RSPQDISABLED) {
v = t3_read_reg(adapter, A_SG_RSPQ_FL_STATUS);
CH_ALERT(adapter,
"packet delivered to disabled response queue (0x%x)\n",
(v >> S_RSPQ0DISABLED) & 0xff);
}
t3_write_reg(adapter, A_SG_INT_CAUSE, status);
if (status & SGE_FATALERR)
t3_fatal_err(adapter);
}
void
t3_sge_prep(adapter_t *adap, struct sge_params *p)
{
int i, nqsets, fl_q_size, jumbo_q_size, use_16k, jumbo_buf_size;
nqsets = min(SGE_QSETS / adap->params.nports, mp_ncpus);
nqsets *= adap->params.nports;
fl_q_size = min(nmbclusters/(3*nqsets), FL_Q_SIZE);
while (!powerof2(fl_q_size))
fl_q_size--;
use_16k = cxgb_use_16k_clusters != -1 ? cxgb_use_16k_clusters :
is_offload(adap);
#if __FreeBSD_version >= 700111
if (use_16k) {
jumbo_q_size = min(nmbjumbo16/(3*nqsets), JUMBO_Q_SIZE);
jumbo_buf_size = MJUM16BYTES;
} else {
jumbo_q_size = min(nmbjumbo9/(3*nqsets), JUMBO_Q_SIZE);
jumbo_buf_size = MJUM9BYTES;
}
#else
jumbo_q_size = min(nmbjumbop/(3*nqsets), JUMBO_Q_SIZE);
jumbo_buf_size = MJUMPAGESIZE;
#endif
while (!powerof2(jumbo_q_size))
jumbo_q_size--;
if (fl_q_size < (FL_Q_SIZE / 4) || jumbo_q_size < (JUMBO_Q_SIZE / 2))
device_printf(adap->dev,
"Insufficient clusters and/or jumbo buffers.\n");
p->max_pkt_size = jumbo_buf_size - sizeof(struct cpl_rx_data);
for (i = 0; i < SGE_QSETS; ++i) {
struct qset_params *q = p->qset + i;
if (adap->params.nports > 2) {
q->coalesce_usecs = 50;
} else {
#ifdef INVARIANTS
q->coalesce_usecs = 10;
#else
q->coalesce_usecs = 5;
#endif
}
q->polling = 0;
q->rspq_size = RSPQ_Q_SIZE;
q->fl_size = fl_q_size;
q->jumbo_size = jumbo_q_size;
q->jumbo_buf_size = jumbo_buf_size;
q->txq_size[TXQ_ETH] = TX_ETH_Q_SIZE;
q->txq_size[TXQ_OFLD] = is_offload(adap) ? TX_OFLD_Q_SIZE : 16;
q->txq_size[TXQ_CTRL] = TX_CTRL_Q_SIZE;
q->cong_thres = 0;
}
}
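/*
 * Illustration only, not part of the driver: a standalone sketch of how
 * t3_sge_prep() sizes a free list. The pool is divided by 3 * nqsets,
 * capped, and then decremented until it is a power of two. The
 * EXAMPLE_POWEROF2 macro is a local stand-in for powerof2() from
 * <sys/param.h>, and the function names and sample numbers are
 * hypothetical.
 */

```c
#include <assert.h>

/* Local stand-in for powerof2(); like the original it accepts 0. */
#define EXAMPLE_POWEROF2(x)	((((x) - 1) & (x)) == 0)

/* Round a queue size down to the nearest power of two. */
int
example_round_down_pow2(int size)
{
	assert(size >= 0);
	while (!EXAMPLE_POWEROF2(size))
		size--;
	return (size);
}

/* Per-queue size: pool / (3 * nqsets), capped, then rounded down. */
int
example_fl_q_size(int nclusters, int nqsets, int cap)
{
	int size;

	assert(nqsets > 0);
	size = nclusters / (3 * nqsets);
	if (size > cap)
		size = cap;
	return (example_round_down_pow2(size));
}
```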
int
t3_sge_alloc(adapter_t *sc)
{
/* The parent tag. */
if (bus_dma_tag_create( bus_get_dma_tag(sc->dev),/* PCI parent */
1, 0, /* algnmnt, boundary */
BUS_SPACE_MAXADDR, /* lowaddr */
BUS_SPACE_MAXADDR, /* highaddr */
NULL, NULL, /* filter, filterarg */
BUS_SPACE_MAXSIZE_32BIT,/* maxsize */
BUS_SPACE_UNRESTRICTED, /* nsegments */
BUS_SPACE_MAXSIZE_32BIT,/* maxsegsize */
0, /* flags */
NULL, NULL, /* lock, lockarg */
&sc->parent_dmat)) {
device_printf(sc->dev, "Cannot allocate parent DMA tag\n");
return (ENOMEM);
}
/*
* DMA tag for normal sized RX frames
*/
if (bus_dma_tag_create(sc->parent_dmat, MCLBYTES, 0, BUS_SPACE_MAXADDR,
BUS_SPACE_MAXADDR, NULL, NULL, MCLBYTES, 1,
MCLBYTES, BUS_DMA_ALLOCNOW, NULL, NULL, &sc->rx_dmat)) {
device_printf(sc->dev, "Cannot allocate RX DMA tag\n");
return (ENOMEM);
}
/*
* DMA tag for jumbo sized RX frames.
*/
if (bus_dma_tag_create(sc->parent_dmat, MJUM16BYTES, 0, BUS_SPACE_MAXADDR,
BUS_SPACE_MAXADDR, NULL, NULL, MJUM16BYTES, 1, MJUM16BYTES,
BUS_DMA_ALLOCNOW, NULL, NULL, &sc->rx_jumbo_dmat)) {
device_printf(sc->dev, "Cannot allocate RX jumbo DMA tag\n");
return (ENOMEM);
}
/*
* DMA tag for TX frames.
*/
if (bus_dma_tag_create(sc->parent_dmat, 1, 0, BUS_SPACE_MAXADDR,
BUS_SPACE_MAXADDR, NULL, NULL, TX_MAX_SIZE, TX_MAX_SEGS,
TX_MAX_SIZE, BUS_DMA_ALLOCNOW,
NULL, NULL, &sc->tx_dmat)) {
device_printf(sc->dev, "Cannot allocate TX DMA tag\n");
return (ENOMEM);
}
return (0);
}
int
t3_sge_free(struct adapter * sc)
{
if (sc->tx_dmat != NULL)
bus_dma_tag_destroy(sc->tx_dmat);
if (sc->rx_jumbo_dmat != NULL)
bus_dma_tag_destroy(sc->rx_jumbo_dmat);
if (sc->rx_dmat != NULL)
bus_dma_tag_destroy(sc->rx_dmat);
if (sc->parent_dmat != NULL)
bus_dma_tag_destroy(sc->parent_dmat);
return (0);
}
void
t3_update_qset_coalesce(struct sge_qset *qs, const struct qset_params *p)
{
qs->rspq.holdoff_tmr = max(p->coalesce_usecs * 10, 1U);
qs->rspq.polling = 0 /* p->polling */;
}
#if !defined(__i386__) && !defined(__amd64__)
static void
refill_fl_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{
struct refill_fl_cb_arg *cb_arg = arg;
cb_arg->error = error;
cb_arg->seg = segs[0];
cb_arg->nseg = nseg;
}
#endif
/**
* refill_fl - refill an SGE free-buffer list
* @sc: the controller softc
* @q: the free-list to refill
* @n: the number of new buffers to allocate
*
* (Re)populate an SGE free-buffer list with up to @n new packet buffers.
* The caller must ensure that @n does not exceed the queue's capacity.
*/
static void
refill_fl(adapter_t *sc, struct sge_fl *q, int n)
{
struct rx_sw_desc *sd = &q->sdesc[q->pidx];
struct rx_desc *d = &q->desc[q->pidx];
struct refill_fl_cb_arg cb_arg;
struct mbuf *m;
caddr_t cl;
int err;
cb_arg.error = 0;
while (n--) {
/*
* We allocate an uninitialized mbuf + cluster, mbuf is
* initialized after rx.
*/
if (q->zone == zone_pack) {
if ((m = m_getcl(M_NOWAIT, MT_NOINIT, M_PKTHDR)) == NULL)
break;
cl = m->m_ext.ext_buf;
} else {
if ((cl = m_cljget(NULL, M_NOWAIT, q->buf_size)) == NULL)
break;
if ((m = m_gethdr(M_NOWAIT, MT_NOINIT)) == NULL) {
uma_zfree(q->zone, cl);
break;
}
}
if ((sd->flags & RX_SW_DESC_MAP_CREATED) == 0) {
if ((err = bus_dmamap_create(q->entry_tag, 0, &sd->map))) {
log(LOG_WARNING, "bus_dmamap_create failed %d\n", err);
uma_zfree(q->zone, cl);
goto done;
}
sd->flags |= RX_SW_DESC_MAP_CREATED;
}
#if !defined(__i386__) && !defined(__amd64__)
err = bus_dmamap_load(q->entry_tag, sd->map,
cl, q->buf_size, refill_fl_cb, &cb_arg, 0);
if (err != 0 || cb_arg.error) {
if (q->zone != zone_pack)
uma_zfree(q->zone, cl);
m_free(m);
goto done;
}
#else
cb_arg.seg.ds_addr = pmap_kextract((vm_offset_t)cl);
#endif
sd->flags |= RX_SW_DESC_INUSE;
sd->rxsd_cl = cl;
sd->m = m;
d->addr_lo = htobe32(cb_arg.seg.ds_addr & 0xffffffff);
d->addr_hi = htobe32(((uint64_t)cb_arg.seg.ds_addr >>32) & 0xffffffff);
d->len_gen = htobe32(V_FLD_GEN1(q->gen));
d->gen2 = htobe32(V_FLD_GEN2(q->gen));
d++;
sd++;
if (++q->pidx == q->size) {
q->pidx = 0;
q->gen ^= 1;
sd = q->sdesc;
d = q->desc;
}
q->credits++;
q->db_pending++;
}
done:
if (q->db_pending >= 32) {
q->db_pending = 0;
t3_write_reg(sc, A_SG_KDOORBELL, V_EGRCNTX(q->cntxt_id));
}
}
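/*
 * Illustration only, not part of the driver: a standalone sketch of the
 * producer-index wrap used by refill_fl(). Each time the index reaches
 * the end of the ring it returns to slot 0 and the generation bit flips,
 * so descriptors written on the new lap carry a different generation
 * value from those written on the previous one. The struct and function
 * names are hypothetical.
 */

```c
#include <assert.h>

/* Hypothetical, simplified ring state: only what the wrap logic uses. */
struct example_ring {
	unsigned int pidx;	/* producer index */
	unsigned int size;	/* number of descriptors in the ring */
	unsigned int gen;	/* current generation bit, 0 or 1 */
};

/* Advance the producer by one slot, flipping the generation on wrap. */
void
example_ring_advance(struct example_ring *r)
{
	assert(r->size > 0 && r->pidx < r->size);
	if (++r->pidx == r->size) {
		r->pidx = 0;
		r->gen ^= 1;
	}
}
```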
/**
* free_rx_bufs - free the Rx buffers on an SGE free list
* @sc: the controller softc
* @q: the SGE free list to clean up
*
* Release the buffers on an SGE free-buffer Rx queue. HW fetching from
* this queue should be stopped before calling this function.
*/
static void
free_rx_bufs(adapter_t *sc, struct sge_fl *q)
{
u_int cidx = q->cidx;
while (q->credits--) {
struct rx_sw_desc *d = &q->sdesc[cidx];
if (d->flags & RX_SW_DESC_INUSE) {
bus_dmamap_unload(q->entry_tag, d->map);
bus_dmamap_destroy(q->entry_tag, d->map);
if (q->zone == zone_pack) {
m_init(d->m, M_NOWAIT, MT_DATA, M_EXT);
uma_zfree(zone_pack, d->m);
} else {
m_init(d->m, M_NOWAIT, MT_DATA, 0);
uma_zfree(zone_mbuf, d->m);
uma_zfree(q->zone, d->rxsd_cl);
}
}
d->rxsd_cl = NULL;
d->m = NULL;
if (++cidx == q->size)
cidx = 0;
}
}
static __inline void
__refill_fl(adapter_t *adap, struct sge_fl *fl)
{
refill_fl(adap, fl, min(16U, fl->size - fl->credits));
}
static __inline void
__refill_fl_lt(adapter_t *adap, struct sge_fl *fl, int max)
{
uint32_t reclaimable = fl->size - fl->credits;
if (reclaimable > 0)
refill_fl(adap, fl, min(max, reclaimable));
}
/**
* recycle_rx_buf - recycle a receive buffer
* @adapter: the adapter
* @q: the SGE free list
* @idx: index of buffer to recycle
*
* Recycles the specified buffer on the given free list by adding it at
* the next available slot on the list.
*/
static void
recycle_rx_buf(adapter_t *adap, struct sge_fl *q, unsigned int idx)
{
struct rx_desc *from = &q->desc[idx];
struct rx_desc *to = &q->desc[q->pidx];
q->sdesc[q->pidx] = q->sdesc[idx];
to->addr_lo = from->addr_lo; /* already big endian */
to->addr_hi = from->addr_hi; /* likewise */
wmb(); /* necessary ? */
to->len_gen = htobe32(V_FLD_GEN1(q->gen));
to->gen2 = htobe32(V_FLD_GEN2(q->gen));
q->credits++;
if (++q->pidx == q->size) {
q->pidx = 0;
q->gen ^= 1;
}
t3_write_reg(adap, A_SG_KDOORBELL, V_EGRCNTX(q->cntxt_id));
}
static void
alloc_ring_cb(void *arg, bus_dma_segment_t *segs, int nsegs, int error)
{
uint32_t *addr;
addr = arg;
*addr = segs[0].ds_addr;
}
static int
alloc_ring(adapter_t *sc, size_t nelem, size_t elem_size, size_t sw_size,
bus_addr_t *phys, void *desc, void *sdesc, bus_dma_tag_t *tag,
bus_dmamap_t *map, bus_dma_tag_t parent_entry_tag, bus_dma_tag_t *entry_tag)
{
size_t len = nelem * elem_size;
void *s = NULL;
void *p = NULL;
int err;
if ((err = bus_dma_tag_create(sc->parent_dmat, PAGE_SIZE, 0,
BUS_SPACE_MAXADDR_32BIT,
BUS_SPACE_MAXADDR, NULL, NULL, len, 1,
len, 0, NULL, NULL, tag)) != 0) {
device_printf(sc->dev, "Cannot allocate descriptor tag\n");
return (ENOMEM);
}
if ((err = bus_dmamem_alloc(*tag, (void **)&p, BUS_DMA_NOWAIT,
map)) != 0) {
device_printf(sc->dev, "Cannot allocate descriptor memory\n");
return (ENOMEM);
}
bus_dmamap_load(*tag, *map, p, len, alloc_ring_cb, phys, 0);
bzero(p, len);
*(void **)desc = p;
if (sw_size) {
len = nelem * sw_size;
s = malloc(len, M_DEVBUF, M_WAITOK|M_ZERO);
*(void **)sdesc = s;
}
if (parent_entry_tag == NULL)
return (0);
if ((err = bus_dma_tag_create(parent_entry_tag, 1, 0,
BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR,
NULL, NULL, TX_MAX_SIZE, TX_MAX_SEGS,
TX_MAX_SIZE, BUS_DMA_ALLOCNOW,
NULL, NULL, entry_tag)) != 0) {
device_printf(sc->dev, "Cannot allocate descriptor entry tag\n");
return (ENOMEM);
}
return (0);
}
static void
sge_slow_intr_handler(void *arg, int ncount)
{
adapter_t *sc = arg;
t3_slow_intr_handler(sc);
t3_write_reg(sc, A_PL_INT_ENABLE0, sc->slow_intr_mask);
(void) t3_read_reg(sc, A_PL_INT_ENABLE0);
}
/**
* sge_timer_cb - perform periodic maintenance of an SGE qset
* @data: the SGE queue set to maintain
*
* Runs periodically from a timer to perform maintenance of an SGE queue
* set. It performs four tasks:
*
* a) Cleans up any completed Tx descriptors that may still be pending.
* Normal descriptor cleanup happens when new packets are added to a Tx
* queue so this timer is relatively infrequent and does any cleanup only
* if the Tx queue has not seen any new packets in a while. We make a
* best effort attempt to reclaim descriptors, in that we don't wait
* around if we cannot get a queue's lock (which most likely is because
* someone else is queueing new packets and so will also handle the clean
* up). Since control queues use immediate data exclusively we don't
* bother cleaning them up here.
*
* b) Replenishes Rx queues that have run out due to memory shortage.
* Normally new Rx buffers are added when existing ones are consumed but
* when out of memory a queue can become empty. We try to add only a few
* buffers here, the queue will be replenished fully as these new buffers
* are used up if memory shortage has subsided.
*
* c) Return coalesced response queue credits in case a response queue is
* starved.
*
* d) Ring doorbells for T304 tunnel queues since we have seen doorbell
* fifo overflows and the FW doesn't implement any recovery scheme yet.
*/
static void
sge_timer_cb(void *arg)
{
adapter_t *sc = arg;
if ((sc->flags & USING_MSIX) == 0) {
struct port_info *pi;
struct sge_qset *qs;
struct sge_txq *txq;
int i, j;
int reclaim_ofl, refill_rx;
if (sc->open_device_map == 0)
return;
for (i = 0; i < sc->params.nports; i++) {
pi = &sc->port[i];
for (j = 0; j < pi->nqsets; j++) {
qs = &sc->sge.qs[pi->first_qset + j];
txq = &qs->txq[0];
reclaim_ofl = txq[TXQ_OFLD].processed - txq[TXQ_OFLD].cleaned;
refill_rx = ((qs->fl[0].credits < qs->fl[0].size) ||
(qs->fl[1].credits < qs->fl[1].size));
if (reclaim_ofl || refill_rx) {
taskqueue_enqueue(sc->tq, &pi->timer_reclaim_task);
break;
}
}
}
}
if (sc->params.nports > 2) {
int i;
for_each_port(sc, i) {
struct port_info *pi = &sc->port[i];
t3_write_reg(sc, A_SG_KDOORBELL,
F_SELEGRCNTX |
(FW_TUNNEL_SGEEC_START + pi->first_qset));
}
}
if (((sc->flags & USING_MSIX) == 0 || sc->params.nports > 2) &&
sc->open_device_map != 0)
callout_reset(&sc->sge_timer_ch, TX_RECLAIM_PERIOD, sge_timer_cb, sc);
}
/*
* This is meant to be a catch-all function to keep sge state private
* to sge.c
*
*/
int
t3_sge_init_adapter(adapter_t *sc)
{
callout_init(&sc->sge_timer_ch, 1);
callout_reset(&sc->sge_timer_ch, TX_RECLAIM_PERIOD, sge_timer_cb, sc);
TASK_INIT(&sc->slow_intr_task, 0, sge_slow_intr_handler, sc);
return (0);
}
int
t3_sge_reset_adapter(adapter_t *sc)
{
callout_reset(&sc->sge_timer_ch, TX_RECLAIM_PERIOD, sge_timer_cb, sc);
return (0);
}
int
t3_sge_init_port(struct port_info *pi)
{
TASK_INIT(&pi->timer_reclaim_task, 0, sge_timer_reclaim, pi);
return (0);
}
/**
* refill_rspq - replenish an SGE response queue
* @adapter: the adapter
* @q: the response queue to replenish
* @credits: how many new responses to make available
*
* Replenishes a response queue by making the supplied number of responses
* available to HW.
*/
static __inline void
refill_rspq(adapter_t *sc, const struct sge_rspq *q, u_int credits)
{
/* mbufs are allocated on demand when a rspq entry is processed. */
t3_write_reg(sc, A_SG_RSPQ_CREDIT_RETURN,
V_RSPQ(q->cntxt_id) | V_CREDITS(credits));
}
static void
sge_txq_reclaim_handler(void *arg, int ncount)
{
struct sge_qset *qs = arg;
int i;
for (i = 0; i < 3; i++)
reclaim_completed_tx(qs, 16, i);
}
static void
sge_timer_reclaim(void *arg, int ncount)
{
struct port_info *pi = arg;
int i, nqsets = pi->nqsets;
adapter_t *sc = pi->adapter;
struct sge_qset *qs;
struct mtx *lock;
KASSERT((sc->flags & USING_MSIX) == 0,
("can't call timer reclaim for msi-x"));
for (i = 0; i < nqsets; i++) {
qs = &sc->sge.qs[pi->first_qset + i];
reclaim_completed_tx(qs, 16, TXQ_OFLD);
lock = (sc->flags & USING_MSIX) ? &qs->rspq.lock :
&sc->sge.qs[0].rspq.lock;
if (mtx_trylock(lock)) {
/* XXX currently assume that we are *NOT* polling */
uint32_t status = t3_read_reg(sc, A_SG_RSPQ_FL_STATUS);
if (qs->fl[0].credits < qs->fl[0].size - 16)
__refill_fl(sc, &qs->fl[0]);
if (qs->fl[1].credits < qs->fl[1].size - 16)
__refill_fl(sc, &qs->fl[1]);
if (status & (1 << qs->rspq.cntxt_id)) {
if (qs->rspq.credits) {
refill_rspq(sc, &qs->rspq, 1);
qs->rspq.credits--;
t3_write_reg(sc, A_SG_RSPQ_FL_STATUS,
1 << qs->rspq.cntxt_id);
}
}
mtx_unlock(lock);
}
}
}
/**
* init_qset_cntxt - initialize an SGE queue set context info
* @qs: the queue set
* @id: the queue set id
*
* Initializes the TIDs and context ids for the queues of a queue set.
*/
static void
init_qset_cntxt(struct sge_qset *qs, u_int id)
{
qs->rspq.cntxt_id = id;
qs->fl[0].cntxt_id = 2 * id;
qs->fl[1].cntxt_id = 2 * id + 1;
qs->txq[TXQ_ETH].cntxt_id = FW_TUNNEL_SGEEC_START + id;
qs->txq[TXQ_ETH].token = FW_TUNNEL_TID_START + id;
qs->txq[TXQ_OFLD].cntxt_id = FW_OFLD_SGEEC_START + id;
qs->txq[TXQ_CTRL].cntxt_id = FW_CTRL_SGEEC_START + id;
qs->txq[TXQ_CTRL].token = FW_CTRL_TID_START + id;
/* XXX: a sane limit is needed instead of INT_MAX */
mbufq_init(&qs->txq[TXQ_ETH].sendq, INT_MAX);
mbufq_init(&qs->txq[TXQ_OFLD].sendq, INT_MAX);
mbufq_init(&qs->txq[TXQ_CTRL].sendq, INT_MAX);
}
static void
txq_prod(struct sge_txq *txq, unsigned int ndesc, struct txq_state *txqs)
{
txq->in_use += ndesc;
/*
* XXX we don't handle stopping of queue
* presumably start handles this when we bump against the end
*/
txqs->gen = txq->gen;
txq->unacked += ndesc;
txqs->compl = (txq->unacked & 32) << (S_WR_COMPL - 5);
txq->unacked &= 31;
txqs->pidx = txq->pidx;
txq->pidx += ndesc;
#ifdef INVARIANTS
if (((txqs->pidx > txq->cidx) &&
(txq->pidx < txqs->pidx) &&
(txq->pidx >= txq->cidx)) ||
((txqs->pidx < txq->cidx) &&
(txq->pidx >= txq->cidx)) ||
((txqs->pidx < txq->cidx) &&
(txq->cidx < txqs->pidx)))
panic("txqs->pidx=%d txq->pidx=%d txq->cidx=%d",
txqs->pidx, txq->pidx, txq->cidx);
#endif
if (txq->pidx >= txq->size) {
txq->pidx -= txq->size;
txq->gen ^= 1;
}
}
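/*
 * Illustration only, not part of the driver: a standalone sketch of the
 * completion-request bookkeeping in txq_prod(). The running count of
 * unacknowledged descriptors is kept below 32; when an addition carries
 * into bit 5, that bit is shifted up to the completion-flag position and
 * the count is reduced modulo 32. EXAMPLE_S_WR_COMPL is an assumed
 * placeholder value, since the real S_WR_COMPL comes from the driver
 * headers, and example_compl_flag is a hypothetical name.
 */

```c
#include <assert.h>

/* Assumed placeholder for S_WR_COMPL from the driver headers. */
#define EXAMPLE_S_WR_COMPL	21

/* Add ndesc to *unacked; return the completion flag if 32 was crossed. */
unsigned int
example_compl_flag(unsigned int *unacked, unsigned int ndesc)
{
	unsigned int compl;

	assert(*unacked < 32 && ndesc < 32);
	*unacked += ndesc;
	/* Bit 5 is set exactly when the sum reached 32 or more. */
	compl = (*unacked & 32) << (EXAMPLE_S_WR_COMPL - 5);
	*unacked &= 31;
	return (compl);
}
```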
/**
* calc_tx_descs - calculate the number of Tx descriptors for a packet
* @m: the packet mbufs
* @nsegs: the number of segments
*
* Returns the number of Tx descriptors needed for the given Ethernet
* packet. Ethernet packets require addition of WR and CPL headers.
*/
static __inline unsigned int
calc_tx_descs(const struct mbuf *m, int nsegs)
{
unsigned int flits;
if (m->m_pkthdr.len <= PIO_LEN)
return 1;
flits = sgl_len(nsegs) + 2;
if (m->m_pkthdr.csum_flags & CSUM_TSO)
flits++;
return flits_to_desc(flits);
}
/**
* make_sgl - populate a scatter/gather list for a packet
* @sgp: the SGL to populate
* @segs: the packet dma segments
* @nsegs: the number of segments
*
* Generates a scatter/gather list for the buffers that make up a packet.
* Zero-length segments are skipped. The caller must size the SGL
* appropriately; its size in flits can be computed with sgl_len().
*/
static __inline void
make_sgl(struct sg_ent *sgp, bus_dma_segment_t *segs, int nsegs)
{
int i, idx;
for (idx = 0, i = 0; i < nsegs; i++) {
/*
* firmware doesn't like empty segments
*/
if (segs[i].ds_len == 0)
continue;
if (i && idx == 0)
++sgp;
sgp->len[idx] = htobe32(segs[i].ds_len);
sgp->addr[idx] = htobe64(segs[i].ds_addr);
idx ^= 1;
}
if (idx) {
sgp->len[idx] = 0;
sgp->addr[idx] = 0;
}
}
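/*
 * Illustration only, not part of the driver: a standalone sketch of the
 * packing rule in make_sgl(). Non-empty segments are stored two per
 * entry and the unused half of a final odd entry is zeroed. The types
 * below are hypothetical host-order stand-ins for bus_dma_segment_t and
 * struct sg_ent, byte swapping is left out for portability, and the
 * sketch indexes by the number of segments written instead of advancing
 * a pointer as make_sgl() does.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DMA segment. */
struct example_seg {
	uint64_t addr;
	uint32_t len;
};

/* Hypothetical stand-in for one SGL entry holding two segments. */
struct example_sg_ent {
	uint32_t len[2];
	uint64_t addr[2];
};

/* Pack non-empty segments into sgl; returns how many were written. */
size_t
example_pack_sgl(struct example_sg_ent *sgl, const struct example_seg *segs,
    size_t nsegs)
{
	size_t i, n = 0;

	assert(sgl != NULL);
	for (i = 0; i < nsegs; i++) {
		if (segs[i].len == 0)
			continue;	/* skip empty segments */
		sgl[n / 2].len[n % 2] = segs[i].len;
		sgl[n / 2].addr[n % 2] = segs[i].addr;
		n++;
	}
	if (n % 2) {
		/* Odd count: clear the unused second half of the last entry. */
		sgl[n / 2].len[1] = 0;
		sgl[n / 2].addr[1] = 0;
	}
	return (n);
}
```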
/**
* check_ring_tx_db - check and potentially ring a Tx queue's doorbell
* @adap: the adapter
* @q: the Tx queue
* @mustring: ring the doorbell even if few descriptors are pending
*
* With GTS enabled, ring the doorbell if the Tx queue is asleep. There is
* a natural race where the HW goes to sleep just after we checked; in
* that case the interrupt handler will detect the outstanding Tx packet
* and ring the doorbell for us.
*
* With GTS disabled, ring the doorbell when @mustring is set or once 32
* requests have accumulated since the last doorbell.
*/
static __inline void
check_ring_tx_db(adapter_t *adap, struct sge_txq *q, int mustring)
{
#if USE_GTS
clear_bit(TXQ_LAST_PKT_DB, &q->flags);
if (test_and_set_bit(TXQ_RUNNING, &q->flags) == 0) {
set_bit(TXQ_LAST_PKT_DB, &q->flags);
#ifdef T3_TRACE
T3_TRACE1(adap->tb[q->cntxt_id & 7], "doorbell Tx, cntxt %d",
q->cntxt_id);
#endif
t3_write_reg(adap, A_SG_KDOORBELL,
F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
}
#else
if (mustring || ++q->db_pending >= 32) {
wmb(); /* write descriptors before telling HW */
t3_write_reg(adap, A_SG_KDOORBELL,
F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
q->db_pending = 0;
}
#endif
}
static __inline void
wr_gen2(struct tx_desc *d, unsigned int gen)
{
#if SGE_NUM_GENBITS == 2
d->flit[TX_DESC_FLITS - 1] = htobe64(gen);
#endif
}
/**
* write_wr_hdr_sgl - write a WR header and, optionally, SGL
* @ndesc: number of Tx descriptors spanned by the SGL
* @txd: first Tx descriptor to be written
* @txqs: txq state (generation and producer index)
* @txq: the SGE Tx queue
* @sgl: the SGL
* @flits: number of flits to the start of the SGL in the first descriptor
* @sgl_flits: the SGL size in flits
* @wr_hi: top 32 bits of WR header based on WR type (big endian)
* @wr_lo: low 32 bits of WR header based on WR type (big endian)
*
* Write a work request header and an associated SGL. If the SGL is
* small enough to fit into one Tx descriptor it has already been written
* and we just need to write the WR header. Otherwise we distribute the
* SGL across the number of descriptors it spans.
*/
static void
write_wr_hdr_sgl(unsigned int ndesc, struct tx_desc *txd, struct txq_state *txqs,
const struct sge_txq *txq, const struct sg_ent *sgl, unsigned int flits,
unsigned int sgl_flits, unsigned int wr_hi, unsigned int wr_lo)
{
struct work_request_hdr *wrp = (struct work_request_hdr *)txd;
struct tx_sw_desc *txsd = &txq->sdesc[txqs->pidx];
if (__predict_true(ndesc == 1)) {
set_wr_hdr(wrp, htonl(F_WR_SOP | F_WR_EOP | V_WR_DATATYPE(1) |
V_WR_SGLSFLT(flits)) | wr_hi,
htonl(V_WR_LEN(flits + sgl_flits) | V_WR_GEN(txqs->gen)) |
wr_lo);
wr_gen2(txd, txqs->gen);
} else {
unsigned int ogen = txqs->gen;
const uint64_t *fp = (const uint64_t *)sgl;
struct work_request_hdr *wp = wrp;
wrp->wrh_hi = htonl(F_WR_SOP | V_WR_DATATYPE(1) |
V_WR_SGLSFLT(flits)) | wr_hi;
while (sgl_flits) {
unsigned int avail = WR_FLITS - flits;
if (avail > sgl_flits)
avail = sgl_flits;
memcpy(&txd->flit[flits], fp, avail * sizeof(*fp));
sgl_flits -= avail;
ndesc--;
if (!sgl_flits)
break;
fp += avail;
txd++;
txsd++;
if (++txqs->pidx == txq->size) {
txqs->pidx = 0;
txqs->gen ^= 1;
txd = txq->desc;
txsd = txq->sdesc;
}
/*
* when the head of the mbuf chain
* is freed all clusters will be freed
* with it
*/
wrp = (struct work_request_hdr *)txd;
wrp->wrh_hi = htonl(V_WR_DATATYPE(1) |
V_WR_SGLSFLT(1)) | wr_hi;
wrp->wrh_lo = htonl(V_WR_LEN(min(WR_FLITS,
sgl_flits + 1)) |
V_WR_GEN(txqs->gen)) | wr_lo;
wr_gen2(txd, txqs->gen);
flits = 1;
}
wrp->wrh_hi |= htonl(F_WR_EOP);
wmb();
wp->wrh_lo = htonl(V_WR_LEN(WR_FLITS) | V_WR_GEN(ogen)) | wr_lo;
wr_gen2((struct tx_desc *)wp, ogen);
}
}
/* sizeof(*eh) + sizeof(*ip) + sizeof(*tcp) */
#define TCPPKTHDRSIZE (ETHER_HDR_LEN + 20 + 20)
#define GET_VTAG(cntrl, m) \
do { \
if ((m)->m_flags & M_VLANTAG) \
cntrl |= F_TXPKT_VLAN_VLD | V_TXPKT_VLAN((m)->m_pkthdr.ether_vtag); \
} while (0)
static int
t3_encap(struct sge_qset *qs, struct mbuf **m)
{
adapter_t *sc;
struct mbuf *m0;
struct sge_txq *txq;
struct txq_state txqs;
struct port_info *pi;
unsigned int ndesc, flits, cntrl, mlen;
int err, nsegs, tso_info = 0;
struct work_request_hdr *wrp;
struct tx_sw_desc *txsd;
struct sg_ent *sgp, *sgl;
uint32_t wr_hi, wr_lo, sgl_flits;
bus_dma_segment_t segs[TX_MAX_SEGS];
struct tx_desc *txd;
pi = qs->port;
sc = pi->adapter;
txq = &qs->txq[TXQ_ETH];
txd = &txq->desc[txq->pidx];
txsd = &txq->sdesc[txq->pidx];
sgl = txq->txq_sgl;
prefetch(txd);
m0 = *m;
mtx_assert(&qs->lock, MA_OWNED);
cntrl = V_TXPKT_INTF(pi->txpkt_intf);
KASSERT(m0->m_flags & M_PKTHDR, ("not packet header\n"));
if (m0->m_nextpkt == NULL && m0->m_next != NULL &&
m0->m_pkthdr.csum_flags & (CSUM_TSO))
tso_info = V_LSO_MSS(m0->m_pkthdr.tso_segsz);
if (m0->m_nextpkt != NULL) {
busdma_map_sg_vec(txq->entry_tag, txsd->map, m0, segs, &nsegs);
ndesc = 1;
mlen = 0;
} else {
if ((err = busdma_map_sg_collapse(txq->entry_tag, txsd->map,
&m0, segs, &nsegs))) {
if (cxgb_debug)
printf("failed ... err=%d\n", err);
return (err);
}
mlen = m0->m_pkthdr.len;
ndesc = calc_tx_descs(m0, nsegs);
}
txq_prod(txq, ndesc, &txqs);
KASSERT(m0->m_pkthdr.len, ("empty packet nsegs=%d", nsegs));
txsd->m = m0;
if (m0->m_nextpkt != NULL) {
struct cpl_tx_pkt_batch *cpl_batch = (struct cpl_tx_pkt_batch *)txd;
int i, fidx;
if (nsegs > 7)
panic("trying to coalesce %d packets into one WR", nsegs);
txq->txq_coalesced += nsegs;
wrp = (struct work_request_hdr *)txd;
flits = nsegs*2 + 1;
for (fidx = 1, i = 0; i < nsegs; i++, fidx += 2) {
struct cpl_tx_pkt_batch_entry *cbe;
uint64_t flit;
uint32_t *hflit = (uint32_t *)&flit;
int cflags = m0->m_pkthdr.csum_flags;
cntrl = V_TXPKT_INTF(pi->txpkt_intf);
GET_VTAG(cntrl, m0);
cntrl |= V_TXPKT_OPCODE(CPL_TX_PKT);
if (__predict_false(!(cflags & CSUM_IP)))
cntrl |= F_TXPKT_IPCSUM_DIS;
if (__predict_false(!(cflags & (CSUM_TCP | CSUM_UDP |
CSUM_UDP_IPV6 | CSUM_TCP_IPV6))))
cntrl |= F_TXPKT_L4CSUM_DIS;
hflit[0] = htonl(cntrl);
hflit[1] = htonl(segs[i].ds_len | 0x80000000);
flit |= htobe64(1 << 24);
cbe = &cpl_batch->pkt_entry[i];
cbe->cntrl = hflit[0];
cbe->len = hflit[1];
cbe->addr = htobe64(segs[i].ds_addr);
}
wr_hi = htonl(F_WR_SOP | F_WR_EOP | V_WR_DATATYPE(1) |
V_WR_SGLSFLT(flits)) |
htonl(V_WR_OP(FW_WROPCODE_TUNNEL_TX_PKT) | txqs.compl);
wr_lo = htonl(V_WR_LEN(flits) |
V_WR_GEN(txqs.gen)) | htonl(V_WR_TID(txq->token));
set_wr_hdr(wrp, wr_hi, wr_lo);
wmb();
ETHER_BPF_MTAP(pi->ifp, m0);
wr_gen2(txd, txqs.gen);
check_ring_tx_db(sc, txq, 0);
return (0);
} else if (tso_info) {
uint16_t eth_type;
struct cpl_tx_pkt_lso *hdr = (struct cpl_tx_pkt_lso *)txd;
struct ether_header *eh;
void *l3hdr;
struct tcphdr *tcp;
txd->flit[2] = 0;
GET_VTAG(cntrl, m0);
cntrl |= V_TXPKT_OPCODE(CPL_TX_PKT_LSO);
hdr->cntrl = htonl(cntrl);
hdr->len = htonl(mlen | 0x80000000);
if (__predict_false(mlen < TCPPKTHDRSIZE)) {
printf("mbuf=%p,len=%d,tso_segsz=%d,csum_flags=%b,flags=%#x",
m0, mlen, m0->m_pkthdr.tso_segsz,
(int)m0->m_pkthdr.csum_flags, CSUM_BITS, m0->m_flags);
panic("tx tso packet too small");
}
/* Make sure that ether, ip, tcp headers are all in m0 */
if (__predict_false(m0->m_len < TCPPKTHDRSIZE)) {
m0 = m_pullup(m0, TCPPKTHDRSIZE);
if (__predict_false(m0 == NULL)) {
/* XXX panic probably an overreaction */
panic("couldn't fit header into mbuf");
}
}
eh = mtod(m0, struct ether_header *);
eth_type = eh->ether_type;
if (eth_type == htons(ETHERTYPE_VLAN)) {
struct ether_vlan_header *evh = (void *)eh;
tso_info |= V_LSO_ETH_TYPE(CPL_ETH_II_VLAN);
l3hdr = evh + 1;
eth_type = evh->evl_proto;
} else {
tso_info |= V_LSO_ETH_TYPE(CPL_ETH_II);
l3hdr = eh + 1;
}
if (eth_type == htons(ETHERTYPE_IP)) {
struct ip *ip = l3hdr;
tso_info |= V_LSO_IPHDR_WORDS(ip->ip_hl);
tcp = (struct tcphdr *)(ip + 1);
} else if (eth_type == htons(ETHERTYPE_IPV6)) {
struct ip6_hdr *ip6 = l3hdr;
KASSERT(ip6->ip6_nxt == IPPROTO_TCP,
("%s: CSUM_TSO with ip6_nxt %d",
__func__, ip6->ip6_nxt));
tso_info |= F_LSO_IPV6;
tso_info |= V_LSO_IPHDR_WORDS(sizeof(*ip6) >> 2);
tcp = (struct tcphdr *)(ip6 + 1);
} else
panic("%s: CSUM_TSO but neither ip nor ip6", __func__);
tso_info |= V_LSO_TCPHDR_WORDS(tcp->th_off);
hdr->lso_info = htonl(tso_info);
if (__predict_false(mlen <= PIO_LEN)) {
/*
* pkt not undersized but fits in PIO_LEN
* Indicates a TSO bug at the higher levels.
*/
txsd->m = NULL;
m_copydata(m0, 0, mlen, (caddr_t)&txd->flit[3]);
flits = (mlen + 7) / 8 + 3;
wr_hi = htonl(V_WR_BCNTLFLT(mlen & 7) |
V_WR_OP(FW_WROPCODE_TUNNEL_TX_PKT) |
F_WR_SOP | F_WR_EOP | txqs.compl);
wr_lo = htonl(V_WR_LEN(flits) |
V_WR_GEN(txqs.gen) | V_WR_TID(txq->token));
set_wr_hdr(&hdr->wr, wr_hi, wr_lo);
wmb();
ETHER_BPF_MTAP(pi->ifp, m0);
wr_gen2(txd, txqs.gen);
check_ring_tx_db(sc, txq, 0);
m_freem(m0);
return (0);
}
flits = 3;
} else {
struct cpl_tx_pkt *cpl = (struct cpl_tx_pkt *)txd;
GET_VTAG(cntrl, m0);
cntrl |= V_TXPKT_OPCODE(CPL_TX_PKT);
if (__predict_false(!(m0->m_pkthdr.csum_flags & CSUM_IP)))
cntrl |= F_TXPKT_IPCSUM_DIS;
if (__predict_false(!(m0->m_pkthdr.csum_flags & (CSUM_TCP |
CSUM_UDP | CSUM_UDP_IPV6 | CSUM_TCP_IPV6))))
cntrl |= F_TXPKT_L4CSUM_DIS;
cpl->cntrl = htonl(cntrl);
cpl->len = htonl(mlen | 0x80000000);
if (mlen <= PIO_LEN) {
txsd->m = NULL;
m_copydata(m0, 0, mlen, (caddr_t)&txd->flit[2]);
flits = (mlen + 7) / 8 + 2;
wr_hi = htonl(V_WR_BCNTLFLT(mlen & 7) |
V_WR_OP(FW_WROPCODE_TUNNEL_TX_PKT) |
F_WR_SOP | F_WR_EOP | txqs.compl);
wr_lo = htonl(V_WR_LEN(flits) |
V_WR_GEN(txqs.gen) | V_WR_TID(txq->token));
set_wr_hdr(&cpl->wr, wr_hi, wr_lo);
wmb();
ETHER_BPF_MTAP(pi->ifp, m0);
wr_gen2(txd, txqs.gen);
check_ring_tx_db(sc, txq, 0);
m_freem(m0);
return (0);
}
flits = 2;
}
wrp = (struct work_request_hdr *)txd;
sgp = (ndesc == 1) ? (struct sg_ent *)&txd->flit[flits] : sgl;
make_sgl(sgp, segs, nsegs);
sgl_flits = sgl_len(nsegs);
ETHER_BPF_MTAP(pi->ifp, m0);
KASSERT(ndesc <= 4, ("ndesc too large %d", ndesc));
wr_hi = htonl(V_WR_OP(FW_WROPCODE_TUNNEL_TX_PKT) | txqs.compl);
wr_lo = htonl(V_WR_TID(txq->token));
write_wr_hdr_sgl(ndesc, txd, &txqs, txq, sgl, flits,
sgl_flits, wr_hi, wr_lo);
check_ring_tx_db(sc, txq, 0);
return (0);
}
void
cxgb_tx_watchdog(void *arg)
{
struct sge_qset *qs = arg;
struct sge_txq *txq = &qs->txq[TXQ_ETH];
if (qs->coalescing != 0 &&
(txq->in_use <= cxgb_tx_coalesce_enable_stop) &&
TXQ_RING_EMPTY(qs))
qs->coalescing = 0;
else if (qs->coalescing == 0 &&
(txq->in_use >= cxgb_tx_coalesce_enable_start))
qs->coalescing = 1;
if (TXQ_TRYLOCK(qs)) {
qs->qs_flags |= QS_FLUSHING;
cxgb_start_locked(qs);
qs->qs_flags &= ~QS_FLUSHING;
TXQ_UNLOCK(qs);
}
if (qs->port->ifp->if_drv_flags & IFF_DRV_RUNNING)
callout_reset_on(&txq->txq_watchdog, hz/4, cxgb_tx_watchdog,
qs, txq->txq_watchdog.c_cpu);
}
static void
cxgb_tx_timeout(void *arg)
{
struct sge_qset *qs = arg;
struct sge_txq *txq = &qs->txq[TXQ_ETH];
if (qs->coalescing == 0 && (txq->in_use >= (txq->size>>3)))
qs->coalescing = 1;
if (TXQ_TRYLOCK(qs)) {
qs->qs_flags |= QS_TIMEOUT;
cxgb_start_locked(qs);
qs->qs_flags &= ~QS_TIMEOUT;
TXQ_UNLOCK(qs);
}
}
static void
cxgb_start_locked(struct sge_qset *qs)
{
struct mbuf *m_head = NULL;
struct sge_txq *txq = &qs->txq[TXQ_ETH];
struct port_info *pi = qs->port;
struct ifnet *ifp = pi->ifp;
if (qs->qs_flags & (QS_FLUSHING|QS_TIMEOUT))
reclaim_completed_tx(qs, 0, TXQ_ETH);
if (!pi->link_config.link_ok) {
TXQ_RING_FLUSH(qs);
return;
}
TXQ_LOCK_ASSERT(qs);
while (!TXQ_RING_EMPTY(qs) && (ifp->if_drv_flags & IFF_DRV_RUNNING) &&
pi->link_config.link_ok) {
reclaim_completed_tx(qs, cxgb_tx_reclaim_threshold, TXQ_ETH);
if (txq->size - txq->in_use <= TX_MAX_DESC)
break;
if ((m_head = cxgb_dequeue(qs)) == NULL)
break;
/*
* Encapsulation can modify our pointer, and/or make it
* NULL on failure. In that event, we can't requeue.
*/
if (t3_encap(qs, &m_head) || m_head == NULL)
break;
m_head = NULL;
}
if (txq->db_pending)
check_ring_tx_db(pi->adapter, txq, 1);
if (!TXQ_RING_EMPTY(qs) && callout_pending(&txq->txq_timer) == 0 &&
pi->link_config.link_ok)
callout_reset_on(&txq->txq_timer, 1, cxgb_tx_timeout,
qs, txq->txq_timer.c_cpu);
if (m_head != NULL)
m_freem(m_head);
}
static int
cxgb_transmit_locked(struct ifnet *ifp, struct sge_qset *qs, struct mbuf *m)
{
struct port_info *pi = qs->port;
struct sge_txq *txq = &qs->txq[TXQ_ETH];
struct buf_ring *br = txq->txq_mr;
int error, avail;
avail = txq->size - txq->in_use;
TXQ_LOCK_ASSERT(qs);
/*
* We can only do a direct transmit if the following are true:
* - we aren't coalescing (ring < 3/4 full)
* - the link is up -- checked in caller
* - there are no packets enqueued already
* - there is space in hardware transmit queue
*/
if (check_pkt_coalesce(qs) == 0 &&
!TXQ_RING_NEEDS_ENQUEUE(qs) && avail > TX_MAX_DESC) {
if (t3_encap(qs, &m)) {
if (m != NULL &&
(error = drbr_enqueue(ifp, br, m)) != 0)
return (error);
} else {
if (txq->db_pending)
check_ring_tx_db(pi->adapter, txq, 1);
/*
* We've bypassed the buf ring so we need to update
* the stats directly
*/
txq->txq_direct_packets++;
txq->txq_direct_bytes += m->m_pkthdr.len;
}
} else if ((error = drbr_enqueue(ifp, br, m)) != 0)
return (error);
reclaim_completed_tx(qs, cxgb_tx_reclaim_threshold, TXQ_ETH);
if (!TXQ_RING_EMPTY(qs) && pi->link_config.link_ok &&
(!check_pkt_coalesce(qs) || (drbr_inuse(ifp, br) >= 7)))
cxgb_start_locked(qs);
else if (!TXQ_RING_EMPTY(qs) && !callout_pending(&txq->txq_timer))
callout_reset_on(&txq->txq_timer, 1, cxgb_tx_timeout,
qs, txq->txq_timer.c_cpu);
return (0);
}
int
cxgb_transmit(struct ifnet *ifp, struct mbuf *m)
{
struct sge_qset *qs;
struct port_info *pi = ifp->if_softc;
int error, qidx = pi->first_qset;
if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0 ||
!pi->link_config.link_ok) {
m_freem(m);
return (0);
}
/* check if flowid is set */
if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE)
qidx = (m->m_pkthdr.flowid % pi->nqsets) + pi->first_qset;
qs = &pi->adapter->sge.qs[qidx];
if (TXQ_TRYLOCK(qs)) {
/* XXX running */
error = cxgb_transmit_locked(ifp, qs, m);
TXQ_UNLOCK(qs);
} else
error = drbr_enqueue(ifp, qs->txq[TXQ_ETH].txq_mr, m);
return (error);
}
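The queue-set selection performed in cxgb_transmit() can be looked at in isolation. The following is a minimal sketch, not driver code: `pick_qset` and its parameters are hypothetical names, and `has_hash` stands in for the `M_HASHTYPE_GET(m) != M_HASHTYPE_NONE` test.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the selection in cxgb_transmit():
 * packets without a flow hash use the port's first queue set, while
 * hashed packets are spread across the port's nqsets queue sets.
 */
static unsigned int
pick_qset(int has_hash, uint32_t flowid, unsigned int first_qset,
    unsigned int nqsets)
{
	assert(nqsets > 0);	/* assumption: a port owns at least one qset */
	if (!has_hash)
		return (first_qset);
	return ((flowid % nqsets) + first_qset);
}
```

With `first_qset = 4` and `nqsets = 4`, a flow id of 10 maps to queue set 6, and an unhashed packet always maps to queue set 4.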
void
cxgb_qflush(struct ifnet *ifp)
{
/*
* flush any enqueued mbufs in the buf_rings
* and in the transmit queues
* no-op for now
*/
return;
}
/**
* write_imm - write a packet into a Tx descriptor as immediate data
* @d: the Tx descriptor to write
* @src: the packet data, beginning with its work request header
* @len: the length of packet data to write as immediate data
* @gen: the generation bit value to write
*
* Writes a packet as immediate data into a Tx descriptor. The packet
* contains a work request at its beginning. We must write the packet
* carefully so the SGE doesn't accidentally read it before it's written in
* its entirety.
*/
static __inline void
write_imm(struct tx_desc *d, caddr_t src,
unsigned int len, unsigned int gen)
{
struct work_request_hdr *from = (struct work_request_hdr *)src;
struct work_request_hdr *to = (struct work_request_hdr *)d;
uint32_t wr_hi, wr_lo;
KASSERT(len <= WR_LEN && len >= sizeof(*from),
("%s: invalid len %d", __func__, len));
memcpy(&to[1], &from[1], len - sizeof(*from));
wr_hi = from->wrh_hi | htonl(F_WR_SOP | F_WR_EOP |
V_WR_BCNTLFLT(len & 7));
wr_lo = from->wrh_lo | htonl(V_WR_GEN(gen) | V_WR_LEN((len + 7) / 8));
set_wr_hdr(to, wr_hi, wr_lo);
wmb();
wr_gen2(d, gen);
}
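The length arithmetic in write_imm() works in 8-byte units ("flits"). A minimal sketch of the two expressions follows; the helper names are hypothetical, and reading `len & 7` as the byte count of a partial last flit is an assumption based on the `V_WR_BCNTLFLT` macro name rather than on hardware documentation.

```c
#include <assert.h>
#include <limits.h>

/* Number of 8-byte flits needed to hold len bytes, rounding up. */
static unsigned int
len_to_flits(unsigned int len)
{
	assert(len <= UINT_MAX - 7);	/* guard the rounding addition */
	return ((len + 7) / 8);
}

/* Remainder of len modulo 8, the value passed to V_WR_BCNTLFLT(). */
static unsigned int
len_mod_flit(unsigned int len)
{
	return (len & 7);
}
```

A 13-byte work request therefore occupies 2 flits with a remainder of 5, while a 16-byte one occupies 2 flits with a remainder of 0.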
/**
* check_desc_avail - check descriptor availability on a send queue
* @adap: the adapter
* @q: the TX queue
* @m: the packet needing the descriptors
* @ndesc: the number of Tx descriptors needed
* @qid: the Tx queue number in its queue set (TXQ_OFLD or TXQ_CTRL)
*
* Checks if the requested number of Tx descriptors is available on an
* SGE send queue. If the queue is already suspended or not enough
* descriptors are available the packet is queued for later transmission.
* Must be called with the Tx queue locked.
*
* Returns 0 if enough descriptors are available, 1 if there aren't
* enough descriptors and the packet has been queued, and 2 if the caller
* needs to retry because there weren't enough descriptors at the
* beginning of the call but some freed up in the meantime.
*/
static __inline int
check_desc_avail(adapter_t *adap, struct sge_txq *q,
struct mbuf *m, unsigned int ndesc,
unsigned int qid)
{
/*
* XXX We currently only use this for checking the control queue
* the control queue is only used for binding qsets which happens
* at init time so we are guaranteed enough descriptors
*/
if (__predict_false(mbufq_len(&q->sendq))) {
addq_exit: (void )mbufq_enqueue(&q->sendq, m);
return 1;
}
if (__predict_false(q->size - q->in_use < ndesc)) {
struct sge_qset *qs = txq_to_qset(q, qid);
setbit(&qs->txq_stopped, qid);
if (should_restart_tx(q) &&
test_and_clear_bit(qid, &qs->txq_stopped))
return 2;
q->stops++;
goto addq_exit;
}
return 0;
}
/**
* reclaim_completed_tx_imm - reclaim completed control-queue Tx descs
* @q: the SGE control Tx queue
*
* This is a variant of reclaim_completed_tx() that is used for Tx queues
* that send only immediate data (presently just the control queues) and
* thus do not have any mbufs.
*/
static __inline void
reclaim_completed_tx_imm(struct sge_txq *q)
{
unsigned int reclaim = q->processed - q->cleaned;
q->in_use -= reclaim;
q->cleaned += reclaim;
}
/**
* ctrl_xmit - send a packet through an SGE control Tx queue
* @adap: the adapter
* @qs: the queue set containing the control queue
* @m: the packet
*
* Send a packet through an SGE control Tx queue. Packets sent through
* a control queue must fit entirely as immediate data in a single Tx
* descriptor and have no page fragments.
*/
static int
ctrl_xmit(adapter_t *adap, struct sge_qset *qs, struct mbuf *m)
{
int ret;
struct work_request_hdr *wrp = mtod(m, struct work_request_hdr *);
struct sge_txq *q = &qs->txq[TXQ_CTRL];
KASSERT(m->m_len <= WR_LEN, ("%s: bad tx data", __func__));
wrp->wrh_hi |= htonl(F_WR_SOP | F_WR_EOP);
wrp->wrh_lo = htonl(V_WR_TID(q->token));
TXQ_LOCK(qs);
again: reclaim_completed_tx_imm(q);
ret = check_desc_avail(adap, q, m, 1, TXQ_CTRL);
if (__predict_false(ret)) {
if (ret == 1) {
TXQ_UNLOCK(qs);
return (ENOSPC);
}
goto again;
}
write_imm(&q->desc[q->pidx], m->m_data, m->m_len, q->gen);
q->in_use++;
if (++q->pidx >= q->size) {
q->pidx = 0;
q->gen ^= 1;
}
TXQ_UNLOCK(qs);
wmb();
t3_write_reg(adap, A_SG_KDOORBELL,
F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
m_free(m);
return (0);
}
/**
* restart_ctrlq - restart a suspended control queue
* @qs: the queue set containing the control queue
*
* Resumes transmission on a suspended Tx control queue.
*/
static void
restart_ctrlq(void *data, int npending)
{
struct mbuf *m;
struct sge_qset *qs = (struct sge_qset *)data;
struct sge_txq *q = &qs->txq[TXQ_CTRL];
adapter_t *adap = qs->port->adapter;
TXQ_LOCK(qs);
again: reclaim_completed_tx_imm(q);
while (q->in_use < q->size &&
(m = mbufq_dequeue(&q->sendq)) != NULL) {
write_imm(&q->desc[q->pidx], m->m_data, m->m_len, q->gen);
m_free(m);
if (++q->pidx >= q->size) {
q->pidx = 0;
q->gen ^= 1;
}
q->in_use++;
}
if (mbufq_len(&q->sendq)) {
setbit(&qs->txq_stopped, TXQ_CTRL);
if (should_restart_tx(q) &&
test_and_clear_bit(TXQ_CTRL, &qs->txq_stopped))
goto again;
q->stops++;
}
TXQ_UNLOCK(qs);
t3_write_reg(adap, A_SG_KDOORBELL,
F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
}
/*
* Send a management message through control queue 0
*/
int
t3_mgmt_tx(struct adapter *adap, struct mbuf *m)
{
return ctrl_xmit(adap, &adap->sge.qs[0], m);
}
/**
* free_qset - free the resources of an SGE queue set
* @sc: the controller owning the queue set
* @q: the queue set
*
* Release the HW and SW resources associated with an SGE queue set, such
* as HW contexts, packet buffers, and descriptor rings. Traffic to the
* queue set must be quiesced prior to calling this.
*/
static void
t3_free_qset(adapter_t *sc, struct sge_qset *q)
{
int i;
reclaim_completed_tx(q, 0, TXQ_ETH);
if (q->txq[TXQ_ETH].txq_mr != NULL)
buf_ring_free(q->txq[TXQ_ETH].txq_mr, M_DEVBUF);
if (q->txq[TXQ_ETH].txq_ifq != NULL) {
ifq_delete(q->txq[TXQ_ETH].txq_ifq);
free(q->txq[TXQ_ETH].txq_ifq, M_DEVBUF);
}
for (i = 0; i < SGE_RXQ_PER_SET; ++i) {
if (q->fl[i].desc) {
mtx_lock_spin(&sc->sge.reg_lock);
t3_sge_disable_fl(sc, q->fl[i].cntxt_id);
mtx_unlock_spin(&sc->sge.reg_lock);
bus_dmamap_unload(q->fl[i].desc_tag, q->fl[i].desc_map);
bus_dmamem_free(q->fl[i].desc_tag, q->fl[i].desc,
q->fl[i].desc_map);
bus_dma_tag_destroy(q->fl[i].desc_tag);
bus_dma_tag_destroy(q->fl[i].entry_tag);
}
if (q->fl[i].sdesc) {
free_rx_bufs(sc, &q->fl[i]);
free(q->fl[i].sdesc, M_DEVBUF);
}
}
mtx_unlock(&q->lock);
MTX_DESTROY(&q->lock);
for (i = 0; i < SGE_TXQ_PER_SET; i++) {
if (q->txq[i].desc) {
mtx_lock_spin(&sc->sge.reg_lock);
t3_sge_enable_ecntxt(sc, q->txq[i].cntxt_id, 0);
mtx_unlock_spin(&sc->sge.reg_lock);
bus_dmamap_unload(q->txq[i].desc_tag,
q->txq[i].desc_map);
bus_dmamem_free(q->txq[i].desc_tag, q->txq[i].desc,
q->txq[i].desc_map);
bus_dma_tag_destroy(q->txq[i].desc_tag);
bus_dma_tag_destroy(q->txq[i].entry_tag);
}
if (q->txq[i].sdesc) {
free(q->txq[i].sdesc, M_DEVBUF);
}
}
if (q->rspq.desc) {
mtx_lock_spin(&sc->sge.reg_lock);
t3_sge_disable_rspcntxt(sc, q->rspq.cntxt_id);
mtx_unlock_spin(&sc->sge.reg_lock);
bus_dmamap_unload(q->rspq.desc_tag, q->rspq.desc_map);
bus_dmamem_free(q->rspq.desc_tag, q->rspq.desc,
q->rspq.desc_map);
bus_dma_tag_destroy(q->rspq.desc_tag);
MTX_DESTROY(&q->rspq.lock);
}
#if defined(INET6) || defined(INET)
tcp_lro_free(&q->lro.ctrl);
#endif
bzero(q, sizeof(*q));
}
/**
* t3_free_sge_resources - free SGE resources
* @sc: the adapter softc
*
* Frees resources used by the SGE queue sets.
*/
void
t3_free_sge_resources(adapter_t *sc, int nqsets)
{
int i;
for (i = 0; i < nqsets; ++i) {
TXQ_LOCK(&sc->sge.qs[i]);
t3_free_qset(sc, &sc->sge.qs[i]);
}
}
/**
* t3_sge_start - enable SGE
* @sc: the controller softc
*
* Enables the SGE for DMAs. This is the last step in starting packet
* transfers.
*/
void
t3_sge_start(adapter_t *sc)
{
t3_set_reg_field(sc, A_SG_CONTROL, F_GLOBALENABLE, F_GLOBALENABLE);
}
/**
* t3_sge_stop - disable SGE operation
* @sc: the adapter
*
* Disables the DMA engine. This can be called in emergencies (e.g.,
* from error interrupts) or from normal process context. In the latter
* case it also disables any pending queue restart tasklets. Note that
* if it is called in interrupt context it cannot disable the restart
* tasklets as it cannot wait, however the tasklets will have no effect
* since the doorbells are disabled and the driver will call this again
* later from process context, at which time the tasklets will be stopped
* if they are still running.
*/
void
t3_sge_stop(adapter_t *sc)
{
int i, nqsets;
t3_set_reg_field(sc, A_SG_CONTROL, F_GLOBALENABLE, 0);
if (sc->tq == NULL)
return;
for (nqsets = i = 0; i < (sc)->params.nports; i++)
nqsets += sc->port[i].nqsets;
#ifdef notyet
/*
*
* XXX
*/
for (i = 0; i < nqsets; ++i) {
struct sge_qset *qs = &sc->sge.qs[i];
taskqueue_drain(sc->tq, &qs->txq[TXQ_OFLD].qresume_task);
taskqueue_drain(sc->tq, &qs->txq[TXQ_CTRL].qresume_task);
}
#endif
}
/**
* t3_free_tx_desc - reclaims Tx descriptors and their buffers
* @qs: the queue set owning the Tx queue
* @reclaimable: the number of descriptors to reclaim
* @queue: index of the Tx queue within the queue set (e.g. TXQ_ETH)
*
* Reclaims Tx descriptors from an SGE Tx queue and frees the associated
* Tx buffers. Called with the queue set lock held.
*/
void
t3_free_tx_desc(struct sge_qset *qs, int reclaimable, int queue)
{
struct tx_sw_desc *txsd;
unsigned int cidx, mask;
struct sge_txq *q = &qs->txq[queue];
#ifdef T3_TRACE
T3_TRACE2(sc->tb[q->cntxt_id & 7],
"reclaiming %u Tx descriptors at cidx %u", reclaimable, cidx);
#endif
cidx = q->cidx;
mask = q->size - 1;
txsd = &q->sdesc[cidx];
mtx_assert(&qs->lock, MA_OWNED);
while (reclaimable--) {
prefetch(q->sdesc[(cidx + 1) & mask].m);
prefetch(q->sdesc[(cidx + 2) & mask].m);
if (txsd->m != NULL) {
if (txsd->flags & TX_SW_DESC_MAPPED) {
bus_dmamap_unload(q->entry_tag, txsd->map);
txsd->flags &= ~TX_SW_DESC_MAPPED;
}
m_freem_list(txsd->m);
txsd->m = NULL;
} else
q->txq_skipped++;
++txsd;
if (++cidx == q->size) {
cidx = 0;
txsd = q->sdesc;
}
}
q->cidx = cidx;
}
/**
* is_new_response - check if a response is newly written
* @r: the response descriptor
* @q: the response queue
*
* Returns true if a response descriptor contains a yet unprocessed
* response.
*/
static __inline int
is_new_response(const struct rsp_desc *r,
const struct sge_rspq *q)
{
return (r->intr_gen & F_RSPD_GEN2) == q->gen;
}
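The generation-bit test in is_new_response() pairs with the wrap handling that the response loop applies to `rspq->cidx` and `rspq->gen`. The sketch below models that consumer side under simplifying assumptions: the `entry` and `consumer` types and `RING_SIZE` are hypothetical, and the generation flag is stored as a plain integer rather than a masked descriptor field.

```c
#include <assert.h>

#define RING_SIZE 4	/* hypothetical ring size for illustration */

struct entry {
	unsigned int gen;	/* generation written by the producer */
	int payload;
};

struct consumer {
	unsigned int cidx;	/* next entry to examine */
	unsigned int gen;	/* generation expected for new entries */
};

/* An entry is new when its generation matches the expected one. */
static int
entry_is_new(const struct entry *e, const struct consumer *c)
{
	return (e->gen == c->gen);
}

/* Advance the consumer, flipping the expected generation on wrap. */
static void
consumer_advance(struct consumer *c)
{
	if (++c->cidx == RING_SIZE) {
		c->cidx = 0;
		c->gen ^= 1;
	}
}
```

Because the expected generation flips on every wrap, entries left over from the previous pass no longer match and are not mistaken for new ones.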
#define RSPD_GTS_MASK (F_RSPD_TXQ0_GTS | F_RSPD_TXQ1_GTS)
#define RSPD_CTRL_MASK (RSPD_GTS_MASK | \
V_RSPD_TXQ0_CR(M_RSPD_TXQ0_CR) | \
V_RSPD_TXQ1_CR(M_RSPD_TXQ1_CR) | \
V_RSPD_TXQ2_CR(M_RSPD_TXQ2_CR))
/* How long to delay the next interrupt in case of memory shortage, in 0.1us. */
#define NOMEM_INTR_DELAY 2500
#ifdef TCP_OFFLOAD
/**
* write_ofld_wr - write an offload work request
* @adap: the adapter
* @m: the packet to send
* @q: the Tx queue
* @pidx: index of the first Tx descriptor to write
* @gen: the generation value to use
* @ndesc: number of descriptors the packet will occupy
*
* Write an offload work request to send the supplied packet. The packet
* data already carry the work request with most fields populated.
*/
static void
write_ofld_wr(adapter_t *adap, struct mbuf *m, struct sge_txq *q,
unsigned int pidx, unsigned int gen, unsigned int ndesc)
{
unsigned int sgl_flits, flits;
int i, idx, nsegs, wrlen;
struct work_request_hdr *from;
struct sg_ent *sgp, t3sgl[TX_MAX_SEGS / 2 + 1];
struct tx_desc *d = &q->desc[pidx];
struct txq_state txqs;
struct sglist_seg *segs;
struct ofld_hdr *oh = mtod(m, struct ofld_hdr *);
struct sglist *sgl;
from = (void *)(oh + 1); /* Start of WR within mbuf */
wrlen = m->m_len - sizeof(*oh);
if (!(oh->flags & F_HDR_SGL)) {
write_imm(d, (caddr_t)from, wrlen, gen);
/*
* mbuf with "real" immediate tx data will be enqueue_wr'd by
* t3_push_frames and freed in wr_ack. Others, like those sent
* down by close_conn, t3_send_reset, etc. should be freed here.
*/
if (!(oh->flags & F_HDR_DF))
m_free(m);
return;
}
memcpy(&d->flit[1], &from[1], wrlen - sizeof(*from));
sgl = oh->sgl;
flits = wrlen / 8;
sgp = (ndesc == 1) ? (struct sg_ent *)&d->flit[flits] : t3sgl;
nsegs = sgl->sg_nseg;
segs = sgl->sg_segs;
for (idx = 0, i = 0; i < nsegs; i++) {
KASSERT(segs[i].ss_len, ("%s: 0 len in sgl", __func__));
if (i && idx == 0)
++sgp;
sgp->len[idx] = htobe32(segs[i].ss_len);
sgp->addr[idx] = htobe64(segs[i].ss_paddr);
idx ^= 1;
}
if (idx) {
sgp->len[idx] = 0;
sgp->addr[idx] = 0;
}
sgl_flits = sgl_len(nsegs);
txqs.gen = gen;
txqs.pidx = pidx;
txqs.compl = 0;
write_wr_hdr_sgl(ndesc, d, &txqs, q, t3sgl, flits, sgl_flits,
from->wrh_hi, from->wrh_lo);
}
/**
* ofld_xmit - send a packet through an offload queue
* @adap: the adapter
* @q: the Tx offload queue
* @m: the packet
*
* Send an offload packet through an SGE offload queue.
*/
static int
ofld_xmit(adapter_t *adap, struct sge_qset *qs, struct mbuf *m)
{
int ret;
unsigned int ndesc;
unsigned int pidx, gen;
struct sge_txq *q = &qs->txq[TXQ_OFLD];
struct ofld_hdr *oh = mtod(m, struct ofld_hdr *);
ndesc = G_HDR_NDESC(oh->flags);
TXQ_LOCK(qs);
again: reclaim_completed_tx(qs, 16, TXQ_OFLD);
ret = check_desc_avail(adap, q, m, ndesc, TXQ_OFLD);
if (__predict_false(ret)) {
if (ret == 1) {
TXQ_UNLOCK(qs);
return (EINTR);
}
goto again;
}
gen = q->gen;
q->in_use += ndesc;
pidx = q->pidx;
q->pidx += ndesc;
if (q->pidx >= q->size) {
q->pidx -= q->size;
q->gen ^= 1;
}
write_ofld_wr(adap, m, q, pidx, gen, ndesc);
check_ring_tx_db(adap, q, 1);
TXQ_UNLOCK(qs);
return (0);
}
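The producer-index bookkeeping in ofld_xmit() (reserve `ndesc` descriptors, wrap `pidx`, and flip the generation bit on wrap) can be sketched on its own. The `ring` structure and `ring_reserve` name below are hypothetical; the caller is assumed to have already checked that enough descriptors are free, as check_desc_avail() does in the driver.

```c
#include <assert.h>

struct ring {
	unsigned int pidx;	/* next descriptor to be written */
	unsigned int gen;	/* current generation bit */
	unsigned int size;	/* number of descriptors in the ring */
	unsigned int in_use;	/* descriptors currently outstanding */
};

/*
 * Reserve ndesc consecutive descriptors and return the index of the
 * first one. The generation bit flips whenever pidx wraps.
 */
static unsigned int
ring_reserve(struct ring *r, unsigned int ndesc)
{
	unsigned int start = r->pidx;

	assert(ndesc <= r->size - r->in_use);	/* caller checked space */
	r->in_use += ndesc;
	r->pidx += ndesc;
	if (r->pidx >= r->size) {
		r->pidx -= r->size;
		r->gen ^= 1;
	}
	return (start);
}
```

Note that the driver captures `gen` before updating the ring, so a request that straddles the wrap is started with the old generation value.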
/**
* restart_offloadq - restart a suspended offload queue
* @qs: the queue set containing the offload queue
*
* Resumes transmission on a suspended Tx offload queue.
*/
static void
restart_offloadq(void *data, int npending)
{
struct mbuf *m;
struct sge_qset *qs = data;
struct sge_txq *q = &qs->txq[TXQ_OFLD];
adapter_t *adap = qs->port->adapter;
int cleaned;
TXQ_LOCK(qs);
again: cleaned = reclaim_completed_tx(qs, 16, TXQ_OFLD);
while ((m = mbufq_first(&q->sendq)) != NULL) {
unsigned int gen, pidx;
struct ofld_hdr *oh = mtod(m, struct ofld_hdr *);
unsigned int ndesc = G_HDR_NDESC(oh->flags);
if (__predict_false(q->size - q->in_use < ndesc)) {
setbit(&qs->txq_stopped, TXQ_OFLD);
if (should_restart_tx(q) &&
test_and_clear_bit(TXQ_OFLD, &qs->txq_stopped))
goto again;
q->stops++;
break;
}
gen = q->gen;
q->in_use += ndesc;
pidx = q->pidx;
q->pidx += ndesc;
if (q->pidx >= q->size) {
q->pidx -= q->size;
q->gen ^= 1;
}
(void)mbufq_dequeue(&q->sendq);
TXQ_UNLOCK(qs);
write_ofld_wr(adap, m, q, pidx, gen, ndesc);
TXQ_LOCK(qs);
}
#if USE_GTS
set_bit(TXQ_RUNNING, &q->flags);
set_bit(TXQ_LAST_PKT_DB, &q->flags);
#endif
TXQ_UNLOCK(qs);
wmb();
t3_write_reg(adap, A_SG_KDOORBELL,
F_SELEGRCNTX | V_EGRCNTX(q->cntxt_id));
}
/**
* t3_offload_tx - send an offload packet
* @m: the packet
*
* Sends an offload packet. We use the packet priority to select the
* appropriate Tx queue as follows: bit 0 indicates whether the packet
* should be sent as regular or control, bits 1-3 select the queue set.
*/
int
t3_offload_tx(struct adapter *sc, struct mbuf *m)
{
struct ofld_hdr *oh = mtod(m, struct ofld_hdr *);
struct sge_qset *qs = &sc->sge.qs[G_HDR_QSET(oh->flags)];
if (oh->flags & F_HDR_CTRL) {
m_adj(m, sizeof (*oh)); /* trim ofld_hdr off */
return (ctrl_xmit(sc, qs, m));
} else
return (ofld_xmit(sc, qs, m));
}
#endif
static void
restart_tx(struct sge_qset *qs)
{
struct adapter *sc = qs->port->adapter;
if (isset(&qs->txq_stopped, TXQ_OFLD) &&
should_restart_tx(&qs->txq[TXQ_OFLD]) &&
test_and_clear_bit(TXQ_OFLD, &qs->txq_stopped)) {
qs->txq[TXQ_OFLD].restarts++;
taskqueue_enqueue(sc->tq, &qs->txq[TXQ_OFLD].qresume_task);
}
if (isset(&qs->txq_stopped, TXQ_CTRL) &&
should_restart_tx(&qs->txq[TXQ_CTRL]) &&
test_and_clear_bit(TXQ_CTRL, &qs->txq_stopped)) {
qs->txq[TXQ_CTRL].restarts++;
taskqueue_enqueue(sc->tq, &qs->txq[TXQ_CTRL].qresume_task);
}
}
/**
* t3_sge_alloc_qset - initialize an SGE queue set
* @sc: the controller softc
* @id: the queue set id
* @nports: how many Ethernet ports will be using this queue set
* @irq_vec_idx: the IRQ vector index for response queue interrupts
* @p: configuration parameters for this queue set
* @ntxq: number of Tx queues for the queue set
* @pi: port info for queue set
*
* Allocate resources and initialize an SGE queue set. A queue set
* comprises a response queue, two Rx free-buffer queues, and up to 3
* Tx queues. The Tx queues are assigned roles in the order Ethernet
* queue, offload queue, and control queue.
*/
int
t3_sge_alloc_qset(adapter_t *sc, u_int id, int nports, int irq_vec_idx,
const struct qset_params *p, int ntxq, struct port_info *pi)
{
struct sge_qset *q = &sc->sge.qs[id];
int i, ret = 0;
MTX_INIT(&q->lock, q->namebuf, NULL, MTX_DEF);
q->port = pi;
q->adap = sc;
if ((q->txq[TXQ_ETH].txq_mr = buf_ring_alloc(cxgb_txq_buf_ring_size,
M_DEVBUF, M_WAITOK, &q->lock)) == NULL) {
device_printf(sc->dev, "failed to allocate mbuf ring\n");
goto err;
}
if ((q->txq[TXQ_ETH].txq_ifq = malloc(sizeof(struct ifaltq), M_DEVBUF,
M_NOWAIT | M_ZERO)) == NULL) {
device_printf(sc->dev, "failed to allocate ifq\n");
goto err;
}
ifq_init(q->txq[TXQ_ETH].txq_ifq, pi->ifp);
callout_init(&q->txq[TXQ_ETH].txq_timer, 1);
callout_init(&q->txq[TXQ_ETH].txq_watchdog, 1);
q->txq[TXQ_ETH].txq_timer.c_cpu = id % mp_ncpus;
q->txq[TXQ_ETH].txq_watchdog.c_cpu = id % mp_ncpus;
init_qset_cntxt(q, id);
q->idx = id;
if ((ret = alloc_ring(sc, p->fl_size, sizeof(struct rx_desc),
sizeof(struct rx_sw_desc), &q->fl[0].phys_addr,
&q->fl[0].desc, &q->fl[0].sdesc,
&q->fl[0].desc_tag, &q->fl[0].desc_map,
sc->rx_dmat, &q->fl[0].entry_tag)) != 0) {
printf("error %d from alloc ring fl0\n", ret);
goto err;
}
if ((ret = alloc_ring(sc, p->jumbo_size, sizeof(struct rx_desc),
sizeof(struct rx_sw_desc), &q->fl[1].phys_addr,
&q->fl[1].desc, &q->fl[1].sdesc,
&q->fl[1].desc_tag, &q->fl[1].desc_map,
sc->rx_jumbo_dmat, &q->fl[1].entry_tag)) != 0) {
printf("error %d from alloc ring fl1\n", ret);
goto err;
}
if ((ret = alloc_ring(sc, p->rspq_size, sizeof(struct rsp_desc), 0,
&q->rspq.phys_addr, &q->rspq.desc, NULL,
&q->rspq.desc_tag, &q->rspq.desc_map,
NULL, NULL)) != 0) {
printf("error %d from alloc ring rspq\n", ret);
goto err;
}
snprintf(q->rspq.lockbuf, RSPQ_NAME_LEN, "t3 rspq lock %d:%d",
device_get_unit(sc->dev), irq_vec_idx);
MTX_INIT(&q->rspq.lock, q->rspq.lockbuf, NULL, MTX_DEF);
for (i = 0; i < ntxq; ++i) {
size_t sz = i == TXQ_CTRL ? 0 : sizeof(struct tx_sw_desc);
if ((ret = alloc_ring(sc, p->txq_size[i],
sizeof(struct tx_desc), sz,
&q->txq[i].phys_addr, &q->txq[i].desc,
&q->txq[i].sdesc, &q->txq[i].desc_tag,
&q->txq[i].desc_map,
sc->tx_dmat, &q->txq[i].entry_tag)) != 0) {
printf("error %d from alloc ring tx %i\n", ret, i);
goto err;
}
mbufq_init(&q->txq[i].sendq, INT_MAX);
q->txq[i].gen = 1;
q->txq[i].size = p->txq_size[i];
}
#ifdef TCP_OFFLOAD
TASK_INIT(&q->txq[TXQ_OFLD].qresume_task, 0, restart_offloadq, q);
#endif
TASK_INIT(&q->txq[TXQ_CTRL].qresume_task, 0, restart_ctrlq, q);
TASK_INIT(&q->txq[TXQ_ETH].qreclaim_task, 0, sge_txq_reclaim_handler, q);
TASK_INIT(&q->txq[TXQ_OFLD].qreclaim_task, 0, sge_txq_reclaim_handler, q);
q->fl[0].gen = q->fl[1].gen = 1;
q->fl[0].size = p->fl_size;
q->fl[1].size = p->jumbo_size;
q->rspq.gen = 1;
q->rspq.cidx = 0;
q->rspq.size = p->rspq_size;
q->txq[TXQ_ETH].stop_thres = nports *
flits_to_desc(sgl_len(TX_MAX_SEGS + 1) + 3);
q->fl[0].buf_size = MCLBYTES;
q->fl[0].zone = zone_pack;
q->fl[0].type = EXT_PACKET;
if (p->jumbo_buf_size == MJUM16BYTES) {
q->fl[1].zone = zone_jumbo16;
q->fl[1].type = EXT_JUMBO16;
} else if (p->jumbo_buf_size == MJUM9BYTES) {
q->fl[1].zone = zone_jumbo9;
q->fl[1].type = EXT_JUMBO9;
} else if (p->jumbo_buf_size == MJUMPAGESIZE) {
q->fl[1].zone = zone_jumbop;
q->fl[1].type = EXT_JUMBOP;
} else {
KASSERT(0, ("can't deal with jumbo_buf_size %d.", p->jumbo_buf_size));
ret = EDOOFUS;
goto err;
}
q->fl[1].buf_size = p->jumbo_buf_size;
/* Allocate and setup the lro_ctrl structure */
q->lro.enabled = !!(pi->ifp->if_capenable & IFCAP_LRO);
#if defined(INET6) || defined(INET)
ret = tcp_lro_init(&q->lro.ctrl);
if (ret) {
printf("error %d from tcp_lro_init\n", ret);
goto err;
}
#endif
q->lro.ctrl.ifp = pi->ifp;
mtx_lock_spin(&sc->sge.reg_lock);
ret = -t3_sge_init_rspcntxt(sc, q->rspq.cntxt_id, irq_vec_idx,
q->rspq.phys_addr, q->rspq.size,
q->fl[0].buf_size, 1, 0);
if (ret) {
printf("error %d from t3_sge_init_rspcntxt\n", ret);
goto err_unlock;
}
for (i = 0; i < SGE_RXQ_PER_SET; ++i) {
ret = -t3_sge_init_flcntxt(sc, q->fl[i].cntxt_id, 0,
q->fl[i].phys_addr, q->fl[i].size,
q->fl[i].buf_size, p->cong_thres, 1,
0);
if (ret) {
printf("error %d from t3_sge_init_flcntxt for index i=%d\n", ret, i);
goto err_unlock;
}
}
ret = -t3_sge_init_ecntxt(sc, q->txq[TXQ_ETH].cntxt_id, USE_GTS,
SGE_CNTXT_ETH, id, q->txq[TXQ_ETH].phys_addr,
q->txq[TXQ_ETH].size, q->txq[TXQ_ETH].token,
1, 0);
if (ret) {
printf("error %d from t3_sge_init_ecntxt\n", ret);
goto err_unlock;
}
if (ntxq > 1) {
ret = -t3_sge_init_ecntxt(sc, q->txq[TXQ_OFLD].cntxt_id,
USE_GTS, SGE_CNTXT_OFLD, id,
q->txq[TXQ_OFLD].phys_addr,
q->txq[TXQ_OFLD].size, 0, 1, 0);
if (ret) {
printf("error %d from t3_sge_init_ecntxt\n", ret);
goto err_unlock;
}
}
if (ntxq > 2) {
ret = -t3_sge_init_ecntxt(sc, q->txq[TXQ_CTRL].cntxt_id, 0,
SGE_CNTXT_CTRL, id,
q->txq[TXQ_CTRL].phys_addr,
q->txq[TXQ_CTRL].size,
q->txq[TXQ_CTRL].token, 1, 0);
if (ret) {
printf("error %d from t3_sge_init_ecntxt\n", ret);
goto err_unlock;
}
}
mtx_unlock_spin(&sc->sge.reg_lock);
t3_update_qset_coalesce(q, p);
refill_fl(sc, &q->fl[0], q->fl[0].size);
refill_fl(sc, &q->fl[1], q->fl[1].size);
refill_rspq(sc, &q->rspq, q->rspq.size - 1);
t3_write_reg(sc, A_SG_GTS, V_RSPQ(q->rspq.cntxt_id) |
V_NEWTIMER(q->rspq.holdoff_tmr));
return (0);
err_unlock:
mtx_unlock_spin(&sc->sge.reg_lock);
err:
TXQ_LOCK(q);
t3_free_qset(sc, q);
return (ret);
}
/*
* Remove CPL_RX_PKT headers from the mbuf and reduce it to a regular mbuf with
* ethernet data. Hardware assistance with various checksums and any vlan tag
* will also be taken into account here.
*/
void
t3_rx_eth(struct adapter *adap, struct mbuf *m, int ethpad)
{
struct cpl_rx_pkt *cpl = (struct cpl_rx_pkt *)(mtod(m, uint8_t *) + ethpad);
struct port_info *pi = &adap->port[adap->rxpkt_map[cpl->iff]];
struct ifnet *ifp = pi->ifp;
if (cpl->vlan_valid) {
m->m_pkthdr.ether_vtag = ntohs(cpl->vlan);
m->m_flags |= M_VLANTAG;
}
m->m_pkthdr.rcvif = ifp;
/*
* adjust after conversion to mbuf chain
*/
m->m_pkthdr.len -= (sizeof(*cpl) + ethpad);
m->m_len -= (sizeof(*cpl) + ethpad);
m->m_data += (sizeof(*cpl) + ethpad);
if (!cpl->fragment && cpl->csum_valid && cpl->csum == 0xffff) {
struct ether_header *eh = mtod(m, void *);
uint16_t eh_type;
if (eh->ether_type == htons(ETHERTYPE_VLAN)) {
struct ether_vlan_header *evh = mtod(m, void *);
eh_type = evh->evl_proto;
} else
eh_type = eh->ether_type;
if (ifp->if_capenable & IFCAP_RXCSUM &&
eh_type == htons(ETHERTYPE_IP)) {
m->m_pkthdr.csum_flags = (CSUM_IP_CHECKED |
CSUM_IP_VALID | CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
m->m_pkthdr.csum_data = 0xffff;
} else if (ifp->if_capenable & IFCAP_RXCSUM_IPV6 &&
eh_type == htons(ETHERTYPE_IPV6)) {
m->m_pkthdr.csum_flags = (CSUM_DATA_VALID_IPV6 |
CSUM_PSEUDO_HDR);
m->m_pkthdr.csum_data = 0xffff;
}
}
}
/**
* get_packet - return the next ingress packet buffer from a free list
* @adap: the adapter that received the packet
* @drop_thres: # of remaining buffers before we start dropping packets
* @qs: the qset that the SGE free list holding the packet belongs to
* @mh: the mbuf header, contains a pointer to the head and tail of the mbuf chain
* @r: response descriptor
*
* Get the next packet from a free list and complete setup of the
* mbuf. If the packet is small we make a copy and recycle the
* original buffer, otherwise we use the original buffer itself. If a
* positive drop threshold is supplied packets are dropped and their
* buffers recycled if (a) the number of remaining buffers is under the
* threshold and the packet is too big to copy, or (b) the packet should
* be copied but there is no memory for the copy.
*/
static int
get_packet(adapter_t *adap, unsigned int drop_thres, struct sge_qset *qs,
struct t3_mbuf_hdr *mh, struct rsp_desc *r)
{
unsigned int len_cq = ntohl(r->len_cq);
struct sge_fl *fl = (len_cq & F_RSPD_FLQ) ? &qs->fl[1] : &qs->fl[0];
int mask, cidx = fl->cidx;
struct rx_sw_desc *sd = &fl->sdesc[cidx];
uint32_t len = G_RSPD_LEN(len_cq);
uint32_t flags = M_EXT;
uint8_t sopeop = G_RSPD_SOP_EOP(ntohl(r->flags));
caddr_t cl;
struct mbuf *m;
int ret = 0;
mask = fl->size - 1;
prefetch(fl->sdesc[(cidx + 1) & mask].m);
prefetch(fl->sdesc[(cidx + 2) & mask].m);
prefetch(fl->sdesc[(cidx + 1) & mask].rxsd_cl);
prefetch(fl->sdesc[(cidx + 2) & mask].rxsd_cl);
fl->credits--;
bus_dmamap_sync(fl->entry_tag, sd->map, BUS_DMASYNC_POSTREAD);
if (recycle_enable && len <= SGE_RX_COPY_THRES &&
sopeop == RSPQ_SOP_EOP) {
if ((m = m_gethdr(M_NOWAIT, MT_DATA)) == NULL)
goto skip_recycle;
cl = mtod(m, void *);
memcpy(cl, sd->rxsd_cl, len);
recycle_rx_buf(adap, fl, fl->cidx);
m->m_pkthdr.len = m->m_len = len;
m->m_flags = 0;
mh->mh_head = mh->mh_tail = m;
ret = 1;
goto done;
} else {
skip_recycle:
bus_dmamap_unload(fl->entry_tag, sd->map);
cl = sd->rxsd_cl;
m = sd->m;
if ((sopeop == RSPQ_SOP_EOP) ||
(sopeop == RSPQ_SOP))
flags |= M_PKTHDR;
m_init(m, M_NOWAIT, MT_DATA, flags);
if (fl->zone == zone_pack) {
/*
* restore clobbered data pointer
*/
m->m_data = m->m_ext.ext_buf;
} else {
m_cljset(m, cl, fl->type);
}
m->m_len = len;
}
switch(sopeop) {
case RSPQ_SOP_EOP:
ret = 1;
/* FALLTHROUGH */
case RSPQ_SOP:
mh->mh_head = mh->mh_tail = m;
m->m_pkthdr.len = len;
break;
case RSPQ_EOP:
ret = 1;
/* FALLTHROUGH */
case RSPQ_NSOP_NEOP:
if (mh->mh_tail == NULL) {
log(LOG_ERR, "discarding intermediate descriptor entry\n");
m_freem(m);
break;
}
mh->mh_tail->m_next = m;
mh->mh_tail = m;
mh->mh_head->m_pkthdr.len += len;
break;
}
if (cxgb_debug)
printf("len=%d pktlen=%d\n", m->m_len, m->m_pkthdr.len);
done:
if (++fl->cidx == fl->size)
fl->cidx = 0;
return (ret);
}
/**
* handle_rsp_cntrl_info - handles control information in a response
* @qs: the queue set corresponding to the response
* @flags: the response control flags
*
* Handles the control information of an SGE response, such as GTS
* indications and completion credits for the queue set's Tx queues.
* HW coalesces credits, we don't do any extra SW coalescing.
*/
static __inline void
handle_rsp_cntrl_info(struct sge_qset *qs, uint32_t flags)
{
unsigned int credits;
#if USE_GTS
if (flags & F_RSPD_TXQ0_GTS)
clear_bit(TXQ_RUNNING, &qs->txq[TXQ_ETH].flags);
#endif
credits = G_RSPD_TXQ0_CR(flags);
if (credits)
qs->txq[TXQ_ETH].processed += credits;
credits = G_RSPD_TXQ2_CR(flags);
if (credits)
qs->txq[TXQ_CTRL].processed += credits;
#if USE_GTS
if (flags & F_RSPD_TXQ1_GTS)
clear_bit(TXQ_RUNNING, &qs->txq[TXQ_OFLD].flags);
#endif
credits = G_RSPD_TXQ1_CR(flags);
if (credits)
qs->txq[TXQ_OFLD].processed += credits;
}
static void
check_ring_db(adapter_t *adap, struct sge_qset *qs,
unsigned int sleeping)
{
;
}
/**
* process_responses - process responses from an SGE response queue
* @adap: the adapter
* @qs: the queue set to which the response queue belongs
* @budget: how many responses can be processed in this round
*
* Process responses from an SGE response queue up to the supplied budget.
* Responses include received packets as well as credits and other events
* for the queues that belong to the response queue's queue set.
* A negative budget is effectively unlimited.
*
* Additionally choose the interrupt holdoff time for the next interrupt
* on this queue. If the system is under memory shortage use a fairly
* long delay to help recovery.
*/
static int
process_responses(adapter_t *adap, struct sge_qset *qs, int budget)
{
struct sge_rspq *rspq = &qs->rspq;
struct rsp_desc *r = &rspq->desc[rspq->cidx];
int budget_left = budget;
unsigned int sleeping = 0;
#if defined(INET6) || defined(INET)
int lro_enabled = qs->lro.enabled;
int skip_lro;
struct lro_ctrl *lro_ctrl = &qs->lro.ctrl;
#endif
struct t3_mbuf_hdr *mh = &rspq->rspq_mh;
#ifdef DEBUG
static int last_holdoff = 0;
if (cxgb_debug && rspq->holdoff_tmr != last_holdoff) {
printf("next_holdoff=%d\n", rspq->holdoff_tmr);
last_holdoff = rspq->holdoff_tmr;
}
#endif
rspq->next_holdoff = rspq->holdoff_tmr;
while (__predict_true(budget_left && is_new_response(r, rspq))) {
int eth, eop = 0, ethpad = 0;
uint32_t flags = ntohl(r->flags);
uint32_t rss_hash = be32toh(r->rss_hdr.rss_hash_val);
uint8_t opcode = r->rss_hdr.opcode;
eth = (opcode == CPL_RX_PKT);
if (__predict_false(flags & F_RSPD_ASYNC_NOTIF)) {
struct mbuf *m;
if (cxgb_debug)
printf("async notification\n");
if (mh->mh_head == NULL) {
mh->mh_head = m_gethdr(M_NOWAIT, MT_DATA);
m = mh->mh_head;
} else {
m = m_gethdr(M_NOWAIT, MT_DATA);
}
if (m == NULL)
goto no_mem;
memcpy(mtod(m, char *), r, AN_PKT_SIZE);
m->m_len = m->m_pkthdr.len = AN_PKT_SIZE;
*mtod(m, char *) = CPL_ASYNC_NOTIF;
opcode = CPL_ASYNC_NOTIF;
eop = 1;
rspq->async_notif++;
goto skip;
} else if (flags & F_RSPD_IMM_DATA_VALID) {
struct mbuf *m = m_gethdr(M_NOWAIT, MT_DATA);
if (m == NULL) {
no_mem:
rspq->next_holdoff = NOMEM_INTR_DELAY;
budget_left--;
break;
}
if (mh->mh_head == NULL)
mh->mh_head = m;
else
mh->mh_tail->m_next = m;
mh->mh_tail = m;
get_imm_packet(adap, r, m);
mh->mh_head->m_pkthdr.len += m->m_len;
eop = 1;
rspq->imm_data++;
} else if (r->len_cq) {
int drop_thresh = eth ? SGE_RX_DROP_THRES : 0;
eop = get_packet(adap, drop_thresh, qs, mh, r);
if (eop) {
if (r->rss_hdr.hash_type && !adap->timestamp) {
- M_HASHTYPE_SET(mh->mh_head, M_HASHTYPE_OPAQUE);
+ M_HASHTYPE_SET(mh->mh_head,
+ M_HASHTYPE_OPAQUE_HASH);
mh->mh_head->m_pkthdr.flowid = rss_hash;
}
}
ethpad = 2;
} else {
rspq->pure_rsps++;
}
skip:
if (flags & RSPD_CTRL_MASK) {
sleeping |= flags & RSPD_GTS_MASK;
handle_rsp_cntrl_info(qs, flags);
}
if (!eth && eop) {
rspq->offload_pkts++;
#ifdef TCP_OFFLOAD
adap->cpl_handler[opcode](qs, r, mh->mh_head);
#else
m_freem(mh->mh_head);
#endif
mh->mh_head = NULL;
} else if (eth && eop) {
struct mbuf *m = mh->mh_head;
t3_rx_eth(adap, m, ethpad);
/*
* The T304 sends incoming packets on any qset. If LRO
* is also enabled, we could end up sending a packet up
* lro_ctrl->ifp's input. That is incorrect.
*
* The mbuf's rcvif was derived from the cpl header and
* is accurate. Skip LRO and just use that.
*/
#if defined(INET6) || defined(INET)
skip_lro = __predict_false(qs->port->ifp != m->m_pkthdr.rcvif);
if (lro_enabled && lro_ctrl->lro_cnt && !skip_lro
&& (tcp_lro_rx(lro_ctrl, m, 0) == 0)
) {
/* successfully queued for LRO */
} else
#endif
{
/*
* LRO not enabled, packet unsuitable for LRO,
* or unable to queue. Pass it up right now in
* any case.
*/
struct ifnet *ifp = m->m_pkthdr.rcvif;
(*ifp->if_input)(ifp, m);
}
mh->mh_head = NULL;
}
r++;
if (__predict_false(++rspq->cidx == rspq->size)) {
rspq->cidx = 0;
rspq->gen ^= 1;
r = rspq->desc;
}
if (++rspq->credits >= 64) {
refill_rspq(adap, rspq, rspq->credits);
rspq->credits = 0;
}
__refill_fl_lt(adap, &qs->fl[0], 32);
__refill_fl_lt(adap, &qs->fl[1], 32);
--budget_left;
}
#if defined(INET6) || defined(INET)
/* Flush LRO */
tcp_lro_flush_all(lro_ctrl);
#endif
if (sleeping)
check_ring_db(adap, qs, sleeping);
mb(); /* commit Tx queue processed updates */
if (__predict_false(qs->txq_stopped > 1))
restart_tx(qs);
__refill_fl_lt(adap, &qs->fl[0], 512);
__refill_fl_lt(adap, &qs->fl[1], 512);
budget -= budget_left;
return (budget);
}
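The budget accounting in process_responses() explains why a negative budget behaves as effectively unlimited: `budget_left` starts non-zero and only moves further from zero as it is decremented. A minimal model follows, with the hypothetical `consume` function and `available` count standing in for the loop and the pending responses.

```c
#include <assert.h>

/*
 * Hypothetical model of the budget loop: process up to budget items
 * out of available, and return how many were processed.
 */
static int
consume(int available, int budget)
{
	int budget_left = budget;

	assert(available >= 0);
	while (budget_left && available > 0) {
		available--;
		--budget_left;
	}
	/* Same computation as "budget -= budget_left" in the driver. */
	return (budget - budget_left);
}
```

With 5 pending items, a budget of 3 processes 3 of them, while a budget of -1 processes all 5 and still returns 5, since -1 - (-6) = 5.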
/*
* A helper function that processes responses and issues GTS.
*/
static __inline int
process_responses_gts(adapter_t *adap, struct sge_rspq *rq)
{
int work;
static int last_holdoff = 0;
work = process_responses(adap, rspq_to_qset(rq), -1);
if (cxgb_debug && (rq->next_holdoff != last_holdoff)) {
printf("next_holdoff=%d\n", rq->next_holdoff);
last_holdoff = rq->next_holdoff;
}
t3_write_reg(adap, A_SG_GTS, V_RSPQ(rq->cntxt_id) |
V_NEWTIMER(rq->next_holdoff) | V_NEWINDEX(rq->cidx));
return (work);
}
/*
* Interrupt handler for legacy INTx interrupts for T3B-based cards.
* Handles data events from SGE response queues as well as error and other
* async events as they all use the same interrupt pin. We use one SGE
* response queue per port in this mode and protect all response queues with
* queue 0's lock.
*/
void
t3b_intr(void *data)
{
uint32_t i, map;
adapter_t *adap = data;
struct sge_rspq *q0 = &adap->sge.qs[0].rspq;
t3_write_reg(adap, A_PL_CLI, 0);
map = t3_read_reg(adap, A_SG_DATA_INTR);
if (!map)
return;
if (__predict_false(map & F_ERRINTR)) {
t3_write_reg(adap, A_PL_INT_ENABLE0, 0);
(void) t3_read_reg(adap, A_PL_INT_ENABLE0);
taskqueue_enqueue(adap->tq, &adap->slow_intr_task);
}
mtx_lock(&q0->lock);
for_each_port(adap, i)
if (map & (1 << i))
process_responses_gts(adap, &adap->sge.qs[i].rspq);
mtx_unlock(&q0->lock);
}
/*
* The MSI interrupt handler. This needs to handle data events from SGE
* response queues as well as error and other async events as they all use
* the same MSI vector. We use one SGE response queue per port in this mode
* and protect all response queues with queue 0's lock.
*/
void
t3_intr_msi(void *data)
{
adapter_t *adap = data;
struct sge_rspq *q0 = &adap->sge.qs[0].rspq;
int i, new_packets = 0;
mtx_lock(&q0->lock);
for_each_port(adap, i)
if (process_responses_gts(adap, &adap->sge.qs[i].rspq))
new_packets = 1;
mtx_unlock(&q0->lock);
if (new_packets == 0) {
t3_write_reg(adap, A_PL_INT_ENABLE0, 0);
(void) t3_read_reg(adap, A_PL_INT_ENABLE0);
taskqueue_enqueue(adap->tq, &adap->slow_intr_task);
}
}
void
t3_intr_msix(void *data)
{
struct sge_qset *qs = data;
adapter_t *adap = qs->port->adapter;
struct sge_rspq *rspq = &qs->rspq;
if (process_responses_gts(adap, rspq) == 0)
rspq->unhandled_irqs++;
}
#define QDUMP_SBUF_SIZE (32 * 400)
static int
t3_dump_rspq(SYSCTL_HANDLER_ARGS)
{
struct sge_rspq *rspq;
struct sge_qset *qs;
int i, err, dump_end, idx;
struct sbuf *sb;
struct rsp_desc *rspd;
uint32_t data[4];
rspq = arg1;
qs = rspq_to_qset(rspq);
if (rspq->rspq_dump_count == 0)
return (0);
if (rspq->rspq_dump_count > RSPQ_Q_SIZE) {
log(LOG_WARNING,
"dump count is too large %d\n", rspq->rspq_dump_count);
rspq->rspq_dump_count = 0;
return (EINVAL);
}
if (rspq->rspq_dump_start > (RSPQ_Q_SIZE-1)) {
log(LOG_WARNING,
"dump start of %d is greater than queue size\n",
rspq->rspq_dump_start);
rspq->rspq_dump_start = 0;
return (EINVAL);
}
err = t3_sge_read_rspq(qs->port->adapter, rspq->cntxt_id, data);
if (err)
return (err);
err = sysctl_wire_old_buffer(req, 0);
if (err)
return (err);
sb = sbuf_new_for_sysctl(NULL, NULL, QDUMP_SBUF_SIZE, req);
sbuf_printf(sb, " \n index=%u size=%u MSI-X/RspQ=%u intr enable=%u intr armed=%u\n",
(data[0] & 0xffff), data[0] >> 16, ((data[2] >> 20) & 0x3f),
((data[2] >> 26) & 1), ((data[2] >> 27) & 1));
sbuf_printf(sb, " generation=%u CQ mode=%u FL threshold=%u\n",
((data[2] >> 28) & 1), ((data[2] >> 31) & 1), data[3]);
sbuf_printf(sb, " start=%d -> end=%d\n", rspq->rspq_dump_start,
(rspq->rspq_dump_start + rspq->rspq_dump_count) & (RSPQ_Q_SIZE-1));
dump_end = rspq->rspq_dump_start + rspq->rspq_dump_count;
for (i = rspq->rspq_dump_start; i < dump_end; i++) {
idx = i & (RSPQ_Q_SIZE-1);
rspd = &rspq->desc[idx];
sbuf_printf(sb, "\tidx=%04d opcode=%02x cpu_idx=%x hash_type=%x cq_idx=%x\n",
idx, rspd->rss_hdr.opcode, rspd->rss_hdr.cpu_idx,
rspd->rss_hdr.hash_type, be16toh(rspd->rss_hdr.cq_idx));
sbuf_printf(sb, "\trss_hash_val=%x flags=%08x len_cq=%x intr_gen=%x\n",
rspd->rss_hdr.rss_hash_val, be32toh(rspd->flags),
be32toh(rspd->len_cq), rspd->intr_gen);
}
err = sbuf_finish(sb);
sbuf_delete(sb);
return (err);
}
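The dump loop above walks a linear range `[start, start + count)` and maps each position onto the ring with `& (RSPQ_Q_SIZE-1)`, which only works when the queue size is a power of two. A minimal sketch of that mapping follows; `DEMO_Q_SIZE` and both function names are hypothetical and the size is chosen purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_Q_SIZE 1024u	/* hypothetical ring size; must be a power of two */

static_assert((DEMO_Q_SIZE & (DEMO_Q_SIZE - 1)) == 0,
    "mask-based wrap requires a power-of-two size");

/* Map a linear position onto a ring slot. */
static unsigned int
demo_ring_index(unsigned int pos)
{
	return (pos & (DEMO_Q_SIZE - 1));
}

/*
 * Write the ring slots visited by a dump of count entries beginning at
 * start into out[] (at most outlen of them) and return how many were
 * written.
 */
static size_t
demo_dump_indices(unsigned int start, unsigned int count,
    unsigned int *out, size_t outlen)
{
	unsigned int i, end = start + count;
	size_t n = 0;

	for (i = start; i < end && n < outlen; i++)
		out[n++] = demo_ring_index(i);
	return (n);
}
```

A dump that starts two entries before the end of the ring and covers four entries therefore visits the last two slots and then slots 0 and 1.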
static int
t3_dump_txq_eth(SYSCTL_HANDLER_ARGS)
{
struct sge_txq *txq;
struct sge_qset *qs;
int i, j, err, dump_end;
struct sbuf *sb;
struct tx_desc *txd;
uint32_t *WR, wr_hi, wr_lo, gen;
uint32_t data[4];
txq = arg1;
qs = txq_to_qset(txq, TXQ_ETH);
if (txq->txq_dump_count == 0) {
return (0);
}
if (txq->txq_dump_count > TX_ETH_Q_SIZE) {
log(LOG_WARNING,
"dump count is too large %d\n", txq->txq_dump_count);
txq->txq_dump_count = 1;
return (EINVAL);
}
if (txq->txq_dump_start > (TX_ETH_Q_SIZE-1)) {
log(LOG_WARNING,
"dump start of %d is greater than queue size\n",
txq->txq_dump_start);
txq->txq_dump_start = 0;
return (EINVAL);
}
err = t3_sge_read_ecntxt(qs->port->adapter, qs->rspq.cntxt_id, data);
if (err)
return (err);
err = sysctl_wire_old_buffer(req, 0);
if (err)
return (err);
sb = sbuf_new_for_sysctl(NULL, NULL, QDUMP_SBUF_SIZE, req);
sbuf_printf(sb, " \n credits=%u GTS=%u index=%u size=%u rspq#=%u cmdq#=%u\n",
(data[0] & 0x7fff), ((data[0] >> 15) & 1), (data[0] >> 16),
(data[1] & 0xffff), ((data[3] >> 4) & 7), ((data[3] >> 7) & 1));
sbuf_printf(sb, " TUN=%u TOE=%u generation=%u uP token=%u valid=%u\n",
((data[3] >> 8) & 1), ((data[3] >> 9) & 1), ((data[3] >> 10) & 1),
((data[3] >> 11) & 0xfffff), ((data[3] >> 31) & 1));
sbuf_printf(sb, " qid=%d start=%d -> end=%d\n", qs->idx,
txq->txq_dump_start,
(txq->txq_dump_start + txq->txq_dump_count) & (TX_ETH_Q_SIZE-1));
dump_end = txq->txq_dump_start + txq->txq_dump_count;
for (i = txq->txq_dump_start; i < dump_end; i++) {
txd = &txq->desc[i & (TX_ETH_Q_SIZE-1)];
WR = (uint32_t *)txd->flit;
wr_hi = ntohl(WR[0]);
wr_lo = ntohl(WR[1]);
gen = G_WR_GEN(wr_lo);
sbuf_printf(sb," wr_hi %08x wr_lo %08x gen %d\n",
wr_hi, wr_lo, gen);
for (j = 2; j < 30; j += 4)
sbuf_printf(sb, "\t%08x %08x %08x %08x \n",
WR[j], WR[j + 1], WR[j + 2], WR[j + 3]);
}
err = sbuf_finish(sb);
sbuf_delete(sb);
return (err);
}
static int
t3_dump_txq_ctrl(SYSCTL_HANDLER_ARGS)
{
struct sge_txq *txq;
struct sge_qset *qs;
int i, j, err, dump_end;
struct sbuf *sb;
struct tx_desc *txd;
uint32_t *WR, wr_hi, wr_lo, gen;
txq = arg1;
qs = txq_to_qset(txq, TXQ_CTRL);
if (txq->txq_dump_count == 0) {
return (0);
}
if (txq->txq_dump_count > 256) {
log(LOG_WARNING,
"dump count is too large %d\n", txq->txq_dump_count);
txq->txq_dump_count = 1;
return (EINVAL);
}
if (txq->txq_dump_start > 255) {
log(LOG_WARNING,
"dump start of %d is greater than queue size\n",
txq->txq_dump_start);
txq->txq_dump_start = 0;
return (EINVAL);
}
err = sysctl_wire_old_buffer(req, 0);
if (err != 0)
return (err);
sb = sbuf_new_for_sysctl(NULL, NULL, QDUMP_SBUF_SIZE, req);
sbuf_printf(sb, " qid=%d start=%d -> end=%d\n", qs->idx,
txq->txq_dump_start,
(txq->txq_dump_start + txq->txq_dump_count) & 255);
dump_end = txq->txq_dump_start + txq->txq_dump_count;
for (i = txq->txq_dump_start; i < dump_end; i++) {
txd = &txq->desc[i & (255)];
WR = (uint32_t *)txd->flit;
wr_hi = ntohl(WR[0]);
wr_lo = ntohl(WR[1]);
gen = G_WR_GEN(wr_lo);
sbuf_printf(sb," wr_hi %08x wr_lo %08x gen %d\n",
wr_hi, wr_lo, gen);
for (j = 2; j < 30; j += 4)
sbuf_printf(sb, "\t%08x %08x %08x %08x \n",
WR[j], WR[j + 1], WR[j + 2], WR[j + 3]);
}
err = sbuf_finish(sb);
sbuf_delete(sb);
return (err);
}
static int
t3_set_coalesce_usecs(SYSCTL_HANDLER_ARGS)
{
adapter_t *sc = arg1;
struct qset_params *qsp = &sc->params.sge.qset[0];
int coalesce_usecs;
struct sge_qset *qs;
int i, j, err, nqsets = 0;
struct mtx *lock;
if ((sc->flags & FULL_INIT_DONE) == 0)
return (ENXIO);
coalesce_usecs = qsp->coalesce_usecs;
err = sysctl_handle_int(oidp, &coalesce_usecs, arg2, req);
if (err != 0) {
return (err);
}
if (coalesce_usecs == qsp->coalesce_usecs)
return (0);
for (i = 0; i < sc->params.nports; i++)
for (j = 0; j < sc->port[i].nqsets; j++)
nqsets++;
coalesce_usecs = max(1, coalesce_usecs);
for (i = 0; i < nqsets; i++) {
qs = &sc->sge.qs[i];
qsp = &sc->params.sge.qset[i];
qsp->coalesce_usecs = coalesce_usecs;
lock = (sc->flags & USING_MSIX) ? &qs->rspq.lock :
&sc->sge.qs[0].rspq.lock;
mtx_lock(lock);
t3_update_qset_coalesce(qs, qsp);
t3_write_reg(sc, A_SG_GTS, V_RSPQ(qs->rspq.cntxt_id) |
V_NEWTIMER(qs->rspq.holdoff_tmr));
mtx_unlock(lock);
}
return (0);
}
static int
t3_pkt_timestamp(SYSCTL_HANDLER_ARGS)
{
adapter_t *sc = arg1;
int rc, timestamp;
if ((sc->flags & FULL_INIT_DONE) == 0)
return (ENXIO);
timestamp = sc->timestamp;
rc = sysctl_handle_int(oidp, &timestamp, arg2, req);
if (rc != 0)
return (rc);
if (timestamp != sc->timestamp) {
t3_set_reg_field(sc, A_TP_PC_CONFIG2, F_ENABLERXPKTTMSTPRSS,
timestamp ? F_ENABLERXPKTTMSTPRSS : 0);
sc->timestamp = timestamp;
}
return (0);
}
void
t3_add_attach_sysctls(adapter_t *sc)
{
struct sysctl_ctx_list *ctx;
struct sysctl_oid_list *children;
ctx = device_get_sysctl_ctx(sc->dev);
children = SYSCTL_CHILDREN(device_get_sysctl_tree(sc->dev));
/* random information */
SYSCTL_ADD_STRING(ctx, children, OID_AUTO,
"firmware_version",
CTLFLAG_RD, sc->fw_version,
0, "firmware version");
SYSCTL_ADD_UINT(ctx, children, OID_AUTO,
"hw_revision",
CTLFLAG_RD, &sc->params.rev,
0, "chip model");
SYSCTL_ADD_STRING(ctx, children, OID_AUTO,
"port_types",
CTLFLAG_RD, sc->port_types,
0, "type of ports");
SYSCTL_ADD_INT(ctx, children, OID_AUTO,
"enable_debug",
CTLFLAG_RW, &cxgb_debug,
0, "enable verbose debugging output");
SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, "tunq_coalesce",
CTLFLAG_RD, &sc->tunq_coalesce,
"#tunneled packets freed");
SYSCTL_ADD_INT(ctx, children, OID_AUTO,
"txq_overrun",
CTLFLAG_RD, &txq_fills,
0, "#times txq overrun");
SYSCTL_ADD_UINT(ctx, children, OID_AUTO,
"core_clock",
CTLFLAG_RD, &sc->params.vpd.cclk,
0, "core clock frequency (in KHz)");
}
static const char *rspq_name = "rspq";
static const char *txq_names[] =
{
"txq_eth",
"txq_ofld",
"txq_ctrl"
};
static int
sysctl_handle_macstat(SYSCTL_HANDLER_ARGS)
{
struct port_info *p = arg1;
uint64_t *parg;
if (!p)
return (EINVAL);
cxgb_refresh_stats(p);
parg = (uint64_t *) ((uint8_t *)&p->mac.stats + arg2);
return (sysctl_handle_64(oidp, parg, 0, req));
}
void
t3_add_configured_sysctls(adapter_t *sc)
{
struct sysctl_ctx_list *ctx;
struct sysctl_oid_list *children;
int i, j;
ctx = device_get_sysctl_ctx(sc->dev);
children = SYSCTL_CHILDREN(device_get_sysctl_tree(sc->dev));
SYSCTL_ADD_PROC(ctx, children, OID_AUTO,
"intr_coal",
CTLTYPE_INT|CTLFLAG_RW, sc,
0, t3_set_coalesce_usecs,
"I", "interrupt coalescing timer (us)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO,
"pkt_timestamp",
CTLTYPE_INT | CTLFLAG_RW, sc,
0, t3_pkt_timestamp,
"I", "provide packet timestamp instead of connection hash");
for (i = 0; i < sc->params.nports; i++) {
struct port_info *pi = &sc->port[i];
struct sysctl_oid *poid;
struct sysctl_oid_list *poidlist;
struct mac_stats *mstats = &pi->mac.stats;
snprintf(pi->namebuf, PORT_NAME_LEN, "port%d", i);
poid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO,
pi->namebuf, CTLFLAG_RD, NULL, "port statistics");
poidlist = SYSCTL_CHILDREN(poid);
SYSCTL_ADD_UINT(ctx, poidlist, OID_AUTO,
"nqsets", CTLFLAG_RD, &pi->nqsets,
0, "#queue sets");
for (j = 0; j < pi->nqsets; j++) {
struct sge_qset *qs = &sc->sge.qs[pi->first_qset + j];
struct sysctl_oid *qspoid, *rspqpoid, *txqpoid,
*ctrlqpoid, *lropoid;
struct sysctl_oid_list *qspoidlist, *rspqpoidlist,
*txqpoidlist, *ctrlqpoidlist,
*lropoidlist;
struct sge_txq *txq = &qs->txq[TXQ_ETH];
snprintf(qs->namebuf, QS_NAME_LEN, "qs%d", j);
qspoid = SYSCTL_ADD_NODE(ctx, poidlist, OID_AUTO,
qs->namebuf, CTLFLAG_RD, NULL, "qset statistics");
qspoidlist = SYSCTL_CHILDREN(qspoid);
SYSCTL_ADD_UINT(ctx, qspoidlist, OID_AUTO, "fl0_empty",
CTLFLAG_RD, &qs->fl[0].empty, 0,
"freelist #0 empty");
SYSCTL_ADD_UINT(ctx, qspoidlist, OID_AUTO, "fl1_empty",
CTLFLAG_RD, &qs->fl[1].empty, 0,
"freelist #1 empty");
rspqpoid = SYSCTL_ADD_NODE(ctx, qspoidlist, OID_AUTO,
rspq_name, CTLFLAG_RD, NULL, "rspq statistics");
rspqpoidlist = SYSCTL_CHILDREN(rspqpoid);
txqpoid = SYSCTL_ADD_NODE(ctx, qspoidlist, OID_AUTO,
txq_names[0], CTLFLAG_RD, NULL, "txq statistics");
txqpoidlist = SYSCTL_CHILDREN(txqpoid);
ctrlqpoid = SYSCTL_ADD_NODE(ctx, qspoidlist, OID_AUTO,
txq_names[2], CTLFLAG_RD, NULL, "ctrlq statistics");
ctrlqpoidlist = SYSCTL_CHILDREN(ctrlqpoid);
lropoid = SYSCTL_ADD_NODE(ctx, qspoidlist, OID_AUTO,
"lro_stats", CTLFLAG_RD, NULL, "LRO statistics");
lropoidlist = SYSCTL_CHILDREN(lropoid);
SYSCTL_ADD_UINT(ctx, rspqpoidlist, OID_AUTO, "size",
CTLFLAG_RD, &qs->rspq.size,
0, "#entries in response queue");
SYSCTL_ADD_UINT(ctx, rspqpoidlist, OID_AUTO, "cidx",
CTLFLAG_RD, &qs->rspq.cidx,
0, "consumer index");
SYSCTL_ADD_UINT(ctx, rspqpoidlist, OID_AUTO, "credits",
CTLFLAG_RD, &qs->rspq.credits,
0, "#credits");
SYSCTL_ADD_UINT(ctx, rspqpoidlist, OID_AUTO, "starved",
CTLFLAG_RD, &qs->rspq.starved,
0, "#times starved");
SYSCTL_ADD_UAUTO(ctx, rspqpoidlist, OID_AUTO, "phys_addr",
CTLFLAG_RD, &qs->rspq.phys_addr,
"physical address of the queue");
SYSCTL_ADD_UINT(ctx, rspqpoidlist, OID_AUTO, "dump_start",
CTLFLAG_RW, &qs->rspq.rspq_dump_start,
0, "start rspq dump entry");
SYSCTL_ADD_UINT(ctx, rspqpoidlist, OID_AUTO, "dump_count",
CTLFLAG_RW, &qs->rspq.rspq_dump_count,
0, "#rspq entries to dump");
SYSCTL_ADD_PROC(ctx, rspqpoidlist, OID_AUTO, "qdump",
CTLTYPE_STRING | CTLFLAG_RD, &qs->rspq,
0, t3_dump_rspq, "A", "dump of the response queue");
SYSCTL_ADD_UQUAD(ctx, txqpoidlist, OID_AUTO, "dropped",
CTLFLAG_RD, &qs->txq[TXQ_ETH].txq_mr->br_drops,
"#tunneled packets dropped");
SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "sendqlen",
CTLFLAG_RD, &qs->txq[TXQ_ETH].sendq.mq_len,
0, "#tunneled packets waiting to be sent");
#if 0
SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "queue_pidx",
CTLFLAG_RD, (uint32_t *)(uintptr_t)&qs->txq[TXQ_ETH].txq_mr.br_prod,
0, "#tunneled packets queue producer index");
SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "queue_cidx",
CTLFLAG_RD, (uint32_t *)(uintptr_t)&qs->txq[TXQ_ETH].txq_mr.br_cons,
0, "#tunneled packets queue consumer index");
#endif
SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "processed",
CTLFLAG_RD, &qs->txq[TXQ_ETH].processed,
0, "#tunneled packets processed by the card");
SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "cleaned",
CTLFLAG_RD, &txq->cleaned,
0, "#tunneled packets cleaned");
SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "in_use",
CTLFLAG_RD, &txq->in_use,
0, "#tunneled packet slots in use");
SYSCTL_ADD_UQUAD(ctx, txqpoidlist, OID_AUTO, "frees",
CTLFLAG_RD, &txq->txq_frees,
"#tunneled packets freed");
SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "skipped",
CTLFLAG_RD, &txq->txq_skipped,
0, "#tunneled packet descriptors skipped");
SYSCTL_ADD_UQUAD(ctx, txqpoidlist, OID_AUTO, "coalesced",
CTLFLAG_RD, &txq->txq_coalesced,
"#tunneled packets coalesced");
SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "enqueued",
CTLFLAG_RD, &txq->txq_enqueued,
0, "#tunneled packets enqueued to hardware");
SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "stopped_flags",
CTLFLAG_RD, &qs->txq_stopped,
0, "tx queues stopped");
SYSCTL_ADD_UAUTO(ctx, txqpoidlist, OID_AUTO, "phys_addr",
CTLFLAG_RD, &txq->phys_addr,
"physical address of the queue");
SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "qgen",
CTLFLAG_RW, &qs->txq[TXQ_ETH].gen,
0, "txq generation");
SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "hw_cidx",
CTLFLAG_RD, &txq->cidx,
0, "hardware queue cidx");
SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "hw_pidx",
CTLFLAG_RD, &txq->pidx,
0, "hardware queue pidx");
SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "dump_start",
CTLFLAG_RW, &qs->txq[TXQ_ETH].txq_dump_start,
0, "txq start idx for dump");
SYSCTL_ADD_UINT(ctx, txqpoidlist, OID_AUTO, "dump_count",
CTLFLAG_RW, &qs->txq[TXQ_ETH].txq_dump_count,
0, "txq #entries to dump");
SYSCTL_ADD_PROC(ctx, txqpoidlist, OID_AUTO, "qdump",
CTLTYPE_STRING | CTLFLAG_RD, &qs->txq[TXQ_ETH],
0, t3_dump_txq_eth, "A", "dump of the transmit queue");
SYSCTL_ADD_UINT(ctx, ctrlqpoidlist, OID_AUTO, "dump_start",
CTLFLAG_RW, &qs->txq[TXQ_CTRL].txq_dump_start,
0, "ctrlq start idx for dump");
SYSCTL_ADD_UINT(ctx, ctrlqpoidlist, OID_AUTO, "dump_count",
CTLFLAG_RW, &qs->txq[TXQ_CTRL].txq_dump_count,
0, "ctrl #entries to dump");
SYSCTL_ADD_PROC(ctx, ctrlqpoidlist, OID_AUTO, "qdump",
CTLTYPE_STRING | CTLFLAG_RD, &qs->txq[TXQ_CTRL],
0, t3_dump_txq_ctrl, "A", "dump of the transmit queue");
SYSCTL_ADD_U64(ctx, lropoidlist, OID_AUTO, "lro_queued",
CTLFLAG_RD, &qs->lro.ctrl.lro_queued, 0, NULL);
SYSCTL_ADD_U64(ctx, lropoidlist, OID_AUTO, "lro_flushed",
CTLFLAG_RD, &qs->lro.ctrl.lro_flushed, 0, NULL);
SYSCTL_ADD_U64(ctx, lropoidlist, OID_AUTO, "lro_bad_csum",
CTLFLAG_RD, &qs->lro.ctrl.lro_bad_csum, 0, NULL);
SYSCTL_ADD_INT(ctx, lropoidlist, OID_AUTO, "lro_cnt",
CTLFLAG_RD, &qs->lro.ctrl.lro_cnt, 0, NULL);
}
/* Now add a node for mac stats. */
poid = SYSCTL_ADD_NODE(ctx, poidlist, OID_AUTO, "mac_stats",
CTLFLAG_RD, NULL, "MAC statistics");
poidlist = SYSCTL_CHILDREN(poid);
/*
* We (ab)use the length argument (arg2) to pass on the offset
* of the data that we are interested in. This is only required
* for the quad counters that are updated from the hardware (we
* make sure that we return the latest value).
* sysctl_handle_macstat first updates *all* the counters from
* the hardware, and then returns the latest value of the
* requested counter. Best would be to update only the
* requested counter from hardware, but t3_mac_update_stats()
* hides all the register details and we don't want to dive into
* all that here.
*/
#define CXGB_SYSCTL_ADD_QUAD(a) SYSCTL_ADD_OID(ctx, poidlist, OID_AUTO, #a, \
(CTLTYPE_U64 | CTLFLAG_RD), pi, offsetof(struct mac_stats, a), \
sysctl_handle_macstat, "QU", 0)
CXGB_SYSCTL_ADD_QUAD(tx_octets);
CXGB_SYSCTL_ADD_QUAD(tx_octets_bad);
CXGB_SYSCTL_ADD_QUAD(tx_frames);
CXGB_SYSCTL_ADD_QUAD(tx_mcast_frames);
CXGB_SYSCTL_ADD_QUAD(tx_bcast_frames);
CXGB_SYSCTL_ADD_QUAD(tx_pause);
CXGB_SYSCTL_ADD_QUAD(tx_deferred);
CXGB_SYSCTL_ADD_QUAD(tx_late_collisions);
CXGB_SYSCTL_ADD_QUAD(tx_total_collisions);
CXGB_SYSCTL_ADD_QUAD(tx_excess_collisions);
CXGB_SYSCTL_ADD_QUAD(tx_underrun);
CXGB_SYSCTL_ADD_QUAD(tx_len_errs);
CXGB_SYSCTL_ADD_QUAD(tx_mac_internal_errs);
CXGB_SYSCTL_ADD_QUAD(tx_excess_deferral);
CXGB_SYSCTL_ADD_QUAD(tx_fcs_errs);
CXGB_SYSCTL_ADD_QUAD(tx_frames_64);
CXGB_SYSCTL_ADD_QUAD(tx_frames_65_127);
CXGB_SYSCTL_ADD_QUAD(tx_frames_128_255);
CXGB_SYSCTL_ADD_QUAD(tx_frames_256_511);
CXGB_SYSCTL_ADD_QUAD(tx_frames_512_1023);
CXGB_SYSCTL_ADD_QUAD(tx_frames_1024_1518);
CXGB_SYSCTL_ADD_QUAD(tx_frames_1519_max);
CXGB_SYSCTL_ADD_QUAD(rx_octets);
CXGB_SYSCTL_ADD_QUAD(rx_octets_bad);
CXGB_SYSCTL_ADD_QUAD(rx_frames);
CXGB_SYSCTL_ADD_QUAD(rx_mcast_frames);
CXGB_SYSCTL_ADD_QUAD(rx_bcast_frames);
CXGB_SYSCTL_ADD_QUAD(rx_pause);
CXGB_SYSCTL_ADD_QUAD(rx_fcs_errs);
CXGB_SYSCTL_ADD_QUAD(rx_align_errs);
CXGB_SYSCTL_ADD_QUAD(rx_symbol_errs);
CXGB_SYSCTL_ADD_QUAD(rx_data_errs);
CXGB_SYSCTL_ADD_QUAD(rx_sequence_errs);
CXGB_SYSCTL_ADD_QUAD(rx_runt);
CXGB_SYSCTL_ADD_QUAD(rx_jabber);
CXGB_SYSCTL_ADD_QUAD(rx_short);
CXGB_SYSCTL_ADD_QUAD(rx_too_long);
CXGB_SYSCTL_ADD_QUAD(rx_mac_internal_errs);
CXGB_SYSCTL_ADD_QUAD(rx_cong_drops);
CXGB_SYSCTL_ADD_QUAD(rx_frames_64);
CXGB_SYSCTL_ADD_QUAD(rx_frames_65_127);
CXGB_SYSCTL_ADD_QUAD(rx_frames_128_255);
CXGB_SYSCTL_ADD_QUAD(rx_frames_256_511);
CXGB_SYSCTL_ADD_QUAD(rx_frames_512_1023);
CXGB_SYSCTL_ADD_QUAD(rx_frames_1024_1518);
CXGB_SYSCTL_ADD_QUAD(rx_frames_1519_max);
#undef CXGB_SYSCTL_ADD_QUAD
#define CXGB_SYSCTL_ADD_ULONG(a) SYSCTL_ADD_ULONG(ctx, poidlist, OID_AUTO, #a, \
CTLFLAG_RD, &mstats->a, 0)
CXGB_SYSCTL_ADD_ULONG(tx_fifo_parity_err);
CXGB_SYSCTL_ADD_ULONG(rx_fifo_parity_err);
CXGB_SYSCTL_ADD_ULONG(tx_fifo_urun);
CXGB_SYSCTL_ADD_ULONG(rx_fifo_ovfl);
CXGB_SYSCTL_ADD_ULONG(serdes_signal_loss);
CXGB_SYSCTL_ADD_ULONG(xaui_pcs_ctc_err);
CXGB_SYSCTL_ADD_ULONG(xaui_pcs_align_change);
CXGB_SYSCTL_ADD_ULONG(num_toggled);
CXGB_SYSCTL_ADD_ULONG(num_resets);
CXGB_SYSCTL_ADD_ULONG(link_faults);
#undef CXGB_SYSCTL_ADD_ULONG
}
}
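The MAC statistics nodes registered above pass `offsetof(struct mac_stats, a)` as `arg2`, and `sysctl_handle_macstat` adds that offset to the base of the statistics structure to find the requested counter. A minimal stand-alone sketch of that lookup follows; `struct demo_stats` and `demo_stat_at` are hypothetical stand-ins and no sysctl machinery is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct mac_stats; only 64-bit counters. */
struct demo_stats {
	uint64_t tx_octets;
	uint64_t tx_frames;
	uint64_t rx_octets;
};

/* Return the counter that lives off bytes into the structure. */
static uint64_t
demo_stat_at(const struct demo_stats *s, size_t off)
{
	const uint64_t *p;

	/* The offset must name a whole counter inside the structure. */
	assert(off + sizeof(uint64_t) <= sizeof(*s));
	p = (const uint64_t *)((const uint8_t *)s + off);
	return (*p);
}
```

One handler can then serve every counter: the caller supplies `offsetof(struct demo_stats, tx_frames)` and gets that field back, so no per-counter handler functions are needed.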
/**
* t3_get_desc - dump an SGE descriptor for debugging purposes
* @qs: the queue set
* @qnum: identifies the specific queue (0..2: Tx, 3:response, 4..5: Rx)
* @idx: the descriptor index in the queue
* @data: where to dump the descriptor contents
*
* Dumps the contents of a HW descriptor of an SGE queue. Returns the
* size of the descriptor.
*/
int
t3_get_desc(const struct sge_qset *qs, unsigned int qnum, unsigned int idx,
unsigned char *data)
{
if (qnum >= 6)
return (EINVAL);
if (qnum < 3) {
if (!qs->txq[qnum].desc || idx >= qs->txq[qnum].size)
return -EINVAL;
memcpy(data, &qs->txq[qnum].desc[idx], sizeof(struct tx_desc));
return sizeof(struct tx_desc);
}
if (qnum == 3) {
if (!qs->rspq.desc || idx >= qs->rspq.size)
return (EINVAL);
memcpy(data, &qs->rspq.desc[idx], sizeof(struct rsp_desc));
return sizeof(struct rsp_desc);
}
qnum -= 4;
if (!qs->fl[qnum].desc || idx >= qs->fl[qnum].size)
return (EINVAL);
memcpy(data, &qs->fl[qnum].desc[idx], sizeof(struct rx_desc));
return sizeof(struct rx_desc);
}
Index: projects/vnet/sys/dev/cxgbe/adapter.h
===================================================================
--- projects/vnet/sys/dev/cxgbe/adapter.h (revision 301546)
+++ projects/vnet/sys/dev/cxgbe/adapter.h (revision 301547)
@@ -1,1152 +1,1166 @@
/*-
* Copyright (c) 2011 Chelsio Communications, Inc.
* All rights reserved.
* Written by: Navdeep Parhar
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $FreeBSD$
*
*/
#ifndef __T4_ADAPTER_H__
#define __T4_ADAPTER_H__
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include "offload.h"
+#include "t4_ioctl.h"
#include "common/t4_msg.h"
#include "firmware/t4fw_interface.h"
#define KTR_CXGBE KTR_SPARE3
MALLOC_DECLARE(M_CXGBE);
#define CXGBE_UNIMPLEMENTED(s) \
panic("%s (%s, line %d) not implemented yet.", s, __FILE__, __LINE__)
#if defined(__i386__) || defined(__amd64__)
static __inline void
prefetch(void *x)
{
__asm volatile("prefetcht0 %0" :: "m" (*(unsigned long *)x));
}
#else
#define prefetch(x)
#endif
#ifndef SYSCTL_ADD_UQUAD
#define SYSCTL_ADD_UQUAD SYSCTL_ADD_QUAD
#define sysctl_handle_64 sysctl_handle_quad
#define CTLTYPE_U64 CTLTYPE_QUAD
#endif
#if (__FreeBSD_version >= 900030) || \
((__FreeBSD_version >= 802507) && (__FreeBSD_version < 900000))
#define SBUF_DRAIN 1
#endif
#ifdef __amd64__
/* XXX: need systemwide bus_space_read_8/bus_space_write_8 */
static __inline uint64_t
t4_bus_space_read_8(bus_space_tag_t tag, bus_space_handle_t handle,
bus_size_t offset)
{
KASSERT(tag == X86_BUS_SPACE_MEM,
("%s: can only handle mem space", __func__));
return (*(volatile uint64_t *)(handle + offset));
}
static __inline void
t4_bus_space_write_8(bus_space_tag_t tag, bus_space_handle_t bsh,
bus_size_t offset, uint64_t value)
{
KASSERT(tag == X86_BUS_SPACE_MEM,
("%s: can only handle mem space", __func__));
*(volatile uint64_t *)(bsh + offset) = value;
}
#else
static __inline uint64_t
t4_bus_space_read_8(bus_space_tag_t tag, bus_space_handle_t handle,
bus_size_t offset)
{
return (uint64_t)bus_space_read_4(tag, handle, offset) +
((uint64_t)bus_space_read_4(tag, handle, offset + 4) << 32);
}
static __inline void
t4_bus_space_write_8(bus_space_tag_t tag, bus_space_handle_t bsh,
bus_size_t offset, uint64_t value)
{
bus_space_write_4(tag, bsh, offset, value);
bus_space_write_4(tag, bsh, offset + 4, value >> 32);
}
#endif
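The non-amd64 fallback above builds a 64-bit access out of two 32-bit accesses, with the low word at `offset` and the high word at `offset + 4`. The arithmetic can be sketched without any bus_space dependency; `demo_combine` and `demo_split` are hypothetical helper names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine a low and a high 32-bit word into one 64-bit value. */
static uint64_t
demo_combine(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)lo | ((uint64_t)hi << 32));
}

/* Split a 64-bit value into the two words written at offset and offset + 4. */
static void
demo_split(uint64_t v, uint32_t *lo, uint32_t *hi)
{
	assert(lo != NULL && hi != NULL);
	*lo = (uint32_t)v;		/* truncates to the low 32 bits */
	*hi = (uint32_t)(v >> 32);
}
```

Because the two halves are separate accesses, the pair is not a single atomic 64-bit operation; a register that changes between the two reads can yield a torn value.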
struct adapter;
typedef struct adapter adapter_t;
enum {
/*
* All ingress queues use this entry size. Note that the firmware event
* queue and any iq expecting CPL_RX_PKT in the descriptor needs this to
* be at least 64.
*/
IQ_ESIZE = 64,
/* Default queue sizes for all kinds of ingress queues */
FW_IQ_QSIZE = 256,
RX_IQ_QSIZE = 1024,
/* All egress queues use this entry size */
EQ_ESIZE = 64,
/* Default queue sizes for all kinds of egress queues */
CTRL_EQ_QSIZE = 128,
TX_EQ_QSIZE = 1024,
#if MJUMPAGESIZE != MCLBYTES
SW_ZONE_SIZES = 4, /* cluster, jumbop, jumbo9k, jumbo16k */
#else
SW_ZONE_SIZES = 3, /* cluster, jumbo9k, jumbo16k */
#endif
CL_METADATA_SIZE = CACHE_LINE_SIZE,
SGE_MAX_WR_NDESC = SGE_MAX_WR_LEN / EQ_ESIZE, /* max WR size in desc */
TX_SGL_SEGS = 39,
TX_SGL_SEGS_TSO = 38,
TX_WR_FLITS = SGE_MAX_WR_LEN / 8
};
enum {
/* adapter intr_type */
INTR_INTX = (1 << 0),
INTR_MSI = (1 << 1),
INTR_MSIX = (1 << 2)
};
enum {
XGMAC_MTU = (1 << 0),
XGMAC_PROMISC = (1 << 1),
XGMAC_ALLMULTI = (1 << 2),
XGMAC_VLANEX = (1 << 3),
XGMAC_UCADDR = (1 << 4),
XGMAC_MCADDRS = (1 << 5),
XGMAC_ALL = 0xffff
};
enum {
/* flags understood by begin_synchronized_op */
HOLD_LOCK = (1 << 0),
SLEEP_OK = (1 << 1),
INTR_OK = (1 << 2),
/* flags understood by end_synchronized_op */
LOCK_HELD = HOLD_LOCK,
};
enum {
/* adapter flags */
FULL_INIT_DONE = (1 << 0),
FW_OK = (1 << 1),
/* INTR_DIRECT = (1 << 2), No longer used. */
MASTER_PF = (1 << 3),
ADAP_SYSCTL_CTX = (1 << 4),
/* TOM_INIT_DONE= (1 << 5), No longer used */
BUF_PACKING_OK = (1 << 6),
CXGBE_BUSY = (1 << 9),
/* port flags */
HAS_TRACEQ = (1 << 3),
/* VI flags */
DOOMED = (1 << 0),
VI_INIT_DONE = (1 << 1),
VI_SYSCTL_CTX = (1 << 2),
INTR_RXQ = (1 << 4), /* All NIC rxq's take interrupts */
INTR_OFLD_RXQ = (1 << 5), /* All TOE rxq's take interrupts */
INTR_ALL = (INTR_RXQ | INTR_OFLD_RXQ),
VI_NETMAP = (1 << 6),
/* adapter debug_flags */
DF_DUMP_MBOX = (1 << 0),
};
#define IS_DOOMED(vi) ((vi)->flags & DOOMED)
#define SET_DOOMED(vi) do {(vi)->flags |= DOOMED;} while (0)
#define IS_BUSY(sc) ((sc)->flags & CXGBE_BUSY)
#define SET_BUSY(sc) do {(sc)->flags |= CXGBE_BUSY;} while (0)
#define CLR_BUSY(sc) do {(sc)->flags &= ~CXGBE_BUSY;} while (0)
struct vi_info {
device_t dev;
struct port_info *pi;
struct ifnet *ifp;
struct ifmedia media;
unsigned long flags;
int if_flags;
uint16_t *rss;
uint16_t viid;
int16_t xact_addr_filt;/* index of exact MAC address filter */
uint16_t rss_size; /* size of VI's RSS table slice */
uint16_t rss_base; /* start of VI's RSS table slice */
eventhandler_tag vlan_c;
int nintr;
int first_intr;
/* These need to be int as they are used in sysctl */
int ntxq; /* # of tx queues */
int first_txq; /* index of first tx queue */
int rsrv_noflowq; /* Reserve queue 0 for non-flowid packets */
int nrxq; /* # of rx queues */
int first_rxq; /* index of first rx queue */
int nofldtxq; /* # of offload tx queues */
int first_ofld_txq; /* index of first offload tx queue */
int nofldrxq; /* # of offload rx queues */
int first_ofld_rxq; /* index of first offload rx queue */
int tmr_idx;
int pktc_idx;
int qsize_rxq;
int qsize_txq;
struct timeval last_refreshed;
struct fw_vi_stats_vf stats;
struct callout tick;
struct sysctl_ctx_list ctx; /* from ifconfig up to driver detach */
uint8_t hw_addr[ETHER_ADDR_LEN]; /* factory MAC address, won't change */
};
+enum {
+ /* tx_sched_class flags */
+ TX_SC_OK = (1 << 0), /* Set up in hardware, active. */
+};
+
+struct tx_sched_class {
+ int refcount;
+ int flags;
+ struct t4_sched_class_params params;
+};
+
struct port_info {
device_t dev;
struct adapter *adapter;
struct vi_info *vi;
int nvi;
int up_vis;
int uld_vis;
+
+ struct tx_sched_class *tc; /* traffic classes for this channel */
struct mtx pi_lock;
char lockname[16];
unsigned long flags;
uint8_t lport; /* associated offload logical port */
int8_t mdio_addr;
uint8_t port_type;
uint8_t mod_type;
uint8_t port_id;
uint8_t tx_chan;
uint8_t rx_chan_map; /* rx MPS channel bitmap */
int linkdnrc;
struct link_config link_cfg;
struct timeval last_refreshed;
struct port_stats stats;
u_int tnl_cong_drops;
u_int tx_parse_error;
struct callout tick;
};
#define IS_MAIN_VI(vi) ((vi) == &((vi)->pi->vi[0]))
/* Where the cluster came from, how it has been carved up. */
struct cluster_layout {
int8_t zidx;
int8_t hwidx;
uint16_t region1; /* mbufs laid out within this region */
/* region2 is the DMA region */
uint16_t region3; /* cluster_metadata within this region */
};
struct cluster_metadata {
u_int refcount;
struct fl_sdesc *sd; /* For debug only. Could easily be stale */
};
struct fl_sdesc {
caddr_t cl;
uint16_t nmbuf; /* # of driver originated mbufs with ref on cluster */
struct cluster_layout cll;
};
struct tx_desc {
__be64 flit[8];
};
struct tx_sdesc {
struct mbuf *m; /* m_nextpkt linked chain of frames */
uint8_t desc_used; /* # of hardware descriptors used by the WR */
};
#define IQ_PAD (IQ_ESIZE - sizeof(struct rsp_ctrl) - sizeof(struct rss_header))
struct iq_desc {
struct rss_header rss;
uint8_t cpl[IQ_PAD];
struct rsp_ctrl rsp;
};
#undef IQ_PAD
CTASSERT(sizeof(struct iq_desc) == IQ_ESIZE);
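The `IQ_PAD` macro and the `CTASSERT` above size the middle member so that the whole descriptor is exactly `IQ_ESIZE` bytes, and fail the build if it is not. The same pattern in portable C11 might look like the sketch below; the header and trailer sizes (8 and 16 bytes) are invented for illustration and are not the real sizes of `struct rss_header` or `struct rsp_ctrl`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ESIZE 64	/* fixed entry size, mirroring IQ_ESIZE */

/* Hypothetical header and trailer with made-up sizes. */
struct demo_hdr { uint8_t b[8]; };
struct demo_ctrl { uint8_t b[16]; };

#define DEMO_PAD \
	(DEMO_ESIZE - sizeof(struct demo_ctrl) - sizeof(struct demo_hdr))

struct demo_desc {
	struct demo_hdr hdr;
	uint8_t payload[DEMO_PAD];	/* whatever is left of the entry */
	struct demo_ctrl ctrl;
};
#undef DEMO_PAD

/* Compile-time check: the layout must fill the entry exactly. */
static_assert(sizeof(struct demo_desc) == DEMO_ESIZE,
    "descriptor must be exactly one entry in size");
```

If either the header or the trailer grows, the payload shrinks automatically, and the assertion catches any layout (for example, one with compiler-inserted padding) that no longer adds up to the entry size.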
enum {
/* iq flags */
IQ_ALLOCATED = (1 << 0), /* firmware resources allocated */
IQ_HAS_FL = (1 << 1), /* iq associated with a freelist */
IQ_INTR = (1 << 2), /* iq takes direct interrupt */
IQ_LRO_ENABLED = (1 << 3), /* iq is an eth rxq with LRO enabled */
/* iq state */
IQS_DISABLED = 0,
IQS_BUSY = 1,
IQS_IDLE = 2,
};
/*
* Ingress Queue: T4 is producer, driver is consumer.
*/
struct sge_iq {
uint32_t flags;
volatile int state;
struct adapter *adapter;
struct iq_desc *desc; /* KVA of descriptor ring */
int8_t intr_pktc_idx; /* packet count threshold index */
uint8_t gen; /* generation bit */
uint8_t intr_params; /* interrupt holdoff parameters */
uint8_t intr_next; /* XXX: holdoff for next interrupt */
uint16_t qsize; /* size (# of entries) of the queue */
uint16_t sidx; /* index of the entry with the status page */
uint16_t cidx; /* consumer index */
uint16_t cntxt_id; /* SGE context id for the iq */
uint16_t abs_id; /* absolute SGE id for the iq */
STAILQ_ENTRY(sge_iq) link;
bus_dma_tag_t desc_tag;
bus_dmamap_t desc_map;
bus_addr_t ba; /* bus address of descriptor ring */
};
enum {
EQ_CTRL = 1,
EQ_ETH = 2,
EQ_OFLD = 3,
/* eq flags */
EQ_TYPEMASK = 0x3, /* 2 lsbits hold the type (see above) */
EQ_ALLOCATED = (1 << 2), /* firmware resources allocated */
EQ_ENABLED = (1 << 3), /* open for business */
};
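The enum above packs the queue type into the two least significant bits of `flags` and uses the higher bits as independent flags. A minimal decoding sketch follows; the `DEMO_` constants copy the values shown above, while `demo_eq_type` is a hypothetical helper that does not appear in the header.

```c
#include <assert.h>

enum {
	DEMO_EQ_CTRL = 1,
	DEMO_EQ_ETH = 2,
	DEMO_EQ_OFLD = 3,
	DEMO_EQ_TYPEMASK = 0x3,		/* 2 lsbits hold the type */
	DEMO_EQ_ALLOCATED = (1 << 2),
	DEMO_EQ_ENABLED = (1 << 3),
};

/* Extract the queue type from a flags word. */
static unsigned int
demo_eq_type(unsigned int flags)
{
	unsigned int type = flags & DEMO_EQ_TYPEMASK;

	/* Zero is not one of the defined types. */
	assert(type != 0);
	return (type);
}
```

Since the types start at 1 and the mask is `0x3`, all three types fit in the two low bits and the first free flag bit is bit 2, which matches `EQ_ALLOCATED = (1 << 2)`.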
/* Listed in order of preference. Update t4_sysctls too if you change these */
enum {DOORBELL_UDB, DOORBELL_WCWR, DOORBELL_UDBWC, DOORBELL_KDB};
/*
* Egress Queue: driver is producer, T4 is consumer.
*
* Note: A free list is an egress queue (driver produces the buffers and T4
* consumes them) but it's special enough to have its own struct (see sge_fl).
*/
struct sge_eq {
unsigned int flags; /* MUST be first */
unsigned int cntxt_id; /* SGE context id for the eq */
struct mtx eq_lock;
struct tx_desc *desc; /* KVA of descriptor ring */
uint16_t doorbells;
volatile uint32_t *udb; /* KVA of doorbell (lies within BAR2) */
u_int udb_qid; /* relative qid within the doorbell page */
uint16_t sidx; /* index of the entry with the status page */
uint16_t cidx; /* consumer idx (desc idx) */
uint16_t pidx; /* producer idx (desc idx) */
uint16_t equeqidx; /* EQUEQ last requested at this pidx */
uint16_t dbidx; /* pidx of the most recent doorbell */
uint16_t iqid; /* iq that gets egr_update for the eq */
uint8_t tx_chan; /* tx channel used by the eq */
volatile u_int equiq; /* EQUIQ outstanding */
bus_dma_tag_t desc_tag;
bus_dmamap_t desc_map;
bus_addr_t ba; /* bus address of descriptor ring */
char lockname[16];
};
struct sw_zone_info {
uma_zone_t zone; /* zone that this cluster comes from */
int size; /* size of cluster: 2K, 4K, 9K, 16K, etc. */
int type; /* EXT_xxx type of the cluster */
int8_t head_hwidx;
int8_t tail_hwidx;
};
struct hw_buf_info {
int8_t zidx; /* backpointer to zone; -ve means unused */
int8_t next; /* next hwidx for this zone; -1 means no more */
int size;
};
enum {
NUM_MEMWIN = 3,
MEMWIN0_APERTURE = 2048,
MEMWIN0_BASE = 0x1b800,
MEMWIN1_APERTURE = 32768,
MEMWIN1_BASE = 0x28000,
MEMWIN2_APERTURE_T4 = 65536,
MEMWIN2_BASE_T4 = 0x30000,
MEMWIN2_APERTURE_T5 = 128 * 1024,
MEMWIN2_BASE_T5 = 0x60000,
};
struct memwin {
struct rwlock mw_lock __aligned(CACHE_LINE_SIZE);
uint32_t mw_base; /* constant after setup_memwin */
uint32_t mw_aperture; /* ditto */
uint32_t mw_curpos; /* protected by mw_lock */
};
enum {
FL_STARVING = (1 << 0), /* on the adapter's list of starving fl's */
FL_DOOMED = (1 << 1), /* about to be destroyed */
FL_BUF_PACKING = (1 << 2), /* buffer packing enabled */
FL_BUF_RESUME = (1 << 3), /* resume from the middle of the frame */
};
#define FL_RUNNING_LOW(fl) \
(IDXDIFF(fl->dbidx * 8, fl->cidx, fl->sidx * 8) <= fl->lowat)
#define FL_NOT_RUNNING_LOW(fl) \
(IDXDIFF(fl->dbidx * 8, fl->cidx, fl->sidx * 8) >= 2 * fl->lowat)
struct sge_fl {
struct mtx fl_lock;
__be64 *desc; /* KVA of descriptor ring, ptr to addresses */
struct fl_sdesc *sdesc; /* KVA of software descriptor ring */
struct cluster_layout cll_def; /* default refill zone, layout */
uint16_t lowat; /* # of buffers <= this means fl needs help */
int flags;
uint16_t buf_boundary;
/* The 16b idx all deal with hw descriptors */
uint16_t dbidx; /* hw pidx after last doorbell */
uint16_t sidx; /* index of status page */
volatile uint16_t hw_cidx;
/* The 32b idx are all buffer idx, not hardware descriptor idx */
uint32_t cidx; /* consumer index */
uint32_t pidx; /* producer index */
uint32_t dbval;
u_int rx_offset; /* offset in fl buf (when buffer packing) */
volatile uint32_t *udb;
uint64_t mbuf_allocated;/* # of mbuf allocated from zone_mbuf */
uint64_t mbuf_inlined; /* # of mbuf created within clusters */
uint64_t cl_allocated; /* # of clusters allocated */
uint64_t cl_recycled; /* # of clusters recycled */
uint64_t cl_fast_recycled; /* # of clusters recycled (fast) */
/* These 3 are valid when FL_BUF_RESUME is set, stale otherwise. */
struct mbuf *m0;
struct mbuf **pnext;
u_int remaining;
uint16_t qsize; /* # of hw descriptors (status page included) */
uint16_t cntxt_id; /* SGE context id for the freelist */
TAILQ_ENTRY(sge_fl) link; /* All starving freelists */
bus_dma_tag_t desc_tag;
bus_dmamap_t desc_map;
char lockname[16];
bus_addr_t ba; /* bus address of descriptor ring */
struct cluster_layout cll_alt; /* alternate refill zone, layout */
};
struct mp_ring;
/* txq: SGE egress queue + what's needed for Ethernet NIC */
struct sge_txq {
struct sge_eq eq; /* MUST be first */
struct ifnet *ifp; /* the interface this txq belongs to */
struct mp_ring *r; /* tx software ring */
struct tx_sdesc *sdesc; /* KVA of software descriptor ring */
struct sglist *gl;
__be32 cpl_ctrl0; /* for convenience */
struct task tx_reclaim_task;
/* stats for common events first */
uint64_t txcsum; /* # of times hardware assisted with checksum */
uint64_t tso_wrs; /* # of TSO work requests */
uint64_t vlan_insertion;/* # of times VLAN tag was inserted */
uint64_t imm_wrs; /* # of work requests with immediate data */
uint64_t sgl_wrs; /* # of work requests with direct SGL */
uint64_t txpkt_wrs; /* # of txpkt work requests (not coalesced) */
uint64_t txpkts0_wrs; /* # of type0 coalesced tx work requests */
uint64_t txpkts1_wrs; /* # of type1 coalesced tx work requests */
uint64_t txpkts0_pkts; /* # of frames in type0 coalesced tx WRs */
uint64_t txpkts1_pkts; /* # of frames in type1 coalesced tx WRs */
/* stats for not-that-common events */
} __aligned(CACHE_LINE_SIZE);
/* rxq: SGE ingress queue + SGE free list + miscellaneous items */
struct sge_rxq {
struct sge_iq iq; /* MUST be first */
struct sge_fl fl; /* MUST follow iq */
struct ifnet *ifp; /* the interface this rxq belongs to */
#if defined(INET) || defined(INET6)
struct lro_ctrl lro; /* LRO state */
#endif
/* stats for common events first */
uint64_t rxcsum; /* # of times hardware assisted with checksum */
uint64_t vlan_extraction;/* # of times VLAN tag was extracted */
/* stats for not-that-common events */
} __aligned(CACHE_LINE_SIZE);
static inline struct sge_rxq *
iq_to_rxq(struct sge_iq *iq)
{
return (__containerof(iq, struct sge_rxq, iq));
}
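/*
 * Illustration only, not part of the driver: what the __containerof() call in
 * iq_to_rxq() amounts to, written with offsetof() so it builds in userland.
 * struct iq, struct fl and struct rxq are simplified stand-ins for sge_iq,
 * sge_fl and sge_rxq, and iq_to_rxq_sketch is a made-up name.
 */

```c
#include <assert.h>
#include <stddef.h>

struct iq { int cidx; };
struct fl { int pidx; };

/* Stand-in for struct sge_rxq: iq first, fl next, then other items. */
struct rxq {
	struct iq iq;
	struct fl fl;
	int id;
};

/* Step back from the embedded member to the start of its container. */
static inline struct rxq *
iq_to_rxq_sketch(struct iq *iq)
{
	return ((struct rxq *)(void *)((char *)iq - offsetof(struct rxq, iq)));
}

/* Given only the iq pointer, the enclosing rxq and its fl are reachable. */
static inline void
containerof_example(void)
{
	struct rxq r = { .id = 7 };
	struct rxq *p = iq_to_rxq_sketch(&r.iq);

	assert(p == &r);
	assert(p->id == 7);
	assert(&p->fl == &r.fl);
}
```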
/* ofld_rxq: SGE ingress queue + SGE free list + miscellaneous items */
struct sge_ofld_rxq {
struct sge_iq iq; /* MUST be first */
struct sge_fl fl; /* MUST follow iq */
} __aligned(CACHE_LINE_SIZE);
static inline struct sge_ofld_rxq *
iq_to_ofld_rxq(struct sge_iq *iq)
{
return (__containerof(iq, struct sge_ofld_rxq, iq));
}
struct wrqe {
STAILQ_ENTRY(wrqe) link;
struct sge_wrq *wrq;
int wr_len;
char wr[] __aligned(16);
};
struct wrq_cookie {
TAILQ_ENTRY(wrq_cookie) link;
int ndesc;
int pidx;
};
/*
* wrq: SGE egress queue that is given prebuilt work requests. Both the control
* and offload tx queues are of this type.
*/
struct sge_wrq {
struct sge_eq eq; /* MUST be first */
struct adapter *adapter;
struct task wrq_tx_task;
/* Tx desc reserved but WR not "committed" yet. */
TAILQ_HEAD(wrq_incomplete_wrs , wrq_cookie) incomplete_wrs;
/* List of WRs ready to go out as soon as descriptors are available. */
STAILQ_HEAD(, wrqe) wr_list;
u_int nwr_pending;
u_int ndesc_needed;
/* stats for common events first */
uint64_t tx_wrs_direct; /* # of WRs written directly to desc ring. */
uint64_t tx_wrs_ss; /* # of WRs copied from scratch space. */
uint64_t tx_wrs_copied; /* # of WRs queued and copied to desc ring. */
/* stats for not-that-common events */
/*
* Scratch space for work requests that wrap around after reaching the
* status page, and some information about the last WR that used it.
*/
uint16_t ss_pidx;
uint16_t ss_len;
uint8_t ss[SGE_MAX_WR_LEN];
} __aligned(CACHE_LINE_SIZE);
struct sge_nm_rxq {
struct vi_info *vi;
struct iq_desc *iq_desc;
uint16_t iq_abs_id;
uint16_t iq_cntxt_id;
uint16_t iq_cidx;
uint16_t iq_sidx;
uint8_t iq_gen;
__be64 *fl_desc;
uint16_t fl_cntxt_id;
uint32_t fl_cidx;
uint32_t fl_pidx;
uint32_t fl_sidx;
uint32_t fl_db_val;
u_int fl_hwidx:4;
u_int nid; /* netmap ring # for this queue */
/* infrequently used items after this */
bus_dma_tag_t iq_desc_tag;
bus_dmamap_t iq_desc_map;
bus_addr_t iq_ba;
int intr_idx;
bus_dma_tag_t fl_desc_tag;
bus_dmamap_t fl_desc_map;
bus_addr_t fl_ba;
} __aligned(CACHE_LINE_SIZE);
struct sge_nm_txq {
struct tx_desc *desc;
uint16_t cidx;
uint16_t pidx;
uint16_t sidx;
uint16_t equiqidx; /* EQUIQ last requested at this pidx */
uint16_t equeqidx; /* EQUEQ last requested at this pidx */
uint16_t dbidx; /* pidx of the most recent doorbell */
uint16_t doorbells;
volatile uint32_t *udb;
u_int udb_qid;
u_int cntxt_id;
__be32 cpl_ctrl0; /* for convenience */
u_int nid; /* netmap ring # for this queue */
/* infrequently used items after this */
bus_dma_tag_t desc_tag;
bus_dmamap_t desc_map;
bus_addr_t ba;
int iqidx;
} __aligned(CACHE_LINE_SIZE);
struct sge {
int nrxq; /* total # of Ethernet rx queues */
int ntxq; /* total # of Ethernet tx queues */
int nofldrxq; /* total # of TOE rx queues */
int nofldtxq; /* total # of TOE tx queues */
int nnmrxq; /* total # of netmap rx queues */
int nnmtxq; /* total # of netmap tx queues */
int niq; /* total # of ingress queues */
int neq; /* total # of egress queues */
struct sge_iq fwq; /* Firmware event queue */
struct sge_wrq mgmtq; /* Management queue (control queue) */
struct sge_wrq *ctrlq; /* Control queues */
struct sge_txq *txq; /* NIC tx queues */
struct sge_rxq *rxq; /* NIC rx queues */
struct sge_wrq *ofld_txq; /* TOE tx queues */
struct sge_ofld_rxq *ofld_rxq; /* TOE rx queues */
struct sge_nm_txq *nm_txq; /* netmap tx queues */
struct sge_nm_rxq *nm_rxq; /* netmap rx queues */
uint16_t iq_start;
int eq_start;
struct sge_iq **iqmap; /* iq->cntxt_id to iq mapping */
struct sge_eq **eqmap; /* eq->cntxt_id to eq mapping */
int8_t safe_hwidx1; /* may not have room for metadata */
int8_t safe_hwidx2; /* with room for metadata and maybe more */
struct sw_zone_info sw_zone_info[SW_ZONE_SIZES];
struct hw_buf_info hw_buf_info[SGE_FLBUF_SIZES];
};
struct rss_header;
typedef int (*cpl_handler_t)(struct sge_iq *, const struct rss_header *,
struct mbuf *);
typedef int (*an_handler_t)(struct sge_iq *, const struct rsp_ctrl *);
typedef int (*fw_msg_handler_t)(struct adapter *, const __be64 *);
struct adapter {
SLIST_ENTRY(adapter) link;
device_t dev;
struct cdev *cdev;
/* PCIe register resources */
int regs_rid;
struct resource *regs_res;
int msix_rid;
struct resource *msix_res;
bus_space_handle_t bh;
bus_space_tag_t bt;
bus_size_t mmio_len;
int udbs_rid;
struct resource *udbs_res;
volatile uint8_t *udbs_base;
unsigned int pf;
unsigned int mbox;
unsigned int vpd_busy;
unsigned int vpd_flag;
/* Interrupt information */
int intr_type;
int intr_count;
struct irq {
struct resource *res;
int rid;
void *tag;
} *irq;
bus_dma_tag_t dmat; /* Parent DMA tag */
struct sge sge;
int lro_timeout;
struct taskqueue *tq[MAX_NCHAN]; /* General purpose taskqueues */
struct port_info *port[MAX_NPORTS];
uint8_t chan_map[MAX_NCHAN];
void *tom_softc; /* (struct tom_data *) */
struct tom_tunables tt;
void *iwarp_softc; /* (struct c4iw_dev *) */
void *iscsi_ulp_softc; /* (struct cxgbei_data *) */
struct l2t_data *l2t; /* L2 table */
struct tid_info tids;
uint16_t doorbells;
int offload_map; /* ports with IFCAP_TOE enabled */
int active_ulds; /* ULDs activated on this adapter */
int flags;
int debug_flags;
char ifp_lockname[16];
struct mtx ifp_lock;
struct ifnet *ifp; /* tracer ifp */
struct ifmedia media;
int traceq; /* iq used by all tracers, -1 if none */
int tracer_valid; /* bitmap of valid tracers */
int tracer_enabled; /* bitmap of enabled tracers */
char fw_version[16];
char tp_version[16];
char exprom_version[16];
char cfg_file[32];
u_int cfcsum;
struct adapter_params params;
const struct chip_params *chip_params;
struct t4_virt_res vres;
uint16_t nbmcaps;
uint16_t linkcaps;
uint16_t switchcaps;
uint16_t niccaps;
uint16_t toecaps;
uint16_t rdmacaps;
uint16_t tlscaps;
uint16_t iscsicaps;
uint16_t fcoecaps;
struct sysctl_ctx_list ctx; /* from adapter_full_init to full_uninit */
struct mtx sc_lock;
char lockname[16];
/* Starving free lists */
struct mtx sfl_lock; /* same cache-line as sc_lock? but that's ok */
TAILQ_HEAD(, sge_fl) sfl;
struct callout sfl_callout;
struct mtx reg_lock; /* for indirect register access */
struct memwin memwin[NUM_MEMWIN]; /* memory windows */
an_handler_t an_handler __aligned(CACHE_LINE_SIZE);
fw_msg_handler_t fw_msg_handler[7]; /* NUM_FW6_TYPES */
cpl_handler_t cpl_handler[0xef]; /* NUM_CPL_CMDS */
const char *last_op;
const void *last_op_thr;
int last_op_flags;
int sc_do_rxcopy;
};
#define ADAPTER_LOCK(sc) mtx_lock(&(sc)->sc_lock)
#define ADAPTER_UNLOCK(sc) mtx_unlock(&(sc)->sc_lock)
#define ADAPTER_LOCK_ASSERT_OWNED(sc) mtx_assert(&(sc)->sc_lock, MA_OWNED)
#define ADAPTER_LOCK_ASSERT_NOTOWNED(sc) mtx_assert(&(sc)->sc_lock, MA_NOTOWNED)
#define ASSERT_SYNCHRONIZED_OP(sc) \
KASSERT(IS_BUSY(sc) && \
(mtx_owned(&(sc)->sc_lock) || (sc)->last_op_thr == curthread), \
("%s: operation not synchronized.", __func__))
#define PORT_LOCK(pi) mtx_lock(&(pi)->pi_lock)
#define PORT_UNLOCK(pi) mtx_unlock(&(pi)->pi_lock)
#define PORT_LOCK_ASSERT_OWNED(pi) mtx_assert(&(pi)->pi_lock, MA_OWNED)
#define PORT_LOCK_ASSERT_NOTOWNED(pi) mtx_assert(&(pi)->pi_lock, MA_NOTOWNED)
#define FL_LOCK(fl) mtx_lock(&(fl)->fl_lock)
#define FL_TRYLOCK(fl) mtx_trylock(&(fl)->fl_lock)
#define FL_UNLOCK(fl) mtx_unlock(&(fl)->fl_lock)
#define FL_LOCK_ASSERT_OWNED(fl) mtx_assert(&(fl)->fl_lock, MA_OWNED)
#define FL_LOCK_ASSERT_NOTOWNED(fl) mtx_assert(&(fl)->fl_lock, MA_NOTOWNED)
#define RXQ_FL_LOCK(rxq) FL_LOCK(&(rxq)->fl)
#define RXQ_FL_UNLOCK(rxq) FL_UNLOCK(&(rxq)->fl)
#define RXQ_FL_LOCK_ASSERT_OWNED(rxq) FL_LOCK_ASSERT_OWNED(&(rxq)->fl)
#define RXQ_FL_LOCK_ASSERT_NOTOWNED(rxq) FL_LOCK_ASSERT_NOTOWNED(&(rxq)->fl)
#define EQ_LOCK(eq) mtx_lock(&(eq)->eq_lock)
#define EQ_TRYLOCK(eq) mtx_trylock(&(eq)->eq_lock)
#define EQ_UNLOCK(eq) mtx_unlock(&(eq)->eq_lock)
#define EQ_LOCK_ASSERT_OWNED(eq) mtx_assert(&(eq)->eq_lock, MA_OWNED)
#define EQ_LOCK_ASSERT_NOTOWNED(eq) mtx_assert(&(eq)->eq_lock, MA_NOTOWNED)
#define TXQ_LOCK(txq) EQ_LOCK(&(txq)->eq)
#define TXQ_TRYLOCK(txq) EQ_TRYLOCK(&(txq)->eq)
#define TXQ_UNLOCK(txq) EQ_UNLOCK(&(txq)->eq)
#define TXQ_LOCK_ASSERT_OWNED(txq) EQ_LOCK_ASSERT_OWNED(&(txq)->eq)
#define TXQ_LOCK_ASSERT_NOTOWNED(txq) EQ_LOCK_ASSERT_NOTOWNED(&(txq)->eq)
#define CH_DUMP_MBOX(sc, mbox, data_reg) \
do { \
if (sc->debug_flags & DF_DUMP_MBOX) { \
log(LOG_NOTICE, \
"%s mbox %u: %016llx %016llx %016llx %016llx " \
"%016llx %016llx %016llx %016llx\n", \
device_get_nameunit(sc->dev), mbox, \
(unsigned long long)t4_read_reg64(sc, data_reg), \
(unsigned long long)t4_read_reg64(sc, data_reg + 8), \
(unsigned long long)t4_read_reg64(sc, data_reg + 16), \
(unsigned long long)t4_read_reg64(sc, data_reg + 24), \
(unsigned long long)t4_read_reg64(sc, data_reg + 32), \
(unsigned long long)t4_read_reg64(sc, data_reg + 40), \
(unsigned long long)t4_read_reg64(sc, data_reg + 48), \
(unsigned long long)t4_read_reg64(sc, data_reg + 56)); \
} \
} while (0)
#define for_each_txq(vi, iter, q) \
for (q = &vi->pi->adapter->sge.txq[vi->first_txq], iter = 0; \
iter < vi->ntxq; ++iter, ++q)
#define for_each_rxq(vi, iter, q) \
for (q = &vi->pi->adapter->sge.rxq[vi->first_rxq], iter = 0; \
iter < vi->nrxq; ++iter, ++q)
#define for_each_ofld_txq(vi, iter, q) \
for (q = &vi->pi->adapter->sge.ofld_txq[vi->first_ofld_txq], iter = 0; \
iter < vi->nofldtxq; ++iter, ++q)
#define for_each_ofld_rxq(vi, iter, q) \
for (q = &vi->pi->adapter->sge.ofld_rxq[vi->first_ofld_rxq], iter = 0; \
iter < vi->nofldrxq; ++iter, ++q)
#define for_each_nm_txq(vi, iter, q) \
for (q = &vi->pi->adapter->sge.nm_txq[vi->first_txq], iter = 0; \
iter < vi->ntxq; ++iter, ++q)
#define for_each_nm_rxq(vi, iter, q) \
for (q = &vi->pi->adapter->sge.nm_rxq[vi->first_rxq], iter = 0; \
iter < vi->nrxq; ++iter, ++q)
#define for_each_vi(_pi, _iter, _vi) \
for ((_vi) = (_pi)->vi, (_iter) = 0; (_iter) < (_pi)->nvi; \
++(_iter), ++(_vi))
#define IDXINCR(idx, incr, wrap) do { \
idx = wrap - idx > incr ? idx + incr : incr - (wrap - idx); \
} while (0)
#define IDXDIFF(head, tail, wrap) \
((head) >= (tail) ? (head) - (tail) : (wrap) - (tail) + (head))
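/*
 * Illustration only, not part of the driver: the wraparound arithmetic of
 * IDXINCR and IDXDIFF restated as plain functions so it can be checked in
 * userland.  The names idx_incr and idx_diff are made up for this sketch, and
 * it assumes idx < wrap and incr <= wrap, which the macros appear to rely on
 * as well.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Advance idx by incr on a ring of wrap entries (same test as IDXINCR). */
static inline uint16_t
idx_incr(uint16_t idx, uint16_t incr, uint16_t wrap)
{
	return (uint16_t)(wrap - idx > incr ? idx + incr : incr - (wrap - idx));
}

/* Distance from tail forward to head on a ring of wrap entries. */
static inline uint16_t
idx_diff(uint16_t head, uint16_t tail, uint16_t wrap)
{
	return (uint16_t)(head >= tail ? head - tail : wrap - tail + head);
}

/* Worked values for a ring of 8 entries. */
static inline void
idx_example(void)
{
	assert(idx_incr(2, 3, 8) == 5);	/* no wrap */
	assert(idx_incr(5, 3, 8) == 0);	/* lands exactly on the wrap point */
	assert(idx_incr(6, 3, 8) == 1);	/* wraps past the end */
	assert(idx_diff(6, 1, 8) == 5);	/* head ahead of tail */
	assert(idx_diff(1, 6, 8) == 3);	/* head has wrapped */
	assert(idx_diff(3, 3, 8) == 0);	/* empty */
}
```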
/* One for errors, one for firmware events */
#define T4_EXTRA_INTR 2
static inline uint32_t
t4_read_reg(struct adapter *sc, uint32_t reg)
{
return bus_space_read_4(sc->bt, sc->bh, reg);
}
static inline void
t4_write_reg(struct adapter *sc, uint32_t reg, uint32_t val)
{
bus_space_write_4(sc->bt, sc->bh, reg, val);
}
static inline uint64_t
t4_read_reg64(struct adapter *sc, uint32_t reg)
{
return t4_bus_space_read_8(sc->bt, sc->bh, reg);
}
static inline void
t4_write_reg64(struct adapter *sc, uint32_t reg, uint64_t val)
{
t4_bus_space_write_8(sc->bt, sc->bh, reg, val);
}
static inline void
t4_os_pci_read_cfg1(struct adapter *sc, int reg, uint8_t *val)
{
*val = pci_read_config(sc->dev, reg, 1);
}
static inline void
t4_os_pci_write_cfg1(struct adapter *sc, int reg, uint8_t val)
{
pci_write_config(sc->dev, reg, val, 1);
}
static inline void
t4_os_pci_read_cfg2(struct adapter *sc, int reg, uint16_t *val)
{
*val = pci_read_config(sc->dev, reg, 2);
}
static inline void
t4_os_pci_write_cfg2(struct adapter *sc, int reg, uint16_t val)
{
pci_write_config(sc->dev, reg, val, 2);
}
static inline void
t4_os_pci_read_cfg4(struct adapter *sc, int reg, uint32_t *val)
{
*val = pci_read_config(sc->dev, reg, 4);
}
static inline void
t4_os_pci_write_cfg4(struct adapter *sc, int reg, uint32_t val)
{
pci_write_config(sc->dev, reg, val, 4);
}
static inline struct port_info *
adap2pinfo(struct adapter *sc, int idx)
{
return (sc->port[idx]);
}
static inline void
t4_os_set_hw_addr(struct adapter *sc, int idx, uint8_t hw_addr[])
{
bcopy(hw_addr, sc->port[idx]->vi[0].hw_addr, ETHER_ADDR_LEN);
}
static inline bool
is_10G_port(const struct port_info *pi)
{
return ((pi->link_cfg.supported & FW_PORT_CAP_SPEED_10G) != 0);
}
static inline bool
is_40G_port(const struct port_info *pi)
{
return ((pi->link_cfg.supported & FW_PORT_CAP_SPEED_40G) != 0);
}
static inline int
port_top_speed(const struct port_info *pi)
{
if (pi->link_cfg.supported & FW_PORT_CAP_SPEED_100G)
return (100);
if (pi->link_cfg.supported & FW_PORT_CAP_SPEED_40G)
return (40);
if (pi->link_cfg.supported & FW_PORT_CAP_SPEED_10G)
return (10);
if (pi->link_cfg.supported & FW_PORT_CAP_SPEED_1G)
return (1);
return (0);
}
static inline int
tx_resume_threshold(struct sge_eq *eq)
{
/* not quite the same as qsize / 4, but this will do. */
return (eq->sidx / 4);
}
static inline int
t4_use_ldst(struct adapter *sc)
{
#ifdef notyet
return (sc->flags & FW_OK || !sc->use_bd);
#else
return (0);
#endif
}
/* t4_main.c */
int t4_os_find_pci_capability(struct adapter *, int);
int t4_os_pci_save_state(struct adapter *);
int t4_os_pci_restore_state(struct adapter *);
void t4_os_portmod_changed(const struct adapter *, int);
void t4_os_link_changed(struct adapter *, int, int, int);
void t4_iterate(void (*)(struct adapter *, void *), void *);
int t4_register_cpl_handler(struct adapter *, int, cpl_handler_t);
int t4_register_an_handler(struct adapter *, an_handler_t);
int t4_register_fw_msg_handler(struct adapter *, int, fw_msg_handler_t);
int t4_filter_rpl(struct sge_iq *, const struct rss_header *, struct mbuf *);
int begin_synchronized_op(struct adapter *, struct vi_info *, int, char *);
void doom_vi(struct adapter *, struct vi_info *);
void end_synchronized_op(struct adapter *, int);
int update_mac_settings(struct ifnet *, int);
int adapter_full_init(struct adapter *);
int adapter_full_uninit(struct adapter *);
uint64_t cxgbe_get_counter(struct ifnet *, ift_counter);
int vi_full_init(struct vi_info *);
int vi_full_uninit(struct vi_info *);
void vi_sysctls(struct vi_info *);
void vi_tick(void *);
#ifdef DEV_NETMAP
/* t4_netmap.c */
int create_netmap_ifnet(struct port_info *);
int destroy_netmap_ifnet(struct port_info *);
void t4_nm_intr(void *);
#endif
/* t4_sge.c */
void t4_sge_modload(void);
void t4_sge_modunload(void);
uint64_t t4_sge_extfree_refs(void);
void t4_init_sge_cpl_handlers(struct adapter *);
void t4_tweak_chip_settings(struct adapter *);
int t4_read_chip_settings(struct adapter *);
int t4_create_dma_tag(struct adapter *);
void t4_sge_sysctls(struct adapter *, struct sysctl_ctx_list *,
struct sysctl_oid_list *);
int t4_destroy_dma_tag(struct adapter *);
int t4_setup_adapter_queues(struct adapter *);
int t4_teardown_adapter_queues(struct adapter *);
int t4_setup_vi_queues(struct vi_info *);
int t4_teardown_vi_queues(struct vi_info *);
void t4_intr_all(void *);
void t4_intr(void *);
void t4_intr_err(void *);
void t4_intr_evt(void *);
void t4_wrq_tx_locked(struct adapter *, struct sge_wrq *, struct wrqe *);
void t4_update_fl_bufsize(struct ifnet *);
int parse_pkt(struct mbuf **);
void *start_wrq_wr(struct sge_wrq *, int, struct wrq_cookie *);
void commit_wrq_wr(struct sge_wrq *, void *, struct wrq_cookie *);
int tnl_cong(struct port_info *, int);
/* t4_tracer.c */
struct t4_tracer;
void t4_tracer_modload(void);
void t4_tracer_modunload(void);
void t4_tracer_port_detach(struct adapter *);
int t4_get_tracer(struct adapter *, struct t4_tracer *);
int t4_set_tracer(struct adapter *, struct t4_tracer *);
int t4_trace_pkt(struct sge_iq *, const struct rss_header *, struct mbuf *);
int t5_trace_pkt(struct sge_iq *, const struct rss_header *, struct mbuf *);
static inline struct wrqe *
alloc_wrqe(int wr_len, struct sge_wrq *wrq)
{
int len = offsetof(struct wrqe, wr) + wr_len;
struct wrqe *wr;
wr = malloc(len, M_CXGBE, M_NOWAIT);
if (__predict_false(wr == NULL))
return (NULL);
wr->wr_len = wr_len;
wr->wrq = wrq;
return (wr);
}
static inline void *
wrtod(struct wrqe *wr)
{
return (&wr->wr[0]);
}
static inline void
free_wrqe(struct wrqe *wr)
{
free(wr, M_CXGBE);
}
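/*
 * Illustration only, not part of the driver: the header-plus-payload layout
 * used by alloc_wrqe() and wrtod(), rebuilt with the userland malloc(3).
 * struct wr_item and the wr_item_* names are made up for this sketch, and the
 * __aligned(16) on the payload is left out because plain malloc() does not
 * promise that alignment everywhere.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header with a flexible array member; the payload is allocated with it. */
struct wr_item {
	int wr_len;
	char wr[];
};

/* One allocation covers the header and wr_len bytes of payload. */
static inline struct wr_item *
wr_item_alloc(int wr_len)
{
	struct wr_item *w;

	w = malloc(offsetof(struct wr_item, wr) + (size_t)wr_len);
	if (w == NULL)
		return (NULL);
	w->wr_len = wr_len;
	return (w);
}

/* Counterpart of wrtod(): pointer to the first payload byte. */
static inline void *
wr_item_payload(struct wr_item *w)
{
	return (&w->wr[0]);
}

/* The payload starts right where the fixed header ends. */
static inline void
wr_item_example(void)
{
	struct wr_item *w = wr_item_alloc(32);

	assert(w != NULL);
	assert(w->wr_len == 32);
	assert((char *)wr_item_payload(w) ==
	    (char *)w + offsetof(struct wr_item, wr));
	free(w);
}
```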
static inline void
t4_wrq_tx(struct adapter *sc, struct wrqe *wr)
{
struct sge_wrq *wrq = wr->wrq;
TXQ_LOCK(wrq);
t4_wrq_tx_locked(sc, wrq, wr);
TXQ_UNLOCK(wrq);
}
#endif
Index: projects/vnet/sys/dev/cxgbe/t4_main.c
===================================================================
--- projects/vnet/sys/dev/cxgbe/t4_main.c (revision 301546)
+++ projects/vnet/sys/dev/cxgbe/t4_main.c (revision 301547)
@@ -1,9442 +1,9561 @@
/*-
* Copyright (c) 2011 Chelsio Communications, Inc.
* All rights reserved.
* Written by: Navdeep Parhar
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include
__FBSDID("$FreeBSD$");
#include "opt_ddb.h"
#include "opt_inet.h"
#include "opt_inet6.h"
#include "opt_rss.h"
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#ifdef RSS
#include
#endif
#if defined(__i386__) || defined(__amd64__)
#include
#include
#endif
#ifdef DDB
#include
#include
#endif
#include "common/common.h"
#include "common/t4_msg.h"
#include "common/t4_regs.h"
#include "common/t4_regs_values.h"
#include "t4_ioctl.h"
#include "t4_l2t.h"
#include "t4_mp_ring.h"
/* T4 bus driver interface */
static int t4_probe(device_t);
static int t4_attach(device_t);
static int t4_detach(device_t);
static device_method_t t4_methods[] = {
DEVMETHOD(device_probe, t4_probe),
DEVMETHOD(device_attach, t4_attach),
DEVMETHOD(device_detach, t4_detach),
DEVMETHOD_END
};
static driver_t t4_driver = {
"t4nex",
t4_methods,
sizeof(struct adapter)
};
/* T4 port (cxgbe) interface */
static int cxgbe_probe(device_t);
static int cxgbe_attach(device_t);
static int cxgbe_detach(device_t);
static device_method_t cxgbe_methods[] = {
DEVMETHOD(device_probe, cxgbe_probe),
DEVMETHOD(device_attach, cxgbe_attach),
DEVMETHOD(device_detach, cxgbe_detach),
{ 0, 0 }
};
static driver_t cxgbe_driver = {
"cxgbe",
cxgbe_methods,
sizeof(struct port_info)
};
/* T4 VI (vcxgbe) interface */
static int vcxgbe_probe(device_t);
static int vcxgbe_attach(device_t);
static int vcxgbe_detach(device_t);
static device_method_t vcxgbe_methods[] = {
DEVMETHOD(device_probe, vcxgbe_probe),
DEVMETHOD(device_attach, vcxgbe_attach),
DEVMETHOD(device_detach, vcxgbe_detach),
{ 0, 0 }
};
static driver_t vcxgbe_driver = {
"vcxgbe",
vcxgbe_methods,
sizeof(struct vi_info)
};
static d_ioctl_t t4_ioctl;
static d_open_t t4_open;
static d_close_t t4_close;
static struct cdevsw t4_cdevsw = {
.d_version = D_VERSION,
.d_flags = 0,
.d_open = t4_open,
.d_close = t4_close,
.d_ioctl = t4_ioctl,
.d_name = "t4nex",
};
/* T5 bus driver interface */
static int t5_probe(device_t);
static device_method_t t5_methods[] = {
DEVMETHOD(device_probe, t5_probe),
DEVMETHOD(device_attach, t4_attach),
DEVMETHOD(device_detach, t4_detach),
DEVMETHOD_END
};
static driver_t t5_driver = {
"t5nex",
t5_methods,
sizeof(struct adapter)
};
/* T5 port (cxl) interface */
static driver_t cxl_driver = {
"cxl",
cxgbe_methods,
sizeof(struct port_info)
};
/* T5 VI (vcxl) interface */
static driver_t vcxl_driver = {
"vcxl",
vcxgbe_methods,
sizeof(struct vi_info)
};
static struct cdevsw t5_cdevsw = {
.d_version = D_VERSION,
.d_flags = 0,
.d_open = t4_open,
.d_close = t4_close,
.d_ioctl = t4_ioctl,
.d_name = "t5nex",
};
/* ifnet + media interface */
static void cxgbe_init(void *);
static int cxgbe_ioctl(struct ifnet *, unsigned long, caddr_t);
static int cxgbe_transmit(struct ifnet *, struct mbuf *);
static void cxgbe_qflush(struct ifnet *);
static int cxgbe_media_change(struct ifnet *);
static void cxgbe_media_status(struct ifnet *, struct ifmediareq *);
MALLOC_DEFINE(M_CXGBE, "cxgbe", "Chelsio T4/T5 Ethernet driver and services");
/*
* Correct lock order when you need to acquire multiple locks is t4_list_lock,
* then ADAPTER_LOCK, then t4_uld_list_lock.
*/
static struct sx t4_list_lock;
SLIST_HEAD(, adapter) t4_list;
#ifdef TCP_OFFLOAD
static struct sx t4_uld_list_lock;
SLIST_HEAD(, uld_info) t4_uld_list;
#endif
/*
* Tunables. See tweak_tunables() too.
*
* Each tunable is set to a default value here if it's known at compile-time.
* Otherwise it is set to -1 as an indication to tweak_tunables() that it should
* provide a reasonable default when the driver is loaded.
*
* Tunables applicable to both T4 and T5 are under hw.cxgbe. Those specific to
* T5 are under hw.cxl.
*/
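/*
 * Illustration only, not part of the driver: one way a load-time routine such
 * as tweak_tunables() could turn the -1 placeholder into a usable value.  The
 * helper name resolve_nqueues and the rule of capping the compiled-in default
 * by the CPU count are assumptions made for this sketch, not a description of
 * the real function.
 */

```c
#include <assert.h>

/*
 * Hypothetical helper: keep an explicit setting, otherwise fall back to the
 * compiled-in default, limited by the number of CPUs.
 */
static inline int
resolve_nqueues(int tunable, int compiled_default, int ncpus)
{
	if (tunable >= 1)
		return (tunable);	/* administrator's choice wins */
	/* Unset (-1) or not a sensible count: derive a default. */
	return (ncpus < compiled_default ? ncpus : compiled_default);
}

/* A 4-CPU box with the 10G defaults of 16 tx and 8 rx queues. */
static inline void
resolve_nqueues_example(void)
{
	assert(resolve_nqueues(-1, 16, 4) == 4);
	assert(resolve_nqueues(-1, 8, 4) == 4);
	assert(resolve_nqueues(2, 16, 4) == 2);
}
```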
/*
* Number of queues for tx and rx, 10G and 1G, NIC and offload.
*/
#define NTXQ_10G 16
static int t4_ntxq10g = -1;
TUNABLE_INT("hw.cxgbe.ntxq10g", &t4_ntxq10g);
#define NRXQ_10G 8
static int t4_nrxq10g = -1;
TUNABLE_INT("hw.cxgbe.nrxq10g", &t4_nrxq10g);
#define NTXQ_1G 4
static int t4_ntxq1g = -1;
TUNABLE_INT("hw.cxgbe.ntxq1g", &t4_ntxq1g);
#define NRXQ_1G 2
static int t4_nrxq1g = -1;
TUNABLE_INT("hw.cxgbe.nrxq1g", &t4_nrxq1g);
static int t4_rsrv_noflowq = 0;
TUNABLE_INT("hw.cxgbe.rsrv_noflowq", &t4_rsrv_noflowq);
#ifdef TCP_OFFLOAD
#define NOFLDTXQ_10G 8
static int t4_nofldtxq10g = -1;
TUNABLE_INT("hw.cxgbe.nofldtxq10g", &t4_nofldtxq10g);
#define NOFLDRXQ_10G 2
static int t4_nofldrxq10g = -1;
TUNABLE_INT("hw.cxgbe.nofldrxq10g", &t4_nofldrxq10g);
#define NOFLDTXQ_1G 2
static int t4_nofldtxq1g = -1;
TUNABLE_INT("hw.cxgbe.nofldtxq1g", &t4_nofldtxq1g);
#define NOFLDRXQ_1G 1
static int t4_nofldrxq1g = -1;
TUNABLE_INT("hw.cxgbe.nofldrxq1g", &t4_nofldrxq1g);
#endif
#ifdef DEV_NETMAP
#define NNMTXQ_10G 2
static int t4_nnmtxq10g = -1;
TUNABLE_INT("hw.cxgbe.nnmtxq10g", &t4_nnmtxq10g);
#define NNMRXQ_10G 2
static int t4_nnmrxq10g = -1;
TUNABLE_INT("hw.cxgbe.nnmrxq10g", &t4_nnmrxq10g);
#define NNMTXQ_1G 1
static int t4_nnmtxq1g = -1;
TUNABLE_INT("hw.cxgbe.nnmtxq1g", &t4_nnmtxq1g);
#define NNMRXQ_1G 1
static int t4_nnmrxq1g = -1;
TUNABLE_INT("hw.cxgbe.nnmrxq1g", &t4_nnmrxq1g);
#endif
/*
* Holdoff parameters for 10G and 1G ports.
*/
#define TMR_IDX_10G 1
static int t4_tmr_idx_10g = TMR_IDX_10G;
TUNABLE_INT("hw.cxgbe.holdoff_timer_idx_10G", &t4_tmr_idx_10g);
#define PKTC_IDX_10G (-1)
static int t4_pktc_idx_10g = PKTC_IDX_10G;
TUNABLE_INT("hw.cxgbe.holdoff_pktc_idx_10G", &t4_pktc_idx_10g);
#define TMR_IDX_1G 1
static int t4_tmr_idx_1g = TMR_IDX_1G;
TUNABLE_INT("hw.cxgbe.holdoff_timer_idx_1G", &t4_tmr_idx_1g);
#define PKTC_IDX_1G (-1)
static int t4_pktc_idx_1g = PKTC_IDX_1G;
TUNABLE_INT("hw.cxgbe.holdoff_pktc_idx_1G", &t4_pktc_idx_1g);
/*
* Size (# of entries) of each tx and rx queue.
*/
static unsigned int t4_qsize_txq = TX_EQ_QSIZE;
TUNABLE_INT("hw.cxgbe.qsize_txq", &t4_qsize_txq);
static unsigned int t4_qsize_rxq = RX_IQ_QSIZE;
TUNABLE_INT("hw.cxgbe.qsize_rxq", &t4_qsize_rxq);
/*
* Interrupt types allowed (bits 0, 1, 2 = INTx, MSI, MSI-X respectively).
*/
static int t4_intr_types = INTR_MSIX | INTR_MSI | INTR_INTX;
TUNABLE_INT("hw.cxgbe.interrupt_types", &t4_intr_types);
/*
* Configuration file.
*/
#define DEFAULT_CF "default"
#define FLASH_CF "flash"
#define UWIRE_CF "uwire"
#define FPGA_CF "fpga"
static char t4_cfg_file[32] = DEFAULT_CF;
TUNABLE_STR("hw.cxgbe.config_file", t4_cfg_file, sizeof(t4_cfg_file));
/*
* PAUSE settings (bit 0, 1 = rx_pause, tx_pause respectively).
* rx_pause = 1 to heed incoming PAUSE frames, 0 to ignore them.
* tx_pause = 1 to emit PAUSE frames when the rx FIFO reaches its high water
* mark or when signalled to do so, 0 to never emit PAUSE.
*/
static int t4_pause_settings = PAUSE_TX | PAUSE_RX;
TUNABLE_INT("hw.cxgbe.pause_settings", &t4_pause_settings);
/*
* Firmware auto-install by driver during attach (0, 1, 2 = prohibited, allowed,
* encouraged respectively).
*/
static unsigned int t4_fw_install = 1;
TUNABLE_INT("hw.cxgbe.fw_install", &t4_fw_install);
/*
* ASIC features that will be used. Disable the ones you don't want so that the
* chip resources aren't wasted on features that will not be used.
*/
static int t4_nbmcaps_allowed = 0;
TUNABLE_INT("hw.cxgbe.nbmcaps_allowed", &t4_nbmcaps_allowed);
static int t4_linkcaps_allowed = 0; /* No DCBX, PPP, etc. by default */
TUNABLE_INT("hw.cxgbe.linkcaps_allowed", &t4_linkcaps_allowed);
static int t4_switchcaps_allowed = FW_CAPS_CONFIG_SWITCH_INGRESS |
FW_CAPS_CONFIG_SWITCH_EGRESS;
TUNABLE_INT("hw.cxgbe.switchcaps_allowed", &t4_switchcaps_allowed);
static int t4_niccaps_allowed = FW_CAPS_CONFIG_NIC;
TUNABLE_INT("hw.cxgbe.niccaps_allowed", &t4_niccaps_allowed);
static int t4_toecaps_allowed = -1;
TUNABLE_INT("hw.cxgbe.toecaps_allowed", &t4_toecaps_allowed);
static int t4_rdmacaps_allowed = -1;
TUNABLE_INT("hw.cxgbe.rdmacaps_allowed", &t4_rdmacaps_allowed);
static int t4_tlscaps_allowed = 0;
TUNABLE_INT("hw.cxgbe.tlscaps_allowed", &t4_tlscaps_allowed);
static int t4_iscsicaps_allowed = -1;
TUNABLE_INT("hw.cxgbe.iscsicaps_allowed", &t4_iscsicaps_allowed);
static int t4_fcoecaps_allowed = 0;
TUNABLE_INT("hw.cxgbe.fcoecaps_allowed", &t4_fcoecaps_allowed);
static int t5_write_combine = 0;
TUNABLE_INT("hw.cxl.write_combine", &t5_write_combine);
static int t4_num_vis = 1;
TUNABLE_INT("hw.cxgbe.num_vis", &t4_num_vis);
/* Functions used by extra VIs to obtain unique MAC addresses for each VI. */
static int vi_mac_funcs[] = {
FW_VI_FUNC_OFLD,
FW_VI_FUNC_IWARP,
FW_VI_FUNC_OPENISCSI,
FW_VI_FUNC_OPENFCOE,
FW_VI_FUNC_FOISCSI,
FW_VI_FUNC_FOFCOE,
};
struct intrs_and_queues {
uint16_t intr_type; /* INTx, MSI, or MSI-X */
uint16_t nirq; /* Total # of vectors */
uint16_t intr_flags_10g;/* Interrupt flags for each 10G port */
uint16_t intr_flags_1g; /* Interrupt flags for each 1G port */
uint16_t ntxq10g; /* # of NIC txq's for each 10G port */
uint16_t nrxq10g; /* # of NIC rxq's for each 10G port */
uint16_t ntxq1g; /* # of NIC txq's for each 1G port */
uint16_t nrxq1g; /* # of NIC rxq's for each 1G port */
uint16_t rsrv_noflowq; /* Flag whether to reserve queue 0 */
#ifdef TCP_OFFLOAD
uint16_t nofldtxq10g; /* # of TOE txq's for each 10G port */
uint16_t nofldrxq10g; /* # of TOE rxq's for each 10G port */
uint16_t nofldtxq1g; /* # of TOE txq's for each 1G port */
uint16_t nofldrxq1g; /* # of TOE rxq's for each 1G port */
#endif
#ifdef DEV_NETMAP
uint16_t nnmtxq10g; /* # of netmap txq's for each 10G port */
uint16_t nnmrxq10g; /* # of netmap rxq's for each 10G port */
uint16_t nnmtxq1g; /* # of netmap txq's for each 1G port */
uint16_t nnmrxq1g; /* # of netmap rxq's for each 1G port */
#endif
};
struct filter_entry {
uint32_t valid:1; /* filter allocated and valid */
uint32_t locked:1; /* filter is administratively locked */
uint32_t pending:1; /* filter action is pending firmware reply */
uint32_t smtidx:8; /* Source MAC Table index for smac */
struct l2t_entry *l2t; /* Layer Two Table entry for dmac */
struct t4_filter_specification fs;
};
static int map_bars_0_and_4(struct adapter *);
static int map_bar_2(struct adapter *);
static void setup_memwin(struct adapter *);
static void position_memwin(struct adapter *, int, uint32_t);
static int rw_via_memwin(struct adapter *, int, uint32_t, uint32_t *, int, int);
static inline int read_via_memwin(struct adapter *, int, uint32_t, uint32_t *,
int);
static inline int write_via_memwin(struct adapter *, int, uint32_t,
const uint32_t *, int);
static int validate_mem_range(struct adapter *, uint32_t, int);
static int fwmtype_to_hwmtype(int);
static int validate_mt_off_len(struct adapter *, int, uint32_t, int,
uint32_t *);
static int fixup_devlog_params(struct adapter *);
static int cfg_itype_and_nqueues(struct adapter *, int, int, int,
struct intrs_and_queues *);
static int prep_firmware(struct adapter *);
static int partition_resources(struct adapter *, const struct firmware *,
const char *);
static int get_params__pre_init(struct adapter *);
static int get_params__post_init(struct adapter *);
static int set_params__post_init(struct adapter *);
static void t4_set_desc(struct adapter *);
static void build_medialist(struct port_info *, struct ifmedia *);
static int cxgbe_init_synchronized(struct vi_info *);
static int cxgbe_uninit_synchronized(struct vi_info *);
static int setup_intr_handlers(struct adapter *);
static void quiesce_txq(struct adapter *, struct sge_txq *);
static void quiesce_wrq(struct adapter *, struct sge_wrq *);
static void quiesce_iq(struct adapter *, struct sge_iq *);
static void quiesce_fl(struct adapter *, struct sge_fl *);
static int t4_alloc_irq(struct adapter *, struct irq *, int rid,
driver_intr_t *, void *, char *);
static int t4_free_irq(struct adapter *, struct irq *);
static void get_regs(struct adapter *, struct t4_regdump *, uint8_t *);
static void vi_refresh_stats(struct adapter *, struct vi_info *);
static void cxgbe_refresh_stats(struct adapter *, struct port_info *);
static void cxgbe_tick(void *);
static void cxgbe_vlan_config(void *, struct ifnet *, uint16_t);
static int cpl_not_handled(struct sge_iq *, const struct rss_header *,
struct mbuf *);
static int an_not_handled(struct sge_iq *, const struct rsp_ctrl *);
static int fw_msg_not_handled(struct adapter *, const __be64 *);
static void t4_sysctls(struct adapter *);
static void cxgbe_sysctls(struct port_info *);
static int sysctl_int_array(SYSCTL_HANDLER_ARGS);
static int sysctl_bitfield(SYSCTL_HANDLER_ARGS);
static int sysctl_btphy(SYSCTL_HANDLER_ARGS);
static int sysctl_noflowq(SYSCTL_HANDLER_ARGS);
static int sysctl_holdoff_tmr_idx(SYSCTL_HANDLER_ARGS);
static int sysctl_holdoff_pktc_idx(SYSCTL_HANDLER_ARGS);
static int sysctl_qsize_rxq(SYSCTL_HANDLER_ARGS);
static int sysctl_qsize_txq(SYSCTL_HANDLER_ARGS);
static int sysctl_pause_settings(SYSCTL_HANDLER_ARGS);
static int sysctl_handle_t4_reg64(SYSCTL_HANDLER_ARGS);
static int sysctl_temperature(SYSCTL_HANDLER_ARGS);
#ifdef SBUF_DRAIN
static int sysctl_cctrl(SYSCTL_HANDLER_ARGS);
static int sysctl_cim_ibq_obq(SYSCTL_HANDLER_ARGS);
static int sysctl_cim_la(SYSCTL_HANDLER_ARGS);
static int sysctl_cim_la_t6(SYSCTL_HANDLER_ARGS);
static int sysctl_cim_ma_la(SYSCTL_HANDLER_ARGS);
static int sysctl_cim_pif_la(SYSCTL_HANDLER_ARGS);
static int sysctl_cim_qcfg(SYSCTL_HANDLER_ARGS);
static int sysctl_cpl_stats(SYSCTL_HANDLER_ARGS);
static int sysctl_ddp_stats(SYSCTL_HANDLER_ARGS);
static int sysctl_devlog(SYSCTL_HANDLER_ARGS);
static int sysctl_fcoe_stats(SYSCTL_HANDLER_ARGS);
static int sysctl_hw_sched(SYSCTL_HANDLER_ARGS);
static int sysctl_lb_stats(SYSCTL_HANDLER_ARGS);
static int sysctl_linkdnrc(SYSCTL_HANDLER_ARGS);
static int sysctl_meminfo(SYSCTL_HANDLER_ARGS);
static int sysctl_mps_tcam(SYSCTL_HANDLER_ARGS);
static int sysctl_mps_tcam_t6(SYSCTL_HANDLER_ARGS);
static int sysctl_path_mtus(SYSCTL_HANDLER_ARGS);
static int sysctl_pm_stats(SYSCTL_HANDLER_ARGS);
static int sysctl_rdma_stats(SYSCTL_HANDLER_ARGS);
static int sysctl_tcp_stats(SYSCTL_HANDLER_ARGS);
static int sysctl_tids(SYSCTL_HANDLER_ARGS);
static int sysctl_tp_err_stats(SYSCTL_HANDLER_ARGS);
static int sysctl_tp_la_mask(SYSCTL_HANDLER_ARGS);
static int sysctl_tp_la(SYSCTL_HANDLER_ARGS);
static int sysctl_tx_rate(SYSCTL_HANDLER_ARGS);
static int sysctl_ulprx_la(SYSCTL_HANDLER_ARGS);
static int sysctl_wcwr_stats(SYSCTL_HANDLER_ARGS);
+static int sysctl_tc_params(SYSCTL_HANDLER_ARGS);
#endif
#ifdef TCP_OFFLOAD
static int sysctl_tp_tick(SYSCTL_HANDLER_ARGS);
static int sysctl_tp_dack_timer(SYSCTL_HANDLER_ARGS);
static int sysctl_tp_timer(SYSCTL_HANDLER_ARGS);
#endif
static uint32_t fconf_iconf_to_mode(uint32_t, uint32_t);
static uint32_t mode_to_fconf(uint32_t);
static uint32_t mode_to_iconf(uint32_t);
static int check_fspec_against_fconf_iconf(struct adapter *,
struct t4_filter_specification *);
static int get_filter_mode(struct adapter *, uint32_t *);
static int set_filter_mode(struct adapter *, uint32_t);
static inline uint64_t get_filter_hits(struct adapter *, uint32_t);
static int get_filter(struct adapter *, struct t4_filter *);
static int set_filter(struct adapter *, struct t4_filter *);
static int del_filter(struct adapter *, struct t4_filter *);
static void clear_filter(struct filter_entry *);
static int set_filter_wr(struct adapter *, int);
static int del_filter_wr(struct adapter *, int);
static int get_sge_context(struct adapter *, struct t4_sge_context *);
static int load_fw(struct adapter *, struct t4_data *);
static int read_card_mem(struct adapter *, int, struct t4_mem_range *);
static int read_i2c(struct adapter *, struct t4_i2c_data *);
static int set_sched_class(struct adapter *, struct t4_sched_params *);
static int set_sched_queue(struct adapter *, struct t4_sched_queue *);
#ifdef TCP_OFFLOAD
static int toe_capability(struct vi_info *, int);
#endif
static int mod_event(module_t, int, void *);
struct {
uint16_t device;
char *desc;
} t4_pciids[] = {
{0xa000, "Chelsio Terminator 4 FPGA"},
{0x4400, "Chelsio T440-dbg"},
{0x4401, "Chelsio T420-CR"},
{0x4402, "Chelsio T422-CR"},
{0x4403, "Chelsio T440-CR"},
{0x4404, "Chelsio T420-BCH"},
{0x4405, "Chelsio T440-BCH"},
{0x4406, "Chelsio T440-CH"},
{0x4407, "Chelsio T420-SO"},
{0x4408, "Chelsio T420-CX"},
{0x4409, "Chelsio T420-BT"},
{0x440a, "Chelsio T404-BT"},
{0x440e, "Chelsio T440-LP-CR"},
}, t5_pciids[] = {
{0xb000, "Chelsio Terminator 5 FPGA"},
{0x5400, "Chelsio T580-dbg"},
{0x5401, "Chelsio T520-CR"}, /* 2 x 10G */
{0x5402, "Chelsio T522-CR"}, /* 2 x 10G, 2 X 1G */
{0x5403, "Chelsio T540-CR"}, /* 4 x 10G */
{0x5407, "Chelsio T520-SO"}, /* 2 x 10G, nomem */
{0x5409, "Chelsio T520-BT"}, /* 2 x 10GBaseT */
{0x540a, "Chelsio T504-BT"}, /* 4 x 1G */
{0x540d, "Chelsio T580-CR"}, /* 2 x 40G */
{0x540e, "Chelsio T540-LP-CR"}, /* 4 x 10G */
{0x5410, "Chelsio T580-LP-CR"}, /* 2 x 40G */
{0x5411, "Chelsio T520-LL-CR"}, /* 2 x 10G */
{0x5412, "Chelsio T560-CR"}, /* 1 x 40G, 2 x 10G */
{0x5414, "Chelsio T580-LP-SO-CR"}, /* 2 x 40G, nomem */
{0x5415, "Chelsio T502-BT"}, /* 2 x 1G */
#ifdef notyet
{0x5404, "Chelsio T520-BCH"},
{0x5405, "Chelsio T540-BCH"},
{0x5406, "Chelsio T540-CH"},
{0x5408, "Chelsio T520-CX"},
{0x540b, "Chelsio B520-SR"},
{0x540c, "Chelsio B504-BT"},
{0x540f, "Chelsio Amsterdam"},
{0x5413, "Chelsio T580-CHR"},
#endif
};
#ifdef TCP_OFFLOAD
/*
* service_iq() has an iq and needs the fl. Offset of fl from the iq should be
* exactly the same for both rxq and ofld_rxq.
*/
CTASSERT(offsetof(struct sge_ofld_rxq, iq) == offsetof(struct sge_rxq, iq));
CTASSERT(offsetof(struct sge_ofld_rxq, fl) == offsetof(struct sge_rxq, fl));
#endif
/* No easy way to include t4_msg.h before adapter.h so we check this way */
CTASSERT(nitems(((struct adapter *)0)->cpl_handler) == NUM_CPL_CMDS);
CTASSERT(nitems(((struct adapter *)0)->fw_msg_handler) == NUM_FW6_TYPES);
CTASSERT(sizeof(struct cluster_metadata) <= CL_METADATA_SIZE);
static int
t4_probe(device_t dev)
{
int i;
uint16_t v = pci_get_vendor(dev);
uint16_t d = pci_get_device(dev);
uint8_t f = pci_get_function(dev);
if (v != PCI_VENDOR_ID_CHELSIO)
return (ENXIO);
/* Attach only to PF0 of the FPGA */
if (d == 0xa000 && f != 0)
return (ENXIO);
for (i = 0; i < nitems(t4_pciids); i++) {
if (d == t4_pciids[i].device) {
device_set_desc(dev, t4_pciids[i].desc);
return (BUS_PROBE_DEFAULT);
}
}
return (ENXIO);
}
static int
t5_probe(device_t dev)
{
int i;
uint16_t v = pci_get_vendor(dev);
uint16_t d = pci_get_device(dev);
uint8_t f = pci_get_function(dev);
if (v != PCI_VENDOR_ID_CHELSIO)
return (ENXIO);
/* Attach only to PF0 of the FPGA */
if (d == 0xb000 && f != 0)
return (ENXIO);
for (i = 0; i < nitems(t5_pciids); i++) {
if (d == t5_pciids[i].device) {
device_set_desc(dev, t5_pciids[i].desc);
return (BUS_PROBE_DEFAULT);
}
}
return (ENXIO);
}
static void
t5_attribute_workaround(device_t dev)
{
device_t root_port;
uint32_t v;
/*
* The T5 chips do not properly echo the No Snoop and Relaxed
* Ordering attributes when replying to a TLP from a Root
* Port. As a workaround, find the parent Root Port and
* disable No Snoop and Relaxed Ordering. Note that this
* affects all devices under this root port.
*/
root_port = pci_find_pcie_root_port(dev);
if (root_port == NULL) {
device_printf(dev, "Unable to find parent root port\n");
return;
}
v = pcie_adjust_config(root_port, PCIER_DEVICE_CTL,
PCIEM_CTL_RELAXED_ORD_ENABLE | PCIEM_CTL_NOSNOOP_ENABLE, 0, 2);
if ((v & (PCIEM_CTL_RELAXED_ORD_ENABLE | PCIEM_CTL_NOSNOOP_ENABLE)) !=
0)
device_printf(dev, "Disabled No Snoop/Relaxed Ordering on %s\n",
device_get_nameunit(root_port));
}
static int
t4_attach(device_t dev)
{
struct adapter *sc;
int rc = 0, i, j, n10g, n1g, rqidx, tqidx;
struct intrs_and_queues iaq;
struct sge *s;
uint8_t *buf;
#ifdef TCP_OFFLOAD
int ofld_rqidx, ofld_tqidx;
#endif
#ifdef DEV_NETMAP
int nm_rqidx, nm_tqidx;
#endif
int num_vis;
sc = device_get_softc(dev);
sc->dev = dev;
TUNABLE_INT_FETCH("hw.cxgbe.debug_flags", &sc->debug_flags);
if ((pci_get_device(dev) & 0xff00) == 0x5400)
t5_attribute_workaround(dev);
pci_enable_busmaster(dev);
if (pci_find_cap(dev, PCIY_EXPRESS, &i) == 0) {
uint32_t v;
pci_set_max_read_req(dev, 4096);
v = pci_read_config(dev, i + PCIER_DEVICE_CTL, 2);
v |= PCIEM_CTL_RELAXED_ORD_ENABLE;
pci_write_config(dev, i + PCIER_DEVICE_CTL, v, 2);
sc->params.pci.mps = 128 << ((v & PCIEM_CTL_MAX_PAYLOAD) >> 5);
}
sc->traceq = -1;
mtx_init(&sc->ifp_lock, sc->ifp_lockname, 0, MTX_DEF);
snprintf(sc->ifp_lockname, sizeof(sc->ifp_lockname), "%s tracer",
device_get_nameunit(dev));
snprintf(sc->lockname, sizeof(sc->lockname), "%s",
device_get_nameunit(dev));
mtx_init(&sc->sc_lock, sc->lockname, 0, MTX_DEF);
sx_xlock(&t4_list_lock);
SLIST_INSERT_HEAD(&t4_list, sc, link);
sx_xunlock(&t4_list_lock);
mtx_init(&sc->sfl_lock, "starving freelists", 0, MTX_DEF);
TAILQ_INIT(&sc->sfl);
callout_init_mtx(&sc->sfl_callout, &sc->sfl_lock, 0);
mtx_init(&sc->reg_lock, "indirect register access", 0, MTX_DEF);
rc = map_bars_0_and_4(sc);
if (rc != 0)
goto done; /* error message displayed already */
/*
* This is the real PF# to which we're attaching. Works from within PCI
* passthrough environments too, where pci_get_function() could return a
* different PF# depending on the passthrough configuration. We need to
* use the real PF# in all our communication with the firmware.
*/
sc->pf = G_SOURCEPF(t4_read_reg(sc, A_PL_WHOAMI));
sc->mbox = sc->pf;
memset(sc->chan_map, 0xff, sizeof(sc->chan_map));
sc->an_handler = an_not_handled;
for (i = 0; i < nitems(sc->cpl_handler); i++)
sc->cpl_handler[i] = cpl_not_handled;
for (i = 0; i < nitems(sc->fw_msg_handler); i++)
sc->fw_msg_handler[i] = fw_msg_not_handled;
t4_register_cpl_handler(sc, CPL_SET_TCB_RPL, t4_filter_rpl);
t4_register_cpl_handler(sc, CPL_TRACE_PKT, t4_trace_pkt);
t4_register_cpl_handler(sc, CPL_T5_TRACE_PKT, t5_trace_pkt);
t4_init_sge_cpl_handlers(sc);
/* Prepare the adapter for operation. */
buf = malloc(PAGE_SIZE, M_CXGBE, M_ZERO | M_WAITOK);
rc = -t4_prep_adapter(sc, buf);
free(buf, M_CXGBE);
if (rc != 0) {
device_printf(dev, "failed to prepare adapter: %d.\n", rc);
goto done;
}
/*
* Do this really early, with the memory windows set up even before the
* character device. The userland tool's register i/o and mem read
* will work even in "recovery mode".
*/
setup_memwin(sc);
if (t4_init_devlog_params(sc, 0) == 0)
fixup_devlog_params(sc);
sc->cdev = make_dev(is_t4(sc) ? &t4_cdevsw : &t5_cdevsw,
device_get_unit(dev), UID_ROOT, GID_WHEEL, 0600, "%s",
device_get_nameunit(dev));
if (sc->cdev == NULL)
device_printf(dev, "failed to create nexus char device.\n");
else
sc->cdev->si_drv1 = sc;
/* Go no further if recovery mode has been requested. */
if (TUNABLE_INT_FETCH("hw.cxgbe.sos", &i) && i != 0) {
device_printf(dev, "recovery mode.\n");
goto done;
}
#if defined(__i386__)
if ((cpu_feature & CPUID_CX8) == 0) {
device_printf(dev, "64 bit atomics not available.\n");
rc = ENOTSUP;
goto done;
}
#endif
/* Prepare the firmware for operation */
rc = prep_firmware(sc);
if (rc != 0)
goto done; /* error message displayed already */
rc = get_params__post_init(sc);
if (rc != 0)
goto done; /* error message displayed already */
rc = set_params__post_init(sc);
if (rc != 0)
goto done; /* error message displayed already */
rc = map_bar_2(sc);
if (rc != 0)
goto done; /* error message displayed already */
rc = t4_create_dma_tag(sc);
if (rc != 0)
goto done; /* error message displayed already */
/*
* Number of VIs to create per-port. The first VI is the
* "main" regular VI for the port. The second VI is used for
* netmap if present, and any remaining VIs are used for
* additional virtual interfaces.
*
* Limit the number of VIs per port to the number of available
* MAC addresses per port.
*/
if (t4_num_vis >= 1)
num_vis = t4_num_vis;
else
num_vis = 1;
#ifdef DEV_NETMAP
num_vis++;
#endif
if (num_vis > nitems(vi_mac_funcs)) {
num_vis = nitems(vi_mac_funcs);
device_printf(dev, "Number of VIs limited to %d\n", num_vis);
}
/*
* First pass over all the ports - allocate VIs and initialize some
* basic parameters like mac address, port type, etc. We also figure
* out whether a port is 10G or 1G and use that information when
* calculating how many interrupts to attempt to allocate.
*/
n10g = n1g = 0;
for_each_port(sc, i) {
struct port_info *pi;
struct vi_info *vi;
pi = malloc(sizeof(*pi), M_CXGBE, M_ZERO | M_WAITOK);
sc->port[i] = pi;
/* These must be set before t4_port_init */
pi->adapter = sc;
pi->port_id = i;
pi->nvi = num_vis;
pi->vi = malloc(sizeof(struct vi_info) * num_vis, M_CXGBE,
M_ZERO | M_WAITOK);
/*
* Allocate the "main" VI and initialize parameters
* like mac addr.
*/
rc = -t4_port_init(sc, sc->mbox, sc->pf, 0, i);
if (rc != 0) {
device_printf(dev, "unable to initialize port %d: %d\n",
i, rc);
free(pi->vi, M_CXGBE);
free(pi, M_CXGBE);
sc->port[i] = NULL;
goto done;
}
pi->link_cfg.requested_fc &= ~(PAUSE_TX | PAUSE_RX);
pi->link_cfg.requested_fc |= t4_pause_settings;
pi->link_cfg.fc &= ~(PAUSE_TX | PAUSE_RX);
pi->link_cfg.fc |= t4_pause_settings;
rc = -t4_link_l1cfg(sc, sc->mbox, pi->tx_chan, &pi->link_cfg);
if (rc != 0) {
device_printf(dev, "port %d l1cfg failed: %d\n", i, rc);
free(pi->vi, M_CXGBE);
free(pi, M_CXGBE);
sc->port[i] = NULL;
goto done;
}
snprintf(pi->lockname, sizeof(pi->lockname), "%sp%d",
device_get_nameunit(dev), i);
mtx_init(&pi->pi_lock, pi->lockname, 0, MTX_DEF);
sc->chan_map[pi->tx_chan] = i;
+ pi->tc = malloc(sizeof(struct tx_sched_class) *
+ sc->chip_params->nsched_cls, M_CXGBE, M_ZERO | M_WAITOK);
+
if (is_10G_port(pi) || is_40G_port(pi)) {
n10g++;
for_each_vi(pi, j, vi) {
vi->tmr_idx = t4_tmr_idx_10g;
vi->pktc_idx = t4_pktc_idx_10g;
}
} else {
n1g++;
for_each_vi(pi, j, vi) {
vi->tmr_idx = t4_tmr_idx_1g;
vi->pktc_idx = t4_pktc_idx_1g;
}
}
pi->linkdnrc = -1;
for_each_vi(pi, j, vi) {
vi->qsize_rxq = t4_qsize_rxq;
vi->qsize_txq = t4_qsize_txq;
vi->pi = pi;
}
pi->dev = device_add_child(dev, is_t4(sc) ? "cxgbe" : "cxl", -1);
if (pi->dev == NULL) {
device_printf(dev,
"failed to add device for port %d.\n", i);
rc = ENXIO;
goto done;
}
pi->vi[0].dev = pi->dev;
device_set_softc(pi->dev, pi);
}
/*
* Interrupt type, # of interrupts, # of rx/tx queues, etc.
*/
#ifdef DEV_NETMAP
num_vis--;
#endif
rc = cfg_itype_and_nqueues(sc, n10g, n1g, num_vis, &iaq);
if (rc != 0)
goto done; /* error message displayed already */
sc->intr_type = iaq.intr_type;
sc->intr_count = iaq.nirq;
s = &sc->sge;
s->nrxq = n10g * iaq.nrxq10g + n1g * iaq.nrxq1g;
s->ntxq = n10g * iaq.ntxq10g + n1g * iaq.ntxq1g;
if (num_vis > 1) {
s->nrxq += (n10g + n1g) * (num_vis - 1);
s->ntxq += (n10g + n1g) * (num_vis - 1);
}
s->neq = s->ntxq + s->nrxq; /* the free list in an rxq is an eq */
s->neq += sc->params.nports + 1;/* ctrl queues: 1 per port + 1 mgmt */
s->niq = s->nrxq + 1; /* 1 extra for firmware event queue */
#ifdef TCP_OFFLOAD
if (is_offload(sc)) {
s->nofldrxq = n10g * iaq.nofldrxq10g + n1g * iaq.nofldrxq1g;
s->nofldtxq = n10g * iaq.nofldtxq10g + n1g * iaq.nofldtxq1g;
if (num_vis > 1) {
s->nofldrxq += (n10g + n1g) * (num_vis - 1);
s->nofldtxq += (n10g + n1g) * (num_vis - 1);
}
s->neq += s->nofldtxq + s->nofldrxq;
s->niq += s->nofldrxq;
s->ofld_rxq = malloc(s->nofldrxq * sizeof(struct sge_ofld_rxq),
M_CXGBE, M_ZERO | M_WAITOK);
s->ofld_txq = malloc(s->nofldtxq * sizeof(struct sge_wrq),
M_CXGBE, M_ZERO | M_WAITOK);
}
#endif
#ifdef DEV_NETMAP
s->nnmrxq = n10g * iaq.nnmrxq10g + n1g * iaq.nnmrxq1g;
s->nnmtxq = n10g * iaq.nnmtxq10g + n1g * iaq.nnmtxq1g;
s->neq += s->nnmtxq + s->nnmrxq;
s->niq += s->nnmrxq;
s->nm_rxq = malloc(s->nnmrxq * sizeof(struct sge_nm_rxq),
M_CXGBE, M_ZERO | M_WAITOK);
s->nm_txq = malloc(s->nnmtxq * sizeof(struct sge_nm_txq),
M_CXGBE, M_ZERO | M_WAITOK);
#endif
s->ctrlq = malloc(sc->params.nports * sizeof(struct sge_wrq), M_CXGBE,
M_ZERO | M_WAITOK);
s->rxq = malloc(s->nrxq * sizeof(struct sge_rxq), M_CXGBE,
M_ZERO | M_WAITOK);
s->txq = malloc(s->ntxq * sizeof(struct sge_txq), M_CXGBE,
M_ZERO | M_WAITOK);
s->iqmap = malloc(s->niq * sizeof(struct sge_iq *), M_CXGBE,
M_ZERO | M_WAITOK);
s->eqmap = malloc(s->neq * sizeof(struct sge_eq *), M_CXGBE,
M_ZERO | M_WAITOK);
sc->irq = malloc(sc->intr_count * sizeof(struct irq), M_CXGBE,
M_ZERO | M_WAITOK);
t4_init_l2t(sc, M_WAITOK);
/*
* Second pass over the ports. This time we know the number of rx and
* tx queues that each port should get.
*/
rqidx = tqidx = 0;
#ifdef TCP_OFFLOAD
ofld_rqidx = ofld_tqidx = 0;
#endif
#ifdef DEV_NETMAP
nm_rqidx = nm_tqidx = 0;
#endif
for_each_port(sc, i) {
struct port_info *pi = sc->port[i];
struct vi_info *vi;
if (pi == NULL)
continue;
for_each_vi(pi, j, vi) {
#ifdef DEV_NETMAP
if (j == 1) {
vi->flags |= VI_NETMAP | INTR_RXQ;
vi->first_rxq = nm_rqidx;
vi->first_txq = nm_tqidx;
if (is_10G_port(pi) || is_40G_port(pi)) {
vi->nrxq = iaq.nnmrxq10g;
vi->ntxq = iaq.nnmtxq10g;
} else {
vi->nrxq = iaq.nnmrxq1g;
vi->ntxq = iaq.nnmtxq1g;
}
nm_rqidx += vi->nrxq;
nm_tqidx += vi->ntxq;
continue;
}
#endif
vi->first_rxq = rqidx;
vi->first_txq = tqidx;
if (is_10G_port(pi) || is_40G_port(pi)) {
vi->flags |= iaq.intr_flags_10g & INTR_RXQ;
vi->nrxq = j == 0 ? iaq.nrxq10g : 1;
vi->ntxq = j == 0 ? iaq.ntxq10g : 1;
} else {
vi->flags |= iaq.intr_flags_1g & INTR_RXQ;
vi->nrxq = j == 0 ? iaq.nrxq1g : 1;
vi->ntxq = j == 0 ? iaq.ntxq1g : 1;
}
if (vi->ntxq > 1)
vi->rsrv_noflowq = iaq.rsrv_noflowq ? 1 : 0;
else
vi->rsrv_noflowq = 0;
rqidx += vi->nrxq;
tqidx += vi->ntxq;
#ifdef TCP_OFFLOAD
if (!is_offload(sc))
continue;
vi->first_ofld_rxq = ofld_rqidx;
vi->first_ofld_txq = ofld_tqidx;
if (is_10G_port(pi) || is_40G_port(pi)) {
vi->flags |= iaq.intr_flags_10g & INTR_OFLD_RXQ;
vi->nofldrxq = j == 0 ? iaq.nofldrxq10g : 1;
vi->nofldtxq = j == 0 ? iaq.nofldtxq10g : 1;
} else {
vi->flags |= iaq.intr_flags_1g & INTR_OFLD_RXQ;
vi->nofldrxq = j == 0 ? iaq.nofldrxq1g : 1;
vi->nofldtxq = j == 0 ? iaq.nofldtxq1g : 1;
}
ofld_rqidx += vi->nofldrxq;
ofld_tqidx += vi->nofldtxq;
#endif
}
}
rc = setup_intr_handlers(sc);
if (rc != 0) {
device_printf(dev,
"failed to setup interrupt handlers: %d\n", rc);
goto done;
}
rc = bus_generic_attach(dev);
if (rc != 0) {
device_printf(dev,
"failed to attach all child ports: %d\n", rc);
goto done;
}
device_printf(dev,
"PCIe gen%d x%d, %d ports, %d %s interrupt%s, %d eq, %d iq\n",
sc->params.pci.speed, sc->params.pci.width, sc->params.nports,
sc->intr_count, sc->intr_type == INTR_MSIX ? "MSI-X" :
(sc->intr_type == INTR_MSI ? "MSI" : "INTx"),
sc->intr_count > 1 ? "s" : "", sc->sge.neq, sc->sge.niq);
t4_set_desc(sc);
done:
if (rc != 0 && sc->cdev) {
/* cdev was created and so cxgbetool works; recover that way. */
device_printf(dev,
"error during attach, adapter is now in recovery mode.\n");
rc = 0;
}
if (rc != 0)
t4_detach(dev);
else
t4_sysctls(sc);
return (rc);
}
/*
* Idempotent
*/
static int
t4_detach(device_t dev)
{
struct adapter *sc;
struct port_info *pi;
int i, rc;
sc = device_get_softc(dev);
if (sc->flags & FULL_INIT_DONE)
t4_intr_disable(sc);
if (sc->cdev) {
destroy_dev(sc->cdev);
sc->cdev = NULL;
}
rc = bus_generic_detach(dev);
if (rc) {
device_printf(dev,
"failed to detach child devices: %d\n", rc);
return (rc);
}
for (i = 0; i < sc->intr_count; i++)
t4_free_irq(sc, &sc->irq[i]);
for (i = 0; i < MAX_NPORTS; i++) {
pi = sc->port[i];
if (pi) {
t4_free_vi(sc, sc->mbox, sc->pf, 0, pi->vi[0].viid);
if (pi->dev)
device_delete_child(dev, pi->dev);
mtx_destroy(&pi->pi_lock);
free(pi->vi, M_CXGBE);
+ free(pi->tc, M_CXGBE);
free(pi, M_CXGBE);
}
}
if (sc->flags & FULL_INIT_DONE)
adapter_full_uninit(sc);
if (sc->flags & FW_OK)
t4_fw_bye(sc, sc->mbox);
if (sc->intr_type == INTR_MSI || sc->intr_type == INTR_MSIX)
pci_release_msi(dev);
if (sc->regs_res)
bus_release_resource(dev, SYS_RES_MEMORY, sc->regs_rid,
sc->regs_res);
if (sc->udbs_res)
bus_release_resource(dev, SYS_RES_MEMORY, sc->udbs_rid,
sc->udbs_res);
if (sc->msix_res)
bus_release_resource(dev, SYS_RES_MEMORY, sc->msix_rid,
sc->msix_res);
if (sc->l2t)
t4_free_l2t(sc->l2t);
#ifdef TCP_OFFLOAD
free(sc->sge.ofld_rxq, M_CXGBE);
free(sc->sge.ofld_txq, M_CXGBE);
#endif
#ifdef DEV_NETMAP
free(sc->sge.nm_rxq, M_CXGBE);
free(sc->sge.nm_txq, M_CXGBE);
#endif
free(sc->irq, M_CXGBE);
free(sc->sge.rxq, M_CXGBE);
free(sc->sge.txq, M_CXGBE);
free(sc->sge.ctrlq, M_CXGBE);
free(sc->sge.iqmap, M_CXGBE);
free(sc->sge.eqmap, M_CXGBE);
free(sc->tids.ftid_tab, M_CXGBE);
t4_destroy_dma_tag(sc);
if (mtx_initialized(&sc->sc_lock)) {
sx_xlock(&t4_list_lock);
SLIST_REMOVE(&t4_list, sc, adapter, link);
sx_xunlock(&t4_list_lock);
mtx_destroy(&sc->sc_lock);
}
callout_drain(&sc->sfl_callout);
if (mtx_initialized(&sc->tids.ftid_lock))
mtx_destroy(&sc->tids.ftid_lock);
if (mtx_initialized(&sc->sfl_lock))
mtx_destroy(&sc->sfl_lock);
if (mtx_initialized(&sc->ifp_lock))
mtx_destroy(&sc->ifp_lock);
if (mtx_initialized(&sc->reg_lock))
mtx_destroy(&sc->reg_lock);
for (i = 0; i < NUM_MEMWIN; i++) {
struct memwin *mw = &sc->memwin[i];
if (rw_initialized(&mw->mw_lock))
rw_destroy(&mw->mw_lock);
}
bzero(sc, sizeof(*sc));
return (0);
}
static int
cxgbe_probe(device_t dev)
{
char buf[128];
struct port_info *pi = device_get_softc(dev);
snprintf(buf, sizeof(buf), "port %d", pi->port_id);
device_set_desc_copy(dev, buf);
return (BUS_PROBE_DEFAULT);
}
#define T4_CAP (IFCAP_VLAN_HWTAGGING | IFCAP_VLAN_MTU | IFCAP_HWCSUM | \
IFCAP_VLAN_HWCSUM | IFCAP_TSO | IFCAP_JUMBO_MTU | IFCAP_LRO | \
IFCAP_VLAN_HWTSO | IFCAP_LINKSTATE | IFCAP_HWCSUM_IPV6 | IFCAP_HWSTATS)
#define T4_CAP_ENABLE (T4_CAP)
static int
cxgbe_vi_attach(device_t dev, struct vi_info *vi)
{
struct ifnet *ifp;
struct sbuf *sb;
vi->xact_addr_filt = -1;
callout_init(&vi->tick, 1);
/* Allocate an ifnet and set it up */
ifp = if_alloc(IFT_ETHER);
if (ifp == NULL) {
device_printf(dev, "Cannot allocate ifnet\n");
return (ENOMEM);
}
vi->ifp = ifp;
ifp->if_softc = vi;
if_initname(ifp, device_get_name(dev), device_get_unit(dev));
ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
ifp->if_init = cxgbe_init;
ifp->if_ioctl = cxgbe_ioctl;
ifp->if_transmit = cxgbe_transmit;
ifp->if_qflush = cxgbe_qflush;
ifp->if_get_counter = cxgbe_get_counter;
ifp->if_capabilities = T4_CAP;
#ifdef TCP_OFFLOAD
if (vi->nofldrxq != 0)
ifp->if_capabilities |= IFCAP_TOE;
#endif
ifp->if_capenable = T4_CAP_ENABLE;
ifp->if_hwassist = CSUM_TCP | CSUM_UDP | CSUM_IP | CSUM_TSO |
CSUM_UDP_IPV6 | CSUM_TCP_IPV6;
ifp->if_hw_tsomax = 65536 - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);
ifp->if_hw_tsomaxsegcount = TX_SGL_SEGS;
ifp->if_hw_tsomaxsegsize = 65536;
/* Initialize ifmedia for this VI */
ifmedia_init(&vi->media, IFM_IMASK, cxgbe_media_change,
cxgbe_media_status);
build_medialist(vi->pi, &vi->media);
vi->vlan_c = EVENTHANDLER_REGISTER(vlan_config, cxgbe_vlan_config, ifp,
EVENTHANDLER_PRI_ANY);
ether_ifattach(ifp, vi->hw_addr);
sb = sbuf_new_auto();
sbuf_printf(sb, "%d txq, %d rxq (NIC)", vi->ntxq, vi->nrxq);
#ifdef TCP_OFFLOAD
if (ifp->if_capabilities & IFCAP_TOE)
sbuf_printf(sb, "; %d txq, %d rxq (TOE)",
vi->nofldtxq, vi->nofldrxq);
#endif
sbuf_finish(sb);
device_printf(dev, "%s\n", sbuf_data(sb));
sbuf_delete(sb);
vi_sysctls(vi);
return (0);
}
static int
cxgbe_attach(device_t dev)
{
struct port_info *pi = device_get_softc(dev);
struct vi_info *vi;
int i, rc;
callout_init_mtx(&pi->tick, &pi->pi_lock, 0);
rc = cxgbe_vi_attach(dev, &pi->vi[0]);
if (rc)
return (rc);
for_each_vi(pi, i, vi) {
if (i == 0)
continue;
#ifdef DEV_NETMAP
if (vi->flags & VI_NETMAP) {
/*
* media handled here to keep
* implementation private to this file
*/
ifmedia_init(&vi->media, IFM_IMASK, cxgbe_media_change,
cxgbe_media_status);
build_medialist(pi, &vi->media);
vi->dev = device_add_child(dev, is_t4(pi->adapter) ?
"ncxgbe" : "ncxl", device_get_unit(dev));
} else
#endif
vi->dev = device_add_child(dev, is_t4(pi->adapter) ?
"vcxgbe" : "vcxl", -1);
if (vi->dev == NULL) {
device_printf(dev, "failed to add VI %d\n", i);
continue;
}
device_set_softc(vi->dev, vi);
}
cxgbe_sysctls(pi);
bus_generic_attach(dev);
return (0);
}
static void
cxgbe_vi_detach(struct vi_info *vi)
{
struct ifnet *ifp = vi->ifp;
ether_ifdetach(ifp);
if (vi->vlan_c)
EVENTHANDLER_DEREGISTER(vlan_config, vi->vlan_c);
/* Let detach proceed even if these fail. */
cxgbe_uninit_synchronized(vi);
callout_drain(&vi->tick);
vi_full_uninit(vi);
ifmedia_removeall(&vi->media);
if_free(vi->ifp);
vi->ifp = NULL;
}
static int
cxgbe_detach(device_t dev)
{
struct port_info *pi = device_get_softc(dev);
struct adapter *sc = pi->adapter;
int rc;
/* Detach the extra VIs first. */
rc = bus_generic_detach(dev);
if (rc)
return (rc);
device_delete_children(dev);
doom_vi(sc, &pi->vi[0]);
if (pi->flags & HAS_TRACEQ) {
sc->traceq = -1; /* cloner should not create ifnet */
t4_tracer_port_detach(sc);
}
cxgbe_vi_detach(&pi->vi[0]);
callout_drain(&pi->tick);
end_synchronized_op(sc, 0);
return (0);
}
static void
cxgbe_init(void *arg)
{
struct vi_info *vi = arg;
struct adapter *sc = vi->pi->adapter;
if (begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4init") != 0)
return;
cxgbe_init_synchronized(vi);
end_synchronized_op(sc, 0);
}
static int
cxgbe_ioctl(struct ifnet *ifp, unsigned long cmd, caddr_t data)
{
int rc = 0, mtu, flags, can_sleep;
struct vi_info *vi = ifp->if_softc;
struct adapter *sc = vi->pi->adapter;
struct ifreq *ifr = (struct ifreq *)data;
uint32_t mask;
switch (cmd) {
case SIOCSIFMTU:
mtu = ifr->ifr_mtu;
if ((mtu < ETHERMIN) || (mtu > ETHERMTU_JUMBO))
return (EINVAL);
rc = begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4mtu");
if (rc)
return (rc);
ifp->if_mtu = mtu;
if (vi->flags & VI_INIT_DONE) {
t4_update_fl_bufsize(ifp);
if (ifp->if_drv_flags & IFF_DRV_RUNNING)
rc = update_mac_settings(ifp, XGMAC_MTU);
}
end_synchronized_op(sc, 0);
break;
case SIOCSIFFLAGS:
can_sleep = 0;
redo_sifflags:
rc = begin_synchronized_op(sc, vi,
can_sleep ? (SLEEP_OK | INTR_OK) : HOLD_LOCK, "t4flg");
if (rc)
return (rc);
if (ifp->if_flags & IFF_UP) {
if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
flags = vi->if_flags;
if ((ifp->if_flags ^ flags) &
(IFF_PROMISC | IFF_ALLMULTI)) {
if (can_sleep == 1) {
end_synchronized_op(sc, 0);
can_sleep = 0;
goto redo_sifflags;
}
rc = update_mac_settings(ifp,
XGMAC_PROMISC | XGMAC_ALLMULTI);
}
} else {
if (can_sleep == 0) {
end_synchronized_op(sc, LOCK_HELD);
can_sleep = 1;
goto redo_sifflags;
}
rc = cxgbe_init_synchronized(vi);
}
vi->if_flags = ifp->if_flags;
} else if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
if (can_sleep == 0) {
end_synchronized_op(sc, LOCK_HELD);
can_sleep = 1;
goto redo_sifflags;
}
rc = cxgbe_uninit_synchronized(vi);
}
end_synchronized_op(sc, can_sleep ? 0 : LOCK_HELD);
break;
case SIOCADDMULTI:
case SIOCDELMULTI: /* these two are called with a mutex held :-( */
rc = begin_synchronized_op(sc, vi, HOLD_LOCK, "t4multi");
if (rc)
return (rc);
if (ifp->if_drv_flags & IFF_DRV_RUNNING)
rc = update_mac_settings(ifp, XGMAC_MCADDRS);
end_synchronized_op(sc, LOCK_HELD);
break;
case SIOCSIFCAP:
rc = begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4cap");
if (rc)
return (rc);
mask = ifr->ifr_reqcap ^ ifp->if_capenable;
if (mask & IFCAP_TXCSUM) {
ifp->if_capenable ^= IFCAP_TXCSUM;
ifp->if_hwassist ^= (CSUM_TCP | CSUM_UDP | CSUM_IP);
if (IFCAP_TSO4 & ifp->if_capenable &&
!(IFCAP_TXCSUM & ifp->if_capenable)) {
ifp->if_capenable &= ~IFCAP_TSO4;
if_printf(ifp,
"tso4 disabled due to -txcsum.\n");
}
}
if (mask & IFCAP_TXCSUM_IPV6) {
ifp->if_capenable ^= IFCAP_TXCSUM_IPV6;
ifp->if_hwassist ^= (CSUM_UDP_IPV6 | CSUM_TCP_IPV6);
if (IFCAP_TSO6 & ifp->if_capenable &&
!(IFCAP_TXCSUM_IPV6 & ifp->if_capenable)) {
ifp->if_capenable &= ~IFCAP_TSO6;
if_printf(ifp,
"tso6 disabled due to -txcsum6.\n");
}
}
if (mask & IFCAP_RXCSUM)
ifp->if_capenable ^= IFCAP_RXCSUM;
if (mask & IFCAP_RXCSUM_IPV6)
ifp->if_capenable ^= IFCAP_RXCSUM_IPV6;
/*
* Note that we leave CSUM_TSO alone (it is always set). The
* kernel takes both IFCAP_TSOx and CSUM_TSO into account before
* sending a TSO request our way, so it's sufficient to toggle
* IFCAP_TSOx only.
*/
if (mask & IFCAP_TSO4) {
if (!(IFCAP_TSO4 & ifp->if_capenable) &&
!(IFCAP_TXCSUM & ifp->if_capenable)) {
if_printf(ifp, "enable txcsum first.\n");
rc = EAGAIN;
goto fail;
}
ifp->if_capenable ^= IFCAP_TSO4;
}
if (mask & IFCAP_TSO6) {
if (!(IFCAP_TSO6 & ifp->if_capenable) &&
!(IFCAP_TXCSUM_IPV6 & ifp->if_capenable)) {
if_printf(ifp, "enable txcsum6 first.\n");
rc = EAGAIN;
goto fail;
}
ifp->if_capenable ^= IFCAP_TSO6;
}
if (mask & IFCAP_LRO) {
#if defined(INET) || defined(INET6)
int i;
struct sge_rxq *rxq;
ifp->if_capenable ^= IFCAP_LRO;
for_each_rxq(vi, i, rxq) {
if (ifp->if_capenable & IFCAP_LRO)
rxq->iq.flags |= IQ_LRO_ENABLED;
else
rxq->iq.flags &= ~IQ_LRO_ENABLED;
}
#endif
}
#ifdef TCP_OFFLOAD
if (mask & IFCAP_TOE) {
int enable = (ifp->if_capenable ^ mask) & IFCAP_TOE;
rc = toe_capability(vi, enable);
if (rc != 0)
goto fail;
ifp->if_capenable ^= mask;
}
#endif
if (mask & IFCAP_VLAN_HWTAGGING) {
ifp->if_capenable ^= IFCAP_VLAN_HWTAGGING;
if (ifp->if_drv_flags & IFF_DRV_RUNNING)
rc = update_mac_settings(ifp, XGMAC_VLANEX);
}
if (mask & IFCAP_VLAN_MTU) {
ifp->if_capenable ^= IFCAP_VLAN_MTU;
/* Need to find out how to disable auto-mtu-inflation */
}
if (mask & IFCAP_VLAN_HWTSO)
ifp->if_capenable ^= IFCAP_VLAN_HWTSO;
if (mask & IFCAP_VLAN_HWCSUM)
ifp->if_capenable ^= IFCAP_VLAN_HWCSUM;
#ifdef VLAN_CAPABILITIES
VLAN_CAPABILITIES(ifp);
#endif
fail:
end_synchronized_op(sc, 0);
break;
case SIOCSIFMEDIA:
case SIOCGIFMEDIA:
ifmedia_ioctl(ifp, ifr, &vi->media, cmd);
break;
case SIOCGI2C: {
struct ifi2creq i2c;
rc = copyin(ifr->ifr_data, &i2c, sizeof(i2c));
if (rc != 0)
break;
if (i2c.dev_addr != 0xA0 && i2c.dev_addr != 0xA2) {
rc = EPERM;
break;
}
if (i2c.len > sizeof(i2c.data)) {
rc = EINVAL;
break;
}
rc = begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4i2c");
if (rc)
return (rc);
rc = -t4_i2c_rd(sc, sc->mbox, vi->pi->port_id, i2c.dev_addr,
i2c.offset, i2c.len, &i2c.data[0]);
end_synchronized_op(sc, 0);
if (rc == 0)
rc = copyout(&i2c, ifr->ifr_data, sizeof(i2c));
break;
}
default:
rc = ether_ioctl(ifp, cmd, data);
}
return (rc);
}
static int
cxgbe_transmit(struct ifnet *ifp, struct mbuf *m)
{
struct vi_info *vi = ifp->if_softc;
struct port_info *pi = vi->pi;
struct adapter *sc = pi->adapter;
struct sge_txq *txq;
void *items[1];
int rc;
M_ASSERTPKTHDR(m);
MPASS(m->m_nextpkt == NULL); /* not quite ready for this yet */
if (__predict_false(pi->link_cfg.link_ok == 0)) {
m_freem(m);
return (ENETDOWN);
}
rc = parse_pkt(&m);
if (__predict_false(rc != 0)) {
MPASS(m == NULL); /* was freed already */
atomic_add_int(&pi->tx_parse_error, 1); /* rare, atomic is ok */
return (rc);
}
/* Select a txq. */
txq = &sc->sge.txq[vi->first_txq];
if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE)
txq += ((m->m_pkthdr.flowid % (vi->ntxq - vi->rsrv_noflowq)) +
vi->rsrv_noflowq);
items[0] = m;
rc = mp_ring_enqueue(txq->r, items, 1, 4096);
if (__predict_false(rc != 0))
m_freem(m);
return (rc);
}
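/*
 * Illustration only, not part of the driver: a minimal userland sketch of
 * the tx queue selection done in cxgbe_transmit() above.  struct example_vi
 * and example_select_txq() are hypothetical names.  The sketch assumes
 * ntxq > rsrv_noflowq so that the modulus is never zero.
 */

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-VI queue layout fields used above. */
struct example_vi {
	int first_txq;		/* index of the VI's first tx queue */
	int ntxq;		/* number of tx queues owned by the VI */
	int rsrv_noflowq;	/* queues reserved for packets with no flowid */
};

/*
 * Returns the absolute tx queue index for a packet.  Packets without a
 * flow hash stay on the VI's first queue; hashed packets are spread over
 * the queues that follow the reserved ones.
 */
static int
example_select_txq(const struct example_vi *vi, int has_flowid,
    uint32_t flowid)
{
	int idx = vi->first_txq;

	if (has_flowid) {
		uint32_t nhashq = (uint32_t)(vi->ntxq - vi->rsrv_noflowq);

		idx += (int)(flowid % nhashq) + vi->rsrv_noflowq;
	}
	return (idx);
}
```

/*
 * With first_txq = 8, ntxq = 4 and rsrv_noflowq = 1, unhashed packets use
 * queue 8 and hashed packets are spread over queues 9 through 11.
 */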
static void
cxgbe_qflush(struct ifnet *ifp)
{
struct vi_info *vi = ifp->if_softc;
struct sge_txq *txq;
int i;
/* queues do not exist if !VI_INIT_DONE. */
if (vi->flags & VI_INIT_DONE) {
for_each_txq(vi, i, txq) {
TXQ_LOCK(txq);
txq->eq.flags &= ~EQ_ENABLED;
TXQ_UNLOCK(txq);
while (!mp_ring_is_idle(txq->r)) {
mp_ring_check_drainage(txq->r, 0);
pause("qflush", 1);
}
}
}
if_qflush(ifp);
}
static uint64_t
vi_get_counter(struct ifnet *ifp, ift_counter c)
{
struct vi_info *vi = ifp->if_softc;
struct fw_vi_stats_vf *s = &vi->stats;
vi_refresh_stats(vi->pi->adapter, vi);
switch (c) {
case IFCOUNTER_IPACKETS:
return (s->rx_bcast_frames + s->rx_mcast_frames +
s->rx_ucast_frames);
case IFCOUNTER_IERRORS:
return (s->rx_err_frames);
case IFCOUNTER_OPACKETS:
return (s->tx_bcast_frames + s->tx_mcast_frames +
s->tx_ucast_frames + s->tx_offload_frames);
case IFCOUNTER_OERRORS:
return (s->tx_drop_frames);
case IFCOUNTER_IBYTES:
return (s->rx_bcast_bytes + s->rx_mcast_bytes +
s->rx_ucast_bytes);
case IFCOUNTER_OBYTES:
return (s->tx_bcast_bytes + s->tx_mcast_bytes +
s->tx_ucast_bytes + s->tx_offload_bytes);
case IFCOUNTER_IMCASTS:
return (s->rx_mcast_frames);
case IFCOUNTER_OMCASTS:
return (s->tx_mcast_frames);
case IFCOUNTER_OQDROPS: {
uint64_t drops;
drops = 0;
if ((vi->flags & (VI_INIT_DONE | VI_NETMAP)) == VI_INIT_DONE) {
int i;
struct sge_txq *txq;
for_each_txq(vi, i, txq)
drops += counter_u64_fetch(txq->r->drops);
}
return (drops);
}
default:
return (if_get_counter_default(ifp, c));
}
}
uint64_t
cxgbe_get_counter(struct ifnet *ifp, ift_counter c)
{
struct vi_info *vi = ifp->if_softc;
struct port_info *pi = vi->pi;
struct adapter *sc = pi->adapter;
struct port_stats *s = &pi->stats;
if (pi->nvi > 1)
return (vi_get_counter(ifp, c));
cxgbe_refresh_stats(sc, pi);
switch (c) {
case IFCOUNTER_IPACKETS:
return (s->rx_frames);
case IFCOUNTER_IERRORS:
return (s->rx_jabber + s->rx_runt + s->rx_too_long +
s->rx_fcs_err + s->rx_len_err);
case IFCOUNTER_OPACKETS:
return (s->tx_frames);
case IFCOUNTER_OERRORS:
return (s->tx_error_frames);
case IFCOUNTER_IBYTES:
return (s->rx_octets);
case IFCOUNTER_OBYTES:
return (s->tx_octets);
case IFCOUNTER_IMCASTS:
return (s->rx_mcast_frames);
case IFCOUNTER_OMCASTS:
return (s->tx_mcast_frames);
case IFCOUNTER_IQDROPS:
return (s->rx_ovflow0 + s->rx_ovflow1 + s->rx_ovflow2 +
s->rx_ovflow3 + s->rx_trunc0 + s->rx_trunc1 + s->rx_trunc2 +
s->rx_trunc3 + pi->tnl_cong_drops);
case IFCOUNTER_OQDROPS: {
uint64_t drops;
drops = s->tx_drop;
if (vi->flags & VI_INIT_DONE) {
int i;
struct sge_txq *txq;
for_each_txq(vi, i, txq)
drops += counter_u64_fetch(txq->r->drops);
}
return (drops);
}
default:
return (if_get_counter_default(ifp, c));
}
}
static int
cxgbe_media_change(struct ifnet *ifp)
{
struct vi_info *vi = ifp->if_softc;
device_printf(vi->dev, "%s unimplemented.\n", __func__);
return (EOPNOTSUPP);
}
static void
cxgbe_media_status(struct ifnet *ifp, struct ifmediareq *ifmr)
{
struct vi_info *vi = ifp->if_softc;
struct port_info *pi = vi->pi;
struct ifmedia_entry *cur;
int speed = pi->link_cfg.speed;
cur = vi->media.ifm_cur;
ifmr->ifm_status = IFM_AVALID;
if (!pi->link_cfg.link_ok)
return;
ifmr->ifm_status |= IFM_ACTIVE;
/* active and current will differ iff current media is autoselect. */
if (IFM_SUBTYPE(cur->ifm_media) != IFM_AUTO)
return;
ifmr->ifm_active = IFM_ETHER | IFM_FDX;
if (speed == 10000)
ifmr->ifm_active |= IFM_10G_T;
else if (speed == 1000)
ifmr->ifm_active |= IFM_1000_T;
else if (speed == 100)
ifmr->ifm_active |= IFM_100_TX;
else if (speed == 10)
ifmr->ifm_active |= IFM_10_T;
else
KASSERT(0, ("%s: link up but speed unknown (%u)", __func__,
speed));
}
static int
vcxgbe_probe(device_t dev)
{
char buf[128];
struct vi_info *vi = device_get_softc(dev);
snprintf(buf, sizeof(buf), "port %d vi %td", vi->pi->port_id,
vi - vi->pi->vi);
device_set_desc_copy(dev, buf);
return (BUS_PROBE_DEFAULT);
}
static int
vcxgbe_attach(device_t dev)
{
struct vi_info *vi;
struct port_info *pi;
struct adapter *sc;
int func, index, rc;
u32 param, val;
vi = device_get_softc(dev);
pi = vi->pi;
sc = pi->adapter;
index = vi - pi->vi;
KASSERT(index < nitems(vi_mac_funcs),
("%s: VI %s doesn't have a MAC func", __func__,
device_get_nameunit(dev)));
func = vi_mac_funcs[index];
rc = t4_alloc_vi_func(sc, sc->mbox, pi->tx_chan, sc->pf, 0, 1,
vi->hw_addr, &vi->rss_size, func, 0);
if (rc < 0) {
device_printf(dev, "Failed to allocate virtual interface "
"for port %d: %d\n", pi->port_id, -rc);
return (-rc);
}
vi->viid = rc;
param = V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DEV) |
V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DEV_RSSINFO) |
V_FW_PARAMS_PARAM_YZ(vi->viid);
rc = t4_query_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val);
if (rc)
vi->rss_base = 0xffff;
else {
/* MPASS((val >> 16) == rss_size); */
vi->rss_base = val & 0xffff;
}
rc = cxgbe_vi_attach(dev, vi);
if (rc) {
t4_free_vi(sc, sc->mbox, sc->pf, 0, vi->viid);
return (rc);
}
return (0);
}
static int
vcxgbe_detach(device_t dev)
{
struct vi_info *vi;
struct adapter *sc;
vi = device_get_softc(dev);
sc = vi->pi->adapter;
doom_vi(sc, vi);
cxgbe_vi_detach(vi);
t4_free_vi(sc, sc->mbox, sc->pf, 0, vi->viid);
end_synchronized_op(sc, 0);
return (0);
}
void
t4_fatal_err(struct adapter *sc)
{
t4_set_reg_field(sc, A_SGE_CONTROL, F_GLOBALENABLE, 0);
t4_intr_disable(sc);
log(LOG_EMERG, "%s: encountered fatal error, adapter stopped.\n",
device_get_nameunit(sc->dev));
}
static int
map_bars_0_and_4(struct adapter *sc)
{
sc->regs_rid = PCIR_BAR(0);
sc->regs_res = bus_alloc_resource_any(sc->dev, SYS_RES_MEMORY,
&sc->regs_rid, RF_ACTIVE);
if (sc->regs_res == NULL) {
device_printf(sc->dev, "cannot map registers.\n");
return (ENXIO);
}
sc->bt = rman_get_bustag(sc->regs_res);
sc->bh = rman_get_bushandle(sc->regs_res);
sc->mmio_len = rman_get_size(sc->regs_res);
setbit(&sc->doorbells, DOORBELL_KDB);
sc->msix_rid = PCIR_BAR(4);
sc->msix_res = bus_alloc_resource_any(sc->dev, SYS_RES_MEMORY,
&sc->msix_rid, RF_ACTIVE);
if (sc->msix_res == NULL) {
device_printf(sc->dev, "cannot map MSI-X BAR.\n");
return (ENXIO);
}
return (0);
}
static int
map_bar_2(struct adapter *sc)
{
/*
* T4: only iWARP driver uses the userspace doorbells. There is no need
* to map it if RDMA is disabled.
*/
if (is_t4(sc) && sc->rdmacaps == 0)
return (0);
sc->udbs_rid = PCIR_BAR(2);
sc->udbs_res = bus_alloc_resource_any(sc->dev, SYS_RES_MEMORY,
&sc->udbs_rid, RF_ACTIVE);
if (sc->udbs_res == NULL) {
device_printf(sc->dev, "cannot map doorbell BAR.\n");
return (ENXIO);
}
sc->udbs_base = rman_get_virtual(sc->udbs_res);
if (is_t5(sc)) {
setbit(&sc->doorbells, DOORBELL_UDB);
#if defined(__i386__) || defined(__amd64__)
if (t5_write_combine) {
int rc;
/*
* Enable write combining on BAR2. This is the
* userspace doorbell BAR and is split into 128B
* (UDBS_SEG_SIZE) doorbell regions, each associated
* with an egress queue. The first 64B has the doorbell
* and the second 64B can be used to submit a tx work
* request with an implicit doorbell.
*/
rc = pmap_change_attr((vm_offset_t)sc->udbs_base,
rman_get_size(sc->udbs_res), PAT_WRITE_COMBINING);
if (rc == 0) {
clrbit(&sc->doorbells, DOORBELL_UDB);
setbit(&sc->doorbells, DOORBELL_WCWR);
setbit(&sc->doorbells, DOORBELL_UDBWC);
} else {
device_printf(sc->dev,
"couldn't enable write combining: %d\n",
rc);
}
t4_write_reg(sc, A_SGE_STAT_CFG,
V_STATSOURCE_T5(7) | V_STATMODE(0));
}
#endif
}
return (0);
}
struct memwin_init {
uint32_t base;
uint32_t aperture;
};
static const struct memwin_init t4_memwin[NUM_MEMWIN] = {
{ MEMWIN0_BASE, MEMWIN0_APERTURE },
{ MEMWIN1_BASE, MEMWIN1_APERTURE },
{ MEMWIN2_BASE_T4, MEMWIN2_APERTURE_T4 }
};
static const struct memwin_init t5_memwin[NUM_MEMWIN] = {
{ MEMWIN0_BASE, MEMWIN0_APERTURE },
{ MEMWIN1_BASE, MEMWIN1_APERTURE },
{ MEMWIN2_BASE_T5, MEMWIN2_APERTURE_T5 },
};
static void
setup_memwin(struct adapter *sc)
{
const struct memwin_init *mw_init;
struct memwin *mw;
int i;
uint32_t bar0;
if (is_t4(sc)) {
/*
* Read low 32b of bar0 indirectly via the hardware backdoor
* mechanism. Works from within PCI passthrough environments
* too, where rman_get_start() can return a different value. We
* need to program the T4 memory window decoders with the actual
* addresses that will be coming across the PCIe link.
*/
bar0 = t4_hw_pci_read_cfg4(sc, PCIR_BAR(0));
bar0 &= (uint32_t) PCIM_BAR_MEM_BASE;
mw_init = &t4_memwin[0];
} else {
/* T5+ use the relative offset inside the PCIe BAR */
bar0 = 0;
mw_init = &t5_memwin[0];
}
for (i = 0, mw = &sc->memwin[0]; i < NUM_MEMWIN; i++, mw_init++, mw++) {
rw_init(&mw->mw_lock, "memory window access");
mw->mw_base = mw_init->base;
mw->mw_aperture = mw_init->aperture;
mw->mw_curpos = 0;
t4_write_reg(sc,
PCIE_MEM_ACCESS_REG(A_PCIE_MEM_ACCESS_BASE_WIN, i),
(mw->mw_base + bar0) | V_BIR(0) |
V_WINDOW(ilog2(mw->mw_aperture) - 10));
rw_wlock(&mw->mw_lock);
position_memwin(sc, i, 0);
rw_wunlock(&mw->mw_lock);
}
/* flush */
t4_read_reg(sc, PCIE_MEM_ACCESS_REG(A_PCIE_MEM_ACCESS_BASE_WIN, 2));
}
/*
* Positions the memory window at the given address in the card's address space.
* There are some alignment requirements and the actual position may be at an
* address prior to the requested address. mw->mw_curpos always has the actual
* position of the window.
*/
static void
position_memwin(struct adapter *sc, int idx, uint32_t addr)
{
struct memwin *mw;
uint32_t pf;
uint32_t reg;
MPASS(idx >= 0 && idx < NUM_MEMWIN);
mw = &sc->memwin[idx];
rw_assert(&mw->mw_lock, RA_WLOCKED);
if (is_t4(sc)) {
pf = 0;
mw->mw_curpos = addr & ~0xf; /* start must be 16B aligned */
} else {
pf = V_PFNUM(sc->pf);
mw->mw_curpos = addr & ~0x7f; /* start must be 128B aligned */
}
reg = PCIE_MEM_ACCESS_REG(A_PCIE_MEM_ACCESS_OFFSET, idx);
t4_write_reg(sc, reg, mw->mw_curpos | pf);
t4_read_reg(sc, reg); /* flush */
}
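/*
 * Illustration only, not part of the driver: a small sketch of the
 * alignment arithmetic in position_memwin() above.  The example_* names
 * are hypothetical, and is_t4 stands in for the driver's is_t4(sc) check.
 */

```c
#include <stdint.h>

/*
 * Start of the window that would be programmed for addr: rounded down to
 * 16 bytes on T4 and to 128 bytes on later chips.
 */
static uint32_t
example_memwin_start(uint32_t addr, int is_t4)
{
	return (is_t4 ? (addr & ~(uint32_t)0xf) : (addr & ~(uint32_t)0x7f));
}

/*
 * Distance of addr from the start of its window.  This is why the actual
 * window position can be at an address before the requested one.
 */
static uint32_t
example_memwin_offset(uint32_t addr, int is_t4)
{
	return (addr - example_memwin_start(addr, is_t4));
}
```

/*
 * For addr 0x1234 the window starts at 0x1230 on T4 and at 0x1200 on
 * later chips, so the requested byte sits 0x4 or 0x34 bytes into it.
 */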
static int
rw_via_memwin(struct adapter *sc, int idx, uint32_t addr, uint32_t *val,
int len, int rw)
{
struct memwin *mw;
uint32_t mw_end, v;
MPASS(idx >= 0 && idx < NUM_MEMWIN);
/* Memory can only be accessed in naturally aligned 4 byte units */
if (addr & 3 || len & 3 || len <= 0)
return (EINVAL);
mw = &sc->memwin[idx];
while (len > 0) {
rw_rlock(&mw->mw_lock);
mw_end = mw->mw_curpos + mw->mw_aperture;
if (addr >= mw_end || addr < mw->mw_curpos) {
/* Will need to reposition the window */
if (!rw_try_upgrade(&mw->mw_lock)) {
rw_runlock(&mw->mw_lock);
rw_wlock(&mw->mw_lock);
}
rw_assert(&mw->mw_lock, RA_WLOCKED);
position_memwin(sc, idx, addr);
rw_downgrade(&mw->mw_lock);
mw_end = mw->mw_curpos + mw->mw_aperture;
}
rw_assert(&mw->mw_lock, RA_RLOCKED);
while (addr < mw_end && len > 0) {
if (rw == 0) {
v = t4_read_reg(sc, mw->mw_base + addr -
mw->mw_curpos);
*val++ = le32toh(v);
} else {
v = *val++;
t4_write_reg(sc, mw->mw_base + addr -
mw->mw_curpos, htole32(v));
}
addr += 4;
len -= 4;
}
rw_runlock(&mw->mw_lock);
}
return (0);
}
static inline int
read_via_memwin(struct adapter *sc, int idx, uint32_t addr, uint32_t *val,
int len)
{
return (rw_via_memwin(sc, idx, addr, val, len, 0));
}
static inline int
write_via_memwin(struct adapter *sc, int idx, uint32_t addr,
const uint32_t *val, int len)
{
return (rw_via_memwin(sc, idx, addr, (void *)(uintptr_t)val, len, 1));
}
static int
t4_range_cmp(const void *a, const void *b)
{
const struct t4_range *ra = a, *rb = b;
/* Compare instead of subtracting; the difference may not fit in an int. */
return ((ra->start > rb->start) - (ra->start < rb->start));
}
/*
* Verify that the memory range specified by the addr/len pair is valid within
* the card's address space.
*/
static int
validate_mem_range(struct adapter *sc, uint32_t addr, int len)
{
struct t4_range mem_ranges[4], *r, *next;
uint32_t em, addr_len;
int i, n, remaining;
/* Memory can only be accessed in naturally aligned 4 byte units */
if (addr & 3 || len & 3 || len <= 0)
return (EINVAL);
/* Enabled memories */
em = t4_read_reg(sc, A_MA_TARGET_MEM_ENABLE);
r = &mem_ranges[0];
n = 0;
bzero(r, sizeof(mem_ranges));
if (em & F_EDRAM0_ENABLE) {
addr_len = t4_read_reg(sc, A_MA_EDRAM0_BAR);
r->size = G_EDRAM0_SIZE(addr_len) << 20;
if (r->size > 0) {
r->start = G_EDRAM0_BASE(addr_len) << 20;
if (addr >= r->start &&
addr + len <= r->start + r->size)
return (0);
r++;
n++;
}
}
if (em & F_EDRAM1_ENABLE) {
addr_len = t4_read_reg(sc, A_MA_EDRAM1_BAR);
r->size = G_EDRAM1_SIZE(addr_len) << 20;
if (r->size > 0) {
r->start = G_EDRAM1_BASE(addr_len) << 20;
if (addr >= r->start &&
addr + len <= r->start + r->size)
return (0);
r++;
n++;
}
}
if (em & F_EXT_MEM_ENABLE) {
addr_len = t4_read_reg(sc, A_MA_EXT_MEMORY_BAR);
r->size = G_EXT_MEM_SIZE(addr_len) << 20;
if (r->size > 0) {
r->start = G_EXT_MEM_BASE(addr_len) << 20;
if (addr >= r->start &&
addr + len <= r->start + r->size)
return (0);
r++;
n++;
}
}
if (is_t5(sc) && em & F_EXT_MEM1_ENABLE) {
addr_len = t4_read_reg(sc, A_MA_EXT_MEMORY1_BAR);
r->size = G_EXT_MEM1_SIZE(addr_len) << 20;
if (r->size > 0) {
r->start = G_EXT_MEM1_BASE(addr_len) << 20;
if (addr >= r->start &&
addr + len <= r->start + r->size)
return (0);
r++;
n++;
}
}
MPASS(n <= nitems(mem_ranges));
if (n > 1) {
/* Sort and merge the ranges. */
qsort(mem_ranges, n, sizeof(struct t4_range), t4_range_cmp);
/* Start from index 0 and examine the next n - 1 entries. */
r = &mem_ranges[0];
for (remaining = n - 1; remaining > 0; remaining--, r++) {
MPASS(r->size > 0); /* r is a valid entry. */
next = r + 1;
MPASS(next->size > 0); /* and so is the next one. */
while (r->start + r->size >= next->start) {
/* Merge the next one into the current entry. */
r->size = max(r->start + r->size,
next->start + next->size) - r->start;
n--; /* One fewer entry in total. */
if (--remaining == 0)
goto done; /* short circuit */
next++;
}
if (next != r + 1) {
/*
* Some entries were merged into r and next
* points to the first valid entry that couldn't
* be merged.
*/
MPASS(next->size > 0); /* must be valid */
memcpy(r + 1, next, remaining * sizeof(*r));
#ifdef INVARIANTS
/*
* This is so that the foo->size assertions in the
* next iteration of the loop do the right thing
* for entries that were pulled up and are no
* longer valid.
*/
MPASS(n < nitems(mem_ranges));
bzero(&mem_ranges[n], (nitems(mem_ranges) - n) *
sizeof(struct t4_range));
#endif
}
}
done:
/* Done merging the ranges. */
MPASS(n > 0);
r = &mem_ranges[0];
for (i = 0; i < n; i++, r++) {
if (addr >= r->start &&
addr + len <= r->start + r->size)
return (0);
}
}
return (EFAULT);
}
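/*
 * Illustration only, not part of the driver: a compact userland sketch of
 * the sort-and-merge idea used by validate_mem_range() above.  It is a
 * simplified rewrite, not the driver's loop; struct example_range and the
 * example_* functions are hypothetical names.  As in the driver, ranges
 * that overlap or merely touch are merged, so a request may span two
 * adjacent memories.
 */

```c
#include <stdint.h>
#include <stdlib.h>

struct example_range {
	uint32_t start;
	uint32_t size;
};

/* Orders ranges by start address without relying on subtraction. */
static int
example_range_cmp(const void *a, const void *b)
{
	const struct example_range *ra = a, *rb = b;

	return ((ra->start > rb->start) - (ra->start < rb->start));
}

/*
 * Sorts r[0..n-1] by start and merges entries that overlap or touch.
 * Returns the number of entries left at the front of the array.
 */
static int
example_merge_ranges(struct example_range *r, int n)
{
	int i, out = 0;

	if (n <= 0)
		return (0);
	qsort(r, (size_t)n, sizeof(*r), example_range_cmp);
	for (i = 1; i < n; i++) {
		uint64_t end = (uint64_t)r[out].start + r[out].size;
		uint64_t nend = (uint64_t)r[i].start + r[i].size;

		if (r[i].start <= end) {
			/* Grow the current entry if the next one reaches further. */
			if (nend > end)
				r[out].size = (uint32_t)(nend - r[out].start);
		} else
			r[++out] = r[i];	/* gap: start a new entry */
	}
	return (out + 1);
}

/* Returns 1 if [addr, addr + len) lies entirely inside one range. */
static int
example_range_contains(const struct example_range *r, int n, uint32_t addr,
    uint32_t len)
{
	int i;

	for (i = 0; i < n; i++) {
		if (addr >= r[i].start && (uint64_t)addr + len <=
		    (uint64_t)r[i].start + r[i].size)
			return (1);
	}
	return (0);
}
```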
static int
fwmtype_to_hwmtype(int mtype)
{
switch (mtype) {
case FW_MEMTYPE_EDC0:
return (MEM_EDC0);
case FW_MEMTYPE_EDC1:
return (MEM_EDC1);
case FW_MEMTYPE_EXTMEM:
return (MEM_MC0);
case FW_MEMTYPE_EXTMEM1:
return (MEM_MC1);
default:
panic("%s: cannot translate fw mtype %d.", __func__, mtype);
}
}
/*
* Verify that the memory range specified by the memtype/offset/len triplet is
* valid and lies entirely within the memtype specified. The global address of
* the start of the range is returned in addr.
*/
static int
validate_mt_off_len(struct adapter *sc, int mtype, uint32_t off, int len,
uint32_t *addr)
{
uint32_t em, addr_len, maddr;
/* Memory can only be accessed in naturally aligned 4 byte units */
if (off & 3 || len & 3 || len == 0)
return (EINVAL);
em = t4_read_reg(sc, A_MA_TARGET_MEM_ENABLE);
switch (fwmtype_to_hwmtype(mtype)) {
case MEM_EDC0:
if (!(em & F_EDRAM0_ENABLE))
return (EINVAL);
addr_len = t4_read_reg(sc, A_MA_EDRAM0_BAR);
maddr = G_EDRAM0_BASE(addr_len) << 20;
break;
case MEM_EDC1:
if (!(em & F_EDRAM1_ENABLE))
return (EINVAL);
addr_len = t4_read_reg(sc, A_MA_EDRAM1_BAR);
maddr = G_EDRAM1_BASE(addr_len) << 20;
break;
case MEM_MC:
if (!(em & F_EXT_MEM_ENABLE))
return (EINVAL);
addr_len = t4_read_reg(sc, A_MA_EXT_MEMORY_BAR);
maddr = G_EXT_MEM_BASE(addr_len) << 20;
break;
case MEM_MC1:
if (!is_t5(sc) || !(em & F_EXT_MEM1_ENABLE))
return (EINVAL);
addr_len = t4_read_reg(sc, A_MA_EXT_MEMORY1_BAR);
maddr = G_EXT_MEM1_BASE(addr_len) << 20;
break;
default:
return (EINVAL);
}
*addr = maddr + off; /* global address */
return (validate_mem_range(sc, *addr, len));
}
static int
fixup_devlog_params(struct adapter *sc)
{
struct devlog_params *dparams = &sc->params.devlog;
int rc;
rc = validate_mt_off_len(sc, dparams->memtype, dparams->start,
dparams->size, &dparams->addr);
return (rc);
}
static int
cfg_itype_and_nqueues(struct adapter *sc, int n10g, int n1g, int num_vis,
struct intrs_and_queues *iaq)
{
int rc, itype, navail, nrxq10g, nrxq1g, n;
int nofldrxq10g = 0, nofldrxq1g = 0;
int nnmrxq10g = 0, nnmrxq1g = 0;
bzero(iaq, sizeof(*iaq));
iaq->ntxq10g = t4_ntxq10g;
iaq->ntxq1g = t4_ntxq1g;
iaq->nrxq10g = nrxq10g = t4_nrxq10g;
iaq->nrxq1g = nrxq1g = t4_nrxq1g;
iaq->rsrv_noflowq = t4_rsrv_noflowq;
#ifdef TCP_OFFLOAD
if (is_offload(sc)) {
iaq->nofldtxq10g = t4_nofldtxq10g;
iaq->nofldtxq1g = t4_nofldtxq1g;
iaq->nofldrxq10g = nofldrxq10g = t4_nofldrxq10g;
iaq->nofldrxq1g = nofldrxq1g = t4_nofldrxq1g;
}
#endif
#ifdef DEV_NETMAP
iaq->nnmtxq10g = t4_nnmtxq10g;
iaq->nnmtxq1g = t4_nnmtxq1g;
iaq->nnmrxq10g = nnmrxq10g = t4_nnmrxq10g;
iaq->nnmrxq1g = nnmrxq1g = t4_nnmrxq1g;
#endif
for (itype = INTR_MSIX; itype; itype >>= 1) {
if ((itype & t4_intr_types) == 0)
continue; /* not allowed */
if (itype == INTR_MSIX)
navail = pci_msix_count(sc->dev);
else if (itype == INTR_MSI)
navail = pci_msi_count(sc->dev);
else
navail = 1;
restart:
if (navail == 0)
continue;
iaq->intr_type = itype;
iaq->intr_flags_10g = 0;
iaq->intr_flags_1g = 0;
/*
* Best option: an interrupt vector for errors, one for the
* firmware event queue, and one for every rxq (NIC, TOE, and
* netmap).
*/
iaq->nirq = T4_EXTRA_INTR;
iaq->nirq += n10g * (nrxq10g + nofldrxq10g + nnmrxq10g);
iaq->nirq += n10g * 2 * (num_vis - 1);
iaq->nirq += n1g * (nrxq1g + nofldrxq1g + nnmrxq1g);
iaq->nirq += n1g * 2 * (num_vis - 1);
if (iaq->nirq <= navail &&
(itype != INTR_MSI || powerof2(iaq->nirq))) {
iaq->intr_flags_10g = INTR_ALL;
iaq->intr_flags_1g = INTR_ALL;
goto allocate;
}
/*
* Second best option: a vector for errors, one for the firmware
* event queue, and vectors for either all the NIC rx queues or
* all the TOE rx queues. The queues that don't get vectors
* will forward their interrupts to those that do.
*
* Note: netmap rx queues cannot be created early and so they
* can't be setup to receive forwarded interrupts for others.
*/
iaq->nirq = T4_EXTRA_INTR;
if (nrxq10g >= nofldrxq10g) {
iaq->intr_flags_10g = INTR_RXQ;
iaq->nirq += n10g * nrxq10g;
iaq->nirq += n10g * (num_vis - 1);
#ifdef DEV_NETMAP
iaq->nnmrxq10g = min(nnmrxq10g, nrxq10g);
#endif
} else {
iaq->intr_flags_10g = INTR_OFLD_RXQ;
iaq->nirq += n10g * nofldrxq10g;
#ifdef DEV_NETMAP
iaq->nnmrxq10g = min(nnmrxq10g, nofldrxq10g);
#endif
}
if (nrxq1g >= nofldrxq1g) {
iaq->intr_flags_1g = INTR_RXQ;
iaq->nirq += n1g * nrxq1g;
iaq->nirq += n1g * (num_vis - 1);
#ifdef DEV_NETMAP
iaq->nnmrxq1g = min(nnmrxq1g, nrxq1g);
#endif
} else {
iaq->intr_flags_1g = INTR_OFLD_RXQ;
iaq->nirq += n1g * nofldrxq1g;
#ifdef DEV_NETMAP
iaq->nnmrxq1g = min(nnmrxq1g, nofldrxq1g);
#endif
}
if (iaq->nirq <= navail &&
(itype != INTR_MSI || powerof2(iaq->nirq)))
goto allocate;
/*
* Next best option: an interrupt vector for errors, one for the
* firmware event queue, and at least one per VI. At this
* point we know we'll have to downsize nrxq and/or nofldrxq
* and/or nnmrxq to fit what's available to us.
*/
iaq->nirq = T4_EXTRA_INTR;
iaq->nirq += (n10g + n1g) * num_vis;
if (iaq->nirq <= navail) {
int leftover = navail - iaq->nirq;
if (n10g > 0) {
int target = max(nrxq10g, nofldrxq10g);
iaq->intr_flags_10g = nrxq10g >= nofldrxq10g ?
INTR_RXQ : INTR_OFLD_RXQ;
n = 1;
while (n < target && leftover >= n10g) {
leftover -= n10g;
iaq->nirq += n10g;
n++;
}
iaq->nrxq10g = min(n, nrxq10g);
#ifdef TCP_OFFLOAD
iaq->nofldrxq10g = min(n, nofldrxq10g);
#endif
#ifdef DEV_NETMAP
iaq->nnmrxq10g = min(n, nnmrxq10g);
#endif
}
if (n1g > 0) {
int target = max(nrxq1g, nofldrxq1g);
iaq->intr_flags_1g = nrxq1g >= nofldrxq1g ?
INTR_RXQ : INTR_OFLD_RXQ;
n = 1;
while (n < target && leftover >= n1g) {
leftover -= n1g;
iaq->nirq += n1g;
n++;
}
iaq->nrxq1g = min(n, nrxq1g);
#ifdef TCP_OFFLOAD
iaq->nofldrxq1g = min(n, nofldrxq1g);
#endif
#ifdef DEV_NETMAP
iaq->nnmrxq1g = min(n, nnmrxq1g);
#endif
}
if (itype != INTR_MSI || powerof2(iaq->nirq))
goto allocate;
}
/*
* Least desirable option: one interrupt vector for everything.
*/
iaq->nirq = iaq->nrxq10g = iaq->nrxq1g = 1;
iaq->intr_flags_10g = iaq->intr_flags_1g = 0;
#ifdef TCP_OFFLOAD
if (is_offload(sc))
iaq->nofldrxq10g = iaq->nofldrxq1g = 1;
#endif
#ifdef DEV_NETMAP
iaq->nnmrxq10g = iaq->nnmrxq1g = 1;
#endif
allocate:
navail = iaq->nirq;
rc = 0;
if (itype == INTR_MSIX)
rc = pci_alloc_msix(sc->dev, &navail);
else if (itype == INTR_MSI)
rc = pci_alloc_msi(sc->dev, &navail);
if (rc == 0) {
if (navail == iaq->nirq)
return (0);
/*
* Didn't get the number requested. Use whatever number
* the kernel is willing to allocate (it's in navail).
*/
device_printf(sc->dev, "fewer vectors than requested, "
"type=%d, req=%d, rcvd=%d; will downshift req.\n",
itype, iaq->nirq, navail);
pci_release_msi(sc->dev);
goto restart;
}
device_printf(sc->dev,
"failed to allocate vectors:%d, type=%d, req=%d, rcvd=%d\n",
rc, itype, iaq->nirq, navail);
}
device_printf(sc->dev,
"failed to find a usable interrupt type. "
"allowed=%d, msi-x=%d, msi=%d, intx=1\n", t4_intr_types,
pci_msix_count(sc->dev), pci_msi_count(sc->dev));
return (ENXIO);
}
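/*
 * Illustration only, not part of the driver: a sketch of the vector count
 * for the "best option" in cfg_itype_and_nqueues() above and of the
 * power-of-2 rule applied to MSI.  The example_* names are hypothetical.
 * EXAMPLE_EXTRA_INTR = 2 is an assumption taken from the comment above
 * (one vector for errors, one for the firmware event queue); the driver's
 * T4_EXTRA_INTR is defined elsewhere.  rxq10g and rxq1g stand for the
 * per-port sum of NIC, TOE, and netmap rx queues.
 */

```c
/* Assumed value, see the note above. */
#define EXAMPLE_EXTRA_INTR	2

/* Vectors needed when every rx queue gets its own interrupt. */
static int
example_nirq_best(int n10g, int n1g, int num_vis, int rxq10g, int rxq1g)
{
	int nirq = EXAMPLE_EXTRA_INTR;

	nirq += n10g * rxq10g;
	nirq += n10g * 2 * (num_vis - 1);	/* extra VIs on 10G ports */
	nirq += n1g * rxq1g;
	nirq += n1g * 2 * (num_vis - 1);	/* extra VIs on 1G ports */
	return (nirq);
}

/*
 * A request is usable if it fits in what is available and, for MSI only,
 * the count is a power of 2.
 */
static int
example_request_fits(int nirq, int navail, int is_msi)
{
	return (nirq <= navail && (!is_msi || (nirq & (nirq - 1)) == 0));
}
```

/*
 * Two 10G ports with 8 rx queues each and a single VI need 18 vectors.
 * That fits in 32 MSI-X vectors but is rejected for MSI because 18 is not
 * a power of 2, which is what pushes the driver to its next option.
 */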
#define FW_VERSION(chip) ( \
V_FW_HDR_FW_VER_MAJOR(chip##FW_VERSION_MAJOR) | \
V_FW_HDR_FW_VER_MINOR(chip##FW_VERSION_MINOR) | \
V_FW_HDR_FW_VER_MICRO(chip##FW_VERSION_MICRO) | \
V_FW_HDR_FW_VER_BUILD(chip##FW_VERSION_BUILD))
#define FW_INTFVER(chip, intf) (chip##FW_HDR_INTFVER_##intf)
struct fw_info {
uint8_t chip;
char *kld_name;
char *fw_mod_name;
struct fw_hdr fw_hdr; /* XXX: waste of space, need a sparse struct */
} fw_info[] = {
{
.chip = CHELSIO_T4,
.kld_name = "t4fw_cfg",
.fw_mod_name = "t4fw",
.fw_hdr = {
.chip = FW_HDR_CHIP_T4,
.fw_ver = htobe32_const(FW_VERSION(T4)),
.intfver_nic = FW_INTFVER(T4, NIC),
.intfver_vnic = FW_INTFVER(T4, VNIC),
.intfver_ofld = FW_INTFVER(T4, OFLD),
.intfver_ri = FW_INTFVER(T4, RI),
.intfver_iscsipdu = FW_INTFVER(T4, ISCSIPDU),
.intfver_iscsi = FW_INTFVER(T4, ISCSI),
.intfver_fcoepdu = FW_INTFVER(T4, FCOEPDU),
.intfver_fcoe = FW_INTFVER(T4, FCOE),
},
}, {
.chip = CHELSIO_T5,
.kld_name = "t5fw_cfg",
.fw_mod_name = "t5fw",
.fw_hdr = {
.chip = FW_HDR_CHIP_T5,
.fw_ver = htobe32_const(FW_VERSION(T5)),
.intfver_nic = FW_INTFVER(T5, NIC),
.intfver_vnic = FW_INTFVER(T5, VNIC),
.intfver_ofld = FW_INTFVER(T5, OFLD),
.intfver_ri = FW_INTFVER(T5, RI),
.intfver_iscsipdu = FW_INTFVER(T5, ISCSIPDU),
.intfver_iscsi = FW_INTFVER(T5, ISCSI),
.intfver_fcoepdu = FW_INTFVER(T5, FCOEPDU),
.intfver_fcoe = FW_INTFVER(T5, FCOE),
},
}
};
static struct fw_info *
find_fw_info(int chip)
{
int i;
for (i = 0; i < nitems(fw_info); i++) {
if (fw_info[i].chip == chip)
return (&fw_info[i]);
}
return (NULL);
}
/*
* Is the given firmware API compatible with the one the driver was compiled
* with?
*/
static int
fw_compatible(const struct fw_hdr *hdr1, const struct fw_hdr *hdr2)
{
/* short circuit if it's the exact same firmware version */
if (hdr1->chip == hdr2->chip && hdr1->fw_ver == hdr2->fw_ver)
return (1);
/*
* XXX: Is this too conservative? Perhaps I should limit this to the
* features that are supported in the driver.
*/
#define SAME_INTF(x) (hdr1->intfver_##x == hdr2->intfver_##x)
if (hdr1->chip == hdr2->chip && SAME_INTF(nic) && SAME_INTF(vnic) &&
SAME_INTF(ofld) && SAME_INTF(ri) && SAME_INTF(iscsipdu) &&
SAME_INTF(iscsi) && SAME_INTF(fcoepdu) && SAME_INTF(fcoe))
return (1);
#undef SAME_INTF
return (0);
}
/*
* The firmware in the KLD is usable, but should it be installed? This routine
* explains itself in detail if it indicates the KLD firmware should be
* installed.
*/
static int
should_install_kld_fw(struct adapter *sc, int card_fw_usable, int k, int c)
{
const char *reason;
if (!card_fw_usable) {
reason = "incompatible or unusable";
goto install;
}
if (k > c) {
reason = "older than the version bundled with this driver";
goto install;
}
if (t4_fw_install == 2 && k != c) {
reason = "different than the version bundled with this driver";
goto install;
}
return (0);
install:
if (t4_fw_install == 0) {
device_printf(sc->dev, "firmware on card (%u.%u.%u.%u) is %s, "
"but the driver is prohibited from installing a different "
"firmware on the card.\n",
G_FW_HDR_FW_VER_MAJOR(c), G_FW_HDR_FW_VER_MINOR(c),
G_FW_HDR_FW_VER_MICRO(c), G_FW_HDR_FW_VER_BUILD(c), reason);
return (0);
}
device_printf(sc->dev, "firmware on card (%u.%u.%u.%u) is %s, "
"installing firmware %u.%u.%u.%u on card.\n",
G_FW_HDR_FW_VER_MAJOR(c), G_FW_HDR_FW_VER_MINOR(c),
G_FW_HDR_FW_VER_MICRO(c), G_FW_HDR_FW_VER_BUILD(c), reason,
G_FW_HDR_FW_VER_MAJOR(k), G_FW_HDR_FW_VER_MINOR(k),
G_FW_HDR_FW_VER_MICRO(k), G_FW_HDR_FW_VER_BUILD(k));
return (1);
}
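/*
 * Illustration only, not part of the driver: the decision made by
 * should_install_kld_fw() above, reduced to a pure function with the
 * logging removed.  The example_* names are hypothetical.  The version
 * layout in EXAMPLE_FW_VER (major in bits 31:24, minor in 23:16, micro in
 * 15:8, build in 7:0) is an assumption about the V_FW_HDR_FW_VER_* macros,
 * which are defined outside this file.
 */

```c
#include <stdint.h>

/* Assumed packing, see the note above. */
#define EXAMPLE_FW_VER(maj, min, mic, bld) \
	(((uint32_t)(maj) << 24) | ((uint32_t)(min) << 16) | \
	((uint32_t)(mic) << 8) | (uint32_t)(bld))

/*
 * fw_install mirrors the t4_fw_install tunable as used above:
 *   0  never install
 *   1  install when the card's firmware is unusable or older
 *   2  also install when the versions merely differ
 * k is the KLD firmware version, c the version on the card.
 * Returns 1 if the KLD firmware should be installed.
 */
static int
example_should_install(int fw_install, int card_fw_usable, uint32_t k,
    uint32_t c)
{
	int want;

	want = !card_fw_usable || k > c || (fw_install == 2 && k != c);
	return (want && fw_install != 0);
}
```

/*
 * Because the most significant field comes first, comparing two packed
 * versions as plain integers orders them the same way as comparing
 * major, minor, micro, and build in turn.
 */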
/*
* Establish contact with the firmware and determine if we are the master driver
* or not, and whether we are responsible for chip initialization.
*/
static int
prep_firmware(struct adapter *sc)
{
const struct firmware *fw = NULL, *default_cfg;
int rc, pf, card_fw_usable, kld_fw_usable, need_fw_reset = 1;
enum dev_state state;
struct fw_info *fw_info;
struct fw_hdr *card_fw; /* fw on the card */
const struct fw_hdr *kld_fw; /* fw in the KLD */
const struct fw_hdr *drv_fw; /* fw header the driver was compiled
against */
/* Contact firmware. */
rc = t4_fw_hello(sc, sc->mbox, sc->mbox, MASTER_MAY, &state);
if (rc < 0 || state == DEV_STATE_ERR) {
rc = -rc;
device_printf(sc->dev,
"failed to connect to the firmware: %d, %d.\n", rc, state);
return (rc);
}
pf = rc;
if (pf == sc->mbox)
sc->flags |= MASTER_PF;
else if (state == DEV_STATE_UNINIT) {
/*
* We didn't get to be the master so we definitely won't be
* configuring the chip. It's a bug if someone else hasn't
* configured it already.
*/
device_printf(sc->dev, "couldn't be master(%d), "
"device not already initialized either(%d).\n", rc, state);
return (EDOOFUS);
}
/* This is the firmware whose headers the driver was compiled against */
fw_info = find_fw_info(chip_id(sc));
if (fw_info == NULL) {
device_printf(sc->dev,
"unable to look up firmware information for chip %d.\n",
chip_id(sc));
return (EINVAL);
}
drv_fw = &fw_info->fw_hdr;
/*
* The firmware KLD contains many modules. The KLD name is also the
* name of the module that contains the default config file.
*/
default_cfg = firmware_get(fw_info->kld_name);
/* Read the header of the firmware on the card */
card_fw = malloc(sizeof(*card_fw), M_CXGBE, M_ZERO | M_WAITOK);
rc = -t4_read_flash(sc, FLASH_FW_START,
sizeof (*card_fw) / sizeof (uint32_t), (uint32_t *)card_fw, 1);
if (rc == 0)
card_fw_usable = fw_compatible(drv_fw, (const void*)card_fw);
else {
device_printf(sc->dev,
"Unable to read card's firmware header: %d\n", rc);
card_fw_usable = 0;
}
/* This is the firmware in the KLD */
fw = firmware_get(fw_info->fw_mod_name);
if (fw != NULL) {
kld_fw = (const void *)fw->data;
kld_fw_usable = fw_compatible(drv_fw, kld_fw);
} else {
kld_fw = NULL;
kld_fw_usable = 0;
}
if (card_fw_usable && card_fw->fw_ver == drv_fw->fw_ver &&
(!kld_fw_usable || kld_fw->fw_ver == drv_fw->fw_ver)) {
/*
* Common case: the firmware on the card is an exact match and
* the KLD is an exact match too, or the KLD is
* absent/incompatible. Note that t4_fw_install = 2 is ignored
* here -- use cxgbetool loadfw if you want to reinstall the
* same firmware as the one on the card.
*/
} else if (kld_fw_usable && state == DEV_STATE_UNINIT &&
should_install_kld_fw(sc, card_fw_usable, be32toh(kld_fw->fw_ver),
be32toh(card_fw->fw_ver))) {
rc = -t4_fw_upgrade(sc, sc->mbox, fw->data, fw->datasize, 0);
if (rc != 0) {
device_printf(sc->dev,
"failed to install firmware: %d\n", rc);
goto done;
}
/* Installed successfully, update the cached header too. */
memcpy(card_fw, kld_fw, sizeof(*card_fw));
card_fw_usable = 1;
need_fw_reset = 0; /* already reset as part of load_fw */
}
if (!card_fw_usable) {
uint32_t d, c, k;
d = ntohl(drv_fw->fw_ver);
c = ntohl(card_fw->fw_ver);
k = kld_fw ? ntohl(kld_fw->fw_ver) : 0;
device_printf(sc->dev, "Cannot find a usable firmware: "
"fw_install %d, chip state %d, "
"driver compiled with %d.%d.%d.%d, "
"card has %d.%d.%d.%d, KLD has %d.%d.%d.%d\n",
t4_fw_install, state,
G_FW_HDR_FW_VER_MAJOR(d), G_FW_HDR_FW_VER_MINOR(d),
G_FW_HDR_FW_VER_MICRO(d), G_FW_HDR_FW_VER_BUILD(d),
G_FW_HDR_FW_VER_MAJOR(c), G_FW_HDR_FW_VER_MINOR(c),
G_FW_HDR_FW_VER_MICRO(c), G_FW_HDR_FW_VER_BUILD(c),
G_FW_HDR_FW_VER_MAJOR(k), G_FW_HDR_FW_VER_MINOR(k),
G_FW_HDR_FW_VER_MICRO(k), G_FW_HDR_FW_VER_BUILD(k));
rc = EINVAL;
goto done;
}
/* We're using whatever's on the card and it's known to be good. */
sc->params.fw_vers = ntohl(card_fw->fw_ver);
snprintf(sc->fw_version, sizeof(sc->fw_version), "%u.%u.%u.%u",
G_FW_HDR_FW_VER_MAJOR(sc->params.fw_vers),
G_FW_HDR_FW_VER_MINOR(sc->params.fw_vers),
G_FW_HDR_FW_VER_MICRO(sc->params.fw_vers),
G_FW_HDR_FW_VER_BUILD(sc->params.fw_vers));
t4_get_tp_version(sc, &sc->params.tp_vers);
snprintf(sc->tp_version, sizeof(sc->tp_version), "%u.%u.%u.%u",
G_FW_HDR_FW_VER_MAJOR(sc->params.tp_vers),
G_FW_HDR_FW_VER_MINOR(sc->params.tp_vers),
G_FW_HDR_FW_VER_MICRO(sc->params.tp_vers),
G_FW_HDR_FW_VER_BUILD(sc->params.tp_vers));
if (t4_get_exprom_version(sc, &sc->params.exprom_vers) != 0)
sc->params.exprom_vers = 0;
else {
snprintf(sc->exprom_version, sizeof(sc->exprom_version),
"%u.%u.%u.%u",
G_FW_HDR_FW_VER_MAJOR(sc->params.exprom_vers),
G_FW_HDR_FW_VER_MINOR(sc->params.exprom_vers),
G_FW_HDR_FW_VER_MICRO(sc->params.exprom_vers),
G_FW_HDR_FW_VER_BUILD(sc->params.exprom_vers));
}
/* Reset device */
if (need_fw_reset &&
(rc = -t4_fw_reset(sc, sc->mbox, F_PIORSTMODE | F_PIORST)) != 0) {
device_printf(sc->dev, "firmware reset failed: %d.\n", rc);
if (rc != ETIMEDOUT && rc != EIO)
t4_fw_bye(sc, sc->mbox);
goto done;
}
sc->flags |= FW_OK;
rc = get_params__pre_init(sc);
if (rc != 0)
goto done; /* error message displayed already */
/* Partition adapter resources as specified in the config file. */
if (state == DEV_STATE_UNINIT) {
KASSERT(sc->flags & MASTER_PF,
("%s: trying to change chip settings when not master.",
__func__));
rc = partition_resources(sc, default_cfg, fw_info->kld_name);
if (rc != 0)
goto done; /* error message displayed already */
t4_tweak_chip_settings(sc);
/* get basic stuff going */
rc = -t4_fw_initialize(sc, sc->mbox);
if (rc != 0) {
device_printf(sc->dev, "fw init failed: %d.\n", rc);
goto done;
}
} else {
snprintf(sc->cfg_file, sizeof(sc->cfg_file), "pf%d", pf);
sc->cfcsum = 0;
}
done:
free(card_fw, M_CXGBE);
if (fw != NULL)
firmware_put(fw, FIRMWARE_UNLOAD);
if (default_cfg != NULL)
firmware_put(default_cfg, FIRMWARE_UNLOAD);
return (rc);
}
#define FW_PARAM_DEV(param) \
(V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DEV) | \
V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DEV_##param))
#define FW_PARAM_PFVF(param) \
(V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_PFVF) | \
V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_PFVF_##param))
/*
* Partition chip resources for use between various PFs, VFs, etc.
*/
static int
partition_resources(struct adapter *sc, const struct firmware *default_cfg,
const char *name_prefix)
{
const struct firmware *cfg = NULL;
int rc = 0;
struct fw_caps_config_cmd caps;
uint32_t mtype, moff, finicsum, cfcsum;
/*
* Figure out what configuration file to use. Pick the default config
* file for the card if the user hasn't specified one explicitly.
*/
snprintf(sc->cfg_file, sizeof(sc->cfg_file), "%s", t4_cfg_file);
if (strncmp(t4_cfg_file, DEFAULT_CF, sizeof(t4_cfg_file)) == 0) {
/* Card specific overrides go here. */
if (pci_get_device(sc->dev) == 0x440a)
snprintf(sc->cfg_file, sizeof(sc->cfg_file), UWIRE_CF);
if (is_fpga(sc))
snprintf(sc->cfg_file, sizeof(sc->cfg_file), FPGA_CF);
}
/*
* We need to load another module if the profile is anything except
* "default" or "flash".
*/
if (strncmp(sc->cfg_file, DEFAULT_CF, sizeof(sc->cfg_file)) != 0 &&
strncmp(sc->cfg_file, FLASH_CF, sizeof(sc->cfg_file)) != 0) {
char s[32];
snprintf(s, sizeof(s), "%s_%s", name_prefix, sc->cfg_file);
cfg = firmware_get(s);
if (cfg == NULL) {
if (default_cfg != NULL) {
device_printf(sc->dev,
"unable to load module \"%s\" for "
"configuration profile \"%s\", will use "
"the default config file instead.\n",
s, sc->cfg_file);
snprintf(sc->cfg_file, sizeof(sc->cfg_file),
"%s", DEFAULT_CF);
} else {
device_printf(sc->dev,
"unable to load module \"%s\" for "
"configuration profile \"%s\", will use "
"the config file on the card's flash "
"instead.\n", s, sc->cfg_file);
snprintf(sc->cfg_file, sizeof(sc->cfg_file),
"%s", FLASH_CF);
}
}
}
if (strncmp(sc->cfg_file, DEFAULT_CF, sizeof(sc->cfg_file)) == 0 &&
default_cfg == NULL) {
device_printf(sc->dev,
"default config file not available, will use the config "
"file on the card's flash instead.\n");
snprintf(sc->cfg_file, sizeof(sc->cfg_file), "%s", FLASH_CF);
}
if (strncmp(sc->cfg_file, FLASH_CF, sizeof(sc->cfg_file)) != 0) {
u_int cflen;
const uint32_t *cfdata;
uint32_t param, val, addr;
KASSERT(cfg != NULL || default_cfg != NULL,
("%s: no config to upload", __func__));
/*
* Ask the firmware where it wants us to upload the config file.
*/
param = FW_PARAM_DEV(CF);
rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val);
if (rc != 0) {
/* No support for config file? Shouldn't happen. */
device_printf(sc->dev,
"failed to query config file location: %d.\n", rc);
goto done;
}
mtype = G_FW_PARAMS_PARAM_Y(val);
moff = G_FW_PARAMS_PARAM_Z(val) << 16;
/*
* XXX: sheer laziness. We deliberately added 4 bytes of
* useless stuffing/comments at the end of the config file so
* it's ok to simply throw away the last remaining bytes when
* the config file is not an exact multiple of 4. This also
* helps with the validate_mt_off_len check.
*/
if (cfg != NULL) {
cflen = cfg->datasize & ~3;
cfdata = cfg->data;
} else {
cflen = default_cfg->datasize & ~3;
cfdata = default_cfg->data;
}
if (cflen > FLASH_CFG_MAX_SIZE) {
device_printf(sc->dev,
"config file too long (%d, max allowed is %d). "
"Will try to use the config on the card, if any.\n",
cflen, FLASH_CFG_MAX_SIZE);
goto use_config_on_flash;
}
rc = validate_mt_off_len(sc, mtype, moff, cflen, &addr);
if (rc != 0) {
device_printf(sc->dev,
"%s: addr (%d/0x%x) or len %d is not valid: %d. "
"Will try to use the config on the card, if any.\n",
__func__, mtype, moff, cflen, rc);
goto use_config_on_flash;
}
write_via_memwin(sc, 2, addr, cfdata, cflen);
} else {
use_config_on_flash:
mtype = FW_MEMTYPE_FLASH;
moff = t4_flash_cfg_addr(sc);
}
bzero(&caps, sizeof(caps));
caps.op_to_write = htobe32(V_FW_CMD_OP(FW_CAPS_CONFIG_CMD) |
F_FW_CMD_REQUEST | F_FW_CMD_READ);
caps.cfvalid_to_len16 = htobe32(F_FW_CAPS_CONFIG_CMD_CFVALID |
V_FW_CAPS_CONFIG_CMD_MEMTYPE_CF(mtype) |
V_FW_CAPS_CONFIG_CMD_MEMADDR64K_CF(moff >> 16) | FW_LEN16(caps));
rc = -t4_wr_mbox(sc, sc->mbox, &caps, sizeof(caps), &caps);
if (rc != 0) {
device_printf(sc->dev,
"failed to pre-process config file: %d "
"(mtype %d, moff 0x%x).\n", rc, mtype, moff);
goto done;
}
finicsum = be32toh(caps.finicsum);
cfcsum = be32toh(caps.cfcsum);
if (finicsum != cfcsum) {
device_printf(sc->dev,
"WARNING: config file checksum mismatch: %08x %08x\n",
finicsum, cfcsum);
}
sc->cfcsum = cfcsum;
#define LIMIT_CAPS(x) do { \
caps.x &= htobe16(t4_##x##_allowed); \
} while (0)
/*
* Let the firmware know what features will (not) be used so it can tune
* things accordingly.
*/
LIMIT_CAPS(nbmcaps);
LIMIT_CAPS(linkcaps);
LIMIT_CAPS(switchcaps);
LIMIT_CAPS(niccaps);
LIMIT_CAPS(toecaps);
LIMIT_CAPS(rdmacaps);
LIMIT_CAPS(tlscaps);
LIMIT_CAPS(iscsicaps);
LIMIT_CAPS(fcoecaps);
#undef LIMIT_CAPS
caps.op_to_write = htobe32(V_FW_CMD_OP(FW_CAPS_CONFIG_CMD) |
F_FW_CMD_REQUEST | F_FW_CMD_WRITE);
caps.cfvalid_to_len16 = htobe32(FW_LEN16(caps));
rc = -t4_wr_mbox(sc, sc->mbox, &caps, sizeof(caps), NULL);
if (rc != 0) {
device_printf(sc->dev,
"failed to process config file: %d.\n", rc);
}
done:
if (cfg != NULL)
firmware_put(cfg, FIRMWARE_UNLOAD);
return (rc);
}
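/*
* Illustration only, not part of the driver: a minimal sketch of the profile
* fallback order used by partition_resources() above. A custom profile needs
* its own module; if that is missing, the default config file is used, and
* the config on the card's flash is the last resort. choose_profile() and its
* two flag arguments are hypothetical names for this sketch. The strings
* "default" and "flash" follow the comment in the function above.
*/

```c
#include <string.h>

/*
 * Hypothetical helper: returns the profile that ends up in use.
 * module_loaded says whether a module for a custom profile was found;
 * have_default says whether the default config file is available.
 */
static const char *
choose_profile(const char *requested, int module_loaded, int have_default)
{
	const char *p = requested;
	int builtin = strcmp(p, "default") == 0 || strcmp(p, "flash") == 0;

	/* A custom profile needs its own module; fall back if it is missing. */
	if (!builtin && !module_loaded)
		p = have_default ? "default" : "flash";

	/* "default" is only usable when the default config file is present. */
	if (strcmp(p, "default") == 0 && !have_default)
		p = "flash";

	return (p);
}
```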
/*
* Retrieve parameters that are needed (or nice to have) very early.
*/
static int
get_params__pre_init(struct adapter *sc)
{
int rc;
uint32_t param[2], val[2];
param[0] = FW_PARAM_DEV(PORTVEC);
param[1] = FW_PARAM_DEV(CCLK);
rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 2, param, val);
if (rc != 0) {
device_printf(sc->dev,
"failed to query parameters (pre_init): %d.\n", rc);
return (rc);
}
sc->params.portvec = val[0];
sc->params.nports = bitcount32(val[0]);
sc->params.vpd.cclk = val[1];
/* Read device log parameters. */
rc = -t4_init_devlog_params(sc, 1);
if (rc == 0)
fixup_devlog_params(sc);
else {
device_printf(sc->dev,
"failed to get devlog parameters: %d.\n", rc);
rc = 0; /* devlog isn't critical for device operation */
}
return (rc);
}
/*
* Retrieve various parameters that are of interest to the driver. The device
* has been initialized by the firmware at this point.
*/
static int
get_params__post_init(struct adapter *sc)
{
int rc;
uint32_t param[7], val[7];
struct fw_caps_config_cmd caps;
param[0] = FW_PARAM_PFVF(IQFLINT_START);
param[1] = FW_PARAM_PFVF(EQ_START);
param[2] = FW_PARAM_PFVF(FILTER_START);
param[3] = FW_PARAM_PFVF(FILTER_END);
param[4] = FW_PARAM_PFVF(L2T_START);
param[5] = FW_PARAM_PFVF(L2T_END);
rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 6, param, val);
if (rc != 0) {
device_printf(sc->dev,
"failed to query parameters (post_init): %d.\n", rc);
return (rc);
}
sc->sge.iq_start = val[0];
sc->sge.eq_start = val[1];
sc->tids.ftid_base = val[2];
sc->tids.nftids = val[3] - val[2] + 1;
sc->params.ftid_min = val[2];
sc->params.ftid_max = val[3];
sc->vres.l2t.start = val[4];
sc->vres.l2t.size = val[5] - val[4] + 1;
KASSERT(sc->vres.l2t.size <= L2T_SIZE,
("%s: L2 table size (%u) larger than expected (%u)",
__func__, sc->vres.l2t.size, L2T_SIZE));
/* get capabilities */
bzero(&caps, sizeof(caps));
caps.op_to_write = htobe32(V_FW_CMD_OP(FW_CAPS_CONFIG_CMD) |
F_FW_CMD_REQUEST | F_FW_CMD_READ);
caps.cfvalid_to_len16 = htobe32(FW_LEN16(caps));
rc = -t4_wr_mbox(sc, sc->mbox, &caps, sizeof(caps), &caps);
if (rc != 0) {
device_printf(sc->dev,
"failed to get card capabilities: %d.\n", rc);
return (rc);
}
#define READ_CAPS(x) do { \
sc->x = be16toh(caps.x); \
} while (0)
READ_CAPS(nbmcaps);
READ_CAPS(linkcaps);
READ_CAPS(switchcaps);
READ_CAPS(niccaps);
READ_CAPS(toecaps);
READ_CAPS(rdmacaps);
READ_CAPS(tlscaps);
READ_CAPS(iscsicaps);
READ_CAPS(fcoecaps);
if (sc->niccaps & FW_CAPS_CONFIG_NIC_ETHOFLD) {
param[0] = FW_PARAM_PFVF(ETHOFLD_START);
param[1] = FW_PARAM_PFVF(ETHOFLD_END);
param[2] = FW_PARAM_DEV(FLOWC_BUFFIFO_SZ);
rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 3, param, val);
if (rc != 0) {
device_printf(sc->dev,
"failed to query NIC parameters: %d.\n", rc);
return (rc);
}
sc->tids.etid_base = val[0];
sc->params.etid_min = val[0];
sc->tids.netids = val[1] - val[0] + 1;
sc->params.netids = sc->tids.netids;
sc->params.eo_wr_cred = val[2];
sc->params.ethoffload = 1;
}
if (sc->toecaps) {
/* query offload-related parameters */
param[0] = FW_PARAM_DEV(NTID);
param[1] = FW_PARAM_PFVF(SERVER_START);
param[2] = FW_PARAM_PFVF(SERVER_END);
param[3] = FW_PARAM_PFVF(TDDP_START);
param[4] = FW_PARAM_PFVF(TDDP_END);
param[5] = FW_PARAM_DEV(FLOWC_BUFFIFO_SZ);
rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 6, param, val);
if (rc != 0) {
device_printf(sc->dev,
"failed to query TOE parameters: %d.\n", rc);
return (rc);
}
sc->tids.ntids = val[0];
sc->tids.natids = min(sc->tids.ntids / 2, MAX_ATIDS);
sc->tids.stid_base = val[1];
sc->tids.nstids = val[2] - val[1] + 1;
sc->vres.ddp.start = val[3];
sc->vres.ddp.size = val[4] - val[3] + 1;
sc->params.ofldq_wr_cred = val[5];
sc->params.offload = 1;
}
if (sc->rdmacaps) {
param[0] = FW_PARAM_PFVF(STAG_START);
param[1] = FW_PARAM_PFVF(STAG_END);
param[2] = FW_PARAM_PFVF(RQ_START);
param[3] = FW_PARAM_PFVF(RQ_END);
param[4] = FW_PARAM_PFVF(PBL_START);
param[5] = FW_PARAM_PFVF(PBL_END);
rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 6, param, val);
if (rc != 0) {
device_printf(sc->dev,
"failed to query RDMA parameters(1): %d.\n", rc);
return (rc);
}
sc->vres.stag.start = val[0];
sc->vres.stag.size = val[1] - val[0] + 1;
sc->vres.rq.start = val[2];
sc->vres.rq.size = val[3] - val[2] + 1;
sc->vres.pbl.start = val[4];
sc->vres.pbl.size = val[5] - val[4] + 1;
param[0] = FW_PARAM_PFVF(SQRQ_START);
param[1] = FW_PARAM_PFVF(SQRQ_END);
param[2] = FW_PARAM_PFVF(CQ_START);
param[3] = FW_PARAM_PFVF(CQ_END);
param[4] = FW_PARAM_PFVF(OCQ_START);
param[5] = FW_PARAM_PFVF(OCQ_END);
rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 6, param, val);
if (rc != 0) {
device_printf(sc->dev,
"failed to query RDMA parameters(2): %d.\n", rc);
return (rc);
}
sc->vres.qp.start = val[0];
sc->vres.qp.size = val[1] - val[0] + 1;
sc->vres.cq.start = val[2];
sc->vres.cq.size = val[3] - val[2] + 1;
sc->vres.ocq.start = val[4];
sc->vres.ocq.size = val[5] - val[4] + 1;
}
if (sc->iscsicaps) {
param[0] = FW_PARAM_PFVF(ISCSI_START);
param[1] = FW_PARAM_PFVF(ISCSI_END);
rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 2, param, val);
if (rc != 0) {
device_printf(sc->dev,
"failed to query iSCSI parameters: %d.\n", rc);
return (rc);
}
sc->vres.iscsi.start = val[0];
sc->vres.iscsi.size = val[1] - val[0] + 1;
}
/*
* We've got the params we wanted to query via the firmware. Now grab
* some others directly from the chip.
*/
rc = t4_read_chip_settings(sc);
return (rc);
}
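/*
* Illustration only, not part of the driver: get_params__post_init() above
* receives each resource region as an inclusive START/END pair and stores it
* as a start plus a size, hence the recurring "+ 1". The struct and function
* names below are hypothetical.
*/

```c
#include <stdint.h>

struct res_range {
	uint32_t start;
	uint32_t size;
};

/* Convert inclusive [first, last] bounds into a start/size pair. */
static struct res_range
range_from_bounds(uint32_t first, uint32_t last)
{
	struct res_range r;

	r.start = first;
	r.size = last - first + 1;	/* both endpoints belong to the range */
	return (r);
}
```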
static int
set_params__post_init(struct adapter *sc)
{
uint32_t param, val;
/* ask for encapsulated CPLs */
param = FW_PARAM_PFVF(CPLFW4MSG_ENCAP);
val = 1;
(void)t4_set_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val);
return (0);
}
#undef FW_PARAM_PFVF
#undef FW_PARAM_DEV
static void
t4_set_desc(struct adapter *sc)
{
char buf[128];
struct adapter_params *p = &sc->params;
snprintf(buf, sizeof(buf), "Chelsio %s %sNIC (rev %d), S/N:%s, "
"P/N:%s, E/C:%s", p->vpd.id, is_offload(sc) ? "R" : "",
chip_rev(sc), p->vpd.sn, p->vpd.pn, p->vpd.ec);
device_set_desc_copy(sc->dev, buf);
}
static void
build_medialist(struct port_info *pi, struct ifmedia *media)
{
int m;
PORT_LOCK(pi);
ifmedia_removeall(media);
m = IFM_ETHER | IFM_FDX;
switch (pi->port_type) {
case FW_PORT_TYPE_BT_XFI:
case FW_PORT_TYPE_BT_XAUI:
ifmedia_add(media, m | IFM_10G_T, 0, NULL);
/* fall through */
case FW_PORT_TYPE_BT_SGMII:
ifmedia_add(media, m | IFM_1000_T, 0, NULL);
ifmedia_add(media, m | IFM_100_TX, 0, NULL);
ifmedia_add(media, IFM_ETHER | IFM_AUTO, 0, NULL);
ifmedia_set(media, IFM_ETHER | IFM_AUTO);
break;
case FW_PORT_TYPE_CX4:
ifmedia_add(media, m | IFM_10G_CX4, 0, NULL);
ifmedia_set(media, m | IFM_10G_CX4);
break;
case FW_PORT_TYPE_QSFP_10G:
case FW_PORT_TYPE_SFP:
case FW_PORT_TYPE_FIBER_XFI:
case FW_PORT_TYPE_FIBER_XAUI:
switch (pi->mod_type) {
case FW_PORT_MOD_TYPE_LR:
ifmedia_add(media, m | IFM_10G_LR, 0, NULL);
ifmedia_set(media, m | IFM_10G_LR);
break;
case FW_PORT_MOD_TYPE_SR:
ifmedia_add(media, m | IFM_10G_SR, 0, NULL);
ifmedia_set(media, m | IFM_10G_SR);
break;
case FW_PORT_MOD_TYPE_LRM:
ifmedia_add(media, m | IFM_10G_LRM, 0, NULL);
ifmedia_set(media, m | IFM_10G_LRM);
break;
case FW_PORT_MOD_TYPE_TWINAX_PASSIVE:
case FW_PORT_MOD_TYPE_TWINAX_ACTIVE:
ifmedia_add(media, m | IFM_10G_TWINAX, 0, NULL);
ifmedia_set(media, m | IFM_10G_TWINAX);
break;
case FW_PORT_MOD_TYPE_NONE:
m &= ~IFM_FDX;
ifmedia_add(media, m | IFM_NONE, 0, NULL);
ifmedia_set(media, m | IFM_NONE);
break;
case FW_PORT_MOD_TYPE_NA:
case FW_PORT_MOD_TYPE_ER:
default:
device_printf(pi->dev,
"unknown port_type (%d), mod_type (%d)\n",
pi->port_type, pi->mod_type);
ifmedia_add(media, m | IFM_UNKNOWN, 0, NULL);
ifmedia_set(media, m | IFM_UNKNOWN);
break;
}
break;
case FW_PORT_TYPE_QSFP:
switch (pi->mod_type) {
case FW_PORT_MOD_TYPE_LR:
ifmedia_add(media, m | IFM_40G_LR4, 0, NULL);
ifmedia_set(media, m | IFM_40G_LR4);
break;
case FW_PORT_MOD_TYPE_SR:
ifmedia_add(media, m | IFM_40G_SR4, 0, NULL);
ifmedia_set(media, m | IFM_40G_SR4);
break;
case FW_PORT_MOD_TYPE_TWINAX_PASSIVE:
case FW_PORT_MOD_TYPE_TWINAX_ACTIVE:
ifmedia_add(media, m | IFM_40G_CR4, 0, NULL);
ifmedia_set(media, m | IFM_40G_CR4);
break;
case FW_PORT_MOD_TYPE_NONE:
m &= ~IFM_FDX;
ifmedia_add(media, m | IFM_NONE, 0, NULL);
ifmedia_set(media, m | IFM_NONE);
break;
default:
device_printf(pi->dev,
"unknown port_type (%d), mod_type (%d)\n",
pi->port_type, pi->mod_type);
ifmedia_add(media, m | IFM_UNKNOWN, 0, NULL);
ifmedia_set(media, m | IFM_UNKNOWN);
break;
}
break;
default:
device_printf(pi->dev,
"unknown port_type (%d), mod_type (%d)\n", pi->port_type,
pi->mod_type);
ifmedia_add(media, m | IFM_UNKNOWN, 0, NULL);
ifmedia_set(media, m | IFM_UNKNOWN);
break;
}
PORT_UNLOCK(pi);
}
#define FW_MAC_EXACT_CHUNK 7
/*
* Program the port's XGMAC based on parameters in ifnet. The caller also
* indicates which parameters should be programmed (the rest are left alone).
*/
int
update_mac_settings(struct ifnet *ifp, int flags)
{
int rc = 0;
struct vi_info *vi = ifp->if_softc;
struct port_info *pi = vi->pi;
struct adapter *sc = pi->adapter;
int mtu = -1, promisc = -1, allmulti = -1, vlanex = -1;
ASSERT_SYNCHRONIZED_OP(sc);
KASSERT(flags, ("%s: not told what to update.", __func__));
if (flags & XGMAC_MTU)
mtu = ifp->if_mtu;
if (flags & XGMAC_PROMISC)
promisc = ifp->if_flags & IFF_PROMISC ? 1 : 0;
if (flags & XGMAC_ALLMULTI)
allmulti = ifp->if_flags & IFF_ALLMULTI ? 1 : 0;
if (flags & XGMAC_VLANEX)
vlanex = ifp->if_capenable & IFCAP_VLAN_HWTAGGING ? 1 : 0;
if (flags & (XGMAC_MTU|XGMAC_PROMISC|XGMAC_ALLMULTI|XGMAC_VLANEX)) {
rc = -t4_set_rxmode(sc, sc->mbox, vi->viid, mtu, promisc,
allmulti, 1, vlanex, false);
if (rc) {
if_printf(ifp, "set_rxmode (%x) failed: %d\n", flags,
rc);
return (rc);
}
}
if (flags & XGMAC_UCADDR) {
uint8_t ucaddr[ETHER_ADDR_LEN];
bcopy(IF_LLADDR(ifp), ucaddr, sizeof(ucaddr));
rc = t4_change_mac(sc, sc->mbox, vi->viid, vi->xact_addr_filt,
ucaddr, true, true);
if (rc < 0) {
rc = -rc;
if_printf(ifp, "change_mac failed: %d\n", rc);
return (rc);
} else {
vi->xact_addr_filt = rc;
rc = 0;
}
}
if (flags & XGMAC_MCADDRS) {
const uint8_t *mcaddr[FW_MAC_EXACT_CHUNK];
int del = 1;
uint64_t hash = 0;
struct ifmultiaddr *ifma;
int i = 0, j;
if_maddr_rlock(ifp);
TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) {
if (ifma->ifma_addr->sa_family != AF_LINK)
continue;
mcaddr[i] =
LLADDR((struct sockaddr_dl *)ifma->ifma_addr);
MPASS(ETHER_IS_MULTICAST(mcaddr[i]));
i++;
if (i == FW_MAC_EXACT_CHUNK) {
rc = t4_alloc_mac_filt(sc, sc->mbox, vi->viid,
del, i, mcaddr, NULL, &hash, 0);
if (rc < 0) {
rc = -rc;
for (j = 0; j < i; j++) {
if_printf(ifp,
"failed to add mc address"
" %02x:%02x:%02x:"
"%02x:%02x:%02x rc=%d\n",
mcaddr[j][0], mcaddr[j][1],
mcaddr[j][2], mcaddr[j][3],
mcaddr[j][4], mcaddr[j][5],
rc);
}
goto mcfail;
}
del = 0;
i = 0;
}
}
if (i > 0) {
rc = t4_alloc_mac_filt(sc, sc->mbox, vi->viid, del, i,
mcaddr, NULL, &hash, 0);
if (rc < 0) {
rc = -rc;
for (j = 0; j < i; j++) {
if_printf(ifp,
"failed to add mc address"
" %02x:%02x:%02x:"
"%02x:%02x:%02x rc=%d\n",
mcaddr[j][0], mcaddr[j][1],
mcaddr[j][2], mcaddr[j][3],
mcaddr[j][4], mcaddr[j][5],
rc);
}
goto mcfail;
}
}
rc = -t4_set_addr_hash(sc, sc->mbox, vi->viid, 0, hash, 0);
if (rc != 0)
if_printf(ifp, "failed to set mc address hash: %d\n", rc);
mcfail:
if_maddr_runlock(ifp);
}
return (rc);
}
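/*
* Illustration only, not part of the driver: a sketch of the batching pattern
* in the XGMAC_MCADDRS branch above. Entries are accumulated and submitted
* whenever a full chunk of 7 is ready, any remainder is submitted at the end,
* and only the first batch carries the flag (del in the driver) that asks for
* the old entries to be dropped. struct batch_log, submit_batch() and
* submit_in_chunks() are hypothetical stand-ins; no firmware call is made.
*/

```c
#include <stddef.h>

#define CHUNK 7	/* mirrors FW_MAC_EXACT_CHUNK */

/* Hypothetical record of what the batch consumer saw. */
struct batch_log {
	int calls;		/* number of batches submitted */
	int replace_calls;	/* batches submitted with replace set */
	size_t total;		/* items submitted overall */
};

static void
submit_batch(struct batch_log *log, size_t n, int replace)
{
	log->calls++;
	log->replace_calls += replace ? 1 : 0;
	log->total += n;
}

/* Walk nitems entries, submitting a batch whenever CHUNK have accumulated. */
static void
submit_in_chunks(struct batch_log *log, size_t nitems)
{
	size_t i = 0, k;
	int replace = 1;	/* only the first batch replaces old entries */

	for (k = 0; k < nitems; k++) {
		if (++i == CHUNK) {
			submit_batch(log, i, replace);
			replace = 0;
			i = 0;
		}
	}
	if (i > 0)
		submit_batch(log, i, replace);
}
```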
/*
* {begin|end}_synchronized_op must be called from the same thread.
*/
int
begin_synchronized_op(struct adapter *sc, struct vi_info *vi, int flags,
char *wmesg)
{
int rc, pri;
#ifdef WITNESS
/* the caller thinks it's ok to sleep, but is it really? */
if (flags & SLEEP_OK)
WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL,
"begin_synchronized_op");
#endif
if (flags & INTR_OK)
pri = PCATCH;
else
pri = 0;
ADAPTER_LOCK(sc);
for (;;) {
if (vi && IS_DOOMED(vi)) {
rc = ENXIO;
goto done;
}
if (!IS_BUSY(sc)) {
rc = 0;
break;
}
if (!(flags & SLEEP_OK)) {
rc = EBUSY;
goto done;
}
if (mtx_sleep(&sc->flags, &sc->sc_lock, pri, wmesg, 0)) {
rc = EINTR;
goto done;
}
}
KASSERT(!IS_BUSY(sc), ("%s: controller busy.", __func__));
SET_BUSY(sc);
#ifdef INVARIANTS
sc->last_op = wmesg;
sc->last_op_thr = curthread;
sc->last_op_flags = flags;
#endif
done:
if (!(flags & HOLD_LOCK) || rc)
ADAPTER_UNLOCK(sc);
return (rc);
}
/*
* Tell if_ioctl and if_init that the VI is going away. This is a
* special variant of begin_synchronized_op and must be paired with a
* call to end_synchronized_op.
*/
void
doom_vi(struct adapter *sc, struct vi_info *vi)
{
ADAPTER_LOCK(sc);
SET_DOOMED(vi);
wakeup(&sc->flags);
while (IS_BUSY(sc))
mtx_sleep(&sc->flags, &sc->sc_lock, 0, "t4detach", 0);
SET_BUSY(sc);
#ifdef INVARIANTS
sc->last_op = "t4detach";
sc->last_op_thr = curthread;
sc->last_op_flags = 0;
#endif
ADAPTER_UNLOCK(sc);
}
/*
* {begin|end}_synchronized_op must be called from the same thread.
*/
void
end_synchronized_op(struct adapter *sc, int flags)
{
if (flags & LOCK_HELD)
ADAPTER_LOCK_ASSERT_OWNED(sc);
else
ADAPTER_LOCK(sc);
KASSERT(IS_BUSY(sc), ("%s: controller not busy.", __func__));
CLR_BUSY(sc);
wakeup(&sc->flags);
ADAPTER_UNLOCK(sc);
}
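/*
* Illustration only, not part of the driver: a single-threaded sketch of the
* decision order in begin_synchronized_op() for a caller that does not pass
* SLEEP_OK. A doomed VI yields ENXIO, a busy adapter yields EBUSY, otherwise
* the caller becomes the owner until it ends the operation. The adapter lock
* and the sleep/wakeup path are deliberately left out, and struct op_gate,
* gate_try_begin() and gate_end() are hypothetical names.
*/

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the adapter's busy/doomed state. */
struct op_gate {
	bool busy;
	bool doomed;
};

/* Non-sleeping entry: fail fast if going away or another op is in flight. */
static int
gate_try_begin(struct op_gate *g)
{
	if (g->doomed)
		return (ENXIO);
	if (g->busy)
		return (EBUSY);
	g->busy = true;
	return (0);
}

/* Release ownership so the next caller can begin. */
static void
gate_end(struct op_gate *g)
{
	g->busy = false;
}
```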
static int
cxgbe_init_synchronized(struct vi_info *vi)
{
struct port_info *pi = vi->pi;
struct adapter *sc = pi->adapter;
struct ifnet *ifp = vi->ifp;
int rc = 0, i;
struct sge_txq *txq;
ASSERT_SYNCHRONIZED_OP(sc);
if (ifp->if_drv_flags & IFF_DRV_RUNNING)
return (0); /* already running */
if (!(sc->flags & FULL_INIT_DONE) &&
((rc = adapter_full_init(sc)) != 0))
return (rc); /* error message displayed already */
if (!(vi->flags & VI_INIT_DONE) &&
((rc = vi_full_init(vi)) != 0))
return (rc); /* error message displayed already */
rc = update_mac_settings(ifp, XGMAC_ALL);
if (rc)
goto done; /* error message displayed already */
rc = -t4_enable_vi(sc, sc->mbox, vi->viid, true, true);
if (rc != 0) {
if_printf(ifp, "enable_vi failed: %d\n", rc);
goto done;
}
/*
* Can't fail from this point onwards. Review cxgbe_uninit_synchronized
* if this changes.
*/
for_each_txq(vi, i, txq) {
TXQ_LOCK(txq);
txq->eq.flags |= EQ_ENABLED;
TXQ_UNLOCK(txq);
}
/*
* The first iq of the first port to come up is used for tracing.
*/
if (sc->traceq < 0 && IS_MAIN_VI(vi)) {
sc->traceq = sc->sge.rxq[vi->first_rxq].iq.abs_id;
t4_write_reg(sc, is_t4(sc) ? A_MPS_TRC_RSS_CONTROL :
A_MPS_T5_TRC_RSS_CONTROL, V_RSSCONTROL(pi->tx_chan) |
V_QUEUENUMBER(sc->traceq));
pi->flags |= HAS_TRACEQ;
}
/* all ok */
PORT_LOCK(pi);
ifp->if_drv_flags |= IFF_DRV_RUNNING;
pi->up_vis++;
if (pi->nvi > 1)
callout_reset(&vi->tick, hz, vi_tick, vi);
else
callout_reset(&pi->tick, hz, cxgbe_tick, pi);
PORT_UNLOCK(pi);
done:
if (rc != 0)
cxgbe_uninit_synchronized(vi);
return (rc);
}
/*
* Idempotent.
*/
static int
cxgbe_uninit_synchronized(struct vi_info *vi)
{
struct port_info *pi = vi->pi;
struct adapter *sc = pi->adapter;
struct ifnet *ifp = vi->ifp;
int rc, i;
struct sge_txq *txq;
ASSERT_SYNCHRONIZED_OP(sc);
if (!(vi->flags & VI_INIT_DONE)) {
KASSERT(!(ifp->if_drv_flags & IFF_DRV_RUNNING),
("uninited VI is running"));
return (0);
}
/*
* Disable the VI so that all its data in either direction is discarded
* by the MPS. Leave everything else (the queues, interrupts, and 1Hz
* tick) intact as the TP can deliver negative advice or data that it's
* holding in its RAM (for an offloaded connection) even after the VI is
* disabled.
*/
rc = -t4_enable_vi(sc, sc->mbox, vi->viid, false, false);
if (rc) {
if_printf(ifp, "disable_vi failed: %d\n", rc);
return (rc);
}
for_each_txq(vi, i, txq) {
TXQ_LOCK(txq);
txq->eq.flags &= ~EQ_ENABLED;
TXQ_UNLOCK(txq);
}
PORT_LOCK(pi);
if (pi->nvi == 1)
callout_stop(&pi->tick);
else
callout_stop(&vi->tick);
if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
PORT_UNLOCK(pi);
return (0);
}
ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
pi->up_vis--;
if (pi->up_vis > 0) {
PORT_UNLOCK(pi);
return (0);
}
PORT_UNLOCK(pi);
pi->link_cfg.link_ok = 0;
pi->link_cfg.speed = 0;
pi->linkdnrc = -1;
t4_os_link_changed(sc, pi->port_id, 0, -1);
return (0);
}
/*
* It is ok for this function to fail midway and return right away. t4_detach
* will walk the entire sc->irq list and clean up whatever is valid.
*/
static int
setup_intr_handlers(struct adapter *sc)
{
int rc, rid, p, q, v;
char s[8];
struct irq *irq;
struct port_info *pi;
struct vi_info *vi;
struct sge_rxq *rxq;
#ifdef TCP_OFFLOAD
struct sge_ofld_rxq *ofld_rxq;
#endif
#ifdef DEV_NETMAP
struct sge_nm_rxq *nm_rxq;
#endif
#ifdef RSS
int nbuckets = rss_getnumbuckets();
#endif
/*
* Setup interrupts.
*/
irq = &sc->irq[0];
rid = sc->intr_type == INTR_INTX ? 0 : 1;
if (sc->intr_count == 1)
return (t4_alloc_irq(sc, irq, rid, t4_intr_all, sc, "all"));
/* Multiple interrupts. */
KASSERT(sc->intr_count >= T4_EXTRA_INTR + sc->params.nports,
("%s: too few intr.", __func__));
/* The first one is always error intr */
rc = t4_alloc_irq(sc, irq, rid, t4_intr_err, sc, "err");
if (rc != 0)
return (rc);
irq++;
rid++;
/* The second one is always the firmware event queue */
rc = t4_alloc_irq(sc, irq, rid, t4_intr_evt, &sc->sge.fwq, "evt");
if (rc != 0)
return (rc);
irq++;
rid++;
for_each_port(sc, p) {
pi = sc->port[p];
for_each_vi(pi, v, vi) {
vi->first_intr = rid - 1;
#ifdef DEV_NETMAP
if (vi->flags & VI_NETMAP) {
for_each_nm_rxq(vi, q, nm_rxq) {
snprintf(s, sizeof(s), "%d-%d", p, q);
rc = t4_alloc_irq(sc, irq, rid,
t4_nm_intr, nm_rxq, s);
if (rc != 0)
return (rc);
irq++;
rid++;
vi->nintr++;
}
continue;
}
#endif
if (vi->flags & INTR_RXQ) {
for_each_rxq(vi, q, rxq) {
if (v == 0)
snprintf(s, sizeof(s), "%d.%d",
p, q);
else
snprintf(s, sizeof(s),
"%d(%d).%d", p, v, q);
rc = t4_alloc_irq(sc, irq, rid,
t4_intr, rxq, s);
if (rc != 0)
return (rc);
#ifdef RSS
bus_bind_intr(sc->dev, irq->res,
rss_getcpu(q % nbuckets));
#endif
irq++;
rid++;
vi->nintr++;
}
}
#ifdef TCP_OFFLOAD
if (vi->flags & INTR_OFLD_RXQ) {
for_each_ofld_rxq(vi, q, ofld_rxq) {
snprintf(s, sizeof(s), "%d,%d", p, q);
rc = t4_alloc_irq(sc, irq, rid,
t4_intr, ofld_rxq, s);
if (rc != 0)
return (rc);
irq++;
rid++;
vi->nintr++;
}
}
#endif
}
}
MPASS(irq == &sc->irq[sc->intr_count]);
return (0);
}
int
adapter_full_init(struct adapter *sc)
{
int rc, i;
ASSERT_SYNCHRONIZED_OP(sc);
ADAPTER_LOCK_ASSERT_NOTOWNED(sc);
KASSERT((sc->flags & FULL_INIT_DONE) == 0,
("%s: FULL_INIT_DONE already", __func__));
/*
* queues that belong to the adapter (not any particular port).
*/
rc = t4_setup_adapter_queues(sc);
if (rc != 0)
goto done;
for (i = 0; i < nitems(sc->tq); i++) {
sc->tq[i] = taskqueue_create("t4 taskq", M_NOWAIT,
taskqueue_thread_enqueue, &sc->tq[i]);
if (sc->tq[i] == NULL) {
device_printf(sc->dev,
"failed to allocate task queue %d\n", i);
rc = ENOMEM;
goto done;
}
taskqueue_start_threads(&sc->tq[i], 1, PI_NET, "%s tq%d",
device_get_nameunit(sc->dev), i);
}
t4_intr_enable(sc);
sc->flags |= FULL_INIT_DONE;
done:
if (rc != 0)
adapter_full_uninit(sc);
return (rc);
}
int
adapter_full_uninit(struct adapter *sc)
{
int i;
ADAPTER_LOCK_ASSERT_NOTOWNED(sc);
t4_teardown_adapter_queues(sc);
for (i = 0; i < nitems(sc->tq) && sc->tq[i]; i++) {
taskqueue_free(sc->tq[i]);
sc->tq[i] = NULL;
}
sc->flags &= ~FULL_INIT_DONE;
return (0);
}
#ifdef RSS
#define SUPPORTED_RSS_HASHTYPES (RSS_HASHTYPE_RSS_IPV4 | \
RSS_HASHTYPE_RSS_TCP_IPV4 | RSS_HASHTYPE_RSS_IPV6 | \
RSS_HASHTYPE_RSS_TCP_IPV6 | RSS_HASHTYPE_RSS_UDP_IPV4 | \
RSS_HASHTYPE_RSS_UDP_IPV6)
/* Translates kernel hash types to hardware. */
static int
hashconfig_to_hashen(int hashconfig)
{
int hashen = 0;
if (hashconfig & RSS_HASHTYPE_RSS_IPV4)
hashen |= F_FW_RSS_VI_CONFIG_CMD_IP4TWOTUPEN;
if (hashconfig & RSS_HASHTYPE_RSS_IPV6)
hashen |= F_FW_RSS_VI_CONFIG_CMD_IP6TWOTUPEN;
if (hashconfig & RSS_HASHTYPE_RSS_UDP_IPV4) {
hashen |= F_FW_RSS_VI_CONFIG_CMD_UDPEN |
F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN;
}
if (hashconfig & RSS_HASHTYPE_RSS_UDP_IPV6) {
hashen |= F_FW_RSS_VI_CONFIG_CMD_UDPEN |
F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN;
}
if (hashconfig & RSS_HASHTYPE_RSS_TCP_IPV4)
hashen |= F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN;
if (hashconfig & RSS_HASHTYPE_RSS_TCP_IPV6)
hashen |= F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN;
return (hashen);
}
/* Translates hardware hash types to kernel. */
static int
hashen_to_hashconfig(int hashen)
{
int hashconfig = 0;
if (hashen & F_FW_RSS_VI_CONFIG_CMD_UDPEN) {
/*
* If UDP hashing was enabled it must have been enabled for
* either IPv4 or IPv6 (inclusive or). Enabling UDP without
* enabling any 4-tuple hash is a nonsensical configuration.
*/
MPASS(hashen & (F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN |
F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN));
if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN)
hashconfig |= RSS_HASHTYPE_RSS_UDP_IPV4;
if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN)
hashconfig |= RSS_HASHTYPE_RSS_UDP_IPV6;
}
if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN)
hashconfig |= RSS_HASHTYPE_RSS_TCP_IPV4;
if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN)
hashconfig |= RSS_HASHTYPE_RSS_TCP_IPV6;
if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP4TWOTUPEN)
hashconfig |= RSS_HASHTYPE_RSS_IPV4;
if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP6TWOTUPEN)
hashconfig |= RSS_HASHTYPE_RSS_IPV6;
return (hashconfig);
}
#endif
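/*
* Illustration only, not part of the driver: a reduced IPv4-only sketch of
* why translating a hash config to hardware bits and back can report extra
* hash types. UDP hashing needs the 4-tuple enable, and the 4-tuple enable
* also hashes TCP, so asking for UDP alone comes back with TCP switched on.
* The bit values and function names below are hypothetical; the real
* constants come from the kernel RSS and firmware headers.
*/

```c
/* Hypothetical kernel-side hash types. */
enum { K_IPV4 = 1 << 0, K_TCP_IPV4 = 1 << 1, K_UDP_IPV4 = 1 << 2 };
/* Hypothetical hardware enable bits. */
enum { HW_IP4_2TUP = 1 << 0, HW_IP4_4TUP = 1 << 1, HW_UDP = 1 << 2 };

static int
to_hw(int cfg)
{
	int hw = 0;

	if (cfg & K_IPV4)
		hw |= HW_IP4_2TUP;
	if (cfg & K_UDP_IPV4)
		hw |= HW_UDP | HW_IP4_4TUP;
	if (cfg & K_TCP_IPV4)
		hw |= HW_IP4_4TUP;
	return (hw);
}

static int
to_kernel(int hw)
{
	int cfg = 0;

	if ((hw & HW_UDP) && (hw & HW_IP4_4TUP))
		cfg |= K_UDP_IPV4;
	if (hw & HW_IP4_4TUP)
		cfg |= K_TCP_IPV4;
	if (hw & HW_IP4_2TUP)
		cfg |= K_IPV4;
	return (cfg);
}

/* Hash types that were not requested but end up enabled. */
static int
forced_on(int cfg)
{
	return (to_kernel(to_hw(cfg)) ^ cfg);
}
```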
int
vi_full_init(struct vi_info *vi)
{
struct adapter *sc = vi->pi->adapter;
struct ifnet *ifp = vi->ifp;
uint16_t *rss;
struct sge_rxq *rxq;
int rc, i, j, hashen;
#ifdef RSS
int nbuckets = rss_getnumbuckets();
int hashconfig = rss_gethashconfig();
int extra;
uint32_t raw_rss_key[RSS_KEYSIZE / sizeof(uint32_t)];
uint32_t rss_key[RSS_KEYSIZE / sizeof(uint32_t)];
#endif
ASSERT_SYNCHRONIZED_OP(sc);
KASSERT((vi->flags & VI_INIT_DONE) == 0,
("%s: VI_INIT_DONE already", __func__));
sysctl_ctx_init(&vi->ctx);
vi->flags |= VI_SYSCTL_CTX;
/*
* Allocate tx/rx/fl queues for this VI.
*/
rc = t4_setup_vi_queues(vi);
if (rc != 0)
goto done; /* error message displayed already */
#ifdef DEV_NETMAP
/* Netmap VIs configure RSS when netmap is enabled. */
if (vi->flags & VI_NETMAP) {
vi->flags |= VI_INIT_DONE;
return (0);
}
#endif
/*
* Setup RSS for this VI. Save a copy of the RSS table for later use.
*/
if (vi->nrxq > vi->rss_size) {
if_printf(ifp, "nrxq (%d) > hw RSS table size (%d); "
"some queues will never receive traffic.\n", vi->nrxq,
vi->rss_size);
} else if (vi->rss_size % vi->nrxq) {
if_printf(ifp, "nrxq (%d), hw RSS table size (%d); "
"expect uneven traffic distribution.\n", vi->nrxq,
vi->rss_size);
}
#ifdef RSS
MPASS(RSS_KEYSIZE == 40);
if (vi->nrxq != nbuckets) {
if_printf(ifp, "nrxq (%d) != kernel RSS buckets (%d); "
"performance will be impacted.\n", vi->nrxq, nbuckets);
}
rss_getkey((void *)&raw_rss_key[0]);
for (i = 0; i < nitems(rss_key); i++) {
rss_key[i] = htobe32(raw_rss_key[nitems(rss_key) - 1 - i]);
}
t4_write_rss_key(sc, &rss_key[0], -1);
#endif
rss = malloc(vi->rss_size * sizeof (*rss), M_CXGBE, M_ZERO | M_WAITOK);
for (i = 0; i < vi->rss_size;) {
#ifdef RSS
j = rss_get_indirection_to_bucket(i);
j %= vi->nrxq;
rxq = &sc->sge.rxq[vi->first_rxq + j];
rss[i++] = rxq->iq.abs_id;
#else
for_each_rxq(vi, j, rxq) {
rss[i++] = rxq->iq.abs_id;
if (i == vi->rss_size)
break;
}
#endif
}
rc = -t4_config_rss_range(sc, sc->mbox, vi->viid, 0, vi->rss_size, rss,
vi->rss_size);
if (rc != 0) {
if_printf(ifp, "rss_config failed: %d\n", rc);
goto done;
}
#ifdef RSS
hashen = hashconfig_to_hashen(hashconfig);
/*
* We may have had to enable some hashes even though the global config
* wants them disabled. This is a potential problem that must be
* reported to the user.
*/
extra = hashen_to_hashconfig(hashen) ^ hashconfig;
/*
* If we consider only the supported hash types, then the enabled hashes
* are a superset of the requested hashes. In other words, there cannot
* be any supported hash that was requested but not enabled, but there
* can be hashes that were not requested but had to be enabled.
*/
extra &= SUPPORTED_RSS_HASHTYPES;
MPASS((extra & hashconfig) == 0);
if (extra) {
if_printf(ifp,
"global RSS config (0x%x) cannot be accommodated.\n",
hashconfig);
}
if (extra & RSS_HASHTYPE_RSS_IPV4)
if_printf(ifp, "IPv4 2-tuple hashing forced on.\n");
if (extra & RSS_HASHTYPE_RSS_TCP_IPV4)
if_printf(ifp, "TCP/IPv4 4-tuple hashing forced on.\n");
if (extra & RSS_HASHTYPE_RSS_IPV6)
if_printf(ifp, "IPv6 2-tuple hashing forced on.\n");
if (extra & RSS_HASHTYPE_RSS_TCP_IPV6)
if_printf(ifp, "TCP/IPv6 4-tuple hashing forced on.\n");
if (extra & RSS_HASHTYPE_RSS_UDP_IPV4)
if_printf(ifp, "UDP/IPv4 4-tuple hashing forced on.\n");
if (extra & RSS_HASHTYPE_RSS_UDP_IPV6)
if_printf(ifp, "UDP/IPv6 4-tuple hashing forced on.\n");
#else
hashen = F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN |
F_FW_RSS_VI_CONFIG_CMD_IP6TWOTUPEN |
F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN |
F_FW_RSS_VI_CONFIG_CMD_IP4TWOTUPEN | F_FW_RSS_VI_CONFIG_CMD_UDPEN;
#endif
rc = -t4_config_vi_rss(sc, sc->mbox, vi->viid, hashen, rss[0]);
if (rc != 0) {
if_printf(ifp, "rss hash/defaultq config failed: %d\n", rc);
goto done;
}
vi->rss = rss;
vi->flags |= VI_INIT_DONE;
done:
if (rc != 0)
vi_full_uninit(vi);
return (rc);
}
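/*
* Illustration only, not part of the driver: a sketch of how the non-RSS
* branch of vi_full_init() above fills the indirection table by cycling
* through the VI's rx queues until every slot is used. For simplicity the
* sketch assumes the queue ids are consecutive starting at first_id, which
* the driver does not require (it stores each queue's abs_id).
* fill_rss_table() is a hypothetical name.
*/

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fill an indirection table by cycling through nq consecutive queue ids.
 * nq must be nonzero.
 */
static void
fill_rss_table(uint16_t *table, size_t size, uint16_t first_id, size_t nq)
{
	size_t i;

	for (i = 0; i < size; i++)
		table[i] = (uint16_t)(first_id + i % nq);
}
```

/*
* With 8 slots and 3 queues the first two queues get three slots each and
* the third gets two, which is the uneven distribution the driver warns
* about when rss_size is not a multiple of nrxq.
*/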
/*
* Idempotent.
*/
int
vi_full_uninit(struct vi_info *vi)
{
struct port_info *pi = vi->pi;
struct adapter *sc = pi->adapter;
int i;
struct sge_rxq *rxq;
struct sge_txq *txq;
#ifdef TCP_OFFLOAD
struct sge_ofld_rxq *ofld_rxq;
struct sge_wrq *ofld_txq;
#endif
if (vi->flags & VI_INIT_DONE) {
/* Need to quiesce queues. */
#ifdef DEV_NETMAP
if (vi->flags & VI_NETMAP)
goto skip;
#endif
/* XXX: Only for the first VI? */
if (IS_MAIN_VI(vi))
quiesce_wrq(sc, &sc->sge.ctrlq[pi->port_id]);
for_each_txq(vi, i, txq) {
quiesce_txq(sc, txq);
}
#ifdef TCP_OFFLOAD
for_each_ofld_txq(vi, i, ofld_txq) {
quiesce_wrq(sc, ofld_txq);
}
#endif
for_each_rxq(vi, i, rxq) {
quiesce_iq(sc, &rxq->iq);
quiesce_fl(sc, &rxq->fl);
}
#ifdef TCP_OFFLOAD
for_each_ofld_rxq(vi, i, ofld_rxq) {
quiesce_iq(sc, &ofld_rxq->iq);
quiesce_fl(sc, &ofld_rxq->fl);
}
#endif
free(vi->rss, M_CXGBE);
}
#ifdef DEV_NETMAP
skip:
#endif
t4_teardown_vi_queues(vi);
vi->flags &= ~VI_INIT_DONE;
return (0);
}
static void
quiesce_txq(struct adapter *sc, struct sge_txq *txq)
{
struct sge_eq *eq = &txq->eq;
struct sge_qstat *spg = (void *)&eq->desc[eq->sidx];
(void) sc; /* unused */
#ifdef INVARIANTS
TXQ_LOCK(txq);
MPASS((eq->flags & EQ_ENABLED) == 0);
TXQ_UNLOCK(txq);
#endif
/* Wait for the mp_ring to empty. */
while (!mp_ring_is_idle(txq->r)) {
mp_ring_check_drainage(txq->r, 0);
pause("rquiesce", 1);
}
/* Then wait for the hardware to finish. */
while (spg->cidx != htobe16(eq->pidx))
pause("equiesce", 1);
/* Finally, wait for the driver to reclaim all descriptors. */
while (eq->cidx != eq->pidx)
pause("dquiesce", 1);
}
static void
quiesce_wrq(struct adapter *sc, struct sge_wrq *wrq)
{
/* XXXTX */
}
static void
quiesce_iq(struct adapter *sc, struct sge_iq *iq)
{
(void) sc; /* unused */
/* Synchronize with the interrupt handler */
while (!atomic_cmpset_int(&iq->state, IQS_IDLE, IQS_DISABLED))
pause("iqfree", 1);
}
static void
quiesce_fl(struct adapter *sc, struct sge_fl *fl)
{
mtx_lock(&sc->sfl_lock);
FL_LOCK(fl);
fl->flags |= FL_DOOMED;
FL_UNLOCK(fl);
callout_stop(&sc->sfl_callout);
mtx_unlock(&sc->sfl_lock);
KASSERT((fl->flags & FL_STARVING) == 0,
("%s: still starving", __func__));
}
static int
t4_alloc_irq(struct adapter *sc, struct irq *irq, int rid,
driver_intr_t *handler, void *arg, char *name)
{
int rc;
irq->rid = rid;
irq->res = bus_alloc_resource_any(sc->dev, SYS_RES_IRQ, &irq->rid,
RF_SHAREABLE | RF_ACTIVE);
if (irq->res == NULL) {
device_printf(sc->dev,
"failed to allocate IRQ for rid %d, name %s.\n", rid, name);
return (ENOMEM);
}
rc = bus_setup_intr(sc->dev, irq->res, INTR_MPSAFE | INTR_TYPE_NET,
NULL, handler, arg, &irq->tag);
if (rc != 0) {
device_printf(sc->dev,
"failed to setup interrupt for rid %d, name %s: %d\n",
rid, name, rc);
} else if (name)
bus_describe_intr(sc->dev, irq->res, irq->tag, name);
return (rc);
}
static int
t4_free_irq(struct adapter *sc, struct irq *irq)
{
if (irq->tag)
bus_teardown_intr(sc->dev, irq->res, irq->tag);
if (irq->res)
bus_release_resource(sc->dev, SYS_RES_IRQ, irq->rid, irq->res);
bzero(irq, sizeof(*irq));
return (0);
}
static void
get_regs(struct adapter *sc, struct t4_regdump *regs, uint8_t *buf)
{
regs->version = chip_id(sc) | chip_rev(sc) << 10;
t4_get_regs(sc, buf, regs->len);
}
#define A_PL_INDIR_CMD 0x1f8
#define S_PL_AUTOINC 31
#define M_PL_AUTOINC 0x1U
#define V_PL_AUTOINC(x) ((x) << S_PL_AUTOINC)
#define G_PL_AUTOINC(x) (((x) >> S_PL_AUTOINC) & M_PL_AUTOINC)
#define S_PL_VFID 20
#define M_PL_VFID 0xffU
#define V_PL_VFID(x) ((x) << S_PL_VFID)
#define G_PL_VFID(x) (((x) >> S_PL_VFID) & M_PL_VFID)
#define S_PL_ADDR 0
#define M_PL_ADDR 0xfffffU
#define V_PL_ADDR(x) ((x) << S_PL_ADDR)
#define G_PL_ADDR(x) (((x) >> S_PL_ADDR) & M_PL_ADDR)
#define A_PL_INDIR_DATA 0x1fc
static uint64_t
read_vf_stat(struct adapter *sc, unsigned int viid, int reg)
{
u32 stats[2];
mtx_assert(&sc->reg_lock, MA_OWNED);
t4_write_reg(sc, A_PL_INDIR_CMD, V_PL_AUTOINC(1) |
V_PL_VFID(G_FW_VIID_VIN(viid)) | V_PL_ADDR(VF_MPS_REG(reg)));
stats[0] = t4_read_reg(sc, A_PL_INDIR_DATA);
stats[1] = t4_read_reg(sc, A_PL_INDIR_DATA);
return (((uint64_t)stats[1]) << 32 | stats[0]);
}
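The indirect-access pattern in read_vf_stat() can be tried outside the kernel. The following is a minimal userland sketch, not driver code: the field macros are copied from the definitions above (with an added cast), while make_indir_cmd() and combine_stat() are hypothetical helper names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout copied from the A_PL_INDIR_CMD definitions above. */
#define S_PL_AUTOINC	31
#define V_PL_AUTOINC(x)	((uint32_t)(x) << S_PL_AUTOINC)
#define S_PL_VFID	20
#define M_PL_VFID	0xffU
#define V_PL_VFID(x)	((uint32_t)(x) << S_PL_VFID)
#define G_PL_VFID(x)	(((x) >> S_PL_VFID) & M_PL_VFID)
#define S_PL_ADDR	0
#define M_PL_ADDR	0xfffffU
#define V_PL_ADDR(x)	((uint32_t)(x) << S_PL_ADDR)
#define G_PL_ADDR(x)	(((x) >> S_PL_ADDR) & M_PL_ADDR)

/* Hypothetical helper: build the command word with auto-increment set. */
static uint32_t
make_indir_cmd(unsigned int vfid, uint32_t addr)
{
	/* Each value must fit in its field or it would spill into a neighbor. */
	assert(vfid <= M_PL_VFID && addr <= M_PL_ADDR);
	return (V_PL_AUTOINC(1) | V_PL_VFID(vfid) | V_PL_ADDR(addr));
}

/* Hypothetical helper: join the low and high halves of a 64-bit counter. */
static uint64_t
combine_stat(uint32_t lo, uint32_t hi)
{
	/* Widen before shifting so the high half is not lost. */
	return ((uint64_t)hi << 32 | lo);
}
```

With auto-increment set, two consecutive reads of the data register return the low and then the high word, which is why the driver reads A_PL_INDIR_DATA twice after one command write.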
static void
t4_get_vi_stats(struct adapter *sc, unsigned int viid,
struct fw_vi_stats_vf *stats)
{
#define GET_STAT(name) \
read_vf_stat(sc, viid, A_MPS_VF_STAT_##name##_L)
stats->tx_bcast_bytes = GET_STAT(TX_VF_BCAST_BYTES);
stats->tx_bcast_frames = GET_STAT(TX_VF_BCAST_FRAMES);
stats->tx_mcast_bytes = GET_STAT(TX_VF_MCAST_BYTES);
stats->tx_mcast_frames = GET_STAT(TX_VF_MCAST_FRAMES);
stats->tx_ucast_bytes = GET_STAT(TX_VF_UCAST_BYTES);
stats->tx_ucast_frames = GET_STAT(TX_VF_UCAST_FRAMES);
stats->tx_drop_frames = GET_STAT(TX_VF_DROP_FRAMES);
stats->tx_offload_bytes = GET_STAT(TX_VF_OFFLOAD_BYTES);
stats->tx_offload_frames = GET_STAT(TX_VF_OFFLOAD_FRAMES);
stats->rx_bcast_bytes = GET_STAT(RX_VF_BCAST_BYTES);
stats->rx_bcast_frames = GET_STAT(RX_VF_BCAST_FRAMES);
stats->rx_mcast_bytes = GET_STAT(RX_VF_MCAST_BYTES);
stats->rx_mcast_frames = GET_STAT(RX_VF_MCAST_FRAMES);
stats->rx_ucast_bytes = GET_STAT(RX_VF_UCAST_BYTES);
stats->rx_ucast_frames = GET_STAT(RX_VF_UCAST_FRAMES);
stats->rx_err_frames = GET_STAT(RX_VF_ERR_FRAMES);
#undef GET_STAT
}
static void
t4_clr_vi_stats(struct adapter *sc, unsigned int viid)
{
int reg;
t4_write_reg(sc, A_PL_INDIR_CMD, V_PL_AUTOINC(1) |
V_PL_VFID(G_FW_VIID_VIN(viid)) |
V_PL_ADDR(VF_MPS_REG(A_MPS_VF_STAT_TX_VF_BCAST_BYTES_L)));
for (reg = A_MPS_VF_STAT_TX_VF_BCAST_BYTES_L;
reg <= A_MPS_VF_STAT_RX_VF_ERR_FRAMES_H; reg += 4)
t4_write_reg(sc, A_PL_INDIR_DATA, 0);
}
static void
vi_refresh_stats(struct adapter *sc, struct vi_info *vi)
{
struct timeval tv;
const struct timeval interval = {0, 250000}; /* 250ms */
if (!(vi->flags & VI_INIT_DONE))
return;
getmicrotime(&tv);
timevalsub(&tv, &interval);
if (timevalcmp(&tv, &vi->last_refreshed, <))
return;
mtx_lock(&sc->reg_lock);
t4_get_vi_stats(sc, vi->viid, &vi->stats);
getmicrotime(&vi->last_refreshed);
mtx_unlock(&sc->reg_lock);
}
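vi_refresh_stats() skips the hardware read when the cached copy is younger than 250ms. A rough userland sketch of the same staleness check follows. It assumes a plain microsecond counter instead of struct timeval, and struct stats_cache and refresh_if_stale() are hypothetical names, not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REFRESH_INTERVAL_US	250000	/* 250ms, as in the driver */

/* Hypothetical stand-in for the cached statistics of one interface. */
struct stats_cache {
	int64_t last_refreshed_us;	/* time of the last refresh */
	unsigned int reads;		/* number of simulated hardware reads */
};

/*
 * Refresh the cache only if at least REFRESH_INTERVAL_US has elapsed
 * since the last refresh.  Returns true if a refresh happened.
 */
static bool
refresh_if_stale(struct stats_cache *c, int64_t now_us)
{
	assert(c != NULL);
	/* Same comparison as the driver: (now - interval) < last. */
	if (now_us - REFRESH_INTERVAL_US < c->last_refreshed_us)
		return (false);
	c->reads++;			/* stands in for the register reads */
	c->last_refreshed_us = now_us;
	return (true);
}
```

Subtracting the interval from the current time, rather than adding it to the stored timestamp, mirrors the timevalsub()/timevalcmp() sequence in the function above.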
static void
cxgbe_refresh_stats(struct adapter *sc, struct port_info *pi)
{
int i;
u_int v, tnl_cong_drops;
struct timeval tv;
const struct timeval interval = {0, 250000}; /* 250ms */
getmicrotime(&tv);
timevalsub(&tv, &interval);
if (timevalcmp(&tv, &pi->last_refreshed, <))
return;
tnl_cong_drops = 0;
t4_get_port_stats(sc, pi->tx_chan, &pi->stats);
for (i = 0; i < sc->chip_params->nchan; i++) {
if (pi->rx_chan_map & (1 << i)) {
mtx_lock(&sc->reg_lock);
t4_read_indirect(sc, A_TP_MIB_INDEX, A_TP_MIB_DATA, &v,
1, A_TP_MIB_TNL_CNG_DROP_0 + i);
mtx_unlock(&sc->reg_lock);
tnl_cong_drops += v;
}
}
pi->tnl_cong_drops = tnl_cong_drops;
getmicrotime(&pi->last_refreshed);
}
static void
cxgbe_tick(void *arg)
{
struct port_info *pi = arg;
struct adapter *sc = pi->adapter;
PORT_LOCK_ASSERT_OWNED(pi);
cxgbe_refresh_stats(sc, pi);
callout_schedule(&pi->tick, hz);
}
void
vi_tick(void *arg)
{
struct vi_info *vi = arg;
struct adapter *sc = vi->pi->adapter;
vi_refresh_stats(sc, vi);
callout_schedule(&vi->tick, hz);
}
static void
cxgbe_vlan_config(void *arg, struct ifnet *ifp, uint16_t vid)
{
struct ifnet *vlan;
if (arg != ifp || ifp->if_type != IFT_ETHER)
return;
vlan = VLAN_DEVAT(ifp, vid);
VLAN_SETCOOKIE(vlan, ifp);
}
static int
cpl_not_handled(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m)
{
#ifdef INVARIANTS
panic("%s: opcode 0x%02x on iq %p with payload %p",
__func__, rss->opcode, iq, m);
#else
log(LOG_ERR, "%s: opcode 0x%02x on iq %p with payload %p\n",
__func__, rss->opcode, iq, m);
m_freem(m);
#endif
return (EDOOFUS);
}
int
t4_register_cpl_handler(struct adapter *sc, int opcode, cpl_handler_t h)
{
uintptr_t *loc, new;
if (opcode >= nitems(sc->cpl_handler))
return (EINVAL);
new = h ? (uintptr_t)h : (uintptr_t)cpl_not_handled;
loc = (uintptr_t *) &sc->cpl_handler[opcode];
atomic_store_rel_ptr(loc, new);
return (0);
}
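The registration functions here never leave a NULL slot in a dispatch table: passing NULL installs a default "not handled" routine instead. The sketch below illustrates that idea in portable C11, assuming <stdatomic.h> in place of the kernel's atomic_store_rel_ptr(). The table size, handler type, and all function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_OPCODES	8	/* hypothetical table size */

typedef int (*handler_t)(int opcode);

/* Default handler: reports that nothing is registered for the opcode. */
static int
not_handled(int opcode)
{
	(void)opcode;
	return (ENOSYS);
}

/* Example handler used only to exercise the table. */
static int
sample_handler(int opcode)
{
	return (opcode + 100);
}

static _Atomic(handler_t) handlers[NUM_OPCODES];

/* Install h for opcode; NULL restores the default handler. */
static int
register_handler(int opcode, handler_t h)
{
	if (opcode < 0 || opcode >= NUM_OPCODES)
		return (EINVAL);
	/* Release store so a dispatcher sees a fully valid pointer. */
	atomic_store_explicit(&handlers[opcode],
	    h != NULL ? h : not_handled, memory_order_release);
	return (0);
}

/* Fill every slot with the default before any dispatch happens. */
static void
init_handlers(void)
{
	for (int i = 0; i < NUM_OPCODES; i++)
		(void)register_handler(i, NULL);
}

/* Call whatever is installed; no NULL check is needed after init. */
static int
dispatch(int opcode)
{
	handler_t h;

	assert(opcode >= 0 && opcode < NUM_OPCODES);
	h = atomic_load_explicit(&handlers[opcode], memory_order_acquire);
	return (h(opcode));
}
```

Keeping the default in the table moves the "is anything registered" test out of the hot dispatch path and into the rarely executed registration path.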
static int
an_not_handled(struct sge_iq *iq, const struct rsp_ctrl *ctrl)
{
#ifdef INVARIANTS
panic("%s: async notification on iq %p (ctrl %p)", __func__, iq, ctrl);
#else
log(LOG_ERR, "%s: async notification on iq %p (ctrl %p)\n",
__func__, iq, ctrl);
#endif
return (EDOOFUS);
}
int
t4_register_an_handler(struct adapter *sc, an_handler_t h)
{
uintptr_t *loc, new;
new = h ? (uintptr_t)h : (uintptr_t)an_not_handled;
loc = (uintptr_t *) &sc->an_handler;
atomic_store_rel_ptr(loc, new);
return (0);
}
static int
fw_msg_not_handled(struct adapter *sc, const __be64 *rpl)
{
const struct cpl_fw6_msg *cpl =
__containerof(rpl, struct cpl_fw6_msg, data[0]);
#ifdef INVARIANTS
panic("%s: fw_msg type %d", __func__, cpl->type);
#else
log(LOG_ERR, "%s: fw_msg type %d\n", __func__, cpl->type);
#endif
return (EDOOFUS);
}
int
t4_register_fw_msg_handler(struct adapter *sc, int type, fw_msg_handler_t h)
{
uintptr_t *loc, new;
if (type >= nitems(sc->fw_msg_handler))
return (EINVAL);
/*
* These are dispatched by the handler for FW{4|6}_CPL_MSG using the CPL
* handler dispatch table. Reject any attempt to install a handler for
* this subtype.
*/
if (type == FW_TYPE_RSSCPL || type == FW6_TYPE_RSSCPL)
return (EINVAL);
new = h ? (uintptr_t)h : (uintptr_t)fw_msg_not_handled;
loc = (uintptr_t *) &sc->fw_msg_handler[type];
atomic_store_rel_ptr(loc, new);
return (0);
}
/*
* Should match fw_caps_config_ enums in t4fw_interface.h
*/
static char *caps_decoder[] = {
"\20\001IPMI\002NCSI", /* 0: NBM */
"\20\001PPP\002QFC\003DCBX", /* 1: link */
"\20\001INGRESS\002EGRESS", /* 2: switch */
"\20\001NIC\002VM\003IDS\004UM\005UM_ISGL" /* 3: NIC */
"\006HASHFILTER\007ETHOFLD",
"\20\001TOE", /* 4: TOE */
"\20\001RDDP\002RDMAC", /* 5: RDMA */
"\20\001INITIATOR_PDU\002TARGET_PDU" /* 6: iSCSI */
"\003INITIATOR_CNXOFLD\004TARGET_CNXOFLD"
"\005INITIATOR_SSNOFLD\006TARGET_SSNOFLD"
"\007T10DIF"
"\010INITIATOR_CMDOFLD\011TARGET_CMDOFLD",
"\20\001KEYS", /* 7: TLS */
"\20\001INITIATOR\002TARGET\003CTRL_OFLD" /* 8: FCoE */
"\004PO_INITIATOR\005PO_TARGET",
};
static void
t4_sysctls(struct adapter *sc)
{
struct sysctl_ctx_list *ctx;
struct sysctl_oid *oid;
struct sysctl_oid_list *children, *c0;
static char *doorbells = {"\20\1UDB\2WCWR\3UDBWC\4KDB"};
ctx = device_get_sysctl_ctx(sc->dev);
/*
* dev.t4nex.X.
*/
oid = device_get_sysctl_tree(sc->dev);
c0 = children = SYSCTL_CHILDREN(oid);
sc->sc_do_rxcopy = 1;
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "do_rx_copy", CTLFLAG_RW,
&sc->sc_do_rxcopy, 1, "Do RX copy of small frames");
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nports", CTLFLAG_RD, NULL,
sc->params.nports, "# of ports");
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "hw_revision", CTLFLAG_RD,
NULL, chip_rev(sc), "chip hardware revision");
SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "tp_version",
CTLFLAG_RD, sc->tp_version, 0, "TP microcode version");
if (sc->params.exprom_vers != 0) {
SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "exprom_version",
CTLFLAG_RD, sc->exprom_version, 0, "expansion ROM version");
}
SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "firmware_version",
CTLFLAG_RD, sc->fw_version, 0, "firmware version");
SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "cf",
CTLFLAG_RD, sc->cfg_file, 0, "configuration file");
SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "cfcsum", CTLFLAG_RD, NULL,
sc->cfcsum, "config file checksum");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "doorbells",
CTLTYPE_STRING | CTLFLAG_RD, doorbells, sc->doorbells,
sysctl_bitfield, "A", "available doorbells");
#define SYSCTL_CAP(name, n, text) \
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, #name, \
CTLTYPE_STRING | CTLFLAG_RD, caps_decoder[n], sc->name, \
sysctl_bitfield, "A", "available " text " capabilities")
SYSCTL_CAP(nbmcaps, 0, "NBM");
SYSCTL_CAP(linkcaps, 1, "link");
SYSCTL_CAP(switchcaps, 2, "switch");
SYSCTL_CAP(niccaps, 3, "NIC");
SYSCTL_CAP(toecaps, 4, "TCP offload");
SYSCTL_CAP(rdmacaps, 5, "RDMA");
SYSCTL_CAP(iscsicaps, 6, "iSCSI");
SYSCTL_CAP(tlscaps, 7, "TLS");
SYSCTL_CAP(fcoecaps, 8, "FCoE");
#undef SYSCTL_CAP
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "core_clock", CTLFLAG_RD, NULL,
sc->params.vpd.cclk, "core clock frequency (in kHz)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "holdoff_timers",
CTLTYPE_STRING | CTLFLAG_RD, sc->params.sge.timer_val,
sizeof(sc->params.sge.timer_val), sysctl_int_array, "A",
"interrupt holdoff timer values (us)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "holdoff_pkt_counts",
CTLTYPE_STRING | CTLFLAG_RD, sc->params.sge.counter_val,
sizeof(sc->params.sge.counter_val), sysctl_int_array, "A",
"interrupt holdoff packet counter values");
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nfilters", CTLFLAG_RD,
NULL, sc->tids.nftids, "number of filters");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "temperature", CTLTYPE_INT |
CTLFLAG_RD, sc, 0, sysctl_temperature, "I",
"chip temperature (in Celsius)");
t4_sge_sysctls(sc, ctx, children);
sc->lro_timeout = 100;
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "lro_timeout", CTLFLAG_RW,
&sc->lro_timeout, 0, "lro inactive-flush timeout (in us)");
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "debug_flags", CTLFLAG_RW,
&sc->debug_flags, 0, "flags to enable runtime debugging");
#ifdef SBUF_DRAIN
/*
* dev.t4nex.X.misc. Marked CTLFLAG_SKIP to avoid information overload.
*/
oid = SYSCTL_ADD_NODE(ctx, c0, OID_AUTO, "misc",
CTLFLAG_RD | CTLFLAG_SKIP, NULL,
"logs and miscellaneous information");
children = SYSCTL_CHILDREN(oid);
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cctrl",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_cctrl, "A", "congestion control");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_tp0",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_cim_ibq_obq, "A", "CIM IBQ 0 (TP0)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_tp1",
CTLTYPE_STRING | CTLFLAG_RD, sc, 1,
sysctl_cim_ibq_obq, "A", "CIM IBQ 1 (TP1)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_ulp",
CTLTYPE_STRING | CTLFLAG_RD, sc, 2,
sysctl_cim_ibq_obq, "A", "CIM IBQ 2 (ULP)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_sge0",
CTLTYPE_STRING | CTLFLAG_RD, sc, 3,
sysctl_cim_ibq_obq, "A", "CIM IBQ 3 (SGE0)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_sge1",
CTLTYPE_STRING | CTLFLAG_RD, sc, 4,
sysctl_cim_ibq_obq, "A", "CIM IBQ 4 (SGE1)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_ncsi",
CTLTYPE_STRING | CTLFLAG_RD, sc, 5,
sysctl_cim_ibq_obq, "A", "CIM IBQ 5 (NCSI)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_la",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
chip_id(sc) <= CHELSIO_T5 ? sysctl_cim_la : sysctl_cim_la_t6,
"A", "CIM logic analyzer");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ma_la",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_cim_ma_la, "A", "CIM MA logic analyzer");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ulp0",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0 + CIM_NUM_IBQ,
sysctl_cim_ibq_obq, "A", "CIM OBQ 0 (ULP0)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ulp1",
CTLTYPE_STRING | CTLFLAG_RD, sc, 1 + CIM_NUM_IBQ,
sysctl_cim_ibq_obq, "A", "CIM OBQ 1 (ULP1)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ulp2",
CTLTYPE_STRING | CTLFLAG_RD, sc, 2 + CIM_NUM_IBQ,
sysctl_cim_ibq_obq, "A", "CIM OBQ 2 (ULP2)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ulp3",
CTLTYPE_STRING | CTLFLAG_RD, sc, 3 + CIM_NUM_IBQ,
sysctl_cim_ibq_obq, "A", "CIM OBQ 3 (ULP3)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_sge",
CTLTYPE_STRING | CTLFLAG_RD, sc, 4 + CIM_NUM_IBQ,
sysctl_cim_ibq_obq, "A", "CIM OBQ 4 (SGE)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ncsi",
CTLTYPE_STRING | CTLFLAG_RD, sc, 5 + CIM_NUM_IBQ,
sysctl_cim_ibq_obq, "A", "CIM OBQ 5 (NCSI)");
if (chip_id(sc) > CHELSIO_T4) {
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_sge0_rx",
CTLTYPE_STRING | CTLFLAG_RD, sc, 6 + CIM_NUM_IBQ,
sysctl_cim_ibq_obq, "A", "CIM OBQ 6 (SGE0-RX)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_sge1_rx",
CTLTYPE_STRING | CTLFLAG_RD, sc, 7 + CIM_NUM_IBQ,
sysctl_cim_ibq_obq, "A", "CIM OBQ 7 (SGE1-RX)");
}
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_pif_la",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_cim_pif_la, "A", "CIM PIF logic analyzer");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_qcfg",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_cim_qcfg, "A", "CIM queue configuration");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cpl_stats",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_cpl_stats, "A", "CPL statistics");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "ddp_stats",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_ddp_stats, "A", "non-TCP DDP statistics");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "devlog",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_devlog, "A", "firmware's device log");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "fcoe_stats",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_fcoe_stats, "A", "FCoE statistics");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "hw_sched",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_hw_sched, "A", "hardware scheduler");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "l2t",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_l2t, "A", "hardware L2 table");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "lb_stats",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_lb_stats, "A", "loopback statistics");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "meminfo",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_meminfo, "A", "memory regions");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "mps_tcam",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
chip_id(sc) <= CHELSIO_T5 ? sysctl_mps_tcam : sysctl_mps_tcam_t6,
"A", "MPS TCAM entries");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "path_mtus",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_path_mtus, "A", "path MTUs");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "pm_stats",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_pm_stats, "A", "PM statistics");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "rdma_stats",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_rdma_stats, "A", "RDMA statistics");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tcp_stats",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_tcp_stats, "A", "TCP statistics");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tids",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_tids, "A", "TID information");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tp_err_stats",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_tp_err_stats, "A", "TP error statistics");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tp_la_mask",
CTLTYPE_INT | CTLFLAG_RW, sc, 0, sysctl_tp_la_mask, "I",
"TP logic analyzer event capture mask");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tp_la",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_tp_la, "A", "TP logic analyzer");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tx_rate",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_tx_rate, "A", "Tx rate");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "ulprx_la",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_ulprx_la, "A", "ULPRX logic analyzer");
if (is_t5(sc)) {
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "wcwr_stats",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
sysctl_wcwr_stats, "A", "write combined work requests");
}
#endif
#ifdef TCP_OFFLOAD
if (is_offload(sc)) {
/*
* dev.t4nex.X.toe.
*/
oid = SYSCTL_ADD_NODE(ctx, c0, OID_AUTO, "toe", CTLFLAG_RD,
NULL, "TOE parameters");
children = SYSCTL_CHILDREN(oid);
sc->tt.sndbuf = 256 * 1024;
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "sndbuf", CTLFLAG_RW,
&sc->tt.sndbuf, 0, "max hardware send buffer size");
sc->tt.ddp = 0;
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "ddp", CTLFLAG_RW,
&sc->tt.ddp, 0, "DDP allowed");
sc->tt.rx_coalesce = 1;
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "rx_coalesce",
CTLFLAG_RW, &sc->tt.rx_coalesce, 0, "receive coalescing");
sc->tt.tx_align = 1;
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "tx_align",
CTLFLAG_RW, &sc->tt.tx_align, 0, "chop and align payload");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "timer_tick",
CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_tp_tick, "A",
"TP timer tick (us)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "timestamp_tick",
CTLTYPE_STRING | CTLFLAG_RD, sc, 1, sysctl_tp_tick, "A",
"TCP timestamp tick (us)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "dack_tick",
CTLTYPE_STRING | CTLFLAG_RD, sc, 2, sysctl_tp_tick, "A",
"DACK tick (us)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "dack_timer",
CTLTYPE_UINT | CTLFLAG_RD, sc, 0, sysctl_tp_dack_timer,
"IU", "DACK timer (us)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "rexmt_min",
CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_RXT_MIN,
sysctl_tp_timer, "LU", "Retransmit min (us)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "rexmt_max",
CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_RXT_MAX,
sysctl_tp_timer, "LU", "Retransmit max (us)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "persist_min",
CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_PERS_MIN,
sysctl_tp_timer, "LU", "Persist timer min (us)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "persist_max",
CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_PERS_MAX,
sysctl_tp_timer, "LU", "Persist timer max (us)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "keepalive_idle",
CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_KEEP_IDLE,
sysctl_tp_timer, "LU", "Keepalive idle timer (us)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "keepalive_intvl",
CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_KEEP_INTVL,
sysctl_tp_timer, "LU", "Keepalive interval (us)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "initial_srtt",
CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_INIT_SRTT,
sysctl_tp_timer, "LU", "Initial SRTT (us)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "finwait2_timer",
CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_FINWAIT2_TIMER,
sysctl_tp_timer, "LU", "FINWAIT2 timer (us)");
}
#endif
}
void
vi_sysctls(struct vi_info *vi)
{
struct sysctl_ctx_list *ctx;
struct sysctl_oid *oid;
struct sysctl_oid_list *children;
ctx = device_get_sysctl_ctx(vi->dev);
/*
* dev.[nv](cxgbe|cxl).X.
*/
oid = device_get_sysctl_tree(vi->dev);
children = SYSCTL_CHILDREN(oid);
SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "viid", CTLFLAG_RD, NULL,
vi->viid, "VI identifier");
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nrxq", CTLFLAG_RD,
&vi->nrxq, 0, "# of rx queues");
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "ntxq", CTLFLAG_RD,
&vi->ntxq, 0, "# of tx queues");
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_rxq", CTLFLAG_RD,
&vi->first_rxq, 0, "index of first rx queue");
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_txq", CTLFLAG_RD,
&vi->first_txq, 0, "index of first tx queue");
if (vi->flags & VI_NETMAP)
return;
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "rsrv_noflowq", CTLTYPE_INT |
CTLFLAG_RW, vi, 0, sysctl_noflowq, "IU",
"Reserve queue 0 for non-flowid packets");
#ifdef TCP_OFFLOAD
if (vi->nofldrxq != 0) {
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nofldrxq", CTLFLAG_RD,
&vi->nofldrxq, 0,
"# of rx queues for offloaded TCP connections");
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nofldtxq", CTLFLAG_RD,
&vi->nofldtxq, 0,
"# of tx queues for offloaded TCP connections");
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_ofld_rxq",
CTLFLAG_RD, &vi->first_ofld_rxq, 0,
"index of first TOE rx queue");
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_ofld_txq",
CTLFLAG_RD, &vi->first_ofld_txq, 0,
"index of first TOE tx queue");
}
#endif
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "holdoff_tmr_idx",
CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_holdoff_tmr_idx, "I",
"holdoff timer index");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "holdoff_pktc_idx",
CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_holdoff_pktc_idx, "I",
"holdoff packet counter index");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "qsize_rxq",
CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_qsize_rxq, "I",
"rx queue size");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "qsize_txq",
CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_qsize_txq, "I",
"tx queue size");
}
static void
cxgbe_sysctls(struct port_info *pi)
{
struct sysctl_ctx_list *ctx;
struct sysctl_oid *oid;
- struct sysctl_oid_list *children;
+ struct sysctl_oid_list *children, *children2;
struct adapter *sc = pi->adapter;
+ int i;
+ char name[16];
ctx = device_get_sysctl_ctx(pi->dev);
/*
* dev.cxgbe.X.
*/
oid = device_get_sysctl_tree(pi->dev);
children = SYSCTL_CHILDREN(oid);
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "linkdnrc", CTLTYPE_STRING |
CTLFLAG_RD, pi, 0, sysctl_linkdnrc, "A", "reason why link is down");
if (pi->port_type == FW_PORT_TYPE_BT_XAUI) {
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "temperature",
CTLTYPE_INT | CTLFLAG_RD, pi, 0, sysctl_btphy, "I",
"PHY temperature (in Celsius)");
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "fw_version",
CTLTYPE_INT | CTLFLAG_RD, pi, 1, sysctl_btphy, "I",
"PHY firmware version");
}
SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "pause_settings",
CTLTYPE_STRING | CTLFLAG_RW, pi, PAUSE_TX, sysctl_pause_settings,
"A", "PAUSE settings (bit 0 = rx_pause, bit 1 = tx_pause)");
SYSCTL_ADD_INT(ctx, children, OID_AUTO, "max_speed", CTLFLAG_RD, NULL,
port_top_speed(pi), "max speed (in Gbps)");
/*
+ * dev.(cxgbe|cxl).X.tc.
+ */
+ oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, "tc", CTLFLAG_RD, NULL,
+ "Tx scheduler traffic classes");
+ for (i = 0; i < sc->chip_params->nsched_cls; i++) {
+ struct tx_sched_class *tc = &pi->tc[i];
+
+ snprintf(name, sizeof(name), "%d", i);
+ children2 = SYSCTL_CHILDREN(SYSCTL_ADD_NODE(ctx,
+ SYSCTL_CHILDREN(oid), OID_AUTO, name, CTLFLAG_RD, NULL,
+ "traffic class"));
+ SYSCTL_ADD_UINT(ctx, children2, OID_AUTO, "flags", CTLFLAG_RD,
+ &tc->flags, 0, "flags");
+ SYSCTL_ADD_UINT(ctx, children2, OID_AUTO, "refcount",
+ CTLFLAG_RD, &tc->refcount, 0, "references to this class");
+#ifdef SBUF_DRAIN
+ SYSCTL_ADD_PROC(ctx, children2, OID_AUTO, "params",
+ CTLTYPE_STRING | CTLFLAG_RD, sc, (pi->port_id << 16) | i,
+ sysctl_tc_params, "A", "traffic class parameters");
+#endif
+ }
+
+ /*
* dev.cxgbe.X.stats.
*/
oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, "stats", CTLFLAG_RD,
NULL, "port statistics");
children = SYSCTL_CHILDREN(oid);
SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_parse_error", CTLFLAG_RD,
&pi->tx_parse_error, 0,
"# of tx packets with invalid length or # of segments");
#define SYSCTL_ADD_T4_REG64(pi, name, desc, reg) \
SYSCTL_ADD_OID(ctx, children, OID_AUTO, name, \
CTLTYPE_U64 | CTLFLAG_RD, sc, reg, \
sysctl_handle_t4_reg64, "QU", desc)
SYSCTL_ADD_T4_REG64(pi, "tx_octets", "# of octets in good frames",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_BYTES_L));
SYSCTL_ADD_T4_REG64(pi, "tx_frames", "total # of good frames",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_FRAMES_L));
SYSCTL_ADD_T4_REG64(pi, "tx_bcast_frames", "# of broadcast frames",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_BCAST_L));
SYSCTL_ADD_T4_REG64(pi, "tx_mcast_frames", "# of multicast frames",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_MCAST_L));
SYSCTL_ADD_T4_REG64(pi, "tx_ucast_frames", "# of unicast frames",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_UCAST_L));
SYSCTL_ADD_T4_REG64(pi, "tx_error_frames", "# of error frames",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_ERROR_L));
SYSCTL_ADD_T4_REG64(pi, "tx_frames_64",
"# of tx frames in this range",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_64B_L));
SYSCTL_ADD_T4_REG64(pi, "tx_frames_65_127",
"# of tx frames in this range",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_65B_127B_L));
SYSCTL_ADD_T4_REG64(pi, "tx_frames_128_255",
"# of tx frames in this range",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_128B_255B_L));
SYSCTL_ADD_T4_REG64(pi, "tx_frames_256_511",
"# of tx frames in this range",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_256B_511B_L));
SYSCTL_ADD_T4_REG64(pi, "tx_frames_512_1023",
"# of tx frames in this range",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_512B_1023B_L));
SYSCTL_ADD_T4_REG64(pi, "tx_frames_1024_1518",
"# of tx frames in this range",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_1024B_1518B_L));
SYSCTL_ADD_T4_REG64(pi, "tx_frames_1519_max",
"# of tx frames in this range",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_1519B_MAX_L));
SYSCTL_ADD_T4_REG64(pi, "tx_drop", "# of dropped tx frames",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_DROP_L));
SYSCTL_ADD_T4_REG64(pi, "tx_pause", "# of pause frames transmitted",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PAUSE_L));
SYSCTL_ADD_T4_REG64(pi, "tx_ppp0", "# of PPP prio 0 frames transmitted",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP0_L));
SYSCTL_ADD_T4_REG64(pi, "tx_ppp1", "# of PPP prio 1 frames transmitted",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP1_L));
SYSCTL_ADD_T4_REG64(pi, "tx_ppp2", "# of PPP prio 2 frames transmitted",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP2_L));
SYSCTL_ADD_T4_REG64(pi, "tx_ppp3", "# of PPP prio 3 frames transmitted",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP3_L));
SYSCTL_ADD_T4_REG64(pi, "tx_ppp4", "# of PPP prio 4 frames transmitted",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP4_L));
SYSCTL_ADD_T4_REG64(pi, "tx_ppp5", "# of PPP prio 5 frames transmitted",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP5_L));
SYSCTL_ADD_T4_REG64(pi, "tx_ppp6", "# of PPP prio 6 frames transmitted",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP6_L));
SYSCTL_ADD_T4_REG64(pi, "tx_ppp7", "# of PPP prio 7 frames transmitted",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP7_L));
SYSCTL_ADD_T4_REG64(pi, "rx_octets", "# of octets in good frames",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_BYTES_L));
SYSCTL_ADD_T4_REG64(pi, "rx_frames", "total # of good frames",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_FRAMES_L));
SYSCTL_ADD_T4_REG64(pi, "rx_bcast_frames", "# of broadcast frames",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_BCAST_L));
SYSCTL_ADD_T4_REG64(pi, "rx_mcast_frames", "# of multicast frames",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_MCAST_L));
SYSCTL_ADD_T4_REG64(pi, "rx_ucast_frames", "# of unicast frames",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_UCAST_L));
SYSCTL_ADD_T4_REG64(pi, "rx_too_long", "# of frames exceeding MTU",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_MTU_ERROR_L));
SYSCTL_ADD_T4_REG64(pi, "rx_jabber", "# of jabber frames",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_MTU_CRC_ERROR_L));
SYSCTL_ADD_T4_REG64(pi, "rx_fcs_err",
"# of frames received with bad FCS",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_CRC_ERROR_L));
SYSCTL_ADD_T4_REG64(pi, "rx_len_err",
"# of frames received with length error",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_LEN_ERROR_L));
SYSCTL_ADD_T4_REG64(pi, "rx_symbol_err", "symbol errors",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_SYM_ERROR_L));
SYSCTL_ADD_T4_REG64(pi, "rx_runt", "# of short frames received",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_LESS_64B_L));
SYSCTL_ADD_T4_REG64(pi, "rx_frames_64",
"# of rx frames in this range",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_64B_L));
SYSCTL_ADD_T4_REG64(pi, "rx_frames_65_127",
"# of rx frames in this range",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_65B_127B_L));
SYSCTL_ADD_T4_REG64(pi, "rx_frames_128_255",
"# of rx frames in this range",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_128B_255B_L));
SYSCTL_ADD_T4_REG64(pi, "rx_frames_256_511",
"# of rx frames in this range",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_256B_511B_L));
SYSCTL_ADD_T4_REG64(pi, "rx_frames_512_1023",
"# of rx frames in this range",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_512B_1023B_L));
SYSCTL_ADD_T4_REG64(pi, "rx_frames_1024_1518",
"# of rx frames in this range",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_1024B_1518B_L));
SYSCTL_ADD_T4_REG64(pi, "rx_frames_1519_max",
"# of rx frames in this range",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_1519B_MAX_L));
SYSCTL_ADD_T4_REG64(pi, "rx_pause", "# of pause frames received",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PAUSE_L));
SYSCTL_ADD_T4_REG64(pi, "rx_ppp0", "# of PPP prio 0 frames received",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP0_L));
SYSCTL_ADD_T4_REG64(pi, "rx_ppp1", "# of PPP prio 1 frames received",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP1_L));
SYSCTL_ADD_T4_REG64(pi, "rx_ppp2", "# of PPP prio 2 frames received",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP2_L));
SYSCTL_ADD_T4_REG64(pi, "rx_ppp3", "# of PPP prio 3 frames received",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP3_L));
SYSCTL_ADD_T4_REG64(pi, "rx_ppp4", "# of PPP prio 4 frames received",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP4_L));
SYSCTL_ADD_T4_REG64(pi, "rx_ppp5", "# of PPP prio 5 frames received",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP5_L));
SYSCTL_ADD_T4_REG64(pi, "rx_ppp6", "# of PPP prio 6 frames received",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP6_L));
SYSCTL_ADD_T4_REG64(pi, "rx_ppp7", "# of PPP prio 7 frames received",
PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP7_L));
#undef SYSCTL_ADD_T4_REG64
#define SYSCTL_ADD_T4_PORTSTAT(name, desc) \
SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, #name, CTLFLAG_RD, \
&pi->stats.name, desc)
/* We get these from port_stats and they may be stale by up to 1s */
SYSCTL_ADD_T4_PORTSTAT(rx_ovflow0,
"# drops due to buffer-group 0 overflows");
SYSCTL_ADD_T4_PORTSTAT(rx_ovflow1,
"# drops due to buffer-group 1 overflows");
SYSCTL_ADD_T4_PORTSTAT(rx_ovflow2,
"# drops due to buffer-group 2 overflows");
SYSCTL_ADD_T4_PORTSTAT(rx_ovflow3,
"# drops due to buffer-group 3 overflows");
SYSCTL_ADD_T4_PORTSTAT(rx_trunc0,
"# of buffer-group 0 truncated packets");
SYSCTL_ADD_T4_PORTSTAT(rx_trunc1,
"# of buffer-group 1 truncated packets");
SYSCTL_ADD_T4_PORTSTAT(rx_trunc2,
"# of buffer-group 2 truncated packets");
SYSCTL_ADD_T4_PORTSTAT(rx_trunc3,
"# of buffer-group 3 truncated packets");
#undef SYSCTL_ADD_T4_PORTSTAT
}
static int
sysctl_int_array(SYSCTL_HANDLER_ARGS)
{
int rc, *i, space = 0;
struct sbuf sb;
sbuf_new_for_sysctl(&sb, NULL, 64, req);
for (i = arg1; arg2; arg2 -= sizeof(int), i++) {
if (space)
sbuf_printf(&sb, " ");
sbuf_printf(&sb, "%d", *i);
space = 1;
}
rc = sbuf_finish(&sb);
sbuf_delete(&sb);
return (rc);
}
static int
sysctl_bitfield(SYSCTL_HANDLER_ARGS)
{
int rc;
struct sbuf *sb;
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 128, req);
if (sb == NULL)
return (ENOMEM);
sbuf_printf(sb, "%b", (int)arg2, (char *)arg1);
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
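sysctl_bitfield() relies on the kernel's "%b" conversion, which userland printf does not provide. In the description string the first byte is the numeric base (\20 is 16), and each following group is a 1-based bit number followed by that bit's name. The sketch below is an approximate userland decoder written for illustration only. format_bits() and append() are hypothetical names, and only bases 8 and 16 are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append at most n bytes of s to the NUL-terminated string in buf. */
static void
append(char *buf, size_t len, const char *s, size_t n)
{
	size_t used = strlen(buf);

	if (used + 1 >= len)
		return;			/* no room left */
	if (n > len - used - 1)
		n = len - used - 1;	/* truncate to fit */
	memcpy(buf + used, s, n);
	buf[used + n] = '\0';
}

/* Render value as "<number><NAME,NAME>" according to desc. */
static void
format_bits(char *buf, size_t len, unsigned int value, const char *desc)
{
	int base = (unsigned char)*desc++;
	int any = 0;

	assert(len > 0 && (base == 8 || base == 16));
	if (base == 16)
		snprintf(buf, len, "%x", value);
	else
		snprintf(buf, len, "%o", value);
	if (value == 0)
		return;			/* nothing set, no name list */
	while (*desc != '\0') {
		int bit = (unsigned char)*desc++;	/* 1-based bit number */
		const char *name = desc;

		/* A name runs until the next byte that is <= ' '. */
		while ((unsigned char)*desc > ' ')
			desc++;
		if (bit >= 1 && bit <= 32 && (value & (1U << (bit - 1)))) {
			append(buf, len, any ? "," : "<", 1);
			append(buf, len, name, (size_t)(desc - name));
			any = 1;
		}
	}
	if (any)
		append(buf, len, ">", 1);
}
```

Calling format_bits() with the value 0x9 and the doorbells string "\20\1UDB\2WCWR\3UDBWC\4KDB" produces "9<UDB,KDB>", because bits 1 and 4 are set.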
static int
sysctl_btphy(SYSCTL_HANDLER_ARGS)
{
struct port_info *pi = arg1;
int op = arg2;
struct adapter *sc = pi->adapter;
u_int v;
int rc;
rc = begin_synchronized_op(sc, &pi->vi[0], SLEEP_OK | INTR_OK, "t4btt");
if (rc)
return (rc);
/* XXX: magic numbers */
rc = -t4_mdio_rd(sc, sc->mbox, pi->mdio_addr, 0x1e, op ? 0x20 : 0xc820,
&v);
end_synchronized_op(sc, 0);
if (rc)
return (rc);
if (op == 0)
v /= 256;
rc = sysctl_handle_int(oidp, &v, 0, req);
return (rc);
}
static int
sysctl_noflowq(SYSCTL_HANDLER_ARGS)
{
struct vi_info *vi = arg1;
int rc, val;
val = vi->rsrv_noflowq;
rc = sysctl_handle_int(oidp, &val, 0, req);
if (rc != 0 || req->newptr == NULL)
return (rc);
if ((val >= 1) && (vi->ntxq > 1))
vi->rsrv_noflowq = 1;
else
vi->rsrv_noflowq = 0;
return (rc);
}
static int
sysctl_holdoff_tmr_idx(SYSCTL_HANDLER_ARGS)
{
struct vi_info *vi = arg1;
struct adapter *sc = vi->pi->adapter;
int idx, rc, i;
struct sge_rxq *rxq;
#ifdef TCP_OFFLOAD
struct sge_ofld_rxq *ofld_rxq;
#endif
uint8_t v;
idx = vi->tmr_idx;
rc = sysctl_handle_int(oidp, &idx, 0, req);
if (rc != 0 || req->newptr == NULL)
return (rc);
if (idx < 0 || idx >= SGE_NTIMERS)
return (EINVAL);
rc = begin_synchronized_op(sc, vi, HOLD_LOCK | SLEEP_OK | INTR_OK,
"t4tmr");
if (rc)
return (rc);
v = V_QINTR_TIMER_IDX(idx) | V_QINTR_CNT_EN(vi->pktc_idx != -1);
for_each_rxq(vi, i, rxq) {
#ifdef atomic_store_rel_8
atomic_store_rel_8(&rxq->iq.intr_params, v);
#else
rxq->iq.intr_params = v;
#endif
}
#ifdef TCP_OFFLOAD
for_each_ofld_rxq(vi, i, ofld_rxq) {
#ifdef atomic_store_rel_8
atomic_store_rel_8(&ofld_rxq->iq.intr_params, v);
#else
ofld_rxq->iq.intr_params = v;
#endif
}
#endif
vi->tmr_idx = idx;
end_synchronized_op(sc, LOCK_HELD);
return (0);
}
static int
sysctl_holdoff_pktc_idx(SYSCTL_HANDLER_ARGS)
{
struct vi_info *vi = arg1;
struct adapter *sc = vi->pi->adapter;
int idx, rc;
idx = vi->pktc_idx;
rc = sysctl_handle_int(oidp, &idx, 0, req);
if (rc != 0 || req->newptr == NULL)
return (rc);
if (idx < -1 || idx >= SGE_NCOUNTERS)
return (EINVAL);
rc = begin_synchronized_op(sc, vi, HOLD_LOCK | SLEEP_OK | INTR_OK,
"t4pktc");
if (rc)
return (rc);
if (vi->flags & VI_INIT_DONE)
rc = EBUSY; /* cannot be changed once the queues are created */
else
vi->pktc_idx = idx;
end_synchronized_op(sc, LOCK_HELD);
return (rc);
}
static int
sysctl_qsize_rxq(SYSCTL_HANDLER_ARGS)
{
struct vi_info *vi = arg1;
struct adapter *sc = vi->pi->adapter;
int qsize, rc;
qsize = vi->qsize_rxq;
rc = sysctl_handle_int(oidp, &qsize, 0, req);
if (rc != 0 || req->newptr == NULL)
return (rc);
if (qsize < 128 || (qsize & 7))
return (EINVAL);
rc = begin_synchronized_op(sc, vi, HOLD_LOCK | SLEEP_OK | INTR_OK,
"t4rxqs");
if (rc)
return (rc);
if (vi->flags & VI_INIT_DONE)
rc = EBUSY; /* cannot be changed once the queues are created */
else
vi->qsize_rxq = qsize;
end_synchronized_op(sc, LOCK_HELD);
return (rc);
}
static int
sysctl_qsize_txq(SYSCTL_HANDLER_ARGS)
{
struct vi_info *vi = arg1;
struct adapter *sc = vi->pi->adapter;
int qsize, rc;
qsize = vi->qsize_txq;
rc = sysctl_handle_int(oidp, &qsize, 0, req);
if (rc != 0 || req->newptr == NULL)
return (rc);
if (qsize < 128 || qsize > 65536)
return (EINVAL);
rc = begin_synchronized_op(sc, vi, HOLD_LOCK | SLEEP_OK | INTR_OK,
"t4txqs");
if (rc)
return (rc);
if (vi->flags & VI_INIT_DONE)
rc = EBUSY; /* cannot be changed once the queues are created */
else
vi->qsize_txq = qsize;
end_synchronized_op(sc, LOCK_HELD);
return (rc);
}
static int
sysctl_pause_settings(SYSCTL_HANDLER_ARGS)
{
struct port_info *pi = arg1;
struct adapter *sc = pi->adapter;
struct link_config *lc = &pi->link_cfg;
int rc;
if (req->newptr == NULL) {
struct sbuf *sb;
static char *bits = "\20\1PAUSE_RX\2PAUSE_TX";
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 128, req);
if (sb == NULL)
return (ENOMEM);
sbuf_printf(sb, "%b", lc->fc & (PAUSE_TX | PAUSE_RX), bits);
rc = sbuf_finish(sb);
sbuf_delete(sb);
} else {
char s[2];
int n;
s[0] = '0' + (lc->requested_fc & (PAUSE_TX | PAUSE_RX));
s[1] = 0;
rc = sysctl_handle_string(oidp, s, sizeof(s), req);
if (rc != 0)
return (rc);
if (s[1] != 0)
return (EINVAL);
if (s[0] < '0' || s[0] > '9')
return (EINVAL); /* not a number */
n = s[0] - '0';
if (n & ~(PAUSE_TX | PAUSE_RX))
return (EINVAL); /* some other bit is set too */
rc = begin_synchronized_op(sc, &pi->vi[0], SLEEP_OK | INTR_OK,
"t4PAUSE");
if (rc)
return (rc);
if ((lc->requested_fc & (PAUSE_TX | PAUSE_RX)) != n) {
int link_ok = lc->link_ok;
lc->requested_fc &= ~(PAUSE_TX | PAUSE_RX);
lc->requested_fc |= n;
rc = -t4_link_l1cfg(sc, sc->mbox, pi->tx_chan, lc);
lc->link_ok = link_ok; /* restore */
}
end_synchronized_op(sc, 0);
}
return (rc);
}
static int
sysctl_handle_t4_reg64(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
int reg = arg2;
uint64_t val;
val = t4_read_reg64(sc, reg);
return (sysctl_handle_64(oidp, &val, 0, req));
}
static int
sysctl_temperature(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
int rc, t;
uint32_t param, val;
rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4temp");
if (rc)
return (rc);
param = V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DEV) |
V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DEV_DIAG) |
V_FW_PARAMS_PARAM_Y(FW_PARAM_DEV_DIAG_TMP);
rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val);
end_synchronized_op(sc, 0);
if (rc)
return (rc);
/* unknown is returned as 0 but we display -1 in that case */
t = val == 0 ? -1 : val;
rc = sysctl_handle_int(oidp, &t, 0, req);
return (rc);
}
#ifdef SBUF_DRAIN
static int
sysctl_cctrl(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc, i;
uint16_t incr[NMTUS][NCCTRL_WIN];
static const char *dec_fac[] = {
"0.5", "0.5625", "0.625", "0.6875", "0.75", "0.8125", "0.875",
"0.9375"
};
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
if (sb == NULL)
return (ENOMEM);
t4_read_cong_tbl(sc, incr);
for (i = 0; i < NCCTRL_WIN; ++i) {
sbuf_printf(sb, "%2d: %4u %4u %4u %4u %4u %4u %4u %4u\n", i,
incr[0][i], incr[1][i], incr[2][i], incr[3][i], incr[4][i],
incr[5][i], incr[6][i], incr[7][i]);
sbuf_printf(sb, "%8u %4u %4u %4u %4u %4u %4u %4u %5u %s\n",
incr[8][i], incr[9][i], incr[10][i], incr[11][i],
incr[12][i], incr[13][i], incr[14][i], incr[15][i],
sc->params.a_wnd[i], dec_fac[sc->params.b_wnd[i]]);
}
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static const char *qname[CIM_NUM_IBQ + CIM_NUM_OBQ_T5] = {
"TP0", "TP1", "ULP", "SGE0", "SGE1", "NC-SI", /* ibq's */
"ULP0", "ULP1", "ULP2", "ULP3", "SGE", "NC-SI", /* obq's */
"SGE0-RX", "SGE1-RX" /* additional obq's (T5 onwards) */
};
static int
sysctl_cim_ibq_obq(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc, i, n, qid = arg2;
uint32_t *buf, *p;
char *qtype;
u_int cim_num_obq = sc->chip_params->cim_num_obq;
KASSERT(qid >= 0 && qid < CIM_NUM_IBQ + cim_num_obq,
("%s: bad qid %d\n", __func__, qid));
if (qid < CIM_NUM_IBQ) {
/* inbound queue */
qtype = "IBQ";
n = 4 * CIM_IBQ_SIZE;
buf = malloc(n * sizeof(uint32_t), M_CXGBE, M_ZERO | M_WAITOK);
rc = t4_read_cim_ibq(sc, qid, buf, n);
} else {
/* outbound queue */
qtype = "OBQ";
qid -= CIM_NUM_IBQ;
n = 4 * cim_num_obq * CIM_OBQ_SIZE;
buf = malloc(n * sizeof(uint32_t), M_CXGBE, M_ZERO | M_WAITOK);
rc = t4_read_cim_obq(sc, qid, buf, n);
}
if (rc < 0) {
rc = -rc;
goto done;
}
n = rc * sizeof(uint32_t); /* rc has # of words actually read */
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
goto done;
sb = sbuf_new_for_sysctl(NULL, NULL, PAGE_SIZE, req);
if (sb == NULL) {
rc = ENOMEM;
goto done;
}
sbuf_printf(sb, "%s%d %s", qtype , qid, qname[arg2]);
for (i = 0, p = buf; i < n; i += 16, p += 4)
sbuf_printf(sb, "\n%#06x: %08x %08x %08x %08x", i, p[0], p[1],
p[2], p[3]);
rc = sbuf_finish(sb);
sbuf_delete(sb);
done:
free(buf, M_CXGBE);
return (rc);
}
static int
sysctl_cim_la(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
u_int cfg;
struct sbuf *sb;
uint32_t *buf, *p;
int rc;
MPASS(chip_id(sc) <= CHELSIO_T5);
rc = -t4_cim_read(sc, A_UP_UP_DBG_LA_CFG, 1, &cfg);
if (rc != 0)
return (rc);
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
if (sb == NULL)
return (ENOMEM);
buf = malloc(sc->params.cim_la_size * sizeof(uint32_t), M_CXGBE,
M_ZERO | M_WAITOK);
rc = -t4_cim_read_la(sc, buf, NULL);
if (rc != 0) {
sbuf_delete(sb); /* don't leak the sbuf on the error path */
goto done;
}
sbuf_printf(sb, "Status Data PC%s",
cfg & F_UPDBGLACAPTPCONLY ? "" :
" LS0Stat LS0Addr LS0Data");
for (p = buf; p <= &buf[sc->params.cim_la_size - 8]; p += 8) {
if (cfg & F_UPDBGLACAPTPCONLY) {
sbuf_printf(sb, "\n %02x %08x %08x", p[5] & 0xff,
p[6], p[7]);
sbuf_printf(sb, "\n %02x %02x%06x %02x%06x",
(p[3] >> 8) & 0xff, p[3] & 0xff, p[4] >> 8,
p[4] & 0xff, p[5] >> 8);
sbuf_printf(sb, "\n %02x %x%07x %x%07x",
(p[0] >> 4) & 0xff, p[0] & 0xf, p[1] >> 4,
p[1] & 0xf, p[2] >> 4);
} else {
sbuf_printf(sb,
"\n %02x %x%07x %x%07x %08x %08x "
"%08x%08x%08x%08x",
(p[0] >> 4) & 0xff, p[0] & 0xf, p[1] >> 4,
p[1] & 0xf, p[2] >> 4, p[2] & 0xf, p[3], p[4], p[5],
p[6], p[7]);
}
}
rc = sbuf_finish(sb);
sbuf_delete(sb);
done:
free(buf, M_CXGBE);
return (rc);
}
static int
sysctl_cim_la_t6(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
u_int cfg;
struct sbuf *sb;
uint32_t *buf, *p;
int rc;
MPASS(chip_id(sc) > CHELSIO_T5);
rc = -t4_cim_read(sc, A_UP_UP_DBG_LA_CFG, 1, &cfg);
if (rc != 0)
return (rc);
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
if (sb == NULL)
return (ENOMEM);
buf = malloc(sc->params.cim_la_size * sizeof(uint32_t), M_CXGBE,
M_ZERO | M_WAITOK);
rc = -t4_cim_read_la(sc, buf, NULL);
if (rc != 0) {
sbuf_delete(sb); /* don't leak the sbuf on the error path */
goto done;
}
sbuf_printf(sb, "Status Inst Data PC%s",
cfg & F_UPDBGLACAPTPCONLY ? "" :
" LS0Stat LS0Addr LS0Data LS1Stat LS1Addr LS1Data");
for (p = buf; p <= &buf[sc->params.cim_la_size - 10]; p += 10) {
if (cfg & F_UPDBGLACAPTPCONLY) {
sbuf_printf(sb, "\n %02x %08x %08x %08x",
p[3] & 0xff, p[2], p[1], p[0]);
sbuf_printf(sb, "\n %02x %02x%06x %02x%06x %02x%06x",
(p[6] >> 8) & 0xff, p[6] & 0xff, p[5] >> 8,
p[5] & 0xff, p[4] >> 8, p[4] & 0xff, p[3] >> 8);
sbuf_printf(sb, "\n %02x %04x%04x %04x%04x %04x%04x",
(p[9] >> 16) & 0xff, p[9] & 0xffff, p[8] >> 16,
p[8] & 0xffff, p[7] >> 16, p[7] & 0xffff,
p[6] >> 16);
} else {
sbuf_printf(sb, "\n %02x %04x%04x %04x%04x %04x%04x "
"%08x %08x %08x %08x %08x %08x",
(p[9] >> 16) & 0xff,
p[9] & 0xffff, p[8] >> 16,
p[8] & 0xffff, p[7] >> 16,
p[7] & 0xffff, p[6] >> 16,
p[2], p[1], p[0], p[5], p[4], p[3]);
}
}
rc = sbuf_finish(sb);
sbuf_delete(sb);
done:
free(buf, M_CXGBE);
return (rc);
}
static int
sysctl_cim_ma_la(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
u_int i;
struct sbuf *sb;
uint32_t *buf, *p;
int rc;
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
if (sb == NULL)
return (ENOMEM);
buf = malloc(2 * CIM_MALA_SIZE * 5 * sizeof(uint32_t), M_CXGBE,
M_ZERO | M_WAITOK);
t4_cim_read_ma_la(sc, buf, buf + 5 * CIM_MALA_SIZE);
p = buf;
for (i = 0; i < CIM_MALA_SIZE; i++, p += 5) {
sbuf_printf(sb, "\n%02x%08x%08x%08x%08x", p[4], p[3], p[2],
p[1], p[0]);
}
sbuf_printf(sb, "\n\nCnt ID Tag UE Data RDY VLD");
for (i = 0; i < CIM_MALA_SIZE; i++, p += 5) {
sbuf_printf(sb, "\n%3u %2u %x %u %08x%08x %u %u",
(p[2] >> 10) & 0xff, (p[2] >> 7) & 7,
(p[2] >> 3) & 0xf, (p[2] >> 2) & 1,
(p[1] >> 2) | ((p[2] & 3) << 30),
(p[0] >> 2) | ((p[1] & 3) << 30), (p[0] >> 1) & 1,
p[0] & 1);
}
rc = sbuf_finish(sb);
sbuf_delete(sb);
free(buf, M_CXGBE);
return (rc);
}
static int
sysctl_cim_pif_la(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
u_int i;
struct sbuf *sb;
uint32_t *buf, *p;
int rc;
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
if (sb == NULL)
return (ENOMEM);
buf = malloc(2 * CIM_PIFLA_SIZE * 6 * sizeof(uint32_t), M_CXGBE,
M_ZERO | M_WAITOK);
t4_cim_read_pif_la(sc, buf, buf + 6 * CIM_PIFLA_SIZE, NULL, NULL);
p = buf;
sbuf_printf(sb, "Cntl ID DataBE Addr Data");
for (i = 0; i < CIM_PIFLA_SIZE; i++, p += 6) {
sbuf_printf(sb, "\n %02x %02x %04x %08x %08x%08x%08x%08x",
(p[5] >> 22) & 0xff, (p[5] >> 16) & 0x3f, p[5] & 0xffff,
p[4], p[3], p[2], p[1], p[0]);
}
sbuf_printf(sb, "\n\nCntl ID Data");
for (i = 0; i < CIM_PIFLA_SIZE; i++, p += 6) {
sbuf_printf(sb, "\n %02x %02x %08x%08x%08x%08x",
(p[4] >> 6) & 0xff, p[4] & 0x3f, p[3], p[2], p[1], p[0]);
}
rc = sbuf_finish(sb);
sbuf_delete(sb);
free(buf, M_CXGBE);
return (rc);
}
static int
sysctl_cim_qcfg(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc, i;
uint16_t base[CIM_NUM_IBQ + CIM_NUM_OBQ_T5];
uint16_t size[CIM_NUM_IBQ + CIM_NUM_OBQ_T5];
uint16_t thres[CIM_NUM_IBQ];
uint32_t obq_wr[2 * CIM_NUM_OBQ_T5], *wr = obq_wr;
uint32_t stat[4 * (CIM_NUM_IBQ + CIM_NUM_OBQ_T5)], *p = stat;
u_int cim_num_obq, ibq_rdaddr, obq_rdaddr, nq;
cim_num_obq = sc->chip_params->cim_num_obq;
if (is_t4(sc)) {
ibq_rdaddr = A_UP_IBQ_0_RDADDR;
obq_rdaddr = A_UP_OBQ_0_REALADDR;
} else {
ibq_rdaddr = A_UP_IBQ_0_SHADOW_RDADDR;
obq_rdaddr = A_UP_OBQ_0_SHADOW_REALADDR;
}
nq = CIM_NUM_IBQ + cim_num_obq;
rc = -t4_cim_read(sc, ibq_rdaddr, 4 * nq, stat);
if (rc == 0)
rc = -t4_cim_read(sc, obq_rdaddr, 2 * cim_num_obq, obq_wr);
if (rc != 0)
return (rc);
t4_read_cimq_cfg(sc, base, size, thres);
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, PAGE_SIZE, req);
if (sb == NULL)
return (ENOMEM);
sbuf_printf(sb, "Queue Base Size Thres RdPtr WrPtr SOP EOP Avail");
for (i = 0; i < CIM_NUM_IBQ; i++, p += 4)
sbuf_printf(sb, "\n%7s %5x %5u %5u %6x %4x %4u %4u %5u",
qname[i], base[i], size[i], thres[i], G_IBQRDADDR(p[0]),
G_IBQWRADDR(p[1]), G_QUESOPCNT(p[3]), G_QUEEOPCNT(p[3]),
G_QUEREMFLITS(p[2]) * 16);
for ( ; i < nq; i++, p += 4, wr += 2)
sbuf_printf(sb, "\n%7s %5x %5u %12x %4x %4u %4u %5u", qname[i],
base[i], size[i], G_QUERDADDR(p[0]) & 0x3fff,
wr[0] - base[i], G_QUESOPCNT(p[3]), G_QUEEOPCNT(p[3]),
G_QUEREMFLITS(p[2]) * 16);
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static int
sysctl_cpl_stats(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc;
struct tp_cpl_stats stats;
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
if (sb == NULL)
return (ENOMEM);
mtx_lock(&sc->reg_lock);
t4_tp_get_cpl_stats(sc, &stats);
mtx_unlock(&sc->reg_lock);
if (sc->chip_params->nchan > 2) {
sbuf_printf(sb, " channel 0 channel 1"
" channel 2 channel 3");
sbuf_printf(sb, "\nCPL requests: %10u %10u %10u %10u",
stats.req[0], stats.req[1], stats.req[2], stats.req[3]);
sbuf_printf(sb, "\nCPL responses: %10u %10u %10u %10u",
stats.rsp[0], stats.rsp[1], stats.rsp[2], stats.rsp[3]);
} else {
sbuf_printf(sb, " channel 0 channel 1");
sbuf_printf(sb, "\nCPL requests: %10u %10u",
stats.req[0], stats.req[1]);
sbuf_printf(sb, "\nCPL responses: %10u %10u",
stats.rsp[0], stats.rsp[1]);
}
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static int
sysctl_ddp_stats(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc;
struct tp_usm_stats stats;
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
if (sb == NULL)
return (ENOMEM);
t4_get_usm_stats(sc, &stats);
sbuf_printf(sb, "Frames: %u\n", stats.frames);
sbuf_printf(sb, "Octets: %ju\n", stats.octets);
sbuf_printf(sb, "Drops: %u", stats.drops);
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static const char * const devlog_level_strings[] = {
[FW_DEVLOG_LEVEL_EMERG] = "EMERG",
[FW_DEVLOG_LEVEL_CRIT] = "CRIT",
[FW_DEVLOG_LEVEL_ERR] = "ERR",
[FW_DEVLOG_LEVEL_NOTICE] = "NOTICE",
[FW_DEVLOG_LEVEL_INFO] = "INFO",
[FW_DEVLOG_LEVEL_DEBUG] = "DEBUG"
};
static const char * const devlog_facility_strings[] = {
[FW_DEVLOG_FACILITY_CORE] = "CORE",
[FW_DEVLOG_FACILITY_CF] = "CF",
[FW_DEVLOG_FACILITY_SCHED] = "SCHED",
[FW_DEVLOG_FACILITY_TIMER] = "TIMER",
[FW_DEVLOG_FACILITY_RES] = "RES",
[FW_DEVLOG_FACILITY_HW] = "HW",
[FW_DEVLOG_FACILITY_FLR] = "FLR",
[FW_DEVLOG_FACILITY_DMAQ] = "DMAQ",
[FW_DEVLOG_FACILITY_PHY] = "PHY",
[FW_DEVLOG_FACILITY_MAC] = "MAC",
[FW_DEVLOG_FACILITY_PORT] = "PORT",
[FW_DEVLOG_FACILITY_VI] = "VI",
[FW_DEVLOG_FACILITY_FILTER] = "FILTER",
[FW_DEVLOG_FACILITY_ACL] = "ACL",
[FW_DEVLOG_FACILITY_TM] = "TM",
[FW_DEVLOG_FACILITY_QFC] = "QFC",
[FW_DEVLOG_FACILITY_DCB] = "DCB",
[FW_DEVLOG_FACILITY_ETH] = "ETH",
[FW_DEVLOG_FACILITY_OFLD] = "OFLD",
[FW_DEVLOG_FACILITY_RI] = "RI",
[FW_DEVLOG_FACILITY_ISCSI] = "ISCSI",
[FW_DEVLOG_FACILITY_FCOE] = "FCOE",
[FW_DEVLOG_FACILITY_FOISCSI] = "FOISCSI",
[FW_DEVLOG_FACILITY_FOFCOE] = "FOFCOE",
[FW_DEVLOG_FACILITY_CHNET] = "CHNET",
};
static int
sysctl_devlog(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct devlog_params *dparams = &sc->params.devlog;
struct fw_devlog_e *buf, *e;
int i, j, rc, nentries, first = 0;
struct sbuf *sb;
uint64_t ftstamp = UINT64_MAX;
if (dparams->addr == 0)
return (ENXIO);
buf = malloc(dparams->size, M_CXGBE, M_NOWAIT);
if (buf == NULL)
return (ENOMEM);
rc = read_via_memwin(sc, 1, dparams->addr, (void *)buf, dparams->size);
if (rc != 0)
goto done;
nentries = dparams->size / sizeof(struct fw_devlog_e);
for (i = 0; i < nentries; i++) {
e = &buf[i];
if (e->timestamp == 0)
break; /* end */
e->timestamp = be64toh(e->timestamp);
e->seqno = be32toh(e->seqno);
for (j = 0; j < 8; j++)
e->params[j] = be32toh(e->params[j]);
if (e->timestamp < ftstamp) {
ftstamp = e->timestamp;
first = i;
}
}
if (buf[first].timestamp == 0)
goto done; /* nothing in the log */
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
goto done;
sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
if (sb == NULL) {
rc = ENOMEM;
goto done;
}
sbuf_printf(sb, "%10s %15s %8s %8s %s\n",
"Seq#", "Tstamp", "Level", "Facility", "Message");
i = first;
do {
e = &buf[i];
if (e->timestamp == 0)
break; /* end */
sbuf_printf(sb, "%10d %15ju %8s %8s ",
e->seqno, e->timestamp,
(e->level < nitems(devlog_level_strings) ?
devlog_level_strings[e->level] : "UNKNOWN"),
(e->facility < nitems(devlog_facility_strings) ?
devlog_facility_strings[e->facility] : "UNKNOWN"));
sbuf_printf(sb, e->fmt, e->params[0], e->params[1],
e->params[2], e->params[3], e->params[4],
e->params[5], e->params[6], e->params[7]);
if (++i == nentries)
i = 0;
} while (i != first);
rc = sbuf_finish(sb);
sbuf_delete(sb);
done:
free(buf, M_CXGBE);
return (rc);
}
static int
sysctl_fcoe_stats(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc;
struct tp_fcoe_stats stats[MAX_NCHAN];
int i, nchan = sc->chip_params->nchan;
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
if (sb == NULL)
return (ENOMEM);
for (i = 0; i < nchan; i++)
t4_get_fcoe_stats(sc, i, &stats[i]);
if (nchan > 2) {
sbuf_printf(sb, " channel 0 channel 1"
" channel 2 channel 3");
sbuf_printf(sb, "\noctetsDDP: %16ju %16ju %16ju %16ju",
stats[0].octets_ddp, stats[1].octets_ddp,
stats[2].octets_ddp, stats[3].octets_ddp);
sbuf_printf(sb, "\nframesDDP: %16u %16u %16u %16u",
stats[0].frames_ddp, stats[1].frames_ddp,
stats[2].frames_ddp, stats[3].frames_ddp);
sbuf_printf(sb, "\nframesDrop: %16u %16u %16u %16u",
stats[0].frames_drop, stats[1].frames_drop,
stats[2].frames_drop, stats[3].frames_drop);
} else {
sbuf_printf(sb, " channel 0 channel 1");
sbuf_printf(sb, "\noctetsDDP: %16ju %16ju",
stats[0].octets_ddp, stats[1].octets_ddp);
sbuf_printf(sb, "\nframesDDP: %16u %16u",
stats[0].frames_ddp, stats[1].frames_ddp);
sbuf_printf(sb, "\nframesDrop: %16u %16u",
stats[0].frames_drop, stats[1].frames_drop);
}
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static int
sysctl_hw_sched(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc, i;
unsigned int map, kbps, ipg, mode;
unsigned int pace_tab[NTX_SCHED];
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
if (sb == NULL)
return (ENOMEM);
map = t4_read_reg(sc, A_TP_TX_MOD_QUEUE_REQ_MAP);
mode = G_TIMERMODE(t4_read_reg(sc, A_TP_MOD_CONFIG));
t4_read_pace_tbl(sc, pace_tab);
sbuf_printf(sb, "Scheduler Mode Channel Rate (Kbps) "
"Class IPG (0.1 ns) Flow IPG (us)");
for (i = 0; i < NTX_SCHED; ++i, map >>= 2) {
t4_get_tx_sched(sc, i, &kbps, &ipg);
sbuf_printf(sb, "\n %u %-5s %u ", i,
(mode & (1 << i)) ? "flow" : "class", map & 3);
if (kbps)
sbuf_printf(sb, "%9u ", kbps);
else
sbuf_printf(sb, " disabled ");
if (ipg)
sbuf_printf(sb, "%13u ", ipg);
else
sbuf_printf(sb, " disabled ");
if (pace_tab[i])
sbuf_printf(sb, "%10u", pace_tab[i]);
else
sbuf_printf(sb, " disabled");
}
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static int
sysctl_lb_stats(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc, i, j;
uint64_t *p0, *p1;
struct lb_port_stats s[2];
static const char *stat_name[] = {
"OctetsOK:", "FramesOK:", "BcastFrames:", "McastFrames:",
"UcastFrames:", "ErrorFrames:", "Frames64:", "Frames65To127:",
"Frames128To255:", "Frames256To511:", "Frames512To1023:",
"Frames1024To1518:", "Frames1519ToMax:", "FramesDropped:",
"BG0FramesDropped:", "BG1FramesDropped:", "BG2FramesDropped:",
"BG3FramesDropped:", "BG0FramesTrunc:", "BG1FramesTrunc:",
"BG2FramesTrunc:", "BG3FramesTrunc:"
};
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
if (sb == NULL)
return (ENOMEM);
memset(s, 0, sizeof(s));
for (i = 0; i < sc->chip_params->nchan; i += 2) {
t4_get_lb_stats(sc, i, &s[0]);
t4_get_lb_stats(sc, i + 1, &s[1]);
p0 = &s[0].octets;
p1 = &s[1].octets;
sbuf_printf(sb, "%s Loopback %u"
" Loopback %u", i == 0 ? "" : "\n", i, i + 1);
for (j = 0; j < nitems(stat_name); j++)
sbuf_printf(sb, "\n%-17s %20ju %20ju", stat_name[j],
*p0++, *p1++);
}
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static int
sysctl_linkdnrc(SYSCTL_HANDLER_ARGS)
{
int rc = 0;
struct port_info *pi = arg1;
struct sbuf *sb;
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 64, req);
if (sb == NULL)
return (ENOMEM);
if (pi->linkdnrc < 0)
sbuf_printf(sb, "n/a");
else
sbuf_printf(sb, "%s", t4_link_down_rc_str(pi->linkdnrc));
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
struct mem_desc {
unsigned int base;
unsigned int limit;
unsigned int idx;
};
static int
mem_desc_cmp(const void *a, const void *b)
{
return ((const struct mem_desc *)a)->base -
((const struct mem_desc *)b)->base;
}
static void
mem_region_show(struct sbuf *sb, const char *name, unsigned int from,
unsigned int to)
{
unsigned int size;
if (from == to)
return;
size = to - from + 1;
if (size == 0)
return;
/* XXX: need humanize_number(3) in libkern for a more readable 'size' */
sbuf_printf(sb, "%-15s %#x-%#x [%u]\n", name, from, to, size);
}
static int
sysctl_meminfo(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc, i, n;
uint32_t lo, hi, used, alloc;
static const char *memory[] = {"EDC0:", "EDC1:", "MC:", "MC0:", "MC1:"};
static const char *region[] = {
"DBQ contexts:", "IMSG contexts:", "FLM cache:", "TCBs:",
"Pstructs:", "Timers:", "Rx FL:", "Tx FL:", "Pstruct FL:",
"Tx payload:", "Rx payload:", "LE hash:", "iSCSI region:",
"TDDP region:", "TPT region:", "STAG region:", "RQ region:",
"RQUDP region:", "PBL region:", "TXPBL region:",
"DBVFIFO region:", "ULPRX state:", "ULPTX state:",
"On-chip queues:"
};
struct mem_desc avail[4];
struct mem_desc mem[nitems(region) + 3]; /* up to 3 holes */
struct mem_desc *md = mem;
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
if (sb == NULL)
return (ENOMEM);
for (i = 0; i < nitems(mem); i++) {
mem[i].limit = 0;
mem[i].idx = i;
}
/* Find and sort the populated memory ranges */
i = 0;
lo = t4_read_reg(sc, A_MA_TARGET_MEM_ENABLE);
if (lo & F_EDRAM0_ENABLE) {
hi = t4_read_reg(sc, A_MA_EDRAM0_BAR);
avail[i].base = G_EDRAM0_BASE(hi) << 20;
avail[i].limit = avail[i].base + (G_EDRAM0_SIZE(hi) << 20);
avail[i].idx = 0;
i++;
}
if (lo & F_EDRAM1_ENABLE) {
hi = t4_read_reg(sc, A_MA_EDRAM1_BAR);
avail[i].base = G_EDRAM1_BASE(hi) << 20;
avail[i].limit = avail[i].base + (G_EDRAM1_SIZE(hi) << 20);
avail[i].idx = 1;
i++;
}
if (lo & F_EXT_MEM_ENABLE) {
hi = t4_read_reg(sc, A_MA_EXT_MEMORY_BAR);
avail[i].base = G_EXT_MEM_BASE(hi) << 20;
avail[i].limit = avail[i].base +
(G_EXT_MEM_SIZE(hi) << 20);
avail[i].idx = is_t5(sc) ? 3 : 2; /* Call it MC0 for T5 */
i++;
}
if (is_t5(sc) && lo & F_EXT_MEM1_ENABLE) {
hi = t4_read_reg(sc, A_MA_EXT_MEMORY1_BAR);
avail[i].base = G_EXT_MEM1_BASE(hi) << 20;
avail[i].limit = avail[i].base +
(G_EXT_MEM1_SIZE(hi) << 20);
avail[i].idx = 4;
i++;
}
if (!i) { /* no memory available */
sbuf_delete(sb); /* release the sbuf allocated above */
return (0);
}
qsort(avail, i, sizeof(struct mem_desc), mem_desc_cmp);
(md++)->base = t4_read_reg(sc, A_SGE_DBQ_CTXT_BADDR);
(md++)->base = t4_read_reg(sc, A_SGE_IMSG_CTXT_BADDR);
(md++)->base = t4_read_reg(sc, A_SGE_FLM_CACHE_BADDR);
(md++)->base = t4_read_reg(sc, A_TP_CMM_TCB_BASE);
(md++)->base = t4_read_reg(sc, A_TP_CMM_MM_BASE);
(md++)->base = t4_read_reg(sc, A_TP_CMM_TIMER_BASE);
(md++)->base = t4_read_reg(sc, A_TP_CMM_MM_RX_FLST_BASE);
(md++)->base = t4_read_reg(sc, A_TP_CMM_MM_TX_FLST_BASE);
(md++)->base = t4_read_reg(sc, A_TP_CMM_MM_PS_FLST_BASE);
/* the next few have explicit upper bounds */
md->base = t4_read_reg(sc, A_TP_PMM_TX_BASE);
md->limit = md->base - 1 +
t4_read_reg(sc, A_TP_PMM_TX_PAGE_SIZE) *
G_PMTXMAXPAGE(t4_read_reg(sc, A_TP_PMM_TX_MAX_PAGE));
md++;
md->base = t4_read_reg(sc, A_TP_PMM_RX_BASE);
md->limit = md->base - 1 +
t4_read_reg(sc, A_TP_PMM_RX_PAGE_SIZE) *
G_PMRXMAXPAGE(t4_read_reg(sc, A_TP_PMM_RX_MAX_PAGE));
md++;
if (t4_read_reg(sc, A_LE_DB_CONFIG) & F_HASHEN) {
if (chip_id(sc) <= CHELSIO_T5)
md->base = t4_read_reg(sc, A_LE_DB_HASH_TID_BASE);
else
md->base = t4_read_reg(sc, A_LE_DB_HASH_TBL_BASE_ADDR);
md->limit = 0;
} else {
md->base = 0;
md->idx = nitems(region); /* hide it */
}
md++;
#define ulp_region(reg) \
md->base = t4_read_reg(sc, A_ULP_ ## reg ## _LLIMIT);\
(md++)->limit = t4_read_reg(sc, A_ULP_ ## reg ## _ULIMIT)
ulp_region(RX_ISCSI);
ulp_region(RX_TDDP);
ulp_region(TX_TPT);
ulp_region(RX_STAG);
ulp_region(RX_RQ);
ulp_region(RX_RQUDP);
ulp_region(RX_PBL);
ulp_region(TX_PBL);
#undef ulp_region
md->base = 0;
md->idx = nitems(region);
if (!is_t4(sc)) {
uint32_t size = 0;
uint32_t sge_ctrl = t4_read_reg(sc, A_SGE_CONTROL2);
uint32_t fifo_size = t4_read_reg(sc, A_SGE_DBVFIFO_SIZE);
if (is_t5(sc)) {
if (sge_ctrl & F_VFIFO_ENABLE)
size = G_DBVFIFO_SIZE(fifo_size);
} else
size = G_T6_DBVFIFO_SIZE(fifo_size);
if (size) {
md->base = G_BASEADDR(t4_read_reg(sc,
A_SGE_DBVFIFO_BADDR));
md->limit = md->base + (size << 2) - 1;
}
}
md++;
md->base = t4_read_reg(sc, A_ULP_RX_CTX_BASE);
md->limit = 0;
md++;
md->base = t4_read_reg(sc, A_ULP_TX_ERR_TABLE_BASE);
md->limit = 0;
md++;
md->base = sc->vres.ocq.start;
if (sc->vres.ocq.size)
md->limit = md->base + sc->vres.ocq.size - 1;
else
md->idx = nitems(region); /* hide it */
md++;
/* add any address-space holes, there can be up to 3 */
for (n = 0; n < i - 1; n++)
if (avail[n].limit < avail[n + 1].base)
(md++)->base = avail[n].limit;
if (avail[n].limit)
(md++)->base = avail[n].limit;
n = md - mem;
qsort(mem, n, sizeof(struct mem_desc), mem_desc_cmp);
for (lo = 0; lo < i; lo++)
mem_region_show(sb, memory[avail[lo].idx], avail[lo].base,
avail[lo].limit - 1);
sbuf_printf(sb, "\n");
for (i = 0; i < n; i++) {
if (mem[i].idx >= nitems(region))
continue; /* skip holes */
if (!mem[i].limit)
mem[i].limit = i < n - 1 ? mem[i + 1].base - 1 : ~0;
mem_region_show(sb, region[mem[i].idx], mem[i].base,
mem[i].limit);
}
sbuf_printf(sb, "\n");
lo = t4_read_reg(sc, A_CIM_SDRAM_BASE_ADDR);
hi = t4_read_reg(sc, A_CIM_SDRAM_ADDR_SIZE) + lo - 1;
mem_region_show(sb, "uP RAM:", lo, hi);
lo = t4_read_reg(sc, A_CIM_EXTMEM2_BASE_ADDR);
hi = t4_read_reg(sc, A_CIM_EXTMEM2_ADDR_SIZE) + lo - 1;
mem_region_show(sb, "uP Extmem2:", lo, hi);
lo = t4_read_reg(sc, A_TP_PMM_RX_MAX_PAGE);
sbuf_printf(sb, "\n%u Rx pages of size %uKiB for %u channels\n",
G_PMRXMAXPAGE(lo),
t4_read_reg(sc, A_TP_PMM_RX_PAGE_SIZE) >> 10,
(lo & F_PMRXNUMCHN) ? 2 : 1);
lo = t4_read_reg(sc, A_TP_PMM_TX_MAX_PAGE);
hi = t4_read_reg(sc, A_TP_PMM_TX_PAGE_SIZE);
sbuf_printf(sb, "%u Tx pages of size %u%ciB for %u channels\n",
G_PMTXMAXPAGE(lo),
hi >= (1 << 20) ? (hi >> 20) : (hi >> 10),
hi >= (1 << 20) ? 'M' : 'K', 1 << G_PMTXNUMCHN(lo));
sbuf_printf(sb, "%u p-structs\n",
t4_read_reg(sc, A_TP_CMM_MM_MAX_PSTRUCT));
for (i = 0; i < 4; i++) {
if (chip_id(sc) > CHELSIO_T5)
lo = t4_read_reg(sc, A_MPS_RX_MAC_BG_PG_CNT0 + i * 4);
else
lo = t4_read_reg(sc, A_MPS_RX_PG_RSV0 + i * 4);
if (is_t5(sc)) {
used = G_T5_USED(lo);
alloc = G_T5_ALLOC(lo);
} else {
used = G_USED(lo);
alloc = G_ALLOC(lo);
}
/* For T6 these are MAC buffer groups */
sbuf_printf(sb, "\nPort %d using %u pages out of %u allocated",
i, used, alloc);
}
for (i = 0; i < sc->chip_params->nchan; i++) {
if (chip_id(sc) > CHELSIO_T5)
lo = t4_read_reg(sc, A_MPS_RX_LPBK_BG_PG_CNT0 + i * 4);
else
lo = t4_read_reg(sc, A_MPS_RX_PG_RSV4 + i * 4);
if (is_t5(sc)) {
used = G_T5_USED(lo);
alloc = G_T5_ALLOC(lo);
} else {
used = G_USED(lo);
alloc = G_ALLOC(lo);
}
/* For T6 these are MAC buffer groups */
sbuf_printf(sb,
"\nLoopback %d using %u pages out of %u allocated",
i, used, alloc);
}
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static inline void
tcamxy2valmask(uint64_t x, uint64_t y, uint8_t *addr, uint64_t *mask)
{
*mask = x | y;
y = htobe64(y);
memcpy(addr, (char *)&y + 2, ETHER_ADDR_LEN);
}
static int
sysctl_mps_tcam(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc, i;
MPASS(chip_id(sc) <= CHELSIO_T5);
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
if (sb == NULL)
return (ENOMEM);
sbuf_printf(sb,
"Idx Ethernet address Mask Vld Ports PF"
" VF Replication P0 P1 P2 P3 ML");
for (i = 0; i < sc->chip_params->mps_tcam_size; i++) {
uint64_t tcamx, tcamy, mask;
uint32_t cls_lo, cls_hi;
uint8_t addr[ETHER_ADDR_LEN];
tcamy = t4_read_reg64(sc, MPS_CLS_TCAM_Y_L(i));
tcamx = t4_read_reg64(sc, MPS_CLS_TCAM_X_L(i));
if (tcamx & tcamy)
continue;
tcamxy2valmask(tcamx, tcamy, addr, &mask);
cls_lo = t4_read_reg(sc, MPS_CLS_SRAM_L(i));
cls_hi = t4_read_reg(sc, MPS_CLS_SRAM_H(i));
sbuf_printf(sb, "\n%3u %02x:%02x:%02x:%02x:%02x:%02x %012jx"
" %c %#x%4u%4d", i, addr[0], addr[1], addr[2],
addr[3], addr[4], addr[5], (uintmax_t)mask,
(cls_lo & F_SRAM_VLD) ? 'Y' : 'N',
G_PORTMAP(cls_hi), G_PF(cls_lo),
(cls_lo & F_VF_VALID) ? G_VF(cls_lo) : -1);
if (cls_lo & F_REPLICATE) {
struct fw_ldst_cmd ldst_cmd;
memset(&ldst_cmd, 0, sizeof(ldst_cmd));
ldst_cmd.op_to_addrspace =
htobe32(V_FW_CMD_OP(FW_LDST_CMD) |
F_FW_CMD_REQUEST | F_FW_CMD_READ |
V_FW_LDST_CMD_ADDRSPACE(FW_LDST_ADDRSPC_MPS));
ldst_cmd.cycles_to_len16 = htobe32(FW_LEN16(ldst_cmd));
ldst_cmd.u.mps.rplc.fid_idx =
htobe16(V_FW_LDST_CMD_FID(FW_LDST_MPS_RPLC) |
V_FW_LDST_CMD_IDX(i));
rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK,
"t4mps");
if (rc)
break;
rc = -t4_wr_mbox(sc, sc->mbox, &ldst_cmd,
sizeof(ldst_cmd), &ldst_cmd);
end_synchronized_op(sc, 0);
if (rc != 0) {
sbuf_printf(sb, "%36d", rc);
rc = 0;
} else {
sbuf_printf(sb, " %08x %08x %08x %08x",
be32toh(ldst_cmd.u.mps.rplc.rplc127_96),
be32toh(ldst_cmd.u.mps.rplc.rplc95_64),
be32toh(ldst_cmd.u.mps.rplc.rplc63_32),
be32toh(ldst_cmd.u.mps.rplc.rplc31_0));
}
} else
sbuf_printf(sb, "%36s", "");
sbuf_printf(sb, "%4u%3u%3u%3u %#3x", G_SRAM_PRIO0(cls_lo),
G_SRAM_PRIO1(cls_lo), G_SRAM_PRIO2(cls_lo),
G_SRAM_PRIO3(cls_lo), (cls_lo >> S_MULTILISTEN0) & 0xf);
}
if (rc)
(void) sbuf_finish(sb);
else
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static int
sysctl_mps_tcam_t6(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc, i;
MPASS(chip_id(sc) > CHELSIO_T5);
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
if (sb == NULL)
return (ENOMEM);
sbuf_printf(sb, "Idx Ethernet address Mask VNI Mask"
" IVLAN Vld DIP_Hit Lookup Port Vld Ports PF VF"
" Replication"
" P0 P1 P2 P3 ML\n");
for (i = 0; i < sc->chip_params->mps_tcam_size; i++) {
uint8_t dip_hit, vlan_vld, lookup_type, port_num;
uint16_t ivlan;
uint64_t tcamx, tcamy, val, mask;
uint32_t cls_lo, cls_hi, ctl, data2, vnix, vniy;
uint8_t addr[ETHER_ADDR_LEN];
ctl = V_CTLREQID(1) | V_CTLCMDTYPE(0) | V_CTLXYBITSEL(0);
if (i < 256)
ctl |= V_CTLTCAMINDEX(i) | V_CTLTCAMSEL(0);
else
ctl |= V_CTLTCAMINDEX(i - 256) | V_CTLTCAMSEL(1);
t4_write_reg(sc, A_MPS_CLS_TCAM_DATA2_CTL, ctl);
val = t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA1_REQ_ID1);
tcamy = G_DMACH(val) << 32;
tcamy |= t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA0_REQ_ID1);
data2 = t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA2_REQ_ID1);
lookup_type = G_DATALKPTYPE(data2);
port_num = G_DATAPORTNUM(data2);
if (lookup_type && lookup_type != M_DATALKPTYPE) {
/* Inner header VNI */
vniy = ((data2 & F_DATAVIDH2) << 23) |
(G_DATAVIDH1(data2) << 16) | G_VIDL(val);
dip_hit = data2 & F_DATADIPHIT;
vlan_vld = 0;
} else {
vniy = 0;
dip_hit = 0;
vlan_vld = data2 & F_DATAVIDH2;
ivlan = G_VIDL(val);
}
ctl |= V_CTLXYBITSEL(1);
t4_write_reg(sc, A_MPS_CLS_TCAM_DATA2_CTL, ctl);
val = t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA1_REQ_ID1);
tcamx = G_DMACH(val) << 32;
tcamx |= t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA0_REQ_ID1);
data2 = t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA2_REQ_ID1);
if (lookup_type && lookup_type != M_DATALKPTYPE) {
/* Inner header VNI mask */
vnix = ((data2 & F_DATAVIDH2) << 23) |
(G_DATAVIDH1(data2) << 16) | G_VIDL(val);
} else
vnix = 0;
if (tcamx & tcamy)
continue;
tcamxy2valmask(tcamx, tcamy, addr, &mask);
cls_lo = t4_read_reg(sc, MPS_CLS_SRAM_L(i));
cls_hi = t4_read_reg(sc, MPS_CLS_SRAM_H(i));
if (lookup_type && lookup_type != M_DATALKPTYPE) {
sbuf_printf(sb, "\n%3u %02x:%02x:%02x:%02x:%02x:%02x "
"%012jx %06x %06x - - %3c"
" 'I' %4x %3c %#x%4u%4d", i, addr[0],
addr[1], addr[2], addr[3], addr[4], addr[5],
(uintmax_t)mask, vniy, vnix, dip_hit ? 'Y' : 'N',
port_num, cls_lo & F_T6_SRAM_VLD ? 'Y' : 'N',
G_PORTMAP(cls_hi), G_T6_PF(cls_lo),
cls_lo & F_T6_VF_VALID ? G_T6_VF(cls_lo) : -1);
} else {
sbuf_printf(sb, "\n%3u %02x:%02x:%02x:%02x:%02x:%02x "
"%012jx - - ", i, addr[0], addr[1],
addr[2], addr[3], addr[4], addr[5],
(uintmax_t)mask);
if (vlan_vld)
sbuf_printf(sb, "%4u Y ", ivlan);
else
sbuf_printf(sb, " - N ");
sbuf_printf(sb, "- %3c %4x %3c %#x%4u%4d",
lookup_type ? 'I' : 'O', port_num,
cls_lo & F_T6_SRAM_VLD ? 'Y' : 'N',
G_PORTMAP(cls_hi), G_T6_PF(cls_lo),
cls_lo & F_T6_VF_VALID ? G_T6_VF(cls_lo) : -1);
}
if (cls_lo & F_T6_REPLICATE) {
struct fw_ldst_cmd ldst_cmd;
memset(&ldst_cmd, 0, sizeof(ldst_cmd));
ldst_cmd.op_to_addrspace =
htobe32(V_FW_CMD_OP(FW_LDST_CMD) |
F_FW_CMD_REQUEST | F_FW_CMD_READ |
V_FW_LDST_CMD_ADDRSPACE(FW_LDST_ADDRSPC_MPS));
ldst_cmd.cycles_to_len16 = htobe32(FW_LEN16(ldst_cmd));
ldst_cmd.u.mps.rplc.fid_idx =
htobe16(V_FW_LDST_CMD_FID(FW_LDST_MPS_RPLC) |
V_FW_LDST_CMD_IDX(i));
rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK,
"t6mps");
if (rc)
break;
rc = -t4_wr_mbox(sc, sc->mbox, &ldst_cmd,
sizeof(ldst_cmd), &ldst_cmd);
end_synchronized_op(sc, 0);
if (rc != 0) {
sbuf_printf(sb, "%72d", rc);
rc = 0;
} else {
sbuf_printf(sb, " %08x %08x %08x %08x"
" %08x %08x %08x %08x",
be32toh(ldst_cmd.u.mps.rplc.rplc255_224),
be32toh(ldst_cmd.u.mps.rplc.rplc223_192),
be32toh(ldst_cmd.u.mps.rplc.rplc191_160),
be32toh(ldst_cmd.u.mps.rplc.rplc159_128),
be32toh(ldst_cmd.u.mps.rplc.rplc127_96),
be32toh(ldst_cmd.u.mps.rplc.rplc95_64),
be32toh(ldst_cmd.u.mps.rplc.rplc63_32),
be32toh(ldst_cmd.u.mps.rplc.rplc31_0));
}
} else
sbuf_printf(sb, "%72s", "");
sbuf_printf(sb, "%4u%3u%3u%3u %#x",
G_T6_SRAM_PRIO0(cls_lo), G_T6_SRAM_PRIO1(cls_lo),
G_T6_SRAM_PRIO2(cls_lo), G_T6_SRAM_PRIO3(cls_lo),
(cls_lo >> S_T6_MULTILISTEN0) & 0xf);
}
if (rc)
(void) sbuf_finish(sb);
else
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static int
sysctl_path_mtus(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc;
uint16_t mtus[NMTUS];
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
if (sb == NULL)
return (ENOMEM);
t4_read_mtu_tbl(sc, mtus, NULL);
sbuf_printf(sb, "%u %u %u %u %u %u %u %u %u %u %u %u %u %u %u %u",
mtus[0], mtus[1], mtus[2], mtus[3], mtus[4], mtus[5], mtus[6],
mtus[7], mtus[8], mtus[9], mtus[10], mtus[11], mtus[12], mtus[13],
mtus[14], mtus[15]);
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static int
sysctl_pm_stats(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc, i;
uint32_t tx_cnt[MAX_PM_NSTATS], rx_cnt[MAX_PM_NSTATS];
uint64_t tx_cyc[MAX_PM_NSTATS], rx_cyc[MAX_PM_NSTATS];
static const char *tx_stats[MAX_PM_NSTATS] = {
"Read:", "Write bypass:", "Write mem:", "Bypass + mem:",
"Tx FIFO wait", NULL, "Tx latency"
};
static const char *rx_stats[MAX_PM_NSTATS] = {
"Read:", "Write bypass:", "Write mem:", "Flush:",
" Rx FIFO wait", NULL, "Rx latency"
};
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
if (sb == NULL)
return (ENOMEM);
t4_pmtx_get_stats(sc, tx_cnt, tx_cyc);
t4_pmrx_get_stats(sc, rx_cnt, rx_cyc);
sbuf_printf(sb, " Tx pcmds Tx bytes");
for (i = 0; i < 4; i++) {
sbuf_printf(sb, "\n%-13s %10u %20ju", tx_stats[i], tx_cnt[i],
tx_cyc[i]);
}
sbuf_printf(sb, "\n Rx pcmds Rx bytes");
for (i = 0; i < 4; i++) {
sbuf_printf(sb, "\n%-13s %10u %20ju", rx_stats[i], rx_cnt[i],
rx_cyc[i]);
}
if (chip_id(sc) > CHELSIO_T5) {
sbuf_printf(sb,
"\n Total wait Total occupancy");
sbuf_printf(sb, "\n%-13s %10u %20ju", tx_stats[i], tx_cnt[i],
tx_cyc[i]);
sbuf_printf(sb, "\n%-13s %10u %20ju", rx_stats[i], rx_cnt[i],
rx_cyc[i]);
i += 2;
MPASS(i < nitems(tx_stats));
sbuf_printf(sb,
"\n Reads Total wait");
sbuf_printf(sb, "\n%-13s %10u %20ju", tx_stats[i], tx_cnt[i],
tx_cyc[i]);
sbuf_printf(sb, "\n%-13s %10u %20ju", rx_stats[i], rx_cnt[i],
rx_cyc[i]);
}
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static int
sysctl_rdma_stats(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc;
struct tp_rdma_stats stats;
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
if (sb == NULL)
return (ENOMEM);
mtx_lock(&sc->reg_lock);
t4_tp_get_rdma_stats(sc, &stats);
mtx_unlock(&sc->reg_lock);
sbuf_printf(sb, "NoRQEModDeferrals: %u\n", stats.rqe_dfr_mod);
sbuf_printf(sb, "NoRQEPktDeferrals: %u", stats.rqe_dfr_pkt);
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static int
sysctl_tcp_stats(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc;
struct tp_tcp_stats v4, v6;
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
if (sb == NULL)
return (ENOMEM);
mtx_lock(&sc->reg_lock);
t4_tp_get_tcp_stats(sc, &v4, &v6);
mtx_unlock(&sc->reg_lock);
sbuf_printf(sb,
" IP IPv6\n");
sbuf_printf(sb, "OutRsts: %20u %20u\n",
v4.tcp_out_rsts, v6.tcp_out_rsts);
sbuf_printf(sb, "InSegs: %20ju %20ju\n",
v4.tcp_in_segs, v6.tcp_in_segs);
sbuf_printf(sb, "OutSegs: %20ju %20ju\n",
v4.tcp_out_segs, v6.tcp_out_segs);
sbuf_printf(sb, "RetransSegs: %20ju %20ju",
v4.tcp_retrans_segs, v6.tcp_retrans_segs);
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static int
sysctl_tids(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc;
struct tid_info *t = &sc->tids;
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
if (sb == NULL)
return (ENOMEM);
if (t->natids) {
sbuf_printf(sb, "ATID range: 0-%u, in use: %u\n", t->natids - 1,
t->atids_in_use);
}
if (t->ntids) {
if (t4_read_reg(sc, A_LE_DB_CONFIG) & F_HASHEN) {
uint32_t b = t4_read_reg(sc, A_LE_DB_SERVER_INDEX) / 4;
if (b) {
sbuf_printf(sb, "TID range: 0-%u, %u-%u", b - 1,
t4_read_reg(sc, A_LE_DB_TID_HASHBASE) / 4,
t->ntids - 1);
} else {
sbuf_printf(sb, "TID range: %u-%u",
t4_read_reg(sc, A_LE_DB_TID_HASHBASE) / 4,
t->ntids - 1);
}
} else
sbuf_printf(sb, "TID range: 0-%u", t->ntids - 1);
sbuf_printf(sb, ", in use: %u\n",
atomic_load_acq_int(&t->tids_in_use));
}
if (t->nstids) {
sbuf_printf(sb, "STID range: %u-%u, in use: %u\n", t->stid_base,
t->stid_base + t->nstids - 1, t->stids_in_use);
}
if (t->nftids) {
sbuf_printf(sb, "FTID range: %u-%u\n", t->ftid_base,
t->ftid_base + t->nftids - 1);
}
if (t->netids) {
sbuf_printf(sb, "ETID range: %u-%u\n", t->etid_base,
t->etid_base + t->netids - 1);
}
sbuf_printf(sb, "HW TID usage: %u IP users, %u IPv6 users",
t4_read_reg(sc, A_LE_DB_ACT_CNT_IPV4),
t4_read_reg(sc, A_LE_DB_ACT_CNT_IPV6));
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static int
sysctl_tp_err_stats(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc;
struct tp_err_stats stats;
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
if (sb == NULL)
return (ENOMEM);
mtx_lock(&sc->reg_lock);
t4_tp_get_err_stats(sc, &stats);
mtx_unlock(&sc->reg_lock);
if (sc->chip_params->nchan > 2) {
sbuf_printf(sb, " channel 0 channel 1"
" channel 2 channel 3\n");
sbuf_printf(sb, "macInErrs: %10u %10u %10u %10u\n",
stats.mac_in_errs[0], stats.mac_in_errs[1],
stats.mac_in_errs[2], stats.mac_in_errs[3]);
sbuf_printf(sb, "hdrInErrs: %10u %10u %10u %10u\n",
stats.hdr_in_errs[0], stats.hdr_in_errs[1],
stats.hdr_in_errs[2], stats.hdr_in_errs[3]);
sbuf_printf(sb, "tcpInErrs: %10u %10u %10u %10u\n",
stats.tcp_in_errs[0], stats.tcp_in_errs[1],
stats.tcp_in_errs[2], stats.tcp_in_errs[3]);
sbuf_printf(sb, "tcp6InErrs: %10u %10u %10u %10u\n",
stats.tcp6_in_errs[0], stats.tcp6_in_errs[1],
stats.tcp6_in_errs[2], stats.tcp6_in_errs[3]);
sbuf_printf(sb, "tnlCongDrops: %10u %10u %10u %10u\n",
stats.tnl_cong_drops[0], stats.tnl_cong_drops[1],
stats.tnl_cong_drops[2], stats.tnl_cong_drops[3]);
sbuf_printf(sb, "tnlTxDrops: %10u %10u %10u %10u\n",
stats.tnl_tx_drops[0], stats.tnl_tx_drops[1],
stats.tnl_tx_drops[2], stats.tnl_tx_drops[3]);
sbuf_printf(sb, "ofldVlanDrops: %10u %10u %10u %10u\n",
stats.ofld_vlan_drops[0], stats.ofld_vlan_drops[1],
stats.ofld_vlan_drops[2], stats.ofld_vlan_drops[3]);
sbuf_printf(sb, "ofldChanDrops: %10u %10u %10u %10u\n\n",
stats.ofld_chan_drops[0], stats.ofld_chan_drops[1],
stats.ofld_chan_drops[2], stats.ofld_chan_drops[3]);
} else {
sbuf_printf(sb, " channel 0 channel 1\n");
sbuf_printf(sb, "macInErrs: %10u %10u\n",
stats.mac_in_errs[0], stats.mac_in_errs[1]);
sbuf_printf(sb, "hdrInErrs: %10u %10u\n",
stats.hdr_in_errs[0], stats.hdr_in_errs[1]);
sbuf_printf(sb, "tcpInErrs: %10u %10u\n",
stats.tcp_in_errs[0], stats.tcp_in_errs[1]);
sbuf_printf(sb, "tcp6InErrs: %10u %10u\n",
stats.tcp6_in_errs[0], stats.tcp6_in_errs[1]);
sbuf_printf(sb, "tnlCongDrops: %10u %10u\n",
stats.tnl_cong_drops[0], stats.tnl_cong_drops[1]);
sbuf_printf(sb, "tnlTxDrops: %10u %10u\n",
stats.tnl_tx_drops[0], stats.tnl_tx_drops[1]);
sbuf_printf(sb, "ofldVlanDrops: %10u %10u\n",
stats.ofld_vlan_drops[0], stats.ofld_vlan_drops[1]);
sbuf_printf(sb, "ofldChanDrops: %10u %10u\n\n",
stats.ofld_chan_drops[0], stats.ofld_chan_drops[1]);
}
sbuf_printf(sb, "ofldNoNeigh: %u\nofldCongDefer: %u",
stats.ofld_no_neigh, stats.ofld_cong_defer);
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static int
sysctl_tp_la_mask(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct tp_params *tpp = &sc->params.tp;
u_int mask;
int rc;
mask = tpp->la_mask >> 16;
rc = sysctl_handle_int(oidp, &mask, 0, req);
if (rc != 0 || req->newptr == NULL)
return (rc);
if (mask > 0xffff)
return (EINVAL);
tpp->la_mask = mask << 16;
t4_set_reg_field(sc, A_TP_DBG_LA_CONFIG, 0xffff0000U, tpp->la_mask);
return (0);
}
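/*
 * Illustration only, not part of the driver: a minimal user-space sketch of
 * the packing that sysctl_tp_la_mask performs, assuming the mask occupies
 * bits 31:16 of a 32-bit register value. The names demo_la_mask_get and
 * demo_la_mask_set are hypothetical.
 *
```c
#include <assert.h>
#include <stdint.h>

// Return the 16-bit mask stored in the upper half of a register value.
static uint32_t
demo_la_mask_get(uint32_t reg)
{
	return (reg >> 16);
}

// Store a new mask in the upper half and keep the lower half unchanged.
// Returns 0 on success, -1 if the mask does not fit in 16 bits.
static int
demo_la_mask_set(uint32_t *reg, uint32_t mask)
{
	if (mask > 0xffff)
		return (-1);
	*reg = (*reg & 0x0000ffffU) | (mask << 16);
	return (0);
}
```
 *
 * The range check comes before the shift so that an oversized value is
 * rejected instead of being silently truncated.
 */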
struct field_desc {
const char *name;
u_int start;
u_int width;
};
static void
field_desc_show(struct sbuf *sb, uint64_t v, const struct field_desc *f)
{
char buf[32];
int line_size = 0;
while (f->name) {
uint64_t mask = (1ULL << f->width) - 1;
int len = snprintf(buf, sizeof(buf), "%s: %ju", f->name,
((uintmax_t)v >> f->start) & mask);
if (line_size + len >= 79) {
line_size = 8;
sbuf_printf(sb, "\n ");
}
sbuf_printf(sb, "%s ", buf);
line_size += len + 1;
f++;
}
sbuf_printf(sb, "\n");
}
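/*
 * Illustration only, not part of the driver: a minimal user-space sketch of
 * the table-driven decoding that field_desc_show applies to a 64-bit word.
 * struct demo_field, demo_extract, demo_format and demo_table are
 * hypothetical names; the three table entries reuse bit positions from
 * tp_la0, and the line wrapping done by field_desc_show is left out.
 *
```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Hypothetical descriptor mirroring struct field_desc: name, start bit, width.
struct demo_field {
	const char *name;
	unsigned start;
	unsigned width;
};

// Extract one field. A width of 64 is special-cased because shifting a
// 64-bit value by 64 is undefined in C.
static uint64_t
demo_extract(uint64_t v, const struct demo_field *f)
{
	uint64_t mask = f->width >= 64 ? ~0ULL : (1ULL << f->width) - 1;

	return ((v >> f->start) & mask);
}

// Format every field of a NULL-terminated table as "name: value " into out.
// The caller must pass len > 0.
static void
demo_format(char *out, size_t len, uint64_t v, const struct demo_field *f)
{
	size_t used = 0;

	out[0] = '\0';
	for (; f->name != NULL && used < len; f++) {
		int n = snprintf(out + used, len - used, "%s: %llu ", f->name,
		    (unsigned long long)demo_extract(v, f));
		if (n < 0)
			break;
		used += (size_t)n;
	}
}

static const struct demo_field demo_table[] = {
	{ "Tid", 32, 10 },
	{ "RxCongestion", 1, 1 },
	{ "TxCongestion", 0, 1 },
	{ NULL, 0, 0 }
};
```
 *
 * Keeping the layout in a data table means a new hardware word format needs
 * only a new table, not a new formatting routine.
 */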
static const struct field_desc tp_la0[] = {
{ "RcfOpCodeOut", 60, 4 },
{ "State", 56, 4 },
{ "WcfState", 52, 4 },
{ "RcfOpcSrcOut", 50, 2 },
{ "CRxError", 49, 1 },
{ "ERxError", 48, 1 },
{ "SanityFailed", 47, 1 },
{ "SpuriousMsg", 46, 1 },
{ "FlushInputMsg", 45, 1 },
{ "FlushInputCpl", 44, 1 },
{ "RssUpBit", 43, 1 },
{ "RssFilterHit", 42, 1 },
{ "Tid", 32, 10 },
{ "InitTcb", 31, 1 },
{ "LineNumber", 24, 7 },
{ "Emsg", 23, 1 },
{ "EdataOut", 22, 1 },
{ "Cmsg", 21, 1 },
{ "CdataOut", 20, 1 },
{ "EreadPdu", 19, 1 },
{ "CreadPdu", 18, 1 },
{ "TunnelPkt", 17, 1 },
{ "RcfPeerFin", 16, 1 },
{ "RcfReasonOut", 12, 4 },
{ "TxCchannel", 10, 2 },
{ "RcfTxChannel", 8, 2 },
{ "RxEchannel", 6, 2 },
{ "RcfRxChannel", 5, 1 },
{ "RcfDataOutSrdy", 4, 1 },
{ "RxDvld", 3, 1 },
{ "RxOoDvld", 2, 1 },
{ "RxCongestion", 1, 1 },
{ "TxCongestion", 0, 1 },
{ NULL }
};
static const struct field_desc tp_la1[] = {
{ "CplCmdIn", 56, 8 },
{ "CplCmdOut", 48, 8 },
{ "ESynOut", 47, 1 },
{ "EAckOut", 46, 1 },
{ "EFinOut", 45, 1 },
{ "ERstOut", 44, 1 },
{ "SynIn", 43, 1 },
{ "AckIn", 42, 1 },
{ "FinIn", 41, 1 },
{ "RstIn", 40, 1 },
{ "DataIn", 39, 1 },
{ "DataInVld", 38, 1 },
{ "PadIn", 37, 1 },
{ "RxBufEmpty", 36, 1 },
{ "RxDdp", 35, 1 },
{ "RxFbCongestion", 34, 1 },
{ "TxFbCongestion", 33, 1 },
{ "TxPktSumSrdy", 32, 1 },
{ "RcfUlpType", 28, 4 },
{ "Eread", 27, 1 },
{ "Ebypass", 26, 1 },
{ "Esave", 25, 1 },
{ "Static0", 24, 1 },
{ "Cread", 23, 1 },
{ "Cbypass", 22, 1 },
{ "Csave", 21, 1 },
{ "CPktOut", 20, 1 },
{ "RxPagePoolFull", 18, 2 },
{ "RxLpbkPkt", 17, 1 },
{ "TxLpbkPkt", 16, 1 },
{ "RxVfValid", 15, 1 },
{ "SynLearned", 14, 1 },
{ "SetDelEntry", 13, 1 },
{ "SetInvEntry", 12, 1 },
{ "CpcmdDvld", 11, 1 },
{ "CpcmdSave", 10, 1 },
{ "RxPstructsFull", 8, 2 },
{ "EpcmdDvld", 7, 1 },
{ "EpcmdFlush", 6, 1 },
{ "EpcmdTrimPrefix", 5, 1 },
{ "EpcmdTrimPostfix", 4, 1 },
{ "ERssIp4Pkt", 3, 1 },
{ "ERssIp6Pkt", 2, 1 },
{ "ERssTcpUdpPkt", 1, 1 },
{ "ERssFceFipPkt", 0, 1 },
{ NULL }
};
static const struct field_desc tp_la2[] = {
{ "CplCmdIn", 56, 8 },
{ "MpsVfVld", 55, 1 },
{ "MpsPf", 52, 3 },
{ "MpsVf", 44, 8 },
{ "SynIn", 43, 1 },
{ "AckIn", 42, 1 },
{ "FinIn", 41, 1 },
{ "RstIn", 40, 1 },
{ "DataIn", 39, 1 },
{ "DataInVld", 38, 1 },
{ "PadIn", 37, 1 },
{ "RxBufEmpty", 36, 1 },
{ "RxDdp", 35, 1 },
{ "RxFbCongestion", 34, 1 },
{ "TxFbCongestion", 33, 1 },
{ "TxPktSumSrdy", 32, 1 },
{ "RcfUlpType", 28, 4 },
{ "Eread", 27, 1 },
{ "Ebypass", 26, 1 },
{ "Esave", 25, 1 },
{ "Static0", 24, 1 },
{ "Cread", 23, 1 },
{ "Cbypass", 22, 1 },
{ "Csave", 21, 1 },
{ "CPktOut", 20, 1 },
{ "RxPagePoolFull", 18, 2 },
{ "RxLpbkPkt", 17, 1 },
{ "TxLpbkPkt", 16, 1 },
{ "RxVfValid", 15, 1 },
{ "SynLearned", 14, 1 },
{ "SetDelEntry", 13, 1 },
{ "SetInvEntry", 12, 1 },
{ "CpcmdDvld", 11, 1 },
{ "CpcmdSave", 10, 1 },
{ "RxPstructsFull", 8, 2 },
{ "EpcmdDvld", 7, 1 },
{ "EpcmdFlush", 6, 1 },
{ "EpcmdTrimPrefix", 5, 1 },
{ "EpcmdTrimPostfix", 4, 1 },
{ "ERssIp4Pkt", 3, 1 },
{ "ERssIp6Pkt", 2, 1 },
{ "ERssTcpUdpPkt", 1, 1 },
{ "ERssFceFipPkt", 0, 1 },
{ NULL }
};
static void
tp_la_show(struct sbuf *sb, uint64_t *p, int idx)
{
field_desc_show(sb, *p, tp_la0);
}
static void
tp_la_show2(struct sbuf *sb, uint64_t *p, int idx)
{
if (idx)
sbuf_printf(sb, "\n");
field_desc_show(sb, p[0], tp_la0);
if (idx < (TPLA_SIZE / 2 - 1) || p[1] != ~0ULL)
field_desc_show(sb, p[1], tp_la0);
}
static void
tp_la_show3(struct sbuf *sb, uint64_t *p, int idx)
{
if (idx)
sbuf_printf(sb, "\n");
field_desc_show(sb, p[0], tp_la0);
if (idx < (TPLA_SIZE / 2 - 1) || p[1] != ~0ULL)
field_desc_show(sb, p[1], (p[0] & (1 << 17)) ? tp_la2 : tp_la1);
}
static int
sysctl_tp_la(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
uint64_t *buf, *p;
int rc;
u_int i, inc;
void (*show_func)(struct sbuf *, uint64_t *, int);
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
if (sb == NULL)
return (ENOMEM);
buf = malloc(TPLA_SIZE * sizeof(uint64_t), M_CXGBE, M_ZERO | M_WAITOK);
t4_tp_read_la(sc, buf, NULL);
p = buf;
switch (G_DBGLAMODE(t4_read_reg(sc, A_TP_DBG_LA_CONFIG))) {
case 2:
inc = 2;
show_func = tp_la_show2;
break;
case 3:
inc = 2;
show_func = tp_la_show3;
break;
default:
inc = 1;
show_func = tp_la_show;
}
for (i = 0; i < TPLA_SIZE / inc; i++, p += inc)
(*show_func)(sb, p, i);
rc = sbuf_finish(sb);
sbuf_delete(sb);
free(buf, M_CXGBE);
return (rc);
}
static int
sysctl_tx_rate(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc;
u64 nrate[MAX_NCHAN], orate[MAX_NCHAN];
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
if (sb == NULL)
return (ENOMEM);
t4_get_chan_txrate(sc, nrate, orate);
if (sc->chip_params->nchan > 2) {
sbuf_printf(sb, " channel 0 channel 1"
" channel 2 channel 3\n");
sbuf_printf(sb, "NIC B/s: %10ju %10ju %10ju %10ju\n",
nrate[0], nrate[1], nrate[2], nrate[3]);
sbuf_printf(sb, "Offload B/s: %10ju %10ju %10ju %10ju",
orate[0], orate[1], orate[2], orate[3]);
} else {
sbuf_printf(sb, " channel 0 channel 1\n");
sbuf_printf(sb, "NIC B/s: %10ju %10ju\n",
nrate[0], nrate[1]);
sbuf_printf(sb, "Offload B/s: %10ju %10ju",
orate[0], orate[1]);
}
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
static int
sysctl_ulprx_la(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
uint32_t *buf, *p;
int rc, i;
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
if (sb == NULL)
return (ENOMEM);
buf = malloc(ULPRX_LA_SIZE * 8 * sizeof(uint32_t), M_CXGBE,
M_ZERO | M_WAITOK);
t4_ulprx_read_la(sc, buf);
p = buf;
sbuf_printf(sb, " Pcmd Type Message"
" Data");
for (i = 0; i < ULPRX_LA_SIZE; i++, p += 8) {
sbuf_printf(sb, "\n%08x%08x %4x %08x %08x%08x%08x%08x",
p[1], p[0], p[2], p[3], p[7], p[6], p[5], p[4]);
}
rc = sbuf_finish(sb);
sbuf_delete(sb);
free(buf, M_CXGBE);
return (rc);
}
static int
sysctl_wcwr_stats(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
struct sbuf *sb;
int rc, v;
rc = sysctl_wire_old_buffer(req, 0);
if (rc != 0)
return (rc);
sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
if (sb == NULL)
return (ENOMEM);
v = t4_read_reg(sc, A_SGE_STAT_CFG);
if (G_STATSOURCE_T5(v) == 7) {
if (G_STATMODE(v) == 0) {
sbuf_printf(sb, "total %d, incomplete %d",
t4_read_reg(sc, A_SGE_STAT_TOTAL),
t4_read_reg(sc, A_SGE_STAT_MATCH));
} else if (G_STATMODE(v) == 1) {
sbuf_printf(sb, "total %d, data overflow %d",
t4_read_reg(sc, A_SGE_STAT_TOTAL),
t4_read_reg(sc, A_SGE_STAT_MATCH));
}
}
rc = sbuf_finish(sb);
sbuf_delete(sb);
return (rc);
}
+
+static int
+sysctl_tc_params(SYSCTL_HANDLER_ARGS)
+{
+ struct adapter *sc = arg1;
+ struct tx_sched_class *tc;
+ struct t4_sched_class_params p;
+ struct sbuf *sb;
+ int i, rc, port_id, flags, mbps, gbps;
+
+ rc = sysctl_wire_old_buffer(req, 0);
+ if (rc != 0)
+ return (rc);
+
+ sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
+ if (sb == NULL)
+ return (ENOMEM);
+
+ port_id = arg2 >> 16;
+ MPASS(port_id < sc->params.nports);
+ MPASS(sc->port[port_id] != NULL);
+ i = arg2 & 0xffff;
+ MPASS(i < sc->chip_params->nsched_cls);
+ tc = &sc->port[port_id]->tc[i];
+
+ rc = begin_synchronized_op(sc, NULL, HOLD_LOCK | SLEEP_OK | INTR_OK,
+ "t4tc_p");
+ if (rc)
+ goto done;
+ flags = tc->flags;
+ p = tc->params;
+ end_synchronized_op(sc, LOCK_HELD);
+
+ if ((flags & TX_SC_OK) == 0) {
+ sbuf_printf(sb, "none");
+ goto done;
+ }
+
+ if (p.level == SCHED_CLASS_LEVEL_CL_WRR) {
+ sbuf_printf(sb, "cl-wrr weight %u", p.weight);
+ goto done;
+ } else if (p.level == SCHED_CLASS_LEVEL_CL_RL)
+ sbuf_printf(sb, "cl-rl");
+ else if (p.level == SCHED_CLASS_LEVEL_CH_RL)
+ sbuf_printf(sb, "ch-rl");
+ else {
+ rc = ENXIO;
+ goto done;
+ }
+
+ if (p.ratemode == SCHED_CLASS_RATEMODE_REL) {
+ /* XXX: top speed or actual link speed? */
+ gbps = port_top_speed(sc->port[port_id]);
+ sbuf_printf(sb, " %u%% of %uGbps", p.maxrate, gbps);
+ } else if (p.ratemode == SCHED_CLASS_RATEMODE_ABS) {
+ switch (p.rateunit) {
+ case SCHED_CLASS_RATEUNIT_BITS:
+ mbps = p.maxrate / 1000;
+ gbps = p.maxrate / 1000000;
+ if (p.maxrate == gbps * 1000000)
+ sbuf_printf(sb, " %uGbps", gbps);
+ else if (p.maxrate == mbps * 1000)
+ sbuf_printf(sb, " %uMbps", mbps);
+ else
+ sbuf_printf(sb, " %uKbps", p.maxrate);
+ break;
+ case SCHED_CLASS_RATEUNIT_PKTS:
+ sbuf_printf(sb, " %upps", p.maxrate);
+ break;
+ default:
+ rc = ENXIO;
+ goto done;
+ }
+ }
+
+ switch (p.mode) {
+ case SCHED_CLASS_MODE_CLASS:
+ sbuf_printf(sb, " aggregate");
+ break;
+ case SCHED_CLASS_MODE_FLOW:
+ sbuf_printf(sb, " per-flow");
+ break;
+ default:
+ rc = ENXIO;
+ goto done;
+ }
+
+done:
+ if (rc == 0)
+ rc = sbuf_finish(sb);
+ sbuf_delete(sb);
+
+ return (rc);
+}
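/*
 * Illustration only, not part of the driver: a minimal user-space sketch of
 * the unit selection sysctl_tc_params uses for absolute bit rates, assuming
 * the rate is expressed in Kbps as the "%uKbps" fallback suggests. The name
 * demo_format_rate is hypothetical.
 *
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Format a rate given in Kbps using the largest unit that divides it exactly.
static void
demo_format_rate(char *buf, size_t len, unsigned kbps)
{
	unsigned mbps = kbps / 1000;
	unsigned gbps = kbps / 1000000;

	if (kbps == gbps * 1000000)
		snprintf(buf, len, "%uGbps", gbps);
	else if (kbps == mbps * 1000)
		snprintf(buf, len, "%uMbps", mbps);
	else
		snprintf(buf, len, "%uKbps", kbps);
}
```
 *
 * Multiplying the truncated quotient back and comparing it with the input
 * is what detects an exact multiple, so 2500000 Kbps is reported as
 * 2500Mbps and is not rounded to 2Gbps.
 */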
#endif
#ifdef TCP_OFFLOAD
static void
unit_conv(char *buf, size_t len, u_int val, u_int factor)
{
u_int rem = val % factor;
if (rem == 0)
snprintf(buf, len, "%u", val / factor);
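else {
/* Zero-pad so 1024000 / 1000000 prints as 1.024, not 1.24. */
/* Assumes factor is a power of 10. */
int width = 0;
u_int f;
for (f = factor; f > 1; f /= 10)
width++;
while (rem % 10 == 0) {
rem /= 10;
width--;
}
snprintf(buf, len, "%u.%0*u", val / factor, width, rem);
}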
}
static int
sysctl_tp_tick(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
char buf[16];
u_int res, re;
u_int cclk_ps = 1000000000 / sc->params.vpd.cclk;
res = t4_read_reg(sc, A_TP_TIMER_RESOLUTION);
switch (arg2) {
case 0:
/* timer_tick */
re = G_TIMERRESOLUTION(res);
break;
case 1:
/* TCP timestamp tick */
re = G_TIMESTAMPRESOLUTION(res);
break;
case 2:
/* DACK tick */
re = G_DELAYEDACKRESOLUTION(res);
break;
default:
return (EDOOFUS);
}
unit_conv(buf, sizeof(buf), (cclk_ps << re), 1000000);
return (sysctl_handle_string(oidp, buf, sizeof(buf), req));
}
static int
sysctl_tp_dack_timer(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
u_int res, dack_re, v;
u_int cclk_ps = 1000000000 / sc->params.vpd.cclk;
res = t4_read_reg(sc, A_TP_TIMER_RESOLUTION);
dack_re = G_DELAYEDACKRESOLUTION(res);
v = ((cclk_ps << dack_re) / 1000000) * t4_read_reg(sc, A_TP_DACK_TIMER);
return (sysctl_handle_int(oidp, &v, 0, req));
}
static int
sysctl_tp_timer(SYSCTL_HANDLER_ARGS)
{
struct adapter *sc = arg1;
int reg = arg2;
u_int tre;
u_long tp_tick_us, v;
u_int cclk_ps = 1000000000 / sc->params.vpd.cclk;
MPASS(reg == A_TP_RXT_MIN || reg == A_TP_RXT_MAX ||
reg == A_TP_PERS_MIN || reg == A_TP_PERS_MAX ||
reg == A_TP_KEEP_IDLE || reg == A_TP_KEEP_INTVL ||
reg == A_TP_INIT_SRTT || reg == A_TP_FINWAIT2_TIMER);
tre = G_TIMERRESOLUTION(t4_read_reg(sc, A_TP_TIMER_RESOLUTION));
tp_tick_us = (cclk_ps << tre) / 1000000;
if (reg == A_TP_INIT_SRTT)
v = tp_tick_us * G_INITSRTT(t4_read_reg(sc, reg));
else
v = tp_tick_us * t4_read_reg(sc, reg);
return (sysctl_handle_long(oidp, &v, 0, req));
}
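/*
 * Illustration only, not part of the driver: a minimal user-space sketch of
 * the arithmetic shared by sysctl_tp_tick, sysctl_tp_dack_timer and
 * sysctl_tp_timer. It assumes the core clock is given in kHz, so that
 * 1000000000 / cclk is the clock period in picoseconds. The demo_* names
 * are hypothetical.
 *
```c
#include <assert.h>

// Core clock period in picoseconds, assuming cclk_khz is the clock in kHz.
static unsigned
demo_cclk_ps(unsigned cclk_khz)
{
	return (1000000000u / cclk_khz);
}

// One timer tick in whole microseconds for a resolution exponent re:
// the tick is the clock period multiplied by 2^re, then scaled from ps to us.
static unsigned long
demo_tick_us(unsigned cclk_khz, unsigned re)
{
	return ((demo_cclk_ps(cclk_khz) << re) / 1000000u);
}

// Convert a raw timer register value to microseconds.
static unsigned long
demo_timer_us(unsigned cclk_khz, unsigned re, unsigned raw)
{
	return (demo_tick_us(cclk_khz, re) * raw);
}
```
 *
 * For a 250 MHz clock the period is 4000 ps; with re = 10 a tick is
 * 4096000 ps, which the integer division reduces to 4 us. The fractional
 * part is dropped before the multiplication by the register value, which
 * is the same order of operations as in the handlers above.
 */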
#endif
static uint32_t
fconf_iconf_to_mode(uint32_t fconf, uint32_t iconf)
{
uint32_t mode;
mode = T4_FILTER_IPv4 | T4_FILTER_IPv6 | T4_FILTER_IP_SADDR |
T4_FILTER_IP_DADDR | T4_FILTER_IP_SPORT | T4_FILTER_IP_DPORT;
if (fconf & F_FRAGMENTATION)
mode |= T4_FILTER_IP_FRAGMENT;
if (fconf & F_MPSHITTYPE)
mode |= T4_FILTER_MPS_HIT_TYPE;
if (fconf & F_MACMATCH)
mode |= T4_FILTER_MAC_IDX;
if (fconf & F_ETHERTYPE)
mode |= T4_FILTER_ETH_TYPE;
if (fconf & F_PROTOCOL)
mode |= T4_FILTER_IP_PROTO;
if (fconf & F_TOS)
mode |= T4_FILTER_IP_TOS;
if (fconf & F_VLAN)
mode |= T4_FILTER_VLAN;
if (fconf & F_VNIC_ID) {
mode |= T4_FILTER_VNIC;
if (iconf & F_VNIC)
mode |= T4_FILTER_IC_VNIC;
}
if (fconf & F_PORT)
mode |= T4_FILTER_PORT;
if (fconf & F_FCOE)
mode |= T4_FILTER_FCoE;
return (mode);
}
static uint32_t
mode_to_fconf(uint32_t mode)
{
uint32_t fconf = 0;
if (mode & T4_FILTER_IP_FRAGMENT)
fconf |= F_FRAGMENTATION;
if (mode & T4_FILTER_MPS_HIT_TYPE)
fconf |= F_MPSHITTYPE;
if (mode & T4_FILTER_MAC_IDX)
fconf |= F_MACMATCH;
if (mode & T4_FILTER_ETH_TYPE)
fconf |= F_ETHERTYPE;
if (mode & T4_FILTER_IP_PROTO)
fconf |= F_PROTOCOL;
if (mode & T4_FILTER_IP_TOS)
fconf |= F_TOS;
if (mode & T4_FILTER_VLAN)
fconf |= F_VLAN;
if (mode & T4_FILTER_VNIC)
fconf |= F_VNIC_ID;
if (mode & T4_FILTER_PORT)
fconf |= F_PORT;
if (mode & T4_FILTER_FCoE)
fconf |= F_FCOE;
return (fconf);
}
static uint32_t
mode_to_iconf(uint32_t mode)
{
if (mode & T4_FILTER_IC_VNIC)
return (F_VNIC);
return (0);
}
static int
check_fspec_against_fconf_iconf(struct adapter *sc,
struct t4_filter_specification *fs)
{
struct tp_params *tpp = &sc->params.tp;
uint32_t fconf = 0;
if (fs->val.frag || fs->mask.frag)
fconf |= F_FRAGMENTATION;
if (fs->val.matchtype || fs->mask.matchtype)
fconf |= F_MPSHITTYPE;
if (fs->val.macidx || fs->mask.macidx)
fconf |= F_MACMATCH;
if (fs->val.ethtype || fs->mask.ethtype)
fconf |= F_ETHERTYPE;
if (fs->val.proto || fs->mask.proto)
fconf |= F_PROTOCOL;
if (fs->val.tos || fs->mask.tos)
fconf |= F_TOS;
if (fs->val.vlan_vld || fs->mask.vlan_vld)
fconf |= F_VLAN;
if (fs->val.ovlan_vld || fs->mask.ovlan_vld) {
fconf |= F_VNIC_ID;
if (tpp->ingress_config & F_VNIC)
return (EINVAL);
}
if (fs->val.pfvf_vld || fs->mask.pfvf_vld) {
fconf |= F_VNIC_ID;
if ((tpp->ingress_config & F_VNIC) == 0)
return (EINVAL);
}
if (fs->val.iport || fs->mask.iport)
fconf |= F_PORT;
if (fs->val.fcoe || fs->mask.fcoe)
fconf |= F_FCOE;
if ((tpp->vlan_pri_map | fconf) != tpp->vlan_pri_map)
return (E2BIG);
return (0);
}
static int
get_filter_mode(struct adapter *sc, uint32_t *mode)
{
struct tp_params *tpp = &sc->params.tp;
/*
* We trust the cached values of the relevant TP registers. This means
* things work reliably only if writes to those registers are always via
* t4_set_filter_mode.
*/
*mode = fconf_iconf_to_mode(tpp->vlan_pri_map, tpp->ingress_config);
return (0);
}
static int
set_filter_mode(struct adapter *sc, uint32_t mode)
{
struct tp_params *tpp = &sc->params.tp;
uint32_t fconf, iconf;
int rc;
iconf = mode_to_iconf(mode);
if ((iconf ^ tpp->ingress_config) & F_VNIC) {
/*
* For now we just complain if A_TP_INGRESS_CONFIG is not
* already set to the correct value for the requested filter
* mode. It's not clear if it's safe to write to this register
* on the fly. (And we trust the cached value of the register).
*/
return (EBUSY);
}
fconf = mode_to_fconf(mode);
rc = begin_synchronized_op(sc, NULL, HOLD_LOCK | SLEEP_OK | INTR_OK,
"t4setfm");
if (rc)
return (rc);
if (sc->tids.ftids_in_use > 0) {
rc = EBUSY;
goto done;
}
#ifdef TCP_OFFLOAD
if (uld_active(sc, ULD_TOM)) {
rc = EBUSY;
goto done;
}
#endif
rc = -t4_set_filter_mode(sc, fconf);
done:
end_synchronized_op(sc, LOCK_HELD);
return (rc);
}
static inline uint64_t
get_filter_hits(struct adapter *sc, uint32_t fid)
{
uint32_t tcb_addr;
tcb_addr = t4_read_reg(sc, A_TP_CMM_TCB_BASE) +
(fid + sc->tids.ftid_base) * TCB_SIZE;
if (is_t4(sc)) {
uint64_t hits;
read_via_memwin(sc, 0, tcb_addr + 16, (uint32_t *)&hits, 8);
return (be64toh(hits));
} else {
uint32_t hits;
read_via_memwin(sc, 0, tcb_addr + 24, &hits, 4);
return (be32toh(hits));
}
}
static int
get_filter(struct adapter *sc, struct t4_filter *t)
{
int i, rc, nfilters = sc->tids.nftids;
struct filter_entry *f;
rc = begin_synchronized_op(sc, NULL, HOLD_LOCK | SLEEP_OK | INTR_OK,
"t4getf");
if (rc)
return (rc);
if (sc->tids.ftids_in_use == 0 || sc->tids.ftid_tab == NULL ||
t->idx >= nfilters) {
t->idx = 0xffffffff;
goto done;
}
f = &sc->tids.ftid_tab[t->idx];
for (i = t->idx; i < nfilters; i++, f++) {
if (f->valid) {
t->idx = i;
t->l2tidx = f->l2t ? f->l2t->idx : 0;
t->smtidx = f->smtidx;
if (f->fs.hitcnts)
t->hits = get_filter_hits(sc, t->idx);
else
t->hits = UINT64_MAX;
t->fs = f->fs;
goto done;
}
}
t->idx = 0xffffffff;
done:
end_synchronized_op(sc, LOCK_HELD);
return (0);
}
static int
set_filter(struct adapter *sc, struct t4_filter *t)
{
unsigned int nfilters, nports;
struct filter_entry *f;
int i, rc;
rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4setf");
if (rc)
return (rc);
nfilters = sc->tids.nftids;
nports = sc->params.nports;
if (nfilters == 0) {
rc = ENOTSUP;
goto done;
}
if (!(sc->flags & FULL_INIT_DONE)) {
rc = EAGAIN;
goto done;
}
if (t->idx >= nfilters) {
rc = EINVAL;
goto done;
}
/* Validate against the global filter mode and ingress config */
rc = check_fspec_against_fconf_iconf(sc, &t->fs);
if (rc != 0)
goto done;
if (t->fs.action == FILTER_SWITCH && t->fs.eport >= nports) {
rc = EINVAL;
goto done;
}
if (t->fs.val.iport >= nports) {
rc = EINVAL;
goto done;
}
/* Can't specify an iq if not steering to it */
if (!t->fs.dirsteer && t->fs.iq) {
rc = EINVAL;
goto done;
}
/* IPv6 filter idx must be 4 aligned */
if (t->fs.type == 1 &&
((t->idx & 0x3) || t->idx + 4 >= nfilters)) {
rc = EINVAL;
goto done;
}
if (sc->tids.ftid_tab == NULL) {
KASSERT(sc->tids.ftids_in_use == 0,
("%s: no memory allocated but ftids_in_use > 0",
__func__));
sc->tids.ftid_tab = malloc(sizeof (struct filter_entry) *
nfilters, M_CXGBE, M_NOWAIT | M_ZERO);
if (sc->tids.ftid_tab == NULL) {
rc = ENOMEM;
goto done;
}
mtx_init(&sc->tids.ftid_lock, "T4 filters", 0, MTX_DEF);
}
for (i = 0; i < 4; i++) {
f = &sc->tids.ftid_tab[t->idx + i];
if (f->pending || f->valid) {
rc = EBUSY;
goto done;
}
if (f->locked) {
rc = EPERM;
goto done;
}
if (t->fs.type == 0)
break;
}
f = &sc->tids.ftid_tab[t->idx];
f->fs = t->fs;
rc = set_filter_wr(sc, t->idx);
done:
end_synchronized_op(sc, 0);
if (rc == 0) {
mtx_lock(&sc->tids.ftid_lock);
for (;;) {
if (f->pending == 0) {
rc = f->valid ? 0 : EIO;
break;
}
if (mtx_sleep(&sc->tids.ftid_tab, &sc->tids.ftid_lock,
PCATCH, "t4setfw", 0)) {
rc = EINPROGRESS;
break;
}
}
mtx_unlock(&sc->tids.ftid_lock);
}
return (rc);
}
static int
del_filter(struct adapter *sc, struct t4_filter *t)
{
unsigned int nfilters;
struct filter_entry *f;
int rc;
rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4delf");
if (rc)
return (rc);
nfilters = sc->tids.nftids;
if (nfilters == 0) {
rc = ENOTSUP;
goto done;
}
if (sc->tids.ftid_tab == NULL || sc->tids.ftids_in_use == 0 ||
t->idx >= nfilters) {
rc = EINVAL;
goto done;
}
if (!(sc->flags & FULL_INIT_DONE)) {
rc = EAGAIN;
goto done;
}
f = &sc->tids.ftid_tab[t->idx];
if (f->pending) {
rc = EBUSY;
goto done;
}
if (f->locked) {
rc = EPERM;
goto done;
}
if (f->valid) {
t->fs = f->fs; /* extra info for the caller */
rc = del_filter_wr(sc, t->idx);
}
done:
end_synchronized_op(sc, 0);
if (rc == 0) {
mtx_lock(&sc->tids.ftid_lock);
for (;;) {
if (f->pending == 0) {
rc = f->valid ? EIO : 0;
break;
}
if (mtx_sleep(&sc->tids.ftid_tab, &sc->tids.ftid_lock,
PCATCH, "t4delfw", 0)) {
rc = EINPROGRESS;
break;
}
}
mtx_unlock(&sc->tids.ftid_lock);
}
return (rc);
}
static void
clear_filter(struct filter_entry *f)
{
if (f->l2t)
t4_l2t_release(f->l2t);
bzero(f, sizeof (*f));
}
static int
set_filter_wr(struct adapter *sc, int fidx)
{
struct filter_entry *f = &sc->tids.ftid_tab[fidx];
struct fw_filter_wr *fwr;
unsigned int ftid, vnic_vld, vnic_vld_mask;
struct wrq_cookie cookie;
ASSERT_SYNCHRONIZED_OP(sc);
if (f->fs.newdmac || f->fs.newvlan) {
/* This filter needs an L2T entry; allocate one. */
f->l2t = t4_l2t_alloc_switching(sc->l2t);
if (f->l2t == NULL)
return (EAGAIN);
if (t4_l2t_set_switching(sc, f->l2t, f->fs.vlan, f->fs.eport,
f->fs.dmac)) {
t4_l2t_release(f->l2t);
f->l2t = NULL;
return (ENOMEM);
}
}
/* Already validated against fconf, iconf */
MPASS((f->fs.val.pfvf_vld & f->fs.val.ovlan_vld) == 0);
MPASS((f->fs.mask.pfvf_vld & f->fs.mask.ovlan_vld) == 0);
if (f->fs.val.pfvf_vld || f->fs.val.ovlan_vld)
vnic_vld = 1;
else
vnic_vld = 0;
if (f->fs.mask.pfvf_vld || f->fs.mask.ovlan_vld)
vnic_vld_mask = 1;
else
vnic_vld_mask = 0;
ftid = sc->tids.ftid_base + fidx;
fwr = start_wrq_wr(&sc->sge.mgmtq, howmany(sizeof(*fwr), 16), &cookie);
if (fwr == NULL)
return (ENOMEM);
bzero(fwr, sizeof(*fwr));
fwr->op_pkd = htobe32(V_FW_WR_OP(FW_FILTER_WR));
fwr->len16_pkd = htobe32(FW_LEN16(*fwr));
fwr->tid_to_iq =
htobe32(V_FW_FILTER_WR_TID(ftid) |
V_FW_FILTER_WR_RQTYPE(f->fs.type) |
V_FW_FILTER_WR_NOREPLY(0) |
V_FW_FILTER_WR_IQ(f->fs.iq));
fwr->del_filter_to_l2tix =
htobe32(V_FW_FILTER_WR_RPTTID(f->fs.rpttid) |
V_FW_FILTER_WR_DROP(f->fs.action == FILTER_DROP) |
V_FW_FILTER_WR_DIRSTEER(f->fs.dirsteer) |
V_FW_FILTER_WR_MASKHASH(f->fs.maskhash) |
V_FW_FILTER_WR_DIRSTEERHASH(f->fs.dirsteerhash) |
V_FW_FILTER_WR_LPBK(f->fs.action == FILTER_SWITCH) |
V_FW_FILTER_WR_DMAC(f->fs.newdmac) |
V_FW_FILTER_WR_SMAC(f->fs.newsmac) |
V_FW_FILTER_WR_INSVLAN(f->fs.newvlan == VLAN_INSERT ||
f->fs.newvlan == VLAN_REWRITE) |
V_FW_FILTER_WR_RMVLAN(f->fs.newvlan == VLAN_REMOVE ||
f->fs.newvlan == VLAN_REWRITE) |
V_FW_FILTER_WR_HITCNTS(f->fs.hitcnts) |
V_FW_FILTER_WR_TXCHAN(f->fs.eport) |
V_FW_FILTER_WR_PRIO(f->fs.prio) |
V_FW_FILTER_WR_L2TIX(f->l2t ? f->l2t->idx : 0));
fwr->ethtype = htobe16(f->fs.val.ethtype);
fwr->ethtypem = htobe16(f->fs.mask.ethtype);
fwr->frag_to_ovlan_vldm =
(V_FW_FILTER_WR_FRAG(f->fs.val.frag) |
V_FW_FILTER_WR_FRAGM(f->fs.mask.frag) |
V_FW_FILTER_WR_IVLAN_VLD(f->fs.val.vlan_vld) |
V_FW_FILTER_WR_OVLAN_VLD(vnic_vld) |
V_FW_FILTER_WR_IVLAN_VLDM(f->fs.mask.vlan_vld) |
V_FW_FILTER_WR_OVLAN_VLDM(vnic_vld_mask));
fwr->smac_sel = 0;
fwr->rx_chan_rx_rpl_iq = htobe16(V_FW_FILTER_WR_RX_CHAN(0) |
V_FW_FILTER_WR_RX_RPL_IQ(sc->sge.fwq.abs_id));
fwr->maci_to_matchtypem =
htobe32(V_FW_FILTER_WR_MACI(f->fs.val.macidx) |
V_FW_FILTER_WR_MACIM(f->fs.mask.macidx) |
V_FW_FILTER_WR_FCOE(f->fs.val.fcoe) |
V_FW_FILTER_WR_FCOEM(f->fs.mask.fcoe) |
V_FW_FILTER_WR_PORT(f->fs.val.iport) |
V_FW_FILTER_WR_PORTM(f->fs.mask.iport) |
V_FW_FILTER_WR_MATCHTYPE(f->fs.val.matchtype) |
V_FW_FILTER_WR_MATCHTYPEM(f->fs.mask.matchtype));
fwr->ptcl = f->fs.val.proto;
fwr->ptclm = f->fs.mask.proto;
fwr->ttyp = f->fs.val.tos;
fwr->ttypm = f->fs.mask.tos;
fwr->ivlan = htobe16(f->fs.val.vlan);
fwr->ivlanm = htobe16(f->fs.mask.vlan);
fwr->ovlan = htobe16(f->fs.val.vnic);
fwr->ovlanm = htobe16(f->fs.mask.vnic);
bcopy(f->fs.val.dip, fwr->lip, sizeof (fwr->lip));
bcopy(f->fs.mask.dip, fwr->lipm, sizeof (fwr->lipm));
bcopy(f->fs.val.sip, fwr->fip, sizeof (fwr->fip));
bcopy(f->fs.mask.sip, fwr->fipm, sizeof (fwr->fipm));
fwr->lp = htobe16(f->fs.val.dport);
fwr->lpm = htobe16(f->fs.mask.dport);
fwr->fp = htobe16(f->fs.val.sport);
fwr->fpm = htobe16(f->fs.mask.sport);
if (f->fs.newsmac)
bcopy(f->fs.smac, fwr->sma, sizeof (fwr->sma));
f->pending = 1;
sc->tids.ftids_in_use++;
commit_wrq_wr(&sc->sge.mgmtq, fwr, &cookie);
return (0);
}
static int
del_filter_wr(struct adapter *sc, int fidx)
{
struct filter_entry *f = &sc->tids.ftid_tab[fidx];
struct fw_filter_wr *fwr;
unsigned int ftid;
struct wrq_cookie cookie;
ftid = sc->tids.ftid_base + fidx;
fwr = start_wrq_wr(&sc->sge.mgmtq, howmany(sizeof(*fwr), 16), &cookie);
if (fwr == NULL)
return (ENOMEM);
bzero(fwr, sizeof (*fwr));
t4_mk_filtdelwr(ftid, fwr, sc->sge.fwq.abs_id);
f->pending = 1;
commit_wrq_wr(&sc->sge.mgmtq, fwr, &cookie);
return (0);
}
int
t4_filter_rpl(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m)
{
struct adapter *sc = iq->adapter;
const struct cpl_set_tcb_rpl *rpl = (const void *)(rss + 1);
unsigned int idx = GET_TID(rpl);
unsigned int rc;
struct filter_entry *f;
KASSERT(m == NULL, ("%s: payload with opcode %02x", __func__,
rss->opcode));
if (is_ftid(sc, idx)) {
idx -= sc->tids.ftid_base;
f = &sc->tids.ftid_tab[idx];
rc = G_COOKIE(rpl->cookie);
mtx_lock(&sc->tids.ftid_lock);
if (rc == FW_FILTER_WR_FLT_ADDED) {
KASSERT(f->pending, ("%s: filter[%u] isn't pending.",
__func__, idx));
f->smtidx = (be64toh(rpl->oldval) >> 24) & 0xff;
f->pending = 0; /* asynchronous setup completed */
f->valid = 1;
} else {
if (rc != FW_FILTER_WR_FLT_DELETED) {
/* Add or delete failed, display an error */
log(LOG_ERR,
"filter %u setup failed with error %u\n",
idx, rc);
}
clear_filter(f);
sc->tids.ftids_in_use--;
}
wakeup(&sc->tids.ftid_tab);
mtx_unlock(&sc->tids.ftid_lock);
}
return (0);
}
static int
get_sge_context(struct adapter *sc, struct t4_sge_context *cntxt)
{
int rc;
if (cntxt->cid > M_CTXTQID)
return (EINVAL);
if (cntxt->mem_id != CTXT_EGRESS && cntxt->mem_id != CTXT_INGRESS &&
cntxt->mem_id != CTXT_FLM && cntxt->mem_id != CTXT_CNM)
return (EINVAL);
rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4ctxt");
if (rc)
return (rc);
if (sc->flags & FW_OK) {
rc = -t4_sge_ctxt_rd(sc, sc->mbox, cntxt->cid, cntxt->mem_id,
&cntxt->data[0]);
if (rc == 0)
goto done;
}
/*
* Read via firmware failed or wasn't even attempted. Read directly via
* the backdoor.
*/
rc = -t4_sge_ctxt_rd_bd(sc, cntxt->cid, cntxt->mem_id, &cntxt->data[0]);
done:
end_synchronized_op(sc, 0);
return (rc);
}
static int
load_fw(struct adapter *sc, struct t4_data *fw)
{
int rc;
uint8_t *fw_data;
rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4ldfw");
if (rc)
return (rc);
if (sc->flags & FULL_INIT_DONE) {
rc = EBUSY;
goto done;
}
fw_data = malloc(fw->len, M_CXGBE, M_WAITOK);
if (fw_data == NULL) {
rc = ENOMEM;
goto done;
}
rc = copyin(fw->data, fw_data, fw->len);
if (rc == 0)
rc = -t4_load_fw(sc, fw_data, fw->len);
free(fw_data, M_CXGBE);
done:
end_synchronized_op(sc, 0);
return (rc);
}
#define MAX_READ_BUF_SIZE (128 * 1024)
static int
read_card_mem(struct adapter *sc, int win, struct t4_mem_range *mr)
{
uint32_t addr, remaining, n;
uint32_t *buf;
int rc;
uint8_t *dst;
rc = validate_mem_range(sc, mr->addr, mr->len);
if (rc != 0)
return (rc);
buf = malloc(min(mr->len, MAX_READ_BUF_SIZE), M_CXGBE, M_WAITOK);
addr = mr->addr;
remaining = mr->len;
dst = (void *)mr->data;
while (remaining) {
n = min(remaining, MAX_READ_BUF_SIZE);
read_via_memwin(sc, 2, addr, buf, n);
rc = copyout(buf, dst, n);
if (rc != 0)
break;
dst += n;
remaining -= n;
addr += n;
}
free(buf, M_CXGBE);
return (rc);
}
#undef MAX_READ_BUF_SIZE
static int
read_i2c(struct adapter *sc, struct t4_i2c_data *i2cd)
{
int rc;
if (i2cd->len == 0 || i2cd->port_id >= sc->params.nports)
return (EINVAL);
if (i2cd->len > sizeof(i2cd->data))
return (EFBIG);
rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4i2crd");
if (rc)
return (rc);
rc = -t4_i2c_rd(sc, sc->mbox, i2cd->port_id, i2cd->dev_addr,
i2cd->offset, i2cd->len, &i2cd->data[0]);
end_synchronized_op(sc, 0);
return (rc);
}
static int
in_range(int val, int lo, int hi)
{
return (val < 0 || (val <= hi && val >= lo));
}
static int
-set_sched_class(struct adapter *sc, struct t4_sched_params *p)
+set_sched_class_config(struct adapter *sc, int minmax)
{
- int fw_subcmd, fw_type, rc;
+ int rc;
- rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4setsc");
+ if (minmax < 0)
+ return (EINVAL);
+
+ rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4sscc");
if (rc)
return (rc);
+ rc = -t4_sched_config(sc, FW_SCHED_TYPE_PKTSCHED, minmax, 1);
+ end_synchronized_op(sc, 0);
- if (!(sc->flags & FULL_INIT_DONE)) {
- rc = EAGAIN;
- goto done;
- }
+ return (rc);
+}
- /*
- * Translate the cxgbetool parameters into T4 firmware parameters. (The
- * sub-command and type are in common locations.)
- */
- if (p->subcmd == SCHED_CLASS_SUBCMD_CONFIG)
- fw_subcmd = FW_SCHED_SC_CONFIG;
- else if (p->subcmd == SCHED_CLASS_SUBCMD_PARAMS)
- fw_subcmd = FW_SCHED_SC_PARAMS;
- else {
- rc = EINVAL;
- goto done;
- }
- if (p->type == SCHED_CLASS_TYPE_PACKET)
- fw_type = FW_SCHED_TYPE_PKTSCHED;
- else {
- rc = EINVAL;
- goto done;
- }
+static int
+set_sched_class_params(struct adapter *sc, struct t4_sched_class_params *p,
+ int sleep_ok)
+{
+ int rc, top_speed, fw_level, fw_mode, fw_rateunit, fw_ratemode;
+ struct port_info *pi;
+ struct tx_sched_class *tc;
- if (fw_subcmd == FW_SCHED_SC_CONFIG) {
- /* Vet our parameters ..*/
- if (p->u.config.minmax < 0) {
- rc = EINVAL;
- goto done;
- }
+ if (p->level == SCHED_CLASS_LEVEL_CL_RL)
+ fw_level = FW_SCHED_PARAMS_LEVEL_CL_RL;
+ else if (p->level == SCHED_CLASS_LEVEL_CL_WRR)
+ fw_level = FW_SCHED_PARAMS_LEVEL_CL_WRR;
+ else if (p->level == SCHED_CLASS_LEVEL_CH_RL)
+ fw_level = FW_SCHED_PARAMS_LEVEL_CH_RL;
+ else
+ return (EINVAL);
- /* And pass the request to the firmware ...*/
- rc = -t4_sched_config(sc, fw_type, p->u.config.minmax, 1);
- goto done;
- }
+ if (p->mode == SCHED_CLASS_MODE_CLASS)
+ fw_mode = FW_SCHED_PARAMS_MODE_CLASS;
+ else if (p->mode == SCHED_CLASS_MODE_FLOW)
+ fw_mode = FW_SCHED_PARAMS_MODE_FLOW;
+ else
+ return (EINVAL);
- if (fw_subcmd == FW_SCHED_SC_PARAMS) {
- int fw_level;
- int fw_mode;
- int fw_rateunit;
- int fw_ratemode;
+ if (p->rateunit == SCHED_CLASS_RATEUNIT_BITS)
+ fw_rateunit = FW_SCHED_PARAMS_UNIT_BITRATE;
+ else if (p->rateunit == SCHED_CLASS_RATEUNIT_PKTS)
+ fw_rateunit = FW_SCHED_PARAMS_UNIT_PKTRATE;
+ else
+ return (EINVAL);
- if (p->u.params.level == SCHED_CLASS_LEVEL_CL_RL)
- fw_level = FW_SCHED_PARAMS_LEVEL_CL_RL;
- else if (p->u.params.level == SCHED_CLASS_LEVEL_CL_WRR)
- fw_level = FW_SCHED_PARAMS_LEVEL_CL_WRR;
- else if (p->u.params.level == SCHED_CLASS_LEVEL_CH_RL)
- fw_level = FW_SCHED_PARAMS_LEVEL_CH_RL;
- else {
- rc = EINVAL;
- goto done;
- }
+ if (p->ratemode == SCHED_CLASS_RATEMODE_REL)
+ fw_ratemode = FW_SCHED_PARAMS_RATE_REL;
+ else if (p->ratemode == SCHED_CLASS_RATEMODE_ABS)
+ fw_ratemode = FW_SCHED_PARAMS_RATE_ABS;
+ else
+ return (EINVAL);
- if (p->u.params.mode == SCHED_CLASS_MODE_CLASS)
- fw_mode = FW_SCHED_PARAMS_MODE_CLASS;
- else if (p->u.params.mode == SCHED_CLASS_MODE_FLOW)
- fw_mode = FW_SCHED_PARAMS_MODE_FLOW;
- else {
- rc = EINVAL;
- goto done;
- }
+ /* Vet our parameters ... */
+ if (!in_range(p->channel, 0, sc->chip_params->nchan - 1))
+ return (ERANGE);
- if (p->u.params.rateunit == SCHED_CLASS_RATEUNIT_BITS)
- fw_rateunit = FW_SCHED_PARAMS_UNIT_BITRATE;
- else if (p->u.params.rateunit == SCHED_CLASS_RATEUNIT_PKTS)
- fw_rateunit = FW_SCHED_PARAMS_UNIT_PKTRATE;
- else {
- rc = EINVAL;
- goto done;
- }
+ pi = sc->port[sc->chan_map[p->channel]];
+ if (pi == NULL)
+ return (ENXIO);
+ MPASS(pi->tx_chan == p->channel);
+ top_speed = port_top_speed(pi) * 1000000; /* Gbps -> Kbps */
- if (p->u.params.ratemode == SCHED_CLASS_RATEMODE_REL)
- fw_ratemode = FW_SCHED_PARAMS_RATE_REL;
- else if (p->u.params.ratemode == SCHED_CLASS_RATEMODE_ABS)
- fw_ratemode = FW_SCHED_PARAMS_RATE_ABS;
- else {
- rc = EINVAL;
- goto done;
- }
+ if (!in_range(p->cl, 0, sc->chip_params->nsched_cls) ||
+ !in_range(p->minrate, 0, top_speed) ||
+ !in_range(p->maxrate, 0, top_speed) ||
+ !in_range(p->weight, 0, 100))
+ return (ERANGE);
- /* Vet our parameters ... */
- if (!in_range(p->u.params.channel, 0, 3) ||
- !in_range(p->u.params.cl, 0, sc->chip_params->nsched_cls) ||
- !in_range(p->u.params.minrate, 0, 10000000) ||
- !in_range(p->u.params.maxrate, 0, 10000000) ||
- !in_range(p->u.params.weight, 0, 100)) {
- rc = ERANGE;
- goto done;
- }
+ /*
+ * Translate any unset parameters into the firmware's
+ * nomenclature and/or fail the call if the parameters
+ * are required ...
+ */
+ if (p->rateunit < 0 || p->ratemode < 0 || p->channel < 0 || p->cl < 0)
+ return (EINVAL);
+ if (p->minrate < 0)
+ p->minrate = 0;
+ if (p->maxrate < 0) {
+ if (p->level == SCHED_CLASS_LEVEL_CL_RL ||
+ p->level == SCHED_CLASS_LEVEL_CH_RL)
+ return (EINVAL);
+ else
+ p->maxrate = 0;
+ }
+ if (p->weight < 0) {
+ if (p->level == SCHED_CLASS_LEVEL_CL_WRR)
+ return (EINVAL);
+ else
+ p->weight = 0;
+ }
+ if (p->pktsize < 0) {
+ if (p->level == SCHED_CLASS_LEVEL_CL_RL ||
+ p->level == SCHED_CLASS_LEVEL_CH_RL)
+ return (EINVAL);
+ else
+ p->pktsize = 0;
+ }
+
+ rc = begin_synchronized_op(sc, NULL,
+ sleep_ok ? (SLEEP_OK | INTR_OK) : HOLD_LOCK, "t4sscp");
+ if (rc)
+ return (rc);
+ tc = &pi->tc[p->cl];
+ tc->params = *p;
+ rc = -t4_sched_params(sc, FW_SCHED_TYPE_PKTSCHED, fw_level, fw_mode,
+ fw_rateunit, fw_ratemode, p->channel, p->cl, p->minrate, p->maxrate,
+ p->weight, p->pktsize, sleep_ok);
+ if (rc == 0)
+ tc->flags |= TX_SC_OK;
+ else {
/*
- * Translate any unset parameters into the firmware's
- * nomenclature and/or fail the call if the parameters
- * are required ...
+ * Unknown state at this point, see tc->params for what was
+ * attempted.
*/
- if (p->u.params.rateunit < 0 || p->u.params.ratemode < 0 ||
- p->u.params.channel < 0 || p->u.params.cl < 0) {
- rc = EINVAL;
- goto done;
- }
- if (p->u.params.minrate < 0)
- p->u.params.minrate = 0;
- if (p->u.params.maxrate < 0) {
- if (p->u.params.level == SCHED_CLASS_LEVEL_CL_RL ||
- p->u.params.level == SCHED_CLASS_LEVEL_CH_RL) {
- rc = EINVAL;
- goto done;
- } else
- p->u.params.maxrate = 0;
- }
- if (p->u.params.weight < 0) {
- if (p->u.params.level == SCHED_CLASS_LEVEL_CL_WRR) {
- rc = EINVAL;
- goto done;
- } else
- p->u.params.weight = 0;
- }
- if (p->u.params.pktsize < 0) {
- if (p->u.params.level == SCHED_CLASS_LEVEL_CL_RL ||
- p->u.params.level == SCHED_CLASS_LEVEL_CH_RL) {
- rc = EINVAL;
- goto done;
- } else
- p->u.params.pktsize = 0;
- }
-
- /* See what the firmware thinks of the request ... */
- rc = -t4_sched_params(sc, fw_type, fw_level, fw_mode,
- fw_rateunit, fw_ratemode, p->u.params.channel,
- p->u.params.cl, p->u.params.minrate, p->u.params.maxrate,
- p->u.params.weight, p->u.params.pktsize, 1);
- goto done;
+ tc->flags &= ~TX_SC_OK;
}
+ end_synchronized_op(sc, sleep_ok ? 0 : LOCK_HELD);
- rc = EINVAL;
-done:
- end_synchronized_op(sc, 0);
return (rc);
}
static int
+set_sched_class(struct adapter *sc, struct t4_sched_params *p)
+{
+
+ if (p->type != SCHED_CLASS_TYPE_PACKET)
+ return (EINVAL);
+
+ if (p->subcmd == SCHED_CLASS_SUBCMD_CONFIG)
+ return (set_sched_class_config(sc, p->u.config.minmax));
+
+ if (p->subcmd == SCHED_CLASS_SUBCMD_PARAMS)
+ return (set_sched_class_params(sc, &p->u.params, 1));
+
+ return (EINVAL);
+}
+
+static int
set_sched_queue(struct adapter *sc, struct t4_sched_queue *p)
{
struct port_info *pi = NULL;
struct vi_info *vi;
struct sge_txq *txq;
uint32_t fw_mnem, fw_queue, fw_class;
int i, rc;
rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4setsq");
if (rc)
return (rc);
- if (!(sc->flags & FULL_INIT_DONE)) {
- rc = EAGAIN;
- goto done;
- }
-
if (p->port >= sc->params.nports) {
rc = EINVAL;
goto done;
}
/* XXX: Only supported for the main VI. */
pi = sc->port[p->port];
vi = &pi->vi[0];
- if (!in_range(p->queue, 0, vi->ntxq - 1) || !in_range(p->cl, 0, 7)) {
+ if (!(vi->flags & VI_INIT_DONE)) {
+ /* tx queues not set up yet */
+ rc = EAGAIN;
+ goto done;
+ }
+
+ if (!in_range(p->queue, 0, vi->ntxq - 1) ||
+ !in_range(p->cl, 0, sc->chip_params->nsched_cls - 1)) {
rc = EINVAL;
goto done;
}
/*
* Create a template for the FW_PARAMS_CMD mnemonic and value (TX
* Scheduling Class in this case).
*/
fw_mnem = (V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DMAQ) |
V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DMAQ_EQ_SCHEDCLASS_ETH));
fw_class = p->cl < 0 ? 0xffffffff : p->cl;
/*
* If p->queue is non-negative, then we're only changing the scheduling
* on a single specified TX queue.
*/
if (p->queue >= 0) {
txq = &sc->sge.txq[vi->first_txq + p->queue];
fw_queue = (fw_mnem | V_FW_PARAMS_PARAM_YZ(txq->eq.cntxt_id));
rc = -t4_set_params(sc, sc->mbox, sc->pf, 0, 1, &fw_queue,
&fw_class);
goto done;
}
/*
* Change the scheduling on all the TX queues for the
* interface.
*/
for_each_txq(vi, i, txq) {
fw_queue = (fw_mnem | V_FW_PARAMS_PARAM_YZ(txq->eq.cntxt_id));
rc = -t4_set_params(sc, sc->mbox, sc->pf, 0, 1, &fw_queue,
&fw_class);
if (rc)
goto done;
}
rc = 0;
done:
end_synchronized_op(sc, 0);
return (rc);
}
int
t4_os_find_pci_capability(struct adapter *sc, int cap)
{
int i;
return (pci_find_cap(sc->dev, cap, &i) == 0 ? i : 0);
}
int
t4_os_pci_save_state(struct adapter *sc)
{
device_t dev;
struct pci_devinfo *dinfo;
dev = sc->dev;
dinfo = device_get_ivars(dev);
pci_cfg_save(dev, dinfo, 0);
return (0);
}
int
t4_os_pci_restore_state(struct adapter *sc)
{
device_t dev;
struct pci_devinfo *dinfo;
dev = sc->dev;
dinfo = device_get_ivars(dev);
pci_cfg_restore(dev, dinfo);
return (0);
}
void
t4_os_portmod_changed(const struct adapter *sc, int idx)
{
struct port_info *pi = sc->port[idx];
struct vi_info *vi;
struct ifnet *ifp;
int v;
static const char *mod_str[] = {
NULL, "LR", "SR", "ER", "TWINAX", "active TWINAX", "LRM"
};
for_each_vi(pi, v, vi) {
build_medialist(pi, &vi->media);
}
ifp = pi->vi[0].ifp;
if (pi->mod_type == FW_PORT_MOD_TYPE_NONE)
if_printf(ifp, "transceiver unplugged.\n");
else if (pi->mod_type == FW_PORT_MOD_TYPE_UNKNOWN)
if_printf(ifp, "unknown transceiver inserted.\n");
else if (pi->mod_type == FW_PORT_MOD_TYPE_NOTSUPPORTED)
if_printf(ifp, "unsupported transceiver inserted.\n");
else if (pi->mod_type > 0 && pi->mod_type < nitems(mod_str)) {
if_printf(ifp, "%s transceiver inserted.\n",
mod_str[pi->mod_type]);
} else {
if_printf(ifp, "transceiver (type %d) inserted.\n",
pi->mod_type);
}
}
void
t4_os_link_changed(struct adapter *sc, int idx, int link_stat, int reason)
{
struct port_info *pi = sc->port[idx];
struct vi_info *vi;
struct ifnet *ifp;
int v;
if (link_stat)
pi->linkdnrc = -1;
else {
if (reason >= 0)
pi->linkdnrc = reason;
}
for_each_vi(pi, v, vi) {
ifp = vi->ifp;
if (ifp == NULL)
continue;
if (link_stat) {
ifp->if_baudrate = IF_Mbps(pi->link_cfg.speed);
if_link_state_change(ifp, LINK_STATE_UP);
} else {
if_link_state_change(ifp, LINK_STATE_DOWN);
}
}
}
void
t4_iterate(void (*func)(struct adapter *, void *), void *arg)
{
struct adapter *sc;
sx_slock(&t4_list_lock);
SLIST_FOREACH(sc, &t4_list, link) {
/*
* func should not make any assumptions about what state sc is
* in - the only guarantee is that sc->sc_lock is a valid lock.
*/
func(sc, arg);
}
sx_sunlock(&t4_list_lock);
}
static int
t4_open(struct cdev *dev, int flags, int type, struct thread *td)
{
return (0);
}
static int
t4_close(struct cdev *dev, int flags, int type, struct thread *td)
{
return (0);
}
static int
t4_ioctl(struct cdev *dev, unsigned long cmd, caddr_t data, int fflag,
struct thread *td)
{
int rc;
struct adapter *sc = dev->si_drv1;
rc = priv_check(td, PRIV_DRIVER);
if (rc != 0)
return (rc);
switch (cmd) {
case CHELSIO_T4_GETREG: {
struct t4_reg *edata = (struct t4_reg *)data;
if ((edata->addr & 0x3) != 0 || edata->addr >= sc->mmio_len)
return (EFAULT);
if (edata->size == 4)
edata->val = t4_read_reg(sc, edata->addr);
else if (edata->size == 8)
edata->val = t4_read_reg64(sc, edata->addr);
else
return (EINVAL);
break;
}
case CHELSIO_T4_SETREG: {
struct t4_reg *edata = (struct t4_reg *)data;
if ((edata->addr & 0x3) != 0 || edata->addr >= sc->mmio_len)
return (EFAULT);
if (edata->size == 4) {
if (edata->val & 0xffffffff00000000)
return (EINVAL);
t4_write_reg(sc, edata->addr, (uint32_t) edata->val);
} else if (edata->size == 8)
t4_write_reg64(sc, edata->addr, edata->val);
else
return (EINVAL);
break;
}
case CHELSIO_T4_REGDUMP: {
struct t4_regdump *regs = (struct t4_regdump *)data;
int reglen = is_t4(sc) ? T4_REGDUMP_SIZE : T5_REGDUMP_SIZE;
uint8_t *buf;
if (regs->len < reglen) {
regs->len = reglen; /* hint to the caller */
return (ENOBUFS);
}
regs->len = reglen;
buf = malloc(reglen, M_CXGBE, M_WAITOK | M_ZERO);
get_regs(sc, regs, buf);
rc = copyout(buf, regs->data, reglen);
free(buf, M_CXGBE);
break;
}
case CHELSIO_T4_GET_FILTER_MODE:
rc = get_filter_mode(sc, (uint32_t *)data);
break;
case CHELSIO_T4_SET_FILTER_MODE:
rc = set_filter_mode(sc, *(uint32_t *)data);
break;
case CHELSIO_T4_GET_FILTER:
rc = get_filter(sc, (struct t4_filter *)data);
break;
case CHELSIO_T4_SET_FILTER:
rc = set_filter(sc, (struct t4_filter *)data);
break;
case CHELSIO_T4_DEL_FILTER:
rc = del_filter(sc, (struct t4_filter *)data);
break;
case CHELSIO_T4_GET_SGE_CONTEXT:
rc = get_sge_context(sc, (struct t4_sge_context *)data);
break;
case CHELSIO_T4_LOAD_FW:
rc = load_fw(sc, (struct t4_data *)data);
break;
case CHELSIO_T4_GET_MEM:
rc = read_card_mem(sc, 2, (struct t4_mem_range *)data);
break;
case CHELSIO_T4_GET_I2C:
rc = read_i2c(sc, (struct t4_i2c_data *)data);
break;
case CHELSIO_T4_CLEAR_STATS: {
int i, v;
u_int port_id = *(uint32_t *)data;
struct port_info *pi;
struct vi_info *vi;
if (port_id >= sc->params.nports)
return (EINVAL);
pi = sc->port[port_id];
/* MAC stats */
t4_clr_port_stats(sc, pi->tx_chan);
pi->tx_parse_error = 0;
mtx_lock(&sc->reg_lock);
for_each_vi(pi, v, vi) {
if (vi->flags & VI_INIT_DONE)
t4_clr_vi_stats(sc, vi->viid);
}
mtx_unlock(&sc->reg_lock);
/*
* Since this command accepts a port, clear stats for
* all VIs on this port.
*/
for_each_vi(pi, v, vi) {
if (vi->flags & VI_INIT_DONE) {
struct sge_rxq *rxq;
struct sge_txq *txq;
struct sge_wrq *wrq;
if (vi->flags & VI_NETMAP)
continue;
for_each_rxq(vi, i, rxq) {
#if defined(INET) || defined(INET6)
rxq->lro.lro_queued = 0;
rxq->lro.lro_flushed = 0;
#endif
rxq->rxcsum = 0;
rxq->vlan_extraction = 0;
}
for_each_txq(vi, i, txq) {
txq->txcsum = 0;
txq->tso_wrs = 0;
txq->vlan_insertion = 0;
txq->imm_wrs = 0;
txq->sgl_wrs = 0;
txq->txpkt_wrs = 0;
txq->txpkts0_wrs = 0;
txq->txpkts1_wrs = 0;
txq->txpkts0_pkts = 0;
txq->txpkts1_pkts = 0;
mp_ring_reset_stats(txq->r);
}
#ifdef TCP_OFFLOAD
/* nothing to clear for each ofld_rxq */
for_each_ofld_txq(vi, i, wrq) {
wrq->tx_wrs_direct = 0;
wrq->tx_wrs_copied = 0;
}
#endif
if (IS_MAIN_VI(vi)) {
wrq = &sc->sge.ctrlq[pi->port_id];
wrq->tx_wrs_direct = 0;
wrq->tx_wrs_copied = 0;
}
}
}
break;
}
case CHELSIO_T4_SCHED_CLASS:
rc = set_sched_class(sc, (struct t4_sched_params *)data);
break;
case CHELSIO_T4_SCHED_QUEUE:
rc = set_sched_queue(sc, (struct t4_sched_queue *)data);
break;
case CHELSIO_T4_GET_TRACER:
rc = t4_get_tracer(sc, (struct t4_tracer *)data);
break;
case CHELSIO_T4_SET_TRACER:
rc = t4_set_tracer(sc, (struct t4_tracer *)data);
break;
default:
rc = EINVAL;
}
return (rc);
}
void
t4_db_full(struct adapter *sc)
{
CXGBE_UNIMPLEMENTED(__func__);
}
void
t4_db_dropped(struct adapter *sc)
{
CXGBE_UNIMPLEMENTED(__func__);
}
#ifdef TCP_OFFLOAD
void
t4_iscsi_init(struct adapter *sc, u_int tag_mask, const u_int *pgsz_order)
{
t4_write_reg(sc, A_ULP_RX_ISCSI_TAGMASK, tag_mask);
t4_write_reg(sc, A_ULP_RX_ISCSI_PSZ, V_HPZ0(pgsz_order[0]) |
V_HPZ1(pgsz_order[1]) | V_HPZ2(pgsz_order[2]) |
V_HPZ3(pgsz_order[3]));
}
static int
toe_capability(struct vi_info *vi, int enable)
{
int rc;
struct port_info *pi = vi->pi;
struct adapter *sc = pi->adapter;
ASSERT_SYNCHRONIZED_OP(sc);
if (!is_offload(sc))
return (ENODEV);
if (enable) {
if ((vi->ifp->if_capenable & IFCAP_TOE) != 0) {
/* TOE is already enabled. */
return (0);
}
/*
* We need the port's queues around so that we're able to send
* and receive CPLs to/from the TOE even if the ifnet for this
* port has never been UP'd administratively.
*/
if (!(vi->flags & VI_INIT_DONE)) {
rc = cxgbe_init_synchronized(vi);
if (rc)
return (rc);
}
if (!(pi->vi[0].flags & VI_INIT_DONE)) {
rc = cxgbe_init_synchronized(&pi->vi[0]);
if (rc)
return (rc);
}
if (isset(&sc->offload_map, pi->port_id)) {
/* TOE is enabled on another VI of this port. */
pi->uld_vis++;
return (0);
}
if (!uld_active(sc, ULD_TOM)) {
rc = t4_activate_uld(sc, ULD_TOM);
if (rc == EAGAIN) {
log(LOG_WARNING,
"You must kldload t4_tom.ko before trying "
"to enable TOE on a cxgbe interface.\n");
}
if (rc != 0)
return (rc);
KASSERT(sc->tom_softc != NULL,
("%s: TOM activated but softc NULL", __func__));
KASSERT(uld_active(sc, ULD_TOM),
("%s: TOM activated but flag not set", __func__));
}
/* Activate iWARP and iSCSI too, if the modules are loaded. */
if (!uld_active(sc, ULD_IWARP))
(void) t4_activate_uld(sc, ULD_IWARP);
if (!uld_active(sc, ULD_ISCSI))
(void) t4_activate_uld(sc, ULD_ISCSI);
pi->uld_vis++;
setbit(&sc->offload_map, pi->port_id);
} else {
pi->uld_vis--;
if (!isset(&sc->offload_map, pi->port_id) || pi->uld_vis > 0)
return (0);
KASSERT(uld_active(sc, ULD_TOM),
("%s: TOM never initialized?", __func__));
clrbit(&sc->offload_map, pi->port_id);
}
return (0);
}
/*
* Add an upper layer driver to the global list.
*/
int
t4_register_uld(struct uld_info *ui)
{
int rc = 0;
struct uld_info *u;
sx_xlock(&t4_uld_list_lock);
SLIST_FOREACH(u, &t4_uld_list, link) {
if (u->uld_id == ui->uld_id) {
rc = EEXIST;
goto done;
}
}
SLIST_INSERT_HEAD(&t4_uld_list, ui, link);
ui->refcount = 0;
done:
sx_xunlock(&t4_uld_list_lock);
return (rc);
}
int
t4_unregister_uld(struct uld_info *ui)
{
int rc = EINVAL;
struct uld_info *u;
sx_xlock(&t4_uld_list_lock);
SLIST_FOREACH(u, &t4_uld_list, link) {
if (u == ui) {
if (ui->refcount > 0) {
rc = EBUSY;
goto done;
}
SLIST_REMOVE(&t4_uld_list, ui, uld_info, link);
rc = 0;
goto done;
}
}
done:
sx_xunlock(&t4_uld_list_lock);
return (rc);
}
int
t4_activate_uld(struct adapter *sc, int id)
{
int rc;
struct uld_info *ui;
ASSERT_SYNCHRONIZED_OP(sc);
if (id < 0 || id > ULD_MAX)
return (EINVAL);
rc = EAGAIN; /* kldload the module with this ULD and try again. */
sx_slock(&t4_uld_list_lock);
SLIST_FOREACH(ui, &t4_uld_list, link) {
if (ui->uld_id == id) {
if (!(sc->flags & FULL_INIT_DONE)) {
rc = adapter_full_init(sc);
if (rc != 0)
break;
}
rc = ui->activate(sc);
if (rc == 0) {
setbit(&sc->active_ulds, id);
ui->refcount++;
}
break;
}
}
sx_sunlock(&t4_uld_list_lock);
return (rc);
}
int
t4_deactivate_uld(struct adapter *sc, int id)
{
int rc;
struct uld_info *ui;
ASSERT_SYNCHRONIZED_OP(sc);
if (id < 0 || id > ULD_MAX)
return (EINVAL);
rc = ENXIO;
sx_slock(&t4_uld_list_lock);
SLIST_FOREACH(ui, &t4_uld_list, link) {
if (ui->uld_id == id) {
rc = ui->deactivate(sc);
if (rc == 0) {
clrbit(&sc->active_ulds, id);
ui->refcount--;
}
break;
}
}
sx_sunlock(&t4_uld_list_lock);
return (rc);
}
int
uld_active(struct adapter *sc, int uld_id)
{
MPASS(uld_id >= 0 && uld_id <= ULD_MAX);
return (isset(&sc->active_ulds, uld_id));
}
#endif
/*
* Come up with reasonable defaults for some of the tunables, provided they're
* not set by the user (in which case we'll use the values as is).
*/
static void
tweak_tunables(void)
{
int nc = mp_ncpus; /* our snapshot of the number of CPUs */
if (t4_ntxq10g < 1) {
#ifdef RSS
t4_ntxq10g = rss_getnumbuckets();
#else
t4_ntxq10g = min(nc, NTXQ_10G);
#endif
}
if (t4_ntxq1g < 1) {
#ifdef RSS
/* XXX: way too many for 1GbE? */
t4_ntxq1g = rss_getnumbuckets();
#else
t4_ntxq1g = min(nc, NTXQ_1G);
#endif
}
if (t4_nrxq10g < 1) {
#ifdef RSS
t4_nrxq10g = rss_getnumbuckets();
#else
t4_nrxq10g = min(nc, NRXQ_10G);
#endif
}
if (t4_nrxq1g < 1) {
#ifdef RSS
/* XXX: way too many for 1GbE? */
t4_nrxq1g = rss_getnumbuckets();
#else
t4_nrxq1g = min(nc, NRXQ_1G);
#endif
}
#ifdef TCP_OFFLOAD
if (t4_nofldtxq10g < 1)
t4_nofldtxq10g = min(nc, NOFLDTXQ_10G);
if (t4_nofldtxq1g < 1)
t4_nofldtxq1g = min(nc, NOFLDTXQ_1G);
if (t4_nofldrxq10g < 1)
t4_nofldrxq10g = min(nc, NOFLDRXQ_10G);
if (t4_nofldrxq1g < 1)
t4_nofldrxq1g = min(nc, NOFLDRXQ_1G);
if (t4_toecaps_allowed == -1)
t4_toecaps_allowed = FW_CAPS_CONFIG_TOE;
if (t4_rdmacaps_allowed == -1) {
t4_rdmacaps_allowed = FW_CAPS_CONFIG_RDMA_RDDP |
FW_CAPS_CONFIG_RDMA_RDMAC;
}
if (t4_iscsicaps_allowed == -1) {
t4_iscsicaps_allowed = FW_CAPS_CONFIG_ISCSI_INITIATOR_PDU |
FW_CAPS_CONFIG_ISCSI_TARGET_PDU |
FW_CAPS_CONFIG_ISCSI_T10DIF;
}
#else
if (t4_toecaps_allowed == -1)
t4_toecaps_allowed = 0;
if (t4_rdmacaps_allowed == -1)
t4_rdmacaps_allowed = 0;
if (t4_iscsicaps_allowed == -1)
t4_iscsicaps_allowed = 0;
#endif
#ifdef DEV_NETMAP
if (t4_nnmtxq10g < 1)
t4_nnmtxq10g = min(nc, NNMTXQ_10G);
if (t4_nnmtxq1g < 1)
t4_nnmtxq1g = min(nc, NNMTXQ_1G);
if (t4_nnmrxq10g < 1)
t4_nnmrxq10g = min(nc, NNMRXQ_10G);
if (t4_nnmrxq1g < 1)
t4_nnmrxq1g = min(nc, NNMRXQ_1G);
#endif
if (t4_tmr_idx_10g < 0 || t4_tmr_idx_10g >= SGE_NTIMERS)
t4_tmr_idx_10g = TMR_IDX_10G;
if (t4_pktc_idx_10g < -1 || t4_pktc_idx_10g >= SGE_NCOUNTERS)
t4_pktc_idx_10g = PKTC_IDX_10G;
if (t4_tmr_idx_1g < 0 || t4_tmr_idx_1g >= SGE_NTIMERS)
t4_tmr_idx_1g = TMR_IDX_1G;
if (t4_pktc_idx_1g < -1 || t4_pktc_idx_1g >= SGE_NCOUNTERS)
t4_pktc_idx_1g = PKTC_IDX_1G;
if (t4_qsize_txq < 128)
t4_qsize_txq = 128;
if (t4_qsize_rxq < 128)
t4_qsize_rxq = 128;
while (t4_qsize_rxq & 7)
t4_qsize_rxq++;
t4_intr_types &= INTR_MSIX | INTR_MSI | INTR_INTX;
}
#ifdef DDB
static void
t4_dump_tcb(struct adapter *sc, int tid)
{
uint32_t base, i, j, off, pf, reg, save, tcb_addr, win_pos;
reg = PCIE_MEM_ACCESS_REG(A_PCIE_MEM_ACCESS_OFFSET, 2);
save = t4_read_reg(sc, reg);
base = sc->memwin[2].mw_base;
/* Dump TCB for the tid */
tcb_addr = t4_read_reg(sc, A_TP_CMM_TCB_BASE);
tcb_addr += tid * TCB_SIZE;
if (is_t4(sc)) {
pf = 0;
win_pos = tcb_addr & ~0xf; /* start must be 16B aligned */
} else {
pf = V_PFNUM(sc->pf);
win_pos = tcb_addr & ~0x7f; /* start must be 128B aligned */
}
t4_write_reg(sc, reg, win_pos | pf);
t4_read_reg(sc, reg);
off = tcb_addr - win_pos;
for (i = 0; i < 4; i++) {
uint32_t buf[8];
for (j = 0; j < 8; j++, off += 4)
buf[j] = htonl(t4_read_reg(sc, base + off));
db_printf("%08x %08x %08x %08x %08x %08x %08x %08x\n",
buf[0], buf[1], buf[2], buf[3], buf[4], buf[5], buf[6],
buf[7]);
}
t4_write_reg(sc, reg, save);
t4_read_reg(sc, reg);
}
static void
t4_dump_devlog(struct adapter *sc)
{
struct devlog_params *dparams = &sc->params.devlog;
struct fw_devlog_e e;
int i, first, j, m, nentries, rc;
uint64_t ftstamp = UINT64_MAX;
if (dparams->start == 0) {
db_printf("devlog params not valid\n");
return;
}
nentries = dparams->size / sizeof(struct fw_devlog_e);
m = fwmtype_to_hwmtype(dparams->memtype);
/* Find the first entry. */
first = -1;
for (i = 0; i < nentries && !db_pager_quit; i++) {
rc = -t4_mem_read(sc, m, dparams->start + i * sizeof(e),
sizeof(e), (void *)&e);
if (rc != 0)
break;
if (e.timestamp == 0)
break;
e.timestamp = be64toh(e.timestamp);
if (e.timestamp < ftstamp) {
ftstamp = e.timestamp;
first = i;
}
}
if (first == -1)
return;
i = first;
do {
rc = -t4_mem_read(sc, m, dparams->start + i * sizeof(e),
sizeof(e), (void *)&e);
if (rc != 0)
return;
if (e.timestamp == 0)
return;
e.timestamp = be64toh(e.timestamp);
e.seqno = be32toh(e.seqno);
for (j = 0; j < 8; j++)
e.params[j] = be32toh(e.params[j]);
db_printf("%10d %15ju %8s %8s ",
e.seqno, e.timestamp,
(e.level < nitems(devlog_level_strings) ?
devlog_level_strings[e.level] : "UNKNOWN"),
(e.facility < nitems(devlog_facility_strings) ?
devlog_facility_strings[e.facility] : "UNKNOWN"));
db_printf(e.fmt, e.params[0], e.params[1], e.params[2],
e.params[3], e.params[4], e.params[5], e.params[6],
e.params[7]);
if (++i == nentries)
i = 0;
} while (i != first && !db_pager_quit);
}
static struct command_table db_t4_table = LIST_HEAD_INITIALIZER(db_t4_table);
_DB_SET(_show, t4, NULL, db_show_table, 0, &db_t4_table);
DB_FUNC(devlog, db_show_devlog, db_t4_table, CS_OWN, NULL)
{
device_t dev;
int t;
bool valid;
valid = false;
t = db_read_token();
if (t == tIDENT) {
dev = device_lookup_by_name(db_tok_string);
valid = true;
}
db_skip_to_eol();
if (!valid) {
db_printf("usage: show t4 devlog \n");
return;
}
if (dev == NULL) {
db_printf("device not found\n");
return;
}
t4_dump_devlog(device_get_softc(dev));
}
DB_FUNC(tcb, db_show_t4tcb, db_t4_table, CS_OWN, NULL)
{
device_t dev;
int radix, tid, t;
bool valid;
valid = false;
radix = db_radix;
db_radix = 10;
t = db_read_token();
if (t == tIDENT) {
dev = device_lookup_by_name(db_tok_string);
t = db_read_token();
if (t == tNUMBER) {
tid = db_tok_number;
valid = true;
}
}
db_radix = radix;
db_skip_to_eol();
if (!valid) {
db_printf("usage: show t4 tcb \n");
return;
}
if (dev == NULL) {
db_printf("device not found\n");
return;
}
if (tid < 0) {
db_printf("invalid tid\n");
return;
}
t4_dump_tcb(device_get_softc(dev), tid);
}
#endif
static struct sx mlu; /* mod load unload */
SX_SYSINIT(cxgbe_mlu, &mlu, "cxgbe mod load/unload");
static int
mod_event(module_t mod, int cmd, void *arg)
{
int rc = 0;
static int loaded = 0;
switch (cmd) {
case MOD_LOAD:
sx_xlock(&mlu);
if (loaded++ == 0) {
t4_sge_modload();
sx_init(&t4_list_lock, "T4/T5 adapters");
SLIST_INIT(&t4_list);
#ifdef TCP_OFFLOAD
sx_init(&t4_uld_list_lock, "T4/T5 ULDs");
SLIST_INIT(&t4_uld_list);
#endif
t4_tracer_modload();
tweak_tunables();
}
sx_xunlock(&mlu);
break;
case MOD_UNLOAD:
sx_xlock(&mlu);
if (--loaded == 0) {
int tries;
sx_slock(&t4_list_lock);
if (!SLIST_EMPTY(&t4_list)) {
rc = EBUSY;
sx_sunlock(&t4_list_lock);
goto done_unload;
}
#ifdef TCP_OFFLOAD
sx_slock(&t4_uld_list_lock);
if (!SLIST_EMPTY(&t4_uld_list)) {
rc = EBUSY;
sx_sunlock(&t4_uld_list_lock);
sx_sunlock(&t4_list_lock);
goto done_unload;
}
#endif
tries = 0;
while (tries++ < 5 && t4_sge_extfree_refs() != 0) {
uprintf("%ju clusters with custom free routine "
"still in use.\n", t4_sge_extfree_refs());
pause("t4unload", 2 * hz);
}
#ifdef TCP_OFFLOAD
sx_sunlock(&t4_uld_list_lock);
#endif
sx_sunlock(&t4_list_lock);
if (t4_sge_extfree_refs() == 0) {
t4_tracer_modunload();
#ifdef TCP_OFFLOAD
sx_destroy(&t4_uld_list_lock);
#endif
sx_destroy(&t4_list_lock);
t4_sge_modunload();
loaded = 0;
} else {
rc = EBUSY;
loaded++; /* undo earlier decrement */
}
}
done_unload:
sx_xunlock(&mlu);
break;
}
return (rc);
}
static devclass_t t4_devclass, t5_devclass;
static devclass_t cxgbe_devclass, cxl_devclass;
static devclass_t vcxgbe_devclass, vcxl_devclass;
DRIVER_MODULE(t4nex, pci, t4_driver, t4_devclass, mod_event, 0);
MODULE_VERSION(t4nex, 1);
MODULE_DEPEND(t4nex, firmware, 1, 1, 1);
#ifdef DEV_NETMAP
MODULE_DEPEND(t4nex, netmap, 1, 1, 1);
#endif /* DEV_NETMAP */
DRIVER_MODULE(t5nex, pci, t5_driver, t5_devclass, mod_event, 0);
MODULE_VERSION(t5nex, 1);
MODULE_DEPEND(t5nex, firmware, 1, 1, 1);
#ifdef DEV_NETMAP
MODULE_DEPEND(t5nex, netmap, 1, 1, 1);
#endif /* DEV_NETMAP */
DRIVER_MODULE(cxgbe, t4nex, cxgbe_driver, cxgbe_devclass, 0, 0);
MODULE_VERSION(cxgbe, 1);
DRIVER_MODULE(cxl, t5nex, cxl_driver, cxl_devclass, 0, 0);
MODULE_VERSION(cxl, 1);
DRIVER_MODULE(vcxgbe, cxgbe, vcxgbe_driver, vcxgbe_devclass, 0, 0);
MODULE_VERSION(vcxgbe, 1);
DRIVER_MODULE(vcxl, cxl, vcxl_driver, vcxl_devclass, 0, 0);
MODULE_VERSION(vcxl, 1);
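Note on the bounds checks in the t4_main.c change above: in_range() treats a negative value as "unset" and accepts it, and its upper bound is inclusive. set_sched_class_params() vets p->cl against nsched_cls, while set_sched_queue() vets it against nsched_cls - 1. The sketch below only illustrates what the two bounds admit. It assumes that the per-port class array indexed by p->cl has nsched_cls entries, which this diff does not show. valid_index(), check_examples(), and the class count of 16 are hypothetical names and values used for illustration, not driver code.

```c
#include <assert.h>

/*
 * Same predicate as in_range() in the diff: a negative value means
 * "unset" and always passes; otherwise lo <= val <= hi (inclusive).
 */
static int
in_range(int val, int lo, int hi)
{
	return (val < 0 || (val <= hi && val >= lo));
}

/* Hypothetical helper: can cl index an array that holds n entries? */
static int
valid_index(int cl, int n)
{
	return (cl >= 0 && cl < n);
}

/* Hypothetical self-check that walks through the boundary cases. */
static void
check_examples(void)
{
	const int n = 16;	/* hypothetical number of scheduling classes */

	/* A negative ("unset") value passes regardless of the bounds. */
	assert(in_range(-1, 0, n - 1));

	/* An inclusive upper bound of n admits cl == n ... */
	assert(in_range(n, 0, n));
	/* ... which is one past the last slot of an n-entry array. */
	assert(!valid_index(n, n));

	/* An upper bound of n - 1 rejects cl == n and accepts n - 1. */
	assert(!in_range(n, 0, n - 1));
	assert(in_range(n - 1, 0, n - 1));
	assert(valid_index(n - 1, n));
}
```

If the assumption about the array size holds, the nsched_cls - 1 bound used in set_sched_queue() is the one that matches array indexing. Negative values still need the separate p->cl < 0 check that set_sched_class_params() performs before it indexes the array.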
Index: projects/vnet/sys/dev/e1000/if_igb.c
===================================================================
--- projects/vnet/sys/dev/e1000/if_igb.c (revision 301546)
+++ projects/vnet/sys/dev/e1000/if_igb.c (revision 301547)
@@ -1,6440 +1,6440 @@
/******************************************************************************
Copyright (c) 2001-2015, Intel Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of the Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
******************************************************************************/
/*$FreeBSD$*/
#include "opt_inet.h"
#include "opt_inet6.h"
#include "opt_rss.h"
#ifdef HAVE_KERNEL_OPTION_HEADERS
#include "opt_device_polling.h"
#include "opt_altq.h"
#endif
#include "if_igb.h"
/*********************************************************************
* Driver version:
*********************************************************************/
char igb_driver_version[] = "2.5.3-k";
/*********************************************************************
* PCI Device ID Table
*
* Used by probe to select the devices to attach to
* Last field stores an index into igb_strings
* Last entry must be all 0s
*
* { Vendor ID, Device ID, SubVendor ID, SubDevice ID, String Index }
*********************************************************************/
static igb_vendor_info_t igb_vendor_info_array[] =
{
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82575EB_COPPER, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82575EB_FIBER_SERDES, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82575GB_QUAD_COPPER, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576_NS, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576_NS_SERDES, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576_FIBER, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576_SERDES, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576_SERDES_QUAD, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576_QUAD_COPPER, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576_QUAD_COPPER_ET2, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82576_VF, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82580_COPPER, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82580_FIBER, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82580_SERDES, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82580_SGMII, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82580_COPPER_DUAL, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_82580_QUAD_FIBER, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_DH89XXCC_SERDES, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_DH89XXCC_SGMII, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_DH89XXCC_SFP, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_DH89XXCC_BACKPLANE, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I350_COPPER, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I350_FIBER, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I350_SERDES, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I350_SGMII, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I350_VF, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I210_COPPER, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I210_COPPER_IT, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I210_COPPER_OEM1, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I210_COPPER_FLASHLESS, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I210_SERDES_FLASHLESS, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I210_FIBER, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I210_SERDES, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I210_SGMII, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I211_COPPER, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I354_BACKPLANE_1GBPS, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I354_BACKPLANE_2_5GBPS, 0, 0, 0},
{IGB_INTEL_VENDOR_ID, E1000_DEV_ID_I354_SGMII, 0, 0, 0},
/* required last entry */
{0, 0, 0, 0, 0}
};
/*********************************************************************
* Table of branding strings for all supported NICs.
*********************************************************************/
static char *igb_strings[] = {
"Intel(R) PRO/1000 Network Connection"
};
/*********************************************************************
* Function prototypes
*********************************************************************/
static int igb_probe(device_t);
static int igb_attach(device_t);
static int igb_detach(device_t);
static int igb_shutdown(device_t);
static int igb_suspend(device_t);
static int igb_resume(device_t);
#ifndef IGB_LEGACY_TX
static int igb_mq_start(struct ifnet *, struct mbuf *);
static int igb_mq_start_locked(struct ifnet *, struct tx_ring *);
static void igb_qflush(struct ifnet *);
static void igb_deferred_mq_start(void *, int);
#else
static void igb_start(struct ifnet *);
static void igb_start_locked(struct tx_ring *, struct ifnet *ifp);
#endif
static int igb_ioctl(struct ifnet *, u_long, caddr_t);
static uint64_t igb_get_counter(if_t, ift_counter);
static void igb_init(void *);
static void igb_init_locked(struct adapter *);
static void igb_stop(void *);
static void igb_media_status(struct ifnet *, struct ifmediareq *);
static int igb_media_change(struct ifnet *);
static void igb_identify_hardware(struct adapter *);
static int igb_allocate_pci_resources(struct adapter *);
static int igb_allocate_msix(struct adapter *);
static int igb_allocate_legacy(struct adapter *);
static int igb_setup_msix(struct adapter *);
static void igb_free_pci_resources(struct adapter *);
static void igb_local_timer(void *);
static void igb_reset(struct adapter *);
static int igb_setup_interface(device_t, struct adapter *);
static int igb_allocate_queues(struct adapter *);
static void igb_configure_queues(struct adapter *);
static int igb_allocate_transmit_buffers(struct tx_ring *);
static void igb_setup_transmit_structures(struct adapter *);
static void igb_setup_transmit_ring(struct tx_ring *);
static void igb_initialize_transmit_units(struct adapter *);
static void igb_free_transmit_structures(struct adapter *);
static void igb_free_transmit_buffers(struct tx_ring *);
static int igb_allocate_receive_buffers(struct rx_ring *);
static int igb_setup_receive_structures(struct adapter *);
static int igb_setup_receive_ring(struct rx_ring *);
static void igb_initialize_receive_units(struct adapter *);
static void igb_free_receive_structures(struct adapter *);
static void igb_free_receive_buffers(struct rx_ring *);
static void igb_free_receive_ring(struct rx_ring *);
static void igb_enable_intr(struct adapter *);
static void igb_disable_intr(struct adapter *);
static void igb_update_stats_counters(struct adapter *);
static bool igb_txeof(struct tx_ring *);
static __inline void igb_rx_discard(struct rx_ring *, int);
static __inline void igb_rx_input(struct rx_ring *,
struct ifnet *, struct mbuf *, u32);
static bool igb_rxeof(struct igb_queue *, int, int *);
static void igb_rx_checksum(u32, struct mbuf *, u32);
static int igb_tx_ctx_setup(struct tx_ring *,
struct mbuf *, u32 *, u32 *);
static int igb_tso_setup(struct tx_ring *,
struct mbuf *, u32 *, u32 *);
static void igb_set_promisc(struct adapter *);
static void igb_disable_promisc(struct adapter *);
static void igb_set_multi(struct adapter *);
static void igb_update_link_status(struct adapter *);
static void igb_refresh_mbufs(struct rx_ring *, int);
static void igb_register_vlan(void *, struct ifnet *, u16);
static void igb_unregister_vlan(void *, struct ifnet *, u16);
static void igb_setup_vlan_hw_support(struct adapter *);
static int igb_xmit(struct tx_ring *, struct mbuf **);
static int igb_dma_malloc(struct adapter *, bus_size_t,
struct igb_dma_alloc *, int);
static void igb_dma_free(struct adapter *, struct igb_dma_alloc *);
static int igb_sysctl_nvm_info(SYSCTL_HANDLER_ARGS);
static void igb_print_nvm_info(struct adapter *);
static int igb_is_valid_ether_addr(u8 *);
static void igb_add_hw_stats(struct adapter *);
static void igb_vf_init_stats(struct adapter *);
static void igb_update_vf_stats_counters(struct adapter *);
/* Management and WOL Support */
static void igb_init_manageability(struct adapter *);
static void igb_release_manageability(struct adapter *);
static void igb_get_hw_control(struct adapter *);
static void igb_release_hw_control(struct adapter *);
static void igb_enable_wakeup(device_t);
static void igb_led_func(void *, int);
static int igb_irq_fast(void *);
static void igb_msix_que(void *);
static void igb_msix_link(void *);
static void igb_handle_que(void *context, int pending);
static void igb_handle_link(void *context, int pending);
static void igb_handle_link_locked(struct adapter *);
static void igb_set_sysctl_value(struct adapter *, const char *,
const char *, int *, int);
static int igb_set_flowcntl(SYSCTL_HANDLER_ARGS);
static int igb_sysctl_dmac(SYSCTL_HANDLER_ARGS);
static int igb_sysctl_eee(SYSCTL_HANDLER_ARGS);
#ifdef DEVICE_POLLING
static poll_handler_t igb_poll;
#endif /* DEVICE_POLLING */
/*********************************************************************
* FreeBSD Device Interface Entry Points
*********************************************************************/
static device_method_t igb_methods[] = {
/* Device interface */
DEVMETHOD(device_probe, igb_probe),
DEVMETHOD(device_attach, igb_attach),
DEVMETHOD(device_detach, igb_detach),
DEVMETHOD(device_shutdown, igb_shutdown),
DEVMETHOD(device_suspend, igb_suspend),
DEVMETHOD(device_resume, igb_resume),
DEVMETHOD_END
};
static driver_t igb_driver = {
"igb", igb_methods, sizeof(struct adapter),
};
static devclass_t igb_devclass;
DRIVER_MODULE(igb, pci, igb_driver, igb_devclass, 0, 0);
MODULE_DEPEND(igb, pci, 1, 1, 1);
MODULE_DEPEND(igb, ether, 1, 1, 1);
#ifdef DEV_NETMAP
MODULE_DEPEND(igb, netmap, 1, 1, 1);
#endif /* DEV_NETMAP */
/*********************************************************************
* Tunable default values.
*********************************************************************/
static SYSCTL_NODE(_hw, OID_AUTO, igb, CTLFLAG_RD, 0, "IGB driver parameters");
/* Descriptor defaults */
static int igb_rxd = IGB_DEFAULT_RXD;
static int igb_txd = IGB_DEFAULT_TXD;
SYSCTL_INT(_hw_igb, OID_AUTO, rxd, CTLFLAG_RDTUN, &igb_rxd, 0,
"Number of receive descriptors per queue");
SYSCTL_INT(_hw_igb, OID_AUTO, txd, CTLFLAG_RDTUN, &igb_txd, 0,
"Number of transmit descriptors per queue");
/*
** AIM: Adaptive Interrupt Moderation
** which means that the interrupt rate
** is varied over time based on the
** traffic for that interrupt vector
*/
static int igb_enable_aim = TRUE;
SYSCTL_INT(_hw_igb, OID_AUTO, enable_aim, CTLFLAG_RWTUN, &igb_enable_aim, 0,
"Enable adaptive interrupt moderation");
/*
* MSIX should be the default for best performance,
* but this allows it to be forced off for testing.
*/
static int igb_enable_msix = 1;
SYSCTL_INT(_hw_igb, OID_AUTO, enable_msix, CTLFLAG_RDTUN, &igb_enable_msix, 0,
"Enable MSI-X interrupts");
/*
** Tunable interrupt rate
*/
static int igb_max_interrupt_rate = 8000;
SYSCTL_INT(_hw_igb, OID_AUTO, max_interrupt_rate, CTLFLAG_RDTUN,
&igb_max_interrupt_rate, 0, "Maximum interrupts per second");
#ifndef IGB_LEGACY_TX
/*
** Tunable number of buffers in the buf-ring (drbr_xxx)
*/
static int igb_buf_ring_size = IGB_BR_SIZE;
SYSCTL_INT(_hw_igb, OID_AUTO, buf_ring_size, CTLFLAG_RDTUN,
&igb_buf_ring_size, 0, "Size of the bufring");
#endif
/*
** Header split causes the packet header to
** be DMA'd to a separate mbuf from the payload.
** This can have memory alignment benefits. Another
** plus is that small packets often fit entirely
** into the header mbuf and thus use no cluster.
** Whether it helps is very workload dependent.
*/
static int igb_header_split = FALSE;
SYSCTL_INT(_hw_igb, OID_AUTO, header_split, CTLFLAG_RDTUN, &igb_header_split, 0,
"Enable receive mbuf header split");
/*
** This will autoconfigure based on the
** number of CPUs and max supported
** MSIX messages if left at 0.
*/
static int igb_num_queues = 0;
SYSCTL_INT(_hw_igb, OID_AUTO, num_queues, CTLFLAG_RDTUN, &igb_num_queues, 0,
"Number of queues to configure, 0 indicates autoconfigure");
/*
** Global variable to store last used CPU when binding queues
** to CPUs in igb_allocate_msix. Starts at CPU_FIRST and increments when a
** queue is bound to a cpu.
*/
static int igb_last_bind_cpu = -1;
/* How many packets rxeof tries to clean at a time */
static int igb_rx_process_limit = 100;
SYSCTL_INT(_hw_igb, OID_AUTO, rx_process_limit, CTLFLAG_RDTUN,
&igb_rx_process_limit, 0,
"Maximum number of received packets to process at a time, -1 means unlimited");
/* How many packets txeof tries to clean at a time */
static int igb_tx_process_limit = -1;
SYSCTL_INT(_hw_igb, OID_AUTO, tx_process_limit, CTLFLAG_RDTUN,
&igb_tx_process_limit, 0,
"Maximum number of sent packets to process at a time, -1 means unlimited");
#ifdef DEV_NETMAP /* see ixgbe.c for details */
#include
#endif /* DEV_NETMAP */
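/*
* Example (a sketch, not part of the driver): the SYSCTL_INT()
* declarations above hang off the _hw_igb node, so the knobs should
* appear as hw.igb.<name>. Those marked CTLFLAG_RDTUN are read-only
* at run time and are normally set at boot, for instance from
* /boot/loader.conf. The values below are illustrative assumptions,
* not recommendations:
*
*	hw.igb.rxd="2048"
*	hw.igb.txd="2048"
*	hw.igb.num_queues="4"
*	hw.igb.max_interrupt_rate="16000"
*	hw.igb.rx_process_limit="-1"
*
* hw.igb.enable_aim is CTLFLAG_RWTUN, so it can also be changed on
* a running system with sysctl(8).
*/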
/*********************************************************************
* Device identification routine
*
* igb_probe determines if the driver should be loaded on
* an adapter based on its PCI vendor/device ID.
*
* return BUS_PROBE_DEFAULT on success, positive on failure
*********************************************************************/
static int
igb_probe(device_t dev)
{
char adapter_name[256];
uint16_t pci_vendor_id = 0;
uint16_t pci_device_id = 0;
uint16_t pci_subvendor_id = 0;
uint16_t pci_subdevice_id = 0;
igb_vendor_info_t *ent;
INIT_DEBUGOUT("igb_probe: begin");
pci_vendor_id = pci_get_vendor(dev);
if (pci_vendor_id != IGB_INTEL_VENDOR_ID)
return (ENXIO);
pci_device_id = pci_get_device(dev);
pci_subvendor_id = pci_get_subvendor(dev);
pci_subdevice_id = pci_get_subdevice(dev);
ent = igb_vendor_info_array;
while (ent->vendor_id != 0) {
if ((pci_vendor_id == ent->vendor_id) &&
(pci_device_id == ent->device_id) &&
((pci_subvendor_id == ent->subvendor_id) ||
(ent->subvendor_id == 0)) &&
((pci_subdevice_id == ent->subdevice_id) ||
(ent->subdevice_id == 0))) {
/* Bound the write to the size of the local buffer */
snprintf(adapter_name, sizeof(adapter_name),
"%s, Version - %s", igb_strings[ent->index],
igb_driver_version);
device_set_desc_copy(dev, adapter_name);
return (BUS_PROBE_DEFAULT);
}
ent++;
}
return (ENXIO);
}
/*********************************************************************
* Device initialization routine
*
* The attach entry point is called when the driver is being loaded.
* This routine identifies the type of hardware, allocates all resources
* and initializes the hardware.
*
* return 0 on success, positive on failure
*********************************************************************/
static int
igb_attach(device_t dev)
{
struct adapter *adapter;
int error = 0;
u16 eeprom_data;
INIT_DEBUGOUT("igb_attach: begin");
if (resource_disabled("igb", device_get_unit(dev))) {
device_printf(dev, "Disabled by device hint\n");
return (ENXIO);
}
adapter = device_get_softc(dev);
adapter->dev = adapter->osdep.dev = dev;
IGB_CORE_LOCK_INIT(adapter, device_get_nameunit(dev));
/* SYSCTLs */
SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev),
SYSCTL_CHILDREN(device_get_sysctl_tree(dev)),
OID_AUTO, "nvm", CTLTYPE_INT|CTLFLAG_RW, adapter, 0,
igb_sysctl_nvm_info, "I", "NVM Information");
igb_set_sysctl_value(adapter, "enable_aim",
"Interrupt Moderation", &adapter->enable_aim,
igb_enable_aim);
SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev),
SYSCTL_CHILDREN(device_get_sysctl_tree(dev)),
OID_AUTO, "fc", CTLTYPE_INT|CTLFLAG_RW,
adapter, 0, igb_set_flowcntl, "I", "Flow Control");
callout_init_mtx(&adapter->timer, &adapter->core_mtx, 0);
/* Determine hardware and mac info */
igb_identify_hardware(adapter);
/* Setup PCI resources */
if (igb_allocate_pci_resources(adapter)) {
device_printf(dev, "Allocation of PCI resources failed\n");
error = ENXIO;
goto err_pci;
}
/* Do Shared Code initialization */
if (e1000_setup_init_funcs(&adapter->hw, TRUE)) {
device_printf(dev, "Setup of Shared code failed\n");
error = ENXIO;
goto err_pci;
}
e1000_get_bus_info(&adapter->hw);
/* Sysctls for limiting the amount of work done in the taskqueues */
igb_set_sysctl_value(adapter, "rx_processing_limit",
"max number of rx packets to process",
&adapter->rx_process_limit, igb_rx_process_limit);
igb_set_sysctl_value(adapter, "tx_processing_limit",
"max number of tx packets to process",
&adapter->tx_process_limit, igb_tx_process_limit);
/*
* Validate the number of transmit and receive descriptors. Each
* count must lie between the driver minimum and maximum, and the
* ring size in bytes must be a multiple of IGB_DBA_ALIGN.
*/
if (((igb_txd * sizeof(struct e1000_tx_desc)) % IGB_DBA_ALIGN) != 0 ||
(igb_txd > IGB_MAX_TXD) || (igb_txd < IGB_MIN_TXD)) {
device_printf(dev, "Using %d TX descriptors instead of %d!\n",
IGB_DEFAULT_TXD, igb_txd);
adapter->num_tx_desc = IGB_DEFAULT_TXD;
} else
adapter->num_tx_desc = igb_txd;
if (((igb_rxd * sizeof(struct e1000_rx_desc)) % IGB_DBA_ALIGN) != 0 ||
(igb_rxd > IGB_MAX_RXD) || (igb_rxd < IGB_MIN_RXD)) {
device_printf(dev, "Using %d RX descriptors instead of %d!\n",
IGB_DEFAULT_RXD, igb_rxd);
adapter->num_rx_desc = IGB_DEFAULT_RXD;
} else
adapter->num_rx_desc = igb_rxd;
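/*
* Worked example (assumptions: both descriptor structures are 16
* bytes and IGB_DBA_ALIGN is 128, neither of which is shown in this
* file). Under those assumptions the count must be a multiple of
* 128 / 16 = 8 to pass the alignment test:
*
*	hw.igb.txd=1024 -> 1024 * 16 = 16384, 16384 % 128 == 0, passes
*	hw.igb.txd=1020 -> 1020 * 16 = 16320, 16320 % 128 == 64, fails
*
* A value that fails either the alignment or the range test falls
* back to IGB_DEFAULT_TXD (or IGB_DEFAULT_RXD), and the console
* message above reports the substitution.
*/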
adapter->hw.mac.autoneg = DO_AUTO_NEG;
adapter->hw.phy.autoneg_wait_to_complete = FALSE;
adapter->hw.phy.autoneg_advertised = AUTONEG_ADV_DEFAULT;
/* Copper options */
if (adapter->hw.phy.media_type == e1000_media_type_copper) {
adapter->hw.phy.mdix = AUTO_ALL_MODES;
adapter->hw.phy.disable_polarity_correction = FALSE;
adapter->hw.phy.ms_type = IGB_MASTER_SLAVE;
}
/*
* Set the frame limits assuming
* standard ethernet sized frames.
*/
adapter->max_frame_size = ETHERMTU + ETHER_HDR_LEN + ETHERNET_FCS_SIZE;
/*
** Allocate and Setup Queues
*/
if (igb_allocate_queues(adapter)) {
error = ENOMEM;
goto err_pci;
}
/* Allocate the appropriate stats memory */
if (adapter->vf_ifp) {
adapter->stats =
(struct e1000_vf_stats *)malloc(
sizeof(struct e1000_vf_stats), M_DEVBUF, M_NOWAIT | M_ZERO);
igb_vf_init_stats(adapter);
} else
adapter->stats =
(struct e1000_hw_stats *)malloc(
sizeof(struct e1000_hw_stats), M_DEVBUF, M_NOWAIT | M_ZERO);
if (adapter->stats == NULL) {
device_printf(dev, "Can not allocate stats memory\n");
error = ENOMEM;
goto err_late;
}
/* Allocate multicast array memory. */
adapter->mta = malloc(sizeof(u8) * ETH_ADDR_LEN *
MAX_NUM_MULTICAST_ADDRESSES, M_DEVBUF, M_NOWAIT);
if (adapter->mta == NULL) {
device_printf(dev, "Can not allocate multicast setup array\n");
error = ENOMEM;
goto err_late;
}
/* Some adapter-specific advanced features */
if (adapter->hw.mac.type >= e1000_i350) {
SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev),
SYSCTL_CHILDREN(device_get_sysctl_tree(dev)),
OID_AUTO, "dmac", CTLTYPE_INT|CTLFLAG_RW,
adapter, 0, igb_sysctl_dmac, "I", "DMA Coalesce");
SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev),
SYSCTL_CHILDREN(device_get_sysctl_tree(dev)),
OID_AUTO, "eee_disabled", CTLTYPE_INT|CTLFLAG_RW,
adapter, 0, igb_sysctl_eee, "I",
"Disable Energy Efficient Ethernet");
if (adapter->hw.phy.media_type == e1000_media_type_copper) {
if (adapter->hw.mac.type == e1000_i354)
e1000_set_eee_i354(&adapter->hw, TRUE, TRUE);
else
e1000_set_eee_i350(&adapter->hw, TRUE, TRUE);
}
}
/*
** Start from a known state; this is
** important before reading the NVM and
** the MAC address.
*/
e1000_reset_hw(&adapter->hw);
/* Make sure we have a good EEPROM before we read from it */
if (((adapter->hw.mac.type != e1000_i210) &&
(adapter->hw.mac.type != e1000_i211)) &&
(e1000_validate_nvm_checksum(&adapter->hw) < 0)) {
/*
** Some PCI-E parts fail the first check because
** the link is in a sleep state. Call it again;
** if it fails a second time, it's a real issue.
*/
if (e1000_validate_nvm_checksum(&adapter->hw) < 0) {
device_printf(dev,
"The EEPROM Checksum Is Not Valid\n");
error = EIO;
goto err_late;
}
}
/*
** Copy the permanent MAC address out of the EEPROM
*/
if (e1000_read_mac_addr(&adapter->hw) < 0) {
device_printf(dev, "EEPROM read error while reading MAC"
" address\n");
error = EIO;
goto err_late;
}
/* Check its sanity */
if (!igb_is_valid_ether_addr(adapter->hw.mac.addr)) {
device_printf(dev, "Invalid MAC address\n");
error = EIO;
goto err_late;
}
/* Setup OS specific network interface */
if (igb_setup_interface(dev, adapter) != 0) {
/* Do not report success to the bus after a failed setup */
error = ENXIO;
goto err_late;
}
/* Now get a good starting state */
igb_reset(adapter);
/* Initialize statistics */
igb_update_stats_counters(adapter);
adapter->hw.mac.get_link_status = 1;
igb_update_link_status(adapter);
/* Indicate SOL/IDER usage */
if (e1000_check_reset_block(&adapter->hw))
device_printf(dev,
"PHY reset is blocked due to SOL/IDER session.\n");
/* Determine if we have to control management hardware */
adapter->has_manage = e1000_enable_mng_pass_thru(&adapter->hw);
/*
* Setup Wake-on-Lan
*/
/* APME bit in EEPROM is mapped to WUC.APME */
eeprom_data = E1000_READ_REG(&adapter->hw, E1000_WUC) & E1000_WUC_APME;
if (eeprom_data)
adapter->wol = E1000_WUFC_MAG;
/* Register for VLAN events */
adapter->vlan_attach = EVENTHANDLER_REGISTER(vlan_config,
igb_register_vlan, adapter, EVENTHANDLER_PRI_FIRST);
adapter->vlan_detach = EVENTHANDLER_REGISTER(vlan_unconfig,
igb_unregister_vlan, adapter, EVENTHANDLER_PRI_FIRST);
igb_add_hw_stats(adapter);
/* Tell the stack that the interface is not active */
adapter->ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
adapter->ifp->if_drv_flags |= IFF_DRV_OACTIVE;
adapter->led_dev = led_create(igb_led_func, adapter,
device_get_nameunit(dev));
/*
** Configure Interrupts
*/
if ((adapter->msix > 1) && (igb_enable_msix))
error = igb_allocate_msix(adapter);
else /* MSI or Legacy */
error = igb_allocate_legacy(adapter);
if (error)
goto err_late;
#ifdef DEV_NETMAP
igb_netmap_attach(adapter);
#endif /* DEV_NETMAP */
INIT_DEBUGOUT("igb_attach: end");
return (0);
err_late:
if (igb_detach(dev) == 0) /* igb_detach() already did the cleanup */
return (error);
igb_free_transmit_structures(adapter);
igb_free_receive_structures(adapter);
igb_release_hw_control(adapter);
err_pci:
igb_free_pci_resources(adapter);
if (adapter->ifp != NULL)
if_free(adapter->ifp);
free(adapter->mta, M_DEVBUF);
IGB_CORE_LOCK_DESTROY(adapter);
return (error);
}
/*********************************************************************
* Device removal routine
*
* The detach entry point is called when the driver is being removed.
* This routine stops the adapter and deallocates all the resources
* that were allocated for driver operation.
*
* return 0 on success, positive on failure
*********************************************************************/
static int
igb_detach(device_t dev)
{
struct adapter *adapter = device_get_softc(dev);
struct ifnet *ifp = adapter->ifp;
INIT_DEBUGOUT("igb_detach: begin");
/* Make sure VLANs are not using the driver */
if (adapter->ifp->if_vlantrunk != NULL) {
device_printf(dev, "VLAN in use, detach first\n");
return (EBUSY);
}
ether_ifdetach(adapter->ifp);
if (adapter->led_dev != NULL)
led_destroy(adapter->led_dev);
#ifdef DEVICE_POLLING
if (ifp->if_capenable & IFCAP_POLLING)
ether_poll_deregister(ifp);
#endif
IGB_CORE_LOCK(adapter);
adapter->in_detach = 1;
igb_stop(adapter);
IGB_CORE_UNLOCK(adapter);
e1000_phy_hw_reset(&adapter->hw);
/* Give control back to firmware */
igb_release_manageability(adapter);
igb_release_hw_control(adapter);
if (adapter->wol) {
E1000_WRITE_REG(&adapter->hw, E1000_WUC, E1000_WUC_PME_EN);
E1000_WRITE_REG(&adapter->hw, E1000_WUFC, adapter->wol);
igb_enable_wakeup(dev);
}
/* Unregister VLAN events */
if (adapter->vlan_attach != NULL)
EVENTHANDLER_DEREGISTER(vlan_config, adapter->vlan_attach);
if (adapter->vlan_detach != NULL)
EVENTHANDLER_DEREGISTER(vlan_unconfig, adapter->vlan_detach);
callout_drain(&adapter->timer);
#ifdef DEV_NETMAP
netmap_detach(adapter->ifp);
#endif /* DEV_NETMAP */
igb_free_pci_resources(adapter);
bus_generic_detach(dev);
if_free(ifp);
igb_free_transmit_structures(adapter);
igb_free_receive_structures(adapter);
if (adapter->mta != NULL)
free(adapter->mta, M_DEVBUF);
IGB_CORE_LOCK_DESTROY(adapter);
return (0);
}
/*********************************************************************
*
* Shutdown entry point
*
**********************************************************************/
static int
igb_shutdown(device_t dev)
{
return igb_suspend(dev);
}
/*
* Suspend/resume device methods.
*/
static int
igb_suspend(device_t dev)
{
struct adapter *adapter = device_get_softc(dev);
IGB_CORE_LOCK(adapter);
igb_stop(adapter);
igb_release_manageability(adapter);
igb_release_hw_control(adapter);
if (adapter->wol) {
E1000_WRITE_REG(&adapter->hw, E1000_WUC, E1000_WUC_PME_EN);
E1000_WRITE_REG(&adapter->hw, E1000_WUFC, adapter->wol);
igb_enable_wakeup(dev);
}
IGB_CORE_UNLOCK(adapter);
return bus_generic_suspend(dev);
}
static int
igb_resume(device_t dev)
{
struct adapter *adapter = device_get_softc(dev);
struct tx_ring *txr = adapter->tx_rings;
struct ifnet *ifp = adapter->ifp;
IGB_CORE_LOCK(adapter);
igb_init_locked(adapter);
igb_init_manageability(adapter);
if ((ifp->if_flags & IFF_UP) &&
(ifp->if_drv_flags & IFF_DRV_RUNNING) && adapter->link_active) {
for (int i = 0; i < adapter->num_queues; i++, txr++) {
IGB_TX_LOCK(txr);
#ifndef IGB_LEGACY_TX
/* Process the stack queue only if not depleted */
if (((txr->queue_status & IGB_QUEUE_DEPLETED) == 0) &&
!drbr_empty(ifp, txr->br))
igb_mq_start_locked(ifp, txr);
#else
if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
igb_start_locked(txr, ifp);
#endif
IGB_TX_UNLOCK(txr);
}
}
IGB_CORE_UNLOCK(adapter);
return bus_generic_resume(dev);
}
#ifdef IGB_LEGACY_TX
/*********************************************************************
* Transmit entry point
*
* igb_start is called by the stack to initiate a transmit.
* The driver will remain in this routine as long as there are
* packets to transmit and transmit resources are available.
* If resources are not available, the stack is notified and
* the packet is requeued.
**********************************************************************/
static void
igb_start_locked(struct tx_ring *txr, struct ifnet *ifp)
{
struct adapter *adapter = ifp->if_softc;
struct mbuf *m_head;
IGB_TX_LOCK_ASSERT(txr);
if ((ifp->if_drv_flags & (IFF_DRV_RUNNING|IFF_DRV_OACTIVE)) !=
IFF_DRV_RUNNING)
return;
if (!adapter->link_active)
return;
/* Call cleanup if number of TX descriptors low */
if (txr->tx_avail <= IGB_TX_CLEANUP_THRESHOLD)
igb_txeof(txr);
while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) {
if (txr->tx_avail <= IGB_MAX_SCATTER) {
txr->queue_status |= IGB_QUEUE_DEPLETED;
break;
}
IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head);
if (m_head == NULL)
break;
/*
* Encapsulation can modify our pointer and/or set it to
* NULL on failure. In that event, we can't requeue.
*/
if (igb_xmit(txr, &m_head)) {
if (m_head != NULL)
IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
if (txr->tx_avail <= IGB_MAX_SCATTER)
txr->queue_status |= IGB_QUEUE_DEPLETED;
break;
}
/* Send a copy of the frame to the BPF listener */
ETHER_BPF_MTAP(ifp, m_head);
/* Set watchdog on */
txr->watchdog_time = ticks;
txr->queue_status |= IGB_QUEUE_WORKING;
}
}
/*
* Legacy TX driver routine, called from the
* stack, always uses tx[0], and spins for it.
* Should not be used with multiqueue tx
*/
static void
igb_start(struct ifnet *ifp)
{
struct adapter *adapter = ifp->if_softc;
struct tx_ring *txr = adapter->tx_rings;
if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
IGB_TX_LOCK(txr);
igb_start_locked(txr, ifp);
IGB_TX_UNLOCK(txr);
}
return;
}
#else /* ~IGB_LEGACY_TX */
/*
** Multiqueue Transmit Entry:
** quick turnaround to the stack
**
*/
static int
igb_mq_start(struct ifnet *ifp, struct mbuf *m)
{
struct adapter *adapter = ifp->if_softc;
struct igb_queue *que;
struct tx_ring *txr;
int i, err = 0;
#ifdef RSS
uint32_t bucket_id;
#endif
/* Which queue to use */
/*
* When doing RSS, map it to the same outbound queue
* as the incoming flow would be mapped to.
*
* If everything is set up correctly, it should be the
* same bucket as the one for the CPU we're running on.
*/
if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) {
#ifdef RSS
if (rss_hash2bucket(m->m_pkthdr.flowid,
M_HASHTYPE_GET(m), &bucket_id) == 0) {
/* XXX TODO: spit out something if bucket_id > num_queues? */
i = bucket_id % adapter->num_queues;
} else {
#endif
i = m->m_pkthdr.flowid % adapter->num_queues;
#ifdef RSS
}
#endif
} else {
i = curcpu % adapter->num_queues;
}
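/*
* Worked example (illustrative values, not from the driver): with
* adapter->num_queues == 4 and no RSS bucket lookup, a packet whose
* m_pkthdr.flowid is 0x1234abcd selects i = 0x1234abcd % 4 = 1,
* while an unhashed packet sent from CPU 6 selects i = 6 % 4 = 2.
*/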
txr = &adapter->tx_rings[i];
que = &adapter->queues[i];
err = drbr_enqueue(ifp, txr->br, m);
if (err)
return (err);
if (IGB_TX_TRYLOCK(txr)) {
igb_mq_start_locked(ifp, txr);
IGB_TX_UNLOCK(txr);
} else
taskqueue_enqueue(que->tq, &txr->txq_task);
return (0);
}
static int
igb_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr)
{
struct adapter *adapter = txr->adapter;
struct mbuf *next;
int err = 0, enq = 0;
IGB_TX_LOCK_ASSERT(txr);
if (((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) ||
adapter->link_active == 0)
return (ENETDOWN);
/* Process the queue */
while ((next = drbr_peek(ifp, txr->br)) != NULL) {
if ((err = igb_xmit(txr, &next)) != 0) {
if (next == NULL) {
/* It was freed, move forward */
drbr_advance(ifp, txr->br);
} else {
/*
* Still have one left, it may not be
* the same since the transmit function
* may have changed it.
*/
drbr_putback(ifp, txr->br, next);
}
break;
}
drbr_advance(ifp, txr->br);
enq++;
if (next->m_flags & M_MCAST && adapter->vf_ifp)
if_inc_counter(ifp, IFCOUNTER_OMCASTS, 1);
ETHER_BPF_MTAP(ifp, next);
if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
break;
}
if (enq > 0) {
/* Set the watchdog */
txr->queue_status |= IGB_QUEUE_WORKING;
txr->watchdog_time = ticks;
}
if (txr->tx_avail <= IGB_TX_CLEANUP_THRESHOLD)
igb_txeof(txr);
if (txr->tx_avail <= IGB_MAX_SCATTER)
txr->queue_status |= IGB_QUEUE_DEPLETED;
return (err);
}
/*
* Called from a taskqueue to drain queued transmit packets.
*/
static void
igb_deferred_mq_start(void *arg, int pending)
{
struct tx_ring *txr = arg;
struct adapter *adapter = txr->adapter;
struct ifnet *ifp = adapter->ifp;
IGB_TX_LOCK(txr);
if (!drbr_empty(ifp, txr->br))
igb_mq_start_locked(ifp, txr);
IGB_TX_UNLOCK(txr);
}
/*
** Flush all ring buffers
*/
static void
igb_qflush(struct ifnet *ifp)
{
struct adapter *adapter = ifp->if_softc;
struct tx_ring *txr = adapter->tx_rings;
struct mbuf *m;
for (int i = 0; i < adapter->num_queues; i++, txr++) {
IGB_TX_LOCK(txr);
while ((m = buf_ring_dequeue_sc(txr->br)) != NULL)
m_freem(m);
IGB_TX_UNLOCK(txr);
}
if_qflush(ifp);
}
#endif /* ~IGB_LEGACY_TX */
/*********************************************************************
* Ioctl entry point
*
* igb_ioctl is called when the user wants to configure the
* interface.
*
* return 0 on success, positive on failure
**********************************************************************/
static int
igb_ioctl(struct ifnet *ifp, u_long command, caddr_t data)
{
struct adapter *adapter = ifp->if_softc;
struct ifreq *ifr = (struct ifreq *)data;
#if defined(INET) || defined(INET6)
struct ifaddr *ifa = (struct ifaddr *)data;
#endif
bool avoid_reset = FALSE;
int error = 0;
if (adapter->in_detach)
return (error);
switch (command) {
case SIOCSIFADDR:
#ifdef INET
if (ifa->ifa_addr->sa_family == AF_INET)
avoid_reset = TRUE;
#endif
#ifdef INET6
if (ifa->ifa_addr->sa_family == AF_INET6)
avoid_reset = TRUE;
#endif
/*
** Calling init results in link renegotiation,
** so we avoid doing it when possible.
*/
if (avoid_reset) {
ifp->if_flags |= IFF_UP;
if (!(ifp->if_drv_flags & IFF_DRV_RUNNING))
igb_init(adapter);
#ifdef INET
if (!(ifp->if_flags & IFF_NOARP))
arp_ifinit(ifp, ifa);
#endif
} else
error = ether_ioctl(ifp, command, data);
break;
case SIOCSIFMTU:
{
int max_frame_size;
IOCTL_DEBUGOUT("ioctl rcv'd: SIOCSIFMTU (Set Interface MTU)");
IGB_CORE_LOCK(adapter);
max_frame_size = 9234;
if (ifr->ifr_mtu > max_frame_size - ETHER_HDR_LEN -
ETHER_CRC_LEN) {
IGB_CORE_UNLOCK(adapter);
error = EINVAL;
break;
}
ifp->if_mtu = ifr->ifr_mtu;
adapter->max_frame_size =
ifp->if_mtu + ETHER_HDR_LEN + ETHER_CRC_LEN;
igb_init_locked(adapter);
IGB_CORE_UNLOCK(adapter);
break;
}
case SIOCSIFFLAGS:
IOCTL_DEBUGOUT("ioctl rcv'd: SIOCSIFFLAGS (Set Interface Flags)");
IGB_CORE_LOCK(adapter);
if (ifp->if_flags & IFF_UP) {
if ((ifp->if_drv_flags & IFF_DRV_RUNNING)) {
if ((ifp->if_flags ^ adapter->if_flags) &
(IFF_PROMISC | IFF_ALLMULTI)) {
igb_disable_promisc(adapter);
igb_set_promisc(adapter);
}
} else
igb_init_locked(adapter);
} else
if (ifp->if_drv_flags & IFF_DRV_RUNNING)
igb_stop(adapter);
adapter->if_flags = ifp->if_flags;
IGB_CORE_UNLOCK(adapter);
break;
case SIOCADDMULTI:
case SIOCDELMULTI:
IOCTL_DEBUGOUT("ioctl rcv'd: SIOC(ADD|DEL)MULTI");
if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
IGB_CORE_LOCK(adapter);
igb_disable_intr(adapter);
igb_set_multi(adapter);
#ifdef DEVICE_POLLING
if (!(ifp->if_capenable & IFCAP_POLLING))
#endif
igb_enable_intr(adapter);
IGB_CORE_UNLOCK(adapter);
}
break;
case SIOCSIFMEDIA:
/* Check SOL/IDER usage */
IGB_CORE_LOCK(adapter);
if (e1000_check_reset_block(&adapter->hw)) {
IGB_CORE_UNLOCK(adapter);
device_printf(adapter->dev, "Media change is"
" blocked due to SOL/IDER session.\n");
break;
}
IGB_CORE_UNLOCK(adapter);
/* FALLTHROUGH */
case SIOCGIFMEDIA:
IOCTL_DEBUGOUT("ioctl rcv'd: SIOCxIFMEDIA (Get/Set Interface Media)");
error = ifmedia_ioctl(ifp, ifr, &adapter->media, command);
break;
case SIOCSIFCAP:
{
int mask, reinit;
IOCTL_DEBUGOUT("ioctl rcv'd: SIOCSIFCAP (Set Capabilities)");
reinit = 0;
mask = ifr->ifr_reqcap ^ ifp->if_capenable;
#ifdef DEVICE_POLLING
if (mask & IFCAP_POLLING) {
if (ifr->ifr_reqcap & IFCAP_POLLING) {
error = ether_poll_register(igb_poll, ifp);
if (error)
return (error);
IGB_CORE_LOCK(adapter);
igb_disable_intr(adapter);
ifp->if_capenable |= IFCAP_POLLING;
IGB_CORE_UNLOCK(adapter);
} else {
error = ether_poll_deregister(ifp);
/* Enable interrupt even in error case */
IGB_CORE_LOCK(adapter);
igb_enable_intr(adapter);
ifp->if_capenable &= ~IFCAP_POLLING;
IGB_CORE_UNLOCK(adapter);
}
}
#endif
#if __FreeBSD_version >= 1000000
/* HW cannot turn these on/off separately */
if (mask & (IFCAP_RXCSUM | IFCAP_RXCSUM_IPV6)) {
ifp->if_capenable ^= IFCAP_RXCSUM;
ifp->if_capenable ^= IFCAP_RXCSUM_IPV6;
reinit = 1;
}
if (mask & IFCAP_TXCSUM) {
ifp->if_capenable ^= IFCAP_TXCSUM;
reinit = 1;
}
if (mask & IFCAP_TXCSUM_IPV6) {
ifp->if_capenable ^= IFCAP_TXCSUM_IPV6;
reinit = 1;
}
#else
if (mask & IFCAP_HWCSUM) {
ifp->if_capenable ^= IFCAP_HWCSUM;
reinit = 1;
}
#endif
if (mask & IFCAP_TSO4) {
ifp->if_capenable ^= IFCAP_TSO4;
reinit = 1;
}
if (mask & IFCAP_TSO6) {
ifp->if_capenable ^= IFCAP_TSO6;
reinit = 1;
}
if (mask & IFCAP_VLAN_HWTAGGING) {
ifp->if_capenable ^= IFCAP_VLAN_HWTAGGING;
reinit = 1;
}
if (mask & IFCAP_VLAN_HWFILTER) {
ifp->if_capenable ^= IFCAP_VLAN_HWFILTER;
reinit = 1;
}
if (mask & IFCAP_VLAN_HWTSO) {
ifp->if_capenable ^= IFCAP_VLAN_HWTSO;
reinit = 1;
}
if (mask & IFCAP_LRO) {
ifp->if_capenable ^= IFCAP_LRO;
reinit = 1;
}
if (reinit && (ifp->if_drv_flags & IFF_DRV_RUNNING))
igb_init(adapter);
VLAN_CAPABILITIES(ifp);
break;
}
default:
error = ether_ioctl(ifp, command, data);
break;
}
return (error);
}
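/*
 * Illustrative sketch, not part of the driver: the SIOCSIFMTU case above
 * rejects any MTU whose resulting frame (MTU plus Ethernet header and CRC)
 * would exceed 9234 bytes. The standalone helpers below restate that
 * arithmetic. The sketch_* names and SKETCH_* constants are hypothetical;
 * 14 and 4 are assumed to match ETHER_HDR_LEN and ETHER_CRC_LEN.
 */

```c
#include <stdbool.h>

#define SKETCH_MAX_FRAME_SIZE	9234	/* limit used by the SIOCSIFMTU case */
#define SKETCH_ETHER_HDR_LEN	14	/* assumed value of ETHER_HDR_LEN */
#define SKETCH_ETHER_CRC_LEN	4	/* assumed value of ETHER_CRC_LEN */

/* True when the requested MTU passes the upper-bound check in the ioctl. */
static bool
sketch_mtu_fits(int mtu)
{
	return (mtu <= SKETCH_MAX_FRAME_SIZE - SKETCH_ETHER_HDR_LEN -
	    SKETCH_ETHER_CRC_LEN);
}

/* Frame size the driver would store in max_frame_size for this MTU. */
static int
sketch_frame_size(int mtu)
{
	return (mtu + SKETCH_ETHER_HDR_LEN + SKETCH_ETHER_CRC_LEN);
}
```

/*
 * Under these assumptions the largest accepted MTU is 9216, and a
 * standard 1500-byte MTU yields a 1518-byte frame.
 */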
/*********************************************************************
* Init entry point
*
* This routine is used in two ways. It is used by the stack as
* the init entry point in the network interface structure. It is
* also used by the driver as a hw/sw initialization routine to get
* to a consistent state.
**********************************************************************/
static void
igb_init_locked(struct adapter *adapter)
{
struct ifnet *ifp = adapter->ifp;
device_t dev = adapter->dev;
INIT_DEBUGOUT("igb_init: begin");
IGB_CORE_LOCK_ASSERT(adapter);
igb_disable_intr(adapter);
callout_stop(&adapter->timer);
/* Get the latest MAC address; the user may have set a locally administered address (LAA) */
bcopy(IF_LLADDR(adapter->ifp), adapter->hw.mac.addr,
ETHER_ADDR_LEN);
/* Put the address into the Receive Address Array */
e1000_rar_set(&adapter->hw, adapter->hw.mac.addr, 0);
igb_reset(adapter);
igb_update_link_status(adapter);
E1000_WRITE_REG(&adapter->hw, E1000_VET, ETHERTYPE_VLAN);
/* Set hardware offload abilities */
ifp->if_hwassist = 0;
if (ifp->if_capenable & IFCAP_TXCSUM) {
#if __FreeBSD_version >= 1000000
ifp->if_hwassist |= (CSUM_IP_TCP | CSUM_IP_UDP);
if (adapter->hw.mac.type != e1000_82575)
ifp->if_hwassist |= CSUM_IP_SCTP;
#else
ifp->if_hwassist |= (CSUM_TCP | CSUM_UDP);
#if __FreeBSD_version >= 800000
if (adapter->hw.mac.type != e1000_82575)
ifp->if_hwassist |= CSUM_SCTP;
#endif
#endif
}
#if __FreeBSD_version >= 1000000
if (ifp->if_capenable & IFCAP_TXCSUM_IPV6) {
ifp->if_hwassist |= (CSUM_IP6_TCP | CSUM_IP6_UDP);
if (adapter->hw.mac.type != e1000_82575)
ifp->if_hwassist |= CSUM_IP6_SCTP;
}
#endif
if (ifp->if_capenable & IFCAP_TSO)
ifp->if_hwassist |= CSUM_TSO;
/* Clear bad data from Rx FIFOs */
e1000_rx_fifo_flush_82575(&adapter->hw);
/* Configure for OS presence */
igb_init_manageability(adapter);
/* Prepare transmit descriptors and buffers */
igb_setup_transmit_structures(adapter);
igb_initialize_transmit_units(adapter);
/* Setup Multicast table */
igb_set_multi(adapter);
/*
** Figure out the desired mbuf pool
** for doing jumbo/packetsplit
*/
if (adapter->max_frame_size <= 2048)
adapter->rx_mbuf_sz = MCLBYTES;
else if (adapter->max_frame_size <= 4096)
adapter->rx_mbuf_sz = MJUMPAGESIZE;
else
adapter->rx_mbuf_sz = MJUM9BYTES;
/* Prepare receive descriptors and buffers */
if (igb_setup_receive_structures(adapter)) {
device_printf(dev, "Could not setup receive structures\n");
return;
}
igb_initialize_receive_units(adapter);
/* Enable VLAN support */
if (ifp->if_capenable & IFCAP_VLAN_HWTAGGING)
igb_setup_vlan_hw_support(adapter);
/* Don't lose promiscuous settings */
igb_set_promisc(adapter);
ifp->if_drv_flags |= IFF_DRV_RUNNING;
ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
callout_reset(&adapter->timer, hz, igb_local_timer, adapter);
e1000_clear_hw_cntrs_base_generic(&adapter->hw);
if (adapter->msix > 1) /* Set up queue routing */
igb_configure_queues(adapter);
/* this clears any pending interrupts */
E1000_READ_REG(&adapter->hw, E1000_ICR);
#ifdef DEVICE_POLLING
/*
* Only enable interrupts if we are not polling, make sure
* they are off otherwise.
*/
if (ifp->if_capenable & IFCAP_POLLING)
igb_disable_intr(adapter);
else
#endif /* DEVICE_POLLING */
{
igb_enable_intr(adapter);
E1000_WRITE_REG(&adapter->hw, E1000_ICS, E1000_ICS_LSC);
}
/* Set Energy Efficient Ethernet */
if (adapter->hw.phy.media_type == e1000_media_type_copper) {
if (adapter->hw.mac.type == e1000_i354)
e1000_set_eee_i354(&adapter->hw, TRUE, TRUE);
else
e1000_set_eee_i350(&adapter->hw, TRUE, TRUE);
}
}
static void
igb_init(void *arg)
{
struct adapter *adapter = arg;
IGB_CORE_LOCK(adapter);
igb_init_locked(adapter);
IGB_CORE_UNLOCK(adapter);
}
static void
igb_handle_que(void *context, int pending)
{
struct igb_queue *que = context;
struct adapter *adapter = que->adapter;
struct tx_ring *txr = que->txr;
struct ifnet *ifp = adapter->ifp;
if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
bool more;
more = igb_rxeof(que, adapter->rx_process_limit, NULL);
IGB_TX_LOCK(txr);
igb_txeof(txr);
#ifndef IGB_LEGACY_TX
/* Process the stack queue only if not depleted */
if (((txr->queue_status & IGB_QUEUE_DEPLETED) == 0) &&
!drbr_empty(ifp, txr->br))
igb_mq_start_locked(ifp, txr);
#else
if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
igb_start_locked(txr, ifp);
#endif
IGB_TX_UNLOCK(txr);
/* Do we need another? */
if (more) {
taskqueue_enqueue(que->tq, &que->que_task);
return;
}
}
#ifdef DEVICE_POLLING
if (ifp->if_capenable & IFCAP_POLLING)
return;
#endif
/* Reenable this interrupt */
if (que->eims)
E1000_WRITE_REG(&adapter->hw, E1000_EIMS, que->eims);
else
igb_enable_intr(adapter);
}
/* Deal with link in a sleepable context */
static void
igb_handle_link(void *context, int pending)
{
struct adapter *adapter = context;
IGB_CORE_LOCK(adapter);
igb_handle_link_locked(adapter);
IGB_CORE_UNLOCK(adapter);
}
static void
igb_handle_link_locked(struct adapter *adapter)
{
struct tx_ring *txr = adapter->tx_rings;
struct ifnet *ifp = adapter->ifp;
IGB_CORE_LOCK_ASSERT(adapter);
adapter->hw.mac.get_link_status = 1;
igb_update_link_status(adapter);
if ((ifp->if_drv_flags & IFF_DRV_RUNNING) && adapter->link_active) {
for (int i = 0; i < adapter->num_queues; i++, txr++) {
IGB_TX_LOCK(txr);
#ifndef IGB_LEGACY_TX
/* Process the stack queue only if not depleted */
if (((txr->queue_status & IGB_QUEUE_DEPLETED) == 0) &&
!drbr_empty(ifp, txr->br))
igb_mq_start_locked(ifp, txr);
#else
if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
igb_start_locked(txr, ifp);
#endif
IGB_TX_UNLOCK(txr);
}
}
}
/*********************************************************************
*
* MSI/Legacy Deferred
* Interrupt Service routine
*
*********************************************************************/
static int
igb_irq_fast(void *arg)
{
struct adapter *adapter = arg;
struct igb_queue *que = adapter->queues;
u32 reg_icr;
reg_icr = E1000_READ_REG(&adapter->hw, E1000_ICR);
/* Hot eject? */
if (reg_icr == 0xffffffff)
return FILTER_STRAY;
/* Definitely not our interrupt. */
if (reg_icr == 0x0)
return FILTER_STRAY;
if ((reg_icr & E1000_ICR_INT_ASSERTED) == 0)
return FILTER_STRAY;
/*
* Mask interrupts until the taskqueue is finished running. This is
* cheap, just assume that it is needed. This also works around the
* MSI message reordering errata on certain systems.
*/
igb_disable_intr(adapter);
taskqueue_enqueue(que->tq, &que->que_task);
/* Link status change */
if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))
taskqueue_enqueue(que->tq, &adapter->link_task);
if (reg_icr & E1000_ICR_RXO)
adapter->rx_overruns++;
return FILTER_HANDLED;
}
#ifdef DEVICE_POLLING
#if __FreeBSD_version >= 800000
#define POLL_RETURN_COUNT(a) (a)
static int
#else
#define POLL_RETURN_COUNT(a)
static void
#endif
igb_poll(struct ifnet *ifp, enum poll_cmd cmd, int count)
{
struct adapter *adapter = ifp->if_softc;
struct igb_queue *que;
struct tx_ring *txr;
u32 reg_icr, rx_done = 0;
u32 loop = IGB_MAX_LOOP;
bool more;
IGB_CORE_LOCK(adapter);
if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) {
IGB_CORE_UNLOCK(adapter);
return POLL_RETURN_COUNT(rx_done);
}
if (cmd == POLL_AND_CHECK_STATUS) {
reg_icr = E1000_READ_REG(&adapter->hw, E1000_ICR);
/* Link status change */
if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))
igb_handle_link_locked(adapter);
if (reg_icr & E1000_ICR_RXO)
adapter->rx_overruns++;
}
IGB_CORE_UNLOCK(adapter);
for (int i = 0; i < adapter->num_queues; i++) {
que = &adapter->queues[i];
txr = que->txr;
igb_rxeof(que, count, &rx_done);
IGB_TX_LOCK(txr);
/* Reset the bound per queue so an exhausted counter cannot wrap */
loop = IGB_MAX_LOOP;
do {
more = igb_txeof(txr);
} while (loop-- && more);
#ifndef IGB_LEGACY_TX
if (!drbr_empty(ifp, txr->br))
igb_mq_start_locked(ifp, txr);
#else
if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
igb_start_locked(txr, ifp);
#endif
IGB_TX_UNLOCK(txr);
}
return POLL_RETURN_COUNT(rx_done);
}
#endif /* DEVICE_POLLING */
/*********************************************************************
*
* MSI-X Queue Interrupt Service routine
*
**********************************************************************/
static void
igb_msix_que(void *arg)
{
struct igb_queue *que = arg;
struct adapter *adapter = que->adapter;
struct ifnet *ifp = adapter->ifp;
struct tx_ring *txr = que->txr;
struct rx_ring *rxr = que->rxr;
u32 newitr = 0;
bool more_rx;
/* Ignore spurious interrupts */
if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
return;
E1000_WRITE_REG(&adapter->hw, E1000_EIMC, que->eims);
++que->irqs;
IGB_TX_LOCK(txr);
igb_txeof(txr);
#ifndef IGB_LEGACY_TX
/* Process the stack queue only if not depleted */
if (((txr->queue_status & IGB_QUEUE_DEPLETED) == 0) &&
!drbr_empty(ifp, txr->br))
igb_mq_start_locked(ifp, txr);
#else
if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
igb_start_locked(txr, ifp);
#endif
IGB_TX_UNLOCK(txr);
more_rx = igb_rxeof(que, adapter->rx_process_limit, NULL);
if (adapter->enable_aim == FALSE)
goto no_calc;
/*
** Do Adaptive Interrupt Moderation:
** - Write out last calculated setting
** - Calculate based on average size over
** the last interval.
*/
if (que->eitr_setting)
E1000_WRITE_REG(&adapter->hw,
E1000_EITR(que->msix), que->eitr_setting);
que->eitr_setting = 0;
/* Idle, do nothing */
if ((txr->bytes == 0) && (rxr->bytes == 0))
goto no_calc;
/* Use half the default if sub-gigabit */
if (adapter->link_speed != 1000)
newitr = IGB_DEFAULT_ITR / 2;
else {
if ((txr->bytes) && (txr->packets))
newitr = txr->bytes/txr->packets;
if ((rxr->bytes) && (rxr->packets))
newitr = max(newitr,
(rxr->bytes / rxr->packets));
newitr += 24; /* account for hardware frame, crc */
/* set an upper boundary */
newitr = min(newitr, 3000);
/* Be nice to the mid range */
if ((newitr > 300) && (newitr < 1200))
newitr = (newitr / 3);
else
newitr = (newitr / 2);
}
newitr &= 0x7FFC; /* Mask invalid bits */
if (adapter->hw.mac.type == e1000_82575)
newitr |= newitr << 16;
else
newitr |= E1000_EITR_CNT_IGNR;
/* save for next interrupt */
que->eitr_setting = newitr;
/* Reset state */
txr->bytes = 0;
txr->packets = 0;
rxr->bytes = 0;
rxr->packets = 0;
no_calc:
/* Schedule a clean task if needed */
if (more_rx)
taskqueue_enqueue(que->tq, &que->que_task);
else
/* Reenable this interrupt */
E1000_WRITE_REG(&adapter->hw, E1000_EIMS, que->eims);
return;
}
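/*
 * Illustrative sketch, not part of the driver: the Adaptive Interrupt
 * Moderation arithmetic in igb_msix_que() can be restated as a pure
 * function. This covers only the 1000 Mbps branch and stops at the
 * 0x7FFC mask; it omits the 82575 high-half copy and the
 * E1000_EITR_CNT_IGNR bit. The sketch_aim_interval name and its
 * parameters are hypothetical.
 */

```c
#include <stdint.h>

/*
 * Derive an interval value from the average packet size seen on the
 * TX and RX rings since the last interrupt.
 */
static uint32_t
sketch_aim_interval(uint32_t tx_bytes, uint32_t tx_packets,
    uint32_t rx_bytes, uint32_t rx_packets)
{
	uint32_t newitr = 0;

	/* Start from the larger of the two average packet sizes. */
	if (tx_bytes && tx_packets)
		newitr = tx_bytes / tx_packets;
	if (rx_bytes && rx_packets) {
		uint32_t rx_avg = rx_bytes / rx_packets;

		if (rx_avg > newitr)
			newitr = rx_avg;
	}
	newitr += 24;		/* account for hardware frame, crc */
	if (newitr > 3000)	/* upper boundary */
		newitr = 3000;
	/* Mid-range sizes are divided by 3, everything else by 2. */
	if (newitr > 300 && newitr < 1200)
		newitr /= 3;
	else
		newitr /= 2;
	return (newitr & 0x7FFC);	/* mask invalid bits */
}
```

/*
 * For example, an average of 600 bytes per packet gives 624 after the
 * overhead is added, which is mid-range and therefore divided by 3.
 */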
/*********************************************************************
*
* MSIX Link Interrupt Service routine
*
**********************************************************************/
static void
igb_msix_link(void *arg)
{
struct adapter *adapter = arg;
u32 icr;
++adapter->link_irq;
icr = E1000_READ_REG(&adapter->hw, E1000_ICR);
if (!(icr & E1000_ICR_LSC))
goto spurious;
igb_handle_link(adapter, 0);
spurious:
/* Rearm */
E1000_WRITE_REG(&adapter->hw, E1000_IMS, E1000_IMS_LSC);
E1000_WRITE_REG(&adapter->hw, E1000_EIMS, adapter->link_mask);
return;
}
/*********************************************************************
*
* Media Ioctl callback
*
* This routine is called whenever the user queries the status of
* the interface using ifconfig.
*
**********************************************************************/
static void
igb_media_status(struct ifnet *ifp, struct ifmediareq *ifmr)
{
struct adapter *adapter = ifp->if_softc;
INIT_DEBUGOUT("igb_media_status: begin");
IGB_CORE_LOCK(adapter);
igb_update_link_status(adapter);
ifmr->ifm_status = IFM_AVALID;
ifmr->ifm_active = IFM_ETHER;
if (!adapter->link_active) {
IGB_CORE_UNLOCK(adapter);
return;
}
ifmr->ifm_status |= IFM_ACTIVE;
switch (adapter->link_speed) {
case 10:
ifmr->ifm_active |= IFM_10_T;
break;
case 100:
/*
** Support for 100Mb SFP - these are Fiber
** but the media type appears as serdes
*/
if (adapter->hw.phy.media_type ==
e1000_media_type_internal_serdes)
ifmr->ifm_active |= IFM_100_FX;
else
ifmr->ifm_active |= IFM_100_TX;
break;
case 1000:
ifmr->ifm_active |= IFM_1000_T;
break;
case 2500:
ifmr->ifm_active |= IFM_2500_SX;
break;
}
if (adapter->link_duplex == FULL_DUPLEX)
ifmr->ifm_active |= IFM_FDX;
else
ifmr->ifm_active |= IFM_HDX;
IGB_CORE_UNLOCK(adapter);
}
/*********************************************************************
*
* Media Ioctl callback
*
* This routine is called when the user changes speed/duplex using
* the media/mediaopt options with ifconfig.
*
**********************************************************************/
static int
igb_media_change(struct ifnet *ifp)
{
struct adapter *adapter = ifp->if_softc;
struct ifmedia *ifm = &adapter->media;
INIT_DEBUGOUT("igb_media_change: begin");
if (IFM_TYPE(ifm->ifm_media) != IFM_ETHER)
return (EINVAL);
IGB_CORE_LOCK(adapter);
switch (IFM_SUBTYPE(ifm->ifm_media)) {
case IFM_AUTO:
adapter->hw.mac.autoneg = DO_AUTO_NEG;
adapter->hw.phy.autoneg_advertised = AUTONEG_ADV_DEFAULT;
break;
case IFM_1000_LX:
case IFM_1000_SX:
case IFM_1000_T:
adapter->hw.mac.autoneg = DO_AUTO_NEG;
adapter->hw.phy.autoneg_advertised = ADVERTISE_1000_FULL;
break;
case IFM_100_TX:
adapter->hw.mac.autoneg = FALSE;
adapter->hw.phy.autoneg_advertised = 0;
if ((ifm->ifm_media & IFM_GMASK) == IFM_FDX)
adapter->hw.mac.forced_speed_duplex = ADVERTISE_100_FULL;
else
adapter->hw.mac.forced_speed_duplex = ADVERTISE_100_HALF;
break;
case IFM_10_T:
adapter->hw.mac.autoneg = FALSE;
adapter->hw.phy.autoneg_advertised = 0;
if ((ifm->ifm_media & IFM_GMASK) == IFM_FDX)
adapter->hw.mac.forced_speed_duplex = ADVERTISE_10_FULL;
else
adapter->hw.mac.forced_speed_duplex = ADVERTISE_10_HALF;
break;
default:
device_printf(adapter->dev, "Unsupported media type\n");
}
igb_init_locked(adapter);
IGB_CORE_UNLOCK(adapter);
return (0);
}
/*********************************************************************
*
* This routine maps the mbufs to Advanced TX descriptors.
*
**********************************************************************/
static int
igb_xmit(struct tx_ring *txr, struct mbuf **m_headp)
{
struct adapter *adapter = txr->adapter;
u32 olinfo_status = 0, cmd_type_len;
int i, j, error, nsegs;
int first;
bool remap = TRUE;
struct mbuf *m_head;
bus_dma_segment_t segs[IGB_MAX_SCATTER];
bus_dmamap_t map;
struct igb_tx_buf *txbuf;
union e1000_adv_tx_desc *txd = NULL;
m_head = *m_headp;
/* Basic descriptor defines */
cmd_type_len = (E1000_ADVTXD_DTYP_DATA |
E1000_ADVTXD_DCMD_IFCS | E1000_ADVTXD_DCMD_DEXT);
if (m_head->m_flags & M_VLANTAG)
cmd_type_len |= E1000_ADVTXD_DCMD_VLE;
/*
* Important to capture the first descriptor
* used because it will contain the index of
* the one we tell the hardware to report back
*/
first = txr->next_avail_desc;
txbuf = &txr->tx_buffers[first];
map = txbuf->map;
/*
* Map the packet for DMA.
*/
retry:
error = bus_dmamap_load_mbuf_sg(txr->txtag, map,
*m_headp, segs, &nsegs, BUS_DMA_NOWAIT);
if (__predict_false(error)) {
struct mbuf *m;
switch (error) {
case EFBIG:
/* Try it again? - one try */
if (remap == TRUE) {
remap = FALSE;
m = m_collapse(*m_headp, M_NOWAIT,
IGB_MAX_SCATTER);
if (m == NULL) {
adapter->mbuf_defrag_failed++;
m_freem(*m_headp);
*m_headp = NULL;
return (ENOBUFS);
}
*m_headp = m;
goto retry;
} else
return (error);
default:
txr->no_tx_dma_setup++;
m_freem(*m_headp);
*m_headp = NULL;
return (error);
}
}
/* Make certain there are enough descriptors */
if (txr->tx_avail < (nsegs + 2)) {
txr->no_desc_avail++;
bus_dmamap_unload(txr->txtag, map);
return (ENOBUFS);
}
m_head = *m_headp;
/*
** Set up the appropriate offload context
** this will consume the first descriptor
*/
error = igb_tx_ctx_setup(txr, m_head, &cmd_type_len, &olinfo_status);
if (__predict_false(error)) {
m_freem(*m_headp);
*m_headp = NULL;
return (error);
}
/* 82575 needs the queue index added */
if (adapter->hw.mac.type == e1000_82575)
olinfo_status |= txr->me << 4;
i = txr->next_avail_desc;
for (j = 0; j < nsegs; j++) {
bus_size_t seglen;
bus_addr_t segaddr;
txbuf = &txr->tx_buffers[i];
txd = &txr->tx_base[i];
seglen = segs[j].ds_len;
segaddr = htole64(segs[j].ds_addr);
txd->read.buffer_addr = segaddr;
txd->read.cmd_type_len = htole32(E1000_TXD_CMD_IFCS |
cmd_type_len | seglen);
txd->read.olinfo_status = htole32(olinfo_status);
if (++i == txr->num_desc)
i = 0;
}
txd->read.cmd_type_len |=
htole32(E1000_TXD_CMD_EOP | E1000_TXD_CMD_RS);
txr->tx_avail -= nsegs;
txr->next_avail_desc = i;
txbuf->m_head = m_head;
/*
** Here we swap the map so the last descriptor,
** which gets the completion interrupt has the
** real map, and the first descriptor gets the
** unused map from this descriptor.
*/
txr->tx_buffers[first].map = txbuf->map;
txbuf->map = map;
bus_dmamap_sync(txr->txtag, map, BUS_DMASYNC_PREWRITE);
/* Set the EOP descriptor that will be marked done */
txbuf = &txr->tx_buffers[first];
txbuf->eop = txd;
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
/*
* Advance the Transmit Descriptor Tail (TDT); this tells the
* hardware that this frame is available to transmit.
*/
++txr->total_packets;
E1000_WRITE_REG(&adapter->hw, E1000_TDT(txr->me), i);
return (0);
}
static void
igb_set_promisc(struct adapter *adapter)
{
struct ifnet *ifp = adapter->ifp;
struct e1000_hw *hw = &adapter->hw;
u32 reg;
if (adapter->vf_ifp) {
e1000_promisc_set_vf(hw, e1000_promisc_enabled);
return;
}
reg = E1000_READ_REG(hw, E1000_RCTL);
if (ifp->if_flags & IFF_PROMISC) {
reg |= (E1000_RCTL_UPE | E1000_RCTL_MPE);
E1000_WRITE_REG(hw, E1000_RCTL, reg);
} else if (ifp->if_flags & IFF_ALLMULTI) {
reg |= E1000_RCTL_MPE;
reg &= ~E1000_RCTL_UPE;
E1000_WRITE_REG(hw, E1000_RCTL, reg);
}
}
static void
igb_disable_promisc(struct adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
struct ifnet *ifp = adapter->ifp;
u32 reg;
int mcnt = 0;
if (adapter->vf_ifp) {
e1000_promisc_set_vf(hw, e1000_promisc_disabled);
return;
}
reg = E1000_READ_REG(hw, E1000_RCTL);
reg &= (~E1000_RCTL_UPE);
if (ifp->if_flags & IFF_ALLMULTI)
mcnt = MAX_NUM_MULTICAST_ADDRESSES;
else {
struct ifmultiaddr *ifma;
#if __FreeBSD_version < 800000
IF_ADDR_LOCK(ifp);
#else
if_maddr_rlock(ifp);
#endif
TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) {
if (ifma->ifma_addr->sa_family != AF_LINK)
continue;
if (mcnt == MAX_NUM_MULTICAST_ADDRESSES)
break;
mcnt++;
}
#if __FreeBSD_version < 800000
IF_ADDR_UNLOCK(ifp);
#else
if_maddr_runlock(ifp);
#endif
}
/* Don't disable if in MAX groups */
if (mcnt < MAX_NUM_MULTICAST_ADDRESSES)
reg &= (~E1000_RCTL_MPE);
E1000_WRITE_REG(hw, E1000_RCTL, reg);
}
/*********************************************************************
* Multicast Update
*
* This routine is called whenever the multicast address list is updated.
*
**********************************************************************/
static void
igb_set_multi(struct adapter *adapter)
{
struct ifnet *ifp = adapter->ifp;
struct ifmultiaddr *ifma;
u32 reg_rctl = 0;
u8 *mta;
int mcnt = 0;
IOCTL_DEBUGOUT("igb_set_multi: begin");
mta = adapter->mta;
bzero(mta, sizeof(uint8_t) * ETH_ADDR_LEN *
MAX_NUM_MULTICAST_ADDRESSES);
#if __FreeBSD_version < 800000
IF_ADDR_LOCK(ifp);
#else
if_maddr_rlock(ifp);
#endif
TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) {
if (ifma->ifma_addr->sa_family != AF_LINK)
continue;
if (mcnt == MAX_NUM_MULTICAST_ADDRESSES)
break;
bcopy(LLADDR((struct sockaddr_dl *)ifma->ifma_addr),
&mta[mcnt * ETH_ADDR_LEN], ETH_ADDR_LEN);
mcnt++;
}
#if __FreeBSD_version < 800000
IF_ADDR_UNLOCK(ifp);
#else
if_maddr_runlock(ifp);
#endif
if (mcnt >= MAX_NUM_MULTICAST_ADDRESSES) {
reg_rctl = E1000_READ_REG(&adapter->hw, E1000_RCTL);
reg_rctl |= E1000_RCTL_MPE;
E1000_WRITE_REG(&adapter->hw, E1000_RCTL, reg_rctl);
} else
e1000_update_mc_addr_list(&adapter->hw, mta, mcnt);
}
/*********************************************************************
* Timer routine:
* This routine checks for link status,
* updates statistics, and does the watchdog.
*
**********************************************************************/
static void
igb_local_timer(void *arg)
{
struct adapter *adapter = arg;
device_t dev = adapter->dev;
struct ifnet *ifp = adapter->ifp;
struct tx_ring *txr = adapter->tx_rings;
struct igb_queue *que = adapter->queues;
int hung = 0, busy = 0;
IGB_CORE_LOCK_ASSERT(adapter);
igb_update_link_status(adapter);
igb_update_stats_counters(adapter);
/*
** Check the TX queues status
** - central locked handling of OACTIVE
** - watchdog only if all queues show hung
*/
for (int i = 0; i < adapter->num_queues; i++, que++, txr++) {
if ((txr->queue_status & IGB_QUEUE_HUNG) &&
(adapter->pause_frames == 0))
++hung;
if (txr->queue_status & IGB_QUEUE_DEPLETED)
++busy;
if ((txr->queue_status & IGB_QUEUE_IDLE) == 0)
taskqueue_enqueue(que->tq, &que->que_task);
}
if (hung == adapter->num_queues)
goto timeout;
if (busy == adapter->num_queues)
ifp->if_drv_flags |= IFF_DRV_OACTIVE;
else if ((ifp->if_drv_flags & IFF_DRV_OACTIVE) &&
(busy < adapter->num_queues))
ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
adapter->pause_frames = 0;
callout_reset(&adapter->timer, hz, igb_local_timer, adapter);
#ifndef DEVICE_POLLING
/* Schedule all queue interrupts - deadlock protection */
E1000_WRITE_REG(&adapter->hw, E1000_EICS, adapter->que_mask);
#endif
return;
timeout:
/* The loop above left txr one past the last ring; report on the first */
txr = adapter->tx_rings;
device_printf(dev, "Watchdog timeout -- resetting\n");
device_printf(dev, "Queue(%d) tdh = %d, hw tdt = %d\n", txr->me,
E1000_READ_REG(&adapter->hw, E1000_TDH(txr->me)),
E1000_READ_REG(&adapter->hw, E1000_TDT(txr->me)));
device_printf(dev, "TX(%d) desc avail = %d, "
"Next TX to Clean = %d\n",
txr->me, txr->tx_avail, txr->next_to_clean);
adapter->ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
adapter->watchdog_events++;
igb_init_locked(adapter);
}
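/*
 * Illustrative sketch, not part of the driver: the queue-status pass in
 * igb_local_timer() boils down to a three-way decision. A reset is
 * triggered only when every queue is hung and no pause frames were seen;
 * IFF_DRV_OACTIVE is set only when every queue is depleted. The
 * sketch_* names are hypothetical, and the SKETCH_QUEUE_* bit values are
 * placeholders rather than the driver's real IGB_QUEUE_* definitions.
 * The sketch assumes at least one queue.
 */

```c
#include <stdint.h>

#define SKETCH_QUEUE_HUNG	0x04	/* placeholder bit value */
#define SKETCH_QUEUE_DEPLETED	0x08	/* placeholder bit value */

enum sketch_verdict {
	SKETCH_RESET,		/* all queues hung: run the watchdog */
	SKETCH_SET_OACTIVE,	/* all queues depleted */
	SKETCH_CLEAR_OACTIVE	/* at least one queue can take work */
};

static enum sketch_verdict
sketch_watchdog_verdict(const uint32_t *status, int num_queues,
    uint64_t pause_frames)
{
	int hung = 0, busy = 0;

	for (int i = 0; i < num_queues; i++) {
		/* A hung queue is discounted if pause frames were seen. */
		if ((status[i] & SKETCH_QUEUE_HUNG) && pause_frames == 0)
			++hung;
		if (status[i] & SKETCH_QUEUE_DEPLETED)
			++busy;
	}
	if (hung == num_queues)
		return (SKETCH_RESET);
	if (busy == num_queues)
		return (SKETCH_SET_OACTIVE);
	return (SKETCH_CLEAR_OACTIVE);
}
```

/*
 * Requiring every queue to be hung keeps a single stalled ring from
 * resetting an adapter whose other rings are still making progress.
 */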
static void
igb_update_link_status(struct adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
struct e1000_fc_info *fc = &hw->fc;
struct ifnet *ifp = adapter->ifp;
device_t dev = adapter->dev;
struct tx_ring *txr = adapter->tx_rings;
u32 link_check, thstat, ctrl;
char *flowctl = NULL;
link_check = thstat = ctrl = 0;
/* Get the cached link value or read for real */
switch (hw->phy.media_type) {
case e1000_media_type_copper:
if (hw->mac.get_link_status) {
/* Do the work to read phy */
e1000_check_for_link(hw);
link_check = !hw->mac.get_link_status;
} else
link_check = TRUE;
break;
case e1000_media_type_fiber:
e1000_check_for_link(hw);
link_check = (E1000_READ_REG(hw, E1000_STATUS) &
E1000_STATUS_LU);
break;
case e1000_media_type_internal_serdes:
e1000_check_for_link(hw);
link_check = adapter->hw.mac.serdes_has_link;
break;
/* VF device is type_unknown */
case e1000_media_type_unknown:
e1000_check_for_link(hw);
link_check = !hw->mac.get_link_status;
/* Fall thru */
default:
break;
}
/* Check for thermal downshift or shutdown */
if (hw->mac.type == e1000_i350) {
thstat = E1000_READ_REG(hw, E1000_THSTAT);
ctrl = E1000_READ_REG(hw, E1000_CTRL_EXT);
}
/* Get the flow control for display */
switch (fc->current_mode) {
case e1000_fc_rx_pause:
flowctl = "RX";
break;
case e1000_fc_tx_pause:
flowctl = "TX";
break;
case e1000_fc_full:
flowctl = "Full";
break;
case e1000_fc_none:
default:
flowctl = "None";
break;
}
/* Now we check if a transition has happened */
if (link_check && (adapter->link_active == 0)) {
e1000_get_speed_and_duplex(&adapter->hw,
&adapter->link_speed, &adapter->link_duplex);
if (bootverbose)
device_printf(dev, "Link is up %d Mbps %s,"
" Flow Control: %s\n",
adapter->link_speed,
((adapter->link_duplex == FULL_DUPLEX) ?
"Full Duplex" : "Half Duplex"), flowctl);
adapter->link_active = 1;
ifp->if_baudrate = adapter->link_speed * 1000000;
if ((ctrl & E1000_CTRL_EXT_LINK_MODE_GMII) &&
(thstat & E1000_THSTAT_LINK_THROTTLE))
device_printf(dev, "Link: thermal downshift\n");
/* Delay Link Up for Phy update */
if (((hw->mac.type == e1000_i210) ||
(hw->mac.type == e1000_i211)) &&
(hw->phy.id == I210_I_PHY_ID))
msec_delay(I210_LINK_DELAY);
/* Reset if the media type changed. */
if (hw->dev_spec._82575.media_changed) {
hw->dev_spec._82575.media_changed = false;
adapter->flags |= IGB_MEDIA_RESET;
igb_reset(adapter);
}
/* This can sleep */
if_link_state_change(ifp, LINK_STATE_UP);
} else if (!link_check && (adapter->link_active == 1)) {
ifp->if_baudrate = adapter->link_speed = 0;
adapter->link_duplex = 0;
if (bootverbose)
device_printf(dev, "Link is Down\n");
if ((ctrl & E1000_CTRL_EXT_LINK_MODE_GMII) &&
(thstat & E1000_THSTAT_PWR_DOWN))
device_printf(dev, "Link: thermal shutdown\n");
adapter->link_active = 0;
/* This can sleep */
if_link_state_change(ifp, LINK_STATE_DOWN);
/* Reset queue state */
for (int i = 0; i < adapter->num_queues; i++, txr++)
txr->queue_status = IGB_QUEUE_IDLE;
}
}
/*********************************************************************
*
* This routine disables all traffic on the adapter by issuing a
* global reset on the MAC and deallocates TX/RX buffers.
*
**********************************************************************/
static void
igb_stop(void *arg)
{
struct adapter *adapter = arg;
struct ifnet *ifp = adapter->ifp;
struct tx_ring *txr = adapter->tx_rings;
IGB_CORE_LOCK_ASSERT(adapter);
INIT_DEBUGOUT("igb_stop: begin");
igb_disable_intr(adapter);
callout_stop(&adapter->timer);
/* Tell the stack that the interface is no longer active */
ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
ifp->if_drv_flags |= IFF_DRV_OACTIVE;
/* Disarm watchdog timer. */
for (int i = 0; i < adapter->num_queues; i++, txr++) {
IGB_TX_LOCK(txr);
txr->queue_status = IGB_QUEUE_IDLE;
IGB_TX_UNLOCK(txr);
}
e1000_reset_hw(&adapter->hw);
E1000_WRITE_REG(&adapter->hw, E1000_WUC, 0);
e1000_led_off(&adapter->hw);
e1000_cleanup_led(&adapter->hw);
}
/*********************************************************************
*
* Determine hardware revision.
*
**********************************************************************/
static void
igb_identify_hardware(struct adapter *adapter)
{
device_t dev = adapter->dev;
/* Make sure our PCI config space has the necessary stuff set */
pci_enable_busmaster(dev);
adapter->hw.bus.pci_cmd_word = pci_read_config(dev, PCIR_COMMAND, 2);
/* Save off the information about this board */
adapter->hw.vendor_id = pci_get_vendor(dev);
adapter->hw.device_id = pci_get_device(dev);
adapter->hw.revision_id = pci_read_config(dev, PCIR_REVID, 1);
adapter->hw.subsystem_vendor_id =
pci_read_config(dev, PCIR_SUBVEND_0, 2);
adapter->hw.subsystem_device_id =
pci_read_config(dev, PCIR_SUBDEV_0, 2);
/* Set MAC type early for PCI setup */
e1000_set_mac_type(&adapter->hw);
/* Are we a VF device? */
if ((adapter->hw.mac.type == e1000_vfadapt) ||
(adapter->hw.mac.type == e1000_vfadapt_i350))
adapter->vf_ifp = 1;
else
adapter->vf_ifp = 0;
}
static int
igb_allocate_pci_resources(struct adapter *adapter)
{
device_t dev = adapter->dev;
int rid;
rid = PCIR_BAR(0);
adapter->pci_mem = bus_alloc_resource_any(dev, SYS_RES_MEMORY,
&rid, RF_ACTIVE);
if (adapter->pci_mem == NULL) {
device_printf(dev, "Unable to allocate bus resource: memory\n");
return (ENXIO);
}
adapter->osdep.mem_bus_space_tag =
rman_get_bustag(adapter->pci_mem);
adapter->osdep.mem_bus_space_handle =
rman_get_bushandle(adapter->pci_mem);
adapter->hw.hw_addr = (u8 *)&adapter->osdep.mem_bus_space_handle;
adapter->num_queues = 1; /* Defaults for Legacy or MSI */
/* This will set up either MSI-X or MSI */
adapter->msix = igb_setup_msix(adapter);
adapter->hw.back = &adapter->osdep;
return (0);
}
/*********************************************************************
*
* Setup the Legacy or MSI Interrupt handler
*
**********************************************************************/
static int
igb_allocate_legacy(struct adapter *adapter)
{
device_t dev = adapter->dev;
struct igb_queue *que = adapter->queues;
#ifndef IGB_LEGACY_TX
struct tx_ring *txr = adapter->tx_rings;
#endif
int error, rid = 0;
/* Turn off all interrupts */
E1000_WRITE_REG(&adapter->hw, E1000_IMC, 0xffffffff);
/* MSI RID is 1 */
if (adapter->msix == 1)
rid = 1;
/* We allocate a single interrupt resource */
adapter->res = bus_alloc_resource_any(dev,
SYS_RES_IRQ, &rid, RF_SHAREABLE | RF_ACTIVE);
if (adapter->res == NULL) {
device_printf(dev, "Unable to allocate bus resource: "
"interrupt\n");
return (ENXIO);
}
#ifndef IGB_LEGACY_TX
TASK_INIT(&txr->txq_task, 0, igb_deferred_mq_start, txr);
#endif
/*
* Try allocating a fast interrupt and the associated deferred
* processing contexts.
*/
TASK_INIT(&que->que_task, 0, igb_handle_que, que);
/* Make tasklet for deferred link handling */
TASK_INIT(&adapter->link_task, 0, igb_handle_link, adapter);
que->tq = taskqueue_create_fast("igb_taskq", M_NOWAIT,
taskqueue_thread_enqueue, &que->tq);
taskqueue_start_threads(&que->tq, 1, PI_NET, "%s taskq",
device_get_nameunit(adapter->dev));
if ((error = bus_setup_intr(dev, adapter->res,
INTR_TYPE_NET | INTR_MPSAFE, igb_irq_fast, NULL,
adapter, &adapter->tag)) != 0) {
device_printf(dev, "Failed to register fast interrupt "
"handler: %d\n", error);
taskqueue_free(que->tq);
que->tq = NULL;
return (error);
}
return (0);
}
/*********************************************************************
*
* Setup the MSIX Queue Interrupt handlers:
*
**********************************************************************/
static int
igb_allocate_msix(struct adapter *adapter)
{
device_t dev = adapter->dev;
struct igb_queue *que = adapter->queues;
int error, rid, vector = 0;
int cpu_id = 0;
#ifdef RSS
cpuset_t cpu_mask;
#endif
/* Be sure to start with all interrupts disabled */
E1000_WRITE_REG(&adapter->hw, E1000_IMC, ~0);
E1000_WRITE_FLUSH(&adapter->hw);
#ifdef RSS
/*
* If we're doing RSS, the number of queues needs to
* match the number of RSS buckets that are configured.
*
* + If there's more queues than RSS buckets, we'll end
* up with queues that get no traffic.
*
* + If there's more RSS buckets than queues, we'll end
* up having multiple RSS buckets map to the same queue,
* so there'll be some contention.
*/
if (adapter->num_queues != rss_getnumbuckets()) {
device_printf(dev,
"%s: number of queues (%d) != number of RSS buckets (%d)"
"; performance will be impacted.\n",
__func__,
adapter->num_queues,
rss_getnumbuckets());
}
#endif
for (int i = 0; i < adapter->num_queues; i++, vector++, que++) {
rid = vector + 1;
que->res = bus_alloc_resource_any(dev,
SYS_RES_IRQ, &rid, RF_SHAREABLE | RF_ACTIVE);
if (que->res == NULL) {
device_printf(dev,
"Unable to allocate bus resource: "
"MSIX Queue Interrupt\n");
return (ENXIO);
}
error = bus_setup_intr(dev, que->res,
INTR_TYPE_NET | INTR_MPSAFE, NULL,
igb_msix_que, que, &que->tag);
if (error) {
que->res = NULL;
device_printf(dev, "Failed to register Queue handler\n");
return (error);
}
#if __FreeBSD_version >= 800504
bus_describe_intr(dev, que->res, que->tag, "que %d", i);
#endif
que->msix = vector;
if (adapter->hw.mac.type == e1000_82575)
que->eims = E1000_EICR_TX_QUEUE0 << i;
else
que->eims = 1 << vector;
#ifdef RSS
/*
* The queue ID is used as the RSS layer bucket ID.
* We look up the queue ID -> RSS CPU ID and select
* that.
*/
cpu_id = rss_getcpu(i % rss_getnumbuckets());
#else
/*
* Bind the msix vector, and thus the
* rings to the corresponding cpu.
*
* This just happens to match the default RSS round-robin
* bucket -> queue -> CPU allocation.
*/
if (adapter->num_queues > 1) {
if (igb_last_bind_cpu < 0)
igb_last_bind_cpu = CPU_FIRST();
cpu_id = igb_last_bind_cpu;
}
#endif
if (adapter->num_queues > 1) {
bus_bind_intr(dev, que->res, cpu_id);
#ifdef RSS
device_printf(dev,
"Bound queue %d to RSS bucket %d\n",
i, i % rss_getnumbuckets());
#else
device_printf(dev,
"Bound queue %d to cpu %d\n",
i, cpu_id);
#endif
}
#ifndef IGB_LEGACY_TX
TASK_INIT(&que->txr->txq_task, 0, igb_deferred_mq_start,
que->txr);
#endif
/* Make tasklet for deferred handling */
TASK_INIT(&que->que_task, 0, igb_handle_que, que);
que->tq = taskqueue_create("igb_que", M_NOWAIT,
taskqueue_thread_enqueue, &que->tq);
if (adapter->num_queues > 1) {
/*
* Only pin the taskqueue thread to a CPU if
* RSS is in use.
*
* This again just happens to match the default RSS
* round-robin bucket -> queue -> CPU allocation.
*/
#ifdef RSS
CPU_SETOF(cpu_id, &cpu_mask);
taskqueue_start_threads_cpuset(&que->tq, 1, PI_NET,
&cpu_mask,
"%s que (bucket %d)",
device_get_nameunit(adapter->dev),
cpu_id);
#else
taskqueue_start_threads(&que->tq, 1, PI_NET,
"%s que (qid %d)",
device_get_nameunit(adapter->dev),
cpu_id);
#endif
} else {
taskqueue_start_threads(&que->tq, 1, PI_NET, "%s que",
device_get_nameunit(adapter->dev));
}
/* Finally update the last bound CPU id */
if (adapter->num_queues > 1)
igb_last_bind_cpu = CPU_NEXT(igb_last_bind_cpu);
}
/* And Link */
rid = vector + 1;
adapter->res = bus_alloc_resource_any(dev,
SYS_RES_IRQ, &rid, RF_SHAREABLE | RF_ACTIVE);
if (adapter->res == NULL) {
device_printf(dev,
"Unable to allocate bus resource: "
"MSIX Link Interrupt\n");
return (ENXIO);
}
if ((error = bus_setup_intr(dev, adapter->res,
INTR_TYPE_NET | INTR_MPSAFE, NULL,
igb_msix_link, adapter, &adapter->tag)) != 0) {
device_printf(dev, "Failed to register Link handler\n");
return (error);
}
#if __FreeBSD_version >= 800504
bus_describe_intr(dev, adapter->res, adapter->tag, "link");
#endif
adapter->linkvec = vector;
return (0);
}
static void
igb_configure_queues(struct adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
struct igb_queue *que;
u32 tmp, ivar = 0, newitr = 0;
/* First turn on RSS capability */
if (adapter->hw.mac.type != e1000_82575)
E1000_WRITE_REG(hw, E1000_GPIE,
E1000_GPIE_MSIX_MODE | E1000_GPIE_EIAME |
E1000_GPIE_PBA | E1000_GPIE_NSICR);
/* Turn on MSIX */
switch (adapter->hw.mac.type) {
case e1000_82580:
case e1000_i350:
case e1000_i354:
case e1000_i210:
case e1000_i211:
case e1000_vfadapt:
case e1000_vfadapt_i350:
/* RX entries */
for (int i = 0; i < adapter->num_queues; i++) {
u32 index = i >> 1;
ivar = E1000_READ_REG_ARRAY(hw, E1000_IVAR0, index);
que = &adapter->queues[i];
if (i & 1) {
ivar &= 0xFF00FFFF;
ivar |= (que->msix | E1000_IVAR_VALID) << 16;
} else {
ivar &= 0xFFFFFF00;
ivar |= que->msix | E1000_IVAR_VALID;
}
E1000_WRITE_REG_ARRAY(hw, E1000_IVAR0, index, ivar);
}
/* TX entries */
for (int i = 0; i < adapter->num_queues; i++) {
u32 index = i >> 1;
ivar = E1000_READ_REG_ARRAY(hw, E1000_IVAR0, index);
que = &adapter->queues[i];
if (i & 1) {
ivar &= 0x00FFFFFF;
ivar |= (que->msix | E1000_IVAR_VALID) << 24;
} else {
ivar &= 0xFFFF00FF;
ivar |= (que->msix | E1000_IVAR_VALID) << 8;
}
E1000_WRITE_REG_ARRAY(hw, E1000_IVAR0, index, ivar);
adapter->que_mask |= que->eims;
}
/* And for the link interrupt */
ivar = (adapter->linkvec | E1000_IVAR_VALID) << 8;
adapter->link_mask = 1 << adapter->linkvec;
E1000_WRITE_REG(hw, E1000_IVAR_MISC, ivar);
break;
case e1000_82576:
/* RX entries */
for (int i = 0; i < adapter->num_queues; i++) {
u32 index = i & 0x7; /* Each IVAR has two entries */
ivar = E1000_READ_REG_ARRAY(hw, E1000_IVAR0, index);
que = &adapter->queues[i];
if (i < 8) {
ivar &= 0xFFFFFF00;
ivar |= que->msix | E1000_IVAR_VALID;
} else {
ivar &= 0xFF00FFFF;
ivar |= (que->msix | E1000_IVAR_VALID) << 16;
}
E1000_WRITE_REG_ARRAY(hw, E1000_IVAR0, index, ivar);
adapter->que_mask |= que->eims;
}
/* TX entries */
for (int i = 0; i < adapter->num_queues; i++) {
u32 index = i & 0x7; /* Each IVAR has two entries */
ivar = E1000_READ_REG_ARRAY(hw, E1000_IVAR0, index);
que = &adapter->queues[i];
if (i < 8) {
ivar &= 0xFFFF00FF;
ivar |= (que->msix | E1000_IVAR_VALID) << 8;
} else {
ivar &= 0x00FFFFFF;
ivar |= (que->msix | E1000_IVAR_VALID) << 24;
}
E1000_WRITE_REG_ARRAY(hw, E1000_IVAR0, index, ivar);
adapter->que_mask |= que->eims;
}
/* And for the link interrupt */
ivar = (adapter->linkvec | E1000_IVAR_VALID) << 8;
adapter->link_mask = 1 << adapter->linkvec;
E1000_WRITE_REG(hw, E1000_IVAR_MISC, ivar);
break;
case e1000_82575:
/* enable MSI-X support*/
tmp = E1000_READ_REG(hw, E1000_CTRL_EXT);
tmp |= E1000_CTRL_EXT_PBA_CLR;
/* Auto-Mask interrupts upon ICR read. */
tmp |= E1000_CTRL_EXT_EIAME;
tmp |= E1000_CTRL_EXT_IRCA;
E1000_WRITE_REG(hw, E1000_CTRL_EXT, tmp);
/* Queues */
for (int i = 0; i < adapter->num_queues; i++) {
que = &adapter->queues[i];
tmp = E1000_EICR_RX_QUEUE0 << i;
tmp |= E1000_EICR_TX_QUEUE0 << i;
que->eims = tmp;
E1000_WRITE_REG_ARRAY(hw, E1000_MSIXBM(0),
i, que->eims);
adapter->que_mask |= que->eims;
}
/* Link */
E1000_WRITE_REG(hw, E1000_MSIXBM(adapter->linkvec),
E1000_EIMS_OTHER);
adapter->link_mask |= E1000_EIMS_OTHER;
break;
default:
break;
}
/* Set the starting interrupt rate */
if (igb_max_interrupt_rate > 0)
newitr = (4000000 / igb_max_interrupt_rate) & 0x7FFC;
if (hw->mac.type == e1000_82575)
newitr |= newitr << 16;
else
newitr |= E1000_EITR_CNT_IGNR;
for (int i = 0; i < adapter->num_queues; i++) {
que = &adapter->queues[i];
E1000_WRITE_REG(hw, E1000_EITR(que->msix), newitr);
}
return;
}
static void
igb_free_pci_resources(struct adapter *adapter)
{
struct igb_queue *que = adapter->queues;
device_t dev = adapter->dev;
int rid;
/*
** There is a slight possibility of a failure mode
** in attach that will result in entering this function
** before interrupt resources have been initialized, and
** in that case we do not want to execute the loops below
** We can detect this reliably by the state of the adapter
** res pointer.
*/
if (adapter->res == NULL)
goto mem;
/*
* First release all the interrupt resources:
*/
for (int i = 0; i < adapter->num_queues; i++, que++) {
rid = que->msix + 1;
if (que->tag != NULL) {
bus_teardown_intr(dev, que->res, que->tag);
que->tag = NULL;
}
if (que->res != NULL)
bus_release_resource(dev,
SYS_RES_IRQ, rid, que->res);
}
/* Clean the Legacy or Link interrupt last */
if (adapter->linkvec) /* we are doing MSIX */
rid = adapter->linkvec + 1;
else
rid = (adapter->msix != 0) ? 1 : 0;
que = adapter->queues;
if (adapter->tag != NULL) {
taskqueue_drain(que->tq, &adapter->link_task);
bus_teardown_intr(dev, adapter->res, adapter->tag);
adapter->tag = NULL;
}
if (adapter->res != NULL)
bus_release_resource(dev, SYS_RES_IRQ, rid, adapter->res);
for (int i = 0; i < adapter->num_queues; i++, que++) {
if (que->tq != NULL) {
#ifndef IGB_LEGACY_TX
taskqueue_drain(que->tq, &que->txr->txq_task);
#endif
taskqueue_drain(que->tq, &que->que_task);
taskqueue_free(que->tq);
}
}
mem:
if (adapter->msix)
pci_release_msi(dev);
if (adapter->msix_mem != NULL)
bus_release_resource(dev, SYS_RES_MEMORY,
adapter->memrid, adapter->msix_mem);
if (adapter->pci_mem != NULL)
bus_release_resource(dev, SYS_RES_MEMORY,
PCIR_BAR(0), adapter->pci_mem);
}
/*
* Setup Either MSI/X or MSI
*/
static int
igb_setup_msix(struct adapter *adapter)
{
device_t dev = adapter->dev;
int bar, want, queues, msgs, maxqueues;
/* Tunable override */
if (igb_enable_msix == 0)
goto msi;
/* First try MSI/X */
msgs = pci_msix_count(dev);
if (msgs == 0)
goto msi;
/*
** Some new devices, as with ixgbe, now may
** use a different BAR, so we need to keep
** track of which is used.
*/
adapter->memrid = PCIR_BAR(IGB_MSIX_BAR);
bar = pci_read_config(dev, adapter->memrid, 4);
if (bar == 0) /* use next bar */
adapter->memrid += 4;
adapter->msix_mem = bus_alloc_resource_any(dev,
SYS_RES_MEMORY, &adapter->memrid, RF_ACTIVE);
if (adapter->msix_mem == NULL) {
/* May not be enabled */
device_printf(adapter->dev,
"Unable to map MSIX table\n");
goto msi;
}
queues = (mp_ncpus > (msgs-1)) ? (msgs-1) : mp_ncpus;
/* Override via tunable */
if (igb_num_queues != 0)
queues = igb_num_queues;
#ifdef RSS
/* If we're doing RSS, clamp at the number of RSS buckets */
if (queues > rss_getnumbuckets())
queues = rss_getnumbuckets();
#endif
/* Sanity check based on HW */
switch (adapter->hw.mac.type) {
case e1000_82575:
maxqueues = 4;
break;
case e1000_82576:
case e1000_82580:
case e1000_i350:
case e1000_i354:
maxqueues = 8;
break;
case e1000_i210:
maxqueues = 4;
break;
case e1000_i211:
maxqueues = 2;
break;
default: /* VF interfaces */
maxqueues = 1;
break;
}
/* Final clamp on the actual hardware capability */
if (queues > maxqueues)
queues = maxqueues;
/*
** One vector (RX/TX pair) per queue
** plus an additional for Link interrupt
*/
want = queues + 1;
if (msgs >= want)
msgs = want;
else {
device_printf(adapter->dev,
"MSIX Configuration Problem, "
"%d vectors configured, but %d queues wanted!\n",
msgs, want);
goto msi;
}
if ((pci_alloc_msix(dev, &msgs) == 0) && (msgs == want)) {
device_printf(adapter->dev,
"Using MSIX interrupts with %d vectors\n", msgs);
adapter->num_queues = queues;
return (msgs);
}
/*
** If MSIX alloc failed or provided us with
** less than needed, free and fall through to MSI
*/
pci_release_msi(dev);
msi:
if (adapter->msix_mem != NULL) {
bus_release_resource(dev, SYS_RES_MEMORY,
adapter->memrid, adapter->msix_mem);
adapter->msix_mem = NULL;
}
msgs = 1;
if (pci_alloc_msi(dev, &msgs) == 0) {
device_printf(adapter->dev, "Using an MSI interrupt\n");
return (msgs);
}
device_printf(adapter->dev, "Using a Legacy interrupt\n");
return (0);
}
/*********************************************************************
*
* Initialize the DMA Coalescing feature
*
**********************************************************************/
static void
igb_init_dmac(struct adapter *adapter, u32 pba)
{
device_t dev = adapter->dev;
struct e1000_hw *hw = &adapter->hw;
u32 dmac, reg = ~E1000_DMACR_DMAC_EN;
u16 hwm;
if (hw->mac.type == e1000_i211)
return;
if (hw->mac.type > e1000_82580) {
if (adapter->dmac == 0) { /* Disabling it */
E1000_WRITE_REG(hw, E1000_DMACR, reg);
return;
} else
device_printf(dev, "DMA Coalescing enabled\n");
/* Set starting threshold */
E1000_WRITE_REG(hw, E1000_DMCTXTH, 0);
hwm = 64 * pba - adapter->max_frame_size / 16;
if (hwm < 64 * (pba - 6))
hwm = 64 * (pba - 6);
reg = E1000_READ_REG(hw, E1000_FCRTC);
reg &= ~E1000_FCRTC_RTH_COAL_MASK;
reg |= ((hwm << E1000_FCRTC_RTH_COAL_SHIFT)
& E1000_FCRTC_RTH_COAL_MASK);
E1000_WRITE_REG(hw, E1000_FCRTC, reg);
dmac = pba - adapter->max_frame_size / 512;
if (dmac < pba - 10)
dmac = pba - 10;
reg = E1000_READ_REG(hw, E1000_DMACR);
reg &= ~E1000_DMACR_DMACTHR_MASK;
reg |= ((dmac << E1000_DMACR_DMACTHR_SHIFT)
& E1000_DMACR_DMACTHR_MASK);
/* transition to L0x or L1 if available..*/
reg |= (E1000_DMACR_DMAC_EN | E1000_DMACR_DMAC_LX_MASK);
/*
 * Check whether this is a 2.5Gb backplane connection before
 * configuring the watchdog timer. The configured value is
 * expressed in 12.8 usec intervals on a 2.5Gb link
 * ((dmac * 5) >> 6) and in 32 usec intervals otherwise
 * (dmac >> 5).
 */
if (hw->mac.type == e1000_i354) {
int status = E1000_READ_REG(hw, E1000_STATUS);
if ((status & E1000_STATUS_2P5_SKU) &&
(!(status & E1000_STATUS_2P5_SKU_OVER)))
reg |= ((adapter->dmac * 5) >> 6);
else
reg |= (adapter->dmac >> 5);
} else {
reg |= (adapter->dmac >> 5);
}
E1000_WRITE_REG(hw, E1000_DMACR, reg);
E1000_WRITE_REG(hw, E1000_DMCRTRH, 0);
/* Set the interval before transition */
reg = E1000_READ_REG(hw, E1000_DMCTLX);
if (hw->mac.type == e1000_i350)
reg |= IGB_DMCTLX_DCFLUSH_DIS;
/*
** On a 2.5Gb connection the TTLX unit is 0.4 usec, so the
** 4 usec delay is 4 / 0.4 = 10 = 0xA units; otherwise it
** is programmed as 0x4.
*/
if (hw->mac.type == e1000_i354) {
int status = E1000_READ_REG(hw, E1000_STATUS);
if ((status & E1000_STATUS_2P5_SKU) &&
(!(status & E1000_STATUS_2P5_SKU_OVER)))
reg |= 0xA;
else
reg |= 0x4;
} else {
reg |= 0x4;
}
E1000_WRITE_REG(hw, E1000_DMCTLX, reg);
/* free space in tx packet buffer to wake from DMA coal */
E1000_WRITE_REG(hw, E1000_DMCTXTH, (IGB_TXPBSIZE -
(2 * adapter->max_frame_size)) >> 6);
/* make low power state decision controlled by DMA coal */
reg = E1000_READ_REG(hw, E1000_PCIEMISC);
reg &= ~E1000_PCIEMISC_LX_DECISION;
E1000_WRITE_REG(hw, E1000_PCIEMISC, reg);
} else if (hw->mac.type == e1000_82580) {
/* Reuse the outer 'reg' rather than shadowing it */
reg = E1000_READ_REG(hw, E1000_PCIEMISC);
E1000_WRITE_REG(hw, E1000_PCIEMISC,
reg & ~E1000_PCIEMISC_LX_DECISION);
E1000_WRITE_REG(hw, E1000_DMACR, 0);
}
}
/*********************************************************************
*
* Set up a fresh starting state
*
**********************************************************************/
static void
igb_reset(struct adapter *adapter)
{
device_t dev = adapter->dev;
struct e1000_hw *hw = &adapter->hw;
struct e1000_fc_info *fc = &hw->fc;
struct ifnet *ifp = adapter->ifp;
u32 pba = 0;
u16 hwm;
INIT_DEBUGOUT("igb_reset: begin");
/* Let the firmware know the OS is in control */
igb_get_hw_control(adapter);
/*
* Packet Buffer Allocation (PBA)
* Writing PBA sets the receive portion of the buffer;
* the remainder is used for the transmit buffer.
*/
switch (hw->mac.type) {
case e1000_82575:
pba = E1000_PBA_32K;
break;
case e1000_82576:
case e1000_vfadapt:
pba = E1000_READ_REG(hw, E1000_RXPBS);
pba &= E1000_RXPBS_SIZE_MASK_82576;
break;
case e1000_82580:
case e1000_i350:
case e1000_i354:
case e1000_vfadapt_i350:
pba = E1000_READ_REG(hw, E1000_RXPBS);
pba = e1000_rxpbs_adjust_82580(pba);
break;
case e1000_i210:
case e1000_i211:
pba = E1000_PBA_34K;
break;
default:
break;
}
/* Special needs in case of Jumbo frames */
if ((hw->mac.type == e1000_82575) && (ifp->if_mtu > ETHERMTU)) {
u32 tx_space, min_tx, min_rx;
pba = E1000_READ_REG(hw, E1000_PBA);
tx_space = pba >> 16;
pba &= 0xffff;
min_tx = (adapter->max_frame_size +
sizeof(struct e1000_tx_desc) - ETHERNET_FCS_SIZE) * 2;
min_tx = roundup2(min_tx, 1024);
min_tx >>= 10;
min_rx = adapter->max_frame_size;
min_rx = roundup2(min_rx, 1024);
min_rx >>= 10;
if (tx_space < min_tx &&
((min_tx - tx_space) < pba)) {
pba = pba - (min_tx - tx_space);
/*
* if short on rx space, rx wins
* and must trump tx adjustment
*/
if (pba < min_rx)
pba = min_rx;
}
E1000_WRITE_REG(hw, E1000_PBA, pba);
}
INIT_DEBUGOUT1("igb_reset: pba=%dK", pba);
/*
* These parameters control the automatic generation (Tx) and
* response (Rx) to Ethernet PAUSE frames.
* - High water mark should allow for at least two frames to be
* received after sending an XOFF.
* - Low water mark works best when it is very near the high water mark.
* This allows the receiver to restart by sending XON when it has
* drained a bit.
*/
hwm = min(((pba << 10) * 9 / 10),
((pba << 10) - 2 * adapter->max_frame_size));
if (hw->mac.type < e1000_82576) {
fc->high_water = hwm & 0xFFF8; /* 8-byte granularity */
fc->low_water = fc->high_water - 8;
} else {
fc->high_water = hwm & 0xFFF0; /* 16-byte granularity */
fc->low_water = fc->high_water - 16;
}
fc->pause_time = IGB_FC_PAUSE_TIME;
fc->send_xon = TRUE;
if (adapter->fc)
fc->requested_mode = adapter->fc;
else
fc->requested_mode = e1000_fc_default;
/* Issue a global reset */
e1000_reset_hw(hw);
E1000_WRITE_REG(hw, E1000_WUC, 0);
/* Reset for AutoMediaDetect */
if (adapter->flags & IGB_MEDIA_RESET) {
e1000_setup_init_funcs(hw, TRUE);
e1000_get_bus_info(hw);
adapter->flags &= ~IGB_MEDIA_RESET;
}
if (e1000_init_hw(hw) < 0)
device_printf(dev, "Hardware Initialization Failed\n");
/* Setup DMA Coalescing */
igb_init_dmac(adapter, pba);
E1000_WRITE_REG(&adapter->hw, E1000_VET, ETHERTYPE_VLAN);
e1000_get_phy_info(hw);
e1000_check_for_link(hw);
return;
}
/*********************************************************************
*
* Setup networking device structure and register an interface.
*
**********************************************************************/
static int
igb_setup_interface(device_t dev, struct adapter *adapter)
{
struct ifnet *ifp;
INIT_DEBUGOUT("igb_setup_interface: begin");
ifp = adapter->ifp = if_alloc(IFT_ETHER);
if (ifp == NULL) {
device_printf(dev, "can not allocate ifnet structure\n");
return (-1);
}
if_initname(ifp, device_get_name(dev), device_get_unit(dev));
ifp->if_init = igb_init;
ifp->if_softc = adapter;
ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
ifp->if_ioctl = igb_ioctl;
ifp->if_get_counter = igb_get_counter;
/* TSO parameters */
ifp->if_hw_tsomax = IP_MAXPACKET;
ifp->if_hw_tsomaxsegcount = IGB_MAX_SCATTER;
ifp->if_hw_tsomaxsegsize = IGB_TSO_SEG_SIZE;
#ifndef IGB_LEGACY_TX
ifp->if_transmit = igb_mq_start;
ifp->if_qflush = igb_qflush;
#else
ifp->if_start = igb_start;
IFQ_SET_MAXLEN(&ifp->if_snd, adapter->num_tx_desc - 1);
ifp->if_snd.ifq_drv_maxlen = adapter->num_tx_desc - 1;
IFQ_SET_READY(&ifp->if_snd);
#endif
ether_ifattach(ifp, adapter->hw.mac.addr);
ifp->if_capabilities = ifp->if_capenable = 0;
ifp->if_capabilities = IFCAP_HWCSUM | IFCAP_VLAN_HWCSUM;
#if __FreeBSD_version >= 1000000
ifp->if_capabilities |= IFCAP_HWCSUM_IPV6;
#endif
ifp->if_capabilities |= IFCAP_TSO;
ifp->if_capabilities |= IFCAP_JUMBO_MTU;
ifp->if_capenable = ifp->if_capabilities;
/* Don't enable LRO by default */
ifp->if_capabilities |= IFCAP_LRO;
#ifdef DEVICE_POLLING
ifp->if_capabilities |= IFCAP_POLLING;
#endif
/*
* Tell the upper layer(s) we
* support full VLAN capability.
*/
ifp->if_hdrlen = sizeof(struct ether_vlan_header);
ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING
| IFCAP_VLAN_HWTSO
| IFCAP_VLAN_MTU;
ifp->if_capenable |= IFCAP_VLAN_HWTAGGING
| IFCAP_VLAN_HWTSO
| IFCAP_VLAN_MTU;
/*
** Don't turn this on by default: if VLANs are
** created on another pseudo device (e.g. lagg),
** VLAN events are not passed through, breaking
** operation, but with HW FILTER off it works. If
** using VLANs directly on the igb driver you can
** enable this and get full hardware tag filtering.
*/
ifp->if_capabilities |= IFCAP_VLAN_HWFILTER;
/*
* Specify the media types supported by this adapter and register
* callbacks to update media and link information
*/
ifmedia_init(&adapter->media, IFM_IMASK,
igb_media_change, igb_media_status);
if ((adapter->hw.phy.media_type == e1000_media_type_fiber) ||
(adapter->hw.phy.media_type == e1000_media_type_internal_serdes)) {
ifmedia_add(&adapter->media, IFM_ETHER | IFM_1000_SX | IFM_FDX,
0, NULL);
ifmedia_add(&adapter->media, IFM_ETHER | IFM_1000_SX, 0, NULL);
} else {
ifmedia_add(&adapter->media, IFM_ETHER | IFM_10_T, 0, NULL);
ifmedia_add(&adapter->media, IFM_ETHER | IFM_10_T | IFM_FDX,
0, NULL);
ifmedia_add(&adapter->media, IFM_ETHER | IFM_100_TX,
0, NULL);
ifmedia_add(&adapter->media, IFM_ETHER | IFM_100_TX | IFM_FDX,
0, NULL);
if (adapter->hw.phy.type != e1000_phy_ife) {
ifmedia_add(&adapter->media,
IFM_ETHER | IFM_1000_T | IFM_FDX, 0, NULL);
ifmedia_add(&adapter->media,
IFM_ETHER | IFM_1000_T, 0, NULL);
}
}
ifmedia_add(&adapter->media, IFM_ETHER | IFM_AUTO, 0, NULL);
ifmedia_set(&adapter->media, IFM_ETHER | IFM_AUTO);
return (0);
}
/*
* Manage DMA'able memory.
*/
static void
igb_dmamap_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{
if (error)
return;
*(bus_addr_t *) arg = segs[0].ds_addr;
}
static int
igb_dma_malloc(struct adapter *adapter, bus_size_t size,
struct igb_dma_alloc *dma, int mapflags)
{
int error;
error = bus_dma_tag_create(bus_get_dma_tag(adapter->dev), /* parent */
IGB_DBA_ALIGN, 0, /* alignment, bounds */
BUS_SPACE_MAXADDR, /* lowaddr */
BUS_SPACE_MAXADDR, /* highaddr */
NULL, NULL, /* filter, filterarg */
size, /* maxsize */
1, /* nsegments */
size, /* maxsegsize */
0, /* flags */
NULL, /* lockfunc */
NULL, /* lockarg */
&dma->dma_tag);
if (error) {
device_printf(adapter->dev,
"%s: bus_dma_tag_create failed: %d\n",
__func__, error);
goto fail_0;
}
error = bus_dmamem_alloc(dma->dma_tag, (void**) &dma->dma_vaddr,
BUS_DMA_NOWAIT | BUS_DMA_COHERENT, &dma->dma_map);
if (error) {
device_printf(adapter->dev,
"%s: bus_dmamem_alloc(%ju) failed: %d\n",
__func__, (uintmax_t)size, error);
goto fail_1;
}
dma->dma_paddr = 0;
error = bus_dmamap_load(dma->dma_tag, dma->dma_map, dma->dma_vaddr,
size, igb_dmamap_cb, &dma->dma_paddr, mapflags | BUS_DMA_NOWAIT);
if (error || dma->dma_paddr == 0) {
device_printf(adapter->dev,
"%s: bus_dmamap_load failed: %d\n",
__func__, error);
goto fail_2;
}
return (0);
fail_2:
/* Load failed: unload the map and release the DMA memory */
bus_dmamap_unload(dma->dma_tag, dma->dma_map);
bus_dmamem_free(dma->dma_tag, dma->dma_vaddr, dma->dma_map);
fail_1:
/* Allocation failed: no DMA memory to free, only the tag */
bus_dma_tag_destroy(dma->dma_tag);
fail_0:
dma->dma_tag = NULL;
return (error);
}
static void
igb_dma_free(struct adapter *adapter, struct igb_dma_alloc *dma)
{
if (dma->dma_tag == NULL)
return;
if (dma->dma_paddr != 0) {
bus_dmamap_sync(dma->dma_tag, dma->dma_map,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(dma->dma_tag, dma->dma_map);
dma->dma_paddr = 0;
}
if (dma->dma_vaddr != NULL) {
bus_dmamem_free(dma->dma_tag, dma->dma_vaddr, dma->dma_map);
dma->dma_vaddr = NULL;
}
bus_dma_tag_destroy(dma->dma_tag);
dma->dma_tag = NULL;
}
/*********************************************************************
*
* Allocate memory for the transmit and receive rings, and then
* the descriptors associated with each, called only once at attach.
*
**********************************************************************/
static int
igb_allocate_queues(struct adapter *adapter)
{
device_t dev = adapter->dev;
struct igb_queue *que = NULL;
struct tx_ring *txr = NULL;
struct rx_ring *rxr = NULL;
int rsize, tsize, error = E1000_SUCCESS;
int txconf = 0, rxconf = 0;
/* First allocate the top level queue structs */
if (!(adapter->queues =
(struct igb_queue *) malloc(sizeof(struct igb_queue) *
adapter->num_queues, M_DEVBUF, M_NOWAIT | M_ZERO))) {
device_printf(dev, "Unable to allocate queue memory\n");
error = ENOMEM;
goto fail;
}
/* Next allocate the TX ring struct memory */
if (!(adapter->tx_rings =
(struct tx_ring *) malloc(sizeof(struct tx_ring) *
adapter->num_queues, M_DEVBUF, M_NOWAIT | M_ZERO))) {
device_printf(dev, "Unable to allocate TX ring memory\n");
error = ENOMEM;
goto tx_fail;
}
/* Now allocate the RX */
if (!(adapter->rx_rings =
(struct rx_ring *) malloc(sizeof(struct rx_ring) *
adapter->num_queues, M_DEVBUF, M_NOWAIT | M_ZERO))) {
device_printf(dev, "Unable to allocate RX ring memory\n");
error = ENOMEM;
goto rx_fail;
}
tsize = roundup2(adapter->num_tx_desc *
sizeof(union e1000_adv_tx_desc), IGB_DBA_ALIGN);
/*
* Now set up the TX queues, txconf is needed to handle the
* possibility that things fail midcourse and we need to
* undo memory gracefully
*/
for (int i = 0; i < adapter->num_queues; i++, txconf++) {
/* Set up some basics */
txr = &adapter->tx_rings[i];
txr->adapter = adapter;
txr->me = i;
txr->num_desc = adapter->num_tx_desc;
/* Initialize the TX lock */
snprintf(txr->mtx_name, sizeof(txr->mtx_name), "%s:tx(%d)",
device_get_nameunit(dev), txr->me);
mtx_init(&txr->tx_mtx, txr->mtx_name, NULL, MTX_DEF);
if (igb_dma_malloc(adapter, tsize,
&txr->txdma, BUS_DMA_NOWAIT)) {
device_printf(dev,
"Unable to allocate TX Descriptor memory\n");
error = ENOMEM;
goto err_tx_desc;
}
txr->tx_base = (union e1000_adv_tx_desc *)txr->txdma.dma_vaddr;
bzero((void *)txr->tx_base, tsize);
/* Now allocate transmit buffers for the ring */
if (igb_allocate_transmit_buffers(txr)) {
device_printf(dev,
"Critical Failure setting up transmit buffers\n");
error = ENOMEM;
goto err_tx_desc;
}
#ifndef IGB_LEGACY_TX
/* Allocate a buf ring */
txr->br = buf_ring_alloc(igb_buf_ring_size, M_DEVBUF,
M_WAITOK, &txr->tx_mtx);
#endif
}
/*
* Next the RX queues...
*/
rsize = roundup2(adapter->num_rx_desc *
sizeof(union e1000_adv_rx_desc), IGB_DBA_ALIGN);
for (int i = 0; i < adapter->num_queues; i++, rxconf++) {
rxr = &adapter->rx_rings[i];
rxr->adapter = adapter;
rxr->me = i;
/* Initialize the RX lock */
snprintf(rxr->mtx_name, sizeof(rxr->mtx_name), "%s:rx(%d)",
device_get_nameunit(dev), rxr->me);
mtx_init(&rxr->rx_mtx, rxr->mtx_name, NULL, MTX_DEF);
if (igb_dma_malloc(adapter, rsize,
&rxr->rxdma, BUS_DMA_NOWAIT)) {
device_printf(dev,
"Unable to allocate RxDescriptor memory\n");
error = ENOMEM;
goto err_rx_desc;
}
rxr->rx_base = (union e1000_adv_rx_desc *)rxr->rxdma.dma_vaddr;
bzero((void *)rxr->rx_base, rsize);
/* Allocate receive buffers for the ring*/
if (igb_allocate_receive_buffers(rxr)) {
device_printf(dev,
"Critical Failure setting up receive buffers\n");
error = ENOMEM;
goto err_rx_desc;
}
}
/*
** Finally set up the queue holding structs
*/
for (int i = 0; i < adapter->num_queues; i++) {
que = &adapter->queues[i];
que->adapter = adapter;
que->txr = &adapter->tx_rings[i];
que->rxr = &adapter->rx_rings[i];
}
return (0);
err_rx_desc:
for (rxr = adapter->rx_rings; rxconf > 0; rxr++, rxconf--)
igb_dma_free(adapter, &rxr->rxdma);
err_tx_desc:
for (txr = adapter->tx_rings; txconf > 0; txr++, txconf--) {
#ifndef IGB_LEGACY_TX
/* Release the buf_ring of every configured TX ring */
if (txr->br != NULL)
buf_ring_free(txr->br, M_DEVBUF);
#endif
igb_dma_free(adapter, &txr->txdma);
}
free(adapter->rx_rings, M_DEVBUF);
rx_fail:
free(adapter->tx_rings, M_DEVBUF);
tx_fail:
free(adapter->queues, M_DEVBUF);
fail:
return (error);
}
/*********************************************************************
*
* Allocate memory for tx_buffer structures. The tx_buffer stores all
* the information needed to transmit a packet on the wire. This is
* called only once at attach, setup is done every reset.
*
**********************************************************************/
static int
igb_allocate_transmit_buffers(struct tx_ring *txr)
{
struct adapter *adapter = txr->adapter;
device_t dev = adapter->dev;
struct igb_tx_buf *txbuf;
int error, i;
/*
* Setup DMA descriptor areas.
*/
if ((error = bus_dma_tag_create(bus_get_dma_tag(dev),
1, 0, /* alignment, bounds */
BUS_SPACE_MAXADDR, /* lowaddr */
BUS_SPACE_MAXADDR, /* highaddr */
NULL, NULL, /* filter, filterarg */
IGB_TSO_SIZE, /* maxsize */
IGB_MAX_SCATTER, /* nsegments */
PAGE_SIZE, /* maxsegsize */
0, /* flags */
NULL, /* lockfunc */
NULL, /* lockfuncarg */
&txr->txtag))) {
device_printf(dev,"Unable to allocate TX DMA tag\n");
goto fail;
}
if (!(txr->tx_buffers =
(struct igb_tx_buf *) malloc(sizeof(struct igb_tx_buf) *
adapter->num_tx_desc, M_DEVBUF, M_NOWAIT | M_ZERO))) {
device_printf(dev, "Unable to allocate tx_buffer memory\n");
error = ENOMEM;
goto fail;
}
/* Create the descriptor buffer dma maps */
txbuf = txr->tx_buffers;
for (i = 0; i < adapter->num_tx_desc; i++, txbuf++) {
error = bus_dmamap_create(txr->txtag, 0, &txbuf->map);
if (error != 0) {
device_printf(dev, "Unable to create TX DMA map\n");
goto fail;
}
}
return (0);
fail:
/* Free everything; this also handles a partially completed setup */
igb_free_transmit_structures(adapter);
return (error);
}
/*********************************************************************
*
* Initialize a transmit ring.
*
**********************************************************************/
static void
igb_setup_transmit_ring(struct tx_ring *txr)
{
struct adapter *adapter = txr->adapter;
struct igb_tx_buf *txbuf;
int i;
#ifdef DEV_NETMAP
struct netmap_adapter *na = NA(adapter->ifp);
struct netmap_slot *slot;
#endif /* DEV_NETMAP */
/* Clear the old descriptor contents */
IGB_TX_LOCK(txr);
#ifdef DEV_NETMAP
slot = netmap_reset(na, NR_TX, txr->me, 0);
#endif /* DEV_NETMAP */
bzero((void *)txr->tx_base,
(sizeof(union e1000_adv_tx_desc)) * adapter->num_tx_desc);
/* Reset indices */
txr->next_avail_desc = 0;
txr->next_to_clean = 0;
/* Free any existing tx buffers. */
txbuf = txr->tx_buffers;
for (i = 0; i < adapter->num_tx_desc; i++, txbuf++) {
if (txbuf->m_head != NULL) {
bus_dmamap_sync(txr->txtag, txbuf->map,
BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(txr->txtag, txbuf->map);
m_freem(txbuf->m_head);
txbuf->m_head = NULL;
}
#ifdef DEV_NETMAP
if (slot) {
int si = netmap_idx_n2k(&na->tx_rings[txr->me], i);
/* no need to set the address */
netmap_load_map(na, txr->txtag, txbuf->map, NMB(na, slot + si));
}
#endif /* DEV_NETMAP */
/* clear the watch index */
txbuf->eop = NULL;
}
/* Set number of descriptors available */
txr->tx_avail = adapter->num_tx_desc;
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
IGB_TX_UNLOCK(txr);
}
/*********************************************************************
*
* Initialize all transmit rings.
*
**********************************************************************/
static void
igb_setup_transmit_structures(struct adapter *adapter)
{
struct tx_ring *txr = adapter->tx_rings;
for (int i = 0; i < adapter->num_queues; i++, txr++)
igb_setup_transmit_ring(txr);
return;
}
/*********************************************************************
*
* Enable transmit unit.
*
**********************************************************************/
static void
igb_initialize_transmit_units(struct adapter *adapter)
{
struct tx_ring *txr = adapter->tx_rings;
struct e1000_hw *hw = &adapter->hw;
u32 tctl, txdctl;
INIT_DEBUGOUT("igb_initialize_transmit_units: begin");
tctl = txdctl = 0;
/* Setup the Tx Descriptor Rings */
for (int i = 0; i < adapter->num_queues; i++, txr++) {
u64 bus_addr = txr->txdma.dma_paddr;
E1000_WRITE_REG(hw, E1000_TDLEN(i),
adapter->num_tx_desc * sizeof(struct e1000_tx_desc));
E1000_WRITE_REG(hw, E1000_TDBAH(i),
(uint32_t)(bus_addr >> 32));
E1000_WRITE_REG(hw, E1000_TDBAL(i),
(uint32_t)bus_addr);
/* Setup the HW Tx Head and Tail descriptor pointers */
E1000_WRITE_REG(hw, E1000_TDT(i), 0);
E1000_WRITE_REG(hw, E1000_TDH(i), 0);
HW_DEBUGOUT2("Base = %x, Length = %x\n",
E1000_READ_REG(hw, E1000_TDBAL(i)),
E1000_READ_REG(hw, E1000_TDLEN(i)));
txr->queue_status = IGB_QUEUE_IDLE;
txdctl |= IGB_TX_PTHRESH;
txdctl |= IGB_TX_HTHRESH << 8;
txdctl |= IGB_TX_WTHRESH << 16;
txdctl |= E1000_TXDCTL_QUEUE_ENABLE;
E1000_WRITE_REG(hw, E1000_TXDCTL(i), txdctl);
}
if (adapter->vf_ifp)
return;
e1000_config_collision_dist(hw);
/* Program the Transmit Control Register */
tctl = E1000_READ_REG(hw, E1000_TCTL);
tctl &= ~E1000_TCTL_CT;
tctl |= (E1000_TCTL_PSP | E1000_TCTL_RTLC | E1000_TCTL_EN |
(E1000_COLLISION_THRESHOLD << E1000_CT_SHIFT));
/* This write will effectively turn on the transmit unit. */
E1000_WRITE_REG(hw, E1000_TCTL, tctl);
}
/*********************************************************************
*
* Free all transmit rings.
*
**********************************************************************/
static void
igb_free_transmit_structures(struct adapter *adapter)
{
struct tx_ring *txr = adapter->tx_rings;
for (int i = 0; i < adapter->num_queues; i++, txr++) {
IGB_TX_LOCK(txr);
igb_free_transmit_buffers(txr);
igb_dma_free(adapter, &txr->txdma);
IGB_TX_UNLOCK(txr);
IGB_TX_LOCK_DESTROY(txr);
}
free(adapter->tx_rings, M_DEVBUF);
}
/*********************************************************************
*
* Free transmit ring related data structures.
*
**********************************************************************/
static void
igb_free_transmit_buffers(struct tx_ring *txr)
{
struct adapter *adapter = txr->adapter;
struct igb_tx_buf *tx_buffer;
int i;
INIT_DEBUGOUT("igb_free_transmit_buffers: begin");
if (txr->tx_buffers == NULL)
return;
tx_buffer = txr->tx_buffers;
for (i = 0; i < adapter->num_tx_desc; i++, tx_buffer++) {
if (tx_buffer->m_head != NULL) {
bus_dmamap_sync(txr->txtag, tx_buffer->map,
BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(txr->txtag,
tx_buffer->map);
m_freem(tx_buffer->m_head);
tx_buffer->m_head = NULL;
if (tx_buffer->map != NULL) {
bus_dmamap_destroy(txr->txtag,
tx_buffer->map);
tx_buffer->map = NULL;
}
} else if (tx_buffer->map != NULL) {
bus_dmamap_unload(txr->txtag,
tx_buffer->map);
bus_dmamap_destroy(txr->txtag,
tx_buffer->map);
tx_buffer->map = NULL;
}
}
#ifndef IGB_LEGACY_TX
if (txr->br != NULL)
buf_ring_free(txr->br, M_DEVBUF);
#endif
if (txr->tx_buffers != NULL) {
free(txr->tx_buffers, M_DEVBUF);
txr->tx_buffers = NULL;
}
if (txr->txtag != NULL) {
bus_dma_tag_destroy(txr->txtag);
txr->txtag = NULL;
}
return;
}
/**********************************************************************
*
* Setup work for hardware segmentation offload (TSO) on
* adapters using advanced tx descriptors
*
**********************************************************************/
static int
igb_tso_setup(struct tx_ring *txr, struct mbuf *mp,
u32 *cmd_type_len, u32 *olinfo_status)
{
struct adapter *adapter = txr->adapter;
struct e1000_adv_tx_context_desc *TXD;
u32 vlan_macip_lens = 0, type_tucmd_mlhl = 0;
u32 mss_l4len_idx = 0, paylen;
u16 vtag = 0, eh_type;
int ctxd, ehdrlen, ip_hlen, tcp_hlen;
struct ether_vlan_header *eh;
#ifdef INET6
struct ip6_hdr *ip6;
#endif
#ifdef INET
struct ip *ip;
#endif
struct tcphdr *th;
/*
* Determine where frame payload starts.
* Jump over vlan headers if already present
*/
eh = mtod(mp, struct ether_vlan_header *);
if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
ehdrlen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
eh_type = eh->evl_proto;
} else {
ehdrlen = ETHER_HDR_LEN;
eh_type = eh->evl_encap_proto;
}
switch (ntohs(eh_type)) {
#ifdef INET6
case ETHERTYPE_IPV6:
ip6 = (struct ip6_hdr *)(mp->m_data + ehdrlen);
/* XXX-BZ For now we do not pretend to support ext. hdrs. */
if (ip6->ip6_nxt != IPPROTO_TCP)
return (ENXIO);
ip_hlen = sizeof(struct ip6_hdr);
th = (struct tcphdr *)((caddr_t)ip6 + ip_hlen);
th->th_sum = in6_cksum_pseudo(ip6, 0, IPPROTO_TCP, 0);
type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_IPV6;
break;
#endif
#ifdef INET
case ETHERTYPE_IP:
ip = (struct ip *)(mp->m_data + ehdrlen);
if (ip->ip_p != IPPROTO_TCP)
return (ENXIO);
ip->ip_sum = 0;
ip_hlen = ip->ip_hl << 2;
th = (struct tcphdr *)((caddr_t)ip + ip_hlen);
th->th_sum = in_pseudo(ip->ip_src.s_addr,
ip->ip_dst.s_addr, htons(IPPROTO_TCP));
type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_IPV4;
/* Tell transmit desc to also do IPv4 checksum. */
*olinfo_status |= E1000_TXD_POPTS_IXSM << 8;
break;
#endif
default:
panic("%s: CSUM_TSO but no supported IP version (0x%04x)",
__func__, ntohs(eh_type));
break;
}
ctxd = txr->next_avail_desc;
TXD = (struct e1000_adv_tx_context_desc *) &txr->tx_base[ctxd];
tcp_hlen = th->th_off << 2;
/* This is used in the transmit desc in encap */
paylen = mp->m_pkthdr.len - ehdrlen - ip_hlen - tcp_hlen;
/* VLAN MACLEN IPLEN */
if (mp->m_flags & M_VLANTAG) {
vtag = htole16(mp->m_pkthdr.ether_vtag);
vlan_macip_lens |= (vtag << E1000_ADVTXD_VLAN_SHIFT);
}
vlan_macip_lens |= ehdrlen << E1000_ADVTXD_MACLEN_SHIFT;
vlan_macip_lens |= ip_hlen;
TXD->vlan_macip_lens = htole32(vlan_macip_lens);
/* ADV DTYPE TUCMD */
type_tucmd_mlhl |= E1000_ADVTXD_DCMD_DEXT | E1000_ADVTXD_DTYP_CTXT;
type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_L4T_TCP;
TXD->type_tucmd_mlhl = htole32(type_tucmd_mlhl);
/* MSS L4LEN IDX */
mss_l4len_idx |= (mp->m_pkthdr.tso_segsz << E1000_ADVTXD_MSS_SHIFT);
mss_l4len_idx |= (tcp_hlen << E1000_ADVTXD_L4LEN_SHIFT);
/* 82575 needs the queue index added */
if (adapter->hw.mac.type == e1000_82575)
mss_l4len_idx |= txr->me << 4;
TXD->mss_l4len_idx = htole32(mss_l4len_idx);
TXD->seqnum_seed = htole32(0);
if (++ctxd == txr->num_desc)
ctxd = 0;
txr->tx_avail--;
txr->next_avail_desc = ctxd;
*cmd_type_len |= E1000_ADVTXD_DCMD_TSE;
*olinfo_status |= E1000_TXD_POPTS_TXSM << 8;
*olinfo_status |= paylen << E1000_ADVTXD_PAYLEN_SHIFT;
++txr->tso_tx;
return (0);
}
/*********************************************************************
*
* Advanced Context Descriptor setup for VLAN, CSUM or TSO
*
**********************************************************************/
static int
igb_tx_ctx_setup(struct tx_ring *txr, struct mbuf *mp,
u32 *cmd_type_len, u32 *olinfo_status)
{
struct e1000_adv_tx_context_desc *TXD;
struct adapter *adapter = txr->adapter;
struct ether_vlan_header *eh;
struct ip *ip;
struct ip6_hdr *ip6;
u32 vlan_macip_lens = 0, type_tucmd_mlhl = 0, mss_l4len_idx = 0;
int ehdrlen, ip_hlen = 0;
u16 etype;
u8 ipproto = 0;
int offload = TRUE;
int ctxd = txr->next_avail_desc;
u16 vtag = 0;
/* First check if TSO is to be used */
if (mp->m_pkthdr.csum_flags & CSUM_TSO)
return (igb_tso_setup(txr, mp, cmd_type_len, olinfo_status));
if ((mp->m_pkthdr.csum_flags & CSUM_OFFLOAD) == 0)
offload = FALSE;
/* Indicate the whole packet as payload when not doing TSO */
*olinfo_status |= mp->m_pkthdr.len << E1000_ADVTXD_PAYLEN_SHIFT;
/* Now ready a context descriptor */
TXD = (struct e1000_adv_tx_context_desc *) &txr->tx_base[ctxd];
/*
** In advanced descriptors the vlan tag must
** be placed into the context descriptor. Hence
** we need to make one even if not doing offloads.
*/
if (mp->m_flags & M_VLANTAG) {
vtag = htole16(mp->m_pkthdr.ether_vtag);
vlan_macip_lens |= (vtag << E1000_ADVTXD_VLAN_SHIFT);
} else if (offload == FALSE) /* ... no offload to do */
return (0);
/*
* Determine where frame payload starts.
* Jump over vlan headers if already present,
* helpful for QinQ too.
*/
eh = mtod(mp, struct ether_vlan_header *);
if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
etype = ntohs(eh->evl_proto);
ehdrlen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
} else {
etype = ntohs(eh->evl_encap_proto);
ehdrlen = ETHER_HDR_LEN;
}
/* Set the ether header length */
vlan_macip_lens |= ehdrlen << E1000_ADVTXD_MACLEN_SHIFT;
switch (etype) {
case ETHERTYPE_IP:
ip = (struct ip *)(mp->m_data + ehdrlen);
ip_hlen = ip->ip_hl << 2;
ipproto = ip->ip_p;
type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_IPV4;
break;
case ETHERTYPE_IPV6:
ip6 = (struct ip6_hdr *)(mp->m_data + ehdrlen);
ip_hlen = sizeof(struct ip6_hdr);
/* XXX-BZ this will go badly in case of ext hdrs. */
ipproto = ip6->ip6_nxt;
type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_IPV6;
break;
default:
offload = FALSE;
break;
}
vlan_macip_lens |= ip_hlen;
type_tucmd_mlhl |= E1000_ADVTXD_DCMD_DEXT | E1000_ADVTXD_DTYP_CTXT;
switch (ipproto) {
case IPPROTO_TCP:
#if __FreeBSD_version >= 1000000
if (mp->m_pkthdr.csum_flags & (CSUM_IP_TCP | CSUM_IP6_TCP))
#else
if (mp->m_pkthdr.csum_flags & CSUM_TCP)
#endif
type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_L4T_TCP;
break;
case IPPROTO_UDP:
#if __FreeBSD_version >= 1000000
if (mp->m_pkthdr.csum_flags & (CSUM_IP_UDP | CSUM_IP6_UDP))
#else
if (mp->m_pkthdr.csum_flags & CSUM_UDP)
#endif
type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_L4T_UDP;
break;
#if __FreeBSD_version >= 800000
case IPPROTO_SCTP:
#if __FreeBSD_version >= 1000000
if (mp->m_pkthdr.csum_flags & (CSUM_IP_SCTP | CSUM_IP6_SCTP))
#else
if (mp->m_pkthdr.csum_flags & CSUM_SCTP)
#endif
type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_L4T_SCTP;
break;
#endif
default:
offload = FALSE;
break;
}
if (offload) /* For the TX descriptor setup */
*olinfo_status |= E1000_TXD_POPTS_TXSM << 8;
/* 82575 needs the queue index added */
if (adapter->hw.mac.type == e1000_82575)
mss_l4len_idx = txr->me << 4;
/* Now copy bits into descriptor */
TXD->vlan_macip_lens = htole32(vlan_macip_lens);
TXD->type_tucmd_mlhl = htole32(type_tucmd_mlhl);
TXD->seqnum_seed = htole32(0);
TXD->mss_l4len_idx = htole32(mss_l4len_idx);
/* We've consumed the first desc, adjust counters */
if (++ctxd == txr->num_desc)
ctxd = 0;
txr->next_avail_desc = ctxd;
--txr->tx_avail;
return (0);
}
/**********************************************************************
*
* Examine each tx_buffer in the used queue. If the hardware is done
* processing the packet then free associated resources. The
* tx_buffer is put back on the free queue.
*
* TRUE return means there's work in the ring to clean, FALSE means it's empty.
**********************************************************************/
static bool
igb_txeof(struct tx_ring *txr)
{
struct adapter *adapter = txr->adapter;
#ifdef DEV_NETMAP
struct ifnet *ifp = adapter->ifp;
#endif /* DEV_NETMAP */
u32 work, processed = 0;
int limit = adapter->tx_process_limit;
struct igb_tx_buf *buf;
union e1000_adv_tx_desc *txd;
mtx_assert(&txr->tx_mtx, MA_OWNED);
#ifdef DEV_NETMAP
if (netmap_tx_irq(ifp, txr->me))
return (FALSE);
#endif /* DEV_NETMAP */
if (txr->tx_avail == txr->num_desc) {
txr->queue_status = IGB_QUEUE_IDLE;
return (FALSE);
}
/* Get work starting point */
work = txr->next_to_clean;
buf = &txr->tx_buffers[work];
txd = &txr->tx_base[work];
work -= txr->num_desc; /* The distance to ring end */
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
do {
union e1000_adv_tx_desc *eop = buf->eop;
if (eop == NULL) /* No work */
break;
if ((eop->wb.status & E1000_TXD_STAT_DD) == 0)
break; /* I/O not complete */
if (buf->m_head) {
txr->bytes +=
buf->m_head->m_pkthdr.len;
bus_dmamap_sync(txr->txtag,
buf->map,
BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(txr->txtag,
buf->map);
m_freem(buf->m_head);
buf->m_head = NULL;
}
buf->eop = NULL;
++txr->tx_avail;
/* We clean the range if multi segment */
while (txd != eop) {
++txd;
++buf;
++work;
/* wrap the ring? */
if (__predict_false(!work)) {
work -= txr->num_desc;
buf = txr->tx_buffers;
txd = txr->tx_base;
}
if (buf->m_head) {
txr->bytes +=
buf->m_head->m_pkthdr.len;
bus_dmamap_sync(txr->txtag,
buf->map,
BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(txr->txtag,
buf->map);
m_freem(buf->m_head);
buf->m_head = NULL;
}
++txr->tx_avail;
buf->eop = NULL;
}
++txr->packets;
++processed;
txr->watchdog_time = ticks;
/* Try the next packet */
++txd;
++buf;
++work;
/* reset with a wrap */
if (__predict_false(!work)) {
work -= txr->num_desc;
buf = txr->tx_buffers;
txd = txr->tx_base;
}
prefetch(txd);
} while (__predict_true(--limit));
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
work += txr->num_desc;
txr->next_to_clean = work;
/*
** Watchdog calculation, we know there's
** work outstanding or the first return
** would have been taken, so none processed
** for too long indicates a hang.
*/
if ((!processed) && ((ticks - txr->watchdog_time) > IGB_WATCHDOG))
txr->queue_status |= IGB_QUEUE_HUNG;
if (txr->tx_avail >= IGB_QUEUE_THRESHOLD)
txr->queue_status &= ~IGB_QUEUE_DEPLETED;
if (txr->tx_avail == txr->num_desc) {
txr->queue_status = IGB_QUEUE_IDLE;
return (FALSE);
}
return (TRUE);
}
/*********************************************************************
*
* Refresh mbuf buffers for RX descriptor rings
* - now keeps its own state, so discards due to resource
* exhaustion are unnecessary. If an mbuf cannot be obtained
* it just returns, keeping its placeholder, so it can simply
* be called again to retry.
*
**********************************************************************/
static void
igb_refresh_mbufs(struct rx_ring *rxr, int limit)
{
struct adapter *adapter = rxr->adapter;
bus_dma_segment_t hseg[1];
bus_dma_segment_t pseg[1];
struct igb_rx_buf *rxbuf;
struct mbuf *mh, *mp;
int i, j, nsegs, error;
bool refreshed = FALSE;
i = j = rxr->next_to_refresh;
/*
** Get one descriptor beyond
** our work mark to control
** the loop.
*/
if (++j == adapter->num_rx_desc)
j = 0;
while (j != limit) {
rxbuf = &rxr->rx_buffers[i];
/* No hdr mbuf used with header split off */
if (rxr->hdr_split == FALSE)
goto no_split;
if (rxbuf->m_head == NULL) {
mh = m_gethdr(M_NOWAIT, MT_DATA);
if (mh == NULL)
goto update;
} else
mh = rxbuf->m_head;
mh->m_pkthdr.len = mh->m_len = MHLEN;
mh->m_flags |= M_PKTHDR;
/* Get the memory mapping */
error = bus_dmamap_load_mbuf_sg(rxr->htag,
rxbuf->hmap, mh, hseg, &nsegs, BUS_DMA_NOWAIT);
if (error != 0) {
printf("Refresh mbufs: hdr dmamap load"
" failure - %d\n", error);
m_free(mh);
rxbuf->m_head = NULL;
goto update;
}
rxbuf->m_head = mh;
bus_dmamap_sync(rxr->htag, rxbuf->hmap,
BUS_DMASYNC_PREREAD);
rxr->rx_base[i].read.hdr_addr =
htole64(hseg[0].ds_addr);
no_split:
if (rxbuf->m_pack == NULL) {
mp = m_getjcl(M_NOWAIT, MT_DATA,
M_PKTHDR, adapter->rx_mbuf_sz);
if (mp == NULL)
goto update;
} else
mp = rxbuf->m_pack;
mp->m_pkthdr.len = mp->m_len = adapter->rx_mbuf_sz;
/* Get the memory mapping */
error = bus_dmamap_load_mbuf_sg(rxr->ptag,
rxbuf->pmap, mp, pseg, &nsegs, BUS_DMA_NOWAIT);
if (error != 0) {
printf("Refresh mbufs: payload dmamap load"
" failure - %d\n", error);
m_free(mp);
rxbuf->m_pack = NULL;
goto update;
}
rxbuf->m_pack = mp;
bus_dmamap_sync(rxr->ptag, rxbuf->pmap,
BUS_DMASYNC_PREREAD);
rxr->rx_base[i].read.pkt_addr =
htole64(pseg[0].ds_addr);
refreshed = TRUE; /* I feel wefreshed :) */
i = j; /* our next is precalculated */
rxr->next_to_refresh = i;
if (++j == adapter->num_rx_desc)
j = 0;
}
update:
if (refreshed) /* update tail */
E1000_WRITE_REG(&adapter->hw,
E1000_RDT(rxr->me), rxr->next_to_refresh);
return;
}
/*********************************************************************
*
* Allocate memory for rx_buffer structures. Since we use one
* rx_buffer per received packet, the maximum number of rx_buffers
* that we'll need is equal to the number of receive descriptors
* that we've allocated.
*
**********************************************************************/
static int
igb_allocate_receive_buffers(struct rx_ring *rxr)
{
struct adapter *adapter = rxr->adapter;
device_t dev = adapter->dev;
struct igb_rx_buf *rxbuf;
int i, bsize, error;
bsize = sizeof(struct igb_rx_buf) * adapter->num_rx_desc;
if (!(rxr->rx_buffers =
(struct igb_rx_buf *) malloc(bsize,
M_DEVBUF, M_NOWAIT | M_ZERO))) {
device_printf(dev, "Unable to allocate rx_buffer memory\n");
error = ENOMEM;
goto fail;
}
if ((error = bus_dma_tag_create(bus_get_dma_tag(dev),
1, 0, /* alignment, bounds */
BUS_SPACE_MAXADDR, /* lowaddr */
BUS_SPACE_MAXADDR, /* highaddr */
NULL, NULL, /* filter, filterarg */
MSIZE, /* maxsize */
1, /* nsegments */
MSIZE, /* maxsegsize */
0, /* flags */
NULL, /* lockfunc */
NULL, /* lockfuncarg */
&rxr->htag))) {
device_printf(dev, "Unable to create RX DMA tag\n");
goto fail;
}
if ((error = bus_dma_tag_create(bus_get_dma_tag(dev),
1, 0, /* alignment, bounds */
BUS_SPACE_MAXADDR, /* lowaddr */
BUS_SPACE_MAXADDR, /* highaddr */
NULL, NULL, /* filter, filterarg */
MJUM9BYTES, /* maxsize */
1, /* nsegments */
MJUM9BYTES, /* maxsegsize */
0, /* flags */
NULL, /* lockfunc */
NULL, /* lockfuncarg */
&rxr->ptag))) {
device_printf(dev, "Unable to create RX payload DMA tag\n");
goto fail;
}
for (i = 0; i < adapter->num_rx_desc; i++) {
rxbuf = &rxr->rx_buffers[i];
error = bus_dmamap_create(rxr->htag, 0, &rxbuf->hmap);
if (error) {
device_printf(dev,
"Unable to create RX head DMA maps\n");
goto fail;
}
error = bus_dmamap_create(rxr->ptag, 0, &rxbuf->pmap);
if (error) {
device_printf(dev,
"Unable to create RX packet DMA maps\n");
goto fail;
}
}
return (0);
fail:
/* Frees all, but can handle partial completion */
igb_free_receive_structures(adapter);
return (error);
}
static void
igb_free_receive_ring(struct rx_ring *rxr)
{
struct adapter *adapter = rxr->adapter;
struct igb_rx_buf *rxbuf;
for (int i = 0; i < adapter->num_rx_desc; i++) {
rxbuf = &rxr->rx_buffers[i];
if (rxbuf->m_head != NULL) {
bus_dmamap_sync(rxr->htag, rxbuf->hmap,
BUS_DMASYNC_POSTREAD);
bus_dmamap_unload(rxr->htag, rxbuf->hmap);
rxbuf->m_head->m_flags |= M_PKTHDR;
m_freem(rxbuf->m_head);
}
if (rxbuf->m_pack != NULL) {
bus_dmamap_sync(rxr->ptag, rxbuf->pmap,
BUS_DMASYNC_POSTREAD);
bus_dmamap_unload(rxr->ptag, rxbuf->pmap);
rxbuf->m_pack->m_flags |= M_PKTHDR;
m_freem(rxbuf->m_pack);
}
rxbuf->m_head = NULL;
rxbuf->m_pack = NULL;
}
}
/*********************************************************************
*
* Initialize a receive ring and its buffers.
*
**********************************************************************/
static int
igb_setup_receive_ring(struct rx_ring *rxr)
{
struct adapter *adapter;
struct ifnet *ifp;
device_t dev;
struct igb_rx_buf *rxbuf;
bus_dma_segment_t pseg[1], hseg[1];
struct lro_ctrl *lro = &rxr->lro;
int rsize, nsegs, error = 0;
#ifdef DEV_NETMAP
struct netmap_adapter *na = NA(rxr->adapter->ifp);
struct netmap_slot *slot;
#endif /* DEV_NETMAP */
adapter = rxr->adapter;
dev = adapter->dev;
ifp = adapter->ifp;
/* Clear the ring contents */
IGB_RX_LOCK(rxr);
#ifdef DEV_NETMAP
slot = netmap_reset(na, NR_RX, rxr->me, 0);
#endif /* DEV_NETMAP */
rsize = roundup2(adapter->num_rx_desc *
sizeof(union e1000_adv_rx_desc), IGB_DBA_ALIGN);
bzero((void *)rxr->rx_base, rsize);
/*
** Free current RX buffer structures and their mbufs
*/
igb_free_receive_ring(rxr);
/* Configure for header split? */
if (igb_header_split)
rxr->hdr_split = TRUE;
/* Now replenish the ring mbufs */
for (int j = 0; j < adapter->num_rx_desc; ++j) {
struct mbuf *mh, *mp;
rxbuf = &rxr->rx_buffers[j];
#ifdef DEV_NETMAP
if (slot) {
/* slot sj is mapped to the j-th NIC-ring entry */
int sj = netmap_idx_n2k(&na->rx_rings[rxr->me], j);
uint64_t paddr;
void *addr;
addr = PNMB(na, slot + sj, &paddr);
netmap_load_map(na, rxr->ptag, rxbuf->pmap, addr);
/* Update descriptor */
rxr->rx_base[j].read.pkt_addr = htole64(paddr);
continue;
}
#endif /* DEV_NETMAP */
if (rxr->hdr_split == FALSE)
goto skip_head;
/* First the header */
rxbuf->m_head = m_gethdr(M_NOWAIT, MT_DATA);
if (rxbuf->m_head == NULL) {
error = ENOBUFS;
goto fail;
}
m_adj(rxbuf->m_head, ETHER_ALIGN);
mh = rxbuf->m_head;
mh->m_len = mh->m_pkthdr.len = MHLEN;
mh->m_flags |= M_PKTHDR;
/* Get the memory mapping */
error = bus_dmamap_load_mbuf_sg(rxr->htag,
rxbuf->hmap, rxbuf->m_head, hseg,
&nsegs, BUS_DMA_NOWAIT);
if (error != 0) /* Nothing elegant to do here */
goto fail;
bus_dmamap_sync(rxr->htag,
rxbuf->hmap, BUS_DMASYNC_PREREAD);
/* Update descriptor */
rxr->rx_base[j].read.hdr_addr = htole64(hseg[0].ds_addr);
skip_head:
/* Now the payload cluster */
rxbuf->m_pack = m_getjcl(M_NOWAIT, MT_DATA,
M_PKTHDR, adapter->rx_mbuf_sz);
if (rxbuf->m_pack == NULL) {
error = ENOBUFS;
goto fail;
}
mp = rxbuf->m_pack;
mp->m_pkthdr.len = mp->m_len = adapter->rx_mbuf_sz;
/* Get the memory mapping */
error = bus_dmamap_load_mbuf_sg(rxr->ptag,
rxbuf->pmap, mp, pseg,
&nsegs, BUS_DMA_NOWAIT);
if (error != 0)
goto fail;
bus_dmamap_sync(rxr->ptag,
rxbuf->pmap, BUS_DMASYNC_PREREAD);
/* Update descriptor */
rxr->rx_base[j].read.pkt_addr = htole64(pseg[0].ds_addr);
}
/* Setup our descriptor indices */
rxr->next_to_check = 0;
rxr->next_to_refresh = adapter->num_rx_desc - 1;
rxr->lro_enabled = FALSE;
rxr->rx_split_packets = 0;
rxr->rx_bytes = 0;
rxr->fmp = NULL;
rxr->lmp = NULL;
bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
/*
** Now set up the LRO interface. Header split is
** configured above from igb_header_split and does
** not depend on LRO being enabled.
*/
if (ifp->if_capenable & IFCAP_LRO) {
error = tcp_lro_init(lro);
if (error) {
device_printf(dev, "LRO Initialization failed!\n");
goto fail;
}
INIT_DEBUGOUT("RX LRO Initialized\n");
rxr->lro_enabled = TRUE;
lro->ifp = adapter->ifp;
}
IGB_RX_UNLOCK(rxr);
return (0);
fail:
igb_free_receive_ring(rxr);
IGB_RX_UNLOCK(rxr);
return (error);
}
/*********************************************************************
*
* Initialize all receive rings.
*
**********************************************************************/
static int
igb_setup_receive_structures(struct adapter *adapter)
{
struct rx_ring *rxr = adapter->rx_rings;
int i;
for (i = 0; i < adapter->num_queues; i++, rxr++)
if (igb_setup_receive_ring(rxr))
goto fail;
return (0);
fail:
/*
* Free the RX buffers allocated so far. Only the rings that
* completed setup are handled here; the failing ring has already
* cleaned up after itself. 'i' is the endpoint.
*/
for (int j = 0; j < i; ++j) {
rxr = &adapter->rx_rings[j];
IGB_RX_LOCK(rxr);
igb_free_receive_ring(rxr);
IGB_RX_UNLOCK(rxr);
}
return (ENOBUFS);
}
/*
* Initialise the RSS mapping for NICs that support multiple transmit/
* receive rings.
*/
static void
igb_initialise_rss_mapping(struct adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
int i;
int queue_id;
u32 reta;
u32 rss_key[10], mrqc, shift = 0;
/* XXX? */
if (adapter->hw.mac.type == e1000_82575)
shift = 6;
/*
* The redirection table controls which destination
* queue each bucket redirects traffic to.
* Each DWORD represents four queues, with the LSB
* being the first queue in the DWORD.
*
* This just allocates buckets to queues using round-robin
* allocation.
*
* NOTE: It Just Happens to line up with the default
* RSS allocation method.
*/
/* Warning FM follows */
reta = 0;
for (i = 0; i < 128; i++) {
#ifdef RSS
queue_id = rss_get_indirection_to_bucket(i);
/*
* If we have more queues than buckets, we'll
* end up mapping buckets to a subset of the
* queues.
*
* If we have more buckets than queues, we'll
* end up instead assigning multiple buckets
* to queues.
*
* Both are suboptimal, but we need to handle
* the case so we don't go out of bounds
* indexing arrays and such.
*/
queue_id = queue_id % adapter->num_queues;
#else
queue_id = (i % adapter->num_queues);
#endif
/* Adjust if required */
queue_id = queue_id << shift;
/*
* The low 8 bits are for hash value (n+0);
* The next 8 bits are for hash value (n+1), etc.
*/
reta = reta >> 8;
reta = reta | ( ((uint32_t) queue_id) << 24);
if ((i & 3) == 3) {
E1000_WRITE_REG(hw, E1000_RETA(i >> 2), reta);
reta = 0;
}
}
/* Now fill in hash table */
/*
* MRQC: Multiple Receive Queues Command
* Set queuing to RSS control, number depends on the device.
*/
mrqc = E1000_MRQC_ENABLE_RSS_8Q;
#ifdef RSS
/* XXX ew typecasting */
rss_getkey((uint8_t *) &rss_key);
#else
arc4rand(&rss_key, sizeof(rss_key), 0);
#endif
for (i = 0; i < 10; i++)
E1000_WRITE_REG_ARRAY(hw,
E1000_RSSRK(0), i, rss_key[i]);
/*
* Configure the RSS fields to hash upon.
*/
mrqc |= (E1000_MRQC_RSS_FIELD_IPV4 |
E1000_MRQC_RSS_FIELD_IPV4_TCP);
mrqc |= (E1000_MRQC_RSS_FIELD_IPV6 |
E1000_MRQC_RSS_FIELD_IPV6_TCP);
mrqc |= (E1000_MRQC_RSS_FIELD_IPV4_UDP |
E1000_MRQC_RSS_FIELD_IPV6_UDP);
mrqc |= (E1000_MRQC_RSS_FIELD_IPV6_UDP_EX |
E1000_MRQC_RSS_FIELD_IPV6_TCP_EX);
E1000_WRITE_REG(hw, E1000_MRQC, mrqc);
}
/*********************************************************************
*
* Enable receive unit.
*
**********************************************************************/
static void
igb_initialize_receive_units(struct adapter *adapter)
{
struct rx_ring *rxr = adapter->rx_rings;
struct ifnet *ifp = adapter->ifp;
struct e1000_hw *hw = &adapter->hw;
u32 rctl, rxcsum, psize, srrctl = 0;
INIT_DEBUGOUT("igb_initialize_receive_units: begin");
/*
* Make sure receives are disabled while setting
* up the descriptor ring
*/
rctl = E1000_READ_REG(hw, E1000_RCTL);
E1000_WRITE_REG(hw, E1000_RCTL, rctl & ~E1000_RCTL_EN);
/*
** Set up for header split
*/
if (igb_header_split) {
/* Use a standard mbuf for the header */
srrctl |= IGB_HDR_BUF << E1000_SRRCTL_BSIZEHDRSIZE_SHIFT;
srrctl |= E1000_SRRCTL_DESCTYPE_HDR_SPLIT_ALWAYS;
} else
srrctl |= E1000_SRRCTL_DESCTYPE_ADV_ONEBUF;
/*
** Set up for jumbo frames
*/
if (ifp->if_mtu > ETHERMTU) {
rctl |= E1000_RCTL_LPE;
if (adapter->rx_mbuf_sz == MJUMPAGESIZE) {
srrctl |= 4096 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
rctl |= E1000_RCTL_SZ_4096 | E1000_RCTL_BSEX;
} else if (adapter->rx_mbuf_sz > MJUMPAGESIZE) {
srrctl |= 8192 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
rctl |= E1000_RCTL_SZ_8192 | E1000_RCTL_BSEX;
}
/* Set maximum packet len */
psize = adapter->max_frame_size;
/* are we on a vlan? */
if (adapter->ifp->if_vlantrunk != NULL)
psize += VLAN_TAG_SIZE;
E1000_WRITE_REG(&adapter->hw, E1000_RLPML, psize);
} else {
rctl &= ~E1000_RCTL_LPE;
srrctl |= 2048 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
rctl |= E1000_RCTL_SZ_2048;
}
/*
* If TX flow control is disabled and there's >1 queue defined,
* enable DROP.
*
* This drops frames rather than hanging the RX MAC for all queues.
*/
if ((adapter->num_queues > 1) &&
(adapter->fc == e1000_fc_none ||
adapter->fc == e1000_fc_rx_pause)) {
srrctl |= E1000_SRRCTL_DROP_EN;
}
/* Setup the Base and Length of the Rx Descriptor Rings */
for (int i = 0; i < adapter->num_queues; i++, rxr++) {
u64 bus_addr = rxr->rxdma.dma_paddr;
u32 rxdctl;
E1000_WRITE_REG(hw, E1000_RDLEN(i),
adapter->num_rx_desc * sizeof(struct e1000_rx_desc));
E1000_WRITE_REG(hw, E1000_RDBAH(i),
(uint32_t)(bus_addr >> 32));
E1000_WRITE_REG(hw, E1000_RDBAL(i),
(uint32_t)bus_addr);
E1000_WRITE_REG(hw, E1000_SRRCTL(i), srrctl);
/* Enable this Queue */
rxdctl = E1000_READ_REG(hw, E1000_RXDCTL(i));
rxdctl |= E1000_RXDCTL_QUEUE_ENABLE;
rxdctl &= 0xFFF00000;
rxdctl |= IGB_RX_PTHRESH;
rxdctl |= IGB_RX_HTHRESH << 8;
rxdctl |= IGB_RX_WTHRESH << 16;
E1000_WRITE_REG(hw, E1000_RXDCTL(i), rxdctl);
}
/*
** Setup for RX MultiQueue
*/
rxcsum = E1000_READ_REG(hw, E1000_RXCSUM);
if (adapter->num_queues > 1) {
/* rss setup */
igb_initialise_rss_mapping(adapter);
/*
** NOTE: Receive Full-Packet Checksum Offload
** is mutually exclusive with Multiqueue. However
** this is not the same as TCP/IP checksums which
** still work.
*/
rxcsum |= E1000_RXCSUM_PCSD;
#if __FreeBSD_version >= 800000
/* For SCTP Offload */
if ((hw->mac.type != e1000_82575) &&
(ifp->if_capenable & IFCAP_RXCSUM))
rxcsum |= E1000_RXCSUM_CRCOFL;
#endif
} else {
/* Non RSS setup */
if (ifp->if_capenable & IFCAP_RXCSUM) {
rxcsum |= E1000_RXCSUM_IPPCSE;
#if __FreeBSD_version >= 800000
if (adapter->hw.mac.type != e1000_82575)
rxcsum |= E1000_RXCSUM_CRCOFL;
#endif
} else
rxcsum &= ~E1000_RXCSUM_TUOFL;
}
E1000_WRITE_REG(hw, E1000_RXCSUM, rxcsum);
/* Setup the Receive Control Register */
rctl &= ~(3 << E1000_RCTL_MO_SHIFT);
rctl |= E1000_RCTL_EN | E1000_RCTL_BAM | E1000_RCTL_LBM_NO |
E1000_RCTL_RDMTS_HALF |
(hw->mac.mc_filter_type << E1000_RCTL_MO_SHIFT);
/* Strip CRC bytes. */
rctl |= E1000_RCTL_SECRC;
/* Make sure VLAN Filters are off */
rctl &= ~E1000_RCTL_VFE;
/* Don't store bad packets */
rctl &= ~E1000_RCTL_SBP;
/* Enable Receives */
E1000_WRITE_REG(hw, E1000_RCTL, rctl);
/*
* Setup the HW Rx Head and Tail Descriptor Pointers
* - needs to be after enable
*/
for (int i = 0; i < adapter->num_queues; i++) {
rxr = &adapter->rx_rings[i];
E1000_WRITE_REG(hw, E1000_RDH(i), rxr->next_to_check);
#ifdef DEV_NETMAP
/*
* an init() while a netmap client is active must
* preserve the rx buffers passed to userspace.
* In this driver it means we adjust RDT to
* something different from next_to_refresh
* (which is not used in netmap mode).
*/
if (ifp->if_capenable & IFCAP_NETMAP) {
struct netmap_adapter *na = NA(adapter->ifp);
struct netmap_kring *kring = &na->rx_rings[i];
int t = rxr->next_to_refresh - nm_kr_rxspace(kring);
if (t >= adapter->num_rx_desc)
t -= adapter->num_rx_desc;
else if (t < 0)
t += adapter->num_rx_desc;
E1000_WRITE_REG(hw, E1000_RDT(i), t);
} else
#endif /* DEV_NETMAP */
E1000_WRITE_REG(hw, E1000_RDT(i), rxr->next_to_refresh);
}
return;
}
/*********************************************************************
*
* Free receive rings.
*
**********************************************************************/
static void
igb_free_receive_structures(struct adapter *adapter)
{
struct rx_ring *rxr = adapter->rx_rings;
for (int i = 0; i < adapter->num_queues; i++, rxr++) {
struct lro_ctrl *lro = &rxr->lro;
igb_free_receive_buffers(rxr);
tcp_lro_free(lro);
igb_dma_free(adapter, &rxr->rxdma);
}
free(adapter->rx_rings, M_DEVBUF);
}
/*********************************************************************
*
* Free receive ring data structures.
*
**********************************************************************/
static void
igb_free_receive_buffers(struct rx_ring *rxr)
{
struct adapter *adapter = rxr->adapter;
struct igb_rx_buf *rxbuf;
int i;
INIT_DEBUGOUT("igb_free_receive_buffers: begin");
/* Cleanup any existing buffers */
if (rxr->rx_buffers != NULL) {
for (i = 0; i < adapter->num_rx_desc; i++) {
rxbuf = &rxr->rx_buffers[i];
if (rxbuf->m_head != NULL) {
bus_dmamap_sync(rxr->htag, rxbuf->hmap,
BUS_DMASYNC_POSTREAD);
bus_dmamap_unload(rxr->htag, rxbuf->hmap);
rxbuf->m_head->m_flags |= M_PKTHDR;
m_freem(rxbuf->m_head);
}
if (rxbuf->m_pack != NULL) {
bus_dmamap_sync(rxr->ptag, rxbuf->pmap,
BUS_DMASYNC_POSTREAD);
bus_dmamap_unload(rxr->ptag, rxbuf->pmap);
rxbuf->m_pack->m_flags |= M_PKTHDR;
m_freem(rxbuf->m_pack);
}
rxbuf->m_head = NULL;
rxbuf->m_pack = NULL;
if (rxbuf->hmap != NULL) {
bus_dmamap_destroy(rxr->htag, rxbuf->hmap);
rxbuf->hmap = NULL;
}
if (rxbuf->pmap != NULL) {
bus_dmamap_destroy(rxr->ptag, rxbuf->pmap);
rxbuf->pmap = NULL;
}
}
free(rxr->rx_buffers, M_DEVBUF);
rxr->rx_buffers = NULL;
}
if (rxr->htag != NULL) {
bus_dma_tag_destroy(rxr->htag);
rxr->htag = NULL;
}
if (rxr->ptag != NULL) {
bus_dma_tag_destroy(rxr->ptag);
rxr->ptag = NULL;
}
}
static __inline void
igb_rx_discard(struct rx_ring *rxr, int i)
{
struct igb_rx_buf *rbuf;
rbuf = &rxr->rx_buffers[i];
/* Partially received? Free the chain */
if (rxr->fmp != NULL) {
rxr->fmp->m_flags |= M_PKTHDR;
m_freem(rxr->fmp);
rxr->fmp = NULL;
rxr->lmp = NULL;
}
/*
** With advanced descriptors the writeback
** clobbers the buffer addrs, so it's easier
** to just free the existing mbufs and take
** the normal refresh path to get new buffers
** and mapping.
*/
if (rbuf->m_head) {
m_free(rbuf->m_head);
rbuf->m_head = NULL;
bus_dmamap_unload(rxr->htag, rbuf->hmap);
}
if (rbuf->m_pack) {
m_free(rbuf->m_pack);
rbuf->m_pack = NULL;
bus_dmamap_unload(rxr->ptag, rbuf->pmap);
}
return;
}
static __inline void
igb_rx_input(struct rx_ring *rxr, struct ifnet *ifp, struct mbuf *m, u32 ptype)
{
/*
* At the moment LRO is only done for IPv4/TCP packets whose TCP checksum
* has been verified by hardware. The packet must also not carry a VLAN
* tag in its Ethernet header.
*/
if (rxr->lro_enabled &&
(ifp->if_capenable & IFCAP_VLAN_HWTAGGING) != 0 &&
(ptype & E1000_RXDADV_PKTTYPE_ETQF) == 0 &&
(ptype & (E1000_RXDADV_PKTTYPE_IPV4 | E1000_RXDADV_PKTTYPE_TCP)) ==
(E1000_RXDADV_PKTTYPE_IPV4 | E1000_RXDADV_PKTTYPE_TCP) &&
(m->m_pkthdr.csum_flags & (CSUM_DATA_VALID | CSUM_PSEUDO_HDR)) ==
(CSUM_DATA_VALID | CSUM_PSEUDO_HDR)) {
/*
* Send to the stack if:
* - LRO not enabled, or
* - no LRO resources, or
* - lro enqueue fails
*/
if (rxr->lro.lro_cnt != 0)
if (tcp_lro_rx(&rxr->lro, m, 0) == 0)
return;
}
IGB_RX_UNLOCK(rxr);
(*ifp->if_input)(ifp, m);
IGB_RX_LOCK(rxr);
}
/*********************************************************************
*
* This routine executes in interrupt context. It replenishes
* the mbufs in the descriptor and sends data which has been
* DMA'd into host memory to the upper layer.
*
* We loop at most count times if count is > 0, or until done if
* count < 0.
*
* Return TRUE if more to clean, FALSE otherwise
*********************************************************************/
static bool
igb_rxeof(struct igb_queue *que, int count, int *done)
{
struct adapter *adapter = que->adapter;
struct rx_ring *rxr = que->rxr;
struct ifnet *ifp = adapter->ifp;
struct lro_ctrl *lro = &rxr->lro;
int i, processed = 0, rxdone = 0;
u32 ptype, staterr = 0;
union e1000_adv_rx_desc *cur;
IGB_RX_LOCK(rxr);
/* Sync the ring. */
bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
#ifdef DEV_NETMAP
if (netmap_rx_irq(ifp, rxr->me, &processed)) {
IGB_RX_UNLOCK(rxr);
return (FALSE);
}
#endif /* DEV_NETMAP */
/* Main clean loop */
for (i = rxr->next_to_check; count != 0;) {
struct mbuf *sendmp, *mh, *mp;
struct igb_rx_buf *rxbuf;
u16 hlen, plen, hdr, vtag, pkt_info;
bool eop = FALSE;
cur = &rxr->rx_base[i];
staterr = le32toh(cur->wb.upper.status_error);
if ((staterr & E1000_RXD_STAT_DD) == 0)
break;
if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
break;
count--;
sendmp = mh = mp = NULL;
cur->wb.upper.status_error = 0;
rxbuf = &rxr->rx_buffers[i];
plen = le16toh(cur->wb.upper.length);
ptype = le32toh(cur->wb.lower.lo_dword.data) & IGB_PKTTYPE_MASK;
if (((adapter->hw.mac.type == e1000_i350) ||
(adapter->hw.mac.type == e1000_i354)) &&
(staterr & E1000_RXDEXT_STATERR_LB))
vtag = be16toh(cur->wb.upper.vlan);
else
vtag = le16toh(cur->wb.upper.vlan);
hdr = le16toh(cur->wb.lower.lo_dword.hs_rss.hdr_info);
pkt_info = le16toh(cur->wb.lower.lo_dword.hs_rss.pkt_info);
eop = ((staterr & E1000_RXD_STAT_EOP) == E1000_RXD_STAT_EOP);
/*
* Free the frame (all segments) if we're at EOP and
* it's an error.
*
* The datasheet states that EOP + status is only valid for
* the final segment in a multi-segment frame.
*/
if (eop && ((staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) != 0)) {
adapter->dropped_pkts++;
++rxr->rx_discarded;
igb_rx_discard(rxr, i);
goto next_desc;
}
/*
** The way the hardware is configured to
** split, it will ONLY use the header buffer
** when header split is enabled, otherwise we
** get normal behavior, i.e., both header and
** payload are DMA'd into the payload buffer.
**
** The fmp test is to catch the case where a
** packet spans multiple descriptors, in that
** case only the first header is valid.
*/
if (rxr->hdr_split && rxr->fmp == NULL) {
bus_dmamap_unload(rxr->htag, rxbuf->hmap);
hlen = (hdr & E1000_RXDADV_HDRBUFLEN_MASK) >>
E1000_RXDADV_HDRBUFLEN_SHIFT;
if (hlen > IGB_HDR_BUF)
hlen = IGB_HDR_BUF;
mh = rxr->rx_buffers[i].m_head;
mh->m_len = hlen;
/* clear buf pointer for refresh */
rxbuf->m_head = NULL;
/*
** Get the payload length, this
** could be zero if it's a small
** packet.
*/
if (plen > 0) {
mp = rxr->rx_buffers[i].m_pack;
mp->m_len = plen;
mh->m_next = mp;
/* clear buf pointer */
rxbuf->m_pack = NULL;
rxr->rx_split_packets++;
}
} else {
/*
** Either no header split, or a
** secondary piece of a fragmented
** split packet.
*/
mh = rxr->rx_buffers[i].m_pack;
mh->m_len = plen;
/* clear buf info for refresh */
rxbuf->m_pack = NULL;
}
bus_dmamap_unload(rxr->ptag, rxbuf->pmap);
++processed; /* So we know when to refresh */
/* Initial frame - setup */
if (rxr->fmp == NULL) {
mh->m_pkthdr.len = mh->m_len;
/* Save the head of the chain */
rxr->fmp = mh;
rxr->lmp = mh;
if (mp != NULL) {
/* Add payload if split */
mh->m_pkthdr.len += mp->m_len;
rxr->lmp = mh->m_next;
}
} else {
/* Chain mbufs together */
rxr->lmp->m_next = mh;
rxr->lmp = rxr->lmp->m_next;
rxr->fmp->m_pkthdr.len += mh->m_len;
}
if (eop) {
rxr->fmp->m_pkthdr.rcvif = ifp;
rxr->rx_packets++;
/* capture data for AIM */
rxr->packets++;
rxr->bytes += rxr->fmp->m_pkthdr.len;
rxr->rx_bytes += rxr->fmp->m_pkthdr.len;
if ((ifp->if_capenable & IFCAP_RXCSUM) != 0)
igb_rx_checksum(staterr, rxr->fmp, ptype);
if ((ifp->if_capenable & IFCAP_VLAN_HWTAGGING) != 0 &&
(staterr & E1000_RXD_STAT_VP) != 0) {
rxr->fmp->m_pkthdr.ether_vtag = vtag;
rxr->fmp->m_flags |= M_VLANTAG;
}
/*
* In case of multiqueue, we have RXCSUM.PCSD bit set
* and never cleared. This means we have RSS hash
* available to be used.
*/
if (adapter->num_queues > 1) {
rxr->fmp->m_pkthdr.flowid =
le32toh(cur->wb.lower.hi_dword.rss);
switch (pkt_info & E1000_RXDADV_RSSTYPE_MASK) {
case E1000_RXDADV_RSSTYPE_IPV4_TCP:
M_HASHTYPE_SET(rxr->fmp,
M_HASHTYPE_RSS_TCP_IPV4);
break;
case E1000_RXDADV_RSSTYPE_IPV4:
M_HASHTYPE_SET(rxr->fmp,
M_HASHTYPE_RSS_IPV4);
break;
case E1000_RXDADV_RSSTYPE_IPV6_TCP:
M_HASHTYPE_SET(rxr->fmp,
M_HASHTYPE_RSS_TCP_IPV6);
break;
case E1000_RXDADV_RSSTYPE_IPV6_EX:
M_HASHTYPE_SET(rxr->fmp,
M_HASHTYPE_RSS_IPV6_EX);
break;
case E1000_RXDADV_RSSTYPE_IPV6:
M_HASHTYPE_SET(rxr->fmp,
M_HASHTYPE_RSS_IPV6);
break;
case E1000_RXDADV_RSSTYPE_IPV6_TCP_EX:
M_HASHTYPE_SET(rxr->fmp,
M_HASHTYPE_RSS_TCP_IPV6_EX);
break;
default:
/* XXX fallthrough */
M_HASHTYPE_SET(rxr->fmp,
- M_HASHTYPE_OPAQUE);
+ M_HASHTYPE_OPAQUE_HASH);
}
} else {
#ifndef IGB_LEGACY_TX
rxr->fmp->m_pkthdr.flowid = que->msix;
M_HASHTYPE_SET(rxr->fmp, M_HASHTYPE_OPAQUE);
#endif
}
sendmp = rxr->fmp;
/* Make sure to set M_PKTHDR. */
sendmp->m_flags |= M_PKTHDR;
rxr->fmp = NULL;
rxr->lmp = NULL;
}
next_desc:
bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
/* Advance our pointers to the next descriptor. */
if (++i == adapter->num_rx_desc)
i = 0;
/*
** Send to the stack or LRO
*/
if (sendmp != NULL) {
rxr->next_to_check = i;
igb_rx_input(rxr, ifp, sendmp, ptype);
i = rxr->next_to_check;
rxdone++;
}
/* Every 8 descriptors we go to refresh mbufs */
if (processed == 8) {
igb_refresh_mbufs(rxr, i);
processed = 0;
}
}
/* Catch any remainders */
if (igb_rx_unrefreshed(rxr))
igb_refresh_mbufs(rxr, i);
rxr->next_to_check = i;
/*
* Flush any outstanding LRO work
*/
tcp_lro_flush_all(lro);
if (done != NULL)
*done += rxdone;
IGB_RX_UNLOCK(rxr);
return ((staterr & E1000_RXD_STAT_DD) ? TRUE : FALSE);
}
/*********************************************************************
*
* Verify that the hardware indicated that the checksum is valid.
* Inform the stack about the status of checksum so that stack
* doesn't spend time verifying the checksum.
*
*********************************************************************/
static void
igb_rx_checksum(u32 staterr, struct mbuf *mp, u32 ptype)
{
u16 status = (u16)staterr;
u8 errors = (u8) (staterr >> 24);
int sctp;
/* Ignore Checksum bit is set */
if (status & E1000_RXD_STAT_IXSM) {
mp->m_pkthdr.csum_flags = 0;
return;
}
if ((ptype & E1000_RXDADV_PKTTYPE_ETQF) == 0 &&
(ptype & E1000_RXDADV_PKTTYPE_SCTP) != 0)
sctp = 1;
else
sctp = 0;
if (status & E1000_RXD_STAT_IPCS) {
/* Did it pass? */
if (!(errors & E1000_RXD_ERR_IPE)) {
/* IP Checksum Good */
mp->m_pkthdr.csum_flags = CSUM_IP_CHECKED;
mp->m_pkthdr.csum_flags |= CSUM_IP_VALID;
} else
mp->m_pkthdr.csum_flags = 0;
}
if (status & (E1000_RXD_STAT_TCPCS | E1000_RXD_STAT_UDPCS)) {
u64 type = (CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
#if __FreeBSD_version >= 800000
if (sctp) /* reassign */
type = CSUM_SCTP_VALID;
#endif
/* Did it pass? */
if (!(errors & E1000_RXD_ERR_TCPE)) {
mp->m_pkthdr.csum_flags |= type;
if (sctp == 0)
mp->m_pkthdr.csum_data = htons(0xffff);
}
}
return;
}
/*
* This routine is run via a vlan
* config EVENT
*/
static void
igb_register_vlan(void *arg, struct ifnet *ifp, u16 vtag)
{
struct adapter *adapter = ifp->if_softc;
u32 index, bit;
if (ifp->if_softc != arg) /* Not our event */
return;
if ((vtag == 0) || (vtag > 4095)) /* Invalid */
return;
IGB_CORE_LOCK(adapter);
index = (vtag >> 5) & 0x7F;
bit = vtag & 0x1F;
adapter->shadow_vfta[index] |= (1 << bit);
++adapter->num_vlans;
/* Change hw filter setting */
if (ifp->if_capenable & IFCAP_VLAN_HWFILTER)
igb_setup_vlan_hw_support(adapter);
IGB_CORE_UNLOCK(adapter);
}
/*
* This routine is run via a vlan
* unconfig EVENT
*/
static void
igb_unregister_vlan(void *arg, struct ifnet *ifp, u16 vtag)
{
struct adapter *adapter = ifp->if_softc;
u32 index, bit;
if (ifp->if_softc != arg)
return;
if ((vtag == 0) || (vtag > 4095)) /* Invalid */
return;
IGB_CORE_LOCK(adapter);
index = (vtag >> 5) & 0x7F;
bit = vtag & 0x1F;
adapter->shadow_vfta[index] &= ~(1 << bit);
--adapter->num_vlans;
/* Change hw filter setting */
if (ifp->if_capenable & IFCAP_VLAN_HWFILTER)
igb_setup_vlan_hw_support(adapter);
IGB_CORE_UNLOCK(adapter);
}
static void
igb_setup_vlan_hw_support(struct adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
struct ifnet *ifp = adapter->ifp;
u32 reg;
if (adapter->vf_ifp) {
e1000_rlpml_set_vf(hw,
adapter->max_frame_size + VLAN_TAG_SIZE);
return;
}
reg = E1000_READ_REG(hw, E1000_CTRL);
reg |= E1000_CTRL_VME;
E1000_WRITE_REG(hw, E1000_CTRL, reg);
/* Enable the Filter Table */
if (ifp->if_capenable & IFCAP_VLAN_HWFILTER) {
reg = E1000_READ_REG(hw, E1000_RCTL);
reg &= ~E1000_RCTL_CFIEN;
reg |= E1000_RCTL_VFE;
E1000_WRITE_REG(hw, E1000_RCTL, reg);
}
/* Update the frame size */
E1000_WRITE_REG(&adapter->hw, E1000_RLPML,
adapter->max_frame_size + VLAN_TAG_SIZE);
/* Don't bother with table if no vlans */
if ((adapter->num_vlans == 0) ||
((ifp->if_capenable & IFCAP_VLAN_HWFILTER) == 0))
return;
/*
** A soft reset zeroes out the VFTA, so
** we need to repopulate it now.
*/
for (int i = 0; i < IGB_VFTA_SIZE; i++)
if (adapter->shadow_vfta[i] != 0) {
if (adapter->vf_ifp)
e1000_vfta_set_vf(hw,
adapter->shadow_vfta[i], TRUE);
else
e1000_write_vfta(hw,
i, adapter->shadow_vfta[i]);
}
}
static void
igb_enable_intr(struct adapter *adapter)
{
/* With RSS set up what to auto clear */
if (adapter->msix_mem) {
u32 mask = (adapter->que_mask | adapter->link_mask);
E1000_WRITE_REG(&adapter->hw, E1000_EIAC, mask);
E1000_WRITE_REG(&adapter->hw, E1000_EIAM, mask);
E1000_WRITE_REG(&adapter->hw, E1000_EIMS, mask);
E1000_WRITE_REG(&adapter->hw, E1000_IMS,
E1000_IMS_LSC);
} else {
E1000_WRITE_REG(&adapter->hw, E1000_IMS,
IMS_ENABLE_MASK);
}
E1000_WRITE_FLUSH(&adapter->hw);
return;
}
static void
igb_disable_intr(struct adapter *adapter)
{
if (adapter->msix_mem) {
E1000_WRITE_REG(&adapter->hw, E1000_EIMC, ~0);
E1000_WRITE_REG(&adapter->hw, E1000_EIAC, 0);
}
E1000_WRITE_REG(&adapter->hw, E1000_IMC, ~0);
E1000_WRITE_FLUSH(&adapter->hw);
return;
}
/*
* Bit of a misnomer, what this really means is
* to enable OS management of the system... aka
* to disable special hardware management features
*/
static void
igb_init_manageability(struct adapter *adapter)
{
if (adapter->has_manage) {
int manc2h = E1000_READ_REG(&adapter->hw, E1000_MANC2H);
int manc = E1000_READ_REG(&adapter->hw, E1000_MANC);
/* disable hardware interception of ARP */
manc &= ~(E1000_MANC_ARP_EN);
/* enable receiving management packets to the host */
manc |= E1000_MANC_EN_MNG2HOST;
manc2h |= 1 << 5; /* Mng Port 623 */
manc2h |= 1 << 6; /* Mng Port 664 */
E1000_WRITE_REG(&adapter->hw, E1000_MANC2H, manc2h);
E1000_WRITE_REG(&adapter->hw, E1000_MANC, manc);
}
}
/*
* Give control back to hardware management
* controller if there is one.
*/
static void
igb_release_manageability(struct adapter *adapter)
{
if (adapter->has_manage) {
int manc = E1000_READ_REG(&adapter->hw, E1000_MANC);
/* re-enable hardware interception of ARP */
manc |= E1000_MANC_ARP_EN;
manc &= ~E1000_MANC_EN_MNG2HOST;
E1000_WRITE_REG(&adapter->hw, E1000_MANC, manc);
}
}
/*
* igb_get_hw_control sets CTRL_EXT:DRV_LOAD bit.
* For ASF and Pass Through versions of f/w this means that
* the driver is loaded.
*
*/
static void
igb_get_hw_control(struct adapter *adapter)
{
u32 ctrl_ext;
if (adapter->vf_ifp)
return;
/* Let firmware know the driver has taken over */
ctrl_ext = E1000_READ_REG(&adapter->hw, E1000_CTRL_EXT);
E1000_WRITE_REG(&adapter->hw, E1000_CTRL_EXT,
ctrl_ext | E1000_CTRL_EXT_DRV_LOAD);
}
/*
* igb_release_hw_control resets CTRL_EXT:DRV_LOAD bit.
* For ASF and Pass Through versions of f/w this means that the
* driver is no longer loaded.
*
*/
static void
igb_release_hw_control(struct adapter *adapter)
{
u32 ctrl_ext;
if (adapter->vf_ifp)
return;
/* Let firmware take over control of h/w */
ctrl_ext = E1000_READ_REG(&adapter->hw, E1000_CTRL_EXT);
E1000_WRITE_REG(&adapter->hw, E1000_CTRL_EXT,
ctrl_ext & ~E1000_CTRL_EXT_DRV_LOAD);
}
static int
igb_is_valid_ether_addr(uint8_t *addr)
{
char zero_addr[6] = { 0, 0, 0, 0, 0, 0 };
if ((addr[0] & 1) || (!bcmp(addr, zero_addr, ETHER_ADDR_LEN))) {
return (FALSE);
}
return (TRUE);
}
/*
* Enable PCI Wake On Lan capability
*/
static void
igb_enable_wakeup(device_t dev)
{
u16 cap, status;
u8 id;
/* First find the capabilities pointer */
cap = pci_read_config(dev, PCIR_CAP_PTR, 2);
/* Read the PM Capabilities */
id = pci_read_config(dev, cap, 1);
if (id != PCIY_PMG) /* Something wrong */
return;
/* OK, we have the power capabilities, so
now get the status register */
cap += PCIR_POWER_STATUS;
status = pci_read_config(dev, cap, 2);
status |= PCIM_PSTAT_PME | PCIM_PSTAT_PMEENABLE;
pci_write_config(dev, cap, status, 2);
return;
}
static void
igb_led_func(void *arg, int onoff)
{
struct adapter *adapter = arg;
IGB_CORE_LOCK(adapter);
if (onoff) {
e1000_setup_led(&adapter->hw);
e1000_led_on(&adapter->hw);
} else {
e1000_led_off(&adapter->hw);
e1000_cleanup_led(&adapter->hw);
}
IGB_CORE_UNLOCK(adapter);
}
static uint64_t
igb_get_vf_counter(if_t ifp, ift_counter cnt)
{
struct adapter *adapter;
struct e1000_vf_stats *stats;
#ifndef IGB_LEGACY_TX
struct tx_ring *txr;
uint64_t rv;
#endif
adapter = if_getsoftc(ifp);
stats = (struct e1000_vf_stats *)adapter->stats;
switch (cnt) {
case IFCOUNTER_IPACKETS:
return (stats->gprc);
case IFCOUNTER_OPACKETS:
return (stats->gptc);
case IFCOUNTER_IBYTES:
return (stats->gorc);
case IFCOUNTER_OBYTES:
return (stats->gotc);
case IFCOUNTER_IMCASTS:
return (stats->mprc);
case IFCOUNTER_IERRORS:
return (adapter->dropped_pkts);
case IFCOUNTER_OERRORS:
return (adapter->watchdog_events);
#ifndef IGB_LEGACY_TX
case IFCOUNTER_OQDROPS:
rv = 0;
txr = adapter->tx_rings;
for (int i = 0; i < adapter->num_queues; i++, txr++)
rv += txr->br->br_drops;
return (rv);
#endif
default:
return (if_get_counter_default(ifp, cnt));
}
}
static uint64_t
igb_get_counter(if_t ifp, ift_counter cnt)
{
struct adapter *adapter;
struct e1000_hw_stats *stats;
#ifndef IGB_LEGACY_TX
struct tx_ring *txr;
uint64_t rv;
#endif
adapter = if_getsoftc(ifp);
if (adapter->vf_ifp)
return (igb_get_vf_counter(ifp, cnt));
stats = (struct e1000_hw_stats *)adapter->stats;
switch (cnt) {
case IFCOUNTER_IPACKETS:
return (stats->gprc);
case IFCOUNTER_OPACKETS:
return (stats->gptc);
case IFCOUNTER_IBYTES:
return (stats->gorc);
case IFCOUNTER_OBYTES:
return (stats->gotc);
case IFCOUNTER_IMCASTS:
return (stats->mprc);
case IFCOUNTER_OMCASTS:
return (stats->mptc);
case IFCOUNTER_IERRORS:
return (adapter->dropped_pkts + stats->rxerrc +
stats->crcerrs + stats->algnerrc +
stats->ruc + stats->roc + stats->cexterr);
case IFCOUNTER_OERRORS:
return (stats->ecol + stats->latecol +
adapter->watchdog_events);
case IFCOUNTER_COLLISIONS:
return (stats->colc);
case IFCOUNTER_IQDROPS:
return (stats->mpc);
#ifndef IGB_LEGACY_TX
case IFCOUNTER_OQDROPS:
rv = 0;
txr = adapter->tx_rings;
for (int i = 0; i < adapter->num_queues; i++, txr++)
rv += txr->br->br_drops;
return (rv);
#endif
default:
return (if_get_counter_default(ifp, cnt));
}
}
/**********************************************************************
*
* Update the board statistics counters.
*
**********************************************************************/
static void
igb_update_stats_counters(struct adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
struct e1000_hw_stats *stats;
/*
** The virtual function adapter has only a
** small controlled set of stats, do only
** those and return.
*/
if (adapter->vf_ifp) {
igb_update_vf_stats_counters(adapter);
return;
}
stats = (struct e1000_hw_stats *)adapter->stats;
if (adapter->hw.phy.media_type == e1000_media_type_copper ||
(E1000_READ_REG(hw, E1000_STATUS) & E1000_STATUS_LU)) {
stats->symerrs +=
E1000_READ_REG(hw,E1000_SYMERRS);
stats->sec += E1000_READ_REG(hw, E1000_SEC);
}
stats->crcerrs += E1000_READ_REG(hw, E1000_CRCERRS);
stats->mpc += E1000_READ_REG(hw, E1000_MPC);
stats->scc += E1000_READ_REG(hw, E1000_SCC);
stats->ecol += E1000_READ_REG(hw, E1000_ECOL);
stats->mcc += E1000_READ_REG(hw, E1000_MCC);
stats->latecol += E1000_READ_REG(hw, E1000_LATECOL);
stats->colc += E1000_READ_REG(hw, E1000_COLC);
stats->dc += E1000_READ_REG(hw, E1000_DC);
stats->rlec += E1000_READ_REG(hw, E1000_RLEC);
stats->xonrxc += E1000_READ_REG(hw, E1000_XONRXC);
stats->xontxc += E1000_READ_REG(hw, E1000_XONTXC);
/*
** For watchdog management we need to know if we have been
** paused during the last interval, so capture that here.
*/
adapter->pause_frames = E1000_READ_REG(&adapter->hw, E1000_XOFFRXC);
stats->xoffrxc += adapter->pause_frames;
stats->xofftxc += E1000_READ_REG(hw, E1000_XOFFTXC);
stats->fcruc += E1000_READ_REG(hw, E1000_FCRUC);
stats->prc64 += E1000_READ_REG(hw, E1000_PRC64);
stats->prc127 += E1000_READ_REG(hw, E1000_PRC127);
stats->prc255 += E1000_READ_REG(hw, E1000_PRC255);
stats->prc511 += E1000_READ_REG(hw, E1000_PRC511);
stats->prc1023 += E1000_READ_REG(hw, E1000_PRC1023);
stats->prc1522 += E1000_READ_REG(hw, E1000_PRC1522);
stats->gprc += E1000_READ_REG(hw, E1000_GPRC);
stats->bprc += E1000_READ_REG(hw, E1000_BPRC);
stats->mprc += E1000_READ_REG(hw, E1000_MPRC);
stats->gptc += E1000_READ_REG(hw, E1000_GPTC);
/* For the 64-bit byte counters the low dword must be read first. */
/* Both registers clear on the read of the high dword */
stats->gorc += E1000_READ_REG(hw, E1000_GORCL) +
((u64)E1000_READ_REG(hw, E1000_GORCH) << 32);
stats->gotc += E1000_READ_REG(hw, E1000_GOTCL) +
((u64)E1000_READ_REG(hw, E1000_GOTCH) << 32);
stats->rnbc += E1000_READ_REG(hw, E1000_RNBC);
stats->ruc += E1000_READ_REG(hw, E1000_RUC);
stats->rfc += E1000_READ_REG(hw, E1000_RFC);
stats->roc += E1000_READ_REG(hw, E1000_ROC);
stats->rjc += E1000_READ_REG(hw, E1000_RJC);
stats->mgprc += E1000_READ_REG(hw, E1000_MGTPRC);
stats->mgpdc += E1000_READ_REG(hw, E1000_MGTPDC);
stats->mgptc += E1000_READ_REG(hw, E1000_MGTPTC);
stats->tor += E1000_READ_REG(hw, E1000_TORL) +
((u64)E1000_READ_REG(hw, E1000_TORH) << 32);
stats->tot += E1000_READ_REG(hw, E1000_TOTL) +
((u64)E1000_READ_REG(hw, E1000_TOTH) << 32);
stats->tpr += E1000_READ_REG(hw, E1000_TPR);
stats->tpt += E1000_READ_REG(hw, E1000_TPT);
stats->ptc64 += E1000_READ_REG(hw, E1000_PTC64);
stats->ptc127 += E1000_READ_REG(hw, E1000_PTC127);
stats->ptc255 += E1000_READ_REG(hw, E1000_PTC255);
stats->ptc511 += E1000_READ_REG(hw, E1000_PTC511);
stats->ptc1023 += E1000_READ_REG(hw, E1000_PTC1023);
stats->ptc1522 += E1000_READ_REG(hw, E1000_PTC1522);
stats->mptc += E1000_READ_REG(hw, E1000_MPTC);
stats->bptc += E1000_READ_REG(hw, E1000_BPTC);
/* Interrupt Counts */
stats->iac += E1000_READ_REG(hw, E1000_IAC);
stats->icrxptc += E1000_READ_REG(hw, E1000_ICRXPTC);
stats->icrxatc += E1000_READ_REG(hw, E1000_ICRXATC);
stats->ictxptc += E1000_READ_REG(hw, E1000_ICTXPTC);
stats->ictxatc += E1000_READ_REG(hw, E1000_ICTXATC);
stats->ictxqec += E1000_READ_REG(hw, E1000_ICTXQEC);
stats->ictxqmtc += E1000_READ_REG(hw, E1000_ICTXQMTC);
stats->icrxdmtc += E1000_READ_REG(hw, E1000_ICRXDMTC);
stats->icrxoc += E1000_READ_REG(hw, E1000_ICRXOC);
/* Host to Card Statistics */
stats->cbtmpc += E1000_READ_REG(hw, E1000_CBTMPC);
stats->htdpmc += E1000_READ_REG(hw, E1000_HTDPMC);
stats->cbrdpc += E1000_READ_REG(hw, E1000_CBRDPC);
stats->cbrmpc += E1000_READ_REG(hw, E1000_CBRMPC);
stats->rpthc += E1000_READ_REG(hw, E1000_RPTHC);
stats->hgptc += E1000_READ_REG(hw, E1000_HGPTC);
stats->htcbdpc += E1000_READ_REG(hw, E1000_HTCBDPC);
stats->hgorc += (E1000_READ_REG(hw, E1000_HGORCL) +
((u64)E1000_READ_REG(hw, E1000_HGORCH) << 32));
stats->hgotc += (E1000_READ_REG(hw, E1000_HGOTCL) +
((u64)E1000_READ_REG(hw, E1000_HGOTCH) << 32));
stats->lenerrs += E1000_READ_REG(hw, E1000_LENERRS);
stats->scvpc += E1000_READ_REG(hw, E1000_SCVPC);
stats->hrmpc += E1000_READ_REG(hw, E1000_HRMPC);
stats->algnerrc += E1000_READ_REG(hw, E1000_ALGNERRC);
stats->rxerrc += E1000_READ_REG(hw, E1000_RXERRC);
stats->tncrs += E1000_READ_REG(hw, E1000_TNCRS);
stats->cexterr += E1000_READ_REG(hw, E1000_CEXTERR);
stats->tsctc += E1000_READ_REG(hw, E1000_TSCTC);
stats->tsctfc += E1000_READ_REG(hw, E1000_TSCTFC);
/* Driver specific counters */
adapter->device_control = E1000_READ_REG(hw, E1000_CTRL);
adapter->rx_control = E1000_READ_REG(hw, E1000_RCTL);
adapter->int_mask = E1000_READ_REG(hw, E1000_IMS);
adapter->eint_mask = E1000_READ_REG(hw, E1000_EIMS);
adapter->packet_buf_alloc_tx =
((E1000_READ_REG(hw, E1000_PBA) & 0xffff0000) >> 16);
adapter->packet_buf_alloc_rx =
(E1000_READ_REG(hw, E1000_PBA) & 0xffff);
}
/**********************************************************************
*
* Initialize the VF board statistics counters.
*
**********************************************************************/
static void
igb_vf_init_stats(struct adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
struct e1000_vf_stats *stats;
stats = (struct e1000_vf_stats *)adapter->stats;
if (stats == NULL)
return;
stats->last_gprc = E1000_READ_REG(hw, E1000_VFGPRC);
stats->last_gorc = E1000_READ_REG(hw, E1000_VFGORC);
stats->last_gptc = E1000_READ_REG(hw, E1000_VFGPTC);
stats->last_gotc = E1000_READ_REG(hw, E1000_VFGOTC);
stats->last_mprc = E1000_READ_REG(hw, E1000_VFMPRC);
}
/**********************************************************************
*
* Update the VF board statistics counters.
*
**********************************************************************/
static void
igb_update_vf_stats_counters(struct adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
struct e1000_vf_stats *stats;
if (adapter->link_speed == 0)
return;
stats = (struct e1000_vf_stats *)adapter->stats;
UPDATE_VF_REG(E1000_VFGPRC,
stats->last_gprc, stats->gprc);
UPDATE_VF_REG(E1000_VFGORC,
stats->last_gorc, stats->gorc);
UPDATE_VF_REG(E1000_VFGPTC,
stats->last_gptc, stats->gptc);
UPDATE_VF_REG(E1000_VFGOTC,
stats->last_gotc, stats->gotc);
UPDATE_VF_REG(E1000_VFMPRC,
stats->last_mprc, stats->mprc);
}
/* Export a single 32-bit register via a read-only sysctl. */
static int
igb_sysctl_reg_handler(SYSCTL_HANDLER_ARGS)
{
struct adapter *adapter;
u_int val;
adapter = oidp->oid_arg1;
val = E1000_READ_REG(&adapter->hw, oidp->oid_arg2);
return (sysctl_handle_int(oidp, &val, 0, req));
}
/*
** Tunable interrupt rate handler
*/
static int
igb_sysctl_interrupt_rate_handler(SYSCTL_HANDLER_ARGS)
{
struct igb_queue *que = ((struct igb_queue *)oidp->oid_arg1);
int error;
u32 reg, usec, rate;
reg = E1000_READ_REG(&que->adapter->hw, E1000_EITR(que->msix));
usec = ((reg & 0x7FFC) >> 2);
if (usec > 0)
rate = 1000000 / usec;
else
rate = 0;
error = sysctl_handle_int(oidp, &rate, 0, req);
if (error || !req->newptr)
return (error);
return (0);
}
/*
* Add sysctl variables, one per statistic, to the system.
*/
static void
igb_add_hw_stats(struct adapter *adapter)
{
device_t dev = adapter->dev;
struct tx_ring *txr = adapter->tx_rings;
struct rx_ring *rxr = adapter->rx_rings;
struct sysctl_ctx_list *ctx = device_get_sysctl_ctx(dev);
struct sysctl_oid *tree = device_get_sysctl_tree(dev);
struct sysctl_oid_list *child = SYSCTL_CHILDREN(tree);
struct e1000_hw_stats *stats = adapter->stats;
struct sysctl_oid *stat_node, *queue_node, *int_node, *host_node;
struct sysctl_oid_list *stat_list, *queue_list, *int_list, *host_list;
#define QUEUE_NAME_LEN 32
char namebuf[QUEUE_NAME_LEN];
/* Driver Statistics */
SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "dropped",
CTLFLAG_RD, &adapter->dropped_pkts,
"Driver dropped packets");
SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "link_irq",
CTLFLAG_RD, &adapter->link_irq,
"Link MSIX IRQ Handled");
SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "mbuf_defrag_fail",
CTLFLAG_RD, &adapter->mbuf_defrag_failed,
"Defragmenting mbuf chain failed");
SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "tx_dma_fail",
CTLFLAG_RD, &adapter->no_tx_dma_setup,
"Driver tx dma failure in xmit");
SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "rx_overruns",
CTLFLAG_RD, &adapter->rx_overruns,
"RX overruns");
SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "watchdog_timeouts",
CTLFLAG_RD, &adapter->watchdog_events,
"Watchdog timeouts");
SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "device_control",
CTLFLAG_RD, &adapter->device_control,
"Device Control Register");
SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "rx_control",
CTLFLAG_RD, &adapter->rx_control,
"Receiver Control Register");
SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "interrupt_mask",
CTLFLAG_RD, &adapter->int_mask,
"Interrupt Mask");
SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "extended_int_mask",
CTLFLAG_RD, &adapter->eint_mask,
"Extended Interrupt Mask");
SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "tx_buf_alloc",
CTLFLAG_RD, &adapter->packet_buf_alloc_tx,
"Transmit Buffer Packet Allocation");
SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "rx_buf_alloc",
CTLFLAG_RD, &adapter->packet_buf_alloc_rx,
"Receive Buffer Packet Allocation");
SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "fc_high_water",
CTLFLAG_RD, &adapter->hw.fc.high_water, 0,
"Flow Control High Watermark");
SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "fc_low_water",
CTLFLAG_RD, &adapter->hw.fc.low_water, 0,
"Flow Control Low Watermark");
for (int i = 0; i < adapter->num_queues; i++, rxr++, txr++) {
struct lro_ctrl *lro = &rxr->lro;
snprintf(namebuf, QUEUE_NAME_LEN, "queue%d", i);
queue_node = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, namebuf,
CTLFLAG_RD, NULL, "Queue Name");
queue_list = SYSCTL_CHILDREN(queue_node);
SYSCTL_ADD_PROC(ctx, queue_list, OID_AUTO, "interrupt_rate",
CTLTYPE_UINT | CTLFLAG_RD, &adapter->queues[i],
sizeof(&adapter->queues[i]),
igb_sysctl_interrupt_rate_handler,
"IU", "Interrupt Rate");
SYSCTL_ADD_PROC(ctx, queue_list, OID_AUTO, "txd_head",
CTLTYPE_UINT | CTLFLAG_RD, adapter, E1000_TDH(txr->me),
igb_sysctl_reg_handler, "IU",
"Transmit Descriptor Head");
SYSCTL_ADD_PROC(ctx, queue_list, OID_AUTO, "txd_tail",
CTLTYPE_UINT | CTLFLAG_RD, adapter, E1000_TDT(txr->me),
igb_sysctl_reg_handler, "IU",
"Transmit Descriptor Tail");
SYSCTL_ADD_QUAD(ctx, queue_list, OID_AUTO, "no_desc_avail",
CTLFLAG_RD, &txr->no_desc_avail,
"Queue Descriptors Unavailable");
SYSCTL_ADD_UQUAD(ctx, queue_list, OID_AUTO, "tx_packets",
CTLFLAG_RD, &txr->total_packets,
"Queue Packets Transmitted");
SYSCTL_ADD_PROC(ctx, queue_list, OID_AUTO, "rxd_head",
CTLTYPE_UINT | CTLFLAG_RD, adapter, E1000_RDH(rxr->me),
igb_sysctl_reg_handler, "IU",
"Receive Descriptor Head");
SYSCTL_ADD_PROC(ctx, queue_list, OID_AUTO, "rxd_tail",
CTLTYPE_UINT | CTLFLAG_RD, adapter, E1000_RDT(rxr->me),
igb_sysctl_reg_handler, "IU",
"Receive Descriptor Tail");
SYSCTL_ADD_QUAD(ctx, queue_list, OID_AUTO, "rx_packets",
CTLFLAG_RD, &rxr->rx_packets,
"Queue Packets Received");
SYSCTL_ADD_QUAD(ctx, queue_list, OID_AUTO, "rx_bytes",
CTLFLAG_RD, &rxr->rx_bytes,
"Queue Bytes Received");
SYSCTL_ADD_U64(ctx, queue_list, OID_AUTO, "lro_queued",
CTLFLAG_RD, &lro->lro_queued, 0,
"LRO Queued");
SYSCTL_ADD_U64(ctx, queue_list, OID_AUTO, "lro_flushed",
CTLFLAG_RD, &lro->lro_flushed, 0,
"LRO Flushed");
}
/* MAC stats get their own sub node */
stat_node = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "mac_stats",
CTLFLAG_RD, NULL, "MAC Statistics");
stat_list = SYSCTL_CHILDREN(stat_node);
/*
** VF adapter has a very limited set of stats
** since it's not managing the metal, so to speak.
*/
if (adapter->vf_ifp) {
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "good_pkts_recvd",
CTLFLAG_RD, &stats->gprc,
"Good Packets Received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "good_pkts_txd",
CTLFLAG_RD, &stats->gptc,
"Good Packets Transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "good_octets_recvd",
CTLFLAG_RD, &stats->gorc,
"Good Octets Received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "good_octets_txd",
CTLFLAG_RD, &stats->gotc,
"Good Octets Transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "mcast_pkts_recvd",
CTLFLAG_RD, &stats->mprc,
"Multicast Packets Received");
return;
}
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "excess_coll",
CTLFLAG_RD, &stats->ecol,
"Excessive collisions");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "single_coll",
CTLFLAG_RD, &stats->scc,
"Single collisions");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "multiple_coll",
CTLFLAG_RD, &stats->mcc,
"Multiple collisions");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "late_coll",
CTLFLAG_RD, &stats->latecol,
"Late collisions");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "collision_count",
CTLFLAG_RD, &stats->colc,
"Collision Count");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "symbol_errors",
CTLFLAG_RD, &stats->symerrs,
"Symbol Errors");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "sequence_errors",
CTLFLAG_RD, &stats->sec,
"Sequence Errors");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "defer_count",
CTLFLAG_RD, &stats->dc,
"Defer Count");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "missed_packets",
CTLFLAG_RD, &stats->mpc,
"Missed Packets");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "recv_length_errors",
CTLFLAG_RD, &stats->rlec,
"Receive Length Errors");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "recv_no_buff",
CTLFLAG_RD, &stats->rnbc,
"Receive No Buffers");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "recv_undersize",
CTLFLAG_RD, &stats->ruc,
"Receive Undersize");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "recv_fragmented",
CTLFLAG_RD, &stats->rfc,
"Fragmented Packets Received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "recv_oversize",
CTLFLAG_RD, &stats->roc,
"Oversized Packets Received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "recv_jabber",
CTLFLAG_RD, &stats->rjc,
"Received Jabber");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "recv_errs",
CTLFLAG_RD, &stats->rxerrc,
"Receive Errors");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "crc_errs",
CTLFLAG_RD, &stats->crcerrs,
"CRC errors");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "alignment_errs",
CTLFLAG_RD, &stats->algnerrc,
"Alignment Errors");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tx_no_crs",
CTLFLAG_RD, &stats->tncrs,
"Transmit with No CRS");
/* On 82575 these are collision counts */
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "coll_ext_errs",
CTLFLAG_RD, &stats->cexterr,
"Collision/Carrier extension errors");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "xon_recvd",
CTLFLAG_RD, &stats->xonrxc,
"XON Received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "xon_txd",
CTLFLAG_RD, &stats->xontxc,
"XON Transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "xoff_recvd",
CTLFLAG_RD, &stats->xoffrxc,
"XOFF Received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "xoff_txd",
CTLFLAG_RD, &stats->xofftxc,
"XOFF Transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "unsupported_fc_recvd",
CTLFLAG_RD, &stats->fcruc,
"Unsupported Flow Control Received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "mgmt_pkts_recvd",
CTLFLAG_RD, &stats->mgprc,
"Management Packets Received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "mgmt_pkts_drop",
CTLFLAG_RD, &stats->mgpdc,
"Management Packets Dropped");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "mgmt_pkts_txd",
CTLFLAG_RD, &stats->mgptc,
"Management Packets Transmitted");
/* Packet Reception Stats */
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "total_pkts_recvd",
CTLFLAG_RD, &stats->tpr,
"Total Packets Received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "good_pkts_recvd",
CTLFLAG_RD, &stats->gprc,
"Good Packets Received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "bcast_pkts_recvd",
CTLFLAG_RD, &stats->bprc,
"Broadcast Packets Received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "mcast_pkts_recvd",
CTLFLAG_RD, &stats->mprc,
"Multicast Packets Received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "rx_frames_64",
CTLFLAG_RD, &stats->prc64,
"64 byte frames received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "rx_frames_65_127",
CTLFLAG_RD, &stats->prc127,
"65-127 byte frames received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "rx_frames_128_255",
CTLFLAG_RD, &stats->prc255,
"128-255 byte frames received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "rx_frames_256_511",
CTLFLAG_RD, &stats->prc511,
"256-511 byte frames received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "rx_frames_512_1023",
CTLFLAG_RD, &stats->prc1023,
"512-1023 byte frames received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "rx_frames_1024_1522",
CTLFLAG_RD, &stats->prc1522,
"1024-1522 byte frames received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "good_octets_recvd",
CTLFLAG_RD, &stats->gorc,
"Good Octets Received");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "total_octets_recvd",
CTLFLAG_RD, &stats->tor,
"Total Octets Received");
/* Packet Transmission Stats */
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "good_octets_txd",
CTLFLAG_RD, &stats->gotc,
"Good Octets Transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "total_octets_txd",
CTLFLAG_RD, &stats->tot,
"Total Octets Transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "total_pkts_txd",
CTLFLAG_RD, &stats->tpt,
"Total Packets Transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "good_pkts_txd",
CTLFLAG_RD, &stats->gptc,
"Good Packets Transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "bcast_pkts_txd",
CTLFLAG_RD, &stats->bptc,
"Broadcast Packets Transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "mcast_pkts_txd",
CTLFLAG_RD, &stats->mptc,
"Multicast Packets Transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tx_frames_64",
CTLFLAG_RD, &stats->ptc64,
"64 byte frames transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tx_frames_65_127",
CTLFLAG_RD, &stats->ptc127,
"65-127 byte frames transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tx_frames_128_255",
CTLFLAG_RD, &stats->ptc255,
"128-255 byte frames transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tx_frames_256_511",
CTLFLAG_RD, &stats->ptc511,
"256-511 byte frames transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tx_frames_512_1023",
CTLFLAG_RD, &stats->ptc1023,
"512-1023 byte frames transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tx_frames_1024_1522",
CTLFLAG_RD, &stats->ptc1522,
"1024-1522 byte frames transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tso_txd",
CTLFLAG_RD, &stats->tsctc,
"TSO Contexts Transmitted");
SYSCTL_ADD_QUAD(ctx, stat_list, OID_AUTO, "tso_ctx_fail",
CTLFLAG_RD, &stats->tsctfc,
"TSO Contexts Failed");
/* Interrupt Stats */
int_node = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "interrupts",
CTLFLAG_RD, NULL, "Interrupt Statistics");
int_list = SYSCTL_CHILDREN(int_node);
SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "asserts",
CTLFLAG_RD, &stats->iac,
"Interrupt Assertion Count");
SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "rx_pkt_timer",
CTLFLAG_RD, &stats->icrxptc,
"Interrupt Cause Rx Pkt Timer Expire Count");
SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "rx_abs_timer",
CTLFLAG_RD, &stats->icrxatc,
"Interrupt Cause Rx Abs Timer Expire Count");
SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "tx_pkt_timer",
CTLFLAG_RD, &stats->ictxptc,
"Interrupt Cause Tx Pkt Timer Expire Count");
SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "tx_abs_timer",
CTLFLAG_RD, &stats->ictxatc,
"Interrupt Cause Tx Abs Timer Expire Count");
SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "tx_queue_empty",
CTLFLAG_RD, &stats->ictxqec,
"Interrupt Cause Tx Queue Empty Count");
SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "tx_queue_min_thresh",
CTLFLAG_RD, &stats->ictxqmtc,
"Interrupt Cause Tx Queue Min Thresh Count");
SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "rx_desc_min_thresh",
CTLFLAG_RD, &stats->icrxdmtc,
"Interrupt Cause Rx Desc Min Thresh Count");
SYSCTL_ADD_QUAD(ctx, int_list, OID_AUTO, "rx_overrun",
CTLFLAG_RD, &stats->icrxoc,
"Interrupt Cause Receiver Overrun Count");
/* Host to Card Stats */
host_node = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "host",
CTLFLAG_RD, NULL,
"Host to Card Statistics");
host_list = SYSCTL_CHILDREN(host_node);
SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "breaker_tx_pkt",
CTLFLAG_RD, &stats->cbtmpc,
"Circuit Breaker Tx Packet Count");
SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "host_tx_pkt_discard",
CTLFLAG_RD, &stats->htdpmc,
"Host Transmit Discarded Packets");
SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "rx_pkt",
CTLFLAG_RD, &stats->rpthc,
"Rx Packets To Host");
SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "breaker_rx_pkts",
CTLFLAG_RD, &stats->cbrmpc,
"Circuit Breaker Rx Packet Count");
SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "breaker_rx_pkt_drop",
CTLFLAG_RD, &stats->cbrdpc,
"Circuit Breaker Rx Dropped Count");
SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "tx_good_pkt",
CTLFLAG_RD, &stats->hgptc,
"Host Good Packets Tx Count");
SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "breaker_tx_pkt_drop",
CTLFLAG_RD, &stats->htcbdpc,
"Host Tx Circuit Breaker Dropped Count");
SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "rx_good_bytes",
CTLFLAG_RD, &stats->hgorc,
"Host Good Octets Received Count");
SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "tx_good_bytes",
CTLFLAG_RD, &stats->hgotc,
"Host Good Octets Transmit Count");
SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "length_errors",
CTLFLAG_RD, &stats->lenerrs,
"Length Errors");
SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "serdes_violation_pkt",
CTLFLAG_RD, &stats->scvpc,
"SerDes/SGMII Code Violation Pkt Count");
SYSCTL_ADD_QUAD(ctx, host_list, OID_AUTO, "header_redir_missed",
CTLFLAG_RD, &stats->hrmpc,
"Header Redirection Missed Packet Count");
}
/**********************************************************************
*
* This routine provides a way to dump out the adapter EEPROM,
* often a useful debug/service tool. It only dumps the first
* 32 words; the data that matters lies within that extent.
*
**********************************************************************/
static int
igb_sysctl_nvm_info(SYSCTL_HANDLER_ARGS)
{
struct adapter *adapter;
int error;
int result;
result = -1;
error = sysctl_handle_int(oidp, &result, 0, req);
if (error || !req->newptr)
return (error);
/*
* This value will cause a hex dump of the
* first 32 16-bit words of the EEPROM to
* the screen.
*/
if (result == 1) {
adapter = (struct adapter *)arg1;
igb_print_nvm_info(adapter);
}
return (error);
}
static void
igb_print_nvm_info(struct adapter *adapter)
{
u16 eeprom_data;
int i, j, row = 0;
/* It's a bit crude, but it gets the job done */
printf("\nInterface EEPROM Dump:\n");
printf("Offset\n0x0000 ");
for (i = 0, j = 0; i < 32; i++, j++) {
if (j == 8) { /* Make the offset block */
j = 0; ++row;
printf("\n0x00%x0 ",row);
}
e1000_read_nvm(&adapter->hw, i, 1, &eeprom_data);
printf("%04x ", eeprom_data);
}
printf("\n");
}
static void
igb_set_sysctl_value(struct adapter *adapter, const char *name,
const char *description, int *limit, int value)
{
*limit = value;
SYSCTL_ADD_INT(device_get_sysctl_ctx(adapter->dev),
SYSCTL_CHILDREN(device_get_sysctl_tree(adapter->dev)),
OID_AUTO, name, CTLFLAG_RW, limit, value, description);
}
/*
** Set flow control using sysctl:
** Flow control values:
** 0 - off
** 1 - rx pause
** 2 - tx pause
** 3 - full
*/
static int
igb_set_flowcntl(SYSCTL_HANDLER_ARGS)
{
int error;
static int input = 3; /* default is full */
struct adapter *adapter = (struct adapter *) arg1;
error = sysctl_handle_int(oidp, &input, 0, req);
if ((error) || (req->newptr == NULL))
return (error);
switch (input) {
case e1000_fc_rx_pause:
case e1000_fc_tx_pause:
case e1000_fc_full:
case e1000_fc_none:
adapter->hw.fc.requested_mode = input;
adapter->fc = input;
break;
default:
/* Do nothing */
return (error);
}
adapter->hw.fc.current_mode = adapter->hw.fc.requested_mode;
e1000_force_mac_fc(&adapter->hw);
/* XXX TODO: update DROP_EN on each RX queue if appropriate */
return (error);
}
/*
** Manage DMA Coalesce:
** Control values:
** 0/1 - off/on
** Legal timer values are:
** 250,500,1000-10000 in thousands
*/
static int
igb_sysctl_dmac(SYSCTL_HANDLER_ARGS)
{
struct adapter *adapter = (struct adapter *) arg1;
int error;
error = sysctl_handle_int(oidp, &adapter->dmac, 0, req);
if ((error) || (req->newptr == NULL))
return (error);
switch (adapter->dmac) {
case 0:
/* Disabling */
break;
case 1: /* Just enable and use default */
adapter->dmac = 1000;
break;
case 250:
case 500:
case 1000:
case 2000:
case 3000:
case 4000:
case 5000:
case 6000:
case 7000:
case 8000:
case 9000:
case 10000:
/* Legal values - allow */
break;
default:
/* Do nothing, illegal value */
adapter->dmac = 0;
return (EINVAL);
}
/* Reinit the interface */
igb_init(adapter);
return (error);
}
/*
** Manage Energy Efficient Ethernet:
** Control values:
** 0/1 - enabled/disabled
*/
static int
igb_sysctl_eee(SYSCTL_HANDLER_ARGS)
{
struct adapter *adapter = (struct adapter *) arg1;
int error, value;
value = adapter->hw.dev_spec._82575.eee_disable;
error = sysctl_handle_int(oidp, &value, 0, req);
if (error || req->newptr == NULL)
return (error);
IGB_CORE_LOCK(adapter);
adapter->hw.dev_spec._82575.eee_disable = (value != 0);
igb_init_locked(adapter);
IGB_CORE_UNLOCK(adapter);
return (0);
}
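The value handling in igb_sysctl_dmac() above can be read as a small normalization step: 0 disables, 1 selects a default, and only 250, 500, and multiples of 1000 up to 10000 are accepted. The following is a minimal userland sketch of that rule, not the driver's code; the names dmac_normalize and DMAC_DEFAULT_TIMER are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical name for the default applied when the caller writes 1. */
#define DMAC_DEFAULT_TIMER 1000

/*
 * Map a requested DMA coalescing value to the value that would be stored.
 * Returns 0 and writes *stored on success, or EINVAL for an unsupported
 * request, in which case *stored is reset to 0 (disabled).
 */
static int
dmac_normalize(int requested, int *stored)
{
	assert(stored != NULL);

	if (requested == 0) {			/* disable */
		*stored = 0;
		return (0);
	}
	if (requested == 1) {			/* enable, use the default */
		*stored = DMAC_DEFAULT_TIMER;
		return (0);
	}
	if (requested == 250 || requested == 500 ||
	    (requested >= 1000 && requested <= 10000 &&
	    requested % 1000 == 0)) {		/* legal timer values */
		*stored = requested;
		return (0);
	}
	*stored = 0;				/* illegal value: fall back to off */
	return (EINVAL);
}
```

Expressing the legal set as a range check with a modulus is only one way to write it; the switch in the handler lists the same values explicitly, which is easier to audit against a datasheet.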
Index: projects/vnet/sys/dev/gpio/gpiobus.c
===================================================================
--- projects/vnet/sys/dev/gpio/gpiobus.c (revision 301546)
+++ projects/vnet/sys/dev/gpio/gpiobus.c (revision 301547)
@@ -1,849 +1,874 @@
/*-
* Copyright (c) 2009 Oleksandr Tymoshenko
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include
__FBSDID("$FreeBSD$");
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include "gpiobus_if.h"
#undef GPIOBUS_DEBUG
#ifdef GPIOBUS_DEBUG
#define dprintf printf
#else
#define dprintf(x, arg...)
#endif
static void gpiobus_print_pins(struct gpiobus_ivar *, char *, size_t);
static int gpiobus_parse_pins(struct gpiobus_softc *, device_t, int);
static int gpiobus_probe(device_t);
static int gpiobus_attach(device_t);
static int gpiobus_detach(device_t);
static int gpiobus_suspend(device_t);
static int gpiobus_resume(device_t);
static void gpiobus_probe_nomatch(device_t, device_t);
static int gpiobus_print_child(device_t, device_t);
static int gpiobus_child_location_str(device_t, device_t, char *, size_t);
static int gpiobus_child_pnpinfo_str(device_t, device_t, char *, size_t);
static device_t gpiobus_add_child(device_t, u_int, const char *, int);
static void gpiobus_hinted_child(device_t, const char *, int);
/*
* GPIOBUS interface
*/
static int gpiobus_acquire_bus(device_t, device_t, int);
static void gpiobus_release_bus(device_t, device_t);
static int gpiobus_pin_setflags(device_t, device_t, uint32_t, uint32_t);
static int gpiobus_pin_getflags(device_t, device_t, uint32_t, uint32_t*);
static int gpiobus_pin_getcaps(device_t, device_t, uint32_t, uint32_t*);
static int gpiobus_pin_set(device_t, device_t, uint32_t, unsigned int);
static int gpiobus_pin_get(device_t, device_t, uint32_t, unsigned int*);
static int gpiobus_pin_toggle(device_t, device_t, uint32_t);
/*
* XXX -> Move me to a better place - gpio_subr.c?
* Also, this function must be changed when the interrupt configuration
* data is moved into struct resource.
*/
#ifdef INTRNG
+static void
+gpio_destruct_map_data(struct intr_map_data *map_data)
+{
+
+ KASSERT(map_data->type == INTR_MAP_DATA_GPIO,
+ ("%s: bad map_data type %d", __func__, map_data->type));
+
+ free(map_data, M_DEVBUF);
+}
+
struct resource *
gpio_alloc_intr_resource(device_t consumer_dev, int *rid, u_int alloc_flags,
gpio_pin_t pin, uint32_t intr_mode)
{
- u_int irqnum;
+ int rv;
+ u_int irq;
+ struct intr_map_data_gpio *gpio_data;
+ struct resource *res;
- /*
- * Allocate new fictitious interrupt number and store configuration
- * into it.
- */
- irqnum = intr_gpio_map_irq(pin->dev, pin->pin, pin->flags, intr_mode);
- if (irqnum == INTR_IRQ_INVALID)
+ gpio_data = malloc(sizeof(*gpio_data), M_DEVBUF, M_WAITOK | M_ZERO);
+ gpio_data->hdr.type = INTR_MAP_DATA_GPIO;
+ gpio_data->hdr.destruct = gpio_destruct_map_data;
+ gpio_data->gpio_pin_num = pin->pin;
+ gpio_data->gpio_pin_flags = pin->flags;
+ gpio_data->gpio_intr_mode = intr_mode;
+
+ rv = intr_map_irq(pin->dev, 0, (struct intr_map_data *)gpio_data,
+ &irq);
+ if (rv != 0) {
+ gpio_destruct_map_data((struct intr_map_data *)gpio_data);
return (NULL);
+ }
- return (bus_alloc_resource(consumer_dev, SYS_RES_IRQ, rid,
- irqnum, irqnum, 1, alloc_flags));
+ res = bus_alloc_resource(consumer_dev, SYS_RES_IRQ, rid, irq, irq, 1,
+ alloc_flags);
+ if (res == NULL) {
+ gpio_destruct_map_data((struct intr_map_data *)gpio_data);
+ return (NULL);
+ }
+ rman_set_virtual(res, gpio_data);
+ return (res);
}
#else
struct resource *
gpio_alloc_intr_resource(device_t consumer_dev, int *rid, u_int alloc_flags,
gpio_pin_t pin, uint32_t intr_mode)
{
return (NULL);
}
#endif
int
gpio_check_flags(uint32_t caps, uint32_t flags)
{
/* Check for unwanted flags. */
if ((flags & caps) == 0 || (flags & caps) != flags)
return (EINVAL);
/* Cannot mix input/output together. */
if (flags & GPIO_PIN_INPUT && flags & GPIO_PIN_OUTPUT)
return (EINVAL);
/* Cannot mix pull-up/pull-down together. */
if (flags & GPIO_PIN_PULLUP && flags & GPIO_PIN_PULLDOWN)
return (EINVAL);
return (0);
}
static void
gpiobus_print_pins(struct gpiobus_ivar *devi, char *buf, size_t buflen)
{
char tmp[128];
int i, range_start, range_stop, need_coma;
if (devi->npins == 0)
return;
need_coma = 0;
range_start = range_stop = devi->pins[0];
for (i = 1; i < devi->npins; i++) {
if (devi->pins[i] != (range_stop + 1)) {
if (need_coma)
strlcat(buf, ",", buflen);
memset(tmp, 0, sizeof(tmp));
if (range_start != range_stop)
snprintf(tmp, sizeof(tmp) - 1, "%d-%d",
range_start, range_stop);
else
snprintf(tmp, sizeof(tmp) - 1, "%d",
range_start);
strlcat(buf, tmp, buflen);
range_start = range_stop = devi->pins[i];
need_coma = 1;
}
else
range_stop++;
}
if (need_coma)
strlcat(buf, ",", buflen);
memset(tmp, 0, sizeof(tmp));
if (range_start != range_stop)
snprintf(tmp, sizeof(tmp) - 1, "%d-%d",
range_start, range_stop);
else
snprintf(tmp, sizeof(tmp) - 1, "%d",
range_start);
strlcat(buf, tmp, buflen);
}
device_t
gpiobus_attach_bus(device_t dev)
{
device_t busdev;
busdev = device_add_child(dev, "gpiobus", -1);
if (busdev == NULL)
return (NULL);
if (device_add_child(dev, "gpioc", -1) == NULL) {
device_delete_child(dev, busdev);
return (NULL);
}
#ifdef FDT
ofw_gpiobus_register_provider(dev);
#endif
bus_generic_attach(dev);
return (busdev);
}
int
gpiobus_detach_bus(device_t dev)
{
int err;
#ifdef FDT
ofw_gpiobus_unregister_provider(dev);
#endif
err = bus_generic_detach(dev);
if (err != 0)
return (err);
return (device_delete_children(dev));
}
int
gpiobus_init_softc(device_t dev)
{
struct gpiobus_softc *sc;
sc = GPIOBUS_SOFTC(dev);
sc->sc_busdev = dev;
sc->sc_dev = device_get_parent(dev);
sc->sc_intr_rman.rm_type = RMAN_ARRAY;
sc->sc_intr_rman.rm_descr = "GPIO Interrupts";
if (rman_init(&sc->sc_intr_rman) != 0 ||
rman_manage_region(&sc->sc_intr_rman, 0, ~0) != 0)
panic("%s: failed to set up rman.", __func__);
if (GPIO_PIN_MAX(sc->sc_dev, &sc->sc_npins) != 0)
return (ENXIO);
KASSERT(sc->sc_npins >= 0, ("GPIO device with no pins"));
/* Pins = GPIO_PIN_MAX() + 1 */
sc->sc_npins++;
sc->sc_pins = malloc(sizeof(*sc->sc_pins) * sc->sc_npins, M_DEVBUF,
M_NOWAIT | M_ZERO);
if (sc->sc_pins == NULL)
return (ENOMEM);
/* Initialize the bus lock. */
GPIOBUS_LOCK_INIT(sc);
return (0);
}
int
gpiobus_alloc_ivars(struct gpiobus_ivar *devi)
{
/* Allocate pins and flags memory. */
devi->pins = malloc(sizeof(uint32_t) * devi->npins, M_DEVBUF,
M_NOWAIT | M_ZERO);
if (devi->pins == NULL)
return (ENOMEM);
devi->flags = malloc(sizeof(uint32_t) * devi->npins, M_DEVBUF,
M_NOWAIT | M_ZERO);
if (devi->flags == NULL) {
free(devi->pins, M_DEVBUF);
return (ENOMEM);
}
return (0);
}
void
gpiobus_free_ivars(struct gpiobus_ivar *devi)
{
if (devi->flags) {
free(devi->flags, M_DEVBUF);
devi->flags = NULL;
}
if (devi->pins) {
free(devi->pins, M_DEVBUF);
devi->pins = NULL;
}
}
int
gpiobus_acquire_pin(device_t bus, uint32_t pin)
{
struct gpiobus_softc *sc;
sc = device_get_softc(bus);
/* Consistency check. */
if (pin >= sc->sc_npins) {
device_printf(bus,
"invalid pin %d, max: %d\n", pin, sc->sc_npins - 1);
return (-1);
}
/* Mark pin as mapped and give warning if it's already mapped. */
if (sc->sc_pins[pin].mapped) {
device_printf(bus, "warning: pin %d is already mapped\n", pin);
return (-1);
}
sc->sc_pins[pin].mapped = 1;
return (0);
}
/* Release mapped pin */
int
gpiobus_release_pin(device_t bus, uint32_t pin)
{
struct gpiobus_softc *sc;
sc = device_get_softc(bus);
/* Consistency check. */
if (pin >= sc->sc_npins) {
device_printf(bus,
"gpiobus_release_pin: invalid pin %d, max=%d\n",
pin, sc->sc_npins - 1);
return (-1);
}
if (!sc->sc_pins[pin].mapped) {
device_printf(bus, "gpiobus_release_pin: pin %d is not mapped\n", pin);
return (-1);
}
sc->sc_pins[pin].mapped = 0;
return (0);
}
static int
gpiobus_parse_pins(struct gpiobus_softc *sc, device_t child, int mask)
{
struct gpiobus_ivar *devi = GPIOBUS_IVAR(child);
int i, npins;
npins = 0;
for (i = 0; i < 32; i++) {
if (mask & (1 << i))
npins++;
}
if (npins == 0) {
device_printf(child, "empty pin mask\n");
return (EINVAL);
}
devi->npins = npins;
if (gpiobus_alloc_ivars(devi) != 0) {
device_printf(child, "cannot allocate device ivars\n");
return (EINVAL);
}
npins = 0;
for (i = 0; i < 32; i++) {
if ((mask & (1 << i)) == 0)
continue;
/* Reserve the GPIO pin. */
if (gpiobus_acquire_pin(sc->sc_busdev, i) != 0) {
gpiobus_free_ivars(devi);
return (EINVAL);
}
devi->pins[npins++] = i;
/* Use the child name as pin name. */
GPIOBUS_PIN_SETNAME(sc->sc_busdev, i,
device_get_nameunit(child));
}
return (0);
}
static int
gpiobus_probe(device_t dev)
{
device_set_desc(dev, "GPIO bus");
return (BUS_PROBE_GENERIC);
}
static int
gpiobus_attach(device_t dev)
{
int err;
err = gpiobus_init_softc(dev);
if (err != 0)
return (err);
/*
* Get parent's pins and mark them as unmapped
*/
bus_generic_probe(dev);
bus_enumerate_hinted_children(dev);
return (bus_generic_attach(dev));
}
/*
* Since this is not a self-enumerating bus, and since we always add
* children in attach, we have to always delete children here.
*/
static int
gpiobus_detach(device_t dev)
{
struct gpiobus_softc *sc;
struct gpiobus_ivar *devi;
device_t *devlist;
int i, err, ndevs;
sc = GPIOBUS_SOFTC(dev);
KASSERT(mtx_initialized(&sc->sc_mtx),
("gpiobus mutex not initialized"));
GPIOBUS_LOCK_DESTROY(sc);
if ((err = bus_generic_detach(dev)) != 0)
return (err);
if ((err = device_get_children(dev, &devlist, &ndevs)) != 0)
return (err);
for (i = 0; i < ndevs; i++) {
devi = GPIOBUS_IVAR(devlist[i]);
gpiobus_free_ivars(devi);
resource_list_free(&devi->rl);
free(devi, M_DEVBUF);
device_delete_child(dev, devlist[i]);
}
free(devlist, M_TEMP);
rman_fini(&sc->sc_intr_rman);
if (sc->sc_pins) {
for (i = 0; i < sc->sc_npins; i++) {
if (sc->sc_pins[i].name != NULL)
free(sc->sc_pins[i].name, M_DEVBUF);
sc->sc_pins[i].name = NULL;
}
free(sc->sc_pins, M_DEVBUF);
sc->sc_pins = NULL;
}
return (0);
}
static int
gpiobus_suspend(device_t dev)
{
return (bus_generic_suspend(dev));
}
static int
gpiobus_resume(device_t dev)
{
return (bus_generic_resume(dev));
}
static void
gpiobus_probe_nomatch(device_t dev, device_t child)
{
char pins[128];
struct gpiobus_ivar *devi;
devi = GPIOBUS_IVAR(child);
memset(pins, 0, sizeof(pins));
gpiobus_print_pins(devi, pins, sizeof(pins));
if (devi->npins > 1)
device_printf(dev, " at pins %s", pins);
else
device_printf(dev, " at pin %s", pins);
resource_list_print_type(&devi->rl, "irq", SYS_RES_IRQ, "%jd");
printf("\n");
}
static int
gpiobus_print_child(device_t dev, device_t child)
{
char pins[128];
int retval = 0;
struct gpiobus_ivar *devi;
devi = GPIOBUS_IVAR(child);
memset(pins, 0, sizeof(pins));
retval += bus_print_child_header(dev, child);
if (devi->npins > 0) {
if (devi->npins > 1)
retval += printf(" at pins ");
else
retval += printf(" at pin ");
gpiobus_print_pins(devi, pins, sizeof(pins));
retval += printf("%s", pins);
}
resource_list_print_type(&devi->rl, "irq", SYS_RES_IRQ, "%jd");
retval += bus_print_child_footer(dev, child);
return (retval);
}
static int
gpiobus_child_location_str(device_t bus, device_t child, char *buf,
size_t buflen)
{
struct gpiobus_ivar *devi;
devi = GPIOBUS_IVAR(child);
if (devi->npins > 1)
strlcpy(buf, "pins=", buflen);
else
strlcpy(buf, "pin=", buflen);
gpiobus_print_pins(devi, buf, buflen);
return (0);
}
static int
gpiobus_child_pnpinfo_str(device_t bus, device_t child, char *buf,
size_t buflen)
{
*buf = '\0';
return (0);
}
static device_t
gpiobus_add_child(device_t dev, u_int order, const char *name, int unit)
{
device_t child;
struct gpiobus_ivar *devi;
child = device_add_child_ordered(dev, order, name, unit);
if (child == NULL)
return (child);
devi = malloc(sizeof(struct gpiobus_ivar), M_DEVBUF, M_NOWAIT | M_ZERO);
if (devi == NULL) {
device_delete_child(dev, child);
return (NULL);
}
resource_list_init(&devi->rl);
device_set_ivars(child, devi);
return (child);
}
static void
gpiobus_hinted_child(device_t bus, const char *dname, int dunit)
{
struct gpiobus_softc *sc = GPIOBUS_SOFTC(bus);
struct gpiobus_ivar *devi;
device_t child;
int irq, pins;
child = BUS_ADD_CHILD(bus, 0, dname, dunit);
devi = GPIOBUS_IVAR(child);
resource_int_value(dname, dunit, "pins", &pins);
if (gpiobus_parse_pins(sc, child, pins)) {
resource_list_free(&devi->rl);
free(devi, M_DEVBUF);
device_delete_child(bus, child);
return;
}
if (resource_int_value(dname, dunit, "irq", &irq) == 0) {
if (bus_set_resource(child, SYS_RES_IRQ, 0, irq, 1) != 0)
device_printf(bus,
"warning: bus_set_resource() failed\n");
}
}
static int
gpiobus_set_resource(device_t dev, device_t child, int type, int rid,
rman_res_t start, rman_res_t count)
{
struct gpiobus_ivar *devi;
struct resource_list_entry *rle;
dprintf("%s: entry (%p, %p, %d, %d, %p, %ld)\n",
__func__, dev, child, type, rid, (void *)(intptr_t)start, count);
devi = GPIOBUS_IVAR(child);
rle = resource_list_add(&devi->rl, type, rid, start,
start + count - 1, count);
if (rle == NULL)
return (ENXIO);
return (0);
}
static struct resource *
gpiobus_alloc_resource(device_t bus, device_t child, int type, int *rid,
rman_res_t start, rman_res_t end, rman_res_t count, u_int flags)
{
struct gpiobus_softc *sc;
struct resource *rv;
struct resource_list *rl;
struct resource_list_entry *rle;
int isdefault;
if (type != SYS_RES_IRQ)
return (NULL);
isdefault = (RMAN_IS_DEFAULT_RANGE(start, end) && count == 1);
rle = NULL;
if (isdefault) {
rl = BUS_GET_RESOURCE_LIST(bus, child);
if (rl == NULL)
return (NULL);
rle = resource_list_find(rl, type, *rid);
if (rle == NULL)
return (NULL);
if (rle->res != NULL)
panic("%s: resource entry is busy", __func__);
start = rle->start;
count = rle->count;
end = rle->end;
}
sc = device_get_softc(bus);
rv = rman_reserve_resource(&sc->sc_intr_rman, start, end, count, flags,
child);
if (rv == NULL)
return (NULL);
rman_set_rid(rv, *rid);
if ((flags & RF_ACTIVE) != 0 &&
bus_activate_resource(child, type, *rid, rv) != 0) {
rman_release_resource(rv);
return (NULL);
}
return (rv);
}
static int
gpiobus_release_resource(device_t bus __unused, device_t child, int type,
int rid, struct resource *r)
{
int error;
if (rman_get_flags(r) & RF_ACTIVE) {
error = bus_deactivate_resource(child, type, rid, r);
if (error)
return (error);
}
return (rman_release_resource(r));
}
static struct resource_list *
gpiobus_get_resource_list(device_t bus __unused, device_t child)
{
struct gpiobus_ivar *ivar;
ivar = GPIOBUS_IVAR(child);
return (&ivar->rl);
}
static int
gpiobus_acquire_bus(device_t busdev, device_t child, int how)
{
struct gpiobus_softc *sc;
sc = device_get_softc(busdev);
GPIOBUS_ASSERT_UNLOCKED(sc);
GPIOBUS_LOCK(sc);
if (sc->sc_owner != NULL) {
if (sc->sc_owner == child)
panic("%s: %s still owns the bus.",
device_get_nameunit(busdev),
device_get_nameunit(child));
if (how == GPIOBUS_DONTWAIT) {
GPIOBUS_UNLOCK(sc);
return (EWOULDBLOCK);
}
while (sc->sc_owner != NULL)
mtx_sleep(sc, &sc->sc_mtx, 0, "gpiobuswait", 0);
}
sc->sc_owner = child;
GPIOBUS_UNLOCK(sc);
return (0);
}
static void
gpiobus_release_bus(device_t busdev, device_t child)
{
struct gpiobus_softc *sc;
sc = device_get_softc(busdev);
GPIOBUS_ASSERT_UNLOCKED(sc);
GPIOBUS_LOCK(sc);
if (sc->sc_owner == NULL)
panic("%s: %s releasing unowned bus.",
device_get_nameunit(busdev),
device_get_nameunit(child));
if (sc->sc_owner != child)
panic("%s: %s trying to release bus owned by %s",
device_get_nameunit(busdev),
device_get_nameunit(child),
device_get_nameunit(sc->sc_owner));
sc->sc_owner = NULL;
wakeup(sc);
GPIOBUS_UNLOCK(sc);
}
static int
gpiobus_pin_setflags(device_t dev, device_t child, uint32_t pin,
uint32_t flags)
{
struct gpiobus_softc *sc = GPIOBUS_SOFTC(dev);
struct gpiobus_ivar *devi = GPIOBUS_IVAR(child);
uint32_t caps;
if (pin >= devi->npins)
return (EINVAL);
if (GPIO_PIN_GETCAPS(sc->sc_dev, devi->pins[pin], &caps) != 0)
return (EINVAL);
if (gpio_check_flags(caps, flags) != 0)
return (EINVAL);
return (GPIO_PIN_SETFLAGS(sc->sc_dev, devi->pins[pin], flags));
}
static int
gpiobus_pin_getflags(device_t dev, device_t child, uint32_t pin,
uint32_t *flags)
{
struct gpiobus_softc *sc = GPIOBUS_SOFTC(dev);
struct gpiobus_ivar *devi = GPIOBUS_IVAR(child);
if (pin >= devi->npins)
return (EINVAL);
return (GPIO_PIN_GETFLAGS(sc->sc_dev, devi->pins[pin], flags));
}
static int
gpiobus_pin_getcaps(device_t dev, device_t child, uint32_t pin,
uint32_t *caps)
{
struct gpiobus_softc *sc = GPIOBUS_SOFTC(dev);
struct gpiobus_ivar *devi = GPIOBUS_IVAR(child);
if (pin >= devi->npins)
return (EINVAL);
return (GPIO_PIN_GETCAPS(sc->sc_dev, devi->pins[pin], caps));
}
static int
gpiobus_pin_set(device_t dev, device_t child, uint32_t pin,
unsigned int value)
{
struct gpiobus_softc *sc = GPIOBUS_SOFTC(dev);
struct gpiobus_ivar *devi = GPIOBUS_IVAR(child);
if (pin >= devi->npins)
return (EINVAL);
return (GPIO_PIN_SET(sc->sc_dev, devi->pins[pin], value));
}
static int
gpiobus_pin_get(device_t dev, device_t child, uint32_t pin,
unsigned int *value)
{
struct gpiobus_softc *sc = GPIOBUS_SOFTC(dev);
struct gpiobus_ivar *devi = GPIOBUS_IVAR(child);
if (pin >= devi->npins)
return (EINVAL);
return (GPIO_PIN_GET(sc->sc_dev, devi->pins[pin], value));
}
static int
gpiobus_pin_toggle(device_t dev, device_t child, uint32_t pin)
{
struct gpiobus_softc *sc = GPIOBUS_SOFTC(dev);
struct gpiobus_ivar *devi = GPIOBUS_IVAR(child);
if (pin >= devi->npins)
return (EINVAL);
return (GPIO_PIN_TOGGLE(sc->sc_dev, devi->pins[pin]));
}
static int
gpiobus_pin_getname(device_t dev, uint32_t pin, char *name)
{
struct gpiobus_softc *sc;
sc = GPIOBUS_SOFTC(dev);
if (pin >= sc->sc_npins)
return (EINVAL);
/* Do we have a name for this pin? */
if (sc->sc_pins[pin].name != NULL) {
memcpy(name, sc->sc_pins[pin].name, GPIOMAXNAME);
return (0);
}
/* Return the default pin name. */
return (GPIO_PIN_GETNAME(device_get_parent(dev), pin, name));
}
static int
gpiobus_pin_setname(device_t dev, uint32_t pin, const char *name)
{
struct gpiobus_softc *sc;
sc = GPIOBUS_SOFTC(dev);
if (pin >= sc->sc_npins)
return (EINVAL);
if (name == NULL)
return (EINVAL);
/* Save the pin name. */
if (sc->sc_pins[pin].name == NULL)
sc->sc_pins[pin].name = malloc(GPIOMAXNAME, M_DEVBUF,
M_WAITOK | M_ZERO);
strlcpy(sc->sc_pins[pin].name, name, GPIOMAXNAME);
return (0);
}
static device_method_t gpiobus_methods[] = {
/* Device interface */
DEVMETHOD(device_probe, gpiobus_probe),
DEVMETHOD(device_attach, gpiobus_attach),
DEVMETHOD(device_detach, gpiobus_detach),
DEVMETHOD(device_shutdown, bus_generic_shutdown),
DEVMETHOD(device_suspend, gpiobus_suspend),
DEVMETHOD(device_resume, gpiobus_resume),
/* Bus interface */
DEVMETHOD(bus_setup_intr, bus_generic_setup_intr),
DEVMETHOD(bus_config_intr, bus_generic_config_intr),
DEVMETHOD(bus_teardown_intr, bus_generic_teardown_intr),
DEVMETHOD(bus_set_resource, gpiobus_set_resource),
DEVMETHOD(bus_alloc_resource, gpiobus_alloc_resource),
DEVMETHOD(bus_release_resource, gpiobus_release_resource),
DEVMETHOD(bus_activate_resource, bus_generic_activate_resource),
DEVMETHOD(bus_deactivate_resource, bus_generic_deactivate_resource),
DEVMETHOD(bus_get_resource_list, gpiobus_get_resource_list),
DEVMETHOD(bus_add_child, gpiobus_add_child),
DEVMETHOD(bus_probe_nomatch, gpiobus_probe_nomatch),
DEVMETHOD(bus_print_child, gpiobus_print_child),
DEVMETHOD(bus_child_pnpinfo_str, gpiobus_child_pnpinfo_str),
DEVMETHOD(bus_child_location_str, gpiobus_child_location_str),
DEVMETHOD(bus_hinted_child, gpiobus_hinted_child),
/* GPIO protocol */
DEVMETHOD(gpiobus_acquire_bus, gpiobus_acquire_bus),
DEVMETHOD(gpiobus_release_bus, gpiobus_release_bus),
DEVMETHOD(gpiobus_pin_getflags, gpiobus_pin_getflags),
DEVMETHOD(gpiobus_pin_getcaps, gpiobus_pin_getcaps),
DEVMETHOD(gpiobus_pin_setflags, gpiobus_pin_setflags),
DEVMETHOD(gpiobus_pin_get, gpiobus_pin_get),
DEVMETHOD(gpiobus_pin_set, gpiobus_pin_set),
DEVMETHOD(gpiobus_pin_toggle, gpiobus_pin_toggle),
DEVMETHOD(gpiobus_pin_getname, gpiobus_pin_getname),
DEVMETHOD(gpiobus_pin_setname, gpiobus_pin_setname),
DEVMETHOD_END
};
driver_t gpiobus_driver = {
"gpiobus",
gpiobus_methods,
sizeof(struct gpiobus_softc)
};
devclass_t gpiobus_devclass;
DRIVER_MODULE(gpiobus, gpio, gpiobus_driver, gpiobus_devclass, 0, 0);
MODULE_VERSION(gpiobus, 1);
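The change to gpio_alloc_intr_resource() in this file relies on a common C pattern: a type-specific record embeds a generic header as its first member, and the header carries a type tag plus a destructor so that generic code can release the record without knowing its concrete layout. The following is a minimal userland sketch of that pattern, assuming nothing about the kernel interfaces; every name in it (map_hdr, map_gpio, map_gpio_create, map_gpio_destruct, map_gpio_live) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

enum map_type { MAP_DATA_GPIO = 1 };	/* hypothetical type tag */

/* Common header; it must be the first member of every concrete record. */
struct map_hdr {
	enum map_type type;
	void (*destruct)(struct map_hdr *);
};

/* GPIO-specific record, mirroring the three fields the patch stores. */
struct map_gpio {
	struct map_hdr hdr;
	unsigned int pin_num;
	unsigned int pin_flags;
	unsigned int intr_mode;
};

static int map_gpio_live;	/* counts live records, for this demo only */

/* Release a record through its header after checking the type tag. */
static void
map_gpio_destruct(struct map_hdr *hdr)
{
	assert(hdr != NULL && hdr->type == MAP_DATA_GPIO);
	map_gpio_live--;
	free(hdr);	/* hdr is the first member, so it is the allocation */
}

/* Allocate and fill a record; returns NULL if the allocation fails. */
static struct map_hdr *
map_gpio_create(unsigned int pin, unsigned int flags, unsigned int mode)
{
	struct map_gpio *g;

	g = calloc(1, sizeof(*g));
	if (g == NULL)
		return (NULL);
	g->hdr.type = MAP_DATA_GPIO;
	g->hdr.destruct = map_gpio_destruct;
	g->pin_num = pin;
	g->pin_flags = flags;
	g->intr_mode = mode;
	map_gpio_live++;
	return (&g->hdr);
}
```

A caller that fails partway through setup can invoke hdr->destruct(hdr) on every error path, which is the same shape as the two failure branches in the patched function.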
Index: projects/vnet/sys/dev/gpio/gpiobusvar.h
===================================================================
--- projects/vnet/sys/dev/gpio/gpiobusvar.h (revision 301546)
+++ projects/vnet/sys/dev/gpio/gpiobusvar.h (revision 301547)
@@ -1,144 +1,151 @@
/*-
* Copyright (c) 2009 Oleksandr Tymoshenko
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $FreeBSD$
*
*/
#ifndef __GPIOBUS_H__
#define __GPIOBUS_H__
#include "opt_platform.h"
#include
#include
#include
#ifdef FDT
#include
#include
#endif
#include "gpio_if.h"
#ifdef FDT
#define GPIOBUS_IVAR(d) (struct gpiobus_ivar *) \
&((struct ofw_gpiobus_devinfo *)device_get_ivars(d))->opd_dinfo
#else
#define GPIOBUS_IVAR(d) (struct gpiobus_ivar *) device_get_ivars(d)
#endif
#define GPIOBUS_SOFTC(d) (struct gpiobus_softc *) device_get_softc(d)
#define GPIOBUS_LOCK(_sc) mtx_lock(&(_sc)->sc_mtx)
#define GPIOBUS_UNLOCK(_sc) mtx_unlock(&(_sc)->sc_mtx)
#define GPIOBUS_LOCK_INIT(_sc) mtx_init(&_sc->sc_mtx, \
device_get_nameunit(_sc->sc_dev), "gpiobus", MTX_DEF)
#define GPIOBUS_LOCK_DESTROY(_sc) mtx_destroy(&_sc->sc_mtx)
#define GPIOBUS_ASSERT_LOCKED(_sc) mtx_assert(&_sc->sc_mtx, MA_OWNED)
#define GPIOBUS_ASSERT_UNLOCKED(_sc) mtx_assert(&_sc->sc_mtx, MA_NOTOWNED)
#define GPIOBUS_WAIT 1
#define GPIOBUS_DONTWAIT 2
/* Use default interrupt mode - for gpio_alloc_intr_resource */
#define GPIO_INTR_CONFORM GPIO_INTR_NONE
struct gpiobus_pin_data
{
int mapped; /* pin is mapped/reserved. */
char *name; /* pin name. */
};
+struct intr_map_data_gpio {
+ struct intr_map_data hdr;
+ u_int gpio_pin_num;
+ u_int gpio_pin_flags;
+ u_int gpio_intr_mode;
+};
+
struct gpiobus_softc
{
struct mtx sc_mtx; /* bus mutex */
struct rman sc_intr_rman; /* isr resources */
device_t sc_busdev; /* bus device */
device_t sc_owner; /* bus owner */
device_t sc_dev; /* driver device */
int sc_npins; /* total pins on bus */
struct gpiobus_pin_data *sc_pins; /* pin data */
};
struct gpiobus_pin
{
device_t dev; /* gpio device */
uint32_t flags; /* pin flags */
uint32_t pin; /* pin number */
};
typedef struct gpiobus_pin *gpio_pin_t;
struct gpiobus_ivar
{
struct resource_list rl; /* isr resource list */
uint32_t npins; /* pins total */
uint32_t *flags; /* pins flags */
uint32_t *pins; /* pins map */
};
#ifdef FDT
struct ofw_gpiobus_devinfo {
struct gpiobus_ivar opd_dinfo;
struct ofw_bus_devinfo opd_obdinfo;
};
static __inline int
gpio_map_gpios(device_t bus, phandle_t dev, phandle_t gparent, int gcells,
pcell_t *gpios, uint32_t *pin, uint32_t *flags)
{
return (GPIO_MAP_GPIOS(bus, dev, gparent, gcells, gpios, pin, flags));
}
device_t ofw_gpiobus_add_fdt_child(device_t, const char *, phandle_t);
int ofw_gpiobus_parse_gpios(device_t, char *, struct gpiobus_pin **);
void ofw_gpiobus_register_provider(device_t);
void ofw_gpiobus_unregister_provider(device_t);
/* Consumers interface. */
int gpio_pin_get_by_ofw_name(device_t consumer, phandle_t node,
char *name, gpio_pin_t *gpio);
int gpio_pin_get_by_ofw_idx(device_t consumer, phandle_t node,
int idx, gpio_pin_t *gpio);
int gpio_pin_get_by_ofw_property(device_t consumer, phandle_t node,
char *name, gpio_pin_t *gpio);
void gpio_pin_release(gpio_pin_t gpio);
int gpio_pin_getcaps(gpio_pin_t pin, uint32_t *caps);
int gpio_pin_is_active(gpio_pin_t pin, bool *active);
int gpio_pin_set_active(gpio_pin_t pin, bool active);
int gpio_pin_setflags(gpio_pin_t pin, uint32_t flags);
#endif
struct resource *gpio_alloc_intr_resource(device_t consumer_dev, int *rid,
u_int alloc_flags, gpio_pin_t pin, uint32_t intr_mode);
int gpio_check_flags(uint32_t, uint32_t);
device_t gpiobus_attach_bus(device_t);
int gpiobus_detach_bus(device_t);
int gpiobus_init_softc(device_t);
int gpiobus_alloc_ivars(struct gpiobus_ivar *);
void gpiobus_free_ivars(struct gpiobus_ivar *);
int gpiobus_acquire_pin(device_t, uint32_t);
int gpiobus_release_pin(device_t, uint32_t);
extern driver_t gpiobus_driver;
#endif /* __GPIOBUS_H__ */
Index: projects/vnet/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
===================================================================
--- projects/vnet/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c (revision 301546)
+++ projects/vnet/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c (revision 301547)
@@ -1,3003 +1,3003 @@
/*-
* Copyright (c) 2010-2012 Citrix Inc.
* Copyright (c) 2009-2012,2016 Microsoft Corp.
* Copyright (c) 2012 NetApp Inc.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice unmodified, this list of conditions, and the following
* disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
* IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
* IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
* NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
* THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/*-
* Copyright (c) 2004-2006 Kip Macy
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include
__FBSDID("$FreeBSD$");
#include "opt_inet6.h"
#include "opt_inet.h"
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include "hv_net_vsc.h"
#include "hv_rndis.h"
#include "hv_rndis_filter.h"
#define hv_chan_rxr hv_chan_priv1
#define hv_chan_txr hv_chan_priv2
/* Short for Hyper-V network interface */
#define NETVSC_DEVNAME "hn"
/*
* It looks like offset 0 of buf is reserved to hold the softc pointer.
* The sc pointer is evidently not needed, and is not presently populated.
* The packet offset is where the netvsc_packet starts in the buffer.
*/
#define HV_NV_SC_PTR_OFFSET_IN_BUF 0
#define HV_NV_PACKET_OFFSET_IN_BUF 16
/* YYY should get it from the underlying channel */
#define HN_TX_DESC_CNT 512
#define HN_LROENT_CNT_DEF 128
#define HN_RING_CNT_DEF_MAX 8
#define HN_RNDIS_MSG_LEN \
(sizeof(rndis_msg) + \
RNDIS_HASHVAL_PPI_SIZE + \
RNDIS_VLAN_PPI_SIZE + \
RNDIS_TSO_PPI_SIZE + \
RNDIS_CSUM_PPI_SIZE)
#define HN_RNDIS_MSG_BOUNDARY PAGE_SIZE
#define HN_RNDIS_MSG_ALIGN CACHE_LINE_SIZE
#define HN_TX_DATA_BOUNDARY PAGE_SIZE
#define HN_TX_DATA_MAXSIZE IP_MAXPACKET
#define HN_TX_DATA_SEGSIZE PAGE_SIZE
#define HN_TX_DATA_SEGCNT_MAX \
(NETVSC_PACKET_MAXPAGE - HV_RF_NUM_TX_RESERVED_PAGE_BUFS)
#define HN_DIRECT_TX_SIZE_DEF 128
#define HN_EARLY_TXEOF_THRESH 8
struct hn_txdesc {
#ifndef HN_USE_TXDESC_BUFRING
SLIST_ENTRY(hn_txdesc) link;
#endif
struct mbuf *m;
struct hn_tx_ring *txr;
int refs;
uint32_t flags; /* HN_TXD_FLAG_ */
netvsc_packet netvsc_pkt; /* XXX to be removed */
bus_dmamap_t data_dmap;
bus_addr_t rndis_msg_paddr;
rndis_msg *rndis_msg;
bus_dmamap_t rndis_msg_dmap;
};
#define HN_TXD_FLAG_ONLIST 0x1
#define HN_TXD_FLAG_DMAMAP 0x2
/*
* Only enable UDP checksum offloading when the host is 2012R2 or
* later. UDP checksum offloading doesn't work on earlier
* Windows releases.
*/
#define HN_CSUM_ASSIST_WIN8 (CSUM_IP | CSUM_TCP)
#define HN_CSUM_ASSIST (CSUM_IP | CSUM_UDP | CSUM_TCP)
#define HN_LRO_LENLIM_MULTIRX_DEF (12 * ETHERMTU)
#define HN_LRO_LENLIM_DEF (25 * ETHERMTU)
/* YYY 2*MTU is a bit rough, but should be good enough. */
#define HN_LRO_LENLIM_MIN(ifp) (2 * (ifp)->if_mtu)
#define HN_LRO_ACKCNT_DEF 1
/*
* Be aware that this sleepable mutex will exhibit WITNESS errors when
* certain TCP and ARP code paths are taken. This appears to be a
* well-known condition, as all other drivers checked use a sleeping
* mutex to protect their transmit paths.
* Also be aware that mutexes do not play well with semaphores, and there
* is a conflicting semaphore in a certain channel code path.
*/
#define NV_LOCK_INIT(_sc, _name) \
mtx_init(&(_sc)->hn_lock, _name, MTX_NETWORK_LOCK, MTX_DEF)
#define NV_LOCK(_sc) mtx_lock(&(_sc)->hn_lock)
#define NV_LOCK_ASSERT(_sc) mtx_assert(&(_sc)->hn_lock, MA_OWNED)
#define NV_UNLOCK(_sc) mtx_unlock(&(_sc)->hn_lock)
#define NV_LOCK_DESTROY(_sc) mtx_destroy(&(_sc)->hn_lock)
/*
* Globals
*/
int hv_promisc_mode = 0; /* normal mode by default */
SYSCTL_NODE(_hw, OID_AUTO, hn, CTLFLAG_RD | CTLFLAG_MPSAFE, NULL,
"Hyper-V network interface");
/* Trust TCP segment verification on host side. */
static int hn_trust_hosttcp = 1;
SYSCTL_INT(_hw_hn, OID_AUTO, trust_hosttcp, CTLFLAG_RDTUN,
&hn_trust_hosttcp, 0,
"Trust tcp segment verification on host side, "
"when csum info is missing (global setting)");
/* Trust UDP datagram verification on host side. */
static int hn_trust_hostudp = 1;
SYSCTL_INT(_hw_hn, OID_AUTO, trust_hostudp, CTLFLAG_RDTUN,
&hn_trust_hostudp, 0,
"Trust udp datagram verification on host side, "
"when csum info is missing (global setting)");
/* Trust IP packet verification on host side. */
static int hn_trust_hostip = 1;
SYSCTL_INT(_hw_hn, OID_AUTO, trust_hostip, CTLFLAG_RDTUN,
&hn_trust_hostip, 0,
"Trust ip packet verification on host side, "
"when csum info is missing (global setting)");
#if __FreeBSD_version >= 1100045
/* Limit TSO burst size */
static int hn_tso_maxlen = 0;
SYSCTL_INT(_hw_hn, OID_AUTO, tso_maxlen, CTLFLAG_RDTUN,
&hn_tso_maxlen, 0, "TSO burst limit");
#endif
/* Limit chimney send size */
static int hn_tx_chimney_size = 0;
SYSCTL_INT(_hw_hn, OID_AUTO, tx_chimney_size, CTLFLAG_RDTUN,
&hn_tx_chimney_size, 0, "Chimney send packet size limit");
/* Limit the size of packet for direct transmission */
static int hn_direct_tx_size = HN_DIRECT_TX_SIZE_DEF;
SYSCTL_INT(_hw_hn, OID_AUTO, direct_tx_size, CTLFLAG_RDTUN,
&hn_direct_tx_size, 0, "Size of the packet for direct transmission");
#if defined(INET) || defined(INET6)
#if __FreeBSD_version >= 1100095
static int hn_lro_entry_count = HN_LROENT_CNT_DEF;
SYSCTL_INT(_hw_hn, OID_AUTO, lro_entry_count, CTLFLAG_RDTUN,
&hn_lro_entry_count, 0, "LRO entry count");
#endif
#endif
static int hn_share_tx_taskq = 0;
SYSCTL_INT(_hw_hn, OID_AUTO, share_tx_taskq, CTLFLAG_RDTUN,
&hn_share_tx_taskq, 0, "Enable shared TX taskqueue");
static struct taskqueue *hn_tx_taskq;
#ifndef HN_USE_TXDESC_BUFRING
static int hn_use_txdesc_bufring = 0;
#else
static int hn_use_txdesc_bufring = 1;
#endif
SYSCTL_INT(_hw_hn, OID_AUTO, use_txdesc_bufring, CTLFLAG_RD,
&hn_use_txdesc_bufring, 0, "Use buf_ring for TX descriptors");
static int hn_bind_tx_taskq = -1;
SYSCTL_INT(_hw_hn, OID_AUTO, bind_tx_taskq, CTLFLAG_RDTUN,
&hn_bind_tx_taskq, 0, "Bind TX taskqueue to the specified cpu");
static int hn_use_if_start = 0;
SYSCTL_INT(_hw_hn, OID_AUTO, use_if_start, CTLFLAG_RDTUN,
&hn_use_if_start, 0, "Use if_start TX method");
static int hn_chan_cnt = 0;
SYSCTL_INT(_hw_hn, OID_AUTO, chan_cnt, CTLFLAG_RDTUN,
&hn_chan_cnt, 0,
"# of channels to use; each channel has one RX ring and one TX ring");
static int hn_tx_ring_cnt = 0;
SYSCTL_INT(_hw_hn, OID_AUTO, tx_ring_cnt, CTLFLAG_RDTUN,
&hn_tx_ring_cnt, 0, "# of TX rings to use");
static int hn_tx_swq_depth = 0;
SYSCTL_INT(_hw_hn, OID_AUTO, tx_swq_depth, CTLFLAG_RDTUN,
&hn_tx_swq_depth, 0, "Depth of IFQ or BUFRING");
static u_int hn_cpu_index;
/*
* Forward declarations
*/
static void hn_stop(hn_softc_t *sc);
static void hn_ifinit_locked(hn_softc_t *sc);
static void hn_ifinit(void *xsc);
static int hn_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data);
static int hn_start_locked(struct hn_tx_ring *txr, int len);
static void hn_start(struct ifnet *ifp);
static void hn_start_txeof(struct hn_tx_ring *);
static int hn_ifmedia_upd(struct ifnet *ifp);
static void hn_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr);
#if __FreeBSD_version >= 1100099
static int hn_lro_lenlim_sysctl(SYSCTL_HANDLER_ARGS);
static int hn_lro_ackcnt_sysctl(SYSCTL_HANDLER_ARGS);
#endif
static int hn_trust_hcsum_sysctl(SYSCTL_HANDLER_ARGS);
static int hn_tx_chimney_size_sysctl(SYSCTL_HANDLER_ARGS);
static int hn_rx_stat_ulong_sysctl(SYSCTL_HANDLER_ARGS);
static int hn_rx_stat_u64_sysctl(SYSCTL_HANDLER_ARGS);
static int hn_tx_stat_ulong_sysctl(SYSCTL_HANDLER_ARGS);
static int hn_tx_conf_int_sysctl(SYSCTL_HANDLER_ARGS);
static int hn_check_iplen(const struct mbuf *, int);
static int hn_create_tx_ring(struct hn_softc *, int);
static void hn_destroy_tx_ring(struct hn_tx_ring *);
static int hn_create_tx_data(struct hn_softc *, int);
static void hn_destroy_tx_data(struct hn_softc *);
static void hn_start_taskfunc(void *, int);
static void hn_start_txeof_taskfunc(void *, int);
static void hn_stop_tx_tasks(struct hn_softc *);
static int hn_encap(struct hn_tx_ring *, struct hn_txdesc *, struct mbuf **);
static void hn_create_rx_data(struct hn_softc *sc, int);
static void hn_destroy_rx_data(struct hn_softc *sc);
static void hn_set_tx_chimney_size(struct hn_softc *, int);
static void hn_channel_attach(struct hn_softc *, struct hv_vmbus_channel *);
static void hn_subchan_attach(struct hn_softc *, struct hv_vmbus_channel *);
static int hn_transmit(struct ifnet *, struct mbuf *);
static void hn_xmit_qflush(struct ifnet *);
static int hn_xmit(struct hn_tx_ring *, int);
static void hn_xmit_txeof(struct hn_tx_ring *);
static void hn_xmit_taskfunc(void *, int);
static void hn_xmit_txeof_taskfunc(void *, int);
#if __FreeBSD_version >= 1100099
static void
hn_set_lro_lenlim(struct hn_softc *sc, int lenlim)
{
int i;
for (i = 0; i < sc->hn_rx_ring_inuse; ++i)
sc->hn_rx_ring[i].hn_lro.lro_length_lim = lenlim;
}
#endif
static int
hn_get_txswq_depth(const struct hn_tx_ring *txr)
{
KASSERT(txr->hn_txdesc_cnt > 0, ("tx ring is not setup yet"));
if (hn_tx_swq_depth < txr->hn_txdesc_cnt)
return txr->hn_txdesc_cnt;
return hn_tx_swq_depth;
}
static int
hn_ifmedia_upd(struct ifnet *ifp __unused)
{
return EOPNOTSUPP;
}
static void
hn_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr)
{
struct hn_softc *sc = ifp->if_softc;
ifmr->ifm_status = IFM_AVALID;
ifmr->ifm_active = IFM_ETHER;
if (!sc->hn_carrier) {
ifmr->ifm_active |= IFM_NONE;
return;
}
ifmr->ifm_status |= IFM_ACTIVE;
ifmr->ifm_active |= IFM_10G_T | IFM_FDX;
}
/* {F8615163-DF3E-46c5-913F-F2D2F965ED0E} */
static const hv_guid g_net_vsc_device_type = {
.data = {0x63, 0x51, 0x61, 0xF8, 0x3E, 0xDF, 0xc5, 0x46,
0x91, 0x3F, 0xF2, 0xD2, 0xF9, 0x65, 0xED, 0x0E}
};
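/*
 * Illustrative sketch only, not part of the driver: the byte table above
 * follows the usual mixed-endian GUID layout, where the first three
 * fields are stored little-endian and the last eight bytes are stored in
 * the order they are written. guid_to_bytes() is a hypothetical helper
 * that reproduces that layout from the textual form in the comment.
 */
```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: serialize a GUID into its 16-byte wire layout. */
static void
guid_to_bytes(uint32_t d1, uint16_t d2, uint16_t d3, const uint8_t d4[8],
    uint8_t out[16])
{
	/* First field: 32 bits, least significant byte first. */
	out[0] = (uint8_t)(d1 & 0xff);
	out[1] = (uint8_t)((d1 >> 8) & 0xff);
	out[2] = (uint8_t)((d1 >> 16) & 0xff);
	out[3] = (uint8_t)((d1 >> 24) & 0xff);
	/* Second and third fields: 16 bits each, least significant byte first. */
	out[4] = (uint8_t)(d2 & 0xff);
	out[5] = (uint8_t)(d2 >> 8);
	out[6] = (uint8_t)(d3 & 0xff);
	out[7] = (uint8_t)(d3 >> 8);
	/* Trailing eight bytes are copied unchanged. */
	memcpy(&out[8], d4, 8);
}
```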
/*
* Standard probe entry point.
*
*/
static int
netvsc_probe(device_t dev)
{
const char *p;
p = vmbus_get_type(dev);
if (!memcmp(p, &g_net_vsc_device_type.data, sizeof(hv_guid))) {
device_set_desc(dev, "Hyper-V Network Interface");
if (bootverbose)
printf("Netvsc probe... DONE \n");
return (BUS_PROBE_DEFAULT);
}
return (ENXIO);
}
/*
* Standard attach entry point.
*
* Called when the driver is loaded. It allocates needed resources,
* and initializes the "hardware" and software.
*/
static int
netvsc_attach(device_t dev)
{
struct hv_device *device_ctx = vmbus_get_devctx(dev);
struct hv_vmbus_channel *pri_chan;
netvsc_device_info device_info;
hn_softc_t *sc;
int unit = device_get_unit(dev);
struct ifnet *ifp = NULL;
int error, ring_cnt, tx_ring_cnt;
#if __FreeBSD_version >= 1100045
int tso_maxlen;
#endif
sc = device_get_softc(dev);
sc->hn_unit = unit;
sc->hn_dev = dev;
if (hn_tx_taskq == NULL) {
sc->hn_tx_taskq = taskqueue_create("hn_tx", M_WAITOK,
taskqueue_thread_enqueue, &sc->hn_tx_taskq);
if (hn_bind_tx_taskq >= 0) {
int cpu = hn_bind_tx_taskq;
cpuset_t cpu_set;
if (cpu > mp_ncpus - 1)
cpu = mp_ncpus - 1;
CPU_SETOF(cpu, &cpu_set);
taskqueue_start_threads_cpuset(&sc->hn_tx_taskq, 1,
PI_NET, &cpu_set, "%s tx",
device_get_nameunit(dev));
} else {
taskqueue_start_threads(&sc->hn_tx_taskq, 1, PI_NET,
"%s tx", device_get_nameunit(dev));
}
} else {
sc->hn_tx_taskq = hn_tx_taskq;
}
NV_LOCK_INIT(sc, "NetVSCLock");
sc->hn_dev_obj = device_ctx;
ifp = sc->hn_ifp = if_alloc(IFT_ETHER);
ifp->if_softc = sc;
if_initname(ifp, device_get_name(dev), device_get_unit(dev));
/*
* Figure out the # of RX rings (ring_cnt) and the # of TX rings
* to use (tx_ring_cnt).
*
* NOTE:
* The # of RX rings to use is the same as the # of channels to use.
*/
ring_cnt = hn_chan_cnt;
if (ring_cnt <= 0) {
/* Default */
ring_cnt = mp_ncpus;
if (ring_cnt > HN_RING_CNT_DEF_MAX)
ring_cnt = HN_RING_CNT_DEF_MAX;
} else if (ring_cnt > mp_ncpus) {
ring_cnt = mp_ncpus;
}
tx_ring_cnt = hn_tx_ring_cnt;
if (tx_ring_cnt <= 0 || tx_ring_cnt > ring_cnt)
tx_ring_cnt = ring_cnt;
if (hn_use_if_start) {
/* ifnet.if_start only needs one TX ring. */
tx_ring_cnt = 1;
}
/*
* Set the leader CPU for channels.
*/
sc->hn_cpu = atomic_fetchadd_int(&hn_cpu_index, ring_cnt) % mp_ncpus;
error = hn_create_tx_data(sc, tx_ring_cnt);
if (error)
goto failed;
hn_create_rx_data(sc, ring_cnt);
/*
* Associate the first TX/RX ring w/ the primary channel.
*/
pri_chan = device_ctx->channel;
KASSERT(HV_VMBUS_CHAN_ISPRIMARY(pri_chan), ("not primary channel"));
KASSERT(pri_chan->offer_msg.offer.sub_channel_index == 0,
("primary channel subidx %u",
pri_chan->offer_msg.offer.sub_channel_index));
hn_channel_attach(sc, pri_chan);
ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
ifp->if_ioctl = hn_ioctl;
ifp->if_init = hn_ifinit;
/* needed by hv_rf_on_device_add() code */
ifp->if_mtu = ETHERMTU;
if (hn_use_if_start) {
int qdepth = hn_get_txswq_depth(&sc->hn_tx_ring[0]);
ifp->if_start = hn_start;
IFQ_SET_MAXLEN(&ifp->if_snd, qdepth);
ifp->if_snd.ifq_drv_maxlen = qdepth - 1;
IFQ_SET_READY(&ifp->if_snd);
} else {
ifp->if_transmit = hn_transmit;
ifp->if_qflush = hn_xmit_qflush;
}
ifmedia_init(&sc->hn_media, 0, hn_ifmedia_upd, hn_ifmedia_sts);
ifmedia_add(&sc->hn_media, IFM_ETHER | IFM_AUTO, 0, NULL);
ifmedia_set(&sc->hn_media, IFM_ETHER | IFM_AUTO);
/* XXX ifmedia_set really should do this for us */
sc->hn_media.ifm_media = sc->hn_media.ifm_cur->ifm_media;
/*
* Tell upper layers that we support full VLAN capability.
*/
ifp->if_hdrlen = sizeof(struct ether_vlan_header);
ifp->if_capabilities |=
IFCAP_VLAN_HWTAGGING | IFCAP_VLAN_MTU | IFCAP_HWCSUM | IFCAP_TSO |
IFCAP_LRO;
ifp->if_capenable |=
IFCAP_VLAN_HWTAGGING | IFCAP_VLAN_MTU | IFCAP_HWCSUM | IFCAP_TSO |
IFCAP_LRO;
ifp->if_hwassist = sc->hn_tx_ring[0].hn_csum_assist | CSUM_TSO;
error = hv_rf_on_device_add(device_ctx, &device_info, ring_cnt);
if (error)
goto failed;
KASSERT(sc->net_dev->num_channel > 0 &&
sc->net_dev->num_channel <= sc->hn_rx_ring_inuse,
("invalid channel count %u, should be at most %d",
sc->net_dev->num_channel, sc->hn_rx_ring_inuse));
/*
* Set the # of TX/RX rings that could be used according to
* the # of channels that host offered.
*/
if (sc->hn_tx_ring_inuse > sc->net_dev->num_channel)
sc->hn_tx_ring_inuse = sc->net_dev->num_channel;
sc->hn_rx_ring_inuse = sc->net_dev->num_channel;
device_printf(dev, "%d TX ring, %d RX ring\n",
sc->hn_tx_ring_inuse, sc->hn_rx_ring_inuse);
if (sc->net_dev->num_channel > 1) {
struct hv_vmbus_channel **subchan;
int subchan_cnt = sc->net_dev->num_channel - 1;
int i;
/* Wait for sub-channels setup to complete. */
subchan = vmbus_get_subchan(pri_chan, subchan_cnt);
/* Attach the sub-channels. */
for (i = 0; i < subchan_cnt; ++i) {
/* NOTE: Calling order is critical. */
hn_subchan_attach(sc, subchan[i]);
hv_nv_subchan_attach(subchan[i]);
}
/* Release the sub-channels */
vmbus_rel_subchan(subchan, subchan_cnt);
device_printf(dev, "%d sub-channels setup done\n", subchan_cnt);
}
#if __FreeBSD_version >= 1100099
if (sc->hn_rx_ring_inuse > 1) {
/*
* Reduce TCP segment aggregation limit for multiple
* RX rings to increase ACK timeliness.
*/
hn_set_lro_lenlim(sc, HN_LRO_LENLIM_MULTIRX_DEF);
}
#endif
if (device_info.link_state == 0) {
sc->hn_carrier = 1;
}
#if __FreeBSD_version >= 1100045
tso_maxlen = hn_tso_maxlen;
if (tso_maxlen <= 0 || tso_maxlen > IP_MAXPACKET)
tso_maxlen = IP_MAXPACKET;
ifp->if_hw_tsomaxsegcount = HN_TX_DATA_SEGCNT_MAX;
ifp->if_hw_tsomaxsegsize = PAGE_SIZE;
ifp->if_hw_tsomax = tso_maxlen -
(ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);
#endif
ether_ifattach(ifp, device_info.mac_addr);
#if __FreeBSD_version >= 1100045
if_printf(ifp, "TSO: %u/%u/%u\n", ifp->if_hw_tsomax,
ifp->if_hw_tsomaxsegcount, ifp->if_hw_tsomaxsegsize);
#endif
sc->hn_tx_chimney_max = sc->net_dev->send_section_size;
hn_set_tx_chimney_size(sc, sc->hn_tx_chimney_max);
if (hn_tx_chimney_size > 0 &&
hn_tx_chimney_size < sc->hn_tx_chimney_max)
hn_set_tx_chimney_size(sc, hn_tx_chimney_size);
return (0);
failed:
hn_destroy_tx_data(sc);
if (ifp != NULL)
if_free(ifp);
return (error);
}
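/*
 * Illustrative sketch only, not part of the driver: the RX/TX ring count
 * selection done in netvsc_attach() can be read as a pure function of the
 * tunables and the CPU count. pick_ring_counts() and struct ring_cfg are
 * hypothetical names; def_max stands in for HN_RING_CNT_DEF_MAX and ncpus
 * for mp_ncpus.
 */
```c
struct ring_cfg {
	int rx;		/* # of RX rings, same as # of channels requested */
	int tx;		/* # of TX rings */
};

static struct ring_cfg
pick_ring_counts(int chan_cnt, int tx_ring_cnt, int ncpus, int def_max,
    int use_if_start)
{
	struct ring_cfg c;

	c.rx = chan_cnt;
	if (c.rx <= 0) {
		/* Default: one ring per CPU, capped at def_max. */
		c.rx = ncpus;
		if (c.rx > def_max)
			c.rx = def_max;
	} else if (c.rx > ncpus) {
		/* Never use more rings than CPUs. */
		c.rx = ncpus;
	}

	/* TX rings default to, and are bounded by, the RX ring count. */
	c.tx = tx_ring_cnt;
	if (c.tx <= 0 || c.tx > c.rx)
		c.tx = c.rx;

	/* The if_start method only needs a single TX ring. */
	if (use_if_start)
		c.tx = 1;
	return (c);
}
```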
/*
* Standard detach entry point
*/
static int
netvsc_detach(device_t dev)
{
struct hn_softc *sc = device_get_softc(dev);
struct hv_device *hv_device = vmbus_get_devctx(dev);
if (bootverbose)
printf("netvsc_detach\n");
/*
* XXXKYS: Need to clean up all our
* driver state; this is the driver
* unloading.
*/
/*
* XXXKYS: Need to stop outgoing traffic and unregister
* the netdevice.
*/
hv_rf_on_device_remove(hv_device, HV_RF_NV_DESTROY_CHANNEL);
hn_stop_tx_tasks(sc);
ifmedia_removeall(&sc->hn_media);
hn_destroy_rx_data(sc);
hn_destroy_tx_data(sc);
if (sc->hn_tx_taskq != hn_tx_taskq)
taskqueue_free(sc->hn_tx_taskq);
return (0);
}
/*
* Standard shutdown entry point
*/
static int
netvsc_shutdown(device_t dev)
{
return (0);
}
static __inline int
hn_txdesc_dmamap_load(struct hn_tx_ring *txr, struct hn_txdesc *txd,
struct mbuf **m_head, bus_dma_segment_t *segs, int *nsegs)
{
struct mbuf *m = *m_head;
int error;
error = bus_dmamap_load_mbuf_sg(txr->hn_tx_data_dtag, txd->data_dmap,
m, segs, nsegs, BUS_DMA_NOWAIT);
if (error == EFBIG) {
struct mbuf *m_new;
m_new = m_collapse(m, M_NOWAIT, HN_TX_DATA_SEGCNT_MAX);
if (m_new == NULL)
return ENOBUFS;
else
*m_head = m = m_new;
txr->hn_tx_collapsed++;
error = bus_dmamap_load_mbuf_sg(txr->hn_tx_data_dtag,
txd->data_dmap, m, segs, nsegs, BUS_DMA_NOWAIT);
}
if (!error) {
bus_dmamap_sync(txr->hn_tx_data_dtag, txd->data_dmap,
BUS_DMASYNC_PREWRITE);
txd->flags |= HN_TXD_FLAG_DMAMAP;
}
return error;
}
static __inline void
hn_txdesc_dmamap_unload(struct hn_tx_ring *txr, struct hn_txdesc *txd)
{
if (txd->flags & HN_TXD_FLAG_DMAMAP) {
bus_dmamap_sync(txr->hn_tx_data_dtag,
txd->data_dmap, BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(txr->hn_tx_data_dtag,
txd->data_dmap);
txd->flags &= ~HN_TXD_FLAG_DMAMAP;
}
}
static __inline int
hn_txdesc_put(struct hn_tx_ring *txr, struct hn_txdesc *txd)
{
KASSERT((txd->flags & HN_TXD_FLAG_ONLIST) == 0,
("put an onlist txd %#x", txd->flags));
KASSERT(txd->refs > 0, ("invalid txd refs %d", txd->refs));
if (atomic_fetchadd_int(&txd->refs, -1) != 1)
return 0;
hn_txdesc_dmamap_unload(txr, txd);
if (txd->m != NULL) {
m_freem(txd->m);
txd->m = NULL;
}
txd->flags |= HN_TXD_FLAG_ONLIST;
#ifndef HN_USE_TXDESC_BUFRING
mtx_lock_spin(&txr->hn_txlist_spin);
KASSERT(txr->hn_txdesc_avail >= 0 &&
txr->hn_txdesc_avail < txr->hn_txdesc_cnt,
("txdesc_put: invalid txd avail %d", txr->hn_txdesc_avail));
txr->hn_txdesc_avail++;
SLIST_INSERT_HEAD(&txr->hn_txlist, txd, link);
mtx_unlock_spin(&txr->hn_txlist_spin);
#else
atomic_add_int(&txr->hn_txdesc_avail, 1);
buf_ring_enqueue(txr->hn_txdesc_br, txd);
#endif
return 1;
}
static __inline struct hn_txdesc *
hn_txdesc_get(struct hn_tx_ring *txr)
{
struct hn_txdesc *txd;
#ifndef HN_USE_TXDESC_BUFRING
mtx_lock_spin(&txr->hn_txlist_spin);
txd = SLIST_FIRST(&txr->hn_txlist);
if (txd != NULL) {
KASSERT(txr->hn_txdesc_avail > 0,
("txdesc_get: invalid txd avail %d", txr->hn_txdesc_avail));
txr->hn_txdesc_avail--;
SLIST_REMOVE_HEAD(&txr->hn_txlist, link);
}
mtx_unlock_spin(&txr->hn_txlist_spin);
#else
txd = buf_ring_dequeue_sc(txr->hn_txdesc_br);
#endif
if (txd != NULL) {
#ifdef HN_USE_TXDESC_BUFRING
atomic_subtract_int(&txr->hn_txdesc_avail, 1);
#endif
KASSERT(txd->m == NULL && txd->refs == 0 &&
(txd->flags & HN_TXD_FLAG_ONLIST), ("invalid txd"));
txd->flags &= ~HN_TXD_FLAG_ONLIST;
txd->refs = 1;
}
return txd;
}
static __inline void
hn_txdesc_hold(struct hn_txdesc *txd)
{
/* 0->1 transition will never work */
KASSERT(txd->refs > 0, ("invalid refs %d", txd->refs));
atomic_add_int(&txd->refs, 1);
}
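/*
 * Illustrative sketch only, not part of the driver: the get/hold/put
 * functions above implement a reference count in which only the caller
 * that drops the last reference recycles the descriptor. The userland
 * sketch below shows the same pattern with C11 atomics; struct obj and
 * the obj_*() helpers are hypothetical, and the on_list flag stands in
 * for HN_TXD_FLAG_ONLIST.
 */
```c
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_int refs;	/* outstanding references */
	bool on_list;		/* true while parked on the free list */
};

/* Take the object off the free list with a single reference. */
static void
obj_get(struct obj *o)
{
	o->on_list = false;
	atomic_store(&o->refs, 1);
}

/* Add a reference; only valid while the caller already holds one. */
static void
obj_hold(struct obj *o)
{
	atomic_fetch_add(&o->refs, 1);
}

/*
 * Drop a reference. Returns true only for the caller that released the
 * last reference, which is also the caller that recycles the object.
 */
static bool
obj_put(struct obj *o)
{
	/* atomic_fetch_sub() returns the value before the decrement. */
	if (atomic_fetch_sub(&o->refs, 1) != 1)
		return (false);
	o->on_list = true;
	return (true);
}
```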
static __inline void
hn_txeof(struct hn_tx_ring *txr)
{
txr->hn_has_txeof = 0;
txr->hn_txeof(txr);
}
static void
hn_tx_done(struct hv_vmbus_channel *chan, void *xpkt)
{
netvsc_packet *packet = xpkt;
struct hn_txdesc *txd;
struct hn_tx_ring *txr;
txd = (struct hn_txdesc *)(uintptr_t)
packet->compl.send.send_completion_tid;
txr = txd->txr;
KASSERT(txr->hn_chan == chan,
("channel mismatch, on channel%u, should be channel%u",
chan->offer_msg.offer.sub_channel_index,
txr->hn_chan->offer_msg.offer.sub_channel_index));
txr->hn_has_txeof = 1;
hn_txdesc_put(txr, txd);
++txr->hn_txdone_cnt;
if (txr->hn_txdone_cnt >= HN_EARLY_TXEOF_THRESH) {
txr->hn_txdone_cnt = 0;
if (txr->hn_oactive)
hn_txeof(txr);
}
}
void
netvsc_channel_rollup(struct hv_vmbus_channel *chan)
{
struct hn_tx_ring *txr = chan->hv_chan_txr;
#if defined(INET) || defined(INET6)
struct hn_rx_ring *rxr = chan->hv_chan_rxr;
tcp_lro_flush_all(&rxr->hn_lro);
#endif
/*
* NOTE:
* 'txr' can be NULL if multiple channels are in use and the
* ifnet.if_start method is enabled.
*/
if (txr == NULL || !txr->hn_has_txeof)
return;
txr->hn_txdone_cnt = 0;
hn_txeof(txr);
}
/*
* NOTE:
* If this function fails, then both txd and m_head0 will be freed.
*/
static int
hn_encap(struct hn_tx_ring *txr, struct hn_txdesc *txd, struct mbuf **m_head0)
{
bus_dma_segment_t segs[HN_TX_DATA_SEGCNT_MAX];
int error, nsegs, i;
struct mbuf *m_head = *m_head0;
netvsc_packet *packet;
rndis_msg *rndis_mesg;
rndis_packet *rndis_pkt;
rndis_per_packet_info *rppi;
struct rndis_hash_value *hash_value;
uint32_t rndis_msg_size;
packet = &txd->netvsc_pkt;
packet->is_data_pkt = TRUE;
packet->tot_data_buf_len = m_head->m_pkthdr.len;
/*
* extension points to the area reserved for the
* rndis_filter_packet, which is placed just after
* the netvsc_packet (and rppi struct, if present;
* length is updated later).
*/
rndis_mesg = txd->rndis_msg;
/* XXX not necessary */
memset(rndis_mesg, 0, HN_RNDIS_MSG_LEN);
rndis_mesg->ndis_msg_type = REMOTE_NDIS_PACKET_MSG;
rndis_pkt = &rndis_mesg->msg.packet;
rndis_pkt->data_offset = sizeof(rndis_packet);
rndis_pkt->data_length = packet->tot_data_buf_len;
rndis_pkt->per_pkt_info_offset = sizeof(rndis_packet);
rndis_msg_size = RNDIS_MESSAGE_SIZE(rndis_packet);
/*
* Set the hash value for this packet, so that the host can
* dispatch the TX done event for this packet back to this TX
* ring's channel.
*/
rndis_msg_size += RNDIS_HASHVAL_PPI_SIZE;
rppi = hv_set_rppi_data(rndis_mesg, RNDIS_HASHVAL_PPI_SIZE,
nbl_hash_value);
hash_value = (struct rndis_hash_value *)((uint8_t *)rppi +
rppi->per_packet_info_offset);
hash_value->hash_value = txr->hn_tx_idx;
if (m_head->m_flags & M_VLANTAG) {
ndis_8021q_info *rppi_vlan_info;
rndis_msg_size += RNDIS_VLAN_PPI_SIZE;
rppi = hv_set_rppi_data(rndis_mesg, RNDIS_VLAN_PPI_SIZE,
ieee_8021q_info);
rppi_vlan_info = (ndis_8021q_info *)((uint8_t *)rppi +
rppi->per_packet_info_offset);
rppi_vlan_info->u1.s1.vlan_id =
m_head->m_pkthdr.ether_vtag & 0xfff;
}
if (m_head->m_pkthdr.csum_flags & CSUM_TSO) {
rndis_tcp_tso_info *tso_info;
struct ether_vlan_header *eh;
int ether_len;
/*
* XXX need m_pullup and use mtodo
*/
eh = mtod(m_head, struct ether_vlan_header*);
if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN))
ether_len = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
else
ether_len = ETHER_HDR_LEN;
rndis_msg_size += RNDIS_TSO_PPI_SIZE;
rppi = hv_set_rppi_data(rndis_mesg, RNDIS_TSO_PPI_SIZE,
tcp_large_send_info);
tso_info = (rndis_tcp_tso_info *)((uint8_t *)rppi +
rppi->per_packet_info_offset);
tso_info->lso_v2_xmit.type =
RNDIS_TCP_LARGE_SEND_OFFLOAD_V2_TYPE;
#ifdef INET
if (m_head->m_pkthdr.csum_flags & CSUM_IP_TSO) {
struct ip *ip =
(struct ip *)(m_head->m_data + ether_len);
unsigned long iph_len = ip->ip_hl << 2;
struct tcphdr *th =
(struct tcphdr *)((caddr_t)ip + iph_len);
tso_info->lso_v2_xmit.ip_version =
RNDIS_TCP_LARGE_SEND_OFFLOAD_IPV4;
ip->ip_len = 0;
ip->ip_sum = 0;
th->th_sum = in_pseudo(ip->ip_src.s_addr,
ip->ip_dst.s_addr, htons(IPPROTO_TCP));
}
#endif
#if defined(INET6) && defined(INET)
else
#endif
#ifdef INET6
{
struct ip6_hdr *ip6 = (struct ip6_hdr *)
(m_head->m_data + ether_len);
struct tcphdr *th = (struct tcphdr *)(ip6 + 1);
tso_info->lso_v2_xmit.ip_version =
RNDIS_TCP_LARGE_SEND_OFFLOAD_IPV6;
ip6->ip6_plen = 0;
th->th_sum = in6_cksum_pseudo(ip6, 0, IPPROTO_TCP, 0);
}
#endif
tso_info->lso_v2_xmit.tcp_header_offset = 0;
tso_info->lso_v2_xmit.mss = m_head->m_pkthdr.tso_segsz;
} else if (m_head->m_pkthdr.csum_flags & txr->hn_csum_assist) {
rndis_tcp_ip_csum_info *csum_info;
rndis_msg_size += RNDIS_CSUM_PPI_SIZE;
rppi = hv_set_rppi_data(rndis_mesg, RNDIS_CSUM_PPI_SIZE,
tcpip_chksum_info);
csum_info = (rndis_tcp_ip_csum_info *)((uint8_t *)rppi +
rppi->per_packet_info_offset);
csum_info->xmit.is_ipv4 = 1;
if (m_head->m_pkthdr.csum_flags & CSUM_IP)
csum_info->xmit.ip_header_csum = 1;
if (m_head->m_pkthdr.csum_flags & CSUM_TCP) {
csum_info->xmit.tcp_csum = 1;
csum_info->xmit.tcp_header_offset = 0;
} else if (m_head->m_pkthdr.csum_flags & CSUM_UDP) {
csum_info->xmit.udp_csum = 1;
}
}
rndis_mesg->msg_len = packet->tot_data_buf_len + rndis_msg_size;
packet->tot_data_buf_len = rndis_mesg->msg_len;
/*
* Chimney send, if the packet could fit into one chimney buffer.
*/
if (packet->tot_data_buf_len < txr->hn_tx_chimney_size) {
netvsc_dev *net_dev = txr->hn_sc->net_dev;
uint32_t send_buf_section_idx;
txr->hn_tx_chimney_tried++;
send_buf_section_idx =
hv_nv_get_next_send_section(net_dev);
if (send_buf_section_idx !=
NVSP_1_CHIMNEY_SEND_INVALID_SECTION_INDEX) {
uint8_t *dest = ((uint8_t *)net_dev->send_buf +
(send_buf_section_idx *
net_dev->send_section_size));
memcpy(dest, rndis_mesg, rndis_msg_size);
dest += rndis_msg_size;
m_copydata(m_head, 0, m_head->m_pkthdr.len, dest);
packet->send_buf_section_idx = send_buf_section_idx;
packet->send_buf_section_size =
packet->tot_data_buf_len;
packet->page_buf_count = 0;
txr->hn_tx_chimney++;
goto done;
}
}
error = hn_txdesc_dmamap_load(txr, txd, &m_head, segs, &nsegs);
if (error) {
int freed;
/*
* This mbuf is not linked w/ the txd yet, so free it now.
*/
m_freem(m_head);
*m_head0 = NULL;
freed = hn_txdesc_put(txr, txd);
KASSERT(freed != 0,
("fail to free txd upon txdma error"));
txr->hn_txdma_failed++;
if_inc_counter(txr->hn_sc->hn_ifp, IFCOUNTER_OERRORS, 1);
return error;
}
*m_head0 = m_head;
packet->page_buf_count = nsegs + HV_RF_NUM_TX_RESERVED_PAGE_BUFS;
/* send packet with page buffer */
packet->page_buffers[0].pfn = atop(txd->rndis_msg_paddr);
packet->page_buffers[0].offset = txd->rndis_msg_paddr & PAGE_MASK;
packet->page_buffers[0].length = rndis_msg_size;
/*
* Fill the page buffers with mbuf info starting at index
* HV_RF_NUM_TX_RESERVED_PAGE_BUFS.
*/
for (i = 0; i < nsegs; ++i) {
hv_vmbus_page_buffer *pb = &packet->page_buffers[
i + HV_RF_NUM_TX_RESERVED_PAGE_BUFS];
pb->pfn = atop(segs[i].ds_addr);
pb->offset = segs[i].ds_addr & PAGE_MASK;
pb->length = segs[i].ds_len;
}
packet->send_buf_section_idx =
NVSP_1_CHIMNEY_SEND_INVALID_SECTION_INDEX;
packet->send_buf_section_size = 0;
done:
txd->m = m_head;
/* Set the completion routine */
packet->compl.send.on_send_completion = hn_tx_done;
packet->compl.send.send_completion_context = packet;
packet->compl.send.send_completion_tid = (uint64_t)(uintptr_t)txd;
return 0;
}
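/*
 * Illustrative sketch only, not part of the driver: each page buffer
 * filled in by hn_encap() describes a DMA segment as a page frame number
 * plus an offset within that page. The sketch assumes 4 KiB pages; the
 * SK_* macros, struct sk_page_buf and sk_make_page_buf() are hypothetical
 * stand-ins for PAGE_SHIFT/PAGE_MASK, hv_vmbus_page_buffer and the
 * atop()-based assignments above.
 */
```c
#include <stdint.h>

#define SK_PAGE_SHIFT	12			/* assumption: 4 KiB pages */
#define SK_PAGE_SIZE	(1u << SK_PAGE_SHIFT)
#define SK_PAGE_MASK	(SK_PAGE_SIZE - 1)	/* low bits: offset in page */

struct sk_page_buf {
	uint64_t pfn;		/* page frame number */
	uint32_t offset;	/* byte offset within the page */
	uint32_t length;	/* segment length in bytes */
};

static struct sk_page_buf
sk_make_page_buf(uint64_t paddr, uint32_t len)
{
	struct sk_page_buf pb;

	pb.pfn = paddr >> SK_PAGE_SHIFT;		/* like atop(paddr) */
	pb.offset = (uint32_t)(paddr & SK_PAGE_MASK);	/* like paddr & PAGE_MASK */
	pb.length = len;
	return (pb);
}
```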
/*
* NOTE:
* If this function fails, then txd will be freed, but the mbuf
* associated w/ the txd will _not_ be freed.
*/
static int
hn_send_pkt(struct ifnet *ifp, struct hn_tx_ring *txr, struct hn_txdesc *txd)
{
int error, send_failed = 0;
again:
/*
* Make sure that txd is not freed before ETHER_BPF_MTAP.
*/
hn_txdesc_hold(txd);
error = hv_nv_on_send(txr->hn_chan, &txd->netvsc_pkt);
if (!error) {
ETHER_BPF_MTAP(ifp, txd->m);
if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1);
if (!hn_use_if_start) {
if_inc_counter(ifp, IFCOUNTER_OBYTES,
txd->m->m_pkthdr.len);
if (txd->m->m_flags & M_MCAST)
if_inc_counter(ifp, IFCOUNTER_OMCASTS, 1);
}
txr->hn_pkts++;
}
hn_txdesc_put(txr, txd);
if (__predict_false(error)) {
int freed;
/*
* This should "really rarely" happen.
*
* XXX Too many RX to be acked or too many sideband
* commands to run? Ask netvsc_channel_rollup()
* to kick start later.
*/
txr->hn_has_txeof = 1;
if (!send_failed) {
txr->hn_send_failed++;
send_failed = 1;
/*
* Try sending again after setting hn_has_txeof,
* in case we missed the last
* netvsc_channel_rollup().
*/
goto again;
}
if_printf(ifp, "send failed\n");
/*
* Caller will perform further processing on the
* associated mbuf, so don't free it in hn_txdesc_put();
* only unload it from the DMA map in hn_txdesc_put(),
* if it was loaded.
*/
txd->m = NULL;
freed = hn_txdesc_put(txr, txd);
KASSERT(freed != 0,
("fail to free txd upon send error"));
txr->hn_send_failed++;
}
return error;
}
/*
* Start a transmit of one or more packets
*/
static int
hn_start_locked(struct hn_tx_ring *txr, int len)
{
struct hn_softc *sc = txr->hn_sc;
struct ifnet *ifp = sc->hn_ifp;
KASSERT(hn_use_if_start,
("hn_start_locked is called, when if_start is disabled"));
KASSERT(txr == &sc->hn_tx_ring[0], ("not the first TX ring"));
mtx_assert(&txr->hn_tx_lock, MA_OWNED);
if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) !=
IFF_DRV_RUNNING)
return 0;
while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) {
struct hn_txdesc *txd;
struct mbuf *m_head;
int error;
IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head);
if (m_head == NULL)
break;
if (len > 0 && m_head->m_pkthdr.len > len) {
/*
* Sending this packet could be time consuming; let the
* caller dispatch it (and any packets that follow) to
* the TX taskqueue.
*/
IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
return 1;
}
txd = hn_txdesc_get(txr);
if (txd == NULL) {
txr->hn_no_txdescs++;
IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
atomic_set_int(&ifp->if_drv_flags, IFF_DRV_OACTIVE);
break;
}
error = hn_encap(txr, txd, &m_head);
if (error) {
/* Both txd and m_head are freed */
continue;
}
error = hn_send_pkt(ifp, txr, txd);
if (__predict_false(error)) {
/* txd is freed, but m_head is not */
IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
atomic_set_int(&ifp->if_drv_flags, IFF_DRV_OACTIVE);
break;
}
}
return 0;
}
/*
* Link up/down notification
*/
void
netvsc_linkstatus_callback(struct hv_device *device_obj, uint32_t status)
{
hn_softc_t *sc = device_get_softc(device_obj->device);
if (status == 1) {
sc->hn_carrier = 1;
} else {
sc->hn_carrier = 0;
}
}
/*
* Append the specified data to the indicated mbuf chain.
* Extend the mbuf chain if the new data does not fit in
* existing space.
*
* This is a minor rewrite of m_append() from sys/kern/uipc_mbuf.c.
* There should be an equivalent in the kernel mbuf code,
* but there does not appear to be one yet.
*
* Differs from m_append() in that additional mbufs are
* allocated with cluster size MJUMPAGESIZE, and filled
* accordingly.
*
* Return 1 if able to complete the job; otherwise 0.
*/
static int
hv_m_append(struct mbuf *m0, int len, c_caddr_t cp)
{
struct mbuf *m, *n;
int remainder, space;
for (m = m0; m->m_next != NULL; m = m->m_next)
;
remainder = len;
space = M_TRAILINGSPACE(m);
if (space > 0) {
/*
* Copy into available space.
*/
if (space > remainder)
space = remainder;
bcopy(cp, mtod(m, caddr_t) + m->m_len, space);
m->m_len += space;
cp += space;
remainder -= space;
}
while (remainder > 0) {
/*
* Allocate a new mbuf; could check space
* and allocate a cluster instead.
*/
n = m_getjcl(M_NOWAIT, m->m_type, 0, MJUMPAGESIZE);
if (n == NULL)
break;
n->m_len = min(MJUMPAGESIZE, remainder);
bcopy(cp, mtod(n, caddr_t), n->m_len);
cp += n->m_len;
remainder -= n->m_len;
m->m_next = n;
m = n;
}
if (m0->m_flags & M_PKTHDR)
m0->m_pkthdr.len += len - remainder;
return (remainder == 0);
}
/*
* Called when we receive a data packet from the "wire" on the
* specified device
*
* Note: This is no longer used as a callback
*/
int
netvsc_recv(struct hv_vmbus_channel *chan, netvsc_packet *packet,
const rndis_tcp_ip_csum_info *csum_info,
const struct rndis_hash_info *hash_info,
const struct rndis_hash_value *hash_value)
{
struct hn_rx_ring *rxr = chan->hv_chan_rxr;
struct ifnet *ifp = rxr->hn_ifp;
struct mbuf *m_new;
int size, do_lro = 0, do_csum = 1;
+ int hash_type = M_HASHTYPE_OPAQUE_HASH;
if (!(ifp->if_drv_flags & IFF_DRV_RUNNING))
return (0);
/*
* Bail out if packet contains more data than configured MTU.
*/
if (packet->tot_data_buf_len > (ifp->if_mtu + ETHER_HDR_LEN)) {
return (0);
} else if (packet->tot_data_buf_len <= MHLEN) {
m_new = m_gethdr(M_NOWAIT, MT_DATA);
if (m_new == NULL) {
if_inc_counter(ifp, IFCOUNTER_IQDROPS, 1);
return (0);
}
memcpy(mtod(m_new, void *), packet->data,
packet->tot_data_buf_len);
m_new->m_pkthdr.len = m_new->m_len = packet->tot_data_buf_len;
rxr->hn_small_pkts++;
} else {
/*
* Get an mbuf with a cluster. For packets 2K or less,
* get a standard 2K cluster. For anything larger, get a
* 4K cluster. Any buffers larger than 4K can cause problems
* if looped around to the Hyper-V TX channel, so avoid them.
*/
size = MCLBYTES;
if (packet->tot_data_buf_len > MCLBYTES) {
/* 4096 */
size = MJUMPAGESIZE;
}
m_new = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, size);
if (m_new == NULL) {
if_inc_counter(ifp, IFCOUNTER_IQDROPS, 1);
return (0);
}
hv_m_append(m_new, packet->tot_data_buf_len, packet->data);
}
m_new->m_pkthdr.rcvif = ifp;
if (__predict_false((ifp->if_capenable & IFCAP_RXCSUM) == 0))
do_csum = 0;
/* receive side checksum offload */
if (csum_info != NULL) {
/* IP csum offload */
if (csum_info->receive.ip_csum_succeeded && do_csum) {
m_new->m_pkthdr.csum_flags |=
(CSUM_IP_CHECKED | CSUM_IP_VALID);
rxr->hn_csum_ip++;
}
/* TCP/UDP csum offload */
if ((csum_info->receive.tcp_csum_succeeded ||
csum_info->receive.udp_csum_succeeded) && do_csum) {
m_new->m_pkthdr.csum_flags |=
(CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
m_new->m_pkthdr.csum_data = 0xffff;
if (csum_info->receive.tcp_csum_succeeded)
rxr->hn_csum_tcp++;
else
rxr->hn_csum_udp++;
}
if (csum_info->receive.ip_csum_succeeded &&
csum_info->receive.tcp_csum_succeeded)
do_lro = 1;
} else {
const struct ether_header *eh;
uint16_t etype;
int hoff;
hoff = sizeof(*eh);
if (m_new->m_len < hoff)
goto skip;
eh = mtod(m_new, struct ether_header *);
etype = ntohs(eh->ether_type);
if (etype == ETHERTYPE_VLAN) {
const struct ether_vlan_header *evl;
hoff = sizeof(*evl);
if (m_new->m_len < hoff)
goto skip;
evl = mtod(m_new, struct ether_vlan_header *);
etype = ntohs(evl->evl_proto);
}
if (etype == ETHERTYPE_IP) {
int pr;
pr = hn_check_iplen(m_new, hoff);
if (pr == IPPROTO_TCP) {
if (do_csum &&
(rxr->hn_trust_hcsum &
HN_TRUST_HCSUM_TCP)) {
rxr->hn_csum_trusted++;
m_new->m_pkthdr.csum_flags |=
(CSUM_IP_CHECKED | CSUM_IP_VALID |
CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
m_new->m_pkthdr.csum_data = 0xffff;
}
do_lro = 1;
} else if (pr == IPPROTO_UDP) {
if (do_csum &&
(rxr->hn_trust_hcsum &
HN_TRUST_HCSUM_UDP)) {
rxr->hn_csum_trusted++;
m_new->m_pkthdr.csum_flags |=
(CSUM_IP_CHECKED | CSUM_IP_VALID |
CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
m_new->m_pkthdr.csum_data = 0xffff;
}
} else if (pr != IPPROTO_DONE && do_csum &&
(rxr->hn_trust_hcsum & HN_TRUST_HCSUM_IP)) {
rxr->hn_csum_trusted++;
m_new->m_pkthdr.csum_flags |=
(CSUM_IP_CHECKED | CSUM_IP_VALID);
}
}
}
skip:
if ((packet->vlan_tci != 0) &&
(ifp->if_capenable & IFCAP_VLAN_HWTAGGING) != 0) {
m_new->m_pkthdr.ether_vtag = packet->vlan_tci;
m_new->m_flags |= M_VLANTAG;
}
if (hash_info != NULL && hash_value != NULL) {
- int hash_type = M_HASHTYPE_OPAQUE;
-
rxr->hn_rss_pkts++;
m_new->m_pkthdr.flowid = hash_value->hash_value;
if ((hash_info->hash_info & NDIS_HASH_FUNCTION_MASK) ==
NDIS_HASH_FUNCTION_TOEPLITZ) {
uint32_t type =
(hash_info->hash_info & NDIS_HASH_TYPE_MASK);
switch (type) {
case NDIS_HASH_IPV4:
hash_type = M_HASHTYPE_RSS_IPV4;
break;
case NDIS_HASH_TCP_IPV4:
hash_type = M_HASHTYPE_RSS_TCP_IPV4;
break;
case NDIS_HASH_IPV6:
hash_type = M_HASHTYPE_RSS_IPV6;
break;
case NDIS_HASH_IPV6_EX:
hash_type = M_HASHTYPE_RSS_IPV6_EX;
break;
case NDIS_HASH_TCP_IPV6:
hash_type = M_HASHTYPE_RSS_TCP_IPV6;
break;
case NDIS_HASH_TCP_IPV6_EX:
hash_type = M_HASHTYPE_RSS_TCP_IPV6_EX;
break;
}
}
- M_HASHTYPE_SET(m_new, hash_type);
} else {
- if (hash_value != NULL)
+ if (hash_value != NULL) {
m_new->m_pkthdr.flowid = hash_value->hash_value;
- else
+ } else {
m_new->m_pkthdr.flowid = rxr->hn_rx_idx;
- M_HASHTYPE_SET(m_new, M_HASHTYPE_OPAQUE);
+ hash_type = M_HASHTYPE_OPAQUE;
+ }
}
+ M_HASHTYPE_SET(m_new, hash_type);
/*
* Note: Moved RX completion back to hv_nv_on_receive() so all
* messages (not just data messages) will trigger a response.
*/
if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1);
rxr->hn_pkts++;
if ((ifp->if_capenable & IFCAP_LRO) && do_lro) {
#if defined(INET) || defined(INET6)
struct lro_ctrl *lro = &rxr->hn_lro;
if (lro->lro_cnt) {
rxr->hn_lro_tried++;
if (tcp_lro_rx(lro, m_new, 0) == 0) {
/* DONE! */
return 0;
}
}
#endif
}
/* We're not holding the lock here, so don't release it */
(*ifp->if_input)(ifp, m_new);
return (0);
}
/*
* Rules for using sc->temp_unusable:
* 1. sc->temp_unusable can only be read or written while holding NV_LOCK()
* 2. code reading sc->temp_unusable under NV_LOCK(), and finding
* sc->temp_unusable set, must release NV_LOCK() and exit
* 3. to retain exclusive control of the interface,
* sc->temp_unusable must be set by code before releasing NV_LOCK()
* 4. only code setting sc->temp_unusable can clear sc->temp_unusable
* 5. code setting sc->temp_unusable must eventually clear sc->temp_unusable
*/
/*
* Standard ioctl entry point. Called when the user wants to configure
* the interface.
*/
static int
hn_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data)
{
hn_softc_t *sc = ifp->if_softc;
struct ifreq *ifr = (struct ifreq *)data;
#ifdef INET
struct ifaddr *ifa = (struct ifaddr *)data;
#endif
netvsc_device_info device_info;
struct hv_device *hn_dev;
int mask, error = 0;
int retry_cnt = 500;
switch(cmd) {
case SIOCSIFADDR:
#ifdef INET
if (ifa->ifa_addr->sa_family == AF_INET) {
ifp->if_flags |= IFF_UP;
if (!(ifp->if_drv_flags & IFF_DRV_RUNNING))
hn_ifinit(sc);
arp_ifinit(ifp, ifa);
} else
#endif
error = ether_ioctl(ifp, cmd, data);
break;
case SIOCSIFMTU:
hn_dev = vmbus_get_devctx(sc->hn_dev);
/* Check MTU value change */
if (ifp->if_mtu == ifr->ifr_mtu)
break;
if (ifr->ifr_mtu > NETVSC_MAX_CONFIGURABLE_MTU) {
error = EINVAL;
break;
}
/* Obtain and record requested MTU */
ifp->if_mtu = ifr->ifr_mtu;
#if __FreeBSD_version >= 1100099
/*
* Make sure that LRO aggregation length limit is still
* valid, after the MTU change.
*/
NV_LOCK(sc);
if (sc->hn_rx_ring[0].hn_lro.lro_length_lim <
HN_LRO_LENLIM_MIN(ifp))
hn_set_lro_lenlim(sc, HN_LRO_LENLIM_MIN(ifp));
NV_UNLOCK(sc);
#endif
do {
NV_LOCK(sc);
if (!sc->temp_unusable) {
sc->temp_unusable = TRUE;
retry_cnt = -1;
}
NV_UNLOCK(sc);
if (retry_cnt > 0) {
retry_cnt--;
DELAY(5 * 1000);
}
} while (retry_cnt > 0);
if (retry_cnt == 0) {
error = EINVAL;
break;
}
/* We must remove and add back the device to cause the new
* MTU to take effect. This includes tearing down, but not
* deleting the channel, then bringing it back up.
*/
error = hv_rf_on_device_remove(hn_dev, HV_RF_NV_RETAIN_CHANNEL);
if (error) {
NV_LOCK(sc);
sc->temp_unusable = FALSE;
NV_UNLOCK(sc);
break;
}
error = hv_rf_on_device_add(hn_dev, &device_info,
sc->hn_rx_ring_inuse);
if (error) {
NV_LOCK(sc);
sc->temp_unusable = FALSE;
NV_UNLOCK(sc);
break;
}
sc->hn_tx_chimney_max = sc->net_dev->send_section_size;
if (sc->hn_tx_ring[0].hn_tx_chimney_size >
sc->hn_tx_chimney_max)
hn_set_tx_chimney_size(sc, sc->hn_tx_chimney_max);
hn_ifinit_locked(sc);
NV_LOCK(sc);
sc->temp_unusable = FALSE;
NV_UNLOCK(sc);
break;
case SIOCSIFFLAGS:
do {
NV_LOCK(sc);
if (!sc->temp_unusable) {
sc->temp_unusable = TRUE;
retry_cnt = -1;
}
NV_UNLOCK(sc);
if (retry_cnt > 0) {
retry_cnt--;
DELAY(5 * 1000);
}
} while (retry_cnt > 0);
if (retry_cnt == 0) {
error = EINVAL;
break;
}
if (ifp->if_flags & IFF_UP) {
/*
* If only the state of the PROMISC flag changed,
* then just use the 'set promisc mode' command
* instead of reinitializing the entire NIC. Doing
* a full re-init means reloading the firmware and
* waiting for it to start up, which may take a
* second or two.
*/
#ifdef notyet
/* Fixme: Promiscuous mode? */
if (ifp->if_drv_flags & IFF_DRV_RUNNING &&
ifp->if_flags & IFF_PROMISC &&
!(sc->hn_if_flags & IFF_PROMISC)) {
/* do something here for Hyper-V */
} else if (ifp->if_drv_flags & IFF_DRV_RUNNING &&
!(ifp->if_flags & IFF_PROMISC) &&
sc->hn_if_flags & IFF_PROMISC) {
/* do something here for Hyper-V */
} else
#endif
hn_ifinit_locked(sc);
} else {
if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
hn_stop(sc);
}
}
NV_LOCK(sc);
sc->temp_unusable = FALSE;
NV_UNLOCK(sc);
sc->hn_if_flags = ifp->if_flags;
error = 0;
break;
case SIOCSIFCAP:
NV_LOCK(sc);
mask = ifr->ifr_reqcap ^ ifp->if_capenable;
if (mask & IFCAP_TXCSUM) {
ifp->if_capenable ^= IFCAP_TXCSUM;
if (ifp->if_capenable & IFCAP_TXCSUM) {
ifp->if_hwassist |=
sc->hn_tx_ring[0].hn_csum_assist;
} else {
ifp->if_hwassist &=
~sc->hn_tx_ring[0].hn_csum_assist;
}
}
if (mask & IFCAP_RXCSUM)
ifp->if_capenable ^= IFCAP_RXCSUM;
if (mask & IFCAP_LRO)
ifp->if_capenable ^= IFCAP_LRO;
if (mask & IFCAP_TSO4) {
ifp->if_capenable ^= IFCAP_TSO4;
if (ifp->if_capenable & IFCAP_TSO4)
ifp->if_hwassist |= CSUM_IP_TSO;
else
ifp->if_hwassist &= ~CSUM_IP_TSO;
}
if (mask & IFCAP_TSO6) {
ifp->if_capenable ^= IFCAP_TSO6;
if (ifp->if_capenable & IFCAP_TSO6)
ifp->if_hwassist |= CSUM_IP6_TSO;
else
ifp->if_hwassist &= ~CSUM_IP6_TSO;
}
NV_UNLOCK(sc);
error = 0;
break;
case SIOCADDMULTI:
case SIOCDELMULTI:
#ifdef notyet
/* Fixme: Multicast mode? */
if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
NV_LOCK(sc);
netvsc_setmulti(sc);
NV_UNLOCK(sc);
error = 0;
}
#endif
error = EINVAL;
break;
case SIOCSIFMEDIA:
case SIOCGIFMEDIA:
error = ifmedia_ioctl(ifp, ifr, &sc->hn_media, cmd);
break;
default:
error = ether_ioctl(ifp, cmd, data);
break;
}
return (error);
}
/*
*
*/
static void
hn_stop(hn_softc_t *sc)
{
struct ifnet *ifp;
int ret, i;
struct hv_device *device_ctx = vmbus_get_devctx(sc->hn_dev);
ifp = sc->hn_ifp;
if (bootverbose)
printf(" Closing Device ...\n");
atomic_clear_int(&ifp->if_drv_flags,
(IFF_DRV_RUNNING | IFF_DRV_OACTIVE));
for (i = 0; i < sc->hn_tx_ring_inuse; ++i)
sc->hn_tx_ring[i].hn_oactive = 0;
if_link_state_change(ifp, LINK_STATE_DOWN);
sc->hn_initdone = 0;
ret = hv_rf_on_close(device_ctx);
}
/*
* FreeBSD transmit entry point
*/
static void
hn_start(struct ifnet *ifp)
{
struct hn_softc *sc = ifp->if_softc;
struct hn_tx_ring *txr = &sc->hn_tx_ring[0];
if (txr->hn_sched_tx)
goto do_sched;
if (mtx_trylock(&txr->hn_tx_lock)) {
int sched;
sched = hn_start_locked(txr, txr->hn_direct_tx_size);
mtx_unlock(&txr->hn_tx_lock);
if (!sched)
return;
}
do_sched:
taskqueue_enqueue(txr->hn_tx_taskq, &txr->hn_tx_task);
}
static void
hn_start_txeof(struct hn_tx_ring *txr)
{
struct hn_softc *sc = txr->hn_sc;
struct ifnet *ifp = sc->hn_ifp;
KASSERT(txr == &sc->hn_tx_ring[0], ("not the first TX ring"));
if (txr->hn_sched_tx)
goto do_sched;
if (mtx_trylock(&txr->hn_tx_lock)) {
int sched;
atomic_clear_int(&ifp->if_drv_flags, IFF_DRV_OACTIVE);
sched = hn_start_locked(txr, txr->hn_direct_tx_size);
mtx_unlock(&txr->hn_tx_lock);
if (sched) {
taskqueue_enqueue(txr->hn_tx_taskq,
&txr->hn_tx_task);
}
} else {
do_sched:
/*
* Release OACTIVE early, in the hope that others can
* catch up. The task will clear the flag again while
* holding hn_tx_lock to avoid possible races.
*/
atomic_clear_int(&ifp->if_drv_flags, IFF_DRV_OACTIVE);
taskqueue_enqueue(txr->hn_tx_taskq, &txr->hn_txeof_task);
}
}
/*
*
*/
static void
hn_ifinit_locked(hn_softc_t *sc)
{
struct ifnet *ifp;
struct hv_device *device_ctx = vmbus_get_devctx(sc->hn_dev);
int ret, i;
ifp = sc->hn_ifp;
if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
return;
}
hv_promisc_mode = 1;
ret = hv_rf_on_open(device_ctx);
if (ret != 0) {
return;
} else {
sc->hn_initdone = 1;
}
atomic_clear_int(&ifp->if_drv_flags, IFF_DRV_OACTIVE);
for (i = 0; i < sc->hn_tx_ring_inuse; ++i)
sc->hn_tx_ring[i].hn_oactive = 0;
atomic_set_int(&ifp->if_drv_flags, IFF_DRV_RUNNING);
if_link_state_change(ifp, LINK_STATE_UP);
}
/*
*
*/
static void
hn_ifinit(void *xsc)
{
hn_softc_t *sc = xsc;
NV_LOCK(sc);
if (sc->temp_unusable) {
NV_UNLOCK(sc);
return;
}
sc->temp_unusable = TRUE;
NV_UNLOCK(sc);
hn_ifinit_locked(sc);
NV_LOCK(sc);
sc->temp_unusable = FALSE;
NV_UNLOCK(sc);
}
#ifdef LATER
/*
*
*/
static void
hn_watchdog(struct ifnet *ifp)
{
hn_softc_t *sc;
sc = ifp->if_softc;
printf("hn%d: watchdog timeout -- resetting\n", sc->hn_unit);
hn_ifinit(sc); /*???*/
if_inc_counter(ifp, IFCOUNTER_OERRORS, 1);
}
#endif
#if __FreeBSD_version >= 1100099
static int
hn_lro_lenlim_sysctl(SYSCTL_HANDLER_ARGS)
{
struct hn_softc *sc = arg1;
unsigned int lenlim;
int error;
lenlim = sc->hn_rx_ring[0].hn_lro.lro_length_lim;
error = sysctl_handle_int(oidp, &lenlim, 0, req);
if (error || req->newptr == NULL)
return error;
if (lenlim < HN_LRO_LENLIM_MIN(sc->hn_ifp) ||
lenlim > TCP_LRO_LENGTH_MAX)
return EINVAL;
NV_LOCK(sc);
hn_set_lro_lenlim(sc, lenlim);
NV_UNLOCK(sc);
return 0;
}
static int
hn_lro_ackcnt_sysctl(SYSCTL_HANDLER_ARGS)
{
struct hn_softc *sc = arg1;
int ackcnt, error, i;
/*
* lro_ackcnt_lim is the append count limit;
* add 1 to turn it into the aggregation limit.
*/
ackcnt = sc->hn_rx_ring[0].hn_lro.lro_ackcnt_lim + 1;
error = sysctl_handle_int(oidp, &ackcnt, 0, req);
if (error || req->newptr == NULL)
return error;
if (ackcnt < 2 || ackcnt > (TCP_LRO_ACKCNT_MAX + 1))
return EINVAL;
/*
* Convert aggregation limit back to append
* count limit.
*/
--ackcnt;
NV_LOCK(sc);
for (i = 0; i < sc->hn_rx_ring_inuse; ++i)
sc->hn_rx_ring[i].hn_lro.lro_ackcnt_lim = ackcnt;
NV_UNLOCK(sc);
return 0;
}
#endif
static int
hn_trust_hcsum_sysctl(SYSCTL_HANDLER_ARGS)
{
struct hn_softc *sc = arg1;
int hcsum = arg2;
int on, error, i;
on = 0;
if (sc->hn_rx_ring[0].hn_trust_hcsum & hcsum)
on = 1;
error = sysctl_handle_int(oidp, &on, 0, req);
if (error || req->newptr == NULL)
return error;
NV_LOCK(sc);
for (i = 0; i < sc->hn_rx_ring_inuse; ++i) {
struct hn_rx_ring *rxr = &sc->hn_rx_ring[i];
if (on)
rxr->hn_trust_hcsum |= hcsum;
else
rxr->hn_trust_hcsum &= ~hcsum;
}
NV_UNLOCK(sc);
return 0;
}
static int
hn_tx_chimney_size_sysctl(SYSCTL_HANDLER_ARGS)
{
struct hn_softc *sc = arg1;
int chimney_size, error;
chimney_size = sc->hn_tx_ring[0].hn_tx_chimney_size;
error = sysctl_handle_int(oidp, &chimney_size, 0, req);
if (error || req->newptr == NULL)
return error;
if (chimney_size > sc->hn_tx_chimney_max || chimney_size <= 0)
return EINVAL;
hn_set_tx_chimney_size(sc, chimney_size);
return 0;
}
static int
hn_rx_stat_ulong_sysctl(SYSCTL_HANDLER_ARGS)
{
struct hn_softc *sc = arg1;
int ofs = arg2, i, error;
struct hn_rx_ring *rxr;
u_long stat;
stat = 0;
for (i = 0; i < sc->hn_rx_ring_inuse; ++i) {
rxr = &sc->hn_rx_ring[i];
stat += *((u_long *)((uint8_t *)rxr + ofs));
}
error = sysctl_handle_long(oidp, &stat, 0, req);
if (error || req->newptr == NULL)
return error;
/* Zero out this stat. */
for (i = 0; i < sc->hn_rx_ring_inuse; ++i) {
rxr = &sc->hn_rx_ring[i];
*((u_long *)((uint8_t *)rxr + ofs)) = 0;
}
return 0;
}
static int
hn_rx_stat_u64_sysctl(SYSCTL_HANDLER_ARGS)
{
struct hn_softc *sc = arg1;
int ofs = arg2, i, error;
struct hn_rx_ring *rxr;
uint64_t stat;
stat = 0;
for (i = 0; i < sc->hn_rx_ring_inuse; ++i) {
rxr = &sc->hn_rx_ring[i];
stat += *((uint64_t *)((uint8_t *)rxr + ofs));
}
error = sysctl_handle_64(oidp, &stat, 0, req);
if (error || req->newptr == NULL)
return error;
/* Zero out this stat. */
for (i = 0; i < sc->hn_rx_ring_inuse; ++i) {
rxr = &sc->hn_rx_ring[i];
*((uint64_t *)((uint8_t *)rxr + ofs)) = 0;
}
return 0;
}
static int
hn_tx_stat_ulong_sysctl(SYSCTL_HANDLER_ARGS)
{
struct hn_softc *sc = arg1;
int ofs = arg2, i, error;
struct hn_tx_ring *txr;
u_long stat;
stat = 0;
for (i = 0; i < sc->hn_tx_ring_inuse; ++i) {
txr = &sc->hn_tx_ring[i];
stat += *((u_long *)((uint8_t *)txr + ofs));
}
error = sysctl_handle_long(oidp, &stat, 0, req);
if (error || req->newptr == NULL)
return error;
/* Zero out this stat. */
for (i = 0; i < sc->hn_tx_ring_inuse; ++i) {
txr = &sc->hn_tx_ring[i];
*((u_long *)((uint8_t *)txr + ofs)) = 0;
}
return 0;
}
static int
hn_tx_conf_int_sysctl(SYSCTL_HANDLER_ARGS)
{
struct hn_softc *sc = arg1;
int ofs = arg2, i, error, conf;
struct hn_tx_ring *txr;
txr = &sc->hn_tx_ring[0];
conf = *((int *)((uint8_t *)txr + ofs));
error = sysctl_handle_int(oidp, &conf, 0, req);
if (error || req->newptr == NULL)
return error;
NV_LOCK(sc);
for (i = 0; i < sc->hn_tx_ring_inuse; ++i) {
txr = &sc->hn_tx_ring[i];
*((int *)((uint8_t *)txr + ofs)) = conf;
}
NV_UNLOCK(sc);
return 0;
}
static int
hn_check_iplen(const struct mbuf *m, int hoff)
{
const struct ip *ip;
int len, iphlen, iplen;
const struct tcphdr *th;
int thoff; /* TCP data offset */
len = hoff + sizeof(struct ip);
/* The packet must be at least the size of an IP header. */
if (m->m_pkthdr.len < len)
return IPPROTO_DONE;
/* The fixed IP header must reside completely in the first mbuf. */
if (m->m_len < len)
return IPPROTO_DONE;
ip = mtodo(m, hoff);
/* Bound check the packet's stated IP header length. */
iphlen = ip->ip_hl << 2;
if (iphlen < sizeof(struct ip)) /* minimum header length */
return IPPROTO_DONE;
/* The full IP header must reside completely in the first mbuf. */
if (m->m_len < hoff + iphlen)
return IPPROTO_DONE;
iplen = ntohs(ip->ip_len);
/*
* Check that the amount of data in the buffers is at
* least as much as the IP header would have us expect.
*/
if (m->m_pkthdr.len < hoff + iplen)
return IPPROTO_DONE;
/*
* Ignore IP fragments.
*/
if (ntohs(ip->ip_off) & (IP_OFFMASK | IP_MF))
return IPPROTO_DONE;
/*
* The TCP/IP or UDP/IP header must be entirely contained within
* the first fragment of a packet.
*/
switch (ip->ip_p) {
case IPPROTO_TCP:
if (iplen < iphlen + sizeof(struct tcphdr))
return IPPROTO_DONE;
if (m->m_len < hoff + iphlen + sizeof(struct tcphdr))
return IPPROTO_DONE;
th = (const struct tcphdr *)((const uint8_t *)ip + iphlen);
thoff = th->th_off << 2;
if (thoff < sizeof(struct tcphdr) || thoff + iphlen > iplen)
return IPPROTO_DONE;
if (m->m_len < hoff + iphlen + thoff)
return IPPROTO_DONE;
break;
case IPPROTO_UDP:
if (iplen < iphlen + sizeof(struct udphdr))
return IPPROTO_DONE;
if (m->m_len < hoff + iphlen + sizeof(struct udphdr))
return IPPROTO_DONE;
break;
default:
if (iplen < iphlen)
return IPPROTO_DONE;
break;
}
return ip->ip_p;
}
static void
hn_create_rx_data(struct hn_softc *sc, int ring_cnt)
{
struct sysctl_oid_list *child;
struct sysctl_ctx_list *ctx;
device_t dev = sc->hn_dev;
#if defined(INET) || defined(INET6)
#if __FreeBSD_version >= 1100095
int lroent_cnt;
#endif
#endif
int i;
sc->hn_rx_ring_cnt = ring_cnt;
sc->hn_rx_ring_inuse = sc->hn_rx_ring_cnt;
sc->hn_rx_ring = malloc(sizeof(struct hn_rx_ring) * sc->hn_rx_ring_cnt,
M_NETVSC, M_WAITOK | M_ZERO);
#if defined(INET) || defined(INET6)
#if __FreeBSD_version >= 1100095
lroent_cnt = hn_lro_entry_count;
if (lroent_cnt < TCP_LRO_ENTRIES)
lroent_cnt = TCP_LRO_ENTRIES;
device_printf(dev, "LRO: entry count %d\n", lroent_cnt);
#endif
#endif /* INET || INET6 */
ctx = device_get_sysctl_ctx(dev);
child = SYSCTL_CHILDREN(device_get_sysctl_tree(dev));
/* Create dev.hn.UNIT.rx sysctl tree */
sc->hn_rx_sysctl_tree = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "rx",
CTLFLAG_RD | CTLFLAG_MPSAFE, 0, "");
for (i = 0; i < sc->hn_rx_ring_cnt; ++i) {
struct hn_rx_ring *rxr = &sc->hn_rx_ring[i];
if (hn_trust_hosttcp)
rxr->hn_trust_hcsum |= HN_TRUST_HCSUM_TCP;
if (hn_trust_hostudp)
rxr->hn_trust_hcsum |= HN_TRUST_HCSUM_UDP;
if (hn_trust_hostip)
rxr->hn_trust_hcsum |= HN_TRUST_HCSUM_IP;
rxr->hn_ifp = sc->hn_ifp;
rxr->hn_rx_idx = i;
/*
* Initialize LRO.
*/
#if defined(INET) || defined(INET6)
#if __FreeBSD_version >= 1100095
tcp_lro_init_args(&rxr->hn_lro, sc->hn_ifp, lroent_cnt, 0);
#else
tcp_lro_init(&rxr->hn_lro);
rxr->hn_lro.ifp = sc->hn_ifp;
#endif
#if __FreeBSD_version >= 1100099
rxr->hn_lro.lro_length_lim = HN_LRO_LENLIM_DEF;
rxr->hn_lro.lro_ackcnt_lim = HN_LRO_ACKCNT_DEF;
#endif
#endif /* INET || INET6 */
if (sc->hn_rx_sysctl_tree != NULL) {
char name[16];
/*
* Create per RX ring sysctl tree:
* dev.hn.UNIT.rx.RINGID
*/
snprintf(name, sizeof(name), "%d", i);
rxr->hn_rx_sysctl_tree = SYSCTL_ADD_NODE(ctx,
SYSCTL_CHILDREN(sc->hn_rx_sysctl_tree),
OID_AUTO, name, CTLFLAG_RD | CTLFLAG_MPSAFE, 0, "");
if (rxr->hn_rx_sysctl_tree != NULL) {
SYSCTL_ADD_ULONG(ctx,
SYSCTL_CHILDREN(rxr->hn_rx_sysctl_tree),
OID_AUTO, "packets", CTLFLAG_RW,
&rxr->hn_pkts, "# of packets received");
SYSCTL_ADD_ULONG(ctx,
SYSCTL_CHILDREN(rxr->hn_rx_sysctl_tree),
OID_AUTO, "rss_pkts", CTLFLAG_RW,
&rxr->hn_rss_pkts,
"# of packets w/ RSS info received");
}
}
}
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "lro_queued",
CTLTYPE_U64 | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
__offsetof(struct hn_rx_ring, hn_lro.lro_queued),
hn_rx_stat_u64_sysctl, "LU", "LRO queued");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "lro_flushed",
CTLTYPE_U64 | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
__offsetof(struct hn_rx_ring, hn_lro.lro_flushed),
hn_rx_stat_u64_sysctl, "LU", "LRO flushed");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "lro_tried",
CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
__offsetof(struct hn_rx_ring, hn_lro_tried),
hn_rx_stat_ulong_sysctl, "LU", "# of LRO tries");
#if __FreeBSD_version >= 1100099
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "lro_length_lim",
CTLTYPE_UINT | CTLFLAG_RW | CTLFLAG_MPSAFE, sc, 0,
hn_lro_lenlim_sysctl, "IU",
"Max # of data bytes to be aggregated by LRO");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "lro_ackcnt_lim",
CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, sc, 0,
hn_lro_ackcnt_sysctl, "I",
"Max # of ACKs to be aggregated by LRO");
#endif
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "trust_hosttcp",
CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, sc, HN_TRUST_HCSUM_TCP,
hn_trust_hcsum_sysctl, "I",
"Trust tcp segment verification on host side, "
"when csum info is missing");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "trust_hostudp",
CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, sc, HN_TRUST_HCSUM_UDP,
hn_trust_hcsum_sysctl, "I",
"Trust udp datagram verification on host side, "
"when csum info is missing");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "trust_hostip",
CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, sc, HN_TRUST_HCSUM_IP,
hn_trust_hcsum_sysctl, "I",
"Trust ip packet verification on host side, "
"when csum info is missing");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "csum_ip",
CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
__offsetof(struct hn_rx_ring, hn_csum_ip),
hn_rx_stat_ulong_sysctl, "LU", "RXCSUM IP");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "csum_tcp",
CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
__offsetof(struct hn_rx_ring, hn_csum_tcp),
hn_rx_stat_ulong_sysctl, "LU", "RXCSUM TCP");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "csum_udp",
CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
__offsetof(struct hn_rx_ring, hn_csum_udp),
hn_rx_stat_ulong_sysctl, "LU", "RXCSUM UDP");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "csum_trusted",
CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
__offsetof(struct hn_rx_ring, hn_csum_trusted),
hn_rx_stat_ulong_sysctl, "LU",
"# of packets for which we trust host's csum verification");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "small_pkts",
CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
__offsetof(struct hn_rx_ring, hn_small_pkts),
hn_rx_stat_ulong_sysctl, "LU", "# of small packets received");
SYSCTL_ADD_INT(ctx, child, OID_AUTO, "rx_ring_cnt",
CTLFLAG_RD, &sc->hn_rx_ring_cnt, 0, "# created RX rings");
SYSCTL_ADD_INT(ctx, child, OID_AUTO, "rx_ring_inuse",
CTLFLAG_RD, &sc->hn_rx_ring_inuse, 0, "# used RX rings");
}
static void
hn_destroy_rx_data(struct hn_softc *sc)
{
#if defined(INET) || defined(INET6)
int i;
#endif
if (sc->hn_rx_ring_cnt == 0)
return;
#if defined(INET) || defined(INET6)
for (i = 0; i < sc->hn_rx_ring_cnt; ++i)
tcp_lro_free(&sc->hn_rx_ring[i].hn_lro);
#endif
free(sc->hn_rx_ring, M_NETVSC);
sc->hn_rx_ring = NULL;
sc->hn_rx_ring_cnt = 0;
sc->hn_rx_ring_inuse = 0;
}
static int
hn_create_tx_ring(struct hn_softc *sc, int id)
{
struct hn_tx_ring *txr = &sc->hn_tx_ring[id];
bus_dma_tag_t parent_dtag;
int error, i;
txr->hn_sc = sc;
txr->hn_tx_idx = id;
#ifndef HN_USE_TXDESC_BUFRING
mtx_init(&txr->hn_txlist_spin, "hn txlist", NULL, MTX_SPIN);
#endif
mtx_init(&txr->hn_tx_lock, "hn tx", NULL, MTX_DEF);
txr->hn_txdesc_cnt = HN_TX_DESC_CNT;
txr->hn_txdesc = malloc(sizeof(struct hn_txdesc) * txr->hn_txdesc_cnt,
M_NETVSC, M_WAITOK | M_ZERO);
#ifndef HN_USE_TXDESC_BUFRING
SLIST_INIT(&txr->hn_txlist);
#else
txr->hn_txdesc_br = buf_ring_alloc(txr->hn_txdesc_cnt, M_NETVSC,
M_WAITOK, &txr->hn_tx_lock);
#endif
txr->hn_tx_taskq = sc->hn_tx_taskq;
if (hn_use_if_start) {
txr->hn_txeof = hn_start_txeof;
TASK_INIT(&txr->hn_tx_task, 0, hn_start_taskfunc, txr);
TASK_INIT(&txr->hn_txeof_task, 0, hn_start_txeof_taskfunc, txr);
} else {
int br_depth;
txr->hn_txeof = hn_xmit_txeof;
TASK_INIT(&txr->hn_tx_task, 0, hn_xmit_taskfunc, txr);
TASK_INIT(&txr->hn_txeof_task, 0, hn_xmit_txeof_taskfunc, txr);
br_depth = hn_get_txswq_depth(txr);
txr->hn_mbuf_br = buf_ring_alloc(br_depth, M_NETVSC,
M_WAITOK, &txr->hn_tx_lock);
}
txr->hn_direct_tx_size = hn_direct_tx_size;
if (hv_vmbus_protocal_version >= HV_VMBUS_VERSION_WIN8_1)
txr->hn_csum_assist = HN_CSUM_ASSIST;
else
txr->hn_csum_assist = HN_CSUM_ASSIST_WIN8;
/*
* Always schedule transmission instead of trying to do direct
* transmission. This one gives the best performance so far.
*/
txr->hn_sched_tx = 1;
parent_dtag = bus_get_dma_tag(sc->hn_dev);
/* DMA tag for RNDIS messages. */
error = bus_dma_tag_create(parent_dtag, /* parent */
HN_RNDIS_MSG_ALIGN, /* alignment */
HN_RNDIS_MSG_BOUNDARY, /* boundary */
BUS_SPACE_MAXADDR, /* lowaddr */
BUS_SPACE_MAXADDR, /* highaddr */
NULL, NULL, /* filter, filterarg */
HN_RNDIS_MSG_LEN, /* maxsize */
1, /* nsegments */
HN_RNDIS_MSG_LEN, /* maxsegsize */
0, /* flags */
NULL, /* lockfunc */
NULL, /* lockfuncarg */
&txr->hn_tx_rndis_dtag);
if (error) {
device_printf(sc->hn_dev, "failed to create rndis dmatag\n");
return error;
}
/* DMA tag for data. */
error = bus_dma_tag_create(parent_dtag, /* parent */
1, /* alignment */
HN_TX_DATA_BOUNDARY, /* boundary */
BUS_SPACE_MAXADDR, /* lowaddr */
BUS_SPACE_MAXADDR, /* highaddr */
NULL, NULL, /* filter, filterarg */
HN_TX_DATA_MAXSIZE, /* maxsize */
HN_TX_DATA_SEGCNT_MAX, /* nsegments */
HN_TX_DATA_SEGSIZE, /* maxsegsize */
0, /* flags */
NULL, /* lockfunc */
NULL, /* lockfuncarg */
&txr->hn_tx_data_dtag);
if (error) {
device_printf(sc->hn_dev, "failed to create data dmatag\n");
return error;
}
for (i = 0; i < txr->hn_txdesc_cnt; ++i) {
struct hn_txdesc *txd = &txr->hn_txdesc[i];
txd->txr = txr;
/*
* Allocate and load RNDIS messages.
*/
error = bus_dmamem_alloc(txr->hn_tx_rndis_dtag,
(void **)&txd->rndis_msg,
BUS_DMA_WAITOK | BUS_DMA_COHERENT,
&txd->rndis_msg_dmap);
if (error) {
device_printf(sc->hn_dev,
"failed to allocate rndis_msg, %d\n", i);
return error;
}
error = bus_dmamap_load(txr->hn_tx_rndis_dtag,
txd->rndis_msg_dmap,
txd->rndis_msg, HN_RNDIS_MSG_LEN,
hyperv_dma_map_paddr, &txd->rndis_msg_paddr,
BUS_DMA_NOWAIT);
if (error) {
device_printf(sc->hn_dev,
"failed to load rndis_msg, %d\n", i);
bus_dmamem_free(txr->hn_tx_rndis_dtag,
txd->rndis_msg, txd->rndis_msg_dmap);
return error;
}
/* DMA map for TX data. */
error = bus_dmamap_create(txr->hn_tx_data_dtag, 0,
&txd->data_dmap);
if (error) {
device_printf(sc->hn_dev,
"failed to allocate tx data dmamap\n");
bus_dmamap_unload(txr->hn_tx_rndis_dtag,
txd->rndis_msg_dmap);
bus_dmamem_free(txr->hn_tx_rndis_dtag,
txd->rndis_msg, txd->rndis_msg_dmap);
return error;
}
/* All set, put it on the list */
txd->flags |= HN_TXD_FLAG_ONLIST;
#ifndef HN_USE_TXDESC_BUFRING
SLIST_INSERT_HEAD(&txr->hn_txlist, txd, link);
#else
buf_ring_enqueue(txr->hn_txdesc_br, txd);
#endif
}
txr->hn_txdesc_avail = txr->hn_txdesc_cnt;
if (sc->hn_tx_sysctl_tree != NULL) {
struct sysctl_oid_list *child;
struct sysctl_ctx_list *ctx;
char name[16];
/*
* Create per TX ring sysctl tree:
* dev.hn.UNIT.tx.RINGID
*/
ctx = device_get_sysctl_ctx(sc->hn_dev);
child = SYSCTL_CHILDREN(sc->hn_tx_sysctl_tree);
snprintf(name, sizeof(name), "%d", id);
txr->hn_tx_sysctl_tree = SYSCTL_ADD_NODE(ctx, child, OID_AUTO,
name, CTLFLAG_RD | CTLFLAG_MPSAFE, 0, "");
if (txr->hn_tx_sysctl_tree != NULL) {
child = SYSCTL_CHILDREN(txr->hn_tx_sysctl_tree);
SYSCTL_ADD_INT(ctx, child, OID_AUTO, "txdesc_avail",
CTLFLAG_RD, &txr->hn_txdesc_avail, 0,
"# of available TX descs");
if (!hn_use_if_start) {
SYSCTL_ADD_INT(ctx, child, OID_AUTO, "oactive",
CTLFLAG_RD, &txr->hn_oactive, 0,
"over active");
}
SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "packets",
CTLFLAG_RW, &txr->hn_pkts,
"# of packets transmitted");
}
}
return 0;
}
static void
hn_txdesc_dmamap_destroy(struct hn_txdesc *txd)
{
struct hn_tx_ring *txr = txd->txr;
KASSERT(txd->m == NULL, ("still has mbuf installed"));
KASSERT((txd->flags & HN_TXD_FLAG_DMAMAP) == 0, ("still dma mapped"));
bus_dmamap_unload(txr->hn_tx_rndis_dtag, txd->rndis_msg_dmap);
bus_dmamem_free(txr->hn_tx_rndis_dtag, txd->rndis_msg,
txd->rndis_msg_dmap);
bus_dmamap_destroy(txr->hn_tx_data_dtag, txd->data_dmap);
}
static void
hn_destroy_tx_ring(struct hn_tx_ring *txr)
{
struct hn_txdesc *txd;
if (txr->hn_txdesc == NULL)
return;
#ifndef HN_USE_TXDESC_BUFRING
while ((txd = SLIST_FIRST(&txr->hn_txlist)) != NULL) {
SLIST_REMOVE_HEAD(&txr->hn_txlist, link);
hn_txdesc_dmamap_destroy(txd);
}
#else
mtx_lock(&txr->hn_tx_lock);
while ((txd = buf_ring_dequeue_sc(txr->hn_txdesc_br)) != NULL)
hn_txdesc_dmamap_destroy(txd);
mtx_unlock(&txr->hn_tx_lock);
#endif
if (txr->hn_tx_data_dtag != NULL)
bus_dma_tag_destroy(txr->hn_tx_data_dtag);
if (txr->hn_tx_rndis_dtag != NULL)
bus_dma_tag_destroy(txr->hn_tx_rndis_dtag);
#ifdef HN_USE_TXDESC_BUFRING
buf_ring_free(txr->hn_txdesc_br, M_NETVSC);
#endif
free(txr->hn_txdesc, M_NETVSC);
txr->hn_txdesc = NULL;
if (txr->hn_mbuf_br != NULL)
buf_ring_free(txr->hn_mbuf_br, M_NETVSC);
#ifndef HN_USE_TXDESC_BUFRING
mtx_destroy(&txr->hn_txlist_spin);
#endif
mtx_destroy(&txr->hn_tx_lock);
}
static int
hn_create_tx_data(struct hn_softc *sc, int ring_cnt)
{
struct sysctl_oid_list *child;
struct sysctl_ctx_list *ctx;
int i;
sc->hn_tx_ring_cnt = ring_cnt;
sc->hn_tx_ring_inuse = sc->hn_tx_ring_cnt;
sc->hn_tx_ring = malloc(sizeof(struct hn_tx_ring) * sc->hn_tx_ring_cnt,
M_NETVSC, M_WAITOK | M_ZERO);
ctx = device_get_sysctl_ctx(sc->hn_dev);
child = SYSCTL_CHILDREN(device_get_sysctl_tree(sc->hn_dev));
/* Create dev.hn.UNIT.tx sysctl tree */
sc->hn_tx_sysctl_tree = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "tx",
CTLFLAG_RD | CTLFLAG_MPSAFE, 0, "");
for (i = 0; i < sc->hn_tx_ring_cnt; ++i) {
int error;
error = hn_create_tx_ring(sc, i);
if (error)
return error;
}
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "no_txdescs",
CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
__offsetof(struct hn_tx_ring, hn_no_txdescs),
hn_tx_stat_ulong_sysctl, "LU", "# of times short of TX descs");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "send_failed",
CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
__offsetof(struct hn_tx_ring, hn_send_failed),
hn_tx_stat_ulong_sysctl, "LU", "# of hyper-v sending failure");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "txdma_failed",
CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
__offsetof(struct hn_tx_ring, hn_txdma_failed),
hn_tx_stat_ulong_sysctl, "LU", "# of TX DMA failure");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "tx_collapsed",
CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
__offsetof(struct hn_tx_ring, hn_tx_collapsed),
hn_tx_stat_ulong_sysctl, "LU", "# of TX mbuf collapsed");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "tx_chimney",
CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
__offsetof(struct hn_tx_ring, hn_tx_chimney),
hn_tx_stat_ulong_sysctl, "LU", "# of chimney send");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "tx_chimney_tried",
CTLTYPE_ULONG | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
__offsetof(struct hn_tx_ring, hn_tx_chimney_tried),
hn_tx_stat_ulong_sysctl, "LU", "# of chimney send tries");
SYSCTL_ADD_INT(ctx, child, OID_AUTO, "txdesc_cnt",
CTLFLAG_RD, &sc->hn_tx_ring[0].hn_txdesc_cnt, 0,
"# of total TX descs");
SYSCTL_ADD_INT(ctx, child, OID_AUTO, "tx_chimney_max",
CTLFLAG_RD, &sc->hn_tx_chimney_max, 0,
"Chimney send packet size upper boundary");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "tx_chimney_size",
CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, sc, 0,
hn_tx_chimney_size_sysctl,
"I", "Chimney send packet size limit");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "direct_tx_size",
CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
__offsetof(struct hn_tx_ring, hn_direct_tx_size),
hn_tx_conf_int_sysctl, "I",
"Size of the packet for direct transmission");
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "sched_tx",
CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, sc,
__offsetof(struct hn_tx_ring, hn_sched_tx),
hn_tx_conf_int_sysctl, "I",
"Always schedule transmission "
"instead of doing direct transmission");
SYSCTL_ADD_INT(ctx, child, OID_AUTO, "tx_ring_cnt",
CTLFLAG_RD, &sc->hn_tx_ring_cnt, 0, "# created TX rings");
SYSCTL_ADD_INT(ctx, child, OID_AUTO, "tx_ring_inuse",
CTLFLAG_RD, &sc->hn_tx_ring_inuse, 0, "# used TX rings");
return 0;
}
static void
hn_set_tx_chimney_size(struct hn_softc *sc, int chimney_size)
{
int i;
NV_LOCK(sc);
for (i = 0; i < sc->hn_tx_ring_inuse; ++i)
sc->hn_tx_ring[i].hn_tx_chimney_size = chimney_size;
NV_UNLOCK(sc);
}
static void
hn_destroy_tx_data(struct hn_softc *sc)
{
int i;
if (sc->hn_tx_ring_cnt == 0)
return;
for (i = 0; i < sc->hn_tx_ring_cnt; ++i)
hn_destroy_tx_ring(&sc->hn_tx_ring[i]);
free(sc->hn_tx_ring, M_NETVSC);
sc->hn_tx_ring = NULL;
sc->hn_tx_ring_cnt = 0;
sc->hn_tx_ring_inuse = 0;
}
static void
hn_start_taskfunc(void *xtxr, int pending __unused)
{
struct hn_tx_ring *txr = xtxr;
mtx_lock(&txr->hn_tx_lock);
hn_start_locked(txr, 0);
mtx_unlock(&txr->hn_tx_lock);
}
static void
hn_start_txeof_taskfunc(void *xtxr, int pending __unused)
{
struct hn_tx_ring *txr = xtxr;
mtx_lock(&txr->hn_tx_lock);
atomic_clear_int(&txr->hn_sc->hn_ifp->if_drv_flags, IFF_DRV_OACTIVE);
hn_start_locked(txr, 0);
mtx_unlock(&txr->hn_tx_lock);
}
static void
hn_stop_tx_tasks(struct hn_softc *sc)
{
int i;
for (i = 0; i < sc->hn_tx_ring_inuse; ++i) {
struct hn_tx_ring *txr = &sc->hn_tx_ring[i];
taskqueue_drain(txr->hn_tx_taskq, &txr->hn_tx_task);
taskqueue_drain(txr->hn_tx_taskq, &txr->hn_txeof_task);
}
}
static int
hn_xmit(struct hn_tx_ring *txr, int len)
{
struct hn_softc *sc = txr->hn_sc;
struct ifnet *ifp = sc->hn_ifp;
struct mbuf *m_head;
mtx_assert(&txr->hn_tx_lock, MA_OWNED);
KASSERT(hn_use_if_start == 0,
("hn_xmit is called, when if_start is enabled"));
if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0 || txr->hn_oactive)
return 0;
while ((m_head = drbr_peek(ifp, txr->hn_mbuf_br)) != NULL) {
struct hn_txdesc *txd;
int error;
if (len > 0 && m_head->m_pkthdr.len > len) {
/*
* Sending this packet could be time consuming; let the
* caller dispatch it (and any packets that follow) to
* the TX taskqueue.
*/
drbr_putback(ifp, txr->hn_mbuf_br, m_head);
return 1;
}
txd = hn_txdesc_get(txr);
if (txd == NULL) {
txr->hn_no_txdescs++;
drbr_putback(ifp, txr->hn_mbuf_br, m_head);
txr->hn_oactive = 1;
break;
}
error = hn_encap(txr, txd, &m_head);
if (error) {
/* Both txd and m_head are freed; discard */
drbr_advance(ifp, txr->hn_mbuf_br);
continue;
}
error = hn_send_pkt(ifp, txr, txd);
if (__predict_false(error)) {
/* txd is freed, but m_head is not */
drbr_putback(ifp, txr->hn_mbuf_br, m_head);
txr->hn_oactive = 1;
break;
}
/* Sent */
drbr_advance(ifp, txr->hn_mbuf_br);
}
return 0;
}
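/*
 * Illustrative sketch, not part of the driver: the peek/putback/advance
 * loop above can be modelled with a small single-threaded ring. Every
 * sketch_* name and the `budget` parameter (a stand-in for available TX
 * descriptors) are hypothetical, and the ring is a simplification, not
 * the kernel's buf_ring.
 */

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_RING_SIZE 8	/* hypothetical capacity */

/* Hypothetical single-threaded ring; not the kernel's buf_ring. */
struct sketch_ring {
	int	*slots[SKETCH_RING_SIZE];
	size_t	 head;	/* next slot to consume */
	size_t	 tail;	/* next slot to fill */
};

static int
sketch_enqueue(struct sketch_ring *r, int *item)
{
	if (r->tail - r->head == SKETCH_RING_SIZE)
		return (-1);	/* ring is full */
	r->slots[r->tail % SKETCH_RING_SIZE] = item;
	r->tail++;
	return (0);
}

/* Look at the head item without removing it. */
static int *
sketch_peek(struct sketch_ring *r)
{
	if (r->head == r->tail)
		return (NULL);
	return (r->slots[r->head % SKETCH_RING_SIZE]);
}

/* Keep the head slot occupied, possibly with a replacement item. */
static void
sketch_putback(struct sketch_ring *r, int *item)
{
	assert(r->head != r->tail);
	r->slots[r->head % SKETCH_RING_SIZE] = item;
}

/* Drop the head item once it has been sent or discarded. */
static void
sketch_advance(struct sketch_ring *r)
{
	assert(r->head != r->tail);
	r->head++;
}

/*
 * Send items until the ring is empty or the budget runs out.
 * An item that cannot be sent stays at the head for a later retry.
 * Returns the number of items sent.
 */
static int
sketch_drain(struct sketch_ring *r, int budget)
{
	int *item, sent = 0;

	while ((item = sketch_peek(r)) != NULL) {
		if (budget == 0) {
			sketch_putback(r, item);
			break;
		}
		budget--;
		sketch_advance(r);
		sent++;
	}
	return (sent);
}
```

/*
 * The point of peeking first is that a packet leaves the ring only after
 * it has been handed off; a shortage of resources leaves it at the head.
 */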
static int
hn_transmit(struct ifnet *ifp, struct mbuf *m)
{
struct hn_softc *sc = ifp->if_softc;
struct hn_tx_ring *txr;
int error, idx = 0;
/*
* Select the TX ring based on flowid
*/
if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE)
idx = m->m_pkthdr.flowid % sc->hn_tx_ring_inuse;
txr = &sc->hn_tx_ring[idx];
error = drbr_enqueue(ifp, txr->hn_mbuf_br, m);
if (error) {
if_inc_counter(ifp, IFCOUNTER_OQDROPS, 1);
return error;
}
if (txr->hn_oactive)
return 0;
if (txr->hn_sched_tx)
goto do_sched;
if (mtx_trylock(&txr->hn_tx_lock)) {
int sched;
sched = hn_xmit(txr, txr->hn_direct_tx_size);
mtx_unlock(&txr->hn_tx_lock);
if (!sched)
return 0;
}
do_sched:
taskqueue_enqueue(txr->hn_tx_taskq, &txr->hn_tx_task);
return 0;
}
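/*
 * Illustrative sketch, not part of the driver: the ring selection at the
 * top of hn_transmit() reduces to a modulo over the rings in use. The
 * function name and its parameters are hypothetical stand-ins for
 * M_HASHTYPE_GET(m) != M_HASHTYPE_NONE, m->m_pkthdr.flowid and
 * sc->hn_tx_ring_inuse.
 */

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick a TX ring index for a packet. Packets without a flow hash
 * always use ring 0; hashed packets are spread by flowid.
 */
static int
sketch_select_ring(int has_hash, uint32_t flowid, int rings_inuse)
{
	assert(rings_inuse > 0);
	if (!has_hash)
		return (0);
	return ((int)(flowid % (uint32_t)rings_inuse));
}
```

/*
 * Packets of one flow share a flowid, so they land on the same ring and
 * keep their relative order.
 */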
static void
hn_xmit_qflush(struct ifnet *ifp)
{
struct hn_softc *sc = ifp->if_softc;
int i;
for (i = 0; i < sc->hn_tx_ring_inuse; ++i) {
struct hn_tx_ring *txr = &sc->hn_tx_ring[i];
struct mbuf *m;
mtx_lock(&txr->hn_tx_lock);
while ((m = buf_ring_dequeue_sc(txr->hn_mbuf_br)) != NULL)
m_freem(m);
mtx_unlock(&txr->hn_tx_lock);
}
if_qflush(ifp);
}
static void
hn_xmit_txeof(struct hn_tx_ring *txr)
{
if (txr->hn_sched_tx)
goto do_sched;
if (mtx_trylock(&txr->hn_tx_lock)) {
int sched;
txr->hn_oactive = 0;
sched = hn_xmit(txr, txr->hn_direct_tx_size);
mtx_unlock(&txr->hn_tx_lock);
if (sched) {
taskqueue_enqueue(txr->hn_tx_taskq,
&txr->hn_tx_task);
}
} else {
do_sched:
/*
* Release oactive early, in the hope that others
* can catch up. The task will clear oactive again
* while holding hn_tx_lock to avoid possible
* races.
*/
txr->hn_oactive = 0;
taskqueue_enqueue(txr->hn_tx_taskq, &txr->hn_txeof_task);
}
}
static void
hn_xmit_taskfunc(void *xtxr, int pending __unused)
{
struct hn_tx_ring *txr = xtxr;
mtx_lock(&txr->hn_tx_lock);
hn_xmit(txr, 0);
mtx_unlock(&txr->hn_tx_lock);
}
static void
hn_xmit_txeof_taskfunc(void *xtxr, int pending __unused)
{
struct hn_tx_ring *txr = xtxr;
mtx_lock(&txr->hn_tx_lock);
txr->hn_oactive = 0;
hn_xmit(txr, 0);
mtx_unlock(&txr->hn_tx_lock);
}
static void
hn_channel_attach(struct hn_softc *sc, struct hv_vmbus_channel *chan)
{
struct hn_rx_ring *rxr;
int idx;
idx = chan->offer_msg.offer.sub_channel_index;
KASSERT(idx >= 0 && idx < sc->hn_rx_ring_inuse,
("invalid channel index %d, should be >= 0 && < %d",
idx, sc->hn_rx_ring_inuse));
rxr = &sc->hn_rx_ring[idx];
KASSERT((rxr->hn_rx_flags & HN_RX_FLAG_ATTACHED) == 0,
("RX ring %d already attached", idx));
rxr->hn_rx_flags |= HN_RX_FLAG_ATTACHED;
chan->hv_chan_rxr = rxr;
if (bootverbose) {
if_printf(sc->hn_ifp, "link RX ring %d to channel%u\n",
idx, chan->offer_msg.child_rel_id);
}
if (idx < sc->hn_tx_ring_inuse) {
struct hn_tx_ring *txr = &sc->hn_tx_ring[idx];
KASSERT((txr->hn_tx_flags & HN_TX_FLAG_ATTACHED) == 0,
("TX ring %d already attached", idx));
txr->hn_tx_flags |= HN_TX_FLAG_ATTACHED;
chan->hv_chan_txr = txr;
txr->hn_chan = chan;
if (bootverbose) {
if_printf(sc->hn_ifp, "link TX ring %d to channel%u\n",
idx, chan->offer_msg.child_rel_id);
}
}
/* Bind channel to a proper CPU */
vmbus_channel_cpu_set(chan, (sc->hn_cpu + idx) % mp_ncpus);
}
static void
hn_subchan_attach(struct hn_softc *sc, struct hv_vmbus_channel *chan)
{
KASSERT(!HV_VMBUS_CHAN_ISPRIMARY(chan),
("subchannel callback on primary channel"));
KASSERT(chan->offer_msg.offer.sub_channel_index > 0,
("invalid channel subidx %u",
chan->offer_msg.offer.sub_channel_index));
hn_channel_attach(sc, chan);
}
static void
hn_tx_taskq_create(void *arg __unused)
{
if (!hn_share_tx_taskq)
return;
hn_tx_taskq = taskqueue_create("hn_tx", M_WAITOK,
taskqueue_thread_enqueue, &hn_tx_taskq);
if (hn_bind_tx_taskq >= 0) {
int cpu = hn_bind_tx_taskq;
cpuset_t cpu_set;
if (cpu > mp_ncpus - 1)
cpu = mp_ncpus - 1;
CPU_SETOF(cpu, &cpu_set);
taskqueue_start_threads_cpuset(&hn_tx_taskq, 1, PI_NET,
&cpu_set, "hn tx");
} else {
taskqueue_start_threads(&hn_tx_taskq, 1, PI_NET, "hn tx");
}
}
SYSINIT(hn_txtq_create, SI_SUB_DRIVERS, SI_ORDER_FIRST,
hn_tx_taskq_create, NULL);
static void
hn_tx_taskq_destroy(void *arg __unused)
{
if (hn_tx_taskq != NULL)
taskqueue_free(hn_tx_taskq);
}
SYSUNINIT(hn_txtq_destroy, SI_SUB_DRIVERS, SI_ORDER_FIRST,
hn_tx_taskq_destroy, NULL);
static device_method_t netvsc_methods[] = {
/* Device interface */
DEVMETHOD(device_probe, netvsc_probe),
DEVMETHOD(device_attach, netvsc_attach),
DEVMETHOD(device_detach, netvsc_detach),
DEVMETHOD(device_shutdown, netvsc_shutdown),
{ 0, 0 }
};
static driver_t netvsc_driver = {
NETVSC_DEVNAME,
netvsc_methods,
sizeof(hn_softc_t)
};
static devclass_t netvsc_devclass;
DRIVER_MODULE(hn, vmbus, netvsc_driver, netvsc_devclass, 0, 0);
MODULE_VERSION(hn, 1);
MODULE_DEPEND(hn, vmbus, 1, 1, 1);
Index: projects/vnet/sys/dev/ixgbe/ix_txrx.c
===================================================================
--- projects/vnet/sys/dev/ixgbe/ix_txrx.c (revision 301546)
+++ projects/vnet/sys/dev/ixgbe/ix_txrx.c (revision 301547)
@@ -1,2303 +1,2303 @@
/******************************************************************************
Copyright (c) 2001-2015, Intel Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of the Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
******************************************************************************/
/*$FreeBSD$*/
#ifndef IXGBE_STANDALONE_BUILD
#include "opt_inet.h"
#include "opt_inet6.h"
#include "opt_rss.h"
#endif
#include "ixgbe.h"
#ifdef RSS
#include
#include
#endif
#ifdef DEV_NETMAP
#include
#include
#include
extern int ix_crcstrip;
#endif
/*
** HW RSC control:
** this feature only works with
** IPv4, and only on 82599 and later.
** It also causes IP forwarding to
** fail, and the stack cannot control
** that the way it can with LRO. For
** these reasons it is left off with
** no tunable interface; enabling it
** requires recompiling the driver.
*/
static bool ixgbe_rsc_enable = FALSE;
#ifdef IXGBE_FDIR
/*
** For Flow Director: this is the
** number of TX packets we sample
** for the filter pool, this means
** every 20th packet will be probed.
**
** This feature can be disabled by
** setting this to 0.
*/
static int atr_sample_rate = 20;
#endif
/*********************************************************************
* Local Function prototypes
*********************************************************************/
static void ixgbe_setup_transmit_ring(struct tx_ring *);
static void ixgbe_free_transmit_buffers(struct tx_ring *);
static int ixgbe_setup_receive_ring(struct rx_ring *);
static void ixgbe_free_receive_buffers(struct rx_ring *);
static void ixgbe_rx_checksum(u32, struct mbuf *, u32);
static void ixgbe_refresh_mbufs(struct rx_ring *, int);
static int ixgbe_xmit(struct tx_ring *, struct mbuf **);
static int ixgbe_tx_ctx_setup(struct tx_ring *,
struct mbuf *, u32 *, u32 *);
static int ixgbe_tso_setup(struct tx_ring *,
struct mbuf *, u32 *, u32 *);
#ifdef IXGBE_FDIR
static void ixgbe_atr(struct tx_ring *, struct mbuf *);
#endif
static __inline void ixgbe_rx_discard(struct rx_ring *, int);
static __inline void ixgbe_rx_input(struct rx_ring *, struct ifnet *,
struct mbuf *, u32);
#ifdef IXGBE_LEGACY_TX
/*********************************************************************
* Transmit entry point
*
* ixgbe_start is called by the stack to initiate a transmit.
* The driver will remain in this routine as long as there are
* packets to transmit and transmit resources are available.
* If resources are not available, the stack is notified and
* the packet is requeued.
**********************************************************************/
void
ixgbe_start_locked(struct tx_ring *txr, struct ifnet * ifp)
{
struct mbuf *m_head;
struct adapter *adapter = txr->adapter;
IXGBE_TX_LOCK_ASSERT(txr);
if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
return;
if (!adapter->link_active)
return;
while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) {
if (txr->tx_avail <= IXGBE_QUEUE_MIN_FREE)
break;
IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head);
if (m_head == NULL)
break;
if (ixgbe_xmit(txr, &m_head)) {
if (m_head != NULL)
IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
break;
}
/* Send a copy of the frame to the BPF listener */
ETHER_BPF_MTAP(ifp, m_head);
}
return;
}
/*
* Legacy TX start - called by the stack, this
* always uses the first tx ring, and should
* not be used with multiqueue tx enabled.
*/
void
ixgbe_start(struct ifnet *ifp)
{
struct adapter *adapter = ifp->if_softc;
struct tx_ring *txr = adapter->tx_rings;
if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
IXGBE_TX_LOCK(txr);
ixgbe_start_locked(txr, ifp);
IXGBE_TX_UNLOCK(txr);
}
return;
}
#else /* ! IXGBE_LEGACY_TX */
/*
** Multiqueue Transmit Entry Point
** (if_transmit function)
*/
int
ixgbe_mq_start(struct ifnet *ifp, struct mbuf *m)
{
struct adapter *adapter = ifp->if_softc;
struct ix_queue *que;
struct tx_ring *txr;
int i, err = 0;
#ifdef RSS
uint32_t bucket_id;
#endif
/*
* When doing RSS, map it to the same outbound queue
* as the incoming flow would be mapped to.
*
* If everything is set up correctly, it should be the
* same bucket as the one for the CPU we're currently on.
*/
if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) {
#ifdef RSS
if (rss_hash2bucket(m->m_pkthdr.flowid,
M_HASHTYPE_GET(m), &bucket_id) == 0) {
i = bucket_id % adapter->num_queues;
#ifdef IXGBE_DEBUG
if (bucket_id > adapter->num_queues)
if_printf(ifp, "bucket_id (%d) > num_queues "
"(%d)\n", bucket_id, adapter->num_queues);
#endif
} else
#endif
i = m->m_pkthdr.flowid % adapter->num_queues;
} else
i = curcpu % adapter->num_queues;
/* Check for a hung queue and pick alternative */
if (((1 << i) & adapter->active_queues) == 0)
i = ffsl(adapter->active_queues);
txr = &adapter->tx_rings[i];
que = &adapter->queues[i];
err = drbr_enqueue(ifp, txr->br, m);
if (err)
return (err);
if (IXGBE_TX_TRYLOCK(txr)) {
ixgbe_mq_start_locked(ifp, txr);
IXGBE_TX_UNLOCK(txr);
} else
taskqueue_enqueue(que->tq, &txr->txq_task);
return (0);
}
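/*
 * Illustrative sketch, not part of the driver: choosing a fallback queue
 * from a bitmask of active queues. All sketch_* names are hypothetical.
 * The sketch returns zero-based indices; ffs()-family functions number
 * bits from 1, so an equivalent built on them would subtract one.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Return the zero-based index of the lowest set bit, or -1 if none. */
static int
sketch_lowest_set_bit(uint32_t mask)
{
	int i;

	for (i = 0; i < 32; i++) {
		if (mask & (UINT32_C(1) << i))
			return (i);
	}
	return (-1);
}

/*
 * Keep the wanted queue if its bit is set in active_mask; otherwise
 * fall back to the lowest-numbered active queue (-1 if none is active).
 */
static int
sketch_pick_queue(int wanted, uint32_t active_mask)
{
	assert(wanted >= 0 && wanted < 32);
	if (active_mask & (UINT32_C(1) << wanted))
		return (wanted);
	return (sketch_lowest_set_bit(active_mask));
}
```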
int
ixgbe_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr)
{
struct adapter *adapter = txr->adapter;
struct mbuf *next;
int enqueued = 0, err = 0;
if (((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) ||
adapter->link_active == 0)
return (ENETDOWN);
/* Process the queue */
#if __FreeBSD_version < 901504
next = drbr_dequeue(ifp, txr->br);
while (next != NULL) {
if ((err = ixgbe_xmit(txr, &next)) != 0) {
if (next != NULL)
err = drbr_enqueue(ifp, txr->br, next);
#else
while ((next = drbr_peek(ifp, txr->br)) != NULL) {
if ((err = ixgbe_xmit(txr, &next)) != 0) {
if (next == NULL) {
drbr_advance(ifp, txr->br);
} else {
drbr_putback(ifp, txr->br, next);
}
#endif
break;
}
#if __FreeBSD_version >= 901504
drbr_advance(ifp, txr->br);
#endif
enqueued++;
#if 0 // this is VF-only
#if __FreeBSD_version >= 1100036
/*
* Since we're looking at the tx ring, we can check
* to see if we're a VF by examining our tail register
* address.
*/
if (txr->tail < IXGBE_TDT(0) && next->m_flags & M_MCAST)
if_inc_counter(ifp, IFCOUNTER_OMCASTS, 1);
#endif
#endif
/* Send a copy of the frame to the BPF listener */
ETHER_BPF_MTAP(ifp, next);
if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
break;
#if __FreeBSD_version < 901504
next = drbr_dequeue(ifp, txr->br);
#endif
}
if (txr->tx_avail < IXGBE_TX_CLEANUP_THRESHOLD)
ixgbe_txeof(txr);
return (err);
}
/*
* Called from a taskqueue to drain queued transmit packets.
*/
void
ixgbe_deferred_mq_start(void *arg, int pending)
{
struct tx_ring *txr = arg;
struct adapter *adapter = txr->adapter;
struct ifnet *ifp = adapter->ifp;
IXGBE_TX_LOCK(txr);
if (!drbr_empty(ifp, txr->br))
ixgbe_mq_start_locked(ifp, txr);
IXGBE_TX_UNLOCK(txr);
}
/*
* Flush all ring buffers
*/
void
ixgbe_qflush(struct ifnet *ifp)
{
struct adapter *adapter = ifp->if_softc;
struct tx_ring *txr = adapter->tx_rings;
struct mbuf *m;
for (int i = 0; i < adapter->num_queues; i++, txr++) {
IXGBE_TX_LOCK(txr);
while ((m = buf_ring_dequeue_sc(txr->br)) != NULL)
m_freem(m);
IXGBE_TX_UNLOCK(txr);
}
if_qflush(ifp);
}
#endif /* IXGBE_LEGACY_TX */
/*********************************************************************
*
* This routine maps the mbufs to tx descriptors, allowing the
* TX engine to transmit the packets.
* - return 0 on success, positive on failure
*
**********************************************************************/
static int
ixgbe_xmit(struct tx_ring *txr, struct mbuf **m_headp)
{
struct adapter *adapter = txr->adapter;
u32 olinfo_status = 0, cmd_type_len;
int i, j, error, nsegs;
int first;
bool remap = TRUE;
struct mbuf *m_head;
bus_dma_segment_t segs[adapter->num_segs];
bus_dmamap_t map;
struct ixgbe_tx_buf *txbuf;
union ixgbe_adv_tx_desc *txd = NULL;
m_head = *m_headp;
/* Basic descriptor defines */
cmd_type_len = (IXGBE_ADVTXD_DTYP_DATA |
IXGBE_ADVTXD_DCMD_IFCS | IXGBE_ADVTXD_DCMD_DEXT);
if (m_head->m_flags & M_VLANTAG)
cmd_type_len |= IXGBE_ADVTXD_DCMD_VLE;
/*
* Important to capture the first descriptor
* used because it will contain the index of
* the one we tell the hardware to report back
*/
first = txr->next_avail_desc;
txbuf = &txr->tx_buffers[first];
map = txbuf->map;
/*
* Map the packet for DMA.
*/
retry:
error = bus_dmamap_load_mbuf_sg(txr->txtag, map,
*m_headp, segs, &nsegs, BUS_DMA_NOWAIT);
if (__predict_false(error)) {
struct mbuf *m;
switch (error) {
case EFBIG:
/* Try it again? - one try */
if (remap == TRUE) {
remap = FALSE;
/*
* XXX: m_defrag will choke on
* non-MCLBYTES-sized clusters
*/
m = m_defrag(*m_headp, M_NOWAIT);
if (m == NULL) {
adapter->mbuf_defrag_failed++;
m_freem(*m_headp);
*m_headp = NULL;
return (ENOBUFS);
}
*m_headp = m;
goto retry;
} else
return (error);
case ENOMEM:
txr->no_tx_dma_setup++;
return (error);
default:
txr->no_tx_dma_setup++;
m_freem(*m_headp);
*m_headp = NULL;
return (error);
}
}
/* Make certain there are enough descriptors */
if (txr->tx_avail < (nsegs + 2)) {
txr->no_desc_avail++;
bus_dmamap_unload(txr->txtag, map);
return (ENOBUFS);
}
m_head = *m_headp;
/*
* Set up the appropriate offload context
* this will consume the first descriptor
*/
error = ixgbe_tx_ctx_setup(txr, m_head, &cmd_type_len, &olinfo_status);
if (__predict_false(error)) {
if (error == ENOBUFS)
*m_headp = NULL;
return (error);
}
#ifdef IXGBE_FDIR
/* Do the flow director magic */
if ((txr->atr_sample) && (!adapter->fdir_reinit)) {
++txr->atr_count;
if (txr->atr_count >= atr_sample_rate) {
ixgbe_atr(txr, m_head);
txr->atr_count = 0;
}
}
#endif
olinfo_status |= IXGBE_ADVTXD_CC;
i = txr->next_avail_desc;
for (j = 0; j < nsegs; j++) {
bus_size_t seglen;
bus_addr_t segaddr;
txbuf = &txr->tx_buffers[i];
txd = &txr->tx_base[i];
seglen = segs[j].ds_len;
segaddr = htole64(segs[j].ds_addr);
txd->read.buffer_addr = segaddr;
txd->read.cmd_type_len = htole32(txr->txd_cmd |
cmd_type_len |seglen);
txd->read.olinfo_status = htole32(olinfo_status);
if (++i == txr->num_desc)
i = 0;
}
txd->read.cmd_type_len |=
htole32(IXGBE_TXD_CMD_EOP | IXGBE_TXD_CMD_RS);
txr->tx_avail -= nsegs;
txr->next_avail_desc = i;
txbuf->m_head = m_head;
/*
* Here we swap the map so the last descriptor,
* which gets the completion interrupt, has the
* real map, and the first descriptor gets the
* unused map from this descriptor.
*/
txr->tx_buffers[first].map = txbuf->map;
txbuf->map = map;
bus_dmamap_sync(txr->txtag, map, BUS_DMASYNC_PREWRITE);
/* Set the EOP descriptor that will be marked done */
txbuf = &txr->tx_buffers[first];
txbuf->eop = txd;
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
/*
* Advance the Transmit Descriptor Tail (Tdt), this tells the
* hardware that this frame is available to transmit.
*/
++txr->total_packets;
IXGBE_WRITE_REG(&adapter->hw, txr->tail, i);
/* Mark queue as having work */
if (txr->busy == 0)
txr->busy = 1;
return (0);
}
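/*
 * Illustrative sketch, not part of the driver: the descriptor accounting
 * in ixgbe_xmit() (the "nsegs + 2" headroom check, the wrapping index and
 * the tx_avail update) in isolation. The struct and function names are
 * hypothetical, and the sketch ignores the context descriptor that
 * ixgbe_tx_ctx_setup() may consume.
 */

```c
#include <assert.h>

/* Hypothetical, reduced view of a TX descriptor ring. */
struct sketch_txring {
	int num_desc;	/* total descriptors in the ring */
	int next_avail;	/* index of the next free descriptor */
	int avail;	/* number of free descriptors */
};

/*
 * Reserve nsegs data descriptors. Returns 0 on success, or -1 when
 * fewer than nsegs + 2 descriptors are free (ring state is unchanged).
 */
static int
sketch_reserve(struct sketch_txring *r, int nsegs)
{
	int i, j;

	assert(nsegs > 0 && r->num_desc > 0);
	if (r->avail < nsegs + 2)
		return (-1);

	i = r->next_avail;
	for (j = 0; j < nsegs; j++) {
		/* One descriptor per segment; wrap at the ring end. */
		if (++i == r->num_desc)
			i = 0;
	}
	r->avail -= nsegs;
	r->next_avail = i;
	return (0);
}
```

/*
 * Starting at index 6 of an 8-entry ring, reserving three descriptors
 * uses slots 6, 7 and 0 and leaves next_avail at 1.
 */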
/*********************************************************************
*
* Allocate memory for tx_buffer structures. The tx_buffer stores all
* the information needed to transmit a packet on the wire. This is
* called only once at attach, setup is done every reset.
*
**********************************************************************/
int
ixgbe_allocate_transmit_buffers(struct tx_ring *txr)
{
struct adapter *adapter = txr->adapter;
device_t dev = adapter->dev;
struct ixgbe_tx_buf *txbuf;
int error, i;
/*
* Setup DMA descriptor areas.
*/
if ((error = bus_dma_tag_create(
bus_get_dma_tag(adapter->dev), /* parent */
1, 0, /* alignment, bounds */
BUS_SPACE_MAXADDR, /* lowaddr */
BUS_SPACE_MAXADDR, /* highaddr */
NULL, NULL, /* filter, filterarg */
IXGBE_TSO_SIZE, /* maxsize */
adapter->num_segs, /* nsegments */
PAGE_SIZE, /* maxsegsize */
0, /* flags */
NULL, /* lockfunc */
NULL, /* lockfuncarg */
&txr->txtag))) {
device_printf(dev,"Unable to allocate TX DMA tag\n");
goto fail;
}
if (!(txr->tx_buffers =
(struct ixgbe_tx_buf *) malloc(sizeof(struct ixgbe_tx_buf) *
adapter->num_tx_desc, M_DEVBUF, M_NOWAIT | M_ZERO))) {
device_printf(dev, "Unable to allocate tx_buffer memory\n");
error = ENOMEM;
goto fail;
}
/* Create the descriptor buffer dma maps */
txbuf = txr->tx_buffers;
for (i = 0; i < adapter->num_tx_desc; i++, txbuf++) {
error = bus_dmamap_create(txr->txtag, 0, &txbuf->map);
if (error != 0) {
device_printf(dev, "Unable to create TX DMA map\n");
goto fail;
}
}
return 0;
fail:
/* Free everything; this also handles a failure partway through setup */
ixgbe_free_transmit_structures(adapter);
return (error);
}
/*********************************************************************
*
* Initialize a transmit ring.
*
**********************************************************************/
static void
ixgbe_setup_transmit_ring(struct tx_ring *txr)
{
struct adapter *adapter = txr->adapter;
struct ixgbe_tx_buf *txbuf;
#ifdef DEV_NETMAP
struct netmap_adapter *na = NA(adapter->ifp);
struct netmap_slot *slot;
#endif /* DEV_NETMAP */
/* Clear the old ring contents */
IXGBE_TX_LOCK(txr);
#ifdef DEV_NETMAP
/*
* (under lock): if in netmap mode, do some consistency
* checks and set slot to entry 0 of the netmap ring.
*/
slot = netmap_reset(na, NR_TX, txr->me, 0);
#endif /* DEV_NETMAP */
bzero((void *)txr->tx_base,
(sizeof(union ixgbe_adv_tx_desc)) * adapter->num_tx_desc);
/* Reset indices */
txr->next_avail_desc = 0;
txr->next_to_clean = 0;
/* Free any existing tx buffers. */
txbuf = txr->tx_buffers;
for (int i = 0; i < txr->num_desc; i++, txbuf++) {
if (txbuf->m_head != NULL) {
bus_dmamap_sync(txr->txtag, txbuf->map,
BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(txr->txtag, txbuf->map);
m_freem(txbuf->m_head);
txbuf->m_head = NULL;
}
#ifdef DEV_NETMAP
/*
* In netmap mode, set the map for the packet buffer.
* NOTE: Some drivers (not this one) also need to set
* the physical buffer address in the NIC ring.
* Slots in the netmap ring (indexed by "si") are
* kring->nkr_hwofs positions "ahead" wrt the
* corresponding slot in the NIC ring. In some drivers
* (not here) nkr_hwofs can be negative. Function
* netmap_idx_n2k() handles wraparounds properly.
*/
if (slot) {
int si = netmap_idx_n2k(&na->tx_rings[txr->me], i);
netmap_load_map(na, txr->txtag,
txbuf->map, NMB(na, slot + si));
}
#endif /* DEV_NETMAP */
/* Clear the EOP descriptor pointer */
txbuf->eop = NULL;
}
#ifdef IXGBE_FDIR
/* Set the rate at which we sample packets */
if (adapter->hw.mac.type != ixgbe_mac_82598EB)
txr->atr_sample = atr_sample_rate;
#endif
/* Set number of descriptors available */
txr->tx_avail = adapter->num_tx_desc;
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
IXGBE_TX_UNLOCK(txr);
}
/*********************************************************************
*
* Initialize all transmit rings.
*
**********************************************************************/
int
ixgbe_setup_transmit_structures(struct adapter *adapter)
{
struct tx_ring *txr = adapter->tx_rings;
for (int i = 0; i < adapter->num_queues; i++, txr++)
ixgbe_setup_transmit_ring(txr);
return (0);
}
/*********************************************************************
*
* Free all transmit rings.
*
**********************************************************************/
void
ixgbe_free_transmit_structures(struct adapter *adapter)
{
struct tx_ring *txr = adapter->tx_rings;
for (int i = 0; i < adapter->num_queues; i++, txr++) {
IXGBE_TX_LOCK(txr);
ixgbe_free_transmit_buffers(txr);
ixgbe_dma_free(adapter, &txr->txdma);
IXGBE_TX_UNLOCK(txr);
IXGBE_TX_LOCK_DESTROY(txr);
}
free(adapter->tx_rings, M_DEVBUF);
}
/*********************************************************************
*
* Free transmit ring related data structures.
*
**********************************************************************/
static void
ixgbe_free_transmit_buffers(struct tx_ring *txr)
{
struct adapter *adapter = txr->adapter;
struct ixgbe_tx_buf *tx_buffer;
int i;
INIT_DEBUGOUT("ixgbe_free_transmit_ring: begin");
if (txr->tx_buffers == NULL)
return;
tx_buffer = txr->tx_buffers;
for (i = 0; i < adapter->num_tx_desc; i++, tx_buffer++) {
if (tx_buffer->m_head != NULL) {
bus_dmamap_sync(txr->txtag, tx_buffer->map,
BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(txr->txtag,
tx_buffer->map);
m_freem(tx_buffer->m_head);
tx_buffer->m_head = NULL;
if (tx_buffer->map != NULL) {
bus_dmamap_destroy(txr->txtag,
tx_buffer->map);
tx_buffer->map = NULL;
}
} else if (tx_buffer->map != NULL) {
bus_dmamap_unload(txr->txtag,
tx_buffer->map);
bus_dmamap_destroy(txr->txtag,
tx_buffer->map);
tx_buffer->map = NULL;
}
}
#ifdef IXGBE_LEGACY_TX
if (txr->br != NULL)
buf_ring_free(txr->br, M_DEVBUF);
#endif
if (txr->tx_buffers != NULL) {
free(txr->tx_buffers, M_DEVBUF);
txr->tx_buffers = NULL;
}
if (txr->txtag != NULL) {
bus_dma_tag_destroy(txr->txtag);
txr->txtag = NULL;
}
return;
}
/*********************************************************************
*
* Advanced Context Descriptor setup for VLAN, CSUM or TSO
*
**********************************************************************/
static int
ixgbe_tx_ctx_setup(struct tx_ring *txr, struct mbuf *mp,
u32 *cmd_type_len, u32 *olinfo_status)
{
struct adapter *adapter = txr->adapter;
struct ixgbe_adv_tx_context_desc *TXD;
struct ether_vlan_header *eh;
#ifdef INET
struct ip *ip;
#endif
#ifdef INET6
struct ip6_hdr *ip6;
#endif
u32 vlan_macip_lens = 0, type_tucmd_mlhl = 0;
int ehdrlen, ip_hlen = 0;
u16 etype;
u8 ipproto = 0;
int offload = TRUE;
int ctxd = txr->next_avail_desc;
u16 vtag = 0;
caddr_t l3d;
/* First check if TSO is to be used */
if (mp->m_pkthdr.csum_flags & (CSUM_IP_TSO|CSUM_IP6_TSO))
return (ixgbe_tso_setup(txr, mp, cmd_type_len, olinfo_status));
if ((mp->m_pkthdr.csum_flags & CSUM_OFFLOAD) == 0)
offload = FALSE;
/* Indicate the whole packet as payload when not doing TSO */
*olinfo_status |= mp->m_pkthdr.len << IXGBE_ADVTXD_PAYLEN_SHIFT;
/* Now ready a context descriptor */
TXD = (struct ixgbe_adv_tx_context_desc *) &txr->tx_base[ctxd];
/*
** In advanced descriptors the vlan tag must
** be placed into the context descriptor. Hence
** we need to make one even if not doing offloads.
*/
if (mp->m_flags & M_VLANTAG) {
vtag = htole16(mp->m_pkthdr.ether_vtag);
vlan_macip_lens |= (vtag << IXGBE_ADVTXD_VLAN_SHIFT);
} else if (!IXGBE_IS_X550VF(adapter) && (offload == FALSE))
return (0);
/*
* Determine where frame payload starts.
* Jump over vlan headers if already present,
* helpful for QinQ too.
*/
eh = mtod(mp, struct ether_vlan_header *);
if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
etype = ntohs(eh->evl_proto);
ehdrlen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
} else {
etype = ntohs(eh->evl_encap_proto);
ehdrlen = ETHER_HDR_LEN;
}
/* Set the ether header length */
vlan_macip_lens |= ehdrlen << IXGBE_ADVTXD_MACLEN_SHIFT;
if (offload == FALSE)
goto no_offloads;
/*
* If the first mbuf only includes the ethernet header, jump to the next one
* XXX: This assumes the stack splits mbufs containing headers on header boundaries
* XXX: And assumes the entire IP header is contained in one mbuf
*/
if (mp->m_len == ehdrlen && mp->m_next)
l3d = mtod(mp->m_next, caddr_t);
else
l3d = mtod(mp, caddr_t) + ehdrlen;
switch (etype) {
#ifdef INET
case ETHERTYPE_IP:
ip = (struct ip *)(l3d);
ip_hlen = ip->ip_hl << 2;
ipproto = ip->ip_p;
type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV4;
/* Insert IPv4 checksum into data descriptors */
if (mp->m_pkthdr.csum_flags & CSUM_IP) {
ip->ip_sum = 0;
*olinfo_status |= IXGBE_TXD_POPTS_IXSM << 8;
}
break;
#endif
#ifdef INET6
case ETHERTYPE_IPV6:
ip6 = (struct ip6_hdr *)(l3d);
ip_hlen = sizeof(struct ip6_hdr);
ipproto = ip6->ip6_nxt;
type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV6;
break;
#endif
default:
offload = FALSE;
break;
}
vlan_macip_lens |= ip_hlen;
/* No support for offloads for non-L4 next headers */
switch (ipproto) {
case IPPROTO_TCP:
if (mp->m_pkthdr.csum_flags & (CSUM_IP_TCP | CSUM_IP6_TCP))
type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_TCP;
else
offload = false;
break;
case IPPROTO_UDP:
if (mp->m_pkthdr.csum_flags & (CSUM_IP_UDP | CSUM_IP6_UDP))
type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_UDP;
else
offload = false;
break;
case IPPROTO_SCTP:
if (mp->m_pkthdr.csum_flags & (CSUM_IP_SCTP | CSUM_IP6_SCTP))
type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_SCTP;
else
offload = false;
break;
default:
offload = false;
break;
}
if (offload) /* Insert L4 checksum into data descriptors */
*olinfo_status |= IXGBE_TXD_POPTS_TXSM << 8;
no_offloads:
type_tucmd_mlhl |= IXGBE_ADVTXD_DCMD_DEXT | IXGBE_ADVTXD_DTYP_CTXT;
/* Now copy bits into descriptor */
TXD->vlan_macip_lens = htole32(vlan_macip_lens);
TXD->type_tucmd_mlhl = htole32(type_tucmd_mlhl);
TXD->seqnum_seed = htole32(0);
TXD->mss_l4len_idx = htole32(0);
/* We've consumed the first desc, adjust counters */
if (++ctxd == txr->num_desc)
ctxd = 0;
txr->next_avail_desc = ctxd;
--txr->tx_avail;
return (0);
}
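/*
 * Illustration only, not part of the driver: a minimal sketch of how the
 * three fields of vlan_macip_lens are packed in the routine above. The
 * shift values and field widths used here (VLAN at bit 16, MACLEN at
 * bit 9, IPLEN in the low 9 bits) are assumptions for the example; the
 * authoritative values are the IXGBE_ADVTXD_* macros in the driver
 * headers. The helper name ex_pack_vlan_macip_lens is hypothetical, and
 * byte-order conversion (htole16/htole32) is left out for brevity.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions; the real ones come from the ixgbe headers. */
#define EX_VLAN_SHIFT	16
#define EX_MACLEN_SHIFT	9

/* Pack VLAN tag, MAC header length and IP header length into one word. */
static uint32_t
ex_pack_vlan_macip_lens(uint16_t vtag, uint32_t ehdrlen, uint32_t ip_hlen)
{
	uint32_t v = 0;

	assert(ehdrlen < (1u << 7));	/* assumed MACLEN width: 7 bits */
	assert(ip_hlen < (1u << 9));	/* assumed IPLEN width: 9 bits */
	v |= (uint32_t)vtag << EX_VLAN_SHIFT;
	v |= ehdrlen << EX_MACLEN_SHIFT;
	v |= ip_hlen;
	return (v);
}
```

/*
 * For an untagged IPv4 frame with a 14-byte Ethernet header and a
 * 20-byte IP header, the sketch yields (14 << 9) | 20.
 */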
/**********************************************************************
*
* Setup work for hardware segmentation offload (TSO) on
* adapters using advanced tx descriptors
*
**********************************************************************/
static int
ixgbe_tso_setup(struct tx_ring *txr, struct mbuf *mp,
u32 *cmd_type_len, u32 *olinfo_status)
{
struct ixgbe_adv_tx_context_desc *TXD;
u32 vlan_macip_lens = 0, type_tucmd_mlhl = 0;
u32 mss_l4len_idx = 0, paylen;
u16 vtag = 0, eh_type;
int ctxd, ehdrlen, ip_hlen, tcp_hlen;
struct ether_vlan_header *eh;
#ifdef INET6
struct ip6_hdr *ip6;
#endif
#ifdef INET
struct ip *ip;
#endif
struct tcphdr *th;
/*
* Determine where frame payload starts.
* Jump over vlan headers if already present
*/
eh = mtod(mp, struct ether_vlan_header *);
if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
ehdrlen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
eh_type = eh->evl_proto;
} else {
ehdrlen = ETHER_HDR_LEN;
eh_type = eh->evl_encap_proto;
}
switch (ntohs(eh_type)) {
#ifdef INET6
case ETHERTYPE_IPV6:
ip6 = (struct ip6_hdr *)(mp->m_data + ehdrlen);
/* XXX-BZ For now we do not pretend to support ext. hdrs. */
if (ip6->ip6_nxt != IPPROTO_TCP)
return (ENXIO);
ip_hlen = sizeof(struct ip6_hdr);
ip6 = (struct ip6_hdr *)(mp->m_data + ehdrlen);
th = (struct tcphdr *)((caddr_t)ip6 + ip_hlen);
th->th_sum = in6_cksum_pseudo(ip6, 0, IPPROTO_TCP, 0);
type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV6;
break;
#endif
#ifdef INET
case ETHERTYPE_IP:
ip = (struct ip *)(mp->m_data + ehdrlen);
if (ip->ip_p != IPPROTO_TCP)
return (ENXIO);
ip->ip_sum = 0;
ip_hlen = ip->ip_hl << 2;
th = (struct tcphdr *)((caddr_t)ip + ip_hlen);
th->th_sum = in_pseudo(ip->ip_src.s_addr,
ip->ip_dst.s_addr, htons(IPPROTO_TCP));
type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV4;
/* Tell transmit desc to also do IPv4 checksum. */
*olinfo_status |= IXGBE_TXD_POPTS_IXSM << 8;
break;
#endif
default:
panic("%s: CSUM_TSO but no supported IP version (0x%04x)",
__func__, ntohs(eh_type));
break;
}
ctxd = txr->next_avail_desc;
TXD = (struct ixgbe_adv_tx_context_desc *) &txr->tx_base[ctxd];
tcp_hlen = th->th_off << 2;
/* This is used in the transmit desc in encap */
paylen = mp->m_pkthdr.len - ehdrlen - ip_hlen - tcp_hlen;
/* VLAN MACLEN IPLEN */
if (mp->m_flags & M_VLANTAG) {
vtag = htole16(mp->m_pkthdr.ether_vtag);
vlan_macip_lens |= (vtag << IXGBE_ADVTXD_VLAN_SHIFT);
}
vlan_macip_lens |= ehdrlen << IXGBE_ADVTXD_MACLEN_SHIFT;
vlan_macip_lens |= ip_hlen;
TXD->vlan_macip_lens = htole32(vlan_macip_lens);
/* ADV DTYPE TUCMD */
type_tucmd_mlhl |= IXGBE_ADVTXD_DCMD_DEXT | IXGBE_ADVTXD_DTYP_CTXT;
type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_TCP;
TXD->type_tucmd_mlhl = htole32(type_tucmd_mlhl);
/* MSS L4LEN IDX */
mss_l4len_idx |= (mp->m_pkthdr.tso_segsz << IXGBE_ADVTXD_MSS_SHIFT);
mss_l4len_idx |= (tcp_hlen << IXGBE_ADVTXD_L4LEN_SHIFT);
TXD->mss_l4len_idx = htole32(mss_l4len_idx);
TXD->seqnum_seed = htole32(0);
if (++ctxd == txr->num_desc)
ctxd = 0;
txr->tx_avail--;
txr->next_avail_desc = ctxd;
*cmd_type_len |= IXGBE_ADVTXD_DCMD_TSE;
*olinfo_status |= IXGBE_TXD_POPTS_TXSM << 8;
*olinfo_status |= paylen << IXGBE_ADVTXD_PAYLEN_SHIFT;
++txr->tso_tx;
return (0);
}
/**********************************************************************
*
* Examine each tx_buffer in the used queue. If the hardware is done
* processing the packet then free associated resources. The
* tx_buffer is put back on the free queue.
*
**********************************************************************/
void
ixgbe_txeof(struct tx_ring *txr)
{
struct adapter *adapter = txr->adapter;
#ifdef DEV_NETMAP
struct ifnet *ifp = adapter->ifp;
#endif
u32 work, processed = 0;
u32 limit = adapter->tx_process_limit;
struct ixgbe_tx_buf *buf;
union ixgbe_adv_tx_desc *txd;
mtx_assert(&txr->tx_mtx, MA_OWNED);
#ifdef DEV_NETMAP
if (ifp->if_capenable & IFCAP_NETMAP) {
struct netmap_adapter *na = NA(ifp);
struct netmap_kring *kring = &na->tx_rings[txr->me];
txd = txr->tx_base;
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_POSTREAD);
/*
* In netmap mode, all the work is done in the context
* of the client thread. Interrupt handlers only wake up
* clients, which may be sleeping on individual rings
* or on a global resource for all rings.
* To implement tx interrupt mitigation, we wake up the client
* thread roughly every half ring, even if the NIC interrupts
* more frequently. This is implemented as follows:
* - ixgbe_txsync() sets kring->nr_kflags with the index of
* the slot that should wake up the thread (nkr_num_slots
* means the user thread should not be woken up);
* - the driver ignores tx interrupts unless netmap_mitigate=0
* or the slot has the DD bit set.
*/
if (!netmap_mitigate ||
(kring->nr_kflags < kring->nkr_num_slots &&
txd[kring->nr_kflags].wb.status & IXGBE_TXD_STAT_DD)) {
netmap_tx_irq(ifp, txr->me);
}
return;
}
#endif /* DEV_NETMAP */
if (txr->tx_avail == txr->num_desc) {
txr->busy = 0;
return;
}
/* Get work starting point */
work = txr->next_to_clean;
buf = &txr->tx_buffers[work];
txd = &txr->tx_base[work];
work -= txr->num_desc; /* The distance to ring end */
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_POSTREAD);
do {
union ixgbe_adv_tx_desc *eop = buf->eop;
if (eop == NULL) /* No work */
break;
if ((eop->wb.status & IXGBE_TXD_STAT_DD) == 0)
break; /* I/O not complete */
if (buf->m_head) {
txr->bytes +=
buf->m_head->m_pkthdr.len;
bus_dmamap_sync(txr->txtag,
buf->map,
BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(txr->txtag,
buf->map);
m_freem(buf->m_head);
buf->m_head = NULL;
}
buf->eop = NULL;
++txr->tx_avail;
/* We clean the range if multi segment */
while (txd != eop) {
++txd;
++buf;
++work;
/* wrap the ring? */
if (__predict_false(!work)) {
work -= txr->num_desc;
buf = txr->tx_buffers;
txd = txr->tx_base;
}
if (buf->m_head) {
txr->bytes +=
buf->m_head->m_pkthdr.len;
bus_dmamap_sync(txr->txtag,
buf->map,
BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(txr->txtag,
buf->map);
m_freem(buf->m_head);
buf->m_head = NULL;
}
++txr->tx_avail;
buf->eop = NULL;
}
++txr->packets;
++processed;
/* Try the next packet */
++txd;
++buf;
++work;
/* reset with a wrap */
if (__predict_false(!work)) {
work -= txr->num_desc;
buf = txr->tx_buffers;
txd = txr->tx_base;
}
prefetch(txd);
} while (__predict_true(--limit));
bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
work += txr->num_desc;
txr->next_to_clean = work;
/*
** Queue hang detection: we know there is
** work outstanding, or the first return
** would have been taken. If nothing was
** cleaned, increment busy; local_timer then
** checks it and marks the queue HUNG once
** it exceeds the maximum number of attempts.
*/
if ((processed == 0) && (txr->busy != IXGBE_QUEUE_HUNG))
++txr->busy;
/*
** If anything gets cleaned we reset state to 1;
** note that this clears HUNG if it is set.
*/
if (processed)
txr->busy = 1;
if (txr->tx_avail == txr->num_desc)
txr->busy = 0;
return;
}
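/*
 * Illustration only, not part of the driver: a minimal sketch of the
 * biased ring index used by the cleanup loop above. The index is kept
 * as (position - num_desc) in unsigned arithmetic, so it becomes zero
 * exactly when the walk steps past the last slot, and the wrap test
 * reduces to a comparison against zero. The helper name ex_ring_advance
 * is hypothetical.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Advance a ring position by 'steps' slots using the biased index. */
static uint32_t
ex_ring_advance(uint32_t pos, uint32_t num_desc, uint32_t steps)
{
	uint32_t work = pos;

	assert(num_desc > 0 && pos < num_desc);
	work -= num_desc;		/* bias: the distance to ring end */
	while (steps-- > 0) {
		++work;
		if (work == 0)		/* stepped past the end: wrap */
			work -= num_desc;
	}
	work += num_desc;		/* remove the bias again */
	return (work);
}
```

/*
 * The result matches (pos + steps) % num_desc, without a division or
 * a compare against num_desc inside the loop.
 */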
#ifdef IXGBE_FDIR
/*
** This routine parses packet headers so that Flow
** Director can make a hashed filter table entry
** allowing traffic flows to be identified and kept
** on the same CPU. Parsing every packet would
** be a performance hit, so we only do it for a
** sample of packets set by IXGBE_FDIR_RATE.
*/
static void
ixgbe_atr(struct tx_ring *txr, struct mbuf *mp)
{
struct adapter *adapter = txr->adapter;
struct ix_queue *que;
struct ip *ip;
struct tcphdr *th;
struct udphdr *uh;
struct ether_vlan_header *eh;
union ixgbe_atr_hash_dword input = {.dword = 0};
union ixgbe_atr_hash_dword common = {.dword = 0};
int ehdrlen, ip_hlen;
u16 etype;
eh = mtod(mp, struct ether_vlan_header *);
if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
ehdrlen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
etype = eh->evl_proto;
} else {
ehdrlen = ETHER_HDR_LEN;
etype = eh->evl_encap_proto;
}
/* Only handling IPv4 */
if (etype != htons(ETHERTYPE_IP))
return;
ip = (struct ip *)(mp->m_data + ehdrlen);
ip_hlen = ip->ip_hl << 2;
/* check if we're UDP or TCP */
switch (ip->ip_p) {
case IPPROTO_TCP:
th = (struct tcphdr *)((caddr_t)ip + ip_hlen);
/* src and dst are inverted */
common.port.dst ^= th->th_sport;
common.port.src ^= th->th_dport;
input.formatted.flow_type ^= IXGBE_ATR_FLOW_TYPE_TCPV4;
break;
case IPPROTO_UDP:
uh = (struct udphdr *)((caddr_t)ip + ip_hlen);
/* src and dst are inverted */
common.port.dst ^= uh->uh_sport;
common.port.src ^= uh->uh_dport;
input.formatted.flow_type ^= IXGBE_ATR_FLOW_TYPE_UDPV4;
break;
default:
return;
}
input.formatted.vlan_id = htobe16(mp->m_pkthdr.ether_vtag);
if (mp->m_pkthdr.ether_vtag)
common.flex_bytes ^= htons(ETHERTYPE_VLAN);
else
common.flex_bytes ^= etype;
common.ip ^= ip->ip_src.s_addr ^ ip->ip_dst.s_addr;
que = &adapter->queues[txr->me];
/*
** This assumes the Rx queue and Tx
** queue are bound to the same CPU
*/
ixgbe_fdir_add_signature_filter_82599(&adapter->hw,
input, common, que->msix);
}
#endif /* IXGBE_FDIR */
/*
** Used to detect a descriptor that has
** been merged by Hardware RSC.
*/
static inline u32
ixgbe_rsc_count(union ixgbe_adv_rx_desc *rx)
{
return (le32toh(rx->wb.lower.lo_dword.data) &
IXGBE_RXDADV_RSCCNT_MASK) >> IXGBE_RXDADV_RSCCNT_SHIFT;
}
/*********************************************************************
*
* Initialize Hardware RSC (LRO) feature on 82599
* for an RX ring, this is toggled by the LRO capability
* even though it is transparent to the stack.
*
* NOTE: since this HW feature only works with IPV4 and
* our testing has shown soft LRO to be as effective
* I have decided to disable this by default.
*
**********************************************************************/
static void
ixgbe_setup_hw_rsc(struct rx_ring *rxr)
{
struct adapter *adapter = rxr->adapter;
struct ixgbe_hw *hw = &adapter->hw;
u32 rscctrl, rdrxctl;
/* If turning LRO/RSC off we need to disable it */
if ((adapter->ifp->if_capenable & IFCAP_LRO) == 0) {
rscctrl = IXGBE_READ_REG(hw, IXGBE_RSCCTL(rxr->me));
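rscctrl &= ~IXGBE_RSCCTL_RSCEN;
/* Write the cleared enable bit back so RSC is actually disabled */
IXGBE_WRITE_REG(hw, IXGBE_RSCCTL(rxr->me), rscctrl);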
return;
}
rdrxctl = IXGBE_READ_REG(hw, IXGBE_RDRXCTL);
rdrxctl &= ~IXGBE_RDRXCTL_RSCFRSTSIZE;
#ifdef DEV_NETMAP /* crcstrip is optional in netmap */
if (adapter->ifp->if_capenable & IFCAP_NETMAP && !ix_crcstrip)
#endif /* DEV_NETMAP */
rdrxctl |= IXGBE_RDRXCTL_CRCSTRIP;
rdrxctl |= IXGBE_RDRXCTL_RSCACKC;
IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, rdrxctl);
rscctrl = IXGBE_READ_REG(hw, IXGBE_RSCCTL(rxr->me));
rscctrl |= IXGBE_RSCCTL_RSCEN;
/*
** Limit the total number of descriptors that
** can be combined, so it does not exceed 64K
*/
if (rxr->mbuf_sz == MCLBYTES)
rscctrl |= IXGBE_RSCCTL_MAXDESC_16;
else if (rxr->mbuf_sz == MJUMPAGESIZE)
rscctrl |= IXGBE_RSCCTL_MAXDESC_8;
else if (rxr->mbuf_sz == MJUM9BYTES)
rscctrl |= IXGBE_RSCCTL_MAXDESC_4;
else /* Using 16K cluster */
rscctrl |= IXGBE_RSCCTL_MAXDESC_1;
IXGBE_WRITE_REG(hw, IXGBE_RSCCTL(rxr->me), rscctrl);
/* Enable TCP header recognition */
IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(0),
(IXGBE_READ_REG(hw, IXGBE_PSRTYPE(0)) |
IXGBE_PSRTYPE_TCPHDR));
/* Disable RSC for ACK packets */
IXGBE_WRITE_REG(hw, IXGBE_RSCDBU,
(IXGBE_RSCDBU_RSCACKDIS | IXGBE_READ_REG(hw, IXGBE_RSCDBU)));
rxr->hw_rsc = TRUE;
}
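/*
 * Illustration only, not part of the driver: a minimal sketch of the
 * descriptor limit chosen above. The larger the receive cluster, the
 * fewer descriptors may be coalesced, so that one merged packet stays
 * within 64K. The cluster sizes below are assumed typical values (the
 * real MCLBYTES, MJUMPAGESIZE, MJUM9BYTES and MJUM16BYTES are defined
 * by the platform), and ex_rsc_max_desc is a hypothetical helper.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cluster sizes in bytes, for illustration. */
#define EX_MCLBYTES	2048u
#define EX_MJUMPAGESIZE	4096u
#define EX_MJUM9BYTES	9216u
#define EX_MJUM16BYTES	16384u

/* Return how many descriptors may be merged for a given cluster size. */
static uint32_t
ex_rsc_max_desc(uint32_t mbuf_sz)
{
	assert(mbuf_sz > 0);
	if (mbuf_sz == EX_MCLBYTES)
		return (16);
	else if (mbuf_sz == EX_MJUMPAGESIZE)
		return (8);
	else if (mbuf_sz == EX_MJUM9BYTES)
		return (4);
	else				/* 16K cluster */
		return (1);
}
```

/*
 * With these sizes the product of descriptor count and cluster size
 * never exceeds 65536 bytes.
 */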
/*********************************************************************
*
* Refresh mbuf buffers for RX descriptor rings
* - now keeps its own state, so discards due to resource
* exhaustion are unnecessary. If an mbuf cannot be obtained
* it just returns, keeping its placeholder, so it can simply
* be called again to retry.
*
**********************************************************************/
static void
ixgbe_refresh_mbufs(struct rx_ring *rxr, int limit)
{
struct adapter *adapter = rxr->adapter;
bus_dma_segment_t seg[1];
struct ixgbe_rx_buf *rxbuf;
struct mbuf *mp;
int i, j, nsegs, error;
bool refreshed = FALSE;
i = j = rxr->next_to_refresh;
/* Control the loop with one beyond */
if (++j == rxr->num_desc)
j = 0;
while (j != limit) {
rxbuf = &rxr->rx_buffers[i];
if (rxbuf->buf == NULL) {
mp = m_getjcl(M_NOWAIT, MT_DATA,
M_PKTHDR, rxr->mbuf_sz);
if (mp == NULL)
goto update;
if (adapter->max_frame_size <= (MCLBYTES - ETHER_ALIGN))
m_adj(mp, ETHER_ALIGN);
} else
mp = rxbuf->buf;
mp->m_pkthdr.len = mp->m_len = rxr->mbuf_sz;
/* If we're dealing with an mbuf that was copied rather
* than replaced, there's no need to go through busdma.
*/
if ((rxbuf->flags & IXGBE_RX_COPY) == 0) {
/* Get the memory mapping */
bus_dmamap_unload(rxr->ptag, rxbuf->pmap);
error = bus_dmamap_load_mbuf_sg(rxr->ptag,
rxbuf->pmap, mp, seg, &nsegs, BUS_DMA_NOWAIT);
if (error != 0) {
printf("Refresh mbufs: payload dmamap load"
" failure - %d\n", error);
m_free(mp);
rxbuf->buf = NULL;
goto update;
}
rxbuf->buf = mp;
bus_dmamap_sync(rxr->ptag, rxbuf->pmap,
BUS_DMASYNC_PREREAD);
rxbuf->addr = rxr->rx_base[i].read.pkt_addr =
htole64(seg[0].ds_addr);
} else {
rxr->rx_base[i].read.pkt_addr = rxbuf->addr;
rxbuf->flags &= ~IXGBE_RX_COPY;
}
refreshed = TRUE;
/* Next is precalculated */
i = j;
rxr->next_to_refresh = i;
if (++j == rxr->num_desc)
j = 0;
}
update:
if (refreshed) /* Update hardware tail index */
IXGBE_WRITE_REG(&adapter->hw,
rxr->tail, rxr->next_to_refresh);
return;
}
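/*
 * Illustration only, not part of the driver: a minimal sketch of the
 * "one beyond" loop control used above. Index i is the slot being
 * refreshed and j runs one slot ahead of it; the walk ends when j
 * reaches limit, so it always stops one slot short of limit. Buffer
 * allocation and DMA mapping are omitted, and ex_refresh_walk is a
 * hypothetical helper.
 */

```c
#include <assert.h>
#include <stdint.h>

/*
 * Walk the ring from next_to_refresh toward limit. Returns the new
 * next_to_refresh and stores the number of slots visited in *refreshed.
 */
static uint32_t
ex_refresh_walk(uint32_t next_to_refresh, uint32_t limit, uint32_t num_desc,
    uint32_t *refreshed)
{
	uint32_t i, j;

	assert(num_desc > 0 && next_to_refresh < num_desc && limit < num_desc);
	*refreshed = 0;
	i = j = next_to_refresh;
	if (++j == num_desc)		/* control the loop with one beyond */
		j = 0;
	while (j != limit) {
		/* slot i would get a fresh buffer here */
		(*refreshed)++;
		i = j;			/* next is precalculated */
		if (++j == num_desc)
			j = 0;
	}
	return (i);
}
```

/*
 * Starting at slot 2 with limit 6 on an 8-slot ring, slots 2, 3 and 4
 * are visited and the walk leaves next_to_refresh at 5.
 */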
/*********************************************************************
*
* Allocate memory for rx_buffer structures. Since we use one
* rx_buffer per received packet, the maximum number of rx_buffers
* that we'll need is equal to the number of receive descriptors
* that we've allocated.
*
**********************************************************************/
int
ixgbe_allocate_receive_buffers(struct rx_ring *rxr)
{
struct adapter *adapter = rxr->adapter;
device_t dev = adapter->dev;
struct ixgbe_rx_buf *rxbuf;
int bsize, error;
bsize = sizeof(struct ixgbe_rx_buf) * rxr->num_desc;
if (!(rxr->rx_buffers =
(struct ixgbe_rx_buf *) malloc(bsize,
M_DEVBUF, M_NOWAIT | M_ZERO))) {
device_printf(dev, "Unable to allocate rx_buffer memory\n");
error = ENOMEM;
goto fail;
}
if ((error = bus_dma_tag_create(bus_get_dma_tag(dev), /* parent */
1, 0, /* alignment, bounds */
BUS_SPACE_MAXADDR, /* lowaddr */
BUS_SPACE_MAXADDR, /* highaddr */
NULL, NULL, /* filter, filterarg */
MJUM16BYTES, /* maxsize */
1, /* nsegments */
MJUM16BYTES, /* maxsegsize */
0, /* flags */
NULL, /* lockfunc */
NULL, /* lockfuncarg */
&rxr->ptag))) {
device_printf(dev, "Unable to create RX DMA tag\n");
goto fail;
}
for (int i = 0; i < rxr->num_desc; i++, rxbuf++) {
rxbuf = &rxr->rx_buffers[i];
error = bus_dmamap_create(rxr->ptag, 0, &rxbuf->pmap);
if (error) {
device_printf(dev, "Unable to create RX dma map\n");
goto fail;
}
}
return (0);
fail:
/* Frees all, but can handle partial completion */
ixgbe_free_receive_structures(adapter);
return (error);
}
static void
ixgbe_free_receive_ring(struct rx_ring *rxr)
{
struct ixgbe_rx_buf *rxbuf;
for (int i = 0; i < rxr->num_desc; i++) {
rxbuf = &rxr->rx_buffers[i];
if (rxbuf->buf != NULL) {
bus_dmamap_sync(rxr->ptag, rxbuf->pmap,
BUS_DMASYNC_POSTREAD);
bus_dmamap_unload(rxr->ptag, rxbuf->pmap);
rxbuf->buf->m_flags |= M_PKTHDR;
m_freem(rxbuf->buf);
rxbuf->buf = NULL;
rxbuf->flags = 0;
}
}
}
/*********************************************************************
*
* Initialize a receive ring and its buffers.
*
**********************************************************************/
static int
ixgbe_setup_receive_ring(struct rx_ring *rxr)
{
struct adapter *adapter;
struct ifnet *ifp;
device_t dev;
struct ixgbe_rx_buf *rxbuf;
bus_dma_segment_t seg[1];
struct lro_ctrl *lro = &rxr->lro;
int rsize, nsegs, error = 0;
#ifdef DEV_NETMAP
struct netmap_adapter *na = NA(rxr->adapter->ifp);
struct netmap_slot *slot;
#endif /* DEV_NETMAP */
adapter = rxr->adapter;
ifp = adapter->ifp;
dev = adapter->dev;
/* Clear the ring contents */
IXGBE_RX_LOCK(rxr);
#ifdef DEV_NETMAP
/* same as in ixgbe_setup_transmit_ring() */
slot = netmap_reset(na, NR_RX, rxr->me, 0);
#endif /* DEV_NETMAP */
rsize = roundup2(adapter->num_rx_desc *
sizeof(union ixgbe_adv_rx_desc), DBA_ALIGN);
bzero((void *)rxr->rx_base, rsize);
/* Cache the size */
rxr->mbuf_sz = adapter->rx_mbuf_sz;
/* Free current RX buffer structs and their mbufs */
ixgbe_free_receive_ring(rxr);
/* Now replenish the mbufs */
for (int j = 0; j != rxr->num_desc; ++j) {
struct mbuf *mp;
rxbuf = &rxr->rx_buffers[j];
#ifdef DEV_NETMAP
/*
* In netmap mode, fill the map and set the buffer
* address in the NIC ring, considering the offset
* between the netmap and NIC rings (see comment in
* ixgbe_setup_transmit_ring() ). No need to allocate
* an mbuf, so end the block with a continue;
*/
if (slot) {
int sj = netmap_idx_n2k(&na->rx_rings[rxr->me], j);
uint64_t paddr;
void *addr;
addr = PNMB(na, slot + sj, &paddr);
netmap_load_map(na, rxr->ptag, rxbuf->pmap, addr);
/* Update descriptor and the cached value */
rxr->rx_base[j].read.pkt_addr = htole64(paddr);
rxbuf->addr = htole64(paddr);
continue;
}
#endif /* DEV_NETMAP */
rxbuf->flags = 0;
rxbuf->buf = m_getjcl(M_NOWAIT, MT_DATA,
M_PKTHDR, adapter->rx_mbuf_sz);
if (rxbuf->buf == NULL) {
error = ENOBUFS;
goto fail;
}
mp = rxbuf->buf;
mp->m_pkthdr.len = mp->m_len = rxr->mbuf_sz;
/* Get the memory mapping */
error = bus_dmamap_load_mbuf_sg(rxr->ptag,
rxbuf->pmap, mp, seg,
&nsegs, BUS_DMA_NOWAIT);
if (error != 0)
goto fail;
bus_dmamap_sync(rxr->ptag,
rxbuf->pmap, BUS_DMASYNC_PREREAD);
/* Update the descriptor and the cached value */
rxr->rx_base[j].read.pkt_addr = htole64(seg[0].ds_addr);
rxbuf->addr = htole64(seg[0].ds_addr);
}
/* Setup our descriptor indices */
rxr->next_to_check = 0;
rxr->next_to_refresh = 0;
rxr->lro_enabled = FALSE;
rxr->rx_copies = 0;
rxr->rx_bytes = 0;
rxr->vtag_strip = FALSE;
bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
/*
** Now set up the LRO interface:
*/
if (ixgbe_rsc_enable)
ixgbe_setup_hw_rsc(rxr);
else if (ifp->if_capenable & IFCAP_LRO) {
int err = tcp_lro_init(lro);
if (err) {
device_printf(dev, "LRO Initialization failed!\n");
goto fail;
}
INIT_DEBUGOUT("RX Soft LRO Initialized\n");
rxr->lro_enabled = TRUE;
lro->ifp = adapter->ifp;
}
IXGBE_RX_UNLOCK(rxr);
return (0);
fail:
ixgbe_free_receive_ring(rxr);
IXGBE_RX_UNLOCK(rxr);
return (error);
}
/*********************************************************************
*
* Initialize all receive rings.
*
**********************************************************************/
int
ixgbe_setup_receive_structures(struct adapter *adapter)
{
struct rx_ring *rxr = adapter->rx_rings;
int j;
for (j = 0; j < adapter->num_queues; j++, rxr++)
if (ixgbe_setup_receive_ring(rxr))
goto fail;
return (0);
fail:
/*
* Free the RX buffers allocated so far. We only handle
* the rings that completed; the failing ring will have
* cleaned up after itself. 'j' failed, so it is the terminus.
*/
for (int i = 0; i < j; ++i) {
rxr = &adapter->rx_rings[i];
ixgbe_free_receive_ring(rxr);
}
return (ENOBUFS);
}
/*********************************************************************
*
* Free all receive rings.
*
**********************************************************************/
void
ixgbe_free_receive_structures(struct adapter *adapter)
{
struct rx_ring *rxr = adapter->rx_rings;
INIT_DEBUGOUT("ixgbe_free_receive_structures: begin");
for (int i = 0; i < adapter->num_queues; i++, rxr++) {
struct lro_ctrl *lro = &rxr->lro;
ixgbe_free_receive_buffers(rxr);
/* Free LRO memory */
tcp_lro_free(lro);
/* Free the ring memory as well */
ixgbe_dma_free(adapter, &rxr->rxdma);
}
free(adapter->rx_rings, M_DEVBUF);
}
/*********************************************************************
*
* Free receive ring data structures
*
**********************************************************************/
void
ixgbe_free_receive_buffers(struct rx_ring *rxr)
{
struct adapter *adapter = rxr->adapter;
struct ixgbe_rx_buf *rxbuf;
INIT_DEBUGOUT("ixgbe_free_receive_buffers: begin");
/* Cleanup any existing buffers */
if (rxr->rx_buffers != NULL) {
for (int i = 0; i < adapter->num_rx_desc; i++) {
rxbuf = &rxr->rx_buffers[i];
if (rxbuf->buf != NULL) {
bus_dmamap_sync(rxr->ptag, rxbuf->pmap,
BUS_DMASYNC_POSTREAD);
bus_dmamap_unload(rxr->ptag, rxbuf->pmap);
rxbuf->buf->m_flags |= M_PKTHDR;
m_freem(rxbuf->buf);
}
rxbuf->buf = NULL;
if (rxbuf->pmap != NULL) {
bus_dmamap_destroy(rxr->ptag, rxbuf->pmap);
rxbuf->pmap = NULL;
}
}
if (rxr->rx_buffers != NULL) {
free(rxr->rx_buffers, M_DEVBUF);
rxr->rx_buffers = NULL;
}
}
if (rxr->ptag != NULL) {
bus_dma_tag_destroy(rxr->ptag);
rxr->ptag = NULL;
}
return;
}
static __inline void
ixgbe_rx_input(struct rx_ring *rxr, struct ifnet *ifp, struct mbuf *m, u32 ptype)
{
/*
* At the moment LRO is only for IP/TCP packets whose TCP checksum
* has been computed by hardware. The packet also must not carry a
* VLAN tag in its Ethernet header. For IPv6 we do not yet support
* extension headers.
*/
if (rxr->lro_enabled &&
(ifp->if_capenable & IFCAP_VLAN_HWTAGGING) != 0 &&
(ptype & IXGBE_RXDADV_PKTTYPE_ETQF) == 0 &&
((ptype & (IXGBE_RXDADV_PKTTYPE_IPV4 | IXGBE_RXDADV_PKTTYPE_TCP)) ==
(IXGBE_RXDADV_PKTTYPE_IPV4 | IXGBE_RXDADV_PKTTYPE_TCP) ||
(ptype & (IXGBE_RXDADV_PKTTYPE_IPV6 | IXGBE_RXDADV_PKTTYPE_TCP)) ==
(IXGBE_RXDADV_PKTTYPE_IPV6 | IXGBE_RXDADV_PKTTYPE_TCP)) &&
(m->m_pkthdr.csum_flags & (CSUM_DATA_VALID | CSUM_PSEUDO_HDR)) ==
(CSUM_DATA_VALID | CSUM_PSEUDO_HDR)) {
/*
* Send to the stack if:
* - LRO not enabled, or
* - no LRO resources, or
* - LRO enqueue fails
*/
if (rxr->lro.lro_cnt != 0)
if (tcp_lro_rx(&rxr->lro, m, 0) == 0)
return;
}
IXGBE_RX_UNLOCK(rxr);
(*ifp->if_input)(ifp, m);
IXGBE_RX_LOCK(rxr);
}
static __inline void
ixgbe_rx_discard(struct rx_ring *rxr, int i)
{
struct ixgbe_rx_buf *rbuf;
rbuf = &rxr->rx_buffers[i];
/*
** With advanced descriptors the writeback
** clobbers the buffer addrs, so it's easier
** to just free the existing mbufs and take
** the normal refresh path to get new buffers
** and mapping.
*/
if (rbuf->fmp != NULL) {/* Partial chain ? */
rbuf->fmp->m_flags |= M_PKTHDR;
m_freem(rbuf->fmp);
rbuf->fmp = NULL;
rbuf->buf = NULL; /* rbuf->buf is part of fmp's chain */
} else if (rbuf->buf) {
m_free(rbuf->buf);
rbuf->buf = NULL;
}
bus_dmamap_unload(rxr->ptag, rbuf->pmap);
rbuf->flags = 0;
return;
}
/*********************************************************************
*
* This routine executes in interrupt context. It replenishes
* the mbufs in the descriptor and sends data which has been
* dma'ed into host memory to upper layer.
*
* Return TRUE for more work, FALSE for all clean.
*********************************************************************/
bool
ixgbe_rxeof(struct ix_queue *que)
{
struct adapter *adapter = que->adapter;
struct rx_ring *rxr = que->rxr;
struct ifnet *ifp = adapter->ifp;
struct lro_ctrl *lro = &rxr->lro;
int i, nextp, processed = 0;
u32 staterr = 0;
u32 count = adapter->rx_process_limit;
union ixgbe_adv_rx_desc *cur;
struct ixgbe_rx_buf *rbuf, *nbuf;
u16 pkt_info;
IXGBE_RX_LOCK(rxr);
#ifdef DEV_NETMAP
/* Same as the txeof routine: wakeup clients on intr. */
if (netmap_rx_irq(ifp, rxr->me, &processed)) {
IXGBE_RX_UNLOCK(rxr);
return (FALSE);
}
#endif /* DEV_NETMAP */
for (i = rxr->next_to_check; count != 0;) {
struct mbuf *sendmp, *mp;
u32 rsc, ptype;
u16 len;
u16 vtag = 0;
bool eop;
/* Sync the ring. */
bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
cur = &rxr->rx_base[i];
staterr = le32toh(cur->wb.upper.status_error);
pkt_info = le16toh(cur->wb.lower.lo_dword.hs_rss.pkt_info);
if ((staterr & IXGBE_RXD_STAT_DD) == 0)
break;
if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
break;
count--;
sendmp = NULL;
nbuf = NULL;
rsc = 0;
cur->wb.upper.status_error = 0;
rbuf = &rxr->rx_buffers[i];
mp = rbuf->buf;
len = le16toh(cur->wb.upper.length);
ptype = le32toh(cur->wb.lower.lo_dword.data) &
IXGBE_RXDADV_PKTTYPE_MASK;
eop = ((staterr & IXGBE_RXD_STAT_EOP) != 0);
/* Make sure bad packets are discarded */
if (eop && (staterr & IXGBE_RXDADV_ERR_FRAME_ERR_MASK) != 0) {
#if __FreeBSD_version >= 1100036
if (IXGBE_IS_VF(adapter))
if_inc_counter(ifp, IFCOUNTER_IERRORS, 1);
#endif
rxr->rx_discarded++;
ixgbe_rx_discard(rxr, i);
goto next_desc;
}
/*
** On 82599 which supports a hardware
** LRO (called HW RSC), packets need
** not be fragmented across sequential
** descriptors, rather the next descriptor
** is indicated in bits of the descriptor.
** This also means that we might process
** more than one packet at a time, something
** that has never been true before, it
** required eliminating global chain pointers
** in favor of what we are doing here. -jfv
*/
if (!eop) {
/*
** Figure out the next descriptor
** of this frame.
*/
if (rxr->hw_rsc == TRUE) {
rsc = ixgbe_rsc_count(cur);
rxr->rsc_num += (rsc - 1);
}
if (rsc) { /* Get hardware index */
nextp = ((staterr &
IXGBE_RXDADV_NEXTP_MASK) >>
IXGBE_RXDADV_NEXTP_SHIFT);
} else { /* Just sequential */
nextp = i + 1;
if (nextp == adapter->num_rx_desc)
nextp = 0;
}
nbuf = &rxr->rx_buffers[nextp];
prefetch(nbuf);
}
/*
** Rather than using the fmp/lmp global pointers
** we now keep the head of a packet chain in the
** buffer struct and pass this along from one
** descriptor to the next, until we get EOP.
*/
mp->m_len = len;
/*
** See if there is a stored head
** that determines what we are
*/
sendmp = rbuf->fmp;
if (sendmp != NULL) { /* secondary frag */
rbuf->buf = rbuf->fmp = NULL;
mp->m_flags &= ~M_PKTHDR;
sendmp->m_pkthdr.len += mp->m_len;
} else {
/*
* Optimize. This might be a small packet,
* maybe just a TCP ACK. Do a fast copy that
* is cache aligned into a new mbuf, and
* leave the old mbuf+cluster for re-use.
*/
if (eop && len <= IXGBE_RX_COPY_LEN) {
sendmp = m_gethdr(M_NOWAIT, MT_DATA);
if (sendmp != NULL) {
sendmp->m_data +=
IXGBE_RX_COPY_ALIGN;
ixgbe_bcopy(mp->m_data,
sendmp->m_data, len);
sendmp->m_len = len;
rxr->rx_copies++;
rbuf->flags |= IXGBE_RX_COPY;
}
}
if (sendmp == NULL) {
rbuf->buf = rbuf->fmp = NULL;
sendmp = mp;
}
/* first desc of a non-ps chain */
sendmp->m_flags |= M_PKTHDR;
sendmp->m_pkthdr.len = mp->m_len;
}
++processed;
/* Pass the head pointer on */
if (eop == 0) {
nbuf->fmp = sendmp;
sendmp = NULL;
mp->m_next = nbuf->buf;
} else { /* Sending this frame */
sendmp->m_pkthdr.rcvif = ifp;
rxr->rx_packets++;
/* capture data for AIM */
rxr->bytes += sendmp->m_pkthdr.len;
rxr->rx_bytes += sendmp->m_pkthdr.len;
/* Process vlan info */
if ((rxr->vtag_strip) &&
(staterr & IXGBE_RXD_STAT_VP))
vtag = le16toh(cur->wb.upper.vlan);
if (vtag) {
sendmp->m_pkthdr.ether_vtag = vtag;
sendmp->m_flags |= M_VLANTAG;
}
if ((ifp->if_capenable & IFCAP_RXCSUM) != 0)
ixgbe_rx_checksum(staterr, sendmp, ptype);
/*
* In case of multiqueue, we have RXCSUM.PCSD bit set
* and never cleared. This means we have RSS hash
* available to be used.
*/
if (adapter->num_queues > 1) {
sendmp->m_pkthdr.flowid =
le32toh(cur->wb.lower.hi_dword.rss);
switch (pkt_info & IXGBE_RXDADV_RSSTYPE_MASK) {
case IXGBE_RXDADV_RSSTYPE_IPV4:
M_HASHTYPE_SET(sendmp,
M_HASHTYPE_RSS_IPV4);
break;
case IXGBE_RXDADV_RSSTYPE_IPV4_TCP:
M_HASHTYPE_SET(sendmp,
M_HASHTYPE_RSS_TCP_IPV4);
break;
case IXGBE_RXDADV_RSSTYPE_IPV6:
M_HASHTYPE_SET(sendmp,
M_HASHTYPE_RSS_IPV6);
break;
case IXGBE_RXDADV_RSSTYPE_IPV6_TCP:
M_HASHTYPE_SET(sendmp,
M_HASHTYPE_RSS_TCP_IPV6);
break;
case IXGBE_RXDADV_RSSTYPE_IPV6_EX:
M_HASHTYPE_SET(sendmp,
M_HASHTYPE_RSS_IPV6_EX);
break;
case IXGBE_RXDADV_RSSTYPE_IPV6_TCP_EX:
M_HASHTYPE_SET(sendmp,
M_HASHTYPE_RSS_TCP_IPV6_EX);
break;
#if __FreeBSD_version > 1100000
case IXGBE_RXDADV_RSSTYPE_IPV4_UDP:
M_HASHTYPE_SET(sendmp,
M_HASHTYPE_RSS_UDP_IPV4);
break;
case IXGBE_RXDADV_RSSTYPE_IPV6_UDP:
M_HASHTYPE_SET(sendmp,
M_HASHTYPE_RSS_UDP_IPV6);
break;
case IXGBE_RXDADV_RSSTYPE_IPV6_UDP_EX:
M_HASHTYPE_SET(sendmp,
M_HASHTYPE_RSS_UDP_IPV6_EX);
break;
#endif
default:
M_HASHTYPE_SET(sendmp,
- M_HASHTYPE_OPAQUE);
+ M_HASHTYPE_OPAQUE_HASH);
}
} else {
sendmp->m_pkthdr.flowid = que->msix;
M_HASHTYPE_SET(sendmp, M_HASHTYPE_OPAQUE);
}
}
next_desc:
bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
/* Advance our pointers to the next descriptor. */
if (++i == rxr->num_desc)
i = 0;
/* Now send to the stack or do LRO */
if (sendmp != NULL) {
rxr->next_to_check = i;
ixgbe_rx_input(rxr, ifp, sendmp, ptype);
i = rxr->next_to_check;
}
/* Every 8 descriptors we go to refresh mbufs */
if (processed == 8) {
ixgbe_refresh_mbufs(rxr, i);
processed = 0;
}
}
/* Refresh any remaining buf structs */
if (ixgbe_rx_unrefreshed(rxr))
ixgbe_refresh_mbufs(rxr, i);
rxr->next_to_check = i;
/*
* Flush any outstanding LRO work
*/
tcp_lro_flush_all(lro);
IXGBE_RX_UNLOCK(rxr);
/*
** Still have cleaning to do?
*/
if ((staterr & IXGBE_RXD_STAT_DD) != 0)
return (TRUE);
else
return (FALSE);
}
/*********************************************************************
*
* Verify that the hardware indicated that the checksum is valid.
* Inform the stack about the checksum status so that the stack
* does not spend time verifying the checksum.
*
*********************************************************************/
static void
ixgbe_rx_checksum(u32 staterr, struct mbuf * mp, u32 ptype)
{
u16 status = (u16) staterr;
u8 errors = (u8) (staterr >> 24);
bool sctp = false;
if ((ptype & IXGBE_RXDADV_PKTTYPE_ETQF) == 0 &&
(ptype & IXGBE_RXDADV_PKTTYPE_SCTP) != 0)
sctp = true;
/* IPv4 checksum */
if (status & IXGBE_RXD_STAT_IPCS) {
mp->m_pkthdr.csum_flags |= CSUM_L3_CALC;
/* IP Checksum Good */
if (!(errors & IXGBE_RXD_ERR_IPE))
mp->m_pkthdr.csum_flags |= CSUM_L3_VALID;
}
/* TCP/UDP/SCTP checksum */
if (status & IXGBE_RXD_STAT_L4CS) {
mp->m_pkthdr.csum_flags |= CSUM_L4_CALC;
if (!(errors & IXGBE_RXD_ERR_TCPE)) {
mp->m_pkthdr.csum_flags |= CSUM_L4_VALID;
if (!sctp)
mp->m_pkthdr.csum_data = htons(0xffff);
}
}
}
/********************************************************************
* Manage DMA'able memory.
*******************************************************************/
static void
ixgbe_dmamap_cb(void *arg, bus_dma_segment_t * segs, int nseg, int error)
{
if (error)
return;
*(bus_addr_t *) arg = segs->ds_addr;
return;
}
int
ixgbe_dma_malloc(struct adapter *adapter, bus_size_t size,
struct ixgbe_dma_alloc *dma, int mapflags)
{
device_t dev = adapter->dev;
int r;
r = bus_dma_tag_create(bus_get_dma_tag(adapter->dev), /* parent */
DBA_ALIGN, 0, /* alignment, bounds */
BUS_SPACE_MAXADDR, /* lowaddr */
BUS_SPACE_MAXADDR, /* highaddr */
NULL, NULL, /* filter, filterarg */
size, /* maxsize */
1, /* nsegments */
size, /* maxsegsize */
BUS_DMA_ALLOCNOW, /* flags */
NULL, /* lockfunc */
NULL, /* lockfuncarg */
&dma->dma_tag);
if (r != 0) {
device_printf(dev, "ixgbe_dma_malloc: bus_dma_tag_create failed; "
"error %d\n", r);
goto fail_0;
}
r = bus_dmamem_alloc(dma->dma_tag, (void **)&dma->dma_vaddr,
BUS_DMA_NOWAIT, &dma->dma_map);
if (r != 0) {
device_printf(dev, "ixgbe_dma_malloc: bus_dmamem_alloc failed; "
"error %d\n", r);
goto fail_1;
}
r = bus_dmamap_load(dma->dma_tag, dma->dma_map, dma->dma_vaddr,
size,
ixgbe_dmamap_cb,
&dma->dma_paddr,
mapflags | BUS_DMA_NOWAIT);
if (r != 0) {
device_printf(dev, "ixgbe_dma_malloc: bus_dmamap_load failed; "
"error %d\n", r);
goto fail_2;
}
dma->dma_size = size;
return (0);
fail_2:
bus_dmamem_free(dma->dma_tag, dma->dma_vaddr, dma->dma_map);
fail_1:
bus_dma_tag_destroy(dma->dma_tag);
fail_0:
dma->dma_tag = NULL;
return (r);
}
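/*
 * Illustrative sketch, not part of the driver: ixgbe_dma_malloc() above
 * acquires a tag, then memory, then a mapping, and on failure jumps to
 * a label that releases only what was already acquired. The userland
 * code below shows the same goto-unwind shape with malloc(3) standing
 * in for the bus_dma calls. struct sk_dma, sk_dma_acquire() and the
 * fail_at knob are hypothetical names invented for this example.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DMA allocation: a tag, memory, a mapping. */
struct sk_dma {
	void	*tag;
	void	*mem;
	int	 loaded;
};

/*
 * Acquire three resources in order. fail_at forces step 1, 2 or 3 to
 * fail (0 means no forced failure) so the unwind paths can be exercised.
 */
static int
sk_dma_acquire(struct sk_dma *d, int fail_at)
{
	int error;

	assert(d != NULL);
	d->mem = NULL;
	d->loaded = 0;

	d->tag = (fail_at == 1) ? NULL : malloc(16);
	if (d->tag == NULL) {
		error = ENOMEM;
		goto fail_0;
	}
	d->mem = (fail_at == 2) ? NULL : malloc(64);
	if (d->mem == NULL) {
		error = ENOMEM;
		goto fail_1;
	}
	if (fail_at == 3) {		/* pretend the "load" step failed */
		error = EIO;
		goto fail_2;
	}
	d->loaded = 1;
	return (0);

fail_2:
	free(d->mem);
	d->mem = NULL;
fail_1:
	free(d->tag);
fail_0:
	d->tag = NULL;
	return (error);
}

/* Release in the reverse order of acquisition. */
static void
sk_dma_release(struct sk_dma *d)
{
	d->loaded = 0;
	free(d->mem);
	d->mem = NULL;
	free(d->tag);
	d->tag = NULL;
}
```

/*
 * Each label undoes exactly one step and falls through to the next, so
 * a new acquisition step only needs one new label and one new release.
 */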
void
ixgbe_dma_free(struct adapter *adapter, struct ixgbe_dma_alloc *dma)
{
bus_dmamap_sync(dma->dma_tag, dma->dma_map,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(dma->dma_tag, dma->dma_map);
bus_dmamem_free(dma->dma_tag, dma->dma_vaddr, dma->dma_map);
bus_dma_tag_destroy(dma->dma_tag);
}
/*********************************************************************
*
* Allocate memory for the transmit and receive rings, and then
* the descriptors associated with each, called only once at attach.
*
**********************************************************************/
int
ixgbe_allocate_queues(struct adapter *adapter)
{
device_t dev = adapter->dev;
struct ix_queue *que;
struct tx_ring *txr;
struct rx_ring *rxr;
int rsize, tsize, error = IXGBE_SUCCESS;
int txconf = 0, rxconf = 0;
#ifdef PCI_IOV
enum ixgbe_iov_mode iov_mode;
#endif
/* First allocate the top level queue structs */
if (!(adapter->queues =
(struct ix_queue *) malloc(sizeof(struct ix_queue) *
adapter->num_queues, M_DEVBUF, M_NOWAIT | M_ZERO))) {
device_printf(dev, "Unable to allocate queue memory\n");
error = ENOMEM;
goto fail;
}
/* Next allocate the TX ring struct memory */
if (!(adapter->tx_rings =
(struct tx_ring *) malloc(sizeof(struct tx_ring) *
adapter->num_queues, M_DEVBUF, M_NOWAIT | M_ZERO))) {
device_printf(dev, "Unable to allocate TX ring memory\n");
error = ENOMEM;
goto tx_fail;
}
/* Then allocate the RX ring struct memory */
if (!(adapter->rx_rings =
(struct rx_ring *) malloc(sizeof(struct rx_ring) *
adapter->num_queues, M_DEVBUF, M_NOWAIT | M_ZERO))) {
device_printf(dev, "Unable to allocate RX ring memory\n");
error = ENOMEM;
goto rx_fail;
}
/* For the ring itself */
tsize = roundup2(adapter->num_tx_desc *
sizeof(union ixgbe_adv_tx_desc), DBA_ALIGN);
#ifdef PCI_IOV
iov_mode = ixgbe_get_iov_mode(adapter);
adapter->pool = ixgbe_max_vfs(iov_mode);
#else
adapter->pool = 0;
#endif
/*
* Now set up the TX queues. txconf is needed to handle the
* possibility that things fail midcourse and we need to
* undo the allocations gracefully.
*/
for (int i = 0; i < adapter->num_queues; i++, txconf++) {
/* Set up some basics */
txr = &adapter->tx_rings[i];
txr->adapter = adapter;
#ifdef PCI_IOV
txr->me = ixgbe_pf_que_index(iov_mode, i);
#else
txr->me = i;
#endif
txr->num_desc = adapter->num_tx_desc;
/* Initialize the TX side lock */
snprintf(txr->mtx_name, sizeof(txr->mtx_name), "%s:tx(%d)",
device_get_nameunit(dev), txr->me);
mtx_init(&txr->tx_mtx, txr->mtx_name, NULL, MTX_DEF);
if (ixgbe_dma_malloc(adapter, tsize,
&txr->txdma, BUS_DMA_NOWAIT)) {
device_printf(dev,
"Unable to allocate TX Descriptor memory\n");
error = ENOMEM;
goto err_tx_desc;
}
txr->tx_base = (union ixgbe_adv_tx_desc *)txr->txdma.dma_vaddr;
bzero((void *)txr->tx_base, tsize);
/* Now allocate transmit buffers for the ring */
if (ixgbe_allocate_transmit_buffers(txr)) {
device_printf(dev,
"Critical Failure setting up transmit buffers\n");
error = ENOMEM;
goto err_tx_desc;
}
#ifndef IXGBE_LEGACY_TX
/* Allocate a buf ring */
txr->br = buf_ring_alloc(IXGBE_BR_SIZE, M_DEVBUF,
M_WAITOK, &txr->tx_mtx);
if (txr->br == NULL) {
device_printf(dev,
"Critical Failure setting up buf ring\n");
error = ENOMEM;
goto err_tx_desc;
}
#endif
}
/*
* Next the RX queues...
*/
rsize = roundup2(adapter->num_rx_desc *
sizeof(union ixgbe_adv_rx_desc), DBA_ALIGN);
for (int i = 0; i < adapter->num_queues; i++, rxconf++) {
rxr = &adapter->rx_rings[i];
/* Set up some basics */
rxr->adapter = adapter;
#ifdef PCI_IOV
rxr->me = ixgbe_pf_que_index(iov_mode, i);
#else
rxr->me = i;
#endif
rxr->num_desc = adapter->num_rx_desc;
/* Initialize the RX side lock */
snprintf(rxr->mtx_name, sizeof(rxr->mtx_name), "%s:rx(%d)",
device_get_nameunit(dev), rxr->me);
mtx_init(&rxr->rx_mtx, rxr->mtx_name, NULL, MTX_DEF);
if (ixgbe_dma_malloc(adapter, rsize,
&rxr->rxdma, BUS_DMA_NOWAIT)) {
device_printf(dev,
"Unable to allocate RX Descriptor memory\n");
error = ENOMEM;
goto err_rx_desc;
}
rxr->rx_base = (union ixgbe_adv_rx_desc *)rxr->rxdma.dma_vaddr;
bzero((void *)rxr->rx_base, rsize);
/* Allocate receive buffers for the ring */
if (ixgbe_allocate_receive_buffers(rxr)) {
device_printf(dev,
"Critical Failure setting up receive buffers\n");
error = ENOMEM;
goto err_rx_desc;
}
}
/*
** Finally set up the queue holding structs
*/
for (int i = 0; i < adapter->num_queues; i++) {
que = &adapter->queues[i];
que->adapter = adapter;
que->me = i;
que->txr = &adapter->tx_rings[i];
que->rxr = &adapter->rx_rings[i];
}
return (0);
err_rx_desc:
for (rxr = adapter->rx_rings; rxconf > 0; rxr++, rxconf--)
ixgbe_dma_free(adapter, &rxr->rxdma);
err_tx_desc:
for (txr = adapter->tx_rings; txconf > 0; txr++, txconf--)
ixgbe_dma_free(adapter, &txr->txdma);
free(adapter->rx_rings, M_DEVBUF);
rx_fail:
free(adapter->tx_rings, M_DEVBUF);
tx_fail:
free(adapter->queues, M_DEVBUF);
fail:
return (error);
}
Index: projects/vnet/sys/dev/ixl/ixl_txrx.c
===================================================================
--- projects/vnet/sys/dev/ixl/ixl_txrx.c (revision 301546)
+++ projects/vnet/sys/dev/ixl/ixl_txrx.c (revision 301547)
@@ -1,1831 +1,1831 @@
/******************************************************************************
Copyright (c) 2013-2015, Intel Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of the Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
******************************************************************************/
/*$FreeBSD$*/
/*
** IXL driver TX/RX Routines:
** This was separated to allow usage by
** both the BASE and the VF drivers.
*/
#ifndef IXL_STANDALONE_BUILD
#include "opt_inet.h"
#include "opt_inet6.h"
#include "opt_rss.h"
#endif
#include "ixl.h"
#ifdef RSS
#include
#endif
/* Local Prototypes */
static void ixl_rx_checksum(struct mbuf *, u32, u32, u8);
static void ixl_refresh_mbufs(struct ixl_queue *, int);
static int ixl_xmit(struct ixl_queue *, struct mbuf **);
static int ixl_tx_setup_offload(struct ixl_queue *,
struct mbuf *, u32 *, u32 *);
static bool ixl_tso_setup(struct ixl_queue *, struct mbuf *);
static __inline void ixl_rx_discard(struct rx_ring *, int);
static __inline void ixl_rx_input(struct rx_ring *, struct ifnet *,
struct mbuf *, u8);
#ifdef DEV_NETMAP
#include
#endif /* DEV_NETMAP */
/*
** Multiqueue Transmit driver
*/
int
ixl_mq_start(struct ifnet *ifp, struct mbuf *m)
{
struct ixl_vsi *vsi = ifp->if_softc;
struct ixl_queue *que;
struct tx_ring *txr;
int err, i;
#ifdef RSS
u32 bucket_id;
#endif
/*
** Which queue to use:
**
** When doing RSS, map it to the same outbound
** queue as the incoming flow would be mapped to.
** If everything is set up correctly, it should be
** the same bucket as the CPU we are currently running on.
*/
if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) {
#ifdef RSS
if (rss_hash2bucket(m->m_pkthdr.flowid,
M_HASHTYPE_GET(m), &bucket_id) == 0) {
i = bucket_id % vsi->num_queues;
} else
#endif
i = m->m_pkthdr.flowid % vsi->num_queues;
} else
i = curcpu % vsi->num_queues;
/*
** This may not be perfect, but until something
** better comes along it keeps us from scheduling
** on stalled queues.
*/
if (((1 << i) & vsi->active_queues) == 0)
i = ffsl(vsi->active_queues);
que = &vsi->queues[i];
txr = &que->txr;
err = drbr_enqueue(ifp, txr->br, m);
if (err)
return (err);
if (IXL_TX_TRYLOCK(txr)) {
ixl_mq_start_locked(ifp, txr);
IXL_TX_UNLOCK(txr);
} else
taskqueue_enqueue(que->tq, &que->tx_task);
return (0);
}
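/*
 * Illustrative sketch, not part of the driver: ixl_mq_start() above
 * picks a TX queue from the flow hash when the mbuf carries one, from
 * the current CPU otherwise, and then moves off a queue whose bit is
 * clear in the active-queue mask. The function below models only that
 * selection step. sk_pick_queue() and its parameters are hypothetical.
 * One deliberate difference: the fallback here searches for the lowest
 * set bit and returns a zero-based index, whereas ffsl(3) returns a
 * one-based bit position.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int
sk_pick_queue(bool has_hash, uint32_t flowid, int cpu, int num_queues,
    uint64_t active)
{
	int i;

	assert(num_queues > 0 && num_queues <= 64);
	assert(cpu >= 0);

	/* Prefer the flow hash so one flow always lands on one queue. */
	if (has_hash)
		i = (int)(flowid % (uint32_t)num_queues);
	else
		i = cpu % num_queues;

	/* If that queue is marked stalled, use the lowest active one. */
	if ((active & ((uint64_t)1 << i)) == 0) {
		for (int q = 0; q < num_queues; q++) {
			if (active & ((uint64_t)1 << q))
				return (q);
		}
	}
	return (i);	/* chosen queue, or unchanged if none is active */
}
```

/*
 * Hashing on the flow keeps packets of one connection in order; the
 * CPU fallback spreads unhashed traffic without any shared state.
 */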
int
ixl_mq_start_locked(struct ifnet *ifp, struct tx_ring *txr)
{
struct ixl_queue *que = txr->que;
struct ixl_vsi *vsi = que->vsi;
struct mbuf *next;
int err = 0;
if (((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) ||
vsi->link_active == 0)
return (ENETDOWN);
/* Process the transmit queue */
while ((next = drbr_peek(ifp, txr->br)) != NULL) {
if ((err = ixl_xmit(que, &next)) != 0) {
if (next == NULL)
drbr_advance(ifp, txr->br);
else
drbr_putback(ifp, txr->br, next);
break;
}
drbr_advance(ifp, txr->br);
/* Send a copy of the frame to the BPF listener */
ETHER_BPF_MTAP(ifp, next);
if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
break;
}
if (txr->avail < IXL_TX_CLEANUP_THRESHOLD)
ixl_txeof(que);
return (err);
}
/*
* Called from a taskqueue to drain queued transmit packets.
*/
void
ixl_deferred_mq_start(void *arg, int pending)
{
struct ixl_queue *que = arg;
struct tx_ring *txr = &que->txr;
struct ixl_vsi *vsi = que->vsi;
struct ifnet *ifp = vsi->ifp;
IXL_TX_LOCK(txr);
if (!drbr_empty(ifp, txr->br))
ixl_mq_start_locked(ifp, txr);
IXL_TX_UNLOCK(txr);
}
/*
** Flush all queue ring buffers
*/
void
ixl_qflush(struct ifnet *ifp)
{
struct ixl_vsi *vsi = ifp->if_softc;
for (int i = 0; i < vsi->num_queues; i++) {
struct ixl_queue *que = &vsi->queues[i];
struct tx_ring *txr = &que->txr;
struct mbuf *m;
IXL_TX_LOCK(txr);
while ((m = buf_ring_dequeue_sc(txr->br)) != NULL)
m_freem(m);
IXL_TX_UNLOCK(txr);
}
if_qflush(ifp);
}
/*
** Find mbuf chains passed to the driver
** that are 'sparse', using more than 8
** mbufs to deliver an mss-size chunk of data
*/
static inline bool
ixl_tso_detect_sparse(struct mbuf *mp)
{
struct mbuf *m;
int num = 0, mss;
bool ret = FALSE;
mss = mp->m_pkthdr.tso_segsz;
for (m = mp->m_next; m != NULL; m = m->m_next) {
num++;
mss -= m->m_len;
if (mss < 1)
break;
if (m->m_next == NULL)
break;
}
if (num > IXL_SPARSE_CHAIN)
ret = TRUE;
return (ret);
}
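/*
 * Illustrative sketch, not part of the driver: the sparse-chain check
 * above walks the mbufs after the head, counting how many it takes to
 * cover one MSS worth of data, and flags the chain when that count
 * exceeds a threshold. struct sk_mbuf is a hypothetical two-field
 * stand-in for struct mbuf, and SK_SPARSE_CHAIN is an assumed
 * threshold taken from the "more than 8" wording in the comment, not
 * from the real IXL_SPARSE_CHAIN definition.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_SPARSE_CHAIN	8	/* assumed threshold for this sketch */

struct sk_mbuf {
	struct sk_mbuf	*next;
	int		 len;
};

static bool
sk_chain_is_sparse(const struct sk_mbuf *head, int mss)
{
	int num = 0;

	assert(head != NULL);

	/* Like the driver, skip the head and stop once one MSS is covered. */
	for (const struct sk_mbuf *m = head->next; m != NULL; m = m->next) {
		num++;
		mss -= m->len;
		if (mss < 1)
			break;
	}
	return (num > SK_SPARSE_CHAIN);
}
```

/*
 * A chain flagged this way is a candidate for m_defrag(), which copies
 * the data into fewer, fuller mbufs before DMA mapping is attempted.
 */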
/*********************************************************************
*
* This routine maps the mbufs to tx descriptors, allowing the
* TX engine to transmit the packets.
* - return 0 on success, positive on failure
*
**********************************************************************/
#define IXL_TXD_CMD (I40E_TX_DESC_CMD_EOP | I40E_TX_DESC_CMD_RS)
static int
ixl_xmit(struct ixl_queue *que, struct mbuf **m_headp)
{
struct ixl_vsi *vsi = que->vsi;
struct i40e_hw *hw = vsi->hw;
struct tx_ring *txr = &que->txr;
struct ixl_tx_buf *buf;
struct i40e_tx_desc *txd = NULL;
struct mbuf *m_head, *m;
int i, j, error, nsegs, maxsegs;
int first, last = 0;
u16 vtag = 0;
u32 cmd, off;
bus_dmamap_t map;
bus_dma_tag_t tag;
bus_dma_segment_t segs[IXL_MAX_TSO_SEGS];
cmd = off = 0;
m_head = *m_headp;
/*
* Important to capture the first descriptor
* used because it will contain the index of
* the one we tell the hardware to report back
*/
first = txr->next_avail;
buf = &txr->buffers[first];
map = buf->map;
tag = txr->tx_tag;
maxsegs = IXL_MAX_TX_SEGS;
if (m_head->m_pkthdr.csum_flags & CSUM_TSO) {
/* Use larger mapping for TSO */
tag = txr->tso_tag;
maxsegs = IXL_MAX_TSO_SEGS;
if (ixl_tso_detect_sparse(m_head)) {
m = m_defrag(m_head, M_NOWAIT);
if (m == NULL) {
m_freem(*m_headp);
*m_headp = NULL;
return (ENOBUFS);
}
*m_headp = m;
}
}
/*
* Map the packet for DMA.
*/
error = bus_dmamap_load_mbuf_sg(tag, map,
*m_headp, segs, &nsegs, BUS_DMA_NOWAIT);
if (error == EFBIG) {
m = m_defrag(*m_headp, M_NOWAIT);
if (m == NULL) {
que->mbuf_defrag_failed++;
m_freem(*m_headp);
*m_headp = NULL;
return (ENOBUFS);
}
*m_headp = m;
/* Try it again */
error = bus_dmamap_load_mbuf_sg(tag, map,
*m_headp, segs, &nsegs, BUS_DMA_NOWAIT);
if (error == ENOMEM) {
que->tx_dma_setup++;
return (error);
} else if (error != 0) {
que->tx_dma_setup++;
m_freem(*m_headp);
*m_headp = NULL;
return (error);
}
} else if (error == ENOMEM) {
que->tx_dma_setup++;
return (error);
} else if (error != 0) {
que->tx_dma_setup++;
m_freem(*m_headp);
*m_headp = NULL;
return (error);
}
/* Make certain there are enough descriptors */
if (nsegs > txr->avail - 2) {
txr->no_desc++;
error = ENOBUFS;
goto xmit_fail;
}
m_head = *m_headp;
/* Set up the TSO/CSUM offload */
if (m_head->m_pkthdr.csum_flags & CSUM_OFFLOAD) {
error = ixl_tx_setup_offload(que, m_head, &cmd, &off);
if (error)
goto xmit_fail;
}
cmd |= I40E_TX_DESC_CMD_ICRC;
/* Grab the VLAN tag */
if (m_head->m_flags & M_VLANTAG) {
cmd |= I40E_TX_DESC_CMD_IL2TAG1;
vtag = htole16(m_head->m_pkthdr.ether_vtag);
}
i = txr->next_avail;
for (j = 0; j < nsegs; j++) {
bus_size_t seglen;
buf = &txr->buffers[i];
buf->tag = tag; /* Keep track of the type tag */
txd = &txr->base[i];
seglen = segs[j].ds_len;
txd->buffer_addr = htole64(segs[j].ds_addr);
txd->cmd_type_offset_bsz =
htole64(I40E_TX_DESC_DTYPE_DATA
| ((u64)cmd << I40E_TXD_QW1_CMD_SHIFT)
| ((u64)off << I40E_TXD_QW1_OFFSET_SHIFT)
| ((u64)seglen << I40E_TXD_QW1_TX_BUF_SZ_SHIFT)
| ((u64)vtag << I40E_TXD_QW1_L2TAG1_SHIFT));
last = i; /* descriptor that will get completion IRQ */
if (++i == que->num_desc)
i = 0;
buf->m_head = NULL;
buf->eop_index = -1;
}
/* Set the last descriptor for report */
txd->cmd_type_offset_bsz |=
htole64(((u64)IXL_TXD_CMD << I40E_TXD_QW1_CMD_SHIFT));
txr->avail -= nsegs;
txr->next_avail = i;
buf->m_head = m_head;
/* Swap the dma map between the first and last descriptor */
txr->buffers[first].map = buf->map;
buf->map = map;
bus_dmamap_sync(tag, map, BUS_DMASYNC_PREWRITE);
/* Set the index of the descriptor that will be marked done */
buf = &txr->buffers[first];
buf->eop_index = last;
bus_dmamap_sync(txr->dma.tag, txr->dma.map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
/*
* Advance the Transmit Descriptor Tail (TDT); this tells the
* hardware that this frame is available to transmit.
*/
++txr->total_packets;
wr32(hw, txr->tail, i);
/* Mark outstanding work */
if (que->busy == 0)
que->busy = 1;
return (0);
xmit_fail:
bus_dmamap_unload(tag, buf->map);
return (error);
}
/*********************************************************************
*
* Allocate memory for tx_buffer structures. The tx_buffer stores all
* the information needed to transmit a packet on the wire. This is
* called only once at attach; setup is redone on every reset.
*
**********************************************************************/
int
ixl_allocate_tx_data(struct ixl_queue *que)
{
struct tx_ring *txr = &que->txr;
struct ixl_vsi *vsi = que->vsi;
device_t dev = vsi->dev;
struct ixl_tx_buf *buf;
int error = 0;
/*
* Setup DMA descriptor areas.
*/
if ((error = bus_dma_tag_create(NULL, /* parent */
1, 0, /* alignment, bounds */
BUS_SPACE_MAXADDR, /* lowaddr */
BUS_SPACE_MAXADDR, /* highaddr */
NULL, NULL, /* filter, filterarg */
IXL_TSO_SIZE, /* maxsize */
IXL_MAX_TX_SEGS, /* nsegments */
PAGE_SIZE, /* maxsegsize */
0, /* flags */
NULL, /* lockfunc */
NULL, /* lockfuncarg */
&txr->tx_tag))) {
device_printf(dev,"Unable to allocate TX DMA tag\n");
goto fail;
}
/* Make a special tag for TSO */
if ((error = bus_dma_tag_create(NULL, /* parent */
1, 0, /* alignment, bounds */
BUS_SPACE_MAXADDR, /* lowaddr */
BUS_SPACE_MAXADDR, /* highaddr */
NULL, NULL, /* filter, filterarg */
IXL_TSO_SIZE, /* maxsize */
IXL_MAX_TSO_SEGS, /* nsegments */
PAGE_SIZE, /* maxsegsize */
0, /* flags */
NULL, /* lockfunc */
NULL, /* lockfuncarg */
&txr->tso_tag))) {
device_printf(dev,"Unable to allocate TX TSO DMA tag\n");
goto fail;
}
if (!(txr->buffers =
(struct ixl_tx_buf *) malloc(sizeof(struct ixl_tx_buf) *
que->num_desc, M_DEVBUF, M_NOWAIT | M_ZERO))) {
device_printf(dev, "Unable to allocate tx_buffer memory\n");
error = ENOMEM;
goto fail;
}
/* Create the descriptor buffer default dma maps */
buf = txr->buffers;
for (int i = 0; i < que->num_desc; i++, buf++) {
buf->tag = txr->tx_tag;
error = bus_dmamap_create(buf->tag, 0, &buf->map);
if (error != 0) {
device_printf(dev, "Unable to create TX DMA map\n");
goto fail;
}
}
fail:
return (error);
}
/*********************************************************************
*
* (Re)Initialize a queue transmit ring.
* - called by init, it clears the descriptor ring,
* and frees any stale mbufs
*
**********************************************************************/
void
ixl_init_tx_ring(struct ixl_queue *que)
{
#ifdef DEV_NETMAP
struct netmap_adapter *na = NA(que->vsi->ifp);
struct netmap_slot *slot;
#endif /* DEV_NETMAP */
struct tx_ring *txr = &que->txr;
struct ixl_tx_buf *buf;
/* Clear the old ring contents */
IXL_TX_LOCK(txr);
#ifdef DEV_NETMAP
/*
* (under lock): if in netmap mode, do some consistency
* checks and set slot to entry 0 of the netmap ring.
*/
slot = netmap_reset(na, NR_TX, que->me, 0);
#endif /* DEV_NETMAP */
bzero((void *)txr->base,
(sizeof(struct i40e_tx_desc)) * que->num_desc);
/* Reset indices */
txr->next_avail = 0;
txr->next_to_clean = 0;
#ifdef IXL_FDIR
/* Initialize flow director */
txr->atr_rate = ixl_atr_rate;
txr->atr_count = 0;
#endif
/* Free any existing tx mbufs. */
buf = txr->buffers;
for (int i = 0; i < que->num_desc; i++, buf++) {
if (buf->m_head != NULL) {
bus_dmamap_sync(buf->tag, buf->map,
BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(buf->tag, buf->map);
m_freem(buf->m_head);
buf->m_head = NULL;
}
#ifdef DEV_NETMAP
/*
* In netmap mode, set the map for the packet buffer.
* NOTE: Some drivers (not this one) also need to set
* the physical buffer address in the NIC ring.
* netmap_idx_n2k() maps a nic index, i, into the corresponding
* netmap slot index, si
*/
if (slot) {
int si = netmap_idx_n2k(&na->tx_rings[que->me], i);
netmap_load_map(na, buf->tag, buf->map, NMB(na, slot + si));
}
#endif /* DEV_NETMAP */
/* Clear the EOP index */
buf->eop_index = -1;
}
/* Set number of descriptors available */
txr->avail = que->num_desc;
bus_dmamap_sync(txr->dma.tag, txr->dma.map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
IXL_TX_UNLOCK(txr);
}
/*********************************************************************
*
* Free transmit ring related data structures.
*
**********************************************************************/
void
ixl_free_que_tx(struct ixl_queue *que)
{
struct tx_ring *txr = &que->txr;
struct ixl_tx_buf *buf;
INIT_DBG_IF(que->vsi->ifp, "queue %d: begin", que->me);
for (int i = 0; i < que->num_desc; i++) {
buf = &txr->buffers[i];
if (buf->m_head != NULL) {
bus_dmamap_sync(buf->tag, buf->map,
BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(buf->tag,
buf->map);
m_freem(buf->m_head);
buf->m_head = NULL;
if (buf->map != NULL) {
bus_dmamap_destroy(buf->tag,
buf->map);
buf->map = NULL;
}
} else if (buf->map != NULL) {
bus_dmamap_unload(buf->tag,
buf->map);
bus_dmamap_destroy(buf->tag,
buf->map);
buf->map = NULL;
}
}
if (txr->br != NULL)
buf_ring_free(txr->br, M_DEVBUF);
if (txr->buffers != NULL) {
free(txr->buffers, M_DEVBUF);
txr->buffers = NULL;
}
if (txr->tx_tag != NULL) {
bus_dma_tag_destroy(txr->tx_tag);
txr->tx_tag = NULL;
}
if (txr->tso_tag != NULL) {
bus_dma_tag_destroy(txr->tso_tag);
txr->tso_tag = NULL;
}
INIT_DBG_IF(que->vsi->ifp, "queue %d: end", que->me);
return;
}
/*********************************************************************
*
* Setup descriptor for hw offloads
*
**********************************************************************/
static int
ixl_tx_setup_offload(struct ixl_queue *que,
struct mbuf *mp, u32 *cmd, u32 *off)
{
struct ether_vlan_header *eh;
#ifdef INET
struct ip *ip = NULL;
#endif
struct tcphdr *th = NULL;
#ifdef INET6
struct ip6_hdr *ip6;
#endif
int elen, ip_hlen = 0, tcp_hlen;
u16 etype;
u8 ipproto = 0;
bool tso = FALSE;
/* Set up the TSO context descriptor if required */
if (mp->m_pkthdr.csum_flags & CSUM_TSO) {
tso = ixl_tso_setup(que, mp);
if (tso)
++que->tso;
else
return (ENXIO);
}
/*
* Determine where frame payload starts.
* Jump over vlan headers if already present,
* helpful for QinQ too.
*/
eh = mtod(mp, struct ether_vlan_header *);
if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
etype = ntohs(eh->evl_proto);
elen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
} else {
etype = ntohs(eh->evl_encap_proto);
elen = ETHER_HDR_LEN;
}
switch (etype) {
#ifdef INET
case ETHERTYPE_IP:
ip = (struct ip *)(mp->m_data + elen);
ip_hlen = ip->ip_hl << 2;
ipproto = ip->ip_p;
th = (struct tcphdr *)((caddr_t)ip + ip_hlen);
/* The IP checksum must be recalculated with TSO */
if (tso)
*cmd |= I40E_TX_DESC_CMD_IIPT_IPV4_CSUM;
else
*cmd |= I40E_TX_DESC_CMD_IIPT_IPV4;
break;
#endif
#ifdef INET6
case ETHERTYPE_IPV6:
ip6 = (struct ip6_hdr *)(mp->m_data + elen);
ip_hlen = sizeof(struct ip6_hdr);
ipproto = ip6->ip6_nxt;
th = (struct tcphdr *)((caddr_t)ip6 + ip_hlen);
*cmd |= I40E_TX_DESC_CMD_IIPT_IPV6;
break;
#endif
default:
break;
}
*off |= (elen >> 1) << I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
*off |= (ip_hlen >> 2) << I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
switch (ipproto) {
case IPPROTO_TCP:
tcp_hlen = th->th_off << 2;
if (mp->m_pkthdr.csum_flags & (CSUM_TCP|CSUM_TCP_IPV6)) {
*cmd |= I40E_TX_DESC_CMD_L4T_EOFT_TCP;
*off |= (tcp_hlen >> 2) <<
I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
}
#ifdef IXL_FDIR
ixl_atr(que, th, etype);
#endif
break;
case IPPROTO_UDP:
if (mp->m_pkthdr.csum_flags & (CSUM_UDP|CSUM_UDP_IPV6)) {
*cmd |= I40E_TX_DESC_CMD_L4T_EOFT_UDP;
*off |= (sizeof(struct udphdr) >> 2) <<
I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
}
break;
case IPPROTO_SCTP:
if (mp->m_pkthdr.csum_flags & (CSUM_SCTP|CSUM_SCTP_IPV6)) {
*cmd |= I40E_TX_DESC_CMD_L4T_EOFT_SCTP;
*off |= (sizeof(struct sctphdr) >> 2) <<
I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
}
/* Fall Thru */
default:
break;
}
return (0);
}
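/*
 * Illustrative sketch, not part of the driver: ixl_tx_setup_offload()
 * above encodes three header lengths into one offset word, with the
 * MAC length in 2-byte units and the IP and L4 lengths in 4-byte
 * units. The function below shows that packing in isolation. The SK_*
 * shift values are assumptions for this example and should not be read
 * as the I40E_TX_DESC_LENGTH_* definitions.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions, for illustration only. */
#define SK_MACLEN_SHIFT	0
#define SK_IPLEN_SHIFT	7
#define SK_L4LEN_SHIFT	14

static uint32_t
sk_pack_hdr_lens(unsigned int elen, unsigned int ip_hlen,
    unsigned int l4_hlen)
{
	uint32_t off = 0;

	/* Lengths must be whole multiples of the unit they are stored in. */
	assert((elen & 1) == 0);
	assert((ip_hlen & 3) == 0 && (l4_hlen & 3) == 0);

	off |= (uint32_t)(elen >> 1) << SK_MACLEN_SHIFT;	/* 2-byte words */
	off |= (uint32_t)(ip_hlen >> 2) << SK_IPLEN_SHIFT;	/* 4-byte words */
	off |= (uint32_t)(l4_hlen >> 2) << SK_L4LEN_SHIFT;	/* 4-byte words */
	return (off);
}
```

/*
 * For an untagged IPv4/TCP frame with no options (14, 20 and 20 bytes)
 * the stored fields are 7, 5 and 5.
 */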
/**********************************************************************
*
* Setup context for hardware segmentation offload (TSO)
*
**********************************************************************/
static bool
ixl_tso_setup(struct ixl_queue *que, struct mbuf *mp)
{
struct tx_ring *txr = &que->txr;
struct i40e_tx_context_desc *TXD;
struct ixl_tx_buf *buf;
u32 cmd, mss, type, tsolen;
u16 etype;
int idx, elen, ip_hlen, tcp_hlen;
struct ether_vlan_header *eh;
#ifdef INET
struct ip *ip;
#endif
#ifdef INET6
struct ip6_hdr *ip6;
#endif
#if defined(INET6) || defined(INET)
struct tcphdr *th;
#endif
u64 type_cmd_tso_mss;
/*
* Determine where frame payload starts.
* Jump over vlan headers if already present
*/
eh = mtod(mp, struct ether_vlan_header *);
if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
elen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
etype = eh->evl_proto;
} else {
elen = ETHER_HDR_LEN;
etype = eh->evl_encap_proto;
}
switch (ntohs(etype)) {
#ifdef INET6
case ETHERTYPE_IPV6:
ip6 = (struct ip6_hdr *)(mp->m_data + elen);
if (ip6->ip6_nxt != IPPROTO_TCP)
return (FALSE);
ip_hlen = sizeof(struct ip6_hdr);
th = (struct tcphdr *)((caddr_t)ip6 + ip_hlen);
th->th_sum = in6_cksum_pseudo(ip6, 0, IPPROTO_TCP, 0);
tcp_hlen = th->th_off << 2;
/*
* The corresponding flag is set by the stack in the IPv4
* TSO case, but not in IPv6 (at least in FreeBSD 10.2).
* So, set it here because the rest of the flow requires it.
*/
mp->m_pkthdr.csum_flags |= CSUM_TCP_IPV6;
break;
#endif
#ifdef INET
case ETHERTYPE_IP:
ip = (struct ip *)(mp->m_data + elen);
if (ip->ip_p != IPPROTO_TCP)
return (FALSE);
ip->ip_sum = 0;
ip_hlen = ip->ip_hl << 2;
th = (struct tcphdr *)((caddr_t)ip + ip_hlen);
th->th_sum = in_pseudo(ip->ip_src.s_addr,
ip->ip_dst.s_addr, htons(IPPROTO_TCP));
tcp_hlen = th->th_off << 2;
break;
#endif
default:
printf("%s: CSUM_TSO but no supported IP version (0x%04x)\n",
__func__, ntohs(etype));
return (FALSE);
}
/* Ensure we have at least the IP+TCP header in the first mbuf. */
if (mp->m_len < elen + ip_hlen + sizeof(struct tcphdr))
return FALSE;
idx = txr->next_avail;
buf = &txr->buffers[idx];
TXD = (struct i40e_tx_context_desc *) &txr->base[idx];
tsolen = mp->m_pkthdr.len - (elen + ip_hlen + tcp_hlen);
type = I40E_TX_DESC_DTYPE_CONTEXT;
cmd = I40E_TX_CTX_DESC_TSO;
mss = mp->m_pkthdr.tso_segsz;
type_cmd_tso_mss = ((u64)type << I40E_TXD_CTX_QW1_DTYPE_SHIFT) |
((u64)cmd << I40E_TXD_CTX_QW1_CMD_SHIFT) |
((u64)tsolen << I40E_TXD_CTX_QW1_TSO_LEN_SHIFT) |
((u64)mss << I40E_TXD_CTX_QW1_MSS_SHIFT);
TXD->type_cmd_tso_mss = htole64(type_cmd_tso_mss);
TXD->tunneling_params = htole32(0);
buf->m_head = NULL;
buf->eop_index = -1;
if (++idx == que->num_desc)
idx = 0;
txr->avail--;
txr->next_avail = idx;
return TRUE;
}
/*
** ixl_get_tx_head - Retrieve the value from the
** location the HW records its HEAD index
*/
static inline u32
ixl_get_tx_head(struct ixl_queue *que)
{
struct tx_ring *txr = &que->txr;
void *head = &txr->base[que->num_desc];
return LE32_TO_CPU(*(volatile __le32 *)head);
}
/**********************************************************************
*
* Examine each tx_buffer in the used queue. If the hardware is done
* processing the packet then free associated resources. The
* tx_buffer is put back on the free queue.
*
**********************************************************************/
bool
ixl_txeof(struct ixl_queue *que)
{
struct tx_ring *txr = &que->txr;
u32 first, last, head, done, processed;
struct ixl_tx_buf *buf;
struct i40e_tx_desc *tx_desc, *eop_desc;
mtx_assert(&txr->mtx, MA_OWNED);
#ifdef DEV_NETMAP
// XXX todo: implement moderation
if (netmap_tx_irq(que->vsi->ifp, que->me))
return FALSE;
#endif /* DEV_NETMAP */
/* These are not the descriptors you seek, move along :) */
if (txr->avail == que->num_desc) {
que->busy = 0;
return FALSE;
}
processed = 0;
first = txr->next_to_clean;
buf = &txr->buffers[first];
tx_desc = (struct i40e_tx_desc *)&txr->base[first];
last = buf->eop_index;
if (last == -1)
return FALSE;
eop_desc = (struct i40e_tx_desc *)&txr->base[last];
/* Get the Head WB value */
head = ixl_get_tx_head(que);
/*
** Get the index of the first descriptor
** BEYOND the EOP and call that 'done'.
** I do this so the comparison in the
** inner while loop below can be simple
*/
if (++last == que->num_desc)
last = 0;
done = last;
bus_dmamap_sync(txr->dma.tag, txr->dma.map,
BUS_DMASYNC_POSTREAD);
/*
** The hardware writes the ring's HEAD index to a
** defined location; that value, rather than a done
** bit, is used to track what must be 'cleaned'.
*/
while (first != head) {
/* We clean the range of the packet */
while (first != done) {
++txr->avail;
++processed;
if (buf->m_head) {
txr->bytes += /* for ITR adjustment */
buf->m_head->m_pkthdr.len;
txr->tx_bytes += /* for TX stats */
buf->m_head->m_pkthdr.len;
bus_dmamap_sync(buf->tag,
buf->map,
BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(buf->tag,
buf->map);
m_freem(buf->m_head);
buf->m_head = NULL;
buf->map = NULL;
}
buf->eop_index = -1;
if (++first == que->num_desc)
first = 0;
buf = &txr->buffers[first];
tx_desc = &txr->base[first];
}
++txr->packets;
/* See if there is more work now */
last = buf->eop_index;
if (last != -1) {
eop_desc = &txr->base[last];
/* Get next done point */
if (++last == que->num_desc)
last = 0;
done = last;
} else
break;
}
bus_dmamap_sync(txr->dma.tag, txr->dma.map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
txr->next_to_clean = first;
/*
** Hang detection: we know there is work outstanding,
** or the first return would have been taken, so count
** this as an unsuccessful pass. If the count grows too
** large, local_timer considers the queue hung. If
** anything has been cleaned, reset the state.
*/
if ((processed == 0) && (que->busy != IXL_QUEUE_HUNG))
++que->busy;
if (processed)
que->busy = 1; /* Note this turns off HUNG */
/*
* If there are no pending descriptors, clear the timeout.
*/
if (txr->avail == que->num_desc) {
que->busy = 0;
return FALSE;
}
return TRUE;
}
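/*
 * Illustrative sketch, not part of the driver: ixl_txeof() above
 * reclaims descriptors packet by packet, using the end-of-packet index
 * stored in each packet's first buffer and stopping when it reaches
 * the HEAD value written back by the hardware. The model below keeps
 * only the index arithmetic. struct sk_ring, SK_NDESC and sk_txeof()
 * are hypothetical, and the sketch assumes HEAD only ever lands on a
 * packet boundary.
 */

```c
#include <assert.h>

#define SK_NDESC	8	/* hypothetical ring size */

struct sk_ring {
	int	eop_index[SK_NDESC];	/* -1 unless slot starts a packet */
	int	next_to_clean;
	int	avail;
};

/* Returns the number of descriptors reclaimed. */
static int
sk_txeof(struct sk_ring *r, int head)
{
	int first = r->next_to_clean;
	int last = r->eop_index[first];
	int done, processed = 0;

	assert(head >= 0 && head < SK_NDESC);
	if (last == -1)
		return (0);		/* nothing queued at this slot */

	done = (last + 1) % SK_NDESC;	/* first slot beyond the packet */
	while (first != head) {
		/* Reclaim every descriptor of the current packet. */
		while (first != done) {
			r->eop_index[first] = -1;
			r->avail++;
			processed++;
			first = (first + 1) % SK_NDESC;
		}
		/* Is another packet queued right behind it? */
		last = r->eop_index[first];
		if (last == -1)
			break;
		done = (last + 1) % SK_NDESC;
	}
	r->next_to_clean = first;
	return (processed);
}
```

/*
 * Computing "done" as one past the end-of-packet index is what lets
 * the inner loop use a plain inequality even when a packet wraps
 * around the end of the ring.
 */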
/*********************************************************************
*
* Refresh mbuf buffers for RX descriptor rings
* - now keeps its own state, so discards due to resource
* exhaustion are unnecessary. If an mbuf cannot be obtained
* it just returns, keeping its placeholder, so it can simply
* be called again to retry.
*
**********************************************************************/
static void
ixl_refresh_mbufs(struct ixl_queue *que, int limit)
{
struct ixl_vsi *vsi = que->vsi;
struct rx_ring *rxr = &que->rxr;
bus_dma_segment_t hseg[1];
bus_dma_segment_t pseg[1];
struct ixl_rx_buf *buf;
struct mbuf *mh, *mp;
int i, j, nsegs, error;
bool refreshed = FALSE;
i = j = rxr->next_refresh;
/* Control the loop with one beyond */
if (++j == que->num_desc)
j = 0;
while (j != limit) {
buf = &rxr->buffers[i];
if (rxr->hdr_split == FALSE)
goto no_split;
if (buf->m_head == NULL) {
mh = m_gethdr(M_NOWAIT, MT_DATA);
if (mh == NULL)
goto update;
} else
mh = buf->m_head;
mh->m_pkthdr.len = mh->m_len = MHLEN;
mh->m_flags |= M_PKTHDR;
/* Get the memory mapping */
error = bus_dmamap_load_mbuf_sg(rxr->htag,
buf->hmap, mh, hseg, &nsegs, BUS_DMA_NOWAIT);
if (error != 0) {
printf("Refresh mbufs: hdr dmamap load"
" failure - %d\n", error);
m_free(mh);
buf->m_head = NULL;
goto update;
}
buf->m_head = mh;
bus_dmamap_sync(rxr->htag, buf->hmap,
BUS_DMASYNC_PREREAD);
rxr->base[i].read.hdr_addr =
htole64(hseg[0].ds_addr);
no_split:
if (buf->m_pack == NULL) {
mp = m_getjcl(M_NOWAIT, MT_DATA,
M_PKTHDR, rxr->mbuf_sz);
if (mp == NULL)
goto update;
} else
mp = buf->m_pack;
mp->m_pkthdr.len = mp->m_len = rxr->mbuf_sz;
/* Get the memory mapping */
error = bus_dmamap_load_mbuf_sg(rxr->ptag,
buf->pmap, mp, pseg, &nsegs, BUS_DMA_NOWAIT);
if (error != 0) {
printf("Refresh mbufs: payload dmamap load"
" failure - %d\n", error);
m_free(mp);
buf->m_pack = NULL;
goto update;
}
buf->m_pack = mp;
bus_dmamap_sync(rxr->ptag, buf->pmap,
BUS_DMASYNC_PREREAD);
rxr->base[i].read.pkt_addr =
htole64(pseg[0].ds_addr);
/* Used only when doing header split */
rxr->base[i].read.hdr_addr = 0;
refreshed = TRUE;
/* Next is precalculated */
i = j;
rxr->next_refresh = i;
if (++j == que->num_desc)
j = 0;
}
update:
if (refreshed) /* Update hardware tail index */
wr32(vsi->hw, rxr->tail, rxr->next_refresh);
return;
}
/*********************************************************************
*
* Allocate memory for rx_buffer structures. Since we use one
* rx_buffer per descriptor, the maximum number of rx_buffers
* that we'll need is equal to the number of receive descriptors
* that we've defined.
*
**********************************************************************/
int
ixl_allocate_rx_data(struct ixl_queue *que)
{
struct rx_ring *rxr = &que->rxr;
struct ixl_vsi *vsi = que->vsi;
device_t dev = vsi->dev;
struct ixl_rx_buf *buf;
int i, bsize, error;
bsize = sizeof(struct ixl_rx_buf) * que->num_desc;
if (!(rxr->buffers =
(struct ixl_rx_buf *) malloc(bsize,
M_DEVBUF, M_NOWAIT | M_ZERO))) {
device_printf(dev, "Unable to allocate rx_buffer memory\n");
error = ENOMEM;
return (error);
}
if ((error = bus_dma_tag_create(NULL, /* parent */
1, 0, /* alignment, bounds */
BUS_SPACE_MAXADDR, /* lowaddr */
BUS_SPACE_MAXADDR, /* highaddr */
NULL, NULL, /* filter, filterarg */
MSIZE, /* maxsize */
1, /* nsegments */
MSIZE, /* maxsegsize */
0, /* flags */
NULL, /* lockfunc */
NULL, /* lockfuncarg */
&rxr->htag))) {
device_printf(dev, "Unable to create RX DMA htag\n");
return (error);
}
if ((error = bus_dma_tag_create(NULL, /* parent */
1, 0, /* alignment, bounds */
BUS_SPACE_MAXADDR, /* lowaddr */
BUS_SPACE_MAXADDR, /* highaddr */
NULL, NULL, /* filter, filterarg */
MJUM16BYTES, /* maxsize */
1, /* nsegments */
MJUM16BYTES, /* maxsegsize */
0, /* flags */
NULL, /* lockfunc */
NULL, /* lockfuncarg */
&rxr->ptag))) {
device_printf(dev, "Unable to create RX DMA ptag\n");
return (error);
}
for (i = 0; i < que->num_desc; i++) {
buf = &rxr->buffers[i];
error = bus_dmamap_create(rxr->htag,
BUS_DMA_NOWAIT, &buf->hmap);
if (error) {
device_printf(dev, "Unable to create RX head map\n");
break;
}
error = bus_dmamap_create(rxr->ptag,
BUS_DMA_NOWAIT, &buf->pmap);
if (error) {
device_printf(dev, "Unable to create RX pkt map\n");
break;
}
}
return (error);
}
/*********************************************************************
*
* (Re)Initialize the queue receive ring and its buffers.
*
**********************************************************************/
int
ixl_init_rx_ring(struct ixl_queue *que)
{
struct rx_ring *rxr = &que->rxr;
struct ixl_vsi *vsi = que->vsi;
#if defined(INET6) || defined(INET)
struct ifnet *ifp = vsi->ifp;
struct lro_ctrl *lro = &rxr->lro;
#endif
struct ixl_rx_buf *buf;
bus_dma_segment_t pseg[1], hseg[1];
int rsize, nsegs, error = 0;
#ifdef DEV_NETMAP
struct netmap_adapter *na = NA(que->vsi->ifp);
struct netmap_slot *slot;
#endif /* DEV_NETMAP */
IXL_RX_LOCK(rxr);
#ifdef DEV_NETMAP
/* same as in ixl_init_tx_ring() */
slot = netmap_reset(na, NR_RX, que->me, 0);
#endif /* DEV_NETMAP */
/* Clear the ring contents */
rsize = roundup2(que->num_desc *
sizeof(union i40e_rx_desc), DBA_ALIGN);
bzero((void *)rxr->base, rsize);
/* Cleanup any existing buffers */
for (int i = 0; i < que->num_desc; i++) {
buf = &rxr->buffers[i];
if (buf->m_head != NULL) {
bus_dmamap_sync(rxr->htag, buf->hmap,
BUS_DMASYNC_POSTREAD);
bus_dmamap_unload(rxr->htag, buf->hmap);
buf->m_head->m_flags |= M_PKTHDR;
m_freem(buf->m_head);
}
if (buf->m_pack != NULL) {
bus_dmamap_sync(rxr->ptag, buf->pmap,
BUS_DMASYNC_POSTREAD);
bus_dmamap_unload(rxr->ptag, buf->pmap);
buf->m_pack->m_flags |= M_PKTHDR;
m_freem(buf->m_pack);
}
buf->m_head = NULL;
buf->m_pack = NULL;
}
/* header split is off */
rxr->hdr_split = FALSE;
/* Now replenish the mbufs */
for (int j = 0; j != que->num_desc; ++j) {
struct mbuf *mh, *mp;
buf = &rxr->buffers[j];
#ifdef DEV_NETMAP
/*
* In netmap mode, fill the map and set the buffer
* address in the NIC ring, considering the offset
* between the netmap and NIC rings (see comment in
* ixgbe_setup_transmit_ring() ). No need to allocate
* an mbuf, so end the block with a continue;
*/
if (slot) {
int sj = netmap_idx_n2k(&na->rx_rings[que->me], j);
uint64_t paddr;
void *addr;
addr = PNMB(na, slot + sj, &paddr);
netmap_load_map(na, rxr->dma.tag, buf->pmap, addr);
/* Update descriptor and the cached value */
rxr->base[j].read.pkt_addr = htole64(paddr);
rxr->base[j].read.hdr_addr = 0;
continue;
}
#endif /* DEV_NETMAP */
/*
** Don't allocate mbufs if not
** doing header split, it's wasteful
*/
if (rxr->hdr_split == FALSE)
goto skip_head;
/* First the header */
buf->m_head = m_gethdr(M_NOWAIT, MT_DATA);
if (buf->m_head == NULL) {
error = ENOBUFS;
goto fail;
}
m_adj(buf->m_head, ETHER_ALIGN);
mh = buf->m_head;
mh->m_len = mh->m_pkthdr.len = MHLEN;
mh->m_flags |= M_PKTHDR;
/* Get the memory mapping */
error = bus_dmamap_load_mbuf_sg(rxr->htag,
buf->hmap, buf->m_head, hseg,
&nsegs, BUS_DMA_NOWAIT);
if (error != 0) /* Nothing elegant to do here */
goto fail;
bus_dmamap_sync(rxr->htag,
buf->hmap, BUS_DMASYNC_PREREAD);
/* Update descriptor */
rxr->base[j].read.hdr_addr = htole64(hseg[0].ds_addr);
skip_head:
/* Now the payload cluster */
buf->m_pack = m_getjcl(M_NOWAIT, MT_DATA,
M_PKTHDR, rxr->mbuf_sz);
if (buf->m_pack == NULL) {
error = ENOBUFS;
goto fail;
}
mp = buf->m_pack;
mp->m_pkthdr.len = mp->m_len = rxr->mbuf_sz;
/* Get the memory mapping */
error = bus_dmamap_load_mbuf_sg(rxr->ptag,
buf->pmap, mp, pseg,
&nsegs, BUS_DMA_NOWAIT);
if (error != 0)
goto fail;
bus_dmamap_sync(rxr->ptag,
buf->pmap, BUS_DMASYNC_PREREAD);
/* Update descriptor */
rxr->base[j].read.pkt_addr = htole64(pseg[0].ds_addr);
rxr->base[j].read.hdr_addr = 0;
}
/* Setup our descriptor indices */
rxr->next_check = 0;
rxr->next_refresh = 0;
rxr->lro_enabled = FALSE;
rxr->split = 0;
rxr->bytes = 0;
rxr->discard = FALSE;
wr32(vsi->hw, rxr->tail, que->num_desc - 1);
ixl_flush(vsi->hw);
#if defined(INET6) || defined(INET)
/*
** Now set up the LRO interface:
*/
if (ifp->if_capenable & IFCAP_LRO) {
int err = tcp_lro_init(lro);
if (err) {
if_printf(ifp, "queue %d: LRO Initialization failed!\n", que->me);
goto fail;
}
INIT_DBG_IF(ifp, "queue %d: RX Soft LRO Initialized", que->me);
rxr->lro_enabled = TRUE;
lro->ifp = vsi->ifp;
}
#endif
bus_dmamap_sync(rxr->dma.tag, rxr->dma.map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
fail:
IXL_RX_UNLOCK(rxr);
return (error);
}
/*********************************************************************
*
* Free station receive ring data structures
*
**********************************************************************/
void
ixl_free_que_rx(struct ixl_queue *que)
{
struct rx_ring *rxr = &que->rxr;
struct ixl_rx_buf *buf;
INIT_DBG_IF(que->vsi->ifp, "queue %d: begin", que->me);
/* Cleanup any existing buffers */
if (rxr->buffers != NULL) {
for (int i = 0; i < que->num_desc; i++) {
buf = &rxr->buffers[i];
if (buf->m_head != NULL) {
bus_dmamap_sync(rxr->htag, buf->hmap,
BUS_DMASYNC_POSTREAD);
bus_dmamap_unload(rxr->htag, buf->hmap);
buf->m_head->m_flags |= M_PKTHDR;
m_freem(buf->m_head);
}
if (buf->m_pack != NULL) {
bus_dmamap_sync(rxr->ptag, buf->pmap,
BUS_DMASYNC_POSTREAD);
bus_dmamap_unload(rxr->ptag, buf->pmap);
buf->m_pack->m_flags |= M_PKTHDR;
m_freem(buf->m_pack);
}
buf->m_head = NULL;
buf->m_pack = NULL;
if (buf->hmap != NULL) {
bus_dmamap_destroy(rxr->htag, buf->hmap);
buf->hmap = NULL;
}
if (buf->pmap != NULL) {
bus_dmamap_destroy(rxr->ptag, buf->pmap);
buf->pmap = NULL;
}
}
if (rxr->buffers != NULL) {
free(rxr->buffers, M_DEVBUF);
rxr->buffers = NULL;
}
}
if (rxr->htag != NULL) {
bus_dma_tag_destroy(rxr->htag);
rxr->htag = NULL;
}
if (rxr->ptag != NULL) {
bus_dma_tag_destroy(rxr->ptag);
rxr->ptag = NULL;
}
INIT_DBG_IF(que->vsi->ifp, "queue %d: end", que->me);
return;
}
static __inline void
ixl_rx_input(struct rx_ring *rxr, struct ifnet *ifp, struct mbuf *m, u8 ptype)
{
#if defined(INET6) || defined(INET)
/*
* At the moment LRO is only for IPv4/TCP packets, and the TCP checksum of
* the packet should be computed by hardware. Also, it should not have a
* VLAN tag in the Ethernet header.
*/
if (rxr->lro_enabled &&
(ifp->if_capenable & IFCAP_VLAN_HWTAGGING) != 0 &&
(m->m_pkthdr.csum_flags & (CSUM_DATA_VALID | CSUM_PSEUDO_HDR)) ==
(CSUM_DATA_VALID | CSUM_PSEUDO_HDR)) {
/*
* Send to the stack if:
** - LRO not enabled, or
** - no LRO resources, or
** - lro enqueue fails
*/
if (rxr->lro.lro_cnt != 0)
if (tcp_lro_rx(&rxr->lro, m, 0) == 0)
return;
}
#endif
IXL_RX_UNLOCK(rxr);
(*ifp->if_input)(ifp, m);
IXL_RX_LOCK(rxr);
}
static __inline void
ixl_rx_discard(struct rx_ring *rxr, int i)
{
struct ixl_rx_buf *rbuf;
rbuf = &rxr->buffers[i];
if (rbuf->fmp != NULL) {/* Partial chain ? */
rbuf->fmp->m_flags |= M_PKTHDR;
m_freem(rbuf->fmp);
rbuf->fmp = NULL;
}
/*
** With advanced descriptors the writeback
** clobbers the buffer addrs, so it's easier
** to just free the existing mbufs and take
** the normal refresh path to get new buffers
** and mapping.
*/
if (rbuf->m_head) {
m_free(rbuf->m_head);
rbuf->m_head = NULL;
}
if (rbuf->m_pack) {
m_free(rbuf->m_pack);
rbuf->m_pack = NULL;
}
return;
}
#ifdef RSS
/*
** ixl_ptype_to_hash: parse the packet type
** to determine the appropriate hash.
*/
static inline int
ixl_ptype_to_hash(u8 ptype)
{
struct i40e_rx_ptype_decoded decoded;
u8 ex = 0;
decoded = decode_rx_desc_ptype(ptype);
ex = decoded.outer_frag;
if (!decoded.known)
- return M_HASHTYPE_OPAQUE;
+ return M_HASHTYPE_OPAQUE_HASH;
if (decoded.outer_ip == I40E_RX_PTYPE_OUTER_L2)
- return M_HASHTYPE_OPAQUE;
+ return M_HASHTYPE_OPAQUE_HASH;
/* Note: anything that gets to this point is IP */
if (decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV6) {
switch (decoded.inner_prot) {
case I40E_RX_PTYPE_INNER_PROT_TCP:
if (ex)
return M_HASHTYPE_RSS_TCP_IPV6_EX;
else
return M_HASHTYPE_RSS_TCP_IPV6;
case I40E_RX_PTYPE_INNER_PROT_UDP:
if (ex)
return M_HASHTYPE_RSS_UDP_IPV6_EX;
else
return M_HASHTYPE_RSS_UDP_IPV6;
default:
if (ex)
return M_HASHTYPE_RSS_IPV6_EX;
else
return M_HASHTYPE_RSS_IPV6;
}
}
if (decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV4) {
switch (decoded.inner_prot) {
case I40E_RX_PTYPE_INNER_PROT_TCP:
return M_HASHTYPE_RSS_TCP_IPV4;
case I40E_RX_PTYPE_INNER_PROT_UDP:
if (ex)
return M_HASHTYPE_RSS_UDP_IPV4_EX;
else
return M_HASHTYPE_RSS_UDP_IPV4;
default:
return M_HASHTYPE_RSS_IPV4;
}
}
/* We should never get here!! */
- return M_HASHTYPE_OPAQUE;
+ return M_HASHTYPE_OPAQUE_HASH;
}
#endif /* RSS */
/*********************************************************************
*
* This routine executes in interrupt context. It replenishes
* the mbufs in the descriptor and sends data which has been
* dma'ed into host memory to upper layer.
*
* We loop at most count times if count is > 0, or until done if
* count < 0.
*
* Return TRUE for more work, FALSE for all clean.
*********************************************************************/
bool
ixl_rxeof(struct ixl_queue *que, int count)
{
struct ixl_vsi *vsi = que->vsi;
struct rx_ring *rxr = &que->rxr;
struct ifnet *ifp = vsi->ifp;
#if defined(INET6) || defined(INET)
struct lro_ctrl *lro = &rxr->lro;
#endif
int i, nextp, processed = 0;
union i40e_rx_desc *cur;
struct ixl_rx_buf *rbuf, *nbuf;
IXL_RX_LOCK(rxr);
#ifdef DEV_NETMAP
if (netmap_rx_irq(ifp, que->me, &count)) {
IXL_RX_UNLOCK(rxr);
return (FALSE);
}
#endif /* DEV_NETMAP */
for (i = rxr->next_check; count != 0;) {
struct mbuf *sendmp, *mh, *mp;
u32 rsc, status, error;
u16 hlen, plen, vtag;
u64 qword;
u8 ptype;
bool eop;
/* Sync the ring. */
bus_dmamap_sync(rxr->dma.tag, rxr->dma.map,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
cur = &rxr->base[i];
qword = le64toh(cur->wb.qword1.status_error_len);
status = (qword & I40E_RXD_QW1_STATUS_MASK)
>> I40E_RXD_QW1_STATUS_SHIFT;
error = (qword & I40E_RXD_QW1_ERROR_MASK)
>> I40E_RXD_QW1_ERROR_SHIFT;
plen = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK)
>> I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
hlen = (qword & I40E_RXD_QW1_LENGTH_HBUF_MASK)
>> I40E_RXD_QW1_LENGTH_HBUF_SHIFT;
ptype = (qword & I40E_RXD_QW1_PTYPE_MASK)
>> I40E_RXD_QW1_PTYPE_SHIFT;
if ((status & (1 << I40E_RX_DESC_STATUS_DD_SHIFT)) == 0) {
++rxr->not_done;
break;
}
if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
break;
count--;
sendmp = NULL;
nbuf = NULL;
rsc = 0;
cur->wb.qword1.status_error_len = 0;
rbuf = &rxr->buffers[i];
mh = rbuf->m_head;
mp = rbuf->m_pack;
eop = (status & (1 << I40E_RX_DESC_STATUS_EOF_SHIFT));
if (status & (1 << I40E_RX_DESC_STATUS_L2TAG1P_SHIFT))
vtag = le16toh(cur->wb.qword0.lo_dword.l2tag1);
else
vtag = 0;
/*
** Make sure bad packets are discarded,
** note that only EOP descriptor has valid
** error results.
*/
if (eop && (error & (1 << I40E_RX_DESC_ERROR_RXE_SHIFT))) {
rxr->desc_errs++;
ixl_rx_discard(rxr, i);
goto next_desc;
}
/* Prefetch the next buffer */
if (!eop) {
nextp = i + 1;
if (nextp == que->num_desc)
nextp = 0;
nbuf = &rxr->buffers[nextp];
prefetch(nbuf);
}
/*
** The header mbuf is ONLY used when header
** split is enabled, otherwise we get normal
** behavior, ie, both header and payload
** are DMA'd into the payload buffer.
**
** Rather than using the fmp/lmp global pointers
** we now keep the head of a packet chain in the
** buffer struct and pass this along from one
** descriptor to the next, until we get EOP.
*/
if (rxr->hdr_split && (rbuf->fmp == NULL)) {
if (hlen > IXL_RX_HDR)
hlen = IXL_RX_HDR;
mh->m_len = hlen;
mh->m_flags |= M_PKTHDR;
mh->m_next = NULL;
mh->m_pkthdr.len = mh->m_len;
/* Null buf pointer so it is refreshed */
rbuf->m_head = NULL;
/*
** Check the payload length, this
** could be zero if it's a small
** packet.
*/
if (plen > 0) {
mp->m_len = plen;
mp->m_next = NULL;
mp->m_flags &= ~M_PKTHDR;
mh->m_next = mp;
mh->m_pkthdr.len += mp->m_len;
/* Null buf pointer so it is refreshed */
rbuf->m_pack = NULL;
rxr->split++;
}
/*
** Now create the forward
** chain so when complete
** we won't have to.
*/
if (eop == 0) {
/* stash the chain head */
nbuf->fmp = mh;
/* Make forward chain */
if (plen)
mp->m_next = nbuf->m_pack;
else
mh->m_next = nbuf->m_pack;
} else {
/* Singlet, prepare to send */
sendmp = mh;
if (vtag) {
sendmp->m_pkthdr.ether_vtag = vtag;
sendmp->m_flags |= M_VLANTAG;
}
}
} else {
/*
** Either no header split, or a
** secondary piece of a fragmented
** split packet.
*/
mp->m_len = plen;
/*
** See if there is a stored head
** that determines what we are
*/
sendmp = rbuf->fmp;
rbuf->m_pack = rbuf->fmp = NULL;
if (sendmp != NULL) /* secondary frag */
sendmp->m_pkthdr.len += mp->m_len;
else {
/* first desc of a non-ps chain */
sendmp = mp;
sendmp->m_flags |= M_PKTHDR;
sendmp->m_pkthdr.len = mp->m_len;
if (vtag) {
sendmp->m_pkthdr.ether_vtag = vtag;
sendmp->m_flags |= M_VLANTAG;
}
}
/* Pass the head pointer on */
if (eop == 0) {
nbuf->fmp = sendmp;
sendmp = NULL;
mp->m_next = nbuf->m_pack;
}
}
++processed;
/* Sending this frame? */
if (eop) {
sendmp->m_pkthdr.rcvif = ifp;
/* gather stats */
rxr->rx_packets++;
rxr->rx_bytes += sendmp->m_pkthdr.len;
/* capture data for dynamic ITR adjustment */
rxr->packets++;
rxr->bytes += sendmp->m_pkthdr.len;
if ((ifp->if_capenable & IFCAP_RXCSUM) != 0)
ixl_rx_checksum(sendmp, status, error, ptype);
#ifdef RSS
sendmp->m_pkthdr.flowid =
le32toh(cur->wb.qword0.hi_dword.rss);
M_HASHTYPE_SET(sendmp, ixl_ptype_to_hash(ptype));
#else
sendmp->m_pkthdr.flowid = que->msix;
M_HASHTYPE_SET(sendmp, M_HASHTYPE_OPAQUE);
#endif
}
next_desc:
bus_dmamap_sync(rxr->dma.tag, rxr->dma.map,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
/* Advance our pointers to the next descriptor. */
if (++i == que->num_desc)
i = 0;
/* Now send to the stack or do LRO */
if (sendmp != NULL) {
rxr->next_check = i;
ixl_rx_input(rxr, ifp, sendmp, ptype);
i = rxr->next_check;
}
/* Every 8 descriptors we go to refresh mbufs */
if (processed == 8) {
ixl_refresh_mbufs(que, i);
processed = 0;
}
}
/* Refresh any remaining buf structs */
if (ixl_rx_unrefreshed(que))
ixl_refresh_mbufs(que, i);
rxr->next_check = i;
#if defined(INET6) || defined(INET)
/*
* Flush any outstanding LRO work
*/
tcp_lro_flush_all(lro);
#endif
IXL_RX_UNLOCK(rxr);
return (FALSE);
}
/*********************************************************************
*
* Verify that the hardware indicated that the checksum is valid.
* Inform the stack about the status of checksum so that stack
* doesn't spend time verifying the checksum.
*
*********************************************************************/
static void
ixl_rx_checksum(struct mbuf * mp, u32 status, u32 error, u8 ptype)
{
struct i40e_rx_ptype_decoded decoded;
decoded = decode_rx_desc_ptype(ptype);
/* Errors? */
if (error & ((1 << I40E_RX_DESC_ERROR_IPE_SHIFT) |
(1 << I40E_RX_DESC_ERROR_L4E_SHIFT))) {
mp->m_pkthdr.csum_flags = 0;
return;
}
/* IPv6 with extension headers likely have bad csum */
if (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP &&
decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV6)
if (status &
(1 << I40E_RX_DESC_STATUS_IPV6EXADD_SHIFT)) {
mp->m_pkthdr.csum_flags = 0;
return;
}
/* IP Checksum Good */
mp->m_pkthdr.csum_flags = CSUM_IP_CHECKED;
mp->m_pkthdr.csum_flags |= CSUM_IP_VALID;
if (status & (1 << I40E_RX_DESC_STATUS_L3L4P_SHIFT)) {
mp->m_pkthdr.csum_flags |=
(CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
mp->m_pkthdr.csum_data |= htons(0xffff);
}
return;
}
#if __FreeBSD_version >= 1100000
uint64_t
ixl_get_counter(if_t ifp, ift_counter cnt)
{
struct ixl_vsi *vsi;
vsi = if_getsoftc(ifp);
switch (cnt) {
case IFCOUNTER_IPACKETS:
return (vsi->ipackets);
case IFCOUNTER_IERRORS:
return (vsi->ierrors);
case IFCOUNTER_OPACKETS:
return (vsi->opackets);
case IFCOUNTER_OERRORS:
return (vsi->oerrors);
case IFCOUNTER_COLLISIONS:
/* Collisions are by standard impossible in 40G/10G Ethernet */
return (0);
case IFCOUNTER_IBYTES:
return (vsi->ibytes);
case IFCOUNTER_OBYTES:
return (vsi->obytes);
case IFCOUNTER_IMCASTS:
return (vsi->imcasts);
case IFCOUNTER_OMCASTS:
return (vsi->omcasts);
case IFCOUNTER_IQDROPS:
return (vsi->iqdrops);
case IFCOUNTER_OQDROPS:
return (vsi->oqdrops);
case IFCOUNTER_NOPROTO:
return (vsi->noproto);
default:
return (if_get_counter_default(ifp, cnt));
}
}
#endif
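Note on the receive path above: ixl_rxeof() pulls the status, error,
packet-type, and payload-length fields out of one 64-bit writeback word with
mask-and-shift pairs, then advances the ring index with wraparound. The
following is a minimal standalone sketch of that pattern, not driver code.
Every DEMO_* constant and demo_* name is hypothetical, and the bit layout is
an assumption chosen for illustration; the real values come from the
I40E_RXD_QW1_* definitions in the i40e headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed field layout for illustration only. These constants are NOT
 * copied from the i40e headers.
 */
#define DEMO_QW1_STATUS_SHIFT	0
#define DEMO_QW1_STATUS_MASK	(0x7FFFFULL << DEMO_QW1_STATUS_SHIFT)
#define DEMO_QW1_ERROR_SHIFT	19
#define DEMO_QW1_ERROR_MASK	(0xFFULL << DEMO_QW1_ERROR_SHIFT)
#define DEMO_QW1_PTYPE_SHIFT	30
#define DEMO_QW1_PTYPE_MASK	(0xFFULL << DEMO_QW1_PTYPE_SHIFT)
#define DEMO_QW1_PLEN_SHIFT	38
#define DEMO_QW1_PLEN_MASK	(0x3FFFULL << DEMO_QW1_PLEN_SHIFT)

/* Assumed bit positions inside the status field. */
#define DEMO_STATUS_DD_SHIFT	0	/* descriptor done */
#define DEMO_STATUS_EOF_SHIFT	1	/* end of frame */

struct demo_rx_info {
	uint32_t status;
	uint32_t error;
	uint16_t plen;
	uint8_t  ptype;
	bool     done;
	bool     eop;
};

/* Split a host-order writeback qword into its fields. */
static struct demo_rx_info
demo_decode_qword1(uint64_t qword)
{
	struct demo_rx_info info;

	info.status = (uint32_t)((qword & DEMO_QW1_STATUS_MASK) >>
	    DEMO_QW1_STATUS_SHIFT);
	info.error = (uint32_t)((qword & DEMO_QW1_ERROR_MASK) >>
	    DEMO_QW1_ERROR_SHIFT);
	info.ptype = (uint8_t)((qword & DEMO_QW1_PTYPE_MASK) >>
	    DEMO_QW1_PTYPE_SHIFT);
	info.plen = (uint16_t)((qword & DEMO_QW1_PLEN_MASK) >>
	    DEMO_QW1_PLEN_SHIFT);
	info.done = (info.status & (1u << DEMO_STATUS_DD_SHIFT)) != 0;
	info.eop = (info.status & (1u << DEMO_STATUS_EOF_SHIFT)) != 0;
	return (info);
}

/* Advance a ring index, wrapping to 0 at the end of the ring. */
static int
demo_next_index(int i, int num_desc)
{
	assert(num_desc > 0 && i >= 0 && i < num_desc);
	if (++i == num_desc)
		i = 0;
	return (i);
}
```

A caller would convert the descriptor word to host order first (the driver
uses le64toh() for this), stop processing when done is false, and call
demo_next_index() once per consumed descriptor.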
Index: projects/vnet/sys/dev/mlx5/driver.h
===================================================================
--- projects/vnet/sys/dev/mlx5/driver.h (revision 301546)
+++ projects/vnet/sys/dev/mlx5/driver.h (revision 301547)
@@ -1,943 +1,944 @@
/*-
* Copyright (c) 2013-2015, Mellanox Technologies, Ltd. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS `AS IS' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $FreeBSD$
*/
#ifndef MLX5_DRIVER_H
#define MLX5_DRIVER_H
#include
#include
#include
#include
#include
+#include
#include
#include
#include
#include
#include
#include
enum {
MLX5_BOARD_ID_LEN = 64,
MLX5_MAX_NAME_LEN = 16,
};
enum {
/* two hours (7200 seconds) for the sake of bringup. Generally, commands must always
* complete and we may need to increase this timeout value
*/
MLX5_CMD_TIMEOUT_MSEC = 7200 * 1000,
MLX5_CMD_WQ_MAX_NAME = 32,
};
enum {
CMD_OWNER_SW = 0x0,
CMD_OWNER_HW = 0x1,
CMD_STATUS_SUCCESS = 0,
};
enum mlx5_sqp_t {
MLX5_SQP_SMI = 0,
MLX5_SQP_GSI = 1,
MLX5_SQP_IEEE_1588 = 2,
MLX5_SQP_SNIFFER = 3,
MLX5_SQP_SYNC_UMR = 4,
};
enum {
MLX5_MAX_PORTS = 2,
};
enum {
MLX5_EQ_VEC_PAGES = 0,
MLX5_EQ_VEC_CMD = 1,
MLX5_EQ_VEC_ASYNC = 2,
MLX5_EQ_VEC_COMP_BASE,
};
enum {
MLX5_MAX_IRQ_NAME = 32
};
enum {
MLX5_ATOMIC_MODE_IB_COMP = 1 << 16,
MLX5_ATOMIC_MODE_CX = 2 << 16,
MLX5_ATOMIC_MODE_8B = 3 << 16,
MLX5_ATOMIC_MODE_16B = 4 << 16,
MLX5_ATOMIC_MODE_32B = 5 << 16,
MLX5_ATOMIC_MODE_64B = 6 << 16,
MLX5_ATOMIC_MODE_128B = 7 << 16,
MLX5_ATOMIC_MODE_256B = 8 << 16,
};
enum {
MLX5_REG_QETCR = 0x4005,
MLX5_REG_QPDP = 0x4007,
MLX5_REG_QTCT = 0x400A,
MLX5_REG_PCAP = 0x5001,
MLX5_REG_PMTU = 0x5003,
MLX5_REG_PTYS = 0x5004,
MLX5_REG_PAOS = 0x5006,
MLX5_REG_PFCC = 0x5007,
MLX5_REG_PPCNT = 0x5008,
MLX5_REG_PMAOS = 0x5012,
MLX5_REG_PUDE = 0x5009,
MLX5_REG_PPTB = 0x500B,
MLX5_REG_PBMC = 0x500C,
MLX5_REG_PMPE = 0x5010,
MLX5_REG_PELC = 0x500e,
MLX5_REG_PVLC = 0x500f,
MLX5_REG_PMLP = 0x5002,
MLX5_REG_NODE_DESC = 0x6001,
MLX5_REG_HOST_ENDIANNESS = 0x7004,
MLX5_REG_MCIA = 0x9014,
};
enum dbg_rsc_type {
MLX5_DBG_RSC_QP,
MLX5_DBG_RSC_EQ,
MLX5_DBG_RSC_CQ,
};
struct mlx5_field_desc {
struct dentry *dent;
int i;
};
struct mlx5_rsc_debug {
struct mlx5_core_dev *dev;
void *object;
enum dbg_rsc_type type;
struct dentry *root;
struct mlx5_field_desc fields[0];
};
enum mlx5_dev_event {
MLX5_DEV_EVENT_SYS_ERROR,
MLX5_DEV_EVENT_PORT_UP,
MLX5_DEV_EVENT_PORT_DOWN,
MLX5_DEV_EVENT_PORT_INITIALIZED,
MLX5_DEV_EVENT_LID_CHANGE,
MLX5_DEV_EVENT_PKEY_CHANGE,
MLX5_DEV_EVENT_GUID_CHANGE,
MLX5_DEV_EVENT_CLIENT_REREG,
MLX5_DEV_EVENT_VPORT_CHANGE,
};
enum mlx5_port_status {
MLX5_PORT_UP = 1 << 0,
MLX5_PORT_DOWN = 1 << 1,
};
enum mlx5_link_mode {
MLX5_1000BASE_CX_SGMII = 0,
MLX5_1000BASE_KX = 1,
MLX5_10GBASE_CX4 = 2,
MLX5_10GBASE_KX4 = 3,
MLX5_10GBASE_KR = 4,
MLX5_20GBASE_KR2 = 5,
MLX5_40GBASE_CR4 = 6,
MLX5_40GBASE_KR4 = 7,
MLX5_56GBASE_R4 = 8,
MLX5_10GBASE_CR = 12,
MLX5_10GBASE_SR = 13,
MLX5_10GBASE_ER = 14,
MLX5_40GBASE_SR4 = 15,
MLX5_40GBASE_LR4 = 16,
MLX5_100GBASE_CR4 = 20,
MLX5_100GBASE_SR4 = 21,
MLX5_100GBASE_KR4 = 22,
MLX5_100GBASE_LR4 = 23,
MLX5_100BASE_TX = 24,
MLX5_1000BASE_T = 25,
MLX5_10GBASE_T = 26,
MLX5_25GBASE_CR = 27,
MLX5_25GBASE_KR = 28,
MLX5_25GBASE_SR = 29,
MLX5_50GBASE_CR2 = 30,
MLX5_50GBASE_KR2 = 31,
MLX5_LINK_MODES_NUMBER,
};
#define MLX5_PROT_MASK(link_mode) (1 << link_mode)
struct mlx5_uuar_info {
struct mlx5_uar *uars;
int num_uars;
int num_low_latency_uuars;
unsigned long *bitmap;
unsigned int *count;
struct mlx5_bf *bfs;
/*
* protect uuar allocation data structs
*/
struct mutex lock;
u32 ver;
};
struct mlx5_bf {
void __iomem *reg;
void __iomem *regreg;
int buf_size;
struct mlx5_uar *uar;
unsigned long offset;
int need_lock;
/* protect blue flame buffer selection when needed
*/
spinlock_t lock;
/* serialize 64 bit writes when done as two 32 bit accesses
*/
spinlock_t lock32;
int uuarn;
};
struct mlx5_cmd_first {
__be32 data[4];
};
struct mlx5_cmd_msg {
struct list_head list;
struct cache_ent *cache;
u32 len;
struct mlx5_cmd_first first;
struct mlx5_cmd_mailbox *next;
};
struct mlx5_cmd_debug {
struct dentry *dbg_root;
struct dentry *dbg_in;
struct dentry *dbg_out;
struct dentry *dbg_outlen;
struct dentry *dbg_status;
struct dentry *dbg_run;
void *in_msg;
void *out_msg;
u8 status;
u16 inlen;
u16 outlen;
};
struct cache_ent {
/* protect block chain allocations
*/
spinlock_t lock;
struct list_head head;
};
struct cmd_msg_cache {
struct cache_ent large;
struct cache_ent med;
};
struct mlx5_cmd_stats {
u64 sum;
u64 n;
struct dentry *root;
struct dentry *avg;
struct dentry *count;
/* protect command average calculations */
spinlock_t lock;
};
struct mlx5_cmd {
void *cmd_alloc_buf;
dma_addr_t alloc_dma;
int alloc_size;
void *cmd_buf;
dma_addr_t dma;
u16 cmdif_rev;
u8 log_sz;
u8 log_stride;
int max_reg_cmds;
int events;
u32 __iomem *vector;
/* protect command queue allocations
*/
spinlock_t alloc_lock;
/* protect token allocations
*/
spinlock_t token_lock;
u8 token;
unsigned long bitmask;
char wq_name[MLX5_CMD_WQ_MAX_NAME];
struct workqueue_struct *wq;
struct semaphore sem;
struct semaphore pages_sem;
int mode;
struct mlx5_cmd_work_ent *ent_arr[MLX5_MAX_COMMANDS];
struct pci_pool *pool;
struct mlx5_cmd_debug dbg;
struct cmd_msg_cache cache;
int checksum_disabled;
struct mlx5_cmd_stats stats[MLX5_CMD_OP_MAX];
int moving_to_polling;
};
struct mlx5_port_caps {
int gid_table_len;
int pkey_table_len;
u8 ext_port_cap;
};
struct mlx5_cmd_mailbox {
void *buf;
dma_addr_t dma;
struct mlx5_cmd_mailbox *next;
};
struct mlx5_buf_list {
void *buf;
dma_addr_t map;
};
struct mlx5_buf {
struct mlx5_buf_list direct;
struct mlx5_buf_list *page_list;
int nbufs;
int npages;
int size;
u8 page_shift;
};
struct mlx5_eq {
struct mlx5_core_dev *dev;
__be32 __iomem *doorbell;
u32 cons_index;
struct mlx5_buf buf;
int size;
u8 irqn;
u8 eqn;
int nent;
u64 mask;
struct list_head list;
int index;
struct mlx5_rsc_debug *dbg;
};
struct mlx5_core_psv {
u32 psv_idx;
struct psv_layout {
u32 pd;
u16 syndrome;
u16 reserved;
u16 bg;
u16 app_tag;
u32 ref_tag;
} psv;
};
struct mlx5_core_sig_ctx {
struct mlx5_core_psv psv_memory;
struct mlx5_core_psv psv_wire;
#if (__FreeBSD_version >= 1100000)
struct ib_sig_err err_item;
#endif
bool sig_status_checked;
bool sig_err_exists;
u32 sigerr_count;
};
struct mlx5_core_mr {
u64 iova;
u64 size;
u32 key;
u32 pd;
};
enum mlx5_res_type {
MLX5_RES_QP,
MLX5_RES_SRQ,
MLX5_RES_XSRQ,
};
struct mlx5_core_rsc_common {
enum mlx5_res_type res;
atomic_t refcount;
struct completion free;
};
struct mlx5_core_srq {
struct mlx5_core_rsc_common common; /* must be first */
u32 srqn;
int max;
int max_gs;
int max_avail_gather;
int wqe_shift;
void (*event)(struct mlx5_core_srq *, int);
atomic_t refcount;
struct completion free;
};
struct mlx5_eq_table {
void __iomem *update_ci;
void __iomem *update_arm_ci;
struct list_head comp_eqs_list;
struct mlx5_eq pages_eq;
struct mlx5_eq async_eq;
struct mlx5_eq cmd_eq;
int num_comp_vectors;
/* protect EQs list
*/
spinlock_t lock;
};
struct mlx5_uar {
u32 index;
struct list_head bf_list;
unsigned free_bf_bmap;
void __iomem *bf_map;
void __iomem *map;
};
struct mlx5_core_health {
struct mlx5_health_buffer __iomem *health;
__be32 __iomem *health_counter;
struct timer_list timer;
struct list_head list;
u32 prev;
int miss_counter;
};
#define MLX5_CQ_LINEAR_ARRAY_SIZE 1024
struct mlx5_cq_linear_array_entry {
spinlock_t lock;
struct mlx5_core_cq * volatile cq;
};
struct mlx5_cq_table {
/* protect radix tree
*/
spinlock_t lock;
struct radix_tree_root tree;
struct mlx5_cq_linear_array_entry linear_array[MLX5_CQ_LINEAR_ARRAY_SIZE];
};
struct mlx5_qp_table {
/* protect radix tree
*/
spinlock_t lock;
struct radix_tree_root tree;
};
struct mlx5_srq_table {
/* protect radix tree
*/
spinlock_t lock;
struct radix_tree_root tree;
};
struct mlx5_mr_table {
/* protect radix tree
*/
rwlock_t lock;
struct radix_tree_root tree;
};
struct mlx5_irq_info {
char name[MLX5_MAX_IRQ_NAME];
};
struct mlx5_priv {
char name[MLX5_MAX_NAME_LEN];
struct mlx5_eq_table eq_table;
struct msix_entry *msix_arr;
struct mlx5_irq_info *irq_info;
struct mlx5_uuar_info uuari;
MLX5_DECLARE_DOORBELL_LOCK(cq_uar_lock);
struct io_mapping *bf_mapping;
/* pages stuff */
struct workqueue_struct *pg_wq;
struct rb_root page_root;
int fw_pages;
int reg_pages;
struct list_head free_list;
struct mlx5_core_health health;
struct mlx5_srq_table srq_table;
/* start: qp stuff */
struct mlx5_qp_table qp_table;
struct dentry *qp_debugfs;
struct dentry *eq_debugfs;
struct dentry *cq_debugfs;
struct dentry *cmdif_debugfs;
/* end: qp stuff */
/* start: cq stuff */
struct mlx5_cq_table cq_table;
/* end: cq stuff */
/* start: mr stuff */
struct mlx5_mr_table mr_table;
/* end: mr stuff */
/* start: alloc stuff */
int numa_node;
struct mutex pgdir_mutex;
struct list_head pgdir_list;
/* end: alloc stuff */
struct dentry *dbg_root;
/* protect mkey key part */
spinlock_t mkey_lock;
u8 mkey_key;
struct list_head dev_list;
struct list_head ctx_list;
spinlock_t ctx_lock;
};
struct mlx5_special_contexts {
int resd_lkey;
};
struct mlx5_core_dev {
struct pci_dev *pdev;
char board_id[MLX5_BOARD_ID_LEN];
struct mlx5_cmd cmd;
struct mlx5_port_caps port_caps[MLX5_MAX_PORTS];
u32 hca_caps_cur[MLX5_CAP_NUM][MLX5_UN_SZ_DW(hca_cap_union)];
u32 hca_caps_max[MLX5_CAP_NUM][MLX5_UN_SZ_DW(hca_cap_union)];
struct mlx5_init_seg __iomem *iseg;
void (*event) (struct mlx5_core_dev *dev,
enum mlx5_dev_event event,
unsigned long param);
struct mlx5_priv priv;
struct mlx5_profile *profile;
atomic_t num_qps;
u32 issi;
struct mlx5_special_contexts special_contexts;
unsigned int module_status[MLX5_MAX_PORTS];
};
enum {
MLX5_WOL_DISABLE = 0,
MLX5_WOL_SECURED_MAGIC = 1 << 1,
MLX5_WOL_MAGIC = 1 << 2,
MLX5_WOL_ARP = 1 << 3,
MLX5_WOL_BROADCAST = 1 << 4,
MLX5_WOL_MULTICAST = 1 << 5,
MLX5_WOL_UNICAST = 1 << 6,
MLX5_WOL_PHY_ACTIVITY = 1 << 7,
};
struct mlx5_db {
__be32 *db;
union {
struct mlx5_db_pgdir *pgdir;
struct mlx5_ib_user_db_page *user_page;
} u;
dma_addr_t dma;
int index;
};
struct mlx5_net_counters {
u64 packets;
u64 octets;
};
struct mlx5_ptys_reg {
u8 local_port;
u8 proto_mask;
u32 eth_proto_cap;
u16 ib_link_width_cap;
u16 ib_proto_cap;
u32 eth_proto_admin;
u16 ib_link_width_admin;
u16 ib_proto_admin;
u32 eth_proto_oper;
u16 ib_link_width_oper;
u16 ib_proto_oper;
u32 eth_proto_lp_advertise;
};
struct mlx5_pvlc_reg {
u8 local_port;
u8 vl_hw_cap;
u8 vl_admin;
u8 vl_operational;
};
struct mlx5_pmtu_reg {
u8 local_port;
u16 max_mtu;
u16 admin_mtu;
u16 oper_mtu;
};
struct mlx5_vport_counters {
struct mlx5_net_counters received_errors;
struct mlx5_net_counters transmit_errors;
struct mlx5_net_counters received_ib_unicast;
struct mlx5_net_counters transmitted_ib_unicast;
struct mlx5_net_counters received_ib_multicast;
struct mlx5_net_counters transmitted_ib_multicast;
struct mlx5_net_counters received_eth_broadcast;
struct mlx5_net_counters transmitted_eth_broadcast;
struct mlx5_net_counters received_eth_unicast;
struct mlx5_net_counters transmitted_eth_unicast;
struct mlx5_net_counters received_eth_multicast;
struct mlx5_net_counters transmitted_eth_multicast;
};
enum {
MLX5_DB_PER_PAGE = PAGE_SIZE / L1_CACHE_BYTES,
};
enum {
MLX5_COMP_EQ_SIZE = 1024,
};
enum {
MLX5_PTYS_IB = 1 << 0,
MLX5_PTYS_EN = 1 << 2,
};
struct mlx5_db_pgdir {
struct list_head list;
DECLARE_BITMAP(bitmap, MLX5_DB_PER_PAGE);
__be32 *db_page;
dma_addr_t db_dma;
};
typedef void (*mlx5_cmd_cbk_t)(int status, void *context);
struct mlx5_cmd_work_ent {
struct mlx5_cmd_msg *in;
struct mlx5_cmd_msg *out;
void *uout;
int uout_size;
mlx5_cmd_cbk_t callback;
void *context;
int idx;
struct completion done;
struct mlx5_cmd *cmd;
struct work_struct work;
struct mlx5_cmd_layout *lay;
int ret;
int page_queue;
u8 status;
u8 token;
u64 ts1;
u64 ts2;
u16 op;
};
struct mlx5_pas {
u64 pa;
u8 log_sz;
};
static inline void *mlx5_buf_offset(struct mlx5_buf *buf, int offset)
{
if (likely(BITS_PER_LONG == 64 || buf->nbufs == 1))
return buf->direct.buf + offset;
else
return buf->page_list[offset >> PAGE_SHIFT].buf +
(offset & (PAGE_SIZE - 1));
}
extern struct workqueue_struct *mlx5_core_wq;
#define STRUCT_FIELD(header, field) \
.struct_offset_bytes = offsetof(struct ib_unpacked_ ## header, field), \
.struct_size_bytes = sizeof((struct ib_unpacked_ ## header *)0)->field
static inline struct mlx5_core_dev *pci2mlx5_core_dev(struct pci_dev *pdev)
{
return pci_get_drvdata(pdev);
}
extern struct dentry *mlx5_debugfs_root;
static inline u16 fw_rev_maj(struct mlx5_core_dev *dev)
{
return ioread32be(&dev->iseg->fw_rev) & 0xffff;
}
static inline u16 fw_rev_min(struct mlx5_core_dev *dev)
{
return ioread32be(&dev->iseg->fw_rev) >> 16;
}
static inline u16 fw_rev_sub(struct mlx5_core_dev *dev)
{
return ioread32be(&dev->iseg->cmdif_rev_fw_sub) & 0xffff;
}
static inline u16 cmdif_rev_get(struct mlx5_core_dev *dev)
{
return ioread32be(&dev->iseg->cmdif_rev_fw_sub) >> 16;
}
static inline int mlx5_get_gid_table_len(u16 param)
{
if (param > 4) {
printf("M4_CORE_DRV_NAME: WARN: ""gid table length is zero\n");
return 0;
}
return 8 * (1 << param);
}
static inline void *mlx5_vzalloc(unsigned long size)
{
void *rtn;
rtn = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
return rtn;
}
static inline u32 mlx5_base_mkey(const u32 key)
{
return key & 0xffffff00u;
}
int mlx5_cmd_init(struct mlx5_core_dev *dev);
void mlx5_cmd_cleanup(struct mlx5_core_dev *dev);
void mlx5_cmd_use_events(struct mlx5_core_dev *dev);
void mlx5_cmd_use_polling(struct mlx5_core_dev *dev);
int mlx5_cmd_status_to_err(struct mlx5_outbox_hdr *hdr);
int mlx5_cmd_status_to_err_v2(void *ptr);
int mlx5_core_get_caps(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type,
enum mlx5_cap_mode cap_mode);
int mlx5_cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out,
int out_size);
int mlx5_cmd_exec_cb(struct mlx5_core_dev *dev, void *in, int in_size,
void *out, int out_size, mlx5_cmd_cbk_t callback,
void *context);
int mlx5_cmd_alloc_uar(struct mlx5_core_dev *dev, u32 *uarn);
int mlx5_cmd_free_uar(struct mlx5_core_dev *dev, u32 uarn);
int mlx5_alloc_uuars(struct mlx5_core_dev *dev, struct mlx5_uuar_info *uuari);
int mlx5_free_uuars(struct mlx5_core_dev *dev, struct mlx5_uuar_info *uuari);
int mlx5_alloc_map_uar(struct mlx5_core_dev *mdev, struct mlx5_uar *uar);
void mlx5_unmap_free_uar(struct mlx5_core_dev *mdev, struct mlx5_uar *uar);
void mlx5_health_cleanup(void);
void __init mlx5_health_init(void);
void mlx5_start_health_poll(struct mlx5_core_dev *dev);
void mlx5_stop_health_poll(struct mlx5_core_dev *dev);
int mlx5_buf_alloc_node(struct mlx5_core_dev *dev, int size, int max_direct,
struct mlx5_buf *buf, int node);
int mlx5_buf_alloc(struct mlx5_core_dev *dev, int size, int max_direct,
struct mlx5_buf *buf);
void mlx5_buf_free(struct mlx5_core_dev *dev, struct mlx5_buf *buf);
int mlx5_core_create_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
struct mlx5_create_srq_mbox_in *in, int inlen,
int is_xrc);
int mlx5_core_destroy_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq);
int mlx5_core_query_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
struct mlx5_query_srq_mbox_out *out);
int mlx5_core_query_vendor_id(struct mlx5_core_dev *mdev, u32 *vendor_id);
int mlx5_core_arm_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
u16 lwm, int is_srq);
void mlx5_init_mr_table(struct mlx5_core_dev *dev);
void mlx5_cleanup_mr_table(struct mlx5_core_dev *dev);
int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
struct mlx5_create_mkey_mbox_in *in, int inlen,
mlx5_cmd_cbk_t callback, void *context,
struct mlx5_create_mkey_mbox_out *out);
int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr);
int mlx5_core_query_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
struct mlx5_query_mkey_mbox_out *out, int outlen);
int mlx5_core_dump_fill_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
u32 *mkey);
int mlx5_core_alloc_pd(struct mlx5_core_dev *dev, u32 *pdn);
int mlx5_core_dealloc_pd(struct mlx5_core_dev *dev, u32 pdn);
int mlx5_core_mad_ifc(struct mlx5_core_dev *dev, void *inb, void *outb,
u16 opmod, u8 port);
void mlx5_pagealloc_init(struct mlx5_core_dev *dev);
void mlx5_pagealloc_cleanup(struct mlx5_core_dev *dev);
int mlx5_pagealloc_start(struct mlx5_core_dev *dev);
void mlx5_pagealloc_stop(struct mlx5_core_dev *dev);
void mlx5_core_req_pages_handler(struct mlx5_core_dev *dev, u16 func_id,
s32 npages);
int mlx5_satisfy_startup_pages(struct mlx5_core_dev *dev, int boot);
int mlx5_reclaim_startup_pages(struct mlx5_core_dev *dev);
void mlx5_register_debugfs(void);
void mlx5_unregister_debugfs(void);
int mlx5_eq_init(struct mlx5_core_dev *dev);
void mlx5_eq_cleanup(struct mlx5_core_dev *dev);
void mlx5_fill_page_array(struct mlx5_buf *buf, __be64 *pas);
void mlx5_cq_completion(struct mlx5_core_dev *dev, u32 cqn);
void mlx5_rsc_event(struct mlx5_core_dev *dev, u32 rsn, int event_type);
void mlx5_srq_event(struct mlx5_core_dev *dev, u32 srqn, int event_type);
struct mlx5_core_srq *mlx5_core_get_srq(struct mlx5_core_dev *dev, u32 srqn);
void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, unsigned long vector);
void mlx5_cq_event(struct mlx5_core_dev *dev, u32 cqn, int event_type);
int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
int nent, u64 mask, const char *name, struct mlx5_uar *uar);
int mlx5_destroy_unmap_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq);
int mlx5_start_eqs(struct mlx5_core_dev *dev);
int mlx5_stop_eqs(struct mlx5_core_dev *dev);
int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn, int *irqn);
int mlx5_core_attach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, u32 qpn);
int mlx5_core_detach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, u32 qpn);
int mlx5_qp_debugfs_init(struct mlx5_core_dev *dev);
void mlx5_qp_debugfs_cleanup(struct mlx5_core_dev *dev);
int mlx5_core_access_reg(struct mlx5_core_dev *dev, void *data_in,
int size_in, void *data_out, int size_out,
u16 reg_num, int arg, int write);
int mlx5_set_port_caps(struct mlx5_core_dev *dev, u8 port_num, u32 caps);
int mlx5_query_port_ptys(struct mlx5_core_dev *dev, u32 *ptys,
int ptys_size, int proto_mask);
int mlx5_query_port_proto_cap(struct mlx5_core_dev *dev,
u32 *proto_cap, int proto_mask);
int mlx5_query_port_proto_admin(struct mlx5_core_dev *dev,
u32 *proto_admin, int proto_mask);
int mlx5_set_port_proto(struct mlx5_core_dev *dev, u32 proto_admin,
int proto_mask);
int mlx5_set_port_status(struct mlx5_core_dev *dev,
enum mlx5_port_status status);
int mlx5_query_port_status(struct mlx5_core_dev *dev, u8 *status);
int mlx5_set_port_pause(struct mlx5_core_dev *dev, u32 port,
u32 rx_pause, u32 tx_pause);
int mlx5_query_port_pause(struct mlx5_core_dev *dev, u32 port,
u32 *rx_pause, u32 *tx_pause);
int mlx5_set_port_mtu(struct mlx5_core_dev *dev, int mtu);
int mlx5_query_port_max_mtu(struct mlx5_core_dev *dev, int *max_mtu);
int mlx5_query_port_oper_mtu(struct mlx5_core_dev *dev, int *oper_mtu);
unsigned int mlx5_query_module_status(struct mlx5_core_dev *dev, int module_num);
int mlx5_query_module_num(struct mlx5_core_dev *dev, int *module_num);
int mlx5_query_eeprom(struct mlx5_core_dev *dev, int i2c_addr, int page_num,
int device_addr, int size, int module_num, u32 *data,
int *size_read);
int mlx5_debug_eq_add(struct mlx5_core_dev *dev, struct mlx5_eq *eq);
void mlx5_debug_eq_remove(struct mlx5_core_dev *dev, struct mlx5_eq *eq);
int mlx5_core_eq_query(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
struct mlx5_query_eq_mbox_out *out, int outlen);
int mlx5_eq_debugfs_init(struct mlx5_core_dev *dev);
void mlx5_eq_debugfs_cleanup(struct mlx5_core_dev *dev);
int mlx5_cq_debugfs_init(struct mlx5_core_dev *dev);
void mlx5_cq_debugfs_cleanup(struct mlx5_core_dev *dev);
int mlx5_db_alloc(struct mlx5_core_dev *dev, struct mlx5_db *db);
int mlx5_db_alloc_node(struct mlx5_core_dev *dev, struct mlx5_db *db,
int node);
void mlx5_db_free(struct mlx5_core_dev *dev, struct mlx5_db *db);
const char *mlx5_command_str(int command);
int mlx5_cmdif_debugfs_init(struct mlx5_core_dev *dev);
void mlx5_cmdif_debugfs_cleanup(struct mlx5_core_dev *dev);
int mlx5_core_create_psv(struct mlx5_core_dev *dev, u32 pdn,
int npsvs, u32 *sig_index);
int mlx5_core_destroy_psv(struct mlx5_core_dev *dev, int psv_num);
void mlx5_core_put_rsc(struct mlx5_core_rsc_common *common);
u8 mlx5_is_wol_supported(struct mlx5_core_dev *dev);
int mlx5_set_wol(struct mlx5_core_dev *dev, u8 wol_mode);
int mlx5_query_wol(struct mlx5_core_dev *dev, u8 *wol_mode);
int mlx5_core_access_pvlc(struct mlx5_core_dev *dev,
struct mlx5_pvlc_reg *pvlc, int write);
int mlx5_core_access_ptys(struct mlx5_core_dev *dev,
struct mlx5_ptys_reg *ptys, int write);
int mlx5_core_access_pmtu(struct mlx5_core_dev *dev,
struct mlx5_pmtu_reg *pmtu, int write);
int mlx5_vxlan_udp_port_add(struct mlx5_core_dev *dev, u16 port);
int mlx5_vxlan_udp_port_delete(struct mlx5_core_dev *dev, u16 port);
int mlx5_query_port_cong_status(struct mlx5_core_dev *mdev, int protocol,
int priority, int *is_enable);
int mlx5_modify_port_cong_status(struct mlx5_core_dev *mdev, int protocol,
int priority, int enable);
int mlx5_query_port_cong_params(struct mlx5_core_dev *mdev, int protocol,
void *out, int out_size);
int mlx5_modify_port_cong_params(struct mlx5_core_dev *mdev,
void *in, int in_size);
int mlx5_query_port_cong_statistics(struct mlx5_core_dev *mdev, int clear,
void *out, int out_size);
static inline u32 mlx5_mkey_to_idx(u32 mkey)
{
return mkey >> 8;
}
static inline u32 mlx5_idx_to_mkey(u32 mkey_idx)
{
return mkey_idx << 8;
}
static inline u8 mlx5_mkey_variant(u32 mkey)
{
return mkey & 0xff;
}
enum {
MLX5_PROF_MASK_QP_SIZE = (u64)1 << 0,
MLX5_PROF_MASK_MR_CACHE = (u64)1 << 1,
};
enum {
MAX_MR_CACHE_ENTRIES = 16,
};
enum {
MLX5_INTERFACE_PROTOCOL_IB = 0,
MLX5_INTERFACE_PROTOCOL_ETH = 1,
};
struct mlx5_interface {
void * (*add)(struct mlx5_core_dev *dev);
void (*remove)(struct mlx5_core_dev *dev, void *context);
void (*event)(struct mlx5_core_dev *dev, void *context,
enum mlx5_dev_event event, unsigned long param);
void * (*get_dev)(void *context);
int protocol;
struct list_head list;
};
void *mlx5_get_protocol_dev(struct mlx5_core_dev *mdev, int protocol);
int mlx5_register_interface(struct mlx5_interface *intf);
void mlx5_unregister_interface(struct mlx5_interface *intf);
struct mlx5_profile {
u64 mask;
u8 log_max_qp;
struct {
int size;
int limit;
} mr_cache[MAX_MR_CACHE_ENTRIES];
};
#define MLX5_EEPROM_MAX_BYTES 32
#define MLX5_EEPROM_IDENTIFIER_BYTE_MASK 0x000000ff
#define MLX5_EEPROM_REVISION_ID_BYTE_MASK 0x0000ff00
#define MLX5_EEPROM_PAGE_3_VALID_BIT_MASK 0x00040000
#endif /* MLX5_DRIVER_H */
Index: projects/vnet/sys/dev/mlx5/mlx5_core/mlx5_vport.c
===================================================================
--- projects/vnet/sys/dev/mlx5/mlx5_core/mlx5_vport.c (revision 301546)
+++ projects/vnet/sys/dev/mlx5/mlx5_core/mlx5_vport.c (revision 301547)
@@ -1,922 +1,1280 @@
/*-
* Copyright (c) 2013-2015, Mellanox Technologies, Ltd. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS `AS IS' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $FreeBSD$
*/
#include
#include
#include
#include "mlx5_core.h"
u8 mlx5_query_vport_state(struct mlx5_core_dev *mdev, u8 opmod)
{
u32 in[MLX5_ST_SZ_DW(query_vport_state_in)];
u32 out[MLX5_ST_SZ_DW(query_vport_state_out)];
int err;
memset(in, 0, sizeof(in));
MLX5_SET(query_vport_state_in, in, opcode,
MLX5_CMD_OP_QUERY_VPORT_STATE);
MLX5_SET(query_vport_state_in, in, op_mod, opmod);
err = mlx5_cmd_exec_check_status(mdev, in, sizeof(in), out,
sizeof(out));
if (err)
mlx5_core_warn(mdev, "MLX5_CMD_OP_QUERY_VPORT_STATE failed\n");
return MLX5_GET(query_vport_state_out, out, state);
}
EXPORT_SYMBOL_GPL(mlx5_query_vport_state);
static int mlx5_query_nic_vport_context(struct mlx5_core_dev *mdev, u32 vport,
u32 *out, int outlen)
{
u32 in[MLX5_ST_SZ_DW(query_nic_vport_context_in)];
memset(in, 0, sizeof(in));
MLX5_SET(query_nic_vport_context_in, in, opcode,
MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
if (vport)
MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
return mlx5_cmd_exec_check_status(mdev, in, sizeof(in), out, outlen);
}
int mlx5_vport_alloc_q_counter(struct mlx5_core_dev *mdev, int *counter_set_id)
{
u32 in[MLX5_ST_SZ_DW(alloc_q_counter_in)];
u32 out[MLX5_ST_SZ_DW(alloc_q_counter_out)];
int err;
memset(in, 0, sizeof(in));
memset(out, 0, sizeof(out));
MLX5_SET(alloc_q_counter_in, in, opcode,
MLX5_CMD_OP_ALLOC_Q_COUNTER);
err = mlx5_cmd_exec_check_status(mdev, in, sizeof(in),
out, sizeof(out));
if (err)
return err;
*counter_set_id = MLX5_GET(alloc_q_counter_out, out,
counter_set_id);
return err;
}
int mlx5_vport_dealloc_q_counter(struct mlx5_core_dev *mdev,
int counter_set_id)
{
u32 in[MLX5_ST_SZ_DW(dealloc_q_counter_in)];
u32 out[MLX5_ST_SZ_DW(dealloc_q_counter_out)];
memset(in, 0, sizeof(in));
memset(out, 0, sizeof(out));
MLX5_SET(dealloc_q_counter_in, in, opcode,
MLX5_CMD_OP_DEALLOC_Q_COUNTER);
MLX5_SET(dealloc_q_counter_in, in, counter_set_id,
counter_set_id);
return mlx5_cmd_exec_check_status(mdev, in, sizeof(in),
out, sizeof(out));
}
static int mlx5_vport_query_q_counter(struct mlx5_core_dev *mdev,
int counter_set_id,
int reset,
void *out,
int out_size)
{
u32 in[MLX5_ST_SZ_DW(query_q_counter_in)];
memset(in, 0, sizeof(in));
MLX5_SET(query_q_counter_in, in, opcode, MLX5_CMD_OP_QUERY_Q_COUNTER);
MLX5_SET(query_q_counter_in, in, clear, reset);
MLX5_SET(query_q_counter_in, in, counter_set_id, counter_set_id);
return mlx5_cmd_exec_check_status(mdev, in, sizeof(in),
out, out_size);
}
int mlx5_vport_query_out_of_rx_buffer(struct mlx5_core_dev *mdev,
int counter_set_id,
u32 *out_of_rx_buffer)
{
u32 out[MLX5_ST_SZ_DW(query_q_counter_out)];
int err;
memset(out, 0, sizeof(out));
err = mlx5_vport_query_q_counter(mdev, counter_set_id, 0, out,
sizeof(out));
if (err)
return err;
*out_of_rx_buffer = MLX5_GET(query_q_counter_out, out,
out_of_buffer);
return err;
}
int mlx5_query_nic_vport_mac_address(struct mlx5_core_dev *mdev,
u32 vport, u8 *addr)
{
u32 *out;
int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
u8 *out_addr;
int err;
out = mlx5_vzalloc(outlen);
if (!out)
return -ENOMEM;
out_addr = MLX5_ADDR_OF(query_nic_vport_context_out, out,
nic_vport_context.permanent_address);
err = mlx5_query_nic_vport_context(mdev, vport, out, outlen);
if (err)
goto out;
ether_addr_copy(addr, &out_addr[2]);
out:
kvfree(out);
return err;
}
EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_mac_address);
int mlx5_query_nic_vport_system_image_guid(struct mlx5_core_dev *mdev,
u64 *system_image_guid)
{
u32 *out;
int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
int err;
out = mlx5_vzalloc(outlen);
if (!out)
return -ENOMEM;
err = mlx5_query_nic_vport_context(mdev, 0, out, outlen);
if (err)
goto out;
*system_image_guid = MLX5_GET64(query_nic_vport_context_out, out,
nic_vport_context.system_image_guid);
out:
kvfree(out);
return err;
}
EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_system_image_guid);
int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid)
{
u32 *out;
int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
int err;
out = mlx5_vzalloc(outlen);
if (!out)
return -ENOMEM;
err = mlx5_query_nic_vport_context(mdev, 0, out, outlen);
if (err)
goto out;
*node_guid = MLX5_GET64(query_nic_vport_context_out, out,
nic_vport_context.node_guid);
out:
kvfree(out);
return err;
}
EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_node_guid);
int mlx5_query_nic_vport_port_guid(struct mlx5_core_dev *mdev, u64 *port_guid)
{
u32 *out;
int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
int err;
out = mlx5_vzalloc(outlen);
if (!out)
return -ENOMEM;
err = mlx5_query_nic_vport_context(mdev, 0, out, outlen);
if (err)
goto out;
*port_guid = MLX5_GET64(query_nic_vport_context_out, out,
nic_vport_context.port_guid);
out:
kvfree(out);
return err;
}
EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_port_guid);
int mlx5_query_nic_vport_qkey_viol_cntr(struct mlx5_core_dev *mdev,
u16 *qkey_viol_cntr)
{
u32 *out;
int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
int err;
out = mlx5_vzalloc(outlen);
if (!out)
return -ENOMEM;
err = mlx5_query_nic_vport_context(mdev, 0, out, outlen);
if (err)
goto out;
*qkey_viol_cntr = MLX5_GET(query_nic_vport_context_out, out,
nic_vport_context.qkey_violation_counter);
out:
kvfree(out);
return err;
}
EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_qkey_viol_cntr);
static int mlx5_modify_nic_vport_context(struct mlx5_core_dev *mdev, void *in,
int inlen)
{
u32 out[MLX5_ST_SZ_DW(modify_nic_vport_context_out)];
MLX5_SET(modify_nic_vport_context_in, in, opcode,
MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
memset(out, 0, sizeof(out));
return mlx5_cmd_exec_check_status(mdev, in, inlen, out, sizeof(out));
}
static int mlx5_nic_vport_enable_disable_roce(struct mlx5_core_dev *mdev,
int enable_disable)
{
void *in;
int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
int err;
in = mlx5_vzalloc(inlen);
if (!in) {
mlx5_core_warn(mdev, "failed to allocate inbox\n");
return -ENOMEM;
}
MLX5_SET(modify_nic_vport_context_in, in, field_select.roce_en, 1);
MLX5_SET(modify_nic_vport_context_in, in, nic_vport_context.roce_en,
enable_disable);
err = mlx5_modify_nic_vport_context(mdev, in, inlen);
kvfree(in);
return err;
}
int mlx5_set_nic_vport_current_mac(struct mlx5_core_dev *mdev, int vport,
bool other_vport, u8 *addr)
{
void *in;
int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in)
+ MLX5_ST_SZ_BYTES(mac_address_layout);
u8 *mac_layout;
u8 *mac_ptr;
int err;
in = mlx5_vzalloc(inlen);
if (!in) {
mlx5_core_warn(mdev, "failed to allocate inbox\n");
return -ENOMEM;
}
MLX5_SET(modify_nic_vport_context_in, in,
opcode, MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
MLX5_SET(modify_nic_vport_context_in, in,
vport_number, vport);
MLX5_SET(modify_nic_vport_context_in, in,
other_vport, other_vport);
MLX5_SET(modify_nic_vport_context_in, in,
field_select.addresses_list, 1);
MLX5_SET(modify_nic_vport_context_in, in,
nic_vport_context.allowed_list_type,
MLX5_NIC_VPORT_LIST_TYPE_UC);
MLX5_SET(modify_nic_vport_context_in, in,
nic_vport_context.allowed_list_size, 1);
mac_layout = (u8 *)MLX5_ADDR_OF(modify_nic_vport_context_in, in,
nic_vport_context.current_uc_mac_address);
mac_ptr = (u8 *)MLX5_ADDR_OF(mac_address_layout, mac_layout,
mac_addr_47_32);
ether_addr_copy(mac_ptr, addr);
err = mlx5_modify_nic_vport_context(mdev, in, inlen);
kvfree(in);
return err;
}
EXPORT_SYMBOL_GPL(mlx5_set_nic_vport_current_mac);
int mlx5_set_nic_vport_vlan_list(struct mlx5_core_dev *dev, u32 vport,
u16 *vlan_list, int list_len)
{
void *in, *ctx;
int i, err;
int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in)
+ MLX5_ST_SZ_BYTES(vlan_layout) * (int)list_len;
int max_list_size = 1 << MLX5_CAP_GEN_MAX(dev, log_max_vlan_list);
if (list_len > max_list_size) {
mlx5_core_warn(dev, "Requested list size (%d) > (%d) max_list_size\n",
list_len, max_list_size);
return -ENOSPC;
}
in = mlx5_vzalloc(inlen);
if (!in) {
mlx5_core_warn(dev, "failed to allocate inbox\n");
return -ENOMEM;
}
MLX5_SET(modify_nic_vport_context_in, in, vport_number, vport);
if (vport)
MLX5_SET(modify_nic_vport_context_in, in,
other_vport, 1);
MLX5_SET(modify_nic_vport_context_in, in,
field_select.addresses_list, 1);
ctx = MLX5_ADDR_OF(modify_nic_vport_context_in, in, nic_vport_context);
MLX5_SET(nic_vport_context, ctx, allowed_list_type,
MLX5_NIC_VPORT_LIST_TYPE_VLAN);
MLX5_SET(nic_vport_context, ctx, allowed_list_size, list_len);
for (i = 0; i < list_len; i++) {
u8 *vlan_lout = MLX5_ADDR_OF(nic_vport_context, ctx,
current_uc_mac_address[i]);
MLX5_SET(vlan_layout, vlan_lout, vlan, vlan_list[i]);
}
err = mlx5_modify_nic_vport_context(dev, in, inlen);
kvfree(in);
return err;
}
EXPORT_SYMBOL_GPL(mlx5_set_nic_vport_vlan_list);
int mlx5_set_nic_vport_mc_list(struct mlx5_core_dev *mdev, int vport,
u64 *addr_list, size_t addr_list_len)
{
void *in, *ctx;
int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in)
+ MLX5_ST_SZ_BYTES(mac_address_layout) * (int)addr_list_len;
int err;
size_t i;
int max_list_sz = 1 << MLX5_CAP_GEN_MAX(mdev, log_max_current_mc_list);
if ((int)addr_list_len > max_list_sz) {
mlx5_core_warn(mdev, "Requested list size (%d) > (%d) max_list_size\n",
(int)addr_list_len, max_list_sz);
return -ENOSPC;
}
in = mlx5_vzalloc(inlen);
if (!in) {
mlx5_core_warn(mdev, "failed to allocate inbox\n");
return -ENOMEM;
}
MLX5_SET(modify_nic_vport_context_in, in, vport_number, vport);
if (vport)
MLX5_SET(modify_nic_vport_context_in, in,
other_vport, 1);
MLX5_SET(modify_nic_vport_context_in, in,
field_select.addresses_list, 1);
ctx = MLX5_ADDR_OF(modify_nic_vport_context_in, in, nic_vport_context);
MLX5_SET(nic_vport_context, ctx, allowed_list_type,
MLX5_NIC_VPORT_LIST_TYPE_MC);
MLX5_SET(nic_vport_context, ctx, allowed_list_size, addr_list_len);
for (i = 0; i < addr_list_len; i++) {
u8 *mac_lout = (u8 *)MLX5_ADDR_OF(nic_vport_context, ctx,
current_uc_mac_address[i]);
u8 *mac_ptr = (u8 *)MLX5_ADDR_OF(mac_address_layout, mac_lout,
mac_addr_47_32);
ether_addr_copy(mac_ptr, (u8 *)&addr_list[i]);
}
err = mlx5_modify_nic_vport_context(mdev, in, inlen);
kvfree(in);
return err;
}
EXPORT_SYMBOL_GPL(mlx5_set_nic_vport_mc_list);
int mlx5_set_nic_vport_promisc(struct mlx5_core_dev *mdev, int vport,
bool promisc_mc, bool promisc_uc,
bool promisc_all)
{
u8 in[MLX5_ST_SZ_BYTES(modify_nic_vport_context_in)];
u8 *ctx = MLX5_ADDR_OF(modify_nic_vport_context_in, in,
nic_vport_context);
memset(in, 0, MLX5_ST_SZ_BYTES(modify_nic_vport_context_in));
MLX5_SET(modify_nic_vport_context_in, in, vport_number, vport);
if (vport)
MLX5_SET(modify_nic_vport_context_in, in,
other_vport, 1);
MLX5_SET(modify_nic_vport_context_in, in, field_select.promisc, 1);
if (promisc_mc)
MLX5_SET(nic_vport_context, ctx, promisc_mc, 1);
if (promisc_uc)
MLX5_SET(nic_vport_context, ctx, promisc_uc, 1);
if (promisc_all)
MLX5_SET(nic_vport_context, ctx, promisc_all, 1);
return mlx5_modify_nic_vport_context(mdev, in, sizeof(in));
}
EXPORT_SYMBOL_GPL(mlx5_set_nic_vport_promisc);
+
+int mlx5_query_nic_vport_mac_list(struct mlx5_core_dev *dev,
+ u32 vport,
+ enum mlx5_list_type list_type,
+ u8 addr_list[][ETH_ALEN],
+ int *list_size)
+{
+ u32 in[MLX5_ST_SZ_DW(query_nic_vport_context_in)];
+ void *nic_vport_ctx;
+ int max_list_size;
+ int req_list_size;
+ u8 *mac_addr;
+ int out_sz;
+ void *out;
+ int err;
+ int i;
+
+ req_list_size = *list_size;
+
+ max_list_size = (list_type == MLX5_NIC_VPORT_LIST_TYPE_UC) ?
+ 1 << MLX5_CAP_GEN_MAX(dev, log_max_current_uc_list) :
+ 1 << MLX5_CAP_GEN_MAX(dev, log_max_current_mc_list);
+
+ if (req_list_size > max_list_size) {
+ mlx5_core_warn(dev, "Requested list size (%d) > (%d) max_list_size\n",
+ req_list_size, max_list_size);
+ req_list_size = max_list_size;
+ }
+
+ out_sz = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in) +
+ req_list_size * MLX5_ST_SZ_BYTES(mac_address_layout);
+
+ memset(in, 0, sizeof(in));
+ out = kzalloc(out_sz, GFP_KERNEL);
+ if (!out)
+ return -ENOMEM;
+
+ MLX5_SET(query_nic_vport_context_in, in, opcode,
+ MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
+ MLX5_SET(query_nic_vport_context_in, in, allowed_list_type, list_type);
+ MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
+
+ if (vport)
+ MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
+
+ err = mlx5_cmd_exec_check_status(dev, in, sizeof(in), out, out_sz);
+ if (err)
+ goto out;
+
+ nic_vport_ctx = MLX5_ADDR_OF(query_nic_vport_context_out, out,
+ nic_vport_context);
+ req_list_size = MLX5_GET(nic_vport_context, nic_vport_ctx,
+ allowed_list_size);
+
+ *list_size = req_list_size;
+ for (i = 0; i < req_list_size; i++) {
+ mac_addr = MLX5_ADDR_OF(nic_vport_context,
+ nic_vport_ctx,
+ current_uc_mac_address[i]) + 2;
+ ether_addr_copy(addr_list[i], mac_addr);
+ }
+out:
+ kfree(out);
+ return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_mac_list);
+
+int mlx5_modify_nic_vport_mac_list(struct mlx5_core_dev *dev,
+ enum mlx5_list_type list_type,
+ u8 addr_list[][ETH_ALEN],
+ int list_size)
+{
+ u32 out[MLX5_ST_SZ_DW(modify_nic_vport_context_out)];
+ void *nic_vport_ctx;
+ int max_list_size;
+ int in_sz;
+ void *in;
+ int err;
+ int i;
+
+ max_list_size = list_type == MLX5_NIC_VPORT_LIST_TYPE_UC ?
+ 1 << MLX5_CAP_GEN(dev, log_max_current_uc_list) :
+ 1 << MLX5_CAP_GEN(dev, log_max_current_mc_list);
+
+ if (list_size > max_list_size)
+ return -ENOSPC;
+
+ in_sz = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in) +
+ list_size * MLX5_ST_SZ_BYTES(mac_address_layout);
+
+ memset(out, 0, sizeof(out));
+ in = kzalloc(in_sz, GFP_KERNEL);
+ if (!in)
+ return -ENOMEM;
+
+ MLX5_SET(modify_nic_vport_context_in, in, opcode,
+ MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
+ MLX5_SET(modify_nic_vport_context_in, in,
+ field_select.addresses_list, 1);
+
+ nic_vport_ctx = MLX5_ADDR_OF(modify_nic_vport_context_in, in,
+ nic_vport_context);
+
+ MLX5_SET(nic_vport_context, nic_vport_ctx,
+ allowed_list_type, list_type);
+ MLX5_SET(nic_vport_context, nic_vport_ctx,
+ allowed_list_size, list_size);
+
+ for (i = 0; i < list_size; i++) {
+ u8 *curr_mac = MLX5_ADDR_OF(nic_vport_context,
+ nic_vport_ctx,
+ current_uc_mac_address[i]) + 2;
+ ether_addr_copy(curr_mac, addr_list[i]);
+ }
+
+ err = mlx5_cmd_exec_check_status(dev, in, in_sz, out, sizeof(out));
+ kfree(in);
+ return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_modify_nic_vport_mac_list);
+
+int mlx5_query_nic_vport_vlan_list(struct mlx5_core_dev *dev,
+ u32 vport,
+ u16 *vlan_list,
+ int *list_size)
+{
+ u32 in[MLX5_ST_SZ_DW(query_nic_vport_context_in)];
+ void *nic_vport_ctx;
+ int max_list_size;
+ int req_list_size;
+ int out_sz;
+ void *out;
+ void *vlan_addr;
+ int err;
+ int i;
+
+ req_list_size = *list_size;
+
+ max_list_size = 1 << MLX5_CAP_GEN_MAX(dev, log_max_vlan_list);
+
+ if (req_list_size > max_list_size) {
+ mlx5_core_warn(dev, "Requested list size (%d) > (%d) max_list_size\n",
+ req_list_size, max_list_size);
+ req_list_size = max_list_size;
+ }
+
+ out_sz = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in) +
+ req_list_size * MLX5_ST_SZ_BYTES(vlan_layout);
+
+ memset(in, 0, sizeof(in));
+ out = kzalloc(out_sz, GFP_KERNEL);
+ if (!out)
+ return -ENOMEM;
+
+ MLX5_SET(query_nic_vport_context_in, in, opcode,
+ MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT);
+ MLX5_SET(query_nic_vport_context_in, in, allowed_list_type,
+ MLX5_NIC_VPORT_CONTEXT_ALLOWED_LIST_TYPE_VLAN_LIST);
+ MLX5_SET(query_nic_vport_context_in, in, vport_number, vport);
+
+ if (vport)
+ MLX5_SET(query_nic_vport_context_in, in, other_vport, 1);
+
+ err = mlx5_cmd_exec_check_status(dev, in, sizeof(in), out, out_sz);
+ if (err)
+ goto out;
+
+ nic_vport_ctx = MLX5_ADDR_OF(query_nic_vport_context_out, out,
+ nic_vport_context);
+ req_list_size = MLX5_GET(nic_vport_context, nic_vport_ctx,
+ allowed_list_size);
+
+ *list_size = req_list_size;
+ for (i = 0; i < req_list_size; i++) {
+ vlan_addr = MLX5_ADDR_OF(nic_vport_context, nic_vport_ctx,
+ current_uc_mac_address[i]);
+ vlan_list[i] = MLX5_GET(vlan_layout, vlan_addr, vlan);
+ }
+out:
+ kfree(out);
+ return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_vlan_list);
+
+int mlx5_modify_nic_vport_vlans(struct mlx5_core_dev *dev,
+ u16 vlans[],
+ int list_size)
+{
+ u32 out[MLX5_ST_SZ_DW(modify_nic_vport_context_out)];
+ void *nic_vport_ctx;
+ int max_list_size;
+ int in_sz;
+ void *in;
+ int err;
+ int i;
+
+ max_list_size = 1 << MLX5_CAP_GEN(dev, log_max_vlan_list);
+
+ if (list_size > max_list_size)
+ return -ENOSPC;
+
+ in_sz = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in) +
+ list_size * MLX5_ST_SZ_BYTES(vlan_layout);
+
+ memset(out, 0, sizeof(out));
+ in = kzalloc(in_sz, GFP_KERNEL);
+ if (!in)
+ return -ENOMEM;
+
+ MLX5_SET(modify_nic_vport_context_in, in, opcode,
+ MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
+ MLX5_SET(modify_nic_vport_context_in, in,
+ field_select.addresses_list, 1);
+
+ nic_vport_ctx = MLX5_ADDR_OF(modify_nic_vport_context_in, in,
+ nic_vport_context);
+
+ MLX5_SET(nic_vport_context, nic_vport_ctx,
+ allowed_list_type, MLX5_NIC_VPORT_LIST_TYPE_VLAN);
+ MLX5_SET(nic_vport_context, nic_vport_ctx,
+ allowed_list_size, list_size);
+
+ for (i = 0; i < list_size; i++) {
+ void *vlan_addr = MLX5_ADDR_OF(nic_vport_context,
+ nic_vport_ctx,
+ current_uc_mac_address[i]);
+ MLX5_SET(vlan_layout, vlan_addr, vlan, vlans[i]);
+ }
+
+ err = mlx5_cmd_exec_check_status(dev, in, in_sz, out, sizeof(out));
+ kfree(in);
+ return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_modify_nic_vport_vlans);
+
int mlx5_set_nic_vport_permanent_mac(struct mlx5_core_dev *mdev, int vport,
u8 *addr)
{
void *in;
int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
u8 *mac_ptr;
int err;
in = mlx5_vzalloc(inlen);
if (!in) {
mlx5_core_warn(mdev, "failed to allocate inbox\n");
return -ENOMEM;
}
MLX5_SET(modify_nic_vport_context_in, in,
opcode, MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
MLX5_SET(modify_nic_vport_context_in, in, vport_number, vport);
MLX5_SET(modify_nic_vport_context_in, in, other_vport, 1);
MLX5_SET(modify_nic_vport_context_in, in,
field_select.permanent_address, 1);
mac_ptr = (u8 *)MLX5_ADDR_OF(modify_nic_vport_context_in, in,
nic_vport_context.permanent_address.mac_addr_47_32);
ether_addr_copy(mac_ptr, addr);
err = mlx5_modify_nic_vport_context(mdev, in, inlen);
kvfree(in);
return err;
}
EXPORT_SYMBOL_GPL(mlx5_set_nic_vport_permanent_mac);
int mlx5_nic_vport_enable_roce(struct mlx5_core_dev *mdev)
{
return mlx5_nic_vport_enable_disable_roce(mdev, 1);
}
EXPORT_SYMBOL_GPL(mlx5_nic_vport_enable_roce);
int mlx5_nic_vport_disable_roce(struct mlx5_core_dev *mdev)
{
return mlx5_nic_vport_enable_disable_roce(mdev, 0);
}
EXPORT_SYMBOL_GPL(mlx5_nic_vport_disable_roce);
int mlx5_query_hca_vport_context(struct mlx5_core_dev *mdev,
u8 port_num, u8 vport_num, u32 *out,
int outlen)
{
u32 in[MLX5_ST_SZ_DW(query_hca_vport_context_in)];
int is_group_manager;
is_group_manager = MLX5_CAP_GEN(mdev, vport_group_manager);
memset(in, 0, sizeof(in));
MLX5_SET(query_hca_vport_context_in, in, opcode,
MLX5_CMD_OP_QUERY_HCA_VPORT_CONTEXT);
if (vport_num) {
if (is_group_manager) {
MLX5_SET(query_hca_vport_context_in, in, other_vport,
1);
MLX5_SET(query_hca_vport_context_in, in, vport_number,
vport_num);
} else {
return -EPERM;
}
}
if (MLX5_CAP_GEN(mdev, num_ports) == 2)
MLX5_SET(query_hca_vport_context_in, in, port_num, port_num);
return mlx5_cmd_exec_check_status(mdev, in, sizeof(in), out, outlen);
}
int mlx5_query_hca_vport_system_image_guid(struct mlx5_core_dev *mdev,
u64 *system_image_guid)
{
u32 *out;
int outlen = MLX5_ST_SZ_BYTES(query_hca_vport_context_out);
int err;
out = mlx5_vzalloc(outlen);
if (!out)
return -ENOMEM;
err = mlx5_query_hca_vport_context(mdev, 1, 0, out, outlen);
if (err)
goto out;
*system_image_guid = MLX5_GET64(query_hca_vport_context_out, out,
hca_vport_context.system_image_guid);
out:
kvfree(out);
return err;
}
EXPORT_SYMBOL_GPL(mlx5_query_hca_vport_system_image_guid);
int mlx5_query_hca_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid)
{
u32 *out;
int outlen = MLX5_ST_SZ_BYTES(query_hca_vport_context_out);
int err;
out = mlx5_vzalloc(outlen);
if (!out)
return -ENOMEM;
err = mlx5_query_hca_vport_context(mdev, 1, 0, out, outlen);
if (err)
goto out;
*node_guid = MLX5_GET64(query_hca_vport_context_out, out,
hca_vport_context.node_guid);
out:
kvfree(out);
return err;
}
EXPORT_SYMBOL_GPL(mlx5_query_hca_vport_node_guid);
int mlx5_query_hca_vport_gid(struct mlx5_core_dev *dev, u8 port_num,
u16 vport_num, u16 gid_index, union ib_gid *gid)
{
int in_sz = MLX5_ST_SZ_BYTES(query_hca_vport_gid_in);
int out_sz = MLX5_ST_SZ_BYTES(query_hca_vport_gid_out);
int is_group_manager;
void *out = NULL;
void *in = NULL;
union ib_gid *tmp;
int tbsz;
int nout;
int err;
is_group_manager = MLX5_CAP_GEN(dev, vport_group_manager);
tbsz = mlx5_get_gid_table_len(MLX5_CAP_GEN(dev, gid_table_size));
if (gid_index > tbsz && gid_index != 0xffff)
return -EINVAL;
if (gid_index == 0xffff)
nout = tbsz;
else
nout = 1;
out_sz += nout * sizeof(*gid);
in = mlx5_vzalloc(in_sz);
out = mlx5_vzalloc(out_sz);
if (!in || !out) {
err = -ENOMEM;
goto out;
}
MLX5_SET(query_hca_vport_gid_in, in, opcode,
MLX5_CMD_OP_QUERY_HCA_VPORT_GID);
if (vport_num) {
if (is_group_manager) {
MLX5_SET(query_hca_vport_gid_in, in, vport_number,
vport_num);
MLX5_SET(query_hca_vport_gid_in, in, other_vport, 1);
} else {
err = -EPERM;
goto out;
}
}
MLX5_SET(query_hca_vport_gid_in, in, gid_index, gid_index);
if (MLX5_CAP_GEN(dev, num_ports) == 2)
MLX5_SET(query_hca_vport_gid_in, in, port_num, port_num);
err = mlx5_cmd_exec(dev, in, in_sz, out, out_sz);
if (err)
goto out;
err = mlx5_cmd_status_to_err_v2(out);
if (err)
goto out;
tmp = (union ib_gid *)MLX5_ADDR_OF(query_hca_vport_gid_out, out, gid);
gid->global.subnet_prefix = tmp->global.subnet_prefix;
gid->global.interface_id = tmp->global.interface_id;
out:
kvfree(in);
kvfree(out);
return err;
}
EXPORT_SYMBOL_GPL(mlx5_query_hca_vport_gid);
int mlx5_query_hca_vport_pkey(struct mlx5_core_dev *dev, u8 other_vport,
u8 port_num, u16 vf_num, u16 pkey_index,
u16 *pkey)
{
int in_sz = MLX5_ST_SZ_BYTES(query_hca_vport_pkey_in);
int out_sz = MLX5_ST_SZ_BYTES(query_hca_vport_pkey_out);
int is_group_manager;
void *out = NULL;
void *in = NULL;
void *pkarr;
int nout;
int tbsz;
int err;
int i;
is_group_manager = MLX5_CAP_GEN(dev, vport_group_manager);
tbsz = mlx5_to_sw_pkey_sz(MLX5_CAP_GEN(dev, pkey_table_size));
if (pkey_index > tbsz && pkey_index != 0xffff)
return -EINVAL;
if (pkey_index == 0xffff)
nout = tbsz;
else
nout = 1;
out_sz += nout * MLX5_ST_SZ_BYTES(pkey);
in = kzalloc(in_sz, GFP_KERNEL);
out = kzalloc(out_sz, GFP_KERNEL);
MLX5_SET(query_hca_vport_pkey_in, in, opcode,
MLX5_CMD_OP_QUERY_HCA_VPORT_PKEY);
if (other_vport) {
if (is_group_manager) {
MLX5_SET(query_hca_vport_pkey_in, in, vport_number,
vf_num);
MLX5_SET(query_hca_vport_pkey_in, in, other_vport, 1);
} else {
err = -EPERM;
goto out;
}
}
MLX5_SET(query_hca_vport_pkey_in, in, pkey_index, pkey_index);
if (MLX5_CAP_GEN(dev, num_ports) == 2)
MLX5_SET(query_hca_vport_pkey_in, in, port_num, port_num);
err = mlx5_cmd_exec(dev, in, in_sz, out, out_sz);
if (err)
goto out;
err = mlx5_cmd_status_to_err_v2(out);
if (err)
goto out;
pkarr = MLX5_ADDR_OF(query_hca_vport_pkey_out, out, pkey);
for (i = 0; i < nout; i++, pkey++,
pkarr += MLX5_ST_SZ_BYTES(pkey))
*pkey = MLX5_GET_PR(pkey, pkarr, pkey);
out:
kfree(in);
kfree(out);
return err;
}
EXPORT_SYMBOL_GPL(mlx5_query_hca_vport_pkey);
static int mlx5_modify_eswitch_vport_context(struct mlx5_core_dev *mdev,
u16 vport, void *in, int inlen)
{
u32 out[MLX5_ST_SZ_DW(modify_esw_vport_context_out)];
int err;
memset(out, 0, sizeof(out));
MLX5_SET(modify_esw_vport_context_in, in, vport_number, vport);
if (vport)
MLX5_SET(modify_esw_vport_context_in, in, other_vport, 1);
MLX5_SET(modify_esw_vport_context_in, in, opcode,
MLX5_CMD_OP_MODIFY_ESW_VPORT_CONTEXT);
err = mlx5_cmd_exec_check_status(mdev, in, inlen,
out, sizeof(out));
if (err)
mlx5_core_warn(mdev, "MLX5_CMD_OP_MODIFY_ESW_VPORT_CONTEXT failed\n");
return err;
}
int mlx5_set_eswitch_cvlan_info(struct mlx5_core_dev *mdev, u8 vport,
u8 insert_mode, u8 strip_mode,
u16 vlan, u8 cfi, u8 pcp)
{
u32 in[MLX5_ST_SZ_DW(modify_esw_vport_context_in)];
memset(in, 0, sizeof(in));
if (insert_mode != MLX5_MODIFY_ESW_VPORT_CONTEXT_CVLAN_INSERT_NONE) {
MLX5_SET(modify_esw_vport_context_in, in,
esw_vport_context.cvlan_cfi, cfi);
MLX5_SET(modify_esw_vport_context_in, in,
esw_vport_context.cvlan_pcp, pcp);
MLX5_SET(modify_esw_vport_context_in, in,
esw_vport_context.cvlan_id, vlan);
}
MLX5_SET(modify_esw_vport_context_in, in,
esw_vport_context.vport_cvlan_insert, insert_mode);
MLX5_SET(modify_esw_vport_context_in, in,
esw_vport_context.vport_cvlan_strip, strip_mode);
MLX5_SET(modify_esw_vport_context_in, in, field_select,
MLX5_MODIFY_ESW_VPORT_CONTEXT_FIELD_SELECT_CVLAN_STRIP |
MLX5_MODIFY_ESW_VPORT_CONTEXT_FIELD_SELECT_CVLAN_INSERT);
return mlx5_modify_eswitch_vport_context(mdev, vport, in, sizeof(in));
}
EXPORT_SYMBOL_GPL(mlx5_set_eswitch_cvlan_info);
+
+int mlx5_arm_vport_context_events(struct mlx5_core_dev *mdev,
+ u8 vport,
+ u32 events_mask)
+{
+ u32 *in;
+ u32 inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
+ void *nic_vport_ctx;
+ int err;
+
+ in = mlx5_vzalloc(inlen);
+ if (!in)
+ return -ENOMEM;
+
+ MLX5_SET(modify_nic_vport_context_in,
+ in,
+ opcode,
+ MLX5_CMD_OP_MODIFY_NIC_VPORT_CONTEXT);
+ MLX5_SET(modify_nic_vport_context_in,
+ in,
+ field_select.change_event,
+ 1);
+ MLX5_SET(modify_nic_vport_context_in, in, vport_number, vport);
+ if (vport)
+ MLX5_SET(modify_nic_vport_context_in, in, other_vport, 1);
+ nic_vport_ctx = MLX5_ADDR_OF(modify_nic_vport_context_in,
+ in,
+ nic_vport_context);
+
+ MLX5_SET(nic_vport_context, nic_vport_ctx, arm_change_event, 1);
+
+ if (events_mask & MLX5_UC_ADDR_CHANGE)
+ MLX5_SET(nic_vport_context,
+ nic_vport_ctx,
+ event_on_uc_address_change,
+ 1);
+ if (events_mask & MLX5_MC_ADDR_CHANGE)
+ MLX5_SET(nic_vport_context,
+ nic_vport_ctx,
+ event_on_mc_address_change,
+ 1);
+ if (events_mask & MLX5_VLAN_CHANGE)
+ MLX5_SET(nic_vport_context,
+ nic_vport_ctx,
+ event_on_vlan_change,
+ 1);
+ if (events_mask & MLX5_PROMISC_CHANGE)
+ MLX5_SET(nic_vport_context,
+ nic_vport_ctx,
+ event_on_promisc_change,
+ 1);
+ if (events_mask & MLX5_MTU_CHANGE)
+ MLX5_SET(nic_vport_context,
+ nic_vport_ctx,
+ event_on_mtu,
+ 1);
+
+ err = mlx5_modify_nic_vport_context(mdev, in, inlen);
+
+ kvfree(in);
+ return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_arm_vport_context_events);
+
+int mlx5_query_vport_promisc(struct mlx5_core_dev *mdev,
+ u32 vport,
+ u8 *promisc_uc,
+ u8 *promisc_mc,
+ u8 *promisc_all)
+{
+ u32 *out;
+ int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
+ int err;
+
+ out = kzalloc(outlen, GFP_KERNEL);
+ if (!out)
+ return -ENOMEM;
+
+ err = mlx5_query_nic_vport_context(mdev, vport, out, outlen);
+ if (err)
+ goto out;
+
+ *promisc_uc = MLX5_GET(query_nic_vport_context_out, out,
+ nic_vport_context.promisc_uc);
+ *promisc_mc = MLX5_GET(query_nic_vport_context_out, out,
+ nic_vport_context.promisc_mc);
+ *promisc_all = MLX5_GET(query_nic_vport_context_out, out,
+ nic_vport_context.promisc_all);
+
+out:
+ kfree(out);
+ return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_query_vport_promisc);
+
+int mlx5_modify_nic_vport_promisc(struct mlx5_core_dev *mdev,
+ int promisc_uc,
+ int promisc_mc,
+ int promisc_all)
+{
+ void *in;
+ int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
+ int err;
+
+ in = mlx5_vzalloc(inlen);
+ if (!in) {
+ mlx5_core_err(mdev, "failed to allocate inbox\n");
+ return -ENOMEM;
+ }
+
+ MLX5_SET(modify_nic_vport_context_in, in, field_select.promisc, 1);
+ MLX5_SET(modify_nic_vport_context_in, in,
+ nic_vport_context.promisc_uc, promisc_uc);
+ MLX5_SET(modify_nic_vport_context_in, in,
+ nic_vport_context.promisc_mc, promisc_mc);
+ MLX5_SET(modify_nic_vport_context_in, in,
+ nic_vport_context.promisc_all, promisc_all);
+
+ err = mlx5_modify_nic_vport_context(mdev, in, inlen);
+ kvfree(in);
+ return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_modify_nic_vport_promisc);
int mlx5_query_vport_counter(struct mlx5_core_dev *dev,
u8 port_num, u16 vport_num,
void *out, int out_size)
{
int in_sz = MLX5_ST_SZ_BYTES(query_vport_counter_in);
int is_group_manager;
void *in;
int err;
is_group_manager = MLX5_CAP_GEN(dev, vport_group_manager);
in = mlx5_vzalloc(in_sz);
if (!in)
return -ENOMEM;
MLX5_SET(query_vport_counter_in, in, opcode,
MLX5_CMD_OP_QUERY_VPORT_COUNTER);
if (vport_num) {
if (is_group_manager) {
MLX5_SET(query_vport_counter_in, in, other_vport, 1);
MLX5_SET(query_vport_counter_in, in, vport_number,
vport_num);
} else {
err = -EPERM;
goto ex;
}
}
if (MLX5_CAP_GEN(dev, num_ports) == 2)
MLX5_SET(query_vport_counter_in, in, port_num, port_num);
err = mlx5_cmd_exec(dev, in, in_sz, out, out_size);
if (err)
goto ex;
err = mlx5_cmd_status_to_err_v2(out);
if (err)
goto ex;
ex:
kvfree(in);
return err;
}
EXPORT_SYMBOL_GPL(mlx5_query_vport_counter);
int mlx5_get_vport_counters(struct mlx5_core_dev *dev, u8 port_num,
struct mlx5_vport_counters *vc)
{
int out_sz = MLX5_ST_SZ_BYTES(query_vport_counter_out);
void *out;
int err;
out = mlx5_vzalloc(out_sz);
if (!out)
return -ENOMEM;
err = mlx5_query_vport_counter(dev, port_num, 0, out, out_sz);
if (err)
goto ex;
vc->received_errors.packets =
MLX5_GET64(query_vport_counter_out,
out, received_errors.packets);
vc->received_errors.octets =
MLX5_GET64(query_vport_counter_out,
out, received_errors.octets);
vc->transmit_errors.packets =
MLX5_GET64(query_vport_counter_out,
out, transmit_errors.packets);
vc->transmit_errors.octets =
MLX5_GET64(query_vport_counter_out,
out, transmit_errors.octets);
vc->received_ib_unicast.packets =
MLX5_GET64(query_vport_counter_out,
out, received_ib_unicast.packets);
vc->received_ib_unicast.octets =
MLX5_GET64(query_vport_counter_out,
out, received_ib_unicast.octets);
vc->transmitted_ib_unicast.packets =
MLX5_GET64(query_vport_counter_out,
out, transmitted_ib_unicast.packets);
vc->transmitted_ib_unicast.octets =
MLX5_GET64(query_vport_counter_out,
out, transmitted_ib_unicast.octets);
vc->received_ib_multicast.packets =
MLX5_GET64(query_vport_counter_out,
out, received_ib_multicast.packets);
vc->received_ib_multicast.octets =
MLX5_GET64(query_vport_counter_out,
out, received_ib_multicast.octets);
vc->transmitted_ib_multicast.packets =
MLX5_GET64(query_vport_counter_out,
out, transmitted_ib_multicast.packets);
vc->transmitted_ib_multicast.octets =
MLX5_GET64(query_vport_counter_out,
out, transmitted_ib_multicast.octets);
vc->received_eth_broadcast.packets =
MLX5_GET64(query_vport_counter_out,
out, received_eth_broadcast.packets);
vc->received_eth_broadcast.octets =
MLX5_GET64(query_vport_counter_out,
out, received_eth_broadcast.octets);
vc->transmitted_eth_broadcast.packets =
MLX5_GET64(query_vport_counter_out,
out, transmitted_eth_broadcast.packets);
vc->transmitted_eth_broadcast.octets =
MLX5_GET64(query_vport_counter_out,
out, transmitted_eth_broadcast.octets);
vc->received_eth_unicast.octets =
MLX5_GET64(query_vport_counter_out,
out, received_eth_unicast.octets);
vc->received_eth_unicast.packets =
MLX5_GET64(query_vport_counter_out,
out, received_eth_unicast.packets);
vc->transmitted_eth_unicast.octets =
MLX5_GET64(query_vport_counter_out,
out, transmitted_eth_unicast.octets);
vc->transmitted_eth_unicast.packets =
MLX5_GET64(query_vport_counter_out,
out, transmitted_eth_unicast.packets);
vc->received_eth_multicast.octets =
MLX5_GET64(query_vport_counter_out,
out, received_eth_multicast.octets);
vc->received_eth_multicast.packets =
MLX5_GET64(query_vport_counter_out,
out, received_eth_multicast.packets);
vc->transmitted_eth_multicast.octets =
MLX5_GET64(query_vport_counter_out,
out, transmitted_eth_multicast.octets);
vc->transmitted_eth_multicast.packets =
MLX5_GET64(query_vport_counter_out,
out, transmitted_eth_multicast.packets);
ex:
kvfree(out);
return err;
}
Index: projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_flow_table.c
===================================================================
--- projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_flow_table.c (revision 301546)
+++ projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_flow_table.c (revision 301547)
@@ -1,867 +1,999 @@
/*-
* Copyright (c) 2015 Mellanox Technologies. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS `AS IS' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $FreeBSD$
*/
#include "en.h"
#include
#include
enum {
MLX5E_FULLMATCH = 0,
MLX5E_ALLMULTI = 1,
MLX5E_PROMISC = 2,
};
enum {
MLX5E_UC = 0,
MLX5E_MC_IPV4 = 1,
MLX5E_MC_IPV6 = 2,
MLX5E_MC_OTHER = 3,
};
enum {
MLX5E_ACTION_NONE = 0,
MLX5E_ACTION_ADD = 1,
MLX5E_ACTION_DEL = 2,
};
struct mlx5e_eth_addr_hash_node {
LIST_ENTRY(mlx5e_eth_addr_hash_node) hlist;
u8 action;
struct mlx5e_eth_addr_info ai;
};
static inline int
mlx5e_hash_eth_addr(const u8 * addr)
{
return (addr[5]);
}
static void
mlx5e_add_eth_addr_to_hash(struct mlx5e_eth_addr_hash_head *hash,
const u8 * addr)
{
struct mlx5e_eth_addr_hash_node *hn;
int ix = mlx5e_hash_eth_addr(addr);
LIST_FOREACH(hn, &hash[ix], hlist) {
if (bcmp(hn->ai.addr, addr, ETHER_ADDR_LEN) == 0) {
if (hn->action == MLX5E_ACTION_DEL)
hn->action = MLX5E_ACTION_NONE;
return;
}
}
hn = malloc(sizeof(*hn), M_MLX5EN, M_NOWAIT | M_ZERO);
if (hn == NULL)
return;
ether_addr_copy(hn->ai.addr, addr);
hn->action = MLX5E_ACTION_ADD;
LIST_INSERT_HEAD(&hash[ix], hn, hlist);
}
static void
mlx5e_del_eth_addr_from_hash(struct mlx5e_eth_addr_hash_node *hn)
{
LIST_REMOVE(hn, hlist);
free(hn, M_MLX5EN);
}
static void
mlx5e_del_eth_addr_from_flow_table(struct mlx5e_priv *priv,
struct mlx5e_eth_addr_info *ai)
{
void *ft = priv->ft.main;
if (ai->tt_vec & (1 << MLX5E_TT_IPV6_TCP))
mlx5_del_flow_table_entry(ft, ai->ft_ix[MLX5E_TT_IPV6_TCP]);
if (ai->tt_vec & (1 << MLX5E_TT_IPV4_TCP))
mlx5_del_flow_table_entry(ft, ai->ft_ix[MLX5E_TT_IPV4_TCP]);
if (ai->tt_vec & (1 << MLX5E_TT_IPV6_UDP))
mlx5_del_flow_table_entry(ft, ai->ft_ix[MLX5E_TT_IPV6_UDP]);
if (ai->tt_vec & (1 << MLX5E_TT_IPV4_UDP))
mlx5_del_flow_table_entry(ft, ai->ft_ix[MLX5E_TT_IPV4_UDP]);
if (ai->tt_vec & (1 << MLX5E_TT_IPV6))
mlx5_del_flow_table_entry(ft, ai->ft_ix[MLX5E_TT_IPV6]);
if (ai->tt_vec & (1 << MLX5E_TT_IPV4))
mlx5_del_flow_table_entry(ft, ai->ft_ix[MLX5E_TT_IPV4]);
if (ai->tt_vec & (1 << MLX5E_TT_ANY))
mlx5_del_flow_table_entry(ft, ai->ft_ix[MLX5E_TT_ANY]);
}
static int
mlx5e_get_eth_addr_type(const u8 * addr)
{
if (ETHER_IS_MULTICAST(addr) == 0)
return (MLX5E_UC);
if ((addr[0] == 0x01) &&
(addr[1] == 0x00) &&
(addr[2] == 0x5e) &&
!(addr[3] & 0x80))
return (MLX5E_MC_IPV4);
if ((addr[0] == 0x33) &&
(addr[1] == 0x33))
return (MLX5E_MC_IPV6);
return (MLX5E_MC_OTHER);
}
static u32
mlx5e_get_tt_vec(struct mlx5e_eth_addr_info *ai, int type)
{
int eth_addr_type;
u32 ret;
switch (type) {
case MLX5E_FULLMATCH:
eth_addr_type = mlx5e_get_eth_addr_type(ai->addr);
switch (eth_addr_type) {
case MLX5E_UC:
ret =
(1 << MLX5E_TT_IPV4_TCP) |
(1 << MLX5E_TT_IPV6_TCP) |
(1 << MLX5E_TT_IPV4_UDP) |
(1 << MLX5E_TT_IPV6_UDP) |
(1 << MLX5E_TT_IPV4) |
(1 << MLX5E_TT_IPV6) |
(1 << MLX5E_TT_ANY) |
0;
break;
case MLX5E_MC_IPV4:
ret =
(1 << MLX5E_TT_IPV4_UDP) |
(1 << MLX5E_TT_IPV4) |
0;
break;
case MLX5E_MC_IPV6:
ret =
(1 << MLX5E_TT_IPV6_UDP) |
(1 << MLX5E_TT_IPV6) |
0;
break;
default:
ret =
(1 << MLX5E_TT_ANY) |
0;
break;
}
break;
case MLX5E_ALLMULTI:
ret =
(1 << MLX5E_TT_IPV4_UDP) |
(1 << MLX5E_TT_IPV6_UDP) |
(1 << MLX5E_TT_IPV4) |
(1 << MLX5E_TT_IPV6) |
(1 << MLX5E_TT_ANY) |
0;
break;
default: /* MLX5E_PROMISC */
ret =
(1 << MLX5E_TT_IPV4_TCP) |
(1 << MLX5E_TT_IPV6_TCP) |
(1 << MLX5E_TT_IPV4_UDP) |
(1 << MLX5E_TT_IPV6_UDP) |
(1 << MLX5E_TT_IPV4) |
(1 << MLX5E_TT_IPV6) |
(1 << MLX5E_TT_ANY) |
0;
break;
}
return (ret);
}
static int
mlx5e_add_eth_addr_rule_sub(struct mlx5e_priv *priv,
struct mlx5e_eth_addr_info *ai, int type,
void *flow_context, void *match_criteria)
{
u8 match_criteria_enable = 0;
void *match_value;
void *dest;
u8 *dmac;
u8 *match_criteria_dmac;
void *ft = priv->ft.main;
u32 *tirn = priv->tirn;
u32 tt_vec;
int err;
match_value = MLX5_ADDR_OF(flow_context, flow_context, match_value);
dmac = MLX5_ADDR_OF(fte_match_param, match_value,
outer_headers.dmac_47_16);
match_criteria_dmac = MLX5_ADDR_OF(fte_match_param, match_criteria,
outer_headers.dmac_47_16);
dest = MLX5_ADDR_OF(flow_context, flow_context, destination);
MLX5_SET(flow_context, flow_context, action,
MLX5_FLOW_CONTEXT_ACTION_FWD_DEST);
MLX5_SET(flow_context, flow_context, destination_list_size, 1);
MLX5_SET(dest_format_struct, dest, destination_type,
MLX5_FLOW_CONTEXT_DEST_TYPE_TIR);
switch (type) {
case MLX5E_FULLMATCH:
match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
memset(match_criteria_dmac, 0xff, ETH_ALEN);
ether_addr_copy(dmac, ai->addr);
break;
case MLX5E_ALLMULTI:
match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
match_criteria_dmac[0] = 0x01;
dmac[0] = 0x01;
break;
case MLX5E_PROMISC:
break;
default:
break;
}
tt_vec = mlx5e_get_tt_vec(ai, type);
if (tt_vec & (1 << MLX5E_TT_ANY)) {
MLX5_SET(dest_format_struct, dest, destination_id,
tirn[MLX5E_TT_ANY]);
err = mlx5_add_flow_table_entry(ft, match_criteria_enable,
match_criteria, flow_context, &ai->ft_ix[MLX5E_TT_ANY]);
if (err) {
mlx5e_del_eth_addr_from_flow_table(priv, ai);
return (err);
}
ai->tt_vec |= (1 << MLX5E_TT_ANY);
}
match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
MLX5_SET_TO_ONES(fte_match_param, match_criteria,
outer_headers.ethertype);
if (tt_vec & (1 << MLX5E_TT_IPV4)) {
MLX5_SET(fte_match_param, match_value, outer_headers.ethertype,
ETHERTYPE_IP);
MLX5_SET(dest_format_struct, dest, destination_id,
tirn[MLX5E_TT_IPV4]);
err = mlx5_add_flow_table_entry(ft, match_criteria_enable,
match_criteria, flow_context, &ai->ft_ix[MLX5E_TT_IPV4]);
if (err) {
mlx5e_del_eth_addr_from_flow_table(priv, ai);
return (err);
}
ai->tt_vec |= (1 << MLX5E_TT_IPV4);
}
if (tt_vec & (1 << MLX5E_TT_IPV6)) {
MLX5_SET(fte_match_param, match_value, outer_headers.ethertype,
ETHERTYPE_IPV6);
MLX5_SET(dest_format_struct, dest, destination_id,
tirn[MLX5E_TT_IPV6]);
err = mlx5_add_flow_table_entry(ft, match_criteria_enable,
match_criteria, flow_context, &ai->ft_ix[MLX5E_TT_IPV6]);
if (err) {
mlx5e_del_eth_addr_from_flow_table(priv, ai);
return (err);
}
ai->tt_vec |= (1 << MLX5E_TT_IPV6);
}
MLX5_SET_TO_ONES(fte_match_param, match_criteria,
outer_headers.ip_protocol);
MLX5_SET(fte_match_param, match_value, outer_headers.ip_protocol,
IPPROTO_UDP);
if (tt_vec & (1 << MLX5E_TT_IPV4_UDP)) {
MLX5_SET(fte_match_param, match_value, outer_headers.ethertype,
ETHERTYPE_IP);
MLX5_SET(dest_format_struct, dest, destination_id,
tirn[MLX5E_TT_IPV4_UDP]);
err = mlx5_add_flow_table_entry(ft, match_criteria_enable,
match_criteria, flow_context, &ai->ft_ix[MLX5E_TT_IPV4_UDP]);
if (err) {
mlx5e_del_eth_addr_from_flow_table(priv, ai);
return (err);
}
ai->tt_vec |= (1 << MLX5E_TT_IPV4_UDP);
}
if (tt_vec & (1 << MLX5E_TT_IPV6_UDP)) {
MLX5_SET(fte_match_param, match_value, outer_headers.ethertype,
ETHERTYPE_IPV6);
MLX5_SET(dest_format_struct, dest, destination_id,
tirn[MLX5E_TT_IPV6_UDP]);
err = mlx5_add_flow_table_entry(ft, match_criteria_enable,
match_criteria, flow_context, &ai->ft_ix[MLX5E_TT_IPV6_UDP]);
if (err) {
mlx5e_del_eth_addr_from_flow_table(priv, ai);
return (err);
}
ai->tt_vec |= (1 << MLX5E_TT_IPV6_UDP);
}
MLX5_SET(fte_match_param, match_value, outer_headers.ip_protocol,
IPPROTO_TCP);
if (tt_vec & (1 << MLX5E_TT_IPV4_TCP)) {
MLX5_SET(fte_match_param, match_value, outer_headers.ethertype,
ETHERTYPE_IP);
MLX5_SET(dest_format_struct, dest, destination_id,
tirn[MLX5E_TT_IPV4_TCP]);
err = mlx5_add_flow_table_entry(ft, match_criteria_enable,
match_criteria, flow_context, &ai->ft_ix[MLX5E_TT_IPV4_TCP]);
if (err) {
mlx5e_del_eth_addr_from_flow_table(priv, ai);
return (err);
}
ai->tt_vec |= (1 << MLX5E_TT_IPV4_TCP);
}
if (tt_vec & (1 << MLX5E_TT_IPV6_TCP)) {
MLX5_SET(fte_match_param, match_value, outer_headers.ethertype,
ETHERTYPE_IPV6);
MLX5_SET(dest_format_struct, dest, destination_id,
tirn[MLX5E_TT_IPV6_TCP]);
err = mlx5_add_flow_table_entry(ft, match_criteria_enable,
match_criteria, flow_context, &ai->ft_ix[MLX5E_TT_IPV6_TCP]);
if (err) {
mlx5e_del_eth_addr_from_flow_table(priv, ai);
return (err);
}
ai->tt_vec |= (1 << MLX5E_TT_IPV6_TCP);
}
return (0);
}
static int
mlx5e_add_eth_addr_rule(struct mlx5e_priv *priv,
struct mlx5e_eth_addr_info *ai, int type)
{
u32 *flow_context;
u32 *match_criteria;
int err;
flow_context = mlx5_vzalloc(MLX5_ST_SZ_BYTES(flow_context) +
MLX5_ST_SZ_BYTES(dest_format_struct));
match_criteria = mlx5_vzalloc(MLX5_ST_SZ_BYTES(fte_match_param));
if (!flow_context || !match_criteria) {
if_printf(priv->ifp, "%s: alloc failed\n", __func__);
err = -ENOMEM;
goto add_eth_addr_rule_out;
}
err = mlx5e_add_eth_addr_rule_sub(priv, ai, type, flow_context,
match_criteria);
if (err)
if_printf(priv->ifp, "%s: failed\n", __func__);
add_eth_addr_rule_out:
kvfree(match_criteria);
kvfree(flow_context);
return (err);
}
+static int mlx5e_vport_context_update_vlans(struct mlx5e_priv *priv)
+{
+ struct ifnet *ifp = priv->ifp;
+ int max_list_size;
+ int list_size;
+ u16 *vlans;
+ int vlan;
+ int err;
+ int i;
+
+ list_size = 0;
+ for_each_set_bit(vlan, priv->vlan.active_vlans, VLAN_N_VID)
+ list_size++;
+
+ max_list_size = 1 << MLX5_CAP_GEN(priv->mdev, log_max_vlan_list);
+
+ if (list_size > max_list_size) {
+ if_printf(ifp,
+ "ifnet vlans list size (%d) > (%d) max vport list size, some vlans will be dropped\n",
+ list_size, max_list_size);
+ list_size = max_list_size;
+ }
+
+ vlans = kcalloc(list_size, sizeof(*vlans), GFP_KERNEL);
+ if (!vlans)
+ return -ENOMEM;
+
+ i = 0;
+ for_each_set_bit(vlan, priv->vlan.active_vlans, VLAN_N_VID) {
+ if (i >= list_size)
+ break;
+ vlans[i++] = vlan;
+ }
+
+ err = mlx5_modify_nic_vport_vlans(priv->mdev, vlans, list_size);
+ if (err)
+ if_printf(ifp, "Failed to modify vport vlans list err(%d)\n",
+ err);
+
+ kfree(vlans);
+ return err;
+}
+
enum mlx5e_vlan_rule_type {
MLX5E_VLAN_RULE_TYPE_UNTAGGED,
MLX5E_VLAN_RULE_TYPE_ANY_VID,
MLX5E_VLAN_RULE_TYPE_MATCH_VID,
};
static int
mlx5e_add_vlan_rule(struct mlx5e_priv *priv,
enum mlx5e_vlan_rule_type rule_type, u16 vid)
{
u8 match_criteria_enable = 0;
u32 *flow_context;
void *match_value;
void *dest;
u32 *match_criteria;
u32 *ft_ix;
int err;
flow_context = mlx5_vzalloc(MLX5_ST_SZ_BYTES(flow_context) +
MLX5_ST_SZ_BYTES(dest_format_struct));
match_criteria = mlx5_vzalloc(MLX5_ST_SZ_BYTES(fte_match_param));
if (!flow_context || !match_criteria) {
if_printf(priv->ifp, "%s: alloc failed\n", __func__);
err = -ENOMEM;
goto add_vlan_rule_out;
}
match_value = MLX5_ADDR_OF(flow_context, flow_context, match_value);
dest = MLX5_ADDR_OF(flow_context, flow_context, destination);
MLX5_SET(flow_context, flow_context, action,
MLX5_FLOW_CONTEXT_ACTION_FWD_DEST);
MLX5_SET(flow_context, flow_context, destination_list_size, 1);
MLX5_SET(dest_format_struct, dest, destination_type,
MLX5_FLOW_CONTEXT_DEST_TYPE_FLOW_TABLE);
MLX5_SET(dest_format_struct, dest, destination_id,
mlx5_get_flow_table_id(priv->ft.main));
match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
MLX5_SET_TO_ONES(fte_match_param, match_criteria,
outer_headers.vlan_tag);
switch (rule_type) {
case MLX5E_VLAN_RULE_TYPE_UNTAGGED:
ft_ix = &priv->vlan.untagged_rule_ft_ix;
break;
case MLX5E_VLAN_RULE_TYPE_ANY_VID:
ft_ix = &priv->vlan.any_vlan_rule_ft_ix;
MLX5_SET(fte_match_param, match_value, outer_headers.vlan_tag,
1);
break;
default: /* MLX5E_VLAN_RULE_TYPE_MATCH_VID */
ft_ix = &priv->vlan.active_vlans_ft_ix[vid];
MLX5_SET(fte_match_param, match_value, outer_headers.vlan_tag,
1);
MLX5_SET_TO_ONES(fte_match_param, match_criteria,
outer_headers.first_vid);
MLX5_SET(fte_match_param, match_value, outer_headers.first_vid,
vid);
+ mlx5e_vport_context_update_vlans(priv);
break;
}
err = mlx5_add_flow_table_entry(priv->ft.vlan, match_criteria_enable,
match_criteria, flow_context, ft_ix);
if (err)
if_printf(priv->ifp, "%s: failed\n", __func__);
add_vlan_rule_out:
kvfree(match_criteria);
kvfree(flow_context);
return (err);
}
static void
mlx5e_del_vlan_rule(struct mlx5e_priv *priv,
enum mlx5e_vlan_rule_type rule_type, u16 vid)
{
switch (rule_type) {
case MLX5E_VLAN_RULE_TYPE_UNTAGGED:
mlx5_del_flow_table_entry(priv->ft.vlan,
priv->vlan.untagged_rule_ft_ix);
break;
case MLX5E_VLAN_RULE_TYPE_ANY_VID:
mlx5_del_flow_table_entry(priv->ft.vlan,
priv->vlan.any_vlan_rule_ft_ix);
break;
case MLX5E_VLAN_RULE_TYPE_MATCH_VID:
mlx5_del_flow_table_entry(priv->ft.vlan,
priv->vlan.active_vlans_ft_ix[vid]);
+ mlx5e_vport_context_update_vlans(priv);
break;
}
}
void
mlx5e_enable_vlan_filter(struct mlx5e_priv *priv)
{
if (priv->vlan.filter_disabled) {
priv->vlan.filter_disabled = false;
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
mlx5e_del_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_ANY_VID,
0);
}
}
void
mlx5e_disable_vlan_filter(struct mlx5e_priv *priv)
{
if (!priv->vlan.filter_disabled) {
priv->vlan.filter_disabled = true;
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
mlx5e_add_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_ANY_VID,
0);
}
}
void
mlx5e_vlan_rx_add_vid(void *arg, struct ifnet *ifp, u16 vid)
{
struct mlx5e_priv *priv = arg;
if (ifp != priv->ifp)
return;
PRIV_LOCK(priv);
set_bit(vid, priv->vlan.active_vlans);
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
mlx5e_add_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_VID, vid);
PRIV_UNLOCK(priv);
}
void
mlx5e_vlan_rx_kill_vid(void *arg, struct ifnet *ifp, u16 vid)
{
struct mlx5e_priv *priv = arg;
if (ifp != priv->ifp)
return;
PRIV_LOCK(priv);
clear_bit(vid, priv->vlan.active_vlans);
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
mlx5e_del_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_VID, vid);
PRIV_UNLOCK(priv);
}
int
mlx5e_add_all_vlan_rules(struct mlx5e_priv *priv)
{
u16 vid;
int err;
for_each_set_bit(vid, priv->vlan.active_vlans, VLAN_N_VID) {
err = mlx5e_add_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_VID,
vid);
if (err)
return (err);
}
err = mlx5e_add_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_UNTAGGED, 0);
if (err)
return (err);
if (priv->vlan.filter_disabled) {
err = mlx5e_add_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_ANY_VID,
0);
if (err)
return (err);
}
return (0);
}
void
mlx5e_del_all_vlan_rules(struct mlx5e_priv *priv)
{
u16 vid;
if (priv->vlan.filter_disabled)
mlx5e_del_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_ANY_VID, 0);
mlx5e_del_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_UNTAGGED, 0);
for_each_set_bit(vid, priv->vlan.active_vlans, VLAN_N_VID)
mlx5e_del_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_VID, vid);
}
#define mlx5e_for_each_hash_node(hn, tmp, hash, i) \
for (i = 0; i < MLX5E_ETH_ADDR_HASH_SIZE; i++) \
LIST_FOREACH_SAFE(hn, &(hash)[i], hlist, tmp)
static void
mlx5e_execute_action(struct mlx5e_priv *priv,
struct mlx5e_eth_addr_hash_node *hn)
{
switch (hn->action) {
case MLX5E_ACTION_ADD:
mlx5e_add_eth_addr_rule(priv, &hn->ai, MLX5E_FULLMATCH);
hn->action = MLX5E_ACTION_NONE;
break;
case MLX5E_ACTION_DEL:
mlx5e_del_eth_addr_from_flow_table(priv, &hn->ai);
mlx5e_del_eth_addr_from_hash(hn);
break;
default:
break;
}
}
static void
mlx5e_sync_ifp_addr(struct mlx5e_priv *priv)
{
struct ifnet *ifp = priv->ifp;
struct ifaddr *ifa;
struct ifmultiaddr *ifma;
/* XXX adding this entry might not be needed */
mlx5e_add_eth_addr_to_hash(priv->eth_addr.if_uc,
LLADDR((struct sockaddr_dl *)(ifp->if_addr->ifa_addr)));
if_addr_rlock(ifp);
TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
if (ifa->ifa_addr->sa_family != AF_LINK)
continue;
mlx5e_add_eth_addr_to_hash(priv->eth_addr.if_uc,
LLADDR((struct sockaddr_dl *)ifa->ifa_addr));
}
if_addr_runlock(ifp);
if_maddr_rlock(ifp);
TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) {
if (ifma->ifma_addr->sa_family != AF_LINK)
continue;
mlx5e_add_eth_addr_to_hash(priv->eth_addr.if_mc,
LLADDR((struct sockaddr_dl *)ifma->ifma_addr));
}
if_maddr_runlock(ifp);
}
+static void mlx5e_fill_addr_array(struct mlx5e_priv *priv, int list_type,
+ u8 addr_array[][ETH_ALEN], int size)
+{
+ bool is_uc = (list_type == MLX5_NIC_VPORT_LIST_TYPE_UC);
+ struct ifnet *ifp = priv->ifp;
+ struct mlx5e_eth_addr_hash_node *hn;
+ struct mlx5e_eth_addr_hash_head *addr_list;
+ struct mlx5e_eth_addr_hash_node *tmp;
+ int i = 0;
+ int hi;
+
+ addr_list = is_uc ? priv->eth_addr.if_uc : priv->eth_addr.if_mc;
+
+ if (is_uc) /* Make sure our own address is pushed first */
+ ether_addr_copy(addr_array[i++], IF_LLADDR(ifp));
+ else if (priv->eth_addr.broadcast_enabled)
+ ether_addr_copy(addr_array[i++], ifp->if_broadcastaddr);
+
+ mlx5e_for_each_hash_node(hn, tmp, addr_list, hi) {
+ if (ether_addr_equal(IF_LLADDR(ifp), hn->ai.addr))
+ continue;
+ if (i >= size)
+ break;
+ ether_addr_copy(addr_array[i++], hn->ai.addr);
+ }
+}
+
+static void mlx5e_vport_context_update_addr_list(struct mlx5e_priv *priv,
+ int list_type)
+{
+ bool is_uc = (list_type == MLX5_NIC_VPORT_LIST_TYPE_UC);
+ struct mlx5e_eth_addr_hash_node *hn;
+ u8 (*addr_array)[ETH_ALEN] = NULL;
+ struct mlx5e_eth_addr_hash_head *addr_list;
+ struct mlx5e_eth_addr_hash_node *tmp;
+ int max_size;
+ int size;
+ int err;
+ int hi;
+
+ size = is_uc ? 0 : (priv->eth_addr.broadcast_enabled ? 1 : 0);
+ max_size = is_uc ?
+ 1 << MLX5_CAP_GEN(priv->mdev, log_max_current_uc_list) :
+ 1 << MLX5_CAP_GEN(priv->mdev, log_max_current_mc_list);
+
+ addr_list = is_uc ? priv->eth_addr.if_uc : priv->eth_addr.if_mc;
+ mlx5e_for_each_hash_node(hn, tmp, addr_list, hi)
+ size++;
+
+ if (size > max_size) {
+ if_printf(priv->ifp,
+ "ifp %s list size (%d) > (%d) max vport list size, some addresses will be dropped\n",
+ is_uc ? "UC" : "MC", size, max_size);
+ size = max_size;
+ }
+
+ if (size) {
+ addr_array = kcalloc(size, ETH_ALEN, GFP_KERNEL);
+ if (!addr_array) {
+ err = -ENOMEM;
+ goto out;
+ }
+ mlx5e_fill_addr_array(priv, list_type, addr_array, size);
+ }
+
+ err = mlx5_modify_nic_vport_mac_list(priv->mdev, list_type, addr_array, size);
+out:
+ if (err)
+ if_printf(priv->ifp,
+ "Failed to modify vport %s list err(%d)\n",
+ is_uc ? "UC" : "MC", err);
+ kfree(addr_array);
+}
+
+static void mlx5e_vport_context_update(struct mlx5e_priv *priv)
+{
+ struct mlx5e_eth_addr_db *ea = &priv->eth_addr;
+
+ mlx5e_vport_context_update_addr_list(priv, MLX5_NIC_VPORT_LIST_TYPE_UC);
+ mlx5e_vport_context_update_addr_list(priv, MLX5_NIC_VPORT_LIST_TYPE_MC);
+ mlx5_modify_nic_vport_promisc(priv->mdev, 0,
+ ea->allmulti_enabled,
+ ea->promisc_enabled);
+}
+
static void
mlx5e_apply_ifp_addr(struct mlx5e_priv *priv)
{
struct mlx5e_eth_addr_hash_node *hn;
struct mlx5e_eth_addr_hash_node *tmp;
int i;
mlx5e_for_each_hash_node(hn, tmp, priv->eth_addr.if_uc, i)
mlx5e_execute_action(priv, hn);
mlx5e_for_each_hash_node(hn, tmp, priv->eth_addr.if_mc, i)
mlx5e_execute_action(priv, hn);
}
static void
mlx5e_handle_ifp_addr(struct mlx5e_priv *priv)
{
struct mlx5e_eth_addr_hash_node *hn;
struct mlx5e_eth_addr_hash_node *tmp;
int i;
mlx5e_for_each_hash_node(hn, tmp, priv->eth_addr.if_uc, i)
hn->action = MLX5E_ACTION_DEL;
mlx5e_for_each_hash_node(hn, tmp, priv->eth_addr.if_mc, i)
hn->action = MLX5E_ACTION_DEL;
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
mlx5e_sync_ifp_addr(priv);
mlx5e_apply_ifp_addr(priv);
}
void
mlx5e_set_rx_mode_core(struct mlx5e_priv *priv)
{
struct mlx5e_eth_addr_db *ea = &priv->eth_addr;
struct ifnet *ndev = priv->ifp;
bool rx_mode_enable = test_bit(MLX5E_STATE_OPENED, &priv->state);
bool promisc_enabled = rx_mode_enable && (ndev->if_flags & IFF_PROMISC);
bool allmulti_enabled = rx_mode_enable && (ndev->if_flags & IFF_ALLMULTI);
bool broadcast_enabled = rx_mode_enable;
bool enable_promisc = !ea->promisc_enabled && promisc_enabled;
bool disable_promisc = ea->promisc_enabled && !promisc_enabled;
bool enable_allmulti = !ea->allmulti_enabled && allmulti_enabled;
bool disable_allmulti = ea->allmulti_enabled && !allmulti_enabled;
bool enable_broadcast = !ea->broadcast_enabled && broadcast_enabled;
bool disable_broadcast = ea->broadcast_enabled && !broadcast_enabled;
/* update broadcast address */
ether_addr_copy(priv->eth_addr.broadcast.addr,
priv->ifp->if_broadcastaddr);
if (enable_promisc)
mlx5e_add_eth_addr_rule(priv, &ea->promisc, MLX5E_PROMISC);
if (enable_allmulti)
mlx5e_add_eth_addr_rule(priv, &ea->allmulti, MLX5E_ALLMULTI);
if (enable_broadcast)
mlx5e_add_eth_addr_rule(priv, &ea->broadcast, MLX5E_FULLMATCH);
mlx5e_handle_ifp_addr(priv);
if (disable_broadcast)
mlx5e_del_eth_addr_from_flow_table(priv, &ea->broadcast);
if (disable_allmulti)
mlx5e_del_eth_addr_from_flow_table(priv, &ea->allmulti);
if (disable_promisc)
mlx5e_del_eth_addr_from_flow_table(priv, &ea->promisc);
ea->promisc_enabled = promisc_enabled;
ea->allmulti_enabled = allmulti_enabled;
ea->broadcast_enabled = broadcast_enabled;
+
+ mlx5e_vport_context_update(priv);
}
void
mlx5e_set_rx_mode_work(struct work_struct *work)
{
struct mlx5e_priv *priv =
container_of(work, struct mlx5e_priv, set_rx_mode_work);
PRIV_LOCK(priv);
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
mlx5e_set_rx_mode_core(priv);
PRIV_UNLOCK(priv);
}
static int
mlx5e_create_main_flow_table(struct mlx5e_priv *priv)
{
struct mlx5_flow_table_group *g;
u8 *dmac;
g = malloc(9 * sizeof(*g), M_MLX5EN, M_WAITOK | M_ZERO);
if (g == NULL)
return (-ENOMEM);
g[0].log_sz = 2;
g[0].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
MLX5_SET_TO_ONES(fte_match_param, g[0].match_criteria,
outer_headers.ethertype);
MLX5_SET_TO_ONES(fte_match_param, g[0].match_criteria,
outer_headers.ip_protocol);
g[1].log_sz = 1;
g[1].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
MLX5_SET_TO_ONES(fte_match_param, g[1].match_criteria,
outer_headers.ethertype);
g[2].log_sz = 0;
g[3].log_sz = 14;
g[3].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
dmac = MLX5_ADDR_OF(fte_match_param, g[3].match_criteria,
outer_headers.dmac_47_16);
memset(dmac, 0xff, ETH_ALEN);
MLX5_SET_TO_ONES(fte_match_param, g[3].match_criteria,
outer_headers.ethertype);
MLX5_SET_TO_ONES(fte_match_param, g[3].match_criteria,
outer_headers.ip_protocol);
g[4].log_sz = 13;
g[4].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
dmac = MLX5_ADDR_OF(fte_match_param, g[4].match_criteria,
outer_headers.dmac_47_16);
memset(dmac, 0xff, ETH_ALEN);
MLX5_SET_TO_ONES(fte_match_param, g[4].match_criteria,
outer_headers.ethertype);
g[5].log_sz = 11;
g[5].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
dmac = MLX5_ADDR_OF(fte_match_param, g[5].match_criteria,
outer_headers.dmac_47_16);
memset(dmac, 0xff, ETH_ALEN);
g[6].log_sz = 2;
g[6].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
dmac = MLX5_ADDR_OF(fte_match_param, g[6].match_criteria,
outer_headers.dmac_47_16);
dmac[0] = 0x01;
MLX5_SET_TO_ONES(fte_match_param, g[6].match_criteria,
outer_headers.ethertype);
MLX5_SET_TO_ONES(fte_match_param, g[6].match_criteria,
outer_headers.ip_protocol);
g[7].log_sz = 1;
g[7].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
dmac = MLX5_ADDR_OF(fte_match_param, g[7].match_criteria,
outer_headers.dmac_47_16);
dmac[0] = 0x01;
MLX5_SET_TO_ONES(fte_match_param, g[7].match_criteria,
outer_headers.ethertype);
g[8].log_sz = 0;
g[8].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
dmac = MLX5_ADDR_OF(fte_match_param, g[8].match_criteria,
outer_headers.dmac_47_16);
dmac[0] = 0x01;
priv->ft.main = mlx5_create_flow_table(priv->mdev, 1,
MLX5_FLOW_TABLE_TYPE_NIC_RCV,
0, 9, g);
free(g, M_MLX5EN);
return (priv->ft.main ? 0 : -ENOMEM);
}
static void
mlx5e_destroy_main_flow_table(struct mlx5e_priv *priv)
{
mlx5_destroy_flow_table(priv->ft.main);
priv->ft.main = NULL;
}
static int
mlx5e_create_vlan_flow_table(struct mlx5e_priv *priv)
{
struct mlx5_flow_table_group *g;
g = malloc(2 * sizeof(*g), M_MLX5EN, M_WAITOK | M_ZERO);
if (g == NULL)
return (-ENOMEM);
g[0].log_sz = 12;
g[0].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
MLX5_SET_TO_ONES(fte_match_param, g[0].match_criteria,
outer_headers.vlan_tag);
MLX5_SET_TO_ONES(fte_match_param, g[0].match_criteria,
outer_headers.first_vid);
/* untagged + any vlan id */
g[1].log_sz = 1;
g[1].match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
MLX5_SET_TO_ONES(fte_match_param, g[1].match_criteria,
outer_headers.vlan_tag);
priv->ft.vlan = mlx5_create_flow_table(priv->mdev, 0,
MLX5_FLOW_TABLE_TYPE_NIC_RCV,
0, 2, g);
free(g, M_MLX5EN);
return (priv->ft.vlan ? 0 : -ENOMEM);
}
static void
mlx5e_destroy_vlan_flow_table(struct mlx5e_priv *priv)
{
mlx5_destroy_flow_table(priv->ft.vlan);
priv->ft.vlan = NULL;
}
int
mlx5e_open_flow_table(struct mlx5e_priv *priv)
{
int err;
err = mlx5e_create_main_flow_table(priv);
if (err)
return (err);
err = mlx5e_create_vlan_flow_table(priv);
if (err)
goto err_destroy_main_flow_table;
return (0);
err_destroy_main_flow_table:
mlx5e_destroy_main_flow_table(priv);
return (err);
}
void
mlx5e_close_flow_table(struct mlx5e_priv *priv)
{
mlx5e_destroy_vlan_flow_table(priv);
mlx5e_destroy_main_flow_table(priv);
}
Index: projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_main.c
===================================================================
--- projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_main.c (revision 301546)
+++ projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_main.c (revision 301547)
@@ -1,3183 +1,3190 @@
/*-
* Copyright (c) 2015 Mellanox Technologies. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS `AS IS' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $FreeBSD$
*/
#include "en.h"
#include
#include
#define ETH_DRIVER_VERSION "3.1.0-dev"
char mlx5e_version[] = "Mellanox Ethernet driver"
" (" ETH_DRIVER_VERSION ")";
struct mlx5e_rq_param {
u32 rqc [MLX5_ST_SZ_DW(rqc)];
struct mlx5_wq_param wq;
};
struct mlx5e_sq_param {
u32 sqc [MLX5_ST_SZ_DW(sqc)];
struct mlx5_wq_param wq;
};
struct mlx5e_cq_param {
u32 cqc [MLX5_ST_SZ_DW(cqc)];
struct mlx5_wq_param wq;
u16 eq_ix;
};
struct mlx5e_channel_param {
struct mlx5e_rq_param rq;
struct mlx5e_sq_param sq;
struct mlx5e_cq_param rx_cq;
struct mlx5e_cq_param tx_cq;
};
static const struct {
u32 subtype;
u64 baudrate;
} mlx5e_mode_table[MLX5E_LINK_MODES_NUMBER] = {
[MLX5E_1000BASE_CX_SGMII] = {
.subtype = IFM_1000_CX_SGMII,
.baudrate = IF_Mbps(1000ULL),
},
[MLX5E_1000BASE_KX] = {
.subtype = IFM_1000_KX,
.baudrate = IF_Mbps(1000ULL),
},
[MLX5E_10GBASE_CX4] = {
.subtype = IFM_10G_CX4,
.baudrate = IF_Gbps(10ULL),
},
[MLX5E_10GBASE_KX4] = {
.subtype = IFM_10G_KX4,
.baudrate = IF_Gbps(10ULL),
},
[MLX5E_10GBASE_KR] = {
.subtype = IFM_10G_KR,
.baudrate = IF_Gbps(10ULL),
},
[MLX5E_20GBASE_KR2] = {
.subtype = IFM_20G_KR2,
.baudrate = IF_Gbps(20ULL),
},
[MLX5E_40GBASE_CR4] = {
.subtype = IFM_40G_CR4,
.baudrate = IF_Gbps(40ULL),
},
[MLX5E_40GBASE_KR4] = {
.subtype = IFM_40G_KR4,
.baudrate = IF_Gbps(40ULL),
},
[MLX5E_56GBASE_R4] = {
.subtype = IFM_56G_R4,
.baudrate = IF_Gbps(56ULL),
},
[MLX5E_10GBASE_CR] = {
.subtype = IFM_10G_CR1,
.baudrate = IF_Gbps(10ULL),
},
[MLX5E_10GBASE_SR] = {
.subtype = IFM_10G_SR,
.baudrate = IF_Gbps(10ULL),
},
[MLX5E_10GBASE_LR] = {
.subtype = IFM_10G_LR,
.baudrate = IF_Gbps(10ULL),
},
[MLX5E_40GBASE_SR4] = {
.subtype = IFM_40G_SR4,
.baudrate = IF_Gbps(40ULL),
},
[MLX5E_40GBASE_LR4] = {
.subtype = IFM_40G_LR4,
.baudrate = IF_Gbps(40ULL),
},
[MLX5E_100GBASE_CR4] = {
.subtype = IFM_100G_CR4,
.baudrate = IF_Gbps(100ULL),
},
[MLX5E_100GBASE_SR4] = {
.subtype = IFM_100G_SR4,
.baudrate = IF_Gbps(100ULL),
},
[MLX5E_100GBASE_KR4] = {
.subtype = IFM_100G_KR4,
.baudrate = IF_Gbps(100ULL),
},
[MLX5E_100GBASE_LR4] = {
.subtype = IFM_100G_LR4,
.baudrate = IF_Gbps(100ULL),
},
[MLX5E_100BASE_TX] = {
.subtype = IFM_100_TX,
.baudrate = IF_Mbps(100ULL),
},
[MLX5E_100BASE_T] = {
.subtype = IFM_100_T,
.baudrate = IF_Mbps(100ULL),
},
[MLX5E_10GBASE_T] = {
.subtype = IFM_10G_T,
.baudrate = IF_Gbps(10ULL),
},
[MLX5E_25GBASE_CR] = {
.subtype = IFM_25G_CR,
.baudrate = IF_Gbps(25ULL),
},
[MLX5E_25GBASE_KR] = {
.subtype = IFM_25G_KR,
.baudrate = IF_Gbps(25ULL),
},
[MLX5E_25GBASE_SR] = {
.subtype = IFM_25G_SR,
.baudrate = IF_Gbps(25ULL),
},
[MLX5E_50GBASE_CR2] = {
.subtype = IFM_50G_CR2,
.baudrate = IF_Gbps(50ULL),
},
[MLX5E_50GBASE_KR2] = {
.subtype = IFM_50G_KR2,
.baudrate = IF_Gbps(50ULL),
},
};
MALLOC_DEFINE(M_MLX5EN, "MLX5EN", "MLX5 Ethernet");
static void
mlx5e_update_carrier(struct mlx5e_priv *priv)
{
struct mlx5_core_dev *mdev = priv->mdev;
u32 out[MLX5_ST_SZ_DW(ptys_reg)];
u32 eth_proto_oper;
int error;
u8 port_state;
u8 i;
port_state = mlx5_query_vport_state(mdev,
MLX5_QUERY_VPORT_STATE_IN_OP_MOD_VNIC_VPORT);
if (port_state == VPORT_STATE_UP) {
priv->media_status_last |= IFM_ACTIVE;
} else {
priv->media_status_last &= ~IFM_ACTIVE;
priv->media_active_last = IFM_ETHER;
if_link_state_change(priv->ifp, LINK_STATE_DOWN);
return;
}
error = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_EN);
if (error) {
priv->media_active_last = IFM_ETHER;
priv->ifp->if_baudrate = 1;
if_printf(priv->ifp, "%s: query port ptys failed: 0x%x\n",
__func__, error);
return;
}
eth_proto_oper = MLX5_GET(ptys_reg, out, eth_proto_oper);
for (i = 0; i != MLX5E_LINK_MODES_NUMBER; i++) {
if (mlx5e_mode_table[i].baudrate == 0)
continue;
if (MLX5E_PROT_MASK(i) & eth_proto_oper) {
priv->ifp->if_baudrate =
mlx5e_mode_table[i].baudrate;
priv->media_active_last =
mlx5e_mode_table[i].subtype | IFM_ETHER | IFM_FDX;
}
}
if_link_state_change(priv->ifp, LINK_STATE_UP);
}
static void
mlx5e_media_status(struct ifnet *dev, struct ifmediareq *ifmr)
{
struct mlx5e_priv *priv = dev->if_softc;
ifmr->ifm_status = priv->media_status_last;
ifmr->ifm_active = priv->media_active_last |
(priv->params.rx_pauseframe_control ? IFM_ETH_RXPAUSE : 0) |
(priv->params.tx_pauseframe_control ? IFM_ETH_TXPAUSE : 0);
}
static u32
mlx5e_find_link_mode(u32 subtype)
{
u32 i;
u32 link_mode = 0;
for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i) {
if (mlx5e_mode_table[i].baudrate == 0)
continue;
if (mlx5e_mode_table[i].subtype == subtype)
link_mode |= MLX5E_PROT_MASK(i);
}
return (link_mode);
}
static int
mlx5e_media_change(struct ifnet *dev)
{
struct mlx5e_priv *priv = dev->if_softc;
struct mlx5_core_dev *mdev = priv->mdev;
u32 eth_proto_cap;
u32 link_mode;
int was_opened;
int locked;
int error;
locked = PRIV_LOCKED(priv);
if (!locked)
PRIV_LOCK(priv);
if (IFM_TYPE(priv->media.ifm_media) != IFM_ETHER) {
error = EINVAL;
goto done;
}
link_mode = mlx5e_find_link_mode(IFM_SUBTYPE(priv->media.ifm_media));
/* query supported capabilities */
error = mlx5_query_port_proto_cap(mdev, &eth_proto_cap, MLX5_PTYS_EN);
if (error != 0) {
if_printf(dev, "Query port media capability failed\n");
goto done;
}
/* check for autoselect */
if (IFM_SUBTYPE(priv->media.ifm_media) == IFM_AUTO) {
link_mode = eth_proto_cap;
if (link_mode == 0) {
if_printf(dev, "Port media capability is zero\n");
error = EINVAL;
goto done;
}
} else {
link_mode = link_mode & eth_proto_cap;
if (link_mode == 0) {
if_printf(dev, "Not supported link mode requested\n");
error = EINVAL;
goto done;
}
}
/* update pauseframe control bits */
priv->params.rx_pauseframe_control =
(priv->media.ifm_media & IFM_ETH_RXPAUSE) ? 1 : 0;
priv->params.tx_pauseframe_control =
(priv->media.ifm_media & IFM_ETH_TXPAUSE) ? 1 : 0;
/* check if device is opened */
was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
/* reconfigure the hardware */
mlx5_set_port_status(mdev, MLX5_PORT_DOWN);
mlx5_set_port_proto(mdev, link_mode, MLX5_PTYS_EN);
mlx5_set_port_pause(mdev, 1,
priv->params.rx_pauseframe_control,
priv->params.tx_pauseframe_control);
if (was_opened)
mlx5_set_port_status(mdev, MLX5_PORT_UP);
done:
if (!locked)
PRIV_UNLOCK(priv);
return (error);
}
static void
mlx5e_update_carrier_work(struct work_struct *work)
{
struct mlx5e_priv *priv = container_of(work, struct mlx5e_priv,
update_carrier_work);
PRIV_LOCK(priv);
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
mlx5e_update_carrier(priv);
PRIV_UNLOCK(priv);
}
static void
mlx5e_update_pport_counters(struct mlx5e_priv *priv)
{
struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5e_pport_stats *s = &priv->stats.pport;
struct mlx5e_port_stats_debug *s_debug = &priv->stats.port_stats_debug;
u32 *in;
u32 *out;
u64 *ptr;
unsigned sz = MLX5_ST_SZ_BYTES(ppcnt_reg);
unsigned x;
unsigned y;
in = mlx5_vzalloc(sz);
out = mlx5_vzalloc(sz);
if (in == NULL || out == NULL)
goto free_out;
ptr = (uint64_t *)MLX5_ADDR_OF(ppcnt_reg, out, counter_set);
MLX5_SET(ppcnt_reg, in, local_port, 1);
MLX5_SET(ppcnt_reg, in, grp, MLX5_IEEE_802_3_COUNTERS_GROUP);
mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPCNT, 0, 0);
for (x = y = 0; x != MLX5E_PPORT_IEEE802_3_STATS_NUM; x++, y++)
s->arg[y] = be64toh(ptr[x]);
MLX5_SET(ppcnt_reg, in, grp, MLX5_RFC_2819_COUNTERS_GROUP);
mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPCNT, 0, 0);
for (x = 0; x != MLX5E_PPORT_RFC2819_STATS_NUM; x++, y++)
s->arg[y] = be64toh(ptr[x]);
for (y = 0; x != MLX5E_PPORT_RFC2819_STATS_NUM +
MLX5E_PPORT_RFC2819_STATS_DEBUG_NUM; x++, y++)
s_debug->arg[y] = be64toh(ptr[x]);
MLX5_SET(ppcnt_reg, in, grp, MLX5_RFC_2863_COUNTERS_GROUP);
mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPCNT, 0, 0);
for (x = 0; x != MLX5E_PPORT_RFC2863_STATS_DEBUG_NUM; x++, y++)
s_debug->arg[y] = be64toh(ptr[x]);
MLX5_SET(ppcnt_reg, in, grp, MLX5_PHYSICAL_LAYER_COUNTERS_GROUP);
mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPCNT, 0, 0);
for (x = 0; x != MLX5E_PPORT_PHYSICAL_LAYER_STATS_DEBUG_NUM; x++, y++)
s_debug->arg[y] = be64toh(ptr[x]);
free_out:
kvfree(in);
kvfree(out);
}
static void
mlx5e_update_stats_work(struct work_struct *work)
{
struct mlx5e_priv *priv = container_of(work, struct mlx5e_priv,
update_stats_work);
struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5e_vport_stats *s = &priv->stats.vport;
struct mlx5e_rq_stats *rq_stats;
struct mlx5e_sq_stats *sq_stats;
struct buf_ring *sq_br;
#if (__FreeBSD_version < 1100000)
struct ifnet *ifp = priv->ifp;
#endif
u32 in[MLX5_ST_SZ_DW(query_vport_counter_in)];
u32 *out;
int outlen = MLX5_ST_SZ_BYTES(query_vport_counter_out);
u64 tso_packets = 0;
u64 tso_bytes = 0;
u64 tx_queue_dropped = 0;
u64 tx_defragged = 0;
u64 tx_offload_none = 0;
u64 lro_packets = 0;
u64 lro_bytes = 0;
u64 sw_lro_queued = 0;
u64 sw_lro_flushed = 0;
u64 rx_csum_none = 0;
u64 rx_wqe_err = 0;
u32 rx_out_of_buffer = 0;
int i;
int j;
PRIV_LOCK(priv);
out = mlx5_vzalloc(outlen);
if (out == NULL)
goto free_out;
if (test_bit(MLX5E_STATE_OPENED, &priv->state) == 0)
goto free_out;
/* Collect the SW counters first and then the HW counters, for consistency */
for (i = 0; i < priv->params.num_channels; i++) {
struct mlx5e_rq *rq = &priv->channel[i]->rq;
rq_stats = &priv->channel[i]->rq.stats;
/* collect stats from LRO */
rq_stats->sw_lro_queued = rq->lro.lro_queued;
rq_stats->sw_lro_flushed = rq->lro.lro_flushed;
sw_lro_queued += rq_stats->sw_lro_queued;
sw_lro_flushed += rq_stats->sw_lro_flushed;
lro_packets += rq_stats->lro_packets;
lro_bytes += rq_stats->lro_bytes;
rx_csum_none += rq_stats->csum_none;
rx_wqe_err += rq_stats->wqe_err;
for (j = 0; j < priv->num_tc; j++) {
sq_stats = &priv->channel[i]->sq[j].stats;
sq_br = priv->channel[i]->sq[j].br;
tso_packets += sq_stats->tso_packets;
tso_bytes += sq_stats->tso_bytes;
tx_queue_dropped += sq_stats->dropped;
tx_queue_dropped += sq_br->br_drops;
tx_defragged += sq_stats->defragged;
tx_offload_none += sq_stats->csum_offload_none;
}
}
/* update counters */
s->tso_packets = tso_packets;
s->tso_bytes = tso_bytes;
s->tx_queue_dropped = tx_queue_dropped;
s->tx_defragged = tx_defragged;
s->lro_packets = lro_packets;
s->lro_bytes = lro_bytes;
s->sw_lro_queued = sw_lro_queued;
s->sw_lro_flushed = sw_lro_flushed;
s->rx_csum_none = rx_csum_none;
s->rx_wqe_err = rx_wqe_err;
/* HW counters */
memset(in, 0, sizeof(in));
MLX5_SET(query_vport_counter_in, in, opcode,
MLX5_CMD_OP_QUERY_VPORT_COUNTER);
MLX5_SET(query_vport_counter_in, in, op_mod, 0);
MLX5_SET(query_vport_counter_in, in, other_vport, 0);
memset(out, 0, outlen);
/* get number of out-of-buffer drops first */
if (mlx5_vport_query_out_of_rx_buffer(mdev, priv->counter_set_id,
&rx_out_of_buffer))
goto free_out;
/* accumulate difference into a 64-bit counter */
s->rx_out_of_buffer += (u64)(u32)(rx_out_of_buffer - s->rx_out_of_buffer_prev);
s->rx_out_of_buffer_prev = rx_out_of_buffer;
/* get port statistics */
if (mlx5_cmd_exec(mdev, in, sizeof(in), out, outlen))
goto free_out;
#define MLX5_GET_CTR(out, x) \
MLX5_GET64(query_vport_counter_out, out, x)
s->rx_error_packets =
MLX5_GET_CTR(out, received_errors.packets);
s->rx_error_bytes =
MLX5_GET_CTR(out, received_errors.octets);
s->tx_error_packets =
MLX5_GET_CTR(out, transmit_errors.packets);
s->tx_error_bytes =
MLX5_GET_CTR(out, transmit_errors.octets);
s->rx_unicast_packets =
MLX5_GET_CTR(out, received_eth_unicast.packets);
s->rx_unicast_bytes =
MLX5_GET_CTR(out, received_eth_unicast.octets);
s->tx_unicast_packets =
MLX5_GET_CTR(out, transmitted_eth_unicast.packets);
s->tx_unicast_bytes =
MLX5_GET_CTR(out, transmitted_eth_unicast.octets);
s->rx_multicast_packets =
MLX5_GET_CTR(out, received_eth_multicast.packets);
s->rx_multicast_bytes =
MLX5_GET_CTR(out, received_eth_multicast.octets);
s->tx_multicast_packets =
MLX5_GET_CTR(out, transmitted_eth_multicast.packets);
s->tx_multicast_bytes =
MLX5_GET_CTR(out, transmitted_eth_multicast.octets);
s->rx_broadcast_packets =
MLX5_GET_CTR(out, received_eth_broadcast.packets);
s->rx_broadcast_bytes =
MLX5_GET_CTR(out, received_eth_broadcast.octets);
s->tx_broadcast_packets =
MLX5_GET_CTR(out, transmitted_eth_broadcast.packets);
s->tx_broadcast_bytes =
MLX5_GET_CTR(out, transmitted_eth_broadcast.octets);
s->rx_packets =
s->rx_unicast_packets +
s->rx_multicast_packets +
s->rx_broadcast_packets -
s->rx_out_of_buffer;
s->rx_bytes =
s->rx_unicast_bytes +
s->rx_multicast_bytes +
s->rx_broadcast_bytes;
s->tx_packets =
s->tx_unicast_packets +
s->tx_multicast_packets +
s->tx_broadcast_packets;
s->tx_bytes =
s->tx_unicast_bytes +
s->tx_multicast_bytes +
s->tx_broadcast_bytes;
/* Update calculated offload counters */
s->tx_csum_offload = s->tx_packets - tx_offload_none;
s->rx_csum_good = s->rx_packets - s->rx_csum_none;
/* Update per port counters */
mlx5e_update_pport_counters(priv);
#if (__FreeBSD_version < 1100000)
/* no get_counters interface in fbsd 10 */
ifp->if_ipackets = s->rx_packets;
ifp->if_ierrors = s->rx_error_packets;
ifp->if_iqdrops = s->rx_out_of_buffer;
ifp->if_opackets = s->tx_packets;
ifp->if_oerrors = s->tx_error_packets;
ifp->if_snd.ifq_drops = s->tx_queue_dropped;
ifp->if_ibytes = s->rx_bytes;
ifp->if_obytes = s->tx_bytes;
#endif
free_out:
kvfree(out);
PRIV_UNLOCK(priv);
}
static void
mlx5e_update_stats(void *arg)
{
struct mlx5e_priv *priv = arg;
schedule_work(&priv->update_stats_work);
callout_reset(&priv->watchdog, hz, &mlx5e_update_stats, priv);
}
static void
mlx5e_async_event_sub(struct mlx5e_priv *priv,
enum mlx5_dev_event event)
{
switch (event) {
case MLX5_DEV_EVENT_PORT_UP:
case MLX5_DEV_EVENT_PORT_DOWN:
schedule_work(&priv->update_carrier_work);
break;
default:
break;
}
}
static void
mlx5e_async_event(struct mlx5_core_dev *mdev, void *vpriv,
enum mlx5_dev_event event, unsigned long param)
{
struct mlx5e_priv *priv = vpriv;
mtx_lock(&priv->async_events_mtx);
if (test_bit(MLX5E_STATE_ASYNC_EVENTS_ENABLE, &priv->state))
mlx5e_async_event_sub(priv, event);
mtx_unlock(&priv->async_events_mtx);
}
static void
mlx5e_enable_async_events(struct mlx5e_priv *priv)
{
set_bit(MLX5E_STATE_ASYNC_EVENTS_ENABLE, &priv->state);
}
static void
mlx5e_disable_async_events(struct mlx5e_priv *priv)
{
mtx_lock(&priv->async_events_mtx);
clear_bit(MLX5E_STATE_ASYNC_EVENTS_ENABLE, &priv->state);
mtx_unlock(&priv->async_events_mtx);
}
static const char *mlx5e_rq_stats_desc[] = {
MLX5E_RQ_STATS(MLX5E_STATS_DESC)
};
static int
mlx5e_create_rq(struct mlx5e_channel *c,
struct mlx5e_rq_param *param,
struct mlx5e_rq *rq)
{
struct mlx5e_priv *priv = c->priv;
struct mlx5_core_dev *mdev = priv->mdev;
char buffer[16];
void *rqc = param->rqc;
void *rqc_wq = MLX5_ADDR_OF(rqc, rqc, wq);
int wq_sz;
int err;
int i;
/* Create DMA descriptor TAG */
if ((err = -bus_dma_tag_create(
bus_get_dma_tag(mdev->pdev->dev.bsddev),
1, /* any alignment */
0, /* no boundary */
BUS_SPACE_MAXADDR, /* lowaddr */
BUS_SPACE_MAXADDR, /* highaddr */
NULL, NULL, /* filter, filterarg */
MJUM16BYTES, /* maxsize */
1, /* nsegments */
MJUM16BYTES, /* maxsegsize */
0, /* flags */
NULL, NULL, /* lockfunc, lockfuncarg */
&rq->dma_tag)))
goto done;
err = mlx5_wq_ll_create(mdev, &param->wq, rqc_wq, &rq->wq,
&rq->wq_ctrl);
if (err)
goto err_free_dma_tag;
rq->wq.db = &rq->wq.db[MLX5_RCV_DBR];
if (priv->params.hw_lro_en) {
rq->wqe_sz = priv->params.lro_wqe_sz;
} else {
rq->wqe_sz = MLX5E_SW2MB_MTU(priv->ifp->if_mtu);
}
if (rq->wqe_sz > MJUM16BYTES) {
err = -ENOMEM;
goto err_rq_wq_destroy;
} else if (rq->wqe_sz > MJUM9BYTES) {
rq->wqe_sz = MJUM16BYTES;
} else if (rq->wqe_sz > MJUMPAGESIZE) {
rq->wqe_sz = MJUM9BYTES;
} else if (rq->wqe_sz > MCLBYTES) {
rq->wqe_sz = MJUMPAGESIZE;
} else {
rq->wqe_sz = MCLBYTES;
}
wq_sz = mlx5_wq_ll_get_size(&rq->wq);
rq->mbuf = malloc(wq_sz * sizeof(rq->mbuf[0]), M_MLX5EN, M_WAITOK | M_ZERO);
if (rq->mbuf == NULL) {
err = -ENOMEM;
goto err_rq_wq_destroy;
}
for (i = 0; i != wq_sz; i++) {
struct mlx5e_rx_wqe *wqe = mlx5_wq_ll_get_wqe(&rq->wq, i);
uint32_t byte_count = rq->wqe_sz - MLX5E_NET_IP_ALIGN;
err = -bus_dmamap_create(rq->dma_tag, 0, &rq->mbuf[i].dma_map);
if (err != 0) {
while (i--)
bus_dmamap_destroy(rq->dma_tag, rq->mbuf[i].dma_map);
goto err_rq_mbuf_free;
}
wqe->data.lkey = c->mkey_be;
wqe->data.byte_count = cpu_to_be32(byte_count | MLX5_HW_START_PADDING);
}
rq->pdev = c->pdev;
rq->ifp = c->ifp;
rq->channel = c;
rq->ix = c->ix;
snprintf(buffer, sizeof(buffer), "rxstat%d", c->ix);
mlx5e_create_stats(&rq->stats.ctx, SYSCTL_CHILDREN(priv->sysctl_ifnet),
buffer, mlx5e_rq_stats_desc, MLX5E_RQ_STATS_NUM,
rq->stats.arg);
#ifdef HAVE_TURBO_LRO
if (tcp_tlro_init(&rq->lro, c->ifp, MLX5E_BUDGET_MAX) != 0)
rq->lro.mbuf = NULL;
#else
if (tcp_lro_init(&rq->lro))
rq->lro.lro_cnt = 0;
else
rq->lro.ifp = c->ifp;
#endif
return (0);
err_rq_mbuf_free:
free(rq->mbuf, M_MLX5EN);
err_rq_wq_destroy:
mlx5_wq_destroy(&rq->wq_ctrl);
err_free_dma_tag:
bus_dma_tag_destroy(rq->dma_tag);
done:
return (err);
}
static void
mlx5e_destroy_rq(struct mlx5e_rq *rq)
{
int wq_sz;
int i;
/* destroy all sysctl nodes */
sysctl_ctx_free(&rq->stats.ctx);
/* free leftover LRO packets, if any */
#ifdef HAVE_TURBO_LRO
tcp_tlro_free(&rq->lro);
#else
tcp_lro_free(&rq->lro);
#endif
wq_sz = mlx5_wq_ll_get_size(&rq->wq);
for (i = 0; i != wq_sz; i++) {
if (rq->mbuf[i].mbuf != NULL) {
bus_dmamap_unload(rq->dma_tag,
rq->mbuf[i].dma_map);
m_freem(rq->mbuf[i].mbuf);
}
bus_dmamap_destroy(rq->dma_tag, rq->mbuf[i].dma_map);
}
free(rq->mbuf, M_MLX5EN);
mlx5_wq_destroy(&rq->wq_ctrl);
}
static int
mlx5e_enable_rq(struct mlx5e_rq *rq, struct mlx5e_rq_param *param)
{
struct mlx5e_channel *c = rq->channel;
struct mlx5e_priv *priv = c->priv;
struct mlx5_core_dev *mdev = priv->mdev;
void *in;
void *rqc;
void *wq;
int inlen;
int err;
inlen = MLX5_ST_SZ_BYTES(create_rq_in) +
sizeof(u64) * rq->wq_ctrl.buf.npages;
in = mlx5_vzalloc(inlen);
if (in == NULL)
return (-ENOMEM);
rqc = MLX5_ADDR_OF(create_rq_in, in, ctx);
wq = MLX5_ADDR_OF(rqc, rqc, wq);
memcpy(rqc, param->rqc, sizeof(param->rqc));
MLX5_SET(rqc, rqc, cqn, c->rq.cq.mcq.cqn);
MLX5_SET(rqc, rqc, state, MLX5_RQC_STATE_RST);
MLX5_SET(rqc, rqc, flush_in_error_en, 1);
if (priv->counter_set_id >= 0)
MLX5_SET(rqc, rqc, counter_set_id, priv->counter_set_id);
MLX5_SET(wq, wq, log_wq_pg_sz, rq->wq_ctrl.buf.page_shift -
PAGE_SHIFT);
MLX5_SET64(wq, wq, dbr_addr, rq->wq_ctrl.db.dma);
mlx5_fill_page_array(&rq->wq_ctrl.buf,
(__be64 *) MLX5_ADDR_OF(wq, wq, pas));
err = mlx5_core_create_rq(mdev, in, inlen, &rq->rqn);
kvfree(in);
return (err);
}
static int
mlx5e_modify_rq(struct mlx5e_rq *rq, int curr_state, int next_state)
{
struct mlx5e_channel *c = rq->channel;
struct mlx5e_priv *priv = c->priv;
struct mlx5_core_dev *mdev = priv->mdev;
void *in;
void *rqc;
int inlen;
int err;
inlen = MLX5_ST_SZ_BYTES(modify_rq_in);
in = mlx5_vzalloc(inlen);
if (in == NULL)
return (-ENOMEM);
rqc = MLX5_ADDR_OF(modify_rq_in, in, ctx);
MLX5_SET(modify_rq_in, in, rqn, rq->rqn);
MLX5_SET(modify_rq_in, in, rq_state, curr_state);
MLX5_SET(rqc, rqc, state, next_state);
err = mlx5_core_modify_rq(mdev, in, inlen);
kvfree(in);
return (err);
}
static void
mlx5e_disable_rq(struct mlx5e_rq *rq)
{
struct mlx5e_channel *c = rq->channel;
struct mlx5e_priv *priv = c->priv;
struct mlx5_core_dev *mdev = priv->mdev;
mlx5_core_destroy_rq(mdev, rq->rqn);
}
static int
mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq)
{
struct mlx5e_channel *c = rq->channel;
struct mlx5e_priv *priv = c->priv;
struct mlx5_wq_ll *wq = &rq->wq;
int i;
for (i = 0; i < 1000; i++) {
if (wq->cur_sz >= priv->params.min_rx_wqes)
return (0);
msleep(4);
}
return (-ETIMEDOUT);
}
static int
mlx5e_open_rq(struct mlx5e_channel *c,
struct mlx5e_rq_param *param,
struct mlx5e_rq *rq)
{
int err;
err = mlx5e_create_rq(c, param, rq);
if (err)
return (err);
err = mlx5e_enable_rq(rq, param);
if (err)
goto err_destroy_rq;
err = mlx5e_modify_rq(rq, MLX5_RQC_STATE_RST, MLX5_RQC_STATE_RDY);
if (err)
goto err_disable_rq;
c->rq.enabled = 1;
return (0);
err_disable_rq:
mlx5e_disable_rq(rq);
err_destroy_rq:
mlx5e_destroy_rq(rq);
return (err);
}
static void
mlx5e_close_rq(struct mlx5e_rq *rq)
{
rq->enabled = 0;
mlx5e_modify_rq(rq, MLX5_RQC_STATE_RDY, MLX5_RQC_STATE_ERR);
}
static void
mlx5e_close_rq_wait(struct mlx5e_rq *rq)
{
/* wait till RQ is empty */
while (!mlx5_wq_ll_is_empty(&rq->wq)) {
msleep(4);
rq->cq.mcq.comp(&rq->cq.mcq);
}
mlx5e_disable_rq(rq);
mlx5e_destroy_rq(rq);
}
static void
mlx5e_free_sq_db(struct mlx5e_sq *sq)
{
int wq_sz = mlx5_wq_cyc_get_size(&sq->wq);
int x;
for (x = 0; x != wq_sz; x++)
bus_dmamap_destroy(sq->dma_tag, sq->mbuf[x].dma_map);
free(sq->mbuf, M_MLX5EN);
}
static int
mlx5e_alloc_sq_db(struct mlx5e_sq *sq)
{
int wq_sz = mlx5_wq_cyc_get_size(&sq->wq);
int err;
int x;
sq->mbuf = malloc(wq_sz * sizeof(sq->mbuf[0]), M_MLX5EN, M_WAITOK | M_ZERO);
if (sq->mbuf == NULL)
return (-ENOMEM);
/* Create DMA descriptor MAPs */
for (x = 0; x != wq_sz; x++) {
err = -bus_dmamap_create(sq->dma_tag, 0, &sq->mbuf[x].dma_map);
if (err != 0) {
while (x--)
bus_dmamap_destroy(sq->dma_tag, sq->mbuf[x].dma_map);
free(sq->mbuf, M_MLX5EN);
return (err);
}
}
return (0);
}
static const char *mlx5e_sq_stats_desc[] = {
MLX5E_SQ_STATS(MLX5E_STATS_DESC)
};
static int
mlx5e_create_sq(struct mlx5e_channel *c,
int tc,
struct mlx5e_sq_param *param,
struct mlx5e_sq *sq)
{
struct mlx5e_priv *priv = c->priv;
struct mlx5_core_dev *mdev = priv->mdev;
char buffer[16];
void *sqc = param->sqc;
void *sqc_wq = MLX5_ADDR_OF(sqc, sqc, wq);
#ifdef RSS
cpuset_t cpu_mask;
int cpu_id;
#endif
int err;
/* Create DMA descriptor TAG */
if ((err = -bus_dma_tag_create(
bus_get_dma_tag(mdev->pdev->dev.bsddev),
1, /* any alignment */
0, /* no boundary */
BUS_SPACE_MAXADDR, /* lowaddr */
BUS_SPACE_MAXADDR, /* highaddr */
NULL, NULL, /* filter, filterarg */
MLX5E_MAX_TX_PAYLOAD_SIZE, /* maxsize */
MLX5E_MAX_TX_MBUF_FRAGS, /* nsegments */
MLX5E_MAX_TX_MBUF_SIZE, /* maxsegsize */
0, /* flags */
NULL, NULL, /* lockfunc, lockfuncarg */
&sq->dma_tag)))
goto done;
err = mlx5_alloc_map_uar(mdev, &sq->uar);
if (err)
goto err_free_dma_tag;
err = mlx5_wq_cyc_create(mdev, &param->wq, sqc_wq, &sq->wq,
&sq->wq_ctrl);
if (err)
goto err_unmap_free_uar;
sq->wq.db = &sq->wq.db[MLX5_SND_DBR];
sq->uar_map = sq->uar.map;
sq->uar_bf_map = sq->uar.bf_map;
sq->bf_buf_size = (1 << MLX5_CAP_GEN(mdev, log_bf_reg_size)) / 2;
err = mlx5e_alloc_sq_db(sq);
if (err)
goto err_sq_wq_destroy;
sq->pdev = c->pdev;
sq->mkey_be = c->mkey_be;
sq->channel = c;
sq->tc = tc;
sq->br = buf_ring_alloc(MLX5E_SQ_TX_QUEUE_SIZE, M_MLX5EN,
M_WAITOK, &sq->lock);
if (sq->br == NULL) {
if_printf(c->ifp, "%s: Failed allocating sq drbr buffer\n",
__func__);
err = -ENOMEM;
goto err_free_sq_db;
}
sq->sq_tq = taskqueue_create_fast("mlx5e_que", M_WAITOK,
taskqueue_thread_enqueue, &sq->sq_tq);
if (sq->sq_tq == NULL) {
if_printf(c->ifp, "%s: Failed allocating taskqueue\n",
__func__);
err = -ENOMEM;
goto err_free_drbr;
}
TASK_INIT(&sq->sq_task, 0, mlx5e_tx_que, sq);
#ifdef RSS
cpu_id = rss_getcpu(c->ix % rss_getnumbuckets());
CPU_SETOF(cpu_id, &cpu_mask);
taskqueue_start_threads_cpuset(&sq->sq_tq, 1, PI_NET, &cpu_mask,
"%s TX SQ%d.%d CPU%d", c->ifp->if_xname, c->ix, tc, cpu_id);
#else
taskqueue_start_threads(&sq->sq_tq, 1, PI_NET,
"%s TX SQ%d.%d", c->ifp->if_xname, c->ix, tc);
#endif
snprintf(buffer, sizeof(buffer), "txstat%dtc%d", c->ix, tc);
mlx5e_create_stats(&sq->stats.ctx, SYSCTL_CHILDREN(priv->sysctl_ifnet),
buffer, mlx5e_sq_stats_desc, MLX5E_SQ_STATS_NUM,
sq->stats.arg);
return (0);
err_free_drbr:
buf_ring_free(sq->br, M_MLX5EN);
err_free_sq_db:
mlx5e_free_sq_db(sq);
err_sq_wq_destroy:
mlx5_wq_destroy(&sq->wq_ctrl);
err_unmap_free_uar:
mlx5_unmap_free_uar(mdev, &sq->uar);
err_free_dma_tag:
bus_dma_tag_destroy(sq->dma_tag);
done:
return (err);
}
static void
mlx5e_destroy_sq(struct mlx5e_sq *sq)
{
struct mlx5e_channel *c = sq->channel;
struct mlx5e_priv *priv = c->priv;
/* destroy all sysctl nodes */
sysctl_ctx_free(&sq->stats.ctx);
mlx5e_free_sq_db(sq);
mlx5_wq_destroy(&sq->wq_ctrl);
mlx5_unmap_free_uar(priv->mdev, &sq->uar);
taskqueue_drain(sq->sq_tq, &sq->sq_task);
taskqueue_free(sq->sq_tq);
buf_ring_free(sq->br, M_MLX5EN);
}
static int
mlx5e_enable_sq(struct mlx5e_sq *sq, struct mlx5e_sq_param *param)
{
struct mlx5e_channel *c = sq->channel;
struct mlx5e_priv *priv = c->priv;
struct mlx5_core_dev *mdev = priv->mdev;
void *in;
void *sqc;
void *wq;
int inlen;
int err;
inlen = MLX5_ST_SZ_BYTES(create_sq_in) +
sizeof(u64) * sq->wq_ctrl.buf.npages;
in = mlx5_vzalloc(inlen);
if (in == NULL)
return (-ENOMEM);
sqc = MLX5_ADDR_OF(create_sq_in, in, ctx);
wq = MLX5_ADDR_OF(sqc, sqc, wq);
memcpy(sqc, param->sqc, sizeof(param->sqc));
MLX5_SET(sqc, sqc, tis_num_0, priv->tisn[sq->tc]);
MLX5_SET(sqc, sqc, cqn, c->sq[sq->tc].cq.mcq.cqn);
MLX5_SET(sqc, sqc, state, MLX5_SQC_STATE_RST);
MLX5_SET(sqc, sqc, tis_lst_sz, 1);
MLX5_SET(sqc, sqc, flush_in_error_en, 1);
MLX5_SET(wq, wq, wq_type, MLX5_WQ_TYPE_CYCLIC);
MLX5_SET(wq, wq, uar_page, sq->uar.index);
MLX5_SET(wq, wq, log_wq_pg_sz, sq->wq_ctrl.buf.page_shift -
PAGE_SHIFT);
MLX5_SET64(wq, wq, dbr_addr, sq->wq_ctrl.db.dma);
mlx5_fill_page_array(&sq->wq_ctrl.buf,
(__be64 *) MLX5_ADDR_OF(wq, wq, pas));
err = mlx5_core_create_sq(mdev, in, inlen, &sq->sqn);
kvfree(in);
return (err);
}
static int
mlx5e_modify_sq(struct mlx5e_sq *sq, int curr_state, int next_state)
{
struct mlx5e_channel *c = sq->channel;
struct mlx5e_priv *priv = c->priv;
struct mlx5_core_dev *mdev = priv->mdev;
void *in;
void *sqc;
int inlen;
int err;
inlen = MLX5_ST_SZ_BYTES(modify_sq_in);
in = mlx5_vzalloc(inlen);
if (in == NULL)
return (-ENOMEM);
sqc = MLX5_ADDR_OF(modify_sq_in, in, ctx);
MLX5_SET(modify_sq_in, in, sqn, sq->sqn);
MLX5_SET(modify_sq_in, in, sq_state, curr_state);
MLX5_SET(sqc, sqc, state, next_state);
err = mlx5_core_modify_sq(mdev, in, inlen);
kvfree(in);
return (err);
}
static void
mlx5e_disable_sq(struct mlx5e_sq *sq)
{
struct mlx5e_channel *c = sq->channel;
struct mlx5e_priv *priv = c->priv;
struct mlx5_core_dev *mdev = priv->mdev;
mlx5_core_destroy_sq(mdev, sq->sqn);
}
static int
mlx5e_open_sq(struct mlx5e_channel *c,
int tc,
struct mlx5e_sq_param *param,
struct mlx5e_sq *sq)
{
int err;
err = mlx5e_create_sq(c, tc, param, sq);
if (err)
return (err);
err = mlx5e_enable_sq(sq, param);
if (err)
goto err_destroy_sq;
err = mlx5e_modify_sq(sq, MLX5_SQC_STATE_RST, MLX5_SQC_STATE_RDY);
if (err)
goto err_disable_sq;
atomic_store_rel_int(&sq->queue_state, MLX5E_SQ_READY);
return (0);
err_disable_sq:
mlx5e_disable_sq(sq);
err_destroy_sq:
mlx5e_destroy_sq(sq);
return (err);
}
static void
mlx5e_sq_send_nops_locked(struct mlx5e_sq *sq, int can_sleep)
{
/* fill up remainder with NOPs */
while (sq->cev_counter != 0) {
while (!mlx5e_sq_has_room_for(sq, 1)) {
if (can_sleep != 0) {
mtx_unlock(&sq->lock);
msleep(4);
mtx_lock(&sq->lock);
} else {
goto done;
}
}
/* send a single NOP */
mlx5e_send_nop(sq, 1);
wmb();
}
done:
/* Check if we need to write the doorbell */
if (likely(sq->doorbell.d64 != 0)) {
mlx5e_tx_notify_hw(sq, sq->doorbell.d32, 0);
sq->doorbell.d64 = 0;
}
return;
}
void
mlx5e_sq_cev_timeout(void *arg)
{
struct mlx5e_sq *sq = arg;
mtx_assert(&sq->lock, MA_OWNED);
/* check next state */
switch (sq->cev_next_state) {
case MLX5E_CEV_STATE_SEND_NOPS:
/* fill TX ring with NOPs, if any */
mlx5e_sq_send_nops_locked(sq, 0);
/* check if completed */
if (sq->cev_counter == 0) {
sq->cev_next_state = MLX5E_CEV_STATE_INITIAL;
return;
}
break;
default:
/* send NOPs on next timeout */
sq->cev_next_state = MLX5E_CEV_STATE_SEND_NOPS;
break;
}
/* restart timer */
callout_reset_curcpu(&sq->cev_callout, hz, mlx5e_sq_cev_timeout, sq);
}
static void
mlx5e_close_sq_wait(struct mlx5e_sq *sq)
{
mtx_lock(&sq->lock);
/* teardown event factor timer, if any */
sq->cev_next_state = MLX5E_CEV_STATE_HOLD_NOPS;
callout_stop(&sq->cev_callout);
/* send dummy NOPs in order to flush the transmit ring */
mlx5e_sq_send_nops_locked(sq, 1);
mtx_unlock(&sq->lock);
/* make sure it is safe to free the callout */
callout_drain(&sq->cev_callout);
/* error out remaining requests */
mlx5e_modify_sq(sq, MLX5_SQC_STATE_RDY, MLX5_SQC_STATE_ERR);
/* wait till SQ is empty */
mtx_lock(&sq->lock);
while (sq->cc != sq->pc) {
mtx_unlock(&sq->lock);
msleep(4);
sq->cq.mcq.comp(&sq->cq.mcq);
mtx_lock(&sq->lock);
}
mtx_unlock(&sq->lock);
mlx5e_disable_sq(sq);
mlx5e_destroy_sq(sq);
}
static int
mlx5e_create_cq(struct mlx5e_channel *c,
struct mlx5e_cq_param *param,
struct mlx5e_cq *cq,
mlx5e_cq_comp_t *comp)
{
struct mlx5e_priv *priv = c->priv;
struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5_core_cq *mcq = &cq->mcq;
int eqn_not_used;
int irqn;
int err;
u32 i;
param->wq.buf_numa_node = 0;
param->wq.db_numa_node = 0;
param->eq_ix = c->ix;
err = mlx5_cqwq_create(mdev, &param->wq, param->cqc, &cq->wq,
&cq->wq_ctrl);
if (err)
return (err);
mlx5_vector2eqn(mdev, param->eq_ix, &eqn_not_used, &irqn);
mcq->cqe_sz = 64;
mcq->set_ci_db = cq->wq_ctrl.db.db;
mcq->arm_db = cq->wq_ctrl.db.db + 1;
*mcq->set_ci_db = 0;
*mcq->arm_db = 0;
mcq->vector = param->eq_ix;
mcq->comp = comp;
mcq->event = mlx5e_cq_error_event;
mcq->irqn = irqn;
mcq->uar = &priv->cq_uar;
for (i = 0; i < mlx5_cqwq_get_size(&cq->wq); i++) {
struct mlx5_cqe64 *cqe = mlx5_cqwq_get_wqe(&cq->wq, i);
cqe->op_own = 0xf1;
}
cq->channel = c;
return (0);
}
static void
mlx5e_destroy_cq(struct mlx5e_cq *cq)
{
mlx5_wq_destroy(&cq->wq_ctrl);
}
static int
mlx5e_enable_cq(struct mlx5e_cq *cq, struct mlx5e_cq_param *param,
u8 moderation_mode)
{
struct mlx5e_channel *c = cq->channel;
struct mlx5e_priv *priv = c->priv;
struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5_core_cq *mcq = &cq->mcq;
void *in;
void *cqc;
int inlen;
int irqn_not_used;
int eqn;
int err;
inlen = MLX5_ST_SZ_BYTES(create_cq_in) +
sizeof(u64) * cq->wq_ctrl.buf.npages;
in = mlx5_vzalloc(inlen);
if (in == NULL)
return (-ENOMEM);
cqc = MLX5_ADDR_OF(create_cq_in, in, cq_context);
memcpy(cqc, param->cqc, sizeof(param->cqc));
mlx5_fill_page_array(&cq->wq_ctrl.buf,
(__be64 *) MLX5_ADDR_OF(create_cq_in, in, pas));
mlx5_vector2eqn(mdev, param->eq_ix, &eqn, &irqn_not_used);
MLX5_SET(cqc, cqc, cq_period_mode, moderation_mode);
MLX5_SET(cqc, cqc, c_eqn, eqn);
MLX5_SET(cqc, cqc, uar_page, mcq->uar->index);
MLX5_SET(cqc, cqc, log_page_size, cq->wq_ctrl.buf.page_shift -
PAGE_SHIFT);
MLX5_SET64(cqc, cqc, dbr_addr, cq->wq_ctrl.db.dma);
err = mlx5_core_create_cq(mdev, mcq, in, inlen);
kvfree(in);
if (err)
return (err);
mlx5e_cq_arm(cq);
return (0);
}
static void
mlx5e_disable_cq(struct mlx5e_cq *cq)
{
struct mlx5e_channel *c = cq->channel;
struct mlx5e_priv *priv = c->priv;
struct mlx5_core_dev *mdev = priv->mdev;
mlx5_core_destroy_cq(mdev, &cq->mcq);
}
static int
mlx5e_open_cq(struct mlx5e_channel *c,
struct mlx5e_cq_param *param,
struct mlx5e_cq *cq,
mlx5e_cq_comp_t *comp,
u8 moderation_mode)
{
int err;
err = mlx5e_create_cq(c, param, cq, comp);
if (err)
return (err);
err = mlx5e_enable_cq(cq, param, moderation_mode);
if (err)
goto err_destroy_cq;
return (0);
err_destroy_cq:
mlx5e_destroy_cq(cq);
return (err);
}
static void
mlx5e_close_cq(struct mlx5e_cq *cq)
{
mlx5e_disable_cq(cq);
mlx5e_destroy_cq(cq);
}
static int
mlx5e_open_tx_cqs(struct mlx5e_channel *c,
struct mlx5e_channel_param *cparam)
{
u8 tx_moderation_mode;
int err;
int tc;
switch (c->priv->params.tx_cq_moderation_mode) {
case 0:
tx_moderation_mode = MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
break;
default:
if (MLX5_CAP_GEN(c->priv->mdev, cq_period_start_from_cqe))
tx_moderation_mode = MLX5_CQ_PERIOD_MODE_START_FROM_CQE;
else
tx_moderation_mode = MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
break;
}
for (tc = 0; tc < c->num_tc; tc++) {
/* open completion queue */
err = mlx5e_open_cq(c, &cparam->tx_cq, &c->sq[tc].cq,
&mlx5e_tx_cq_comp, tx_moderation_mode);
if (err)
goto err_close_tx_cqs;
}
return (0);
err_close_tx_cqs:
for (tc--; tc >= 0; tc--)
mlx5e_close_cq(&c->sq[tc].cq);
return (err);
}
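/*
 * Illustration only, not part of the driver: a minimal, self-contained
 * sketch of the partial-unwind idiom used by mlx5e_open_tx_cqs() above.
 * When opening resource number i fails, only resources i-1 .. 0 are closed,
 * in reverse order. The names res_open(), res_close(), open_all(),
 * count_open() and NUM_RES are hypothetical stand-ins and do not exist
 * in mlx5en.
 */

```c
#define NUM_RES 4

static int opened[NUM_RES];	/* 1 while resource i is open */

/* Hypothetical open routine: fails for index fail_at, succeeds otherwise. */
static int
res_open(int i, int fail_at)
{
	if (i == fail_at)
		return (-1);
	opened[i] = 1;
	return (0);
}

/* Hypothetical close routine. */
static void
res_close(int i)
{
	opened[i] = 0;
}

/* Open all resources; on failure close the ones already opened. */
static int
open_all(int fail_at)
{
	int err;
	int i;

	for (i = 0; i < NUM_RES; i++) {
		err = res_open(i, fail_at);
		if (err)
			goto err_close;
	}
	return (0);

err_close:
	/* i is the index that failed, so start closing at i - 1 */
	for (i--; i >= 0; i--)
		res_close(i);
	return (err);
}

/* Count how many resources are currently open. */
static int
count_open(void)
{
	int n = 0;
	int i;

	for (i = 0; i < NUM_RES; i++)
		n += opened[i];
	return (n);
}
```

/*
 * Starting the cleanup loop with "i--" skips the entry whose open call
 * failed, which is why the failing open routine must not leave anything
 * half-initialized behind.
 */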
static void
mlx5e_close_tx_cqs(struct mlx5e_channel *c)
{
int tc;
for (tc = 0; tc < c->num_tc; tc++)
mlx5e_close_cq(&c->sq[tc].cq);
}
static int
mlx5e_open_sqs(struct mlx5e_channel *c,
struct mlx5e_channel_param *cparam)
{
int err;
int tc;
for (tc = 0; tc < c->num_tc; tc++) {
err = mlx5e_open_sq(c, tc, &cparam->sq, &c->sq[tc]);
if (err)
goto err_close_sqs;
}
return (0);
err_close_sqs:
for (tc--; tc >= 0; tc--)
mlx5e_close_sq_wait(&c->sq[tc]);
return (err);
}
static void
mlx5e_close_sqs_wait(struct mlx5e_channel *c)
{
int tc;
for (tc = 0; tc < c->num_tc; tc++)
mlx5e_close_sq_wait(&c->sq[tc]);
}
static void
mlx5e_chan_mtx_init(struct mlx5e_channel *c)
{
int tc;
mtx_init(&c->rq.mtx, "mlx5rx", MTX_NETWORK_LOCK, MTX_DEF);
for (tc = 0; tc < c->num_tc; tc++) {
struct mlx5e_sq *sq = c->sq + tc;
mtx_init(&sq->lock, "mlx5tx", MTX_NETWORK_LOCK, MTX_DEF);
mtx_init(&sq->comp_lock, "mlx5comp", MTX_NETWORK_LOCK,
MTX_DEF);
callout_init_mtx(&sq->cev_callout, &sq->lock, 0);
sq->cev_factor = c->priv->params_ethtool.tx_completion_fact;
/* ensure the TX completion event factor is not zero */
if (sq->cev_factor == 0)
sq->cev_factor = 1;
}
}
static void
mlx5e_chan_mtx_destroy(struct mlx5e_channel *c)
{
int tc;
mtx_destroy(&c->rq.mtx);
for (tc = 0; tc < c->num_tc; tc++) {
mtx_destroy(&c->sq[tc].lock);
mtx_destroy(&c->sq[tc].comp_lock);
}
}
static int
mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
struct mlx5e_channel_param *cparam,
struct mlx5e_channel *volatile *cp)
{
struct mlx5e_channel *c;
u8 rx_moderation_mode;
int err;
c = malloc(sizeof(*c), M_MLX5EN, M_WAITOK | M_ZERO);
if (c == NULL)
return (-ENOMEM);
c->priv = priv;
c->ix = ix;
c->cpu = 0;
c->pdev = &priv->mdev->pdev->dev;
c->ifp = priv->ifp;
c->mkey_be = cpu_to_be32(priv->mr.key);
c->num_tc = priv->num_tc;
/* init mutexes */
mlx5e_chan_mtx_init(c);
/* open transmit completion queue */
err = mlx5e_open_tx_cqs(c, cparam);
if (err)
goto err_free;
switch (priv->params.rx_cq_moderation_mode) {
case 0:
rx_moderation_mode = MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
break;
default:
if (MLX5_CAP_GEN(priv->mdev, cq_period_start_from_cqe))
rx_moderation_mode = MLX5_CQ_PERIOD_MODE_START_FROM_CQE;
else
rx_moderation_mode = MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
break;
}
/* open receive completion queue */
err = mlx5e_open_cq(c, &cparam->rx_cq, &c->rq.cq,
&mlx5e_rx_cq_comp, rx_moderation_mode);
if (err)
goto err_close_tx_cqs;
err = mlx5e_open_sqs(c, cparam);
if (err)
goto err_close_rx_cq;
err = mlx5e_open_rq(c, &cparam->rq, &c->rq);
if (err)
goto err_close_sqs;
/* store channel pointer */
*cp = c;
/* poll receive queue initially */
c->rq.cq.mcq.comp(&c->rq.cq.mcq);
return (0);
err_close_sqs:
mlx5e_close_sqs_wait(c);
err_close_rx_cq:
mlx5e_close_cq(&c->rq.cq);
err_close_tx_cqs:
mlx5e_close_tx_cqs(c);
err_free:
/* destroy mutexes */
mlx5e_chan_mtx_destroy(c);
free(c, M_MLX5EN);
return (err);
}
static void
mlx5e_close_channel(struct mlx5e_channel *volatile *pp)
{
struct mlx5e_channel *c = *pp;
/* check if channel is already closed */
if (c == NULL)
return;
mlx5e_close_rq(&c->rq);
}
static void
mlx5e_close_channel_wait(struct mlx5e_channel *volatile *pp)
{
struct mlx5e_channel *c = *pp;
/* check if channel is already closed */
if (c == NULL)
return;
/* ensure channel pointer is no longer used */
*pp = NULL;
mlx5e_close_rq_wait(&c->rq);
mlx5e_close_sqs_wait(c);
mlx5e_close_cq(&c->rq.cq);
mlx5e_close_tx_cqs(c);
/* destroy mutexes */
mlx5e_chan_mtx_destroy(c);
free(c, M_MLX5EN);
}
static void
mlx5e_build_rq_param(struct mlx5e_priv *priv,
struct mlx5e_rq_param *param)
{
void *rqc = param->rqc;
void *wq = MLX5_ADDR_OF(rqc, rqc, wq);
MLX5_SET(wq, wq, wq_type, MLX5_WQ_TYPE_LINKED_LIST);
MLX5_SET(wq, wq, end_padding_mode, MLX5_WQ_END_PAD_MODE_ALIGN);
MLX5_SET(wq, wq, log_wq_stride, ilog2(sizeof(struct mlx5e_rx_wqe)));
MLX5_SET(wq, wq, log_wq_sz, priv->params.log_rq_size);
MLX5_SET(wq, wq, pd, priv->pdn);
param->wq.buf_numa_node = 0;
param->wq.db_numa_node = 0;
param->wq.linear = 1;
}
static void
mlx5e_build_sq_param(struct mlx5e_priv *priv,
struct mlx5e_sq_param *param)
{
void *sqc = param->sqc;
void *wq = MLX5_ADDR_OF(sqc, sqc, wq);
MLX5_SET(wq, wq, log_wq_sz, priv->params.log_sq_size);
MLX5_SET(wq, wq, log_wq_stride, ilog2(MLX5_SEND_WQE_BB));
MLX5_SET(wq, wq, pd, priv->pdn);
param->wq.buf_numa_node = 0;
param->wq.db_numa_node = 0;
param->wq.linear = 1;
}
static void
mlx5e_build_common_cq_param(struct mlx5e_priv *priv,
struct mlx5e_cq_param *param)
{
void *cqc = param->cqc;
MLX5_SET(cqc, cqc, uar_page, priv->cq_uar.index);
}
static void
mlx5e_build_rx_cq_param(struct mlx5e_priv *priv,
struct mlx5e_cq_param *param)
{
void *cqc = param->cqc;
/*
* TODO The sysctl to control on/off is a bool value for now, which means
* we only support CSUM; once HASH is implemented we'll need to address that.
*/
if (priv->params.cqe_zipping_en) {
MLX5_SET(cqc, cqc, mini_cqe_res_format, MLX5_CQE_FORMAT_CSUM);
MLX5_SET(cqc, cqc, cqe_compression_en, 1);
}
MLX5_SET(cqc, cqc, log_cq_size, priv->params.log_rq_size);
MLX5_SET(cqc, cqc, cq_period, priv->params.rx_cq_moderation_usec);
MLX5_SET(cqc, cqc, cq_max_count, priv->params.rx_cq_moderation_pkts);
mlx5e_build_common_cq_param(priv, param);
}
static void
mlx5e_build_tx_cq_param(struct mlx5e_priv *priv,
struct mlx5e_cq_param *param)
{
void *cqc = param->cqc;
MLX5_SET(cqc, cqc, log_cq_size, priv->params.log_sq_size);
MLX5_SET(cqc, cqc, cq_period, priv->params.tx_cq_moderation_usec);
MLX5_SET(cqc, cqc, cq_max_count, priv->params.tx_cq_moderation_pkts);
mlx5e_build_common_cq_param(priv, param);
}
static void
mlx5e_build_channel_param(struct mlx5e_priv *priv,
struct mlx5e_channel_param *cparam)
{
memset(cparam, 0, sizeof(*cparam));
mlx5e_build_rq_param(priv, &cparam->rq);
mlx5e_build_sq_param(priv, &cparam->sq);
mlx5e_build_rx_cq_param(priv, &cparam->rx_cq);
mlx5e_build_tx_cq_param(priv, &cparam->tx_cq);
}
static int
mlx5e_open_channels(struct mlx5e_priv *priv)
{
struct mlx5e_channel_param cparam;
void *ptr;
int err;
int i;
int j;
priv->channel = malloc(priv->params.num_channels *
sizeof(struct mlx5e_channel *), M_MLX5EN, M_WAITOK | M_ZERO);
if (priv->channel == NULL)
return (-ENOMEM);
mlx5e_build_channel_param(priv, &cparam);
for (i = 0; i < priv->params.num_channels; i++) {
err = mlx5e_open_channel(priv, i, &cparam, &priv->channel[i]);
if (err)
goto err_close_channels;
}
for (j = 0; j < priv->params.num_channels; j++) {
err = mlx5e_wait_for_min_rx_wqes(&priv->channel[j]->rq);
if (err)
goto err_close_channels;
}
return (0);
err_close_channels:
for (i--; i >= 0; i--) {
mlx5e_close_channel(&priv->channel[i]);
mlx5e_close_channel_wait(&priv->channel[i]);
}
/* remove "volatile" attribute from "channel" pointer */
ptr = __DECONST(void *, priv->channel);
priv->channel = NULL;
free(ptr, M_MLX5EN);
return (err);
}
static void
mlx5e_close_channels(struct mlx5e_priv *priv)
{
void *ptr;
int i;
if (priv->channel == NULL)
return;
for (i = 0; i < priv->params.num_channels; i++)
mlx5e_close_channel(&priv->channel[i]);
for (i = 0; i < priv->params.num_channels; i++)
mlx5e_close_channel_wait(&priv->channel[i]);
/* remove "volatile" attribute from "channel" pointer */
ptr = __DECONST(void *, priv->channel);
priv->channel = NULL;
free(ptr, M_MLX5EN);
}
static int
mlx5e_refresh_sq_params(struct mlx5e_priv *priv, struct mlx5e_sq *sq)
{
return (mlx5_core_modify_cq_moderation(priv->mdev, &sq->cq.mcq,
priv->params.tx_cq_moderation_usec,
priv->params.tx_cq_moderation_pkts));
}
static int
mlx5e_refresh_rq_params(struct mlx5e_priv *priv, struct mlx5e_rq *rq)
{
return (mlx5_core_modify_cq_moderation(priv->mdev, &rq->cq.mcq,
priv->params.rx_cq_moderation_usec,
priv->params.rx_cq_moderation_pkts));
}
static int
mlx5e_refresh_channel_params_sub(struct mlx5e_priv *priv, struct mlx5e_channel *c)
{
int err;
int i;
if (c == NULL)
return (EINVAL);
err = mlx5e_refresh_rq_params(priv, &c->rq);
if (err)
goto done;
for (i = 0; i != c->num_tc; i++) {
err = mlx5e_refresh_sq_params(priv, &c->sq[i]);
if (err)
goto done;
}
done:
return (err);
}
int
mlx5e_refresh_channel_params(struct mlx5e_priv *priv)
{
int i;
if (priv->channel == NULL)
return (EINVAL);
for (i = 0; i < priv->params.num_channels; i++) {
int err;
err = mlx5e_refresh_channel_params_sub(priv, priv->channel[i]);
if (err)
return (err);
}
return (0);
}
static int
mlx5e_open_tis(struct mlx5e_priv *priv, int tc)
{
struct mlx5_core_dev *mdev = priv->mdev;
u32 in[MLX5_ST_SZ_DW(create_tis_in)];
void *tisc = MLX5_ADDR_OF(create_tis_in, in, ctx);
memset(in, 0, sizeof(in));
MLX5_SET(tisc, tisc, prio, tc);
MLX5_SET(tisc, tisc, transport_domain, priv->tdn);
return (mlx5_core_create_tis(mdev, in, sizeof(in), &priv->tisn[tc]));
}
static void
mlx5e_close_tis(struct mlx5e_priv *priv, int tc)
{
mlx5_core_destroy_tis(priv->mdev, priv->tisn[tc]);
}
static int
mlx5e_open_tises(struct mlx5e_priv *priv)
{
int num_tc = priv->num_tc;
int err;
int tc;
for (tc = 0; tc < num_tc; tc++) {
err = mlx5e_open_tis(priv, tc);
if (err)
goto err_close_tises;
}
return (0);
err_close_tises:
for (tc--; tc >= 0; tc--)
mlx5e_close_tis(priv, tc);
return (err);
}
static void
mlx5e_close_tises(struct mlx5e_priv *priv)
{
int num_tc = priv->num_tc;
int tc;
for (tc = 0; tc < num_tc; tc++)
mlx5e_close_tis(priv, tc);
}
static int
mlx5e_open_rqt(struct mlx5e_priv *priv)
{
struct mlx5_core_dev *mdev = priv->mdev;
u32 *in;
u32 out[MLX5_ST_SZ_DW(create_rqt_out)];
void *rqtc;
int inlen;
int err;
int sz;
int i;
sz = 1 << priv->params.rx_hash_log_tbl_sz;
inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + sizeof(u32) * sz;
in = mlx5_vzalloc(inlen);
if (in == NULL)
return (-ENOMEM);
rqtc = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
MLX5_SET(rqtc, rqtc, rqt_actual_size, sz);
MLX5_SET(rqtc, rqtc, rqt_max_size, sz);
for (i = 0; i < sz; i++) {
int ix;
#ifdef RSS
ix = rss_get_indirection_to_bucket(i);
#else
ix = i;
#endif
/* ensure we don't overflow */
ix %= priv->params.num_channels;
MLX5_SET(rqtc, rqtc, rq_num[i], priv->channel[ix]->rq.rqn);
}
MLX5_SET(create_rqt_in, in, opcode, MLX5_CMD_OP_CREATE_RQT);
memset(out, 0, sizeof(out));
err = mlx5_cmd_exec_check_status(mdev, in, inlen, out, sizeof(out));
if (!err)
priv->rqtn = MLX5_GET(create_rqt_out, out, rqtn);
kvfree(in);
return (err);
}
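/*
 * Illustration only, not part of the driver: a minimal sketch of how the
 * loop in mlx5e_open_rqt() above spreads table slots over the available
 * channels when RSS is not compiled in (ix = i, then ix %= num_channels).
 * fill_indirection() is a hypothetical helper; it stores channel indices
 * rather than hardware RQ numbers, and it assumes num_channels is not zero.
 */

```c
#include <stddef.h>

/* Fill tbl[0 .. sz-1] with channel indices, wrapping modulo num_channels. */
static void
fill_indirection(unsigned *tbl, size_t sz, unsigned num_channels)
{
	size_t i;

	for (i = 0; i < sz; i++) {
		/* the modulo keeps the index inside the channel array */
		tbl[i] = (unsigned)(i % num_channels);
	}
}
```

/*
 * With an 8-entry table and 3 channels the slots map to channels
 * 0, 1, 2, 0, 1, 2, 0, 1, so the first two channels receive one more slot
 * than the last one whenever the table size is not a multiple of the
 * channel count.
 */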
static void
mlx5e_close_rqt(struct mlx5e_priv *priv)
{
u32 in[MLX5_ST_SZ_DW(destroy_rqt_in)];
u32 out[MLX5_ST_SZ_DW(destroy_rqt_out)];
memset(in, 0, sizeof(in));
MLX5_SET(destroy_rqt_in, in, opcode, MLX5_CMD_OP_DESTROY_RQT);
MLX5_SET(destroy_rqt_in, in, rqtn, priv->rqtn);
mlx5_cmd_exec_check_status(priv->mdev, in, sizeof(in), out,
sizeof(out));
}
static void
mlx5e_build_tir_ctx(struct mlx5e_priv *priv, u32 * tirc, int tt)
{
void *hfso = MLX5_ADDR_OF(tirc, tirc, rx_hash_field_selector_outer);
__be32 *hkey;
MLX5_SET(tirc, tirc, transport_domain, priv->tdn);
#define ROUGH_MAX_L2_L3_HDR_SZ 256
#define MLX5_HASH_IP (MLX5_HASH_FIELD_SEL_SRC_IP |\
MLX5_HASH_FIELD_SEL_DST_IP)
#define MLX5_HASH_ALL (MLX5_HASH_FIELD_SEL_SRC_IP |\
MLX5_HASH_FIELD_SEL_DST_IP |\
MLX5_HASH_FIELD_SEL_L4_SPORT |\
MLX5_HASH_FIELD_SEL_L4_DPORT)
#define MLX5_HASH_IP_IPSEC_SPI (MLX5_HASH_FIELD_SEL_SRC_IP |\
MLX5_HASH_FIELD_SEL_DST_IP |\
MLX5_HASH_FIELD_SEL_IPSEC_SPI)
if (priv->params.hw_lro_en) {
MLX5_SET(tirc, tirc, lro_enable_mask,
MLX5_TIRC_LRO_ENABLE_MASK_IPV4_LRO |
MLX5_TIRC_LRO_ENABLE_MASK_IPV6_LRO);
MLX5_SET(tirc, tirc, lro_max_msg_sz,
(priv->params.lro_wqe_sz -
ROUGH_MAX_L2_L3_HDR_SZ) >> 8);
/* TODO: add the option to choose timer value dynamically */
MLX5_SET(tirc, tirc, lro_timeout_period_usecs,
MLX5_CAP_ETH(priv->mdev,
lro_timer_supported_periods[2]));
}
/* setup parameters for hashing TIR type, if any */
switch (tt) {
case MLX5E_TT_ANY:
MLX5_SET(tirc, tirc, disp_type,
MLX5_TIRC_DISP_TYPE_DIRECT);
MLX5_SET(tirc, tirc, inline_rqn,
priv->channel[0]->rq.rqn);
break;
default:
MLX5_SET(tirc, tirc, disp_type,
MLX5_TIRC_DISP_TYPE_INDIRECT);
MLX5_SET(tirc, tirc, indirect_table,
priv->rqtn);
MLX5_SET(tirc, tirc, rx_hash_fn,
MLX5_TIRC_RX_HASH_FN_HASH_TOEPLITZ);
hkey = (__be32 *) MLX5_ADDR_OF(tirc, tirc, rx_hash_toeplitz_key);
#ifdef RSS
/*
* The FreeBSD RSS implementation does not currently
* support symmetric Toeplitz hashes:
*/
MLX5_SET(tirc, tirc, rx_hash_symmetric, 0);
rss_getkey((uint8_t *)hkey);
#else
MLX5_SET(tirc, tirc, rx_hash_symmetric, 1);
hkey[0] = cpu_to_be32(0xD181C62C);
hkey[1] = cpu_to_be32(0xF7F4DB5B);
hkey[2] = cpu_to_be32(0x1983A2FC);
hkey[3] = cpu_to_be32(0x943E1ADB);
hkey[4] = cpu_to_be32(0xD9389E6B);
hkey[5] = cpu_to_be32(0xD1039C2C);
hkey[6] = cpu_to_be32(0xA74499AD);
hkey[7] = cpu_to_be32(0x593D56D9);
hkey[8] = cpu_to_be32(0xF3253C06);
hkey[9] = cpu_to_be32(0x2ADC1FFC);
#endif
break;
}
switch (tt) {
case MLX5E_TT_IPV4_TCP:
MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
MLX5_L3_PROT_TYPE_IPV4);
MLX5_SET(rx_hash_field_select, hfso, l4_prot_type,
MLX5_L4_PROT_TYPE_TCP);
#ifdef RSS
if (!(rss_gethashconfig() & RSS_HASHTYPE_RSS_TCP_IPV4)) {
MLX5_SET(rx_hash_field_select, hfso, selected_fields,
MLX5_HASH_IP);
} else
#endif
MLX5_SET(rx_hash_field_select, hfso, selected_fields,
MLX5_HASH_ALL);
break;
case MLX5E_TT_IPV6_TCP:
MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
MLX5_L3_PROT_TYPE_IPV6);
MLX5_SET(rx_hash_field_select, hfso, l4_prot_type,
MLX5_L4_PROT_TYPE_TCP);
#ifdef RSS
if (!(rss_gethashconfig() & RSS_HASHTYPE_RSS_TCP_IPV6)) {
MLX5_SET(rx_hash_field_select, hfso, selected_fields,
MLX5_HASH_IP);
} else
#endif
MLX5_SET(rx_hash_field_select, hfso, selected_fields,
MLX5_HASH_ALL);
break;
case MLX5E_TT_IPV4_UDP:
MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
MLX5_L3_PROT_TYPE_IPV4);
MLX5_SET(rx_hash_field_select, hfso, l4_prot_type,
MLX5_L4_PROT_TYPE_UDP);
#ifdef RSS
if (!(rss_gethashconfig() & RSS_HASHTYPE_RSS_UDP_IPV4)) {
MLX5_SET(rx_hash_field_select, hfso, selected_fields,
MLX5_HASH_IP);
} else
#endif
MLX5_SET(rx_hash_field_select, hfso, selected_fields,
MLX5_HASH_ALL);
break;
case MLX5E_TT_IPV6_UDP:
MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
MLX5_L3_PROT_TYPE_IPV6);
MLX5_SET(rx_hash_field_select, hfso, l4_prot_type,
MLX5_L4_PROT_TYPE_UDP);
#ifdef RSS
if (!(rss_gethashconfig() & RSS_HASHTYPE_RSS_UDP_IPV6)) {
MLX5_SET(rx_hash_field_select, hfso, selected_fields,
MLX5_HASH_IP);
} else
#endif
MLX5_SET(rx_hash_field_select, hfso, selected_fields,
MLX5_HASH_ALL);
break;
case MLX5E_TT_IPV4_IPSEC_AH:
MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
MLX5_L3_PROT_TYPE_IPV4);
MLX5_SET(rx_hash_field_select, hfso, selected_fields,
MLX5_HASH_IP_IPSEC_SPI);
break;
case MLX5E_TT_IPV6_IPSEC_AH:
MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
MLX5_L3_PROT_TYPE_IPV6);
MLX5_SET(rx_hash_field_select, hfso, selected_fields,
MLX5_HASH_IP_IPSEC_SPI);
break;
case MLX5E_TT_IPV4_IPSEC_ESP:
MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
MLX5_L3_PROT_TYPE_IPV4);
MLX5_SET(rx_hash_field_select, hfso, selected_fields,
MLX5_HASH_IP_IPSEC_SPI);
break;
case MLX5E_TT_IPV6_IPSEC_ESP:
MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
MLX5_L3_PROT_TYPE_IPV6);
MLX5_SET(rx_hash_field_select, hfso, selected_fields,
MLX5_HASH_IP_IPSEC_SPI);
break;
case MLX5E_TT_IPV4:
MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
MLX5_L3_PROT_TYPE_IPV4);
MLX5_SET(rx_hash_field_select, hfso, selected_fields,
MLX5_HASH_IP);
break;
case MLX5E_TT_IPV6:
MLX5_SET(rx_hash_field_select, hfso, l3_prot_type,
MLX5_L3_PROT_TYPE_IPV6);
MLX5_SET(rx_hash_field_select, hfso, selected_fields,
MLX5_HASH_IP);
break;
default:
break;
}
}
static int
mlx5e_open_tir(struct mlx5e_priv *priv, int tt)
{
struct mlx5_core_dev *mdev = priv->mdev;
u32 *in;
void *tirc;
int inlen;
int err;
inlen = MLX5_ST_SZ_BYTES(create_tir_in);
in = mlx5_vzalloc(inlen);
if (in == NULL)
return (-ENOMEM);
tirc = MLX5_ADDR_OF(create_tir_in, in, tir_context);
mlx5e_build_tir_ctx(priv, tirc, tt);
err = mlx5_core_create_tir(mdev, in, inlen, &priv->tirn[tt]);
kvfree(in);
return (err);
}
static void
mlx5e_close_tir(struct mlx5e_priv *priv, int tt)
{
mlx5_core_destroy_tir(priv->mdev, priv->tirn[tt]);
}
static int
mlx5e_open_tirs(struct mlx5e_priv *priv)
{
int err;
int i;
for (i = 0; i < MLX5E_NUM_TT; i++) {
err = mlx5e_open_tir(priv, i);
if (err)
goto err_close_tirs;
}
return (0);
err_close_tirs:
for (i--; i >= 0; i--)
mlx5e_close_tir(priv, i);
return (err);
}
static void
mlx5e_close_tirs(struct mlx5e_priv *priv)
{
int i;
for (i = 0; i < MLX5E_NUM_TT; i++)
mlx5e_close_tir(priv, i);
}
/*
* SW MTU does not include headers,
* HW MTU includes all headers and checksums.
*/
static int
mlx5e_set_dev_port_mtu(struct ifnet *ifp, int sw_mtu)
{
struct mlx5e_priv *priv = ifp->if_softc;
struct mlx5_core_dev *mdev = priv->mdev;
int hw_mtu;
int err;
err = mlx5_set_port_mtu(mdev, MLX5E_SW2HW_MTU(sw_mtu));
if (err) {
if_printf(ifp, "%s: mlx5_set_port_mtu failed setting %d, err=%d\n",
__func__, sw_mtu, err);
return (err);
}
err = mlx5_query_port_oper_mtu(mdev, &hw_mtu);
if (!err) {
ifp->if_mtu = MLX5E_HW2SW_MTU(hw_mtu);
if (ifp->if_mtu != sw_mtu) {
if_printf(ifp, "Requested MTU %d differs from "
"port MTU %d\n", sw_mtu, (int)ifp->if_mtu);
}
} else {
if_printf(ifp, "Query port MTU, after setting new "
"MTU value, failed\n");
ifp->if_mtu = sw_mtu;
}
return (0);
}
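/*
 * Illustration only, not part of the driver: a minimal sketch of the SW/HW
 * MTU relationship described before mlx5e_set_dev_port_mtu() above. The
 * real MLX5E_SW2HW_MTU() and MLX5E_HW2SW_MTU() macros are defined elsewhere
 * and are not shown in this file. The SKETCH_* names are hypothetical, and
 * the 22-byte overhead (Ethernet header, one VLAN tag, CRC) is an
 * assumption made for this example.
 */

```c
/* Assumed L2 overhead: 14-byte Ethernet header + 4-byte VLAN tag + 4-byte CRC. */
#define SKETCH_L2_OVERHEAD	(14 + 4 + 4)

/* SW MTU (payload only) to HW MTU (payload plus headers and checksum). */
#define SKETCH_SW2HW_MTU(swmtu)	((swmtu) + SKETCH_L2_OVERHEAD)

/* HW MTU back to SW MTU. */
#define SKETCH_HW2SW_MTU(hwmtu)	((hwmtu) - SKETCH_L2_OVERHEAD)
```

/*
 * Under that assumption a 1500-byte SW MTU corresponds to a 1522-byte HW
 * MTU, and converting in both directions returns the original value.
 */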
int
mlx5e_open_locked(struct ifnet *ifp)
{
struct mlx5e_priv *priv = ifp->if_softc;
int err;
/* check if already opened */
if (test_bit(MLX5E_STATE_OPENED, &priv->state) != 0)
return (0);
#ifdef RSS
if (rss_getnumbuckets() > priv->params.num_channels) {
if_printf(ifp, "NOTE: There are more RSS buckets(%u) than "
"channels(%u) available\n", rss_getnumbuckets(),
priv->params.num_channels);
}
#endif
err = mlx5e_open_tises(priv);
if (err) {
if_printf(ifp, "%s: mlx5e_open_tises failed, %d\n",
__func__, err);
return (err);
}
err = mlx5_vport_alloc_q_counter(priv->mdev, &priv->counter_set_id);
if (err) {
if_printf(priv->ifp,
"%s: mlx5_vport_alloc_q_counter failed: %d\n",
__func__, err);
goto err_close_tises;
}
err = mlx5e_open_channels(priv);
if (err) {
if_printf(ifp, "%s: mlx5e_open_channels failed, %d\n",
__func__, err);
goto err_dalloc_q_counter;
}
err = mlx5e_open_rqt(priv);
if (err) {
if_printf(ifp, "%s: mlx5e_open_rqt failed, %d\n",
__func__, err);
goto err_close_channels;
}
err = mlx5e_open_tirs(priv);
if (err) {
if_printf(ifp, "%s: mlx5e_open_tirs failed, %d\n",
__func__, err);
goto err_close_rqls;
}
err = mlx5e_open_flow_table(priv);
if (err) {
if_printf(ifp, "%s: mlx5e_open_flow_table failed, %d\n",
__func__, err);
goto err_close_tirs;
}
err = mlx5e_add_all_vlan_rules(priv);
if (err) {
if_printf(ifp, "%s: mlx5e_add_all_vlan_rules failed, %d\n",
__func__, err);
goto err_close_flow_table;
}
set_bit(MLX5E_STATE_OPENED, &priv->state);
mlx5e_update_carrier(priv);
mlx5e_set_rx_mode_core(priv);
return (0);
err_close_flow_table:
mlx5e_close_flow_table(priv);
err_close_tirs:
mlx5e_close_tirs(priv);
err_close_rqls:
mlx5e_close_rqt(priv);
err_close_channels:
mlx5e_close_channels(priv);
err_dalloc_q_counter:
mlx5_vport_dealloc_q_counter(priv->mdev, priv->counter_set_id);
err_close_tises:
mlx5e_close_tises(priv);
return (err);
}
static void
mlx5e_open(void *arg)
{
struct mlx5e_priv *priv = arg;
PRIV_LOCK(priv);
if (mlx5_set_port_status(priv->mdev, MLX5_PORT_UP))
if_printf(priv->ifp,
"%s: Setting port status to up failed\n",
__func__);
mlx5e_open_locked(priv->ifp);
priv->ifp->if_drv_flags |= IFF_DRV_RUNNING;
PRIV_UNLOCK(priv);
}
int
mlx5e_close_locked(struct ifnet *ifp)
{
struct mlx5e_priv *priv = ifp->if_softc;
/* check if already closed */
if (test_bit(MLX5E_STATE_OPENED, &priv->state) == 0)
return (0);
clear_bit(MLX5E_STATE_OPENED, &priv->state);
mlx5e_set_rx_mode_core(priv);
mlx5e_del_all_vlan_rules(priv);
if_link_state_change(priv->ifp, LINK_STATE_DOWN);
mlx5e_close_flow_table(priv);
mlx5e_close_tirs(priv);
mlx5e_close_rqt(priv);
mlx5e_close_channels(priv);
mlx5_vport_dealloc_q_counter(priv->mdev, priv->counter_set_id);
mlx5e_close_tises(priv);
return (0);
}
#if (__FreeBSD_version >= 1100000)
static uint64_t
mlx5e_get_counter(struct ifnet *ifp, ift_counter cnt)
{
struct mlx5e_priv *priv = ifp->if_softc;
u64 retval;
/* PRIV_LOCK(priv); XXX not allowed */
switch (cnt) {
case IFCOUNTER_IPACKETS:
retval = priv->stats.vport.rx_packets;
break;
case IFCOUNTER_IERRORS:
retval = priv->stats.vport.rx_error_packets;
break;
case IFCOUNTER_IQDROPS:
retval = priv->stats.vport.rx_out_of_buffer;
break;
case IFCOUNTER_OPACKETS:
retval = priv->stats.vport.tx_packets;
break;
case IFCOUNTER_OERRORS:
retval = priv->stats.vport.tx_error_packets;
break;
case IFCOUNTER_IBYTES:
retval = priv->stats.vport.rx_bytes;
break;
case IFCOUNTER_OBYTES:
retval = priv->stats.vport.tx_bytes;
break;
case IFCOUNTER_IMCASTS:
retval = priv->stats.vport.rx_multicast_packets;
break;
case IFCOUNTER_OMCASTS:
retval = priv->stats.vport.tx_multicast_packets;
break;
case IFCOUNTER_OQDROPS:
retval = priv->stats.vport.tx_queue_dropped;
break;
default:
retval = if_get_counter_default(ifp, cnt);
break;
}
/* PRIV_UNLOCK(priv); XXX not allowed */
return (retval);
}
#endif
static void
mlx5e_set_rx_mode(struct ifnet *ifp)
{
struct mlx5e_priv *priv = ifp->if_softc;
schedule_work(&priv->set_rx_mode_work);
}
static int
mlx5e_ioctl(struct ifnet *ifp, u_long command, caddr_t data)
{
struct mlx5e_priv *priv;
struct ifreq *ifr;
struct ifi2creq i2c;
int error = 0;
int mask = 0;
int size_read = 0;
int module_num;
int max_mtu;
uint8_t read_addr;
priv = ifp->if_softc;
/* check if detaching */
if (priv == NULL || priv->gone != 0)
return (ENXIO);
switch (command) {
case SIOCSIFMTU:
ifr = (struct ifreq *)data;
PRIV_LOCK(priv);
mlx5_query_port_max_mtu(priv->mdev, &max_mtu);
if (ifr->ifr_mtu >= MLX5E_MTU_MIN &&
ifr->ifr_mtu <= MIN(MLX5E_MTU_MAX, max_mtu)) {
int was_opened;
was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
if (was_opened)
mlx5e_close_locked(ifp);
/* set new MTU */
mlx5e_set_dev_port_mtu(ifp, ifr->ifr_mtu);
if (was_opened)
mlx5e_open_locked(ifp);
} else {
error = EINVAL;
if_printf(ifp, "Invalid MTU value. Min val: %d, Max val: %d\n",
MLX5E_MTU_MIN, MIN(MLX5E_MTU_MAX, max_mtu));
}
PRIV_UNLOCK(priv);
break;
case SIOCSIFFLAGS:
if ((ifp->if_flags & IFF_UP) &&
(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
mlx5e_set_rx_mode(ifp);
break;
}
PRIV_LOCK(priv);
if (ifp->if_flags & IFF_UP) {
if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) {
if (test_bit(MLX5E_STATE_OPENED, &priv->state) == 0)
mlx5e_open_locked(ifp);
ifp->if_drv_flags |= IFF_DRV_RUNNING;
mlx5_set_port_status(priv->mdev, MLX5_PORT_UP);
}
} else {
if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
mlx5_set_port_status(priv->mdev,
MLX5_PORT_DOWN);
if (test_bit(MLX5E_STATE_OPENED, &priv->state) != 0)
mlx5e_close_locked(ifp);
mlx5e_update_carrier(priv);
ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
}
}
PRIV_UNLOCK(priv);
break;
case SIOCADDMULTI:
case SIOCDELMULTI:
mlx5e_set_rx_mode(ifp);
break;
case SIOCSIFMEDIA:
case SIOCGIFMEDIA:
case SIOCGIFXMEDIA:
ifr = (struct ifreq *)data;
error = ifmedia_ioctl(ifp, ifr, &priv->media, command);
break;
case SIOCSIFCAP:
ifr = (struct ifreq *)data;
PRIV_LOCK(priv);
mask = ifr->ifr_reqcap ^ ifp->if_capenable;
if (mask & IFCAP_TXCSUM) {
ifp->if_capenable ^= IFCAP_TXCSUM;
ifp->if_hwassist ^= (CSUM_TCP | CSUM_UDP | CSUM_IP);
if (IFCAP_TSO4 & ifp->if_capenable &&
!(IFCAP_TXCSUM & ifp->if_capenable)) {
ifp->if_capenable &= ~IFCAP_TSO4;
ifp->if_hwassist &= ~CSUM_IP_TSO;
if_printf(ifp,
"tso4 disabled due to -txcsum.\n");
}
}
if (mask & IFCAP_TXCSUM_IPV6) {
ifp->if_capenable ^= IFCAP_TXCSUM_IPV6;
ifp->if_hwassist ^= (CSUM_UDP_IPV6 | CSUM_TCP_IPV6);
if (IFCAP_TSO6 & ifp->if_capenable &&
!(IFCAP_TXCSUM_IPV6 & ifp->if_capenable)) {
ifp->if_capenable &= ~IFCAP_TSO6;
ifp->if_hwassist &= ~CSUM_IP6_TSO;
if_printf(ifp,
"tso6 disabled due to -txcsum6.\n");
}
}
if (mask & IFCAP_RXCSUM)
ifp->if_capenable ^= IFCAP_RXCSUM;
if (mask & IFCAP_RXCSUM_IPV6)
ifp->if_capenable ^= IFCAP_RXCSUM_IPV6;
if (mask & IFCAP_TSO4) {
if (!(IFCAP_TSO4 & ifp->if_capenable) &&
!(IFCAP_TXCSUM & ifp->if_capenable)) {
if_printf(ifp, "enable txcsum first.\n");
error = EAGAIN;
goto out;
}
ifp->if_capenable ^= IFCAP_TSO4;
ifp->if_hwassist ^= CSUM_IP_TSO;
}
if (mask & IFCAP_TSO6) {
if (!(IFCAP_TSO6 & ifp->if_capenable) &&
!(IFCAP_TXCSUM_IPV6 & ifp->if_capenable)) {
if_printf(ifp, "enable txcsum6 first.\n");
error = EAGAIN;
goto out;
}
ifp->if_capenable ^= IFCAP_TSO6;
ifp->if_hwassist ^= CSUM_IP6_TSO;
}
if (mask & IFCAP_VLAN_HWFILTER) {
if (ifp->if_capenable & IFCAP_VLAN_HWFILTER)
mlx5e_disable_vlan_filter(priv);
else
mlx5e_enable_vlan_filter(priv);
ifp->if_capenable ^= IFCAP_VLAN_HWFILTER;
}
if (mask & IFCAP_VLAN_HWTAGGING)
ifp->if_capenable ^= IFCAP_VLAN_HWTAGGING;
if (mask & IFCAP_WOL_MAGIC)
ifp->if_capenable ^= IFCAP_WOL_MAGIC;
VLAN_CAPABILITIES(ifp);
/* turning off LRO also turns off HW LRO, if it is on */
if (mask & IFCAP_LRO) {
int was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
bool need_restart = false;
ifp->if_capenable ^= IFCAP_LRO;
if (!(ifp->if_capenable & IFCAP_LRO)) {
if (priv->params.hw_lro_en) {
priv->params.hw_lro_en = false;
need_restart = true;
/* Not sure this is the correct way */
priv->params_ethtool.hw_lro = priv->params.hw_lro_en;
}
}
if (was_opened && need_restart) {
mlx5e_close_locked(ifp);
mlx5e_open_locked(ifp);
}
}
out:
PRIV_UNLOCK(priv);
break;
case SIOCGI2C:
ifr = (struct ifreq *)data;
/*
* Copy from the user-space address ifr_data to the
* kernel-space address i2c
*/
error = copyin(ifr->ifr_data, &i2c, sizeof(i2c));
if (error)
break;
if (i2c.len > sizeof(i2c.data)) {
error = EINVAL;
break;
}
PRIV_LOCK(priv);
/* Get module_num which is required for the query_eeprom */
error = mlx5_query_module_num(priv->mdev, &module_num);
if (error) {
if_printf(ifp, "Query module num failed, eeprom "
"reading is not supported\n");
error = EINVAL;
goto err_i2c;
}
/* Check if module is present before doing an access */
if (mlx5_query_module_status(priv->mdev, module_num) !=
MLX5_MODULE_STATUS_PLUGGED) {
error = EINVAL;
goto err_i2c;
}
/*
* Currently 0xA0 and 0xA2 are the only addresses permitted.
* The internal conversion is as follows:
*/
if (i2c.dev_addr == 0xA0)
read_addr = MLX5E_I2C_ADDR_LOW;
else if (i2c.dev_addr == 0xA2)
read_addr = MLX5E_I2C_ADDR_HIGH;
else {
if_printf(ifp, "Query eeprom failed, "
"Invalid Address: %X\n", i2c.dev_addr);
error = EINVAL;
goto err_i2c;
}
error = mlx5_query_eeprom(priv->mdev,
read_addr, MLX5E_EEPROM_LOW_PAGE,
(uint32_t)i2c.offset, (uint32_t)i2c.len, module_num,
(uint32_t *)i2c.data, &size_read);
if (error) {
if_printf(ifp, "Query eeprom failed, eeprom "
"reading is not supported\n");
error = EINVAL;
goto err_i2c;
}
if (i2c.len > MLX5_EEPROM_MAX_BYTES) {
error = mlx5_query_eeprom(priv->mdev,
read_addr, MLX5E_EEPROM_LOW_PAGE,
(uint32_t)(i2c.offset + size_read),
(uint32_t)(i2c.len - size_read), module_num,
(uint32_t *)(i2c.data + size_read), &size_read);
}
if (error) {
if_printf(ifp, "Query eeprom failed, eeprom "
"reading is not supported\n");
error = EINVAL;
goto err_i2c;
}
error = copyout(&i2c, ifr->ifr_data, sizeof(i2c));
err_i2c:
PRIV_UNLOCK(priv);
break;
default:
error = ether_ioctl(ifp, command, data);
break;
}
return (error);
}
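/*
 * Illustration only, not part of the driver: a minimal sketch of the
 * XOR-mask capability handling in the SIOCSIFCAP case of mlx5e_ioctl()
 * above, reduced to two dependent bits. CAP_TXCSUM, CAP_TSO4 and
 * apply_caps() are hypothetical names; the real code works on
 * ifp->if_capenable and also updates ifp->if_hwassist.
 */

```c
#include <errno.h>

#define CAP_TXCSUM	0x1u
#define CAP_TSO4	0x2u

/* Apply the requested capability set; TSO4 is only valid with TXCSUM. */
static int
apply_caps(unsigned *capenable, unsigned reqcap)
{
	/* bits that differ between the requested and the current state */
	unsigned mask = reqcap ^ *capenable;

	if (mask & CAP_TXCSUM) {
		*capenable ^= CAP_TXCSUM;
		/* TSO4 depends on TXCSUM: drop it when TXCSUM goes away */
		if ((*capenable & CAP_TSO4) && !(*capenable & CAP_TXCSUM))
			*capenable &= ~CAP_TSO4;
	}
	if (mask & CAP_TSO4) {
		/* refuse to enable TSO4 while TXCSUM is off */
		if (!(*capenable & CAP_TSO4) && !(*capenable & CAP_TXCSUM))
			return (EAGAIN);
		*capenable ^= CAP_TSO4;
	}
	return (0);
}
```

/*
 * The XOR yields exactly the bits whose state must change, so each
 * capability is toggled at most once per request and untouched bits keep
 * their value.
 */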
static int
mlx5e_check_required_hca_cap(struct mlx5_core_dev *mdev)
{
/*
* TODO: uncomment once FW really sets all these bits if
* (!mdev->caps.eth.rss_ind_tbl_cap || !mdev->caps.eth.csum_cap ||
* !mdev->caps.eth.max_lso_cap || !mdev->caps.eth.vlan_cap ||
* !(mdev->caps.gen.flags & MLX5_DEV_CAP_FLAG_SCQE_BRK_MOD)) return
* -ENOTSUPP;
*/
/* TODO: add more must-have features */
return (0);
}
static void
mlx5e_build_ifp_priv(struct mlx5_core_dev *mdev,
struct mlx5e_priv *priv,
int num_comp_vectors)
{
/*
* TODO: Consider link speed for setting "log_sq_size",
* "log_rq_size" and "cq_moderation_xxx":
*/
priv->params.log_sq_size =
MLX5E_PARAMS_DEFAULT_LOG_SQ_SIZE;
priv->params.log_rq_size =
MLX5E_PARAMS_DEFAULT_LOG_RQ_SIZE;
priv->params.rx_cq_moderation_usec =
MLX5_CAP_GEN(mdev, cq_period_start_from_cqe) ?
MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC_FROM_CQE :
MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC;
priv->params.rx_cq_moderation_mode =
MLX5_CAP_GEN(mdev, cq_period_start_from_cqe) ? 1 : 0;
priv->params.rx_cq_moderation_pkts =
MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_PKTS;
priv->params.tx_cq_moderation_usec =
MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC;
priv->params.tx_cq_moderation_pkts =
MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_PKTS;
priv->params.min_rx_wqes =
MLX5E_PARAMS_DEFAULT_MIN_RX_WQES;
priv->params.rx_hash_log_tbl_sz =
(order_base_2(num_comp_vectors) >
MLX5E_PARAMS_DEFAULT_RX_HASH_LOG_TBL_SZ) ?
order_base_2(num_comp_vectors) :
MLX5E_PARAMS_DEFAULT_RX_HASH_LOG_TBL_SZ;
priv->params.num_tc = 1;
priv->params.default_vlan_prio = 0;
priv->counter_set_id = -1;
/*
* HW LRO is currently off by default. Once that changes, we
* will consider the HW capability: "!!MLX5_CAP_ETH(mdev, lro_cap)"
*/
priv->params.hw_lro_en = false;
priv->params.lro_wqe_sz = MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ;
priv->params.cqe_zipping_en = !!MLX5_CAP_GEN(mdev, cqe_compression);
priv->mdev = mdev;
priv->params.num_channels = num_comp_vectors;
priv->order_base_2_num_channels = order_base_2(num_comp_vectors);
priv->queue_mapping_channel_mask =
roundup_pow_of_two(num_comp_vectors) - 1;
priv->num_tc = priv->params.num_tc;
priv->default_vlan_prio = priv->params.default_vlan_prio;
INIT_WORK(&priv->update_stats_work, mlx5e_update_stats_work);
INIT_WORK(&priv->update_carrier_work, mlx5e_update_carrier_work);
INIT_WORK(&priv->set_rx_mode_work, mlx5e_set_rx_mode_work);
}
static int
mlx5e_create_mkey(struct mlx5e_priv *priv, u32 pdn,
struct mlx5_core_mr *mr)
{
struct ifnet *ifp = priv->ifp;
struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5_create_mkey_mbox_in *in;
int err;
in = mlx5_vzalloc(sizeof(*in));
if (in == NULL) {
if_printf(ifp, "%s: failed to allocate inbox\n", __func__);
return (-ENOMEM);
}
in->seg.flags = MLX5_PERM_LOCAL_WRITE |
MLX5_PERM_LOCAL_READ |
MLX5_ACCESS_MODE_PA;
in->seg.flags_pd = cpu_to_be32(pdn | MLX5_MKEY_LEN64);
in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8);
err = mlx5_core_create_mkey(mdev, mr, in, sizeof(*in), NULL, NULL,
NULL);
if (err)
if_printf(ifp, "%s: mlx5_core_create_mkey failed, %d\n",
__func__, err);
kvfree(in);
return (err);
}
static const char *mlx5e_vport_stats_desc[] = {
MLX5E_VPORT_STATS(MLX5E_STATS_DESC)
};
static const char *mlx5e_pport_stats_desc[] = {
MLX5E_PPORT_STATS(MLX5E_STATS_DESC)
};
static void
mlx5e_priv_mtx_init(struct mlx5e_priv *priv)
{
mtx_init(&priv->async_events_mtx, "mlx5async", MTX_NETWORK_LOCK, MTX_DEF);
sx_init(&priv->state_lock, "mlx5state");
callout_init_mtx(&priv->watchdog, &priv->async_events_mtx, 0);
}
static void
mlx5e_priv_mtx_destroy(struct mlx5e_priv *priv)
{
mtx_destroy(&priv->async_events_mtx);
sx_destroy(&priv->state_lock);
}
static int
sysctl_firmware(SYSCTL_HANDLER_ARGS)
{
/*
* The string format is "%d.%d.%d".
* fw_rev_{maj,min,sub} return u16, so the largest value is 65535.
* We need at most 5 chars to store each of them.
* The string also has two "." and a NUL terminator, which means we
* need 18 (5*3 + 3) chars at most.
*/
char fw[18];
struct mlx5e_priv *priv = arg1;
int error;
snprintf(fw, sizeof(fw), "%d.%d.%d", fw_rev_maj(priv->mdev), fw_rev_min(priv->mdev),
fw_rev_sub(priv->mdev));
error = sysctl_handle_string(oidp, fw, sizeof(fw), req);
return (error);
}
static void
mlx5e_add_hw_stats(struct mlx5e_priv *priv)
{
SYSCTL_ADD_PROC(&priv->sysctl_ctx, SYSCTL_CHILDREN(priv->sysctl_hw),
OID_AUTO, "fw_version", CTLTYPE_STRING | CTLFLAG_RD, priv, 0,
sysctl_firmware, "A", "HCA firmware version");
SYSCTL_ADD_STRING(&priv->sysctl_ctx, SYSCTL_CHILDREN(priv->sysctl_hw),
OID_AUTO, "board_id", CTLFLAG_RD, priv->mdev->board_id, 0,
"Board ID");
}
static void
mlx5e_setup_pauseframes(struct mlx5e_priv *priv)
{
#if (__FreeBSD_version < 1100000)
char path[64];
#endif
/* Only receiving pauseframes is enabled by default */
priv->params.tx_pauseframe_control = 0;
priv->params.rx_pauseframe_control = 1;
#if (__FreeBSD_version < 1100000)
/* compute path for sysctl */
snprintf(path, sizeof(path), "dev.mce.%d.tx_pauseframe_control",
device_get_unit(priv->mdev->pdev->dev.bsddev));
/* try to fetch tunable, if any */
TUNABLE_INT_FETCH(path, &priv->params.tx_pauseframe_control);
/* compute path for sysctl */
snprintf(path, sizeof(path), "dev.mce.%d.rx_pauseframe_control",
device_get_unit(priv->mdev->pdev->dev.bsddev));
/* try to fetch tunable, if any */
TUNABLE_INT_FETCH(path, &priv->params.rx_pauseframe_control);
#endif
/* register pauseframe SYSCTLs */
SYSCTL_ADD_INT(&priv->sysctl_ctx, SYSCTL_CHILDREN(priv->sysctl_ifnet),
OID_AUTO, "tx_pauseframe_control", CTLFLAG_RDTUN,
&priv->params.tx_pauseframe_control, 0,
"Set to enable TX pause frames. Clear to disable.");
SYSCTL_ADD_INT(&priv->sysctl_ctx, SYSCTL_CHILDREN(priv->sysctl_ifnet),
OID_AUTO, "rx_pauseframe_control", CTLFLAG_RDTUN,
&priv->params.rx_pauseframe_control, 0,
"Set to enable RX pause frames. Clear to disable.");
/* range check */
priv->params.tx_pauseframe_control =
priv->params.tx_pauseframe_control ? 1 : 0;
priv->params.rx_pauseframe_control =
priv->params.rx_pauseframe_control ? 1 : 0;
/* update firmware */
mlx5_set_port_pause(priv->mdev, 1,
priv->params.rx_pauseframe_control,
priv->params.tx_pauseframe_control);
}
static void *
mlx5e_create_ifp(struct mlx5_core_dev *mdev)
{
static volatile int mlx5_en_unit;
struct ifnet *ifp;
struct mlx5e_priv *priv;
u8 dev_addr[ETHER_ADDR_LEN] __aligned(4);
struct sysctl_oid_list *child;
int ncv = mdev->priv.eq_table.num_comp_vectors;
char unit[16];
int err;
int i;
u32 eth_proto_cap;
if (mlx5e_check_required_hca_cap(mdev)) {
mlx5_core_dbg(mdev, "mlx5e_check_required_hca_cap() failed\n");
return (NULL);
}
priv = malloc(sizeof(*priv), M_MLX5EN, M_WAITOK | M_ZERO);
if (priv == NULL) {
mlx5_core_err(mdev, "malloc() failed\n");
return (NULL);
}
mlx5e_priv_mtx_init(priv);
ifp = priv->ifp = if_alloc(IFT_ETHER);
if (ifp == NULL) {
mlx5_core_err(mdev, "if_alloc() failed\n");
goto err_free_priv;
}
ifp->if_softc = priv;
if_initname(ifp, "mce", atomic_fetchadd_int(&mlx5_en_unit, 1));
ifp->if_mtu = ETHERMTU;
ifp->if_init = mlx5e_open;
ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
ifp->if_ioctl = mlx5e_ioctl;
ifp->if_transmit = mlx5e_xmit;
ifp->if_qflush = if_qflush;
#if (__FreeBSD_version >= 1100000)
ifp->if_get_counter = mlx5e_get_counter;
#endif
ifp->if_snd.ifq_maxlen = ifqmaxlen;
/*
* Set driver features
*/
ifp->if_capabilities |= IFCAP_HWCSUM | IFCAP_HWCSUM_IPV6;
ifp->if_capabilities |= IFCAP_VLAN_MTU | IFCAP_VLAN_HWTAGGING;
ifp->if_capabilities |= IFCAP_VLAN_HWCSUM | IFCAP_VLAN_HWFILTER;
ifp->if_capabilities |= IFCAP_LINKSTATE | IFCAP_JUMBO_MTU;
ifp->if_capabilities |= IFCAP_LRO;
ifp->if_capabilities |= IFCAP_TSO | IFCAP_VLAN_HWTSO;
/* set TSO limits so that we don't have to drop TX packets */
ifp->if_hw_tsomax = MLX5E_MAX_TX_PAYLOAD_SIZE - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);
ifp->if_hw_tsomaxsegcount = MLX5E_MAX_TX_MBUF_FRAGS - 1 /* hdr */;
ifp->if_hw_tsomaxsegsize = MLX5E_MAX_TX_MBUF_SIZE;
ifp->if_capenable = ifp->if_capabilities;
ifp->if_hwassist = 0;
if (ifp->if_capenable & IFCAP_TSO)
ifp->if_hwassist |= CSUM_TSO;
if (ifp->if_capenable & IFCAP_TXCSUM)
ifp->if_hwassist |= (CSUM_TCP | CSUM_UDP | CSUM_IP);
if (ifp->if_capenable & IFCAP_TXCSUM_IPV6)
ifp->if_hwassist |= (CSUM_UDP_IPV6 | CSUM_TCP_IPV6);
/* ifnet sysctl tree */
sysctl_ctx_init(&priv->sysctl_ctx);
priv->sysctl_ifnet = SYSCTL_ADD_NODE(&priv->sysctl_ctx, SYSCTL_STATIC_CHILDREN(_dev),
OID_AUTO, ifp->if_dname, CTLFLAG_RD, 0, "MLX5 ethernet - interface name");
if (priv->sysctl_ifnet == NULL) {
mlx5_core_err(mdev, "SYSCTL_ADD_NODE() failed\n");
goto err_free_sysctl;
}
snprintf(unit, sizeof(unit), "%d", ifp->if_dunit);
priv->sysctl_ifnet = SYSCTL_ADD_NODE(&priv->sysctl_ctx, SYSCTL_CHILDREN(priv->sysctl_ifnet),
OID_AUTO, unit, CTLFLAG_RD, 0, "MLX5 ethernet - interface unit");
if (priv->sysctl_ifnet == NULL) {
mlx5_core_err(mdev, "SYSCTL_ADD_NODE() failed\n");
goto err_free_sysctl;
}
/* HW sysctl tree */
child = SYSCTL_CHILDREN(device_get_sysctl_tree(mdev->pdev->dev.bsddev));
priv->sysctl_hw = SYSCTL_ADD_NODE(&priv->sysctl_ctx, child,
OID_AUTO, "hw", CTLFLAG_RD, 0, "MLX5 ethernet dev hw");
if (priv->sysctl_hw == NULL) {
mlx5_core_err(mdev, "SYSCTL_ADD_NODE() failed\n");
goto err_free_sysctl;
}
mlx5e_build_ifp_priv(mdev, priv, ncv);
err = mlx5_alloc_map_uar(mdev, &priv->cq_uar);
if (err) {
if_printf(ifp, "%s: mlx5_alloc_map_uar failed, %d\n",
__func__, err);
goto err_free_sysctl;
}
err = mlx5_core_alloc_pd(mdev, &priv->pdn);
if (err) {
if_printf(ifp, "%s: mlx5_core_alloc_pd failed, %d\n",
__func__, err);
goto err_unmap_free_uar;
}
err = mlx5_alloc_transport_domain(mdev, &priv->tdn);
if (err) {
if_printf(ifp, "%s: mlx5_alloc_transport_domain failed, %d\n",
__func__, err);
goto err_dealloc_pd;
}
err = mlx5e_create_mkey(priv, priv->pdn, &priv->mr);
if (err) {
if_printf(ifp, "%s: mlx5e_create_mkey failed, %d\n",
__func__, err);
goto err_dealloc_transport_domain;
}
mlx5_query_nic_vport_mac_address(priv->mdev, 0, dev_addr);
+ /* check if we should generate a random MAC address */
+ if (MLX5_CAP_GEN(priv->mdev, vport_group_manager) == 0 &&
+ is_zero_ether_addr(dev_addr)) {
+ random_ether_addr(dev_addr);
+ if_printf(ifp, "Assigned random MAC address\n");
+ }
+
/* set default MTU */
mlx5e_set_dev_port_mtu(ifp, ifp->if_mtu);
/* Set desc */
device_set_desc(mdev->pdev->dev.bsddev, mlx5e_version);
/* Set default media status */
priv->media_status_last = IFM_AVALID;
priv->media_active_last = IFM_ETHER | IFM_AUTO |
IFM_ETH_RXPAUSE | IFM_FDX;
/* setup default pauseframes configuration */
mlx5e_setup_pauseframes(priv);
err = mlx5_query_port_proto_cap(mdev, &eth_proto_cap, MLX5_PTYS_EN);
if (err) {
eth_proto_cap = 0;
if_printf(ifp, "%s: Query port media capability failed, %d\n",
__func__, err);
}
/* Setup supported medias */
ifmedia_init(&priv->media, IFM_IMASK | IFM_ETH_FMASK,
mlx5e_media_change, mlx5e_media_status);
for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i) {
if (mlx5e_mode_table[i].baudrate == 0)
continue;
if (MLX5E_PROT_MASK(i) & eth_proto_cap) {
ifmedia_add(&priv->media,
mlx5e_mode_table[i].subtype |
IFM_ETHER, 0, NULL);
ifmedia_add(&priv->media,
mlx5e_mode_table[i].subtype |
IFM_ETHER | IFM_FDX |
IFM_ETH_RXPAUSE | IFM_ETH_TXPAUSE, 0, NULL);
}
}
ifmedia_add(&priv->media, IFM_ETHER | IFM_AUTO, 0, NULL);
ifmedia_add(&priv->media, IFM_ETHER | IFM_AUTO | IFM_FDX |
IFM_ETH_RXPAUSE | IFM_ETH_TXPAUSE, 0, NULL);
/* Set autoselect by default */
ifmedia_set(&priv->media, IFM_ETHER | IFM_AUTO | IFM_FDX |
IFM_ETH_RXPAUSE | IFM_ETH_TXPAUSE);
ether_ifattach(ifp, dev_addr);
/* Register for VLAN events */
priv->vlan_attach = EVENTHANDLER_REGISTER(vlan_config,
mlx5e_vlan_rx_add_vid, priv, EVENTHANDLER_PRI_FIRST);
priv->vlan_detach = EVENTHANDLER_REGISTER(vlan_unconfig,
mlx5e_vlan_rx_kill_vid, priv, EVENTHANDLER_PRI_FIRST);
/* Link is down by default */
if_link_state_change(ifp, LINK_STATE_DOWN);
mlx5e_enable_async_events(priv);
mlx5e_add_hw_stats(priv);
mlx5e_create_stats(&priv->stats.vport.ctx, SYSCTL_CHILDREN(priv->sysctl_ifnet),
"vstats", mlx5e_vport_stats_desc, MLX5E_VPORT_STATS_NUM,
priv->stats.vport.arg);
mlx5e_create_stats(&priv->stats.pport.ctx, SYSCTL_CHILDREN(priv->sysctl_ifnet),
"pstats", mlx5e_pport_stats_desc, MLX5E_PPORT_STATS_NUM,
priv->stats.pport.arg);
mlx5e_create_ethtool(priv);
mtx_lock(&priv->async_events_mtx);
mlx5e_update_stats(priv);
mtx_unlock(&priv->async_events_mtx);
return (priv);
err_dealloc_transport_domain:
mlx5_dealloc_transport_domain(mdev, priv->tdn);
err_dealloc_pd:
mlx5_core_dealloc_pd(mdev, priv->pdn);
err_unmap_free_uar:
mlx5_unmap_free_uar(mdev, &priv->cq_uar);
err_free_sysctl:
sysctl_ctx_free(&priv->sysctl_ctx);
if_free(ifp);
err_free_priv:
mlx5e_priv_mtx_destroy(priv);
free(priv, M_MLX5EN);
return (NULL);
}
static void
mlx5e_destroy_ifp(struct mlx5_core_dev *mdev, void *vpriv)
{
struct mlx5e_priv *priv = vpriv;
struct ifnet *ifp = priv->ifp;
/* don't allow more IOCTLs */
priv->gone = 1;
/* XXX wait a bit to allow IOCTL handlers to complete */
pause("W", hz);
/* stop watchdog timer */
callout_drain(&priv->watchdog);
if (priv->vlan_attach != NULL)
EVENTHANDLER_DEREGISTER(vlan_config, priv->vlan_attach);
if (priv->vlan_detach != NULL)
EVENTHANDLER_DEREGISTER(vlan_unconfig, priv->vlan_detach);
/* make sure device gets closed */
PRIV_LOCK(priv);
mlx5e_close_locked(ifp);
PRIV_UNLOCK(priv);
/* unregister device */
ifmedia_removeall(&priv->media);
ether_ifdetach(ifp);
if_free(ifp);
/* destroy all remaining sysctl nodes */
if (priv->sysctl_debug)
sysctl_ctx_free(&priv->stats.port_stats_debug.ctx);
sysctl_ctx_free(&priv->stats.vport.ctx);
sysctl_ctx_free(&priv->stats.pport.ctx);
sysctl_ctx_free(&priv->sysctl_ctx);
mlx5_core_destroy_mkey(priv->mdev, &priv->mr);
mlx5_dealloc_transport_domain(priv->mdev, priv->tdn);
mlx5_core_dealloc_pd(priv->mdev, priv->pdn);
mlx5_unmap_free_uar(priv->mdev, &priv->cq_uar);
mlx5e_disable_async_events(priv);
flush_scheduled_work();
mlx5e_priv_mtx_destroy(priv);
free(priv, M_MLX5EN);
}
static void *
mlx5e_get_ifp(void *vpriv)
{
struct mlx5e_priv *priv = vpriv;
return (priv->ifp);
}
static struct mlx5_interface mlx5e_interface = {
.add = mlx5e_create_ifp,
.remove = mlx5e_destroy_ifp,
.event = mlx5e_async_event,
.protocol = MLX5_INTERFACE_PROTOCOL_ETH,
.get_dev = mlx5e_get_ifp,
};
void
mlx5e_init(void)
{
mlx5_register_interface(&mlx5e_interface);
}
void
mlx5e_cleanup(void)
{
mlx5_unregister_interface(&mlx5e_interface);
}
module_init_order(mlx5e_init, SI_ORDER_THIRD);
module_exit_order(mlx5e_cleanup, SI_ORDER_THIRD);
#if (__FreeBSD_version >= 1100000)
MODULE_DEPEND(mlx5en, linuxkpi, 1, 1, 1);
#endif
MODULE_DEPEND(mlx5en, mlx5, 1, 1, 1);
MODULE_VERSION(mlx5en, 1);
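The hunk above adds a fallback in mlx5e_create_ifp(): when the device is not the vport group manager and the queried MAC address is all zero, a random address is assigned before ether_ifattach(). The following is a minimal user-space sketch of that decision, not the driver code. The names mac_is_zero(), mac_make_random() and mac_fixup() are hypothetical, rand() stands in for the kernel's random source, and the sketch assumes the conventional behaviour of such helpers: clear the multicast bit and set the locally administered bit in the first octet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAC_LEN 6	/* same value as ETHER_ADDR_LEN */

/* Return true when all six bytes of the address are zero. */
static bool
mac_is_zero(const uint8_t addr[MAC_LEN])
{
	uint8_t acc = 0;
	int i;

	assert(addr != NULL);
	for (i = 0; i < MAC_LEN; i++)
		acc |= addr[i];
	return (acc == 0);
}

/*
 * Fill the address with pseudo-random bytes (rand() is only a
 * placeholder here), then force a unicast, locally administered
 * address by adjusting the two low bits of the first octet.
 */
static void
mac_make_random(uint8_t addr[MAC_LEN])
{
	int i;

	assert(addr != NULL);
	for (i = 0; i < MAC_LEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);
	addr[0] &= 0xfe;	/* clear the multicast bit */
	addr[0] |= 0x02;	/* set the locally administered bit */
}

/*
 * Mirror of the condition in the diff: only replace the address
 * when the function is not the vport group manager and the queried
 * address is zero. Returns true when a random address was assigned.
 */
static bool
mac_fixup(uint8_t addr[MAC_LEN], bool is_vport_group_manager)
{
	if (is_vport_group_manager || !mac_is_zero(addr))
		return (false);
	mac_make_random(addr);
	return (true);
}
```

Because the locally administered bit is always set, the resulting address can never be all zero again, so a second call to mac_fixup() on the same buffer leaves it untouched.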
Index: projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_rx.c
===================================================================
--- projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_rx.c (revision 301546)
+++ projects/vnet/sys/dev/mlx5/mlx5_en/mlx5_en_rx.c (revision 301547)
@@ -1,444 +1,444 @@
/*-
* Copyright (c) 2015 Mellanox Technologies. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS `AS IS' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $FreeBSD$
*/
#include "en.h"
#include
static inline int
mlx5e_alloc_rx_wqe(struct mlx5e_rq *rq,
struct mlx5e_rx_wqe *wqe, u16 ix)
{
bus_dma_segment_t segs[1];
struct mbuf *mb;
int nsegs;
int err;
if (rq->mbuf[ix].mbuf != NULL)
return (0);
mb = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, rq->wqe_sz);
if (unlikely(!mb))
return (-ENOMEM);
/* set initial mbuf length */
mb->m_pkthdr.len = mb->m_len = rq->wqe_sz;
/* get IP header aligned */
m_adj(mb, MLX5E_NET_IP_ALIGN);
err = -bus_dmamap_load_mbuf_sg(rq->dma_tag, rq->mbuf[ix].dma_map,
mb, segs, &nsegs, BUS_DMA_NOWAIT);
if (err != 0)
goto err_free_mbuf;
if (unlikely(nsegs != 1)) {
bus_dmamap_unload(rq->dma_tag, rq->mbuf[ix].dma_map);
err = -ENOMEM;
goto err_free_mbuf;
}
wqe->data.addr = cpu_to_be64(segs[0].ds_addr);
rq->mbuf[ix].mbuf = mb;
rq->mbuf[ix].data = mb->m_data;
bus_dmamap_sync(rq->dma_tag, rq->mbuf[ix].dma_map,
BUS_DMASYNC_PREREAD);
return (0);
err_free_mbuf:
m_freem(mb);
return (err);
}
static void
mlx5e_post_rx_wqes(struct mlx5e_rq *rq)
{
if (unlikely(rq->enabled == 0))
return;
while (!mlx5_wq_ll_is_full(&rq->wq)) {
struct mlx5e_rx_wqe *wqe = mlx5_wq_ll_get_wqe(&rq->wq, rq->wq.head);
if (unlikely(mlx5e_alloc_rx_wqe(rq, wqe, rq->wq.head)))
break;
mlx5_wq_ll_push(&rq->wq, be16_to_cpu(wqe->next.next_wqe_index));
}
/* ensure wqes are visible to device before updating doorbell record */
wmb();
mlx5_wq_ll_update_db_record(&rq->wq);
}
static void
mlx5e_lro_update_hdr(struct mbuf *mb, struct mlx5_cqe64 *cqe)
{
/* TODO: consider vlans, ip options, ... */
struct ether_header *eh;
uint16_t eh_type;
uint16_t tot_len;
struct ip6_hdr *ip6 = NULL;
struct ip *ip4 = NULL;
struct tcphdr *th;
uint32_t *ts_ptr;
uint8_t l4_hdr_type;
int tcp_ack;
eh = mtod(mb, struct ether_header *);
eh_type = ntohs(eh->ether_type);
l4_hdr_type = get_cqe_l4_hdr_type(cqe);
tcp_ack = ((CQE_L4_HDR_TYPE_TCP_ACK_NO_DATA == l4_hdr_type) ||
(CQE_L4_HDR_TYPE_TCP_ACK_AND_DATA == l4_hdr_type));
/* TODO: consider vlan */
tot_len = be32_to_cpu(cqe->byte_cnt) - ETHER_HDR_LEN;
switch (eh_type) {
case ETHERTYPE_IP:
ip4 = (struct ip *)(eh + 1);
th = (struct tcphdr *)(ip4 + 1);
break;
case ETHERTYPE_IPV6:
ip6 = (struct ip6_hdr *)(eh + 1);
th = (struct tcphdr *)(ip6 + 1);
break;
default:
return;
}
ts_ptr = (uint32_t *)(th + 1);
if (get_cqe_lro_tcppsh(cqe))
th->th_flags |= TH_PUSH;
if (tcp_ack) {
th->th_flags |= TH_ACK;
th->th_ack = cqe->lro_ack_seq_num;
th->th_win = cqe->lro_tcp_win;
/*
* FreeBSD handles only 32bit aligned timestamp right after
* the TCP hdr
* +--------+--------+--------+--------+
* | NOP | NOP | TSopt | 10 |
* +--------+--------+--------+--------+
* | TSval timestamp |
* +--------+--------+--------+--------+
* | TSecr timestamp |
* +--------+--------+--------+--------+
*/
if (get_cqe_lro_timestamp_valid(cqe) &&
(__predict_true(*ts_ptr) == ntohl(TCPOPT_NOP << 24 |
TCPOPT_NOP << 16 | TCPOPT_TIMESTAMP << 8 |
TCPOLEN_TIMESTAMP))) {
/*
* cqe->timestamp is 64bit long.
* [0-31] - timestamp.
* [32-63] - timestamp echo reply.
*/
ts_ptr[1] = *(uint32_t *)&cqe->timestamp;
ts_ptr[2] = *((uint32_t *)&cqe->timestamp + 1);
}
}
if (ip4) {
ip4->ip_ttl = cqe->lro_min_ttl;
ip4->ip_len = cpu_to_be16(tot_len);
ip4->ip_sum = 0;
ip4->ip_sum = in_cksum(mb, ip4->ip_hl << 2);
} else {
ip6->ip6_hlim = cqe->lro_min_ttl;
ip6->ip6_plen = cpu_to_be16(tot_len -
sizeof(struct ip6_hdr));
}
/* TODO: handle tcp checksum */
}
static inline void
mlx5e_build_rx_mbuf(struct mlx5_cqe64 *cqe,
struct mlx5e_rq *rq, struct mbuf *mb,
u32 cqe_bcnt)
{
struct ifnet *ifp = rq->ifp;
int lro_num_seg; /* HW LRO session aggregated packets counter */
lro_num_seg = be32_to_cpu(cqe->srqn) >> 24;
if (lro_num_seg > 1) {
mlx5e_lro_update_hdr(mb, cqe);
rq->stats.lro_packets++;
rq->stats.lro_bytes += cqe_bcnt;
}
mb->m_pkthdr.len = mb->m_len = cqe_bcnt;
/* check if a Toeplitz hash was computed */
if (cqe->rss_hash_type != 0) {
mb->m_pkthdr.flowid = be32_to_cpu(cqe->rss_hash_result);
#ifdef RSS
/* decode the RSS hash type */
switch (cqe->rss_hash_type &
(CQE_RSS_DST_HTYPE_L4 | CQE_RSS_DST_HTYPE_IP)) {
/* IPv4 */
case (CQE_RSS_DST_HTYPE_TCP | CQE_RSS_DST_HTYPE_IPV4):
M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_TCP_IPV4);
break;
case (CQE_RSS_DST_HTYPE_UDP | CQE_RSS_DST_HTYPE_IPV4):
M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_UDP_IPV4);
break;
case CQE_RSS_DST_HTYPE_IPV4:
M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_IPV4);
break;
/* IPv6 */
case (CQE_RSS_DST_HTYPE_TCP | CQE_RSS_DST_HTYPE_IPV6):
M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_TCP_IPV6);
break;
case (CQE_RSS_DST_HTYPE_UDP | CQE_RSS_DST_HTYPE_IPV6):
M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_UDP_IPV6);
break;
case CQE_RSS_DST_HTYPE_IPV6:
M_HASHTYPE_SET(mb, M_HASHTYPE_RSS_IPV6);
break;
default: /* Other */
- M_HASHTYPE_SET(mb, M_HASHTYPE_OPAQUE);
+ M_HASHTYPE_SET(mb, M_HASHTYPE_OPAQUE_HASH);
break;
}
#else
- M_HASHTYPE_SET(mb, M_HASHTYPE_OPAQUE);
+ M_HASHTYPE_SET(mb, M_HASHTYPE_OPAQUE_HASH);
#endif
} else {
mb->m_pkthdr.flowid = rq->ix;
M_HASHTYPE_SET(mb, M_HASHTYPE_OPAQUE);
}
mb->m_pkthdr.rcvif = ifp;
if (likely(ifp->if_capenable & (IFCAP_RXCSUM | IFCAP_RXCSUM_IPV6)) &&
((cqe->hds_ip_ext & (CQE_L2_OK | CQE_L3_OK | CQE_L4_OK)) ==
(CQE_L2_OK | CQE_L3_OK | CQE_L4_OK))) {
mb->m_pkthdr.csum_flags =
CSUM_IP_CHECKED | CSUM_IP_VALID |
CSUM_DATA_VALID | CSUM_PSEUDO_HDR;
mb->m_pkthdr.csum_data = htons(0xffff);
} else {
rq->stats.csum_none++;
}
if (cqe_has_vlan(cqe)) {
mb->m_pkthdr.ether_vtag = be16_to_cpu(cqe->vlan_info);
mb->m_flags |= M_VLANTAG;
}
}
static inline void
mlx5e_read_cqe_slot(struct mlx5e_cq *cq, u32 cc, void *data)
{
memcpy(data, mlx5_cqwq_get_wqe(&cq->wq, (cc & cq->wq.sz_m1)),
sizeof(struct mlx5_cqe64));
}
static inline void
mlx5e_write_cqe_slot(struct mlx5e_cq *cq, u32 cc, void *data)
{
memcpy(mlx5_cqwq_get_wqe(&cq->wq, cc & cq->wq.sz_m1),
data, sizeof(struct mlx5_cqe64));
}
static inline void
mlx5e_decompress_cqe(struct mlx5e_cq *cq, struct mlx5_cqe64 *title,
struct mlx5_mini_cqe8 *mini,
u16 wqe_counter, int i)
{
/*
* NOTE: The fields which are not set here are copied from the
* initial and common title. See memcpy() in
* mlx5e_write_cqe_slot().
*/
title->byte_cnt = mini->byte_cnt;
title->wqe_counter = cpu_to_be16((wqe_counter + i) & cq->wq.sz_m1);
title->check_sum = mini->checksum;
title->op_own = (title->op_own & 0xf0) |
(((cq->wq.cc + i) >> cq->wq.log_sz) & 1);
}
#define MLX5E_MINI_ARRAY_SZ 8
/* Make sure structs are not packed differently */
CTASSERT(sizeof(struct mlx5_cqe64) ==
sizeof(struct mlx5_mini_cqe8) * MLX5E_MINI_ARRAY_SZ);
static void
mlx5e_decompress_cqes(struct mlx5e_cq *cq)
{
struct mlx5_mini_cqe8 mini_array[MLX5E_MINI_ARRAY_SZ];
struct mlx5_cqe64 title;
u32 cqe_count;
u32 i = 0;
u16 title_wqe_counter;
mlx5e_read_cqe_slot(cq, cq->wq.cc, &title);
title_wqe_counter = be16_to_cpu(title.wqe_counter);
cqe_count = be32_to_cpu(title.byte_cnt);
/* Make sure we won't overflow */
KASSERT(cqe_count <= cq->wq.sz_m1,
("%s: cqe_count %u > cq->wq.sz_m1 %u", __func__,
cqe_count, cq->wq.sz_m1));
mlx5e_read_cqe_slot(cq, cq->wq.cc + 1, mini_array);
while (true) {
mlx5e_decompress_cqe(cq, &title,
&mini_array[i % MLX5E_MINI_ARRAY_SZ],
title_wqe_counter, i);
mlx5e_write_cqe_slot(cq, cq->wq.cc + i, &title);
i++;
if (i == cqe_count)
break;
if (i % MLX5E_MINI_ARRAY_SZ == 0)
mlx5e_read_cqe_slot(cq, cq->wq.cc + i, mini_array);
}
}
static int
mlx5e_poll_rx_cq(struct mlx5e_rq *rq, int budget)
{
int i;
for (i = 0; i < budget; i++) {
struct mlx5e_rx_wqe *wqe;
struct mlx5_cqe64 *cqe;
struct mbuf *mb;
__be16 wqe_counter_be;
u16 wqe_counter;
u32 byte_cnt;
cqe = mlx5e_get_cqe(&rq->cq);
if (!cqe)
break;
if (mlx5_get_cqe_format(cqe) == MLX5_COMPRESSED)
mlx5e_decompress_cqes(&rq->cq);
mlx5_cqwq_pop(&rq->cq.wq);
wqe_counter_be = cqe->wqe_counter;
wqe_counter = be16_to_cpu(wqe_counter_be);
wqe = mlx5_wq_ll_get_wqe(&rq->wq, wqe_counter);
byte_cnt = be32_to_cpu(cqe->byte_cnt);
bus_dmamap_sync(rq->dma_tag,
rq->mbuf[wqe_counter].dma_map,
BUS_DMASYNC_POSTREAD);
if (unlikely((cqe->op_own >> 4) != MLX5_CQE_RESP_SEND)) {
rq->stats.wqe_err++;
goto wq_ll_pop;
}
if (MHLEN >= byte_cnt &&
(mb = m_gethdr(M_NOWAIT, MT_DATA)) != NULL) {
bcopy(rq->mbuf[wqe_counter].data, mtod(mb, caddr_t),
byte_cnt);
} else {
mb = rq->mbuf[wqe_counter].mbuf;
rq->mbuf[wqe_counter].mbuf = NULL; /* safety clear */
bus_dmamap_unload(rq->dma_tag,
rq->mbuf[wqe_counter].dma_map);
}
mlx5e_build_rx_mbuf(cqe, rq, mb, byte_cnt);
rq->stats.packets++;
#ifdef HAVE_TURBO_LRO
if (mb->m_pkthdr.csum_flags == 0 ||
(rq->ifp->if_capenable & IFCAP_LRO) == 0 ||
rq->lro.mbuf == NULL) {
/* normal input */
rq->ifp->if_input(rq->ifp, mb);
} else {
tcp_tlro_rx(&rq->lro, mb);
}
#else
if (mb->m_pkthdr.csum_flags == 0 ||
(rq->ifp->if_capenable & IFCAP_LRO) == 0 ||
rq->lro.lro_cnt == 0 ||
tcp_lro_rx(&rq->lro, mb, 0) != 0) {
rq->ifp->if_input(rq->ifp, mb);
}
#endif
wq_ll_pop:
mlx5_wq_ll_pop(&rq->wq, wqe_counter_be,
&wqe->next.next_wqe_index);
}
mlx5_cqwq_update_db_record(&rq->cq.wq);
/* ensure cq space is freed before enabling more cqes */
wmb();
#ifndef HAVE_TURBO_LRO
tcp_lro_flush_all(&rq->lro);
#endif
return (i);
}
void
mlx5e_rx_cq_comp(struct mlx5_core_cq *mcq)
{
struct mlx5e_rq *rq = container_of(mcq, struct mlx5e_rq, cq.mcq);
int i = 0;
#ifdef HAVE_PER_CQ_EVENT_PACKET
struct mbuf *mb = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, rq->wqe_sz);
if (mb != NULL) {
/* this code is used for debugging purposes only */
mb->m_pkthdr.len = mb->m_len = 15;
memset(mb->m_data, 255, 14);
mb->m_data[14] = rq->ix;
mb->m_pkthdr.rcvif = rq->ifp;
rq->ifp->if_input(rq->ifp, mb);
}
#endif
mtx_lock(&rq->mtx);
/*
* Polling the entire CQ without posting new WQEs results in
* lack of receive WQEs during heavy traffic scenarios.
*/
while (1) {
if (mlx5e_poll_rx_cq(rq, MLX5E_RX_BUDGET_MAX) !=
MLX5E_RX_BUDGET_MAX)
break;
i += MLX5E_RX_BUDGET_MAX;
if (i >= MLX5E_BUDGET_MAX)
break;
mlx5e_post_rx_wqes(rq);
}
mlx5e_post_rx_wqes(rq);
mlx5e_cq_arm(&rq->cq);
#ifdef HAVE_TURBO_LRO
tcp_tlro_flush(&rq->lro, 1);
#endif
mtx_unlock(&rq->mtx);
}
Index: projects/vnet/sys/dev/mlx5/vport.h
===================================================================
--- projects/vnet/sys/dev/mlx5/vport.h (revision 301546)
+++ projects/vnet/sys/dev/mlx5/vport.h (revision 301547)
@@ -1,81 +1,109 @@
/*-
* Copyright (c) 2013-2015, Mellanox Technologies, Ltd. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS `AS IS' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* $FreeBSD$
*/
#ifndef __MLX5_VPORT_H__
#define __MLX5_VPORT_H__
#include
int mlx5_vport_alloc_q_counter(struct mlx5_core_dev *mdev,
int *counter_set_id);
int mlx5_vport_dealloc_q_counter(struct mlx5_core_dev *mdev,
int counter_set_id);
int mlx5_vport_query_out_of_rx_buffer(struct mlx5_core_dev *mdev,
int counter_set_id,
u32 *out_of_rx_buffer);
u8 mlx5_query_vport_state(struct mlx5_core_dev *mdev, u8 opmod);
+int mlx5_arm_vport_context_events(struct mlx5_core_dev *mdev,
+ u8 vport,
+ u32 events_mask);
+int mlx5_query_vport_promisc(struct mlx5_core_dev *mdev,
+ u32 vport,
+ u8 *promisc_uc,
+ u8 *promisc_mc,
+ u8 *promisc_all);
+int mlx5_modify_nic_vport_promisc(struct mlx5_core_dev *mdev,
+ int promisc_uc,
+ int promisc_mc,
+ int promisc_all);
int mlx5_query_nic_vport_mac_address(struct mlx5_core_dev *mdev,
u32 vport, u8 *addr);
int mlx5_set_nic_vport_current_mac(struct mlx5_core_dev *mdev, int vport,
bool other_vport, u8 *addr);
int mlx5_set_nic_vport_vlan_list(struct mlx5_core_dev *dev, u32 vport,
u16 *vlan_list, int list_len);
int mlx5_set_nic_vport_mc_list(struct mlx5_core_dev *mdev, int vport,
u64 *addr_list, size_t addr_list_len);
int mlx5_set_nic_vport_promisc(struct mlx5_core_dev *mdev, int vport,
bool promisc_mc, bool promisc_uc,
bool promisc_all);
+int mlx5_query_nic_vport_mac_list(struct mlx5_core_dev *dev,
+ u32 vport,
+ enum mlx5_list_type list_type,
+ u8 addr_list[][ETH_ALEN],
+ int *list_size);
+int mlx5_modify_nic_vport_mac_list(struct mlx5_core_dev *dev,
+ enum mlx5_list_type list_type,
+ u8 addr_list[][ETH_ALEN],
+ int list_size);
+int mlx5_query_nic_vport_vlan_list(struct mlx5_core_dev *dev,
+ u32 vport,
+ u16 *vlan_list,
+ int *list_size);
+int mlx5_modify_nic_vport_vlans(struct mlx5_core_dev *dev,
+ u16 vlans[],
+ int list_size);
int mlx5_set_nic_vport_permanent_mac(struct mlx5_core_dev *mdev, int vport,
u8 *addr);
int mlx5_nic_vport_enable_roce(struct mlx5_core_dev *mdev);
int mlx5_nic_vport_disable_roce(struct mlx5_core_dev *mdev);
int mlx5_query_nic_vport_system_image_guid(struct mlx5_core_dev *mdev,
u64 *system_image_guid);
int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid);
int mlx5_query_nic_vport_port_guid(struct mlx5_core_dev *mdev, u64 *port_guid);
int mlx5_query_nic_vport_qkey_viol_cntr(struct mlx5_core_dev *mdev,
u16 *qkey_viol_cntr);
int mlx5_query_hca_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid);
int mlx5_query_hca_vport_system_image_guid(struct mlx5_core_dev *mdev,
u64 *system_image_guid);
int mlx5_query_hca_vport_context(struct mlx5_core_dev *mdev,
u8 port_num, u8 vport_num, u32 *out,
int outlen);
int mlx5_query_hca_vport_pkey(struct mlx5_core_dev *dev, u8 other_vport,
u8 port_num, u16 vf_num, u16 pkey_index,
u16 *pkey);
int mlx5_query_hca_vport_gid(struct mlx5_core_dev *dev, u8 port_num,
u16 vport_num, u16 gid_index, union ib_gid *gid);
int mlx5_set_eswitch_cvlan_info(struct mlx5_core_dev *mdev, u8 vport,
u8 insert_mode, u8 strip_mode,
u16 vlan, u8 cfi, u8 pcp);
int mlx5_query_vport_counter(struct mlx5_core_dev *dev,
u8 port_num, u16 vport_num,
void *out, int out_size);
int mlx5_get_vport_counters(struct mlx5_core_dev *dev, u8 port_num,
struct mlx5_vport_counters *vc);
#endif /* __MLX5_VPORT_H__ */
Index: projects/vnet/sys/dev/qlxgbe/ql_isr.c
===================================================================
--- projects/vnet/sys/dev/qlxgbe/ql_isr.c (revision 301546)
+++ projects/vnet/sys/dev/qlxgbe/ql_isr.c (revision 301547)
@@ -1,924 +1,924 @@
/*
* Copyright (c) 2013-2016 Qlogic Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
/*
* File: ql_isr.c
* Author : David C Somayajulu, Qlogic Corporation, Aliso Viejo, CA 92656.
*/
#include
__FBSDID("$FreeBSD$");
#include "ql_os.h"
#include "ql_hw.h"
#include "ql_def.h"
#include "ql_inline.h"
#include "ql_ver.h"
#include "ql_glbl.h"
#include "ql_dbg.h"
static void qla_replenish_normal_rx(qla_host_t *ha, qla_sds_t *sdsp,
uint32_t r_idx);
static void
qla_rcv_error(qla_host_t *ha)
{
ha->flags.stop_rcv = 1;
ha->qla_initiate_recovery = 1;
}
/*
* Name: qla_rx_intr
* Function: Handles normal ethernet frames received
*/
static void
qla_rx_intr(qla_host_t *ha, qla_sgl_rcv_t *sgc, uint32_t sds_idx)
{
qla_rx_buf_t *rxb;
struct mbuf *mp = NULL, *mpf = NULL, *mpl = NULL;
struct ifnet *ifp = ha->ifp;
qla_sds_t *sdsp;
struct ether_vlan_header *eh;
uint32_t i, rem_len = 0;
uint32_t r_idx = 0;
qla_rx_ring_t *rx_ring;
if (ha->hw.num_rds_rings > 1)
r_idx = sds_idx;
ha->hw.rds[r_idx].count++;
sdsp = &ha->hw.sds[sds_idx];
rx_ring = &ha->rx_ring[r_idx];
for (i = 0; i < sgc->num_handles; i++) {
rxb = &rx_ring->rx_buf[sgc->handle[i] & 0x7FFF];
QL_ASSERT(ha, (rxb != NULL),
("%s: [sds_idx]=[%d] rxb != NULL\n", __func__,\
sds_idx));
if ((rxb == NULL) || QL_ERR_INJECT(ha, INJCT_RX_RXB_INVAL)) {
/* log the error */
device_printf(ha->pci_dev,
"%s invalid rxb[%d, %d, 0x%04x]\n",
__func__, sds_idx, i, sgc->handle[i]);
qla_rcv_error(ha);
return;
}
mp = rxb->m_head;
if (i == 0)
mpf = mp;
QL_ASSERT(ha, (mp != NULL),
("%s: [sds_idx]=[%d] mp != NULL\n", __func__,\
sds_idx));
bus_dmamap_sync(ha->rx_tag, rxb->map, BUS_DMASYNC_POSTREAD);
rxb->m_head = NULL;
rxb->next = sdsp->rxb_free;
sdsp->rxb_free = rxb;
sdsp->rx_free++;
if ((mp == NULL) || QL_ERR_INJECT(ha, INJCT_RX_MP_NULL)) {
/* log the error */
device_printf(ha->pci_dev,
"%s mp == NULL [%d, %d, 0x%04x]\n",
__func__, sds_idx, i, sgc->handle[i]);
qla_rcv_error(ha);
return;
}
if (i == 0) {
mpl = mpf = mp;
mp->m_flags |= M_PKTHDR;
mp->m_pkthdr.len = sgc->pkt_length;
mp->m_pkthdr.rcvif = ifp;
rem_len = mp->m_pkthdr.len;
} else {
mp->m_flags &= ~M_PKTHDR;
mpl->m_next = mp;
mpl = mp;
rem_len = rem_len - mp->m_len;
}
}
mpl->m_len = rem_len;
eh = mtod(mpf, struct ether_vlan_header *);
if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
uint32_t *data = (uint32_t *)eh;
mpf->m_pkthdr.ether_vtag = ntohs(eh->evl_tag);
mpf->m_flags |= M_VLANTAG;
*(data + 3) = *(data + 2);
*(data + 2) = *(data + 1);
*(data + 1) = *data;
m_adj(mpf, ETHER_VLAN_ENCAP_LEN);
}
if (sgc->chksum_status == Q8_STAT_DESC_STATUS_CHKSUM_OK) {
mpf->m_pkthdr.csum_flags = CSUM_IP_CHECKED | CSUM_IP_VALID |
CSUM_DATA_VALID | CSUM_PSEUDO_HDR;
mpf->m_pkthdr.csum_data = 0xFFFF;
} else {
mpf->m_pkthdr.csum_flags = 0;
}
if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1);
mpf->m_pkthdr.flowid = sgc->rss_hash;
- M_HASHTYPE_SET(mpf, M_HASHTYPE_OPAQUE);
+ M_HASHTYPE_SET(mpf, M_HASHTYPE_OPAQUE_HASH);
(*ifp->if_input)(ifp, mpf);
if (sdsp->rx_free > ha->std_replenish)
qla_replenish_normal_rx(ha, sdsp, r_idx);
return;
}
#define QLA_TCP_HDR_SIZE 20
#define QLA_TCP_TS_OPTION_SIZE 12
/*
* Name: qla_lro_intr
* Function: Handles LRO (Large Receive Offload) ethernet frames received
*/
static int
qla_lro_intr(qla_host_t *ha, qla_sgl_lro_t *sgc, uint32_t sds_idx)
{
qla_rx_buf_t *rxb;
struct mbuf *mp = NULL, *mpf = NULL, *mpl = NULL;
struct ifnet *ifp = ha->ifp;
qla_sds_t *sdsp;
struct ether_vlan_header *eh;
uint32_t i, rem_len = 0, pkt_length, iplen;
struct tcphdr *th;
struct ip *ip = NULL;
struct ip6_hdr *ip6 = NULL;
uint16_t etype;
uint32_t r_idx = 0;
qla_rx_ring_t *rx_ring;
if (ha->hw.num_rds_rings > 1)
r_idx = sds_idx;
ha->hw.rds[r_idx].count++;
rx_ring = &ha->rx_ring[r_idx];
ha->lro_pkt_count++;
sdsp = &ha->hw.sds[sds_idx];
pkt_length = sgc->payload_length + sgc->l4_offset;
if (sgc->flags & Q8_LRO_COMP_TS) {
pkt_length += QLA_TCP_HDR_SIZE + QLA_TCP_TS_OPTION_SIZE;
} else {
pkt_length += QLA_TCP_HDR_SIZE;
}
ha->lro_bytes += pkt_length;
for (i = 0; i < sgc->num_handles; i++) {
rxb = &rx_ring->rx_buf[sgc->handle[i] & 0x7FFF];
QL_ASSERT(ha, (rxb != NULL),
("%s: [sds_idx]=[%d] rxb != NULL\n", __func__,\
sds_idx));
if ((rxb == NULL) || QL_ERR_INJECT(ha, INJCT_LRO_RXB_INVAL)) {
/* log the error */
device_printf(ha->pci_dev,
"%s invalid rxb[%d, %d, 0x%04x]\n",
__func__, sds_idx, i, sgc->handle[i]);
qla_rcv_error(ha);
return (0);
}
mp = rxb->m_head;
if (i == 0)
mpf = mp;
QL_ASSERT(ha, (mp != NULL),
("%s: [sds_idx]=[%d] mp != NULL\n", __func__,\
sds_idx));
bus_dmamap_sync(ha->rx_tag, rxb->map, BUS_DMASYNC_POSTREAD);
rxb->m_head = NULL;
rxb->next = sdsp->rxb_free;
sdsp->rxb_free = rxb;
sdsp->rx_free++;
if ((mp == NULL) || QL_ERR_INJECT(ha, INJCT_LRO_MP_NULL)) {
/* log the error */
device_printf(ha->pci_dev,
"%s mp == NULL [%d, %d, 0x%04x]\n",
__func__, sds_idx, i, sgc->handle[i]);
qla_rcv_error(ha);
return (0);
}
if (i == 0) {
mpl = mpf = mp;
mp->m_flags |= M_PKTHDR;
mp->m_pkthdr.len = pkt_length;
mp->m_pkthdr.rcvif = ifp;
rem_len = mp->m_pkthdr.len;
} else {
mp->m_flags &= ~M_PKTHDR;
mpl->m_next = mp;
mpl = mp;
rem_len = rem_len - mp->m_len;
}
}
mpl->m_len = rem_len;
th = (struct tcphdr *)(mpf->m_data + sgc->l4_offset);
if (sgc->flags & Q8_LRO_COMP_PUSH_BIT)
th->th_flags |= TH_PUSH;
m_adj(mpf, sgc->l2_offset);
eh = mtod(mpf, struct ether_vlan_header *);
if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
uint32_t *data = (uint32_t *)eh;
mpf->m_pkthdr.ether_vtag = ntohs(eh->evl_tag);
mpf->m_flags |= M_VLANTAG;
*(data + 3) = *(data + 2);
*(data + 2) = *(data + 1);
*(data + 1) = *data;
m_adj(mpf, ETHER_VLAN_ENCAP_LEN);
etype = ntohs(eh->evl_proto);
} else {
etype = ntohs(eh->evl_encap_proto);
}
if (etype == ETHERTYPE_IP) {
ip = (struct ip *)(mpf->m_data + ETHER_HDR_LEN);
iplen = (ip->ip_hl << 2) + (th->th_off << 2) +
sgc->payload_length;
ip->ip_len = htons(iplen);
ha->ipv4_lro++;
} else if (etype == ETHERTYPE_IPV6) {
ip6 = (struct ip6_hdr *)(mpf->m_data + ETHER_HDR_LEN);
iplen = (th->th_off << 2) + sgc->payload_length;
ip6->ip6_plen = htons(iplen);
ha->ipv6_lro++;
} else {
m_freem(mpf);
if (sdsp->rx_free > ha->std_replenish)
qla_replenish_normal_rx(ha, sdsp, r_idx);
return 0;
}
mpf->m_pkthdr.csum_flags = CSUM_IP_CHECKED | CSUM_IP_VALID |
CSUM_DATA_VALID | CSUM_PSEUDO_HDR;
mpf->m_pkthdr.csum_data = 0xFFFF;
mpf->m_pkthdr.flowid = sgc->rss_hash;
- M_HASHTYPE_SET(mpf, M_HASHTYPE_OPAQUE);
+ M_HASHTYPE_SET(mpf, M_HASHTYPE_OPAQUE_HASH);
if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1);
(*ifp->if_input)(ifp, mpf);
if (sdsp->rx_free > ha->std_replenish)
qla_replenish_normal_rx(ha, sdsp, r_idx);
return (0);
}
static int
qla_rcv_cont_sds(qla_host_t *ha, uint32_t sds_idx, uint32_t comp_idx,
uint32_t dcount, uint16_t *handle, uint16_t *nhandles)
{
uint32_t i;
uint16_t num_handles;
q80_stat_desc_t *sdesc;
uint32_t opcode;
*nhandles = 0;
dcount--;
for (i = 0; i < dcount; i++) {
comp_idx = (comp_idx + 1) & (NUM_STATUS_DESCRIPTORS-1);
sdesc = (q80_stat_desc_t *)
&ha->hw.sds[sds_idx].sds_ring_base[comp_idx];
opcode = Q8_STAT_DESC_OPCODE((sdesc->data[1]));
if (!opcode) {
device_printf(ha->pci_dev, "%s: opcode=0 %p %p\n",
__func__, (void *)sdesc->data[0],
(void *)sdesc->data[1]);
return -1;
}
num_handles = Q8_SGL_STAT_DESC_NUM_HANDLES((sdesc->data[1]));
if (!num_handles) {
device_printf(ha->pci_dev, "%s: num_handles=0 %p %p\n",
__func__, (void *)sdesc->data[0],
(void *)sdesc->data[1]);
return -1;
}
if (QL_ERR_INJECT(ha, INJCT_NUM_HNDLE_INVALID))
num_handles = -1;
switch (num_handles) {
case 1:
*handle++ = Q8_SGL_STAT_DESC_HANDLE1((sdesc->data[0]));
break;
case 2:
*handle++ = Q8_SGL_STAT_DESC_HANDLE1((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE2((sdesc->data[0]));
break;
case 3:
*handle++ = Q8_SGL_STAT_DESC_HANDLE1((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE2((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE3((sdesc->data[0]));
break;
case 4:
*handle++ = Q8_SGL_STAT_DESC_HANDLE1((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE2((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE3((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE4((sdesc->data[0]));
break;
case 5:
*handle++ = Q8_SGL_STAT_DESC_HANDLE1((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE2((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE3((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE4((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE5((sdesc->data[1]));
break;
case 6:
*handle++ = Q8_SGL_STAT_DESC_HANDLE1((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE2((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE3((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE4((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE5((sdesc->data[1]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE6((sdesc->data[1]));
break;
case 7:
*handle++ = Q8_SGL_STAT_DESC_HANDLE1((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE2((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE3((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE4((sdesc->data[0]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE5((sdesc->data[1]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE6((sdesc->data[1]));
*handle++ = Q8_SGL_STAT_DESC_HANDLE7((sdesc->data[1]));
break;
default:
device_printf(ha->pci_dev,
"%s: invalid num handles %p %p\n",
__func__, (void *)sdesc->data[0],
(void *)sdesc->data[1]);
QL_ASSERT(ha, (0),\
("%s: %s [sds, nh, d0, d1]=[%d, %d, %p, %p]\n",
__func__, "invalid num handles", sds_idx, num_handles,
(void *)sdesc->data[0],(void *)sdesc->data[1]));
qla_rcv_error(ha);
return 0;
}
*nhandles = *nhandles + num_handles;
}
return 0;
}
/*
* Name: qla_rcv_isr
* Function: Main Interrupt Service Routine
*/
static uint32_t
qla_rcv_isr(qla_host_t *ha, uint32_t sds_idx, uint32_t count)
{
device_t dev;
qla_hw_t *hw;
uint32_t comp_idx, c_idx = 0, desc_count = 0, opcode;
volatile q80_stat_desc_t *sdesc, *sdesc0 = NULL;
uint32_t ret = 0;
qla_sgl_comp_t sgc;
uint16_t nhandles;
uint32_t sds_replenish_threshold = 0;
dev = ha->pci_dev;
hw = &ha->hw;
hw->sds[sds_idx].rcv_active = 1;
if (ha->flags.stop_rcv) {
hw->sds[sds_idx].rcv_active = 0;
return 0;
}
QL_DPRINT2(ha, (dev, "%s: [%d]enter\n", __func__, sds_idx));
/*
* receive interrupts
*/
comp_idx = hw->sds[sds_idx].sdsr_next;
while (count-- && !ha->flags.stop_rcv) {
sdesc = (q80_stat_desc_t *)
&hw->sds[sds_idx].sds_ring_base[comp_idx];
opcode = Q8_STAT_DESC_OPCODE((sdesc->data[1]));
if (!opcode)
break;
hw->sds[sds_idx].intr_count++;
switch (opcode) {
case Q8_STAT_DESC_OPCODE_RCV_PKT:
desc_count = 1;
bzero(&sgc, sizeof(qla_sgl_comp_t));
sgc.rcv.pkt_length =
Q8_STAT_DESC_TOTAL_LENGTH((sdesc->data[0]));
sgc.rcv.num_handles = 1;
sgc.rcv.handle[0] =
Q8_STAT_DESC_HANDLE((sdesc->data[0]));
sgc.rcv.chksum_status =
Q8_STAT_DESC_STATUS((sdesc->data[1]));
sgc.rcv.rss_hash =
Q8_STAT_DESC_RSS_HASH((sdesc->data[0]));
if (Q8_STAT_DESC_VLAN((sdesc->data[1]))) {
sgc.rcv.vlan_tag =
Q8_STAT_DESC_VLAN_ID((sdesc->data[1]));
}
qla_rx_intr(ha, &sgc.rcv, sds_idx);
break;
case Q8_STAT_DESC_OPCODE_SGL_RCV:
desc_count =
Q8_STAT_DESC_COUNT_SGL_RCV((sdesc->data[1]));
if (desc_count > 1) {
c_idx = (comp_idx + desc_count -1) &
(NUM_STATUS_DESCRIPTORS-1);
sdesc0 = (q80_stat_desc_t *)
&hw->sds[sds_idx].sds_ring_base[c_idx];
if (Q8_STAT_DESC_OPCODE((sdesc0->data[1])) !=
Q8_STAT_DESC_OPCODE_CONT) {
desc_count = 0;
break;
}
}
bzero(&sgc, sizeof(qla_sgl_comp_t));
sgc.rcv.pkt_length =
Q8_STAT_DESC_TOTAL_LENGTH_SGL_RCV(\
(sdesc->data[0]));
sgc.rcv.chksum_status =
Q8_STAT_DESC_STATUS((sdesc->data[1]));
sgc.rcv.rss_hash =
Q8_STAT_DESC_RSS_HASH((sdesc->data[0]));
if (Q8_STAT_DESC_VLAN((sdesc->data[1]))) {
sgc.rcv.vlan_tag =
Q8_STAT_DESC_VLAN_ID((sdesc->data[1]));
}
QL_ASSERT(ha, (desc_count <= 2) ,\
("%s: [sds_idx, data0, data1]="\
"[%d, %p, %p]\n", __func__, sds_idx,\
(void *)sdesc->data[0],\
(void *)sdesc->data[1]));
sgc.rcv.num_handles = 1;
sgc.rcv.handle[0] =
Q8_STAT_DESC_HANDLE((sdesc->data[0]));
if (qla_rcv_cont_sds(ha, sds_idx, comp_idx, desc_count,
&sgc.rcv.handle[1], &nhandles)) {
device_printf(dev,
"%s: [sds_idx, dcount, data0, data1]="
"[%d, %d, 0x%llx, 0x%llx]\n",
__func__, sds_idx, desc_count,
(long long unsigned int)sdesc->data[0],
(long long unsigned int)sdesc->data[1]);
desc_count = 0;
break;
}
sgc.rcv.num_handles += nhandles;
qla_rx_intr(ha, &sgc.rcv, sds_idx);
break;
case Q8_STAT_DESC_OPCODE_SGL_LRO:
desc_count =
Q8_STAT_DESC_COUNT_SGL_LRO((sdesc->data[1]));
if (desc_count > 1) {
c_idx = (comp_idx + desc_count -1) &
(NUM_STATUS_DESCRIPTORS-1);
sdesc0 = (q80_stat_desc_t *)
&hw->sds[sds_idx].sds_ring_base[c_idx];
if (Q8_STAT_DESC_OPCODE((sdesc0->data[1])) !=
Q8_STAT_DESC_OPCODE_CONT) {
desc_count = 0;
break;
}
}
bzero(&sgc, sizeof(qla_sgl_comp_t));
sgc.lro.payload_length =
Q8_STAT_DESC_TOTAL_LENGTH_SGL_RCV((sdesc->data[0]));
sgc.lro.rss_hash =
Q8_STAT_DESC_RSS_HASH((sdesc->data[0]));
sgc.lro.num_handles = 1;
sgc.lro.handle[0] =
Q8_STAT_DESC_HANDLE((sdesc->data[0]));
if (Q8_SGL_LRO_STAT_TS((sdesc->data[1])))
sgc.lro.flags |= Q8_LRO_COMP_TS;
if (Q8_SGL_LRO_STAT_PUSH_BIT((sdesc->data[1])))
sgc.lro.flags |= Q8_LRO_COMP_PUSH_BIT;
sgc.lro.l2_offset =
Q8_SGL_LRO_STAT_L2_OFFSET((sdesc->data[1]));
sgc.lro.l4_offset =
Q8_SGL_LRO_STAT_L4_OFFSET((sdesc->data[1]));
if (Q8_STAT_DESC_VLAN((sdesc->data[1]))) {
sgc.lro.vlan_tag =
Q8_STAT_DESC_VLAN_ID((sdesc->data[1]));
}
QL_ASSERT(ha, (desc_count <= 7) ,\
("%s: [sds_idx, data0, data1]="\
"[%d, 0x%llx, 0x%llx]\n",\
__func__, sds_idx,\
(long long unsigned int)sdesc->data[0],\
(long long unsigned int)sdesc->data[1]));
if (qla_rcv_cont_sds(ha, sds_idx, comp_idx,
desc_count, &sgc.lro.handle[1], &nhandles)) {
device_printf(dev,
"%s: [sds_idx, data0, data1]="\
"[%d, 0x%llx, 0x%llx]\n",\
__func__, sds_idx,\
(long long unsigned int)sdesc->data[0],\
(long long unsigned int)sdesc->data[1]);
desc_count = 0;
break;
}
sgc.lro.num_handles += nhandles;
if (qla_lro_intr(ha, &sgc.lro, sds_idx)) {
device_printf(dev,
"%s: [sds_idx, data0, data1]="\
"[%d, 0x%llx, 0x%llx]\n",\
__func__, sds_idx,\
(long long unsigned int)sdesc->data[0],\
(long long unsigned int)sdesc->data[1]);
device_printf(dev,
"%s: [comp_idx, c_idx, dcount, nhndls]="\
"[%d, %d, %d, %d]\n",\
__func__, comp_idx, c_idx, desc_count,
sgc.lro.num_handles);
if (desc_count > 1) {
device_printf(dev,
"%s: [sds_idx, data0, data1]="\
"[%d, 0x%llx, 0x%llx]\n",\
__func__, sds_idx,\
(long long unsigned int)sdesc0->data[0],\
(long long unsigned int)sdesc0->data[1]);
}
}
break;
default:
device_printf(dev, "%s: default 0x%llx!\n", __func__,
(long long unsigned int)sdesc->data[0]);
break;
}
if (desc_count == 0)
break;
sds_replenish_threshold += desc_count;
while (desc_count--) {
sdesc->data[0] = 0ULL;
sdesc->data[1] = 0ULL;
comp_idx = (comp_idx + 1) & (NUM_STATUS_DESCRIPTORS-1);
sdesc = (q80_stat_desc_t *)
&hw->sds[sds_idx].sds_ring_base[comp_idx];
}
if (sds_replenish_threshold > ha->hw.sds_cidx_thres) {
sds_replenish_threshold = 0;
if (hw->sds[sds_idx].sdsr_next != comp_idx) {
QL_UPDATE_SDS_CONSUMER_INDEX(ha, sds_idx,\
comp_idx);
}
hw->sds[sds_idx].sdsr_next = comp_idx;
}
}
if (ha->flags.stop_rcv)
goto qla_rcv_isr_exit;
if (hw->sds[sds_idx].sdsr_next != comp_idx) {
QL_UPDATE_SDS_CONSUMER_INDEX(ha, sds_idx, comp_idx);
}
hw->sds[sds_idx].sdsr_next = comp_idx;
sdesc = (q80_stat_desc_t *)&hw->sds[sds_idx].sds_ring_base[comp_idx];
opcode = Q8_STAT_DESC_OPCODE((sdesc->data[1]));
if (opcode)
ret = -1;
qla_rcv_isr_exit:
hw->sds[sds_idx].rcv_active = 0;
return (ret);
}
void
ql_mbx_isr(void *arg)
{
qla_host_t *ha;
uint32_t data;
uint32_t prev_link_state;
ha = arg;
if (ha == NULL) {
printf("%s: arg == NULL\n", __func__);
return;
}
data = READ_REG32(ha, Q8_FW_MBOX_CNTRL);
if ((data & 0x3) != 0x1) {
WRITE_REG32(ha, ha->hw.mbx_intr_mask_offset, 0);
return;
}
data = READ_REG32(ha, Q8_FW_MBOX0);
if ((data & 0xF000) != 0x8000)
return;
data = data & 0xFFFF;
switch (data) {
case 0x8001: /* It's an AEN */
ha->hw.cable_oui = READ_REG32(ha, (Q8_FW_MBOX0 + 4));
data = READ_REG32(ha, (Q8_FW_MBOX0 + 8));
ha->hw.cable_length = data & 0xFFFF;
data = data >> 16;
ha->hw.link_speed = data & 0xFFF;
data = READ_REG32(ha, (Q8_FW_MBOX0 + 12));
prev_link_state = ha->hw.link_up;
ha->hw.link_up = (((data & 0xFF) == 0) ? 0 : 1);
if (prev_link_state != ha->hw.link_up) {
if (ha->hw.link_up)
if_link_state_change(ha->ifp, LINK_STATE_UP);
else
if_link_state_change(ha->ifp, LINK_STATE_DOWN);
}
ha->hw.module_type = ((data >> 8) & 0xFF);
ha->hw.flags.fduplex = (((data & 0xFF0000) == 0) ? 0 : 1);
ha->hw.flags.autoneg = (((data & 0xFF000000) == 0) ? 0 : 1);
data = READ_REG32(ha, (Q8_FW_MBOX0 + 16));
ha->hw.flags.loopback_mode = data & 0x03;
ha->hw.link_faults = (data >> 3) & 0xFF;
break;
case 0x8100:
ha->hw.imd_compl=1;
break;
case 0x8101:
ha->async_event = 1;
ha->hw.aen_mb0 = 0x8101;
ha->hw.aen_mb1 = READ_REG32(ha, (Q8_FW_MBOX0 + 4));
ha->hw.aen_mb2 = READ_REG32(ha, (Q8_FW_MBOX0 + 8));
ha->hw.aen_mb3 = READ_REG32(ha, (Q8_FW_MBOX0 + 12));
ha->hw.aen_mb4 = READ_REG32(ha, (Q8_FW_MBOX0 + 16));
break;
case 0x8110:
/* for now just dump the registers */
{
uint32_t ombx[5];
ombx[0] = READ_REG32(ha, (Q8_FW_MBOX0 + 4));
ombx[1] = READ_REG32(ha, (Q8_FW_MBOX0 + 8));
ombx[2] = READ_REG32(ha, (Q8_FW_MBOX0 + 12));
ombx[3] = READ_REG32(ha, (Q8_FW_MBOX0 + 16));
ombx[4] = READ_REG32(ha, (Q8_FW_MBOX0 + 20));
device_printf(ha->pci_dev, "%s: "
"0x%08x 0x%08x 0x%08x 0x%08x 0x%08x 0x%08x\n",
__func__, data, ombx[0], ombx[1], ombx[2],
ombx[3], ombx[4]);
}
break;
case 0x8130:
/* sfp insertion aen */
device_printf(ha->pci_dev, "%s: sfp inserted [0x%08x]\n",
__func__, READ_REG32(ha, (Q8_FW_MBOX0 + 4)));
break;
case 0x8131:
/* sfp removal aen */
device_printf(ha->pci_dev, "%s: sfp removed\n", __func__);
break;
default:
device_printf(ha->pci_dev, "%s: AEN[0x%08x]\n", __func__, data);
break;
}
WRITE_REG32(ha, Q8_FW_MBOX_CNTRL, 0x0);
WRITE_REG32(ha, ha->hw.mbx_intr_mask_offset, 0x0);
return;
}
static void
qla_replenish_normal_rx(qla_host_t *ha, qla_sds_t *sdsp, uint32_t r_idx)
{
qla_rx_buf_t *rxb;
int count = sdsp->rx_free;
uint32_t rx_next;
qla_rdesc_t *rdesc;
/* we can play with this value via a sysctl */
uint32_t replenish_thresh = ha->hw.rds_pidx_thres;
rdesc = &ha->hw.rds[r_idx];
rx_next = rdesc->rx_next;
while (count--) {
rxb = sdsp->rxb_free;
if (rxb == NULL)
break;
sdsp->rxb_free = rxb->next;
sdsp->rx_free--;
if (ql_get_mbuf(ha, rxb, NULL) == 0) {
qla_set_hw_rcv_desc(ha, r_idx, rdesc->rx_in,
rxb->handle,
rxb->paddr, (rxb->m_head)->m_pkthdr.len);
rdesc->rx_in++;
if (rdesc->rx_in == NUM_RX_DESCRIPTORS)
rdesc->rx_in = 0;
rdesc->rx_next++;
if (rdesc->rx_next == NUM_RX_DESCRIPTORS)
rdesc->rx_next = 0;
} else {
device_printf(ha->pci_dev,
"%s: ql_get_mbuf [0,(%d),(%d)] failed\n",
__func__, rdesc->rx_in, rxb->handle);
rxb->m_head = NULL;
rxb->next = sdsp->rxb_free;
sdsp->rxb_free = rxb;
sdsp->rx_free++;
break;
}
if (replenish_thresh-- == 0) {
QL_UPDATE_RDS_PRODUCER_INDEX(ha, rdesc->prod_std,
rdesc->rx_next);
rx_next = rdesc->rx_next;
replenish_thresh = ha->hw.rds_pidx_thres;
}
}
if (rx_next != rdesc->rx_next) {
QL_UPDATE_RDS_PRODUCER_INDEX(ha, rdesc->prod_std,
rdesc->rx_next);
}
}
void
ql_isr(void *arg)
{
qla_ivec_t *ivec = arg;
qla_host_t *ha ;
int idx;
qla_hw_t *hw;
struct ifnet *ifp;
uint32_t ret = 0;
ha = ivec->ha;
hw = &ha->hw;
ifp = ha->ifp;
if ((idx = ivec->sds_idx) >= ha->hw.num_sds_rings)
return;
if (idx == 0)
taskqueue_enqueue(ha->tx_tq, &ha->tx_task);
ret = qla_rcv_isr(ha, idx, -1);
if (idx == 0)
taskqueue_enqueue(ha->tx_tq, &ha->tx_task);
if (!ha->flags.stop_rcv) {
QL_ENABLE_INTERRUPTS(ha, idx);
}
return;
}
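Reviewer note: the status-ring walk in qla_rcv_isr() and qla_rcv_cont_sds() advances comp_idx with `(comp_idx + 1) & (NUM_STATUS_DESCRIPTORS-1)`, which only wraps correctly when the ring size is a power of two. The following is a minimal standalone sketch of that idiom, not driver code. RING_SIZE and the helper names are hypothetical and chosen for illustration; the compare-and-reset variant mirrors what the rx_in/rx_next updates in qla_replenish_normal_rx() do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring size for illustration; it must be a power of two. */
#define RING_SIZE 1024u

/* Fail the build if the mask trick below would be invalid. */
_Static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
    "RING_SIZE must be a power of two");

/* Advance by one slot, wrapping with a mask (no branch). */
static inline uint32_t
ring_next(uint32_t idx)
{
	return ((idx + 1) & (RING_SIZE - 1));
}

/* Advance by n slots, as done when locating the last descriptor of a chain. */
static inline uint32_t
ring_advance(uint32_t idx, uint32_t n)
{
	return ((idx + n) & (RING_SIZE - 1));
}

/* Equivalent compare-and-reset form; works for any ring size. */
static inline uint32_t
ring_next_cmp(uint32_t idx)
{
	if (++idx == RING_SIZE)
		idx = 0;
	return (idx);
}
```

The masked form avoids a branch on the hot path, while the compare-and-reset form does not depend on the ring size being a power of two.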
Index: projects/vnet/sys/dev/qlxge/qls_isr.c
===================================================================
--- projects/vnet/sys/dev/qlxge/qls_isr.c (revision 301546)
+++ projects/vnet/sys/dev/qlxge/qls_isr.c (revision 301547)
@@ -1,396 +1,396 @@
/*
* Copyright (c) 2013-2014 Qlogic Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
/*
* File: qls_isr.c
* Author : David C Somayajulu, Qlogic Corporation, Aliso Viejo, CA 92656.
*/
#include
__FBSDID("$FreeBSD$");
#include "qls_os.h"
#include "qls_hw.h"
#include "qls_def.h"
#include "qls_inline.h"
#include "qls_ver.h"
#include "qls_glbl.h"
#include "qls_dbg.h"
static void
qls_tx_comp(qla_host_t *ha, uint32_t txr_idx, q81_tx_mac_comp_t *tx_comp)
{
qla_tx_buf_t *txb;
uint32_t tx_idx = tx_comp->tid_lo;
if (tx_idx >= NUM_TX_DESCRIPTORS) {
ha->qla_initiate_recovery = 1;
return;
}
txb = &ha->tx_ring[txr_idx].tx_buf[tx_idx];
if (txb->m_head) {
if_inc_counter(ha->ifp, IFCOUNTER_OPACKETS, 1);
bus_dmamap_sync(ha->tx_tag, txb->map,
BUS_DMASYNC_POSTWRITE);
bus_dmamap_unload(ha->tx_tag, txb->map);
m_freem(txb->m_head);
txb->m_head = NULL;
}
ha->tx_ring[txr_idx].txr_done++;
if (ha->tx_ring[txr_idx].txr_done == NUM_TX_DESCRIPTORS)
ha->tx_ring[txr_idx].txr_done = 0;
}
static void
qls_replenish_rx(qla_host_t *ha, uint32_t r_idx)
{
qla_rx_buf_t *rxb;
qla_rx_ring_t *rxr;
int count;
volatile q81_bq_addr_e_t *sbq_e;
rxr = &ha->rx_ring[r_idx];
count = rxr->rx_free;
sbq_e = rxr->sbq_vaddr;
while (count--) {
rxb = &rxr->rx_buf[rxr->sbq_next];
if (rxb->m_head == NULL) {
if (qls_get_mbuf(ha, rxb, NULL) != 0) {
device_printf(ha->pci_dev,
"%s: qls_get_mbuf [0,%d,%d] failed\n",
__func__, rxr->sbq_next, r_idx);
rxb->m_head = NULL;
break;
}
}
if (rxb->m_head != NULL) {
sbq_e[rxr->sbq_next].addr_lo = (uint32_t)rxb->paddr;
sbq_e[rxr->sbq_next].addr_hi =
(uint32_t)(rxb->paddr >> 32);
rxr->sbq_next++;
if (rxr->sbq_next == NUM_RX_DESCRIPTORS)
rxr->sbq_next = 0;
rxr->sbq_free++;
rxr->rx_free--;
}
if (rxr->sbq_free == 16) {
rxr->sbq_in += 16;
rxr->sbq_in = rxr->sbq_in & (NUM_RX_DESCRIPTORS - 1);
rxr->sbq_free = 0;
Q81_WR_SBQ_PROD_IDX(r_idx, (rxr->sbq_in));
}
}
}
static int
qls_rx_comp(qla_host_t *ha, uint32_t rxr_idx, uint32_t cq_idx, q81_rx_t *cq_e)
{
qla_rx_buf_t *rxb;
qla_rx_ring_t *rxr;
device_t dev = ha->pci_dev;
struct mbuf *mp = NULL;
struct ifnet *ifp = ha->ifp;
struct lro_ctrl *lro;
struct ether_vlan_header *eh;
rxr = &ha->rx_ring[rxr_idx];
lro = &rxr->lro;
rxb = &rxr->rx_buf[rxr->rx_next];
if (!(cq_e->flags1 & Q81_RX_FLAGS1_DS)) {
device_printf(dev, "%s: DS bit not set \n", __func__);
return -1;
}
if (rxb->paddr != cq_e->b_paddr) {
device_printf(dev,
"%s: (rxb->paddr != cq_e->b_paddr)[%p, %p] \n",
__func__, (void *)rxb->paddr, (void *)cq_e->b_paddr);
Q81_SET_CQ_INVALID(cq_idx);
ha->qla_initiate_recovery = 1;
return(-1);
}
rxr->rx_int++;
if ((cq_e->flags1 & Q81_RX_FLAGS1_ERR_MASK) == 0) {
mp = rxb->m_head;
rxb->m_head = NULL;
if (mp == NULL) {
device_printf(dev, "%s: mp == NULL\n", __func__);
} else {
mp->m_flags |= M_PKTHDR;
mp->m_pkthdr.len = cq_e->length;
mp->m_pkthdr.rcvif = ifp;
mp->m_len = cq_e->length;
eh = mtod(mp, struct ether_vlan_header *);
if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
uint32_t *data = (uint32_t *)eh;
mp->m_pkthdr.ether_vtag = ntohs(eh->evl_tag);
mp->m_flags |= M_VLANTAG;
*(data + 3) = *(data + 2);
*(data + 2) = *(data + 1);
*(data + 1) = *data;
m_adj(mp, ETHER_VLAN_ENCAP_LEN);
}
if ((cq_e->flags1 & Q81_RX_FLAGS1_RSS_MATCH_MASK)) {
rxr->rss_int++;
mp->m_pkthdr.flowid = cq_e->rss;
- M_HASHTYPE_SET(mp, M_HASHTYPE_OPAQUE);
+ M_HASHTYPE_SET(mp, M_HASHTYPE_OPAQUE_HASH);
}
if (cq_e->flags0 & (Q81_RX_FLAGS0_TE |
Q81_RX_FLAGS0_NU | Q81_RX_FLAGS0_IE)) {
mp->m_pkthdr.csum_flags = 0;
} else {
mp->m_pkthdr.csum_flags = CSUM_IP_CHECKED |
CSUM_IP_VALID | CSUM_DATA_VALID |
CSUM_PSEUDO_HDR;
mp->m_pkthdr.csum_data = 0xFFFF;
}
if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1);
if (lro->lro_cnt && (tcp_lro_rx(lro, mp, 0) == 0)) {
/* LRO packet has been successfully queued */
} else {
(*ifp->if_input)(ifp, mp);
}
}
} else {
device_printf(dev, "%s: err [0x%08x]\n", __func__, cq_e->flags1);
}
rxr->rx_free++;
rxr->rx_next++;
if (rxr->rx_next == NUM_RX_DESCRIPTORS)
rxr->rx_next = 0;
if ((rxr->rx_free + rxr->sbq_free) >= 16)
qls_replenish_rx(ha, rxr_idx);
return 0;
}
static void
qls_cq_isr(qla_host_t *ha, uint32_t cq_idx)
{
q81_cq_e_t *cq_e, *cq_b;
uint32_t i, cq_comp_idx;
int ret = 0, tx_comp_done = 0;
struct lro_ctrl *lro;
cq_b = ha->rx_ring[cq_idx].cq_base_vaddr;
lro = &ha->rx_ring[cq_idx].lro;
cq_comp_idx = *(ha->rx_ring[cq_idx].cqi_vaddr);
i = ha->rx_ring[cq_idx].cq_next;
while (i != cq_comp_idx) {
cq_e = &cq_b[i];
switch (cq_e->opcode) {
case Q81_IOCB_TX_MAC:
case Q81_IOCB_TX_TSO:
qls_tx_comp(ha, cq_idx, (q81_tx_mac_comp_t *)cq_e);
tx_comp_done++;
break;
case Q81_IOCB_RX:
ret = qls_rx_comp(ha, cq_idx, i, (q81_rx_t *)cq_e);
break;
case Q81_IOCB_MPI:
case Q81_IOCB_SYS:
default:
device_printf(ha->pci_dev, "%s[%d %d 0x%x]: illegal \n",
__func__, i, (*(ha->rx_ring[cq_idx].cqi_vaddr)),
cq_e->opcode);
qls_dump_buf32(ha, __func__, cq_e,
(sizeof (q81_cq_e_t) >> 2));
break;
}
i++;
if (i == NUM_CQ_ENTRIES)
i = 0;
if (ret) {
break;
}
if (i == cq_comp_idx) {
cq_comp_idx = *(ha->rx_ring[cq_idx].cqi_vaddr);
}
if (tx_comp_done) {
taskqueue_enqueue(ha->tx_tq, &ha->tx_task);
tx_comp_done = 0;
}
}
tcp_lro_flush_all(lro);
ha->rx_ring[cq_idx].cq_next = cq_comp_idx;
if (!ret) {
Q81_WR_CQ_CONS_IDX(cq_idx, (ha->rx_ring[cq_idx].cq_next));
}
if (tx_comp_done)
taskqueue_enqueue(ha->tx_tq, &ha->tx_task);
return;
}
static void
qls_mbx_isr(qla_host_t *ha)
{
uint32_t data;
int i;
device_t dev = ha->pci_dev;
if (qls_mbx_rd_reg(ha, 0, &data) == 0) {
if ((data & 0xF000) == 0x4000) {
ha->mbox[0] = data;
for (i = 1; i < Q81_NUM_MBX_REGISTERS; i++) {
if (qls_mbx_rd_reg(ha, i, &data))
break;
ha->mbox[i] = data;
}
ha->mbx_done = 1;
} else if ((data & 0xF000) == 0x8000) {
/* we have an AEN */
ha->aen[0] = data;
for (i = 1; i < Q81_NUM_AEN_REGISTERS; i++) {
if (qls_mbx_rd_reg(ha, i, &data))
break;
ha->aen[i] = data;
}
device_printf(dev,"%s: AEN "
"[0x%08x 0x%08x 0x%08x 0x%08x 0x%08x"
" 0x%08x 0x%08x 0x%08x 0x%08x]\n",
__func__,
ha->aen[0], ha->aen[1], ha->aen[2],
ha->aen[3], ha->aen[4], ha->aen[5],
ha->aen[6], ha->aen[7], ha->aen[8]);
switch ((ha->aen[0] & 0xFFFF)) {
case 0x8011:
ha->link_up = 1;
break;
case 0x8012:
ha->link_up = 0;
break;
case 0x8130:
ha->link_hw_info = ha->aen[1];
break;
case 0x8131:
ha->link_hw_info = 0;
break;
}
}
}
WRITE_REG32(ha, Q81_CTL_HOST_CMD_STATUS, Q81_CTL_HCS_CMD_CLR_RTH_INTR);
return;
}
void
qls_isr(void *arg)
{
qla_ivec_t *ivec = arg;
qla_host_t *ha;
uint32_t status;
uint32_t cq_idx;
device_t dev;
ha = ivec->ha;
cq_idx = ivec->cq_idx;
dev = ha->pci_dev;
status = READ_REG32(ha, Q81_CTL_STATUS);
if (status & Q81_CTL_STATUS_FE) {
device_printf(dev, "%s fatal error\n", __func__);
return;
}
if ((cq_idx == 0) && (status & Q81_CTL_STATUS_PI)) {
qls_mbx_isr(ha);
}
status = READ_REG32(ha, Q81_CTL_INTR_STATUS1);
if (status & ( 0x1 << cq_idx))
qls_cq_isr(ha, cq_idx);
Q81_ENABLE_INTR(ha, cq_idx);
return;
}
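Reviewer note: both receive paths touched by this change strip an 802.1Q tag in place by sliding the two MAC addresses forward by four bytes (the three 32-bit word copies) and then trimming the front of the mbuf with m_adj(). The sketch below shows the same idea on a plain byte buffer, assuming a frame that starts at the destination MAC address. It is an illustration only: strip_vlan() and the constants are hypothetical names, and memmove() stands in for the word copies to sidestep alignment concerns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_ADDRS_LEN	12	/* destination + source MAC addresses */
#define VLAN_ENCAP_LEN	4	/* 0x8100 TPID + 16-bit tag control info */

/*
 * Strip an 802.1Q tag in place. On success, returns a pointer to the new
 * start of the frame, shortens *len by four bytes and stores the tag in
 * *tag (host byte order). Returns NULL if the frame is not VLAN tagged.
 */
static uint8_t *
strip_vlan(uint8_t *frame, size_t *len, uint16_t *tag)
{
	if (*len < MAC_ADDRS_LEN + VLAN_ENCAP_LEN + 2)
		return (NULL);
	if (frame[12] != 0x81 || frame[13] != 0x00)
		return (NULL);

	/* The tag control information is carried in network byte order. */
	*tag = (uint16_t)((frame[14] << 8) | frame[15]);

	/* Slide both MAC addresses over the tag; regions overlap. */
	memmove(frame + VLAN_ENCAP_LEN, frame, MAC_ADDRS_LEN);

	/* Equivalent of trimming four bytes from the front of the mbuf. */
	*len -= VLAN_ENCAP_LEN;
	return (frame + VLAN_ENCAP_LEN);
}
```

After the call, the inner EtherType immediately follows the source MAC address, so the frame looks untagged to the upper layers while the tag travels out of band.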
Index: projects/vnet/sys/kern/subr_intr.c
===================================================================
--- projects/vnet/sys/kern/subr_intr.c (revision 301546)
+++ projects/vnet/sys/kern/subr_intr.c (revision 301547)
@@ -1,1544 +1,1392 @@
/*-
* Copyright (c) 2015-2016 Svatopluk Kraus
* Copyright (c) 2015-2016 Michal Meloun
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include
__FBSDID("$FreeBSD$");
/*
* New-style Interrupt Framework
*
* TODO: - to support IPI (PPI) enabling on other CPUs if already started
* - to complete things for removable PICs
*/
-#include "opt_acpi.h"
#include "opt_ddb.h"
#include "opt_hwpmc_hooks.h"
-#include "opt_platform.h"
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#ifdef HWPMC_HOOKS
#include
#endif
#include
#include
#include
#include
#include
#ifdef DDB
#include
#endif
#include "pic_if.h"
#include "msi_if.h"
#define INTRNAME_LEN (2*MAXCOMLEN + 1)
#ifdef DEBUG
#define debugf(fmt, args...) do { printf("%s(): ", __func__); \
printf(fmt,##args); } while (0)
#else
#define debugf(fmt, args...)
#endif
MALLOC_DECLARE(M_INTRNG);
MALLOC_DEFINE(M_INTRNG, "intr", "intr interrupt handling");
/* Main interrupt handler called from assembler -> 'hidden' for C code. */
void intr_irq_handler(struct trapframe *tf);
/* Root interrupt controller stuff. */
device_t intr_irq_root_dev;
static intr_irq_filter_t *irq_root_filter;
static void *irq_root_arg;
static u_int irq_root_ipicount;
struct intr_pic_child {
SLIST_ENTRY(intr_pic_child) pc_next;
struct intr_pic *pc_pic;
intr_child_irq_filter_t *pc_filter;
void *pc_filter_arg;
uintptr_t pc_start;
uintptr_t pc_length;
};
/* Interrupt controller definition. */
struct intr_pic {
SLIST_ENTRY(intr_pic) pic_next;
intptr_t pic_xref; /* hardware identification */
device_t pic_dev;
#define FLAG_PIC (1 << 0)
#define FLAG_MSI (1 << 1)
u_int pic_flags;
struct mtx pic_child_lock;
SLIST_HEAD(, intr_pic_child) pic_children;
};
static struct mtx pic_list_lock;
static SLIST_HEAD(, intr_pic) pic_list;
static struct intr_pic *pic_lookup(device_t dev, intptr_t xref);
/* Interrupt source definition. */
static struct mtx isrc_table_lock;
static struct intr_irqsrc *irq_sources[NIRQ];
u_int irq_next_free;
-/*
- * XXX - All stuff around struct intr_dev_data is considered as temporary
- * until better place for storing struct intr_map_data will be find.
- *
- * For now, there are two global interrupt numbers spaces:
- * <0, NIRQ) ... interrupts without config data
- * managed in irq_sources[]
- * IRQ_DDATA_BASE + <0, 2 * NIRQ) ... interrupts with config data
- * managed in intr_ddata_tab[]
- *
- * Read intr_ddata_lookup() to see how these spaces are worked with.
- * Note that each interrupt number from second space duplicates some number
- * from first space at this moment. An interrupt number from first space can
- * be duplicated even multiple times in second space.
- */
-struct intr_dev_data {
- device_t idd_dev;
- intptr_t idd_xref;
- u_int idd_irq;
- struct intr_map_data * idd_data;
- struct intr_irqsrc * idd_isrc;
-};
-
-static struct intr_dev_data *intr_ddata_tab[2 * NIRQ];
-static u_int intr_ddata_first_unused;
-
-#define IRQ_DDATA_BASE 10000
-CTASSERT(IRQ_DDATA_BASE > nitems(irq_sources));
-
#ifdef SMP
static boolean_t irq_assign_cpu = FALSE;
#endif
/*
* - 2 counters for each I/O interrupt.
* - MAXCPU counters for each IPI counters for SMP.
*/
#ifdef SMP
#define INTRCNT_COUNT (NIRQ * 2 + INTR_IPI_COUNT * MAXCPU)
#else
#define INTRCNT_COUNT (NIRQ * 2)
#endif
/* Data for MI statistics reporting. */
u_long intrcnt[INTRCNT_COUNT];
char intrnames[INTRCNT_COUNT * INTRNAME_LEN];
size_t sintrcnt = sizeof(intrcnt);
size_t sintrnames = sizeof(intrnames);
static u_int intrcnt_index;
/*
* Interrupt framework initialization routine.
*/
static void
intr_irq_init(void *dummy __unused)
{
SLIST_INIT(&pic_list);
mtx_init(&pic_list_lock, "intr pic list", NULL, MTX_DEF);
mtx_init(&isrc_table_lock, "intr isrc table", NULL, MTX_DEF);
}
SYSINIT(intr_irq_init, SI_SUB_INTR, SI_ORDER_FIRST, intr_irq_init, NULL);
static void
intrcnt_setname(const char *name, int index)
{
snprintf(intrnames + INTRNAME_LEN * index, INTRNAME_LEN, "%-*s",
INTRNAME_LEN - 1, name);
}
/*
* Update name for interrupt source with interrupt event.
*/
static void
intrcnt_updatename(struct intr_irqsrc *isrc)
{
/* QQQ: What about stray counter name? */
mtx_assert(&isrc_table_lock, MA_OWNED);
intrcnt_setname(isrc->isrc_event->ie_fullname, isrc->isrc_index);
}
/*
* Virtualization for interrupt source interrupt counter increment.
*/
static inline void
isrc_increment_count(struct intr_irqsrc *isrc)
{
if (isrc->isrc_flags & INTR_ISRCF_PPI)
atomic_add_long(&isrc->isrc_count[0], 1);
else
isrc->isrc_count[0]++;
}
/*
* Virtualization for interrupt source interrupt stray counter increment.
*/
static inline void
isrc_increment_straycount(struct intr_irqsrc *isrc)
{
isrc->isrc_count[1]++;
}
/*
* Virtualization for interrupt source interrupt name update.
*/
static void
isrc_update_name(struct intr_irqsrc *isrc, const char *name)
{
char str[INTRNAME_LEN];
mtx_assert(&isrc_table_lock, MA_OWNED);
if (name != NULL) {
snprintf(str, INTRNAME_LEN, "%s: %s", isrc->isrc_name, name);
intrcnt_setname(str, isrc->isrc_index);
snprintf(str, INTRNAME_LEN, "stray %s: %s", isrc->isrc_name,
name);
intrcnt_setname(str, isrc->isrc_index + 1);
} else {
snprintf(str, INTRNAME_LEN, "%s:", isrc->isrc_name);
intrcnt_setname(str, isrc->isrc_index);
snprintf(str, INTRNAME_LEN, "stray %s:", isrc->isrc_name);
intrcnt_setname(str, isrc->isrc_index + 1);
}
}
/*
* Virtualization for interrupt source interrupt counters setup.
*/
static void
isrc_setup_counters(struct intr_irqsrc *isrc)
{
u_int index;
/*
* XXX - it does not work well with removable controllers and
* interrupt sources !!!
*/
index = atomic_fetchadd_int(&intrcnt_index, 2);
isrc->isrc_index = index;
isrc->isrc_count = &intrcnt[index];
isrc_update_name(isrc, NULL);
}
/*
* Virtualization for interrupt source interrupt counters release.
*/
static void
isrc_release_counters(struct intr_irqsrc *isrc)
{
panic("%s: not implemented", __func__);
}
#ifdef SMP
/*
* Virtualization for interrupt source IPI counters setup.
*/
u_long *
intr_ipi_setup_counters(const char *name)
{
u_int index, i;
char str[INTRNAME_LEN];
index = atomic_fetchadd_int(&intrcnt_index, MAXCPU);
for (i = 0; i < MAXCPU; i++) {
snprintf(str, INTRNAME_LEN, "cpu%d:%s", i, name);
intrcnt_setname(str, index + i);
}
return (&intrcnt[index]);
}
#endif
/*
* Main interrupt dispatch handler. It's called straight
* from the assembler, where CPU interrupt is served.
*/
void
intr_irq_handler(struct trapframe *tf)
{
struct trapframe * oldframe;
struct thread * td;
KASSERT(irq_root_filter != NULL, ("%s: no filter", __func__));
PCPU_INC(cnt.v_intr);
critical_enter();
td = curthread;
oldframe = td->td_intr_frame;
td->td_intr_frame = tf;
irq_root_filter(irq_root_arg);
td->td_intr_frame = oldframe;
critical_exit();
#ifdef HWPMC_HOOKS
if (pmc_hook && TRAPF_USERMODE(tf) &&
(PCPU_GET(curthread)->td_pflags & TDP_CALLCHAIN))
pmc_hook(PCPU_GET(curthread), PMC_FN_USER_CALLCHAIN, tf);
#endif
}
int
intr_child_irq_handler(struct intr_pic *parent, uintptr_t irq)
{
struct intr_pic_child *child;
bool found;
found = false;
mtx_lock_spin(&parent->pic_child_lock);
SLIST_FOREACH(child, &parent->pic_children, pc_next) {
if (child->pc_start <= irq &&
irq < (child->pc_start + child->pc_length)) {
found = true;
break;
}
}
mtx_unlock_spin(&parent->pic_child_lock);
if (found)
return (child->pc_filter(child->pc_filter_arg, irq));
return (FILTER_STRAY);
}
/*
* interrupt controller dispatch function for interrupts. It should
* be called straight from the interrupt controller, when associated interrupt
* source is learned.
*/
int
intr_isrc_dispatch(struct intr_irqsrc *isrc, struct trapframe *tf)
{
KASSERT(isrc != NULL, ("%s: no source", __func__));
isrc_increment_count(isrc);
#ifdef INTR_SOLO
if (isrc->isrc_filter != NULL) {
int error;
error = isrc->isrc_filter(isrc->isrc_arg, tf);
PIC_POST_FILTER(isrc->isrc_dev, isrc);
if (error == FILTER_HANDLED)
return (0);
} else
#endif
if (isrc->isrc_event != NULL) {
if (intr_event_handle(isrc->isrc_event, tf) == 0)
return (0);
}
isrc_increment_straycount(isrc);
return (EINVAL);
}
/*
* Alloc unique interrupt number (resource handle) for interrupt source.
*
* There could be various strategies how to allocate free interrupt number
* (resource handle) for new interrupt source.
*
* 1. Handles are always allocated forward, so handles are not recycled
* immediately. However, if only one free handle left which is reused
* constantly...
*/
static inline int
isrc_alloc_irq(struct intr_irqsrc *isrc)
{
u_int maxirqs, irq;
mtx_assert(&isrc_table_lock, MA_OWNED);
maxirqs = nitems(irq_sources);
if (irq_next_free >= maxirqs)
return (ENOSPC);
for (irq = irq_next_free; irq < maxirqs; irq++) {
if (irq_sources[irq] == NULL)
goto found;
}
for (irq = 0; irq < irq_next_free; irq++) {
if (irq_sources[irq] == NULL)
goto found;
}
irq_next_free = maxirqs;
return (ENOSPC);
found:
isrc->isrc_irq = irq;
irq_sources[irq] = isrc;
irq_next_free = irq + 1;
if (irq_next_free >= maxirqs)
irq_next_free = 0;
return (0);
}
/*
* Free unique interrupt number (resource handle) from interrupt source.
*/
static inline int
isrc_free_irq(struct intr_irqsrc *isrc)
{
mtx_assert(&isrc_table_lock, MA_OWNED);
if (isrc->isrc_irq >= nitems(irq_sources))
return (EINVAL);
if (irq_sources[isrc->isrc_irq] != isrc)
return (EINVAL);
irq_sources[isrc->isrc_irq] = NULL;
isrc->isrc_irq = INTR_IRQ_INVALID; /* just to be safe */
return (0);
}
/*
* Lookup interrupt source by interrupt number (resource handle).
*/
static inline struct intr_irqsrc *
isrc_lookup(u_int irq)
{
if (irq < nitems(irq_sources))
return (irq_sources[irq]);
return (NULL);
}
/*
* Initialize interrupt source and register it into global interrupt table.
*/
int
intr_isrc_register(struct intr_irqsrc *isrc, device_t dev, u_int flags,
const char *fmt, ...)
{
int error;
va_list ap;
bzero(isrc, sizeof(struct intr_irqsrc));
isrc->isrc_dev = dev;
isrc->isrc_irq = INTR_IRQ_INVALID; /* just to be safe */
isrc->isrc_flags = flags;
va_start(ap, fmt);
vsnprintf(isrc->isrc_name, INTR_ISRC_NAMELEN, fmt, ap);
va_end(ap);
mtx_lock(&isrc_table_lock);
error = isrc_alloc_irq(isrc);
if (error != 0) {
mtx_unlock(&isrc_table_lock);
return (error);
}
/*
* Setup interrupt counters, but not for IPI sources. Those are setup
* later and only for used ones (up to INTR_IPI_COUNT) to not exhaust
* our counter pool.
*/
if ((isrc->isrc_flags & INTR_ISRCF_IPI) == 0)
isrc_setup_counters(isrc);
mtx_unlock(&isrc_table_lock);
return (0);
}
/*
* Deregister interrupt source from global interrupt table.
*/
int
intr_isrc_deregister(struct intr_irqsrc *isrc)
{
int error;
mtx_lock(&isrc_table_lock);
if ((isrc->isrc_flags & INTR_ISRCF_IPI) == 0)
isrc_release_counters(isrc);
error = isrc_free_irq(isrc);
mtx_unlock(&isrc_table_lock);
return (error);
}
#ifdef SMP
/*
* A support function for a PIC to decide if provided ISRC should be inited
* on given cpu. The logic of INTR_ISRCF_BOUND flag and isrc_cpu member of
* struct intr_irqsrc is the following:
*
* If INTR_ISRCF_BOUND is set, the ISRC should be inited only on cpus
* set in isrc_cpu. If not, the ISRC should be inited on every cpu and
* isrc_cpu is kept consistent with it. Thus isrc_cpu is always correct.
*/
bool
intr_isrc_init_on_cpu(struct intr_irqsrc *isrc, u_int cpu)
{
if (isrc->isrc_handlers == 0)
return (false);
if ((isrc->isrc_flags & (INTR_ISRCF_PPI | INTR_ISRCF_IPI)) == 0)
return (false);
if (isrc->isrc_flags & INTR_ISRCF_BOUND)
return (CPU_ISSET(cpu, &isrc->isrc_cpu));
CPU_SET(cpu, &isrc->isrc_cpu);
return (true);
}
#endif
-static struct intr_dev_data *
-intr_ddata_alloc(u_int extsize)
-{
- struct intr_dev_data *ddata;
- size_t size;
-
- size = sizeof(*ddata);
- ddata = malloc(size + extsize, M_INTRNG, M_WAITOK | M_ZERO);
-
- mtx_lock(&isrc_table_lock);
- if (intr_ddata_first_unused >= nitems(intr_ddata_tab)) {
- mtx_unlock(&isrc_table_lock);
- free(ddata, M_INTRNG);
- return (NULL);
- }
- intr_ddata_tab[intr_ddata_first_unused] = ddata;
- ddata->idd_irq = IRQ_DDATA_BASE + intr_ddata_first_unused++;
- mtx_unlock(&isrc_table_lock);
-
- ddata->idd_data = (struct intr_map_data *)((uintptr_t)ddata + size);
- return (ddata);
-}
-
-static struct intr_irqsrc *
-intr_ddata_lookup(u_int irq, struct intr_map_data **datap)
-{
- int error;
- struct intr_irqsrc *isrc;
- struct intr_dev_data *ddata;
-
- isrc = isrc_lookup(irq);
- if (isrc != NULL) {
- if (datap != NULL)
- *datap = NULL;
- return (isrc);
- }
-
- if (irq < IRQ_DDATA_BASE)
- return (NULL);
-
- irq -= IRQ_DDATA_BASE;
- if (irq >= nitems(intr_ddata_tab))
- return (NULL);
-
- ddata = intr_ddata_tab[irq];
- if (ddata->idd_isrc == NULL) {
- error = intr_map_irq(ddata->idd_dev, ddata->idd_xref,
- ddata->idd_data, &irq);
- if (error != 0)
- return (NULL);
- ddata->idd_isrc = isrc_lookup(irq);
- }
- if (datap != NULL)
- *datap = ddata->idd_data;
- return (ddata->idd_isrc);
-}
-
-#ifdef DEV_ACPI
-/*
- * Map interrupt source according to ACPI info into framework. If such mapping
- * does not exist, create it. Return unique interrupt number (resource handle)
- * associated with mapped interrupt source.
- */
-u_int
-intr_acpi_map_irq(device_t dev, u_int irq, enum intr_polarity pol,
- enum intr_trigger trig)
-{
- struct intr_map_data_acpi *daa;
- struct intr_dev_data *ddata;
-
- ddata = intr_ddata_alloc(sizeof(struct intr_map_data_acpi));
- if (ddata == NULL)
- return (INTR_IRQ_INVALID); /* no space left */
-
- ddata->idd_dev = dev;
- ddata->idd_data->type = INTR_MAP_DATA_ACPI;
-
- daa = (struct intr_map_data_acpi *)ddata->idd_data;
- daa->irq = irq;
- daa->pol = pol;
- daa->trig = trig;
-
- return (ddata->idd_irq);
-}
-#endif
-
-/*
- * Store GPIO interrupt decription in framework and return unique interrupt
- * number (resource handle) associated with it.
- */
-u_int
-intr_gpio_map_irq(device_t dev, u_int pin_num, u_int pin_flags, u_int intr_mode)
-{
- struct intr_dev_data *ddata;
- struct intr_map_data_gpio *dag;
-
- ddata = intr_ddata_alloc(sizeof(struct intr_map_data_gpio));
- if (ddata == NULL)
- return (INTR_IRQ_INVALID); /* no space left */
-
- ddata->idd_dev = dev;
- ddata->idd_data->type = INTR_MAP_DATA_GPIO;
-
- dag = (struct intr_map_data_gpio *)ddata->idd_data;
- dag->gpio_pin_num = pin_num;
- dag->gpio_pin_flags = pin_flags;
- dag->gpio_intr_mode = intr_mode;
- return (ddata->idd_irq);
-}
-
#ifdef INTR_SOLO
/*
* Setup filter into interrupt source.
*/
static int
iscr_setup_filter(struct intr_irqsrc *isrc, const char *name,
intr_irq_filter_t *filter, void *arg, void **cookiep)
{
if (filter == NULL)
return (EINVAL);
mtx_lock(&isrc_table_lock);
/*
* Make sure that we do not mix the two ways
* how we handle interrupt sources.
*/
if (isrc->isrc_filter != NULL || isrc->isrc_event != NULL) {
mtx_unlock(&isrc_table_lock);
return (EBUSY);
}
isrc->isrc_filter = filter;
isrc->isrc_arg = arg;
isrc_update_name(isrc, name);
mtx_unlock(&isrc_table_lock);
*cookiep = isrc;
return (0);
}
#endif
/*
* Interrupt source pre_ithread method for MI interrupt framework.
*/
static void
intr_isrc_pre_ithread(void *arg)
{
struct intr_irqsrc *isrc = arg;
PIC_PRE_ITHREAD(isrc->isrc_dev, isrc);
}
/*
* Interrupt source post_ithread method for MI interrupt framework.
*/
static void
intr_isrc_post_ithread(void *arg)
{
struct intr_irqsrc *isrc = arg;
PIC_POST_ITHREAD(isrc->isrc_dev, isrc);
}
/*
* Interrupt source post_filter method for MI interrupt framework.
*/
static void
intr_isrc_post_filter(void *arg)
{
struct intr_irqsrc *isrc = arg;
PIC_POST_FILTER(isrc->isrc_dev, isrc);
}
/*
* Interrupt source assign_cpu method for MI interrupt framework.
*/
static int
intr_isrc_assign_cpu(void *arg, int cpu)
{
#ifdef SMP
struct intr_irqsrc *isrc = arg;
int error;
if (isrc->isrc_dev != intr_irq_root_dev)
return (EINVAL);
mtx_lock(&isrc_table_lock);
if (cpu == NOCPU) {
CPU_ZERO(&isrc->isrc_cpu);
isrc->isrc_flags &= ~INTR_ISRCF_BOUND;
} else {
CPU_SETOF(cpu, &isrc->isrc_cpu);
isrc->isrc_flags |= INTR_ISRCF_BOUND;
}
/*
* In NOCPU case, it's up to PIC to either leave ISRC on same CPU or
* re-balance it to another CPU or enable it on more CPUs. However,
* PIC is expected to change isrc_cpu appropriately to keep us well
* informed if the call is successful.
*/
if (irq_assign_cpu) {
error = PIC_BIND_INTR(isrc->isrc_dev, isrc);
if (error) {
CPU_ZERO(&isrc->isrc_cpu);
mtx_unlock(&isrc_table_lock);
return (error);
}
}
mtx_unlock(&isrc_table_lock);
return (0);
#else
return (EOPNOTSUPP);
#endif
}
/*
* Create interrupt event for interrupt source.
*/
static int
isrc_event_create(struct intr_irqsrc *isrc)
{
struct intr_event *ie;
int error;
error = intr_event_create(&ie, isrc, 0, isrc->isrc_irq,
intr_isrc_pre_ithread, intr_isrc_post_ithread, intr_isrc_post_filter,
intr_isrc_assign_cpu, "%s:", isrc->isrc_name);
if (error)
return (error);
mtx_lock(&isrc_table_lock);
/*
* Make sure that we do not mix the two ways
* how we handle interrupt sources. Let contested event wins.
*/
#ifdef INTR_SOLO
if (isrc->isrc_filter != NULL || isrc->isrc_event != NULL) {
#else
if (isrc->isrc_event != NULL) {
#endif
mtx_unlock(&isrc_table_lock);
intr_event_destroy(ie);
return (isrc->isrc_event != NULL ? EBUSY : 0);
}
isrc->isrc_event = ie;
mtx_unlock(&isrc_table_lock);
return (0);
}
#ifdef notyet
/*
* Destroy interrupt event for interrupt source.
*/
static void
isrc_event_destroy(struct intr_irqsrc *isrc)
{
struct intr_event *ie;
mtx_lock(&isrc_table_lock);
ie = isrc->isrc_event;
isrc->isrc_event = NULL;
mtx_unlock(&isrc_table_lock);
if (ie != NULL)
intr_event_destroy(ie);
}
#endif
/*
* Add handler to interrupt source.
*/
static int
isrc_add_handler(struct intr_irqsrc *isrc, const char *name,
driver_filter_t filter, driver_intr_t handler, void *arg,
enum intr_type flags, void **cookiep)
{
int error;
if (isrc->isrc_event == NULL) {
error = isrc_event_create(isrc);
if (error)
return (error);
}
error = intr_event_add_handler(isrc->isrc_event, name, filter, handler,
arg, intr_priority(flags), flags, cookiep);
if (error == 0) {
mtx_lock(&isrc_table_lock);
intrcnt_updatename(isrc);
mtx_unlock(&isrc_table_lock);
}
return (error);
}
/*
* Lookup interrupt controller locked.
*/
static inline struct intr_pic *
pic_lookup_locked(device_t dev, intptr_t xref)
{
struct intr_pic *pic;
mtx_assert(&pic_list_lock, MA_OWNED);
if (dev == NULL && xref == 0)
return (NULL);
/* Note that pic->pic_dev is never NULL on registered PIC. */
SLIST_FOREACH(pic, &pic_list, pic_next) {
if (dev == NULL) {
if (xref == pic->pic_xref)
return (pic);
} else if (xref == 0 || pic->pic_xref == 0) {
if (dev == pic->pic_dev)
return (pic);
} else if (xref == pic->pic_xref && dev == pic->pic_dev)
return (pic);
}
return (NULL);
}
/*
* Lookup interrupt controller.
*/
static struct intr_pic *
pic_lookup(device_t dev, intptr_t xref)
{
struct intr_pic *pic;
mtx_lock(&pic_list_lock);
pic = pic_lookup_locked(dev, xref);
mtx_unlock(&pic_list_lock);
return (pic);
}
/*
* Create interrupt controller.
*/
static struct intr_pic *
pic_create(device_t dev, intptr_t xref)
{
struct intr_pic *pic;
mtx_lock(&pic_list_lock);
pic = pic_lookup_locked(dev, xref);
if (pic != NULL) {
mtx_unlock(&pic_list_lock);
return (pic);
}
pic = malloc(sizeof(*pic), M_INTRNG, M_NOWAIT | M_ZERO);
if (pic == NULL) {
mtx_unlock(&pic_list_lock);
return (NULL);
}
pic->pic_xref = xref;
pic->pic_dev = dev;
mtx_init(&pic->pic_child_lock, "pic child lock", NULL, MTX_SPIN);
SLIST_INSERT_HEAD(&pic_list, pic, pic_next);
mtx_unlock(&pic_list_lock);
return (pic);
}
#ifdef notyet
/*
* Destroy interrupt controller.
*/
static void
pic_destroy(device_t dev, intptr_t xref)
{
struct intr_pic *pic;
mtx_lock(&pic_list_lock);
pic = pic_lookup_locked(dev, xref);
if (pic == NULL) {
mtx_unlock(&pic_list_lock);
return;
}
SLIST_REMOVE(&pic_list, pic, intr_pic, pic_next);
mtx_unlock(&pic_list_lock);
free(pic, M_INTRNG);
}
#endif
/*
* Register interrupt controller.
*/
struct intr_pic *
intr_pic_register(device_t dev, intptr_t xref)
{
struct intr_pic *pic;
if (dev == NULL)
return (NULL);
pic = pic_create(dev, xref);
if (pic == NULL)
return (NULL);
pic->pic_flags |= FLAG_PIC;
debugf("PIC %p registered for %s <dev %p, xref %x>\n", pic,
device_get_nameunit(dev), dev, xref);
return (pic);
}
/*
* Unregister interrupt controller.
*/
int
intr_pic_deregister(device_t dev, intptr_t xref)
{
panic("%s: not implemented", __func__);
}
/*
* Mark interrupt controller (itself) as a root one.
*
* Note that only an interrupt controller can really know its position
* in interrupt controller's tree. So root PIC must claim itself as a root.
*
* In FDT case, according to ePAPR approved version 1.1 from 08 April 2011,
* page 30:
* "The root of the interrupt tree is determined when traversal
* of the interrupt tree reaches an interrupt controller node without
* an interrupts property and thus no explicit interrupt parent."
*/
int
intr_pic_claim_root(device_t dev, intptr_t xref, intr_irq_filter_t *filter,
void *arg, u_int ipicount)
{
struct intr_pic *pic;
pic = pic_lookup(dev, xref);
if (pic == NULL) {
device_printf(dev, "not registered\n");
return (EINVAL);
}
KASSERT((pic->pic_flags & FLAG_PIC) != 0,
("%s: Found a non-PIC controller: %s", __func__,
device_get_name(pic->pic_dev)));
if (filter == NULL) {
device_printf(dev, "filter missing\n");
return (EINVAL);
}
/*
* Only one interrupt controllers could be on the root for now.
* Note that we further suppose that there is not threaded interrupt
* routine (handler) on the root. See intr_irq_handler().
*/
if (intr_irq_root_dev != NULL) {
device_printf(dev, "another root already set\n");
return (EBUSY);
}
intr_irq_root_dev = dev;
irq_root_filter = filter;
irq_root_arg = arg;
irq_root_ipicount = ipicount;
debugf("irq root set to %s\n", device_get_nameunit(dev));
return (0);
}
/*
* Add a handler to manage a sub range of a parents interrupts.
*/
struct intr_pic *
intr_pic_add_handler(device_t parent, struct intr_pic *pic,
intr_child_irq_filter_t *filter, void *arg, uintptr_t start,
uintptr_t length)
{
struct intr_pic *parent_pic;
struct intr_pic_child *newchild;
#ifdef INVARIANTS
struct intr_pic_child *child;
#endif
parent_pic = pic_lookup(parent, 0);
if (parent_pic == NULL)
return (NULL);
newchild = malloc(sizeof(*newchild), M_INTRNG, M_WAITOK | M_ZERO);
newchild->pc_pic = pic;
newchild->pc_filter = filter;
newchild->pc_filter_arg = arg;
newchild->pc_start = start;
newchild->pc_length = length;
mtx_lock_spin(&parent_pic->pic_child_lock);
#ifdef INVARIANTS
SLIST_FOREACH(child, &parent_pic->pic_children, pc_next) {
KASSERT(child->pc_pic != pic, ("%s: Adding a child PIC twice",
__func__));
}
#endif
SLIST_INSERT_HEAD(&parent_pic->pic_children, newchild, pc_next);
mtx_unlock_spin(&parent_pic->pic_child_lock);
return (pic);
}
int
intr_map_irq(device_t dev, intptr_t xref, struct intr_map_data *data,
u_int *irqp)
{
int error;
struct intr_irqsrc *isrc;
struct intr_pic *pic;
if (data == NULL)
return (EINVAL);
pic = pic_lookup(dev, xref);
if (pic == NULL)
return (ESRCH);
KASSERT((pic->pic_flags & FLAG_PIC) != 0,
("%s: Found a non-PIC controller: %s", __func__,
device_get_name(pic->pic_dev)));
error = PIC_MAP_INTR(pic->pic_dev, data, &isrc);
if (error == 0)
*irqp = isrc->isrc_irq;
return (error);
}
int
intr_alloc_irq(device_t dev, struct resource *res)
{
struct intr_map_data *data;
struct intr_irqsrc *isrc;
KASSERT(rman_get_start(res) == rman_get_end(res),
("%s: more interrupts in resource", __func__));
- data = rman_get_virtual(res);
- if (data == NULL)
- isrc = intr_ddata_lookup(rman_get_start(res), &data);
- else
- isrc = isrc_lookup(rman_get_start(res));
+ isrc = isrc_lookup(rman_get_start(res));
if (isrc == NULL)
return (EINVAL);
+ data = rman_get_virtual(res);
return (PIC_ALLOC_INTR(isrc->isrc_dev, isrc, res, data));
}
int
intr_release_irq(device_t dev, struct resource *res)
{
struct intr_map_data *data;
struct intr_irqsrc *isrc;
KASSERT(rman_get_start(res) == rman_get_end(res),
("%s: more interrupts in resource", __func__));
- data = rman_get_virtual(res);
- if (data == NULL)
- isrc = intr_ddata_lookup(rman_get_start(res), &data);
- else
- isrc = isrc_lookup(rman_get_start(res));
+ isrc = isrc_lookup(rman_get_start(res));
if (isrc == NULL)
return (EINVAL);
+ data = rman_get_virtual(res);
return (PIC_RELEASE_INTR(isrc->isrc_dev, isrc, res, data));
}
int
intr_setup_irq(device_t dev, struct resource *res, driver_filter_t filt,
driver_intr_t hand, void *arg, int flags, void **cookiep)
{
int error;
struct intr_map_data *data;
struct intr_irqsrc *isrc;
const char *name;
KASSERT(rman_get_start(res) == rman_get_end(res),
("%s: more interrupts in resource", __func__));
- data = rman_get_virtual(res);
- if (data == NULL)
- isrc = intr_ddata_lookup(rman_get_start(res), &data);
- else
- isrc = isrc_lookup(rman_get_start(res));
+ isrc = isrc_lookup(rman_get_start(res));
if (isrc == NULL)
return (EINVAL);
+ data = rman_get_virtual(res);
name = device_get_nameunit(dev);
#ifdef INTR_SOLO
/*
* Standard handling is done through MI interrupt framework. However,
* some interrupts could request solely own special handling. This
* non standard handling can be used for interrupt controllers without
* handler (filter only), so in case that interrupt controllers are
* chained, MI interrupt framework is called only in leaf controller.
*
* Note that root interrupt controller routine is served as well,
* however in intr_irq_handler(), i.e. main system dispatch routine.
*/
if (flags & INTR_SOLO && hand != NULL) {
debugf("irq %u cannot solo on %s\n", irq, name);
return (EINVAL);
}
if (flags & INTR_SOLO) {
error = iscr_setup_filter(isrc, name, (intr_irq_filter_t *)filt,
arg, cookiep);
debugf("irq %u setup filter error %d on %s\n", irq, error,
name);
} else
#endif
{
error = isrc_add_handler(isrc, name, filt, hand, arg, flags,
cookiep);
debugf("irq %u add handler error %d on %s\n", irq, error, name);
}
if (error != 0)
return (error);
mtx_lock(&isrc_table_lock);
error = PIC_SETUP_INTR(isrc->isrc_dev, isrc, res, data);
if (error == 0) {
isrc->isrc_handlers++;
if (isrc->isrc_handlers == 1)
PIC_ENABLE_INTR(isrc->isrc_dev, isrc);
}
mtx_unlock(&isrc_table_lock);
if (error != 0)
intr_event_remove_handler(*cookiep);
return (error);
}
int
intr_teardown_irq(device_t dev, struct resource *res, void *cookie)
{
int error;
struct intr_map_data *data;
struct intr_irqsrc *isrc;
KASSERT(rman_get_start(res) == rman_get_end(res),
("%s: more interrupts in resource", __func__));
- data = rman_get_virtual(res);
- if (data == NULL)
- isrc = intr_ddata_lookup(rman_get_start(res), &data);
- else
- isrc = isrc_lookup(rman_get_start(res));
+ isrc = isrc_lookup(rman_get_start(res));
if (isrc == NULL || isrc->isrc_handlers == 0)
return (EINVAL);
+ data = rman_get_virtual(res);
+
#ifdef INTR_SOLO
if (isrc->isrc_filter != NULL) {
if (isrc != cookie)
return (EINVAL);
mtx_lock(&isrc_table_lock);
isrc->isrc_filter = NULL;
isrc->isrc_arg = NULL;
isrc->isrc_handlers = 0;
PIC_DISABLE_INTR(isrc->isrc_dev, isrc);
PIC_TEARDOWN_INTR(isrc->isrc_dev, isrc, res, data);
isrc_update_name(isrc, NULL);
mtx_unlock(&isrc_table_lock);
return (0);
}
#endif
if (isrc != intr_handler_source(cookie))
return (EINVAL);
error = intr_event_remove_handler(cookie);
if (error == 0) {
mtx_lock(&isrc_table_lock);
isrc->isrc_handlers--;
if (isrc->isrc_handlers == 0)
PIC_DISABLE_INTR(isrc->isrc_dev, isrc);
PIC_TEARDOWN_INTR(isrc->isrc_dev, isrc, res, data);
intrcnt_updatename(isrc);
mtx_unlock(&isrc_table_lock);
}
return (error);
}
int
intr_describe_irq(device_t dev, struct resource *res, void *cookie,
const char *descr)
{
int error;
struct intr_irqsrc *isrc;
KASSERT(rman_get_start(res) == rman_get_end(res),
("%s: more interrupts in resource", __func__));
- isrc = intr_ddata_lookup(rman_get_start(res), NULL);
+ isrc = isrc_lookup(rman_get_start(res));
if (isrc == NULL || isrc->isrc_handlers == 0)
return (EINVAL);
#ifdef INTR_SOLO
if (isrc->isrc_filter != NULL) {
if (isrc != cookie)
return (EINVAL);
mtx_lock(&isrc_table_lock);
isrc_update_name(isrc, descr);
mtx_unlock(&isrc_table_lock);
return (0);
}
#endif
error = intr_event_describe_handler(isrc->isrc_event, cookie, descr);
if (error == 0) {
mtx_lock(&isrc_table_lock);
intrcnt_updatename(isrc);
mtx_unlock(&isrc_table_lock);
}
return (error);
}
#ifdef SMP
int
intr_bind_irq(device_t dev, struct resource *res, int cpu)
{
struct intr_irqsrc *isrc;
KASSERT(rman_get_start(res) == rman_get_end(res),
("%s: more interrupts in resource", __func__));
- isrc = intr_ddata_lookup(rman_get_start(res), NULL);
+ isrc = isrc_lookup(rman_get_start(res));
if (isrc == NULL || isrc->isrc_handlers == 0)
return (EINVAL);
#ifdef INTR_SOLO
if (isrc->isrc_filter != NULL)
return (intr_isrc_assign_cpu(isrc, cpu));
#endif
return (intr_event_bind(isrc->isrc_event, cpu));
}
/*
* Return the CPU that the next interrupt source should use.
* For now just returns the next CPU according to round-robin.
*/
u_int
intr_irq_next_cpu(u_int last_cpu, cpuset_t *cpumask)
{
if (!irq_assign_cpu || mp_ncpus == 1)
return (PCPU_GET(cpuid));
do {
last_cpu++;
if (last_cpu > mp_maxid)
last_cpu = 0;
} while (!CPU_ISSET(last_cpu, cpumask));
return (last_cpu);
}
/*
* Distribute all the interrupt sources among the available
* CPUs once the AP's have been launched.
*/
static void
intr_irq_shuffle(void *arg __unused)
{
struct intr_irqsrc *isrc;
u_int i;
if (mp_ncpus == 1)
return;
mtx_lock(&isrc_table_lock);
irq_assign_cpu = TRUE;
for (i = 0; i < NIRQ; i++) {
isrc = irq_sources[i];
if (isrc == NULL || isrc->isrc_handlers == 0 ||
isrc->isrc_flags & (INTR_ISRCF_PPI | INTR_ISRCF_IPI))
continue;
if (isrc->isrc_event != NULL &&
isrc->isrc_flags & INTR_ISRCF_BOUND &&
isrc->isrc_event->ie_cpu != CPU_FFS(&isrc->isrc_cpu) - 1)
panic("%s: CPU inconsistency", __func__);
if ((isrc->isrc_flags & INTR_ISRCF_BOUND) == 0)
CPU_ZERO(&isrc->isrc_cpu); /* start again */
/*
* We are in wicked position here if the following call fails
* for bound ISRC. The best thing we can do is to clear
* isrc_cpu so inconsistency with ie_cpu will be detectable.
*/
if (PIC_BIND_INTR(isrc->isrc_dev, isrc) != 0)
CPU_ZERO(&isrc->isrc_cpu);
}
mtx_unlock(&isrc_table_lock);
}
SYSINIT(intr_irq_shuffle, SI_SUB_SMP, SI_ORDER_SECOND, intr_irq_shuffle, NULL);
#else
u_int
intr_irq_next_cpu(u_int current_cpu, cpuset_t *cpumask)
{
return (PCPU_GET(cpuid));
}
#endif
/*
* Register a MSI/MSI-X interrupt controller
*/
int
intr_msi_register(device_t dev, intptr_t xref)
{
struct intr_pic *pic;
if (dev == NULL)
return (EINVAL);
pic = pic_create(dev, xref);
if (pic == NULL)
return (ENOMEM);
pic->pic_flags |= FLAG_MSI;
debugf("PIC %p registered for %s <dev %p, xref %jx>\n", pic,
device_get_nameunit(dev), dev, (uintmax_t)xref);
return (0);
}
int
intr_alloc_msi(device_t pci, device_t child, intptr_t xref, int count,
int maxcount, int *irqs)
{
struct intr_irqsrc **isrc;
struct intr_pic *pic;
device_t pdev;
int err, i;
pic = pic_lookup(NULL, xref);
if (pic == NULL)
return (ESRCH);
KASSERT((pic->pic_flags & FLAG_MSI) != 0,
("%s: Found a non-MSI controller: %s", __func__,
device_get_name(pic->pic_dev)));
isrc = malloc(sizeof(*isrc) * count, M_INTRNG, M_WAITOK);
err = MSI_ALLOC_MSI(pic->pic_dev, child, count, maxcount, &pdev, isrc);
if (err == 0) {
for (i = 0; i < count; i++) {
irqs[i] = isrc[i]->isrc_irq;
}
}
free(isrc, M_INTRNG);
return (err);
}
int
intr_release_msi(device_t pci, device_t child, intptr_t xref, int count,
int *irqs)
{
struct intr_irqsrc **isrc;
struct intr_pic *pic;
int i, err;
pic = pic_lookup(NULL, xref);
if (pic == NULL)
return (ESRCH);
KASSERT((pic->pic_flags & FLAG_MSI) != 0,
("%s: Found a non-MSI controller: %s", __func__,
device_get_name(pic->pic_dev)));
isrc = malloc(sizeof(*isrc) * count, M_INTRNG, M_WAITOK);
for (i = 0; i < count; i++) {
isrc[i] = isrc_lookup(irqs[i]);
if (isrc == NULL) {
free(isrc, M_INTRNG);
return (EINVAL);
}
}
err = MSI_RELEASE_MSI(pic->pic_dev, child, count, isrc);
free(isrc, M_INTRNG);
return (err);
}
int
intr_alloc_msix(device_t pci, device_t child, intptr_t xref, int *irq)
{
struct intr_irqsrc *isrc;
struct intr_pic *pic;
device_t pdev;
int err;
pic = pic_lookup(NULL, xref);
if (pic == NULL)
return (ESRCH);
KASSERT((pic->pic_flags & FLAG_MSI) != 0,
("%s: Found a non-MSI controller: %s", __func__,
device_get_name(pic->pic_dev)));
err = MSI_ALLOC_MSIX(pic->pic_dev, child, &pdev, &isrc);
if (err != 0)
return (err);
*irq = isrc->isrc_irq;
return (0);
}
int
intr_release_msix(device_t pci, device_t child, intptr_t xref, int irq)
{
struct intr_irqsrc *isrc;
struct intr_pic *pic;
int err;
pic = pic_lookup(NULL, xref);
if (pic == NULL)
return (ESRCH);
KASSERT((pic->pic_flags & FLAG_MSI) != 0,
("%s: Found a non-MSI controller: %s", __func__,
device_get_name(pic->pic_dev)));
isrc = isrc_lookup(irq);
if (isrc == NULL)
return (EINVAL);
err = MSI_RELEASE_MSIX(pic->pic_dev, child, isrc);
return (err);
}
int
intr_map_msi(device_t pci, device_t child, intptr_t xref, int irq,
uint64_t *addr, uint32_t *data)
{
struct intr_irqsrc *isrc;
struct intr_pic *pic;
int err;
pic = pic_lookup(NULL, xref);
if (pic == NULL)
return (ESRCH);
KASSERT((pic->pic_flags & FLAG_MSI) != 0,
("%s: Found a non-MSI controller: %s", __func__,
device_get_name(pic->pic_dev)));
isrc = isrc_lookup(irq);
if (isrc == NULL)
return (EINVAL);
err = MSI_MAP_MSI(pic->pic_dev, child, isrc, addr, data);
return (err);
}
void dosoftints(void);
void
dosoftints(void)
{
}
#ifdef SMP
/*
* Init interrupt controller on another CPU.
*/
void
intr_pic_init_secondary(void)
{
/*
* QQQ: Only root PIC is aware of other CPUs ???
*/
KASSERT(intr_irq_root_dev != NULL, ("%s: no root attached", __func__));
//mtx_lock(&isrc_table_lock);
PIC_INIT_SECONDARY(intr_irq_root_dev);
//mtx_unlock(&isrc_table_lock);
}
#endif
#ifdef DDB
DB_SHOW_COMMAND(irqs, db_show_irqs)
{
u_int i, irqsum;
u_long num;
struct intr_irqsrc *isrc;
for (irqsum = 0, i = 0; i < NIRQ; i++) {
isrc = irq_sources[i];
if (isrc == NULL)
continue;
num = isrc->isrc_count != NULL ? isrc->isrc_count[0] : 0;
db_printf("irq%-3u <%s>: cpu %02lx%s cnt %lu\n", i,
isrc->isrc_name, isrc->isrc_cpu.__bits[0],
isrc->isrc_flags & INTR_ISRCF_BOUND ? " (bound)" : "", num);
irqsum += num;
}
db_printf("irq total %u\n", irqsum);
}
#endif
Index: projects/vnet/sys/net/flowtable.c
===================================================================
--- projects/vnet/sys/net/flowtable.c (revision 301546)
+++ projects/vnet/sys/net/flowtable.c (revision 301547)
@@ -1,1184 +1,1184 @@
/*-
* Copyright (c) 2014 Gleb Smirnoff
* Copyright (c) 2008-2010, BitGravity Inc.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
*
* 2. Neither the name of the BitGravity Corporation nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
#include "opt_route.h"
#include "opt_mpath.h"
#include "opt_ddb.h"
#include "opt_inet.h"
#include "opt_inet6.h"
#include
__FBSDID("$FreeBSD$");
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#ifdef INET6
#include