diff --git a/share/man/man4/ena.4 b/share/man/man4/ena.4 index befc2bb2ae0b..50a180871627 100644 --- a/share/man/man4/ena.4 +++ b/share/man/man4/ena.4 @@ -1,534 +1,534 @@ .\" SPDX-License-Identifier: BSD-2-Clause .\" -.\" Copyright (c) 2015-2022 Amazon.com, Inc. or its affiliates. +.\" Copyright (c) 2015-2023 Amazon.com, Inc. or its affiliates. .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in .\" the documentation and/or other materials provided with the .\" distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS .\" "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT .\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR .\" A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT .\" OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, .\" SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT .\" LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, .\" DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY .\" THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT .\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE .\" OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. .\" .Dd June 4, 2021 .Dt ENA 4 .Os .Sh NAME .Nm ena .Nd "FreeBSD kernel driver for Elastic Network Adapter (ENA) family" .Sh SYNOPSIS To compile this driver into the kernel, place the following line in the kernel configuration file: .Bd -ragged -offset indent .Cd "device ena" .Ed .Pp Alternatively, to load the driver as a module at boot time, place the following line in .Xr loader.conf 5 : .Bd -literal -offset indent if_ena_load="YES" .Ed .Sh DESCRIPTION The ENA is a networking interface designed to make good use of modern CPU features and system architectures. .Pp The ENA device exposes a lightweight management interface with a minimal set of memory mapped registers and extendable command set through an Admin Queue. .Pp The driver supports a range of ENA devices, is link-speed independent (i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc.), and has a negotiated and extendable feature set. .Pp Some ENA devices support SR-IOV. This driver is used for both the SR-IOV Physical Function (PF) and Virtual Function (VF) devices. .Pp The ENA devices enable high speed and low overhead network traffic processing by providing multiple Tx/Rx queue pairs (the maximum number is advertised by the device via the Admin Queue), a dedicated MSI-X interrupt vector per Tx/Rx queue pair, and CPU cacheline optimized data placement. .Pp When RSS is enabled, each Tx/Rx queue pair is bound to a corresponding CPU core and its NUMA domain. The order of those bindings is based on the RSS bucket mapping. For builds with RSS support disabled, the CPU and NUMA management is left to the kernel. Receive-side scaling (RSS) is supported for multi-core scaling. 
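.Pp As an illustration, the number of Tx/Rx queue pairs currently allocated by the driver can be read at run time through the per-device .Xr sysctl 8 tree described in the .Sx LOADER TUNABLES section below:
.Bd -literal -offset indent
# assumes the first ENA adapter in the system is ena0
sysctl dev.ena.0.io_queues_nb
.Ed
.Pp The same node can also be written to change the number of queue pairs, as described later in this manual page.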
.Pp The .Nm driver and its corresponding devices implement health monitoring mechanisms such as watchdog, enabling the device and driver to recover in a manner transparent to the application, as well as debug logs. .Pp Some of the ENA devices support a working mode called Low-latency Queue (LLQ), which saves several more microseconds. .Pp Support for the .Xr netmap 4 framework is provided by the .Nm driver. The kernel must be built with the DEV_NETMAP option to use this feature. .Sh HARDWARE Supported PCI vendor ID/device IDs: .Pp .Bl -bullet -compact .It 1d0f:0ec2 - ENA PF .It 1d0f:1ec2 - ENA PF with LLQ support .It 1d0f:ec20 - ENA VF .It 1d0f:ec21 - ENA VF with LLQ support .El .Sh LOADER TUNABLES The .Nm driver's behavior can be changed using run-time or boot-time sysctl arguments. The boot-time arguments can be set at the .Xr loader 8 prompt before booting the kernel, or stored in .Xr loader.conf 5 . The run-time arguments can be set using the .Xr sysctl 8 command. .Pp Boot-time tunables: .Bl -tag -width indent .It Va hw.ena.enable_9k_mbufs Use 9k mbufs for the Rx descriptors. The default is 0. If the node value is set to 1, 9k mbufs will be used for the Rx buffers. If set to 0, page size mbufs will be used instead. .Pp Using 9k buffers for Rx can improve Rx throughput, but in low memory conditions it might increase allocation time, as the system has to look for 3 contiguous pages. This can further lead to OS instability, together with ENA driver resets and NVMe timeouts. If network performance is critical and memory capacity is sufficient, the 9k mbufs can be used. .It Va hw.ena.force_large_llq_headers Force the driver to use large LLQ headers (224 bytes). The default is 0. If the node value is set to 0, the regular size LLQ header will be used, which is 96 bytes. In some cases, the packet header can be bigger than this (for example, IPv6 with multiple extensions). In such a situation, the large LLQ headers should be used by setting this node value to 1. This will take effect only if the device supports both LLQ and large LLQ headers. Otherwise, it will fall back to the non-LLQ mode or the regular header size. .Pp Increasing the LLQ header size reduces the size of the Tx queue by half, so it may affect the number of dropped Tx packets. .El .Pp Run-time tunables: .Bl -tag -width indent .It Va hw.ena.log_level Controls extra logging verbosity of the driver. The default is 2. The higher the logging level, the more logs will be printed out. 0 means all extra logs are disabled and only error logs will be printed out. The default value (2) reports errors and warnings and is verbose about driver operation. .Pp The possible log levels are: .Pp .Bl -bullet -compact .It 0 - ENA_ERR - Enable driver error messages and ena_com error logs. .It 1 - ENA_WARN - Enable logs for non-critical errors. .It 2 - ENA_INFO - Make the driver more verbose about its actions. .It 3 - ENA_DBG - Enable debug logs. .El .Pp NOTE: In order to enable logging on the Tx/Rx data path, the driver must be compiled with the ENA_LOG_IO_ENABLE compilation flag. .Pp Example: To enable logs for errors and warnings, the following command should be used: .Bd -literal -offset indent sysctl hw.ena.log_level=1 .Ed .It Va dev.ena.X.io_queues_nb Number of currently allocated and used IO queues. The default is max_num_io_queues. Controls the number of IO queue pairs (Tx/Rx).
As this call has to reallocate the queues, it will reset the interface and restart all the queues; this means that anything currently held in the queues will be lost, leading to potential packet drops. .Pp This call can fail if the system isn't able to provide the driver with enough resources. In that situation, the driver will try to revert to the previous number of IO queues. If this also fails, the device reset will be triggered. .Pp Example: To use only 2 Tx and Rx queues for the device ena1, the following command should be used: .Bd -literal -offset indent sysctl dev.ena.1.io_queues_nb=2 .Ed .It Va dev.ena.X.rx_queue_size Size of the Rx queue. The default is 1024. Controls the number of IO descriptors for each Rx queue. The user may want to increase the Rx queue size if they observe a high number of Rx drops in the driver's statistics. For performance reasons, the Rx queue size must be a power of 2. .Pp This call can fail if the system isn't able to provide the driver with enough resources. In that situation, the driver will try to revert to the previous number of descriptors. If this also fails, the device reset will be triggered. .Pp Example: To increase the Rx ring size to 8K descriptors for the device ena0, the following command should be used: .Bd -literal -offset indent sysctl dev.ena.0.rx_queue_size=8192 .Ed .It Va dev.ena.X.buf_ring_size Size of the Tx buffer ring (drbr). The default is 4096. Input must be a power of 2. Controls the number of mbufs that can be held in the Tx buffer ring. The drbr is used as a multiple-producer, single-consumer lockless ring for buffering extra mbufs coming from the stack in case the Tx procedure is busy sending the packets, or the Tx ring is full. Increasing the size of the buffer ring may reduce the number of Tx packets being dropped in case of a big Tx burst, which cannot be handled by the IO queue immediately. Each Tx queue has its own drbr. .Pp It is recommended to keep the drbr at least at the default value, but in case the system lacks the resources, it can be reduced. This call can fail if the system is not able to provide the driver with enough resources. In that situation, the driver will try to revert to the previous drbr size and trigger the device reset. .Pp Example: To set the drbr size for interface ena0 to 2048, the following command should be used: .Bd -literal -offset indent sysctl dev.ena.0.buf_ring_size=2048 .Ed .It Va dev.ena.X.eni_metrics.sample_interval Interval in seconds for updating ENI metrics. The default is 0. Determines how often (if ever) the ENI metrics should be updated. The ENI metrics are updated asynchronously by a timer service in order to avoid overloading the admin queue with sysctl node reads. The value in this node controls the interval between issuing admin commands to the device, which will update the ENI metrics values. .Pp If an application is periodically monitoring the eni_metrics, the ENI metrics interval can be adjusted accordingly. Value 0 turns off the update completely. Value 1 is the minimum interval and is equal to 1 second. The maximum allowed update interval is 1 hour. .Pp Example: To update ENI metrics for the device ena1 every 10 seconds, the following command should be used: .Bd -literal -offset indent sysctl dev.ena.1.eni_metrics.sample_interval=10 .Ed .It Va dev.ena.X.rss.indir_table_size RSS indirection table size. The default is 128. Returns the number of entries in the RSS indirection table.
.Pp Example: To read the RSS indirection table size, the following command should be used: .Bd -literal -offset indent sysctl dev.ena.0.rss.indir_table_size .Ed .It Va dev.ena.X.rss.indir_table RSS indirection table mapping. The default is a list of x:y key-pairs of indir_table_size length. Updates selected indices of the RSS indirection table. .Pp The entry string consists of one or more x:y key-pairs, where x stands for the table index and y for its new value. Table indices that don't need to be updated can be omitted from the string and will retain their existing values. .Pp If an index is entered more than once, the last value is used. .Pp Example: To update two selected indices in the RSS indirection table, e.g. setting index 0 to queue 5 and then index 5 to queue 0, the following command should be used: .Bd -literal -offset indent sysctl dev.ena.0.rss.indir_table="0:5 5:0" .Ed .It Va dev.ena.X.rss.key RSS hash key. The default is a randomly generated, 40 bytes long hash key. Controls the RSS Toeplitz hash algorithm key value. .Pp Only available when the driver is compiled without kernel-side RSS support. .Pp Example: To change the RSS hash key value to .Pp 0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2, .br 0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0, .br 0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4, .br 0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c, .br 0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa .Pp the following command should be used: .Bd -literal -offset indent sysctl dev.ena.0.rss.key=6d5a56da255b0ec24167253d43a38fb0d0ca2bcbae7b30b477cb2da38030f20c6a42b73bbeac01fa .Ed .El .Sh DIAGNOSTICS .Ss Device initialization phase .Bl -diag .It ena%d: failed to init mmio read less .Pp Error occurred during initialization of the mmio register read request. .It ena%d: Can not reset device .Pp Device could not be reset. .br Device may not be responding or is already being reset. .It ena%d: device version is too low .Pp Version of the controller is too old and it is not supported by the driver. .It ena%d: Invalid dma width value %d .Pp The controller is unable to request dma transaction width. .br Device stopped responding or it demanded an invalid value. .It ena%d: Can not initialize ena admin queue with device .Pp Initialization of the Admin Queue failed. .br Device may not be responding or there was a problem with initialization of the resources. .It ena%d: Cannot get attribute for ena device rc: %d .Pp Failed to get attributes of the device from the controller. .It ena%d: Cannot configure aenq groups rc: %d .Pp Errors occurred when trying to configure AENQ groups. .El .Ss Driver initialization/shutdown phase .Bl -diag .It ena%d: PCI resource allocation failed! .It ena%d: failed to pmap registers bar .It ena%d: can not allocate ifnet structure .It ena%d: Error with network interface setup .It ena%d: Failed to enable and set the admin interrupts .It ena%d: Error, MSI-X is already enabled .It ena%d: Failed to enable MSIX, vectors %d rc %d .It ena%d: Not enough number of MSI-X allocated: %d .It ena%d: Error with MSI-X enablement .It ena%d: could not allocate irq vector: %d .It ena%d: unable to allocate bus resource: registers! .It ena%d: unable to allocate bus resource: msix! .Pp Resource allocation failed when initializing the device. .br Driver will not be attached. .It ena%d: ENA device init failed (err: %d) .It ena%d: Cannot initialize device .Pp Device initialization failed. .br Driver will not be attached.
.It ena%d: failed to register interrupt handler for irq %ju: %d .Pp Error occurred when trying to register Admin Queue interrupt handler. .It ena%d: Cannot setup mgmnt queue intr .Pp Error occurred during configuration of the Admin Queue interrupts. .It ena%d: Enable MSI-X failed .Pp Configuration of the MSI-X for Admin Queue failed. .br There could be a lack of resources, or the interrupts could not be configured. .br Driver will not be attached. .It ena%d: VLAN is in use, detach first .Pp VLANs are being used when trying to detach the driver. .br VLANs must be detached first and then the detach routine has to be called again. .It ena%d: Unmapped RX DMA tag associations .It ena%d: Unmapped TX DMA tag associations .Pp Error occurred when trying to destroy RX/TX DMA tag. .It ena%d: Cannot init indirect table .It ena%d: Cannot fill indirect table .It ena%d: Cannot fill hash function .It ena%d: Cannot fill hash control .It ena%d: WARNING: RSS was not properly initialized, it will affect bandwidth .Pp Error occurred during initialization of one of the RSS resources. .br The device will work with reduced performance because all RX packets will be passed to queue 0 and there will be no hash information. .It ena%d: LLQ is not supported. Fallback to host mode policy. .It ena%d: Failed to configure the device mode. Fallback to host mode policy. .It ena%d: unable to allocate LLQ bar resource. Fallback to host mode policy. .Pp Error occurred during Low-latency Queue mode setup. .br The device will work, but without the LLQ performance gain. .It ena%d: failed to enable write combining. .Pp Error occurred while setting the Write Combining mode, required for the LLQ. .It ena%d: failed to tear down irq: %d .It ena%d: dev has no parent while releasing res for irq: %d .Pp Release of the interrupts failed. .El .Ss Additional diagnostic .Bl -diag .It ena%d: Invalid MTU setting. new_mtu: %d max_mtu: %d min mtu: %d .Pp Requested MTU value is not supported and will not be set. .It ena%d: Failed to set MTU to %d .Pp This message appears when either the MTU change feature is not supported, or a device communication error has occurred. .It ena%d: Keep alive watchdog timeout. .Pp Device stopped responding and will be reset. .It ena%d: Found a Tx that wasn't completed on time, qid %d, index %d. .Pp Packet was pushed to the NIC but not sent within the given time limit. .br It may be caused by a hang of the IO queue. .It ena%d: The number of lost tx completion is above the threshold (%d > %d). Reset the device .Pp If too many Tx packets were not completed on time, the device is going to be reset. .br It may be caused by a hung queue or device. .It ena%d: Trigger reset is on .Pp Device will be reset. .br Reset is triggered either by the watchdog or if too many TX packets were not completed on time. .It ena%d: device reset scheduled but trigger_reset is off .Pp Reset task has been triggered, but the driver did not request it. .br Device reset will not be performed. .It ena%d: Device reset failed .Pp Error occurred while trying to reset the device. .It ena%d: Cannot initialize device .It ena%d: Error, mac address are different .It ena%d: Error, device max mtu is smaller than ifp MTU .It ena%d: Validation of device parameters failed .It ena%d: Enable MSI-X failed .It ena%d: Failed to create I/O queues .It ena%d: Reset attempt failed. Can not reset the device .Pp Error occurred while trying to restore the device after reset. .It ena%d: Device reset completed successfully, Driver info: %s .Pp Device has been correctly restored after reset and is ready to use.
.It ena%d: Allocation for Tx Queue %u failed .It ena%d: Allocation for Rx Queue %u failed .It ena%d: Unable to create Rx DMA map for buffer %d .It ena%d: Failed to create io TX queue #%d rc: %d .It ena%d: Failed to get TX queue handlers. TX queue num %d rc: %d .It ena%d: Failed to create io RX queue[%d] rc: %d .It ena%d: Failed to get RX queue handlers. RX queue num %d rc: %d .It ena%d: could not allocate irq vector: %d .It ena%d: failed to register interrupt handler for irq %ju: %d .Pp IO resources initialization failed. .br Interface will not be brought up. .It ena%d: LRO[%d] Initialization failed! .Pp Initialization of the LRO for the RX ring failed. .It ena%d: failed to alloc buffer for rx queue .It ena%d: failed to add buffer for rx queue %d .It ena%d: refilled rx qid %d with only %d mbufs (from %d) .Pp Allocation of resources used on RX path failed. .br If happened during initialization of the IO queue, the interface will not be brought up. .It ena%d: NULL mbuf in rx_info .Pp Error occurred while assembling mbuf from descriptors. .It ena%d: tx_info doesn't have valid mbuf .It ena%d: Invalid req_id: %hu .It ena%d: failed to prepare tx bufs .Pp Error occurred while preparing a packet for transmission. .It ena%d: ioctl promisc/allmulti .Pp IOCTL request for the device to work in promiscuous/allmulti mode. .br See .Xr ifconfig 8 for more details. .El .Sh SUPPORT If an issue is identified with the released source code with a supported adapter, please email the specific information related to the issue to .Aq Mt akiyano@amazon.com , .Aq Mt osamaabb@amazon.com and .Aq Mt darinzon@amazon.com . .Sh SEE ALSO .Xr netmap 4 , .Xr vlan 4 , .Xr ifconfig 8 .Sh HISTORY The .Nm driver first appeared in .Fx 11.1 . .Sh AUTHORS The .Nm driver was developed by Amazon and originally written by .An Semihalf . diff --git a/sys/dev/ena/ena.c b/sys/dev/ena/ena.c index 45bca630b4c2..3ff32cc9966c 100644 --- a/sys/dev/ena/ena.c +++ b/sys/dev/ena/ena.c @@ -1,4087 +1,4087 @@ /*- * SPDX-License-Identifier: BSD-2-Clause * - * Copyright (c) 2015-2021 Amazon.com, Inc. or its affiliates. + * Copyright (c) 2015-2023 Amazon.com, Inc. or its affiliates. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ #include #include "opt_rss.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "ena.h" #include "ena_datapath.h" #include "ena_rss.h" #include "ena_sysctl.h" #ifdef DEV_NETMAP #include "ena_netmap.h" #endif /* DEV_NETMAP */ /********************************************************* * Function prototypes *********************************************************/ static int ena_probe(device_t); static void ena_intr_msix_mgmnt(void *); static void ena_free_pci_resources(struct ena_adapter *); static int ena_change_mtu(if_t, int); static inline void ena_alloc_counters(counter_u64_t *, int); static inline void ena_free_counters(counter_u64_t *, int); static inline void ena_reset_counters(counter_u64_t *, int); static void ena_init_io_rings_common(struct ena_adapter *, struct ena_ring *, uint16_t); static void ena_init_io_rings_basic(struct ena_adapter *); static void ena_init_io_rings_advanced(struct ena_adapter *); static void ena_init_io_rings(struct ena_adapter *); static void ena_free_io_ring_resources(struct ena_adapter *, unsigned int); static void ena_free_all_io_rings_resources(struct ena_adapter *); static int ena_setup_tx_dma_tag(struct ena_adapter *); static int ena_free_tx_dma_tag(struct ena_adapter *); static int ena_setup_rx_dma_tag(struct ena_adapter *); static int ena_free_rx_dma_tag(struct ena_adapter *); static void ena_release_all_tx_dmamap(struct ena_ring *); static int ena_setup_tx_resources(struct ena_adapter *, int); static void ena_free_tx_resources(struct ena_adapter *, int); static int ena_setup_all_tx_resources(struct ena_adapter *); static void ena_free_all_tx_resources(struct ena_adapter *); static int ena_setup_rx_resources(struct ena_adapter *, unsigned int); static void ena_free_rx_resources(struct ena_adapter *, unsigned int); static int ena_setup_all_rx_resources(struct ena_adapter *); static void ena_free_all_rx_resources(struct ena_adapter *); static inline int ena_alloc_rx_mbuf(struct ena_adapter *, struct ena_ring *, struct ena_rx_buffer *); static void ena_free_rx_mbuf(struct ena_adapter *, struct ena_ring *, struct ena_rx_buffer *); static void ena_free_rx_bufs(struct ena_adapter *, unsigned int); static void ena_refill_all_rx_bufs(struct ena_adapter *); static void ena_free_all_rx_bufs(struct ena_adapter *); static void ena_free_tx_bufs(struct ena_adapter *, unsigned int); static void ena_free_all_tx_bufs(struct ena_adapter *); static void ena_destroy_all_tx_queues(struct ena_adapter *); static void ena_destroy_all_rx_queues(struct ena_adapter *); static void ena_destroy_all_io_queues(struct ena_adapter *); static int ena_create_io_queues(struct ena_adapter *); static int ena_handle_msix(void *); static int ena_enable_msix(struct ena_adapter *); static void ena_setup_mgmnt_intr(struct ena_adapter *); static int ena_setup_io_intr(struct ena_adapter *); static int ena_request_mgmnt_irq(struct ena_adapter *); static int ena_request_io_irq(struct ena_adapter *); static void ena_free_mgmnt_irq(struct ena_adapter *); static void ena_free_io_irq(struct ena_adapter *); static void ena_free_irqs(struct ena_adapter *); static void ena_disable_msix(struct ena_adapter *); static void ena_unmask_all_io_irqs(struct ena_adapter *); static int 
ena_up_complete(struct ena_adapter *); static uint64_t ena_get_counter(if_t, ift_counter); static int ena_media_change(if_t); static void ena_media_status(if_t, struct ifmediareq *); static void ena_init(void *); static int ena_ioctl(if_t, u_long, caddr_t); static int ena_get_dev_offloads(struct ena_com_dev_get_features_ctx *); static void ena_update_host_info(struct ena_admin_host_info *, if_t); static void ena_update_hwassist(struct ena_adapter *); static int ena_setup_ifnet(device_t, struct ena_adapter *, struct ena_com_dev_get_features_ctx *); static int ena_enable_wc(device_t, struct resource *); static int ena_set_queues_placement_policy(device_t, struct ena_com_dev *, struct ena_admin_feature_llq_desc *, struct ena_llq_configurations *); static int ena_map_llq_mem_bar(device_t, struct ena_com_dev *); static uint32_t ena_calc_max_io_queue_num(device_t, struct ena_com_dev *, struct ena_com_dev_get_features_ctx *); static int ena_calc_io_queue_size(struct ena_calc_queue_size_ctx *); static void ena_config_host_info(struct ena_com_dev *, device_t); static int ena_attach(device_t); static int ena_detach(device_t); static int ena_device_init(struct ena_adapter *, device_t, struct ena_com_dev_get_features_ctx *, int *); static int ena_enable_msix_and_set_admin_interrupts(struct ena_adapter *); static void ena_update_on_link_change(void *, struct ena_admin_aenq_entry *); static void unimplemented_aenq_handler(void *, struct ena_admin_aenq_entry *); static int ena_copy_eni_metrics(struct ena_adapter *); static int ena_copy_srd_metrics(struct ena_adapter *); static int ena_copy_customer_metrics(struct ena_adapter *); static void ena_timer_service(void *); static char ena_version[] = ENA_DEVICE_NAME ENA_DRV_MODULE_NAME " v" ENA_DRV_MODULE_VERSION; static ena_vendor_info_t ena_vendor_info_array[] = { { PCI_VENDOR_ID_AMAZON, PCI_DEV_ID_ENA_PF, 0 }, { PCI_VENDOR_ID_AMAZON, PCI_DEV_ID_ENA_PF_RSERV0, 0 }, { PCI_VENDOR_ID_AMAZON, PCI_DEV_ID_ENA_VF, 0 }, { PCI_VENDOR_ID_AMAZON, PCI_DEV_ID_ENA_VF_RSERV0, 0 }, /* Last entry */ { 0, 0, 0 } }; struct sx ena_global_lock; /* * Contains pointers to event handlers, e.g. link state chage. 
*/ static struct ena_aenq_handlers aenq_handlers; void ena_dmamap_callback(void *arg, bus_dma_segment_t *segs, int nseg, int error) { if (error != 0) return; *(bus_addr_t *)arg = segs[0].ds_addr; } int ena_dma_alloc(device_t dmadev, bus_size_t size, ena_mem_handle_t *dma, int mapflags, bus_size_t alignment, int domain) { struct ena_adapter *adapter = device_get_softc(dmadev); device_t pdev = adapter->pdev; uint32_t maxsize; uint64_t dma_space_addr; int error; maxsize = ((size - 1) / PAGE_SIZE + 1) * PAGE_SIZE; dma_space_addr = ENA_DMA_BIT_MASK(adapter->dma_width); if (unlikely(dma_space_addr == 0)) dma_space_addr = BUS_SPACE_MAXADDR; error = bus_dma_tag_create(bus_get_dma_tag(dmadev), /* parent */ alignment, 0, /* alignment, bounds */ dma_space_addr, /* lowaddr of exclusion window */ BUS_SPACE_MAXADDR, /* highaddr of exclusion window */ NULL, NULL, /* filter, filterarg */ maxsize, /* maxsize */ 1, /* nsegments */ maxsize, /* maxsegsize */ BUS_DMA_ALLOCNOW, /* flags */ NULL, /* lockfunc */ NULL, /* lockarg */ &dma->tag); if (unlikely(error != 0)) { ena_log(pdev, ERR, "bus_dma_tag_create failed: %d\n", error); goto fail_tag; } error = bus_dma_tag_set_domain(dma->tag, domain); if (unlikely(error != 0)) { ena_log(pdev, ERR, "bus_dma_tag_set_domain failed: %d\n", error); goto fail_map_create; } error = bus_dmamem_alloc(dma->tag, (void **)&dma->vaddr, BUS_DMA_COHERENT | BUS_DMA_ZERO, &dma->map); if (unlikely(error != 0)) { ena_log(pdev, ERR, "bus_dmamem_alloc(%ju) failed: %d\n", (uintmax_t)size, error); goto fail_map_create; } dma->paddr = 0; error = bus_dmamap_load(dma->tag, dma->map, dma->vaddr, size, ena_dmamap_callback, &dma->paddr, mapflags); if (unlikely((error != 0) || (dma->paddr == 0))) { ena_log(pdev, ERR, "bus_dmamap_load failed: %d\n", error); goto fail_map_load; } bus_dmamap_sync(dma->tag, dma->map, BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE); return (0); fail_map_load: bus_dmamem_free(dma->tag, dma->vaddr, dma->map); fail_map_create: bus_dma_tag_destroy(dma->tag); fail_tag: dma->tag = NULL; dma->vaddr = NULL; dma->paddr = 0; return (error); } static void ena_free_pci_resources(struct ena_adapter *adapter) { device_t pdev = adapter->pdev; if (adapter->memory != NULL) { bus_release_resource(pdev, SYS_RES_MEMORY, PCIR_BAR(ENA_MEM_BAR), adapter->memory); } if (adapter->registers != NULL) { bus_release_resource(pdev, SYS_RES_MEMORY, PCIR_BAR(ENA_REG_BAR), adapter->registers); } if (adapter->msix != NULL) { bus_release_resource(pdev, SYS_RES_MEMORY, adapter->msix_rid, adapter->msix); } } static int ena_probe(device_t dev) { ena_vendor_info_t *ent; uint16_t pci_vendor_id = 0; uint16_t pci_device_id = 0; pci_vendor_id = pci_get_vendor(dev); pci_device_id = pci_get_device(dev); ent = ena_vendor_info_array; while (ent->vendor_id != 0) { if ((pci_vendor_id == ent->vendor_id) && (pci_device_id == ent->device_id)) { ena_log_raw(DBG, "vendor=%x device=%x\n", pci_vendor_id, pci_device_id); device_set_desc(dev, ENA_DEVICE_DESC); return (BUS_PROBE_DEFAULT); } ent++; } return (ENXIO); } static int ena_change_mtu(if_t ifp, int new_mtu) { struct ena_adapter *adapter = if_getsoftc(ifp); device_t pdev = adapter->pdev; int rc; if ((new_mtu > adapter->max_mtu) || (new_mtu < ENA_MIN_MTU)) { ena_log(pdev, ERR, "Invalid MTU setting. 
new_mtu: %d max mtu: %d min mtu: %d\n", new_mtu, adapter->max_mtu, ENA_MIN_MTU); return (EINVAL); } rc = ena_com_set_dev_mtu(adapter->ena_dev, new_mtu); if (likely(rc == 0)) { ena_log(pdev, DBG, "set MTU to %d\n", new_mtu); if_setmtu(ifp, new_mtu); } else { ena_log(pdev, ERR, "Failed to set MTU to %d\n", new_mtu); } return (rc); } static inline void ena_alloc_counters(counter_u64_t *begin, int size) { counter_u64_t *end = (counter_u64_t *)((char *)begin + size); for (; begin < end; ++begin) *begin = counter_u64_alloc(M_WAITOK); } static inline void ena_free_counters(counter_u64_t *begin, int size) { counter_u64_t *end = (counter_u64_t *)((char *)begin + size); for (; begin < end; ++begin) counter_u64_free(*begin); } static inline void ena_reset_counters(counter_u64_t *begin, int size) { counter_u64_t *end = (counter_u64_t *)((char *)begin + size); for (; begin < end; ++begin) counter_u64_zero(*begin); } static void ena_init_io_rings_common(struct ena_adapter *adapter, struct ena_ring *ring, uint16_t qid) { ring->qid = qid; ring->adapter = adapter; ring->ena_dev = adapter->ena_dev; atomic_store_8(&ring->first_interrupt, 0); ring->no_interrupt_event_cnt = 0; } static void ena_init_io_rings_basic(struct ena_adapter *adapter) { struct ena_com_dev *ena_dev; struct ena_ring *txr, *rxr; struct ena_que *que; int i; ena_dev = adapter->ena_dev; for (i = 0; i < adapter->num_io_queues; i++) { txr = &adapter->tx_ring[i]; rxr = &adapter->rx_ring[i]; /* TX/RX common ring state */ ena_init_io_rings_common(adapter, txr, i); ena_init_io_rings_common(adapter, rxr, i); /* TX specific ring state */ txr->tx_max_header_size = ena_dev->tx_max_header_size; txr->tx_mem_queue_type = ena_dev->tx_mem_queue_type; que = &adapter->que[i]; que->adapter = adapter; que->id = i; que->tx_ring = txr; que->rx_ring = rxr; txr->que = que; rxr->que = que; rxr->empty_rx_queue = 0; rxr->rx_mbuf_sz = ena_mbuf_sz; } } static void ena_init_io_rings_advanced(struct ena_adapter *adapter) { struct ena_ring *txr, *rxr; int i; for (i = 0; i < adapter->num_io_queues; i++) { txr = &adapter->tx_ring[i]; rxr = &adapter->rx_ring[i]; /* Allocate a buf ring */ txr->buf_ring_size = adapter->buf_ring_size; txr->br = buf_ring_alloc(txr->buf_ring_size, M_DEVBUF, M_WAITOK, &txr->ring_mtx); /* Allocate Tx statistics. */ ena_alloc_counters((counter_u64_t *)&txr->tx_stats, sizeof(txr->tx_stats)); txr->tx_last_cleanup_ticks = ticks; /* Allocate Rx statistics. */ ena_alloc_counters((counter_u64_t *)&rxr->rx_stats, sizeof(rxr->rx_stats)); /* Initialize locks */ snprintf(txr->mtx_name, nitems(txr->mtx_name), "%s:tx(%d)", device_get_nameunit(adapter->pdev), i); snprintf(rxr->mtx_name, nitems(rxr->mtx_name), "%s:rx(%d)", device_get_nameunit(adapter->pdev), i); mtx_init(&txr->ring_mtx, txr->mtx_name, NULL, MTX_DEF); } } static void ena_init_io_rings(struct ena_adapter *adapter) { /* * IO rings initialization can be divided into the 2 steps: * 1. Initialize variables and fields with initial values and copy * them from adapter/ena_dev (basic) * 2. 
Allocate mutex, counters and buf_ring (advanced) */ ena_init_io_rings_basic(adapter); ena_init_io_rings_advanced(adapter); } static void ena_free_io_ring_resources(struct ena_adapter *adapter, unsigned int qid) { struct ena_ring *txr = &adapter->tx_ring[qid]; struct ena_ring *rxr = &adapter->rx_ring[qid]; ena_free_counters((counter_u64_t *)&txr->tx_stats, sizeof(txr->tx_stats)); ena_free_counters((counter_u64_t *)&rxr->rx_stats, sizeof(rxr->rx_stats)); ENA_RING_MTX_LOCK(txr); drbr_free(txr->br, M_DEVBUF); ENA_RING_MTX_UNLOCK(txr); mtx_destroy(&txr->ring_mtx); } static void ena_free_all_io_rings_resources(struct ena_adapter *adapter) { int i; for (i = 0; i < adapter->num_io_queues; i++) ena_free_io_ring_resources(adapter, i); } static int ena_setup_tx_dma_tag(struct ena_adapter *adapter) { int ret; /* Create DMA tag for Tx buffers */ ret = bus_dma_tag_create(bus_get_dma_tag(adapter->pdev), 1, 0, /* alignment, bounds */ ENA_DMA_BIT_MASK(adapter->dma_width), /* lowaddr of excl window */ BUS_SPACE_MAXADDR, /* highaddr of excl window */ NULL, NULL, /* filter, filterarg */ ENA_TSO_MAXSIZE, /* maxsize */ adapter->max_tx_sgl_size - 1, /* nsegments */ ENA_TSO_MAXSIZE, /* maxsegsize */ 0, /* flags */ NULL, /* lockfunc */ NULL, /* lockfuncarg */ &adapter->tx_buf_tag); return (ret); } static int ena_free_tx_dma_tag(struct ena_adapter *adapter) { int ret; ret = bus_dma_tag_destroy(adapter->tx_buf_tag); if (likely(ret == 0)) adapter->tx_buf_tag = NULL; return (ret); } static int ena_setup_rx_dma_tag(struct ena_adapter *adapter) { int ret; /* Create DMA tag for Rx buffers*/ ret = bus_dma_tag_create(bus_get_dma_tag(adapter->pdev), /* parent */ 1, 0, /* alignment, bounds */ ENA_DMA_BIT_MASK(adapter->dma_width), /* lowaddr of excl window */ BUS_SPACE_MAXADDR, /* highaddr of excl window */ NULL, NULL, /* filter, filterarg */ ena_mbuf_sz, /* maxsize */ adapter->max_rx_sgl_size, /* nsegments */ ena_mbuf_sz, /* maxsegsize */ 0, /* flags */ NULL, /* lockfunc */ NULL, /* lockarg */ &adapter->rx_buf_tag); return (ret); } static int ena_free_rx_dma_tag(struct ena_adapter *adapter) { int ret; ret = bus_dma_tag_destroy(adapter->rx_buf_tag); if (likely(ret == 0)) adapter->rx_buf_tag = NULL; return (ret); } static void ena_release_all_tx_dmamap(struct ena_ring *tx_ring) { struct ena_adapter *adapter = tx_ring->adapter; struct ena_tx_buffer *tx_info; bus_dma_tag_t tx_tag = adapter->tx_buf_tag; int i; #ifdef DEV_NETMAP struct ena_netmap_tx_info *nm_info; int j; #endif /* DEV_NETMAP */ for (i = 0; i < tx_ring->ring_size; ++i) { tx_info = &tx_ring->tx_buffer_info[i]; #ifdef DEV_NETMAP if (if_getcapenable(adapter->ifp) & IFCAP_NETMAP) { nm_info = &tx_info->nm_info; for (j = 0; j < ENA_PKT_MAX_BUFS; ++j) { if (nm_info->map_seg[j] != NULL) { bus_dmamap_destroy(tx_tag, nm_info->map_seg[j]); nm_info->map_seg[j] = NULL; } } } #endif /* DEV_NETMAP */ if (tx_info->dmamap != NULL) { bus_dmamap_destroy(tx_tag, tx_info->dmamap); tx_info->dmamap = NULL; } } } /** * ena_setup_tx_resources - allocate Tx resources (Descriptors) * @adapter: network interface device structure * @qid: queue index * * Returns 0 on success, otherwise on failure. 
**/ static int ena_setup_tx_resources(struct ena_adapter *adapter, int qid) { device_t pdev = adapter->pdev; char thread_name[MAXCOMLEN + 1]; struct ena_que *que = &adapter->que[qid]; struct ena_ring *tx_ring = que->tx_ring; cpuset_t *cpu_mask = NULL; int size, i, err; #ifdef DEV_NETMAP bus_dmamap_t *map; int j; ena_netmap_reset_tx_ring(adapter, qid); #endif /* DEV_NETMAP */ size = sizeof(struct ena_tx_buffer) * tx_ring->ring_size; tx_ring->tx_buffer_info = malloc(size, M_DEVBUF, M_NOWAIT | M_ZERO); if (unlikely(tx_ring->tx_buffer_info == NULL)) return (ENOMEM); size = sizeof(uint16_t) * tx_ring->ring_size; tx_ring->free_tx_ids = malloc(size, M_DEVBUF, M_NOWAIT | M_ZERO); if (unlikely(tx_ring->free_tx_ids == NULL)) goto err_buf_info_free; size = tx_ring->tx_max_header_size; tx_ring->push_buf_intermediate_buf = malloc(size, M_DEVBUF, M_NOWAIT | M_ZERO); if (unlikely(tx_ring->push_buf_intermediate_buf == NULL)) goto err_tx_ids_free; /* Req id stack for TX OOO completions */ for (i = 0; i < tx_ring->ring_size; i++) tx_ring->free_tx_ids[i] = i; /* Reset TX statistics. */ ena_reset_counters((counter_u64_t *)&tx_ring->tx_stats, sizeof(tx_ring->tx_stats)); tx_ring->next_to_use = 0; tx_ring->next_to_clean = 0; tx_ring->acum_pkts = 0; /* Make sure that drbr is empty */ ENA_RING_MTX_LOCK(tx_ring); drbr_flush(adapter->ifp, tx_ring->br); ENA_RING_MTX_UNLOCK(tx_ring); /* ... and create the buffer DMA maps */ for (i = 0; i < tx_ring->ring_size; i++) { err = bus_dmamap_create(adapter->tx_buf_tag, 0, &tx_ring->tx_buffer_info[i].dmamap); if (unlikely(err != 0)) { ena_log(pdev, ERR, "Unable to create Tx DMA map for buffer %d\n", i); goto err_map_release; } #ifdef DEV_NETMAP if (if_getcapenable(adapter->ifp) & IFCAP_NETMAP) { map = tx_ring->tx_buffer_info[i].nm_info.map_seg; for (j = 0; j < ENA_PKT_MAX_BUFS; j++) { err = bus_dmamap_create(adapter->tx_buf_tag, 0, &map[j]); if (unlikely(err != 0)) { ena_log(pdev, ERR, "Unable to create Tx DMA for buffer %d %d\n", i, j); goto err_map_release; } } } #endif /* DEV_NETMAP */ } /* Allocate taskqueues */ TASK_INIT(&tx_ring->enqueue_task, 0, ena_deferred_mq_start, tx_ring); tx_ring->enqueue_tq = taskqueue_create_fast("ena_tx_enque", M_NOWAIT, taskqueue_thread_enqueue, &tx_ring->enqueue_tq); if (unlikely(tx_ring->enqueue_tq == NULL)) { ena_log(pdev, ERR, "Unable to create taskqueue for enqueue task\n"); i = tx_ring->ring_size; goto err_map_release; } tx_ring->running = true; #ifdef RSS cpu_mask = &que->cpu_mask; snprintf(thread_name, sizeof(thread_name), "%s txeq %d", device_get_nameunit(adapter->pdev), que->cpu); #else snprintf(thread_name, sizeof(thread_name), "%s txeq %d", device_get_nameunit(adapter->pdev), que->id); #endif taskqueue_start_threads_cpuset(&tx_ring->enqueue_tq, 1, PI_NET, cpu_mask, "%s", thread_name); return (0); err_map_release: ena_release_all_tx_dmamap(tx_ring); err_tx_ids_free: free(tx_ring->free_tx_ids, M_DEVBUF); tx_ring->free_tx_ids = NULL; err_buf_info_free: free(tx_ring->tx_buffer_info, M_DEVBUF); tx_ring->tx_buffer_info = NULL; return (ENOMEM); } /** * ena_free_tx_resources - Free Tx Resources per Queue * @adapter: network interface device structure * @qid: queue index * * Free all transmit software resources **/ static void ena_free_tx_resources(struct ena_adapter *adapter, int qid) { struct ena_ring *tx_ring = &adapter->tx_ring[qid]; #ifdef DEV_NETMAP struct ena_netmap_tx_info *nm_info; int j; #endif /* DEV_NETMAP */ while (taskqueue_cancel(tx_ring->enqueue_tq, &tx_ring->enqueue_task, NULL)) taskqueue_drain(tx_ring->enqueue_tq, 
&tx_ring->enqueue_task); taskqueue_free(tx_ring->enqueue_tq); ENA_RING_MTX_LOCK(tx_ring); /* Flush buffer ring, */ drbr_flush(adapter->ifp, tx_ring->br); /* Free buffer DMA maps, */ for (int i = 0; i < tx_ring->ring_size; i++) { bus_dmamap_sync(adapter->tx_buf_tag, tx_ring->tx_buffer_info[i].dmamap, BUS_DMASYNC_POSTWRITE); bus_dmamap_unload(adapter->tx_buf_tag, tx_ring->tx_buffer_info[i].dmamap); bus_dmamap_destroy(adapter->tx_buf_tag, tx_ring->tx_buffer_info[i].dmamap); #ifdef DEV_NETMAP if (if_getcapenable(adapter->ifp) & IFCAP_NETMAP) { nm_info = &tx_ring->tx_buffer_info[i].nm_info; for (j = 0; j < ENA_PKT_MAX_BUFS; j++) { if (nm_info->socket_buf_idx[j] != 0) { bus_dmamap_sync(adapter->tx_buf_tag, nm_info->map_seg[j], BUS_DMASYNC_POSTWRITE); ena_netmap_unload(adapter, nm_info->map_seg[j]); } bus_dmamap_destroy(adapter->tx_buf_tag, nm_info->map_seg[j]); nm_info->socket_buf_idx[j] = 0; } } #endif /* DEV_NETMAP */ m_freem(tx_ring->tx_buffer_info[i].mbuf); tx_ring->tx_buffer_info[i].mbuf = NULL; } ENA_RING_MTX_UNLOCK(tx_ring); /* And free allocated memory. */ free(tx_ring->tx_buffer_info, M_DEVBUF); tx_ring->tx_buffer_info = NULL; free(tx_ring->free_tx_ids, M_DEVBUF); tx_ring->free_tx_ids = NULL; free(tx_ring->push_buf_intermediate_buf, M_DEVBUF); tx_ring->push_buf_intermediate_buf = NULL; } /** * ena_setup_all_tx_resources - allocate all queues Tx resources * @adapter: network interface device structure * * Returns 0 on success, otherwise on failure. **/ static int ena_setup_all_tx_resources(struct ena_adapter *adapter) { int i, rc; for (i = 0; i < adapter->num_io_queues; i++) { rc = ena_setup_tx_resources(adapter, i); if (rc != 0) { ena_log(adapter->pdev, ERR, "Allocation for Tx Queue %u failed\n", i); goto err_setup_tx; } } return (0); err_setup_tx: /* Rewind the index freeing the rings as we go */ while (i--) ena_free_tx_resources(adapter, i); return (rc); } /** * ena_free_all_tx_resources - Free Tx Resources for All Queues * @adapter: network interface device structure * * Free all transmit software resources **/ static void ena_free_all_tx_resources(struct ena_adapter *adapter) { int i; for (i = 0; i < adapter->num_io_queues; i++) ena_free_tx_resources(adapter, i); } /** * ena_setup_rx_resources - allocate Rx resources (Descriptors) * @adapter: network interface device structure * @qid: queue index * * Returns 0 on success, otherwise on failure. **/ static int ena_setup_rx_resources(struct ena_adapter *adapter, unsigned int qid) { device_t pdev = adapter->pdev; struct ena_que *que = &adapter->que[qid]; struct ena_ring *rx_ring = que->rx_ring; int size, err, i; size = sizeof(struct ena_rx_buffer) * rx_ring->ring_size; #ifdef DEV_NETMAP ena_netmap_reset_rx_ring(adapter, qid); rx_ring->initialized = false; #endif /* DEV_NETMAP */ /* * Alloc extra element so in rx path * we can always prefetch rx_info + 1 */ size += sizeof(struct ena_rx_buffer); rx_ring->rx_buffer_info = malloc(size, M_DEVBUF, M_WAITOK | M_ZERO); size = sizeof(uint16_t) * rx_ring->ring_size; rx_ring->free_rx_ids = malloc(size, M_DEVBUF, M_WAITOK); for (i = 0; i < rx_ring->ring_size; i++) rx_ring->free_rx_ids[i] = i; /* Reset RX statistics. */ ena_reset_counters((counter_u64_t *)&rx_ring->rx_stats, sizeof(rx_ring->rx_stats)); rx_ring->next_to_clean = 0; rx_ring->next_to_use = 0; /* ... 
and create the buffer DMA maps */ for (i = 0; i < rx_ring->ring_size; i++) { err = bus_dmamap_create(adapter->rx_buf_tag, 0, &(rx_ring->rx_buffer_info[i].map)); if (err != 0) { ena_log(pdev, ERR, "Unable to create Rx DMA map for buffer %d\n", i); goto err_buf_info_unmap; } } /* Create LRO for the ring */ if ((if_getcapenable(adapter->ifp) & IFCAP_LRO) != 0) { int err = tcp_lro_init(&rx_ring->lro); if (err != 0) { ena_log(pdev, ERR, "LRO[%d] Initialization failed!\n", qid); } else { ena_log(pdev, DBG, "RX Soft LRO[%d] Initialized\n", qid); rx_ring->lro.ifp = adapter->ifp; } } return (0); err_buf_info_unmap: while (i--) { bus_dmamap_destroy(adapter->rx_buf_tag, rx_ring->rx_buffer_info[i].map); } free(rx_ring->free_rx_ids, M_DEVBUF); rx_ring->free_rx_ids = NULL; free(rx_ring->rx_buffer_info, M_DEVBUF); rx_ring->rx_buffer_info = NULL; return (ENOMEM); } /** * ena_free_rx_resources - Free Rx Resources * @adapter: network interface device structure * @qid: queue index * * Free all receive software resources **/ static void ena_free_rx_resources(struct ena_adapter *adapter, unsigned int qid) { struct ena_ring *rx_ring = &adapter->rx_ring[qid]; /* Free buffer DMA maps, */ for (int i = 0; i < rx_ring->ring_size; i++) { bus_dmamap_sync(adapter->rx_buf_tag, rx_ring->rx_buffer_info[i].map, BUS_DMASYNC_POSTREAD); m_freem(rx_ring->rx_buffer_info[i].mbuf); rx_ring->rx_buffer_info[i].mbuf = NULL; bus_dmamap_unload(adapter->rx_buf_tag, rx_ring->rx_buffer_info[i].map); bus_dmamap_destroy(adapter->rx_buf_tag, rx_ring->rx_buffer_info[i].map); } /* free LRO resources, */ tcp_lro_free(&rx_ring->lro); /* free allocated memory */ free(rx_ring->rx_buffer_info, M_DEVBUF); rx_ring->rx_buffer_info = NULL; free(rx_ring->free_rx_ids, M_DEVBUF); rx_ring->free_rx_ids = NULL; } /** * ena_setup_all_rx_resources - allocate all queues Rx resources * @adapter: network interface device structure * * Returns 0 on success, otherwise on failure. 
**/ static int ena_setup_all_rx_resources(struct ena_adapter *adapter) { int i, rc = 0; for (i = 0; i < adapter->num_io_queues; i++) { rc = ena_setup_rx_resources(adapter, i); if (rc != 0) { ena_log(adapter->pdev, ERR, "Allocation for Rx Queue %u failed\n", i); goto err_setup_rx; } } return (0); err_setup_rx: /* rewind the index freeing the rings as we go */ while (i--) ena_free_rx_resources(adapter, i); return (rc); } /** * ena_free_all_rx_resources - Free Rx resources for all queues * @adapter: network interface device structure * * Free all receive software resources **/ static void ena_free_all_rx_resources(struct ena_adapter *adapter) { int i; for (i = 0; i < adapter->num_io_queues; i++) ena_free_rx_resources(adapter, i); } static inline int ena_alloc_rx_mbuf(struct ena_adapter *adapter, struct ena_ring *rx_ring, struct ena_rx_buffer *rx_info) { device_t pdev = adapter->pdev; struct ena_com_buf *ena_buf; bus_dma_segment_t segs[1]; int nsegs, error; int mlen; /* if previous allocated frag is not used */ if (unlikely(rx_info->mbuf != NULL)) return (0); /* Get mbuf using UMA allocator */ rx_info->mbuf = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, rx_ring->rx_mbuf_sz); if (unlikely(rx_info->mbuf == NULL)) { counter_u64_add(rx_ring->rx_stats.mjum_alloc_fail, 1); rx_info->mbuf = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR); if (unlikely(rx_info->mbuf == NULL)) { counter_u64_add(rx_ring->rx_stats.mbuf_alloc_fail, 1); return (ENOMEM); } mlen = MCLBYTES; } else { mlen = rx_ring->rx_mbuf_sz; } /* Set mbuf length*/ rx_info->mbuf->m_pkthdr.len = rx_info->mbuf->m_len = mlen; /* Map packets for DMA */ ena_log(pdev, DBG, "Using tag %p for buffers' DMA mapping, mbuf %p len: %d\n", adapter->rx_buf_tag, rx_info->mbuf, rx_info->mbuf->m_len); error = bus_dmamap_load_mbuf_sg(adapter->rx_buf_tag, rx_info->map, rx_info->mbuf, segs, &nsegs, BUS_DMA_NOWAIT); if (unlikely((error != 0) || (nsegs != 1))) { ena_log(pdev, WARN, "failed to map mbuf, error: %d, nsegs: %d\n", error, nsegs); counter_u64_add(rx_ring->rx_stats.dma_mapping_err, 1); goto exit; } bus_dmamap_sync(adapter->rx_buf_tag, rx_info->map, BUS_DMASYNC_PREREAD); ena_buf = &rx_info->ena_buf; ena_buf->paddr = segs[0].ds_addr; ena_buf->len = mlen; ena_log(pdev, DBG, "ALLOC RX BUF: mbuf %p, rx_info %p, len %d, paddr %#jx\n", rx_info->mbuf, rx_info, ena_buf->len, (uintmax_t)ena_buf->paddr); return (0); exit: m_freem(rx_info->mbuf); rx_info->mbuf = NULL; return (EFAULT); } static void ena_free_rx_mbuf(struct ena_adapter *adapter, struct ena_ring *rx_ring, struct ena_rx_buffer *rx_info) { if (rx_info->mbuf == NULL) { ena_log(adapter->pdev, WARN, "Trying to free unallocated buffer\n"); return; } bus_dmamap_sync(adapter->rx_buf_tag, rx_info->map, BUS_DMASYNC_POSTREAD); bus_dmamap_unload(adapter->rx_buf_tag, rx_info->map); m_freem(rx_info->mbuf); rx_info->mbuf = NULL; } /** * ena_refill_rx_bufs - Refills ring with descriptors * @rx_ring: the ring which we want to feed with free descriptors * @num: number of descriptors to refill * Refills the ring with newly allocated DMA-mapped mbufs for receiving **/ int ena_refill_rx_bufs(struct ena_ring *rx_ring, uint32_t num) { struct ena_adapter *adapter = rx_ring->adapter; device_t pdev = adapter->pdev; uint16_t next_to_use, req_id; uint32_t i; int rc; ena_log_io(adapter->pdev, DBG, "refill qid: %d\n", rx_ring->qid); next_to_use = rx_ring->next_to_use; for (i = 0; i < num; i++) { struct ena_rx_buffer *rx_info; ena_log_io(pdev, DBG, "RX buffer - next to use: %d\n", next_to_use); req_id = rx_ring->free_rx_ids[next_to_use]; rx_info = 
&rx_ring->rx_buffer_info[req_id]; #ifdef DEV_NETMAP if (ena_rx_ring_in_netmap(adapter, rx_ring->qid)) rc = ena_netmap_alloc_rx_slot(adapter, rx_ring, rx_info); else #endif /* DEV_NETMAP */ rc = ena_alloc_rx_mbuf(adapter, rx_ring, rx_info); if (unlikely(rc != 0)) { ena_log_io(pdev, WARN, "failed to alloc buffer for rx queue %d\n", rx_ring->qid); break; } rc = ena_com_add_single_rx_desc(rx_ring->ena_com_io_sq, &rx_info->ena_buf, req_id); if (unlikely(rc != 0)) { ena_log_io(pdev, WARN, "failed to add buffer for rx queue %d\n", rx_ring->qid); break; } next_to_use = ENA_RX_RING_IDX_NEXT(next_to_use, rx_ring->ring_size); } if (unlikely(i < num)) { counter_u64_add(rx_ring->rx_stats.refil_partial, 1); ena_log_io(pdev, WARN, "refilled rx qid %d with only %d mbufs (from %d)\n", rx_ring->qid, i, num); } if (likely(i != 0)) ena_com_write_sq_doorbell(rx_ring->ena_com_io_sq); rx_ring->next_to_use = next_to_use; return (i); } int ena_update_buf_ring_size(struct ena_adapter *adapter, uint32_t new_buf_ring_size) { uint32_t old_buf_ring_size; int rc = 0; bool dev_was_up; old_buf_ring_size = adapter->buf_ring_size; adapter->buf_ring_size = new_buf_ring_size; dev_was_up = ENA_FLAG_ISSET(ENA_FLAG_DEV_UP, adapter); ena_down(adapter); /* Reconfigure buf ring for all Tx rings. */ ena_free_all_io_rings_resources(adapter); ena_init_io_rings_advanced(adapter); if (dev_was_up) { /* * If ena_up() fails, it's not because of recent buf_ring size * changes. Because of that, we just want to revert old drbr * value and trigger the reset because something else had to * go wrong. */ rc = ena_up(adapter); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "Failed to configure device after setting new drbr size: %u. Reverting old value: %u and triggering the reset\n", new_buf_ring_size, old_buf_ring_size); /* Revert old size and trigger the reset */ adapter->buf_ring_size = old_buf_ring_size; ena_free_all_io_rings_resources(adapter); ena_init_io_rings_advanced(adapter); ENA_FLAG_SET_ATOMIC(ENA_FLAG_DEV_UP_BEFORE_RESET, adapter); ena_trigger_reset(adapter, ENA_REGS_RESET_OS_TRIGGER); } } return (rc); } int ena_update_queue_size(struct ena_adapter *adapter, uint32_t new_tx_size, uint32_t new_rx_size) { uint32_t old_tx_size, old_rx_size; int rc = 0; bool dev_was_up; old_tx_size = adapter->requested_tx_ring_size; old_rx_size = adapter->requested_rx_ring_size; adapter->requested_tx_ring_size = new_tx_size; adapter->requested_rx_ring_size = new_rx_size; dev_was_up = ENA_FLAG_ISSET(ENA_FLAG_DEV_UP, adapter); ena_down(adapter); /* Configure queues with new size. */ ena_init_io_rings_basic(adapter); if (dev_was_up) { rc = ena_up(adapter); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "Failed to configure device with the new sizes - Tx: %u Rx: %u. Reverting old values - Tx: %u Rx: %u\n", new_tx_size, new_rx_size, old_tx_size, old_rx_size); /* Revert old size. */ adapter->requested_tx_ring_size = old_tx_size; adapter->requested_rx_ring_size = old_rx_size; ena_init_io_rings_basic(adapter); /* And try again. */ rc = ena_up(adapter); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "Failed to revert old queue sizes. Triggering device reset.\n"); /* * If we've failed again, something had to go * wrong. 
After reset, the device should try to * go up */ ENA_FLAG_SET_ATOMIC( ENA_FLAG_DEV_UP_BEFORE_RESET, adapter); ena_trigger_reset(adapter, ENA_REGS_RESET_OS_TRIGGER); } } } return (rc); } static void ena_update_io_rings(struct ena_adapter *adapter, uint32_t num) { ena_free_all_io_rings_resources(adapter); /* Force indirection table to be reinitialized */ ena_com_rss_destroy(adapter->ena_dev); adapter->num_io_queues = num; ena_init_io_rings(adapter); } int ena_update_base_cpu(struct ena_adapter *adapter, int new_num) { int old_num; int rc = 0; bool dev_was_up; dev_was_up = ENA_FLAG_ISSET(ENA_FLAG_DEV_UP, adapter); old_num = adapter->irq_cpu_base; ena_down(adapter); adapter->irq_cpu_base = new_num; if (dev_was_up) { rc = ena_up(adapter); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "Failed to configure device %d IRQ base CPU. " "Reverting to previous value: %d\n", new_num, old_num); adapter->irq_cpu_base = old_num; rc = ena_up(adapter); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "Failed to revert to previous setup." "Triggering device reset.\n"); ENA_FLAG_SET_ATOMIC( ENA_FLAG_DEV_UP_BEFORE_RESET, adapter); ena_trigger_reset(adapter, ENA_REGS_RESET_OS_TRIGGER); } } } return (rc); } int ena_update_cpu_stride(struct ena_adapter *adapter, uint32_t new_num) { uint32_t old_num; int rc = 0; bool dev_was_up; dev_was_up = ENA_FLAG_ISSET(ENA_FLAG_DEV_UP, adapter); old_num = adapter->irq_cpu_stride; ena_down(adapter); adapter->irq_cpu_stride = new_num; if (dev_was_up) { rc = ena_up(adapter); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "Failed to configure device %d IRQ CPU stride. " "Reverting to previous value: %d\n", new_num, old_num); adapter->irq_cpu_stride = old_num; rc = ena_up(adapter); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "Failed to revert to previous setup." "Triggering device reset.\n"); ENA_FLAG_SET_ATOMIC( ENA_FLAG_DEV_UP_BEFORE_RESET, adapter); ena_trigger_reset(adapter, ENA_REGS_RESET_OS_TRIGGER); } } } return (rc); } /* Caller should sanitize new_num */ int ena_update_io_queue_nb(struct ena_adapter *adapter, uint32_t new_num) { uint32_t old_num; int rc = 0; bool dev_was_up; dev_was_up = ENA_FLAG_ISSET(ENA_FLAG_DEV_UP, adapter); old_num = adapter->num_io_queues; ena_down(adapter); ena_update_io_rings(adapter, new_num); if (dev_was_up) { rc = ena_up(adapter); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "Failed to configure device with %u IO queues. " "Reverting to previous value: %u\n", new_num, old_num); ena_update_io_rings(adapter, old_num); rc = ena_up(adapter); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "Failed to revert to previous setup IO " "queues. 
Triggering device reset.\n"); ENA_FLAG_SET_ATOMIC( ENA_FLAG_DEV_UP_BEFORE_RESET, adapter); ena_trigger_reset(adapter, ENA_REGS_RESET_OS_TRIGGER); } } } return (rc); } static void ena_free_rx_bufs(struct ena_adapter *adapter, unsigned int qid) { struct ena_ring *rx_ring = &adapter->rx_ring[qid]; unsigned int i; for (i = 0; i < rx_ring->ring_size; i++) { struct ena_rx_buffer *rx_info = &rx_ring->rx_buffer_info[i]; if (rx_info->mbuf != NULL) ena_free_rx_mbuf(adapter, rx_ring, rx_info); #ifdef DEV_NETMAP if (((if_getflags(adapter->ifp) & IFF_DYING) == 0) && (if_getcapenable(adapter->ifp) & IFCAP_NETMAP)) { if (rx_info->netmap_buf_idx != 0) ena_netmap_free_rx_slot(adapter, rx_ring, rx_info); } #endif /* DEV_NETMAP */ } } /** * ena_refill_all_rx_bufs - allocate all queues Rx buffers * @adapter: network interface device structure * */ static void ena_refill_all_rx_bufs(struct ena_adapter *adapter) { struct ena_ring *rx_ring; int i, rc, bufs_num; for (i = 0; i < adapter->num_io_queues; i++) { rx_ring = &adapter->rx_ring[i]; bufs_num = rx_ring->ring_size - 1; rc = ena_refill_rx_bufs(rx_ring, bufs_num); if (unlikely(rc != bufs_num)) ena_log_io(adapter->pdev, WARN, "refilling Queue %d failed. " "Allocated %d buffers from: %d\n", i, rc, bufs_num); #ifdef DEV_NETMAP rx_ring->initialized = true; #endif /* DEV_NETMAP */ } } static void ena_free_all_rx_bufs(struct ena_adapter *adapter) { int i; for (i = 0; i < adapter->num_io_queues; i++) ena_free_rx_bufs(adapter, i); } /** * ena_free_tx_bufs - Free Tx Buffers per Queue * @adapter: network interface device structure * @qid: queue index **/ static void ena_free_tx_bufs(struct ena_adapter *adapter, unsigned int qid) { bool print_once = true; struct ena_ring *tx_ring = &adapter->tx_ring[qid]; ENA_RING_MTX_LOCK(tx_ring); for (int i = 0; i < tx_ring->ring_size; i++) { struct ena_tx_buffer *tx_info = &tx_ring->tx_buffer_info[i]; if (tx_info->mbuf == NULL) continue; if (print_once) { ena_log(adapter->pdev, WARN, "free uncompleted tx mbuf qid %d idx 0x%x\n", qid, i); print_once = false; } else { ena_log(adapter->pdev, DBG, "free uncompleted tx mbuf qid %d idx 0x%x\n", qid, i); } bus_dmamap_sync(adapter->tx_buf_tag, tx_info->dmamap, BUS_DMASYNC_POSTWRITE); bus_dmamap_unload(adapter->tx_buf_tag, tx_info->dmamap); m_free(tx_info->mbuf); tx_info->mbuf = NULL; } ENA_RING_MTX_UNLOCK(tx_ring); } static void ena_free_all_tx_bufs(struct ena_adapter *adapter) { for (int i = 0; i < adapter->num_io_queues; i++) ena_free_tx_bufs(adapter, i); } static void ena_destroy_all_tx_queues(struct ena_adapter *adapter) { uint16_t ena_qid; int i; for (i = 0; i < adapter->num_io_queues; i++) { ena_qid = ENA_IO_TXQ_IDX(i); ena_com_destroy_io_queue(adapter->ena_dev, ena_qid); } } static void ena_destroy_all_rx_queues(struct ena_adapter *adapter) { uint16_t ena_qid; int i; for (i = 0; i < adapter->num_io_queues; i++) { ena_qid = ENA_IO_RXQ_IDX(i); ena_com_destroy_io_queue(adapter->ena_dev, ena_qid); } } static void ena_destroy_all_io_queues(struct ena_adapter *adapter) { struct ena_que *queue; int i; for (i = 0; i < adapter->num_io_queues; i++) { queue = &adapter->que[i]; while (taskqueue_cancel(queue->cleanup_tq, &queue->cleanup_task, NULL)) taskqueue_drain(queue->cleanup_tq, &queue->cleanup_task); taskqueue_free(queue->cleanup_tq); } ena_destroy_all_tx_queues(adapter); ena_destroy_all_rx_queues(adapter); } static int ena_create_io_queues(struct ena_adapter *adapter) { struct ena_com_dev *ena_dev = adapter->ena_dev; struct ena_com_create_io_ctx ctx; struct ena_ring *ring; struct ena_que 
*queue; uint16_t ena_qid; uint32_t msix_vector; cpuset_t *cpu_mask = NULL; int rc, i; /* Create TX queues */ for (i = 0; i < adapter->num_io_queues; i++) { msix_vector = ENA_IO_IRQ_IDX(i); ena_qid = ENA_IO_TXQ_IDX(i); ctx.mem_queue_type = ena_dev->tx_mem_queue_type; ctx.direction = ENA_COM_IO_QUEUE_DIRECTION_TX; ctx.queue_size = adapter->requested_tx_ring_size; ctx.msix_vector = msix_vector; ctx.qid = ena_qid; ctx.numa_node = adapter->que[i].domain; rc = ena_com_create_io_queue(ena_dev, &ctx); if (rc != 0) { ena_log(adapter->pdev, ERR, "Failed to create io TX queue #%d rc: %d\n", i, rc); goto err_tx; } ring = &adapter->tx_ring[i]; rc = ena_com_get_io_handlers(ena_dev, ena_qid, &ring->ena_com_io_sq, &ring->ena_com_io_cq); if (rc != 0) { ena_log(adapter->pdev, ERR, "Failed to get TX queue handlers. TX queue num" " %d rc: %d\n", i, rc); ena_com_destroy_io_queue(ena_dev, ena_qid); goto err_tx; } if (ctx.numa_node >= 0) { ena_com_update_numa_node(ring->ena_com_io_cq, ctx.numa_node); } } /* Create RX queues */ for (i = 0; i < adapter->num_io_queues; i++) { msix_vector = ENA_IO_IRQ_IDX(i); ena_qid = ENA_IO_RXQ_IDX(i); ctx.mem_queue_type = ENA_ADMIN_PLACEMENT_POLICY_HOST; ctx.direction = ENA_COM_IO_QUEUE_DIRECTION_RX; ctx.queue_size = adapter->requested_rx_ring_size; ctx.msix_vector = msix_vector; ctx.qid = ena_qid; ctx.numa_node = adapter->que[i].domain; rc = ena_com_create_io_queue(ena_dev, &ctx); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "Failed to create io RX queue[%d] rc: %d\n", i, rc); goto err_rx; } ring = &adapter->rx_ring[i]; rc = ena_com_get_io_handlers(ena_dev, ena_qid, &ring->ena_com_io_sq, &ring->ena_com_io_cq); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "Failed to get RX queue handlers. RX queue num" " %d rc: %d\n", i, rc); ena_com_destroy_io_queue(ena_dev, ena_qid); goto err_rx; } if (ctx.numa_node >= 0) { ena_com_update_numa_node(ring->ena_com_io_cq, ctx.numa_node); } } for (i = 0; i < adapter->num_io_queues; i++) { queue = &adapter->que[i]; NET_TASK_INIT(&queue->cleanup_task, 0, ena_cleanup, queue); queue->cleanup_tq = taskqueue_create_fast("ena cleanup", M_WAITOK, taskqueue_thread_enqueue, &queue->cleanup_tq); #ifdef RSS cpu_mask = &queue->cpu_mask; #endif taskqueue_start_threads_cpuset(&queue->cleanup_tq, 1, PI_NET, cpu_mask, "%s queue %d cleanup", device_get_nameunit(adapter->pdev), i); } return (0); err_rx: while (i--) ena_com_destroy_io_queue(ena_dev, ENA_IO_RXQ_IDX(i)); i = adapter->num_io_queues; err_tx: while (i--) ena_com_destroy_io_queue(ena_dev, ENA_IO_TXQ_IDX(i)); return (ENXIO); } /********************************************************************* * * MSIX & Interrupt Service routine * **********************************************************************/ /** * ena_handle_msix - MSIX Interrupt Handler for admin/async queue * @arg: interrupt number **/ static void ena_intr_msix_mgmnt(void *arg) { struct ena_adapter *adapter = (struct ena_adapter *)arg; ena_com_admin_q_comp_intr_handler(adapter->ena_dev); if (likely(ENA_FLAG_ISSET(ENA_FLAG_DEVICE_RUNNING, adapter))) ena_com_aenq_intr_handler(adapter->ena_dev, arg); } /** * ena_handle_msix - MSIX Interrupt Handler for Tx/Rx * @arg: queue **/ static int ena_handle_msix(void *arg) { struct ena_que *queue = arg; struct ena_adapter *adapter = queue->adapter; if_t ifp = adapter->ifp; if (unlikely((if_getdrvflags(ifp) & IFF_DRV_RUNNING) == 0)) return (FILTER_STRAY); taskqueue_enqueue(queue->cleanup_tq, &queue->cleanup_task); return (FILTER_HANDLED); } static int ena_enable_msix(struct ena_adapter 
*adapter) { device_t dev = adapter->pdev; int msix_vecs, msix_req; int i, rc = 0; if (ENA_FLAG_ISSET(ENA_FLAG_MSIX_ENABLED, adapter)) { ena_log(dev, ERR, "Error, MSI-X is already enabled\n"); return (EINVAL); } /* Reserved the max msix vectors we might need */ msix_vecs = ENA_MAX_MSIX_VEC(adapter->max_num_io_queues); adapter->msix_entries = malloc(msix_vecs * sizeof(struct msix_entry), M_DEVBUF, M_WAITOK | M_ZERO); ena_log(dev, DBG, "trying to enable MSI-X, vectors: %d\n", msix_vecs); for (i = 0; i < msix_vecs; i++) { adapter->msix_entries[i].entry = i; /* Vectors must start from 1 */ adapter->msix_entries[i].vector = i + 1; } msix_req = msix_vecs; rc = pci_alloc_msix(dev, &msix_vecs); if (unlikely(rc != 0)) { ena_log(dev, ERR, "Failed to enable MSIX, vectors %d rc %d\n", msix_vecs, rc); rc = ENOSPC; goto err_msix_free; } if (msix_vecs != msix_req) { if (msix_vecs == ENA_ADMIN_MSIX_VEC) { ena_log(dev, ERR, "Not enough number of MSI-x allocated: %d\n", msix_vecs); pci_release_msi(dev); rc = ENOSPC; goto err_msix_free; } ena_log(dev, ERR, "Enable only %d MSI-x (out of %d), reduce " "the number of queues\n", msix_vecs, msix_req); } adapter->msix_vecs = msix_vecs; ENA_FLAG_SET_ATOMIC(ENA_FLAG_MSIX_ENABLED, adapter); return (0); err_msix_free: free(adapter->msix_entries, M_DEVBUF); adapter->msix_entries = NULL; return (rc); } static void ena_setup_mgmnt_intr(struct ena_adapter *adapter) { snprintf(adapter->irq_tbl[ENA_MGMNT_IRQ_IDX].name, ENA_IRQNAME_SIZE, "ena-mgmnt@pci:%s", device_get_nameunit(adapter->pdev)); /* * Handler is NULL on purpose, it will be set * when mgmnt interrupt is acquired */ adapter->irq_tbl[ENA_MGMNT_IRQ_IDX].handler = NULL; adapter->irq_tbl[ENA_MGMNT_IRQ_IDX].data = adapter; adapter->irq_tbl[ENA_MGMNT_IRQ_IDX].vector = adapter->msix_entries[ENA_MGMNT_IRQ_IDX].vector; } static int ena_setup_io_intr(struct ena_adapter *adapter) { #ifdef RSS int num_buckets = rss_getnumbuckets(); static int last_bind = 0; int cur_bind; int idx; #endif int irq_idx; if (adapter->msix_entries == NULL) return (EINVAL); #ifdef RSS if (adapter->first_bind < 0) { adapter->first_bind = last_bind; last_bind = (last_bind + adapter->num_io_queues) % num_buckets; } cur_bind = adapter->first_bind; #endif for (int i = 0; i < adapter->num_io_queues; i++) { irq_idx = ENA_IO_IRQ_IDX(i); snprintf(adapter->irq_tbl[irq_idx].name, ENA_IRQNAME_SIZE, "%s-TxRx-%d", device_get_nameunit(adapter->pdev), i); adapter->irq_tbl[irq_idx].handler = ena_handle_msix; adapter->irq_tbl[irq_idx].data = &adapter->que[i]; adapter->irq_tbl[irq_idx].vector = adapter->msix_entries[irq_idx].vector; ena_log(adapter->pdev, DBG, "ena_setup_io_intr vector: %d\n", adapter->msix_entries[irq_idx].vector); if (adapter->irq_cpu_base > ENA_BASE_CPU_UNSPECIFIED) { adapter->que[i].cpu = adapter->irq_tbl[irq_idx].cpu = (unsigned)(adapter->irq_cpu_base + i * adapter->irq_cpu_stride) % (unsigned)mp_ncpus; CPU_SETOF(adapter->que[i].cpu, &adapter->que[i].cpu_mask); } #ifdef RSS adapter->que[i].cpu = adapter->irq_tbl[irq_idx].cpu = rss_getcpu(cur_bind); cur_bind = (cur_bind + 1) % num_buckets; CPU_SETOF(adapter->que[i].cpu, &adapter->que[i].cpu_mask); for (idx = 0; idx < MAXMEMDOM; ++idx) { if (CPU_ISSET(adapter->que[i].cpu, &cpuset_domain[idx])) break; } adapter->que[i].domain = idx; #else adapter->que[i].domain = -1; #endif } return (0); } static int ena_request_mgmnt_irq(struct ena_adapter *adapter) { device_t pdev = adapter->pdev; struct ena_irq *irq; unsigned long flags; int rc, rcc; flags = RF_ACTIVE | RF_SHAREABLE; irq = 
&adapter->irq_tbl[ENA_MGMNT_IRQ_IDX]; irq->res = bus_alloc_resource_any(adapter->pdev, SYS_RES_IRQ, &irq->vector, flags); if (unlikely(irq->res == NULL)) { ena_log(pdev, ERR, "could not allocate irq vector: %d\n", irq->vector); return (ENXIO); } rc = bus_setup_intr(adapter->pdev, irq->res, INTR_TYPE_NET | INTR_MPSAFE, NULL, ena_intr_msix_mgmnt, irq->data, &irq->cookie); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "failed to register interrupt handler for irq %ju: %d\n", rman_get_start(irq->res), rc); goto err_res_free; } irq->requested = true; return (rc); err_res_free: ena_log(pdev, INFO, "releasing resource for irq %d\n", irq->vector); rcc = bus_release_resource(adapter->pdev, SYS_RES_IRQ, irq->vector, irq->res); if (unlikely(rcc != 0)) ena_log(pdev, ERR, "dev has no parent while releasing res for irq: %d\n", irq->vector); irq->res = NULL; return (rc); } static int ena_request_io_irq(struct ena_adapter *adapter) { device_t pdev = adapter->pdev; struct ena_irq *irq; unsigned long flags = 0; int rc = 0, i, rcc; if (unlikely(!ENA_FLAG_ISSET(ENA_FLAG_MSIX_ENABLED, adapter))) { ena_log(pdev, ERR, "failed to request I/O IRQ: MSI-X is not enabled\n"); return (EINVAL); } else { flags = RF_ACTIVE | RF_SHAREABLE; } for (i = ENA_IO_IRQ_FIRST_IDX; i < adapter->msix_vecs; i++) { irq = &adapter->irq_tbl[i]; if (unlikely(irq->requested)) continue; irq->res = bus_alloc_resource_any(adapter->pdev, SYS_RES_IRQ, &irq->vector, flags); if (unlikely(irq->res == NULL)) { rc = ENOMEM; ena_log(pdev, ERR, "could not allocate irq vector: %d\n", irq->vector); goto err; } rc = bus_setup_intr(adapter->pdev, irq->res, INTR_TYPE_NET | INTR_MPSAFE, irq->handler, NULL, irq->data, &irq->cookie); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "failed to register interrupt handler for irq %ju: %d\n", rman_get_start(irq->res), rc); goto err; } irq->requested = true; if (adapter->rss_enabled || adapter->irq_cpu_base > ENA_BASE_CPU_UNSPECIFIED) { rc = bus_bind_intr(adapter->pdev, irq->res, irq->cpu); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "failed to bind interrupt handler for irq %ju to cpu %d: %d\n", rman_get_start(irq->res), irq->cpu, rc); goto err; } ena_log(pdev, INFO, "queue %d - cpu %d\n", i - ENA_IO_IRQ_FIRST_IDX, irq->cpu); } } return (rc); err: for (; i >= ENA_IO_IRQ_FIRST_IDX; i--) { irq = &adapter->irq_tbl[i]; rcc = 0; /* Once we entered err: section and irq->requested is true we free both intr and resources */ if (irq->requested) { rcc = bus_teardown_intr(adapter->pdev, irq->res, irq->cookie); if (unlikely(rcc != 0)) ena_log(pdev, ERR, "could not release irq: %d, error: %d\n", irq->vector, rcc); } /* If we entered err: section without irq->requested set we know it was bus_alloc_resource_any() that needs cleanup, provided res is not NULL. 
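The release below therefore only checks irq->res, which covers both cases.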
In case res is NULL no work in needed in this iteration */ rcc = 0; if (irq->res != NULL) { rcc = bus_release_resource(adapter->pdev, SYS_RES_IRQ, irq->vector, irq->res); } if (unlikely(rcc != 0)) ena_log(pdev, ERR, "dev has no parent while releasing res for irq: %d\n", irq->vector); irq->requested = false; irq->res = NULL; } return (rc); } static void ena_free_mgmnt_irq(struct ena_adapter *adapter) { device_t pdev = adapter->pdev; struct ena_irq *irq; int rc; irq = &adapter->irq_tbl[ENA_MGMNT_IRQ_IDX]; if (irq->requested) { ena_log(pdev, DBG, "tear down irq: %d\n", irq->vector); rc = bus_teardown_intr(adapter->pdev, irq->res, irq->cookie); if (unlikely(rc != 0)) ena_log(pdev, ERR, "failed to tear down irq: %d\n", irq->vector); irq->requested = 0; } if (irq->res != NULL) { ena_log(pdev, DBG, "release resource irq: %d\n", irq->vector); rc = bus_release_resource(adapter->pdev, SYS_RES_IRQ, irq->vector, irq->res); irq->res = NULL; if (unlikely(rc != 0)) ena_log(pdev, ERR, "dev has no parent while releasing res for irq: %d\n", irq->vector); } } static void ena_free_io_irq(struct ena_adapter *adapter) { device_t pdev = adapter->pdev; struct ena_irq *irq; int rc; for (int i = ENA_IO_IRQ_FIRST_IDX; i < adapter->msix_vecs; i++) { irq = &adapter->irq_tbl[i]; if (irq->requested) { ena_log(pdev, DBG, "tear down irq: %d\n", irq->vector); rc = bus_teardown_intr(adapter->pdev, irq->res, irq->cookie); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "failed to tear down irq: %d\n", irq->vector); } irq->requested = 0; } if (irq->res != NULL) { ena_log(pdev, DBG, "release resource irq: %d\n", irq->vector); rc = bus_release_resource(adapter->pdev, SYS_RES_IRQ, irq->vector, irq->res); irq->res = NULL; if (unlikely(rc != 0)) { ena_log(pdev, ERR, "dev has no parent while releasing res for irq: %d\n", irq->vector); } } } } static void ena_free_irqs(struct ena_adapter *adapter) { ena_free_io_irq(adapter); ena_free_mgmnt_irq(adapter); ena_disable_msix(adapter); } static void ena_disable_msix(struct ena_adapter *adapter) { if (ENA_FLAG_ISSET(ENA_FLAG_MSIX_ENABLED, adapter)) { ENA_FLAG_CLEAR_ATOMIC(ENA_FLAG_MSIX_ENABLED, adapter); pci_release_msi(adapter->pdev); } adapter->msix_vecs = 0; free(adapter->msix_entries, M_DEVBUF); adapter->msix_entries = NULL; } static void ena_unmask_all_io_irqs(struct ena_adapter *adapter) { struct ena_com_io_cq *io_cq; struct ena_eth_io_intr_reg intr_reg; struct ena_ring *tx_ring; uint16_t ena_qid; int i; /* Unmask interrupts for all queues */ for (i = 0; i < adapter->num_io_queues; i++) { ena_qid = ENA_IO_TXQ_IDX(i); io_cq = &adapter->ena_dev->io_cq_queues[ena_qid]; ena_com_update_intr_reg(&intr_reg, 0, 0, true, false); tx_ring = &adapter->tx_ring[i]; counter_u64_add(tx_ring->tx_stats.unmask_interrupt_num, 1); ena_com_unmask_intr(io_cq, &intr_reg); } } static int ena_up_complete(struct ena_adapter *adapter) { int rc; if (likely(ENA_FLAG_ISSET(ENA_FLAG_RSS_ACTIVE, adapter))) { rc = ena_rss_configure(adapter); if (rc != 0) { ena_log(adapter->pdev, ERR, "Failed to configure RSS\n"); return (rc); } } rc = ena_change_mtu(adapter->ifp, if_getmtu(adapter->ifp)); if (unlikely(rc != 0)) return (rc); ena_refill_all_rx_bufs(adapter); ena_reset_counters((counter_u64_t *)&adapter->hw_stats, sizeof(adapter->hw_stats)); return (0); } static void set_io_rings_size(struct ena_adapter *adapter, int new_tx_size, int new_rx_size) { int i; for (i = 0; i < adapter->num_io_queues; i++) { adapter->tx_ring[i].ring_size = new_tx_size; adapter->rx_ring[i].ring_size = new_rx_size; } } static int 
create_queues_with_size_backoff(struct ena_adapter *adapter) { device_t pdev = adapter->pdev; int rc; uint32_t cur_rx_ring_size, cur_tx_ring_size; uint32_t new_rx_ring_size, new_tx_ring_size; /* * Current queue sizes might be set to smaller than the requested * ones due to past queue allocation failures. */ set_io_rings_size(adapter, adapter->requested_tx_ring_size, adapter->requested_rx_ring_size); while (1) { /* Allocate transmit descriptors */ rc = ena_setup_all_tx_resources(adapter); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "err_setup_tx\n"); goto err_setup_tx; } /* Allocate receive descriptors */ rc = ena_setup_all_rx_resources(adapter); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "err_setup_rx\n"); goto err_setup_rx; } /* Create IO queues for Rx & Tx */ rc = ena_create_io_queues(adapter); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "create IO queues failed\n"); goto err_io_que; } return (0); err_io_que: ena_free_all_rx_resources(adapter); err_setup_rx: ena_free_all_tx_resources(adapter); err_setup_tx: /* * Lower the ring size if ENOMEM. Otherwise, return the * error straightaway. */ if (unlikely(rc != ENOMEM)) { ena_log(pdev, ERR, "Queue creation failed with error code: %d\n", rc); return (rc); } cur_tx_ring_size = adapter->tx_ring[0].ring_size; cur_rx_ring_size = adapter->rx_ring[0].ring_size; ena_log(pdev, ERR, "Not enough memory to create queues with sizes TX=%d, RX=%d\n", cur_tx_ring_size, cur_rx_ring_size); new_tx_ring_size = cur_tx_ring_size; new_rx_ring_size = cur_rx_ring_size; /* * Decrease the size of a larger queue, or decrease both if they * are the same size. */ if (cur_rx_ring_size <= cur_tx_ring_size) new_tx_ring_size = cur_tx_ring_size / 2; if (cur_rx_ring_size >= cur_tx_ring_size) new_rx_ring_size = cur_rx_ring_size / 2; if (new_tx_ring_size < ENA_MIN_RING_SIZE || new_rx_ring_size < ENA_MIN_RING_SIZE) { ena_log(pdev, ERR, "Queue creation failed with the smallest possible queue size" "of %d for both queues. Not retrying with smaller queues\n", ENA_MIN_RING_SIZE); return (rc); } ena_log(pdev, INFO, "Retrying queue creation with sizes TX=%d, RX=%d\n", new_tx_ring_size, new_rx_ring_size); set_io_rings_size(adapter, new_tx_ring_size, new_rx_ring_size); } } int ena_up(struct ena_adapter *adapter) { int rc = 0; ENA_LOCK_ASSERT(); if (unlikely(device_is_attached(adapter->pdev) == 0)) { ena_log(adapter->pdev, ERR, "device is not attached!\n"); return (ENXIO); } if (ENA_FLAG_ISSET(ENA_FLAG_DEV_UP, adapter)) return (0); ena_log(adapter->pdev, INFO, "device is going UP\n"); /* setup interrupts for IO queues */ rc = ena_setup_io_intr(adapter); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "error setting up IO interrupt\n"); goto error; } rc = ena_request_io_irq(adapter); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "err_req_irq\n"); goto error; } ena_log(adapter->pdev, INFO, "Creating %u IO queues. Rx queue size: %d, Tx queue size: %d, LLQ is %s\n", adapter->num_io_queues, adapter->requested_rx_ring_size, adapter->requested_tx_ring_size, (adapter->ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV) ? 
"ENABLED" : "DISABLED"); rc = create_queues_with_size_backoff(adapter); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "error creating queues with size backoff\n"); goto err_create_queues_with_backoff; } if (ENA_FLAG_ISSET(ENA_FLAG_LINK_UP, adapter)) if_link_state_change(adapter->ifp, LINK_STATE_UP); rc = ena_up_complete(adapter); if (unlikely(rc != 0)) goto err_up_complete; counter_u64_add(adapter->dev_stats.interface_up, 1); ena_update_hwassist(adapter); if_setdrvflagbits(adapter->ifp, IFF_DRV_RUNNING, IFF_DRV_OACTIVE); ENA_FLAG_SET_ATOMIC(ENA_FLAG_DEV_UP, adapter); ena_unmask_all_io_irqs(adapter); return (0); err_up_complete: ena_destroy_all_io_queues(adapter); ena_free_all_rx_resources(adapter); ena_free_all_tx_resources(adapter); err_create_queues_with_backoff: ena_free_io_irq(adapter); error: return (rc); } static uint64_t ena_get_counter(if_t ifp, ift_counter cnt) { struct ena_adapter *adapter; struct ena_hw_stats *stats; adapter = if_getsoftc(ifp); stats = &adapter->hw_stats; switch (cnt) { case IFCOUNTER_IPACKETS: return (counter_u64_fetch(stats->rx_packets)); case IFCOUNTER_OPACKETS: return (counter_u64_fetch(stats->tx_packets)); case IFCOUNTER_IBYTES: return (counter_u64_fetch(stats->rx_bytes)); case IFCOUNTER_OBYTES: return (counter_u64_fetch(stats->tx_bytes)); case IFCOUNTER_IQDROPS: return (counter_u64_fetch(stats->rx_drops)); case IFCOUNTER_OQDROPS: return (counter_u64_fetch(stats->tx_drops)); default: return (if_get_counter_default(ifp, cnt)); } } static int ena_media_change(if_t ifp) { /* Media Change is not supported by firmware */ return (0); } static void ena_media_status(if_t ifp, struct ifmediareq *ifmr) { struct ena_adapter *adapter = if_getsoftc(ifp); ena_log(adapter->pdev, DBG, "Media status update\n"); ENA_LOCK_LOCK(); ifmr->ifm_status = IFM_AVALID; ifmr->ifm_active = IFM_ETHER; if (!ENA_FLAG_ISSET(ENA_FLAG_LINK_UP, adapter)) { ENA_LOCK_UNLOCK(); ena_log(adapter->pdev, INFO, "Link is down\n"); return; } ifmr->ifm_status |= IFM_ACTIVE; ifmr->ifm_active |= IFM_UNKNOWN | IFM_FDX; ENA_LOCK_UNLOCK(); } static void ena_init(void *arg) { struct ena_adapter *adapter = (struct ena_adapter *)arg; if (!ENA_FLAG_ISSET(ENA_FLAG_DEV_UP, adapter)) { ENA_LOCK_LOCK(); ena_up(adapter); ENA_LOCK_UNLOCK(); } } static int ena_ioctl(if_t ifp, u_long command, caddr_t data) { struct ena_adapter *adapter; struct ifreq *ifr; int rc; adapter = if_getsoftc(ifp); ifr = (struct ifreq *)data; /* * Acquiring lock to prevent from running up and down routines parallel. 
*/ rc = 0; switch (command) { case SIOCSIFMTU: if (if_getmtu(ifp) == ifr->ifr_mtu) break; ENA_LOCK_LOCK(); ena_down(adapter); ena_change_mtu(ifp, ifr->ifr_mtu); rc = ena_up(adapter); ENA_LOCK_UNLOCK(); break; case SIOCSIFFLAGS: if ((if_getflags(ifp) & IFF_UP) != 0) { if ((if_getdrvflags(ifp) & IFF_DRV_RUNNING) != 0) { if ((if_getflags(ifp) & (IFF_PROMISC | IFF_ALLMULTI)) != 0) { ena_log(adapter->pdev, INFO, "ioctl promisc/allmulti\n"); } } else { ENA_LOCK_LOCK(); rc = ena_up(adapter); ENA_LOCK_UNLOCK(); } } else { if ((if_getdrvflags(ifp) & IFF_DRV_RUNNING) != 0) { ENA_LOCK_LOCK(); ena_down(adapter); ENA_LOCK_UNLOCK(); } } break; case SIOCADDMULTI: case SIOCDELMULTI: break; case SIOCSIFMEDIA: case SIOCGIFMEDIA: rc = ifmedia_ioctl(ifp, ifr, &adapter->media, command); break; case SIOCSIFCAP: { int reinit = 0; if (ifr->ifr_reqcap != if_getcapenable(ifp)) { if_setcapenable(ifp, ifr->ifr_reqcap); reinit = 1; } if ((reinit != 0) && ((if_getdrvflags(ifp) & IFF_DRV_RUNNING) != 0)) { ENA_LOCK_LOCK(); ena_down(adapter); rc = ena_up(adapter); ENA_LOCK_UNLOCK(); } } break; default: rc = ether_ioctl(ifp, command, data); break; } return (rc); } static int ena_get_dev_offloads(struct ena_com_dev_get_features_ctx *feat) { int caps = 0; if ((feat->offload.tx & (ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L4_IPV4_CSUM_FULL_MASK | ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L4_IPV4_CSUM_PART_MASK | ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L3_CSUM_IPV4_MASK)) != 0) caps |= IFCAP_TXCSUM; if ((feat->offload.tx & (ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L4_IPV6_CSUM_FULL_MASK | ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L4_IPV6_CSUM_PART_MASK)) != 0) caps |= IFCAP_TXCSUM_IPV6; if ((feat->offload.tx & ENA_ADMIN_FEATURE_OFFLOAD_DESC_TSO_IPV4_MASK) != 0) caps |= IFCAP_TSO4; if ((feat->offload.tx & ENA_ADMIN_FEATURE_OFFLOAD_DESC_TSO_IPV6_MASK) != 0) caps |= IFCAP_TSO6; if ((feat->offload.rx_supported & (ENA_ADMIN_FEATURE_OFFLOAD_DESC_RX_L4_IPV4_CSUM_MASK | ENA_ADMIN_FEATURE_OFFLOAD_DESC_RX_L3_CSUM_IPV4_MASK)) != 0) caps |= IFCAP_RXCSUM; if ((feat->offload.rx_supported & ENA_ADMIN_FEATURE_OFFLOAD_DESC_RX_L4_IPV6_CSUM_MASK) != 0) caps |= IFCAP_RXCSUM_IPV6; caps |= IFCAP_LRO | IFCAP_JUMBO_MTU; return (caps); } static void ena_update_host_info(struct ena_admin_host_info *host_info, if_t ifp) { host_info->supported_network_features[0] = (uint32_t)if_getcapabilities(ifp); } static void ena_update_hwassist(struct ena_adapter *adapter) { if_t ifp = adapter->ifp; uint32_t feat = adapter->tx_offload_cap; int cap = if_getcapenable(ifp); int flags = 0; if_clearhwassist(ifp); if ((cap & IFCAP_TXCSUM) != 0) { if ((feat & ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L3_CSUM_IPV4_MASK) != 0) flags |= CSUM_IP; if ((feat & (ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L4_IPV4_CSUM_FULL_MASK | ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L4_IPV4_CSUM_PART_MASK)) != 0) flags |= CSUM_IP_UDP | CSUM_IP_TCP; } if ((cap & IFCAP_TXCSUM_IPV6) != 0) flags |= CSUM_IP6_UDP | CSUM_IP6_TCP; if ((cap & IFCAP_TSO4) != 0) flags |= CSUM_IP_TSO; if ((cap & IFCAP_TSO6) != 0) flags |= CSUM_IP6_TSO; if_sethwassistbits(ifp, flags, 0); } static int ena_setup_ifnet(device_t pdev, struct ena_adapter *adapter, struct ena_com_dev_get_features_ctx *feat) { if_t ifp; int caps = 0; ifp = adapter->ifp = if_gethandle(IFT_ETHER); if (unlikely(ifp == NULL)) { ena_log(pdev, ERR, "can not allocate ifnet structure\n"); return (ENXIO); } if_initname(ifp, device_get_name(pdev), device_get_unit(pdev)); if_setdev(ifp, pdev); if_setsoftc(ifp, adapter); if_setflags(ifp, IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST); if_setinitfn(ifp, ena_init); 
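/* ena_mq_start() and ena_qflush() implement the multiqueue transmit interface. */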
if_settransmitfn(ifp, ena_mq_start); if_setqflushfn(ifp, ena_qflush); if_setioctlfn(ifp, ena_ioctl); if_setgetcounterfn(ifp, ena_get_counter); if_setsendqlen(ifp, adapter->requested_tx_ring_size); if_setsendqready(ifp); if_setmtu(ifp, ETHERMTU); if_setbaudrate(ifp, 0); /* Zeroize capabilities... */ if_setcapabilities(ifp, 0); if_setcapenable(ifp, 0); /* check hardware support */ caps = ena_get_dev_offloads(feat); /* ... and set them */ if_setcapabilitiesbit(ifp, caps, 0); /* TSO parameters */ if_sethwtsomax(ifp, ENA_TSO_MAXSIZE - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN)); if_sethwtsomaxsegcount(ifp, adapter->max_tx_sgl_size - 1); if_sethwtsomaxsegsize(ifp, ENA_TSO_MAXSIZE); if_setifheaderlen(ifp, sizeof(struct ether_vlan_header)); if_setcapenable(ifp, if_getcapabilities(ifp)); /* * Specify the media types supported by this adapter and register * callbacks to update media and link information */ ifmedia_init(&adapter->media, IFM_IMASK, ena_media_change, ena_media_status); ifmedia_add(&adapter->media, IFM_ETHER | IFM_AUTO, 0, NULL); ifmedia_set(&adapter->media, IFM_ETHER | IFM_AUTO); ether_ifattach(ifp, adapter->mac_addr); return (0); } void ena_down(struct ena_adapter *adapter) { int rc; ENA_LOCK_ASSERT(); if (!ENA_FLAG_ISSET(ENA_FLAG_DEV_UP, adapter)) return; ena_log(adapter->pdev, INFO, "device is going DOWN\n"); ENA_FLAG_CLEAR_ATOMIC(ENA_FLAG_DEV_UP, adapter); if_setdrvflagbits(adapter->ifp, IFF_DRV_OACTIVE, IFF_DRV_RUNNING); ena_free_io_irq(adapter); if (ENA_FLAG_ISSET(ENA_FLAG_TRIGGER_RESET, adapter)) { rc = ena_com_dev_reset(adapter->ena_dev, adapter->reset_reason); if (unlikely(rc != 0)) ena_log(adapter->pdev, ERR, "Device reset failed\n"); } ena_destroy_all_io_queues(adapter); ena_free_all_tx_bufs(adapter); ena_free_all_rx_bufs(adapter); ena_free_all_tx_resources(adapter); ena_free_all_rx_resources(adapter); counter_u64_add(adapter->dev_stats.interface_down, 1); } static uint32_t ena_calc_max_io_queue_num(device_t pdev, struct ena_com_dev *ena_dev, struct ena_com_dev_get_features_ctx *get_feat_ctx) { uint32_t io_tx_sq_num, io_tx_cq_num, io_rx_num, max_num_io_queues; /* Regular queues capabilities */ if (ena_dev->supported_features & BIT(ENA_ADMIN_MAX_QUEUES_EXT)) { struct ena_admin_queue_ext_feature_fields *max_queue_ext = &get_feat_ctx->max_queue_ext.max_queue_ext; io_rx_num = min_t(int, max_queue_ext->max_rx_sq_num, max_queue_ext->max_rx_cq_num); io_tx_sq_num = max_queue_ext->max_tx_sq_num; io_tx_cq_num = max_queue_ext->max_tx_cq_num; } else { struct ena_admin_queue_feature_desc *max_queues = &get_feat_ctx->max_queues; io_tx_sq_num = max_queues->max_sq_num; io_tx_cq_num = max_queues->max_cq_num; io_rx_num = min_t(int, io_tx_sq_num, io_tx_cq_num); } /* In case of LLQ use the llq fields for the tx SQ/CQ */ if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV) io_tx_sq_num = get_feat_ctx->llq.max_llq_num; max_num_io_queues = min_t(uint32_t, mp_ncpus, ENA_MAX_NUM_IO_QUEUES); max_num_io_queues = min_t(uint32_t, max_num_io_queues, io_rx_num); max_num_io_queues = min_t(uint32_t, max_num_io_queues, io_tx_sq_num); max_num_io_queues = min_t(uint32_t, max_num_io_queues, io_tx_cq_num); /* 1 IRQ for mgmnt and 1 IRQ for each TX/RX pair */ max_num_io_queues = min_t(uint32_t, max_num_io_queues, pci_msix_count(pdev) - 1); #ifdef RSS max_num_io_queues = min_t(uint32_t, max_num_io_queues, rss_getnumbuckets()); #endif return (max_num_io_queues); } static int ena_enable_wc(device_t pdev, struct resource *res) { #if defined(__i386) || defined(__amd64) || defined(__aarch64__) vm_offset_t va; 
vm_size_t len; int rc; va = (vm_offset_t)rman_get_virtual(res); len = rman_get_size(res); /* Enable write combining */ rc = pmap_change_attr(va, len, VM_MEMATTR_WRITE_COMBINING); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "pmap_change_attr failed, %d\n", rc); return (rc); } return (0); #endif return (EOPNOTSUPP); } static int ena_set_queues_placement_policy(device_t pdev, struct ena_com_dev *ena_dev, struct ena_admin_feature_llq_desc *llq, struct ena_llq_configurations *llq_default_configurations) { int rc; uint32_t llq_feature_mask; llq_feature_mask = 1 << ENA_ADMIN_LLQ; if (!(ena_dev->supported_features & llq_feature_mask)) { ena_log(pdev, WARN, "LLQ is not supported. Fallback to host mode policy.\n"); ena_dev->tx_mem_queue_type = ENA_ADMIN_PLACEMENT_POLICY_HOST; return (0); } if (ena_dev->mem_bar == NULL) { ena_log(pdev, WARN, "LLQ is advertised as supported but device doesn't expose mem bar.\n"); ena_dev->tx_mem_queue_type = ENA_ADMIN_PLACEMENT_POLICY_HOST; return (0); } rc = ena_com_config_dev_mode(ena_dev, llq, llq_default_configurations); if (unlikely(rc != 0)) { ena_log(pdev, WARN, "Failed to configure the device mode. " "Fallback to host mode policy.\n"); ena_dev->tx_mem_queue_type = ENA_ADMIN_PLACEMENT_POLICY_HOST; } return (0); } static int ena_map_llq_mem_bar(device_t pdev, struct ena_com_dev *ena_dev) { struct ena_adapter *adapter = device_get_softc(pdev); int rc, rid; /* Try to allocate resources for LLQ bar */ rid = PCIR_BAR(ENA_MEM_BAR); adapter->memory = bus_alloc_resource_any(pdev, SYS_RES_MEMORY, &rid, RF_ACTIVE); if (unlikely(adapter->memory == NULL)) { ena_log(pdev, WARN, "Unable to allocate LLQ bar resource. LLQ mode won't be used.\n"); return (0); } /* Enable write combining for better LLQ performance */ rc = ena_enable_wc(adapter->pdev, adapter->memory); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "failed to enable write combining.\n"); return (rc); } /* * Save virtual address of the device's memory region * for the ena_com layer. 
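* With LLQ enabled, ena_com writes Tx descriptors and packet headers through this write-combined mapping directly into device memory.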
*/ ena_dev->mem_bar = rman_get_virtual(adapter->memory); return (0); } static inline void set_default_llq_configurations(struct ena_llq_configurations *llq_config, struct ena_admin_feature_llq_desc *llq) { llq_config->llq_header_location = ENA_ADMIN_INLINE_HEADER; llq_config->llq_stride_ctrl = ENA_ADMIN_MULTIPLE_DESCS_PER_ENTRY; llq_config->llq_num_decs_before_header = ENA_ADMIN_LLQ_NUM_DESCS_BEFORE_HEADER_2; if ((llq->entry_size_ctrl_supported & ENA_ADMIN_LIST_ENTRY_SIZE_256B) != 0 && ena_force_large_llq_header) { llq_config->llq_ring_entry_size = ENA_ADMIN_LIST_ENTRY_SIZE_256B; llq_config->llq_ring_entry_size_value = 256; } else { llq_config->llq_ring_entry_size = ENA_ADMIN_LIST_ENTRY_SIZE_128B; llq_config->llq_ring_entry_size_value = 128; } } static int ena_calc_io_queue_size(struct ena_calc_queue_size_ctx *ctx) { struct ena_admin_feature_llq_desc *llq = &ctx->get_feat_ctx->llq; struct ena_com_dev *ena_dev = ctx->ena_dev; uint32_t tx_queue_size = ENA_DEFAULT_RING_SIZE; uint32_t rx_queue_size = ENA_DEFAULT_RING_SIZE; uint32_t max_tx_queue_size; uint32_t max_rx_queue_size; if (ena_dev->supported_features & BIT(ENA_ADMIN_MAX_QUEUES_EXT)) { struct ena_admin_queue_ext_feature_fields *max_queue_ext = &ctx->get_feat_ctx->max_queue_ext.max_queue_ext; max_rx_queue_size = min_t(uint32_t, max_queue_ext->max_rx_cq_depth, max_queue_ext->max_rx_sq_depth); max_tx_queue_size = max_queue_ext->max_tx_cq_depth; if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV) max_tx_queue_size = min_t(uint32_t, max_tx_queue_size, llq->max_llq_depth); else max_tx_queue_size = min_t(uint32_t, max_tx_queue_size, max_queue_ext->max_tx_sq_depth); ctx->max_tx_sgl_size = min_t(uint16_t, ENA_PKT_MAX_BUFS, max_queue_ext->max_per_packet_tx_descs); ctx->max_rx_sgl_size = min_t(uint16_t, ENA_PKT_MAX_BUFS, max_queue_ext->max_per_packet_rx_descs); } else { struct ena_admin_queue_feature_desc *max_queues = &ctx->get_feat_ctx->max_queues; max_rx_queue_size = min_t(uint32_t, max_queues->max_cq_depth, max_queues->max_sq_depth); max_tx_queue_size = max_queues->max_cq_depth; if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV) max_tx_queue_size = min_t(uint32_t, max_tx_queue_size, llq->max_llq_depth); else max_tx_queue_size = min_t(uint32_t, max_tx_queue_size, max_queues->max_sq_depth); ctx->max_tx_sgl_size = min_t(uint16_t, ENA_PKT_MAX_BUFS, max_queues->max_packet_tx_descs); ctx->max_rx_sgl_size = min_t(uint16_t, ENA_PKT_MAX_BUFS, max_queues->max_packet_rx_descs); } /* round down to the nearest power of 2 */ max_tx_queue_size = 1 << (flsl(max_tx_queue_size) - 1); max_rx_queue_size = 1 << (flsl(max_rx_queue_size) - 1); /* * When forcing large headers, we multiply the entry size by 2, * and therefore divide the queue size by 2, leaving the amount * of memory used by the queues unchanged. 
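* For example, a 1024-entry Tx queue with 128B entries becomes a 512-entry queue with 256B entries.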
*/ if (ena_force_large_llq_header) { if ((llq->entry_size_ctrl_supported & ENA_ADMIN_LIST_ENTRY_SIZE_256B) != 0 && ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV) { max_tx_queue_size /= 2; ena_log(ctx->pdev, INFO, "Forcing large headers and decreasing maximum Tx queue size to %d\n", max_tx_queue_size); } else { ena_log(ctx->pdev, WARN, "Forcing large headers failed: LLQ is disabled or device does not support large headers\n"); } } tx_queue_size = clamp_val(tx_queue_size, ENA_MIN_RING_SIZE, max_tx_queue_size); rx_queue_size = clamp_val(rx_queue_size, ENA_MIN_RING_SIZE, max_rx_queue_size); tx_queue_size = 1 << (flsl(tx_queue_size) - 1); rx_queue_size = 1 << (flsl(rx_queue_size) - 1); ctx->max_tx_queue_size = max_tx_queue_size; ctx->max_rx_queue_size = max_rx_queue_size; ctx->tx_queue_size = tx_queue_size; ctx->rx_queue_size = rx_queue_size; return (0); } static void ena_config_host_info(struct ena_com_dev *ena_dev, device_t dev) { struct ena_admin_host_info *host_info; uintptr_t rid; int rc; /* Allocate only the host info */ rc = ena_com_allocate_host_info(ena_dev); if (unlikely(rc != 0)) { ena_log(dev, ERR, "Cannot allocate host info\n"); return; } host_info = ena_dev->host_attr.host_info; if (pci_get_id(dev, PCI_ID_RID, &rid) == 0) host_info->bdf = rid; host_info->os_type = ENA_ADMIN_OS_FREEBSD; host_info->kernel_ver = osreldate; sprintf(host_info->kernel_ver_str, "%d", osreldate); host_info->os_dist = 0; strncpy(host_info->os_dist_str, osrelease, sizeof(host_info->os_dist_str) - 1); host_info->driver_version = (ENA_DRV_MODULE_VER_MAJOR) | (ENA_DRV_MODULE_VER_MINOR << ENA_ADMIN_HOST_INFO_MINOR_SHIFT) | (ENA_DRV_MODULE_VER_SUBMINOR << ENA_ADMIN_HOST_INFO_SUB_MINOR_SHIFT); host_info->num_cpus = mp_ncpus; host_info->driver_supported_features = ENA_ADMIN_HOST_INFO_RX_OFFSET_MASK | ENA_ADMIN_HOST_INFO_RSS_CONFIGURABLE_FUNCTION_KEY_MASK; rc = ena_com_set_host_attributes(ena_dev); if (unlikely(rc != 0)) { if (rc == EOPNOTSUPP) ena_log(dev, WARN, "Cannot set host attributes\n"); else ena_log(dev, ERR, "Cannot set host attributes\n"); goto err; } return; err: ena_com_delete_host_info(ena_dev); } static int ena_device_init(struct ena_adapter *adapter, device_t pdev, struct ena_com_dev_get_features_ctx *get_feat_ctx, int *wd_active) { struct ena_llq_configurations llq_config; struct ena_com_dev *ena_dev = adapter->ena_dev; bool readless_supported; uint32_t aenq_groups; int dma_width; int rc; rc = ena_com_mmio_reg_read_request_init(ena_dev); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "failed to init mmio read less\n"); return (rc); } /* * The PCIe configuration space revision id indicate if mmio reg * read is disabled */ readless_supported = !(pci_get_revid(pdev) & ENA_MMIO_DISABLE_REG_READ); ena_com_set_mmio_read_mode(ena_dev, readless_supported); rc = ena_com_dev_reset(ena_dev, ENA_REGS_RESET_NORMAL); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "Can not reset device\n"); goto err_mmio_read_less; } rc = ena_com_validate_version(ena_dev); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "device version is too low\n"); goto err_mmio_read_less; } dma_width = ena_com_get_dma_width(ena_dev); if (unlikely(dma_width < 0)) { ena_log(pdev, ERR, "Invalid dma width value %d", dma_width); rc = dma_width; goto err_mmio_read_less; } adapter->dma_width = dma_width; /* ENA admin level init */ rc = ena_com_admin_init(ena_dev, &aenq_handlers); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "Can not initialize ena admin queue with device\n"); goto err_mmio_read_less; } /* * To enable the msix interrupts the 
driver needs to know the number * of queues. So the driver uses polling mode to retrieve this * information */ ena_com_set_admin_polling_mode(ena_dev, true); ena_config_host_info(ena_dev, pdev); /* Get Device Attributes */ rc = ena_com_get_dev_attr_feat(ena_dev, get_feat_ctx); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "Cannot get attribute for ena device rc: %d\n", rc); goto err_admin_init; } aenq_groups = BIT(ENA_ADMIN_LINK_CHANGE) | BIT(ENA_ADMIN_FATAL_ERROR) | BIT(ENA_ADMIN_WARNING) | BIT(ENA_ADMIN_NOTIFICATION) | BIT(ENA_ADMIN_KEEP_ALIVE); aenq_groups &= get_feat_ctx->aenq.supported_groups; rc = ena_com_set_aenq_config(ena_dev, aenq_groups); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "Cannot configure aenq groups rc: %d\n", rc); goto err_admin_init; } *wd_active = !!(aenq_groups & BIT(ENA_ADMIN_KEEP_ALIVE)); set_default_llq_configurations(&llq_config, &get_feat_ctx->llq); rc = ena_set_queues_placement_policy(pdev, ena_dev, &get_feat_ctx->llq, &llq_config); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "Failed to set placement policy\n"); goto err_admin_init; } return (0); err_admin_init: ena_com_delete_host_info(ena_dev); ena_com_admin_destroy(ena_dev); err_mmio_read_less: ena_com_mmio_reg_read_request_destroy(ena_dev); return (rc); } static int ena_enable_msix_and_set_admin_interrupts(struct ena_adapter *adapter) { struct ena_com_dev *ena_dev = adapter->ena_dev; int rc; rc = ena_enable_msix(adapter); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "Error with MSI-X enablement\n"); return (rc); } ena_setup_mgmnt_intr(adapter); rc = ena_request_mgmnt_irq(adapter); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "Cannot setup mgmnt queue intr\n"); goto err_disable_msix; } ena_com_set_admin_polling_mode(ena_dev, false); ena_com_admin_aenq_enable(ena_dev); return (0); err_disable_msix: ena_disable_msix(adapter); return (rc); } /* Function called on ENA_ADMIN_KEEP_ALIVE event */ static void ena_keep_alive_wd(void *adapter_data, struct ena_admin_aenq_entry *aenq_e) { struct ena_adapter *adapter = (struct ena_adapter *)adapter_data; struct ena_admin_aenq_keep_alive_desc *desc; sbintime_t stime; uint64_t rx_drops; uint64_t tx_drops; desc = (struct ena_admin_aenq_keep_alive_desc *)aenq_e; rx_drops = ((uint64_t)desc->rx_drops_high << 32) | desc->rx_drops_low; tx_drops = ((uint64_t)desc->tx_drops_high << 32) | desc->tx_drops_low; counter_u64_zero(adapter->hw_stats.rx_drops); counter_u64_add(adapter->hw_stats.rx_drops, rx_drops); counter_u64_zero(adapter->hw_stats.tx_drops); counter_u64_add(adapter->hw_stats.tx_drops, tx_drops); stime = getsbinuptime(); atomic_store_rel_64(&adapter->keep_alive_timestamp, stime); } /* Check for keep alive expiration */ static void check_for_missing_keep_alive(struct ena_adapter *adapter) { sbintime_t timestamp, time; if (adapter->wd_active == 0) return; if (adapter->keep_alive_timeout == ENA_HW_HINTS_NO_TIMEOUT) return; timestamp = atomic_load_acq_64(&adapter->keep_alive_timestamp); time = getsbinuptime() - timestamp; if (unlikely(time > adapter->keep_alive_timeout)) { ena_log(adapter->pdev, ERR, "Keep alive watchdog timeout.\n"); counter_u64_add(adapter->dev_stats.wd_expired, 1); ena_trigger_reset(adapter, ENA_REGS_RESET_KEEP_ALIVE_TO); } } /* Check if admin queue is enabled */ static void check_for_admin_com_state(struct ena_adapter *adapter) { if (unlikely(ena_com_get_admin_running_state(adapter->ena_dev) == false)) { ena_log(adapter->pdev, ERR, "ENA admin queue is not in running state!\n"); counter_u64_add(adapter->dev_stats.admin_q_pause, 1); 
ena_trigger_reset(adapter, ENA_REGS_RESET_ADMIN_TO); } } static int check_for_rx_interrupt_queue(struct ena_adapter *adapter, struct ena_ring *rx_ring) { if (likely(atomic_load_8(&rx_ring->first_interrupt))) return (0); if (ena_com_cq_empty(rx_ring->ena_com_io_cq)) return (0); rx_ring->no_interrupt_event_cnt++; if (rx_ring->no_interrupt_event_cnt == ENA_MAX_NO_INTERRUPT_ITERATIONS) { ena_log(adapter->pdev, ERR, "Potential MSIX issue on Rx side Queue = %d. Reset the device\n", rx_ring->qid); ena_trigger_reset(adapter, ENA_REGS_RESET_MISS_INTERRUPT); return (EIO); } return (0); } static int check_missing_comp_in_tx_queue(struct ena_adapter *adapter, struct ena_ring *tx_ring) { device_t pdev = adapter->pdev; struct bintime curtime, time; struct ena_tx_buffer *tx_buf; int time_since_last_cleanup; int missing_tx_comp_to; sbintime_t time_offset; uint32_t missed_tx = 0; int i, rc = 0; getbinuptime(&curtime); for (i = 0; i < tx_ring->ring_size; i++) { tx_buf = &tx_ring->tx_buffer_info[i]; if (bintime_isset(&tx_buf->timestamp) == 0) continue; time = curtime; bintime_sub(&time, &tx_buf->timestamp); time_offset = bttosbt(time); if (unlikely(!atomic_load_8(&tx_ring->first_interrupt) && time_offset > 2 * adapter->missing_tx_timeout)) { /* * If after graceful period interrupt is still not * received, we schedule a reset. */ ena_log(pdev, ERR, "Potential MSIX issue on Tx side Queue = %d. " "Reset the device\n", tx_ring->qid); ena_trigger_reset(adapter, ENA_REGS_RESET_MISS_INTERRUPT); return (EIO); } /* Check again if packet is still waiting */ if (unlikely(time_offset > adapter->missing_tx_timeout)) { if (tx_buf->print_once) { time_since_last_cleanup = TICKS_2_MSEC(ticks - tx_ring->tx_last_cleanup_ticks); missing_tx_comp_to = sbttoms( adapter->missing_tx_timeout); ena_log(pdev, WARN, "Found a Tx that wasn't completed on time, qid %d, index %d. " "%d msecs have passed since last cleanup. Missing Tx timeout value %d msecs.\n", tx_ring->qid, i, time_since_last_cleanup, missing_tx_comp_to); } tx_buf->print_once = false; missed_tx++; } } if (unlikely(missed_tx > adapter->missing_tx_threshold)) { ena_log(pdev, ERR, "The number of lost tx completion is above the threshold " "(%d > %d). Reset the device\n", missed_tx, adapter->missing_tx_threshold); ena_trigger_reset(adapter, ENA_REGS_RESET_MISS_TX_CMPL); rc = EIO; } counter_u64_add(tx_ring->tx_stats.missing_tx_comp, missed_tx); return (rc); } /* * Check for TX which were not completed on time. * Timeout is defined by "missing_tx_timeout". * Reset will be performed if number of incompleted * transactions exceeds "missing_tx_threshold". 
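* The scan is budgeted by "missing_tx_max_queues" and resumes from "next_monitored_tx_qid" on the next timer service run, so all queues are checked incrementally.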
*/ static void check_for_missing_completions(struct ena_adapter *adapter) { struct ena_ring *tx_ring; struct ena_ring *rx_ring; int i, budget, rc; /* Make sure the driver doesn't turn the device in other process */ rmb(); if (!ENA_FLAG_ISSET(ENA_FLAG_DEV_UP, adapter)) return; if (ENA_FLAG_ISSET(ENA_FLAG_TRIGGER_RESET, adapter)) return; if (adapter->missing_tx_timeout == ENA_HW_HINTS_NO_TIMEOUT) return; budget = adapter->missing_tx_max_queues; for (i = adapter->next_monitored_tx_qid; i < adapter->num_io_queues; i++) { tx_ring = &adapter->tx_ring[i]; rx_ring = &adapter->rx_ring[i]; rc = check_missing_comp_in_tx_queue(adapter, tx_ring); if (unlikely(rc != 0)) return; rc = check_for_rx_interrupt_queue(adapter, rx_ring); if (unlikely(rc != 0)) return; budget--; if (budget == 0) { i++; break; } } adapter->next_monitored_tx_qid = i % adapter->num_io_queues; } /* trigger rx cleanup after 2 consecutive detections */ #define EMPTY_RX_REFILL 2 /* For the rare case where the device runs out of Rx descriptors and the * msix handler failed to refill new Rx descriptors (due to a lack of memory * for example). * This case will lead to a deadlock: * The device won't send interrupts since all the new Rx packets will be dropped * The msix handler won't allocate new Rx descriptors so the device won't be * able to send new packets. * * When such a situation is detected - execute rx cleanup task in another thread */ static void check_for_empty_rx_ring(struct ena_adapter *adapter) { struct ena_ring *rx_ring; int i, refill_required; if (!ENA_FLAG_ISSET(ENA_FLAG_DEV_UP, adapter)) return; if (ENA_FLAG_ISSET(ENA_FLAG_TRIGGER_RESET, adapter)) return; for (i = 0; i < adapter->num_io_queues; i++) { rx_ring = &adapter->rx_ring[i]; refill_required = ena_com_free_q_entries( rx_ring->ena_com_io_sq); if (unlikely(refill_required == (rx_ring->ring_size - 1))) { rx_ring->empty_rx_queue++; if (rx_ring->empty_rx_queue >= EMPTY_RX_REFILL) { counter_u64_add(rx_ring->rx_stats.empty_rx_ring, 1); ena_log(adapter->pdev, WARN, "Rx ring %d is stalled. Triggering the refill function\n", i); taskqueue_enqueue(rx_ring->que->cleanup_tq, &rx_ring->que->cleanup_task); rx_ring->empty_rx_queue = 0; } } else { rx_ring->empty_rx_queue = 0; } } } static void ena_update_hints(struct ena_adapter *adapter, struct ena_admin_ena_hw_hints *hints) { struct ena_com_dev *ena_dev = adapter->ena_dev; if (hints->admin_completion_tx_timeout) ena_dev->admin_queue.completion_timeout = hints->admin_completion_tx_timeout * 1000; if (hints->mmio_read_timeout) /* convert to usec */ ena_dev->mmio_read.reg_read_to = hints->mmio_read_timeout * 1000; if (hints->missed_tx_completion_count_threshold_to_reset) adapter->missing_tx_threshold = hints->missed_tx_completion_count_threshold_to_reset; if (hints->missing_tx_completion_timeout) { if (hints->missing_tx_completion_timeout == ENA_HW_HINTS_NO_TIMEOUT) adapter->missing_tx_timeout = ENA_HW_HINTS_NO_TIMEOUT; else adapter->missing_tx_timeout = SBT_1MS * hints->missing_tx_completion_timeout; } if (hints->driver_watchdog_timeout) { if (hints->driver_watchdog_timeout == ENA_HW_HINTS_NO_TIMEOUT) adapter->keep_alive_timeout = ENA_HW_HINTS_NO_TIMEOUT; else adapter->keep_alive_timeout = SBT_1MS * hints->driver_watchdog_timeout; } } /** * ena_copy_eni_metrics - Get and copy ENI metrics from the HW. * @adapter: ENA device adapter * * Returns 0 on success, EOPNOTSUPP if current HW doesn't support those metrics * and other error codes on failure. * * This function can possibly cause a race with other calls to the admin queue. 
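* (the metrics are fetched with an admin queue command, see ena_com_get_eni_stats()).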
* Because of that, the caller should either lock this function or make sure * that there is no race in the current context. */ static int ena_copy_eni_metrics(struct ena_adapter *adapter) { static bool print_once = true; int rc; rc = ena_com_get_eni_stats(adapter->ena_dev, &adapter->eni_metrics); if (rc != 0) { if (rc == ENA_COM_UNSUPPORTED) { if (print_once) { ena_log(adapter->pdev, WARN, "Retrieving ENI metrics is not supported.\n"); print_once = false; } else { ena_log(adapter->pdev, DBG, "Retrieving ENI metrics is not supported.\n"); } } else { ena_log(adapter->pdev, ERR, "Failed to get ENI metrics: %d\n", rc); } } return (rc); } static int ena_copy_srd_metrics(struct ena_adapter *adapter) { return ena_com_get_ena_srd_info(adapter->ena_dev, &adapter->ena_srd_info); } static int ena_copy_customer_metrics(struct ena_adapter *adapter) { struct ena_com_dev *dev; u32 supported_metrics_count; int rc, len; dev = adapter->ena_dev; supported_metrics_count = ena_com_get_customer_metric_count(dev); len = supported_metrics_count * sizeof(u64); /* Fill the data buffer */ rc = ena_com_get_customer_metrics(adapter->ena_dev, (char *)(adapter->customer_metrics_array), len); return (rc); } static void ena_timer_service(void *data) { struct ena_adapter *adapter = (struct ena_adapter *)data; struct ena_admin_host_info *host_info = adapter->ena_dev->host_attr.host_info; check_for_missing_keep_alive(adapter); check_for_admin_com_state(adapter); check_for_missing_completions(adapter); check_for_empty_rx_ring(adapter); /* * User controller update of the ENA metrics. * If the delay was set to 0, then the stats shouldn't be updated at * all. * Otherwise, wait 'metrics_sample_interval' seconds, before * updating stats. * As timer service is executed every second, it's enough to increment * appropriate counter each time the timer service is executed. */ if ((adapter->metrics_sample_interval != 0) && (++adapter->metrics_sample_interval_cnt >= adapter->metrics_sample_interval)) { taskqueue_enqueue(adapter->metrics_tq, &adapter->metrics_task); adapter->metrics_sample_interval_cnt = 0; } if (host_info != NULL) ena_update_host_info(host_info, adapter->ifp); if (unlikely(ENA_FLAG_ISSET(ENA_FLAG_TRIGGER_RESET, adapter))) { /* * Timeout when validating version indicates that the device * became unresponsive. If that happens skip the reset and * reschedule timer service, so the reset can be retried later. */ if (ena_com_validate_version(adapter->ena_dev) == ENA_COM_TIMER_EXPIRED) { ena_log(adapter->pdev, WARN, "FW unresponsive, skipping reset\n"); ENA_TIMER_RESET(adapter); return; } ena_log(adapter->pdev, WARN, "Trigger reset is on\n"); taskqueue_enqueue(adapter->reset_tq, &adapter->reset_task); return; } /* * Schedule another timeout one second from now. 
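* If a reset was scheduled above, the callout is re-armed by ena_restore_device() once the reset completes instead.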
*/ ENA_TIMER_RESET(adapter); } void ena_destroy_device(struct ena_adapter *adapter, bool graceful) { if_t ifp = adapter->ifp; struct ena_com_dev *ena_dev = adapter->ena_dev; bool dev_up; if (!ENA_FLAG_ISSET(ENA_FLAG_DEVICE_RUNNING, adapter)) return; if (!graceful) if_link_state_change(ifp, LINK_STATE_DOWN); ENA_TIMER_DRAIN(adapter); dev_up = ENA_FLAG_ISSET(ENA_FLAG_DEV_UP, adapter); if (dev_up) ENA_FLAG_SET_ATOMIC(ENA_FLAG_DEV_UP_BEFORE_RESET, adapter); if (!graceful) ena_com_set_admin_running_state(ena_dev, false); if (ENA_FLAG_ISSET(ENA_FLAG_DEV_UP, adapter)) ena_down(adapter); /* * Stop the device from sending AENQ events (if the device was up, and * the trigger reset was on, ena_down already performs device reset) */ if (!(ENA_FLAG_ISSET(ENA_FLAG_TRIGGER_RESET, adapter) && dev_up)) ena_com_dev_reset(adapter->ena_dev, adapter->reset_reason); ena_free_mgmnt_irq(adapter); ena_disable_msix(adapter); /* * IO rings resources should be freed because `ena_restore_device()` * calls (not directly) `ena_enable_msix()`, which re-allocates MSIX * vectors. The amount of MSIX vectors after destroy-restore may be * different than before. Therefore, IO rings resources should be * established from scratch each time. */ ena_free_all_io_rings_resources(adapter); ena_com_abort_admin_commands(ena_dev); ena_com_wait_for_abort_completion(ena_dev); ena_com_admin_destroy(ena_dev); ena_com_mmio_reg_read_request_destroy(ena_dev); adapter->reset_reason = ENA_REGS_RESET_NORMAL; ENA_FLAG_CLEAR_ATOMIC(ENA_FLAG_TRIGGER_RESET, adapter); ENA_FLAG_CLEAR_ATOMIC(ENA_FLAG_DEVICE_RUNNING, adapter); } static int ena_device_validate_params(struct ena_adapter *adapter, struct ena_com_dev_get_features_ctx *get_feat_ctx) { if (memcmp(get_feat_ctx->dev_attr.mac_addr, adapter->mac_addr, ETHER_ADDR_LEN) != 0) { ena_log(adapter->pdev, ERR, "Error, mac addresses differ\n"); return (EINVAL); } if (get_feat_ctx->dev_attr.max_mtu < if_getmtu(adapter->ifp)) { ena_log(adapter->pdev, ERR, "Error, device max mtu is smaller than ifp MTU\n"); return (EINVAL); } return 0; } int ena_restore_device(struct ena_adapter *adapter) { struct ena_com_dev_get_features_ctx get_feat_ctx; struct ena_com_dev *ena_dev = adapter->ena_dev; if_t ifp = adapter->ifp; device_t dev = adapter->pdev; int wd_active; int rc; ENA_FLAG_SET_ATOMIC(ENA_FLAG_ONGOING_RESET, adapter); rc = ena_device_init(adapter, dev, &get_feat_ctx, &wd_active); if (rc != 0) { ena_log(dev, ERR, "Cannot initialize device\n"); goto err; } /* * Only enable WD if it was enabled before reset, so it won't override * value set by the user by the sysctl. */ if (adapter->wd_active != 0) adapter->wd_active = wd_active; rc = ena_device_validate_params(adapter, &get_feat_ctx); if (rc != 0) { ena_log(dev, ERR, "Validation of device parameters failed\n"); goto err_device_destroy; } ENA_FLAG_CLEAR_ATOMIC(ENA_FLAG_ONGOING_RESET, adapter); /* Make sure we don't have a race with AENQ Links state handler */ if (ENA_FLAG_ISSET(ENA_FLAG_LINK_UP, adapter)) if_link_state_change(ifp, LINK_STATE_UP); rc = ena_enable_msix_and_set_admin_interrupts(adapter); if (rc != 0) { ena_log(dev, ERR, "Enable MSI-X failed\n"); goto err_device_destroy; } /* * Effective value of used MSIX vectors should be the same as before * `ena_destroy_device()`, if possible, or closest to it if less vectors * are available. 
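* ENA_ADMIN_MSIX_VEC accounts for the management interrupt; the remaining vectors map 1:1 to Tx/Rx queue pairs, so the number of IO queues may have to shrink here.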
*/ if ((adapter->msix_vecs - ENA_ADMIN_MSIX_VEC) < adapter->num_io_queues) adapter->num_io_queues = adapter->msix_vecs - ENA_ADMIN_MSIX_VEC; /* Re-initialize rings basic information */ ena_init_io_rings(adapter); /* If the interface was up before the reset bring it up */ if (ENA_FLAG_ISSET(ENA_FLAG_DEV_UP_BEFORE_RESET, adapter)) { rc = ena_up(adapter); if (rc != 0) { ena_log(dev, ERR, "Failed to create I/O queues\n"); goto err_disable_msix; } } /* Indicate that device is running again and ready to work */ ENA_FLAG_SET_ATOMIC(ENA_FLAG_DEVICE_RUNNING, adapter); /* * As the AENQ handlers weren't executed during reset because * the flag ENA_FLAG_DEVICE_RUNNING was turned off, the * timestamp must be updated again That will prevent next reset * caused by missing keep alive. */ adapter->keep_alive_timestamp = getsbinuptime(); ENA_TIMER_RESET(adapter); ENA_FLAG_CLEAR_ATOMIC(ENA_FLAG_DEV_UP_BEFORE_RESET, adapter); return (rc); err_disable_msix: ena_free_mgmnt_irq(adapter); ena_disable_msix(adapter); err_device_destroy: ena_com_abort_admin_commands(ena_dev); ena_com_wait_for_abort_completion(ena_dev); ena_com_admin_destroy(ena_dev); ena_com_dev_reset(ena_dev, ENA_REGS_RESET_DRIVER_INVALID_STATE); ena_com_mmio_reg_read_request_destroy(ena_dev); err: ENA_FLAG_CLEAR_ATOMIC(ENA_FLAG_DEVICE_RUNNING, adapter); ENA_FLAG_CLEAR_ATOMIC(ENA_FLAG_ONGOING_RESET, adapter); ena_log(dev, ERR, "Reset attempt failed. Can not reset the device\n"); return (rc); } static void ena_metrics_task(void *arg, int pending) { struct ena_adapter *adapter = (struct ena_adapter *)arg; ENA_LOCK_LOCK(); if (ena_com_get_cap(adapter->ena_dev, ENA_ADMIN_CUSTOMER_METRICS)) (void)ena_copy_customer_metrics(adapter); else if (ena_com_get_cap(adapter->ena_dev, ENA_ADMIN_ENI_STATS)) (void)ena_copy_eni_metrics(adapter); if (ena_com_get_cap(adapter->ena_dev, ENA_ADMIN_ENA_SRD_INFO)) (void)ena_copy_srd_metrics(adapter); ENA_LOCK_UNLOCK(); } static void ena_reset_task(void *arg, int pending) { struct ena_adapter *adapter = (struct ena_adapter *)arg; ENA_LOCK_LOCK(); if (likely(ENA_FLAG_ISSET(ENA_FLAG_TRIGGER_RESET, adapter))) { ena_destroy_device(adapter, false); ena_restore_device(adapter); ena_log(adapter->pdev, INFO, "Device reset completed successfully, Driver info: %s\n", ena_version); } ENA_LOCK_UNLOCK(); } static void ena_free_stats(struct ena_adapter *adapter) { ena_free_counters((counter_u64_t *)&adapter->hw_stats, sizeof(struct ena_hw_stats)); ena_free_counters((counter_u64_t *)&adapter->dev_stats, sizeof(struct ena_stats_dev)); } /** * ena_attach - Device Initialization Routine * @pdev: device information struct * * Returns 0 on success, otherwise on failure. * * ena_attach initializes an adapter identified by a device structure. * The OS initialization, configuring of the adapter private structure, * and a hardware reset occur. **/ static int ena_attach(device_t pdev) { struct ena_com_dev_get_features_ctx get_feat_ctx; struct ena_calc_queue_size_ctx calc_queue_ctx = { 0 }; static int version_printed; struct ena_adapter *adapter; struct ena_com_dev *ena_dev = NULL; uint32_t max_num_io_queues; int msix_rid; int rid, rc; adapter = device_get_softc(pdev); adapter->pdev = pdev; adapter->first_bind = -1; /* * Set up the timer service - driver is responsible for avoiding * concurrency, as the callout won't be using any locking inside. 
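* The callout is drained with ENA_TIMER_DRAIN() before the device is destroyed or detached, see ena_destroy_device() and ena_detach().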
*/ ENA_TIMER_INIT(adapter); adapter->keep_alive_timeout = ENA_DEFAULT_KEEP_ALIVE_TO; adapter->missing_tx_timeout = ENA_DEFAULT_TX_CMP_TO; adapter->missing_tx_max_queues = ENA_DEFAULT_TX_MONITORED_QUEUES; adapter->missing_tx_threshold = ENA_DEFAULT_TX_CMP_THRESHOLD; adapter->irq_cpu_base = ENA_BASE_CPU_UNSPECIFIED; adapter->irq_cpu_stride = 0; #ifdef RSS adapter->rss_enabled = 1; #endif if (version_printed++ == 0) ena_log(pdev, INFO, "%s\n", ena_version); /* Allocate memory for ena_dev structure */ ena_dev = malloc(sizeof(struct ena_com_dev), M_DEVBUF, M_WAITOK | M_ZERO); adapter->ena_dev = ena_dev; ena_dev->dmadev = pdev; rid = PCIR_BAR(ENA_REG_BAR); adapter->memory = NULL; adapter->registers = bus_alloc_resource_any(pdev, SYS_RES_MEMORY, &rid, RF_ACTIVE); if (unlikely(adapter->registers == NULL)) { ena_log(pdev, ERR, "unable to allocate bus resource: registers!\n"); rc = ENOMEM; goto err_dev_free; } /* MSIx vector table may reside on BAR0 with registers or on BAR1. */ msix_rid = pci_msix_table_bar(pdev); if (msix_rid != rid) { adapter->msix = bus_alloc_resource_any(pdev, SYS_RES_MEMORY, &msix_rid, RF_ACTIVE); if (unlikely(adapter->msix == NULL)) { ena_log(pdev, ERR, "unable to allocate bus resource: msix!\n"); rc = ENOMEM; goto err_pci_free; } adapter->msix_rid = msix_rid; } ena_dev->bus = malloc(sizeof(struct ena_bus), M_DEVBUF, M_WAITOK | M_ZERO); /* Store register resources */ ((struct ena_bus *)(ena_dev->bus))->reg_bar_t = rman_get_bustag( adapter->registers); ((struct ena_bus *)(ena_dev->bus))->reg_bar_h = rman_get_bushandle( adapter->registers); if (unlikely(((struct ena_bus *)(ena_dev->bus))->reg_bar_h == 0)) { ena_log(pdev, ERR, "failed to pmap registers bar\n"); rc = ENXIO; goto err_bus_free; } rc = ena_map_llq_mem_bar(pdev, ena_dev); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "Failed to map ENA mem bar"); goto err_bus_free; } /* Initially clear all the flags */ ENA_FLAG_ZERO(adapter); /* Device initialization */ rc = ena_device_init(adapter, pdev, &get_feat_ctx, &adapter->wd_active); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "ENA device init failed! 
(err: %d)\n", rc); rc = ENXIO; goto err_bus_free; } if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV) adapter->disable_meta_caching = !!( get_feat_ctx.llq.accel_mode.u.get.supported_flags & BIT(ENA_ADMIN_DISABLE_META_CACHING)); adapter->keep_alive_timestamp = getsbinuptime(); adapter->tx_offload_cap = get_feat_ctx.offload.tx; memcpy(adapter->mac_addr, get_feat_ctx.dev_attr.mac_addr, ETHER_ADDR_LEN); calc_queue_ctx.pdev = pdev; calc_queue_ctx.ena_dev = ena_dev; calc_queue_ctx.get_feat_ctx = &get_feat_ctx; /* Calculate initial and maximum IO queue number and size */ max_num_io_queues = ena_calc_max_io_queue_num(pdev, ena_dev, &get_feat_ctx); rc = ena_calc_io_queue_size(&calc_queue_ctx); if (unlikely((rc != 0) || (max_num_io_queues <= 0))) { rc = EFAULT; goto err_com_free; } adapter->requested_tx_ring_size = calc_queue_ctx.tx_queue_size; adapter->requested_rx_ring_size = calc_queue_ctx.rx_queue_size; adapter->max_tx_ring_size = calc_queue_ctx.max_tx_queue_size; adapter->max_rx_ring_size = calc_queue_ctx.max_rx_queue_size; adapter->max_tx_sgl_size = calc_queue_ctx.max_tx_sgl_size; adapter->max_rx_sgl_size = calc_queue_ctx.max_rx_sgl_size; adapter->max_num_io_queues = max_num_io_queues; adapter->buf_ring_size = ENA_DEFAULT_BUF_RING_SIZE; adapter->max_mtu = get_feat_ctx.dev_attr.max_mtu; adapter->reset_reason = ENA_REGS_RESET_NORMAL; /* set up dma tags for rx and tx buffers */ rc = ena_setup_tx_dma_tag(adapter); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "Failed to create TX DMA tag\n"); goto err_com_free; } rc = ena_setup_rx_dma_tag(adapter); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "Failed to create RX DMA tag\n"); goto err_tx_tag_free; } /* * The amount of requested MSIX vectors is equal to * adapter::max_num_io_queues (see `ena_enable_msix()`), plus a constant * number of admin queue interrupts. The former is initially determined * by HW capabilities (see `ena_calc_max_io_queue_num())` but may not be * achieved if there are not enough system resources. By default, the * number of effectively used IO queues is the same but later on it can * be limited by the user using sysctl interface. 
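* Run-time reductions of the used IO queue count are applied with ena_update_io_queue_nb().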
*/ rc = ena_enable_msix_and_set_admin_interrupts(adapter); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "Failed to enable and set the admin interrupts\n"); goto err_io_free; } /* By default all of allocated MSIX vectors are actively used */ adapter->num_io_queues = adapter->msix_vecs - ENA_ADMIN_MSIX_VEC; /* initialize rings basic information */ ena_init_io_rings(adapter); rc = ena_com_allocate_customer_metrics_buffer(ena_dev); if (rc) { ena_log(pdev, ERR, "Failed to allocate customer metrics buffer.\n"); goto err_msix_free; } rc = ena_sysctl_allocate_customer_metrics_buffer(adapter); if (unlikely(rc)){ ena_log(pdev, ERR, "Failed to allocate sysctl customer metrics buffer.\n"); goto err_metrics_buffer_destroy; } /* Initialize statistics */ ena_alloc_counters((counter_u64_t *)&adapter->dev_stats, sizeof(struct ena_stats_dev)); ena_alloc_counters((counter_u64_t *)&adapter->hw_stats, sizeof(struct ena_hw_stats)); ena_sysctl_add_nodes(adapter); /* setup network interface */ rc = ena_setup_ifnet(pdev, adapter, &get_feat_ctx); if (unlikely(rc != 0)) { ena_log(pdev, ERR, "Error with network interface setup\n"); goto err_customer_metrics_alloc; } /* Initialize reset task queue */ TASK_INIT(&adapter->reset_task, 0, ena_reset_task, adapter); adapter->reset_tq = taskqueue_create("ena_reset_enqueue", M_WAITOK | M_ZERO, taskqueue_thread_enqueue, &adapter->reset_tq); taskqueue_start_threads(&adapter->reset_tq, 1, PI_NET, "%s rstq", device_get_nameunit(adapter->pdev)); /* Initialize metrics task queue */ TASK_INIT(&adapter->metrics_task, 0, ena_metrics_task, adapter); adapter->metrics_tq = taskqueue_create("ena_metrics_enqueue", M_WAITOK | M_ZERO, taskqueue_thread_enqueue, &adapter->metrics_tq); taskqueue_start_threads(&adapter->metrics_tq, 1, PI_NET, "%s metricsq", device_get_nameunit(adapter->pdev)); #ifdef DEV_NETMAP rc = ena_netmap_attach(adapter); if (rc != 0) { ena_log(pdev, ERR, "netmap attach failed: %d\n", rc); goto err_detach; } #endif /* DEV_NETMAP */ /* Tell the stack that the interface is not active */ if_setdrvflagbits(adapter->ifp, IFF_DRV_OACTIVE, IFF_DRV_RUNNING); ENA_FLAG_SET_ATOMIC(ENA_FLAG_DEVICE_RUNNING, adapter); /* Run the timer service */ ENA_TIMER_RESET(adapter); return (0); #ifdef DEV_NETMAP err_detach: ether_ifdetach(adapter->ifp); #endif /* DEV_NETMAP */ err_customer_metrics_alloc: free(adapter->customer_metrics_array, M_DEVBUF); err_metrics_buffer_destroy: ena_com_delete_customer_metrics_buffer(ena_dev); err_msix_free: ena_free_stats(adapter); ena_com_dev_reset(adapter->ena_dev, ENA_REGS_RESET_INIT_ERR); ena_free_mgmnt_irq(adapter); ena_disable_msix(adapter); err_io_free: ena_free_all_io_rings_resources(adapter); ena_free_rx_dma_tag(adapter); err_tx_tag_free: ena_free_tx_dma_tag(adapter); err_com_free: ena_com_admin_destroy(ena_dev); ena_com_delete_host_info(ena_dev); ena_com_mmio_reg_read_request_destroy(ena_dev); err_bus_free: free(ena_dev->bus, M_DEVBUF); err_pci_free: ena_free_pci_resources(adapter); err_dev_free: free(ena_dev, M_DEVBUF); return (rc); } /** * ena_detach - Device Removal Routine * @pdev: device information struct * * ena_detach is called by the device subsystem to alert the driver * that it should release a PCI device. 
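 * The teardown first drains the timer service and the metrics/reset task
 * queues, then brings the interface down and releases the DMA tags, IRQs
 * and PCI resources.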
**/ static int ena_detach(device_t pdev) { struct ena_adapter *adapter = device_get_softc(pdev); struct ena_com_dev *ena_dev = adapter->ena_dev; int rc; /* Make sure VLANS are not using driver */ if (if_vlantrunkinuse(adapter->ifp)) { ena_log(adapter->pdev, ERR, "VLAN is in use, detach first\n"); return (EBUSY); } ether_ifdetach(adapter->ifp); /* Stop timer service */ ENA_LOCK_LOCK(); ENA_TIMER_DRAIN(adapter); ENA_LOCK_UNLOCK(); /* Release metrics task */ while (taskqueue_cancel(adapter->metrics_tq, &adapter->metrics_task, NULL)) taskqueue_drain(adapter->metrics_tq, &adapter->metrics_task); taskqueue_free(adapter->metrics_tq); /* Release reset task */ while (taskqueue_cancel(adapter->reset_tq, &adapter->reset_task, NULL)) taskqueue_drain(adapter->reset_tq, &adapter->reset_task); taskqueue_free(adapter->reset_tq); ENA_LOCK_LOCK(); ena_down(adapter); ena_destroy_device(adapter, true); ENA_LOCK_UNLOCK(); /* Restore unregistered sysctl queue nodes. */ ena_sysctl_update_queue_node_nb(adapter, adapter->num_io_queues, adapter->max_num_io_queues); #ifdef DEV_NETMAP netmap_detach(adapter->ifp); #endif /* DEV_NETMAP */ ena_free_stats(adapter); rc = ena_free_rx_dma_tag(adapter); if (unlikely(rc != 0)) ena_log(adapter->pdev, WARN, "Unmapped RX DMA tag associations\n"); rc = ena_free_tx_dma_tag(adapter); if (unlikely(rc != 0)) ena_log(adapter->pdev, WARN, "Unmapped TX DMA tag associations\n"); ena_free_irqs(adapter); ena_free_pci_resources(adapter); if (adapter->rss_indir != NULL) free(adapter->rss_indir, M_DEVBUF); if (likely(ENA_FLAG_ISSET(ENA_FLAG_RSS_ACTIVE, adapter))) ena_com_rss_destroy(ena_dev); ena_com_delete_host_info(ena_dev); free(adapter->customer_metrics_array, M_DEVBUF); ena_com_delete_customer_metrics_buffer(ena_dev); if_free(adapter->ifp); free(ena_dev->bus, M_DEVBUF); free(ena_dev, M_DEVBUF); return (bus_generic_detach(pdev)); } /****************************************************************************** ******************************** AENQ Handlers ******************************* *****************************************************************************/ /** * ena_update_on_link_change: * Notify the network interface about the change in link status **/ static void ena_update_on_link_change(void *adapter_data, struct ena_admin_aenq_entry *aenq_e) { struct ena_adapter *adapter = (struct ena_adapter *)adapter_data; struct ena_admin_aenq_link_change_desc *aenq_desc; int status; if_t ifp; aenq_desc = (struct ena_admin_aenq_link_change_desc *)aenq_e; ifp = adapter->ifp; status = aenq_desc->flags & ENA_ADMIN_AENQ_LINK_CHANGE_DESC_LINK_STATUS_MASK; if (status != 0) { ena_log(adapter->pdev, INFO, "link is UP\n"); ENA_FLAG_SET_ATOMIC(ENA_FLAG_LINK_UP, adapter); if (!ENA_FLAG_ISSET(ENA_FLAG_ONGOING_RESET, adapter)) if_link_state_change(ifp, LINK_STATE_UP); } else { ena_log(adapter->pdev, INFO, "link is DOWN\n"); if_link_state_change(ifp, LINK_STATE_DOWN); ENA_FLAG_CLEAR_ATOMIC(ENA_FLAG_LINK_UP, adapter); } } static void ena_notification(void *adapter_data, struct ena_admin_aenq_entry *aenq_e) { struct ena_adapter *adapter = (struct ena_adapter *)adapter_data; struct ena_admin_ena_hw_hints *hints; ENA_WARN(aenq_e->aenq_common_desc.group != ENA_ADMIN_NOTIFICATION, adapter->ena_dev, "Invalid group(%x) expected %x\n", aenq_e->aenq_common_desc.group, ENA_ADMIN_NOTIFICATION); switch (aenq_e->aenq_common_desc.syndrome) { case ENA_ADMIN_UPDATE_HINTS: hints = (struct ena_admin_ena_hw_hints *)(&aenq_e->inline_data_w4); ena_update_hints(adapter, hints); break; default: ena_log(adapter->pdev, ERR, 
"Invalid aenq notification link state %d\n", aenq_e->aenq_common_desc.syndrome); } } static void ena_lock_init(void *arg) { ENA_LOCK_INIT(); } SYSINIT(ena_lock_init, SI_SUB_LOCK, SI_ORDER_FIRST, ena_lock_init, NULL); static void ena_lock_uninit(void *arg) { ENA_LOCK_DESTROY(); } SYSUNINIT(ena_lock_uninit, SI_SUB_LOCK, SI_ORDER_FIRST, ena_lock_uninit, NULL); /** * This handler will called for unknown event group or unimplemented handlers **/ static void unimplemented_aenq_handler(void *adapter_data, struct ena_admin_aenq_entry *aenq_e) { struct ena_adapter *adapter = (struct ena_adapter *)adapter_data; ena_log(adapter->pdev, ERR, "Unknown event was received or event with unimplemented handler\n"); } static struct ena_aenq_handlers aenq_handlers = { .handlers = { [ENA_ADMIN_LINK_CHANGE] = ena_update_on_link_change, [ENA_ADMIN_NOTIFICATION] = ena_notification, [ENA_ADMIN_KEEP_ALIVE] = ena_keep_alive_wd, }, .unimplemented_handler = unimplemented_aenq_handler }; /********************************************************************* * FreeBSD Device Interface Entry Points *********************************************************************/ static device_method_t ena_methods[] = { /* Device interface */ DEVMETHOD(device_probe, ena_probe), DEVMETHOD(device_attach, ena_attach), DEVMETHOD(device_detach, ena_detach), DEVMETHOD_END }; static driver_t ena_driver = { "ena", ena_methods, sizeof(struct ena_adapter), }; DRIVER_MODULE(ena, pci, ena_driver, 0, 0); MODULE_PNP_INFO("U16:vendor;U16:device", pci, ena, ena_vendor_info_array, nitems(ena_vendor_info_array) - 1); MODULE_DEPEND(ena, pci, 1, 1, 1); MODULE_DEPEND(ena, ether, 1, 1, 1); #ifdef DEV_NETMAP MODULE_DEPEND(ena, netmap, 1, 1, 1); #endif /* DEV_NETMAP */ /*********************************************************************/ diff --git a/sys/dev/ena/ena.h b/sys/dev/ena/ena.h index 5647fe973502..2c758d37622f 100644 --- a/sys/dev/ena/ena.h +++ b/sys/dev/ena/ena.h @@ -1,563 +1,563 @@ /*- * SPDX-License-Identifier: BSD-2-Clause * - * Copyright (c) 2015-2020 Amazon.com, Inc. or its affiliates. + * Copyright (c) 2015-2023 Amazon.com, Inc. or its affiliates. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
* */ #ifndef ENA_H #define ENA_H #include "opt_rss.h" #include "ena-com/ena_com.h" #include "ena-com/ena_eth_com.h" #define ENA_DRV_MODULE_VER_MAJOR 2 #define ENA_DRV_MODULE_VER_MINOR 6 #define ENA_DRV_MODULE_VER_SUBMINOR 3 #define ENA_DRV_MODULE_NAME "ena" #ifndef ENA_DRV_MODULE_VERSION #define ENA_DRV_MODULE_VERSION \ __XSTRING(ENA_DRV_MODULE_VER_MAJOR) "." \ __XSTRING(ENA_DRV_MODULE_VER_MINOR) "." \ __XSTRING(ENA_DRV_MODULE_VER_SUBMINOR) #endif #define ENA_DEVICE_NAME "Elastic Network Adapter (ENA)" #define ENA_DEVICE_DESC "ENA adapter" /* Calculate DMA mask - width for ena cannot exceed 48, so it is safe */ #define ENA_DMA_BIT_MASK(x) ((1ULL << (x)) - 1ULL) /* 1 for AENQ + ADMIN */ #define ENA_ADMIN_MSIX_VEC 1 #define ENA_MAX_MSIX_VEC(io_queues) (ENA_ADMIN_MSIX_VEC + (io_queues)) #define ENA_REG_BAR 0 #define ENA_MEM_BAR 2 #define ENA_BUS_DMA_SEGS 32 #define ENA_DEFAULT_BUF_RING_SIZE 4096 #define ENA_DEFAULT_RING_SIZE 1024 #define ENA_MIN_RING_SIZE 256 #define ENA_BASE_CPU_UNSPECIFIED -1 /* * Refill Rx queue when number of required descriptors is above * QUEUE_SIZE / ENA_RX_REFILL_THRESH_DIVIDER or ENA_RX_REFILL_THRESH_PACKET */ #define ENA_RX_REFILL_THRESH_DIVIDER 8 #define ENA_RX_REFILL_THRESH_PACKET 256 #define ENA_IRQNAME_SIZE 40 #define ENA_PKT_MAX_BUFS 19 #define ENA_RX_RSS_TABLE_LOG_SIZE 7 #define ENA_RX_RSS_TABLE_SIZE (1 << ENA_RX_RSS_TABLE_LOG_SIZE) #define ENA_HASH_KEY_SIZE 40 #define ENA_MAX_FRAME_LEN 10000 #define ENA_MIN_FRAME_LEN 60 #define ENA_TX_RESUME_THRESH (ENA_PKT_MAX_BUFS + 2) #define ENA_DB_THRESHOLD 64 #define ENA_TX_COMMIT 32 /* * TX budget for cleaning. It should be half of the RX budget to reduce amount * of TCP retransmissions. */ #define ENA_TX_BUDGET 128 /* RX cleanup budget. -1 stands for infinity. */ #define ENA_RX_BUDGET 256 /* * How many times we can repeat cleanup in the io irq handling routine if the * RX or TX budget was depleted. */ #define ENA_CLEAN_BUDGET 8 #define ENA_RX_IRQ_INTERVAL 20 #define ENA_TX_IRQ_INTERVAL 50 #define ENA_MIN_MTU 128 #define ENA_TSO_MAXSIZE 65536 #define ENA_MMIO_DISABLE_REG_READ BIT(0) #define ENA_TX_RING_IDX_NEXT(idx, ring_size) (((idx) + 1) & ((ring_size) - 1)) #define ENA_RX_RING_IDX_NEXT(idx, ring_size) (((idx) + 1) & ((ring_size) - 1)) #define ENA_IO_TXQ_IDX(q) (2 * (q)) #define ENA_IO_RXQ_IDX(q) (2 * (q) + 1) #define ENA_IO_TXQ_IDX_TO_COMBINED_IDX(q) ((q) / 2) #define ENA_IO_RXQ_IDX_TO_COMBINED_IDX(q) (((q) - 1) / 2) #define ENA_MGMNT_IRQ_IDX 0 #define ENA_IO_IRQ_FIRST_IDX 1 #define ENA_IO_IRQ_IDX(q) (ENA_IO_IRQ_FIRST_IDX + (q)) #define ENA_MAX_NO_INTERRUPT_ITERATIONS 3 /* * ENA device should send keep alive msg every 1 sec. * We wait for 6 sec just to be on the safe side. */ #define ENA_DEFAULT_KEEP_ALIVE_TO (SBT_1S * 6) /* Time in jiffies before concluding the transmitter is hung. 
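 * (On FreeBSD the timeout is kept as an sbintime_t; ENA_DEFAULT_TX_CMP_TO
 * below corresponds to 5 seconds.)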
*/ #define ENA_DEFAULT_TX_CMP_TO (SBT_1S * 5) /* Number of queues to check for missing queues per timer tick */ #define ENA_DEFAULT_TX_MONITORED_QUEUES (4) /* Max number of timeouted packets before device reset */ #define ENA_DEFAULT_TX_CMP_THRESHOLD (128) /* * Supported PCI vendor and devices IDs */ #define PCI_VENDOR_ID_AMAZON 0x1d0f #define PCI_DEV_ID_ENA_PF 0x0ec2 #define PCI_DEV_ID_ENA_PF_RSERV0 0x1ec2 #define PCI_DEV_ID_ENA_VF 0xec20 #define PCI_DEV_ID_ENA_VF_RSERV0 0xec21 /* * Flags indicating current ENA driver state */ enum ena_flags_t { ENA_FLAG_DEVICE_RUNNING, ENA_FLAG_DEV_UP, ENA_FLAG_LINK_UP, ENA_FLAG_MSIX_ENABLED, ENA_FLAG_TRIGGER_RESET, ENA_FLAG_ONGOING_RESET, ENA_FLAG_DEV_UP_BEFORE_RESET, ENA_FLAG_RSS_ACTIVE, ENA_FLAGS_NUMBER = ENA_FLAG_RSS_ACTIVE }; BITSET_DEFINE(_ena_state, ENA_FLAGS_NUMBER); typedef struct _ena_state ena_state_t; #define ENA_FLAG_ZERO(adapter) \ BIT_ZERO(ENA_FLAGS_NUMBER, &(adapter)->flags) #define ENA_FLAG_ISSET(bit, adapter) \ BIT_ISSET(ENA_FLAGS_NUMBER, (bit), &(adapter)->flags) #define ENA_FLAG_SET_ATOMIC(bit, adapter) \ BIT_SET_ATOMIC(ENA_FLAGS_NUMBER, (bit), &(adapter)->flags) #define ENA_FLAG_CLEAR_ATOMIC(bit, adapter) \ BIT_CLR_ATOMIC(ENA_FLAGS_NUMBER, (bit), &(adapter)->flags) struct msix_entry { int entry; int vector; }; typedef struct _ena_vendor_info_t { uint16_t vendor_id; uint16_t device_id; unsigned int index; } ena_vendor_info_t; struct ena_irq { /* Interrupt resources */ struct resource *res; driver_filter_t *handler; void *data; void *cookie; unsigned int vector; bool requested; int cpu; char name[ENA_IRQNAME_SIZE]; }; struct ena_que { struct ena_adapter *adapter; struct ena_ring *tx_ring; struct ena_ring *rx_ring; struct task cleanup_task; struct taskqueue *cleanup_tq; uint32_t id; int cpu; cpuset_t cpu_mask; int domain; struct sysctl_oid *oid; }; struct ena_calc_queue_size_ctx { struct ena_com_dev_get_features_ctx *get_feat_ctx; struct ena_com_dev *ena_dev; device_t pdev; uint32_t tx_queue_size; uint32_t rx_queue_size; uint32_t max_tx_queue_size; uint32_t max_rx_queue_size; uint16_t max_tx_sgl_size; uint16_t max_rx_sgl_size; }; #ifdef DEV_NETMAP struct ena_netmap_tx_info { uint32_t socket_buf_idx[ENA_PKT_MAX_BUFS]; bus_dmamap_t map_seg[ENA_PKT_MAX_BUFS]; unsigned int sockets_used; }; #endif struct ena_tx_buffer { struct mbuf *mbuf; /* # of ena desc for this specific mbuf * (includes data desc and metadata desc) */ unsigned int tx_descs; /* # of buffers used by this mbuf */ unsigned int num_of_bufs; bus_dmamap_t dmamap; /* Used to detect missing tx packets */ struct bintime timestamp; bool print_once; #ifdef DEV_NETMAP struct ena_netmap_tx_info nm_info; #endif /* DEV_NETMAP */ struct ena_com_buf bufs[ENA_PKT_MAX_BUFS]; } __aligned(CACHE_LINE_SIZE); struct ena_rx_buffer { struct mbuf *mbuf; bus_dmamap_t map; struct ena_com_buf ena_buf; #ifdef DEV_NETMAP uint32_t netmap_buf_idx; #endif /* DEV_NETMAP */ } __aligned(CACHE_LINE_SIZE); struct ena_stats_tx { counter_u64_t cnt; counter_u64_t bytes; counter_u64_t prepare_ctx_err; counter_u64_t dma_mapping_err; counter_u64_t doorbells; counter_u64_t missing_tx_comp; counter_u64_t bad_req_id; counter_u64_t collapse; counter_u64_t collapse_err; counter_u64_t queue_wakeup; counter_u64_t queue_stop; counter_u64_t llq_buffer_copy; counter_u64_t unmask_interrupt_num; }; struct ena_stats_rx { counter_u64_t cnt; counter_u64_t bytes; counter_u64_t refil_partial; counter_u64_t csum_bad; counter_u64_t mjum_alloc_fail; counter_u64_t mbuf_alloc_fail; counter_u64_t dma_mapping_err; counter_u64_t bad_desc_num; 
counter_u64_t bad_req_id; counter_u64_t empty_rx_ring; counter_u64_t csum_good; }; struct ena_ring { /* Holds the empty requests for TX/RX out of order completions */ union { uint16_t *free_tx_ids; uint16_t *free_rx_ids; }; struct ena_com_dev *ena_dev; struct ena_adapter *adapter; struct ena_com_io_cq *ena_com_io_cq; struct ena_com_io_sq *ena_com_io_sq; uint16_t qid; /* Determines if device will use LLQ or normal mode for TX */ enum ena_admin_placement_policy_type tx_mem_queue_type; union { /* The maximum length the driver can push to the device (For LLQ) */ uint8_t tx_max_header_size; /* The maximum (and default) mbuf size for the Rx descriptor. */ uint16_t rx_mbuf_sz; }; uint8_t first_interrupt; uint16_t no_interrupt_event_cnt; struct ena_com_rx_buf_info ena_bufs[ENA_PKT_MAX_BUFS]; struct ena_que *que; struct lro_ctrl lro; uint16_t next_to_use; uint16_t next_to_clean; union { struct ena_tx_buffer *tx_buffer_info; /* contex of tx packet */ struct ena_rx_buffer *rx_buffer_info; /* contex of rx packet */ }; int ring_size; /* number of tx/rx_buffer_info's entries */ struct buf_ring *br; /* only for TX */ uint32_t buf_ring_size; struct mtx ring_mtx; char mtx_name[16]; struct { struct task enqueue_task; struct taskqueue *enqueue_tq; }; union { struct ena_stats_tx tx_stats; struct ena_stats_rx rx_stats; }; union { int empty_rx_queue; /* For Tx ring to indicate if it's running or not */ bool running; }; /* How many packets are sent in one Tx loop, used for doorbells */ uint32_t acum_pkts; /* Used for LLQ */ uint8_t *push_buf_intermediate_buf; int tx_last_cleanup_ticks; #ifdef DEV_NETMAP bool initialized; #endif /* DEV_NETMAP */ } __aligned(CACHE_LINE_SIZE); struct ena_stats_dev { counter_u64_t wd_expired; counter_u64_t interface_up; counter_u64_t interface_down; counter_u64_t admin_q_pause; }; struct ena_hw_stats { counter_u64_t rx_packets; counter_u64_t tx_packets; counter_u64_t rx_bytes; counter_u64_t tx_bytes; counter_u64_t rx_drops; counter_u64_t tx_drops; }; /* Board specific private data structure */ struct ena_adapter { struct ena_com_dev *ena_dev; /* OS defined structs */ if_t ifp; device_t pdev; struct ifmedia media; /* OS resources */ struct resource *memory; struct resource *registers; struct resource *msix; int msix_rid; /* MSI-X */ struct msix_entry *msix_entries; int msix_vecs; /* DMA tags used throughout the driver adapter for Tx and Rx */ bus_dma_tag_t tx_buf_tag; bus_dma_tag_t rx_buf_tag; int dma_width; uint32_t max_mtu; uint32_t num_io_queues; uint32_t max_num_io_queues; uint32_t requested_tx_ring_size; uint32_t requested_rx_ring_size; uint32_t max_tx_ring_size; uint32_t max_rx_ring_size; uint16_t max_tx_sgl_size; uint16_t max_rx_sgl_size; uint32_t tx_offload_cap; uint32_t buf_ring_size; /* RSS*/ int first_bind; struct ena_indir *rss_indir; uint8_t mac_addr[ETHER_ADDR_LEN]; /* mdio and phy*/ ena_state_t flags; /* IRQ CPU affinity */ int irq_cpu_base; uint32_t irq_cpu_stride; uint8_t rss_enabled; /* Queue will represent one TX and one RX ring */ struct ena_que que[ENA_MAX_NUM_IO_QUEUES] __aligned(CACHE_LINE_SIZE); /* TX */ struct ena_ring tx_ring[ENA_MAX_NUM_IO_QUEUES] __aligned(CACHE_LINE_SIZE); /* RX */ struct ena_ring rx_ring[ENA_MAX_NUM_IO_QUEUES] __aligned(CACHE_LINE_SIZE); struct ena_irq irq_tbl[ENA_MAX_MSIX_VEC(ENA_MAX_NUM_IO_QUEUES)]; /* Timer service */ struct callout timer_service; sbintime_t keep_alive_timestamp; uint32_t next_monitored_tx_qid; struct task reset_task; struct taskqueue *reset_tq; struct task metrics_task; struct taskqueue *metrics_tq; int wd_active; 
sbintime_t keep_alive_timeout; sbintime_t missing_tx_timeout; uint32_t missing_tx_max_queues; uint32_t missing_tx_threshold; bool disable_meta_caching; uint16_t metrics_sample_interval; uint16_t metrics_sample_interval_cnt; /* Statistics */ struct ena_stats_dev dev_stats; struct ena_hw_stats hw_stats; struct ena_admin_eni_stats eni_metrics; struct ena_admin_ena_srd_info ena_srd_info; uint64_t *customer_metrics_array; enum ena_regs_reset_reason_types reset_reason; }; #define ENA_RING_MTX_LOCK(_ring) mtx_lock(&(_ring)->ring_mtx) #define ENA_RING_MTX_TRYLOCK(_ring) mtx_trylock(&(_ring)->ring_mtx) #define ENA_RING_MTX_UNLOCK(_ring) mtx_unlock(&(_ring)->ring_mtx) #define ENA_RING_MTX_ASSERT(_ring) \ mtx_assert(&(_ring)->ring_mtx, MA_OWNED) #define ENA_LOCK_INIT() \ sx_init(&ena_global_lock, "ENA global lock") #define ENA_LOCK_DESTROY() sx_destroy(&ena_global_lock) #define ENA_LOCK_LOCK() sx_xlock(&ena_global_lock) #define ENA_LOCK_UNLOCK() sx_unlock(&ena_global_lock) #define ENA_LOCK_ASSERT() sx_assert(&ena_global_lock, SA_XLOCKED) #define ENA_TIMER_INIT(_adapter) \ callout_init(&(_adapter)->timer_service, true) #define ENA_TIMER_DRAIN(_adapter) \ callout_drain(&(_adapter)->timer_service) #define ENA_TIMER_RESET(_adapter) \ callout_reset_sbt(&(_adapter)->timer_service, SBT_1S, SBT_1S, \ ena_timer_service, (void*)(_adapter), 0) #define clamp_t(type, _x, min, max) min_t(type, max_t(type, _x, min), max) #define clamp_val(val, lo, hi) clamp_t(__typeof(val), val, lo, hi) extern struct sx ena_global_lock; int ena_up(struct ena_adapter *adapter); void ena_down(struct ena_adapter *adapter); int ena_restore_device(struct ena_adapter *adapter); void ena_destroy_device(struct ena_adapter *adapter, bool graceful); int ena_refill_rx_bufs(struct ena_ring *rx_ring, uint32_t num); int ena_update_buf_ring_size(struct ena_adapter *adapter, uint32_t new_buf_ring_size); int ena_update_queue_size(struct ena_adapter *adapter, uint32_t new_tx_size, uint32_t new_rx_size); int ena_update_io_queue_nb(struct ena_adapter *adapter, uint32_t new_num); int ena_update_base_cpu(struct ena_adapter *adapter, int new_num); int ena_update_cpu_stride(struct ena_adapter *adapter, uint32_t new_num); static inline int ena_mbuf_count(struct mbuf *mbuf) { int count = 1; while ((mbuf = mbuf->m_next) != NULL) ++count; return count; } static inline void ena_trigger_reset(struct ena_adapter *adapter, enum ena_regs_reset_reason_types reset_reason) { if (likely(!ENA_FLAG_ISSET(ENA_FLAG_TRIGGER_RESET, adapter))) { adapter->reset_reason = reset_reason; ENA_FLAG_SET_ATOMIC(ENA_FLAG_TRIGGER_RESET, adapter); } } static inline void ena_ring_tx_doorbell(struct ena_ring *tx_ring) { ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq); counter_u64_add(tx_ring->tx_stats.doorbells, 1); tx_ring->acum_pkts = 0; } #endif /* !(ENA_H) */ diff --git a/sys/dev/ena/ena_datapath.c b/sys/dev/ena/ena_datapath.c index 177f33ea8ef3..66a93bbe7a6c 100644 --- a/sys/dev/ena/ena_datapath.c +++ b/sys/dev/ena/ena_datapath.c @@ -1,1149 +1,1149 @@ /*- * SPDX-License-Identifier: BSD-2-Clause * - * Copyright (c) 2015-2020 Amazon.com, Inc. or its affiliates. + * Copyright (c) 2015-2023 Amazon.com, Inc. or its affiliates. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2. 
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include #include "opt_rss.h" #include "ena.h" #include "ena_datapath.h" #ifdef DEV_NETMAP #include "ena_netmap.h" #endif /* DEV_NETMAP */ #ifdef RSS #include #endif /* RSS */ #include /********************************************************************* * Static functions prototypes *********************************************************************/ static int ena_tx_cleanup(struct ena_ring *); static int ena_rx_cleanup(struct ena_ring *); static inline int ena_get_tx_req_id(struct ena_ring *tx_ring, struct ena_com_io_cq *io_cq, uint16_t *req_id); static void ena_rx_hash_mbuf(struct ena_ring *, struct ena_com_rx_ctx *, struct mbuf *); static struct mbuf *ena_rx_mbuf(struct ena_ring *, struct ena_com_rx_buf_info *, struct ena_com_rx_ctx *, uint16_t *); static inline void ena_rx_checksum(struct ena_ring *, struct ena_com_rx_ctx *, struct mbuf *); static void ena_tx_csum(struct ena_com_tx_ctx *, struct mbuf *, bool); static int ena_check_and_collapse_mbuf(struct ena_ring *tx_ring, struct mbuf **mbuf); static int ena_xmit_mbuf(struct ena_ring *, struct mbuf **); static void ena_start_xmit(struct ena_ring *); /********************************************************************* * Global functions *********************************************************************/ void ena_cleanup(void *arg, int pending) { struct ena_que *que = arg; struct ena_adapter *adapter = que->adapter; if_t ifp = adapter->ifp; struct ena_ring *tx_ring; struct ena_ring *rx_ring; struct ena_com_io_cq *io_cq; struct ena_eth_io_intr_reg intr_reg; int qid, ena_qid; int txc, rxc, i; if (unlikely((if_getdrvflags(ifp) & IFF_DRV_RUNNING) == 0)) return; ena_log_io(adapter->pdev, DBG, "MSI-X TX/RX routine\n"); tx_ring = que->tx_ring; rx_ring = que->rx_ring; qid = que->id; ena_qid = ENA_IO_TXQ_IDX(qid); io_cq = &adapter->ena_dev->io_cq_queues[ena_qid]; atomic_store_8(&tx_ring->first_interrupt, 1); atomic_store_8(&rx_ring->first_interrupt, 1); for (i = 0; i < ENA_CLEAN_BUDGET; ++i) { rxc = ena_rx_cleanup(rx_ring); txc = ena_tx_cleanup(tx_ring); if (unlikely((if_getdrvflags(ifp) & IFF_DRV_RUNNING) == 0)) return; if ((txc != ENA_TX_BUDGET) && (rxc != ENA_RX_BUDGET)) break; } /* Signal that work is done and unmask interrupt */ ena_com_update_intr_reg(&intr_reg, ENA_RX_IRQ_INTERVAL, ENA_TX_IRQ_INTERVAL, true, false); counter_u64_add(tx_ring->tx_stats.unmask_interrupt_num, 1); ena_com_unmask_intr(io_cq, &intr_reg); } void ena_deferred_mq_start(void *arg, int pending) { struct ena_ring *tx_ring = (struct ena_ring *)arg; if_t ifp = tx_ring->adapter->ifp; while 
(!drbr_empty(ifp, tx_ring->br) && tx_ring->running && (if_getdrvflags(ifp) & IFF_DRV_RUNNING) != 0) { ENA_RING_MTX_LOCK(tx_ring); ena_start_xmit(tx_ring); ENA_RING_MTX_UNLOCK(tx_ring); } } int ena_mq_start(if_t ifp, struct mbuf *m) { struct ena_adapter *adapter = if_getsoftc(ifp); struct ena_ring *tx_ring; int ret, is_drbr_empty; uint32_t i; #ifdef RSS uint32_t bucket_id; #endif if (unlikely((if_getdrvflags(adapter->ifp) & IFF_DRV_RUNNING) == 0)) return (ENODEV); /* Which queue to use */ /* * If everything is setup correctly, it should be the * same bucket that the current CPU we're on is. * It should improve performance. */ if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) { #ifdef RSS if (rss_hash2bucket(m->m_pkthdr.flowid, M_HASHTYPE_GET(m), &bucket_id) == 0) i = bucket_id % adapter->num_io_queues; else #endif i = m->m_pkthdr.flowid % adapter->num_io_queues; } else { i = curcpu % adapter->num_io_queues; } tx_ring = &adapter->tx_ring[i]; /* Check if drbr is empty before putting packet */ is_drbr_empty = drbr_empty(ifp, tx_ring->br); ret = drbr_enqueue(ifp, tx_ring->br, m); if (unlikely(ret != 0)) { taskqueue_enqueue(tx_ring->enqueue_tq, &tx_ring->enqueue_task); return (ret); } if (is_drbr_empty && (ENA_RING_MTX_TRYLOCK(tx_ring) != 0)) { ena_start_xmit(tx_ring); ENA_RING_MTX_UNLOCK(tx_ring); } else { taskqueue_enqueue(tx_ring->enqueue_tq, &tx_ring->enqueue_task); } return (0); } void ena_qflush(if_t ifp) { struct ena_adapter *adapter = if_getsoftc(ifp); struct ena_ring *tx_ring = adapter->tx_ring; int i; for (i = 0; i < adapter->num_io_queues; ++i, ++tx_ring) if (!drbr_empty(ifp, tx_ring->br)) { ENA_RING_MTX_LOCK(tx_ring); drbr_flush(ifp, tx_ring->br); ENA_RING_MTX_UNLOCK(tx_ring); } if_qflush(ifp); } /********************************************************************* * Static functions *********************************************************************/ static inline int ena_get_tx_req_id(struct ena_ring *tx_ring, struct ena_com_io_cq *io_cq, uint16_t *req_id) { struct ena_adapter *adapter = tx_ring->adapter; int rc; rc = ena_com_tx_comp_req_id_get(io_cq, req_id); if (rc == ENA_COM_TRY_AGAIN) return (EAGAIN); if (unlikely(rc != 0)) { ena_log(adapter->pdev, ERR, "Invalid req_id %hu in qid %hu\n", *req_id, tx_ring->qid); counter_u64_add(tx_ring->tx_stats.bad_req_id, 1); goto err; } if (tx_ring->tx_buffer_info[*req_id].mbuf != NULL) return (0); ena_log(adapter->pdev, ERR, "tx_info doesn't have valid mbuf. req_id %hu qid %hu\n", *req_id, tx_ring->qid); err: ena_trigger_reset(adapter, ENA_REGS_RESET_INV_TX_REQ_ID); return (EFAULT); } /** * ena_tx_cleanup - clear sent packets and corresponding descriptors * @tx_ring: ring for which we want to clean packets * * Once packets are sent, we ask the device in a loop for no longer used * descriptors. We find the related mbuf chain in a map (index in an array) * and free it, then update ring state. * This is performed in "endless" loop, updating ring pointers every * TX_COMMIT. The first check of free descriptor is performed before the actual * loop, then repeated at the loop end. 
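 * At most ENA_TX_BUDGET completions are processed per call; the caller
 * (ena_cleanup()) repeats the Tx/Rx passes up to ENA_CLEAN_BUDGET times
 * before unmasking the interrupt.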
**/ static int ena_tx_cleanup(struct ena_ring *tx_ring) { struct ena_adapter *adapter; struct ena_com_io_cq *io_cq; uint16_t next_to_clean; uint16_t req_id; uint16_t ena_qid; unsigned int total_done = 0; int rc; int commit = ENA_TX_COMMIT; int budget = ENA_TX_BUDGET; int work_done; bool above_thresh; adapter = tx_ring->que->adapter; ena_qid = ENA_IO_TXQ_IDX(tx_ring->que->id); io_cq = &adapter->ena_dev->io_cq_queues[ena_qid]; next_to_clean = tx_ring->next_to_clean; #ifdef DEV_NETMAP if (netmap_tx_irq(adapter->ifp, tx_ring->qid) != NM_IRQ_PASS) return (0); #endif /* DEV_NETMAP */ do { struct ena_tx_buffer *tx_info; struct mbuf *mbuf; rc = ena_get_tx_req_id(tx_ring, io_cq, &req_id); if (unlikely(rc != 0)) break; tx_info = &tx_ring->tx_buffer_info[req_id]; mbuf = tx_info->mbuf; tx_info->mbuf = NULL; bintime_clear(&tx_info->timestamp); bus_dmamap_sync(adapter->tx_buf_tag, tx_info->dmamap, BUS_DMASYNC_POSTWRITE); bus_dmamap_unload(adapter->tx_buf_tag, tx_info->dmamap); ena_log_io(adapter->pdev, DBG, "tx: q %d mbuf %p completed\n", tx_ring->qid, mbuf); m_freem(mbuf); total_done += tx_info->tx_descs; tx_ring->free_tx_ids[next_to_clean] = req_id; next_to_clean = ENA_TX_RING_IDX_NEXT(next_to_clean, tx_ring->ring_size); if (unlikely(--commit == 0)) { commit = ENA_TX_COMMIT; /* update ring state every ENA_TX_COMMIT descriptor */ tx_ring->next_to_clean = next_to_clean; ena_com_comp_ack( &adapter->ena_dev->io_sq_queues[ena_qid], total_done); total_done = 0; } } while (likely(--budget)); work_done = ENA_TX_BUDGET - budget; ena_log_io(adapter->pdev, DBG, "tx: q %d done. total pkts: %d\n", tx_ring->qid, work_done); /* If there is still something to commit update ring state */ if (likely(commit != ENA_TX_COMMIT)) { tx_ring->next_to_clean = next_to_clean; ena_com_comp_ack(&adapter->ena_dev->io_sq_queues[ena_qid], total_done); } /* * Need to make the rings circular update visible to * ena_xmit_mbuf() before checking for tx_ring->running. */ mb(); above_thresh = ena_com_sq_have_enough_space(tx_ring->ena_com_io_sq, ENA_TX_RESUME_THRESH); if (unlikely(!tx_ring->running && above_thresh)) { ENA_RING_MTX_LOCK(tx_ring); above_thresh = ena_com_sq_have_enough_space( tx_ring->ena_com_io_sq, ENA_TX_RESUME_THRESH); if (!tx_ring->running && above_thresh) { tx_ring->running = true; counter_u64_add(tx_ring->tx_stats.queue_wakeup, 1); taskqueue_enqueue(tx_ring->enqueue_tq, &tx_ring->enqueue_task); } ENA_RING_MTX_UNLOCK(tx_ring); } tx_ring->tx_last_cleanup_ticks = ticks; return (work_done); } static void ena_rx_hash_mbuf(struct ena_ring *rx_ring, struct ena_com_rx_ctx *ena_rx_ctx, struct mbuf *mbuf) { struct ena_adapter *adapter = rx_ring->adapter; if (likely(ENA_FLAG_ISSET(ENA_FLAG_RSS_ACTIVE, adapter))) { mbuf->m_pkthdr.flowid = ena_rx_ctx->hash; #ifdef RSS /* * Hardware and software RSS are in agreement only when both are * configured to Toeplitz algorithm. This driver configures * that algorithm only when software RSS is enabled and uses it. 
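 * If the device reports a different hash function, the hash is still stored
 * in the mbuf flowid, but it is marked as M_HASHTYPE_OPAQUE_HASH so the
 * stack does not interpret it as a Toeplitz value.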
*/ if (adapter->ena_dev->rss.hash_func != ENA_ADMIN_TOEPLITZ && ena_rx_ctx->l3_proto != ENA_ETH_IO_L3_PROTO_UNKNOWN) { M_HASHTYPE_SET(mbuf, M_HASHTYPE_OPAQUE_HASH); return; } #endif if (ena_rx_ctx->frag && (ena_rx_ctx->l3_proto != ENA_ETH_IO_L3_PROTO_UNKNOWN)) { M_HASHTYPE_SET(mbuf, M_HASHTYPE_OPAQUE_HASH); return; } switch (ena_rx_ctx->l3_proto) { case ENA_ETH_IO_L3_PROTO_IPV4: switch (ena_rx_ctx->l4_proto) { case ENA_ETH_IO_L4_PROTO_TCP: M_HASHTYPE_SET(mbuf, M_HASHTYPE_RSS_TCP_IPV4); break; case ENA_ETH_IO_L4_PROTO_UDP: M_HASHTYPE_SET(mbuf, M_HASHTYPE_RSS_UDP_IPV4); break; default: M_HASHTYPE_SET(mbuf, M_HASHTYPE_RSS_IPV4); } break; case ENA_ETH_IO_L3_PROTO_IPV6: switch (ena_rx_ctx->l4_proto) { case ENA_ETH_IO_L4_PROTO_TCP: M_HASHTYPE_SET(mbuf, M_HASHTYPE_RSS_TCP_IPV6); break; case ENA_ETH_IO_L4_PROTO_UDP: M_HASHTYPE_SET(mbuf, M_HASHTYPE_RSS_UDP_IPV6); break; default: M_HASHTYPE_SET(mbuf, M_HASHTYPE_RSS_IPV6); } break; case ENA_ETH_IO_L3_PROTO_UNKNOWN: M_HASHTYPE_SET(mbuf, M_HASHTYPE_NONE); break; default: M_HASHTYPE_SET(mbuf, M_HASHTYPE_OPAQUE_HASH); } } else { mbuf->m_pkthdr.flowid = rx_ring->qid; M_HASHTYPE_SET(mbuf, M_HASHTYPE_NONE); } } /** * ena_rx_mbuf - assemble mbuf from descriptors * @rx_ring: ring for which we want to clean packets * @ena_bufs: buffer info * @ena_rx_ctx: metadata for this packet(s) * @next_to_clean: ring pointer, will be updated only upon success * **/ static struct mbuf * ena_rx_mbuf(struct ena_ring *rx_ring, struct ena_com_rx_buf_info *ena_bufs, struct ena_com_rx_ctx *ena_rx_ctx, uint16_t *next_to_clean) { struct mbuf *mbuf; struct ena_rx_buffer *rx_info; struct ena_adapter *adapter; device_t pdev; unsigned int descs = ena_rx_ctx->descs; uint16_t ntc, len, req_id, buf = 0; ntc = *next_to_clean; adapter = rx_ring->adapter; pdev = adapter->pdev; len = ena_bufs[buf].len; req_id = ena_bufs[buf].req_id; rx_info = &rx_ring->rx_buffer_info[req_id]; if (unlikely(rx_info->mbuf == NULL)) { ena_log(pdev, ERR, "NULL mbuf in rx_info"); return (NULL); } ena_log_io(pdev, DBG, "rx_info %p, mbuf %p, paddr %jx\n", rx_info, rx_info->mbuf, (uintmax_t)rx_info->ena_buf.paddr); bus_dmamap_sync(adapter->rx_buf_tag, rx_info->map, BUS_DMASYNC_POSTREAD); mbuf = rx_info->mbuf; mbuf->m_flags |= M_PKTHDR; mbuf->m_pkthdr.len = len; mbuf->m_len = len; /* Only for the first segment the data starts at specific offset */ mbuf->m_data = mtodo(mbuf, ena_rx_ctx->pkt_offset); ena_log_io(pdev, DBG, "Mbuf data offset=%u\n", ena_rx_ctx->pkt_offset); mbuf->m_pkthdr.rcvif = rx_ring->que->adapter->ifp; /* Fill mbuf with hash key and it's interpretation for optimization */ ena_rx_hash_mbuf(rx_ring, ena_rx_ctx, mbuf); ena_log_io(pdev, DBG, "rx mbuf 0x%p, flags=0x%x, len: %d\n", mbuf, mbuf->m_flags, mbuf->m_pkthdr.len); /* DMA address is not needed anymore, unmap it */ bus_dmamap_unload(rx_ring->adapter->rx_buf_tag, rx_info->map); rx_info->mbuf = NULL; rx_ring->free_rx_ids[ntc] = req_id; ntc = ENA_RX_RING_IDX_NEXT(ntc, rx_ring->ring_size); /* * While we have more than 1 descriptors for one rcvd packet, append * other mbufs to the main one */ while (--descs) { ++buf; len = ena_bufs[buf].len; req_id = ena_bufs[buf].req_id; rx_info = &rx_ring->rx_buffer_info[req_id]; if (unlikely(rx_info->mbuf == NULL)) { ena_log(pdev, ERR, "NULL mbuf in rx_info"); /* * If one of the required mbufs was not allocated yet, * we can break there. * All earlier used descriptors will be reallocated * later and not used mbufs can be reused. 
* The next_to_clean pointer will not be updated in case * of an error, so caller should advance it manually * in error handling routine to keep it up to date * with hw ring. */ m_freem(mbuf); return (NULL); } bus_dmamap_sync(adapter->rx_buf_tag, rx_info->map, BUS_DMASYNC_POSTREAD); if (unlikely(m_append(mbuf, len, rx_info->mbuf->m_data) == 0)) { counter_u64_add(rx_ring->rx_stats.mbuf_alloc_fail, 1); ena_log_io(pdev, WARN, "Failed to append Rx mbuf %p\n", mbuf); } ena_log_io(pdev, DBG, "rx mbuf updated. len %d\n", mbuf->m_pkthdr.len); /* Free already appended mbuf, it won't be useful anymore */ bus_dmamap_unload(rx_ring->adapter->rx_buf_tag, rx_info->map); m_freem(rx_info->mbuf); rx_info->mbuf = NULL; rx_ring->free_rx_ids[ntc] = req_id; ntc = ENA_RX_RING_IDX_NEXT(ntc, rx_ring->ring_size); } *next_to_clean = ntc; return (mbuf); } /** * ena_rx_checksum - indicate in mbuf if hw indicated a good cksum **/ static inline void ena_rx_checksum(struct ena_ring *rx_ring, struct ena_com_rx_ctx *ena_rx_ctx, struct mbuf *mbuf) { device_t pdev = rx_ring->adapter->pdev; /* if IP and error */ if (unlikely((ena_rx_ctx->l3_proto == ENA_ETH_IO_L3_PROTO_IPV4) && ena_rx_ctx->l3_csum_err)) { /* ipv4 checksum error */ mbuf->m_pkthdr.csum_flags = 0; counter_u64_add(rx_ring->rx_stats.csum_bad, 1); ena_log_io(pdev, DBG, "RX IPv4 header checksum error\n"); return; } /* if TCP/UDP */ if ((ena_rx_ctx->l4_proto == ENA_ETH_IO_L4_PROTO_TCP) || (ena_rx_ctx->l4_proto == ENA_ETH_IO_L4_PROTO_UDP)) { if (ena_rx_ctx->l4_csum_err) { /* TCP/UDP checksum error */ mbuf->m_pkthdr.csum_flags = 0; counter_u64_add(rx_ring->rx_stats.csum_bad, 1); ena_log_io(pdev, DBG, "RX L4 checksum error\n"); } else { mbuf->m_pkthdr.csum_flags = CSUM_IP_CHECKED; mbuf->m_pkthdr.csum_flags |= CSUM_IP_VALID; counter_u64_add(rx_ring->rx_stats.csum_good, 1); } } } /** * ena_rx_cleanup - handle rx irq * @arg: ring for which irq is being handled **/ static int ena_rx_cleanup(struct ena_ring *rx_ring) { struct ena_adapter *adapter; device_t pdev; struct mbuf *mbuf; struct ena_com_rx_ctx ena_rx_ctx; struct ena_com_io_cq *io_cq; struct ena_com_io_sq *io_sq; enum ena_regs_reset_reason_types reset_reason; if_t ifp; uint16_t ena_qid; uint16_t next_to_clean; uint32_t refill_required; uint32_t refill_threshold; uint32_t do_if_input = 0; unsigned int qid; int rc, i; int budget = ENA_RX_BUDGET; #ifdef DEV_NETMAP int done; #endif /* DEV_NETMAP */ adapter = rx_ring->que->adapter; pdev = adapter->pdev; ifp = adapter->ifp; qid = rx_ring->que->id; ena_qid = ENA_IO_RXQ_IDX(qid); io_cq = &adapter->ena_dev->io_cq_queues[ena_qid]; io_sq = &adapter->ena_dev->io_sq_queues[ena_qid]; next_to_clean = rx_ring->next_to_clean; #ifdef DEV_NETMAP if (netmap_rx_irq(adapter->ifp, rx_ring->qid, &done) != NM_IRQ_PASS) return (0); #endif /* DEV_NETMAP */ ena_log_io(pdev, DBG, "rx: qid %d\n", qid); do { ena_rx_ctx.ena_bufs = rx_ring->ena_bufs; ena_rx_ctx.max_bufs = adapter->max_rx_sgl_size; ena_rx_ctx.descs = 0; ena_rx_ctx.pkt_offset = 0; bus_dmamap_sync(io_cq->cdesc_addr.mem_handle.tag, io_cq->cdesc_addr.mem_handle.map, BUS_DMASYNC_POSTREAD); rc = ena_com_rx_pkt(io_cq, io_sq, &ena_rx_ctx); if (unlikely(rc != 0)) { if (rc == ENA_COM_NO_SPACE) { counter_u64_add(rx_ring->rx_stats.bad_desc_num, 1); reset_reason = ENA_REGS_RESET_TOO_MANY_RX_DESCS; } else { counter_u64_add(rx_ring->rx_stats.bad_req_id, 1); reset_reason = ENA_REGS_RESET_INV_RX_REQ_ID; } ena_trigger_reset(adapter, reset_reason); return (0); } if (unlikely(ena_rx_ctx.descs == 0)) break; ena_log_io(pdev, DBG, "rx: q %d got packet from 
ena. descs #: %d l3 proto %d l4 proto %d hash: %x\n", rx_ring->qid, ena_rx_ctx.descs, ena_rx_ctx.l3_proto, ena_rx_ctx.l4_proto, ena_rx_ctx.hash); /* Receive mbuf from the ring */ mbuf = ena_rx_mbuf(rx_ring, rx_ring->ena_bufs, &ena_rx_ctx, &next_to_clean); bus_dmamap_sync(io_cq->cdesc_addr.mem_handle.tag, io_cq->cdesc_addr.mem_handle.map, BUS_DMASYNC_PREREAD); /* Exit if we failed to retrieve a buffer */ if (unlikely(mbuf == NULL)) { for (i = 0; i < ena_rx_ctx.descs; ++i) { rx_ring->free_rx_ids[next_to_clean] = rx_ring->ena_bufs[i].req_id; next_to_clean = ENA_RX_RING_IDX_NEXT( next_to_clean, rx_ring->ring_size); } break; } if (((if_getcapenable(ifp) & IFCAP_RXCSUM) != 0) || ((if_getcapenable(ifp) & IFCAP_RXCSUM_IPV6) != 0)) { ena_rx_checksum(rx_ring, &ena_rx_ctx, mbuf); } counter_enter(); counter_u64_add_protected(rx_ring->rx_stats.bytes, mbuf->m_pkthdr.len); counter_u64_add_protected(adapter->hw_stats.rx_bytes, mbuf->m_pkthdr.len); counter_exit(); /* * LRO is only for IP/TCP packets and TCP checksum of the packet * should be computed by hardware. */ do_if_input = 1; if (((if_getcapenable(ifp) & IFCAP_LRO) != 0) && ((mbuf->m_pkthdr.csum_flags & CSUM_IP_VALID) != 0) && (ena_rx_ctx.l4_proto == ENA_ETH_IO_L4_PROTO_TCP)) { /* * Send to the stack if: * - LRO not enabled, or * - no LRO resources, or * - lro enqueue fails */ if ((rx_ring->lro.lro_cnt != 0) && (tcp_lro_rx(&rx_ring->lro, mbuf, 0) == 0)) do_if_input = 0; } if (do_if_input != 0) { ena_log_io(pdev, DBG, "calling if_input() with mbuf %p\n", mbuf); if_input(ifp, mbuf); } counter_enter(); counter_u64_add_protected(rx_ring->rx_stats.cnt, 1); counter_u64_add_protected(adapter->hw_stats.rx_packets, 1); counter_exit(); } while (--budget); rx_ring->next_to_clean = next_to_clean; refill_required = ena_com_free_q_entries(io_sq); refill_threshold = min_t(int, rx_ring->ring_size / ENA_RX_REFILL_THRESH_DIVIDER, ENA_RX_REFILL_THRESH_PACKET); if (refill_required > refill_threshold) { ena_refill_rx_bufs(rx_ring, refill_required); } tcp_lro_flush_all(&rx_ring->lro); return (ENA_RX_BUDGET - budget); } static void ena_tx_csum(struct ena_com_tx_ctx *ena_tx_ctx, struct mbuf *mbuf, bool disable_meta_caching) { struct ena_com_tx_meta *ena_meta; struct ether_vlan_header *eh; struct mbuf *mbuf_next; u32 mss; bool offload; uint16_t etype; int ehdrlen; struct ip *ip; int ipproto; int iphlen; struct tcphdr *th; int offset; offload = false; ena_meta = &ena_tx_ctx->ena_meta; mss = mbuf->m_pkthdr.tso_segsz; if (mss != 0) offload = true; if ((mbuf->m_pkthdr.csum_flags & CSUM_TSO) != 0) offload = true; if ((mbuf->m_pkthdr.csum_flags & CSUM_OFFLOAD) != 0) offload = true; if ((mbuf->m_pkthdr.csum_flags & CSUM6_OFFLOAD) != 0) offload = true; if (!offload) { if (disable_meta_caching) { memset(ena_meta, 0, sizeof(*ena_meta)); ena_tx_ctx->meta_valid = 1; } else { ena_tx_ctx->meta_valid = 0; } return; } /* Determine where frame payload starts. 
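 * For 802.1Q tagged frames the Ethernet header is ETHER_VLAN_ENCAP_LEN bytes
 * longer, so the payload offset is adjusted accordingly.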
*/ eh = mtod(mbuf, struct ether_vlan_header *); if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) { etype = ntohs(eh->evl_proto); ehdrlen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN; } else { etype = ntohs(eh->evl_encap_proto); ehdrlen = ETHER_HDR_LEN; } mbuf_next = m_getptr(mbuf, ehdrlen, &offset); switch (etype) { case ETHERTYPE_IP: ip = (struct ip *)(mtodo(mbuf_next, offset)); iphlen = ip->ip_hl << 2; ipproto = ip->ip_p; ena_tx_ctx->l3_proto = ENA_ETH_IO_L3_PROTO_IPV4; if ((ip->ip_off & htons(IP_DF)) != 0) ena_tx_ctx->df = 1; break; case ETHERTYPE_IPV6: ena_tx_ctx->l3_proto = ENA_ETH_IO_L3_PROTO_IPV6; iphlen = ip6_lasthdr(mbuf, ehdrlen, IPPROTO_IPV6, &ipproto); iphlen -= ehdrlen; ena_tx_ctx->df = 1; break; default: iphlen = 0; ipproto = 0; break; } mbuf_next = m_getptr(mbuf, iphlen + ehdrlen, &offset); th = (struct tcphdr *)(mtodo(mbuf_next, offset)); if ((mbuf->m_pkthdr.csum_flags & CSUM_IP) != 0) { ena_tx_ctx->l3_csum_enable = 1; } if ((mbuf->m_pkthdr.csum_flags & CSUM_TSO) != 0) { ena_tx_ctx->tso_enable = 1; ena_meta->l4_hdr_len = (th->th_off); } if (ipproto == IPPROTO_TCP) { ena_tx_ctx->l4_proto = ENA_ETH_IO_L4_PROTO_TCP; if ((mbuf->m_pkthdr.csum_flags & (CSUM_IP_TCP | CSUM_IP6_TCP)) != 0) ena_tx_ctx->l4_csum_enable = 1; else ena_tx_ctx->l4_csum_enable = 0; } else if (ipproto == IPPROTO_UDP) { ena_tx_ctx->l4_proto = ENA_ETH_IO_L4_PROTO_UDP; if ((mbuf->m_pkthdr.csum_flags & (CSUM_IP_UDP | CSUM_IP6_UDP)) != 0) ena_tx_ctx->l4_csum_enable = 1; else ena_tx_ctx->l4_csum_enable = 0; } else { ena_tx_ctx->l4_proto = ENA_ETH_IO_L4_PROTO_UNKNOWN; ena_tx_ctx->l4_csum_enable = 0; } ena_meta->mss = mss; ena_meta->l3_hdr_len = iphlen; ena_meta->l3_hdr_offset = ehdrlen; ena_tx_ctx->meta_valid = 1; } static int ena_check_and_collapse_mbuf(struct ena_ring *tx_ring, struct mbuf **mbuf) { struct ena_adapter *adapter; struct mbuf *collapsed_mbuf; int num_frags; adapter = tx_ring->adapter; num_frags = ena_mbuf_count(*mbuf); /* One segment must be reserved for configuration descriptor. */ if (num_frags < adapter->max_tx_sgl_size) return (0); if ((num_frags == adapter->max_tx_sgl_size) && ((*mbuf)->m_pkthdr.len < tx_ring->tx_max_header_size)) return (0); counter_u64_add(tx_ring->tx_stats.collapse, 1); collapsed_mbuf = m_collapse(*mbuf, M_NOWAIT, adapter->max_tx_sgl_size - 1); if (unlikely(collapsed_mbuf == NULL)) { counter_u64_add(tx_ring->tx_stats.collapse_err, 1); return (ENOMEM); } /* If mbuf was collapsed succesfully, original mbuf is released. */ *mbuf = collapsed_mbuf; return (0); } static int ena_tx_map_mbuf(struct ena_ring *tx_ring, struct ena_tx_buffer *tx_info, struct mbuf *mbuf, void **push_hdr, u16 *header_len) { struct ena_adapter *adapter = tx_ring->adapter; struct ena_com_buf *ena_buf; bus_dma_segment_t segs[ENA_BUS_DMA_SEGS]; size_t iseg = 0; uint32_t mbuf_head_len; uint16_t offset; int rc, nsegs; mbuf_head_len = mbuf->m_len; tx_info->mbuf = mbuf; ena_buf = tx_info->bufs; /* * For easier maintaining of the DMA map, map the whole mbuf even if * the LLQ is used. The descriptors will be filled using the segments. */ rc = bus_dmamap_load_mbuf_sg(adapter->tx_buf_tag, tx_info->dmamap, mbuf, segs, &nsegs, BUS_DMA_NOWAIT); if (unlikely((rc != 0) || (nsegs == 0))) { ena_log_io(adapter->pdev, WARN, "dmamap load failed! err: %d nsegs: %d\n", rc, nsegs); goto dma_error; } if (tx_ring->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV) { /* * When the device is LLQ mode, the driver will copy * the header into the device memory space. * the ena_com layer assumes the header is in a linear * memory space. 
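 * (that is, the pushed header bytes must be readable from a single
 * contiguous buffer).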
* This assumption might be wrong since part of the header * can be in the fragmented buffers. * First check if header fits in the mbuf. If not, copy it to * separate buffer that will be holding linearized data. */ *header_len = min_t(uint32_t, mbuf->m_pkthdr.len, tx_ring->tx_max_header_size); /* If header is in linear space, just point into mbuf's data. */ if (likely(*header_len <= mbuf_head_len)) { *push_hdr = mbuf->m_data; /* * Otherwise, copy whole portion of header from multiple * mbufs to intermediate buffer. */ } else { m_copydata(mbuf, 0, *header_len, tx_ring->push_buf_intermediate_buf); *push_hdr = tx_ring->push_buf_intermediate_buf; counter_u64_add(tx_ring->tx_stats.llq_buffer_copy, 1); } ena_log_io(adapter->pdev, DBG, "mbuf: %p header_buf->vaddr: %p push_len: %d\n", mbuf, *push_hdr, *header_len); /* If packet is fitted in LLQ header, no need for DMA segments. */ if (mbuf->m_pkthdr.len <= tx_ring->tx_max_header_size) { return (0); } else { offset = tx_ring->tx_max_header_size; /* * As Header part is mapped to LLQ header, we can skip * it and just map the residuum of the mbuf to DMA * Segments. */ while (offset > 0) { if (offset >= segs[iseg].ds_len) { offset -= segs[iseg].ds_len; } else { ena_buf->paddr = segs[iseg].ds_addr + offset; ena_buf->len = segs[iseg].ds_len - offset; ena_buf++; tx_info->num_of_bufs++; offset = 0; } iseg++; } } } else { *push_hdr = NULL; /* * header_len is just a hint for the device. Because FreeBSD is * not giving us information about packet header length and it * is not guaranteed that all packet headers will be in the 1st * mbuf, setting header_len to 0 is making the device ignore * this value and resolve header on it's own. */ *header_len = 0; } /* Map rest of the mbuf */ while (iseg < nsegs) { ena_buf->paddr = segs[iseg].ds_addr; ena_buf->len = segs[iseg].ds_len; ena_buf++; iseg++; tx_info->num_of_bufs++; } return (0); dma_error: counter_u64_add(tx_ring->tx_stats.dma_mapping_err, 1); tx_info->mbuf = NULL; return (rc); } static int ena_xmit_mbuf(struct ena_ring *tx_ring, struct mbuf **mbuf) { struct ena_adapter *adapter; device_t pdev; struct ena_tx_buffer *tx_info; struct ena_com_tx_ctx ena_tx_ctx; struct ena_com_dev *ena_dev; struct ena_com_io_sq *io_sq; void *push_hdr; uint16_t next_to_use; uint16_t req_id; uint16_t ena_qid; uint16_t header_len; int rc; int nb_hw_desc; ena_qid = ENA_IO_TXQ_IDX(tx_ring->que->id); adapter = tx_ring->que->adapter; pdev = adapter->pdev; ena_dev = adapter->ena_dev; io_sq = &ena_dev->io_sq_queues[ena_qid]; rc = ena_check_and_collapse_mbuf(tx_ring, mbuf); if (unlikely(rc != 0)) { ena_log_io(pdev, WARN, "Failed to collapse mbuf! 
err: %d\n", rc); return (rc); } ena_log_io(pdev, DBG, "Tx: %d bytes\n", (*mbuf)->m_pkthdr.len); next_to_use = tx_ring->next_to_use; req_id = tx_ring->free_tx_ids[next_to_use]; tx_info = &tx_ring->tx_buffer_info[req_id]; tx_info->num_of_bufs = 0; ENA_WARN(tx_info->mbuf != NULL, adapter->ena_dev, "mbuf isn't NULL for req_id %d\n", req_id); rc = ena_tx_map_mbuf(tx_ring, tx_info, *mbuf, &push_hdr, &header_len); if (unlikely(rc != 0)) { ena_log_io(pdev, WARN, "Failed to map TX mbuf\n"); return (rc); } memset(&ena_tx_ctx, 0x0, sizeof(struct ena_com_tx_ctx)); ena_tx_ctx.ena_bufs = tx_info->bufs; ena_tx_ctx.push_header = push_hdr; ena_tx_ctx.num_bufs = tx_info->num_of_bufs; ena_tx_ctx.req_id = req_id; ena_tx_ctx.header_len = header_len; /* Set flags and meta data */ ena_tx_csum(&ena_tx_ctx, *mbuf, adapter->disable_meta_caching); if (tx_ring->acum_pkts == ENA_DB_THRESHOLD || ena_com_is_doorbell_needed(tx_ring->ena_com_io_sq, &ena_tx_ctx)) { ena_log_io(pdev, DBG, "llq tx max burst size of queue %d achieved, writing doorbell to send burst\n", tx_ring->que->id); ena_ring_tx_doorbell(tx_ring); } /* Prepare the packet's descriptors and send them to device */ rc = ena_com_prepare_tx(io_sq, &ena_tx_ctx, &nb_hw_desc); if (unlikely(rc != 0)) { if (likely(rc == ENA_COM_NO_MEM)) { ena_log_io(pdev, DBG, "tx ring[%d] is out of space\n", tx_ring->que->id); } else { ena_log(pdev, ERR, "failed to prepare tx bufs\n"); ena_trigger_reset(adapter, ENA_REGS_RESET_DRIVER_INVALID_STATE); } counter_u64_add(tx_ring->tx_stats.prepare_ctx_err, 1); goto dma_error; } counter_enter(); counter_u64_add_protected(tx_ring->tx_stats.cnt, 1); counter_u64_add_protected(tx_ring->tx_stats.bytes, (*mbuf)->m_pkthdr.len); counter_u64_add_protected(adapter->hw_stats.tx_packets, 1); counter_u64_add_protected(adapter->hw_stats.tx_bytes, (*mbuf)->m_pkthdr.len); counter_exit(); tx_info->tx_descs = nb_hw_desc; getbinuptime(&tx_info->timestamp); tx_info->print_once = true; tx_ring->next_to_use = ENA_TX_RING_IDX_NEXT(next_to_use, tx_ring->ring_size); /* stop the queue when no more space available, the packet can have up * to sgl_size + 2. one for the meta descriptor and one for header * (if the header is larger than tx_max_header_size). */ if (unlikely(!ena_com_sq_have_enough_space(tx_ring->ena_com_io_sq, adapter->max_tx_sgl_size + 2))) { ena_log_io(pdev, DBG, "Stop queue %d\n", tx_ring->que->id); tx_ring->running = false; counter_u64_add(tx_ring->tx_stats.queue_stop, 1); /* There is a rare condition where this function decides to * stop the queue but meanwhile tx_cleanup() updates * next_to_completion and terminates. * The queue will remain stopped forever. * To solve this issue this function performs mb(), checks * the wakeup condition and wakes up the queue if needed. 
*/ mb(); if (ena_com_sq_have_enough_space(tx_ring->ena_com_io_sq, ENA_TX_RESUME_THRESH)) { tx_ring->running = true; counter_u64_add(tx_ring->tx_stats.queue_wakeup, 1); } } bus_dmamap_sync(adapter->tx_buf_tag, tx_info->dmamap, BUS_DMASYNC_PREWRITE); return (0); dma_error: tx_info->mbuf = NULL; bus_dmamap_unload(adapter->tx_buf_tag, tx_info->dmamap); return (rc); } static void ena_start_xmit(struct ena_ring *tx_ring) { struct mbuf *mbuf; struct ena_adapter *adapter = tx_ring->adapter; int ret = 0; ENA_RING_MTX_ASSERT(tx_ring); if (unlikely((if_getdrvflags(adapter->ifp) & IFF_DRV_RUNNING) == 0)) return; if (unlikely(!ENA_FLAG_ISSET(ENA_FLAG_LINK_UP, adapter))) return; while ((mbuf = drbr_peek(adapter->ifp, tx_ring->br)) != NULL) { ena_log_io(adapter->pdev, DBG, "\ndequeued mbuf %p with flags %#x and header csum flags %#jx\n", mbuf, mbuf->m_flags, (uint64_t)mbuf->m_pkthdr.csum_flags); if (unlikely(!tx_ring->running)) { drbr_putback(adapter->ifp, tx_ring->br, mbuf); break; } if (unlikely((ret = ena_xmit_mbuf(tx_ring, &mbuf)) != 0)) { if (ret == ENA_COM_NO_MEM) { drbr_putback(adapter->ifp, tx_ring->br, mbuf); } else if (ret == ENA_COM_NO_SPACE) { drbr_putback(adapter->ifp, tx_ring->br, mbuf); } else { m_freem(mbuf); drbr_advance(adapter->ifp, tx_ring->br); } break; } drbr_advance(adapter->ifp, tx_ring->br); if (unlikely((if_getdrvflags(adapter->ifp) & IFF_DRV_RUNNING) == 0)) return; tx_ring->acum_pkts++; BPF_MTAP(adapter->ifp, mbuf); } if (likely(tx_ring->acum_pkts != 0)) { /* Trigger the dma engine */ ena_ring_tx_doorbell(tx_ring); } if (unlikely(!tx_ring->running)) taskqueue_enqueue(tx_ring->que->cleanup_tq, &tx_ring->que->cleanup_task); } diff --git a/sys/dev/ena/ena_datapath.h b/sys/dev/ena/ena_datapath.h index c6166a806c6a..43292b5abbe9 100644 --- a/sys/dev/ena/ena_datapath.h +++ b/sys/dev/ena/ena_datapath.h @@ -1,43 +1,43 @@ /*- * SPDX-License-Identifier: BSD-2-Clause * - * Copyright (c) 2015-2020 Amazon.com, Inc. or its affiliates. + * Copyright (c) 2015-2023 Amazon.com, Inc. or its affiliates. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
* */ #ifndef ENA_TXRX_H #define ENA_TXRX_H void ena_cleanup(void *arg, int pending); void ena_qflush(if_t ifp); int ena_mq_start(if_t ifp, struct mbuf *m); void ena_deferred_mq_start(void *arg, int pending); #define CSUM_OFFLOAD (CSUM_IP | CSUM_TCP | CSUM_UDP) #define CSUM6_OFFLOAD (CSUM_IP6_UDP | CSUM_IP6_TCP) #endif /* ENA_TXRX_H */ diff --git a/sys/dev/ena/ena_netmap.c b/sys/dev/ena/ena_netmap.c index a8d7cad05ab5..d95f48f7380c 100644 --- a/sys/dev/ena/ena_netmap.c +++ b/sys/dev/ena/ena_netmap.c @@ -1,1080 +1,1080 @@ /*- * SPDX-License-Identifier: BSD-2-Clause * - * Copyright (c) 2015-2020 Amazon.com, Inc. or its affiliates. + * Copyright (c) 2015-2023 Amazon.com, Inc. or its affiliates. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ #include #ifdef DEV_NETMAP #include "ena.h" #include "ena_netmap.h" #define ENA_NETMAP_MORE_FRAMES 1 #define ENA_NETMAP_NO_MORE_FRAMES 0 #define ENA_MAX_FRAMES 16384 struct ena_netmap_ctx { struct netmap_kring *kring; struct ena_adapter *adapter; struct netmap_adapter *na; struct netmap_slot *slots; struct ena_ring *ring; struct ena_com_io_cq *io_cq; struct ena_com_io_sq *io_sq; u_int nm_i; uint16_t nt; uint16_t lim; }; /* Netmap callbacks */ static int ena_netmap_reg(struct netmap_adapter *, int); static int ena_netmap_txsync(struct netmap_kring *, int); static int ena_netmap_rxsync(struct netmap_kring *, int); /* Helper functions */ static int ena_netmap_tx_frames(struct ena_netmap_ctx *); static int ena_netmap_tx_frame(struct ena_netmap_ctx *); static inline uint16_t ena_netmap_count_slots(struct ena_netmap_ctx *); static inline uint16_t ena_netmap_packet_len(struct netmap_slot *, u_int, uint16_t); static int ena_netmap_copy_data(struct netmap_adapter *, struct netmap_slot *, u_int, uint16_t, uint16_t, void *); static int ena_netmap_map_single_slot(struct netmap_adapter *, struct netmap_slot *, bus_dma_tag_t, bus_dmamap_t, void **, uint64_t *); static int ena_netmap_tx_map_slots(struct ena_netmap_ctx *, struct ena_tx_buffer *, void **, uint16_t *, uint16_t *); static void ena_netmap_unmap_last_socket_chain(struct ena_netmap_ctx *, struct ena_tx_buffer *); static void ena_netmap_tx_cleanup(struct ena_netmap_ctx *); static uint16_t ena_netmap_tx_clean_one(struct ena_netmap_ctx *, uint16_t); static inline int validate_tx_req_id(struct ena_ring *, uint16_t); static int ena_netmap_rx_frames(struct ena_netmap_ctx *); static int ena_netmap_rx_frame(struct ena_netmap_ctx *); static int ena_netmap_rx_load_desc(struct ena_netmap_ctx *, uint16_t, int *); static void ena_netmap_rx_cleanup(struct ena_netmap_ctx *); static void ena_netmap_fill_ctx(struct netmap_kring *, struct ena_netmap_ctx *, uint16_t); int ena_netmap_attach(struct ena_adapter *adapter) { struct netmap_adapter na; ena_log_nm(adapter->pdev, INFO, "netmap attach\n"); bzero(&na, sizeof(na)); na.na_flags = NAF_MOREFRAG; na.ifp = adapter->ifp; na.num_tx_desc = adapter->requested_tx_ring_size; na.num_rx_desc = adapter->requested_rx_ring_size; na.num_tx_rings = adapter->num_io_queues; na.num_rx_rings = adapter->num_io_queues; na.rx_buf_maxsize = adapter->buf_ring_size; na.nm_txsync = ena_netmap_txsync; na.nm_rxsync = ena_netmap_rxsync; na.nm_register = ena_netmap_reg; return (netmap_attach(&na)); } int ena_netmap_alloc_rx_slot(struct ena_adapter *adapter, struct ena_ring *rx_ring, struct ena_rx_buffer *rx_info) { struct netmap_adapter *na = NA(adapter->ifp); struct netmap_kring *kring; struct netmap_ring *ring; struct netmap_slot *slot; void *addr; uint64_t paddr; int nm_i, qid, head, lim, rc; /* if previously allocated frag is not used */ if (unlikely(rx_info->netmap_buf_idx != 0)) return (0); qid = rx_ring->qid; kring = na->rx_rings[qid]; nm_i = kring->nr_hwcur; head = kring->rhead; ena_log_nm(adapter->pdev, DBG, "nr_hwcur: %d, nr_hwtail: %d, rhead: %d, rcur: %d, rtail: %d\n", kring->nr_hwcur, kring->nr_hwtail, kring->rhead, kring->rcur, kring->rtail); if ((nm_i == head) && rx_ring->initialized) { ena_log_nm(adapter->pdev, ERR, "No free slots in netmap ring\n"); return (ENOMEM); } ring = kring->ring; if (ring == NULL) { ena_log_nm(adapter->pdev, ERR, "Rx ring %d is NULL\n", qid); return (EFAULT); } slot = &ring->slot[nm_i]; addr = PNMB(na, slot, &paddr); if (addr == NETMAP_BUF_BASE(na)) { ena_log_nm(adapter->pdev, ERR, "Bad buff in 
slot\n"); return (EFAULT); } rc = netmap_load_map(na, adapter->rx_buf_tag, rx_info->map, addr); if (rc != 0) { ena_log_nm(adapter->pdev, WARN, "DMA mapping error\n"); return (rc); } bus_dmamap_sync(adapter->rx_buf_tag, rx_info->map, BUS_DMASYNC_PREREAD); rx_info->ena_buf.paddr = paddr; rx_info->ena_buf.len = ring->nr_buf_size; rx_info->mbuf = NULL; rx_info->netmap_buf_idx = slot->buf_idx; slot->buf_idx = 0; lim = kring->nkr_num_slots - 1; kring->nr_hwcur = nm_next(nm_i, lim); return (0); } void ena_netmap_free_rx_slot(struct ena_adapter *adapter, struct ena_ring *rx_ring, struct ena_rx_buffer *rx_info) { struct netmap_adapter *na; struct netmap_kring *kring; struct netmap_slot *slot; int nm_i, qid, lim; na = NA(adapter->ifp); if (na == NULL) { ena_log_nm(adapter->pdev, ERR, "netmap adapter is NULL\n"); return; } if (na->rx_rings == NULL) { ena_log_nm(adapter->pdev, ERR, "netmap rings are NULL\n"); return; } qid = rx_ring->qid; kring = na->rx_rings[qid]; if (kring == NULL) { ena_log_nm(adapter->pdev, ERR, "netmap kernel ring %d is NULL\n", qid); return; } lim = kring->nkr_num_slots - 1; nm_i = nm_prev(kring->nr_hwcur, lim); if (kring->nr_mode != NKR_NETMAP_ON) return; bus_dmamap_sync(adapter->rx_buf_tag, rx_info->map, BUS_DMASYNC_POSTREAD); netmap_unload_map(na, adapter->rx_buf_tag, rx_info->map); KASSERT(kring->ring != NULL, ("Netmap Rx ring is NULL\n")); slot = &kring->ring->slot[nm_i]; ENA_WARN(slot->buf_idx != 0, adapter->ena_dev, "Overwrite slot buf\n"); slot->buf_idx = rx_info->netmap_buf_idx; slot->flags = NS_BUF_CHANGED; rx_info->netmap_buf_idx = 0; kring->nr_hwcur = nm_i; } static bool ena_ring_in_netmap(struct ena_adapter *adapter, int qid, enum txrx x) { struct netmap_adapter *na; struct netmap_kring *kring; if (if_getcapenable(adapter->ifp) & IFCAP_NETMAP) { na = NA(adapter->ifp); kring = (x == NR_RX) ? na->rx_rings[qid] : na->tx_rings[qid]; if (kring->nr_mode == NKR_NETMAP_ON) return true; } return false; } bool ena_tx_ring_in_netmap(struct ena_adapter *adapter, int qid) { return ena_ring_in_netmap(adapter, qid, NR_TX); } bool ena_rx_ring_in_netmap(struct ena_adapter *adapter, int qid) { return ena_ring_in_netmap(adapter, qid, NR_RX); } static void ena_netmap_reset_ring(struct ena_adapter *adapter, int qid, enum txrx x) { if (!ena_ring_in_netmap(adapter, qid, x)) return; netmap_reset(NA(adapter->ifp), x, qid, 0); ena_log_nm(adapter->pdev, INFO, "%s ring %d is in netmap mode\n", (x == NR_TX) ? 
"Tx" : "Rx", qid); } void ena_netmap_reset_rx_ring(struct ena_adapter *adapter, int qid) { ena_netmap_reset_ring(adapter, qid, NR_RX); } void ena_netmap_reset_tx_ring(struct ena_adapter *adapter, int qid) { ena_netmap_reset_ring(adapter, qid, NR_TX); } static int ena_netmap_reg(struct netmap_adapter *na, int onoff) { if_t ifp = na->ifp; struct ena_adapter *adapter = if_getsoftc(ifp); device_t pdev = adapter->pdev; struct netmap_kring *kring; enum txrx t; int rc, i; ENA_LOCK_LOCK(); ENA_FLAG_CLEAR_ATOMIC(ENA_FLAG_TRIGGER_RESET, adapter); ena_down(adapter); if (onoff) { ena_log_nm(pdev, INFO, "netmap on\n"); for_rx_tx(t) { for (i = 0; i <= nma_get_nrings(na, t); i++) { kring = NMR(na, t)[i]; if (nm_kring_pending_on(kring)) { kring->nr_mode = NKR_NETMAP_ON; } } } nm_set_native_flags(na); } else { ena_log_nm(pdev, INFO, "netmap off\n"); nm_clear_native_flags(na); for_rx_tx(t) { for (i = 0; i <= nma_get_nrings(na, t); i++) { kring = NMR(na, t)[i]; if (nm_kring_pending_off(kring)) { kring->nr_mode = NKR_NETMAP_OFF; } } } } rc = ena_up(adapter); if (rc != 0) { ena_log_nm(pdev, WARN, "ena_up failed with rc=%d\n", rc); adapter->reset_reason = ENA_REGS_RESET_DRIVER_INVALID_STATE; nm_clear_native_flags(na); ena_destroy_device(adapter, false); ENA_FLAG_SET_ATOMIC(ENA_FLAG_DEV_UP_BEFORE_RESET, adapter); rc = ena_restore_device(adapter); } ENA_LOCK_UNLOCK(); return (rc); } static int ena_netmap_txsync(struct netmap_kring *kring, int flags) { struct ena_netmap_ctx ctx; int rc = 0; ena_netmap_fill_ctx(kring, &ctx, ENA_IO_TXQ_IDX(kring->ring_id)); ctx.ring = &ctx.adapter->tx_ring[kring->ring_id]; ENA_RING_MTX_LOCK(ctx.ring); if (unlikely(!ENA_FLAG_ISSET(ENA_FLAG_DEV_UP, ctx.adapter))) goto txsync_end; if (unlikely(!ENA_FLAG_ISSET(ENA_FLAG_LINK_UP, ctx.adapter))) goto txsync_end; rc = ena_netmap_tx_frames(&ctx); ena_netmap_tx_cleanup(&ctx); txsync_end: ENA_RING_MTX_UNLOCK(ctx.ring); return (rc); } static int ena_netmap_tx_frames(struct ena_netmap_ctx *ctx) { struct ena_ring *tx_ring = ctx->ring; int rc = 0; ctx->nm_i = ctx->kring->nr_hwcur; ctx->nt = ctx->ring->next_to_use; __builtin_prefetch(&ctx->slots[ctx->nm_i]); while (ctx->nm_i != ctx->kring->rhead) { if ((rc = ena_netmap_tx_frame(ctx)) != 0) { /* * When there is no empty space in Tx ring, error is * still being returned. It should not be passed to the * netmap, as application knows current ring state from * netmap ring pointers. Returning error there could * cause application to exit, but the Tx ring is * commonly being full. */ if (rc == ENA_COM_NO_MEM) rc = 0; break; } tx_ring->acum_pkts++; } /* If any packet was sent... */ if (likely(ctx->nm_i != ctx->kring->nr_hwcur)) { /* ...send the doorbell to the device. 
*/ ena_ring_tx_doorbell(tx_ring); ctx->ring->next_to_use = ctx->nt; ctx->kring->nr_hwcur = ctx->nm_i; } return (rc); } static int ena_netmap_tx_frame(struct ena_netmap_ctx *ctx) { struct ena_com_tx_ctx ena_tx_ctx; struct ena_adapter *adapter; struct ena_ring *tx_ring; struct ena_tx_buffer *tx_info; uint16_t req_id; uint16_t header_len; uint16_t packet_len; int nb_hw_desc; int rc; void *push_hdr; adapter = ctx->adapter; if (ena_netmap_count_slots(ctx) > adapter->max_tx_sgl_size) { ena_log_nm(adapter->pdev, WARN, "Too many slots per packet\n"); return (EINVAL); } tx_ring = ctx->ring; req_id = tx_ring->free_tx_ids[ctx->nt]; tx_info = &tx_ring->tx_buffer_info[req_id]; tx_info->num_of_bufs = 0; tx_info->nm_info.sockets_used = 0; rc = ena_netmap_tx_map_slots(ctx, tx_info, &push_hdr, &header_len, &packet_len); if (unlikely(rc != 0)) { ena_log_nm(adapter->pdev, ERR, "Failed to map Tx slot\n"); return (rc); } bzero(&ena_tx_ctx, sizeof(struct ena_com_tx_ctx)); ena_tx_ctx.ena_bufs = tx_info->bufs; ena_tx_ctx.push_header = push_hdr; ena_tx_ctx.num_bufs = tx_info->num_of_bufs; ena_tx_ctx.req_id = req_id; ena_tx_ctx.header_len = header_len; ena_tx_ctx.meta_valid = adapter->disable_meta_caching; /* There are no any offloads, as the netmap doesn't support them */ if (tx_ring->acum_pkts == ENA_DB_THRESHOLD || ena_com_is_doorbell_needed(ctx->io_sq, &ena_tx_ctx)) ena_ring_tx_doorbell(tx_ring); rc = ena_com_prepare_tx(ctx->io_sq, &ena_tx_ctx, &nb_hw_desc); if (unlikely(rc != 0)) { if (likely(rc == ENA_COM_NO_MEM)) { ena_log_nm(adapter->pdev, DBG, "Tx ring[%d] is out of space\n", tx_ring->que->id); } else { ena_log_nm(adapter->pdev, ERR, "Failed to prepare Tx bufs\n"); ena_trigger_reset(adapter, ENA_REGS_RESET_DRIVER_INVALID_STATE); } counter_u64_add(tx_ring->tx_stats.prepare_ctx_err, 1); ena_netmap_unmap_last_socket_chain(ctx, tx_info); return (rc); } counter_enter(); counter_u64_add_protected(tx_ring->tx_stats.cnt, 1); counter_u64_add_protected(tx_ring->tx_stats.bytes, packet_len); counter_u64_add_protected(adapter->hw_stats.tx_packets, 1); counter_u64_add_protected(adapter->hw_stats.tx_bytes, packet_len); counter_exit(); tx_info->tx_descs = nb_hw_desc; ctx->nt = ENA_TX_RING_IDX_NEXT(ctx->nt, ctx->ring->ring_size); for (unsigned int i = 0; i < tx_info->num_of_bufs; i++) bus_dmamap_sync(adapter->tx_buf_tag, tx_info->nm_info.map_seg[i], BUS_DMASYNC_PREWRITE); return (0); } static inline uint16_t ena_netmap_count_slots(struct ena_netmap_ctx *ctx) { uint16_t slots = 1; uint16_t nm = ctx->nm_i; while ((ctx->slots[nm].flags & NS_MOREFRAG) != 0) { slots++; nm = nm_next(nm, ctx->lim); } return slots; } static inline uint16_t ena_netmap_packet_len(struct netmap_slot *slots, u_int slot_index, uint16_t limit) { struct netmap_slot *nm_slot; uint16_t packet_size = 0; do { nm_slot = &slots[slot_index]; packet_size += nm_slot->len; slot_index = nm_next(slot_index, limit); } while ((nm_slot->flags & NS_MOREFRAG) != 0); return packet_size; } static int ena_netmap_copy_data(struct netmap_adapter *na, struct netmap_slot *slots, u_int slot_index, uint16_t limit, uint16_t bytes_to_copy, void *destination) { struct netmap_slot *nm_slot; void *slot_vaddr; uint16_t data_amount; do { nm_slot = &slots[slot_index]; slot_vaddr = NMB(na, nm_slot); if (unlikely(slot_vaddr == NULL)) return (EINVAL); data_amount = min_t(uint16_t, bytes_to_copy, nm_slot->len); memcpy(destination, slot_vaddr, data_amount); bytes_to_copy -= data_amount; slot_index = nm_next(slot_index, limit); } while ((nm_slot->flags & NS_MOREFRAG) != 0 && bytes_to_copy > 
0); return (0); } static int ena_netmap_map_single_slot(struct netmap_adapter *na, struct netmap_slot *slot, bus_dma_tag_t dmatag, bus_dmamap_t dmamap, void **vaddr, uint64_t *paddr) { device_t pdev; int rc; pdev = ((struct ena_adapter *)if_getsoftc(na->ifp))->pdev; *vaddr = PNMB(na, slot, paddr); if (unlikely(vaddr == NULL)) { ena_log_nm(pdev, ERR, "Slot address is NULL\n"); return (EINVAL); } rc = netmap_load_map(na, dmatag, dmamap, *vaddr); if (unlikely(rc != 0)) { ena_log_nm(pdev, ERR, "Failed to map slot %d for DMA\n", slot->buf_idx); return (EINVAL); } return (0); } static int ena_netmap_tx_map_slots(struct ena_netmap_ctx *ctx, struct ena_tx_buffer *tx_info, void **push_hdr, uint16_t *header_len, uint16_t *packet_len) { struct netmap_slot *slot; struct ena_com_buf *ena_buf; struct ena_adapter *adapter; struct ena_ring *tx_ring; struct ena_netmap_tx_info *nm_info; bus_dmamap_t *nm_maps; void *vaddr; uint64_t paddr; uint32_t *nm_buf_idx; uint32_t slot_head_len; uint32_t frag_len; uint32_t remaining_len; uint16_t push_len; uint16_t delta; int rc; adapter = ctx->adapter; tx_ring = ctx->ring; ena_buf = tx_info->bufs; nm_info = &tx_info->nm_info; nm_maps = nm_info->map_seg; nm_buf_idx = nm_info->socket_buf_idx; slot = &ctx->slots[ctx->nm_i]; slot_head_len = slot->len; *packet_len = ena_netmap_packet_len(ctx->slots, ctx->nm_i, ctx->lim); remaining_len = *packet_len; delta = 0; __builtin_prefetch(&ctx->slots[ctx->nm_i + 1]); if (tx_ring->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV) { /* * When the device is in LLQ mode, the driver will copy * the header into the device memory space. * The ena_com layer assumes that the header is in a linear * memory space. * This assumption might be wrong since part of the header * can be in the fragmented buffers. * First, check if header fits in the first slot. If not, copy * it to separate buffer that will be holding linearized data. */ push_len = min_t(uint32_t, *packet_len, tx_ring->tx_max_header_size); *header_len = push_len; /* If header is in linear space, just point to socket's data. */ if (likely(push_len <= slot_head_len)) { *push_hdr = NMB(ctx->na, slot); if (unlikely(push_hdr == NULL)) { ena_log_nm(adapter->pdev, ERR, "Slot vaddress is NULL\n"); return (EINVAL); } /* * Otherwise, copy whole portion of header from multiple * slots to intermediate buffer. */ } else { rc = ena_netmap_copy_data(ctx->na, ctx->slots, ctx->nm_i, ctx->lim, push_len, tx_ring->push_buf_intermediate_buf); if (unlikely(rc)) { ena_log_nm(adapter->pdev, ERR, "Failed to copy data from slots to push_buf\n"); return (EINVAL); } *push_hdr = tx_ring->push_buf_intermediate_buf; counter_u64_add(tx_ring->tx_stats.llq_buffer_copy, 1); delta = push_len - slot_head_len; } ena_log_nm(adapter->pdev, DBG, "slot: %d header_buf->vaddr: %p push_len: %d\n", slot->buf_idx, *push_hdr, push_len); /* * If header was in linear memory space, map for the dma rest of * the data in the first mbuf of the mbuf chain. 
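 * Only the bytes past push_len need DMA mapping here, as the first
 * push_len bytes of the slot travel to the device inside the pushed
 * header, hence the paddr/len adjustment below.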
*/ if (slot_head_len > push_len) { rc = ena_netmap_map_single_slot(ctx->na, slot, adapter->tx_buf_tag, *nm_maps, &vaddr, &paddr); if (unlikely(rc != 0)) { ena_log_nm(adapter->pdev, ERR, "DMA mapping error\n"); return (rc); } nm_maps++; ena_buf->paddr = paddr + push_len; ena_buf->len = slot->len - push_len; ena_buf++; tx_info->num_of_bufs++; } remaining_len -= slot->len; /* Save buf idx before advancing */ *nm_buf_idx = slot->buf_idx; nm_buf_idx++; slot->buf_idx = 0; /* Advance to the next socket */ ctx->nm_i = nm_next(ctx->nm_i, ctx->lim); slot = &ctx->slots[ctx->nm_i]; nm_info->sockets_used++; /* * If header is in non linear space (delta > 0), then skip mbufs * containing header and map the last one containing both header * and the packet data. * The first segment is already counted in. */ while (delta > 0) { __builtin_prefetch(&ctx->slots[ctx->nm_i + 1]); frag_len = slot->len; /* * If whole segment contains header just move to the * next one and reduce delta. */ if (unlikely(delta >= frag_len)) { delta -= frag_len; } else { /* * Map the data and then assign it with the * offsets */ rc = ena_netmap_map_single_slot(ctx->na, slot, adapter->tx_buf_tag, *nm_maps, &vaddr, &paddr); if (unlikely(rc != 0)) { ena_log_nm(adapter->pdev, ERR, "DMA mapping error\n"); goto error_map; } nm_maps++; ena_buf->paddr = paddr + delta; ena_buf->len = slot->len - delta; ena_buf++; tx_info->num_of_bufs++; delta = 0; } remaining_len -= slot->len; /* Save buf idx before advancing */ *nm_buf_idx = slot->buf_idx; nm_buf_idx++; slot->buf_idx = 0; /* Advance to the next socket */ ctx->nm_i = nm_next(ctx->nm_i, ctx->lim); slot = &ctx->slots[ctx->nm_i]; nm_info->sockets_used++; } } else { *push_hdr = NULL; /* * header_len is just a hint for the device. Because netmap is * not giving us any information about packet header length and * it is not guaranteed that all packet headers will be in the * 1st slot, setting header_len to 0 is making the device ignore * this value and resolve header on it's own. */ *header_len = 0; } /* Map all remaining data (regular routine for non-LLQ mode) */ while (remaining_len > 0) { __builtin_prefetch(&ctx->slots[ctx->nm_i + 1]); rc = ena_netmap_map_single_slot(ctx->na, slot, adapter->tx_buf_tag, *nm_maps, &vaddr, &paddr); if (unlikely(rc != 0)) { ena_log_nm(adapter->pdev, ERR, "DMA mapping error\n"); goto error_map; } nm_maps++; ena_buf->paddr = paddr; ena_buf->len = slot->len; ena_buf++; tx_info->num_of_bufs++; remaining_len -= slot->len; /* Save buf idx before advancing */ *nm_buf_idx = slot->buf_idx; nm_buf_idx++; slot->buf_idx = 0; /* Advance to the next socket */ ctx->nm_i = nm_next(ctx->nm_i, ctx->lim); slot = &ctx->slots[ctx->nm_i]; nm_info->sockets_used++; } return (0); error_map: ena_netmap_unmap_last_socket_chain(ctx, tx_info); return (rc); } static void ena_netmap_unmap_last_socket_chain(struct ena_netmap_ctx *ctx, struct ena_tx_buffer *tx_info) { struct ena_netmap_tx_info *nm_info; int n; nm_info = &tx_info->nm_info; /** * As the used sockets must not be equal to the buffers used in the LLQ * mode, they must be treated separately. * First, unmap the DMA maps. 
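 * num_of_bufs counts DMA-mapped segments, while sockets_used counts
 * netmap slots taken from the ring; in LLQ mode, slots consumed only by
 * the pushed header are never mapped, so the two loops below may
 * iterate a different number of times.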
*/ n = tx_info->num_of_bufs; while (n--) { netmap_unload_map(ctx->na, ctx->adapter->tx_buf_tag, nm_info->map_seg[n]); } tx_info->num_of_bufs = 0; /* Next, retain the sockets back to the userspace */ n = nm_info->sockets_used; while (n--) { ctx->slots[ctx->nm_i].buf_idx = nm_info->socket_buf_idx[n]; ctx->slots[ctx->nm_i].flags = NS_BUF_CHANGED; nm_info->socket_buf_idx[n] = 0; ctx->nm_i = nm_prev(ctx->nm_i, ctx->lim); } nm_info->sockets_used = 0; } static void ena_netmap_tx_cleanup(struct ena_netmap_ctx *ctx) { uint16_t req_id; uint16_t total_tx_descs = 0; ctx->nm_i = ctx->kring->nr_hwtail; ctx->nt = ctx->ring->next_to_clean; /* Reclaim buffers for completed transmissions */ while (ena_com_tx_comp_req_id_get(ctx->io_cq, &req_id) >= 0) { if (validate_tx_req_id(ctx->ring, req_id) != 0) break; total_tx_descs += ena_netmap_tx_clean_one(ctx, req_id); } ctx->kring->nr_hwtail = ctx->nm_i; if (total_tx_descs > 0) { /* acknowledge completion of sent packets */ ctx->ring->next_to_clean = ctx->nt; ena_com_comp_ack(ctx->ring->ena_com_io_sq, total_tx_descs); } } static uint16_t ena_netmap_tx_clean_one(struct ena_netmap_ctx *ctx, uint16_t req_id) { struct ena_tx_buffer *tx_info; struct ena_netmap_tx_info *nm_info; int n; tx_info = &ctx->ring->tx_buffer_info[req_id]; nm_info = &tx_info->nm_info; /** * As the used sockets must not be equal to the buffers used in the LLQ * mode, they must be treated separately. * First, unmap the DMA maps. */ n = tx_info->num_of_bufs; for (n = 0; n < tx_info->num_of_bufs; n++) { netmap_unload_map(ctx->na, ctx->adapter->tx_buf_tag, nm_info->map_seg[n]); } tx_info->num_of_bufs = 0; /* Next, retain the sockets back to the userspace */ for (n = 0; n < nm_info->sockets_used; n++) { ctx->nm_i = nm_next(ctx->nm_i, ctx->lim); ENA_WARN(ctx->slots[ctx->nm_i].buf_idx != 0, ctx->adapter->ena_dev, "Tx idx is not 0.\n"); ctx->slots[ctx->nm_i].buf_idx = nm_info->socket_buf_idx[n]; ctx->slots[ctx->nm_i].flags = NS_BUF_CHANGED; nm_info->socket_buf_idx[n] = 0; } nm_info->sockets_used = 0; ctx->ring->free_tx_ids[ctx->nt] = req_id; ctx->nt = ENA_TX_RING_IDX_NEXT(ctx->nt, ctx->lim); return tx_info->tx_descs; } static inline int validate_tx_req_id(struct ena_ring *tx_ring, uint16_t req_id) { struct ena_adapter *adapter = tx_ring->adapter; if (likely(req_id < tx_ring->ring_size)) return (0); ena_log_nm(adapter->pdev, WARN, "Invalid req_id %hu in qid %hu\n", req_id, tx_ring->qid); counter_u64_add(tx_ring->tx_stats.bad_req_id, 1); ena_trigger_reset(adapter, ENA_REGS_RESET_INV_TX_REQ_ID); return (EFAULT); } static int ena_netmap_rxsync(struct netmap_kring *kring, int flags) { struct ena_netmap_ctx ctx; int rc; ena_netmap_fill_ctx(kring, &ctx, ENA_IO_RXQ_IDX(kring->ring_id)); ctx.ring = &ctx.adapter->rx_ring[kring->ring_id]; if (ctx.kring->rhead > ctx.lim) { /* Probably not needed to release slots from RX ring. */ return (netmap_ring_reinit(ctx.kring)); } if (unlikely((if_getdrvflags(ctx.na->ifp) & IFF_DRV_RUNNING) == 0)) return (0); if (unlikely(!ENA_FLAG_ISSET(ENA_FLAG_LINK_UP, ctx.adapter))) return (0); if ((rc = ena_netmap_rx_frames(&ctx)) != 0) return (rc); ena_netmap_rx_cleanup(&ctx); return (0); } static inline int ena_netmap_rx_frames(struct ena_netmap_ctx *ctx) { int rc = 0; int frames_counter = 0; ctx->nt = ctx->ring->next_to_clean; ctx->nm_i = ctx->kring->nr_hwtail; while ((rc = ena_netmap_rx_frame(ctx)) == ENA_NETMAP_MORE_FRAMES) { frames_counter++; /* In case of multiple frames, it is not an error. 
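 * ENA_NETMAP_MORE_FRAMES only means another frame may follow, so the
 * status is cleared here; the frames_counter check below merely guards
 * against spinning in this loop indefinitely.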
*/ rc = 0; if (frames_counter > ENA_MAX_FRAMES) { ena_log_nm(ctx->adapter->pdev, ERR, "Driver is stuck in the Rx loop\n"); break; } }; ctx->kring->nr_hwtail = ctx->nm_i; ctx->kring->nr_kflags &= ~NKR_PENDINTR; ctx->ring->next_to_clean = ctx->nt; return (rc); } static inline int ena_netmap_rx_frame(struct ena_netmap_ctx *ctx) { struct ena_com_rx_ctx ena_rx_ctx; enum ena_regs_reset_reason_types reset_reason; int rc, len = 0; uint16_t buf, nm; ena_rx_ctx.ena_bufs = ctx->ring->ena_bufs; ena_rx_ctx.max_bufs = ctx->adapter->max_rx_sgl_size; bus_dmamap_sync(ctx->io_cq->cdesc_addr.mem_handle.tag, ctx->io_cq->cdesc_addr.mem_handle.map, BUS_DMASYNC_POSTREAD); rc = ena_com_rx_pkt(ctx->io_cq, ctx->io_sq, &ena_rx_ctx); if (unlikely(rc != 0)) { ena_log_nm(ctx->adapter->pdev, ERR, "Failed to read pkt from the device with error: %d\n", rc); if (rc == ENA_COM_NO_SPACE) { counter_u64_add(ctx->ring->rx_stats.bad_desc_num, 1); reset_reason = ENA_REGS_RESET_TOO_MANY_RX_DESCS; } else { counter_u64_add(ctx->ring->rx_stats.bad_req_id, 1); reset_reason = ENA_REGS_RESET_INV_RX_REQ_ID; } ena_trigger_reset(ctx->adapter, reset_reason); return (rc); } if (unlikely(ena_rx_ctx.descs == 0)) return (ENA_NETMAP_NO_MORE_FRAMES); ena_log_nm(ctx->adapter->pdev, DBG, "Rx: q %d got packet from ena. descs #:" " %d l3 proto %d l4 proto %d hash: %x\n", ctx->ring->qid, ena_rx_ctx.descs, ena_rx_ctx.l3_proto, ena_rx_ctx.l4_proto, ena_rx_ctx.hash); for (buf = 0; buf < ena_rx_ctx.descs; buf++) if ((rc = ena_netmap_rx_load_desc(ctx, buf, &len)) != 0) break; /* * ena_netmap_rx_load_desc doesn't know the number of descriptors. * It just set flag NS_MOREFRAG to all slots, then here flag of * last slot is cleared. */ ctx->slots[nm_prev(ctx->nm_i, ctx->lim)].flags = NS_BUF_CHANGED; if (rc != 0) { goto rx_clear_desc; } bus_dmamap_sync(ctx->io_cq->cdesc_addr.mem_handle.tag, ctx->io_cq->cdesc_addr.mem_handle.map, BUS_DMASYNC_PREREAD); counter_enter(); counter_u64_add_protected(ctx->ring->rx_stats.bytes, len); counter_u64_add_protected(ctx->adapter->hw_stats.rx_bytes, len); counter_u64_add_protected(ctx->ring->rx_stats.cnt, 1); counter_u64_add_protected(ctx->adapter->hw_stats.rx_packets, 1); counter_exit(); return (ENA_NETMAP_MORE_FRAMES); rx_clear_desc: nm = ctx->nm_i; /* Remove failed packet from ring */ while (buf--) { ctx->slots[nm].flags = 0; ctx->slots[nm].len = 0; nm = nm_prev(nm, ctx->lim); } return (rc); } static inline int ena_netmap_rx_load_desc(struct ena_netmap_ctx *ctx, uint16_t buf, int *len) { struct ena_rx_buffer *rx_info; uint16_t req_id; req_id = ctx->ring->ena_bufs[buf].req_id; rx_info = &ctx->ring->rx_buffer_info[req_id]; bus_dmamap_sync(ctx->adapter->rx_buf_tag, rx_info->map, BUS_DMASYNC_POSTREAD); netmap_unload_map(ctx->na, ctx->adapter->rx_buf_tag, rx_info->map); ENA_WARN(ctx->slots[ctx->nm_i].buf_idx != 0, ctx->adapter->ena_dev, "Rx idx is not 0.\n"); ctx->slots[ctx->nm_i].buf_idx = rx_info->netmap_buf_idx; rx_info->netmap_buf_idx = 0; /* * Set NS_MOREFRAG to all slots. * Then ena_netmap_rx_frame clears it from last one. 
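 * This mirrors the NAF_MOREFRAG capability advertised in
 * ena_netmap_attach(), letting a single packet span several slots.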
*/ ctx->slots[ctx->nm_i].flags |= NS_MOREFRAG | NS_BUF_CHANGED; ctx->slots[ctx->nm_i].len = ctx->ring->ena_bufs[buf].len; *len += ctx->slots[ctx->nm_i].len; ctx->ring->free_rx_ids[ctx->nt] = req_id; ena_log_nm(ctx->adapter->pdev, DBG, "rx_info %p, buf_idx %d, paddr %jx, nm: %d\n", rx_info, ctx->slots[ctx->nm_i].buf_idx, (uintmax_t)rx_info->ena_buf.paddr, ctx->nm_i); ctx->nm_i = nm_next(ctx->nm_i, ctx->lim); ctx->nt = ENA_RX_RING_IDX_NEXT(ctx->nt, ctx->ring->ring_size); return (0); } static inline void ena_netmap_rx_cleanup(struct ena_netmap_ctx *ctx) { int refill_required; refill_required = ctx->kring->rhead - ctx->kring->nr_hwcur; if (ctx->kring->nr_hwcur != ctx->kring->nr_hwtail) refill_required -= 1; if (refill_required == 0) return; else if (refill_required < 0) refill_required += ctx->kring->nkr_num_slots; ena_refill_rx_bufs(ctx->ring, refill_required); } static inline void ena_netmap_fill_ctx(struct netmap_kring *kring, struct ena_netmap_ctx *ctx, uint16_t ena_qid) { ctx->kring = kring; ctx->na = kring->na; ctx->adapter = if_getsoftc(ctx->na->ifp); ctx->lim = kring->nkr_num_slots - 1; ctx->io_cq = &ctx->adapter->ena_dev->io_cq_queues[ena_qid]; ctx->io_sq = &ctx->adapter->ena_dev->io_sq_queues[ena_qid]; ctx->slots = kring->ring->slot; } void ena_netmap_unload(struct ena_adapter *adapter, bus_dmamap_t map) { struct netmap_adapter *na = NA(adapter->ifp); netmap_unload_map(na, adapter->tx_buf_tag, map); } #endif /* DEV_NETMAP */ diff --git a/sys/dev/ena/ena_netmap.h b/sys/dev/ena/ena_netmap.h index aa4d0d3d815a..598fcf1f08b2 100644 --- a/sys/dev/ena/ena_netmap.h +++ b/sys/dev/ena/ena_netmap.h @@ -1,60 +1,60 @@ /*- * SPDX-License-Identifier: BSD-2-Clause * - * Copyright (c) 2015-2020 Amazon.com, Inc. or its affiliates. + * Copyright (c) 2015-2023 Amazon.com, Inc. or its affiliates. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
* */ #ifndef _ENA_NETMAP_H_ #define _ENA_NETMAP_H_ /* Undef (un)likely as they are defined in netmap_kern.h */ #ifdef likely #undef likely #endif /* likely */ #ifdef unlikely #undef unlikely #endif /* unlikely */ #include #include #include int ena_netmap_attach(struct ena_adapter *adapter); int ena_netmap_alloc_rx_slot(struct ena_adapter *adapter, struct ena_ring *rx_ring, struct ena_rx_buffer *rx_info); void ena_netmap_free_rx_slot(struct ena_adapter *adapter, struct ena_ring *rx_ring, struct ena_rx_buffer *rx_info); bool ena_rx_ring_in_netmap(struct ena_adapter *adapter, int qid); bool ena_tx_ring_in_netmap(struct ena_adapter *adapter, int qid); void ena_netmap_reset_rx_ring(struct ena_adapter *adapter, int qid); void ena_netmap_reset_tx_ring(struct ena_adapter *adapter, int qid); void ena_netmap_unload(struct ena_adapter *adapter, bus_dmamap_t map); #endif /* _ENA_NETMAP_H_ */ diff --git a/sys/dev/ena/ena_rss.c b/sys/dev/ena/ena_rss.c index ce21a0a8950e..d90a7fbb253a 100644 --- a/sys/dev/ena/ena_rss.c +++ b/sys/dev/ena/ena_rss.c @@ -1,300 +1,300 @@ /*- * SPDX-License-Identifier: BSD-2-Clause * - * Copyright (c) 2015-2021 Amazon.com, Inc. or its affiliates. + * Copyright (c) 2015-2023 Amazon.com, Inc. or its affiliates. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * */ #include #include "opt_rss.h" #include "ena_rss.h" /* * This function should generate unique key for the whole driver. * If the key was already genereated in the previous call (for example * for another adapter), then it should be returned instead. */ void ena_rss_key_fill(void *key, size_t size) { static bool key_generated; static uint8_t default_key[ENA_HASH_KEY_SIZE]; KASSERT(size <= ENA_HASH_KEY_SIZE, ("Requested more bytes than ENA RSS key can hold")); if (!key_generated) { arc4random_buf(default_key, ENA_HASH_KEY_SIZE); key_generated = true; } memcpy(key, default_key, size); } /* * ENA HW expects the key to be in reverse-byte order. 
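 * For example, key[0] ends up in hw_key[ENA_HASH_KEY_SIZE - 1] and
 * key[ENA_HASH_KEY_SIZE - 1] in hw_key[0]; ena_rss_reorder_hash_key()
 * below performs that byte swap in both directions.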
*/ static void ena_rss_reorder_hash_key(u8 *reordered_key, const u8 *key, size_t key_size) { int i; key = key + key_size - 1; for (i = 0; i < key_size; ++i) *reordered_key++ = *key--; } int ena_rss_set_hash(struct ena_com_dev *ena_dev, const u8 *key) { enum ena_admin_hash_functions ena_func = ENA_ADMIN_TOEPLITZ; u8 hw_key[ENA_HASH_KEY_SIZE]; ena_rss_reorder_hash_key(hw_key, key, ENA_HASH_KEY_SIZE); return (ena_com_fill_hash_function(ena_dev, ena_func, hw_key, ENA_HASH_KEY_SIZE, 0x0)); } int ena_rss_get_hash_key(struct ena_com_dev *ena_dev, u8 *key) { u8 hw_key[ENA_HASH_KEY_SIZE]; int rc; rc = ena_com_get_hash_key(ena_dev, hw_key); if (rc != 0) return rc; ena_rss_reorder_hash_key(key, hw_key, ENA_HASH_KEY_SIZE); return (0); } static int ena_rss_init_default(struct ena_adapter *adapter) { struct ena_com_dev *ena_dev = adapter->ena_dev; device_t dev = adapter->pdev; int qid, rc, i; rc = ena_com_rss_init(ena_dev, ENA_RX_RSS_TABLE_LOG_SIZE); if (unlikely(rc != 0)) { ena_log(dev, ERR, "Cannot init indirect table\n"); return (rc); } for (i = 0; i < ENA_RX_RSS_TABLE_SIZE; i++) { #ifdef RSS qid = rss_get_indirection_to_bucket(i) % adapter->num_io_queues; #else qid = i % adapter->num_io_queues; #endif rc = ena_com_indirect_table_fill_entry(ena_dev, i, ENA_IO_RXQ_IDX(qid)); if (unlikely((rc != 0) && (rc != EOPNOTSUPP))) { ena_log(dev, ERR, "Cannot fill indirect table\n"); goto err_rss_destroy; } } #ifdef RSS uint8_t rss_algo = rss_gethashalgo(); if (rss_algo == RSS_HASH_TOEPLITZ) { uint8_t hash_key[RSS_KEYSIZE]; rss_getkey(hash_key); rc = ena_rss_set_hash(ena_dev, hash_key); } else #endif rc = ena_com_fill_hash_function(ena_dev, ENA_ADMIN_TOEPLITZ, NULL, ENA_HASH_KEY_SIZE, 0x0); if (unlikely((rc != 0) && (rc != EOPNOTSUPP))) { ena_log(dev, ERR, "Cannot fill hash function\n"); goto err_rss_destroy; } rc = ena_com_set_default_hash_ctrl(ena_dev); if (unlikely((rc != 0) && (rc != EOPNOTSUPP))) { ena_log(dev, ERR, "Cannot fill hash control\n"); goto err_rss_destroy; } rc = ena_rss_indir_init(adapter); return (rc == EOPNOTSUPP ? 
0 : rc); err_rss_destroy: ena_com_rss_destroy(ena_dev); return (rc); } /* Configure the Rx forwarding */ int ena_rss_configure(struct ena_adapter *adapter) { struct ena_com_dev *ena_dev = adapter->ena_dev; int rc; /* In case the RSS table was destroyed */ if (!ena_dev->rss.tbl_log_size) { rc = ena_rss_init_default(adapter); if (unlikely((rc != 0) && (rc != EOPNOTSUPP))) { ena_log(adapter->pdev, ERR, "WARNING: RSS was not properly re-initialized," " it will affect bandwidth\n"); ENA_FLAG_CLEAR_ATOMIC(ENA_FLAG_RSS_ACTIVE, adapter); return (rc); } } /* Set indirect table */ rc = ena_com_indirect_table_set(ena_dev); if (unlikely((rc != 0) && (rc != EOPNOTSUPP))) return (rc); /* Configure hash function (if supported) */ rc = ena_com_set_hash_function(ena_dev); if (unlikely((rc != 0) && (rc != EOPNOTSUPP))) return (rc); /* Configure hash inputs (if supported) */ rc = ena_com_set_hash_ctrl(ena_dev); if (unlikely((rc != 0) && (rc != EOPNOTSUPP))) return (rc); return (0); } static void ena_rss_init_default_deferred(void *arg) { struct ena_adapter *adapter; devclass_t dc; int max; int rc; dc = devclass_find("ena"); if (unlikely(dc == NULL)) { ena_log_raw(ERR, "SYSINIT: %s: No devclass ena\n", __func__); return; } max = devclass_get_maxunit(dc); while (max-- >= 0) { adapter = devclass_get_softc(dc, max); if (adapter != NULL) { rc = ena_rss_init_default(adapter); ENA_FLAG_SET_ATOMIC(ENA_FLAG_RSS_ACTIVE, adapter); if (unlikely(rc != 0)) { ena_log(adapter->pdev, WARN, "WARNING: RSS was not properly initialized," " it will affect bandwidth\n"); ENA_FLAG_CLEAR_ATOMIC(ENA_FLAG_RSS_ACTIVE, adapter); } } } } SYSINIT(ena_rss_init, SI_SUB_KICK_SCHEDULER, SI_ORDER_SECOND, ena_rss_init_default_deferred, NULL); int ena_rss_indir_get(struct ena_adapter *adapter, uint32_t *table) { int rc, i; rc = ena_com_indirect_table_get(adapter->ena_dev, table); if (rc != 0) { if (rc == EOPNOTSUPP) device_printf(adapter->pdev, "Reading from indirection table not supported\n"); else device_printf(adapter->pdev, "Unable to get indirection table\n"); return (rc); } for (i = 0; i < ENA_RX_RSS_TABLE_SIZE; ++i) table[i] = ENA_IO_RXQ_IDX_TO_COMBINED_IDX(table[i]); return (0); } int ena_rss_indir_set(struct ena_adapter *adapter, uint32_t *table) { int rc, i; for (i = 0; i < ENA_RX_RSS_TABLE_SIZE; ++i) { rc = ena_com_indirect_table_fill_entry(adapter->ena_dev, i, ENA_IO_RXQ_IDX(table[i])); if (rc != 0) { device_printf(adapter->pdev, "Cannot fill indirection table entry %d\n", i); return (rc); } } rc = ena_com_indirect_table_set(adapter->ena_dev); if (rc == EOPNOTSUPP) device_printf(adapter->pdev, "Writing to indirection table not supported\n"); else if (rc != 0) device_printf(adapter->pdev, "Cannot set indirection table\n"); return (rc); } int ena_rss_indir_init(struct ena_adapter *adapter) { struct ena_indir *indir = adapter->rss_indir; int rc; if (indir == NULL) { adapter->rss_indir = indir = malloc(sizeof(struct ena_indir), M_DEVBUF, M_WAITOK | M_ZERO); if (indir == NULL) return (ENOMEM); } rc = ena_rss_indir_get(adapter, indir->table); if (rc != 0) { free(adapter->rss_indir, M_DEVBUF); adapter->rss_indir = NULL; return (rc); } ena_rss_copy_indir_buf(indir->sysctl_buf, indir->table); return (0); } diff --git a/sys/dev/ena/ena_rss.h b/sys/dev/ena/ena_rss.h index 20dc41cc64a4..1c1c89261b35 100644 --- a/sys/dev/ena/ena_rss.h +++ b/sys/dev/ena/ena_rss.h @@ -1,71 +1,71 @@ /*- * SPDX-License-Identifier: BSD-2-Clause * - * Copyright (c) 2015-2021 Amazon.com, Inc. or its affiliates. + * Copyright (c) 2015-2023 Amazon.com, Inc. 
or its affiliates. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * */ #ifndef ENA_RSS_H #define ENA_RSS_H #include "opt_rss.h" #include #ifdef RSS #include #endif #include "ena.h" #define ENA_RX_RSS_MSG_RECORD_SZ 8 struct ena_indir { uint32_t table[ENA_RX_RSS_TABLE_SIZE]; /* This is the buffer wired to `rss.indir_table` sysctl. */ char sysctl_buf[ENA_RX_RSS_TABLE_SIZE * ENA_RX_RSS_MSG_RECORD_SZ]; }; int ena_rss_set_hash(struct ena_com_dev *ena_dev, const u8 *key); int ena_rss_get_hash_key(struct ena_com_dev *ena_dev, u8 *key); int ena_rss_configure(struct ena_adapter *); int ena_rss_indir_get(struct ena_adapter *adapter, uint32_t *table); int ena_rss_indir_set(struct ena_adapter *adapter, uint32_t *table); int ena_rss_indir_init(struct ena_adapter *adapter); static inline void ena_rss_copy_indir_buf(char *buf, uint32_t *table) { int i; for (i = 0; i < ENA_RX_RSS_TABLE_SIZE; ++i) { buf += snprintf(buf, ENA_RX_RSS_MSG_RECORD_SZ + 1, "%s%d:%d", i == 0 ? "" : " ", i, table[i]); } } #endif /* !(ENA_RSS_H) */ diff --git a/sys/dev/ena/ena_sysctl.c b/sys/dev/ena/ena_sysctl.c index 5efad01f372e..5eaa3c3e76c3 100644 --- a/sys/dev/ena/ena_sysctl.c +++ b/sys/dev/ena/ena_sysctl.c @@ -1,1210 +1,1210 @@ /*- * SPDX-License-Identifier: BSD-2-Clause * - * Copyright (c) 2015-2021 Amazon.com, Inc. or its affiliates. + * Copyright (c) 2015-2023 Amazon.com, Inc. or its affiliates. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include #include "opt_rss.h" #include "ena_rss.h" #include "ena_sysctl.h" static void ena_sysctl_add_wd(struct ena_adapter *); static void ena_sysctl_add_stats(struct ena_adapter *); static void ena_sysctl_add_eni_metrics(struct ena_adapter *); static void ena_sysctl_add_customer_metrics(struct ena_adapter *); static void ena_sysctl_add_srd_info(struct ena_adapter *); static void ena_sysctl_add_tuneables(struct ena_adapter *); static void ena_sysctl_add_irq_affinity(struct ena_adapter *); /* Kernel option RSS prevents manipulation of key hash and indirection table. */ #ifndef RSS static void ena_sysctl_add_rss(struct ena_adapter *); #endif static int ena_sysctl_buf_ring_size(SYSCTL_HANDLER_ARGS); static int ena_sysctl_rx_queue_size(SYSCTL_HANDLER_ARGS); static int ena_sysctl_io_queues_nb(SYSCTL_HANDLER_ARGS); static int ena_sysctl_irq_base_cpu(SYSCTL_HANDLER_ARGS); static int ena_sysctl_irq_cpu_stride(SYSCTL_HANDLER_ARGS); static int ena_sysctl_metrics_interval(SYSCTL_HANDLER_ARGS); #ifndef RSS static int ena_sysctl_rss_key(SYSCTL_HANDLER_ARGS); static int ena_sysctl_rss_indir_table(SYSCTL_HANDLER_ARGS); #endif /* Limit max ENA sample rate to be an hour. */ #define ENA_METRICS_MAX_SAMPLE_INTERVAL 3600 #define ENA_HASH_KEY_MSG_SIZE (ENA_HASH_KEY_SIZE * 2 + 1) #define SYSCTL_GSTRING_LEN 128 #define ENA_METRIC_ENI_ENTRY(stat, desc) { \ .name = #stat, \ .description = #desc, \ } #define ENA_STAT_ENTRY(stat, desc, stat_type) { \ .name = #stat, \ .description = #desc, \ .stat_offset = offsetof(struct ena_admin_##stat_type, stat) / sizeof(u64), \ } #define ENA_STAT_ENA_SRD_ENTRY(stat, desc) \ ENA_STAT_ENTRY(stat, desc, ena_srd_stats) struct ena_hw_metrics { char name[SYSCTL_GSTRING_LEN]; char description[SYSCTL_GSTRING_LEN]; }; struct ena_srd_metrics { char name[SYSCTL_GSTRING_LEN]; char description[SYSCTL_GSTRING_LEN]; int stat_offset; }; static const struct ena_srd_metrics ena_srd_stats_strings[] = { ENA_STAT_ENA_SRD_ENTRY( ena_srd_tx_pkts, Number of packets transmitted over ENA SRD), ENA_STAT_ENA_SRD_ENTRY( ena_srd_eligible_tx_pkts, Number of packets transmitted or could have been transmitted over ENA SRD), ENA_STAT_ENA_SRD_ENTRY( ena_srd_rx_pkts, Number of packets received over ENA SRD), ENA_STAT_ENA_SRD_ENTRY( ena_srd_resource_utilization, Percentage of the ENA SRD resources that are in use), }; static const struct ena_hw_metrics ena_hw_stats_strings[] = { ENA_METRIC_ENI_ENTRY( bw_in_allowance_exceeded, Inbound BW allowance exceeded), ENA_METRIC_ENI_ENTRY( bw_out_allowance_exceeded, Outbound BW allowance exceeded), ENA_METRIC_ENI_ENTRY( pps_allowance_exceeded, PPS allowance exceeded), ENA_METRIC_ENI_ENTRY( conntrack_allowance_exceeded, Connection tracking allowance exceeded), ENA_METRIC_ENI_ENTRY( linklocal_allowance_exceeded, Linklocal packet rate allowance), ENA_METRIC_ENI_ENTRY( conntrack_allowance_available, Number of available conntracks), }; #ifndef ARRAY_SIZE #define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0])) #endif #define ENA_CUSTOMER_METRICS_ARRAY_SIZE 
ARRAY_SIZE(ena_hw_stats_strings) #define ENA_SRD_METRICS_ARRAY_SIZE ARRAY_SIZE(ena_srd_stats_strings) static SYSCTL_NODE(_hw, OID_AUTO, ena, CTLFLAG_RD | CTLFLAG_MPSAFE, 0, "ENA driver parameters"); /* * Logging level for changing verbosity of the output */ int ena_log_level = ENA_INFO; SYSCTL_INT(_hw_ena, OID_AUTO, log_level, CTLFLAG_RWTUN, &ena_log_level, 0, "Logging level indicating verbosity of the logs"); SYSCTL_CONST_STRING(_hw_ena, OID_AUTO, driver_version, CTLFLAG_RD, ENA_DRV_MODULE_VERSION, "ENA driver version"); /* * Use 9k mbufs for the Rx buffers. Default to 0 (use page size mbufs instead). * Using 9k mbufs in low memory conditions might cause allocation to take a lot * of time and lead to the OS instability as it needs to look for the contiguous * pages. * However, page size mbufs has a bit smaller throughput than 9k mbufs, so if * the network performance is the priority, the 9k mbufs can be used. */ int ena_enable_9k_mbufs = 0; SYSCTL_INT(_hw_ena, OID_AUTO, enable_9k_mbufs, CTLFLAG_RDTUN, &ena_enable_9k_mbufs, 0, "Use 9 kB mbufs for Rx descriptors"); /* * Force the driver to use large LLQ (Low Latency Queue) header. Defaults to * false. This option may be important for platforms, which often handle packet * headers on Tx with total header size greater than 96B, as it may * reduce the latency. * It also reduces the maximum Tx queue size by half, so it may cause more Tx * packet drops. */ bool ena_force_large_llq_header = false; SYSCTL_BOOL(_hw_ena, OID_AUTO, force_large_llq_header, CTLFLAG_RDTUN, &ena_force_large_llq_header, 0, "Increases maximum supported header size in LLQ mode to 224 bytes, while reducing the maximum Tx queue size by half.\n"); int ena_rss_table_size = ENA_RX_RSS_TABLE_SIZE; int ena_sysctl_allocate_customer_metrics_buffer(struct ena_adapter *adapter) { int rc = 0; adapter->customer_metrics_array = malloc((sizeof(u64) * ENA_CUSTOMER_METRICS_ARRAY_SIZE), M_DEVBUF, M_NOWAIT | M_ZERO); if (unlikely(adapter->customer_metrics_array == NULL)) rc = ENOMEM; return rc; } void ena_sysctl_add_nodes(struct ena_adapter *adapter) { struct ena_com_dev *dev = adapter->ena_dev; if (ena_com_get_cap(dev, ENA_ADMIN_CUSTOMER_METRICS)) ena_sysctl_add_customer_metrics(adapter); else if (ena_com_get_cap(dev, ENA_ADMIN_ENI_STATS)) ena_sysctl_add_eni_metrics(adapter); if (ena_com_get_cap(adapter->ena_dev, ENA_ADMIN_ENA_SRD_INFO)) ena_sysctl_add_srd_info(adapter); ena_sysctl_add_wd(adapter); ena_sysctl_add_stats(adapter); ena_sysctl_add_tuneables(adapter); ena_sysctl_add_irq_affinity(adapter); #ifndef RSS ena_sysctl_add_rss(adapter); #endif } static void ena_sysctl_add_wd(struct ena_adapter *adapter) { device_t dev; struct sysctl_ctx_list *ctx; struct sysctl_oid *tree; struct sysctl_oid_list *child; dev = adapter->pdev; ctx = device_get_sysctl_ctx(dev); tree = device_get_sysctl_tree(dev); child = SYSCTL_CHILDREN(tree); /* Sysctl calls for Watchdog service */ SYSCTL_ADD_INT(ctx, child, OID_AUTO, "wd_active", CTLFLAG_RWTUN, &adapter->wd_active, 0, "Watchdog is active"); SYSCTL_ADD_QUAD(ctx, child, OID_AUTO, "keep_alive_timeout", CTLFLAG_RWTUN, &adapter->keep_alive_timeout, "Timeout for Keep Alive messages"); SYSCTL_ADD_QUAD(ctx, child, OID_AUTO, "missing_tx_timeout", CTLFLAG_RWTUN, &adapter->missing_tx_timeout, "Timeout for TX completion"); SYSCTL_ADD_U32(ctx, child, OID_AUTO, "missing_tx_max_queues", CTLFLAG_RWTUN, &adapter->missing_tx_max_queues, 0, "Number of TX queues to check per run"); SYSCTL_ADD_U32(ctx, child, OID_AUTO, "missing_tx_threshold", CTLFLAG_RWTUN, 
&adapter->missing_tx_threshold, 0, "Max number of timeouted packets"); } static void ena_sysctl_add_stats(struct ena_adapter *adapter) { device_t dev; struct ena_ring *tx_ring; struct ena_ring *rx_ring; struct ena_hw_stats *hw_stats; struct ena_stats_dev *dev_stats; struct ena_stats_tx *tx_stats; struct ena_stats_rx *rx_stats; struct ena_com_stats_admin *admin_stats; struct sysctl_ctx_list *ctx; struct sysctl_oid *tree; struct sysctl_oid_list *child; struct sysctl_oid *queue_node, *tx_node, *rx_node, *hw_node; struct sysctl_oid *admin_node; struct sysctl_oid_list *queue_list, *tx_list, *rx_list, *hw_list; struct sysctl_oid_list *admin_list; #define QUEUE_NAME_LEN 32 char namebuf[QUEUE_NAME_LEN]; int i; dev = adapter->pdev; ctx = device_get_sysctl_ctx(dev); tree = device_get_sysctl_tree(dev); child = SYSCTL_CHILDREN(tree); tx_ring = adapter->tx_ring; rx_ring = adapter->rx_ring; hw_stats = &adapter->hw_stats; dev_stats = &adapter->dev_stats; admin_stats = &adapter->ena_dev->admin_queue.stats; SYSCTL_ADD_COUNTER_U64(ctx, child, OID_AUTO, "wd_expired", CTLFLAG_RD, &dev_stats->wd_expired, "Watchdog expiry count"); SYSCTL_ADD_COUNTER_U64(ctx, child, OID_AUTO, "interface_up", CTLFLAG_RD, &dev_stats->interface_up, "Network interface up count"); SYSCTL_ADD_COUNTER_U64(ctx, child, OID_AUTO, "interface_down", CTLFLAG_RD, &dev_stats->interface_down, "Network interface down count"); SYSCTL_ADD_COUNTER_U64(ctx, child, OID_AUTO, "admin_q_pause", CTLFLAG_RD, &dev_stats->admin_q_pause, "Admin queue pauses"); for (i = 0; i < adapter->num_io_queues; ++i, ++tx_ring, ++rx_ring) { snprintf(namebuf, QUEUE_NAME_LEN, "queue%d", i); queue_node = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, namebuf, CTLFLAG_RD | CTLFLAG_MPSAFE, NULL, "Queue Name"); queue_list = SYSCTL_CHILDREN(queue_node); adapter->que[i].oid = queue_node; #ifdef RSS /* Common stats */ SYSCTL_ADD_INT(ctx, queue_list, OID_AUTO, "cpu", CTLFLAG_RD, &adapter->que[i].cpu, 0, "CPU affinity"); SYSCTL_ADD_INT(ctx, queue_list, OID_AUTO, "domain", CTLFLAG_RD, &adapter->que[i].domain, 0, "NUMA domain"); #endif /* TX specific stats */ tx_node = SYSCTL_ADD_NODE(ctx, queue_list, OID_AUTO, "tx_ring", CTLFLAG_RD | CTLFLAG_MPSAFE, NULL, "TX ring"); tx_list = SYSCTL_CHILDREN(tx_node); tx_stats = &tx_ring->tx_stats; SYSCTL_ADD_COUNTER_U64(ctx, tx_list, OID_AUTO, "count", CTLFLAG_RD, &tx_stats->cnt, "Packets sent"); SYSCTL_ADD_COUNTER_U64(ctx, tx_list, OID_AUTO, "bytes", CTLFLAG_RD, &tx_stats->bytes, "Bytes sent"); SYSCTL_ADD_COUNTER_U64(ctx, tx_list, OID_AUTO, "prepare_ctx_err", CTLFLAG_RD, &tx_stats->prepare_ctx_err, "TX buffer preparation failures"); SYSCTL_ADD_COUNTER_U64(ctx, tx_list, OID_AUTO, "dma_mapping_err", CTLFLAG_RD, &tx_stats->dma_mapping_err, "DMA mapping failures"); SYSCTL_ADD_COUNTER_U64(ctx, tx_list, OID_AUTO, "doorbells", CTLFLAG_RD, &tx_stats->doorbells, "Queue doorbells"); SYSCTL_ADD_COUNTER_U64(ctx, tx_list, OID_AUTO, "missing_tx_comp", CTLFLAG_RD, &tx_stats->missing_tx_comp, "TX completions missed"); SYSCTL_ADD_COUNTER_U64(ctx, tx_list, OID_AUTO, "bad_req_id", CTLFLAG_RD, &tx_stats->bad_req_id, "Bad request id count"); SYSCTL_ADD_COUNTER_U64(ctx, tx_list, OID_AUTO, "mbuf_collapses", CTLFLAG_RD, &tx_stats->collapse, "Mbuf collapse count"); SYSCTL_ADD_COUNTER_U64(ctx, tx_list, OID_AUTO, "mbuf_collapse_err", CTLFLAG_RD, &tx_stats->collapse_err, "Mbuf collapse failures"); SYSCTL_ADD_COUNTER_U64(ctx, tx_list, OID_AUTO, "queue_wakeups", CTLFLAG_RD, &tx_stats->queue_wakeup, "Queue wakeups"); SYSCTL_ADD_COUNTER_U64(ctx, tx_list, OID_AUTO, "queue_stops", 
CTLFLAG_RD, &tx_stats->queue_stop, "Queue stops"); SYSCTL_ADD_COUNTER_U64(ctx, tx_list, OID_AUTO, "llq_buffer_copy", CTLFLAG_RD, &tx_stats->llq_buffer_copy, "Header copies for llq transaction"); SYSCTL_ADD_COUNTER_U64(ctx, tx_list, OID_AUTO, "unmask_interrupt_num", CTLFLAG_RD, &tx_stats->unmask_interrupt_num, "Unmasked interrupt count"); /* RX specific stats */ rx_node = SYSCTL_ADD_NODE(ctx, queue_list, OID_AUTO, "rx_ring", CTLFLAG_RD | CTLFLAG_MPSAFE, NULL, "RX ring"); rx_list = SYSCTL_CHILDREN(rx_node); rx_stats = &rx_ring->rx_stats; SYSCTL_ADD_COUNTER_U64(ctx, rx_list, OID_AUTO, "count", CTLFLAG_RD, &rx_stats->cnt, "Packets received"); SYSCTL_ADD_COUNTER_U64(ctx, rx_list, OID_AUTO, "bytes", CTLFLAG_RD, &rx_stats->bytes, "Bytes received"); SYSCTL_ADD_COUNTER_U64(ctx, rx_list, OID_AUTO, "refil_partial", CTLFLAG_RD, &rx_stats->refil_partial, "Partial refilled mbufs"); SYSCTL_ADD_COUNTER_U64(ctx, rx_list, OID_AUTO, "csum_bad", CTLFLAG_RD, &rx_stats->csum_bad, "Bad RX checksum"); SYSCTL_ADD_COUNTER_U64(ctx, rx_list, OID_AUTO, "mbuf_alloc_fail", CTLFLAG_RD, &rx_stats->mbuf_alloc_fail, "Failed mbuf allocs"); SYSCTL_ADD_COUNTER_U64(ctx, rx_list, OID_AUTO, "mjum_alloc_fail", CTLFLAG_RD, &rx_stats->mjum_alloc_fail, "Failed jumbo mbuf allocs"); SYSCTL_ADD_COUNTER_U64(ctx, rx_list, OID_AUTO, "dma_mapping_err", CTLFLAG_RD, &rx_stats->dma_mapping_err, "DMA mapping errors"); SYSCTL_ADD_COUNTER_U64(ctx, rx_list, OID_AUTO, "bad_desc_num", CTLFLAG_RD, &rx_stats->bad_desc_num, "Bad descriptor count"); SYSCTL_ADD_COUNTER_U64(ctx, rx_list, OID_AUTO, "bad_req_id", CTLFLAG_RD, &rx_stats->bad_req_id, "Bad request id count"); SYSCTL_ADD_COUNTER_U64(ctx, rx_list, OID_AUTO, "empty_rx_ring", CTLFLAG_RD, &rx_stats->empty_rx_ring, "RX descriptors depletion count"); SYSCTL_ADD_COUNTER_U64(ctx, rx_list, OID_AUTO, "csum_good", CTLFLAG_RD, &rx_stats->csum_good, "Valid RX checksum calculations"); } /* Stats read from device */ hw_node = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "hw_stats", CTLFLAG_RD | CTLFLAG_MPSAFE, NULL, "Statistics from hardware"); hw_list = SYSCTL_CHILDREN(hw_node); SYSCTL_ADD_COUNTER_U64(ctx, hw_list, OID_AUTO, "rx_packets", CTLFLAG_RD, &hw_stats->rx_packets, "Packets received"); SYSCTL_ADD_COUNTER_U64(ctx, hw_list, OID_AUTO, "tx_packets", CTLFLAG_RD, &hw_stats->tx_packets, "Packets transmitted"); SYSCTL_ADD_COUNTER_U64(ctx, hw_list, OID_AUTO, "rx_bytes", CTLFLAG_RD, &hw_stats->rx_bytes, "Bytes received"); SYSCTL_ADD_COUNTER_U64(ctx, hw_list, OID_AUTO, "tx_bytes", CTLFLAG_RD, &hw_stats->tx_bytes, "Bytes transmitted"); SYSCTL_ADD_COUNTER_U64(ctx, hw_list, OID_AUTO, "rx_drops", CTLFLAG_RD, &hw_stats->rx_drops, "Receive packet drops"); SYSCTL_ADD_COUNTER_U64(ctx, hw_list, OID_AUTO, "tx_drops", CTLFLAG_RD, &hw_stats->tx_drops, "Transmit packet drops"); /* ENA Admin queue stats */ admin_node = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "admin_stats", CTLFLAG_RD | CTLFLAG_MPSAFE, NULL, "ENA Admin Queue statistics"); admin_list = SYSCTL_CHILDREN(admin_node); SYSCTL_ADD_U64(ctx, admin_list, OID_AUTO, "aborted_cmd", CTLFLAG_RD, &admin_stats->aborted_cmd, 0, "Aborted commands"); SYSCTL_ADD_U64(ctx, admin_list, OID_AUTO, "sumbitted_cmd", CTLFLAG_RD, &admin_stats->submitted_cmd, 0, "Submitted commands"); SYSCTL_ADD_U64(ctx, admin_list, OID_AUTO, "completed_cmd", CTLFLAG_RD, &admin_stats->completed_cmd, 0, "Completed commands"); SYSCTL_ADD_U64(ctx, admin_list, OID_AUTO, "out_of_space", CTLFLAG_RD, &admin_stats->out_of_space, 0, "Queue out of space"); SYSCTL_ADD_U64(ctx, admin_list, OID_AUTO, "no_completion", CTLFLAG_RD, 
&admin_stats->no_completion, 0, "Commands not completed"); } static void ena_sysctl_add_srd_info(struct ena_adapter *adapter) { device_t dev; struct sysctl_oid *ena_srd_info; struct sysctl_oid_list *srd_list; struct sysctl_ctx_list *ctx; struct sysctl_oid *tree; struct sysctl_oid_list *child; struct ena_admin_ena_srd_stats *srd_stats_ptr; struct ena_srd_metrics cur_stat_strings; int i; dev = adapter->pdev; ctx = device_get_sysctl_ctx(dev); tree = device_get_sysctl_tree(dev); child = SYSCTL_CHILDREN(tree); ena_srd_info = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "ena_srd_info", CTLFLAG_RD | CTLFLAG_MPSAFE, NULL, "ENA's SRD information"); srd_list = SYSCTL_CHILDREN(ena_srd_info); SYSCTL_ADD_U64(ctx, srd_list, OID_AUTO, "ena_srd_mode", CTLFLAG_RD, &adapter->ena_srd_info.flags, 0, "Describes which ENA-express features are enabled"); srd_stats_ptr = &adapter->ena_srd_info.ena_srd_stats; for (i = 0 ; i < ENA_SRD_METRICS_ARRAY_SIZE; i++) { cur_stat_strings = ena_srd_stats_strings[i]; SYSCTL_ADD_U64(ctx, srd_list, OID_AUTO, cur_stat_strings.name, CTLFLAG_RD, (u64 *)srd_stats_ptr + cur_stat_strings.stat_offset, 0, cur_stat_strings.description); } } static void ena_sysctl_add_customer_metrics(struct ena_adapter *adapter) { device_t dev; struct ena_com_dev *ena_dev; struct sysctl_ctx_list *ctx; struct sysctl_oid *tree; struct sysctl_oid_list *child; struct sysctl_oid *customer_metric; struct sysctl_oid_list *customer_list; int i; dev = adapter->pdev; ena_dev = adapter->ena_dev; ctx = device_get_sysctl_ctx(dev); tree = device_get_sysctl_tree(dev); child = SYSCTL_CHILDREN(tree); customer_metric = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "customer_metrics", CTLFLAG_RD | CTLFLAG_MPSAFE, NULL, "ENA's customer metrics"); customer_list = SYSCTL_CHILDREN(customer_metric); for (i = 0; i < ENA_CUSTOMER_METRICS_ARRAY_SIZE; i++) { if (ena_com_get_customer_metric_support(ena_dev, i)) { SYSCTL_ADD_U64(ctx, customer_list, OID_AUTO, ena_hw_stats_strings[i].name, CTLFLAG_RD, &adapter->customer_metrics_array[i], 0, ena_hw_stats_strings[i].description); } } } static void ena_sysctl_add_eni_metrics(struct ena_adapter *adapter) { device_t dev; struct ena_admin_eni_stats *eni_metrics; struct sysctl_ctx_list *ctx; struct sysctl_oid *tree; struct sysctl_oid_list *child; struct sysctl_oid *eni_node; struct sysctl_oid_list *eni_list; dev = adapter->pdev; ctx = device_get_sysctl_ctx(dev); tree = device_get_sysctl_tree(dev); child = SYSCTL_CHILDREN(tree); eni_node = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "eni_metrics", CTLFLAG_RD | CTLFLAG_MPSAFE, NULL, "ENA's ENI metrics"); eni_list = SYSCTL_CHILDREN(eni_node); eni_metrics = &adapter->eni_metrics; SYSCTL_ADD_U64(ctx, eni_list, OID_AUTO, "bw_in_allowance_exceeded", CTLFLAG_RD, &eni_metrics->bw_in_allowance_exceeded, 0, "Inbound BW allowance exceeded"); SYSCTL_ADD_U64(ctx, eni_list, OID_AUTO, "bw_out_allowance_exceeded", CTLFLAG_RD, &eni_metrics->bw_out_allowance_exceeded, 0, "Outbound BW allowance exceeded"); SYSCTL_ADD_U64(ctx, eni_list, OID_AUTO, "pps_allowance_exceeded", CTLFLAG_RD, &eni_metrics->pps_allowance_exceeded, 0, "PPS allowance exceeded"); SYSCTL_ADD_U64(ctx, eni_list, OID_AUTO, "conntrack_allowance_exceeded", CTLFLAG_RD, &eni_metrics->conntrack_allowance_exceeded, 0, "Connection tracking allowance exceeded"); SYSCTL_ADD_U64(ctx, eni_list, OID_AUTO, "linklocal_allowance_exceeded", CTLFLAG_RD, &eni_metrics->linklocal_allowance_exceeded, 0, "Linklocal packet rate allowance exceeded"); } static void ena_sysctl_add_tuneables(struct ena_adapter *adapter) { device_t dev; struct 
sysctl_ctx_list *ctx; struct sysctl_oid *tree; struct sysctl_oid_list *child; dev = adapter->pdev; ctx = device_get_sysctl_ctx(dev); tree = device_get_sysctl_tree(dev); child = SYSCTL_CHILDREN(tree); /* Tuneable number of buffers in the buf-ring (drbr) */ SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "buf_ring_size", CTLTYPE_U32 | CTLFLAG_RW | CTLFLAG_MPSAFE, adapter, 0, ena_sysctl_buf_ring_size, "I", "Size of the Tx buffer ring (drbr)."); /* Tuneable size of the Rx ring */ SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "rx_queue_size", CTLTYPE_U32 | CTLFLAG_RW | CTLFLAG_MPSAFE, adapter, 0, ena_sysctl_rx_queue_size, "I", "Size of the Rx ring. The size should be a power of 2."); /* Tuneable number of IO queues */ SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "io_queues_nb", CTLTYPE_U32 | CTLFLAG_RW | CTLFLAG_MPSAFE, adapter, 0, ena_sysctl_io_queues_nb, "I", "Number of IO queues."); /* * Tuneable, which determines how often ENA metrics will be read. * 0 means it's turned off. Maximum allowed value is limited by: * ENA_METRICS_MAX_SAMPLE_INTERVAL. */ SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "stats_sample_interval", CTLTYPE_U16 | CTLFLAG_RW | CTLFLAG_MPSAFE, adapter, 0, ena_sysctl_metrics_interval, "SU", "Interval in seconds for updating network interface metrics. 0 turns off the update."); } /* Kernel option RSS prevents manipulation of key hash and indirection table. */ #ifndef RSS static void ena_sysctl_add_rss(struct ena_adapter *adapter) { device_t dev; struct sysctl_ctx_list *ctx; struct sysctl_oid *tree; struct sysctl_oid_list *child; dev = adapter->pdev; ctx = device_get_sysctl_ctx(dev); tree = device_get_sysctl_tree(dev); child = SYSCTL_CHILDREN(tree); /* RSS options */ tree = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "rss", CTLFLAG_RW | CTLFLAG_MPSAFE, NULL, "Receive Side Scaling options."); child = SYSCTL_CHILDREN(tree); /* RSS hash key */ SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "key", CTLTYPE_STRING | CTLFLAG_RW | CTLFLAG_MPSAFE, adapter, 0, ena_sysctl_rss_key, "A", "RSS key."); /* Tuneable RSS indirection table */ SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "indir_table", CTLTYPE_STRING | CTLFLAG_RW | CTLFLAG_MPSAFE, adapter, 0, ena_sysctl_rss_indir_table, "A", "RSS indirection table."); /* RSS indirection table size */ SYSCTL_ADD_INT(ctx, child, OID_AUTO, "indir_table_size", CTLFLAG_RD | CTLFLAG_MPSAFE, &ena_rss_table_size, 0, "RSS indirection table size."); } #endif /* RSS */ static void ena_sysctl_add_irq_affinity(struct ena_adapter *adapter) { device_t dev; struct sysctl_ctx_list *ctx; struct sysctl_oid *tree; struct sysctl_oid_list *child; dev = adapter->pdev; ctx = device_get_sysctl_ctx(dev); tree = device_get_sysctl_tree(dev); child = SYSCTL_CHILDREN(tree); tree = SYSCTL_ADD_NODE(ctx, child, OID_AUTO, "irq_affinity", CTLFLAG_RW | CTLFLAG_MPSAFE, NULL, "Decide base CPU and stride for IRQ affinity."); child = SYSCTL_CHILDREN(tree); /* Add base cpu leaf */ SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "base_cpu", CTLTYPE_S32 | CTLFLAG_RW | CTLFLAG_MPSAFE, adapter, 0, ena_sysctl_irq_base_cpu, "I", "Base CPU index for setting IRQ affinity."); /* Add cpu stride leaf */ SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "cpu_stride", CTLTYPE_S32 | CTLFLAG_RW | CTLFLAG_MPSAFE, adapter, 0, ena_sysctl_irq_cpu_stride, "I", "Distance between IRQs when setting affinity."); } /* * ena_sysctl_update_queue_node_nb - Register/unregister sysctl queue nodes. * * Whether the nodes are registered or unregistered depends on a delta between * the `old` and `new` parameters, representing the number of queues.
* * This function is used to hide sysctl attributes for queue nodes which aren't * currently used by the HW (e.g. after a call to `ena_sysctl_io_queues_nb`). * * NOTE: * All unregistered nodes must be registered again at detach, i.e. by a call to * this function. */ void ena_sysctl_update_queue_node_nb(struct ena_adapter *adapter, int old, int new) { struct sysctl_oid *oid; int min, max, i; min = MIN(old, new); max = MIN(MAX(old, new), adapter->max_num_io_queues); for (i = min; i < max; ++i) { oid = adapter->que[i].oid; sysctl_wlock(); if (old > new) sysctl_unregister_oid(oid); else sysctl_register_oid(oid); sysctl_wunlock(); } } static int ena_sysctl_buf_ring_size(SYSCTL_HANDLER_ARGS) { struct ena_adapter *adapter = arg1; uint32_t val; int error; ENA_LOCK_LOCK(); if (unlikely(!ENA_FLAG_ISSET(ENA_FLAG_DEVICE_RUNNING, adapter))) { error = EINVAL; goto unlock; } val = 0; error = sysctl_wire_old_buffer(req, sizeof(val)); if (error == 0) { val = adapter->buf_ring_size; error = sysctl_handle_32(oidp, &val, 0, req); } if (error != 0 || req->newptr == NULL) goto unlock; if (!powerof2(val) || val == 0) { ena_log(adapter->pdev, ERR, "Requested new Tx buffer ring size (%u) is not a power of 2\n", val); error = EINVAL; goto unlock; } if (val != adapter->buf_ring_size) { ena_log(adapter->pdev, INFO, "Requested new Tx buffer ring size: %d. Old size: %d\n", val, adapter->buf_ring_size); error = ena_update_buf_ring_size(adapter, val); } else { ena_log(adapter->pdev, ERR, "New Tx buffer ring size is the same as already used: %u\n", adapter->buf_ring_size); } unlock: ENA_LOCK_UNLOCK(); return (error); } static int ena_sysctl_rx_queue_size(SYSCTL_HANDLER_ARGS) { struct ena_adapter *adapter = arg1; uint32_t val; int error; ENA_LOCK_LOCK(); if (unlikely(!ENA_FLAG_ISSET(ENA_FLAG_DEVICE_RUNNING, adapter))) { error = EINVAL; goto unlock; } val = 0; error = sysctl_wire_old_buffer(req, sizeof(val)); if (error == 0) { val = adapter->requested_rx_ring_size; error = sysctl_handle_32(oidp, &val, 0, req); } if (error != 0 || req->newptr == NULL) goto unlock; if (val < ENA_MIN_RING_SIZE || val > adapter->max_rx_ring_size) { ena_log(adapter->pdev, ERR, "Requested new Rx queue size (%u) is out of range: [%u, %u]\n", val, ENA_MIN_RING_SIZE, adapter->max_rx_ring_size); error = EINVAL; goto unlock; } /* Check if the parameter is power of 2 */ if (!powerof2(val)) { ena_log(adapter->pdev, ERR, "Requested new Rx queue size (%u) is not a power of 2\n", val); error = EINVAL; goto unlock; } if (val != adapter->requested_rx_ring_size) { ena_log(adapter->pdev, INFO, "Requested new Rx queue size: %u. 
Old size: %u\n", val, adapter->requested_rx_ring_size); error = ena_update_queue_size(adapter, adapter->requested_tx_ring_size, val); } else { ena_log(adapter->pdev, ERR, "New Rx queue size is the same as already used: %u\n", adapter->requested_rx_ring_size); } unlock: ENA_LOCK_UNLOCK(); return (error); } /* * Change number of effectively used IO queues adapter->num_io_queues */ static int ena_sysctl_io_queues_nb(SYSCTL_HANDLER_ARGS) { struct ena_adapter *adapter = arg1; uint32_t old_num_queues, tmp = 0; int error; ENA_LOCK_LOCK(); if (unlikely(!ENA_FLAG_ISSET(ENA_FLAG_DEVICE_RUNNING, adapter))) { error = EINVAL; goto unlock; } error = sysctl_wire_old_buffer(req, sizeof(tmp)); if (error == 0) { tmp = adapter->num_io_queues; error = sysctl_handle_int(oidp, &tmp, 0, req); } if (error != 0 || req->newptr == NULL) goto unlock; if (tmp == 0) { ena_log(adapter->pdev, ERR, "Requested number of IO queues is zero\n"); error = EINVAL; goto unlock; } /* * The adapter::max_num_io_queues is the HW capability. The system * resources availability may potentially be a tighter limit. Therefore * the relation `adapter::max_num_io_queues >= adapter::msix_vecs` * always holds true, while the `adapter::msix_vecs` is variable across * device reset (`ena_destroy_device()` + `ena_restore_device()`). */ if (tmp > (adapter->msix_vecs - ENA_ADMIN_MSIX_VEC)) { ena_log(adapter->pdev, ERR, "Requested number of IO queues is higher than maximum allowed (%u)\n", adapter->msix_vecs - ENA_ADMIN_MSIX_VEC); error = EINVAL; goto unlock; } if (tmp == adapter->num_io_queues) { ena_log(adapter->pdev, ERR, "Requested number of IO queues is equal to current value " "(%u)\n", adapter->num_io_queues); } else { ena_log(adapter->pdev, INFO, "Requested new number of IO queues: %u, current value: " "%u\n", tmp, adapter->num_io_queues); old_num_queues = adapter->num_io_queues; error = ena_update_io_queue_nb(adapter, tmp); if (error != 0) goto unlock; ena_sysctl_update_queue_node_nb(adapter, old_num_queues, tmp); } unlock: ENA_LOCK_UNLOCK(); return (error); } static int ena_sysctl_metrics_interval(SYSCTL_HANDLER_ARGS) { struct ena_adapter *adapter = arg1; uint16_t interval; int error; ENA_LOCK_LOCK(); if (unlikely(!ENA_FLAG_ISSET(ENA_FLAG_DEVICE_RUNNING, adapter))) { error = EINVAL; goto unlock; } error = sysctl_wire_old_buffer(req, sizeof(interval)); if (error == 0) { interval = adapter->metrics_sample_interval; error = sysctl_handle_16(oidp, &interval, 0, req); } if (error != 0 || req->newptr == NULL) goto unlock; if (interval > ENA_METRICS_MAX_SAMPLE_INTERVAL) { ena_log(adapter->pdev, ERR, "ENA metrics update interval is out of range - maximum allowed value: %d seconds\n", ENA_METRICS_MAX_SAMPLE_INTERVAL); error = EINVAL; goto unlock; } if (interval == 0) { ena_log(adapter->pdev, INFO, "ENA metrics update is now turned off\n"); bzero(&adapter->eni_metrics, sizeof(adapter->eni_metrics)); } else { ena_log(adapter->pdev, INFO, "ENA metrics update interval is set to: %" PRIu16 " seconds\n", interval); } adapter->metrics_sample_interval = interval; unlock: ENA_LOCK_UNLOCK(); return (error); } static int ena_sysctl_irq_base_cpu(SYSCTL_HANDLER_ARGS) { struct ena_adapter *adapter = arg1; int irq_base_cpu = 0; int error; ENA_LOCK_LOCK(); if (unlikely(!ENA_FLAG_ISSET(ENA_FLAG_DEVICE_RUNNING, adapter))) { error = ENODEV; goto unlock; } error = sysctl_wire_old_buffer(req, sizeof(irq_base_cpu)); if (error == 0) { irq_base_cpu = adapter->irq_cpu_base; error = sysctl_handle_int(oidp, &irq_base_cpu, 0, req); } if (error != 0 || req->newptr == NULL) goto
unlock; if (irq_base_cpu <= ENA_BASE_CPU_UNSPECIFIED) { ena_log(adapter->pdev, ERR, "Requested base CPU is less than zero.\n"); error = EINVAL; goto unlock; } if (irq_base_cpu > mp_ncpus) { ena_log(adapter->pdev, INFO, "Requested base CPU is larger than the number of available CPUs. \n"); error = EINVAL; goto unlock; } if (irq_base_cpu == adapter->irq_cpu_base) { ena_log(adapter->pdev, INFO, "Requested IRQ base CPU is equal to current value " "(%d)\n", adapter->irq_cpu_base); goto unlock; } ena_log(adapter->pdev, INFO, "Requested new IRQ base CPU: %d, current value: %d\n", irq_base_cpu, adapter->irq_cpu_base); error = ena_update_base_cpu(adapter, irq_base_cpu); unlock: ENA_LOCK_UNLOCK(); return (error); } static int ena_sysctl_irq_cpu_stride(SYSCTL_HANDLER_ARGS) { struct ena_adapter *adapter = arg1; int32_t irq_cpu_stride = 0; int error; ENA_LOCK_LOCK(); if (unlikely(!ENA_FLAG_ISSET(ENA_FLAG_DEVICE_RUNNING, adapter))) { error = ENODEV; goto unlock; } error = sysctl_wire_old_buffer(req, sizeof(irq_cpu_stride)); if (error == 0) { irq_cpu_stride = adapter->irq_cpu_stride; error = sysctl_handle_int(oidp, &irq_cpu_stride, 0, req); } if (error != 0 || req->newptr == NULL) goto unlock; if (irq_cpu_stride < 0) { ena_log(adapter->pdev, ERR, "Requested IRQ stride is less than zero.\n"); error = EINVAL; goto unlock; } if (irq_cpu_stride > mp_ncpus) { ena_log(adapter->pdev, INFO, "Warning: Requested IRQ stride is larger than the number of available CPUs.\n"); } if (irq_cpu_stride == adapter->irq_cpu_stride) { ena_log(adapter->pdev, INFO, "Requested IRQ CPU stride is equal to current value " "(%u)\n", adapter->irq_cpu_stride); goto unlock; } ena_log(adapter->pdev, INFO, "Requested new IRQ CPU stride: %u, current value: %u\n", irq_cpu_stride, adapter->irq_cpu_stride); error = ena_update_cpu_stride(adapter, irq_cpu_stride); if (error != 0) goto unlock; unlock: ENA_LOCK_UNLOCK(); return (error); } #ifndef RSS /* * Change the Receive Side Scaling hash key. */ static int ena_sysctl_rss_key(SYSCTL_HANDLER_ARGS) { struct ena_adapter *adapter = arg1; struct ena_com_dev *ena_dev = adapter->ena_dev; enum ena_admin_hash_functions ena_func; char msg[ENA_HASH_KEY_MSG_SIZE]; char elem[3] = { 0 }; char *endp; u8 rss_key[ENA_HASH_KEY_SIZE]; int error, i; ENA_LOCK_LOCK(); if (unlikely(!ENA_FLAG_ISSET(ENA_FLAG_DEVICE_RUNNING, adapter))) { error = EINVAL; goto unlock; } if (unlikely(!ENA_FLAG_ISSET(ENA_FLAG_RSS_ACTIVE, adapter))) { error = ENOTSUP; goto unlock; } error = sysctl_wire_old_buffer(req, sizeof(msg)); if (error != 0) goto unlock; error = ena_com_get_hash_function(adapter->ena_dev, &ena_func); if (error != 0) { device_printf(adapter->pdev, "Cannot get hash function\n"); goto unlock; } if (ena_func != ENA_ADMIN_TOEPLITZ) { error = EINVAL; device_printf(adapter->pdev, "Unsupported hash algorithm\n"); goto unlock; } error = ena_rss_get_hash_key(ena_dev, rss_key); if (error != 0) { device_printf(adapter->pdev, "Cannot get hash key\n"); goto unlock; } for (i = 0; i < ENA_HASH_KEY_SIZE; ++i) snprintf(&msg[i * 2], 3, "%02x", rss_key[i]); error = sysctl_handle_string(oidp, msg, sizeof(msg), req); if (error != 0 || req->newptr == NULL) goto unlock; if (strlen(msg) != sizeof(msg) - 1) { error = EINVAL; device_printf(adapter->pdev, "Invalid key size\n"); goto unlock; } for (i = 0; i < ENA_HASH_KEY_SIZE; ++i) { strncpy(elem, &msg[i * 2], 2); rss_key[i] = strtol(elem, &endp, 16); /* Both hex nibbles in the string must be valid to continue. 
*/ if (endp == elem || *endp != '\0' || rss_key[i] < 0) { error = EINVAL; device_printf(adapter->pdev, "Invalid key hex value: '%c'\n", *endp); goto unlock; } } error = ena_rss_set_hash(ena_dev, rss_key); if (error != 0) device_printf(adapter->pdev, "Cannot fill hash key\n"); unlock: ENA_LOCK_UNLOCK(); return (error); } /* * Change the Receive Side Scaling indirection table. * * The sysctl entry string consists of one or more `x:y` keypairs, where * x stands for the table index and y for its new value. * Table indices that don't need to be updated can be omitted from the string * and will retain their existing values. If an index is entered more than once, * the last value is used. * * Example: * To update two selected indices in the RSS indirection table, e.g. setting * index 0 to queue 5 and then index 5 to queue 0, the below command should be * used: * sysctl dev.ena.0.rss.indir_table="0:5 5:0" */ static int ena_sysctl_rss_indir_table(SYSCTL_HANDLER_ARGS) { int num_queues, error; struct ena_adapter *adapter = arg1; struct ena_indir *indir; char *msg, *buf, *endp; uint32_t idx, value; ENA_LOCK_LOCK(); if (unlikely(!ENA_FLAG_ISSET(ENA_FLAG_DEVICE_RUNNING, adapter))) { error = EINVAL; goto unlock; } if (unlikely(!ENA_FLAG_ISSET(ENA_FLAG_RSS_ACTIVE, adapter))) { error = ENOTSUP; goto unlock; } indir = adapter->rss_indir; if (unlikely(indir == NULL)) { error = ENOTSUP; goto unlock; } msg = indir->sysctl_buf; error = sysctl_handle_string(oidp, msg, sizeof(indir->sysctl_buf), req); if (error != 0 || req->newptr == NULL) goto unlock; num_queues = adapter->num_io_queues; /* * This sysctl expects msg to be a list of `x:y` record pairs, * where x is the indirection table index and y is its value. */ for (buf = msg; *buf != '\0'; buf = endp) { idx = strtol(buf, &endp, 10); if (endp == buf || idx < 0) { device_printf(adapter->pdev, "Invalid index: %s\n", buf); error = EINVAL; break; } if (idx >= ENA_RX_RSS_TABLE_SIZE) { device_printf(adapter->pdev, "Index %d out of range\n", idx); error = ERANGE; break; } buf = endp; if (*buf++ != ':') { device_printf(adapter->pdev, "Missing ':' separator\n"); error = EINVAL; break; } value = strtol(buf, &endp, 10); if (endp == buf || value < 0) { device_printf(adapter->pdev, "Invalid value: %s\n", buf); error = EINVAL; break; } if (value >= num_queues) { device_printf(adapter->pdev, "Value %d out of range\n", value); error = ERANGE; break; } indir->table[idx] = value; } if (error != 0) /* Reload indirection table with last good data. */ ena_rss_indir_get(adapter, indir->table); /* At this point msg has been clobbered by sysctl_handle_string. */ ena_rss_copy_indir_buf(msg, indir->table); if (error == 0) error = ena_rss_indir_set(adapter, indir->table); unlock: ENA_LOCK_UNLOCK(); return (error); } #endif /* RSS */ diff --git a/sys/dev/ena/ena_sysctl.h b/sys/dev/ena/ena_sysctl.h index e9b4bfaae1cb..4f5834214005 100644 --- a/sys/dev/ena/ena_sysctl.h +++ b/sys/dev/ena/ena_sysctl.h @@ -1,51 +1,51 @@ /*- * SPDX-License-Identifier: BSD-2-Clause * - * Copyright (c) 2015-2021 Amazon.com, Inc. or its affiliates. + * Copyright (c) 2015-2023 Amazon.com, Inc. or its affiliates. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2.
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * */ #ifndef ENA_SYSCTL_H #define ENA_SYSCTL_H #include #include #include "ena.h" void ena_sysctl_add_nodes(struct ena_adapter *adapter); void ena_sysctl_update_queue_node_nb(struct ena_adapter *adapter, int old, int new); int ena_sysctl_allocate_customer_metrics_buffer(struct ena_adapter *adapter); extern int ena_enable_9k_mbufs; #define ena_mbuf_sz (ena_enable_9k_mbufs ? MJUM9BYTES : MJUMPAGESIZE) /* Force the driver to use large LLQ (Low Latency Queue) headers. */ extern bool ena_force_large_llq_header; #endif /* !(ENA_SYSCTL_H) */ diff --git a/sys/modules/ena/Makefile b/sys/modules/ena/Makefile index 95326888cc6c..f8b71588afa1 100644 --- a/sys/modules/ena/Makefile +++ b/sys/modules/ena/Makefile @@ -1,42 +1,42 @@ # # SPDX-License-Identifier: BSD-2-Clause # -# Copyright (c) 2015-2020 Amazon.com, Inc. or its affiliates. +# Copyright (c) 2015-2023 Amazon.com, Inc. or its affiliates. # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT # OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. # # .PATH: ${SRCTOP}/sys/dev/ena \ ${SRCTOP}/sys/contrib/ena-com KMOD = if_ena SRCS = ena_com.c ena_eth_com.c SRCS += ena.c ena_sysctl.c ena_datapath.c ena_netmap.c ena_rss.c SRCS += device_if.h bus_if.h pci_if.h SRCS += opt_rss.h CFLAGS += -I${SRCTOP}/sys/contrib .include