
D5794.id17796.diff
Index: contrib/ofed/libmlx5/AUTHORS
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/AUTHORS
@@ -0,0 +1 @@
+Eli Cohen <eli@mellanox.com>
Index: contrib/ofed/libmlx5/COPYING
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/COPYING
@@ -0,0 +1,378 @@
+This software is available to you under a choice of one of two
+licenses. You may choose to be licensed under the terms of the
+OpenIB.org BSD license or the GNU General Public License (GPL) Version
+2, both included below.
+
+Copyright (c) 2007 Cisco, Inc. All rights reserved.
+
+==================================================================
+
+ OpenIB.org BSD license
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+ * Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+
+ * Redistributions in binary form must reproduce the above
+ copyright notice, this list of conditions and the following
+ disclaimer in the documentation and/or other materials provided
+ with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+
+==================================================================
+
+ GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
+ 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+ We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ GNU GENERAL PUBLIC LICENSE
+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+ 0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term "modification".) Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+ 1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+ 2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+ a) You must cause the modified files to carry prominent notices
+ stating that you changed the files and the date of any change.
+
+ b) You must cause any work that you distribute or publish, that in
+ whole or in part contains or is derived from the Program or any
+ part thereof, to be licensed as a whole at no charge to all third
+ parties under the terms of this License.
+
+ c) If the modified program normally reads commands interactively
+ when run, you must cause it, when started running for such
+ interactive use in the most ordinary way, to print or display an
+ announcement including an appropriate copyright notice and a
+ notice that there is no warranty (or else, saying that you provide
+ a warranty) and that users may redistribute the program under
+ these conditions, and telling the user how to view a copy of this
+ License. (Exception: if the Program itself is interactive but
+ does not normally print such an announcement, your work based on
+ the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+ 3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+ a) Accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of Sections
+ 1 and 2 above on a medium customarily used for software interchange; or,
+
+ b) Accompany it with a written offer, valid for at least three
+ years, to give any third party, for a charge no more than your
+ cost of physically performing source distribution, a complete
+ machine-readable copy of the corresponding source code, to be
+ distributed under the terms of Sections 1 and 2 above on a medium
+ customarily used for software interchange; or,
+
+ c) Accompany it with the information you received as to the offer
+ to distribute corresponding source code. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form with such
+ an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+ 4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+ 5. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+ 6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+ 7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+ 8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+ 9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+ 10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+ NO WARRANTY
+
+ 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) year name of author
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+ `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+ <signature of Ty Coon>, 1 April 1989
+ Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Library General
+Public License instead of this License.
Index: contrib/ofed/libmlx5/Makefile.am
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/Makefile.am
@@ -0,0 +1,35 @@
+AM_CFLAGS = -g -Wall -Werror -D_GNU_SOURCE -I$(includedir)
+LDFLAGS += @NUMA_LIB@
+EXTRA_DIST = src/mlx5.map libmlx5.spec.in mlx5.driver
+EXTRA_DIST += debian
+EXTRA_DIST += autogen.sh
+EXTRA_DIST += scripts/expose_libmlx5_headers/libmlx_expose_headers scripts/expose_libmlx5_headers/defines.txt scripts/expose_libmlx5_headers/structures.txt scripts/expose_libmlx5_headers/enumerations.txt
+EXTRA_DIST += libmlx5.spec
+
+
+mlx5_version_script = @MLX5_VERSION_SCRIPT@
+
+MLX5_SOURCES = src/buf.c src/cq.c src/dbrec.c src/mlx5.c src/qp.c src/srq.c src/verbs.c src/implicit_lkey.c
+noinst_HEADERS = src/bitmap.h src/doorbell.h src/list.h src/mlx5-abi.h src/mlx5.h src/wqe.h src/implicit_lkey.h
+
+
+if HAVE_IBV_DEVICE_LIBRARY_EXTENSION
+ lib_LTLIBRARIES = src/libmlx5.la
+ src_libmlx5_la_SOURCES = $(MLX5_SOURCES)
+ src_libmlx5_la_LDFLAGS = -avoid-version -release @IBV_DEVICE_LIBRARY_EXTENSION@ \
+ $(mlx5_version_script)
+ mlx5confdir = $(sysconfdir)/libibverbs.d
+ mlx5conf_DATA = mlx5.driver
+else
+ mlx5libdir = $(libdir)/infiniband
+ mlx5lib_LTLIBRARIES = src/mlx5.la
+ src_mlx5_la_SOURCES = $(MLX5_SOURCES)
+ src_mlx5_la_LDFLAGS = -avoid-version -module $(mlx5_version_script)
+endif
+
+install-data-hook:
+ mkdir -p $(DESTDIR)$(prefix)/include/infiniband
+ $(top_srcdir)/scripts/expose_libmlx5_headers/libmlx_expose_headers $(top_srcdir)/scripts/expose_libmlx5_headers/defines.txt $(top_srcdir)/scripts/expose_libmlx5_headers/structures.txt $(top_srcdir)/scripts/expose_libmlx5_headers/enumerations.txt $(DESTDIR)$(prefix)
+
+uninstall-hook:
+ rm -f $(DESTDIR)$(prefix)/include/infiniband/mlx5_hw.h
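For context on the Makefile.am hunk above: when HAVE_IBV_DEVICE_LIBRARY_EXTENSION is set, the library installs as a versioned libmlx5 plus a `mlx5.driver` file in `$(sysconfdir)/libibverbs.d`, which is how libibverbs discovers provider libraries at runtime. Such a driver file conventionally contains a single line naming the provider stem; a sketch of what `mlx5.driver` is expected to hold (the file shipped in the package is authoritative):

```
driver mlx5
```

libibverbs appends the stem to `libmlx5-` plus the library extension when dlopening the provider, which is why the else branch instead installs an unversioned module into `$(libdir)/infiniband`.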
Index: contrib/ofed/libmlx5/README
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/README
@@ -0,0 +1,4 @@
+Introduction
+============
+
+Original file content erased. Proper content will be added once coding is finished.
Index: contrib/ofed/libmlx5/autogen.sh
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/autogen.sh
@@ -0,0 +1,7 @@
+#!/bin/sh -exE
+
+aclocal -I config
+libtoolize --force --copy
+autoheader
+automake --foreign --add-missing --copy
+autoconf
Index: contrib/ofed/libmlx5/config/.gitignore
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/config/.gitignore
@@ -0,0 +1,8 @@
+mkinstalldirs
+depcomp
+compile
+missing
+config.guess
+config.sub
+ltmain.sh
+install-sh
Index: contrib/ofed/libmlx5/configure.ac
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/configure.ac
@@ -0,0 +1,114 @@
+dnl Process this file with autoconf to produce a configure script.
+
+AC_PREREQ(2.57)
+AC_INIT(libmlx5, 1.0.2mlnx1, linux-rdma@vger.kernel.org)
+AC_CONFIG_SRCDIR([src/mlx5.h])
+AC_CONFIG_AUX_DIR(config)
+AC_CONFIG_HEADER(config.h)
+AM_INIT_AUTOMAKE([1.10 foreign tar-ustar silent-rules subdir-objects])
+m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
+
+AC_PROG_LIBTOOL
+LT_INIT
+
+AC_ARG_WITH([valgrind],
+ AC_HELP_STRING([--with-valgrind],
+ [Enable Valgrind annotations (small runtime overhead, default NO)]))
+if test x$with_valgrind = x || test x$with_valgrind = xno; then
+ want_valgrind=no
+ AC_DEFINE([NVALGRIND], 1, [Define to 1 to disable Valgrind annotations.])
+else
+ want_valgrind=yes
+ if test -d $with_valgrind; then
+ CPPFLAGS="$CPPFLAGS -I$with_valgrind/include"
+ fi
+fi
+
+AC_ARG_WITH([mlx5_debug],
+ AC_HELP_STRING([--with-mlx5_debug],
+ [Enable extensive debug prints from libmlx5 (default NO)]))
+if test x$with_mlx5_debug = xyes; then
+ CFLAGS="$CFLAGS -DMLX5_DEBUG"
+fi
+
+CFLAGS="$CFLAGS -Werror"
+
+dnl Checks for programs
+AC_PROG_CC
+
+dnl Checks for libraries
+AC_CHECK_LIB(numa, numa_node_of_cpu,
+ [
+ have_numa=yes
+ AC_DEFINE(HAVE_NUMA, 1, [adding numa support])
+ ],
+ [
+ have_numa=no
+ ]
+)
+
+AC_CHECK_LIB(ibverbs, ibv_get_device_list, [],
+ AC_MSG_ERROR([ibv_get_device_list() not found. libmlx5 requires libibverbs.]))
+
+AC_CHECK_LIB(ibverbs, ibv_register_driver_ext,
+ AC_DEFINE(HAVE_IBV_EXT, 1, [adding verbs extension support]))
+
+dnl Checks for header files.
+AC_CHECK_HEADER(infiniband/driver.h, [],
+ AC_MSG_ERROR([<infiniband/driver.h> not found. libmlx5 requires libibverbs.]))
+AC_HEADER_STDC
+
+if test x$want_valgrind = xyes; then
+ AC_CHECK_HEADER(valgrind/memcheck.h,
+ [AC_DEFINE(HAVE_VALGRIND_MEMCHECK_H, 1,
+ [Define to 1 if you have the <valgrind/memcheck.h> header file.])],
+ [if test $want_valgrind = yes; then
+ AC_MSG_ERROR([Valgrind memcheck support requested, but <valgrind/memcheck.h> not found.])
+ fi])
+fi
+
+dnl Checks for typedefs, structures, and compiler characteristics.
+AC_C_CONST
+AC_CHECK_SIZEOF(long)
+
+dnl Checks for library functions
+AC_CHECK_FUNC(ibv_read_sysfs_file, [],
+ AC_MSG_ERROR([ibv_read_sysfs_file() not found. libmlx5 requires libibverbs >= 1.0.3.]))
+AC_CHECK_FUNCS(ibv_dontfork_range ibv_dofork_range ibv_register_driver)
+
+dnl Now check for libibverbs 1.0 vs 1.1
+dummy=if$$
+cat <<IBV_VERSION > $dummy.c
+#include <infiniband/driver.h>
+IBV_DEVICE_LIBRARY_EXTENSION
+IBV_VERSION
+IBV_DEVICE_LIBRARY_EXTENSION=`$CC $CPPFLAGS -E $dummy.c 2> /dev/null | tail -1`
+rm -f $dummy.c
+AM_CONDITIONAL(HAVE_IBV_DEVICE_LIBRARY_EXTENSION,
+ test $IBV_DEVICE_LIBRARY_EXTENSION != IBV_DEVICE_LIBRARY_EXTENSION)
+AC_SUBST(IBV_DEVICE_LIBRARY_EXTENSION)
+
+AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script,
+ [if test -n "`$LD --help < /dev/null 2>/dev/null | grep version-script`"; then
+ ac_cv_version_script=yes
+ else
+ ac_cv_version_script=no
+ fi])
+
+if test $ac_cv_version_script = yes; then
+ MLX5_VERSION_SCRIPT='-Wl,--version-script=$(srcdir)/src/mlx5.map'
+else
+ MLX5_VERSION_SCRIPT=
+fi
+AC_SUBST(MLX5_VERSION_SCRIPT)
+
+if test $have_numa = yes; then
+ NUMA_LIB='-lnuma'
+else
+ NUMA_LIB=
+fi
+AC_SUBST(NUMA_LIB)
+
+
+AC_CONFIG_FILES([Makefile libmlx5.spec])
+AC_OUTPUT
Index: contrib/ofed/libmlx5/debian/changelog
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/debian/changelog
@@ -0,0 +1,208 @@
+libmlx5 (1.0.2mlnx1-1) unstable; urgency=low
+
+ * adjust mlx5_hw.h to survive -Werror during autotools
+ * Add support for compact AV
+ * fix warning in print with wrong type at mlx5_dbg
+ * Added report of GRH in wc_flags to DC CQE.
+ * configure.ac: Not compiling with valgrind if flag not set
+ * Fix masked atomic args check while post send
+ * fix bug of setting ctrl_seg opc_mod to wrong variable
+ * Add ConnectX-4 device
+
+ -- Alaa Hleihel <alaa@mellanox.com> Wed, 15 Apr 2015 18:47:10 +0200
+
+libmlx5 (1.0.1mlnx2-1) unstable; urgency=low
+
+ * libmlx5: ibv_exp_create_mr with IBV_EXP_MR_SIGNATURE_EN return error.
+ * libmlx5: revert the endianness fix for immediate data
+ * libmlx5: fix gcc version query define
+ * libmlx5: use wc_auto_evict_size instead of wc_flush
+ * libmlx5: Fixed Immediate data endianness.
+ * libmlx5: copy the correct pd into odp_data.pd
+ * libmlx5: fix debug mode
+ * libmlx5: fix bug in umr
+ * Replace getenv to use ibv_exp_getenv
+ * libmlx5: fix contiguous page registration size.
+ * libmlx5: valgrind errors on modify_cq
+ * libmlx5: fix compilation bugs for old gcc
+ * Modify to use verbs specific getenv
+ * libmlx5: fixed overrun bug in resize cq
+ * libmlx5: Add general and code restructuring optimizations
+ * libmlx5: Reset opmod for each wr
+ * libmlx5: Optimize usage of memory barrier and locks
+ * libmlx5: destroy remote implicit rkey during PD deallocation
+ * libmlx5.spec.in: Changed valgrind libs DESTDIR
+ * Added valgrind support
+ * fixed and added valgrind Macros
+ * Adding experimental dereg_mr support
+ * libmlx5: added support to choose specific addr when using contig_pages
+ * libmlx5: fixed wraparound problem in umr creation
+ * libmlx5: added -Werror to Makefile.am
+ * libmlx5: Fix send wq calculation
+ * libmlx5: added max_inl_send_klmx check in create qp
+ * libmlx5: UMR API change
+ * libmlx5: Fix create qp errno value from EINVAL to ENOMEM
+ * libmlx5: Handle send queue wraparound in extended atomics
+ * libmlx5: fail on create QP if enable umr without comp_mask
+ * libmlx5: add check max device sge when creating QP
+ * libmlx5: update opcode in bad completion
+ * libmlx5: corrections after changes in ibv_exp_prefetch_attr
+ * libmlx5: change return value on ibv_post_srq_recv
+ * libmlx5: fix segfault on modify QP
+ * libmlx5: Verify max atomic arg size
+ * libmlx5: Fix failure to set inline for invalidate
+ * libmlx5.spec.in: use %{_prefix} instead of /usr
+ * libmlx5.spec.in: Support configure_options.
+ * Makefile.am: add implicit_lkey.h to noinst_HEADERS
+ * libibverbs: Fix immediate error detection based on IB spec
+ * libmlx5: Avoid creating AH with DLID 0
+ * configure: Update AM_INIT_AUTOMAKE to support new auto tools.
+ * libmlx5: fix compilation warning on 32bit arch
+ * libmlx5: remove attr_size from ibv_exp_prefetch_mr verbs.
+ * libmlx5: prefetch implicit rkey MRs when registering a relaxed MR
+ * libmlx5: Properly set the parameters of mrs created implicitly.
+ * libmlx5: Fix broken build on XEN server
+ * libmlx5: Fail post send if not in RTS
+ * libmlx5: Fix HW limitation in atomic response scatter entry
+ * libmlx5: Add missing fp in case of debug build
+ * libmlx5: Indicate UMR support at create
+ * libmlx5: fix compilation warning on Xen.
+ * libmlx5: fix compilation warning on newer gcc.
+ * libmlx5: fix compilation warning on 32bit arch
+ * libmlx5: add remote implicit mr support.
+ * libmlx5: Add implicit-lkey support.
+ * libmlx5: add support for the new ibv_exp_prefetch_mr verb.
+ * libmlx5: use $includedir to search for include files.
+ * libmlx5: change ibv_exp_reg_mr to call ibv_cmd_exp_reg_mr.
+ * libmlx5: fix reported size of verbs device struct.
+ * libmlx5: Add completion opcodes for masked atomic operations
+ * libmlx5: Fix bug taking args from wrong place
+ * libmlx5: Re-work UMR API
+ * libmlx5: Fix wrong calculation of translation size
+ * libmlx5: Add work completion opcode for UMR ops
+ * libmlx5: Fix DC size report to be a mask value
+ * BUILD: fix make checkdist and install datahook to respect $prefix
+ * libmlx5: Fix alignment problem
+ * libmlx5: Minor fixes to post send
+ * libmlx5: Use correct comp_mask in inline KLMs indication
+ * libmlx5: Fix compilation issues on 32 bit archs
+ * libmlx5: Add UMR support
+ * libmlx5: Add support for send NOP
+ * libmlx5: Simplify extended atomics API
+ * libmlx5: Fix compiler warning - unused variable
+ * scripts/expose_libmlx5_headers: install to the correct directory
+ * libmlx5: Fix endianness of atomics > 8 bytes
+ * libmlx5: Add support for Connect-IB virtual function
+ * libmlx5: Fix point type in ext_cmp_swp and ext_fetch_add
+ * libmlx5: fix 32b host compilation issue
+ * libmlx5: Add extended atomic support
+ * scripts/expose_libmlx5_headers: update the structures.txt file.
+ * libmlx5: Avoid overflow on mlx5_get_block_order()
+ * Revert "libmlx5: Fix log function to avoid overflow"
+ * Revert "libmlx5: Fix corner case in mlx5_get_block_order"
+ * libmlx5: Fix workaround for XRC
+ * libmlx5: Fix seg fault in poll_cq
+ * libmlx5: Fix corner case in mlx5_get_block_order
+ * libmlx5: Fix broken report on srq_qp
+ * libmlx5: fix refcnt for xrc
+ * libmlx5: Fix overflow on flag mask
+ * libmlx5: Fix log function to avoid overflow
+ * libmlx5: Fix variable overflow
+ * libmlx5: Return SRQ number in src_qp for XRC legacy
+ * libmlx5: update qp state on exp modify qp
+ * libmlx5: improve experimental interface
+ * libmlx5: Clear destroyed QP for resource table
+ * Change imm_data to ex.imm_data
+ * libmlx5: change wc_size from int to uint32_t.
+ * libmlx5: Fix sq overhead calculation
+ * libmlx5: Drain DCT CQEs when destroyed
+ * libmlx5.spec.in: Remove hard coded name and version from the Source
+
+ -- Vladimir Sokolovsky <vlad@mellanox.com> Wed, 10 Dec 2014 10:53:10 +0200
+
+libmlx5 (1.0.1mlnx1-1) unstable; urgency=low
+
+ * libmlx5: Fix reported max SGE
+ * libmlx5: Add support for experimental atomics
+ * libmlx5: Fix corruption of legacy xrc domain
+ * libmlx5: added a new script that exposes specific structures, enumerations and defines from the libmlx5 sources to a new header file.
+ * libmlx5: Fix return codes from post send/recv
+ * libmlx5: Use new mlx5_alloc_ucontext to allow BF
+ * libmlx5: fix write on non existing exp_wc_flags field
+ * libmlx5: Add support for ARM DCT
+ * libmlx5: Align verbs interface with upstream
+ * libmlx5: add ibv_exp_reg_mr experimental verb
+ * libmlx5: Change legacy extended verbs to experimental verbs
+ * libmlx5: Change legacy extended uverbs to experimental uverbs
+ * Enable contiguous pages for Control resources by default
+ * libmlx5: Do not publish support for IBV_CALC_OP_MAXLOC
+ * libmlx5: Follow API changes in libibverbs
+ * libmlx5: Fix memory leak in destroy DCT
+ * libmlx5: Optimize post send for CD operations
+ * libmlx5: Remove valgrind statement from mlx5_poll_one
+ * libmlx5: Fix valgrind error on Debian 7.1
+ * libmlx5: Fix overflow handling in resize CQ
+ * libmlx5: Fix leak in destroy srq
+ * libmlx5: Fix destroy DCT
+ * libmlx5: Fix resize CQ
+ * libmlx5: Add missing defines
+ * libmlx5: Change sandy bridge work around algorithm
+ * libmlx5: add debian support to EXTRA_DIST
+ * libmlx5: add support for "git review" command line gerrit tool
+ * libmlx5: Fix "make distcheck"
+ * libmlx5: Fix create QP extended flow
+ * libmlx5: Fix resize CQ missing mask
+ * libmlx5: Add Cross-channel capability
+ * libmlx5: Add mlx5_post_task
+ * libmlx5: Add CALC capabilities information into mlx5_query_device_ex
+ * libmlx5: Support Cross-channel capability in mlx5_drv_create_qp
+ * libmlx5: Add new opcodes to support Cross-channel
+ * libmlx5: Add support for inline receive new API
+ * mlx5: Add support for reading DC capabilities
+ * libmlx5: Fix XRC poll CQ flow
+ * libmlx5: Return DC related objects in query
+ * Revert "Revert "libmlx5: Remove deprecated enum IBV_QPT_DCT""
+ * Revert "libmlx5: Remove deprecated enum IBV_QPT_DCT"
+ * libmlx5: Remove deprecated enum IBV_QPT_DCT
+ * libmlx5: Move DC calls to experimental verbs files
+ * libmlx5: Avoid clearing unused struct
+ * libmlx5: Fix justified compile warnings on debian
+ * libmlx5: Modify support for DC
+ * libmlx5: Change call to experimental create qp
+ * libmlx5: Add support for resize cq
+ * libmlx5: poll cq may report grh indication for non UD QPs
+ * libmlx5: Remove/rename mentions of mlx4
+ * libmlx5: Fix broken uuar allocator
+ * libmlx5: Add support for create CQ extended
+ * libmlx5: add support for modify cq
+ * libmlx5: add support for query device extended
+ * libmlx5: avoid free of un-allocated pointer
+ * Avoid allocating receive buffer for QPs without receive queue
+ * Fix signature calculation on receive queues
+ * Disable atomic operations
+ * libmlx5: Avoid returning negative values of errno
+ * libmlx5: fix srq free in destroy qp
+ * call mlx5_store/clear_qp() only when there are wqes
+ * libmlx5: Add adaptive stall mechanism for cq in sandy bridge
+ * libmlx5: On destroy qp remove pending cqe only by their qpn
+ * Fix copy to scat
+ * Fix leak in destroy SRQ
+ * libmlx5: Fix scatter to CQE
+ * libmlx5: XRC compat support
+ * Fix returned values in create QP
+ * libmlx5: Add DC support
+ * Work around for recovery problem in UoF
+ * Fix failure when mixed SRQ and QP report to CQ
+ * Add env variable to shut down blueflame
+ * Change default SB loop count
+ * Control action on error CQE
+ * mlx5: add XRC support
+ * mlx5: move call to single_threaded_app() to mlx5.c
+
+ -- Vladimir Sokolovsky <vlad@mellanox.com> Sun, 23 Mar 2014 14:16:10 +0200
+
+libmlx5 (1.0.0-1) unstable; urgency=low
+
+ * New Mellanox release.
+
+ -- Vladimir Sokolovsky <vlad@mellanox.com> Mon, 7 Jan 2013 13:38:10 +0200
Index: contrib/ofed/libmlx5/debian/compat
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/debian/compat
@@ -0,0 +1 @@
+7
Index: contrib/ofed/libmlx5/debian/control
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/debian/control
@@ -0,0 +1,47 @@
+Source: libmlx5
+Priority: extra
+Maintainer: Eli Cohen <eli@mellanox.com>
+Build-Depends: debhelper (>= 7.0.50~), dpkg-dev (>= 1.13.19), libibverbs-dev (>= 1.1.3)
+Standards-Version: 3.9.2
+Section: libs
+Homepage: http://www.openfabrics.org/
+
+Package: libmlx5-1
+Section: libs
+Architecture: any
+Depends: ${shlibs:Depends}, ${misc:Depends}, libibverbs1 (>= 1.1.3)
+Description: Userspace driver for Mellanox ConnectX InfiniBand HCAs
+ libmlx5 is a device-specific driver for Mellanox Connect-IB InfiniBand
+ host channel adapters (HCAs) for the libibverbs library. This allows
+ userspace processes to access Mellanox HCA hardware directly with
+ low latency and low overhead.
+ .
+ This package contains the loadable plug-in.
+
+Package: libmlx5-dev
+Section: libdevel
+Architecture: any
+Depends: ${misc:Depends}, libmlx5-1 (= ${binary:Version})
+Description: Development files for the libmlx5 driver
+ libmlx5 is a device-specific driver for Mellanox Connect-IB InfiniBand
+ host channel adapters (HCAs) for the libibverbs library. This allows
+ userspace processes to access Mellanox HCA hardware directly with
+ low latency and low overhead.
+ .
+ This package contains static versions of libmlx5 that may be linked
+ directly to an application, which may be useful for debugging.
+
+Package: libmlx5-1-dbg
+Section: debug
+Priority: extra
+Architecture: any
+Depends: ${misc:Depends}, libmlx5-1 (= ${binary:Version})
+Description: Debugging symbols for the libmlx5 driver
+ libmlx5 is a device-specific driver for Mellanox Connect-IB InfiniBand
+ host channel adapters (HCAs) for the libibverbs library. This allows
+ userspace processes to access Mellanox HCA hardware directly with
+ low latency and low overhead.
+ .
+ This package contains the debugging symbols associated with
+ libmlx5-1. They will automatically be used by gdb for debugging
+ libmlx5-related issues.
Index: contrib/ofed/libmlx5/debian/copyright
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/debian/copyright
@@ -0,0 +1,43 @@
+Initial Debianization:
+This package was debianized by Roland Dreier <rolandd@cisco.com> on
+Fri, 6 Apr 2007 10:04:57 -0700
+
+Source:
+It was downloaded from the OpenFabrics web site at
+<https://openfabrics.org/downloads/mlx5/>
+
+Authors:
+ Roland Dreier <rolandd@cisco.com>
+
+Portions are copyrighted by:
+ * Copyright (c) 2005, 2006, 2007 Cisco Systems. All rights reserved.
+ * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved.
+ * Copyright (c) 2005 Mellanox Technologies Ltd. All rights reserved.
+
+libmlx5 is licensed under a choice of one of two licenses. You may
+choose to be licensed under the terms of the GNU General Public
+License (GPL) Version 2, available from the file
+/usr/share/common-licenses/GPL-2 on your Debian system, or the
+OpenIB.org BSD license below:
+
+ Redistribution and use in source and binary forms, with or
+ without modification, are permitted provided that the following
+ conditions are met:
+
+ - Redistributions of source code must retain the above
+ copyright notice, this list of conditions and the following
+ disclaimer.
+
+ - Redistributions in binary form must reproduce the above
+ copyright notice, this list of conditions and the following
+ disclaimer in the documentation and/or other materials
+ provided with the distribution.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
Index: contrib/ofed/libmlx5/debian/libmlx5-1.install
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/debian/libmlx5-1.install
@@ -0,0 +1,2 @@
+usr/lib/libmlx5-rdmav2.so /usr/lib/libibverbs/
+etc/libibverbs.d/mlx5.driver
Index: contrib/ofed/libmlx5/debian/libmlx5-dev.install
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/debian/libmlx5-dev.install
@@ -0,0 +1,2 @@
+usr/lib/libmlx5.a
+usr/include/infiniband
Index: contrib/ofed/libmlx5/debian/patches/driver-plugin-directory.patch
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/debian/patches/driver-plugin-directory.patch
@@ -0,0 +1,10 @@
+Description: Tell libibverbs to look in /usr/lib/libibverbs for plugin library
+Author: Roland Dreier <roland@digitalvampire.org>
+
+Index: libmlx5.git/mlx5.driver
+===================================================================
+--- libmlx5.git.orig/mlx5.driver 2011-07-06 01:27:34.521058451 -0700
++++ libmlx5.git/mlx5.driver 2011-07-06 01:27:47.051074172 -0700
+@@ -1 +1 @@
+-driver mlx5
++driver /usr/lib/libibverbs/libmlx5
Index: contrib/ofed/libmlx5/debian/patches/series
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/debian/patches/series
@@ -0,0 +1 @@
+driver-plugin-directory.patch
Index: contrib/ofed/libmlx5/debian/rules
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/debian/rules
@@ -0,0 +1,10 @@
+#!/usr/bin/make -f
+# -*- mode: makefile; coding: utf-8 -*-
+
+%:
+ dh $@
+
+override_dh_strip:
+ dh_strip --dbg-package=libmlx5-1-dbg
+
+override_dh_makeshlibs:
Index: contrib/ofed/libmlx5/debian/source/format
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/debian/source/format
@@ -0,0 +1 @@
+3.0 (quilt)
Index: contrib/ofed/libmlx5/debian/watch
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/debian/watch
@@ -0,0 +1,3 @@
+version=3
+opts="uversionmangle=s/-rc/~rc/" \
+ http://www.openfabrics.org/downloads/mlx5/libmlx5-(.+)\.tar\.gz
Index: contrib/ofed/libmlx5/libmlx5.spec.in
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/libmlx5.spec.in
@@ -0,0 +1,85 @@
+%{!?_with_valgrind: %define _with_valgrind 0}
+%{!?_disable_valgrind: %define _disable_valgrind 0}
+
+%if 0%{?rhel} == 6
+%if 0%{_disable_valgrind} == 0
+%define _with_valgrind 1
+%endif
+%endif
+
+Name: libmlx5
+Version: 1.0.2mlnx1
+Release: 1%{?dist}
+Summary: Mellanox ConnectX-IB InfiniBand HCA Userspace Driver
+
+Group: System Environment/Libraries
+License: GPLv2 or BSD
+Url: http://openfabrics.org/
+Source: http://openfabrics.org/downloads/mlx5/%{name}-%{version}.tar.gz
+BuildRoot: %(mktemp -ud %{_tmppath}/%{name}-%{version}-%{release}-XXXXXX)
+
+BuildRequires: libibverbs-devel >= 1.1-0.1.rc2
+%if %{_with_valgrind}
+BuildRequires: valgrind-devel
+%endif
+
+%description
+libmlx5 provides a device-specific userspace driver for Mellanox
+ConnectX HCAs for use with the libibverbs library.
+
+%package devel
+Summary: Development files for the libmlx5 driver
+Group: System Environment/Libraries
+Requires: %{name} = %{version}-%{release}
+Provides: libmlx5-static = %{version}-%{release}
+
+%description devel
+Static version of libmlx5 that may be linked directly to an
+application, which may be useful for debugging.
+
+%prep
+%setup -q -n %{name}-@VERSION@
+
+%build
+%if %{_with_valgrind}
+%configure %{?configure_options} --libdir=%{_libdir}/mlnx_ofed/valgrind --with-valgrind
+make %{?_smp_mflags}
+make DESTDIR=$RPM_BUILD_DIR/%{name}-%{version}/valgrind install
+rm -f $RPM_BUILD_DIR/%{name}-%{version}/valgrind/%{_libdir}/mlnx_ofed/valgrind/*.*a
+make clean
+%endif
+
+%configure %{?configure_options}
+make %{?_smp_mflags}
+
+%install
+rm -rf $RPM_BUILD_ROOT
+make DESTDIR=%{buildroot} install
+%if %{_with_valgrind}
+mkdir -p %{buildroot}/%{_libdir}/mlnx_ofed
+cp -a $RPM_BUILD_DIR/%{name}-%{version}/valgrind/%{_libdir}/mlnx_ofed/valgrind %{buildroot}/%{_libdir}/mlnx_ofed
+%endif
+# remove unpackaged files from the buildroot
+rm -f $RPM_BUILD_ROOT%{_libdir}/*.la $RPM_BUILD_ROOT%{_libdir}/libmlx5.so
+
+%clean
+rm -rf $RPM_BUILD_ROOT
+
+%files
+%defattr(-,root,root,-)
+%{_libdir}/libmlx5-rdmav2.so
+%if %{_with_valgrind}
+%{_libdir}/mlnx_ofed/valgrind/libmlx5*.so
+%endif
+%{_sysconfdir}/libibverbs.d/mlx5.driver
+%doc AUTHORS COPYING README
+
+%files devel
+%defattr(-,root,root,-)
+%{_libdir}/libmlx5.a
+%{_prefix}/include/infiniband/
+
+%changelog
+* Mon Mar 26 2012 Eli Cohen <eli@mellanox.com> - 1.0.0
+- First version
+
Index: contrib/ofed/libmlx5/mlx5.driver
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/mlx5.driver
@@ -0,0 +1 @@
+driver mlx5
Index: contrib/ofed/libmlx5/scripts/expose_libmlx5_headers/defines.txt
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/scripts/expose_libmlx5_headers/defines.txt
@@ -0,0 +1,2 @@
+MLX5_CQ_DB_REQ_NOT_SOL
+MLX5_CQ_DB_REQ_NOT
Index: contrib/ofed/libmlx5/scripts/expose_libmlx5_headers/enumerations.txt
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/scripts/expose_libmlx5_headers/enumerations.txt
@@ -0,0 +1,74 @@
+MLX5_RCV_DBR
+MLX5_SND_DBR
+MLX5_SEND_WQE_BB
+MLX5_SEND_WQE_SHIFT
+MLX5_INLINE_SCATTER_32
+MLX5_INLINE_SCATTER_64
+MLX5_OPCODE_NOP
+MLX5_OPCODE_SEND_INVAL
+MLX5_OPCODE_RDMA_WRITE
+MLX5_OPCODE_RDMA_WRITE_IMM
+MLX5_OPCODE_SEND
+MLX5_OPCODE_SEND_IMM
+MLX5_OPCODE_LSO_MPW
+MLX5_OPC_MOD_MPW
+MLX5_OPCODE_RDMA_READ
+MLX5_OPCODE_ATOMIC_CS
+MLX5_OPCODE_ATOMIC_FA
+MLX5_OPCODE_ATOMIC_MASKED_CS
+MLX5_OPCODE_ATOMIC_MASKED_FA
+MLX5_OPCODE_BIND_MW
+MLX5_OPCODE_FMR
+MLX5_OPCODE_LOCAL_INVAL
+MLX5_OPCODE_CONFIG_CMD
+MLX5_OPCODE_SEND_ENABLE
+MLX5_OPCODE_RECV_ENABLE
+MLX5_OPCODE_CQE_WAIT
+MLX5_RECV_OPCODE_RDMA_WRITE_IMM
+MLX5_RECV_OPCODE_SEND
+MLX5_RECV_OPCODE_SEND_IMM
+MLX5_RECV_OPCODE_SEND_INVAL
+MLX5_CQE_OPCODE_ERROR
+MLX5_CQE_OPCODE_RESIZE
+MLX5_SRQ_FLAG_SIGNATURE
+MLX5_INLINE_SEG
+MLX5_CALC_UINT64_ADD
+MLX5_CALC_FLOAT64_ADD
+MLX5_CALC_UINT64_MAXLOC
+MLX5_CALC_UINT64_AND
+MLX5_CALC_UINT64_OR
+MLX5_CALC_UINT64_XOR
+MLX5_CQ_DOORBELL
+MLX5_CQE_SYNDROME_LOCAL_LENGTH_ERR
+MLX5_CQE_SYNDROME_LOCAL_QP_OP_ERR
+MLX5_CQE_SYNDROME_LOCAL_PROT_ERR
+MLX5_CQE_SYNDROME_WR_FLUSH_ERR
+MLX5_CQE_SYNDROME_MW_BIND_ERR
+MLX5_CQE_SYNDROME_BAD_RESP_ERR
+MLX5_CQE_SYNDROME_LOCAL_ACCESS_ERR
+MLX5_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR
+MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR
+MLX5_CQE_SYNDROME_REMOTE_OP_ERR
+MLX5_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR
+MLX5_CQE_SYNDROME_RNR_RETRY_EXC_ERR
+MLX5_CQE_SYNDROME_REMOTE_ABORTED_ERR
+MLX5_CQE_OWNER_MASK
+MLX5_CQE_REQ
+MLX5_CQE_RESP_WR_IMM
+MLX5_CQE_RESP_SEND
+MLX5_CQE_RESP_SEND_IMM
+MLX5_CQE_RESP_SEND_INV
+MLX5_CQE_RESIZE_CQ
+MLX5_CQE_SIG_ERR
+MLX5_CQE_REQ_ERR
+MLX5_CQE_RESP_ERR
+MLX5_CQE_INVALID
+MLX5_WQE_CTRL_CQ_UPDATE
+MLX5_WQE_CTRL_SOLICITED
+MLX5_WQE_CTRL_FENCE
+MLX5_INVALID_LKEY
+MLX5_EXTENDED_UD_AV
+MLX5_NO_INLINE_DATA
+MLX5_INLINE_DATA32_SEG
+MLX5_INLINE_DATA64_SEG
+MLX5_COMPRESSED
Index: contrib/ofed/libmlx5/scripts/expose_libmlx5_headers/libmlx_expose_headers
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/scripts/expose_libmlx5_headers/libmlx_expose_headers
@@ -0,0 +1,364 @@
+#!/bin/bash -eE
+# Name: Expose libmlx5 headers
+# Author: Majd Dibbiny - majd@mellanox.com
+
+name=libmlx_expose_headers
+author="Majd Dibbiny - Majd@Mellanox.com"
+usage="./libmlx_expose_headers defines-file structures-file enumerations-file\nPlease provide the files in the exact order"
+example="./libmlx_expose_headers defines.txt structs.txt enums.txt"
+SCRIPTPATH=$(cd `dirname "${BASH_SOURCE[0]}"` && pwd)
+args=3
+defines_file="$1"
+structs_file="$2"
+enums_file="$3"
+prefix="$4"
+output_file="$prefix/include/infiniband/mlx5_hw.h"
+script_output="The script's output file is saved to $output_file"
+mkdir -p "$prefix/include/infiniband"
+libmlx5_path="$SCRIPTPATH/../../src/*"
+FILES="$libmlx5_path"
+
+function add_header {
+cat <<EOF > $output_file
+/**
+ * Copyright (C) Mellanox Technologies Ltd. 2001-2014. ALL RIGHTS RESERVED.
+ * This software product is a proprietary product of Mellanox Technologies Ltd.
+ * (the "Company") and all right, title, and interest and to the software product,
+ * including all associated intellectual property rights, are and shall
+ * remain exclusively with the Company.
+ *
+ * This software product is governed by the End User License Agreement
+ * provided with the software product.
+ */
+
+#ifndef MLX_HW_H_
+#define MLX_HW_H_
+
+#include <linux/types.h>
+#include <stdint.h>
+#include <pthread.h>
+#include <infiniband/driver.h>
+#include <infiniband/verbs.h>
+
+#define MLX5_GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__)
+#if MLX5_GCC_VERSION >= 403
+# define __MLX5_ALGN_F__ __attribute__((noinline, aligned(64)))
+# define __MLX5_ALGN_D__ __attribute__((aligned(64)))
+#else
+# define __MLX5_ALGN_F__
+# define __MLX5_ALGN_D__
+#endif
+
+EOF
+}
+
+function add_footer {
+ echo -e "\n#endif" >> $output_file
+}
+
+function expose_defines {
+ #need to add support for define on multiple lines
+ local expose_defines_res=0
+ for f in $FILES ; do
+ grep -F -f $defines_file $f | sed -n '/^#/p' >> $output_file
+ done
+ while read -r line
+ do
+ if [ "`grep $line $output_file`" = "" ]; then
+ #echo "define: $line wasn't found."
+ expose_defines_res=1
+ break
+ fi
+ done < "$defines_file"
+ echo -e "\n" >> $output_file
+ echo $expose_defines_res
+}
+
+function expose_enums {
+ local expose_enums_res=0
+
+cat <<EOF >> $output_file
+enum mlx5_alloc_type { MXM_MLX5_ALLOC_TYPE_DUMMY };
+enum mlx5_rsc_type { MXM_MLX5_RSC_TYPE_DUMMY };
+enum mlx5_db_method { MXM_MLX5_DB_TYPE_DUMMY };
+enum mlx5_lock_type { MXM_MLX5_LOCK_TYPE_DUMMY };
+enum mlx5_lock_state { MXM_MLX5_LOCK_STATE_TYPE_DUMMY };
+EOF
+ echo "enum {" >> $output_file
+ while read -r line
+ do
+ for f in $FILES ; do
+ grep "$line" $f| while read -r gline ; do
+ pat="(\t)*(\s)*$line(\t)*(\s)*="
+ if [[ $gline =~ $pat ]] ;
+ then
+ grep_res="`echo $gline|sed -e 's/,.*//'`"
+ echo -e "\t$grep_res," >> $output_file
+ break
+ fi
+ done
+ done
+ if [ "`grep $line $output_file`" = "" ]; then
+ #echo "enum: $line wasn't found."
+ expose_enums_res=1
+ break
+ fi
+ done < "$enums_file"
+ echo -e "};\n" >> $output_file
+ echo $expose_enums_res
+}
+
+function expose_structs {
+ local expose_structs_res=0
+
+ echo -e "struct mlx5_qp;\n" >> $output_file;
+
+ while read -r line
+ do
+ struct_found=0
+ for f in $FILES; do
+ struct_line="struct $line {"
+ grep_res=`grep "$struct_line" $f`
+ if [ "$grep_res" != "" ] ; then
+ struct_found=1
+ counter=0
+ flag=0
+ while IFS='' read -r fline
+ do
+ if [ "$struct_line" == "$fline" ] ;
+ then
+ flag=1
+ fi
+ if [ "$flag" -gt "0" ] ;
+ then
+ if [[ $fline == *{* ]] ;
+ then
+ ((counter++))
+ elif [[ $fline == *}* ]] ;
+ then
+ ((counter--))
+ fi
+ printf "%s\n" "$fline">> $output_file
+ if [ "$counter" -eq "0" ] ;
+ then
+ flag=0
+ echo -e "\n" >> $output_file
+ fi
+ fi
+ done < "$f"
+ break
+ fi
+ done
+ if [ $struct_found -lt 1 ]; then
+ #echo "struct: $line wasn't found."
+ expose_structs_res=1
+ break
+ fi
+ done < "$structs_file"
+ echo $expose_structs_res
+}
+
+function add_aux_funcs {
+cat <<EOF >> $output_file
+#define to_mxxx(xxx, type)\\
+ ((struct mlx5_##type *)\\
+ ((void *) ib##xxx - offsetof(struct mlx5_##type, ibv_##xxx)))
+
+static inline struct mlx5_qp *to_mqp(struct ibv_qp *ibqp)
+{
+ struct verbs_qp *vqp = (struct verbs_qp *)ibqp;
+ return container_of(vqp, struct mlx5_qp, verbs_qp);
+}
+
+static inline struct mlx5_cq *to_mcq(struct ibv_cq *ibcq)
+{
+ return to_mxxx(cq, cq);
+}
+
+EOF
+}
+
+function add_qp_info_struct {
+cat <<EOF >> $output_file
+struct ibv_mlx5_qp_info {
+ uint32_t qpn;
+ uint32_t *dbrec;
+ struct {
+ void *buf;
+ unsigned wqe_cnt;
+ unsigned stride;
+ } sq, rq;
+ struct {
+ void *reg;
+ unsigned size;
+ int need_lock;
+ } bf;
+};
+
+EOF
+}
+function add_qp_info_func {
+ add_qp_info_struct
+cat <<EOF >> $output_file
+static inline int ibv_mlx5_exp_get_qp_info(struct ibv_qp *qp, struct ibv_mlx5_qp_info *qp_info)
+{
+ struct mlx5_qp *mqp = to_mqp(qp);
+
+ if ((mqp->gen_data.scur_post != 0) || (mqp->rq.head != 0))
+ return -1;
+
+ qp_info->qpn = mqp->ctrl_seg.qp_num;
+ qp_info->dbrec = mqp->gen_data.db;
+ qp_info->sq.buf = mqp->buf.buf + mqp->sq.offset;
+ qp_info->sq.wqe_cnt = mqp->sq.wqe_cnt;
+ qp_info->sq.stride = 1 << mqp->sq.wqe_shift;
+ qp_info->rq.buf = mqp->buf.buf + mqp->rq.offset;
+ qp_info->rq.wqe_cnt = mqp->rq.wqe_cnt;
+ qp_info->rq.stride = 1 << mqp->rq.wqe_shift;
+ qp_info->bf.reg = mqp->gen_data.bf->reg;
+ qp_info->bf.need_lock = mqp->gen_data.bf->need_lock;
+
+ if (mqp->gen_data.bf->uuarn > 0)
+ qp_info->bf.size = mqp->gen_data.bf->buf_size;
+ else
+ qp_info->bf.size = 0;
+
+ return 0;
+}
+
+EOF
+}
+
+function add_cq_info_struct {
+cat <<EOF >> $output_file
+struct ibv_mlx5_cq_info {
+ uint32_t cqn;
+ unsigned cqe_cnt;
+ void *buf;
+ uint32_t *dbrec;
+ unsigned cqe_size;
+};
+
+EOF
+}
+
+function add_cq_info_func {
+ add_cq_info_struct
+cat <<EOF >> $output_file
+static inline int ibv_mlx5_exp_get_cq_info(struct ibv_cq *cq, struct ibv_mlx5_cq_info *cq_info)
+{
+ struct mlx5_cq *mcq = to_mcq(cq);
+
+ if (mcq->cons_index != 0)
+ return -1;
+
+ cq_info->cqn = mcq->cqn;
+ cq_info->cqe_cnt = mcq->ibv_cq.cqe + 1;
+ cq_info->cqe_size = mcq->cqe_sz;
+ cq_info->buf = mcq->active_buf->buf;
+ cq_info->dbrec = mcq->dbrec;
+
+ return 0;
+}
+
+EOF
+}
+
+function add_srq_info_struct {
+cat <<EOF >> $output_file
+struct ibv_mlx5_srq_info {
+ void *buf;
+ uint32_t *dbrec;
+ unsigned stride;
+ unsigned head;
+ unsigned tail;
+};
+
+EOF
+}
+
+function add_srq_info_func {
+ add_srq_info_struct
+cat <<EOF >> $output_file
+static inline int ibv_mlx5_exp_get_srq_info(struct ibv_srq *srq, struct ibv_mlx5_srq_info *srq_info)
+{
+ struct mlx5_srq *msrq;
+
+ if (srq->handle == LEGACY_XRC_SRQ_HANDLE)
+ srq = (struct ibv_srq *)(((struct ibv_srq_legacy *)srq)->ibv_srq);
+
+ msrq = container_of(srq, struct mlx5_srq, vsrq.srq);
+
+ if (msrq->counter != 0)
+ return -1;
+
+ srq_info->buf = msrq->buf.buf;
+ srq_info->dbrec = msrq->db;
+ srq_info->stride = 1 << msrq->wqe_shift;
+ srq_info->head = msrq->head;
+ srq_info->tail = msrq->tail;
+
+ return 0;
+}
+
+EOF
+}
+
+function add_cq_ci_func {
+cat <<EOF >> $output_file
+static inline void ibv_mlx5_exp_update_cq_ci(struct ibv_cq *cq, unsigned cq_ci)
+{
+ struct mlx5_cq *mcq = to_mcq(cq);
+
+ mcq->cons_index = cq_ci;
+}
+EOF
+}
+
+##MAIN##
+
+if [ $# -lt $args ] ; then
+ echo "Wrong number of arguments!"
+ echo -e "\n"
+ echo -e "Usage: $usage"
+ echo -e "\n"
+ echo "Example: $example"
+ echo -e "\n"
+ echo "Output: $script_output"
+ echo -e "\n\n"
+ echo -e "For help please contact $author \nExiting..."
+ exit 1
+fi
+
+add_header
+expose_defines_res=$(expose_defines)
+if [ $expose_defines_res -ne 0 ] ; then
+ echo "expose_defines: Failed!"
+ echo "Exiting..."
+ rm -f $output_file
+ exit 1
+fi
+expose_enums_res=$(expose_enums)
+if [ $expose_enums_res -ne 0 ] ; then
+ echo "expose_enums: Failed!"
+ echo "Exiting..."
+ rm -f $output_file
+ exit 1
+fi
+expose_structs_res=$(expose_structs)
+if [ $expose_structs_res -ne 0 ] ; then
+ echo "expose_structs: Failed!"
+ echo "Exiting..."
+ rm -f $output_file
+ exit 1
+fi
+
+add_aux_funcs
+add_qp_info_func
+add_cq_info_func
+add_srq_info_func
+add_cq_ci_func
+
+add_footer
+
+exit 0
Index: contrib/ofed/libmlx5/scripts/expose_libmlx5_headers/structures.txt
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/scripts/expose_libmlx5_headers/structures.txt
@@ -0,0 +1,43 @@
+mlx5_resource
+mlx5_wqe_srq_next_seg
+mlx5_wqe_data_seg
+mlx5_eqe_comp
+mlx5_eqe_qp_srq
+mlx5_wqe_ctrl_seg
+mlx5_wqe_xrc_seg
+mlx5_wqe_masked_atomic_seg
+mlx5_base_av
+mlx5_grh_av
+mlx5_wqe_av
+mlx5_wqe_datagram_seg
+mlx5_wqe_raddr_seg
+mlx5_wqe_atomic_seg
+mlx5_wqe_inl_data_seg
+mlx5_wqe_umr_ctrl_seg
+mlx5_seg_set_psv
+mlx5_seg_get_psv
+mlx5_seg_check_psv
+mlx5_rwqe_sig
+mlx5_wqe_signature_seg
+mlx5_wqe_inline_seg
+mlx5_wqe_wait_en_seg
+mlx5_err_cqe
+mlx5_cqe64
+mlx5_spinlock
+mlx5_lock
+mlx5_numa_req
+mlx5_buf
+general_data_hot
+data_seg_data
+ctrl_seg_data
+mpw_data
+general_data_warm
+odp_data
+mlx5_wq_recv_send_enable
+mlx5_cq
+mlx5_srq
+mlx5_wq
+mlx5_bf
+mlx5_qp
+mlx5_ah
+mlx5_mini_cqe8
Index: contrib/ofed/libmlx5/src/.gitignore
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/.gitignore
@@ -0,0 +1,3 @@
+*.la
+.dirstamp
+.libs
Index: contrib/ofed/libmlx5/src/bitmap.h
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/bitmap.h
@@ -0,0 +1,111 @@
+/*
+ * Copyright (c) 2000, 2011 Mellanox Technology Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef BITMAP_H
+#define BITMAP_H
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <pthread.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/ipc.h>
+#include <sys/shm.h>
+#include <sys/mman.h>
+#include <errno.h>
+#include "mlx5.h"
+
+/* Only ia64 requires this */
+#ifdef __ia64__
+#define MLX5_SHM_ADDR ((void *)0x8000000000000000UL)
+#define MLX5_SHMAT_FLAGS (SHM_RND)
+#else
+#define MLX5_SHM_ADDR NULL
+#define MLX5_SHMAT_FLAGS 0
+#endif
+
+#define BITS_PER_LONG (8 * sizeof(long))
+#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_LONG)
+
+#ifndef HPAGE_SIZE
+#define HPAGE_SIZE (2UL * 1024 * 1024)
+#endif
+
+#define MLX5_SHM_LENGTH HPAGE_SIZE
+#define MLX5_Q_CHUNK_SIZE 32768
+#define MLX5_SHM_NUM_REGION 64
+
+static inline unsigned long mlx5_ffz(uint32_t word)
+{
+ return __builtin_ffs(~word) - 1;
+}
+
+static inline uint32_t mlx5_find_first_zero_bit(const unsigned long *addr,
+ uint32_t size)
+{
+ const unsigned long *p = addr;
+ uint32_t result = 0;
+ unsigned long tmp;
+
+ while (size & ~(BITS_PER_LONG - 1)) {
+ tmp = *(p++);
+ if (~tmp)
+ goto found;
+ result += BITS_PER_LONG;
+ size -= BITS_PER_LONG;
+ }
+ if (!size)
+ return result;
+
+ tmp = (*p) | (~0UL << size);
+	if (tmp == ~0UL)	/* Are any bits zero? */
+ return result + size; /* Nope. */
+found:
+ return result + mlx5_ffz(tmp);
+}
+
+static inline void mlx5_set_bit(unsigned int nr, unsigned long *addr)
+{
+	addr[(nr / BITS_PER_LONG)] |= (1UL << (nr % BITS_PER_LONG));
+}
+
+static inline void mlx5_clear_bit(unsigned int nr, unsigned long *addr)
+{
+	addr[(nr / BITS_PER_LONG)] &= ~(1UL << (nr % BITS_PER_LONG));
+}
+
+static inline int mlx5_test_bit(unsigned int nr, const unsigned long *addr)
+{
+	return !!(addr[(nr / BITS_PER_LONG)] & (1UL << (nr % BITS_PER_LONG)));
+}
+
+#endif
Index: contrib/ofed/libmlx5/src/buf.c
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/buf.c
@@ -0,0 +1,688 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if HAVE_CONFIG_H
+# include <config.h>
+#endif /* HAVE_CONFIG_H */
+
+#include <signal.h>
+#include <sys/ipc.h>
+#include <sys/shm.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <sched.h>
+
+#ifdef HAVE_NUMA
+#include <numa.h>
+#endif
+
+#include "mlx5.h"
+#include "bitmap.h"
+
+#if !(defined(HAVE_IBV_DONTFORK_RANGE) && defined(HAVE_IBV_DOFORK_RANGE))
+
+/*
+ * If libibverbs isn't exporting these functions, then there's no
+ * point in doing it here, because the rest of libibverbs isn't going
+ * to be fork-safe anyway.
+ */
+static int ibv_dontfork_range(void *base, size_t size)
+{
+ return 0;
+}
+
+static int ibv_dofork_range(void *base, size_t size)
+{
+ return 0;
+}
+
+#endif /* HAVE_IBV_DONTFORK_RANGE && HAVE_IBV_DOFORK_RANGE */
+
+static int mlx5_bitmap_init(struct mlx5_bitmap *bitmap, uint32_t num,
+ uint32_t mask)
+{
+ bitmap->last = 0;
+ bitmap->top = 0;
+ bitmap->max = num;
+ bitmap->avail = num;
+ bitmap->mask = mask;
+ bitmap->avail = bitmap->max;
+ bitmap->table = calloc(BITS_TO_LONGS(bitmap->max), sizeof(uint32_t));
+ if (!bitmap->table)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void bitmap_free_range(struct mlx5_bitmap *bitmap, uint32_t obj,
+ int cnt)
+{
+ int i;
+
+ obj &= bitmap->max - 1;
+
+ for (i = 0; i < cnt; i++)
+ mlx5_clear_bit(obj + i, bitmap->table);
+ bitmap->last = min(bitmap->last, obj);
+ bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask;
+ bitmap->avail += cnt;
+}
+
+static int bitmap_empty(struct mlx5_bitmap *bitmap)
+{
+ return (bitmap->avail == bitmap->max) ? 1 : 0;
+}
+
+static int bitmap_avail(struct mlx5_bitmap *bitmap)
+{
+ return bitmap->avail;
+}
+
+static void mlx5_bitmap_cleanup(struct mlx5_bitmap *bitmap)
+{
+ if (bitmap->table)
+ free(bitmap->table);
+}
+
+static void free_huge_mem(struct mlx5_hugetlb_mem *hmem)
+{
+ mlx5_bitmap_cleanup(&hmem->bitmap);
+ if (shmdt(hmem->shmaddr) == -1)
+ mlx5_dbg(stderr, MLX5_DBG_CONTIG, "%s\n", strerror(errno));
+ shmctl(hmem->shmid, IPC_RMID, NULL);
+ free(hmem);
+}
+
+static int mlx5_bitmap_alloc(struct mlx5_bitmap *bitmap)
+{
+ uint32_t obj;
+ int ret;
+
+ obj = mlx5_find_first_zero_bit(bitmap->table, bitmap->max);
+ if (obj < bitmap->max) {
+ mlx5_set_bit(obj, bitmap->table);
+ bitmap->last = (obj + 1);
+ if (bitmap->last == bitmap->max)
+ bitmap->last = 0;
+ obj |= bitmap->top;
+ ret = obj;
+ } else
+ ret = -1;
+
+ if (ret != -1)
+ --bitmap->avail;
+
+ return ret;
+}
+
+static uint32_t find_aligned_range(unsigned long *bitmap,
+ uint32_t start, uint32_t nbits,
+ int len, int alignment)
+{
+ uint32_t end, i;
+
+again:
+ start = align(start, alignment);
+
+ while ((start < nbits) && mlx5_test_bit(start, bitmap))
+ start += alignment;
+
+ if (start >= nbits)
+ return -1;
+
+ end = start + len;
+ if (end > nbits)
+ return -1;
+
+ for (i = start + 1; i < end; i++) {
+ if (mlx5_test_bit(i, bitmap)) {
+ start = i + 1;
+ goto again;
+ }
+ }
+
+ return start;
+}
+
+static int bitmap_alloc_range(struct mlx5_bitmap *bitmap, int cnt,
+ int align)
+{
+ uint32_t obj;
+ int ret, i;
+
+ if (cnt == 1 && align == 1)
+ return mlx5_bitmap_alloc(bitmap);
+
+ if (cnt > bitmap->max)
+ return -1;
+
+ obj = find_aligned_range(bitmap->table, bitmap->last,
+ bitmap->max, cnt, align);
+ if (obj >= bitmap->max) {
+ bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask;
+ obj = find_aligned_range(bitmap->table, 0, bitmap->max,
+ cnt, align);
+ }
+
+ if (obj < bitmap->max) {
+ for (i = 0; i < cnt; i++)
+ mlx5_set_bit(obj + i, bitmap->table);
+ if (obj == bitmap->last) {
+ bitmap->last = (obj + cnt);
+ if (bitmap->last >= bitmap->max)
+ bitmap->last = 0;
+ }
+ obj |= bitmap->top;
+ ret = obj;
+ } else
+ ret = -1;
+
+ if (ret != -1)
+ bitmap->avail -= cnt;
+
+ return ret;
+}
+
+#ifndef SHM_HUGETLB
+#define SHM_HUGETLB 0
+#endif
+
+static struct mlx5_hugetlb_mem *alloc_huge_mem(size_t size)
+{
+ struct mlx5_hugetlb_mem *hmem;
+ size_t shm_len;
+
+ hmem = malloc(sizeof(*hmem));
+ if (!hmem)
+ return NULL;
+
+ shm_len = align(size, MLX5_SHM_LENGTH);
+ hmem->shmid = shmget(IPC_PRIVATE, shm_len, SHM_HUGETLB | SHM_R | SHM_W);
+ if (hmem->shmid == -1) {
+ mlx5_dbg(stderr, MLX5_DBG_CONTIG, "%s\n", strerror(errno));
+ goto out_free;
+ }
+
+ hmem->shmaddr = shmat(hmem->shmid, MLX5_SHM_ADDR, MLX5_SHMAT_FLAGS);
+ if (hmem->shmaddr == (void *)-1) {
+ mlx5_dbg(stderr, MLX5_DBG_CONTIG, "%s\n", strerror(errno));
+ goto out_rmid;
+ }
+
+ if (mlx5_bitmap_init(&hmem->bitmap, shm_len / MLX5_Q_CHUNK_SIZE,
+ shm_len / MLX5_Q_CHUNK_SIZE - 1)) {
+ mlx5_dbg(stderr, MLX5_DBG_CONTIG, "%s\n", strerror(errno));
+ goto out_shmdt;
+ }
+
+ /*
+ * Mark the segment for destruction; it is freed automatically once
+ * the last process detaches from it.
+ */
+ shmctl(hmem->shmid, IPC_RMID, NULL);
+
+ return hmem;
+
+out_shmdt:
+ if (shmdt(hmem->shmaddr) == -1)
+ mlx5_dbg(stderr, MLX5_DBG_CONTIG, "%s\n", strerror(errno));
+
+out_rmid:
+ shmctl(hmem->shmid, IPC_RMID, NULL);
+
+out_free:
+ free(hmem);
+ return NULL;
+}
+
+static int alloc_huge_buf(struct mlx5_context *mctx, struct mlx5_buf *buf,
+ size_t size, int page_size)
+{
+ int found = 0;
+ LIST_HEAD(slist);
+ int nchunk;
+ struct mlx5_hugetlb_mem *hmem;
+ int ret;
+
+ buf->length = align(size, MLX5_Q_CHUNK_SIZE);
+ nchunk = buf->length / MLX5_Q_CHUNK_SIZE;
+
+ mlx5_spin_lock(&mctx->hugetlb_lock);
+ list_for_each_entry(hmem, &mctx->hugetlb_list, list) {
+ if (bitmap_avail(&hmem->bitmap)) {
+ buf->base = bitmap_alloc_range(&hmem->bitmap, nchunk, 1);
+ if (buf->base != -1) {
+ buf->hmem = hmem;
+ found = 1;
+ break;
+ }
+ }
+ }
+ mlx5_spin_unlock(&mctx->hugetlb_lock);
+
+ if (!found) {
+ hmem = alloc_huge_mem(buf->length);
+ if (!hmem)
+ return -1;
+
+ buf->base = bitmap_alloc_range(&hmem->bitmap, nchunk, 1);
+ if (buf->base == -1) {
+ free_huge_mem(hmem);
+ /* TBD: remove after proven stability */
+ fprintf(stderr, "BUG: huge allocation\n");
+ return -1;
+ }
+
+ buf->hmem = hmem;
+
+ mlx5_spin_lock(&mctx->hugetlb_lock);
+ if (bitmap_avail(&hmem->bitmap))
+ list_add(&hmem->list, &mctx->hugetlb_list);
+ else
+ list_add_tail(&hmem->list, &mctx->hugetlb_list);
+ mlx5_spin_unlock(&mctx->hugetlb_lock);
+ }
+
+ buf->buf = hmem->shmaddr + buf->base * MLX5_Q_CHUNK_SIZE;
+
+ ret = ibv_dontfork_range(buf->buf, buf->length);
+ if (ret) {
+ mlx5_dbg(stderr, MLX5_DBG_CONTIG, "\n");
+ goto out_fork;
+ }
+ buf->type = MLX5_ALLOC_TYPE_HUGE;
+
+ return 0;
+
+out_fork:
+ mlx5_spin_lock(&mctx->hugetlb_lock);
+ bitmap_free_range(&hmem->bitmap, buf->base, nchunk);
+ if (bitmap_empty(&hmem->bitmap)) {
+ list_del(&hmem->list);
+ mlx5_spin_unlock(&mctx->hugetlb_lock);
+ free_huge_mem(hmem);
+ } else
+ mlx5_spin_unlock(&mctx->hugetlb_lock);
+
+ return -1;
+}
+
+static void free_huge_buf(struct mlx5_context *ctx, struct mlx5_buf *buf)
+{
+ int nchunk;
+
+ nchunk = buf->length / MLX5_Q_CHUNK_SIZE;
+ mlx5_spin_lock(&ctx->hugetlb_lock);
+ bitmap_free_range(&buf->hmem->bitmap, buf->base, nchunk);
+ if (bitmap_empty(&buf->hmem->bitmap)) {
+ list_del(&buf->hmem->list);
+ mlx5_spin_unlock(&ctx->hugetlb_lock);
+ free_huge_mem(buf->hmem);
+ } else
+ mlx5_spin_unlock(&ctx->hugetlb_lock);
+}
+
+int mlx5_alloc_prefered_buf(struct mlx5_context *mctx,
+ struct mlx5_buf *buf,
+ size_t size, int page_size,
+ enum mlx5_alloc_type type,
+ const char *component)
+{
+ int ret;
+
+ /*
+ * Fallback mechanism priority:
+ * huge pages
+ * contig pages
+ * default
+ */
+ if (type == MLX5_ALLOC_TYPE_HUGE ||
+ type == MLX5_ALLOC_TYPE_PREFER_HUGE ||
+ type == MLX5_ALLOC_TYPE_ALL) {
+ ret = alloc_huge_buf(mctx, buf, size, page_size);
+ if (!ret)
+ return 0;
+
+ if (type == MLX5_ALLOC_TYPE_HUGE)
+ return -1;
+
+ mlx5_dbg(stderr, MLX5_DBG_CONTIG,
+ "Huge mode allocation failed, fallback to %s mode\n",
+ type == MLX5_ALLOC_TYPE_ALL ? "contig" : "default");
+ }
+
+ if (type == MLX5_ALLOC_TYPE_CONTIG ||
+ type == MLX5_ALLOC_TYPE_PREFER_CONTIG ||
+ type == MLX5_ALLOC_TYPE_ALL) {
+ ret = mlx5_alloc_buf_contig(mctx, buf, size, page_size, component, NULL);
+ if (!ret)
+ return 0;
+
+ if (type == MLX5_ALLOC_TYPE_CONTIG)
+ return -1;
+ mlx5_dbg(stderr, MLX5_DBG_CONTIG,
+ "Contig allocation failed, fallback to default mode\n");
+ }
+
+ return mlx5_alloc_buf(buf, size, page_size);
+}
+
+int mlx5_free_actual_buf(struct mlx5_context *ctx, struct mlx5_buf *buf)
+{
+ int err = 0;
+
+ switch (buf->type) {
+ case MLX5_ALLOC_TYPE_ANON:
+ mlx5_free_buf(buf);
+ break;
+
+ case MLX5_ALLOC_TYPE_HUGE:
+ free_huge_buf(ctx, buf);
+ break;
+
+ case MLX5_ALLOC_TYPE_CONTIG:
+ mlx5_free_buf_contig(ctx, buf);
+ break;
+ default:
+ fprintf(stderr, "Bad allocation type\n");
+ }
+
+ return err;
+}
+
+/* This function computes log2(v) rounded up.
+ We don't want a dependency on libm, which exposes the ceil & log2 APIs.
+ Code was written based on public domain code:
+ URL: http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog.
+*/
+static uint32_t mlx5_get_block_order(uint32_t v)
+{
+ static const uint32_t bits_arr[] = {0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000};
+ static const uint32_t shift_arr[] = {1, 2, 4, 8, 16};
+ int i;
+ uint32_t input_val = v;
+
+ register uint32_t r = 0;/* result of log2(v) will go here */
+ for (i = 4; i >= 0; i--) {
+ if (v & bits_arr[i]) {
+ v >>= shift_arr[i];
+ r |= shift_arr[i];
+ }
+ }
+ /* Rounding up if required */
+ r += !!(input_val & ((1 << r) - 1));
+
+ return r;
+}
+
+void mlx5_get_alloc_type(struct ibv_context *context,
+ const char *component,
+ enum mlx5_alloc_type *alloc_type,
+ enum mlx5_alloc_type default_type)
+
+{
+ char env_value[VERBS_MAX_ENV_VAL];
+ char name[128];
+
+ snprintf(name, sizeof(name), "%s_ALLOC_TYPE", component);
+
+ *alloc_type = default_type;
+
+ if (!ibv_exp_cmd_getenv(context, name, env_value, sizeof(env_value))) {
+ if (!strcasecmp(env_value, "ANON"))
+ *alloc_type = MLX5_ALLOC_TYPE_ANON;
+ else if (!strcasecmp(env_value, "HUGE"))
+ *alloc_type = MLX5_ALLOC_TYPE_HUGE;
+ else if (!strcasecmp(env_value, "CONTIG"))
+ *alloc_type = MLX5_ALLOC_TYPE_CONTIG;
+ else if (!strcasecmp(env_value, "PREFER_CONTIG"))
+ *alloc_type = MLX5_ALLOC_TYPE_PREFER_CONTIG;
+ else if (!strcasecmp(env_value, "PREFER_HUGE"))
+ *alloc_type = MLX5_ALLOC_TYPE_PREFER_HUGE;
+ else if (!strcasecmp(env_value, "ALL"))
+ *alloc_type = MLX5_ALLOC_TYPE_ALL;
+ }
+}
+
+static void mlx5_alloc_get_env_info(struct ibv_context *context,
+ int *max_block_log,
+ int *min_block_log,
+ const char *component)
+
+{
+ char env[VERBS_MAX_ENV_VAL];
+ int value;
+ char name[128];
+
+ /* First set defaults */
+ *max_block_log = MLX5_MAX_LOG2_CONTIG_BLOCK_SIZE;
+ *min_block_log = MLX5_MIN_LOG2_CONTIG_BLOCK_SIZE;
+
+ snprintf(name, sizeof(name), "%s_MAX_LOG2_CONTIG_BSIZE", component);
+ if (!ibv_exp_cmd_getenv(context, name, env, sizeof(env))) {
+ value = atoi(env);
+ if (value <= MLX5_MAX_LOG2_CONTIG_BLOCK_SIZE &&
+ value >= MLX5_MIN_LOG2_CONTIG_BLOCK_SIZE)
+ *max_block_log = value;
+ else
+ fprintf(stderr, "Invalid value %d for %s\n",
+ value, name);
+ }
+ snprintf(name, sizeof(name), "%s_MIN_LOG2_CONTIG_BSIZE", component);
+ if (!ibv_exp_cmd_getenv(context, name, env, sizeof(env))) {
+ value = atoi(env);
+ if (value >= MLX5_MIN_LOG2_CONTIG_BLOCK_SIZE &&
+ value <= *max_block_log)
+ *min_block_log = value;
+ else
+ fprintf(stderr, "Invalid value %d for %s\n",
+ value, name);
+ }
+}
+
+int mlx5_alloc_buf_contig(struct mlx5_context *mctx,
+ struct mlx5_buf *buf, size_t size,
+ int page_size,
+ const char *component, void *req_addr)
+{
+ void *addr = MAP_FAILED;
+ int block_size_exp;
+ int max_block_log;
+ int min_block_log;
+ int mmap_flags = MAP_SHARED;
+ struct ibv_context *context = &mctx->ibv_ctx;
+ off_t offset;
+ void *act_addr = NULL;
+ size_t act_size = size;
+
+ mlx5_alloc_get_env_info(&mctx->ibv_ctx,
+ &max_block_log,
+ &min_block_log,
+ component);
+
+ /* this test guarantees that we don't call mlx5_get_block_order for
+ sizes above 4G so we don't overflow. It is based on the fact that
+ max_block_log cannot exceed 23 (MLX5_MAX_LOG2_CONTIG_BLOCK_SIZE) */
+ if (size >= (1 << max_block_log))
+ block_size_exp = max_block_log;
+ else
+ block_size_exp = mlx5_get_block_order(size);
+
+ if (req_addr) {
+ mmap_flags |= MAP_FIXED;
+ act_addr = (void *)((uintptr_t)req_addr & ~((uintptr_t)page_size - 1));
+ act_size += (size_t)((uintptr_t)req_addr - (uintptr_t)act_addr);
+ }
+
+ do {
+ offset = 0;
+ if (buf->numa_req.valid && (buf->numa_req.numa_id == mctx->numa_id))
+ set_command(MLX5_EXP_MMAP_GET_CONTIGUOUS_PAGES_DEV_NUMA_CMD, &offset);
+ else if (buf->numa_req.valid && (buf->numa_req.numa_id == mlx5_cpu_local_numa()))
+ set_command(MLX5_EXP_MMAP_GET_CONTIGUOUS_PAGES_CPU_NUMA_CMD, &offset);
+ else
+ set_command(MLX5_MMAP_GET_CONTIGUOUS_PAGES_CMD, &offset);
+
+ set_order(block_size_exp, &offset);
+ addr = mmap(act_addr, act_size, PROT_WRITE | PROT_READ, mmap_flags,
+ context->cmd_fd, page_size * offset);
+
+ /* If CONTIGUOUS_PAGES_DEV_NUMA_CMD fails try CONTIGUOUS_PAGES */
+ if (addr == MAP_FAILED &&
+ get_command(&offset) != MLX5_MMAP_GET_CONTIGUOUS_PAGES_CMD) {
+ reset_command(&offset);
+ set_command(MLX5_MMAP_GET_CONTIGUOUS_PAGES_CMD, &offset);
+ addr = mmap(act_addr, act_size, PROT_WRITE | PROT_READ, mmap_flags,
+ context->cmd_fd, page_size * offset);
+ }
+ if (addr != MAP_FAILED)
+ break;
+
+ /*
+ * The kernel returns EINVAL if not supported
+ */
+ if (errno == EINVAL)
+ return -1;
+
+ block_size_exp -= 1;
+ } while (block_size_exp >= min_block_log);
+ mlx5_dbg(mctx->dbg_fp, MLX5_DBG_CONTIG, "block order %d, addr %p\n",
+ block_size_exp, addr);
+
+ if (addr == MAP_FAILED)
+ return -1;
+
+ if (ibv_dontfork_range(addr, act_size)) {
+ munmap(addr, act_size);
+ return -1;
+ }
+
+ buf->buf = addr;
+ buf->length = act_size;
+ buf->type = MLX5_ALLOC_TYPE_CONTIG;
+
+ return 0;
+}
+
+void mlx5_free_buf_contig(struct mlx5_context *mctx, struct mlx5_buf *buf)
+{
+ ibv_dofork_range(buf->buf, buf->length);
+ munmap(buf->buf, buf->length);
+}
+
+#ifdef HAVE_NUMA
+int mlx5_cpu_local_numa(void)
+{
+ if (numa_available() == -1)
+ return -1;
+
+ return numa_node_of_cpu(sched_getcpu());
+}
+
+static void *mlx5_alloc_numa(size_t size, int numa)
+{
+ void *ptr;
+
+ if (numa < 0 || numa_available() == -1)
+ return NULL;
+
+ numa_set_strict(1);
+ ptr = numa_alloc_onnode(size, numa);
+ if (ptr)
+ numa_tonode_memory(ptr, size, numa);
+
+ return ptr;
+}
+
+static void mlx5_free_numa(void *ptr, size_t size)
+{
+ numa_free(ptr, size);
+}
+#else
+int mlx5_cpu_local_numa(void)
+{
+ return -1;
+}
+
+static void *mlx5_alloc_numa(size_t size, int numa)
+{
+ return NULL;
+}
+
+static void mlx5_free_numa(void *ptr, size_t size)
+{
+}
+#endif
+
+int mlx5_alloc_buf(struct mlx5_buf *buf, size_t size, int page_size)
+{
+ int ret;
+ size_t al_size;
+
+ al_size = align(size, page_size);
+
+ buf->buf = NULL;
+ if (buf->numa_req.valid)
+ buf->buf = mlx5_alloc_numa(al_size, buf->numa_req.numa_id);
+ if (buf->buf) {
+ buf->numa_alloc = 1;
+ } else {
+ buf->numa_alloc = 0;
+ ret = posix_memalign(&buf->buf, page_size, al_size);
+ if (ret)
+ return ret;
+ }
+
+ ret = ibv_dontfork_range(buf->buf, al_size);
+ if (ret) {
+ if (buf->numa_alloc)
+ mlx5_free_numa(buf->buf, al_size);
+ else
+ free(buf->buf);
+ }
+
+ if (!ret) {
+ buf->length = al_size;
+ buf->type = MLX5_ALLOC_TYPE_ANON;
+ }
+
+ return ret;
+}
+
+void mlx5_free_buf(struct mlx5_buf *buf)
+{
+ ibv_dofork_range(buf->buf, buf->length);
+ if (buf->numa_alloc)
+ mlx5_free_numa(buf->buf, buf->length);
+ else
+ free(buf->buf);
+}
Index: contrib/ofed/libmlx5/src/cq.c
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/cq.c
@@ -0,0 +1,1657 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+
+#if HAVE_CONFIG_H
+# include <config.h>
+#endif /* HAVE_CONFIG_H */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <pthread.h>
+#include <netinet/in.h>
+#include <string.h>
+#include <errno.h>
+#include <unistd.h>
+
+#include <infiniband/opcode.h>
+
+#include "mlx5.h"
+#include "wqe.h"
+#include "doorbell.h"
+
+enum {
+ MLX5_CQ_DOORBELL = 0x20
+};
+
+enum {
+ CQ_OK = 0,
+ CQ_EMPTY = -1,
+ CQ_POLL_ERR = -2
+};
+
+#define MLX5_CQ_DB_REQ_NOT_SOL (1 << 24)
+#define MLX5_CQ_DB_REQ_NOT (0 << 24)
+
+enum {
+ MLX5_CQE_SYNDROME_LOCAL_LENGTH_ERR = 0x01,
+ MLX5_CQE_SYNDROME_LOCAL_QP_OP_ERR = 0x02,
+ MLX5_CQE_SYNDROME_LOCAL_PROT_ERR = 0x04,
+ MLX5_CQE_SYNDROME_WR_FLUSH_ERR = 0x05,
+ MLX5_CQE_SYNDROME_MW_BIND_ERR = 0x06,
+ MLX5_CQE_SYNDROME_BAD_RESP_ERR = 0x10,
+ MLX5_CQE_SYNDROME_LOCAL_ACCESS_ERR = 0x11,
+ MLX5_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR = 0x12,
+ MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR = 0x13,
+ MLX5_CQE_SYNDROME_REMOTE_OP_ERR = 0x14,
+ MLX5_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR = 0x15,
+ MLX5_CQE_SYNDROME_RNR_RETRY_EXC_ERR = 0x16,
+ MLX5_CQE_SYNDROME_REMOTE_ABORTED_ERR = 0x22,
+};
+
+enum {
+ MLX5_CQE_OWNER_MASK = 1,
+ MLX5_CQE_REQ = 0,
+ MLX5_CQE_RESP_WR_IMM = 1,
+ MLX5_CQE_RESP_SEND = 2,
+ MLX5_CQE_RESP_SEND_IMM = 3,
+ MLX5_CQE_RESP_SEND_INV = 4,
+ MLX5_CQE_RESIZE_CQ = 5,
+ MLX5_CQE_SIG_ERR = 12,
+ MLX5_CQE_REQ_ERR = 13,
+ MLX5_CQE_RESP_ERR = 14,
+ MLX5_CQE_INVALID = 15,
+};
+
+enum {
+ MLX5_CQ_MODIFY_RESEIZE = 0,
+ MLX5_CQ_MODIFY_MODER = 1,
+ MLX5_CQ_MODIFY_MAPPING = 2,
+};
+
+enum {
+ MLX5_NO_INLINE_DATA = 0x0,
+ MLX5_INLINE_DATA32_SEG = 0x1,
+ MLX5_INLINE_DATA64_SEG = 0x2,
+ MLX5_COMPRESSED = 0x3,
+};
+
+enum {
+ MLX5_CQE_L2_OK = 1 << 0,
+ MLX5_CQE_L3_OK = 1 << 1,
+ MLX5_CQE_L4_OK = 1 << 2,
+};
+
+enum {
+ MLX5_CQE_L3_HDR_TYPE_NONE = 0x0,
+ MLX5_CQE_L3_HDR_TYPE_IPV6 = 0x1,
+ MLX5_CQE_L3_HDR_TYPE_IPV4 = 0x2,
+};
+
+enum {
+ /* Masks to handle the CQE byte_count field in case of MP RQ */
+ MP_RQ_BYTE_CNT_FIELD_MASK = 0x0000FFFF,
+ MP_RQ_NUM_STRIDES_FIELD_MASK = 0x7FFF0000,
+ MP_RQ_FILLER_FIELD_MASK = 0x80000000,
+ MP_RQ_NUM_STRIDES_FIELD_SHIFT = 16,
+};
+
+struct mlx5_err_cqe {
+ uint8_t rsvd0[32];
+ uint32_t srqn;
+ uint8_t rsvd1[16];
+ uint8_t hw_err_synd;
+ uint8_t hw_synd_type;
+ uint8_t vendor_err_synd;
+ uint8_t syndrome;
+ uint32_t s_wqe_opcode_qpn;
+ uint16_t wqe_counter;
+ uint8_t signature;
+ uint8_t op_own;
+};
+
+struct mlx5_mini_cqe8 {
+ union {
+ uint32_t rx_hash_result;
+ uint32_t checksum;
+ struct {
+ uint16_t wqe_counter;
+ uint8_t s_wqe_opcode;
+ uint8_t reserved;
+ } s_wqe_info;
+ };
+ uint32_t byte_cnt;
+};
+
+struct mlx5_cqe64 {
+ uint8_t rsvd0[2];
+ /*
+ * wqe_id is valid only for Striding RQ (Multi-Packet RQ).
+ * It provides the WQE index inside the RQ.
+ */
+ uint16_t wqe_id;
+ uint8_t rsvd4[8];
+ uint32_t rx_hash_res;
+ uint8_t rx_hash_type;
+ uint8_t ml_path;
+ uint8_t rsvd20[2];
+ uint16_t checksum;
+ uint16_t slid;
+ uint32_t flags_rqpn;
+ uint8_t hds_ip_ext;
+ uint8_t l4_hdr_type_etc;
+ __be16 vlan_info;
+ uint32_t srqn_uidx;
+ uint32_t imm_inval_pkey;
+ uint8_t rsvd40[4];
+ uint32_t byte_cnt;
+ __be64 timestamp;
+ union {
+ uint32_t sop_drop_qpn;
+ struct {
+ uint8_t sop;
+ uint8_t qpn[3];
+ } sop_qpn;
+ };
+ /*
+ * In Striding RQ (Multi-Packet RQ) wqe_counter provides
+ * the WQE stride index (to calc pointer to start of the message)
+ */
+ uint16_t wqe_counter;
+ uint8_t signature;
+ uint8_t op_own;
+};
+
+int mlx5_stall_num_loop = 60;
+int mlx5_stall_cq_poll_min = 60;
+int mlx5_stall_cq_poll_max = 100000;
+int mlx5_stall_cq_inc_step = 100;
+int mlx5_stall_cq_dec_step = 10;
+
+#define MLX5E_CQE_FORMAT_MASK 0xc
+static inline int mlx5_get_cqe_format(struct mlx5_cqe64 *cqe)
+{
+ return (cqe->op_own & MLX5E_CQE_FORMAT_MASK) >> 2;
+}
+
+static inline uint8_t get_cqe_l3_hdr_type(struct mlx5_cqe64 *cqe)
+{
+ return (cqe->l4_hdr_type_etc >> 2) & 0x3;
+}
+
+static void *get_buf_cqe(struct mlx5_buf *buf, int n, int cqe_sz)
+{
+ return buf->buf + n * cqe_sz;
+}
+
+static void *get_cqe(struct mlx5_cq *cq, int n)
+{
+ return cq->active_buf->buf + n * cq->cqe_sz;
+}
+
+static inline void *get_sw_cqe(struct mlx5_cq *cq, int n) __attribute__((always_inline));
+static inline void *get_sw_cqe(struct mlx5_cq *cq, int n)
+{
+ void *cqe = get_cqe(cq, n & cq->ibv_cq.cqe);
+ struct mlx5_cqe64 *cqe64;
+
+ cqe64 = (cq->cqe_sz == 64) ? cqe : cqe + 64;
+
+ if (likely((cqe64->op_own) >> 4 != MLX5_CQE_INVALID) &&
+ !((cqe64->op_own & MLX5_CQE_OWNER_MASK) ^ !!(n & (cq->ibv_cq.cqe + 1)))) {
+ return cqe;
+ } else {
+ return NULL;
+ }
+}
+
+static inline struct mlx5_cqe64 *get_next_cqe(struct mlx5_cq *cq, const int cqe_sz)
+{
+ unsigned idx = cq->cons_index & cq->ibv_cq.cqe;
+ void *cqe = cq->active_buf->buf + idx * cqe_sz;
+ struct mlx5_cqe64 *cqe64;
+
+ cqe64 = (cqe_sz == 64) ? cqe : cqe + 64;
+
+ if (likely((cqe64->op_own) >> 4 != MLX5_CQE_INVALID) &&
+ !((cqe64->op_own & MLX5_CQE_OWNER_MASK) ^ !!(cq->cons_index & (cq->ibv_cq.cqe + 1)))) {
+ return cqe64;
+ }
+
+ return NULL;
+}
+
+static struct mlx5_cqe64 *next_cqe_sw(struct mlx5_cq *cq)
+{
+ return get_next_cqe(cq, cq->cqe_sz);
+}
+
+static void handle_good_req(struct ibv_wc *wc, struct mlx5_cqe64 *cqe)
+{
+ switch (ntohl(cqe->sop_drop_qpn) >> 24) {
+ case MLX5_OPCODE_RDMA_WRITE_IMM:
+ wc->wc_flags |= IBV_WC_WITH_IMM;
+ /* fall through */
+ case MLX5_OPCODE_RDMA_WRITE:
+ wc->opcode = IBV_WC_RDMA_WRITE;
+ break;
+ case MLX5_OPCODE_SEND_IMM:
+ wc->wc_flags |= IBV_WC_WITH_IMM;
+ /* fall through */
+ case MLX5_OPCODE_SEND:
+ case MLX5_OPCODE_SEND_INVAL:
+ wc->opcode = IBV_WC_SEND;
+ break;
+ case MLX5_OPCODE_RDMA_READ:
+ wc->opcode = IBV_WC_RDMA_READ;
+ wc->byte_len = ntohl(cqe->byte_cnt);
+ break;
+ case MLX5_OPCODE_ATOMIC_CS:
+ wc->opcode = IBV_WC_COMP_SWAP;
+ wc->byte_len = 8;
+ break;
+ case MLX5_OPCODE_ATOMIC_FA:
+ wc->opcode = IBV_WC_FETCH_ADD;
+ wc->byte_len = 8;
+ break;
+ case MLX5_OPCODE_BIND_MW:
+ wc->opcode = IBV_WC_BIND_MW;
+ break;
+ case MLX5_OPCODE_UMR:
+ wc->opcode = IBV_EXP_WC_UMR;
+ break;
+
+ case MLX5_OPCODE_ATOMIC_MASKED_CS:
+ wc->opcode = IBV_EXP_WC_MASKED_COMP_SWAP;
+ break;
+
+ case MLX5_OPCODE_ATOMIC_MASKED_FA:
+ wc->opcode = IBV_EXP_WC_MASKED_FETCH_ADD;
+ break;
+ }
+}
+
+static int handle_responder(struct ibv_wc *wc, struct mlx5_cqe64 *cqe,
+ struct mlx5_qp *qp, struct mlx5_srq *srq,
+ enum mlx5_rsc_type type)
+{
+ uint16_t wqe_ctr;
+ struct mlx5_wq *wq;
+ uint8_t g;
+ int err = 0;
+ int cqe_format = mlx5_get_cqe_format(cqe);
+
+ wc->byte_len = ntohl(cqe->byte_cnt);
+ if (srq) {
+ wqe_ctr = ntohs(cqe->wqe_counter);
+ wc->wr_id = srq->wrid[wqe_ctr];
+ mlx5_free_srq_wqe(srq, wqe_ctr);
+ if (cqe_format == MLX5_INLINE_DATA32_SEG)
+ err = mlx5_copy_to_recv_srq(srq, wqe_ctr, cqe,
+ wc->byte_len);
+ else if (cqe_format == MLX5_INLINE_DATA64_SEG)
+ err = mlx5_copy_to_recv_srq(srq, wqe_ctr, cqe - 1,
+ wc->byte_len);
+ } else {
+ wq = &qp->rq;
+ wqe_ctr = wq->tail & (wq->wqe_cnt - 1);
+ wc->wr_id = wq->wrid[wqe_ctr];
+ ++wq->tail;
+ if (cqe_format == MLX5_INLINE_DATA32_SEG)
+ err = mlx5_copy_to_recv_wqe(qp, wqe_ctr, cqe,
+ wc->byte_len);
+ else if (cqe_format == MLX5_INLINE_DATA64_SEG)
+ err = mlx5_copy_to_recv_wqe(qp, wqe_ctr, cqe - 1,
+ wc->byte_len);
+ }
+ if (err)
+ return err;
+
+
+ switch (cqe->op_own >> 4) {
+ case MLX5_CQE_RESP_WR_IMM:
+ wc->opcode = IBV_WC_RECV_RDMA_WITH_IMM;
+ wc->wc_flags |= IBV_WC_WITH_IMM;
+ wc->imm_data = cqe->imm_inval_pkey;
+ break;
+ case MLX5_CQE_RESP_SEND:
+ wc->opcode = IBV_WC_RECV;
+ break;
+ case MLX5_CQE_RESP_SEND_IMM:
+ wc->opcode = IBV_WC_RECV;
+ wc->wc_flags |= IBV_WC_WITH_IMM;
+ wc->imm_data = cqe->imm_inval_pkey;
+ break;
+ }
+ wc->slid = ntohs(cqe->slid);
+ wc->sl = (ntohl(cqe->flags_rqpn) >> 24) & 0xf;
+ if (srq && (type != MLX5_RSC_TYPE_DCT) &&
+ ((type == MLX5_RSC_TYPE_INVAL) || (type == MLX5_RSC_TYPE_XSRQ) ||
+ ((qp->verbs_qp.qp.qp_type == IBV_QPT_XRC_RECV) ||
+ (qp->verbs_qp.qp.qp_type == IBV_QPT_XRC))))
+ wc->src_qp = srq->srqn;
+ else
+ wc->src_qp = ntohl(cqe->flags_rqpn) & 0xffffff;
+
+
+ wc->dlid_path_bits = cqe->ml_path & 0x7f;
+
+ if ((qp && qp->verbs_qp.qp.qp_type == IBV_QPT_UD) ||
+ (type == MLX5_RSC_TYPE_DCT)) {
+ g = (ntohl(cqe->flags_rqpn) >> 28) & 3;
+ wc->wc_flags |= g ? IBV_WC_GRH : 0;
+ }
+
+ wc->pkey_index = ntohl(cqe->imm_inval_pkey) & 0xffff;
+
+ return IBV_WC_SUCCESS;
+}
+
+static void dump_cqe(FILE *fp, void *buf)
+{
+ uint32_t *p = buf;
+ int i;
+
+ for (i = 0; i < 16; i += 4)
+ fprintf(fp, "%08x %08x %08x %08x\n", ntohl(p[i]), ntohl(p[i + 1]),
+ ntohl(p[i + 2]), ntohl(p[i + 3]));
+}
+
+static void mlx5_set_bad_wc_opcode(struct ibv_exp_wc *wc,
+ struct mlx5_err_cqe *cqe,
+ uint8_t is_req)
+{
+ if (is_req) {
+ switch (ntohl(cqe->s_wqe_opcode_qpn) >> 24) {
+ case MLX5_OPCODE_RDMA_WRITE_IMM:
+ case MLX5_OPCODE_RDMA_WRITE:
+ wc->exp_opcode = IBV_EXP_WC_RDMA_WRITE;
+ break;
+ case MLX5_OPCODE_SEND_IMM:
+ case MLX5_OPCODE_SEND:
+ case MLX5_OPCODE_SEND_INVAL:
+ wc->exp_opcode = IBV_EXP_WC_SEND;
+ break;
+ case MLX5_OPCODE_RDMA_READ:
+ wc->exp_opcode = IBV_EXP_WC_RDMA_READ;
+ break;
+ case MLX5_OPCODE_ATOMIC_CS:
+ wc->exp_opcode = IBV_EXP_WC_COMP_SWAP;
+ break;
+ case MLX5_OPCODE_ATOMIC_FA:
+ wc->exp_opcode = IBV_EXP_WC_FETCH_ADD;
+ break;
+ case MLX5_OPCODE_BIND_MW:
+ wc->exp_opcode = IBV_EXP_WC_BIND_MW;
+ break;
+ case MLX5_OPCODE_UMR:
+ wc->exp_opcode = IBV_EXP_WC_UMR;
+ break;
+ case MLX5_OPCODE_ATOMIC_MASKED_CS:
+ wc->exp_opcode = IBV_EXP_WC_MASKED_COMP_SWAP;
+ break;
+ case MLX5_OPCODE_ATOMIC_MASKED_FA:
+ wc->exp_opcode = IBV_EXP_WC_MASKED_FETCH_ADD;
+ break;
+ }
+ } else {
+ switch (cqe->op_own >> 4) {
+ case MLX5_CQE_RESP_WR_IMM:
+ wc->exp_opcode = IBV_EXP_WC_RECV_RDMA_WITH_IMM;
+ break;
+ case MLX5_CQE_RESP_SEND:
+ wc->exp_opcode = IBV_EXP_WC_RECV;
+ break;
+ case MLX5_CQE_RESP_SEND_IMM:
+ wc->exp_opcode = IBV_EXP_WC_RECV;
+ break;
+ }
+ }
+}
+
+static void mlx5_handle_error_cqe(struct mlx5_err_cqe *cqe,
+ struct ibv_exp_wc *wc)
+{
+ switch (cqe->syndrome) {
+ case MLX5_CQE_SYNDROME_LOCAL_LENGTH_ERR:
+ wc->status = IBV_WC_LOC_LEN_ERR;
+ break;
+ case MLX5_CQE_SYNDROME_LOCAL_QP_OP_ERR:
+ wc->status = IBV_WC_LOC_QP_OP_ERR;
+ break;
+ case MLX5_CQE_SYNDROME_LOCAL_PROT_ERR:
+ wc->status = IBV_WC_LOC_PROT_ERR;
+ break;
+ case MLX5_CQE_SYNDROME_WR_FLUSH_ERR:
+ wc->status = IBV_WC_WR_FLUSH_ERR;
+ break;
+ case MLX5_CQE_SYNDROME_MW_BIND_ERR:
+ wc->status = IBV_WC_MW_BIND_ERR;
+ break;
+ case MLX5_CQE_SYNDROME_BAD_RESP_ERR:
+ wc->status = IBV_WC_BAD_RESP_ERR;
+ break;
+ case MLX5_CQE_SYNDROME_LOCAL_ACCESS_ERR:
+ wc->status = IBV_WC_LOC_ACCESS_ERR;
+ break;
+ case MLX5_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR:
+ wc->status = IBV_WC_REM_INV_REQ_ERR;
+ break;
+ case MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR:
+ wc->status = IBV_WC_REM_ACCESS_ERR;
+ break;
+ case MLX5_CQE_SYNDROME_REMOTE_OP_ERR:
+ wc->status = IBV_WC_REM_OP_ERR;
+ break;
+ case MLX5_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR:
+ wc->status = IBV_WC_RETRY_EXC_ERR;
+ break;
+ case MLX5_CQE_SYNDROME_RNR_RETRY_EXC_ERR:
+ wc->status = IBV_WC_RNR_RETRY_EXC_ERR;
+ break;
+ case MLX5_CQE_SYNDROME_REMOTE_ABORTED_ERR:
+ wc->status = IBV_WC_REM_ABORT_ERR;
+ break;
+ default:
+ wc->status = IBV_WC_GENERAL_ERR;
+ break;
+ }
+
+ wc->vendor_err = cqe->vendor_err_synd;
+}
+
+#if defined(__x86_64__) || defined (__i386__)
+static inline unsigned long get_cycles()
+{
+ uint32_t low, high;
+ uint64_t val;
+ asm volatile ("rdtsc" : "=a" (low), "=d" (high));
+ val = high;
+ val = (val << 32) | low;
+ return val;
+}
+
+static void mlx5_stall_poll_cq()
+{
+ int i;
+
+ for (i = 0; i < mlx5_stall_num_loop; i++)
+ (void)get_cycles();
+}
+static void mlx5_stall_cycles_poll_cq(uint64_t cycles)
+{
+ while (get_cycles() < cycles)
+ ; /* Nothing */
+}
+static void mlx5_get_cycles(uint64_t *cycles)
+{
+ *cycles = get_cycles();
+}
+#else
+static void mlx5_stall_poll_cq()
+{
+}
+static void mlx5_stall_cycles_poll_cq(uint64_t cycles)
+{
+}
+static void mlx5_get_cycles(uint64_t *cycles)
+{
+}
+#endif
+
+static int is_requestor(uint8_t opcode)
+{
+ if (opcode == MLX5_CQE_REQ || opcode == MLX5_CQE_REQ_ERR)
+ return 1;
+ else
+ return 0;
+}
+
+static int is_responder(uint8_t opcode)
+{
+ switch (opcode) {
+ case MLX5_CQE_RESP_WR_IMM:
+ case MLX5_CQE_RESP_SEND:
+ case MLX5_CQE_RESP_SEND_IMM:
+ case MLX5_CQE_RESP_SEND_INV:
+ case MLX5_CQE_RESP_ERR:
+ return 1;
+ }
+
+ return 0;
+}
+
+static inline void copy_cqes(struct mlx5_cq *cq, struct mlx5_mini_cqe8 *mini_array,
+ struct mlx5_cqe64 *title, int cnt, uint16_t *wqe_cnt, int cqe_idx,
+ const int mp_rq)
+ __attribute__((always_inline));
+static inline void copy_cqes(struct mlx5_cq *cq, struct mlx5_mini_cqe8 *mini_array,
+ struct mlx5_cqe64 *title, int cnt, uint16_t *wqe_cnt, int cqe_idx,
+ const int mp_rq)
+{
+ struct mlx5_cqe64 *cqe;
+ int i;
+ int is_req = is_requestor(title->op_own >> 4);
+ int log_size = cq->cq_log_size;
+ uint8_t opown = title->op_own & 0xf2;
+
+ for (i = 0; i < cnt; i++) {
+ cqe = get_cqe(cq, (cqe_idx + i) & cq->ibv_cq.cqe);
+ memcpy(cqe, title, sizeof(*title));
+ cqe->byte_cnt = mini_array[i].byte_cnt;
+ cqe->op_own = opown | (((cqe_idx + i) >> log_size) & 1);
+ if (is_req) {
+ cqe->wqe_counter = mini_array[i].s_wqe_info.wqe_counter;
+ cqe->sop_qpn.sop = mini_array[i].s_wqe_info.s_wqe_opcode;
+ } else {
+ /* For now only rx_hash_res is supported, not checksum */
+ cqe->rx_hash_res = mini_array[i].rx_hash_result;
+ cqe->wqe_counter = htons(*wqe_cnt);
+ if (mp_rq)
+ /*
+ * In case of mp_rq the wqe_cnt is the stride index of the message start,
+ * therefore we need to increase it by the number of consumed strides
+ */
+ (*wqe_cnt) += (ntohl(mini_array[i].byte_cnt) & MP_RQ_NUM_STRIDES_FIELD_MASK) >>
+ MP_RQ_NUM_STRIDES_FIELD_SHIFT;
+ else
+ /*
+ * In case of non mp_rq the wqe_cnt is the sq/rq wqe counter,
+ * therefore we need to increase it by one
+ */
+ (*wqe_cnt)++;
+ }
+ }
+}
+
+static inline struct mlx5_resource *find_rsc(struct mlx5_cq *cq,
+ struct mlx5_cqe64 *cqe64,
+ const int cqe_ver) __attribute__((always_inline));
+static inline struct mlx5_resource *find_rsc(struct mlx5_cq *cq,
+ struct mlx5_cqe64 *cqe64,
+ const int cqe_ver)
+{
+ uint32_t srqn_uidx = ntohl(cqe64->srqn_uidx) & 0xffffff;
+ uint32_t rsn;
+
+ if (cqe_ver)
+ return mlx5_find_uidx(to_mctx(cq->ibv_cq.context), srqn_uidx);
+
+ rsn = ntohl(cqe64->sop_drop_qpn) & 0xffffff;
+
+ return mlx5_find_rsc(to_mctx(cq->ibv_cq.context), rsn);
+}
+
+static inline void mlx5_decompress_cqe_idx(struct mlx5_cq *cq, uint32_t cqe_idx)
+ __attribute__((always_inline));
+static inline void mlx5_decompress_cqe_idx(struct mlx5_cq *cq, uint32_t cqe_idx)
+{
+ struct mlx5_cqe64 *title, *cqe;
+ struct mlx5_mini_cqe8 mini_array[8];
+ int cqe_cnt;
+ uint16_t wqe_cnt;
+ struct mlx5_resource *cur_rsc;
+ int mp_rq;
+
+ cqe = get_cqe(cq, cqe_idx & cq->ibv_cq.cqe);
+ title = cqe;
+ memcpy(mini_array, get_cqe(cq, (cqe_idx + 1) & cq->ibv_cq.cqe), sizeof(*title));
+ cqe_cnt = ntohl(title->byte_cnt);
+ wqe_cnt = ntohs(title->wqe_counter);
+ cur_rsc = find_rsc(cq, title, (to_mctx(cq->ibv_cq.context))->cqe_version);
+ mp_rq = cur_rsc ? cur_rsc->type == MLX5_RSC_TYPE_MP_RWQ : 0;
+
+ for (; cqe_cnt > 7; cqe_idx += 8, cqe_cnt -= 8) {
+ copy_cqes(cq, mini_array, title, 8, &wqe_cnt, cqe_idx, mp_rq);
+ cqe = get_cqe(cq, (cqe_idx + 8) & cq->ibv_cq.cqe);
+ memcpy(mini_array, cqe, sizeof(*title));
+ }
+
+ copy_cqes(cq, mini_array, title, cqe_cnt, &wqe_cnt, cqe_idx, mp_rq);
+}
+
+static inline void mlx5_decompress_cqe(struct mlx5_cq *cq)
+ __attribute__((always_inline));
+static inline void mlx5_decompress_cqe(struct mlx5_cq *cq)
+{
+ mlx5_decompress_cqe_idx(cq, cq->cons_index);
+}
+
+static inline int mlx5_poll_one(struct mlx5_cq *cq,
+ struct mlx5_resource **cur_rsc,
+ struct mlx5_srq **cur_srq, struct ibv_exp_wc *wc,
+ uint32_t wc_size,
+ int cqe_ver) __attribute__((always_inline));
+static inline int mlx5_poll_one(struct mlx5_cq *cq,
+ struct mlx5_resource **cur_rsc,
+ struct mlx5_srq **cur_srq,
+ struct ibv_exp_wc *wc,
+ uint32_t wc_size,
+ int cqe_ver)
+{
+ struct mlx5_cqe64 *cqe64;
+ struct mlx5_wq *wq;
+ uint16_t wqe_ctr;
+ void *cqe;
+ uint32_t rsn;
+ uint32_t srqn_uidx;
+ int idx;
+ uint8_t opcode;
+ struct mlx5_err_cqe *ecqe;
+ int err;
+ int requestor;
+ int responder;
+ int is_srq = 0;
+ struct mlx5_context *mctx = to_mctx(cq->ibv_cq.context);
+ struct mlx5_qp *mqp = NULL;
+ struct mlx5_rwq *rwq = NULL;
+ struct mlx5_dct *mdct;
+ uint64_t exp_wc_flags = 0;
+ enum mlx5_rsc_type type = MLX5_RSC_TYPE_INVAL;
+ int cqe_format;
+ uint8_t l3_hdr;
+ int timestamp_en = cq->creation_flags &
+ MLX5_CQ_CREATION_FLAG_COMPLETION_TIMESTAMP;
+
+ cqe64 = next_cqe_sw(cq);
+ if (!cqe64)
+ return CQ_EMPTY;
+
+ cqe_format = mlx5_get_cqe_format(cqe64);
+ if (unlikely(cqe_format == MLX5_COMPRESSED)) {
+ mlx5_decompress_cqe(cq);
+ timestamp_en = 0;
+ }
+
+ ++cq->cons_index;
+
+ /*
+ * Make sure we read CQ entry contents after we've checked the
+ * ownership bit.
+ */
+ rmb();
+
+#ifdef MLX5_DEBUG
+ if (mlx5_debug_mask & MLX5_DBG_CQ_CQE) {
+ FILE *fp = mctx->dbg_fp;
+
+ mlx5_dbg(fp, MLX5_DBG_CQ_CQE, "dump cqe for cqn 0x%x:\n", cq->cqn);
+ dump_cqe(fp, cqe64);
+ }
+#endif
+
+ ((struct ibv_wc *)wc)->wc_flags = 0;
+ opcode = cqe64->op_own >> 4;
+ requestor = is_requestor(opcode);
+ responder = is_responder(opcode);
+ if (unlikely(!requestor && !responder))
+ return CQ_POLL_ERR;
+
+ rsn = ntohl(cqe64->sop_drop_qpn) & 0xffffff;
+ srqn_uidx = ntohl(cqe64->srqn_uidx) & 0xffffff;
+ if (cqe_ver) {
+ if (!*cur_rsc || (srqn_uidx != (*cur_rsc)->rsn)) {
+ *cur_rsc = mlx5_find_uidx(mctx, srqn_uidx);
+ if (unlikely(!*cur_rsc))
+ return CQ_POLL_ERR;
+ }
+ } else {
+ if (responder && srqn_uidx) {
+ is_srq = 1;
+ if (!*cur_srq || (srqn_uidx != (*cur_srq)->srqn)) {
+ *cur_srq = mlx5_find_srq(mctx, srqn_uidx);
+ if (unlikely(!*cur_srq))
+ return CQ_POLL_ERR;
+ }
+ }
+
+ if (!*cur_rsc || (rsn != (*cur_rsc)->rsn)) {
+ *cur_rsc = mlx5_find_rsc(mctx, rsn);
+ if (unlikely(!*cur_rsc && !srqn_uidx))
+ return CQ_POLL_ERR;
+ }
+ }
+
+ if (*cur_rsc) {
+ switch ((*cur_rsc)->type) {
+ case MLX5_RSC_TYPE_QP:
+ mqp = (struct mlx5_qp *)*cur_rsc;
+ if (likely(offsetof(struct ibv_exp_wc, qp) < wc_size)) {
+ wc->qp = &mqp->verbs_qp.qp;
+ exp_wc_flags |= IBV_EXP_WC_QP;
+ }
+ if (cqe_ver && responder && mqp->verbs_qp.qp.srq) {
+ *cur_srq = to_msrq(mqp->verbs_qp.qp.srq);
+ is_srq = 1;
+ }
+ break;
+ case MLX5_RSC_TYPE_DCT:
+ mdct = (struct mlx5_dct *)*cur_rsc;
+ is_srq = 1;
+ if (likely(offsetof(struct ibv_exp_wc, dct) < wc_size)) {
+ wc->dct = &mdct->ibdct;
+ exp_wc_flags |= IBV_EXP_WC_DCT;
+ }
+
+ if (cqe_ver)
+ *cur_srq = to_msrq(mdct->ibdct.srq);
+ break;
+ case MLX5_RSC_TYPE_XSRQ:
+ *cur_srq = (struct mlx5_srq *)*cur_rsc;
+ is_srq = 1;
+ break;
+ case MLX5_RSC_TYPE_RWQ:
+ case MLX5_RSC_TYPE_MP_RWQ:
+ rwq = (struct mlx5_rwq *)*cur_rsc;
+ break;
+ default:
+ return CQ_POLL_ERR;
+ }
+ type = (*cur_rsc)->type;
+ }
+
+ if (is_srq && likely(offsetof(struct ibv_exp_wc, srq) < wc_size)) {
+ wc->srq = &(*cur_srq)->vsrq.srq;
+ exp_wc_flags |= IBV_EXP_WC_SRQ;
+ }
+
+ wc->qp_num = rsn;
+
+ switch (opcode) {
+ case MLX5_CQE_REQ:
+ if (unlikely(!mqp)) {
+			fprintf(stderr, "requestor completion is not associated with a QP\n");
+ return CQ_POLL_ERR;
+ }
+ wq = &mqp->sq;
+ wqe_ctr = ntohs(cqe64->wqe_counter);
+ idx = wqe_ctr & (wq->wqe_cnt - 1);
+ handle_good_req((struct ibv_wc *)wc, cqe64);
+ if (cqe_format == MLX5_INLINE_DATA32_SEG) {
+ cqe = (cq->cqe_sz == 64) ? cqe64 : cqe64 - 1;
+ err = mlx5_copy_to_send_wqe(mqp, wqe_ctr, cqe,
+ wc->byte_len);
+ } else if (cqe_format == MLX5_INLINE_DATA64_SEG) {
+ cqe = cqe64 - 1;
+ err = mlx5_copy_to_send_wqe(mqp, wqe_ctr, cqe - 1,
+ wc->byte_len);
+ } else {
+ err = 0;
+ }
+
+ wc->wr_id = wq->wrid[idx];
+ wq->tail = mqp->gen_data.wqe_head[idx] + 1;
+ wc->status = err;
+ break;
+ case MLX5_CQE_RESP_WR_IMM:
+ case MLX5_CQE_RESP_SEND:
+ case MLX5_CQE_RESP_SEND_IMM:
+ case MLX5_CQE_RESP_SEND_INV:
+ wc->status = handle_responder((struct ibv_wc *)wc, cqe64, mqp,
+ is_srq ? *cur_srq : NULL, type);
+ if (mqp &&
+ (mqp->gen_data.model_flags & MLX5_QP_MODEL_RX_CSUM_IP_OK_IP_NON_TCP_UDP)) {
+ l3_hdr = get_cqe_l3_hdr_type(cqe64);
+ exp_wc_flags |=
+ (!!(cqe64->hds_ip_ext & MLX5_CQE_L4_OK) *
+ (uint64_t)IBV_EXP_WC_RX_TCP_UDP_CSUM_OK) |
+ (!!(cqe64->hds_ip_ext & MLX5_CQE_L3_OK) *
+ (uint64_t)IBV_EXP_WC_RX_IP_CSUM_OK) |
+ ((l3_hdr == MLX5_CQE_L3_HDR_TYPE_IPV4) *
+ (uint64_t)IBV_EXP_WC_RX_IPV4_PACKET) |
+ ((l3_hdr == MLX5_CQE_L3_HDR_TYPE_IPV6) *
+ (uint64_t)IBV_EXP_WC_RX_IPV6_PACKET);
+ }
+ break;
+ case MLX5_CQE_RESIZE_CQ:
+ break;
+ case MLX5_CQE_REQ_ERR:
+ case MLX5_CQE_RESP_ERR:
+ ecqe = (struct mlx5_err_cqe *)cqe64;
+ mlx5_handle_error_cqe(ecqe, wc);
+ mlx5_set_bad_wc_opcode(wc, ecqe, (opcode == MLX5_CQE_REQ_ERR));
+ if (unlikely(ecqe->syndrome != MLX5_CQE_SYNDROME_WR_FLUSH_ERR &&
+ ecqe->syndrome != MLX5_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR)) {
+ FILE *fp = mctx->dbg_fp;
+ fprintf(fp, PFX "%s: got completion with error:\n",
+ mctx->hostname);
+ dump_cqe(fp, ecqe);
+ if (mlx5_freeze_on_error_cqe) {
+ fprintf(fp, PFX "freezing at poll cq...");
+ while (1)
+ sleep(10);
+ }
+ }
+
+ if (opcode == MLX5_CQE_REQ_ERR) {
+ wq = &mqp->sq;
+ wqe_ctr = ntohs(cqe64->wqe_counter);
+ idx = wqe_ctr & (wq->wqe_cnt - 1);
+ wc->wr_id = wq->wrid[idx];
+ wq->tail = mqp->gen_data.wqe_head[idx] + 1;
+ } else {
+ if (*cur_srq) {
+ wqe_ctr = ntohs(cqe64->wqe_counter);
+ wc->wr_id = (*cur_srq)->wrid[wqe_ctr];
+ mlx5_free_srq_wqe(*cur_srq, wqe_ctr);
+ } else {
+ if (rwq)
+ wq = &rwq->rq;
+ else
+ wq = &mqp->rq;
+ wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
+ ++wq->tail;
+ }
+ }
+ break;
+ }
+
+ if (unlikely(timestamp_en)) {
+ wc->timestamp = ntohll(cqe64->timestamp);
+ exp_wc_flags |= IBV_EXP_WC_WITH_TIMESTAMP;
+ }
+
+ if (likely(offsetof(struct ibv_exp_wc, exp_wc_flags) < wc_size))
+ wc->exp_wc_flags = exp_wc_flags | (uint64_t)((struct ibv_wc *)wc)->wc_flags;
+
+ return CQ_OK;
+}
+
+static inline int poll_cq(struct ibv_cq *ibcq, int ne, struct ibv_exp_wc *wc,
+ uint32_t wc_size, int cqe_ver) __attribute__((always_inline));
+static inline int poll_cq(struct ibv_cq *ibcq, int ne, struct ibv_exp_wc *wc,
+ uint32_t wc_size, int cqe_ver)
+{
+ struct mlx5_cq *cq = to_mcq(ibcq);
+ struct mlx5_resource *rsc = NULL;
+ struct mlx5_srq *srq = NULL;
+ int npolled;
+ int err = CQ_OK;
+ void *twc;
+
+ if (cq->stall_enable) {
+ if (cq->stall_adaptive_enable) {
+ if (cq->stall_last_count)
+ mlx5_stall_cycles_poll_cq(cq->stall_last_count + cq->stall_cycles);
+ } else if (cq->stall_next_poll) {
+ cq->stall_next_poll = 0;
+ mlx5_stall_poll_cq();
+ }
+ }
+
+ mlx5_lock(&cq->lock);
+
+ for (npolled = 0, twc = wc; npolled < ne; ++npolled, twc += wc_size) {
+ err = mlx5_poll_one(cq, &rsc, &srq, twc, wc_size, cqe_ver);
+ if (err != CQ_OK)
+ break;
+ }
+
+ mlx5_update_cons_index(cq);
+
+ mlx5_unlock(&cq->lock);
+
+ if (cq->stall_enable) {
+ if (cq->stall_adaptive_enable) {
+ if (npolled == 0) {
+ cq->stall_cycles = max(cq->stall_cycles-mlx5_stall_cq_dec_step,
+ mlx5_stall_cq_poll_min);
+ mlx5_get_cycles(&cq->stall_last_count);
+ } else if (npolled < ne) {
+ cq->stall_cycles = min(cq->stall_cycles+mlx5_stall_cq_inc_step,
+ mlx5_stall_cq_poll_max);
+ mlx5_get_cycles(&cq->stall_last_count);
+ } else {
+ cq->stall_cycles = max(cq->stall_cycles-mlx5_stall_cq_dec_step,
+ mlx5_stall_cq_poll_min);
+ cq->stall_last_count = 0;
+ }
+ } else if (err == CQ_EMPTY) {
+ cq->stall_next_poll = 1;
+ }
+ }
+
+ return err == CQ_POLL_ERR ? err : npolled;
+}
+
+int mlx5_poll_cq(struct ibv_cq *ibcq, int ne, struct ibv_wc *wc)
+{
+ return poll_cq(ibcq, ne, (struct ibv_exp_wc *)wc, sizeof(*wc), 0);
+}
+
+int mlx5_poll_cq_1(struct ibv_cq *ibcq, int ne, struct ibv_wc *wc)
+{
+ return poll_cq(ibcq, ne, (struct ibv_exp_wc *)wc, sizeof(*wc), 1);
+}
+
+int mlx5_poll_cq_ex(struct ibv_cq *ibcq, int ne,
+ struct ibv_exp_wc *wc, uint32_t wc_size)
+{
+ return poll_cq(ibcq, ne, wc, wc_size, 0);
+}
+
+int mlx5_poll_cq_ex_1(struct ibv_cq *ibcq, int ne,
+ struct ibv_exp_wc *wc, uint32_t wc_size)
+{
+ return poll_cq(ibcq, ne, wc, wc_size, 1);
+}
+
+int mlx5_arm_cq(struct ibv_cq *ibvcq, int solicited)
+{
+ struct mlx5_cq *cq = to_mcq(ibvcq);
+ struct mlx5_context *ctx = to_mctx(ibvcq->context);
+ uint32_t doorbell[2];
+ uint32_t sn;
+ uint32_t ci;
+ uint32_t cmd;
+
+ sn = cq->arm_sn & 3;
+ ci = cq->cons_index & 0xffffff;
+ cmd = solicited ? MLX5_CQ_DB_REQ_NOT_SOL : MLX5_CQ_DB_REQ_NOT;
+
+ cq->dbrec[MLX5_CQ_ARM_DB] = htonl(sn << 28 | cmd | ci);
+
+ /*
+ * Make sure that the doorbell record in host memory is
+ * written before ringing the doorbell via PCI MMIO.
+ */
+ wmb();
+
+ doorbell[0] = htonl(sn << 28 | cmd | ci);
+ doorbell[1] = htonl(cq->cqn);
+
+ mlx5_write64(doorbell, ctx->uar[0].regs + MLX5_CQ_DOORBELL, &ctx->lock32);
+
+ wc_wmb();
+
+ return 0;
+}
+
+void mlx5_cq_event(struct ibv_cq *cq)
+{
+ to_mcq(cq)->arm_sn++;
+}
+
+static int is_equal_rsn(struct mlx5_cqe64 *cqe64, uint32_t rsn)
+{
+ return rsn == (ntohl(cqe64->sop_drop_qpn) & 0xffffff);
+}
+
+static int is_equal_uidx(struct mlx5_cqe64 *cqe64, uint32_t uidx)
+{
+ return uidx == (ntohl(cqe64->srqn_uidx) & 0xffffff);
+}
+
+static inline int free_res_cqe(struct mlx5_cqe64 *cqe64, uint32_t rsn_uidx,
+ struct mlx5_srq *srq, int cqe_version)
+{
+ if (cqe_version) {
+ if (is_equal_uidx(cqe64, rsn_uidx)) {
+ if (srq && is_responder(cqe64->op_own >> 4))
+ mlx5_free_srq_wqe(srq,
+ ntohs(cqe64->wqe_counter));
+ return 1;
+ }
+ } else {
+ if (is_equal_rsn(cqe64, rsn_uidx)) {
+ if (srq && (ntohl(cqe64->srqn_uidx) & 0xffffff))
+ mlx5_free_srq_wqe(srq,
+ ntohs(cqe64->wqe_counter));
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+void __mlx5_cq_clean(struct mlx5_cq *cq, uint32_t rsn_uidx, struct mlx5_srq *srq)
+{
+ uint32_t prod_index;
+ int nfreed = 0;
+ struct mlx5_cqe64 *cqe64, *dest64;
+ void *cqe, *dest;
+ uint8_t owner_bit;
+ int cqe_version;
+
+ if (!cq)
+ return;
+
+ /*
+ * First we need to find the current producer index, so we
+ * know where to start cleaning from. It doesn't matter if HW
+ * adds new entries after this loop -- the QP we're worried
+ * about is already in RESET, so the new entries won't come
+ * from our QP and therefore don't need to be checked.
+ */
+ cqe_version = (to_mctx(cq->ibv_cq.context))->cqe_version;
+ for (prod_index = cq->cons_index; (cqe = get_sw_cqe(cq, prod_index)); ++prod_index) {
+ if (mlx5_get_cqe_format(cqe) == MLX5_COMPRESSED)
+ mlx5_decompress_cqe_idx(cq, prod_index);
+
+ if (prod_index == cq->cons_index + cq->ibv_cq.cqe)
+ break;
+ }
+
+ /*
+ * Now sweep backwards through the CQ, removing CQ entries
+ * that match our QP by copying older entries on top of them.
+ */
+ while ((int) --prod_index - (int) cq->cons_index >= 0) {
+ cqe = get_cqe(cq, prod_index & cq->ibv_cq.cqe);
+ cqe64 = (cq->cqe_sz == 64) ? cqe : cqe + 64;
+ if (free_res_cqe(cqe64, rsn_uidx, srq, cqe_version)) {
+ ++nfreed;
+ } else if (nfreed) {
+ dest = get_cqe(cq, (prod_index + nfreed) & cq->ibv_cq.cqe);
+ dest64 = (cq->cqe_sz == 64) ? dest : dest + 64;
+ owner_bit = dest64->op_own & MLX5_CQE_OWNER_MASK;
+ memcpy(dest, cqe, cq->cqe_sz);
+ dest64->op_own = owner_bit |
+ (dest64->op_own & ~MLX5_CQE_OWNER_MASK);
+ }
+ }
+
+ if (nfreed) {
+ cq->cons_index += nfreed;
+ /*
+ * Make sure update of buffer contents is done before
+ * updating consumer index.
+ */
+ wmb();
+ mlx5_update_cons_index(cq);
+ }
+}
+
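The backward sweep in `__mlx5_cq_clean()` above drops every CQE belonging to the destroyed resource and slides the surviving older entries toward the producer, taking care to keep each destination slot's owner bit unchanged. As a minimal standalone sketch (not part of the patch; the `demo_` names and the single-word "CQE" model with the owner bit in bit 31 are invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_OWNER_BIT 0x80000000u

/* Model of the backward sweep in __mlx5_cq_clean(): entries in
 * [cons, prod) whose id matches 'key' are dropped; each surviving
 * older entry is copied nfreed slots toward the producer, and the
 * destination slot's owner bit (bit 31 here) is preserved so the
 * hardware/software ownership protocol is not disturbed.
 * 'mask' is the ring mask (size - 1, power of two). Returns the
 * number of freed entries, by which the consumer index would
 * then be advanced. */
static int demo_cq_clean(uint32_t *ring, unsigned mask,
			 unsigned cons, unsigned prod, uint32_t key)
{
	int nfreed = 0;
	unsigned idx;

	while ((int)--prod - (int)cons >= 0) {
		idx = prod & mask;
		if ((ring[idx] & ~DEMO_OWNER_BIT) == key) {
			++nfreed;
		} else if (nfreed) {
			unsigned didx = (prod + nfreed) & mask;
			uint32_t owner = ring[didx] & DEMO_OWNER_BIT;

			/* copy payload forward, keep destination owner bit */
			ring[didx] = owner | (ring[idx] & ~DEMO_OWNER_BIT);
		}
	}
	return nfreed;
}
```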
+void mlx5_cq_clean(struct mlx5_cq *cq, uint32_t qpn, struct mlx5_srq *srq)
+{
+ mlx5_lock(&cq->lock);
+ __mlx5_cq_clean(cq, qpn, srq);
+ mlx5_unlock(&cq->lock);
+}
+
+static uint8_t sw_ownership_bit(int n, int nent)
+{
+ return (n & nent) ? 1 : 0;
+}
+
+static int is_hw(uint8_t own, int n, int mask)
+{
+ return (own & MLX5_CQE_OWNER_MASK) ^ !!(n & (mask + 1));
+}
+
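The `sw_ownership_bit()` / `is_hw()` pair above encodes the CQE ownership handshake: the owner bit is expected to match the wrap-count parity of the consumer index, so software can tell a fresh hardware-written entry from a stale one without extra state. A minimal standalone sketch of the same arithmetic (the `demo_` names are invented; `nent` must be a power of two):

```c
#include <assert.h>
#include <stdint.h>

/* Expected owner-bit value for index n in a CQ of nent entries:
 * it flips each time the index wraps the ring. */
static uint8_t demo_sw_ownership_bit(int n, int nent)
{
	return (n & nent) ? 1 : 0;
}

/* Non-zero when the entry at index n (ring mask = nent - 1) is
 * still hardware-owned, i.e. its owner bit disagrees with the
 * wrap parity software expects. */
static int demo_is_hw(uint8_t own, int n, int mask)
{
	return (own & 1) ^ !!(n & (mask + 1));
}
```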
+void mlx5_cq_resize_copy_cqes(struct mlx5_cq *cq)
+{
+ struct mlx5_cqe64 *scqe64;
+ struct mlx5_cqe64 *dcqe64;
+ void *start_cqe;
+ void *scqe;
+ void *dcqe;
+ int ssize;
+ int dsize;
+ int i;
+ uint8_t sw_own;
+
+ ssize = cq->cqe_sz;
+ dsize = cq->resize_cqe_sz;
+
+ i = cq->cons_index;
+ scqe = get_buf_cqe(cq->active_buf, i & cq->active_cqes, ssize);
+ scqe64 = ssize == 64 ? scqe : scqe + 64;
+ start_cqe = scqe;
+ if (is_hw(scqe64->op_own, i, cq->active_cqes)) {
+ fprintf(stderr, "expected cqe in sw ownership\n");
+ return;
+ }
+
+ while ((scqe64->op_own >> 4) != MLX5_CQE_RESIZE_CQ) {
+ dcqe = get_buf_cqe(cq->resize_buf, (i + 1) & (cq->resize_cqes - 1), dsize);
+ dcqe64 = dsize == 64 ? dcqe : dcqe + 64;
+ sw_own = sw_ownership_bit(i + 1, cq->resize_cqes);
+ memcpy(dcqe, scqe, ssize);
+ dcqe64->op_own = (dcqe64->op_own & ~MLX5_CQE_OWNER_MASK) | sw_own;
+
+ ++i;
+ scqe = get_buf_cqe(cq->active_buf, i & cq->active_cqes, ssize);
+ scqe64 = ssize == 64 ? scqe : scqe + 64;
+ if (is_hw(scqe64->op_own, i, cq->active_cqes)) {
+ fprintf(stderr, "expected cqe in sw ownership\n");
+ return;
+ }
+
+ if (scqe == start_cqe) {
+ fprintf(stderr, "resize CQ failed to get resize CQE\n");
+ return;
+ }
+ }
+ ++cq->cons_index;
+}
+
+int mlx5_alloc_cq_buf(struct mlx5_context *mctx, struct mlx5_cq *cq,
+ struct mlx5_buf *buf, int nent, int cqe_sz)
+{
+ struct mlx5_cqe64 *cqe;
+ int i;
+ struct mlx5_device *dev = to_mdev(mctx->ibv_ctx.device);
+ int ret;
+ enum mlx5_alloc_type type;
+ enum mlx5_alloc_type default_type = MLX5_ALLOC_TYPE_PREFER_CONTIG;
+
+ if (mlx5_use_huge(&mctx->ibv_ctx, "HUGE_CQ"))
+ default_type = MLX5_ALLOC_TYPE_HUGE;
+
+ mlx5_get_alloc_type(&mctx->ibv_ctx, MLX5_CQ_PREFIX, &type, default_type);
+
+ buf->numa_req.valid = 1;
+ buf->numa_req.numa_id = mlx5_cpu_local_numa();
+ ret = mlx5_alloc_prefered_buf(mctx, buf,
+ align(nent * cqe_sz, dev->page_size),
+ dev->page_size,
+ type,
+ MLX5_CQ_PREFIX);
+
+ if (ret)
+ return -1;
+
+ memset(buf->buf, 0, nent * cqe_sz);
+
+ for (i = 0; i < nent; ++i) {
+ cqe = buf->buf + i * cqe_sz;
+ cqe += cqe_sz == 128 ? 1 : 0;
+ cqe->op_own = MLX5_CQE_INVALID << 4;
+ }
+
+ return 0;
+}
+
+int mlx5_free_cq_buf(struct mlx5_context *ctx, struct mlx5_buf *buf)
+{
+ return mlx5_free_actual_buf(ctx, buf);
+}
+
+/*
+ * poll family functions
+ */
+static inline int32_t poll_cnt(struct ibv_cq *ibcq, uint32_t max_entries,
+ const int use_lock, const int cqe_sz,
+ const int cqe_ver) __attribute__((always_inline));
+static inline int32_t poll_cnt(struct ibv_cq *ibcq, uint32_t max_entries,
+ const int use_lock, const int cqe_sz,
+ const int cqe_ver)
+{
+ struct mlx5_cq *cq = to_mcq(ibcq);
+ struct mlx5_resource *cur_rsc = NULL;
+ struct mlx5_cqe64 *cqe64;
+ struct mlx5_qp *mqp;
+ int err = CQ_OK;
+ uint16_t wqe_ctr;
+ int npolled;
+
+ if (unlikely(use_lock))
+ mlx5_lock(&cq->lock);
+
+ for (npolled = 0; npolled < max_entries; ++npolled) {
+ cqe64 = get_next_cqe(cq, cqe_sz);
+ if (!cqe64) {
+ err = CQ_EMPTY;
+ break;
+ }
+
+ if (unlikely(mlx5_get_cqe_format(cqe64) == MLX5_COMPRESSED))
+ mlx5_decompress_cqe(cq);
+
+ cur_rsc = find_rsc(cq, cqe64, cqe_ver);
+ if (unlikely(!cur_rsc)) {
+ err = CQ_POLL_ERR;
+ fprintf(stderr, "Failed to find send QP on poll_cnt\n");
+ break;
+ }
+ mqp = (struct mlx5_qp *)cur_rsc;
+ if (likely((cqe64->op_own >> 4) == MLX5_CQE_REQ)) {
+ wqe_ctr = ntohs(cqe64->wqe_counter);
+ mqp->sq.tail = mqp->gen_data.wqe_head[wqe_ctr & (mqp->sq.wqe_cnt - 1)] + 1;
+ } else if ((cqe64->op_own >> 4) == MLX5_CQE_RESP_SEND) {
+ ++mqp->rq.tail;
+ } else {
+ err = CQ_POLL_ERR;
+ if ((cqe64->op_own >> 4) == MLX5_CQE_REQ_ERR)
+ fprintf(stderr, "MLX5_CQE_REQ_ERR received on poll_cnt\n");
+ else
+ fprintf(stderr, "Non requester message received on poll_cnt\n");
+ }
+
+ if (unlikely(err != CQ_OK))
+ break;
+
+ ++cq->cons_index;
+ }
+
+ if (likely(npolled)) {
+ mlx5_update_cons_index(cq);
+ err = CQ_OK;
+ }
+
+ if (unlikely(use_lock))
+ mlx5_unlock(&cq->lock);
+
+ return err == CQ_POLL_ERR ? -1 : npolled;
+}
+
+static inline int32_t get_rx_offloads_flags(struct mlx5_cqe64 *cqe) __attribute__((always_inline));
+static inline int32_t get_rx_offloads_flags(struct mlx5_cqe64 *cqe)
+{
+ uint8_t l3_hdr;
+ int32_t flags;
+
+ l3_hdr = get_cqe_l3_hdr_type(cqe);
+ flags = (!!(cqe->hds_ip_ext & MLX5_CQE_L4_OK) * IBV_EXP_CQ_RX_TCP_UDP_CSUM_OK) |
+ (!!(cqe->hds_ip_ext & MLX5_CQE_L3_OK) * IBV_EXP_CQ_RX_IP_CSUM_OK) |
+ ((l3_hdr == MLX5_CQE_L3_HDR_TYPE_IPV4) * IBV_EXP_CQ_RX_IPV4_PACKET) |
+ ((l3_hdr == MLX5_CQE_L3_HDR_TYPE_IPV6) * IBV_EXP_CQ_RX_IPV6_PACKET);
+
+ return flags;
+}
+
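`get_rx_offloads_flags()` above builds the offload flag word without branches: each condition is normalized to 0/1 with `!!` (or an equality test) and multiplied by its flag constant, so the hot path emits no conditional jumps. A standalone sketch of the trick with invented `DEMO_*` flag values (the real `IBV_EXP_CQ_RX_*` values come from the verbs headers):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values only. */
enum {
	DEMO_L4_CSUM_OK  = 1 << 0,
	DEMO_L3_CSUM_OK  = 1 << 1,
	DEMO_IPV4_PACKET = 1 << 2,
	DEMO_IPV6_PACKET = 1 << 3,
};

/* Branchless flag construction: 0/1 conditions multiplied by
 * their flags and OR-ed together. */
static int32_t demo_rx_flags(int l4_ok, int l3_ok, int is_v4, int is_v6)
{
	return (!!l4_ok * DEMO_L4_CSUM_OK) |
	       (!!l3_ok * DEMO_L3_CSUM_OK) |
	       (!!is_v4 * DEMO_IPV4_PACKET) |
	       (!!is_v6 * DEMO_IPV6_PACKET);
}
```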
+static inline int32_t poll_length(struct ibv_cq *ibcq, void *buf, uint32_t *inl,
+ const int use_lock, const int cqe_sz,
+ uint32_t *offset, uint32_t *flags, const int cqe_ver) __attribute__((always_inline));
+static inline int32_t poll_length(struct ibv_cq *ibcq, void *buf, uint32_t *inl,
+ const int use_lock, const int cqe_sz,
+ uint32_t *offset, uint32_t *flags, const int cqe_ver)
+{
+ struct mlx5_cq *cq = to_mcq(ibcq);
+ struct mlx5_resource *cur_rsc = NULL;
+ struct mlx5_cqe64 *cqe64;
+ struct mlx5_qp *mqp = NULL;
+ struct mlx5_rwq *rwq = NULL;
+ int32_t size = 0;
+ uint16_t wqe_ctr;
+ int err = CQ_OK;
+ int cqe_format;
+
+ if (unlikely(use_lock))
+ mlx5_lock(&cq->lock);
+
+ cqe64 = get_next_cqe(cq, cqe_sz);
+
+ if (cqe64) {
+ cqe_format = mlx5_get_cqe_format(cqe64);
+ if (unlikely(cqe_format == MLX5_COMPRESSED)) {
+ mlx5_decompress_cqe(cq);
+ cqe_format = 0;
+ }
+
+ if (unlikely((cqe64->op_own >> 4) != MLX5_CQE_RESP_SEND)) {
+ if (cqe64->op_own >> 4 == MLX5_CQE_RESP_ERR)
+ fprintf(stderr, "poll_length, CQE response error, syndrome=0x%x, vendor syndrome error=0x%x, HW syndrome 0x%x, HW syndrome type 0x%x\n",
+ ((struct mlx5_err_cqe *)cqe64)->syndrome, ((struct mlx5_err_cqe *)cqe64)->vendor_err_synd,
+ ((struct mlx5_err_cqe *)cqe64)->hw_err_synd, ((struct mlx5_err_cqe *)cqe64)->hw_synd_type);
+ else
+ fprintf(stderr, "Only post-receive completion supported on poll_length, op=%u\n",
+ cqe64->op_own >> 4);
+ err = CQ_POLL_ERR;
+ goto out;
+ }
+ cur_rsc = find_rsc(cq, cqe64, cqe_ver);
+ if (unlikely(!cur_rsc)) {
+ fprintf(stderr, "Failed to find QP resource on poll_length\n");
+ err = CQ_POLL_ERR;
+ goto out;
+ }
+
+ if (cur_rsc->type == MLX5_RSC_TYPE_MP_RWQ) {
+ uint32_t byte_cnt;
+ uint16_t wqe_id;
+
+ if (unlikely(!offset)) {
+ fprintf(stderr, "Can't handle Multi-Packet RQ completion since"
+ " 'offset' output parameter is not provided\n");
+ err = CQ_POLL_ERR;
+ goto out;
+ }
+ rwq = (struct mlx5_rwq *)cur_rsc;
+
+ byte_cnt = ntohl(cqe64->byte_cnt);
+ wqe_id = ntohs(cqe64->wqe_id) & (rwq->rq.wqe_cnt - 1);
+			/* Add the strides consumed by this CQE to the WQE's consumed-strides counter */
+ rwq->consumed_strides_counter[wqe_id] += (byte_cnt & MP_RQ_NUM_STRIDES_FIELD_MASK) >>
+ MP_RQ_NUM_STRIDES_FIELD_SHIFT;
+
+			/* Update RX offload flags */
+ if (rwq->model_flags & MLX5_WQ_MODEL_RX_CSUM_IP_OK_IP_NON_TCP_UDP)
+ *flags = get_rx_offloads_flags(cqe64);
+ else
+ *flags = 0;
+ /* If last packet for receive WR (all strides of this WQE consumed) */
+ if (rwq->consumed_strides_counter[wqe_id] == rwq->mp_rq_strides_in_wqe) {
+ *flags |= IBV_EXP_CQ_RX_MULTI_PACKET_LAST_V1;
+ ++rwq->rq.tail; /* Update the rq tail */
+ rwq->consumed_strides_counter[wqe_id] = 0;
+ }
+
+ if (byte_cnt & MP_RQ_FILLER_FIELD_MASK)
+ /*
+				 * In case of a filler CQE the application gets a WC with message-size = 0.
+				 * A filler CQE may arrive at any time regardless of the last-packet indication.
+ */
+ size = 0;
+ else /* not a filler CQE */
+ size = (byte_cnt & MP_RQ_BYTE_CNT_FIELD_MASK) - rwq->mp_rq_packet_padding;
+
+ /*
+ * In mp_rq wqe_counter provides the WQE stride index.
+ * We use it to calculate packet offset in the WR posted buffer.
+ */
+ *offset = ntohs(cqe64->wqe_counter) * rwq->mp_rq_stride_size + rwq->mp_rq_packet_padding;
+ } else {
+ if (cur_rsc->type == MLX5_RSC_TYPE_QP) {
+ mqp = (struct mlx5_qp *)cur_rsc;
+ if (flags) {
+ if (mqp->gen_data.model_flags & MLX5_QP_MODEL_RX_CSUM_IP_OK_IP_NON_TCP_UDP)
+ *flags = get_rx_offloads_flags(cqe64);
+ else
+ *flags = 0;
+ }
+ } else {
+ if (likely(cur_rsc->type == MLX5_RSC_TYPE_RWQ)) {
+ rwq = (struct mlx5_rwq *)cur_rsc;
+ } else {
+ fprintf(stderr, "Invalid resource type(%d) on poll_length\n", cur_rsc->type);
+ err = CQ_POLL_ERR;
+ goto out;
+ }
+ if (flags) {
+ if (rwq->model_flags & MLX5_WQ_MODEL_RX_CSUM_IP_OK_IP_NON_TCP_UDP)
+ *flags = get_rx_offloads_flags(cqe64);
+ else
+ *flags = 0;
+ }
+ }
+
+ size = ntohl(cqe64->byte_cnt);
+
+ if (unlikely(cqe_format)) {
+ void *data = (cqe_format == MLX5_INLINE_DATA32_SEG) ? cqe64 : cqe64 - 1;
+
+ if (buf) {
+ *inl = 1;
+ memcpy(buf, data, size);
+ } else {
+ wqe_ctr = mqp->rq.tail & (mqp->rq.wqe_cnt - 1);
+ if (unlikely(mlx5_copy_to_recv_wqe(mqp, wqe_ctr, data, size))) {
+					fprintf(stderr, "Failed to copy inline receive message to receive buffer\n");
+ err = CQ_POLL_ERR;
+ goto out;
+ }
+ }
+ }
+ if (!rwq)
+ ++mqp->rq.tail;
+ else
+ ++rwq->rq.tail;
+ }
+
+ ++cq->cons_index;
+ mlx5_update_cons_index(cq);
+ } else {
+ err = CQ_EMPTY;
+ if (flags)
+ *flags = 0;
+ }
+
+out:
+ if (unlikely(use_lock))
+ mlx5_unlock(&cq->lock);
+
+ return err == CQ_POLL_ERR ? -1 : size;
+}
+
+int32_t mlx5_poll_cnt_safe(struct ibv_cq *ibcq, uint32_t max) __MLX5_ALGN_F__;
+int32_t mlx5_poll_cnt_safe(struct ibv_cq *ibcq, uint32_t max)
+{
+ struct mlx5_cq *cq = to_mcq(ibcq);
+ struct mlx5_context *mctx = to_mctx(cq->ibv_cq.context);
+
+ return poll_cnt(ibcq, max, 1, cq->cqe_sz, mctx->cqe_version == 1);
+}
+
+int32_t mlx5_poll_cnt_unsafe_cqe64(struct ibv_cq *ibcq, uint32_t max) __MLX5_ALGN_F__;
+int32_t mlx5_poll_cnt_unsafe_cqe64(struct ibv_cq *ibcq, uint32_t max)
+{
+ return poll_cnt(ibcq, max, 0, 64, 0);
+}
+
+int32_t mlx5_poll_cnt_unsafe_cqe128(struct ibv_cq *ibcq, uint32_t max) __MLX5_ALGN_F__;
+int32_t mlx5_poll_cnt_unsafe_cqe128(struct ibv_cq *ibcq, uint32_t max)
+{
+ return poll_cnt(ibcq, max, 0, 128, 0);
+}
+
+int32_t mlx5_poll_cnt_unsafe_cqe64_v1(struct ibv_cq *ibcq, uint32_t max) __MLX5_ALGN_F__;
+int32_t mlx5_poll_cnt_unsafe_cqe64_v1(struct ibv_cq *ibcq, uint32_t max)
+{
+ return poll_cnt(ibcq, max, 0, 64, 1);
+}
+
+int32_t mlx5_poll_cnt_unsafe_cqe128_v1(struct ibv_cq *ibcq, uint32_t max) __MLX5_ALGN_F__;
+int32_t mlx5_poll_cnt_unsafe_cqe128_v1(struct ibv_cq *ibcq, uint32_t max)
+{
+ return poll_cnt(ibcq, max, 0, 128, 1);
+}
+
+int32_t mlx5_poll_length_safe(struct ibv_cq *ibcq, void *buf, uint32_t *inl) __MLX5_ALGN_F__;
+int32_t mlx5_poll_length_safe(struct ibv_cq *ibcq, void *buf, uint32_t *inl)
+{
+ struct mlx5_cq *cq = to_mcq(ibcq);
+ struct mlx5_context *mctx = to_mctx(cq->ibv_cq.context);
+
+ return poll_length(ibcq, buf, inl, 1, cq->cqe_sz, NULL, NULL, mctx->cqe_version == 1);
+}
+
+int32_t mlx5_poll_length_unsafe_cqe64(struct ibv_cq *cq, void *buf, uint32_t *inl) __MLX5_ALGN_F__;
+int32_t mlx5_poll_length_unsafe_cqe64(struct ibv_cq *cq, void *buf, uint32_t *inl)
+{
+ return poll_length(cq, buf, inl, 0, 64, NULL, NULL, 0);
+}
+
+int32_t mlx5_poll_length_unsafe_cqe128(struct ibv_cq *cq, void *buf, uint32_t *inl) __MLX5_ALGN_F__;
+int32_t mlx5_poll_length_unsafe_cqe128(struct ibv_cq *cq, void *buf, uint32_t *inl)
+{
+ return poll_length(cq, buf, inl, 0, 128, NULL, NULL, 0);
+}
+
+int32_t mlx5_poll_length_unsafe_cqe64_v1(struct ibv_cq *cq, void *buf, uint32_t *inl) __MLX5_ALGN_F__;
+int32_t mlx5_poll_length_unsafe_cqe64_v1(struct ibv_cq *cq, void *buf, uint32_t *inl)
+{
+ return poll_length(cq, buf, inl, 0, 64, NULL, NULL, 1);
+}
+
+int32_t mlx5_poll_length_unsafe_cqe128_v1(struct ibv_cq *cq, void *buf, uint32_t *inl) __MLX5_ALGN_F__;
+int32_t mlx5_poll_length_unsafe_cqe128_v1(struct ibv_cq *cq, void *buf, uint32_t *inl)
+{
+ return poll_length(cq, buf, inl, 0, 128, NULL, NULL, 1);
+}
+
+/* Poll length flags */
+int32_t mlx5_poll_length_flags_safe(struct ibv_cq *ibcq, void *buf, uint32_t *inl, uint32_t *flags) __MLX5_ALGN_F__;
+int32_t mlx5_poll_length_flags_safe(struct ibv_cq *ibcq, void *buf, uint32_t *inl, uint32_t *flags)
+{
+ struct mlx5_cq *cq = to_mcq(ibcq);
+ struct mlx5_context *mctx = to_mctx(cq->ibv_cq.context);
+
+ return poll_length(ibcq, buf, inl, 1, cq->cqe_sz, NULL, flags, mctx->cqe_version == 1);
+}
+
+int32_t mlx5_poll_length_flags_unsafe_cqe64(struct ibv_cq *cq, void *buf, uint32_t *inl, uint32_t *flags) __MLX5_ALGN_F__;
+int32_t mlx5_poll_length_flags_unsafe_cqe64(struct ibv_cq *cq, void *buf, uint32_t *inl, uint32_t *flags)
+{
+ return poll_length(cq, buf, inl, 0, 64, NULL, flags, 0);
+}
+
+int32_t mlx5_poll_length_flags_unsafe_cqe128(struct ibv_cq *cq, void *buf, uint32_t *inl, uint32_t *flags) __MLX5_ALGN_F__;
+int32_t mlx5_poll_length_flags_unsafe_cqe128(struct ibv_cq *cq, void *buf, uint32_t *inl, uint32_t *flags)
+{
+ return poll_length(cq, buf, inl, 0, 128, NULL, flags, 0);
+}
+
+int32_t mlx5_poll_length_flags_unsafe_cqe64_v1(struct ibv_cq *cq, void *buf, uint32_t *inl, uint32_t *flags) __MLX5_ALGN_F__;
+int32_t mlx5_poll_length_flags_unsafe_cqe64_v1(struct ibv_cq *cq, void *buf, uint32_t *inl, uint32_t *flags)
+{
+ return poll_length(cq, buf, inl, 0, 64, NULL, flags, 1);
+}
+
+int32_t mlx5_poll_length_flags_unsafe_cqe128_v1(struct ibv_cq *cq, void *buf, uint32_t *inl, uint32_t *flags) __MLX5_ALGN_F__;
+int32_t mlx5_poll_length_flags_unsafe_cqe128_v1(struct ibv_cq *cq, void *buf, uint32_t *inl, uint32_t *flags)
+{
+ return poll_length(cq, buf, inl, 0, 128, NULL, flags, 1);
+}
+
+/* Poll length flags MP RQ */
+int32_t mlx5_poll_length_flags_mp_rq_safe(struct ibv_cq *ibcq, uint32_t *offset, uint32_t *flags) __MLX5_ALGN_F__;
+int32_t mlx5_poll_length_flags_mp_rq_safe(struct ibv_cq *ibcq, uint32_t *offset, uint32_t *flags)
+{
+ struct mlx5_cq *cq = to_mcq(ibcq);
+ struct mlx5_context *mctx = to_mctx(cq->ibv_cq.context);
+
+ return poll_length(ibcq, NULL, NULL, 1, cq->cqe_sz, offset, flags, mctx->cqe_version == 1);
+}
+
+int32_t mlx5_poll_length_flags_mp_rq_unsafe_cqe64(struct ibv_cq *cq, uint32_t *offset, uint32_t *flags) __MLX5_ALGN_F__;
+int32_t mlx5_poll_length_flags_mp_rq_unsafe_cqe64(struct ibv_cq *cq, uint32_t *offset, uint32_t *flags)
+{
+ return poll_length(cq, NULL, NULL, 0, 64, offset, flags, 0);
+}
+
+int32_t mlx5_poll_length_flags_mp_rq_unsafe_cqe128(struct ibv_cq *cq, uint32_t *offset, uint32_t *flags) __MLX5_ALGN_F__;
+int32_t mlx5_poll_length_flags_mp_rq_unsafe_cqe128(struct ibv_cq *cq, uint32_t *offset, uint32_t *flags)
+{
+ return poll_length(cq, NULL, NULL, 0, 128, offset, flags, 0);
+}
+
+int32_t mlx5_poll_length_flags_mp_rq_unsafe_cqe64_v1(struct ibv_cq *cq, uint32_t *offset, uint32_t *flags) __MLX5_ALGN_F__;
+int32_t mlx5_poll_length_flags_mp_rq_unsafe_cqe64_v1(struct ibv_cq *cq, uint32_t *offset, uint32_t *flags)
+{
+ return poll_length(cq, NULL, NULL, 0, 64, offset, flags, 1);
+}
+
+int32_t mlx5_poll_length_flags_mp_rq_unsafe_cqe128_v1(struct ibv_cq *cq, uint32_t *offset, uint32_t *flags) __MLX5_ALGN_F__;
+int32_t mlx5_poll_length_flags_mp_rq_unsafe_cqe128_v1(struct ibv_cq *cq, uint32_t *offset, uint32_t *flags)
+{
+ return poll_length(cq, NULL, NULL, 0, 128, offset, flags, 1);
+}
+
+static struct ibv_exp_cq_family_v1 mlx5_poll_cq_family_safe = {
+ .poll_cnt = mlx5_poll_cnt_safe,
+ .poll_length = mlx5_poll_length_safe,
+ .poll_length_flags = mlx5_poll_length_flags_safe,
+ .poll_length_flags_mp_rq = mlx5_poll_length_flags_mp_rq_safe
+};
+
+enum mlx5_poll_cq_cqe_sizes {
+ MLX5_POLL_CQ_CQE_64 = 1,
+ MLX5_POLL_CQ_CQE_128 = 2,
+ MLX5_POLL_CQ_NUM_CQE_SIZES = 3,
+};
+
+static struct ibv_exp_cq_family_v1 mlx5_poll_cq_family_unsafe_tbl[MLX5_POLL_CQ_NUM_CQE_SIZES] = {
+ [MLX5_POLL_CQ_CQE_64] = {
+ .poll_cnt = mlx5_poll_cnt_unsafe_cqe64,
+ .poll_length = mlx5_poll_length_unsafe_cqe64,
+ .poll_length_flags = mlx5_poll_length_flags_unsafe_cqe64,
+ .poll_length_flags_mp_rq = mlx5_poll_length_flags_mp_rq_unsafe_cqe64
+
+ },
+ [MLX5_POLL_CQ_CQE_128] = {
+ .poll_cnt = mlx5_poll_cnt_unsafe_cqe128,
+ .poll_length = mlx5_poll_length_unsafe_cqe128,
+ .poll_length_flags = mlx5_poll_length_flags_unsafe_cqe128,
+ .poll_length_flags_mp_rq = mlx5_poll_length_flags_mp_rq_unsafe_cqe128
+
+ },
+};
+
+static struct ibv_exp_cq_family_v1 mlx5_poll_cq_family_unsafe_v1_tbl[MLX5_POLL_CQ_NUM_CQE_SIZES] = {
+ [MLX5_POLL_CQ_CQE_64] = {
+ .poll_cnt = mlx5_poll_cnt_unsafe_cqe64_v1,
+ .poll_length = mlx5_poll_length_unsafe_cqe64_v1,
+ .poll_length_flags = mlx5_poll_length_flags_unsafe_cqe64_v1,
+ .poll_length_flags_mp_rq = mlx5_poll_length_flags_mp_rq_unsafe_cqe64_v1
+ },
+ [MLX5_POLL_CQ_CQE_128] = {
+ .poll_cnt = mlx5_poll_cnt_unsafe_cqe128_v1,
+ .poll_length = mlx5_poll_length_unsafe_cqe128_v1,
+ .poll_length_flags = mlx5_poll_length_flags_unsafe_cqe128_v1,
+ .poll_length_flags_mp_rq = mlx5_poll_length_flags_mp_rq_unsafe_cqe128_v1
+ },
+};
+
+struct ibv_exp_cq_family_v1 *mlx5_get_poll_cq_family(struct mlx5_cq *cq,
+ struct ibv_exp_query_intf_params *params,
+ enum ibv_exp_query_intf_status *status)
+{
+ struct mlx5_context *mctx = to_mctx(cq->ibv_cq.context);
+ enum mlx5_poll_cq_cqe_sizes cqe_size;
+
+ if (params->intf_version > MLX5_MAX_CQ_FAMILY_VER) {
+ *status = IBV_EXP_INTF_STAT_VERSION_NOT_SUPPORTED;
+
+ return NULL;
+ }
+ if (params->flags) {
+ fprintf(stderr, PFX "Global interface flags(0x%x) are not supported for CQ family\n", params->flags);
+ *status = IBV_EXP_INTF_STAT_FLAGS_NOT_SUPPORTED;
+
+ return NULL;
+ }
+ if (params->family_flags) {
+ fprintf(stderr, PFX "Family flags(0x%x) are not supported for CQ family\n", params->family_flags);
+ *status = IBV_EXP_INTF_STAT_FAMILY_FLAGS_NOT_SUPPORTED;
+
+ return NULL;
+ }
+ if (cq->model_flags & MLX5_CQ_MODEL_FLAG_THREAD_SAFE)
+ return &mlx5_poll_cq_family_safe;
+
+ if (cq->cqe_sz == 64) {
+ cqe_size = MLX5_POLL_CQ_CQE_64;
+ } else if (cq->cqe_sz == 128) {
+ cqe_size = MLX5_POLL_CQ_CQE_128;
+ } else {
+ errno = EINVAL;
+ *status = IBV_EXP_INTF_STAT_INVAL_PARARM;
+ return NULL;
+ }
+
+ if (mctx->cqe_version == 1)
+ return &mlx5_poll_cq_family_unsafe_v1_tbl[cqe_size];
+
+ return &mlx5_poll_cq_family_unsafe_tbl[cqe_size];
+}
Index: contrib/ofed/libmlx5/src/dbrec.c
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/dbrec.c
@@ -0,0 +1,152 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+
+#if HAVE_CONFIG_H
+# include <config.h>
+#endif /* HAVE_CONFIG_H */
+
+#include <stdlib.h>
+#include <netinet/in.h>
+#include <pthread.h>
+#include <string.h>
+
+#include "mlx5.h"
+
+struct mlx5_db_page {
+ struct mlx5_db_page *prev, *next;
+ struct mlx5_buf buf;
+ int num_db;
+ int use_cnt;
+ unsigned long free[0];
+};
+
+static struct mlx5_db_page *__add_page(struct mlx5_context *context)
+{
+ struct mlx5_db_page *page;
+ int ps = to_mdev(context->ibv_ctx.device)->page_size;
+ int pp;
+ int i;
+ int nlong;
+
+ pp = ps / context->cache_line_size;
+ nlong = (pp + 8 * sizeof(long) - 1) / (8 * sizeof(long));
+
+ page = calloc(1, sizeof(*page) + nlong * sizeof(long));
+ if (!page)
+ return NULL;
+
+ if (mlx5_alloc_buf(&page->buf, ps, ps)) {
+ free(page);
+ return NULL;
+ }
+
+ page->num_db = pp;
+ page->use_cnt = 0;
+ for (i = 0; i < nlong; ++i)
+ page->free[i] = ~0;
+
+ page->prev = NULL;
+ page->next = context->db_list;
+ context->db_list = page;
+ if (page->next)
+ page->next->prev = page;
+
+ return page;
+}
+
+uint32_t *mlx5_alloc_dbrec(struct mlx5_context *context)
+{
+ struct mlx5_db_page *page;
+ uint32_t *db = NULL;
+ int i, j;
+
+ pthread_mutex_lock(&context->db_list_mutex);
+
+ for (page = context->db_list; page; page = page->next)
+ if (page->use_cnt < page->num_db)
+ goto found;
+
+ page = __add_page(context);
+ if (!page)
+ goto out;
+
+found:
+ ++page->use_cnt;
+
+ for (i = 0; !page->free[i]; ++i)
+ /* nothing */;
+
+ j = ffsl(page->free[i]);
+ --j;
+ page->free[i] &= ~(1UL << j);
+ db = page->buf.buf + (i * 8 * sizeof(long) + j) * context->cache_line_size;
+
+out:
+ pthread_mutex_unlock(&context->db_list_mutex);
+
+ return db;
+}
+
+void mlx5_free_db(struct mlx5_context *context, uint32_t *db)
+{
+ struct mlx5_db_page *page;
+ uintptr_t ps = to_mdev(context->ibv_ctx.device)->page_size;
+ int i;
+
+ pthread_mutex_lock(&context->db_list_mutex);
+
+ for (page = context->db_list; page; page = page->next)
+ if (((uintptr_t) db & ~(ps - 1)) == (uintptr_t) page->buf.buf)
+ break;
+
+ if (!page)
+ goto out;
+
+ i = ((void *) db - page->buf.buf) / context->cache_line_size;
+ page->free[i / (8 * sizeof(long))] |= 1UL << (i % (8 * sizeof(long)));
+
+ if (!--page->use_cnt) {
+ if (page->prev)
+ page->prev->next = page->next;
+ else
+ context->db_list = page->next;
+ if (page->next)
+ page->next->prev = page->prev;
+
+ mlx5_free_buf(&page->buf);
+ free(page);
+ }
+
+out:
+ pthread_mutex_unlock(&context->db_list_mutex);
+}
Index: contrib/ofed/libmlx5/src/doorbell.h
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/doorbell.h
@@ -0,0 +1,68 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+
+#ifndef DOORBELL_H
+#define DOORBELL_H
+
+#if SIZEOF_LONG == 8
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+# define MLX5_PAIR_TO_64(val) ((uint64_t) val[1] << 32 | val[0])
+#elif __BYTE_ORDER == __BIG_ENDIAN
+# define MLX5_PAIR_TO_64(val) ((uint64_t) val[0] << 32 | val[1])
+#else
+# error __BYTE_ORDER not defined
+#endif
+
+static inline void mlx5_write64(uint32_t val[2],
+ void *dest,
+ struct mlx5_lock *lock)
+{
+ *(volatile uint64_t *)dest = MLX5_PAIR_TO_64(val);
+}
+
+#else
+
+static inline void mlx5_write64(uint32_t val[2],
+ void *dest,
+ struct mlx5_lock *lock)
+{
+ mlx5_lock(lock);
+ *(volatile uint32_t *)dest = val[0];
+ *(volatile uint32_t *)(dest + 4) = val[1];
+ mlx5_unlock(lock);
+}
+
+#endif
+
+#endif /* DOORBELL_H */
Index: contrib/ofed/libmlx5/src/implicit_lkey.h
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/implicit_lkey.h
@@ -0,0 +1,79 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef IMPLICIT_LKEY_H
+#define IMPLICIT_LKEY_H
+
+#include <stdint.h>
+
+
+#define ODP_GLOBAL_R_LKEY 0x00000101
+#define ODP_GLOBAL_W_LKEY 0x00000102
+#define MLX5_WHOLE_ADDR_SPACE (~((size_t)0))
+
+struct mlx5_pd;
+struct ibv_exp_reg_mr_in;
+
+struct mlx5_pair_mrs {
+ struct ibv_mr *mrs[2];
+};
+
+struct mlx5_implicit_lkey {
+ struct mlx5_pair_mrs **table;
+ uint64_t exp_access;
+ pthread_mutex_t lock;
+};
+
+int mlx5_init_implicit_lkey(struct mlx5_implicit_lkey *ilkey,
+ uint64_t access_flags);
+
+void mlx5_destroy_implicit_lkey(struct mlx5_implicit_lkey *ilkey);
+struct mlx5_implicit_lkey *mlx5_get_implicit_lkey(struct mlx5_pd *pd, uint64_t exp_access);
+
+struct ibv_mr *mlx5_alloc_whole_addr_mr(const struct ibv_exp_reg_mr_in *attr);
+
+void mlx5_dealloc_whole_addr_mr(struct ibv_mr *);
+
+int mlx5_get_real_lkey_from_implicit_lkey(struct mlx5_pd *pd,
+ struct mlx5_implicit_lkey *ilkey,
+ uint64_t addr, size_t len,
+ uint32_t *lkey);
+int mlx5_get_real_mr_from_implicit_lkey(struct mlx5_pd *pd,
+ struct mlx5_implicit_lkey *ilkey,
+ uint64_t addr, uint64_t len,
+ struct ibv_mr **mr);
+
+int mlx5_prefetch_implicit_lkey(struct mlx5_pd *pd,
+ struct mlx5_implicit_lkey *ilkey,
+ uint64_t addr, size_t len, int flags);
+
+#endif /* IMPLICIT_LKEY_H */
Index: contrib/ofed/libmlx5/src/implicit_lkey.c
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/implicit_lkey.c
@@ -0,0 +1,279 @@
+#include <pthread.h>
+#include <infiniband/verbs.h>
+#include <stdlib.h>
+#include <inttypes.h>
+#include <assert.h>
+#include "implicit_lkey.h"
+#include "mlx5.h"
+
+#define LEVEL1_SIZE 10
+#define LEVEL2_SIZE 11
+#define MR_SIZE 28
+
+#define ADDR_EFFECTIVE_BITS (MR_SIZE + LEVEL2_SIZE + LEVEL1_SIZE)
+
+#define LEVEL1_SHIFT (MR_SIZE + LEVEL2_SIZE)
+#define LEVEL2_SHIFT MR_SIZE
+
+#define MASK(len) ((1 << (len)) - 1)
+
+#define MIN(x, y) (((x) < (y)) ? (x) : (y))
+
+struct mlx5_implicit_lkey *mlx5_get_implicit_lkey(struct mlx5_pd *pd,
+ uint64_t exp_access)
+{
+ if (!(exp_access & IBV_EXP_ACCESS_ON_DEMAND)) {
+		fprintf(stderr, "cannot create relaxed or implicit "
+			"MR as a non-ODP MR\n");
+ errno = EINVAL;
+ return NULL;
+ }
+
+ if ((exp_access & ~IBV_EXP_ACCESS_RELAXED) == IBV_EXP_ACCESS_ON_DEMAND)
+ return &pd->r_ilkey;
+
+ if ((exp_access & ~IBV_EXP_ACCESS_RELAXED) ==
+ (IBV_EXP_ACCESS_ON_DEMAND | IBV_EXP_ACCESS_LOCAL_WRITE))
+ return &pd->w_ilkey;
+
+ if (!(exp_access & IBV_EXP_ACCESS_RELAXED)) {
+		fprintf(stderr, "cannot create a strict MR (non-relaxed) "
+			"for remote access\n");
+ errno = EINVAL;
+ return NULL;
+ }
+
+ if (!pd->remote_ilkey) {
+ pd->remote_ilkey = malloc(sizeof(struct mlx5_implicit_lkey));
+ if (!pd->remote_ilkey) {
+ errno = ENOMEM;
+ return NULL;
+ }
+
+ errno = mlx5_init_implicit_lkey(pd->remote_ilkey,
+ IBV_EXP_ACCESS_LOCAL_WRITE |
+ IBV_EXP_ACCESS_REMOTE_READ |
+ IBV_EXP_ACCESS_REMOTE_WRITE |
+ IBV_EXP_ACCESS_REMOTE_ATOMIC |
+ IBV_EXP_ACCESS_ON_DEMAND);
+ if (errno) {
+ free(pd->remote_ilkey);
+ pd->remote_ilkey = NULL;
+ }
+ }
+
+ return pd->remote_ilkey;
+}
+
+int mlx5_init_implicit_lkey(struct mlx5_implicit_lkey *ilkey,
+ uint64_t exp_access)
+{
+ ilkey->table = NULL;
+ ilkey->exp_access = exp_access;
+
+ if (!(exp_access & IBV_EXP_ACCESS_ON_DEMAND))
+ return -EINVAL;
+
+ return pthread_mutex_init(&(ilkey->lock), NULL);
+}
+
+static void destroy_level2(struct mlx5_pair_mrs *table)
+{
+ struct mlx5_pair_mrs *ptr = table;
+ for (; ptr != table + (1 << LEVEL2_SIZE); ++ptr) {
+ if (ptr->mrs[0]) {
+ to_mmr(ptr->mrs[0])->alloc_flags &= ~IBV_EXP_ACCESS_RELAXED;
+ ibv_dereg_mr(ptr->mrs[0]);
+ }
+ if (ptr->mrs[1]) {
+ to_mmr(ptr->mrs[1])->alloc_flags &= ~IBV_EXP_ACCESS_RELAXED;
+ ibv_dereg_mr(ptr->mrs[1]);
+ }
+ }
+
+ free(table);
+}
+
+void mlx5_destroy_implicit_lkey(struct mlx5_implicit_lkey *ilkey)
+{
+ struct mlx5_pair_mrs **ptr = ilkey->table;
+
+ pthread_mutex_destroy(&ilkey->lock);
+
+ if (ptr) {
+ for (; ptr != ilkey->table + (1 << LEVEL1_SIZE); ++ptr)
+ if (*ptr)
+ destroy_level2(*ptr);
+
+ free(ilkey->table);
+ }
+}
+
+struct ibv_mr *mlx5_alloc_whole_addr_mr(const struct ibv_exp_reg_mr_in *attr)
+{
+ struct ibv_mr *mr;
+
+ if (attr->exp_access & ~(IBV_EXP_ACCESS_ON_DEMAND |
+ IBV_EXP_ACCESS_LOCAL_WRITE))
+ return NULL;
+
+ mr = malloc(sizeof(struct ibv_mr));
+
+ if (!mr)
+ return NULL;
+
+ mr->context = attr->pd->context;
+ mr->pd = attr->pd;
+ mr->addr = attr->addr;
+ mr->length = attr->length;
+ mr->handle = 0;
+ mr->lkey = attr->exp_access & IBV_EXP_ACCESS_LOCAL_WRITE ?
+ ODP_GLOBAL_W_LKEY : ODP_GLOBAL_R_LKEY;
+ mr->rkey = 0;
+
+ return mr;
+}
+
+void mlx5_dealloc_whole_addr_mr(struct ibv_mr *mr)
+{
+ free(mr);
+}
+
+int mlx5_get_real_mr_from_implicit_lkey(struct mlx5_pd *pd,
+ struct mlx5_implicit_lkey *ilkey,
+ uint64_t addr, uint64_t len,
+ struct ibv_mr **mr)
+{
+ uint64_t key1 = (addr >> LEVEL1_SHIFT) & MASK(LEVEL1_SIZE);
+ uint64_t key2 = (addr >> LEVEL2_SHIFT) & MASK(LEVEL2_SIZE);
+ uint64_t addr_msb_bits = addr >> ADDR_EFFECTIVE_BITS;
+ uint64_t mr_base_addr = addr & ~MASK(MR_SIZE);
+ int mr_idx_in_pair = (((addr >> (MR_SIZE)) & 1) !=
+ (((addr+len+1) >> (MR_SIZE)) & 1));
+
+ mr_base_addr |= (mr_idx_in_pair << (MR_SIZE-1));
+
+ if (len >> MR_SIZE) {
+ fprintf(stderr, "range too large for the implicit MR\n");
+ return EINVAL;
+ }
+
+ /* Verify that the address is canonical, refuse posting a WQE
+ * for non-canonical addresses. To remove this limitation, add
+ * 5 levels to the tree here.
+ */
+ if (addr_msb_bits &&
+ (addr_msb_bits != ((~((uint64_t)0)) >> ADDR_EFFECTIVE_BITS)))
+ return EINVAL;
+
+
+ /* Access the table in lock-free manner.
+ *
+ * As we only add items to the table, only lock it when adding
+ * the items, and check that the item is still missing with
+ * lock held. Assumes that writes to pointers are atomic, so
+ * we will never read "half-pointer".
+ */
+ if (!ilkey->table) {
+ pthread_mutex_lock(&ilkey->lock);
+ if (!ilkey->table)
+ ilkey->table = calloc(1, sizeof(void *) *
+ (1 << LEVEL1_SIZE));
+ pthread_mutex_unlock(&ilkey->lock);
+ if (!ilkey->table)
+ return ENOMEM;
+ }
+
+ if (!ilkey->table[key1]) {
+ pthread_mutex_lock(&ilkey->lock);
+ if (!ilkey->table[key1])
+ ilkey->table[key1] = calloc(1,
+ (sizeof(struct mlx5_pair_mrs) *
+ (1 << LEVEL2_SIZE)));
+ pthread_mutex_unlock(&ilkey->lock);
+ if (!ilkey->table[key1])
+ return ENOMEM;
+ }
+
+ if (!ilkey->table[key1][key2].mrs[mr_idx_in_pair]) {
+ pthread_mutex_lock(&ilkey->lock);
+ if (!ilkey->table[key1][key2].mrs[mr_idx_in_pair]) {
+ struct ibv_exp_reg_mr_in attr = {
+ .comp_mask = 0,
+ .pd = &pd->ibv_pd,
+ .addr = (void *)(unsigned long)mr_base_addr,
+ .length = 1 << MR_SIZE,
+ .exp_access = ilkey->exp_access,
+ };
+
+ ilkey->table[key1][key2].mrs[mr_idx_in_pair] = ibv_exp_reg_mr(&attr);
+ if (ilkey->table[key1][key2].mrs[mr_idx_in_pair]) {
+ ilkey->table[key1][key2].mrs[mr_idx_in_pair]->addr = (void *)(unsigned long)mr_base_addr;
+ ilkey->table[key1][key2].mrs[mr_idx_in_pair]->length = 1 << MR_SIZE;
+ }
+ }
+ if (ilkey->table[key1][key2].mrs[mr_idx_in_pair]) {
+ to_mmr(ilkey->table[key1][key2].mrs[mr_idx_in_pair])->alloc_flags |= IBV_EXP_ACCESS_RELAXED;
+ to_mmr(ilkey->table[key1][key2].mrs[mr_idx_in_pair])->type = MLX5_ODP_MR;
+ }
+ pthread_mutex_unlock(&ilkey->lock);
+ if (!ilkey->table[key1][key2].mrs[mr_idx_in_pair])
+ return ENOMEM;
+ }
+
+ *mr = ilkey->table[key1][key2].mrs[mr_idx_in_pair];
+
+ assert((*mr)->addr <= (void *)(unsigned long)addr &&
+ (void *)(unsigned long)addr + len <=
+ (*mr)->addr + (*mr)->length);
+ return 0;
+}
+
+int mlx5_get_real_lkey_from_implicit_lkey(struct mlx5_pd *pd,
+ struct mlx5_implicit_lkey *ilkey,
+ uint64_t addr, size_t len,
+ uint32_t *lkey)
+{
+ struct ibv_mr *mr;
+ int ret_val = mlx5_get_real_mr_from_implicit_lkey(pd, ilkey, addr,
+ len, &mr);
+
+ if (ret_val == 0)
+ *lkey = mr->lkey;
+ return ret_val;
+}
+
+#define PREFETCH_STRIDE_SIZE (MASK(MR_SIZE-1))
+int mlx5_prefetch_implicit_lkey(struct mlx5_pd *pd,
+ struct mlx5_implicit_lkey *ilkey,
+ uint64_t addr, size_t len, int flags)
+{
+ uint64_t end_addr = addr + len;
+ if (addr > end_addr)
+ return EINVAL;
+ while (addr < end_addr) {
+ struct ibv_mr *mr;
+ struct ibv_exp_prefetch_attr attr;
+ size_t effective_length = MIN(1+PREFETCH_STRIDE_SIZE -
+ (addr & PREFETCH_STRIDE_SIZE),
+ end_addr - addr);
+ int ret_val = mlx5_get_real_mr_from_implicit_lkey(pd,
+ ilkey,
+ addr,
+ effective_length,
+ &mr);
+ if (ret_val)
+ return ret_val;
+ attr.comp_mask = 0;
+ attr.addr = (void *)(unsigned long)addr;
+ attr.length = effective_length;
+ attr.flags = flags;
+
+ ret_val = ibv_exp_prefetch_mr(mr, &attr);
+ if (ret_val)
+ return ret_val;
+
+ addr += effective_length;
+ }
+ return 0;
+}
Index: contrib/ofed/libmlx5/src/list.h
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/list.h
@@ -0,0 +1,331 @@
+#ifndef _LINUX_LIST_H
+#define _LINUX_LIST_H
+
+/*
+ * These are non-NULL pointers that will result in page faults
+ * under normal circumstances, used to verify that nobody uses
+ * non-initialized list entries.
+ */
+#define LIST_POISON1 ((void *) 0x00100100)
+#define LIST_POISON2 ((void *) 0x00200200)
+
+/*
+ * Simple doubly linked list implementation.
+ *
+ * Some of the internal functions ("__xxx") are useful when
+ * manipulating whole lists rather than single entries, as
+ * sometimes we already know the next/prev entries and we can
+ * generate better code by using them directly rather than
+ * using the generic single-entry routines.
+ */
+
+struct list_head {
+ struct list_head *next, *prev;
+};
+
+#define LIST_HEAD_INIT(name) { &(name), &(name) }
+
+#define LIST_HEAD(name) \
+ struct list_head name = LIST_HEAD_INIT(name)
+
+#define INIT_LIST_HEAD(ptr) do { \
+ (ptr)->next = (ptr); (ptr)->prev = (ptr); \
+} while (0)
+
+/*
+ * Insert a new entry between two known consecutive entries.
+ *
+ * This is only for internal list manipulation where we know
+ * the prev/next entries already!
+ */
+static inline void __list_add(struct list_head *new,
+ struct list_head *prev,
+ struct list_head *next)
+{
+ next->prev = new;
+ new->next = next;
+ new->prev = prev;
+ prev->next = new;
+}
+
+/**
+ * list_add - add a new entry
+ * @new: new entry to be added
+ * @head: list head to add it after
+ *
+ * Insert a new entry after the specified head.
+ * This is good for implementing stacks.
+ */
+static inline void list_add(struct list_head *new, struct list_head *head)
+{
+ __list_add(new, head, head->next);
+}
+
+/**
+ * list_add_tail - add a new entry
+ * @new: new entry to be added
+ * @head: list head to add it before
+ *
+ * Insert a new entry before the specified head.
+ * This is useful for implementing queues.
+ */
+static inline void list_add_tail(struct list_head *new, struct list_head *head)
+{
+ __list_add(new, head->prev, head);
+}
+
+/*
+ * Delete a list entry by making the prev/next entries
+ * point to each other.
+ *
+ * This is only for internal list manipulation where we know
+ * the prev/next entries already!
+ */
+static inline void __list_del(struct list_head *prev, struct list_head *next)
+{
+ next->prev = prev;
+ prev->next = next;
+}
+
+/**
+ * list_del - deletes entry from list.
+ * @entry: the element to delete from the list.
+ * Note: list_empty on entry does not return true after this, the entry is
+ * in an undefined state.
+ */
+static inline void list_del(struct list_head *entry)
+{
+ __list_del(entry->prev, entry->next);
+ entry->next = LIST_POISON1;
+ entry->prev = LIST_POISON2;
+}
+
+/**
+ * list_del_init - deletes entry from list and reinitialize it.
+ * @entry: the element to delete from the list.
+ */
+static inline void list_del_init(struct list_head *entry)
+{
+ __list_del(entry->prev, entry->next);
+ INIT_LIST_HEAD(entry);
+}
+
+/**
+ * list_move - delete from one list and add as another's head
+ * @list: the entry to move
+ * @head: the head that will precede our entry
+ */
+static inline void list_move(struct list_head *list, struct list_head *head)
+{
+ __list_del(list->prev, list->next);
+ list_add(list, head);
+}
+
+/**
+ * list_move_tail - delete from one list and add as another's tail
+ * @list: the entry to move
+ * @head: the head that will follow our entry
+ */
+static inline void list_move_tail(struct list_head *list,
+ struct list_head *head)
+{
+ __list_del(list->prev, list->next);
+ list_add_tail(list, head);
+}
+
+/**
+ * list_empty - tests whether a list is empty
+ * @head: the list to test.
+ */
+static inline int list_empty(const struct list_head *head)
+{
+ return head->next == head;
+}
+
+/**
+ * list_empty_careful - tests whether a list is
+ * empty _and_ checks that no other CPU might be
+ * in the process of still modifying either member
+ *
+ * NOTE: using list_empty_careful() without synchronization
+ * can only be safe if the only activity that can happen
+ * to the list entry is list_del_init(). Eg. it cannot be used
+ * if another CPU could re-list_add() it.
+ *
+ * @head: the list to test.
+ */
+static inline int list_empty_careful(const struct list_head *head)
+{
+ struct list_head *next = head->next;
+ return (next == head) && (next == head->prev);
+}
+
+static inline void __list_splice(struct list_head *list,
+ struct list_head *head)
+{
+ struct list_head *first = list->next;
+ struct list_head *last = list->prev;
+ struct list_head *at = head->next;
+
+ first->prev = head;
+ head->next = first;
+
+ last->next = at;
+ at->prev = last;
+}
+
+/**
+ * list_splice - join two lists
+ * @list: the new list to add.
+ * @head: the place to add it in the first list.
+ */
+static inline void list_splice(struct list_head *list, struct list_head *head)
+{
+ if (!list_empty(list))
+ __list_splice(list, head);
+}
+
+/**
+ * list_splice_init - join two lists and reinitialise the emptied list.
+ * @list: the new list to add.
+ * @head: the place to add it in the first list.
+ *
+ * The list at @list is reinitialised
+ */
+static inline void list_splice_init(struct list_head *list,
+ struct list_head *head)
+{
+ if (!list_empty(list)) {
+ __list_splice(list, head);
+ INIT_LIST_HEAD(list);
+ }
+}
+
+#ifndef offsetof
+#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
+#endif
+
+/**
+ * container_of - cast a member of a structure out to the containing structure
+ *
+ * @ptr: the pointer to the member.
+ * @type: the type of the container struct this is embedded in.
+ * @member: the name of the member within the struct.
+ *
+ */
+#ifndef container_of
+#define container_of(ptr, type, member) ({ \
+ const typeof(((type *)0)->member)*__mptr = (ptr); \
+ (type *)((char *)__mptr - offsetof(type, member)); })
+#endif
+
+
+/**
+ * list_entry - get the struct for this entry
+ * @ptr: the &struct list_head pointer.
+ * @type: the type of the struct this is embedded in.
+ * @member: the name of the list_struct within the struct.
+ */
+#define list_entry(ptr, type, member) \
+ container_of(ptr, type, member)
+
+/**
+ * list_for_each - iterate over a list
+ * @pos: the &struct list_head to use as a loop counter.
+ * @head: the head for your list.
+ */
+#define list_for_each(pos, head) \
+ for (pos = (head)->next; prefetch(pos->next), pos != (head); \
+		pos = pos->next)
+
+/**
+ * __list_for_each - iterate over a list
+ * @pos: the &struct list_head to use as a loop counter.
+ * @head: the head for your list.
+ *
+ * This variant differs from list_for_each() in that it's the
+ * simplest possible list iteration code, no prefetching is done.
+ * Use this for code that knows the list to be very short (empty
+ * or 1 entry) most of the time.
+ */
+#define __list_for_each(pos, head) \
+ for (pos = (head)->next; pos != (head); pos = pos->next)
+
+/**
+ * list_for_each_prev - iterate over a list backwards
+ * @pos: the &struct list_head to use as a loop counter.
+ * @head: the head for your list.
+ */
+#define list_for_each_prev(pos, head) \
+ for (pos = (head)->prev; prefetch(pos->prev), pos != (head); \
+ pos = pos->prev)
+
+/**
+ * list_for_each_safe - iterate over a list safe against removal of list entry
+ * @pos: the &struct list_head to use as a loop counter.
+ * @n: another &struct list_head to use as temporary storage
+ * @head: the head for your list.
+ */
+#define list_for_each_safe(pos, n, head) \
+ for (pos = (head)->next, n = pos->next; pos != (head); \
+ pos = n, n = pos->next)
+
+/**
+ * list_for_each_entry - iterate over list of given type
+ * @pos: the type * to use as a loop counter.
+ * @head: the head for your list.
+ * @member: the name of the list_struct within the struct.
+ */
+#define list_for_each_entry(pos, head, member) \
+ for (pos = list_entry((head)->next, typeof(*pos), member); \
+ &pos->member != (head); \
+ pos = list_entry(pos->member.next, typeof(*pos), member))
+
+/**
+ * list_for_each_entry_reverse - iterate backwards over list of given type.
+ * @pos: the type * to use as a loop counter.
+ * @head: the head for your list.
+ * @member: the name of the list_struct within the struct.
+ */
+#define list_for_each_entry_reverse(pos, head, member) \
+ for (pos = list_entry((head)->prev, typeof(*pos), member); \
+ prefetch(pos->member.prev), &pos->member != (head); \
+ pos = list_entry(pos->member.prev, typeof(*pos), member))
+
+/**
+ * list_prepare_entry - prepare a pos entry for use as a start point in
+ * list_for_each_entry_continue
+ * @pos: the type * to use as a start point
+ * @head: the head of the list
+ * @member: the name of the list_struct within the struct.
+ */
+#define list_prepare_entry(pos, head, member) \
+ ((pos) ? : list_entry(head, typeof(*pos), member))
+
+/**
+ * list_for_each_entry_continue - iterate over list of given type
+ * continuing after existing point
+ * @pos: the type * to use as a loop counter.
+ * @head: the head for your list.
+ * @member: the name of the list_struct within the struct.
+ */
+#define list_for_each_entry_continue(pos, head, member) \
+ for (pos = list_entry(pos->member.next, typeof(*pos), member); \
+ prefetch(pos->member.next), &pos->member != (head); \
+ pos = list_entry(pos->member.next, typeof(*pos), member))
+
+/**
+ * list_for_each_entry_safe - iterate over list of given type safe against removal of list entry
+ * @pos: the type * to use as a loop counter.
+ * @n: another type * to use as temporary storage
+ * @head: the head for your list.
+ * @member: the name of the list_struct within the struct.
+ */
+#define list_for_each_entry_safe(pos, n, head, member) \
+ for (pos = list_entry((head)->next, typeof(*pos), member), \
+ n = list_entry(pos->member.next, typeof(*pos), member); \
+ &pos->member != (head); \
+ pos = n, n = list_entry(n->member.next, typeof(*n), member))
+
+#endif
+
Index: contrib/ofed/libmlx5/src/mlx5-abi.h
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/mlx5-abi.h
@@ -0,0 +1,409 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef MLX5_ABI_H
+#define MLX5_ABI_H
+
+#include <infiniband/kern-abi.h>
+
+#define MLX5_UVERBS_MIN_ABI_VERSION 1
+#define MLX5_UVERBS_MAX_ABI_VERSION 1
+
+enum {
+ MLX5_QP_FLAG_SIGNATURE = 1 << 0,
+};
+
+enum {
+ MLX5_RWQ_FLAG_SIGNATURE = 1 << 0,
+};
+
+enum {
+ MLX5_NUM_UUARS_PER_PAGE = 2,
+ MLX5_MAX_UAR_PAGES = 1 << 8,
+ MLX5_MAX_UUARS = MLX5_MAX_UAR_PAGES * MLX5_NUM_UUARS_PER_PAGE,
+ MLX5_DEF_TOT_UUARS = 8 * MLX5_NUM_UUARS_PER_PAGE,
+};
+
+struct mlx5_alloc_ucontext {
+ struct ibv_get_context ibv_req;
+ __u32 total_num_uuars;
+ __u32 num_low_latency_uuars;
+ __u32 flags;
+ __u32 reserved;
+};
+
+struct mlx5_alloc_ucontext_resp {
+ struct ibv_get_context_resp ibv_resp;
+ __u32 qp_tab_size;
+ __u32 bf_reg_size;
+ __u32 tot_uuars;
+ __u32 cache_line_size;
+ __u16 max_sq_desc_sz;
+ __u16 max_rq_desc_sz;
+ __u32 max_send_wqebb;
+ __u32 max_recv_wr;
+ __u32 max_srq_recv_wr;
+ __u16 num_ports;
+ __u16 reserved;
+ __u32 max_desc_sz_sq_dc;
+ __u32 atomic_sizes_dc;
+ __u32 reserved1;
+ __u32 flags;
+ __u32 reserved2[5];
+};
+
+enum mlx5_exp_alloc_context_resp_mask {
+ MLX5_EXP_ALLOC_CTX_RESP_MASK_CQE_COMP_MAX_NUM = 1 << 0,
+ MLX5_EXP_ALLOC_CTX_RESP_MASK_CQE_VERSION = 1 << 1,
+ MLX5_EXP_ALLOC_CTX_RESP_MASK_RROCE_UDP_SPORT_MIN = 1 << 2,
+ MLX5_EXP_ALLOC_CTX_RESP_MASK_RROCE_UDP_SPORT_MAX = 1 << 3,
+ MLX5_EXP_ALLOC_CTX_RESP_MASK_HCA_CORE_CLOCK_OFFSET = 1 << 4,
+};
+
+struct mlx5_exp_alloc_ucontext_data_resp {
+ __u32 comp_mask; /* use mlx5_exp_alloc_context_resp_mask */
+ __u16 cqe_comp_max_num;
+ __u8 cqe_version;
+ __u8 reserved;
+ __u16 rroce_udp_sport_min;
+ __u16 rroce_udp_sport_max;
+ __u32 hca_core_clock_offset;
+};
+
+struct mlx5_exp_alloc_ucontext_resp {
+ struct ibv_get_context_resp ibv_resp;
+ __u32 qp_tab_size;
+ __u32 bf_reg_size;
+ __u32 tot_uuars;
+ __u32 cache_line_size;
+ __u16 max_sq_desc_sz;
+ __u16 max_rq_desc_sz;
+ __u32 max_send_wqebb;
+ __u32 max_recv_wr;
+ __u32 max_srq_recv_wr;
+ __u16 num_ports;
+ __u16 reserved;
+ __u32 max_desc_sz_sq_dc;
+ __u32 atomic_sizes_dc;
+	__u32	reserved1;
+ __u32 flags;
+ __u32 reserved2[5];
+ /* Some more reserved fields for future growth of
+ * mlx5_alloc_ucontext_resp */
+ __u64 prefix_reserved[8];
+
+ struct mlx5_exp_alloc_ucontext_data_resp exp_data;
+};
+
+struct mlx5_alloc_pd_resp {
+ struct ibv_alloc_pd_resp ibv_resp;
+ __u32 pdn;
+};
+
+struct mlx5_create_cq {
+ struct ibv_create_cq ibv_cmd;
+ __u64 buf_addr;
+ __u64 db_addr;
+ __u32 cqe_size;
+};
+
+struct mlx5_create_cq_resp {
+ struct ibv_create_cq_resp ibv_resp;
+ __u32 cqn;
+ __u32 reserved;
+};
+
+enum mlx5_exp_create_cq_mask {
+	MLX5_EXP_CREATE_CQ_MASK_CQE_COMP_EN		= 1 << 0,
+	MLX5_EXP_CREATE_CQ_MASK_CQE_COMP_RECV_TYPE	= 1 << 1,
+	MLX5_EXP_CREATE_CQ_MASK_RESERVED		= 1 << 2,
+};
+
+enum mlx5_exp_cqe_comp_recv_type {
+	MLX5_CQE_FORMAT_HASH,
+	MLX5_CQE_FORMAT_CSUM,
+};
+
+struct mlx5_exp_create_cq_data {
+	__u32	comp_mask; /* use mlx5_exp_create_cq_mask */
+ __u8 cqe_comp_en;
+ __u8 cqe_comp_recv_type; /* use mlx5_exp_cqe_comp_recv_type */
+ __u16 reserved;
+};
+
+struct mlx5_exp_create_cq {
+ struct ibv_exp_create_cq ibv_cmd;
+ __u64 buf_addr;
+ __u64 db_addr;
+ __u32 cqe_size;
+ __u32 reserved;
+ /* Some more reserved fields for future growth of mlx5_create_cq */
+ __u64 prefix_reserved[8];
+
+ /* sizeof prefix aligned with mlx5_create_cq */
+ __u64 size_of_prefix;
+
+ struct mlx5_exp_create_cq_data exp_data;
+};
+
+struct mlx5_create_srq {
+ struct ibv_create_srq ibv_cmd;
+ __u64 buf_addr;
+ __u64 db_addr;
+ __u32 flags;
+};
+
+struct mlx5_create_srq_resp {
+ struct ibv_create_srq_resp ibv_resp;
+ __u32 srqn;
+ __u32 reserved;
+};
+
+struct mlx5_create_srq_ex {
+ struct ibv_create_xsrq ibv_cmd;
+ __u64 buf_addr;
+ __u64 db_addr;
+ __u32 flags;
+ __u32 reserved;
+ __u32 uidx;
+ __u32 reserved1;
+};
+
+struct mlx5_drv_create_qp {
+ __u64 buf_addr;
+ __u64 db_addr;
+ __u32 sq_wqe_count;
+ __u32 rq_wqe_count;
+ __u32 rq_wqe_shift;
+ __u32 flags;
+};
+
+enum mlx5_exp_drv_create_qp_mask {
+ MLX5_EXP_CREATE_QP_MASK_UIDX = 1 << 0,
+ MLX5_EXP_CREATE_QP_MASK_SQ_BUFF_ADD = 1 << 1,
+ MLX5_EXP_CREATE_QP_MASK_WC_UAR_IDX = 1 << 2,
+ MLX5_EXP_CREATE_QP_MASK_FLAGS_IDX = 1 << 3,
+ MLX5_EXP_CREATE_QP_MASK_RESERVED = 1 << 4,
+};
+
+enum mlx5_exp_create_qp_flags {
+ MLX5_EXP_CREATE_QP_MULTI_PACKET_WQE_REQ_FLAG = 1 << 0,
+};
+
+enum mlx5_exp_drv_create_qp_uar_idx {
+ MLX5_EXP_CREATE_QP_DB_ONLY_UUAR = -1
+};
+
+struct mlx5_exp_drv_create_qp_data {
+	__u32	comp_mask; /* use mlx5_exp_drv_create_qp_mask */
+ __u32 uidx;
+ __u64 sq_buf_addr;
+ __u32 wc_uar_index;
+ __u32 flags; /* use mlx5_exp_create_qp_flags */
+};
+
+struct mlx5_exp_drv_create_qp {
+ /* To allow casting to mlx5_drv_create_qp the prefix is the same as
+ * struct mlx5_drv_create_qp prefix
+ */
+ __u64 buf_addr;
+ __u64 db_addr;
+ __u32 sq_wqe_count;
+ __u32 rq_wqe_count;
+ __u32 rq_wqe_shift;
+ __u32 flags;
+
+ /* Some more reserved fields for future growth of mlx5_drv_create_qp */
+ __u64 prefix_reserved[8];
+
+ /* sizeof prefix aligned with mlx5_drv_create_qp */
+ __u64 size_of_prefix;
+
+ /* Experimental data
+ * Add new experimental data only inside the exp struct
+ */
+ struct mlx5_exp_drv_create_qp_data exp;
+};
+
+struct mlx5_create_qp {
+ struct ibv_create_qp ibv_cmd;
+ struct mlx5_drv_create_qp drv;
+};
+
+enum {
+ MLX5_EXP_INVALID_UUAR = (-1),
+};
+
+struct mlx5_create_qp_resp {
+ struct ibv_create_qp_resp ibv_resp;
+ __u32 uuar_index;
+ __u32 rsvd;
+};
+
+struct mlx5_exp_create_qp {
+ struct ibv_exp_create_qp ibv_cmd;
+ struct mlx5_exp_drv_create_qp drv;
+};
+
+enum mlx5_exp_drv_create_qp_resp_mask {
+ MLX5_EXP_CREATE_QP_RESP_MASK_FLAGS_IDX = 1 << 0,
+ MLX5_EXP_CREATE_QP_RESP_MASK_RESERVED = 1 << 1,
+};
+
+enum mlx5_exp_create_qp_resp_flags {
+ MLX5_EXP_CREATE_QP_RESP_MULTI_PACKET_WQE_FLAG = 1 << 0,
+};
+
+struct mlx5_exp_drv_create_qp_resp_data {
+ __u32 comp_mask; /* use mlx5_exp_drv_create_qp_resp_mask */
+ __u32 flags; /* use mlx5_exp_create_qp_resp_flags */
+};
+
+
+struct mlx5_exp_create_qp_resp {
+ struct ibv_exp_create_qp_resp ibv_resp;
+ __u32 uuar_index;
+ __u32 rsvd;
+
+ /* Some more reserved fields for future growth of create qp resp */
+ __u64 prefix_reserved[8];
+
+ /* sizeof prefix aligned with create qp resp */
+ __u64 size_of_prefix;
+
+ /* Experimental data
+ * Add new experimental data only inside the exp struct
+ */
+ struct mlx5_exp_drv_create_qp_resp_data exp;
+};
+
+struct mlx5_exp_drv_create_wq {
+ __u64 buf_addr;
+ __u64 db_addr;
+ __u32 rq_wqe_count;
+ __u32 rq_wqe_shift;
+ __u32 user_index;
+ __u32 flags;
+};
+
+struct mlx5_exp_create_wq {
+ struct ibv_exp_create_wq ibv_cmd;
+ struct mlx5_exp_drv_create_wq drv;
+};
+
+struct mlx5_exp_create_wq_resp {
+ struct ibv_exp_create_wq_resp ibv_resp;
+};
+
+struct mlx5_exp_modify_wq {
+ struct ib_exp_modify_wq ibv_cmd;
+};
+
+struct mlx5_exp_create_rwq_ind_table_resp {
+ struct ibv_exp_create_rwq_ind_table_resp ibv_resp;
+};
+
+struct mlx5_exp_destroy_rwq_ind_table {
+ struct ibv_exp_destroy_rwq_ind_table ibv_cmd;
+};
+
+struct mlx5_resize_cq {
+ struct ibv_resize_cq ibv_cmd;
+ __u64 buf_addr;
+ __u16 cqe_size;
+ __u16 reserved0;
+ __u32 reserved1;
+};
+
+struct mlx5_resize_cq_resp {
+ struct ibv_resize_cq_resp ibv_resp;
+};
+
+struct mlx5_drv_create_dct {
+ __u32 uidx;
+ __u32 reserved;
+};
+
+struct mlx5_create_dct {
+ struct ibv_exp_create_dct ibv_cmd;
+ struct mlx5_drv_create_dct drv;
+};
+
+struct mlx5_create_dct_resp {
+ struct ibv_exp_create_dct_resp ibv_resp;
+};
+
+struct mlx5_destroy_dct {
+ struct ibv_exp_destroy_dct ibv_cmd;
+};
+
+struct mlx5_destroy_dct_resp {
+ struct ibv_exp_destroy_dct_resp ibv_resp;
+};
+
+struct mlx5_query_dct {
+ struct ibv_exp_query_dct ibv_cmd;
+};
+
+struct mlx5_query_dct_resp {
+ struct ibv_exp_query_dct_resp ibv_resp;
+};
+
+struct mlx5_arm_dct {
+ struct ibv_exp_arm_dct ibv_cmd;
+ __u64 reserved0;
+ __u64 reserved1;
+};
+
+struct mlx5_arm_dct_resp {
+ struct ibv_exp_arm_dct_resp ibv_resp;
+ __u64 reserved0;
+ __u64 reserved1;
+};
+
+struct mlx5_query_mkey {
+ struct ibv_exp_query_mkey ibv_cmd;
+};
+
+struct mlx5_query_mkey_resp {
+ struct ibv_exp_query_mkey_resp ibv_resp;
+};
+
+struct mlx5_create_mr {
+ struct ibv_exp_create_mr ibv_cmd;
+};
+
+struct mlx5_create_mr_resp {
+ struct ibv_exp_create_mr_resp ibv_resp;
+};
+
+#endif /* MLX5_ABI_H */
Index: contrib/ofed/libmlx5/src/mlx5.h
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/mlx5.h
@@ -0,0 +1,1291 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef MLX5_H
+#define MLX5_H
+
+#include <stddef.h>
+#include <stdio.h>
+#include <netinet/in.h>
+
+#include <infiniband/driver.h>
+#include <infiniband/driver_exp.h>
+#include <infiniband/verbs_exp.h>
+#include <infiniband/arch.h>
+#include "mlx5-abi.h"
+#include "list.h"
+#include "bitmap.h"
+#include "implicit_lkey.h"
+#include "wqe.h"
+
+#ifdef __GNUC__
+#define likely(x) __builtin_expect((x), 1)
+#define unlikely(x) __builtin_expect((x), 0)
+#endif
+
+#ifndef uninitialized_var
+#define uninitialized_var(x) x = x
+#endif
+
+#ifdef HAVE_VALGRIND_MEMCHECK_H
+
+# include <valgrind/memcheck.h>
+
+# if !defined(VALGRIND_MAKE_MEM_DEFINED) || !defined(VALGRIND_MAKE_MEM_UNDEFINED)
+# warning "Valgrind support requested, but VALGRIND_MAKE_MEM_(UN)DEFINED not available"
+# endif
+
+#endif /* HAVE_VALGRIND_MEMCHECK_H */
+
+#ifndef VALGRIND_MAKE_MEM_DEFINED
+# define VALGRIND_MAKE_MEM_DEFINED(addr, len)
+#endif
+
+#ifndef VALGRIND_MAKE_MEM_UNDEFINED
+# define VALGRIND_MAKE_MEM_UNDEFINED(addr, len)
+#endif
+
+#ifndef rmb
+# define rmb() mb()
+#endif
+
+#ifndef wmb
+# define wmb() mb()
+#endif
+
+#ifndef wc_wmb
+
+#if defined(__i386__)
+#define wc_wmb() asm volatile("lock; addl $0, 0(%%esp) " ::: "memory")
+#elif defined(__x86_64__)
+#define wc_wmb() asm volatile("sfence" ::: "memory")
+#elif defined(__ia64__)
+#define wc_wmb() asm volatile("fwb" ::: "memory")
+#else
+#define wc_wmb() wmb()
+#endif
+
+#endif
+
+#define MLX5_GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__)
+
+#if MLX5_GCC_VERSION >= 403
+# define __MLX5_ALGN_F__ __attribute__((noinline, aligned(64)))
+# define __MLX5_ALGN_D__ __attribute__((aligned(64)))
+#else
+# define __MLX5_ALGN_F__
+# define __MLX5_ALGN_D__
+#endif
+
+#ifndef min
+#define min(a, b) \
+ ({ typeof(a) _a = (a); \
+ typeof(b) _b = (b); \
+ _a < _b ? _a : _b; })
+#endif
+
+#ifndef max
+#define max(a, b) \
+ ({ typeof(a) _a = (a); \
+ typeof(b) _b = (b); \
+ _a > _b ? _a : _b; })
+#endif
+
+#define HIDDEN __attribute__((visibility("hidden")))
+
+#define PFX "mlx5: "
+
+#define MLX5_MAX_PORTS_NUM 2
+
+enum {
+ MLX5_MAX_CQ_FAMILY_VER = 1,
+ MLX5_MAX_QP_BURST_FAMILY_VER = 0,
+ MLX5_MAX_WQ_FAMILY_VER = 0
+};
+
+enum {
+ MLX5_IB_MMAP_CMD_SHIFT = 8,
+ MLX5_IB_MMAP_CMD_MASK = 0xff,
+};
+
+enum {
+ MLX5_QP_PATTERN = 0x012389AB,
+ MLX5_CQ_PATTERN = 0x4567CDEF,
+ MLX5_WQ_PATTERN = 0x89AB0123
+};
+
+enum mlx5_lock_type {
+ MLX5_SPIN_LOCK = 0,
+ MLX5_MUTEX = 1,
+};
+
+enum mlx5_lock_state {
+ MLX5_USE_LOCK,
+ MLX5_LOCKED,
+ MLX5_UNLOCKED
+};
+
+enum {
+ MLX5_MMAP_GET_REGULAR_PAGES_CMD = 0,
+ MLX5_MMAP_GET_CONTIGUOUS_PAGES_CMD = 1,
+ MLX5_MMAP_GET_WC_PAGES_CMD = 2,
+ MLX5_MMAP_GET_NC_PAGES_CMD = 3,
+ MLX5_MMAP_MAP_DC_INFO_PAGE = 4,
+
+	/* Use EXP mmap commands until they are pushed upstream */
+ MLX5_EXP_MMAP_GET_CORE_CLOCK_CMD = 0xFB,
+ MLX5_EXP_MMAP_GET_CONTIGUOUS_PAGES_CPU_NUMA_CMD = 0xFC,
+ MLX5_EXP_MMAP_GET_CONTIGUOUS_PAGES_DEV_NUMA_CMD = 0xFD,
+ MLX5_EXP_IB_MMAP_N_ALLOC_WC_CMD = 0xFE,
+};
+
+#define MLX5_CQ_PREFIX "MLX_CQ"
+#define MLX5_QP_PREFIX "MLX_QP"
+#define MLX5_MR_PREFIX "MLX_MR"
+#define MLX5_RWQ_PREFIX "MLX_RWQ"
+#define MLX5_MAX_LOG2_CONTIG_BLOCK_SIZE 23
+#define MLX5_MIN_LOG2_CONTIG_BLOCK_SIZE 12
+
+enum {
+ MLX5_DBG_QP = 1 << 0,
+ MLX5_DBG_CQ = 1 << 1,
+ MLX5_DBG_QP_SEND = 1 << 2,
+ MLX5_DBG_QP_SEND_ERR = 1 << 3,
+ MLX5_DBG_CQ_CQE = 1 << 4,
+ MLX5_DBG_CONTIG = 1 << 5,
+};
+
+enum {
+ MLX5_UMR_PTR_ALIGN = 2048,
+};
+
+extern uint32_t mlx5_debug_mask;
+extern int mlx5_freeze_on_error_cqe;
+
+#ifdef MLX5_DEBUG
+#define mlx5_dbg(fp, mask, format, arg...) \
+do { \
+ if (mask & mlx5_debug_mask) \
+ fprintf(fp, "%s:%d: " format, __func__, __LINE__, ##arg); \
+} while (0)
+
+#else
+ #define mlx5_dbg(fp, mask, format, arg...)
+#endif
+
+enum {
+ MLX5_RCV_DBR = 0,
+ MLX5_SND_DBR = 1,
+};
+
+enum {
+ MLX5_STAT_RATE_OFFSET = 5
+};
+
+enum {
+ MLX5_QP_TABLE_SHIFT = 12,
+ MLX5_QP_TABLE_MASK = (1 << MLX5_QP_TABLE_SHIFT) - 1,
+ MLX5_QP_TABLE_SIZE = 1 << (24 - MLX5_QP_TABLE_SHIFT),
+};
+
+enum {
+ MLX5_SRQ_TABLE_SHIFT = 12,
+ MLX5_SRQ_TABLE_MASK = (1 << MLX5_SRQ_TABLE_SHIFT) - 1,
+ MLX5_SRQ_TABLE_SIZE = 1 << (24 - MLX5_SRQ_TABLE_SHIFT),
+};
+
+enum {
+ MLX5_DCT_TABLE_SHIFT = 12,
+ MLX5_DCT_TABLE_MASK = (1 << MLX5_DCT_TABLE_SHIFT) - 1,
+ MLX5_DCT_TABLE_SIZE = 1 << (24 - MLX5_DCT_TABLE_SHIFT),
+};
+
+enum {
+ MLX5_SEND_WQE_BB = 64,
+ MLX5_SEND_WQE_SHIFT = 6,
+};
+
+enum {
+ MLX5_BF_OFFSET = 0x800
+};
+
+enum {
+ MLX5_INLINE_SCATTER_32 = 0x4,
+ MLX5_INLINE_SCATTER_64 = 0x8,
+};
+
+enum {
+ MLX5_OPCODE_NOP = 0x00,
+ MLX5_OPCODE_SEND_INVAL = 0x01,
+ MLX5_OPCODE_RDMA_WRITE = 0x08,
+ MLX5_OPCODE_RDMA_WRITE_IMM = 0x09,
+ MLX5_OPCODE_SEND = 0x0a,
+ MLX5_OPCODE_SEND_IMM = 0x0b,
+ MLX5_OPCODE_LSO_MPW = 0x0e,
+ MLX5_OPC_MOD_MPW = 0x01, /* OPC_MOD for LSO_MPW opcode */
+ MLX5_OPCODE_RDMA_READ = 0x10,
+ MLX5_OPCODE_ATOMIC_CS = 0x11,
+ MLX5_OPCODE_ATOMIC_FA = 0x12,
+ MLX5_OPCODE_ATOMIC_MASKED_CS = 0x14,
+ MLX5_OPCODE_ATOMIC_MASKED_FA = 0x15,
+ MLX5_OPCODE_BIND_MW = 0x18,
+ MLX5_OPCODE_FMR = 0x19,
+ MLX5_OPCODE_LOCAL_INVAL = 0x1b,
+ MLX5_OPCODE_CONFIG_CMD = 0x1f,
+
+ MLX5_OPCODE_SEND_ENABLE = 0x17,
+ MLX5_OPCODE_RECV_ENABLE = 0x16,
+ MLX5_OPCODE_CQE_WAIT = 0x0f,
+ MLX5_OPCODE_UMR = 0x25,
+
+ MLX5_RECV_OPCODE_RDMA_WRITE_IMM = 0x00,
+ MLX5_RECV_OPCODE_SEND = 0x01,
+ MLX5_RECV_OPCODE_SEND_IMM = 0x02,
+ MLX5_RECV_OPCODE_SEND_INVAL = 0x03,
+
+ MLX5_CQE_OPCODE_ERROR = 0x1e,
+ MLX5_CQE_OPCODE_RESIZE = 0x16,
+};
+
+enum {
+ MLX5_SRQ_FLAG_SIGNATURE = 1 << 0,
+};
+
+enum {
+ MLX5_INLINE_SEG = 0x80000000,
+};
+
+enum mlx5_alloc_type {
+ MLX5_ALLOC_TYPE_ANON,
+ MLX5_ALLOC_TYPE_HUGE,
+ MLX5_ALLOC_TYPE_CONTIG,
+ MLX5_ALLOC_TYPE_PREFER_HUGE,
+ MLX5_ALLOC_TYPE_PREFER_CONTIG,
+ MLX5_ALLOC_TYPE_ALL
+};
+
+enum mlx5_mr_type {
+ MLX5_NORMAL_MR = 0x0,
+ MLX5_ODP_MR = 0x1,
+};
+
+struct mlx5_device {
+ struct verbs_device verbs_dev;
+ int page_size;
+
+ struct {
+ unsigned id;
+ unsigned short rev;
+ } devid;
+ int driver_abi_ver;
+};
+
+enum mlx5_rsc_type {
+ MLX5_RSC_TYPE_QP,
+ MLX5_RSC_TYPE_DCT,
+ MLX5_RSC_TYPE_RWQ,
+ MLX5_RSC_TYPE_MP_RWQ,
+ MLX5_RSC_TYPE_XSRQ,
+ MLX5_RSC_TYPE_SRQ,
+ MLX5_RSC_TYPE_INVAL,
+};
+
+struct mlx5_resource {
+ enum mlx5_rsc_type type;
+ uint32_t rsn;
+};
+
+struct mlx5_db_page;
+
+struct mlx5_lock {
+ pthread_mutex_t mutex;
+ pthread_spinlock_t slock;
+ enum mlx5_lock_state state;
+ enum mlx5_lock_type type;
+};
+
+struct mlx5_spinlock {
+ pthread_spinlock_t lock;
+ enum mlx5_lock_state state;
+};
+
+struct mlx5_atomic_info {
+ int valid;
+ enum ibv_exp_atomic_cap exp_atomic_cap;
+ uint64_t bit_mask_log_atomic_arg_sizes;
+};
+
+enum mlx5_uar_mapping_type {
+ MLX5_UAR_MAP_WC,
+ MLX5_UAR_MAP_NC
+};
+struct mlx5_uar_data {
+ enum mlx5_uar_mapping_type map_type;
+ void *regs;
+};
+
+struct mlx5_port_info_ctx {
+ unsigned consumer;
+ int steady;
+};
+
+struct mlx5_info_ctx {
+ void *buf;
+ struct mlx5_port_info_ctx port[2];
+};
+
+struct mlx5_context {
+ struct ibv_context ibv_ctx;
+ int max_num_qps;
+ int bf_reg_size;
+ int tot_uuars;
+ int low_lat_uuars;
+ int bf_regs_per_page;
+ int num_bf_regs;
+ int prefer_bf;
+ int shut_up_bf;
+ int enable_cqe_comp;
+ struct {
+ struct mlx5_resource **table;
+ int refcnt;
+ } rsc_table[MLX5_QP_TABLE_SIZE];
+ pthread_mutex_t rsc_table_mutex;
+
+ struct {
+ struct mlx5_srq **table;
+ int refcnt;
+ } srq_table[MLX5_SRQ_TABLE_SIZE];
+ pthread_mutex_t srq_table_mutex;
+
+ struct {
+ struct mlx5_resource **table;
+ int refcnt;
+ } uidx_table[MLX5_QP_TABLE_SIZE];
+ pthread_mutex_t uidx_table_mutex;
+
+ struct mlx5_uar_data uar[MLX5_MAX_UAR_PAGES];
+
+	struct mlx5_spinlock		send_db_lock; /* protects send_wc_db_list and num_wc_uars */
+ struct list_head send_wc_db_list;
+ unsigned int num_wc_uars;
+ int max_ctx_res_domain;
+
+ struct mlx5_lock lock32;
+ struct mlx5_db_page *db_list;
+ pthread_mutex_t db_list_mutex;
+ int cache_line_size;
+ int max_sq_desc_sz;
+ int max_rq_desc_sz;
+ int max_send_wqebb;
+ int max_recv_wr;
+ unsigned max_srq_recv_wr;
+ int num_ports;
+ int stall_enable;
+ int stall_adaptive_enable;
+ int stall_cycles;
+ struct mlx5_bf *bfs;
+ FILE *dbg_fp;
+ char hostname[40];
+ struct mlx5_spinlock hugetlb_lock;
+ struct list_head hugetlb_list;
+ int max_desc_sz_sq_dc;
+ uint32_t atomic_sizes_dc;
+ pthread_mutex_t task_mutex;
+ struct mlx5_atomic_info info;
+ int max_sge;
+ uint32_t max_send_wqe_inline_klms;
+ pthread_mutex_t env_mtx;
+ int env_initialized;
+ int compact_av;
+ int numa_id;
+ struct mlx5_info_ctx cc;
+ uint8_t cqe_version;
+ uint16_t cqe_comp_max_num;
+ uint16_t rroce_udp_sport_min;
+ uint16_t rroce_udp_sport_max;
+ struct {
+ uint8_t valid;
+ uint8_t link_layer;
+ enum ibv_port_cap_flags caps;
+ } port_query_cache[MLX5_MAX_PORTS_NUM];
+ struct {
+ uint64_t offset;
+ uint64_t mask;
+ uint32_t mult;
+ uint8_t shift;
+ } core_clock;
+ void *hca_core_clock;
+};
+
+struct mlx5_bitmap {
+ uint32_t last;
+ uint32_t top;
+ uint32_t max;
+ uint32_t avail;
+ uint32_t mask;
+ unsigned long *table;
+};
+
+struct mlx5_hugetlb_mem {
+ int shmid;
+ void *shmaddr;
+ struct mlx5_bitmap bitmap;
+ struct list_head list;
+};
+
+struct mlx5_numa_req {
+ int valid;
+ int numa_id;
+};
+struct mlx5_buf {
+ void *buf;
+ size_t length;
+ int base;
+ struct mlx5_hugetlb_mem *hmem;
+ enum mlx5_alloc_type type;
+ struct mlx5_numa_req numa_req;
+ int numa_alloc;
+};
+
+struct mlx5_pd {
+ struct ibv_pd ibv_pd;
+ uint32_t pdn;
+ struct mlx5_implicit_lkey r_ilkey;
+ struct mlx5_implicit_lkey w_ilkey;
+ struct mlx5_implicit_lkey *remote_ilkey;
+};
+
+enum {
+ MLX5_CQ_SET_CI = 0,
+ MLX5_CQ_ARM_DB = 1,
+};
+
+enum mlx5_cq_model_flags {
+ /*
+	 * When set, the CQ API must be thread-safe.
+	 * When clear, the application takes care of
+	 * synchronization between CQ API calls.
+ */
+ MLX5_CQ_MODEL_FLAG_THREAD_SAFE = 1 << 0,
+};
+
+enum mlx5_cq_creation_flags {
+ /* When set, CQ supports timestamping */
+ MLX5_CQ_CREATION_FLAG_COMPLETION_TIMESTAMP = 1 << 0,
+};
+
+struct mlx5_cq {
+ struct ibv_cq ibv_cq;
+ uint32_t creation_flags;
+ uint32_t pattern;
+ struct mlx5_buf buf_a;
+ struct mlx5_buf buf_b;
+ struct mlx5_buf *active_buf;
+ struct mlx5_buf *resize_buf;
+ int resize_cqes;
+ int active_cqes;
+ struct mlx5_lock lock;
+ uint32_t cqn;
+ uint32_t cons_index;
+ uint32_t wait_index;
+ uint32_t wait_count;
+ uint32_t *dbrec;
+ int arm_sn;
+ int cqe_sz;
+ int resize_cqe_sz;
+ int stall_next_poll;
+ int stall_enable;
+ uint64_t stall_last_count;
+ int stall_adaptive_enable;
+ int stall_cycles;
+ uint8_t model_flags; /* use mlx5_cq_model_flags */
+ uint16_t cqe_comp_max_num;
+ uint8_t cq_log_size;
+};
+
+struct mlx5_srq {
+ struct mlx5_resource rsc; /* This struct must be first */
+ struct verbs_srq vsrq;
+ struct mlx5_buf buf;
+ struct mlx5_spinlock lock;
+ uint64_t *wrid;
+ uint32_t srqn;
+ int max;
+ int max_gs;
+ int wqe_shift;
+ int head;
+ int tail;
+ uint32_t *db;
+ uint16_t counter;
+ int wq_sig;
+ struct ibv_srq_legacy *ibv_srq_legacy;
+ int is_xsrq;
+};
+
+struct wr_list {
+ uint16_t opcode;
+ uint16_t next;
+};
+
+struct mlx5_wq {
+ /* common hot data */
+ uint64_t *wrid;
+ unsigned wqe_cnt;
+ unsigned head;
+ unsigned tail;
+ unsigned max_post;
+ int max_gs;
+ struct mlx5_lock lock;
+ /* post_recv hot data */
+ void *buff;
+ uint32_t *db;
+ int wqe_shift;
+ int offset;
+};
+
+struct mlx5_wq_recv_send_enable {
+ unsigned head_en_index;
+ unsigned head_en_count;
+};
+
+enum mlx5_db_method {
+ MLX5_DB_METHOD_DEDIC_BF_1_THREAD,
+ MLX5_DB_METHOD_DEDIC_BF,
+ MLX5_DB_METHOD_BF,
+ MLX5_DB_METHOD_DB
+};
+
+struct mlx5_bf {
+ void *reg;
+ int need_lock;
+ /*
+ * Protect usage of BF address field including data written to the BF
+ * and the BF buffer toggling.
+ */
+ struct mlx5_lock lock;
+ unsigned offset;
+ unsigned buf_size;
+ unsigned uuarn;
+ enum mlx5_db_method db_method;
+};
+
+struct mlx5_mr {
+ struct ibv_mr ibv_mr;
+ struct mlx5_buf buf;
+ uint64_t alloc_flags;
+ enum mlx5_mr_type type;
+};
+
+enum mlx5_qp_model_flags {
+ /*
+	 * When set, the QP API must be thread-safe.
+	 * When clear, the application takes care of
+	 * synchronization between QP API calls.
+ */
+ MLX5_QP_MODEL_FLAG_THREAD_SAFE = 1 << 0,
+ MLX5_QP_MODEL_MULTI_PACKET_WQE = 1 << 1,
+ MLX5_QP_MODEL_RX_CSUM_IP_OK_IP_NON_TCP_UDP = 1 << 2,
+};
+
+struct mlx5_qp;
+struct general_data_hot {
+ /* post_send hot data */
+ unsigned *wqe_head;
+ int (*post_send_one)(struct ibv_exp_send_wr *wr,
+ struct mlx5_qp *qp,
+ uint64_t exp_send_flags,
+ void *seg, int *total_size);
+ void *sqstart;
+ void *sqend;
+ uint32_t *db;
+ struct mlx5_bf *bf;
+ uint32_t scur_post;
+ /* Used for burst_family interface, keeps the last posted wqe */
+ uint32_t last_post;
+ uint16_t create_flags;
+ uint8_t fm_cache;
+ uint8_t model_flags; /* use mlx5_qp_model_flags */
+};
+enum mpw_states {
+ MLX5_MPW_STATE_CLOSED,
+ MLX5_MPW_STATE_OPENED,
+ MLX5_MPW_STATE_OPENED_INL,
+ MLX5_MPW_STATE_OPENING,
+};
+enum {
+ MLX5_MAX_MPW_SGE = 5,
+ MLX5_MAX_MPW_SIZE = 0x3FFF
+};
+struct mpw_data {
+ uint8_t state; /* use mpw_states */
+ uint8_t size;
+ uint8_t num_sge;
+ uint32_t len;
+ uint32_t total_len;
+ uint32_t flags;
+ uint32_t scur_post;
+ union {
+ struct mlx5_wqe_data_seg *last_dseg;
+ uint8_t *inl_data;
+ };
+ uint32_t *ctrl_update;
+};
+struct general_data_warm {
+ uint32_t pattern;
+ uint8_t qp_type;
+};
+struct odp_data {
+ struct mlx5_pd *pd;
+};
+struct data_seg_data {
+ uint32_t max_inline_data;
+};
+struct ctrl_seg_data {
+ uint32_t qp_num;
+ uint8_t fm_ce_se_tbl[8];
+ uint8_t fm_ce_se_acc[32];
+ uint8_t wq_sig;
+};
+struct mlx5_qp {
+ struct mlx5_resource rsc;
+ struct verbs_qp verbs_qp;
+ struct mlx5_buf buf;
+ int buf_size;
+ /* For Raw Ethernet QP, use different Buffer for the SQ and RQ */
+ struct mlx5_buf sq_buf;
+ int sq_buf_size;
+ uint8_t sq_signal_bits;
+ int umr_en;
+
+ /* hot data used on data path */
+ struct mlx5_wq rq __MLX5_ALGN_D__;
+ struct mlx5_wq sq __MLX5_ALGN_D__;
+
+ struct general_data_hot gen_data;
+ struct mpw_data mpw;
+ struct data_seg_data data_seg;
+ struct ctrl_seg_data ctrl_seg;
+
+ /* RAW_PACKET hot data */
+ uint8_t link_layer;
+
+ /* used on data-path but not so hot */
+ struct general_data_warm gen_data_warm;
+ /* atomic hot data */
+ int enable_atomics;
+ /* odp hot data */
+ struct odp_data odp_data;
+ /* ext atomic hot data */
+ uint32_t max_atomic_arg;
+ /* umr hot data */
+ uint32_t max_inl_send_klms;
+ /* recv-send enable hot data */
+ struct mlx5_wq_recv_send_enable rq_enable;
+ struct mlx5_wq_recv_send_enable sq_enable;
+ int rx_qp;
+};
+
+struct mlx5_dct {
+ struct mlx5_resource rsc;
+ struct ibv_exp_dct ibdct;
+};
+
+enum mlx5_wq_model_flags {
+ /*
+	 * When set, the WQ API must be thread-safe.
+	 * When clear, the application takes care of
+	 * synchronization between WQ API calls.
+ */
+ MLX5_WQ_MODEL_FLAG_THREAD_SAFE = 1 << 0,
+
+ /*
+	 * This flag caches the IBV_EXP_DEVICE_RX_CSUM_IP_PKT
+	 * device cap flag and enables the related RX offloading support.
+ */
+ MLX5_WQ_MODEL_RX_CSUM_IP_OK_IP_NON_TCP_UDP = 1 << 1,
+};
+
+enum mlx5_mp_rq_sizes {
+ /*
+	 * The max log number of WQE strides supported by the library is 31,
+	 * since the related "number of strides" variables (i.e.
+	 * consumed_strides_counter[] and mp_rq_strides_in_wqe) are 32 bits wide.
+ */
+ MLX5_MP_RQ_MAX_LOG_NUM_STRIDES = 31,
+ /*
+	 * The max log stride size supported by the library is 15, since the
+	 * related "stride size" variable (i.e. mp_rq_stride_size) is 16 bits wide.
+ */
+ MLX5_MP_RQ_MAX_LOG_STRIDE_SIZE = 15,
+ MLX5_MP_RQ_SUPPORTED_QPT = IBV_EXP_QPT_RAW_PACKET,
+ MLX5_MP_RQ_SUPPORTED_SHIFTS = IBV_EXP_MP_RQ_2BYTES_SHIFT
+};
+
+struct mlx5_rwq {
+ struct mlx5_resource rsc;
+ uint32_t pattern;
+ struct ibv_exp_wq wq;
+ struct mlx5_buf buf;
+ int buf_size;
+ /* hot data used on data path */
+ struct mlx5_wq rq __MLX5_ALGN_D__;
+ uint32_t *db;
+ /* Multi-Packet RQ hot data */
+ /* Table to hold the consumed strides on each WQE */
+ uint32_t *consumed_strides_counter;
+ uint16_t mp_rq_stride_size;
+ uint32_t mp_rq_strides_in_wqe;
+ uint8_t mp_rq_packet_padding;
+ /* recv-send enable hot data */
+ struct mlx5_wq_recv_send_enable rq_enable;
+ int wq_sig;
+ uint8_t model_flags; /* use mlx5_wq_model_flags */
+};
+
+struct mlx5_ah {
+ struct ibv_ah ibv_ah;
+ struct mlx5_wqe_av av;
+};
+
+struct mlx5_verbs_srq {
+ struct mlx5_srq msrq;
+ struct verbs_srq vsrq;
+};
+
+struct mlx5_klm_buf {
+ void *alloc_buf;
+ void *align_buf;
+ struct ibv_mr *mr;
+ struct ibv_exp_mkey_list_container ibv_klm_list;
+};
+
+struct mlx5_send_db_data {
+ struct mlx5_bf bf;
+ struct mlx5_wc_uar *wc_uar;
+ struct list_head list;
+};
+
+/* Container for the dynamically allocated Write-Combining (WC) mapped UAR */
+struct mlx5_wc_uar {
+ /* Each UAR contains MLX5_NUM_UUARS_PER_PAGE UUARS (BFs) */
+ struct mlx5_send_db_data send_db_data[MLX5_NUM_UUARS_PER_PAGE];
+ /* The index used to mmap this UAR */
+ int uar_idx;
+ /* The virtual address of the WC mmaped UAR */
+ void *uar;
+};
+
+struct mlx5_res_domain {
+ struct ibv_exp_res_domain ibv_res_domain;
+ struct ibv_exp_res_domain_init_attr attr;
+ struct mlx5_send_db_data *send_db;
+};
+
+static inline int mlx5_ilog2(int n)
+{
+ int t;
+
+ if (n <= 0)
+ return -1;
+
+ t = 0;
+ while ((1 << t) < n)
+ ++t;
+
+ return t;
+}
+
+extern int mlx5_stall_num_loop;
+extern int mlx5_stall_cq_poll_min;
+extern int mlx5_stall_cq_poll_max;
+extern int mlx5_stall_cq_inc_step;
+extern int mlx5_stall_cq_dec_step;
+extern int mlx5_single_threaded;
+extern int mlx5_use_mutex;
+
+static inline unsigned DIV_ROUND_UP(unsigned n, unsigned d)
+{
+ return (n + d - 1u) / d;
+}
+
+static inline unsigned long align(unsigned long val, unsigned long algn)
+{
+ return (val + algn - 1) & ~(algn - 1);
+}
+
+static inline void *align_ptr(void *p, unsigned long algn)
+{
+ return (void *)align((unsigned long)p, algn);
+}
+
+#define to_mxxx(xxx, type) \
+ ((struct mlx5_##type *) \
+ ((void *) ib##xxx - offsetof(struct mlx5_##type, ibv_##xxx)))
+
+static inline struct mlx5_device *to_mdev(struct ibv_device *ibdev)
+{
+ struct mlx5_device *ret;
+
+ ret = (void *)ibdev - offsetof(struct mlx5_device, verbs_dev);
+
+ return ret;
+}
+
+static inline struct mlx5_context *to_mctx(struct ibv_context *ibctx)
+{
+ return to_mxxx(ctx, context);
+}
+
+static inline struct mlx5_pd *to_mpd(struct ibv_pd *ibpd)
+{
+ return to_mxxx(pd, pd);
+}
+
+static inline struct mlx5_cq *to_mcq(struct ibv_cq *ibcq)
+{
+ return to_mxxx(cq, cq);
+}
+
+static inline struct mlx5_srq *to_msrq(struct ibv_srq *ibsrq)
+{
+ struct verbs_srq *vsrq = (struct verbs_srq *)ibsrq;
+
+ return container_of(vsrq, struct mlx5_srq, vsrq);
+}
+
+static inline struct mlx5_qp *to_mqp(struct ibv_qp *ibqp)
+{
+ struct verbs_qp *vqp = (struct verbs_qp *)ibqp;
+
+ return container_of(vqp, struct mlx5_qp, verbs_qp);
+}
+
+static inline struct mlx5_dct *to_mdct(struct ibv_exp_dct *ibdct)
+{
+ return container_of(ibdct, struct mlx5_dct, ibdct);
+}
+
+static inline struct mlx5_rwq *to_mrwq(struct ibv_exp_wq *ibwq)
+{
+ return container_of(ibwq, struct mlx5_rwq, wq);
+}
+
+static inline struct mlx5_mr *to_mmr(struct ibv_mr *ibmr)
+{
+ return to_mxxx(mr, mr);
+}
+
+static inline struct mlx5_ah *to_mah(struct ibv_ah *ibah)
+{
+ return to_mxxx(ah, ah);
+}
+
+static inline struct mlx5_res_domain *to_mres_domain(struct ibv_exp_res_domain *ibres_domain)
+{
+ return to_mxxx(res_domain, res_domain);
+}
+
+static inline struct mlx5_klm_buf *to_klm(struct ibv_exp_mkey_list_container *ibklm)
+{
+ size_t off = offsetof(struct mlx5_klm_buf, ibv_klm_list);
+
+ return (struct mlx5_klm_buf *)((void *)ibklm - off);
+}
+
+static inline int max_int(int a, int b)
+{
+ return a > b ? a : b;
+}
+
+static inline enum mlx5_lock_type mlx5_get_locktype(void)
+{
+ if (!mlx5_use_mutex)
+ return MLX5_SPIN_LOCK;
+ return MLX5_MUTEX;
+}
+
+void *mlx5_uar_mmap(int idx, int cmd, int page_size, int cmd_fd);
+int mlx5_cpu_local_numa(void);
+void mlx5_build_ctrl_seg_data(struct mlx5_qp *qp, uint32_t qp_num);
+int mlx5_alloc_buf(struct mlx5_buf *buf, size_t size, int page_size);
+void mlx5_free_buf(struct mlx5_buf *buf);
+int mlx5_alloc_buf_contig(struct mlx5_context *mctx, struct mlx5_buf *buf,
+ size_t size, int page_size, const char *component, void *req_addr);
+void mlx5_free_buf_contig(struct mlx5_context *mctx, struct mlx5_buf *buf);
+int mlx5_alloc_prefered_buf(struct mlx5_context *mctx,
+ struct mlx5_buf *buf,
+ size_t size, int page_size,
+ enum mlx5_alloc_type alloc_type,
+ const char *component);
+int mlx5_free_actual_buf(struct mlx5_context *ctx, struct mlx5_buf *buf);
+void mlx5_get_alloc_type(struct ibv_context *context,
+ const char *component,
+ enum mlx5_alloc_type *alloc_type,
+ enum mlx5_alloc_type default_alloc_type);
+int mlx5_use_huge(struct ibv_context *context, const char *key);
+
+uint32_t *mlx5_alloc_dbrec(struct mlx5_context *context);
+void mlx5_free_db(struct mlx5_context *context, uint32_t *db);
+
+int mlx5_prefetch_mr(struct ibv_mr *mr, struct ibv_exp_prefetch_attr *attr);
+
+int mlx5_query_device(struct ibv_context *context,
+ struct ibv_device_attr *attr);
+int mlx5_query_port(struct ibv_context *context, uint8_t port,
+ struct ibv_port_attr *attr);
+int mlx5_exp_query_port(struct ibv_context *context, uint8_t port_num,
+ struct ibv_exp_port_attr *port_attr);
+
+struct ibv_pd *mlx5_alloc_pd(struct ibv_context *context);
+int mlx5_free_pd(struct ibv_pd *pd);
+void read_init_vars(struct mlx5_context *ctx);
+
+struct ibv_mr *mlx5_reg_mr(struct ibv_pd *pd, void *addr,
+ size_t length, int access);
+struct ibv_mr *mlx5_exp_reg_mr(struct ibv_exp_reg_mr_in *in);
+int mlx5_dereg_mr(struct ibv_mr *mr);
+
+struct ibv_cq *mlx5_create_cq(struct ibv_context *context, int cqe,
+ struct ibv_comp_channel *channel,
+ int comp_vector);
+struct ibv_cq *mlx5_create_cq_ex(struct ibv_context *context,
+ int cqe,
+ struct ibv_comp_channel *channel,
+ int comp_vector,
+ struct ibv_exp_cq_init_attr *attr);
+int mlx5_alloc_cq_buf(struct mlx5_context *mctx, struct mlx5_cq *cq,
+ struct mlx5_buf *buf, int nent, int cqe_sz);
+int mlx5_free_cq_buf(struct mlx5_context *ctx, struct mlx5_buf *buf);
+int mlx5_resize_cq(struct ibv_cq *cq, int cqe);
+int mlx5_destroy_cq(struct ibv_cq *cq);
+int mlx5_poll_cq(struct ibv_cq *cq, int ne, struct ibv_wc *wc) __MLX5_ALGN_F__;
+int mlx5_poll_cq_1(struct ibv_cq *cq, int ne, struct ibv_wc *wc) __MLX5_ALGN_F__;
+int mlx5_arm_cq(struct ibv_cq *cq, int solicited);
+void mlx5_cq_event(struct ibv_cq *cq);
+void __mlx5_cq_clean(struct mlx5_cq *cq, uint32_t qpn, struct mlx5_srq *srq);
+void mlx5_cq_clean(struct mlx5_cq *cq, uint32_t qpn, struct mlx5_srq *srq);
+void mlx5_cq_resize_copy_cqes(struct mlx5_cq *cq);
+
+struct ibv_srq *mlx5_create_srq(struct ibv_pd *pd,
+ struct ibv_srq_init_attr *attr);
+int mlx5_modify_srq(struct ibv_srq *srq, struct ibv_srq_attr *attr,
+ int mask);
+int mlx5_query_srq(struct ibv_srq *srq,
+ struct ibv_srq_attr *attr);
+int mlx5_destroy_srq(struct ibv_srq *srq);
+int mlx5_alloc_srq_buf(struct ibv_context *context, struct mlx5_srq *srq);
+void mlx5_free_srq_wqe(struct mlx5_srq *srq, int ind);
+int mlx5_post_srq_recv(struct ibv_srq *ibsrq,
+ struct ibv_recv_wr *wr,
+ struct ibv_recv_wr **bad_wr) __MLX5_ALGN_F__;
+
+struct ibv_qp *mlx5_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr);
+int mlx5_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
+ int attr_mask,
+ struct ibv_qp_init_attr *init_attr);
+int mlx5_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
+ int attr_mask);
+int mlx5_destroy_qp(struct ibv_qp *qp);
+void mlx5_init_qp_indices(struct mlx5_qp *qp);
+void mlx5_init_rwq_indices(struct mlx5_rwq *rwq);
+void mlx5_update_post_send_one(struct mlx5_qp *qp, enum ibv_qp_state qp_state, enum ibv_qp_type qp_type);
+int mlx5_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
+ struct ibv_send_wr **bad_wr) __MLX5_ALGN_F__;
+int mlx5_exp_post_send(struct ibv_qp *ibqp, struct ibv_exp_send_wr *wr,
+ struct ibv_exp_send_wr **bad_wr) __MLX5_ALGN_F__;
+struct ibv_exp_mkey_list_container *mlx5_alloc_mkey_mem(struct ibv_exp_mkey_list_container_attr *attr);
+int mlx5_free_mkey_mem(struct ibv_exp_mkey_list_container *mem);
+int mlx5_query_mkey(struct ibv_mr *mr, struct ibv_exp_mkey_attr *mkey_attr);
+struct ibv_mr *mlx5_create_mr(struct ibv_exp_create_mr_in *in);
+int mlx5_exp_dereg_mr(struct ibv_mr *mr, struct ibv_exp_dereg_out *out);
+struct ibv_exp_wq *mlx5_exp_create_wq(struct ibv_context *context,
+ struct ibv_exp_wq_init_attr *attr);
+int mlx5_exp_modify_wq(struct ibv_exp_wq *wq, struct ibv_exp_wq_attr *attr);
+int mlx5_exp_destroy_wq(struct ibv_exp_wq *wq);
+struct ibv_exp_rwq_ind_table *mlx5_exp_create_rwq_ind_table(struct ibv_context *context,
+ struct ibv_exp_rwq_ind_table_init_attr *init_attr);
+int mlx5_exp_destroy_rwq_ind_table(struct ibv_exp_rwq_ind_table *rwq_ind_table);
+int mlx5_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr,
+ struct ibv_recv_wr **bad_wr) __MLX5_ALGN_F__;
+void mlx5_calc_sq_wqe_size(struct ibv_qp_cap *cap, enum ibv_qp_type type,
+ struct mlx5_qp *qp);
+void mlx5_set_sq_sizes(struct mlx5_qp *qp, struct ibv_qp_cap *cap,
+ enum ibv_qp_type type);
+int mlx5_store_rsc(struct mlx5_context *ctx, uint32_t rsn, void *rsc);
+void *mlx5_find_rsc(struct mlx5_context *ctx, uint32_t rsn);
+void mlx5_clear_rsc(struct mlx5_context *ctx, uint32_t rsn);
+uint32_t mlx5_store_uidx(struct mlx5_context *ctx, void *rsc);
+void mlx5_clear_uidx(struct mlx5_context *ctx, uint32_t uidx);
+struct mlx5_srq *mlx5_find_srq(struct mlx5_context *ctx, uint32_t srqn);
+int mlx5_store_srq(struct mlx5_context *ctx, uint32_t srqn,
+ struct mlx5_srq *srq);
+void mlx5_clear_srq(struct mlx5_context *ctx, uint32_t srqn);
+struct ibv_ah *mlx5_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr);
+int mlx5_destroy_ah(struct ibv_ah *ah);
+int mlx5_alloc_av(struct mlx5_pd *pd, struct ibv_ah_attr *attr,
+ struct mlx5_ah *ah);
+void mlx5_free_av(struct mlx5_ah *ah);
+int mlx5_attach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid);
+int mlx5_detach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid);
+int mlx5_round_up_power_of_two(long long sz);
+void *mlx5_get_atomic_laddr(struct mlx5_qp *qp, uint16_t idx, int *byte_count);
+int mlx5_copy_to_recv_wqe(struct mlx5_qp *qp, int idx, void *buf, int size);
+int mlx5_copy_to_send_wqe(struct mlx5_qp *qp, int idx, void *buf, int size);
+int mlx5_poll_dc_info(struct ibv_context *context,
+ struct ibv_exp_dc_info_ent *ents,
+ int nent, int port);
+int mlx5_copy_to_recv_srq(struct mlx5_srq *srq, int idx, void *buf, int size);
+struct ibv_qp *mlx5_drv_create_qp(struct ibv_context *context,
+ struct ibv_qp_init_attr_ex *attrx);
+struct ibv_qp *mlx5_exp_create_qp(struct ibv_context *context,
+ struct ibv_exp_qp_init_attr *attrx);
+struct ibv_ah *mlx5_exp_create_ah(struct ibv_pd *pd,
+ struct ibv_exp_ah_attr *attr_ex);
+struct ibv_xrcd *mlx5_open_xrcd(struct ibv_context *context,
+ struct ibv_xrcd_init_attr *xrcd_init_attr);
+struct ibv_srq *mlx5_create_srq_ex(struct ibv_context *context,
+ struct ibv_srq_init_attr_ex *attr_ex);
+int mlx5_get_srq_num(struct ibv_srq *srq, uint32_t *srq_num);
+struct ibv_qp *mlx5_open_qp(struct ibv_context *context,
+ struct ibv_qp_open_attr *attr);
+int mlx5_close_xrcd(struct ibv_xrcd *ib_xrcd);
+int mlx5_modify_qp_ex(struct ibv_qp *qp, struct ibv_exp_qp_attr *attr,
+ uint64_t attr_mask);
+void *mlx5_get_legacy_xrc(struct ibv_srq *srq);
+void mlx5_set_legacy_xrc(struct ibv_srq *srq, void *legacy_xrc_srq);
+int mlx5_query_device_ex(struct ibv_context *context,
+ struct ibv_exp_device_attr *attr);
+int mlx5_exp_query_values(struct ibv_context *context, int q_values,
+ struct ibv_exp_values *values);
+int mlx5_modify_cq(struct ibv_cq *cq, struct ibv_exp_cq_attr *attr, int attr_mask);
+struct ibv_exp_dct *mlx5_create_dct(struct ibv_context *context,
+ struct ibv_exp_dct_init_attr *attr);
+int mlx5_destroy_dct(struct ibv_exp_dct *dct);
+int mlx5_poll_cq_ex(struct ibv_cq *ibcq, int num_entries,
+ struct ibv_exp_wc *wc, uint32_t wc_size) __MLX5_ALGN_F__;
+int mlx5_poll_cq_ex_1(struct ibv_cq *ibcq, int num_entries,
+ struct ibv_exp_wc *wc, uint32_t wc_size) __MLX5_ALGN_F__;
+int mlx5_query_dct(struct ibv_exp_dct *dct, struct ibv_exp_dct_attr *attr);
+int mlx5_arm_dct(struct ibv_exp_dct *dct, struct ibv_exp_arm_attr *attr);
+int mlx5_post_task(struct ibv_context *context,
+ struct ibv_exp_task *task_list,
+ struct ibv_exp_task **bad_task);
+struct ibv_exp_res_domain *mlx5_exp_create_res_domain(struct ibv_context *context,
+ struct ibv_exp_res_domain_init_attr *attr);
+int mlx5_exp_destroy_res_domain(struct ibv_context *context,
+ struct ibv_exp_res_domain *res_dom,
+ struct ibv_exp_destroy_res_domain_attr *attr);
+void *mlx5_exp_query_intf(struct ibv_context *context, struct ibv_exp_query_intf_params *params,
+ enum ibv_exp_query_intf_status *status);
+int mlx5_exp_release_intf(struct ibv_context *context, void *intf,
+ struct ibv_exp_release_intf_params *params);
+struct ibv_exp_qp_burst_family *mlx5_get_qp_burst_family(struct mlx5_qp *qp,
+ struct ibv_exp_query_intf_params *params,
+ enum ibv_exp_query_intf_status *status);
+struct ibv_exp_wq_family *mlx5_get_wq_family(struct mlx5_rwq *rwq,
+ struct ibv_exp_query_intf_params *params,
+ enum ibv_exp_query_intf_status *status);
+struct ibv_exp_cq_family_v1 *mlx5_get_poll_cq_family(struct mlx5_cq *cq,
+ struct ibv_exp_query_intf_params *params,
+ enum ibv_exp_query_intf_status *status);
+static inline void *mlx5_find_uidx(struct mlx5_context *ctx, uint32_t uidx)
+{
+ int tind = uidx >> MLX5_QP_TABLE_SHIFT;
+
+ if (likely(ctx->uidx_table[tind].refcnt))
+ return ctx->uidx_table[tind].table[uidx & MLX5_QP_TABLE_MASK];
+
+ return NULL;
+}
+
+static inline int mlx5_spin_lock(struct mlx5_spinlock *lock)
+{
+ if (lock->state == MLX5_USE_LOCK)
+ return pthread_spin_lock(&lock->lock);
+
+ if (unlikely(lock->state == MLX5_LOCKED)) {
+ fprintf(stderr, "*** ERROR: multithreading violation ***\n"
+ "You are running a multithreaded application but\n"
+ "you set MLX5_SINGLE_THREADED=1. Please unset it.\n");
+ abort();
+ } else {
+ lock->state = MLX5_LOCKED;
+ wmb();
+ }
+
+ return 0;
+}
+
+static inline int mlx5_spin_unlock(struct mlx5_spinlock *lock)
+{
+ if (lock->state == MLX5_USE_LOCK)
+ return pthread_spin_unlock(&lock->lock);
+
+ lock->state = MLX5_UNLOCKED;
+
+ return 0;
+}
+
+static inline int mlx5_spinlock_init(struct mlx5_spinlock *lock, int use_spinlock)
+{
+ if (use_spinlock) {
+ lock->state = MLX5_USE_LOCK;
+ return pthread_spin_init(&lock->lock, PTHREAD_PROCESS_PRIVATE);
+ }
+ lock->state = MLX5_UNLOCKED;
+
+ return 0;
+}
+
+static inline int mlx5_spinlock_destroy(struct mlx5_spinlock *lock)
+{
+ if (lock->state == MLX5_USE_LOCK)
+ return pthread_spin_destroy(&lock->lock);
+
+ return 0;
+}
+
+static inline int mlx5_lock(struct mlx5_lock *lock)
+{
+ if (lock->state == MLX5_USE_LOCK) {
+ if (lock->type == MLX5_SPIN_LOCK)
+ return pthread_spin_lock(&lock->slock);
+
+ return pthread_mutex_lock(&lock->mutex);
+ }
+
+ if (unlikely(lock->state == MLX5_LOCKED)) {
+ fprintf(stderr, "*** ERROR: multithreading violation ***\n"
+ "You are running a multithreaded application but\n"
+ "you set MLX5_SINGLE_THREADED=1. Please unset it.\n");
+ abort();
+ } else {
+ lock->state = MLX5_LOCKED;
+ /* Make new lock state visible to other threads */
+ wmb();
+ }
+
+ return 0;
+}
+
+static inline int mlx5_unlock(struct mlx5_lock *lock)
+{
+ if (lock->state == MLX5_USE_LOCK) {
+ if (lock->type == MLX5_SPIN_LOCK)
+ return pthread_spin_unlock(&lock->slock);
+
+ return pthread_mutex_unlock(&lock->mutex);
+ }
+
+ lock->state = MLX5_UNLOCKED;
+
+ return 0;
+}
+
+static inline int mlx5_lock_init(struct mlx5_lock *lock,
+ int use_lock,
+ enum mlx5_lock_type lock_type)
+{
+ if (use_lock) {
+ lock->type = lock_type;
+ lock->state = MLX5_USE_LOCK;
+ if (lock->type == MLX5_SPIN_LOCK)
+ return pthread_spin_init(&lock->slock,
+ PTHREAD_PROCESS_PRIVATE);
+ return pthread_mutex_init(&lock->mutex,
+ PTHREAD_PROCESS_PRIVATE);
+ }
+
+ lock->state = MLX5_UNLOCKED;
+
+ return 0;
+}
+
+static inline int mlx5_lock_destroy(struct mlx5_lock *lock)
+{
+ if (lock->state == MLX5_USE_LOCK) {
+ if (lock->type == MLX5_SPIN_LOCK)
+ return pthread_spin_destroy(&lock->slock);
+
+ return pthread_mutex_destroy(&lock->mutex);
+ }
+ return 0;
+}
+
+static inline void set_command(int command, off_t *offset)
+{
+ *offset |= (command << MLX5_IB_MMAP_CMD_SHIFT);
+}
+
+static inline int get_command(off_t *offset)
+{
+ return ((*offset >> MLX5_IB_MMAP_CMD_SHIFT) & MLX5_IB_MMAP_CMD_MASK);
+}
+
+static inline void reset_command(off_t *offset)
+{
+ *offset &= ~(MLX5_IB_MMAP_CMD_MASK << MLX5_IB_MMAP_CMD_SHIFT);
+}
+
+static inline void set_arg(int arg, off_t *offset)
+{
+ *offset |= arg;
+}
+
+static inline void set_order(int order, off_t *offset)
+{
+ set_arg(order, offset);
+}
+
+static inline void set_index(int index, off_t *offset)
+{
+ set_arg(index, offset);
+}
+
+static inline uint8_t calc_xor(void *wqe, int size)
+{
+ int i;
+ uint8_t *p = wqe;
+ uint8_t res = 0;
+
+ for (i = 0; i < size; ++i)
+ res ^= p[i];
+
+ return res;
+}
+
+static inline void mlx5_update_cons_index(struct mlx5_cq *cq)
+{
+ cq->dbrec[MLX5_CQ_SET_CI] = htonl(cq->cons_index & 0xffffff);
+}
+
+#endif /* MLX5_H */
Index: contrib/ofed/libmlx5/src/mlx5.c
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/mlx5.c
@@ -0,0 +1,1006 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+
+#if HAVE_CONFIG_H
+# include <config.h>
+#endif /* HAVE_CONFIG_H */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/mman.h>
+#include <pthread.h>
+#include <string.h>
+#include <sched.h>
+#include <sys/param.h>
+#include <sys/cpuset.h>
+
+#ifndef HAVE_IBV_REGISTER_DRIVER
+#include <sysfs/libsysfs.h>
+#endif
+
+#include "mlx5.h"
+#include "mlx5-abi.h"
+
+#ifndef PCI_VENDOR_ID_MELLANOX
+#define PCI_VENDOR_ID_MELLANOX 0x15b3
+#endif
+
+#define HCA(v, d) \
+ { .vendor = PCI_VENDOR_ID_##v, \
+ .device = d }
+
+struct {
+ unsigned vendor;
+ unsigned device;
+} hca_table[] = {
+ HCA(MELLANOX, 4113), /* MT27600 Connect-IB */
+ HCA(MELLANOX, 4114), /* MT27600 Connect-IB virtual function */
+ HCA(MELLANOX, 4115), /* ConnectX-4 */
+ HCA(MELLANOX, 4116), /* ConnectX-4 VF */
+ HCA(MELLANOX, 4117), /* ConnectX-4Lx */
+ HCA(MELLANOX, 4118), /* ConnectX-4Lx VF */
+ HCA(MELLANOX, 4119), /* ConnectX-5 */
+ HCA(MELLANOX, 4120), /* ConnectX-5 VF */
+};
+
+uint32_t mlx5_debug_mask = 0;
+int mlx5_freeze_on_error_cqe;
+
+static struct ibv_context_ops mlx5_ctx_ops = {
+ .query_device = mlx5_query_device,
+ .query_port = mlx5_query_port,
+ .alloc_pd = mlx5_alloc_pd,
+ .dealloc_pd = mlx5_free_pd,
+ .reg_mr = mlx5_reg_mr,
+ .dereg_mr = mlx5_dereg_mr,
+ .create_cq = mlx5_create_cq,
+ .poll_cq = mlx5_poll_cq,
+ .req_notify_cq = mlx5_arm_cq,
+ .cq_event = mlx5_cq_event,
+ .resize_cq = mlx5_resize_cq,
+ .destroy_cq = mlx5_destroy_cq,
+ .create_srq = mlx5_create_srq,
+ .modify_srq = mlx5_modify_srq,
+ .query_srq = mlx5_query_srq,
+ .destroy_srq = mlx5_destroy_srq,
+ .post_srq_recv = mlx5_post_srq_recv,
+ .create_qp = mlx5_create_qp,
+ .query_qp = mlx5_query_qp,
+ .modify_qp = mlx5_modify_qp,
+ .destroy_qp = mlx5_destroy_qp,
+ .post_send = mlx5_post_send,
+ .post_recv = mlx5_post_recv,
+ .create_ah = mlx5_create_ah,
+ .destroy_ah = mlx5_destroy_ah,
+ .attach_mcast = mlx5_attach_mcast,
+ .detach_mcast = mlx5_detach_mcast
+};
+
+static int read_number_from_line(const char *line, int *value)
+{
+ const char *ptr;
+
+ ptr = strchr(line, ':');
+ if (!ptr)
+ return 1;
+
+ ++ptr;
+
+ *value = atoi(ptr);
+ return 0;
+}
+
+static int get_free_uidx(struct mlx5_context *ctx)
+{
+ int tind;
+ int i;
+
+ for (tind = 0; tind < MLX5_QP_TABLE_SIZE; tind++) {
+ if (ctx->uidx_table[tind].refcnt < MLX5_QP_TABLE_MASK)
+ break;
+ }
+
+ if (tind == MLX5_QP_TABLE_SIZE)
+ return -1;
+
+ if (!ctx->uidx_table[tind].refcnt)
+ return (tind << MLX5_QP_TABLE_SHIFT);
+
+ for (i = 0; i < MLX5_QP_TABLE_MASK + 1; i++) {
+ if (!ctx->uidx_table[tind].table[i])
+ break;
+ }
+
+ return (tind << MLX5_QP_TABLE_SHIFT) | i;
+}
+
+uint32_t mlx5_store_uidx(struct mlx5_context *ctx, void *rsc)
+{
+ int tind;
+ int ret = -1;
+ int uidx;
+
+ pthread_mutex_lock(&ctx->uidx_table_mutex);
+ uidx = get_free_uidx(ctx);
+ if (uidx < 0)
+ goto out;
+
+ tind = uidx >> MLX5_QP_TABLE_SHIFT;
+
+ if (!ctx->uidx_table[tind].refcnt) {
+ ctx->uidx_table[tind].table = calloc(MLX5_QP_TABLE_MASK + 1,
+ sizeof(void *));
+ if (!ctx->uidx_table[tind].table)
+ goto out;
+ }
+
+ ++ctx->uidx_table[tind].refcnt;
+ ctx->uidx_table[tind].table[uidx & MLX5_QP_TABLE_MASK] = rsc;
+ ret = uidx;
+
+out:
+ pthread_mutex_unlock(&ctx->uidx_table_mutex);
+ return ret;
+}
+
+void mlx5_clear_uidx(struct mlx5_context *ctx, uint32_t uidx)
+{
+ int tind = uidx >> MLX5_QP_TABLE_SHIFT;
+
+ pthread_mutex_lock(&ctx->uidx_table_mutex);
+
+ if (!--ctx->uidx_table[tind].refcnt)
+ free(ctx->uidx_table[tind].table);
+ else
+ ctx->uidx_table[tind].table[uidx & MLX5_QP_TABLE_MASK] = NULL;
+
+ pthread_mutex_unlock(&ctx->uidx_table_mutex);
+}
+
+static int mlx5_is_sandy_bridge(int *num_cores)
+{
+ char line[128];
+ FILE *fd;
+ int rc = 0;
+ int cur_cpu_family = -1;
+ int cur_cpu_model = -1;
+
+ fd = fopen("/proc/cpuinfo", "r");
+ if (!fd)
+ return 0;
+
+ *num_cores = 0;
+
+ while (fgets(line, 128, fd)) {
+ int value;
+
+		/* if this is information on a new processor */
+ if (!strncmp(line, "processor", 9)) {
+ ++*num_cores;
+
+ cur_cpu_family = -1;
+ cur_cpu_model = -1;
+ } else if (!strncmp(line, "cpu family", 10)) {
+ if ((cur_cpu_family < 0) && (!read_number_from_line(line, &value)))
+ cur_cpu_family = value;
+ } else if (!strncmp(line, "model", 5)) {
+ if ((cur_cpu_model < 0) && (!read_number_from_line(line, &value)))
+ cur_cpu_model = value;
+ }
+
+ /* if this is a Sandy Bridge CPU */
+ if ((cur_cpu_family == 6) &&
+ (cur_cpu_model == 0x2A || (cur_cpu_model == 0x2D) ))
+ rc = 1;
+ }
+
+ fclose(fd);
+ return rc;
+}
+
+/*
+man cpuset
+
+ This format displays each 32-bit word in hexadecimal (using ASCII characters "0" - "9" and "a" - "f"); words
+ are filled with leading zeros, if required. For masks longer than one word, a comma separator is used between
+ words. Words are displayed in big-endian order, which has the most significant bit first. The hex digits
+ within a word are also in big-endian order.
+
+ The number of 32-bit words displayed is the minimum number needed to display all bits of the bitmask, based on
+ the size of the bitmask.
+
+ Examples of the Mask Format:
+
+ 00000001 # just bit 0 set
+ 40000000,00000000,00000000 # just bit 94 set
+ 000000ff,00000000 # bits 32-39 set
+ 00000000,000E3862 # 1,5,6,11-13,17-19 set
+
+ A mask with bits 0, 1, 2, 4, 8, 16, 32, and 64 set displays as:
+
+ 00000001,00000001,00010117
+
+ The first "1" is for bit 64, the second for bit 32, the third for bit 16, the fourth for bit 8, the fifth for
+ bit 4, and the "7" is for bits 2, 1, and 0.
+*/
+static void mlx5_local_cpu_set(struct mlx5_context *ctx, cpuset_t *cpu_set)
+{
+ char *p, buf[1024];
+ char env_value[VERBS_MAX_ENV_VAL];
+ uint32_t word;
+ int i, k;
+ struct ibv_context *context = &ctx->ibv_ctx;
+
+ if (!ibv_exp_cmd_getenv(context, "MLX5_LOCAL_CPUS", env_value, sizeof(env_value)))
+ strncpy(buf, env_value, sizeof(buf));
+ else {
+ char fname[MAXPATHLEN];
+
+ snprintf(fname, MAXPATHLEN, "/sys/class/infiniband/%s",
+ ibv_get_device_name(context->device));
+
+ if (ibv_read_sysfs_file(fname, "device/local_cpus", buf, sizeof(buf))) {
+			fprintf(stderr, PFX "Warning: cannot get local cpu set: failed to open %s\n", fname);
+ return;
+ }
+ }
+
+ p = strrchr(buf, ',');
+ if (!p)
+ p = buf;
+
+ i = 0;
+ do {
+ if (*p == ',') {
+ *p = 0;
+ p ++;
+ }
+
+ word = strtoul(p, 0, 16);
+
+ for (k = 0; word; ++k, word >>= 1)
+ if (word & 1)
+ CPU_SET(k+i, cpu_set);
+
+ if (p == buf)
+ break;
+
+ p = strrchr(buf, ',');
+ if (!p)
+ p = buf;
+
+ i += 32;
+ } while (i < CPU_SETSIZE);
+}
+
+static int mlx5_device_local_numa(struct mlx5_context *ctx)
+{
+ char buf[1024];
+ struct ibv_context *context = &ctx->ibv_ctx;
+ char fname[MAXPATHLEN];
+
+ snprintf(fname, MAXPATHLEN, "/sys/class/infiniband/%s",
+ ibv_get_device_name(context->device));
+
+ if (ibv_read_sysfs_file(fname, "device/numa_node", buf, sizeof(buf)))
+ return (-1);
+
+ return (int)strtoul(buf, 0, 0);
+}
+
+static int mlx5_enable_stall_cq(struct mlx5_context *ctx, int only_sb)
+{
+ cpuset_t my_cpus, dev_local_cpus, result_set;
+ int stall_enable;
+ int ret;
+ int num_cores;
+
+ if (only_sb && !mlx5_is_sandy_bridge(&num_cores))
+ return 0;
+
+ /* by default disable stall on sandy bridge arch */
+ stall_enable = 0;
+
+ /*
+ * check if app is bound to cpu set that is inside
+ * of device local cpu set. Disable stalling if true
+ */
+
+ /* use static cpu set - up to CPU_SETSIZE (1024) cpus/node */
+ CPU_ZERO(&my_cpus);
+ CPU_ZERO(&dev_local_cpus);
+ CPU_ZERO(&result_set);
+ ret = cpuset_getaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
+ sizeof(my_cpus), &my_cpus);
+ if (ret == -1) {
+ if (errno == EINVAL)
+ fprintf(stderr, PFX "Warning: my cpu set is too small\n");
+ else
+ fprintf(stderr, PFX "Warning: failed to get my cpu set\n");
+ goto out;
+ }
+
+ /* get device local cpu set */
+ mlx5_local_cpu_set(ctx, &dev_local_cpus);
+
+ /* make sure result_set is not init to all 0 */
+ CPU_SET(0, &result_set);
+ /* Set stall_enable if my cpu set and dev cpu set are disjoint sets */
+ CPU_AND(&result_set, &my_cpus);
+ CPU_AND(&result_set, &dev_local_cpus);
+ stall_enable = CPU_COUNT(&result_set) ? 0 : 1;
+
+out:
+ return stall_enable;
+}
+
+static void mlx5_read_env(struct mlx5_context *ctx)
+{
+ char env_value[VERBS_MAX_ENV_VAL];
+ struct ibv_context *context = &ctx->ibv_ctx;
+
+ /* If MLX5_STALL_CQ_POLL is not set enable stall CQ only on sandy bridge */
+ if (ibv_exp_cmd_getenv(context, "MLX5_STALL_CQ_POLL", env_value, sizeof(env_value)))
+ ctx->stall_enable = mlx5_enable_stall_cq(ctx, 1);
+ /* If MLX5_STALL_CQ_POLL == 0 disable stall CQ */
+ else if (!strcmp(env_value, "0"))
+ ctx->stall_enable = 0;
+ /* If MLX5_STALL_CQ_POLL == 1 enable stall CQ */
+ else if (!strcmp(env_value, "1"))
+ ctx->stall_enable = mlx5_enable_stall_cq(ctx, 0);
+ /* Otherwise enable stall CQ only on sandy bridge */
+ else
+ ctx->stall_enable = mlx5_enable_stall_cq(ctx, 1);
+
+ if (!ibv_exp_cmd_getenv(context, "MLX5_STALL_NUM_LOOP", env_value, sizeof(env_value)))
+ mlx5_stall_num_loop = atoi(env_value);
+
+ if (!ibv_exp_cmd_getenv(context, "MLX5_STALL_CQ_POLL_MIN", env_value, sizeof(env_value)))
+ mlx5_stall_cq_poll_min = atoi(env_value);
+
+ if (!ibv_exp_cmd_getenv(context, "MLX5_STALL_CQ_POLL_MAX", env_value, sizeof(env_value)))
+ mlx5_stall_cq_poll_max = atoi(env_value);
+
+ if (!ibv_exp_cmd_getenv(context, "MLX5_STALL_CQ_INC_STEP", env_value, sizeof(env_value)))
+ mlx5_stall_cq_inc_step = atoi(env_value);
+
+ if (!ibv_exp_cmd_getenv(context, "MLX5_STALL_CQ_DEC_STEP", env_value, sizeof(env_value)))
+ mlx5_stall_cq_dec_step = atoi(env_value);
+
+ ctx->stall_adaptive_enable = 0;
+ ctx->stall_cycles = 0;
+ ctx->numa_id = mlx5_device_local_numa(ctx);
+
+ if (mlx5_stall_num_loop < 0) {
+ ctx->stall_adaptive_enable = 1;
+ ctx->stall_cycles = mlx5_stall_cq_poll_min;
+ }
+}
+
+static int get_total_uuars(void)
+{
+ return MLX5_DEF_TOT_UUARS;
+}
+
+static void open_debug_file(struct mlx5_context *ctx)
+{
+ char env[VERBS_MAX_ENV_VAL];
+
+ if (ibv_exp_cmd_getenv(&ctx->ibv_ctx, "MLX5_DEBUG_FILE", env, sizeof(env))) {
+ ctx->dbg_fp = stderr;
+ return;
+ }
+
+	ctx->dbg_fp = fopen(env, "a+");
+ if (!ctx->dbg_fp) {
+ fprintf(stderr, "Failed opening debug file %s, using stderr\n", env);
+ ctx->dbg_fp = stderr;
+ return;
+ }
+}
+
+static void close_debug_file(struct mlx5_context *ctx)
+{
+ if (ctx->dbg_fp && ctx->dbg_fp != stderr)
+ fclose(ctx->dbg_fp);
+}
+
+static void set_debug_mask(struct ibv_context *context)
+{
+ char env[VERBS_MAX_ENV_VAL];
+
+ if (!ibv_exp_cmd_getenv(context, "MLX5_DEBUG_MASK", env, sizeof(env)))
+ mlx5_debug_mask = strtol(env, NULL, 0);
+}
+
+static void set_freeze_on_error(struct ibv_context *context)
+{
+ char env[VERBS_MAX_ENV_VAL];
+
+ if (!ibv_exp_cmd_getenv(context, "MLX5_FREEZE_ON_ERROR_CQE", env, sizeof(env)))
+ mlx5_freeze_on_error_cqe = strtol(env, NULL, 0);
+}
+
+static int get_always_bf(struct ibv_context *context)
+{
+ char env[VERBS_MAX_ENV_VAL];
+
+ if (ibv_exp_cmd_getenv(context, "MLX5_POST_SEND_PREFER_BF", env, sizeof(env)))
+ return 1;
+
+ return strcmp(env, "0") ? 1 : 0;
+}
+
+static int get_shut_up_bf(struct ibv_context *context)
+{
+ char env[VERBS_MAX_ENV_VAL];
+
+ if (ibv_exp_cmd_getenv(context, "MLX5_SHUT_UP_BF", env, sizeof(env)))
+ return 0;
+
+ return strcmp(env, "0") ? 1 : 0;
+}
+
+static int get_cqe_comp(struct ibv_context *context)
+{
+ char env[VERBS_MAX_ENV_VAL];
+
+ if (ibv_exp_cmd_getenv(context, "MLX5_ENABLE_CQE_COMPRESSION", env, sizeof(env)))
+ return 0;
+
+ return strcmp(env, "0") ? 1 : 0;
+}
+
+static int get_use_mutex(struct ibv_context *context)
+{
+ char env[VERBS_MAX_ENV_VAL];
+
+ if (ibv_exp_cmd_getenv(context, "MLX5_USE_MUTEX", env, sizeof(env)))
+ return 0;
+
+ return strcmp(env, "0") ? 1 : 0;
+}
+
+static int get_num_low_lat_uuars(void)
+{
+ return 4;
+}
+
+static int need_uuar_lock(struct mlx5_context *ctx, int uuarn)
+{
+ if (uuarn == 0)
+ return 0;
+
+ if (uuarn >= (ctx->tot_uuars - ctx->low_lat_uuars) * 2)
+ return 0;
+
+ return 1;
+}
+
+static int single_threaded_app(struct ibv_context *context)
+{
+ char env[VERBS_MAX_ENV_VAL];
+
+ if (!ibv_exp_cmd_getenv(context, "MLX5_SINGLE_THREADED", env, sizeof(env)))
+ return strcmp(env, "1") ? 0 : 1;
+
+ return 0;
+}
+
+static void set_extended(struct verbs_context *verbs_ctx)
+{
+ int off_create_qp_ex = offsetof(struct verbs_context, create_qp_ex);
+ int off_open_xrcd = offsetof(struct verbs_context, open_xrcd);
+ int off_create_srq = offsetof(struct verbs_context, create_srq_ex);
+ int off_get_srq_num = offsetof(struct verbs_context, get_srq_num);
+ int off_open_qp = offsetof(struct verbs_context, open_qp);
+ int off_mlx5_close_xrcd = offsetof(struct verbs_context, close_xrcd);
+ int off_create_flow = offsetof(struct verbs_context, create_flow);
+ int off_destroy_flow = offsetof(struct verbs_context, destroy_flow);
+
+ if (sizeof(*verbs_ctx) - off_create_qp_ex <= verbs_ctx->sz)
+ verbs_ctx->create_qp_ex = mlx5_drv_create_qp;
+
+ if (sizeof(*verbs_ctx) - off_open_xrcd <= verbs_ctx->sz)
+ verbs_ctx->open_xrcd = mlx5_open_xrcd;
+
+ if (sizeof(*verbs_ctx) - off_create_srq <= verbs_ctx->sz)
+ verbs_ctx->create_srq_ex = mlx5_create_srq_ex;
+
+ if (sizeof(*verbs_ctx) - off_get_srq_num <= verbs_ctx->sz)
+ verbs_ctx->get_srq_num = mlx5_get_srq_num;
+
+ if (sizeof(*verbs_ctx) - off_open_qp <= verbs_ctx->sz)
+ verbs_ctx->open_qp = mlx5_open_qp;
+
+ if (sizeof(*verbs_ctx) - off_mlx5_close_xrcd <= verbs_ctx->sz)
+ verbs_ctx->close_xrcd = mlx5_close_xrcd;
+
+ if (sizeof(*verbs_ctx) - off_create_flow <= verbs_ctx->sz)
+ verbs_ctx->create_flow = ibv_cmd_create_flow;
+
+ if (sizeof(*verbs_ctx) - off_destroy_flow <= verbs_ctx->sz)
+ verbs_ctx->destroy_flow = ibv_cmd_destroy_flow;
+}
+
+static void set_experimental(struct ibv_context *ctx)
+{
+ struct verbs_context_exp *verbs_exp_ctx = verbs_get_exp_ctx(ctx);
+ struct mlx5_context *mctx = to_mctx(ctx);
+
+ verbs_set_exp_ctx_op(verbs_exp_ctx, create_dct, mlx5_create_dct);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, destroy_dct, mlx5_destroy_dct);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, query_dct, mlx5_query_dct);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_arm_dct, mlx5_arm_dct);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_query_device, mlx5_query_device_ex);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_create_qp, mlx5_exp_create_qp);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_modify_qp, mlx5_modify_qp_ex);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_get_legacy_xrc, mlx5_get_legacy_xrc);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_set_legacy_xrc, mlx5_set_legacy_xrc);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_modify_cq, mlx5_modify_cq);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, exp_create_cq, mlx5_create_cq_ex);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_ibv_poll_cq, mlx5_poll_cq_ex);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_post_task, mlx5_post_task);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_reg_mr, mlx5_exp_reg_mr);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_post_send, mlx5_exp_post_send);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_alloc_mkey_list_memory, mlx5_alloc_mkey_mem);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_dealloc_mkey_list_memory, mlx5_free_mkey_mem);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_query_mkey, mlx5_query_mkey);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_create_mr, mlx5_create_mr);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_prefetch_mr,
+ mlx5_prefetch_mr);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_dereg_mr, mlx5_exp_dereg_mr);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_poll_dc_info, mlx5_poll_dc_info);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, exp_create_wq, mlx5_exp_create_wq);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, exp_modify_wq, mlx5_exp_modify_wq);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, exp_destroy_wq, mlx5_exp_destroy_wq);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_ibv_create_flow, ibv_exp_cmd_create_flow);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_ibv_destroy_flow, ibv_exp_cmd_destroy_flow);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, exp_create_rwq_ind_table, mlx5_exp_create_rwq_ind_table);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, exp_destroy_rwq_ind_table, mlx5_exp_destroy_rwq_ind_table);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, exp_create_res_domain, mlx5_exp_create_res_domain);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, exp_destroy_res_domain, mlx5_exp_destroy_res_domain);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, exp_query_intf, mlx5_exp_query_intf);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, exp_release_intf, mlx5_exp_release_intf);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_query_port, mlx5_exp_query_port);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_ibv_create_ah, mlx5_exp_create_ah);
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_query_values, mlx5_exp_query_values);
+ if (mctx->cqe_version == 1)
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_ibv_poll_cq,
+ mlx5_poll_cq_ex_1);
+ else
+ verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_ibv_poll_cq,
+ mlx5_poll_cq_ex);
+}
+
+void *mlx5_uar_mmap(int idx, int cmd, int page_size, int cmd_fd)
+{
+ off_t offset;
+
+ offset = 0;
+ set_command(cmd, &offset);
+ set_index(idx, &offset);
+
+ return mmap(NULL, page_size, PROT_WRITE, MAP_SHARED, cmd_fd, page_size * offset);
+}
+
+void read_init_vars(struct mlx5_context *ctx)
+{
+ pthread_mutex_lock(&ctx->env_mtx);
+ if (!ctx->env_initialized) {
+ mlx5_single_threaded = single_threaded_app(&ctx->ibv_ctx);
+ mlx5_use_mutex = get_use_mutex(&ctx->ibv_ctx);
+ open_debug_file(ctx);
+ set_debug_mask(&ctx->ibv_ctx);
+ set_freeze_on_error(&ctx->ibv_ctx);
+ ctx->prefer_bf = get_always_bf(&ctx->ibv_ctx);
+ ctx->shut_up_bf = get_shut_up_bf(&ctx->ibv_ctx);
+ mlx5_read_env(ctx);
+ ctx->env_initialized = 1;
+ }
+ pthread_mutex_unlock(&ctx->env_mtx);
+}
+
+static int mlx5_map_internal_clock(struct mlx5_device *mdev,
+ struct ibv_context *ibv_ctx)
+{
+ struct mlx5_context *context = to_mctx(ibv_ctx);
+ void *hca_clock_page;
+ off_t offset = 0;
+
+ set_command(MLX5_EXP_MMAP_GET_CORE_CLOCK_CMD, &offset);
+ hca_clock_page = mmap(NULL, mdev->page_size,
+ PROT_READ, MAP_SHARED, ibv_ctx->cmd_fd,
+ offset * mdev->page_size);
+
+ if (hca_clock_page == MAP_FAILED) {
+ fprintf(stderr, PFX
+ "Warning: Timestamp available,\n"
+ "but failed to mmap() hca core clock page.\n");
+ return -1;
+ }
+
+ context->hca_core_clock = hca_clock_page + context->core_clock.offset;
+
+ return 0;
+}
+
+enum mlx5_cap_flags {
+ MLX5_CAP_COMPACT_AV = 1 << 0,
+};
+
+static int mlx5_alloc_context(struct verbs_device *vdev,
+ struct ibv_context *ctx, int cmd_fd)
+{
+ struct mlx5_context *context;
+ struct mlx5_alloc_ucontext req;
+ struct mlx5_exp_alloc_ucontext_resp resp;
+ struct ibv_device *ibdev = &vdev->device;
+ struct verbs_context *verbs_ctx = verbs_get_ctx(ctx);
+ struct ibv_exp_device_attr attr;
+ int i;
+ int page_size = to_mdev(ibdev)->page_size;
+ int tot_uuars;
+ int low_lat_uuars;
+ int gross_uuars;
+ int j;
+ int uar_mapped;
+ off_t offset;
+ int err;
+
+ context = to_mctx(ctx);
+ if (pthread_mutex_init(&context->env_mtx, NULL))
+ return -1;
+
+ context->ibv_ctx.cmd_fd = cmd_fd;
+
+ memset(&resp, 0, sizeof(resp));
+ if (gethostname(context->hostname, sizeof(context->hostname)))
+ strcpy(context->hostname, "host_unknown");
+
+ tot_uuars = get_total_uuars();
+ gross_uuars = tot_uuars / MLX5_NUM_UUARS_PER_PAGE * 4;
+ context->bfs = calloc(gross_uuars, sizeof *context->bfs);
+ if (!context->bfs) {
+ errno = ENOMEM;
+ goto err_free;
+ }
+
+ low_lat_uuars = get_num_low_lat_uuars();
+ if (low_lat_uuars > tot_uuars - 1) {
+ errno = ENOMEM;
+ goto err_free_bf;
+ }
+
+ memset(&req, 0, sizeof(req));
+ req.total_num_uuars = tot_uuars;
+ req.num_low_latency_uuars = low_lat_uuars;
+ if (ibv_cmd_get_context(&context->ibv_ctx, &req.ibv_req, sizeof req,
+ &resp.ibv_resp, sizeof resp))
+ goto err_free_bf;
+
+ context->max_num_qps = resp.qp_tab_size;
+ context->bf_reg_size = resp.bf_reg_size;
+ context->tot_uuars = resp.tot_uuars;
+ context->low_lat_uuars = low_lat_uuars;
+ context->cache_line_size = resp.cache_line_size;
+ context->max_sq_desc_sz = resp.max_sq_desc_sz;
+ context->max_rq_desc_sz = resp.max_rq_desc_sz;
+ context->max_send_wqebb = resp.max_send_wqebb;
+ context->num_ports = resp.num_ports;
+ context->max_recv_wr = resp.max_recv_wr;
+ context->max_srq_recv_wr = resp.max_srq_recv_wr;
+ context->max_desc_sz_sq_dc = resp.max_desc_sz_sq_dc;
+ context->atomic_sizes_dc = resp.atomic_sizes_dc;
+ context->compact_av = resp.flags & MLX5_CAP_COMPACT_AV;
+
+ if (resp.exp_data.comp_mask & MLX5_EXP_ALLOC_CTX_RESP_MASK_CQE_COMP_MAX_NUM)
+ context->cqe_comp_max_num = resp.exp_data.cqe_comp_max_num;
+
+ if (resp.exp_data.comp_mask & MLX5_EXP_ALLOC_CTX_RESP_MASK_CQE_VERSION)
+ context->cqe_version = resp.exp_data.cqe_version;
+
+ if (resp.exp_data.comp_mask & MLX5_EXP_ALLOC_CTX_RESP_MASK_RROCE_UDP_SPORT_MIN)
+ context->rroce_udp_sport_min = resp.exp_data.rroce_udp_sport_min;
+
+ if (resp.exp_data.comp_mask & MLX5_EXP_ALLOC_CTX_RESP_MASK_RROCE_UDP_SPORT_MAX)
+ context->rroce_udp_sport_max = resp.exp_data.rroce_udp_sport_max;
+
+ ctx->ops = mlx5_ctx_ops;
+ if (context->cqe_version) {
+ if (context->cqe_version == 1) {
+ ctx->ops.poll_cq = mlx5_poll_cq_1;
+ } else {
+			printf("Unsupported cqe_version = %d, staying on cqe version 0\n",
+ context->cqe_version);
+ context->cqe_version = 0;
+ }
+ }
+
+ attr.comp_mask = IBV_EXP_DEVICE_ATTR_RESERVED - 1;
+ err = mlx5_query_device_ex(ctx, &attr);
+ if (!err && (attr.comp_mask & IBV_EXP_DEVICE_ATTR_MAX_CTX_RES_DOMAIN)) {
+ context->max_ctx_res_domain = attr.max_ctx_res_domain;
+ mlx5_spinlock_init(&context->send_db_lock, !mlx5_single_threaded);
+ INIT_LIST_HEAD(&context->send_wc_db_list);
+
+ }
+
+ if (resp.exp_data.comp_mask & MLX5_EXP_ALLOC_CTX_RESP_MASK_HCA_CORE_CLOCK_OFFSET) {
+ context->core_clock.offset =
+ resp.exp_data.hca_core_clock_offset &
+ (to_mdev(ibdev)->page_size - 1);
+ mlx5_map_internal_clock(to_mdev(ibdev), ctx);
+ if (attr.hca_core_clock)
+ context->core_clock.mult = ((1ull * 1000) << 21) /
+ attr.hca_core_clock;
+ else
+ context->core_clock.mult = 0;
+
+ /* ConnectX-4 supports 64bit timestamp. We choose these numbers
+ * in order to make sure that after arithmetic operations,
+ * we don't overflow a 64bit variable.
+ */
+ context->core_clock.shift = 21;
+ context->core_clock.mask = (1ULL << 49) - 1;
+ }
+
+ pthread_mutex_init(&context->rsc_table_mutex, NULL);
+ pthread_mutex_init(&context->srq_table_mutex, NULL);
+ for (i = 0; i < MLX5_QP_TABLE_SIZE; ++i)
+ context->rsc_table[i].refcnt = 0;
+
+ for (i = 0; i < MLX5_QP_TABLE_SIZE; ++i)
+ context->uidx_table[i].refcnt = 0;
+
+ context->db_list = NULL;
+
+ pthread_mutex_init(&context->db_list_mutex, NULL);
+
+ context->prefer_bf = get_always_bf(&context->ibv_ctx);
+ context->shut_up_bf = get_shut_up_bf(&context->ibv_ctx);
+ context->enable_cqe_comp = get_cqe_comp(&context->ibv_ctx);
+ mlx5_use_mutex = get_use_mutex(&context->ibv_ctx);
+
+ offset = 0;
+ set_command(MLX5_MMAP_MAP_DC_INFO_PAGE, &offset);
+ context->cc.buf = mmap(NULL, 4096 * context->num_ports, PROT_READ,
+ MAP_PRIVATE, cmd_fd, page_size * offset);
+ if (context->cc.buf == MAP_FAILED)
+ context->cc.buf = NULL;
+
+ mlx5_single_threaded = single_threaded_app(&context->ibv_ctx);
+ for (i = 0; i < resp.tot_uuars / MLX5_NUM_UUARS_PER_PAGE; ++i) {
+ uar_mapped = 0;
+
+ /* Don't map UAR to WC if BF is not used */
+ if (!context->shut_up_bf) {
+ context->uar[i].regs = mlx5_uar_mmap(i, MLX5_MMAP_GET_WC_PAGES_CMD, page_size, cmd_fd);
+ if (context->uar[i].regs != MAP_FAILED) {
+ context->uar[i].map_type = MLX5_UAR_MAP_WC;
+ uar_mapped = 1;
+ }
+ }
+
+ if (!uar_mapped) {
+ context->uar[i].regs = mlx5_uar_mmap(i, MLX5_MMAP_GET_NC_PAGES_CMD, page_size, cmd_fd);
+ if (context->uar[i].regs != MAP_FAILED) {
+ context->uar[i].map_type = MLX5_UAR_MAP_NC;
+ uar_mapped = 1;
+ }
+ }
+
+ if (!uar_mapped) {
+ /* for backward compatibility with old kernel driver */
+ context->uar[i].regs = mlx5_uar_mmap(i, MLX5_MMAP_GET_REGULAR_PAGES_CMD, page_size, cmd_fd);
+ if (context->uar[i].regs != MAP_FAILED) {
+ context->uar[i].map_type = MLX5_UAR_MAP_WC;
+ uar_mapped = 1;
+ }
+ }
+
+ if (!uar_mapped) {
+ context->uar[i].regs = NULL;
+ goto err_free_cc;
+ }
+ }
+
+ for (j = 0; j < gross_uuars; ++j) {
+ context->bfs[j].reg = context->uar[j / 4].regs +
+ MLX5_BF_OFFSET + (j % 4) * context->bf_reg_size;
+ context->bfs[j].need_lock = need_uuar_lock(context, j) &&
+ context->uar[j / 4].map_type == MLX5_UAR_MAP_WC;
+ mlx5_lock_init(&context->bfs[j].lock,
+ !mlx5_single_threaded,
+ mlx5_get_locktype());
+ context->bfs[j].offset = 0;
+ if (context->uar[j / 4].map_type == MLX5_UAR_MAP_WC) {
+ context->bfs[j].buf_size = context->bf_reg_size / 2;
+ context->bfs[j].db_method = (context->bfs[j].need_lock && !mlx5_single_threaded) ?
+ MLX5_DB_METHOD_BF :
+ (mlx5_single_threaded && wc_auto_evict_size() == 64 ?
+ MLX5_DB_METHOD_DEDIC_BF_1_THREAD :
+ MLX5_DB_METHOD_DEDIC_BF);
+
+ } else {
+ context->bfs[j].db_method = MLX5_DB_METHOD_DB;
+ }
+
+ context->bfs[j].uuarn = j;
+ }
+
+ mlx5_lock_init(&context->lock32,
+ !mlx5_single_threaded,
+ mlx5_get_locktype());
+
+ mlx5_spinlock_init(&context->hugetlb_lock, !mlx5_single_threaded);
+ INIT_LIST_HEAD(&context->hugetlb_list);
+
+ pthread_mutex_init(&context->task_mutex, NULL);
+
+ set_extended(verbs_ctx);
+ set_experimental(ctx);
+
+ for (i = 0; i < MLX5_MAX_PORTS_NUM; ++i)
+ context->port_query_cache[i].valid = 0;
+
+ return 0;
+
+err_free_cc:
+ if (context->cc.buf)
+ munmap(context->cc.buf, 4096 * context->num_ports);
+
+ if (context->hca_core_clock)
+ munmap(context->hca_core_clock - context->core_clock.offset,
+ to_mdev(ibdev)->page_size);
+
+err_free_bf:
+ free(context->bfs);
+
+err_free:
+ for (i = 0; i < MLX5_MAX_UAR_PAGES; ++i) {
+ if (context->uar[i].regs)
+ munmap(context->uar[i].regs, page_size);
+ }
+ close_debug_file(context);
+
+ return errno;
+}
+
+static void mlx5_free_context(struct verbs_device *device,
+ struct ibv_context *ibctx)
+{
+ struct mlx5_context *context = to_mctx(ibctx);
+ int page_size = to_mdev(ibctx->device)->page_size;
+ int i;
+
+ if (context->hca_core_clock)
+ munmap(context->hca_core_clock - context->core_clock.offset,
+ to_mdev(&device->device)->page_size);
+
+ if (context->cc.buf)
+ munmap(context->cc.buf, 4096 * context->num_ports);
+
+ free(context->bfs);
+ for (i = 0; i < MLX5_MAX_UAR_PAGES; ++i) {
+ if (context->uar[i].regs)
+ munmap(context->uar[i].regs, page_size);
+ }
+ close_debug_file(context);
+}
+
+static struct verbs_device *mlx5_driver_init(const char *uverbs_sys_path,
+ int abi_version)
+{
+ char value[8];
+ struct mlx5_device *dev;
+ unsigned vendor, device;
+ int i;
+
+ if (ibv_read_sysfs_file(uverbs_sys_path, "device/vendor",
+ value, sizeof value) < 0)
+ return NULL;
+ sscanf(value, "%i", &vendor);
+
+ if (ibv_read_sysfs_file(uverbs_sys_path, "device/device",
+ value, sizeof value) < 0)
+ return NULL;
+ sscanf(value, "%i", &device);
+
+ for (i = 0; i < sizeof hca_table / sizeof hca_table[0]; ++i)
+ if (vendor == hca_table[i].vendor &&
+ device == hca_table[i].device)
+ goto found;
+
+ return NULL;
+
+found:
+ if (abi_version < MLX5_UVERBS_MIN_ABI_VERSION ||
+ abi_version > MLX5_UVERBS_MAX_ABI_VERSION) {
+ fprintf(stderr, PFX "Fatal: ABI version %d of %s is not supported "
+ "(min supported %d, max supported %d)\n",
+ abi_version, uverbs_sys_path,
+ MLX5_UVERBS_MIN_ABI_VERSION,
+ MLX5_UVERBS_MAX_ABI_VERSION);
+ return NULL;
+ }
+
+ dev = malloc(sizeof *dev);
+ if (!dev) {
+ fprintf(stderr, PFX "Fatal: couldn't allocate device for %s\n",
+ uverbs_sys_path);
+ return NULL;
+ }
+
+ dev->page_size = sysconf(_SC_PAGESIZE);
+
+ dev->devid.id = device;
+ dev->driver_abi_ver = abi_version;
+
+ dev->verbs_dev.sz = sizeof(dev->verbs_dev);
+ dev->verbs_dev.size_of_context =
+ sizeof(struct mlx5_context) - sizeof(struct ibv_context);
+
+ /*
+	 * mlx5_alloc_context will initialize provider calls
+ */
+ dev->verbs_dev.init_context = mlx5_alloc_context;
+ dev->verbs_dev.uninit_context = mlx5_free_context;
+
+ return &dev->verbs_dev;
+}
+
+#ifdef HAVE_IBV_REGISTER_DRIVER
+static __attribute__((constructor)) void mlx5_register_driver(void)
+{
+ verbs_register_driver("mlx5", mlx5_driver_init);
+}
+#else
+/*
+ * Export the old libsysfs sysfs_class_device-based driver entry point
+ * if libibverbs does not export an ibv_register_driver() function.
+ */
+struct ibv_device *openib_driver_init(struct sysfs_class_device *sysdev)
+{
+ int abi_ver = 0;
+ char value[8];
+
+ if (ibv_read_sysfs_file(sysdev->path, "abi_version",
+ value, sizeof value) > 0)
+ abi_ver = strtol(value, NULL, 10);
+
+ return mlx5_driver_init(sysdev->path, abi_ver);
+}
+#endif /* HAVE_IBV_REGISTER_DRIVER */
Index: contrib/ofed/libmlx5/src/mlx5.map
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/mlx5.map
@@ -0,0 +1,5 @@
+{
+ global:
+ openib_driver_init;
+ local: *;
+};
Index: contrib/ofed/libmlx5/src/qp.c
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/qp.c
@@ -0,0 +1,2998 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+
+#if HAVE_CONFIG_H
+# include <config.h>
+#endif /* HAVE_CONFIG_H */
+
+#include <stdlib.h>
+#include <netinet/in.h>
+#include <pthread.h>
+#include <string.h>
+#include <errno.h>
+#include <stdio.h>
+
+#include "mlx5.h"
+#include "doorbell.h"
+#include "wqe.h"
+
+enum {
+ MLX5_OPCODE_BASIC = 0x00010000,
+ MLX5_OPCODE_MANAGED = 0x00020000,
+
+ MLX5_OPCODE_WITH_IMM = 0x01000000,
+ MLX5_OPCODE_EXT_ATOMICS = 0x08,
+};
+
+#define MLX5_IB_OPCODE(op, class, attr) (((class) & 0x00FF0000) | ((attr) & 0xFF000000) | ((op) & 0x0000FFFF))
+#define MLX5_IB_OPCODE_GET_CLASS(opcode) ((opcode) & 0x00FF0000)
+#define MLX5_IB_OPCODE_GET_OP(opcode) ((opcode) & 0x0000FFFF)
+#define MLX5_IB_OPCODE_GET_ATTR(opcode) ((opcode) & 0xFF000000)
+
+
+static const uint32_t mlx5_ib_opcode[] = {
+ [IBV_EXP_WR_SEND] = MLX5_IB_OPCODE(MLX5_OPCODE_SEND, MLX5_OPCODE_BASIC, 0),
+ [IBV_EXP_WR_SEND_WITH_IMM] = MLX5_IB_OPCODE(MLX5_OPCODE_SEND_IMM, MLX5_OPCODE_BASIC, MLX5_OPCODE_WITH_IMM),
+ [IBV_EXP_WR_RDMA_WRITE] = MLX5_IB_OPCODE(MLX5_OPCODE_RDMA_WRITE, MLX5_OPCODE_BASIC, 0),
+ [IBV_EXP_WR_RDMA_WRITE_WITH_IMM] = MLX5_IB_OPCODE(MLX5_OPCODE_RDMA_WRITE_IMM, MLX5_OPCODE_BASIC, MLX5_OPCODE_WITH_IMM),
+ [IBV_EXP_WR_RDMA_READ] = MLX5_IB_OPCODE(MLX5_OPCODE_RDMA_READ, MLX5_OPCODE_BASIC, 0),
+ [IBV_EXP_WR_ATOMIC_CMP_AND_SWP] = MLX5_IB_OPCODE(MLX5_OPCODE_ATOMIC_CS, MLX5_OPCODE_BASIC, 0),
+ [IBV_EXP_WR_ATOMIC_FETCH_AND_ADD] = MLX5_IB_OPCODE(MLX5_OPCODE_ATOMIC_FA, MLX5_OPCODE_BASIC, 0),
+ [IBV_EXP_WR_EXT_MASKED_ATOMIC_CMP_AND_SWP] = MLX5_IB_OPCODE(MLX5_OPCODE_ATOMIC_MASKED_CS, MLX5_OPCODE_BASIC, 0),
+ [IBV_EXP_WR_EXT_MASKED_ATOMIC_FETCH_AND_ADD] = MLX5_IB_OPCODE(MLX5_OPCODE_ATOMIC_MASKED_FA, MLX5_OPCODE_BASIC, 0),
+ [IBV_EXP_WR_SEND_ENABLE] = MLX5_IB_OPCODE(MLX5_OPCODE_SEND_ENABLE, MLX5_OPCODE_MANAGED, 0),
+ [IBV_EXP_WR_RECV_ENABLE] = MLX5_IB_OPCODE(MLX5_OPCODE_RECV_ENABLE, MLX5_OPCODE_MANAGED, 0),
+ [IBV_EXP_WR_CQE_WAIT] = MLX5_IB_OPCODE(MLX5_OPCODE_CQE_WAIT, MLX5_OPCODE_MANAGED, 0),
+ [IBV_EXP_WR_NOP] = MLX5_IB_OPCODE(MLX5_OPCODE_NOP, MLX5_OPCODE_BASIC, 0),
+ [IBV_EXP_WR_UMR_FILL] = MLX5_IB_OPCODE(MLX5_OPCODE_UMR, MLX5_OPCODE_BASIC, 0),
+ [IBV_EXP_WR_UMR_INVALIDATE] = MLX5_IB_OPCODE(MLX5_OPCODE_UMR, MLX5_OPCODE_BASIC, 0),
+};
+
+enum {
+ MLX5_CALC_UINT64_ADD = 0x01,
+ MLX5_CALC_FLOAT64_ADD = 0x02,
+ MLX5_CALC_UINT64_MAXLOC = 0x03,
+ MLX5_CALC_UINT64_AND = 0x04,
+ MLX5_CALC_UINT64_OR = 0x05,
+ MLX5_CALC_UINT64_XOR = 0x06
+};
+
+static const struct mlx5_calc_op {
+ int valid;
+ uint8_t opmod;
+} mlx5_calc_ops_table
+ [IBV_EXP_CALC_DATA_SIZE_NUMBER]
+ [IBV_EXP_CALC_OP_NUMBER]
+ [IBV_EXP_CALC_DATA_TYPE_NUMBER] = {
+ [IBV_EXP_CALC_DATA_SIZE_64_BIT] = {
+ [IBV_EXP_CALC_OP_ADD] = {
+ [IBV_EXP_CALC_DATA_TYPE_INT] = {
+ .valid = 1,
+ .opmod = MLX5_CALC_UINT64_ADD },
+ [IBV_EXP_CALC_DATA_TYPE_UINT] = {
+ .valid = 1,
+ .opmod = MLX5_CALC_UINT64_ADD },
+ [IBV_EXP_CALC_DATA_TYPE_FLOAT] = {
+ .valid = 1,
+ .opmod = MLX5_CALC_FLOAT64_ADD }
+ },
+ [IBV_EXP_CALC_OP_BXOR] = {
+ [IBV_EXP_CALC_DATA_TYPE_INT] = {
+ .valid = 1,
+ .opmod = MLX5_CALC_UINT64_XOR },
+ [IBV_EXP_CALC_DATA_TYPE_UINT] = {
+ .valid = 1,
+ .opmod = MLX5_CALC_UINT64_XOR },
+ [IBV_EXP_CALC_DATA_TYPE_FLOAT] = {
+ .valid = 1,
+ .opmod = MLX5_CALC_UINT64_XOR }
+ },
+ [IBV_EXP_CALC_OP_BAND] = {
+ [IBV_EXP_CALC_DATA_TYPE_INT] = {
+ .valid = 1,
+ .opmod = MLX5_CALC_UINT64_AND },
+ [IBV_EXP_CALC_DATA_TYPE_UINT] = {
+ .valid = 1,
+ .opmod = MLX5_CALC_UINT64_AND },
+ [IBV_EXP_CALC_DATA_TYPE_FLOAT] = {
+ .valid = 1,
+ .opmod = MLX5_CALC_UINT64_AND }
+ },
+ [IBV_EXP_CALC_OP_BOR] = {
+ [IBV_EXP_CALC_DATA_TYPE_INT] = {
+ .valid = 1,
+ .opmod = MLX5_CALC_UINT64_OR },
+ [IBV_EXP_CALC_DATA_TYPE_UINT] = {
+ .valid = 1,
+ .opmod = MLX5_CALC_UINT64_OR },
+ [IBV_EXP_CALC_DATA_TYPE_FLOAT] = {
+ .valid = 1,
+ .opmod = MLX5_CALC_UINT64_OR }
+ },
+ [IBV_EXP_CALC_OP_MAXLOC] = {
+ [IBV_EXP_CALC_DATA_TYPE_UINT] = {
+ .valid = 1,
+ .opmod = MLX5_CALC_UINT64_MAXLOC }
+ }
+ }
+};
+
+static inline void set_wait_en_seg(void *wqe_seg, uint32_t obj_num, uint32_t count)
+{
+ struct mlx5_wqe_wait_en_seg *seg = (struct mlx5_wqe_wait_en_seg *)wqe_seg;
+
+ seg->pi = htonl(count);
+ seg->obj_num = htonl(obj_num);
+
+ return;
+}
+
+static inline void *get_recv_wqe(struct mlx5_wq *rq, int n)
+{
+ return rq->buff + (n << rq->wqe_shift);
+}
+
+static int copy_to_scat(struct mlx5_wqe_data_seg *scat, void *buf, int *size,
+ int max)
+{
+ int copy;
+ int i;
+
+ if (unlikely(!(*size)))
+ return IBV_WC_SUCCESS;
+
+ for (i = 0; i < max; ++i) {
+ copy = min(*size, ntohl(scat->byte_count));
+ memcpy((void *)(unsigned long)ntohll(scat->addr), buf, copy);
+ *size -= copy;
+ if (*size == 0)
+ return IBV_WC_SUCCESS;
+
+ buf += copy;
+ ++scat;
+ }
+ return IBV_WC_LOC_LEN_ERR;
+}
+
+int mlx5_copy_to_recv_wqe(struct mlx5_qp *qp, int idx, void *buf, int size)
+{
+ struct mlx5_wqe_data_seg *scat;
+ int max = 1 << (qp->rq.wqe_shift - 4);
+
+ scat = get_recv_wqe(&qp->rq, idx);
+ if (unlikely(qp->ctrl_seg.wq_sig))
+ ++scat;
+
+ return copy_to_scat(scat, buf, &size, max);
+}
+
+static void *mlx5_get_send_wqe(struct mlx5_qp *qp, int n)
+{
+ return qp->gen_data.sqstart + (n << MLX5_SEND_WQE_SHIFT);
+}
+
+int mlx5_copy_to_send_wqe(struct mlx5_qp *qp, int idx, void *buf, int size)
+{
+ struct mlx5_wqe_ctrl_seg *ctrl;
+ struct mlx5_wqe_data_seg *scat;
+ void *p;
+ int max;
+
+ idx &= (qp->sq.wqe_cnt - 1);
+ ctrl = mlx5_get_send_wqe(qp, idx);
+ if (qp->verbs_qp.qp.qp_type != IBV_QPT_RC) {
+ fprintf(stderr, "scatter to CQE is supported only for RC QPs\n");
+ return IBV_WC_GENERAL_ERR;
+ }
+ p = ctrl + 1;
+
+ switch (ntohl(ctrl->opmod_idx_opcode) & 0xff) {
+ case MLX5_OPCODE_RDMA_READ:
+ p = p + sizeof(struct mlx5_wqe_raddr_seg);
+ break;
+
+ case MLX5_OPCODE_ATOMIC_CS:
+ case MLX5_OPCODE_ATOMIC_FA:
+ p = p + sizeof(struct mlx5_wqe_raddr_seg) +
+ sizeof(struct mlx5_wqe_atomic_seg);
+ break;
+
+ default:
+ fprintf(stderr, "scatter to CQE for opcode %d\n",
+ ntohl(ctrl->opmod_idx_opcode) & 0xff);
+ return IBV_WC_REM_INV_REQ_ERR;
+ }
+
+ scat = p;
+ max = (ntohl(ctrl->qpn_ds) & 0x3F) - (((void *)scat - (void *)ctrl) >> 4);
+ if (unlikely((void *)(scat + max) > qp->gen_data.sqend)) {
+ int tmp = ((void *)qp->gen_data.sqend - (void *)scat) >> 4;
+ int orig_size = size;
+
+ if (copy_to_scat(scat, buf, &size, tmp) == IBV_WC_SUCCESS)
+ return IBV_WC_SUCCESS;
+ max = max - tmp;
+ buf += orig_size - size;
+ scat = mlx5_get_send_wqe(qp, 0);
+ }
+
+ return copy_to_scat(scat, buf, &size, max);
+}
+
+void mlx5_init_qp_indices(struct mlx5_qp *qp)
+{
+ qp->sq.head = 0;
+ qp->sq.tail = 0;
+ qp->rq.head = 0;
+ qp->rq.tail = 0;
+ qp->gen_data.scur_post = 0;
+ qp->sq_enable.head_en_index = 0;
+ qp->sq_enable.head_en_count = 0;
+ qp->rq_enable.head_en_index = 0;
+ qp->rq_enable.head_en_count = 0;
+}
+
+void mlx5_init_rwq_indices(struct mlx5_rwq *rwq)
+{
+ rwq->rq.head = 0;
+ rwq->rq.tail = 0;
+ rwq->rq_enable.head_en_index = 0;
+ rwq->rq_enable.head_en_count = 0;
+}
+
+static int __mlx5_wq_overflow(struct mlx5_wq *wq, int nreq, struct mlx5_qp *qp) __attribute__((noinline));
+static int __mlx5_wq_overflow(struct mlx5_wq *wq, int nreq, struct mlx5_qp *qp)
+{
+ struct mlx5_cq *cq = to_mcq(qp->verbs_qp.qp.send_cq);
+ unsigned cur;
+
+
+ mlx5_lock(&cq->lock);
+ cur = wq->head - wq->tail;
+ mlx5_unlock(&cq->lock);
+
+ return cur + nreq >= wq->max_post;
+}
+static inline int mlx5_wq_overflow(struct mlx5_wq *wq, int nreq, struct mlx5_qp *qp) __attribute__((always_inline));
+static inline int mlx5_wq_overflow(struct mlx5_wq *wq, int nreq, struct mlx5_qp *qp)
+{
+ unsigned cur;
+
+ cur = wq->head - wq->tail;
+ if (likely(cur + nreq < wq->max_post))
+ return 0;
+
+ return __mlx5_wq_overflow(wq, nreq, qp);
+}
+
+static inline void set_raddr_seg(struct mlx5_wqe_raddr_seg *rseg,
+ uint64_t remote_addr, uint32_t rkey)
+{
+ rseg->raddr = htonll(remote_addr);
+ rseg->rkey = htonl(rkey);
+ rseg->reserved = 0;
+}
+
+static void set_atomic_seg(struct mlx5_wqe_atomic_seg *aseg,
+ enum ibv_wr_opcode opcode,
+ uint64_t swap,
+ uint64_t compare_add)
+{
+ if (opcode == IBV_WR_ATOMIC_CMP_AND_SWP) {
+ aseg->swap_add = htonll(swap);
+ aseg->compare = htonll(compare_add);
+ } else {
+ aseg->swap_add = htonll(compare_add);
+ aseg->compare = 0;
+ }
+}
+
+static int has_grh(struct mlx5_ah *ah)
+{
+ return ah->av.base.dqp_dct & ntohl(MLX5_EXTENDED_UD_AV);
+}
+
+static int set_datagram_seg(struct mlx5_wqe_datagram_seg *dseg,
+ struct ibv_exp_send_wr *wr)
+{
+ struct mlx5_ah *ah = to_mah(wr->wr.ud.ah);
+ int size;
+
+ size = has_grh(ah) ? sizeof(ah->av) : sizeof(ah->av.base);
+
+ memcpy(&dseg->av, &to_mah(wr->wr.ud.ah)->av, size);
+ dseg->av.base.dqp_dct |= htonl(wr->wr.ud.remote_qpn);
+ dseg->av.base.key.qkey.qkey = htonl(wr->wr.ud.remote_qkey);
+
+ return size;
+}
+
+static int set_dci_seg(struct mlx5_wqe_datagram_seg *dseg,
+ struct ibv_exp_send_wr *wr)
+{
+ struct mlx5_ah *ah = to_mah(wr->dc.ah);
+ int size;
+
+ size = has_grh(ah) ? sizeof(ah->av) : sizeof(ah->av.base);
+
+ memcpy(&dseg->av, &to_mah(wr->dc.ah)->av, size);
+ dseg->av.base.dqp_dct |= htonl(wr->dc.dct_number);
+ dseg->av.base.key.dc_key = htonll(wr->dc.dct_access_key);
+
+ return size;
+}
+
+static int set_odp_data_ptr_seg(struct mlx5_wqe_data_seg *dseg, struct ibv_sge *sg,
+ struct mlx5_qp *qp) __attribute__((noinline));
+static int set_odp_data_ptr_seg(struct mlx5_wqe_data_seg *dseg, struct ibv_sge *sg,
+ struct mlx5_qp *qp)
+{
+ uint32_t lkey;
+ if (sg->lkey == ODP_GLOBAL_R_LKEY) {
+ if (mlx5_get_real_lkey_from_implicit_lkey(qp->odp_data.pd, &qp->odp_data.pd->r_ilkey,
+ sg->addr, sg->length,
+ &lkey))
+ return ENOMEM;
+ } else {
+ if (mlx5_get_real_lkey_from_implicit_lkey(qp->odp_data.pd, &qp->odp_data.pd->w_ilkey,
+ sg->addr, sg->length,
+ &lkey))
+ return ENOMEM;
+ }
+
+ dseg->byte_count = htonl(sg->length);
+ dseg->lkey = htonl(lkey);
+ dseg->addr = htonll(sg->addr);
+
+ return 0;
+}
+
+static inline int set_data_ptr_seg(struct mlx5_wqe_data_seg *dseg, struct ibv_sge *sg,
+ struct mlx5_qp *qp,
+ int offset) __attribute__((always_inline));
+static inline int set_data_ptr_seg(struct mlx5_wqe_data_seg *dseg, struct ibv_sge *sg,
+ struct mlx5_qp *qp,
+ int offset)
+{
+ if (unlikely(sg->lkey == ODP_GLOBAL_R_LKEY || sg->lkey == ODP_GLOBAL_W_LKEY))
+ return set_odp_data_ptr_seg(dseg, sg, qp);
+
+ dseg->byte_count = htonl(sg->length - offset);
+ dseg->lkey = htonl(sg->lkey);
+ dseg->addr = htonll(sg->addr + offset);
+
+ return 0;
+}
+
+/*
+ * Avoid using memcpy() to copy to BlueFlame page, since memcpy()
+ * implementations may use move-string-buffer assembler instructions,
+ * which do not guarantee order of copying.
+ */
+#if defined(__x86_64__)
+#define COPY_64B_NT(dst, src) \
+ __asm__ __volatile__ ( \
+ " movdqa (%1),%%xmm0\n" \
+ " movdqa 16(%1),%%xmm1\n" \
+ " movdqa 32(%1),%%xmm2\n" \
+ " movdqa 48(%1),%%xmm3\n" \
+ " movntdq %%xmm0, (%0)\n" \
+ " movntdq %%xmm1, 16(%0)\n" \
+ " movntdq %%xmm2, 32(%0)\n" \
+ " movntdq %%xmm3, 48(%0)\n" \
+ : : "r" (dst), "r" (src) : "memory"); \
+ dst += 8; \
+ src += 8
+#else
+#define COPY_64B_NT(dst, src) \
+ *dst++ = *src++; \
+ *dst++ = *src++; \
+ *dst++ = *src++; \
+ *dst++ = *src++; \
+ *dst++ = *src++; \
+ *dst++ = *src++; \
+ *dst++ = *src++; \
+ *dst++ = *src++
+
+#endif
+static void mlx5_bf_copy(unsigned long long *dst, unsigned long long *src,
+ unsigned bytecnt, struct mlx5_qp *qp)
+{
+ while (bytecnt > 0) {
+ COPY_64B_NT(dst, src);
+ bytecnt -= 8 * sizeof(unsigned long long);
+ if (unlikely(src == qp->gen_data.sqend))
+ src = qp->gen_data.sqstart;
+ }
+}
+
+static inline void mlx5_write_db(unsigned long long *dst, unsigned long long *src)
+{
+ *dst = *src;
+}
+
+static uint32_t send_ieth(struct ibv_exp_send_wr *wr)
+{
+ return MLX5_IB_OPCODE_GET_ATTR(mlx5_ib_opcode[wr->exp_opcode]) &
+ MLX5_OPCODE_WITH_IMM ?
+ wr->ex.imm_data : 0;
+}
+
+static inline int set_data_inl_seg(struct mlx5_qp *qp, int num_sge, struct ibv_sge *sg_list,
+ void *wqe, int *sz,
+ int idx, int offset) __attribute__((always_inline));
+static inline int set_data_inl_seg(struct mlx5_qp *qp, int num_sge, struct ibv_sge *sg_list,
+ void *wqe, int *sz, int idx, int offset)
+{
+ struct mlx5_wqe_inline_seg *seg;
+ void *addr;
+ int len;
+ int i;
+ int inl = 0;
+ void *qend = qp->gen_data.sqend;
+ int copy;
+#ifdef MLX5_DEBUG
+ FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp;
+#endif
+
+ seg = wqe;
+ wqe += sizeof *seg;
+
+ for (i = idx; i < num_sge; ++i) {
+ addr = (void *) (unsigned long)(sg_list[i].addr + offset);
+ len = sg_list[i].length - offset;
+ inl += len;
+ offset = 0;
+
+ if (unlikely(inl > qp->data_seg.max_inline_data)) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "inline layout failed, err %d\n", ENOMEM);
+ return ENOMEM;
+ }
+
+ if (unlikely(wqe + len > qend)) {
+ copy = qend - wqe;
+ memcpy(wqe, addr, copy);
+ addr += copy;
+ len -= copy;
+ wqe = mlx5_get_send_wqe(qp, 0);
+ }
+ memcpy(wqe, addr, len);
+ wqe += len;
+ }
+
+ if (likely(inl)) {
+ seg->byte_count = htonl(inl | MLX5_INLINE_SEG);
+ *sz += align(inl + sizeof(seg->byte_count), 16) / 16;
+ }
+
+ return 0;
+}
+
+static inline int set_data_non_inl_seg(struct mlx5_qp *qp, int num_sge, struct ibv_sge *sg_list,
+ void *wqe, int *sz,
+ int idx, int offset) __attribute__((always_inline));
+static inline int set_data_non_inl_seg(struct mlx5_qp *qp, int num_sge, struct ibv_sge *sg_list,
+ void *wqe, int *sz,
+ int idx, int offset)
+{
+ struct mlx5_wqe_data_seg *dpseg = wqe;
+ struct ibv_sge *psge;
+ int i;
+#ifdef MLX5_DEBUG
+ FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp;
+#endif
+
+ for (i = idx; i < num_sge; ++i) {
+ if (unlikely(dpseg == qp->gen_data.sqend))
+ dpseg = mlx5_get_send_wqe(qp, 0);
+
+ if (likely(sg_list[i].length)) {
+ psge = sg_list + i;
+
+ if (unlikely(set_data_ptr_seg(dpseg, psge, qp,
+ offset))) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "failed allocating memory for implicit lkey structure\n");
+ return ENOMEM;
+ }
+ ++dpseg;
+ offset = 0;
+ *sz += sizeof(struct mlx5_wqe_data_seg) / 16;
+ }
+ }
+
+ return 0;
+}
+
+static int set_data_atom_seg(struct mlx5_qp *qp, int num_sge, struct ibv_sge *sg_list,
+ void *wqe, int *sz, int atom_arg) __MLX5_ALGN_F__;
+static int set_data_atom_seg(struct mlx5_qp *qp, int num_sge, struct ibv_sge *sg_list,
+ void *wqe, int *sz, int atom_arg)
+{
+ struct mlx5_wqe_data_seg *dpseg = wqe;
+ struct ibv_sge *psge;
+ struct ibv_sge sge;
+ int i;
+#ifdef MLX5_DEBUG
+ FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp;
+#endif
+
+ for (i = 0; i < num_sge; ++i) {
+ if (unlikely(dpseg == qp->gen_data.sqend))
+ dpseg = mlx5_get_send_wqe(qp, 0);
+
+ if (likely(sg_list[i].length)) {
+ sge = sg_list[i];
+ sge.length = atom_arg;
+ psge = &sge;
+ if (unlikely(set_data_ptr_seg(dpseg, psge, qp, 0))) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "failed allocating memory for implicit lkey structure\n");
+ return ENOMEM;
+ }
+ ++dpseg;
+ *sz += sizeof(struct mlx5_wqe_data_seg) / 16;
+ }
+ }
+
+ return 0;
+}
+
+static inline int set_data_seg(struct mlx5_qp *qp, void *seg, int *sz, int is_inl,
+ int num_sge, struct ibv_sge *sg_list, int atom_arg,
+ int idx, int offset) __attribute__((always_inline));
+static inline int set_data_seg(struct mlx5_qp *qp, void *seg, int *sz, int is_inl,
+ int num_sge, struct ibv_sge *sg_list, int atom_arg,
+ int idx, int offset)
+{
+ if (is_inl)
+ return set_data_inl_seg(qp, num_sge, sg_list, seg, sz, idx,
+ offset);
+ if (unlikely(atom_arg))
+ return set_data_atom_seg(qp, num_sge, sg_list, seg, sz, atom_arg);
+
+ return set_data_non_inl_seg(qp, num_sge, sg_list, seg, sz, idx, offset);
+}
+
+#ifdef MLX5_DEBUG
+void dump_wqe(FILE *fp, int idx, int size_16, struct mlx5_qp *qp)
+{
+ uint32_t *uninitialized_var(p);
+ int i, j;
+ int tidx = idx;
+
+ fprintf(fp, "dump wqe at %p\n", mlx5_get_send_wqe(qp, tidx));
+ for (i = 0, j = 0; i < size_16 * 4; i += 4, j += 4) {
+ if ((i & 0xf) == 0) {
+ void *buf = mlx5_get_send_wqe(qp, tidx);
+ tidx = (tidx + 1) & (qp->sq.wqe_cnt - 1);
+ p = buf;
+ j = 0;
+ }
+ fprintf(fp, "%08x %08x %08x %08x\n", ntohl(p[j]), ntohl(p[j + 1]),
+ ntohl(p[j + 2]), ntohl(p[j + 3]));
+ }
+}
+#endif /* MLX5_DEBUG */
+
+
+void *mlx5_get_atomic_laddr(struct mlx5_qp *qp, uint16_t idx, int *byte_count)
+{
+ struct mlx5_wqe_data_seg *dpseg;
+ void *addr;
+
+ dpseg = mlx5_get_send_wqe(qp, idx) + sizeof(struct mlx5_wqe_ctrl_seg) +
+ sizeof(struct mlx5_wqe_raddr_seg) +
+ sizeof(struct mlx5_wqe_atomic_seg);
+ addr = (void *)(unsigned long)ntohll(dpseg->addr);
+
+ /*
+	 * Currently the byte count is always 8 bytes. Fix this when
+	 * we support variable-size atomics.
+ */
+ *byte_count = 8;
+ return addr;
+}
+
+static int ext_cmp_swp(struct mlx5_qp *qp, void *seg,
+ struct ibv_exp_send_wr *wr)
+{
+ struct ibv_exp_cmp_swap *cs = &wr->ext_op.masked_atomics.wr_data.inline_data.op.cmp_swap;
+ int arg_sz = 1 << wr->ext_op.masked_atomics.log_arg_sz;
+ uint32_t *p32 = seg;
+ uint64_t *p64 = seg;
+ int i;
+
+ if (arg_sz == 4) {
+ *p32 = htonl((uint32_t)cs->swap_val);
+ p32++;
+ *p32 = htonl((uint32_t)cs->compare_val);
+ p32++;
+ *p32 = htonl((uint32_t)cs->swap_mask);
+ p32++;
+ *p32 = htonl((uint32_t)cs->compare_mask);
+ return 16;
+ } else if (arg_sz == 8) {
+ *p64 = htonll(cs->swap_val);
+ p64++;
+ *p64 = htonll(cs->compare_val);
+ p64++;
+ if (unlikely(p64 == qp->gen_data.sqend))
+ p64 = mlx5_get_send_wqe(qp, 0);
+ *p64 = htonll(cs->swap_mask);
+ p64++;
+ *p64 = htonll(cs->compare_mask);
+ return 32;
+ } else {
+ for (i = 0; i < arg_sz; i += 8, p64++) {
+ if (unlikely(p64 == qp->gen_data.sqend))
+ p64 = mlx5_get_send_wqe(qp, 0);
+ *p64 = htonll(*(uint64_t *)(uintptr_t)(cs->swap_val + i));
+ }
+
+ for (i = 0; i < arg_sz; i += 8, p64++) {
+ if (unlikely(p64 == qp->gen_data.sqend))
+ p64 = mlx5_get_send_wqe(qp, 0);
+ *p64 = htonll(*(uint64_t *)(uintptr_t)(cs->compare_val + i));
+ }
+
+ for (i = 0; i < arg_sz; i += 8, p64++) {
+ if (unlikely(p64 == qp->gen_data.sqend))
+ p64 = mlx5_get_send_wqe(qp, 0);
+ *p64 = htonll(*(uint64_t *)(uintptr_t)(cs->swap_mask + i));
+ }
+
+ for (i = 0; i < arg_sz; i += 8, p64++) {
+ if (unlikely(p64 == qp->gen_data.sqend))
+ p64 = mlx5_get_send_wqe(qp, 0);
+ *p64 = htonll(*(uint64_t *)(uintptr_t)(cs->compare_mask + i));
+ }
+ return 4 * arg_sz;
+ }
+}
+
+static int ext_fetch_add(struct mlx5_qp *qp, void *seg,
+ struct ibv_exp_send_wr *wr)
+{
+ struct ibv_exp_fetch_add *fa = &wr->ext_op.masked_atomics.wr_data.inline_data.op.fetch_add;
+ int arg_sz = 1 << wr->ext_op.masked_atomics.log_arg_sz;
+ uint32_t *p32 = seg;
+ uint64_t *p64 = seg;
+ int i;
+
+ if (arg_sz == 4) {
+ *p32 = htonl((uint32_t)fa->add_val);
+ p32++;
+ *p32 = htonl((uint32_t)fa->field_boundary);
+ p32++;
+ *p32 = htonl(0);
+ p32++;
+ *p32 = htonl(0);
+ return 16;
+ } else if (arg_sz == 8) {
+ *p64 = htonll(fa->add_val);
+ p64++;
+ *p64 = htonll(fa->field_boundary);
+ return 16;
+ } else {
+ for (i = 0; i < arg_sz; i += 8, p64++) {
+ if (unlikely(p64 == qp->gen_data.sqend))
+ p64 = mlx5_get_send_wqe(qp, 0);
+ *p64 = htonll(*(uint64_t *)(uintptr_t)(fa->add_val + i));
+ }
+
+ for (i = 0; i < arg_sz; i += 8, p64++) {
+ if (unlikely(p64 == qp->gen_data.sqend))
+ p64 = mlx5_get_send_wqe(qp, 0);
+ *p64 = htonll(*(uint64_t *)(uintptr_t)(fa->field_boundary + i));
+ }
+
+ return 2 * arg_sz;
+ }
+}
+
+static int set_ext_atomic_seg(struct mlx5_qp *qp, void *seg,
+ struct ibv_exp_send_wr *wr)
+{
+ /* currently only inline is supported */
+ if (unlikely(!(wr->exp_send_flags & IBV_EXP_SEND_EXT_ATOMIC_INLINE)))
+ return -1;
+
+ if (unlikely((1 << wr->ext_op.masked_atomics.log_arg_sz) > qp->max_atomic_arg))
+ return -1;
+
+ if (wr->exp_opcode == IBV_EXP_WR_EXT_MASKED_ATOMIC_CMP_AND_SWP)
+ return ext_cmp_swp(qp, seg, wr);
+ else if (wr->exp_opcode == IBV_EXP_WR_EXT_MASKED_ATOMIC_FETCH_AND_ADD)
+ return ext_fetch_add(qp, seg, wr);
+ else
+ return -1;
+}
+
+enum {
+ MLX5_UMR_CTRL_INLINE = 1 << 7,
+};
+
+static uint64_t umr_mask(int fill)
+{
+ uint64_t mask;
+
+ if (fill)
+ mask = MLX5_MKEY_MASK_LEN |
+ MLX5_MKEY_MASK_START_ADDR |
+ MLX5_MKEY_MASK_LR |
+ MLX5_MKEY_MASK_LW |
+ MLX5_MKEY_MASK_RR |
+ MLX5_MKEY_MASK_RW |
+ MLX5_MKEY_MASK_FREE |
+ MLX5_MKEY_MASK_A;
+ else
+ mask = MLX5_MKEY_MASK_FREE;
+
+ return mask;
+}
+
+static void set_umr_ctrl_seg(struct ibv_exp_send_wr *wr,
+ struct mlx5_wqe_umr_ctrl_seg *seg)
+{
+ int fill = wr->exp_opcode == IBV_EXP_WR_UMR_FILL ? 1 : 0;
+
+ memset(seg, 0, sizeof(*seg));
+
+ if (wr->exp_send_flags & IBV_EXP_SEND_INLINE || !fill)
+ seg->flags = MLX5_UMR_CTRL_INLINE;
+
+ seg->mkey_mask = htonll(umr_mask(fill));
+}
+
+static int lay_umr(struct mlx5_qp *qp, struct ibv_exp_send_wr *wr,
+ void *seg, int *wqe_size, int *xlat_size,
+ uint64_t *reglen)
+{
+ enum ibv_exp_umr_wr_type type = wr->ext_op.umr.umr_type;
+ struct ibv_exp_mem_region *mlist;
+ struct ibv_exp_mem_repeat_block *rep;
+ struct mlx5_wqe_data_seg *dseg;
+ struct mlx5_seg_repeat_block *rb;
+ struct mlx5_seg_repeat_ent *re;
+ struct mlx5_klm_buf *klm = NULL;
+ void *qend = qp->gen_data.sqend;
+ int i;
+ int j;
+ int n;
+ int byte_count = 0;
+ int inl = wr->exp_send_flags & IBV_EXP_SEND_INLINE;
+ void *buf;
+ int tmp;
+
+ if (inl) {
+ if (unlikely(qp->max_inl_send_klms <
+ wr->ext_op.umr.num_mrs))
+ return EINVAL;
+ buf = seg;
+ } else {
+ klm = to_klm(wr->ext_op.umr.memory_objects);
+ buf = klm->align_buf;
+ }
+
+ *reglen = 0;
+ n = wr->ext_op.umr.num_mrs;
+ if (type == IBV_EXP_UMR_MR_LIST) {
+ mlist = wr->ext_op.umr.mem_list.mem_reg_list;
+ dseg = buf;
+
+ for (i = 0, j = 0; i < n; i++, j++) {
+ if (inl && unlikely((&dseg[j] == qend))) {
+ dseg = mlx5_get_send_wqe(qp, 0);
+ j = 0;
+ }
+
+ dseg[j].addr = htonll((uint64_t)(uintptr_t)mlist[i].base_addr);
+ dseg[j].lkey = htonl(mlist[i].mr->lkey);
+ dseg[j].byte_count = htonl(mlist[i].length);
+ byte_count += mlist[i].length;
+ }
+ if (inl)
+ *wqe_size = align(n * sizeof(*dseg), 64);
+ else
+ *wqe_size = 0;
+
+ *reglen = byte_count;
+ *xlat_size = n * sizeof(*dseg);
+ } else {
+ rep = wr->ext_op.umr.mem_list.rb.mem_repeat_block_list;
+ rb = buf;
+ rb->const_0x400 = htonl(0x400);
+ rb->reserved = 0;
+ rb->num_ent = htons(n);
+ re = rb->entries;
+ rb->repeat_count = htonl(wr->ext_op.umr.mem_list.rb.repeat_count[0]);
+
+ if (unlikely(wr->ext_op.umr.mem_list.rb.stride_dim != 1)) {
+			fprintf(stderr, "stride dimension must be 1\n");
+ return -ENOMEM;
+ }
+
+
+ for (i = 0, j = 0; i < n; i++, j++, rep++, re++) {
+ if (inl && unlikely((re == qend)))
+ re = mlx5_get_send_wqe(qp, 0);
+
+ byte_count += rep->byte_count[0];
+ re->va = htonll(rep->base_addr);
+ re->byte_count = htons(rep->byte_count[0]);
+ re->stride = htons(rep->stride[0]);
+ re->memkey = htonl(rep->mr->lkey);
+ }
+ rb->byte_count = htonl(byte_count);
+ *reglen = byte_count * ntohl(rb->repeat_count);
+ tmp = align((n + 1), 4) - n - 1;
+ memset(re, 0, tmp * sizeof(*re));
+ if (inl) {
+ *wqe_size = align(sizeof(*rb) + sizeof(*re) * n, 64);
+ *xlat_size = (n + 1) * sizeof(*re);
+ } else {
+ *wqe_size = 0;
+ *xlat_size = (n + 1) * sizeof(*re);
+ }
+ }
+ return 0;
+}
+
+static void *adjust_seg(struct mlx5_qp *qp, void *seg)
+{
+ return mlx5_get_send_wqe(qp, 0) + (seg - qp->gen_data.sqend);
+}
+
+static uint8_t get_umr_flags(int acc)
+{
+ return (acc & IBV_ACCESS_REMOTE_ATOMIC ? MLX5_PERM_ATOMIC : 0) |
+ (acc & IBV_ACCESS_REMOTE_WRITE ? MLX5_PERM_REMOTE_WRITE : 0) |
+ (acc & IBV_ACCESS_REMOTE_READ ? MLX5_PERM_REMOTE_READ : 0) |
+ (acc & IBV_ACCESS_LOCAL_WRITE ? MLX5_PERM_LOCAL_WRITE : 0) |
+ MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN;
+}
+
+static void set_mkey_seg(struct ibv_exp_send_wr *wr, struct mlx5_mkey_seg *seg)
+{
+ memset(seg, 0, sizeof(*seg));
+ if (wr->exp_opcode != IBV_EXP_WR_UMR_FILL) {
+ seg->status = 1 << 6;
+ return;
+ }
+
+ seg->flags = get_umr_flags(wr->ext_op.umr.exp_access);
+ seg->start_addr = htonll(wr->ext_op.umr.base_addr);
+ seg->qpn_mkey7_0 = htonl(0xffffff00 | (wr->ext_op.umr.modified_mr->lkey & 0xff));
+}
+
+static uint8_t get_fence(uint8_t fence, struct ibv_exp_send_wr *wr)
+{
+ if (unlikely(wr->exp_opcode == IBV_EXP_WR_LOCAL_INV &&
+ wr->exp_send_flags & IBV_EXP_SEND_FENCE))
+ return MLX5_FENCE_MODE_STRONG_ORDERING;
+
+ if (unlikely(fence)) {
+ if (wr->exp_send_flags & IBV_EXP_SEND_FENCE)
+ return MLX5_FENCE_MODE_SMALL_AND_FENCE;
+ else
+ return fence;
+
+ } else {
+ return 0;
+ }
+}
+
+void mlx5_build_ctrl_seg_data(struct mlx5_qp *qp, uint32_t qp_num)
+{
+ uint8_t *tbl = qp->ctrl_seg.fm_ce_se_tbl;
+ uint8_t *acc = qp->ctrl_seg.fm_ce_se_acc;
+ int i;
+
+ tbl[0 | 0 | 0] = (0 | 0 | 0);
+ tbl[0 | 0 | IBV_SEND_FENCE] = (0 | 0 | MLX5_WQE_CTRL_FENCE);
+ tbl[0 | IBV_SEND_SIGNALED | 0] = (0 | MLX5_WQE_CTRL_CQ_UPDATE | 0);
+ tbl[0 | IBV_SEND_SIGNALED | IBV_SEND_FENCE] = (0 | MLX5_WQE_CTRL_CQ_UPDATE | MLX5_WQE_CTRL_FENCE);
+ tbl[IBV_SEND_SOLICITED | 0 | 0] = (MLX5_WQE_CTRL_SOLICITED | 0 | 0);
+ tbl[IBV_SEND_SOLICITED | 0 | IBV_SEND_FENCE] = (MLX5_WQE_CTRL_SOLICITED | 0 | MLX5_WQE_CTRL_FENCE);
+ tbl[IBV_SEND_SOLICITED | IBV_SEND_SIGNALED | 0] = (MLX5_WQE_CTRL_SOLICITED | MLX5_WQE_CTRL_CQ_UPDATE | 0);
+ tbl[IBV_SEND_SOLICITED | IBV_SEND_SIGNALED | IBV_SEND_FENCE] = (MLX5_WQE_CTRL_SOLICITED | MLX5_WQE_CTRL_CQ_UPDATE | MLX5_WQE_CTRL_FENCE);
+ for (i = 0; i < 8; i++)
+ tbl[i] = qp->sq_signal_bits | tbl[i];
+
+ memset(acc, 0, sizeof(qp->ctrl_seg.fm_ce_se_acc));
+ acc[0 | 0 | 0] = (0 | 0 | 0);
+ acc[0 | 0 | IBV_EXP_QP_BURST_FENCE] = (0 | 0 | MLX5_WQE_CTRL_FENCE);
+ acc[0 | IBV_EXP_QP_BURST_SIGNALED | 0] = (0 | MLX5_WQE_CTRL_CQ_UPDATE | 0);
+ acc[0 | IBV_EXP_QP_BURST_SIGNALED | IBV_EXP_QP_BURST_FENCE] = (0 | MLX5_WQE_CTRL_CQ_UPDATE | MLX5_WQE_CTRL_FENCE);
+ acc[IBV_EXP_QP_BURST_SOLICITED | 0 | 0] = (MLX5_WQE_CTRL_SOLICITED | 0 | 0);
+ acc[IBV_EXP_QP_BURST_SOLICITED | 0 | IBV_EXP_QP_BURST_FENCE] = (MLX5_WQE_CTRL_SOLICITED | 0 | MLX5_WQE_CTRL_FENCE);
+ acc[IBV_EXP_QP_BURST_SOLICITED | IBV_EXP_QP_BURST_SIGNALED | 0] = (MLX5_WQE_CTRL_SOLICITED | MLX5_WQE_CTRL_CQ_UPDATE | 0);
+ acc[IBV_EXP_QP_BURST_SOLICITED | IBV_EXP_QP_BURST_SIGNALED | IBV_EXP_QP_BURST_FENCE] = (MLX5_WQE_CTRL_SOLICITED | MLX5_WQE_CTRL_CQ_UPDATE | MLX5_WQE_CTRL_FENCE);
+ for (i = 0; i < 32; i++)
+ acc[i] = qp->sq_signal_bits | acc[i];
+
+ qp->ctrl_seg.qp_num = qp_num;
+}
+
+static inline void set_ctrl_seg(uint32_t *start, struct ctrl_seg_data *ctrl_seg,
+ uint8_t opcode, uint16_t idx, uint8_t opmod,
+ uint8_t size, uint8_t fm_ce_se, uint32_t imm_invk_umrk)
+{
+ *start++ = htonl(opmod << 24 | idx << 8 | opcode);
+ *start++ = htonl(ctrl_seg->qp_num << 8 | (size & 0x3F));
+ *start++ = htonl(fm_ce_se);
+ *start = imm_invk_umrk;
+}
+
+static inline void set_ctrl_seg_sig(uint32_t *start, struct ctrl_seg_data *ctrl_seg,
+ uint8_t opcode, uint16_t idx, uint8_t opmod,
+ uint8_t size, uint8_t fm_ce_se, uint32_t imm_invk_umrk)
+{
+ set_ctrl_seg(start, ctrl_seg, opcode, idx, opmod, size, fm_ce_se, imm_invk_umrk);
+
+ if (unlikely(ctrl_seg->wq_sig))
+ *(start + 2) = htonl(~calc_xor(start, size << 4) << 24 | fm_ce_se);
+}
+
+static int __mlx5_post_send_one_other(struct ibv_exp_send_wr *wr,
+ struct mlx5_qp *qp, uint64_t exp_send_flags, void *seg, int *total_size)
+{
+ void *ctrl = seg;
+ int err = 0;
+ int size = 0;
+ int num_sge = wr->num_sge;
+ uint8_t fm_ce_se;
+#ifdef MLX5_DEBUG
+ FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp;
+#endif
+
+ if (unlikely(((MLX5_IB_OPCODE_GET_CLASS(mlx5_ib_opcode[wr->exp_opcode]) == MLX5_OPCODE_MANAGED) ||
+ (exp_send_flags & IBV_EXP_SEND_WITH_CALC)) &&
+ !(qp->gen_data.create_flags & IBV_EXP_QP_CREATE_CROSS_CHANNEL))) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "unsupported cross-channel functionality\n");
+ return EINVAL;
+ }
+
+ seg += sizeof(struct mlx5_wqe_ctrl_seg);
+ size = sizeof(struct mlx5_wqe_ctrl_seg) / 16;
+
+ err = set_data_seg(qp, seg, &size,
+ !!(exp_send_flags & IBV_EXP_SEND_INLINE),
+ num_sge, wr->sg_list, 0, 0, 0);
+ if (unlikely(err))
+ return err;
+
+ fm_ce_se = qp->ctrl_seg.fm_ce_se_tbl[exp_send_flags &
+ (IBV_SEND_SOLICITED |
+ IBV_SEND_SIGNALED |
+ IBV_SEND_FENCE)];
+ fm_ce_se |= get_fence(qp->gen_data.fm_cache, wr);
+ set_ctrl_seg_sig(ctrl, &qp->ctrl_seg,
+ MLX5_IB_OPCODE_GET_OP(mlx5_ib_opcode[wr->exp_opcode]),
+ qp->gen_data.scur_post, 0, size, fm_ce_se,
+ send_ieth(wr));
+
+ qp->gen_data.fm_cache = 0;
+ *total_size = size;
+
+ return 0;
+}
+
+static int __mlx5_post_send_one_raw_packet(struct ibv_exp_send_wr *wr,
+ struct mlx5_qp *qp,
+ uint64_t exp_send_flags, void *seg,
+ int *total_size) __MLX5_ALGN_F__;
+
+static int __mlx5_post_send_one_raw_packet(struct ibv_exp_send_wr *wr,
+ struct mlx5_qp *qp,
+ uint64_t exp_send_flags, void *seg,
+ int *total_size)
+{
+ void *ctrl = seg;
+ struct mlx5_wqe_eth_seg *eseg;
+ int err = 0;
+ int size = 0;
+ int num_sge = wr->num_sge;
+ int inl_hdr_size = MLX5_ETH_INLINE_HEADER_SIZE;
+ int inl_hdr_copy_size = 0;
+ int i = 0;
+ uint8_t fm_ce_se;
+#ifdef MLX5_DEBUG
+ FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp;
+#endif
+
+ seg += sizeof(struct mlx5_wqe_ctrl_seg);
+ size = sizeof(struct mlx5_wqe_ctrl_seg) / 16;
+
+ eseg = seg;
+ *((uint64_t *)eseg) = 0;
+ eseg->rsvd2 = 0;
+
+ if (exp_send_flags & IBV_EXP_SEND_IP_CSUM)
+ eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM;
+
+ /* The first 16 bytes of the headers should be copied to the
+ * inline-headers of the ETH segment.
+ */
+ if (likely(wr->sg_list[0].length >= MLX5_ETH_INLINE_HEADER_SIZE)) {
+ inl_hdr_copy_size = MLX5_ETH_INLINE_HEADER_SIZE;
+ memcpy(eseg->inline_hdr_start,
+ (void *)(uintptr_t)wr->sg_list[0].addr,
+ inl_hdr_copy_size);
+ } else {
+ for (i = 0; i < num_sge && inl_hdr_size > 0; ++i) {
+ inl_hdr_copy_size = min(wr->sg_list[i].length,
+ inl_hdr_size);
+ memcpy(eseg->inline_hdr_start +
+ (MLX5_ETH_INLINE_HEADER_SIZE - inl_hdr_size),
+ (void *)(uintptr_t)wr->sg_list[i].addr,
+ inl_hdr_copy_size);
+ inl_hdr_size -= inl_hdr_copy_size;
+ }
+ --i;
+ if (unlikely(inl_hdr_size)) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "Ethernet headers < 16 bytes\n");
+ return EINVAL;
+ }
+ }
+
+ seg += sizeof(struct mlx5_wqe_eth_seg);
+ size += sizeof(struct mlx5_wqe_eth_seg) / 16;
+ eseg->inline_hdr_sz = htons(MLX5_ETH_INLINE_HEADER_SIZE);
+
+	/* If we copied the whole SGE into the inline headers, then we need to
+	 * start copying from the next SGE into the data segment.
+ */
+ if (unlikely(wr->sg_list[i].length == inl_hdr_copy_size)) {
+ ++i;
+ inl_hdr_copy_size = 0;
+ }
+
+ /* The copied headers should be excluded from the data segment */
+ err = set_data_seg(qp, seg, &size,
+ !!(exp_send_flags & IBV_EXP_SEND_INLINE),
+ num_sge, wr->sg_list, 0, i, inl_hdr_copy_size);
+
+ if (unlikely(err))
+ return err;
+
+ fm_ce_se = qp->ctrl_seg.fm_ce_se_tbl[exp_send_flags &
+ (IBV_SEND_SOLICITED |
+ IBV_SEND_SIGNALED |
+ IBV_SEND_FENCE)];
+ fm_ce_se |= get_fence(qp->gen_data.fm_cache, wr);
+ set_ctrl_seg_sig(ctrl, &qp->ctrl_seg,
+ MLX5_IB_OPCODE_GET_OP(mlx5_ib_opcode[wr->exp_opcode]),
+ qp->gen_data.scur_post, 0, size, fm_ce_se,
+ send_ieth(wr));
+
+ qp->gen_data.fm_cache = 0;
+ *total_size = size;
+
+ return 0;
+}
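The raw-packet path above gathers the first 16 bytes of the frame headers out of the scatter list into the ETH segment's inline-header area, failing with EINVAL if the headers are shorter. A minimal standalone sketch of that gather loop (names `sge`, `copy_inline_hdr`, and the `INL_HDR_SIZE` constant are illustrative; only the 16-byte value mirrors MLX5_ETH_INLINE_HEADER_SIZE):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* INL_HDR_SIZE mirrors MLX5_ETH_INLINE_HEADER_SIZE (16 bytes). */
#define INL_HDR_SIZE 16

struct sge { const uint8_t *addr; uint32_t length; };

/* Gather the first INL_HDR_SIZE bytes of a scatter list into dst,
 * walking SGEs until 16 bytes are consumed. Returns 0 on success or
 * -1 when the headers span fewer than 16 bytes (the EINVAL path). */
static int copy_inline_hdr(uint8_t *dst, const struct sge *sgl, int num_sge)
{
	int left = INL_HDR_SIZE;
	int i;

	for (i = 0; i < num_sge && left > 0; ++i) {
		int n = sgl[i].length < (uint32_t)left ?
			(int)sgl[i].length : left;

		memcpy(dst + (INL_HDR_SIZE - left), sgl[i].addr, n);
		left -= n;
	}
	return left ? -1 : 0;
}
```

As in the driver code, the common case (first SGE already holds 16+ bytes) degenerates to a single memcpy on the first iteration.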
+
+static int __mlx5_post_send_one_uc_ud(struct ibv_exp_send_wr *wr,
+ struct mlx5_qp *qp, uint64_t exp_send_flags, void *seg, int *total_size) __MLX5_ALGN_F__;
+static int __mlx5_post_send_one_uc_ud(struct ibv_exp_send_wr *wr,
+ struct mlx5_qp *qp, uint64_t exp_send_flags, void *seg, int *total_size)
+{
+ void *ctrl = seg;
+ int err = 0;
+ int size = 0;
+ int num_sge = wr->num_sge;
+ uint8_t fm_ce_se;
+ int tmp;
+#ifdef MLX5_DEBUG
+ FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp;
+#endif
+
+
+ if (unlikely(((MLX5_IB_OPCODE_GET_CLASS(mlx5_ib_opcode[wr->exp_opcode]) == MLX5_OPCODE_MANAGED) ||
+ (exp_send_flags & IBV_EXP_SEND_WITH_CALC)) &&
+ !(qp->gen_data.create_flags & IBV_EXP_QP_CREATE_CROSS_CHANNEL))) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "unsupported cross-channel functionality\n");
+ return EINVAL;
+ }
+
+ seg += sizeof(struct mlx5_wqe_ctrl_seg);
+ size = sizeof(struct mlx5_wqe_ctrl_seg) / 16;
+
+ switch (qp->gen_data_warm.qp_type) {
+ case IBV_QPT_UC:
+ switch (wr->exp_opcode) {
+ case IBV_WR_RDMA_WRITE:
+ case IBV_WR_RDMA_WRITE_WITH_IMM:
+ set_raddr_seg(seg, wr->wr.rdma.remote_addr,
+ wr->wr.rdma.rkey);
+ seg += sizeof(struct mlx5_wqe_raddr_seg);
+ size += sizeof(struct mlx5_wqe_raddr_seg) / 16;
+ break;
+
+ default:
+ break;
+ }
+ break;
+
+ case IBV_QPT_UD:
+ tmp = set_datagram_seg(seg, wr);
+ seg += tmp;
+ size += (tmp >> 4);
+ if (unlikely((seg == qp->gen_data.sqend)))
+ seg = mlx5_get_send_wqe(qp, 0);
+ break;
+
+ default:
+ break;
+ }
+
+ err = set_data_seg(qp, seg, &size, !!(exp_send_flags & IBV_EXP_SEND_INLINE),
+ num_sge, wr->sg_list, 0, 0, 0);
+ if (unlikely(err))
+ return err;
+
+ fm_ce_se = qp->ctrl_seg.fm_ce_se_tbl[exp_send_flags & (IBV_SEND_SOLICITED | IBV_SEND_SIGNALED | IBV_SEND_FENCE)];
+ fm_ce_se |= get_fence(qp->gen_data.fm_cache, wr);
+ set_ctrl_seg_sig(ctrl, &qp->ctrl_seg, MLX5_IB_OPCODE_GET_OP(mlx5_ib_opcode[wr->exp_opcode]),
+ qp->gen_data.scur_post, 0, size, fm_ce_se, send_ieth(wr));
+
+ qp->gen_data.fm_cache = 0;
+ *total_size = size;
+
+ return 0;
+}
+static int __mlx5_post_send_one_rc_dc(struct ibv_exp_send_wr *wr,
+ struct mlx5_qp *qp, uint64_t exp_send_flags, void *seg, int *total_size) __MLX5_ALGN_F__;
+static int __mlx5_post_send_one_rc_dc(struct ibv_exp_send_wr *wr,
+ struct mlx5_qp *qp, uint64_t exp_send_flags, void *seg, int *total_size)
+{
+ struct mlx5_klm_buf *klm;
+ void *ctrl = seg;
+ struct ibv_qp *ibqp = &qp->verbs_qp.qp;
+ struct mlx5_context *ctx = to_mctx(ibqp->context);
+ int err = 0;
+ int size = 0;
+ uint8_t opmod = 0;
+ void *qend = qp->gen_data.sqend;
+ uint32_t mlx5_opcode;
+ struct mlx5_wqe_xrc_seg *xrc;
+ int tmp = 0;
+ int num_sge = wr->num_sge;
+ uint8_t next_fence = 0;
+ struct mlx5_wqe_umr_ctrl_seg *umr_ctrl;
+ int xlat_size;
+ struct mlx5_mkey_seg *mk;
+ int wqe_sz;
+ uint64_t reglen;
+ int atom_arg = 0;
+ uint8_t fm_ce_se;
+ uint32_t imm;
+#ifdef MLX5_DEBUG
+ FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp;
+#endif
+
+
+ if (unlikely(((MLX5_IB_OPCODE_GET_CLASS(mlx5_ib_opcode[wr->exp_opcode]) == MLX5_OPCODE_MANAGED) ||
+ (exp_send_flags & IBV_EXP_SEND_WITH_CALC)) &&
+ !(qp->gen_data.create_flags & IBV_EXP_QP_CREATE_CROSS_CHANNEL))) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "unsupported cross-channel functionality\n");
+ return EINVAL;
+ }
+
+ mlx5_opcode = MLX5_IB_OPCODE_GET_OP(mlx5_ib_opcode[wr->exp_opcode]);
+ imm = send_ieth(wr);
+
+ seg += sizeof(struct mlx5_wqe_ctrl_seg);
+ size = sizeof(struct mlx5_wqe_ctrl_seg) / 16;
+
+ switch (qp->gen_data_warm.qp_type) {
+ case IBV_QPT_XRC_SEND:
+ case IBV_QPT_XRC:
+ case IBV_EXP_QPT_DC_INI:
+ if (qp->gen_data_warm.qp_type == IBV_EXP_QPT_DC_INI) {
+ if (likely(wr->exp_opcode != IBV_EXP_WR_NOP))
+ tmp = set_dci_seg(seg, wr);
+ seg += tmp;
+ size += (tmp >> 4);
+ if (unlikely((seg == qend)))
+ seg = mlx5_get_send_wqe(qp, 0);
+
+ } else {
+ xrc = seg;
+ xrc->xrc_srqn = htonl(wr->qp_type.xrc.remote_srqn);
+ seg += sizeof(*xrc);
+ size += sizeof(*xrc) / 16;
+ }
+ /* fall through */
+ case IBV_QPT_RC:
+ switch (wr->exp_opcode) {
+ case IBV_EXP_WR_RDMA_READ:
+ case IBV_EXP_WR_RDMA_WRITE:
+ case IBV_EXP_WR_RDMA_WRITE_WITH_IMM:
+ if (unlikely(exp_send_flags & IBV_EXP_SEND_WITH_CALC)) {
+
+ if ((uint32_t)wr->op.calc.data_size >= IBV_EXP_CALC_DATA_SIZE_NUMBER ||
+ (uint32_t)wr->op.calc.calc_op >= IBV_EXP_CALC_OP_NUMBER ||
+ (uint32_t)wr->op.calc.data_type >= IBV_EXP_CALC_DATA_TYPE_NUMBER ||
+ !mlx5_calc_ops_table[wr->op.calc.data_size][wr->op.calc.calc_op]
+ [wr->op.calc.data_type].valid)
+ return EINVAL;
+
+ opmod = mlx5_calc_ops_table[wr->op.calc.data_size][wr->op.calc.calc_op]
+ [wr->op.calc.data_type].opmod;
+ }
+ set_raddr_seg(seg, wr->wr.rdma.remote_addr, wr->wr.rdma.rkey);
+ seg += sizeof(struct mlx5_wqe_raddr_seg);
+ size += sizeof(struct mlx5_wqe_raddr_seg) / 16;
+ break;
+
+ case IBV_EXP_WR_ATOMIC_CMP_AND_SWP:
+ case IBV_EXP_WR_ATOMIC_FETCH_AND_ADD:
+ if (unlikely(!qp->enable_atomics)) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "atomics not allowed\n");
+ return EINVAL;
+ }
+ set_raddr_seg(seg, wr->wr.atomic.remote_addr,
+ wr->wr.atomic.rkey);
+ seg += sizeof(struct mlx5_wqe_raddr_seg);
+
+ set_atomic_seg(seg, wr->exp_opcode, wr->wr.atomic.swap,
+ wr->wr.atomic.compare_add);
+ seg += sizeof(struct mlx5_wqe_atomic_seg);
+
+ size += (sizeof(struct mlx5_wqe_raddr_seg) +
+ sizeof(struct mlx5_wqe_atomic_seg)) / 16;
+ atom_arg = 8;
+ break;
+
+ case IBV_EXP_WR_EXT_MASKED_ATOMIC_CMP_AND_SWP:
+ case IBV_EXP_WR_EXT_MASKED_ATOMIC_FETCH_AND_ADD:
+ if (unlikely(!qp->enable_atomics)) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "atomics not allowed\n");
+ return EINVAL;
+ }
+ if (unlikely(wr->ext_op.masked_atomics.log_arg_sz >=
+ sizeof(ctx->info.bit_mask_log_atomic_arg_sizes) * 8)) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "too big atomic arg\n");
+ return EINVAL;
+ }
+ atom_arg = 1 << wr->ext_op.masked_atomics.log_arg_sz;
+ if (unlikely(!(ctx->info.bit_mask_log_atomic_arg_sizes & atom_arg))) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "unsupported atomic arg size. supported bitmask 0x%lx\n",
+ (unsigned long)ctx->info.bit_mask_log_atomic_arg_sizes);
+ return EINVAL;
+ }
+
+ set_raddr_seg(seg, wr->ext_op.masked_atomics.remote_addr,
+ wr->ext_op.masked_atomics.rkey);
+ seg += sizeof(struct mlx5_wqe_raddr_seg);
+ size += sizeof(struct mlx5_wqe_raddr_seg) / 16;
+ tmp = set_ext_atomic_seg(qp, seg, wr);
+ if (unlikely(tmp < 0)) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "invalid atomic arguments\n");
+ return EINVAL;
+ }
+ size += (tmp >> 4);
+ seg += tmp;
+ if (unlikely((seg >= qend)))
+ seg = seg - qend + mlx5_get_send_wqe(qp, 0);
+ opmod = MLX5_OPCODE_EXT_ATOMICS | (wr->ext_op.masked_atomics.log_arg_sz - 2);
+ break;
+
+ case IBV_EXP_WR_SEND:
+ if (unlikely(exp_send_flags & IBV_EXP_SEND_WITH_CALC)) {
+
+ if ((uint32_t)wr->op.calc.data_size >= IBV_EXP_CALC_DATA_SIZE_NUMBER ||
+ (uint32_t)wr->op.calc.calc_op >= IBV_EXP_CALC_OP_NUMBER ||
+ (uint32_t)wr->op.calc.data_type >= IBV_EXP_CALC_DATA_TYPE_NUMBER ||
+ !mlx5_calc_ops_table[wr->op.calc.data_size][wr->op.calc.calc_op]
+ [wr->op.calc.data_type].valid)
+ return EINVAL;
+
+ opmod = mlx5_calc_ops_table[wr->op.calc.data_size][wr->op.calc.calc_op]
+ [wr->op.calc.data_type].opmod;
+ }
+ break;
+
+ case IBV_EXP_WR_CQE_WAIT:
+ {
+ struct mlx5_cq *wait_cq = to_mcq(wr->task.cqe_wait.cq);
+ uint32_t wait_index = 0;
+
+ wait_index = wait_cq->wait_index +
+ wr->task.cqe_wait.cq_count;
+ wait_cq->wait_count = max(wait_cq->wait_count,
+ wr->task.cqe_wait.cq_count);
+
+ if (exp_send_flags & IBV_EXP_SEND_WAIT_EN_LAST) {
+ wait_cq->wait_index += wait_cq->wait_count;
+ wait_cq->wait_count = 0;
+ }
+
+ set_wait_en_seg(seg, wait_cq->cqn, wait_index);
+ seg += sizeof(struct mlx5_wqe_wait_en_seg);
+ size += sizeof(struct mlx5_wqe_wait_en_seg) / 16;
+ }
+ break;
+
+ case IBV_EXP_WR_SEND_ENABLE:
+ case IBV_EXP_WR_RECV_ENABLE:
+ {
+ unsigned head_en_index;
+ struct mlx5_wq *wq;
+ struct mlx5_wq_recv_send_enable *wq_en;
+
+ /*
+			 * Posting a work request to a QP that does not support
+			 * SEND/RECV ENABLE degrades performance.
+ */
+ if (((wr->exp_opcode == IBV_EXP_WR_SEND_ENABLE) &&
+ !(to_mqp(wr->task.wqe_enable.qp)->gen_data.create_flags &
+ IBV_EXP_QP_CREATE_MANAGED_SEND)) ||
+ ((wr->exp_opcode == IBV_EXP_WR_RECV_ENABLE) &&
+ !(to_mqp(wr->task.wqe_enable.qp)->gen_data.create_flags &
+ IBV_EXP_QP_CREATE_MANAGED_RECV))) {
+ return EINVAL;
+ }
+
+ wq = (wr->exp_opcode == IBV_EXP_WR_SEND_ENABLE) ?
+ &to_mqp(wr->task.wqe_enable.qp)->sq :
+ &to_mqp(wr->task.wqe_enable.qp)->rq;
+
+ wq_en = (wr->exp_opcode == IBV_EXP_WR_SEND_ENABLE) ?
+ &to_mqp(wr->task.wqe_enable.qp)->sq_enable :
+ &to_mqp(wr->task.wqe_enable.qp)->rq_enable;
+
+			/* If wqe_count is 0, release all WRs from the queue */
+ if (wr->task.wqe_enable.wqe_count) {
+ head_en_index = wq_en->head_en_index +
+ wr->task.wqe_enable.wqe_count;
+ wq_en->head_en_count = max(wq_en->head_en_count,
+ wr->task.wqe_enable.wqe_count);
+
+ if ((int)(wq->head - head_en_index) < 0)
+ return EINVAL;
+ } else {
+ head_en_index = wq->head;
+ wq_en->head_en_count = wq->head - wq_en->head_en_index;
+ }
+
+ if (exp_send_flags & IBV_EXP_SEND_WAIT_EN_LAST) {
+ wq_en->head_en_index += wq_en->head_en_count;
+ wq_en->head_en_count = 0;
+ }
+
+ set_wait_en_seg(seg,
+ wr->task.wqe_enable.qp->qp_num,
+ head_en_index);
+
+ seg += sizeof(struct mlx5_wqe_wait_en_seg);
+ size += sizeof(struct mlx5_wqe_wait_en_seg) / 16;
+ }
+ break;
+ case IBV_EXP_WR_UMR_FILL:
+ case IBV_EXP_WR_UMR_INVALIDATE:
+ if (unlikely(!qp->umr_en)) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "UMR not supported\n");
+ return EINVAL;
+ }
+ next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL;
+ imm = htonl(wr->ext_op.umr.modified_mr->lkey);
+ num_sge = 0;
+ umr_ctrl = seg;
+ set_umr_ctrl_seg(wr, seg);
+ seg += sizeof(struct mlx5_wqe_umr_ctrl_seg);
+ size += sizeof(struct mlx5_wqe_umr_ctrl_seg) / 16;
+
+ if (unlikely((seg == qend)))
+ seg = mlx5_get_send_wqe(qp, 0);
+ mk = seg;
+ set_mkey_seg(wr, seg);
+ seg += sizeof(*mk);
+ size += (sizeof(*mk) / 16);
+ if (wr->exp_opcode == IBV_EXP_WR_UMR_INVALIDATE)
+ break;
+
+ if (unlikely((seg == qend)))
+ seg = mlx5_get_send_wqe(qp, 0);
+ err = lay_umr(qp, wr, seg, &wqe_sz, &xlat_size, &reglen);
+ if (err) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "lay_umr failure\n");
+ return err;
+ }
+ mk->len = htonll(reglen);
+ size += wqe_sz / 16;
+ seg += wqe_sz;
+ umr_ctrl->klm_octowords = htons(align(xlat_size, 64) / 16);
+ if (unlikely((seg >= qend)))
+ seg = adjust_seg(qp, seg);
+ if (!(wr->exp_send_flags & IBV_EXP_SEND_INLINE)) {
+ struct ibv_sge sge;
+
+ klm = to_klm(wr->ext_op.umr.memory_objects);
+ sge.addr = (uint64_t)(uintptr_t)klm->mr->addr;
+ sge.lkey = klm->mr->lkey;
+ sge.length = 0;
+ set_data_ptr_seg(seg, &sge, qp, 0);
+ size += sizeof(struct mlx5_wqe_data_seg) / 16;
+ seg += sizeof(struct mlx5_wqe_data_seg);
+ }
+ break;
+
+ case IBV_EXP_WR_NOP:
+ break;
+
+ default:
+ break;
+ }
+ break;
+
+ default:
+ break;
+ }
+
+ err = set_data_seg(qp, seg, &size, !!(exp_send_flags & IBV_EXP_SEND_INLINE),
+ num_sge, wr->sg_list, atom_arg, 0, 0);
+ if (unlikely(err))
+ return err;
+
+ fm_ce_se = qp->ctrl_seg.fm_ce_se_tbl[exp_send_flags & (IBV_SEND_SOLICITED | IBV_SEND_SIGNALED | IBV_SEND_FENCE)];
+ fm_ce_se |= get_fence(qp->gen_data.fm_cache, wr);
+ set_ctrl_seg_sig(ctrl, &qp->ctrl_seg,
+ mlx5_opcode, qp->gen_data.scur_post, opmod, size,
+ fm_ce_se, imm);
+
+ qp->gen_data.fm_cache = next_fence;
+ *total_size = size;
+
+ return 0;
+}
+
+static inline int __mlx5_post_send_one_fast_rc(struct ibv_exp_send_wr *wr,
+ struct mlx5_qp *qp, uint64_t exp_send_flags, void *seg, int *total_size,
+ const int cmd, const int inl) __attribute__((always_inline));
+static inline int __mlx5_post_send_one_fast_rc(struct ibv_exp_send_wr *wr,
+ struct mlx5_qp *qp, uint64_t exp_send_flags, void *seg, int *total_size,
+ const int cmd, const int inl)
+{
+ struct mlx5_wqe_ctrl_seg *ctrl = seg;
+ int err = 0;
+ int size = 0;
+ uint8_t fm_ce_se;
+
+ seg += sizeof(*ctrl);
+ size = sizeof(*ctrl) / 16;
+
+ if (cmd == MLX5_OPCODE_RDMA_WRITE) {
+ set_raddr_seg(seg, wr->wr.rdma.remote_addr, wr->wr.rdma.rkey);
+ seg += sizeof(struct mlx5_wqe_raddr_seg);
+ size += sizeof(struct mlx5_wqe_raddr_seg) / 16;
+ }
+
+ if (inl)
+ err = set_data_inl_seg(qp, wr->num_sge, wr->sg_list, seg,
+ &size, 0, 0);
+ else
+ err = set_data_non_inl_seg(qp, wr->num_sge, wr->sg_list, seg,
+ &size, 0, 0);
+ if (unlikely(err))
+ return err;
+
+ fm_ce_se = qp->ctrl_seg.fm_ce_se_tbl[exp_send_flags & (IBV_SEND_SOLICITED | IBV_SEND_SIGNALED | IBV_SEND_FENCE)];
+ if (unlikely(qp->gen_data.fm_cache)) {
+ if (unlikely(exp_send_flags & IBV_EXP_SEND_FENCE))
+ fm_ce_se |= MLX5_FENCE_MODE_SMALL_AND_FENCE;
+ else
+ fm_ce_se |= qp->gen_data.fm_cache;
+ }
+
+ set_ctrl_seg((uint32_t *)ctrl, &qp->ctrl_seg,
+ cmd, qp->gen_data.scur_post, 0, size,
+ fm_ce_se, 0);
+
+ qp->gen_data.fm_cache = 0;
+ *total_size = size;
+
+ return 0;
+}
+
+#define MLX5_POST_SEND_ONE_FAST_RC(suffix, cmd, inl) \
+ static int __mlx5_post_send_one_fast_rc_##suffix( \
+ struct ibv_exp_send_wr *wr, \
+ struct mlx5_qp *qp, uint64_t exp_send_flags, \
+ void *seg, int *total_size) __MLX5_ALGN_F__; \
+ static int __mlx5_post_send_one_fast_rc_##suffix( \
+ struct ibv_exp_send_wr *wr, \
+ struct mlx5_qp *qp, uint64_t exp_send_flags, \
+ void *seg, int *total_size) \
+ { \
+ return __mlx5_post_send_one_fast_rc(wr, qp, \
+ exp_send_flags, \
+ seg, total_size, \
+ cmd, inl); \
+ }
+/* suffix cmd inl */
+MLX5_POST_SEND_ONE_FAST_RC(send, MLX5_OPCODE_SEND, 0);
+MLX5_POST_SEND_ONE_FAST_RC(send_inl, MLX5_OPCODE_SEND, 1);
+MLX5_POST_SEND_ONE_FAST_RC(rwrite, MLX5_OPCODE_RDMA_WRITE, 0);
+MLX5_POST_SEND_ONE_FAST_RC(rwrite_inl, MLX5_OPCODE_RDMA_WRITE, 1);
+
+static int __mlx5_post_send_one_not_ready(struct ibv_exp_send_wr *wr,
+ struct mlx5_qp *qp, uint64_t exp_send_flags, void *seg, int *total_size)
+{
+#ifdef MLX5_DEBUG
+ FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp;
+#endif
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "bad QP state\n");
+
+ return EINVAL;
+}
+
+enum mlx5_post_send_one_rc_cases {
+ MLX5_SEND_RC = (IBV_EXP_WR_SEND),
+ MLX5_SEND_RC_INL = (IBV_EXP_WR_SEND) + (IBV_EXP_SEND_INLINE << 8),
+ MLX5_RDMA_WRITE_RC = (IBV_EXP_WR_RDMA_WRITE),
+ MLX5_RDMA_WRITE_RC_INL = (IBV_EXP_WR_RDMA_WRITE) + (IBV_EXP_SEND_INLINE << 8),
+};
+
+static int __mlx5_post_send_one_rc(struct ibv_exp_send_wr *wr,
+ struct mlx5_qp *qp, uint64_t exp_send_flags,
+ void *seg, int *total_size) __MLX5_ALGN_F__;
+static int __mlx5_post_send_one_rc(struct ibv_exp_send_wr *wr,
+ struct mlx5_qp *qp, uint64_t exp_send_flags,
+ void *seg, int *total_size)
+{
+#ifdef MLX5_DEBUG
+ FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp;
+#endif
+ uint64_t rc_case = (uint64_t)wr->exp_opcode | ((exp_send_flags & (IBV_EXP_SEND_WITH_CALC | IBV_EXP_SEND_INLINE)) << 8);
+
+ switch (rc_case) {
+
+ case MLX5_SEND_RC:
+ return __mlx5_post_send_one_fast_rc_send(wr, qp, exp_send_flags, seg, total_size);
+
+ case MLX5_SEND_RC_INL:
+ return __mlx5_post_send_one_fast_rc_send_inl(wr, qp, exp_send_flags, seg, total_size);
+
+ case MLX5_RDMA_WRITE_RC:
+ return __mlx5_post_send_one_fast_rc_rwrite(wr, qp, exp_send_flags, seg, total_size);
+
+ case MLX5_RDMA_WRITE_RC_INL:
+ return __mlx5_post_send_one_fast_rc_rwrite_inl(wr, qp, exp_send_flags, seg, total_size);
+
+ default:
+ if (unlikely(wr->exp_opcode < 0 ||
+ wr->exp_opcode >= sizeof(mlx5_ib_opcode) / sizeof(mlx5_ib_opcode[0]))) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "bad opcode %d\n", wr->exp_opcode);
+ return EINVAL;
+ } else {
+ return __mlx5_post_send_one_rc_dc(wr, qp, exp_send_flags, seg, total_size);
+ }
+ }
+}
+
+void mlx5_update_post_send_one(struct mlx5_qp *qp, enum ibv_qp_state qp_state, enum ibv_qp_type qp_type)
+{
+ if (qp_state < IBV_QPS_RTS) {
+ qp->gen_data.post_send_one = __mlx5_post_send_one_not_ready;
+ } else {
+ switch (qp_type) {
+ case IBV_QPT_XRC_SEND:
+ case IBV_QPT_XRC:
+ case IBV_EXP_QPT_DC_INI:
+ qp->gen_data.post_send_one = __mlx5_post_send_one_rc_dc;
+ break;
+ case IBV_QPT_RC:
+ if (qp->ctrl_seg.wq_sig)
+ qp->gen_data.post_send_one = __mlx5_post_send_one_rc_dc;
+ else
+ qp->gen_data.post_send_one = __mlx5_post_send_one_rc;
+
+ break;
+
+ case IBV_QPT_UC:
+ case IBV_QPT_UD:
+ qp->gen_data.post_send_one = __mlx5_post_send_one_uc_ud;
+ break;
+
+ case IBV_QPT_RAW_ETH:
+ qp->gen_data.post_send_one = __mlx5_post_send_one_raw_packet;
+ break;
+
+ default:
+ qp->gen_data.post_send_one = __mlx5_post_send_one_other;
+ break;
+ }
+ }
+}
+
+static inline int __ring_db(struct mlx5_qp *qp, const int db_method, uint32_t curr_post, unsigned long long *seg, int size) __attribute__((always_inline));
+static inline int __ring_db(struct mlx5_qp *qp, const int db_method, uint32_t curr_post, unsigned long long *seg, int size)
+{
+ struct mlx5_bf *bf = qp->gen_data.bf;
+
+ qp->gen_data.last_post = curr_post;
+ qp->mpw.state = MLX5_MPW_STATE_CLOSED;
+
+ switch (db_method) {
+ case MLX5_DB_METHOD_DEDIC_BF_1_THREAD:
+ /* This QP is used by one thread and it uses dedicated blue-flame */
+
+ /* Use wc_wmb to make sure old BF-copy is not passing current DB record */
+ wc_wmb();
+ qp->gen_data.db[MLX5_SND_DBR] = htonl(curr_post);
+
+ /* This wc_wmb ensures ordering between DB record and BF copy */
+ wc_wmb();
+ if (size <= bf->buf_size / 64) {
+ mlx5_bf_copy(bf->reg + bf->offset, seg,
+ size * 64, qp);
+
+			/* No need for wc_wmb since the CPU arch supports auto WC buffer eviction */
+ } else {
+ mlx5_write_db(bf->reg + bf->offset, seg);
+ wc_wmb();
+ }
+ bf->offset ^= bf->buf_size;
+ break;
+
+ case MLX5_DB_METHOD_DEDIC_BF:
+ /* The QP has dedicated blue-flame */
+
+ /*
+ * Make sure that descriptors are written before
+ * updating doorbell record and ringing the doorbell
+ */
+ wmb();
+ qp->gen_data.db[MLX5_SND_DBR] = htonl(curr_post);
+
+ /* This wc_wmb ensures ordering between DB record and BF copy */
+ wc_wmb();
+ if (size <= bf->buf_size / 64)
+ mlx5_bf_copy(bf->reg + bf->offset, seg,
+ size * 64, qp);
+ else
+ mlx5_write_db(bf->reg + bf->offset, seg);
+ /*
+ * use wc_wmb to ensure write combining buffers are flushed out
+ * of the running CPU. This must be carried inside the spinlock.
+ * Otherwise, there is a potential race. In the race, CPU A
+ * writes doorbell 1, which is waiting in the WC buffer. CPU B
+		 * writes doorbell 2, and its write is flushed earlier. Since
+ * the wc_wmb is CPU local, this will result in the HCA seeing
+ * doorbell 2, followed by doorbell 1.
+ */
+ wc_wmb();
+ bf->offset ^= bf->buf_size;
+ break;
+
+ case MLX5_DB_METHOD_BF:
+ /* The QP has blue-flame that may be shared by other QPs */
+
+ /*
+ * Make sure that descriptors are written before
+ * updating doorbell record and ringing the doorbell
+ */
+ wmb();
+ qp->gen_data.db[MLX5_SND_DBR] = htonl(curr_post);
+
+ /* This wc_wmb ensures ordering between DB record and BF copy */
+ wc_wmb();
+ mlx5_lock(&bf->lock);
+ if (size <= bf->buf_size / 64)
+ mlx5_bf_copy(bf->reg + bf->offset, seg,
+ size * 64, qp);
+ else
+ mlx5_write_db(bf->reg + bf->offset, seg);
+ /*
+ * use wc_wmb to ensure write combining buffers are flushed out
+ * of the running CPU. This must be carried inside the spinlock.
+ * Otherwise, there is a potential race. In the race, CPU A
+ * writes doorbell 1, which is waiting in the WC buffer. CPU B
+		 * writes doorbell 2, and its write is flushed earlier. Since
+ * the wc_wmb is CPU local, this will result in the HCA seeing
+ * doorbell 2, followed by doorbell 1.
+ */
+ wc_wmb();
+ bf->offset ^= bf->buf_size;
+ mlx5_unlock(&bf->lock);
+ break;
+
+ case MLX5_DB_METHOD_DB:
+ /* doorbell mapped to non-cached memory */
+
+ /*
+ * Make sure that descriptors are written before
+ * updating doorbell record and ringing the doorbell
+ */
+ wmb();
+ qp->gen_data.db[MLX5_SND_DBR] = htonl(curr_post);
+
+ /* This wmb ensures ordering between DB record and DB ringing */
+ wmb();
+ mlx5_write64((__be32 *)seg, bf->reg + bf->offset, &bf->lock);
+ break;
+ }
+
+ return 0;
+}
+
+static inline int __mlx5_post_send(struct ibv_qp *ibqp, struct ibv_exp_send_wr *wr,
+ struct ibv_exp_send_wr **bad_wr, int is_exp_wr) __attribute__((always_inline));
+static inline int __mlx5_post_send(struct ibv_qp *ibqp, struct ibv_exp_send_wr *wr,
+ struct ibv_exp_send_wr **bad_wr, int is_exp_wr)
+{
+ struct mlx5_qp *qp = to_mqp(ibqp);
+ void *uninitialized_var(seg);
+ int nreq;
+ int err = 0;
+ int size;
+ unsigned idx;
+ uint64_t exp_send_flags;
+#ifdef MLX5_DEBUG
+ FILE *fp = to_mctx(ibqp->context)->dbg_fp;
+#endif
+
+
+ mlx5_lock(&qp->sq.lock);
+
+ for (nreq = 0; wr; ++nreq, wr = wr->next) {
+ idx = qp->gen_data.scur_post & (qp->sq.wqe_cnt - 1);
+ seg = mlx5_get_send_wqe(qp, idx);
+
+ exp_send_flags = is_exp_wr ? wr->exp_send_flags : ((struct ibv_send_wr *)wr)->send_flags;
+
+ if (unlikely(!(qp->gen_data.create_flags & IBV_EXP_QP_CREATE_IGNORE_SQ_OVERFLOW) &&
+ mlx5_wq_overflow(&qp->sq, nreq, qp))) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "work queue overflow\n");
+ errno = ENOMEM;
+ err = errno;
+ *bad_wr = wr;
+ goto out;
+ }
+
+ if (unlikely(wr->num_sge > qp->sq.max_gs)) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "max gs exceeded %d (max = %d)\n",
+ wr->num_sge, qp->sq.max_gs);
+ errno = ENOMEM;
+ err = errno;
+ *bad_wr = wr;
+ goto out;
+ }
+
+
+
+ err = qp->gen_data.post_send_one(wr, qp, exp_send_flags, seg, &size);
+ if (unlikely(err)) {
+ errno = err;
+ *bad_wr = wr;
+ goto out;
+ }
+
+
+
+ qp->sq.wrid[idx] = wr->wr_id;
+ qp->gen_data.wqe_head[idx] = qp->sq.head + nreq;
+ qp->gen_data.scur_post += DIV_ROUND_UP(size * 16, MLX5_SEND_WQE_BB);
+
+#ifdef MLX5_DEBUG
+ if (mlx5_debug_mask & MLX5_DBG_QP_SEND)
+ dump_wqe(to_mctx(ibqp->context)->dbg_fp, idx, size, qp);
+#endif
+ }
+
+out:
+ if (likely(nreq)) {
+ qp->sq.head += nreq;
+
+ if (unlikely(qp->gen_data.create_flags & IBV_EXP_QP_CREATE_MANAGED_SEND)) {
+ /* Controlled qp */
+ wmb();
+ goto post_send_no_db;
+ }
+
+ __ring_db(qp, qp->gen_data.bf->db_method, qp->gen_data.scur_post & 0xffff, seg, (size + 3) / 4);
+ }
+
+post_send_no_db:
+
+ mlx5_unlock(&qp->sq.lock);
+
+ return err;
+}
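In __mlx5_post_send() above, each post_send_one() callback reports the WQE size in 16-byte units, and scur_post then advances by DIV_ROUND_UP(size * 16, MLX5_SEND_WQE_BB), the number of 64-byte basic blocks the WQE occupies. A tiny sketch of that arithmetic (the `SEND_WQE_BB` constant and `wqe_bb_count` helper are illustrative; 64 is the mlx5 basic-block size):

```c
#include <assert.h>

/* SEND_WQE_BB stands in for MLX5_SEND_WQE_BB: the 64-byte basic
 * block in which send WQEs are allocated. */
#define SEND_WQE_BB 64

/* size16 is the WQE size in 16-byte units, as returned via
 * *total_size by the post_send_one() helpers. */
static int wqe_bb_count(int size16)
{
	/* DIV_ROUND_UP(size16 * 16, SEND_WQE_BB) */
	return (size16 * 16 + SEND_WQE_BB - 1) / SEND_WQE_BB;
}
```

So a minimal WQE of up to four 16-byte segments consumes one basic block, and a fifth segment spills into a second block.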
+
+int mlx5_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
+ struct ibv_send_wr **bad_wr)
+{
+ return __mlx5_post_send(ibqp, (struct ibv_exp_send_wr *)wr,
+ (struct ibv_exp_send_wr **)bad_wr, 0);
+}
+
+int mlx5_exp_post_send(struct ibv_qp *ibqp, struct ibv_exp_send_wr *wr,
+ struct ibv_exp_send_wr **bad_wr)
+{
+ return __mlx5_post_send(ibqp, wr, bad_wr, 1);
+}
+
+static void set_sig_seg(struct mlx5_qp *qp, struct mlx5_rwqe_sig *sig,
+ int size, uint16_t idx)
+{
+ uint8_t sign;
+ uint32_t qpn = qp->verbs_qp.qp.qp_num;
+
+ sign = calc_xor(sig + 1, size);
+ sign ^= calc_xor(&qpn, 4);
+ sign ^= calc_xor(&idx, 2);
+ sig->signature = ~sign;
+}
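set_sig_seg() above builds the receive-WQE signature as the inverted byte-wise XOR of the scatter area, the QP number, and the WQE index. A minimal sketch of that computation, assuming calc_xor() is a plain byte-XOR reduction (the names `calc_xor8` and `rwqe_sig` are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Byte-wise XOR over a buffer; a sketch of the calc_xor() helper
 * used by set_sig_seg() (exact semantics assumed). */
static uint8_t calc_xor8(const void *buf, int size)
{
	const uint8_t *p = buf;
	uint8_t r = 0;
	int i;

	for (i = 0; i < size; ++i)
		r ^= p[i];
	return r;
}

/* Signature as in set_sig_seg(): XOR of the payload, the 4-byte QPN
 * and the 2-byte index, then inverted. */
static uint8_t rwqe_sig(const void *data, int size, uint32_t qpn, uint16_t idx)
{
	uint8_t s = calc_xor8(data, size);

	s ^= calc_xor8(&qpn, 4);
	s ^= calc_xor8(&idx, 2);
	return (uint8_t)~s;
}
```

Because XOR is order-independent, the byte order of the QPN and index fields does not change the resulting signature.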
+
+int mlx5_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr,
+ struct ibv_recv_wr **bad_wr)
+{
+ struct mlx5_qp *qp = to_mqp(ibqp);
+ struct mlx5_wqe_data_seg *scat;
+ int err = 0;
+ int nreq;
+ int ind;
+ int i, j;
+ struct mlx5_rwqe_sig *sig;
+ int sigsz;
+#ifdef MLX5_DEBUG
+ FILE *fp = to_mctx(ibqp->context)->dbg_fp;
+#endif
+
+ mlx5_lock(&qp->rq.lock);
+
+ ind = qp->rq.head & (qp->rq.wqe_cnt - 1);
+
+ for (nreq = 0; wr; ++nreq, wr = wr->next) {
+ if (unlikely(!(qp->gen_data.create_flags & IBV_EXP_QP_CREATE_IGNORE_RQ_OVERFLOW) &&
+ mlx5_wq_overflow(&qp->rq, nreq, qp))) {
+ errno = ENOMEM;
+ err = errno;
+ *bad_wr = wr;
+ goto out;
+ }
+
+ if (unlikely(wr->num_sge > qp->rq.max_gs)) {
+ errno = EINVAL;
+ err = errno;
+ *bad_wr = wr;
+ goto out;
+ }
+
+ scat = get_recv_wqe(&qp->rq, ind);
+ sig = (struct mlx5_rwqe_sig *)scat;
+ if (unlikely(qp->ctrl_seg.wq_sig))
+ ++scat;
+
+ for (i = 0, j = 0; i < wr->num_sge; ++i) {
+ if (unlikely(!wr->sg_list[i].length))
+ continue;
+ if (unlikely(set_data_ptr_seg(scat + j++,
+ wr->sg_list + i, qp, 0))) {
+ mlx5_dbg(fp, MLX5_DBG_QP_SEND, "failed allocating memory for global lkey structure\n");
+ errno = ENOMEM;
+ err = -1;
+ *bad_wr = wr;
+ goto out;
+ }
+ }
+
+ if (j < qp->rq.max_gs) {
+ scat[j].byte_count = 0;
+ scat[j].lkey = htonl(MLX5_INVALID_LKEY);
+ scat[j].addr = 0;
+ }
+
+ if (unlikely(qp->ctrl_seg.wq_sig)) {
+ sigsz = min(wr->num_sge, (1 << (qp->rq.wqe_shift - 4)) - 1);
+
+ set_sig_seg(qp, sig, sigsz << 4, qp->rq.head + nreq);
+ }
+
+ qp->rq.wrid[ind] = wr->wr_id;
+
+ ind = (ind + 1) & (qp->rq.wqe_cnt - 1);
+ }
+
+out:
+ if (likely(nreq)) {
+ qp->rq.head += nreq;
+
+ /*
+ * Make sure that descriptors are written before
+ * doorbell record.
+ */
+ wmb();
+
+ if (likely(!(ibqp->qp_type == IBV_QPT_RAW_ETH &&
+ ibqp->state < IBV_QPS_RTR)))
+ qp->gen_data.db[MLX5_RCV_DBR] = htonl(qp->rq.head & 0xffff);
+ }
+
+ mlx5_unlock(&qp->rq.lock);
+
+ return err;
+}
+
+int mlx5_use_huge(struct ibv_context *context, const char *key)
+{
+ char env[VERBS_MAX_ENV_VAL];
+
+ if (!ibv_exp_cmd_getenv(context, key, env, sizeof(env)) &&
+ !strcmp(env, "y"))
+ return 1;
+
+ return 0;
+}
+
+void *mlx5_find_rsc(struct mlx5_context *ctx, uint32_t rsn)
+{
+ int tind = rsn >> MLX5_QP_TABLE_SHIFT;
+
+ if (ctx->rsc_table[tind].refcnt)
+ return ctx->rsc_table[tind].table[rsn & MLX5_QP_TABLE_MASK];
+ else
+ return NULL;
+}
+
+int mlx5_store_rsc(struct mlx5_context *ctx, uint32_t rsn, void *rsc)
+{
+ int tind = rsn >> MLX5_QP_TABLE_SHIFT;
+
+ if (!ctx->rsc_table[tind].refcnt) {
+ ctx->rsc_table[tind].table = calloc(MLX5_QP_TABLE_MASK + 1,
+ sizeof(void *));
+ if (!ctx->rsc_table[tind].table)
+ return -1;
+ }
+
+ ++ctx->rsc_table[tind].refcnt;
+ ctx->rsc_table[tind].table[rsn & MLX5_QP_TABLE_MASK] = rsc;
+ return 0;
+}
+
+void mlx5_clear_rsc(struct mlx5_context *ctx, uint32_t rsn)
+{
+ int tind = rsn >> MLX5_QP_TABLE_SHIFT;
+
+ if (!--ctx->rsc_table[tind].refcnt)
+ free(ctx->rsc_table[tind].table);
+ else
+ ctx->rsc_table[tind].table[rsn & MLX5_QP_TABLE_MASK] = NULL;
+}
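The three functions above keep per-context resources in a two-level table: the top level is indexed by the high bits of the resource number, each second-level page is allocated lazily on first store, reference-counted, and freed when its last entry is cleared. A standalone sketch of that shape (names `tbl_dir`, `tbl_store`/`tbl_find`/`tbl_clear`, and the shift value are illustrative, not the driver's actual constants):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* TBL_SHIFT/TBL_MASK play the role of MLX5_QP_TABLE_SHIFT/_MASK
 * (values assumed); keys are 24-bit resource numbers. */
#define TBL_SHIFT 12
#define TBL_MASK  ((1u << TBL_SHIFT) - 1)
#define TBL_DIRS  (1u << (24 - TBL_SHIFT))

struct tbl_dir {
	int refcnt;
	void **table;
};

/* Lazily allocate the second level on first store and refcount it,
 * mirroring mlx5_store_rsc(). */
static int tbl_store(struct tbl_dir *dir, uint32_t key, void *val)
{
	struct tbl_dir *d = &dir[key >> TBL_SHIFT];

	if (!d->refcnt) {
		d->table = calloc(TBL_MASK + 1, sizeof(void *));
		if (!d->table)
			return -1;
	}
	++d->refcnt;
	d->table[key & TBL_MASK] = val;
	return 0;
}

/* Mirrors mlx5_find_rsc(): a page with refcnt 0 has no entries. */
static void *tbl_find(struct tbl_dir *dir, uint32_t key)
{
	struct tbl_dir *d = &dir[key >> TBL_SHIFT];

	return d->refcnt ? d->table[key & TBL_MASK] : NULL;
}

/* Mirrors mlx5_clear_rsc(): free the page when the last entry goes. */
static void tbl_clear(struct tbl_dir *dir, uint32_t key)
{
	struct tbl_dir *d = &dir[key >> TBL_SHIFT];

	if (!--d->refcnt)
		free(d->table);
	else
		d->table[key & TBL_MASK] = NULL;
}
```

The refcount counts live entries per page, so lookups never need to distinguish "page absent" from "entry absent" beyond the refcnt check.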
+
+int mlx5_post_task(struct ibv_context *context,
+ struct ibv_exp_task *task_list,
+ struct ibv_exp_task **bad_task)
+{
+ int rc = 0;
+ struct ibv_exp_task *cur_task = NULL;
+ struct ibv_exp_send_wr *bad_wr;
+ struct mlx5_context *mlx5_ctx = to_mctx(context);
+
+ if (!task_list)
+ return rc;
+
+ pthread_mutex_lock(&mlx5_ctx->task_mutex);
+
+ cur_task = task_list;
+ while (!rc && cur_task) {
+
+ switch (cur_task->task_type) {
+ case IBV_EXP_TASK_SEND:
+ rc = ibv_exp_post_send(cur_task->item.qp,
+ cur_task->item.send_wr,
+ &bad_wr);
+ break;
+
+ case IBV_EXP_TASK_RECV:
+ rc = ibv_post_recv(cur_task->item.qp,
+ cur_task->item.recv_wr,
+ NULL);
+ break;
+
+ default:
+ rc = -1;
+ }
+
+ if (rc && bad_task) {
+ *bad_task = cur_task;
+ break;
+ }
+
+ cur_task = cur_task->next;
+ }
+
+ pthread_mutex_unlock(&mlx5_ctx->task_mutex);
+
+ return rc;
+}
+
+/*
+ * family interfaces functions
+ */
+
+/*
+ * send_pending - a general post-send function that puts one message in
+ * the send queue. The function does not ring the QP doorbell.
+ *
+ * The user may call this function several times to fill the send queue
+ * with several messages, then call send_flush to ring the QP DB.
+ *
+ * This function is used to implement the following QP burst family functions:
+ * - send_pending
+ * - send_pending_inline
+ * - send_pending_sg_list
+ * - send_burst
+ */
+
+static inline int send_pending(struct ibv_qp *ibqp, uint64_t addr,
+ uint32_t length, uint32_t lkey,
+ uint32_t flags,
+ const int use_raw_eth, const int use_inl,
+ const int thread_safe, const int use_sg_list,
+ const int use_mpw,
+ const int num_sge, struct ibv_sge *sg_list) __attribute__((always_inline));
+static inline int send_pending(struct ibv_qp *ibqp, uint64_t addr,
+ uint32_t length, uint32_t lkey,
+ uint32_t flags,
+ const int use_raw_eth, const int use_inl,
+ const int thread_safe, const int use_sg_list,
+ const int use_mpw,
+ const int num_sge, struct ibv_sge *sg_list)
+
+{
+ struct mlx5_wqe_inline_seg *uninitialized_var(inl_seg);
+ struct mlx5_wqe_data_seg *uninitialized_var(dseg);
+ uint8_t *uninitialized_var(inl_data);
+ uint32_t *uninitialized_var(start);
+ struct mlx5_qp *qp = to_mqp(ibqp);
+ int uninitialized_var(size);
+ uint8_t fm_ce_se;
+ int i;
+
+ if (thread_safe)
+ mlx5_lock(&qp->sq.lock);
+
+ if (use_mpw) {
+ uint32_t msg_size, n_sg;
+
+ if (use_sg_list) {
+ msg_size = 0;
+ for (i = 0; i < num_sge; i++)
+ msg_size += sg_list[i].length;
+ n_sg = num_sge;
+ } else {
+ msg_size = length;
+ n_sg = 1;
+ }
+ if (use_inl &&
+ (qp->mpw.state == MLX5_MPW_STATE_OPENED_INL) &&
+ (qp->mpw.len == msg_size) &&
+ ((qp->mpw.flags & ~IBV_EXP_QP_BURST_SIGNALED) ==
+ (flags & ~IBV_EXP_QP_BURST_SIGNALED)) &&
+ ((qp->mpw.total_len + msg_size) <= qp->data_seg.max_inline_data)) {
+ /* Add current message to opened inline multi-packet WQE */
+ inl_seg = (struct mlx5_wqe_inline_seg *)(qp->mpw.ctrl_update + 7);
+ inl_data = qp->mpw.inl_data + qp->mpw.len;
+ if (unlikely((void *)inl_data >= qp->gen_data.sqend))
+ inl_data = (uint8_t *)mlx5_get_send_wqe(qp, 0) +
+ (inl_data - (uint8_t *)qp->gen_data.sqend);
+ qp->mpw.total_len += msg_size;
+ } else if (!use_inl &&
+ (qp->mpw.state == MLX5_MPW_STATE_OPENED) &&
+ (qp->mpw.len == msg_size) &&
+ ((qp->mpw.flags & ~IBV_EXP_QP_BURST_SIGNALED) ==
+ (flags & ~IBV_EXP_QP_BURST_SIGNALED)) &&
+ (qp->mpw.num_sge + n_sg) <= MLX5_MAX_MPW_SGE) {
+ /* Add current message to opened multi-packet WQE */
+ dseg = qp->mpw.last_dseg + 1;
+ if (unlikely(dseg == qp->gen_data.sqend))
+ dseg = mlx5_get_send_wqe(qp, 0);
+ size = 0;
+ qp->mpw.num_sge += n_sg;
+ } else if (likely(use_inl || (msg_size <= MLX5_MAX_MPW_SIZE))) {
+ /* Open new multi-packet WQE
+ *
+ * In case of inline the user must make sure that
+ * message size is smaller than max_inline which
+ * means that it is also smaller than MLX5_MAX_MPW_SIZE
+ * This guarantees that we can open multi-packet WQE.
+ * In case of non-inline we must check that msg_size is
+ * smaller than MLX5_MAX_MPW_SIZE.
+ */
+
+ qp->mpw.state = MLX5_MPW_STATE_OPENING;
+ qp->mpw.len = msg_size;
+ qp->mpw.num_sge = n_sg;
+ qp->mpw.flags = flags;
+ qp->mpw.scur_post = qp->gen_data.scur_post;
+ qp->mpw.total_len = msg_size;
+ } else {
+			/* We can't open a new multi-packet WQE
+			 * since msg_size > MLX5_MAX_MPW_SIZE
+			 */
+ qp->mpw.state = MLX5_MPW_STATE_CLOSED;
+ }
+ } else {
+ /* Close multi-packet WQE */
+ qp->mpw.state = MLX5_MPW_STATE_CLOSED;
+ }
+
+ if (use_sg_list) {
+ addr = sg_list[0].addr;
+ length = sg_list[0].length;
+ lkey = sg_list[0].lkey;
+ }
+
+ /* Start new WQE if there is no open multi-packet WQE */
+ if ((use_inl && (qp->mpw.state != MLX5_MPW_STATE_OPENED_INL)) ||
+ (!use_inl && (qp->mpw.state != MLX5_MPW_STATE_OPENED))) {
+ start = mlx5_get_send_wqe(qp, qp->gen_data.scur_post & (qp->sq.wqe_cnt - 1));
+
+ if (use_raw_eth) {
+ struct mlx5_wqe_eth_seg *eseg;
+
+ eseg = (struct mlx5_wqe_eth_seg *)(((char *)start) +
+ sizeof(struct mlx5_wqe_ctrl_seg));
+ /* reset rsvd0, cs_flags, rsvd1, mss and rsvd2 fields */
+ *((uint64_t *)eseg) = 0;
+ eseg->rsvd2 = 0;
+
+ if (flags & IBV_EXP_QP_BURST_IP_CSUM)
+ eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM;
+ if (use_mpw && (qp->mpw.state == MLX5_MPW_STATE_OPENING)) {
+ eseg->mss = htons(qp->mpw.len);
+ eseg->inline_hdr_sz = 0;
+ size = (sizeof(struct mlx5_wqe_ctrl_seg) +
+ offsetof(struct mlx5_wqe_eth_seg, inline_hdr)) / 16;
+ if (use_inl) {
+ inl_seg = (struct mlx5_wqe_inline_seg *)(start +
+ (size * 4));
+ inl_data = (uint8_t *)(inl_seg + 1);
+ } else {
+ dseg = (struct mlx5_wqe_data_seg *)(start +
+ (size * 4));
+ }
+ } else {
+ eseg->inline_hdr_sz = htons(MLX5_ETH_INLINE_HEADER_SIZE);
+
+			/* We don't support a header divided across several SGEs */
+			if (unlikely(length <= MLX5_ETH_INLINE_HEADER_SIZE)) {
+				/* Release the lock taken above before bailing out */
+				if (thread_safe)
+					mlx5_unlock(&qp->sq.lock);
+				return EINVAL;
+			}
+
+			/* Copy the first MLX5_ETH_INLINE_HEADER_SIZE bytes into the inline header */
+ memcpy(eseg->inline_hdr_start, (void *)(uintptr_t)addr,
+ MLX5_ETH_INLINE_HEADER_SIZE);
+ addr += MLX5_ETH_INLINE_HEADER_SIZE;
+ length -= MLX5_ETH_INLINE_HEADER_SIZE;
+ size = (sizeof(struct mlx5_wqe_ctrl_seg) +
+ sizeof(struct mlx5_wqe_eth_seg)) / 16;
+ dseg = (struct mlx5_wqe_data_seg *)(++eseg);
+ }
+ } else {
+ size = sizeof(struct mlx5_wqe_ctrl_seg) / 16;
+ dseg = (struct mlx5_wqe_data_seg *)(((char *)start) + sizeof(struct mlx5_wqe_ctrl_seg));
+ }
+ }
+
+ if (use_inl) {
+ if (use_mpw) {
+ if (unlikely((inl_data + qp->mpw.len) >
+ (uint8_t *)qp->gen_data.sqend)) {
+ int size2end = ((uint8_t *)qp->gen_data.sqend - inl_data);
+
+ memcpy(inl_data, (void *)(uintptr_t)addr, size2end);
+ memcpy(mlx5_get_send_wqe(qp, 0),
+ (void *)(uintptr_t)(addr + size2end),
+ qp->mpw.len - size2end);
+
+ } else {
+ memcpy(inl_data, (void *)(uintptr_t)addr, qp->mpw.len);
+ }
+ inl_seg->byte_count = htonl(qp->mpw.total_len | MLX5_INLINE_SEG);
+ size = (sizeof(struct mlx5_wqe_ctrl_seg) +
+ offsetof(struct mlx5_wqe_eth_seg, inline_hdr)) / 16;
+ size += align(qp->mpw.total_len + sizeof(inl_seg->byte_count), 16) / 16;
+ } else {
+ struct ibv_sge sg_list = {addr, length, 0};
+
+ set_data_inl_seg(qp, 1, &sg_list, dseg, &size, 0, 0);
+ }
+ } else {
+ size += sizeof(struct mlx5_wqe_data_seg) / 16;
+ dseg->byte_count = htonl(length);
+ dseg->lkey = htonl(lkey);
+ dseg->addr = htonll(addr);
+ }
+
+	/* Inline is never used with an sg list; post the remaining SGEs as data segments */
+ if (use_sg_list) {
+ for (i = 0; i < num_sge - 1; ++i) {
+ sg_list++;
+ if (likely(sg_list->length)) {
+ dseg++;
+ if (unlikely(dseg == qp->gen_data.sqend))
+ dseg = mlx5_get_send_wqe(qp, 0);
+ size += sizeof(struct mlx5_wqe_data_seg) / 16;
+ dseg->byte_count = htonl(sg_list->length);
+ dseg->lkey = htonl(sg_list->lkey);
+ dseg->addr = htonll(sg_list->addr);
+ }
+ }
+ }
+ if (use_mpw) {
+ if (use_inl)
+ qp->mpw.inl_data = inl_data;
+ else
+ qp->mpw.last_dseg = dseg;
+ }
+
+ if ((use_inl && (qp->mpw.state != MLX5_MPW_STATE_OPENED_INL)) ||
+ (!use_inl && (qp->mpw.state != MLX5_MPW_STATE_OPENED))) {
+ /* Fill ctrl-segment of a new WQE */
+ fm_ce_se = qp->ctrl_seg.fm_ce_se_acc[flags & (IBV_EXP_QP_BURST_SOLICITED |
+ IBV_EXP_QP_BURST_SIGNALED |
+ IBV_EXP_QP_BURST_FENCE)];
+ if (unlikely(qp->gen_data.fm_cache)) {
+ if (unlikely(flags & IBV_SEND_FENCE))
+ fm_ce_se |= MLX5_FENCE_MODE_SMALL_AND_FENCE;
+ else
+ fm_ce_se |= qp->gen_data.fm_cache;
+ qp->gen_data.fm_cache = 0;
+ }
+
+ if (likely(use_mpw && (qp->mpw.state == MLX5_MPW_STATE_OPENING))) {
+ *start++ = htonl((MLX5_OPC_MOD_MPW << 24) |
+ ((qp->gen_data.scur_post & 0xFFFF) << 8) |
+ MLX5_OPCODE_LSO_MPW);
+ qp->mpw.ctrl_update = start;
+ if ((flags & IBV_EXP_QP_BURST_SIGNALED) ||
+ (qp->mpw.num_sge >= MLX5_MAX_MPW_SGE)) {
+ qp->mpw.state = MLX5_MPW_STATE_CLOSED;
+ } else {
+ if (use_inl)
+ qp->mpw.state = MLX5_MPW_STATE_OPENED_INL;
+ else
+ qp->mpw.state = MLX5_MPW_STATE_OPENED;
+ qp->mpw.size = size;
+ }
+ } else {
+ *start++ = htonl((qp->gen_data.scur_post & 0xFFFF) << 8 |
+ MLX5_OPCODE_SEND);
+ }
+ *start++ = htonl(qp->ctrl_seg.qp_num << 8 | (size & 0x3F));
+ *start++ = htonl(fm_ce_se);
+ *start = 0;
+
+ qp->gen_data.wqe_head[qp->gen_data.scur_post & (qp->sq.wqe_cnt - 1)] = ++(qp->sq.head);
+		/* Update last_post to point to the position of the new WQE */
+ qp->gen_data.last_post = qp->gen_data.scur_post;
+ qp->gen_data.scur_post += DIV_ROUND_UP(size * 16, MLX5_SEND_WQE_BB);
+ } else {
+		/* Update the multi-packet WQE ctrl-segment */
+ if (use_inl)
+ qp->mpw.size = size;
+ else
+ qp->mpw.size += size;
+ *qp->mpw.ctrl_update = htonl(qp->ctrl_seg.qp_num << 8 | ((qp->mpw.size) & 0x3F));
+ qp->gen_data.scur_post = qp->mpw.scur_post + DIV_ROUND_UP(qp->mpw.size * 16, MLX5_SEND_WQE_BB);
+ if (flags & IBV_EXP_QP_BURST_SIGNALED) {
+ *(qp->mpw.ctrl_update + 1) |= htonl(MLX5_WQE_CTRL_CQ_UPDATE);
+ qp->mpw.state = MLX5_MPW_STATE_CLOSED;
+ } else if (unlikely(qp->mpw.num_sge == MLX5_MAX_MPW_SGE)) {
+ qp->mpw.state = MLX5_MPW_STATE_CLOSED;
+ }
+ }
+
+ if (thread_safe)
+ mlx5_unlock(&qp->sq.lock);
+
+ return 0;
+}
+
+/* burst family - send_pending */
+static int mlx5_send_pending_safe(struct ibv_qp *qp, uint64_t addr,
+ uint32_t length, uint32_t lkey,
+ uint32_t flags) __MLX5_ALGN_F__;
+static int mlx5_send_pending_safe(struct ibv_qp *qp, uint64_t addr,
+ uint32_t length, uint32_t lkey,
+ uint32_t flags)
+{
+ struct mlx5_qp *mqp = to_mqp(qp);
+ int raw_eth = mqp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET &&
+ mqp->link_layer == IBV_LINK_LAYER_ETHERNET;
+
+ /* qp, addr, length, lkey, flags, raw_eth, inl, safe, */
+ return send_pending(qp, addr, length, lkey, flags, raw_eth, 0, 1,
+ /* use_sg, use_mpw, num_sge, sg_list */
+ 0, 0, 0, NULL);
+}
+
+static int mlx5_send_pending_mpw_safe(struct ibv_qp *qp, uint64_t addr,
+ uint32_t length, uint32_t lkey,
+ uint32_t flags) __MLX5_ALGN_F__;
+static int mlx5_send_pending_mpw_safe(struct ibv_qp *qp, uint64_t addr,
+ uint32_t length, uint32_t lkey,
+ uint32_t flags)
+{
+ struct mlx5_qp *mqp = to_mqp(qp);
+ int raw_eth = mqp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET &&
+ mqp->link_layer == IBV_LINK_LAYER_ETHERNET;
+
+ /* qp, addr, length, lkey, flags, raw_eth, inl, safe, */
+ return send_pending(qp, addr, length, lkey, flags, raw_eth, 0, 1,
+ /* use_sg, use_mpw, num_sge, sg_list */
+ 0, 1, 0, NULL);
+}
+
+#define MLX5_SEND_PENDING_UNSAFE_NAME(eth, mpw) mlx5_send_pending_unsafe_##eth##mpw
+#define MLX5_SEND_PENDING_UNSAFE(eth, mpw) \
+ static int MLX5_SEND_PENDING_UNSAFE_NAME(eth, mpw)( \
+ struct ibv_qp *qp, uint64_t addr, \
+ uint32_t length, uint32_t lkey, \
+ uint32_t flags) __MLX5_ALGN_F__; \
+ static int MLX5_SEND_PENDING_UNSAFE_NAME(eth, mpw)( \
+ struct ibv_qp *qp, uint64_t addr, \
+ uint32_t length, uint32_t lkey, \
+ uint32_t flags) \
+ { \
+ /* qp, addr, length, lkey, flags, eth, inl, */ \
+ return send_pending(qp, addr, length, lkey, flags, eth, 0, \
+ /* safe, use_sg, use_mpw, num_sge, sg_list */ \
+ 0, 0, mpw, 0, NULL); \
+ }
+/* eth mpw */
+MLX5_SEND_PENDING_UNSAFE(0, 0);
+MLX5_SEND_PENDING_UNSAFE(0, 1);
+MLX5_SEND_PENDING_UNSAFE(1, 0);
+MLX5_SEND_PENDING_UNSAFE(1, 1);
+
+/* burst family - send_pending_inline */
+static int mlx5_send_pending_inl_safe(struct ibv_qp *qp, void *addr,
+ uint32_t length, uint32_t flags) __MLX5_ALGN_F__;
+static int mlx5_send_pending_inl_safe(struct ibv_qp *qp, void *addr,
+ uint32_t length, uint32_t flags)
+{
+ struct mlx5_qp *mqp = to_mqp(qp);
+ int raw_eth = mqp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET &&
+ mqp->link_layer == IBV_LINK_LAYER_ETHERNET;
+
+ /* qp, addr, length, lkey, flags, raw_eth, */
+ return send_pending(qp, (uintptr_t)addr, length, 0, flags, raw_eth,
+ /* inl, safe, use_sg, use_mpw, num_sge, sg_list */
+ 1, 1, 0, 0, 0, NULL);
+}
+
+static int mlx5_send_pending_inl_mpw_safe(struct ibv_qp *qp, void *addr,
+ uint32_t length, uint32_t flags) __MLX5_ALGN_F__;
+static int mlx5_send_pending_inl_mpw_safe(struct ibv_qp *qp, void *addr,
+ uint32_t length, uint32_t flags)
+{
+ struct mlx5_qp *mqp = to_mqp(qp);
+ int raw_eth = mqp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET &&
+ mqp->link_layer == IBV_LINK_LAYER_ETHERNET;
+
+ /* qp, addr, length, lkey, flags, raw_eth, */
+ return send_pending(qp, (uintptr_t)addr, length, 0, flags, raw_eth,
+ /* inl, safe, use_sg, use_mpw, num_sge, sg_list */
+ 1, 1, 0, 1, 0, NULL);
+}
+
+#define MLX5_SEND_PENDING_INL_UNSAFE_NAME(eth, mpw) mlx5_send_pending_inl_unsafe_##eth##mpw
+#define MLX5_SEND_PENDING_INL_UNSAFE(eth, mpw) \
+ static int MLX5_SEND_PENDING_INL_UNSAFE_NAME(eth, mpw)( \
+ struct ibv_qp *qp, void *addr, \
+ uint32_t length, uint32_t flags) __MLX5_ALGN_F__; \
+ static int MLX5_SEND_PENDING_INL_UNSAFE_NAME(eth, mpw)( \
+ struct ibv_qp *qp, void *addr, \
+ uint32_t length, uint32_t flags) \
+ { \
+ /* qp, addr, length, lkey, flags, eth, inl, */ \
+ return send_pending(qp, (uintptr_t)addr, length, 0, flags, eth, 1, \
+ /* safe, use_sg, use_mpw, num_sge, sg_list */ \
+ 0, 0, mpw, 0, NULL); \
+ }
+/* eth mpw */
+MLX5_SEND_PENDING_INL_UNSAFE(0, 0);
+MLX5_SEND_PENDING_INL_UNSAFE(0, 1);
+MLX5_SEND_PENDING_INL_UNSAFE(1, 0);
+MLX5_SEND_PENDING_INL_UNSAFE(1, 1);
+
+/* burst family - send_pending_sg_list */
+static int mlx5_send_pending_sg_list_safe(
+ struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num,
+ uint32_t flags) __MLX5_ALGN_F__;
+static int mlx5_send_pending_sg_list_safe(
+ struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num,
+ uint32_t flags)
+{
+ struct mlx5_qp *mqp = to_mqp(ibqp);
+ int raw_eth = mqp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET && mqp->link_layer == IBV_LINK_LAYER_ETHERNET;
+
+ /* qp, addr, length, lkey, flags, raw_eth, inl, */
+ return send_pending(ibqp, 0, 0, 0, flags, raw_eth, 0,
+ /* safe, use_sg, use_mpw, num_sge, sg_list */
+ 1, 1, 0, num, sg_list);
+}
+
+static int mlx5_send_pending_sg_list_mpw_safe(
+ struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num,
+ uint32_t flags) __MLX5_ALGN_F__;
+static int mlx5_send_pending_sg_list_mpw_safe(
+ struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num,
+ uint32_t flags)
+{
+ struct mlx5_qp *mqp = to_mqp(ibqp);
+ int raw_eth = mqp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET && mqp->link_layer == IBV_LINK_LAYER_ETHERNET;
+
+ /* qp, addr, length, lkey, flags, raw_eth, inl, */
+ return send_pending(ibqp, 0, 0, 0, flags, raw_eth, 0,
+ /* safe, use_sg, use_mpw, num_sge, sg_list */
+ 1, 1, 1, num, sg_list);
+}
+
+#define MLX5_SEND_PENDING_SG_LIST_UNSAFE_NAME(eth, mpw) mlx5_send_pending_sg_list_unsafe_##eth##mpw
+#define MLX5_SEND_PENDING_SG_LIST_UNSAFE(eth, mpw) \
+ static int MLX5_SEND_PENDING_SG_LIST_UNSAFE_NAME(eth, mpw)( \
+ struct ibv_qp *ibqp, struct ibv_sge *sg_list, \
+ uint32_t num, uint32_t flags) __MLX5_ALGN_F__; \
+ static int MLX5_SEND_PENDING_SG_LIST_UNSAFE_NAME(eth, mpw)( \
+ struct ibv_qp *ibqp, struct ibv_sge *sg_list, \
+ uint32_t num, uint32_t flags) \
+ { \
+ /* qp, addr, length, lkey, flags, eth, inl, */ \
+ return send_pending(ibqp, 0, 0, 0, flags, eth, 0, \
+ /* safe, use_sg, use_mpw, num_sge, sg_list */ \
+ 0, 1, mpw, num, sg_list); \
+ }
+/* eth mpw */
+MLX5_SEND_PENDING_SG_LIST_UNSAFE(0, 0);
+MLX5_SEND_PENDING_SG_LIST_UNSAFE(0, 1);
+MLX5_SEND_PENDING_SG_LIST_UNSAFE(1, 0);
+MLX5_SEND_PENDING_SG_LIST_UNSAFE(1, 1);
+
+/* burst family - send_burst */
+static inline int send_flush_unsafe(struct ibv_qp *ibqp, const int db_method) __attribute__((always_inline));
+
+static inline int send_msg_list(struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num,
+ uint32_t flags, const int raw_eth, const int thread_safe,
+ const int db_method, const int mpw) __attribute__((always_inline));
+static inline int send_msg_list(struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num,
+ uint32_t flags, const int raw_eth, const int thread_safe,
+ const int db_method, const int mpw)
+{
+ struct mlx5_qp *qp = to_mqp(ibqp);
+ int i;
+
+ if (thread_safe)
+ mlx5_lock(&qp->sq.lock);
+
+ for (i = 0; i < num; i++, sg_list++)
+ /* qp, addr, length, lkey, */
+ send_pending(ibqp, sg_list->addr, sg_list->length, sg_list->lkey,
+ /* flags, raw_eth, inl, safe, use_sg, */
+ flags, raw_eth, 0, 0, 0,
+ /* use_mpw, num_sge, sg_list */
+ mpw, 0, NULL);
+
+ /* use send_flush_unsafe since lock is already taken if needed */
+ send_flush_unsafe(ibqp, db_method);
+
+ if (thread_safe)
+ mlx5_unlock(&qp->sq.lock);
+
+ return 0;
+}
+
+static int mlx5_send_burst_safe(struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num, uint32_t flags) __MLX5_ALGN_F__;
+static int mlx5_send_burst_safe(struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num, uint32_t flags)
+{
+ struct mlx5_qp *qp = to_mqp(ibqp);
+ int eth = qp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET &&
+ qp->link_layer == IBV_LINK_LAYER_ETHERNET;
+
+ return send_msg_list(ibqp, sg_list, num, flags, eth, 1, qp->gen_data.bf->db_method, 0);
+}
+
+static int mlx5_send_burst_mpw_safe(struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num, uint32_t flags) __MLX5_ALGN_F__;
+static int mlx5_send_burst_mpw_safe(struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num, uint32_t flags)
+{
+ struct mlx5_qp *qp = to_mqp(ibqp);
+ int eth = qp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET &&
+ qp->link_layer == IBV_LINK_LAYER_ETHERNET;
+
+ return send_msg_list(ibqp, sg_list, num, flags, eth, 1, qp->gen_data.bf->db_method, 1);
+}
+
+#define MLX5_SEND_BURST_UNSAFE_NAME(db_method, eth, mpw) mlx5_send_burst_unsafe_##db_method##eth##mpw
+#define MLX5_SEND_BURST_UNSAFE(db_method, eth, mpw) \
+ static int MLX5_SEND_BURST_UNSAFE_NAME(db_method, eth, mpw)( \
+ struct ibv_qp *ibqp, struct ibv_sge *sg_list, \
+ uint32_t num, uint32_t flags) __MLX5_ALGN_F__; \
+ static int MLX5_SEND_BURST_UNSAFE_NAME(db_method, eth, mpw)( \
+ struct ibv_qp *ibqp, struct ibv_sge *sg_list, \
+ uint32_t num, uint32_t flags) \
+ { \
+ return send_msg_list(ibqp, sg_list, num, flags, eth, 0, db_method, mpw); \
+ }
+/* db_method, eth mpw */
+MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 0, 0);
+MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 0, 1);
+MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 1, 0);
+MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 1, 1);
+MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DEDIC_BF, 0, 0);
+MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DEDIC_BF, 0, 1);
+MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DEDIC_BF, 1, 0);
+MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DEDIC_BF, 1, 1);
+MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_BF, 0, 0);
+MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_BF, 0, 1);
+MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_BF, 1, 0);
+MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_BF, 1, 1);
+MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DB, 0, 0);
+MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DB, 0, 1);
+MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DB, 1, 0);
+MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DB, 1, 1);
+
+/* burst family - send_flush */
+static inline int send_flush_unsafe(struct ibv_qp *ibqp, const int db_method)
+{
+ struct mlx5_qp *qp = to_mqp(ibqp);
+ uint32_t curr_post = qp->gen_data.scur_post & 0xffff;
+ int size = ((int)curr_post - (int)qp->gen_data.last_post + (int)0x10000) & 0xffff;
+ unsigned long long *seg = mlx5_get_send_wqe(qp, qp->gen_data.last_post & (qp->sq.wqe_cnt - 1));
+
+ return __ring_db(qp, db_method, curr_post, seg, size);
+}
+
+static int mlx5_send_flush_safe(struct ibv_qp *ibqp) __MLX5_ALGN_F__;
+static int mlx5_send_flush_safe(struct ibv_qp *ibqp)
+{
+ struct mlx5_qp *qp = to_mqp(ibqp);
+
+ mlx5_lock(&qp->sq.lock);
+ send_flush_unsafe(ibqp, qp->gen_data.bf->db_method);
+ mlx5_unlock(&qp->sq.lock);
+
+ return 0;
+}
+
+#define MLX5_SEND_FLUSH_UNSAFE_NAME(db_method) mlx5_send_flush_unsafe_##db_method
+#define MLX5_SEND_FLUSH_UNSAFE(db_method) \
+ static int MLX5_SEND_FLUSH_UNSAFE_NAME(db_method)( \
+ struct ibv_qp *ibqp) __MLX5_ALGN_F__; \
+ static int MLX5_SEND_FLUSH_UNSAFE_NAME(db_method)( \
+ struct ibv_qp *ibqp) \
+ { \
+ return send_flush_unsafe(ibqp, db_method); \
+ }
+/* db_method */
+MLX5_SEND_FLUSH_UNSAFE(MLX5_DB_METHOD_DEDIC_BF_1_THREAD);
+MLX5_SEND_FLUSH_UNSAFE(MLX5_DB_METHOD_DEDIC_BF);
+MLX5_SEND_FLUSH_UNSAFE(MLX5_DB_METHOD_BF);
+MLX5_SEND_FLUSH_UNSAFE(MLX5_DB_METHOD_DB);
+
+/* burst family - recv_pending_sg_list */
+static inline int recv_sg_list(struct mlx5_wq *rq, struct ibv_sge *sg_list, uint32_t num_sg,
+ const int thread_safe) __attribute__((always_inline));
+static inline int recv_sg_list(struct mlx5_wq *rq, struct ibv_sge *sg_list, uint32_t num_sg,
+ const int thread_safe)
+{
+ struct mlx5_wqe_data_seg *scat;
+ unsigned int ind;
+ int i, j;
+
+ if (thread_safe)
+ mlx5_lock(&rq->lock);
+
+ ind = rq->head & (rq->wqe_cnt - 1);
+ scat = get_recv_wqe(rq, ind);
+
+ for (i = 0, j = 0; i < num_sg; ++i, sg_list++) {
+ if (unlikely(!sg_list->length))
+ continue;
+ scat->byte_count = htonl(sg_list->length);
+ scat->lkey = htonl(sg_list->lkey);
+ scat->addr = htonll(sg_list->addr);
+ scat++;
+ j++;
+ }
+ if (j < rq->max_gs) {
+ scat->byte_count = 0;
+ scat->lkey = htonl(MLX5_INVALID_LKEY);
+ scat->addr = 0;
+ }
+ rq->head++;
+
+ /*
+ * Make sure that descriptors are written before
+ * doorbell record.
+ */
+ wmb();
+
+ *rq->db = htonl(rq->head & 0xffff);
+
+ if (thread_safe)
+ mlx5_unlock(&rq->lock);
+
+ return 0;
+}
+
+/* burst family - recv_burst */
+static inline int recv_burst(struct mlx5_wq *rq, struct ibv_sge *sg_list, uint32_t num,
+ const int thread_safe, const int max_one_sge, const int mp_rq) __attribute__((always_inline));
+static inline int recv_burst(struct mlx5_wq *rq, struct ibv_sge *sg_list, uint32_t num,
+ const int thread_safe, const int max_one_sge, const int mp_rq)
+{
+ struct mlx5_wqe_data_seg *scat;
+ unsigned int ind;
+ int i;
+
+ if (thread_safe)
+ mlx5_lock(&rq->lock);
+
+ ind = rq->head & (rq->wqe_cnt - 1);
+ for (i = 0; i < num; ++i) {
+ scat = get_recv_wqe(rq, ind);
+		/* The Multi-Packet RQ WQE format is like the SRQ format and
+		 * requires a next-segment octword.
+		 * This next-segment octword is reserved (and therefore cleared)
+		 * when we use CYCLIC_STRIDING_RQ.
+		 */
+ if (mp_rq) {
+ memset(scat, 0, sizeof(struct mlx5_wqe_srq_next_seg));
+ scat++;
+ }
+ scat->byte_count = htonl(sg_list->length);
+ scat->lkey = htonl(sg_list->lkey);
+ scat->addr = htonll(sg_list->addr);
+
+ if (!max_one_sge) {
+ scat[1].byte_count = 0;
+ scat[1].lkey = htonl(MLX5_INVALID_LKEY);
+ scat[1].addr = 0;
+ }
+
+ sg_list++;
+ ind = (ind + 1) & (rq->wqe_cnt - 1);
+ }
+ rq->head += num;
+
+ /*
+ * Make sure that descriptors are written before
+ * doorbell record.
+ */
+ wmb();
+
+ *rq->db = htonl(rq->head & 0xffff);
+
+ if (thread_safe)
+ mlx5_unlock(&rq->lock);
+
+ return 0;
+}
+
+static int mlx5_recv_burst_safe(struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num) __MLX5_ALGN_F__;
+static int mlx5_recv_burst_safe(struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num)
+{
+ struct mlx5_qp *qp = to_mqp(ibqp);
+
+ return recv_burst(&qp->rq, sg_list, num, 1, qp->rq.max_gs == 1, 0);
+}
+
+#define MLX5_RECV_BURST_UNSAFE_NAME(_1sge) mlx5_recv_burst_unsafe_##_1sge
+#define MLX5_RECV_BURST_UNSAFE(_1sge) \
+ static int MLX5_RECV_BURST_UNSAFE_NAME(_1sge)( \
+ struct ibv_qp *ibqp, struct ibv_sge *sg_list, \
+ uint32_t num) __MLX5_ALGN_F__; \
+ static int MLX5_RECV_BURST_UNSAFE_NAME(_1sge)( \
+ struct ibv_qp *ibqp, struct ibv_sge *sg_list, \
+ uint32_t num) \
+ { \
+ return recv_burst(&to_mqp(ibqp)->rq, sg_list, num, 0, _1sge, 0); \
+ }
+/* _1sge */
+MLX5_RECV_BURST_UNSAFE(0);
+MLX5_RECV_BURST_UNSAFE(1);
+
+/*
+ * qp_burst family implementation for safe QP
+ */
+struct ibv_exp_qp_burst_family mlx5_qp_burst_family_safe = {
+ .send_burst = mlx5_send_burst_safe,
+ .send_pending = mlx5_send_pending_safe,
+ .send_pending_inline = mlx5_send_pending_inl_safe,
+ .send_pending_sg_list = mlx5_send_pending_sg_list_safe,
+ .send_flush = mlx5_send_flush_safe,
+ .recv_burst = mlx5_recv_burst_safe
+};
+
+struct ibv_exp_qp_burst_family mlx5_qp_burst_family_mpw_safe = {
+ .send_burst = mlx5_send_burst_mpw_safe,
+ .send_pending = mlx5_send_pending_mpw_safe,
+ .send_pending_inline = mlx5_send_pending_inl_mpw_safe,
+ .send_pending_sg_list = mlx5_send_pending_sg_list_mpw_safe,
+ .send_flush = mlx5_send_flush_safe,
+ .recv_burst = mlx5_recv_burst_safe
+};
+
+/*
+ * qp_burst family implementation table for unsafe QP
+ *
+ * Each table entry contains an implementation of the ibv_exp_qp_burst_family
+ * which fits QPs with specific attributes:
+ * - db_method (MLX5_DB_METHOD_DEDIC_BF_1_THREAD, MLX5_DB_METHOD_DEDIC_BF,
+ *   MLX5_DB_METHOD_BF or MLX5_DB_METHOD_DB)
+ * - raw_eth_qp (yes/no)
+ * - max-rcv-gs == 1 (yes/no)
+ * - multi-packet WQE enabled (yes/no)
+ *
+ * To get the right qp_burst_family implementation for a specific QP, use the QP
+ * attributes (db_method << 3 | eth << 2 | _1sge << 1 | mpw) as an index into the qp_burst family table
+ */
+#define MLX5_QP_BURST_UNSAFE_TBL_IDX(db_method, eth, _1sge, mpw) \
+ (db_method << 3 | eth << 2 | _1sge << 1 | mpw)
+
+#define MLX5_QP_BURST_UNSAFE_TBL_ENTRY(db_method, eth, _1sge, mpw) \
+ [MLX5_QP_BURST_UNSAFE_TBL_IDX(db_method, eth, _1sge, mpw)] = { \
+ .send_burst = MLX5_SEND_BURST_UNSAFE_NAME(db_method, eth, mpw), \
+ .send_pending = MLX5_SEND_PENDING_UNSAFE_NAME(eth, mpw), \
+ .send_pending_inline = MLX5_SEND_PENDING_INL_UNSAFE_NAME(eth, mpw), \
+ .send_pending_sg_list = MLX5_SEND_PENDING_SG_LIST_UNSAFE_NAME(eth, mpw), \
+ .send_flush = MLX5_SEND_FLUSH_UNSAFE_NAME(db_method), \
+ .recv_burst = MLX5_RECV_BURST_UNSAFE_NAME(_1sge), \
+ }
+static struct ibv_exp_qp_burst_family mlx5_qp_burst_family_unsafe_tbl[1 << 5] = {
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 0, 0, 0),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 0, 0, 1),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 0, 1, 0),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 0, 1, 1),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 1, 0, 0),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 1, 0, 1),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 1, 1, 0),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 1, 1, 1),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF, 0, 0, 0),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF, 0, 0, 1),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF, 0, 1, 0),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF, 0, 1, 1),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF, 1, 0, 0),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF, 1, 0, 1),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF, 1, 1, 0),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF, 1, 1, 1),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_BF, 0, 0, 0),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_BF, 0, 0, 1),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_BF, 0, 1, 0),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_BF, 0, 1, 1),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_BF, 1, 0, 0),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_BF, 1, 0, 1),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_BF, 1, 1, 0),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_BF, 1, 1, 1),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DB, 0, 0, 0),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DB, 0, 0, 1),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DB, 0, 1, 0),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DB, 0, 1, 1),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DB, 1, 0, 0),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DB, 1, 0, 1),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DB, 1, 1, 0),
+ MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DB, 1, 1, 1),
+};
+
+struct ibv_exp_qp_burst_family *mlx5_get_qp_burst_family(struct mlx5_qp *qp,
+ struct ibv_exp_query_intf_params *params,
+ enum ibv_exp_query_intf_status *status)
+{
+ enum ibv_exp_query_intf_status ret = IBV_EXP_INTF_STAT_OK;
+ struct ibv_exp_qp_burst_family *family = NULL;
+ uint32_t unsupported_f;
+ int mpw;
+
+ if (params->intf_version > MLX5_MAX_QP_BURST_FAMILY_VER) {
+ *status = IBV_EXP_INTF_STAT_VERSION_NOT_SUPPORTED;
+
+ return NULL;
+ }
+
+ if ((qp->verbs_qp.qp.state < IBV_QPS_INIT) || (qp->verbs_qp.qp.state > IBV_QPS_RTS)) {
+ *status = IBV_EXP_INTF_STAT_INVAL_OBJ_STATE;
+ return NULL;
+ }
+ if (qp->gen_data.create_flags & IBV_EXP_QP_CREATE_MANAGED_SEND) {
+ fprintf(stderr, PFX "Can't use QP burst family while QP_CREATE_MANAGED_SEND is set\n");
+ *status = IBV_EXP_INTF_STAT_INVAL_PARARM;
+ return NULL;
+ }
+ if (params->flags) {
+ fprintf(stderr, PFX "Global interface flags(0x%x) are not supported for QP family\n", params->flags);
+ *status = IBV_EXP_INTF_STAT_FLAGS_NOT_SUPPORTED;
+
+ return NULL;
+ }
+ unsupported_f = params->family_flags & ~(IBV_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR);
+ if (unsupported_f) {
+ fprintf(stderr, PFX "Family flags(0x%x) are not supported for QP family\n", unsupported_f);
+ *status = IBV_EXP_INTF_STAT_FAMILY_FLAGS_NOT_SUPPORTED;
+
+ return NULL;
+ }
+
+ switch (qp->gen_data_warm.qp_type) {
+ case IBV_QPT_RC:
+ case IBV_QPT_UC:
+ case IBV_QPT_RAW_PACKET:
+ mpw = (params->family_flags & IBV_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR) &&
+ (qp->gen_data.model_flags & MLX5_QP_MODEL_MULTI_PACKET_WQE);
+
+ if (qp->gen_data.model_flags & MLX5_QP_MODEL_FLAG_THREAD_SAFE) {
+ if (mpw)
+ family = &mlx5_qp_burst_family_mpw_safe;
+ else
+ family = &mlx5_qp_burst_family_safe;
+ } else {
+ int eth = qp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET &&
+ qp->link_layer == IBV_LINK_LAYER_ETHERNET;
+ int _1sge = qp->rq.max_gs == 1;
+ int db_method = qp->gen_data.bf->db_method;
+
+ family = &mlx5_qp_burst_family_unsafe_tbl
+ [MLX5_QP_BURST_UNSAFE_TBL_IDX(db_method, eth, _1sge, mpw)];
+ }
+ break;
+
+ default:
+ ret = IBV_EXP_INTF_STAT_INVAL_PARARM;
+ break;
+ }
+
+ *status = ret;
+
+ return family;
+}
+
+/*
+ * WQ family
+ */
+
+/* wq family - recv_burst */
+static int mlx5_wq_recv_burst_safe(struct ibv_exp_wq *ibwq, struct ibv_sge *sg_list, uint32_t num) __MLX5_ALGN_F__;
+static int mlx5_wq_recv_burst_safe(struct ibv_exp_wq *ibwq, struct ibv_sge *sg_list, uint32_t num)
+{
+ struct mlx5_rwq *rwq = to_mrwq(ibwq);
+
+ return recv_burst(&rwq->rq, sg_list, num, 1, rwq->rq.max_gs == 1, rwq->rsc.type == MLX5_RSC_TYPE_MP_RWQ);
+}
+
+#define MLX5_WQ_RECV_BURST_UNSAFE_NAME(_1sge) mlx5_wq_recv_burst_unsafe_##_1sge
+#define MLX5_WQ_RECV_BURST_UNSAFE(_1sge) \
+ static int MLX5_WQ_RECV_BURST_UNSAFE_NAME(_1sge)( \
+ struct ibv_exp_wq *ibwq, struct ibv_sge *sg_list, \
+ uint32_t num) __MLX5_ALGN_F__; \
+ static int MLX5_WQ_RECV_BURST_UNSAFE_NAME(_1sge)( \
+ struct ibv_exp_wq *ibwq, struct ibv_sge *sg_list, \
+ uint32_t num) \
+ { \
+ struct mlx5_rwq *rwq = to_mrwq(ibwq); \
+ \
+ return recv_burst(&rwq->rq, sg_list, num, 0, _1sge, \
+ rwq->rsc.type == MLX5_RSC_TYPE_MP_RWQ); \
+ }
+/* _1sge */
+MLX5_WQ_RECV_BURST_UNSAFE(0);
+MLX5_WQ_RECV_BURST_UNSAFE(1);
+
+/* wq family - recv_sg_list */
+static int mlx5_wq_recv_sg_list_safe(struct ibv_exp_wq *ibwq, struct ibv_sge *sg_list, uint32_t num_sg) __MLX5_ALGN_F__;
+static int mlx5_wq_recv_sg_list_safe(struct ibv_exp_wq *ibwq, struct ibv_sge *sg_list, uint32_t num_sg)
+{
+ return recv_sg_list(&to_mrwq(ibwq)->rq, sg_list, num_sg, 1);
+}
+
+static int mlx5_wq_recv_sg_list_unsafe(struct ibv_exp_wq *ibwq, struct ibv_sge *sg_list, uint32_t num_sg) __MLX5_ALGN_F__;
+static int mlx5_wq_recv_sg_list_unsafe(struct ibv_exp_wq *ibwq, struct ibv_sge *sg_list, uint32_t num_sg)
+{
+ return recv_sg_list(&to_mrwq(ibwq)->rq, sg_list, num_sg, 0);
+}
+
+/*
+ * wq family implementation for safe WQ
+ */
+struct ibv_exp_wq_family mlx5_wq_family_safe = {
+ .recv_sg_list = mlx5_wq_recv_sg_list_safe,
+ .recv_burst = mlx5_wq_recv_burst_safe
+};
+
+/*
+ * wq family implementation table for unsafe WQ
+ *
+ * Each table entry contains an implementation of the ibv_exp_wq_family
+ * which fits WQs with specific attributes:
+ * - max-rcv-gs == 1 (yes/no)
+ *
+ * To get the right wq_family implementation for a specific WQ, use the WQ
+ * attribute (_1sge) as an index into the wq family table
+ */
+#define MLX5_WQ_UNSAFE_TBL_IDX(_1sge) \
+ (_1sge)
+
+#define MLX5_WQ_UNSAFE_TBL_ENTRY(_1sge) \
+ [MLX5_WQ_UNSAFE_TBL_IDX(_1sge)] = { \
+ .recv_sg_list = mlx5_wq_recv_sg_list_unsafe, \
+ .recv_burst = MLX5_WQ_RECV_BURST_UNSAFE_NAME(_1sge) \
+ }
+
+static struct ibv_exp_wq_family mlx5_wq_family_unsafe_tbl[1 << 1] = {
+ MLX5_WQ_UNSAFE_TBL_ENTRY(0),
+ MLX5_WQ_UNSAFE_TBL_ENTRY(1),
+};
+
+struct ibv_exp_wq_family *mlx5_get_wq_family(struct mlx5_rwq *rwq,
+ struct ibv_exp_query_intf_params *params,
+ enum ibv_exp_query_intf_status *status)
+{
+ enum ibv_exp_query_intf_status ret = IBV_EXP_INTF_STAT_OK;
+ struct ibv_exp_wq_family *family = NULL;
+
+ if (params->intf_version > MLX5_MAX_WQ_FAMILY_VER) {
+ *status = IBV_EXP_INTF_STAT_VERSION_NOT_SUPPORTED;
+
+ return NULL;
+ }
+
+ if (params->flags) {
+ fprintf(stderr, PFX "Global interface flags(0x%x) are not supported for WQ family\n", params->flags);
+ *status = IBV_EXP_INTF_STAT_FLAGS_NOT_SUPPORTED;
+
+ return NULL;
+ }
+ if (params->family_flags) {
+ fprintf(stderr, PFX "Family flags(0x%x) are not supported for WQ family\n", params->family_flags);
+ *status = IBV_EXP_INTF_STAT_FAMILY_FLAGS_NOT_SUPPORTED;
+
+ return NULL;
+ }
+
+ if (rwq->model_flags & MLX5_WQ_MODEL_FLAG_THREAD_SAFE) {
+ family = &mlx5_wq_family_safe;
+ } else {
+ int _1sge = rwq->rq.max_gs == 1;
+
+ family = &mlx5_wq_family_unsafe_tbl
+ [MLX5_WQ_UNSAFE_TBL_IDX(_1sge)];
+ }
+
+ *status = ret;
+
+ return family;
+}
+
Index: contrib/ofed/libmlx5/src/srq.c
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/srq.c
@@ -0,0 +1,271 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if HAVE_CONFIG_H
+# include <config.h>
+#endif /* HAVE_CONFIG_H */
+
+#include <stdlib.h>
+#include <netinet/in.h>
+#include <pthread.h>
+#include <string.h>
+#include <errno.h>
+
+#include "mlx5.h"
+#include "doorbell.h"
+#include "wqe.h"
+
+static void *get_wqe(struct mlx5_srq *srq, int n)
+{
+ return srq->buf.buf + (n << srq->wqe_shift);
+}
+
+int mlx5_copy_to_recv_srq(struct mlx5_srq *srq, int idx, void *buf, int size)
+{
+ struct mlx5_wqe_srq_next_seg *next;
+ struct mlx5_wqe_data_seg *scat;
+ int copy;
+ int i;
+	int max = 1 << (srq->wqe_shift - 4); /* number of 16-byte segments per WQE */
+
+ next = get_wqe(srq, idx);
+ scat = (struct mlx5_wqe_data_seg *) (next + 1);
+
+ for (i = 0; i < max; ++i) {
+ copy = min(size, ntohl(scat->byte_count));
+ memcpy((void *)(unsigned long)ntohll(scat->addr), buf, copy);
+ size -= copy;
+ if (size <= 0)
+ return IBV_WC_SUCCESS;
+
+ buf += copy;
+ ++scat;
+ }
+ return IBV_WC_LOC_LEN_ERR;
+}
+
+void mlx5_free_srq_wqe(struct mlx5_srq *srq, int ind)
+{
+ struct mlx5_wqe_srq_next_seg *next;
+
+ mlx5_spin_lock(&srq->lock);
+
+ next = get_wqe(srq, srq->tail);
+ next->next_wqe_index = htons(ind);
+ srq->tail = ind;
+
+ mlx5_spin_unlock(&srq->lock);
+}
+
+static void set_sig_seg(struct mlx5_srq *srq,
+ struct mlx5_wqe_srq_next_seg *next,
+ int size, uint16_t idx)
+{
+ uint8_t sign;
+ uint32_t srqn = srq->srqn;
+
+ next->signature = 0;
+ sign = calc_xor(next, size);
+ sign ^= calc_xor(&srqn, 4);
+ sign ^= calc_xor(&idx, 2);
+ next->signature = ~sign;
+}
+
+int mlx5_post_srq_recv(struct ibv_srq *ibsrq,
+ struct ibv_recv_wr *wr,
+ struct ibv_recv_wr **bad_wr)
+{
+ struct mlx5_srq *srq;
+ struct mlx5_wqe_srq_next_seg *next;
+ struct mlx5_wqe_data_seg *scat;
+ unsigned head;
+ int err = 0;
+ int nreq;
+ int i;
+
+ if (ibsrq->handle == LEGACY_XRC_SRQ_HANDLE)
+ ibsrq = (struct ibv_srq *)(((struct ibv_srq_legacy *) ibsrq)->ibv_srq);
+
+ srq = to_msrq(ibsrq);
+ mlx5_spin_lock(&srq->lock);
+
+ for (nreq = 0; wr; ++nreq, wr = wr->next) {
+ if (wr->num_sge > srq->max_gs) {
+ errno = EINVAL;
+ err = errno;
+ *bad_wr = wr;
+ break;
+ }
+
+ head = srq->head;
+ if (head == srq->tail) {
+			/* SRQ is full */
+ errno = ENOMEM;
+ err = errno;
+ *bad_wr = wr;
+ break;
+ }
+
+ srq->wrid[head] = wr->wr_id;
+
+ next = get_wqe(srq, head);
+ srq->head = ntohs(next->next_wqe_index);
+ scat = (struct mlx5_wqe_data_seg *) (next + 1);
+
+ for (i = 0; i < wr->num_sge; ++i) {
+ scat[i].byte_count = htonl(wr->sg_list[i].length);
+ scat[i].lkey = htonl(wr->sg_list[i].lkey);
+ scat[i].addr = htonll(wr->sg_list[i].addr);
+ }
+
+ if (i < srq->max_gs) {
+ scat[i].byte_count = 0;
+ scat[i].lkey = htonl(MLX5_INVALID_LKEY);
+ scat[i].addr = 0;
+ }
+ if (unlikely(srq->wq_sig))
+ set_sig_seg(srq, next, 1 << srq->wqe_shift, head + nreq);
+ }
+
+ if (nreq) {
+ srq->counter += nreq;
+
+ /*
+ * Make sure that descriptors are written before
+ * we write doorbell record.
+ */
+ wmb();
+
+ *srq->db = htonl(srq->counter);
+ }
+
+ mlx5_spin_unlock(&srq->lock);
+
+ return err;
+}
+
+int mlx5_alloc_srq_buf(struct ibv_context *context, struct mlx5_srq *srq)
+{
+ struct mlx5_wqe_srq_next_seg *next;
+ int size;
+ int buf_size;
+ int i;
+ struct mlx5_context *ctx;
+
+ ctx = to_mctx(context);
+
+ if (srq->max_gs < 0) {
+ errno = EINVAL;
+ return -1;
+ }
+
+ srq->wrid = malloc(srq->max * sizeof *srq->wrid);
+ if (!srq->wrid)
+ return -1;
+
+ size = sizeof(struct mlx5_wqe_srq_next_seg) +
+ srq->max_gs * sizeof(struct mlx5_wqe_data_seg);
+ size = max(32, size);
+
+ size = mlx5_round_up_power_of_two(size);
+
+ if (size > ctx->max_recv_wr) {
+ errno = EINVAL;
+ return -1;
+ }
+ srq->max_gs = (size - sizeof(struct mlx5_wqe_srq_next_seg)) /
+ sizeof(struct mlx5_wqe_data_seg);
+
+ srq->wqe_shift = mlx5_ilog2(size);
+
+ buf_size = srq->max * size;
+
+ if (mlx5_alloc_buf(&srq->buf, buf_size,
+ to_mdev(context->device)->page_size)) {
+ free(srq->wrid);
+ return -1;
+ }
+
+ memset(srq->buf.buf, 0, buf_size);
+
+ /*
+ * Now initialize the SRQ buffer so that all of the WQEs are
+ * linked into the list of free WQEs.
+ */
+
+ for (i = 0; i < srq->max; ++i) {
+ next = get_wqe(srq, i);
+ next->next_wqe_index = htons((i + 1) & (srq->max - 1));
+ }
+
+ srq->head = 0;
+ srq->tail = srq->max - 1;
+
+ return 0;
+}
+
+struct mlx5_srq *mlx5_find_srq(struct mlx5_context *ctx, uint32_t srqn)
+{
+ int tind = srqn >> MLX5_SRQ_TABLE_SHIFT;
+
+ if (ctx->srq_table[tind].refcnt)
+ return ctx->srq_table[tind].table[srqn & MLX5_SRQ_TABLE_MASK];
+ else
+ return NULL;
+}
+
+int mlx5_store_srq(struct mlx5_context *ctx, uint32_t srqn,
+ struct mlx5_srq *srq)
+{
+ int tind = srqn >> MLX5_SRQ_TABLE_SHIFT;
+
+ if (!ctx->srq_table[tind].refcnt) {
+		ctx->srq_table[tind].table = calloc(MLX5_SRQ_TABLE_MASK + 1,
+						    sizeof(struct mlx5_srq *));
+ if (!ctx->srq_table[tind].table)
+ return -1;
+ }
+
+ ++ctx->srq_table[tind].refcnt;
+	ctx->srq_table[tind].table[srqn & MLX5_SRQ_TABLE_MASK] = srq;
+ return 0;
+}
+
+void mlx5_clear_srq(struct mlx5_context *ctx, uint32_t srqn)
+{
+	int tind = srqn >> MLX5_SRQ_TABLE_SHIFT;
+
+ if (!--ctx->srq_table[tind].refcnt)
+ free(ctx->srq_table[tind].table);
+ else
+ ctx->srq_table[tind].table[srqn & MLX5_SRQ_TABLE_MASK] = NULL;
+}
Index: contrib/ofed/libmlx5/src/verbs.c
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/verbs.c
@@ -0,0 +1,3462 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if HAVE_CONFIG_H
+# include <config.h>
+#endif /* HAVE_CONFIG_H */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <pthread.h>
+#include <errno.h>
+#include <netinet/in.h>
+#include <limits.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/mman.h>
+
+#include "mlx5.h"
+#include "mlx5-abi.h"
+#include "wqe.h"
+
+int mlx5_single_threaded = 0;
+int mlx5_use_mutex;
+
+static void __mlx5_query_device(uint64_t raw_fw_ver,
+ struct ibv_device_attr *attr)
+{
+ unsigned major, minor, sub_minor;
+
+ major = (raw_fw_ver >> 32) & 0xffff;
+ minor = (raw_fw_ver >> 16) & 0xffff;
+ sub_minor = raw_fw_ver & 0xffff;
+
+ snprintf(attr->fw_ver, sizeof attr->fw_ver,
+ "%d.%d.%04d", major, minor, sub_minor);
+}
+
+int mlx5_query_device(struct ibv_context *context,
+ struct ibv_device_attr *attr)
+{
+ struct ibv_exp_device_attr attrx;
+ struct ibv_exp_query_device cmd;
+ uint64_t raw_fw_ver;
+ int err;
+
+ read_init_vars(to_mctx(context));
+ memset(&attrx, 0, sizeof(attrx));
+ err = ibv_exp_cmd_query_device(context,
+ &attrx,
+ &raw_fw_ver, &cmd,
+ sizeof(cmd));
+ if (err)
+ return err;
+
+ memcpy(attr, &attrx, sizeof(*attr));
+ __mlx5_query_device(raw_fw_ver, attr);
+
+ return err;
+}
+
+int mlx5_query_device_ex(struct ibv_context *context,
+ struct ibv_exp_device_attr *attr)
+{
+ struct ibv_exp_query_device cmd;
+ struct mlx5_context *ctx = to_mctx(context);
+ uint64_t raw_fw_ver;
+ int err;
+
+ err = ibv_exp_cmd_query_device(context, attr, &raw_fw_ver,
+ &cmd, sizeof(cmd));
+ if (err)
+ return err;
+
+ __mlx5_query_device(raw_fw_ver, (struct ibv_device_attr *)attr);
+
+ attr->exp_device_cap_flags |= IBV_EXP_DEVICE_MR_ALLOCATE;
+ if (attr->exp_device_cap_flags & IBV_EXP_DEVICE_CROSS_CHANNEL) {
+ attr->comp_mask |= IBV_EXP_DEVICE_ATTR_CALC_CAP;
+ attr->calc_cap.data_types =
+ (1ULL << IBV_EXP_CALC_DATA_TYPE_INT) |
+ (1ULL << IBV_EXP_CALC_DATA_TYPE_UINT) |
+ (1ULL << IBV_EXP_CALC_DATA_TYPE_FLOAT);
+ attr->calc_cap.data_sizes =
+ (1ULL << IBV_EXP_CALC_DATA_SIZE_64_BIT);
+ attr->calc_cap.int_ops = (1ULL << IBV_EXP_CALC_OP_ADD) |
+ (1ULL << IBV_EXP_CALC_OP_BAND) |
+ (1ULL << IBV_EXP_CALC_OP_BXOR) |
+ (1ULL << IBV_EXP_CALC_OP_BOR);
+ attr->calc_cap.uint_ops = attr->calc_cap.int_ops;
+ attr->calc_cap.fp_ops = attr->calc_cap.int_ops;
+ }
+ if (ctx->cc.buf)
+ attr->exp_device_cap_flags |= IBV_EXP_DEVICE_DC_INFO;
+
+ if (attr->comp_mask & IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS)
+ attr->exp_device_cap_flags &= (~IBV_EXP_DEVICE_VXLAN_SUPPORT);
+
+	if (attr->comp_mask & IBV_EXP_DEVICE_ATTR_MP_RQ) {
+		/* The library supports MP-RQ only for RAW_ETH QPs; clear the
+		 * other QP types reported by the kernel.
+		 */
+		attr->mp_rq_caps.supported_qps &= IBV_EXP_QPT_RAW_PACKET;
+
+		/* Update the kernel caps to the mp_rq caps supported by the lib */
+		attr->mp_rq_caps.allowed_shifts &= MLX5_MP_RQ_SUPPORTED_SHIFTS;
+		attr->mp_rq_caps.supported_qps &= MLX5_MP_RQ_SUPPORTED_QPT;
+		if (attr->mp_rq_caps.max_single_stride_log_num_of_bytes > MLX5_MP_RQ_MAX_LOG_STRIDE_SIZE)
+			attr->mp_rq_caps.max_single_stride_log_num_of_bytes = MLX5_MP_RQ_MAX_LOG_STRIDE_SIZE;
+		if (attr->mp_rq_caps.max_single_wqe_log_num_of_strides > MLX5_MP_RQ_MAX_LOG_NUM_STRIDES)
+			attr->mp_rq_caps.max_single_wqe_log_num_of_strides = MLX5_MP_RQ_MAX_LOG_NUM_STRIDES;
+	}
+
+ return err;
+}
+
+int mlx5_query_port(struct ibv_context *context, uint8_t port,
+ struct ibv_port_attr *attr)
+{
+ struct mlx5_context *mctx = to_mctx(context);
+ struct ibv_query_port cmd;
+ int err;
+
+ read_init_vars(mctx);
+ err = ibv_cmd_query_port(context, port, attr, &cmd, sizeof cmd);
+
+ if (!err && port <= mctx->num_ports && port > 0) {
+ if (!mctx->port_query_cache[port - 1].valid) {
+ mctx->port_query_cache[port - 1].link_layer =
+ attr->link_layer;
+ mctx->port_query_cache[port - 1].caps =
+ attr->port_cap_flags;
+ mctx->port_query_cache[port - 1].valid = 1;
+ }
+ }
+
+ return err;
+}
+
+int mlx5_exp_query_port(struct ibv_context *context, uint8_t port_num,
+ struct ibv_exp_port_attr *port_attr)
+{
+ struct mlx5_context *mctx = to_mctx(context);
+
+ /* Check that only valid flags were given */
+ if (!(port_attr->comp_mask & IBV_EXP_QUERY_PORT_ATTR_MASK1) ||
+ (port_attr->comp_mask & ~IBV_EXP_QUERY_PORT_ATTR_MASKS) ||
+ (port_attr->mask1 & ~IBV_EXP_QUERY_PORT_MASK)) {
+ return EINVAL;
+ }
+
+ /* Optimize the link type query */
+ if (port_attr->comp_mask == IBV_EXP_QUERY_PORT_ATTR_MASK1) {
+ if (!(port_attr->mask1 & ~(IBV_EXP_QUERY_PORT_LINK_LAYER |
+ IBV_EXP_QUERY_PORT_CAP_FLAGS))) {
+ if (port_num <= 0 || port_num > mctx->num_ports)
+ return EINVAL;
+ if (mctx->port_query_cache[port_num - 1].valid) {
+ if (port_attr->mask1 &
+ IBV_EXP_QUERY_PORT_LINK_LAYER)
+ port_attr->link_layer =
+ mctx->
+ port_query_cache[port_num - 1].
+ link_layer;
+ if (port_attr->mask1 &
+ IBV_EXP_QUERY_PORT_CAP_FLAGS)
+ port_attr->port_cap_flags =
+ mctx->
+ port_query_cache[port_num - 1].
+ caps;
+ return 0;
+ }
+ }
+ if (port_attr->mask1 & IBV_EXP_QUERY_PORT_STD_MASK) {
+ return mlx5_query_port(context, port_num,
+ &port_attr->port_attr);
+ }
+ }
+
+	return EOPNOTSUPP;
+}
+
+struct ibv_pd *mlx5_alloc_pd(struct ibv_context *context)
+{
+ struct ibv_alloc_pd cmd;
+ struct mlx5_alloc_pd_resp resp;
+ struct mlx5_pd *pd;
+
+ read_init_vars(to_mctx(context));
+ pd = calloc(1, sizeof *pd);
+ if (!pd)
+ return NULL;
+
+ if (ibv_cmd_alloc_pd(context, &pd->ibv_pd, &cmd, sizeof cmd,
+ &resp.ibv_resp, sizeof(resp)))
+ goto err;
+
+ pd->pdn = resp.pdn;
+
+ if (mlx5_init_implicit_lkey(&pd->r_ilkey, IBV_EXP_ACCESS_ON_DEMAND) ||
+ mlx5_init_implicit_lkey(&pd->w_ilkey, IBV_EXP_ACCESS_ON_DEMAND |
+ IBV_EXP_ACCESS_LOCAL_WRITE))
+ goto err;
+
+ return &pd->ibv_pd;
+
+err:
+ free(pd);
+ return NULL;
+}
+
+int mlx5_free_pd(struct ibv_pd *pd)
+{
+ struct mlx5_pd *mpd = to_mpd(pd);
+ int ret;
+
+	/* TODO: Better handling of destruction failure due to resources
+	 * still open. At the moment, we might seg-fault here. */
+ mlx5_destroy_implicit_lkey(&mpd->r_ilkey);
+ mlx5_destroy_implicit_lkey(&mpd->w_ilkey);
+ if (mpd->remote_ilkey) {
+ mlx5_destroy_implicit_lkey(mpd->remote_ilkey);
+ mpd->remote_ilkey = NULL;
+ }
+
+ ret = ibv_cmd_dealloc_pd(pd);
+ if (ret)
+ return ret;
+
+ free(mpd);
+ return 0;
+}
+
+static void *alloc_buf(struct mlx5_mr *mr,
+ struct ibv_pd *pd,
+ size_t length,
+ void *contig_addr)
+{
+ size_t alloc_length;
+ int force_anon = 0;
+ int force_contig = 0;
+ enum mlx5_alloc_type alloc_type;
+ int page_size = to_mdev(pd->context->device)->page_size;
+ int err;
+
+ mlx5_get_alloc_type(pd->context, MLX5_MR_PREFIX, &alloc_type, MLX5_ALLOC_TYPE_ALL);
+
+ if (alloc_type == MLX5_ALLOC_TYPE_CONTIG)
+ force_contig = 1;
+ else if (alloc_type == MLX5_ALLOC_TYPE_ANON)
+ force_anon = 1;
+
+ if (force_anon) {
+ err = mlx5_alloc_buf(&mr->buf, align(length, page_size),
+ page_size);
+ if (err)
+ return NULL;
+
+ return mr->buf.buf;
+ }
+
+ alloc_length = (contig_addr ? length : align(length, page_size));
+
+ err = mlx5_alloc_buf_contig(to_mctx(pd->context), &mr->buf,
+ alloc_length, page_size, MLX5_MR_PREFIX,
+ contig_addr);
+ if (!err)
+ return contig_addr ? contig_addr : mr->buf.buf;
+
+ if (force_contig || contig_addr)
+ return NULL;
+
+ err = mlx5_alloc_buf(&mr->buf, align(length, page_size),
+ page_size);
+ if (err)
+ return NULL;
+
+ return mr->buf.buf;
+}
+
+struct ibv_mr *mlx5_exp_reg_mr(struct ibv_exp_reg_mr_in *in)
+{
+ struct mlx5_mr *mr;
+ struct ibv_exp_reg_mr cmd;
+ int ret;
+ int is_contig;
+
+ if ((in->comp_mask > IBV_EXP_REG_MR_RESERVED - 1) ||
+ (in->exp_access > IBV_EXP_ACCESS_RESERVED - 1)) {
+ errno = EINVAL;
+ return NULL;
+ }
+
+ if (in->addr == 0 && in->length == MLX5_WHOLE_ADDR_SPACE &&
+ (in->exp_access & IBV_EXP_ACCESS_ON_DEMAND))
+ return mlx5_alloc_whole_addr_mr(in);
+
+ if ((in->exp_access &
+ (IBV_EXP_ACCESS_ON_DEMAND | IBV_EXP_ACCESS_RELAXED)) ==
+ (IBV_EXP_ACCESS_ON_DEMAND | IBV_EXP_ACCESS_RELAXED)) {
+ struct ibv_mr *ibv_mr = NULL;
+ struct mlx5_pd *mpd = to_mpd(in->pd);
+ struct mlx5_implicit_lkey *implicit_lkey =
+ mlx5_get_implicit_lkey(mpd, in->exp_access);
+ struct ibv_exp_prefetch_attr prefetch_attr = {
+ .flags = in->exp_access &
+ (IBV_ACCESS_LOCAL_WRITE |
+ IBV_ACCESS_REMOTE_WRITE |
+ IBV_ACCESS_REMOTE_READ) ?
+ IBV_EXP_PREFETCH_WRITE_ACCESS : 0,
+ .addr = in->addr,
+ .length = in->length,
+ .comp_mask = 0,
+ };
+
+ if (!implicit_lkey)
+ return NULL;
+ errno = mlx5_get_real_mr_from_implicit_lkey(mpd, implicit_lkey,
+ (uintptr_t)in->addr,
+ in->length,
+ &ibv_mr);
+ if (errno)
+ return NULL;
+
+ /* Prefetch the requested range */
+ ibv_exp_prefetch_mr(ibv_mr, &prefetch_attr);
+
+ return ibv_mr;
+ }
+
+ mr = calloc(1, sizeof(*mr));
+ if (!mr)
+ return NULL;
+
+ /*
+ * if addr is NULL and IBV_EXP_ACCESS_ALLOCATE_MR is set,
+ * the library allocates contiguous memory
+ */
+
+ /* Need valgrind exception here due to compiler optimization problem */
+ VALGRIND_MAKE_MEM_DEFINED(&in->create_flags, sizeof(in->create_flags));
+ is_contig = (!in->addr && (in->exp_access & IBV_EXP_ACCESS_ALLOCATE_MR)) ||
+ ((in->comp_mask & IBV_EXP_REG_MR_CREATE_FLAGS) &&
+ (in->create_flags & IBV_EXP_REG_MR_CREATE_CONTIG));
+
+ if (is_contig) {
+ in->addr = alloc_buf(mr, in->pd, in->length, in->addr);
+ if (!in->addr) {
+ free(mr);
+ return NULL;
+ }
+
+ mr->alloc_flags |= IBV_EXP_ACCESS_ALLOCATE_MR;
+ /*
+ * set the allocated address for the verbs consumer
+ */
+ mr->ibv_mr.addr = in->addr;
+ }
+
+ /* We should store the ODP type of the MR to avoid
+ * calling "ibv_dofork_range" when invoking ibv_dereg_mr
+ */
+ if (in->exp_access & IBV_EXP_ACCESS_ON_DEMAND)
+ mr->type = MLX5_ODP_MR;
+
+ {
+ struct ibv_exp_reg_mr_resp resp;
+
+ ret = ibv_cmd_exp_reg_mr(in,
+ (uintptr_t) in->addr,
+ &(mr->ibv_mr),
+ &cmd, sizeof(cmd),
+ &resp, sizeof(resp));
+ }
+ if (ret) {
+ if ((mr->alloc_flags & IBV_EXP_ACCESS_ALLOCATE_MR)) {
+ if (mr->buf.type == MLX5_ALLOC_TYPE_CONTIG)
+ mlx5_free_buf_contig(to_mctx(in->pd->context),
+ &mr->buf);
+ else
+ mlx5_free_buf(&(mr->buf));
+ }
+ free(mr);
+ return NULL;
+ }
+
+ return &mr->ibv_mr;
+}
+
+struct ibv_mr *mlx5_reg_mr(struct ibv_pd *pd, void *addr,
+ size_t length, int access)
+{
+ struct ibv_exp_reg_mr_in in;
+
+ in.pd = pd;
+ in.addr = addr;
+ in.length = length;
+ in.exp_access = access;
+ in.comp_mask = 0;
+
+ return mlx5_exp_reg_mr(&in);
+}
+
+int mlx5_dereg_mr(struct ibv_mr *ibmr)
+{
+ int ret;
+ struct mlx5_mr *mr = to_mmr(ibmr);
+
+ if (ibmr->lkey == ODP_GLOBAL_R_LKEY ||
+ ibmr->lkey == ODP_GLOBAL_W_LKEY) {
+ mlx5_dealloc_whole_addr_mr(ibmr);
+ return 0;
+ }
+
+ if (mr->alloc_flags & IBV_EXP_ACCESS_RELAXED)
+ return 0;
+
+ if (mr->alloc_flags & IBV_EXP_ACCESS_NO_RDMA)
+ goto free_mr;
+
+ ret = ibv_cmd_dereg_mr(ibmr);
+ if (ret)
+ return ret;
+
+free_mr:
+ if ((mr->alloc_flags & IBV_EXP_ACCESS_ALLOCATE_MR)) {
+ if (mr->buf.type == MLX5_ALLOC_TYPE_CONTIG)
+ mlx5_free_buf_contig(to_mctx(ibmr->context), &mr->buf);
+ else
+ mlx5_free_buf(&(mr->buf));
+ }
+
+ free(mr);
+ return 0;
+}
+
+int mlx5_prefetch_mr(struct ibv_mr *mr, struct ibv_exp_prefetch_attr *attr)
+{
+ struct mlx5_pd *pd = to_mpd(mr->pd);
+
+ if (attr->comp_mask >= IBV_EXP_PREFETCH_MR_RESERVED)
+ return EINVAL;
+
+ switch (mr->lkey) {
+ case ODP_GLOBAL_R_LKEY:
+ return mlx5_prefetch_implicit_lkey(pd, &pd->r_ilkey,
+ (unsigned long)attr->addr,
+ attr->length, attr->flags);
+ case ODP_GLOBAL_W_LKEY:
+ return mlx5_prefetch_implicit_lkey(pd, &pd->w_ilkey,
+ (unsigned long)attr->addr,
+ attr->length, attr->flags);
+ default:
+ break;
+ }
+
+ return ibv_cmd_exp_prefetch_mr(mr, attr);
+}
+
+int mlx5_round_up_power_of_two(long long sz)
+{
+ long long ret;
+
+ for (ret = 1; ret < sz; ret <<= 1)
+ ; /* nothing */
+
+ if (ret > INT_MAX) {
+ fprintf(stderr, "%s: roundup overflow\n", __func__);
+ return -ENOMEM;
+ }
+
+ return (int)ret;
+}
+
+static int align_queue_size(long long req)
+{
+ return mlx5_round_up_power_of_two(req);
+}
+
+static int get_cqe_size(struct ibv_context *context)
+{
+ char env[VERBS_MAX_ENV_VAL];
+ struct mlx5_context *ctx = to_mctx(context);
+ int size = ctx->cache_line_size;
+
+ size = max(size, 64);
+ size = min(size, 128);
+
+ if (!ibv_exp_cmd_getenv(context, "MLX5_CQE_SIZE", env, sizeof(env)))
+ size = atoi(env);
+
+ switch (size) {
+ case 64:
+ case 128:
+ return size;
+
+ default:
+ return -EINVAL;
+ }
+}
+
+static int rwq_sig_enabled(struct ibv_context *context)
+{
+ char env[VERBS_MAX_ENV_VAL];
+
+ if (!ibv_exp_cmd_getenv(context, "MLX5_RWQ_SIGNATURE", env, sizeof(env)))
+ return 1;
+
+ return 0;
+}
+
+static int srq_sig_enabled(struct ibv_context *context)
+{
+ char env[VERBS_MAX_ENV_VAL];
+
+ if (!ibv_exp_cmd_getenv(context, "MLX5_SRQ_SIGNATURE", env, sizeof(env)))
+ return 1;
+
+ return 0;
+}
+
+static int qp_sig_enabled(struct ibv_context *context)
+{
+ char env[VERBS_MAX_ENV_VAL];
+
+ if (!ibv_exp_cmd_getenv(context, "MLX5_QP_SIGNATURE", env, sizeof(env)))
+ return 1;
+
+ return 0;
+}
+
+enum {
+ EXP_CREATE_CQ_SUPPORTED_FLAGS = IBV_EXP_CQ_CREATE_CROSS_CHANNEL |
+ IBV_EXP_CQ_TIMESTAMP
+};
+
+static struct ibv_cq *create_cq(struct ibv_context *context,
+ int cqe,
+ struct ibv_comp_channel *channel,
+ int comp_vector,
+ struct ibv_exp_cq_init_attr *attr)
+{
+ struct mlx5_create_cq cmd;
+ struct mlx5_exp_create_cq cmd_e;
+ struct mlx5_create_cq_resp resp;
+ struct mlx5_cq *cq;
+ struct mlx5_context *mctx = to_mctx(context);
+ int cqe_sz;
+ int ret;
+ int ncqe;
+ int thread_safe;
+#ifdef MLX5_DEBUG
+ FILE *fp = mctx->dbg_fp;
+#endif
+
+ if (!cqe) {
+ mlx5_dbg(fp, MLX5_DBG_CQ, "\n");
+ errno = EINVAL;
+ return NULL;
+ }
+
+ cq = calloc(1, sizeof *cq);
+ if (!cq) {
+ mlx5_dbg(fp, MLX5_DBG_CQ, "\n");
+ return NULL;
+ }
+
+ memset(&cmd, 0, sizeof(cmd));
+ memset(&cmd_e, 0, sizeof(cmd_e));
+ cq->cons_index = 0;
+ /* wait_index should start at value before 0 */
+ cq->wait_index = (uint32_t)(-1);
+ cq->wait_count = 0;
+
+ cq->pattern = MLX5_CQ_PATTERN;
+ thread_safe = !mlx5_single_threaded;
+ if (attr && (attr->comp_mask & IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN)) {
+ if (!attr->res_domain) {
+ errno = EINVAL;
+ goto err;
+ }
+ thread_safe = (to_mres_domain(attr->res_domain)->attr.thread_model == IBV_EXP_THREAD_SAFE);
+ }
+ if (mlx5_lock_init(&cq->lock, thread_safe, mlx5_get_locktype()))
+ goto err;
+
+ cq->model_flags = thread_safe ? MLX5_CQ_MODEL_FLAG_THREAD_SAFE : 0;
+
+	if (cqe <= 0) {
+		mlx5_dbg(fp, MLX5_DBG_CQ, "\n");
+		errno = EINVAL;
+		goto err_spl;
+	}
+
+	/* The additional entry is required for resize CQ */
+	ncqe = align_queue_size(cqe + 1);
+ if ((ncqe > (1 << 24)) || (ncqe < (cqe + 1))) {
+ mlx5_dbg(fp, MLX5_DBG_CQ, "ncqe %d\n", ncqe);
+ errno = EINVAL;
+ goto err_spl;
+ }
+
+ cqe_sz = get_cqe_size(context);
+ if (cqe_sz < 0) {
+ mlx5_dbg(fp, MLX5_DBG_CQ, "\n");
+ errno = -cqe_sz;
+ goto err_spl;
+ }
+
+ if (mlx5_alloc_cq_buf(mctx, cq, &cq->buf_a, ncqe, cqe_sz)) {
+ mlx5_dbg(fp, MLX5_DBG_CQ, "\n");
+ goto err_spl;
+ }
+
+ cq->dbrec = mlx5_alloc_dbrec(mctx);
+ if (!cq->dbrec) {
+ mlx5_dbg(fp, MLX5_DBG_CQ, "\n");
+ goto err_buf;
+ }
+
+ cq->dbrec[MLX5_CQ_SET_CI] = 0;
+ cq->dbrec[MLX5_CQ_ARM_DB] = 0;
+ cq->arm_sn = 0;
+ cq->cqe_sz = cqe_sz;
+
+ if (attr->comp_mask || mctx->cqe_comp_max_num) {
+ if (attr->comp_mask & IBV_EXP_CQ_INIT_ATTR_FLAGS &&
+ attr->flags & ~EXP_CREATE_CQ_SUPPORTED_FLAGS) {
+ mlx5_dbg(fp, MLX5_DBG_CQ,
+ "Unsupported creation flags requested\n");
+ errno = EINVAL;
+ goto err_db;
+ }
+
+ cmd_e.buf_addr = (uintptr_t) cq->buf_a.buf;
+ cmd_e.db_addr = (uintptr_t) cq->dbrec;
+ cmd_e.cqe_size = cqe_sz;
+ cmd_e.size_of_prefix = offsetof(struct mlx5_exp_create_cq,
+ prefix_reserved);
+ cmd_e.exp_data.comp_mask = MLX5_EXP_CREATE_CQ_MASK_CQE_COMP_EN |
+ MLX5_EXP_CREATE_CQ_MASK_CQE_COMP_RECV_TYPE;
+ if (mctx->cqe_comp_max_num) {
+ cmd_e.exp_data.cqe_comp_en = mctx->enable_cqe_comp ? 1 : 0;
+ cmd_e.exp_data.cqe_comp_recv_type = MLX5_CQE_FORMAT_HASH;
+ }
+ } else {
+ cmd.buf_addr = (uintptr_t) cq->buf_a.buf;
+ cmd.db_addr = (uintptr_t) cq->dbrec;
+ cmd.cqe_size = cqe_sz;
+ }
+
+ if (attr->comp_mask || cmd_e.exp_data.comp_mask)
+ ret = ibv_exp_cmd_create_cq(context, ncqe - 1, channel,
+ comp_vector, &cq->ibv_cq,
+ &cmd_e.ibv_cmd,
+ sizeof(cmd_e.ibv_cmd),
+ sizeof(cmd_e) - sizeof(cmd_e.ibv_cmd),
+ &resp.ibv_resp, sizeof(resp.ibv_resp),
+ sizeof(resp) - sizeof(resp.ibv_resp), attr);
+ else
+ ret = ibv_cmd_create_cq(context, ncqe - 1, channel, comp_vector,
+ &cq->ibv_cq, &cmd.ibv_cmd, sizeof cmd,
+ &resp.ibv_resp, sizeof(resp));
+
+ if (ret) {
+ mlx5_dbg(fp, MLX5_DBG_CQ, "ret %d\n", ret);
+ goto err_db;
+ }
+
+ if (attr->comp_mask & IBV_EXP_CQ_INIT_ATTR_FLAGS &&
+ attr->flags & IBV_EXP_CQ_TIMESTAMP)
+ cq->creation_flags |=
+ MLX5_CQ_CREATION_FLAG_COMPLETION_TIMESTAMP;
+
+ cq->active_buf = &cq->buf_a;
+ cq->resize_buf = NULL;
+ cq->cqn = resp.cqn;
+ cq->stall_enable = mctx->stall_enable;
+ cq->stall_adaptive_enable = mctx->stall_adaptive_enable;
+ cq->stall_cycles = mctx->stall_cycles;
+ cq->cq_log_size = mlx5_ilog2(ncqe);
+
+ return &cq->ibv_cq;
+
+err_db:
+ mlx5_free_db(mctx, cq->dbrec);
+
+err_buf:
+ mlx5_free_cq_buf(mctx, &cq->buf_a);
+
+err_spl:
+ mlx5_lock_destroy(&cq->lock);
+
+err:
+ free(cq);
+
+ return NULL;
+}
+
+struct ibv_cq *mlx5_create_cq(struct ibv_context *context, int cqe,
+ struct ibv_comp_channel *channel,
+ int comp_vector)
+{
+ struct ibv_exp_cq_init_attr attr;
+
+ read_init_vars(to_mctx(context));
+ attr.comp_mask = 0;
+ return create_cq(context, cqe, channel, comp_vector, &attr);
+}
+
+struct ibv_cq *mlx5_create_cq_ex(struct ibv_context *context,
+ int cqe,
+ struct ibv_comp_channel *channel,
+ int comp_vector,
+ struct ibv_exp_cq_init_attr *attr)
+{
+ return create_cq(context, cqe, channel, comp_vector, attr);
+}
+
+int mlx5_resize_cq(struct ibv_cq *ibcq, int cqe)
+{
+ struct mlx5_cq *cq = to_mcq(ibcq);
+ struct mlx5_resize_cq_resp resp;
+ struct mlx5_resize_cq cmd;
+ struct mlx5_context *mctx = to_mctx(ibcq->context);
+ int err;
+
+ if (cqe < 0) {
+ errno = EINVAL;
+ return errno;
+ }
+
+ memset(&cmd, 0, sizeof(cmd));
+ memset(&resp, 0, sizeof(resp));
+
+ if (((long long)cqe * 64) > INT_MAX)
+ return EINVAL;
+
+ mlx5_lock(&cq->lock);
+ cq->active_cqes = cq->ibv_cq.cqe;
+ if (cq->active_buf == &cq->buf_a)
+ cq->resize_buf = &cq->buf_b;
+ else
+ cq->resize_buf = &cq->buf_a;
+
+ cqe = align_queue_size(cqe + 1);
+ if (cqe == ibcq->cqe + 1) {
+ cq->resize_buf = NULL;
+ err = 0;
+ goto out;
+ }
+
+ /* currently we don't change cqe size */
+ cq->resize_cqe_sz = cq->cqe_sz;
+ cq->resize_cqes = cqe;
+ err = mlx5_alloc_cq_buf(mctx, cq, cq->resize_buf, cq->resize_cqes, cq->resize_cqe_sz);
+ if (err) {
+ cq->resize_buf = NULL;
+ errno = ENOMEM;
+ goto out;
+ }
+
+ cmd.buf_addr = (uintptr_t)cq->resize_buf->buf;
+ cmd.cqe_size = cq->resize_cqe_sz;
+
+ err = ibv_cmd_resize_cq(ibcq, cqe - 1, &cmd.ibv_cmd, sizeof(cmd),
+ &resp.ibv_resp, sizeof(resp));
+ if (err)
+ goto out_buf;
+
+ mlx5_cq_resize_copy_cqes(cq);
+ mlx5_free_cq_buf(mctx, cq->active_buf);
+ cq->active_buf = cq->resize_buf;
+ cq->ibv_cq.cqe = cqe - 1;
+ cq->cq_log_size = mlx5_ilog2(cqe);
+ mlx5_update_cons_index(cq);
+ mlx5_unlock(&cq->lock);
+ cq->resize_buf = NULL;
+ return 0;
+
+out_buf:
+ mlx5_free_cq_buf(mctx, cq->resize_buf);
+ cq->resize_buf = NULL;
+
+out:
+ mlx5_unlock(&cq->lock);
+ return err;
+}
+
+int mlx5_destroy_cq(struct ibv_cq *cq)
+{
+ int ret;
+
+ ret = ibv_cmd_destroy_cq(cq);
+ if (ret)
+ return ret;
+
+ mlx5_free_db(to_mctx(cq->context), to_mcq(cq)->dbrec);
+ mlx5_free_cq_buf(to_mctx(cq->context), to_mcq(cq)->active_buf);
+ free(to_mcq(cq));
+
+ return 0;
+}
+
+struct ibv_srq *mlx5_create_srq(struct ibv_pd *pd,
+ struct ibv_srq_init_attr *attr)
+{
+ struct mlx5_create_srq cmd;
+ struct mlx5_create_srq_resp resp;
+ struct mlx5_srq *srq;
+ int ret;
+ struct mlx5_context *ctx;
+ int max_sge;
+ struct ibv_srq *ibsrq;
+
+ ctx = to_mctx(pd->context);
+ srq = calloc(1, sizeof *srq);
+ if (!srq) {
+ fprintf(stderr, "%s-%d:\n", __func__, __LINE__);
+ return NULL;
+ }
+ ibsrq = (struct ibv_srq *)&srq->vsrq;
+ srq->is_xsrq = 0;
+
+ memset(&cmd, 0, sizeof cmd);
+ if (mlx5_spinlock_init(&srq->lock, !mlx5_single_threaded)) {
+ fprintf(stderr, "%s-%d:\n", __func__, __LINE__);
+ goto err;
+ }
+
+ if (attr->attr.max_wr > ctx->max_srq_recv_wr) {
+ fprintf(stderr, "%s-%d:max_wr %d, max_srq_recv_wr %d\n", __func__, __LINE__,
+ attr->attr.max_wr, ctx->max_srq_recv_wr);
+ errno = EINVAL;
+ goto err;
+ }
+
+	/*
+	 * This calculation does not consider the required control segments.
+	 * The final calculation is redone later; computing it this way
+	 * avoids overflowing the intermediate variables.
+	 */
+ max_sge = ctx->max_rq_desc_sz / sizeof(struct mlx5_wqe_data_seg);
+ if (attr->attr.max_sge > max_sge) {
+		fprintf(stderr, "%s-%d:max_sge %d, max supported %d\n", __func__, __LINE__,
+			attr->attr.max_sge, max_sge);
+ errno = EINVAL;
+ goto err;
+ }
+
+ srq->max = align_queue_size(attr->attr.max_wr + 1);
+ srq->max_gs = attr->attr.max_sge;
+ srq->counter = 0;
+
+ if (mlx5_alloc_srq_buf(pd->context, srq)) {
+ fprintf(stderr, "%s-%d:\n", __func__, __LINE__);
+ goto err;
+ }
+
+ srq->db = mlx5_alloc_dbrec(to_mctx(pd->context));
+ if (!srq->db) {
+ fprintf(stderr, "%s-%d:\n", __func__, __LINE__);
+ goto err_free;
+ }
+
+ *srq->db = 0;
+
+ cmd.buf_addr = (uintptr_t) srq->buf.buf;
+ cmd.db_addr = (uintptr_t) srq->db;
+ srq->wq_sig = srq_sig_enabled(pd->context);
+ if (srq->wq_sig)
+ cmd.flags = MLX5_SRQ_FLAG_SIGNATURE;
+
+ attr->attr.max_sge = srq->max_gs;
+ pthread_mutex_lock(&ctx->srq_table_mutex);
+ ret = ibv_cmd_create_srq(pd, ibsrq, attr, &cmd.ibv_cmd, sizeof(cmd),
+ &resp.ibv_resp, sizeof(resp));
+ if (ret)
+ goto err_db;
+
+ ret = mlx5_store_srq(ctx, resp.srqn, srq);
+ if (ret)
+ goto err_destroy;
+
+ pthread_mutex_unlock(&ctx->srq_table_mutex);
+
+ srq->srqn = resp.srqn;
+ srq->rsc.rsn = resp.srqn;
+ srq->rsc.type = MLX5_RSC_TYPE_SRQ;
+
+ return ibsrq;
+
+err_destroy:
+ ibv_cmd_destroy_srq(ibsrq);
+
+err_db:
+ pthread_mutex_unlock(&ctx->srq_table_mutex);
+ mlx5_free_db(to_mctx(pd->context), srq->db);
+
+err_free:
+ free(srq->wrid);
+ mlx5_free_buf(&srq->buf);
+
+err:
+ free(srq);
+
+ return NULL;
+}
+
+int mlx5_modify_srq(struct ibv_srq *srq,
+ struct ibv_srq_attr *attr,
+ int attr_mask)
+{
+ struct ibv_modify_srq cmd;
+
+ if (srq->handle == LEGACY_XRC_SRQ_HANDLE)
+ srq = (struct ibv_srq *)(((struct ibv_srq_legacy *) srq)->ibv_srq);
+
+ return ibv_cmd_modify_srq(srq, attr, attr_mask, &cmd, sizeof cmd);
+}
+
+int mlx5_query_srq(struct ibv_srq *srq,
+ struct ibv_srq_attr *attr)
+{
+	struct ibv_query_srq cmd;
+
+	if (srq->handle == LEGACY_XRC_SRQ_HANDLE)
+ srq = (struct ibv_srq *)(((struct ibv_srq_legacy *) srq)->ibv_srq);
+
+ return ibv_cmd_query_srq(srq, attr, &cmd, sizeof cmd);
+}
+
+int mlx5_destroy_srq(struct ibv_srq *srq)
+{
+ struct ibv_srq *legacy_srq = NULL;
+ struct mlx5_srq *msrq;
+ struct mlx5_context *ctx = to_mctx(srq->context);
+ int ret;
+
+ if (srq->handle == LEGACY_XRC_SRQ_HANDLE) {
+ legacy_srq = srq;
+ srq = (struct ibv_srq *)(((struct ibv_srq_legacy *) srq)->ibv_srq);
+ }
+
+ msrq = to_msrq(srq);
+ ret = ibv_cmd_destroy_srq(srq);
+ if (ret)
+ return ret;
+
+ if (ctx->cqe_version && msrq->is_xsrq)
+ mlx5_clear_uidx(ctx, msrq->rsc.rsn);
+ else
+ mlx5_clear_srq(ctx, msrq->srqn);
+
+ mlx5_free_db(ctx, msrq->db);
+ mlx5_free_buf(&msrq->buf);
+ free(msrq->wrid);
+ free(msrq);
+
+ if (legacy_srq)
+ free(legacy_srq);
+
+ return 0;
+}
+
+static int sq_overhead(struct ibv_exp_qp_init_attr *attr, struct mlx5_qp *qp,
+ int *inl_atom)
+{
+ int size1 = 0;
+ int size2 = 0;
+ int atom = 0;
+
+ switch (attr->qp_type) {
+ case IBV_QPT_RC:
+ size1 = sizeof(struct mlx5_wqe_ctrl_seg) +
+ sizeof(struct mlx5_wqe_umr_ctrl_seg) +
+ sizeof(struct mlx5_mkey_seg) +
+ sizeof(struct mlx5_seg_repeat_block);
+ size2 = sizeof(struct mlx5_wqe_ctrl_seg) +
+ sizeof(struct mlx5_wqe_raddr_seg);
+
+ if (qp->enable_atomics) {
+ if ((attr->comp_mask & IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG) &&
+ (attr->max_atomic_arg > 4))
+ atom = 4 * attr->max_atomic_arg;
+ /* TBD: change when we support data pointer args */
+ if (inl_atom)
+ *inl_atom = max(sizeof(struct mlx5_wqe_atomic_seg), atom);
+ }
+ break;
+
+ case IBV_QPT_UC:
+ size2 = sizeof(struct mlx5_wqe_ctrl_seg) +
+ sizeof(struct mlx5_wqe_raddr_seg);
+ break;
+
+ case IBV_QPT_UD:
+ size1 = sizeof(struct mlx5_wqe_ctrl_seg) +
+ sizeof(struct mlx5_wqe_umr_ctrl_seg) +
+ sizeof(struct mlx5_mkey_seg) +
+ sizeof(struct mlx5_seg_repeat_block);
+
+ size2 = sizeof(struct mlx5_wqe_ctrl_seg) +
+ sizeof(struct mlx5_wqe_datagram_seg);
+ break;
+
+ case IBV_QPT_XRC:
+ case IBV_QPT_XRC_SEND:
+ case IBV_QPT_XRC_RECV:
+ size2 = sizeof(struct mlx5_wqe_ctrl_seg) +
+ sizeof(struct mlx5_wqe_xrc_seg) +
+ sizeof(struct mlx5_wqe_raddr_seg);
+ if (qp->enable_atomics) {
+ if ((attr->comp_mask & IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG) &&
+ (attr->max_atomic_arg > 4))
+ atom = 4 * attr->max_atomic_arg;
+ /* TBD: change when we support data pointer args */
+ if (inl_atom)
+ *inl_atom = max(sizeof(struct mlx5_wqe_atomic_seg), atom);
+ }
+ break;
+
+ case IBV_EXP_QPT_DC_INI:
+ size1 = sizeof(struct mlx5_wqe_ctrl_seg) +
+ sizeof(struct mlx5_wqe_umr_ctrl_seg) +
+ sizeof(struct mlx5_mkey_seg) +
+ sizeof(struct mlx5_seg_repeat_block);
+
+ size2 = sizeof(struct mlx5_wqe_ctrl_seg) +
+ sizeof(struct mlx5_wqe_datagram_seg) +
+ sizeof(struct mlx5_wqe_raddr_seg);
+ if (qp->enable_atomics) {
+ if ((attr->comp_mask & IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG) &&
+ (attr->max_atomic_arg > 4))
+ atom = 4 * attr->max_atomic_arg;
+ /* TBD: change when we support data pointer args */
+ if (inl_atom)
+ *inl_atom = max(sizeof(struct mlx5_wqe_atomic_seg), atom);
+ }
+ break;
+
+ case IBV_QPT_RAW_ETH:
+ size2 = sizeof(struct mlx5_wqe_ctrl_seg) +
+ sizeof(struct mlx5_wqe_eth_seg);
+ break;
+
+ default:
+ return -EINVAL;
+ }
+
+ if (qp->umr_en)
+ return max(size1, size2);
+ else
+ return size2;
+}
+
+static int mlx5_max4(int t1, int t2, int t3, int t4)
+{
+ if (t1 < t2)
+ t1 = t2;
+
+ if (t1 < t3)
+ t1 = t3;
+
+ if (t1 < t4)
+ return t4;
+
+ return t1;
+}
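For reference, the arithmetic that mlx5_calc_send_wqe() builds on can be sketched in isolation: the send WQE size is the largest of four candidate layouts, rounded up to the 64-byte send basic block. The `sketch_*` names below are hypothetical stand-ins, not driver symbols, and MLX5_SEND_WQE_BB is assumed to be 64.

```c
#include <assert.h>

#define SKETCH_SEND_WQE_BB 64	/* assumed value of MLX5_SEND_WQE_BB */

/* align() as used by the driver: round val up to a power-of-two multiple */
static int sketch_align(int val, int a)
{
	return (val + a - 1) & ~(a - 1);
}

/* same max-of-four selection as mlx5_max4() */
static int sketch_max4(int t1, int t2, int t3, int t4)
{
	if (t1 < t2)
		t1 = t2;
	if (t1 < t3)
		t1 = t3;
	if (t1 < t4)
		t1 = t4;
	return t1;
}

/* tot_size = max of the four layout candidates, aligned to one BB */
static int sketch_send_wqe_size(int t1, int t2, int t3, int t4)
{
	return sketch_align(sketch_max4(t1, t2, t3, t4), SKETCH_SEND_WQE_BB);
}
```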
+
+static int mlx5_calc_send_wqe(struct mlx5_context *ctx,
+ struct ibv_exp_qp_init_attr *attr,
+ struct mlx5_qp *qp)
+{
+ int inl_size = 0;
+ int max_gather;
+ int tot_size;
+ int overhead;
+ int inl_umr = 0;
+ int inl_atom = 0;
+ int t1 = 0;
+ int t2 = 0;
+ int t3 = 0;
+ int t4 = 0;
+
+ overhead = sq_overhead(attr, qp, &inl_atom);
+ if (overhead < 0)
+ return overhead;
+
+ if ((attr->comp_mask & IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG))
+ qp->max_atomic_arg = attr->max_atomic_arg;
+ if ((attr->comp_mask & IBV_EXP_QP_INIT_ATTR_MAX_INL_KLMS) &&
+ attr->max_inl_send_klms)
+ inl_umr = attr->max_inl_send_klms * 16;
+
+ if (attr->cap.max_inline_data) {
+ inl_size = align(sizeof(struct mlx5_wqe_inl_data_seg) +
+ attr->cap.max_inline_data, 16);
+ }
+
+ max_gather = (ctx->max_sq_desc_sz - overhead) /
+ sizeof(struct mlx5_wqe_data_seg);
+ if (attr->cap.max_send_sge > max_gather)
+ return -EINVAL;
+
+ if (inl_atom)
+ t1 = overhead + sizeof(struct mlx5_wqe_data_seg) + inl_atom;
+
+ t2 = overhead + attr->cap.max_send_sge * sizeof(struct mlx5_wqe_data_seg);
+
+ t3 = overhead + inl_umr;
+ t4 = overhead + inl_size;
+
+ tot_size = mlx5_max4(t1, t2, t3, t4);
+
+ if (tot_size > ctx->max_sq_desc_sz)
+ return -EINVAL;
+
+ return align(tot_size, MLX5_SEND_WQE_BB);
+}
+
+static int mlx5_calc_rcv_wqe(struct mlx5_context *ctx,
+ struct ibv_exp_qp_init_attr *attr,
+ struct mlx5_qp *qp)
+{
+ int size;
+ int num_scatter;
+
+ if (attr->srq)
+ return 0;
+
+ num_scatter = max(attr->cap.max_recv_sge, 1);
+ size = sizeof(struct mlx5_wqe_data_seg) * num_scatter;
+ if (qp->ctrl_seg.wq_sig)
+ size += sizeof(struct mlx5_rwqe_sig);
+
+ if (size < 0 || size > ctx->max_rq_desc_sz)
+ return -EINVAL;
+
+ size = mlx5_round_up_power_of_two(size);
+
+ return size;
+}
+
+static int get_send_sge(struct ibv_exp_qp_init_attr *attr, int wqe_size, struct mlx5_qp *qp)
+{
+ int max_sge;
+ int overhead = sq_overhead(attr, qp, NULL);
+
+ if (attr->qp_type == IBV_QPT_RC)
+ max_sge = (min(wqe_size, 512) -
+ sizeof(struct mlx5_wqe_ctrl_seg) -
+ sizeof(struct mlx5_wqe_raddr_seg)) /
+ sizeof(struct mlx5_wqe_data_seg);
+ else if (attr->qp_type == IBV_EXP_QPT_DC_INI)
+ max_sge = (min(wqe_size, 512) -
+ sizeof(struct mlx5_wqe_ctrl_seg) -
+ sizeof(struct mlx5_wqe_datagram_seg) -
+ sizeof(struct mlx5_wqe_raddr_seg)) /
+ sizeof(struct mlx5_wqe_data_seg);
+ else if (attr->qp_type == IBV_QPT_XRC)
+ max_sge = (min(wqe_size, 512) -
+ sizeof(struct mlx5_wqe_ctrl_seg) -
+ sizeof(struct mlx5_wqe_xrc_seg) -
+ sizeof(struct mlx5_wqe_raddr_seg)) /
+ sizeof(struct mlx5_wqe_data_seg);
+ else
+ max_sge = (wqe_size - overhead) /
+ sizeof(struct mlx5_wqe_data_seg);
+
+	return min(max_sge, (wqe_size - overhead) /
+		   sizeof(struct mlx5_wqe_data_seg));
+}
+
+static int mlx5_calc_sq_size(struct mlx5_context *ctx,
+ struct ibv_exp_qp_init_attr *attr,
+ struct mlx5_qp *qp)
+{
+ int wqe_size;
+ int wq_size;
+#ifdef MLX5_DEBUG
+ FILE *fp = ctx->dbg_fp;
+#endif
+
+ if (!attr->cap.max_send_wr)
+ return 0;
+
+ wqe_size = mlx5_calc_send_wqe(ctx, attr, qp);
+ if (wqe_size < 0) {
+ mlx5_dbg(fp, MLX5_DBG_QP, "\n");
+ return wqe_size;
+ }
+
+ if (attr->qp_type == IBV_EXP_QPT_DC_INI &&
+ wqe_size > ctx->max_desc_sz_sq_dc) {
+ mlx5_dbg(fp, MLX5_DBG_QP, "\n");
+ return -EINVAL;
+ } else if (wqe_size > ctx->max_sq_desc_sz) {
+ mlx5_dbg(fp, MLX5_DBG_QP, "\n");
+ return -EINVAL;
+ }
+
+ qp->data_seg.max_inline_data = wqe_size - sq_overhead(attr, qp, NULL) -
+ sizeof(struct mlx5_wqe_inl_data_seg);
+ attr->cap.max_inline_data = qp->data_seg.max_inline_data;
+
+ /*
+ * to avoid overflow, we limit max_send_wr so
+ * that the multiplication will fit in int
+ */
+ if (attr->cap.max_send_wr > 0x7fffffff / ctx->max_sq_desc_sz) {
+ mlx5_dbg(fp, MLX5_DBG_QP, "\n");
+ return -ENOMEM;
+ }
+
+ wq_size = mlx5_round_up_power_of_two(attr->cap.max_send_wr * wqe_size);
+ qp->sq.wqe_cnt = wq_size / MLX5_SEND_WQE_BB;
+ if (qp->sq.wqe_cnt > ctx->max_send_wqebb) {
+ mlx5_dbg(fp, MLX5_DBG_QP, "\n");
+ return -ENOMEM;
+ }
+
+ qp->sq.wqe_shift = mlx5_ilog2(MLX5_SEND_WQE_BB);
+ qp->sq.max_gs = get_send_sge(attr, wqe_size, qp);
+ if (qp->sq.max_gs < attr->cap.max_send_sge)
+ return -ENOMEM;
+
+ attr->cap.max_send_sge = qp->sq.max_gs;
+ if (qp->umr_en) {
+ qp->max_inl_send_klms = ((attr->qp_type == IBV_QPT_RC) ||
+ (attr->qp_type == IBV_EXP_QPT_DC_INI)) ?
+ attr->max_inl_send_klms : 0;
+ attr->max_inl_send_klms = qp->max_inl_send_klms;
+ }
+ qp->sq.max_post = wq_size / wqe_size;
+
+ return wq_size;
+}
+
+static int qpt_has_rq(enum ibv_qp_type qpt)
+{
+ switch (qpt) {
+ case IBV_QPT_RC:
+ case IBV_QPT_UC:
+ case IBV_QPT_UD:
+ case IBV_QPT_RAW_ETH:
+ return 1;
+
+ case IBV_QPT_XRC:
+ case IBV_QPT_XRC_SEND:
+ case IBV_QPT_XRC_RECV:
+ case IBV_EXP_QPT_DC_INI:
+ return 0;
+ }
+ return 0;
+}
+
+static int mlx5_calc_rwq_size(struct mlx5_context *ctx,
+ struct mlx5_rwq *rwq,
+ struct ibv_exp_wq_init_attr *attr)
+{
+ int wqe_size;
+ int wq_size;
+ int num_scatter;
+ int scat_spc;
+ int mp_rq = !!(attr->comp_mask & IBV_EXP_CREATE_WQ_MP_RQ);
+
+ if (!attr->max_recv_wr)
+ return -EINVAL;
+
+ /* TBD: check caps for RQ */
+ num_scatter = max(attr->max_recv_sge, 1);
+ wqe_size = sizeof(struct mlx5_wqe_data_seg) * num_scatter +
+ /* In case of mp_rq the WQE format is like SRQ.
+ * Need to add the extra octword even when we don't
+ * use linked list.
+ */
+ (mp_rq ? sizeof(struct mlx5_wqe_srq_next_seg) : 0);
+
+ if (rwq->wq_sig)
+ wqe_size += sizeof(struct mlx5_rwqe_sig);
+
+ if (wqe_size <= 0 || wqe_size > ctx->max_rq_desc_sz)
+ return -EINVAL;
+
+ wqe_size = mlx5_round_up_power_of_two(wqe_size);
+ wq_size = mlx5_round_up_power_of_two(attr->max_recv_wr) * wqe_size;
+ wq_size = max(wq_size, MLX5_SEND_WQE_BB);
+ rwq->rq.wqe_cnt = wq_size / wqe_size;
+ rwq->rq.wqe_shift = mlx5_ilog2(wqe_size);
+ rwq->rq.max_post = 1 << mlx5_ilog2(wq_size / wqe_size);
+ scat_spc = wqe_size -
+ ((rwq->wq_sig) ? sizeof(struct mlx5_rwqe_sig) : 0) -
+ (mp_rq ? sizeof(struct mlx5_wqe_srq_next_seg) : 0);
+ rwq->rq.max_gs = scat_spc / sizeof(struct mlx5_wqe_data_seg);
+ return wq_size;
+}
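A worked example of the receive-queue sizing above, under the assumption that one scatter entry (struct mlx5_wqe_data_seg) is 16 bytes: 3 SGEs give a 48-byte WQE, rounded up to 64; 100 requested WRs round up to 128 queue entries. The `sketch_*` helpers are hypothetical stand-ins for mlx5_round_up_power_of_two() and mlx5_ilog2().

```c
#include <assert.h>

/* hypothetical stand-in for mlx5_round_up_power_of_two() */
static int sketch_round_up_pow2(int v)
{
	int r = 1;

	while (r < v)
		r <<= 1;
	return r;
}

/* hypothetical stand-in for mlx5_ilog2(): floor(log2(v)) for v > 0 */
static int sketch_ilog2(int v)
{
	int r = -1;

	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}
```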
+
+static int mlx5_calc_rq_size(struct mlx5_context *ctx,
+ struct ibv_exp_qp_init_attr *attr,
+ struct mlx5_qp *qp)
+{
+ int wqe_size;
+ int wq_size;
+ int scat_spc;
+#ifdef MLX5_DEBUG
+ FILE *fp = ctx->dbg_fp;
+#endif
+
+ if (!attr->cap.max_recv_wr || !qpt_has_rq(attr->qp_type))
+ return 0;
+
+ if (attr->cap.max_recv_wr > ctx->max_recv_wr) {
+ mlx5_dbg(fp, MLX5_DBG_QP, "\n");
+ return -EINVAL;
+ }
+
+ wqe_size = mlx5_calc_rcv_wqe(ctx, attr, qp);
+ if (wqe_size < 0 || wqe_size > ctx->max_rq_desc_sz) {
+ mlx5_dbg(fp, MLX5_DBG_QP, "\n");
+ return -EINVAL;
+ }
+
+ wq_size = mlx5_round_up_power_of_two(attr->cap.max_recv_wr) * wqe_size;
+ if (wqe_size) {
+ wq_size = max(wq_size, MLX5_SEND_WQE_BB);
+ qp->rq.wqe_cnt = wq_size / wqe_size;
+ qp->rq.wqe_shift = mlx5_ilog2(wqe_size);
+ qp->rq.max_post = 1 << mlx5_ilog2(wq_size / wqe_size);
+ scat_spc = wqe_size -
+ ((qp->ctrl_seg.wq_sig) ? sizeof(struct mlx5_rwqe_sig) : 0);
+ qp->rq.max_gs = scat_spc / sizeof(struct mlx5_wqe_data_seg);
+ } else {
+ qp->rq.wqe_cnt = 0;
+ qp->rq.wqe_shift = 0;
+ qp->rq.max_post = 0;
+ qp->rq.max_gs = 0;
+ }
+ return wq_size;
+}
+
+static int mlx5_calc_wq_size(struct mlx5_context *ctx,
+ struct ibv_exp_qp_init_attr *attr,
+ struct mlx5_qp *qp)
+{
+ int ret;
+ int result;
+
+ ret = mlx5_calc_sq_size(ctx, attr, qp);
+ if (ret < 0)
+ return ret;
+
+ result = ret;
+ ret = mlx5_calc_rq_size(ctx, attr, qp);
+ if (ret < 0)
+ return ret;
+
+ result += ret;
+
+ qp->sq.offset = ret;
+ qp->rq.offset = 0;
+
+ return result;
+}
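The combined size computed above places the receive queue at the start of the work-queue buffer and the send queue directly after it, so qp->sq.offset equals the RQ size. A minimal sketch of that layout, with hypothetical names:

```c
#include <assert.h>

struct sketch_layout {
	int rq_offset;	/* RQ always starts at offset 0 */
	int sq_offset;	/* SQ follows the RQ */
	int total;	/* total buffer size returned to the caller */
};

static struct sketch_layout sketch_wq_layout(int sq_size, int rq_size)
{
	struct sketch_layout l;

	l.rq_offset = 0;
	l.sq_offset = rq_size;
	l.total = rq_size + sq_size;
	return l;
}
```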
+
+static void map_uuar(struct ibv_context *context, struct mlx5_qp *qp,
+ int uuar_index)
+{
+ struct mlx5_context *ctx = to_mctx(context);
+
+ qp->gen_data.bf = &ctx->bfs[uuar_index];
+}
+
+static const char *qptype2key(enum ibv_qp_type type)
+{
+ switch (type) {
+ case IBV_QPT_RC: return "HUGE_RC";
+ case IBV_QPT_UC: return "HUGE_UC";
+ case IBV_QPT_UD: return "HUGE_UD";
+#ifdef _NOT_EXISTS_IN_OFED_2_0
+ case IBV_QPT_RAW_PACKET: return "HUGE_RAW_ETH";
+#endif
+ default: return "HUGE_NA";
+ }
+}
+
+static void mlx5_free_rwq_buf(struct mlx5_rwq *rwq, struct ibv_context *context)
+{
+ struct mlx5_context *ctx = to_mctx(context);
+
+ mlx5_free_actual_buf(ctx, &rwq->buf);
+ if (rwq->consumed_strides_counter)
+ free(rwq->consumed_strides_counter);
+
+ free(rwq->rq.wrid);
+}
+
+static int mlx5_alloc_rwq_buf(struct ibv_context *context,
+ struct mlx5_rwq *rwq,
+ int size,
+ enum mlx5_rsc_type rsc_type)
+{
+ int err;
+ enum mlx5_alloc_type default_alloc_type = MLX5_ALLOC_TYPE_PREFER_CONTIG;
+
+ rwq->rq.wrid = malloc(rwq->rq.wqe_cnt * sizeof(uint64_t));
+ if (!rwq->rq.wrid) {
+ errno = ENOMEM;
+ return -1;
+ }
+
+ if (rsc_type == MLX5_RSC_TYPE_MP_RWQ) {
+		rwq->consumed_strides_counter = calloc(rwq->rq.wqe_cnt, sizeof(uint32_t));
+ if (!rwq->consumed_strides_counter) {
+ errno = ENOMEM;
+ goto free_wr_id;
+ }
+ }
+
+ rwq->buf.numa_req.valid = 1;
+ rwq->buf.numa_req.numa_id = to_mctx(context)->numa_id;
+ err = mlx5_alloc_prefered_buf(to_mctx(context), &rwq->buf,
+ align(rwq->buf_size, to_mdev
+ (context->device)->page_size),
+ to_mdev(context->device)->page_size,
+ default_alloc_type,
+ MLX5_RWQ_PREFIX);
+
+ if (err) {
+ errno = ENOMEM;
+ goto free_strd_cnt;
+ }
+
+ return 0;
+
+free_strd_cnt:
+ if (rwq->consumed_strides_counter)
+ free(rwq->consumed_strides_counter);
+
+free_wr_id:
+ free(rwq->rq.wrid);
+
+ return -1;
+}
+
+static int mlx5_alloc_qp_buf(struct ibv_context *context,
+ struct ibv_exp_qp_init_attr *attr,
+ struct mlx5_qp *qp,
+ int size)
+{
+ int err;
+ enum mlx5_alloc_type alloc_type;
+ enum mlx5_alloc_type default_alloc_type = MLX5_ALLOC_TYPE_PREFER_CONTIG;
+ const char *qp_huge_key;
+
+	if (qp->sq.wqe_cnt) {
+		qp->sq.wrid = malloc(qp->sq.wqe_cnt * sizeof(*qp->sq.wrid));
+		if (!qp->sq.wrid) {
+			errno = ENOMEM;
+			return -1;
+		}
+	}
+
+ qp->gen_data.wqe_head = malloc(qp->sq.wqe_cnt * sizeof(*qp->gen_data.wqe_head));
+ if (!qp->gen_data.wqe_head) {
+ errno = ENOMEM;
+ err = -1;
+ goto ex_wrid;
+ }
+
+ if (qp->rq.wqe_cnt) {
+ qp->rq.wrid = malloc(qp->rq.wqe_cnt * sizeof(uint64_t));
+ if (!qp->rq.wrid) {
+ errno = ENOMEM;
+ err = -1;
+ goto ex_wrid;
+ }
+ }
+
+	/* compatibility support */
+ qp_huge_key = qptype2key(qp->verbs_qp.qp.qp_type);
+ if (mlx5_use_huge(context, qp_huge_key))
+ default_alloc_type = MLX5_ALLOC_TYPE_HUGE;
+
+ mlx5_get_alloc_type(context, MLX5_QP_PREFIX, &alloc_type,
+ default_alloc_type);
+
+ qp->buf.numa_req.valid = 1;
+ qp->buf.numa_req.numa_id = to_mctx(context)->numa_id;
+ err = mlx5_alloc_prefered_buf(to_mctx(context), &qp->buf,
+ align(qp->buf_size, to_mdev
+ (context->device)->page_size),
+ to_mdev(context->device)->page_size,
+ alloc_type,
+ MLX5_QP_PREFIX);
+
+ if (err) {
+ err = -ENOMEM;
+ goto ex_wrid;
+ }
+
+ memset(qp->buf.buf, 0, qp->buf_size);
+
+ if (attr->qp_type == IBV_QPT_RAW_ETH) {
+ /* For Raw Ethernet QP, allocate a separate buffer for the SQ */
+ err = mlx5_alloc_prefered_buf(to_mctx(context), &qp->sq_buf,
+ align(qp->sq_buf_size, to_mdev
+ (context->device)->page_size),
+ to_mdev(context->device)->page_size,
+ alloc_type,
+ MLX5_QP_PREFIX);
+ if (err) {
+ err = -ENOMEM;
+ goto rq_buf;
+ }
+
+		memset(qp->sq_buf.buf, 0, qp->sq_buf_size);
+ }
+
+ return 0;
+rq_buf:
+ mlx5_free_actual_buf(to_mctx(qp->verbs_qp.qp.context), &qp->buf);
+ex_wrid:
+ if (qp->rq.wrid)
+ free(qp->rq.wrid);
+
+ if (qp->gen_data.wqe_head)
+ free(qp->gen_data.wqe_head);
+
+ if (qp->sq.wrid)
+ free(qp->sq.wrid);
+
+ return err;
+}
+
+static void mlx5_free_qp_buf(struct mlx5_qp *qp)
+{
+ struct mlx5_context *ctx = to_mctx(qp->verbs_qp.qp.context);
+
+ mlx5_free_actual_buf(ctx, &qp->buf);
+
+ if (qp->sq_buf.buf)
+ mlx5_free_actual_buf(ctx, &qp->sq_buf);
+
+ if (qp->rq.wrid)
+ free(qp->rq.wrid);
+
+ if (qp->gen_data.wqe_head)
+ free(qp->gen_data.wqe_head);
+
+ if (qp->sq.wrid)
+ free(qp->sq.wrid);
+}
+
+static void update_caps(struct ibv_context *context)
+{
+ struct mlx5_context *ctx;
+ struct ibv_exp_device_attr attr;
+ int err;
+
+ ctx = to_mctx(context);
+ if (ctx->info.valid)
+ return;
+
+ attr.comp_mask = IBV_EXP_DEVICE_ATTR_RESERVED - 1;
+ err = ibv_exp_query_device(context, &attr);
+ if (err)
+ return;
+
+ ctx->info.exp_atomic_cap = attr.exp_atomic_cap;
+ ctx->info.valid = 1;
+ ctx->max_sge = attr.max_sge;
+ if (attr.comp_mask & IBV_EXP_DEVICE_ATTR_UMR)
+ ctx->max_send_wqe_inline_klms =
+ attr.umr_caps.max_send_wqe_inline_klms;
+ if (attr.comp_mask & IBV_EXP_DEVICE_ATTR_EXT_ATOMIC_ARGS)
+ ctx->info.bit_mask_log_atomic_arg_sizes =
+ attr.ext_atom.log_atomic_arg_sizes;
+
+ return;
+}
+
+static inline int is_xrc_tgt(int type)
+{
+ return (type == IBV_QPT_XRC_RECV);
+}
+
+static struct ibv_qp *create_qp(struct ibv_context *context,
+ struct ibv_exp_qp_init_attr *attrx,
+ int is_exp)
+{
+ struct mlx5_create_qp cmd;
+ struct mlx5_create_qp_resp resp;
+ struct mlx5_exp_create_qp cmdx;
+ struct mlx5_exp_create_qp_resp respx;
+ struct mlx5_qp *qp;
+ int ret;
+ struct mlx5_context *ctx = to_mctx(context);
+ struct ibv_qp *ibqp;
+ struct mlx5_drv_create_qp *drv;
+ struct mlx5_exp_drv_create_qp *drvx;
+ int lib_cmd_size;
+ int drv_cmd_size;
+ int lib_resp_size;
+ int drv_resp_size;
+ int thread_safe = !mlx5_single_threaded;
+ void *_cmd;
+ void *_resp;
+#ifdef MLX5_DEBUG
+ FILE *fp = ctx->dbg_fp;
+#endif
+
+	/* Use the experimental path when the driver passes experimental data */
+ is_exp = is_exp || (ctx->cqe_version != 0) ||
+ (attrx->qp_type == IBV_QPT_RAW_ETH);
+
+ update_caps(context);
+ qp = calloc(1, sizeof(*qp));
+ if (!qp) {
+ mlx5_dbg(fp, MLX5_DBG_QP, "\n");
+ return NULL;
+ }
+ ibqp = (struct ibv_qp *)&qp->verbs_qp;
+
+ if (is_exp) {
+ memset(&cmdx, 0, sizeof(cmdx));
+ memset(&respx, 0, sizeof(respx));
+ drv = (struct mlx5_drv_create_qp *)(void *)(&cmdx.drv);
+ drvx = &cmdx.drv;
+ drvx->size_of_prefix = offsetof(struct mlx5_exp_drv_create_qp, prefix_reserved);
+ _cmd = &cmdx.ibv_cmd;
+ _resp = &respx.ibv_resp;
+ lib_cmd_size = sizeof(cmdx.ibv_cmd);
+ drv_cmd_size = sizeof(*drvx);
+ lib_resp_size = sizeof(respx.ibv_resp);
+ drv_resp_size = sizeof(respx) - sizeof(respx.ibv_resp);
+ } else {
+ memset(&cmd, 0, sizeof(cmd));
+ drv = &cmd.drv;
+ _cmd = &cmd.ibv_cmd;
+ _resp = &resp.ibv_resp;
+ lib_cmd_size = sizeof(cmd.ibv_cmd);
+ drv_cmd_size = sizeof(*drv);
+ lib_resp_size = sizeof(resp.ibv_resp);
+ drv_resp_size = sizeof(resp) - sizeof(resp.ibv_resp);
+ }
+
+ if ((attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_RX_HASH) && attrx->qp_type == IBV_QPT_RAW_ETH) {
+ if (attrx->send_cq || attrx->recv_cq || attrx->srq ||
+ attrx->cap.max_inline_data || attrx->cap.max_recv_sge ||
+ attrx->cap.max_recv_wr || attrx->cap.max_send_sge ||
+ attrx->cap.max_send_wr) {
+ errno = EINVAL;
+ goto err;
+ }
+
+ ret = ibv_exp_cmd_create_qp(context, &qp->verbs_qp,
+ sizeof(qp->verbs_qp),
+ attrx,
+ _cmd,
+ lib_cmd_size,
+ 0,
+ _resp,
+ lib_resp_size,
+ 0, 1);
+ if (ret)
+ goto err;
+
+ qp->rx_qp = 1;
+ return ibqp;
+ }
+
+ qp->ctrl_seg.wq_sig = qp_sig_enabled(context);
+ if (qp->ctrl_seg.wq_sig)
+ drv->flags |= MLX5_QP_FLAG_SIGNATURE;
+
+ if ((ctx->info.exp_atomic_cap == IBV_EXP_ATOMIC_HCA_REPLY_BE) &&
+ (attrx->exp_create_flags & IBV_EXP_QP_CREATE_ATOMIC_BE_REPLY)) {
+ qp->enable_atomics = 1;
+ } else if ((ctx->info.exp_atomic_cap == IBV_EXP_ATOMIC_HCA) ||
+ (ctx->info.exp_atomic_cap == IBV_EXP_ATOMIC_GLOB)) {
+ qp->enable_atomics = 1;
+ }
+
+ if ((attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_MAX_INL_KLMS) &&
+ (!(attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_CREATE_FLAGS) ||
+ !(attrx->exp_create_flags & IBV_EXP_QP_CREATE_UMR))) {
+ errno = EINVAL;
+ goto err;
+ }
+
+ if ((attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_CREATE_FLAGS) &&
+ (attrx->exp_create_flags & IBV_EXP_QP_CREATE_UMR) &&
+ !(attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_MAX_INL_KLMS)) {
+ errno = EINVAL;
+ goto err;
+ }
+
+ if ((attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_CREATE_FLAGS) &&
+ (attrx->exp_create_flags & IBV_EXP_QP_CREATE_UMR))
+ qp->umr_en = 1;
+
+ if (attrx->cap.max_send_sge > ctx->max_sge) {
+ errno = EINVAL;
+ goto err;
+ }
+
+ if (qp->umr_en && (attrx->max_inl_send_klms >
+ ctx->max_send_wqe_inline_klms)) {
+ errno = EINVAL;
+ goto err;
+ }
+
+ ret = mlx5_calc_wq_size(ctx, attrx, qp);
+ if (ret < 0) {
+ errno = -ret;
+ goto err;
+ }
+
+ if (attrx->qp_type == IBV_QPT_RAW_ETH) {
+ qp->buf_size = qp->sq.offset;
+ qp->sq_buf_size = ret - qp->buf_size;
+ } else {
+ qp->buf_size = ret;
+ qp->sq_buf_size = 0;
+ }
+
+ if (attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_CREATE_FLAGS)
+ qp->gen_data.create_flags = attrx->exp_create_flags & IBV_EXP_QP_CREATE_MASK;
+
+ if (mlx5_alloc_qp_buf(context, attrx, qp, ret)) {
+ mlx5_dbg(fp, MLX5_DBG_QP, "\n");
+ goto err;
+ }
+
+ if (attrx->qp_type == IBV_QPT_RAW_ETH) {
+ qp->gen_data.sqstart = qp->sq_buf.buf;
+ qp->gen_data.sqend = qp->sq_buf.buf +
+ (qp->sq.wqe_cnt << qp->sq.wqe_shift);
+ } else {
+ qp->gen_data.sqstart = qp->buf.buf + qp->sq.offset;
+ qp->gen_data.sqend = qp->buf.buf + qp->sq.offset +
+ (qp->sq.wqe_cnt << qp->sq.wqe_shift);
+ }
+ qp->odp_data.pd = to_mpd(attrx->pd);
+
+ mlx5_init_qp_indices(qp);
+
+ /* Check if UAR provided by resource domain */
+ if (attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_RES_DOMAIN) {
+ struct mlx5_res_domain *res_domain = to_mres_domain(attrx->res_domain);
+
+ drvx->exp.comp_mask |= MLX5_EXP_CREATE_QP_MASK_WC_UAR_IDX;
+ if (res_domain->send_db) {
+ drvx->exp.wc_uar_index = res_domain->send_db->wc_uar->uar_idx;
+ qp->gen_data.bf = &res_domain->send_db->bf;
+ } else {
+ /* If we didn't allocate dedicated BF for this resource
+ * domain we'll ask the kernel to provide UUAR that uses
+ * DB only (no BF)
+ */
+ drvx->exp.wc_uar_index = MLX5_EXP_CREATE_QP_DB_ONLY_UUAR;
+ }
+ thread_safe = (res_domain->attr.thread_model == IBV_EXP_THREAD_SAFE);
+ }
+ if (mlx5_lock_init(&qp->sq.lock, thread_safe, mlx5_get_locktype()) ||
+ mlx5_lock_init(&qp->rq.lock, thread_safe, mlx5_get_locktype()))
+ goto err_free_qp_buf;
+ qp->gen_data.model_flags = thread_safe ? MLX5_QP_MODEL_FLAG_THREAD_SAFE : 0;
+
+ qp->gen_data.db = mlx5_alloc_dbrec(ctx);
+ if (!qp->gen_data.db) {
+ mlx5_dbg(fp, MLX5_DBG_QP, "\n");
+ goto err_free_qp_buf;
+ }
+
+ qp->gen_data.db[MLX5_RCV_DBR] = 0;
+ qp->gen_data.db[MLX5_SND_DBR] = 0;
+ qp->rq.buff = qp->buf.buf + qp->rq.offset;
+ qp->sq.buff = qp->buf.buf + qp->sq.offset;
+ qp->rq.db = &qp->gen_data.db[MLX5_RCV_DBR];
+ qp->sq.db = &qp->gen_data.db[MLX5_SND_DBR];
+
+ drv->buf_addr = (uintptr_t) qp->buf.buf;
+ if (attrx->qp_type == IBV_QPT_RAW_ETH) {
+ drvx->exp.sq_buf_addr = (uintptr_t)qp->sq_buf.buf;
+ drvx->exp.flags |= MLX5_EXP_CREATE_QP_MULTI_PACKET_WQE_REQ_FLAG;
+ drvx->exp.comp_mask |= MLX5_EXP_CREATE_QP_MASK_SQ_BUFF_ADD |
+ MLX5_EXP_CREATE_QP_MASK_FLAGS_IDX;
+ }
+ drv->db_addr = (uintptr_t) qp->gen_data.db;
+ drv->sq_wqe_count = qp->sq.wqe_cnt;
+ drv->rq_wqe_count = qp->rq.wqe_cnt;
+ drv->rq_wqe_shift = qp->rq.wqe_shift;
+ if (!ctx->cqe_version) {
+ pthread_mutex_lock(&ctx->rsc_table_mutex);
+ } else if (!is_xrc_tgt(attrx->qp_type)) {
+		ret = mlx5_store_uidx(ctx, qp);
+		if (ret < 0) {
+			mlx5_dbg(fp, MLX5_DBG_QP, "Couldn't find free user index\n");
+			goto err_rq_db;
+		}
+		drvx->exp.uidx = ret;
+		drvx->exp.comp_mask |= MLX5_EXP_CREATE_QP_MASK_UIDX;
+ }
+
+ ret = ibv_exp_cmd_create_qp(context, &qp->verbs_qp,
+ sizeof(qp->verbs_qp),
+ attrx,
+ _cmd,
+ lib_cmd_size,
+ drv_cmd_size,
+ _resp,
+ lib_resp_size,
+ drv_resp_size,
+ /* Force experimental */
+ is_exp);
+ if (ret) {
+ mlx5_dbg(fp, MLX5_DBG_QP, "ret %d\n", ret);
+ goto err_free_uidx;
+ }
+
+ if (!ctx->cqe_version) {
+ ret = mlx5_store_rsc(ctx, ibqp->qp_num, qp);
+ if (ret) {
+ mlx5_dbg(fp, MLX5_DBG_QP, "ret %d\n", ret);
+ goto err_destroy;
+ }
+ pthread_mutex_unlock(&ctx->rsc_table_mutex);
+ }
+
+ /* Update related BF mapping when uuar not provided by resource domain */
+ if (!(attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_RES_DOMAIN) ||
+ !to_mres_domain(attrx->res_domain)->send_db) {
+ if (is_exp)
+ map_uuar(context, qp, respx.uuar_index);
+ else
+ map_uuar(context, qp, resp.uuar_index);
+ }
+ qp->gen_data_warm.pattern = MLX5_QP_PATTERN;
+
+ qp->rq.max_post = qp->rq.wqe_cnt;
+ if (attrx->sq_sig_all)
+ qp->sq_signal_bits = MLX5_WQE_CTRL_CQ_UPDATE;
+ else
+ qp->sq_signal_bits = 0;
+
+ attrx->cap.max_send_wr = qp->sq.max_post;
+ attrx->cap.max_recv_wr = qp->rq.max_post;
+ attrx->cap.max_recv_sge = qp->rq.max_gs;
+ qp->rsc.type = MLX5_RSC_TYPE_QP;
+ if (is_exp && (drvx->exp.comp_mask & MLX5_EXP_CREATE_QP_MASK_UIDX))
+ qp->rsc.rsn = drvx->exp.uidx;
+ else
+ qp->rsc.rsn = ibqp->qp_num;
+
+ if (is_exp && (respx.exp.comp_mask & MLX5_EXP_CREATE_QP_RESP_MASK_FLAGS_IDX) &&
+ (respx.exp.flags & MLX5_EXP_CREATE_QP_RESP_MULTI_PACKET_WQE_FLAG))
+ qp->gen_data.model_flags |= MLX5_QP_MODEL_MULTI_PACKET_WQE;
+
+ mlx5_build_ctrl_seg_data(qp, ibqp->qp_num);
+ qp->gen_data_warm.qp_type = ibqp->qp_type;
+ mlx5_update_post_send_one(qp, ibqp->state, ibqp->qp_type);
+
+ return ibqp;
+
+err_destroy:
+ ibv_cmd_destroy_qp(ibqp);
+err_free_uidx:
+ if (!ctx->cqe_version)
+ pthread_mutex_unlock(&to_mctx(context)->rsc_table_mutex);
+ else if (!is_xrc_tgt(attrx->qp_type))
+ mlx5_clear_uidx(ctx, drvx->exp.uidx);
+err_rq_db:
+ mlx5_free_db(to_mctx(context), qp->gen_data.db);
+
+err_free_qp_buf:
+ mlx5_free_qp_buf(qp);
+err:
+ free(qp);
+
+ return NULL;
+}
+
+struct ibv_qp *mlx5_drv_create_qp(struct ibv_context *context,
+ struct ibv_qp_init_attr_ex *attrx)
+{
+ if (attrx->comp_mask >= IBV_QP_INIT_ATTR_RESERVED) {
+ errno = EINVAL;
+ return NULL;
+ }
+
+ return create_qp(context, (struct ibv_exp_qp_init_attr *)attrx, 1);
+}
+
+struct ibv_qp *mlx5_exp_create_qp(struct ibv_context *context,
+ struct ibv_exp_qp_init_attr *attrx)
+{
+ return create_qp(context, attrx, 1);
+}
+
+struct ibv_qp *mlx5_create_qp(struct ibv_pd *pd,
+ struct ibv_qp_init_attr *attr)
+{
+ struct ibv_exp_qp_init_attr attrx;
+ struct ibv_qp *qp;
+ int copy_sz = offsetof(struct ibv_qp_init_attr, xrc_domain);
+
+ memset(&attrx, 0, sizeof(attrx));
+ memcpy(&attrx, attr, copy_sz);
+ attrx.comp_mask = IBV_QP_INIT_ATTR_PD;
+ attrx.pd = pd;
+ qp = create_qp(pd->context, &attrx, 0);
+ if (qp)
+ memcpy(attr, &attrx, copy_sz);
+
+ return qp;
+}
+
+struct ibv_exp_rwq_ind_table *mlx5_exp_create_rwq_ind_table(struct ibv_context *context,
+ struct ibv_exp_rwq_ind_table_init_attr *init_attr)
+{
+ struct ibv_exp_create_rwq_ind_table *cmd;
+ struct mlx5_exp_create_rwq_ind_table_resp resp;
+ struct ibv_exp_rwq_ind_table *ind_table;
+ uint32_t required_tbl_size;
+ int num_tbl_entries;
+ int cmd_size;
+ int err;
+
+ num_tbl_entries = 1 << init_attr->log_ind_tbl_size;
+ /* Data must be u64 aligned */
+ required_tbl_size = (num_tbl_entries * sizeof(uint32_t)) < sizeof(uint64_t) ?
+ sizeof(uint64_t) : (num_tbl_entries * sizeof(uint32_t));
+
+ cmd_size = required_tbl_size + sizeof(*cmd);
+ cmd = calloc(1, cmd_size);
+ if (!cmd)
+ return NULL;
+ memset(&resp, 0, sizeof(resp));
+
+ ind_table = calloc(1, sizeof(*ind_table));
+ if (!ind_table)
+ goto free_cmd;
+
+ err = ibv_exp_cmd_create_rwq_ind_table(context, init_attr, ind_table, cmd,
+ cmd_size, cmd_size, &resp.ibv_resp, sizeof(resp.ibv_resp),
+ sizeof(resp));
+ if (err)
+ goto err;
+
+ free(cmd);
+ return ind_table;
+
+err:
+ free(ind_table);
+free_cmd:
+ free(cmd);
+ return NULL;
+}
+
+int mlx5_exp_destroy_rwq_ind_table(struct ibv_exp_rwq_ind_table *rwq_ind_table)
+{
+	int ret;
+
+	ret = ibv_exp_cmd_destroy_rwq_ind_table(rwq_ind_table);
+
+ if (ret)
+ return ret;
+
+ free(rwq_ind_table);
+ return 0;
+}
+
+struct ibv_exp_wq *mlx5_exp_create_wq(struct ibv_context *context,
+ struct ibv_exp_wq_init_attr *attr)
+{
+ struct mlx5_exp_create_wq cmd;
+ struct mlx5_exp_create_wq_resp resp;
+ int err;
+ struct mlx5_rwq *rwq;
+ struct mlx5_context *ctx = to_mctx(context);
+ int ret;
+ int thread_safe = !mlx5_single_threaded;
+ struct ibv_exp_device_attr device_attr;
+ enum mlx5_rsc_type rsc_type;
+#ifdef MLX5_DEBUG
+ FILE *fp = ctx->dbg_fp;
+#endif
+
+ if (attr->wq_type != IBV_EXP_WQT_RQ)
+ return NULL;
+
+ memset(&cmd, 0, sizeof(cmd));
+ memset(&resp, 0, sizeof(resp));
+
+ rwq = calloc(1, sizeof(*rwq));
+ if (!rwq)
+ return NULL;
+
+ rwq->wq_sig = rwq_sig_enabled(context);
+ if (rwq->wq_sig)
+ cmd.drv.flags = MLX5_RWQ_FLAG_SIGNATURE;
+
+ ret = mlx5_calc_rwq_size(ctx, rwq, attr);
+ if (ret < 0) {
+ errno = -ret;
+ goto err;
+ }
+
+ rwq->buf_size = ret;
+ if (attr->comp_mask & IBV_EXP_CREATE_WQ_MP_RQ) {
+		/* Make sure the requested mp_rq values are supported by the library */
+ if ((attr->mp_rq.single_stride_log_num_of_bytes > MLX5_MP_RQ_MAX_LOG_STRIDE_SIZE) ||
+ (attr->mp_rq.single_wqe_log_num_of_strides > MLX5_MP_RQ_MAX_LOG_NUM_STRIDES) ||
+ (attr->mp_rq.use_shift & ~MLX5_MP_RQ_SUPPORTED_SHIFTS)) {
+ errno = EINVAL;
+ goto err;
+ }
+ rsc_type = MLX5_RSC_TYPE_MP_RWQ;
+ rwq->mp_rq_stride_size = 1 << attr->mp_rq.single_stride_log_num_of_bytes;
+ rwq->mp_rq_strides_in_wqe = 1 << attr->mp_rq.single_wqe_log_num_of_strides;
+ if (attr->mp_rq.use_shift == IBV_EXP_MP_RQ_2BYTES_SHIFT)
+ rwq->mp_rq_packet_padding = 2;
+ } else {
+ rsc_type = MLX5_RSC_TYPE_RWQ;
+ }
+ if (mlx5_alloc_rwq_buf(context, rwq, ret, rsc_type))
+ goto err;
+
+ mlx5_init_rwq_indices(rwq);
+
+ if (attr->comp_mask & IBV_EXP_CREATE_WQ_RES_DOMAIN)
+ thread_safe = (to_mres_domain(attr->res_domain)->attr.thread_model == IBV_EXP_THREAD_SAFE);
+
+ rwq->model_flags = thread_safe ? MLX5_WQ_MODEL_FLAG_THREAD_SAFE : 0;
+
+ memset(&device_attr, 0, sizeof(device_attr));
+ device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
+ ret = ibv_exp_query_device(context, &device_attr);
+ /* Check if RX offloads supported */
+ if (!ret && (device_attr.comp_mask & IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS) &&
+ (device_attr.exp_device_cap_flags & IBV_EXP_DEVICE_RX_CSUM_IP_PKT))
+ rwq->model_flags |= MLX5_WQ_MODEL_RX_CSUM_IP_OK_IP_NON_TCP_UDP;
+
+ if (mlx5_lock_init(&rwq->rq.lock, thread_safe, mlx5_get_locktype()))
+ goto err_free_rwq_buf;
+
+ rwq->db = mlx5_alloc_dbrec(ctx);
+ if (!rwq->db)
+ goto err_free_rwq_buf;
+
+ rwq->db[MLX5_RCV_DBR] = 0;
+ rwq->db[MLX5_SND_DBR] = 0;
+ rwq->rq.buff = rwq->buf.buf + rwq->rq.offset;
+ rwq->rq.db = &rwq->db[MLX5_RCV_DBR];
+ rwq->pattern = MLX5_WQ_PATTERN;
+
+ cmd.drv.buf_addr = (uintptr_t)rwq->buf.buf;
+ cmd.drv.db_addr = (uintptr_t)rwq->db;
+ cmd.drv.rq_wqe_count = rwq->rq.wqe_cnt;
+ cmd.drv.rq_wqe_shift = rwq->rq.wqe_shift;
+	ret = mlx5_store_uidx(ctx, rwq);
+	if (ret < 0) {
+		mlx5_dbg(fp, MLX5_DBG_QP, "Couldn't find free user index\n");
+		goto err_free_db_rec;
+	}
+	cmd.drv.user_index = ret;
+
+ err = ibv_exp_cmd_create_wq(context, attr, &rwq->wq, &cmd.ibv_cmd,
+ sizeof(cmd.ibv_cmd),
+ sizeof(cmd),
+ &resp.ibv_resp, sizeof(resp.ibv_resp),
+ sizeof(resp));
+ if (err)
+ goto err_create;
+
+ rwq->rsc.type = rsc_type;
+ rwq->rsc.rsn = cmd.drv.user_index;
+
+ return &rwq->wq;
+
+err_create:
+ mlx5_clear_uidx(ctx, cmd.drv.user_index);
+err_free_db_rec:
+ mlx5_free_db(to_mctx(context), rwq->db);
+err_free_rwq_buf:
+ mlx5_free_rwq_buf(rwq, context);
+err:
+ free(rwq);
+ return NULL;
+}
+
+int mlx5_exp_modify_wq(struct ibv_exp_wq *wq,
+ struct ibv_exp_wq_attr *attr)
+{
+ struct mlx5_exp_modify_wq cmd;
+ struct mlx5_rwq *rwq = to_mrwq(wq);
+ int ret;
+
+ if ((attr->attr_mask & IBV_EXP_WQ_ATTR_STATE) &&
+ attr->wq_state == IBV_EXP_WQS_RDY) {
+ if ((attr->attr_mask & IBV_EXP_WQ_ATTR_CURR_STATE) &&
+ attr->curr_wq_state != wq->state)
+ return -EINVAL;
+
+ if (wq->state == IBV_EXP_WQS_RESET) {
+ mlx5_lock(&to_mcq(wq->cq)->lock);
+ __mlx5_cq_clean(to_mcq(wq->cq),
+ rwq->rsc.rsn, wq->srq ? to_msrq(wq->srq) : NULL);
+ mlx5_unlock(&to_mcq(wq->cq)->lock);
+ mlx5_init_rwq_indices(rwq);
+ rwq->db[MLX5_RCV_DBR] = 0;
+ rwq->db[MLX5_SND_DBR] = 0;
+ }
+ }
+
+ memset(&cmd, 0, sizeof(cmd));
+ ret = ibv_exp_cmd_modify_wq(wq, attr, &cmd.ibv_cmd, sizeof(cmd));
+ return ret;
+}
+
+int mlx5_exp_destroy_wq(struct ibv_exp_wq *wq)
+{
+ struct mlx5_rwq *rwq = to_mrwq(wq);
+ int ret;
+
+ ret = ibv_exp_cmd_destroy_wq(wq);
+ if (ret) {
+ pthread_mutex_unlock(&to_mctx(wq->context)->rsc_table_mutex);
+ return ret;
+ }
+
+ mlx5_lock(&to_mcq(wq->cq)->lock);
+ __mlx5_cq_clean(to_mcq(wq->cq), rwq->rsc.rsn,
+ wq->srq ? to_msrq(wq->srq) : NULL);
+ mlx5_unlock(&to_mcq(wq->cq)->lock);
+
+ mlx5_clear_uidx(to_mctx(wq->context), rwq->rsc.rsn);
+ mlx5_free_db(to_mctx(wq->context), rwq->db);
+ mlx5_free_rwq_buf(rwq, wq->context);
+ free(rwq);
+
+ return 0;
+}
+
+struct ibv_exp_dct *mlx5_create_dct(struct ibv_context *context,
+ struct ibv_exp_dct_init_attr *attr)
+{
+ struct mlx5_create_dct cmd;
+ struct mlx5_create_dct_resp resp;
+ struct mlx5_destroy_dct cmdd;
+ struct mlx5_destroy_dct_resp respd;
+ int err;
+ struct mlx5_dct *dct;
+ struct mlx5_context *ctx = to_mctx(context);
+#ifdef MLX5_DEBUG
+ FILE *fp = to_mctx(context)->dbg_fp;
+#endif
+
+ memset(&cmd, 0, sizeof(cmd));
+ memset(&cmdd, 0, sizeof(cmdd));
+ memset(&resp, 0, sizeof(resp));
+ dct = calloc(1, sizeof(*dct));
+ if (!dct)
+ return NULL;
+
+ if (ctx->cqe_version) {
+		err = mlx5_store_uidx(ctx, dct);
+		if (err < 0) {
+			mlx5_dbg(fp, MLX5_DBG_QP, "Couldn't find free user index\n");
+			goto ex_err;
+		}
+		cmd.drv.uidx = err;
+ } else {
+ pthread_mutex_lock(&ctx->rsc_table_mutex);
+ }
+
+ err = ibv_exp_cmd_create_dct(context, &dct->ibdct, attr, &cmd.ibv_cmd,
+ sizeof(cmd.ibv_cmd),
+ sizeof(cmd) - sizeof(cmd.ibv_cmd),
+ &resp.ibv_resp, sizeof(resp.ibv_resp),
+ sizeof(resp) - sizeof(resp.ibv_resp));
+ if (err)
+ goto err_uidx;
+
+ dct->ibdct.handle = resp.ibv_resp.dct_handle;
+ dct->ibdct.dct_num = resp.ibv_resp.dct_num;
+ dct->ibdct.pd = attr->pd;
+ dct->ibdct.cq = attr->cq;
+ dct->ibdct.srq = attr->srq;
+
+ if (!ctx->cqe_version) {
+ err = mlx5_store_rsc(ctx, dct->ibdct.dct_num, dct);
+ if (err)
+ goto err_destroy;
+
+ pthread_mutex_unlock(&ctx->rsc_table_mutex);
+ }
+ dct->rsc.type = MLX5_RSC_TYPE_DCT;
+ dct->rsc.rsn = ctx->cqe_version ? cmd.drv.uidx :
+ resp.ibv_resp.dct_num;
+
+ return &dct->ibdct;
+
+err_destroy:
+ if (ibv_exp_cmd_destroy_dct(context, &dct->ibdct,
+ &cmdd.ibv_cmd,
+ sizeof(cmdd.ibv_cmd),
+ sizeof(cmdd) - sizeof(cmdd.ibv_cmd),
+ &respd.ibv_resp, sizeof(respd.ibv_resp),
+ sizeof(respd) - sizeof(respd.ibv_resp)))
+		fprintf(stderr, "failed to destroy DCT\n");
+err_uidx:
+ if (ctx->cqe_version)
+ mlx5_clear_uidx(ctx, cmd.drv.uidx);
+ else
+ pthread_mutex_unlock(&ctx->rsc_table_mutex);
+ex_err:
+ free(dct);
+ return NULL;
+}
+
+int mlx5_destroy_dct(struct ibv_exp_dct *dct)
+{
+ struct mlx5_destroy_dct cmd;
+ struct mlx5_destroy_dct_resp resp;
+ int err;
+ struct mlx5_dct *mdct = to_mdct(dct);
+ struct mlx5_context *ctx = to_mctx(dct->context);
+
+ memset(&cmd, 0, sizeof(cmd));
+ if (!ctx->cqe_version)
+ pthread_mutex_lock(&ctx->rsc_table_mutex);
+ cmd.ibv_cmd.dct_handle = dct->handle;
+ err = ibv_exp_cmd_destroy_dct(dct->context, dct,
+ &cmd.ibv_cmd,
+ sizeof(cmd.ibv_cmd),
+ sizeof(cmd) - sizeof(cmd.ibv_cmd),
+ &resp.ibv_resp, sizeof(resp.ibv_resp),
+ sizeof(resp) - sizeof(resp.ibv_resp));
+ if (err)
+ goto ex_err;
+
+ mlx5_cq_clean(to_mcq(dct->cq), mdct->rsc.rsn, to_msrq(dct->srq));
+ if (ctx->cqe_version) {
+ mlx5_clear_uidx(ctx, mdct->rsc.rsn);
+ } else {
+ mlx5_clear_rsc(to_mctx(dct->context), dct->dct_num);
+ pthread_mutex_unlock(&ctx->rsc_table_mutex);
+ }
+
+ free(mdct);
+ return 0;
+
+ex_err:
+ if (!ctx->cqe_version)
+ pthread_mutex_unlock(&ctx->rsc_table_mutex);
+ return err;
+}
+
+int mlx5_query_dct(struct ibv_exp_dct *dct, struct ibv_exp_dct_attr *attr)
+{
+ struct mlx5_query_dct cmd;
+ struct mlx5_query_dct_resp resp;
+ int err;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.ibv_cmd.dct_handle = dct->handle;
+ err = ibv_exp_cmd_query_dct(dct->context, &cmd.ibv_cmd,
+ sizeof(cmd.ibv_cmd),
+ sizeof(cmd) - sizeof(cmd.ibv_cmd),
+ &resp.ibv_resp, sizeof(resp.ibv_resp),
+ sizeof(resp) - sizeof(resp.ibv_resp),
+ attr);
+ if (err)
+ goto out;
+
+ attr->cq = dct->cq;
+ attr->pd = dct->pd;
+ attr->srq = dct->srq;
+
+out:
+ return err;
+}
+
+int mlx5_arm_dct(struct ibv_exp_dct *dct, struct ibv_exp_arm_attr *attr)
+{
+ struct mlx5_arm_dct cmd;
+ struct mlx5_arm_dct_resp resp;
+ int err;
+
+ memset(&cmd, 0, sizeof(cmd));
+ memset(&resp, 0, sizeof(resp));
+ cmd.ibv_cmd.dct_handle = dct->handle;
+ err = ibv_exp_cmd_arm_dct(dct->context, attr, &cmd.ibv_cmd,
+ sizeof(cmd.ibv_cmd),
+ sizeof(cmd) - sizeof(cmd.ibv_cmd),
+ &resp.ibv_resp, sizeof(resp.ibv_resp),
+ sizeof(resp) - sizeof(resp.ibv_resp));
+ return err;
+}
+
+static void mlx5_lock_cqs(struct ibv_qp *qp)
+{
+ struct mlx5_cq *send_cq = to_mcq(qp->send_cq);
+ struct mlx5_cq *recv_cq = to_mcq(qp->recv_cq);
+
+ if (send_cq && recv_cq) {
+ if (send_cq == recv_cq) {
+ mlx5_lock(&send_cq->lock);
+ } else if (send_cq->cqn < recv_cq->cqn) {
+ mlx5_lock(&send_cq->lock);
+ mlx5_lock(&recv_cq->lock);
+ } else {
+ mlx5_lock(&recv_cq->lock);
+ mlx5_lock(&send_cq->lock);
+ }
+ } else if (send_cq) {
+ mlx5_lock(&send_cq->lock);
+ } else if (recv_cq) {
+ mlx5_lock(&recv_cq->lock);
+ }
+}
+
+static void mlx5_unlock_cqs(struct ibv_qp *qp)
+{
+ struct mlx5_cq *send_cq = to_mcq(qp->send_cq);
+ struct mlx5_cq *recv_cq = to_mcq(qp->recv_cq);
+
+ if (send_cq && recv_cq) {
+ if (send_cq == recv_cq) {
+ mlx5_unlock(&send_cq->lock);
+ } else if (send_cq->cqn < recv_cq->cqn) {
+ mlx5_unlock(&recv_cq->lock);
+ mlx5_unlock(&send_cq->lock);
+ } else {
+ mlx5_unlock(&send_cq->lock);
+ mlx5_unlock(&recv_cq->lock);
+ }
+ } else if (send_cq) {
+ mlx5_unlock(&send_cq->lock);
+ } else if (recv_cq) {
+ mlx5_unlock(&recv_cq->lock);
+ }
+}
+
+int mlx5_destroy_qp(struct ibv_qp *ibqp)
+{
+ struct mlx5_qp *qp = to_mqp(ibqp);
+ struct mlx5_context *ctx = to_mctx(ibqp->context);
+ int ret;
+
+ if (qp->rx_qp) {
+ ret = ibv_cmd_destroy_qp(ibqp);
+ if (ret)
+ return ret;
+ goto free;
+ }
+
+ if (!ctx->cqe_version)
+ pthread_mutex_lock(&ctx->rsc_table_mutex);
+
+ ret = ibv_cmd_destroy_qp(ibqp);
+ if (ret) {
+ if (!ctx->cqe_version)
+ pthread_mutex_unlock(&to_mctx(ibqp->context)->rsc_table_mutex);
+ return ret;
+ }
+
+ mlx5_lock_cqs(ibqp);
+
+ __mlx5_cq_clean(to_mcq(ibqp->recv_cq), qp->rsc.rsn,
+ ibqp->srq ? to_msrq(ibqp->srq) : NULL);
+ if (ibqp->send_cq != ibqp->recv_cq)
+ __mlx5_cq_clean(to_mcq(ibqp->send_cq), qp->rsc.rsn, NULL);
+
+ if (!ctx->cqe_version)
+ mlx5_clear_rsc(ctx, ibqp->qp_num);
+
+ mlx5_unlock_cqs(ibqp);
+ if (!ctx->cqe_version)
+ pthread_mutex_unlock(&ctx->rsc_table_mutex);
+ else if (!is_xrc_tgt(ibqp->qp_type))
+ mlx5_clear_uidx(ctx, qp->rsc.rsn);
+
+ mlx5_free_db(ctx, qp->gen_data.db);
+ mlx5_free_qp_buf(qp);
+free:
+ free(qp);
+
+ return 0;
+}
+
+int mlx5_query_qp(struct ibv_qp *ibqp, struct ibv_qp_attr *attr,
+ int attr_mask, struct ibv_qp_init_attr *init_attr)
+{
+ struct ibv_query_qp cmd;
+ struct mlx5_qp *qp = to_mqp(ibqp);
+ int ret;
+
+ if (qp->rx_qp)
+ return -ENOSYS;
+
+ ret = ibv_cmd_query_qp(ibqp, attr, attr_mask, init_attr, &cmd, sizeof(cmd));
+ if (ret)
+ return ret;
+
+ init_attr->cap.max_send_wr = qp->sq.max_post;
+ init_attr->cap.max_send_sge = qp->sq.max_gs;
+ init_attr->cap.max_inline_data = qp->data_seg.max_inline_data;
+
+ attr->cap = init_attr->cap;
+
+ return 0;
+}
+
+int mlx5_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
+ int attr_mask)
+{
+ struct mlx5_qp *mqp = to_mqp(qp);
+ struct ibv_port_attr port_attr;
+ struct ibv_modify_qp cmd;
+ int ret;
+ uint32_t *db;
+
+ if (attr_mask & IBV_QP_PORT) {
+ ret = ibv_query_port(qp->context, attr->port_num,
+ &port_attr);
+ if (ret)
+ return ret;
+ mqp->link_layer = port_attr.link_layer;
+ }
+
+ if (to_mqp(qp)->rx_qp)
+ return -ENOSYS;
+
+ ret = ibv_cmd_modify_qp(qp, attr, attr_mask, &cmd, sizeof(cmd));
+
+ if (!ret &&
+ (attr_mask & IBV_QP_STATE) &&
+ attr->qp_state == IBV_QPS_RESET) {
+ if (qp->recv_cq) {
+ mlx5_cq_clean(to_mcq(qp->recv_cq), mqp->rsc.rsn,
+ qp->srq ? to_msrq(qp->srq) : NULL);
+ }
+ if (qp->send_cq != qp->recv_cq && qp->send_cq)
+ mlx5_cq_clean(to_mcq(qp->send_cq), mqp->rsc.rsn, NULL);
+
+ mlx5_init_qp_indices(mqp);
+ db = mqp->gen_data.db;
+ db[MLX5_RCV_DBR] = 0;
+ db[MLX5_SND_DBR] = 0;
+ }
+ if (!ret && (attr_mask & IBV_QP_STATE))
+ mlx5_update_post_send_one(mqp, qp->state, qp->qp_type);
+
+ if (!ret &&
+ (attr_mask & IBV_QP_STATE) &&
+ attr->qp_state == IBV_QPS_RTR &&
+ qp->qp_type == IBV_QPT_RAW_ETH) {
+ mlx5_lock(&mqp->rq.lock);
+ mqp->gen_data.db[MLX5_RCV_DBR] = htonl(mqp->rq.head & 0xffff);
+ mlx5_unlock(&mqp->rq.lock);
+ }
+
+ return ret;
+}
+
+#ifndef s6_addr32
+#define s6_addr32 __u6_addr.__u6_addr32
+#endif
+
+static inline int ipv6_addr_v4mapped(const struct in6_addr *a)
+{
+ return ((a->s6_addr32[0] | a->s6_addr32[1]) |
+ (a->s6_addr32[2] ^ htonl(0x0000ffff))) == 0UL ||
+ /* IPv4 encoded multicast addresses */
+ (a->s6_addr32[0] == htonl(0xff0e0000) &&
+ ((a->s6_addr32[1] |
+ (a->s6_addr32[2] ^ htonl(0x0000ffff))) == 0UL));
+}
+
+struct ibv_ah *mlx5_create_ah_common(struct ibv_pd *pd,
+ struct ibv_ah_attr *attr,
+ uint8_t link_layer,
+ int gid_type)
+{
+ struct mlx5_ah *ah;
+ struct mlx5_context *ctx = to_mctx(pd->context);
+ struct mlx5_wqe_av *wqe;
+ uint32_t tmp;
+ uint8_t grh;
+
+ if (unlikely(attr->port_num < 1 || attr->port_num > ctx->num_ports)) {
+ errno = EINVAL;
+ return NULL;
+ }
+
+ if (unlikely(!attr->dlid) &&
+ (link_layer != IBV_LINK_LAYER_ETHERNET)) {
+ errno = EINVAL;
+ return NULL;
+ }
+
+ if (unlikely(!attr->is_global) &&
+ (link_layer == IBV_LINK_LAYER_ETHERNET)) {
+ errno = EINVAL;
+ return NULL;
+ }
+
+ ah = calloc(1, sizeof *ah);
+ if (unlikely(!ah)) {
+ errno = ENOMEM;
+ return NULL;
+ }
+ wqe = &ah->av;
+
+ wqe->base.stat_rate_sl = (attr->static_rate << 4) | attr->sl;
+
+ if (link_layer == IBV_LINK_LAYER_ETHERNET) {
+ if (gid_type == IBV_EXP_ROCE_V2_GID_TYPE)
+ wqe->base.rlid = htons(ctx->rroce_udp_sport_min);
+ grh = 0;
+ } else {
+ wqe->base.fl_mlid = attr->src_path_bits & 0x7f;
+ wqe->base.rlid = htons(attr->dlid);
+ grh = 1;
+ }
+
+ if (attr->is_global) {
+ wqe->base.dqp_dct = htonl(MLX5_EXTENDED_UD_AV);
+ wqe->grh_sec.tclass = attr->grh.traffic_class;
+ if ((attr->grh.hop_limit < 2) &&
+ (link_layer == IBV_LINK_LAYER_ETHERNET) &&
+ (gid_type != IBV_EXP_IB_ROCE_V1_GID_TYPE))
+ wqe->grh_sec.hop_limit = 0xff;
+ else
+ wqe->grh_sec.hop_limit = attr->grh.hop_limit;
+ tmp = htonl((grh << 30) |
+ ((attr->grh.sgid_index & 0xff) << 20) |
+ (attr->grh.flow_label & 0xfffff));
+ wqe->grh_sec.grh_gid_fl = tmp;
+ memcpy(wqe->grh_sec.rgid, attr->grh.dgid.raw, 16);
+ if ((link_layer == IBV_LINK_LAYER_ETHERNET) &&
+ (gid_type != IBV_EXP_IB_ROCE_V1_GID_TYPE) &&
+ ipv6_addr_v4mapped((struct in6_addr *)attr->grh.dgid.raw))
+ memset(wqe->grh_sec.rgid, 0, 12);
+ } else if (!ctx->compact_av) {
+ wqe->base.dqp_dct = htonl(MLX5_EXTENDED_UD_AV);
+ }
+
+ return &ah->ibv_ah;
+}
+
+struct ibv_ah *mlx5_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr)
+{
+ struct ibv_exp_port_attr port_attr;
+
+ port_attr.comp_mask = IBV_EXP_QUERY_PORT_ATTR_MASK1;
+ port_attr.mask1 = IBV_EXP_QUERY_PORT_LINK_LAYER;
+
+ if (ibv_exp_query_port(pd->context, attr->port_num, &port_attr))
+ return NULL;
+
+ return mlx5_create_ah_common(pd, attr, port_attr.link_layer,
+ IBV_EXP_IB_ROCE_V1_GID_TYPE);
+}
+
+struct ibv_ah *mlx5_exp_create_ah(struct ibv_pd *pd,
+ struct ibv_exp_ah_attr *attr_ex)
+{
+ struct mlx5_ah *mah;
+ struct ibv_ah *ah;
+ struct ibv_exp_port_attr port_attr;
+ struct ibv_exp_gid_attr gid_attr;
+
+ gid_attr.comp_mask = IBV_EXP_QUERY_GID_ATTR_TYPE;
+ if (ibv_exp_query_gid_attr(pd->context, attr_ex->port_num, attr_ex->grh.sgid_index,
+ &gid_attr))
+ return NULL;
+
+ port_attr.comp_mask = IBV_EXP_QUERY_PORT_ATTR_MASK1;
+ port_attr.mask1 = IBV_EXP_QUERY_PORT_LINK_LAYER;
+
+ if (ibv_exp_query_port(pd->context, attr_ex->port_num, &port_attr))
+ return NULL;
+
+ ah = mlx5_create_ah_common(pd, (struct ibv_ah_attr *)attr_ex,
+ port_attr.link_layer, gid_attr.type);
+
+ if (!ah)
+ return NULL;
+
+ mah = to_mah(ah);
+
+ /* ll_address.len == 0 means no ll address given */
+ if (attr_ex->comp_mask & IBV_EXP_AH_ATTR_LL &&
+ 0 != attr_ex->ll_address.len) {
+ if (LL_ADDRESS_ETH != attr_ex->ll_address.type ||
+ port_attr.link_layer != IBV_LINK_LAYER_ETHERNET)
+ goto err;
+
+ /* link layer is ethernet */
+ if (6 != attr_ex->ll_address.len ||
+ NULL == attr_ex->ll_address.address)
+ goto err;
+
+ memcpy(mah->av.grh_sec.rmac,
+ attr_ex->ll_address.address,
+ attr_ex->ll_address.len);
+ }
+
+ return ah;
+
+err:
+ free(ah);
+ return NULL;
+}
+
+int mlx5_destroy_ah(struct ibv_ah *ah)
+{
+ free(to_mah(ah));
+
+ return 0;
+}
+
+int mlx5_attach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid)
+{
+ return ibv_cmd_attach_mcast(qp, gid, lid);
+}
+
+int mlx5_detach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid)
+{
+ return ibv_cmd_detach_mcast(qp, gid, lid);
+}
+
+struct ibv_xrcd *mlx5_open_xrcd(struct ibv_context *context,
+ struct ibv_xrcd_init_attr *xrcd_init_attr)
+{
+ int err;
+ struct verbs_xrcd *xrcd;
+ struct ibv_open_xrcd cmd = {0};
+ struct ibv_open_xrcd_resp resp = {0};
+
+ xrcd = calloc(1, sizeof(*xrcd));
+ if (!xrcd)
+ return NULL;
+
+ err = ibv_cmd_open_xrcd(context, xrcd, sizeof(*xrcd), xrcd_init_attr,
+ &cmd, sizeof(cmd), &resp, sizeof(resp));
+ if (err) {
+ free(xrcd);
+ return NULL;
+ }
+
+ return &xrcd->xrcd;
+}
+
+struct ibv_srq *mlx5_create_xrc_srq(struct ibv_context *context,
+ struct ibv_srq_init_attr_ex *attr)
+{
+ int err;
+ struct mlx5_create_srq_ex cmd;
+ struct mlx5_create_srq_resp resp;
+ struct mlx5_srq *msrq;
+ struct mlx5_context *ctx;
+ int max_sge;
+ struct ibv_srq *ibsrq;
+#ifdef MLX5_DEBUG
+ FILE *fp = to_mctx(context)->dbg_fp;
+#endif
+
+ msrq = calloc(1, sizeof(*msrq));
+ if (!msrq)
+ return NULL;
+
+ msrq->is_xsrq = 1;
+ ibsrq = (struct ibv_srq *)&msrq->vsrq;
+
+ memset(&cmd, 0, sizeof(cmd));
+ memset(&resp, 0, sizeof(resp));
+
+ ctx = to_mctx(context);
+
+ if (mlx5_spinlock_init(&msrq->lock, !mlx5_single_threaded)) {
+ fprintf(stderr, "%s-%d:\n", __func__, __LINE__);
+ goto err;
+ }
+
+ if (attr->attr.max_wr > ctx->max_srq_recv_wr) {
+ fprintf(stderr, "%s-%d:max_wr %d, max_srq_recv_wr %d\n",
+ __func__, __LINE__, attr->attr.max_wr,
+ ctx->max_srq_recv_wr);
+ errno = EINVAL;
+ goto err;
+ }
+
+ /*
+ * this calculation does not consider required control segments. The
+ * final calculation is done again later. This is done so to avoid
+ * overflows of variables
+ */
+ max_sge = ctx->max_recv_wr / sizeof(struct mlx5_wqe_data_seg);
+ if (attr->attr.max_sge > max_sge) {
+ fprintf(stderr, "%s-%d:max_sge %d, supported max_sge %d\n",
+ __func__, __LINE__, attr->attr.max_sge,
+ max_sge);
+ errno = EINVAL;
+ goto err;
+ }
+
+ msrq->max = align_queue_size(attr->attr.max_wr + 1);
+ msrq->max_gs = attr->attr.max_sge;
+ msrq->counter = 0;
+
+ if (mlx5_alloc_srq_buf(context, msrq)) {
+ fprintf(stderr, "%s-%d:\n", __func__, __LINE__);
+ goto err;
+ }
+
+ msrq->db = mlx5_alloc_dbrec(ctx);
+ if (!msrq->db) {
+ fprintf(stderr, "%s-%d:\n", __func__, __LINE__);
+ goto err_free;
+ }
+
+ *msrq->db = 0;
+
+ cmd.buf_addr = (uintptr_t) msrq->buf.buf;
+ cmd.db_addr = (uintptr_t) msrq->db;
+ msrq->wq_sig = srq_sig_enabled(context);
+ if (msrq->wq_sig)
+ cmd.flags = MLX5_SRQ_FLAG_SIGNATURE;
+
+ attr->attr.max_sge = msrq->max_gs;
+
+ if (ctx->cqe_version) {
+ cmd.uidx = mlx5_store_uidx(ctx, msrq);
+ if (cmd.uidx < 0) {
+ mlx5_dbg(fp, MLX5_DBG_QP, "Couldn't find free user index\n");
+ goto err_free_db;
+ }
+ } else {
+ pthread_mutex_lock(&ctx->srq_table_mutex);
+ }
+
+ err = ibv_cmd_create_srq_ex(context, &msrq->vsrq, sizeof(msrq->vsrq),
+ attr, &cmd.ibv_cmd, sizeof(cmd),
+ &resp.ibv_resp, sizeof(resp));
+ if (err)
+ goto err_free_uidx;
+
+ if (!ctx->cqe_version) {
+ err = mlx5_store_srq(to_mctx(context), resp.srqn, msrq);
+ if (err)
+ goto err_destroy;
+
+ pthread_mutex_unlock(&ctx->srq_table_mutex);
+ }
+
+ msrq->srqn = resp.srqn;
+ msrq->rsc.type = MLX5_RSC_TYPE_XSRQ;
+ msrq->rsc.rsn = ctx->cqe_version ? cmd.uidx : resp.srqn;
+
+ return ibsrq;
+
+err_destroy:
+ ibv_cmd_destroy_srq(ibsrq);
+err_free_uidx:
+ if (ctx->cqe_version)
+ mlx5_clear_uidx(ctx, cmd.uidx);
+ else
+ pthread_mutex_unlock(&ctx->srq_table_mutex);
+err_free_db:
+ mlx5_free_db(ctx, msrq->db);
+
+err_free:
+ free(msrq->wrid);
+ mlx5_free_buf(&msrq->buf);
+
+err:
+ free(msrq);
+
+ return NULL;
+}
+
+struct ibv_srq *mlx5_create_srq_ex(struct ibv_context *context,
+ struct ibv_srq_init_attr_ex *attr)
+{
+ if (!(attr->comp_mask & IBV_SRQ_INIT_ATTR_TYPE) ||
+ (attr->srq_type == IBV_SRQT_BASIC))
+ return mlx5_create_srq(attr->pd,
+ (struct ibv_srq_init_attr *)attr);
+ else if (attr->srq_type == IBV_SRQT_XRC)
+ return mlx5_create_xrc_srq(context, attr);
+
+ return NULL;
+}
+
+int mlx5_get_srq_num(struct ibv_srq *srq, uint32_t *srq_num)
+{
+ struct mlx5_srq *msrq = to_msrq(srq);
+
+ *srq_num = msrq->srqn;
+
+ return 0;
+}
+
+struct ibv_qp *mlx5_open_qp(struct ibv_context *context,
+ struct ibv_qp_open_attr *attr)
+{
+ struct ibv_open_qp cmd;
+ struct ibv_create_qp_resp resp;
+ struct mlx5_qp *qp;
+ int ret;
+ struct mlx5_context *ctx = to_mctx(context);
+
+ qp = calloc(1, sizeof(*qp));
+ if (!qp)
+ return NULL;
+
+ ret = ibv_cmd_open_qp(context, &qp->verbs_qp, sizeof(qp->verbs_qp),
+ attr, &cmd, sizeof(cmd), &resp, sizeof(resp));
+ if (ret)
+ goto err;
+
+ if (!ctx->cqe_version) {
+ pthread_mutex_lock(&ctx->rsc_table_mutex);
+ if (mlx5_store_rsc(ctx, qp->verbs_qp.qp.qp_num, qp)) {
+ pthread_mutex_unlock(&ctx->rsc_table_mutex);
+ goto destroy;
+ }
+ pthread_mutex_unlock(&ctx->rsc_table_mutex);
+ }
+
+ return (struct ibv_qp *)&qp->verbs_qp;
+
+destroy:
+ ibv_cmd_destroy_qp(&qp->verbs_qp.qp);
+err:
+ free(qp);
+ return NULL;
+}
+
+int mlx5_close_xrcd(struct ibv_xrcd *ib_xrcd)
+{
+ struct verbs_xrcd *xrcd = container_of(ib_xrcd, struct verbs_xrcd, xrcd);
+ int ret;
+
+ ret = ibv_cmd_close_xrcd(xrcd);
+ if (!ret)
+ free(xrcd);
+
+ return ret;
+}
+
+int mlx5_modify_qp_ex(struct ibv_qp *qp, struct ibv_exp_qp_attr *attr,
+ uint64_t attr_mask)
+{
+ struct mlx5_qp *mqp = to_mqp(qp);
+ struct ibv_port_attr port_attr;
+ struct ibv_exp_modify_qp cmd;
+ struct ibv_exp_device_attr device_attr;
+ int ret;
+ uint32_t *db;
+
+ if (attr_mask & IBV_QP_PORT) {
+ ret = ibv_query_port(qp->context, attr->port_num,
+ &port_attr);
+ if (ret)
+ return ret;
+ mqp->link_layer = port_attr.link_layer;
+ if (((qp->qp_type == IBV_QPT_UD) && (mqp->link_layer == IBV_LINK_LAYER_INFINIBAND)) ||
+ ((qp->qp_type == IBV_QPT_RAW_ETH) && (mqp->link_layer == IBV_LINK_LAYER_ETHERNET))) {
+ memset(&device_attr, 0, sizeof(device_attr));
+ device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_RESERVED - 1;
+ ret = ibv_exp_query_device(qp->context, &device_attr);
+ if (ret)
+ return ret;
+ if ((device_attr.comp_mask & IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS) &&
+ (device_attr.exp_device_cap_flags & IBV_EXP_DEVICE_RX_CSUM_IP_PKT))
+ mqp->gen_data.model_flags |= MLX5_QP_MODEL_RX_CSUM_IP_OK_IP_NON_TCP_UDP;
+ }
+ }
+
+ if (mqp->rx_qp)
+ return -ENOSYS;
+
+ memset(&cmd, 0, sizeof(cmd));
+ ret = ibv_exp_cmd_modify_qp(qp, attr, attr_mask, &cmd, sizeof(cmd));
+
+ if (!ret &&
+ (attr_mask & IBV_QP_STATE) &&
+ attr->qp_state == IBV_QPS_RESET) {
+ if (qp->qp_type != IBV_EXP_QPT_DC_INI)
+ mlx5_cq_clean(to_mcq(qp->recv_cq), mqp->rsc.rsn,
+ qp->srq ? to_msrq(qp->srq) : NULL);
+
+ if (qp->send_cq != qp->recv_cq)
+ mlx5_cq_clean(to_mcq(qp->send_cq), mqp->rsc.rsn, NULL);
+
+ mlx5_init_qp_indices(to_mqp(qp));
+ db = to_mqp(qp)->gen_data.db;
+ db[MLX5_RCV_DBR] = 0;
+ db[MLX5_SND_DBR] = 0;
+ }
+ if (!ret && (attr_mask & IBV_QP_STATE))
+ mlx5_update_post_send_one(to_mqp(qp), qp->state, qp->qp_type);
+
+ if (!ret &&
+ (attr_mask & IBV_QP_STATE) &&
+ attr->qp_state == IBV_QPS_RTR &&
+ qp->qp_type == IBV_QPT_RAW_ETH) {
+ mlx5_lock(&mqp->rq.lock);
+ mqp->gen_data.db[MLX5_RCV_DBR] = htonl(mqp->rq.head & 0xffff);
+ mlx5_unlock(&mqp->rq.lock);
+ }
+
+ return ret;
+}
+
+void *mlx5_get_legacy_xrc(struct ibv_srq *srq)
+{
+ struct mlx5_srq *msrq = to_msrq(srq);
+
+ return msrq->ibv_srq_legacy;
+}
+
+void mlx5_set_legacy_xrc(struct ibv_srq *srq, void *legacy_xrc_srq)
+{
+ struct mlx5_srq *msrq = to_msrq(srq);
+
+ msrq->ibv_srq_legacy = legacy_xrc_srq;
+ return;
+}
+
+int mlx5_modify_cq(struct ibv_cq *cq, struct ibv_exp_cq_attr *attr, int attr_mask)
+{
+ struct ibv_exp_modify_cq cmd;
+
+ memset(&cmd, 0, sizeof(cmd));
+ return ibv_exp_cmd_modify_cq(cq, attr, attr_mask, &cmd, sizeof(cmd));
+}
+
+struct ibv_exp_mkey_list_container *mlx5_alloc_mkey_mem(struct ibv_exp_mkey_list_container_attr *attr)
+{
+ struct mlx5_klm_buf *klm;
+ int size;
+
+ if (attr->mkey_list_type !=
+ IBV_EXP_MKEY_LIST_TYPE_INDIRECT_MR) {
+ errno = ENOMEM;
+ return NULL;
+ }
+
+ klm = calloc(1, sizeof(*klm));
+ if (!klm) {
+ errno = ENOMEM;
+ return NULL;
+ }
+
+ size = align(attr->max_klm_list_size * sizeof(struct mlx5_wqe_data_seg), 64);
+
+ klm->alloc_buf = malloc(size + MLX5_UMR_PTR_ALIGN - 1);
+ if (!klm->alloc_buf) {
+ errno = ENOMEM;
+ goto ex_klm;
+ }
+
+ klm->align_buf = align_ptr(klm->alloc_buf, MLX5_UMR_PTR_ALIGN);
+
+ memset(klm->align_buf, 0, size);
+ klm->mr = ibv_reg_mr(attr->pd, klm->align_buf, size, 0);
+ if (!klm->mr)
+ goto ex_list;
+
+ klm->ibv_klm_list.max_klm_list_size = attr->max_klm_list_size;
+ klm->ibv_klm_list.context = klm->mr->context;
+
+ return &klm->ibv_klm_list;
+
+ex_list:
+ free(klm->alloc_buf);
+ex_klm:
+ free(klm);
+ return NULL;
+}
+
+int mlx5_free_mkey_mem(struct ibv_exp_mkey_list_container *mem)
+{
+ struct mlx5_klm_buf *klm;
+ int err;
+
+ klm = to_klm(mem);
+ err = ibv_dereg_mr(klm->mr);
+ if (err) {
+ fprintf(stderr, "failed to deregister klm MR\n");
+ return err;
+ }
+ free(klm->alloc_buf);
+ free(klm);
+ return 0;
+}
+
+int mlx5_query_mkey(struct ibv_mr *mr, struct ibv_exp_mkey_attr *mkey_attr)
+{
+ struct mlx5_query_mkey cmd;
+ struct mlx5_query_mkey_resp resp;
+ int err;
+
+ memset(&cmd, 0, sizeof(cmd));
+ err = ibv_exp_cmd_query_mkey(mr->context, mr, mkey_attr, &cmd.ibv_cmd,
+ sizeof(cmd.ibv_cmd), sizeof(cmd),
+ &resp.ibv_resp, sizeof(resp.ibv_resp),
+ sizeof(resp));
+
+ return err;
+}
+
+struct ibv_mr *mlx5_create_mr(struct ibv_exp_create_mr_in *in)
+{
+ struct mlx5_create_mr cmd;
+ struct mlx5_create_mr_resp resp;
+ struct mlx5_mr *mr;
+ int err;
+
+ if (in->attr.create_flags & IBV_EXP_MR_SIGNATURE_EN) {
+ errno = EOPNOTSUPP;
+ return NULL;
+ }
+
+ memset(&cmd, 0, sizeof(cmd));
+ memset(&resp, 0, sizeof(resp));
+
+ mr = calloc(1, sizeof(*mr));
+ if (!mr)
+ return NULL;
+
+ err = ibv_exp_cmd_create_mr(in, &mr->ibv_mr, &cmd.ibv_cmd,
+ sizeof(cmd.ibv_cmd),
+ sizeof(cmd) - sizeof(cmd.ibv_cmd),
+ &resp.ibv_resp,
+ sizeof(resp.ibv_resp), sizeof(resp) - sizeof(resp.ibv_resp));
+ if (err)
+ goto out;
+
+ return &mr->ibv_mr;
+
+out:
+ free(mr);
+ return NULL;
+}
+
+int mlx5_exp_dereg_mr(struct ibv_mr *ibmr, struct ibv_exp_dereg_out *out)
+{
+ struct mlx5_mr *mr;
+
+ if (ibmr->lkey == ODP_GLOBAL_R_LKEY || ibmr->lkey == ODP_GLOBAL_W_LKEY) {
+ out->need_dofork = 0;
+ } else {
+ mr = to_mmr(ibmr);
+ out->need_dofork = (mr->buf.type == MLX5_ALLOC_TYPE_CONTIG ||
+ mr->type == MLX5_ODP_MR) ? 0 : 1;
+ }
+
+ return mlx5_dereg_mr(ibmr);
+}
+
+struct mlx5_info_record {
+ uint16_t lid[30];
+ uint32_t seq_num;
+};
+
+int mlx5_poll_dc_info(struct ibv_context *context,
+ struct ibv_exp_dc_info_ent *ents,
+ int nent,
+ int port)
+{
+ struct mlx5_context *ctx = to_mctx(context);
+ void *start;
+ struct mlx5_port_info_ctx *pc;
+ struct mlx5_info_record *cr;
+ int i;
+ int j;
+ uint32_t seq;
+
+ if (!ctx->cc.buf)
+ return -ENOSYS;
+
+ if (port < 1 || port > ctx->num_ports)
+ return -EINVAL;
+
+ pc = &ctx->cc.port[port - 1];
+ start = ctx->cc.buf + 4096 * (port - 1);
+
+ cr = start + (pc->consumer & 0xfff);
+ for (i = 0; i < nent; i++) {
+ seq = ntohl(cr->seq_num);
+ /* The buffer is initialized to all ff. So if the HW did not write anything,
+ the condition below will cause a return without polling any record. */
+ if ((seq & 0xfff) != (pc->consumer & 0xfff))
+ return i;
+
+ /* When the process comes to life, the buffer may already contain
+ valid records. The "steady" field allows the process to synchronize
+ and continue from there */
+ if (pc->steady) {
+ if (((pc->consumer >> 12) - 1) == (seq >> 12))
+ return i;
+ } else {
+ pc->consumer = seq & 0xfffff000;
+ pc->steady = 1;
+ }
+
+ /* make sure LIDs are read after we identify a new record */
+ rmb();
+ ents[i].seqnum = seq;
+ for (j = 0; j < 30; j++)
+ ents[i].lid[j] = ntohs(cr->lid[j]);
+
+ pc->consumer += 64;
+ cr = start + (pc->consumer & 0xfff);
+ }
+ return i;
+}
+
+static struct mlx5_send_db_data *allocate_send_db(struct mlx5_context *ctx)
+{
+ struct mlx5_device *dev = to_mdev(ctx->ibv_ctx.device);
+ struct mlx5_send_db_data *send_db = NULL;
+ unsigned int db_idx;
+ struct mlx5_wc_uar *wc_uar;
+ int j;
+
+ mlx5_spin_lock(&ctx->send_db_lock);
+ if (!list_empty(&ctx->send_wc_db_list)) {
+ send_db = list_entry(ctx->send_wc_db_list.next, struct mlx5_send_db_data, list);
+ list_del(&send_db->list);
+ }
+ mlx5_spin_unlock(&ctx->send_db_lock);
+
+ if (!send_db) {
+ /* Fill up more send_db objects */
+ wc_uar = calloc(1, sizeof(*wc_uar));
+ if (!wc_uar) {
+ errno = ENOMEM;
+ return NULL;
+ }
+ mlx5_spin_lock(&ctx->send_db_lock);
+ /* One res_domain per UUAR */
+ if (ctx->num_wc_uars >= ctx->max_ctx_res_domain / MLX5_NUM_UUARS_PER_PAGE) {
+ errno = ENOMEM;
+ goto out;
+ }
+ db_idx = ctx->num_wc_uars;
+ wc_uar->uar = mlx5_uar_mmap(db_idx, MLX5_EXP_IB_MMAP_N_ALLOC_WC_CMD, dev->page_size, ctx->ibv_ctx.cmd_fd);
+ if (wc_uar->uar == MAP_FAILED) {
+ errno = ENOMEM;
+ goto out;
+ }
+ ctx->num_wc_uars++;
+ mlx5_spin_unlock(&ctx->send_db_lock);
+
+ wc_uar->uar_idx = db_idx;
+ for (j = 0; j < MLX5_NUM_UUARS_PER_PAGE; ++j) {
+ wc_uar->send_db_data[j].bf.reg = wc_uar->uar + MLX5_BF_OFFSET + (j * ctx->bf_reg_size);
+ wc_uar->send_db_data[j].bf.buf_size = ctx->bf_reg_size / 2;
+ wc_uar->send_db_data[j].bf.db_method = (mlx5_single_threaded && wc_auto_evict_size() == 64) ?
+ MLX5_DB_METHOD_DEDIC_BF_1_THREAD : MLX5_DB_METHOD_DEDIC_BF;
+ wc_uar->send_db_data[j].bf.offset = 0;
+
+ mlx5_lock_init(&wc_uar->send_db_data[j].bf.lock,
+ 0,
+ mlx5_get_locktype());
+
+ wc_uar->send_db_data[j].bf.need_lock = mlx5_single_threaded ? 0 : 1;
+ /* Indicate that this BF UUAR is not from the static
+ * UUAR infrastructure
+ */
+ wc_uar->send_db_data[j].bf.uuarn = MLX5_EXP_INVALID_UUAR;
+ wc_uar->send_db_data[j].wc_uar = wc_uar;
+ }
+ for (j = 0; j < MLX5_NUM_UUARS_PER_PAGE - 1; ++j) {
+ mlx5_spin_lock(&ctx->send_db_lock);
+ list_add(&wc_uar->send_db_data[j].list, &ctx->send_wc_db_list);
+ mlx5_spin_unlock(&ctx->send_db_lock);
+ }
+
+ /* Return the last send_db object to the caller */
+ send_db = &wc_uar->send_db_data[j];
+ }
+
+ return send_db;
+
+out:
+ mlx5_spin_unlock(&ctx->send_db_lock);
+ free(wc_uar);
+
+ return NULL;
+}
+
+struct ibv_exp_res_domain *mlx5_exp_create_res_domain(struct ibv_context *context,
+ struct ibv_exp_res_domain_init_attr *attr)
+{
+ struct mlx5_context *ctx = to_mctx(context);
+ struct mlx5_res_domain *res_domain;
+
+ if (attr->comp_mask >= IBV_EXP_RES_DOMAIN_RESERVED) {
+ errno = EINVAL;
+ return NULL;
+ }
+
+ if (!ctx->max_ctx_res_domain) {
+ errno = ENOSYS;
+ return NULL;
+ }
+
+ res_domain = calloc(1, sizeof(*res_domain));
+ if (!res_domain) {
+ errno = ENOMEM;
+ return NULL;
+ }
+
+ res_domain->ibv_res_domain.context = context;
+
+ /* set default values */
+ res_domain->attr.thread_model = IBV_EXP_THREAD_SAFE;
+ res_domain->attr.msg_model = IBV_EXP_MSG_DEFAULT;
+ /* get requested valid values */
+ if (attr->comp_mask & IBV_EXP_RES_DOMAIN_THREAD_MODEL)
+ res_domain->attr.thread_model = attr->thread_model;
+ if (attr->comp_mask & IBV_EXP_RES_DOMAIN_MSG_MODEL)
+ res_domain->attr.msg_model = attr->msg_model;
+ res_domain->attr.comp_mask = IBV_EXP_RES_DOMAIN_RESERVED - 1;
+
+ res_domain->send_db = allocate_send_db(ctx);
+ if (!res_domain->send_db) {
+ if (res_domain->attr.msg_model == IBV_EXP_MSG_FORCE_LOW_LATENCY)
+ goto err;
+ } else {
+ switch (res_domain->attr.thread_model) {
+ case IBV_EXP_THREAD_SAFE:
+ res_domain->send_db->bf.db_method = MLX5_DB_METHOD_BF;
+ res_domain->send_db->bf.need_lock = 1;
+ break;
+ case IBV_EXP_THREAD_UNSAFE:
+ res_domain->send_db->bf.db_method = MLX5_DB_METHOD_DEDIC_BF;
+ res_domain->send_db->bf.need_lock = 0;
+ break;
+ case IBV_EXP_THREAD_SINGLE:
+ if (wc_auto_evict_size() == 64) {
+ res_domain->send_db->bf.db_method = MLX5_DB_METHOD_DEDIC_BF_1_THREAD;
+ res_domain->send_db->bf.need_lock = 0;
+ } else {
+ res_domain->send_db->bf.db_method = MLX5_DB_METHOD_DEDIC_BF;
+ res_domain->send_db->bf.need_lock = 0;
+ }
+ break;
+ }
+ }
+
+ return &res_domain->ibv_res_domain;
+
+err:
+ free(res_domain);
+
+ return NULL;
+}
+
+static void free_send_db(struct mlx5_context *ctx,
+ struct mlx5_send_db_data *send_db)
+{
+ /*
+ * Currently we free the resource domain UUAR to the local
+ * send_wc_db_list. In the future we may consider unmapping
+ * UAR which all its UUARs are free.
+ */
+ mlx5_spin_lock(&ctx->send_db_lock);
+ list_add(&send_db->list, &ctx->send_wc_db_list);
+ mlx5_spin_unlock(&ctx->send_db_lock);
+}
+
+int mlx5_exp_destroy_res_domain(struct ibv_context *context,
+ struct ibv_exp_res_domain *res_dom,
+ struct ibv_exp_destroy_res_domain_attr *attr)
+{
+ struct mlx5_res_domain *res_domain;
+
+ if (!res_dom)
+ return EINVAL;
+
+ res_domain = to_mres_domain(res_dom);
+ if (res_domain->send_db)
+ free_send_db(to_mctx(context), res_domain->send_db);
+
+ free(res_domain);
+
+ return 0;
+}
+
+void *mlx5_exp_query_intf(struct ibv_context *context, struct ibv_exp_query_intf_params *params,
+ enum ibv_exp_query_intf_status *status)
+{
+ void *family = NULL;
+ struct mlx5_qp *qp;
+ struct mlx5_cq *cq;
+ struct mlx5_rwq *rwq;
+
+ *status = IBV_EXP_INTF_STAT_OK;
+
+ if (!params->obj) {
+ errno = EINVAL;
+ *status = IBV_EXP_INTF_STAT_INVAL_OBJ;
+ return NULL;
+ }
+
+ switch (params->intf) {
+ case IBV_EXP_INTF_QP_BURST:
+ qp = to_mqp(params->obj);
+ if (qp->gen_data_warm.pattern == MLX5_QP_PATTERN) {
+ family = mlx5_get_qp_burst_family(qp, params, status);
+ if (*status != IBV_EXP_INTF_STAT_OK) {
+ fprintf(stderr, PFX "Failed to get QP burst family\n");
+ errno = EINVAL;
+ }
+ } else {
+ fprintf(stderr, PFX "Warning: non-valid QP passed to query interface 0x%x 0x%x\n", qp->gen_data_warm.pattern, MLX5_QP_PATTERN);
+ *status = IBV_EXP_INTF_STAT_INVAL_OBJ;
+ errno = EINVAL;
+ }
+ break;
+
+ case IBV_EXP_INTF_CQ:
+ cq = to_mcq(params->obj);
+ if (cq->pattern == MLX5_CQ_PATTERN) {
+ family = (void *)mlx5_get_poll_cq_family(cq, params, status);
+ } else {
+ fprintf(stderr, PFX "Warning: non-valid CQ passed to query interface\n");
+ *status = IBV_EXP_INTF_STAT_INVAL_OBJ;
+ errno = EINVAL;
+ }
+ break;
+
+ case IBV_EXP_INTF_WQ:
+ rwq = to_mrwq(params->obj);
+ if (rwq->pattern == MLX5_WQ_PATTERN) {
+ family = mlx5_get_wq_family(rwq, params, status);
+ if (*status != IBV_EXP_INTF_STAT_OK) {
+ fprintf(stderr, PFX "Failed to get WQ family\n");
+ errno = EINVAL;
+ }
+ } else {
+ fprintf(stderr, PFX "Warning: non-valid WQ passed to query interface\n");
+ *status = IBV_EXP_INTF_STAT_INVAL_OBJ;
+ errno = EINVAL;
+ }
+ break;
+
+ default:
+ *status = IBV_EXP_INTF_STAT_INTF_NOT_SUPPORTED;
+ errno = EINVAL;
+ }
+
+ return family;
+}
+
+int mlx5_exp_release_intf(struct ibv_context *context, void *intf,
+ struct ibv_exp_release_intf_params *params)
+{
+ return 0;
+}
+
+#define READL(ptr) (*((uint32_t *)(ptr)))
+static int mlx5_read_clock(struct ibv_context *context, uint64_t *cycles)
+{
+ uint32_t clockhi, clocklo, clockhi1;
+ int i;
+ struct mlx5_context *ctx = to_mctx(context);
+
+ if (!ctx->hca_core_clock)
+ return -EOPNOTSUPP;
+
+ /* Handle wraparound */
+ for (i = 0; i < 2; i++) {
+ clockhi = ntohl(READL(ctx->hca_core_clock));
+ clocklo = ntohl(READL(ctx->hca_core_clock + 4));
+ clockhi1 = ntohl(READL(ctx->hca_core_clock));
+ if (clockhi == clockhi1)
+ break;
+ }
+
+ *cycles = (uint64_t)(clockhi & 0x7fffffff) << 32 | (uint64_t)clocklo;
+
+ return 0;
+}
+
+int mlx5_exp_query_values(struct ibv_context *context, int q_values,
+ struct ibv_exp_values *values)
+{
+ int err = 0;
+
+ values->comp_mask = 0;
+
+ if (q_values & (IBV_EXP_VALUES_HW_CLOCK | IBV_EXP_VALUES_HW_CLOCK_NS)) {
+ uint64_t cycles;
+
+ err = mlx5_read_clock(context, &cycles);
+ if (!err) {
+ if (q_values & IBV_EXP_VALUES_HW_CLOCK) {
+ values->hwclock = cycles;
+ values->comp_mask |= IBV_EXP_VALUES_HW_CLOCK;
+ }
+ if (q_values & IBV_EXP_VALUES_HW_CLOCK_NS) {
+ struct mlx5_context *ctx = to_mctx(context);
+
+ values->hwclock_ns =
+ (((uint64_t)values->hwclock &
+ ctx->core_clock.mask) *
+ ctx->core_clock.mult)
+ >> ctx->core_clock.shift;
+ values->comp_mask |= IBV_EXP_VALUES_HW_CLOCK_NS;
+ }
+ }
+ }
+
+ return err;
+}
+
Index: contrib/ofed/libmlx5/src/wqe.h
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/src/wqe.h
@@ -0,0 +1,298 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef WQE_H
+#define WQE_H
+
+enum {
+ MLX5_WQE_CTRL_CQ_UPDATE = 2 << 2,
+ MLX5_WQE_CTRL_SOLICITED = 1 << 1,
+ MLX5_WQE_CTRL_FENCE = 4 << 5,
+};
+
+enum {
+ MLX5_INVALID_LKEY = 0x100,
+};
+
+enum {
+ MLX5_EXTENDED_UD_AV = 0x80000000,
+};
+
+enum {
+ MLX5_FENCE_MODE_NONE = 0 << 5,
+ MLX5_FENCE_MODE_INITIATOR_SMALL = 1 << 5,
+ MLX5_FENCE_MODE_STRONG_ORDERING = 3 << 5,
+ MLX5_FENCE_MODE_SMALL_AND_FENCE = 4 << 5,
+};
+
+struct mlx5_wqe_srq_next_seg {
+ uint8_t rsvd0[2];
+ uint16_t next_wqe_index;
+ uint8_t signature;
+ uint8_t rsvd1[11];
+};
+
+struct mlx5_wqe_data_seg {
+ uint32_t byte_count;
+ uint32_t lkey;
+ uint64_t addr;
+};
+
+struct mlx5_eqe_comp {
+ uint32_t reserved[6];
+ uint32_t cqn;
+};
+
+struct mlx5_eqe_qp_srq {
+ uint32_t reserved[6];
+ uint32_t qp_srq_n;
+};
+
+enum {
+ MLX5_ETH_WQE_L3_CSUM = (1 << 6),
+ MLX5_ETH_WQE_L4_CSUM = (1 << 7),
+};
+
+enum {
+ MLX5_ETH_INLINE_HEADER_SIZE = 16,
+};
+
+struct mlx5_wqe_eth_seg {
+ uint32_t rsvd0;
+ uint8_t cs_flags;
+ uint8_t rsvd1;
+ uint16_t mss;
+ uint32_t rsvd2;
+ uint16_t inline_hdr_sz;
+ uint8_t inline_hdr_start[2];
+ uint8_t inline_hdr[16];
+};
+
+struct mlx5_wqe_ctrl_seg {
+ uint32_t opmod_idx_opcode;
+ uint32_t qpn_ds;
+ uint8_t signature;
+ uint8_t rsvd[2];
+ uint8_t fm_ce_se;
+ uint32_t imm;
+};
+
+struct mlx5_wqe_xrc_seg {
+ uint32_t xrc_srqn;
+ uint8_t rsvd[12];
+};
+
+struct mlx5_wqe_masked_atomic_seg {
+ uint64_t swap_add;
+ uint64_t compare;
+ uint64_t swap_add_mask;
+ uint64_t compare_mask;
+};
+
+struct mlx5_base_av {
+ union {
+ struct {
+ uint32_t qkey;
+ uint32_t reserved;
+ } qkey;
+ uint64_t dc_key;
+ } key;
+ uint32_t dqp_dct;
+ uint8_t stat_rate_sl;
+ uint8_t fl_mlid;
+ uint16_t rlid;
+};
+
+struct mlx5_grh_av {
+ uint8_t reserved0[4];
+ uint8_t rmac[6];
+ uint8_t tclass;
+ uint8_t hop_limit;
+ uint32_t grh_gid_fl;
+ uint8_t rgid[16];
+};
+
+struct mlx5_wqe_av {
+ struct mlx5_base_av base;
+ struct mlx5_grh_av grh_sec;
+};
+
+struct mlx5_wqe_datagram_seg {
+ struct mlx5_wqe_av av;
+};
+
+struct mlx5_wqe_raddr_seg {
+ uint64_t raddr;
+ uint32_t rkey;
+ uint32_t reserved;
+};
+
+struct mlx5_wqe_atomic_seg {
+ uint64_t swap_add;
+ uint64_t compare;
+};
+
+struct mlx5_wqe_inl_data_seg {
+ uint32_t byte_count;
+};
+
+struct mlx5_wqe_umr_ctrl_seg {
+ uint8_t flags;
+ uint8_t rsvd0[3];
+ uint16_t klm_octowords;
+ uint16_t bsf_octowords;
+ uint64_t mkey_mask;
+ uint8_t rsvd1[32];
+};
+
+struct mlx5_mkey_seg {
+ /* This is a two-bit field occupying bits 31:30.
+ * Bit 31 is always 0;
+ * bit 30 is zero for regular MRs and 1 (i.e. free) for UMRs that do not have translation.
+ */
+ uint8_t status;
+ uint8_t pcie_control;
+ uint8_t flags;
+ uint8_t version;
+ uint32_t qpn_mkey7_0;
+ uint8_t rsvd1[4];
+ uint32_t flags_pd;
+ uint64_t start_addr;
+ uint64_t len;
+ uint32_t bsfs_octo_size;
+ uint8_t rsvd2[16];
+ uint32_t xlt_oct_size;
+ uint8_t rsvd3[3];
+ uint8_t log2_page_size;
+ uint8_t rsvd4[4];
+};
+
+struct mlx5_seg_set_psv {
+ uint8_t rsvd[4];
+ uint16_t syndrome;
+ uint16_t status;
+ uint16_t block_guard;
+ uint16_t app_tag;
+ uint32_t ref_tag;
+ uint32_t mkey;
+ uint64_t va;
+};
+
+struct mlx5_seg_get_psv {
+ uint8_t rsvd[19];
+ uint8_t num_psv;
+ uint32_t l_key;
+ uint64_t va;
+ uint32_t psv_index[4];
+};
+
+struct mlx5_seg_check_psv {
+ uint8_t rsvd0[2];
+ uint16_t err_coalescing_op;
+ uint8_t rsvd1[2];
+ uint16_t xport_err_op;
+ uint8_t rsvd2[2];
+ uint16_t xport_err_mask;
+ uint8_t rsvd3[7];
+ uint8_t num_psv;
+ uint32_t l_key;
+ uint64_t va;
+ uint32_t psv_index[4];
+};
+
+struct mlx5_seg_repeat_ent {
+ uint16_t stride;
+ uint16_t byte_count;
+ uint32_t memkey;
+ uint64_t va;
+};
+
+struct mlx5_seg_repeat_block {
+ uint32_t byte_count;
+ uint32_t const_0x400;
+ uint32_t repeat_count;
+ uint16_t reserved;
+ uint16_t num_ent;
+ struct mlx5_seg_repeat_ent entries[0];
+};
+
+struct mlx5_rwqe_sig {
+ uint8_t rsvd0[4];
+ uint8_t signature;
+ uint8_t rsvd1[11];
+};
+
+struct mlx5_wqe_signature_seg {
+ uint8_t rsvd0[4];
+ uint8_t signature;
+ uint8_t rsvd1[11];
+};
+
+struct mlx5_wqe_inline_seg {
+ uint32_t byte_count;
+};
+
+struct mlx5_wqe_wait_en_seg {
+ uint8_t rsvd0[8];
+ uint32_t pi;
+ uint32_t obj_num;
+};
+
+enum {
+ MLX5_MKEY_MASK_LEN = 1ull << 0,
+ MLX5_MKEY_MASK_PAGE_SIZE = 1ull << 1,
+ MLX5_MKEY_MASK_START_ADDR = 1ull << 6,
+ MLX5_MKEY_MASK_PD = 1ull << 7,
+ MLX5_MKEY_MASK_EN_RINVAL = 1ull << 8,
+ MLX5_MKEY_MASK_EN_SIGERR = 1ull << 9,
+ MLX5_MKEY_MASK_BSF_EN = 1ull << 12,
+ MLX5_MKEY_MASK_KEY = 1ull << 13,
+ MLX5_MKEY_MASK_QPN = 1ull << 14,
+ MLX5_MKEY_MASK_LR = 1ull << 17,
+ MLX5_MKEY_MASK_LW = 1ull << 18,
+ MLX5_MKEY_MASK_RR = 1ull << 19,
+ MLX5_MKEY_MASK_RW = 1ull << 20,
+ MLX5_MKEY_MASK_A = 1ull << 21,
+ MLX5_MKEY_MASK_SMALL_FENCE = 1ull << 23,
+ MLX5_MKEY_MASK_FREE = 1ull << 29,
+};
+
+enum {
+ MLX5_PERM_LOCAL_READ = 1 << 2,
+ MLX5_PERM_LOCAL_WRITE = 1 << 3,
+ MLX5_PERM_REMOTE_READ = 1 << 4,
+ MLX5_PERM_REMOTE_WRITE = 1 << 5,
+ MLX5_PERM_ATOMIC = 1 << 6,
+ MLX5_PERM_UMR_EN = 1 << 7,
+};
+
+#endif /* WQE_H */
Index: contrib/ofed/usr.lib/Makefile
===================================================================
--- contrib/ofed/usr.lib/Makefile
+++ contrib/ofed/usr.lib/Makefile
@@ -1,4 +1,4 @@
-SUBDIR= libibcommon libibmad libibumad libibverbs libmlx4 libmthca \
+SUBDIR= libibverbs libibcommon libibmad libibumad libmlx5 libmlx4 libmthca \
libopensm libosmcomp libosmvendor libibcm librdmacm libsdp libcxgb4
SUBDIR_DEPEND_libcxgb4= libibverbs
@@ -6,6 +6,7 @@
SUBDIR_DEPEND_libibmad= libibcommon libibumad
SUBDIR_DEPEND_libibumad= libibcommon
SUBDIR_DEPEND_libmlx4= libibverbs
+SUBDIR_DEPEND_libmlx5= libibverbs
SUBDIR_DEPEND_libmthca= libibverbs
SUBDIR_DEPEND_libosmvendor= libibumad libopensm libosmcomp
SUBDIR_DEPEND_librdmacm= libibverbs
Index: contrib/ofed/usr.lib/libmlx5/Makefile
===================================================================
--- /dev/null
+++ contrib/ofed/usr.lib/libmlx5/Makefile
@@ -0,0 +1,25 @@
+# $FreeBSD$
+
+SHLIBDIR?= /usr/lib
+
+.include <bsd.own.mk>
+
+MLX5DIR= ${.CURDIR}/../../libmlx5
+IBVERBSDIR= ${.CURDIR}/../../libibverbs
+MLXSRCDIR= ${MLX5DIR}/src
+
+.PATH: ${MLXSRCDIR}
+
+LIB= mlx5
+SHLIB_MAJOR= 1
+MK_PROFILE= no
+
+SRCS= buf.c cq.c dbrec.c implicit_lkey.c mlx5.c qp.c srq.c verbs.c
+
+LIBADD= ibverbs pthread
+CFLAGS+= -DHAVE_CONFIG_H
+CFLAGS+= -I${.CURDIR} -I${MLXSRCDIR} -I${IBVERBSDIR}/include
+
+VERSION_MAP= ${MLXSRCDIR}/mlx5.map
+
+.include <bsd.lib.mk>
Index: contrib/ofed/usr.lib/libmlx5/config.h
===================================================================
--- /dev/null
+++ contrib/ofed/usr.lib/libmlx5/config.h
@@ -0,0 +1,92 @@
+/* config.h. Generated from config.h.in by configure. */
+/* config.h.in. Generated from configure.ac by autoheader. */
+
+/* Define to 1 if you have the <dlfcn.h> header file. */
+#define HAVE_DLFCN_H 1
+
+/* Define to 1 if you have the `ibv_dofork_range' function. */
+#define HAVE_IBV_DOFORK_RANGE 1
+
+/* Define to 1 if you have the `ibv_dontfork_range' function. */
+#define HAVE_IBV_DONTFORK_RANGE 1
+
+/* adding verbs extension support */
+/* #undef HAVE_IBV_EXT */
+
+/* Define to 1 if you have the `ibv_register_driver' function. */
+#define HAVE_IBV_REGISTER_DRIVER 1
+
+/* Define to 1 if you have the <inttypes.h> header file. */
+#define HAVE_INTTYPES_H 1
+
+/* Define to 1 if you have the `ibverbs' library (-libverbs). */
+#define HAVE_LIBIBVERBS 1
+
+/* Define to 1 if you have the <memory.h> header file. */
+#define HAVE_MEMORY_H 1
+
+/* adding numa support */
+/* #undef HAVE_NUMA */
+
+/* Define to 1 if you have the <stdint.h> header file. */
+#define HAVE_STDINT_H 1
+
+/* Define to 1 if you have the <stdlib.h> header file. */
+#define HAVE_STDLIB_H 1
+
+/* Define to 1 if you have the <strings.h> header file. */
+#define HAVE_STRINGS_H 1
+
+/* Define to 1 if you have the <string.h> header file. */
+#define HAVE_STRING_H 1
+
+/* Define to 1 if you have the <sys/stat.h> header file. */
+#define HAVE_SYS_STAT_H 1
+
+/* Define to 1 if you have the <sys/types.h> header file. */
+#define HAVE_SYS_TYPES_H 1
+
+/* Define to 1 if you have the <unistd.h> header file. */
+#define HAVE_UNISTD_H 1
+
+/* Define to 1 if you have the <valgrind/memcheck.h> header file. */
+/* #undef HAVE_VALGRIND_MEMCHECK_H */
+
+/* Define to the sub-directory where libtool stores uninstalled libraries. */
+#define LT_OBJDIR ".libs/"
+
+/* Define to 1 to disable Valgrind annotations. */
+#define NVALGRIND 1
+
+/* Name of package */
+#define PACKAGE "libmlx5"
+
+/* Define to the address where bug reports for this package should be sent. */
+#define PACKAGE_BUGREPORT "linux-rdma@vger.kernel.org"
+
+/* Define to the full name of this package. */
+#define PACKAGE_NAME "libmlx5"
+
+/* Define to the full name and version of this package. */
+#define PACKAGE_STRING "libmlx5 1.0.2mlnx1"
+
+/* Define to the one symbol short name of this package. */
+#define PACKAGE_TARNAME "libmlx5"
+
+/* Define to the home page for this package. */
+#define PACKAGE_URL ""
+
+/* Define to the version of this package. */
+#define PACKAGE_VERSION "1.0.2mlnx1"
+
+/* The size of `long', as computed by sizeof. */
+#define SIZEOF_LONG 8
+
+/* Define to 1 if you have the ANSI C header files. */
+#define STDC_HEADERS 1
+
+/* Version number of package */
+#define VERSION "1.0.2mlnx1"
+
+/* Define to empty if `const' does not conform to ANSI C. */
+/* #undef const */
Index: share/mk/bsd.libnames.mk
===================================================================
--- share/mk/bsd.libnames.mk
+++ share/mk/bsd.libnames.mk
@@ -102,6 +102,7 @@
LIBMENU?= ${DESTDIR}${LIBDIR}/libmenu.a
LIBMILTER?= ${DESTDIR}${LIBDIR}/libmilter.a
LIBMLX4?= ${DESTDIR}${LIBDIR}/libmlx4.a
+LIBMLX5?= ${DESTDIR}${LIBDIR}/libmlx5.a
LIBMP?= ${DESTDIR}${LIBDIR}/libmp.a
LIBMT?= ${DESTDIR}${LIBDIR}/libmt.a
LIBMTHCA?= ${DESTDIR}${LIBDIR}/libmthca.a
Index: share/mk/src.libnames.mk
===================================================================
--- share/mk/src.libnames.mk
+++ share/mk/src.libnames.mk
@@ -194,6 +194,7 @@
ibumad \
ibverbs \
mlx4 \
+ mlx5 \
mthca \
opensm \
osmcomp \
@@ -329,6 +330,7 @@
_DP_ibmad= ibcommon ibumad
_DP_ibumad= ibcommon
_DP_mlx4= ibverbs pthread
+_DP_mlx5= ibverbs pthread
_DP_mthca= ibverbs pthread
_DP_opensm= pthread
_DP_osmcomp= pthread
@@ -478,6 +480,7 @@
LIBIBUMADDIR= ${OBJTOP}/contrib/ofed/usr.lib/libibumad
LIBIBVERBSDIR= ${OBJTOP}/contrib/ofed/usr.lib/libibverbs
LIBMLX4DIR= ${OBJTOP}/contrib/ofed/usr.lib/libmlx4
+LIBMLX5DIR= ${OBJTOP}/contrib/ofed/usr.lib/libmlx5
LIBMTHCADIR= ${OBJTOP}/contrib/ofed/usr.lib/libmthca
LIBOPENSMDIR= ${OBJTOP}/contrib/ofed/usr.lib/libopensm
LIBOSMCOMPDIR= ${OBJTOP}/contrib/ofed/usr.lib/libosmcomp
