Index: contrib/ofed/libmlx5/AUTHORS =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/AUTHORS @@ -0,0 +1 @@ +Eli Cohen Index: contrib/ofed/libmlx5/COPYING =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/COPYING @@ -0,0 +1,378 @@ +This software is available to you under a choice of one of two +licenses. You may choose to be licensed under the terms of the the +OpenIB.org BSD license or the GNU General Public License (GPL) Version +2, both included below. + +Copyright (c) 2007 Cisco, Inc. All rights reserved. + +================================================================== + + OpenIB.org BSD license + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions +are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above + copyright notice, this list of conditions and the following + disclaimer in the documentation and/or other materials provided + with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE +COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, +BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN +ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. + +================================================================== + + GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc. + 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Library General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. 
+These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. 
+ + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. + +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. 
However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. + +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. +You are not responsible for enforcing compliance by third parties to +this License. + + 7. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. 
Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + + 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. 
+
+		    END OF TERMS AND CONDITIONS
+
+	How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program; if not, write to the Free Software
+    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+    Gnomovision version 69, Copyright (C) year name of author
+    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary.  Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+  `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+  <signature of Ty Coon>, 1 April 1989
+  Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs.  If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library.  If this is what you want to do, use the GNU Library General
+Public License instead of this License.
Index: contrib/ofed/libmlx5/Makefile.am
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/Makefile.am
@@ -0,0 +1,35 @@
+AM_CFLAGS = -g -Wall -Werror -D_GNU_SOURCE -I$(includedir)
+LDFLAGS += @NUMA_LIB@
+EXTRA_DIST = src/mlx5.map libmlx5.spec.in mlx5.driver
+EXTRA_DIST += debian
+EXTRA_DIST += autogen.sh
+EXTRA_DIST += scripts/expose_libmlx5_headers/libmlx_expose_headers scripts/expose_libmlx5_headers/defines.txt scripts/expose_libmlx5_headers/structures.txt scripts/expose_libmlx5_headers/enumerations.txt
+EXTRA_DIST += libmlx5.spec
+
+
+mlx5_version_script = @MLX5_VERSION_SCRIPT@
+
+MLX5_SOURCES = src/buf.c src/cq.c src/dbrec.c src/mlx5.c src/qp.c src/srq.c src/verbs.c src/implicit_lkey.c
+noinst_HEADERS = src/bitmap.h src/doorbell.h src/list.h src/mlx5-abi.h src/mlx5.h src/wqe.h src/implicit_lkey.h
+
+
+if HAVE_IBV_DEVICE_LIBRARY_EXTENSION
+    lib_LTLIBRARIES = src/libmlx5.la
+    src_libmlx5_la_SOURCES = $(MLX5_SOURCES)
+    src_libmlx5_la_LDFLAGS = -avoid-version -release @IBV_DEVICE_LIBRARY_EXTENSION@ \
+	$(mlx5_version_script)
+    mlx5confdir = $(sysconfdir)/libibverbs.d
+    mlx5conf_DATA = mlx5.driver
+else
+    mlx5libdir = $(libdir)/infiniband
+    mlx5lib_LTLIBRARIES = src/mlx5.la
+    src_mlx5_la_SOURCES = $(MLX5_SOURCES)
+    src_mlx5_la_LDFLAGS = -avoid-version -module $(mlx5_version_script)
+endif
+
+install-data-hook:
+	mkdir -p $(DESTDIR)$(prefix)/include/infiniband
+	$(top_srcdir)/scripts/expose_libmlx5_headers/libmlx_expose_headers $(top_srcdir)/scripts/expose_libmlx5_headers/defines.txt $(top_srcdir)/scripts/expose_libmlx5_headers/structures.txt $(top_srcdir)/scripts/expose_libmlx5_headers/enumerations.txt $(DESTDIR)$(prefix)
+
+uninstall-hook:
+	rm -f $(DESTDIR)$(prefix)/include/infiniband/mlx5_hw.h
Index: contrib/ofed/libmlx5/README
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/README
@@ -0,0 +1,4 @@
+Introduction
+============
+
+Original file content erased; an introduction will be added here once coding is finished.
Index: contrib/ofed/libmlx5/autogen.sh
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/autogen.sh
@@ -0,0 +1,7 @@
+#!/bin/sh -exE
+
+aclocal -I config
+libtoolize --force --copy
+autoheader
+automake --foreign --add-missing --copy
+autoconf
Index: contrib/ofed/libmlx5/config/.gitignore
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/config/.gitignore
@@ -0,0 +1,8 @@
+mkinstalldirs
+depcomp
+compile
+missing
+config.guess
+config.sub
+ltmain.sh
+install-sh
Index: contrib/ofed/libmlx5/configure.ac
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/configure.ac
@@ -0,0 +1,114 @@
+dnl Process this file with autoconf to produce a configure script.
+
+AC_PREREQ(2.57)
+AC_INIT(libmlx5, 1.0.2mlnx1, linux-rdma@vger.kernel.org)
+AC_CONFIG_SRCDIR([src/mlx5.h])
+AC_CONFIG_AUX_DIR(config)
+AC_CONFIG_HEADER(config.h)
+AM_INIT_AUTOMAKE([1.10 foreign tar-ustar silent-rules subdir-objects])
+m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
+
+AC_PROG_LIBTOOL
+LT_INIT
+
+AC_ARG_WITH([valgrind],
+    AC_HELP_STRING([--with-valgrind],
+        [Enable Valgrind annotations (small runtime overhead, default NO)]))
+if test x$with_valgrind = x || test x$with_valgrind = xno; then
+    want_valgrind=no
+    AC_DEFINE([NVALGRIND], 1, [Define to 1 to disable Valgrind annotations.])
+else
+    want_valgrind=yes
+    if test -d $with_valgrind; then
+        CPPFLAGS="$CPPFLAGS -I$with_valgrind/include"
+    fi
+fi
+
+AC_ARG_WITH([mlx5_debug],
+    AC_HELP_STRING([--with-mlx5_debug],
+        [Enable extensive debug prints from libmlx5 (default NO)]))
+if test x$with_mlx5_debug = xyes; then
+    CFLAGS="$CFLAGS -DMLX5_DEBUG"
+fi
+
+CFLAGS="$CFLAGS -Werror"
+
+dnl Checks for programs
+AC_PROG_CC
+
+dnl Checks for libraries
+AC_CHECK_LIB(numa, numa_node_of_cpu,
+    [
+        have_numa=yes
+        AC_DEFINE(HAVE_NUMA, 1, [adding numa support])
+    ],
+    [
+        have_numa=no
+    ]
+)
+
+AC_CHECK_LIB(ibverbs, ibv_get_device_list, [],
+    AC_MSG_ERROR([ibv_get_device_list() not found.  libmlx5 requires libibverbs.]))
+
+AC_CHECK_LIB(ibverbs, ibv_register_driver_ext,
+    AC_DEFINE(HAVE_IBV_EXT, 1, [adding verbs extension support]))
+
+dnl Checks for header files.
+AC_CHECK_HEADER(infiniband/driver.h, [],
+    AC_MSG_ERROR([<infiniband/driver.h> not found.  libmlx5 requires libibverbs.]))
+AC_HEADER_STDC
+
+if test x$want_valgrind = xyes; then
+AC_CHECK_HEADER(valgrind/memcheck.h,
+    [AC_DEFINE(HAVE_VALGRIND_MEMCHECK_H, 1,
+        [Define to 1 if you have the <valgrind/memcheck.h> header file.])],
+    [if test $want_valgrind = yes; then
+        AC_MSG_ERROR([Valgrind memcheck support requested, but <valgrind/memcheck.h> not found.])
+    fi])
+fi
+
+dnl Checks for typedefs, structures, and compiler characteristics.
+AC_C_CONST
+AC_CHECK_SIZEOF(long)
+
+dnl Checks for library functions
+AC_CHECK_FUNC(ibv_read_sysfs_file, [],
+    AC_MSG_ERROR([ibv_read_sysfs_file() not found.  libmlx5 requires libibverbs >= 1.0.3.]))
+AC_CHECK_FUNCS(ibv_dontfork_range ibv_dofork_range ibv_register_driver)
+
+dnl Now check for libibverbs 1.0 vs 1.1
+dummy=if$$
+cat <<IBV_VERSION > $dummy.c
+#include <infiniband/driver.h>
+IBV_DEVICE_LIBRARY_EXTENSION
+IBV_VERSION
+IBV_DEVICE_LIBRARY_EXTENSION=`$CC $CPPFLAGS -E $dummy.c 2> /dev/null | tail -1`
+rm -f $dummy.c
+AM_CONDITIONAL(HAVE_IBV_DEVICE_LIBRARY_EXTENSION,
+    test $IBV_DEVICE_LIBRARY_EXTENSION != IBV_DEVICE_LIBRARY_EXTENSION)
+AC_SUBST(IBV_DEVICE_LIBRARY_EXTENSION)
+
+AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script,
+    [if test -n "`$LD --help < /dev/null 2>/dev/null | grep version-script`"; then
+        ac_cv_version_script=yes
+    else
+        ac_cv_version_script=no
+    fi])
+
+if test $ac_cv_version_script = yes; then
+    MLX5_VERSION_SCRIPT='-Wl,--version-script=$(srcdir)/src/mlx5.map'
+else
+    MLX5_VERSION_SCRIPT=
+fi
+AC_SUBST(MLX5_VERSION_SCRIPT)
+
+if test $have_numa = yes; then
+    NUMA_LIB='-lnuma'
+else
+    NUMA_LIB=
+fi
+AC_SUBST(NUMA_LIB)
+
+
+AC_CONFIG_FILES([Makefile libmlx5.spec])
+AC_OUTPUT
Index: contrib/ofed/libmlx5/debian/changelog
===================================================================
--- /dev/null
+++ contrib/ofed/libmlx5/debian/changelog
@@ -0,0 +1,208 @@
+libmlx5 (1.0.2mlnx1-1) unstable; urgency=low
+
+  * adjust mlx5_hw.h to survive -Werror during autotools
+  * Add support for compact AV
+  * fix warning in print with wrong type at mlx5_dbg
+  * Added report of GRH in wc_flags to DC CQE.
+  * configure.ac: Not compiling with valgrind if flag not set
+  * Fix masked atomic args check while post send
+  * fix bug of setting ctrl_seg opc_mod to wrong variable
+  * Add ConnectX-4 device
+
+ -- Alaa Hleihel  Wed, 15 Apr 2015 18:47:10 +0200
+
+libmlx5 (1.0.1mlnx2-1) unstable; urgency=low
+
+  * libmlx5: ibv_exp_create_mr with IBV_EXP_MR_SIGNATURE_EN return error.
+  * libmlx5: revert the endianess fix for immediate data
+  * libmlx5: fix gcc version query define
+  * libmlx5: use wc_auto_evict_size instead of wc_flush
+  * libmlx5:Fixed Immediate data endianness.
+  * libmlx5: copy the correct pd into odp_data.pd
+  * libmlx5: fix debug mode
+  * libmlx5: fix bug in umr
+  * Replace getenv to use ibv_exp_getenv
+  * libmlx5: fix contiguous page registration size.
+ * libmlx5: valgrind errors on modify_cq + * libmlx5: fix compilation bugs for old gcc + * Modify to use verbs specific getenv + * libmlx5: fixed overrun bug in resize cq + * libmlx5: Add general and code restructuring optimizations + * libmlx5: Reset opmod for each wr + * libmlx5: Optimize usage of memory barrier and locks + * libmlx5: destroy remote implicit rkey during PD deallocation + * libmlx5.spec.in: Changed valgrind libs DESTDIR + * Added valgrind support + * fixed and added valgrind Macros + * Adding experimental dereg_mr support + * libmlx5: added support to choose specific addr when using contig_pages + * libmlx5: fixed wraparound problem in umr creation + * libmlx5: added -Werror to Makefile.am + * libmlx5: Fix send wq calculation + * libmlx5: added max_inl_send_klmx check in create qp + * libmlx5: UMR API change + * libmlx5: Fix create qp errno value from EINVAL to ENOMEM + * libmlx5: Handle send queue wraparound in extended atomics + * libmlx5: fail on create QP if enable umr without comp_mask + * libmlx5: add check max device sge when creating QP + * libmlx5: update opcode in bad completion + * libmlx5: corrections after changes in ibv_exp_prefetch_attr + * libmlx5: change return value on ibv_post_srq_recv + * libmlx5: fix segfault on modify QP + * libmlx5: Verify max atomic arg size + * libmlx5: Fix failure to set inline for invalidate + * libmlx5.spec.in: use %{_prefix} instead of /usr + * libmlx5.spec.in: Support configure_options. + * Makefile.am: add implicit_lkey.h to noinst_HEADERS + * libibverbs: Fix immediate error detection base on IB spec + * libmlx5: Avoid creating AH with DLID 0 + * configure: Update AM_INIT_AUTOMAKE to support new auto tools. + * libmlx5: fix compilation warning on 32bit arch + * libmlx5: remove attr_size from ibv_exp_prefetch_mr verbs. + * libmlx5: prefetch implicit rkey MRs when registering a relaxed MR + * libxml5: Properly set the parameters of mrs created implicitly. + * libmlx5: Fix broken build on XEN server + * libmlx5: Fail post send if not in RTS + * libmlx5: Fix HW limitation in atomic response scatter entry + * libmlx5: Add missing fp in case of debug build + * libmlx5: Indicate UMR support at create + * libmlx5: fix compilation warning on Xen. + * libmlx5: fix compilation warning on newer gcc. + * libmlx5: fix compilation warning on 32bit arch + * libmlx5: add remote implicit mr support. + * libmlx5: Add implicit-lkey support. + * libmlx5: add support for the new ibv_exp_prefetch_mr verb. + * libmlx5: use $includedir to search for include files. + * libmlx5: change ibv_exp_reg_mr to call ibv_cmd_exp_reg_mr. + * libmlx5: fix reported size of verbs device struct. 
+ * libmlx5: Add completion opcodes for masked atomic operations + * libmlx5: Fix bug taking args from wrong place + * libmlx5: Re-work UMR API + * libmlx5: Fix wrong calculation of translation size + * libmlx5: Add work completio opcode for UMR ops + * libmlx5: Fix DC size report to be a mask value + * BUILD: fix make checkdist and install datahook to respect $prefix + * libmlx5: Fix alignment problem + * libmlx5: Minor fixes to post send + * libmlx5: Use correct comp_mask in inline KLMs indication + * libmlx5: Fix compilation issues on 32 bit archs + * libmlx5: Add UMR support + * libmlx5: Add support for send NOP + * libmlx5: Simplify extended atomics API + * libmlx5: Fix compiler warning - unsued varaible + * scripts/expose_libmlx5_headers: install to the correct directory + * libmlx5: Fix endianess of atomics > 8 bytes + * libmlx5: Add support for Connect-IB virtual function + * libmlx5: Fix point type in ext_cmp_swp and ext_fetch_add + * libmlx5: fix 32b host compilation issue + * libmlx5: Add extended atomic support + * scripts/expose_libmlx5_headers: update the structures.txt file. + * libmlx5: Avoid overflow on mlx5_get_block_order() + * Revert "libmlx5: Fix log function to avoid overflow" + * Revert "libmlx5: Fix corner case in mlx5_get_block_order" + * libmlx5: Fix workaround for XRC + * libmlx5: Fix seg fault in poll_cq + * libmlx5: Fix corner case in mlx5_get_block_order + * libmlx5: Fix broken report on srq_qp + * libmlx5: fix refcnt for xrc + * libmlx5: Fix overflow on flag mask + * libmlx5: Fix log function to avoid overflow + * libmlx5: Fix variable overflow + * libmlx5: Return SRQ number in src_qp for XRC legacy + * libmlx5:update qp state on exp modify qp + * libmlx5: improve experimental interface + * libmlx5: Clear destroyed QP for resource table + * Change imm_data to ex.imm_data + * libmlx5: change wc_size from int to uint32_t. + * libmlx5: Fix sq overhead calculation + * libmlx5: Drain DCT CQEs when destroyed + * libmlx5.spec.in: Remove hard coded name and version from the Source + + -- Vladimir Sokolovsky Wed, 10 Dec 2014 10:53:10 +0200 + +libmlx5 (1.0.1mlnx1-1) unstable; urgency=low + + * libmlx5: Fix reported max SGE + * libmlx5: Add support for experimental atomics + * libmlx5: Fix corruption of legacy xrc domain + * libmlx5: added a new script that exposes specific structures, enumerations and defines from the libmlx5 sources to a new header file. 
+ * libmlx5: Fix return codes from post send/recv + * libmlx5: Use new mlx5_alloc_ucontext to allow BF + * libmlx5: fix write on non existing exp_wc_flags field + * libmlx5: Add support for ARM DCT + * libmlx5: Align verbs interface with upstream + * libmlx5: add ibv_exp_reg_mr experimental verb + * libmlx5: Change legacy extended verbs to experimental verbs + * libmlx5: Change legacy extended uverbs to experimental uverbs + * Enable contigous pages for Control resources by default + * libmlx5: Do not publish support for IBV_CALC_OP_MAXLOC + * libmlx5: Follow API changes in libibverbs + * libmlx5: Fix memory leak in destroy DCT + * libmlx5: Optimize post send for CD operations + * libmlx5: Remove valgrind statement from mlx5_poll_one + * libmlx5: Fix valgrind error on Debian 7.1 + * libmlx5: Fix overflow handling in resize CQ + * libmlx5: Fix leak in destory srq + * libmlx5: Fix destroy DCT + * libmlx5: Fix resize CQ + * libmlx5: Add missing defines + * libmlx5: Change sandy bridge work around algorithm + * libmlx5: add debian support to EXTRA_DIST + * libmlx5: add support for "git review" command line gerrit tool + * libmlx5: Fix "make distcheck" + * libmlx5: Fix create QP extended flow + * libmlx5: Fix resize CQ missing mask + * libmlx5: Add Cross-channel capability + * libmlx5: Add mlx5_post_task + * libmlx5: Add CALC capabilities information into mlx5_query_device_ex + * libmlx5: Support Cross-channel capability in mlx5_drv_create_qp + * libmlx5: Add new opcodes to support Cross-channel + * libmlx5: Add support for inline receive new API + * mlx5: Add support for reading DC capabilites + * libmlx5: Fix XRC poll CQ flow + * libmlx5: Return DC related objects in query + * Revert "Revert "libmlx5: Remove deprecated enum IBV_QPT_DCT"" + * Revert "libmlx5: Remove deprecated enum IBV_QPT_DCT" + * libmlx5: Remove deprecated enum IBV_QPT_DCT + * libmlx5: Move DC calls to experimental verbs files + * libmlx5: Avoid clearing unused struct + * libmlx5: Fix justified compile warnings on debian + * libmlx5: Modify support for DC + * libmlx5: Change call to experimental create qp + * libmlx5: Add support for resize cq + * libmlx5: poll cq may report grh indication for non UD QPs + * libmlx5: Remove/rename mentions of mlx4 + * libmlx5: Fix broken uuar allocator + * libmlx5: Add support for create CQ extended + * libmlx5: add support for modify cq + * libmlx5: add support for query device extended + * libmlx5: avoid free of un-allocated pointer + * Avoid allocating receive buffer for QPs without recieve queue + * Fix signature calculation on receive queues + * Disable atomic operations + * libmlx5: Avoid returning negative values of errno + * libmlx5: fix srq free in destroy qp + * call mlx5_store/clear_qp() only when there are wqes + * libmlx5: Add adaptive stall mechanism for cq in sandy bridge + * libmlx5: On destroy qp remove pending cqe only by their qpn + * Fix copy to scat + * Fix leak in destroy SRQ + * libmlx5: Fix scatter to CQE + * libmlx5: XRC compat support + * Fix returned values in create QP + * libmlx5: Add DC support + * Work around for recovery problem in UoF + * Fix failure when mixed SRQ and QP report to CQ + * Add env varialbe to shut down blueflame + * Change dfault SB loop count + * Control action on error CQE + * mlx5: add XRC support + * mlx5: move call to single_threaded_app() to mlx5.c + + -- Vladimir Sokolovsky Sun, 23 Mar 2014 14:16:10 +0200 + +libmlx5 (1.0.0-1) unstable; urgency=low + + * New Mellanox release. 
+ + -- Vladimir Sokolovsky Mon, 7 Jan 2013 13:38:10 +0200 Index: contrib/ofed/libmlx5/debian/compat =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/debian/compat @@ -0,0 +1 @@ +7 Index: contrib/ofed/libmlx5/debian/control =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/debian/control @@ -0,0 +1,47 @@ +Source: libmlx5 +Priority: extra +Maintainer: Eli Cohen +Build-Depends: debhelper (>= 7.0.50~), dpkg-dev (>= 1.13.19), libibverbs-dev (>= 1.1.3) +Standards-Version: 3.9.2 +Section: libs +Homepage: http://www.openfabrics.org/ + +Package: libmlx5-1 +Section: libs +Architecture: any +Depends: ${shlibs:Depends}, ${misc:Depends}, libibverbs1 (>= 1.1.3) +Description: Userspace driver for Mellanox ConnectX InfiniBand HCAs + libmlx5 is a device-specific driver for Mellanox Connect-IB InfiniBand + host channel adapters (HCAs) for the libibverbs library. This allows + userspace processes to access Mellanox HCA hardware directly with + low latency and low overhead. + . + This package contains the loadable plug-in. + +Package: libmlx5-dev +Section: libdevel +Architecture: any +Depends: ${misc:Depends}, libmlx5-1 (= ${binary:Version}) +Description: Development files for the libmlx5 driver + libmlx5 is a device-specific driver for Mellanox Connect-IB InfiniBand + host channel adapters (HCAs) for the libibverbs library. This allows + userspace processes to access Mellanox HCA hardware directly with + low latency and low overhead. + . + This package contains static versions of libmlx5 that may be linked + directly to an application, which may be useful for debugging. + +Package: libmlx5-1-dbg +Section: debug +Priority: extra +Architecture: any +Depends: ${misc:Depends}, libmlx5-1 (= ${binary:Version}) +Description: Debugging symbols for the libmlx5 driver + libmlx5 is a device-specific driver for Mellanox Connect-IB InfiniBand + host channel adapters (HCAs) for the libibverbs library. This allows + userspace processes to access Mellanox HCA hardware directly with + low latency and low overhead. + . + This package contains the debugging symbols associated with + libmlx5-1. They will automatically be used by gdb for debugging + libmlx5-related issues. Index: contrib/ofed/libmlx5/debian/copyright =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/debian/copyright @@ -0,0 +1,43 @@ +Initial Debianization: +This package was debianized by Roland Dreier on +Fri, 6 Apr 2007 10:04:57 -0700 + +Source: +It was downloaded from the OpenFabrics web site at + + +Authors: + Roland Dreier + +Portions are copyrighted by: + * Copyright (c) 2005, 2006, 2007 Cisco Systems. All rights reserved. + * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2005 Mellanox Technologies Ltd. All rights reserved. + +libmlx4 is licensed under a choice of one of two licenses. You may +choose to be licensed under the terms of the GNU General Public +License (GPL) Version 2, available from the file +/usr/share/common-licenses/GPL-2 on your Debian system, or the +OpenIB.org BSD license below: + + Redistribution and use in source and binary forms, with or + without modification, are permitted provided that the following + conditions are met: + + - Redistributions of source code must retain the above + copyright notice, this list of conditions and the following + disclaimer. 
+ + - Redistributions in binary form must reproduce the above + copyright notice, this list of conditions and the following + disclaimer in the documentation and/or other materials + provided with the distribution. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS +BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN +ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN +CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. Index: contrib/ofed/libmlx5/debian/libmlx5-1.install =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/debian/libmlx5-1.install @@ -0,0 +1,2 @@ +usr/lib/libmlx5-rdmav2.so /usr/lib/libibverbs/ +etc/libibverbs.d/mlx5.driver Index: contrib/ofed/libmlx5/debian/libmlx5-dev.install =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/debian/libmlx5-dev.install @@ -0,0 +1,2 @@ +usr/lib/libmlx5.a +usr/include/infiniband Index: contrib/ofed/libmlx5/debian/patches/driver-plugin-directory.patch =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/debian/patches/driver-plugin-directory.patch @@ -0,0 +1,10 @@ +Description: Tell libibverbs to look in /usr/lib/libibverbs for plugin library +Author: Roland Dreier + +Index: libmlx5.git/mlx5.driver +=================================================================== +--- libmlx5.git.orig/mlx5.driver 2011-07-06 01:27:34.521058451 -0700 ++++ libmlx5.git/mlx5.driver 2011-07-06 01:27:47.051074172 -0700 +@@ -1 +1 @@ +-driver mlx5 ++driver /usr/lib/libibverbs/libmlx5 Index: contrib/ofed/libmlx5/debian/patches/series =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/debian/patches/series @@ -0,0 +1 @@ +driver-plugin-directory.patch Index: contrib/ofed/libmlx5/debian/rules =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/debian/rules @@ -0,0 +1,10 @@ +#!/usr/bin/make -f +# -*- mode: makefile; coding: utf-8 -*- + +%: + dh $@ + +override_dh_strip: + dh_strip --dbg-package=libmlx5-1-dbg + +override_dh_makeshlibs: Index: contrib/ofed/libmlx5/debian/source/format =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/debian/source/format @@ -0,0 +1 @@ +3.0 (quilt) Index: contrib/ofed/libmlx5/debian/watch =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/debian/watch @@ -0,0 +1,3 @@ +version=3 +opts="uversionmangle=s/-rc/~rc/" \ + http://www.openfabrics.org/downloads/mlx5/libmlx5-(.+)\.tar\.gz Index: contrib/ofed/libmlx5/libmlx5.spec.in =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/libmlx5.spec.in @@ -0,0 +1,85 @@ +%{!?_with_valgrind: %define _with_valgrind 0} +%{!?_disable_valgrind: %define _disable_valgrind 0} + +%if 0%{?rhel} == 6 +%if 0%{_disable_valgrind} == 0 +%define _with_valgrind 1 +%endif +%endif + +Name: libmlx5 +Version: 1.0.2mlnx1 +Release: 1%{?dist} +Summary: Mellanox ConnectX-IB InfiniBand HCA Userspace Driver + +Group: System Environment/Libraries +License: GPLv2 or BSD +Url: http://openfabrics.org/ +Source: 
http://openfabrics.org/downloads/mlx5/%{name}-%{version}.tar.gz +BuildRoot: %(mktemp -ud %{_tmppath}/%{name}-%{version}-%{release}-XXXXXX) + +BuildRequires: libibverbs-devel >= 1.1-0.1.rc2 +%if %{_with_valgrind} +BuildRequires: valgrind-devel +%endif + +%description +libmlx5 provides a device-specific userspace driver for Mellanox +ConnectX HCAs for use with the libibverbs library. + +%package devel +Summary: Development files for the libmlx5 driver +Group: System Environment/Libraries +Requires: %{name} = %{version}-%{release} +Provides: libmlx5-static = %{version}-%{release} + +%description devel +Static version of libmlx5 that may be linked directly to an +application, which may be useful for debugging. + +%prep +%setup -q -n %{name}-@VERSION@ + +%build +%if %{_with_valgrind} +%configure %{?configure_options} --libdir=%{_libdir}/mlnx_ofed/valgrind --with-valgrind +make %{?_smp_mflags} +make DESTDIR=$RPM_BUILD_DIR/%{name}-%{version}/valgrind install +rm -f $RPM_BUILD_DIR/%{name}-%{version}/valgrind/%{_libdir}/mlnx_ofed/valgrind/*.*a +make clean +%endif + +%configure %{configure_options} +make %{?_smp_mflags} + +%install +rm -rf $RPM_BUILD_ROOT +make DESTDIR=%{buildroot} install +%if %{_with_valgrind} +mkdir -p %{buildroot}/%{_libdir}/mlnx_ofed +cp -a $RPM_BUILD_DIR/%{name}-%{version}/valgrind/%{_libdir}/mlnx_ofed/valgrind %{buildroot}/%{_libdir}/mlnx_ofed +%endif +# remove unpackaged files from the buildroot +rm -f $RPM_BUILD_ROOT%{_libdir}/*.la $RPM_BUILD_ROOT%{_libdir}/libmlx5.so + +%clean +rm -rf $RPM_BUILD_ROOT + +%files +%defattr(-,root,root,-) +%{_libdir}/libmlx5-rdmav2.so +%if %{_with_valgrind} +%{_libdir}/mlnx_ofed/valgrind/libmlx5*.so +%endif +%{_sysconfdir}/libibverbs.d/mlx5.driver +%doc AUTHORS COPYING README + +%files devel +%defattr(-,root,root,-) +%{_libdir}/libmlx5.a +%{_prefix}/include/infiniband/ + +%changelog +* Mon Mar 26 2012 Eli Cohen - 1.0.0 +- First version + Index: contrib/ofed/libmlx5/mlx5.driver =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/mlx5.driver @@ -0,0 +1 @@ +driver mlx5 Index: contrib/ofed/libmlx5/scripts/expose_libmlx5_headers/defines.txt =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/scripts/expose_libmlx5_headers/defines.txt @@ -0,0 +1,2 @@ +MLX5_CQ_DB_REQ_NOT_SOL +MLX5_CQ_DB_REQ_NOT Index: contrib/ofed/libmlx5/scripts/expose_libmlx5_headers/enumerations.txt =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/scripts/expose_libmlx5_headers/enumerations.txt @@ -0,0 +1,74 @@ +MLX5_RCV_DBR +MLX5_SND_DBR +MLX5_SEND_WQE_BB +MLX5_SEND_WQE_SHIFT +MLX5_INLINE_SCATTER_32 +MLX5_INLINE_SCATTER_64 +MLX5_OPCODE_NOP +MLX5_OPCODE_SEND_INVAL +MLX5_OPCODE_RDMA_WRITE +MLX5_OPCODE_RDMA_WRITE_IMM +MLX5_OPCODE_SEND +MLX5_OPCODE_SEND_IMM +MLX5_OPCODE_LSO_MPW +MLX5_OPC_MOD_MPW +MLX5_OPCODE_RDMA_READ +MLX5_OPCODE_ATOMIC_CS +MLX5_OPCODE_ATOMIC_FA +MLX5_OPCODE_ATOMIC_MASKED_CS +MLX5_OPCODE_ATOMIC_MASKED_FA +MLX5_OPCODE_BIND_MW +MLX5_OPCODE_FMR +MLX5_OPCODE_LOCAL_INVAL +MLX5_OPCODE_CONFIG_CMD +MLX5_OPCODE_SEND_ENABLE +MLX5_OPCODE_RECV_ENABLE +MLX5_OPCODE_CQE_WAIT +MLX5_RECV_OPCODE_RDMA_WRITE_IMM +MLX5_RECV_OPCODE_SEND +MLX5_RECV_OPCODE_SEND_IMM +MLX5_RECV_OPCODE_SEND_INVAL +MLX5_CQE_OPCODE_ERROR +MLX5_CQE_OPCODE_RESIZE +MLX5_SRQ_FLAG_SIGNATURE +MLX5_INLINE_SEG +MLX5_CALC_UINT64_ADD +MLX5_CALC_FLOAT64_ADD +MLX5_CALC_UINT64_MAXLOC +MLX5_CALC_UINT64_AND +MLX5_CALC_UINT64_OR +MLX5_CALC_UINT64_XOR 
+MLX5_CQ_DOORBELL +MLX5_CQE_SYNDROME_LOCAL_LENGTH_ERR +MLX5_CQE_SYNDROME_LOCAL_QP_OP_ERR +MLX5_CQE_SYNDROME_LOCAL_PROT_ERR +MLX5_CQE_SYNDROME_WR_FLUSH_ERR +MLX5_CQE_SYNDROME_MW_BIND_ERR +MLX5_CQE_SYNDROME_BAD_RESP_ERR +MLX5_CQE_SYNDROME_LOCAL_ACCESS_ERR +MLX5_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR +MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR +MLX5_CQE_SYNDROME_REMOTE_OP_ERR +MLX5_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR +MLX5_CQE_SYNDROME_RNR_RETRY_EXC_ERR +MLX5_CQE_SYNDROME_REMOTE_ABORTED_ERR +MLX5_CQE_OWNER_MASK +MLX5_CQE_REQ +MLX5_CQE_RESP_WR_IMM +MLX5_CQE_RESP_SEND +MLX5_CQE_RESP_SEND_IMM +MLX5_CQE_RESP_SEND_INV +MLX5_CQE_RESIZE_CQ +MLX5_CQE_SIG_ERR +MLX5_CQE_REQ_ERR +MLX5_CQE_RESP_ERR +MLX5_CQE_INVALID +MLX5_WQE_CTRL_CQ_UPDATE +MLX5_WQE_CTRL_SOLICITED +MLX5_WQE_CTRL_FENCE +MLX5_INVALID_LKEY +MLX5_EXTENDED_UD_AV +MLX5_NO_INLINE_DATA +MLX5_INLINE_DATA32_SEG +MLX5_INLINE_DATA64_SEG +MLX5_COMPRESSED Index: contrib/ofed/libmlx5/scripts/expose_libmlx5_headers/libmlx_expose_headers =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/scripts/expose_libmlx5_headers/libmlx_expose_headers @@ -0,0 +1,364 @@ +#!/bin/bash -eE +# Name: Expose libmlx5 headers +# Author: Majd Dibbiny - majd@mellanox.com + +name=libmlx_expose_headers +author="Majd Dibbiny - Majd@Mellanox.com" +usage="./libmlx_expose_headers defines-file structures-file enumerations-file\nPlease provide the files in the exact order" +example="./libmlx_expose_headers defines.txt structs.txt enums.txt" +script_output="The script's output file is saved to $output_file" +SCRIPTPATH=$(cd `dirname "${BASH_SOURCE[0]}"` && pwd) +args=3 +defines_file="$1" +structs_file="$2" +enums_file="$3" +prefix="$4" +output_file="$prefix/include/infiniband/mlx5_hw.h" +mkdir -p "$prefix/include/infiniband" +libmlx5_path="$SCRIPTPATH/../../src/*" +FILES="$libmlx5_path" + +function add_header { +cat < $output_file +/** + * Copyright (C) Mellanox Technologies Ltd. 2001-2014. ALL RIGHTS RESERVED. + * This software product is a proprietary product of Mellanox Technologies Ltd. + * (the "Company") and all right, title, and interest and to the software product, + * including all associated intellectual property rights, are and shall + * remain exclusively with the Company. + * + * This software product is governed by the End User License Agreement + * provided with the software product. + */ + +#ifndef MLX_HW_H_ +#define MLX_HW_H_ + +#include +#include +#include +#include +#include + +#define MLX5_GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__) +#if MLX5_GCC_VERSION >= 403 +# define __MLX5_ALGN_F__ __attribute__((noinline, aligned(64))) +# define __MLX5_ALGN_D__ __attribute__((aligned(64))) +#else +# define __MLX5_ALGN_F__ +# define __MLX5_ALGN_D__ +#endif + +EOF +} + +function add_footer { + echo -e "\n#endif" >> $output_file +} + +function expose_defines { + #need to add support for define on multiple lines + local expose_defines_res=0 + for f in $FILES ; do + grep -F -f $defines_file $f | sed -n '/^#/p' >> $output_file + done + while read -r line + do + if [ "`grep $line $output_file`" = "" ]; then + #echo "define: $line wasn't found." 
+ expose_defines_res=1 + break + fi + done < "$defines_file" + echo -e "\n" >> $output_file + echo $expose_defines_res +} + +function expose_enums { + local expose_enums_res=0 + +cat <> $output_file +enum mlx5_alloc_type { MXM_MLX5_ALLOC_TYPE_DUMMY }; +enum mlx5_rsc_type { MXM_MLX5_RSC_TYPE_DUMMY }; +enum mlx5_db_method { MXM_MLX5_DB_TYPE_DUMMY }; +enum mlx5_lock_type { MXM_MLX5_LOCK_TYPE_DUMMY }; +enum mlx5_lock_state { MXM_MLX5_LOCK_STATE_TYPE_DUMMY }; +EOF + echo "enum {" >> $output_file + while read -r line + do + for f in $FILES ; do + grep "$line" $f| while read -r gline ; do + pat="(\t)*(\s)*$line(\t)*(\s)*=" + if [[ $gline =~ $pat ]] ; + then + grep_res="`echo $gline|sed -e 's/,.*//'`" + echo -e "\t$grep_res," >> $output_file + break + fi + done + done + if [ "`grep $line $output_file`" = "" ]; then + #echo "enum: $line wasn't found." + expose_enums_res=1 + break + fi + done < "$enums_file" + echo -e "};\n" >> $output_file + echo $expose_enums_res +} + +function expose_structs { + local expose_structs_res=0 + + echo -e "struct mlx5_qp;\n" >> $output_file; + + while read -r line + do + struct_found=0 + for f in $FILES; do + struct_line="struct $line {" + grep_res=`grep "$struct_line" $f` + if [ "$grep_res" != "" ] ; then + struct_found=1 + counter=0 + flag=0 + while IFS='' read -r fline + do + if [ "$struct_line" == "$fline" ] ; + then + flag=1 + fi + if [ "$flag" -gt "0" ] ; + then + if [[ $fline == *{* ]] ; + then + ((counter++)) + elif [[ $fline == *}* ]] ; + then + ((counter--)) + fi + printf "%s\n" "$fline">> $output_file + if [ "$counter" -eq "0" ] ; + then + flag=0 + echo -e "\n" >> $output_file + fi + fi + done < "$f" + break + fi + done + if [ $struct_found -lt 1 ]; then + #echo "struct: $line wasn't found." + expose_structs_res=1 + break + fi + done < "$structs_file" + echo $expose_structs_res +} + +function add_aux_funcs { +cat <> $output_file +#define to_mxxx(xxx, type)\\ + ((struct mlx5_##type *)\\ + ((void *) ib##xxx - offsetof(struct mlx5_##type, ibv_##xxx))) + +static inline struct mlx5_qp *to_mqp(struct ibv_qp *ibqp) +{ + struct verbs_qp *vqp = (struct verbs_qp *)ibqp; + return container_of(vqp, struct mlx5_qp, verbs_qp); +} + +static inline struct mlx5_cq *to_mcq(struct ibv_cq *ibcq) +{ + return to_mxxx(cq, cq); +} + +EOF +} + +function add_qp_info_struct { +cat <> $output_file +struct ibv_mlx5_qp_info { + uint32_t qpn; + uint32_t *dbrec; + struct { + void *buf; + unsigned wqe_cnt; + unsigned stride; + } sq, rq; + struct { + void *reg; + unsigned size; + int need_lock; + } bf; +}; + +EOF +} +function add_qp_info_func { + add_qp_info_struct +cat <> $output_file +static inline int ibv_mlx5_exp_get_qp_info(struct ibv_qp *qp, struct ibv_mlx5_qp_info *qp_info) +{ + struct mlx5_qp *mqp = to_mqp(qp); + + if ((mqp->gen_data.scur_post != 0) || (mqp->rq.head != 0)) + return -1; + + qp_info->qpn = mqp->ctrl_seg.qp_num; + qp_info->dbrec = mqp->gen_data.db; + qp_info->sq.buf = mqp->buf.buf + mqp->sq.offset; + qp_info->sq.wqe_cnt = mqp->sq.wqe_cnt; + qp_info->sq.stride = 1 << mqp->sq.wqe_shift; + qp_info->rq.buf = mqp->buf.buf + mqp->rq.offset; + qp_info->rq.wqe_cnt = mqp->rq.wqe_cnt; + qp_info->rq.stride = 1 << mqp->rq.wqe_shift; + qp_info->bf.reg = mqp->gen_data.bf->reg; + qp_info->bf.need_lock = mqp->gen_data.bf->need_lock; + + if (mqp->gen_data.bf->uuarn > 0) + qp_info->bf.size = mqp->gen_data.bf->buf_size; + else + qp_info->bf.size = 0; + + return 0; +} + +EOF +} + +function add_cq_info_struct { +cat <> $output_file +struct ibv_mlx5_cq_info { + uint32_t cqn; + unsigned 
cqe_cnt; + void *buf; + uint32_t *dbrec; + unsigned cqe_size; +}; + +EOF +} + +function add_cq_info_func { + add_cq_info_struct +cat <> $output_file +static inline int ibv_mlx5_exp_get_cq_info(struct ibv_cq *cq, struct ibv_mlx5_cq_info *cq_info) +{ + struct mlx5_cq *mcq = to_mcq(cq); + + if (mcq->cons_index != 0) + return -1; + + cq_info->cqn = mcq->cqn; + cq_info->cqe_cnt = mcq->ibv_cq.cqe + 1; + cq_info->cqe_size = mcq->cqe_sz; + cq_info->buf = mcq->active_buf->buf; + cq_info->dbrec = mcq->dbrec; + + return 0; +} + +EOF +} + +function add_srq_info_struct { +cat <> $output_file +struct ibv_mlx5_srq_info { + void *buf; + uint32_t *dbrec; + unsigned stride; + unsigned head; + unsigned tail; +}; + +EOF +} + +function add_srq_info_func { + add_srq_info_struct +cat <> $output_file +static inline int ibv_mlx5_exp_get_srq_info(struct ibv_srq *srq, struct ibv_mlx5_srq_info *srq_info) +{ + struct mlx5_srq *msrq; + + if (srq->handle == LEGACY_XRC_SRQ_HANDLE) + srq = (struct ibv_srq *)(((struct ibv_srq_legacy *)srq)->ibv_srq); + + msrq = container_of(srq, struct mlx5_srq, vsrq.srq); + + if (msrq->counter != 0) + return -1; + + srq_info->buf = msrq->buf.buf; + srq_info->dbrec = msrq->db; + srq_info->stride = 1 << msrq->wqe_shift; + srq_info->head = msrq->head; + srq_info->tail = msrq->tail; + + return 0; +} + +EOF +} + +function add_cq_ci_func { +cat <> $output_file +static inline void ibv_mlx5_exp_update_cq_ci(struct ibv_cq *cq, unsigned cq_ci) +{ + struct mlx5_cq *mcq = to_mcq(cq); + + mcq->cons_index = cq_ci; +} +EOF +} + +##MAIN## + +if [ $# -lt $args ] ; then + echo "Wrong number of arguments!" + echo -e "\n" + echo -e "Usage: $usage" + echo -e "\n" + echo "Example: $example" + echo -e "\n" + echo "Output: $script_output" + echo -e "\n\n" + echo -e "For help please contact $author \nExiting..." + exit 1 +fi + +add_header +expose_defines_res=$(expose_defines) +if [ $expose_defines_res -ne 0 ] ; then + echo "expose_defines: Failed!" + echo "Exiting..." + rm -f $output_file + exit 1 +fi +expose_enums_res=$(expose_enums) +if [ $expose_enums_res -ne 0 ] ; then + echo "expose_enums: Failed!" + echo "Exiting..." + rm -f $output_file + exit 1 +fi +expose_structs_res=$(expose_structs) +if [ $expose_structs_res -ne 0 ] ; then + echo "expose_structs: Failed!" + echo "Exiting..." 
+ rm -f $output_file + exit 1 +fi + +add_aux_funcs +add_qp_info_func +add_cq_info_func +add_srq_info_func +add_cq_ci_func + +add_footer + +exit 0 Index: contrib/ofed/libmlx5/scripts/expose_libmlx5_headers/structures.txt =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/scripts/expose_libmlx5_headers/structures.txt @@ -0,0 +1,43 @@ +mlx5_resource +mlx5_wqe_srq_next_seg +mlx5_wqe_data_seg +mlx5_eqe_comp +mlx5_eqe_qp_srq +mlx5_wqe_ctrl_seg +mlx5_wqe_xrc_seg +mlx5_wqe_masked_atomic_seg +mlx5_base_av +mlx5_grh_av +mlx5_wqe_av +mlx5_wqe_datagram_seg +mlx5_wqe_raddr_seg +mlx5_wqe_atomic_seg +mlx5_wqe_inl_data_seg +mlx5_wqe_umr_ctrl_seg +mlx5_seg_set_psv +mlx5_seg_get_psv +mlx5_seg_check_psv +mlx5_rwqe_sig +mlx5_wqe_signature_seg +mlx5_wqe_inline_seg +mlx5_wqe_wait_en_seg +mlx5_err_cqe +mlx5_cqe64 +mlx5_spinlock +mlx5_lock +mlx5_numa_req +mlx5_buf +general_data_hot +data_seg_data +ctrl_seg_data +mpw_data +general_data_warm +odp_data +mlx5_wq_recv_send_enable +mlx5_cq +mlx5_srq +mlx5_wq +mlx5_bf +mlx5_qp +mlx5_ah +mlx5_mini_cqe8 Index: contrib/ofed/libmlx5/src/.gitignore =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/.gitignore @@ -0,0 +1,3 @@ +*.la +.dirstamp +.libs Index: contrib/ofed/libmlx5/src/bitmap.h =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/bitmap.h @@ -0,0 +1,111 @@ +/* + * Copyright (c) 2000, 2011 Mellanox Technology Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef BITMAP_H +#define BITMAP_H + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "mlx5.h" + +/* Only ia64 requires this */ +#ifdef __ia64__ +#define MLX5_SHM_ADDR ((void *)0x8000000000000000UL) +#define MLX5_SHMAT_FLAGS (SHM_RND) +#else +#define MLX5_SHM_ADDR NULL +#define MLX5_SHMAT_FLAGS 0 +#endif + +#define BITS_PER_LONG (8 * sizeof(long)) +#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_LONG) + +#ifndef HPAGE_SIZE +#define HPAGE_SIZE (2UL * 1024 * 1024) +#endif + +#define MLX5_SHM_LENGTH HPAGE_SIZE +#define MLX5_Q_CHUNK_SIZE 32768 +#define MLX5_SHM_NUM_REGION 64 + +static inline unsigned long mlx5_ffz(uint32_t word) +{ + return __builtin_ffs(~word) - 1; +} + +static inline uint32_t mlx5_find_first_zero_bit(const unsigned long *addr, + uint32_t size) +{ + const unsigned long *p = addr; + uint32_t result = 0; + unsigned long tmp; + + while (size & ~(BITS_PER_LONG - 1)) { + tmp = *(p++); + if (~tmp) + goto found; + result += BITS_PER_LONG; + size -= BITS_PER_LONG; + } + if (!size) + return result; + + tmp = (*p) | (~0UL << size); + if (tmp == (uint32_t)~0UL) /* Are any bits zero? */ + return result + size; /* Nope. */ +found: + return result + mlx5_ffz(tmp); +} + +static inline void mlx5_set_bit(unsigned int nr, unsigned long *addr) +{ + addr[(nr / BITS_PER_LONG)] |= (1 << (nr % BITS_PER_LONG)); +} + +static inline void mlx5_clear_bit(unsigned int nr, unsigned long *addr) +{ + addr[(nr / BITS_PER_LONG)] &= ~(1 << (nr % BITS_PER_LONG)); +} + +static inline int mlx5_test_bit(unsigned int nr, const unsigned long *addr) +{ + return !!(addr[(nr / BITS_PER_LONG)] & (1 << (nr % BITS_PER_LONG))); +} + +#endif Index: contrib/ofed/libmlx5/src/buf.c =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/buf.c @@ -0,0 +1,688 @@ +/* + * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include +#include + +#ifdef HAVE_NUMA +#include +#endif + +#include "mlx5.h" +#include "bitmap.h" + +#if !(defined(HAVE_IBV_DONTFORK_RANGE) && defined(HAVE_IBV_DOFORK_RANGE)) + +/* + * If libibverbs isn't exporting these functions, then there's no + * point in doing it here, because the rest of libibverbs isn't going + * to be fork-safe anyway. + */ +static int ibv_dontfork_range(void *base, size_t size) +{ + return 0; +} + +static int ibv_dofork_range(void *base, size_t size) +{ + return 0; +} + +#endif /* HAVE_IBV_DONTFORK_RANGE && HAVE_IBV_DOFORK_RANGE */ + +static int mlx5_bitmap_init(struct mlx5_bitmap *bitmap, uint32_t num, + uint32_t mask) +{ + bitmap->last = 0; + bitmap->top = 0; + bitmap->max = num; + bitmap->avail = num; + bitmap->mask = mask; + bitmap->avail = bitmap->max; + bitmap->table = calloc(BITS_TO_LONGS(bitmap->max), sizeof(uint32_t)); + if (!bitmap->table) + return -ENOMEM; + + return 0; +} + +static void bitmap_free_range(struct mlx5_bitmap *bitmap, uint32_t obj, + int cnt) +{ + int i; + + obj &= bitmap->max - 1; + + for (i = 0; i < cnt; i++) + mlx5_clear_bit(obj + i, bitmap->table); + bitmap->last = min(bitmap->last, obj); + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + bitmap->avail += cnt; +} + +static int bitmap_empty(struct mlx5_bitmap *bitmap) +{ + return (bitmap->avail == bitmap->max) ? 1 : 0; +} + +static int bitmap_avail(struct mlx5_bitmap *bitmap) +{ + return bitmap->avail; +} + +static void mlx5_bitmap_cleanup(struct mlx5_bitmap *bitmap) +{ + if (bitmap->table) + free(bitmap->table); +} + +static void free_huge_mem(struct mlx5_hugetlb_mem *hmem) +{ + mlx5_bitmap_cleanup(&hmem->bitmap); + if (shmdt(hmem->shmaddr) == -1) + mlx5_dbg(stderr, MLX5_DBG_CONTIG, "%s\n", strerror(errno)); + shmctl(hmem->shmid, IPC_RMID, NULL); + free(hmem); +} + +static int mlx5_bitmap_alloc(struct mlx5_bitmap *bitmap) +{ + uint32_t obj; + int ret; + + obj = mlx5_find_first_zero_bit(bitmap->table, bitmap->max); + if (obj < bitmap->max) { + mlx5_set_bit(obj, bitmap->table); + bitmap->last = (obj + 1); + if (bitmap->last == bitmap->max) + bitmap->last = 0; + obj |= bitmap->top; + ret = obj; + } else + ret = -1; + + if (ret != -1) + --bitmap->avail; + + return ret; +} + +static uint32_t find_aligned_range(unsigned long *bitmap, + uint32_t start, uint32_t nbits, + int len, int alignment) +{ + uint32_t end, i; + +again: + start = align(start, alignment); + + while ((start < nbits) && mlx5_test_bit(start, bitmap)) + start += alignment; + + if (start >= nbits) + return -1; + + end = start + len; + if (end > nbits) + return -1; + + for (i = start + 1; i < end; i++) { + if (mlx5_test_bit(i, bitmap)) { + start = i + 1; + goto again; + } + } + + return start; +} + +static int bitmap_alloc_range(struct mlx5_bitmap *bitmap, int cnt, + int align) +{ + uint32_t obj; + int ret, i; + + if (cnt == 1 && align == 1) + return mlx5_bitmap_alloc(bitmap); + + if (cnt > bitmap->max) + return -1; + + obj = find_aligned_range(bitmap->table, bitmap->last, + bitmap->max, cnt, align); + if (obj >= bitmap->max) { + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + obj = find_aligned_range(bitmap->table, 0, bitmap->max, + cnt, align); + } + + if (obj < bitmap->max) { + for (i = 0; i < cnt; i++) + mlx5_set_bit(obj + i, bitmap->table); + if (obj == bitmap->last) { + bitmap->last = (obj + cnt); + if (bitmap->last >= bitmap->max) + bitmap->last = 0; + } + obj |= bitmap->top; + 
ret = obj; + } else + ret = -1; + + if (ret != -1) + bitmap->avail -= cnt; + + return obj; +} + +#ifndef SHM_HUGETLB +#define SHM_HUGETLB 0 +#endif + +static struct mlx5_hugetlb_mem *alloc_huge_mem(size_t size) +{ + struct mlx5_hugetlb_mem *hmem; + size_t shm_len; + + hmem = malloc(sizeof(*hmem)); + if (!hmem) + return NULL; + + shm_len = align(size, MLX5_SHM_LENGTH); + hmem->shmid = shmget(IPC_PRIVATE, shm_len, SHM_HUGETLB | SHM_R | SHM_W); + if (hmem->shmid == -1) { + mlx5_dbg(stderr, MLX5_DBG_CONTIG, "%s\n", strerror(errno)); + goto out_free; + } + + hmem->shmaddr = shmat(hmem->shmid, MLX5_SHM_ADDR, MLX5_SHMAT_FLAGS); + if (hmem->shmaddr == (void *)-1) { + mlx5_dbg(stderr, MLX5_DBG_CONTIG, "%s\n", strerror(errno)); + goto out_rmid; + } + + if (mlx5_bitmap_init(&hmem->bitmap, shm_len / MLX5_Q_CHUNK_SIZE, + shm_len / MLX5_Q_CHUNK_SIZE - 1)) { + mlx5_dbg(stderr, MLX5_DBG_CONTIG, "%s\n", strerror(errno)); + goto out_shmdt; + } + + /* + * Marked to be destroyed when process detaches from shmget segment + */ + shmctl(hmem->shmid, IPC_RMID, NULL); + + return hmem; + +out_shmdt: + if (shmdt(hmem->shmaddr) == -1) + mlx5_dbg(stderr, MLX5_DBG_CONTIG, "%s\n", strerror(errno)); + +out_rmid: + shmctl(hmem->shmid, IPC_RMID, NULL); + +out_free: + free(hmem); + return NULL; +} + +static int alloc_huge_buf(struct mlx5_context *mctx, struct mlx5_buf *buf, + size_t size, int page_size) +{ + int found = 0; + LIST_HEAD(slist); + int nchunk; + struct mlx5_hugetlb_mem *hmem; + int ret; + + buf->length = align(size, MLX5_Q_CHUNK_SIZE); + nchunk = buf->length / MLX5_Q_CHUNK_SIZE; + + mlx5_spin_lock(&mctx->hugetlb_lock); + list_for_each_entry(hmem, &mctx->hugetlb_list, list) { + if (bitmap_avail(&hmem->bitmap)) { + buf->base = bitmap_alloc_range(&hmem->bitmap, nchunk, 1); + if (buf->base != -1) { + buf->hmem = hmem; + found = 1; + break; + } + } + } + mlx5_spin_unlock(&mctx->hugetlb_lock); + + if (!found) { + hmem = alloc_huge_mem(buf->length); + if (!hmem) + return -1; + + buf->base = bitmap_alloc_range(&hmem->bitmap, nchunk, 1); + if (buf->base == -1) { + free_huge_mem(hmem); + /* TBD: remove after proven stability */ + fprintf(stderr, "BUG: huge allocation\n"); + return -1; + } + + buf->hmem = hmem; + + mlx5_spin_lock(&mctx->hugetlb_lock); + if (bitmap_avail(&hmem->bitmap)) + list_add(&hmem->list, &mctx->hugetlb_list); + else + list_add_tail(&hmem->list, &mctx->hugetlb_list); + mlx5_spin_unlock(&mctx->hugetlb_lock); + } + + buf->buf = hmem->shmaddr + buf->base * MLX5_Q_CHUNK_SIZE; + + ret = ibv_dontfork_range(buf->buf, buf->length); + if (ret) { + mlx5_dbg(stderr, MLX5_DBG_CONTIG, "\n"); + goto out_fork; + } + buf->type = MLX5_ALLOC_TYPE_HUGE; + + return 0; + +out_fork: + mlx5_spin_lock(&mctx->hugetlb_lock); + bitmap_free_range(&hmem->bitmap, buf->base, nchunk); + if (bitmap_empty(&hmem->bitmap)) { + list_del(&hmem->list); + mlx5_spin_unlock(&mctx->hugetlb_lock); + free_huge_mem(hmem); + } else + mlx5_spin_unlock(&mctx->hugetlb_lock); + + return -1; +} + +static void free_huge_buf(struct mlx5_context *ctx, struct mlx5_buf *buf) +{ + int nchunk; + + nchunk = buf->length / MLX5_Q_CHUNK_SIZE; + mlx5_spin_lock(&ctx->hugetlb_lock); + bitmap_free_range(&buf->hmem->bitmap, buf->base, nchunk); + if (bitmap_empty(&buf->hmem->bitmap)) { + list_del(&buf->hmem->list); + mlx5_spin_unlock(&ctx->hugetlb_lock); + free_huge_mem(buf->hmem); + } else + mlx5_spin_unlock(&ctx->hugetlb_lock); +} + +int mlx5_alloc_prefered_buf(struct mlx5_context *mctx, + struct mlx5_buf *buf, + size_t size, int page_size, + enum mlx5_alloc_type 
type, + const char *component) +{ + int ret; + + /* + * Fallback mechanism priority: + * huge pages + * contig pages + * default + */ + if (type == MLX5_ALLOC_TYPE_HUGE || + type == MLX5_ALLOC_TYPE_PREFER_HUGE || + type == MLX5_ALLOC_TYPE_ALL) { + ret = alloc_huge_buf(mctx, buf, size, page_size); + if (!ret) + return 0; + + if (type == MLX5_ALLOC_TYPE_HUGE) + return -1; + + mlx5_dbg(stderr, MLX5_DBG_CONTIG, + "Huge mode allocation failed, fallback to %s mode\n", + MLX5_ALLOC_TYPE_ALL ? "contig" : "default"); + } + + if (type == MLX5_ALLOC_TYPE_CONTIG || + type == MLX5_ALLOC_TYPE_PREFER_CONTIG || + type == MLX5_ALLOC_TYPE_ALL) { + ret = mlx5_alloc_buf_contig(mctx, buf, size, page_size, component, NULL); + if (!ret) + return 0; + + if (type == MLX5_ALLOC_TYPE_CONTIG) + return -1; + mlx5_dbg(stderr, MLX5_DBG_CONTIG, + "Contig allocation failed, fallback to default mode\n"); + } + + return mlx5_alloc_buf(buf, size, page_size); +} + +int mlx5_free_actual_buf(struct mlx5_context *ctx, struct mlx5_buf *buf) +{ + int err = 0; + + switch (buf->type) { + case MLX5_ALLOC_TYPE_ANON: + mlx5_free_buf(buf); + break; + + case MLX5_ALLOC_TYPE_HUGE: + free_huge_buf(ctx, buf); + break; + + case MLX5_ALLOC_TYPE_CONTIG: + mlx5_free_buf_contig(ctx, buf); + break; + default: + fprintf(stderr, "Bad allocation type\n"); + } + + return err; +} + +/* This function computes log2(v) rounded up. + We don't want to have a dependency to libm which exposes ceil & log2 APIs. + Code was written based on public domain code: + URL: http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog. +*/ +static uint32_t mlx5_get_block_order(uint32_t v) +{ + static const uint32_t bits_arr[] = {0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000}; + static const uint32_t shift_arr[] = {1, 2, 4, 8, 16}; + int i; + uint32_t input_val = v; + + register uint32_t r = 0;/* result of log2(v) will go here */ + for (i = 4; i >= 0; i--) { + if (v & bits_arr[i]) { + v >>= shift_arr[i]; + r |= shift_arr[i]; + } + } + /* Rounding up if required */ + r += !!(input_val & ((1 << r) - 1)); + + return r; +} + +void mlx5_get_alloc_type(struct ibv_context *context, + const char *component, + enum mlx5_alloc_type *alloc_type, + enum mlx5_alloc_type default_type) + +{ + char env_value[VERBS_MAX_ENV_VAL]; + char name[128]; + + snprintf(name, sizeof(name), "%s_ALLOC_TYPE", component); + + *alloc_type = default_type; + + if (!ibv_exp_cmd_getenv(context, name, env_value, sizeof(env_value))) { + if (!strcasecmp(env_value, "ANON")) + *alloc_type = MLX5_ALLOC_TYPE_ANON; + else if (!strcasecmp(env_value, "HUGE")) + *alloc_type = MLX5_ALLOC_TYPE_HUGE; + else if (!strcasecmp(env_value, "CONTIG")) + *alloc_type = MLX5_ALLOC_TYPE_CONTIG; + else if (!strcasecmp(env_value, "PREFER_CONTIG")) + *alloc_type = MLX5_ALLOC_TYPE_PREFER_CONTIG; + else if (!strcasecmp(env_value, "PREFER_HUGE")) + *alloc_type = MLX5_ALLOC_TYPE_PREFER_HUGE; + else if (!strcasecmp(env_value, "ALL")) + *alloc_type = MLX5_ALLOC_TYPE_ALL; + } +} + +static void mlx5_alloc_get_env_info(struct ibv_context *context, + int *max_block_log, + int *min_block_log, + const char *component) + +{ + char env[VERBS_MAX_ENV_VAL]; + int value; + char name[128]; + + /* First set defaults */ + *max_block_log = MLX5_MAX_LOG2_CONTIG_BLOCK_SIZE; + *min_block_log = MLX5_MIN_LOG2_CONTIG_BLOCK_SIZE; + + snprintf(name, sizeof(name), "%s_MAX_LOG2_CONTIG_BSIZE", component); + if (!ibv_exp_cmd_getenv(context, name, env, sizeof(env))) { + value = atoi(env); + if (value <= MLX5_MAX_LOG2_CONTIG_BLOCK_SIZE && + value >= 
MLX5_MIN_LOG2_CONTIG_BLOCK_SIZE) + *max_block_log = value; + else + fprintf(stderr, "Invalid value %d for %s\n", + value, name); + } + sprintf(name, "%s_MIN_LOG2_CONTIG_BSIZE", component); + if (!ibv_exp_cmd_getenv(context, name, env, sizeof(env))) { + value = atoi(env); + if (value >= MLX5_MIN_LOG2_CONTIG_BLOCK_SIZE && + value <= *max_block_log) + *min_block_log = value; + else + fprintf(stderr, "Invalid value %d for %s\n", + value, name); + } +} + +int mlx5_alloc_buf_contig(struct mlx5_context *mctx, + struct mlx5_buf *buf, size_t size, + int page_size, + const char *component, void *req_addr) +{ + void *addr = MAP_FAILED; + int block_size_exp; + int max_block_log; + int min_block_log; + int mmap_flags = MAP_SHARED; + struct ibv_context *context = &mctx->ibv_ctx; + off_t offset; + void *act_addr = NULL; + size_t act_size = size; + + mlx5_alloc_get_env_info(&mctx->ibv_ctx, + &max_block_log, + &min_block_log, + component); + + /* this test guarantees that we don't call mlx5_get_block_order for + sizes above 4G so we don't overflow. It is based on the fact that + max_block_log cannot exceed 23 (MLX5_MAX_LOG2_CONTIG_BLOCK_SIZE) */ + if (size >= (1 << max_block_log)) + block_size_exp = max_block_log; + else + block_size_exp = mlx5_get_block_order(size); + + if (req_addr) { + mmap_flags |= MAP_FIXED; + act_addr = (void *)((uintptr_t)req_addr & ~((uintptr_t)page_size - 1)); + act_size += (size_t)((uintptr_t)req_addr - (uintptr_t)act_addr); + } + + do { + offset = 0; + if (buf->numa_req.valid && (buf->numa_req.numa_id == mctx->numa_id)) + set_command(MLX5_EXP_MMAP_GET_CONTIGUOUS_PAGES_DEV_NUMA_CMD, &offset); + else if (buf->numa_req.valid && (buf->numa_req.numa_id == mlx5_cpu_local_numa())) + set_command(MLX5_EXP_MMAP_GET_CONTIGUOUS_PAGES_CPU_NUMA_CMD, &offset); + else + set_command(MLX5_MMAP_GET_CONTIGUOUS_PAGES_CMD, &offset); + + set_order(block_size_exp, &offset); + addr = mmap(act_addr, act_size, PROT_WRITE | PROT_READ, mmap_flags, + context->cmd_fd, page_size * offset); + + /* If CONTIGUOUS_PAGES_DEV_NUMA_CMD fails try CONTIGUOUS_PAGES */ + if (addr == MAP_FAILED && + get_command(&offset) != MLX5_MMAP_GET_CONTIGUOUS_PAGES_CMD) { + reset_command(&offset); + set_command(MLX5_MMAP_GET_CONTIGUOUS_PAGES_CMD, &offset); + addr = mmap(act_addr, act_size, PROT_WRITE | PROT_READ, mmap_flags, + context->cmd_fd, page_size * offset); + } + if (addr != MAP_FAILED) + break; + + /* + * The kernel returns EINVAL if not supported + */ + if (errno == EINVAL) + return -1; + + block_size_exp -= 1; + } while (block_size_exp >= min_block_log); + mlx5_dbg(mctx->dbg_fp, MLX5_DBG_CONTIG, "block order %d, addr %p\n", + block_size_exp, addr); + + if (addr == MAP_FAILED) + return -1; + + if (ibv_dontfork_range(addr, act_size)) { + munmap(addr, act_size); + return -1; + } + + buf->buf = addr; + buf->length = act_size; + buf->type = MLX5_ALLOC_TYPE_CONTIG; + + return 0; +} + +void mlx5_free_buf_contig(struct mlx5_context *mctx, struct mlx5_buf *buf) +{ + ibv_dofork_range(buf->buf, buf->length); + munmap(buf->buf, buf->length); +} + +#ifdef HAVE_NUMA +int mlx5_cpu_local_numa(void) +{ + if (numa_available() == -1) + return -1; + + return numa_node_of_cpu(sched_getcpu()); +} + +static void *mlx5_alloc_numa(size_t size, int numa) +{ + void *ptr; + + if (numa < 0 || numa_available() == -1) + return NULL; + + numa_set_strict(1); + ptr = numa_alloc_onnode(size, numa); + if (ptr) + numa_tonode_memory(ptr, size, numa); + + return ptr; +} + +static void mlx5_free_numa(void *ptr, size_t size) +{ + numa_free(ptr, size); +} +#else 
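+/* Without libnuma (HAVE_NUMA undefined) these stubs report no NUMA support, so mlx5_alloc_buf() falls back to posix_memalign(). */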
+int mlx5_cpu_local_numa(void) +{ + return -1; +} + +static void *mlx5_alloc_numa(size_t size, int numa) +{ + return NULL; +} + +static void mlx5_free_numa(void *ptr, size_t size) +{ +} +#endif + +int mlx5_alloc_buf(struct mlx5_buf *buf, size_t size, int page_size) +{ + int ret; + size_t al_size; + + al_size = align(size, page_size); + + buf->buf = NULL; + if (buf->numa_req.valid) + buf->buf = mlx5_alloc_numa(al_size, buf->numa_req.numa_id); + if (buf->buf) { + buf->numa_alloc = 1; + } else { + buf->numa_alloc = 0; + ret = posix_memalign(&buf->buf, page_size, al_size); + if (ret) + return ret; + } + + ret = ibv_dontfork_range(buf->buf, al_size); + if (ret) { + if (buf->numa_alloc) + mlx5_free_numa(buf->buf, al_size); + else + free(buf->buf); + } + + if (!ret) { + buf->length = al_size; + buf->type = MLX5_ALLOC_TYPE_ANON; + } + + return ret; +} + +void mlx5_free_buf(struct mlx5_buf *buf) +{ + ibv_dofork_range(buf->buf, buf->length); + if (buf->numa_alloc) + mlx5_free_numa(buf->buf, buf->length); + else + free(buf->buf); +} Index: contrib/ofed/libmlx5/src/cq.c =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/cq.c @@ -0,0 +1,1657 @@ +/* + * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "mlx5.h" +#include "wqe.h" +#include "doorbell.h" + +enum { + MLX5_CQ_DOORBELL = 0x20 +}; + +enum { + CQ_OK = 0, + CQ_EMPTY = -1, + CQ_POLL_ERR = -2 +}; + +#define MLX5_CQ_DB_REQ_NOT_SOL (1 << 24) +#define MLX5_CQ_DB_REQ_NOT (0 << 24) + +enum { + MLX5_CQE_SYNDROME_LOCAL_LENGTH_ERR = 0x01, + MLX5_CQE_SYNDROME_LOCAL_QP_OP_ERR = 0x02, + MLX5_CQE_SYNDROME_LOCAL_PROT_ERR = 0x04, + MLX5_CQE_SYNDROME_WR_FLUSH_ERR = 0x05, + MLX5_CQE_SYNDROME_MW_BIND_ERR = 0x06, + MLX5_CQE_SYNDROME_BAD_RESP_ERR = 0x10, + MLX5_CQE_SYNDROME_LOCAL_ACCESS_ERR = 0x11, + MLX5_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR = 0x12, + MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR = 0x13, + MLX5_CQE_SYNDROME_REMOTE_OP_ERR = 0x14, + MLX5_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR = 0x15, + MLX5_CQE_SYNDROME_RNR_RETRY_EXC_ERR = 0x16, + MLX5_CQE_SYNDROME_REMOTE_ABORTED_ERR = 0x22, +}; + +enum { + MLX5_CQE_OWNER_MASK = 1, + MLX5_CQE_REQ = 0, + MLX5_CQE_RESP_WR_IMM = 1, + MLX5_CQE_RESP_SEND = 2, + MLX5_CQE_RESP_SEND_IMM = 3, + MLX5_CQE_RESP_SEND_INV = 4, + MLX5_CQE_RESIZE_CQ = 5, + MLX5_CQE_SIG_ERR = 12, + MLX5_CQE_REQ_ERR = 13, + MLX5_CQE_RESP_ERR = 14, + MLX5_CQE_INVALID = 15, +}; + +enum { + MLX5_CQ_MODIFY_RESEIZE = 0, + MLX5_CQ_MODIFY_MODER = 1, + MLX5_CQ_MODIFY_MAPPING = 2, +}; + +enum { + MLX5_NO_INLINE_DATA = 0x0, + MLX5_INLINE_DATA32_SEG = 0x1, + MLX5_INLINE_DATA64_SEG = 0x2, + MLX5_COMPRESSED = 0x3, +}; + +enum { + MLX5_CQE_L2_OK = 1 << 0, + MLX5_CQE_L3_OK = 1 << 1, + MLX5_CQE_L4_OK = 1 << 2, +}; + +enum { + MLX5_CQE_L3_HDR_TYPE_NONE = 0x0, + MLX5_CQE_L3_HDR_TYPE_IPV6 = 0x1, + MLX5_CQE_L3_HDR_TYPE_IPV4 = 0x2, +}; + +enum { + /* Masks to handle the CQE byte_count field in case of MP RQ */ + MP_RQ_BYTE_CNT_FIELD_MASK = 0x0000FFFF, + MP_RQ_NUM_STRIDES_FIELD_MASK = 0x7FFF0000, + MP_RQ_FILLER_FIELD_MASK = 0x80000000, + MP_RQ_NUM_STRIDES_FIELD_SHIFT = 16, +}; + +struct mlx5_err_cqe { + uint8_t rsvd0[32]; + uint32_t srqn; + uint8_t rsvd1[16]; + uint8_t hw_err_synd; + uint8_t hw_synd_type; + uint8_t vendor_err_synd; + uint8_t syndrome; + uint32_t s_wqe_opcode_qpn; + uint16_t wqe_counter; + uint8_t signature; + uint8_t op_own; +}; + +struct mlx5_mini_cqe8 { + union { + uint32_t rx_hash_result; + uint32_t checksum; + struct { + uint16_t wqe_counter; + uint8_t s_wqe_opcode; + uint8_t reserved; + } s_wqe_info; + }; + uint32_t byte_cnt; +}; + +struct mlx5_cqe64 { + uint8_t rsvd0[2]; + /* + * wqe_id is valid only for Striding RQ (Multi-Packet RQ). + * It provides the WQE index inside the RQ. 
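+	 * A single striding WQE can absorb several packets, each consuming one or more strides.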
+ */ + uint16_t wqe_id; + uint8_t rsvd4[8]; + uint32_t rx_hash_res; + uint8_t rx_hash_type; + uint8_t ml_path; + uint8_t rsvd20[2]; + uint16_t checksum; + uint16_t slid; + uint32_t flags_rqpn; + uint8_t hds_ip_ext; + uint8_t l4_hdr_type_etc; + __be16 vlan_info; + uint32_t srqn_uidx; + uint32_t imm_inval_pkey; + uint8_t rsvd40[4]; + uint32_t byte_cnt; + __be64 timestamp; + union { + uint32_t sop_drop_qpn; + struct { + uint8_t sop; + uint8_t qpn[3]; + } sop_qpn; + }; + /* + * In Striding RQ (Multi-Packet RQ) wqe_counter provides + * the WQE stride index (to calc pointer to start of the message) + */ + uint16_t wqe_counter; + uint8_t signature; + uint8_t op_own; +}; + +int mlx5_stall_num_loop = 60; +int mlx5_stall_cq_poll_min = 60; +int mlx5_stall_cq_poll_max = 100000; +int mlx5_stall_cq_inc_step = 100; +int mlx5_stall_cq_dec_step = 10; + +#define MLX5E_CQE_FORMAT_MASK 0xc +static inline int mlx5_get_cqe_format(struct mlx5_cqe64 *cqe) +{ + return (cqe->op_own & MLX5E_CQE_FORMAT_MASK) >> 2; +} + +static inline uint8_t get_cqe_l3_hdr_type(struct mlx5_cqe64 *cqe) +{ + return (cqe->l4_hdr_type_etc >> 2) & 0x3; +} + +static void *get_buf_cqe(struct mlx5_buf *buf, int n, int cqe_sz) +{ + return buf->buf + n * cqe_sz; +} + +static void *get_cqe(struct mlx5_cq *cq, int n) +{ + return cq->active_buf->buf + n * cq->cqe_sz; +} + +static inline void *get_sw_cqe(struct mlx5_cq *cq, int n) __attribute__((always_inline)); +static inline void *get_sw_cqe(struct mlx5_cq *cq, int n) +{ + void *cqe = get_cqe(cq, n & cq->ibv_cq.cqe); + struct mlx5_cqe64 *cqe64; + + cqe64 = (cq->cqe_sz == 64) ? cqe : cqe + 64; + + if (likely((cqe64->op_own) >> 4 != MLX5_CQE_INVALID) && + !((cqe64->op_own & MLX5_CQE_OWNER_MASK) ^ !!(n & (cq->ibv_cq.cqe + 1)))) { + return cqe; + } else { + return NULL; + } +} + +static inline struct mlx5_cqe64 *get_next_cqe(struct mlx5_cq *cq, const int cqe_sz) +{ + unsigned idx = cq->cons_index & cq->ibv_cq.cqe; + void *cqe = cq->active_buf->buf + idx * cqe_sz; + struct mlx5_cqe64 *cqe64; + + cqe64 = (cqe_sz == 64) ? 
cqe : cqe + 64; + + if (likely((cqe64->op_own) >> 4 != MLX5_CQE_INVALID) && + !((cqe64->op_own & MLX5_CQE_OWNER_MASK) ^ !!(cq->cons_index & (cq->ibv_cq.cqe + 1)))) { + return cqe64; + } + + return NULL; +} + +static struct mlx5_cqe64 *next_cqe_sw(struct mlx5_cq *cq) +{ + return get_next_cqe(cq, cq->cqe_sz); +} + +static void handle_good_req(struct ibv_wc *wc, struct mlx5_cqe64 *cqe) +{ + switch (ntohl(cqe->sop_drop_qpn) >> 24) { + case MLX5_OPCODE_RDMA_WRITE_IMM: + wc->wc_flags |= IBV_WC_WITH_IMM; + case MLX5_OPCODE_RDMA_WRITE: + wc->opcode = IBV_WC_RDMA_WRITE; + break; + case MLX5_OPCODE_SEND_IMM: + wc->wc_flags |= IBV_WC_WITH_IMM; + case MLX5_OPCODE_SEND: + case MLX5_OPCODE_SEND_INVAL: + wc->opcode = IBV_WC_SEND; + break; + case MLX5_OPCODE_RDMA_READ: + wc->opcode = IBV_WC_RDMA_READ; + wc->byte_len = ntohl(cqe->byte_cnt); + break; + case MLX5_OPCODE_ATOMIC_CS: + wc->opcode = IBV_WC_COMP_SWAP; + wc->byte_len = 8; + break; + case MLX5_OPCODE_ATOMIC_FA: + wc->opcode = IBV_WC_FETCH_ADD; + wc->byte_len = 8; + break; + case MLX5_OPCODE_BIND_MW: + wc->opcode = IBV_WC_BIND_MW; + break; + case MLX5_OPCODE_UMR: + wc->opcode = IBV_EXP_WC_UMR; + break; + + case MLX5_OPCODE_ATOMIC_MASKED_CS: + wc->opcode = IBV_EXP_WC_MASKED_COMP_SWAP; + break; + + case MLX5_OPCODE_ATOMIC_MASKED_FA: + wc->opcode = IBV_EXP_WC_MASKED_FETCH_ADD; + break; + } +} + +static int handle_responder(struct ibv_wc *wc, struct mlx5_cqe64 *cqe, + struct mlx5_qp *qp, struct mlx5_srq *srq, + enum mlx5_rsc_type type) +{ + uint16_t wqe_ctr; + struct mlx5_wq *wq; + uint8_t g; + int err = 0; + int cqe_format = mlx5_get_cqe_format(cqe); + + wc->byte_len = ntohl(cqe->byte_cnt); + if (srq) { + wqe_ctr = ntohs(cqe->wqe_counter); + wc->wr_id = srq->wrid[wqe_ctr]; + mlx5_free_srq_wqe(srq, wqe_ctr); + if (cqe_format == MLX5_INLINE_DATA32_SEG) + err = mlx5_copy_to_recv_srq(srq, wqe_ctr, cqe, + wc->byte_len); + else if (cqe_format == MLX5_INLINE_DATA64_SEG) + err = mlx5_copy_to_recv_srq(srq, wqe_ctr, cqe - 1, + wc->byte_len); + } else { + wq = &qp->rq; + wqe_ctr = wq->tail & (wq->wqe_cnt - 1); + wc->wr_id = wq->wrid[wqe_ctr]; + ++wq->tail; + if (cqe_format == MLX5_INLINE_DATA32_SEG) + err = mlx5_copy_to_recv_wqe(qp, wqe_ctr, cqe, + wc->byte_len); + else if (cqe_format == MLX5_INLINE_DATA64_SEG) + err = mlx5_copy_to_recv_wqe(qp, wqe_ctr, cqe - 1, + wc->byte_len); + } + if (err) + return err; + + wc->byte_len = ntohl(cqe->byte_cnt); + + switch (cqe->op_own >> 4) { + case MLX5_CQE_RESP_WR_IMM: + wc->opcode = IBV_WC_RECV_RDMA_WITH_IMM; + wc->wc_flags |= IBV_WC_WITH_IMM; + wc->imm_data = cqe->imm_inval_pkey; + break; + case MLX5_CQE_RESP_SEND: + wc->opcode = IBV_WC_RECV; + break; + case MLX5_CQE_RESP_SEND_IMM: + wc->opcode = IBV_WC_RECV; + wc->wc_flags |= IBV_WC_WITH_IMM; + wc->imm_data = cqe->imm_inval_pkey; + break; + } + wc->slid = ntohs(cqe->slid); + wc->sl = (ntohl(cqe->flags_rqpn) >> 24) & 0xf; + if (srq && (type != MLX5_RSC_TYPE_DCT) && + ((type == MLX5_RSC_TYPE_INVAL) || (type == MLX5_RSC_TYPE_XSRQ) || + ((qp->verbs_qp.qp.qp_type == IBV_QPT_XRC_RECV) || + (qp->verbs_qp.qp.qp_type == IBV_QPT_XRC)))) + wc->src_qp = srq->srqn; + else + wc->src_qp = ntohl(cqe->flags_rqpn) & 0xffffff; + + + wc->dlid_path_bits = cqe->ml_path & 0x7f; + + if ((qp && qp->verbs_qp.qp.qp_type == IBV_QPT_UD) || + (type == MLX5_RSC_TYPE_DCT)) { + g = (ntohl(cqe->flags_rqpn) >> 28) & 3; + wc->wc_flags |= g ? 
IBV_WC_GRH : 0; + } + + wc->pkey_index = ntohl(cqe->imm_inval_pkey) & 0xffff; + + return IBV_WC_SUCCESS; +} + +static void dump_cqe(FILE *fp, void *buf) +{ + uint32_t *p = buf; + int i; + + for (i = 0; i < 16; i += 4) + fprintf(fp, "%08x %08x %08x %08x\n", ntohl(p[i]), ntohl(p[i + 1]), + ntohl(p[i + 2]), ntohl(p[i + 3])); +} + +static void mlx5_set_bad_wc_opcode(struct ibv_exp_wc *wc, + struct mlx5_err_cqe *cqe, + uint8_t is_req) +{ + if (is_req) { + switch (ntohl(cqe->s_wqe_opcode_qpn) >> 24) { + case MLX5_OPCODE_RDMA_WRITE_IMM: + case MLX5_OPCODE_RDMA_WRITE: + wc->exp_opcode = IBV_EXP_WC_RDMA_WRITE; + break; + case MLX5_OPCODE_SEND_IMM: + case MLX5_OPCODE_SEND: + case MLX5_OPCODE_SEND_INVAL: + wc->exp_opcode = IBV_EXP_WC_SEND; + break; + case MLX5_OPCODE_RDMA_READ: + wc->exp_opcode = IBV_EXP_WC_RDMA_READ; + break; + case MLX5_OPCODE_ATOMIC_CS: + wc->exp_opcode = IBV_EXP_WC_COMP_SWAP; + break; + case MLX5_OPCODE_ATOMIC_FA: + wc->exp_opcode = IBV_EXP_WC_FETCH_ADD; + break; + case MLX5_OPCODE_BIND_MW: + wc->exp_opcode = IBV_EXP_WC_BIND_MW; + break; + case MLX5_OPCODE_UMR: + wc->exp_opcode = IBV_EXP_WC_UMR; + break; + case MLX5_OPCODE_ATOMIC_MASKED_CS: + wc->exp_opcode = IBV_EXP_WC_MASKED_COMP_SWAP; + break; + case MLX5_OPCODE_ATOMIC_MASKED_FA: + wc->exp_opcode = IBV_EXP_WC_MASKED_FETCH_ADD; + break; + } + } else { + switch (cqe->op_own >> 4) { + case MLX5_CQE_RESP_WR_IMM: + wc->exp_opcode = IBV_EXP_WC_RECV_RDMA_WITH_IMM; + break; + case MLX5_CQE_RESP_SEND: + wc->exp_opcode = IBV_EXP_WC_RECV; + break; + case MLX5_CQE_RESP_SEND_IMM: + wc->exp_opcode = IBV_EXP_WC_RECV; + break; + } + } +} + +static void mlx5_handle_error_cqe(struct mlx5_err_cqe *cqe, + struct ibv_exp_wc *wc) +{ + switch (cqe->syndrome) { + case MLX5_CQE_SYNDROME_LOCAL_LENGTH_ERR: + wc->status = IBV_WC_LOC_LEN_ERR; + break; + case MLX5_CQE_SYNDROME_LOCAL_QP_OP_ERR: + wc->status = IBV_WC_LOC_QP_OP_ERR; + break; + case MLX5_CQE_SYNDROME_LOCAL_PROT_ERR: + wc->status = IBV_WC_LOC_PROT_ERR; + break; + case MLX5_CQE_SYNDROME_WR_FLUSH_ERR: + wc->status = IBV_WC_WR_FLUSH_ERR; + break; + case MLX5_CQE_SYNDROME_MW_BIND_ERR: + wc->status = IBV_WC_MW_BIND_ERR; + break; + case MLX5_CQE_SYNDROME_BAD_RESP_ERR: + wc->status = IBV_WC_BAD_RESP_ERR; + break; + case MLX5_CQE_SYNDROME_LOCAL_ACCESS_ERR: + wc->status = IBV_WC_LOC_ACCESS_ERR; + break; + case MLX5_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR: + wc->status = IBV_WC_REM_INV_REQ_ERR; + break; + case MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR: + wc->status = IBV_WC_REM_ACCESS_ERR; + break; + case MLX5_CQE_SYNDROME_REMOTE_OP_ERR: + wc->status = IBV_WC_REM_OP_ERR; + break; + case MLX5_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR: + wc->status = IBV_WC_RETRY_EXC_ERR; + break; + case MLX5_CQE_SYNDROME_RNR_RETRY_EXC_ERR: + wc->status = IBV_WC_RNR_RETRY_EXC_ERR; + break; + case MLX5_CQE_SYNDROME_REMOTE_ABORTED_ERR: + wc->status = IBV_WC_REM_ABORT_ERR; + break; + default: + wc->status = IBV_WC_GENERAL_ERR; + break; + } + + wc->vendor_err = cqe->vendor_err_synd; +} + +#if defined(__x86_64__) || defined (__i386__) +static inline unsigned long get_cycles() +{ + uint32_t low, high; + uint64_t val; + asm volatile ("rdtsc" : "=a" (low), "=d" (high)); + val = high; + val = (val << 32) | low; + return val; +} + +static void mlx5_stall_poll_cq() +{ + int i; + + for (i = 0; i < mlx5_stall_num_loop; i++) + (void)get_cycles(); +} +static void mlx5_stall_cycles_poll_cq(uint64_t cycles) +{ + while (get_cycles() < cycles) + ; /* Nothing */ +} +static void mlx5_get_cycles(uint64_t *cycles) +{ + *cycles = get_cycles(); +} +#else +static 
void mlx5_stall_poll_cq() +{ +} +static void mlx5_stall_cycles_poll_cq(uint64_t cycles) +{ +} +static void mlx5_get_cycles(uint64_t *cycles) +{ +} +#endif + +static int is_requestor(uint8_t opcode) +{ + if (opcode == MLX5_CQE_REQ || opcode == MLX5_CQE_REQ_ERR) + return 1; + else + return 0; +} + +static int is_responder(uint8_t opcode) +{ + switch (opcode) { + case MLX5_CQE_RESP_WR_IMM: + case MLX5_CQE_RESP_SEND: + case MLX5_CQE_RESP_SEND_IMM: + case MLX5_CQE_RESP_SEND_INV: + case MLX5_CQE_RESP_ERR: + return 1; + } + + return 0; +} + +static inline void copy_cqes(struct mlx5_cq *cq, struct mlx5_mini_cqe8 *mini_array, + struct mlx5_cqe64 *title, int cnt, uint16_t *wqe_cnt, int cqe_idx, + const int mp_rq) + __attribute__((always_inline)); +static inline void copy_cqes(struct mlx5_cq *cq, struct mlx5_mini_cqe8 *mini_array, + struct mlx5_cqe64 *title, int cnt, uint16_t *wqe_cnt, int cqe_idx, + const int mp_rq) +{ + struct mlx5_cqe64 *cqe; + int i; + int is_req = is_requestor(title->op_own >> 4); + int log_size = cq->cq_log_size; + uint8_t opown = title->op_own & 0xf2; + + for (i = 0; i < cnt; i++) { + cqe = get_cqe(cq, (cqe_idx + i) & cq->ibv_cq.cqe); + memcpy(cqe, title, sizeof(*title)); + cqe->byte_cnt = mini_array[i].byte_cnt; + cqe->op_own = opown | (((cqe_idx + i) >> log_size) & 1); + if (is_req) { + cqe->wqe_counter = mini_array[i].s_wqe_info.wqe_counter; + cqe->sop_qpn.sop = mini_array[i].s_wqe_info.s_wqe_opcode; + } else { + /* for now we are supporting only rx_hash_res not + * checksum */ + cqe->rx_hash_res = mini_array[i].rx_hash_result; + cqe->wqe_counter = htons(*wqe_cnt); + if (mp_rq) + /* + * In case of mp_rq the wqe_cnt is the stride index of the message start, + * therefore we need to increase it by the number of consumed strides + */ + (*wqe_cnt) += (ntohl(mini_array[i].byte_cnt) & MP_RQ_NUM_STRIDES_FIELD_MASK) >> + MP_RQ_NUM_STRIDES_FIELD_SHIFT; + else + /* + * In case of non mp_rq the wqe_cnt is the sq/rq wqe counter, + * therefore we need to increase it by one + */ + (*wqe_cnt)++; + } + } +} + +static inline struct mlx5_resource *find_rsc(struct mlx5_cq *cq, + struct mlx5_cqe64 *cqe64, + const int cqe_ver) __attribute__((always_inline)); +static inline struct mlx5_resource *find_rsc(struct mlx5_cq *cq, + struct mlx5_cqe64 *cqe64, + const int cqe_ver) +{ + uint32_t srqn_uidx = ntohl(cqe64->srqn_uidx) & 0xffffff; + uint32_t rsn; + + if (cqe_ver) + return mlx5_find_uidx(to_mctx(cq->ibv_cq.context), srqn_uidx); + + rsn = ntohl(cqe64->sop_drop_qpn) & 0xffffff; + + return mlx5_find_rsc(to_mctx(cq->ibv_cq.context), rsn); +} + +static inline void mlx5_decompress_cqe_idx(struct mlx5_cq *cq, uint32_t cqe_idx) + __attribute__((always_inline)); +static inline void mlx5_decompress_cqe_idx(struct mlx5_cq *cq, uint32_t cqe_idx) +{ + struct mlx5_cqe64 *title, *cqe; + struct mlx5_mini_cqe8 mini_array[8]; + int cqe_cnt; + uint16_t wqe_cnt; + struct mlx5_resource *cur_rsc; + int mp_rq; + + cqe = get_cqe(cq, cqe_idx & cq->ibv_cq.cqe); + title = cqe; + memcpy(mini_array, get_cqe(cq, (cqe_idx + 1) & cq->ibv_cq.cqe), sizeof(*title)); + cqe_cnt = ntohl(title->byte_cnt); + wqe_cnt = ntohs(title->wqe_counter); + cur_rsc = find_rsc(cq, title, (to_mctx(cq->ibv_cq.context))->cqe_version); + mp_rq = cur_rsc ? 
cur_rsc->type == MLX5_RSC_TYPE_MP_RWQ : 0; + + for (; cqe_cnt > 7; cqe_idx += 8, cqe_cnt -= 8) { + copy_cqes(cq, mini_array, title, 8, &wqe_cnt, cqe_idx, mp_rq); + cqe = get_cqe(cq, (cqe_idx + 8) & cq->ibv_cq.cqe); + memcpy(mini_array, cqe, sizeof(*title)); + } + + copy_cqes(cq, mini_array, title, cqe_cnt, &wqe_cnt, cqe_idx, mp_rq); +} + +static inline void mlx5_decompress_cqe(struct mlx5_cq *cq) + __attribute__((always_inline)); +static inline void mlx5_decompress_cqe(struct mlx5_cq *cq) +{ + mlx5_decompress_cqe_idx(cq, cq->cons_index); +} + +static inline int mlx5_poll_one(struct mlx5_cq *cq, + struct mlx5_resource **cur_rsc, + struct mlx5_srq **cur_srq, struct ibv_exp_wc *wc, + uint32_t wc_size, + int cqe_ver) __attribute__((always_inline)); +static inline int mlx5_poll_one(struct mlx5_cq *cq, + struct mlx5_resource **cur_rsc, + struct mlx5_srq **cur_srq, + struct ibv_exp_wc *wc, + uint32_t wc_size, + int cqe_ver) +{ + struct mlx5_cqe64 *cqe64; + struct mlx5_wq *wq; + uint16_t wqe_ctr; + void *cqe; + uint32_t rsn; + uint32_t srqn_uidx; + int idx; + uint8_t opcode; + struct mlx5_err_cqe *ecqe; + int err; + int requestor; + int responder; + int is_srq = 0; + struct mlx5_context *mctx = to_mctx(cq->ibv_cq.context); + struct mlx5_qp *mqp = NULL; + struct mlx5_rwq *rwq = NULL; + struct mlx5_dct *mdct; + uint64_t exp_wc_flags = 0; + enum mlx5_rsc_type type = MLX5_RSC_TYPE_INVAL; + int cqe_format; + uint8_t l3_hdr; + int timestamp_en = cq->creation_flags & + MLX5_CQ_CREATION_FLAG_COMPLETION_TIMESTAMP; + + cqe64 = next_cqe_sw(cq); + if (!cqe64) + return CQ_EMPTY; + + cqe_format = mlx5_get_cqe_format(cqe64); + if (unlikely(cqe_format == MLX5_COMPRESSED)) { + mlx5_decompress_cqe(cq); + timestamp_en = 0; + } + + ++cq->cons_index; + + /* + * Make sure we read CQ entry contents after we've checked the + * ownership bit. 
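+	 * The rmb() below provides that read barrier.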
+ */ + rmb(); + +#ifdef MLX5_DEBUG + if (mlx5_debug_mask & MLX5_DBG_CQ_CQE) { + FILE *fp = mctx->dbg_fp; + + mlx5_dbg(fp, MLX5_DBG_CQ_CQE, "dump cqe for cqn 0x%x:\n", cq->cqn); + dump_cqe(fp, cqe64); + } +#endif + + ((struct ibv_wc *)wc)->wc_flags = 0; + opcode = cqe64->op_own >> 4; + requestor = is_requestor(opcode); + responder = is_responder(opcode); + if (unlikely(!requestor && !responder)) + return CQ_POLL_ERR; + + rsn = ntohl(cqe64->sop_drop_qpn) & 0xffffff; + srqn_uidx = ntohl(cqe64->srqn_uidx) & 0xffffff; + if (cqe_ver) { + if (!*cur_rsc || (srqn_uidx != (*cur_rsc)->rsn)) { + *cur_rsc = mlx5_find_uidx(mctx, srqn_uidx); + if (unlikely(!*cur_rsc)) + return CQ_POLL_ERR; + } + } else { + if (responder && srqn_uidx) { + is_srq = 1; + if (!*cur_srq || (srqn_uidx != (*cur_srq)->srqn)) { + *cur_srq = mlx5_find_srq(mctx, srqn_uidx); + if (unlikely(!*cur_srq)) + return CQ_POLL_ERR; + } + } + + if (!*cur_rsc || (rsn != (*cur_rsc)->rsn)) { + *cur_rsc = mlx5_find_rsc(mctx, rsn); + if (unlikely(!*cur_rsc && !srqn_uidx)) + return CQ_POLL_ERR; + } + } + + if (*cur_rsc) { + switch ((*cur_rsc)->type) { + case MLX5_RSC_TYPE_QP: + mqp = (struct mlx5_qp *)*cur_rsc; + if (likely(offsetof(struct ibv_exp_wc, qp) < wc_size)) { + wc->qp = &mqp->verbs_qp.qp; + exp_wc_flags |= IBV_EXP_WC_QP; + } + if (cqe_ver && responder && mqp->verbs_qp.qp.srq) { + *cur_srq = to_msrq(mqp->verbs_qp.qp.srq); + is_srq = 1; + } + break; + case MLX5_RSC_TYPE_DCT: + mdct = (struct mlx5_dct *)*cur_rsc; + is_srq = 1; + if (likely(offsetof(struct ibv_exp_wc, dct) < wc_size)) { + wc->dct = &mdct->ibdct; + exp_wc_flags |= IBV_EXP_WC_DCT; + } + + if (cqe_ver) + *cur_srq = to_msrq(mdct->ibdct.srq); + break; + case MLX5_RSC_TYPE_XSRQ: + *cur_srq = (struct mlx5_srq *)*cur_rsc; + is_srq = 1; + break; + case MLX5_RSC_TYPE_RWQ: + case MLX5_RSC_TYPE_MP_RWQ: + rwq = (struct mlx5_rwq *)*cur_rsc; + break; + default: + return CQ_POLL_ERR; + } + type = (*cur_rsc)->type; + } + + if (is_srq && likely(offsetof(struct ibv_exp_wc, srq) < wc_size)) { + wc->srq = &(*cur_srq)->vsrq.srq; + exp_wc_flags |= IBV_EXP_WC_SRQ; + } + + wc->qp_num = rsn; + + switch (opcode) { + case MLX5_CQE_REQ: + if (unlikely(!mqp)) { + fprintf(stderr, "all requestors are kinds of QPs\n"); + return CQ_POLL_ERR; + } + wq = &mqp->sq; + wqe_ctr = ntohs(cqe64->wqe_counter); + idx = wqe_ctr & (wq->wqe_cnt - 1); + handle_good_req((struct ibv_wc *)wc, cqe64); + if (cqe_format == MLX5_INLINE_DATA32_SEG) { + cqe = (cq->cqe_sz == 64) ? cqe64 : cqe64 - 1; + err = mlx5_copy_to_send_wqe(mqp, wqe_ctr, cqe, + wc->byte_len); + } else if (cqe_format == MLX5_INLINE_DATA64_SEG) { + cqe = cqe64 - 1; + err = mlx5_copy_to_send_wqe(mqp, wqe_ctr, cqe - 1, + wc->byte_len); + } else { + err = 0; + } + + wc->wr_id = wq->wrid[idx]; + wq->tail = mqp->gen_data.wqe_head[idx] + 1; + wc->status = err; + break; + case MLX5_CQE_RESP_WR_IMM: + case MLX5_CQE_RESP_SEND: + case MLX5_CQE_RESP_SEND_IMM: + case MLX5_CQE_RESP_SEND_INV: + wc->status = handle_responder((struct ibv_wc *)wc, cqe64, mqp, + is_srq ? 
*cur_srq : NULL, type); + if (mqp && + (mqp->gen_data.model_flags & MLX5_QP_MODEL_RX_CSUM_IP_OK_IP_NON_TCP_UDP)) { + l3_hdr = get_cqe_l3_hdr_type(cqe64); + exp_wc_flags |= + (!!(cqe64->hds_ip_ext & MLX5_CQE_L4_OK) * + (uint64_t)IBV_EXP_WC_RX_TCP_UDP_CSUM_OK) | + (!!(cqe64->hds_ip_ext & MLX5_CQE_L3_OK) * + (uint64_t)IBV_EXP_WC_RX_IP_CSUM_OK) | + ((l3_hdr == MLX5_CQE_L3_HDR_TYPE_IPV4) * + (uint64_t)IBV_EXP_WC_RX_IPV4_PACKET) | + ((l3_hdr == MLX5_CQE_L3_HDR_TYPE_IPV6) * + (uint64_t)IBV_EXP_WC_RX_IPV6_PACKET); + } + break; + case MLX5_CQE_RESIZE_CQ: + break; + case MLX5_CQE_REQ_ERR: + case MLX5_CQE_RESP_ERR: + ecqe = (struct mlx5_err_cqe *)cqe64; + mlx5_handle_error_cqe(ecqe, wc); + mlx5_set_bad_wc_opcode(wc, ecqe, (opcode == MLX5_CQE_REQ_ERR)); + if (unlikely(ecqe->syndrome != MLX5_CQE_SYNDROME_WR_FLUSH_ERR && + ecqe->syndrome != MLX5_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR)) { + FILE *fp = mctx->dbg_fp; + fprintf(fp, PFX "%s: got completion with error:\n", + mctx->hostname); + dump_cqe(fp, ecqe); + if (mlx5_freeze_on_error_cqe) { + fprintf(fp, PFX "freezing at poll cq..."); + while (1) + sleep(10); + } + } + + if (opcode == MLX5_CQE_REQ_ERR) { + wq = &mqp->sq; + wqe_ctr = ntohs(cqe64->wqe_counter); + idx = wqe_ctr & (wq->wqe_cnt - 1); + wc->wr_id = wq->wrid[idx]; + wq->tail = mqp->gen_data.wqe_head[idx] + 1; + } else { + if (*cur_srq) { + wqe_ctr = ntohs(cqe64->wqe_counter); + wc->wr_id = (*cur_srq)->wrid[wqe_ctr]; + mlx5_free_srq_wqe(*cur_srq, wqe_ctr); + } else { + if (rwq) + wq = &rwq->rq; + else + wq = &mqp->rq; + wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)]; + ++wq->tail; + } + } + break; + } + + if (unlikely(timestamp_en)) { + wc->timestamp = ntohll(cqe64->timestamp); + exp_wc_flags |= IBV_EXP_WC_WITH_TIMESTAMP; + } + + if (likely(offsetof(struct ibv_exp_wc, exp_wc_flags) < wc_size)) + wc->exp_wc_flags = exp_wc_flags | (uint64_t)((struct ibv_wc *)wc)->wc_flags; + + return CQ_OK; +} + +static inline int poll_cq(struct ibv_cq *ibcq, int ne, struct ibv_exp_wc *wc, + uint32_t wc_size, int cqe_ver) __attribute__((always_inline)); +static inline int poll_cq(struct ibv_cq *ibcq, int ne, struct ibv_exp_wc *wc, + uint32_t wc_size, int cqe_ver) +{ + struct mlx5_cq *cq = to_mcq(ibcq); + struct mlx5_resource *rsc = NULL; + struct mlx5_srq *srq = NULL; + int npolled; + int err = CQ_OK; + void *twc; + + if (cq->stall_enable) { + if (cq->stall_adaptive_enable) { + if (cq->stall_last_count) + mlx5_stall_cycles_poll_cq(cq->stall_last_count + cq->stall_cycles); + } else if (cq->stall_next_poll) { + cq->stall_next_poll = 0; + mlx5_stall_poll_cq(); + } + } + + mlx5_lock(&cq->lock); + + for (npolled = 0, twc = wc; npolled < ne; ++npolled, twc += wc_size) { + err = mlx5_poll_one(cq, &rsc, &srq, twc, wc_size, cqe_ver); + if (err != CQ_OK) + break; + } + + mlx5_update_cons_index(cq); + + mlx5_unlock(&cq->lock); + + if (cq->stall_enable) { + if (cq->stall_adaptive_enable) { + if (npolled == 0) { + cq->stall_cycles = max(cq->stall_cycles-mlx5_stall_cq_dec_step, + mlx5_stall_cq_poll_min); + mlx5_get_cycles(&cq->stall_last_count); + } else if (npolled < ne) { + cq->stall_cycles = min(cq->stall_cycles+mlx5_stall_cq_inc_step, + mlx5_stall_cq_poll_max); + mlx5_get_cycles(&cq->stall_last_count); + } else { + cq->stall_cycles = max(cq->stall_cycles-mlx5_stall_cq_dec_step, + mlx5_stall_cq_poll_min); + cq->stall_last_count = 0; + } + } else if (err == CQ_EMPTY) { + cq->stall_next_poll = 1; + } + } + + return err == CQ_POLL_ERR ? 
err : npolled; +} + +int mlx5_poll_cq(struct ibv_cq *ibcq, int ne, struct ibv_wc *wc) +{ + return poll_cq(ibcq, ne, (struct ibv_exp_wc *)wc, sizeof(*wc), 0); +} + +int mlx5_poll_cq_1(struct ibv_cq *ibcq, int ne, struct ibv_wc *wc) +{ + return poll_cq(ibcq, ne, (struct ibv_exp_wc *)wc, sizeof(*wc), 1); +} + +int mlx5_poll_cq_ex(struct ibv_cq *ibcq, int ne, + struct ibv_exp_wc *wc, uint32_t wc_size) +{ + return poll_cq(ibcq, ne, wc, wc_size, 0); +} + +int mlx5_poll_cq_ex_1(struct ibv_cq *ibcq, int ne, + struct ibv_exp_wc *wc, uint32_t wc_size) +{ + return poll_cq(ibcq, ne, wc, wc_size, 1); +} + +int mlx5_arm_cq(struct ibv_cq *ibvcq, int solicited) +{ + struct mlx5_cq *cq = to_mcq(ibvcq); + struct mlx5_context *ctx = to_mctx(ibvcq->context); + uint32_t doorbell[2]; + uint32_t sn; + uint32_t ci; + uint32_t cmd; + + sn = cq->arm_sn & 3; + ci = cq->cons_index & 0xffffff; + cmd = solicited ? MLX5_CQ_DB_REQ_NOT_SOL : MLX5_CQ_DB_REQ_NOT; + + cq->dbrec[MLX5_CQ_ARM_DB] = htonl(sn << 28 | cmd | ci); + + /* + * Make sure that the doorbell record in host memory is + * written before ringing the doorbell via PCI MMIO. + */ + wmb(); + + doorbell[0] = htonl(sn << 28 | cmd | ci); + doorbell[1] = htonl(cq->cqn); + + mlx5_write64(doorbell, ctx->uar[0].regs + MLX5_CQ_DOORBELL, &ctx->lock32); + + wc_wmb(); + + return 0; +} + +void mlx5_cq_event(struct ibv_cq *cq) +{ + to_mcq(cq)->arm_sn++; +} + +static int is_equal_rsn(struct mlx5_cqe64 *cqe64, uint32_t rsn) +{ + return rsn == (ntohl(cqe64->sop_drop_qpn) & 0xffffff); +} + +static int is_equal_uidx(struct mlx5_cqe64 *cqe64, uint32_t uidx) +{ + return uidx == (ntohl(cqe64->srqn_uidx) & 0xffffff); +} + +static inline int free_res_cqe(struct mlx5_cqe64 *cqe64, uint32_t rsn_uidx, + struct mlx5_srq *srq, int cqe_version) +{ + if (cqe_version) { + if (is_equal_uidx(cqe64, rsn_uidx)) { + if (srq && is_responder(cqe64->op_own >> 4)) + mlx5_free_srq_wqe(srq, + ntohs(cqe64->wqe_counter)); + return 1; + } + } else { + if (is_equal_rsn(cqe64, rsn_uidx)) { + if (srq && (ntohl(cqe64->srqn_uidx) & 0xffffff)) + mlx5_free_srq_wqe(srq, + ntohs(cqe64->wqe_counter)); + return 1; + } + } + + return 0; +} + +void __mlx5_cq_clean(struct mlx5_cq *cq, uint32_t rsn_uidx, struct mlx5_srq *srq) +{ + uint32_t prod_index; + int nfreed = 0; + struct mlx5_cqe64 *cqe64, *dest64; + void *cqe, *dest; + uint8_t owner_bit; + int cqe_version; + + if (!cq) + return; + + /* + * First we need to find the current producer index, so we + * know where to start cleaning from. It doesn't matter if HW + * adds new entries after this loop -- the QP we're worried + * about is already in RESET, so the new entries won't come + * from our QP and therefore don't need to be checked. + */ + cqe_version = (to_mctx(cq->ibv_cq.context))->cqe_version; + for (prod_index = cq->cons_index; (cqe = get_sw_cqe(cq, prod_index)); ++prod_index) { + if (mlx5_get_cqe_format(cqe) == MLX5_COMPRESSED) + mlx5_decompress_cqe_idx(cq, prod_index); + + if (prod_index == cq->cons_index + cq->ibv_cq.cqe) + break; + } + + /* + * Now sweep backwards through the CQ, removing CQ entries + * that match our QP by copying older entries on top of them. + */ + while ((int) --prod_index - (int) cq->cons_index >= 0) { + cqe = get_cqe(cq, prod_index & cq->ibv_cq.cqe); + cqe64 = (cq->cqe_sz == 64) ? cqe : cqe + 64; + if (free_res_cqe(cqe64, rsn_uidx, srq, cqe_version)) { + ++nfreed; + } else if (nfreed) { + dest = get_cqe(cq, (prod_index + nfreed) & cq->ibv_cq.cqe); + dest64 = (cq->cqe_sz == 64) ? 
dest : dest + 64; + owner_bit = dest64->op_own & MLX5_CQE_OWNER_MASK; + memcpy(dest, cqe, cq->cqe_sz); + dest64->op_own = owner_bit | + (dest64->op_own & ~MLX5_CQE_OWNER_MASK); + } + } + + if (nfreed) { + cq->cons_index += nfreed; + /* + * Make sure update of buffer contents is done before + * updating consumer index. + */ + wmb(); + mlx5_update_cons_index(cq); + } +} + +void mlx5_cq_clean(struct mlx5_cq *cq, uint32_t qpn, struct mlx5_srq *srq) +{ + mlx5_lock(&cq->lock); + __mlx5_cq_clean(cq, qpn, srq); + mlx5_unlock(&cq->lock); +} + +static uint8_t sw_ownership_bit(int n, int nent) +{ + return (n & nent) ? 1 : 0; +} + +static int is_hw(uint8_t own, int n, int mask) +{ + return (own & MLX5_CQE_OWNER_MASK) ^ !!(n & (mask + 1)); +} + +void mlx5_cq_resize_copy_cqes(struct mlx5_cq *cq) +{ + struct mlx5_cqe64 *scqe64; + struct mlx5_cqe64 *dcqe64; + void *start_cqe; + void *scqe; + void *dcqe; + int ssize; + int dsize; + int i; + uint8_t sw_own; + + ssize = cq->cqe_sz; + dsize = cq->resize_cqe_sz; + + i = cq->cons_index; + scqe = get_buf_cqe(cq->active_buf, i & cq->active_cqes, ssize); + scqe64 = ssize == 64 ? scqe : scqe + 64; + start_cqe = scqe; + if (is_hw(scqe64->op_own, i, cq->active_cqes)) { + fprintf(stderr, "expected cqe in sw ownership\n"); + return; + } + + while ((scqe64->op_own >> 4) != MLX5_CQE_RESIZE_CQ) { + dcqe = get_buf_cqe(cq->resize_buf, (i + 1) & (cq->resize_cqes - 1), dsize); + dcqe64 = dsize == 64 ? dcqe : dcqe + 64; + sw_own = sw_ownership_bit(i + 1, cq->resize_cqes); + memcpy(dcqe, scqe, ssize); + dcqe64->op_own = (dcqe64->op_own & ~MLX5_CQE_OWNER_MASK) | sw_own; + + ++i; + scqe = get_buf_cqe(cq->active_buf, i & cq->active_cqes, ssize); + scqe64 = ssize == 64 ? scqe : scqe + 64; + if (is_hw(scqe64->op_own, i, cq->active_cqes)) { + fprintf(stderr, "expected cqe in sw ownership\n"); + return; + } + + if (scqe == start_cqe) { + fprintf(stderr, "resize CQ failed to get resize CQE\n"); + return; + } + } + ++cq->cons_index; +} + +int mlx5_alloc_cq_buf(struct mlx5_context *mctx, struct mlx5_cq *cq, + struct mlx5_buf *buf, int nent, int cqe_sz) +{ + struct mlx5_cqe64 *cqe; + int i; + struct mlx5_device *dev = to_mdev(mctx->ibv_ctx.device); + int ret; + enum mlx5_alloc_type type; + enum mlx5_alloc_type default_type = MLX5_ALLOC_TYPE_PREFER_CONTIG; + + if (mlx5_use_huge(&mctx->ibv_ctx, "HUGE_CQ")) + default_type = MLX5_ALLOC_TYPE_HUGE; + + mlx5_get_alloc_type(&mctx->ibv_ctx, MLX5_CQ_PREFIX, &type, default_type); + + buf->numa_req.valid = 1; + buf->numa_req.numa_id = mlx5_cpu_local_numa(); + ret = mlx5_alloc_prefered_buf(mctx, buf, + align(nent * cqe_sz, dev->page_size), + dev->page_size, + type, + MLX5_CQ_PREFIX); + + if (ret) + return -1; + + memset(buf->buf, 0, nent * cqe_sz); + + for (i = 0; i < nent; ++i) { + cqe = buf->buf + i * cqe_sz; + cqe += cqe_sz == 128 ? 
1 : 0; + cqe->op_own = MLX5_CQE_INVALID << 4; + } + + return 0; +} + +int mlx5_free_cq_buf(struct mlx5_context *ctx, struct mlx5_buf *buf) +{ + return mlx5_free_actual_buf(ctx, buf); +} + +/* + * poll family functions + */ +static inline int32_t poll_cnt(struct ibv_cq *ibcq, uint32_t max_entries, + const int use_lock, const int cqe_sz, + const int cqe_ver) __attribute__((always_inline)); +static inline int32_t poll_cnt(struct ibv_cq *ibcq, uint32_t max_entries, + const int use_lock, const int cqe_sz, + const int cqe_ver) +{ + struct mlx5_cq *cq = to_mcq(ibcq); + struct mlx5_resource *cur_rsc = NULL; + struct mlx5_cqe64 *cqe64; + struct mlx5_qp *mqp; + int err = CQ_OK; + uint16_t wqe_ctr; + int npolled; + + if (unlikely(use_lock)) + mlx5_lock(&cq->lock); + + for (npolled = 0; npolled < max_entries; ++npolled) { + cqe64 = get_next_cqe(cq, cqe_sz); + if (!cqe64) { + err = CQ_EMPTY; + break; + } + + if (unlikely(mlx5_get_cqe_format(cqe64) == MLX5_COMPRESSED)) + mlx5_decompress_cqe(cq); + + cur_rsc = find_rsc(cq, cqe64, cqe_ver); + if (unlikely(!cur_rsc)) { + err = CQ_POLL_ERR; + fprintf(stderr, "Failed to find send QP on poll_cnt\n"); + break; + } + mqp = (struct mlx5_qp *)cur_rsc; + if (likely((cqe64->op_own >> 4) == MLX5_CQE_REQ)) { + wqe_ctr = ntohs(cqe64->wqe_counter); + mqp->sq.tail = mqp->gen_data.wqe_head[wqe_ctr & (mqp->sq.wqe_cnt - 1)] + 1; + } else if ((cqe64->op_own >> 4) == MLX5_CQE_RESP_SEND) { + ++mqp->rq.tail; + } else { + err = CQ_POLL_ERR; + if ((cqe64->op_own >> 4) == MLX5_CQE_REQ_ERR) + fprintf(stderr, "MLX5_CQE_REQ_ERR received on poll_cnt\n"); + else + fprintf(stderr, "Non requester message received on poll_cnt\n"); + } + + if (unlikely(err != CQ_OK)) + break; + + ++cq->cons_index; + } + + if (likely(npolled)) { + mlx5_update_cons_index(cq); + err = CQ_OK; + } + + if (unlikely(use_lock)) + mlx5_unlock(&cq->lock); + + return err == CQ_POLL_ERR ? 
-1 : npolled; +} + +static inline int32_t get_rx_offloads_flags(struct mlx5_cqe64 *cqe) __attribute__((always_inline)); +static inline int32_t get_rx_offloads_flags(struct mlx5_cqe64 *cqe) +{ + uint8_t l3_hdr; + int32_t flags; + + l3_hdr = get_cqe_l3_hdr_type(cqe); + flags = (!!(cqe->hds_ip_ext & MLX5_CQE_L4_OK) * IBV_EXP_CQ_RX_TCP_UDP_CSUM_OK) | + (!!(cqe->hds_ip_ext & MLX5_CQE_L3_OK) * IBV_EXP_CQ_RX_IP_CSUM_OK) | + ((l3_hdr == MLX5_CQE_L3_HDR_TYPE_IPV4) * IBV_EXP_CQ_RX_IPV4_PACKET) | + ((l3_hdr == MLX5_CQE_L3_HDR_TYPE_IPV6) * IBV_EXP_CQ_RX_IPV6_PACKET); + + return flags; +} + +static inline int32_t poll_length(struct ibv_cq *ibcq, void *buf, uint32_t *inl, + const int use_lock, const int cqe_sz, + uint32_t *offset, uint32_t *flags, const int cqe_ver) __attribute__((always_inline)); +static inline int32_t poll_length(struct ibv_cq *ibcq, void *buf, uint32_t *inl, + const int use_lock, const int cqe_sz, + uint32_t *offset, uint32_t *flags, const int cqe_ver) +{ + struct mlx5_cq *cq = to_mcq(ibcq); + struct mlx5_resource *cur_rsc = NULL; + struct mlx5_cqe64 *cqe64; + struct mlx5_qp *mqp = NULL; + struct mlx5_rwq *rwq = NULL; + int32_t size = 0; + uint16_t wqe_ctr; + int err = CQ_OK; + int cqe_format; + + if (unlikely(use_lock)) + mlx5_lock(&cq->lock); + + cqe64 = get_next_cqe(cq, cqe_sz); + + if (cqe64) { + cqe_format = mlx5_get_cqe_format(cqe64); + if (unlikely(cqe_format == MLX5_COMPRESSED)) { + mlx5_decompress_cqe(cq); + cqe_format = 0; + } + + if (unlikely((cqe64->op_own >> 4) != MLX5_CQE_RESP_SEND)) { + if (cqe64->op_own >> 4 == MLX5_CQE_RESP_ERR) + fprintf(stderr, "poll_length, CQE response error, syndrome=0x%x, vendor syndrome error=0x%x, HW syndrome 0x%x, HW syndrome type 0x%x\n", + ((struct mlx5_err_cqe *)cqe64)->syndrome, ((struct mlx5_err_cqe *)cqe64)->vendor_err_synd, + ((struct mlx5_err_cqe *)cqe64)->hw_err_synd, ((struct mlx5_err_cqe *)cqe64)->hw_synd_type); + else + fprintf(stderr, "Only post-receive completion supported on poll_length, op=%u\n", + cqe64->op_own >> 4); + err = CQ_POLL_ERR; + goto out; + } + cur_rsc = find_rsc(cq, cqe64, cqe_ver); + if (unlikely(!cur_rsc)) { + fprintf(stderr, "Failed to find QP resource on poll_length\n"); + err = CQ_POLL_ERR; + goto out; + } + + if (cur_rsc->type == MLX5_RSC_TYPE_MP_RWQ) { + uint32_t byte_cnt; + uint16_t wqe_id; + + if (unlikely(!offset)) { + fprintf(stderr, "Can't handle Multi-Packet RQ completion since" + " 'offset' output parameter is not provided\n"); + err = CQ_POLL_ERR; + goto out; + } + rwq = (struct mlx5_rwq *)cur_rsc; + + byte_cnt = ntohl(cqe64->byte_cnt); + wqe_id = ntohs(cqe64->wqe_id) & (rwq->rq.wqe_cnt - 1); + /* Add the WQE strides consumed by this CQE to the WQE consumed strides counter */ + rwq->consumed_strides_counter[wqe_id] += (byte_cnt & MP_RQ_NUM_STRIDES_FIELD_MASK) >> + MP_RQ_NUM_STRIDES_FIELD_SHIFT; + + /* Updae RX offload flags */ + if (rwq->model_flags & MLX5_WQ_MODEL_RX_CSUM_IP_OK_IP_NON_TCP_UDP) + *flags = get_rx_offloads_flags(cqe64); + else + *flags = 0; + /* If last packet for receive WR (all strides of this WQE consumed) */ + if (rwq->consumed_strides_counter[wqe_id] == rwq->mp_rq_strides_in_wqe) { + *flags |= IBV_EXP_CQ_RX_MULTI_PACKET_LAST_V1; + ++rwq->rq.tail; /* Update the rq tail */ + rwq->consumed_strides_counter[wqe_id] = 0; + } + + if (byte_cnt & MP_RQ_FILLER_FIELD_MASK) + /* + * In case of filler CQE the application get WC with message-size = 0. + * filler CQE may come at any time regardless to the last-packet indication. 
+ */ + size = 0; + else /* not a filler CQE */ + size = (byte_cnt & MP_RQ_BYTE_CNT_FIELD_MASK) - rwq->mp_rq_packet_padding; + + /* + * In mp_rq wqe_counter provides the WQE stride index. + * We use it to calculate packet offset in the WR posted buffer. + */ + *offset = ntohs(cqe64->wqe_counter) * rwq->mp_rq_stride_size + rwq->mp_rq_packet_padding; + } else { + if (cur_rsc->type == MLX5_RSC_TYPE_QP) { + mqp = (struct mlx5_qp *)cur_rsc; + if (flags) { + if (mqp->gen_data.model_flags & MLX5_QP_MODEL_RX_CSUM_IP_OK_IP_NON_TCP_UDP) + *flags = get_rx_offloads_flags(cqe64); + else + *flags = 0; + } + } else { + if (likely(cur_rsc->type == MLX5_RSC_TYPE_RWQ)) { + rwq = (struct mlx5_rwq *)cur_rsc; + } else { + fprintf(stderr, "Invalid resource type(%d) on poll_length\n", cur_rsc->type); + err = CQ_POLL_ERR; + goto out; + } + if (flags) { + if (rwq->model_flags & MLX5_WQ_MODEL_RX_CSUM_IP_OK_IP_NON_TCP_UDP) + *flags = get_rx_offloads_flags(cqe64); + else + *flags = 0; + } + } + + size = ntohl(cqe64->byte_cnt); + + if (unlikely(cqe_format)) { + void *data = (cqe_format == MLX5_INLINE_DATA32_SEG) ? cqe64 : cqe64 - 1; + + if (buf) { + *inl = 1; + memcpy(buf, data, size); + } else { + wqe_ctr = mqp->rq.tail & (mqp->rq.wqe_cnt - 1); + if (unlikely(mlx5_copy_to_recv_wqe(mqp, wqe_ctr, data, size))) { + fprintf(stderr, "Fail to copy inline receive message to receive buffer\n"); + err = CQ_POLL_ERR; + goto out; + } + } + } + if (!rwq) + ++mqp->rq.tail; + else + ++rwq->rq.tail; + } + + ++cq->cons_index; + mlx5_update_cons_index(cq); + } else { + err = CQ_EMPTY; + if (flags) + *flags = 0; + } + +out: + if (unlikely(use_lock)) + mlx5_unlock(&cq->lock); + + return err == CQ_POLL_ERR ? -1 : size; +} + +int32_t mlx5_poll_cnt_safe(struct ibv_cq *ibcq, uint32_t max) __MLX5_ALGN_F__; +int32_t mlx5_poll_cnt_safe(struct ibv_cq *ibcq, uint32_t max) +{ + struct mlx5_cq *cq = to_mcq(ibcq); + struct mlx5_context *mctx = to_mctx(cq->ibv_cq.context); + + return poll_cnt(ibcq, max, 1, cq->cqe_sz, mctx->cqe_version == 1); +} + +int32_t mlx5_poll_cnt_unsafe_cqe64(struct ibv_cq *ibcq, uint32_t max) __MLX5_ALGN_F__; +int32_t mlx5_poll_cnt_unsafe_cqe64(struct ibv_cq *ibcq, uint32_t max) +{ + return poll_cnt(ibcq, max, 0, 64, 0); +} + +int32_t mlx5_poll_cnt_unsafe_cqe128(struct ibv_cq *ibcq, uint32_t max) __MLX5_ALGN_F__; +int32_t mlx5_poll_cnt_unsafe_cqe128(struct ibv_cq *ibcq, uint32_t max) +{ + return poll_cnt(ibcq, max, 0, 128, 0); +} + +int32_t mlx5_poll_cnt_unsafe_cqe64_v1(struct ibv_cq *ibcq, uint32_t max) __MLX5_ALGN_F__; +int32_t mlx5_poll_cnt_unsafe_cqe64_v1(struct ibv_cq *ibcq, uint32_t max) +{ + return poll_cnt(ibcq, max, 0, 64, 1); +} + +int32_t mlx5_poll_cnt_unsafe_cqe128_v1(struct ibv_cq *ibcq, uint32_t max) __MLX5_ALGN_F__; +int32_t mlx5_poll_cnt_unsafe_cqe128_v1(struct ibv_cq *ibcq, uint32_t max) +{ + return poll_cnt(ibcq, max, 0, 128, 1); +} + +int32_t mlx5_poll_length_safe(struct ibv_cq *ibcq, void *buf, uint32_t *inl) __MLX5_ALGN_F__; +int32_t mlx5_poll_length_safe(struct ibv_cq *ibcq, void *buf, uint32_t *inl) +{ + struct mlx5_cq *cq = to_mcq(ibcq); + struct mlx5_context *mctx = to_mctx(cq->ibv_cq.context); + + return poll_length(ibcq, buf, inl, 1, cq->cqe_sz, NULL, NULL, mctx->cqe_version == 1); +} + +int32_t mlx5_poll_length_unsafe_cqe64(struct ibv_cq *cq, void *buf, uint32_t *inl) __MLX5_ALGN_F__; +int32_t mlx5_poll_length_unsafe_cqe64(struct ibv_cq *cq, void *buf, uint32_t *inl) +{ + return poll_length(cq, buf, inl, 0, 64, NULL, NULL, 0); +} + +int32_t mlx5_poll_length_unsafe_cqe128(struct ibv_cq *cq, void 
*buf, uint32_t *inl) __MLX5_ALGN_F__; +int32_t mlx5_poll_length_unsafe_cqe128(struct ibv_cq *cq, void *buf, uint32_t *inl) +{ + return poll_length(cq, buf, inl, 0, 128, NULL, NULL, 0); +} + +int32_t mlx5_poll_length_unsafe_cqe64_v1(struct ibv_cq *cq, void *buf, uint32_t *inl) __MLX5_ALGN_F__; +int32_t mlx5_poll_length_unsafe_cqe64_v1(struct ibv_cq *cq, void *buf, uint32_t *inl) +{ + return poll_length(cq, buf, inl, 0, 64, NULL, NULL, 1); +} + +int32_t mlx5_poll_length_unsafe_cqe128_v1(struct ibv_cq *cq, void *buf, uint32_t *inl) __MLX5_ALGN_F__; +int32_t mlx5_poll_length_unsafe_cqe128_v1(struct ibv_cq *cq, void *buf, uint32_t *inl) +{ + return poll_length(cq, buf, inl, 0, 128, NULL, NULL, 1); +} + +/* Poll length flags */ +int32_t mlx5_poll_length_flags_safe(struct ibv_cq *ibcq, void *buf, uint32_t *inl, uint32_t *flags) __MLX5_ALGN_F__; +int32_t mlx5_poll_length_flags_safe(struct ibv_cq *ibcq, void *buf, uint32_t *inl, uint32_t *flags) +{ + struct mlx5_cq *cq = to_mcq(ibcq); + struct mlx5_context *mctx = to_mctx(cq->ibv_cq.context); + + return poll_length(ibcq, buf, inl, 1, cq->cqe_sz, NULL, flags, mctx->cqe_version == 1); +} + +int32_t mlx5_poll_length_flags_unsafe_cqe64(struct ibv_cq *cq, void *buf, uint32_t *inl, uint32_t *flags) __MLX5_ALGN_F__; +int32_t mlx5_poll_length_flags_unsafe_cqe64(struct ibv_cq *cq, void *buf, uint32_t *inl, uint32_t *flags) +{ + return poll_length(cq, buf, inl, 0, 64, NULL, flags, 0); +} + +int32_t mlx5_poll_length_flags_unsafe_cqe128(struct ibv_cq *cq, void *buf, uint32_t *inl, uint32_t *flags) __MLX5_ALGN_F__; +int32_t mlx5_poll_length_flags_unsafe_cqe128(struct ibv_cq *cq, void *buf, uint32_t *inl, uint32_t *flags) +{ + return poll_length(cq, buf, inl, 0, 128, NULL, flags, 0); +} + +int32_t mlx5_poll_length_flags_unsafe_cqe64_v1(struct ibv_cq *cq, void *buf, uint32_t *inl, uint32_t *flags) __MLX5_ALGN_F__; +int32_t mlx5_poll_length_flags_unsafe_cqe64_v1(struct ibv_cq *cq, void *buf, uint32_t *inl, uint32_t *flags) +{ + return poll_length(cq, buf, inl, 0, 64, NULL, flags, 1); +} + +int32_t mlx5_poll_length_flags_unsafe_cqe128_v1(struct ibv_cq *cq, void *buf, uint32_t *inl, uint32_t *flags) __MLX5_ALGN_F__; +int32_t mlx5_poll_length_flags_unsafe_cqe128_v1(struct ibv_cq *cq, void *buf, uint32_t *inl, uint32_t *flags) +{ + return poll_length(cq, buf, inl, 0, 128, NULL, flags, 1); +} + +/* Poll length flags MP RQ */ +int32_t mlx5_poll_length_flags_mp_rq_safe(struct ibv_cq *ibcq, uint32_t *offset, uint32_t *flags) __MLX5_ALGN_F__; +int32_t mlx5_poll_length_flags_mp_rq_safe(struct ibv_cq *ibcq, uint32_t *offset, uint32_t *flags) +{ + struct mlx5_cq *cq = to_mcq(ibcq); + struct mlx5_context *mctx = to_mctx(cq->ibv_cq.context); + + return poll_length(ibcq, NULL, NULL, 1, cq->cqe_sz, offset, flags, mctx->cqe_version == 1); +} + +int32_t mlx5_poll_length_flags_mp_rq_unsafe_cqe64(struct ibv_cq *cq, uint32_t *offset, uint32_t *flags) __MLX5_ALGN_F__; +int32_t mlx5_poll_length_flags_mp_rq_unsafe_cqe64(struct ibv_cq *cq, uint32_t *offset, uint32_t *flags) +{ + return poll_length(cq, NULL, NULL, 0, 64, offset, flags, 0); +} + +int32_t mlx5_poll_length_flags_mp_rq_unsafe_cqe128(struct ibv_cq *cq, uint32_t *offset, uint32_t *flags) __MLX5_ALGN_F__; +int32_t mlx5_poll_length_flags_mp_rq_unsafe_cqe128(struct ibv_cq *cq, uint32_t *offset, uint32_t *flags) +{ + return poll_length(cq, NULL, NULL, 0, 128, offset, flags, 0); +} + +int32_t mlx5_poll_length_flags_mp_rq_unsafe_cqe64_v1(struct ibv_cq *cq, uint32_t *offset, uint32_t *flags) __MLX5_ALGN_F__; +int32_t 
mlx5_poll_length_flags_mp_rq_unsafe_cqe64_v1(struct ibv_cq *cq, uint32_t *offset, uint32_t *flags) +{ + return poll_length(cq, NULL, NULL, 0, 64, offset, flags, 1); +} + +int32_t mlx5_poll_length_flags_mp_rq_unsafe_cqe128_v1(struct ibv_cq *cq, uint32_t *offset, uint32_t *flags) __MLX5_ALGN_F__; +int32_t mlx5_poll_length_flags_mp_rq_unsafe_cqe128_v1(struct ibv_cq *cq, uint32_t *offset, uint32_t *flags) +{ + return poll_length(cq, NULL, NULL, 0, 128, offset, flags, 1); +} + +static struct ibv_exp_cq_family_v1 mlx5_poll_cq_family_safe = { + .poll_cnt = mlx5_poll_cnt_safe, + .poll_length = mlx5_poll_length_safe, + .poll_length_flags = mlx5_poll_length_flags_safe, + .poll_length_flags_mp_rq = mlx5_poll_length_flags_mp_rq_safe +}; + +enum mlx5_poll_cq_cqe_sizes { + MLX5_POLL_CQ_CQE_64 = 1, + MLX5_POLL_CQ_CQE_128 = 2, + MLX5_POLL_CQ_NUM_CQE_SIZES = 3, +}; + +static struct ibv_exp_cq_family_v1 mlx5_poll_cq_family_unsafe_tbl[MLX5_POLL_CQ_NUM_CQE_SIZES] = { + [MLX5_POLL_CQ_CQE_64] = { + .poll_cnt = mlx5_poll_cnt_unsafe_cqe64, + .poll_length = mlx5_poll_length_unsafe_cqe64, + .poll_length_flags = mlx5_poll_length_flags_unsafe_cqe64, + .poll_length_flags_mp_rq = mlx5_poll_length_flags_mp_rq_unsafe_cqe64 + + }, + [MLX5_POLL_CQ_CQE_128] = { + .poll_cnt = mlx5_poll_cnt_unsafe_cqe128, + .poll_length = mlx5_poll_length_unsafe_cqe128, + .poll_length_flags = mlx5_poll_length_flags_unsafe_cqe128, + .poll_length_flags_mp_rq = mlx5_poll_length_flags_mp_rq_unsafe_cqe128 + + }, +}; + +static struct ibv_exp_cq_family_v1 mlx5_poll_cq_family_unsafe_v1_tbl[MLX5_POLL_CQ_NUM_CQE_SIZES] = { + [MLX5_POLL_CQ_CQE_64] = { + .poll_cnt = mlx5_poll_cnt_unsafe_cqe64_v1, + .poll_length = mlx5_poll_length_unsafe_cqe64_v1, + .poll_length_flags = mlx5_poll_length_flags_unsafe_cqe64_v1, + .poll_length_flags_mp_rq = mlx5_poll_length_flags_mp_rq_unsafe_cqe64_v1 + }, + [MLX5_POLL_CQ_CQE_128] = { + .poll_cnt = mlx5_poll_cnt_unsafe_cqe128_v1, + .poll_length = mlx5_poll_length_unsafe_cqe128_v1, + .poll_length_flags = mlx5_poll_length_flags_unsafe_cqe128_v1, + .poll_length_flags_mp_rq = mlx5_poll_length_flags_mp_rq_unsafe_cqe128_v1 + }, +}; + +struct ibv_exp_cq_family_v1 *mlx5_get_poll_cq_family(struct mlx5_cq *cq, + struct ibv_exp_query_intf_params *params, + enum ibv_exp_query_intf_status *status) +{ + struct mlx5_context *mctx = to_mctx(cq->ibv_cq.context); + enum mlx5_poll_cq_cqe_sizes cqe_size; + + if (params->intf_version > MLX5_MAX_CQ_FAMILY_VER) { + *status = IBV_EXP_INTF_STAT_VERSION_NOT_SUPPORTED; + + return NULL; + } + if (params->flags) { + fprintf(stderr, PFX "Global interface flags(0x%x) are not supported for CQ family\n", params->flags); + *status = IBV_EXP_INTF_STAT_FLAGS_NOT_SUPPORTED; + + return NULL; + } + if (params->family_flags) { + fprintf(stderr, PFX "Family flags(0x%x) are not supported for CQ family\n", params->family_flags); + *status = IBV_EXP_INTF_STAT_FAMILY_FLAGS_NOT_SUPPORTED; + + return NULL; + } + if (cq->model_flags & MLX5_CQ_MODEL_FLAG_THREAD_SAFE) + return &mlx5_poll_cq_family_safe; + + if (cq->cqe_sz == 64) { + cqe_size = MLX5_POLL_CQ_CQE_64; + } else if (cq->cqe_sz == 128) { + cqe_size = MLX5_POLL_CQ_CQE_128; + } else { + errno = EINVAL; + *status = IBV_EXP_INTF_STAT_INVAL_PARARM; + return NULL; + } + + if (mctx->cqe_version == 1) + return &mlx5_poll_cq_family_unsafe_v1_tbl[cqe_size]; + + return &mlx5_poll_cq_family_unsafe_tbl[cqe_size]; +} Index: contrib/ofed/libmlx5/src/dbrec.c =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/dbrec.c @@ 
-0,0 +1,152 @@ +/* + * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include + +#include "mlx5.h" + +struct mlx5_db_page { + struct mlx5_db_page *prev, *next; + struct mlx5_buf buf; + int num_db; + int use_cnt; + unsigned long free[0]; +}; + +static struct mlx5_db_page *__add_page(struct mlx5_context *context) +{ + struct mlx5_db_page *page; + int ps = to_mdev(context->ibv_ctx.device)->page_size; + int pp; + int i; + int nlong; + + pp = ps / context->cache_line_size; + nlong = (pp + 8 * sizeof(long) - 1) / (8 * sizeof(long)); + + page = calloc(1, sizeof(*page) + nlong * sizeof(long)); + if (!page) + return NULL; + + if (mlx5_alloc_buf(&page->buf, ps, ps)) { + free(page); + return NULL; + } + + page->num_db = pp; + page->use_cnt = 0; + for (i = 0; i < nlong; ++i) + page->free[i] = ~0; + + page->prev = NULL; + page->next = context->db_list; + context->db_list = page; + if (page->next) + page->next->prev = page; + + return page; +} + +uint32_t *mlx5_alloc_dbrec(struct mlx5_context *context) +{ + struct mlx5_db_page *page; + uint32_t *db = NULL; + int i, j; + + pthread_mutex_lock(&context->db_list_mutex); + + for (page = context->db_list; page; page = page->next) + if (page->use_cnt < page->num_db) + goto found; + + page = __add_page(context); + if (!page) + goto out; + +found: + ++page->use_cnt; + + for (i = 0; !page->free[i]; ++i) + /* nothing */; + + j = ffsl(page->free[i]); + --j; + page->free[i] &= ~(1UL << j); + db = page->buf.buf + (i * 8 * sizeof(long) + j) * context->cache_line_size; + +out: + pthread_mutex_unlock(&context->db_list_mutex); + + return db; +} + +void mlx5_free_db(struct mlx5_context *context, uint32_t *db) +{ + struct mlx5_db_page *page; + uintptr_t ps = to_mdev(context->ibv_ctx.device)->page_size; + int i; + + pthread_mutex_lock(&context->db_list_mutex); + + for (page = context->db_list; page; page = page->next) + if (((uintptr_t) db & ~(ps - 1)) == (uintptr_t) page->buf.buf) + break; + + if (!page) + goto out; + + i = ((void *) db - page->buf.buf) / context->cache_line_size; + page->free[i / (8 * sizeof(long))] |= 1UL << (i % (8 * 
sizeof(long))); + + if (!--page->use_cnt) { + if (page->prev) + page->prev->next = page->next; + else + context->db_list = page->next; + if (page->next) + page->next->prev = page->prev; + + mlx5_free_buf(&page->buf); + free(page); + } + +out: + pthread_mutex_unlock(&context->db_list_mutex); +} Index: contrib/ofed/libmlx5/src/doorbell.h =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/doorbell.h @@ -0,0 +1,68 @@ +/* + * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + + +#ifndef DOORBELL_H +#define DOORBELL_H + +#if SIZEOF_LONG == 8 + +#if __BYTE_ORDER == __LITTLE_ENDIAN +# define MLX5_PAIR_TO_64(val) ((uint64_t) val[1] << 32 | val[0]) +#elif __BYTE_ORDER == __BIG_ENDIAN +# define MLX5_PAIR_TO_64(val) ((uint64_t) val[0] << 32 | val[1]) +#else +# error __BYTE_ORDER not defined +#endif + +static inline void mlx5_write64(uint32_t val[2], + void *dest, + struct mlx5_lock *lock) +{ + *(volatile uint64_t *)dest = MLX5_PAIR_TO_64(val); +} + +#else + +static inline void mlx5_write64(uint32_t val[2], + void *dest, + struct mlx5_lock *lock) +{ + mlx5_lock(lock); + *(volatile uint32_t *)dest = val[0]; + *(volatile uint32_t *)(dest + 4) = val[1]; + mlx5_unlock(lock); +} + +#endif + +#endif /* DOORBELL_H */ Index: contrib/ofed/libmlx5/src/implicit_lkey.h =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/implicit_lkey.h @@ -0,0 +1,79 @@ +/* + * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
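For reference, a minimal sketch (illustrative only, not part of the patch) showing why MLX5_PAIR_TO_64() in doorbell.h gives the single 64-bit store the same memory image as the two 32-bit stores of the locked fallback, so the device sees identical doorbell bytes either way. The values are hypothetical; endianness is detected at run time rather than assuming a particular platform header.

#include <assert.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    uint32_t val[2] = { 0x11223344, 0x55667788 }; /* hypothetical doorbell words */
    uint32_t one = 1;
    int little_endian = *(uint8_t *)&one;
    uint64_t packed;
    uint8_t a[8], b[8];

    /* What MLX5_PAIR_TO_64() builds for the single 64-bit store. */
    packed = little_endian ? ((uint64_t)val[1] << 32 | val[0])
                           : ((uint64_t)val[0] << 32 | val[1]);
    memcpy(a, &packed, sizeof(packed));

    /* What the 32-bit fallback writes: val[0] then val[1], in order. */
    memcpy(b, &val[0], 4);
    memcpy(b + 4, &val[1], 4);

    assert(memcmp(a, b, sizeof(a)) == 0); /* identical bytes reach the UAR */
    return 0;
}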
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef IMPLICIT_LKEY_H +#define IMPLICIT_LKEY_H + +#include + + +#define ODP_GLOBAL_R_LKEY 0x00000101 +#define ODP_GLOBAL_W_LKEY 0x00000102 +#define MLX5_WHOLE_ADDR_SPACE (~((size_t)0)) + +struct mlx5_pd; +struct ibv_exp_reg_mr_in; + +struct mlx5_pair_mrs { + struct ibv_mr *mrs[2]; +}; + +struct mlx5_implicit_lkey { + struct mlx5_pair_mrs **table; + uint64_t exp_access; + pthread_mutex_t lock; +}; + +int mlx5_init_implicit_lkey(struct mlx5_implicit_lkey *ilkey, + uint64_t access_flags); + +void mlx5_destroy_implicit_lkey(struct mlx5_implicit_lkey *ilkey); +struct mlx5_implicit_lkey *mlx5_get_implicit_lkey(struct mlx5_pd *pd, uint64_t exp_access); + +struct ibv_mr *mlx5_alloc_whole_addr_mr(const struct ibv_exp_reg_mr_in *attr); + +void mlx5_dealloc_whole_addr_mr(struct ibv_mr *); + +int mlx5_get_real_lkey_from_implicit_lkey(struct mlx5_pd *pd, + struct mlx5_implicit_lkey *ilkey, + uint64_t addr, size_t len, + uint32_t *lkey); +int mlx5_get_real_mr_from_implicit_lkey(struct mlx5_pd *pd, + struct mlx5_implicit_lkey *ilkey, + uint64_t addr, uint64_t len, + struct ibv_mr **mr); + +int mlx5_prefetch_implicit_lkey(struct mlx5_pd *pd, + struct mlx5_implicit_lkey *ilkey, + uint64_t addr, size_t len, int flags); + +#endif /* IMPLICIT_LKEY_H */ Index: contrib/ofed/libmlx5/src/implicit_lkey.c =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/implicit_lkey.c @@ -0,0 +1,279 @@ +#include +#include +#include +#include +#include +#include "implicit_lkey.h" +#include "mlx5.h" + +#define LEVEL1_SIZE 10 +#define LEVEL2_SIZE 11 +#define MR_SIZE 28 + +#define ADDR_EFFECTIVE_BITS (MR_SIZE + LEVEL2_SIZE + LEVEL1_SIZE) + +#define LEVEL1_SHIFT (MR_SIZE + LEVEL2_SIZE) +#define LEVEL2_SHIFT MR_SIZE + +#define MASK(len) ((1 << (len)) - 1) + +#define MIN(x, y) (((x) < (y)) ? 
(x) : (y)) + +struct mlx5_implicit_lkey *mlx5_get_implicit_lkey(struct mlx5_pd *pd, + uint64_t exp_access) +{ + if (!(exp_access & IBV_EXP_ACCESS_ON_DEMAND)) { + fprintf(stderr, "cannot create relaxed or implicit\ + MR as a non-ODP MR\n"); + errno = EINVAL; + return NULL; + } + + if ((exp_access & ~IBV_EXP_ACCESS_RELAXED) == IBV_EXP_ACCESS_ON_DEMAND) + return &pd->r_ilkey; + + if ((exp_access & ~IBV_EXP_ACCESS_RELAXED) == + (IBV_EXP_ACCESS_ON_DEMAND | IBV_EXP_ACCESS_LOCAL_WRITE)) + return &pd->w_ilkey; + + if (!(exp_access & IBV_EXP_ACCESS_RELAXED)) { + fprintf(stderr, "cannot create a strict MR (non-relaxed)\ + for remote access\n"); + errno = EINVAL; + return NULL; + } + + if (!pd->remote_ilkey) { + pd->remote_ilkey = malloc(sizeof(struct mlx5_implicit_lkey)); + if (!pd->remote_ilkey) { + errno = ENOMEM; + return NULL; + } + + errno = mlx5_init_implicit_lkey(pd->remote_ilkey, + IBV_EXP_ACCESS_LOCAL_WRITE | + IBV_EXP_ACCESS_REMOTE_READ | + IBV_EXP_ACCESS_REMOTE_WRITE | + IBV_EXP_ACCESS_REMOTE_ATOMIC | + IBV_EXP_ACCESS_ON_DEMAND); + if (errno) { + free(pd->remote_ilkey); + pd->remote_ilkey = NULL; + } + } + + return pd->remote_ilkey; +} + +int mlx5_init_implicit_lkey(struct mlx5_implicit_lkey *ilkey, + uint64_t exp_access) +{ + ilkey->table = NULL; + ilkey->exp_access = exp_access; + + if (!(exp_access & IBV_EXP_ACCESS_ON_DEMAND)) + return -EINVAL; + + return pthread_mutex_init(&(ilkey->lock), NULL); +} + +static void destroy_level2(struct mlx5_pair_mrs *table) +{ + struct mlx5_pair_mrs *ptr = table; + for (; ptr != table + (1 << LEVEL2_SIZE); ++ptr) { + if (ptr->mrs[0]) { + to_mmr(ptr->mrs[0])->alloc_flags &= ~IBV_EXP_ACCESS_RELAXED; + ibv_dereg_mr(ptr->mrs[0]); + } + if (ptr->mrs[1]) { + to_mmr(ptr->mrs[1])->alloc_flags &= ~IBV_EXP_ACCESS_RELAXED; + ibv_dereg_mr(ptr->mrs[1]); + } + } + + free(table); +} + +void mlx5_destroy_implicit_lkey(struct mlx5_implicit_lkey *ilkey) +{ + struct mlx5_pair_mrs **ptr = ilkey->table; + + pthread_mutex_destroy(&ilkey->lock); + + if (ptr) { + for (; ptr != ilkey->table + (1 << LEVEL1_SIZE); ++ptr) + if (*ptr) + destroy_level2(*ptr); + + free(ilkey->table); + } +} + +struct ibv_mr *mlx5_alloc_whole_addr_mr(const struct ibv_exp_reg_mr_in *attr) +{ + struct ibv_mr *mr; + + if (attr->exp_access & ~(IBV_EXP_ACCESS_ON_DEMAND | + IBV_EXP_ACCESS_LOCAL_WRITE)) + return NULL; + + mr = malloc(sizeof(struct ibv_mr)); + + if (!mr) + return NULL; + + mr->context = attr->pd->context; + mr->pd = attr->pd; + mr->addr = attr->addr; + mr->length = attr->length; + mr->handle = 0; + mr->lkey = attr->exp_access & IBV_EXP_ACCESS_LOCAL_WRITE ? + ODP_GLOBAL_W_LKEY : ODP_GLOBAL_R_LKEY; + mr->rkey = 0; + + return mr; +} + +void mlx5_dealloc_whole_addr_mr(struct ibv_mr *mr) +{ + free(mr); +} + +int mlx5_get_real_mr_from_implicit_lkey(struct mlx5_pd *pd, + struct mlx5_implicit_lkey *ilkey, + uint64_t addr, uint64_t len, + struct ibv_mr **mr) +{ + uint64_t key1 = (addr >> LEVEL1_SHIFT) & MASK(LEVEL1_SIZE); + uint64_t key2 = (addr >> LEVEL2_SHIFT) & MASK(LEVEL2_SIZE); + uint64_t addr_msb_bits = addr >> ADDR_EFFECTIVE_BITS; + uint64_t mr_base_addr = addr & ~MASK(MR_SIZE); + int mr_idx_in_pair = (((addr >> (MR_SIZE)) & 1) != + (((addr+len+1) >> (MR_SIZE)) & 1)); + + mr_base_addr |= (mr_idx_in_pair << (MR_SIZE-1)); + + if (len >> MR_SIZE) { + fprintf(stderr, "range too large for the implicit MR\n"); + return EINVAL; + } + + /* Verify that the address is canonical, refuse posting a WQE + * for non-canonical addresses. To remove this limitation, add + * 5 levels to the tree here. 
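For reference, a minimal worked example (illustrative only, not part of the patch) of the address split used by the two-level table above, assuming the same LEVEL1_SIZE=10, LEVEL2_SIZE=11, MR_SIZE=28 layout (10 + 11 + 28 = 49 effective bits, each leaf covering a 256MB region; the second MR of a pair is shifted by half a region, bit 27, so a buffer that straddles a region boundary can still be served by one MR). The sample address is hypothetical.

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint64_t addr = 0x00007f1234567890ULL;             /* hypothetical VA  */
    uint64_t key1 = (addr >> 39) & ((1 << 10) - 1);    /* level-1 index    */
    uint64_t key2 = (addr >> 28) & ((1 << 11) - 1);    /* level-2 index    */
    uint64_t base = addr & ~(uint64_t)((1 << 28) - 1); /* 256MB MR base    */
    uint64_t msb  = addr >> 49;  /* canonical iff all-zero or all-one      */

    printf("key1=%" PRIu64 " key2=%" PRIu64 " base=0x%" PRIx64 " canonical=%d\n",
           key1, key2, base, msb == 0 || msb == (UINT64_MAX >> 49));
    return 0;
}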
+ */ + if (addr_msb_bits && + (addr_msb_bits != ((~((uint64_t)0)) >> ADDR_EFFECTIVE_BITS))) + return EINVAL; + + + /* Access the table in lock-free manner. + * + * As we only add items to the table, only lock it when adding + * the items, and check that the item is still missing with + * lock held. Assumes that writes to pointers are atomic, so + * we will never read "half-pointer". + */ + if (!ilkey->table) { + pthread_mutex_lock(&ilkey->lock); + if (!ilkey->table) + ilkey->table = calloc(1, sizeof(void *) * + (1 << LEVEL1_SIZE)); + pthread_mutex_unlock(&ilkey->lock); + if (!ilkey->table) + return ENOMEM; + } + + if (!ilkey->table[key1]) { + pthread_mutex_lock(&ilkey->lock); + if (!ilkey->table[key1]) + ilkey->table[key1] = calloc(1, + (sizeof(struct mlx5_pair_mrs) * + (1 << LEVEL2_SIZE))); + pthread_mutex_unlock(&ilkey->lock); + if (!ilkey->table[key1]) + return ENOMEM; + } + + if (!ilkey->table[key1][key2].mrs[mr_idx_in_pair]) { + pthread_mutex_lock(&ilkey->lock); + if (!ilkey->table[key1][key2].mrs[mr_idx_in_pair]) { + struct ibv_exp_reg_mr_in attr = { + .comp_mask = 0, + .pd = &pd->ibv_pd, + .addr = (void *)(unsigned long)mr_base_addr, + .length = 1 << MR_SIZE, + .exp_access = ilkey->exp_access, + }; + + ilkey->table[key1][key2].mrs[mr_idx_in_pair] = ibv_exp_reg_mr(&attr); + if (ilkey->table[key1][key2].mrs[mr_idx_in_pair]) { + ilkey->table[key1][key2].mrs[mr_idx_in_pair]->addr = (void *)(unsigned long)mr_base_addr; + ilkey->table[key1][key2].mrs[mr_idx_in_pair]->length = 1 << MR_SIZE; + } + } + if (ilkey->table[key1][key2].mrs[mr_idx_in_pair]) { + to_mmr(ilkey->table[key1][key2].mrs[mr_idx_in_pair])->alloc_flags |= IBV_EXP_ACCESS_RELAXED; + to_mmr(ilkey->table[key1][key2].mrs[mr_idx_in_pair])->type = MLX5_ODP_MR; + } + pthread_mutex_unlock(&ilkey->lock); + if (!ilkey->table[key1][key2].mrs[mr_idx_in_pair]) + return ENOMEM; + } + + *mr = ilkey->table[key1][key2].mrs[mr_idx_in_pair]; + + assert((*mr)->addr <= (void *)(unsigned long)addr && + (void *)(unsigned long)addr + len <= + (*mr)->addr + (*mr)->length); + return 0; +} + +int mlx5_get_real_lkey_from_implicit_lkey(struct mlx5_pd *pd, + struct mlx5_implicit_lkey *ilkey, + uint64_t addr, size_t len, + uint32_t *lkey) +{ + struct ibv_mr *mr; + int ret_val = mlx5_get_real_mr_from_implicit_lkey(pd, ilkey, addr, + len, &mr); + + if (ret_val == 0) + *lkey = mr->lkey; + return ret_val; +} + +#define PREFETCH_STRIDE_SIZE (MASK(MR_SIZE-1)) +int mlx5_prefetch_implicit_lkey(struct mlx5_pd *pd, + struct mlx5_implicit_lkey *ilkey, + uint64_t addr, size_t len, int flags) +{ + uint64_t end_addr = addr + len; + if (addr > end_addr) + return EINVAL; + while (addr < end_addr) { + struct ibv_mr *mr; + struct ibv_exp_prefetch_attr attr; + size_t effective_length = MIN(1+PREFETCH_STRIDE_SIZE - + (addr & PREFETCH_STRIDE_SIZE), + end_addr - addr); + int ret_val = mlx5_get_real_mr_from_implicit_lkey(pd, + ilkey, + addr, + effective_length, + &mr); + if (ret_val) + return ret_val; + attr.comp_mask = 0; + attr.addr = (void *)(unsigned long)addr; + attr.length = effective_length; + attr.flags = flags; + + ret_val = ibv_exp_prefetch_mr(mr, &attr); + if (ret_val) + return ret_val; + + addr += effective_length; + } + return 0; +} Index: contrib/ofed/libmlx5/src/list.h =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/list.h @@ -0,0 +1,331 @@ +#ifndef _LINUX_LIST_H +#define _LINUX_LIST_H + +/* + * These are non-NULL pointers that will result in page faults + * under normal circumstances, used to verify 
that nobody uses + * non-initialized list entries. + */ +#define LIST_POISON1 ((void *) 0x00100100) +#define LIST_POISON2 ((void *) 0x00200200) + +/* + * Simple doubly linked list implementation. + * + * Some of the internal functions ("__xxx") are useful when + * manipulating whole lists rather than single entries, as + * sometimes we already know the next/prev entries and we can + * generate better code by using them directly rather than + * using the generic single-entry routines. + */ + +struct list_head { + struct list_head *next, *prev; +}; + +#define LIST_HEAD_INIT(name) { &(name), &(name) } + +#define LIST_HEAD(name) \ + struct list_head name = LIST_HEAD_INIT(name) + +#define INIT_LIST_HEAD(ptr) do { \ + (ptr)->next = (ptr); (ptr)->prev = (ptr); \ +} while (0) + +/* + * Insert a new entry between two known consecutive entries. + * + * This is only for internal list manipulation where we know + * the prev/next entries already! + */ +static inline void __list_add(struct list_head *new, + struct list_head *prev, + struct list_head *next) +{ + next->prev = new; + new->next = next; + new->prev = prev; + prev->next = new; +} + +/** + * list_add - add a new entry + * @new: new entry to be added + * @head: list head to add it after + * + * Insert a new entry after the specified head. + * This is good for implementing stacks. + */ +static inline void list_add(struct list_head *new, struct list_head *head) +{ + __list_add(new, head, head->next); +} + +/** + * list_add_tail - add a new entry + * @new: new entry to be added + * @head: list head to add it before + * + * Insert a new entry before the specified head. + * This is useful for implementing queues. + */ +static inline void list_add_tail(struct list_head *new, struct list_head *head) +{ + __list_add(new, head->prev, head); +} + +/* + * Delete a list entry by making the prev/next entries + * point to each other. + * + * This is only for internal list manipulation where we know + * the prev/next entries already! + */ +static inline void __list_del(struct list_head *prev, struct list_head *next) +{ + next->prev = prev; + prev->next = next; +} + +/** + * list_del - deletes entry from list. + * @entry: the element to delete from the list. + * Note: list_empty on entry does not return true after this, the entry is + * in an undefined state. + */ +static inline void list_del(struct list_head *entry) +{ + __list_del(entry->prev, entry->next); + entry->next = LIST_POISON1; + entry->prev = LIST_POISON2; +} + +/** + * list_del_init - deletes entry from list and reinitialize it. + * @entry: the element to delete from the list. + */ +static inline void list_del_init(struct list_head *entry) +{ + __list_del(entry->prev, entry->next); + INIT_LIST_HEAD(entry); +} + +/** + * list_move - delete from one list and add as another's head + * @list: the entry to move + * @head: the head that will precede our entry + */ +static inline void list_move(struct list_head *list, struct list_head *head) +{ + __list_del(list->prev, list->next); + list_add(list, head); +} + +/** + * list_move_tail - delete from one list and add as another's tail + * @list: the entry to move + * @head: the head that will follow our entry + */ +static inline void list_move_tail(struct list_head *list, + struct list_head *head) +{ + __list_del(list->prev, list->next); + list_add_tail(list, head); +} + +/** + * list_empty - tests whether a list is empty + * @head: the list to test. 
+ */ +static inline int list_empty(const struct list_head *head) +{ + return head->next == head; +} + +/** + * list_empty_careful - tests whether a list is + * empty _and_ checks that no other CPU might be + * in the process of still modifying either member + * + * NOTE: using list_empty_careful() without synchronization + * can only be safe if the only activity that can happen + * to the list entry is list_del_init(). Eg. it cannot be used + * if another CPU could re-list_add() it. + * + * @head: the list to test. + */ +static inline int list_empty_careful(const struct list_head *head) +{ + struct list_head *next = head->next; + return (next == head) && (next == head->prev); +} + +static inline void __list_splice(struct list_head *list, + struct list_head *head) +{ + struct list_head *first = list->next; + struct list_head *last = list->prev; + struct list_head *at = head->next; + + first->prev = head; + head->next = first; + + last->next = at; + at->prev = last; +} + +/** + * list_splice - join two lists + * @list: the new list to add. + * @head: the place to add it in the first list. + */ +static inline void list_splice(struct list_head *list, struct list_head *head) +{ + if (!list_empty(list)) + __list_splice(list, head); +} + +/** + * list_splice_init - join two lists and reinitialise the emptied list. + * @list: the new list to add. + * @head: the place to add it in the first list. + * + * The list at @list is reinitialised + */ +static inline void list_splice_init(struct list_head *list, + struct list_head *head) +{ + if (!list_empty(list)) { + __list_splice(list, head); + INIT_LIST_HEAD(list); + } +} + +#ifndef offsetof +#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER) +#endif + +/** + * container_of - cast a member of a structure out to the containing structure + * + * @ptr: the pointer to the member. + * @type: the type of the container struct this is embedded in. + * @member: the name of the member within the struct. + * + */ +#ifndef container_of +#define container_of(ptr, type, member) ({ \ + const typeof(((type *)0)->member)*__mptr = (ptr); \ + (type *)((char *)__mptr - offsetof(type, member)); }) +#endif + + +/** + * list_entry - get the struct for this entry + * @ptr: the &struct list_head pointer. + * @type: the type of the struct this is embedded in. + * @member: the name of the list_struct within the struct. + */ +#define list_entry(ptr, type, member) \ + container_of(ptr, type, member) + +/** + * list_for_each - iterate over a list + * @pos: the &struct list_head to use as a loop counter. + * @head: the head for your list. + */ +#define list_for_each(pos, head) \ + for (pos = (head)->next; prefetch(pos->next), pos != (head); \ + pos->next) + +/** + * __list_for_each - iterate over a list + * @pos: the &struct list_head to use as a loop counter. + * @head: the head for your list. + * + * This variant differs from list_for_each() in that it's the + * simplest possible list iteration code, no prefetching is done. + * Use this for code that knows the list to be very short (empty + * or 1 entry) most of the time. + */ +#define __list_for_each(pos, head) \ + for (pos = (head)->next; pos != (head); pos = pos->next) + +/** + * list_for_each_prev - iterate over a list backwards + * @pos: the &struct list_head to use as a loop counter. + * @head: the head for your list. 
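For reference, a short usage sketch (illustrative only, not part of the patch) of this list API: a struct list_head embedded in a user structure, queue-style insertion with list_add_tail(), and iteration that recovers the containing structure with list_entry(). The struct name and the include path are hypothetical.

#include <stdio.h>
#include "list.h"

struct item {
    int value;
    struct list_head node;        /* linkage lives inside the struct */
};

static void list_demo(void)
{
    LIST_HEAD(items);             /* empty circular list             */
    struct item a = { .value = 1 };
    struct item b = { .value = 2 };
    struct list_head *p;

    list_add_tail(&a.node, &items);  /* FIFO order: a, then b        */
    list_add_tail(&b.node, &items);

    __list_for_each(p, &items)       /* plain next-pointer walk      */
        printf("%d\n", list_entry(p, struct item, node)->value);
}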
+ */ +#define list_for_each_prev(pos, head) \ + for (pos = (head)->prev; prefetch(pos->prev), pos != (head); \ + pos = pos->prev) + +/** + * list_for_each_safe - iterate over a list safe against removal of list entry + * @pos: the &struct list_head to use as a loop counter. + * @n: another &struct list_head to use as temporary storage + * @head: the head for your list. + */ +#define list_for_each_safe(pos, n, head) \ + for (pos = (head)->next, n = pos->next; pos != (head); \ + pos = n, n = pos->next) + +/** + * list_for_each_entry - iterate over list of given type + * @pos: the type * to use as a loop counter. + * @head: the head for your list. + * @member: the name of the list_struct within the struct. + */ +#define list_for_each_entry(pos, head, member) \ + for (pos = list_entry((head)->next, typeof(*pos), member); \ + &pos->member != (head); \ + pos = list_entry(pos->member.next, typeof(*pos), member)) + +/** + * list_for_each_entry_reverse - iterate backwards over list of given type. + * @pos: the type * to use as a loop counter. + * @head: the head for your list. + * @member: the name of the list_struct within the struct. + */ +#define list_for_each_entry_reverse(pos, head, member) \ + for (pos = list_entry((head)->prev, typeof(*pos), member); \ + prefetch(pos->member.prev), &pos->member != (head); \ + pos = list_entry(pos->member.prev, typeof(*pos), member)) + +/** + * list_prepare_entry - prepare a pos entry for use as a start point in + * list_for_each_entry_continue + * @pos: the type * to use as a start point + * @head: the head of the list + * @member: the name of the list_struct within the struct. + */ +#define list_prepare_entry(pos, head, member) \ + ((pos) ? : list_entry(head, typeof(*pos), member)) + +/** + * list_for_each_entry_continue - iterate over list of given type + * continuing after existing point + * @pos: the type * to use as a loop counter. + * @head: the head for your list. + * @member: the name of the list_struct within the struct. + */ +#define list_for_each_entry_continue(pos, head, member) \ + for (pos = list_entry(pos->member.next, typeof(*pos), member); \ + prefetch(pos->member.next), &pos->member != (head); \ + pos = list_entry(pos->member.next, typeof(*pos), member)) + +/** + * list_for_each_entry_safe - iterate over list of given type safe against removal of list entry + * @pos: the type * to use as a loop counter. + * @n: another type * to use as temporary storage + * @head: the head for your list. + * @member: the name of the list_struct within the struct. + */ +#define list_for_each_entry_safe(pos, n, head, member) \ + for (pos = list_entry((head)->next, typeof(*pos), member), \ + n = list_entry(pos->member.next, typeof(*pos), member); \ + &pos->member != (head); \ + pos = n, n = list_entry(n->member.next, typeof(*n), member)) + +#endif + Index: contrib/ofed/libmlx5/src/mlx5-abi.h =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/mlx5-abi.h @@ -0,0 +1,409 @@ +/* + * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef MLX5_ABI_H +#define MLX5_ABI_H + +#include + +#define MLX5_UVERBS_MIN_ABI_VERSION 1 +#define MLX5_UVERBS_MAX_ABI_VERSION 1 + +enum { + MLX5_QP_FLAG_SIGNATURE = 1 << 0, +}; + +enum { + MLX5_RWQ_FLAG_SIGNATURE = 1 << 0, +}; + +enum { + MLX5_NUM_UUARS_PER_PAGE = 2, + MLX5_MAX_UAR_PAGES = 1 << 8, + MLX5_MAX_UUARS = MLX5_MAX_UAR_PAGES * MLX5_NUM_UUARS_PER_PAGE, + MLX5_DEF_TOT_UUARS = 8 * MLX5_NUM_UUARS_PER_PAGE, +}; + +struct mlx5_alloc_ucontext { + struct ibv_get_context ibv_req; + __u32 total_num_uuars; + __u32 num_low_latency_uuars; + __u32 flags; + __u32 reserved; +}; + +struct mlx5_alloc_ucontext_resp { + struct ibv_get_context_resp ibv_resp; + __u32 qp_tab_size; + __u32 bf_reg_size; + __u32 tot_uuars; + __u32 cache_line_size; + __u16 max_sq_desc_sz; + __u16 max_rq_desc_sz; + __u32 max_send_wqebb; + __u32 max_recv_wr; + __u32 max_srq_recv_wr; + __u16 num_ports; + __u16 reserved; + __u32 max_desc_sz_sq_dc; + __u32 atomic_sizes_dc; + __u32 reserved1; + __u32 flags; + __u32 reserved2[5]; +}; + +enum mlx5_exp_alloc_context_resp_mask { + MLX5_EXP_ALLOC_CTX_RESP_MASK_CQE_COMP_MAX_NUM = 1 << 0, + MLX5_EXP_ALLOC_CTX_RESP_MASK_CQE_VERSION = 1 << 1, + MLX5_EXP_ALLOC_CTX_RESP_MASK_RROCE_UDP_SPORT_MIN = 1 << 2, + MLX5_EXP_ALLOC_CTX_RESP_MASK_RROCE_UDP_SPORT_MAX = 1 << 3, + MLX5_EXP_ALLOC_CTX_RESP_MASK_HCA_CORE_CLOCK_OFFSET = 1 << 4, +}; + +struct mlx5_exp_alloc_ucontext_data_resp { + __u32 comp_mask; /* use mlx5_exp_alloc_context_resp_mask */ + __u16 cqe_comp_max_num; + __u8 cqe_version; + __u8 reserved; + __u16 rroce_udp_sport_min; + __u16 rroce_udp_sport_max; + __u32 hca_core_clock_offset; +}; + +struct mlx5_exp_alloc_ucontext_resp { + struct ibv_get_context_resp ibv_resp; + __u32 qp_tab_size; + __u32 bf_reg_size; + __u32 tot_uuars; + __u32 cache_line_size; + __u16 max_sq_desc_sz; + __u16 max_rq_desc_sz; + __u32 max_send_wqebb; + __u32 max_recv_wr; + __u32 max_srq_recv_wr; + __u16 num_ports; + __u16 reserved; + __u32 max_desc_sz_sq_dc; + __u32 atomic_sizes_dc; + __u32 reseved1; + __u32 flags; + __u32 reserved2[5]; + /* Some more reserved fields for future growth of + * mlx5_alloc_ucontext_resp */ + __u64 prefix_reserved[8]; + + struct mlx5_exp_alloc_ucontext_data_resp exp_data; +}; + +struct mlx5_alloc_pd_resp { + struct ibv_alloc_pd_resp ibv_resp; + __u32 pdn; 
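For reference, a minimal sketch (illustrative only, not part of the patch) of the comp_mask convention used by the experimental response structures above: a response field is only meaningful when the kernel has set the matching bit in comp_mask, so consumers are expected to test the bit before reading the field. The struct and enum names mirror this header; the helper name and the fallback value are hypothetical.

static inline uint8_t resp_cqe_version(const struct mlx5_exp_alloc_ucontext_data_resp *resp)
{
    if (resp->comp_mask & MLX5_EXP_ALLOC_CTX_RESP_MASK_CQE_VERSION)
        return resp->cqe_version;
    return 0;  /* bit not set: field not reported by the kernel */
}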
+}; + +struct mlx5_create_cq { + struct ibv_create_cq ibv_cmd; + __u64 buf_addr; + __u64 db_addr; + __u32 cqe_size; +}; + +struct mlx5_create_cq_resp { + struct ibv_create_cq_resp ibv_resp; + __u32 cqn; + __u32 reserved; +}; + +enum mlx5_exp_creaet_cq_mask { + MLX5_EXP_CREATE_CQ_MASK_CQE_COMP_EN = 1 << 0, + MLX5_EXP_CREATE_CQ_MASK_CQE_COMP_RECV_TYPE = 1 << 1, + MLX5_EXP_CREATE_CQ_MASK_RESERVED = 1 << 2, +}; + +enum mlx5_exp_cqe_comp_recv_type { + MLX5_CQE_FORMAT_HASH, + MLX5_CQE_FORMAT_CSUM, +}; + +struct mlx5_exp_create_cq_data { + __u32 comp_mask; /* use mlx5_exp_creaet_cq_mask */ + __u8 cqe_comp_en; + __u8 cqe_comp_recv_type; /* use mlx5_exp_cqe_comp_recv_type */ + __u16 reserved; +}; + +struct mlx5_exp_create_cq { + struct ibv_exp_create_cq ibv_cmd; + __u64 buf_addr; + __u64 db_addr; + __u32 cqe_size; + __u32 reserved; + /* Some more reserved fields for future growth of mlx5_create_cq */ + __u64 prefix_reserved[8]; + + /* sizeof prefix aligned with mlx5_create_cq */ + __u64 size_of_prefix; + + struct mlx5_exp_create_cq_data exp_data; +}; + +struct mlx5_create_srq { + struct ibv_create_srq ibv_cmd; + __u64 buf_addr; + __u64 db_addr; + __u32 flags; +}; + +struct mlx5_create_srq_resp { + struct ibv_create_srq_resp ibv_resp; + __u32 srqn; + __u32 reserved; +}; + +struct mlx5_create_srq_ex { + struct ibv_create_xsrq ibv_cmd; + __u64 buf_addr; + __u64 db_addr; + __u32 flags; + __u32 reserved; + __u32 uidx; + __u32 reserved1; +}; + +struct mlx5_drv_create_qp { + __u64 buf_addr; + __u64 db_addr; + __u32 sq_wqe_count; + __u32 rq_wqe_count; + __u32 rq_wqe_shift; + __u32 flags; +}; + +enum mlx5_exp_drv_create_qp_mask { + MLX5_EXP_CREATE_QP_MASK_UIDX = 1 << 0, + MLX5_EXP_CREATE_QP_MASK_SQ_BUFF_ADD = 1 << 1, + MLX5_EXP_CREATE_QP_MASK_WC_UAR_IDX = 1 << 2, + MLX5_EXP_CREATE_QP_MASK_FLAGS_IDX = 1 << 3, + MLX5_EXP_CREATE_QP_MASK_RESERVED = 1 << 4, +}; + +enum mlx5_exp_create_qp_flags { + MLX5_EXP_CREATE_QP_MULTI_PACKET_WQE_REQ_FLAG = 1 << 0, +}; + +enum mlx5_exp_drv_create_qp_uar_idx { + MLX5_EXP_CREATE_QP_DB_ONLY_UUAR = -1 +}; + +struct mlx5_exp_drv_create_qp_data { + __u32 comp_mask; /* use mlx5_exp_ib_create_qp_mask */ + __u32 uidx; + __u64 sq_buf_addr; + __u32 wc_uar_index; + __u32 flags; /* use mlx5_exp_create_qp_flags */ +}; + +struct mlx5_exp_drv_create_qp { + /* To allow casting to mlx5_drv_create_qp the prefix is the same as + * struct mlx5_drv_create_qp prefix + */ + __u64 buf_addr; + __u64 db_addr; + __u32 sq_wqe_count; + __u32 rq_wqe_count; + __u32 rq_wqe_shift; + __u32 flags; + + /* Some more reserved fields for future growth of mlx5_drv_create_qp */ + __u64 prefix_reserved[8]; + + /* sizeof prefix aligned with mlx5_drv_create_qp */ + __u64 size_of_prefix; + + /* Experimental data + * Add new experimental data only inside the exp struct + */ + struct mlx5_exp_drv_create_qp_data exp; +}; + +struct mlx5_create_qp { + struct ibv_create_qp ibv_cmd; + struct mlx5_drv_create_qp drv; +}; + +enum { + MLX5_EXP_INVALID_UUAR = (-1), +}; + +struct mlx5_create_qp_resp { + struct ibv_create_qp_resp ibv_resp; + __u32 uuar_index; + __u32 rsvd; +}; + +struct mlx5_exp_create_qp { + struct ibv_exp_create_qp ibv_cmd; + struct mlx5_exp_drv_create_qp drv; +}; + +enum mlx5_exp_drv_create_qp_resp_mask { + MLX5_EXP_CREATE_QP_RESP_MASK_FLAGS_IDX = 1 << 0, + MLX5_EXP_CREATE_QP_RESP_MASK_RESERVED = 1 << 1, +}; + +enum mlx5_exp_create_qp_resp_flags { + MLX5_EXP_CREATE_QP_RESP_MULTI_PACKET_WQE_FLAG = 1 << 0, +}; + +struct mlx5_exp_drv_create_qp_resp_data { + __u32 comp_mask; /* use mlx5_exp_drv_create_qp_resp_mask */ + 
__u32 flags; /* use mlx5_exp_create_qp_resp_flags */ +}; + + +struct mlx5_exp_create_qp_resp { + struct ibv_exp_create_qp_resp ibv_resp; + __u32 uuar_index; + __u32 rsvd; + + /* Some more reserved fields for future growth of create qp resp */ + __u64 prefix_reserved[8]; + + /* sizeof prefix aligned with create qp resp */ + __u64 size_of_prefix; + + /* Experimental data + * Add new experimental data only inside the exp struct + */ + struct mlx5_exp_drv_create_qp_resp_data exp; +}; + +struct mlx5_exp_drv_create_wq { + __u64 buf_addr; + __u64 db_addr; + __u32 rq_wqe_count; + __u32 rq_wqe_shift; + __u32 user_index; + __u32 flags; +}; + +struct mlx5_exp_create_wq { + struct ibv_exp_create_wq ibv_cmd; + struct mlx5_exp_drv_create_wq drv; +}; + +struct mlx5_exp_create_wq_resp { + struct ibv_exp_create_wq_resp ibv_resp; +}; + +struct mlx5_exp_modify_wq { + struct ib_exp_modify_wq ibv_cmd; +}; + +struct mlx5_exp_create_rwq_ind_table_resp { + struct ibv_exp_create_rwq_ind_table_resp ibv_resp; +}; + +struct mlx5_exp_destroy_rwq_ind_table { + struct ibv_exp_destroy_rwq_ind_table ibv_cmd; +}; + +struct mlx5_resize_cq { + struct ibv_resize_cq ibv_cmd; + __u64 buf_addr; + __u16 cqe_size; + __u16 reserved0; + __u32 reserved1; +}; + +struct mlx5_resize_cq_resp { + struct ibv_resize_cq_resp ibv_resp; +}; + +struct mlx5_drv_create_dct { + __u32 uidx; + __u32 reserved; +}; + +struct mlx5_create_dct { + struct ibv_exp_create_dct ibv_cmd; + struct mlx5_drv_create_dct drv; +}; + +struct mlx5_create_dct_resp { + struct ibv_exp_create_dct_resp ibv_resp; +}; + +struct mlx5_destroy_dct { + struct ibv_exp_destroy_dct ibv_cmd; +}; + +struct mlx5_destroy_dct_resp { + struct ibv_exp_destroy_dct_resp ibv_resp; +}; + +struct mlx5_query_dct { + struct ibv_exp_query_dct ibv_cmd; +}; + +struct mlx5_query_dct_resp { + struct ibv_exp_query_dct_resp ibv_resp; +}; + +struct mlx5_arm_dct { + struct ibv_exp_arm_dct ibv_cmd; + __u64 reserved0; + __u64 reserved1; +}; + +struct mlx5_arm_dct_resp { + struct ibv_exp_arm_dct_resp ibv_resp; + __u64 reserved0; + __u64 reserved1; +}; + +struct mlx5_query_mkey { + struct ibv_exp_query_mkey ibv_cmd; +}; + +struct mlx5_query_mkey_resp { + struct ibv_exp_query_mkey_resp ibv_resp; +}; + +struct mlx5_create_mr { + struct ibv_exp_create_mr ibv_cmd; +}; + +struct mlx5_create_mr_resp { + struct ibv_exp_create_mr_resp ibv_resp; +}; + +#endif /* MLX5_ABI_H */ Index: contrib/ofed/libmlx5/src/mlx5.h =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/mlx5.h @@ -0,0 +1,1291 @@ +/* + * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef MLX5_H +#define MLX5_H + +#include +#include +#include + +#include +#include +#include +#include +#include "mlx5-abi.h" +#include "list.h" +#include "bitmap.h" +#include "implicit_lkey.h" +#include "wqe.h" + +#ifdef __GNUC__ +#define likely(x) __builtin_expect((x), 1) +#define unlikely(x) __builtin_expect((x), 0) +#endif + +#ifndef uninitialized_var +#define uninitialized_var(x) x = x +#endif + +#ifdef HAVE_VALGRIND_MEMCHECK_H + +# include + +# if !defined(VALGRIND_MAKE_MEM_DEFINED) || !defined(VALGRIND_MAKE_MEM_UNDEFINED) +# warning "Valgrind support requested, but VALGRIND_MAKE_MEM_(UN)DEFINED not available" +# endif + +#endif /* HAVE_VALGRIND_MEMCHECK_H */ + +#ifndef VALGRIND_MAKE_MEM_DEFINED +# define VALGRIND_MAKE_MEM_DEFINED(addr, len) +#endif + +#ifndef VALGRIND_MAKE_MEM_UNDEFINED +# define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) +#endif + +#ifndef rmb +# define rmb() mb() +#endif + +#ifndef wmb +# define wmb() mb() +#endif + +#ifndef wc_wmb + +#if defined(__i386__) +#define wc_wmb() asm volatile("lock; addl $0, 0(%%esp) " ::: "memory") +#elif defined(__x86_64__) +#define wc_wmb() asm volatile("sfence" ::: "memory") +#elif defined(__ia64__) +#define wc_wmb() asm volatile("fwb" ::: "memory") +#else +#define wc_wmb() wmb() +#endif + +#endif + +#define MLX5_GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__) + +#if MLX5_GCC_VERSION >= 403 +# define __MLX5_ALGN_F__ __attribute__((noinline, aligned(64))) +# define __MLX5_ALGN_D__ __attribute__((aligned(64))) +#else +# define __MLX5_ALGN_F__ +# define __MLX5_ALGN_D__ +#endif + +#ifndef min +#define min(a, b) \ + ({ typeof(a) _a = (a); \ + typeof(b) _b = (b); \ + _a < _b ? _a : _b; }) +#endif + +#ifndef max +#define max(a, b) \ + ({ typeof(a) _a = (a); \ + typeof(b) _b = (b); \ + _a > _b ? 
_a : _b; }) +#endif + +#define HIDDEN __attribute__((visibility("hidden"))) + +#define PFX "mlx5: " + +#define MLX5_MAX_PORTS_NUM 2 + +enum { + MLX5_MAX_CQ_FAMILY_VER = 1, + MLX5_MAX_QP_BURST_FAMILY_VER = 0, + MLX5_MAX_WQ_FAMILY_VER = 0 +}; + +enum { + MLX5_IB_MMAP_CMD_SHIFT = 8, + MLX5_IB_MMAP_CMD_MASK = 0xff, +}; + +enum { + MLX5_QP_PATTERN = 0x012389AB, + MLX5_CQ_PATTERN = 0x4567CDEF, + MLX5_WQ_PATTERN = 0x89AB0123 +}; + +enum mlx5_lock_type { + MLX5_SPIN_LOCK = 0, + MLX5_MUTEX = 1, +}; + +enum mlx5_lock_state { + MLX5_USE_LOCK, + MLX5_LOCKED, + MLX5_UNLOCKED +}; + +enum { + MLX5_MMAP_GET_REGULAR_PAGES_CMD = 0, + MLX5_MMAP_GET_CONTIGUOUS_PAGES_CMD = 1, + MLX5_MMAP_GET_WC_PAGES_CMD = 2, + MLX5_MMAP_GET_NC_PAGES_CMD = 3, + MLX5_MMAP_MAP_DC_INFO_PAGE = 4, + + /* Use EXP mmap commands until it is pushed to upstream */ + MLX5_EXP_MMAP_GET_CORE_CLOCK_CMD = 0xFB, + MLX5_EXP_MMAP_GET_CONTIGUOUS_PAGES_CPU_NUMA_CMD = 0xFC, + MLX5_EXP_MMAP_GET_CONTIGUOUS_PAGES_DEV_NUMA_CMD = 0xFD, + MLX5_EXP_IB_MMAP_N_ALLOC_WC_CMD = 0xFE, +}; + +#define MLX5_CQ_PREFIX "MLX_CQ" +#define MLX5_QP_PREFIX "MLX_QP" +#define MLX5_MR_PREFIX "MLX_MR" +#define MLX5_RWQ_PREFIX "MLX_RWQ" +#define MLX5_MAX_LOG2_CONTIG_BLOCK_SIZE 23 +#define MLX5_MIN_LOG2_CONTIG_BLOCK_SIZE 12 + +enum { + MLX5_DBG_QP = 1 << 0, + MLX5_DBG_CQ = 1 << 1, + MLX5_DBG_QP_SEND = 1 << 2, + MLX5_DBG_QP_SEND_ERR = 1 << 3, + MLX5_DBG_CQ_CQE = 1 << 4, + MLX5_DBG_CONTIG = 1 << 5, +}; + +enum { + MLX5_UMR_PTR_ALIGN = 2048, +}; + +extern uint32_t mlx5_debug_mask; +extern int mlx5_freeze_on_error_cqe; + +#ifdef MLX5_DEBUG +#define mlx5_dbg(fp, mask, format, arg...) \ +do { \ + if (mask & mlx5_debug_mask) \ + fprintf(fp, "%s:%d: " format, __func__, __LINE__, ##arg); \ +} while (0) + +#else + #define mlx5_dbg(fp, mask, format, arg...) 
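For reference, a short usage sketch (illustrative only, not part of the patch) of the mlx5_dbg() macro defined above: output goes to the given FILE * only when the matching bit is set in mlx5_debug_mask, and when MLX5_DEBUG is not defined the call expands to nothing, so it has no cost in production builds. The wrapper function and its arguments are hypothetical.

static void cq_debug_example(FILE *dbg_fp, uint32_t cqn)
{
    /* Printed only when MLX5_DEBUG is defined and (mlx5_debug_mask & MLX5_DBG_CQ). */
    mlx5_dbg(dbg_fp, MLX5_DBG_CQ, "created CQ 0x%x\n", cqn);
}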
+#endif + +enum { + MLX5_RCV_DBR = 0, + MLX5_SND_DBR = 1, +}; + +enum { + MLX5_STAT_RATE_OFFSET = 5 +}; + +enum { + MLX5_QP_TABLE_SHIFT = 12, + MLX5_QP_TABLE_MASK = (1 << MLX5_QP_TABLE_SHIFT) - 1, + MLX5_QP_TABLE_SIZE = 1 << (24 - MLX5_QP_TABLE_SHIFT), +}; + +enum { + MLX5_SRQ_TABLE_SHIFT = 12, + MLX5_SRQ_TABLE_MASK = (1 << MLX5_SRQ_TABLE_SHIFT) - 1, + MLX5_SRQ_TABLE_SIZE = 1 << (24 - MLX5_SRQ_TABLE_SHIFT), +}; + +enum { + MLX5_DCT_TABLE_SHIFT = 12, + MLX5_DCT_TABLE_MASK = (1 << MLX5_DCT_TABLE_SHIFT) - 1, + MLX5_DCT_TABLE_SIZE = 1 << (24 - MLX5_DCT_TABLE_SHIFT), +}; + +enum { + MLX5_SEND_WQE_BB = 64, + MLX5_SEND_WQE_SHIFT = 6, +}; + +enum { + MLX5_BF_OFFSET = 0x800 +}; + +enum { + MLX5_INLINE_SCATTER_32 = 0x4, + MLX5_INLINE_SCATTER_64 = 0x8, +}; + +enum { + MLX5_OPCODE_NOP = 0x00, + MLX5_OPCODE_SEND_INVAL = 0x01, + MLX5_OPCODE_RDMA_WRITE = 0x08, + MLX5_OPCODE_RDMA_WRITE_IMM = 0x09, + MLX5_OPCODE_SEND = 0x0a, + MLX5_OPCODE_SEND_IMM = 0x0b, + MLX5_OPCODE_LSO_MPW = 0x0e, + MLX5_OPC_MOD_MPW = 0x01, /* OPC_MOD for LSO_MPW opcode */ + MLX5_OPCODE_RDMA_READ = 0x10, + MLX5_OPCODE_ATOMIC_CS = 0x11, + MLX5_OPCODE_ATOMIC_FA = 0x12, + MLX5_OPCODE_ATOMIC_MASKED_CS = 0x14, + MLX5_OPCODE_ATOMIC_MASKED_FA = 0x15, + MLX5_OPCODE_BIND_MW = 0x18, + MLX5_OPCODE_FMR = 0x19, + MLX5_OPCODE_LOCAL_INVAL = 0x1b, + MLX5_OPCODE_CONFIG_CMD = 0x1f, + + MLX5_OPCODE_SEND_ENABLE = 0x17, + MLX5_OPCODE_RECV_ENABLE = 0x16, + MLX5_OPCODE_CQE_WAIT = 0x0f, + MLX5_OPCODE_UMR = 0x25, + + MLX5_RECV_OPCODE_RDMA_WRITE_IMM = 0x00, + MLX5_RECV_OPCODE_SEND = 0x01, + MLX5_RECV_OPCODE_SEND_IMM = 0x02, + MLX5_RECV_OPCODE_SEND_INVAL = 0x03, + + MLX5_CQE_OPCODE_ERROR = 0x1e, + MLX5_CQE_OPCODE_RESIZE = 0x16, +}; + +enum { + MLX5_SRQ_FLAG_SIGNATURE = 1 << 0, +}; + +enum { + MLX5_INLINE_SEG = 0x80000000, +}; + +enum mlx5_alloc_type { + MLX5_ALLOC_TYPE_ANON, + MLX5_ALLOC_TYPE_HUGE, + MLX5_ALLOC_TYPE_CONTIG, + MLX5_ALLOC_TYPE_PREFER_HUGE, + MLX5_ALLOC_TYPE_PREFER_CONTIG, + MLX5_ALLOC_TYPE_ALL +}; + +enum mlx5_mr_type { + MLX5_NORMAL_MR = 0x0, + MLX5_ODP_MR = 0x1, +}; + +struct mlx5_device { + struct verbs_device verbs_dev; + int page_size; + + struct { + unsigned id; + unsigned short rev; + } devid; + int driver_abi_ver; +}; + +enum mlx5_rsc_type { + MLX5_RSC_TYPE_QP, + MLX5_RSC_TYPE_DCT, + MLX5_RSC_TYPE_RWQ, + MLX5_RSC_TYPE_MP_RWQ, + MLX5_RSC_TYPE_XSRQ, + MLX5_RSC_TYPE_SRQ, + MLX5_RSC_TYPE_INVAL, +}; + +struct mlx5_resource { + enum mlx5_rsc_type type; + uint32_t rsn; +}; + +struct mlx5_db_page; + +struct mlx5_lock { + pthread_mutex_t mutex; + pthread_spinlock_t slock; + enum mlx5_lock_state state; + enum mlx5_lock_type type; +}; + +struct mlx5_spinlock { + pthread_spinlock_t lock; + enum mlx5_lock_state state; +}; + +struct mlx5_atomic_info { + int valid; + enum ibv_exp_atomic_cap exp_atomic_cap; + uint64_t bit_mask_log_atomic_arg_sizes; +}; + +enum mlx5_uar_mapping_type { + MLX5_UAR_MAP_WC, + MLX5_UAR_MAP_NC +}; +struct mlx5_uar_data { + enum mlx5_uar_mapping_type map_type; + void *regs; +}; + +struct mlx5_port_info_ctx { + unsigned consumer; + int steady; +}; + +struct mlx5_info_ctx { + void *buf; + struct mlx5_port_info_ctx port[2]; +}; + +struct mlx5_context { + struct ibv_context ibv_ctx; + int max_num_qps; + int bf_reg_size; + int tot_uuars; + int low_lat_uuars; + int bf_regs_per_page; + int num_bf_regs; + int prefer_bf; + int shut_up_bf; + int enable_cqe_comp; + struct { + struct mlx5_resource **table; + int refcnt; + } rsc_table[MLX5_QP_TABLE_SIZE]; + pthread_mutex_t rsc_table_mutex; + + struct { + struct mlx5_srq **table; + int refcnt; 
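For reference, a minimal sketch (illustrative only, not part of the patch) of how a two-level table such as rsc_table above is typically consulted: a 24-bit resource number is split into a 12-bit sub-table index and a 12-bit offset with MLX5_QP_TABLE_SHIFT/MASK, and a sub-table is only dereferenced while its refcnt is non-zero. This is presumably what lookup helpers such as find_rsc(), used by the CQ polling code earlier, rely on; the function name here is hypothetical.

static struct mlx5_resource *rsc_lookup_sketch(struct mlx5_context *ctx,
                                               uint32_t rsn)
{
    uint32_t tind = rsn >> MLX5_QP_TABLE_SHIFT;   /* which sub-table      */

    if (!ctx->rsc_table[tind].refcnt)             /* sub-table not in use */
        return NULL;

    return ctx->rsc_table[tind].table[rsn & MLX5_QP_TABLE_MASK];
}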
+ } srq_table[MLX5_SRQ_TABLE_SIZE]; + pthread_mutex_t srq_table_mutex; + + struct { + struct mlx5_resource **table; + int refcnt; + } uidx_table[MLX5_QP_TABLE_SIZE]; + pthread_mutex_t uidx_table_mutex; + + struct mlx5_uar_data uar[MLX5_MAX_UAR_PAGES]; + + struct mlx5_spinlock send_db_lock; /* protects send_db_list and send_db_num_uars */ + struct list_head send_wc_db_list; + unsigned int num_wc_uars; + int max_ctx_res_domain; + + struct mlx5_lock lock32; + struct mlx5_db_page *db_list; + pthread_mutex_t db_list_mutex; + int cache_line_size; + int max_sq_desc_sz; + int max_rq_desc_sz; + int max_send_wqebb; + int max_recv_wr; + unsigned max_srq_recv_wr; + int num_ports; + int stall_enable; + int stall_adaptive_enable; + int stall_cycles; + struct mlx5_bf *bfs; + FILE *dbg_fp; + char hostname[40]; + struct mlx5_spinlock hugetlb_lock; + struct list_head hugetlb_list; + int max_desc_sz_sq_dc; + uint32_t atomic_sizes_dc; + pthread_mutex_t task_mutex; + struct mlx5_atomic_info info; + int max_sge; + uint32_t max_send_wqe_inline_klms; + pthread_mutex_t env_mtx; + int env_initialized; + int compact_av; + int numa_id; + struct mlx5_info_ctx cc; + uint8_t cqe_version; + uint16_t cqe_comp_max_num; + uint16_t rroce_udp_sport_min; + uint16_t rroce_udp_sport_max; + struct { + uint8_t valid; + uint8_t link_layer; + enum ibv_port_cap_flags caps; + } port_query_cache[MLX5_MAX_PORTS_NUM]; + struct { + uint64_t offset; + uint64_t mask; + uint32_t mult; + uint8_t shift; + } core_clock; + void *hca_core_clock; +}; + +struct mlx5_bitmap { + uint32_t last; + uint32_t top; + uint32_t max; + uint32_t avail; + uint32_t mask; + unsigned long *table; +}; + +struct mlx5_hugetlb_mem { + int shmid; + void *shmaddr; + struct mlx5_bitmap bitmap; + struct list_head list; +}; + +struct mlx5_numa_req { + int valid; + int numa_id; +}; +struct mlx5_buf { + void *buf; + size_t length; + int base; + struct mlx5_hugetlb_mem *hmem; + enum mlx5_alloc_type type; + struct mlx5_numa_req numa_req; + int numa_alloc; +}; + +struct mlx5_pd { + struct ibv_pd ibv_pd; + uint32_t pdn; + struct mlx5_implicit_lkey r_ilkey; + struct mlx5_implicit_lkey w_ilkey; + struct mlx5_implicit_lkey *remote_ilkey; +}; + +enum { + MLX5_CQ_SET_CI = 0, + MLX5_CQ_ARM_DB = 1, +}; + +enum mlx5_cq_model_flags { + /* + * When set the CQ API must be thread safe. + * When reset application is taking care + * to sync between CQ API calls. 
+ */ + MLX5_CQ_MODEL_FLAG_THREAD_SAFE = 1 << 0, +}; + +enum mlx5_cq_creation_flags { + /* When set, CQ supports timestamping */ + MLX5_CQ_CREATION_FLAG_COMPLETION_TIMESTAMP = 1 << 0, +}; + +struct mlx5_cq { + struct ibv_cq ibv_cq; + uint32_t creation_flags; + uint32_t pattern; + struct mlx5_buf buf_a; + struct mlx5_buf buf_b; + struct mlx5_buf *active_buf; + struct mlx5_buf *resize_buf; + int resize_cqes; + int active_cqes; + struct mlx5_lock lock; + uint32_t cqn; + uint32_t cons_index; + uint32_t wait_index; + uint32_t wait_count; + uint32_t *dbrec; + int arm_sn; + int cqe_sz; + int resize_cqe_sz; + int stall_next_poll; + int stall_enable; + uint64_t stall_last_count; + int stall_adaptive_enable; + int stall_cycles; + uint8_t model_flags; /* use mlx5_cq_model_flags */ + uint16_t cqe_comp_max_num; + uint8_t cq_log_size; +}; + +struct mlx5_srq { + struct mlx5_resource rsc; /* This struct must be first */ + struct verbs_srq vsrq; + struct mlx5_buf buf; + struct mlx5_spinlock lock; + uint64_t *wrid; + uint32_t srqn; + int max; + int max_gs; + int wqe_shift; + int head; + int tail; + uint32_t *db; + uint16_t counter; + int wq_sig; + struct ibv_srq_legacy *ibv_srq_legacy; + int is_xsrq; +}; + +struct wr_list { + uint16_t opcode; + uint16_t next; +}; + +struct mlx5_wq { + /* common hot data */ + uint64_t *wrid; + unsigned wqe_cnt; + unsigned head; + unsigned tail; + unsigned max_post; + int max_gs; + struct mlx5_lock lock; + /* post_recv hot data */ + void *buff; + uint32_t *db; + int wqe_shift; + int offset; +}; + +struct mlx5_wq_recv_send_enable { + unsigned head_en_index; + unsigned head_en_count; +}; + +enum mlx5_db_method { + MLX5_DB_METHOD_DEDIC_BF_1_THREAD, + MLX5_DB_METHOD_DEDIC_BF, + MLX5_DB_METHOD_BF, + MLX5_DB_METHOD_DB +}; + +struct mlx5_bf { + void *reg; + int need_lock; + /* + * Protect usage of BF address field including data written to the BF + * and the BF buffer toggling. + */ + struct mlx5_lock lock; + unsigned offset; + unsigned buf_size; + unsigned uuarn; + enum mlx5_db_method db_method; +}; + +struct mlx5_mr { + struct ibv_mr ibv_mr; + struct mlx5_buf buf; + uint64_t alloc_flags; + enum mlx5_mr_type type; +}; + +enum mlx5_qp_model_flags { + /* + * When set the QP API must be thread safe. + * When reset application is taking care + * to sync between QP API calls. 
+ */ + MLX5_QP_MODEL_FLAG_THREAD_SAFE = 1 << 0, + MLX5_QP_MODEL_MULTI_PACKET_WQE = 1 << 1, + MLX5_QP_MODEL_RX_CSUM_IP_OK_IP_NON_TCP_UDP = 1 << 2, +}; + +struct mlx5_qp; +struct general_data_hot { + /* post_send hot data */ + unsigned *wqe_head; + int (*post_send_one)(struct ibv_exp_send_wr *wr, + struct mlx5_qp *qp, + uint64_t exp_send_flags, + void *seg, int *total_size); + void *sqstart; + void *sqend; + uint32_t *db; + struct mlx5_bf *bf; + uint32_t scur_post; + /* Used for burst_family interface, keeps the last posted wqe */ + uint32_t last_post; + uint16_t create_flags; + uint8_t fm_cache; + uint8_t model_flags; /* use mlx5_qp_model_flags */ +}; +enum mpw_states { + MLX5_MPW_STATE_CLOSED, + MLX5_MPW_STATE_OPENED, + MLX5_MPW_STATE_OPENED_INL, + MLX5_MPW_STATE_OPENING, +}; +enum { + MLX5_MAX_MPW_SGE = 5, + MLX5_MAX_MPW_SIZE = 0x3FFF +}; +struct mpw_data { + uint8_t state; /* use mpw_states */ + uint8_t size; + uint8_t num_sge; + uint32_t len; + uint32_t total_len; + uint32_t flags; + uint32_t scur_post; + union { + struct mlx5_wqe_data_seg *last_dseg; + uint8_t *inl_data; + }; + uint32_t *ctrl_update; +}; +struct general_data_warm { + uint32_t pattern; + uint8_t qp_type; +}; +struct odp_data { + struct mlx5_pd *pd; +}; +struct data_seg_data { + uint32_t max_inline_data; +}; +struct ctrl_seg_data { + uint32_t qp_num; + uint8_t fm_ce_se_tbl[8]; + uint8_t fm_ce_se_acc[32]; + uint8_t wq_sig; +}; +struct mlx5_qp { + struct mlx5_resource rsc; + struct verbs_qp verbs_qp; + struct mlx5_buf buf; + int buf_size; + /* For Raw Ethernet QP, use different Buffer for the SQ and RQ */ + struct mlx5_buf sq_buf; + int sq_buf_size; + uint8_t sq_signal_bits; + int umr_en; + + /* hot data used on data path */ + struct mlx5_wq rq __MLX5_ALGN_D__; + struct mlx5_wq sq __MLX5_ALGN_D__; + + struct general_data_hot gen_data; + struct mpw_data mpw; + struct data_seg_data data_seg; + struct ctrl_seg_data ctrl_seg; + + /* RAW_PACKET hot data */ + uint8_t link_layer; + + /* used on data-path but not so hot */ + struct general_data_warm gen_data_warm; + /* atomic hot data */ + int enable_atomics; + /* odp hot data */ + struct odp_data odp_data; + /* ext atomic hot data */ + uint32_t max_atomic_arg; + /* umr hot data */ + uint32_t max_inl_send_klms; + /* recv-send enable hot data */ + struct mlx5_wq_recv_send_enable rq_enable; + struct mlx5_wq_recv_send_enable sq_enable; + int rx_qp; +}; + +struct mlx5_dct { + struct mlx5_resource rsc; + struct ibv_exp_dct ibdct; +}; + +enum mlx5_wq_model_flags { + /* + * When set the WQ API must be thread safe. + * When reset application is taking care + * to sync between WQ API calls. + */ + MLX5_WQ_MODEL_FLAG_THREAD_SAFE = 1 << 0, + + /* + * This flag is used to cache the IBV_EXP_DEVICE_RX_CSUM_IP_PKT + * device cap flag and it enables the related RX offloading support + */ + MLX5_WQ_MODEL_RX_CSUM_IP_OK_IP_NON_TCP_UDP = 1 << 1, +}; + +enum mlx5_mp_rq_sizes { + /* + * Max log num of WQE strides supported by lib is 31 since related + * "num of strides" variables size (i.e. consumed_strides_counter[] and + * mp_rq_strides_in_wqe) is 32 bits + */ + MLX5_MP_RQ_MAX_LOG_NUM_STRIDES = 31, + /* + * Max log stride size supported by lib is 15 since related + * "stride size" variable size (i.e. 
mp_rq_stride_size) is 16 bits + */ + MLX5_MP_RQ_MAX_LOG_STRIDE_SIZE = 15, + MLX5_MP_RQ_SUPPORTED_QPT = IBV_EXP_QPT_RAW_PACKET, + MLX5_MP_RQ_SUPPORTED_SHIFTS = IBV_EXP_MP_RQ_2BYTES_SHIFT +}; + +struct mlx5_rwq { + struct mlx5_resource rsc; + uint32_t pattern; + struct ibv_exp_wq wq; + struct mlx5_buf buf; + int buf_size; + /* hot data used on data path */ + struct mlx5_wq rq __MLX5_ALGN_D__; + uint32_t *db; + /* Multi-Packet RQ hot data */ + /* Table to hold the consumed strides on each WQE */ + uint32_t *consumed_strides_counter; + uint16_t mp_rq_stride_size; + uint32_t mp_rq_strides_in_wqe; + uint8_t mp_rq_packet_padding; + /* recv-send enable hot data */ + struct mlx5_wq_recv_send_enable rq_enable; + int wq_sig; + uint8_t model_flags; /* use mlx5_wq_model_flags */ +}; + +struct mlx5_ah { + struct ibv_ah ibv_ah; + struct mlx5_wqe_av av; +}; + +struct mlx5_verbs_srq { + struct mlx5_srq msrq; + struct verbs_srq vsrq; +}; + +struct mlx5_klm_buf { + void *alloc_buf; + void *align_buf; + struct ibv_mr *mr; + struct ibv_exp_mkey_list_container ibv_klm_list; +}; + +struct mlx5_send_db_data { + struct mlx5_bf bf; + struct mlx5_wc_uar *wc_uar; + struct list_head list; +}; + +/* Container for the dynamically allocated Write-Combining(WC) mapped UAR */ +struct mlx5_wc_uar { + /* Each UAR contains MLX5_NUM_UUARS_PER_PAGE UUARS (BFs) */ + struct mlx5_send_db_data send_db_data[MLX5_NUM_UUARS_PER_PAGE]; + /* The index used to mmap this UAR */ + int uar_idx; + /* The virtual address of the WC mmaped UAR */ + void *uar; +}; + +struct mlx5_res_domain { + struct ibv_exp_res_domain ibv_res_domain; + struct ibv_exp_res_domain_init_attr attr; + struct mlx5_send_db_data *send_db; +}; + +static inline int mlx5_ilog2(int n) +{ + int t; + + if (n <= 0) + return -1; + + t = 0; + while ((1 << t) < n) + ++t; + + return t; +} + +extern int mlx5_stall_num_loop; +extern int mlx5_stall_cq_poll_min; +extern int mlx5_stall_cq_poll_max; +extern int mlx5_stall_cq_inc_step; +extern int mlx5_stall_cq_dec_step; +extern int mlx5_single_threaded; +extern int mlx5_use_mutex; + +static inline unsigned DIV_ROUND_UP(unsigned n, unsigned d) +{ + return (n + d - 1u) / d; +} + +static inline unsigned long align(unsigned long val, unsigned long algn) +{ + return (val + algn - 1) & ~(algn - 1); +} + +static inline void *align_ptr(void *p, unsigned long algn) +{ + return (void *)align((unsigned long)p, algn); +} + +#define to_mxxx(xxx, type) \ + ((struct mlx5_##type *) \ + ((void *) ib##xxx - offsetof(struct mlx5_##type, ibv_##xxx))) + +static inline struct mlx5_device *to_mdev(struct ibv_device *ibdev) +{ + struct mlx5_device *ret; + + ret = (void *)ibdev - offsetof(struct mlx5_device, verbs_dev); + + return ret; +} + +static inline struct mlx5_context *to_mctx(struct ibv_context *ibctx) +{ + return to_mxxx(ctx, context); +} + +static inline struct mlx5_pd *to_mpd(struct ibv_pd *ibpd) +{ + return to_mxxx(pd, pd); +} + +static inline struct mlx5_cq *to_mcq(struct ibv_cq *ibcq) +{ + return to_mxxx(cq, cq); +} + +static inline struct mlx5_srq *to_msrq(struct ibv_srq *ibsrq) +{ + struct verbs_srq *vsrq = (struct verbs_srq *)ibsrq; + + return container_of(vsrq, struct mlx5_srq, vsrq); +} + +static inline struct mlx5_qp *to_mqp(struct ibv_qp *ibqp) +{ + struct verbs_qp *vqp = (struct verbs_qp *)ibqp; + + return container_of(vqp, struct mlx5_qp, verbs_qp); +} + +static inline struct mlx5_dct *to_mdct(struct ibv_exp_dct *ibdct) +{ + return container_of(ibdct, struct mlx5_dct, ibdct); +} + +static inline struct mlx5_rwq *to_mrwq(struct ibv_exp_wq 
*ibwq) +{ + return container_of(ibwq, struct mlx5_rwq, wq); +} + +static inline struct mlx5_mr *to_mmr(struct ibv_mr *ibmr) +{ + return to_mxxx(mr, mr); +} + +static inline struct mlx5_ah *to_mah(struct ibv_ah *ibah) +{ + return to_mxxx(ah, ah); +} + +static inline struct mlx5_res_domain *to_mres_domain(struct ibv_exp_res_domain *ibres_domain) +{ + return to_mxxx(res_domain, res_domain); +} + +static inline struct mlx5_klm_buf *to_klm(struct ibv_exp_mkey_list_container *ibklm) +{ + size_t off = offsetof(struct mlx5_klm_buf, ibv_klm_list); + + return (struct mlx5_klm_buf *)((void *)ibklm - off); +} + +static inline int max_int(int a, int b) +{ + return a > b ? a : b; +} + +static inline enum mlx5_lock_type mlx5_get_locktype(void) +{ + if (!mlx5_use_mutex) + return MLX5_SPIN_LOCK; + return MLX5_MUTEX; +} + +void *mlx5_uar_mmap(int idx, int cmd, int page_size, int cmd_fd); +int mlx5_cpu_local_numa(void); +void mlx5_build_ctrl_seg_data(struct mlx5_qp *qp, uint32_t qp_num); +int mlx5_alloc_buf(struct mlx5_buf *buf, size_t size, int page_size); +void mlx5_free_buf(struct mlx5_buf *buf); +int mlx5_alloc_buf_contig(struct mlx5_context *mctx, struct mlx5_buf *buf, + size_t size, int page_size, const char *component, void *req_addr); +void mlx5_free_buf_contig(struct mlx5_context *mctx, struct mlx5_buf *buf); +int mlx5_alloc_prefered_buf(struct mlx5_context *mctx, + struct mlx5_buf *buf, + size_t size, int page_size, + enum mlx5_alloc_type alloc_type, + const char *component); +int mlx5_free_actual_buf(struct mlx5_context *ctx, struct mlx5_buf *buf); +void mlx5_get_alloc_type(struct ibv_context *context, + const char *component, + enum mlx5_alloc_type *alloc_type, + enum mlx5_alloc_type default_alloc_type); +int mlx5_use_huge(struct ibv_context *context, const char *key); + +uint32_t *mlx5_alloc_dbrec(struct mlx5_context *context); +void mlx5_free_db(struct mlx5_context *context, uint32_t *db); + +int mlx5_prefetch_mr(struct ibv_mr *mr, struct ibv_exp_prefetch_attr *attr); + +int mlx5_query_device(struct ibv_context *context, + struct ibv_device_attr *attr); +int mlx5_query_port(struct ibv_context *context, uint8_t port, + struct ibv_port_attr *attr); +int mlx5_exp_query_port(struct ibv_context *context, uint8_t port_num, + struct ibv_exp_port_attr *port_attr); + +struct ibv_pd *mlx5_alloc_pd(struct ibv_context *context); +int mlx5_free_pd(struct ibv_pd *pd); +void read_init_vars(struct mlx5_context *ctx); + +struct ibv_mr *mlx5_reg_mr(struct ibv_pd *pd, void *addr, + size_t length, int access); +struct ibv_mr *mlx5_exp_reg_mr(struct ibv_exp_reg_mr_in *in); +int mlx5_dereg_mr(struct ibv_mr *mr); + +struct ibv_cq *mlx5_create_cq(struct ibv_context *context, int cqe, + struct ibv_comp_channel *channel, + int comp_vector); +struct ibv_cq *mlx5_create_cq_ex(struct ibv_context *context, + int cqe, + struct ibv_comp_channel *channel, + int comp_vector, + struct ibv_exp_cq_init_attr *attr); +int mlx5_alloc_cq_buf(struct mlx5_context *mctx, struct mlx5_cq *cq, + struct mlx5_buf *buf, int nent, int cqe_sz); +int mlx5_free_cq_buf(struct mlx5_context *ctx, struct mlx5_buf *buf); +int mlx5_resize_cq(struct ibv_cq *cq, int cqe); +int mlx5_destroy_cq(struct ibv_cq *cq); +int mlx5_poll_cq(struct ibv_cq *cq, int ne, struct ibv_wc *wc) __MLX5_ALGN_F__; +int mlx5_poll_cq_1(struct ibv_cq *cq, int ne, struct ibv_wc *wc) __MLX5_ALGN_F__; +int mlx5_arm_cq(struct ibv_cq *cq, int solicited); +void mlx5_cq_event(struct ibv_cq *cq); +void __mlx5_cq_clean(struct mlx5_cq *cq, uint32_t qpn, struct mlx5_srq *srq); +void 
mlx5_cq_clean(struct mlx5_cq *cq, uint32_t qpn, struct mlx5_srq *srq); +void mlx5_cq_resize_copy_cqes(struct mlx5_cq *cq); + +struct ibv_srq *mlx5_create_srq(struct ibv_pd *pd, + struct ibv_srq_init_attr *attr); +int mlx5_modify_srq(struct ibv_srq *srq, struct ibv_srq_attr *attr, + int mask); +int mlx5_query_srq(struct ibv_srq *srq, + struct ibv_srq_attr *attr); +int mlx5_destroy_srq(struct ibv_srq *srq); +int mlx5_alloc_srq_buf(struct ibv_context *context, struct mlx5_srq *srq); +void mlx5_free_srq_wqe(struct mlx5_srq *srq, int ind); +int mlx5_post_srq_recv(struct ibv_srq *ibsrq, + struct ibv_recv_wr *wr, + struct ibv_recv_wr **bad_wr) __MLX5_ALGN_F__; + +struct ibv_qp *mlx5_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr); +int mlx5_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, + int attr_mask, + struct ibv_qp_init_attr *init_attr); +int mlx5_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, + int attr_mask); +int mlx5_destroy_qp(struct ibv_qp *qp); +void mlx5_init_qp_indices(struct mlx5_qp *qp); +void mlx5_init_rwq_indices(struct mlx5_rwq *rwq); +void mlx5_update_post_send_one(struct mlx5_qp *qp, enum ibv_qp_state qp_state, enum ibv_qp_type qp_type); +int mlx5_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, + struct ibv_send_wr **bad_wr) __MLX5_ALGN_F__; +int mlx5_exp_post_send(struct ibv_qp *ibqp, struct ibv_exp_send_wr *wr, + struct ibv_exp_send_wr **bad_wr) __MLX5_ALGN_F__; +struct ibv_exp_mkey_list_container *mlx5_alloc_mkey_mem(struct ibv_exp_mkey_list_container_attr *attr); +int mlx5_free_mkey_mem(struct ibv_exp_mkey_list_container *mem); +int mlx5_query_mkey(struct ibv_mr *mr, struct ibv_exp_mkey_attr *mkey_attr); +struct ibv_mr *mlx5_create_mr(struct ibv_exp_create_mr_in *in); +int mlx5_exp_dereg_mr(struct ibv_mr *mr, struct ibv_exp_dereg_out *out); +struct ibv_exp_wq *mlx5_exp_create_wq(struct ibv_context *context, + struct ibv_exp_wq_init_attr *attr); +int mlx5_exp_modify_wq(struct ibv_exp_wq *wq, struct ibv_exp_wq_attr *attr); +int mlx5_exp_destroy_wq(struct ibv_exp_wq *wq); +struct ibv_exp_rwq_ind_table *mlx5_exp_create_rwq_ind_table(struct ibv_context *context, + struct ibv_exp_rwq_ind_table_init_attr *init_attr); +int mlx5_exp_destroy_rwq_ind_table(struct ibv_exp_rwq_ind_table *rwq_ind_table); +int mlx5_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr, + struct ibv_recv_wr **bad_wr) __MLX5_ALGN_F__; +void mlx5_calc_sq_wqe_size(struct ibv_qp_cap *cap, enum ibv_qp_type type, + struct mlx5_qp *qp); +void mlx5_set_sq_sizes(struct mlx5_qp *qp, struct ibv_qp_cap *cap, + enum ibv_qp_type type); +int mlx5_store_rsc(struct mlx5_context *ctx, uint32_t rsn, void *rsc); +void *mlx5_find_rsc(struct mlx5_context *ctx, uint32_t rsn); +void mlx5_clear_rsc(struct mlx5_context *ctx, uint32_t rsn); +uint32_t mlx5_store_uidx(struct mlx5_context *ctx, void *rsc); +void mlx5_clear_uidx(struct mlx5_context *ctx, uint32_t uidx); +struct mlx5_srq *mlx5_find_srq(struct mlx5_context *ctx, uint32_t srqn); +int mlx5_store_srq(struct mlx5_context *ctx, uint32_t srqn, + struct mlx5_srq *srq); +void mlx5_clear_srq(struct mlx5_context *ctx, uint32_t srqn); +struct ibv_ah *mlx5_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr); +int mlx5_destroy_ah(struct ibv_ah *ah); +int mlx5_alloc_av(struct mlx5_pd *pd, struct ibv_ah_attr *attr, + struct mlx5_ah *ah); +void mlx5_free_av(struct mlx5_ah *ah); +int mlx5_attach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid); +int mlx5_detach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid); +int 
mlx5_round_up_power_of_two(long long sz); +void *mlx5_get_atomic_laddr(struct mlx5_qp *qp, uint16_t idx, int *byte_count); +int mlx5_copy_to_recv_wqe(struct mlx5_qp *qp, int idx, void *buf, int size); +int mlx5_copy_to_send_wqe(struct mlx5_qp *qp, int idx, void *buf, int size); +int mlx5_poll_dc_info(struct ibv_context *context, + struct ibv_exp_dc_info_ent *ents, + int nent, int port); +int mlx5_copy_to_recv_srq(struct mlx5_srq *srq, int idx, void *buf, int size); +struct ibv_qp *mlx5_drv_create_qp(struct ibv_context *context, + struct ibv_qp_init_attr_ex *attrx); +struct ibv_qp *mlx5_exp_create_qp(struct ibv_context *context, + struct ibv_exp_qp_init_attr *attrx); +struct ibv_ah *mlx5_exp_create_ah(struct ibv_pd *pd, + struct ibv_exp_ah_attr *attr_ex); +struct ibv_xrcd *mlx5_open_xrcd(struct ibv_context *context, + struct ibv_xrcd_init_attr *xrcd_init_attr); +struct ibv_srq *mlx5_create_srq_ex(struct ibv_context *context, + struct ibv_srq_init_attr_ex *attr_ex); +int mlx5_get_srq_num(struct ibv_srq *srq, uint32_t *srq_num); +struct ibv_qp *mlx5_open_qp(struct ibv_context *context, + struct ibv_qp_open_attr *attr); +int mlx5_close_xrcd(struct ibv_xrcd *ib_xrcd); +int mlx5_modify_qp_ex(struct ibv_qp *qp, struct ibv_exp_qp_attr *attr, + uint64_t attr_mask); +void *mlx5_get_legacy_xrc(struct ibv_srq *srq); +void mlx5_set_legacy_xrc(struct ibv_srq *srq, void *legacy_xrc_srq); +int mlx5_query_device_ex(struct ibv_context *context, + struct ibv_exp_device_attr *attr); +int mlx5_exp_query_values(struct ibv_context *context, int q_values, + struct ibv_exp_values *values); +int mlx5_modify_cq(struct ibv_cq *cq, struct ibv_exp_cq_attr *attr, int attr_mask); +struct ibv_exp_dct *mlx5_create_dct(struct ibv_context *context, + struct ibv_exp_dct_init_attr *attr); +int mlx5_destroy_dct(struct ibv_exp_dct *dct); +int mlx5_poll_cq_ex(struct ibv_cq *ibcq, int num_entries, + struct ibv_exp_wc *wc, uint32_t wc_size) __MLX5_ALGN_F__; +int mlx5_poll_cq_ex_1(struct ibv_cq *ibcq, int num_entries, + struct ibv_exp_wc *wc, uint32_t wc_size) __MLX5_ALGN_F__; +int mlx5_query_dct(struct ibv_exp_dct *dct, struct ibv_exp_dct_attr *attr); +int mlx5_arm_dct(struct ibv_exp_dct *dct, struct ibv_exp_arm_attr *attr); +int mlx5_post_task(struct ibv_context *context, + struct ibv_exp_task *task_list, + struct ibv_exp_task **bad_task); +struct ibv_exp_res_domain *mlx5_exp_create_res_domain(struct ibv_context *context, + struct ibv_exp_res_domain_init_attr *attr); +int mlx5_exp_destroy_res_domain(struct ibv_context *context, + struct ibv_exp_res_domain *res_dom, + struct ibv_exp_destroy_res_domain_attr *attr); +void *mlx5_exp_query_intf(struct ibv_context *context, struct ibv_exp_query_intf_params *params, + enum ibv_exp_query_intf_status *status); +int mlx5_exp_release_intf(struct ibv_context *context, void *intf, + struct ibv_exp_release_intf_params *params); +struct ibv_exp_qp_burst_family *mlx5_get_qp_burst_family(struct mlx5_qp *qp, + struct ibv_exp_query_intf_params *params, + enum ibv_exp_query_intf_status *status); +struct ibv_exp_wq_family *mlx5_get_wq_family(struct mlx5_rwq *rwq, + struct ibv_exp_query_intf_params *params, + enum ibv_exp_query_intf_status *status); +struct ibv_exp_cq_family_v1 *mlx5_get_poll_cq_family(struct mlx5_cq *cq, + struct ibv_exp_query_intf_params *params, + enum ibv_exp_query_intf_status *status); +static inline void *mlx5_find_uidx(struct mlx5_context *ctx, uint32_t uidx) +{ + int tind = uidx >> MLX5_QP_TABLE_SHIFT; + + if (likely(ctx->uidx_table[tind].refcnt)) + return 
ctx->uidx_table[tind].table[uidx & MLX5_QP_TABLE_MASK]; + + return NULL; +} + +static inline int mlx5_spin_lock(struct mlx5_spinlock *lock) +{ + if (lock->state == MLX5_USE_LOCK) + return pthread_spin_lock(&lock->lock); + + if (unlikely(lock->state == MLX5_LOCKED)) { + fprintf(stderr, "*** ERROR: multithreading violation ***\n" + "You are running a multithreaded application but\n" + "you set MLX5_SINGLE_THREADED=1. Please unset it.\n"); + abort(); + } else { + lock->state = MLX5_LOCKED; + wmb(); + } + + return 0; +} + +static inline int mlx5_spin_unlock(struct mlx5_spinlock *lock) +{ + if (lock->state == MLX5_USE_LOCK) + return pthread_spin_unlock(&lock->lock); + + lock->state = MLX5_UNLOCKED; + + return 0; +} + +static inline int mlx5_spinlock_init(struct mlx5_spinlock *lock, int use_spinlock) +{ + if (use_spinlock) { + lock->state = MLX5_USE_LOCK; + return pthread_spin_init(&lock->lock, PTHREAD_PROCESS_PRIVATE); + } + lock->state = MLX5_UNLOCKED; + + return 0; +} + +static inline int mlx5_spinlock_destroy(struct mlx5_spinlock *lock) +{ + if (lock->state == MLX5_USE_LOCK) + return pthread_spin_destroy(&lock->lock); + + return 0; +} + +static inline int mlx5_lock(struct mlx5_lock *lock) +{ + if (lock->state == MLX5_USE_LOCK) { + if (lock->type == MLX5_SPIN_LOCK) + return pthread_spin_lock(&lock->slock); + + return pthread_mutex_lock(&lock->mutex); + } + + if (unlikely(lock->state == MLX5_LOCKED)) { + fprintf(stderr, "*** ERROR: multithreading violation ***\n" + "You are running a multithreaded application but\n" + "you set MLX5_SINGLE_THREADED=1. Please unset it.\n"); + abort(); + } else { + lock->state = MLX5_LOCKED; + /* Make new lock state visible to other threads */ + wmb(); + } + + return 0; +} + +static inline int mlx5_unlock(struct mlx5_lock *lock) +{ + if (lock->state == MLX5_USE_LOCK) { + if (lock->type == MLX5_SPIN_LOCK) + return pthread_spin_unlock(&lock->slock); + + return pthread_mutex_unlock(&lock->mutex); + } + + lock->state = MLX5_UNLOCKED; + + return 0; +} + +static inline int mlx5_lock_init(struct mlx5_lock *lock, + int use_lock, + enum mlx5_lock_type lock_type) +{ + if (use_lock) { + lock->type = lock_type; + lock->state = MLX5_USE_LOCK; + if (lock->type == MLX5_SPIN_LOCK) + return pthread_spin_init(&lock->slock, + PTHREAD_PROCESS_PRIVATE); + return pthread_mutex_init(&lock->mutex, + PTHREAD_PROCESS_PRIVATE); + } + + lock->state = MLX5_UNLOCKED; + + return 0; +} + +static inline int mlx5_lock_destroy(struct mlx5_lock *lock) +{ + if (lock->state == MLX5_USE_LOCK) { + if (lock->type == MLX5_SPIN_LOCK) + return pthread_spin_destroy(&lock->slock); + + return pthread_mutex_destroy(&lock->mutex); + } + return 0; +} + +static inline void set_command(int command, off_t *offset) +{ + *offset |= (command << MLX5_IB_MMAP_CMD_SHIFT); +} + +static inline int get_command(off_t *offset) +{ + return ((*offset >> MLX5_IB_MMAP_CMD_SHIFT) & MLX5_IB_MMAP_CMD_MASK); +} + +static inline void reset_command(off_t *offset) +{ + *offset &= ~(MLX5_IB_MMAP_CMD_MASK << MLX5_IB_MMAP_CMD_SHIFT); +} + +static inline void set_arg(int arg, off_t *offset) +{ + *offset |= arg; +} + +static inline void set_order(int order, off_t *offset) +{ + set_arg(order, offset); +} + +static inline void set_index(int index, off_t *offset) +{ + set_arg(index, offset); +} + +static inline uint8_t calc_xor(void *wqe, int size) +{ + int i; + uint8_t *p = wqe; + uint8_t res = 0; + + for (i = 0; i < size; ++i) + res ^= p[i]; + + return res; +} + +static inline void mlx5_update_cons_index(struct mlx5_cq *cq) +{ + 
cq->dbrec[MLX5_CQ_SET_CI] = htonl(cq->cons_index & 0xffffff); +} + +#endif /* MLX5_H */ Index: contrib/ofed/libmlx5/src/mlx5.c =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/mlx5.c @@ -0,0 +1,1006 @@ +/* + * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifndef HAVE_IBV_REGISTER_DRIVER +#include +#endif + +#include "mlx5.h" +#include "mlx5-abi.h" + +#ifndef PCI_VENDOR_ID_MELLANOX +#define PCI_VENDOR_ID_MELLANOX 0x15b3 +#endif + +#define HCA(v, d) \ + { .vendor = PCI_VENDOR_ID_##v, \ + .device = d } + +struct { + unsigned vendor; + unsigned device; +} hca_table[] = { + HCA(MELLANOX, 4113), /* MT27600 Connect-IB */ + HCA(MELLANOX, 4114), /* MT27600 Connect-IB virtual function */ + HCA(MELLANOX, 4115), /* ConnectX-4 */ + HCA(MELLANOX, 4116), /* ConnectX-4 VF */ + HCA(MELLANOX, 4117), /* ConnectX-4Lx */ + HCA(MELLANOX, 4118), /* ConnectX-4Lx VF */ + HCA(MELLANOX, 4119), /* ConnectX-5 */ + HCA(MELLANOX, 4120), /* ConnectX-5 VF */ +}; + +uint32_t mlx5_debug_mask = 0; +int mlx5_freeze_on_error_cqe; + +static struct ibv_context_ops mlx5_ctx_ops = { + .query_device = mlx5_query_device, + .query_port = mlx5_query_port, + .alloc_pd = mlx5_alloc_pd, + .dealloc_pd = mlx5_free_pd, + .reg_mr = mlx5_reg_mr, + .dereg_mr = mlx5_dereg_mr, + .create_cq = mlx5_create_cq, + .poll_cq = mlx5_poll_cq, + .req_notify_cq = mlx5_arm_cq, + .cq_event = mlx5_cq_event, + .resize_cq = mlx5_resize_cq, + .destroy_cq = mlx5_destroy_cq, + .create_srq = mlx5_create_srq, + .modify_srq = mlx5_modify_srq, + .query_srq = mlx5_query_srq, + .destroy_srq = mlx5_destroy_srq, + .post_srq_recv = mlx5_post_srq_recv, + .create_qp = mlx5_create_qp, + .query_qp = mlx5_query_qp, + .modify_qp = mlx5_modify_qp, + .destroy_qp = mlx5_destroy_qp, + .post_send = mlx5_post_send, + .post_recv = mlx5_post_recv, + .create_ah = mlx5_create_ah, + .destroy_ah = mlx5_destroy_ah, + .attach_mcast = mlx5_attach_mcast, + .detach_mcast = mlx5_detach_mcast +}; + +static int read_number_from_line(const 
char *line, int *value) +{ + const char *ptr; + + ptr = strchr(line, ':'); + if (!ptr) + return 1; + + ++ptr; + + *value = atoi(ptr); + return 0; +} + +static int get_free_uidx(struct mlx5_context *ctx) +{ + int tind; + int i; + + for (tind = 0; tind < MLX5_QP_TABLE_SIZE; tind++) { + if (ctx->uidx_table[tind].refcnt < MLX5_QP_TABLE_MASK) + break; + } + + if (tind == MLX5_QP_TABLE_SIZE) + return -1; + + if (!ctx->uidx_table[tind].refcnt) + return (tind << MLX5_QP_TABLE_SHIFT); + + for (i = 0; i < MLX5_QP_TABLE_MASK + 1; i++) { + if (!ctx->uidx_table[tind].table[i]) + break; + } + + return (tind << MLX5_QP_TABLE_SHIFT) | i; +} + +uint32_t mlx5_store_uidx(struct mlx5_context *ctx, void *rsc) +{ + int tind; + int ret = -1; + int uidx; + + pthread_mutex_lock(&ctx->uidx_table_mutex); + uidx = get_free_uidx(ctx); + if (uidx < 0) + goto out; + + tind = uidx >> MLX5_QP_TABLE_SHIFT; + + if (!ctx->uidx_table[tind].refcnt) { + ctx->uidx_table[tind].table = calloc(MLX5_QP_TABLE_MASK + 1, + sizeof(void *)); + if (!ctx->uidx_table[tind].table) + goto out; + } + + ++ctx->uidx_table[tind].refcnt; + ctx->uidx_table[tind].table[uidx & MLX5_QP_TABLE_MASK] = rsc; + ret = uidx; + +out: + pthread_mutex_unlock(&ctx->uidx_table_mutex); + return ret; +} + +void mlx5_clear_uidx(struct mlx5_context *ctx, uint32_t uidx) +{ + int tind = uidx >> MLX5_QP_TABLE_SHIFT; + + pthread_mutex_lock(&ctx->uidx_table_mutex); + + if (!--ctx->uidx_table[tind].refcnt) + free(ctx->uidx_table[tind].table); + else + ctx->uidx_table[tind].table[uidx & MLX5_QP_TABLE_MASK] = NULL; + + pthread_mutex_unlock(&ctx->uidx_table_mutex); +} + +static int mlx5_is_sandy_bridge(int *num_cores) +{ + char line[128]; + FILE *fd; + int rc = 0; + int cur_cpu_family = -1; + int cur_cpu_model = -1; + + fd = fopen("/proc/cpuinfo", "r"); + if (!fd) + return 0; + + *num_cores = 0; + + while (fgets(line, 128, fd)) { + int value; + + /* if this is information on new processor */ + if (!strncmp(line, "processor", 9)) { + ++*num_cores; + + cur_cpu_family = -1; + cur_cpu_model = -1; + } else if (!strncmp(line, "cpu family", 10)) { + if ((cur_cpu_family < 0) && (!read_number_from_line(line, &value))) + cur_cpu_family = value; + } else if (!strncmp(line, "model", 5)) { + if ((cur_cpu_model < 0) && (!read_number_from_line(line, &value))) + cur_cpu_model = value; + } + + /* if this is a Sandy Bridge CPU */ + if ((cur_cpu_family == 6) && + (cur_cpu_model == 0x2A || (cur_cpu_model == 0x2D) )) + rc = 1; + } + + fclose(fd); + return rc; +} + +/* +man cpuset + + This format displays each 32-bit word in hexadecimal (using ASCII characters "0" - "9" and "a" - "f"); words + are filled with leading zeros, if required. For masks longer than one word, a comma separator is used between + words. Words are displayed in big-endian order, which has the most significant bit first. The hex digits + within a word are also in big-endian order. + + The number of 32-bit words displayed is the minimum number needed to display all bits of the bitmask, based on + the size of the bitmask. + + Examples of the Mask Format: + + 00000001 # just bit 0 set + 40000000,00000000,00000000 # just bit 94 set + 000000ff,00000000 # bits 32-39 set + 00000000,000E3862 # 1,5,6,11-13,17-19 set + + A mask with bits 0, 1, 2, 4, 8, 16, 32, and 64 set displays as: + + 00000001,00000001,00010117 + + The first "1" is for bit 64, the second for bit 32, the third for bit 16, the fourth for bit 8, the fifth for + bit 4, and the "7" is for bits 2, 1, and 0. 
+*/ +static void mlx5_local_cpu_set(struct mlx5_context *ctx, cpuset_t *cpu_set) +{ + char *p, buf[1024]; + char env_value[VERBS_MAX_ENV_VAL]; + uint32_t word; + int i, k; + struct ibv_context *context = &ctx->ibv_ctx; + + if (!ibv_exp_cmd_getenv(context, "MLX5_LOCAL_CPUS", env_value, sizeof(env_value))) + strncpy(buf, env_value, sizeof(buf)); + else { + char fname[MAXPATHLEN]; + + snprintf(fname, MAXPATHLEN, "/sys/class/infiniband/%s", + ibv_get_device_name(context->device)); + + if (ibv_read_sysfs_file(fname, "device/local_cpus", buf, sizeof(buf))) { + fprintf(stderr, PFX "Warning: can not get local cpu set: failed to open %s\n", fname); + return; + } + } + + p = strrchr(buf, ','); + if (!p) + p = buf; + + i = 0; + do { + if (*p == ',') { + *p = 0; + p ++; + } + + word = strtoul(p, 0, 16); + + for (k = 0; word; ++k, word >>= 1) + if (word & 1) + CPU_SET(k+i, cpu_set); + + if (p == buf) + break; + + p = strrchr(buf, ','); + if (!p) + p = buf; + + i += 32; + } while (i < CPU_SETSIZE); +} + +static int mlx5_device_local_numa(struct mlx5_context *ctx) +{ + char buf[1024]; + struct ibv_context *context = &ctx->ibv_ctx; + char fname[MAXPATHLEN]; + + snprintf(fname, MAXPATHLEN, "/sys/class/infiniband/%s", + ibv_get_device_name(context->device)); + + if (ibv_read_sysfs_file(fname, "device/numa_node", buf, sizeof(buf))) + return (-1); + + return (int)strtoul(buf, 0, 0); +} + +static int mlx5_enable_stall_cq(struct mlx5_context *ctx, int only_sb) +{ + cpuset_t my_cpus, dev_local_cpus, result_set; + int stall_enable; + int ret; + int num_cores; + + if (only_sb && !mlx5_is_sandy_bridge(&num_cores)) + return 0; + + /* by default disable stall on sandy bridge arch */ + stall_enable = 0; + + /* + * check if app is bound to cpu set that is inside + * of device local cpu set. Disable stalling if true + */ + + /* use static cpu set - up to CPU_SETSIZE (1024) cpus/node */ + CPU_ZERO(&my_cpus); + CPU_ZERO(&dev_local_cpus); + CPU_ZERO(&result_set); + ret = cpuset_getaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1, + sizeof(my_cpus), &my_cpus); + if (ret == -1) { + if (errno == EINVAL) + fprintf(stderr, PFX "Warning: my cpu set is too small\n"); + else + fprintf(stderr, PFX "Warning: failed to get my cpu set\n"); + goto out; + } + + /* get device local cpu set */ + mlx5_local_cpu_set(ctx, &dev_local_cpus); + + /* make sure result_set is not init to all 0 */ + CPU_SET(0, &result_set); + /* Set stall_enable if my cpu set and dev cpu set are disjoint sets */ + CPU_AND(&result_set, &my_cpus); + CPU_AND(&result_set, &dev_local_cpus); + stall_enable = CPU_COUNT(&result_set) ? 
0 : 1; + +out: + return stall_enable; +} + +static void mlx5_read_env(struct mlx5_context *ctx) +{ + char env_value[VERBS_MAX_ENV_VAL]; + struct ibv_context *context = &ctx->ibv_ctx; + + /* If MLX5_STALL_CQ_POLL is not set enable stall CQ only on sandy bridge */ + if (ibv_exp_cmd_getenv(context, "MLX5_STALL_CQ_POLL", env_value, sizeof(env_value))) + ctx->stall_enable = mlx5_enable_stall_cq(ctx, 1); + /* If MLX5_STALL_CQ_POLL == 0 disable stall CQ */ + else if (!strcmp(env_value, "0")) + ctx->stall_enable = 0; + /* If MLX5_STALL_CQ_POLL == 1 enable stall CQ */ + else if (!strcmp(env_value, "1")) + ctx->stall_enable = mlx5_enable_stall_cq(ctx, 0); + /* Otherwise enable stall CQ only on sandy bridge */ + else + ctx->stall_enable = mlx5_enable_stall_cq(ctx, 1); + + if (!ibv_exp_cmd_getenv(context, "MLX5_STALL_NUM_LOOP", env_value, sizeof(env_value))) + mlx5_stall_num_loop = atoi(env_value); + + if (!ibv_exp_cmd_getenv(context, "MLX5_STALL_CQ_POLL_MIN", env_value, sizeof(env_value))) + mlx5_stall_cq_poll_min = atoi(env_value); + + if (!ibv_exp_cmd_getenv(context, "MLX5_STALL_CQ_POLL_MAX", env_value, sizeof(env_value))) + mlx5_stall_cq_poll_max = atoi(env_value); + + if (!ibv_exp_cmd_getenv(context, "MLX5_STALL_CQ_INC_STEP", env_value, sizeof(env_value))) + mlx5_stall_cq_inc_step = atoi(env_value); + + if (!ibv_exp_cmd_getenv(context, "MLX5_STALL_CQ_DEC_STEP", env_value, sizeof(env_value))) + mlx5_stall_cq_dec_step = atoi(env_value); + + ctx->stall_adaptive_enable = 0; + ctx->stall_cycles = 0; + ctx->numa_id = mlx5_device_local_numa(ctx); + + if (mlx5_stall_num_loop < 0) { + ctx->stall_adaptive_enable = 1; + ctx->stall_cycles = mlx5_stall_cq_poll_min; + } +} + +static int get_total_uuars(void) +{ + return MLX5_DEF_TOT_UUARS; +} + +static void open_debug_file(struct mlx5_context *ctx) +{ + char env[VERBS_MAX_ENV_VAL]; + + if (ibv_exp_cmd_getenv(&ctx->ibv_ctx, "MLX5_DEBUG_FILE", env, sizeof(env))) { + ctx->dbg_fp = stderr; + return; + } + + ctx->dbg_fp = fopen(env, "aw+"); + if (!ctx->dbg_fp) { + fprintf(stderr, "Failed opening debug file %s, using stderr\n", env); + ctx->dbg_fp = stderr; + return; + } +} + +static void close_debug_file(struct mlx5_context *ctx) +{ + if (ctx->dbg_fp && ctx->dbg_fp != stderr) + fclose(ctx->dbg_fp); +} + +static void set_debug_mask(struct ibv_context *context) +{ + char env[VERBS_MAX_ENV_VAL]; + + if (!ibv_exp_cmd_getenv(context, "MLX5_DEBUG_MASK", env, sizeof(env))) + mlx5_debug_mask = strtol(env, NULL, 0); +} + +static void set_freeze_on_error(struct ibv_context *context) +{ + char env[VERBS_MAX_ENV_VAL]; + + if (!ibv_exp_cmd_getenv(context, "MLX5_FREEZE_ON_ERROR_CQE", env, sizeof(env))) + mlx5_freeze_on_error_cqe = strtol(env, NULL, 0); +} + +static int get_always_bf(struct ibv_context *context) +{ + char env[VERBS_MAX_ENV_VAL]; + + if (ibv_exp_cmd_getenv(context, "MLX5_POST_SEND_PREFER_BF", env, sizeof(env))) + return 1; + + return strcmp(env, "0") ? 1 : 0; +} + +static int get_shut_up_bf(struct ibv_context *context) +{ + char env[VERBS_MAX_ENV_VAL]; + + if (ibv_exp_cmd_getenv(context, "MLX5_SHUT_UP_BF", env, sizeof(env))) + return 0; + + return strcmp(env, "0") ? 1 : 0; +} + +static int get_cqe_comp(struct ibv_context *context) +{ + char env[VERBS_MAX_ENV_VAL]; + + if (ibv_exp_cmd_getenv(context, "MLX5_ENABLE_CQE_COMPRESSION", env, sizeof(env))) + return 0; + + return strcmp(env, "0") ? 
1 : 0; +} + +static int get_use_mutex(struct ibv_context *context) +{ + char env[VERBS_MAX_ENV_VAL]; + + if (ibv_exp_cmd_getenv(context, "MLX5_USE_MUTEX", env, sizeof(env))) + return 0; + + return strcmp(env, "0") ? 1 : 0; +} + +static int get_num_low_lat_uuars(void) +{ + return 4; +} + +static int need_uuar_lock(struct mlx5_context *ctx, int uuarn) +{ + if (uuarn == 0) + return 0; + + if (uuarn >= (ctx->tot_uuars - ctx->low_lat_uuars) * 2) + return 0; + + return 1; +} + +static int single_threaded_app(struct ibv_context *context) +{ + char env[VERBS_MAX_ENV_VAL]; + + if (!ibv_exp_cmd_getenv(context, "MLX5_SINGLE_THREADED", env, sizeof(env))) + return strcmp(env, "1") ? 0 : 1; + + return 0; +} + +static void set_extended(struct verbs_context *verbs_ctx) +{ + int off_create_qp_ex = offsetof(struct verbs_context, create_qp_ex); + int off_open_xrcd = offsetof(struct verbs_context, open_xrcd); + int off_create_srq = offsetof(struct verbs_context, create_srq_ex); + int off_get_srq_num = offsetof(struct verbs_context, get_srq_num); + int off_open_qp = offsetof(struct verbs_context, open_qp); + int off_mlx5_close_xrcd = offsetof(struct verbs_context, close_xrcd); + int off_create_flow = offsetof(struct verbs_context, create_flow); + int off_destroy_flow = offsetof(struct verbs_context, destroy_flow); + + if (sizeof(*verbs_ctx) - off_create_qp_ex <= verbs_ctx->sz) + verbs_ctx->create_qp_ex = mlx5_drv_create_qp; + + if (sizeof(*verbs_ctx) - off_open_xrcd <= verbs_ctx->sz) + verbs_ctx->open_xrcd = mlx5_open_xrcd; + + if (sizeof(*verbs_ctx) - off_create_srq <= verbs_ctx->sz) + verbs_ctx->create_srq_ex = mlx5_create_srq_ex; + + if (sizeof(*verbs_ctx) - off_get_srq_num <= verbs_ctx->sz) + verbs_ctx->get_srq_num = mlx5_get_srq_num; + + if (sizeof(*verbs_ctx) - off_open_qp <= verbs_ctx->sz) + verbs_ctx->open_qp = mlx5_open_qp; + + if (sizeof(*verbs_ctx) - off_mlx5_close_xrcd <= verbs_ctx->sz) + verbs_ctx->close_xrcd = mlx5_close_xrcd; + + if (sizeof(*verbs_ctx) - off_create_flow <= verbs_ctx->sz) + verbs_ctx->create_flow = ibv_cmd_create_flow; + + if (sizeof(*verbs_ctx) - off_destroy_flow <= verbs_ctx->sz) + verbs_ctx->destroy_flow = ibv_cmd_destroy_flow; +} + +static void set_experimental(struct ibv_context *ctx) +{ + struct verbs_context_exp *verbs_exp_ctx = verbs_get_exp_ctx(ctx); + struct mlx5_context *mctx = to_mctx(ctx); + + verbs_set_exp_ctx_op(verbs_exp_ctx, create_dct, mlx5_create_dct); + verbs_set_exp_ctx_op(verbs_exp_ctx, destroy_dct, mlx5_destroy_dct); + verbs_set_exp_ctx_op(verbs_exp_ctx, query_dct, mlx5_query_dct); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_arm_dct, mlx5_arm_dct); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_query_device, mlx5_query_device_ex); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_create_qp, mlx5_exp_create_qp); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_modify_qp, mlx5_modify_qp_ex); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_get_legacy_xrc, mlx5_get_legacy_xrc); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_set_legacy_xrc, mlx5_set_legacy_xrc); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_modify_cq, mlx5_modify_cq); + verbs_set_exp_ctx_op(verbs_exp_ctx, exp_create_cq, mlx5_create_cq_ex); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_ibv_poll_cq, mlx5_poll_cq_ex); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_post_task, mlx5_post_task); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_reg_mr, mlx5_exp_reg_mr); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_post_send, mlx5_exp_post_send); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_alloc_mkey_list_memory, 
mlx5_alloc_mkey_mem); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_dealloc_mkey_list_memory, mlx5_free_mkey_mem); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_query_mkey, mlx5_query_mkey); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_create_mr, mlx5_create_mr); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_prefetch_mr, + mlx5_prefetch_mr); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_dereg_mr, mlx5_exp_dereg_mr); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_poll_dc_info, mlx5_poll_dc_info); + verbs_set_exp_ctx_op(verbs_exp_ctx, exp_create_wq, mlx5_exp_create_wq); + verbs_set_exp_ctx_op(verbs_exp_ctx, exp_modify_wq, mlx5_exp_modify_wq); + verbs_set_exp_ctx_op(verbs_exp_ctx, exp_destroy_wq, mlx5_exp_destroy_wq); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_ibv_create_flow, ibv_exp_cmd_create_flow); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_ibv_destroy_flow, ibv_exp_cmd_destroy_flow); + verbs_set_exp_ctx_op(verbs_exp_ctx, exp_create_rwq_ind_table, mlx5_exp_create_rwq_ind_table); + verbs_set_exp_ctx_op(verbs_exp_ctx, exp_destroy_rwq_ind_table, mlx5_exp_destroy_rwq_ind_table); + verbs_set_exp_ctx_op(verbs_exp_ctx, exp_create_res_domain, mlx5_exp_create_res_domain); + verbs_set_exp_ctx_op(verbs_exp_ctx, exp_destroy_res_domain, mlx5_exp_destroy_res_domain); + verbs_set_exp_ctx_op(verbs_exp_ctx, exp_query_intf, mlx5_exp_query_intf); + verbs_set_exp_ctx_op(verbs_exp_ctx, exp_release_intf, mlx5_exp_release_intf); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_query_port, mlx5_exp_query_port); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_ibv_create_ah, mlx5_exp_create_ah); + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_query_values, mlx5_exp_query_values); + if (mctx->cqe_version == 1) + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_ibv_poll_cq, + mlx5_poll_cq_ex_1); + else + verbs_set_exp_ctx_op(verbs_exp_ctx, drv_exp_ibv_poll_cq, + mlx5_poll_cq_ex); +} + +void *mlx5_uar_mmap(int idx, int cmd, int page_size, int cmd_fd) +{ + off_t offset; + + offset = 0; + set_command(cmd, &offset); + set_index(idx, &offset); + + return mmap(NULL, page_size, PROT_WRITE, MAP_SHARED, cmd_fd, page_size * offset); +} + +void read_init_vars(struct mlx5_context *ctx) +{ + pthread_mutex_lock(&ctx->env_mtx); + if (!ctx->env_initialized) { + mlx5_single_threaded = single_threaded_app(&ctx->ibv_ctx); + mlx5_use_mutex = get_use_mutex(&ctx->ibv_ctx); + open_debug_file(ctx); + set_debug_mask(&ctx->ibv_ctx); + set_freeze_on_error(&ctx->ibv_ctx); + ctx->prefer_bf = get_always_bf(&ctx->ibv_ctx); + ctx->shut_up_bf = get_shut_up_bf(&ctx->ibv_ctx); + mlx5_read_env(ctx); + ctx->env_initialized = 1; + } + pthread_mutex_unlock(&ctx->env_mtx); +} + +static int mlx5_map_internal_clock(struct mlx5_device *mdev, + struct ibv_context *ibv_ctx) +{ + struct mlx5_context *context = to_mctx(ibv_ctx); + void *hca_clock_page; + off_t offset = 0; + + set_command(MLX5_EXP_MMAP_GET_CORE_CLOCK_CMD, &offset); + hca_clock_page = mmap(NULL, mdev->page_size, + PROT_READ, MAP_SHARED, ibv_ctx->cmd_fd, + offset * mdev->page_size); + + if (hca_clock_page == MAP_FAILED) { + fprintf(stderr, PFX + "Warning: Timestamp available,\n" + "but failed to mmap() hca core clock page.\n"); + return -1; + } + + context->hca_core_clock = hca_clock_page + context->core_clock.offset; + + return 0; +} + +enum mlx5_cap_flags { + MLX5_CAP_COMPACT_AV = 1 << 0, +}; + +static int mlx5_alloc_context(struct verbs_device *vdev, + struct ibv_context *ctx, int cmd_fd) +{ + struct mlx5_context *context; + struct mlx5_alloc_ucontext req; + struct mlx5_exp_alloc_ucontext_resp resp; + 
struct ibv_device *ibdev = &vdev->device; + struct verbs_context *verbs_ctx = verbs_get_ctx(ctx); + struct ibv_exp_device_attr attr; + int i; + int page_size = to_mdev(ibdev)->page_size; + int tot_uuars; + int low_lat_uuars; + int gross_uuars; + int j; + int uar_mapped; + off_t offset; + int err; + + context = to_mctx(ctx); + if (pthread_mutex_init(&context->env_mtx, NULL)) + return -1; + + context->ibv_ctx.cmd_fd = cmd_fd; + + memset(&resp, 0, sizeof(resp)); + if (gethostname(context->hostname, sizeof(context->hostname))) + strcpy(context->hostname, "host_unknown"); + + tot_uuars = get_total_uuars(); + gross_uuars = tot_uuars / MLX5_NUM_UUARS_PER_PAGE * 4; + context->bfs = calloc(gross_uuars, sizeof *context->bfs); + if (!context->bfs) { + errno = ENOMEM; + goto err_free; + } + + low_lat_uuars = get_num_low_lat_uuars(); + if (low_lat_uuars > tot_uuars - 1) { + errno = ENOMEM; + goto err_free_bf; + } + + memset(&req, 0, sizeof(req)); + req.total_num_uuars = tot_uuars; + req.num_low_latency_uuars = low_lat_uuars; + if (ibv_cmd_get_context(&context->ibv_ctx, &req.ibv_req, sizeof req, + &resp.ibv_resp, sizeof resp)) + goto err_free_bf; + + context->max_num_qps = resp.qp_tab_size; + context->bf_reg_size = resp.bf_reg_size; + context->tot_uuars = resp.tot_uuars; + context->low_lat_uuars = low_lat_uuars; + context->cache_line_size = resp.cache_line_size; + context->max_sq_desc_sz = resp.max_sq_desc_sz; + context->max_rq_desc_sz = resp.max_rq_desc_sz; + context->max_send_wqebb = resp.max_send_wqebb; + context->num_ports = resp.num_ports; + context->max_recv_wr = resp.max_recv_wr; + context->max_srq_recv_wr = resp.max_srq_recv_wr; + context->max_desc_sz_sq_dc = resp.max_desc_sz_sq_dc; + context->atomic_sizes_dc = resp.atomic_sizes_dc; + context->compact_av = resp.flags & MLX5_CAP_COMPACT_AV; + + if (resp.exp_data.comp_mask & MLX5_EXP_ALLOC_CTX_RESP_MASK_CQE_COMP_MAX_NUM) + context->cqe_comp_max_num = resp.exp_data.cqe_comp_max_num; + + if (resp.exp_data.comp_mask & MLX5_EXP_ALLOC_CTX_RESP_MASK_CQE_VERSION) + context->cqe_version = resp.exp_data.cqe_version; + + if (resp.exp_data.comp_mask & MLX5_EXP_ALLOC_CTX_RESP_MASK_RROCE_UDP_SPORT_MIN) + context->rroce_udp_sport_min = resp.exp_data.rroce_udp_sport_min; + + if (resp.exp_data.comp_mask & MLX5_EXP_ALLOC_CTX_RESP_MASK_RROCE_UDP_SPORT_MAX) + context->rroce_udp_sport_max = resp.exp_data.rroce_udp_sport_max; + + ctx->ops = mlx5_ctx_ops; + if (context->cqe_version) { + if (context->cqe_version == 1) { + ctx->ops.poll_cq = mlx5_poll_cq_1; + } else { + printf("Unsupported cqe_vesion = %d, stay on cqe version 0\n", + context->cqe_version); + context->cqe_version = 0; + } + } + + attr.comp_mask = IBV_EXP_DEVICE_ATTR_RESERVED - 1; + err = mlx5_query_device_ex(ctx, &attr); + if (!err && (attr.comp_mask & IBV_EXP_DEVICE_ATTR_MAX_CTX_RES_DOMAIN)) { + context->max_ctx_res_domain = attr.max_ctx_res_domain; + mlx5_spinlock_init(&context->send_db_lock, !mlx5_single_threaded); + INIT_LIST_HEAD(&context->send_wc_db_list); + + } + + if (resp.exp_data.comp_mask & MLX5_EXP_ALLOC_CTX_RESP_MASK_HCA_CORE_CLOCK_OFFSET) { + context->core_clock.offset = + resp.exp_data.hca_core_clock_offset & + (to_mdev(ibdev)->page_size - 1); + mlx5_map_internal_clock(to_mdev(ibdev), ctx); + if (attr.hca_core_clock) + context->core_clock.mult = ((1ull * 1000) << 21) / + attr.hca_core_clock; + else + context->core_clock.mult = 0; + + /* ConnectX-4 supports 64bit timestamp. 
We choose these numbers + * in order to make sure that after arithmetic operations, + * we don't overflow a 64bit variable. + */ + context->core_clock.shift = 21; + context->core_clock.mask = (1ULL << 49) - 1; + } + + pthread_mutex_init(&context->rsc_table_mutex, NULL); + pthread_mutex_init(&context->srq_table_mutex, NULL); + for (i = 0; i < MLX5_QP_TABLE_SIZE; ++i) + context->rsc_table[i].refcnt = 0; + + for (i = 0; i < MLX5_QP_TABLE_SIZE; ++i) + context->uidx_table[i].refcnt = 0; + + context->db_list = NULL; + + pthread_mutex_init(&context->db_list_mutex, NULL); + + context->prefer_bf = get_always_bf(&context->ibv_ctx); + context->shut_up_bf = get_shut_up_bf(&context->ibv_ctx); + context->enable_cqe_comp = get_cqe_comp(&context->ibv_ctx); + mlx5_use_mutex = get_use_mutex(&context->ibv_ctx); + + offset = 0; + set_command(MLX5_MMAP_MAP_DC_INFO_PAGE, &offset); + context->cc.buf = mmap(NULL, 4096 * context->num_ports, PROT_READ, + MAP_PRIVATE, cmd_fd, page_size * offset); + if (context->cc.buf == MAP_FAILED) + context->cc.buf = NULL; + + mlx5_single_threaded = single_threaded_app(&context->ibv_ctx); + for (i = 0; i < resp.tot_uuars / MLX5_NUM_UUARS_PER_PAGE; ++i) { + uar_mapped = 0; + + /* Don't map UAR to WC if BF is not used */ + if (!context->shut_up_bf) { + context->uar[i].regs = mlx5_uar_mmap(i, MLX5_MMAP_GET_WC_PAGES_CMD, page_size, cmd_fd); + if (context->uar[i].regs != MAP_FAILED) { + context->uar[i].map_type = MLX5_UAR_MAP_WC; + uar_mapped = 1; + } + } + + if (!uar_mapped) { + context->uar[i].regs = mlx5_uar_mmap(i, MLX5_MMAP_GET_NC_PAGES_CMD, page_size, cmd_fd); + if (context->uar[i].regs != MAP_FAILED) { + context->uar[i].map_type = MLX5_UAR_MAP_NC; + uar_mapped = 1; + } + } + + if (!uar_mapped) { + /* for backward compatibility with old kernel driver */ + context->uar[i].regs = mlx5_uar_mmap(i, MLX5_MMAP_GET_REGULAR_PAGES_CMD, page_size, cmd_fd); + if (context->uar[i].regs != MAP_FAILED) { + context->uar[i].map_type = MLX5_UAR_MAP_WC; + uar_mapped = 1; + } + } + + if (!uar_mapped) { + context->uar[i].regs = NULL; + goto err_free_cc; + } + } + + for (j = 0; j < gross_uuars; ++j) { + context->bfs[j].reg = context->uar[j / 4].regs + + MLX5_BF_OFFSET + (j % 4) * context->bf_reg_size; + context->bfs[j].need_lock = need_uuar_lock(context, j) && + context->uar[j / 4].map_type == MLX5_UAR_MAP_WC; + mlx5_lock_init(&context->bfs[j].lock, + !mlx5_single_threaded, + mlx5_get_locktype()); + context->bfs[j].offset = 0; + if (context->uar[j / 4].map_type == MLX5_UAR_MAP_WC) { + context->bfs[j].buf_size = context->bf_reg_size / 2; + context->bfs[j].db_method = (context->bfs[j].need_lock && !mlx5_single_threaded) ? + MLX5_DB_METHOD_BF : + (mlx5_single_threaded && wc_auto_evict_size() == 64 ? 
+ MLX5_DB_METHOD_DEDIC_BF_1_THREAD : + MLX5_DB_METHOD_DEDIC_BF); + + } else { + context->bfs[j].db_method = MLX5_DB_METHOD_DB; + } + + context->bfs[j].uuarn = j; + } + + mlx5_lock_init(&context->lock32, + !mlx5_single_threaded, + mlx5_get_locktype()); + + mlx5_spinlock_init(&context->hugetlb_lock, !mlx5_single_threaded); + INIT_LIST_HEAD(&context->hugetlb_list); + + pthread_mutex_init(&context->task_mutex, NULL); + + set_extended(verbs_ctx); + set_experimental(ctx); + + for (i = 0; i < MLX5_MAX_PORTS_NUM; ++i) + context->port_query_cache[i].valid = 0; + + return 0; + +err_free_cc: + if (context->cc.buf) + munmap(context->cc.buf, 4096 * context->num_ports); + + if (context->hca_core_clock) + munmap(context->hca_core_clock - context->core_clock.offset, + to_mdev(ibdev)->page_size); + +err_free_bf: + free(context->bfs); + +err_free: + for (i = 0; i < MLX5_MAX_UAR_PAGES; ++i) { + if (context->uar[i].regs) + munmap(context->uar[i].regs, page_size); + } + close_debug_file(context); + + return errno; +} + +static void mlx5_free_context(struct verbs_device *device, + struct ibv_context *ibctx) +{ + struct mlx5_context *context = to_mctx(ibctx); + int page_size = to_mdev(ibctx->device)->page_size; + int i; + + if (context->hca_core_clock) + munmap(context->hca_core_clock - context->core_clock.offset, + to_mdev(&device->device)->page_size); + + if (context->cc.buf) + munmap(context->cc.buf, 4096 * context->num_ports); + + free(context->bfs); + for (i = 0; i < MLX5_MAX_UAR_PAGES; ++i) { + if (context->uar[i].regs) + munmap(context->uar[i].regs, page_size); + } + close_debug_file(context); +} + +static struct verbs_device *mlx5_driver_init(const char *uverbs_sys_path, + int abi_version) +{ + char value[8]; + struct mlx5_device *dev; + unsigned vendor, device; + int i; + + if (ibv_read_sysfs_file(uverbs_sys_path, "device/vendor", + value, sizeof value) < 0) + return NULL; + sscanf(value, "%i", &vendor); + + if (ibv_read_sysfs_file(uverbs_sys_path, "device/device", + value, sizeof value) < 0) + return NULL; + sscanf(value, "%i", &device); + + for (i = 0; i < sizeof hca_table / sizeof hca_table[0]; ++i) + if (vendor == hca_table[i].vendor && + device == hca_table[i].device) + goto found; + + return NULL; + +found: + if (abi_version < MLX5_UVERBS_MIN_ABI_VERSION || + abi_version > MLX5_UVERBS_MAX_ABI_VERSION) { + fprintf(stderr, PFX "Fatal: ABI version %d of %s is not supported " + "(min supported %d, max supported %d)\n", + abi_version, uverbs_sys_path, + MLX5_UVERBS_MIN_ABI_VERSION, + MLX5_UVERBS_MAX_ABI_VERSION); + return NULL; + } + + dev = malloc(sizeof *dev); + if (!dev) { + fprintf(stderr, PFX "Fatal: couldn't allocate device for %s\n", + uverbs_sys_path); + return NULL; + } + + dev->page_size = sysconf(_SC_PAGESIZE); + + dev->devid.id = device; + dev->driver_abi_ver = abi_version; + + dev->verbs_dev.sz = sizeof(dev->verbs_dev); + dev->verbs_dev.size_of_context = + sizeof(struct mlx5_context) - sizeof(struct ibv_context); + + /* + * mlx5_init_context will initialize provider calls + */ + dev->verbs_dev.init_context = mlx5_alloc_context; + dev->verbs_dev.uninit_context = mlx5_free_context; + + return &dev->verbs_dev; +} + +#ifdef HAVE_IBV_REGISTER_DRIVER +static __attribute__((constructor)) void mlx5_register_driver(void) +{ + verbs_register_driver("mlx5", mlx5_driver_init); +} +#else +/* + * Export the old libsysfs sysfs_class_device-based driver entry point + * if libibverbs does not export an ibv_register_driver() function. 
+ */ +struct ibv_device *openib_driver_init(struct sysfs_class_device *sysdev) +{ + int abi_ver = 0; + char value[8]; + + if (ibv_read_sysfs_file(sysdev->path, "abi_version", + value, sizeof value) > 0) + abi_ver = strtol(value, NULL, 10); + + return mlx5_driver_init(sysdev->path, abi_ver); +} +#endif /* HAVE_IBV_REGISTER_DRIVER */ Index: contrib/ofed/libmlx5/src/mlx5.map =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/mlx5.map @@ -0,0 +1,5 @@ +{ + global: + openib_driver_init; + local: *; +}; Index: contrib/ofed/libmlx5/src/qp.c =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/qp.c @@ -0,0 +1,2998 @@ +/* + * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include + +#include "mlx5.h" +#include "doorbell.h" +#include "wqe.h" + +enum { + MLX5_OPCODE_BASIC = 0x00010000, + MLX5_OPCODE_MANAGED = 0x00020000, + + MLX5_OPCODE_WITH_IMM = 0x01000000, + MLX5_OPCODE_EXT_ATOMICS = 0x08, +}; + +#define MLX5_IB_OPCODE(op, class, attr) (((class) & 0x00FF0000) | ((attr) & 0xFF000000) | ((op) & 0x0000FFFF)) +#define MLX5_IB_OPCODE_GET_CLASS(opcode) ((opcode) & 0x00FF0000) +#define MLX5_IB_OPCODE_GET_OP(opcode) ((opcode) & 0x0000FFFF) +#define MLX5_IB_OPCODE_GET_ATTR(opcode) ((opcode) & 0xFF000000) + + +static const uint32_t mlx5_ib_opcode[] = { + [IBV_EXP_WR_SEND] = MLX5_IB_OPCODE(MLX5_OPCODE_SEND, MLX5_OPCODE_BASIC, 0), + [IBV_EXP_WR_SEND_WITH_IMM] = MLX5_IB_OPCODE(MLX5_OPCODE_SEND_IMM, MLX5_OPCODE_BASIC, MLX5_OPCODE_WITH_IMM), + [IBV_EXP_WR_RDMA_WRITE] = MLX5_IB_OPCODE(MLX5_OPCODE_RDMA_WRITE, MLX5_OPCODE_BASIC, 0), + [IBV_EXP_WR_RDMA_WRITE_WITH_IMM] = MLX5_IB_OPCODE(MLX5_OPCODE_RDMA_WRITE_IMM, MLX5_OPCODE_BASIC, MLX5_OPCODE_WITH_IMM), + [IBV_EXP_WR_RDMA_READ] = MLX5_IB_OPCODE(MLX5_OPCODE_RDMA_READ, MLX5_OPCODE_BASIC, 0), + [IBV_EXP_WR_ATOMIC_CMP_AND_SWP] = MLX5_IB_OPCODE(MLX5_OPCODE_ATOMIC_CS, MLX5_OPCODE_BASIC, 0), + [IBV_EXP_WR_ATOMIC_FETCH_AND_ADD] = MLX5_IB_OPCODE(MLX5_OPCODE_ATOMIC_FA, MLX5_OPCODE_BASIC, 0), + [IBV_EXP_WR_EXT_MASKED_ATOMIC_CMP_AND_SWP] = MLX5_IB_OPCODE(MLX5_OPCODE_ATOMIC_MASKED_CS, MLX5_OPCODE_BASIC, 0), + [IBV_EXP_WR_EXT_MASKED_ATOMIC_FETCH_AND_ADD] = MLX5_IB_OPCODE(MLX5_OPCODE_ATOMIC_MASKED_FA, MLX5_OPCODE_BASIC, 0), + [IBV_EXP_WR_SEND_ENABLE] = MLX5_IB_OPCODE(MLX5_OPCODE_SEND_ENABLE, MLX5_OPCODE_MANAGED, 0), + [IBV_EXP_WR_RECV_ENABLE] = MLX5_IB_OPCODE(MLX5_OPCODE_RECV_ENABLE, MLX5_OPCODE_MANAGED, 0), + [IBV_EXP_WR_CQE_WAIT] = MLX5_IB_OPCODE(MLX5_OPCODE_CQE_WAIT, MLX5_OPCODE_MANAGED, 0), + [IBV_EXP_WR_NOP] = MLX5_IB_OPCODE(MLX5_OPCODE_NOP, MLX5_OPCODE_BASIC, 0), + [IBV_EXP_WR_UMR_FILL] = MLX5_IB_OPCODE(MLX5_OPCODE_UMR, MLX5_OPCODE_BASIC, 0), + [IBV_EXP_WR_UMR_INVALIDATE] = MLX5_IB_OPCODE(MLX5_OPCODE_UMR, MLX5_OPCODE_BASIC, 0), +}; + +enum { + MLX5_CALC_UINT64_ADD = 0x01, + MLX5_CALC_FLOAT64_ADD = 0x02, + MLX5_CALC_UINT64_MAXLOC = 0x03, + MLX5_CALC_UINT64_AND = 0x04, + MLX5_CALC_UINT64_OR = 0x05, + MLX5_CALC_UINT64_XOR = 0x06 +}; + +static const struct mlx5_calc_op { + int valid; + uint8_t opmod; +} mlx5_calc_ops_table + [IBV_EXP_CALC_DATA_SIZE_NUMBER] + [IBV_EXP_CALC_OP_NUMBER] + [IBV_EXP_CALC_DATA_TYPE_NUMBER] = { + [IBV_EXP_CALC_DATA_SIZE_64_BIT] = { + [IBV_EXP_CALC_OP_ADD] = { + [IBV_EXP_CALC_DATA_TYPE_INT] = { + .valid = 1, + .opmod = MLX5_CALC_UINT64_ADD }, + [IBV_EXP_CALC_DATA_TYPE_UINT] = { + .valid = 1, + .opmod = MLX5_CALC_UINT64_ADD }, + [IBV_EXP_CALC_DATA_TYPE_FLOAT] = { + .valid = 1, + .opmod = MLX5_CALC_FLOAT64_ADD } + }, + [IBV_EXP_CALC_OP_BXOR] = { + [IBV_EXP_CALC_DATA_TYPE_INT] = { + .valid = 1, + .opmod = MLX5_CALC_UINT64_XOR }, + [IBV_EXP_CALC_DATA_TYPE_UINT] = { + .valid = 1, + .opmod = MLX5_CALC_UINT64_XOR }, + [IBV_EXP_CALC_DATA_TYPE_FLOAT] = { + .valid = 1, + .opmod = MLX5_CALC_UINT64_XOR } + }, + [IBV_EXP_CALC_OP_BAND] = { + [IBV_EXP_CALC_DATA_TYPE_INT] = { + .valid = 1, + .opmod = MLX5_CALC_UINT64_AND }, + [IBV_EXP_CALC_DATA_TYPE_UINT] = { + .valid = 1, + .opmod = MLX5_CALC_UINT64_AND }, + [IBV_EXP_CALC_DATA_TYPE_FLOAT] = { + .valid = 1, + .opmod = MLX5_CALC_UINT64_AND } + }, + [IBV_EXP_CALC_OP_BOR] = { + [IBV_EXP_CALC_DATA_TYPE_INT] = { + .valid = 1, + .opmod = MLX5_CALC_UINT64_OR }, + 
[IBV_EXP_CALC_DATA_TYPE_UINT] = { + .valid = 1, + .opmod = MLX5_CALC_UINT64_OR }, + [IBV_EXP_CALC_DATA_TYPE_FLOAT] = { + .valid = 1, + .opmod = MLX5_CALC_UINT64_OR } + }, + [IBV_EXP_CALC_OP_MAXLOC] = { + [IBV_EXP_CALC_DATA_TYPE_UINT] = { + .valid = 1, + .opmod = MLX5_CALC_UINT64_MAXLOC } + } + } +}; + +static inline void set_wait_en_seg(void *wqe_seg, uint32_t obj_num, uint32_t count) +{ + struct mlx5_wqe_wait_en_seg *seg = (struct mlx5_wqe_wait_en_seg *)wqe_seg; + + seg->pi = htonl(count); + seg->obj_num = htonl(obj_num); + + return; +} + +static inline void *get_recv_wqe(struct mlx5_wq *rq, int n) +{ + return rq->buff + (n << rq->wqe_shift); +} + +static int copy_to_scat(struct mlx5_wqe_data_seg *scat, void *buf, int *size, + int max) +{ + int copy; + int i; + + if (unlikely(!(*size))) + return IBV_WC_SUCCESS; + + for (i = 0; i < max; ++i) { + copy = min(*size, ntohl(scat->byte_count)); + memcpy((void *)(unsigned long)ntohll(scat->addr), buf, copy); + *size -= copy; + if (*size == 0) + return IBV_WC_SUCCESS; + + buf += copy; + ++scat; + } + return IBV_WC_LOC_LEN_ERR; +} + +int mlx5_copy_to_recv_wqe(struct mlx5_qp *qp, int idx, void *buf, int size) +{ + struct mlx5_wqe_data_seg *scat; + int max = 1 << (qp->rq.wqe_shift - 4); + + scat = get_recv_wqe(&qp->rq, idx); + if (unlikely(qp->ctrl_seg.wq_sig)) + ++scat; + + return copy_to_scat(scat, buf, &size, max); +} + +static void *mlx5_get_send_wqe(struct mlx5_qp *qp, int n) +{ + return qp->gen_data.sqstart + (n << MLX5_SEND_WQE_SHIFT); +} + +int mlx5_copy_to_send_wqe(struct mlx5_qp *qp, int idx, void *buf, int size) +{ + struct mlx5_wqe_ctrl_seg *ctrl; + struct mlx5_wqe_data_seg *scat; + void *p; + int max; + + idx &= (qp->sq.wqe_cnt - 1); + ctrl = mlx5_get_send_wqe(qp, idx); + if (qp->verbs_qp.qp.qp_type != IBV_QPT_RC) { + fprintf(stderr, "scatter to CQE is supported only for RC QPs\n"); + return IBV_WC_GENERAL_ERR; + } + p = ctrl + 1; + + switch (ntohl(ctrl->opmod_idx_opcode) & 0xff) { + case MLX5_OPCODE_RDMA_READ: + p = p + sizeof(struct mlx5_wqe_raddr_seg); + break; + + case MLX5_OPCODE_ATOMIC_CS: + case MLX5_OPCODE_ATOMIC_FA: + p = p + sizeof(struct mlx5_wqe_raddr_seg) + + sizeof(struct mlx5_wqe_atomic_seg); + break; + + default: + fprintf(stderr, "scatter to CQE for opcode %d\n", + ntohl(ctrl->opmod_idx_opcode) & 0xff); + return IBV_WC_REM_INV_REQ_ERR; + } + + scat = p; + max = (ntohl(ctrl->qpn_ds) & 0x3F) - (((void *)scat - (void *)ctrl) >> 4); + if (unlikely((void *)(scat + max) > qp->gen_data.sqend)) { + int tmp = ((void *)qp->gen_data.sqend - (void *)scat) >> 4; + int orig_size = size; + + if (copy_to_scat(scat, buf, &size, tmp) == IBV_WC_SUCCESS) + return IBV_WC_SUCCESS; + max = max - tmp; + buf += orig_size - size; + scat = mlx5_get_send_wqe(qp, 0); + } + + return copy_to_scat(scat, buf, &size, max); +} + +void mlx5_init_qp_indices(struct mlx5_qp *qp) +{ + qp->sq.head = 0; + qp->sq.tail = 0; + qp->rq.head = 0; + qp->rq.tail = 0; + qp->gen_data.scur_post = 0; + qp->sq_enable.head_en_index = 0; + qp->sq_enable.head_en_count = 0; + qp->rq_enable.head_en_index = 0; + qp->rq_enable.head_en_count = 0; +} + +void mlx5_init_rwq_indices(struct mlx5_rwq *rwq) +{ + rwq->rq.head = 0; + rwq->rq.tail = 0; + rwq->rq_enable.head_en_index = 0; + rwq->rq_enable.head_en_count = 0; +} + +static int __mlx5_wq_overflow(struct mlx5_wq *wq, int nreq, struct mlx5_qp *qp) __attribute__((noinline)); +static int __mlx5_wq_overflow(struct mlx5_wq *wq, int nreq, struct mlx5_qp *qp) +{ + struct mlx5_cq *cq = to_mcq(qp->verbs_qp.qp.send_cq); + unsigned cur; + + + 
mlx5_lock(&cq->lock); + cur = wq->head - wq->tail; + mlx5_unlock(&cq->lock); + + return cur + nreq >= wq->max_post; +} +static inline int mlx5_wq_overflow(struct mlx5_wq *wq, int nreq, struct mlx5_qp *qp) __attribute__((always_inline)); +static inline int mlx5_wq_overflow(struct mlx5_wq *wq, int nreq, struct mlx5_qp *qp) +{ + unsigned cur; + + cur = wq->head - wq->tail; + if (likely(cur + nreq < wq->max_post)) + return 0; + + return __mlx5_wq_overflow(wq, nreq, qp); +} + +static inline void set_raddr_seg(struct mlx5_wqe_raddr_seg *rseg, + uint64_t remote_addr, uint32_t rkey) +{ + rseg->raddr = htonll(remote_addr); + rseg->rkey = htonl(rkey); + rseg->reserved = 0; +} + +static void set_atomic_seg(struct mlx5_wqe_atomic_seg *aseg, + enum ibv_wr_opcode opcode, + uint64_t swap, + uint64_t compare_add) +{ + if (opcode == IBV_WR_ATOMIC_CMP_AND_SWP) { + aseg->swap_add = htonll(swap); + aseg->compare = htonll(compare_add); + } else { + aseg->swap_add = htonll(compare_add); + aseg->compare = 0; + } +} + +static int has_grh(struct mlx5_ah *ah) +{ + return ah->av.base.dqp_dct & ntohl(MLX5_EXTENDED_UD_AV); +} + +static int set_datagram_seg(struct mlx5_wqe_datagram_seg *dseg, + struct ibv_exp_send_wr *wr) +{ + struct mlx5_ah *ah = to_mah(wr->wr.ud.ah); + int size; + + size = has_grh(ah) ? sizeof(ah->av) : sizeof(ah->av.base); + + memcpy(&dseg->av, &to_mah(wr->wr.ud.ah)->av, size); + dseg->av.base.dqp_dct |= htonl(wr->wr.ud.remote_qpn); + dseg->av.base.key.qkey.qkey = htonl(wr->wr.ud.remote_qkey); + + return size; +} + +static int set_dci_seg(struct mlx5_wqe_datagram_seg *dseg, + struct ibv_exp_send_wr *wr) +{ + struct mlx5_ah *ah = to_mah(wr->dc.ah); + int size; + + size = has_grh(ah) ? sizeof(ah->av) : sizeof(ah->av.base); + + memcpy(&dseg->av, &to_mah(wr->dc.ah)->av, size); + dseg->av.base.dqp_dct |= htonl(wr->dc.dct_number); + dseg->av.base.key.dc_key = htonll(wr->dc.dct_access_key); + + return size; +} + +static int set_odp_data_ptr_seg(struct mlx5_wqe_data_seg *dseg, struct ibv_sge *sg, + struct mlx5_qp *qp) __attribute__((noinline)); +static int set_odp_data_ptr_seg(struct mlx5_wqe_data_seg *dseg, struct ibv_sge *sg, + struct mlx5_qp *qp) +{ + uint32_t lkey; + if (sg->lkey == ODP_GLOBAL_R_LKEY) { + if (mlx5_get_real_lkey_from_implicit_lkey(qp->odp_data.pd, &qp->odp_data.pd->r_ilkey, + sg->addr, sg->length, + &lkey)) + return ENOMEM; + } else { + if (mlx5_get_real_lkey_from_implicit_lkey(qp->odp_data.pd, &qp->odp_data.pd->w_ilkey, + sg->addr, sg->length, + &lkey)) + return ENOMEM; + } + + dseg->byte_count = htonl(sg->length); + dseg->lkey = htonl(lkey); + dseg->addr = htonll(sg->addr); + + return 0; +} + +static inline int set_data_ptr_seg(struct mlx5_wqe_data_seg *dseg, struct ibv_sge *sg, + struct mlx5_qp *qp, + int offset) __attribute__((always_inline)); +static inline int set_data_ptr_seg(struct mlx5_wqe_data_seg *dseg, struct ibv_sge *sg, + struct mlx5_qp *qp, + int offset) +{ + if (unlikely(sg->lkey == ODP_GLOBAL_R_LKEY || sg->lkey == ODP_GLOBAL_W_LKEY)) + return set_odp_data_ptr_seg(dseg, sg, qp); + + dseg->byte_count = htonl(sg->length - offset); + dseg->lkey = htonl(sg->lkey); + dseg->addr = htonll(sg->addr + offset); + + return 0; +} + +/* + * Avoid using memcpy() to copy to BlueFlame page, since memcpy() + * implementations may use move-string-buffer assembler instructions, + * which do not guarantee order of copying. 
+ */ +#if defined(__x86_64__) +#define COPY_64B_NT(dst, src) \ + __asm__ __volatile__ ( \ + " movdqa (%1),%%xmm0\n" \ + " movdqa 16(%1),%%xmm1\n" \ + " movdqa 32(%1),%%xmm2\n" \ + " movdqa 48(%1),%%xmm3\n" \ + " movntdq %%xmm0, (%0)\n" \ + " movntdq %%xmm1, 16(%0)\n" \ + " movntdq %%xmm2, 32(%0)\n" \ + " movntdq %%xmm3, 48(%0)\n" \ + : : "r" (dst), "r" (src) : "memory"); \ + dst += 8; \ + src += 8 +#else +#define COPY_64B_NT(dst, src) \ + *dst++ = *src++; \ + *dst++ = *src++; \ + *dst++ = *src++; \ + *dst++ = *src++; \ + *dst++ = *src++; \ + *dst++ = *src++; \ + *dst++ = *src++; \ + *dst++ = *src++ + +#endif +static void mlx5_bf_copy(unsigned long long *dst, unsigned long long *src, + unsigned bytecnt, struct mlx5_qp *qp) +{ + while (bytecnt > 0) { + COPY_64B_NT(dst, src); + bytecnt -= 8 * sizeof(unsigned long long); + if (unlikely(src == qp->gen_data.sqend)) + src = qp->gen_data.sqstart; + } +} + +static inline void mlx5_write_db(unsigned long long *dst, unsigned long long *src) +{ + *dst = *src; +} + +static uint32_t send_ieth(struct ibv_exp_send_wr *wr) +{ + return MLX5_IB_OPCODE_GET_ATTR(mlx5_ib_opcode[wr->exp_opcode]) & + MLX5_OPCODE_WITH_IMM ? + wr->ex.imm_data : 0; +} + +static inline int set_data_inl_seg(struct mlx5_qp *qp, int num_sge, struct ibv_sge *sg_list, + void *wqe, int *sz, + int idx, int offset) __attribute__((always_inline)); +static inline int set_data_inl_seg(struct mlx5_qp *qp, int num_sge, struct ibv_sge *sg_list, + void *wqe, int *sz, int idx, int offset) +{ + struct mlx5_wqe_inline_seg *seg; + void *addr; + int len; + int i; + int inl = 0; + void *qend = qp->gen_data.sqend; + int copy; +#ifdef MLX5_DEBUG + FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp; +#endif + + seg = wqe; + wqe += sizeof *seg; + + for (i = idx; i < num_sge; ++i) { + addr = (void *) (unsigned long)(sg_list[i].addr + offset); + len = sg_list[i].length - offset; + inl += len; + offset = 0; + + if (unlikely(inl > qp->data_seg.max_inline_data)) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "inline layout failed, err %d\n", ENOMEM); + return ENOMEM; + } + + if (unlikely(wqe + len > qend)) { + copy = qend - wqe; + memcpy(wqe, addr, copy); + addr += copy; + len -= copy; + wqe = mlx5_get_send_wqe(qp, 0); + } + memcpy(wqe, addr, len); + wqe += len; + } + + if (likely(inl)) { + seg->byte_count = htonl(inl | MLX5_INLINE_SEG); + *sz += align(inl + sizeof(seg->byte_count), 16) / 16; + } + + return 0; +} + +static inline int set_data_non_inl_seg(struct mlx5_qp *qp, int num_sge, struct ibv_sge *sg_list, + void *wqe, int *sz, + int idx, int offset) __attribute__((always_inline)); +static inline int set_data_non_inl_seg(struct mlx5_qp *qp, int num_sge, struct ibv_sge *sg_list, + void *wqe, int *sz, + int idx, int offset) +{ + struct mlx5_wqe_data_seg *dpseg = wqe; + struct ibv_sge *psge; + int i; +#ifdef MLX5_DEBUG + FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp; +#endif + + for (i = idx; i < num_sge; ++i) { + if (unlikely(dpseg == qp->gen_data.sqend)) + dpseg = mlx5_get_send_wqe(qp, 0); + + if (likely(sg_list[i].length)) { + psge = sg_list + i; + + if (unlikely(set_data_ptr_seg(dpseg, psge, qp, + offset))) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "failed allocating memory for implicit lkey structure\n"); + return ENOMEM; + } + ++dpseg; + offset = 0; + *sz += sizeof(struct mlx5_wqe_data_seg) / 16; + } + } + + return 0; +} + +static int set_data_atom_seg(struct mlx5_qp *qp, int num_sge, struct ibv_sge *sg_list, + void *wqe, int *sz, int atom_arg) __MLX5_ALGN_F__; +static int set_data_atom_seg(struct mlx5_qp *qp, int 
num_sge, struct ibv_sge *sg_list, + void *wqe, int *sz, int atom_arg) +{ + struct mlx5_wqe_data_seg *dpseg = wqe; + struct ibv_sge *psge; + struct ibv_sge sge; + int i; +#ifdef MLX5_DEBUG + FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp; +#endif + + for (i = 0; i < num_sge; ++i) { + if (unlikely(dpseg == qp->gen_data.sqend)) + dpseg = mlx5_get_send_wqe(qp, 0); + + if (likely(sg_list[i].length)) { + sge = sg_list[i]; + sge.length = atom_arg; + psge = &sge; + if (unlikely(set_data_ptr_seg(dpseg, psge, qp, 0))) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "failed allocating memory for implicit lkey structure\n"); + return ENOMEM; + } + ++dpseg; + *sz += sizeof(struct mlx5_wqe_data_seg) / 16; + } + } + + return 0; +} + +static inline int set_data_seg(struct mlx5_qp *qp, void *seg, int *sz, int is_inl, + int num_sge, struct ibv_sge *sg_list, int atom_arg, + int idx, int offset) __attribute__((always_inline)); +static inline int set_data_seg(struct mlx5_qp *qp, void *seg, int *sz, int is_inl, + int num_sge, struct ibv_sge *sg_list, int atom_arg, + int idx, int offset) +{ + if (is_inl) + return set_data_inl_seg(qp, num_sge, sg_list, seg, sz, idx, + offset); + if (unlikely(atom_arg)) + return set_data_atom_seg(qp, num_sge, sg_list, seg, sz, atom_arg); + + return set_data_non_inl_seg(qp, num_sge, sg_list, seg, sz, idx, offset); +} + +#ifdef MLX5_DEBUG +void dump_wqe(FILE *fp, int idx, int size_16, struct mlx5_qp *qp) +{ + uint32_t *uninitialized_var(p); + int i, j; + int tidx = idx; + + fprintf(fp, "dump wqe at %p\n", mlx5_get_send_wqe(qp, tidx)); + for (i = 0, j = 0; i < size_16 * 4; i += 4, j += 4) { + if ((i & 0xf) == 0) { + void *buf = mlx5_get_send_wqe(qp, tidx); + tidx = (tidx + 1) & (qp->sq.wqe_cnt - 1); + p = buf; + j = 0; + } + fprintf(fp, "%08x %08x %08x %08x\n", ntohl(p[j]), ntohl(p[j + 1]), + ntohl(p[j + 2]), ntohl(p[j + 3])); + } +} +#endif /* MLX5_DEBUG */ + + +void *mlx5_get_atomic_laddr(struct mlx5_qp *qp, uint16_t idx, int *byte_count) +{ + struct mlx5_wqe_data_seg *dpseg; + void *addr; + + dpseg = mlx5_get_send_wqe(qp, idx) + sizeof(struct mlx5_wqe_ctrl_seg) + + sizeof(struct mlx5_wqe_raddr_seg) + + sizeof(struct mlx5_wqe_atomic_seg); + addr = (void *)(unsigned long)ntohll(dpseg->addr); + + /* + * Currently byte count is always 8 bytes. 
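A small illustration of the fixed-offset arithmetic in mlx5_get_atomic_laddr(): the data segment holding the local address of an atomic response always follows the control, remote-address and atomic segments. The sizes below are representative placeholders (each segment is a multiple of the 16-byte "ds" unit), not values taken from the real headers.

#include <stddef.h>

enum {
	DEMO_CTRL_SEG_SZ   = 16,	/* placeholder segment sizes */
	DEMO_RADDR_SEG_SZ  = 16,
	DEMO_ATOMIC_SEG_SZ = 16,
};

/* Offset of the local-address data segment within an atomic WQE. */
static inline size_t demo_atomic_laddr_offset(void)
{
	return DEMO_CTRL_SEG_SZ + DEMO_RADDR_SEG_SZ + DEMO_ATOMIC_SEG_SZ;
}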
Fix this when + * we support variable size of atomics + */ + *byte_count = 8; + return addr; +} + +static int ext_cmp_swp(struct mlx5_qp *qp, void *seg, + struct ibv_exp_send_wr *wr) +{ + struct ibv_exp_cmp_swap *cs = &wr->ext_op.masked_atomics.wr_data.inline_data.op.cmp_swap; + int arg_sz = 1 << wr->ext_op.masked_atomics.log_arg_sz; + uint32_t *p32 = seg; + uint64_t *p64 = seg; + int i; + + if (arg_sz == 4) { + *p32 = htonl((uint32_t)cs->swap_val); + p32++; + *p32 = htonl((uint32_t)cs->compare_val); + p32++; + *p32 = htonl((uint32_t)cs->swap_mask); + p32++; + *p32 = htonl((uint32_t)cs->compare_mask); + return 16; + } else if (arg_sz == 8) { + *p64 = htonll(cs->swap_val); + p64++; + *p64 = htonll(cs->compare_val); + p64++; + if (unlikely(p64 == qp->gen_data.sqend)) + p64 = mlx5_get_send_wqe(qp, 0); + *p64 = htonll(cs->swap_mask); + p64++; + *p64 = htonll(cs->compare_mask); + return 32; + } else { + for (i = 0; i < arg_sz; i += 8, p64++) { + if (unlikely(p64 == qp->gen_data.sqend)) + p64 = mlx5_get_send_wqe(qp, 0); + *p64 = htonll(*(uint64_t *)(uintptr_t)(cs->swap_val + i)); + } + + for (i = 0; i < arg_sz; i += 8, p64++) { + if (unlikely(p64 == qp->gen_data.sqend)) + p64 = mlx5_get_send_wqe(qp, 0); + *p64 = htonll(*(uint64_t *)(uintptr_t)(cs->compare_val + i)); + } + + for (i = 0; i < arg_sz; i += 8, p64++) { + if (unlikely(p64 == qp->gen_data.sqend)) + p64 = mlx5_get_send_wqe(qp, 0); + *p64 = htonll(*(uint64_t *)(uintptr_t)(cs->swap_mask + i)); + } + + for (i = 0; i < arg_sz; i += 8, p64++) { + if (unlikely(p64 == qp->gen_data.sqend)) + p64 = mlx5_get_send_wqe(qp, 0); + *p64 = htonll(*(uint64_t *)(uintptr_t)(cs->compare_mask + i)); + } + return 4 * arg_sz; + } +} + +static int ext_fetch_add(struct mlx5_qp *qp, void *seg, + struct ibv_exp_send_wr *wr) +{ + struct ibv_exp_fetch_add *fa = &wr->ext_op.masked_atomics.wr_data.inline_data.op.fetch_add; + int arg_sz = 1 << wr->ext_op.masked_atomics.log_arg_sz; + uint32_t *p32 = seg; + uint64_t *p64 = seg; + int i; + + if (arg_sz == 4) { + *p32 = htonl((uint32_t)fa->add_val); + p32++; + *p32 = htonl((uint32_t)fa->field_boundary); + p32++; + *p32 = htonl(0); + p32++; + *p32 = htonl(0); + return 16; + } else if (arg_sz == 8) { + *p64 = htonll(fa->add_val); + p64++; + *p64 = htonll(fa->field_boundary); + return 16; + } else { + for (i = 0; i < arg_sz; i += 8, p64++) { + if (unlikely(p64 == qp->gen_data.sqend)) + p64 = mlx5_get_send_wqe(qp, 0); + *p64 = htonll(*(uint64_t *)(uintptr_t)(fa->add_val + i)); + } + + for (i = 0; i < arg_sz; i += 8, p64++) { + if (unlikely(p64 == qp->gen_data.sqend)) + p64 = mlx5_get_send_wqe(qp, 0); + *p64 = htonll(*(uint64_t *)(uintptr_t)(fa->field_boundary + i)); + } + + return 2 * arg_sz; + } +} + +static int set_ext_atomic_seg(struct mlx5_qp *qp, void *seg, + struct ibv_exp_send_wr *wr) +{ + /* currently only inline is supported */ + if (unlikely(!(wr->exp_send_flags & IBV_EXP_SEND_EXT_ATOMIC_INLINE))) + return -1; + + if (unlikely((1 << wr->ext_op.masked_atomics.log_arg_sz) > qp->max_atomic_arg)) + return -1; + + if (wr->exp_opcode == IBV_EXP_WR_EXT_MASKED_ATOMIC_CMP_AND_SWP) + return ext_cmp_swp(qp, seg, wr); + else if (wr->exp_opcode == IBV_EXP_WR_EXT_MASKED_ATOMIC_FETCH_AND_ADD) + return ext_fetch_add(qp, seg, wr); + else + return -1; +} + +enum { + MLX5_UMR_CTRL_INLINE = 1 << 7, +}; + +static uint64_t umr_mask(int fill) +{ + uint64_t mask; + + if (fill) + mask = MLX5_MKEY_MASK_LEN | + MLX5_MKEY_MASK_START_ADDR | + MLX5_MKEY_MASK_LR | + MLX5_MKEY_MASK_LW | + MLX5_MKEY_MASK_RR | + MLX5_MKEY_MASK_RW | + 
MLX5_MKEY_MASK_FREE | + MLX5_MKEY_MASK_A; + else + mask = MLX5_MKEY_MASK_FREE; + + return mask; +} + +static void set_umr_ctrl_seg(struct ibv_exp_send_wr *wr, + struct mlx5_wqe_umr_ctrl_seg *seg) +{ + int fill = wr->exp_opcode == IBV_EXP_WR_UMR_FILL ? 1 : 0; + + memset(seg, 0, sizeof(*seg)); + + if (wr->exp_send_flags & IBV_EXP_SEND_INLINE || !fill) + seg->flags = MLX5_UMR_CTRL_INLINE; + + seg->mkey_mask = htonll(umr_mask(fill)); +} + +static int lay_umr(struct mlx5_qp *qp, struct ibv_exp_send_wr *wr, + void *seg, int *wqe_size, int *xlat_size, + uint64_t *reglen) +{ + enum ibv_exp_umr_wr_type type = wr->ext_op.umr.umr_type; + struct ibv_exp_mem_region *mlist; + struct ibv_exp_mem_repeat_block *rep; + struct mlx5_wqe_data_seg *dseg; + struct mlx5_seg_repeat_block *rb; + struct mlx5_seg_repeat_ent *re; + struct mlx5_klm_buf *klm = NULL; + void *qend = qp->gen_data.sqend; + int i; + int j; + int n; + int byte_count = 0; + int inl = wr->exp_send_flags & IBV_EXP_SEND_INLINE; + void *buf; + int tmp; + + if (inl) { + if (unlikely(qp->max_inl_send_klms < + wr->ext_op.umr.num_mrs)) + return EINVAL; + buf = seg; + } else { + klm = to_klm(wr->ext_op.umr.memory_objects); + buf = klm->align_buf; + } + + *reglen = 0; + n = wr->ext_op.umr.num_mrs; + if (type == IBV_EXP_UMR_MR_LIST) { + mlist = wr->ext_op.umr.mem_list.mem_reg_list; + dseg = buf; + + for (i = 0, j = 0; i < n; i++, j++) { + if (inl && unlikely((&dseg[j] == qend))) { + dseg = mlx5_get_send_wqe(qp, 0); + j = 0; + } + + dseg[j].addr = htonll((uint64_t)(uintptr_t)mlist[i].base_addr); + dseg[j].lkey = htonl(mlist[i].mr->lkey); + dseg[j].byte_count = htonl(mlist[i].length); + byte_count += mlist[i].length; + } + if (inl) + *wqe_size = align(n * sizeof(*dseg), 64); + else + *wqe_size = 0; + + *reglen = byte_count; + *xlat_size = n * sizeof(*dseg); + } else { + rep = wr->ext_op.umr.mem_list.rb.mem_repeat_block_list; + rb = buf; + rb->const_0x400 = htonl(0x400); + rb->reserved = 0; + rb->num_ent = htons(n); + re = rb->entries; + rb->repeat_count = htonl(wr->ext_op.umr.mem_list.rb.repeat_count[0]); + + if (unlikely(wr->ext_op.umr.mem_list.rb.stride_dim != 1)) { + fprintf(stderr, "dimention must be 1\n"); + return -ENOMEM; + } + + + for (i = 0, j = 0; i < n; i++, j++, rep++, re++) { + if (inl && unlikely((re == qend))) + re = mlx5_get_send_wqe(qp, 0); + + byte_count += rep->byte_count[0]; + re->va = htonll(rep->base_addr); + re->byte_count = htons(rep->byte_count[0]); + re->stride = htons(rep->stride[0]); + re->memkey = htonl(rep->mr->lkey); + } + rb->byte_count = htonl(byte_count); + *reglen = byte_count * ntohl(rb->repeat_count); + tmp = align((n + 1), 4) - n - 1; + memset(re, 0, tmp * sizeof(*re)); + if (inl) { + *wqe_size = align(sizeof(*rb) + sizeof(*re) * n, 64); + *xlat_size = (n + 1) * sizeof(*re); + } else { + *wqe_size = 0; + *xlat_size = (n + 1) * sizeof(*re); + } + } + return 0; +} + +static void *adjust_seg(struct mlx5_qp *qp, void *seg) +{ + return mlx5_get_send_wqe(qp, 0) + (seg - qp->gen_data.sqend); +} + +static uint8_t get_umr_flags(int acc) +{ + return (acc & IBV_ACCESS_REMOTE_ATOMIC ? MLX5_PERM_ATOMIC : 0) | + (acc & IBV_ACCESS_REMOTE_WRITE ? MLX5_PERM_REMOTE_WRITE : 0) | + (acc & IBV_ACCESS_REMOTE_READ ? MLX5_PERM_REMOTE_READ : 0) | + (acc & IBV_ACCESS_LOCAL_WRITE ? 
MLX5_PERM_LOCAL_WRITE : 0) | + MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN; +} + +static void set_mkey_seg(struct ibv_exp_send_wr *wr, struct mlx5_mkey_seg *seg) +{ + memset(seg, 0, sizeof(*seg)); + if (wr->exp_opcode != IBV_EXP_WR_UMR_FILL) { + seg->status = 1 << 6; + return; + } + + seg->flags = get_umr_flags(wr->ext_op.umr.exp_access); + seg->start_addr = htonll(wr->ext_op.umr.base_addr); + seg->qpn_mkey7_0 = htonl(0xffffff00 | (wr->ext_op.umr.modified_mr->lkey & 0xff)); +} + +static uint8_t get_fence(uint8_t fence, struct ibv_exp_send_wr *wr) +{ + if (unlikely(wr->exp_opcode == IBV_EXP_WR_LOCAL_INV && + wr->exp_send_flags & IBV_EXP_SEND_FENCE)) + return MLX5_FENCE_MODE_STRONG_ORDERING; + + if (unlikely(fence)) { + if (wr->exp_send_flags & IBV_EXP_SEND_FENCE) + return MLX5_FENCE_MODE_SMALL_AND_FENCE; + else + return fence; + + } else { + return 0; + } +} + +void mlx5_build_ctrl_seg_data(struct mlx5_qp *qp, uint32_t qp_num) +{ + uint8_t *tbl = qp->ctrl_seg.fm_ce_se_tbl; + uint8_t *acc = qp->ctrl_seg.fm_ce_se_acc; + int i; + + tbl[0 | 0 | 0] = (0 | 0 | 0); + tbl[0 | 0 | IBV_SEND_FENCE] = (0 | 0 | MLX5_WQE_CTRL_FENCE); + tbl[0 | IBV_SEND_SIGNALED | 0] = (0 | MLX5_WQE_CTRL_CQ_UPDATE | 0); + tbl[0 | IBV_SEND_SIGNALED | IBV_SEND_FENCE] = (0 | MLX5_WQE_CTRL_CQ_UPDATE | MLX5_WQE_CTRL_FENCE); + tbl[IBV_SEND_SOLICITED | 0 | 0] = (MLX5_WQE_CTRL_SOLICITED | 0 | 0); + tbl[IBV_SEND_SOLICITED | 0 | IBV_SEND_FENCE] = (MLX5_WQE_CTRL_SOLICITED | 0 | MLX5_WQE_CTRL_FENCE); + tbl[IBV_SEND_SOLICITED | IBV_SEND_SIGNALED | 0] = (MLX5_WQE_CTRL_SOLICITED | MLX5_WQE_CTRL_CQ_UPDATE | 0); + tbl[IBV_SEND_SOLICITED | IBV_SEND_SIGNALED | IBV_SEND_FENCE] = (MLX5_WQE_CTRL_SOLICITED | MLX5_WQE_CTRL_CQ_UPDATE | MLX5_WQE_CTRL_FENCE); + for (i = 0; i < 8; i++) + tbl[i] = qp->sq_signal_bits | tbl[i]; + + memset(acc, 0, sizeof(qp->ctrl_seg.fm_ce_se_acc)); + acc[0 | 0 | 0] = (0 | 0 | 0); + acc[0 | 0 | IBV_EXP_QP_BURST_FENCE] = (0 | 0 | MLX5_WQE_CTRL_FENCE); + acc[0 | IBV_EXP_QP_BURST_SIGNALED | 0] = (0 | MLX5_WQE_CTRL_CQ_UPDATE | 0); + acc[0 | IBV_EXP_QP_BURST_SIGNALED | IBV_EXP_QP_BURST_FENCE] = (0 | MLX5_WQE_CTRL_CQ_UPDATE | MLX5_WQE_CTRL_FENCE); + acc[IBV_EXP_QP_BURST_SOLICITED | 0 | 0] = (MLX5_WQE_CTRL_SOLICITED | 0 | 0); + acc[IBV_EXP_QP_BURST_SOLICITED | 0 | IBV_EXP_QP_BURST_FENCE] = (MLX5_WQE_CTRL_SOLICITED | 0 | MLX5_WQE_CTRL_FENCE); + acc[IBV_EXP_QP_BURST_SOLICITED | IBV_EXP_QP_BURST_SIGNALED | 0] = (MLX5_WQE_CTRL_SOLICITED | MLX5_WQE_CTRL_CQ_UPDATE | 0); + acc[IBV_EXP_QP_BURST_SOLICITED | IBV_EXP_QP_BURST_SIGNALED | IBV_EXP_QP_BURST_FENCE] = (MLX5_WQE_CTRL_SOLICITED | MLX5_WQE_CTRL_CQ_UPDATE | MLX5_WQE_CTRL_FENCE); + for (i = 0; i < 32; i++) + acc[i] = qp->sq_signal_bits | acc[i]; + + qp->ctrl_seg.qp_num = qp_num; +} + +static inline void set_ctrl_seg(uint32_t *start, struct ctrl_seg_data *ctrl_seg, + uint8_t opcode, uint16_t idx, uint8_t opmod, + uint8_t size, uint8_t fm_ce_se, uint32_t imm_invk_umrk) +{ + *start++ = htonl(opmod << 24 | idx << 8 | opcode); + *start++ = htonl(ctrl_seg->qp_num << 8 | (size & 0x3F)); + *start++ = htonl(fm_ce_se); + *start = imm_invk_umrk; +} + +static inline void set_ctrl_seg_sig(uint32_t *start, struct ctrl_seg_data *ctrl_seg, + uint8_t opcode, uint16_t idx, uint8_t opmod, + uint8_t size, uint8_t fm_ce_se, uint32_t imm_invk_umrk) +{ + set_ctrl_seg(start, ctrl_seg, opcode, idx, opmod, size, fm_ce_se, imm_invk_umrk); + + if (unlikely(ctrl_seg->wq_sig)) + *(start + 2) = htonl(~calc_xor(start, size << 4) << 24 | fm_ce_se); +} + +static int __mlx5_post_send_one_other(struct ibv_exp_send_wr 
*wr, + struct mlx5_qp *qp, uint64_t exp_send_flags, void *seg, int *total_size) +{ + void *ctrl = seg; + int err = 0; + int size = 0; + int num_sge = wr->num_sge; + uint8_t fm_ce_se; +#ifdef MLX5_DEBUG + FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp; +#endif + + if (unlikely(((MLX5_IB_OPCODE_GET_CLASS(mlx5_ib_opcode[wr->exp_opcode]) == MLX5_OPCODE_MANAGED) || + (exp_send_flags & IBV_EXP_SEND_WITH_CALC)) && + !(qp->gen_data.create_flags & IBV_EXP_QP_CREATE_CROSS_CHANNEL))) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "unsupported cross-channel functionality\n"); + return EINVAL; + } + + seg += sizeof(struct mlx5_wqe_ctrl_seg); + size = sizeof(struct mlx5_wqe_ctrl_seg) / 16; + + err = set_data_seg(qp, seg, &size, + !!(exp_send_flags & IBV_EXP_SEND_INLINE), + num_sge, wr->sg_list, 0, 0, 0); + if (unlikely(err)) + return err; + + fm_ce_se = qp->ctrl_seg.fm_ce_se_tbl[exp_send_flags & + (IBV_SEND_SOLICITED | + IBV_SEND_SIGNALED | + IBV_SEND_FENCE)]; + fm_ce_se |= get_fence(qp->gen_data.fm_cache, wr); + set_ctrl_seg_sig(ctrl, &qp->ctrl_seg, + MLX5_IB_OPCODE_GET_OP(mlx5_ib_opcode[wr->exp_opcode]), + qp->gen_data.scur_post, 0, size, fm_ce_se, + send_ieth(wr)); + + qp->gen_data.fm_cache = 0; + *total_size = size; + + return 0; +} + +static int __mlx5_post_send_one_raw_packet(struct ibv_exp_send_wr *wr, + struct mlx5_qp *qp, + uint64_t exp_send_flags, void *seg, + int *total_size) __MLX5_ALGN_F__; + +static int __mlx5_post_send_one_raw_packet(struct ibv_exp_send_wr *wr, + struct mlx5_qp *qp, + uint64_t exp_send_flags, void *seg, + int *total_size) +{ + void *ctrl = seg; + struct mlx5_wqe_eth_seg *eseg; + int err = 0; + int size = 0; + int num_sge = wr->num_sge; + int inl_hdr_size = MLX5_ETH_INLINE_HEADER_SIZE; + int inl_hdr_copy_size = 0; + int i = 0; + uint8_t fm_ce_se; +#ifdef MLX5_DEBUG + FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp; +#endif + + seg += sizeof(struct mlx5_wqe_ctrl_seg); + size = sizeof(struct mlx5_wqe_ctrl_seg) / 16; + + eseg = seg; + *((uint64_t *)eseg) = 0; + eseg->rsvd2 = 0; + + if (exp_send_flags & IBV_EXP_SEND_IP_CSUM) + eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM; + + /* The first 16 bytes of the headers should be copied to the + * inline-headers of the ETH segment. + */ + if (likely(wr->sg_list[0].length >= MLX5_ETH_INLINE_HEADER_SIZE)) { + inl_hdr_copy_size = MLX5_ETH_INLINE_HEADER_SIZE; + memcpy(eseg->inline_hdr_start, + (void *)(uintptr_t)wr->sg_list[0].addr, + inl_hdr_copy_size); + } else { + for (i = 0; i < num_sge && inl_hdr_size > 0; ++i) { + inl_hdr_copy_size = min(wr->sg_list[i].length, + inl_hdr_size); + memcpy(eseg->inline_hdr_start + + (MLX5_ETH_INLINE_HEADER_SIZE - inl_hdr_size), + (void *)(uintptr_t)wr->sg_list[i].addr, + inl_hdr_copy_size); + inl_hdr_size -= inl_hdr_copy_size; + } + --i; + if (unlikely(inl_hdr_size)) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "Ethernet headers < 16 bytes\n"); + return EINVAL; + } + } + + seg += sizeof(struct mlx5_wqe_eth_seg); + size += sizeof(struct mlx5_wqe_eth_seg) / 16; + eseg->inline_hdr_sz = htons(MLX5_ETH_INLINE_HEADER_SIZE); + + /* If we copied all the sge into the inline-headers, then we need to + * start copying from the next sge into the data-segment. 
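A compact sketch, with hypothetical demo_* names, of the header-gather loop above: the first 16 bytes of the frame are collected from however many leading SGEs it takes and copied into the ETH segment's inline header area, and the operation fails if the scatter list holds fewer than 16 bytes overall.

#include <stdint.h>
#include <string.h>

#define DEMO_ETH_INLINE_HDR 16		/* mirrors MLX5_ETH_INLINE_HEADER_SIZE */

struct demo_sge { uint64_t addr; uint32_t length; };

static int demo_gather_eth_headers(uint8_t dst[DEMO_ETH_INLINE_HDR],
				   const struct demo_sge *sgl, int num_sge)
{
	uint32_t left = DEMO_ETH_INLINE_HDR;

	for (int i = 0; i < num_sge && left; i++) {
		uint32_t copy = sgl[i].length < left ? sgl[i].length : left;

		memcpy(dst + (DEMO_ETH_INLINE_HDR - left),
		       (const void *)(uintptr_t)sgl[i].addr, copy);
		left -= copy;
	}
	return left ? -1 : 0;	/* -1: headers shorter than 16 bytes */
}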
+ */ + if (unlikely(wr->sg_list[i].length == inl_hdr_copy_size)) { + ++i; + inl_hdr_copy_size = 0; + } + + /* The copied headers should be excluded from the data segment */ + err = set_data_seg(qp, seg, &size, + !!(exp_send_flags & IBV_EXP_SEND_INLINE), + num_sge, wr->sg_list, 0, i, inl_hdr_copy_size); + + if (unlikely(err)) + return err; + + fm_ce_se = qp->ctrl_seg.fm_ce_se_tbl[exp_send_flags & + (IBV_SEND_SOLICITED | + IBV_SEND_SIGNALED | + IBV_SEND_FENCE)]; + fm_ce_se |= get_fence(qp->gen_data.fm_cache, wr); + set_ctrl_seg_sig(ctrl, &qp->ctrl_seg, + MLX5_IB_OPCODE_GET_OP(mlx5_ib_opcode[wr->exp_opcode]), + qp->gen_data.scur_post, 0, size, fm_ce_se, + send_ieth(wr)); + + qp->gen_data.fm_cache = 0; + *total_size = size; + + return 0; +} + +static int __mlx5_post_send_one_uc_ud(struct ibv_exp_send_wr *wr, + struct mlx5_qp *qp, uint64_t exp_send_flags, void *seg, int *total_size) __MLX5_ALGN_F__; +static int __mlx5_post_send_one_uc_ud(struct ibv_exp_send_wr *wr, + struct mlx5_qp *qp, uint64_t exp_send_flags, void *seg, int *total_size) +{ + void *ctrl = seg; + int err = 0; + int size = 0; + int num_sge = wr->num_sge; + uint8_t fm_ce_se; + int tmp; +#ifdef MLX5_DEBUG + FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp; +#endif + + + if (unlikely(((MLX5_IB_OPCODE_GET_CLASS(mlx5_ib_opcode[wr->exp_opcode]) == MLX5_OPCODE_MANAGED) || + (exp_send_flags & IBV_EXP_SEND_WITH_CALC)) && + !(qp->gen_data.create_flags & IBV_EXP_QP_CREATE_CROSS_CHANNEL))) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "unsupported cross-channel functionality\n"); + return EINVAL; + } + + seg += sizeof(struct mlx5_wqe_ctrl_seg); + size = sizeof(struct mlx5_wqe_ctrl_seg) / 16; + + switch (qp->gen_data_warm.qp_type) { + case IBV_QPT_UC: + switch (wr->exp_opcode) { + case IBV_WR_RDMA_WRITE: + case IBV_WR_RDMA_WRITE_WITH_IMM: + set_raddr_seg(seg, wr->wr.rdma.remote_addr, + wr->wr.rdma.rkey); + seg += sizeof(struct mlx5_wqe_raddr_seg); + size += sizeof(struct mlx5_wqe_raddr_seg) / 16; + break; + + default: + break; + } + break; + + case IBV_QPT_UD: + tmp = set_datagram_seg(seg, wr); + seg += tmp; + size += (tmp >> 4); + if (unlikely((seg == qp->gen_data.sqend))) + seg = mlx5_get_send_wqe(qp, 0); + break; + + default: + break; + } + + err = set_data_seg(qp, seg, &size, !!(exp_send_flags & IBV_EXP_SEND_INLINE), + num_sge, wr->sg_list, 0, 0, 0); + if (unlikely(err)) + return err; + + fm_ce_se = qp->ctrl_seg.fm_ce_se_tbl[exp_send_flags & (IBV_SEND_SOLICITED | IBV_SEND_SIGNALED | IBV_SEND_FENCE)]; + fm_ce_se |= get_fence(qp->gen_data.fm_cache, wr); + set_ctrl_seg_sig(ctrl, &qp->ctrl_seg, MLX5_IB_OPCODE_GET_OP(mlx5_ib_opcode[wr->exp_opcode]), + qp->gen_data.scur_post, 0, size, fm_ce_se, send_ieth(wr)); + + qp->gen_data.fm_cache = 0; + *total_size = size; + + return 0; +} +static int __mlx5_post_send_one_rc_dc(struct ibv_exp_send_wr *wr, + struct mlx5_qp *qp, uint64_t exp_send_flags, void *seg, int *total_size) __MLX5_ALGN_F__; +static int __mlx5_post_send_one_rc_dc(struct ibv_exp_send_wr *wr, + struct mlx5_qp *qp, uint64_t exp_send_flags, void *seg, int *total_size) +{ + struct mlx5_klm_buf *klm; + void *ctrl = seg; + struct ibv_qp *ibqp = &qp->verbs_qp.qp; + struct mlx5_context *ctx = to_mctx(ibqp->context); + int err = 0; + int size = 0; + uint8_t opmod = 0; + void *qend = qp->gen_data.sqend; + uint32_t mlx5_opcode; + struct mlx5_wqe_xrc_seg *xrc; + int tmp = 0; + int num_sge = wr->num_sge; + uint8_t next_fence = 0; + struct mlx5_wqe_umr_ctrl_seg *umr_ctrl; + int xlat_size; + struct mlx5_mkey_seg *mk; + int wqe_sz; + uint64_t reglen; + int 
atom_arg = 0; + uint8_t fm_ce_se; + uint32_t imm; +#ifdef MLX5_DEBUG + FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp; +#endif + + + if (unlikely(((MLX5_IB_OPCODE_GET_CLASS(mlx5_ib_opcode[wr->exp_opcode]) == MLX5_OPCODE_MANAGED) || + (exp_send_flags & IBV_EXP_SEND_WITH_CALC)) && + !(qp->gen_data.create_flags & IBV_EXP_QP_CREATE_CROSS_CHANNEL))) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "unsupported cross-channel functionality\n"); + return EINVAL; + } + + mlx5_opcode = MLX5_IB_OPCODE_GET_OP(mlx5_ib_opcode[wr->exp_opcode]); + imm = send_ieth(wr); + + seg += sizeof(struct mlx5_wqe_ctrl_seg); + size = sizeof(struct mlx5_wqe_ctrl_seg) / 16; + + switch (qp->gen_data_warm.qp_type) { + case IBV_QPT_XRC_SEND: + case IBV_QPT_XRC: + case IBV_EXP_QPT_DC_INI: + if (qp->gen_data_warm.qp_type == IBV_EXP_QPT_DC_INI) { + if (likely(wr->exp_opcode != IBV_EXP_WR_NOP)) + tmp = set_dci_seg(seg, wr); + seg += tmp; + size += (tmp >> 4); + if (unlikely((seg == qend))) + seg = mlx5_get_send_wqe(qp, 0); + + } else { + xrc = seg; + xrc->xrc_srqn = htonl(wr->qp_type.xrc.remote_srqn); + seg += sizeof(*xrc); + size += sizeof(*xrc) / 16; + } + /* fall through */ + case IBV_QPT_RC: + switch (wr->exp_opcode) { + case IBV_EXP_WR_RDMA_READ: + case IBV_EXP_WR_RDMA_WRITE: + case IBV_EXP_WR_RDMA_WRITE_WITH_IMM: + if (unlikely(exp_send_flags & IBV_EXP_SEND_WITH_CALC)) { + + if ((uint32_t)wr->op.calc.data_size >= IBV_EXP_CALC_DATA_SIZE_NUMBER || + (uint32_t)wr->op.calc.calc_op >= IBV_EXP_CALC_OP_NUMBER || + (uint32_t)wr->op.calc.data_type >= IBV_EXP_CALC_DATA_TYPE_NUMBER || + !mlx5_calc_ops_table[wr->op.calc.data_size][wr->op.calc.calc_op] + [wr->op.calc.data_type].valid) + return EINVAL; + + opmod = mlx5_calc_ops_table[wr->op.calc.data_size][wr->op.calc.calc_op] + [wr->op.calc.data_type].opmod; + } + set_raddr_seg(seg, wr->wr.rdma.remote_addr, wr->wr.rdma.rkey); + seg += sizeof(struct mlx5_wqe_raddr_seg); + size += sizeof(struct mlx5_wqe_raddr_seg) / 16; + break; + + case IBV_EXP_WR_ATOMIC_CMP_AND_SWP: + case IBV_EXP_WR_ATOMIC_FETCH_AND_ADD: + if (unlikely(!qp->enable_atomics)) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "atomics not allowed\n"); + return EINVAL; + } + set_raddr_seg(seg, wr->wr.atomic.remote_addr, + wr->wr.atomic.rkey); + seg += sizeof(struct mlx5_wqe_raddr_seg); + + set_atomic_seg(seg, wr->exp_opcode, wr->wr.atomic.swap, + wr->wr.atomic.compare_add); + seg += sizeof(struct mlx5_wqe_atomic_seg); + + size += (sizeof(struct mlx5_wqe_raddr_seg) + + sizeof(struct mlx5_wqe_atomic_seg)) / 16; + atom_arg = 8; + break; + + case IBV_EXP_WR_EXT_MASKED_ATOMIC_CMP_AND_SWP: + case IBV_EXP_WR_EXT_MASKED_ATOMIC_FETCH_AND_ADD: + if (unlikely(!qp->enable_atomics)) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "atomics not allowed\n"); + return EINVAL; + } + if (unlikely(wr->ext_op.masked_atomics.log_arg_sz >= + sizeof(ctx->info.bit_mask_log_atomic_arg_sizes) * 8)) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "too big atomic arg\n"); + return EINVAL; + } + atom_arg = 1 << wr->ext_op.masked_atomics.log_arg_sz; + if (unlikely(!(ctx->info.bit_mask_log_atomic_arg_sizes & atom_arg))) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "unsupported atomic arg size. 
supported bitmask 0x%lx\n", + (unsigned long)ctx->info.bit_mask_log_atomic_arg_sizes); + return EINVAL; + } + + set_raddr_seg(seg, wr->ext_op.masked_atomics.remote_addr, + wr->ext_op.masked_atomics.rkey); + seg += sizeof(struct mlx5_wqe_raddr_seg); + size += sizeof(struct mlx5_wqe_raddr_seg) / 16; + tmp = set_ext_atomic_seg(qp, seg, wr); + if (unlikely(tmp < 0)) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "invalid atomic arguments\n"); + return EINVAL; + } + size += (tmp >> 4); + seg += tmp; + if (unlikely((seg >= qend))) + seg = seg - qend + mlx5_get_send_wqe(qp, 0); + opmod = MLX5_OPCODE_EXT_ATOMICS | (wr->ext_op.masked_atomics.log_arg_sz - 2); + break; + + case IBV_EXP_WR_SEND: + if (unlikely(exp_send_flags & IBV_EXP_SEND_WITH_CALC)) { + + if ((uint32_t)wr->op.calc.data_size >= IBV_EXP_CALC_DATA_SIZE_NUMBER || + (uint32_t)wr->op.calc.calc_op >= IBV_EXP_CALC_OP_NUMBER || + (uint32_t)wr->op.calc.data_type >= IBV_EXP_CALC_DATA_TYPE_NUMBER || + !mlx5_calc_ops_table[wr->op.calc.data_size][wr->op.calc.calc_op] + [wr->op.calc.data_type].valid) + return EINVAL; + + opmod = mlx5_calc_ops_table[wr->op.calc.data_size][wr->op.calc.calc_op] + [wr->op.calc.data_type].opmod; + } + break; + + case IBV_EXP_WR_CQE_WAIT: + { + struct mlx5_cq *wait_cq = to_mcq(wr->task.cqe_wait.cq); + uint32_t wait_index = 0; + + wait_index = wait_cq->wait_index + + wr->task.cqe_wait.cq_count; + wait_cq->wait_count = max(wait_cq->wait_count, + wr->task.cqe_wait.cq_count); + + if (exp_send_flags & IBV_EXP_SEND_WAIT_EN_LAST) { + wait_cq->wait_index += wait_cq->wait_count; + wait_cq->wait_count = 0; + } + + set_wait_en_seg(seg, wait_cq->cqn, wait_index); + seg += sizeof(struct mlx5_wqe_wait_en_seg); + size += sizeof(struct mlx5_wqe_wait_en_seg) / 16; + } + break; + + case IBV_EXP_WR_SEND_ENABLE: + case IBV_EXP_WR_RECV_ENABLE: + { + unsigned head_en_index; + struct mlx5_wq *wq; + struct mlx5_wq_recv_send_enable *wq_en; + + /* + * Posting work request for QP that does not support + * SEND/RECV ENABLE makes performance worse. + */ + if (((wr->exp_opcode == IBV_EXP_WR_SEND_ENABLE) && + !(to_mqp(wr->task.wqe_enable.qp)->gen_data.create_flags & + IBV_EXP_QP_CREATE_MANAGED_SEND)) || + ((wr->exp_opcode == IBV_EXP_WR_RECV_ENABLE) && + !(to_mqp(wr->task.wqe_enable.qp)->gen_data.create_flags & + IBV_EXP_QP_CREATE_MANAGED_RECV))) { + return EINVAL; + } + + wq = (wr->exp_opcode == IBV_EXP_WR_SEND_ENABLE) ? + &to_mqp(wr->task.wqe_enable.qp)->sq : + &to_mqp(wr->task.wqe_enable.qp)->rq; + + wq_en = (wr->exp_opcode == IBV_EXP_WR_SEND_ENABLE) ? 
+ &to_mqp(wr->task.wqe_enable.qp)->sq_enable : + &to_mqp(wr->task.wqe_enable.qp)->rq_enable; + + /* If wqe_count is 0 release all WRs from queue */ + if (wr->task.wqe_enable.wqe_count) { + head_en_index = wq_en->head_en_index + + wr->task.wqe_enable.wqe_count; + wq_en->head_en_count = max(wq_en->head_en_count, + wr->task.wqe_enable.wqe_count); + + if ((int)(wq->head - head_en_index) < 0) + return EINVAL; + } else { + head_en_index = wq->head; + wq_en->head_en_count = wq->head - wq_en->head_en_index; + } + + if (exp_send_flags & IBV_EXP_SEND_WAIT_EN_LAST) { + wq_en->head_en_index += wq_en->head_en_count; + wq_en->head_en_count = 0; + } + + set_wait_en_seg(seg, + wr->task.wqe_enable.qp->qp_num, + head_en_index); + + seg += sizeof(struct mlx5_wqe_wait_en_seg); + size += sizeof(struct mlx5_wqe_wait_en_seg) / 16; + } + break; + case IBV_EXP_WR_UMR_FILL: + case IBV_EXP_WR_UMR_INVALIDATE: + if (unlikely(!qp->umr_en)) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "UMR not supported\n"); + return EINVAL; + } + next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL; + imm = htonl(wr->ext_op.umr.modified_mr->lkey); + num_sge = 0; + umr_ctrl = seg; + set_umr_ctrl_seg(wr, seg); + seg += sizeof(struct mlx5_wqe_umr_ctrl_seg); + size += sizeof(struct mlx5_wqe_umr_ctrl_seg) / 16; + + if (unlikely((seg == qend))) + seg = mlx5_get_send_wqe(qp, 0); + mk = seg; + set_mkey_seg(wr, seg); + seg += sizeof(*mk); + size += (sizeof(*mk) / 16); + if (wr->exp_opcode == IBV_EXP_WR_UMR_INVALIDATE) + break; + + if (unlikely((seg == qend))) + seg = mlx5_get_send_wqe(qp, 0); + err = lay_umr(qp, wr, seg, &wqe_sz, &xlat_size, ®len); + if (err) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "lay_umr failure\n"); + return err; + } + mk->len = htonll(reglen); + size += wqe_sz / 16; + seg += wqe_sz; + umr_ctrl->klm_octowords = htons(align(xlat_size, 64) / 16); + if (unlikely((seg >= qend))) + seg = adjust_seg(qp, seg); + if (!(wr->exp_send_flags & IBV_EXP_SEND_INLINE)) { + struct ibv_sge sge; + + klm = to_klm(wr->ext_op.umr.memory_objects); + sge.addr = (uint64_t)(uintptr_t)klm->mr->addr; + sge.lkey = klm->mr->lkey; + sge.length = 0; + set_data_ptr_seg(seg, &sge, qp, 0); + size += sizeof(struct mlx5_wqe_data_seg) / 16; + seg += sizeof(struct mlx5_wqe_data_seg); + } + break; + + case IBV_EXP_WR_NOP: + break; + + default: + break; + } + break; + + default: + break; + } + + err = set_data_seg(qp, seg, &size, !!(exp_send_flags & IBV_EXP_SEND_INLINE), + num_sge, wr->sg_list, atom_arg, 0, 0); + if (unlikely(err)) + return err; + + fm_ce_se = qp->ctrl_seg.fm_ce_se_tbl[exp_send_flags & (IBV_SEND_SOLICITED | IBV_SEND_SIGNALED | IBV_SEND_FENCE)]; + fm_ce_se |= get_fence(qp->gen_data.fm_cache, wr); + set_ctrl_seg_sig(ctrl, &qp->ctrl_seg, + mlx5_opcode, qp->gen_data.scur_post, opmod, size, + fm_ce_se, imm); + + qp->gen_data.fm_cache = next_fence; + *total_size = size; + + return 0; +} + +static inline int __mlx5_post_send_one_fast_rc(struct ibv_exp_send_wr *wr, + struct mlx5_qp *qp, uint64_t exp_send_flags, void *seg, int *total_size, + const int cmd, const int inl) __attribute__((always_inline)); +static inline int __mlx5_post_send_one_fast_rc(struct ibv_exp_send_wr *wr, + struct mlx5_qp *qp, uint64_t exp_send_flags, void *seg, int *total_size, + const int cmd, const int inl) +{ + struct mlx5_wqe_ctrl_seg *ctrl = seg; + int err = 0; + int size = 0; + uint8_t fm_ce_se; + + seg += sizeof(*ctrl); + size = sizeof(*ctrl) / 16; + + if (cmd == MLX5_OPCODE_RDMA_WRITE) { + set_raddr_seg(seg, wr->wr.rdma.remote_addr, wr->wr.rdma.rkey); + seg += sizeof(struct mlx5_wqe_raddr_seg); + 
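A short sketch of the size bookkeeping on this fast path: every segment is counted in 16-byte "ds" units (the value that set_ctrl_seg() stores in the low six bits of the qpn_ds word), and the producer index later advances by that size rounded up to 64-byte basic blocks. The constants below are assumptions spelled out for the example, not values read from the real headers.

enum {
	DEMO_DS_UNIT = 16,	/* one "ds" = 16 bytes */
	DEMO_WQE_BB  = 64,	/* assumed size of MLX5_SEND_WQE_BB */
};

/* ds count of a single-SGE RDMA WRITE: ctrl + raddr + one data segment. */
static inline unsigned demo_rdma_write_ds(void)
{
	return (16 + 16 + 16) / DEMO_DS_UNIT;		/* == 3 */
}

/* Number of basic blocks a WQE of "ds" units occupies, i.e. how far
 * scur_post advances once the WQE is posted. */
static inline unsigned demo_wqe_bb_count(unsigned ds)
{
	return (ds * DEMO_DS_UNIT + DEMO_WQE_BB - 1) / DEMO_WQE_BB;
}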
size += sizeof(struct mlx5_wqe_raddr_seg) / 16; + } + + if (inl) + err = set_data_inl_seg(qp, wr->num_sge, wr->sg_list, seg, + &size, 0, 0); + else + err = set_data_non_inl_seg(qp, wr->num_sge, wr->sg_list, seg, + &size, 0, 0); + if (unlikely(err)) + return err; + + fm_ce_se = qp->ctrl_seg.fm_ce_se_tbl[exp_send_flags & (IBV_SEND_SOLICITED | IBV_SEND_SIGNALED | IBV_SEND_FENCE)]; + if (unlikely(qp->gen_data.fm_cache)) { + if (unlikely(exp_send_flags & IBV_EXP_SEND_FENCE)) + fm_ce_se |= MLX5_FENCE_MODE_SMALL_AND_FENCE; + else + fm_ce_se |= qp->gen_data.fm_cache; + } + + set_ctrl_seg((uint32_t *)ctrl, &qp->ctrl_seg, + cmd, qp->gen_data.scur_post, 0, size, + fm_ce_se, 0); + + qp->gen_data.fm_cache = 0; + *total_size = size; + + return 0; +} + +#define MLX5_POST_SEND_ONE_FAST_RC(suffix, cmd, inl) \ + static int __mlx5_post_send_one_fast_rc_##suffix( \ + struct ibv_exp_send_wr *wr, \ + struct mlx5_qp *qp, uint64_t exp_send_flags, \ + void *seg, int *total_size) __MLX5_ALGN_F__; \ + static int __mlx5_post_send_one_fast_rc_##suffix( \ + struct ibv_exp_send_wr *wr, \ + struct mlx5_qp *qp, uint64_t exp_send_flags, \ + void *seg, int *total_size) \ + { \ + return __mlx5_post_send_one_fast_rc(wr, qp, \ + exp_send_flags, \ + seg, total_size, \ + cmd, inl); \ + } +/* suffix cmd inl */ +MLX5_POST_SEND_ONE_FAST_RC(send, MLX5_OPCODE_SEND, 0); +MLX5_POST_SEND_ONE_FAST_RC(send_inl, MLX5_OPCODE_SEND, 1); +MLX5_POST_SEND_ONE_FAST_RC(rwrite, MLX5_OPCODE_RDMA_WRITE, 0); +MLX5_POST_SEND_ONE_FAST_RC(rwrite_inl, MLX5_OPCODE_RDMA_WRITE, 1); + +static int __mlx5_post_send_one_not_ready(struct ibv_exp_send_wr *wr, + struct mlx5_qp *qp, uint64_t exp_send_flags, void *seg, int *total_size) +{ +#ifdef MLX5_DEBUG + FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp; +#endif + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "bad QP state\n"); + + return EINVAL; +} + +enum mlx5_post_send_one_rc_cases { + MLX5_SEND_RC = (IBV_EXP_WR_SEND), + MLX5_SEND_RC_INL = (IBV_EXP_WR_SEND) + (IBV_EXP_SEND_INLINE << 8), + MLX5_RDMA_WRITE_RC = (IBV_EXP_WR_RDMA_WRITE), + MLX5_RDMA_WRITE_RC_INL = (IBV_EXP_WR_RDMA_WRITE) + (IBV_EXP_SEND_INLINE << 8), +}; + +static int __mlx5_post_send_one_rc(struct ibv_exp_send_wr *wr, + struct mlx5_qp *qp, uint64_t exp_send_flags, + void *seg, int *total_size) __MLX5_ALGN_F__; +static int __mlx5_post_send_one_rc(struct ibv_exp_send_wr *wr, + struct mlx5_qp *qp, uint64_t exp_send_flags, + void *seg, int *total_size) +{ +#ifdef MLX5_DEBUG + FILE *fp = to_mctx(qp->verbs_qp.qp.context)->dbg_fp; +#endif + uint64_t rc_case = (uint64_t)wr->exp_opcode | ((exp_send_flags & (IBV_EXP_SEND_WITH_CALC | IBV_EXP_SEND_INLINE)) << 8); + + switch (rc_case) { + + case MLX5_SEND_RC: + return __mlx5_post_send_one_fast_rc_send(wr, qp, exp_send_flags, seg, total_size); + + case MLX5_SEND_RC_INL: + return __mlx5_post_send_one_fast_rc_send_inl(wr, qp, exp_send_flags, seg, total_size); + + case MLX5_RDMA_WRITE_RC: + return __mlx5_post_send_one_fast_rc_rwrite(wr, qp, exp_send_flags, seg, total_size); + + case MLX5_RDMA_WRITE_RC_INL: + return __mlx5_post_send_one_fast_rc_rwrite_inl(wr, qp, exp_send_flags, seg, total_size); + + default: + if (unlikely(wr->exp_opcode < 0 || + wr->exp_opcode >= sizeof(mlx5_ib_opcode) / sizeof(mlx5_ib_opcode[0]))) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "bad opcode %d\n", wr->exp_opcode); + return EINVAL; + } else { + return __mlx5_post_send_one_rc_dc(wr, qp, exp_send_flags, seg, total_size); + } + } +} + +void mlx5_update_post_send_one(struct mlx5_qp *qp, enum ibv_qp_state qp_state, enum ibv_qp_type qp_type) +{ + if (qp_state 
< IBV_QPS_RTS) { + qp->gen_data.post_send_one = __mlx5_post_send_one_not_ready; + } else { + switch (qp_type) { + case IBV_QPT_XRC_SEND: + case IBV_QPT_XRC: + case IBV_EXP_QPT_DC_INI: + qp->gen_data.post_send_one = __mlx5_post_send_one_rc_dc; + break; + case IBV_QPT_RC: + if (qp->ctrl_seg.wq_sig) + qp->gen_data.post_send_one = __mlx5_post_send_one_rc_dc; + else + qp->gen_data.post_send_one = __mlx5_post_send_one_rc; + + break; + + case IBV_QPT_UC: + case IBV_QPT_UD: + qp->gen_data.post_send_one = __mlx5_post_send_one_uc_ud; + break; + + case IBV_QPT_RAW_ETH: + qp->gen_data.post_send_one = __mlx5_post_send_one_raw_packet; + break; + + default: + qp->gen_data.post_send_one = __mlx5_post_send_one_other; + break; + } + } +} + +static inline int __ring_db(struct mlx5_qp *qp, const int db_method, uint32_t curr_post, unsigned long long *seg, int size) __attribute__((always_inline)); +static inline int __ring_db(struct mlx5_qp *qp, const int db_method, uint32_t curr_post, unsigned long long *seg, int size) +{ + struct mlx5_bf *bf = qp->gen_data.bf; + + qp->gen_data.last_post = curr_post; + qp->mpw.state = MLX5_MPW_STATE_CLOSED; + + switch (db_method) { + case MLX5_DB_METHOD_DEDIC_BF_1_THREAD: + /* This QP is used by one thread and it uses dedicated blue-flame */ + + /* Use wc_wmb to make sure old BF-copy is not passing current DB record */ + wc_wmb(); + qp->gen_data.db[MLX5_SND_DBR] = htonl(curr_post); + + /* This wc_wmb ensures ordering between DB record and BF copy */ + wc_wmb(); + if (size <= bf->buf_size / 64) { + mlx5_bf_copy(bf->reg + bf->offset, seg, + size * 64, qp); + + /* No need for wc_wmb since cpu arch support auto WC buffer eviction */ + } else { + mlx5_write_db(bf->reg + bf->offset, seg); + wc_wmb(); + } + bf->offset ^= bf->buf_size; + break; + + case MLX5_DB_METHOD_DEDIC_BF: + /* The QP has dedicated blue-flame */ + + /* + * Make sure that descriptors are written before + * updating doorbell record and ringing the doorbell + */ + wmb(); + qp->gen_data.db[MLX5_SND_DBR] = htonl(curr_post); + + /* This wc_wmb ensures ordering between DB record and BF copy */ + wc_wmb(); + if (size <= bf->buf_size / 64) + mlx5_bf_copy(bf->reg + bf->offset, seg, + size * 64, qp); + else + mlx5_write_db(bf->reg + bf->offset, seg); + /* + * use wc_wmb to ensure write combining buffers are flushed out + * of the running CPU. This must be carried inside the spinlock. + * Otherwise, there is a potential race. In the race, CPU A + * writes doorbell 1, which is waiting in the WC buffer. CPU B + * writes doorbell 2, and it's write is flushed earlier. Since + * the wc_wmb is CPU local, this will result in the HCA seeing + * doorbell 2, followed by doorbell 1. + */ + wc_wmb(); + bf->offset ^= bf->buf_size; + break; + + case MLX5_DB_METHOD_BF: + /* The QP has blue-flame that may be shared by other QPs */ + + /* + * Make sure that descriptors are written before + * updating doorbell record and ringing the doorbell + */ + wmb(); + qp->gen_data.db[MLX5_SND_DBR] = htonl(curr_post); + + /* This wc_wmb ensures ordering between DB record and BF copy */ + wc_wmb(); + mlx5_lock(&bf->lock); + if (size <= bf->buf_size / 64) + mlx5_bf_copy(bf->reg + bf->offset, seg, + size * 64, qp); + else + mlx5_write_db(bf->reg + bf->offset, seg); + /* + * use wc_wmb to ensure write combining buffers are flushed out + * of the running CPU. This must be carried inside the spinlock. + * Otherwise, there is a potential race. In the race, CPU A + * writes doorbell 1, which is waiting in the WC buffer. 
CPU B + * writes doorbell 2, and it's write is flushed earlier. Since + * the wc_wmb is CPU local, this will result in the HCA seeing + * doorbell 2, followed by doorbell 1. + */ + wc_wmb(); + bf->offset ^= bf->buf_size; + mlx5_unlock(&bf->lock); + break; + + case MLX5_DB_METHOD_DB: + /* doorbell mapped to non-cached memory */ + + /* + * Make sure that descriptors are written before + * updating doorbell record and ringing the doorbell + */ + wmb(); + qp->gen_data.db[MLX5_SND_DBR] = htonl(curr_post); + + /* This wmb ensures ordering between DB record and DB ringing */ + wmb(); + mlx5_write64((__be32 *)seg, bf->reg + bf->offset, &bf->lock); + break; + } + + return 0; +} + +static inline int __mlx5_post_send(struct ibv_qp *ibqp, struct ibv_exp_send_wr *wr, + struct ibv_exp_send_wr **bad_wr, int is_exp_wr) __attribute__((always_inline)); +static inline int __mlx5_post_send(struct ibv_qp *ibqp, struct ibv_exp_send_wr *wr, + struct ibv_exp_send_wr **bad_wr, int is_exp_wr) +{ + struct mlx5_qp *qp = to_mqp(ibqp); + void *uninitialized_var(seg); + int nreq; + int err = 0; + int size; + unsigned idx; + uint64_t exp_send_flags; +#ifdef MLX5_DEBUG + FILE *fp = to_mctx(ibqp->context)->dbg_fp; +#endif + + + mlx5_lock(&qp->sq.lock); + + for (nreq = 0; wr; ++nreq, wr = wr->next) { + idx = qp->gen_data.scur_post & (qp->sq.wqe_cnt - 1); + seg = mlx5_get_send_wqe(qp, idx); + + exp_send_flags = is_exp_wr ? wr->exp_send_flags : ((struct ibv_send_wr *)wr)->send_flags; + + if (unlikely(!(qp->gen_data.create_flags & IBV_EXP_QP_CREATE_IGNORE_SQ_OVERFLOW) && + mlx5_wq_overflow(&qp->sq, nreq, qp))) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "work queue overflow\n"); + errno = ENOMEM; + err = errno; + *bad_wr = wr; + goto out; + } + + if (unlikely(wr->num_sge > qp->sq.max_gs)) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "max gs exceeded %d (max = %d)\n", + wr->num_sge, qp->sq.max_gs); + errno = ENOMEM; + err = errno; + *bad_wr = wr; + goto out; + } + + + + err = qp->gen_data.post_send_one(wr, qp, exp_send_flags, seg, &size); + if (unlikely(err)) { + errno = err; + *bad_wr = wr; + goto out; + } + + + + qp->sq.wrid[idx] = wr->wr_id; + qp->gen_data.wqe_head[idx] = qp->sq.head + nreq; + qp->gen_data.scur_post += DIV_ROUND_UP(size * 16, MLX5_SEND_WQE_BB); + +#ifdef MLX5_DEBUG + if (mlx5_debug_mask & MLX5_DBG_QP_SEND) + dump_wqe(to_mctx(ibqp->context)->dbg_fp, idx, size, qp); +#endif + } + +out: + if (likely(nreq)) { + qp->sq.head += nreq; + + if (unlikely(qp->gen_data.create_flags & IBV_EXP_QP_CREATE_MANAGED_SEND)) { + /* Controlled qp */ + wmb(); + goto post_send_no_db; + } + + __ring_db(qp, qp->gen_data.bf->db_method, qp->gen_data.scur_post & 0xffff, seg, (size + 3) / 4); + } + +post_send_no_db: + + mlx5_unlock(&qp->sq.lock); + + return err; +} + +int mlx5_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, + struct ibv_send_wr **bad_wr) +{ + return __mlx5_post_send(ibqp, (struct ibv_exp_send_wr *)wr, + (struct ibv_exp_send_wr **)bad_wr, 0); +} + +int mlx5_exp_post_send(struct ibv_qp *ibqp, struct ibv_exp_send_wr *wr, + struct ibv_exp_send_wr **bad_wr) +{ + return __mlx5_post_send(ibqp, wr, bad_wr, 1); +} + +static void set_sig_seg(struct mlx5_qp *qp, struct mlx5_rwqe_sig *sig, + int size, uint16_t idx) +{ + uint8_t sign; + uint32_t qpn = qp->verbs_qp.qp.qp_num; + + sign = calc_xor(sig + 1, size); + sign ^= calc_xor(&qpn, 4); + sign ^= calc_xor(&idx, 2); + sig->signature = ~sign; +} + +int mlx5_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr, + struct ibv_recv_wr **bad_wr) +{ + struct mlx5_qp *qp = to_mqp(ibqp); + struct 
mlx5_wqe_data_seg *scat; + int err = 0; + int nreq; + int ind; + int i, j; + struct mlx5_rwqe_sig *sig; + int sigsz; +#ifdef MLX5_DEBUG + FILE *fp = to_mctx(ibqp->context)->dbg_fp; +#endif + + mlx5_lock(&qp->rq.lock); + + ind = qp->rq.head & (qp->rq.wqe_cnt - 1); + + for (nreq = 0; wr; ++nreq, wr = wr->next) { + if (unlikely(!(qp->gen_data.create_flags & IBV_EXP_QP_CREATE_IGNORE_RQ_OVERFLOW) && + mlx5_wq_overflow(&qp->rq, nreq, qp))) { + errno = ENOMEM; + err = errno; + *bad_wr = wr; + goto out; + } + + if (unlikely(wr->num_sge > qp->rq.max_gs)) { + errno = EINVAL; + err = errno; + *bad_wr = wr; + goto out; + } + + scat = get_recv_wqe(&qp->rq, ind); + sig = (struct mlx5_rwqe_sig *)scat; + if (unlikely(qp->ctrl_seg.wq_sig)) + ++scat; + + for (i = 0, j = 0; i < wr->num_sge; ++i) { + if (unlikely(!wr->sg_list[i].length)) + continue; + if (unlikely(set_data_ptr_seg(scat + j++, + wr->sg_list + i, qp, 0))) { + mlx5_dbg(fp, MLX5_DBG_QP_SEND, "failed allocating memory for global lkey structure\n"); + errno = ENOMEM; + err = -1; + *bad_wr = wr; + goto out; + } + } + + if (j < qp->rq.max_gs) { + scat[j].byte_count = 0; + scat[j].lkey = htonl(MLX5_INVALID_LKEY); + scat[j].addr = 0; + } + + if (unlikely(qp->ctrl_seg.wq_sig)) { + sigsz = min(wr->num_sge, (1 << (qp->rq.wqe_shift - 4)) - 1); + + set_sig_seg(qp, sig, sigsz << 4, qp->rq.head + nreq); + } + + qp->rq.wrid[ind] = wr->wr_id; + + ind = (ind + 1) & (qp->rq.wqe_cnt - 1); + } + +out: + if (likely(nreq)) { + qp->rq.head += nreq; + + /* + * Make sure that descriptors are written before + * doorbell record. + */ + wmb(); + + if (likely(!(ibqp->qp_type == IBV_QPT_RAW_ETH && + ibqp->state < IBV_QPS_RTR))) + qp->gen_data.db[MLX5_RCV_DBR] = htonl(qp->rq.head & 0xffff); + } + + mlx5_unlock(&qp->rq.lock); + + return err; +} + +int mlx5_use_huge(struct ibv_context *context, const char *key) +{ + char env[VERBS_MAX_ENV_VAL]; + + if (!ibv_exp_cmd_getenv(context, key, env, sizeof(env)) && + !strcmp(env, "y")) + return 1; + + return 0; +} + +void *mlx5_find_rsc(struct mlx5_context *ctx, uint32_t rsn) +{ + int tind = rsn >> MLX5_QP_TABLE_SHIFT; + + if (ctx->rsc_table[tind].refcnt) + return ctx->rsc_table[tind].table[rsn & MLX5_QP_TABLE_MASK]; + else + return NULL; +} + +int mlx5_store_rsc(struct mlx5_context *ctx, uint32_t rsn, void *rsc) +{ + int tind = rsn >> MLX5_QP_TABLE_SHIFT; + + if (!ctx->rsc_table[tind].refcnt) { + ctx->rsc_table[tind].table = calloc(MLX5_QP_TABLE_MASK + 1, + sizeof(void *)); + if (!ctx->rsc_table[tind].table) + return -1; + } + + ++ctx->rsc_table[tind].refcnt; + ctx->rsc_table[tind].table[rsn & MLX5_QP_TABLE_MASK] = rsc; + return 0; +} + +void mlx5_clear_rsc(struct mlx5_context *ctx, uint32_t rsn) +{ + int tind = rsn >> MLX5_QP_TABLE_SHIFT; + + if (!--ctx->rsc_table[tind].refcnt) + free(ctx->rsc_table[tind].table); + else + ctx->rsc_table[tind].table[rsn & MLX5_QP_TABLE_MASK] = NULL; +} + +int mlx5_post_task(struct ibv_context *context, + struct ibv_exp_task *task_list, + struct ibv_exp_task **bad_task) +{ + int rc = 0; + struct ibv_exp_task *cur_task = NULL; + struct ibv_exp_send_wr *bad_wr; + struct mlx5_context *mlx5_ctx = to_mctx(context); + + if (!task_list) + return rc; + + pthread_mutex_lock(&mlx5_ctx->task_mutex); + + cur_task = task_list; + while (!rc && cur_task) { + + switch (cur_task->task_type) { + case IBV_EXP_TASK_SEND: + rc = ibv_exp_post_send(cur_task->item.qp, + cur_task->item.send_wr, + &bad_wr); + break; + + case IBV_EXP_TASK_RECV: + rc = ibv_post_recv(cur_task->item.qp, + cur_task->item.recv_wr, + NULL); + break; + 
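A self-contained model, with hypothetical demo_* names and arbitrary table sizes, of the two-level, reference-counted resource table kept by mlx5_store_rsc(), mlx5_find_rsc() and mlx5_clear_rsc() above: the high bits of the resource number select a lazily allocated leaf table and the low bits index into it.

#include <stdint.h>
#include <stdlib.h>

#define DEMO_SHIFT 12				/* stands in for MLX5_QP_TABLE_SHIFT */
#define DEMO_MASK  ((1u << DEMO_SHIFT) - 1)
#define DEMO_TOP   (1u << 8)			/* number of leaf tables in the demo */

struct demo_rsc_tbl {
	struct { unsigned refcnt; void **table; } top[DEMO_TOP];
};

static int demo_store(struct demo_rsc_tbl *t, uint32_t rsn, void *rsc)
{
	/* The modulo only keeps the demo in bounds; the real top-level
	 * table is sized to cover the whole rsn space. */
	unsigned tind = (rsn >> DEMO_SHIFT) % DEMO_TOP;

	if (!t->top[tind].refcnt) {
		t->top[tind].table = calloc(DEMO_MASK + 1, sizeof(void *));
		if (!t->top[tind].table)
			return -1;
	}
	t->top[tind].refcnt++;
	t->top[tind].table[rsn & DEMO_MASK] = rsc;
	return 0;
}

static void *demo_find(const struct demo_rsc_tbl *t, uint32_t rsn)
{
	unsigned tind = (rsn >> DEMO_SHIFT) % DEMO_TOP;

	return t->top[tind].refcnt ? t->top[tind].table[rsn & DEMO_MASK] : NULL;
}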
+ default: + rc = -1; + } + + if (rc && bad_task) { + *bad_task = cur_task; + break; + } + + cur_task = cur_task->next; + } + + pthread_mutex_unlock(&mlx5_ctx->task_mutex); + + return rc; +} + +/* + * family interfaces functions + */ + +/* + * send_pending - is a general post send function that put one message in + * the send queue. The function is not ringing the QP door-bell. + * + * User may call this function several times to fill send queue with + * several messages, then he can call send_flush to ring the QP DB + * + * This function is used to implement the following QP burst family functions: + * - send_pending + * - send_pending_inline + * - send_pending_sg_list + * - send_burst + */ + +static inline int send_pending(struct ibv_qp *ibqp, uint64_t addr, + uint32_t length, uint32_t lkey, + uint32_t flags, + const int use_raw_eth, const int use_inl, + const int thread_safe, const int use_sg_list, + const int use_mpw, + const int num_sge, struct ibv_sge *sg_list) __attribute__((always_inline)); +static inline int send_pending(struct ibv_qp *ibqp, uint64_t addr, + uint32_t length, uint32_t lkey, + uint32_t flags, + const int use_raw_eth, const int use_inl, + const int thread_safe, const int use_sg_list, + const int use_mpw, + const int num_sge, struct ibv_sge *sg_list) + +{ + struct mlx5_wqe_inline_seg *uninitialized_var(inl_seg); + struct mlx5_wqe_data_seg *uninitialized_var(dseg); + uint8_t *uninitialized_var(inl_data); + uint32_t *uninitialized_var(start); + struct mlx5_qp *qp = to_mqp(ibqp); + int uninitialized_var(size); + uint8_t fm_ce_se; + int i; + + if (thread_safe) + mlx5_lock(&qp->sq.lock); + + if (use_mpw) { + uint32_t msg_size, n_sg; + + if (use_sg_list) { + msg_size = 0; + for (i = 0; i < num_sge; i++) + msg_size += sg_list[i].length; + n_sg = num_sge; + } else { + msg_size = length; + n_sg = 1; + } + if (use_inl && + (qp->mpw.state == MLX5_MPW_STATE_OPENED_INL) && + (qp->mpw.len == msg_size) && + ((qp->mpw.flags & ~IBV_EXP_QP_BURST_SIGNALED) == + (flags & ~IBV_EXP_QP_BURST_SIGNALED)) && + ((qp->mpw.total_len + msg_size) <= qp->data_seg.max_inline_data)) { + /* Add current message to opened inline multi-packet WQE */ + inl_seg = (struct mlx5_wqe_inline_seg *)(qp->mpw.ctrl_update + 7); + inl_data = qp->mpw.inl_data + qp->mpw.len; + if (unlikely((void *)inl_data >= qp->gen_data.sqend)) + inl_data = (uint8_t *)mlx5_get_send_wqe(qp, 0) + + (inl_data - (uint8_t *)qp->gen_data.sqend); + qp->mpw.total_len += msg_size; + } else if (!use_inl && + (qp->mpw.state == MLX5_MPW_STATE_OPENED) && + (qp->mpw.len == msg_size) && + ((qp->mpw.flags & ~IBV_EXP_QP_BURST_SIGNALED) == + (flags & ~IBV_EXP_QP_BURST_SIGNALED)) && + (qp->mpw.num_sge + n_sg) <= MLX5_MAX_MPW_SGE) { + /* Add current message to opened multi-packet WQE */ + dseg = qp->mpw.last_dseg + 1; + if (unlikely(dseg == qp->gen_data.sqend)) + dseg = mlx5_get_send_wqe(qp, 0); + size = 0; + qp->mpw.num_sge += n_sg; + } else if (likely(use_inl || (msg_size <= MLX5_MAX_MPW_SIZE))) { + /* Open new multi-packet WQE + * + * In case of inline the user must make sure that + * message size is smaller than max_inline which + * means that it is also smaller than MLX5_MAX_MPW_SIZE + * This guarantees that we can open multi-packet WQE. + * In case of non-inline we must check that msg_size is + * smaller than MLX5_MAX_MPW_SIZE. 
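A condensed sketch, with hypothetical demo_* names, of the admission test applied above before a message is merged into the currently open multi-packet WQE: the packet length must equal the length the MPW was opened with, the flags must match once the "signaled" bit is ignored, and either the accumulated inline size or the SGE count must stay within its budget.

#include <stdbool.h>
#include <stdint.h>

struct demo_mpw {
	bool	 opened_inline;	/* OPENED_INL vs. OPENED state */
	uint32_t len;		/* per-packet length the MPW was opened with */
	uint32_t flags;
	uint32_t total_len;	/* inline bytes accumulated so far */
	uint32_t num_sge;	/* data segments accumulated so far */
};

static bool demo_can_join_mpw(const struct demo_mpw *mpw, bool inline_msg,
			      uint32_t msg_len, uint32_t flags, uint32_t n_sg,
			      uint32_t max_inline, uint32_t max_mpw_sge,
			      uint32_t signaled_bit)
{
	if (inline_msg != mpw->opened_inline || msg_len != mpw->len)
		return false;
	if ((mpw->flags & ~signaled_bit) != (flags & ~signaled_bit))
		return false;
	return inline_msg ? mpw->total_len + msg_len <= max_inline
			  : mpw->num_sge + n_sg <= max_mpw_sge;
}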
+ */ + + qp->mpw.state = MLX5_MPW_STATE_OPENING; + qp->mpw.len = msg_size; + qp->mpw.num_sge = n_sg; + qp->mpw.flags = flags; + qp->mpw.scur_post = qp->gen_data.scur_post; + qp->mpw.total_len = msg_size; + } else { + /* We can't open new multi-packet WQE + * since msg_size > MLX5_MAX_MPW_SIZE + */ + qp->mpw.state = MLX5_MPW_STATE_CLOSED; + } + } else { + /* Close multi-packet WQE */ + qp->mpw.state = MLX5_MPW_STATE_CLOSED; + } + + if (use_sg_list) { + addr = sg_list[0].addr; + length = sg_list[0].length; + lkey = sg_list[0].lkey; + } + + /* Start new WQE if there is no open multi-packet WQE */ + if ((use_inl && (qp->mpw.state != MLX5_MPW_STATE_OPENED_INL)) || + (!use_inl && (qp->mpw.state != MLX5_MPW_STATE_OPENED))) { + start = mlx5_get_send_wqe(qp, qp->gen_data.scur_post & (qp->sq.wqe_cnt - 1)); + + if (use_raw_eth) { + struct mlx5_wqe_eth_seg *eseg; + + eseg = (struct mlx5_wqe_eth_seg *)(((char *)start) + + sizeof(struct mlx5_wqe_ctrl_seg)); + /* reset rsvd0, cs_flags, rsvd1, mss and rsvd2 fields */ + *((uint64_t *)eseg) = 0; + eseg->rsvd2 = 0; + + if (flags & IBV_EXP_QP_BURST_IP_CSUM) + eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM; + if (use_mpw && (qp->mpw.state == MLX5_MPW_STATE_OPENING)) { + eseg->mss = htons(qp->mpw.len); + eseg->inline_hdr_sz = 0; + size = (sizeof(struct mlx5_wqe_ctrl_seg) + + offsetof(struct mlx5_wqe_eth_seg, inline_hdr)) / 16; + if (use_inl) { + inl_seg = (struct mlx5_wqe_inline_seg *)(start + + (size * 4)); + inl_data = (uint8_t *)(inl_seg + 1); + } else { + dseg = (struct mlx5_wqe_data_seg *)(start + + (size * 4)); + } + } else { + eseg->inline_hdr_sz = htons(MLX5_ETH_INLINE_HEADER_SIZE); + + /* We don't support header divided in several sges */ + if (unlikely(length <= MLX5_ETH_INLINE_HEADER_SIZE)) + return EINVAL; + + /* Copy the first 16 bytes into the inline header */ + memcpy(eseg->inline_hdr_start, (void *)(uintptr_t)addr, + MLX5_ETH_INLINE_HEADER_SIZE); + addr += MLX5_ETH_INLINE_HEADER_SIZE; + length -= MLX5_ETH_INLINE_HEADER_SIZE; + size = (sizeof(struct mlx5_wqe_ctrl_seg) + + sizeof(struct mlx5_wqe_eth_seg)) / 16; + dseg = (struct mlx5_wqe_data_seg *)(++eseg); + } + } else { + size = sizeof(struct mlx5_wqe_ctrl_seg) / 16; + dseg = (struct mlx5_wqe_data_seg *)(((char *)start) + sizeof(struct mlx5_wqe_ctrl_seg)); + } + } + + if (use_inl) { + if (use_mpw) { + if (unlikely((inl_data + qp->mpw.len) > + (uint8_t *)qp->gen_data.sqend)) { + int size2end = ((uint8_t *)qp->gen_data.sqend - inl_data); + + memcpy(inl_data, (void *)(uintptr_t)addr, size2end); + memcpy(mlx5_get_send_wqe(qp, 0), + (void *)(uintptr_t)(addr + size2end), + qp->mpw.len - size2end); + + } else { + memcpy(inl_data, (void *)(uintptr_t)addr, qp->mpw.len); + } + inl_seg->byte_count = htonl(qp->mpw.total_len | MLX5_INLINE_SEG); + size = (sizeof(struct mlx5_wqe_ctrl_seg) + + offsetof(struct mlx5_wqe_eth_seg, inline_hdr)) / 16; + size += align(qp->mpw.total_len + sizeof(inl_seg->byte_count), 16) / 16; + } else { + struct ibv_sge sg_list = {addr, length, 0}; + + set_data_inl_seg(qp, 1, &sg_list, dseg, &size, 0, 0); + } + } else { + size += sizeof(struct mlx5_wqe_data_seg) / 16; + dseg->byte_count = htonl(length); + dseg->lkey = htonl(lkey); + dseg->addr = htonll(addr); + } + + /* No inline when using sg list */ + if (use_sg_list) { + for (i = 0; i < num_sge - 1; ++i) { + sg_list++; + if (likely(sg_list->length)) { + dseg++; + if (unlikely(dseg == qp->gen_data.sqend)) + dseg = mlx5_get_send_wqe(qp, 0); + size += sizeof(struct mlx5_wqe_data_seg) / 16; + dseg->byte_count = 
htonl(sg_list->length); + dseg->lkey = htonl(sg_list->lkey); + dseg->addr = htonll(sg_list->addr); + } + } + } + if (use_mpw) { + if (use_inl) + qp->mpw.inl_data = inl_data; + else + qp->mpw.last_dseg = dseg; + } + + if ((use_inl && (qp->mpw.state != MLX5_MPW_STATE_OPENED_INL)) || + (!use_inl && (qp->mpw.state != MLX5_MPW_STATE_OPENED))) { + /* Fill ctrl-segment of a new WQE */ + fm_ce_se = qp->ctrl_seg.fm_ce_se_acc[flags & (IBV_EXP_QP_BURST_SOLICITED | + IBV_EXP_QP_BURST_SIGNALED | + IBV_EXP_QP_BURST_FENCE)]; + if (unlikely(qp->gen_data.fm_cache)) { + if (unlikely(flags & IBV_SEND_FENCE)) + fm_ce_se |= MLX5_FENCE_MODE_SMALL_AND_FENCE; + else + fm_ce_se |= qp->gen_data.fm_cache; + qp->gen_data.fm_cache = 0; + } + + if (likely(use_mpw && (qp->mpw.state == MLX5_MPW_STATE_OPENING))) { + *start++ = htonl((MLX5_OPC_MOD_MPW << 24) | + ((qp->gen_data.scur_post & 0xFFFF) << 8) | + MLX5_OPCODE_LSO_MPW); + qp->mpw.ctrl_update = start; + if ((flags & IBV_EXP_QP_BURST_SIGNALED) || + (qp->mpw.num_sge >= MLX5_MAX_MPW_SGE)) { + qp->mpw.state = MLX5_MPW_STATE_CLOSED; + } else { + if (use_inl) + qp->mpw.state = MLX5_MPW_STATE_OPENED_INL; + else + qp->mpw.state = MLX5_MPW_STATE_OPENED; + qp->mpw.size = size; + } + } else { + *start++ = htonl((qp->gen_data.scur_post & 0xFFFF) << 8 | + MLX5_OPCODE_SEND); + } + *start++ = htonl(qp->ctrl_seg.qp_num << 8 | (size & 0x3F)); + *start++ = htonl(fm_ce_se); + *start = 0; + + qp->gen_data.wqe_head[qp->gen_data.scur_post & (qp->sq.wqe_cnt - 1)] = ++(qp->sq.head); + /* Update last_post to point on the position of the new WQE */ + qp->gen_data.last_post = qp->gen_data.scur_post; + qp->gen_data.scur_post += DIV_ROUND_UP(size * 16, MLX5_SEND_WQE_BB); + } else { + /* Update the multi-packt WQE ctrl-segment */ + if (use_inl) + qp->mpw.size = size; + else + qp->mpw.size += size; + *qp->mpw.ctrl_update = htonl(qp->ctrl_seg.qp_num << 8 | ((qp->mpw.size) & 0x3F)); + qp->gen_data.scur_post = qp->mpw.scur_post + DIV_ROUND_UP(qp->mpw.size * 16, MLX5_SEND_WQE_BB); + if (flags & IBV_EXP_QP_BURST_SIGNALED) { + *(qp->mpw.ctrl_update + 1) |= htonl(MLX5_WQE_CTRL_CQ_UPDATE); + qp->mpw.state = MLX5_MPW_STATE_CLOSED; + } else if (unlikely(qp->mpw.num_sge == MLX5_MAX_MPW_SGE)) { + qp->mpw.state = MLX5_MPW_STATE_CLOSED; + } + } + + if (thread_safe) + mlx5_unlock(&qp->sq.lock); + + return 0; +} + +/* burst family - send_pending */ +static int mlx5_send_pending_safe(struct ibv_qp *qp, uint64_t addr, + uint32_t length, uint32_t lkey, + uint32_t flags) __MLX5_ALGN_F__; +static int mlx5_send_pending_safe(struct ibv_qp *qp, uint64_t addr, + uint32_t length, uint32_t lkey, + uint32_t flags) +{ + struct mlx5_qp *mqp = to_mqp(qp); + int raw_eth = mqp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET && + mqp->link_layer == IBV_LINK_LAYER_ETHERNET; + + /* qp, addr, length, lkey, flags, raw_eth, inl, safe, */ + return send_pending(qp, addr, length, lkey, flags, raw_eth, 0, 1, + /* use_sg, use_mpw, num_sge, sg_list */ + 0, 0, 0, NULL); +} + +static int mlx5_send_pending_mpw_safe(struct ibv_qp *qp, uint64_t addr, + uint32_t length, uint32_t lkey, + uint32_t flags) __MLX5_ALGN_F__; +static int mlx5_send_pending_mpw_safe(struct ibv_qp *qp, uint64_t addr, + uint32_t length, uint32_t lkey, + uint32_t flags) +{ + struct mlx5_qp *mqp = to_mqp(qp); + int raw_eth = mqp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET && + mqp->link_layer == IBV_LINK_LAYER_ETHERNET; + + /* qp, addr, length, lkey, flags, raw_eth, inl, safe, */ + return send_pending(qp, addr, length, lkey, flags, raw_eth, 0, 1, + /* use_sg, use_mpw, num_sge, 
sg_list */ + 0, 1, 0, NULL); +} + +#define MLX5_SEND_PENDING_UNSAFE_NAME(eth, mpw) mlx5_send_pending_unsafe_##eth##mpw +#define MLX5_SEND_PENDING_UNSAFE(eth, mpw) \ + static int MLX5_SEND_PENDING_UNSAFE_NAME(eth, mpw)( \ + struct ibv_qp *qp, uint64_t addr, \ + uint32_t length, uint32_t lkey, \ + uint32_t flags) __MLX5_ALGN_F__; \ + static int MLX5_SEND_PENDING_UNSAFE_NAME(eth, mpw)( \ + struct ibv_qp *qp, uint64_t addr, \ + uint32_t length, uint32_t lkey, \ + uint32_t flags) \ + { \ + /* qp, addr, length, lkey, flags, eth, inl, */ \ + return send_pending(qp, addr, length, lkey, flags, eth, 0, \ + /* safe, use_sg, use_mpw, num_sge, sg_list */ \ + 0, 0, mpw, 0, NULL); \ + } +/* eth mpw */ +MLX5_SEND_PENDING_UNSAFE(0, 0); +MLX5_SEND_PENDING_UNSAFE(0, 1); +MLX5_SEND_PENDING_UNSAFE(1, 0); +MLX5_SEND_PENDING_UNSAFE(1, 1); + +/* burst family - send_pending_inline */ +static int mlx5_send_pending_inl_safe(struct ibv_qp *qp, void *addr, + uint32_t length, uint32_t flags) __MLX5_ALGN_F__; +static int mlx5_send_pending_inl_safe(struct ibv_qp *qp, void *addr, + uint32_t length, uint32_t flags) +{ + struct mlx5_qp *mqp = to_mqp(qp); + int raw_eth = mqp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET && + mqp->link_layer == IBV_LINK_LAYER_ETHERNET; + + /* qp, addr, length, lkey, flags, raw_eth, */ + return send_pending(qp, (uintptr_t)addr, length, 0, flags, raw_eth, + /* inl, safe, use_sg, use_mpw, num_sge, sg_list */ + 1, 1, 0, 0, 0, NULL); +} + +static int mlx5_send_pending_inl_mpw_safe(struct ibv_qp *qp, void *addr, + uint32_t length, uint32_t flags) __MLX5_ALGN_F__; +static int mlx5_send_pending_inl_mpw_safe(struct ibv_qp *qp, void *addr, + uint32_t length, uint32_t flags) +{ + struct mlx5_qp *mqp = to_mqp(qp); + int raw_eth = mqp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET && + mqp->link_layer == IBV_LINK_LAYER_ETHERNET; + + /* qp, addr, length, lkey, flags, raw_eth, */ + return send_pending(qp, (uintptr_t)addr, length, 0, flags, raw_eth, + /* inl, safe, use_sg, use_mpw, num_sge, sg_list */ + 1, 1, 0, 1, 0, NULL); +} + +#define MLX5_SEND_PENDING_INL_UNSAFE_NAME(eth, mpw) mlx5_send_pending_inl_unsafe_##eth##mpw +#define MLX5_SEND_PENDING_INL_UNSAFE(eth, mpw) \ + static int MLX5_SEND_PENDING_INL_UNSAFE_NAME(eth, mpw)( \ + struct ibv_qp *qp, void *addr, \ + uint32_t length, uint32_t flags) __MLX5_ALGN_F__; \ + static int MLX5_SEND_PENDING_INL_UNSAFE_NAME(eth, mpw)( \ + struct ibv_qp *qp, void *addr, \ + uint32_t length, uint32_t flags) \ + { \ + /* qp, addr, length, lkey, flags, eth, inl, */ \ + return send_pending(qp, (uintptr_t)addr, length, 0, flags, eth, 1, \ + /* safe, use_sg, use_mpw, num_sge, sg_list */ \ + 0, 0, mpw, 0, NULL); \ + } +/* eth mpw */ +MLX5_SEND_PENDING_INL_UNSAFE(0, 0); +MLX5_SEND_PENDING_INL_UNSAFE(0, 1); +MLX5_SEND_PENDING_INL_UNSAFE(1, 0); +MLX5_SEND_PENDING_INL_UNSAFE(1, 1); + +/* burst family - send_pending_sg_list */ +static int mlx5_send_pending_sg_list_safe( + struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num, + uint32_t flags) __MLX5_ALGN_F__; +static int mlx5_send_pending_sg_list_safe( + struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num, + uint32_t flags) +{ + struct mlx5_qp *mqp = to_mqp(ibqp); + int raw_eth = mqp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET && mqp->link_layer == IBV_LINK_LAYER_ETHERNET; + + /* qp, addr, length, lkey, flags, raw_eth, inl, */ + return send_pending(ibqp, 0, 0, 0, flags, raw_eth, 0, + /* safe, use_sg, use_mpw, num_sge, sg_list */ + 1, 1, 0, num, sg_list); +} + +static int mlx5_send_pending_sg_list_mpw_safe( + struct ibv_qp 
*ibqp, struct ibv_sge *sg_list, uint32_t num, + uint32_t flags) __MLX5_ALGN_F__; +static int mlx5_send_pending_sg_list_mpw_safe( + struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num, + uint32_t flags) +{ + struct mlx5_qp *mqp = to_mqp(ibqp); + int raw_eth = mqp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET && mqp->link_layer == IBV_LINK_LAYER_ETHERNET; + + /* qp, addr, length, lkey, flags, raw_eth, inl, */ + return send_pending(ibqp, 0, 0, 0, flags, raw_eth, 0, + /* safe, use_sg, use_mpw, num_sge, sg_list */ + 1, 1, 1, num, sg_list); +} + +#define MLX5_SEND_PENDING_SG_LIST_UNSAFE_NAME(eth, mpw) mlx5_send_pending_sg_list_unsafe_##eth##mpw +#define MLX5_SEND_PENDING_SG_LIST_UNSAFE(eth, mpw) \ + static int MLX5_SEND_PENDING_SG_LIST_UNSAFE_NAME(eth, mpw)( \ + struct ibv_qp *ibqp, struct ibv_sge *sg_list, \ + uint32_t num, uint32_t flags) __MLX5_ALGN_F__; \ + static int MLX5_SEND_PENDING_SG_LIST_UNSAFE_NAME(eth, mpw)( \ + struct ibv_qp *ibqp, struct ibv_sge *sg_list, \ + uint32_t num, uint32_t flags) \ + { \ + /* qp, addr, length, lkey, flags, eth, inl, */ \ + return send_pending(ibqp, 0, 0, 0, flags, eth, 0, \ + /* safe, use_sg, use_mpw, num_sge, sg_list */ \ + 0, 1, mpw, num, sg_list); \ + } +/* eth mpw */ +MLX5_SEND_PENDING_SG_LIST_UNSAFE(0, 0); +MLX5_SEND_PENDING_SG_LIST_UNSAFE(0, 1); +MLX5_SEND_PENDING_SG_LIST_UNSAFE(1, 0); +MLX5_SEND_PENDING_SG_LIST_UNSAFE(1, 1); + +/* burst family - send_burst */ +static inline int send_flush_unsafe(struct ibv_qp *ibqp, const int db_method) __attribute__((always_inline)); + +static inline int send_msg_list(struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num, + uint32_t flags, const int raw_eth, const int thread_safe, + const int db_method, const int mpw) __attribute__((always_inline)); +static inline int send_msg_list(struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num, + uint32_t flags, const int raw_eth, const int thread_safe, + const int db_method, const int mpw) +{ + struct mlx5_qp *qp = to_mqp(ibqp); + int i; + + if (thread_safe) + mlx5_lock(&qp->sq.lock); + + for (i = 0; i < num; i++, sg_list++) + /* qp, addr, length, lkey, */ + send_pending(ibqp, sg_list->addr, sg_list->length, sg_list->lkey, + /* flags, raw_eth, inl, safe, use_sg, */ + flags, raw_eth, 0, 0, 0, + /* use_mpw, num_sge, sg_list */ + mpw, 0, NULL); + + /* use send_flush_unsafe since lock is already taken if needed */ + send_flush_unsafe(ibqp, db_method); + + if (thread_safe) + mlx5_unlock(&qp->sq.lock); + + return 0; +} + +static int mlx5_send_burst_safe(struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num, uint32_t flags) __MLX5_ALGN_F__; +static int mlx5_send_burst_safe(struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num, uint32_t flags) +{ + struct mlx5_qp *qp = to_mqp(ibqp); + int eth = qp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET && + qp->link_layer == IBV_LINK_LAYER_ETHERNET; + + return send_msg_list(ibqp, sg_list, num, flags, eth, 1, qp->gen_data.bf->db_method, 0); +} + +static int mlx5_send_burst_mpw_safe(struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num, uint32_t flags) __MLX5_ALGN_F__; +static int mlx5_send_burst_mpw_safe(struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num, uint32_t flags) +{ + struct mlx5_qp *qp = to_mqp(ibqp); + int eth = qp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET && + qp->link_layer == IBV_LINK_LAYER_ETHERNET; + + return send_msg_list(ibqp, sg_list, num, flags, eth, 1, qp->gen_data.bf->db_method, 1); +} + +#define MLX5_SEND_BURST_UNSAFE_NAME(db_method, eth, mpw) 
mlx5_send_burst_unsafe_##db_method##eth##mpw +#define MLX5_SEND_BURST_UNSAFE(db_method, eth, mpw) \ + static int MLX5_SEND_BURST_UNSAFE_NAME(db_method, eth, mpw)( \ + struct ibv_qp *ibqp, struct ibv_sge *sg_list, \ + uint32_t num, uint32_t flags) __MLX5_ALGN_F__; \ + static int MLX5_SEND_BURST_UNSAFE_NAME(db_method, eth, mpw)( \ + struct ibv_qp *ibqp, struct ibv_sge *sg_list, \ + uint32_t num, uint32_t flags) \ + { \ + return send_msg_list(ibqp, sg_list, num, flags, eth, 0, db_method, mpw); \ + } +/* db_method, eth mpw */ +MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 0, 0); +MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 0, 1); +MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 1, 0); +MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 1, 1); +MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DEDIC_BF, 0, 0); +MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DEDIC_BF, 0, 1); +MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DEDIC_BF, 1, 0); +MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DEDIC_BF, 1, 1); +MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_BF, 0, 0); +MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_BF, 0, 1); +MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_BF, 1, 0); +MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_BF, 1, 1); +MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DB, 0, 0); +MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DB, 0, 1); +MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DB, 1, 0); +MLX5_SEND_BURST_UNSAFE(MLX5_DB_METHOD_DB, 1, 1); + +/* burst family - send_flush */ +static inline int send_flush_unsafe(struct ibv_qp *ibqp, const int db_method) +{ + struct mlx5_qp *qp = to_mqp(ibqp); + uint32_t curr_post = qp->gen_data.scur_post & 0xffff; + int size = ((int)curr_post - (int)qp->gen_data.last_post + (int)0x10000) & 0xffff; + unsigned long long *seg = mlx5_get_send_wqe(qp, qp->gen_data.last_post & (qp->sq.wqe_cnt - 1)); + + return __ring_db(qp, db_method, curr_post, seg, size); +} + +static int mlx5_send_flush_safe(struct ibv_qp *ibqp) __MLX5_ALGN_F__; +static int mlx5_send_flush_safe(struct ibv_qp *ibqp) +{ + struct mlx5_qp *qp = to_mqp(ibqp); + + mlx5_lock(&qp->sq.lock); + send_flush_unsafe(ibqp, qp->gen_data.bf->db_method); + mlx5_unlock(&qp->sq.lock); + + return 0; +} + +#define MLX5_SEND_FLUSH_UNSAFE_NAME(db_method) mlx5_send_flush_unsafe_##db_method +#define MLX5_SEND_FLUSH_UNSAFE(db_method) \ + static int MLX5_SEND_FLUSH_UNSAFE_NAME(db_method)( \ + struct ibv_qp *ibqp) __MLX5_ALGN_F__; \ + static int MLX5_SEND_FLUSH_UNSAFE_NAME(db_method)( \ + struct ibv_qp *ibqp) \ + { \ + return send_flush_unsafe(ibqp, db_method); \ + } +/* db_method */ +MLX5_SEND_FLUSH_UNSAFE(MLX5_DB_METHOD_DEDIC_BF_1_THREAD); +MLX5_SEND_FLUSH_UNSAFE(MLX5_DB_METHOD_DEDIC_BF); +MLX5_SEND_FLUSH_UNSAFE(MLX5_DB_METHOD_BF); +MLX5_SEND_FLUSH_UNSAFE(MLX5_DB_METHOD_DB); + +/* burst family - recv_pending_sg_list */ +static inline int recv_sg_list(struct mlx5_wq *rq, struct ibv_sge *sg_list, uint32_t num_sg, + const int thread_safe) __attribute__((always_inline)); +static inline int recv_sg_list(struct mlx5_wq *rq, struct ibv_sge *sg_list, uint32_t num_sg, + const int thread_safe) +{ + struct mlx5_wqe_data_seg *scat; + unsigned int ind; + int i, j; + + if (thread_safe) + mlx5_lock(&rq->lock); + + ind = rq->head & (rq->wqe_cnt - 1); + scat = get_recv_wqe(rq, ind); + + for (i = 0, j = 0; i < num_sg; ++i, sg_list++) { + if (unlikely(!sg_list->length)) + continue; + scat->byte_count = htonl(sg_list->length); + scat->lkey = htonl(sg_list->lkey); + scat->addr = htonll(sg_list->addr); + scat++; + j++; + } + if (j < rq->max_gs) { + scat->byte_count = 0; + scat->lkey = 
htonl(MLX5_INVALID_LKEY); + scat->addr = 0; + } + rq->head++; + + /* + * Make sure that descriptors are written before + * doorbell record. + */ + wmb(); + + *rq->db = htonl(rq->head & 0xffff); + + if (thread_safe) + mlx5_unlock(&rq->lock); + + return 0; +} + +/* burst family - recv_burst */ +static inline int recv_burst(struct mlx5_wq *rq, struct ibv_sge *sg_list, uint32_t num, + const int thread_safe, const int max_one_sge, const int mp_rq) __attribute__((always_inline)); +static inline int recv_burst(struct mlx5_wq *rq, struct ibv_sge *sg_list, uint32_t num, + const int thread_safe, const int max_one_sge, const int mp_rq) +{ + struct mlx5_wqe_data_seg *scat; + unsigned int ind; + int i; + + if (thread_safe) + mlx5_lock(&rq->lock); + + ind = rq->head & (rq->wqe_cnt - 1); + for (i = 0; i < num; ++i) { + scat = get_recv_wqe(rq, ind); + /* Multi-Packet RQ WQE format is like SRQ format and requires + * a next-segment octword. + * This next-segment octword is reserved (therefore cleared) + * when we use CYCLIC_STRIDING_RQ + */ + if (mp_rq) { + memset(scat, 0, sizeof(struct mlx5_wqe_srq_next_seg)); + scat++; + } + scat->byte_count = htonl(sg_list->length); + scat->lkey = htonl(sg_list->lkey); + scat->addr = htonll(sg_list->addr); + + if (!max_one_sge) { + scat[1].byte_count = 0; + scat[1].lkey = htonl(MLX5_INVALID_LKEY); + scat[1].addr = 0; + } + + sg_list++; + ind = (ind + 1) & (rq->wqe_cnt - 1); + } + rq->head += num; + + /* + * Make sure that descriptors are written before + * doorbell record. + */ + wmb(); + + *rq->db = htonl(rq->head & 0xffff); + + if (thread_safe) + mlx5_unlock(&rq->lock); + + return 0; +} + +static int mlx5_recv_burst_safe(struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num) __MLX5_ALGN_F__; +static int mlx5_recv_burst_safe(struct ibv_qp *ibqp, struct ibv_sge *sg_list, uint32_t num) +{ + struct mlx5_qp *qp = to_mqp(ibqp); + + return recv_burst(&qp->rq, sg_list, num, 1, qp->rq.max_gs == 1, 0); +} + +#define MLX5_RECV_BURST_UNSAFE_NAME(_1sge) mlx5_recv_burst_unsafe_##_1sge +#define MLX5_RECV_BURST_UNSAFE(_1sge) \ + static int MLX5_RECV_BURST_UNSAFE_NAME(_1sge)( \ + struct ibv_qp *ibqp, struct ibv_sge *sg_list, \ + uint32_t num) __MLX5_ALGN_F__; \ + static int MLX5_RECV_BURST_UNSAFE_NAME(_1sge)( \ + struct ibv_qp *ibqp, struct ibv_sge *sg_list, \ + uint32_t num) \ + { \ + return recv_burst(&to_mqp(ibqp)->rq, sg_list, num, 0, _1sge, 0); \ + } +/* _1sge */ +MLX5_RECV_BURST_UNSAFE(0); +MLX5_RECV_BURST_UNSAFE(1); + +/* + * qp_burst family implementation for safe QP + */ +struct ibv_exp_qp_burst_family mlx5_qp_burst_family_safe = { + .send_burst = mlx5_send_burst_safe, + .send_pending = mlx5_send_pending_safe, + .send_pending_inline = mlx5_send_pending_inl_safe, + .send_pending_sg_list = mlx5_send_pending_sg_list_safe, + .send_flush = mlx5_send_flush_safe, + .recv_burst = mlx5_recv_burst_safe +}; + +struct ibv_exp_qp_burst_family mlx5_qp_burst_family_mpw_safe = { + .send_burst = mlx5_send_burst_mpw_safe, + .send_pending = mlx5_send_pending_mpw_safe, + .send_pending_inline = mlx5_send_pending_inl_mpw_safe, + .send_pending_sg_list = mlx5_send_pending_sg_list_mpw_safe, + .send_flush = mlx5_send_flush_safe, + .recv_burst = mlx5_recv_burst_safe +}; + +/* + * qp_burst family implementation table for unsafe QP + * + * Each table entry contains an implementation of the ibv_exp_qp_burst_family + * which fits to QPs with specific attributes: + * - db_method (MLX5_DB_METHOD_DEDIC_BF_1_THREAD, MLX5_DB_METHOD_DEDIC_BF, + * MLX5_DB_METHOD_BF or MLX5_DB_METHOD_DB) + * - raw_eth_qp 
(yes/no), + * - max-rcv-gs == 1 (yes/no) + * + * To get the right qp_burst_family implementation for specific QP use the QP + * attributes (db_method << 2 | eth << 1 | _1sge) as an index for the qp_burst + * family table + */ +#define MLX5_QP_BURST_UNSAFE_TBL_IDX(db_method, eth, _1sge, mpw) \ + (db_method << 3 | eth << 2 | _1sge << 1 | mpw) + +#define MLX5_QP_BURST_UNSAFE_TBL_ENTRY(db_method, eth, _1sge, mpw) \ + [MLX5_QP_BURST_UNSAFE_TBL_IDX(db_method, eth, _1sge, mpw)] = { \ + .send_burst = MLX5_SEND_BURST_UNSAFE_NAME(db_method, eth, mpw), \ + .send_pending = MLX5_SEND_PENDING_UNSAFE_NAME(eth, mpw), \ + .send_pending_inline = MLX5_SEND_PENDING_INL_UNSAFE_NAME(eth, mpw), \ + .send_pending_sg_list = MLX5_SEND_PENDING_SG_LIST_UNSAFE_NAME(eth, mpw), \ + .send_flush = MLX5_SEND_FLUSH_UNSAFE_NAME(db_method), \ + .recv_burst = MLX5_RECV_BURST_UNSAFE_NAME(_1sge), \ + } +static struct ibv_exp_qp_burst_family mlx5_qp_burst_family_unsafe_tbl[1 << 5] = { + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 0, 0, 0), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 0, 0, 1), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 0, 1, 0), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 0, 1, 1), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 1, 0, 0), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 1, 0, 1), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 1, 1, 0), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF_1_THREAD, 1, 1, 1), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF, 0, 0, 0), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF, 0, 0, 1), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF, 0, 1, 0), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF, 0, 1, 1), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF, 1, 0, 0), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF, 1, 0, 1), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF, 1, 1, 0), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DEDIC_BF, 1, 1, 1), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_BF, 0, 0, 0), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_BF, 0, 0, 1), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_BF, 0, 1, 0), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_BF, 0, 1, 1), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_BF, 1, 0, 0), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_BF, 1, 0, 1), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_BF, 1, 1, 0), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_BF, 1, 1, 1), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DB, 0, 0, 0), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DB, 0, 0, 1), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DB, 0, 1, 0), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DB, 0, 1, 1), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DB, 1, 0, 0), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DB, 1, 0, 1), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DB, 1, 1, 0), + MLX5_QP_BURST_UNSAFE_TBL_ENTRY(MLX5_DB_METHOD_DB, 1, 1, 1), +}; + +struct ibv_exp_qp_burst_family *mlx5_get_qp_burst_family(struct mlx5_qp *qp, + struct ibv_exp_query_intf_params *params, + enum ibv_exp_query_intf_status *status) +{ + enum ibv_exp_query_intf_status ret = IBV_EXP_INTF_STAT_OK; + struct ibv_exp_qp_burst_family *family = NULL; + uint32_t unsupported_f; + int mpw; + + if (params->intf_version > MLX5_MAX_QP_BURST_FAMILY_VER) { + *status = IBV_EXP_INTF_STAT_VERSION_NOT_SUPPORTED; + + 
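/*
 * Selection sketch (illustrative): for a QP that is not thread safe, the
 * family is looked up in mlx5_qp_burst_family_unsafe_tbl using the 5-bit
 * index built by MLX5_QP_BURST_UNSAFE_TBL_IDX() above, i.e.
 *
 *	idx = (db_method << 3) | (eth << 2) | (_1sge << 1) | mpw;
 *
 * For example, an unsafe raw Ethernet QP (eth = 1) with rq.max_gs > 1
 * (_1sge = 0) and multi-packet WQEs enabled (mpw = 1) selects entry
 * (db_method << 3) | 0x5.
 */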
return NULL; + } + + if ((qp->verbs_qp.qp.state < IBV_QPS_INIT) || (qp->verbs_qp.qp.state > IBV_QPS_RTS)) { + *status = IBV_EXP_INTF_STAT_INVAL_OBJ_STATE; + return NULL; + } + if (qp->gen_data.create_flags & IBV_EXP_QP_CREATE_MANAGED_SEND) { + fprintf(stderr, PFX "Can't use QP burst family while QP_CREATE_MANAGED_SEND is set\n"); + *status = IBV_EXP_INTF_STAT_INVAL_PARARM; + return NULL; + } + if (params->flags) { + fprintf(stderr, PFX "Global interface flags(0x%x) are not supported for QP family\n", params->flags); + *status = IBV_EXP_INTF_STAT_FLAGS_NOT_SUPPORTED; + + return NULL; + } + unsupported_f = params->family_flags & ~(IBV_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR); + if (unsupported_f) { + fprintf(stderr, PFX "Family flags(0x%x) are not supported for QP family\n", unsupported_f); + *status = IBV_EXP_INTF_STAT_FAMILY_FLAGS_NOT_SUPPORTED; + + return NULL; + } + + switch (qp->gen_data_warm.qp_type) { + case IBV_QPT_RC: + case IBV_QPT_UC: + case IBV_QPT_RAW_PACKET: + mpw = (params->family_flags & IBV_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR) && + (qp->gen_data.model_flags & MLX5_QP_MODEL_MULTI_PACKET_WQE); + + if (qp->gen_data.model_flags & MLX5_QP_MODEL_FLAG_THREAD_SAFE) { + if (mpw) + family = &mlx5_qp_burst_family_mpw_safe; + else + family = &mlx5_qp_burst_family_safe; + } else { + int eth = qp->gen_data_warm.qp_type == IBV_QPT_RAW_PACKET && + qp->link_layer == IBV_LINK_LAYER_ETHERNET; + int _1sge = qp->rq.max_gs == 1; + int db_method = qp->gen_data.bf->db_method; + + family = &mlx5_qp_burst_family_unsafe_tbl + [MLX5_QP_BURST_UNSAFE_TBL_IDX(db_method, eth, _1sge, mpw)]; + } + break; + + default: + ret = IBV_EXP_INTF_STAT_INVAL_PARARM; + break; + } + + *status = ret; + + return family; +} + +/* + * WQ family + */ + +/* wq family - recv_burst */ +static int mlx5_wq_recv_burst_safe(struct ibv_exp_wq *ibwq, struct ibv_sge *sg_list, uint32_t num) __MLX5_ALGN_F__; +static int mlx5_wq_recv_burst_safe(struct ibv_exp_wq *ibwq, struct ibv_sge *sg_list, uint32_t num) +{ + struct mlx5_rwq *rwq = to_mrwq(ibwq); + + return recv_burst(&rwq->rq, sg_list, num, 1, rwq->rq.max_gs == 1, rwq->rsc.type == MLX5_RSC_TYPE_MP_RWQ); +} + +#define MLX5_WQ_RECV_BURST_UNSAFE_NAME(_1sge) mlx5_wq_recv_burst_unsafe_##_1sge +#define MLX5_WQ_RECV_BURST_UNSAFE(_1sge) \ + static int MLX5_WQ_RECV_BURST_UNSAFE_NAME(_1sge)( \ + struct ibv_exp_wq *ibwq, struct ibv_sge *sg_list, \ + uint32_t num) __MLX5_ALGN_F__; \ + static int MLX5_WQ_RECV_BURST_UNSAFE_NAME(_1sge)( \ + struct ibv_exp_wq *ibwq, struct ibv_sge *sg_list, \ + uint32_t num) \ + { \ + struct mlx5_rwq *rwq = to_mrwq(ibwq); \ + \ + return recv_burst(&rwq->rq, sg_list, num, 0, _1sge, \ + rwq->rsc.type == MLX5_RSC_TYPE_MP_RWQ); \ + } +/* _1sge */ +MLX5_WQ_RECV_BURST_UNSAFE(0); +MLX5_WQ_RECV_BURST_UNSAFE(1); + +/* wq family - recv_sg_list */ +static int mlx5_wq_recv_sg_list_safe(struct ibv_exp_wq *ibwq, struct ibv_sge *sg_list, uint32_t num_sg) __MLX5_ALGN_F__; +static int mlx5_wq_recv_sg_list_safe(struct ibv_exp_wq *ibwq, struct ibv_sge *sg_list, uint32_t num_sg) +{ + return recv_sg_list(&to_mrwq(ibwq)->rq, sg_list, num_sg, 1); +} + +static int mlx5_wq_recv_sg_list_unsafe(struct ibv_exp_wq *ibwq, struct ibv_sge *sg_list, uint32_t num_sg) __MLX5_ALGN_F__; +static int mlx5_wq_recv_sg_list_unsafe(struct ibv_exp_wq *ibwq, struct ibv_sge *sg_list, uint32_t num_sg) +{ + return recv_sg_list(&to_mrwq(ibwq)->rq, sg_list, num_sg, 0); +} + +/* + * wq family implementation for safe WQ + */ +struct ibv_exp_wq_family mlx5_wq_family_safe = { + .recv_sg_list = 
mlx5_wq_recv_sg_list_safe, + .recv_burst = mlx5_wq_recv_burst_safe +}; + +/* + * wq family implementation table for unsafe WQ + * + * Each table entry contains an implementation of the ibv_exp_wq_family + * which fits to WQs with specific attributes: + * - max-rcv-gs == 1 (yes/no) + * + * To get the right wq_family implementation for specific WQ use the WQ + * attribute (_1sge) as an index for the qp_burst family table + */ +#define MLX5_WQ_UNSAFE_TBL_IDX(_1sge) \ + (_1sge) + +#define MLX5_WQ_UNSAFE_TBL_ENTRY(_1sge) \ + [MLX5_WQ_UNSAFE_TBL_IDX(_1sge)] = { \ + .recv_sg_list = mlx5_wq_recv_sg_list_unsafe, \ + .recv_burst = MLX5_WQ_RECV_BURST_UNSAFE_NAME(_1sge) \ + } + +static struct ibv_exp_wq_family mlx5_wq_family_unsafe_tbl[1 << 1] = { + MLX5_WQ_UNSAFE_TBL_ENTRY(0), + MLX5_WQ_UNSAFE_TBL_ENTRY(1), +}; + +struct ibv_exp_wq_family *mlx5_get_wq_family(struct mlx5_rwq *rwq, + struct ibv_exp_query_intf_params *params, + enum ibv_exp_query_intf_status *status) +{ + enum ibv_exp_query_intf_status ret = IBV_EXP_INTF_STAT_OK; + struct ibv_exp_wq_family *family = NULL; + + if (params->intf_version > MLX5_MAX_WQ_FAMILY_VER) { + *status = IBV_EXP_INTF_STAT_VERSION_NOT_SUPPORTED; + + return NULL; + } + + if (params->flags) { + fprintf(stderr, PFX "Global interface flags(0x%x) are not supported for WQ family\n", params->flags); + *status = IBV_EXP_INTF_STAT_FLAGS_NOT_SUPPORTED; + + return NULL; + } + if (params->family_flags) { + fprintf(stderr, PFX "Family flags(0x%x) are not supported for WQ family\n", params->family_flags); + *status = IBV_EXP_INTF_STAT_FAMILY_FLAGS_NOT_SUPPORTED; + + return NULL; + } + + if (rwq->model_flags & MLX5_WQ_MODEL_FLAG_THREAD_SAFE) { + family = &mlx5_wq_family_safe; + } else { + int _1sge = rwq->rq.max_gs == 1; + + family = &mlx5_wq_family_unsafe_tbl + [MLX5_WQ_UNSAFE_TBL_IDX(_1sge)]; + } + + *status = ret; + + return family; +} + Index: contrib/ofed/libmlx5/src/srq.c =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/srq.c @@ -0,0 +1,271 @@ +/* + * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include + +#include "mlx5.h" +#include "doorbell.h" +#include "wqe.h" + +static void *get_wqe(struct mlx5_srq *srq, int n) +{ + return srq->buf.buf + (n << srq->wqe_shift); +} + +int mlx5_copy_to_recv_srq(struct mlx5_srq *srq, int idx, void *buf, int size) +{ + struct mlx5_wqe_srq_next_seg *next; + struct mlx5_wqe_data_seg *scat; + int copy; + int i; + int max = 1 << (srq->wqe_shift - 4); + + next = get_wqe(srq, idx); + scat = (struct mlx5_wqe_data_seg *) (next + 1); + + for (i = 0; i < max; ++i) { + copy = min(size, ntohl(scat->byte_count)); + memcpy((void *)(unsigned long)ntohll(scat->addr), buf, copy); + size -= copy; + if (size <= 0) + return IBV_WC_SUCCESS; + + buf += copy; + ++scat; + } + return IBV_WC_LOC_LEN_ERR; +} + +void mlx5_free_srq_wqe(struct mlx5_srq *srq, int ind) +{ + struct mlx5_wqe_srq_next_seg *next; + + mlx5_spin_lock(&srq->lock); + + next = get_wqe(srq, srq->tail); + next->next_wqe_index = htons(ind); + srq->tail = ind; + + mlx5_spin_unlock(&srq->lock); +} + +static void set_sig_seg(struct mlx5_srq *srq, + struct mlx5_wqe_srq_next_seg *next, + int size, uint16_t idx) +{ + uint8_t sign; + uint32_t srqn = srq->srqn; + + next->signature = 0; + sign = calc_xor(next, size); + sign ^= calc_xor(&srqn, 4); + sign ^= calc_xor(&idx, 2); + next->signature = ~sign; +} + +int mlx5_post_srq_recv(struct ibv_srq *ibsrq, + struct ibv_recv_wr *wr, + struct ibv_recv_wr **bad_wr) +{ + struct mlx5_srq *srq; + struct mlx5_wqe_srq_next_seg *next; + struct mlx5_wqe_data_seg *scat; + unsigned head; + int err = 0; + int nreq; + int i; + + if (ibsrq->handle == LEGACY_XRC_SRQ_HANDLE) + ibsrq = (struct ibv_srq *)(((struct ibv_srq_legacy *) ibsrq)->ibv_srq); + + srq = to_msrq(ibsrq); + mlx5_spin_lock(&srq->lock); + + for (nreq = 0; wr; ++nreq, wr = wr->next) { + if (wr->num_sge > srq->max_gs) { + errno = EINVAL; + err = errno; + *bad_wr = wr; + break; + } + + head = srq->head; + if (head == srq->tail) { + /* SRQ is full*/ + errno = ENOMEM; + err = errno; + *bad_wr = wr; + break; + } + + srq->wrid[head] = wr->wr_id; + + next = get_wqe(srq, head); + srq->head = ntohs(next->next_wqe_index); + scat = (struct mlx5_wqe_data_seg *) (next + 1); + + for (i = 0; i < wr->num_sge; ++i) { + scat[i].byte_count = htonl(wr->sg_list[i].length); + scat[i].lkey = htonl(wr->sg_list[i].lkey); + scat[i].addr = htonll(wr->sg_list[i].addr); + } + + if (i < srq->max_gs) { + scat[i].byte_count = 0; + scat[i].lkey = htonl(MLX5_INVALID_LKEY); + scat[i].addr = 0; + } + if (unlikely(srq->wq_sig)) + set_sig_seg(srq, next, 1 << srq->wqe_shift, head + nreq); + } + + if (nreq) { + srq->counter += nreq; + + /* + * Make sure that descriptors are written before + * we write doorbell record. 
+ */ + wmb(); + + *srq->db = htonl(srq->counter); + } + + mlx5_spin_unlock(&srq->lock); + + return err; +} + +int mlx5_alloc_srq_buf(struct ibv_context *context, struct mlx5_srq *srq) +{ + struct mlx5_wqe_srq_next_seg *next; + int size; + int buf_size; + int i; + struct mlx5_context *ctx; + + ctx = to_mctx(context); + + if (srq->max_gs < 0) { + errno = EINVAL; + return -1; + } + + srq->wrid = malloc(srq->max * sizeof *srq->wrid); + if (!srq->wrid) + return -1; + + size = sizeof(struct mlx5_wqe_srq_next_seg) + + srq->max_gs * sizeof(struct mlx5_wqe_data_seg); + size = max(32, size); + + size = mlx5_round_up_power_of_two(size); + + if (size > ctx->max_recv_wr) { + errno = EINVAL; + return -1; + } + srq->max_gs = (size - sizeof(struct mlx5_wqe_srq_next_seg)) / + sizeof(struct mlx5_wqe_data_seg); + + srq->wqe_shift = mlx5_ilog2(size); + + buf_size = srq->max * size; + + if (mlx5_alloc_buf(&srq->buf, buf_size, + to_mdev(context->device)->page_size)) { + free(srq->wrid); + return -1; + } + + memset(srq->buf.buf, 0, buf_size); + + /* + * Now initialize the SRQ buffer so that all of the WQEs are + * linked into the list of free WQEs. + */ + + for (i = 0; i < srq->max; ++i) { + next = get_wqe(srq, i); + next->next_wqe_index = htons((i + 1) & (srq->max - 1)); + } + + srq->head = 0; + srq->tail = srq->max - 1; + + return 0; +} + +struct mlx5_srq *mlx5_find_srq(struct mlx5_context *ctx, uint32_t srqn) +{ + int tind = srqn >> MLX5_SRQ_TABLE_SHIFT; + + if (ctx->srq_table[tind].refcnt) + return ctx->srq_table[tind].table[srqn & MLX5_SRQ_TABLE_MASK]; + else + return NULL; +} + +int mlx5_store_srq(struct mlx5_context *ctx, uint32_t srqn, + struct mlx5_srq *srq) +{ + int tind = srqn >> MLX5_SRQ_TABLE_SHIFT; + + if (!ctx->srq_table[tind].refcnt) { + ctx->srq_table[tind].table = calloc(MLX5_QP_TABLE_MASK + 1, + sizeof(struct mlx5_qp *)); + if (!ctx->srq_table[tind].table) + return -1; + } + + ++ctx->srq_table[tind].refcnt; + ctx->srq_table[tind].table[srqn & MLX5_QP_TABLE_MASK] = srq; + return 0; +} + +void mlx5_clear_srq(struct mlx5_context *ctx, uint32_t srqn) +{ + int tind = srqn >> MLX5_QP_TABLE_SHIFT; + + if (!--ctx->srq_table[tind].refcnt) + free(ctx->srq_table[tind].table); + else + ctx->srq_table[tind].table[srqn & MLX5_SRQ_TABLE_MASK] = NULL; +} Index: contrib/ofed/libmlx5/src/verbs.c =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/verbs.c @@ -0,0 +1,3462 @@ +/* + * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "mlx5.h" +#include "mlx5-abi.h" +#include "wqe.h" + +int mlx5_single_threaded = 0; +int mlx5_use_mutex; + +static void __mlx5_query_device(uint64_t raw_fw_ver, + struct ibv_device_attr *attr) +{ + unsigned major, minor, sub_minor; + + major = (raw_fw_ver >> 32) & 0xffff; + minor = (raw_fw_ver >> 16) & 0xffff; + sub_minor = raw_fw_ver & 0xffff; + + snprintf(attr->fw_ver, sizeof attr->fw_ver, + "%d.%d.%04d", major, minor, sub_minor); +} + +int mlx5_query_device(struct ibv_context *context, + struct ibv_device_attr *attr) +{ + struct ibv_exp_device_attr attrx; + struct ibv_exp_query_device cmd; + uint64_t raw_fw_ver; + int err; + + read_init_vars(to_mctx(context)); + memset(&attrx, 0, sizeof(attrx)); + err = ibv_exp_cmd_query_device(context, + &attrx, + &raw_fw_ver, &cmd, + sizeof(cmd)); + if (err) + return err; + + memcpy(attr, &attrx, sizeof(*attr)); + __mlx5_query_device(raw_fw_ver, attr); + + return err; +} + +int mlx5_query_device_ex(struct ibv_context *context, + struct ibv_exp_device_attr *attr) +{ + struct ibv_exp_query_device cmd; + struct mlx5_context *ctx = to_mctx(context); + uint64_t raw_fw_ver; + int err; + + err = ibv_exp_cmd_query_device(context, attr, &raw_fw_ver, + &cmd, sizeof(cmd)); + if (err) + return err; + + __mlx5_query_device(raw_fw_ver, (struct ibv_device_attr *)attr); + + attr->exp_device_cap_flags |= IBV_EXP_DEVICE_MR_ALLOCATE; + if (attr->exp_device_cap_flags & IBV_EXP_DEVICE_CROSS_CHANNEL) { + attr->comp_mask |= IBV_EXP_DEVICE_ATTR_CALC_CAP; + attr->calc_cap.data_types = + (1ULL << IBV_EXP_CALC_DATA_TYPE_INT) | + (1ULL << IBV_EXP_CALC_DATA_TYPE_UINT) | + (1ULL << IBV_EXP_CALC_DATA_TYPE_FLOAT); + attr->calc_cap.data_sizes = + (1ULL << IBV_EXP_CALC_DATA_SIZE_64_BIT); + attr->calc_cap.int_ops = (1ULL << IBV_EXP_CALC_OP_ADD) | + (1ULL << IBV_EXP_CALC_OP_BAND) | + (1ULL << IBV_EXP_CALC_OP_BXOR) | + (1ULL << IBV_EXP_CALC_OP_BOR); + attr->calc_cap.uint_ops = attr->calc_cap.int_ops; + attr->calc_cap.fp_ops = attr->calc_cap.int_ops; + } + if (ctx->cc.buf) + attr->exp_device_cap_flags |= IBV_EXP_DEVICE_DC_INFO; + + if (attr->comp_mask & IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS) + attr->exp_device_cap_flags &= (~IBV_EXP_DEVICE_VXLAN_SUPPORT); + + if (attr->comp_mask & IBV_EXP_DEVICE_ATTR_MP_RQ) + /* Lib supports MP-RQ only for RAW_ETH QPs reset other + * QP types supported by kernel + */ + attr->mp_rq_caps.supported_qps &= IBV_EXP_QPT_RAW_PACKET; + + + if (attr->comp_mask & IBV_EXP_DEVICE_ATTR_MP_RQ) { + /* Update kernel caps to mp_rq caps supported by lib */ + attr->mp_rq_caps.allowed_shifts &= MLX5_MP_RQ_SUPPORTED_SHIFTS; + attr->mp_rq_caps.supported_qps &= MLX5_MP_RQ_SUPPORTED_QPT; + if (attr->mp_rq_caps.max_single_stride_log_num_of_bytes > MLX5_MP_RQ_MAX_LOG_STRIDE_SIZE) + attr->mp_rq_caps.max_single_stride_log_num_of_bytes = MLX5_MP_RQ_MAX_LOG_STRIDE_SIZE; + if 
(attr->mp_rq_caps.max_single_wqe_log_num_of_strides > MLX5_MP_RQ_MAX_LOG_NUM_STRIDES) + attr->mp_rq_caps.max_single_wqe_log_num_of_strides = MLX5_MP_RQ_MAX_LOG_NUM_STRIDES; + } + + return err; +} + +int mlx5_query_port(struct ibv_context *context, uint8_t port, + struct ibv_port_attr *attr) +{ + struct mlx5_context *mctx = to_mctx(context); + struct ibv_query_port cmd; + int err; + + read_init_vars(mctx); + err = ibv_cmd_query_port(context, port, attr, &cmd, sizeof cmd); + + if (!err && port <= mctx->num_ports && port > 0) { + if (!mctx->port_query_cache[port - 1].valid) { + mctx->port_query_cache[port - 1].link_layer = + attr->link_layer; + mctx->port_query_cache[port - 1].caps = + attr->port_cap_flags; + mctx->port_query_cache[port - 1].valid = 1; + } + } + + return err; +} + +int mlx5_exp_query_port(struct ibv_context *context, uint8_t port_num, + struct ibv_exp_port_attr *port_attr) +{ + struct mlx5_context *mctx = to_mctx(context); + + /* Check that only valid flags were given */ + if (!(port_attr->comp_mask & IBV_EXP_QUERY_PORT_ATTR_MASK1) || + (port_attr->comp_mask & ~IBV_EXP_QUERY_PORT_ATTR_MASKS) || + (port_attr->mask1 & ~IBV_EXP_QUERY_PORT_MASK)) { + return EINVAL; + } + + /* Optimize the link type query */ + if (port_attr->comp_mask == IBV_EXP_QUERY_PORT_ATTR_MASK1) { + if (!(port_attr->mask1 & ~(IBV_EXP_QUERY_PORT_LINK_LAYER | + IBV_EXP_QUERY_PORT_CAP_FLAGS))) { + if (port_num <= 0 || port_num > mctx->num_ports) + return EINVAL; + if (mctx->port_query_cache[port_num - 1].valid) { + if (port_attr->mask1 & + IBV_EXP_QUERY_PORT_LINK_LAYER) + port_attr->link_layer = + mctx-> + port_query_cache[port_num - 1]. + link_layer; + if (port_attr->mask1 & + IBV_EXP_QUERY_PORT_CAP_FLAGS) + port_attr->port_cap_flags = + mctx-> + port_query_cache[port_num - 1]. + caps; + return 0; + } + } + if (port_attr->mask1 & IBV_EXP_QUERY_PORT_STD_MASK) { + return mlx5_query_port(context, port_num, + &port_attr->port_attr); + } + } + + return EOPNOTSUPP; + +} + +struct ibv_pd *mlx5_alloc_pd(struct ibv_context *context) +{ + struct ibv_alloc_pd cmd; + struct mlx5_alloc_pd_resp resp; + struct mlx5_pd *pd; + + read_init_vars(to_mctx(context)); + pd = calloc(1, sizeof *pd); + if (!pd) + return NULL; + + if (ibv_cmd_alloc_pd(context, &pd->ibv_pd, &cmd, sizeof cmd, + &resp.ibv_resp, sizeof(resp))) + goto err; + + pd->pdn = resp.pdn; + + + if (mlx5_init_implicit_lkey(&pd->r_ilkey, IBV_EXP_ACCESS_ON_DEMAND) || + mlx5_init_implicit_lkey(&pd->w_ilkey, IBV_EXP_ACCESS_ON_DEMAND | + IBV_EXP_ACCESS_LOCAL_WRITE)) + goto err; + + return &pd->ibv_pd; + +err: + free(pd); + return NULL; +} + +int mlx5_free_pd(struct ibv_pd *pd) +{ + struct mlx5_pd *mpd = to_mpd(pd); + int ret; + + /* TODO: Better handling of destruction failure due to resources + * opened. 
At the moment, we might seg-fault here.*/ + mlx5_destroy_implicit_lkey(&mpd->r_ilkey); + mlx5_destroy_implicit_lkey(&mpd->w_ilkey); + if (mpd->remote_ilkey) { + mlx5_destroy_implicit_lkey(mpd->remote_ilkey); + mpd->remote_ilkey = NULL; + } + + ret = ibv_cmd_dealloc_pd(pd); + if (ret) + return ret; + + free(mpd); + return 0; +} + +static void *alloc_buf(struct mlx5_mr *mr, + struct ibv_pd *pd, + size_t length, + void *contig_addr) +{ + size_t alloc_length; + int force_anon = 0; + int force_contig = 0; + enum mlx5_alloc_type alloc_type; + int page_size = to_mdev(pd->context->device)->page_size; + int err; + + mlx5_get_alloc_type(pd->context, MLX5_MR_PREFIX, &alloc_type, MLX5_ALLOC_TYPE_ALL); + + if (alloc_type == MLX5_ALLOC_TYPE_CONTIG) + force_contig = 1; + else if (alloc_type == MLX5_ALLOC_TYPE_ANON) + force_anon = 1; + + if (force_anon) { + err = mlx5_alloc_buf(&mr->buf, align(length, page_size), + page_size); + if (err) + return NULL; + + return mr->buf.buf; + } + + alloc_length = (contig_addr ? length : align(length, page_size)); + + err = mlx5_alloc_buf_contig(to_mctx(pd->context), &mr->buf, + alloc_length, page_size, MLX5_MR_PREFIX, + contig_addr); + if (!err) + return contig_addr ? contig_addr : mr->buf.buf; + + if (force_contig || contig_addr) + return NULL; + + err = mlx5_alloc_buf(&mr->buf, align(length, page_size), + page_size); + if (err) + return NULL; + + return mr->buf.buf; +} + +struct ibv_mr *mlx5_exp_reg_mr(struct ibv_exp_reg_mr_in *in) +{ + struct mlx5_mr *mr; + struct ibv_exp_reg_mr cmd; + int ret; + int is_contig; + + if ((in->comp_mask > IBV_EXP_REG_MR_RESERVED - 1) || + (in->exp_access > IBV_EXP_ACCESS_RESERVED - 1)) { + errno = EINVAL; + return NULL; + } + + if (in->addr == 0 && in->length == MLX5_WHOLE_ADDR_SPACE && + (in->exp_access & IBV_EXP_ACCESS_ON_DEMAND)) + return mlx5_alloc_whole_addr_mr(in); + + if ((in->exp_access & + (IBV_EXP_ACCESS_ON_DEMAND | IBV_EXP_ACCESS_RELAXED)) == + (IBV_EXP_ACCESS_ON_DEMAND | IBV_EXP_ACCESS_RELAXED)) { + struct ibv_mr *ibv_mr = NULL; + struct mlx5_pd *mpd = to_mpd(in->pd); + struct mlx5_implicit_lkey *implicit_lkey = + mlx5_get_implicit_lkey(mpd, in->exp_access); + struct ibv_exp_prefetch_attr prefetch_attr = { + .flags = in->exp_access & + (IBV_ACCESS_LOCAL_WRITE | + IBV_ACCESS_REMOTE_WRITE | + IBV_ACCESS_REMOTE_READ) ? 
+ IBV_EXP_PREFETCH_WRITE_ACCESS : 0, + .addr = in->addr, + .length = in->length, + .comp_mask = 0, + }; + + if (!implicit_lkey) + return NULL; + errno = mlx5_get_real_mr_from_implicit_lkey(mpd, implicit_lkey, + (uintptr_t)in->addr, + in->length, + &ibv_mr); + if (errno) + return NULL; + + /* Prefetch the requested range */ + ibv_exp_prefetch_mr(ibv_mr, &prefetch_attr); + + return ibv_mr; + } + + mr = calloc(1, sizeof(*mr)); + if (!mr) + return NULL; + + /* + * if addr is NULL and IBV_EXP_ACCESS_ALLOCATE_MR is set, + * the library allocates contiguous memory + */ + + /* Need valgrind exception here due to compiler optimization problem */ + VALGRIND_MAKE_MEM_DEFINED(&in->create_flags, sizeof(in->create_flags)); + is_contig = (!in->addr && (in->exp_access & IBV_EXP_ACCESS_ALLOCATE_MR)) || + ((in->comp_mask & IBV_EXP_REG_MR_CREATE_FLAGS) && + (in->create_flags & IBV_EXP_REG_MR_CREATE_CONTIG)); + + if (is_contig) { + in->addr = alloc_buf(mr, in->pd, in->length, in->addr); + if (!in->addr) { + free(mr); + return NULL; + } + + mr->alloc_flags |= IBV_EXP_ACCESS_ALLOCATE_MR; + /* + * set the allocated address for the verbs consumer + */ + mr->ibv_mr.addr = in->addr; + } + + /* We should store the ODP type of the MR to avoid + * calling "ibv_dofork_range" when invoking ibv_dereg_mr + */ + if (in->exp_access & IBV_EXP_ACCESS_ON_DEMAND) + mr->type = MLX5_ODP_MR; + + { + struct ibv_exp_reg_mr_resp resp; + + ret = ibv_cmd_exp_reg_mr(in, + (uintptr_t) in->addr, + &(mr->ibv_mr), + &cmd, sizeof(cmd), + &resp, sizeof(resp)); + } + if (ret) { + if ((mr->alloc_flags & IBV_EXP_ACCESS_ALLOCATE_MR)) { + if (mr->buf.type == MLX5_ALLOC_TYPE_CONTIG) + mlx5_free_buf_contig(to_mctx(in->pd->context), + &mr->buf); + else + mlx5_free_buf(&(mr->buf)); + } + free(mr); + return NULL; + } + + return &mr->ibv_mr; +} +struct ibv_mr *mlx5_reg_mr(struct ibv_pd *pd, void *addr, + size_t length, int access) +{ + struct ibv_exp_reg_mr_in in; + + in.pd = pd; + in.addr = addr; + in.length = length; + in.exp_access = access; + in.comp_mask = 0; + + return mlx5_exp_reg_mr(&in); +} +int mlx5_dereg_mr(struct ibv_mr *ibmr) +{ + int ret; + struct mlx5_mr *mr = to_mmr(ibmr); + + if (ibmr->lkey == ODP_GLOBAL_R_LKEY || + ibmr->lkey == ODP_GLOBAL_W_LKEY) { + mlx5_dealloc_whole_addr_mr(ibmr); + return 0; + } + + if (mr->alloc_flags & IBV_EXP_ACCESS_RELAXED) + return 0; + + if (mr->alloc_flags & IBV_EXP_ACCESS_NO_RDMA) + goto free_mr; + + ret = ibv_cmd_dereg_mr(ibmr); + if (ret) + return ret; + +free_mr: + if ((mr->alloc_flags & IBV_EXP_ACCESS_ALLOCATE_MR)) { + if (mr->buf.type == MLX5_ALLOC_TYPE_CONTIG) + mlx5_free_buf_contig(to_mctx(ibmr->context), &mr->buf); + else + mlx5_free_buf(&(mr->buf)); + } + + free(mr); + return 0; +} + +int mlx5_prefetch_mr(struct ibv_mr *mr, struct ibv_exp_prefetch_attr *attr) +{ + + struct mlx5_pd *pd = to_mpd(mr->pd); + + if (attr->comp_mask >= IBV_EXP_PREFETCH_MR_RESERVED) + return EINVAL; + + + switch (mr->lkey) { + case ODP_GLOBAL_R_LKEY: + return mlx5_prefetch_implicit_lkey(pd, &pd->r_ilkey, + (unsigned long)attr->addr, + attr->length, attr->flags); + case ODP_GLOBAL_W_LKEY: + return mlx5_prefetch_implicit_lkey(pd, &pd->w_ilkey, + (unsigned long)attr->addr, + attr->length, attr->flags); + default: + break; + } + + return ibv_cmd_exp_prefetch_mr(mr, attr); +} + +int mlx5_round_up_power_of_two(long long sz) +{ + long long ret; + + for (ret = 1; ret < sz; ret <<= 1) + ; /* nothing */ + + if (ret > INT_MAX) { + fprintf(stderr, "%s: roundup overflow\n", __func__); + return -ENOMEM; + } + + return (int)ret; +} + 
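/*
 * Minimal usage sketch (hypothetical helper, not used by the driver):
 * CQ and work queue depths in this file are sized by rounding the
 * requested count up to the next power of two via
 * mlx5_round_up_power_of_two() above.
 */
static inline int example_queue_depth(int requested)
{
	/* e.g. requested = 100 returns 128; an exact power of two such as
	 * 64 is returned unchanged; on overflow the helper returns -ENOMEM.
	 */
	return mlx5_round_up_power_of_two(requested);
}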
+static int align_queue_size(long long req) +{ + return mlx5_round_up_power_of_two(req); +} + +static int get_cqe_size(struct ibv_context *context) +{ + char env[VERBS_MAX_ENV_VAL]; + struct mlx5_context *ctx = to_mctx(context); + int size = ctx->cache_line_size; + + size = max(size, 64); + size = min(size, 128); + + if (!ibv_exp_cmd_getenv(context, "MLX5_CQE_SIZE", env, sizeof(env))) + size = atoi(env); + + switch (size) { + case 64: + case 128: + return size; + + default: + return -EINVAL; + } +} + +static int rwq_sig_enabled(struct ibv_context *context) +{ + char env[VERBS_MAX_ENV_VAL]; + + if (!ibv_exp_cmd_getenv(context, "MLX5_RWQ_SIGNATURE", env, sizeof(env))) + return 1; + + return 0; +} + +static int srq_sig_enabled(struct ibv_context *context) +{ + char env[VERBS_MAX_ENV_VAL]; + + if (!ibv_exp_cmd_getenv(context, "MLX5_SRQ_SIGNATURE", env, sizeof(env))) + return 1; + + return 0; +} + +static int qp_sig_enabled(struct ibv_context *context) +{ + char env[VERBS_MAX_ENV_VAL]; + + if (!ibv_exp_cmd_getenv(context, "MLX5_QP_SIGNATURE", env, sizeof(env))) + return 1; + + return 0; +} + +enum { + EXP_CREATE_CQ_SUPPORTED_FLAGS = IBV_EXP_CQ_CREATE_CROSS_CHANNEL | + IBV_EXP_CQ_TIMESTAMP +}; + +static struct ibv_cq *create_cq(struct ibv_context *context, + int cqe, + struct ibv_comp_channel *channel, + int comp_vector, + struct ibv_exp_cq_init_attr *attr) +{ + struct mlx5_create_cq cmd; + struct mlx5_exp_create_cq cmd_e; + struct mlx5_create_cq_resp resp; + struct mlx5_cq *cq; + struct mlx5_context *mctx = to_mctx(context); + int cqe_sz; + int ret; + int ncqe; + int thread_safe; +#ifdef MLX5_DEBUG + FILE *fp = mctx->dbg_fp; +#endif + + if (!cqe) { + mlx5_dbg(fp, MLX5_DBG_CQ, "\n"); + errno = EINVAL; + return NULL; + } + + cq = calloc(1, sizeof *cq); + if (!cq) { + mlx5_dbg(fp, MLX5_DBG_CQ, "\n"); + return NULL; + } + + memset(&cmd, 0, sizeof(cmd)); + memset(&cmd_e, 0, sizeof(cmd_e)); + cq->cons_index = 0; + /* wait_index should start at value before 0 */ + cq->wait_index = (uint32_t)(-1); + cq->wait_count = 0; + + cq->pattern = MLX5_CQ_PATTERN; + thread_safe = !mlx5_single_threaded; + if (attr && (attr->comp_mask & IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN)) { + if (!attr->res_domain) { + errno = EINVAL; + goto err; + } + thread_safe = (to_mres_domain(attr->res_domain)->attr.thread_model == IBV_EXP_THREAD_SAFE); + } + if (mlx5_lock_init(&cq->lock, thread_safe, mlx5_get_locktype())) + goto err; + + cq->model_flags = thread_safe ? 
MLX5_CQ_MODEL_FLAG_THREAD_SAFE : 0; + + /* The additional entry is required for resize CQ */ + if (cqe <= 0) { + mlx5_dbg(fp, MLX5_DBG_CQ, "\n"); + errno = EINVAL; + goto err_spl; + } + + ncqe = align_queue_size(cqe + 1); + if ((ncqe > (1 << 24)) || (ncqe < (cqe + 1))) { + mlx5_dbg(fp, MLX5_DBG_CQ, "ncqe %d\n", ncqe); + errno = EINVAL; + goto err_spl; + } + + cqe_sz = get_cqe_size(context); + if (cqe_sz < 0) { + mlx5_dbg(fp, MLX5_DBG_CQ, "\n"); + errno = -cqe_sz; + goto err_spl; + } + + if (mlx5_alloc_cq_buf(mctx, cq, &cq->buf_a, ncqe, cqe_sz)) { + mlx5_dbg(fp, MLX5_DBG_CQ, "\n"); + goto err_spl; + } + + cq->dbrec = mlx5_alloc_dbrec(mctx); + if (!cq->dbrec) { + mlx5_dbg(fp, MLX5_DBG_CQ, "\n"); + goto err_buf; + } + + cq->dbrec[MLX5_CQ_SET_CI] = 0; + cq->dbrec[MLX5_CQ_ARM_DB] = 0; + cq->arm_sn = 0; + cq->cqe_sz = cqe_sz; + + if (attr->comp_mask || mctx->cqe_comp_max_num) { + if (attr->comp_mask & IBV_EXP_CQ_INIT_ATTR_FLAGS && + attr->flags & ~EXP_CREATE_CQ_SUPPORTED_FLAGS) { + mlx5_dbg(fp, MLX5_DBG_CQ, + "Unsupported creation flags requested\n"); + errno = EINVAL; + goto err_db; + } + + cmd_e.buf_addr = (uintptr_t) cq->buf_a.buf; + cmd_e.db_addr = (uintptr_t) cq->dbrec; + cmd_e.cqe_size = cqe_sz; + cmd_e.size_of_prefix = offsetof(struct mlx5_exp_create_cq, + prefix_reserved); + cmd_e.exp_data.comp_mask = MLX5_EXP_CREATE_CQ_MASK_CQE_COMP_EN | + MLX5_EXP_CREATE_CQ_MASK_CQE_COMP_RECV_TYPE; + if (mctx->cqe_comp_max_num) { + cmd_e.exp_data.cqe_comp_en = mctx->enable_cqe_comp ? 1 : 0; + cmd_e.exp_data.cqe_comp_recv_type = MLX5_CQE_FORMAT_HASH; + } + } else { + cmd.buf_addr = (uintptr_t) cq->buf_a.buf; + cmd.db_addr = (uintptr_t) cq->dbrec; + cmd.cqe_size = cqe_sz; + } + + if (attr->comp_mask || cmd_e.exp_data.comp_mask) + ret = ibv_exp_cmd_create_cq(context, ncqe - 1, channel, + comp_vector, &cq->ibv_cq, + &cmd_e.ibv_cmd, + sizeof(cmd_e.ibv_cmd), + sizeof(cmd_e) - sizeof(cmd_e.ibv_cmd), + &resp.ibv_resp, sizeof(resp.ibv_resp), + sizeof(resp) - sizeof(resp.ibv_resp), attr); + else + ret = ibv_cmd_create_cq(context, ncqe - 1, channel, comp_vector, + &cq->ibv_cq, &cmd.ibv_cmd, sizeof cmd, + &resp.ibv_resp, sizeof(resp)); + + if (ret) { + mlx5_dbg(fp, MLX5_DBG_CQ, "ret %d\n", ret); + goto err_db; + } + + if (attr->comp_mask & IBV_EXP_CQ_INIT_ATTR_FLAGS && + attr->flags & IBV_EXP_CQ_TIMESTAMP) + cq->creation_flags |= + MLX5_CQ_CREATION_FLAG_COMPLETION_TIMESTAMP; + + cq->active_buf = &cq->buf_a; + cq->resize_buf = NULL; + cq->cqn = resp.cqn; + cq->stall_enable = mctx->stall_enable; + cq->stall_adaptive_enable = mctx->stall_adaptive_enable; + cq->stall_cycles = mctx->stall_cycles; + cq->cq_log_size = mlx5_ilog2(ncqe); + + return &cq->ibv_cq; + +err_db: + mlx5_free_db(mctx, cq->dbrec); + +err_buf: + mlx5_free_cq_buf(mctx, &cq->buf_a); + +err_spl: + mlx5_lock_destroy(&cq->lock); + +err: + free(cq); + + return NULL; +} + +struct ibv_cq *mlx5_create_cq(struct ibv_context *context, int cqe, + struct ibv_comp_channel *channel, + int comp_vector) +{ + struct ibv_exp_cq_init_attr attr; + + read_init_vars(to_mctx(context)); + attr.comp_mask = 0; + return create_cq(context, cqe, channel, comp_vector, &attr); +} + +struct ibv_cq *mlx5_create_cq_ex(struct ibv_context *context, + int cqe, + struct ibv_comp_channel *channel, + int comp_vector, + struct ibv_exp_cq_init_attr *attr) +{ + return create_cq(context, cqe, channel, comp_vector, attr); +} + +int mlx5_resize_cq(struct ibv_cq *ibcq, int cqe) +{ + struct mlx5_cq *cq = to_mcq(ibcq); + struct mlx5_resize_cq_resp resp; + struct mlx5_resize_cq cmd; + struct 
mlx5_context *mctx = to_mctx(ibcq->context); + int err; + + if (cqe < 0) { + errno = EINVAL; + return errno; + } + + memset(&cmd, 0, sizeof(cmd)); + memset(&resp, 0, sizeof(resp)); + + if (((long long)cqe * 64) > INT_MAX) + return EINVAL; + + mlx5_lock(&cq->lock); + cq->active_cqes = cq->ibv_cq.cqe; + if (cq->active_buf == &cq->buf_a) + cq->resize_buf = &cq->buf_b; + else + cq->resize_buf = &cq->buf_a; + + cqe = align_queue_size(cqe + 1); + if (cqe == ibcq->cqe + 1) { + cq->resize_buf = NULL; + err = 0; + goto out; + } + + /* currently we don't change cqe size */ + cq->resize_cqe_sz = cq->cqe_sz; + cq->resize_cqes = cqe; + err = mlx5_alloc_cq_buf(mctx, cq, cq->resize_buf, cq->resize_cqes, cq->resize_cqe_sz); + if (err) { + cq->resize_buf = NULL; + errno = ENOMEM; + goto out; + } + + cmd.buf_addr = (uintptr_t)cq->resize_buf->buf; + cmd.cqe_size = cq->resize_cqe_sz; + + err = ibv_cmd_resize_cq(ibcq, cqe - 1, &cmd.ibv_cmd, sizeof(cmd), + &resp.ibv_resp, sizeof(resp)); + if (err) + goto out_buf; + + mlx5_cq_resize_copy_cqes(cq); + mlx5_free_cq_buf(mctx, cq->active_buf); + cq->active_buf = cq->resize_buf; + cq->ibv_cq.cqe = cqe - 1; + cq->cq_log_size = mlx5_ilog2(cqe); + mlx5_update_cons_index(cq); + mlx5_unlock(&cq->lock); + cq->resize_buf = NULL; + return 0; + +out_buf: + mlx5_free_cq_buf(mctx, cq->resize_buf); + cq->resize_buf = NULL; + +out: + mlx5_unlock(&cq->lock); + return err; +} + +int mlx5_destroy_cq(struct ibv_cq *cq) +{ + int ret; + + ret = ibv_cmd_destroy_cq(cq); + if (ret) + return ret; + + mlx5_free_db(to_mctx(cq->context), to_mcq(cq)->dbrec); + mlx5_free_cq_buf(to_mctx(cq->context), to_mcq(cq)->active_buf); + free(to_mcq(cq)); + + return 0; +} + +struct ibv_srq *mlx5_create_srq(struct ibv_pd *pd, + struct ibv_srq_init_attr *attr) +{ + struct mlx5_create_srq cmd; + struct mlx5_create_srq_resp resp; + struct mlx5_srq *srq; + int ret; + struct mlx5_context *ctx; + int max_sge; + struct ibv_srq *ibsrq; + + ctx = to_mctx(pd->context); + srq = calloc(1, sizeof *srq); + if (!srq) { + fprintf(stderr, "%s-%d:\n", __func__, __LINE__); + return NULL; + } + ibsrq = (struct ibv_srq *)&srq->vsrq; + srq->is_xsrq = 0; + + memset(&cmd, 0, sizeof cmd); + if (mlx5_spinlock_init(&srq->lock, !mlx5_single_threaded)) { + fprintf(stderr, "%s-%d:\n", __func__, __LINE__); + goto err; + } + + if (attr->attr.max_wr > ctx->max_srq_recv_wr) { + fprintf(stderr, "%s-%d:max_wr %d, max_srq_recv_wr %d\n", __func__, __LINE__, + attr->attr.max_wr, ctx->max_srq_recv_wr); + errno = EINVAL; + goto err; + } + + /* + * this calculation does not consider required control segments. The + * final calculation is done again later. 
This is done so to avoid + * overflows of variables + */ + max_sge = ctx->max_rq_desc_sz / sizeof(struct mlx5_wqe_data_seg); + if (attr->attr.max_sge > max_sge) { + fprintf(stderr, "%s-%d:max_wr %d, max_srq_recv_wr %d\n", __func__, __LINE__, + attr->attr.max_wr, ctx->max_srq_recv_wr); + errno = EINVAL; + goto err; + } + + srq->max = align_queue_size(attr->attr.max_wr + 1); + srq->max_gs = attr->attr.max_sge; + srq->counter = 0; + + if (mlx5_alloc_srq_buf(pd->context, srq)) { + fprintf(stderr, "%s-%d:\n", __func__, __LINE__); + goto err; + } + + srq->db = mlx5_alloc_dbrec(to_mctx(pd->context)); + if (!srq->db) { + fprintf(stderr, "%s-%d:\n", __func__, __LINE__); + goto err_free; + } + + *srq->db = 0; + + cmd.buf_addr = (uintptr_t) srq->buf.buf; + cmd.db_addr = (uintptr_t) srq->db; + srq->wq_sig = srq_sig_enabled(pd->context); + if (srq->wq_sig) + cmd.flags = MLX5_SRQ_FLAG_SIGNATURE; + + attr->attr.max_sge = srq->max_gs; + pthread_mutex_lock(&ctx->srq_table_mutex); + ret = ibv_cmd_create_srq(pd, ibsrq, attr, &cmd.ibv_cmd, sizeof(cmd), + &resp.ibv_resp, sizeof(resp)); + if (ret) + goto err_db; + + ret = mlx5_store_srq(ctx, resp.srqn, srq); + if (ret) + goto err_destroy; + + pthread_mutex_unlock(&ctx->srq_table_mutex); + + srq->srqn = resp.srqn; + srq->rsc.rsn = resp.srqn; + srq->rsc.type = MLX5_RSC_TYPE_SRQ; + + return ibsrq; + +err_destroy: + ibv_cmd_destroy_srq(ibsrq); + +err_db: + pthread_mutex_unlock(&ctx->srq_table_mutex); + mlx5_free_db(to_mctx(pd->context), srq->db); + +err_free: + free(srq->wrid); + mlx5_free_buf(&srq->buf); + +err: + free(srq); + + return NULL; +} + +int mlx5_modify_srq(struct ibv_srq *srq, + struct ibv_srq_attr *attr, + int attr_mask) +{ + struct ibv_modify_srq cmd; + + if (srq->handle == LEGACY_XRC_SRQ_HANDLE) + srq = (struct ibv_srq *)(((struct ibv_srq_legacy *) srq)->ibv_srq); + + return ibv_cmd_modify_srq(srq, attr, attr_mask, &cmd, sizeof cmd); +} + +int mlx5_query_srq(struct ibv_srq *srq, + struct ibv_srq_attr *attr) +{ + struct ibv_query_srq cmd; + if (srq->handle == LEGACY_XRC_SRQ_HANDLE) + srq = (struct ibv_srq *)(((struct ibv_srq_legacy *) srq)->ibv_srq); + + return ibv_cmd_query_srq(srq, attr, &cmd, sizeof cmd); +} + +int mlx5_destroy_srq(struct ibv_srq *srq) +{ + struct ibv_srq *legacy_srq = NULL; + struct mlx5_srq *msrq; + struct mlx5_context *ctx = to_mctx(srq->context); + int ret; + + if (srq->handle == LEGACY_XRC_SRQ_HANDLE) { + legacy_srq = srq; + srq = (struct ibv_srq *)(((struct ibv_srq_legacy *) srq)->ibv_srq); + } + + msrq = to_msrq(srq); + ret = ibv_cmd_destroy_srq(srq); + if (ret) + return ret; + + if (ctx->cqe_version && msrq->is_xsrq) + mlx5_clear_uidx(ctx, msrq->rsc.rsn); + else + mlx5_clear_srq(ctx, msrq->srqn); + + mlx5_free_db(ctx, msrq->db); + mlx5_free_buf(&msrq->buf); + free(msrq->wrid); + free(msrq); + + if (legacy_srq) + free(legacy_srq); + + return 0; +} + +static int sq_overhead(struct ibv_exp_qp_init_attr *attr, struct mlx5_qp *qp, + int *inl_atom) +{ + int size1 = 0; + int size2 = 0; + int atom = 0; + + switch (attr->qp_type) { + case IBV_QPT_RC: + size1 = sizeof(struct mlx5_wqe_ctrl_seg) + + sizeof(struct mlx5_wqe_umr_ctrl_seg) + + sizeof(struct mlx5_mkey_seg) + + sizeof(struct mlx5_seg_repeat_block); + size2 = sizeof(struct mlx5_wqe_ctrl_seg) + + sizeof(struct mlx5_wqe_raddr_seg); + + if (qp->enable_atomics) { + if ((attr->comp_mask & IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG) && + (attr->max_atomic_arg > 4)) + atom = 4 * attr->max_atomic_arg; + /* TBD: change when we support data pointer args */ + if (inl_atom) + *inl_atom = 
max(sizeof(struct mlx5_wqe_atomic_seg), atom); + } + break; + + case IBV_QPT_UC: + size2 = sizeof(struct mlx5_wqe_ctrl_seg) + + sizeof(struct mlx5_wqe_raddr_seg); + break; + + case IBV_QPT_UD: + size1 = sizeof(struct mlx5_wqe_ctrl_seg) + + sizeof(struct mlx5_wqe_umr_ctrl_seg) + + sizeof(struct mlx5_mkey_seg) + + sizeof(struct mlx5_seg_repeat_block); + + size2 = sizeof(struct mlx5_wqe_ctrl_seg) + + sizeof(struct mlx5_wqe_datagram_seg); + break; + + case IBV_QPT_XRC: + case IBV_QPT_XRC_SEND: + case IBV_QPT_XRC_RECV: + size2 = sizeof(struct mlx5_wqe_ctrl_seg) + + sizeof(struct mlx5_wqe_xrc_seg) + + sizeof(struct mlx5_wqe_raddr_seg); + if (qp->enable_atomics) { + if ((attr->comp_mask & IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG) && + (attr->max_atomic_arg > 4)) + atom = 4 * attr->max_atomic_arg; + /* TBD: change when we support data pointer args */ + if (inl_atom) + *inl_atom = max(sizeof(struct mlx5_wqe_atomic_seg), atom); + } + break; + + case IBV_EXP_QPT_DC_INI: + size1 = sizeof(struct mlx5_wqe_ctrl_seg) + + sizeof(struct mlx5_wqe_umr_ctrl_seg) + + sizeof(struct mlx5_mkey_seg) + + sizeof(struct mlx5_seg_repeat_block); + + size2 = sizeof(struct mlx5_wqe_ctrl_seg) + + sizeof(struct mlx5_wqe_datagram_seg) + + sizeof(struct mlx5_wqe_raddr_seg); + if (qp->enable_atomics) { + if ((attr->comp_mask & IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG) && + (attr->max_atomic_arg > 4)) + atom = 4 * attr->max_atomic_arg; + /* TBD: change when we support data pointer args */ + if (inl_atom) + *inl_atom = max(sizeof(struct mlx5_wqe_atomic_seg), atom); + } + break; + + case IBV_QPT_RAW_ETH: + size2 = sizeof(struct mlx5_wqe_ctrl_seg) + + sizeof(struct mlx5_wqe_eth_seg); + break; + + default: + return -EINVAL; + } + + if (qp->umr_en) + return max(size1, size2); + else + return size2; +} + +static int mlx5_max4(int t1, int t2, int t3, int t4) +{ + if (t1 < t2) + t1 = t2; + + if (t1 < t3) + t1 = t3; + + if (t1 < t4) + return t4; + + return t1; +} + +static int mlx5_calc_send_wqe(struct mlx5_context *ctx, + struct ibv_exp_qp_init_attr *attr, + struct mlx5_qp *qp) +{ + int inl_size = 0; + int max_gather; + int tot_size; + int overhead; + int inl_umr = 0; + int inl_atom = 0; + int t1 = 0; + int t2 = 0; + int t3 = 0; + int t4 = 0; + + overhead = sq_overhead(attr, qp, &inl_atom); + if (overhead < 0) + return overhead; + + if ((attr->comp_mask & IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG)) + qp->max_atomic_arg = attr->max_atomic_arg; + if ((attr->comp_mask & IBV_EXP_QP_INIT_ATTR_MAX_INL_KLMS) && + attr->max_inl_send_klms) + inl_umr = attr->max_inl_send_klms * 16; + + if (attr->cap.max_inline_data) { + inl_size = align(sizeof(struct mlx5_wqe_inl_data_seg) + + attr->cap.max_inline_data, 16); + } + + max_gather = (ctx->max_sq_desc_sz - overhead) / + sizeof(struct mlx5_wqe_data_seg); + if (attr->cap.max_send_sge > max_gather) + return -EINVAL; + + if (inl_atom) + t1 = overhead + sizeof(struct mlx5_wqe_data_seg) + inl_atom; + + t2 = overhead + attr->cap.max_send_sge * sizeof(struct mlx5_wqe_data_seg); + + t3 = overhead + inl_umr; + t4 = overhead + inl_size; + + tot_size = mlx5_max4(t1, t2, t3, t4); + + if (tot_size > ctx->max_sq_desc_sz) + return -EINVAL; + + return align(tot_size, MLX5_SEND_WQE_BB); +} + +static int mlx5_calc_rcv_wqe(struct mlx5_context *ctx, + struct ibv_exp_qp_init_attr *attr, + struct mlx5_qp *qp) +{ + int size; + int num_scatter; + + if (attr->srq) + return 0; + + num_scatter = max(attr->cap.max_recv_sge, 1); + size = sizeof(struct mlx5_wqe_data_seg) * num_scatter; + if (qp->ctrl_seg.wq_sig) + size += sizeof(struct mlx5_rwqe_sig); + + 
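/*
 * Worked example (illustrative, assuming the usual 16-byte
 * mlx5_wqe_data_seg): with max_recv_sge = 3 and no receive WQ signature
 * this gives 3 * 16 = 48 bytes, which the power-of-two round-up below
 * turns into a 64-byte receive WQE.
 */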
if (size < 0 || size > ctx->max_rq_desc_sz) + return -EINVAL; + + size = mlx5_round_up_power_of_two(size); + + return size; +} + +static int get_send_sge(struct ibv_exp_qp_init_attr *attr, int wqe_size, struct mlx5_qp *qp) +{ + int max_sge; + int overhead = sq_overhead(attr, qp, NULL); + + if (attr->qp_type == IBV_QPT_RC) + max_sge = (min(wqe_size, 512) - + sizeof(struct mlx5_wqe_ctrl_seg) - + sizeof(struct mlx5_wqe_raddr_seg)) / + sizeof(struct mlx5_wqe_data_seg); + else if (attr->qp_type == IBV_EXP_QPT_DC_INI) + max_sge = (min(wqe_size, 512) - + sizeof(struct mlx5_wqe_ctrl_seg) - + sizeof(struct mlx5_wqe_datagram_seg) - + sizeof(struct mlx5_wqe_raddr_seg)) / + sizeof(struct mlx5_wqe_data_seg); + else if (attr->qp_type == IBV_QPT_XRC) + max_sge = (min(wqe_size, 512) - + sizeof(struct mlx5_wqe_ctrl_seg) - + sizeof(struct mlx5_wqe_xrc_seg) - + sizeof(struct mlx5_wqe_raddr_seg)) / + sizeof(struct mlx5_wqe_data_seg); + else + max_sge = (wqe_size - overhead) / + sizeof(struct mlx5_wqe_data_seg); + + return min(max_sge, wqe_size - overhead / + sizeof(struct mlx5_wqe_data_seg)); +} + +static int mlx5_calc_sq_size(struct mlx5_context *ctx, + struct ibv_exp_qp_init_attr *attr, + struct mlx5_qp *qp) +{ + int wqe_size; + int wq_size; +#ifdef MLX5_DEBUG + FILE *fp = ctx->dbg_fp; +#endif + + if (!attr->cap.max_send_wr) + return 0; + + wqe_size = mlx5_calc_send_wqe(ctx, attr, qp); + if (wqe_size < 0) { + mlx5_dbg(fp, MLX5_DBG_QP, "\n"); + return wqe_size; + } + + if (attr->qp_type == IBV_EXP_QPT_DC_INI && + wqe_size > ctx->max_desc_sz_sq_dc) { + mlx5_dbg(fp, MLX5_DBG_QP, "\n"); + return -EINVAL; + } else if (wqe_size > ctx->max_sq_desc_sz) { + mlx5_dbg(fp, MLX5_DBG_QP, "\n"); + return -EINVAL; + } + + qp->data_seg.max_inline_data = wqe_size - sq_overhead(attr, qp, NULL) - + sizeof(struct mlx5_wqe_inl_data_seg); + attr->cap.max_inline_data = qp->data_seg.max_inline_data; + + /* + * to avoid overflow, we limit max_send_wr so + * that the multiplication will fit in int + */ + if (attr->cap.max_send_wr > 0x7fffffff / ctx->max_sq_desc_sz) { + mlx5_dbg(fp, MLX5_DBG_QP, "\n"); + return -ENOMEM; + } + + wq_size = mlx5_round_up_power_of_two(attr->cap.max_send_wr * wqe_size); + qp->sq.wqe_cnt = wq_size / MLX5_SEND_WQE_BB; + if (qp->sq.wqe_cnt > ctx->max_send_wqebb) { + mlx5_dbg(fp, MLX5_DBG_QP, "\n"); + return -ENOMEM; + } + + qp->sq.wqe_shift = mlx5_ilog2(MLX5_SEND_WQE_BB); + qp->sq.max_gs = get_send_sge(attr, wqe_size, qp); + if (qp->sq.max_gs < attr->cap.max_send_sge) + return -ENOMEM; + + attr->cap.max_send_sge = qp->sq.max_gs; + if (qp->umr_en) { + qp->max_inl_send_klms = ((attr->qp_type == IBV_QPT_RC) || + (attr->qp_type == IBV_EXP_QPT_DC_INI)) ? 
+ attr->max_inl_send_klms : 0; + attr->max_inl_send_klms = qp->max_inl_send_klms; + } + qp->sq.max_post = wq_size / wqe_size; + + return wq_size; +} + +static int qpt_has_rq(enum ibv_qp_type qpt) +{ + switch (qpt) { + case IBV_QPT_RC: + case IBV_QPT_UC: + case IBV_QPT_UD: + case IBV_QPT_RAW_ETH: + return 1; + + case IBV_QPT_XRC: + case IBV_QPT_XRC_SEND: + case IBV_QPT_XRC_RECV: + case IBV_EXP_QPT_DC_INI: + return 0; + } + return 0; +} + +static int mlx5_calc_rwq_size(struct mlx5_context *ctx, + struct mlx5_rwq *rwq, + struct ibv_exp_wq_init_attr *attr) +{ + int wqe_size; + int wq_size; + int num_scatter; + int scat_spc; + int mp_rq = !!(attr->comp_mask & IBV_EXP_CREATE_WQ_MP_RQ); + + if (!attr->max_recv_wr) + return -EINVAL; + + /* TBD: check caps for RQ */ + num_scatter = max(attr->max_recv_sge, 1); + wqe_size = sizeof(struct mlx5_wqe_data_seg) * num_scatter + + /* In case of mp_rq the WQE format is like SRQ. + * Need to add the extra octword even when we don't + * use linked list. + */ + (mp_rq ? sizeof(struct mlx5_wqe_srq_next_seg) : 0); + + if (rwq->wq_sig) + wqe_size += sizeof(struct mlx5_rwqe_sig); + + if (wqe_size <= 0 || wqe_size > ctx->max_rq_desc_sz) + return -EINVAL; + + wqe_size = mlx5_round_up_power_of_two(wqe_size); + wq_size = mlx5_round_up_power_of_two(attr->max_recv_wr) * wqe_size; + wq_size = max(wq_size, MLX5_SEND_WQE_BB); + rwq->rq.wqe_cnt = wq_size / wqe_size; + rwq->rq.wqe_shift = mlx5_ilog2(wqe_size); + rwq->rq.max_post = 1 << mlx5_ilog2(wq_size / wqe_size); + scat_spc = wqe_size - + ((rwq->wq_sig) ? sizeof(struct mlx5_rwqe_sig) : 0) - + (mp_rq ? sizeof(struct mlx5_wqe_srq_next_seg) : 0); + rwq->rq.max_gs = scat_spc / sizeof(struct mlx5_wqe_data_seg); + return wq_size; +} + +static int mlx5_calc_rq_size(struct mlx5_context *ctx, + struct ibv_exp_qp_init_attr *attr, + struct mlx5_qp *qp) +{ + int wqe_size; + int wq_size; + int scat_spc; +#ifdef MLX5_DEBUG + FILE *fp = ctx->dbg_fp; +#endif + + if (!attr->cap.max_recv_wr || !qpt_has_rq(attr->qp_type)) + return 0; + + if (attr->cap.max_recv_wr > ctx->max_recv_wr) { + mlx5_dbg(fp, MLX5_DBG_QP, "\n"); + return -EINVAL; + } + + wqe_size = mlx5_calc_rcv_wqe(ctx, attr, qp); + if (wqe_size < 0 || wqe_size > ctx->max_rq_desc_sz) { + mlx5_dbg(fp, MLX5_DBG_QP, "\n"); + return -EINVAL; + } + + wq_size = mlx5_round_up_power_of_two(attr->cap.max_recv_wr) * wqe_size; + if (wqe_size) { + wq_size = max(wq_size, MLX5_SEND_WQE_BB); + qp->rq.wqe_cnt = wq_size / wqe_size; + qp->rq.wqe_shift = mlx5_ilog2(wqe_size); + qp->rq.max_post = 1 << mlx5_ilog2(wq_size / wqe_size); + scat_spc = wqe_size - + ((qp->ctrl_seg.wq_sig) ? 
sizeof(struct mlx5_rwqe_sig) : 0); + qp->rq.max_gs = scat_spc / sizeof(struct mlx5_wqe_data_seg); + } else { + qp->rq.wqe_cnt = 0; + qp->rq.wqe_shift = 0; + qp->rq.max_post = 0; + qp->rq.max_gs = 0; + } + return wq_size; +} + +static int mlx5_calc_wq_size(struct mlx5_context *ctx, + struct ibv_exp_qp_init_attr *attr, + struct mlx5_qp *qp) +{ + int ret; + int result; + + ret = mlx5_calc_sq_size(ctx, attr, qp); + if (ret < 0) + return ret; + + result = ret; + ret = mlx5_calc_rq_size(ctx, attr, qp); + if (ret < 0) + return ret; + + result += ret; + + qp->sq.offset = ret; + qp->rq.offset = 0; + + return result; +} + +static void map_uuar(struct ibv_context *context, struct mlx5_qp *qp, + int uuar_index) +{ + struct mlx5_context *ctx = to_mctx(context); + + qp->gen_data.bf = &ctx->bfs[uuar_index]; +} + +static const char *qptype2key(enum ibv_qp_type type) +{ + switch (type) { + case IBV_QPT_RC: return "HUGE_RC"; + case IBV_QPT_UC: return "HUGE_UC"; + case IBV_QPT_UD: return "HUGE_UD"; +#ifdef _NOT_EXISTS_IN_OFED_2_0 + case IBV_QPT_RAW_PACKET: return "HUGE_RAW_ETH"; +#endif + default: return "HUGE_NA"; + } +} + +static void mlx5_free_rwq_buf(struct mlx5_rwq *rwq, struct ibv_context *context) +{ + struct mlx5_context *ctx = to_mctx(context); + + mlx5_free_actual_buf(ctx, &rwq->buf); + if (rwq->consumed_strides_counter) + free(rwq->consumed_strides_counter); + + free(rwq->rq.wrid); +} + +static int mlx5_alloc_rwq_buf(struct ibv_context *context, + struct mlx5_rwq *rwq, + int size, + enum mlx5_rsc_type rsc_type) +{ + int err; + enum mlx5_alloc_type default_alloc_type = MLX5_ALLOC_TYPE_PREFER_CONTIG; + + rwq->rq.wrid = malloc(rwq->rq.wqe_cnt * sizeof(uint64_t)); + if (!rwq->rq.wrid) { + errno = ENOMEM; + return -1; + } + + if (rsc_type == MLX5_RSC_TYPE_MP_RWQ) { + rwq->consumed_strides_counter = calloc(1, rwq->rq.wqe_cnt * sizeof(uint32_t)); + if (!rwq->consumed_strides_counter) { + errno = ENOMEM; + goto free_wr_id; + } + } + + rwq->buf.numa_req.valid = 1; + rwq->buf.numa_req.numa_id = to_mctx(context)->numa_id; + err = mlx5_alloc_prefered_buf(to_mctx(context), &rwq->buf, + align(rwq->buf_size, to_mdev + (context->device)->page_size), + to_mdev(context->device)->page_size, + default_alloc_type, + MLX5_RWQ_PREFIX); + + if (err) { + errno = ENOMEM; + goto free_strd_cnt; + } + + return 0; + +free_strd_cnt: + if (rwq->consumed_strides_counter) + free(rwq->consumed_strides_counter); + +free_wr_id: + free(rwq->rq.wrid); + + return -1; +} +static int mlx5_alloc_qp_buf(struct ibv_context *context, + struct ibv_exp_qp_init_attr *attr, + struct mlx5_qp *qp, + int size) +{ + int err; + enum mlx5_alloc_type alloc_type; + enum mlx5_alloc_type default_alloc_type = MLX5_ALLOC_TYPE_PREFER_CONTIG; + const char *qp_huge_key; + + if (qp->sq.wqe_cnt) { + qp->sq.wrid = malloc(qp->sq.wqe_cnt * sizeof(*qp->sq.wrid)); + if (!qp->sq.wrid) { + errno = ENOMEM; + err = -1; + } + } + + qp->gen_data.wqe_head = malloc(qp->sq.wqe_cnt * sizeof(*qp->gen_data.wqe_head)); + if (!qp->gen_data.wqe_head) { + errno = ENOMEM; + err = -1; + goto ex_wrid; + } + + if (qp->rq.wqe_cnt) { + qp->rq.wrid = malloc(qp->rq.wqe_cnt * sizeof(uint64_t)); + if (!qp->rq.wrid) { + errno = ENOMEM; + err = -1; + goto ex_wrid; + } + } + + /* compatability support */ + qp_huge_key = qptype2key(qp->verbs_qp.qp.qp_type); + if (mlx5_use_huge(context, qp_huge_key)) + default_alloc_type = MLX5_ALLOC_TYPE_HUGE; + + mlx5_get_alloc_type(context, MLX5_QP_PREFIX, &alloc_type, + default_alloc_type); + + qp->buf.numa_req.valid = 1; + qp->buf.numa_req.numa_id = 
to_mctx(context)->numa_id; + err = mlx5_alloc_prefered_buf(to_mctx(context), &qp->buf, + align(qp->buf_size, to_mdev + (context->device)->page_size), + to_mdev(context->device)->page_size, + alloc_type, + MLX5_QP_PREFIX); + + if (err) { + err = -ENOMEM; + goto ex_wrid; + } + + memset(qp->buf.buf, 0, qp->buf_size); + + if (attr->qp_type == IBV_QPT_RAW_ETH) { + /* For Raw Ethernet QP, allocate a separate buffer for the SQ */ + err = mlx5_alloc_prefered_buf(to_mctx(context), &qp->sq_buf, + align(qp->sq_buf_size, to_mdev + (context->device)->page_size), + to_mdev(context->device)->page_size, + alloc_type, + MLX5_QP_PREFIX); + if (err) { + err = -ENOMEM; + goto rq_buf; + } + + memset(qp->sq_buf.buf, 0, qp->buf_size - qp->sq.offset); + } + + return 0; +rq_buf: + mlx5_free_actual_buf(to_mctx(qp->verbs_qp.qp.context), &qp->buf); +ex_wrid: + if (qp->rq.wrid) + free(qp->rq.wrid); + + if (qp->gen_data.wqe_head) + free(qp->gen_data.wqe_head); + + if (qp->sq.wrid) + free(qp->sq.wrid); + + return err; +} + +static void mlx5_free_qp_buf(struct mlx5_qp *qp) +{ + struct mlx5_context *ctx = to_mctx(qp->verbs_qp.qp.context); + + mlx5_free_actual_buf(ctx, &qp->buf); + + if (qp->sq_buf.buf) + mlx5_free_actual_buf(ctx, &qp->sq_buf); + + if (qp->rq.wrid) + free(qp->rq.wrid); + + if (qp->gen_data.wqe_head) + free(qp->gen_data.wqe_head); + + if (qp->sq.wrid) + free(qp->sq.wrid); +} + +static void update_caps(struct ibv_context *context) +{ + struct mlx5_context *ctx; + struct ibv_exp_device_attr attr; + int err; + + ctx = to_mctx(context); + if (ctx->info.valid) + return; + + attr.comp_mask = IBV_EXP_DEVICE_ATTR_RESERVED - 1; + err = ibv_exp_query_device(context, &attr); + if (err) + return; + + ctx->info.exp_atomic_cap = attr.exp_atomic_cap; + ctx->info.valid = 1; + ctx->max_sge = attr.max_sge; + if (attr.comp_mask & IBV_EXP_DEVICE_ATTR_UMR) + ctx->max_send_wqe_inline_klms = + attr.umr_caps.max_send_wqe_inline_klms; + if (attr.comp_mask & IBV_EXP_DEVICE_ATTR_EXT_ATOMIC_ARGS) + ctx->info.bit_mask_log_atomic_arg_sizes = + attr.ext_atom.log_atomic_arg_sizes; + + return; +} + +static inline int is_xrc_tgt(int type) +{ + return (type == IBV_QPT_XRC_RECV); +} + +static struct ibv_qp *create_qp(struct ibv_context *context, + struct ibv_exp_qp_init_attr *attrx, + int is_exp) +{ + struct mlx5_create_qp cmd; + struct mlx5_create_qp_resp resp; + struct mlx5_exp_create_qp cmdx; + struct mlx5_exp_create_qp_resp respx; + struct mlx5_qp *qp; + int ret; + struct mlx5_context *ctx = to_mctx(context); + struct ibv_qp *ibqp; + struct mlx5_drv_create_qp *drv; + struct mlx5_exp_drv_create_qp *drvx; + int lib_cmd_size; + int drv_cmd_size; + int lib_resp_size; + int drv_resp_size; + int thread_safe = !mlx5_single_threaded; + void *_cmd; + void *_resp; +#ifdef MLX5_DEBUG + FILE *fp = ctx->dbg_fp; +#endif + + /* Use experimental path when driver pass experimental data */ + is_exp = is_exp || (ctx->cqe_version != 0) || + (attrx->qp_type == IBV_QPT_RAW_ETH); + + update_caps(context); + qp = calloc(1, sizeof(*qp)); + if (!qp) { + mlx5_dbg(fp, MLX5_DBG_QP, "\n"); + return NULL; + } + ibqp = (struct ibv_qp *)&qp->verbs_qp; + + if (is_exp) { + memset(&cmdx, 0, sizeof(cmdx)); + memset(&respx, 0, sizeof(respx)); + drv = (struct mlx5_drv_create_qp *)(void *)(&cmdx.drv); + drvx = &cmdx.drv; + drvx->size_of_prefix = offsetof(struct mlx5_exp_drv_create_qp, prefix_reserved); + _cmd = &cmdx.ibv_cmd; + _resp = &respx.ibv_resp; + lib_cmd_size = sizeof(cmdx.ibv_cmd); + drv_cmd_size = sizeof(*drvx); + lib_resp_size = sizeof(respx.ibv_resp); + 
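/*
+ * drv_cmd_size/drv_resp_size cover only the driver-private part of the
+ * experimental command/response, i.e. whatever follows the generic
+ * ibverbs header in the mlx5_exp_* structures.
+ */
+ 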
drv_resp_size = sizeof(respx) - sizeof(respx.ibv_resp); + } else { + memset(&cmd, 0, sizeof(cmd)); + drv = &cmd.drv; + _cmd = &cmd.ibv_cmd; + _resp = &resp.ibv_resp; + lib_cmd_size = sizeof(cmd.ibv_cmd); + drv_cmd_size = sizeof(*drv); + lib_resp_size = sizeof(resp.ibv_resp); + drv_resp_size = sizeof(resp) - sizeof(resp.ibv_resp); + } + + if ((attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_RX_HASH) && attrx->qp_type == IBV_QPT_RAW_ETH) { + if (attrx->send_cq || attrx->recv_cq || attrx->srq || + attrx->cap.max_inline_data || attrx->cap.max_recv_sge || + attrx->cap.max_recv_wr || attrx->cap.max_send_sge || + attrx->cap.max_send_wr) { + errno = EINVAL; + goto err; + } + + ret = ibv_exp_cmd_create_qp(context, &qp->verbs_qp, + sizeof(qp->verbs_qp), + attrx, + _cmd, + lib_cmd_size, + 0, + _resp, + lib_resp_size, + 0, 1); + if (ret) + goto err; + + qp->rx_qp = 1; + return ibqp; + } + + qp->ctrl_seg.wq_sig = qp_sig_enabled(context); + if (qp->ctrl_seg.wq_sig) + drv->flags |= MLX5_QP_FLAG_SIGNATURE; + + if ((ctx->info.exp_atomic_cap == IBV_EXP_ATOMIC_HCA_REPLY_BE) && + (attrx->exp_create_flags & IBV_EXP_QP_CREATE_ATOMIC_BE_REPLY)) { + qp->enable_atomics = 1; + } else if ((ctx->info.exp_atomic_cap == IBV_EXP_ATOMIC_HCA) || + (ctx->info.exp_atomic_cap == IBV_EXP_ATOMIC_GLOB)) { + qp->enable_atomics = 1; + } + + if ((attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_MAX_INL_KLMS) && + (!(attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_CREATE_FLAGS) || + !(attrx->exp_create_flags & IBV_EXP_QP_CREATE_UMR))) { + errno = EINVAL; + goto err; + } + + if ((attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_CREATE_FLAGS) && + (attrx->exp_create_flags & IBV_EXP_QP_CREATE_UMR) && + !(attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_MAX_INL_KLMS)) { + errno = EINVAL; + goto err; + } + + if ((attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_CREATE_FLAGS) && + (attrx->exp_create_flags & IBV_EXP_QP_CREATE_UMR)) + qp->umr_en = 1; + + if (attrx->cap.max_send_sge > ctx->max_sge) { + errno = EINVAL; + goto err; + } + + if (qp->umr_en && (attrx->max_inl_send_klms > + ctx->max_send_wqe_inline_klms)) { + errno = EINVAL; + goto err; + } + + ret = mlx5_calc_wq_size(ctx, attrx, qp); + if (ret < 0) { + errno = -ret; + goto err; + } + + if (attrx->qp_type == IBV_QPT_RAW_ETH) { + qp->buf_size = qp->sq.offset; + qp->sq_buf_size = ret - qp->buf_size; + } else { + qp->buf_size = ret; + qp->sq_buf_size = 0; + } + + if (attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_CREATE_FLAGS) + qp->gen_data.create_flags = attrx->exp_create_flags & IBV_EXP_QP_CREATE_MASK; + + if (mlx5_alloc_qp_buf(context, attrx, qp, ret)) { + mlx5_dbg(fp, MLX5_DBG_QP, "\n"); + goto err; + } + + if (attrx->qp_type == IBV_QPT_RAW_ETH) { + qp->gen_data.sqstart = qp->sq_buf.buf; + qp->gen_data.sqend = qp->sq_buf.buf + + (qp->sq.wqe_cnt << qp->sq.wqe_shift); + } else { + qp->gen_data.sqstart = qp->buf.buf + qp->sq.offset; + qp->gen_data.sqend = qp->buf.buf + qp->sq.offset + + (qp->sq.wqe_cnt << qp->sq.wqe_shift); + } + qp->odp_data.pd = to_mpd(attrx->pd); + + mlx5_init_qp_indices(qp); + + /* Check if UAR provided by resource domain */ + if (attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_RES_DOMAIN) { + struct mlx5_res_domain *res_domain = to_mres_domain(attrx->res_domain); + + drvx->exp.comp_mask |= MLX5_EXP_CREATE_QP_MASK_WC_UAR_IDX; + if (res_domain->send_db) { + drvx->exp.wc_uar_index = res_domain->send_db->wc_uar->uar_idx; + qp->gen_data.bf = &res_domain->send_db->bf; + } else { + /* If we didn't allocate dedicated BF for this resource + * domain we'll ask the kernel to provide UUAR that uses + * DB only (no BF) + */ + 
drvx->exp.wc_uar_index = MLX5_EXP_CREATE_QP_DB_ONLY_UUAR; + } + thread_safe = (res_domain->attr.thread_model == IBV_EXP_THREAD_SAFE); + } + if (mlx5_lock_init(&qp->sq.lock, thread_safe, mlx5_get_locktype()) || + mlx5_lock_init(&qp->rq.lock, thread_safe, mlx5_get_locktype())) + goto err_free_qp_buf; + qp->gen_data.model_flags = thread_safe ? MLX5_QP_MODEL_FLAG_THREAD_SAFE : 0; + + qp->gen_data.db = mlx5_alloc_dbrec(ctx); + if (!qp->gen_data.db) { + mlx5_dbg(fp, MLX5_DBG_QP, "\n"); + goto err_free_qp_buf; + } + + qp->gen_data.db[MLX5_RCV_DBR] = 0; + qp->gen_data.db[MLX5_SND_DBR] = 0; + qp->rq.buff = qp->buf.buf + qp->rq.offset; + qp->sq.buff = qp->buf.buf + qp->sq.offset; + qp->rq.db = &qp->gen_data.db[MLX5_RCV_DBR]; + qp->sq.db = &qp->gen_data.db[MLX5_SND_DBR]; + + drv->buf_addr = (uintptr_t) qp->buf.buf; + if (attrx->qp_type == IBV_QPT_RAW_ETH) { + drvx->exp.sq_buf_addr = (uintptr_t)qp->sq_buf.buf; + drvx->exp.flags |= MLX5_EXP_CREATE_QP_MULTI_PACKET_WQE_REQ_FLAG; + drvx->exp.comp_mask |= MLX5_EXP_CREATE_QP_MASK_SQ_BUFF_ADD | + MLX5_EXP_CREATE_QP_MASK_FLAGS_IDX; + } + drv->db_addr = (uintptr_t) qp->gen_data.db; + drv->sq_wqe_count = qp->sq.wqe_cnt; + drv->rq_wqe_count = qp->rq.wqe_cnt; + drv->rq_wqe_shift = qp->rq.wqe_shift; + if (!ctx->cqe_version) { + pthread_mutex_lock(&ctx->rsc_table_mutex); + } else if (!is_xrc_tgt(attrx->qp_type)) { + drvx->exp.uidx = mlx5_store_uidx(ctx, qp); + if (drvx->exp.uidx < 0) { + mlx5_dbg(fp, MLX5_DBG_QP, "Couldn't find free user index\n"); + goto err_rq_db; + } + drvx->exp.comp_mask |= MLX5_EXP_CREATE_QP_MASK_UIDX; + } + + ret = ibv_exp_cmd_create_qp(context, &qp->verbs_qp, + sizeof(qp->verbs_qp), + attrx, + _cmd, + lib_cmd_size, + drv_cmd_size, + _resp, + lib_resp_size, + drv_resp_size, + /* Force experimental */ + is_exp); + if (ret) { + mlx5_dbg(fp, MLX5_DBG_QP, "ret %d\n", ret); + goto err_free_uidx; + } + + if (!ctx->cqe_version) { + ret = mlx5_store_rsc(ctx, ibqp->qp_num, qp); + if (ret) { + mlx5_dbg(fp, MLX5_DBG_QP, "ret %d\n", ret); + goto err_destroy; + } + pthread_mutex_unlock(&ctx->rsc_table_mutex); + } + + /* Update related BF mapping when uuar not provided by resource domain */ + if (!(attrx->comp_mask & IBV_EXP_QP_INIT_ATTR_RES_DOMAIN) || + !to_mres_domain(attrx->res_domain)->send_db) { + if (is_exp) + map_uuar(context, qp, respx.uuar_index); + else + map_uuar(context, qp, resp.uuar_index); + } + qp->gen_data_warm.pattern = MLX5_QP_PATTERN; + + qp->rq.max_post = qp->rq.wqe_cnt; + if (attrx->sq_sig_all) + qp->sq_signal_bits = MLX5_WQE_CTRL_CQ_UPDATE; + else + qp->sq_signal_bits = 0; + + attrx->cap.max_send_wr = qp->sq.max_post; + attrx->cap.max_recv_wr = qp->rq.max_post; + attrx->cap.max_recv_sge = qp->rq.max_gs; + qp->rsc.type = MLX5_RSC_TYPE_QP; + if (is_exp && (drvx->exp.comp_mask & MLX5_EXP_CREATE_QP_MASK_UIDX)) + qp->rsc.rsn = drvx->exp.uidx; + else + qp->rsc.rsn = ibqp->qp_num; + + if (is_exp && (respx.exp.comp_mask & MLX5_EXP_CREATE_QP_RESP_MASK_FLAGS_IDX) && + (respx.exp.flags & MLX5_EXP_CREATE_QP_RESP_MULTI_PACKET_WQE_FLAG)) + qp->gen_data.model_flags |= MLX5_QP_MODEL_MULTI_PACKET_WQE; + + mlx5_build_ctrl_seg_data(qp, ibqp->qp_num); + qp->gen_data_warm.qp_type = ibqp->qp_type; + mlx5_update_post_send_one(qp, ibqp->state, ibqp->qp_type); + + return ibqp; + +err_destroy: + ibv_cmd_destroy_qp(ibqp); +err_free_uidx: + if (!ctx->cqe_version) + pthread_mutex_unlock(&to_mctx(context)->rsc_table_mutex); + else if (!is_xrc_tgt(attrx->qp_type)) + mlx5_clear_uidx(ctx, drvx->exp.uidx); +err_rq_db: + mlx5_free_db(to_mctx(context), 
qp->gen_data.db); + +err_free_qp_buf: + mlx5_free_qp_buf(qp); +err: + free(qp); + + return NULL; +} + +struct ibv_qp *mlx5_drv_create_qp(struct ibv_context *context, + struct ibv_qp_init_attr_ex *attrx) +{ + if (attrx->comp_mask >= IBV_QP_INIT_ATTR_RESERVED) { + errno = EINVAL; + return NULL; + } + + return create_qp(context, (struct ibv_exp_qp_init_attr *)attrx, 1); +} + +struct ibv_qp *mlx5_exp_create_qp(struct ibv_context *context, + struct ibv_exp_qp_init_attr *attrx) +{ + return create_qp(context, attrx, 1); +} + +struct ibv_qp *mlx5_create_qp(struct ibv_pd *pd, + struct ibv_qp_init_attr *attr) +{ + struct ibv_exp_qp_init_attr attrx; + struct ibv_qp *qp; + int copy_sz = offsetof(struct ibv_qp_init_attr, xrc_domain); + + memset(&attrx, 0, sizeof(attrx)); + memcpy(&attrx, attr, copy_sz); + attrx.comp_mask = IBV_QP_INIT_ATTR_PD; + attrx.pd = pd; + qp = create_qp(pd->context, &attrx, 0); + if (qp) + memcpy(attr, &attrx, copy_sz); + + return qp; +} + +struct ibv_exp_rwq_ind_table *mlx5_exp_create_rwq_ind_table(struct ibv_context *context, + struct ibv_exp_rwq_ind_table_init_attr *init_attr) +{ + struct ibv_exp_create_rwq_ind_table *cmd; + struct mlx5_exp_create_rwq_ind_table_resp resp; + struct ibv_exp_rwq_ind_table *ind_table; + uint32_t required_tbl_size; + int num_tbl_entries; + int cmd_size; + int err; + + num_tbl_entries = 1 << init_attr->log_ind_tbl_size; + /* Data must be u64 aligned */ + required_tbl_size = (num_tbl_entries * sizeof(uint32_t)) < sizeof(uint64_t) ? + sizeof(uint64_t) : (num_tbl_entries * sizeof(uint32_t)); + + cmd_size = required_tbl_size + sizeof(*cmd); + cmd = calloc(1, cmd_size); + if (!cmd) + return NULL; + memset(&resp, 0, sizeof(resp)); + + ind_table = calloc(1, sizeof(*ind_table)); + if (!ind_table) + goto free_cmd; + + err = ibv_exp_cmd_create_rwq_ind_table(context, init_attr, ind_table, cmd, + cmd_size, cmd_size, &resp.ibv_resp, sizeof(resp.ibv_resp), + sizeof(resp)); + if (err) + goto err; + + free(cmd); + return ind_table; + +err: + free(ind_table); +free_cmd: + free(cmd); + return NULL; +} + +int mlx5_exp_destroy_rwq_ind_table(struct ibv_exp_rwq_ind_table *rwq_ind_table) +{ + struct mlx5_exp_destroy_rwq_ind_table cmd; + int ret; + + memset(&cmd, 0, sizeof(cmd)); + ret = ibv_exp_cmd_destroy_rwq_ind_table(rwq_ind_table); + + if (ret) + return ret; + + free(rwq_ind_table); + return 0; +} + +struct ibv_exp_wq *mlx5_exp_create_wq(struct ibv_context *context, + struct ibv_exp_wq_init_attr *attr) +{ + struct mlx5_exp_create_wq cmd; + struct mlx5_exp_create_wq_resp resp; + int err; + struct mlx5_rwq *rwq; + struct mlx5_context *ctx = to_mctx(context); + int ret; + int thread_safe = !mlx5_single_threaded; + struct ibv_exp_device_attr device_attr; + enum mlx5_rsc_type rsc_type; +#ifdef MLX5_DEBUG + FILE *fp = ctx->dbg_fp; +#endif + + if (attr->wq_type != IBV_EXP_WQT_RQ) + return NULL; + + memset(&cmd, 0, sizeof(cmd)); + memset(&resp, 0, sizeof(resp)); + + rwq = calloc(1, sizeof(*rwq)); + if (!rwq) + return NULL; + + rwq->wq_sig = rwq_sig_enabled(context); + if (rwq->wq_sig) + cmd.drv.flags = MLX5_RWQ_FLAG_SIGNATURE; + + ret = mlx5_calc_rwq_size(ctx, rwq, attr); + if (ret < 0) { + errno = -ret; + goto err; + } + + rwq->buf_size = ret; + if (attr->comp_mask & IBV_EXP_CREATE_WQ_MP_RQ) { + /* Make sure requested mp_rq values supported by lib */ + if ((attr->mp_rq.single_stride_log_num_of_bytes > MLX5_MP_RQ_MAX_LOG_STRIDE_SIZE) || + (attr->mp_rq.single_wqe_log_num_of_strides > MLX5_MP_RQ_MAX_LOG_NUM_STRIDES) || + (attr->mp_rq.use_shift & ~MLX5_MP_RQ_SUPPORTED_SHIFTS)) 
{ + errno = EINVAL; + goto err; + } + rsc_type = MLX5_RSC_TYPE_MP_RWQ; + rwq->mp_rq_stride_size = 1 << attr->mp_rq.single_stride_log_num_of_bytes; + rwq->mp_rq_strides_in_wqe = 1 << attr->mp_rq.single_wqe_log_num_of_strides; + if (attr->mp_rq.use_shift == IBV_EXP_MP_RQ_2BYTES_SHIFT) + rwq->mp_rq_packet_padding = 2; + } else { + rsc_type = MLX5_RSC_TYPE_RWQ; + } + if (mlx5_alloc_rwq_buf(context, rwq, ret, rsc_type)) + goto err; + + mlx5_init_rwq_indices(rwq); + + if (attr->comp_mask & IBV_EXP_CREATE_WQ_RES_DOMAIN) + thread_safe = (to_mres_domain(attr->res_domain)->attr.thread_model == IBV_EXP_THREAD_SAFE); + + rwq->model_flags = thread_safe ? MLX5_WQ_MODEL_FLAG_THREAD_SAFE : 0; + + memset(&device_attr, 0, sizeof(device_attr)); + device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS; + ret = ibv_exp_query_device(context, &device_attr); + /* Check if RX offloads supported */ + if (!ret && (device_attr.comp_mask & IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS) && + (device_attr.exp_device_cap_flags & IBV_EXP_DEVICE_RX_CSUM_IP_PKT)) + rwq->model_flags |= MLX5_WQ_MODEL_RX_CSUM_IP_OK_IP_NON_TCP_UDP; + + if (mlx5_lock_init(&rwq->rq.lock, thread_safe, mlx5_get_locktype())) + goto err_free_rwq_buf; + + rwq->db = mlx5_alloc_dbrec(ctx); + if (!rwq->db) + goto err_free_rwq_buf; + + rwq->db[MLX5_RCV_DBR] = 0; + rwq->db[MLX5_SND_DBR] = 0; + rwq->rq.buff = rwq->buf.buf + rwq->rq.offset; + rwq->rq.db = &rwq->db[MLX5_RCV_DBR]; + rwq->pattern = MLX5_WQ_PATTERN; + + cmd.drv.buf_addr = (uintptr_t)rwq->buf.buf; + cmd.drv.db_addr = (uintptr_t)rwq->db; + cmd.drv.rq_wqe_count = rwq->rq.wqe_cnt; + cmd.drv.rq_wqe_shift = rwq->rq.wqe_shift; + cmd.drv.user_index = mlx5_store_uidx(ctx, rwq); + if (cmd.drv.user_index < 0) { + mlx5_dbg(fp, MLX5_DBG_QP, "Couldn't find free user index\n"); + goto err_free_db_rec; + } + + err = ibv_exp_cmd_create_wq(context, attr, &rwq->wq, &cmd.ibv_cmd, + sizeof(cmd.ibv_cmd), + sizeof(cmd), + &resp.ibv_resp, sizeof(resp.ibv_resp), + sizeof(resp)); + if (err) + goto err_create; + + rwq->rsc.type = rsc_type; + rwq->rsc.rsn = cmd.drv.user_index; + + return &rwq->wq; + +err_create: + mlx5_clear_uidx(ctx, cmd.drv.user_index); +err_free_db_rec: + mlx5_free_db(to_mctx(context), rwq->db); +err_free_rwq_buf: + mlx5_free_rwq_buf(rwq, context); +err: + free(rwq); + return NULL; +} + +int mlx5_exp_modify_wq(struct ibv_exp_wq *wq, + struct ibv_exp_wq_attr *attr) +{ + struct mlx5_exp_modify_wq cmd; + struct mlx5_rwq *rwq = to_mrwq(wq); + int ret; + + if ((attr->attr_mask & IBV_EXP_WQ_ATTR_STATE) && + attr->wq_state == IBV_EXP_WQS_RDY) { + if ((attr->attr_mask & IBV_EXP_WQ_ATTR_CURR_STATE) && + attr->curr_wq_state != wq->state) + return -EINVAL; + + if (wq->state == IBV_EXP_WQS_RESET) { + mlx5_lock(&to_mcq(wq->cq)->lock); + __mlx5_cq_clean(to_mcq(wq->cq), + rwq->rsc.rsn, wq->srq ? to_msrq(wq->srq) : NULL); + mlx5_unlock(&to_mcq(wq->cq)->lock); + mlx5_init_rwq_indices(rwq); + rwq->db[MLX5_RCV_DBR] = 0; + rwq->db[MLX5_SND_DBR] = 0; + } + } + + memset(&cmd, 0, sizeof(cmd)); + ret = ibv_exp_cmd_modify_wq(wq, attr, &cmd.ibv_cmd, sizeof(cmd)); + return ret; +} + +int mlx5_exp_destroy_wq(struct ibv_exp_wq *wq) +{ + struct mlx5_rwq *rwq = to_mrwq(wq); + int ret; + + ret = ibv_exp_cmd_destroy_wq(wq); + if (ret) { + pthread_mutex_unlock(&to_mctx(wq->context)->rsc_table_mutex); + return ret; + } + + mlx5_lock(&to_mcq(wq->cq)->lock); + __mlx5_cq_clean(to_mcq(wq->cq), rwq->rsc.rsn, + wq->srq ? 
to_msrq(wq->srq) : NULL); + mlx5_unlock(&to_mcq(wq->cq)->lock); + + mlx5_clear_uidx(to_mctx(wq->context), rwq->rsc.rsn); + mlx5_free_db(to_mctx(wq->context), rwq->db); + mlx5_free_rwq_buf(rwq, wq->context); + free(rwq); + + return 0; +} + +struct ibv_exp_dct *mlx5_create_dct(struct ibv_context *context, + struct ibv_exp_dct_init_attr *attr) +{ + struct mlx5_create_dct cmd; + struct mlx5_create_dct_resp resp; + struct mlx5_destroy_dct cmdd; + struct mlx5_destroy_dct_resp respd; + int err; + struct mlx5_dct *dct; + struct mlx5_context *ctx = to_mctx(context); +#ifdef MLX5_DEBUG + FILE *fp = to_mctx(context)->dbg_fp; +#endif + + memset(&cmd, 0, sizeof(cmd)); + memset(&cmdd, 0, sizeof(cmdd)); + memset(&resp, 0, sizeof(resp)); + dct = calloc(1, sizeof(*dct)); + if (!dct) + return NULL; + + if (ctx->cqe_version) { + cmd.drv.uidx = mlx5_store_uidx(ctx, dct); + if (cmd.drv.uidx < 0) { + mlx5_dbg(fp, MLX5_DBG_QP, "Couldn't find free user index\n"); + goto ex_err; + } + } else { + pthread_mutex_lock(&ctx->rsc_table_mutex); + } + + err = ibv_exp_cmd_create_dct(context, &dct->ibdct, attr, &cmd.ibv_cmd, + sizeof(cmd.ibv_cmd), + sizeof(cmd) - sizeof(cmd.ibv_cmd), + &resp.ibv_resp, sizeof(resp.ibv_resp), + sizeof(resp) - sizeof(resp.ibv_resp)); + if (err) + goto err_uidx; + + dct->ibdct.handle = resp.ibv_resp.dct_handle; + dct->ibdct.dct_num = resp.ibv_resp.dct_num; + dct->ibdct.pd = attr->pd; + dct->ibdct.cq = attr->cq; + dct->ibdct.srq = attr->srq; + + if (!ctx->cqe_version) { + err = mlx5_store_rsc(ctx, dct->ibdct.dct_num, dct); + if (err) + goto err_destroy; + + pthread_mutex_unlock(&ctx->rsc_table_mutex); + } + dct->rsc.type = MLX5_RSC_TYPE_DCT; + dct->rsc.rsn = ctx->cqe_version ? cmd.drv.uidx : + resp.ibv_resp.dct_num; + + return &dct->ibdct; + +err_destroy: + if (ibv_exp_cmd_destroy_dct(context, &dct->ibdct, + &cmdd.ibv_cmd, + sizeof(cmdd.ibv_cmd), + sizeof(cmdd) - sizeof(cmdd.ibv_cmd), + &respd.ibv_resp, sizeof(respd.ibv_resp), + sizeof(respd) - sizeof(respd.ibv_resp))) + fprintf(stderr, "failed to destory DCT\n"); +err_uidx: + if (ctx->cqe_version) + mlx5_clear_uidx(ctx, cmd.drv.uidx); + else + pthread_mutex_unlock(&ctx->rsc_table_mutex); +ex_err: + free(dct); + return NULL; +} + +int mlx5_destroy_dct(struct ibv_exp_dct *dct) +{ + struct mlx5_destroy_dct cmd; + struct mlx5_destroy_dct_resp resp; + int err; + struct mlx5_dct *mdct = to_mdct(dct); + struct mlx5_context *ctx = to_mctx(dct->context); + + + memset(&cmd, 0, sizeof(cmd)); + if (!ctx->cqe_version) + pthread_mutex_lock(&ctx->rsc_table_mutex); + cmd.ibv_cmd.dct_handle = dct->handle; + err = ibv_exp_cmd_destroy_dct(dct->context, dct, + &cmd.ibv_cmd, + sizeof(cmd.ibv_cmd), + sizeof(cmd) - sizeof(cmd.ibv_cmd), + &resp.ibv_resp, sizeof(resp.ibv_resp), + sizeof(resp) - sizeof(resp.ibv_resp)); + if (err) + goto ex_err; + + mlx5_cq_clean(to_mcq(dct->cq), mdct->rsc.rsn, to_msrq(dct->srq)); + if (ctx->cqe_version) { + mlx5_clear_uidx(ctx, mdct->rsc.rsn); + } else { + mlx5_clear_rsc(to_mctx(dct->context), dct->dct_num); + pthread_mutex_unlock(&ctx->rsc_table_mutex); + } + + free(mdct); + return 0; + +ex_err: + if (!ctx->cqe_version) + pthread_mutex_unlock(&ctx->rsc_table_mutex); + return err; +} + +int mlx5_query_dct(struct ibv_exp_dct *dct, struct ibv_exp_dct_attr *attr) +{ + struct mlx5_query_dct cmd; + struct mlx5_query_dct_resp resp; + int err; + + cmd.ibv_cmd.dct_handle = dct->handle; + err = ibv_exp_cmd_query_dct(dct->context, &cmd.ibv_cmd, + sizeof(cmd.ibv_cmd), + sizeof(cmd) - sizeof(cmd.ibv_cmd), + &resp.ibv_resp, sizeof(resp.ibv_resp), + 
sizeof(resp) - sizeof(resp.ibv_resp), + attr); + if (err) + goto out; + + attr->cq = dct->cq; + attr->pd = dct->pd; + attr->srq = dct->srq; + +out: + return err; +} + +int mlx5_arm_dct(struct ibv_exp_dct *dct, struct ibv_exp_arm_attr *attr) +{ + struct mlx5_arm_dct cmd; + struct mlx5_arm_dct_resp resp; + int err; + + memset(&cmd, 0, sizeof(cmd)); + memset(&resp, 0, sizeof(resp)); + cmd.ibv_cmd.dct_handle = dct->handle; + err = ibv_exp_cmd_arm_dct(dct->context, attr, &cmd.ibv_cmd, + sizeof(cmd.ibv_cmd), + sizeof(cmd) - sizeof(cmd.ibv_cmd), + &resp.ibv_resp, sizeof(resp.ibv_resp), + sizeof(resp) - sizeof(resp.ibv_resp)); + return err; +} + +static void mlx5_lock_cqs(struct ibv_qp *qp) +{ + struct mlx5_cq *send_cq = to_mcq(qp->send_cq); + struct mlx5_cq *recv_cq = to_mcq(qp->recv_cq); + + if (send_cq && recv_cq) { + if (send_cq == recv_cq) { + mlx5_lock(&send_cq->lock); + } else if (send_cq->cqn < recv_cq->cqn) { + mlx5_lock(&send_cq->lock); + mlx5_lock(&recv_cq->lock); + } else { + mlx5_lock(&recv_cq->lock); + mlx5_lock(&send_cq->lock); + } + } else if (send_cq) { + mlx5_lock(&send_cq->lock); + } else if (recv_cq) { + mlx5_lock(&recv_cq->lock); + } +} + +static void mlx5_unlock_cqs(struct ibv_qp *qp) +{ + struct mlx5_cq *send_cq = to_mcq(qp->send_cq); + struct mlx5_cq *recv_cq = to_mcq(qp->recv_cq); + + if (send_cq && recv_cq) { + if (send_cq == recv_cq) { + mlx5_unlock(&send_cq->lock); + } else if (send_cq->cqn < recv_cq->cqn) { + mlx5_unlock(&recv_cq->lock); + mlx5_unlock(&send_cq->lock); + } else { + mlx5_unlock(&send_cq->lock); + mlx5_unlock(&recv_cq->lock); + } + } else if (send_cq) { + mlx5_unlock(&send_cq->lock); + } else if (recv_cq) { + mlx5_unlock(&recv_cq->lock); + } +} + +int mlx5_destroy_qp(struct ibv_qp *ibqp) +{ + struct mlx5_qp *qp = to_mqp(ibqp); + struct mlx5_context *ctx = to_mctx(ibqp->context); + int ret; + + if (qp->rx_qp) { + ret = ibv_cmd_destroy_qp(ibqp); + if (ret) + return ret; + goto free; + } + + if (!ctx->cqe_version) + pthread_mutex_lock(&ctx->rsc_table_mutex); + + ret = ibv_cmd_destroy_qp(ibqp); + if (ret) { + if (!ctx->cqe_version) + pthread_mutex_unlock(&to_mctx(ibqp->context)->rsc_table_mutex); + return ret; + } + + mlx5_lock_cqs(ibqp); + + __mlx5_cq_clean(to_mcq(ibqp->recv_cq), qp->rsc.rsn, + ibqp->srq ? 
to_msrq(ibqp->srq) : NULL); + if (ibqp->send_cq != ibqp->recv_cq) + __mlx5_cq_clean(to_mcq(ibqp->send_cq), qp->rsc.rsn, NULL); + + if (!ctx->cqe_version) + mlx5_clear_rsc(ctx, ibqp->qp_num); + + mlx5_unlock_cqs(ibqp); + if (!ctx->cqe_version) + pthread_mutex_unlock(&ctx->rsc_table_mutex); + else if (!is_xrc_tgt(ibqp->qp_type)) + mlx5_clear_uidx(ctx, qp->rsc.rsn); + + mlx5_free_db(ctx, qp->gen_data.db); + mlx5_free_qp_buf(qp); +free: + free(qp); + + return 0; +} + +int mlx5_query_qp(struct ibv_qp *ibqp, struct ibv_qp_attr *attr, + int attr_mask, struct ibv_qp_init_attr *init_attr) +{ + struct ibv_query_qp cmd; + struct mlx5_qp *qp = to_mqp(ibqp); + int ret; + + if (qp->rx_qp) + return -ENOSYS; + + ret = ibv_cmd_query_qp(ibqp, attr, attr_mask, init_attr, &cmd, sizeof(cmd)); + if (ret) + return ret; + + init_attr->cap.max_send_wr = qp->sq.max_post; + init_attr->cap.max_send_sge = qp->sq.max_gs; + init_attr->cap.max_inline_data = qp->data_seg.max_inline_data; + + attr->cap = init_attr->cap; + + return 0; +} + +int mlx5_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, + int attr_mask) +{ + struct mlx5_qp *mqp = to_mqp(qp); + struct ibv_port_attr port_attr; + struct ibv_modify_qp cmd; + int ret; + uint32_t *db; + + if (attr_mask & IBV_QP_PORT) { + ret = ibv_query_port(qp->context, attr->port_num, + &port_attr); + if (ret) + return ret; + mqp->link_layer = port_attr.link_layer; + } + + if (to_mqp(qp)->rx_qp) + return -ENOSYS; + + ret = ibv_cmd_modify_qp(qp, attr, attr_mask, &cmd, sizeof(cmd)); + + if (!ret && + (attr_mask & IBV_QP_STATE) && + attr->qp_state == IBV_QPS_RESET) { + if (qp->recv_cq) { + mlx5_cq_clean(to_mcq(qp->recv_cq), mqp->rsc.rsn, + qp->srq ? to_msrq(qp->srq) : NULL); + } + if (qp->send_cq != qp->recv_cq && qp->send_cq) + mlx5_cq_clean(to_mcq(qp->send_cq), mqp->rsc.rsn, NULL); + + mlx5_init_qp_indices(mqp); + db = mqp->gen_data.db; + db[MLX5_RCV_DBR] = 0; + db[MLX5_SND_DBR] = 0; + } + if (!ret && (attr_mask & IBV_QP_STATE)) + mlx5_update_post_send_one(mqp, qp->state, qp->qp_type); + + if (!ret && + (attr_mask & IBV_QP_STATE) && + attr->qp_state == IBV_QPS_RTR && + qp->qp_type == IBV_QPT_RAW_ETH) { + mlx5_lock(&mqp->rq.lock); + mqp->gen_data.db[MLX5_RCV_DBR] = htonl(mqp->rq.head & 0xffff); + mlx5_unlock(&mqp->rq.lock); + } + + + return ret; +} + +#ifndef s6_addr32 +#define s6_addr32 __u6_addr.__u6_addr32 +#endif + +static inline int ipv6_addr_v4mapped(const struct in6_addr *a) +{ + return ((a->s6_addr32[0] | a->s6_addr32[1]) | + (a->s6_addr32[2] ^ htonl(0x0000ffff))) == 0UL || + /* IPv4 encoded multicast addresses */ + (a->s6_addr32[0] == htonl(0xff0e0000) && + ((a->s6_addr32[1] | + (a->s6_addr32[2] ^ htonl(0x0000ffff))) == 0UL)); +} + +struct ibv_ah *mlx5_create_ah_common(struct ibv_pd *pd, + struct ibv_ah_attr *attr, + uint8_t link_layer, + int gid_type) +{ + struct mlx5_ah *ah; + struct mlx5_context *ctx = to_mctx(pd->context); + struct mlx5_wqe_av *wqe; + uint32_t tmp; + uint8_t grh; + + if (unlikely(attr->port_num < 1 || attr->port_num > ctx->num_ports)) { + errno = EINVAL; + return NULL; + } + + if (unlikely(!attr->dlid) && + (link_layer != IBV_LINK_LAYER_ETHERNET)) { + errno = EINVAL; + return NULL; + } + + if (unlikely(!attr->is_global) && + (link_layer == IBV_LINK_LAYER_ETHERNET)) { + errno = EINVAL; + return NULL; + } + + ah = calloc(1, sizeof *ah); + if (unlikely(!ah)) { + errno = ENOMEM; + return NULL; + } + wqe = &ah->av; + + wqe->base.stat_rate_sl = (attr->static_rate << 4) | attr->sl; + + if (link_layer == IBV_LINK_LAYER_ETHERNET) { + if (gid_type == 
IBV_EXP_ROCE_V2_GID_TYPE) + wqe->base.rlid = htons(ctx->rroce_udp_sport_min); + grh = 0; + } else { + wqe->base.fl_mlid = attr->src_path_bits & 0x7f; + wqe->base.rlid = htons(attr->dlid); + grh = 1; + } + + if (attr->is_global) { + wqe->base.dqp_dct = htonl(MLX5_EXTENDED_UD_AV); + wqe->grh_sec.tclass = attr->grh.traffic_class; + if ((attr->grh.hop_limit < 2) && + (link_layer == IBV_LINK_LAYER_ETHERNET) && + (gid_type != IBV_EXP_IB_ROCE_V1_GID_TYPE)) + wqe->grh_sec.hop_limit = 0xff; + else + wqe->grh_sec.hop_limit = attr->grh.hop_limit; + tmp = htonl((grh << 30) | + ((attr->grh.sgid_index & 0xff) << 20) | + (attr->grh.flow_label & 0xfffff)); + wqe->grh_sec.grh_gid_fl = tmp; + memcpy(wqe->grh_sec.rgid, attr->grh.dgid.raw, 16); + if ((link_layer == IBV_LINK_LAYER_ETHERNET) && + (gid_type != IBV_EXP_IB_ROCE_V1_GID_TYPE) && + ipv6_addr_v4mapped((struct in6_addr *)attr->grh.dgid.raw)) + memset(wqe->grh_sec.rgid, 0, 12); + } else if (!ctx->compact_av) { + wqe->base.dqp_dct = htonl(MLX5_EXTENDED_UD_AV); + } + + return &ah->ibv_ah; +} + +struct ibv_ah *mlx5_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr) +{ + struct ibv_exp_port_attr port_attr; + + port_attr.comp_mask = IBV_EXP_QUERY_PORT_ATTR_MASK1; + port_attr.mask1 = IBV_EXP_QUERY_PORT_LINK_LAYER; + + if (ibv_exp_query_port(pd->context, attr->port_num, &port_attr)) + return NULL; + + return mlx5_create_ah_common(pd, attr, port_attr.link_layer, + IBV_EXP_IB_ROCE_V1_GID_TYPE); +} + +struct ibv_ah *mlx5_exp_create_ah(struct ibv_pd *pd, + struct ibv_exp_ah_attr *attr_ex) +{ + struct mlx5_ah *mah; + struct ibv_ah *ah; + struct ibv_exp_port_attr port_attr; + struct ibv_exp_gid_attr gid_attr; + + gid_attr.comp_mask = IBV_EXP_QUERY_GID_ATTR_TYPE; + if (ibv_exp_query_gid_attr(pd->context, attr_ex->port_num, attr_ex->grh.sgid_index, + &gid_attr)) + return NULL; + + port_attr.comp_mask = IBV_EXP_QUERY_PORT_ATTR_MASK1; + port_attr.mask1 = IBV_EXP_QUERY_PORT_LINK_LAYER; + + if (ibv_exp_query_port(pd->context, attr_ex->port_num, &port_attr)) + return NULL; + + ah = mlx5_create_ah_common(pd, (struct ibv_ah_attr *)attr_ex, + port_attr.link_layer, gid_attr.type); + + if (!ah) + return NULL; + + mah = to_mah(ah); + + /* ll_address.len == 0 means no ll address given */ + if (attr_ex->comp_mask & IBV_EXP_AH_ATTR_LL && + 0 != attr_ex->ll_address.len) { + if (LL_ADDRESS_ETH != attr_ex->ll_address.type || + port_attr.link_layer != IBV_LINK_LAYER_ETHERNET) + goto err; + + /* link layer is ethernet */ + if (6 != attr_ex->ll_address.len || + NULL == attr_ex->ll_address.address) + goto err; + + memcpy(mah->av.grh_sec.rmac, + attr_ex->ll_address.address, + attr_ex->ll_address.len); + } + + return ah; + +err: + free(ah); + return NULL; +} + +int mlx5_destroy_ah(struct ibv_ah *ah) +{ + free(to_mah(ah)); + + return 0; +} + +int mlx5_attach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid) +{ + return ibv_cmd_attach_mcast(qp, gid, lid); +} + +int mlx5_detach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid) +{ + return ibv_cmd_detach_mcast(qp, gid, lid); +} + +struct ibv_xrcd *mlx5_open_xrcd(struct ibv_context *context, + struct ibv_xrcd_init_attr *xrcd_init_attr) +{ + int err; + struct verbs_xrcd *xrcd; + struct ibv_open_xrcd cmd = {0}; + struct ibv_open_xrcd_resp resp = {0}; + + xrcd = calloc(1, sizeof(*xrcd)); + if (!xrcd) + return NULL; + + err = ibv_cmd_open_xrcd(context, xrcd, sizeof(*xrcd), xrcd_init_attr, + &cmd, sizeof(cmd), &resp, sizeof(resp)); + if (err) { + free(xrcd); + return NULL; + } + + return &xrcd->xrcd; +} + +struct 
ibv_srq *mlx5_create_xrc_srq(struct ibv_context *context, + struct ibv_srq_init_attr_ex *attr) +{ + int err; + struct mlx5_create_srq_ex cmd; + struct mlx5_create_srq_resp resp; + struct mlx5_srq *msrq; + struct mlx5_context *ctx; + int max_sge; + struct ibv_srq *ibsrq; +#ifdef MLX5_DEBUG + FILE *fp = to_mctx(context)->dbg_fp; +#endif + + msrq = calloc(1, sizeof(*msrq)); + if (!msrq) + return NULL; + + msrq->is_xsrq = 1; + ibsrq = (struct ibv_srq *)&msrq->vsrq; + + memset(&cmd, 0, sizeof(cmd)); + memset(&resp, 0, sizeof(resp)); + + ctx = to_mctx(context); + + if (mlx5_spinlock_init(&msrq->lock, !mlx5_single_threaded)) { + fprintf(stderr, "%s-%d:\n", __func__, __LINE__); + goto err; + } + + if (attr->attr.max_wr > ctx->max_srq_recv_wr) { + fprintf(stderr, "%s-%d:max_wr %d, max_srq_recv_wr %d\n", + __func__, __LINE__, attr->attr.max_wr, + ctx->max_srq_recv_wr); + errno = EINVAL; + goto err; + } + + /* + * this calculation does not consider required control segments. The + * final calculation is done again later. This is done so to avoid + * overflows of variables + */ + max_sge = ctx->max_recv_wr / sizeof(struct mlx5_wqe_data_seg); + if (attr->attr.max_sge > max_sge) { + fprintf(stderr, "%s-%d:max_wr %d, max_srq_recv_wr %d\n", + __func__, __LINE__, attr->attr.max_wr, + ctx->max_srq_recv_wr); + errno = EINVAL; + goto err; + } + + msrq->max = align_queue_size(attr->attr.max_wr + 1); + msrq->max_gs = attr->attr.max_sge; + msrq->counter = 0; + + if (mlx5_alloc_srq_buf(context, msrq)) { + fprintf(stderr, "%s-%d:\n", __func__, __LINE__); + goto err; + } + + msrq->db = mlx5_alloc_dbrec(ctx); + if (!msrq->db) { + fprintf(stderr, "%s-%d:\n", __func__, __LINE__); + goto err_free; + } + + *msrq->db = 0; + + cmd.buf_addr = (uintptr_t) msrq->buf.buf; + cmd.db_addr = (uintptr_t) msrq->db; + msrq->wq_sig = srq_sig_enabled(context); + if (msrq->wq_sig) + cmd.flags = MLX5_SRQ_FLAG_SIGNATURE; + + attr->attr.max_sge = msrq->max_gs; + + if (ctx->cqe_version) { + cmd.uidx = mlx5_store_uidx(ctx, msrq); + if (cmd.uidx < 0) { + mlx5_dbg(fp, MLX5_DBG_QP, "Couldn't find free user index\n"); + goto err_free_db; + } + } else { + pthread_mutex_lock(&ctx->srq_table_mutex); + } + + err = ibv_cmd_create_srq_ex(context, &msrq->vsrq, sizeof(msrq->vsrq), + attr, &cmd.ibv_cmd, sizeof(cmd), + &resp.ibv_resp, sizeof(resp)); + if (err) + goto err_free_uidx; + + if (!ctx->cqe_version) { + err = mlx5_store_srq(to_mctx(context), resp.srqn, msrq); + if (err) + goto err_destroy; + + pthread_mutex_unlock(&ctx->srq_table_mutex); + } + + msrq->srqn = resp.srqn; + msrq->rsc.type = MLX5_RSC_TYPE_XSRQ; + msrq->rsc.rsn = ctx->cqe_version ? 
cmd.uidx : resp.srqn; + + return ibsrq; + +err_destroy: + ibv_cmd_destroy_srq(ibsrq); +err_free_uidx: + if (ctx->cqe_version) + mlx5_clear_uidx(ctx, cmd.uidx); + else + pthread_mutex_unlock(&ctx->srq_table_mutex); +err_free_db: + mlx5_free_db(ctx, msrq->db); + +err_free: + free(msrq->wrid); + mlx5_free_buf(&msrq->buf); + +err: + free(msrq); + + return NULL; +} +struct ibv_srq *mlx5_create_srq_ex(struct ibv_context *context, + struct ibv_srq_init_attr_ex *attr) +{ + if (!(attr->comp_mask & IBV_SRQ_INIT_ATTR_TYPE) || + (attr->srq_type == IBV_SRQT_BASIC)) + return mlx5_create_srq(attr->pd, + (struct ibv_srq_init_attr *)attr); + else if (attr->srq_type == IBV_SRQT_XRC) + return mlx5_create_xrc_srq(context, attr); + + return NULL; +} + +int mlx5_get_srq_num(struct ibv_srq *srq, uint32_t *srq_num) +{ + struct mlx5_srq *msrq = to_msrq(srq); + + *srq_num = msrq->srqn; + + return 0; +} + +struct ibv_qp *mlx5_open_qp(struct ibv_context *context, + struct ibv_qp_open_attr *attr) +{ + struct ibv_open_qp cmd; + struct ibv_create_qp_resp resp; + struct mlx5_qp *qp; + int ret; + struct mlx5_context *ctx = to_mctx(context); + + qp = calloc(1, sizeof(*qp)); + if (!qp) + return NULL; + + ret = ibv_cmd_open_qp(context, &qp->verbs_qp, sizeof(qp->verbs_qp), + attr, &cmd, sizeof(cmd), &resp, sizeof(resp)); + if (ret) + goto err; + + if (!ctx->cqe_version) { + pthread_mutex_lock(&ctx->rsc_table_mutex); + if (mlx5_store_rsc(ctx, qp->verbs_qp.qp.qp_num, qp)) { + pthread_mutex_unlock(&ctx->rsc_table_mutex); + goto destroy; + } + pthread_mutex_unlock(&ctx->rsc_table_mutex); + } + + return (struct ibv_qp *)&qp->verbs_qp; + +destroy: + ibv_cmd_destroy_qp(&qp->verbs_qp.qp); +err: + free(qp); + return NULL; +} + +int mlx5_close_xrcd(struct ibv_xrcd *ib_xrcd) +{ + struct verbs_xrcd *xrcd = container_of(ib_xrcd, struct verbs_xrcd, xrcd); + int ret; + + ret = ibv_cmd_close_xrcd(xrcd); + if (!ret) + free(xrcd); + + return ret; +} + +int mlx5_modify_qp_ex(struct ibv_qp *qp, struct ibv_exp_qp_attr *attr, + uint64_t attr_mask) +{ + struct mlx5_qp *mqp = to_mqp(qp); + struct ibv_port_attr port_attr; + struct ibv_exp_modify_qp cmd; + struct ibv_exp_device_attr device_attr; + int ret; + uint32_t *db; + + if (attr_mask & IBV_QP_PORT) { + ret = ibv_query_port(qp->context, attr->port_num, + &port_attr); + if (ret) + return ret; + mqp->link_layer = port_attr.link_layer; + if (((qp->qp_type == IBV_QPT_UD) && (mqp->link_layer == IBV_LINK_LAYER_INFINIBAND)) || + ((qp->qp_type == IBV_QPT_RAW_ETH) && (mqp->link_layer == IBV_LINK_LAYER_ETHERNET))) { + memset(&device_attr, 0, sizeof(device_attr)); + device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_RESERVED - 1; + ret = ibv_exp_query_device(qp->context, &device_attr); + if (ret) + return ret; + if ((device_attr.comp_mask & IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS) && + (device_attr.exp_device_cap_flags & IBV_EXP_DEVICE_RX_CSUM_IP_PKT)) + mqp->gen_data.model_flags |= MLX5_QP_MODEL_RX_CSUM_IP_OK_IP_NON_TCP_UDP; + } + } + + if (mqp->rx_qp) + return -ENOSYS; + + memset(&cmd, 0, sizeof(cmd)); + ret = ibv_exp_cmd_modify_qp(qp, attr, attr_mask, &cmd, sizeof(cmd)); + + if (!ret && + (attr_mask & IBV_QP_STATE) && + attr->qp_state == IBV_QPS_RESET) { + if (qp->qp_type != IBV_EXP_QPT_DC_INI) + mlx5_cq_clean(to_mcq(qp->recv_cq), mqp->rsc.rsn, + qp->srq ? 
to_msrq(qp->srq) : NULL); + + if (qp->send_cq != qp->recv_cq) + mlx5_cq_clean(to_mcq(qp->send_cq), mqp->rsc.rsn, NULL); + + mlx5_init_qp_indices(to_mqp(qp)); + db = to_mqp(qp)->gen_data.db; + db[MLX5_RCV_DBR] = 0; + db[MLX5_SND_DBR] = 0; + } + if (!ret && (attr_mask & IBV_QP_STATE)) + mlx5_update_post_send_one(to_mqp(qp), qp->state, qp->qp_type); + + if (!ret && + (attr_mask & IBV_QP_STATE) && + attr->qp_state == IBV_QPS_RTR && + qp->qp_type == IBV_QPT_RAW_ETH) { + mlx5_lock(&mqp->rq.lock); + mqp->gen_data.db[MLX5_RCV_DBR] = htonl(mqp->rq.head & 0xffff); + mlx5_unlock(&mqp->rq.lock); + } + + return ret; +} + +void *mlx5_get_legacy_xrc(struct ibv_srq *srq) +{ + struct mlx5_srq *msrq = to_msrq(srq); + + return msrq->ibv_srq_legacy; +} + +void mlx5_set_legacy_xrc(struct ibv_srq *srq, void *legacy_xrc_srq) +{ + struct mlx5_srq *msrq = to_msrq(srq); + + msrq->ibv_srq_legacy = legacy_xrc_srq; + return; +} + +int mlx5_modify_cq(struct ibv_cq *cq, struct ibv_exp_cq_attr *attr, int attr_mask) +{ + struct ibv_exp_modify_cq cmd; + + memset(&cmd, 0, sizeof(cmd)); + return ibv_exp_cmd_modify_cq(cq, attr, attr_mask, &cmd, sizeof(cmd)); +} + +struct ibv_exp_mkey_list_container *mlx5_alloc_mkey_mem(struct ibv_exp_mkey_list_container_attr *attr) +{ + struct mlx5_klm_buf *klm; + int size; + + if (attr->mkey_list_type != + IBV_EXP_MKEY_LIST_TYPE_INDIRECT_MR) { + errno = ENOMEM; + return NULL; + } + + klm = calloc(1, sizeof(*klm)); + if (!klm) { + errno = ENOMEM; + return NULL; + } + + size = align(attr->max_klm_list_size * sizeof(struct mlx5_wqe_data_seg), 64); + + klm->alloc_buf = malloc(size + MLX5_UMR_PTR_ALIGN - 1); + if (!klm->alloc_buf) { + errno = ENOMEM; + goto ex_klm; + } + + klm->align_buf = align_ptr(klm->alloc_buf, MLX5_UMR_PTR_ALIGN); + + memset(klm->align_buf, 0, size); + klm->mr = ibv_reg_mr(attr->pd, klm->align_buf, size, 0); + if (!klm->mr) + goto ex_list; + + klm->ibv_klm_list.max_klm_list_size = attr->max_klm_list_size; + klm->ibv_klm_list.context = klm->mr->context; + + return &klm->ibv_klm_list; + +ex_list: + free(klm->alloc_buf); +ex_klm: + free(klm); + return NULL; +} + +int mlx5_free_mkey_mem(struct ibv_exp_mkey_list_container *mem) +{ + struct mlx5_klm_buf *klm; + int err; + + klm = to_klm(mem); + err = ibv_dereg_mr(klm->mr); + if (err) { + fprintf(stderr, "unreg klm failed\n"); + return err; + } + free(klm->alloc_buf); + free(klm); + return 0; +} + +int mlx5_query_mkey(struct ibv_mr *mr, struct ibv_exp_mkey_attr *mkey_attr) +{ + struct mlx5_query_mkey cmd; + struct mlx5_query_mkey_resp resp; + int err; + + memset(&cmd, 0, sizeof(cmd)); + err = ibv_exp_cmd_query_mkey(mr->context, mr, mkey_attr, &cmd.ibv_cmd, + sizeof(cmd.ibv_cmd), sizeof(cmd), + &resp.ibv_resp, sizeof(resp.ibv_resp), + sizeof(resp)); + + return err; +}; + +struct ibv_mr *mlx5_create_mr(struct ibv_exp_create_mr_in *in) +{ + struct mlx5_create_mr cmd; + struct mlx5_create_mr_resp resp; + struct mlx5_mr *mr; + int err; + + if (in->attr.create_flags & IBV_EXP_MR_SIGNATURE_EN) { + errno = EOPNOTSUPP; + return NULL; + } + + memset(&cmd, 0, sizeof(cmd)); + memset(&resp, 0, sizeof(resp)); + + mr = calloc(1, sizeof(*mr)); + if (!mr) + return NULL; + + err = ibv_exp_cmd_create_mr(in, &mr->ibv_mr, &cmd.ibv_cmd, + sizeof(cmd.ibv_cmd), + sizeof(cmd) - sizeof(cmd.ibv_cmd), + &resp.ibv_resp, + sizeof(resp.ibv_resp), sizeof(resp) - sizeof(resp.ibv_resp)); + if (err) + goto out; + + return &mr->ibv_mr; + +out: + free(mr); + return NULL; +}; + +int mlx5_exp_dereg_mr(struct ibv_mr *ibmr, struct ibv_exp_dereg_out *out) +{ + struct 
mlx5_mr *mr; + + if (ibmr->lkey == ODP_GLOBAL_R_LKEY || ibmr->lkey == ODP_GLOBAL_W_LKEY) { + out->need_dofork = 0; + } else { + mr = to_mmr(ibmr); + out->need_dofork = (mr->buf.type == MLX5_ALLOC_TYPE_CONTIG || + mr->type == MLX5_ODP_MR) ? 0 : 1; + } + + return mlx5_dereg_mr(ibmr); +} + +struct mlx5_info_record { + uint16_t lid[30]; + uint32_t seq_num; +}; + +int mlx5_poll_dc_info(struct ibv_context *context, + struct ibv_exp_dc_info_ent *ents, + int nent, + int port) +{ + struct mlx5_context *ctx = to_mctx(context); + void *start; + struct mlx5_port_info_ctx *pc; + struct mlx5_info_record *cr; + int i; + int j; + uint32_t seq; + + if (!ctx->cc.buf) + return -ENOSYS; + + if (port < 1 || port > ctx->num_ports) + return -EINVAL; + + pc = &ctx->cc.port[port - 1]; + start = ctx->cc.buf + 4096 * (port - 1); + + cr = start + (pc->consumer & 0xfff); + for (i = 0; i < nent; i++) { + seq = ntohl(cr->seq_num); + /* The buffer is initialized to all ff. So if the HW did not write anything, + the condition below will cause a return without polling any record. */ + if ((seq & 0xfff) != (pc->consumer & 0xfff)) + return i; + + /* When the process comes to life, the buffer may alredy contain + valid records. The "steady" field allows the process to synchronize + and continue from there */ + if (pc->steady) { + if (((pc->consumer >> 12) - 1) == (seq >> 12)) + return i; + } else { + pc->consumer = seq & 0xfffff000; + pc->steady = 1; + } + + /* make sure LIDs are read after we indentify a new record */ + rmb(); + ents[i].seqnum = seq; + for (j = 0; j < 30; j++) + ents[i].lid[j] = ntohs(cr->lid[j]); + + pc->consumer += 64; + cr = start + (pc->consumer & 0xfff); + } + return i; +} + +static struct mlx5_send_db_data *allocate_send_db(struct mlx5_context *ctx) +{ + struct mlx5_device *dev = to_mdev(ctx->ibv_ctx.device); + struct mlx5_send_db_data *send_db = NULL; + unsigned int db_idx; + struct mlx5_wc_uar *wc_uar; + int j; + + + mlx5_spin_lock(&ctx->send_db_lock); + if (!list_empty(&ctx->send_wc_db_list)) { + send_db = list_entry(ctx->send_wc_db_list.next, struct mlx5_send_db_data, list); + list_del(&send_db->list); + } + mlx5_spin_unlock(&ctx->send_db_lock); + + if (!send_db) { + /* Fill up more send_db objects */ + wc_uar = calloc(1, sizeof(*wc_uar)); + if (!wc_uar) { + errno = ENOMEM; + return NULL; + } + mlx5_spin_lock(&ctx->send_db_lock); + /* One res_domain per UUAR */ + if (ctx->num_wc_uars >= ctx->max_ctx_res_domain / MLX5_NUM_UUARS_PER_PAGE) { + errno = ENOMEM; + goto out; + } + db_idx = ctx->num_wc_uars; + wc_uar->uar = mlx5_uar_mmap(db_idx, MLX5_EXP_IB_MMAP_N_ALLOC_WC_CMD, dev->page_size, ctx->ibv_ctx.cmd_fd); + if (wc_uar->uar == MAP_FAILED) { + errno = ENOMEM; + goto out; + } + ctx->num_wc_uars++; + mlx5_spin_unlock(&ctx->send_db_lock); + + wc_uar->uar_idx = db_idx; + for (j = 0; j < MLX5_NUM_UUARS_PER_PAGE; ++j) { + wc_uar->send_db_data[j].bf.reg = wc_uar->uar + MLX5_BF_OFFSET + (j * ctx->bf_reg_size); + wc_uar->send_db_data[j].bf.buf_size = ctx->bf_reg_size / 2; + wc_uar->send_db_data[j].bf.db_method = (mlx5_single_threaded && wc_auto_evict_size() == 64) ? + MLX5_DB_METHOD_DEDIC_BF_1_THREAD : MLX5_DB_METHOD_DEDIC_BF; + wc_uar->send_db_data[j].bf.offset = 0; + + mlx5_lock_init(&wc_uar->send_db_data[j].bf.lock, + 0, + mlx5_get_locktype()); + + wc_uar->send_db_data[j].bf.need_lock = mlx5_single_threaded ? 
0 : 1; + /* Indicate that this BF UUAR is not from the static + * UUAR infrastructure + */ + wc_uar->send_db_data[j].bf.uuarn = MLX5_EXP_INVALID_UUAR; + wc_uar->send_db_data[j].wc_uar = wc_uar; + } + for (j = 0; j < MLX5_NUM_UUARS_PER_PAGE - 1; ++j) { + mlx5_spin_lock(&ctx->send_db_lock); + list_add(&wc_uar->send_db_data[j].list, &ctx->send_wc_db_list); + mlx5_spin_unlock(&ctx->send_db_lock); + } + + /* Return the last send_db object to the caller */ + send_db = &wc_uar->send_db_data[j]; + } + + return send_db; + +out: + mlx5_spin_unlock(&ctx->send_db_lock); + free(wc_uar); + + return NULL; +} + +struct ibv_exp_res_domain *mlx5_exp_create_res_domain(struct ibv_context *context, + struct ibv_exp_res_domain_init_attr *attr) +{ + struct mlx5_context *ctx = to_mctx(context); + struct mlx5_res_domain *res_domain; + + if (attr->comp_mask >= IBV_EXP_RES_DOMAIN_RESERVED) { + errno = EINVAL; + return NULL; + } + + if (!ctx->max_ctx_res_domain) { + errno = ENOSYS; + return NULL; + } + + res_domain = calloc(1, sizeof(*res_domain)); + if (!res_domain) { + errno = ENOMEM; + return NULL; + } + + res_domain->ibv_res_domain.context = context; + + /* set default values */ + res_domain->attr.thread_model = IBV_EXP_THREAD_SAFE; + res_domain->attr.msg_model = IBV_EXP_MSG_DEFAULT; + /* get requested valid values */ + if (attr->comp_mask & IBV_EXP_RES_DOMAIN_THREAD_MODEL) + res_domain->attr.thread_model = attr->thread_model; + if (attr->comp_mask & IBV_EXP_RES_DOMAIN_MSG_MODEL) + res_domain->attr.msg_model = attr->msg_model; + res_domain->attr.comp_mask = IBV_EXP_RES_DOMAIN_RESERVED - 1; + + res_domain->send_db = allocate_send_db(ctx); + if (!res_domain->send_db) { + if (res_domain->attr.msg_model == IBV_EXP_MSG_FORCE_LOW_LATENCY) + goto err; + } else { + switch (res_domain->attr.thread_model) { + case IBV_EXP_THREAD_SAFE: + res_domain->send_db->bf.db_method = MLX5_DB_METHOD_BF; + res_domain->send_db->bf.need_lock = 1; + break; + case IBV_EXP_THREAD_UNSAFE: + res_domain->send_db->bf.db_method = MLX5_DB_METHOD_DEDIC_BF; + res_domain->send_db->bf.need_lock = 0; + break; + case IBV_EXP_THREAD_SINGLE: + if (wc_auto_evict_size() == 64) { + res_domain->send_db->bf.db_method = MLX5_DB_METHOD_DEDIC_BF_1_THREAD; + res_domain->send_db->bf.need_lock = 0; + } else { + res_domain->send_db->bf.db_method = MLX5_DB_METHOD_DEDIC_BF; + res_domain->send_db->bf.need_lock = 0; + } + break; + } + } + + return &res_domain->ibv_res_domain; + +err: + free(res_domain); + + return NULL; +} + +static void free_send_db(struct mlx5_context *ctx, + struct mlx5_send_db_data *send_db) +{ + /* + * Currently we free the resource domain UUAR to the local + * send_wc_db_list. In the future we may consider unmapping + * UAR which all its UUARs are free. 
+ */ + mlx5_spin_lock(&ctx->send_db_lock); + list_add(&send_db->list, &ctx->send_wc_db_list); + mlx5_spin_unlock(&ctx->send_db_lock); +} + +int mlx5_exp_destroy_res_domain(struct ibv_context *context, + struct ibv_exp_res_domain *res_dom, + struct ibv_exp_destroy_res_domain_attr *attr) +{ + struct mlx5_res_domain *res_domain; + + if (!res_dom) + return EINVAL; + + res_domain = to_mres_domain(res_dom); + if (res_domain->send_db) + free_send_db(to_mctx(context), res_domain->send_db); + + free(res_domain); + + return 0; +} + +void *mlx5_exp_query_intf(struct ibv_context *context, struct ibv_exp_query_intf_params *params, + enum ibv_exp_query_intf_status *status) +{ + void *family = NULL; + struct mlx5_qp *qp; + struct mlx5_cq *cq; + struct mlx5_rwq *rwq; + + *status = IBV_EXP_INTF_STAT_OK; + + if (!params->obj) { + errno = EINVAL; + *status = IBV_EXP_INTF_STAT_INVAL_OBJ; + return NULL; + } + + switch (params->intf) { + case IBV_EXP_INTF_QP_BURST: + qp = to_mqp(params->obj); + if (qp->gen_data_warm.pattern == MLX5_QP_PATTERN) { + family = mlx5_get_qp_burst_family(qp, params, status); + if (*status != IBV_EXP_INTF_STAT_OK) { + fprintf(stderr, PFX "Failed to get QP burst family\n"); + errno = EINVAL; + } + } else { + fprintf(stderr, PFX "Warning: non-valid QP passed to query interface 0x%x 0x%x\n", qp->gen_data_warm.pattern, MLX5_QP_PATTERN); + *status = IBV_EXP_INTF_STAT_INVAL_OBJ; + errno = EINVAL; + } + break; + + case IBV_EXP_INTF_CQ: + cq = to_mcq(params->obj); + if (cq->pattern == MLX5_CQ_PATTERN) { + family = (void *)mlx5_get_poll_cq_family(cq, params, status); + } else { + fprintf(stderr, PFX "Warning: non-valid CQ passed to query interface\n"); + *status = IBV_EXP_INTF_STAT_INVAL_OBJ; + errno = EINVAL; + } + break; + + case IBV_EXP_INTF_WQ: + rwq = to_mrwq(params->obj); + if (rwq->pattern == MLX5_WQ_PATTERN) { + family = mlx5_get_wq_family(rwq, params, status); + if (*status != IBV_EXP_INTF_STAT_OK) { + fprintf(stderr, PFX "Failed to get WQ family\n"); + errno = EINVAL; + } + } else { + fprintf(stderr, PFX "Warning: non-valid WQ passed to query interface\n"); + *status = IBV_EXP_INTF_STAT_INVAL_OBJ; + errno = EINVAL; + } + break; + + default: + *status = IBV_EXP_INTF_STAT_INTF_NOT_SUPPORTED; + errno = EINVAL; + } + + return family; +} + +int mlx5_exp_release_intf(struct ibv_context *context, void *intf, + struct ibv_exp_release_intf_params *params) +{ + return 0; +} + +#define READL(ptr) (*((uint32_t *)(ptr))) +static int mlx5_read_clock(struct ibv_context *context, uint64_t *cycles) +{ + uint32_t clockhi, clocklo, clockhi1; + int i; + struct mlx5_context *ctx = to_mctx(context); + + if (!ctx->hca_core_clock) + return -EOPNOTSUPP; + + /* Handle wraparound */ + for (i = 0; i < 2; i++) { + clockhi = ntohl(READL(ctx->hca_core_clock)); + clocklo = ntohl(READL(ctx->hca_core_clock + 4)); + clockhi1 = ntohl(READL(ctx->hca_core_clock)); + if (clockhi == clockhi1) + break; + } + + *cycles = (uint64_t)(clockhi & 0x7fffffff) << 32 | (uint64_t)clocklo; + + return 0; +} + +int mlx5_exp_query_values(struct ibv_context *context, int q_values, + struct ibv_exp_values *values) +{ + int err = 0; + + values->comp_mask = 0; + + if (q_values & (IBV_EXP_VALUES_HW_CLOCK | IBV_EXP_VALUES_HW_CLOCK_NS)) { + uint64_t cycles; + + err = mlx5_read_clock(context, &cycles); + if (!err) { + if (q_values & IBV_EXP_VALUES_HW_CLOCK) { + values->hwclock = cycles; + values->comp_mask |= IBV_EXP_VALUES_HW_CLOCK; + } + if (q_values & IBV_EXP_VALUES_HW_CLOCK_NS) { + struct mlx5_context *ctx = to_mctx(context); + + 
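/*
+ * Convert raw HCA clock cycles to nanoseconds using the context's
+ * core_clock parameters: ns = ((cycles & mask) * mult) >> shift.
+ */
+ 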
values->hwclock_ns = + (((uint64_t)values->hwclock & + ctx->core_clock.mask) * + ctx->core_clock.mult) + >> ctx->core_clock.shift; + values->comp_mask |= IBV_EXP_VALUES_HW_CLOCK_NS; + } + } + } + + return err; +} + Index: contrib/ofed/libmlx5/src/wqe.h =================================================================== --- /dev/null +++ contrib/ofed/libmlx5/src/wqe.h @@ -0,0 +1,298 @@ +/* + * Copyright (c) 2012 Mellanox Technologies, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef WQE_H +#define WQE_H + +enum { + MLX5_WQE_CTRL_CQ_UPDATE = 2 << 2, + MLX5_WQE_CTRL_SOLICITED = 1 << 1, + MLX5_WQE_CTRL_FENCE = 4 << 5, +}; + +enum { + MLX5_INVALID_LKEY = 0x100, +}; + +enum { + MLX5_EXTENDED_UD_AV = 0x80000000, +}; + +enum { + MLX5_FENCE_MODE_NONE = 0 << 5, + MLX5_FENCE_MODE_INITIATOR_SMALL = 1 << 5, + MLX5_FENCE_MODE_STRONG_ORDERING = 3 << 5, + MLX5_FENCE_MODE_SMALL_AND_FENCE = 4 << 5, +}; + +struct mlx5_wqe_srq_next_seg { + uint8_t rsvd0[2]; + uint16_t next_wqe_index; + uint8_t signature; + uint8_t rsvd1[11]; +}; + +struct mlx5_wqe_data_seg { + uint32_t byte_count; + uint32_t lkey; + uint64_t addr; +}; + +struct mlx5_eqe_comp { + uint32_t reserved[6]; + uint32_t cqn; +}; + +struct mlx5_eqe_qp_srq { + uint32_t reserved[6]; + uint32_t qp_srq_n; +}; + +enum { + MLX5_ETH_WQE_L3_CSUM = (1 << 6), + MLX5_ETH_WQE_L4_CSUM = (1 << 7), +}; + +enum { + MLX5_ETH_INLINE_HEADER_SIZE = 16, +}; + +struct mlx5_wqe_eth_seg { + uint32_t rsvd0; + uint8_t cs_flags; + uint8_t rsvd1; + uint16_t mss; + uint32_t rsvd2; + uint16_t inline_hdr_sz; + uint8_t inline_hdr_start[2]; + uint8_t inline_hdr[16]; +}; + +struct mlx5_wqe_ctrl_seg { + uint32_t opmod_idx_opcode; + uint32_t qpn_ds; + uint8_t signature; + uint8_t rsvd[2]; + uint8_t fm_ce_se; + uint32_t imm; +}; + +struct mlx5_wqe_xrc_seg { + uint32_t xrc_srqn; + uint8_t rsvd[12]; +}; + +struct mlx5_wqe_masked_atomic_seg { + uint64_t swap_add; + uint64_t compare; + uint64_t swap_add_mask; + uint64_t compare_mask; +}; + +struct mlx5_base_av { + union { + struct { + uint32_t qkey; + uint32_t reserved; + } qkey; + uint64_t dc_key; + } key; + uint32_t dqp_dct; + uint8_t stat_rate_sl; + uint8_t fl_mlid; + uint16_t rlid; +}; + +struct mlx5_grh_av { + uint8_t reserved0[4]; + uint8_t rmac[6]; + uint8_t tclass; + uint8_t hop_limit; + uint32_t grh_gid_fl; + uint8_t rgid[16]; +}; + +struct mlx5_wqe_av { + struct mlx5_base_av base; + struct mlx5_grh_av grh_sec; +}; + +struct mlx5_wqe_datagram_seg { + struct mlx5_wqe_av av; +}; + +struct mlx5_wqe_raddr_seg { + uint64_t raddr; + uint32_t rkey; + uint32_t reserved; +}; + +struct mlx5_wqe_atomic_seg { + uint64_t swap_add; + uint64_t compare; +}; + +struct mlx5_wqe_inl_data_seg { + uint32_t byte_count; +}; + +struct mlx5_wqe_umr_ctrl_seg { + uint8_t flags; + uint8_t rsvd0[3]; + uint16_t klm_octowords; + uint16_t bsf_octowords; + uint64_t mkey_mask; + uint8_t rsvd1[32]; +}; + +struct mlx5_mkey_seg { + /* This is a two bit field occupying bits 31-30. 
+ * bit 31 is always 0, + * bit 30 is zero for regular MRs and 1 (e.g free) for UMRs that do not have tanslation + */ + uint8_t status; + uint8_t pcie_control; + uint8_t flags; + uint8_t version; + uint32_t qpn_mkey7_0; + uint8_t rsvd1[4]; + uint32_t flags_pd; + uint64_t start_addr; + uint64_t len; + uint32_t bsfs_octo_size; + uint8_t rsvd2[16]; + uint32_t xlt_oct_size; + uint8_t rsvd3[3]; + uint8_t log2_page_size; + uint8_t rsvd4[4]; +}; + +struct mlx5_seg_set_psv { + uint8_t rsvd[4]; + uint16_t syndrome; + uint16_t status; + uint16_t block_guard; + uint16_t app_tag; + uint32_t ref_tag; + uint32_t mkey; + uint64_t va; +}; + +struct mlx5_seg_get_psv { + uint8_t rsvd[19]; + uint8_t num_psv; + uint32_t l_key; + uint64_t va; + uint32_t psv_index[4]; +}; + +struct mlx5_seg_check_psv { + uint8_t rsvd0[2]; + uint16_t err_coalescing_op; + uint8_t rsvd1[2]; + uint16_t xport_err_op; + uint8_t rsvd2[2]; + uint16_t xport_err_mask; + uint8_t rsvd3[7]; + uint8_t num_psv; + uint32_t l_key; + uint64_t va; + uint32_t psv_index[4]; +}; + +struct mlx5_seg_repeat_ent { + uint16_t stride; + uint16_t byte_count; + uint32_t memkey; + uint64_t va; +}; + +struct mlx5_seg_repeat_block { + uint32_t byte_count; + uint32_t const_0x400; + uint32_t repeat_count; + uint16_t reserved; + uint16_t num_ent; + struct mlx5_seg_repeat_ent entries[0]; +}; + +struct mlx5_rwqe_sig { + uint8_t rsvd0[4]; + uint8_t signature; + uint8_t rsvd1[11]; +}; + +struct mlx5_wqe_signature_seg { + uint8_t rsvd0[4]; + uint8_t signature; + uint8_t rsvd1[11]; +}; + +struct mlx5_wqe_inline_seg { + uint32_t byte_count; +}; + +struct mlx5_wqe_wait_en_seg { + uint8_t rsvd0[8]; + uint32_t pi; + uint32_t obj_num; +}; + +enum { + MLX5_MKEY_MASK_LEN = 1ull << 0, + MLX5_MKEY_MASK_PAGE_SIZE = 1ull << 1, + MLX5_MKEY_MASK_START_ADDR = 1ull << 6, + MLX5_MKEY_MASK_PD = 1ull << 7, + MLX5_MKEY_MASK_EN_RINVAL = 1ull << 8, + MLX5_MKEY_MASK_EN_SIGERR = 1ull << 9, + MLX5_MKEY_MASK_BSF_EN = 1ull << 12, + MLX5_MKEY_MASK_KEY = 1ull << 13, + MLX5_MKEY_MASK_QPN = 1ull << 14, + MLX5_MKEY_MASK_LR = 1ull << 17, + MLX5_MKEY_MASK_LW = 1ull << 18, + MLX5_MKEY_MASK_RR = 1ull << 19, + MLX5_MKEY_MASK_RW = 1ull << 20, + MLX5_MKEY_MASK_A = 1ull << 21, + MLX5_MKEY_MASK_SMALL_FENCE = 1ull << 23, + MLX5_MKEY_MASK_FREE = 1ull << 29, +}; + +enum { + MLX5_PERM_LOCAL_READ = 1 << 2, + MLX5_PERM_LOCAL_WRITE = 1 << 3, + MLX5_PERM_REMOTE_READ = 1 << 4, + MLX5_PERM_REMOTE_WRITE = 1 << 5, + MLX5_PERM_ATOMIC = 1 << 6, + MLX5_PERM_UMR_EN = 1 << 7, +}; + +#endif /* WQE_H */ Index: contrib/ofed/usr.lib/Makefile =================================================================== --- contrib/ofed/usr.lib/Makefile +++ contrib/ofed/usr.lib/Makefile @@ -1,4 +1,4 @@ -SUBDIR= libibcommon libibmad libibumad libibverbs libmlx4 libmthca \ +SUBDIR= libibverbs libibcommon libibmad libibumad libmlx5 libmlx4 libmthca \ libopensm libosmcomp libosmvendor libibcm librdmacm libsdp libcxgb4 SUBDIR_DEPEND_libcxgb4= libibverbs @@ -6,6 +6,7 @@ SUBDIR_DEPEND_libibmad= libibcommon libibumad SUBDIR_DEPEND_libibumad= libibcommon SUBDIR_DEPEND_libmlx4= libibverbs +SUBDIR_DEPEND_libmlx5= libibverbs SUBDIR_DEPEND_libmthca= libibverbs SUBDIR_DEPEND_libosmvendor= libibumad libopensm libosmcomp SUBDIR_DEPEND_librdmacm= libibverbs Index: contrib/ofed/usr.lib/libmlx5/Makefile =================================================================== --- /dev/null +++ contrib/ofed/usr.lib/libmlx5/Makefile @@ -0,0 +1,25 @@ +# $FreeBSD$ + +SHLIBDIR?= /usr/lib + +.include + +MLX5DIR= ${.CURDIR}/../../libmlx5 +IBVERBSDIR= 
${.CURDIR}/../../libibverbs
+MLXSRCDIR= ${MLX5DIR}/src
+
+.PATH: ${MLXSRCDIR}
+
+LIB= mlx5
+SHLIB_MAJOR= 1
+MK_PROFILE= no
+
+SRCS= buf.c cq.c dbrec.c implicit_lkey.c mlx5.c qp.c srq.c verbs.c
+
+LIBADD= ibverbs pthread
+CFLAGS+= -DHAVE_CONFIG_H
+CFLAGS+= -I${.CURDIR} -I${MLXSRCDIR} -I${IBVERBSDIR}/include
+
+VERSION_MAP= ${MLXSRCDIR}/mlx5.map
+
+.include <bsd.lib.mk>
Index: contrib/ofed/usr.lib/libmlx5/config.h
===================================================================
--- /dev/null
+++ contrib/ofed/usr.lib/libmlx5/config.h
@@ -0,0 +1,92 @@
+/* config.h. Generated from config.h.in by configure. */
+/* config.h.in. Generated from configure.ac by autoheader. */
+
+/* Define to 1 if you have the <dlfcn.h> header file. */
+#define HAVE_DLFCN_H 1
+
+/* Define to 1 if you have the `ibv_dofork_range' function. */
+#define HAVE_IBV_DOFORK_RANGE 1
+
+/* Define to 1 if you have the `ibv_dontfork_range' function. */
+#define HAVE_IBV_DONTFORK_RANGE 1
+
+/* adding verbs extension support */
+/* #undef HAVE_IBV_EXT */
+
+/* Define to 1 if you have the `ibv_register_driver' function. */
+#define HAVE_IBV_REGISTER_DRIVER 1
+
+/* Define to 1 if you have the <inttypes.h> header file. */
+#define HAVE_INTTYPES_H 1
+
+/* Define to 1 if you have the `ibverbs' library (-libverbs). */
+#define HAVE_LIBIBVERBS 1
+
+/* Define to 1 if you have the <memory.h> header file. */
+#define HAVE_MEMORY_H 1
+
+/* adding numa support */
+/* #undef HAVE_NUMA */
+
+/* Define to 1 if you have the <stdint.h> header file. */
+#define HAVE_STDINT_H 1
+
+/* Define to 1 if you have the <stdlib.h> header file. */
+#define HAVE_STDLIB_H 1
+
+/* Define to 1 if you have the <strings.h> header file. */
+#define HAVE_STRINGS_H 1
+
+/* Define to 1 if you have the <string.h> header file. */
+#define HAVE_STRING_H 1
+
+/* Define to 1 if you have the <sys/stat.h> header file. */
+#define HAVE_SYS_STAT_H 1
+
+/* Define to 1 if you have the <sys/types.h> header file. */
+#define HAVE_SYS_TYPES_H 1
+
+/* Define to 1 if you have the <unistd.h> header file. */
+#define HAVE_UNISTD_H 1
+
+/* Define to 1 if you have the <valgrind/memcheck.h> header file. */
+/* #undef HAVE_VALGRIND_MEMCHECK_H */
+
+/* Define to the sub-directory where libtool stores uninstalled libraries. */
+#define LT_OBJDIR ".libs/"
+
+/* Define to 1 to disable Valgrind annotations. */
+#define NVALGRIND 1
+
+/* Name of package */
+#define PACKAGE "libmlx5"
+
+/* Define to the address where bug reports for this package should be sent. */
+#define PACKAGE_BUGREPORT "linux-rdma@vger.kernel.org"
+
+/* Define to the full name of this package. */
+#define PACKAGE_NAME "libmlx5"
+
+/* Define to the full name and version of this package. */
+#define PACKAGE_STRING "libmlx5 1.0.2mlnx1"
+
+/* Define to the one symbol short name of this package. */
+#define PACKAGE_TARNAME "libmlx5"
+
+/* Define to the home page for this package. */
+#define PACKAGE_URL ""
+
+/* Define to the version of this package. */
+#define PACKAGE_VERSION "1.0.2mlnx1"
+
+/* The size of `long', as computed by sizeof. */
+#define SIZEOF_LONG 8
+
+/* Define to 1 if you have the ANSI C header files. */
+#define STDC_HEADERS 1
+
+/* Version number of package */
+#define VERSION "1.0.2mlnx1"
+
+/* Define to empty if `const' does not conform to ANSI C.
*/ +/* #undef const */ Index: share/mk/bsd.libnames.mk =================================================================== --- share/mk/bsd.libnames.mk +++ share/mk/bsd.libnames.mk @@ -104,6 +104,7 @@ LIBMENU?= ${DESTDIR}${LIBDIR}/libmenu.a LIBMILTER?= ${DESTDIR}${LIBDIR}/libmilter.a LIBMLX4?= ${DESTDIR}${LIBDIR}/libmlx4.a +LIBMLX5?= ${DESTDIR}${LIBDIR}/libmlx5.a LIBMP?= ${DESTDIR}${LIBDIR}/libmp.a LIBMT?= ${DESTDIR}${LIBDIR}/libmt.a LIBMTHCA?= ${DESTDIR}${LIBDIR}/libmthca.a Index: share/mk/src.libnames.mk =================================================================== --- share/mk/src.libnames.mk +++ share/mk/src.libnames.mk @@ -196,6 +196,7 @@ ibumad \ ibverbs \ mlx4 \ + mlx5 \ mthca \ opensm \ osmcomp \ @@ -332,6 +333,7 @@ _DP_ibmad= ibcommon ibumad _DP_ibumad= ibcommon _DP_mlx4= ibverbs pthread +_DP_mlx5= ibverbs pthread _DP_mthca= ibverbs pthread _DP_opensm= pthread _DP_osmcomp= pthread @@ -481,6 +483,7 @@ LIBIBUMADDIR= ${OBJTOP}/contrib/ofed/usr.lib/libibumad LIBIBVERBSDIR= ${OBJTOP}/contrib/ofed/usr.lib/libibverbs LIBMLX4DIR= ${OBJTOP}/contrib/ofed/usr.lib/libmlx4 +LIBMLX5DIR= ${OBJTOP}/contrib/ofed/usr.lib/libmlx5 LIBMTHCADIR= ${OBJTOP}/contrib/ofed/usr.lib/libmthca LIBOPENSMDIR= ${OBJTOP}/contrib/ofed/usr.lib/libopensm LIBOSMCOMPDIR= ${OBJTOP}/contrib/ofed/usr.lib/libosmcomp
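
As a usage illustration (not part of the change itself): the clock-query plumbing added in mlx5_exp_query_values() above is reached through the experimental verbs entry point, so an application would consume it roughly as sketched below. This is a minimal sketch under two assumptions: that the installed libibverbs provides the MOFED-style <infiniband/verbs_exp.h> with ibv_exp_query_values(), and that the first device in the list is an mlx5 one so the call lands in the provider code shown earlier. The file name hwclock_query.c and the error-handling choices are purely illustrative.

/* hwclock_query.c - illustrative sketch only; assumes an experimental
 * (MOFED-style) libibverbs that ships <infiniband/verbs_exp.h>. */
#include <stdio.h>
#include <string.h>
#include <infiniband/verbs_exp.h>

int main(void)
{
	struct ibv_device **devs;
	struct ibv_context *ctx;
	struct ibv_exp_values vals;
	int n, ret;

	devs = ibv_get_device_list(&n);
	if (!devs || n == 0) {
		fprintf(stderr, "no RDMA devices found\n");
		return 1;
	}

	/* First device, for brevity; a real tool would pick by name. */
	ctx = ibv_open_device(devs[0]);
	if (!ctx) {
		fprintf(stderr, "ibv_open_device failed\n");
		ibv_free_device_list(devs);
		return 1;
	}

	memset(&vals, 0, sizeof(vals));
	ret = ibv_exp_query_values(ctx,
	    IBV_EXP_VALUES_HW_CLOCK | IBV_EXP_VALUES_HW_CLOCK_NS, &vals);
	if (ret) {
		fprintf(stderr, "ibv_exp_query_values failed: %d\n", ret);
	} else {
		if (vals.comp_mask & IBV_EXP_VALUES_HW_CLOCK)
			printf("raw HCA clock: %llu cycles\n",
			    (unsigned long long)vals.hwclock);
		if (vals.comp_mask & IBV_EXP_VALUES_HW_CLOCK_NS)
			printf("HCA clock:     %llu ns\n",
			    (unsigned long long)vals.hwclock_ns);
	}

	ibv_close_device(ctx);
	ibv_free_device_list(devs);
	return ret;
}

Note that the provider only sets the corresponding comp_mask bits when the core-clock page was mapped at context creation; otherwise mlx5_read_clock() fails with -EOPNOTSUPP and comp_mask stays clear, so checking both the return value and comp_mask, as above, is the safe pattern.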